Dynamic Ranking Algorithm Using Multi Graph Technology

7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology

1/9

International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10

ISSN: 1837-7823

36

Dynamic Ranking Algorithm Using Multi Graph Technology

S.N.Sheela Evangelin Prasad1

and Dr.M.V.Srinath2

1Associate Professor CSE Dept, Sri Krishna Engineering College, Chennai

2Director STET Womens College, Mannargudi.

Abstract

Dynamic Ranking is a system that approximates object rank results by utilizing a hybrid

approach inspired by materialized views in traditional query processing. Number of relativelysmall subsets of the multi graph are materialized in such a way that any keyword query can be

answered by running Object Rank on only one of the multi graph. Dynamic ranking generates

the multi graphs by partitioning all the terms in the corpus based on their co-occurrence,executing Object Rank for each partition using the terms to generate a set of random walkstarting points, and keeping only those objects that receive non-negligible scores. The intuition is

that a multi graph that contains all objects and links relevant to a set of related terms should have

all the information needed to rank objects with respect to one of these terms. We present atheoretically well-founded retrieval model for dynamically generating rankings based on

interactive user feedback. Unlike conventional rankings that remain static after the query was

issued, dynamic rankings allow and anticipate user activity, thus providing a way to combine theotherwise contradictory goals of result diversification and high recall.

Keywords : Object Rank, Page Rank, Dynamic Rank, Multi graph

I Introduction

Object Rank is a system to perform authority-based keyword search on databases, inspired by

Page Rank. Page Rank is an excellent tool to rank the global importance of the pages of the

Web, proven by the success of Google. However, Google uses Page Rank as a tool to measurethe global importance of the pages, independently of a keyword query. Google uses traditional

IR techniques to estimate the relevance of a page to a keyword query, which is then combined

with the Page Rank value to calculate the final score of a page. We appropriately extend andmodify Page Rank to perform keyword search on databases. For example, consider the

publications database of Figure 1, where edges denote citations (edges start from citing and endat cited paper), and the keyword query Sorting. Then, using the original variant of Object Rank,

theAccess Path Selection in a Relational Database Management System paper would be rankedhighest, because it is cited by four papers containing sorting (or sort). The Fundamental

Techniques for Order Optimization paper would be ranked second, since it is cited by only

three sorting papers. The Page Rank algorithm utilizes the Web graph link structure to assignglobal importance to Web pages. It works by modeling the behavior of a random Web surfer


2/9


ISSN: 1837-7823

37

who starts at a random Web page and follows outgoing links with uniform probability. Dynamic

versions of the Page Rank algorithm, Personalized Page Rank (PPR) for Web graph datasets andObject Rank for graph-modeled databases have become popular which are characterized by a

query-specific choice of the random walk starting points. PPR is a modification of Page Rank

that performs search personalized on a preference set that contains web pages that a user likes.

Object Rank extends (personalized) Page Rank to perform keyword search in databases. ObjectRank uses a query term posting list as a set of random walk starting points and conducts the walk

on the instance graph of the database. Object Rank has successfully been applied to databasesthat have social networking components, such as bibliographic data and collaborative product

design. Object Rank suffers from the same scalability issues as personalized Page Rank, as it

requires multiple iterations over all nodes and links of the entire database graph.

Fig.1

II Dynamic Ranking

Web documents are dynamic. Newspaper homepages such as the The Hindu change several

times a day, market pace sites such as amazon can change many times an hour and blogs are

updated with varying frequencies when new posts and comments are added. Some of thesechanges are substantial and significant for information seekers- new stories appearing on a

homepage or new comments to a blog post. Others hold less interest for those looking for

information- visitation counters, advertisement content, or formatting changes have little impacton the page content.


3/9


ISSN: 1837-7823

38

Currently, document ranking algorithms only have a static view of the page content. In this work

we explore the interaction between the dynamics of web documents and relevance ranking, usingdocument representations that view a document as a dynamic entity. We focus specifically on

navigational searches, where there is very little variation across users on the clicked results, and

there tend to be a small number of highly relevant documents that are consistently relevant

across time. We find that, for these queries, there are significant relationships between thelikelihood of change and the relevance level of the page. We develop a novel probabilistic

retrieval model which takes into account dynamic content, and show significant performanceimprovements over a model that only views a document at a single point in time. To our

knowledge, this is the first published study looking at content change within documents from a

relevance ranking perspective.

III Document Dynamics and Relevance

Documents change for many reasons. The Hindu pages change whenever new stories are addedor old stories are updated, amazon when new classified ads are added, and academics' home

pages when new papers are published. All of these pages change at different frequencies and indifferent amounts. In this section we provide some examples and intuitions about how suchchange may be used to improve relevance ranking. We examine two change features: (1) a

query-relevant feature reflecting how the terms on a page (in particular those that match the

query) change over time, and (2) a query-independent feature reflecting how frequently or byhow much the page changes over time. Different terms in a page's vocabulary may be more

stable or dynamic, they may remain constant over the lifetime of the page, or they may appear or

disappear as the document changes. These differences in temporal term characteristics may lend

some insight into the terms' importance on the page for various information needs.

For example, on the page http://allrecipes.com, a popular website for sharing and rating recipes,

stable terms that appear consistently over time include: all recipes, cook, cookbooks, copyright,desserts, easy, healthy, newsroom, quick, recipe, and recipes. These terms represent a mix of

characteristic terms that are descriptive of the overall central topic of the page and navigational

elements. In contrast, terms that come and go during the summer months include: independence,themed, ag, fourth, macaroni, cream, zucchini, and grilled. These terms represent specific

content that may have been on the page for a period of time, in this case relating to current

holidays or the most recent recipes. This dynamic group of terms, although pertinent to the

content of the page at a particular time, are not central to the main topic of the page. Whenconsidering whether a document is relevant for a particular query, we may wish to consider

whether the information need is more likely to be addressed by consistent or changing terms. Is

the searcher more likely to be seeking dynamic or static content? Queries reflecting currentevents or late-breaking news may be better served by content that is recent (thus dynamic over

time). In the above example, a searcher looking for recipes to cook for the Fourth of July holiday

might be satisfied with term matches in the more dynamic portion of the page. On the otherhand, for navigational searches we may want to favor content that is stable over a longer period

of time and characteristic of the page in general. In our example, a searcher looking for the

allrecipes.com homepage would be better served by that portion of the document that does notchange.


4/9


ISSN: 1837-7823

39

IV Dynamic Ranked Retrieval

We now formalize the goal of Dynamic Ranked Retrieval into a well-founded yet simple

decision-theoretic model. The core component is the notion of a ranking tree, which replaces the

static ranking of a conventional retrieval system. The nodes in the tree correspond to individual

results (i.e. documents), and each user's search experience corresponds to a path in the tree. Thepath a particular user takes depends on that user's actions, in particular whether the user decides

to expand a result to view the corresponding indented ranking. Expanding a result corresponds totaking the right branch of the corresponding node in the ranking tree, and skipping corresponds

to taking the left branch. non-relevant documents. Note that users with different query intents

consider different documents as relevant, and so will take different paths through the tree. Wewill explore other user policies later, in particular policies involving noisy user behavior.

It is now very natural to score the retrieval quality of a particular user's search experience via the

documents encountered on her path through the ranking tree. Note that the traversed path

corresponds to the final dynamic ranking presented in the user's browser, so that the i-thdocument on the traversed path corresponds to the i-th document the user sees. Thus, the

traversed path is essentially a user-specific ranking, which we can evaluate using existing

performance measures like n DCG, average precision, or Precision @ k.

V Personalization

Personalization is one of the latest trends in search engines. The two key ways to achieve

personalization in authority flow-based search systems like Page Rank are using a personalized

base set and adjusting the authority flow weight of the edges. The former involves selecting userdependent entities as the source of the authority in the data graph. The latter allows users to

assign different importance to different types of edges. For instance, a biologist querying NCBI

Entrez genomic resources may assign a high weight to the gene-to-protein link type whereas a

practitioner may assign a higher score to the publication-cites-publication link type. Object Rankwas the first work to propose customization of the weight associated with link types. This type of

ranking is referred as authority flow ranking. The problem of achieving scalable personalizationbased on a personalized base set, i.e., a personalization vector. However, no previous work has

addressed the problem of scalable link-based personalization based on user-dependent authority

flow weights. The latter is the focus of this paper. The specified problem arises both in thecontext of the Web as well as other databases with association links between their entities, e.g.,

biological, clinical or bibliographic databases. There are two reasons why personalization ofauthority flow is expensive. One is that the specific weights associated with a link type will be

determined by the specific user when they submit a query. Another dimension is the query-specific vs. query-independent nature of computing the ranking. Page Rank creates a global

ranking of the Web pages, whereas Object Rank creates a query-specific ranking. This isachieved by adding all query-related nodes of the data graph to a base set. To summarize, theaspect of choosing a personalized authority flow weight assignment is orthogonal to that of the

base set selection. Hence, our work is applicable to both the Page Rank and the Object Rank

problem variants.


5/9


ISSN: 1837-7823

40

VI Quality and Scalability

Object Rank returns top-k search results for a given query using both the content and the link

structure in G. Since it utilizes the link structure that captures the semantic relationships between

objects, an object that does not contain a given keyword but is highly relevant to the keyword

can be included in the top-k list. This is in contrast to the static Page Rank approach that onlyreturns objects containing the keyword sorted according to their Page Rank score. This key

difference is one of the main reasons for Object Ranks superior result quality, as demonstrated

by the relevance feedback survey reported in. For a given query, Object Rank iterates over theentire graph G to calculate the Object Rank vector r until | ri(k+1)- ri(k)| is less than the

convergence threshold for every ri(k+1) in r(k+1) and ri(k) and r(k).This is a very strict stopping

condition. This iterative computation may take a very long time if G has a large number of nodesand edges. This iterative computation may take a very long time if G has a large number of

nodes and edges. Therefore, instead of evaluating a keyword query at query time, the original

Object Rank system precomputes the Object Rank vectors of keywords in H, the set ofkeywords, during the preprocessing stage, and then, stores a list of pairs

per keyword. However, the preprocessing stage of Object Rank is expensive, as it requires |H|

Object Rank executions and O(|V | . |H|) bits of storage. In fact, according to the worst- case

bounds for PPR index size proven in [4], the index size must be (|V| . |H|) bits, for any systemthat returns the exact Object Rank vectors.

ScaleRank assumes a repository of precomputed rankings for a given set of authority flow

weights. It approximates the authority flow ranking of a user-specified assignment of authority

flow weights by first selecting a subset of rankings from the repository and then computing a

weighted combination of these selected rankings. A key principle behind ScaleRank is the

authority flow linearity theorem for the aggregate surfer; her behavior is controlled by multiple

personalized rankings.

VII Algorithms for Dynamic Ranking

In the following, we propose two efficient algorithms for constructing dynamic ranking trees.

Both algorithms build ranking trees top-down by recursively adding child nodes to the currentleaves (similar to most decision-tree learning algorithms). Unlike StaticMyopic, document

selection is performed by conditioning on the sequence of user interactions (e.g. result

expansions and skips) that led the user to that node.


6/9


ISSN: 1837-7823

41

VIII The ScaleRank System Architecture

Figure 2 shows the architecture of the system, which inputs a query (a weight assignment vectorq) and outputs the top K objects based on their authority score. The system maintains a

repository ofMcandidate rankings. For each candidate ranking we store its weight assignment

vector, and its ranking vector. Given a query, the Candidate Ranking Selector selects m

candidate rankings out of the M in the repository based on a heuristic described below. Thereason that onlymare selected is that the cost of ScaleRank depends on the number of input

rankings. ScaleRank algorithm then computes the best way to linearly combine these m rankings.

Finally a top Kalgorithm is used to produce the top Kobjects. Figure shows the architecture ofthe BinRank system. During query processing stage, we execute the Object Rank algorithm on

the subgraphs instead of the full graph and produce high-quality approximations of top-k lists at

a small fraction of the cost. In order to save preprocessing cost and storage, each MSG isdesigned to answer multiple term queries. We observed in the Wikipedia data set that a single

MSG can be used for 330-2,000 terms, on average.


7/9


ISSN: 1837-7823

42

Fig.2

IX Conclusion

This paper proposed a dynamic ranked retrieval model which allows users to interactively

expand the ranking to further refine the information need. The model is based on a concise

decision-theoretic framework that naturally generalizes both the standard and the intent-awarestatic retrieval models. The framework provides a principled way of evaluating dynamic retrieval

systems, as well as a basis for deriving dynamic ranked retrieval algorithms. We presented two

such algorithms and prove theoretical guarantees for their retrieval quality. We also evaluated the

algorithms empirically and find that dynamic rankings can provide very substantial gains in

retrieval performance. Finally, we showed that the retrieval functions of these algorithms can belearned from training data. Our contributions in this work include: the first evaluation of the

relationship between document dynamics and relevance ranking, the introduction of a noveldocument ranking algorithm for use with dynamic documents, and a query independent

document prior based on document dynamics. We show that these two approaches to ranking

dynamic documents are complementary and both yield significant performance gains.

In this paper we studied the problem of finding the most probable ranking of the set of objects

when preference probabilities are known for every pair of objects. We showed the connectionbetween this problem and a problem in multi graph and proposed three algorithms for finding the

most probable ranking. Evaluation on both synthetic and real world datasets showed that none of

the algorithms outperformed the others in all the situations and each one has its strengths andweaknesses. That would suggest that it probably makes sense to combine the algorithms to getoptimal results.


8/9


ISSN: 1837-7823

43

References

[1] J. Aalbersberg. Incremental relevance feedback. In ACM Conference on Research and

Development in Information Retrieval (SIGIR), pages 11-22, 1992.

[2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In ACM

Conference on Web Search and Data Mining (WSDM), 2009.

[3] A. Anagnostopoulos, L. Becchetti, C. Castillo, and A. Gionis. An optimization framework for

query recommendation. In ACM Conference on Web Search and Data Mining (WSDM),2010.

[4] A. Abdulkader, J. A. Drakopoulos, and Q. Zhang. Comparative classifier aggregation. In

ICPR 06: Proceedings of the 18th International Conference on Pattern Recognition, pages156159, Washington, DC, USA, 2006. IEEE Computer Society.

[5] N. Ailon and M. Mohri. An efficient reduction of ranking to classification. Technical report,NYU, 2007.

[6] F. Balcan, N. Bansal, A. Beygelzimer, D. Coppersmith, J. Langford, and G. B. Sorkin.Robust reductions from ranking to classification. Mach. Learn., 72(1-2):139153, 2008.

[7] Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In KDD '06.

[8] A. Balmin, V. Hristidis, and Y. Papakonstantinou.Object Rank: Authoritybased keyword

search in databases. In VLDB, pages 564575, 2004.

[9] S. Chakrabarti. Dynamic personalized Page Rank in entityrelation graphs. In WWW '07:Proceedings of the 16th international conference on World Wide Web, pages 571580, New

York, NY, USA, 2007. ACM.

[10] J. Cho and U. Schonfeld, Rankmass Crawler: A Crawler with High Page Rank Coverage

Guarantee, Proc. Intl Conf. Very Large Data Bases

[11] R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee, Comparing and aggregating

rankings with ties, in PODS 04.LDB), 2007.

[12] H. Hwang, A. Balmin, B. Reinwald, and E. Nijkamp, Binrank: Scaling dynamic authority-

based search using materialized subgraphs, in ICDE 09, 2009, pp. 6677.

[13] G. Jeh and J. Widom, Scaling personalized web search, in WWW 03. New York, NY,

USA: ACM, 2003, pp. 271279

[14] D.Fogaras, B.Racz,K.Csalogany,and .Sarlos,"Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds,and Experiment", Internet Math.,vol.2,no.3,pp.333-

358,2005.


9/9


ISSN: 1837-7823

44

[15] K.Avrachenkov,N.Litvak,D.Nemirovsky, N.Osipova,"Monte Carlo Methods in Page Rank

Computation:When One Iteration Is Sufficient", SIAM J.Numerical Analysis,vol.45,no.2,pp.890-904,2007.

[16] A.Balmin,V.Hristidis, Y.Papakonstantinou,"Object Rank:A uthority-Based Keyboard

Search in Databases", Proc.Intl Conf.Very Large Data Bases (VLDB),2004.

[17] Z.Nie,Y.Zhang,J.-R.Wen,W.-Y.Ma,"Object-Level Ranking:Bringing Order to WebObjects", Proc.Intl World Wide Web Conf.(WWW), pp.567-574,2005.

Dynamic Ranking Algorithm Using Multi Graph Technology

Documents

Transcript of Dynamic Ranking Algorithm Using Multi Graph Technology