Generalized comparison of graph-based ranking algorithms ...
Dynamic Ranking Algorithm Using Multi Graph Technology
-
Upload
rachel-wheeler -
Category
Documents
-
view
218 -
download
0
Transcript of Dynamic Ranking Algorithm Using Multi Graph Technology
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
1/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
36
Dynamic Ranking Algorithm Using Multi Graph Technology
S.N.Sheela Evangelin Prasad1
and Dr.M.V.Srinath2
1Associate Professor CSE Dept, Sri Krishna Engineering College, Chennai
2Director STET Womens College, Mannargudi.
Abstract
Dynamic Ranking is a system that approximates object rank results by utilizing a hybrid
approach inspired by materialized views in traditional query processing. Number of relativelysmall subsets of the multi graph are materialized in such a way that any keyword query can be
answered by running Object Rank on only one of the multi graph. Dynamic ranking generates
the multi graphs by partitioning all the terms in the corpus based on their co-occurrence,executing Object Rank for each partition using the terms to generate a set of random walkstarting points, and keeping only those objects that receive non-negligible scores. The intuition is
that a multi graph that contains all objects and links relevant to a set of related terms should have
all the information needed to rank objects with respect to one of these terms. We present atheoretically well-founded retrieval model for dynamically generating rankings based on
interactive user feedback. Unlike conventional rankings that remain static after the query was
issued, dynamic rankings allow and anticipate user activity, thus providing a way to combine theotherwise contradictory goals of result diversification and high recall.
Keywords : Object Rank, Page Rank, Dynamic Rank, Multi graph
I Introduction
Object Rank is a system to perform authority-based keyword search on databases, inspired by
Page Rank. Page Rank is an excellent tool to rank the global importance of the pages of the
Web, proven by the success of Google. However, Google uses Page Rank as a tool to measurethe global importance of the pages, independently of a keyword query. Google uses traditional
IR techniques to estimate the relevance of a page to a keyword query, which is then combined
with the Page Rank value to calculate the final score of a page. We appropriately extend andmodify Page Rank to perform keyword search on databases. For example, consider the
publications database of Figure 1, where edges denote citations (edges start from citing and endat cited paper), and the keyword query Sorting. Then, using the original variant of Object Rank,
theAccess Path Selection in a Relational Database Management System paper would be rankedhighest, because it is cited by four papers containing sorting (or sort). The Fundamental
Techniques for Order Optimization paper would be ranked second, since it is cited by only
three sorting papers. The Page Rank algorithm utilizes the Web graph link structure to assignglobal importance to Web pages. It works by modeling the behavior of a random Web surfer
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
2/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
37
who starts at a random Web page and follows outgoing links with uniform probability. Dynamic
versions of the Page Rank algorithm, Personalized Page Rank (PPR) for Web graph datasets andObject Rank for graph-modeled databases have become popular which are characterized by a
query-specific choice of the random walk starting points. PPR is a modification of Page Rank
that performs search personalized on a preference set that contains web pages that a user likes.
Object Rank extends (personalized) Page Rank to perform keyword search in databases. ObjectRank uses a query term posting list as a set of random walk starting points and conducts the walk
on the instance graph of the database. Object Rank has successfully been applied to databasesthat have social networking components, such as bibliographic data and collaborative product
design. Object Rank suffers from the same scalability issues as personalized Page Rank, as it
requires multiple iterations over all nodes and links of the entire database graph.
Fig.1
II Dynamic Ranking
Web documents are dynamic. Newspaper homepages such as the The Hindu change several
times a day, market pace sites such as amazon can change many times an hour and blogs are
updated with varying frequencies when new posts and comments are added. Some of thesechanges are substantial and significant for information seekers- new stories appearing on a
homepage or new comments to a blog post. Others hold less interest for those looking for
information- visitation counters, advertisement content, or formatting changes have little impacton the page content.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
3/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
38
Currently, document ranking algorithms only have a static view of the page content. In this work
we explore the interaction between the dynamics of web documents and relevance ranking, usingdocument representations that view a document as a dynamic entity. We focus specifically on
navigational searches, where there is very little variation across users on the clicked results, and
there tend to be a small number of highly relevant documents that are consistently relevant
across time. We find that, for these queries, there are significant relationships between thelikelihood of change and the relevance level of the page. We develop a novel probabilistic
retrieval model which takes into account dynamic content, and show significant performanceimprovements over a model that only views a document at a single point in time. To our
knowledge, this is the first published study looking at content change within documents from a
relevance ranking perspective.
III Document Dynamics and Relevance
Documents change for many reasons. The Hindu pages change whenever new stories are addedor old stories are updated, amazon when new classified ads are added, and academics' home
pages when new papers are published. All of these pages change at different frequencies and indifferent amounts. In this section we provide some examples and intuitions about how suchchange may be used to improve relevance ranking. We examine two change features: (1) a
query-relevant feature reflecting how the terms on a page (in particular those that match the
query) change over time, and (2) a query-independent feature reflecting how frequently or byhow much the page changes over time. Different terms in a page's vocabulary may be more
stable or dynamic, they may remain constant over the lifetime of the page, or they may appear or
disappear as the document changes. These differences in temporal term characteristics may lend
some insight into the terms' importance on the page for various information needs.
For example, on the page http://allrecipes.com, a popular website for sharing and rating recipes,
stable terms that appear consistently over time include: all recipes, cook, cookbooks, copyright,desserts, easy, healthy, newsroom, quick, recipe, and recipes. These terms represent a mix of
characteristic terms that are descriptive of the overall central topic of the page and navigational
elements. In contrast, terms that come and go during the summer months include: independence,themed, ag, fourth, macaroni, cream, zucchini, and grilled. These terms represent specific
content that may have been on the page for a period of time, in this case relating to current
holidays or the most recent recipes. This dynamic group of terms, although pertinent to the
content of the page at a particular time, are not central to the main topic of the page. Whenconsidering whether a document is relevant for a particular query, we may wish to consider
whether the information need is more likely to be addressed by consistent or changing terms. Is
the searcher more likely to be seeking dynamic or static content? Queries reflecting currentevents or late-breaking news may be better served by content that is recent (thus dynamic over
time). In the above example, a searcher looking for recipes to cook for the Fourth of July holiday
might be satisfied with term matches in the more dynamic portion of the page. On the otherhand, for navigational searches we may want to favor content that is stable over a longer period
of time and characteristic of the page in general. In our example, a searcher looking for the
allrecipes.com homepage would be better served by that portion of the document that does notchange.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
4/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
39
IV Dynamic Ranked Retrieval
We now formalize the goal of Dynamic Ranked Retrieval into a well-founded yet simple
decision-theoretic model. The core component is the notion of a ranking tree, which replaces the
static ranking of a conventional retrieval system. The nodes in the tree correspond to individual
results (i.e. documents), and each user's search experience corresponds to a path in the tree. Thepath a particular user takes depends on that user's actions, in particular whether the user decides
to expand a result to view the corresponding indented ranking. Expanding a result corresponds totaking the right branch of the corresponding node in the ranking tree, and skipping corresponds
to taking the left branch. non-relevant documents. Note that users with different query intents
consider different documents as relevant, and so will take different paths through the tree. Wewill explore other user policies later, in particular policies involving noisy user behavior.
It is now very natural to score the retrieval quality of a particular user's search experience via the
documents encountered on her path through the ranking tree. Note that the traversed path
corresponds to the final dynamic ranking presented in the user's browser, so that the i-thdocument on the traversed path corresponds to the i-th document the user sees. Thus, the
traversed path is essentially a user-specific ranking, which we can evaluate using existing
performance measures like n DCG, average precision, or Precision @ k.
V Personalization
Personalization is one of the latest trends in search engines. The two key ways to achieve
personalization in authority flow-based search systems like Page Rank are using a personalized
base set and adjusting the authority flow weight of the edges. The former involves selecting userdependent entities as the source of the authority in the data graph. The latter allows users to
assign different importance to different types of edges. For instance, a biologist querying NCBI
Entrez genomic resources may assign a high weight to the gene-to-protein link type whereas a
practitioner may assign a higher score to the publication-cites-publication link type. Object Rankwas the first work to propose customization of the weight associated with link types. This type of
ranking is referred as authority flow ranking. The problem of achieving scalable personalizationbased on a personalized base set, i.e., a personalization vector. However, no previous work has
addressed the problem of scalable link-based personalization based on user-dependent authority
flow weights. The latter is the focus of this paper. The specified problem arises both in thecontext of the Web as well as other databases with association links between their entities, e.g.,
biological, clinical or bibliographic databases. There are two reasons why personalization ofauthority flow is expensive. One is that the specific weights associated with a link type will be
determined by the specific user when they submit a query. Another dimension is the query-specific vs. query-independent nature of computing the ranking. Page Rank creates a global
ranking of the Web pages, whereas Object Rank creates a query-specific ranking. This isachieved by adding all query-related nodes of the data graph to a base set. To summarize, theaspect of choosing a personalized authority flow weight assignment is orthogonal to that of the
base set selection. Hence, our work is applicable to both the Page Rank and the Object Rank
problem variants.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
5/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
40
VI Quality and Scalability
Object Rank returns top-k search results for a given query using both the content and the link
structure in G. Since it utilizes the link structure that captures the semantic relationships between
objects, an object that does not contain a given keyword but is highly relevant to the keyword
can be included in the top-k list. This is in contrast to the static Page Rank approach that onlyreturns objects containing the keyword sorted according to their Page Rank score. This key
difference is one of the main reasons for Object Ranks superior result quality, as demonstrated
by the relevance feedback survey reported in. For a given query, Object Rank iterates over theentire graph G to calculate the Object Rank vector r until | ri(k+1)- ri(k)| is less than the
convergence threshold for every ri(k+1) in r(k+1) and ri(k) and r(k).This is a very strict stopping
condition. This iterative computation may take a very long time if G has a large number of nodesand edges. This iterative computation may take a very long time if G has a large number of
nodes and edges. Therefore, instead of evaluating a keyword query at query time, the original
Object Rank system precomputes the Object Rank vectors of keywords in H, the set ofkeywords, during the preprocessing stage, and then, stores a list of pairs
per keyword. However, the preprocessing stage of Object Rank is expensive, as it requires |H|
Object Rank executions and O(|V | . |H|) bits of storage. In fact, according to the worst- case
bounds for PPR index size proven in [4], the index size must be (|V| . |H|) bits, for any systemthat returns the exact Object Rank vectors.
ScaleRank assumes a repository of precomputed rankings for a given set of authority flow
weights. It approximates the authority flow ranking of a user-specified assignment of authority
flow weights by first selecting a subset of rankings from the repository and then computing a
weighted combination of these selected rankings. A key principle behind ScaleRank is the
authority flow linearity theorem for the aggregate surfer; her behavior is controlled by multiple
personalized rankings.
VII Algorithms for Dynamic Ranking
In the following, we propose two efficient algorithms for constructing dynamic ranking trees.
Both algorithms build ranking trees top-down by recursively adding child nodes to the currentleaves (similar to most decision-tree learning algorithms). Unlike StaticMyopic, document
selection is performed by conditioning on the sequence of user interactions (e.g. result
expansions and skips) that led the user to that node.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
6/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
41
VIII The ScaleRank System Architecture
Figure 2 shows the architecture of the system, which inputs a query (a weight assignment vectorq) and outputs the top K objects based on their authority score. The system maintains a
repository ofMcandidate rankings. For each candidate ranking we store its weight assignment
vector, and its ranking vector. Given a query, the Candidate Ranking Selector selects m
candidate rankings out of the M in the repository based on a heuristic described below. Thereason that onlymare selected is that the cost of ScaleRank depends on the number of input
rankings. ScaleRank algorithm then computes the best way to linearly combine these m rankings.
Finally a top Kalgorithm is used to produce the top Kobjects. Figure shows the architecture ofthe BinRank system. During query processing stage, we execute the Object Rank algorithm on
the subgraphs instead of the full graph and produce high-quality approximations of top-k lists at
a small fraction of the cost. In order to save preprocessing cost and storage, each MSG isdesigned to answer multiple term queries. We observed in the Wikipedia data set that a single
MSG can be used for 330-2,000 terms, on average.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
7/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
42
Fig.2
IX Conclusion
This paper proposed a dynamic ranked retrieval model which allows users to interactively
expand the ranking to further refine the information need. The model is based on a concise
decision-theoretic framework that naturally generalizes both the standard and the intent-awarestatic retrieval models. The framework provides a principled way of evaluating dynamic retrieval
systems, as well as a basis for deriving dynamic ranked retrieval algorithms. We presented two
such algorithms and prove theoretical guarantees for their retrieval quality. We also evaluated the
algorithms empirically and find that dynamic rankings can provide very substantial gains in
retrieval performance. Finally, we showed that the retrieval functions of these algorithms can belearned from training data. Our contributions in this work include: the first evaluation of the
relationship between document dynamics and relevance ranking, the introduction of a noveldocument ranking algorithm for use with dynamic documents, and a query independent
document prior based on document dynamics. We show that these two approaches to ranking
dynamic documents are complementary and both yield significant performance gains.
In this paper we studied the problem of finding the most probable ranking of the set of objects
when preference probabilities are known for every pair of objects. We showed the connectionbetween this problem and a problem in multi graph and proposed three algorithms for finding the
most probable ranking. Evaluation on both synthetic and real world datasets showed that none of
the algorithms outperformed the others in all the situations and each one has its strengths andweaknesses. That would suggest that it probably makes sense to combine the algorithms to getoptimal results.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
8/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
43
References
[1] J. Aalbersberg. Incremental relevance feedback. In ACM Conference on Research and
Development in Information Retrieval (SIGIR), pages 11-22, 1992.
[2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In ACM
Conference on Web Search and Data Mining (WSDM), 2009.
[3] A. Anagnostopoulos, L. Becchetti, C. Castillo, and A. Gionis. An optimization framework for
query recommendation. In ACM Conference on Web Search and Data Mining (WSDM),2010.
[4] A. Abdulkader, J. A. Drakopoulos, and Q. Zhang. Comparative classifier aggregation. In
ICPR 06: Proceedings of the 18th International Conference on Pattern Recognition, pages156159, Washington, DC, USA, 2006. IEEE Computer Society.
[5] N. Ailon and M. Mohri. An efficient reduction of ranking to classification. Technical report,NYU, 2007.
[6] F. Balcan, N. Bansal, A. Beygelzimer, D. Coppersmith, J. Langford, and G. B. Sorkin.Robust reductions from ranking to classification. Mach. Learn., 72(1-2):139153, 2008.
[7] Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In KDD '06.
[8] A. Balmin, V. Hristidis, and Y. Papakonstantinou.Object Rank: Authoritybased keyword
search in databases. In VLDB, pages 564575, 2004.
[9] S. Chakrabarti. Dynamic personalized Page Rank in entityrelation graphs. In WWW '07:Proceedings of the 16th international conference on World Wide Web, pages 571580, New
York, NY, USA, 2007. ACM.
[10] J. Cho and U. Schonfeld, Rankmass Crawler: A Crawler with High Page Rank Coverage
Guarantee, Proc. Intl Conf. Very Large Data Bases
[11] R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee, Comparing and aggregating
rankings with ties, in PODS 04.LDB), 2007.
[12] H. Hwang, A. Balmin, B. Reinwald, and E. Nijkamp, Binrank: Scaling dynamic authority-
based search using materialized subgraphs, in ICDE 09, 2009, pp. 6677.
[13] G. Jeh and J. Widom, Scaling personalized web search, in WWW 03. New York, NY,
USA: ACM, 2003, pp. 271279
[14] D.Fogaras, B.Racz,K.Csalogany,and .Sarlos,"Towards Scaling Fully Personalized PageRank: Algorithms, Lower Bounds,and Experiment", Internet Math.,vol.2,no.3,pp.333-
358,2005.
-
7/29/2019 Dynamic Ranking Algorithm Using Multi Graph Technology
9/9
International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10
ISSN: 1837-7823
44
[15] K.Avrachenkov,N.Litvak,D.Nemirovsky, N.Osipova,"Monte Carlo Methods in Page Rank
Computation:When One Iteration Is Sufficient", SIAM J.Numerical Analysis,vol.45,no.2,pp.890-904,2007.
[16] A.Balmin,V.Hristidis, Y.Papakonstantinou,"Object Rank:A uthority-Based Keyboard
Search in Databases", Proc.Intl Conf.Very Large Data Bases (VLDB),2004.
[17] Z.Nie,Y.Zhang,J.-R.Wen,W.-Y.Ma,"Object-Level Ranking:Bringing Order to WebObjects", Proc.Intl World Wide Web Conf.(WWW), pp.567-574,2005.