Dynamic Ranking Algorithm Using Multi Graph Technology


    International Journal of Computational Intelligence and Information Security, December 2012 Vol. 3, No. 10

    ISSN: 1837-7823


    S. N. Sheela Evangelin Prasad¹ and Dr. M. V. Srinath²

    ¹ Associate Professor, CSE Dept., Sri Krishna Engineering College, Chennai

    ² Director, STET Women's College, Mannargudi

    Abstract

    Dynamic Ranking is a system that approximates Object Rank results by using a hybrid approach inspired by materialized views in traditional query processing. A number of relatively small subsets of the multi graph are materialized in such a way that any keyword query can be answered by running Object Rank on only one of these multi graphs. Dynamic Ranking generates the multi graphs by partitioning all the terms in the corpus based on their co-occurrence, executing Object Rank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a multi graph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. We present a theoretically well-founded retrieval model for dynamically generating rankings based on interactive user feedback. Unlike conventional rankings, which remain static after the query is issued, dynamic rankings allow and anticipate user activity, thus providing a way to combine the otherwise contradictory goals of result diversification and high recall.

    Keywords: Object Rank, Page Rank, Dynamic Rank, Multi graph

    I Introduction

    Object Rank is a system that performs authority-based keyword search on databases, inspired by Page Rank. Page Rank is an excellent tool for ranking the global importance of the pages of the Web, as the success of Google demonstrates. However, Google uses Page Rank to measure the global importance of pages independently of any keyword query. It uses traditional IR techniques to estimate the relevance of a page to a keyword query, and this relevance is then combined with the Page Rank value to calculate the final score of the page. We appropriately extend and modify Page Rank to perform keyword search on databases. For example, consider the publications database of Figure 1, where edges denote citations (each edge starts at the citing paper and ends at the cited paper), and the keyword query "Sorting". Using the original variant of Object Rank, the paper "Access Path Selection in a Relational Database Management System" would be ranked highest, because it is cited by four papers containing "sorting" (or "sort"). The paper "Fundamental Techniques for Order Optimization" would be ranked second, since it is cited by only three sorting papers.

    The Page Rank algorithm uses the link structure of the Web graph to assign global importance to Web pages. It works by modeling the behavior of a random Web surfer who starts at a random Web page and follows outgoing links with uniform probability. Dynamic versions of the Page Rank algorithm, namely Personalized Page Rank (PPR) for Web graph datasets and Object Rank for graph-modeled databases, have become popular; they are characterized by a query-specific choice of the random walk starting points. PPR is a modification of Page Rank that personalizes search on a preference set containing Web pages that a user likes. Object Rank extends (personalized) Page Rank to perform keyword search in databases: it uses the posting list of a query term as the set of random walk starting points and conducts the walk on the instance graph of the database. Object Rank has been successfully applied to databases that have social networking components, such as bibliographic data and collaborative product design. However, Object Rank suffers from the same scalability issues as personalized Page Rank, as it requires multiple iterations over all nodes and links of the entire database graph.

    Fig.1
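    A minimal sketch of the keyword-specific random walk described above follows: a personalized Page Rank power iteration whose starting points (the base set) are the objects containing the query keyword, stopped only when every score changes by less than a threshold. The function name `object_rank`, the damping factor `d = 0.85`, the threshold `eps`, and the choice to send the authority of objects without outgoing links back to the base set are assumptions made for illustration, not parameters of the original Object Rank system. The example graph is a hypothetical stand-in for the citation pattern described for Figure 1.

```python
from collections.abc import Hashable, Iterable
from typing import Dict, List


def object_rank(
    graph: Dict[Hashable, List[Hashable]],
    base_set: Iterable[Hashable],
    d: float = 0.85,
    eps: float = 1e-10,
    max_iter: int = 1000,
) -> Dict[Hashable, float]:
    """Keyword-specific authority scores via a personalized Page Rank power iteration."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    base = set(base_set) & nodes
    if not base:
        raise ValueError("the base set must contain at least one graph node")
    # Random-jump vector: uniform over the base set (objects containing the keyword).
    s = {v: (1.0 / len(base) if v in base else 0.0) for v in nodes}
    r = dict(s)
    for _ in range(max_iter):
        nxt = {v: (1.0 - d) * s[v] for v in nodes}
        dangling = 0.0
        for u in nodes:
            out = graph.get(u, [])
            if out:
                # Authority flows along each outgoing link with uniform probability.
                share = d * r[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Assumption: objects without outgoing links jump back to the base set.
                dangling += d * r[u]
        for v in nodes:
            nxt[v] += dangling * s[v]
        # Strict stopping rule: every component must change by less than eps.
        converged = max(abs(nxt[v] - r[v]) for v in nodes) < eps
        r = nxt
        if converged:
            break
    return r


if __name__ == "__main__":
    # Hypothetical graph: edges point from the citing paper to the cited paper.
    citations = {
        "sort1": ["access_path", "order_opt"],
        "sort2": ["access_path", "order_opt"],
        "sort3": ["access_path", "order_opt"],
        "sort4": ["access_path"],
    }
    scores = object_rank(citations, ["sort1", "sort2", "sort3", "sort4"])
    for paper in sorted(scores, key=scores.get, reverse=True):
        print(paper, round(scores[paper], 4))
```

    With this hypothetical graph, `access_path` (cited by four sorting papers) ranks first and `order_opt` (cited by three) ranks second, matching the ordering described in the text.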

    II Dynamic Ranking

    Web documents are dynamic. Newspaper homepages such as that of The Hindu change several times a day, marketplace sites such as Amazon can change many times an hour, and blogs are updated with varying frequencies as new posts and comments are added. Some of these changes are substantial and significant for information seekers, such as new stories appearing on a homepage or new comments on a blog post. Others hold less interest for those looking for information: visitation counters, advertisement content, or formatting changes have little impact on the page content.


    Currently, document ranking algorithms have only a static view of page content. In this work we explore the interaction between the dynamics of Web documents and relevance ranking, using document representations that treat a document as a dynamic entity. We focus specifically on navigational searches, where there is very little variation across users in the clicked results and there tends to be a small number of highly relevant documents that remain consistently relevant over time. We find that, for these queries, there are significant relationships between the likelihood of change and the relevance level of the page. We develop a novel probabilistic retrieval model that takes dynamic content into account, and show significant performance improvements over a model that views a document only at a single point in time. To our knowledge, this is the first published study of content change within documents from a relevance ranking perspective.

    III Document Dynamics and Relevance

    Documents change for many reasons. The Hindu pages change whenever new stories are added or old stories are updated, Amazon pages when new classified ads are added, and academics' home pages when new papers are published. All of these pages change at different frequencies and by different amounts. In this section we provide some examples and intuitions about how such change may be used to improve relevance ranking. We examine two change features: (1) a query-relevant feature reflecting how the terms on a page (in particular those that match the query) change over time, and (2) a query-independent feature reflecting how frequently or by how much the page changes over time. Different terms in a page's vocabulary may be more stable or more dynamic: they may remain constant over the lifetime of the page, or they may appear or disappear as the document changes. These differences in temporal term characteristics may offer some insight into the terms' importance on the page for various information needs.
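    The two change features can be made concrete with a small sketch over a sequence of crawled snapshots of one page, each reduced to a set of terms. Here the term feature is the fraction of snapshots that contain the term, and the page feature is the fraction of consecutive snapshot pairs that differ together with their mean Jaccard distance. These particular definitions and the function names are hypothetical choices for illustration; the paper does not define its features formally.

```python
from typing import Dict, List, Sequence, Set


def term_stability(snapshots: List[Set[str]]) -> Dict[str, float]:
    """Query-relevant feature: fraction of snapshots in which each term appears."""
    vocab = set().union(*snapshots)
    n = len(snapshots)
    return {t: sum(t in snap for snap in snapshots) / n for t in vocab}


def page_change(snapshots: List[Set[str]]) -> Dict[str, float]:
    """Query-independent features: how often and by how much the page changes."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return {"change_rate": 0.0, "mean_jaccard_distance": 0.0}
    changed = sum(a != b for a, b in pairs)
    # Jaccard distance between consecutive term sets (0 if both are empty).
    dist = [1 - len(a & b) / len(a | b) if a | b else 0.0 for a, b in pairs]
    return {
        "change_rate": changed / len(pairs),
        "mean_jaccard_distance": sum(dist) / len(pairs),
    }


def query_term_stability(snapshots: List[Set[str]], query: Sequence[str]) -> float:
    """Average stability of the query terms on the page (0 for absent terms)."""
    stab = term_stability(snapshots)
    return sum(stab.get(t, 0.0) for t in query) / len(query)


if __name__ == "__main__":
    # Hypothetical snapshots of a recipe homepage over three crawls.
    snaps = [
        {"recipes", "cook", "desserts", "grilled"},
        {"recipes", "cook", "desserts", "zucchini"},
        {"recipes", "cook", "desserts", "zucchini"},
    ]
    print(term_stability(snaps))
    print(page_change(snaps))
```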

    For example, on the page http://allrecipes.com, a popular website for sharing and rating recipes, stable terms that appear consistently over time include: all recipes, cook, cookbooks, copyright, desserts, easy, healthy, newsroom, quick, recipe, and recipes. These terms are a mix of characteristic terms that describe the overall central topic of the page and navigational elements. In contrast, terms that come and go during the summer months include: independence, themed, ag, fourth, macaroni, cream, zucchini, and grilled. These terms represent specific content that may have been on the page for a period of time, in this case relating to current holidays or the most recent recipes. This dynamic group of terms, although pertinent to the content of the page at a particular time, is not central to the main topic of the page. When considering whether a document is relevant for a particular query, we may wish to consider whether the information need is more likely to be addressed by consistent or by changing terms. Is the searcher more likely to be seeking dynamic or static content? Queries reflecting current events or late-breaking news may be better served by content that is recent (and thus dynamic over time). In the example above, a searcher looking for recipes to cook for the Fourth of July holiday might be satisfied with term matches in the more dynamic portion of the page. On the other hand, for navigational searches we may want to favor content that is stable over a longer period of time and characteristic of the page in general. In our example, a searcher looking for the allrecipes.com homepage would be better served by the portion of the document that does not change.
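    One way to act on this intuition, sketched here under our own assumptions, is to split a page's term occurrences into a stable part (terms present in every snapshot) and a dynamic part, smooth a unigram language model for each, and score a query by its log-likelihood under a mixture that puts more weight on the stable model for navigational queries. The mixture weights 0.8 and 0.3, the Dirichlet parameter `mu`, the background model and the function names are hypothetical; the paper does not give the equations of its retrieval model.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def split_terms(snapshots: List[List[str]]) -> Tuple[Counter, Counter]:
    """Separate term occurrences into stable (present in every snapshot) and dynamic."""
    always = set(snapshots[0]).intersection(*map(set, snapshots[1:]))
    stable, dynamic = Counter(), Counter()
    for snap in snapshots:
        for t in snap:
            (stable if t in always else dynamic)[t] += 1
    return stable, dynamic


def dirichlet_prob(term: str, counts: Counter, background: Dict[str, float],
                   mu: float = 100.0) -> float:
    """Dirichlet-smoothed P(term | model); `background` must cover every query term."""
    total = sum(counts.values())
    return (counts[term] + mu * background[term]) / (total + mu)


def score(query: List[str], snapshots: List[List[str]],
          background: Dict[str, float], navigational: bool) -> float:
    """Log query likelihood under a stable/dynamic mixture (weights are assumptions)."""
    stable, dynamic = split_terms(snapshots)
    lam = 0.8 if navigational else 0.3  # weight on the stable language model
    return sum(
        math.log(lam * dirichlet_prob(t, stable, background)
                 + (1 - lam) * dirichlet_prob(t, dynamic, background))
        for t in query
    )
```

    For a navigational query such as "allrecipes" the stable model dominates the score, while a term such as "zucchini" that appears only in some snapshots scores higher when the query is treated as non-navigational.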


    IV Dynamic Ranked Retrieval

    We now formalize the goal of dynamic ranked retrieval in a well-founded yet simple decision-theoretic model. The core component is the notion of a ranking tree, which replaces the static ranking of a conventional retrieval system. The nodes in the tree correspond to individual results (i.e., documents), and each user's search experience corresponds to a path in the tree. The path a particular user takes depends on that user's actions, in particular on whether the user decides to expand a result to view the corresponding indented ranking. Expanding a result corresponds to taking the right branch of the corresponding node in the ranking tree, and skipping corresponds to taking the left branch. In the simplest user policy, a user expands relevant documents and skips non-relevant documents. Note that users with different query intents consider different documents relevant, and so will take different paths through the tree. We will explore other user policies later, in particular policies involving noisy user behavior.

    It is now natural to score the retrieval quality of a particular user's search experience by the documents encountered on her path through the ranking tree. Note that the traversed path corresponds to the final dynamic ranking presented in the user's browser, so the i-th document on the traversed path is the i-th document the user sees. Thus, the traversed path is essentially a user-specific ranking, which we can evaluate using existing performance measures such as nDCG, average precision, or Precision@k.
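    The ranking tree and its evaluation can be sketched as follows. Each node holds a document; the right child starts the indented ranking shown after an expansion, and the left child is what the user sees after a skip. The sketch assumes a user who expands relevant documents and skips the others, and scores the traversed path with DCG and Precision@k using binary relevance. The class and function names and the example tree are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    doc: str
    left: Optional["Node"] = None   # followed when the user skips `doc`
    right: Optional["Node"] = None  # followed when the user expands `doc`


def traverse(root: Optional[Node], relevant: Set[str]) -> List[str]:
    """Path of a user who expands relevant documents and skips the others."""
    path: List[str] = []
    node = root
    while node is not None:
        path.append(node.doc)
        node = node.right if node.doc in relevant else node.left
    return path


def dcg(path: List[str], relevant: Set[str]) -> float:
    """DCG with binary gains: the i-th document seen (1-based) is discounted by log2(i + 1)."""
    return sum(1.0 / math.log2(i + 2) for i, d in enumerate(path) if d in relevant)


def precision_at_k(path: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k documents on the path that are relevant."""
    return sum(d in relevant for d in path[:k]) / k


if __name__ == "__main__":
    # Hypothetical tree: d1 at the root; expanding d1 shows d2, skipping it shows d3.
    tree = Node("d1", left=Node("d3", left=Node("d4"), right=Node("d5")), right=Node("d2"))
    for intent in ({"d1", "d2"}, {"d3", "d5"}):
        path = traverse(tree, intent)
        print(path, round(dcg(path, intent), 3), precision_at_k(path, intent, 2))
```

    Two users with different intents traverse different paths through the same tree, and each path is evaluated as an ordinary ranking.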

    V Personalization

    Personalization is one of the latest trends in search engines. The two key ways to achieve personalization in authority-flow-based search systems such as Page Rank are using a personalized base set and adjusting the authority flow weights of the edges. The former involves selecting user-dependent entities as the source of authority in the data graph. The latter allows users to assign different importance to different types of edges. For instance, a biologist querying the NCBI Entrez genomic resources may assign a high weight to the gene-to-protein link type, whereas a practitioner may assign a higher weight to the publication-cites-publication link type. Object Rank was the first work to propose customizing the weight associated with each link type; this type of ranking is referred to as authority flow ranking. The problem of achieving scalable personalization based on a personalized base set, i.e., a personalization vector, has been studied previously. However, no previous work has addressed the problem of scalable link-based personalization based on user-dependent authority flow weights, which is the focus of this paper. This problem arises both in the context of the Web and in other databases with association links between their entities, e.g., biological, clinical, or bibliographic databases. There are two reasons why personalization of authority flow is expensive. One is that the specific weights associated with each link type are determined by the specific user only when they submit a query. The other is the query-specific vs. query-independent nature of computing the ranking: Page Rank creates a global ranking of Web pages, whereas Object Rank creates a query-specific ranking by adding all query-related nodes of the data graph to a base set. To summarize, choosing a personalized authority flow weight assignment is orthogonal to selecting the base set. Hence, our work is applicable to both the Page Rank and the Object Rank problem variants.
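    Personalization through link-type weights can be sketched by giving every edge type a user-chosen authority transfer weight: a node passes authority along each outgoing edge of type t in proportion to that weight, divided by its number of outgoing edges of type t. The division by the per-type out-degree mirrors the authority transfer rates used by Object Rank [8]; the fixed iteration count, the damping factor, the edge-type names and the example graph are assumptions. Because the weights need not sum to 1, the scores are not normalized and are only meaningful relative to each other.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]  # (source, target, edge_type)


def authority_flow_rank(edges: List[Edge], weights: Dict[str, float],
                        base: List[str], d: float = 0.85,
                        iters: int = 200) -> Dict[str, float]:
    """Object Rank-style scores where each edge type has a user-chosen weight."""
    base_set = set(base)
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges} | base_set
    # OutDeg(u, t): number of outgoing edges of type t at node u.
    outdeg: Dict[Tuple[str, str], int] = defaultdict(int)
    for u, _, t in edges:
        outdeg[(u, t)] += 1
    s = {v: (1.0 / len(base_set) if v in base_set else 0.0) for v in nodes}
    r = dict(s)
    for _ in range(iters):  # a fixed iteration count keeps the sketch short
        nxt = {v: (1 - d) * s[v] for v in nodes}
        for u, v, t in edges:
            # An edge of type t carries weights[t] / OutDeg(u, t) of u's authority.
            nxt[v] += d * r[u] * weights.get(t, 0.0) / outdeg[(u, t)]
        r = nxt
    return r


if __name__ == "__main__":
    # Hypothetical data graph with one gene-to-protein and one citation link.
    edges = [("g1", "prot1", "gene_protein"), ("p1", "p2", "cites")]
    biologist = {"gene_protein": 0.9, "cites": 0.1}
    practitioner = {"gene_protein": 0.1, "cites": 0.9}
    for name, w in [("biologist", biologist), ("practitioner", practitioner)]:
        r = authority_flow_rank(edges, w, base=["g1", "p1"])
        print(name, max(["prot1", "p2"], key=r.get))
```

    The base set is the same for both users; only the weight assignment changes, which is why the two personalization dimensions can be treated independently.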


    VI Quality and Scalability

    Object Rank returns the top-k search results for a given query using both the content and the link structure of the data graph G. Since it uses the link structure, which captures the semantic relationships between objects, an object that does not contain a given keyword but is highly relevant to that keyword can be included in the top-k list. This is in contrast to the static Page Rank approach, which only returns objects containing the keyword, sorted by their Page Rank scores. This key difference is one of the main reasons for Object Rank's superior result quality, as demonstrated by a relevance feedback survey. For a given query, Object Rank iterates over the entire graph G to calculate the Object Rank vector $r^{(k+1)}$ until $|r_i^{(k+1)} - r_i^{(k)}| < \epsilon$ for every component $r_i^{(k+1)}$ of $r^{(k+1)}$ and the corresponding component $r_i^{(k)}$ of $r^{(k)}$, where $\epsilon$ is the convergence threshold. This is a very strict stopping condition, and the iterative computation may take a very long time if G has a large number of nodes and edges. Therefore, instead of evaluating a keyword query at query time, the original Object Rank system precomputes the Object Rank vectors of the keywords in H, the set of keywords, during the preprocessing stage, and then stores a list of (object, score) pairs per keyword. However, the preprocessing stage of Object Rank is expensive, as it requires |H| Object Rank executions and $O(|V| \cdot |H|)$ bits of storage. In fact, according to the worst-case bounds for PPR index size proven in [4], the index size must be $\Omega(|V| \cdot |H|)$ bits for any system that returns the exact Object Rank vectors.
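    The precomputation strategy can be sketched as one ranking run per keyword in H, storing a sorted list of (object, score) pairs that query answering then simply looks up. The ranking function is passed in as a parameter (for instance, an Object Rank run seeded with the keyword's posting list), and the `min_score` cutoff and all names are assumptions of this sketch. With the default cutoff only zero scores are dropped, so each keyword can still store up to |V| pairs, which is where the $O(|V| \cdot |H|)$ storage comes from.

```python
from collections.abc import Hashable
from typing import Callable, Dict, Iterable, List, Tuple

Ranking = Dict[Hashable, float]
Index = Dict[str, List[Tuple[Hashable, float]]]


def build_keyword_index(keywords: Iterable[str],
                        rank_fn: Callable[[str], Ranking],
                        min_score: float = 0.0) -> Index:
    """Preprocessing: one ranking run per keyword, stored as sorted (object, score) pairs."""
    index: Index = {}
    for kw in keywords:
        scores = rank_fn(kw)  # e.g. a random walk seeded with kw's posting list
        # Objects scoring at or below min_score are not stored.
        pairs = [(obj, sc) for obj, sc in scores.items() if sc > min_score]
        index[kw] = sorted(pairs, key=lambda p: p[1], reverse=True)
    return index


def top_k(index: Index, keyword: str, k: int) -> List[Tuple[Hashable, float]]:
    """Query time: a lookup instead of an iterative computation over G."""
    return index.get(keyword, [])[:k]


def index_entries(index: Index) -> int:
    """Number of stored pairs; at most |V| * |H| when nothing is pruned."""
    return sum(len(pairs) for pairs in index.values())
```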

    ScaleRank assumes a repository of precomputed rankings for a given set of authority flow weight assignments. It approximates the authority flow ranking for a user-specified assignment of authority flow weights by first selecting a subset of rankings from the repository and then computing a weighted combination of the selected rankings. A key principle behind ScaleRank is the authority flow linearity theorem for the aggregate surfer, whose behavior is controlled by multiple personalized rankings.
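    The selection-and-combination step can be illustrated as follows: pick the m stored weight vectors closest to the query weight vector q, fit coefficients so that their linear combination approximates q, apply the same coefficients to the stored ranking vectors, and return the top K objects. This is a hedged sketch: the Euclidean nearest-neighbour selection, the least-squares fit with NumPy and the function name are our assumptions rather than ScaleRank's published procedure, and since authority flow scores are not in general linear in the edge weights, the result is an approximation.

```python
import numpy as np


def combine_rankings(q, weight_vectors, ranking_vectors, m=2, k=3):
    """Approximate the ranking for weight vector q from a repository of rankings."""
    W = np.asarray(weight_vectors, dtype=float)   # M x T: one weight vector per ranking
    R = np.asarray(ranking_vectors, dtype=float)  # M x N: scores of N objects
    q = np.asarray(q, dtype=float)
    # Assumed selection heuristic: the m stored weight vectors closest to q.
    chosen = np.argsort(np.linalg.norm(W - q, axis=1))[:m]
    # Coefficients c minimizing || sum_j c_j * W[j] - q ||_2.
    c, *_ = np.linalg.lstsq(W[chosen].T, q, rcond=None)
    approx = c @ R[chosen]                        # same combination of the rankings
    top = np.argsort(-approx, kind="stable")[:k]  # indices of the top K objects
    return approx, top


if __name__ == "__main__":
    # Hypothetical repository with two edge types and three objects.
    W = [[1, 0], [0, 1], [1, 1]]
    R = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7], [0.35, 0.25, 0.4]]
    approx, top = combine_rankings([0.8, 0.2], W, R, m=2, k=3)
    print(approx, top)
```

    Only m rankings enter the fit, so the cost of the combination grows with m rather than with the repository size M.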

    VII Algorithms for Dynamic Ranking

    In the following, we propose two efficient algorithms for constructing dynamic ranking trees. Both algorithms build ranking trees top-down by recursively adding child nodes to the current leaves (similar to most decision-tree learning algorithms). Unlike StaticMyopic, which produces a single static ranking, these algorithms select each document by conditioning on the sequence of user interactions (e.g., result expansions and skips) that led the user to that node.
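    A top-down greedy construction in this spirit is sketched below, assuming the query intents are given as a probability distribution over sets of relevant documents and that users expand relevant documents and skip the others. At each node only the intents consistent with the expansions and skips on the path so far are kept, and the unseen document with the highest total probability of relevance is chosen. This myopic selection rule, the depth limit, the example intents and the names are assumptions; the sketch is not claimed to be either of the two algorithms proposed here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

Intent = Tuple[float, FrozenSet[str]]  # (probability, documents relevant to the intent)


@dataclass
class TreeNode:
    doc: str
    left: Optional["TreeNode"] = None   # subtree shown after the user skips `doc`
    right: Optional["TreeNode"] = None  # indented ranking shown after an expansion


def build_tree(intents: List[Intent], docs: List[str], depth: int,
               shown: FrozenSet[str] = frozenset()) -> Optional[TreeNode]:
    """Top-down greedy construction, conditioned on the interactions so far."""
    candidates = [d for d in docs if d not in shown]
    if depth == 0 or not intents or not candidates:
        return None
    # Unnormalized P(doc relevant | path so far) over the intents still consistent.
    gain = {d: sum(p for p, rel in intents if d in rel) for d in candidates}
    best = max(candidates, key=lambda d: gain[d])
    if gain[best] == 0.0:
        return None  # no remaining intent finds any unseen document relevant
    seen = shown | {best}
    expanders = [(p, rel) for p, rel in intents if best in rel]
    skippers = [(p, rel) for p, rel in intents if best not in rel]
    return TreeNode(
        doc=best,
        right=build_tree(expanders, docs, depth - 1, seen),
        left=build_tree(skippers, docs, depth - 1, seen),
    )


if __name__ == "__main__":
    # Hypothetical ambiguous query with three intents.
    intents = [
        (0.5, frozenset({"jaguar_car", "car_review"})),
        (0.3, frozenset({"jaguar_cat"})),
        (0.2, frozenset({"jaguar_os"})),
    ]
    docs = ["jaguar_car", "jaguar_cat", "jaguar_os", "car_review"]
    tree = build_tree(intents, docs, depth=3)
    print(tree.doc, tree.right.doc, tree.left.doc)
```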


    VIII The ScaleRank System Architecture

    Figure 2 shows the architecture of the system, which takes as input a query (a weight assignment vector q) and outputs the top K objects according to their authority scores. The system maintains a repository of M candidate rankings; for each candidate ranking we store its weight assignment vector and its ranking vector. Given a query, the Candidate Ranking Selector selects m of the M candidate rankings in the repository using a heuristic. Only m rankings are selected because the cost of ScaleRank depends on the number of input rankings. The ScaleRank algorithm then computes the best way to linearly combine these m rankings, and finally a top-K algorithm produces the top K objects. The figure also shows the architecture of the BinRank system. During the query processing stage, we execute the Object Rank algorithm on the subgraphs instead of the full graph and produce high-quality approximations of top-k lists at a small fraction of the cost. To save preprocessing cost and storage, each materialized subgraph (MSG) is designed to answer multiple term queries. We observed in the Wikipedia data set that a single MSG can be used for 330 to 2,000 terms, on average.


    Fig.2
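    The construction of the multi graphs (materialized subgraphs) outlined in the abstract can be sketched in two steps: greedily pack terms into bins so that terms whose posting lists overlap end up together, as long as the union of a bin's postings stays within a capacity, and then run a random walk seeded with each bin's postings and keep only the objects scoring above a cutoff. The packing heuristic, `capacity`, `cutoff` and the function names are hypothetical; the ranking function is a parameter so that any Object Rank implementation can be plugged in. At query time a term is looked up in `term_to_msg` and Object Rank is run only on that subgraph.

```python
from collections.abc import Hashable
from typing import Callable, Dict, List, Set, Tuple

Postings = Dict[str, Set[Hashable]]


def pack_terms(postings: Postings, capacity: int) -> List[Set[str]]:
    """Greedily put each term into the bin whose documents it overlaps most."""
    bins: List[Set[str]] = []
    covered: List[Set[Hashable]] = []
    for term in sorted(postings, key=lambda t: len(postings[t]), reverse=True):
        docs = postings[term]
        best, best_overlap = None, -1
        for i, seen in enumerate(covered):
            overlap = len(docs & seen)
            if len(docs | seen) <= capacity and overlap > best_overlap:
                best, best_overlap = i, overlap
        if best is None:  # no existing bin has room: open a new one
            bins.append({term})
            covered.append(set(docs))
        else:
            bins[best].add(term)
            covered[best] |= docs
    return bins


def build_msgs(postings: Postings, capacity: int,
               rank_fn: Callable[[Set[Hashable]], Dict[Hashable, float]],
               cutoff: float) -> Tuple[List[Set[Hashable]], Dict[str, int]]:
    """One materialized subgraph (node set) per bin, plus a term -> MSG lookup."""
    msgs: List[Set[Hashable]] = []
    term_to_msg: Dict[str, int] = {}
    for i, terms in enumerate(pack_terms(postings, capacity)):
        base = set().union(*(postings[t] for t in terms))
        scores = rank_fn(base)  # random walk started from all of the bin's postings
        # Keep only objects that receive a non-negligible score.
        msgs.append({v for v, s in scores.items() if s > cutoff})
        for t in terms:
            term_to_msg[t] = i
    return msgs, term_to_msg
```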

    IX Conclusion

    This paper proposed a dynamic ranked retrieval model that allows users to interactively expand the ranking to further refine their information need. The model is based on a concise decision-theoretic framework that naturally generalizes both the standard and the intent-aware static retrieval models. The framework provides a principled way of evaluating dynamic retrieval systems, as well as a basis for deriving dynamic ranked retrieval algorithms. We presented two such algorithms and proved theoretical guarantees for their retrieval quality. We also evaluated the algorithms empirically and found that dynamic rankings can provide very substantial gains in retrieval performance. Finally, we showed that the retrieval functions of these algorithms can be learned from training data. Our contributions in this work include the first evaluation of the relationship between document dynamics and relevance ranking, the introduction of a novel document ranking algorithm for use with dynamic documents, and a query-independent document prior based on document dynamics. We showed that these two approaches to ranking dynamic documents are complementary and that both yield significant performance gains.

    In this paper we studied the problem of finding the most probable ranking of a set of objects when preference probabilities are known for every pair of objects. We showed the connection between this problem and a problem in multi graphs, and proposed three algorithms for finding the most probable ranking. Evaluation on both synthetic and real-world datasets showed that none of the algorithms outperformed the others in all situations; each has its strengths and weaknesses. This suggests that combining the algorithms may give better overall results.
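    For small object sets the most probable ranking can be found exhaustively, which gives a reference point for comparing heuristic algorithms. Assuming the pairwise preferences are independent, the probability of a ranking is the product of P(a preferred to b) over every pair in which a is placed above b; the sketch below maximizes the sum of log-probabilities over all n! permutations. The independence assumption, the input format and the function name are ours, and this brute-force search is not one of the three algorithms referred to above.

```python
import math
from itertools import combinations, permutations
from typing import Dict, List, Tuple


def most_probable_ranking(objects: List[str],
                          pref: Dict[Tuple[str, str], float]) -> Tuple[List[str], float]:
    """Exhaustively find the ranking maximizing the product of P(a preferred to b).

    pref[(a, b)] is the probability that a is preferred to b; only one direction per
    pair needs to be given, the other is taken as 1 - pref[(a, b)].
    """
    def p(a: str, b: str) -> float:
        return pref[(a, b)] if (a, b) in pref else 1.0 - pref[(b, a)]

    best, best_logp = list(objects), -math.inf
    for order in permutations(objects):
        logp = 0.0
        # combinations() keeps the order of `order`, so a is always ranked above b.
        for a, b in combinations(order, 2):
            prob = p(a, b)
            if prob <= 0.0:
                logp = -math.inf
                break
            logp += math.log(prob)
        if logp > best_logp:
            best, best_logp = list(order), logp
    return best, math.exp(best_logp)
```

    The search visits all n! rankings, so it is only practical for a handful of objects; that cost is the reason heuristic algorithms are needed for realistic inputs.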


    References

    [1] J. Aalbersberg. Incremental relevance feedback. In ACM Conference on Research and Development in Information Retrieval (SIGIR), pages 11-22, 1992.

    [2] R. Agrawal, S. Gollapudi, A. Halverson, and S. Ieong. Diversifying search results. In ACM Conference on Web Search and Data Mining (WSDM), 2009.

    [3] A. Anagnostopoulos, L. Becchetti, C. Castillo, and A. Gionis. An optimization framework for query recommendation. In ACM Conference on Web Search and Data Mining (WSDM), 2010.

    [4] A. Abdulkader, J. A. Drakopoulos, and Q. Zhang. Comparative classifier aggregation. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, pages 156-159, Washington, DC, USA, 2006. IEEE Computer Society.

    [5] N. Ailon and M. Mohri. An efficient reduction of ranking to classification. Technical report, NYU, 2007.

    [6] F. Balcan, N. Bansal, A. Beygelzimer, D. Coppersmith, J. Langford, and G. B. Sorkin. Robust reductions from ranking to classification. Machine Learning, 72(1-2):139-153, 2008.

    [7] Agarwal, S. Chakrabarti, and S. Aggarwal. Learning to rank networked entities. In KDD '06.

    [8] A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: Authority-based keyword search in databases. In VLDB, pages 564-575, 2004.

    [9] S. Chakrabarti. Dynamic personalized PageRank in entity-relation graphs. In WWW '07: Proceedings of the 16th International Conference on World Wide Web, pages 571-580, New York, NY, USA, 2007. ACM.

    [10] J. Cho and U. Schonfeld. RankMass Crawler: A crawler with high PageRank coverage guarantee. In Proc. Int'l Conf. Very Large Data Bases (VLDB), 2007.

    [11] R. Fagin, R. Kumar, M. Mahdian, D. Sivakumar, and E. Vee. Comparing and aggregating rankings with ties. In PODS '04.

    [12] H. Hwang, A. Balmin, B. Reinwald, and E. Nijkamp. BinRank: Scaling dynamic authority-based search using materialized subgraphs. In ICDE '09, pages 66-77, 2009.

    [13] G. Jeh and J. Widom. Scaling personalized web search. In WWW '03, pages 271-279, New York, NY, USA, 2003. ACM.

    [14] D. Fogaras, B. Racz, K. Csalogany, and T. Sarlos. Towards scaling fully personalized PageRank: Algorithms, lower bounds, and experiments. Internet Mathematics, 2(3):333-358, 2005.


    [15] K. Avrachenkov, N. Litvak, D. Nemirovsky, and N. Osipova. Monte Carlo methods in PageRank computation: When one iteration is sufficient. SIAM Journal on Numerical Analysis, 45(2):890-904, 2007.

    [16] A. Balmin, V. Hristidis, and Y. Papakonstantinou. ObjectRank: Authority-based keyword search in databases. In Proc. Int'l Conf. Very Large Data Bases (VLDB), 2004.

    [17] Z. Nie, Y. Zhang, J.-R. Wen, and W.-Y. Ma. Object-level ranking: Bringing order to Web objects. In Proc. Int'l World Wide Web Conf. (WWW), pages 567-574, 2005.