
Relevance Propagation for Web Search

Dr. Tie-Yan Liu, Web Search and Mining Group

Microsoft Research Asia

Joint Work with Tao Qin, Tsinghua University.

2006-3-13, DCWC 2006

Outline

• Introduction
• Generic framework for relevance propagation
• Evaluations
  - Effectiveness analysis
  - Complexity analysis
• Conclusions


Introduction

• Web search ≠ information retrieval
  - Besides content relevance, various kinds of structure information also play an important role in Web search:
    • Hyperlink graph
    • Local sitemap
    • Webpage layout

[Figure: examples of Web structure, a local sitemap tree (pages A1, A21, A22, A31, A32, A33, ...) and a hyperlink graph (pages p1 to p5)]


Introduction

• Three ways of utilizing structure information for Web search
  - Linear combination of content relevance and importance scores computed from the hyperlink graph
    • β·Relevance + (1−β)·PageRank
  - Enhancing link analysis with the help of content relevance
    • Query-dependent link graph in HITS
    • Topic-sensitive PageRank
  - Propagating content relevance along the Web structure
    • The use of anchor text in search engines
    • Hyperlink-based relevance score propagation (TREC 2003)
    • Sitemap-based feature propagation (TREC 2004)
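As a minimal illustration of the first approach, the sketch below ranks pages by β·Relevance + (1−β)·PageRank. It assumes both scores are already on comparable scales (the slide does not say how they are normalized), and the function name `combine_scores` is hypothetical.

```python
from typing import Dict

def combine_scores(relevance: Dict[str, float], pagerank: Dict[str, float],
                   beta: float) -> Dict[str, float]:
    """Score each page as beta * relevance + (1 - beta) * PageRank.

    Assumes both dicts are keyed by page id and already normalized to
    comparable ranges; pages missing from `pagerank` get an importance of 0.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return {p: beta * r + (1.0 - beta) * pagerank.get(p, 0.0)
            for p, r in relevance.items()}

scores = combine_scores({"a": 0.9, "b": 0.4}, {"a": 0.1, "b": 0.7}, beta=0.5)
print(sorted(scores, key=scores.get, reverse=True))  # → ['b', 'a']
```

Note that the importance score is query-independent here; the other two approaches bring the query into the link structure itself.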


Hyperlink-based Relevance Score Propagation (Zhai et al., TREC 2003)

• Assumption
  - Hyperlinked pages have correlated content

[Figure: a page with its in-links and out-links]


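• Propagation model, where s(p) is the original relevance score of page p:

$$h^{k+1}(p) = \alpha\, s(p) + \beta \sum_{p_i \to p} h^{k}(p_i)\, w_I(p_i, p) + \gamma \sum_{p \to p_j} h^{k}(p_j)\, w_O(p, p_j), \qquad h^{0}(p) = s(p)$$

  The three terms are the original relevance score, the propagation from the in-links, and the propagation from the out-links.

  - Weighted in-link model, with w_I(p_i, p) = s(p_i):

$$h^{k+1}(p) = \alpha\, s(p) + (1-\alpha) \sum_{p_i \to p} h^{k}(p_i)\, w_I(p_i, p)$$

  - Weighted out-link model, with w_O(p, p_j) = s(p_j):

$$h^{k+1}(p) = \alpha\, s(p) + (1-\alpha) \sum_{p \to p_j} h^{k}(p_j)\, w_O(p, p_j)$$

  - Uniform out-link model:

$$h^{k+1}(p) = \alpha\, s(p) + (1-\alpha) \sum_{p \to p_j} h^{k}(p_j)$$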
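A minimal Python sketch of the three special cases, assuming the working set is given as a dict of original scores s(p) and a list of directed links. The function name `hs_propagate`, the fixed iteration count used as the stopping rule, and the choice to ignore links that leave the working set are assumptions for illustration, not details from the slides.

```python
from typing import Dict, List, Tuple

def hs_propagate(s: Dict[str, float], links: List[Tuple[str, str]],
                 alpha: float, mode: str = "WI", iters: int = 10) -> Dict[str, float]:
    """Hyperlink-based score propagation: HS-WI, HS-WO or HS-UO.

    Computes h^{k+1}(p) = alpha*s(p) + (1-alpha)*sum(...) starting from
    h^0 = s, for a fixed number of iterations (assumed stopping rule).
    """
    inlinks: Dict[str, List[str]] = {p: [] for p in s}
    outlinks: Dict[str, List[str]] = {p: [] for p in s}
    for src, dst in links:
        if src in s and dst in s:  # keep only links inside the working set
            outlinks[src].append(dst)
            inlinks[dst].append(src)
    h = dict(s)
    for _ in range(iters):
        new_h = {}
        for p in s:
            if mode == "WI":    # weight w_I(p_i, p) = s(p_i)
                prop = sum(h[pi] * s[pi] for pi in inlinks[p])
            elif mode == "WO":  # weight w_O(p, p_j) = s(p_j)
                prop = sum(h[pj] * s[pj] for pj in outlinks[p])
            elif mode == "UO":  # uniform weight of 1 on every out-link
                prop = sum(h[pj] for pj in outlinks[p])
            else:
                raise ValueError(f"unknown mode {mode!r}")
            new_h[p] = alpha * s[p] + (1 - alpha) * prop
        h = new_h
    return h

s = {"a": 0.5, "b": 0.2, "c": 0.0}
print(hs_propagate(s, [("a", "b"), ("b", "c")], alpha=0.5, mode="WI", iters=1))
```

With `alpha=1` the original scores come back unchanged. Because the uniform out-link case sums unnormalized contributions, scores can grow between iterations when (1−α) times a page's out-degree exceeds 1.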


Sitemap-based Feature Propagation (Liu and Qin, TREC 2004)

• Assumption
  - Child pages are extensions of their parent page.
  - One should consider the contribution of the child pages when computing the relevance of the parent page to a query.

• Propagation model

$$f'_t(p) = \alpha\, f_t(p) + \frac{1-\alpha}{|\mathrm{Child}(p)|} \sum_{q \in \mathrm{Child}(p)} f_t(q)$$

[Figure: a sitemap tree of pages A1, A21, A22, A31, A32, A33, ...]
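The propagation step above can be sketched for a single feature, for instance the frequency of one query term on each page. The function name is hypothetical, and the handling of pages without children (where |Child(p)| = 0 and the formula is undefined) is an assumption: such pages keep their own value.

```python
from typing import Dict, List

def sitemap_feature_propagate(f: Dict[str, float], children: Dict[str, List[str]],
                              alpha: float) -> Dict[str, float]:
    """One step of f'_t(p) = alpha*f_t(p) + (1-alpha)/|Child(p)| * sum f_t(q).

    f: value of feature t on each page (e.g. term frequency of one term).
    children: child pages of each page in the sitemap tree.
    Pages without children keep f_t(p) unchanged (assumption).
    """
    out: Dict[str, float] = {}
    for p, value in f.items():
        kids = [q for q in children.get(p, []) if q in f]
        if kids:
            child_mean = sum(f[q] for q in kids) / len(kids)
            out[p] = alpha * value + (1 - alpha) * child_mean
        else:
            out[p] = value
    return out

tf = {"A1": 0.0, "A21": 4.0, "A22": 2.0}
print(sitemap_feature_propagate(tf, {"A1": ["A21", "A22"]}, alpha=0.5))
```

Here the parent A1 never mentions the term itself, yet it receives a non-zero feature value from its children, which is exactly the behavior the assumption above motivates.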


Generic Relevance Propagation Framework

• Modification of the sitemap-based feature propagation model
• Recap of the hyperlink-based propagation model
• A generic framework covering both hyperlink-based and sitemap-based propagation

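Original sitemap-based feature propagation:

$$f'_t(p) = \alpha\, f_t(p) + \frac{1-\alpha}{|\mathrm{Child}(p)|} \sum_{q \in \mathrm{Child}(p)} f_t(q)$$

Modified, iterative form:

$$f_t^{k+1}(p) = \alpha\, f_t^{0}(p) + \frac{1-\alpha}{|\mathrm{Child}(p)|} \sum_{q \in \mathrm{Child}(p)} f_t^{k}(q)$$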

$$c^{k+1}(p) = g\big(c^{k}(p)\big), \qquad c^{0}(p) \in N$$

Hyperlink-based score propagation:

$$h^{k+1}(p) = \alpha\, s(p) + \beta \sum_{p_i \to p} h^{k}(p_i)\, w_I(p_i, p) + \gamma \sum_{p \to p_j} h^{k}(p_j)\, w_O(p, p_j), \qquad h^{0}(p) = s(p)$$


More Derived Propagation Models

$$c^{k+1}(p) = g\big(c^{k}(p)\big), \qquad c^{0}(p) \in N$$

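           | Score level                                     | Feature level
Hyperlink  | Hyperlink-based score propagation model         | Hyperlink-based feature propagation model (derived)
Sitemap    | Sitemap-based score propagation model (derived) | Sitemap-based feature propagation model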

Hyperlink-based Feature Propagation Model

• Weighted in-link model

$$f_t^{k+1}(p) = \alpha\, f_t^{0}(p) + (1-\alpha) \sum_{p_i \to p} f_t^{k}(p_i)\, w_I(p_i, p)$$

• Weighted out-link model

$$f_t^{k+1}(p) = \alpha\, f_t^{0}(p) + (1-\alpha) \sum_{p \to p_j} f_t^{k}(p_j)\, w_O(p, p_j)$$

• Uniform out-link model

$$f_t^{k+1}(p) = \alpha\, f_t^{0}(p) + (1-\alpha) \sum_{p \to p_j} f_t^{k}(p_j)$$

Sitemap-based Score Propagation Model

$$h^{k+1}(p) = \alpha\, s(p) + \frac{1-\alpha}{|\mathrm{Child}(p)|} \sum_{q \in \mathrm{Child}(p)} h^{k}(q)$$
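A sketch of the weighted in-link case at feature level, assuming the initial feature values are given per query term and the link weights w_I(p_i, p) are supplied by the caller (the slides do not define them for the feature-level models). The function name is hypothetical. Each query term is propagated independently, and each iteration scans every link once, which mirrors the O(tqwl) cost of the HF models.

```python
from typing import Dict, Tuple

def hf_wi_propagate(tf: Dict[str, Dict[str, float]],
                    w_in: Dict[Tuple[str, str], float],
                    alpha: float, iters: int = 10) -> Dict[str, Dict[str, float]]:
    """f_t^{k+1}(p) = alpha*f_t^0(p) + (1-alpha)*sum_{p_i->p} f_t^k(p_i)*w_I(p_i,p).

    tf[t][p]: initial feature f_t^0(p) of query term t on page p.
    w_in[(p_i, p)]: caller-supplied weight of the link p_i -> p.
    """
    result: Dict[str, Dict[str, float]] = {}
    for term, f0 in tf.items():  # one independent propagation per query term
        f = dict(f0)
        for _ in range(iters):
            nxt = {p: alpha * f0[p] for p in f0}
            for (pi, p), w in w_in.items():  # one pass over all links
                if pi in f and p in nxt:
                    nxt[p] += (1 - alpha) * f[pi] * w
            f = nxt
        result[term] = f
    return result

print(hf_wi_propagate({"web": {"a": 2.0, "b": 0.0}}, {("a", "b"): 1.0}, alpha=0.5, iters=2))
```

The propagated features would then replace the raw term frequencies in the base ranking function, whereas score-level models propagate the final relevance score directly.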


Summary: All Models Covered by the Generic Framework

Algorithm                                                                | Abbreviation
Weighted in-link case of the hyperlink-based score propagation model    | HS-WI
Weighted out-link case of the hyperlink-based score propagation model   | HS-WO
Uniform out-link case of the hyperlink-based score propagation model    | HS-UO
Weighted in-link case of the hyperlink-based feature propagation model  | HF-WI
Weighted out-link case of the hyperlink-based feature propagation model | HF-WO
Uniform out-link case of the hyperlink-based feature propagation model  | HF-UO
Sitemap-based score propagation model                                   | SS
Sitemap-based feature propagation model                                 | SF


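Benchmark Datasets

• Corpora
  - .GOV
    • 1M pages
    • Queries: TREC topic distillation (TD) 2003 and 2004
  - MSN
    • 2M pages
    • Queries: the 100 most popular queries from the MSN query log
• Base ranking function
  - BM2500, where tf and qtf are the frequencies of term T in the document and in the query:

$$\sum_{T \in Q} \frac{(k_1+1)\,tf\,(k_3+1)\,qtf}{(K+tf)(k_3+qtf)}$$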
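A sketch of the base ranking function above, summed over the query terms. The slide gives neither K nor the parameter values, so the sketch assumes the usual Okapi form K = k1((1 − b) + b·dl/avdl) and commonly used defaults for k1, b and k3, which may differ from the BM2500 settings in the study. Standard Okapi BM25 also multiplies each term by an IDF-style weight w^(1); the sketch accepts such weights optionally and otherwise uses 1, matching the formula as shown. The function name is hypothetical.

```python
from typing import Dict, Optional

def bm25_score(query_tf: Dict[str, float], doc_tf: Dict[str, float],
               doc_len: float, avg_doc_len: float,
               idf: Optional[Dict[str, float]] = None,
               k1: float = 1.2, b: float = 0.75, k3: float = 7.0) -> float:
    """Sum over query terms T of (k1+1)*tf*(k3+1)*qtf / ((K+tf)*(k3+qtf)).

    Assumed, not on the slide: K = k1*((1-b) + b*doc_len/avg_doc_len),
    the default k1, b, k3 values, and the optional per-term weights `idf`
    (missing terms get weight 1).
    """
    K = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    weights = idf or {}
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0.0)  # under SF this would be the propagated feature
        w = weights.get(term, 1.0)
        score += w * ((k1 + 1) * tf * (k3 + 1) * qtf) / ((K + tf) * (k3 + qtf))
    return score

print(bm25_score({"web": 1}, {"web": 2}, doc_len=100, avg_doc_len=100))
```

Feeding propagated term frequencies into this function is what distinguishes the feature-level models from the score-level ones, which instead propagate its output.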


Experimental Results (1)

[Figure: TREC 2003. P@10 and MAP versus α (0 to 1) for SF, SS, HS-WI, HS-WO, HS-UO, HF-WI, HF-WO and HF-UO]


Experimental Results (2)

[Figure: TREC 2004. P@10 and MAP versus α (0 to 1) for SF, SS, HS-WI, HS-WO, HS-UO, HF-WI, HF-WO and HF-UO]


Experimental Results (3)

[Figure: MSN. P@10 and MAP versus α (0 to 1) for SF, SS, HS-WI, HS-WO, HS-UO, HF-WI, HF-WO and HF-UO]


Conclusions on Effectiveness

• In general, relevance propagation can boost search performance with proper parameter settings.
• The sitemap-based models are more effective than the hyperlink-based models.
  - Hyperlinks ≠ content correlation, whereas pages in the same sub-site usually discuss correlated topics.
• Detailed comparisons
  - The two sitemap-based models perform similarly.
  - Among the hyperlink-based models, the HF-WI model performs best.


Online Complexity

• w is the size of the working set, q is the number of query terms, l is the average number of in-links/out-links, and t is the number of iterations.
• For the SS model, the complexity is O(w).
  - The SS model needs to propagate the relevance score of a page to its parent only once if the propagation proceeds bottom-up from the leaf nodes.
• For the SF model, the complexity is O(qw).
• For the HS models, the complexity is O(twl).
  - In each of the t iterations of the HS models, the relevance score of every page must be propagated along its in-links or out-links in the subgraph of the working set.
• For the HF models, the complexity is O(tqwl).
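The single-pass argument for the SS model can be sketched as a post-order traversal of the sitemap tree: each page is finalized only after all of its children, so every page is processed once. This assumes `children` describes a tree (no cycles) and, as an assumption not stated on the slides, lets leaf pages keep h(p) = s(p). The function name is hypothetical.

```python
from typing import Dict, List, Tuple

def ss_bottom_up(s: Dict[str, float], children: Dict[str, List[str]],
                 alpha: float) -> Dict[str, float]:
    """One bottom-up pass of h(p) = alpha*s(p) + (1-alpha)*mean(h(q), q in Child(p)).

    Uses an explicit stack (iterative post-order) so deep sitemaps do not
    hit Python's recursion limit. Leaves keep h(p) = s(p) (assumption).
    """
    h: Dict[str, float] = {}
    for root in s:
        if root in h:
            continue
        stack: List[Tuple[str, bool]] = [(root, False)]
        while stack:
            p, expanded = stack.pop()
            if p in h:
                continue
            kids = [q for q in children.get(p, []) if q in s]
            if not expanded:
                stack.append((p, True))  # revisit p after its children
                stack.extend((q, False) for q in kids if q not in h)
            elif kids:
                mean = sum(h[q] for q in kids) / len(kids)
                h[p] = alpha * s[p] + (1 - alpha) * mean
            else:
                h[p] = s[p]
    return h

scores = {"A1": 0.2, "A21": 0.6, "A22": 0.0, "A31": 1.0}
print(ss_bottom_up(scores, {"A1": ["A21", "A22"], "A21": ["A31"]}, alpha=0.5))
```

On a tree, this pass yields the same values that repeated application of the SS update converges to, which is why a single pass (average t = 1 in the table that follows) suffices.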


Online Complexity

Algorithm | Complexity | Average w | Average l | Average t | Average q | CPU time
HS-WI     | O(twl)     | 6796.5    | 11.0      | 7.4       | -         | 47.9
HS-WO     | O(twl)     | 6796.5    | 11.0      | 6.5       | -         | 36.5
HS-UO     | O(twl)     | 6796.5    | 11.0      | 6.6       | -         | 39.8
HF-WI     | O(tqwl)    | 6796.5    | 11.0      | 9.1       | 1.5       | 54.0
HF-WO     | O(tqwl)    | 6796.5    | 11.0      | 11.1      | 1.5       | 63.3
HF-UO     | O(tqwl)    | 6796.5    | 11.0      | 8.9       | 1.5       | 51.6
SS        | O(w)       | 10000.0   | -         | 1         | -         | 1.9
SF        | O(qw)      | 10000.0   | -         | 1         | 3         | 8.3

• The sitemap-based models are more efficient than the hyperlink-based models.
• The score-level propagation models are faster than the feature-level models.


Offline Complexity

• Score-level propagation is very difficult to implement offline.
  - The score can only be computed online, with respect to the query.
• For feature-level propagation:
  - The time cost of an offline implementation of the SF model is acceptable.
    • 62.2 hours, or 2.6 days, to re-index 8 billion pages
  - The time cost of the HF model is prohibitive.
    • 1083 hours, or 45 days, to re-index 8 billion pages
  - The SF model is easy to parallelize, while parallelizing the HF model is non-trivial.


Conclusions of this Study

• Generally speaking, relevance propagation can boost the performance of Web information retrieval.

• Sitemap-based propagation models outperform hyperlink-based propagation models in both effectiveness and efficiency. Notably, sitemap-based propagation can be implemented in parallel.

• Score-level and feature-level propagation have similar effectiveness. Although the former is more efficient online, it is not practical for real-world search engines because it cannot be implemented offline.

• Overall, the sitemap-based feature propagation model is the best choice for real search engines.

Thanks!

http://research.microsoft.com/users/tyliu/