Retrieval and Feedback Models for Blog Feed Search
![Page 1: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/1.jpg)
SIGIR 2008, Singapore
Jonathan Elsas, Jaime Arguello,
Jamie Callan & Jaime Carbonell
LTI/SCS/CMU
Retrieval and Feedback Models for Blog Feed Search
![Page 2: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/2.jpg)
Outline
• The task
– Overview of Blogs & Blog Search
– Challenges in Blog Search
• Our approach
– Retrieval Models
– Query Expansion Models
• Conclusion
![Page 3: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/3.jpg)
Background
![Page 4: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/4.jpg)
What is a Blog?
![Page 5: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/5.jpg)
What is a Feed?
<xml>
<feed>
<entry>
<author>Peter …</>
<title>Good, Evil…</>
<content>I’ve said…</>
</entry>
<entry>
<author>Peter …</>
<title>Agreeing…</>
<content>Some peo…</>
</entry>
…
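As a rough illustration of the feed structure above, the sketch below reads entries out of a small, well-formed feed. The sample XML, its text values, and the helper name `parse_entries` are hypothetical stand-ins (the slide elides the real content), and real feeds (RSS, Atom) use richer schemas and namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the elided feed on the slide.
SAMPLE_FEED = """<feed>
  <entry>
    <author>A. Author</author>
    <title>First post</title>
    <content>Hello world</content>
  </entry>
  <entry>
    <author>A. Author</author>
    <title>Second post</title>
    <content>More text</content>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return one dict per <entry>, mapping child tag name to its text."""
    root = ET.fromstring(feed_xml)
    return [
        {child.tag: (child.text or "") for child in entry}
        for entry in root.findall("entry")
    ]


entries = parse_entries(SAMPLE_FEED)
for e in entries:
    print(e["title"])
```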
![Page 6: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/6.jpg)
Blog-Feed Correspondence
Blog ↔ Feed
Post ↔ Entry
HTML ↔ XML
![Page 7: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/7.jpg)
Why are Blogs important?
Technorati currently tracking:
> 112.8 Million Blogs
> 175,000 new Blogs per day
> 1.6 Million posts per day
[http://www.technorati.com/about/]
![Page 8: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/8.jpg)
The Task
![Page 9: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/9.jpg)
Feed Search at TREC
Ranking Blogs/Feeds (collections of posts) in response to a user’s query, [X]
“A relevant feed should have a principle and recurring interest in X”
— TREC 2007 Blog Track
(a.k.a. Blog Distillation)
![Page 10: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/10.jpg)
Feed Search at TREC
[Gardening] [Apple iPod]
[Violence in Sudan] [Gun Control]
[Food] [Wine]
Represent Ongoing Information Needs
Frequently Very General
![Page 11: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/11.jpg)
Challenges in Feed Search
![Page 12: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/12.jpg)
Challenges in Feed Search
1. A feed is a collection of documents
[Figure: a feed shown as a sequence of entries over time]
![Page 13: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/13.jpg)
Challenges in Feed Search
1. A feed is a collection of documents
– How does relevance at the entry level correspond to relevance at the feed level?
[Figure: a feed shown as a sequence of entries over time]
![Page 14: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/14.jpg)
Challenges in Feed Search
2. Even a topical feed is topically diverse
time
NASA
China’s plans for the moon
shuttle launch
My dog
Mars rover
Boeing
Space Exploration
topic
![Page 15: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/15.jpg)
Challenges in Feed Search
2. Even a topical feed is topically diverse
– Can we favor entries close to the central topic of the feed?
Space Exploration
time
topic
![Page 16: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/16.jpg)
Challenges in Feed Search
3. Feeds are noisy
– Spam blogs, spam & off-topic comments
time
![Page 17: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/17.jpg)
Challenges in Feed Search
4. General & Ongoing Information Needs
[Mac]: … post regularly about new products, features, or application software of Apple Mac computers.
[Music]: … describing songs, biographies of musicians, musical styles and their influences of music on people are discussed.
[Wine]: … such as tastings, reviews, food matching or pairing, and oenophile news and events.
[Food]: … describing experiences eating cuisines, culinary delights, recipes, nutrition plans.
![Page 18: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/18.jpg)
Our Approach
![Page 19: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/19.jpg)
Challenges:
Feeds: Topically Diverse, Noisy Collections
Information Needs: General & Ongoing
Our Approach:
Retrieval Models
Feedback Models
![Page 20: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/20.jpg)
Retrieval Models
• Challenge: ranking topically diverse collections
• Representation: feed vs. entry
• Model topical relationship between entries
![Page 21: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/21.jpg)
Large Document (Feed) Model
[Figure: Feed Document Collection, where each document is one feed's XML containing all of its entries; query [Q] → Ranked Feeds]
Rank by Indri's standard retrieval model [Metzler and Croft, 2004; 2005]
![Page 22: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/22.jpg)
Large Document (Feed) Model
Advantages:
• A straightforward application of existing retrieval techniques
Potential Pitfalls:
• Large entries dominate a feed’s language model
• Ignores relationship among entries
[Figure: a feed drawn as a sequence of entries of varying size]
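The large document idea can be sketched as follows: concatenate every entry of a feed into one document, then rank feeds by query likelihood. This is a minimal sketch, not Indri's implementation; Dirichlet-smoothed unigram query likelihood is assumed here as a stand-in for "Indri's standard retrieval model", and the function names, the `mu` default, and the token-list input format are all illustrative assumptions.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query_terms, doc_terms, coll_tf, coll_len, mu=2500.0):
    """log P(Q|D) under a Dirichlet-smoothed unigram model (assumed scoring)."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len if coll_len else 0.0
        p = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        if p == 0.0:
            return float("-inf")  # term unseen in the whole collection
        score += math.log(p)
    return score


def rank_feeds_large_document(query_terms, feeds, mu=2500.0):
    """feeds: dict feed_id -> list of entries, each entry a list of tokens."""
    # One "large document" per feed: the concatenation of all its entries.
    feed_docs = {fid: [t for entry in entries for t in entry]
                 for fid, entries in feeds.items()}
    coll_tf = Counter(t for doc in feed_docs.values() for t in doc)
    coll_len = sum(coll_tf.values())
    scores = {fid: dirichlet_log_likelihood(query_terms, doc, coll_tf, coll_len, mu)
              for fid, doc in feed_docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


feeds = {
    "space": [["nasa", "shuttle", "launch"], ["mars", "rover", "nasa"]],
    "pets": [["my", "dog"], ["dog", "food"]],
}
print(rank_feeds_large_document(["nasa"], feeds, mu=10.0))
```

Because entries are simply concatenated, a single very long entry contributes most of the term counts, which is the first pitfall listed on this slide.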
![Page 23: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/23.jpg)
Small Document (Entry) Model
[Figure: Entry Document Collection (document = entry); query [Q] → Ranked Entries → apply some rank aggregation function → Ranked Feeds]
![Page 24: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/24.jpg)
Small Document (Entry) Model
• Query Likelihood
• Entry Centrality
• Feed Prior: favors longer feeds
ReDDE Federated Search Algorithm [Si & Callan, 2003]
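One way the three components could combine is sketched below. The slide's formula did not survive extraction, so the combination used here (feed prior times the centrality-weighted sum of entry query likelihoods) is an assumption, as are the function name, the input format, and the `+ 1` inside the log prior.

```python
import math


def rank_feeds_small_document(entry_likelihoods, centrality=None, log_prior=False):
    """Aggregate entry-level scores into feed-level scores.

    entry_likelihoods: dict feed_id -> list of P(Q|E), one per entry.
    centrality: optional dict feed_id -> list of P(E|F) in the same order;
                uniform weights (1/N) are used when omitted.
    log_prior: if True, weight each feed by log(N + 1), favoring longer feeds.
               The +1 avoids a zero prior for single-entry feeds and is an
               assumption of this sketch.
    """
    scores = {}
    for fid, likelihoods in entry_likelihoods.items():
        n = len(likelihoods)
        if n == 0:
            continue  # a feed with no entries cannot be scored
        weights = centrality[fid] if centrality else [1.0 / n] * n
        prior = math.log(n + 1) if log_prior else 1.0
        scores[fid] = prior * sum(l * w for l, w in zip(likelihoods, weights))
    return sorted(scores, key=scores.get, reverse=True)


likelihoods = {"a": [0.2], "b": [0.15, 0.15, 0.15]}
print(rank_feeds_small_document(likelihoods))                  # uniform prior
print(rank_feeds_small_document(likelihoods, log_prior=True))  # log prior
```

In this toy input the log prior flips the order, since feed "b" has three entries to feed "a"'s one.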
![Page 25: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/25.jpg)
Entry Centrality
Uniform :
Geometric Mean :
time
topic
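The two centrality options can be sketched as below. The slide's formulas were lost, so this is an assumed reading: uniform centrality gives every entry weight 1/N, and geometric-mean centrality scores an entry by the geometric mean of its terms' probabilities under a feed language model, normalized across the feed's entries. Building the feed model as the average of the entry models is also an assumption, chosen so that long entries do not dominate.

```python
import math
from collections import Counter


def uniform_centrality(entries):
    """P(E|F) = 1/N for each of the N entries in the feed."""
    n = len(entries)
    return [1.0 / n] * n


def geometric_mean_centrality(entries):
    """Centrality from the per-token geometric mean of P(t|F).

    entries: list of non-empty token lists belonging to one feed.
    """
    # Maximum-likelihood unigram model for each entry.
    models = [{t: c / len(e) for t, c in Counter(e).items()} for e in entries]
    # Feed model: average of entry models (each entry counts equally).
    n = len(entries)
    feed_model = Counter()
    for model in models:
        for term, p in model.items():
            feed_model[term] += p / n
    # phi(E, F) = exp(sum_t P(t|E) * log P(t|F)): a length-normalized likelihood.
    phi = [math.exp(sum(p * math.log(feed_model[t]) for t, p in model.items()))
           for model in models]
    total = sum(phi)
    return [x / total for x in phi]


entries = [["nasa", "shuttle"], ["nasa", "mars"], ["my", "dog"]]
print(uniform_centrality(entries))
print(geometric_mean_centrality(entries))
```

In this toy feed the off-topic "my dog" entry shares no terms with the others and receives the lowest centrality.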
![Page 26: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/26.jpg)
Small Document (Entry) Model
Advantages:
• Controls for differing entry length
• Models topical relationship among entries
Disadvantages:
• Centrality computation is slow(er)
Q
Not only improves speed, but also performance
![Page 27: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/27.jpg)
Retrieval Model Results
![Page 28: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/28.jpg)
Retrieval Model Results
• 45 Queries from the TREC 2007 Blog Distillation Task
• BLOG06 test collection, XML feeds only
• 5-Fold Cross Validation for all retrieval model smoothing parameters
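The evaluation setup can be sketched as follows: Mean Average Precision as the metric, and k-fold cross validation over queries to pick a smoothing parameter on training folds and score it on the held-out fold. The function names and the `evaluate(param, queries)` callback are hypothetical; the talk does not specify how folds were assigned.

```python
def average_precision(ranked, relevant):
    """Average precision of one ranked list against a set of relevant ids."""
    hits = 0
    total = 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i  # precision at each relevant rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """run: query -> ranked ids; qrels: query -> set of relevant ids."""
    return sum(average_precision(run[q], qrels[q]) for q in qrels) / len(qrels)


def cross_validate(query_ids, candidates, evaluate, k=5):
    """Mean held-out score when each fold's parameter is tuned on the other folds.

    evaluate(param, queries) -> float is a caller-supplied (hypothetical)
    function that runs retrieval with `param` and returns MAP on `queries`.
    """
    folds = [query_ids[i::k] for i in range(k)]  # simple round-robin split
    held_out = []
    for i, test in enumerate(folds):
        train = [q for j, fold in enumerate(folds) if j != i for q in fold]
        best = max(candidates, key=lambda p: evaluate(p, train))
        held_out.append(evaluate(best, test))
    return sum(held_out) / k


print(average_precision(["a", "b", "c"], {"a", "c"}))
```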
![Page 29: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/29.jpg)
Retrieval Model Results
[Bar chart: Mean Average Precision (axis 0.245 to 0.325) for the Large Document (Feed) Model and the Small Document (Entry) Models; bar values shown: 0.29, 0.277, 0.29, 0.298, 0.315]
![Page 30: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/30.jpg)
Retrieval Model Results
[Bar chart repeated: Mean Average Precision (axis 0.245 to 0.325); bar values shown: 0.29, 0.277, 0.29, 0.298, 0.315; prior labels: Uniform, Log(Feed Length), Uniform, Log; annotation: MAP 0.188]
![Page 31: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/31.jpg)
Retrieval Model Results
[Bar chart repeated: Mean Average Precision (axis 0.245 to 0.325); bar values shown: 0.29, 0.277, 0.29, 0.298, 0.315; labels: Uniform, Log(Feed Length), Uniform; one entry marked n/a]
![Page 32: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/32.jpg)
Feedback Models
• Challenge: Noisy collection with general & ongoing information needs
• Use a cleaner external collection for query expansion (Wikipedia)
• With an expansion technique designed to identify multiple query facets
![Page 33: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/33.jpg)
Query Expansion (PRF)
[Q] → BLOG06 Collection → Related Terms from top K documents → [Q + Terms]
[Lavrenko & Croft, 2001]
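The pseudo-relevance feedback loop above can be sketched as below. This is a simplified term-weighting scheme in the spirit of relevance models, not the exact estimator of Lavrenko & Croft: each top-K document votes for its terms in proportion to its retrieval score and the term's relative frequency. The function name, parameters, and input format are assumptions.

```python
from collections import Counter


def prf_expansion_terms(ranked_docs, query_terms, k=10, n_terms=5,
                        stopwords=frozenset()):
    """Pick expansion terms from the top-k retrieved documents.

    ranked_docs: list of (score, tokens) pairs, best first, scores > 0.
    Returns up to n_terms terms, excluding query terms and stopwords.
    """
    top = ranked_docs[:k]
    norm = sum(score for score, _ in top)
    if norm <= 0:
        return []
    weights = Counter()
    for score, tokens in top:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            # Document weight times the term's relative frequency in the document.
            weights[term] += (score / norm) * (count / len(tokens))
    for term in list(query_terms) + list(stopwords):
        weights.pop(term, None)
    return [term for term, _ in weights.most_common(n_terms)]


docs = [
    (0.6, ["photography", "camera", "lens"]),
    (0.4, ["photography", "camera", "film"]),
]
print(prf_expansion_terms(docs, ["photography"], n_terms=2))
```

The quality of the expansion terms depends entirely on the top-K documents, which is why a noisy collection can yield off-topic terms.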
![Page 34: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/34.jpg)
Query Expansion Example
[Photography]
Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
![Page 35: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/35.jpg)
Feedback Model Results
[Bar chart: Mean Average Precision (axis 0.2 to 0.36) for BLOG LD and BLOG SD; series: None, PRF]
![Page 36: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/36.jpg)
Query Expansion (Wikipedia PRF)
[Q] → Wikipedia → Related Terms from top K documents → [Q + Terms] → BLOG06 Collection
[Lavrenko & Croft, 2001]
[Diaz & Metzler, 2006]
![Page 37: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/37.jpg)
Query Expansion Example
[Photography]
Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
Wikipedia PRF: photography, director, special, film, art, camera, music, cinematographer, photographic
![Page 38: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/38.jpg)
Feedback Model Results
[Bar chart: Mean Average Precision (axis 0.2 to 0.36) for BLOG LD and BLOG SD; series: None, PRF, Wiki. PRF]
![Page 39: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/39.jpg)
Query Expansion (Wikipedia Link)
[Q] → Wikipedia → Related Terms from link structure → [Q + Terms] → BLOG06 Collection
![Page 40: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/40.jpg)
Wikipedia Link-Based Query Expansion
![Page 41: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/41.jpg)
Wikipedia Link-Based Expansion
Wikipedia
…
Q
![Page 42: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/42.jpg)
Wikipedia Link-Based Expansion
…
Wikipedia
Relevance Set, Top R = 100
Working Set, Top W = 1000
Q
![Page 43: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/43.jpg)
Wikipedia Link-Based Expansion
…
Wikipedia
Q
Relevance Set, Top R = 100
Working Set, Top W = 1000
![Page 44: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/44.jpg)
Wikipedia Link-Based Expansion
Relevance Set, Top R = 100
Working Set, Top W = 1000
…
Wikipedia
Extract anchor text from the Working Set that links to the Relevance Set.
Q
![Page 45: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/45.jpg)
Wikipedia Link-Based Expansion
Relevance Set, Top R = 500
Working Set, Top W = 1000
…
Wikipedia
Extract anchor text from the Working Set that links to the Relevance Set.
Q
Combines relevance and popularity
Relevance: An anchor phrase that links to a highly ranked article gets a high score
Popularity: An anchor phrase that links many times to a mid-ranked article also gets a high score
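The link-based procedure can be sketched as follows. The scoring rule is an assumption consistent with the description above: every link from a Working Set article into the Relevance Set adds a credit that decreases linearly with the target's rank, so a phrase scores highly either by pointing at a top-ranked article or by pointing many times at mid-ranked ones. The function name, the `R - rank + 1` credit, the lowercasing of anchors, and the input format are all illustrative.

```python
from collections import defaultdict


def link_based_expansion(ranked_articles, links, R=100, W=1000, n_phrases=20):
    """Score anchor phrases found in the Working Set.

    ranked_articles: Wikipedia article ids, best first, for the query.
    links: dict source_article -> list of (anchor_text, target_article).
    Relevance Set = top R articles; Working Set = top W articles.
    """
    # Rank (1 = best) of every article in the Relevance Set.
    rank = {a: i for i, a in enumerate(ranked_articles[:R], start=1)}
    scores = defaultdict(float)
    for source in ranked_articles[:W]:
        for anchor, target in links.get(source, []):
            r = rank.get(target)
            if r is not None:
                # Higher-ranked targets give more credit; repeated links add up.
                scores[anchor.lower()] += R - r + 1
    ordered = sorted(scores, key=lambda a: (-scores[a], a))
    return ordered[:n_phrases]


ranked = ["Photography", "Camera", "Film", "Dog"]
links = {
    "Photography": [("camera", "Camera"), ("photographic film", "Film")],
    "Camera": [("photography", "Photography"), ("photographic film", "Film")],
    "Dog": [("photography", "Photography"), ("camera", "Camera")],
}
print(link_based_expansion(ranked, links, R=3, W=4, n_phrases=2))
```

Because the expansion units are anchor phrases rather than single terms, multi-word concepts such as "depth of field" can be recovered intact.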
![Page 46: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/46.jpg)
Query Expansion Example
[Photography]
Ideal: digital photography, depth of field, photographic film, photojournalism, cinematography
PRF: photography, nude, erotic, art, girl, free, teen, fashion, women
Wikipedia Link-Based: photography, photographer, digital photography, photographic, depth of field, feature photography, film, photographic film, photojournalism
![Page 47: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/47.jpg)
Feedback Model Results
[Bar chart: Mean Average Precision (axis 0.2 to 0.4) for BLOG LD and BLOG SD; series: None, PRF, Wiki. PRF, Wiki. Link]
![Page 48: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/48.jpg)
Conclusion
• Feed Search Challenges:
– Feeds are topically diverse, noisy collections
– Ranked against ongoing & general information needs
• Novel Retrieval Models:
– Ranking collections, sensitive to topical relationship among entries
• Novel Feedback Models:
– Discover multiple query facets & robust to collection noise
![Page 49: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/49.jpg)
Thank You!
Student Travel Grant funding from: ACM SIGIR, Amit Singhal, Microsoft Research
![Page 50: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/50.jpg)
Entry Centrality GM Derivation
[Equations not preserved; surviving labels: "where", "Entry Generation Likelihood", "|E|"]
![Page 51: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/51.jpg)
Query Expansion Examples
[Music]
Wikipedia Expansion: Music, Folk music, Electronic music, Folk, Music video, World music, Ambient, Electronic, Country music
PRF: Music, Country, Download, Free, MP3, Mp3andmore, Lyric, Listen, Song
![Page 52: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/52.jpg)
Query Expansion Examples
[Scottish Independence]
Wikipedia Expansion: scotland, scottish parliament, scottish, scottish national party, wars of scottish independence, scottish independence, william wallace, glasgow, scottish socialist party
PRF: scotland, independence, party, convention, politics, snp, national, people, scot
![Page 53: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/53.jpg)
Query Expansion Examples
[Machine Learning]
Wikipedia Expansion: machine learning, learning, artificial intelligence, turing machine, machine gun, neural network, support vector machine, supervised learning, artificial neural network
PRF: learn, machine, credit, card, karaoke, journal, sex, model, sew
![Page 54: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/54.jpg)
Query Generality Characteristics
• Query Length:
– BLOG: 1.9 words
– TB04: 3.2 words
– TB05: 3.0 words
• ODP Depth:
– BLOG: 4.7 levels
– TB04: 5.2 levels
– TB05: 5.3 levels
![Page 55: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/55.jpg)
Relevance Set Cohesiveness
…
Wikipedia
Relevance Set, Top R = 100
$\text{Cohesiveness} = \dfrac{|L_{in}|}{|L_{in} \cup L_{out}|}$
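A minimal sketch of the cohesiveness ratio, under an assumed reading of the symbols: L_in is the set of links from Relevance Set articles to other Relevance Set articles, and L_out is the set of links from Relevance Set articles to articles outside it. The slide does not define these sets, so the definitions, the function name, and the input format are assumptions.

```python
def relevance_set_cohesiveness(relevance_set, links):
    """|L_in| / |L_in ∪ L_out| for the given Relevance Set.

    relevance_set: iterable of article ids (the top R articles).
    links: dict source_article -> iterable of target article ids.
    """
    rel = set(relevance_set)
    l_in, l_out = set(), set()
    for source in rel:
        for target in links.get(source, ()):
            # A link stays inside the set or leaves it; never both.
            (l_in if target in rel else l_out).add((source, target))
    total = len(l_in | l_out)
    return len(l_in) / total if total else 0.0


links = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print(relevance_set_cohesiveness(["A", "B"], links))
```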
![Page 56: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/56.jpg)
Relevant Set Cohesiveness
![Page 57: Retrieval and Feedback Models for Blog Feed Search](https://reader035.fdocuments.us/reader035/viewer/2022062704/55614ce9d8b42aa20d8b4b4e/html5/thumbnails/57.jpg)
Is it the Queries?
Feed Search Queries ≠ TB Adhoc Queries
But none of these measures predicts whether Wikipedia expansion helps…