Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR.
Personalizing Web Search
Jaime Teevan, MIT, with Susan T. Dumais and Eric Horvitz, MSR
Demo
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Study of Personal Relevancy
15 SIS users × ~10 queries each; each evaluated 50 results
Ratings: Highly relevant / Relevant / Irrelevant
Query selection: a previously issued query, or chosen from 10 pre-selected queries
Collected evaluations for 137 queries; 53 from the pre-selected set (2-9 per query)
Relevant Results Have Low Rank
[Chart: counts of Highly Relevant, Relevant, and Irrelevant judgments by result rank, 1-50]
Same Query, Different Intent
Different meanings: “Information about the astronomical/astrological sign of cancer” vs. “information about cancer treatments”
Different intents: “is there any new tests for cancer?” vs. “information about cancer treatments”
Same Intent, Different Evaluation
Query: Microsoft. Stated intents: “information about microsoft, the company”; “Things related to the Microsoft corporation”; “Information on Microsoft Corp”
31/50 results were rated not-irrelevant by someone, but more than one evaluator agreed on only 6 of those 31; all three agreed only on www.microsoft.com
More to Understand
Do people cluster, even if they can’t state their intention?
How are the differences reflected? Can they be seen from the information on a person’s computer?
Can we do better than the ranking that would make everyone the most happy?
Best common ranking: +38%
Best personalized ranking: +55%
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Personalization Algorithms
Standard IR: related to relevance feedback, query expansion
v. result re-ranking
[Diagram: user, query, and documents flowing between client and server under each approach]
Result Re-Ranking
Takes full advantage of SIS
Ensures privacy
Good evaluation framework
Look at lightweight user models: collected on the server side, sent as query expansion
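The re-ranking step can be sketched as follows. This is a minimal illustration, not the actual system: the tokenizer, the result format, and the example weights are assumptions; only the scoring rule Score = Σ tf_i · w_i comes from the slides.

```python
# Minimal sketch of client-side result re-ranking.
# Each result is scored as Score = sum(tf_i * w_i) over the terms it
# shares with a user model of term weights, then the list is re-sorted.

def rerank(results, term_weights):
    """results: list of (url, text); term_weights: {term: w_i} user model."""
    def score(text):
        tf = {}
        for t in text.lower().split():   # naive whitespace tokenizer
            tf[t] = tf.get(t, 0) + 1
        return sum(count * term_weights.get(t, 0.0) for t, count in tf.items())
    return sorted(results, key=lambda r: score(r[1]), reverse=True)

# Illustrative weights, as if derived from the user's index:
results = [("a.com", "cancer horoscope sign"),
           ("b.com", "cancer treatment options")]
weights = {"treatment": 2.0, "options": 0.5}
print(rerank(results, weights)[0][0])  # b.com ranks first
```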
BM25
w_i = log(N / n_i)
Score = Σ tf_i · w_i
(N = number of documents in the corpus; n_i = number of documents containing term i)

BM25 with Relevance Feedback
w_i = log[ (r_i + 0.5)(N − n_i − R + r_i + 0.5) / ((n_i − r_i + 0.5)(R − r_i + 0.5)) ]
Score = Σ tf_i · w_i
(R = number of known relevant documents; r_i = relevant documents containing term i)
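The relevance-feedback weight on the slide can be written out directly. A sketch: the function name and the smoke-test numbers are illustrative.

```python
import math

# BM25 relevance-feedback term weight from the slide:
# w_i = log[ (r_i+0.5)(N-n_i-R+r_i+0.5) / ((n_i-r_i+0.5)(R-r_i+0.5)) ]

def bm25_rf_weight(N, n_i, R, r_i):
    """N: corpus size; n_i: docs containing term i;
    R: known relevant docs; r_i: relevant docs containing term i."""
    num = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    den = (n_i - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(num / den)

# With no feedback (R = r_i = 0) this reduces to roughly log(N / n_i):
print(round(bm25_rf_weight(1000, 10, 0, 0), 2))  # ≈ 4.55, vs. log(100) ≈ 4.61
```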
User Model as Relevance Feedback
The user’s index supplies R and r_i, and the corpus counts are augmented with the user’s documents:
N′ = N + R
n_i′ = n_i + r_i
w_i = log[ (r_i + 0.5)(N′ − n_i′ − R + r_i + 0.5) / ((n_i′ − r_i + 0.5)(R − r_i + 0.5)) ]
Score = Σ tf_i · w_i
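A sketch of the substituted weight, with the user’s index supplying R and r_i and the corpus counts augmented by N′ = N + R and n_i′ = n_i + r_i. The function name and the example numbers are illustrative.

```python
import math

# User model as relevance feedback: the user's R documents (e.g. from
# the SIS index) are added to the corpus counts before weighting.

def user_model_weight(N, n_i, R, r_i):
    N_p = N + R        # N'  = corpus plus the user's documents
    n_p = n_i + r_i    # ni' = term count including the user's documents
    num = (r_i + 0.5) * (N_p - n_p - R + r_i + 0.5)
    den = (n_p - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(num / den)

# A term common in the user's index but rare on the web gets a high weight:
print(user_model_weight(N=10_000, n_i=5, R=100, r_i=40) > 0)  # True
```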
User Model as Relevance Feedback
[Venn diagram: the World contains N documents, n_i of which contain term i; the User’s index contains R documents, r_i of which contain term i. Query-focused matching takes the counts from the subsets related to the query (“User related to query”, “World related to query”); world-focused matching takes them from the full sets.]
Score = Σ tf_i · w_i
User Representation
Stuff I’ve Seen (SIS) index
Recently indexed documents
Web documents in SIS index
Query history
Relevance judgments
None
World Representation
Document representation: full text, or title and snippet
Corpus representation: the Web; result set – title and snippet; result set – full text
Query Expansion
All words in the document, vs. query focused
Example snippet: “The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ...”
Parameters (full design)
Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / Relevance feedback / None
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set – full text / Result set – title and snippet
Query expansion: All words / Query focused
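The parameter space above multiplies out as follows. A sketch only: it treats every choice as independent, which likely overstates the number of combinations actually run; the values are taken from the slides.

```python
from itertools import product

# Enumerate the cross-product of the parameter choices from the slides.
matching   = ["query focused", "world focused"]
user_rep   = ["all SIS", "recent SIS", "web SIS",
              "query history", "relevance feedback", "none"]
doc_rep    = ["full text", "title and snippet"]
corpus_rep = ["web", "result set full text", "result set title and snippet"]
expansion  = ["all words", "query focused"]

grid = list(product(matching, user_rep, doc_rep, corpus_rep, expansion))
print(len(grid))  # 2 * 6 * 2 * 3 * 2 = 144 candidate settings
```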
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Baselines
Best possible
Random
Text-based ranking
Web ranking
URL boost
[Illustration: http://mail.yahoo.com/inbox/msg10 annotated with +1 boosts]
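The URL-boost baseline might look like the following. This is an assumption: the slide only shows “+1” annotations on a previously seen URL, so the exact scoring (exact-URL and same-domain bonuses) is a guess.

```python
from urllib.parse import urlparse

# Hypothetical URL boost: +1 if the exact URL was seen before,
# +1 more if its domain appears in the user's history.
def url_boost(result_url, visited_urls):
    boost = 0
    if result_url in visited_urls:
        boost += 1
    domain = urlparse(result_url).netloc
    if any(urlparse(u).netloc == domain for u in visited_urls):
        boost += 1
    return boost

visited = {"http://mail.yahoo.com/inbox/msg10"}
print(url_boost("http://mail.yahoo.com/inbox/msg10", visited))  # 2
```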
Best Parameter Settings
Richer user representations are better: SIS > Recent > Web > Query history > None
Suggests a rich client is important
Efficiency hacks don’t hurt
Snippets, query focused: length normalization is not an issue
Query focus is good
Text Alone Not Enough
Better than some baselines: better than random, better than no user representation, better than relevance feedback
Worse than Web results
Blend in other features: Web ranking, URL boost
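Blending could be as simple as a linear combination. A sketch under stated assumptions: the slide only names the features to blend (Web ranking, URL boost); the weights alpha and beta and the 1/rank transform are illustrative choices, not the system’s.

```python
# Hypothetical blend of the personalized text score with the Web ranking
# and a URL boost. alpha/beta are made-up weights for illustration.
def blended_score(text_score, web_rank, url_boost, alpha=0.5, beta=0.1):
    web_score = 1.0 / web_rank  # higher for better (lower) Web rank
    return alpha * text_score + (1 - alpha) * web_score + beta * url_boost

# A result ranked 2nd by the Web engine but strongly matching the user
# model can overtake the engine's 1st result:
a = blended_score(text_score=0.9, web_rank=2, url_boost=1)
b = blended_score(text_score=0.1, web_rank=1, url_boost=0)
print(a > b)  # True
```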
Good, but Lots of Room to Grow
Best combination: 9.1% improvement
Best possible: 51.5% improvement
Assumes the best Web combination is selected
Only improves results 2/3 of the time
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Finding the Best Parameter Setting
There is almost always some parameter setting that improves results
Use learning to select parameters: based on the individual, the query, or the results
Give the user control?
Further Exploration of Algorithms
Larger parameter space to explore: more complex user-model subsets, different parsing (e.g., phrases), tuning the BM25 parameters
What is really helping: a generic user model or a personal one? Use different indices for the queries
Deploy the system
Practical Issues
Efficiency issues: can interfaces mitigate some of them?
Merging server and client: query expansion
Get more relevant results into the set to be re-ranked; design snippets for personalization
Thank you!