Personalizing Web Search Jaime Teevan, MIT with Susan T. Dumais and Eric Horvitz, MSR.
Personalizing Web Search
Jaime Teevan, MIT, with Susan T. Dumais and Eric Horvitz, MSR
Demo
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Study of Personal Relevancy
15 SIS users × ~10 queries each; each evaluated 50 results
Ratings: Highly relevant / Relevant / Irrelevant
Query selection: a previously issued query, or chosen from 10 pre-selected queries
Collected evaluations for 137 queries; 53 from the pre-selected set (2-9 per query)
Relevant Results Have Low Rank
[Chart: counts of Highly Relevant, Relevant, and Irrelevant judgments by result rank, 1-50]
Same Query, Different Intent
Different meanings: “Information about the astronomical/astrological sign of cancer” vs. “information about cancer treatments”
Different intents: “is there any new tests for cancer?” vs. “information about cancer treatments”
Same Intent, Different Evaluation
Query: Microsoft. Stated intents: “information about microsoft, the company”; “Things related to the Microsoft corporation”; “Information on Microsoft Corp”
31/50 results were rated not-irrelevant by someone, but more than one evaluator agreed on only 6 of those 31; all three agreed only on www.microsoft.com
More to Understand
Do people cluster, even if they can’t state their intention?
How are the differences reflected? Can they be seen from the information on a person’s computer?
Can we do better than the ranking that would make everyone the most happy?
Best common ranking: +38%
Best personalized ranking: +55%
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Personalization Algorithms
Standard IR: related to relevance feedback, query expansion
v. result re-ranking
[Diagram: user, query, and documents flowing between client and server under each approach]
Result Re-Ranking
Takes full advantage of SIS
Ensures privacy
Good evaluation framework
Look at lightweight user models: collected on the server side, sent as query expansion
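The re-ranking step can be sketched as follows. This is a minimal illustration, not the actual system: the tokenizer, the result format, and the example weights are assumptions; only the scoring rule Score = Σ tf_i · w_i comes from the slides.

```python
# Minimal sketch of client-side result re-ranking.
# Each result is scored as Score = sum(tf_i * w_i) over the terms it
# shares with a user model of term weights, then the list is re-sorted.

def rerank(results, term_weights):
    """results: list of (url, text); term_weights: {term: w_i} user model."""
    def score(text):
        tf = {}
        for t in text.lower().split():   # naive whitespace tokenizer
            tf[t] = tf.get(t, 0) + 1
        return sum(count * term_weights.get(t, 0.0) for t, count in tf.items())
    return sorted(results, key=lambda r: score(r[1]), reverse=True)

# Illustrative weights, as if derived from the user's index:
results = [("a.com", "cancer horoscope sign"),
           ("b.com", "cancer treatment options")]
weights = {"treatment": 2.0, "options": 0.5}
print(rerank(results, weights)[0][0])  # b.com ranks first
```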
BM25
w_i = log(N / n_i)
Score = Σ tf_i · w_i
(N = number of documents in the corpus; n_i = number of documents containing term i)

BM25 with Relevance Feedback
w_i = log[ (r_i + 0.5)(N − n_i − R + r_i + 0.5) / ((n_i − r_i + 0.5)(R − r_i + 0.5)) ]
Score = Σ tf_i · w_i
(R = number of known relevant documents; r_i = relevant documents containing term i)
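The relevance-feedback weight on the slide can be written out directly. A sketch: the function name and the smoke-test numbers are illustrative.

```python
import math

# BM25 relevance-feedback term weight from the slide:
# w_i = log[ (r_i+0.5)(N-n_i-R+r_i+0.5) / ((n_i-r_i+0.5)(R-r_i+0.5)) ]

def bm25_rf_weight(N, n_i, R, r_i):
    """N: corpus size; n_i: docs containing term i;
    R: known relevant docs; r_i: relevant docs containing term i."""
    num = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    den = (n_i - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(num / den)

# With no feedback (R = r_i = 0) this reduces to roughly log(N / n_i):
print(round(bm25_rf_weight(1000, 10, 0, 0), 2))  # ≈ 4.55, vs. log(100) ≈ 4.61
```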
User Model as Relevance Feedback
The user’s index supplies R and r_i, and the corpus counts are augmented with the user’s documents:
N′ = N + R
n_i′ = n_i + r_i
w_i = log[ (r_i + 0.5)(N′ − n_i′ − R + r_i + 0.5) / ((n_i′ − r_i + 0.5)(R − r_i + 0.5)) ]
Score = Σ tf_i · w_i
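A sketch of the substituted weight, with the user’s index supplying R and r_i and the corpus counts augmented by N′ = N + R and n_i′ = n_i + r_i. The function name and the example numbers are illustrative.

```python
import math

# User model as relevance feedback: the user's R documents (e.g. from
# the SIS index) are added to the corpus counts before weighting.

def user_model_weight(N, n_i, R, r_i):
    N_p = N + R        # N'  = corpus plus the user's documents
    n_p = n_i + r_i    # ni' = term count including the user's documents
    num = (r_i + 0.5) * (N_p - n_p - R + r_i + 0.5)
    den = (n_p - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(num / den)

# A term common in the user's index but rare on the web gets a high weight:
print(user_model_weight(N=10_000, n_i=5, R=100, r_i=40) > 0)  # True
```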
User Model as Relevance Feedback
[Venn diagram: the World contains N documents, n_i of which contain term i; the User’s index contains R documents, r_i of which contain term i. Query-focused matching takes the counts from the subsets related to the query (“User related to query”, “World related to query”); world-focused matching takes them from the full sets.]
Score = Σ tf_i · w_i
User Representation
Stuff I’ve Seen (SIS) index
Recently indexed documents
Web documents in SIS index
Query history
Relevance judgments
None
World Representation
Document representation: full text, or title and snippet
Corpus representation: the Web; result set – title and snippet; result set – full text
Query Expansion
All words in the document, vs. query focused
Example snippet: “The American Cancer Society is dedicated to eliminating cancer as a major health problem by preventing cancer, saving lives, and diminishing suffering through ...”
Parameters (full design)
Matching: Query focused / World focused
User representation: All SIS / Recent SIS / Web SIS / Query history / Relevance feedback / None
Document representation: Full text / Title and snippet
Corpus representation: Web / Result set – full text / Result set – title and snippet
Query expansion: All words / Query focused
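The parameter space above multiplies out as follows. A sketch only: it treats every choice as independent, which likely overstates the number of combinations actually run; the values are taken from the slides.

```python
from itertools import product

# Enumerate the cross-product of the parameter choices from the slides.
matching   = ["query focused", "world focused"]
user_rep   = ["all SIS", "recent SIS", "web SIS",
              "query history", "relevance feedback", "none"]
doc_rep    = ["full text", "title and snippet"]
corpus_rep = ["web", "result set full text", "result set title and snippet"]
expansion  = ["all words", "query focused"]

grid = list(product(matching, user_rep, doc_rep, corpus_rep, expansion))
print(len(grid))  # 2 * 6 * 2 * 3 * 2 = 144 candidate settings
```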
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Baselines
Best possible
Random
Text-based ranking
Web ranking
URL boost
[Illustration: http://mail.yahoo.com/inbox/msg10 annotated with +1 boosts]
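The URL-boost baseline might look like the following. This is an assumption: the slide only shows “+1” annotations on a previously seen URL, so the exact scoring (exact-URL and same-domain bonuses) is a guess.

```python
from urllib.parse import urlparse

# Hypothetical URL boost: +1 if the exact URL was seen before,
# +1 more if its domain appears in the user's history.
def url_boost(result_url, visited_urls):
    boost = 0
    if result_url in visited_urls:
        boost += 1
    domain = urlparse(result_url).netloc
    if any(urlparse(u).netloc == domain for u in visited_urls):
        boost += 1
    return boost

visited = {"http://mail.yahoo.com/inbox/msg10"}
print(url_boost("http://mail.yahoo.com/inbox/msg10", visited))  # 2
```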
Best Parameter Settings
Richer user representations are better: SIS > Recent > Web > Query history > None
Suggests a rich client is important
Efficiency hacks don’t hurt
Snippets, query focused: length normalization is not an issue
Query focus is good
Text Alone Not Enough
Better than some baselines: better than random, better than no user representation, better than relevance feedback
Worse than Web results
Blend in other features: Web ranking, URL boost
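Blending could be as simple as a linear combination. A sketch under stated assumptions: the slide only names the features to blend (Web ranking, URL boost); the weights alpha and beta and the 1/rank transform are illustrative choices, not the system’s.

```python
# Hypothetical blend of the personalized text score with the Web ranking
# and a URL boost. alpha/beta are made-up weights for illustration.
def blended_score(text_score, web_rank, url_boost, alpha=0.5, beta=0.1):
    web_score = 1.0 / web_rank  # higher for better (lower) Web rank
    return alpha * text_score + (1 - alpha) * web_score + beta * url_boost

# A result ranked 2nd by the Web engine but strongly matching the user
# model can overtake the engine's 1st result:
a = blended_score(text_score=0.9, web_rank=2, url_boost=1)
b = blended_score(text_score=0.1, web_rank=1, url_boost=0)
print(a > b)  # True
```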
Good, but Lots of Room to Grow
Best combination: 9.1% improvement
Best possible: 51.5% improvement
Assumes the best Web combination is selected
Only improves results 2/3 of the time
Personalizing Web Search
Motivation · Algorithms · Results · Future Work
Finding the Best Parameter Setting
There is almost always some parameter setting that improves results
Use learning to select parameters: based on the individual, the query, or the results
Give the user control?
Further Exploration of Algorithms
Larger parameter space to explore: more complex user-model subsets, different parsing (e.g., phrases), tuning the BM25 parameters
What is really helping: a generic user model or a personal one? Use different indices for the queries
Deploy the system
Practical Issues
Efficiency issues: can interfaces mitigate some of them?
Merging server and client: query expansion
Get more relevant results into the set to be re-ranked; design snippets for personalization
Thank you!