Personalizing Search

Jaime Teevan, MIT; Susan T. Dumais, MSR; and Eric Horvitz, MSR


Transcript of Personalizing Search

Page 1: Personalizing Search

Personalizing Search

Jaime Teevan, MIT

Susan T. Dumais, MSR

and Eric Horvitz, MSR

Page 2: Personalizing Search

Relevant result

Query: “pia workshop”

Page 3: Personalizing Search

Outline

Approaches to personalization
The PS algorithm
Evaluation
Results
Future work

Page 4: Personalizing Search

Approaches to Personalization

Content of user profile
    Long-term interests: Liu, et al. [14], Compass Filter [13]
    Short-term interests: Query refinement [2,12,15], Watson [4]

How user profile is developed
    Explicit: Relevance feedback [19], query refinement [2,12,15]
    Implicit: Query history [20, 22], browsing history [16, 23]

Very rich user profile

Page 5: Personalizing Search

[Diagram: components labeled PS, Search Engine and query.]

Page 6: Personalizing Search

[Diagram: components labeled PS, Search Engine and query, with documents shown as bags of terms: “dog cat monkey banana food”, “baby infant child boy girl”, “forest hiking walking gorp”, “csail mit artificial research robot”, “web search retrieval ir hunt”.]

Page 7: Personalizing Search

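[Diagram: components labeled PS, Search Engine, query and Search results page. Numeric scores are attached to the documents (for example 1.6, 2.7 and 1.3); the document “web search retrieval ir hunt” is scored 1.3.]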

Page 8: Personalizing Search

Calculating a Document’s Score

Based on standard tf.idf

$$\mathrm{Score} = \sum_i tf_i \cdot w_i$$

[Diagram: the example document “web search retrieval ir hunt” with a score of 1.3.]

Page 9: Personalizing Search

Calculating a Document’s Score

Based on standard tf.idf

$$\mathrm{Score} = \sum_i tf_i \cdot w_i = 0.1 + 0.05 + 0.5 + 0.35 + 0.3 = 1.3$$
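The sum on this slide can be sketched in a few lines of Python. This is an illustration only: the function name `tfidf_score` and the particular term frequencies and weights are assumptions, chosen so that the five products match the values shown (0.1, 0.05, 0.5, 0.35, 0.3). The slides do not say which term carries which value.

```python
def tfidf_score(term_freqs, term_weights):
    """Return sum_i tf_i * w_i over the terms of one document."""
    return sum(tf * term_weights.get(term, 0.0)
               for term, tf in term_freqs.items())


# Hypothetical numbers: the assignment of values to terms is an assumption.
term_freqs = {"web": 1, "search": 1, "retrieval": 2, "ir": 1, "hunt": 1}
term_weights = {"web": 0.1, "search": 0.05, "retrieval": 0.25,
                "ir": 0.35, "hunt": 0.3}

score = tfidf_score(term_freqs, term_weights)
print(round(score, 2))
```

Terms without a weight contribute nothing, so the sum only runs over terms that the weighting scheme knows about.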

Page 10: Personalizing Search

Calculating a Document’s Score

Based on standard tf.idf

$$\mathrm{Score} = \sum_i tf_i \cdot w_i$$

World:

$$w_i = \log \frac{N}{n_i}$$

[Diagram: the world corpus shown as N documents, each a vector of term values, with n_i of them containing term i; the example document's terms sum (Σ) to a score of 1.3.]
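As a rough illustration of the world weight, the following Python sketch computes log(N / n_i) for a rare and a common term. The function name `world_weight` and the corpus counts are hypothetical; the slides give the formula but no concrete values for N or n_i, and the natural logarithm is an assumption.

```python
import math


def world_weight(num_docs, docs_with_term):
    """w_i = log(N / n_i): the rarer the term, the larger the weight."""
    if not 0 < docs_with_term <= num_docs:
        raise ValueError("need 0 < n_i <= N")
    return math.log(num_docs / docs_with_term)


N = 1_000_000                          # hypothetical corpus size
rare = world_weight(N, 10)             # term found in 10 documents
common = world_weight(N, 500_000)      # term found in half the corpus
print(rare > common)
```

A term that occurs in every document gets weight 0, so it cannot affect the ranking.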

Page 11: Personalizing Search

Calculating a Document’s Score

Based on standard tf.idf

$$\mathrm{Score} = \sum_i tf_i \cdot w_i$$

World:

$$w_i = \log \frac{N}{n_i}$$

$$w_i = \log \frac{(r_i + 0.5)(N - n_i - R + r_i + 0.5)}{(n_i - r_i + 0.5)(R - r_i + 0.5)}$$

† From Sparck Jones, Walker and Robertson, 1998 [21].

Client:

$$w_i = \log \frac{(r_i + 0.5)(N' - n_i' - R + r_i + 0.5)}{(n_i' - r_i + 0.5)(R - r_i + 0.5)}$$

Where: $N' = N + R$, $n_i' = n_i + r_i$

[Diagram: the world corpus (N documents, n_i containing term i) alongside the client's documents (R documents, r_i containing term i), each shown as a vector of term values.]
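The two weights on this slide can be compared with a short Python sketch. The function names and the counts are assumptions made for illustration. Substituting N' = N + R and n_i' = n_i + r_i into the relevance-feedback weight gives N' - n_i' - R + r_i = N - n_i and n_i' - r_i = n_i, so the client weight reduces to log[(r_i + 0.5)(N - n_i + 0.5) / ((n_i + 0.5)(R - r_i + 0.5))]; the sketch checks that identity numerically.

```python
import math


def rf_weight(N, n_i, R, r_i):
    """Relevance-feedback term weight with 0.5 smoothing."""
    numerator = (r_i + 0.5) * (N - n_i - R + r_i + 0.5)
    denominator = (n_i - r_i + 0.5) * (R - r_i + 0.5)
    return math.log(numerator / denominator)


def client_weight(N, n_i, R, r_i):
    """Same weight with N' = N + R and n_i' = n_i + r_i substituted."""
    return rf_weight(N + R, n_i + r_i, R, r_i)


def client_weight_simplified(N, n_i, R, r_i):
    """Algebraically reduced form of client_weight."""
    return math.log((r_i + 0.5) * (N - n_i + 0.5)
                    / ((n_i + 0.5) * (R - r_i + 0.5)))


# Hypothetical counts: 1000 world documents, 50 with the term;
# 20 client documents, 5 with the term.
w = client_weight(1000, 50, 20, 5)
print(math.isclose(w, client_weight_simplified(1000, 50, 20, 5)))
```

Under these assumptions a term that appears in more of the client's documents (larger r_i) receives a larger weight, which is what lets the user's own content act as relevance feedback.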

Page 12: Personalizing Search

Finding the Parameter Values

Corpus representation (N, n_i)
    How common is the term in general?
    Web vs. result set

User representation (R, r_i)
    How well does it represent the user’s interest?
    All vs. recent vs. Web vs. queries vs. none

Document representation
    What terms to sum over?
    Full document vs. snippet

Page 13: Personalizing Search

Building a Test Bed

15 evaluators x ~10 queries
    131 queries total

Personally meaningful queries
    Selected from a list
    Queries issued earlier (kept diary)

Evaluate 50 results for each query
    Highly relevant / relevant / irrelevant

Index of personal information

Page 14: Personalizing Search

Evaluating Personalized Search

Measure algorithm quality

$$\mathrm{DCG}(i) = \begin{cases} \mathrm{Gain}(i), & \text{if } i = 1 \\ \mathrm{DCG}(i-1) + \mathrm{Gain}(i)/\log(i), & \text{otherwise} \end{cases}$$

Look at one parameter at a time
    67 different parameter combinations!
    Hold other parameters constant and vary one

Look at best parameter combination
    Compare with various baselines
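The DCG recurrence can be unrolled into a simple loop. The Python sketch below is a minimal illustration under two assumptions that the slides do not state: the logarithm is taken base 2, and the three judgment levels map to gains of 2 (highly relevant), 1 (relevant) and 0 (irrelevant).

```python
import math


def dcg(gains, log_base=2):
    """DCG(1) = Gain(1); DCG(i) = DCG(i-1) + Gain(i) / log(i) for i > 1."""
    total = 0.0
    for i, gain in enumerate(gains, start=1):
        if i == 1:
            total = float(gain)
        else:
            total += gain / math.log(i, log_base)
    return total


# Assumed gain mapping: highly relevant = 2, relevant = 1, irrelevant = 0.
good_order = dcg([2, 1, 0])   # best results ranked first
bad_order = dcg([0, 1, 2])    # best result ranked last
print(good_order > bad_order)
```

Because the discount grows with rank, the same set of judgments scores higher when the relevant results appear earlier in the list.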

Page 15: Personalizing Search

Analysis of Parameters

[Bar chart: vertical axis from 0.27 to 0.35; bars labeled Full text, Web, Snippet, None, Query, Web, Recent, All, Snippet, Full text. The User group (None, Query, Web, Recent, All) is labeled.]

Page 16: Personalizing Search

Analysis of Parameters

[Bar chart: vertical axis from 0.27 to 0.35, one bar per parameter setting. Corpus: Full text, Web, Snippet. User: None, Query, Web, Recent, All. Document: Snippet, Full text.]

Page 17: Personalizing Search

PS Improves Text Retrieval

[Bar chart: DCG (axis 0 to 0.7) over the categories No, RF, PS, Web, Combo. Values shown: No model = 0.37, Relevance Feedback = 0.41, Personalized Search = 0.46.]

Page 18: Personalizing Search

Text Features Not Enough

[Bar chart: DCG (axis 0 to 0.7) over the categories No, RF, PS, Web, Combo. Values shown: No = 0.37, RF = 0.41, PS = 0.46, Web = 0.56.]

Page 19: Personalizing Search

Take Advantage of Web Ranking

[Bar chart: DCG (axis 0 to 0.7) over the categories No, RF, PS, Web, Combo. Values shown: No = 0.37, RF = 0.41, PS = 0.46, Web = 0.56, Combo (PS+Web) = 0.58.]
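The slides do not say how the personalized score and the Web ranking are combined. One plausible way, shown here purely as an assumption, is a linear blend of a rank-based Web score and a min-max normalized personalized score. The function `blend_rankings`, the mixing weight `alpha` and the example scores are all hypothetical.

```python
def blend_rankings(web_order, ps_scores, alpha=0.5):
    """Re-rank web_order by alpha * personalized + (1 - alpha) * web rank."""
    if not web_order:
        return []
    raw = [ps_scores.get(doc, 0.0) for doc in web_order]
    low, high = min(raw), max(raw)
    span = (high - low) or 1.0          # avoid dividing by zero
    n = len(web_order)
    combined = {}
    for position, doc in enumerate(web_order):
        web_part = (n - position) / n   # 1.0 for the top Web result
        ps_part = (ps_scores.get(doc, 0.0) - low) / span
        combined[doc] = alpha * ps_part + (1 - alpha) * web_part
    return sorted(web_order, key=lambda doc: combined[doc], reverse=True)


web_order = ["a", "b", "c"]                 # order returned by the Web engine
ps_scores = {"a": 0.2, "b": 2.7, "c": 1.3}  # hypothetical personalized scores
print(blend_rankings(web_order, ps_scores, alpha=0.5))
```

With `alpha=0` the Web order is returned unchanged, and with `alpha=1` the results are ordered by personalized score alone, so a single parameter moves smoothly between the two rankings.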

Page 20: Personalizing Search

Summary

Personalization of Web search
    Result re-ranking
    User’s documents as relevance feedback

Rich representations important
    Rich user profile particularly important
    Efficiency hacks possible
    Need to incorporate features beyond text

Page 21: Personalizing Search

Further Exploration

Improved non-text components
    Usage data
    Personalized PageRank

Learn parameters
    Based on individual
    Based on query
    Based on results

UIs for user control

Page 22: Personalizing Search

User Interface Issues

Make personalization transparent

Give user control over personalization
    Slider between Web and personalized results
    Allows for background computation

Exacerbates problem with re-finding
    Results change as user model changes
    Thesis research – Re:Search Engine

Page 25: Personalizing Search

Much Room for Improvement

Group ranking
    Best improves on Web by 23%
    More people: less improvement

Personal ranking
    Best improves on Web by 38%
    Remains constant

[Line chart, “Potential for Personalization”: DCG (axis 0.8 to 1.05) against Number of People (1 to 6), with one series for Personalized and one for Group.]
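The contrast between a group ranking and a personal ranking can be illustrated with a small Python sketch. It assumes, without confirmation from the slides, that the group ranking sorts results by the summed gains of all group members and that quality is reported as each person's DCG divided by that person's ideal DCG. The names `dcg` and `mean_group_quality` and the judgments for Joe and Mary are hypothetical.

```python
import math


def dcg(gains):
    """DCG with a base-2 log discount (the base is an assumption)."""
    return sum(g if i == 1 else g / math.log(i, 2)
               for i, g in enumerate(gains, start=1))


def mean_group_quality(judgments):
    """Average per-person DCG of one shared ranking, relative to each ideal."""
    docs = sorted({doc for gains in judgments.values() for doc in gains})
    totals = {d: sum(g.get(d, 0) for g in judgments.values()) for d in docs}
    group_order = sorted(docs, key=lambda d: totals[d], reverse=True)
    ratios = []
    for gains in judgments.values():
        ideal = dcg(sorted((gains.get(d, 0) for d in docs), reverse=True))
        shared = dcg([gains.get(d, 0) for d in group_order])
        ratios.append(shared / ideal if ideal else 1.0)
    return sum(ratios) / len(ratios)


# Hypothetical judgments (2 = highly relevant, 1 = relevant, 0 = irrelevant).
joe = {"a": 2, "b": 0, "c": 1}
mary = {"a": 0, "b": 2, "c": 1}
alone = mean_group_quality({"joe": joe})
together = mean_group_quality({"joe": joe, "mary": mary})
print(alone, together < alone)
```

A single person can always be served their ideal order, so the ratio is 1.0; once people with different preferences share one ranking, the average falls below 1.0.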

Page 26: Personalizing Search

Evaluating Personalized Search

Query selection
    Chose from 10 pre-selected queries
    Previously issued query

[Figure: example queries in three groups, with the labels Pre-selected, Joe and Mary.]
    cancer, Microsoft, traffic, …
    bison frise, Red Sox, airlines, …
    Las Vegas, rice, McDonalds, …

53 pre-selected (2-9/query)

Total: 137

Page 27: Personalizing Search

Making PS Practical

Learn most about personalization by deploying a system

Best algorithm reasonably efficient

Merging server and client
    Query expansion
        Get more relevant results in the set to be re-ranked
    Design snippets for personalization