Table of Contents
• Search engine logs
• Eye-tracking data on position bias
• Click data for ranker training [Joachims, KDD 02]
• Case study: use of click data for search ranking [Agichtein et al., SIGIR 06]
Query sessions and analysis
[Figure: an example search session — queries "mustang", "ford mustang", "Nova"; clicked results include www.fordvehicles.com/cars/mustang, www.mustang.com, and en.wikipedia.org/wiki/Ford_Mustang; the engine's "Also Try" suggestions are shown alongside.]
A session decomposes hierarchically: a session contains missions, each mission contains queries, each query yields clicks, and at the finest granularity each click is preceded by eye fixations. Behavior can therefore be analyzed at the query level, the click level, and the eye-tracking level.
Query-URL correlations:
• Query-to-pick
• Query-to-query
• Pick-to-pick
Examples of behavior analysis with search logs
• Query-pick (click) analysis
• Session detection
• Classification: (x1, x2, …, xN) → y
  e.g., whether the session has a commercial intent
• Sequence labeling: (x1, x2, …, xN) → (y1, y2, …, yN)
  e.g., segment a search sequence into missions and goals
• Prediction: (x1, x2, …, xN-1) → yN
• Similarity: Similarity(S1, S2)
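Session detection, the second task above, is commonly approximated with a simple inactivity-gap heuristic. A minimal sketch, assuming timestamped query records; the 30-minute cutoff is a conventional choice, not a value from these slides:

```python
from datetime import datetime, timedelta

def segment_sessions(events, gap_minutes=30):
    """Split a user's time-ordered (timestamp, query) events into sessions.

    A new session starts whenever the gap since the previous event
    exceeds `gap_minutes` (a common heuristic cutoff; the exact value
    is an assumption, not from the tutorial).
    """
    sessions, current = [], []
    prev = None
    for ts, query in events:
        if prev is not None and ts - prev > timedelta(minutes=gap_minutes):
            sessions.append(current)
            current = []
        current.append(query)
        prev = ts
    if current:
        sessions.append(current)
    return sessions
```

Finer-grained mission/goal segmentation (the sequence-labeling task) would replace this single threshold with a learned labeler over the same event sequence.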
Query-pick (click) analysis
• Search results for "CIKM"
[Figure: search results for "CIKM", annotated with the number of clicks each result received.]
CIKM'09 Tutorial, Hong Kong, China, 3/6/2014
Interpret clicks: an example
• Clicks are good… but are two clicks at different positions equally "good"?
• Non-clicks may have excuses: the result was not relevant, or it was simply not examined.
Use of behavior data
• Adapt ranking to user clicks?
[Figure: search results annotated with the number of clicks each received.]
Non-trivial cases
• Tools are needed for non-trivial cases
[Figure: search results annotated with the number of clicks each received.]
Click position bias [Joachims+ 07]
Higher positions receive more user attention (eye fixations) and clicks than lower positions. This is true even in the extreme setting where the order of positions is reversed: "clicks are informative but biased."
[Figure: percentage of fixations and clicks by rank, for the normal position order and for the reversed impression.]
Clicks as relative judgments for rank training
• "Clicked > Skipped Above" [Joachims, KDD 02]
[Figure: a ranked list of results 1–8; a click on result 5 yields the preference pairs #5 > #2, #5 > #3, #5 > #4.]
• Use Rank SVM to optimize the retrieval function.
• Limitations:
  Confidence of the judgments
  Little implication for user modeling
Additional relations for relative relevance judgments:
• click > skip above
• last click > click above
• click > click earlier
• last click > click previous
• click > no-click next
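The basic "click > skip above" rule can be turned into training pairs mechanically. A minimal sketch (the function name and 1-based rank convention are my own), reproducing the slide's example of a click at rank 5:

```python
def skip_above_pairs(ranking, clicked_positions):
    """Derive 'Clicked > Skipped Above' preference pairs [Joachims, KDD 02].

    ranking: document ids in the order shown (rank 1 first).
    clicked_positions: 1-based ranks the user clicked.

    For each clicked rank i, every higher-ranked result j < i that was
    NOT clicked is assumed examined and skipped, yielding the pair
    (ranking[i-1] preferred over ranking[j-1]).
    """
    clicked = set(clicked_positions)
    pairs = []
    for i in sorted(clicked):
        for j in range(1, i):
            if j not in clicked:
                pairs.append((ranking[i - 1], ranking[j - 1]))
    return pairs
```

With clicks at ranks 1 and 5, this yields exactly #5 > #2, #5 > #3, #5 > #4; the resulting pairs would then feed a pairwise learner such as Rank SVM.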
Web search ranking by incorporating user behavior information
• Eugene Agichtein, Eric Brill, Susan Dumais [SIGIR 2006]
• Goal: rank pages relevant to a query
• Categories of features (signals) for web search ranking:
  Content match — e.g., page terms, anchor text, term weights, term span
  Document quality — e.g., web topology, spam features
• Add one more category: implicit user feedback from click data
Related work
• Personalization: rerank results based on the user's clickthrough and browsing history
• Collaborative filtering: Amazon; Ask.com DirectHit ranks by clickthrough
• General ranking: Joachims et al. [KDD 2002] and Radlinski et al. [KDD 2005] tune ranking functions with clickthrough
Rich user behavior feature space
• Observed and distributional features
  Aggregate observed values over all user interactions for each query–result pair
  Distributional features: deviations from the "expected" behavior for the query
• Represent user interactions as vectors in user behavior space
  Presentation: what a user sees before a click
  Clickthrough: frequency and timing of clicks
  Browsing: what users do after a click
Ranking features (signals)

Presentation
  ResultPosition      Position of the URL in the current ranking
  QueryTitleOverlap   Fraction of query terms in the result title
Clickthrough
  DeliberationTime    Seconds between the query and the first click
  ClickFrequency      Fraction of all clicks landing on the page
  ClickDeviation      Deviation from the expected click frequency
Browsing
  DwellTime           Result page dwell time
  DwellTimeDeviation  Deviation from the expected dwell time for the query
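The clickthrough features above are simple aggregates over a query's interaction log. A minimal sketch of ClickFrequency and ClickDeviation; the paper derives the "expected" behavior from a background model, so the uniform baseline used here is purely an illustrative assumption:

```python
from collections import Counter

def clickthrough_features(impressions):
    """Compute ClickFrequency-style aggregates for one query.

    impressions: (url, clicked) pairs aggregated over all user
    interactions for the query. Returns, per URL, the fraction of all
    clicks that landed on it (ClickFrequency) and its deviation from a
    uniform 'expected' click share (a stand-in for the paper's
    background model, which is an assumption here).
    """
    clicks = Counter(url for url, clicked in impressions if clicked)
    total_clicks = sum(clicks.values())
    urls = {url for url, _ in impressions}
    expected = 1.0 / len(urls) if urls else 0.0
    features = {}
    for url in urls:
        freq = clicks[url] / total_clicks if total_clicks else 0.0
        features[url] = {
            "ClickFrequency": freq,
            "ClickDeviation": freq - expected,
        }
    return features
```

DwellTime and DwellTimeDeviation would be computed the same way from post-click browsing records rather than click events.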
Training a user behavior model
• Map user behavior features to relevance judgments
• RankNet [Burges et al., ICML 2005]
  Neural-net-based pairwise learning
  Input: user behavior features + relevance labels
  Output: weights for the behavior feature values
  Used as the testbed for all experiments
User behavior models for ranking
• Use interactions from previous instances of the query
  General-purpose (not personalized)
  Only available for queries with past user interactions
• Three models:
  Rerank, clickthrough only: reorder results by number of clicks
  Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences
  Integrate directly into the ranker: combine user behavior features with the other categories of ranking features (e.g., text matching)
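The clickthrough-only model is the simplest of the three: stably reorder the engine's ranking by historic click count. A minimal sketch of that idea (the function name and `click_counts` mapping are my own):

```python
def rerank_by_clicks(ranking, click_counts):
    """Clickthrough-only reranking (the idea behind the Rerank-CT baseline).

    Stably reorder the original ranking by historic click count,
    descending; results with no recorded clicks keep their relative
    original order because Python's sort is stable.
    click_counts maps url -> past clicks for this query.
    """
    return sorted(ranking, key=lambda url: -click_counts.get(url, 0))
```

The stable sort matters: it falls back to the content-based order exactly where click evidence is absent, which is also why this model only applies to queries with past interactions.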
Evaluation metrics
• Precision at K: fraction of relevant results in the top K
• NDCG at K: normalized discounted cumulative gain; top-ranked results matter most:
  NDCG_K = M_q · Σ_{j=1..K} (2^{r(j)} − 1) / log(1 + j)
  where r(j) is the relevance grade of the result at rank j and M_q is a per-query normalization constant
• MAP: mean average precision; the average precision for a query is the mean of the precision-at-K values computed after each relevant document is retrieved
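The NDCG formula translates directly into code. A minimal sketch; the slide leaves the log base unspecified, so base 2 — the common convention — is assumed here, and M_q is obtained as the reciprocal of the ideal DCG:

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@K for one query.

    relevances: graded relevance r(j) of the result at each rank,
    rank 1 first. DCG uses (2^r - 1) / log2(1 + j); the ideal DCG
    (same grades sorted descending) supplies the normalization M_q.
    """
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(1 + j)
                   for j, r in enumerate(rels[:k], start=1))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; any misordering of graded results lands strictly between 0 and 1.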
Datasets
• 8 weeks of user behavior data from anonymized, opt-in client instrumentation
• Millions of unique queries and interaction traces
• Random sample of 3,000 queries, gathered independently of user behavior: 1,500 train, 500 validation, 1,000 test
• Explicit relevance assessments for the top 10 results of each query in the sample
Methods compared
• Full search engine
  Content-match feature uses BM25F, a variant of the TF-IDF model
• Four ranking models compared:
  BM25F only
  Clickthrough only (Rerank-CT): rerank queries that have sufficient historic click data
  Full user behavior model predictions (Rerank-All)
  All user behavior features integrated directly (+All): user behavior features + content match
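For context on the content-match baseline: BM25F is a field-weighted extension of BM25. A minimal sketch of plain single-field BM25 scoring (BM25F would additionally combine per-field term counts with field weights; parameter defaults here are the conventional ones, not values from the paper):

```python
import math

def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Plain BM25 over a single flat text field.

    corpus: list of documents (each a list of terms), used for the IDF
    statistics and the average document length. k1 and b are the usual
    free parameters controlling term-frequency saturation and length
    normalization.
    """
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)  # smoothed IDF
        tf = doc_terms.count(term)
        denom = tf + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf * (k1 + 1) / denom
    return score
```

In the paper's +All condition, a score like this is just one feature alongside the user behavior features rather than the whole ranking function.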
Content + user behavior: precision at K, queries with interactions
BM25 < Rerank-CT < Rerank-All < +All
[Figure: precision at K = 1, 3, 5, 10 for BM25, Rerank-CT, Rerank-All, and BM25+All; precision ranges roughly from 0.38 to 0.63.]
Content + user behavior: NDCG
BM25 < Rerank-CT < Rerank-All < +All
[Figure: NDCG at K = 1 through 10 for BM25, Rerank-CT, Rerank-All, and BM25+All; NDCG ranges roughly from 0.5 to 0.68.]
Impact: all queries, precision at K
• Fewer than 50% of test queries have prior interactions
• +0.06 to 0.12 precision over all test queries
[Figure: precision at K = 1, 3, 5, 10 for RN, Rerank-All, and RN+All.]
Impact: all queries, NDCG
• +0.03 to 0.05 NDCG over all test queries
[Figure: NDCG at K = 1 through 10 for RN, Rerank-All, and RN+All.]
Which queries benefit most
[Figure: query frequency plotted against average gain (roughly −0.4 to +0.2).]
Most gains are for queries with poor original ranking.
Conclusions
• Incorporating user behavior into web search ranking dramatically improves relevance
• Providing rich user interaction features to the ranker is the most effective strategy
• Large improvements are shown for up to 50% of test queries