Search Query Disambiguation from Short Sessions
![Page 1: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/1.jpg)
1
Search Query Disambiguation from Short Sessions
Lilyana Mihalkova & Raymond Mooney
The University of Texas at Austin
![Page 2: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/2.jpg)
2
Query Disambiguation
scrubs
?
![Page 3: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/3.jpg)
3
Existing Work
• Well-studied problem: [e.g., Sugiyama et al. 04, Sun et al. 05, Dou et al. 07]
• Common Assumption: Information about each user is available over a relatively long period of time.
![Page 4: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/4.jpg)
4
Privacy Concerns
• NY Times: “A Face is Exposed for AOL Searcher no. 4417749”
• [Conti, 06]: “Googling Considered Harmful”
![Page 5: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/5.jpg)
5
Pragmatic Concerns
• Identifying users across search sessions
– Log-in?
– IP Address?
• Managing and protecting user-specific information
![Page 6: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/6.jpg)
6
Proposed Setting
• Base personalization only on short-term search histories
– complete search histories cannot be reconstructed
• Relate current session to previous short sessions of other users, based on the search activity in these sessions
![Page 7: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/7.jpg)
7
How Short is Short-Term?
[Histogram: number of sessions with that many queries (y-axis) vs. number of queries before ambiguous query (x-axis)]
![Page 8: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/8.jpg)
8
Is This Enough Info?
Session 1: 98.7 fm → www.star987.com; kroq → www.kroq.com; scrubs → ???
Session 2: huntsville hospital → www.huntsvillehospital.com; ebay.com → www.ebay.com; scrubs → ???
Candidate results for scrubs: scrubs-tv.com, scrubs.com
![Page 9: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/9.jpg)
9
More Closely Related Work
• [Almeida & Almeida 04]: Similar assumption of short sessions, but better suited for a specialized search engine (e.g. on computer science literature)
• [Krause & Horvitz 08]: Explicitly models the tradeoff between better performance and more user information.
![Page 10: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/10.jpg)
10
Main Challenge
• How to harness this small amount of potentially noisy information available for each user?
– Exploit the relations among users, sessions, URLs
– Use statistical relational learning (SRL) [Getoor & Taskar 07]
![Page 11: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/11.jpg)
11
Using Relational Information
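[Diagram: the active session (huntsville hospital → huntsvillehospital.org; ebay → ebay.com; scrubs → ???) is linked to previous sessions of other users through shared keywords and clicks. Labels in those sessions: huntsville school, hospitallink.com, ebay.com; their clicks for scrubs: scrubs-tv.com, scrubs.com]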
![Page 12: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/12.jpg)
12
Details
• Used Markov logic networks (MLNs) [Richardson & Domingos 06]
– MLN structure is provided as domain knowledge
– Weights are learned from the data
• Weight learning: Adapted contrastive divergence [Lowd & Domingos 07] for incremental learning
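For reference, the standard MLN distribution over possible worlds from [Richardson & Domingos 06] can be written as below. The symbols $w_i$, $n_i$ and $Z$ are notation introduced here for illustration and do not appear on the slides: $w_i$ is the learned weight of clause $i$, and $n_i(x)$ is the number of true groundings of that clause in world $x$.

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

Under this form, the provided clauses fix which counts $n_i$ exist, and weight learning only adjusts the $w_i$.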
![Page 13: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/13.jpg)
13
Predicates
• Evidence predicates
– provide information about clicked URLs and keywords shared between sessions, e.g.
  • shares-keyword-between-clicks(ActiveS, BackgroundS, keyword)
  • shares-keyword-between-click-and-search(ActiveS, BackgroundS, keyword)
  • shares-clicks(ActiveS, BackgroundS, hostname)
– provide information about clicked URLs and keywords in current session
• Query predicate
– states that user will choose particular URL
  • clicks-on(ActiveS, hostname)
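As a rough illustration of how evidence for one of these predicates might be derived from raw sessions, the sketch below enumerates groundings of shares-clicks. The `Session` layout, the session ids and the function name are assumptions made for this example, not the authors' implementation.

```python
from typing import NamedTuple


class Session(NamedTuple):
    """One short session (hypothetical layout): an id and the hostnames clicked."""
    session_id: str
    clicked_hosts: tuple


def shares_clicks(active, background):
    """Return groundings (active_id, background_id, hostname) of shares-clicks.

    A grounding is produced for every hostname clicked both in the active
    session and in a background session.
    """
    active_hosts = set(active.clicked_hosts)
    groundings = set()
    for bg in background:
        for host in active_hosts & set(bg.clicked_hosts):
            groundings.add((active.session_id, bg.session_id, host))
    return groundings


# Toy data loosely based on the "scrubs" example; the ids are made up.
active = Session("a1", ("huntsvillehospital.org", "ebay.com"))
background = [
    Session("b1", ("hospitallink.com", "scrubs-tv.com")),
    Session("b2", ("ebay.com", "scrubs.com")),
]
evidence = shares_clicks(active, background)
print(evidence)
```

The keyword-based predicates could be grounded the same way, with keywords taken from query strings and clicked hostnames instead of whole hostnames.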
![Page 14: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/14.jpg)
14
Re-Ranking of Search Results
• Search engine produces a list of search results
• For each possible search result R, compute the probability that clicks-on(ActiveS, R) holds
• Rank the search results by their likelihood of being clicked
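The re-ranking step itself is simple once the probabilities are available. A minimal sketch, assuming the inferred probabilities of clicks-on(ActiveS, R) are already in a dictionary; the probability values and the hostname scrubs.org are invented for illustration:

```python
def rerank(results, click_probability):
    """Sort results by descending click probability.

    Results without a probability are treated as 0.0. Python's sort is
    stable, so ties keep the search engine's original order.
    """
    return sorted(results, key=lambda r: -click_probability.get(r, 0.0))


# Hypothetical engine output and hypothetical inferred probabilities.
engine_results = ["scrubs-tv.com", "scrubs.com", "scrubs.org"]
probs = {"scrubs-tv.com": 0.2, "scrubs.com": 0.7}
reranked = rerank(engine_results, probs)
print(reranked)
```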
![Page 15: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/15.jpg)
15
MLN 1
• User will click on at least one result
• User will select result chosen by previous user with whom a click is shared
[Diagram: two sessions that both contain the ambiguous query and a click on www.someplace1.com; the previous session also contains some query and the click www.clickedResult.com]
![Page 16: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/16.jpg)
16
MLN 2
• MLN 1 +
• User will select result chosen by previous user with whom a keyword is shared
– click-to-click, click-to-search, search-to-click, search-to-search
[Diagram: two sessions that both contain the ambiguous query, connected through a shared keyword; labels: some query, some other, www.aClick.com, www.someplace1.com, www.clickedResult.com]
![Page 17: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/17.jpg)
17
MLN 3
• MLN 2 +
• User will choose result that shares a keyword with a previous search or click in the current session
[Diagram: a single session containing some query and the ambiguous query; labels: www.someplace1.com, www.someResult.com, www.anotherPossibility.com, www.yetAnother.com]
![Page 18: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/18.jpg)
18
Data
• Collected from the MSN search engine in May 2006
• Contains time-stamped records of searches and clicked URLs, grouped by sessions
– Average session length is 3.28
– No across-session identifiers
• Used first 25 days for training/validation and last 6 days for test
![Page 19: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/19.jpg)
19
Data Limitation #1
• Data does not specify what queries are ambiguous
– Consider query as ambiguous, if over all pages clicked after searching for this query, at least 2 fall in different high-level categories in the DMOZ (dmoz.org) hierarchy.
– Limit to query strings of up to two words (43.7%)
  • 6,360 ambiguous queries (2.4% of all two-word query strings)
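The labeling heuristic above can be sketched as follows. The click-log layout, the `top_category` lookup and the category assignments are assumptions for illustration; a real run would need an actual mapping from pages to DMOZ top-level categories.

```python
from collections import defaultdict


def ambiguous_queries(click_log, top_category, max_words=2):
    """Return queries whose clicked pages span at least 2 top-level categories.

    click_log:     iterable of (query_string, clicked_url) pairs
    top_category:  dict mapping a URL to its top-level category (hypothetical)
    max_words:     only queries with at most this many words are considered
    """
    categories = defaultdict(set)
    for query, url in click_log:
        if len(query.split()) > max_words:
            continue
        category = top_category.get(url)
        if category is not None:  # pages without a known category are skipped
            categories[query].add(category)
    return {q for q, cats in categories.items() if len(cats) >= 2}


# Toy data; the category assignments below are invented.
click_log = [
    ("scrubs", "scrubs-tv.com"),
    ("scrubs", "scrubs.com"),
    ("kroq", "www.kroq.com"),
]
top_category = {
    "scrubs-tv.com": "Arts",
    "scrubs.com": "Shopping",
    "www.kroq.com": "Arts",
}
labels = ambiguous_queries(click_log, top_category)
print(labels)
```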
![Page 20: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/20.jpg)
20
Data Limitation #2
• Data does not provide the full list of search results presented to the user; only the ones actually clicked
– Assume that the URLs seen by the user are those clicked by at least one person after searching for the exact query string
– Consequence: result sets have differing lengths
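One way to picture this workaround is to collect, for each exact query string, every URL that anyone clicked after issuing it. This is a sketch under an assumed log layout of (query, clicked URL) pairs, not the authors' code:

```python
from collections import defaultdict


def build_result_sets(click_log):
    """Map each exact query string to the distinct URLs clicked after it.

    URLs are kept in first-seen order; no normalization of the query string
    is applied, mirroring the "exact query string" assumption.
    """
    result_sets = defaultdict(list)
    for query, url in click_log:
        if url not in result_sets[query]:
            result_sets[query].append(url)
    return dict(result_sets)


click_log = [
    ("scrubs", "scrubs-tv.com"),
    ("scrubs", "scrubs.com"),
    ("scrubs", "scrubs-tv.com"),
    ("kroq", "www.kroq.com"),
]
result_sets = build_result_sets(click_log)
print(result_sets)
```

Because each set contains only what was clicked at least once, frequently issued queries tend to get longer result sets, which is why the sizes differ.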
![Page 21: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/21.jpg)
21
Result Set Sizes
[Histogram: number of queries with that result set size (y-axis) vs. size of result set for ambiguous query (x-axis)]
![Page 22: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/22.jpg)
22
Evaluation Metrics: MAP
• Mean average precision
– identical to the area under the interpolated precision/recall curve
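For concreteness, here is a minimal sketch of MAP over a set of test cases. It uses the common non-interpolated definition of average precision, which may differ in detail from the variant used in the talk; the rankings are toy data.

```python
def average_precision(ranking, clicked):
    """Average of precision@k over the ranks k at which a clicked result appears.

    Clicked results missing from the ranking count as never retrieved.
    """
    if not clicked:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in clicked:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(clicked)


def mean_average_precision(cases):
    """cases: list of (ranking, clicked_set) pairs, one per ambiguous query."""
    return sum(average_precision(r, c) for r, c in cases) / len(cases)


cases = [
    (["scrubs-tv.com", "scrubs.com"], {"scrubs.com"}),  # AP = 1/2
    (["www.kroq.com"], {"www.kroq.com"}),               # AP = 1
]
map_score = mean_average_precision(cases)
print(map_score)  # 0.75
```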
![Page 23: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/23.jpg)
23
Evaluation Metrics: AUC-ROC
• Area under the ROC curve
– identical to the mean average true negative rate
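The "average true negative rate" reading can be made concrete with a pairwise sketch: for each clicked result, take the fraction of unclicked results ranked below it, then average. The function below assumes a strict ranking without ties and is an illustration rather than the evaluation code behind the reported numbers.

```python
def auc_roc(ranking, clicked):
    """AUC-ROC of one ranked list: fraction of (clicked, unclicked) pairs
    in which the clicked result is ranked above the unclicked one.

    Returns None when the list has no clicked or no unclicked results,
    since the ROC curve is undefined in that case.
    """
    positives = [i for i, item in enumerate(ranking) if item in clicked]
    negatives = [i for i, item in enumerate(ranking) if item not in clicked]
    if not positives or not negatives:
        return None
    correct = sum(1 for p in positives for n in negatives if p < n)
    return correct / (len(positives) * len(negatives))


ranking = ["scrubs.com", "scrubs-tv.com", "scrubs.med", "scrubs.tv"]
score = auc_roc(ranking, {"scrubs-tv.com"})
print(score)  # 2 of the 3 unclicked results are ranked below the clicked one
```

A random ranking scores 0.5 in expectation under this measure.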
![Page 24: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/24.jpg)
24
Baselines
• Random: Rank randomly
• Click-Sim: Rank by similarity based on shared clicks
• Click-KW-Sim: Rank by similarity based on shared clicks and keywords
![Page 25: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/25.jpg)
25
Click-Sim
[Diagram: the active session (huntsville hospital → huntsvillehospital.org; scrubs → ???) is compared with previous sessions in which scrubs led to a click on scrubs.tv or scrubs.med; each candidate result is labeled with the average similarity based on shared clicks]
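A possible reading of this baseline is sketched below. The slides do not say which similarity measure is used, so Jaccard overlap of clicked hostnames is an assumption here, as are the data layout and the toy sessions.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections of hostnames (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def click_sim_scores(active_clicks, background):
    """Score each candidate by the average similarity of the sessions that chose it.

    background: list of (prior_clicks, chosen_result) pairs, one per previous
    session that issued the same ambiguous query.
    """
    sims = {}
    for prior_clicks, chosen in background:
        sims.setdefault(chosen, []).append(jaccard(active_clicks, prior_clicks))
    return {result: sum(v) / len(v) for result, v in sims.items()}


active_clicks = ["huntsvillehospital.org"]
background = [
    (["huntsvillehospital.org", "ebay.com"], "scrubs.med"),
    (["www.kroq.com"], "scrubs.tv"),
    ([], "scrubs.tv"),
]
scores = click_sim_scores(active_clicks, background)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Click-KW-Sim could follow the same pattern with keywords from queries and clicks added to the compared sets.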
![Page 26: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/26.jpg)
26
Click-KW-Sim
[Diagram: the active session (huntsville hospital → huntsvillehospital.org; scrubs → ???) is compared with previous sessions in which scrubs led to a click on scrubs.tv or scrubs.med; each candidate result is labeled with the average similarity based on shared clicks and keywords]
![Page 27: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/27.jpg)
27
Results (MAP)
[Bar chart: MAP (y-axis, ticks from 0.28 to 0.39) for Random, Click-Sim, Click-KW-Sim, MLN1, MLN2, MLN3; some bars are marked with asterisks]
![Page 28: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/28.jpg)
28
Results (AUC-ROC)
[Bar chart: AUC-ROC (y-axis, ticks from 0.46 to 0.58) for Random, Click-Sim, Click-KW-Sim, MLN1, MLN2, MLN3; some bars are marked with asterisks]
![Page 29: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/29.jpg)
29
Current/Future Work
• Incorporating more information in the models
– Actual content of clicked pages
– Popularity of pages
– Weighing evidence based on how close it is in time to ambiguous query
• Learning separate weights for each connecting keyword or domain/group of keywords or domains
• Revising the provided clauses
![Page 30: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/30.jpg)
30
Questions?
![Page 31: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/31.jpg)
31
1
• The popularity of a possible result provides a strong signal, but providing relational information on top of popularity gives further performance improvements
– Rank by popularity + Click-KW-Sim baseline:
  • MAP (0.383), AUC-ROC (0.536)
– Rank by popularity only:
  • MAP (0.380), AUC-ROC (0.525)
![Page 32: Search Query Disambiguation from Short Sessions](https://reader034.fdocuments.us/reader034/viewer/2022051402/56815b66550346895dc95730/html5/thumbnails/32.jpg)
32
2
[Histogram: number of sessions with that many clicks (y-axis) vs. number of distinct clicks before ambiguous query (x-axis)]