Information Re-Retrieval: Repeat Queries in Yahoo’s Logs
Jaime Teevan, Eytan Adar, Rosie Jones, Michael A. S. Potts
SIGIR 2007
Motivation
• Re-finding previously found information is a common Web search activity
• What is the intent behind re-finding information?
• What factors favor or indicate that a user is re-finding information?
Dataset
• Search traces of 114 Yahoo users over one year (Aug 2004 – July 2005)
– 115 queries per trace
– A query counts as a repeat only when it is separated from the earlier issue by more than 30 minutes
• 119 volunteers in a controlled experiment
– Users were asked to repeat a query they had issued 30 minutes to 1 hour earlier
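A minimal sketch of the repeat rule for the Yahoo traces. It assumes each trace is a time-ordered list of (query, timestamp) pairs whose queries are already normalized, and it measures the gap from the most recent earlier issue of the same query. The name `find_repeats` and that gap convention are hypothetical choices for illustration, not the study's exact procedure.

```python
from datetime import datetime, timedelta

# Threshold from the slide: issues of the same query closer together than this
# are treated as one session rather than as a repeat.
REPEAT_GAP = timedelta(minutes=30)


def find_repeats(trace):
    """Return the (query, timestamp) pairs in `trace` that count as repeats.

    Hypothetical assumptions: `trace` is a list of (normalized_query, datetime)
    tuples sorted by time, and the gap is measured from the most recent
    earlier issue of the same query.
    """
    last_seen = {}  # query -> timestamp of its most recent issue
    repeats = []
    for query, ts in trace:
        prev = last_seen.get(query)
        if prev is not None and ts - prev > REPEAT_GAP:
            repeats.append((query, ts))
        last_seen[query] = ts
    return repeats


# Usage: the issue 10 minutes later falls inside the same session; only the
# issue two hours later is reported as a repeat.
t0 = datetime(2005, 1, 10, 9, 0)
example_trace = [
    ("yahoo mail", t0),
    ("yahoo mail", t0 + timedelta(minutes=10)),
    ("yahoo mail", t0 + timedelta(hours=2)),
    ("cnn", t0 + timedelta(hours=3)),
]
print(find_repeats(example_trace))
```

Measuring from the most recent issue means a query re-issued every few minutes never counts as a repeat until the user leaves it for more than 30 minutes.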
Techniques used
• Query normalization
– Capitalization, stop-word removal, duplicate-word removal, extra whitespace, stemming
– Word order (e.g. “new york department of state” and “department of state new york”)
– Non-alphanumerics (e.g. “sub-urban” vs “sub urban”)
– Word merge (e.g. “wal mart” vs “walmart”)
– Domain (e.g. hotmail vs hotmail.com)
– Word swap (e.g. “american embassy london” vs “american consulate london”)
• SVM classifier
– Used to predict whether a result will be clicked again
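The query normalizations listed above can be approximated as follows. This is a hypothetical sketch: the stop-word list, the domain suffixes, and the plural-stripping rule (a crude stand-in for a real stemmer such as Porter's) are assumptions. Word order is ignored by comparing sorted tokens, and word merges are caught by also comparing the tokens joined without spaces. Word swaps change the words themselves, so this sketch does not treat them as equivalent.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}
# Assumption: only these suffixes count as domain decorations.
DOMAIN_SUFFIX = re.compile(r"\.(com|edu|net|org)\b")


def crude_stem(word):
    """Very rough plural stripping; a stand-in for a real stemmer."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"
    if len(word) > 4 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def tokens(query):
    """Normalize a query into a list of unique tokens, keeping first-seen order."""
    q = DOMAIN_SUFFIX.sub("", query.lower())  # capitalization; "hotmail.com" -> "hotmail"
    q = re.sub(r"[^a-z0-9]+", " ", q)         # non-alphanumerics: "sub-urban" -> "sub urban"
    # split() discards extra whitespace; then drop stop words and stem
    words = [crude_stem(w) for w in q.split() if w not in STOP_WORDS]
    return list(dict.fromkeys(words))         # remove duplicate words


def equivalent(a, b):
    """True if two queries match after normalization.

    Sorted tokens ignore word order; tokens joined without spaces catch
    word merges such as "wal mart" vs "walmart". Word swaps are not handled.
    """
    ta, tb = tokens(a), tokens(b)
    return sorted(ta) == sorted(tb) or "".join(ta) == "".join(tb)
```

Two separate keys are needed because sorting tokens for order-insensitivity would break the merge check ("wal mart" sorts to "mart wal").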
Discovery
• Navigational queries are one major type of re-finding query
– Bank, news, mail
– .com, .edu, .net
• Changes in result rank affect re-finding
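As a toy illustration only, a query could be flagged as likely navigational using the cues on this slide. The keyword set, the suffix list, and the function name `looks_navigational` are hypothetical and seeded solely from the slide's examples; this is not the study's method for identifying navigational queries.

```python
import re

# Hypothetical cue lists, taken only from the examples on the slide.
NAV_KEYWORDS = {"bank", "news", "mail"}
NAV_SUFFIX = re.compile(r"\.(com|edu|net)\b")


def looks_navigational(query):
    """Toy heuristic: flag queries that name a domain or contain a cue word."""
    q = query.lower()
    if NAV_SUFFIX.search(q):
        return True
    # Whole-word match only, so "hotmail" does not match "mail".
    words = re.findall(r"[a-z0-9]+", q)
    return any(w in NAV_KEYWORDS for w in words)
```

Matching whole words keeps the heuristic conservative; a substring match would flag far more queries.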
Discovery
• Memory fades
– Controlled experiment: 30% of repeated queries were mis-remembered (36/119); 27 of those 36 are equivalent to the original after normalization
– Yahoo logs
• Indicators of a repeat click
– Number of clicks on the first query
– Number of clicks on the previous query
– Number of unique clicks on the previous query
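A sketch of how an SVM could be trained on these indicators to predict a repeat click. To stay dependency-free it uses a Pegasos-style stochastic subgradient solver for a linear SVM (hinge loss plus an L2 penalty), which is an assumption and not necessarily the SVM package or kernel the authors used. The three-feature layout, the ±1 labels, and the hyperparameters `lam`, `epochs`, and `seed` are likewise illustrative assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    X: feature vectors, assumed here to be [clicks on first query,
       clicks on previous query, unique clicks on previous query].
    y: labels, +1 = result clicked again, -1 = not clicked again.
    A constant 1.0 is appended to each vector so the last weight acts as a
    bias (this also regularizes the bias, a common simplification).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # step on the L2 term
            if margin < 1:  # hinge loss active: move toward this example
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def predict(w, x):
    """Return +1 (predicted repeat click) or -1."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In practice the click counts would be scaled before training, and a library SVM with a tuned regularization constant would replace this hand-rolled solver.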