Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Information Re-Retrieval: Repeat Queries in Yahoo’s Logs. Jaime Teevan, Eytan Adar, Rosie Jones, Michael A. S. Potts. SIGIR 2007.


Transcript of Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Page 1: Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Jaime Teevan, Eytan Adar, Rosie Jones, Michael A. S. Potts

SIGIR 2007

Page 2: Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Motivation

• Re-finding information is a common activity of Web search

• What is the intention of re-finding information?

• What factors favor/indicate user’s re-finding of information?

Page 3: Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Dataset

• Search traces of 114 Yahoo users over 1 year (Aug 2004 – July 2005)

– 115 queries / trace

– A query is considered a repeat when separated by > 30 minutes

• 119 volunteers in a controlled experiment

– Users are asked to repeat one query made 30 mins to 1 hour ago
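The 30-minute rule for the log data can be illustrated with a minimal sketch. The trace format (a time-sorted list of `(timestamp, query)` tuples), the function name `find_repeats`, and the choice to compare each query only with its most recent earlier occurrence are all assumptions made for illustration; the slides do not describe how the rule was implemented.

```python
from datetime import datetime, timedelta

# Threshold from the slide: identical queries count as a repeat
# only when they are more than 30 minutes apart.
REPEAT_GAP = timedelta(minutes=30)


def find_repeats(trace):
    """Return (query, earlier, later) triples for repeat queries.

    `trace` is a hypothetical list of (timestamp, query) tuples for one
    user, sorted by time. Each query is compared with its most recent
    earlier occurrence (an assumption, not stated in the source).
    """
    last_seen = {}
    repeats = []
    for ts, query in trace:
        prev = last_seen.get(query)
        if prev is not None and ts - prev > REPEAT_GAP:
            repeats.append((query, prev, ts))
        last_seen[query] = ts
    return repeats
```

Under these assumptions, a query re-issued 10 minutes after its previous occurrence is ignored, while one re-issued 50 minutes later is recorded as a repeat.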

Page 4: Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Techniques used

• Normalizing query terms

– Capitalization, stop word removal, duplicate word removal, extra white space, stemming

– Word order (e.g. “new york department of state” and “department of state new york”)

– Non-alphanumerics (e.g. “sub-urban” vs “sub urban”)

– Word merge (e.g. “wal mart” vs “walmart”)

– Domain (e.g. hotmail vs hotmail.com)

– Word swap (e.g. “american embassy london” vs “american consulate london”)

• SVM classifier

– Applied to predict whether a result will be clicked again
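Several of the normalization steps listed above can be sketched in a few lines. This is an illustration only: the stop-word list, the domain suffixes, and the function names `normalize` and `same_query` are assumptions, and stemming and word swaps are deliberately left out because they need a stemmer or a synonym resource that the slides do not name.

```python
import re

# Tiny illustrative stop-word list; the source does not say which list was used.
STOP_WORDS = {"of", "the", "a", "an", "in", "for", "and"}


def _words(query):
    """Lowercase, drop domain suffixes and punctuation, remove stop words."""
    text = query.lower()                          # capitalization
    text = re.sub(r"\.(com|edu|net)\b", "", text) # domain, e.g. hotmail.com
    text = re.sub(r"[^a-z0-9\s]", " ", text)      # non-alphanumerics
    return [w for w in text.split() if w not in STOP_WORDS]


def normalize(query):
    """Canonical key that ignores duplicate words and word order."""
    return " ".join(sorted(set(_words(query))))


def same_query(a, b):
    """True if two queries match after normalization or after word merge."""
    if normalize(a) == normalize(b):
        return True
    # Word merge: compare the words joined without spaces ("wal mart").
    return "".join(_words(a)) == "".join(_words(b))
```

With this sketch, `same_query("wal mart", "walmart")` and `same_query("hotmail", "hotmail.com")` both hold, while the word-swap pair “american embassy london” / “american consulate london” is still treated as different.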

Page 5: Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Discovery

• Navigational queries are one major type of information re-finding

– Bank, news, mail

– .com, .edu, .net

• Rank changes affect re-finding

Page 6: Information Re-Retrieval: Repeat Queries in Yahoo’s Logs

Discovery

• Memory fades

– Controlled experiment: 30% of queries are mis-remembered (36/119); 27 out of 36 are equivalent after normalization

– Yahoo Logs

• Indicators of repeat click

– # clicks in first query

– # clicks in previous query

– # unique clicks in previous query
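The three indicators above could be computed from click data roughly as follows. The input format (lists of clicked URLs) and the name `repeat_click_features` are hypothetical; the slides do not say how the features were encoded before being passed to the classifier.

```python
def repeat_click_features(first_clicks, previous_clicks):
    """Build the three indicator values named on the slide.

    Both arguments are hypothetical lists of clicked URLs: one for the
    first time the query was issued, one for the most recent earlier issue.
    """
    return {
        "clicks_first_query": len(first_clicks),
        "clicks_previous_query": len(previous_clicks),
        "unique_clicks_previous_query": len(set(previous_clicks)),
    }
```

A dictionary like this is one possible input row for a classifier such as the SVM mentioned earlier.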