Intent Mining from Search Results
Jan Pedersen
Outline
• Intro to Web Search
  – Free text queries
  – Architecture
  – Why it works
• Result Set Mining
  – Disambiguation
  – Correction
  – Amplification
The Worst Interface (ca 1990)
The Search Interface (ca 2010)
Search wasn’t always like this
ttl/(tennis and (racquet or racket))
isd/1/8/2002 and motorcycle
in/newmar-julie
Source: USPTO
Salton’s Contribution
Source: cs.cornell.edu
• Free text queries
• Approximate matching
• Relevance ranking
• Exploit redundancy
• Meta data
• Scored-OR
Life of a query
Gerry Salton
(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))
Index
• Separation between user query and backend query
• Relevance scoring and ranking
• Query-in-context summaries
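The backend query above can be read as a weighted OR over term groups, returning the 10 best-scoring documents. Below is a minimal sketch under that reading. The scoring rule (sum of the weights of matched groups), the function name `scored_or`, and the toy documents are assumptions for illustration, not the production operator.

```python
# Backend query for [Gerry Salton]: each entry is (alternative terms, weight).
QUERY = [({"gerry", "gerald"}, 0.3), ({"salton"}, 0.7)]


def scored_or(docs: dict[str, str], query, limit: int = 10) -> list[tuple[str, float]]:
    """Return up to `limit` (doc_id, score) pairs, best first.

    Assumed scoring rule: a document scores the sum of the weights of every
    term group it matches; documents matching no group are dropped.
    """
    scored = []
    for doc_id, text in docs.items():
        tokens = set(text.lower().split())
        score = sum(weight for terms, weight in query if terms & tokens)
        if score > 0:
            scored.append((doc_id, score))
    # Sort by descending score, then by id so ties are deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:limit]


# Toy corpus (hypothetical documents).
DOCS = {
    "d1": "Gerald Salton pioneered the vector space model",
    "d2": "Salton Sea travel guide",
    "d3": "Gerry Mulligan discography",
    "d4": "tennis racquet reviews",
}
RESULTS = scored_or(DOCS, QUERY)
```

In this toy corpus, d1 matches both groups and ranks first, d2 and d3 match one group each, and d4 is dropped: approximate matching with relevance ranking rather than a strict Boolean filter.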
Why Does it Work?
Semantic Meta-Data
Segment                        Tail    Overall
All Queries                    100%    100%
Word Count > 4                 41%     20%
Misspelled                     21%     11%
Perfect Matches Popularity     28%     54%
Partial Matches Popularity     45%     28%
No Matches Popularity          9%      7%
RESULT SET MINING
Query Expansion
• [Gerry Salton] → [Gerry Salton Cornell]
• Disambiguation via Expansion
• Pseudo Relevance Feedback (Evans)
Life of a query (2)
Gerry Salton
(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))
Index
Gerry Salton → Gerry Salton Cornell
• Result Set Analysis
• Automated Query expansion
• Reranking
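One way to picture automated expansion from the result set is pseudo relevance feedback: treat the top results as relevant and add the terms they share. This is a minimal sketch. The stopword list, the document-frequency term selection, and the name `expand_query` are assumptions, not the method used in production.

```python
from collections import Counter

# Tiny illustrative stopword list (assumption).
STOPWORDS = {"the", "of", "at", "a", "and", "in", "was"}


def expand_query(query: str, top_results: list[str], n_terms: int = 1) -> str:
    """Append the `n_terms` terms that occur in the most top results."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in top_results:
        # Count each term once per result (document frequency in the result set).
        for term in set(text.lower().split()):
            if term not in query_terms and term not in STOPWORDS:
                counts[term] += 1
    # Highest count first; alphabetical order breaks ties deterministically.
    ranked = sorted(counts, key=lambda term: (-counts[term], term))
    return " ".join([query] + ranked[:n_terms])


# Hypothetical top results for the first-pass query.
TOP = [
    "Gerry Salton professor at Cornell",
    "Salton Cornell information retrieval",
    "Gerry Salton SMART system Cornell",
]
EXPANDED = expand_query("Gerry Salton", TOP)  # "Gerry Salton cornell"
```

The expanded query is then reissued against the index, and the second result set can be used for reranking.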
Spelling Correction
• Session Log Mining
• Multiple queries with Blending
• Behavioral feedback loop
Blend(Scored-AND(200, “britinay”, “spares”), Scored-AND(200, “britney”, “spears”))
Scored-AND(200, OR(“britinay”, “britney”), OR(“spares”, “spears”))
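The second expression above folds the original and corrected spellings into one backend query with a per-position OR. Here is a minimal sketch of that rewrite. The helper name `merge_variants` and the requirement that both variants have the same number of tokens are assumptions made for illustration.

```python
def merge_variants(original: list[str], corrected: list[str], limit: int = 200) -> str:
    """Fold two position-aligned token lists into one Scored-AND with ORs."""
    if len(original) != len(corrected):
        raise ValueError("this sketch only handles position-aligned variants")
    parts = []
    for a, b in zip(original, corrected):
        # Identical tokens need no OR group.
        parts.append(f'"{a}"' if a == b else f'OR("{a}", "{b}")')
    return f"Scored-AND({limit}, {', '.join(parts)})"


MERGED = merge_variants(["britinay", "spares"], ["britney", "spears"])
# 'Scored-AND(200, OR("britinay", "britney"), OR("spares", "spears"))'
```

Note that the merged form also matches mixed combinations such as "britinay spears", which the Blend of two separate Scored-AND queries does not retrieve.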
Web Search
Gerry Salton
• Speller
• Synonyms
Index
First Stage reRanking: 100K
(Scored-AND 200, “Gerry”, “Salton”)
Index (×5): 100B
Local, News
Second Stage reRanking: 5K
Third Stage reRanking: 50
• Query Understanding
• Federation
• ReRanking and Blending
• Entity Detection
• Grouping
• Summarization
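The staged reranking above (100K, then 5K, then 50 candidates) can be sketched as a cascade in which each stage rescores only the survivors of the previous one. The function name `cascade` and the toy scorers are assumptions. In practice, later stages would presumably use costlier models on fewer documents.

```python
import heapq
from typing import Callable, Iterable

Scorer = Callable[[str], float]


def cascade(candidates: Iterable[str], stages: list[tuple[Scorer, int]]) -> list[str]:
    """Run candidates through successive (scorer, keep) stages, best first."""
    survivors = list(candidates)
    for scorer, keep in stages:
        # Each stage rescores only what the previous stage kept.
        survivors = heapq.nlargest(keep, survivors, key=scorer)
    return survivors


# Toy scorers standing in for cheap and expensive rankers (hypothetical).
def cheap_score(doc: str) -> float:
    return len(doc)


def costly_score(doc: str) -> float:
    return -ord(doc[0])


DOCS = ["a", "bb", "ccc", "dddd", "eeeee", "ffffff"]
FINAL = cascade(DOCS, [(cheap_score, 4), (costly_score, 2)])  # ["ccc", "dddd"]
```

With the sizes from the slides, the stage list would read `[(first, 100_000), (second, 5_000), (third, 50)]`.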
Post Result Triggering
• Alternative to Answer Blending
• Structured Data integration
• Off-page data joins
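One possible reading of post-result triggering: decide whether to show structured data only after seeing which entity dominates the retrieved results. The sketch below assumes an upstream entity detector has already labelled each result. The threshold `min_share` and the name `post_result_trigger` are hypothetical.

```python
from collections import Counter
from typing import Optional


def post_result_trigger(result_entities: list[Optional[str]],
                        min_share: float = 0.5) -> Optional[str]:
    """Return the entity that dominates the result set, or None if nothing does.

    `result_entities` holds one detected entity (or None) per ranked result.
    """
    found = [entity for entity in result_entities if entity is not None]
    if not found:
        return None
    entity, hits = Counter(found).most_common(1)[0]
    # Trigger only if the entity covers at least `min_share` of all results.
    return entity if hits / len(result_entities) >= min_share else None


TRIGGERED = post_result_trigger(["Gerard Salton", "Gerard Salton", None, "Salton Sea"])
NOT_TRIGGERED = post_result_trigger([None, "a", "b", "c"])
```

The returned entity could then serve as the key for joining structured, off-page data to the result page.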
Grouping
• Reranked Results
• Compressed Presentation
• Coherently grouped
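Grouping can be sketched as bucketing the reranked results by a shared label while preserving rank order. The labels here are hypothetical and would come from some upstream classifier or entity detector. The name `group_results` is likewise an assumption.

```python
def group_results(ranked: list[tuple[str, str]]) -> list[tuple[str, list[str]]]:
    """Group (title, label) pairs by label, keeping rank order within and across groups."""
    groups: dict[str, list[str]] = {}
    for title, label in ranked:
        # dicts preserve insertion order, so a group's position is set by
        # its best-ranked member.
        groups.setdefault(label, []).append(title)
    return list(groups.items())


# Hypothetical ranked results with labels.
RANKED = [
    ("Gerard Salton - Wikipedia", "person"),
    ("Salton Sea", "place"),
    ("Salton's SMART system", "person"),
    ("Salton City", "place"),
]
GROUPED = group_results(RANKED)
```

Four results collapse into two groups, "person" then "place", which is the compressed, coherently grouped presentation the bullets describe.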
Summary
• Web Queries are not User Intent
  – Suffer from ambiguity and errors
• Intent can be mined from results
  – Query Correction
  – Disambiguation
  – Grouping and Organization