Intent Mining from Search Results

Post on 22-Feb-2016


Jan Pedersen

Outline

• Intro to Web Search
  – Free text queries
  – Architecture
  – Why it works

• Result Set Mining
  – Disambiguation
  – Correction
  – Amplification

The Worst Interface (ca 1990)

The Search Interface (ca 2010)

Search wasn’t always like this

ttl/(tennis and (racquet or racket))
isd/1/8/2002 and motorcycle
in/newmar-julie

Source: USPTO

Salton’s Contribution

Source: cs.cornell.edu

• Free text queries
• Approximate matching
• Relevance ranking

• Exploit redundancy
• Meta data
• Scored-OR

Life of a query

Gerry Salton

(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))

Index

• Separation between user query and backend query
• Relevance scoring and ranking
• Query-in-context summaries
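The backend query on this slide can be read as a weighted OR over term clauses. The following is a minimal sketch of that reading, not the engine's actual operator: it assumes the leading 10 is a result limit, that each clause contributes its weight once when any of its alternatives matches, and it uses a made-up toy index (`INDEX`) and function name (`scored_or`).

```python
from collections import defaultdict

# Toy inverted index: term -> set of document ids (hypothetical data).
INDEX = {
    "gerry": {1, 2},
    "gerald": {3},
    "salton": {1, 3, 4},
}


def scored_or(k, clauses, index=INDEX):
    """Return the top-k (doc_id, score) pairs for a weighted OR query.

    Each clause is (alternatives, weight). A document earns the weight
    once if it contains any of the alternatives (assumed semantics).
    """
    scores = defaultdict(float)
    for alternatives, weight in clauses:
        matched = set()
        for term in alternatives:
            matched |= index.get(term, set())
        for doc_id in matched:
            scores[doc_id] += weight
    # Highest score first; ties broken by document id for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# User query "Gerry Salton" rewritten into the backend form from the slide.
backend_query = [(("gerry", "gerald"), 0.3), (("salton",), 0.7)]
results = scored_or(10, backend_query)
```

Documents matching both clauses rank first, a document matching only "Salton" outranks one matching only "Gerry", and nothing is excluded for missing a term. That is the approximate-matching behavior the slide attributes to free text search.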

Why Does it Work?

Semantic Meta-Data

Segment                      Tail   Overall
All Queries                  100%   100%
Word Count > 4               41%    20%
Misspelled                   21%    11%
Perfect Matches Popularity   28%    54%
Partial Matches Popularity   45%    28%
No Matches Popularity        9%     7%

RESULT SET MINING

Query Expansion

• [Gerry Salton] → [Gerry Salton Cornell]
• Disambiguation via Expansion
• Pseudo Relevance Feedback (Evans)

Life of a query (2)

Gerry Salton

(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))

Index

Gerry Salton → Gerry Salton Cornell

• Result Set Analysis
• Automated Query expansion
• Reranking
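One way to picture the expansion step is pseudo relevance feedback: treat the top results of the original query as relevant and add the term they share most often. The sketch below assumes plain whitespace tokenization and document-frequency counting. The function name `expand_query` and the sample snippets are invented for illustration, and a real system would also filter stop words and weight terms.

```python
from collections import Counter


def expand_query(query, top_results, n_terms=1):
    """Append the n_terms most widely shared non-query terms from top_results."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in top_results:
        # Count each term once per result so one long page cannot dominate.
        for term in set(text.lower().split()):
            if term not in query_terms:
                counts[term] += 1
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return query.split() + expansion


# Hypothetical snippets returned for the first-pass query.
top_results = [
    "Gerry Salton Cornell professor",
    "Salton Cornell information retrieval",
    "Gerry Salton SMART system Cornell",
]
expanded = expand_query("Gerry Salton", top_results)
```

Here "cornell" is the only non-query term present in all three snippets, so it becomes the expansion term, mirroring the [Gerry Salton Cornell] example. The expanded query is then issued again and its results are reranked.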

Spelling Correction

• Session Log Mining
• Multiple queries with Blending
• Behavioral feedback loop

Blend(Scored-AND(200, “britinay”, “spares”), Scored-AND(200, “britney”, “spears”))

Scored-AND(200, OR(“britinay”, “britney”), OR(“spares”, “spears”))
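The two expressions above are not equivalent, and a toy index makes the difference visible. This sketch only models which documents match. It ignores scoring and the 200 parameter, whose meaning the slides do not spell out, and the index contents and helper names (`postings`, `scored_and`, `blend`) are assumptions.

```python
# Toy inverted index: term -> set of document ids (hypothetical data).
INDEX = {
    "britinay": {1},
    "britney": {2, 3, 4},
    "spares": {1, 4, 5},
    "spears": {2, 3},
}


def postings(term):
    """Documents containing the term, or an empty set if it is unknown."""
    return INDEX.get(term, set())


def scored_and(*clauses):
    """Documents matching every clause.

    A clause is either a single term or a tuple of alternatives (an OR).
    """
    result = None
    for clause in clauses:
        alternatives = (clause,) if isinstance(clause, str) else clause
        matched = set().union(*(postings(term) for term in alternatives))
        result = matched if result is None else result & matched
    return result or set()


def blend(*result_sets):
    """Merge the results of separately evaluated queries."""
    return set().union(*result_sets)


# Strategy 1: run the original and corrected queries, then blend.
blended = blend(scored_and("britinay", "spares"), scored_and("britney", "spears"))

# Strategy 2: one query with per-term alternatives.
rewritten = scored_and(("britinay", "britney"), ("spares", "spears"))
```

With this data the blended form returns documents 1, 2 and 3, while the single rewritten query also admits document 4, which pairs "britney" with "spares". Blending keeps each spelling hypothesis intact. The per-term OR is a single query but lets mixed combinations through.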

Web Search

Gerry Salton

• Speller
• Synonyms

Index

First Stage reRanking: 100K

(Scored-AND 200, “Gerry”, “Salton”)

Index Index Index Index Index: 100B

Local, News

Second Stage reRanking: 5K

Third Stage reRanking: 50

• Query Understanding
• Federation
• ReRanking and Blending

• Entity Detection
• Grouping
• Summarization
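The staged reranking in this diagram (100K, then 5K, then 50 candidates) follows a cascade pattern: cheap scoring over many candidates, costlier scoring over the survivors. A minimal sketch, with sizes scaled down and with invented feature names (`cheap`, `costly`) and scorers standing in for whatever the real stages compute:

```python
def cascade(candidates, stages):
    """Apply (scorer, keep) stages in order, re-sorting and truncating each time."""
    for scorer, keep in stages:
        candidates = sorted(candidates, key=scorer, reverse=True)[:keep]
    return candidates


# Hypothetical candidate pool with two precomputed features per document.
docs = [{"id": i, "cheap": i % 100, "costly": (i * 7) % 13} for i in range(1000)]

stages = [
    (lambda d: d["cheap"], 100),              # first stage: cheap signal, wide cut
    (lambda d: d["cheap"] + d["costly"], 20), # second stage: richer signal
    (lambda d: d["costly"], 5),               # third stage: final ordering
]
final = cascade(docs, stages)
```

Each stage only sees what the previous one kept, so the expensive scorers run on a small fraction of the pool. The trade-off is that a document cut early cannot be recovered later, however well a later stage would have scored it.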

Post Result Triggering

• Alternative to Answer Blending
• Structured Data integration
• Off-page data joins

Grouping

• Reranked Results
• Compressed Presentation
• Coherently grouped
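Grouping can be sketched as bucketing an already ranked list while preserving rank order, both across groups and within each group. The slides do not say what the grouping criterion is. The example below assumes grouping by host name, and the URLs and the function name `group_results` are made up.

```python
from urllib.parse import urlparse


def group_results(ranked, key):
    """Bucket ranked items by key(item), keeping first-appearance order."""
    groups = {}
    for item in ranked:
        # dict preserves insertion order, so groups follow their best-ranked member.
        groups.setdefault(key(item), []).append(item)
    return list(groups.items())


# Hypothetical ranked result URLs.
ranked = [
    "https://a.example/1",
    "https://b.example/1",
    "https://a.example/2",
]
grouped = group_results(ranked, key=lambda url: urlparse(url).netloc)
```

The third result is folded into the first group, which gives the compressed presentation the slide describes: fewer top-level entries, each one coherent.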

Summary

• Web Queries are not User Intent
  – Suffer from ambiguity and errors

• Intent can be mined from results
  – Query Correction
  – Disambiguation
  – Grouping and Organization