
CrowdQ: A Search Engine with Crowdsourced Query Understanding
Daniel Bruckner, Daniel Haas, Jonathan Harper

http://ec2-50-16-103-42.compute-1.amazonaws.com:8001/

MOTIVATION

QUERY TEMPLATES

EVALUATION

1-HOP SEMANTICS

CROWD INTERFACE

ARCHITECTURE

WEB INTERFACE

FUTURE WORK

[Architecture diagram. On-line: a user's keyword query passes through a complex query classifier (POS + NER tagging); complex queries are matched against the Query Template Index, the matched template and answer types drive a structured LOD search over the Linked Open Data cloud, and a result joiner composes the answers into the SERP. Queries without a matching template fall back to vertical selection and unstructured search. Off-line: queries from the query log go through complex query decomposition and template generation, with a Crowd Manager posting tasks to a crowdsourcing platform. A rough code sketch of the on-line flow follows.]

•  Templates represent many similar queries.
•  Given a 1-Hop query, we generalize it by abstracting the source entity.
•  Example: the 1-Hop for "capital of Canada" uses "Canada" as its source node. We generalize "Canada" to <political entity>, and can now answer queries about the capital of any political entity (sketched below).
•  Challenge: Correctness of templates is hard to ensure. Often templates are too broad or too specific.
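A minimal sketch of the generalization step described above; the dataclasses and prefixed DBpedia names are illustrative choices, not the system's actual representation.

```python
# Illustrative sketch: generalizing a concrete 1-Hop into a reusable template
# by abstracting its source entity to a class (identifiers are examples only).
from dataclasses import dataclass

@dataclass
class OneHop:
    source: str       # concrete named entity, e.g. dbpedia:Canada
    predicate: str    # linking property, e.g. dbo:capital
    answer_type: str  # type the answer must match, e.g. dbo:City

@dataclass
class Template:
    source_class: str  # generalized class of the source, e.g. dbo:Country
    predicate: str
    answer_type: str

    def instantiate(self, source: str) -> OneHop:
        # Answer the same kind of question for any entity of the source class.
        return OneHop(source, self.predicate, self.answer_type)

def generalize(hop: OneHop, source_class: str) -> Template:
    return Template(source_class, hop.predicate, hop.answer_type)

# "capital of Canada" -> "capital of <political entity>" -> "capital of France"
canada = OneHop("dbpedia:Canada", "dbo:capital", "dbo:City")
template = generalize(canada, "dbo:Country")
print(template.instantiate("dbpedia:France"))
```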

Search engines have begun providing direct answers to web search queries, but there is a long tail of less common queries that cannot be answered this way.

•  97% of unique queries occur 10 or fewer times.
•  State-of-the-art NLP techniques are not reliable enough to answer these queries.
•  Crowds have been used to gather answers, but this approach is expensive, on the order of $0.50 per query.
•  Meanwhile, large open data sets like DBpedia already contain many answers, but crowd input is needed to understand queries and map them onto these databases.

Challenge: Understanding arbitrary query semantics is hard.
Solution: Focus on a subset of queries with common semantics.

Example relationship extraction HIT interface.

Our search engine UI. Results (center) are not web pages but direct answers. Structured data about the query is shown at left, and alternative interpretations of the query are displayed at right as a fallback. The UI achieves interactive latencies.

Key Abstraction: a 1-Hop encapsulates a single semantic jump, e.g., "Beatles live albums" or "capital of Canada".

•  Source: a known named entity in the query ("Beatles")
•  Answer: an entity linked directly to the source (an album)
•  Filter: a predicate the answer must match (the "live album" type)
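A minimal sketch of how a 1-Hop's source, answer, and filter might map onto a SPARQL pattern; the property and class URIs are illustrative, not CrowdQ's actual mapping.

```python
# Illustrative only: compiling a 1-Hop (source, answer, filter) into SPARQL.
def one_hop_to_sparql(source_uri: str, predicate_uri: str, filter_class_uri: str) -> str:
    return (
        "SELECT ?answer WHERE {\n"
        f"  ?answer <{predicate_uri}> <{source_uri}> .\n"  # answer linked directly to the source
        f"  ?answer a <{filter_class_uri}> .\n"            # filter: type the answer must match
        "}"
    )

# "Beatles live albums": source = The Beatles, answer = an album, filter = its type.
print(one_hop_to_sparql(
    "http://dbpedia.org/resource/The_Beatles",
    "http://dbpedia.org/ontology/artist",
    "http://dbpedia.org/ontology/Album",   # a "live album" filter would be more specific
))
```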

Answer candidate graphs are used to generate English sentences. Mechanical Turk assignments are generated to ask the crowd for the best query interpretation.
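As a hedged illustration of this step, the sketch below verbalizes candidate 1-Hop interpretations into English options a worker could choose among; the phrasing and helper names are hypothetical, not the actual HIT template.

```python
# Hypothetical sketch: rendering candidate interpretations of a keyword query as
# English sentences for a "pick the best interpretation" Mechanical Turk task.
def verbalize(source_label: str, predicate_label: str, answer_type_label: str) -> str:
    return f"Find every {answer_type_label} whose {predicate_label} is {source_label}."

query = "Beatles live albums"
candidates = [
    ("The Beatles", "artist", "live album"),
    ("The Beatles", "producer", "album"),
]

print(f'Which reading best matches the query "{query}"?')
for i, (src, pred, ans) in enumerate(candidates, start=1):
    print(f"  {i}. {verbalize(src, pred, ans)}")
```

In the evaluation, such options were presented in both multi-select and single-select layouts, with no measured difference in accuracy.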

Data. DBpedia (general, dirty) and MusicBrainz (narrow, clean). Interestingly, it is easier to produce templates for dirtier data sets, but those templates are less general.

Queries. 100+ queries from the QALD-2 benchmark. The 1-Hop abstraction applies to the majority of QALD queries (29 DBpedia, 73 MusicBrainz).

Candidate generation. Text search on DBpedia finds candidate 1-Hops for 62% of test queries.

Crowd Efficiency. How efficient is the crowd? We posted 252 tasks on Mechanical Turk, costing $0.84 per template. The crowd was 66.7% accurate in answering keyword queries. We evaluated two interfaces, multi-select and single-select, and found that accuracy was the same in both.

Template Coverage. How useful are our templates? (See the distribution and example templates below.)

Template Performance. Do we achieve interactive latencies? (See the latency results below.)

[Histogram: fraction of templates (y-axis, 0 to 0.3) vs. relevant entities in template, lower bound shown (x-axis, 1 to 100,000, log scale).]

Example Templates. We measure a template's generality by how many source entities its 1-Hop matches (distribution above; a count sketch follows the table below).

Query                       Size     Comment
Actors in <Top Gun>         8,642    Good!
<Maribor> population        164,329  Great!
members of <The Prodigy>    3        Too narrow
German Shepherd breeds      659,430  Too general
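The "relevant entities" lower bound for a template can be thought of as a count over its source class; here is one way it might be computed for the "capital of <political entity>" template, assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper library (the poster does not specify how the counts were actually produced).

```python
# Illustrative: lower-bound a template's generality by counting source entities
# of its class that carry the template's predicate (here, countries with a capital).
from SPARQLWrapper import SPARQLWrapper, JSON

COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?source) AS ?n) WHERE {
  ?source a <http://dbpedia.org/ontology/Country> .
  ?source <http://dbpedia.org/ontology/capital> ?answer .
}
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(COUNT_QUERY)
sparql.setReturnFormat(JSON)
result = sparql.query().convert()
print("entities matched by template:", result["results"]["bindings"][0]["n"]["value"])
```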

•  Improve candidate template generation with NLP tools like stemmers and WordNet
•  Extend the 1-Hop abstraction to support more complex queries
•  Augment quality controls for data and templates, e.g., by adding verification to the crowd pipeline
•  Build an ML model to enable complex template matching
•  Optimize the crowd interface performance and apply it to additional sub-problems
•  Run on larger query logs (requires entity extraction!)

[Latency charts (not reproduced) show the latency distribution for a randomized 10K-query benchmark; in the left chart the client was local.] Average latency for local requests is 26 ms and the maximum observed is 240 ms.