
CrowdQ: A Search Engine with Crowdsourced Query Understanding
Daniel Bruckner, Daniel Haas, Jonathan Harper

http://ec2-50-16-103-42.compute-1.amazonaws.com:8001/

MOTIVATION

QUERY TEMPLATES

EVALUATION

1-HOP SEMANTICS

CROWD INTERFACE

ARCHITECTURE

WEB INTERFACE

FUTURE WORK

[Figure: CrowdQ architecture. On-line complex query processing: Keyword Query, complex query classifier, vertical selection / unstructured search / ..., POS + NER tagging, match with existing query templates, Query Template Index, Structured Query, Structured LOD Search over the LOD Open Data Cloud, Result Joiner. Off-line complex query decomposition: Query Log, Crowd Manager, Crowdsourcing Platform, Template Generation (templates t1, t2, t3), queries, templates + answer types.]
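The on-line path in the architecture can be sketched as a router: a query that matches an existing template is answered by structured search, and anything else falls back to unstructured search. This is a minimal sketch under stated assumptions: the function names (`match_template`, `structured_search`, `unstructured_search`, `answer_query`) and the toy dictionaries are hypothetical stand-ins for the Query Template Index and the LOD cloud, and the complex query classifier is reduced to "does the query match a template?".

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data standing in for the Query Template Index and the LOD cloud.
# A template maps (relation phrase, entity type) to a structured predicate.
TEMPLATE_INDEX: Dict[Tuple[str, str], str] = {("capital of", "PoliticalEntity"): "capital"}
ENTITY_TYPES: Dict[str, str] = {"Canada": "PoliticalEntity", "Peru": "PoliticalEntity"}
LOD: Dict[Tuple[str, str], List[str]] = {
    ("Canada", "capital"): ["Ottawa"],
    ("Peru", "capital"): ["Lima"],
}


def match_template(query: str) -> Optional[Tuple[str, str]]:
    """Stand-in for POS + NER tagging plus template matching.

    Looks for a known entity at the end of the query and checks whether the
    remaining phrase plus that entity's type is in the template index.
    Returns (entity, predicate) on a match, else None.
    """
    for entity, etype in ENTITY_TYPES.items():
        if query.endswith(entity):
            phrase = query[: -len(entity)].strip()
            predicate = TEMPLATE_INDEX.get((phrase, etype))
            if predicate is not None:
                return entity, predicate
    return None


def structured_search(entity: str, predicate: str) -> List[str]:
    # Stand-in for Structured LOD Search: a lookup in the toy triple store.
    return LOD.get((entity, predicate), [])


def unstructured_search(query: str) -> List[str]:
    # Stand-in for vertical selection / unstructured web search.
    return [f"web results for '{query}'"]


def answer_query(query: str) -> List[str]:
    """Route a keyword query: template match -> structured answers, else fallback."""
    match = match_template(query)
    if match is None:  # not recognized as a templated complex query
        return unstructured_search(query)
    answers = structured_search(*match)
    # If the structured lookup finds nothing, fall back to web results.
    return answers or unstructured_search(query)
```

For example, `answer_query("capital of Peru")` returns `["Lima"]`, while `answer_query("cheap flights")` takes the unstructured path.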

• Templates represent many similar queries.
• Given a 1-Hop query, we generalize it by abstracting the source entity.
• Example: the 1-Hop for “capital of Canada” uses “Canada” as its source node. We generalize “Canada” to <political entity>, and can now answer queries about the capital of any political entity.
• Challenge: Correctness of templates is hard to ensure. Often templates are too broad or too specific.
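The generalization step can be sketched as replacing a 1-Hop's concrete source entity with its type. This assumes a hypothetical `ENTITY_TYPE` lookup (a real system would take types from the data set) and a simplified template with only a relation and a source type; neither is taken from the CrowdQ implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical type assignments; a real system would read these from the data set.
ENTITY_TYPE: Dict[str, str] = {
    "Canada": "PoliticalEntity",
    "Peru": "PoliticalEntity",
    "Ottawa": "City",
}


@dataclass(frozen=True)
class Template:
    relation: str     # the semantic jump, e.g. "capital"
    source_type: str  # abstracted source entity, e.g. "PoliticalEntity"

    def applies_to(self, entity: str) -> bool:
        # A template can answer a query about any entity of its source type.
        return ENTITY_TYPE.get(entity) == self.source_type


def generalize(source: str, relation: str) -> Template:
    """Abstract the concrete source entity of a 1-Hop into its type."""
    return Template(relation=relation, source_type=ENTITY_TYPE[source])


def covered_entities(template: Template) -> List[str]:
    # All entities the template would answer queries about (its generality).
    return sorted(e for e in ENTITY_TYPE if template.applies_to(e))
```

Here `generalize("Canada", "capital")` yields a template that also applies to "Peru" but not to "Ottawa". The coarser the type, the more entities a template covers, which is where the "too broad or too specific" problem comes from.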

Search engines have begun providing direct answers to web search queries, but there is a long tail of less common queries that cannot be answered this way.

• 97% of unique queries occur 10 or fewer times.
• State-of-the-art NLP techniques are not reliable enough to answer these queries.
• Crowds have been used to gather answers, but this approach is expensive, on the order of $0.50 per query.
• Meanwhile, large open data sets like DBpedia already contain many answers, but crowd input is needed to understand queries and map them onto these databases.

Challenge: Understanding arbitrary query semantics is hard.
Solution: Focus on a subset of queries with common semantics.

Example relationship extraction HIT interface.

Our search engine UI. Results (center) are not web pages but direct answers. Structured data about the query is shown at left, and alternative interpretations of the query are displayed at right, as a fallback. The UI achieves interactive latencies.

Key Abstraction: a 1-Hop encapsulates a single semantic jump, e.g., “Beatles live albums” or “capital of Canada”.
• Source: a known named entity in the query (“Beatles”)
• Answer: an entity linked directly to the source (an album)
• Filter: a predicate the answer must match (the “live album” type)
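A 1-Hop can be read as a small query over a graph of triples: follow one edge from the source, then keep only the answers that pass the filter. The sketch below assumes an in-memory list of (subject, predicate, object) triples with invented predicate names (`album`, `type`); it illustrates the abstraction and is not CrowdQ's query format.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical toy graph with invented predicate names.
GRAPH: List[Triple] = [
    ("Beatles", "album", "Abbey Road"),
    ("Beatles", "album", "Live at the BBC"),
    ("Live at the BBC", "type", "live album"),
    ("Abbey Road", "type", "studio album"),
]


@dataclass(frozen=True)
class OneHop:
    source: str       # known named entity in the query, e.g. "Beatles"
    relation: str     # edge linking the source directly to the answer
    filter_type: str  # predicate the answer must match, e.g. "live album"

    def evaluate(self, graph: List[Triple]) -> Set[str]:
        # Answers: entities exactly one edge away from the source...
        candidates = {o for s, p, o in graph if s == self.source and p == self.relation}
        # ...that also satisfy the filter (here, a type assertion in the graph).
        return {c for c in candidates if (c, "type", self.filter_type) in graph}
```

With this data, `OneHop("Beatles", "album", "live album").evaluate(GRAPH)` returns `{"Live at the BBC"}`.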

Answer candidate graphs are used to generate English sentences. Mechanical Turk assignments are generated to ask the crowd for the best query interpretation.
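Turning candidate interpretations into sentences the crowd can choose between might look like the following. The sentence pattern, the `Candidate` fields, and the "None of the above" option are assumptions for illustration; the poster does not specify how sentences are phrased.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    source: str       # source entity of the candidate 1-Hop
    relation: str     # human-readable label of the connecting edge
    answer_type: str  # type the answers must have


def to_sentence(c: Candidate) -> str:
    # Render one candidate answer graph as a plain English interpretation.
    return f"Find every {c.answer_type} that is the {c.relation} of {c.source}."


def build_hit(query: str, candidates: List[Candidate]) -> str:
    """Build the text of a single-select question for one keyword query."""
    lines = [f'Which sentence best matches the search "{query}"?']
    for i, c in enumerate(candidates, start=1):
        lines.append(f"{i}. {to_sentence(c)}")
    # Hypothetical escape option for when no candidate is right.
    lines.append(f"{len(candidates) + 1}. None of the above")
    return "\n".join(lines)
```

Calling `build_hit("capital of Canada", [Candidate("Canada", "capital", "city")])` produces a question whose first option reads "1. Find every city that is the capital of Canada.", followed by "2. None of the above".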

Data. DBpedia (general, dirty) and MusicBrainz (narrow, clean). Interestingly, it is easier to produce templates for dirtier data sets, but then templates are less general.

Queries. 100+ queries from the QALD-2 benchmark. The 1-Hop abstraction applies to the majority of QALD queries (29 DBpedia, 73 MusicBrainz).

Candidate generation. Text search on DBpedia finds candidate 1-Hops for 62% of test queries.

Crowd Efficiency. How efficient is the crowd? We posted 252 tasks on Mechanical Turk, costing $0.84 per template. The crowd was 66.7% accurate in answering keyword queries. We evaluated two interfaces, multi-select and single-select, and found that accuracy was the same in both.

Template Coverage. How useful are our templates?

Template Performance. Do we achieve interactive latencies?

[Figure: Distribution of template generality. x-axis: Relevant Entities in Template (lower bound shown), log scale from 1 to 100,000; y-axis: Fraction of Templates.]

Example Templates
We measure generality by how many source entities their 1-Hops match (distribution at right).

Query                       Size      Comment
Actors in <Top Gun>         8,642     Good!
<Maribor> population        164,329   Great!
members of <The Prodigy>    3         Too narrow
German Shepherd breeds      659,430   Too general
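The generality distribution bins templates by how many source entities they match, on a log scale from 1 to 100,000. A sketch of that binning, applied to the four example template sizes above; the decade-wide bins are an assumption about how the plot groups templates, not something the poster states.

```python
from collections import Counter
from typing import Dict, Iterable


def decade_histogram(sizes: Iterable[int]) -> Dict[int, float]:
    """Fraction of templates whose size falls in each bin [10^k, 10^(k+1))."""
    positive = [n for n in sizes if n > 0]
    # The decade index of a positive integer is its digit count minus one,
    # e.g. 3 -> 0, 8642 -> 3, 164329 -> 5 (exact, no floating-point log).
    counts = Counter(len(str(n)) - 1 for n in positive)
    # Key each bin by its lower bound 10^k.
    return {10 ** k: counts[k] / len(positive) for k in sorted(counts)}


example_sizes = [8642, 164329, 3, 659430]  # sizes from the Example Templates table
```

For the table's four templates, `decade_histogram(example_sizes)` gives `{1: 0.25, 1000: 0.25, 100000: 0.5}`.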

• Improve candidate template generation with NLP tools like stemmers and WordNet
• Extend the 1-Hop abstraction to support more complex queries
• Augment quality controls for data and templates, e.g., by adding verification to the crowd pipeline
• Build an ML model to enable complex template matching
• Optimize the crowd interface performance and apply it to additional sub-problems
• Run on larger query logs (requires entity extraction!)

Charts at right show the latency distribution for a randomized 10K-query benchmark. The client was local in the left chart. Average latency for local requests is 26 ms, and the maximum observed is 240 ms.
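A benchmark of this shape can be sketched as timing a randomized sample of queries and summarizing the mean and maximum latency. The `answer` callable and the query list are hypothetical placeholders; the 26 ms and 240 ms figures come from the poster's own runs, not from this code.

```python
import random
import time
from typing import Callable, List, Tuple


def latency_benchmark(
    answer: Callable[[str], object],
    queries: List[str],
    n: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float, List[float]]:
    """Time n randomly sampled queries (n > 0); return (mean_ms, max_ms, all_ms)."""
    rng = random.Random(seed)  # fixed seed makes the query sample reproducible
    latencies_ms: List[float] = []
    for _ in range(n):
        q = rng.choice(queries)  # sample with replacement
        start = time.perf_counter()
        answer(q)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return sum(latencies_ms) / n, max(latencies_ms), latencies_ms
```

The returned list holds the full distribution, so it can be plotted as in the charts rather than reduced to the two summary numbers.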
