
CrowdQ: A Search Engine with Crowdsourced Query Understanding
Daniel Bruckner, Daniel Haas, Jonathan Harper

http://ec2-50-16-103-42.compute-1.amazonaws.com:8001/

MOTIVATION

QUERY TEMPLATES

EVALUATION

1-HOP SEMANTICS

CROWD INTERFACE

ARCHITECTURE

WEB INTERFACE

FUTURE WORK

[Architecture diagram. On-line: a user's keyword query passes through a complex query classifier (POS + NER tagging); complex queries are matched against the Query Template Index, the matched template and answer types drive a structured LOD search over the Linked Open Data cloud, and a result joiner composes the answers into the SERP. Queries without a matching template fall back to vertical selection and unstructured search. Off-line: queries from the query log go through complex query decomposition and template generation, with a Crowd Manager posting tasks to a crowdsourcing platform. A rough code sketch of the on-line flow follows.]

•  Templates represent many similar queries.
•  Given a 1-Hop query, we generalize it by abstracting the source entity.
•  Example: the 1-Hop for "capital of Canada" uses "Canada" as its source node. We generalize "Canada" to <political entity>, and can now answer queries about the capital of any political entity (sketched below).
•  Challenge: Correctness of templates is hard to ensure. Often templates are too broad or too specific.
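A minimal sketch of the generalization step described above; the dataclasses and prefixed DBpedia names are illustrative choices, not the system's actual representation.

```python
# Illustrative sketch: generalizing a concrete 1-Hop into a reusable template
# by abstracting its source entity to a class (identifiers are examples only).
from dataclasses import dataclass

@dataclass
class OneHop:
    source: str       # concrete named entity, e.g. dbpedia:Canada
    predicate: str    # linking property, e.g. dbo:capital
    answer_type: str  # type the answer must match, e.g. dbo:City

@dataclass
class Template:
    source_class: str  # generalized class of the source, e.g. dbo:Country
    predicate: str
    answer_type: str

    def instantiate(self, source: str) -> OneHop:
        # Answer the same kind of question for any entity of the source class.
        return OneHop(source, self.predicate, self.answer_type)

def generalize(hop: OneHop, source_class: str) -> Template:
    return Template(source_class, hop.predicate, hop.answer_type)

# "capital of Canada" -> "capital of <political entity>" -> "capital of France"
canada = OneHop("dbpedia:Canada", "dbo:capital", "dbo:City")
template = generalize(canada, "dbo:Country")
print(template.instantiate("dbpedia:France"))
```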

Search engines have begun providing direct answers to web search queries, but there is a long tail of less common queries that cannot be answered this way.

•  97% of unique queries occur 10 or fewer times.
•  State-of-the-art NLP techniques are not reliable enough to answer these queries.
•  Crowds have been used to gather answers, but this approach is expensive, on the order of $0.50 per query.
•  Meanwhile, large open data sets like DBpedia already contain many answers, but crowd input is needed to understand queries and map them onto these databases.

Challenge: Understanding arbitrary query semantics is hard.
Solution: Focus on a subset of queries with common semantics.

Example relationship extraction HIT interface.

Our search engine UI. Results (center) are not web pages but direct answers. Structured data about the query is shown at left, and alternative interpretations of the query are displayed at right as a fallback. The UI achieves interactive latencies.

Key Abstraction: a 1-Hop encapsulates a single semantic jump, e.g., "Beatles live albums" or "capital of Canada".

•  Source: a known named entity in the query ("Beatles")
•  Answer: an entity linked directly to the source (an album)
•  Filter: a predicate the answer must match (the "live album" type)
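A minimal sketch of how a 1-Hop's source, answer, and filter might map onto a SPARQL pattern; the property and class URIs are illustrative, not CrowdQ's actual mapping.

```python
# Illustrative only: compiling a 1-Hop (source, answer, filter) into SPARQL.
def one_hop_to_sparql(source_uri: str, predicate_uri: str, filter_class_uri: str) -> str:
    return (
        "SELECT ?answer WHERE {\n"
        f"  ?answer <{predicate_uri}> <{source_uri}> .\n"  # answer linked directly to the source
        f"  ?answer a <{filter_class_uri}> .\n"            # filter: type the answer must match
        "}"
    )

# "Beatles live albums": source = The Beatles, answer = an album, filter = its type.
print(one_hop_to_sparql(
    "http://dbpedia.org/resource/The_Beatles",
    "http://dbpedia.org/ontology/artist",
    "http://dbpedia.org/ontology/Album",   # a "live album" filter would be more specific
))
```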

Answer candidate graphs are used to generate English sentences. Mechanical Turk assignments are generated to ask the crowd for the best query interpretation.
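As a hedged illustration of this step, the sketch below verbalizes candidate 1-Hop interpretations into English options a worker could choose among; the phrasing and helper names are hypothetical, not the actual HIT template.

```python
# Hypothetical sketch: rendering candidate interpretations of a keyword query as
# English sentences for a "pick the best interpretation" Mechanical Turk task.
def verbalize(source_label: str, predicate_label: str, answer_type_label: str) -> str:
    return f"Find every {answer_type_label} whose {predicate_label} is {source_label}."

query = "Beatles live albums"
candidates = [
    ("The Beatles", "artist", "live album"),
    ("The Beatles", "producer", "album"),
]

print(f'Which reading best matches the query "{query}"?')
for i, (src, pred, ans) in enumerate(candidates, start=1):
    print(f"  {i}. {verbalize(src, pred, ans)}")
```

In the evaluation, such options were presented in both multi-select and single-select layouts, with no measured difference in accuracy.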

Data. DBpedia (general, dirty) and MusicBrainz (narrow, clean). Interestingly, it is easier to produce templates for dirtier data sets, but those templates are less general.

Queries. 100+ queries from the QALD-2 benchmark. The 1-Hop abstraction applies to the majority of QALD queries (29 DBpedia, 73 MusicBrainz).

Candidate generation. Text search on DBpedia finds candidate 1-Hops for 62% of test queries.

Crowd Efficiency. How efficient is the crowd? We posted 252 tasks on Mechanical Turk, costing $0.84 per template. The crowd was 66.7% accurate in answering keyword queries. We evaluated two interfaces, multi-select and single-select, and found that accuracy was the same in both.

Template Coverage. How useful are our templates? (See the distribution and example templates below.)

Template Performance. Do we achieve interactive latencies? (See the latency results below.)

[Histogram: fraction of templates (y-axis, 0 to 0.3) vs. relevant entities in template, lower bound shown (x-axis, 1 to 100,000, log scale).]

Example Templates. We measure a template's generality by how many source entities its 1-Hop matches (distribution above; a count sketch follows the table below).

Query                       Size     Comment
Actors in <Top Gun>         8,642    Good!
<Maribor> population        164,329  Great!
members of <The Prodigy>    3        Too narrow
German Shepherd breeds      659,430  Too general
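The "relevant entities" lower bound for a template can be thought of as a count over its source class; here is one way it might be computed for the "capital of <political entity>" template, assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper library (the poster does not specify how the counts were actually produced).

```python
# Illustrative: lower-bound a template's generality by counting source entities
# of its class that carry the template's predicate (here, countries with a capital).
from SPARQLWrapper import SPARQLWrapper, JSON

COUNT_QUERY = """
SELECT (COUNT(DISTINCT ?source) AS ?n) WHERE {
  ?source a <http://dbpedia.org/ontology/Country> .
  ?source <http://dbpedia.org/ontology/capital> ?answer .
}
"""

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery(COUNT_QUERY)
sparql.setReturnFormat(JSON)
result = sparql.query().convert()
print("entities matched by template:", result["results"]["bindings"][0]["n"]["value"])
```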

•  Improve candidate template generation with NLP tools like stemmers and WordNet
•  Extend the 1-Hop abstraction to support more complex queries
•  Augment quality controls for data and templates, e.g., by adding verification to the crowd pipeline
•  Build an ML model to enable complex template matching
•  Optimize the crowd interface performance and apply it to additional sub-problems
•  Run on larger query logs (requires entity extraction!)

[Latency charts (not reproduced) show the latency distribution for a randomized 10K-query benchmark; in the left chart the client was local.] Average latency for local requests is 26 ms and the maximum observed is 240 ms.