CrowdQ: Crowdsourced Query Understanding

15
CrowdQ: Crowdsourced Query Understanding Gianluca Demar8ni, Beth Trushkowsky, Tim Kraska, Michael J. Franklin

description

Conference talk at CIDR 2013, January 2013, Asilomar, CA, USA.

Transcript of CrowdQ: Crowdsourced Query Understanding

Page 1: CrowdQ: Crowdsourced Query Understanding

CrowdQ:  Crowdsourced  Query  Understanding    

Gianluca  Demar8ni,  Beth  Trushkowsky,  Tim  Kraska,  Michael  J.  Franklin  

Page 2: CrowdQ: Crowdsourced Query Understanding

Scenario  

Find  the  birthdate  of  the  mayor  of  the  capital  city  of  France    

Gianluca  Demar8ni   2  

Page 3: CrowdQ: Crowdsourced Query Understanding

Gianluca  Demar8ni   3  

Page 4: CrowdQ: Crowdsourced Query Understanding

Gianluca  Demar8ni   4  

Page 5: CrowdQ: Crowdsourced Query Understanding

Gianluca  Demar8ni   5  

Page 6: CrowdQ: Crowdsourced Query Understanding

Gianluca  Demar8ni   6  

Page 7: CrowdQ: Crowdsourced Query Understanding

Mo8va8on  

•  Web  Search  Engines  can  answer  simple  factual  queries  directly  on  the  result  page  

•  Users  with  complex  informa8on  needs  are  oQen  unsa8sfied  

•  Purely  automa8c  techniques  are  not  enough  

•  We  want  to  solve  it  with  Crowdsourcing!  

Gianluca  Demar8ni   7  

Page 8: CrowdQ: Crowdsourced Query Understanding

Background  

•  Crowdsourcing  so  far  used  for  data  processing  – DB/SemWeb:  Data  integra8on  and  cleaning  –  IR:  Relevance  judgments  

 We  use  the  crowd  to  understand  the  query  

Gianluca  Demar8ni   8  

Page 9: CrowdQ: Crowdsourced Query Understanding

CrowdQ  

•  CrowdQ  is  the  first  system  that  uses  crowdsourcing  to  – Understand  the  intended  meaning  

– Build  a  structured  query  template  – Answer  the  query  over  Linked  Open  Data  

Gianluca  Demar8ni   9  

Page 10: CrowdQ: Crowdsourced Query Understanding

Gianluca  Demar8ni   10  

Page 11: CrowdQ: Crowdsourced Query Understanding

User

Keyword QueryOn#line'Complex'Query

ProcessingComplex

query classifier

CrowdsourcingPlatform

Vetrical selection,

Unstructured Search, ...

POS + NER tagging

Query Template Index

Crowd Manager

N

Y

Queries Templ +Answer Types

StructuredLOD Search

Result Joiner

Template Generation

SERP

t1t2t3

Off#line'Complex'QueryDecomposition

Structured Query

Query Logquery

N

Answ

erCo

mpo

sitio

n

LOD Open Data Cloud

Match with existingquery templates

CrowdQ  Architecture  

Gianluca  Demar8ni   11  

Off-­‐line:  query  template  genera8on  with  the  help  of  the  crowd  On-­‐line:  query  template  matching  using  NLP  and  search  over  open  data  

Page 12: CrowdQ: Crowdsourced Query Understanding

Hybrid  Human-­‐Machine  Pipeline  

Gianluca  Demar8ni   12  

Q=  birthdate  of  actors  of  forrest  gump  

Query  annota8on   Noun   Noun   Named  en8ty  

Verifica8on  

En8ty  Rela8ons  

Is  forrest  gump  this  en8ty  in  the  query?  

Which  is  the  rela8on  between:  actors  and  forrest  gump   starring  

Schema  element   Starring                          <dbpedia-­‐owl:starring>    

Verifica8on   Is  the  rela8on  between:  Indiana  Jones  –  Harrison  Ford  Back  to  the  Future  –  Michael  J.  Fox  of  the  same  type  as  Forrest  Gump  -­‐  actors        

Page 13: CrowdQ: Crowdsourced Query Understanding

Structured  query  genera8on  

SELECT  ?y  ?x  WHERE  {  ?y  <dbpedia-­‐owl:birthdate>  ?x  .  

     ?z  <dbpedia-­‐owl:starring>  ?y  .  

     ?z  <rdfs:label>  ‘Forrest  Gump’  }  

Gianluca  Demar8ni   13  

Results  from  BTC09:  

Q=  birthdate  of  actors  of  forrest  gump  MOVIE  

MOVIE  

Page 14: CrowdQ: Crowdsourced Query Understanding

Current  Status  

•  Realize  the  vision  •  Running  demo:  – Daniel  Haas,  Daniel  Bruckner,  Jonathan  Harper  

•  Next  Steps  – Evalua8on  of  Crowd  effec8veness  at  each  step  – Comparison  hybrid  vs  machine  pipeline  

Gianluca  Demar8ni   14  

Page 15: CrowdQ: Crowdsourced Query Understanding

Conclusions  

•  CrowdQ:  an  hybrid  approach  to  complex  query  understanding  

•  Combines  techniques  from  DB,  NLP,  IR,  Data  Mining,  and  Human  Intelligence    

•  Ini8al  experiments  show  the  poten8al  of  CrowdQ  

Gianluca  Demar8ni   15