Hybrid Human-machine Systems
Lecture 5 Gianluca Demartini
University of Sheffield
Background
• Hybrid systems – Combining the scalability of machines and the quality of human intelligence
Hybrid Systems: Key Issues
• The role of machine (i.e., algorithm) and humans – use only humans? both? who’s doing what?
• Quality control
• Optimization: What to crowdsource
• Scalability: How much to crowdsource
Thinking About Hybrid Systems
[Figure: diagram relating Algorithms, Machines, and People, with search and Watson/IBM as examples]
Example: Hybrid Image Search
Yan, Kumar, Ganesan, CrowdSearch: Exploiting Crowds for Accurate Real-time Image Search on Mobile Phones, MobiSys 2010.
Example: Hybrid Data Integration
paper             conf
Data integration  VLDB-01
Data mining       SIGMOD-02

title         author  email   venue
OLAP          Mike    mike@a  ICDE-02
Social media  Jane    jane@b  PODS-05

• Generate plausible matches
– paper = title, paper = author, paper = email, paper = venue
– conf = title, conf = author, conf = email, conf = venue
• Ask users to verify
Does attribute paper match attribute author? Yes / No / Not sure
McCann, Shen, Doan: Matching Schemas in Online Communities. ICDE, 2008
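The two steps on this slide (generate plausible matches, then ask users to verify) can be sketched in Python as follows. This is a minimal illustration and not the method of McCann et al.: the function names `plausible_matches` and `verify_with_users`, the sample answers, and the simple "more yes than no" rule are all assumptions made for the example.

```python
from itertools import product

def plausible_matches(source_attrs, target_attrs):
    """Enumerate every (source, target) attribute pair as a candidate match."""
    return list(product(source_attrs, target_attrs))

def verify_with_users(candidates, answers):
    """Keep a candidate when more users answered "yes" than "no".

    `answers` maps a candidate pair to a list of "yes" / "no" / "not sure"
    strings. The majority rule is an assumption for this sketch.
    """
    accepted = []
    for pair in candidates:
        votes = answers.get(pair, [])
        if votes.count("yes") > votes.count("no"):
            accepted.append(pair)
    return accepted

candidates = plausible_matches(["paper", "conf"],
                               ["title", "author", "email", "venue"])

# Hypothetical user answers for three of the eight candidates.
answers = {
    ("paper", "title"): ["yes", "yes", "not sure"],
    ("paper", "author"): ["no", "no", "yes"],
    ("conf", "venue"): ["yes", "yes", "yes"],
}
matches = verify_with_users(candidates, answers)
```

Candidates without any answers are simply left unverified here; a real system would have to decide whether to ask again or to drop them.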
Example: Hybrid Query Processing
Use the crowd to answer DB-hard queries
Where to use the crowd:
• Find missing data
• Make subjective comparisons
• Recognize patterns
But not:
• Anything the computer already does well
[Figure: CrowdDB architecture. Labels: CrowdSQL, Results, Parser, Optimizer, Executor, Statistics, MetaData, Files Access Methods, Disk 1, Disk 2, UI Creation, UI Template Manager, Form Editor, HIT Manager, Turker Relationship Manager]
M. Franklin, D. Kossmann, T. Kraska, S. Ramesh and R. Xin. CrowdDB: Answering Queries with Crowdsourcing, SIGMOD 2011
Crowdsourced Data Management Applications
• Relational
– information extraction
– schema matching
– entity resolution
– building structured KBs
– sorting
– top-k
– ...
• Beyond relational
– graph search
– classification
– mobile image search
– social media analysis
– question answering
– NLP
– text summarization
– sentiment analysis
– semantic wikis
– ...
Qurk (MIT)
• Goal: crowd-source comparisons, missing data
• Basis: SQL3 + UDF
– UDFs encapsulate crowd input
– special template language for crowd UDFs
– specify UI, quality control, … (possibly opt. hints)
Qurk Example [Markus et al. CrowdCrowd 2011]
• Task: Find all women in a “people” database
• Schema
CREATE TABLE people( name varchar(256), photo blob );
• Query
SELECT name FROM people p WHERE isFemale(p);
TASK isFemale(tuple) TYPE: Filter
  Question: “is %s Female”, tuple[“photo”]
  YesText: “Yes”
  NoText: “No”
The magic is in the Templates
• Templates generate UIs for different kinds of crowd-sourcing tasks
– filters: Yes / No questions
– joins: comparisons between two tuples (equality)
– order by: comparisons between two tuples (gt?)
– generative: crowd-source attribute values
• Templates also specify quality control; e.g., COMBINER: MajorityVote
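To make the filter template and the MajorityVote combiner concrete, here is a small Python sketch. It is not Qurk code: `majority_vote`, `crowd_filter`, the `ask_crowd` callback, and the simulated answers are hypothetical names and data, and the number of assignments per row is an assumption.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer; ties resolve to the answer seen first."""
    return Counter(answers).most_common(1)[0][0]

def crowd_filter(rows, ask_crowd, assignments=3):
    """Keep the rows for which the combined crowd answer is "Yes".

    `ask_crowd(row, i)` returns the i-th worker's "Yes" / "No" answer.
    """
    kept = []
    for row in rows:
        answers = [ask_crowd(row, i) for i in range(assignments)]
        if majority_vote(answers) == "Yes":
            kept.append(row)
    return kept

# Simulated worker answers stand in for a real crowdsourcing platform.
simulated = {
    "alice": ["Yes", "Yes", "No"],
    "bob": ["No", "No", "Yes"],
}
people = [{"name": "alice"}, {"name": "bob"}]
names = [row["name"]
         for row in crowd_filter(people, lambda row, i: simulated[row["name"]][i])]
```

Using an odd number of assignments avoids ties for Yes / No questions.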
Crowdsourcing DB Systems
• Fundamentally new way of tackling data management issues using large networks of anonymous users
• At this point, first interesting systems and results, but still more questions than answers – Hot research topic
• Unique, unexpected issues – “My database hates me”
Problem: Populate Infoboxes
Solution: IE Using Machine + Human
Infobox
Born July 21, 1899 Nationality American
Hemingway was an American author ...
Verify with crowd
Train a “nationality” extractor
The American readers ...
Apply extractor to new pages to extract nationalities.
Recruiting Model: Search Advertising
• Interrupt user in the middle of a primary task – searching for information on Ray Bradbury
• Ask if user is willing to contribute
• Evaluate different UIs: pop-up, highlight, icon – in terms of intrusiveness and willingness to contribute
“ray bradbury”
Popup Interface
Highlight Interface
Other examples of hybrid systems
http://dbpedia.org/resource/Facebook
http://dbpedia.org/resource/Instagram
tase:Instagram owl:sameAs
Android
HTML:
<p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>
RDFa enrichment:
<p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>
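The enrichment step can be approximated with a few lines of Python. This is a simplified sketch, not the ZenCrowd implementation: `annotate_entities` is a hypothetical helper, it assumes the entity-to-URI mapping has already been decided, and it wraps only the mention itself rather than the whole sentence as the slide's markup does.

```python
import html
import re

def annotate_entities(text, entity_uris):
    """Wrap the first mention of each entity label in RDFa-style markup.

    Limitation of this sketch: a later label that happens to occur inside
    markup inserted for an earlier one would be matched there.
    """
    out = html.escape(text)
    for label, uri in entity_uris.items():
        escaped_label = html.escape(label)
        pattern = re.compile(r"\b" + re.escape(escaped_label) + r"\b")
        markup = ('<span about="%s"><cite property="rdfs:label">%s</cite></span>'
                  % (html.escape(uri, quote=True), escaped_label))
        # A function replacement avoids backslash interpretation in `markup`.
        out = pattern.sub(lambda match: markup, out, count=1)
    return out

enriched = annotate_entities(
    "Facebook has purchased Instagram.",
    {
        "Facebook": "http://dbpedia.org/resource/Facebook",
        "Instagram": "http://dbpedia.org/resource/Instagram",
    },
)
```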
ZenCrowd
• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a probabilistic reasoning framework
Crowd
Algorithms Machines
ZenCrowd Architecture
[Figure: ZenCrowd architecture. Input: HTML Pages. Labels: ZenCrowd, Entity Extractors, LOD Index (Get Entity), LOD Open Data Cloud, Algorithmic Matchers, Probabilistic Network, Decision Engine, Micro-Task Manager, Micro Matching Tasks, Crowdsourcing Platform, Workers Decisions. Output: HTML + RDFa Pages]
Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).
Entity Factor Graphs
• Graph components
– Workers, links, clicks
– Prior probabilities
– Link Factors
– Constraints
• Probabilistic Inference
– Select all links with posterior prob > τ
[Figure: entity factor graph with 2 workers (w1, w2), 6 clicks (c11, c12, c13, c21, c22, c23), and 3 candidate links (l1, l2, l3). Worker priors pw1(), pw2(); link priors pl1(), pl2(), pl3(); link factors lf1(), lf2(), lf3(); SameAs constraint sa1-2(); dataset unicity constraint u2-3(). Legend: link priors, worker priors, observed variables, link factors, SameAs constraints, dataset unicity constraints]
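The inference rule "select all links with posterior prob > τ" can be illustrated with a much simpler model than the factor graph on this slide. The Python sketch below is an assumption-laden stand-in: it treats workers as independent given the true label and ignores the SameAs and dataset unicity constraints entirely. `link_posterior`, `select_links`, and all numbers are hypothetical.

```python
def link_posterior(link_prior, votes):
    """Posterior probability that a candidate link is correct.

    `votes` is a list of (worker_reliability, clicked_yes) pairs, where
    worker_reliability is the assumed probability that the worker answers
    correctly. Reliabilities are assumed to lie strictly between 0 and 1.
    """
    p_true, p_false = link_prior, 1.0 - link_prior
    for reliability, clicked_yes in votes:
        if clicked_yes:
            p_true *= reliability
            p_false *= 1.0 - reliability
        else:
            p_true *= 1.0 - reliability
            p_false *= reliability
    return p_true / (p_true + p_false)

def select_links(candidates, tau):
    """Return the names of links whose posterior exceeds the threshold tau."""
    return [name for name, (prior, votes) in candidates.items()
            if link_posterior(prior, votes) > tau]

# Three candidate links, two workers each (made-up priors and clicks).
candidates = {
    "l1": (0.5, [(0.8, True), (0.8, True)]),
    "l2": (0.5, [(0.8, True), (0.8, False)]),
    "l3": (0.2, [(0.6, False), (0.9, False)]),
}
selected = select_links(candidates, tau=0.7)
```

With two equally reliable workers who disagree (l2), the posterior falls back to the link prior, which is why only l1 passes the threshold in this example.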
Experimental Evaluation
• Datasets
– 25 news articles from
  • CNN.com (Global news)
  • NYTimes.com (Global news)
  • Washington-post.com (US local news)
  • Timesofindia.indiatimes.com (India news)
  • Swissinfo.com (Switzerland local news)
– 40M entities (Freebase, DBPedia, Geonames, NYT)
Worker Selection
[Figure: worker precision (0 to 1) against number of tasks (0 to 500) for US workers and IN workers, with the top US worker highlighted]
[Figure: precision (0.6 to 0.8) against top K workers (K = 1 to 9)]
Lessons Learnt
• Crowdsourcing + Prob reasoning works!
• But
– Different worker communities perform differently
– Many low quality workers
– Completion time may vary (based on reward)
• Need to find the right workers for your task (see WWW13 paper)
ZenCrowd Summary
• ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking
• Standard crowdsourcing improves 6% over automatic
• 4% - 35% improvement over standard crowdsourcing
• 14% average improvement over automatic approaches
• Follow-up work (VLDBJ):
– Also used for instance matching across datasets
– 3-way blocking with the crowd
http://exascale.info/zencrowd/
ZenCrowd Architecture
Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012)
[Figure: extended ZenCrowd architecture with numbered steps 1, 2, 3. Inputs: HTML Pages, Dataset Pair. Labels: ZenCrowd, Entity Extractors, Indexing, LOD Index, LOD Open Data Cloud, Graph DB, Algorithmic Linkers, Algorithmic Matchers, Probabilistic Network, Decision Engine, Micro-Task Manager, Micro Matching Tasks, Crowdsourcing Platform, Workers Decisions. Outputs: HTML + RDFa Pages, New Matchings <owl:sameAs>]
Blocking for Instance Matching
• Find the instances about the same real-world entity within two datasets
• Avoid comparison of all possible pairs
– Step 1: cluster similar items using a cheap similarity measure
– Step 2: n*n comparison within the clusters with an expensive measure
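The two blocking steps can be sketched in Python as below. Everything concrete here is an assumption chosen for illustration: the cheap measure is a shared three-letter prefix, the "expensive" measure is the character-level ratio from the standard library's `difflib`, and for brevity the items come from a single list rather than two datasets.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def cheap_key(name):
    """Cheap blocking key: the first three characters, lower-cased."""
    return name.lower()[:3]

def expensive_similarity(a, b):
    """Stand-in for an expensive measure: character-level similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def block_and_match(items, threshold=0.75):
    # Step 1: cluster items that share the same cheap key.
    clusters = defaultdict(list)
    for item in items:
        clusters[cheap_key(item)].append(item)
    # Step 2: run the expensive comparison only within each cluster.
    matches, comparisons = [], 0
    for members in clusters.values():
        for a, b in combinations(members, 2):
            comparisons += 1
            if expensive_similarity(a, b) >= threshold:
                matches.append((a, b))
    return matches, comparisons

items = ["Facebook", "Facebook Inc", "Instagram", "Instagram app", "Android"]
matches, comparisons = block_and_match(items)
```

For these five items, comparing all pairs would take 10 expensive comparisons, while blocking needs 2. The price is that true matches falling into different clusters are never compared.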
3-step Blocking with the Crowd
• Crowdsourcing as the most expensive similarity measure
[Figure: pipeline over entities e and their candidates c1, c2, c3. Step 1: Inverted Index Candidates, followed by Schema-Based Confidence Computation, yielding Matches with High Confidence. Step 2: Crowdsourcing non-confident matches (Micro Matching Tasks). Step 3: Results Combination (Probabilistic network)]
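A rough Python sketch of the three steps follows. It is only an approximation of the pipeline in the figure: the confidence scores are given rather than computed from a schema, the combination step is a plain union rather than a probabilistic network, and `three_step_matching`, `ask_crowd`, and the `high` threshold are hypothetical.

```python
def three_step_matching(candidates, ask_crowd, high=0.9):
    """Combine automatic and crowd decisions for candidate matches.

    `candidates` maps (entity, candidate) pairs to an automatic confidence
    score in [0, 1]; `ask_crowd(pair)` returns True if the crowd confirms it.
    """
    accepted, crowd_tasks = [], []
    # Step 1: accept high-confidence candidates without human input.
    for pair, confidence in candidates.items():
        if confidence >= high:
            accepted.append(pair)
        else:
            crowd_tasks.append(pair)
    # Step 2: send the remaining, non-confident candidates to the crowd.
    crowd_answers = {pair: ask_crowd(pair) for pair in crowd_tasks}
    # Step 3: combine automatic and crowd results.
    accepted.extend(pair for pair, confirmed in crowd_answers.items() if confirmed)
    return accepted, len(crowd_tasks)

candidates = {("e", "c1"): 0.95, ("e", "c2"): 0.6, ("e", "c3"): 0.3}
# Simulated crowd: confirms only the pair (e, c2).
accepted, n_tasks = three_step_matching(candidates, lambda pair: pair == ("e", "c2"))
```

Raising `high` sends more pairs to the crowd, which trades cost for fewer automatic errors.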
CrowdQ – Crowd-powered Query Understanding
birthdate of the mayor of the capital city of italy
capital city of italy
mayor of rome
birthdate of ignazio marino
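The sequence of queries above resolves a nested question from the inside out. The toy Python sketch below reproduces that chain; it is not how CrowdQ works (CrowdQ relies on the crowd and on query templates). The `resolve` function, the string splitting on " of ", and the tiny knowledge base `kb` are assumptions, and the final value is a placeholder because the slides do not give it.

```python
def resolve(query, kb):
    """Resolve a nested "X of Y" query from the innermost part outwards.

    Returns the final answer and the list of intermediate sub-queries.
    """
    parts = query.split(" of ")
    current = parts[-1]  # innermost entity, e.g. "italy"
    steps = []
    for relation in reversed(parts[:-1]):
        if relation.startswith("the "):
            relation = relation[len("the "):]
        steps.append(relation + " of " + current)
        current = kb[(relation, current)]
    return current, steps

# Toy knowledge base; "<birthdate>" is a placeholder, not a real value.
kb = {
    ("capital city", "italy"): "rome",
    ("mayor", "rome"): "ignazio marino",
    ("birthdate", "ignazio marino"): "<birthdate>",
}
answer, steps = resolve("birthdate of the mayor of the capital city of italy", kb)
```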
Motivation
• Web Search Engines can answer simple factual queries directly on the result page
• Users with complex information needs are often unsatisfied
• Purely automatic techniques are not enough
• We want to solve it with Crowdsourcing!
CrowdQ
• CrowdQ is the first system that uses crowdsourcing to
– Understand the intended meaning
– Build a structured query template
– Answer the query over Linked Open Data
Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
CrowdQ Architecture
[Figure: CrowdQ architecture. On-line Complex Query Processing: User, Keyword Query, Complex query classifier (Y / N), Vertical selection, Unstructured Search, ..., POS + NER tagging, Match with existing query templates, Query Template Index, Structured Query, Structured LOD Search, LOD Open Data Cloud, Result Joiner, Answer Composition, SERP. Off-line Complex Query Decomposition: Query Log, Crowd Manager, Crowdsourcing Platform, Template Generation (t1, t2, t3), Queries, Templ + Answer Types]
Off-line: query template generation with the help of the crowd
On-line: query template matching using NLP and search over open data
Hybrid Human-Machine Pipeline
Q= birthdate of actors of forrest gump
Query annotation: Noun Noun Named entity
Verification: Is forrest gump this entity in the query?
Entity Relations: Which is the relation between: actors and forrest gump? starring
Schema element: Starring <dbpedia-owl:starring>
Verification: Is the relation between: Indiana Jones – Harrison Ford, Back to the Future – Michael J. Fox of the same type as Forrest Gump - actors?
Structured query generation
Q= birthdate of actors of forrest gump (“forrest gump” annotated as MOVIE)
SELECT ?y ?x WHERE {
  ?y <dbpedia-owl:birthdate> ?x .
  ?z <dbpedia-owl:starring> ?y .
  ?z <rdfs:label> 'Forrest Gump' }
Results from BTC09: [Figure: query results]
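One way to picture a reusable query template is as a string with slots that are filled once the crowd has verified the entity and the relation. The Python sketch below is an illustration only: `QUERY_TEMPLATE` and `build_query` are hypothetical, and the angle-bracket property notation is copied from the slide rather than written in strict SPARQL prefix syntax.

```python
from string import Template

# Template mirroring the slide's query; $attribute, $relation and $label
# are the slots filled for a concrete keyword query.
QUERY_TEMPLATE = Template(
    "SELECT ?y ?x WHERE {\n"
    "  ?y <$attribute> ?x .\n"
    "  ?z <$relation> ?y .\n"
    "  ?z <rdfs:label> '$label' }"
)

def build_query(attribute, relation, label):
    """Fill the template; backslashes and single quotes in the label are escaped."""
    safe_label = label.replace("\\", "\\\\").replace("'", "\\'")
    return QUERY_TEMPLATE.substitute(
        attribute=attribute, relation=relation, label=safe_label)

query = build_query("dbpedia-owl:birthdate", "dbpedia-owl:starring", "Forrest Gump")
```

The same template could then serve other queries of the form "attribute of actors of movie" by changing only the label.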
Overview of hybrid systems
• Balance between systems that use the human component as pre-processing or post-processing of data (11 vs 13)
• Mostly monetary reward
• Majority of systems perform batch data processing rather than real-time jobs
• In 2014 we can observe a decreased number of hybrid human-machine systems being proposed: focus on solving core problems rather than building new systems
Summary
• Crowdsourcing big data can make you go bankrupt! -> hybrid systems
• When to ask a human, when to trust the machine
• Hybrid (human in the loop)
– Pre-processing: training data for ML
– Post-processing: based on confidence scores
– Mix: active learning
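The post-processing variant (ask a human only when the machine is unsure) and its effect on cost can be sketched in Python. The function `route`, the threshold, the number of assignments, and the reward per task are assumptions for illustration, not figures from the lecture.

```python
def route(predictions, threshold=0.8):
    """Split machine predictions into auto-accepted labels and human tasks."""
    auto, to_human = [], []
    for item, label, confidence in predictions:
        if confidence >= threshold:
            auto.append((item, label))
        else:
            to_human.append(item)
    return auto, to_human

def crowd_cost_cents(n_tasks, assignments=3, reward_cents=5):
    """Total crowd cost when each task is answered by several workers."""
    return n_tasks * assignments * reward_cents

# Made-up machine predictions: (item, label, confidence).
predictions = [
    ("doc1", "match", 0.95),
    ("doc2", "match", 0.55),
    ("doc3", "no match", 0.85),
]
auto, to_human = route(predictions)
hybrid_cost = crowd_cost_cents(len(to_human))
crowd_only_cost = crowd_cost_cents(len(predictions))
```

In this toy run only one of three items goes to the crowd, so the hybrid setup costs a third of the crowd-only one. The saving depends entirely on how well the confidence scores are calibrated.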
Gianluca Demartini. Hybrid Human-Machine Information Systems: Challenges and Opportunities. In: Computer Networks, Special Issue on Crowdsourcing, Elsevier.