Information Management via CrowdSourcing
Hector Garcia-Molina
Stanford University
Crowdsourcing
Not to be confused with:
• Wisdom of the Crowd
• Cloud Computing :-)
Does figure show > 45 dots?
[figure: cloud of dots]

Question A: Does figure show > 45 dots?
Report Results for Question A

Question B: Does figure show > 45 dots?
Report Results for Question B
Many Crowdsourcing Marketplaces!

Real World Examples
• Image Matching
• Translation
• Categorizing Images
• Search Relevance
• Data Gathering
Many Research Projects!

The Many Faces of Crowdsourcing
• Human-Computer Interaction
• Software Systems
• Machine Learning
• Human Issues
• Information Management
Crowd Information Management
• Two Aspects:
– Crowd as Information Source
– Crowd as Data Processor
Efficiency: Fundamental Tradeoffs
• Cost: How much $$ can I spend?
• Latency: How long can I wait?
• Uncertainty: What is the desired quality?

• Which questions do I ask humans?
• Do I ask in sequence or in parallel?
• How much redundancy in questions?
• How do I combine the answers?
• When do I stop?
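A common answer to “How much redundancy?” and “How do I combine the answers?” is to assign each question to several workers and take a majority vote. A minimal sketch (the function name and confidence measure are illustrative, not from the talk):

```python
from collections import Counter

def majority_vote(answers):
    """Combine redundant answers to one question from several workers.

    Returns the most common answer and the fraction of workers who
    agreed with it (a crude confidence measure).
    """
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

# Five workers answer "Does figure show > 45 dots?"
answer, confidence = majority_vote(["Y", "Y", "N", "Y", "N"])
# answer == "Y", confidence == 0.6
```

More redundancy raises cost and latency but lowers uncertainty, which is exactly the tradeoff triangle above.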
Example: CrowdScreen
[figure: dataset of items → crowd answers “Item X satisfies predicate?” (Y, Y, N, …) → filtered dataset]
Strategy
[grid figure: axes count YESs vs. NOs received so far (1–6 each); every point is marked “continue”, “decide PASS”, or “decide FAIL”; the points where the strategy terminates are its decision points]
More Examples
[grid figures: four alternative strategies over the same YESs/NOs grid]
Some Optimizations
[grid figure]
What is “best” strategy?
[grid figures: candidate strategies over the YESs/NOs grid]
end(x,y) = probability of terminating at (x,y)
p(x,y) = probability of misclassification at (x,y)
Expected error: ∑ p(x,y)*end(x,y)
Expected cost: ∑ (x+y)*end(x,y)
One (of many) optimization problems:
Find the strategy that minimizes expected cost (# questions), such that expected error is below a threshold (and the number of questions never exceeds m).
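The expected-cost and expected-error sums can be evaluated exactly by walking the grid of (YES, NO) counts. A minimal sketch, with an illustrative “first answer count to reach 3 wins” strategy and assumed parameters (worker error rate e, prior selectivity s) — this evaluates a given strategy, it is not CrowdScreen’s optimization algorithm:

```python
def first_to_3(yes, no):
    """Illustrative strategy: decide as soon as either answer count hits 3."""
    if yes == 3:
        return "PASS"
    if no == 3:
        return "FAIL"
    return "continue"

def evaluate(strategy, e, s):
    """Expected cost (# questions) and expected error of a strategy,
    given worker error rate e and prior P(item satisfies predicate) = s."""
    def walk(yes, no, pr, p_yes, truth, acc):
        action = strategy(yes, no)
        if action != "continue":
            acc[0] += pr * (yes + no)          # questions asked on this path
            acc[1] += pr * (action != truth)   # misclassified on this path
        else:
            walk(yes + 1, no, pr * p_yes, p_yes, truth, acc)
            walk(yes, no + 1, pr * (1 - p_yes), p_yes, truth, acc)

    acc = [0.0, 0.0]
    for truth, prior in (("PASS", s), ("FAIL", 1 - s)):
        p_yes = (1 - e) if truth == "PASS" else e   # chance a worker says YES
        sub = [0.0, 0.0]
        walk(0, 0, 1.0, p_yes, truth, sub)
        acc[0] += prior * sub[0]
        acc[1] += prior * sub[1]
    return tuple(acc)

cost, error = evaluate(first_to_3, e=0.1, s=0.5)
# with 10% worker error: expected cost ≈ 3.32 questions, expected error ≈ 0.86%
```

Searching over all grid labelings for the one that minimizes this cost subject to the error threshold is the optimization problem stated above.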
Example of Results
Beyond Single Filter
• Probabilistic Strategies
• Multiple Filters
• Categorizer (output more than 2 types)
Beyond Filtering
• Finding Max
• Sorting
• Clustering
• Entity Resolution
• Adding terms to a taxonomy
• Building a Folksonomy
• ...
Beyond Simple Models
• Worker Error Models
• Task Design
• Tracking Worker Abilities
• Payments
• Response Time Issues
• ...
Crowd As Information Source
[figure: declarative queries → a DBMS-like system → the crowd + the Web]
The Deco Data Model
[figure: the end user sees a conceptual schema (relations), specified by the schema designer; the system automatically maps it to an actual schema (relations and other stuff) stored in an RDBMS]
Small Example

User view (= Anchor ⋈o Dependent ⋈o Dependent):

restaurant     rating   cuisine
Chez Panisse   4.9      French
Chez Panisse   4.9      California
Bytes          3.8      California
• • •

Anchor:

restaurant
Chez Panisse
Bytes
• • •

Dependent (raw ratings):

restaurant     rating
Chez Panisse   4.8
Chez Panisse   5.0
Chez Panisse   4.9
Bytes          3.6
Bytes          4.0
• • •

Dependent (raw cuisines):

restaurant     cuisine
Chez Panisse   French
Chez Panisse   California
Bytes          California
Bytes          California
• • •

Fetch rules ask the crowd for new anchor rows (e.g., restaurant names) or for dependent values of a given restaurant (e.g., a rating for Bytes, a cuisine for Chez Panisse). Resolution rules reconcile multiple raw answers into the values the user sees (e.g., ratings 4.8, 5.0, 4.9 resolve to their average 4.9; duplicate cuisine answers are eliminated).

Query processing: 1. Fetch  2. Resolve  3. Join
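The Fetch → Resolve → Join sequence for the restaurant example can be sketched in a few lines. The resolution rules here (average the raw ratings, de-duplicate the cuisines) are the ones implied by the example tables; the function names are my own:

```python
# Raw crowd answers, as collected by fetch rules.
raw_ratings = [("Chez Panisse", 4.8), ("Chez Panisse", 5.0),
               ("Chez Panisse", 4.9), ("Bytes", 3.6), ("Bytes", 4.0)]
raw_cuisines = [("Chez Panisse", "French"), ("Chez Panisse", "California"),
                ("Bytes", "California"), ("Bytes", "California")]
anchor = ["Chez Panisse", "Bytes"]

def resolve_avg(rows):
    """Resolution rule: one rating per restaurant, the average of raw answers."""
    by_key = {}
    for r, v in rows:
        by_key.setdefault(r, []).append(v)
    return {r: round(sum(vs) / len(vs), 1) for r, vs in by_key.items()}

def resolve_dedup(rows):
    """Resolution rule: drop duplicate (restaurant, cuisine) answers."""
    out = {}
    for r, c in rows:
        out.setdefault(r, [])
        if c not in out[r]:
            out[r].append(c)
    return out

# 2. Resolve, then 3. Join into the user view.
ratings = resolve_avg(raw_ratings)      # {"Chez Panisse": 4.9, "Bytes": 3.8}
cuisines = resolve_dedup(raw_cuisines)
user_view = [(r, ratings[r], c) for r in anchor for c in cuisines[r]]
# [("Chez Panisse", 4.9, "French"), ("Chez Panisse", 4.9, "California"),
#  ("Bytes", 3.8, "California")]
```

In Deco itself fetches run on demand during query processing; this sketch only shows how resolved views are derived from raw crowd answers.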
Many Query Processing Challenges

SELECT n,l,c
FROM country
WHERE l = ‘Spanish’
ATLEAST 8

[figure: query plan with AtLeast[8] at the root, over Join, Filter[l=‘Spanish’], Resolve[m3] and Resolve[d.e] operators, Scans of A(n), D1(n,l), D2(n,c), and Fetch operators [n], [ln], [nl], [nl,c], [ln,c]]
Deco Prototype V1.0
Experimental Setup
• Experimental Goals:
– Different fetch configurations
• Basic: n + nl + nc
• Reverse: ln + nl + nc
• Hybrid: ln,c + nl,c
– Different filter locations
• After vs. between joins
• Experimental Setup:
– 5 cents/task on MTurk
– Empty tables initially
– Default: reverse + between
SELECT n,l,c FROM country WHERE l = ‘Spanish’ ATLEAST 8
Experiment 1: Different Fetch Rules
[chart: the three configurations cost $12.00 and 2 hrs, $2.20 and 15 min, and $1.30 and 11 min]
Conclusion
• Crowdsourcing is an exciting area!
• Many challenges!