[Karger+ NIPS11] Iterative Learning for Reliable Crowdsourcing Systems
[Karger+] Iterative Learning for
Reliable Crowdsourcing Systems
2012/04/08 #NIPSreading
Nakatani Shuyo
Crowdsourcing
• Outsource work to an unspecified crowd
– Most workers are not experts
– Some workers may be spammers
• Amazon Mechanical Turk
– A large task is split into microtasks
– Workers earn a few cents per microtask
Spammer and Hammer
• Spam/Spammer
– submits arbitrary answers just to collect the fee
• Ham/Hammer
– answers questions correctly
• It is difficult to tell spammers from hammers
– The requester doesn’t have a gold standard
– Workers are neither persistent nor identifiable
Questions
• How to assess the reliability of workers
– Is this worker a spammer or a hammer?
• How to minimize the total cost
– ∝ number of task assignments
• How to predict answers
– majority voting? EM algorithm?
• How to guarantee the error rate
– estimate an upper bound
Setting
• $t_i$: tasks, $i = 1, \dots, m$
• $w_j$: workers, $j = 1, \dots, n$
• $(l, r)$-regular bipartite graph
– Each task is assigned to $l$ workers.
– Each worker is assigned $r$ tasks.
• Given $m$ and $r$, how to select $l$?
– Since $ml = nr$, choosing $l$ fixes $n = ml/r$.
[Figure: bipartite graph linking tasks $t_1, t_2, t_3, \dots, t_m$ to workers $w_1, w_2, w_3, \dots, w_n$]
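The assignment described above can be sketched in a few lines. This is a minimal illustration, assuming a random matching of "half-edges" (each task listed $l$ times, each worker listed $r$ times); the function name `regular_bipartite_edges` is hypothetical, and the slides do not say how the graph is drawn. A random matching of this kind can occasionally assign the same task to the same worker twice.

```python
import numpy as np

def regular_bipartite_edges(m, l, r, seed=0):
    """Return edge lists (task, worker) of an (l, r)-regular bipartite multigraph.

    Hypothetical helper: pairs task half-edges with shuffled worker half-edges.
    """
    if (m * l) % r != 0:
        raise ValueError("m * l must be divisible by r so that n = m*l/r is an integer")
    n = m * l // r
    rng = np.random.default_rng(seed)
    task_stubs = np.repeat(np.arange(m), l)    # each task appears l times
    worker_stubs = np.repeat(np.arange(n), r)  # each worker appears r times
    rng.shuffle(worker_stubs)                  # random matching of half-edges
    return task_stubs, worker_stubs, n

# Usage: 12 tasks, 3 workers per task, 4 tasks per worker -> n = 12 * 3 / 4 = 9 workers.
tasks, workers, n = regular_bipartite_edges(m=12, l=3, r=4)
```

Edge `e` connects task `tasks[e]` to worker `workers[e]`, so the total number of assignments is `m * l`, which is the quantity the requester pays for.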
Model
• $s_i = \pm 1$: correct answer to $t_i$ (unobserved)
• $A_{ij}$: answer of $w_j$ to $t_i$ (observed)
• $p_j = p(A_{ij} = s_i)$ for all $i$: reliability of worker $w_j$
– Reliability is assumed not to depend on the task
• $\mathbf{E}[(2p_j - 1)^2] = q$: average quality parameter
– $q \in [0, 1]$; a value close to 1 indicates that most workers are diligent
– $q$ is set to 0.3 in the later experiment
Example: spammer-hammer model
• Given $q \in [0, 1]$:
• $p_j = 1$ with probability $q$
– $w_j$ is a perfect hammer (all answers correct).
• $p_j = 1/2$ with probability $1 - q$
– $w_j$ is a spammer (random answers).
• Then $\mathbf{E}[(2p_j - 1)^2] = q \times 1 + (1 - q) \times 0 = q$
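The identity above can be checked numerically. The following is a small sketch, assuming workers are drawn independently; the function name `sample_spammer_hammer` and the sample size are illustrative choices, not taken from the slides.

```python
import numpy as np

def sample_spammer_hammer(n, q, seed=0):
    """Draw n worker reliabilities: p_j = 1 with probability q, else p_j = 1/2."""
    rng = np.random.default_rng(seed)
    is_hammer = rng.random(n) < q
    return np.where(is_hammer, 1.0, 0.5)

p = sample_spammer_hammer(n=100_000, q=0.3)
# Empirical average of (2 p_j - 1)^2, which should be close to q.
q_hat = np.mean((2 * p - 1) ** 2)
print(q_hat)
```

Since $(2p_j - 1)^2$ is 1 for a hammer and 0 for a spammer, `q_hat` is simply the fraction of hammers in the sample.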
Iterative Inference
• $x_{i \to j}$: real-valued task messages from $t_i$ to $w_j$
• $y_{j \to i}$: worker messages from $w_j$ to $t_i$
(figure from [Karger+ NIPS11])
Prediction
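• predicted answer:
$\hat{s}_i(\{A_{ij}\}_{(i,j) \in E}) = \mathrm{sign}\left(\sum_{j \in \partial i} A_{ij} y_{j \to i}\right)$
– where $\partial i$: neighborhood of $t_i$
• error rate:
$\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} p\left(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j) \in E})\right)$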
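The message passing and the prediction rule can be sketched together. The slides do not spell out the update rules, so the ones used here are an assumption based on the algorithm in [Karger+ NIPS11] as I understand it: worker messages start as draws from a normal distribution with mean 1 and variance 1, each task message sums $A_{ij'} y_{j' \to i}$ over the other workers of the task, and each worker message sums $A_{i'j} x_{i' \to j}$ over the other tasks of the worker. The function name `iterative_inference`, the rescaling step, and the exact iteration indexing are my own choices.

```python
import numpy as np

def iterative_inference(task_idx, worker_idx, answers, m, n, k=10, seed=0):
    """Estimate s_i in {+1, -1}; edge e links task_idx[e] to worker_idx[e] with answer answers[e]."""
    rng = np.random.default_rng(seed)
    y = rng.normal(1.0, 1.0, size=len(answers))  # y_{j->i}, one value per edge
    for _ in range(k):
        # Task message x_{i->j}: sum of A_ij' * y_{j'->i} over the other workers j' of task i.
        ay = answers * y
        x = np.bincount(task_idx, weights=ay, minlength=m)[task_idx] - ay
        # Worker message y_{j->i}: sum of A_i'j * x_{i'->j} over the other tasks i' of worker j.
        ax = answers * x
        y = np.bincount(worker_idx, weights=ax, minlength=n)[worker_idx] - ax
        # Assumption (not in the slides): rescale to avoid overflow; signs are unchanged.
        norm = np.linalg.norm(y)
        if norm > 0:
            y = y / norm
    # Prediction: sign of the weighted vote over all workers of each task.
    score = np.bincount(task_idx, weights=answers * y, minlength=m)
    return np.where(score >= 0, 1, -1)

# Usage: a small deterministic (5, 5)-regular graph where every worker answers correctly.
m = n = 50
l = 5
task_idx = np.repeat(np.arange(m), l)
worker_idx = (task_idx + np.tile(np.arange(l), m)) % n
s = np.where(np.arange(m) % 3 == 0, -1, 1)   # hypothetical ground truth
answers = s[task_idx]
s_hat = iterative_inference(task_idx, worker_idx, answers, m, n)
```

Storing one message per edge keeps both updates vectorized: the sum over "all other neighbors" is the full neighborhood sum minus the edge's own contribution.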
Performance Guarantee
Theorem 2.1
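• For $l > 1$, $r > 1$ and $q \in [0, 1]$ given, let $\hat{l} = l - 1$, $\hat{r} = r - 1$.
• Assume $m$ tasks are assigned to $n = ml/r$ workers according to an $(l, r)$-regular bipartite graph.
• Let $\hat{s}_i$ be the estimate after $k$ iterations of the iterative algorithm.
• If $\mu \equiv \mathbf{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{l}\hat{r})$, then
$\limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} p\left(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j) \in E})\right) \le e^{-lq/(2\rho_k^2)}$
– where $\rho_k^2$ is defined in [Karger+ NIPS11] as a function of $q$, $\mu$, $\hat{l}$, $\hat{r}$ and $k$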
Corollary 2.2
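• Under the hypotheses of Theorem 2.1,
$\limsup_{k \to \infty} \limsup_{m \to \infty} \frac{1}{m} \sum_{i=1}^{m} p\left(s_i \neq \hat{s}_i(\{A_{ij}\}_{(i,j) \in E})\right) \le e^{-lq/(2\rho_\infty^2)}$
• where $\rho_\infty^2$ is defined in [Karger+ NIPS11]
– For $q = 0.3$, $l = r = 25$: r.h.s. = 0.31
– For $q = 0.5$, $l = 25$, $r = 10$: r.h.s. = 0.15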
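The two numerical examples can be reproduced with a short calculation. The expression for $\rho_\infty^2$ is not preserved in this transcript; the sketch below assumes $\rho_\infty^2 = \left(3 + \frac{1}{q\hat{r}}\right) \frac{q^2 \hat{l}\hat{r}}{q^2 \hat{l}\hat{r} - 1}$, which is my recollection of the definition in [Karger+ NIPS11] and is consistent with both values quoted above. The function names are illustrative.

```python
import math

def rho_inf_sq(q, l, r):
    """Assumed form of rho_infinity^2 (recalled from the paper, not shown in the slides)."""
    l_hat, r_hat = l - 1, r - 1
    c = q * q * l_hat * r_hat
    if c <= 1:
        raise ValueError("the bound requires q^2 > 1 / ((l - 1) * (r - 1))")
    return (3 + 1 / (q * r_hat)) * c / (c - 1)

def error_bound(q, l, r):
    """Right-hand side of the corollary: exp(-l * q / (2 * rho_infinity^2))."""
    return math.exp(-l * q / (2 * rho_inf_sq(q, l, r)))

print(round(error_bound(0.3, 25, 25), 2))  # 0.31
print(round(error_bound(0.5, 25, 10), 2))  # 0.15
```

The `ValueError` mirrors the hypothesis $q^2 > 1/(\hat{l}\hat{r})$: below that threshold the assumed expression is not a meaningful bound.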
Experiments
• m = n = 1000, l = r
• left: q=0.3, 𝑙 ∈ [1,30]
• right: l = 25, 𝑞 ∈ [0, 0.4]
(figure from [Karger+ NIPS11])
Lower Bound