Data Processing with Mechanical Turk
![Page 1: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/1.jpg)
data processing with mechanical turk
Kelly O'Brien @klm427; github.com/kellyob
Michael Becker @beckerfuffle; github.com/mdbecker
#ptw2013
![Page 2: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/2.jpg)
Mechanical Turk
![Page 3: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/3.jpg)
"The Turk"
![Page 4: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/4.jpg)
Let's focus on the crowdsourcing...
Relatively cheap means of getting random samples of input for small, tedious tasks
"Crowdsourced labor can cost companies less than half as much as typical outsourcing"
-- Panagiotis G. Ipeirotis, an associate professor at NYU's Stern School of Business
![Page 5: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/5.jpg)
"Nothing is a waste of time if you use the experience wisely."
~Auguste Rodin
![Page 6: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/6.jpg)
The business challenge....
![Page 7: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/7.jpg)
The solution....
![Page 8: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/8.jpg)
Let's start with the basics
[Diagram: Requester's Data, Template, HITs, Workers (Turkers)]
![Page 9: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/9.jpg)
Use cases
● Classification
● Transcription
● Content Generation
● Surveys
![Page 10: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/10.jpg)
Do people actually use this?
![Page 11: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/11.jpg)
AOL
![Page 12: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/12.jpg)
![Page 13: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/13.jpg)
CardMunch @LinkedIn
![Page 14: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/14.jpg)
The Sheep Market
![Page 15: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/15.jpg)
Development Tools
● Requester user interface
● Amazon offers four official APIs
  ○ Ruby, .NET, Perl, and Java
● AWS API
● Boto mturk
  ○ Python
● Houdini, Clockwork Raven, CrowdFlower, QuikTurkit
![Page 16: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/16.jpg)
Create a HIT
● A title
● A description
● Keywords, used to help Workers find the HITs with a search
● The amount of the reward
● An amount of time in which the Worker must complete the HIT
● An amount of time after which the HIT will no longer be available to Workers
● The number of Workers needed to submit results for the HIT before the HIT is considered complete
● Qualification requirements
● All of the information required to answer the question
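The properties above can be gathered in one place before any API call is made. The following is a minimal sketch, not the MTurk or boto API: `HitSpec` and its field names are hypothetical and simply mirror the list above, and the cost figure excludes any platform fees.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from decimal import Decimal
from typing import List


@dataclass
class HitSpec:
    """Hypothetical container mirroring the HIT properties listed above."""
    title: str
    description: str
    keywords: List[str]             # help Workers find the HIT via search
    reward: Decimal                 # amount paid per approved assignment
    assignment_duration: timedelta  # time a Worker has to complete the HIT
    lifetime: timedelta             # after this the HIT is no longer available
    max_assignments: int            # Workers needed before the HIT is complete
    qualification_requirements: List[str] = field(default_factory=list)
    question: str = ""              # everything needed to answer the question

    def validate(self) -> None:
        """Raise ValueError for values that cannot describe a usable HIT."""
        if self.reward < 0:
            raise ValueError("reward must not be negative")
        if self.max_assignments < 1:
            raise ValueError("at least one assignment is required")

    def max_reward_cost(self) -> Decimal:
        """Upper bound on rewards paid if every assignment is approved."""
        return self.reward * self.max_assignments
```

A spec with a reward of 0.05 and three assignments gives an upper bound of 0.15 in rewards, which is a quick sanity check to run before posting a large batch.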
![Page 17: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/17.jpg)
Process Results
● Assignment id
● Worker id
● HIT id
● Assignment status
● Auto approval time
● Accept time
● Submit time
● Approval time
● Rejection time
● Deadline
● Answer
● Requester feedback
![Page 18: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/18.jpg)
What was the question?
● Question forms
● External questions
● HTML questions
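Of the three question types, an external question is the easiest to sketch: it is a small XML document that points Workers at a page you host. The helper below is an illustration only. `build_external_question` is a hypothetical name, and the schema namespace is written from memory of the MTurk documentation, so check it against the current docs before use.

```python
import xml.etree.ElementTree as ET

# Assumed schema namespace for ExternalQuestion; verify against the MTurk docs.
EXTERNAL_QUESTION_NS = (
    "http://mechanicalturk.amazonaws.com/"
    "AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd"
)


def build_external_question(url: str, frame_height: int = 600) -> str:
    """Return ExternalQuestion XML pointing at a page hosted by the requester."""
    root = ET.Element("ExternalQuestion", xmlns=EXTERNAL_QUESTION_NS)
    ET.SubElement(root, "ExternalURL").text = url
    ET.SubElement(root, "FrameHeight").text = str(frame_height)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # example.com is a placeholder, not a real task page.
    print(build_external_question("https://example.com/task?id=1", 400))
```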
![Page 19: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/19.jpg)
Formatting HITs
● Compact
● Coherent
● Cost-effective
![Page 20: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/20.jpg)
Bad Actors
"Unfortunately, since manually verifying the quality of the submitted results is hard, malicious workers often take advantage of the verification difficulty and submit answers of low quality." [1]
![Page 21: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/21.jpg)
Quality Control
● Manually spot check
● Qualifications
● Multiple agreement
● Gold HITs
● Calculate worker error
![Page 22: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/22.jpg)
Quality Control: Manually Check
Look through the results of some workers and manually reject or ban those whose work looks bad
![Page 23: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/23.jpg)
Quality Control: Multiple Agreement
1. Submit HITs to multiple turks (3-10)
2. Reject/throw out all HITs below some agreement threshold
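The two steps above can be sketched as a majority vote with a cutoff. This is an illustration, not the presenters' demo code: the function names and the 0.6 threshold are assumptions chosen for the example.

```python
from collections import Counter


def consensus(answers, threshold=0.6):
    """Return the most common answer if its share meets the threshold, else None."""
    if not answers:
        return None
    label, count = Counter(answers).most_common(1)[0]
    return label if count / len(answers) >= threshold else None


def filter_by_agreement(hit_answers, threshold=0.6):
    """Split HITs into accepted {hit_id: label} and a list of rejected hit_ids.

    hit_answers maps each HIT id to the list of answers from different turks.
    """
    accepted, rejected = {}, []
    for hit_id, answers in hit_answers.items():
        label = consensus(answers, threshold)
        if label is None:
            rejected.append(hit_id)
        else:
            accepted[hit_id] = label
    return accepted, rejected
```

With three turks per HIT and a 0.6 threshold, two matching answers out of three are enough to accept a HIT, while three different answers cause it to be thrown out.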
![Page 24: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/24.jpg)
Quality Control: Qualifications
● Pay extra for "superior" turks
● Build your own custom qualification
"Thought Masters was just bad for non-blessed workers? It's even worse for requesters" [1]
![Page 25: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/25.jpg)
Quality Control: Gold HITs
1. Give turks HITs for which we already know the correct answer
2. Reject/ban turks with high error rates
This technique is used by CrowdFlower
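A rough sketch of the gold-HIT check follows. It is not CrowdFlower's implementation: the function names, the tuple layout of `submissions`, and the 0.3 cutoff are all assumptions made for illustration.

```python
def gold_error_rates(gold, submissions):
    """Compute each worker's error rate on gold HITs only.

    gold: {hit_id: correct_answer}
    submissions: iterable of (worker_id, hit_id, answer) tuples
    """
    seen, wrong = {}, {}
    for worker_id, hit_id, answer in submissions:
        if hit_id not in gold:
            continue  # ordinary HIT, no known answer to compare against
        seen[worker_id] = seen.get(worker_id, 0) + 1
        if answer != gold[hit_id]:
            wrong[worker_id] = wrong.get(worker_id, 0) + 1
    return {w: wrong.get(w, 0) / n for w, n in seen.items()}


def workers_to_ban(error_rates, max_error=0.3):
    """Return the sorted ids of workers whose gold error rate exceeds max_error."""
    return sorted(w for w, err in error_rates.items() if err > max_error)
```

Workers who have not yet seen a gold HIT do not appear in the result, so they are neither trusted nor banned until they have answered at least one.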
![Page 26: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/26.jpg)
Quality Control: Calculate Error
Calculate each worker's error rate based solely on their agreement with other workers. Use an expectation-maximization algorithm as described by Dawid and Skene.
This involves a lot of math; consider using a third-party service such as Project Troia
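To give a feel for the idea, here is a simplified "one-coin" variant: each worker gets a single accuracy number rather than the full per-class confusion matrix that Dawid and Skene describe. It is a sketch under that assumption, with hypothetical names, and it expects every label to be one of `classes` (at least two).

```python
from collections import defaultdict


def estimate_worker_error(labels, classes, iterations=20, eps=1e-6):
    """Estimate worker error rates from agreement alone (one-coin EM sketch).

    labels: list of (worker_id, item_id, label) tuples
    Returns ({worker_id: error_rate}, {item_id: {class: probability}}).
    """
    by_item = defaultdict(list)
    for worker, item, label in labels:
        by_item[item].append((worker, label))

    # Start from plain vote fractions as the belief about each item's true class.
    posterior = {
        item: {c: sum(1 for _, l in votes if l == c) / len(votes) for c in classes}
        for item, votes in by_item.items()
    }
    accuracy = {}
    for _ in range(iterations):
        # M-step: accuracy is the expected share of a worker's labels that are right.
        hits, counts = defaultdict(float), defaultdict(int)
        for item, votes in by_item.items():
            for worker, label in votes:
                hits[worker] += posterior[item][label]
                counts[worker] += 1
        accuracy = {w: min(max(hits[w] / counts[w], eps), 1 - eps) for w in counts}

        # E-step: re-weight each item's classes by the accuracy of who voted for them.
        for item, votes in by_item.items():
            scores = {}
            for c in classes:
                p = 1.0
                for worker, label in votes:
                    a = accuracy[worker]
                    p *= a if label == c else (1 - a) / (len(classes) - 1)
                scores[c] = p
            total = sum(scores.values())
            posterior[item] = {c: s / total for c, s in scores.items()}

    return {w: 1 - a for w, a in accuracy.items()}, posterior
```

When two workers always agree and a third matches them only half the time, the third worker's estimated error settles near 0.5 while the other two approach zero, without any gold answers being supplied.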
![Page 27: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/27.jpg)
Auto-approval
"Quick approval is important, too. Watching that money pile up is a serious motivator; I’ll sometimes choose a lower-paying task that approves in close to real time over a higher-paying one that won’t pay out for several days."
-- worker [1]
![Page 28: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/28.jpg)
Turkopticon
"Turkopticon lets you REPORT and AVOID shady employers"
![Page 29: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/29.jpg)
Turkernation
"If you want to make a living on Amazon Mechanical Turk, this is the forum for you"
![Page 30: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/30.jpg)
Do's and Don'ts
![Page 31: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/31.jpg)
What exactly do I do with this?
![Page 32: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/32.jpg)
A demo in Python
![Page 33: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/33.jpg)
Requirements
![Page 34: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/34.jpg)
Data Details
![Page 35: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/35.jpg)
Question template
![Page 36: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/36.jpg)
Build a custom qualification
![Page 37: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/37.jpg)
Post HITs....
![Page 38: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/38.jpg)
Success.
![Page 39: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/39.jpg)
Let the work begin.
![Page 40: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/40.jpg)
To get results...
![Page 41: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/41.jpg)
AWeber
We're hiring.
aweber.jobs
![Page 42: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/42.jpg)
....and we have slides.
aweberopenhouse.eventbrite.com
![Page 43: Data Processing with Mechanical Turk](https://reader034.fdocuments.us/reader034/viewer/2022052310/55499836b4c9050c708b4793/html5/thumbnails/43.jpg)