Posted on 26-Feb-2016
© 2009 Amazon.com, Inc. or its Affiliates.
Amazon Mechanical Turk New York City Meetup
September 1, 2009
WELCOME!
AGENDA
• Welcoming Statements
• Introductions
• Dolores Labs – Video Directory Use Case
• Knewton – Adaptive Learning Use Case
• Freedom OSS – Enterprise Integration
• New York University – Worker Quality Solution
• Panel Questions and Answers
Amazon Mechanical Turk Requester Meetup
Howie Liu, Dolores Labs
Dolores Labs Introduction
• Founded in 2008 by Lukas Biewald (Senior Scientist, Powerset (MSFT); Yahoo! Search; Stanford AI Lab)
  – Recognized the enormous potential of the AMT platform
• Dolores Labs develops quality control technology (CrowdControl™) to make AMT more accessible and reliable
Case Study
A large video directory needed to select relevant thumbnails for 200k+ videos
Why Mechanical Turk?
• The size of the project and the required turnaround speed made MTurk the obvious solution
  – Given the client's needs, traditional outsourcing or hiring employees was not an option
  – However, the client was concerned about the quality of results
• Inherent variability of Mechanical Turk workers
  – Unlike other Amazon marketplaces, workers are not a perfect commodity
  – Significant variations in quality (accuracy)
  – Need to ensure workers diligently completed the work
  – Need to intelligently aggregate multiple responses to find the single best thumbnail for a video
Three-Step Process for Optimizing the Task
Baseline Performance
• Create a custom interactive UI
• 74% result accuracy
CrowdControl™
• Apply statistical quality control
• 90% result accuracy
CrowdControl™ + 2 pass
• Second pass for Turkers to verify results
• 98% result accuracy
High Quality on Mechanical Turk: Best Practices
• Statistical inference algorithms to dynamically assess quality
  – …of each worker and of each result
  – …while the task is live
  – Smart allocation of worker resources: blindly increasing redundancy is expensive
• Aggregating all responses from workers with varying quality into a single “best” answer
• White paper with Stanford AI Lab about quality on AMT: http://bit.ly/DLpaper
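The slides do not describe how CrowdControl™ works internally. As a rough illustration of aggregating responses from workers of varying quality into a single "best" answer, the Python sketch below weights each vote by the worker's log-odds of being right. It assumes each worker's accuracy is already estimated (for example, from questions with known answers) and that a worker's mistakes are spread evenly over the other choices. The worker IDs and accuracies are hypothetical, and this is not Dolores Labs' algorithm.

```python
import math
from collections import defaultdict


def weighted_vote(responses, accuracy, num_choices):
    """Return the answer with the highest total log-odds weight.

    responses   -- list of (worker_id, answer) pairs for one item
    accuracy    -- dict mapping worker_id to estimated P(answer is correct)
    num_choices -- number of possible answers, e.g. candidate thumbnails

    Assumes independent workers who err uniformly over the wrong choices
    and who are better than random guessing (accuracy > 1 / num_choices).
    """
    scores = defaultdict(float)
    for worker, answer in responses:
        q = min(max(accuracy[worker], 1e-6), 1 - 1e-6)  # keep log() finite
        # Under the symmetric-error model, each vote adds
        # log(q * (K - 1) / (1 - q)) to the log-likelihood of its answer.
        scores[answer] += math.log(q * (num_choices - 1) / (1 - q))
    return max(scores, key=scores.get)


responses = [("w1", "A"), ("w2", "A"), ("w3", "B")]
accuracy = {"w1": 0.6, "w2": 0.6, "w3": 0.95}
print(weighted_vote(responses, accuracy, num_choices=3))  # B (a plain majority vote says A)
```

Here one 95%-accurate worker outweighs two 60%-accurate workers, which is why blind majority voting can pick the wrong thumbnail when worker quality varies.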
[Chart: CrowdControl™ vs. Baseline Result Accuracy (%, axis 70 to 100), comparing Baseline Performance, CrowdControl™, and CrowdControl™ + Custom Solutions]
Other Insights
• Clear task instructions are crucial for good results
  – Garbage in, garbage out
• An intuitive and efficient task interface makes the task faster (read: cheaper) and more fun!
• Mechanical Turk is an unprecedented, hyper-efficient labor marketplace
  – You need to understand its dynamics through experience in order to harness its power
Amazon Mechanical Turk Requester Meetup
Dahn Tamir, Knewton Inc.
Knewton - Introduction
Live online GMAT and LSAT prep courses customized for each student, powered by the world’s most advanced adaptive learning engine.
Selected for the 2009 AlwaysOn Global 250 list; named Category Winner in the Digital Education field.
How we use MTurk
Quality assurance
Focus Groups and Surveys
Database building
Marketing
Calibration for computer-adaptive testing
Why MTurk?
Cost
Appropriate worker population for each task
Quality
Speed
What We Learned
Use qualification tests
Invest in building good HITs
Think twice before rejecting work (but do reject cheaters)
Turkers are a diverse and capable population
Meet Turker Nation
Thank you!
Questions?
dahn@knewton.com
978-KNEWTON
Amazon Mechanical Turk Requester Meetup
Max Yankelevich, Chief Architect, Freedom OSS
Freedom OSS - Introduction
• Freedom OSS is a professional services organization with a focus on practical implementations using Cloud Computing & Open Source technologies
• International firm
  – US offices: PA, NYC, GA, KC, NV, WA, NC
  – 4 large solution centers in Eastern Europe (Russia, Belarus, Ukraine and Lithuania)
• Practical approach to Cloud Computing: most successfully completed Enterprise Cloud Computing projects in the industry
• Key Cloud Computing partnerships
  – Top Amazon AWS Enterprise System Integrator
  – Top Eucalyptus Enterprise Partner
• Key Open Source partnerships
  – Top Red Hat Advanced Business Partner
  – #1 JBoss Advanced Business Partner in US
• 2008 “JBoss SOA Innovation” Award Winner
• 2007-08 “Practical SOA” Award Winner
• 2008 “Red Hat Extensive Ecosystem” Award Winner
• Leading technology partner for many Fortune 2000 companies
• Freedom is a privately held corporation
MTurk and Enterprise Integration
• Most legacy systems are not architected to include human intervention
• Providing a technological interface that maintains the workflow while inserting human intelligence and building self-adjudicating business flows
• Leveraging Mechanical Turk programmatically in your everyday systems
• Freedom OSS has leveraged the power of Enterprise Service Bus (ESB) & Practical Service Oriented Architecture (SOA) to make on-boarding and managing MTurk workers rapid and cost-effective
• Using its professional open source ESB, freeESB, Freedom has developed many powerful connectors for some of the most widely used enterprise systems and technologies, such as SAP, Mainframe, Siebel, Java/J2EE, Oracle, IBM MQ, etc.
Master Data Cleansing & Validation Use Case
• Keeping a master customer data file (Master Data Management)
  – Record de-duping
  – Contact information validation
• Traditional MDM tactics
  – Expensive software
  – Big Bang approach
  – Invasive code changes to legacy applications
• Clean and consistent customer data
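The slides do not say how candidate duplicate records were found before being handed to workers. One common pattern, sketched below in Python as an assumption, is to let cheap string similarity flag likely duplicate pairs and send only those pairs to human reviewers. It uses only the standard library (`difflib.SequenceMatcher`); the field names, sample records and the 0.85 threshold are illustrative.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lower-case and join a record's field values into one comparable string."""
    return " ".join(str(v).strip().lower() for v in record.values())


def candidate_duplicates(records, threshold=0.85):
    """Return (i, j, similarity) for record pairs similar enough to need human review."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if score >= threshold:
            pairs.append((i, j, round(score, 3)))
    return pairs


records = [
    {"name": "John Smith", "phone": "555-0100"},
    {"name": "Jon Smith", "phone": "555-0100"},
    {"name": "Mary Jones", "phone": "555-0199"},
]
print(candidate_duplicates(records))  # [(0, 1, 0.973)]
```

In this arrangement only the flagged pairs would become review tasks, so workers spend their time on the ambiguous cases rather than on every possible pair.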
[Architecture diagram. Components: AWS Cloud; freeESB (routing, transformation, connectivity, QoS); business applications; real-time events; real-time access; legacy applications (Mainframe, Client-Server, Oracle, .NET, SAP, Siebel, etc.); API; First Turk Task: simple data checking; Second Turk Task: deeper data checking; Third Turk Task: data edit/trusted task; master data; business process orchestration & workflow; business rules engine.]
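The diagram lists three Turk tasks of increasing depth (simple checking, deeper checking, a trusted edit) but does not spell out how records flow between them. One plausible reading, sketched below in Python as an assumption, is an escalation pipeline: a record moves to the next, more expensive task only when the cheaper one flags it. The check and edit functions are hypothetical stand-ins for HITs.

```python
from typing import Callable, Dict, Tuple

Record = Dict[str, str]


def process_record(
    record: Record,
    simple_check: Callable[[Record], bool],
    deeper_check: Callable[[Record], bool],
    trusted_edit: Callable[[Record], Record],
) -> Tuple[str, Record]:
    """Route one master-data record through three escalating review tiers.

    Each check returns True when the record looks valid. In a real system each
    callable would post a HIT and wait for worker answers; here they are
    plain functions standing in for those tasks.
    """
    if simple_check(record):
        return "accepted", record
    if deeper_check(record):
        return "accepted_after_review", record
    return "corrected", trusted_edit(record)


# Hypothetical stand-ins for the three Turk tasks.
def simple(r: Record) -> bool:
    return "@" in r.get("email", "") and r.get("phone", "") != ""


def deeper(r: Record) -> bool:
    return "@" in r.get("email", "")


def edit(r: Record) -> Record:
    return {**r, "email": "verified@example.com"}


print(process_record({"email": "a@b.com", "phone": "555-0100"}, simple, deeper, edit))
print(process_record({"email": "a@b.com", "phone": ""}, simple, deeper, edit))
print(process_record({"email": "n/a", "phone": ""}, simple, deeper, edit))
```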
Outcome
• Low operational costs
• Non-invasive data integration
• High degree of accuracy due to multi-task distribution
• Some best practices when integrating MTurk within an enterprise
  – Deliver value incrementally
  – Inversion of Control
Thank you!
Questions?
Amazon Mechanical Turk Requester Meetup
Panos Ipeirotis, New York University
Panos Ipeirotis - Introduction
New York University, Stern School of Business
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu
Example: Build an Adult Web Site Classifier
• Need a large number of hand-labeled sites
• Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn)
Cost/Speed Statistics
• Undergrad intern: 200 websites/hr, cost: $15/hr
• MTurk: 2500 websites/hr, cost: $12/hr
Bad news: Spammers!
Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
Improve Data Quality through Repeated Labeling
• Get multiple, redundant labels using multiple workers
• Pick the correct label based on majority vote
• Probability of correctness increases with the number of workers
• Probability of correctness increases with the quality of workers
1 worker: 70% correct; 11 workers: 93% correct
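The "1 worker: 70%, 11 workers: 93%" comparison can be approximated with a simple binomial model. The Python sketch below assumes two possible labels, an odd number of workers, and workers who are each independently correct with probability p. The slide does not state its exact model; this idealized version gives about 92.2% for 11 workers rather than 93%.

```python
from math import comb


def majority_correct(p, n):
    """Probability that a strict majority of n independent workers is correct.

    Assumes binary labels, odd n, and that every worker is independently
    correct with probability p.
    """
    need = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


print(f"{majority_correct(0.7, 1):.1%}")   # 70.0%
print(f"{majority_correct(0.7, 11):.1%}")  # 92.2%
```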
But Majority Voting is Expensive
Single Vote Statistics
• MTurk: 2500 websites/hr, cost: $12/hr
• Undergrad: 200 websites/hr, cost: $15/hr
11-vote Statistics
• MTurk: 227 websites/hr, cost: $12/hr
• Undergrad: 200 websites/hr, cost: $15/hr
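The throughput and hourly figures translate directly into a cost per labeled site. The Python sketch below divides hourly cost by throughput and multiplies by the number of redundant votes. With 11 votes it effectively uses 2500/11 sites per hour, which the slide rounds to 227.

```python
def cost_per_label(hourly_cost, labels_per_hour, redundancy=1):
    """Dollar cost of one final label when each site is labeled `redundancy` times."""
    return hourly_cost * redundancy / labels_per_hour


undergrad = cost_per_label(15, 200)                 # one undergrad label per site
mturk_single = cost_per_label(12, 2500)             # one MTurk vote per site
mturk_11 = cost_per_label(12, 2500, redundancy=11)  # 11 MTurk votes per site

print(f"Undergrad:       ${undergrad:.4f} per site")     # $0.0750
print(f"MTurk, 1 vote:   ${mturk_single:.4f} per site")  # $0.0048
print(f"MTurk, 11 votes: ${mturk_11:.4f} per site")      # $0.0528
```

Eleven votes multiply the MTurk cost per site by eleven, bringing it to roughly 70% of the undergrad's per-site cost.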
Using redundant votes, we can infer worker quality
• Look at our spammer friend ATAMRO447HWJQ together with 9 other workers
• Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…
• We can compute error rates for each worker
Error rates for ATAMRO447HWJQ (P[true label → assigned label]):
  P[X → X] = 9.847%, P[X → G] = 90.153%
  P[G → X] = 0.053%, P[G → G] = 99.947%
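The slide does not show how these error rates are estimated. A simple single-pass approximation, sketched below in Python and not necessarily the method used in this talk, treats the majority vote on each site as the true label and tabulates how often a given worker assigns each label for each true label. Worker names, sites and labels in the example are made up.

```python
from collections import Counter, defaultdict


def majority_labels(labels):
    """labels: list of (worker, item, label). Returns {item: majority label}.

    Ties go to the label seen first, since Counter.most_common keeps
    first-encountered order for equal counts.
    """
    votes = defaultdict(Counter)
    for _worker, item, label in labels:
        votes[item][label] += 1
    return {item: c.most_common(1)[0][0] for item, c in votes.items()}


def worker_confusion(labels, worker):
    """Estimate P[true -> assigned] for one worker, treating the majority as true."""
    truth = majority_labels(labels)
    counts = defaultdict(Counter)
    for w, item, label in labels:
        if w == worker:
            counts[truth[item]][label] += 1
    return {
        t: {a: n / sum(c.values()) for a, n in c.items()}
        for t, c in counts.items()
    }


# Hypothetical data: sites s1, s2 are X; s3, s4 are G; "spam" always answers G.
labels = [
    ("good1", "s1", "X"), ("good2", "s1", "X"), ("spam", "s1", "G"),
    ("good1", "s2", "X"), ("good2", "s2", "X"), ("spam", "s2", "G"),
    ("good1", "s3", "G"), ("good2", "s3", "G"), ("spam", "s3", "G"),
    ("good1", "s4", "G"), ("good2", "s4", "G"), ("spam", "s4", "G"),
]
print(worker_confusion(labels, "spam"))  # {'X': {'G': 1.0}, 'G': {'G': 1.0}}
```

Scoring against the majority can go wrong on items where most workers are mistaken; iterative methods that re-estimate true labels and worker error rates together reduce that effect.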
Rejecting Spammers and Benefits
• Random answers error rate = 50%
• Average error rate for ATAMRO447HWJQ: 45.2%
  P[X → X] = 9.847%, P[X → G] = 90.153%
  P[G → X] = 0.053%, P[G → G] = 99.947%
Action: REJECT and BLOCK
Results:
• Over time you block all spammers
• Spammers learn to avoid your HITs
• You can decrease redundancy, as the quality of workers is higher
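A single average error rate can be computed from a worker's error rates by weighting each true class by how common it is. The Python sketch below assumes equal priors for X and G, which gives about 45.1% for ATAMRO447HWJQ. The slide's 45.2% presumably reflects class priors that are not shown. Either way, the result is close to the 50% error of random answers.

```python
def average_error_rate(confusion, priors):
    """Prior-weighted probability that a worker assigns a wrong label.

    confusion[t][a] is P[t -> a]: the probability of assigning label a
    to an item whose true label is t. priors[t] is the share of items
    whose true label is t.
    """
    return sum(
        priors[t] * sum(p for a, p in row.items() if a != t)
        for t, row in confusion.items()
    )


spammer = {  # error rates reported for ATAMRO447HWJQ
    "X": {"X": 0.09847, "G": 0.90153},
    "G": {"X": 0.00053, "G": 0.99947},
}
priors = {"X": 0.5, "G": 0.5}  # assumption: both classes equally common

err = average_error_rate(spammer, priors)
random_err = 1 - 1 / len(priors)  # uniform random guessing over the labels
print(f"{err:.2%} vs. {random_err:.0%} for random answers")  # 45.10% vs. 50% for random answers
```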
After rejecting spammers, quality goes up
• Spam keeps quality down
• Without spam, workers are of higher quality
• Need less redundancy for the same quality
• Same quality of results for lower cost
With spam: 1 worker, 70% correct; 11 workers, 93% correct
Without spam: 1 worker, 80% correct; 5 workers, 94% correct
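Under an idealized binary model (independent workers, each correct with probability p, majority vote over an odd number of workers), one can search for the smallest crowd that reaches a target accuracy. The Python sketch below uses a hypothetical 92% target. It needs 11 workers at 70% individual accuracy but only 5 at 80%, mirroring the with-spam versus without-spam comparison on this slide.

```python
from math import comb


def majority_correct(p, n):
    """P(strict majority of n independent workers is right), binary labels, odd n."""
    need = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def workers_needed(p, target, max_workers=51):
    """Smallest odd crowd size whose majority vote reaches `target` accuracy, or None."""
    for n in range(1, max_workers + 1, 2):
        if majority_correct(p, n) >= target:
            return n
    return None


print(workers_needed(0.7, 0.92))  # 11
print(workers_needed(0.8, 0.92))  # 5
```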
Correcting biases
• Classifying sites as G, PG (P), R, X
• Sometimes workers are careful but biased
• ATLJIK76YH1TF classifies G → P and P → R
• Average error rate for ATLJIK76YH1TF: 45.0%
Error rates for worker ATLJIK76YH1TF:
  P[G → G] = 20.0%, P[G → P] = 80.0%, P[G → R] = 0.0%, P[G → X] = 0.0%
  P[P → G] = 0.0%, P[P → P] = 0.0%, P[P → R] = 100.0%, P[P → X] = 0.0%
  P[R → G] = 0.0%, P[R → P] = 0.0%, P[R → R] = 100.0%, P[R → X] = 0.0%
  P[X → G] = 0.0%, P[X → P] = 0.0%, P[X → R] = 0.0%, P[X → X] = 100.0%
Is ATLJIK76YH1TF a spammer?
Correcting biases
• For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate (technical details omitted)
• Non-recoverable error rate for ATLJIK76YH1TF: 9%
Error rates for worker ATLJIK76YH1TF:
  P[G → G] = 20.0%, P[G → P] = 80.0%, P[G → R] = 0.0%, P[G → X] = 0.0%
  P[P → G] = 0.0%, P[P → P] = 0.0%, P[P → R] = 100.0%, P[P → X] = 0.0%
  P[R → G] = 0.0%, P[R → P] = 0.0%, P[R → R] = 100.0%, P[R → X] = 0.0%
  P[X → G] = 0.0%, P[X → P] = 0.0%, P[X → R] = 0.0%, P[X → X] = 100.0%
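The slide omits how the non-recoverable error rate is computed. One way to see why ATLJIK76YH1TF is not a spammer, sketched below in Python, is to invert the worker's error rates with Bayes' rule. An assigned G or P always comes from a true G site, and X is always right, so only the P-versus-R confusion is truly lost. The residual error here (the error left after mapping each assigned label to its most likely true label) is an illustrative proxy, not the slide's definition. With assumed equal class priors it gives 25% rather than the slide's 9%, because the priors and details behind that figure are not shown. The same equal priors do reproduce the slide's 45.0% average error rate.

```python
def posterior_true(confusion, priors, assigned):
    """P[true label = t | worker assigned `assigned`], by Bayes' rule."""
    joint = {t: priors[t] * confusion[t].get(assigned, 0.0) for t in priors}
    total = sum(joint.values())
    return {t: v / total for t, v in joint.items() if v > 0}


def average_error_rate(confusion, priors):
    """Prior-weighted probability of assigning a wrong label."""
    return sum(priors[t] * (1 - confusion[t].get(t, 0.0)) for t in priors)


def residual_error(confusion, priors):
    """Error left after mapping each assigned label to its most likely true label."""
    err = 0.0
    for a in priors:  # every label the worker might assign
        p_a = sum(priors[t] * confusion[t].get(a, 0.0) for t in priors)
        if p_a > 0:
            err += p_a * (1 - max(posterior_true(confusion, priors, a).values()))
    return err


biased = {  # error rates reported for ATLJIK76YH1TF (zero entries omitted)
    "G": {"G": 0.2, "P": 0.8},
    "P": {"R": 1.0},
    "R": {"R": 1.0},
    "X": {"X": 1.0},
}
priors = {"G": 0.25, "P": 0.25, "R": 0.25, "X": 0.25}  # assumption: equal priors

print(posterior_true(biased, priors, "P"))          # {'G': 1.0}
print(posterior_true(biased, priors, "R"))          # {'P': 0.5, 'R': 0.5}
print(f"{average_error_rate(biased, priors):.1%}")  # 45.0%
print(f"{residual_error(biased, priors):.1%}")      # 25.0%
```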
Too much theory? Open source implementation available at:
http://code.google.com/p/get-another-label/
• Input:
  – Labels from Mechanical Turk
  – Cost of incorrect labelings (e.g., X → G costlier than G → X)
• Output:
  – Corrected labels
  – Worker error rates
  – Ranking of workers according to their quality
• Alpha version, more improvements to come! Suggestions and collaborations welcome!
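The cost input reflects the fact that some mistakes matter more than others. A minimal Python sketch of the underlying idea, assuming a posterior over the true label is already available (for example, from combining workers' votes with their error rates): pick the label with the lowest expected cost rather than the most likely one. The function, posterior and cost values are illustrative and are not part of get-another-label's interface.

```python
def min_cost_label(posterior, cost):
    """Return the label with the lowest expected misclassification cost.

    posterior -- {true label: probability}, over all candidate labels
    cost      -- {(true label, assigned label): cost}; missing pairs cost 0
    """
    def expected_cost(assigned):
        return sum(p * cost.get((t, assigned), 0.0) for t, p in posterior.items())

    return min(posterior, key=expected_cost)


posterior = {"G": 0.7, "X": 0.3}            # hypothetical: 70% sure the site is G
cost = {("X", "G"): 10.0, ("G", "X"): 1.0}  # labeling a true X site as G is 10x worse
print(min_cost_label(posterior, cost))      # X (expected cost 0.7 vs. 3.0 for G)
```

With these costs the sketch labels the site X even though G is more likely; if the X → G cost dropped to 2, G would win (0.6 vs. 0.7).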
Thank you!
Questions?
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu