Transcript of: Collective Intelligence: Crowdsourcing groundtruth data for large scale evaluation in Information Retrieval (Sergej Zerr)

Page 1:

Collective Intelligence: Crowdsourcing groundtruth data for large scale evaluation in Information Retrieval
Sergej Zerr

Page 2:

Challenges in IR Evaluation

[Diagram: a query is evaluated against a document set D_1, D_2, …, D_k; queries, documents, and relevance judgments together form test collections.]

➢ Big Data
➢ Heterogeneity (larger annotation demand)
➢ Dynamicity (updates required)
➢ Novel tasks (no test collections)

• Relevance ranking
• Search result diversification
• Temporal retrieval
• etc.

Page 3:

Challenges in IR Evaluation


➢ Better human accessibility
  ➢ WiFi, mobile networks, portable gadgets (larger crowd)

➢ Challenges:

➢ How to motivate the crowd to work?
➢ How to obtain meaningful results from the individuals?
➢ How to aggregate the crowdsourced results?
➢ How to evaluate the output?

Page 4:

Outline


• Collaborative Advantages
• The wisdom of crowds
• Conditions for a successful collaboration

• Obtaining collaborative knowledge
• Crowd motivation
• Scalability/Efficiency
• Own work

• Input/Output Evaluation
• Users and Data
• Quality assurance

• Discussion

Page 5:

Collaboration


Often we need more than one hand

Also more than one brain

“Why the Many Are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies and Nations”
James Surowiecki

.. The Wisdom of Crowds ..

Page 6:


In 1906, the statistician Francis Galton observed a competition at a country fair. The crowd accurately guessed the weight of an ox when their individual guesses were averaged: the average was closer to the ox's true butchered weight than the estimates of most individual crowd members, including the experienced ones.

Page 7:

Crowd IQ: aggregating opinions to boost performance


Yoram Bachrach, Thore Graepel, Gjergji Kasneci, Michal Kosinski, Jurgen Van Gael: Crowd IQ: aggregating opinions to boost performance. AAMAS 2012

Page 8:

United Brains


[Chart: utility vs. number of contributors; a few experts (≈10) vs. the masses (10,000+): the masses yield equivalent, or greater, utility under the curve.]

Page 9:

4,000 experts · 80,000 articles · 200 years to develop · annual updates
vs.
>100,000 amateurs · 1.6 million articles · 5 years to develop · real-time updates

United Brains


[Chart: utility vs. number of contributors; experts ($$$$, ≈4,000) vs. masses ($, 100,000+).]

Page 10:

Can we collaborate? (Task 1)

➢ Guess the weight in kg


Page 11:

(In)direct collaboration in IR can be used for:

➢ Collaborative tagging
➢ Favorite assignments
➢ Click logs
➢ Data partitioning
➢ Recommendations
➢ etc.


Tags: Rainbow, Sea, Island, Green, Palm tree, Maui

Page 12:

Collaboration: Paradox

➢ Using the “Wisdom of Crowds” is not always straightforward to achieve.

➢ Collaborative work needs to be managed efficiently

➢ Kasparov won against the World in 1999: http://en.wikipedia.org/wiki/Kasparov_versus_the_World


[Chart: number of processors vs. performance, Oracle speedups: https://docs.oracle.com/cd/E19205-01/819-5265/bjael/index.html]

Page 13:

Collaboration: Success Criteria


Criteria: Description
Diversity of opinion: Each person should have “private” information.
Independence: People's opinions aren't determined by the opinions of those around them.
Decentralization: People are able to specialize and draw on local knowledge.
Aggregation: An effective mechanism exists for turning private judgments into a collective decision.

Page 14:

Groupthink Symptoms
Irving Lester Janis (26 May 1918 – 15 November 1990)

• Collective rationalization
• Self-censorship


• Direct pressure on dissenters
• Self-appointed ‘mindguards’

https://www.youtube.com/watch?v=fuIXiXqv978

https://www.youtube.com/watch?v=6vWjXgxQhOk


Page 16:

Collaboration

“The best collective decisions are the product of disagreement and contest, not consensus or compromise.”

“The best way for a group to be smart is for each person in it to think and act as independently as possible.”

Page 17:

Outline


• Collaborative Advantages
• The Wisdom of Crowds
• Conditions for a successful collaboration

• Obtaining collaborative knowledge
• Crowd motivation
• Scalability/Efficiency
• Own work

• Input/Output Evaluation
• Users and Data
• Quality assurance

• Discussion

Two different images that share the same labels: man and woman

Page 18:

Machine vs. Human

Humans can still solve some tasks more efficiently and/or more accurately than a machine can:

• Captcha (OCR)
• Classification
• Image tagging
• Speech recognition
• Face/emotion recognition

Page 19:

Declarative Crowdsourcing Systems


J. Fan et al: CrowdOp: Query Optimization for Declarative Crowdsourcing Systems, TKDE, 2015
Michael J. Franklin et al: CrowdDB: answering queries with crowdsourcing, SIGMOD 2011
A. Marcus et al: Human-powered Sorts and Joins, VLDB 2011

SELECT c.name
FROM celeb c JOIN photos p
ON samePerson(c.img, p.img)

TASK samePerson(f1, f2) TYPE EquiJoin:
    SingularName: "celebrity"
    LeftPreview: "<img src='%s'>", tuple1[f1]
    RightPreview: "<img src='%s'>", tuple2[f2]
    Combiner: MajorityVote

SELECT image i FROM serengety
ORDER BY CROWDORDER(i, "Which image contains more baby animals");

Page 20:

Gathering Input, Reusing “Natural” Human Power


Page 21:

Human Computation Platforms and Motivation for Participation


• Helping/Contribute to something important
• Social pressure
• Virtual goods
• Competitions
• Gaming
• Money

Paid Crowdsourcing

Citizen Science

Page 22:

Monetary-Based Motivation


Human Intelligence Task (HIT)

Page 23:

MTurk: IR Example – Snippet Evaluation

➢ Research question and setup
  ➢ Study on summary lengths
  ➢ Determine preferred result length
  ➢ Asked workers to evaluate snippet quality

➢ Payment between $0.01 and $0.05 per HIT
➢ 12,790 queries, 40K judgments, $400–$2,000 (300h of work)

M. Kaisser, M. Hearst, and L. Lowe. “Improving Search Results Quality by Customizing Summary Lengths”, ACL/HLT, 2008.

Page 24:

IR Example – Relevance Assessment

➢ Research question and setup
  ➢ Replace TREC-like relevance assessors with MTurk?
  ➢ Selected topic “space program” (011)
  ➢ Modified original 4-page instructions from TREC

➢ Result:
  ➢ Workers more accurate than the original assessors!
  ➢ 40% provided a justification for each answer

➢ Payment of $0.02 per HIT
➢ 1 topic, 29 documents, 290 judgments ($6)

O. Alonso and S. Mizzaro. “Can we get rid of TREC assessors? Using Mechanical Turk for relevance assessment”, SIGIR Workshop on the Future of IR Evaluation, 2009.

Page 25:

Games

➢ ESP Game: label images
  ➢ Image retrieval by text
➢ Squigl: match the labels to areas
  ➢ Object recognition
➢ Matchin: find the better image
  ➢ Image ranking
➢ FlipIt: memory with similar images
  ➢ Near-duplicate detection

• Other areas covered as well: label songs, find synonyms, describe videos
• See: www.gwap.com by Luis von Ahn

Page 26:

Useful human power for annotating the Web

• 5000 people playing simultaneously could label all images on Google in 30 days!

• Individual games in Yahoo! and MSN average over 5,000 players at a time

Page 27:

Urbanopoly


I. Celino et al.: Urbanopoly – A Social and Location-Based Game with a Purpose to Crowdsource Your Urban Data. PASSAT/SocialCom 2012

Page 28:

Competition based Motivation (Image Privacy)

Page 29:

Gathering average community notion of privacy

➢ We crawled “most recently uploaded” Flickr photos (2 months)
➢ Started a social annotation game (over the course of 2 weeks)
➢ 81 users (colleagues, social networks, forum users), 6 teams
➢ Collected around 30K annotated photos


Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova: Privacy-Aware Image Classification and Search, SIGIR’12

Page 30:

The GUI for Privacy Aware Image IR

Page 31:

Motivation: Add Social Pressure


Oluwaseyi Feyisetan, Elena Simperl: Please Stay vs Let's Play: Social Pressure Incentives in Paid Collaborative Crowdsourcing. ICWE 2016

Page 32:

Combine Gamification, Competition and Money

➢ Problem: improve the time-aware cost-effectiveness of crowdsourcing

• Individual reward mechanisms – Competitive game designs for improving the cost effectiveness of crowdsourcing (CIKM’14)
• Team-based reward mechanisms – Groupsourcing: Team competition designs for crowdsourcing (WWW’15)
• Temporal-based crowdsourcing performance – Just in Time: Controlling Temporal Performance in Crowdsourcing Competitions (WWW’16)

Page 33:

Reward Distribution 1: “Pay-per-Task” (Baseline)

➢ Fixed reward rate c ($ per task) for each worker
➢ Reward of a worker proportional to the value produced (e.g., no. of annotations, ratings, etc.)


M. Rokicki, S. Chelaru, S. Zerr, and S. Siersdorfer. Competitive game designs for improving the cost effectiveness of crowdsourcing. CIKM’14

Page 34:

Reward Distribution 2: Competitions

➢ Workers compete during a limited time period
➢ Workers obtain scores based on their performance (e.g., no. of tasks fulfilled)
  − Ranking of the workers based on their performance
  − Distribution of the rewards according to the rank

A minimal sketch of both reward schemes follows below.
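The sketch compares the two payout rules in Python; the per-task rate, prize budget, and decay factor are illustrative assumptions, not parameters from the CIKM’14 study.

# Sketch: pay-per-task baseline vs. rank-based competition payout.

def pay_per_task(task_counts, rate=0.02):
    # Baseline: every worker earns a fixed rate per completed task.
    return {worker: n * rate for worker, n in task_counts.items()}

def competition(task_counts, budget=10.0, decay=0.5):
    # Rank workers by output; budget shares shrink geometrically per rank.
    ranked = sorted(task_counts, key=task_counts.get, reverse=True)
    weights = [decay ** rank for rank in range(len(ranked))]
    total = sum(weights)
    return {worker: budget * w / total for worker, w in zip(ranked, weights)}

counts = {"w1": 120, "w2": 80, "w3": 30}
print(pay_per_task(counts))   # {'w1': 2.4, 'w2': 1.6, 'w3': 0.6}
print(competition(counts))    # top-ranked w1 earns the largest share

The baseline pays linearly in output, while the competition concentrates the reward at the top of the ranking, which is what creates the incentive to outwork the other participants.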

Page 35:

Workers’ View: Tasks

Page 36:

Performance of Strategies (Captcha Task)

Page 37:

Competition among Top-10 Workers

Page 38:

Team-based reward mechanisms
Can we use work groups to further improve the performance?

M. Rokicki, S. Zerr, and S. Siersdorfer. Groupsourcing: Team competition designs for crowdsourcing. WWW’15

[Diagram: design dimensions – monetary incentives, competitions, individuals vs. teams (WWW 2015).]

Page 39:

➢ Rewards:
  ➢ Non-linear distribution among teams
  ➢ Individual share proportional to contribution

➢ Communication:
  ➢ Team chats with notifications

➢ Combinations with individual reward
  − balanceTS
  − ind-balanceTS

Page 40:

Performance of the Strategies

Page 41:

Results: Team Contributions

Page 42:

Temporal-based crowdsourcing performance
Can we control the crowd to annotate at the right times?

M. Rokicki, S. Zerr, and S. Siersdorfer. Just in Time: Controlling Temporal Performance in Crowdsourcing Competitions. WWW’16

Page 43:

Peak vs. Non-Peak

Page 44:

Number of correct annotations per minute around a typical bonus hour.

Page 45:

Annotation dynamics

Page 46:

Workers’ Interaction

➢ Communication in team chats
➢ 3,400 messages by over 200 participants


Bonus coming 14 minutes from now. prepare everyone, we must try that first place :D

our rank fall from rank 4-5 to 12 bcoz the other team work on time of bonus

as soon as they announce the time of the next bonus I will email you

Page 47:

Contributions


A medium information policy and exponential rewards significantly increase crowd performance (by 300%)


Balanced teams + individual rewards further increase performance and make the work more attractive (+30%)

Framework using our strategies additionally increased output in peak times by more than 300%

• Individual reward mechanisms – Competitive game designs for improving the cost effectiveness of crowdsourcing (CIKM’14)
• Team-based reward mechanisms – Groupsourcing: Team competition designs for crowdsourcing (WWW’15)
• Temporal-based crowdsourcing performance – Just in Time: Controlling Temporal Performance in Crowdsourcing Competitions (WWW’16)

Page 48:

Motivation: Contribute to Science (“Zooniverse”)


https://www.zooniverse.org/

Start: 2009 · 1.6M users · 106M classifications · 90 projects, 50 active

Page 49:

Motivation: Contribute to Science (“Cities at Night”)

➢ Classification of night photos from the ISS to estimate artificial light pollution in cities
➢ Observe temporal development, measure impact on citizens and biosphere


http://stars4all.eu
http://www.citiesatnight.org/

Start: 2011 · Users: 50K · 1.3M tasks · 647 projects

Page 50:

Motivation: Contribute to Science (“Cities at Night”)

➢ Task 1: “Dark skies” – Find night cities in a photo stream (over 100K annotated)

➢ Task 2: “Lost at night” – identify the city on the photograph (around 500 identified)

➢ Task 3: “Night cities” – position, rotate and scale the image to the map.

Page 51:

Combine Human and Machine Input

➢ Assign DBPedia class to entities

➢ Baby food, Petroleum industry in Nigeria, Light infantry

➢ "Region", "Locality„ "Settlement"


Qiong Bu, Elena Simperl, Sergej Zerr and Yunjia Li: Using microtasks to crowdsource DBpedia entity classification: A study in workflow design.

Page 52:

Combine Human and Machine Input

Page 53:

Output Aggregation

➢ Statistical models
  ➢ Majority voting
➢ Graphical models
➢ Optimization models

Page 54:

Outline


• Collaborative Advantages
• The Wisdom of Crowds
• Conditions for a successful collaboration

• Obtaining collaborative knowledge
• Gathering Data from Social Web / Mechanical Turk
• From Data to Knowledge (Applications)
• Own work

• Input/Output Evaluation
• Users and Data
• Quality assurance

• Discussion

Page 55:

Asking questions

➢ Ask / formulate the right questions
➢ Part art, part science
➢ Instructions are key
➢ Workers may not be IR experts (don't assume a shared understanding of the terminology)

➢ Show examples


N. Bradburn, S. Sudman, and B. Wansink. Asking Questions: The Definitive Guide to Questionnaire Design, Jossey-Bass, 2004

Page 56:

Quality: Ambiguity and Subjectivity


Nederland, netherlands, holland, dutch, Rotterdam, wielrennen, cycling, duck, le grand depart, tour de france, Reklame, caravan, Funny Fotos

What is relevant?

„Alice saw Bob with the binoculars“

Page 57:

Quality: Data from Social Web

➢ A simple random sample can result in a set dominated by a few power users

Page 58:

Demographic Bias (Zooniverse)


[Charts: project preferences and working sessions.]

S. Zerr, R. Tinati, M. Luczak-Roesch, and E. Simperl: Investigating the Global Adoption of Citizen Science. Collective Intelligence 2016

Page 59:

Rater Reliability “Where is the cat?”

[Figure: individual raters' correct (V) and incorrect (X) judgments per image, and the aggregated results.]

Page 60:

Quality Assurance

➢ Qualification tests
➢ Test questions
➢ “Static” honeypots
➢ “Dynamic” honeypots
➢ Workers’ reputation mechanisms
➢ Inter-rater agreement

Page 61:

Test Questions


Matthew Lease and Omar Alonso: http://de.slideshare.net/mattlease/crowdsourcing-for-search-evaluation-and-socialalgorithmic-search

Throw a coin and tell us the result:
• Head
• Tail

Results:
• Head: 61
• Tail: 39

People often tend to just select the first option ☹

Better: ask for a preliminary textual answer first:
• Coin type?
• Head or tail?

Page 62:

Honeypots

➢ Static honeypots
  ➢ Let the workers perform the task; reject the results with honeypot errors
➢ Dynamic batches with injected honeypots
  ➢ Only reject the low-quality batches (see the sketch below)
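A minimal sketch of honeypot-based filtering, assuming answers arrive in batches as dictionaries; the accuracy threshold and data layout are illustrative assumptions, not taken from any specific platform.

# Sketch: reject batches whose injected honeypots were answered poorly.

def honeypot_accuracy(batch, gold):
    # Fraction of injected honeypot tasks answered correctly in this batch.
    hits = [answer == gold[task] for task, answer in batch.items() if task in gold]
    return sum(hits) / len(hits) if hits else 0.0

def filter_batches(batches, gold, threshold=0.8):
    # Keep only batches whose honeypot accuracy meets the threshold.
    return [b for b in batches if honeypot_accuracy(b, gold) >= threshold]

gold = {"hp1": "cat", "hp2": "dog"}               # honeypots with known answers
batches = [
    {"t1": "car", "hp1": "cat", "hp2": "dog"},    # 2/2 honeypots correct: kept
    {"t2": "bus", "hp1": "cow", "hp2": "dog"},    # 1/2 honeypots correct: rejected
]
print(len(filter_batches(batches, gold)))         # 1

With static honeypots the same gold tasks are reused across workers; with dynamic injection each batch gets fresh gold tasks, so workers cannot learn and memorize them.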

Page 63:

Measure the Inter–Rater Reliability

Attractive?

Rater A: 1 0 0 1 1 1
Rater B: 1 1 0 0 0 1

                A: Yes   A: No   total
B: Yes            2        1       3
B: No             2        1       3
total             4        2       6

Naive approach: agreement in 3 cases out of 6 = 0.5

Kilem L. Gwet: Handbook of Inter-Rater Reliability, 2010

Page 64:

Statistical Significance

First experiment: throw the dice with the right hand 10 times; compute the average: 3.7

Second experiment: throw the dice with the left hand 10 times; compute the average: 4.4

Claim that the left hand is better…
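Such a claim needs a significance test. The sketch below makes the point with a permutation test on two made-up samples of ten throws whose means match the slide (3.7 and 4.4): a difference of this size arises easily by chance.

import random

right = [3, 4, 2, 6, 1, 5, 4, 3, 4, 5]   # illustrative throws, mean 3.7
left  = [5, 6, 4, 3, 6, 2, 5, 4, 5, 4]   # illustrative throws, mean 4.4
observed = abs(sum(left) / 10 - sum(right) / 10)

random.seed(0)
pooled, extreme = right + left, 0
for _ in range(10000):
    # Shuffle the pooled throws and split them into two random "hands".
    random.shuffle(pooled)
    diff = abs(sum(pooled[:10]) / 10 - sum(pooled[10:]) / 10)
    extreme += diff >= observed
print(extreme / 10000)   # p-value far above 0.05: no evidence the hands differ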

Page 65:

Inter–Rater Reliability: Cohen’s Kappa (1960)

➢ Idea: we need to remove the agreement achieved just by chance:
κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from the raters’ marginal proportions.

                A: Yes   A: No   total
B: Yes           35       20      55
B: No             5       40      45
total            40       60     100

Here p_o = (35 + 40) / 100 = 0.75, p_e = 0.55 · 0.40 + 0.45 · 0.60 = 0.49, so κ = (0.75 − 0.49) / (1 − 0.49) ≈ 0.51.

In general, for a 2×2 table with cell counts n11, n12, n21, n22 and total n: p_o = (n11 + n22) / n.
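A minimal Python sketch of the computation (our own illustration, not code from the lecture); run on the six “Attractive?” judgments from the earlier slide it returns κ = 0.0, showing that 0.5 raw agreement can be entirely due to chance.

from collections import Counter

def cohens_kappa(a, b):
    # Cohen's kappa for two equal-length label sequences.
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum((ca[l] / n) * (cb[l] / n) for l in set(ca) | set(cb))
    return (p_o - p_e) / (1 - p_e)

a = [1, 0, 0, 1, 1, 1]      # rater A from the "Attractive?" slide
b = [1, 1, 0, 0, 0, 1]      # rater B
print(cohens_kappa(a, b))   # 0.0: p_o = 0.5 but p_e = 0.5 as well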

Page 66:

Inter–Rater Reliability: Missing Values

➢ Idea: Use partial ratings to estimate the marginal probability only

                A: Yes   A: No   A: missing   total
B: Yes           30       18        2           50
B: No             5       34        3           42
B: missing        5        3        0            8
total            40       55        5          100

(Generic cell counts n11, n12, n21, n22, with margins n1x, n2x, nx1, nx2 for the missing ratings; the partial ratings contribute to the marginal probabilities only.)

Page 67:

Task 2: Cohen’s Kappa

➢ Exercise 2: Find a partner. Using your judgements from Task 2, compute Cohen’s Kappa between your and your partner’s ratings.

https://en.wikipedia.org/wiki/Cohen%27s_kappa

Page 68:

Inter–Rater Reliability: Extensions

➢ Multiple raters/categories:
  ➢ Fleiss 1971 – average over random pairs of raters for random objects
➢ Adjustment for ordinal and interval data, weighting:
  ➢ Weight judgments using distances between categories
  ➢ Measures: AC1, AC2 (ordinal and interval data)
➢ Check for statistical significance:
  ➢ The number of categories and/or raters matters.


Kilem L. Gwet: Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 2014

Page 69:

Task 3: Fleiss’ Kappa

➢ Exercise 3: Using your judgements from Task 3, compute Cohen’s Kappa (3 categories) between your and your partner’s ratings.

➢ Exercise 4: Split the class into 2 groups. Using your judgements from Task 3, compute Fleiss’ Kappa within the group.

https://en.wikipedia.org/wiki/Fleiss%27_kappa
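For Exercise 4, a minimal sketch of Fleiss’ kappa; the count matrix (4 items, 3 raters, 3 categories) is illustrative toy data, not from the lecture.

def fleiss_kappa(counts):
    # counts[i][j] = number of raters assigning item i to category j.
    n_items, n_raters = len(counts), sum(counts[0])
    # Per-item agreement P_i: fraction of agreeing rater pairs.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    total = n_items * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

counts = [[3, 0, 0], [0, 2, 1], [1, 1, 1], [0, 0, 3]]
print(round(fleiss_kappa(counts), 3))   # 0.362: "fair" on the Landis & Koch scale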

Page 70:

Inter–Rater Reliability: Kappa Interpretations

Landis & Koch
Kappa          Strength of Agreement
< 0.0          Poor
0.0 – 0.20     Slight
0.21 – 0.40    Fair
0.41 – 0.60    Moderate
0.61 – 0.80    Substantial
0.81 – 1.00    Almost Perfect

Fleiss
Kappa          Strength of Agreement
0.0 – 0.40     Poor
0.41 – 0.75    Intermediate to Good
> 0.75         Excellent

Altman
Kappa          Strength of Agreement
< 0.20         Poor
0.21 – 0.40    Fair
0.41 – 0.60    Moderate
0.61 – 0.80    Good
0.81 – 1.00    Very Good

Please note: these interpretations have proven useful mostly in the medical domain (diagnosis).

Page 71:

Truth Inference: Motivating Example

➢ Where was the 2014 FIFA World Cup held?
  ➢ A: Brazil
  ➢ B: Germany


I think, it was Germany.

Can we trust him?

Page 72:

Motivating Example

➢ Where was the 2014 FIFA World Cup held?
  ➢ A: Germany
  ➢ B: Brazil

“I think, B.” “I think, B.” “I support A.” “I vote A.”

What is the truth?

Page 73:

Truth Inference Definition

➢ Given the answers collected from workers for different tasks, the goal is to infer the truth of each task.

[Diagram: workers provide votes/answers for a set of tasks; the truth of each task is initially unknown and must be inferred.]

Page 74:

A Simple Solution: Majority Voting

Take the answer that is voted for by the majority (or most) of the workers.

➢ Limitation: treats each worker equally, neglecting each worker's individual quality: an expert who is good at search counts the same as a spammer answering at random (see the sketch below).
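A quick sketch in Python, assuming answers are collected per task as plain lists (illustrative data):

from collections import Counter

def majority_vote(answers):
    # answers: {task: [worker answers]} -> {task: most frequent answer}
    return {task: Counter(votes).most_common(1)[0][0]
            for task, votes in answers.items()}

answers = {
    "2014 World Cup": ["Brazil", "Brazil", "Germany"],
    "2018 World Cup": ["Russia", "Brazil", "Russia"],
}
print(majority_vote(answers))
# {'2014 World Cup': 'Brazil', '2018 World Cup': 'Russia'}

Every vote counts the same here, which is exactly the limitation: one expert can be outvoted by two spammers.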

Page 75:

The Key Idea to Truth Inference

➢ Knowing each worker’s quality

Page 76:

How to estimate the worker’s quality?

➢ Idea: Compute each worker’s quality by considering the workers’ answers for all tasks

Q1: Where was the 2014 FIFA World Cup held? A: Brazil, B: Germany
Q2: Where is the 2018 FIFA World Cup held? A: Brazil, B: Russia

Workers’ answers (Q1, Q2): Worker 1: A, B · Worker 2: A, B · Worker 3: B, B

Page 77:

Workers’ Quality, Step 1: Infer the labels

Workers’ quality (initially): 1.0, 1.0, 1.0

Workers’ answers (Q1, Q2): Worker 1: A, B · Worker 2: A, A · Worker 3: B, B

Vote counts: Q1: A: Brazil (2), B: Germany (1) → infer A; Q2: A: Brazil (1), B: Russia (2) → infer B

Page 78:

Workers’ Quality, Step 2: Infer the quality

Workers’ answers (Q1, Q2): Worker 1: A, B · Worker 2: A, A · Worker 3: B, B
Inferred labels: Q1 → A (Brazil), Q2 → B (Russia)

Each worker’s quality is the fraction of answers matching the inferred labels:
Worker 1 (A, B): 2/2 correct → quality 1.0
Worker 2 (A, A): 1/2 correct → quality 0.5
Worker 3 (B, B): 1/2 correct → quality 0.5

Page 79:

Unified Framework in Existing Works

➢ Input: Workers’ answers for all tasks
➢ Output: Quality for each worker and truth for each task

➢ Algorithm: iterate two steps until convergence: (1) infer the truth of each task from the answers, weighted by the current worker qualities; (2) re-estimate each worker’s quality against the inferred truths (see the sketch below).
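A minimal sketch of this loop (an EM-style iteration; the fixed iteration count and data structures are our own simplifications, not a specific published algorithm):

def infer(answers, iterations=10):
    # answers: {task: {worker: label}} -> (truths, worker qualities)
    workers = {w for votes in answers.values() for w in votes}
    quality = {w: 1.0 for w in workers}          # start by trusting everyone
    truths = {}
    for _ in range(iterations):
        # Step 1: infer each task's truth by quality-weighted voting.
        for task, votes in answers.items():
            scores = {}
            for worker, label in votes.items():
                scores[label] = scores.get(label, 0.0) + quality[worker]
            truths[task] = max(scores, key=scores.get)
        # Step 2: re-estimate each worker's quality against the truths.
        for w in workers:
            done = [t for t, votes in answers.items() if w in votes]
            quality[w] = sum(answers[t][w] == truths[t] for t in done) / len(done)
    return truths, quality

answers = {
    "q1": {"w1": "A", "w2": "A", "w3": "B"},   # 2014 World Cup: A = Brazil
    "q2": {"w1": "B", "w2": "A", "w3": "B"},   # 2018 World Cup: B = Russia
}
print(infer(answers))
# truths {'q1': 'A', 'q2': 'B'}; qualities w1 = 1.0, w2 = 0.5, w3 = 0.5

On the slides' toy example this reproduces the quality estimates 1.0, 0.5, 0.5 from the previous two pages.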

Page 80:

Basics for Truth Inference from the Literature

➢ Tasks
  ➢ Different task types
➢ Workers
  ➢ Different worker models
➢ Objectives
  ➢ Different objective functions

Page 81:

Truth Inference Exercises

➢ Exercise 5: Infer the truth values for data collected for 3.b

➢ Exercise 6: Using Cohen’s Kappa for 3 categories, compare the agreement between your inferred truth values and the groundtruth

Aggregating Workers’ Input

Page 82:

Summary

➢ Wisdom of the Crowd: Collective Intelligence and Groupthink

➢ Obtaining Collaborative Knowledge: Motivation in Paid Crowdsourcing and Citizen Science

➢ Result Aggregation and Quality Assurance

Page 83:

Discussions / Questions / Remarks

Page 84:

Outline


• Collaborative Advantages
• The Wisdom of Crowds
• Conditions for a successful collaboration

• Small experiment
• Can we collaborate?

• Obtaining collaborative knowledge
• Crowd motivation
• Scalability/Efficiency
• Own work

• Input/Output Evaluation
• Users and Data
• Quality assurance

• Discussion

Page 85:

Source: http://www.zimbio.com/pictures/zrf_WCjHyqn/Baby+Elephant+Born+Munich+Zoo/H_xQAzpvSP9

Page 86:

Results of the Experiment (June 2016):


Baby Elephant: http://www.zimbio.com/pictures/zrf_WCjHyqn/Baby+Elephant+Born+Munich+Zoo/H_xQAzpvSP9

➢ The real weight of the baby elephant: 112 kg

➢ Average of the 38 estimates: 113.32 kg

➢ Max/Min guesses: 300/2

The graph shows the individual estimates as blue points, the running average after each estimate as the grey dotted line, and the real value as the orange line.

Page 87:

Results of the Experiment (May 2018):


Baby Elephant: http://www.zimbio.com/pictures/zrf_WCjHyqn/Baby+Elephant+Born+Munich+Zoo/H_xQAzpvSP9

➢ The real weight of the baby elephant: 112 kg

➢ Average of the 8 estimates: 119.92 kg

➢ Max/Min guesses: 320/27

The graph shows the individual estimates as blue points, the running average after each estimate as the grey dotted line, and the real value as the orange line.

Page 88:

References

Oluwaseyi Feyisetan, Elena Simperl: Please Stay vs Let's Play: Social Pressure Incentives in Paid Collaborative Crowdsourcing. ICWE 2016
J. Fan et al.: CrowdOp: Query Optimization for Declarative Crowdsourcing Systems. TKDE 2015
Kilem L. Gwet: Handbook of Inter-Rater Reliability: The Definitive Guide to Measuring the Extent of Agreement Among Raters, 2014
Lei Chen (HKUST), Dongwon, and Meihui Zhang: Crowdsourcing in Information and Knowledge Management. CIKM 2014 Tutorial
Yoram Bachrach, Thore Graepel, Gjergji Kasneci, Michal Kosinski, Jurgen Van Gael: Crowd IQ: Aggregating Opinions to Boost Performance. AAMAS 2012
I. Celino et al.: Urbanopoly – A Social and Location-Based Game with a Purpose to Crowdsource Your Urban Data. PASSAT/SocialCom 2012
Matthew Lease, Omar Alonso: Crowdsourcing for Search Evaluation and Social-Algorithmic Search. SIGIR 2012 Tutorial
Omar Alonso, Matthew Lease: Crowdsourcing for Information Retrieval: Principles, Methods, and Applications, 2011
Michael J. Franklin et al.: CrowdDB: Answering Queries with Crowdsourcing. SIGMOD 2011
A. Marcus et al.: Human-powered Sorts and Joins. VLDB 2011
M. Kaisser, M. Hearst, and L. Lowe: Improving Search Results Quality by Customizing Summary Lengths. ACL/HLT 2008
O. Alonso and S. Mizzaro: Can We Get Rid of TREC Assessors? Using Mechanical Turk for Relevance Assessment. SIGIR Workshop on the Future of IR Evaluation, 2009
N. Bradburn, S. Sudman, and B. Wansink: Asking Questions: The Definitive Guide to Questionnaire Design. Jossey-Bass, 2004
James Surowiecki: The Wisdom of Crowds, 2004
Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova: Privacy-Aware Image Classification and Search. SIGIR 2012
M. Rokicki, S. Chelaru, S. Zerr, and S. Siersdorfer: Competitive Game Designs for Improving the Cost Effectiveness of Crowdsourcing. CIKM 2014
M. Rokicki, S. Zerr, and S. Siersdorfer: Groupsourcing: Team Competition Designs for Crowdsourcing. WWW 2015
M. Rokicki, S. Zerr, and S. Siersdorfer: Just in Time: Controlling Temporal Performance in Crowdsourcing Competitions. WWW 2016