Crowdsourcing Knowledge-Intensive Tasks In Cultural Heritage: WebSci2014 Poster



Figure (pie chart): Type of crowd workers' annotations. Slice values as extracted: 45%, 12%, 13%, 3%, 27%. Legend as extracted: Genus common, Genus botanical, Species common, Species botanical, Non-flower names.

Crowdsourcing Knowledge-Intensive Tasks In Cultural Heritage

Jasper Oosterman, Alessandro Bozzon, Geert-Jan Houben (Delft University of Technology)

Archana Nottamkandath, Chris Dijkshoorn, Lora Aroyo (VU University Amsterdam)

Cultural Heritage Collections

Aspects of Knowledge Intensive Tasks

Experiment

Conclusions

Enrich data collections by tapping into the interest and expertise of crowds to create knowledge: Crowd Generated Knowledge.

✓ Data-intensive
• Rijksmuseum has 1M art pieces requiring annotation

✓ Knowledge-intensive
• Diverse and specific knowledge needed

✓ Goals
• Coverage: Enrich complete (sub)collection
• Quality: High quality annotations

A platform to support crowd-enabled, collaborative annotation processes.

What is the relation between entity identification difficulty and crowd annotation behavior?

Challenge

Identification
• Identify relevant entities
• Prominence and amount of entities
• Artistic interpretation, lack of detail, fantasy

Species: Rosa californica
Genus: Rosa
Family: Rosaceae

Annotation
• Tag identified entities
• Specificity of tags
• Domain and culture specific knowledge
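
The Family / Genus / Species example above suggests that tags can be compared by how specific they are. The following is a minimal sketch of one way to model that, not the data model used by Accurator: the `Taxon` class, the `RANKS` ordering, and the `is_consistent` helper are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Ranks ordered from least to most specific, following the poster's example.
RANKS = ("family", "genus", "species")


@dataclass(frozen=True)
class Taxon:
    """Hypothetical taxon record: a rank, a name, and an optional parent."""
    rank: str
    name: str
    parent: Optional["Taxon"] = None

    def lineage(self):
        """Return taxa from the most general ancestor down to this one."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        return list(reversed(chain))

    def specificity(self):
        """Higher value means a more specific tag."""
        return RANKS.index(self.rank)


# The example shown on the poster.
rosaceae = Taxon("family", "Rosaceae")
rosa = Taxon("genus", "Rosa", rosaceae)
rosa_californica = Taxon("species", "Rosa californica", rosa)


def is_consistent(tag, reference):
    """True if `tag` is the reference taxon or one of its ancestors."""
    return tag in reference.lineage()
```

Under this sketch, a worker who tags a print with the genus Rosa is consistent with a reference annotation of Rosa californica, only less specific, whereas the reverse does not hold.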

Setup
• 82 prints from the Rijksmuseum containing flowers
• Tasks: annotate prints with specific flower names
• Executed by experts and crowd workers via crowdsourcing platforms

Experimental platform: Accurator

Insights
• Domain-specific tasks on crowdsourcing (CS) platforms are not popular, but knowledge is present in some workers.
• Flower prominence does not affect identification
• Print difficulty only affects flower type identification
• Low crowd annotator agreement → worker selection and task orchestration are required
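
The poster does not state how annotator agreement was measured. As an illustration only, the sketch below computes the mean pairwise Jaccard overlap between the label sets that different workers give to the same print. The function names and the example labels are hypothetical and are not taken from the experiment's data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two label collections (1.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def mean_pairwise_agreement(labels_by_worker):
    """Average Jaccard overlap over all worker pairs, or None if fewer than two workers."""
    pairs = list(combinations(labels_by_worker.values(), 2))
    if not pairs:
        return None
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical labels from three workers for a single print.
example = {
    "worker_a": {"rose", "tulip"},
    "worker_b": {"rose"},
    "worker_c": {"lily"},
}
score = mean_pairwise_agreement(example)
```

A low score across many prints would be one signal that worker selection or task orchestration is needed; other measures, such as chance-corrected agreement coefficients, could be substituted.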

Links: Demo, Video

Experiment performed within the SEALINCMedia project. Scan for a demo of Accurator or a video explaining our research together with the Rijksmuseum.

Figure: % Wrong Answer (y-axis, 0 to 1.0) for # of Flowers and Flower Types, each split into NP and P.

# Workers opened task: 732
# Workers passed test questions: 84
# Selected workers: 44
# Annotation tasks performed: 488
  # “Fantasy” task: 58
  # “Unable” task: 70
  # “Flowers” task: 360
# Flower labels: 465
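
The counts above can be read as a worker funnel followed by a breakdown of the performed tasks. The sketch below, using only the reported numbers, computes the retention between funnel stages and checks that the three task outcomes add up to the task total. The variable and function names are hypothetical, and treating all 465 flower labels as coming from the “Flowers” tasks is an assumption, not something the poster states.

```python
# Reported counts from the poster.
funnel = [
    ("opened task", 732),
    ("passed test questions", 84),
    ("selected workers", 44),
]
task_outcomes = {"Fantasy": 58, "Unable": 70, "Flowers": 360}
total_tasks = 488
flower_labels = 465


def step_rates(stages):
    """Fraction of workers retained from each stage to the next."""
    return [
        (later_name, later_count / earlier_count)
        for (_, earlier_count), (later_name, later_count) in zip(stages, stages[1:])
    ]


rates = step_rates(funnel)

# The three outcomes account for every performed annotation task.
outcomes_match_total = sum(task_outcomes.values()) == total_tasks

# Assumption: every flower label was produced in a "Flowers" task.
labels_per_flowers_task = flower_labels / task_outcomes["Flowers"]

for name, rate in rates:
    print(f"{name}: {rate:.1%} of previous stage")
```

On these numbers, about 11.5% of the workers who opened the task passed the test questions, and about 52% of those were selected.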

Figure: Error Rate (y-axis, 0 to 1.0, medians marked) for Flower # Id. and Flower Type Id., each split into Easy, Average, Hard.