Digital Humanities Benelux 2017: Keynote Lora Aroyo

Post on 21-Jan-2018

http://lora-aroyo.org @laroyo

Harnessing Human Semantics at Scale

Measurable, Reproducible, Engaging, Sustainable Crowdsourcing & Nichesourcing

Lora Aroyo

1998 2006 2007 2009

from DVDs to data science

Team BellKor wins Netflix Prize

1994 2003 2006 2016 2017

from books to data science

data is at the centre of every process

data is essential to evolve with users

Ceci n'est pas … la mona lisa

Louvre’s Mona Lisa is only #14

the battle of two worlds

9.3 million Louvre visitors (2014)

14 million website visitors

2.3 million on social media

in the (very near) future

most visitors will be digital-born

not bound by time or location

native to new forms of co-makership

native to new media

Siebe Weide, Max Meijer and Marieke Krabshuis (2012). Agenda 2026: Study on the Future of the Dutch Museum Sector

variety of meanings

multitude of perspectives

abundance of sources

endless contexts

know your data

crowdsourcing to know your data at scale

variety of types

multitude of platforms

abundance of interactions

endless characteristics

know your crowds

https://www.rijksmuseum.nl/en/rijksstudio

Engage with Co-creation

Engage with Co-creativity

Engage with Co-curation

Engage the Expert Niche

http://annotate.accurator.nl

expertise of Rijksmuseum professionals is in annotating their collection with art-historical information, e.g. when works were created, by whom, etc.

detailed domain-specific information about depicted objects, e.g. which species an animal or plant belongs to, is in most cases not available

use nichesourcing, i.e. niches of people with the right expertise, to add more specific information

Keep Reproducing

http://annotate.accurator.nl

Engage with Games

training the general crowd to be a niche: a game in which players can carry out expert annotation tasks with some assistance

http://waisda.nl

Engage with Games

http://spotvogel.vroegevogels.vara.nl

Keep Reproducing

CrowdTruth.org

Experiment with Paid Crowds

http://crowdtruth.org/

http://data.crowdtruth.org/

Challenges

Low reproducibility rates

Difficult to estimate & control the time to complete

Difficult to assess & compare quality

Demands continuous promotional effort

Active learning (human-in-the-loop) needs different expertise

Difficult to incorporate results into existing content infrastructure

Crowdsourcing typically undertaken in isolation

Assess Impact of Task Design

Instructions

Layout

Sequence

Crowds

Payment

Campaign

experiment with different designs

for example

mapping music to mood

Goal: Which is the mood most appropriate for each song?

Choose one:

Cluster 1: passionate, rousing, confident, boisterous, rowdy
Cluster 2: rollicking, cheerful, fun, sweet, amiable, good-natured
Cluster 3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster 4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster 5: aggressive, fiery, tense, anxious, intense, volatile, visceral
Other: does not fit into any of the 5 clusters

(Lee and Hu 2012)

1 song - 1 mood???

If “One Truth” & “No Disagreement”

Worker   Selections among Mood-C1 to Mood-C5
W1       1
W2       1
W3       1
W4       1
W5       1
W6       1
W7       0
W8       0
W9       1
W10      1

Totals per cluster: Mood-C1 1, Mood-C2 3, Mood-C3 1, Mood-C4 2, Mood-C5 1

If “Many Truths” & “Disagreement”

Worker   Selections among Mood-C1 to Mood-C5 and Other
W1       3
W2       3
W3       3
W4       2
W5       2
W6       3
W7       3
W8       3
W9       2
W10      5

Totals per cluster: Mood-C1 3, Mood-C2 5, Mood-C3 6, Mood-C4 5, Mood-C5 2, Other 8
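The difference between the "one truth" and "many truths" views can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the talk: the per-worker selections in `votes` are invented for the example (they are not the values from the slides), and the helper names `totals`, `majority_label` and `distribution` are assumptions of this sketch.

```python
from collections import Counter

LABELS = ["Mood-C1", "Mood-C2", "Mood-C3", "Mood-C4", "Mood-C5", "Other"]

# Hypothetical multi-label selections for one song (NOT the slide data).
votes = {
    "W1": ["Mood-C2", "Mood-C3"],
    "W2": ["Mood-C3", "Other"],
    "W3": ["Mood-C2", "Mood-C3", "Mood-C4"],
    "W4": ["Other"],
}


def totals(votes):
    """Count how many workers selected each label."""
    counts = Counter(label for picks in votes.values() for label in picks)
    return {label: counts.get(label, 0) for label in LABELS}


def majority_label(counts):
    """'One truth' view: keep only the most selected label."""
    return max(LABELS, key=lambda label: counts[label])


def distribution(counts):
    """'Many truths' view: keep the share of selections per label."""
    total = sum(counts.values())
    return {label: counts[label] / total for label in LABELS}


counts = totals(votes)
print(majority_label(counts))
print(distribution(counts))
```

The majority label throws away the fact that several workers also picked other clusters; the distribution keeps that disagreement available for later analysis.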

Web & Media Group

this all results in simplification of context

Assess Impact of Results

● Identify crowdsourcing goals through user log analysis
  ○ # queries, # unique queries, # queries of specific type
  ○ ranked by popularity
  ○ ranked by popularity and with error, e.g.
    ■ # queries entered over 50 times with 0 results
    ■ # queries of specific type with 0 results
  ○ which will have biggest impact
  ○ which has biggest urgency
● … or through other user analysis
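The log statistics listed above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the log format (a list of `(query, result_count)` pairs), the function name `summarize` and the sample queries are all hypothetical; only the threshold of 50 comes from the slide.

```python
from collections import Counter


def summarize(log, min_count=50):
    """Summarize a search log given as (query, result_count) pairs."""
    query_counts = Counter(query for query, _ in log)
    zero_counts = Counter(query for query, results in log if results == 0)
    return {
        "queries": len(log),
        "unique_queries": len(query_counts),
        # Queries ranked by popularity, most frequent first.
        "by_popularity": query_counts.most_common(),
        # Queries entered more than min_count times that returned 0 results.
        "frequent_zero_result": sorted(
            query
            for query, count in query_counts.items()
            if count > min_count and zero_counts[query] > 0
        ),
    }


# Hypothetical log entries, invented for this example.
log = (
    [("windmill", 4)] * 3
    + [("queen's day 1980", 0)] * 51
    + [("tulip", 0)] * 2
)
summary = summarize(log)
print(summary["frequent_zero_result"])
```

Queries that are both frequent and unanswered are natural candidates for a crowdsourcing campaign, since closing those gaps affects the most users.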

for example

in video search

people search for fragments

experts annotate full videos

35% of search queries result in not found

Measure Quality

“On the Role of User-Generated Metadata in A/V Collections”, Riste Gligorov et al. KCAP2011

time-based annotation

bernhard

88% of the tags useful for specific genres

describe short segments

often not very specific

don’t describe program as a whole

video annotation is time-consuming: 5 times the video duration

experts use a specific vocabulary that is unknown to general audiences

user vocabulary

8% in professional vocabulary

23% in Dutch lexicon

89% found on Google

locations (7%), e.g. engeland

persons (31%)

objects (57%)
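Overlap figures of this kind can be computed with a simple set comparison. The sketch below is an assumption about how such a measurement might look, not the procedure from the cited paper: the function `coverage`, the case-insensitive matching over distinct tags, and the sample tags and vocabularies are all hypothetical (only "engeland" and "bernhard" appear in the slides).

```python
def coverage(tags, vocabulary):
    """Percentage of distinct user tags that also occur in a reference vocabulary."""
    distinct = {tag.strip().lower() for tag in tags if tag.strip()}
    if not distinct:
        return 0.0
    known = {term.strip().lower() for term in vocabulary}
    return 100.0 * len(distinct & known) / len(distinct)


# Hypothetical sample data, invented for this example.
user_tags = ["engeland", "bernhard", "fiets", "boot", "Engeland"]
professional_vocabulary = ["Engeland"]
general_lexicon = ["engeland", "fiets", "boot"]

print(coverage(user_tags, professional_vocabulary))
print(coverage(user_tags, general_lexicon))
```

A low overlap with the professional vocabulary alongside a high overlap with a general lexicon would indicate that users describe content in everyday terms that expert metadata does not cover.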

human subjectivity, ambiguity & uncertainty of expression

natural part of human semantics

measure quality

quality is not just about spam

quality is typically multi-dimensional

understand the diversity in crowd answers

do not ignore multitude of interpretations

understand the variety of contexts

identify cases with high ambiguity, similarity, …

experiment with explicit metrics

experiment with different designs
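One possible explicit metric for spotting ambiguous cases is the normalized entropy of the vote distribution for an item. This is a sketch under stated assumptions: normalized entropy is chosen here purely for illustration and is not presented in the talk as the CrowdTruth metric; the function name is hypothetical. The two input lists are the per-cluster totals from the mood tables earlier in the deck.

```python
import math


def normalized_entropy(counts):
    """Entropy of a vote distribution, scaled to the range 0..1.

    0 means all votes fall on one label; 1 means votes are spread
    evenly over all labels.
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probabilities = [count / total for count in counts if count > 0]
    entropy = sum(-p * math.log(p) for p in probabilities)
    return entropy / math.log(len(counts))


one_truth = [1, 3, 1, 2, 1]       # totals for Mood-C1 to Mood-C5
many_truths = [3, 5, 6, 5, 2, 8]  # totals for Mood-C1 to Mood-C5 and Other

print(round(normalized_entropy(one_truth), 3))
print(round(normalized_entropy(many_truths), 3))
```

Items with a score close to 1 are candidates for closer inspection: the spread may reflect genuine ambiguity in the item rather than low-quality work.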

Measure Progress

6 months: 340,551 tags; 137,421 matches; 602 items; 555 registered players; thousands of anonymous players

2 years: 36,981 tags; 1,782 items; 2,017 users (taggers)

12,279 visits (3+ min online)

44,362 pageviews

Riste Gligorov, Michiel Hildebrand, Jacco van Ossenbruggen, Guus Schreiber, Lora Aroyo (2011). On the role of user-generated metadata in audio visual collections. International Conference on Knowledge Capture (K-CAP '11), pages 145-152.

campaign, campaign, campaign

Goals

Measurable quality

Reproducible results

Sustainable settings

Engaging interaction