SED2012 Dataset

The 2012 Social Event Detection Dataset
Symeon Papadopoulos (1), Emmanouil Schinas (1), Vasileios Mezaris (1), Raphaël Troncy (2), Yiannis Kompatsiaris (1)
(1) CERTH-ITI, Thessaloniki, Greece
(2) EURECOM, Sophia Antipolis, France
Oslo, 28 Feb - 1 Mar 2013


Presentation of the SED2012 dataset @ MMSys 2013, Oslo, Norway

Transcript of SED2012 Dataset

Page 1: SED2012 Dataset

The 2012 Social Event Detection Dataset

Symeon Papadopoulos (1), Emmanouil Schinas (1), Vasileios Mezaris (1), Raphaël Troncy (2), Yiannis Kompatsiaris (1)

(1) CERTH-ITI, Thessaloniki, Greece
(2) EURECOM, Sophia Antipolis, France

Oslo, 28 Feb - 1 Mar 2013

Page 2: SED2012 Dataset


SED2012 Overview

• Large collection (>160K) of CC-licensed Flickr photos and some of their metadata

• Event annotations for 149 target events (of specific categories and locations of interest)

• Primary use: Social event detection
– Used in the context of MediaEval 2012 (SED task)

• Secondary uses: image geotagging, distractors in CBIR, city summarization

Page 3: SED2012 Dataset


Dataset Overview

Flickr photo collection
• 167,332 photos
• 4,422 unique contributors
• Creative Commons licenses

Event Annotations
• Challenge 1: Technical events in Germany
• Challenge 2: Soccer events in Hamburg and Madrid
• Challenge 3: Indignados movement events in Madrid

Page 4: SED2012 Dataset


Data Collection Process

• Flickr API: http://www.flickr.com/services/api/
• Used method flickr.photos.search with five geographical centres: Barcelona, Cologne, Hamburg, Hannover, Madrid
• Time period: Jan 2009 – Dec 2011
• All photos CC licensed
• 403 photos from the EventMedia collection

R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th International Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
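The collection step above could be approximated with a request builder like the one below. This is a minimal sketch, not the authors' crawler: the Flickr API's search method is named flickr.photos.search, and the parameters used (lat, lon, radius, min_taken_date, max_taken_date, license, has_geo, extras) are documented for it, but the city coordinates, the 30 km radius, the chosen extras, and the helper name build_search_url are assumptions made for illustration. The slides do not state the radius or the exact query settings.

```python
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

# Approximate city-centre coordinates (assumed; the slides only name the cities).
CITY_CENTRES = {
    "Barcelona": (41.3851, 2.1734),
    "Cologne": (50.9375, 6.9603),
    "Hamburg": (53.5511, 9.9937),
    "Hannover": (52.3759, 9.7320),
    "Madrid": (40.4168, -3.7038),
}

def build_search_url(api_key, city, page=1, radius_km=30):
    """Build a flickr.photos.search URL for one city and one result page.

    radius_km is a hypothetical choice; Flickr caps radial queries at 32 km.
    """
    lat, lon = CITY_CENTRES[city]
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "radius": radius_km,
        "radius_units": "km",
        # Time period stated on the slide: Jan 2009 to Dec 2011.
        "min_taken_date": "2009-01-01 00:00:00",
        "max_taken_date": "2011-12-31 23:59:59",
        # Flickr license ids 1-6 are the Creative Commons licenses.
        "license": "1,2,3,4,5,6",
        "has_geo": 1,
        "extras": "geo,tags,date_taken,owner_name,license",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)
```

Calling build_search_url("YOUR_KEY", "Madrid") returns a URL that can be fetched with any HTTP client; paging through results and de-duplicating photos across the five centres is left out of the sketch.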

Page 5: SED2012 Dataset


Photo Distribution

[Figures: place distribution, yearly distribution, language distribution]

Page 6: SED2012 Dataset


Dataset Collection Motivation

Selection of five cities (three German, two Spanish):
• Include a large number of non-English text metadata (cf. language distribution table)
• Ensure existence of numerous events for the target types
• Include distractor images:
– Challenge 2: Cologne and Hannover as distractors for Hamburg, Barcelona as distractor for Madrid
– Challenge 3: Barcelona as distractor for Madrid

Selection of only geotagged photos:
• Ease of annotation

Selection of only CC-licensed photos:
• Reuse of collection for research

Page 7: SED2012 Dataset


Tag Statistics (1/2)

51,611 unique tags

prevalence of location-specific tags

event-specific tags

number of users using the tag

Page 8: SED2012 Dataset


Tag Statistics (2/2)

Figure labels: barcelona, spain, madrid

• >20K photos have no tags
• 83.9% of photos have 10 or fewer tags
• >40K tags appear fewer than 10 times
• >57% of tags appear once or twice
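Statistics of this kind can be recomputed from any mapping of photos to tag lists. The sketch below is illustrative only: the input format (a dict from photo id to a list of tags) and the function name tag_stats are assumptions, not the layout of the released dataset files.

```python
from collections import Counter

def tag_stats(photo_tags):
    """Summarise tag usage for a dict {photo_id: [tag, ...]} (assumed format)."""
    # Count in how many photos each tag occurs (a tag counts once per photo).
    counts = Counter(tag for tags in photo_tags.values() for tag in set(tags))
    n_photos = len(photo_tags)
    n_tags = len(counts)
    untagged = sum(1 for tags in photo_tags.values() if not tags)
    at_most_10 = sum(1 for tags in photo_tags.values() if len(set(tags)) <= 10)
    rare = sum(1 for c in counts.values() if c <= 2)
    return {
        "unique_tags": n_tags,
        "untagged_photos": untagged,
        "share_photos_le_10_tags": at_most_10 / n_photos if n_photos else 0.0,
        "share_tags_once_or_twice": rare / n_tags if n_tags else 0.0,
    }

# Toy example with made-up photo ids.
example = {
    "p1": ["madrid", "spain"],
    "p2": ["madrid"],
    "p3": [],
    "p4": ["madrid", "sol"],
}
stats = tag_stats(example)
```

On the toy input, three unique tags are found, one photo is untagged, and two of the three tags occur only once.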

Page 9: SED2012 Dataset


User Statistics

30 most active users contribute ~30% of dataset

60% of users contribute fewer than 10 photos

Page 10: SED2012 Dataset


Ground Truth Creation

• Manual annotations by use of CrEve
– web-based annotation
– two-round annotation by five annotators (three in the first, two in the second)
– interactive annotation (search & annotate)
– each round terminated as soon as no new event-related photos discovered
– approximate effort: 100 person-hours
• Annotations for Challenge 1 enriched by EventMedia (403 photos featuring technical events in Germany)

C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012

Page 11: SED2012 Dataset


Ground Truth Statistics (1/3)

10 events are associated with >100 photos

~27% of events associated with 1 or 2 photos

Page 12: SED2012 Dataset


Ground Truth Statistics (2/3)

106 events are captured by single users

9 events captured by more than 10 people

erroneous timestamps in photos

The majority of events last for less than a day (typical for soccer)

Page 13: SED2012 Dataset


Ground Truth Statistics (3/3)

Madrid events

Vicente Calderon stadium

Puerta del Sol

Santiago Bernabeu stadium

Stadium of Butarque

Page 14: SED2012 Dataset


Technical Event Examples

• PHP Unconf. 2010
• Gamescom 2009
• CeBIT 2010
• Convention Camp 2011

Page 15: SED2012 Dataset


Soccer Event Examples

• Real Madrid – Milan (2010)
• World Cup 2010
• St. Pauli – HSV (2010)
• Spain – Colombia (2011)

Page 16: SED2012 Dataset


Indignados Event Examples

• Inaugural march, 15 May
• Large gathering, 20 May
• Gathering, 15 Oct
• Demonstration, 17 Nov

Page 17: SED2012 Dataset


Evaluation

• F-measure (macro), Precision, Recall
– goodness of retrieved photos, but not how well they were clustered into events
• Normalized Mutual Information (NMI)
– compares automatically extracted clustering of photos into events with the ground truth
• Evaluation script is made available together with the dataset.
• Implementation of event detection available: http://mklab.iti.gr/project/sed2012_certh
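To make the two kinds of measure concrete, the sketch below computes precision, recall and F-measure for a set of retrieved photos, and NMI between two clusterings of the same photos. It is not the evaluation script distributed with the dataset. The function names, the input formats (sets of photo ids; dicts from photo id to event label), and the choice to normalise mutual information by the mean of the two entropies are assumptions; other NMI normalisations exist, and the official script may handle unassigned photos differently.

```python
from collections import Counter
from math import log

def precision_recall_f1(relevant, retrieved):
    """Precision, recall and F1 for two sets of photo ids."""
    tp = len(relevant & retrieved)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def nmi(truth, pred):
    """NMI between two labelings given as dicts {photo_id: event_label}.

    Assumes both dicts cover the same photo ids. Normalised as
    2 * I(T; P) / (H(T) + H(P)), one of several common variants.
    """
    photos = sorted(truth)
    t = [truth[p] for p in photos]
    d = [pred[p] for p in photos]
    n = len(photos)
    ct, cd, joint = Counter(t), Counter(d), Counter(zip(t, d))
    mi = sum((c / n) * log(c * n / (ct[a] * cd[b]))
             for (a, b), c in joint.items())
    denom = entropy(t) + entropy(d)
    return 1.0 if denom == 0 else 2 * mi / denom

# Toy example: same grouping under different label names.
truth = {"a": "e1", "b": "e1", "c": "e2", "d": "e2"}
same = {"a": "x", "b": "x", "c": "y", "d": "y"}
mixed = {"a": "x", "b": "y", "c": "x", "d": "y"}
```

In the toy example, nmi(truth, same) is 1 because the grouping is identical up to label names, while nmi(truth, mixed) is 0 because the predicted clusters carry no information about the true events. A macro-averaged F-measure would average the per-event F1 values, which presupposes that detected events have already been matched to ground-truth events.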

Page 18: SED2012 Dataset

Questions

@sympapadopoulos www.slideshare.net/sympapadopoulos