Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Tina Walber, Ansgar Scherp, Steffen Staab
University of Koblenz-Landau, Koblenz, Germany
Multimedia Modeling Conference (MMM 2012), Klagenfurt, Austria

Description

Slides of our MMM 2012 paper. Download the slides to enjoy all animations.

Transcript of Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Page 1: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Tina Walber, Ansgar Scherp, Steffen Staab
University of Koblenz-Landau, Koblenz, Germany

Multimedia Modeling Conference
Klagenfurt, Austria, January 4-6, 2012

Page 2: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Motivation: Image Tagging

Find specific objects in images
Analyzing the user's gaze path only

[Example image tagged with: sidewalk, car, store, tree, people, girl]

Page 3: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Research Questions

1. Best fixation measure to find the correct image region given a specific tag?

2. Can we differentiate two regions in the same image?

Page 4: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

3 Steps Conducted by Users

1. Look at a red blinking dot
2. View the image together with the provided tag
3. Decide whether the tagged object can be seen ("y" or "n")

Page 5: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Dataset: LabelMe community images

Manually drawn polygons
Regions annotated with tags

182,657 images (August 2010)

High-quality segmentation and annotation

Used as ground truth
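
The LabelMe annotations are plain XML files; below is a minimal reading sketch, assuming the usual LabelMe layout with <object>, <name>, and <polygon>/<pt> elements (the file name is only a placeholder, not from the slides).

```python
# Minimal sketch: read tagged polygon regions from a LabelMe XML annotation.
# Assumes the usual LabelMe layout (<object> with <name> and <polygon>/<pt>);
# "annotation.xml" is a placeholder file name.
import xml.etree.ElementTree as ET

def load_labelme_regions(path):
    """Return a list of (tag, polygon) pairs, polygon = [(x, y), ...]."""
    root = ET.parse(path).getroot()
    regions = []
    for obj in root.findall("object"):
        tag = (obj.findtext("name") or "").strip()
        pts = [(float(pt.findtext("x")), float(pt.findtext("y")))
               for pt in obj.findall("polygon/pt")]
        if tag and pts:
            regions.append((tag, pts))
    return regions

# Example: regions = load_labelme_regions("annotation.xml")
```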

Page 6: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Experiment Images and Tags

Randomly selected 51 images
Contain at least two tagged regions

Created two tag sets for the 51 images
Each image is assigned two tags (one per set)

Keep subjects concentrated during the experiment

Tags are either "true" or "false"
"true": the object described by the tag can be seen
"false": the object cannot be seen in the image

Page 7: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Subjects & Experiment System 20 subjects

16 male, 4 female (age: 23-40, Ø=29.6) Undergrads (6), PhD (12), office clerks (2)

Experiment system Simple web page in Internet Explorer Standard notebook, resolution 1680x1050 Tobii X60 eye-tracker (60 Hz, 0.5° accuracy)

Page 8: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Conducting the Experiment

Each user looked at 51 tag-image pairs
The first tag-image pair was discarded

94.3% correct answers
Equal for true and false tags
~3 s until decision (on average)

85% of users strongly agreed or agreed that they felt comfortable during the experiment

The eye tracker had little influence on comfort

Page 9: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Pre-processing of Eye-tracking Data

Obtained 547 gaze paths from 20 users where
the user gave the correct answer and
the image has a "true" tag assigned

Fixation: focus on a particular point on the screen
Fixation extraction with Tobii Studio's velocity & distance thresholds

Requirement: at least one fixation inside or near the correct region
476 (87%) of the gaze paths fulfill this requirement
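
Tobii Studio's exact fixation filter is proprietary; the sketch below is a generic velocity-threshold (I-VT style) detector in the same spirit. Only the 60 Hz sampling rate comes from the slides; the speed and duration thresholds are illustrative assumptions.

```python
# Generic velocity-threshold (I-VT style) fixation detection sketch, not
# Tobii Studio's proprietary filter. Only the 60 Hz rate is from the slides.
def detect_fixations(samples, hz=60, max_speed_px_s=1000.0, min_dur_s=0.1):
    """samples: list of (x, y) gaze points; returns (cx, cy, duration_s) fixations."""
    dt = 1.0 / hz
    fixations, current = [], []

    def flush():
        # Keep the group as a fixation only if it lasted long enough.
        if len(current) * dt >= min_dur_s:
            cx = sum(p[0] for p in current) / len(current)
            cy = sum(p[1] for p in current) / len(current)
            fixations.append((cx, cy, len(current) * dt))
        current.clear()

    for prev, cur in zip(samples, samples[1:]):
        speed = ((cur[0] - prev[0]) ** 2 + (cur[1] - prev[1]) ** 2) ** 0.5 / dt
        if speed <= max_speed_px_s:   # eye roughly still: extend the fixation
            current.append(cur)
        else:                         # saccade: close the current fixation
            flush()
    flush()
    return fixations
```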

Page 10: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Analysis of Gaze Fixations (1)

Applied 13 fixation measures to the 476 gaze paths (2 new, 7 standard Tobii, 4 from the literature)

Fixation measure: a function on the users' gaze paths
Calculated for each image region, over all users viewing the same tag-image pair

Page 11: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Considered Fixation Measures

Nr | Name | Description (favorite region r) | Origin
1 | firstFixation | No. of fixations before the 1st fixation on r | Tobii
2 | secondFixation | No. of fixations before the 2nd fixation on r | [13]
3 | fixationsAfter | No. of fixations after the last fixation on r | [4]
4 | fixationsBeforeDecision | Like fixationsAfter, but before the decision | New
5 | fixationsAfterDecision | Like fixationsAfter, but after the decision | New
6 | fixationDuration | Total duration of all fixations on r | Tobii
7 | firstFixationDuration | Duration of the first fixation on r | Tobii
8 | lastFixationDuration | Duration of the last fixation on r | [11]
9 | fixationCount | Number of fixations on r | Tobii
10 | maxVisitDuration | Max. time from first fixation until outside r | Tobii
11 | meanVisitDuration | Mean time from first fixation until outside r | Tobii
12 | visitCount | No. of fixations until outside r | Tobii
13 | saccLength | Saccade length before a fixation on r | [6]
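
To make two of the simpler rows above concrete, here is a sketch of fixationCount (row 9) and fixationDuration (row 6) for a single region r. Fixations are assumed to be (x, y, duration) triples and regions polygons of (x, y) points; the ray-casting point-in-polygon helper is standard textbook code, not code from the paper.

```python
# Sketch of two measures from the table: fixationCount and fixationDuration.
def point_in_polygon(x, y, poly):
    """Standard ray-casting point-in-polygon test; poly = [(x, y), ...]."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        if (yi > y) != (yj > y) and x < (xj - xi) * (y - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def fixation_count(fixations, region):
    """Number of fixations on region r (row 9)."""
    return sum(1 for (x, y, _) in fixations if point_in_polygon(x, y, region))

def fixation_duration(fixations, region):
    """Total duration of all fixations on region r (row 6)."""
    return sum(d for (x, y, d) in fixations if point_in_polygon(x, y, region))
```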

Page 12: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Analysis of Gaze Fixations (2)

For every image region (b), the fixation measure is calculated over all gaze paths (c)

Results are summed up per region
Regions are ordered according to the fixation measure
If the favorite region (d) and the tag (a) match, the result is a true positive (tp), otherwise a false positive (fp)
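
This aggregation step can be summarized in a few lines. The sketch below assumes a measure given as a function of (gaze path, region) whose larger values indicate the favorite region (true for count- and duration-style measures; rank-style measures such as firstFixation would be ordered the other way). All names are illustrative, not from the paper.

```python
# Sketch of the aggregation step: sum a fixation measure over all gaze paths
# for every region, pick the favorite (highest-scoring) region, and count a
# true positive if it matches the tag's ground-truth region.
def favorite_region(gaze_paths, regions, measure_fn):
    scores = [0.0] * len(regions)
    for path in gaze_paths:                      # all users viewing this pair
        for i, region in enumerate(regions):
            scores[i] += measure_fn(path, region)
    return max(range(len(regions)), key=scores.__getitem__)

def evaluate(pairs, measure_fn):
    """pairs: list of (gaze_paths, regions, index_of_true_region)."""
    tp = fp = 0
    for gaze_paths, regions, true_idx in pairs:
        if favorite_region(gaze_paths, regions, measure_fn) == true_idx:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp)                        # precision P
```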

Page 13: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Precision per Fixation Measure

[Chart: precision P and the sum of tp and fp assignments, plotted per fixation measure; labeled measures: meanVisitDuration, fixationsBeforeDecision, lastFixationDuration, fixationDuration]

Page 14: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Adding Boundaries and Weights

Take eye-tracker inaccuracies into account
Extend region boundaries by 13 pixels

Larger regions are more likely to be fixated
Give additional weight to regions < 5% of the image size

meanVisitDuration increases to P = 0.67
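
A sketch of both adjustments follows. Only the 13-pixel margin and the 5% size threshold come from the slides; the boost factor and the use of the shapely library are assumptions for illustration.

```python
# Sketch of the two adjustments: extend region boundaries to absorb
# eye-tracker inaccuracy, and weight very small regions up.
from shapely.geometry import Polygon

def extended_region(polygon_pts, margin_px=13):
    """Grow the region polygon by the margin (13 px as on the slide)."""
    return Polygon(polygon_pts).buffer(margin_px)

def weighted_score(score, polygon_pts, image_area, small_frac=0.05, boost=2.0):
    """Boost the measure value of regions smaller than 5% of the image; the
    boost factor is illustrative."""
    if Polygon(polygon_pts).area / image_area < small_frac:
        return score * boost
    return score
```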

Page 15: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Examples: Tag-Region Assignments

Page 16: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Comparison with Baselines

Naïve baseline: the largest region r is the favorite
Random baseline: randomly select the favorite region r

Gaze / Gaze* significantly better (χ², α<0.001)
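
The two baselines are straightforward; a sketch, reusing the polygon representation from the earlier snippets (shapely again an assumed dependency for the area computation):

```python
# Sketch of the two baselines; regions are polygons [(x, y), ...].
import random
from shapely.geometry import Polygon

def naive_baseline(regions):
    """Favorite region = the largest region."""
    return max(range(len(regions)), key=lambda i: Polygon(regions[i]).area)

def random_baseline(regions):
    """Favorite region = a uniformly random region."""
    return random.randrange(len(regions))
```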

Page 17: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Effect of Gaze Path Aggregation

[Chart: aggregation of precision P for Gaze* over the number of gaze paths used; annotated gains: +46% and +4%]

Single user still significantly better than the baselines (χ² test: naive with α < 0.001, random with α < 0.002)
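
The aggregation curve can be reproduced by evaluating the measure on random subsets of the available gaze paths; a sketch building on the evaluate() helper from the earlier snippet (all names are illustrative, not from the paper):

```python
# Sketch of the aggregation experiment: precision P when only k randomly
# chosen gaze paths per tag-image pair are used.
import random

def precision_vs_paths(pairs, measure_fn, max_paths, runs=20):
    curve = {}
    for k in range(1, max_paths + 1):
        total = 0.0
        for _ in range(runs):  # average over random subsets of gaze paths
            sub = [(random.sample(paths, min(k, len(paths))), regions, true_idx)
                   for paths, regions, true_idx in pairs]
            total += evaluate(sub, measure_fn)   # evaluate() from the sketch above
        curve[k] = total / runs
    return curve
```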

Page 18: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Research Questions

1. Best fixation measure to find the correct image region given a specific tag?

2. Can we differentiate two regions in the same image?

meanVisitDuration with precision of 67%

Page 19: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Differentiate Two Objects

Use the second tag set to identify different objects in the same image

16 images (of our 51) have two "true" tags
6 of these images had both regions correctly identified

Proportion of 38%

Average precision for a single object is 67%

Correct tag assignment for two objects: 44%

Page 20: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Correctly Differentiated Objects

Page 21: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Research Questions

1. Best fixation measure to find the correct image region given a specific tag?

2. Can we differentiate two regions in the same image?

meanVisitDuration with precision of 67%

Yes, with an accuracy of 38%

Acknowledgement: This research was partially supported by the EU projects Petamedia (FP7-216444) and SocialSensor (FP7-287975).

Page 22: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Influence of Red Dot

First 5 fixations, over all subjects and all images

Page 23: Identifying Objects in Images from Analyzing the User's Gaze Movements for Provided Tags

Experiment Data Cleaning

Manually replaced images with:

a) Tags that are incomprehensible, require expert knowledge, or are nonsense

b) Tags that refer to multiple regions of which not all are drawn in the image (e.g., bicycle)

c) Occluded objects (e.g., a bicycle behind a car)

d) "False" tags that actually refer to a visible part of the image and thus were "true" tags