
TurkerGaze: Crowdsourcing Saliency with Webcam based Eye Tracking

Anonymous ICCV submission
Paper ID 80

In this supplementary material, we demonstrate additional examples collected by TurkerGaze of:

• gaze prediction error during image free viewing

• gaze prediction accuracy evaluated by a 33-point chart

• meanshift fixation detection results

• image saliency for fully annotated images

• center-bias-free saliency map for a panoramic image

• video saliency

[Figure 1 panels: for each of three examples, gaze position X and Y (pixels) plotted against time (ms), 0 to 2500 ms; legend: Eyelink, Ours.]

Figure 1: Gaze prediction error during image free viewing. On the top, the red curve shows the eye position recorded by the Eyelink eye tracker operating at 1000 Hz, and the green squares indicate the predictions of our system with a 30 Hz sample rate. On the bottom is the position error in the x and y directions over time.
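To make the comparison in Figure 1 concrete, the sketch below shows one way to align a dense 1000 Hz reference trace with sparse 30 Hz predictions and compute the signed per-axis error. This is a minimal illustration under our own assumptions about array layout, not the paper's code; the function name `per_axis_error` is ours.

```python
import numpy as np

def per_axis_error(t_ref, xy_ref, t_pred, xy_pred):
    """Signed x/y error of a sparse trace against a dense reference.

    t_ref:   (N,) timestamps in ms of the 1000 Hz Eyelink trace.
    xy_ref:  (N, 2) reference gaze positions in pixels.
    t_pred:  (M,) timestamps in ms of the 30 Hz predictions.
    xy_pred: (M, 2) predicted gaze positions in pixels.
    Returns an (M, 2) array of x/y error at each prediction timestamp.
    """
    # Interpolate the dense reference at the sparse prediction times so
    # the two traces can be compared sample for sample.
    ref_at_pred = np.column_stack([
        np.interp(t_pred, t_ref, xy_ref[:, 0]),
        np.interp(t_pred, t_ref, xy_ref[:, 1]),
    ])
    return xy_pred - ref_at_pred
```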


[Figure 2 panels: three plots over a 1024 x 768 screen (axes: Screen Width, Screen Height); dot scale encodes error in pixels, 0 to 100.]

Figure 2: Gaze prediction accuracy evaluated by a 33-point chart. The radius of each dot indicates the median error of the corresponding testing position. Each plot corresponds to one experiment for one subject.
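The per-point statistic behind Figure 2 can be computed as in the sketch below. This assumes each of the 33 targets has an array of gaze samples recorded while the subject fixated it; the matplotlib scatter call scales dot area as error squared so that dot radius grows linearly with median error.

```python
import numpy as np
import matplotlib.pyplot as plt

def median_errors(targets, gaze_samples):
    """targets: (33, 2) on-screen test positions in pixels.
    gaze_samples: list of 33 arrays, each (k_i, 2), recorded while the
    subject fixated the corresponding target.
    Returns (33,) median Euclidean error in pixels per target."""
    return np.array([
        np.median(np.linalg.norm(g - t, axis=1))
        for t, g in zip(targets, gaze_samples)
    ])

# Dot area grows with error^2, so dot *radius* scales with median error:
# err = median_errors(targets, gaze_samples)
# plt.scatter(targets[:, 0], targets[:, 1], s=err ** 2)
# plt.gca().invert_yaxis()  # screen coordinates: y grows downward
# plt.show()
```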

Figure 3: Meanshift fixation detection results on two Judd images. On the left is the noisy, subsampled gaze data used for testing. On the right are the detected fixations. Red circles indicate ground truth (detected in 240 Hz data using code provided by Judd et al.) and yellow crosses indicate fixations detected in the noisy 30 Hz data using our meanshift approach.
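For readers unfamiliar with mean shift, the following is a minimal, purely spatial variant applied to 2D gaze points. The bandwidth and merge tolerance are illustrative guesses rather than the paper's parameters, and the paper's detector may also use temporal information that this sketch omits.

```python
import numpy as np

def meanshift_fixations(points, bandwidth=40.0, n_iter=20, merge_tol=10.0):
    """Cluster noisy 30 Hz gaze samples into fixation centers.

    points:    (K, 2) gaze positions in pixels.
    bandwidth: flat-kernel radius in pixels (illustrative guess).
    merge_tol: modes closer than this are merged into one fixation.
    Returns an (F, 2) array of detected fixation centers.
    """
    modes = points.astype(float)
    for _ in range(n_iter):
        for i in range(len(modes)):
            # Shift each mode to the mean of the raw samples that fall
            # within the bandwidth (flat-kernel mean shift update).
            near = points[np.linalg.norm(points - modes[i], axis=1) < bandwidth]
            if len(near):
                modes[i] = near.mean(axis=0)
    # Collapse modes that converged to (nearly) the same location.
    centers = []
    for m in modes:
        if all(np.linalg.norm(m - c) > merge_tol for c in centers):
            centers.append(m)
    return np.array(centers)
```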


Figure 4: Examples of center-bias-free saliency maps for panoramic images collected by our system. We uniformly sampled overlapping views from the panorama, projecting each to a regular photo perspective, and collected gaze data for each projection. We then projected the saliency maps back onto the panorama and averaged saliency for overlapping areas. This gives a panoramic saliency map with no photographer bias.
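The back-projection-and-average step could be sketched as follows. Here `view_to_pano` is a hypothetical stand-in for the inverse of the perspective projection used to sample the views; its implementation is assumed rather than given, and `np.add.at` is used so that overlapping pixels accumulate correctly.

```python
import numpy as np

def panorama_saliency(pano_shape, view_saliencies, view_to_pano):
    """Average per-view saliency maps back onto the panorama.

    pano_shape:      (H, W) of the panorama saliency map.
    view_saliencies: list of 2D per-view saliency maps.
    view_to_pano:    hypothetical callable returning, for a view of a
        given shape, integer panorama row/column index arrays (the
        inverse of the perspective projection; not implemented here).
    """
    acc = np.zeros(pano_shape)
    cnt = np.zeros(pano_shape)
    for sal in view_saliencies:
        ys, xs = view_to_pano(sal.shape)
        # np.add.at accumulates correctly even when several view pixels
        # land on the same panorama pixel.
        np.add.at(acc, (ys, xs), sal)
        np.add.at(cnt, (ys, xs), 1)
    return acc / np.maximum(cnt, 1)  # average where views overlap
```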



Figure 5: Examples of the image saliency data obtained by our system. In a free-viewing task, users were shown selected images from the SUN database with fully annotated objects. From the raw eye tracking data, we used the proposed algorithm to estimate fixations and then computed the saliency map. This map can be used to evaluate the saliency of individual objects in the image. The 1st row: the original image; the 2nd row: the raw eye tracking data; the 3rd row: the fixations detected by the meanshift algorithm; the 4th row: the saliency map; the 5th row: the semantic object segments; the 6th row: salient objects.
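Scoring individual objects from the saliency map and the object masks could look like the sketch below. The assumption that label 0 denotes background, and the choice of mean saliency as the per-object score, are ours; the paper does not specify its exact scoring in this supplement.

```python
import numpy as np

def object_saliency(saliency_map, segment_labels):
    """Mean saliency inside each annotated object mask.

    saliency_map:   (H, W) float saliency values.
    segment_labels: (H, W) integer object IDs (label 0 assumed to be
        background, which is skipped).
    Returns {object_id: mean saliency}.
    """
    scores = {}
    for obj in np.unique(segment_labels):
        if obj == 0:
            continue  # assumed background label
        scores[obj] = float(saliency_map[segment_labels == obj].mean())
    return scores
```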
