Page 1:

Spatio-temporal saliency model to predict eye movements in video free viewing

Gipsa-lab, Grenoble Département Images et Signal

CNRS, UMR 5216

S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, A. Guérin-Dugué

GDR-vision 12/06/2008

Page 2:

Plan

Introduction

Model

Experiment and results

Conclusion

2/24

Page 3:

A salient region attracts attention, and so the eyes.

Saliency depends mainly on two factors:
- Bottom-up: task-independent, depending on intrinsic features of the stimuli
- Top-down: task-dependent, integrating high-level processes (cognitive state, ...)

3/24

Introduction

Page 4:

Spatio-temporal saliency model

Achromatic stimuli

Simulates some parts of the human visual system: retina, primary visual cortex (V1)

Two pathways: static and dynamic

4/24

Model

[Diagram: the two pathways produce a static map Ms and a dynamic map Md]


Page 8:

Two outputs:
- Magnocellular-like: low spatial frequencies, band-pass filter, whitens the spectrum, provides global information
- Parvocellular-like: high spatial frequencies, high-pass filter, whitens the spectrum, enhances frame contrast
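The two retinal outputs can be sketched as a low/high spatial-frequency split of each frame. This is only an illustrative sketch: a Gaussian low-pass as the splitting kernel is my assumption, not the authors' retina filter.

```python
import numpy as np

def gaussian_kernel(size=9, sigma=2.0):
    """2-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()

def convolve2d_same(img, k):
    """Naive 'same'-size 2-D convolution with zero padding."""
    p = k.shape[0] // 2
    padded = np.pad(img, p)
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

def retina_outputs(frame):
    """Split a frame into a magnocellular-like component (low spatial
    frequencies) and a parvocellular-like residual (high spatial frequencies)."""
    low = convolve2d_same(frame, gaussian_kernel())
    magno = low            # low spatial frequencies: global information
    parvo = frame - low    # high-pass residual: fine contrast
    return magno, parvo
```

By construction the two components sum back to the input frame, which makes the split easy to sanity-check.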

8/24

Retina model

Page 9:

9/24

Retina model

[Diagram: retina with "Parvocellular-like" and "Magnocellular-like" outputs feeding the static (Ms) and dynamic (Md) pathways]

Page 10:

Visual stimuli are processed in V1 in different frequency bands and orientations:
- Static: 6 orientations, 4 frequency bands
- Dynamic: 6 orientations, 3 (lower) frequency bands
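A V1-like decomposition of this kind can be sketched as a Gabor filter bank; the kernel size, bandwidth and the specific frequency values below are placeholders, not the authors' parameters.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    """Real Gabor kernel: Gaussian envelope times an oriented cosine grating."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    xr = xx * np.cos(theta) + yy * np.sin(theta)  # coordinate along orientation theta
    envelope = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * xr)

def filter_bank(n_orient=6, freqs=(0.05, 0.1, 0.2, 0.4), size=15, sigma=3.0):
    """Bank of n_orient orientations x len(freqs) spatial-frequency bands
    (6 x 4 for the static pathway; the dynamic pathway would reuse the
    3 lowest-frequency bands)."""
    thetas = [k * np.pi / n_orient for k in range(n_orient)]
    return {(t, f): gabor_kernel(size, f, t, sigma) for t in thetas for f in freqs}
```

Each frame would be convolved with every kernel in the bank, yielding one feature map per (orientation, frequency) pair.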

10/24

Cortical-like filters

Page 11:

11/24

Static pathway

- Interactions: strengthen the contours
  - Short: between cells with overlapping receptive fields
  - Long: between collinear cells
- Normalization
- Summation over all orientations and frequency bands: static saliency map Ms
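The final two steps above (normalization, then summation over all orientations and frequency bands) can be sketched as follows. The min-max normalization is one simple choice, assumed here; the slides do not specify which normalization is used.

```python
import numpy as np

def normalize_map(m, eps=1e-9):
    """Rescale a feature map to [0, 1] (one simple normalization choice)."""
    lo, hi = m.min(), m.max()
    return (m - lo) / (hi - lo + eps)

def static_saliency(feature_maps):
    """Sum normalized filter responses over all orientations and
    frequency bands to obtain the static saliency map Ms."""
    ms = sum(normalize_map(np.abs(m)) for m in feature_maps)
    return normalize_map(ms)
```

`feature_maps` would be the outputs of the cortical-like filter bank after the interaction step.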

Page 12:

12/24

Dynamic pathway

- Two motion estimation steps:
  - Dominant motion compensation
  - Local motion estimation, using the same bank of cortical filters as the static pathway
- Temporal filtering
- Dynamic saliency map Md: modulus of the motion vector
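The last two steps can be sketched directly: Md is the per-pixel modulus of the compensated motion field, optionally smoothed over time. The recursive low-pass below is my assumption for the slide's "temporal filtering"; the motion estimation itself is omitted.

```python
import numpy as np

def dynamic_saliency(vx, vy):
    """Dynamic saliency map Md: modulus of the (dominant-motion-compensated)
    local motion vector at each pixel."""
    return np.sqrt(vx**2 + vy**2)

def temporal_filter(md_prev, md_new, alpha=0.5):
    """Simple recursive low-pass over time to suppress spurious motion
    (one plausible form of temporal filtering, assumed here)."""
    return alpha * md_new + (1 - alpha) * md_prev
```

A pixel moving with velocity (3, 4) pixels/frame thus gets a raw dynamic saliency of 5.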


Page 14:

Multiplicative fusion

14/24

Fusion and example of saliency maps

Mand(x, y) = Ms(x, y) × Md(x, y)

[Example: original video frame with its saliency maps Ms, Md and Mand]
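The multiplicative fusion can be sketched as below; rescaling both maps to [0, 1] before multiplying is my assumption for comparability (the slides do not specify a pre-fusion normalization).

```python
import numpy as np

def fuse(ms, md, eps=1e-9):
    """Multiplicative fusion: Mand(x, y) = Ms(x, y) * Md(x, y),
    after rescaling both maps to [0, 1] (an assumption)."""
    ms_n = (ms - ms.min()) / (ms.max() - ms.min() + eps)
    md_n = (md - md.min()) / (md.max() - md.min() + eps)
    return ms_n * md_n
```

With a product, only regions salient in both the static and the dynamic pathways keep a high value, which is the point of this fusion choice.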

Page 15:

Purpose: compare model results with human eye positions

- Free viewing; eye positions recorded with an EyeLink II eye tracker
- 15 subjects
- 20 clips of 30 s, each composed of different snippets strung together
- Stimulus size: 720×576 pixels, 40°×30° field of view

15/24

Experiment and results

[Diagram: a clip built from Snippet 1, Snippet 2, ..., Snippet k strung together]

[Itti]: R. Carmi and L. Itti, "Visual causes versus correlates of attentional selection in dynamic scenes", Vision Research, vol. 46, 2006

Page 16:

Criterion: Normalized Scanpath Saliency (NSS) [Itti]

16/24

Global analysis

NSS(k) = Σ(x,y) Mh(x, y, k) × (Mm(x, y, k) − M̄m) / σ(Mm)

with
- Mh(x, y, k): human eye position density map
- Mm(x, y, k): model saliency map
- M̄m, σ(Mm): mean and standard deviation of Mm
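The criterion can be sketched as: z-score the model map, then weight it by the eye position density map (assumed here to sum to 1). A minimal sketch, not the authors' exact implementation.

```python
import numpy as np

def nss(model_map, eye_density, eps=1e-9):
    """Normalized Scanpath Saliency for one frame: z-score the model
    saliency map Mm, then average it at the human eye positions,
    weighted by the eye position density map Mh (assumed to sum to 1)."""
    z = (model_map - model_map.mean()) / (model_map.std() + eps)
    return float(np.sum(eye_density * z))
```

NSS is positive when eye positions fall on above-average model saliency, and near zero when the model map is unrelated to where people look.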

NSS between real eye movements and the model saliency maps (Ms: static, Md: dynamic, Mand: fusion):

Saliency maps        Ms    Md    Mand
Real eye movements   0.68  0.87  0.96

NSS for naive saliency maps (MsnH: entropy, MsnSD: standard deviation, Mdn: absolute difference):

Saliency maps        MsnH  MsnSD  Mdn
Real eye movements   0.54  0.44   0.54

[Itti]: R. J. Peters and L. Itti, "Applying computational tools to predict gaze direction in interactive visual environments", ACM Trans. on Applied Perception, vol. 5, 2008

Page 17:

NSS as a function of frame

17/24

Temporal analysis

Average over the k-th frame of each snippet (Snippet 1, Snippet 2, ..., Snippet N)

Frame rate = 25 fps
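The frame-wise averaging described above can be sketched as follows; truncating to the shortest snippet is an assumption about how unequal snippet lengths are handled.

```python
import numpy as np

def nss_by_frame(nss_per_snippet):
    """Given per-snippet NSS sequences (one list of per-frame NSS values
    per snippet), average the k-th frame across all snippets,
    up to the length of the shortest snippet."""
    k_max = min(len(s) for s in nss_per_snippet)
    stacked = np.array([s[:k_max] for s in nss_per_snippet])
    return stacked.mean(axis=0)

def frame_to_ms(k, fps=25):
    """Convert a 1-based frame index to milliseconds at the given frame rate."""
    return 1000.0 * k / fps
```

At 25 fps, frames 10-13 correspond to 400-520 ms after a snippet onset, which is the time window the dispersion analysis refers to.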



Page 22:

NSS as a function of frame

22/24

Temporal analysis

Dispersion of eye positions as a function of frame

Frame rate = 25 fps

10-13th frame ≈ 400-520 ms


Page 24:

New model of spatio-temporal saliency, biologically inspired

- Retina filter with two outputs
- Interactions
- Same bank of cortical-like filters for the static and dynamic pathways

This model reliably predicts the first fixations

References:
- S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, A. Guérin-Dugué, "Spatio-temporal saliency model to predict eye movements in video free viewing", Proc. EUSIPCO 2008
- S. Marat, T. Ho Phuoc, L. Granjon, N. Guyader, D. Pellerin, A. Guérin-Dugué, "Modelling spatio-temporal saliency to predict gaze direction for short videos", submitted to the International Journal of Computer Vision

24/24

Conclusion

Page 25:

Thanks for your attention !