Visual Scene Understanding

37
Visual Scene Understanding Aude Oliva Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Website: http://cvcl.mit.edu PPA

Transcript of Visual Scene Understanding

Page 1: Visual Scene Understanding

Visual Scene Understanding

Aude OlivaDepartment of Brain and Cognitive Sciences

Massachusetts Institute of TechnologyWebsite: http://cvcl.mit.edu

PPA

Page 2: Visual Scene Understanding

High-level Scene RepresentationI. Long-term Memory Representation

What is the fidelity of stored scene representations and the infrastructure that supports them?

II. High-level Neural Representation of Visual ScenesHow is the shape of visual scene represented?

Soojin Park

Michelle Greene

Timothy Brady

Timothy Brady

TaliaKonkle

GeorgeAlvarez

Page 3: Visual Scene Understanding

Memory Representation

“Basically, my recollection is that we just separated the pictures into distinct thematic categories: e.g. cars, animals, single-person, 2-people, plants, etc.) Only a few slides were selected which fell into each category, and they were visually distinct.”

According to Standing

Standing (1973)

10,000 images

83% Recognition

What we know… What we don’t know…

Dogs Dogs Playing CardsPlaying Cards

… people can remember thousands

of images

… what people are remembering for each image?

Page 4: Visual Scene Understanding

Completely different kinds of places…

Different instances of the same kind of place…

Welcome to

Massive Memory

Experiment

A stream of scenes will be presented on the screen for 3

seconds each.

Your primary task:

Remember them ALL!

afterwards you will be tested with…

Page 5: Visual Scene Understanding

A stream of scenes will be presented on the screen for 3

seconds each.

Your other task:

Detect exact repeats anywhere in the stream

Welcome to

Massive Memory

Experiment

Page 6: Visual Scene Understanding

Barn

Page 7: Visual Scene Understanding

Beach

Page 8: Visual Scene Understanding

Bedroom

Page 9: Visual Scene Understanding

Cavern

Page 10: Visual Scene Understanding

Closet

Page 11: Visual Scene Understanding

Countryroad

Page 12: Visual Scene Understanding

Greenhouse

Page 13: Visual Scene Understanding

Waves

Page 14: Visual Scene Understanding

Methods – The Study Stream128 unique semantic categories of natural images

2912 natural images shown in the stream (3 seconds each, 800 msec ISI)

Number of exemplars per category: 4, 16, or 64 !

N= 24 observers

Page 15: Visual Scene Understanding

Methods – The Study StreamOnline Task: Detect Exact Repeats

Repeats could be 2 to 1024 back in the stream

Repeats could be from categories with 4, 16, or 64 exemplars

7% of images in the stream were repeats (192 / 2912)

1024-back (>2hr!)

2-back

Page 16: Visual Scene Understanding

Methods – The Memory Test

Followed by 224 2-alternative forced choice tests

Novel Exemplar

None of the tested categories were n-backed

Test Pairs were always the same for all subjects

Any effect of interference is due to the additional exemplars

Page 17: Visual Scene Understanding

96

84 80 76

0102030405060708090

100

1-novel 4 16 64

Exemplar

Results – Recognition MemoryPe

rcen

t Corr

ect

Replication of Standing (1973)

Page 18: Visual Scene Understanding

96

84 80 76

0102030405060708090

100

1-novel 4 16 64

Exemplar

Perc

ent

Corr

ect

Detailed RepresentationMinor Interference

2% drop with doubling the number of exemplars in memory

Konkle, Brady, Alvarez & Oliva (submitted)

Page 19: Visual Scene Understanding

96

84 80 76

0102030405060708090

100

1-novel 4 16 64

Exemplar

Perc

ent

Corr

ect

Highly DetailedMinor Interference

Page 20: Visual Scene Understanding
Page 21: Visual Scene Understanding

Objects & ScenesIs it fair to compare?

You can make each test

item and foil arbitrarily

hard

We tried to span the

category with our exemplars

and sampled the test item and

foil uniformly

Page 22: Visual Scene Understanding

5055

606570

75808590

95100 MMI-Scenes

MMI-Objects

2726252423222120chance

Perc

ent C

orre

ct

Number of Exemplars (log scale)

Scene

Object

Memory for Scenes and Objects

Konkle, Brady, Alvarez & Oliva (submitted)

Similar categorical interference effects

for objects and scenes

Page 23: Visual Scene Understanding

I. Conclusion• High fidelity representation in long term visual memory

• Similar categorical interference effects for scenes and objects

• Objects and scenes are entities represented at a similar level of abstraction in long term storage

• The results suggest that the structure of visual categories is information-theoretic optimal: It maximizes within category similarity & minimize between category similarities

Massive Memory Categorical Interference

See website with papers and stimuli: http://cvcl.mit.edu/MM

Page 24: Visual Scene Understanding

Visual Categories are represented by their shape

How to represent the shape of scenes ?

Page 25: Visual Scene Understanding

II – Neural Representationof Visual Scenes

Soojin Park Michelle Greene Timothy Brady

Walther et al (2009)

Page 26: Visual Scene Understanding

Scenes are spatial entitiesA scene is a 3 dimensional entity we act within:

it extends in space, it has a size, boundary, content, layout.

Shape of a scene: Spatial Boundary and Content

Page 27: Visual Scene Understanding

Spatial Envelope RepresentationA scene is inherently a 3D entity that may be described by

properties related to its size (volume) and its content

(1) Boundary of the spaceMean depth/SizeOpennessPerspective …

(2) Content of the spaceNaturalnessRoughnessClutter …

closet kitchen street

Oliva & Torralba (2001, 2002, 2006); Torralba & Oliva (2002, 2003); Greene & Oliva (2010, in press); Ross & Oliva (2010)

Page 28: Visual Scene Understanding

Spatial Envelope Representation of Visual Scenes

Oliva & Torralba (2001, 2002, 2006); Torralba & Oliva (2002, 2003); Greene & Oliva (2009, in press); Ross & Oliva (2010)

Page 29: Visual Scene Understanding

Spatial Boundary & Content Orthogonal Properties

Park, Brady, Greene & Oliva (submitted)

Page 30: Visual Scene Understanding

Experimental ConditionsN

atur

alU

rban

Con

tent

Closed OpenSpatial Boundary

Page 31: Visual Scene Understanding

fixation Open Natural fixation Closed Natural fixation Open Urban …

10s 20s Time

20 blocks per condition

Experimental Procedure

1-back task

Page 32: Visual Scene Understanding

PPA

ROIs localized with Independent localizers

LOC

Epstein & Kanwisher (1998)

Page 33: Visual Scene Understanding

Classification Performances in PPA and LOC

Both PPA and LOC regions classified the 4 groups with ~ 50% accuracy

An SVM classifier was trained to classify the four conditions using all blocks but one, and then was tested on the remaining block.

Page 34: Visual Scene Understanding

Patterns of ErrorsThe patterns of errors allows to dissociate multiple levels of structure coexisting within intact images, and test the extent to which a specific property is coded in a certain brain area.

Page 35: Visual Scene Understanding

Patterns of Errors

Page 36: Visual Scene Understanding

II. ConclusionA dual neural pathway for representing the shape of a visual

scene

Visual scenes are represented in a distributed and complementary manner by different brain regions sensitive to spatial boundary vs content of a scene

Park, Brady, Greene & Oliva (submitted)

Page 37: Visual Scene Understanding

Thank You

Funding: National Science Foundation Career Award IIS-0546262

http://cvcl.mit.edu/MM

Talia Konkle

Timothy Brady

George Alvarez

Michelle

Greene

Soojin

Park

Timothy Brady