Class 21, 1999 CBCl/AI MIT Neuroscience II T. Poggio.




Neuroscience

Brain Overview?


The Ventral Visual Pathway

modified from Ungerleider and Haxby, 1994


Visual Areas


Face-tuned cells in IT


Model of view-invariant recognition: learning from views

[Figure: network of view-tuned units as a function of view angle.]

Poggio, Edelman Nature, 1990.

A graphical rewriting of the mathematics of regularization (GRBF), a learning technique
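The scheme can be sketched as a small Gaussian RBF network (the 1-D view-angle parameterization, the training views, and sigma are illustrative assumptions, not values from the slide): units centered on a few stored views are combined with weights solved by least squares, and the network then generalizes to intermediate views.

```python
import numpy as np

def gaussian_rbf(view, centers, sigma):
    # One radial basis unit per stored training view.
    return np.exp(-((view - centers) ** 2) / (2 * sigma ** 2))

# Hypothetical 1-D "view angle" version of the view-based scheme:
# a few stored views of the target object serve as RBF centers.
train_views = np.array([-60.0, -20.0, 20.0, 60.0])  # degrees
targets = np.ones(len(train_views))                 # "this object" = 1
sigma = 25.0

# Solve for the output weights by least squares (with the regularization
# parameter taken to zero this reduces to plain interpolation).
G = np.array([gaussian_rbf(v, train_views, sigma) for v in train_views])
w = np.linalg.lstsq(G, targets, rcond=None)[0]

def recognize(view):
    # Network output: weighted sum of view-tuned units.
    return gaussian_rbf(view, train_views, sigma) @ w

print(recognize(0.0))    # view between training views: close to 1
print(recognize(150.0))  # view far from all training views: close to 0
```

The output stays near 1 for views interpolating the training set and falls toward 0 far from it, which is the generalization-from-views behavior the slide describes.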


Learning to Recognize 3D Objects in IT Cortex

Logothetis, Pauls, Poggio, 1995

Examples of Visual Stimuli

After human psychophysics (Buelthoff, Edelman, Tarr, Sinha, …), which supports models based on view-tuned units... physiology!


Task Description

[Figure: trial schematics. Fixation task: blue fixspot, stimulus on/off, color change after 2 sec, lever response. Recognition task: yellow fixspot, stimulus, lever response; left and right levers signal target (T) vs. distractor (D); a learning phase is followed by a testing phase.]

Logothetis, Pauls, Poggio, 1995


Recording Sites in Anterior IT

[Figure: lateral views of the hemisphere with sulci labeled LUN, LAT, IOS, STS, AMTS; recording sites in anterior IT along AMTS.]

Logothetis, Pauls, and Poggio, 1995; Logothetis and Pauls, 1995


Model’s predictions: View-tuned Neurons

[Figure: model view-tuned units as a function of view angle.]


The Cortex: Neurons Tuned to Object Views

Logothetis, Pauls, Poggio, 1995


A View Tuned Cell

[Figure: responses (spikes/sec, 800 msec window) of a view-tuned cell to target views from -168° to +168° in 12° steps and to distractor objects.]

Logothetis, Pauls, Poggio, 1995


Model’s predictions: View-invariant, Object-specific Neurons

[Figure: response of a view-invariant, object-specific unit as a function of view angle.]


The Cortex: View-invariant, Object-specific Neurons

Logothetis, Pauls, Poggio, 1995


Recognition of Wire Objects


Generalization Field

[Figure: generalization field of an IT neuron: spike rate as a function of rotation around the X and Y axes (±90°), with responses to distractors (N=60) for comparison.]


[Figure: Amoeba 01, Cell 265. Responses (spikes/sec, 600 msec window) to views from -12° to 168° in 12° steps and to numbered distractor stimuli.]


[Figure: Wire 526, Cell 202. Responses (spikes/sec, 600 msec window) to views from -72° to 108° in 12° steps and to numbered distractor stimuli.]


[Figure: spikes per second as a function of rotation around the Y axis (-180° to 180°), with distractor responses (N=60) for comparison.]

Hit Rate > 95% for all views


View-dependent Response of an IT Neuron

[Figure: spike rate (spikes per second) and behavioral hit rate as a function of rotation around the Y axis (degrees), with peristimulus histograms (0-500 msec) at individual views.]


Sparse Representations in IT

• About 400 view tuned cells per object

• Perhaps 20 view-invariant cells per object

In the recording area in AMTS -- a specialized region for paperclips (!) -- we estimate that there are, after training (within an order of magnitude or two) …

Logothetis, Pauls, Poggio, 1997


Previous glimpses: cells tuned to face identity and view

Perrett, 1989


2. View-tuned IT neurons

View-tuned cells in IT cortex: how do they work? How do they achieve selectivity and invariance?

Max Riesenhuber and T. Poggio, Nature Neuroscience, just published


max

Some of our funding is from Honda...


Model’s View-tuned Neurons

[Figure: model view-tuned units as a function of view angle.]


Scale-Invariant Responses of an IT Neuron

(training on one size only!)

Logothetis, Pauls and Poggio, 1995

[Figure: responses (spikes/sec, 0-3000 msec) of the neuron to the stimulus at eight sizes: 1.0° (×0.4), 1.75° (×0.7), 2.5° (×1.0), 3.25° (×1.3), 4.0° (×1.6), 4.75° (×1.9), 5.5° (×2.2), 6.25° (×2.5).]


• Invariance around training view

• Invariance while maintaining specificity

[Figure, panels (a)-(d): (a) spike rate vs. rotation around the Y axis; (b) spike rate for the 10 best distractors; (c) (target response)/(mean of best distractors) vs. degrees of visual angle; (d) the same ratio for azimuth and elevation offsets (x = 2.25 degrees).]

Invariances: Overview

Logothetis, Pauls and Poggio, 1995


Our quantitative model builds upon previous hierarchical models

• Hubel & Wiesel (1962): simple to complex to "higher-order hypercomplex" cells

• Fukushima (1980): alternation of "S" and "C" layers to build up feature specificity and translation invariance, respectively

• Perrett & Oram (1993): pooling as a general mechanism to achieve invariance


Model of view-tuned cells

Max Riesenhuber and Tommy Poggio, 1999


Model Diagram

[Figure: hierarchical network from "V1" through "V4" to "IT", with top-level weights w.]

View-specific learning: synaptic plasticity


Max (or “softmax”)

• key mechanism in the model

• computationally equivalent to selection (and scanning in our object detection system)
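A toy sketch of that equivalence (the feature vector and scenes are made up for illustration): pooling position-tuned afferents with a max is the same computation as scanning every window and selecting the best match, so the pooled response is independent of where the feature appears.

```python
import numpy as np

# A unit max-pools afferents tuned to the same feature at different
# positions. Taking the max over them is the same computation as scanning
# every window and selecting the best match, as a window-scanning object
# detection system does.
feature = np.array([1.0, -1.0, 1.0])

def afferent_responses(signal):
    # One afferent per position: correlation of the feature with a window.
    return [feature @ signal[i:i + 3] for i in range(len(signal) - 2)]

def max_pool(signal):
    # "Selection": keep only the best-matching position.
    return max(afferent_responses(signal))

scene_a = np.zeros(20); scene_a[2:5] = feature    # feature at position 2
scene_b = np.zeros(20); scene_b[11:14] = feature  # feature at position 11

print(max_pool(scene_a), max_pool(scene_b))  # equal: both 3.0
```

The pooled value is the same for both scenes, which is why the max gives position invariance "for free".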


V1: Simple Features, Small Receptive Fields

• Simple cells respond to bars (Hubel & Wiesel, 1959)

• Complex cells: translation invariance; pool over simple cells of the same orientation (Hubel & Wiesel)


Two possible Pooling Mechanisms

thanks to Pawan Sinha

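The two candidate mechanisms can be contrasted on toy afferent responses (the numbers are illustrative, not from the slide): linear summation changes when a second stimulus enters the pool, while the max does not.

```python
import numpy as np

# Toy afferent responses: a complex-like unit pools four position-tuned
# afferents either by linear summation or by taking their maximum.
afferents_one_bar = np.array([0.1, 0.9, 0.1, 0.0])   # preferred bar at one position
afferents_two_bars = np.array([0.1, 0.9, 0.1, 0.9])  # a second bar enters the pool

sum_one, sum_two = np.sum(afferents_one_bar), np.sum(afferents_two_bars)
max_one, max_two = np.max(afferents_one_bar), np.max(afferents_two_bars)

print("sum:", sum_one, "->", sum_two)  # nearly doubles with the second bar
print("max:", max_one, "->", max_two)  # unchanged at 0.9
```

Summation confounds "stronger match" with "more stimuli"; the max keeps the response tied to the single best-matching afferent.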


An Example: Simple to Complex Cells

[Figure: several "simple" cells feeding one "complex" cell through an as-yet-unspecified pooling operation (?).]


Simple to Complex: Invariance to Position and Feature Selectivity

[Figure: "simple" cells feeding a "complex" cell; the pooling operation is the open question (?).]


3. Some predictions of the model

• Scale and translation invariance of view-tuned AIT neurons

• Response to pseudo-mirror views

• Effect of scrambling

• Multiple objects

• Robustness to clutter

• Consistent with K. Tanaka’s simplification procedure

• More and more complex features from V1 to AIT


Testing Selectivity and Invariance of Model Neurons

• Test specificity AND transformation tolerance of view-tuned model neurons

• Same objects as in Logothetis’ experiment

• 60 distractors


Invariances of IT (view-tuned) Model Neuron


Invariances: Experiment vs. Model (view-tuned cells)

[Figure: invariance ranges of view-tuned cells, experiment (*) vs. model: 3D rotation (degrees), translation (degrees of visual angle), scale change (octaves).]


MAX vs. Summation

[Figure: invariance ranges under max vs. sum pooling: 3D rotation (degrees), translation (degrees of visual angle), scale change (octaves).]


Response to Pseudo-Mirror Views

As in the experiment, some model neurons show tuning to the pseudo-mirror image


Robustness to scrambling: model and IT neurons

Experiments: Vogels, 1999


Recognition in Context: Two Objects


Recognition in Context: some experimental support

• Sato: Response of IT cells to two stimuli in RF

Sato, 1989


Recognition in Clutter: data

How does response of IT neurons change if background is introduced?

[Figure: average response and percent correct of IT neurons for the stimulus alone vs. stimulus + background.]

Missal et al., 1997


Recognition in Clutter: model

• average model neuron response

• recognition rates


Further Support: Keiji just mentioned his simplification paradigm...

Wang et al., 1998


Consistent behaviour of the model


Higher complexity and invariances in Higher Areas

Kobatake & Tanaka, 1994


Fujita and Tanaka’s Dictionary of Shapes (about 3000) in posterior IT (columnar organization)


Similar properties in the model...

M. Tarr, Nature Neuroscience


Layers With Linear Pooling and With Max Pooling

• Linear pooling: yields more complex features (e.g. from LGN inputs to simple cells and -- perhaps -- from PIT to AIT cells)

• Max pooling: yields invariant (position, scale) features over a larger receptive field (e.g. from simple to complex V1 cells)
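The alternation described in the bullets above can be sketched in a few lines (the filter, pool size, and signal are illustrative): a linear "S" stage builds feature selectivity by weighted summation, and a max "C" stage builds position tolerance over a larger receptive field.

```python
import numpy as np

def s_layer(inputs, weights):
    # Linear pooling: template match at every position (feature complexity).
    n = len(weights)
    return np.array([weights @ inputs[i:i + n]
                     for i in range(len(inputs) - n + 1)])

def c_layer(responses, pool_size):
    # Max pooling: position invariance over a larger receptive field.
    return np.array([responses[i:i + pool_size].max()
                     for i in range(0, len(responses) - pool_size + 1, pool_size)])

signal = np.zeros(16); signal[5:8] = [1.0, -1.0, 1.0]
s1 = s_layer(signal, np.array([1.0, -1.0, 1.0]))  # "simple cell" responses
c1 = c_layer(s1, 7)                               # "complex cell" responses
print(c1)  # two pooled outputs; the feature falls in the first pool
```

Stacking such S/C pairs is the Fukushima-style alternation the model uses: each level trades a little position information for a more complex, more tolerant feature.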


4. Hypothetical circuitry for Softmax

• The max operation is at the core of the model properties

• Which biophysical mechanisms and circuitry underlie the max operation?


Softmax circuitry

The SOFTMAX operation may arise from cortical microcircuits of lateral inhibition between neurons in a cortical layer. An example: a circuit based on feedforward (or recurrent) shunting presynaptic (or postsynaptic) inhibition. Key elements: 1) shunting inhibition; 2) nonlinear transformation of the signals (synaptic nonlinearities or active membrane properties). The circuit performs a gain control operation (as in the canonical microcircuit of Martin and Douglas…) and -- for certain values of the parameters -- a softmax operation:

y = Σ_i x_i^q / (k + Σ_j x_j^p)
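A quick numeric check of one standard polynomial softmax (the exponents and the constant k here are illustrative; the exact form on the slide is not fully recoverable): with the numerator exponent one larger than the denominator's, the gain-controlled output approaches the maximum of the inputs as the exponent grows.

```python
import numpy as np

def softmax_pool(x, p, k=1e-6):
    # Polynomial softmax: y = sum(x^(p+1)) / (k + sum(x^p)).
    # The denominator acts as gain control; large p picks out the max.
    return np.sum(x ** (p + 1)) / (k + np.sum(x ** p))

x = np.array([0.2, 0.5, 1.0, 0.7])
for p in [1, 2, 8, 32]:
    print(p, softmax_pool(x, p))
# p = 1 gives a graded, gain-controlled average; by p = 32 the output
# is essentially max(x) = 1.0.
```

This is why the same circuit can interpolate between a linear-like pooling regime and a max regime "for certain values of the parameters".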


Summary: main points of model

• Max-like operation, computationally similar to scanning and selecting

• Hypothetical inhibitory microcircuit for Softmax in cortex

• Easy grafting of top-down attentional effects onto the circuitry

• Segmentation is a byproduct of recognition

• No binding problem; synchronization not needed

• Model is an extension of the classical hierarchical H-W scheme

• Model deals with nice object classes (e.g. faces) and can be extended to object classification (rather than subordinate-level recognition)

• Just a plausibility proof!

• Experiments wanted (to prove it wrong)!


[Figure: morph line from 100% cat prototypes through 80% and 60% cat morphs, across the category boundary, to 60% dog morphs, 80% dog morphs, and 100% dog prototypes.]

Novel 3D morphing system to create new objects that are linear combinations of 3D prototypes
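The linear-combination idea can be sketched with toy data (the vertex coordinates and the `morph` helper are hypothetical; the real system morphs dense 3D correspondences between prototypes): once prototypes share a vertex correspondence, a "60% cat" object is simply a convex combination of their coordinates.

```python
import numpy as np

# Toy "prototypes": matching vertices (rows) of two 3D shapes, assumed to
# be in dense correspondence. Real prototypes have thousands of vertices.
cat_proto = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
dog_proto = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 1.0]])

def morph(a, b, cat_fraction):
    # Convex combination of corresponding vertices; e.g. cat_fraction = 0.6
    # yields a "60% cat" morph.
    return cat_fraction * a + (1.0 - cat_fraction) * b

m60 = morph(cat_proto, dog_proto, 0.6)
print(m60)
```

Sweeping `cat_fraction` from 1.0 to 0.0 traces exactly the 100%-cat-to-100%-dog continuum of the figure, with the category boundary at 0.5.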


Object classification task for monkey physiology

[Figure: trial sequence: fixation, sample (600 ms), delay (1000 ms), test (500 ms); a nonmatch test is followed by a further delay and a match test.]


Preliminary results from Prefrontal Cortex Recordings

[Figure: cell l04.spk 1301, spike rate (Hz) vs. time (msec) through the fixation, sample-on, and delay periods, for dog morphs (100%, 80%, 60%) and cat morphs (60%, 80%, 100%).]

This suggests that prefrontal neurons carry information about the category of objects


Recognition in Context: some experimental support

• Sato: Response of IT cells to two stimuli in RF

Sato, 1989

The summation index is 0 for Max and 1 for Sum. Sato finds -0.1 on average.
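One index with exactly those two properties can be sketched as follows (this particular formula is an assumption for illustration; the slide does not give Sato's definition): compare the paired response against the max and the sum of the single-stimulus responses.

```python
# Hypothetical summation index: with single-stimulus responses r1, r2 and
# a paired response r12,
#     SI = (r12 - max(r1, r2)) / min(r1, r2)
# so SI = 1 if the pair response equals the sum of the single responses,
# and SI = 0 if it equals their max.

def summation_index(r1, r2, r12):
    return (r12 - max(r1, r2)) / min(r1, r2)

r1, r2 = 20.0, 12.0                          # spikes/sec, illustrative
print(summation_index(r1, r2, r1 + r2))      # sum model  -> 1.0
print(summation_index(r1, r2, max(r1, r2)))  # max model  -> 0.0
```

An average value of -0.1 therefore sits near the max prediction (slightly below it), which is the point of citing Sato here.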


Simulation