Rosch et al 75, Oliva & Torralba 2001 A Model of Visual ... · A Model of Visual Categorization...
Transcript of Rosch et al 75, Oliva & Torralba 2001 A Model of Visual ... · A Model of Visual Categorization...
1
A Model of Visual Categorization Based on
Structural Parameterization
Gaze Com
Christoph Rasche
Justus-Liebig Universität GiessenGermany
When we see a novel image… Rosch et al 75, Oliva & Torralba 2001
Chair
…we assign it toa basic-level category…
…occurs within ca. 150ms![for canonical views Palmer et al 81]
Dog Tree House
Structural VariabilityWitkin & Tenenbaum 83 , Draper et al 96, Palmer 99, Rasche 05, Basri & Jacobs 97
• Perona et al 101 categories: PCA, classifier, human-supervised• Oliva & Torralba 8 urban/natural scenes [super-ordinate]: FT, classifier→ Preprocessing not useful for analysis of ‘parts’ (or simulating a visual search by eye-movements)
Structural DescriptionGuzman 71, … , Marr 82, … , Biederman 87,…NN
→ distance distributionsobtaining parameters
→ buffer variability in a multi-dim. space
coarse scale?→ loss of accuracy
→ local/global
integrating local orientations?→ template matching
Contour Description
→ requires description that identifies those aspects:elementary segment: bow
many animal silhouettesundulating
curvature or inflexionlow edginess
natural sceneswiggly
irregularhigh edginess (zig-zag)
Bowness Signatureiteration with fixed window (chord/stick)
Advantage: signature (1D function) easier to analyze than 2D contour→ curvature, circularity, edginess, symmetry,…
amplitude
2
s
bowness
Local/Global Space
global
ω
inflexionlocalsignature
window size ω
10 20 30 40 50 60 700
0.5
11(5)
10 20 30 40 50 60 700
0.5
12(7)
10 20 30 40 50 60 700
0.5
13(9)
10 20 30 40 50 60 700
0.5
14(11)
10 20 30 40 50 60 700
0.5
15(15)
10 20 30 40 50 60 700
0.5
16(21)
10 20 30 40 50 60 700
0.5
17(29)
10 20 30 40 50 60 700
0.5
18(41)
160 170 180 190 200 210 220 230
50
60
70
80
90
100
110
120
11/87 [11]
*=starting, center point
1 2 3 4 5 6 7 8 9 100
0.2
0.4
0.6
0.8
1
φ
Fraction (β, τ, γ)
1 2 3 4 5 6 7 8 9 100
0.5
1
1.5
Spectra
ω
str arc trs alt bnd edg sym0
0.5
1
1.5Dimension Values
Integrated Signatures
→ suitable for copying a contour(by drawing):
s
bowness
Toward Abstraction of Global Geometry
inflexionlocal
global
window size ωLocal Global
bowness
inflexion
Fraction
Parameterization
ω
c(o, l, a, t, x, b, e, s)
curvature alternating edginessorientation,length symmetryinflexion
66 i230
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
662 r0
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
66 r0
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
662 i230
0.99620 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
425 r0
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
976 i230
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
425 i230
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
513 r0
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
70 i230
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
70 r0
1.00020 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
70 r30
0.99920 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
848 r30
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
226 r30
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
271 l5c2
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
382 l5c2
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
594 r0
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
72 i230
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
72 r0
1.00020 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
23 l5c2
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
457 r0
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
72 l5c2
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
457 i230
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
541 r0
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
878 l5c2
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
74 i230
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
74 r0
1.00020 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
111 r0
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
111 i230
0.99620 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
126 r0
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
24 l5c2
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
907 i230
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
126 i230
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
80 l5c2
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
80 r0
0.99820 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
378 r0
0.99220 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
497 r0
0.98920 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
497 i230
0.98920 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
497 r0
0.98820 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
604 r30
0.98620 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
306 l5c2
0.98620 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
85 i230
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
85 r0
1.00020 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
767 i230
0.99820 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
767 r0
0.99820 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
457 l5c2
0.99620 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
821 r30
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
208 l5c2
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
766 l5c2
0.99420 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
83 i230
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
83 r0
0.99920 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
286 r0
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
937 r30
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
286 i230
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
524 r0
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
524 i230
0.99720 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
612 r30
0.99520 40 60 80 100 120 140 160 180
20
40
60
80
100
120
140
83 i23020 83 r020 873 r3020 916 r3020 77 r3020 35 r020 355 r3020 35 i23020
Selected Decreasing Similarity [>50’000 cnts; 1000 objs]
→ just a vector to express an unlimited variety of contours
Parameter Changesincreasing curvature, edginess
smootharc
wiggly(undulating)
wiggly(zig-zag)
3
For Gray-Scale Scenes:Adding Appearance Parameters
c( o, l, c, t, a, x, e r, cm, cs, fm, fs )geometry appearance
16432−160031
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
1
1
20 40 60 80 100 120 140 160 180
1
1
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
Selected Decreasing Similarity
Corel collection provides 112 basic-level categories!
Contour Partitioning
[contextual issue]
1 ‘hard’ rule only: U-turns
Extracting:- large bows- large straight segments
50 100 150
50
100
150
200
250
300
1
2• category specific <> accidental• local segments useful for
grouping/other features
26 − desert 35.0 % (scale 3)
58 − models 3.1 % (scale 3)
3−13%
98 − sunsets/sunrises 45.0 % (scale 3)
Category-Specific Contours
Asian people1
Indians2
actors3
agricultural vehicles4
aircrafts5
airplanes6
astronomy7
athletes8
barbecue food9
barns10
beverages11
birds12
black−and−white photos13
bridges14
candies15
canyons16
cars17
caves18
chemistry19
children20
churches21
city buildings22
coasts23
costumed persons24
crowds25
desert26
different activities27
different cultures28
domestic animals29
domestic/wild animals30
doors/entrances31
farm animals32
fields33
flags34
flowers35
food on plates36
forest scenes37
forests38
frames39
fruits40
furniture41
gardens42
graffiti43
historical buildings44
historical buildings/ruins45
houses46
icebergs47
insects48
instruments49
interior50
jewellery51
lakesides52
leaves53
lighthouses54
masks55
meals56
military vehicles57
models58
money59
monkeys60
motor−driven vehicles61
mountain animals62
mountain sceneries63
mountains64
native people65
ocean66
ocean life67
ornamental plants68
outdoor activity69
paintings70
parks71
parts of cars72
pastry73
patterns74
persons at work75
persons on holiday76
pharmaceuticals77
plants of the forest78
playing cards79
porcelain80
portraits81
postcards82
public transportation83
racing vehicles84
railway vehicles85
reptiles/amphibians86
roads87
shields88
ships89
sky90
skylines91
skyscrapers92
spices93
sports94
stamps95
statues96
stones97
sunsets/sunrises98
tools99
toys100
trees101
tundra102
uniformed persons103
vegetables104
vehicles105
volcanoes106
water animals107
waterfalls108
waterways109
weapons110
wedding party111
wild animals112
contours scale 5Faces
2
Faces easy
3
Leopards
4
Motorbikes
5
accordion
6
airplanes
7
anchor
8
ant
9
barrel
10
bass
11
beaver
12
binocular
13
bonsai
14
brain
15
brontosaurus
16
buddha
17
butterfly
18
camera
19
cannon
20
car side
21
ceiling fan
22
cellphone
23
chair
24
chandelier
25
cougar body
26
cougar face
27
crab
28
crayfish
29
crocodile
30
crocodile head
31
cup
32
dalmatian
33
dollar bill
34
dolphin
35
dragonfly
36
electric guitar
37
elephant
38
emu
39
euphonium
40
ewer
41
ferry
42
flamingo
43
flamingo head
44
garfield
45
gerenuk
46
gramophone
47
grand piano
48
hawksbill
49
headphone
50
hedgehog
51
helicopter
52
ibis
53
inline skate
54
joshua tree
55
kangaroo
56
ketch
57
lamp
58
laptop
59
llama
60
lobster
61
lotus
62
mandolin
63
mayfly
64
menorah
65
metronome
66
minaret
67
nautilus
68
octopus
69
okapi
70
pagoda
71
panda
72
pigeon
73
pizza
74
platypus
75
pyramid
76
revolver
77
rhino
78
rooster
79
saxophone
80
schooner
81
scissors
82
scorpion
83
sea horse
84
snoopy
85
soccer ball
86
stapler
87
starfish
88
stegosaurus
89
stop sign
90
strawberry
91
sunflower
92
tick
93
trilobite
94
umbrella
95
watch
96
water lilly
97
wheelchair
98
wild cat
99
windsor chair
100
wrench
101
yin yang
102
CALTECH, contours, σ=2
CALTECH 101Li et al 06
4
Opencountry
1
coast
2
forest
3
highway
4
inside city
5
mountain
6
street
7
tallbuilding
8
URB & NAT, contours, σ=1
Urban & NaturalOliva, Torralba 01
Can Explain Parallel Contour Pop-OutTreisman & Gormican 88
Fig 5 Fig 11
Traditional: saliency implemented by templatesAlternative: a systematic decomposition/parameterization
→ saliency = var(c)
Criticism: many pop-outs possible…
And now to Structural Relations…
- global-local processing Kovacs & Julesz 94, Lee et al 98
- biological motion perception Kovacs
- saccadic landing positions Richards & Kaufman, Kowler
…using Blum’s Symmetric-Axis Transform
Grass-FireBlum 73
time
• segments very informative• previous implementations
[closed shapes, iterations]Feldman & Singh, 2006
Wave PropagationTraveling wave in a 2D map of I&F neurons
t=1
5 10 15 20
10
20
t=5
5 10 15 20
10
20
t=9
5 10 15 20
10
20
t=13
5 10 15 20
10
20
t=17
5 10 15 20
10
20
time
Passive propagation [sub-threshold]Active propagation [above-threshold]
(Similar to dendritic active propagation)
Temporal evolvement for point: amphitheater
Temporal Evolvement → Landscape
Contour Image (CF)a
50 100 150
20
40
60
80
100
120
50 100 150
0
0
0
0
0
0
• characteristics:Rivers ≈ ContoursRidges ≈ Sym-axes
• works for fragmented contour images
Distance Transform
5
Selecting Sym-Axes with High-Pass Filter [e.g. DOG]
Sym−Axes Field (SF)
50 100 150
0
0
0
0
0
0
50 100 150
0
0
0
0
0
0
- biologically plausible: retina or V1 (single propagation + DOG)→ translation independent
- next: segregation at intersections
Distance Transform
Freq. Occurring SAX Signatures
→ parameterization: o, l, s1, s2, sfx, pfx, b, e,
arc length ofaxis in 2D
time(distance)
Area (Relation) Vectora( o, l, s1, s2, sfx, pfx, b, e, cm, cs, fm , fs)
geometry appearance
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
20 40 60 80 100 120 140 160 180
20
40
60
80
100
120
Sym-Ax Vector
Et Voilá…Treisman & Gormican 88
• Summary: flexible, dynamic decomposition (instead of a fixed architecture)
[to clarify: not supporting FIT]
angle 0/1 elongation
6
Toward Shape Abstraction
arc length of circle
landscape amplitude
→ parameterization: maxamp (openness-closeness)
RF at point of intersectionCategorization Results
[vector matching; 10 training images]
COREL CALTECH URB&NAT
18% 19% 42% (human-unsupervised)
However, my approach:• one decomposition for all inputs
(texture, shape, object, scene)
• can understand parts of the structure
58 − models 3.1 % (scale 3)
3−13%
Image-Rotation InvarianceGuyonneau & Kirchner & Thorpe 06
• Orientation knock-out is robust
Future?: More Grouping
• It’s in the temporal landscape(context analysis of sym-axis signatures)
→ structural relations (Gestaltist’s idea: self-collapsing)→ probabilistic combinations of descriptors→ more appearance dimensions (gradient, 3rd moment, texture)
laterality
Predicting Fixation Locations?Noton & Stark 71
can explain all pop-out
In Gray-Scale Scenes?Kienzle et al 06
worst best
• …intersections are part of it:decomposition/grouping delivers them.
7
Neural Network?
• V1: long-range horizontal connections• translation independent
o
#orientation histogram
column?
- orientation- length- curvature
Translation InvarianceThorpe
Blum 67, Deutsch 62, McCulloch 65
120 ms
Summary
DecompositionCan Explain
Pop-Out
18% on 112 COREL19% on 101 Caltech42% on 8 Urban/Natural
Evaluation[human-unsupervised
learning]
Local/Global Space Symmetric-Axes
SynthesisIs Invariant to
Image-Rotation
Texture, Shape, Object, SceneArbitrary Input
Fixation Prediction
TranslationIndependence
Allows Varying Abstraction Accuracy
Acknowledgments
Gaze ComGaze-Based Communication Project
European Commission within the Information Society Technologies contract no. IST-C-033816
Lab support: Karl Gegenfurtner
COREL Categorization: Nadine Hartig