Object Recognition by
Discriminative Combinations
of Line Segments and Ellipses
Alex Chia^˚
Susanto Rahardja^
Deepu Rajan˚
Maylor Leung˚
^Institute for Infocomm Research (I²R), Singapore
˚Nanyang Technological University, Singapore
Goals
• Image classification – Separate images containing an object category from other images
Goals – cont.
• Category-level object detection – Localize all instances of an object category in an image
Existing Approaches
• Region based approach
– Exploits image pixel brightness or color values
– Not suitable for complex classes characterized by thin skeletal structures (e.g. bicycle)
– Other classes (e.g. horse) are more defined by their shape
Existing Approaches – cont.
• Contour based approach
– Exploits spatial configuration or statistics of edge pixels
– Edge based rich local descriptors
– Contour fragments
– Shape primitives
• Advantages of shape primitives
I. Support abstract reasoning (unlike edge based local descriptors)
II. Modest storage demands (unlike contour fragments)
III. Efficient comparison across single and multiple scales (unlike contour fragments)
Our contour based approach - outline
• Dataset: training images and testing images
• Learning phase
– Extract line segments and ellipses
– Construct shape tokens
– Learn category-specific codebook of shape tokens
– Boost discriminative codeword combinations
• Evaluation phase
– Detect object instances and classify images
– Evaluate performance
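Constructing shape tokens
• Pair a reference primitive to its connected neighbor
– Tokens: Ellipse-line, Line-line, Ellipse-ellipse
• Geometrical and spatial properties
– Length, orientation, distance between midpoints, relative primitive positions
• Token descriptor: $A = [\, l_r \;\; w_r \;\; \theta_r \;\; l_n \;\; w_n \;\; \theta_n \;\; h \;\; v_x \;\; v_y \,]^T$
[Figure: reference and neighbor primitives labeled $\theta_r$, $\theta_n$, $l_r$, $l_n$, $w_r$, $h$ and $v$, drawn against $x$ and $y$ axes]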
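A minimal sketch of how a token could be assembled from a reference primitive and its neighbor, using the attributes listed on the slide (length, orientation, distance between midpoints, relative primitive positions). The `Primitive` class, the `make_token` helper, the zero width for line segments and the unit-vector form of the relative position are all assumptions made for illustration, not the authors' definitions.

```python
import math
from dataclasses import dataclass

@dataclass
class Primitive:
    """A line segment or ellipse summarized by midpoint, size and orientation."""
    cx: float      # midpoint x
    cy: float      # midpoint y
    length: float
    width: float   # assumed 0.0 for a line segment in this sketch
    theta: float   # orientation in radians, in [0, pi)

def make_token(ref: Primitive, nbr: Primitive) -> list:
    """Return [l_r, w_r, theta_r, l_n, w_n, theta_n, h, v_x, v_y]."""
    dx, dy = nbr.cx - ref.cx, nbr.cy - ref.cy
    h = math.hypot(dx, dy)  # distance between the two midpoints
    # Relative position as a unit vector from reference to neighbor (assumption).
    vx, vy = (dx / h, dy / h) if h > 0 else (0.0, 0.0)
    return [ref.length, ref.width, ref.theta,
            nbr.length, nbr.width, nbr.theta, h, vx, vy]

line = Primitive(0.0, 0.0, 10.0, 0.0, 0.0)
ellipse = Primitive(3.0, 4.0, 6.0, 2.0, math.pi / 2)
token = make_token(line, ellipse)
```

Keeping the descriptor as a flat list makes it easy to compare two tokens attribute by attribute.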
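Comparing shape tokens
• A token is compared only to similar typed tokens
• Differences in their attributes
– Token descriptor: $A = [\, l_r \;\; w_r \;\; \theta_r \;\; l_n \;\; w_n \;\; \theta_n \;\; h \;\; v_x \;\; v_y \,]^T$
– $D(A_i, A_j)$ combines, for $p \in \{r, n\}$, the difference in lengths $D_l(l_i^p, l_j^p)$, the difference in widths $D_w(w_i^p, w_j^p)$ and the difference in orientation $D_\theta(\theta_i^p, \theta_j^p)$ with the difference in spatial separation of primitives $D_h(h_i, h_j)$ and the difference in relative primitive positions $D_v(v_i^x, v_i^y, v_j^x, v_j^y)$
– Difference in lengths: $D_l(l_i, l_j) = \min\left(1, \left|\ln(l_i / l_j)\right|\right)$
– Difference in orientation: $D_\theta(\theta_i, \theta_j) = \min\left(|\theta_i - \theta_j|,\ \pi - |\theta_i - \theta_j|\right) / (\pi / 2)$
– Difference in relative primitive positions: $D_v$ is built from the squared differences $(v_i^x - v_j^x)^2$ and $(v_i^y - v_j^y)^2$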
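The attribute differences can be sketched as small functions. The two below assume a capped log-ratio for lengths and a wrapped angle difference scaled to [0, 1] for orientation; both forms are a reading of the slide rather than a confirmed definition, and the function names are hypothetical.

```python
import math

def length_diff(li: float, lj: float) -> float:
    """Log-ratio difference of two positive lengths, capped at 1."""
    return min(1.0, abs(math.log(li / lj)))

def orientation_diff(ti: float, tj: float) -> float:
    """Difference of two undirected orientations in [0, pi), scaled to [0, 1]."""
    d = abs(ti - tj) % math.pi
    # An undirected line at angle t is the same line at t + pi, so wrap around.
    return min(d, math.pi - d) / (math.pi / 2)
```

A log-ratio treats "twice as long" and "half as long" as equally different, which a plain subtraction would not.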
Learning category-specific codebook
• Extract tokens from within the bounding boxes of training objects
• Cluster tokens by their scale normalized appearance descriptors
– Adapted bisecting 2-medoid clustering
• Cluster tokens by their relative position
– Mean-shift clustering
[Figure: normalized appearance descriptor and normalized translational vector]
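For intuition, one bisection step of a 2-medoid clustering might look like the sketch below. It is plain k-medoids with k = 2 and a naive initialization; the slide's "adapted" variant is not described, so this is not the authors' algorithm, and `two_medoid_split` is a hypothetical name. A bisecting scheme would apply the split repeatedly to the resulting groups.

```python
def two_medoid_split(items, dist, max_iter=20):
    """Split items into two groups around two medoids.

    Returns (medoid_indices, groups), where groups holds item indices.
    """
    def cost(candidate, group):
        # Total distance from a candidate medoid to every member of its group.
        return sum(dist(items[candidate], items[other]) for other in group)

    medoids = [0, len(items) - 1]  # naive initialization (assumption)
    groups = [[], []]
    for _ in range(max_iter):
        groups = [[], []]
        for idx, item in enumerate(items):
            d0 = dist(item, items[medoids[0]])
            d1 = dist(item, items[medoids[1]])
            groups[0 if d0 <= d1 else 1].append(idx)
        if not groups[0] or not groups[1]:
            break  # degenerate split, nothing more to refine
        new_medoids = [min(g, key=lambda c, g=g: cost(c, g)) for g in groups]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, groups

values = [0.0, 0.2, 0.1, 5.0, 5.2, 5.1]
medoids, groups = two_medoid_split(values, lambda a, b: abs(a - b))
```

Medoids are actual data items, so the approach only needs a pairwise distance and works for token descriptors that cannot be averaged meaningfully.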
Learning category-specific codebook – cont.
• Medoid in each mean-shift sub-cluster as candidate codeword
• Appearance distance allowance
– Indicates the range of appearance the candidate represents
– = Mean appearance distance + Std. dev.
• Scale normalized circular window
– Indicates where the candidate is found relative to the object centroid
– Center $c$ and radius $r$ of window
[Figure: mean-shift sub-cluster feature space, with points marked x and +]
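A minimal sketch of deriving a candidate's parameters from one sub-cluster. The allowance follows the slide (mean appearance distance plus standard deviation); the use of the population standard deviation, and the window taken as the mean offset with the largest member distance as radius, are assumptions made for illustration. `candidate_parameters` is a hypothetical helper.

```python
import math
from statistics import mean, pstdev

def candidate_parameters(app_dists, offsets):
    """Return (allowance, center, radius) for one candidate codeword.

    app_dists: appearance distance of each sub-cluster member to the medoid.
    offsets:   scale normalized (x, y) position of each member relative to
               the object centroid.
    """
    allowance = mean(app_dists) + pstdev(app_dists)
    cx = mean(o[0] for o in offsets)
    cy = mean(o[1] for o in offsets)
    # Smallest circle around the mean offset that covers all members (assumption).
    radius = max(math.hypot(o[0] - cx, o[1] - cy) for o in offsets)
    return allowance, (cx, cy), radius

allowance, center, radius = candidate_parameters(
    [0.1, 0.3], [(0.0, 0.0), (2.0, 0.0)]
)
```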
Learning category-specific codebook – cont.
• Score each candidate by appearance + geometric qualities
– Appearance qualities, including the number of unique training objects
– Geometric quality
[Figure: candidates from all sub-clusters and candidates from the 350 most populated sub-clusters, with quantities labeled $d$ and $1/r$]
Learning category-specific codebook – cont.
• Radial ranking method to select candidates into the codebook
Learning category-specific codebook – cont.
[Figure: candidates from all sub-clusters, from the 350 most populated sub-clusters and from the 350 selected sub-clusters, shown for Face, Bike-front, Bottle, Horse-side and Cow-side]
Learning discriminative codeword combinations
• Each codeword parameterized by
– Appearance distance allowance
– Scale normalized circular window with radius $r$ and center $c$
• Matching codeword combination
– Every codeword in combination finds image tokens within its appearance distance allowance (appearance constraint)
– Centroid predictions by all codewords in combination concur (geometric constraint)
Learning discriminative codeword combinations – cont.
Basic idea for finding matched codeword combinations
• Given codeword $i$ and codeword $j$, for a scale $s$ and location $x$ in an image
• For a scale $s$ and location $x$, codewords that all find matching tokens within their estimated windows predict centroid locations which concur
[Figure: estimated windows of codewords $i$ and $j$ around location $x = (0, 0)$, labeled with $s c_i$, $s r_i$, $s c_j$ and $s r_j$]
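Learning discriminative codeword combinations – cont.
• Finding token $t^*$ within estimated window that has the least appearance distance to codeword $i$
– $t^* = \arg\min_{t'} d_i^{app}(t')$ over tokens $t'$ in the window, with $d_i^{geo}(t')$ the geometric distance of $t'$
• Response of codeword $i$ at scale $s$ and location $x$ of image, computed from $d_i^{app}(t^*)$ and $d_i^{geo}(t^*)$
– In $[0, 2]$ if matching token found within window
– Takes a different value otherwise
[Figure: image tokens marked x around the estimated window of codeword $i$, labeled with $s c_i$ and $s r_i$ relative to $x = (0, 0)$]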
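The response computation can be sketched as follows. The slide only states that the response lies in [0, 2] when a matching token is found; the sum of a normalized appearance distance and a normalized geometric distance used here, the infinite value when nothing matches, and the window center placed at `x + s * center` are all assumptions chosen to be consistent with that range. The dictionary keys and `codeword_response` are hypothetical.

```python
import math

def codeword_response(codeword, tokens, s, x, app_dist):
    """Response of one codeword at scale s and location x.

    codeword: dict with 'desc', 'allowance', 'center' (cx, cy), 'radius'.
    tokens:   list of dicts with 'desc' and 'pos' (x, y).
    """
    # Estimated window: the scale normalized window mapped to this scale/location.
    wx = x[0] + s * codeword["center"][0]
    wy = x[1] + s * codeword["center"][1]
    wr = s * codeword["radius"]
    best = None
    for t in tokens:
        d_geo = math.hypot(t["pos"][0] - wx, t["pos"][1] - wy)
        d_app = app_dist(codeword["desc"], t["desc"])
        if d_geo <= wr and d_app <= codeword["allowance"]:
            if best is None or d_app < best[0]:
                best = (d_app, d_geo)  # keep the token with least appearance distance
    if best is None:
        return math.inf  # no matching token in the window (assumed value)
    return best[0] / codeword["allowance"] + best[1] / wr

cw = {"desc": 1.0, "allowance": 0.5, "center": (1.0, 0.0), "radius": 2.0}
toks = [{"desc": 1.2, "pos": (2.0, 2.0)}, {"desc": 1.1, "pos": (4.0, 0.0)}]
resp = codeword_response(cw, toks, 2.0, (0.0, 0.0), lambda a, b: abs(a - b))
miss = codeword_response(cw, [{"desc": 1.0, "pos": (50.0, 50.0)}], 2.0,
                         (0.0, 0.0), lambda a, b: abs(a - b))
```

Each of the two normalized terms lies in [0, 1] for a matching token, so their sum stays within [0, 2].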
Learning discriminative codeword combinations – cont.
• Simple example (2 codewords)
– Matching of codewords $i$ and $j$ at scale $s$ and location $x$: a threshold test on the response of $i$ at $(s, x)$ with direction $p_i$, and a threshold test on the response of $j$ at $(s, x)$ with direction $p_j$
– Generalized form: the same test on codewords $i$, $j$, …, where each threshold lies in $[0, 2]$ and each direction of inequality $p$ is $-1$ or $+1$
• Binary decision tree ending in a predicted label
• Appearance + Geometric + Structural constraints of object class
– Appearance: visual aspects of tokens
– Geometric: spatial layout of tokens
– Structural: relationships of tokens
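A minimal sketch of a codeword combination as a conjunction of threshold tests. It assumes the usual decision-stump form `p * response <= p * threshold`, with `p` in {-1, +1} giving the direction of the inequality; whether the slide uses a strict or non-strict inequality is not recoverable, and the function names are hypothetical.

```python
import math

def stump_holds(response: float, threshold: float, p: int) -> bool:
    """Threshold test; p = +1 asks for a small response, p = -1 for a large one."""
    return p * response <= p * threshold

def combination_label(responses, stumps, label=1):
    """Return label if every (codeword_index, threshold, p) test holds, else 0."""
    ok = all(stump_holds(responses[i], th, p) for i, th, p in stumps)
    return label if ok else 0

responses = {0: 0.4, 1: 1.5, 2: math.inf}  # inf: no matching token (assumed)
present = combination_label(responses, [(0, 0.8, +1), (1, 1.8, +1)])
too_far = combination_label(responses, [(0, 0.8, +1), (1, 1.0, +1)])
absent = combination_label(responses, [(0, 0.8, +1), (2, 1.0, -1)])
```

With `p = -1` a test passes when the response is large, so a combination can also require that a codeword is not matched.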
Learning discriminative codeword combinations – cont.
• Input
– Matrix of values: the response of each codeword $1, \dots, n$ at each sampled scale and location, from $(s_1, x_1)$ to $(s_m, x_n)$
– Vector of $z$ labels: one label $z(s, x)$ per sampled scale and location
– Weight vector: one weight $w(s, x)$ per sampled scale and location
• Boosting
• Output
– Binary decision trees over one, two, three, … codewords; each node tests the response of a codeword at $(s, x)$ against its threshold with direction $p$, and each tree ends in a predicted label
– Codeword combination for codewords $i$ and $j$: $CC(s, x)$ is nonzero if the threshold tests on $i$ and $j$ (with directions $p_i$ and $p_j$) both hold at $(s, x)$, and $0$ otherwise
– Detection confidence: $H(s, x)$, a sum over the learned codeword combinations $CC_i(s, x)$
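To make the roles of the label vector and the weight vector concrete, here is a sketch of one round of discrete AdaBoost followed by a confidence sum. The slide does not say which boosting variant is used, so discrete AdaBoost with weak outputs in {-1, +1} is an assumption, as are the names `adaboost_round` and `detection_confidence`.

```python
import math

def adaboost_round(preds, labels, weights):
    """One discrete AdaBoost round.

    preds, labels: sequences of -1/+1, one per sampled (scale, location).
    weights:       non-negative sample weights summing to 1.
    Returns (alpha, updated_weights).
    """
    err = sum(w for p, z, w in zip(preds, labels, weights) if p != z)
    err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified samples gain weight, correctly classified ones lose it.
    raw = [w * math.exp(-alpha * z * p) for p, z, w in zip(preds, labels, weights)]
    total = sum(raw)
    return alpha, [w / total for w in raw]

def detection_confidence(alphas, outputs):
    """Weighted sum of weak outputs at one (scale, location)."""
    return sum(a * o for a, o in zip(alphas, outputs))

alpha, new_w = adaboost_round([1, 1, -1, -1], [1, 1, -1, 1], [0.25] * 4)
```

After the update the single misclassified sample carries half of the total weight, which pushes the next round toward a combination that handles it.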
Experimental Results – Weizmann horse
Detection RP-AUC (recall vs. false positives per image)
  Shotton et al. I                      0.8738
  Shotton et al. II (Retrained test)    0.8903
  Bai et al.                            0.8032
  Our method                            0.9218
Classification ROC-AUC (true positive rate vs. false positive rate)
  Shotton et al. I                      0.9251
  Shotton et al. II (Retrained test)    0.9400
  Our method                            0.9500
J. Shotton et al., TPAMI, 2008.
X. Bai et al., ICCV, 2009.
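For readers unfamiliar with the classification metric, ROC-AUC equals the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way on toy data; the scores are made up for illustration and are unrelated to the reported results.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```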
Experimental Results – Graz-17
                           Number of object images   Image classification ROC-AUC   Object detection RP-AUC
Object category            Training   Testing        Our method   Shotton et al.    Our method   Shotton et al.
Plane                      100        400            0.9826       0.9953            0.9325       0.9310
Motorbike                  100        400            0.9983       1.0000            0.9996       1.0000
Face                       100        217            0.9974       0.9966            0.9895       0.9850
Car-rear                   100        400            0.9883       0.9992            0.9797       0.9912
Car-2/3-rear               32         14             0.9643       0.9000            0.6843       0.6925
Car-front                  34         16             0.9688       0.9727            0.8256       0.7233
Bike-rear                  29         13             0.9468       0.9172            0.6042       0.6398
Bike-front                 19         12             0.9584       0.9375            0.7421       0.6344
Bike-side                  90         53             0.9445       0.9366            0.8299       0.6959
Bottle                     54         64             0.9773       0.9802            0.9009       0.9468
Cow-front                  34         16             0.9844       0.9727            0.8335       0.8575
Cow-side                   45         65             0.9944       0.9992            0.9945       0.9975
Horse-front                44         22             0.9918       0.9566            0.7368       0.7852
Horse-side                 55         96             0.9756       0.9816            0.9361       0.9680
Person                     39         18             0.9352       0.9321            0.5730       0.4271
Mug                        30         20             0.9525       0.9600            0.9619       0.9035
Cup                        31         20             0.9800       0.9825            0.8964       0.9158
Average across categories                            0.9730       0.9659            0.8483       0.8291
J. Shotton et al., TPAMI, 2008.
• Additional comparisons with other methods provided in paper
Summary
• Presented a contour based recognition approach which exploits simple and generic shape primitives
• Proposed a method to learn discriminative primitive combinations with a variable number of primitives
• Demonstrated the effectiveness of our approach with extensive experiments across 17 categories