April 2000, IEEE ICRA Dudek & Jugessur
Dudek & Jugessur, ICRA 2000.
Robust Place and Object Recognition using Local Appearance based Methods
Gregory Dudek and Deeptiman Jugessur
Center for Intelligent Machines
McGill University
Outline
• Applications
• PCA: shortcomings
• Objectives
• Approach
• Background
• System Overview
• Results
• Conclusion
Two Applications
• Object recognition: what is that thing?
– Recognizing a known object from its visual appearance.
– Landmarks, grasping targets, etc.
• Place recognition (coarse localization): what room am I in?
– Recognizing the current waypoint on a trajectory, validating the current locale for the application of a precise localization method, topological navigation.
PCA-based recognition.
• Has now become a well-established method for image recognition.
• PCA-based recognition: global transform of an image with N degrees of freedom into an eigenspace with M << N degrees of freedom.
– The M degrees of freedom are the “most important” characteristics of the set of images being memorized.
• Avoids having to segment the image into object & background by using the whole thing.
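The global transform described on this slide can be sketched in a few lines. This is a minimal illustration, assuming images are flattened into rows of a NumPy array and that the eigenspace is obtained from an SVD of the mean-centered data; the function names `build_eigenspace` and `project` and the synthetic data are hypothetical, not taken from the talk.

```python
import numpy as np

def build_eigenspace(images, m):
    """images: (num_images, N) array of flattened images; m: kept dimensions (M << N)."""
    mean = images.mean(axis=0)
    centered = images - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:m]

def project(image, mean, basis):
    """Map an N-dimensional image to its M eigenspace coordinates."""
    return basis @ (image - mean)

rng = np.random.default_rng(0)
train = rng.random((20, 225))      # 20 synthetic "images" with N = 225 pixels
mean, basis = build_eigenspace(train, m=5)
coords = project(train[0], mean, basis)
print(coords.shape)                # (5,)
```

Each image is then represented by only M numbers, which is what makes nearest-neighbor matching in the eigenspace cheap.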
Observations
• Using whole image implies recognizing combination of object AND background.
• Segmenting object from background would avoid dependence on background, but it’s too difficult.
• Using a small sub-region gives a less precise recognition (i.e., the sub-window could come from more than one image), but it is efficient.
• Many subwindows together can “vote” for an unambiguous recognition.
• If the sub-windows are suitably chosen, they may totally ignore the background.
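The voting idea can be illustrated with a simple majority count. This is only a sketch, assuming each sub-window has already been assigned the label of its nearest training sub-window; the labels and the `vote` helper are hypothetical, and the talk does not specify the exact voting rule.

```python
from collections import Counter

def vote(subwindow_labels):
    """Return the label chosen by most sub-windows and its share of the votes."""
    counts = Counter(subwindow_labels)
    label, n = counts.most_common(1)[0]
    return label, n / len(subwindow_labels)

# Hypothetical nearest-neighbor labels for six sub-windows of one test image.
labels = ["mug", "mug", "stapler", "mug", "phone", "mug"]
winner, share = vote(labels)
print(winner, round(share, 2))  # mug 0.67
```

Individual sub-windows may be ambiguous (here two of six point elsewhere), yet the combined vote is not.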
Problem Statement
• Improving the performance of classic PCA based recognition by accounting for:
– Varying backgrounds
– Planar rotations
– Occlusions
• Also (discussed in less detail)
– Changes in object pose
– Non-rigid deformation
Our key idea(s).
• Use sub-windows: several together uniquely accomplish recognition.
• Sub-windows are selected by an attention operator (several kinds can be used).
• Each sub-window is sampled non-uniformly to weight it towards its center.
• Use only the amplitude spectrum to buy rotational invariance.
Background
• Standard Appearance Based Recognition
– M. Turk and A. Pentland 1991
– S.K. Nayar, H. Murase, S.A. Nene 1994
– H. Murase, S.K. Nayar 1995
– Shortcomings (due to global approach):
• Background
• Scale
• Rotations
• Local changes of the image or object
• Occlusion
Background (part 2)
• “Enhanced” Local sub-window methods
– D. Lowe 1999: scale invariance, simple features.
– C. Schmid 1999: Probabilistic approach based on sub-windows extracted using Harris operator.
– C. Schmid & R. Mohr 1997: numerous sub-windows extracted using Harris operator for database image retrieval (simpler problem).
– K. Ohba & K. Ikeuchi 1997: K.L.T. operator used for the extraction of sub-windows for the creation of an eigenspace. Only handles occlusion.
• Interest Operator of choice:
– D. Reisfeld, H. Wolfson, Y. Yeshurun 1995: Local symmetry operator
Approach
• 2 phases:
– Training (off-line) for the entire database of recognizable images:
• Run an interest operator to obtain a saliency map for each image.
• Choose sub-windows around the salient points for each image.
• Select most informative sub-windows and use foveal sampling.
• Create the eigenspace with the processed sub-windows.
– Testing (on-line) for a candidate test image:
• Run the same interest operator to obtain the saliency map.
• Choose the sub-windows and process the information within them.
• Project the sub-windows onto the eigenspace.
• Perform classification based on nearest neighbor rules.
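The sub-window selection step shared by both phases might look like the sketch below. It assumes a gradient-magnitude map as a stand-in interest operator (the talk uses a local symmetry operator, which is not reproduced here), and the helper names `gradient_saliency` and `extract_subwindows` are hypothetical.

```python
import numpy as np

def gradient_saliency(image):
    """Stand-in interest operator: gradient magnitude (NOT the local symmetry operator)."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy)

def extract_subwindows(image, saliency, k, size=15):
    """Cut up to k size-by-size windows centered on the most salient points."""
    half = size // 2
    sal = saliency.astype(float).copy()
    # Ignore points too close to the border for a full window.
    sal[:half, :] = sal[-half:, :] = -np.inf
    sal[:, :half] = sal[:, -half:] = -np.inf
    windows = []
    for _ in range(k):
        r, c = np.unravel_index(np.argmax(sal), sal.shape)
        if not np.isfinite(sal[r, c]):
            break  # no valid candidates left
        windows.append(image[r - half:r + half + 1, c - half:c + half + 1])
        # Suppress the neighborhood so the next center lies outside this window.
        sal[r - half:r + half + 1, c - half:c + half + 1] = -np.inf
    return windows

rng = np.random.default_rng(0)
image = rng.random((64, 64))       # synthetic test image
windows = extract_subwindows(image, gradient_saliency(image), k=10)
print(len(windows), windows[0].shape)
```

The same function would be run on training images off-line and on the candidate image on-line, so that both sides produce comparable sub-windows.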
Recognition Model
[Diagram: off-line and on-line pipelines sharing one eigenspace.]
• Off-line: database of recognizable images → run all images through the interest operator → extract sub-windows based on interest operator saliency values and information content → 2D FFT to obtain amplitude spectra for the sub-windows → create low dim. eigenspace for classification.
• On-line: candidate test image → run the image through the interest operator → extract sub-windows → 2D FFT → project onto eigenspace.
Polar Samplings and 2D FFT
[Figure: two sub-windows differing by a planar rotation; each is polar sampled and passed through a 2D FFT, giving the same amplitude spectrum (in theory).]
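A small numerical check of this idea, under simplifying assumptions: nearest-neighbor polar sampling on 7 integer radii and 16 angles, and a 90-degree rotation so that the rotated window maps exactly onto a circular shift along the angle axis. The function names and sampling parameters are hypothetical choices for illustration; for arbitrary angles, interpolation error would make the match approximate rather than exact.

```python
import numpy as np

def polar_sample(window, n_radii=7, n_angles=16):
    """Nearest-neighbor polar resampling of a square, odd-sized window about its center."""
    c = window.shape[0] // 2
    radii = np.arange(1, n_radii + 1)
    angles = 2 * np.pi * np.arange(n_angles) / n_angles
    rows = c + np.rint(radii[:, None] * np.sin(angles)[None, :]).astype(int)
    cols = c + np.rint(radii[:, None] * np.cos(angles)[None, :]).astype(int)
    return window[rows, cols]      # shape (n_radii, n_angles)

def amplitude_spectrum(polar):
    """Magnitude of the 2D FFT; discards the phase that encodes the shift."""
    return np.abs(np.fft.fft2(polar))

rng = np.random.default_rng(1)
window = rng.random((15, 15))
rotated = np.rot90(window)         # planar rotation by 90 degrees

a = amplitude_spectrum(polar_sample(window))
b = amplitude_spectrum(polar_sample(rotated))
print(np.allclose(a, b))           # True
```

In polar coordinates the rotation becomes a shift along the angle axis, and a shift changes only the phase of the spectrum, which is why the amplitudes agree.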
Shift Theorem
$f(x,y) \rightarrow F(u,v)$
Shift theorem states that:
$f(x-a,\,y-b) \rightarrow e^{-j2\pi(au+bv)}\,F(u,v)$
Amplitudes are the same, since:
$\left|e^{-j2\pi(au+bv)}\,F(u,v)\right| = \left|F(u,v)\right|$
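The theorem can be verified numerically for the discrete case. This sketch assumes a circular shift (`np.roll`) and NumPy's FFT convention, which uses a negative exponent in the forward transform, so a shift by (a, b) multiplies the spectrum by a phase factor with a negative exponent; the amplitude is unchanged whichever sign convention is adopted. The array size and shift values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
f = rng.random((8, 8))
a, b = 3, 2
g = np.roll(f, shift=(a, b), axis=(0, 1))   # g[x, y] = f[x - a, y - b], circularly

F, G = np.fft.fft2(f), np.fft.fft2(g)
u = np.fft.fftfreq(8)[:, None]              # frequencies in cycles per sample
v = np.fft.fftfreq(8)[None, :]
phase = np.exp(-2j * np.pi * (a * u + b * v))

phase_ok = np.allclose(G, phase * F)                # shifted spectrum = phase factor times F
amplitude_ok = np.allclose(np.abs(G), np.abs(F))    # amplitudes are identical
print(phase_ok, amplitude_ok)                       # True True
```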
Place Recognition
[Figure: test images alongside training images, with the best match shown for each test image.]
Place Recognition (2)
[Figure: test images alongside training images, with the best match shown for each test image.]
Object Recognition
[Figure: test image alongside the training image it was recognized as.]
Object Recognition (2)
[Figure: test image alongside training image, showing the best matches.]
Note: background variation and occlusion.
Performance metrics
• On-line performance:
– 15x15 pixel subwindows: 90% recognition with 10 subwindows (10 interest points).
– 15x15 pixel subwindows: 100% recognition using 15 more subwindows.
– Interest operator can take 1/30 s to 10 min. (depending on the operator, image size, etc.).
– Classification in eigenspace well under 1 sec (can be performed in real time).
Performance vs Number of Interest Points
[Plot: recognition rate (up to 100%) versus number of features.]
Note: 10 windows of size 15x15 means using only 0.7% of the total image content.
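The 0.7% figure can be reproduced with a quick calculation. The slides do not state the image resolution, so the 640x480 size used here is purely an assumption that happens to be consistent with the quoted percentage.

```python
window_pixels = 10 * 15 * 15       # 10 sub-windows of 15x15 pixels = 2250 pixels
image_pixels = 640 * 480           # ASSUMED image size; not given in the slides
fraction = window_pixels / image_pixels
print(f"{fraction:.2%}")           # 0.73%
```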
Conclusion & Extensions
• Approach to object and place recognition from single video images. Works despite planar rotation, occlusion or other deformations.
• Highly robust.
• Recognition rates of up to 100% with 20 test images.
• Improved robustness to background can be achieved using “masking” [Jugessur & Dudek CVPR 2000].
• Ongoing work seeks to exploit geometry of interest points.
• Could filter in Eigenspace during training to select only “useful” features.
That’s all
Questions you could ask
• Have you considered the use of alternative interest/attention operators? Does the operator matter?
• What if the background is much more interesting (to the operator) than the object?
• How much does color information matter?
• What is the consequence of not using geometric information (and what does that really mean)?
Performance metrics
• Training time: roughly 64 windows, 15x15, 17 objects, 3 views per object: 24 hours.
– This is using MATLAB and highly non-optimized code.
• Using similar methods on global images, other groups have reported times on the order of minutes for similar tasks.
• On-line performance:
– Interest operator can take 1/30 s to 10 min. (depending on the operator, image size, etc.).
– Classification in eigenspace well under 1 sec (can be performed in real time).