NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe...
-
date post
21-Dec-2015 -
Category
Documents
-
view
216 -
download
0
Transcript of NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe...
![Page 1: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/1.jpg)
NIPS 2003 Tutorial
Real-time Object Recognition using Invariant Local Image Features
David Lowe
Computer Science Department
University of British Columbia
![Page 2: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/2.jpg)
Object Recognition
Definition: Identify an object and determine its pose and model parameters
Commercial object recognition Currently a $4 billion/year industry for inspection and
assembly Almost entirely based on template matching
Upcoming applications Mobile robots, toys, user interfaces Location recognition Digital camera panoramas, 3D scene modeling
![Page 3: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/3.jpg)
Invariant Local Features
Image content is transformed into local feature coordinates that are invariant to translation, rotation, scale, and other imaging parameters
SIFT Features
![Page 4: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/4.jpg)
Advantages of invariant local features
Locality: features are local, so robust to occlusion and clutter (no prior segmentation)
Distinctiveness: individual features can be matched to a large database of objects
Quantity: many features can be generated for even small objects
Efficiency: close to real-time performance
Extensibility: can easily be extended to wide range of differing feature types, with each adding robustness
![Page 5: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/5.jpg)
Zhang, Deriche, Faugeras, Luong (95) Apply Harris corner detector Match points by correlating only at corner points Derive epipolar alignment using robust least-squares
![Page 6: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/6.jpg)
Cordelia Schmid & Roger Mohr (97) Apply Harris corner detector Use rotational invariants at
corner points However, not scale invariant.
Sensitive to viewpoint and illumination change.
![Page 7: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/7.jpg)
Scale invariance
Requires a method to repeatably select points in location and scale:
The only reasonable scale-space kernel is a Gaussian (Koenderink, 1984; Lindeberg, 1994)
An efficient choice is to detect peaks in the difference of Gaussian pyramid (Burt & Adelson, 1983; Crowley & Parker, 1984 – but examining more scales)
Difference-of-Gaussian with constant ratio of scales is a close approximation to Lindeberg’s scale-normalized Laplacian (can be shown from the heat diffusion equation)
B l u r
R e s a m p l e
S u b t r a c t
B l u r
R e s a m p l e
S u b t r a c t
![Page 8: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/8.jpg)
Scale space processed one octave at a time
![Page 9: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/9.jpg)
Key point localization
Detect maxima and minima of difference-of-Gaussian in scale space
Fit a quadratic to surrounding values for sub-pixel and sub-scale interpolation (Brown & Lowe, 2002)
Taylor expansion around point:
Offset of extremum (use finite differences for derivatives):
B l u r
R e s a m p l e
S u b t r a c t
![Page 10: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/10.jpg)
Sampling frequency for scaleMore points are found as sampling frequency increases, but accuracy of matching decreases after 3 scales/octave
![Page 11: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/11.jpg)
Eliminating unstable keypoints
Discard points with DOG value below threshold (low contrast) However, points along edges may have high contrast in one direction
but low in another Compute principal curvatures from eigenvalues of 2x2 Hessian
matrix, and limit ratio (Harris approach):
![Page 12: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/12.jpg)
Select canonical orientation
Create histogram of local gradient directions computed at selected scale
Assign canonical orientation at peak of smoothed histogram
Each key specifies stable 2D coordinates (x, y, scale, orientation)
0 2
![Page 13: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/13.jpg)
Example of keypoint detectionThreshold on value at DOG peak and on ratio of principle curvatures (Harris approach)
(a) 233x189 image(b) 832 DOG extrema(c) 729 left after peak value threshold(d) 536 left after testing ratio of principle curvatures
![Page 14: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/14.jpg)
Creating features stable to viewpoint change
Edelman, Intrator & Poggio (97) showed that complex cell outputs are better for 3D recognition than simple correlation
![Page 15: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/15.jpg)
Stability to viewpoint change
Classification of rotated 3D models (Edelman 97): Complex cells: 94% vs simple cells: 35%
![Page 16: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/16.jpg)
SIFT vector formation Thresholded image gradients are sampled over 16x16
array of locations in scale space Create array of orientation histograms 8 orientations x 4x4 histogram array = 128 dimensions
![Page 17: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/17.jpg)
Feature stability to noise Match features after random change in image scale &
orientation, with differing levels of image noise Find nearest neighbor in database of 30,000 features
![Page 18: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/18.jpg)
Feature stability to affine change Match features after random change in image scale &
orientation, with 2% image noise, and affine distortion Find nearest neighbor in database of 30,000 features
![Page 19: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/19.jpg)
Distinctiveness of features Vary size of database of features, with 30 degree affine
change, 2% image noise Measure % correct for single nearest neighbor match
![Page 20: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/20.jpg)
Nearest-neighbor matching to feature database
Hypotheses are generated by matching each feature to nearest neighbor vectors in database
No fast method exists for always finding 128-element vector to nearest neighbor in a large database
Therefore, use approximate nearest neighbor: We use best-bin-first (Beis & Lowe, 97)
modification to k-d tree algorithm Use heap data structure to identify bins in order
by their distance from query point Result: Can give speedup by factor of 1000 while
finding nearest neighbor (of interest) 95% of the time
![Page 21: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/21.jpg)
Detecting 0.1% inliers among 99.9% outliers
Need to recognize clusters of just 3 consistent features among 3000 feature match hypotheses
LMS or RANSAC would be hopeless!
Generalized Hough transform Vote for each potential match according to model
ID and pose Insert into multiple bins to allow for error in
similarity approximation Using a hash table instead of an array avoids
need to form empty bins or predict array size
![Page 22: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/22.jpg)
Probability of correct match Compare distance of nearest neighbor to second nearest
neighbor (from different object) Threshold of 0.8 provides excellent separation
![Page 23: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/23.jpg)
Model verification
1) Examine all clusters in Hough transform with at least 3 features
2) Perform least-squares affine fit to model.
3) Discard outliers and perform top-down check for additional features.
4) Evaluate probability that match is correcta) Use Bayesian model, with probability that features
would arise by chance if object was not present
b) Takes account of object size in image, textured regions, model feature count in database, accuracy of fit (Lowe, CVPR 01)
![Page 24: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/24.jpg)
Solution for affine parameters
Affine transform of [x,y] to [u,v]:
Rewrite to solve for transform parameters:
![Page 25: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/25.jpg)
Planar texture models
Models for planar surfaces with SIFT keys
![Page 26: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/26.jpg)
Planar recognition
Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
Affine fit approximates perspective projection
Only 3 points are needed for recognition
![Page 27: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/27.jpg)
3D Object Recognition Extract outlines
with background subtraction
![Page 28: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/28.jpg)
3D Object Recognition
Only 3 keys are needed for recognition, so extra keys provide robustness
Affine model is no longer as accurate
![Page 29: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/29.jpg)
Recognition under occlusion
![Page 30: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/30.jpg)
Test of illumination invariance
Same image under differing illumination
273 keys verified in final match
![Page 31: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/31.jpg)
Examples of view interpolation
![Page 32: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/32.jpg)
Recognition using View Interpolation
![Page 33: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/33.jpg)
Location recognition
![Page 34: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/34.jpg)
Robot Localization Joint work with Stephen Se, Jim Little
![Page 35: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/35.jpg)
Map continuously built over time
![Page 36: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/36.jpg)
Locations of map features in 3D
![Page 37: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/37.jpg)
Recognizing PanoramasMatthew Brown and David Lowe (ICCV 2003) Recognize overlap from an unordered set of images and
automatically stitch together SIFT features provide initial feature matching Image blending at multiple scales hides the seams
Panorama of our lab automatically assembled from 143 images
![Page 38: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/38.jpg)
Why Panoramas?
Are you getting the whole picture? Compact Camera FOV = 50 x 35°
![Page 39: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/39.jpg)
Why Panoramas?
Are you getting the whole picture? Compact Camera FOV = 50 x 35° Human FOV = 200 x 135°
![Page 40: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/40.jpg)
Why Panoramas?
Are you getting the whole picture? Compact Camera FOV = 50 x 35° Human FOV = 200 x 135° Panoramic Mosaic = 360 x 180°
![Page 41: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/41.jpg)
RANSAC for Homography
![Page 42: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/42.jpg)
RANSAC for Homography
![Page 43: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/43.jpg)
RANSAC for Homography
![Page 44: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/44.jpg)
Finding the panoramas
![Page 45: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/45.jpg)
Finding the panoramas
![Page 46: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/46.jpg)
Finding the panoramas
![Page 47: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/47.jpg)
Bundle Adjustment
New images initialised with rotation, focal length of best matching image
![Page 48: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/48.jpg)
Bundle Adjustment
New images initialised with rotation, focal length of best matching image
![Page 49: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/49.jpg)
Multi-band Blending
Burt & Adelson 1983 Blend frequency bands over range
![Page 50: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/50.jpg)
Low frequency ( > 2 pixels)
High frequency ( < 2 pixels)
2-band Blending
![Page 51: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/51.jpg)
Linear Blending
![Page 52: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/52.jpg)
2-band Blending
![Page 53: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/53.jpg)
Sony Aibo
SIFT usage:
Recognize charging station
Communicate with visual cards
Teach object recognition
![Page 54: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/54.jpg)
Comparison to template matching
Cost of template matching 250,000 locations x 30 orientations x 8 scales =
60,000,000 evaluations Does not easily handle partial occlusion and other
variation without large increase in template numbers Viola & Jones cascade must start again for each
qualitatively different template
Cost of local feature approach 3000 evaluations (reduction by factor of 20,000) Features are more invariant to illumination, 3D rotation,
and object variation Use of many small subtemplates increases robustness
to partial occlusion and other variations
![Page 55: NIPS 2003 Tutorial Real-time Object Recognition using Invariant Local Image Features David Lowe Computer Science Department University of British Columbia.](https://reader035.fdocuments.us/reader035/viewer/2022081516/56649d625503460f94a443f5/html5/thumbnails/55.jpg)
Future directions Build true 3D models
Integrate features from large number of training views and perform continuous learning
Feature classes can be greatly expanded Affine-invariant features (Tuytelaars & Van Gool,
Mikolajczyk & Schmid, Schaffalitzky & Zisserman, Brown & Lowe)
Incorporate color, texture, varying feature sizes Include edge features that separate figure from ground
Address instance recognition of generic models Map feature probabilities to measurements of interest
(e.g., specific person, expression, age)