Visual Recognition: Objects, Actions and Scenes
Transcript of Visual Recognition: Objects, Actions and Scenes
Includes slides from: O. Chum, K. Grauman, S. Lazebnik, B. Leibe, D. Lowe, J. Philbin,
J. Ponce, D. Nister, J. Sivic, N. Snavely and A. Zisserman
Lecture 2:
Local invariant image features
Ivan Laptev
INRIA, WILLOW, ENS/INRIA/CNRS UMR 8548
Laboratoire d’Informatique, Ecole Normale Supérieure, Paris, France
University of Trento
July 7-10, 2014
Trento, Italy
Image matching and recognition with local features
The goal: establish correspondence between two or more
images
Image points x and x′ are in correspondence if they are
projections of the same 3D scene point X.
Images courtesy of A. Zisserman
Example I: Wide baseline matching and 3D reconstruction
Establish correspondence between two (or more) images.
[Schaffalitzky and Zisserman ECCV 2002]
[Agarwal, Snavely, Simon, Seitz, Szeliski, ICCV’09] –
Building Rome in a Day
57,845 downloaded images, 11,868 registered images. This video: 4,619 images.
Example II: Object recognition
[D. Lowe, 1999]
Establish correspondence between the target image and
(multiple) images in the model database.
Target
image
Model
database
Find these landmarks ...in these images and 1M more
Example III: Visual search
Given a query image, find images depicting the same place /
object in a large unordered image collection.
Establish correspondence between the query image and all
images from the database depicting the same object / scene.
Query image
Database image(s)
Bing visual scan
Mobile visual search
and others… Snaptell.com, Millpix.com
Applications
Take a picture of a product or advertisement,
then find relevant information on the web
[Pixee – Milpix]
Applications
Finding stolen/missing objects in a large collection…
Applications
Copy detection for images and videos
Query video: search in 200 hours of video
Slide credit: K. Grauman, B. Leibe
Sony Aibo – Robotics
• Recognize docking station
• Communicate with visual cards
• Place recognition
• Loop closure in SLAM
Slide credit: David Lowe
Applications
Why is it difficult?
Want to establish correspondence despite possibly large changes in
scale, viewpoint, lighting and partial occlusion
… and the image collection can be very large (e.g. 1M images)
How does it work?
Approach:
1. Compute scale / affine co-variant local features
2. Estimate pairwise best matches between local features
3. Enforce geometric constraints between local features
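The second step, estimating pairwise best matches between local features, is often implemented as nearest-neighbour descriptor matching with Lowe's ratio test (keep a match only when the nearest neighbour is clearly closer than the second nearest). A minimal NumPy sketch, assuming descriptors are rows of a float array; the function name and the 0.8 threshold are illustrative, not the lecture's:

```python
import numpy as np

def match_features(desc1, desc2, ratio=0.8):
    """Match each descriptor in desc1 to its nearest neighbour in desc2,
    keeping only matches that pass the ratio test (nearest distance
    clearly smaller than the second-nearest distance)."""
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)   # Euclidean distances
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, best))
    return matches
```

The ratio test discards ambiguous matches, which matters when the database is large and many descriptors look alike.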
Why extract features?
• Motivation: panorama stitching
• We have two images – how do we combine them?
Step 1: extract features
Step 2: match features
Step 3: align images
Slide: S. Lazebnik
Characteristics of good features
• Repeatability: the same feature can be found in several images despite
geometric and photometric transformations
• Saliency: each feature is distinctive
• Compactness and efficiency: many fewer features than image pixels
• Locality: a feature occupies a relatively small area of the image;
robust to clutter and occlusion
Slide: S. Lazebnik
A hard feature matching problem
NASA Mars Rover images
Slide: S. Lazebnik
NASA Mars Rover images
with SIFT feature matches
Figure by Noah Snavely
Answer below (look for tiny colored squares…)
Slide: S. Lazebnik
Corner Detection: Basic Idea
• We should easily recognize the point by looking through a small window
• Shifting a window in any direction should give a large change in intensity
“edge”:
no change
along the edge
direction
“corner”:
significant
change in all
directions
“flat” region:
no change in
all directions
Source: A. Efros
Corner Detection: Mathematics
Change in appearance of window W for the shift [u,v]:
E(u, v) = \sum_{(x,y) \in W} [I(x+u, y+v) - I(x, y)]^2
Slide: S. Lazebnik
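The window error E(u, v) can be evaluated directly from its definition. A small NumPy sketch; the function name and the coordinate convention (x indexes columns, y indexes rows) are assumptions for illustration:

```python
import numpy as np

def window_error(I, W, u, v):
    """E(u, v): sum of squared intensity differences over window pixels W
    for an integer shift (u, v).  I is a 2D image, W a list of (x, y)
    coordinates with x = column, y = row."""
    return sum((I[y + v, x + u] - I[y, x]) ** 2 for (x, y) in W)
```

For a flat window E is zero for every shift; it becomes large in all directions only at a corner.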
Corner Detection: Mathematics
We want to find out how this function behaves for small shifts [u, v]:
E(u, v) = \sum_{(x,y) \in W} [I(x+u, y+v) - I(x, y)]^2
Slide: S. Lazebnik
Corner Detection: Mathematics
• First-order Taylor approximation for small motions [u, v]:
I(x+u, y+v) = I(x, y) + I_x u + I_y v + \text{higher order terms}
            \approx I(x, y) + I_x u + I_y v
            = I(x, y) + [I_x \; I_y] \begin{bmatrix} u \\ v \end{bmatrix}
• Let's plug this into
E(u, v) = \sum_{(x,y) \in W} [I(x+u, y+v) - I(x, y)]^2
Slide: S. Lazebnik
Corner Detection: Mathematics
E(u, v) = \sum_{(x,y) \in W} [I(x+u, y+v) - I(x, y)]^2
\approx \sum_{(x,y) \in W} \left[ I(x, y) + [I_x \; I_y] \begin{bmatrix} u \\ v \end{bmatrix} - I(x, y) \right]^2
= \sum_{(x,y) \in W} \left( [I_x \; I_y] \begin{bmatrix} u \\ v \end{bmatrix} \right)^2
= [u \; v] \left( \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \right) \begin{bmatrix} u \\ v \end{bmatrix}
Slide: S. Lazebnik
Corner Detection: Mathematics
The quadratic approximation simplifies to
E(u, v) \approx [u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix}
where M is a second moment matrix computed from image derivatives:
M = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
Slide: S. Lazebnik
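The matrix M is just a sum of outer products of per-pixel gradients over the window. A minimal NumPy sketch, using central-difference gradients (the function name is illustrative):

```python
import numpy as np

def second_moment_matrix(I, W):
    """Second moment matrix M summed over a window W of (x, y) pixel
    coordinates (x = column, y = row)."""
    Iy, Ix = np.gradient(I.astype(float))   # np.gradient returns d/drow, d/dcol
    M = np.zeros((2, 2))
    for (x, y) in W:
        g = np.array([Ix[y, x], Iy[y, x]])
        M += np.outer(g, g)                  # [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]]
    return M
```

For a vertical edge only the I_x^2 entry is non-zero, which is exactly the degenerate case discussed next.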
Interpreting the second moment matrix
• The surface E(u, v) is locally approximated by a quadratic form.
Let's try to understand its shape.
• Specifically, in which directions does it have the smallest/greatest change?
E(u, v) \approx [u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix}, \quad
M = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
Slide: S. Lazebnik
Interpreting the second moment matrix
First, consider the axis-aligned case
(gradients are either horizontal or vertical):
M = \sum_{(x,y) \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
  = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}
If either a or b is close to 0, then this is not a corner,
so look for locations where both are large.
Slide: S. Lazebnik
Interpreting the second moment matrix
Consider a horizontal "slice" of E(u, v):
[u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}
This is the equation of an ellipse.
Slide: S. Lazebnik
Visualization of second moment matrices
Slide: S. Lazebnik
Visualization of second moment matrices
Slide: S. Lazebnik
Interpreting the second moment matrix
Consider a horizontal "slice" of E(u, v):
[u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}
This is the equation of an ellipse.
Diagonalization of M:
M = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R
The axis lengths of the ellipse are determined by the eigenvalues and
the orientation is determined by R:
(\lambda_{\max})^{-1/2} along the direction of the fastest change,
(\lambda_{\min})^{-1/2} along the direction of the slowest change.
Slide: S. Lazebnik
Interpreting the eigenvalues
Classification of image points using eigenvalues of M:
• "Corner": \lambda_1 and \lambda_2 are large, \lambda_1 \sim \lambda_2;
E increases in all directions
• "Edge": \lambda_1 \gg \lambda_2 (or \lambda_2 \gg \lambda_1)
• "Flat" region: \lambda_1 and \lambda_2 are small;
E is almost constant in all directions
Slide: S. Lazebnik
Corner response function
R = \det(M) - \alpha \, \mathrm{trace}(M)^2 = \lambda_1 \lambda_2 - \alpha (\lambda_1 + \lambda_2)^2
\alpha: constant (0.04 to 0.06)
• "Corner": R > 0
• "Edge": R < 0
• "Flat" region: |R| small
Slide: S. Lazebnik
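The response R avoids an explicit eigendecomposition because det(M) = lambda_1 lambda_2 and trace(M) = lambda_1 + lambda_2. A one-line NumPy sketch (function name and the 0.05 default are illustrative):

```python
import numpy as np

def corner_response(M, alpha=0.05):
    """Harris response R = det(M) - alpha * trace(M)^2, which equals
    lambda1 * lambda2 - alpha * (lambda1 + lambda2)^2."""
    return np.linalg.det(M) - alpha * np.trace(M) ** 2
```

Checking both forms on a symmetric 2x2 matrix confirms they agree.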
Harris detector: Steps
1. Compute Gaussian derivatives at each pixel
2. Compute second moment matrix M in a
Gaussian window around each pixel
3. Compute corner response function R
4. Threshold R
5. Find local maxima of response function
(nonmaximum suppression)
C. Harris and M. Stephens. "A Combined Corner and Edge Detector." Proceedings of the 4th Alvey Vision Conference, pages 147–151, 1988.
Slide: S. Lazebnik
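The five steps above can be sketched end-to-end in NumPy. This is a didactic sketch, not the lecture's implementation; the parameter values (sigma, alpha, threshold) are illustrative defaults:

```python
import numpy as np

def harris(I, sigma=1.0, alpha=0.05, thresh=1e-3):
    """Harris detector sketch: gradients, Gaussian-weighted second moment
    matrix entries per pixel, response R, threshold, and a simple 3x3
    non-maximum suppression.  Returns a list of (row, col) corners."""
    I = I.astype(float)
    Iy, Ix = np.gradient(I)                     # step 1: Gaussian-free derivatives
    r = int(3 * sigma)                          # Gaussian window radius
    t = np.arange(-r, r + 1)
    g = np.exp(-t**2 / (2 * sigma**2))
    g /= g.sum()
    def smooth(A):                              # separable Gaussian filtering
        A = np.apply_along_axis(lambda v: np.convolve(v, g, 'same'), 0, A)
        return np.apply_along_axis(lambda v: np.convolve(v, g, 'same'), 1, A)
    # step 2: entries of M in a Gaussian window around each pixel
    Sxx, Syy, Sxy = smooth(Ix * Ix), smooth(Iy * Iy), smooth(Ix * Iy)
    # step 3: corner response R = det(M) - alpha * trace(M)^2
    R = (Sxx * Syy - Sxy**2) - alpha * (Sxx + Syy)**2
    corners = []                                # steps 4-5: threshold + local maxima
    for y in range(1, I.shape[0] - 1):
        for x in range(1, I.shape[1] - 1):
            if R[y, x] > thresh and R[y, x] == R[y-1:y+2, x-1:x+2].max():
                corners.append((y, x))
    return corners
```

On a synthetic image with a single right-angle corner the detection lands at the corner, while a pure step edge produces no detections (R is negative along edges).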
Harris Detector: Steps
Slide: S. Lazebnik
Harris Detector: Steps
Compute corner response R
Slide: S. Lazebnik
Harris Detector: Steps
Find points with large corner response: R>threshold
Slide: S. Lazebnik
Harris Detector: Steps
Take only the points of local maxima of R
Slide: S. Lazebnik
Harris Detector: Steps
Slide: S. Lazebnik
Invariance and covariance
• We want corner locations to be invariant to photometric
transformations and covariant to geometric transformations
• Invariance: image is transformed and corner locations do not change
• Covariance: if we have two transformed versions of the same image,
features should be detected in corresponding locations
Slide: S. Lazebnik
Affine intensity change
• Only derivatives are used =>
invariance to intensity shift I -> I + b
• Intensity scaling I -> a I scales the response R,
so corners can cross a fixed threshold
Partially invariant to affine intensity change I -> a I + b
Slide: S. Lazebnik
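This partial invariance can be checked numerically: a shift b leaves the derivatives unchanged, while a gain a scales them (and hence the response), which is why a fixed threshold is not preserved. A small NumPy check on arbitrary test data:

```python
import numpy as np

# Derivatives are unaffected by an intensity shift I -> I + b and are
# scaled by a under I -> a*I + b (the image values here are arbitrary).
I = np.random.rand(8, 8)
a, b = 2.0, 5.0
gy1, gx1 = np.gradient(I)
gy2, gx2 = np.gradient(a * I + b)
shift_invariant = np.allclose(np.gradient(I + b), [gy1, gx1])
scaled = np.allclose(gx2, a * gx1) and np.allclose(gy2, a * gy1)
```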
Image translation
• Derivatives and window function are shift-invariant
Corner location is covariant w.r.t. translation
Slide: S. Lazebnik
Image rotation
Second moment ellipse rotates but its shape
(i.e. eigenvalues) remains the same
Corner location is covariant w.r.t. rotation
Slide: S. Lazebnik
Scaling
When the image is scaled up, the same corner viewed with a
fixed-size window looks locally straight: all points will be
classified as edges.
Corner location is not covariant to scaling!
Slide: S. Lazebnik
Blob detection
Slide: S. Lazebnik
Feature detection with scale selection
• We want to extract features with characteristic
scale that is covariant with the image
transformation
Slide: S. Lazebnik
Blob detection: basic idea
• To detect blobs, convolve the image with a
“blob filter” at multiple scales and look for
maxima of filter response in the resulting
scale space
Slide: S. Lazebnik
Blob filter
Laplacian of Gaussian: Circularly symmetric
operator for blob detection in 2D
\nabla^2 g = \frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2}
Slide: S. Lazebnik
Recall: Edge detection
Convolve the signal f with the derivative of a Gaussian:
\frac{d}{dx}(f * g) = f * \frac{dg}{dx}
Source: S. Seitz
Edge
Derivative
of Gaussian
Edge = maximum
of derivative
Edge detection, Take 2
Convolve the signal f with the second derivative of a Gaussian:
\frac{d^2}{dx^2}(f * g) = f * \frac{d^2 g}{dx^2}
Edge
Second derivative
of Gaussian
(Laplacian)
Edge = zero crossing
of second derivative
Source: S. Seitz
From edges to blobs
• Edge = ripple
• Blob = superposition of two ripples
Spatial selection: the magnitude of the Laplacian
response will achieve a maximum at the center of
the blob, provided the scale of the Laplacian is
“matched” to the scale of the blob
maximum
Slide: S. Lazebnik
Scale-space blob detector: Example
Slide: S. Lazebnik
Scale-space blob detector: Example
Slide: S. Lazebnik
Scale-space blob detector
1. Convolve image with scale-normalized
Laplacian at several scales
2. Find maxima of squared Laplacian response
in scale-space
Slide: S. Lazebnik
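These two steps can be sketched with an explicit scale-normalized LoG kernel. To keep the sketch short, the detector is reduced to scale selection at a single known point; the function names and the 4-sigma kernel radius are illustrative choices:

```python
import numpy as np

def log_kernel(sigma):
    """Scale-normalised Laplacian-of-Gaussian kernel sigma^2 * (Gxx + Gyy),
    written out analytically on a grid of radius 4*sigma."""
    r = int(4 * sigma)
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return (x**2 + y**2 - 2 * sigma**2) / sigma**2 * g

def blob_scale(I, center, sigmas):
    """Step 1: correlate the image with the scale-normalised Laplacian at
    several scales; step 2: return the sigma maximising the squared
    response at the given (row, col) point -- its characteristic scale."""
    y0, x0 = center
    best, best_sigma = -1.0, None
    for s in sigmas:
        k = log_kernel(s)
        r = k.shape[0] // 2
        patch = I[y0 - r:y0 + r + 1, x0 - r:x0 + r + 1]
        resp = float((patch * k).sum()) ** 2     # squared response at the point
        if resp > best:
            best, best_sigma = resp, s
    return best_sigma
```

For a Gaussian blob of scale sigma_0, the scale-normalized response peaks at sigma = sigma_0, so the selected scale tracks the blob size.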
Scale-space blob detector: Example
Slide: S. Lazebnik
Efficient implementation
• Approximating the Laplacian with a difference of Gaussians:
\nabla^2 G = G_{xx}(x, y, \sigma) + G_{yy}(x, y, \sigma)   (Laplacian)
DoG = G(x, y, k\sigma) - G(x, y, \sigma)   (Difference of Gaussians)
Slide: S. Lazebnik
Efficient implementation
David G. Lowe. "Distinctive image features from scale-invariant
keypoints.” IJCV 60 (2), pp. 91-110, 2004.
Slide: S. Lazebnik
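The approximation can be verified numerically: since the Gaussian satisfies dG/dsigma = sigma * Laplacian(G), one gets G(x, y, k*sigma) - G(x, y, sigma) ≈ (k - 1) * sigma^2 * Laplacian(G) for k close to 1. A NumPy check (sigma, k and the grid radius are arbitrary illustrative values):

```python
import numpy as np

def gaussian(sigma, r):
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)

def norm_laplacian(sigma, r):
    # sigma^2 * (Gxx + Gyy), written out analytically
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    return (x**2 + y**2 - 2 * sigma**2) / sigma**2 * gaussian(sigma, r)

sigma, k, r = 2.0, 1.05, 12
dog = gaussian(k * sigma, r) - gaussian(sigma, r)
approx = (k - 1) * norm_laplacian(sigma, r)
# relative deviation between the DoG kernel and (k-1)*sigma^2*Laplacian(G)
err = np.abs(dog - approx).max() / np.abs(approx).max()
```

The two kernels agree up to the finite-difference error in (k - 1), which is what makes the DoG pyramid an efficient stand-in for the scale-normalized Laplacian.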
From feature detection to feature description
• Scaled and rotated versions of the same
neighborhood will give rise to blobs that are related
by the same transformation
• What to do if we want to compare the appearance of
these image regions?
• Normalization: transform these regions into same-size circles
• Problem: rotational ambiguity
Slide: S. Lazebnik
Eliminating rotation ambiguity
• To assign a unique orientation to circular image windows:
• Create histogram of local gradient directions in the patch (0 to 2\pi)
• Assign canonical orientation at peak of smoothed histogram
Slide: S. Lazebnik
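A simplified sketch of this orientation assignment, with magnitude-weighted bins but without the histogram smoothing mentioned above (the function name and bin count are illustrative):

```python
import numpy as np

def dominant_orientation(patch, nbins=36):
    """Histogram of gradient orientations over a patch, weighted by
    gradient magnitude; the canonical orientation is the peak bin."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % (2 * np.pi)          # orientations in [0, 2*pi)
    hist, edges = np.histogram(ang, bins=nbins, range=(0, 2 * np.pi), weights=mag)
    peak = np.argmax(hist)
    return (edges[peak] + edges[peak + 1]) / 2      # centre of the peak bin
```

Rotating the patch rotates the histogram, so describing the patch relative to this peak removes the rotational ambiguity.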
SIFT features
• Detected features with characteristic scales
and orientations:
David G. Lowe. "Distinctive image features from scale-invariant
keypoints.” IJCV 60 (2), pp. 91-110, 2004. Slide: S. Lazebnik
SIFT descriptors
David G. Lowe. "Distinctive image features from scale-invariant
keypoints.” IJCV 60 (2), pp. 91-110, 2004. Slide: S. Lazebnik
Invariance vs. covariance
Invariance:
• features(transform(image)) = features(image)
Covariance:
• features(transform(image)) = transform(features(image))
Covariant detection => invariant description
Slide: S. Lazebnik
Affine adaptation
• Affine transformation approximates viewpoint
changes for roughly planar objects and
roughly orthographic cameras
Slide: S. Lazebnik
Affine adaptation
Consider the second moment matrix of the window containing the blob:
M = \sum_{(x,y)} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
  = R^{-1} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} R
Recall:
[u \; v] \, M \begin{bmatrix} u \\ v \end{bmatrix} = \text{const}
This ellipse visualizes the "characteristic shape" of the window, with axis
lengths (\lambda_{\max})^{-1/2} (direction of the fastest change) and
(\lambda_{\min})^{-1/2} (direction of the slowest change).
Slide: S. Lazebnik
Affine adaptation example
Scale-invariant regions (blobs)
Slide: S. Lazebnik
Affine adaptation example
Affine-adapted blobs
Slide: S. Lazebnik
Software
VLFeat: Vision Library Features http://www.vlfeat.org/
(will be used in this course)
– Local image features (Harris, SIFT, MSER, …)
– Local image descriptors (SIFT, LBP, …)
– Feature encoding (VLAD, Fisher)
– Machine learning tools (k-means, GMM, SVM)
– Matlab and C interfaces
References
– Schmid, C. and Mohr, R. (1997). Local grayvalue invariants for image
retrieval, IEEE Transactions on Pattern Analysis and Machine Intelligence
19(5): 530–535.
– Lindeberg, T. (1998). Feature detection with automatic scale selection,
International Journal of Computer Vision 30(2): 77–116.
– Lowe, D. (2004). Distinctive image features from scale-invariant keypoints,
International Journal of Computer Vision 60(2): 91–110.
– Mikolajczyk, K. and Schmid, C. (2002). An affine invariant interest point
detector, Proc. Seventh European Conference on Computer Vision, Vol.
2350 of Lecture Notes in Computer Science, Springer Verlag, Berlin,
Copenhagen, Denmark, pp. I:128–142.
– Mikolajczyk, K. and Schmid, C. (2003). A performance evaluation of local
descriptors, Proc. Computer Vision and Pattern Recognition, pp. II: 257–
263.
– Sivic, J. and Zisserman, A. (2003). Video google: A text retrieval approach to
object matching in videos, Proc. Ninth International Conference on
Computer Vision, Nice, France, pp. 1470–1477.
– VLFeat (Vision Library Features) http://www.vlfeat.org/