CS598:Visual information Retrieval
description
Transcript of CS598:Visual information Retrieval
![Page 1: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/1.jpg)
CS598:VISUAL INFORMATION RETRIEVALLecture III: Image Representation: Invariant Local Image Descriptors
![Page 2: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/2.jpg)
RECAP OF LECTURE II Color, texture, descriptors
Color histogram Color correlogram LBP descriptors Histogram of oriented gradient Spatial pyramid matching
Distance & Similarity measure Lp distances Chi-Square distances KL distances EMD distances Histogram intersection
![Page 3: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/3.jpg)
LECTURE III: PART ILocal Feature Detector
![Page 4: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/4.jpg)
OUTLINE Blob detection
Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region
![Page 5: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/5.jpg)
GAUSSIAN KERNEL
• Constant factor at front makes volume sum to 1 (can be ignored when computing the filter values, as we should renormalize weights to sum to 1 in any case)
0.003 0.013 0.022 0.013 0.0030.013 0.059 0.097 0.059 0.0130.022 0.097 0.159 0.097 0.0220.013 0.059 0.097 0.059 0.0130.003 0.013 0.022 0.013 0.003
5 x 5, = 1
Source: C. Rasmussen
![Page 6: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/6.jpg)
GAUSSIAN KERNEL
• Standard deviation : determines extent of smoothingSource: K. Grauman
σ = 2 with 30 x 30 kernel
σ = 5 with 30 x 30 kernel
![Page 7: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/7.jpg)
CHOOSING KERNEL WIDTH• The Gaussian function has infinite support,
but discrete filters use finite kernels
Source: K. Grauman
![Page 8: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/8.jpg)
CHOOSING KERNEL WIDTH• Rule of thumb: set filter half-width to about 3σ
![Page 9: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/9.jpg)
GAUSSIAN VS. BOX FILTERING
![Page 10: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/10.jpg)
GAUSSIAN FILTERS• Remove “high-frequency” components from
the image (low-pass filter)• Convolution with self is another Gaussian
So can smooth with small- kernel, repeat, and get same result as larger- kernel would have
Convolving two times with Gaussian kernel with std. dev. σ is same as convolving once with kernel with std. dev.
• Separable kernel Factors into product of two 1D Gaussians
Source: K. Grauman
2
![Page 11: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/11.jpg)
SEPARABILITY OF THE GAUSSIAN FILTER
Source: D. Lowe
![Page 12: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/12.jpg)
SEPARABILITY EXAMPLE
*
*
=
=
2D convolution(center location only)
Source: K. Grauman
The filter factorsinto a product of 1D
filters:
Perform convolutionalong rows:
Followed by convolutionalong the remaining column:
![Page 13: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/13.jpg)
WHY IS SEPARABILITY USEFUL?• What is the complexity of filtering an n×n
image with an m×m kernel? O(n2 m2)
• What if the kernel is separable?O(n2 m)
![Page 14: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/14.jpg)
OUTLINE Blob detection
Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region
![Page 15: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/15.jpg)
BLOB DETECTION IN 2D Laplacian of Gaussian: Circularly symmetric
operator for blob detection in 2D
2
2
2
22
yg
xgg
![Page 16: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/16.jpg)
BLOB DETECTION IN 2D Laplacian of Gaussian: Circularly symmetric
operator for blob detection in 2D
2
2
2
222
norm yg
xgg Scale-normalized:
![Page 17: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/17.jpg)
• At what scale does the Laplacian achieve a maximum response to a binary circle of radius r?
SCALE SELECTION
r
image Laplacian
![Page 18: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/18.jpg)
• At what scale does the Laplacian achieve a maximum response to a binary circle of radius r?
• To get maximum response, the zeros of the Laplacian have to be aligned with the circle
• The Laplacian is given by:
• Therefore, the maximum response occurs at
SCALE SELECTION
r
image
62/)(222 2/)2(222
yxeyx .2/r
circle
Laplacian
0
![Page 19: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/19.jpg)
CHARACTERISTIC SCALE• We define the characteristic scale of a blob
as the scale that produces peak of Laplacian response in the blob center
characteristic scale
T. Lindeberg (1998). "Feature detection with automatic scale selection." International Journal of Computer Vision 30 (2): pp 77--116.
![Page 20: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/20.jpg)
SCALE-SPACE BLOB DETECTOR1. Convolve image with scale-normalized
Laplacian at several scales
![Page 21: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/21.jpg)
SCALE-SPACE BLOB DETECTOR: EXAMPLE
![Page 22: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/22.jpg)
SCALE-SPACE BLOB DETECTOR: EXAMPLE
![Page 23: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/23.jpg)
SCALE-SPACE BLOB DETECTOR1. Convolve image with scale-normalized
Laplacian at several scales2. Find maxima of squared Laplacian response
in scale-space
![Page 24: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/24.jpg)
SCALE-SPACE BLOB DETECTOR: EXAMPLE
![Page 25: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/25.jpg)
OUTLINE Blob detection
Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region
![Page 26: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/26.jpg)
Approximating the Laplacian with a difference of Gaussians:
2 ( , , ) ( , , )xx yyL G x y G x y
( , , ) ( , , )DoG G x y k G x y
(Laplacian)
(Difference of Gaussians)
EFFICIENT IMPLEMENTATION
![Page 27: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/27.jpg)
EFFICIENT IMPLEMENTATION
David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004.
![Page 28: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/28.jpg)
INVARIANCE AND COVARIANCE PROPERTIES• Laplacian (blob) response is invariant w.r.t.
rotation and scaling• Blob location and scale is covariant w.r.t.
rotation and scaling• What about intensity change?
![Page 29: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/29.jpg)
OUTLINE Blob detection
Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region
![Page 30: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/30.jpg)
ACHIEVING AFFINE COVARIANCE• Affine transformation approximates viewpoint
changes for roughly planar objects and roughly orthographic cameras
![Page 31: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/31.jpg)
ACHIEVING AFFINE COVARIANCE
RRIIIIII
yxwMyyx
yxx
yx
2
112
2
, 00
),(
direction of the slowest
change
direction of the fastest
change
(max)-1/2
(min)-1/2
Consider the second moment matrix of the window containing the blob:
const][
vu
Mvu
Recall:
This ellipse visualizes the “characteristic shape” of the window
![Page 32: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/32.jpg)
AFFINE ADAPTATION EXAMPLE
Scale-invariant regions (blobs)
![Page 33: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/33.jpg)
AFFINE ADAPTATION EXAMPLE
Affine-adapted blobs
![Page 34: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/34.jpg)
FROM COVARIANT DETECTION TO INVARIANT DESCRIPTION• Geometrically transformed versions of the same neighborhood
will give rise to regions that are related by the same transformation
• What to do if we want to compare the appearance of these image regions?
Normalization: transform these regions into same-size circles
![Page 35: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/35.jpg)
• Problem: There is no unique transformation from an ellipse to a unit circle
We can rotate or flip a unit circle, and it still stays a unit circle
AFFINE NORMALIZATION
![Page 36: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/36.jpg)
• To assign a unique orientation to circular image windows:
Create histogram of local gradient directions in the patch
Assign canonical orientation at peak of smoothed histogram
ELIMINATING ROTATION AMBIGUITY
0 2
![Page 37: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/37.jpg)
FROM COVARIANT REGIONS TO INVARIANT FEATURES
Extract affine regions Normalize regionsEliminate rotational
ambiguityCompute appearance
descriptors
SIFT (Lowe ’04)
![Page 38: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/38.jpg)
INVARIANCE VS. COVARIANCE Invariance:
features(transform(image)) = features(image) Covariance:
features(transform(image)) =transform(features(image))
Covariant detection => invariant description
![Page 39: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/39.jpg)
LECTURE III: PART ILearning Local Feature Descriptor
David G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, Vol. 60, No. 2, pp. 91-110• 6291 as of 02/28/2010; 22481 as of 02/03/2014• Our goal is to design the best local image descriptors
in the world.
![Page 40: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/40.jpg)
Panoramic stitching [Brown et al. CVPR05]
3D reconstruction [Snavely et al. SIGGRAPH06]
Image databases [Mikolajczyk & Schmid ICCV01]
Object or location recognition [Nister & Stewenius CVPR06] [Schindler et al. CVPR07]
Robot navigation [Deans et al. AC05]
Courtesy of Seitz & Szeliski
Real-worldFace recognition [Wright & Hua CVPR09][Hua & Akbarzadeh ICCV09]
![Page 41: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/41.jpg)
TYPICAL MATCHING PROCESS
Image 1 Image 2
Image 1 Image 2
Interest point/region detection (sparse)
Dense sampling of image patches
![Page 42: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/42.jpg)
TYPICAL MATCHING PROCESSImage 1 Image 2
P
Q
Descriptor space
![Page 43: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/43.jpg)
PROBLEM TO SOLVE
To obtain the most discriminative, compact, and computationally efficient local image descriptors. How can we get ground truth data? What is the form of the descriptor function f(.)? What is the measure for optimality?
Learning a function of a local image patchdescriptor = f ( )
s.t. a nearest neighbor classifier is optimal
![Page 44: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/44.jpg)
HOW CAN WE GET GROUND TRUTH DATA?
![Page 45: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/45.jpg)
TRAINING DATA
[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]
3D PointCloud
Multiview stereo = Training data
![Page 46: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/46.jpg)
TRAINING DATA
[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]
3D PointCloud
![Page 47: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/47.jpg)
TRAINING DATA3D PointCloud
[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]
![Page 48: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/48.jpg)
TRAINING DATA3D PointCloud
[Goesele et al. - ICCV’07] [Snavely et al. - SIGGRAPH’06 ]
![Page 49: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/49.jpg)
Statue of liberty (New York) – LibertyNotre Dame (Paris) – Notre DameHalf Dome (Yosemite) - Yosemite http://www.cs.ubc.ca/~mbrown/patchdata/patchdata.html
Liberty Yosemite
![Page 50: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/50.jpg)
WHAT IS THE FORM OF THE DESCRIPTOR FUNCTION?
![Page 51: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/51.jpg)
DESCRIPTOR ALGORITHMS
Algorithm
Normalizedimage patch
Descriptorvector
Gradientsquantized to
k orientations
NormalizeSummation
n grid region
[ SIFT – Lowe ICCV’99 ]
k × n vector
![Page 52: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/52.jpg)
DESCRIPTOR ALGORITHMS
Algorithm
Normalizedimage patch
Descriptorvector
GradientsQuantized
to k Orientation
s
Normalize(plus PCA)
[ GLOH – Mikolajzcyk & Schmid PAMI’05 ]
Summationn Log-Polar region
![Page 53: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/53.jpg)
DESCRIPTOR ALGORITHMS
Algorithm
Normalizedimage patch
Descriptorvector
CreateEdge Map Normalize
[ Shape Context – Belongie et al, NIPS’00 ]
Summationn Log-Polar region
![Page 54: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/54.jpg)
DESCRIPTOR ALGORITHMS
Algorithm
Normalizedimage patch
Descriptorvector
Grey value quantized into k bins
Normalize
[ Spin Image – Lazebnik et al, CVPR’03 ]
Summationn circular region
![Page 55: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/55.jpg)
DESCRIPTOR ALGORITHMS
Algorithm
Normalizedimage patch
Descriptorvector
Featuredetector Normalize
T S N[ Geometric Blur – Berg& Malik CVPR’01 ][ DAISY - Tola et al. CVPR’08]
Summationn Gaussian region
![Page 56: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/56.jpg)
OUR DESCRIPTOR ALGORITHM
Descriptorvector
T S N E Q
![Page 57: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/57.jpg)
S-BLOCK: PICKING THE BEST DAISY
S-Block: DAISY
Descriptorvector
T S N E Q
1r6s 1r8s 2r6s 2r8s
![Page 58: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/58.jpg)
T-BLOCK
DescriptorVector
T S N E Q T-Block:
• T1: Gradient bi-linearly weighted orientation binning• T2: Rectified gradient {| x|- x , | x|+ x , | y|- y , | y|+
y}• T3: Steerable filters
* <> 0 T3-4th-4
![Page 59: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/59.jpg)
N-E-Q BLOCKS
N-Block: Uniform v.s. SIFT-like normalization E-Block: Principle component analysis (PCA) Q-block: qi = βLvi + α, β is learnt from data, α =
0.5 if L is an odd number and α = 0 otherwise.
DescriptorVector
T S N E Q
![Page 60: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/60.jpg)
WHAT IS THE OPTIMAL CRITERION?
![Page 61: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/61.jpg)
DISCRIMINATIVE LEARNING AND OPTIMAL CRITERION
ST N
Parameterstraining pairs
False positive%Tr
ue p
ositi
ve %
Update Parameters(Powell)
Descriptor distances
• E-Block and Q-Block are optimized independently
95
α (error rate)
100,000 match/non-match
![Page 62: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/62.jpg)
SIFT-LIKE NORMALIZATION & PCA
SIFT like normalization has a clear sweet spot. PCA can usually reduce the dimension to 30~50.
Uniform normalization
![Page 63: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/63.jpg)
THE MOST DISCRIMINATIVE DAISY
Steerable filters at two spatial scales with PCA T3-2nd-6-2r6s + 2-scale + PCA: 42 dimension, 9.5% 2 × 6 × 2 × 2 × (2 × 6+1) = 624 dimensions before PCA
2r6s
![Page 64: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/64.jpg)
THE MOST COMPACT DAISY
T3-2nd-4-2r8s13 bytes, 13.2%
T2-4-1r8s-PCA7.5 bytes, 24.4%
SIFT128 bytes,
26.1%
• 2 bits /dimension is sufficient before PCA• 4 bits /dimension is needed after PCA• http://www.cs.ubc.ca/~mbrown/patchdata/patchdata.html
![Page 65: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/65.jpg)
SOME OF THE ERRORS MADE … …
The world is repeating itself….
![Page 66: CS598:Visual information Retrieval](https://reader035.fdocuments.us/reader035/viewer/2022081520/56816371550346895dd44d2e/html5/thumbnails/66.jpg)
SUMMARY Blob detection
Brief of Gaussian filter Scale selection Lapacian of Gaussian (LoG) detector Difference of Gaussian (DoG) detector Affine co-variant region
Learning local descriptors How can we get ground-truth data? What is the form of the descriptor function? What is the optimal criterion? How do we optimize it?