Paper Overviews

3 types of descriptors:

SIFT / PCA-SIFT (Ke, Sukthankar)

GLOH (Mikolajczyk, Schmid)

DAISY (Tola et al.; Winder et al.)

Comparison of descriptors (Mikolajczyk, Schmid)

Paper Overviews

PCA-SIFT: SIFT-based but with a smaller descriptor

GLOH: modifies the SIFT descriptor for robustness and distinctiveness

DAISY: novel descriptor that uses graph cuts for matching and depth map estimation

SIFT

• “Scale Invariant Feature Transform”
• 4 stages:
1. Peak selection
2. Keypoint localization
3. Keypoint orientation
4. Descriptors

SIFT
• 1. Peak Selection
• Make Gaussian pyramid

http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html


SIFT
• 1. Peak Selection
• Find local peaks using difference of Gaussians
– Peaks are found at different scales




SIFT
• 2. Keypoint Localization
– Remove peaks that are “unstable”:
» Peaks in low-contrast areas
» Peaks along edges
» Features not distinguishable


SIFT
• 3. Keypoint Orientation
• Make histogram of gradients for a patch of pixels
• Orient all patches so the dominant gradient direction is vertical
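The orientation step can be sketched roughly as follows. This is a minimal illustration, not the SIFT reference implementation: the function name `dominant_orientation`, the 36-bin histogram, and the plain central-difference gradients are assumptions made for the example, and the actual rotation of the patch is left out.

```python
import math

def dominant_orientation(patch, num_bins=36):
    """Return the center angle (degrees) of the strongest orientation bin.

    patch is a list of rows of gray values. num_bins=36 is an assumed
    choice for this sketch, not a value taken from the slides.
    """
    hist = [0.0] * num_bins
    bin_width = 360.0 / num_bins
    rows, cols = len(patch), len(patch[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]  # horizontal difference
            dy = patch[y + 1][x] - patch[y - 1][x]  # vertical difference
            magnitude = math.hypot(dx, dy)
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[int(angle // bin_width) % num_bins] += magnitude
    best = max(range(num_bins), key=lambda b: hist[b])
    return (best + 0.5) * bin_width

# Intensity grows from left to right, so every gradient points along +x.
ramp = [[float(x) for x in range(8)] for _ in range(8)]
angle = dominant_orientation(ramp)
print(angle)  # 5.0, the center of the first 10-degree bin
```

A patch would then be rotated by the difference between this angle and the chosen reference direction, so that all patches share the same dominant direction.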

http://www.inf.fu-berlin.de/lehre/SS09/CV/uebungen/uebung09/SIFT.pdf


SIFT
• 4. Descriptors
• Ideal descriptor:
– Compact
– Distinctive from other descriptors
– Robust against lighting / viewpoint changes

SIFT
• 4. Descriptors
• A SIFT descriptor is a 128-element vector:
– 4x4 array of 8-bin histograms (4 x 4 x 8 = 128)
– Each histogram is a smoothed representation of gradient orientations of the patch
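To make the 4 x 4 x 8 = 128 layout concrete, here is a simplified sketch. It assumes a 16x16 patch and hard bin assignment; the Gaussian weighting and interpolation that make the real histograms "smoothed" are deliberately omitted, and the name `sift_like_descriptor` is hypothetical.

```python
import math

def sift_like_descriptor(patch, grid=4, bins=8):
    """Flatten a grid x grid array of orientation histograms into one vector.

    Simplified on purpose: no Gaussian weighting, no interpolation between
    neighboring bins or cells.
    """
    size = len(patch)              # assumed square and divisible by grid
    cell = size // grid
    hists = [[0.0] * bins for _ in range(grid * grid)]
    for y in range(size):
        for x in range(size):
            # Central differences, clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            angle = math.atan2(dy, dx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * bins) % bins
            hists[(y // cell) * grid + (x // cell)][b] += math.hypot(dx, dy)
    vec = [v for h in hists for v in h]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length for some lighting robustness

ramp = [[float(x) for x in range(16)] for _ in range(16)]
desc = sift_like_descriptor(ramp)
print(len(desc))  # 128
```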

PCA-SIFT
• Changes step 4 of the SIFT process to create different descriptors
• Rationale:
– Construction of SIFT descriptors is complicated
– Reason for constructing them that way is unclear
– Is there a simpler alternative?

PCA-SIFT
• “Principal Component Analysis” (PCA)
• A widely-used method of dimensionality reduction
• Used with SIFT to make a smaller feature descriptor
– By projecting the gradient patch into a smaller space

PCA-SIFT
– Creating a descriptor for keypoints:
1. Create patch eigenspace
2. Create projection matrix
3. Create feature vector

PCA-SIFT
– 1. Create patch eigenspace
– For each keypoint:
• Take a 41x41 patch around the keypoint
• Compute horizontal / vertical gradients
– Put all gradient vectors for all keypoints into a matrix

PCA-SIFT
– 1. Create patch eigenspace
– M = matrix of gradients for all keypoints
– Calculate covariance of M
– Calculate eigenvectors of covariance(M)

PCA-SIFT
– 2. Create projection matrix
– Choose first n eigenvectors
– This paper uses n = 20
– This is the projection matrix
– Store for later use, no need to re-compute

PCA-SIFT
– 3. Create feature vector
– For a single keypoint:
• Take its gradient vector, project it with the projection matrix
• Feature vector is of size n
– This is called Grad PCA in the paper
– “Img PCA”: use image patch instead of gradient
– Size difference: 128 elements (SIFT) vs. n = 20
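The three steps (eigenspace, projection matrix, feature vector) can be sketched on toy data. Everything here is an assumption for illustration: the gradient vectors have 4 elements instead of the full patch gradient, n = 2 instead of 20, the mean is subtracted before projecting, and the eigenvectors come from plain power iteration rather than a linear algebra library.

```python
import math
import random

def covariance(rows):
    """Return (mean, covariance matrix) of the row vectors."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mean, cov

def top_eigenvectors(cov, n, iters=200, seed=0):
    """First n eigenvectors of a symmetric matrix (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(cov)
    cov = [row[:] for row in cov]  # work on a copy
    vectors = []
    for _ in range(n):
        v = [rng.random() for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        for i in range(d):         # deflate: remove the component just found
            for j in range(d):
                cov[i][j] -= lam * v[i] * v[j]
        vectors.append(v)
    return vectors

def project(vec, mean, basis):
    """Feature vector: coordinates of the centered vector in the eigenbasis."""
    return [sum(b[j] * (vec[j] - mean[j]) for j in range(len(vec))) for b in basis]

# Toy "gradient vectors", one row per keypoint.
gradient_vectors = [[1.0, 2.0, 0.5, 0.0], [2.0, 4.0, -0.5, 0.0],
                    [3.0, 6.0, 0.5, 0.0], [4.0, 8.0, -0.5, 0.0],
                    [5.0, 10.0, 0.5, 0.0], [6.0, 12.0, -0.5, 0.0]]
mean, cov = covariance(gradient_vectors)   # step 1: eigenspace
basis = top_eigenvectors(cov, n=2)         # step 2: projection matrix (stored once)
feature = project(gradient_vectors[0], mean, basis)  # step 3: feature vector
print(len(feature))  # 2
```

The projection matrix (`basis`) is computed once from training keypoints and reused, which is why only the cheap `project` call is needed per keypoint afterwards.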

PCA-SIFT
– Results
– Tested SIFT vs. “Grad PCA” and “Img PCA” on a series of image variations:
• Gaussian noise
• 45° rotation followed by 50% scaling
• 50% intensity scaling
• Projective warp

PCA-SIFT
– Results (Precision-recall curves)
– Grad PCA (black) generally outperforms Img PCA (pink) and SIFT (purple) except when brightness is reduced
– Both PCA methods outperform SIFT with illumination changes

PCA-SIFT
– Results
– PCA-SIFT also gets more matches correct on images taken at different viewpoints

A Performance Evaluation of Local Descriptors

Krystian Mikolajczyk and Cordelia Schmid

Problem Setting for Comparison
• Matching Problem
From a slide of David G. Lowe (IJCV 2004)
As we did in Project 2: Panorama, we want to find correct pairs of points in two images.

Overview of Compared Methods
• Region Detector: detects interest points
• Region Descriptor: describes the points
• Matching Strategy: how to find pairs of points in two images?

Region Detector
• Harris Points
• Blob Structure Detector
1. Harris-Laplace Regions (similar to DoG)
2. Hessian-Laplace Regions
3. Harris-Affine Regions
4. Hessian-Affine Regions
• Edge Detector
– Canny Detector

Region Descriptors

Descriptor               Dimension  Category / notes
SIFT                     128        SIFT-based descriptor
PCA-SIFT                 36         SIFT-based descriptor
GLOH                     128        SIFT-based descriptor
Shape Context            36         Similar to SIFT, but focuses on edge locations found with the Canny detector
Spin                     50         A sparse set of affine-invariant local patches is used
Steerable Filter         14         Differential descriptor: focuses on the properties of local derivatives (local jet)
Differential Invariants  14         Differential descriptor: focuses on the properties of local derivatives (local jet)
Complex Filters          1681       Consists of many filters
Gradient Moments         20         Moment-based descriptor
Cross Correlation        81         Uniformly sampled locations

Distance measures: Euclidean (listed alongside the SIFT-based descriptors) and Mahalanobis (listed alongside the differential descriptors).

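Matching Strategy
• Threshold-Based Matching: $\|D_A - D_B\| < \text{threshold}$
• Nearest Neighbor Matching – Threshold: $\|D_A - D_B\| < \text{threshold}$, where $D_B$ is the first neighbor of $D_A$
• Nearest Neighbor Matching – Distance Ratio: $\|D_A - D_B\| / \|D_A - D_C\| < \text{threshold}$, where $D_B$ is the first neighbor and $D_C$ is the second neighbor of $D_A$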
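The three matching strategies differ only in which candidates they accept. A small sketch, assuming Euclidean distance and hypothetical function names (the comparison paper uses Mahalanobis distance for some descriptors):

```python
import math

def dist(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_matches(desc_a, candidates, t):
    """Threshold-based: every candidate closer than t is a match."""
    return [i for i, d in enumerate(candidates) if dist(desc_a, d) < t]

def nn_threshold_match(desc_a, candidates, t):
    """Only the first neighbor D_B may match, and only if ||D_A - D_B|| < t."""
    best = min(range(len(candidates)), key=lambda i: dist(desc_a, candidates[i]))
    return best if dist(desc_a, candidates[best]) < t else None

def nn_ratio_match(desc_a, candidates, t):
    """Match D_B if ||D_A - D_B|| / ||D_A - D_C|| < t (D_C = second neighbor)."""
    order = sorted(range(len(candidates)), key=lambda i: dist(desc_a, candidates[i]))
    first, second = order[0], order[1]
    d_second = dist(desc_a, candidates[second])
    if d_second == 0:
        return None  # ratio undefined when both neighbors coincide with D_A
    return first if dist(desc_a, candidates[first]) / d_second < t else None

desc_a = [0.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]  # distances 1, 3, about 7.07
print(threshold_matches(desc_a, candidates, 4.0))   # [0, 1]
print(nn_threshold_match(desc_a, candidates, 4.0))  # 0
print(nn_ratio_match(desc_a, candidates, 0.5))      # 0 (ratio 1/3 < 0.5)
```

Note that simple thresholding can return several matches for one descriptor, while both nearest-neighbor variants return at most one.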

Performance Measurements
• Repeatability rate, ROC
• Recall-Precision

Recall = (# of correct matches) / (total # of correct matches) = TP (true positives) / actual positives

Precision = (# of correct matches) / (# of correct matches + # of false matches) = TP (true positives) / predicted positives

Example of Recall-Precision
Let's say that our method detected:
* 50 corresponding pairs were extracted (A, predicted positives)
* 40 of the detected pairs were correct (C, true positives)
* As ground truth, there are 200 correct pairs (B, actual positives)
Then:
Recall = C/B = 40/200 = 20%
Precision = C/A = 40/50 = 80%

The perfect descriptor gives 100% recall for any value of precision.

Diagram: A = predicted positives, B = actual positives, C = overlap of A and B.
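The same arithmetic as a short check, with hypothetical parameter names standing in for A (extracted pairs), C (correct pairs) and B (ground-truth pairs):

```python
def recall_precision(num_extracted, num_correct, num_ground_truth):
    """Recall = C / B and precision = C / A for the counts described above."""
    recall = num_correct / num_ground_truth
    precision = num_correct / num_extracted
    return recall, precision

recall, precision = recall_precision(num_extracted=50, num_correct=40,
                                     num_ground_truth=200)
print(f"recall = {recall:.0%}, precision = {precision:.0%}")
# recall = 20%, precision = 80%
```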

DataSet
• 6 different image transformations:
– Rotation
– Image blur
– Zoom + rotation
– Viewpoint change
– Light change
– JPEG compression

Matching Strategies (Hessian-Affine Regions)
Figure panels: Nearest Neighbor Matching – Threshold; Nearest Neighbor Matching – Distance Ratio; Threshold-Based Matching

View Point Change (figure panels: with Hessian-Affine regions; with Harris-Affine regions)

Scale Change with Rotation (figure panels: Hessian-Laplace regions; Harris-Laplace regions)

Image Rotation of 30 to 45 degrees (Harris points)

Image Blur (Hessian-Affine regions)

JPEG Compression

Illumination Changes


Ranking of Descriptors (from high performance to low performance)
1. SIFT-based descriptors, 128 dimensions: GLOH, SIFT
2. Shape Context, 36 dimensions
3. PCA-SIFT, 36 dimensions
4. Gradient moments (20 dimensions) & steerable filters (14 dimensions)
5. Other descriptors

Note: This ranking is for the matching problem. It is not a measure of general performance.

Ranking of Difficult Image Transformation (from easy to difficult)
1. Scale & rotation & illumination
2. JPEG compression
3. Image blur
4. Viewpoint change

Scene type (from easy to difficult)
1. Structured scene
2. Textured scene

Two Textured Scenes

Other Results
• Hessian regions are better than Harris regions
• Nearest neighbor based matching is better than simple threshold based matching
• SIFT becomes better when the nearest neighbor distance ratio is used
• Robust region descriptors perform better than point-wise descriptors
• Image rotation does not have a big impact on the accuracy of descriptors

A Fast Local Descriptor for Dense Matching
Engin Tola, Vincent Lepetit, Pascal Fua
Ecole Polytechnique Federale de Lausanne, Switzerland

Paper novelty

• Introduces DAISY local image descriptor
– much faster to compute than SIFT for dense point matching
– performs on par with or better than SIFT
• DAISY descriptors are fed into an expectation-maximization (EM) algorithm which uses graph cuts to estimate the scene’s depth.
– works on low-quality images such as the ones captured by video streams

SIFT local image descriptor
• SIFT descriptor is a 3-D histogram in which two dimensions correspond to image spatial dimensions and the additional dimension to the image gradient direction (normally discretized into 8 bins)

SIFT local image descriptor
• Each bin contains a weighted sum of the norms of the image gradients around its center, where the weights roughly depend on the distance to the bin center

DAISY local image descriptor
• Gaussian convolved orientation maps are calculated for every direction $o$:
$$G_o^{\Sigma} = G_{\Sigma} * \left(\frac{\partial I}{\partial o}\right)^{+}$$
$G_{\Sigma}$: Gaussian convolution filter with variance $\Sigma$
$\partial I / \partial o$: image gradient in direction $o$
$(\cdot)^{+}$: operator $(a)^{+} = \max(a, 0)$
$G_o^{\Sigma}$: orientation maps
• Every location in $G_o^{\Sigma}$ contains a value very similar to what a bin in SIFT contains: a weighted sum computed over an area of gradient norms
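A rough sketch of the orientation maps: take the image gradient in each direction, keep only the positive part, and smooth the result with a Gaussian. The names `orientation_maps` and `blur`, the 8 directions, the clamped borders, and treating `sigma` as the Gaussian standard deviation are all assumptions of this example, not details taken from the paper.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur(img, sigma):
    """Separable Gaussian blur with clamped borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    tmp = [[sum(k[i + r] * img[y][min(max(x + i, 0), w - 1)] for i in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(k[i + r] * tmp[min(max(y + i, 0), h - 1)][x] for i in range(-r, r + 1))
             for x in range(w)] for y in range(h)]

def orientation_maps(img, num_dirs=8, sigma=1.0):
    """One smoothed, rectified directional-gradient map per direction."""
    h, w = len(img), len(img[0])
    maps = []
    for o in range(num_dirs):
        theta = 2 * math.pi * o / num_dirs
        rectified = [[0.0] * w for _ in range(h)]
        for y in range(h):
            for x in range(w):
                dx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2
                dy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2
                # (a)+ = max(a, 0): keep only gradients pointing along theta.
                rectified[y][x] = max(math.cos(theta) * dx + math.sin(theta) * dy, 0.0)
        maps.append(blur(rectified, sigma))
    return maps

ramp = [[float(x) for x in range(6)] for _ in range(6)]  # brighter to the right
maps = orientation_maps(ramp)
print(len(maps), maps[0][3][3] > 0)
```

Because each map is computed once for the whole image, reading a "bin" at any pixel is a lookup, which is where the speed advantage for dense matching comes from.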

DAISY local image descriptor

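DAISY local image descriptor
I. Histograms at every pixel location are computed
$h_{\Sigma}(u, v)$: histogram at location $(u, v)$, collecting the values of the orientation maps at that pixel
$G_o^{\Sigma}$: Gaussian convolved orientation maps
II. Histograms are normalized to unit norm
III. Local image descriptor is computed by concatenating the normalized histograms at $(u, v)$ and at the surrounding locations $l_j(u, v, R)$
$l_j(u, v, R)$: the location with distance $R$ from $(u, v)$ in the direction given by $j$ when the directions are quantized into $N$ values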
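The sampling locations can be sketched as a center point plus rings of N points at distance R. The function names, the three radii, and N = 8 below are illustrative assumptions; directions are indexed from 0 here for convenience.

```python
import math

def ring_locations(u, v, radius, num_dirs):
    """Locations at distance radius from (u, v), one per quantized direction."""
    return [(u + radius * math.cos(2 * math.pi * j / num_dirs),
             v + radius * math.sin(2 * math.pi * j / num_dirs))
            for j in range(num_dirs)]

def daisy_sample_points(u, v, radii, num_dirs):
    """Center point followed by one ring of points per radius."""
    points = [(u, v)]
    for r in radii:
        points.extend(ring_locations(u, v, r, num_dirs))
    return points

points = daisy_sample_points(10.0, 10.0, radii=[2.5, 5.0, 7.5], num_dirs=8)
print(len(points))  # 25 = 1 center + 3 rings x 8 directions
```

The descriptor is then the concatenation of the normalized histograms read at these points, so with 8-bin histograms this layout would give a 25 x 8 = 200-element vector.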

From Descriptor to Depth Map
• The model uses EM to estimate depth map $Z$ and occlusion map $O$ by maximizing the posterior $p(Z, O \mid D_1, \ldots, D_N)$
$D_n$: descriptor of image $n$

Results

Picking the Best Daisy
Simon Winder, Gang Hua, Matthew Brown

Paper Contribution

• Utilize novel ground-truth training set
• Test multiple configurations of low-level filters and DAISY pooling and optimize over their parameters
• Investigate the effects of robust normalization
• Apply PCA dimension reduction and dynamic range reduction to compress the representation of descriptors
• Discuss computational efficiency and provide a list of recommendations for descriptors that are useful in different scenarios

Descriptor Pipeline

• T-block takes the pixels from the image patch and transforms them to produce a vector of k non-linear filter responses at each pixel.
– Block T1 involves computing gradients at each pixel and bilinearly quantizing the gradient angle into k orientation bins as in SIFT
– Block T2 rectifies the x and y components of the gradient to produce a vector of length 4: $\{|\nabla_x| - \nabla_x;\ |\nabla_x| + \nabla_x;\ |\nabla_y| - \nabla_y;\ |\nabla_y| + \nabla_y\}$
– Block T3 uses steerable filters evaluated at a number of different orientations
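As an illustration of the T2 block, the sketch below splits each gradient component into its negative and positive parts. The form |g| - g, |g| + g used here is assumed to be the rectification the slide refers to, and `t2_rectify` is a hypothetical name.

```python
def t2_rectify(gx, gy):
    """Rectify the x and y gradient components into 4 non-negative values.

    |g| - g is non-zero only for negative g, |g| + g only for positive g,
    so each component contributes one "negative" and one "positive" channel.
    """
    return [abs(gx) - gx, abs(gx) + gx, abs(gy) - gy, abs(gy) + gy]

print(t2_rectify(3.0, -2.0))  # [0.0, 6.0, 4.0, 0.0]
```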

Descriptor Pipeline

• S-block spatially accumulates weighted filter vectors to give N linearly summed vectors of length k and these are concatenated to form a descriptor of kN dimensions.

Descriptor Pipeline

• N-block normalizes the complete descriptor to provide invariance to lighting changes. It uses a form of threshold normalization with the following stages:
– Normalize the descriptor to a unit vector
– Clip all the elements of the vector that are above a threshold $\kappa$ by computing $v_i' = \min(v_i, \kappa)$
– Scale the vector to a byte range.
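The three normalization stages might look like this in code. The threshold value, the choice to map the largest clipped element to 255, and the name `threshold_normalize` are assumptions for the sketch; the slides do not say how the byte scaling is done.

```python
import math

def threshold_normalize(vec, kappa):
    """Unit-normalize, clip at kappa, then scale into the range 0..255.

    Assumes non-negative descriptor elements.
    """
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    unit = [v / norm for v in vec]              # stage 1: unit vector
    clipped = [min(v, kappa) for v in unit]     # stage 2: clip large elements
    top = max(clipped) or 1.0
    return [int(round(255 * v / top)) for v in clipped]  # stage 3: byte range

print(threshold_normalize([3.0, 4.0, 0.0], kappa=0.7))  # [219, 255, 0]
```

Clipping limits the influence of a few very strong gradients, which is what gives the descriptor some robustness to non-linear lighting changes.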

Descriptor Pipeline

• Dimension reduction. Apply principal component analysis to compress the descriptor.
– First optimize the parameters of the descriptor and then compute the matrix of principal components based on all descriptors computed on the training set.
– Next find the best dimensionality for reduction by computing the error rate on random subsets of the training data.
– Progressively increase the dimensionality by adding PCA bases until the minimum error is found.

Descriptor Pipeline

• Quantization further compresses the descriptor, reducing the memory requirement for large databases of descriptors, by quantizing descriptor elements into L levels.
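One plausible reading of quantizing into L levels is a uniform quantizer like the sketch below. The scale factor `beta`, the assumption that elements lie in [0, 1], and the function name are hypothetical; the paper's exact scheme may differ.

```python
def quantize(vec, levels, beta=1.0):
    """Map each element (assumed in [0, 1]) to an integer level 0..levels-1."""
    return [min(int(beta * levels * v), levels - 1) for v in vec]

print(quantize([0.0, 0.3, 0.99, 1.0], levels=4))  # [0, 1, 3, 3]
```

With L = 4, each element needs only 2 bits instead of a full byte, which is the memory saving the slide refers to.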

Training

• Use 3D reconstructions as a source of training data.

• Use machine learning approach to optimize parameters.

Results

• Gradient-based descriptor

Results

• Dimension Reduction

Results

• Descriptor Quantization
