Paper Overviews


Transcript of Paper Overviews

Page 1: Paper Overviews

Paper Overviews

3 types of descriptors:

SIFT / PCA-SIFT (Ke, Sukthankar)

GLOH (Mikolajczyk, Schmid)

DAISY (Tola, et al, Winder, et al)

Comparison of descriptors (Mikolajczyk, Schmid)

Page 2: Paper Overviews

Paper Overviews

PCA-SIFT: SIFT-based but with a smaller descriptor

GLOH: modifies the SIFT descriptor for robustness and distinctiveness

DAISY: novel descriptor that uses graph cuts for matching and depth map estimation

Page 3: Paper Overviews

SIFT

• “Scale Invariant Feature Transform”
• 4 stages:
1. Peak selection
2. Keypoint localization
3. Keypoint orientation
4. Descriptors


Page 5: Paper Overviews

SIFT
• 1. Peak Selection
• Make a Gaussian pyramid

http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html

Page 6: Paper Overviews

SIFT
• 1. Peak Selection
• Find local peaks using difference of Gaussians
– Peaks are found at different scales

http://www.cra.org/Activities/craw_archive/dmp/awards/2006/Bolan/DMP_Pages/filters.html
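The peak-selection step can be sketched in a few lines. This is a minimal illustration, not Lowe's implementation: it uses a single octave with no downsampling, and the function names, the list of sigmas, and the contrast threshold are all assumptions chosen for the example.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian blur; kernel truncated at 3 sigma, edges replicated."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def dog_peaks(image, sigmas, threshold=0.01):
    """Return (level, y, x) triples that are extrema of the difference-of-Gaussians
    stack within their 3x3x3 neighbourhood (space and scale)."""
    blurred = [gaussian_blur(image, s) for s in sigmas]
    dog = np.stack([b2 - b1 for b1, b2 in zip(blurred, blurred[1:])])
    levels, height, width = dog.shape
    peaks = []
    for s in range(1, levels - 1):
        for y in range(1, height - 1):
            for x in range(1, width - 1):
                value = dog[s, y, x]
                if abs(value) < threshold:  # hypothetical contrast cut-off
                    continue
                cube = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                if value == cube.max() or value == cube.min():
                    peaks.append((s, y, x))
    return peaks
```

For a single bright blob, the detected peak should sit at the blob centre, at the scale level whose sigma is closest to the blob size.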


Page 8: Paper Overviews

SIFT
• 2. Keypoint Localization
– Remove peaks that are “unstable”:
» Peaks in low-contrast areas
» Peaks along edges
» Features not distinguishable


Page 10: Paper Overviews

SIFT
• 3. Keypoint Orientation
• Make a histogram of gradients for a patch of pixels
• Orient all patches so the dominant gradient direction is vertical

http://www.inf.fu-berlin.de/lehre/SS09/CV/uebungen/uebung09/SIFT.pdf
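As a rough sketch of the orientation step, the dominant direction can be read off a magnitude-weighted histogram of gradient angles. The 36-bin resolution and the function name are assumptions for illustration; the slide does not state the bin count, and the Gaussian weighting and peak interpolation used in practice are left out.

```python
import math


def dominant_orientation(dx, dy, num_bins=36):
    """dx, dy: 2-D lists holding the horizontal / vertical gradient at each pixel.
    Returns (angle_in_degrees_of_the_strongest_bin_centre, histogram)."""
    bin_width = 360.0 / num_bins
    hist = [0.0] * num_bins
    for row_dx, row_dy in zip(dx, dy):
        for gx, gy in zip(row_dx, row_dy):
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    best = max(range(num_bins), key=hist.__getitem__)
    return (best + 0.5) * bin_width, hist
```

Rotating the patch by the negative of the returned angle (plus a fixed offset) would bring the dominant gradient to a canonical direction, which is what makes the later descriptor rotation invariant.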


Page 12: Paper Overviews

SIFT
• 4. Descriptors
• Ideal descriptor:
– Compact
– Distinctive from other descriptors
– Robust against lighting / viewpoint changes

Page 13: Paper Overviews

SIFT
• 4. Descriptors
• A SIFT descriptor is a 128-element vector:
– 4x4 array of 8-bin histograms (4 × 4 × 8 = 128)
– Each histogram is a smoothed representation of the gradient orientations of the patch

Page 14: Paper Overviews

PCA-SIFT
• Changes step 4 of the SIFT process to create different descriptors
• Rationale:
– Construction of SIFT descriptors is complicated
– Reason for constructing them that way is unclear
– Is there a simpler alternative?

Page 15: Paper Overviews

PCA-SIFT
• “Principal Component Analysis” (PCA)
• A widely used method of dimensionality reduction
• Used with SIFT to make a smaller feature descriptor
– By projecting the gradient patch into a smaller space

Page 16: Paper Overviews

PCA-SIFT
– Creating a descriptor for keypoints:
1. Create patch eigenspace
2. Create projection matrix
3. Create feature vector

Page 17: Paper Overviews

PCA-SIFT
– 1. Create patch eigenspace
– For each keypoint:
• Take a 41x41 patch around the keypoint
• Compute horizontal / vertical gradients
– Put all gradient vectors for all keypoints into a matrix

Page 18: Paper Overviews

PCA-SIFT
– 1. Create patch eigenspace
– M = matrix of gradients for all keypoints
– Calculate covariance of M
– Calculate eigenvectors of covariance(M)

Page 19: Paper Overviews

PCA-SIFT
– 2. Create projection matrix
– Choose the first n eigenvectors
– This paper uses n = 20
– This is the projection matrix
– Store for later use, no need to re-compute

Page 20: Paper Overviews

PCA-SIFT
– 3. Create feature vector
– For a single keypoint:
• Take its gradient vector, project it with the projection matrix
• Feature vector is of size n
– This is called Grad PCA in the paper
– “Img PCA”: use the image patch instead of the gradient
– Size difference: 128 elements (SIFT) vs. n = 20

Page 21: Paper Overviews

PCA-SIFT
– Results
– Tested SIFT vs. “Grad PCA” and “Img PCA” on a series of image variations:
• Gaussian noise
• 45° rotation followed by 50% scaling
• 50% intensity scaling
• Projective warp

Page 22: Paper Overviews

PCA-SIFT
– Results (Precision-recall curves)
– Grad PCA (black) generally outperforms Img PCA (pink) and SIFT (purple) except when brightness is reduced
– Both PCA methods outperform SIFT with illumination changes

Page 23: Paper Overviews

PCA-SIFT
– Results
– PCA-SIFT also gets more matches correct on images taken at different viewpoints

Page 24: Paper Overviews

A Performance Evaluation of Local Descriptors

Krystian Mikolajczyk and Cordelia Schmid

Page 25: Paper Overviews

Problem Setting for Comparison: Matching Problem

From a slide of David G. Lowe (IJCV 2004)

As we did in Project 2: Panorama, we want to find correct pairs of points in two images.

Page 26: Paper Overviews

Overview of Compared Methods
• Region Detector: detects interest points
• Region Descriptor: describes the points
• Matching Strategy: how to find pairs of points in two images?

Page 27: Paper Overviews

Region Detector
• Harris Points
• Blob Structure Detector
1. Harris-Laplace Regions (similar to DoG)
2. Hessian-Laplace Regions
3. Harris-Affine Regions
4. Hessian-Affine Regions
• Edge Detector
– Canny Detector

Page 28: Paper Overviews

Region Descriptors

Descriptor              | Dimension | Category / Notes                                                                     | Distance Measure
SIFT                    | 128       | SIFT-based descriptor                                                                | Euclidean
PCA-SIFT                | 36        | SIFT-based descriptor                                                                | Euclidean
GLOH                    | 128       | SIFT-based descriptor                                                                | Euclidean
Shape Context           | 36        | Similar to SIFT, but focuses on edge locations found with the Canny detector         |
Spin                    | 50        | A sparse set of affine-invariant local patches is used                               |
Steerable Filter        | 14        | Differential descriptor: focuses on the properties of local derivatives (local jet) | Mahalanobis
Differential Invariants | 14        | Differential descriptor: focuses on the properties of local derivatives (local jet) | Mahalanobis
Complex Filters         | 1681      | Consists of many filters                                                             |
Gradient Moments        | 20        | Moment-based descriptor                                                              |
Cross Correlation       | 81        | Uniformly sampled locations                                                          |

Page 29: Paper Overviews

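Matching Strategy
• Threshold-Based Matching: $\|D_A - D_B\| < \text{threshold}$
• Nearest Neighbor Matching – Threshold: $\|D_A - D_B\| < \text{threshold}$, where $D_B$ is the first neighbor of $D_A$
• Nearest Neighbor Matching – Distance Ratio: $\|D_A - D_B\| \, / \, \|D_A - D_C\| < \text{threshold}$, where $D_B$ is the first neighbor and $D_C$ is the second neighbor of $D_A$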
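The three matching strategies differ only in which candidates are kept. A minimal sketch follows, assuming descriptors are plain lists of floats compared with Euclidean distance; the function names and the threshold values in the usage note are illustrative, not taken from the paper.

```python
import math


def threshold_matches(desc_a, descs_b, t):
    """Threshold-based: every descriptor closer than t is a match (may be several)."""
    return [i for i, d in enumerate(descs_b) if math.dist(desc_a, d) < t]


def nearest_neighbor_match(desc_a, descs_b, t):
    """Nearest neighbor + threshold: only the closest descriptor, if closer than t."""
    best = min(range(len(descs_b)), key=lambda i: math.dist(desc_a, descs_b[i]))
    return best if math.dist(desc_a, descs_b[best]) < t else None


def ratio_match(desc_a, descs_b, t):
    """Nearest neighbor + distance ratio: accept the closest descriptor only if it
    is clearly closer than the second closest one."""
    ranked = sorted(range(len(descs_b)), key=lambda i: math.dist(desc_a, descs_b[i]))
    first, second = ranked[0], ranked[1]
    d_first = math.dist(desc_a, descs_b[first])
    d_second = math.dist(desc_a, descs_b[second])
    return first if d_second > 0 and d_first / d_second < t else None
```

For a query `[0, 0]` against `[[1, 0], [0, 3], [5, 5]]`, a plain threshold of 3.5 keeps two candidates, while both nearest-neighbor variants keep only the first one; the ratio test additionally rejects a match when the two best candidates are almost equally close.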

Page 30: Paper Overviews

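Performance Measurements
• Repeatability rate, ROC
• Recall-Precision

$$\text{Recall} = \frac{\#\text{ of correct matches found}}{\text{total } \#\text{ of correct matches}} = \frac{\text{TP (true positive)}}{\text{actual positive}}$$

$$\text{Precision} = \frac{\#\text{ of correct matches found}}{\#\text{ of correct matches found} + \#\text{ of false matches}} = \frac{\text{TP (true positive)}}{\text{predicted positive}}$$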

Page 31: Paper Overviews

Example of Recall-Precision
Let's say that our method detected:
* 50 corresponding pairs were extracted (A: predicted positive)
* 40 of the detected pairs were correct pairs (C: the overlap of A and B)
* As ground truth, there are 200 correct pairs (B: actual positive)
Then,
Recall = C/B = 40/200 = 20%
Precision = C/A = 40/50 = 80%

The perfect descriptor gives 100% recall for any value of precision!
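The arithmetic of this example can be checked directly. The helper below is a small sketch with a hypothetical name; its three arguments correspond to C (correct detected pairs), A (all detected pairs) and B (ground-truth pairs).

```python
def recall_precision(num_correct, num_detected, num_ground_truth):
    """Return (recall, precision) as fractions between 0 and 1."""
    recall = num_correct / num_ground_truth   # C / B
    precision = num_correct / num_detected    # C / A
    return recall, precision


recall, precision = recall_precision(num_correct=40, num_detected=50, num_ground_truth=200)
print(f"recall = {recall:.0%}, precision = {precision:.0%}")  # recall = 20%, precision = 80%
```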

Page 32: Paper Overviews

Data Set
6 different image transformations:
• Rotation
• Image Blur
• Zoom + Rotation
• Viewpoint Change
• Light Change
• JPEG Compression

Page 33: Paper Overviews

Matching Strategies (Hessian-Affine Regions)

[Figure panels: Threshold-Based Matching; Nearest Neighbor Matching – Threshold; Nearest Neighbor Matching – Distance Ratio]

Page 34: Paper Overviews

Viewpoint Change

[Figure panels: with Hessian-Affine Regions; with Harris-Affine Regions]

Page 35: Paper Overviews

Scale Change with Rotation

[Figure panels: Hessian-Laplace Regions; Harris-Laplace Regions]

Page 36: Paper Overviews

Image Rotation of 30 to 45 Degrees

Harris Points

Page 37: Paper Overviews

Image Blur

Hessian-Affine Regions

Page 38: Paper Overviews

JPEG Compression

* Hessian-Affine Regions

Page 39: Paper Overviews

Illumination Changes

* Hessian-Affine Regions

Page 40: Paper Overviews

Ranking of Descriptors

From high performance to low performance:
1. SIFT-based descriptors, 128 dimensions: GLOH, SIFT
2. Shape Context, 36 dimensions
3. PCA-SIFT, 36 dimensions
4. Gradient Moments (20 dimensions) & Steerable Filters (14 dimensions)
5. Other descriptors

Note: This ranking is for the matching problem. It is not a measure of general performance.

Page 41: Paper Overviews

Ranking of Difficult Image Transformations

By transformation, from easy to difficult:
1. Scale & Rotation & Illumination
2. JPEG Compression
3. Image Blur
4. Viewpoint Change

By scene type, from easy to difficult:
1. Structured Scene
2. Textured Scene

[Figure: Two Textured Scenes]

Page 42: Paper Overviews

Other Results
• Hessian regions are better than Harris regions
• Nearest-neighbor-based matching is better than simple threshold-based matching
• SIFT becomes better when the nearest neighbor distance ratio is used
• Robust region descriptors perform better than point-wise descriptors
• Image rotation does not have a big impact on the accuracy of descriptors

Page 43: Paper Overviews

A Fast Local Descriptor for Dense Matching
Engin Tola, Vincent Lepetit, Pascal Fua
Ecole Polytechnique Federale de Lausanne, Switzerland

Page 44: Paper Overviews

Paper novelty

• Introduces the DAISY local image descriptor
– much faster to compute than SIFT for dense point matching
– performs on par with or better than SIFT
• DAISY descriptors are fed into an expectation-maximization (EM) algorithm which uses graph cuts to estimate the scene’s depth
– works on low-quality images such as the ones captured by video streams

Page 45: Paper Overviews

SIFT local image descriptor
• SIFT descriptor is a 3-D histogram in which two dimensions correspond to image spatial dimensions and the additional dimension to the image gradient direction (normally discretized into 8 bins)

Page 46: Paper Overviews

SIFT local image descriptor
• Each bin contains a weighted sum of the norms of the image gradients around its center, where the weights roughly depend on the distance to the bin center

Page 47: Paper Overviews

DAISY local image descriptor
• Gaussian convolved orientation maps are calculated for every direction:

$$G_o^{\Sigma} = G_{\Sigma} * \left( \frac{\partial I}{\partial o} \right)^{+}$$

$G_{\Sigma}$: Gaussian convolution filter with variance $\Sigma$
$\frac{\partial I}{\partial o}$: image gradient in direction $o$
$(\cdot)^{+}$: operator $(a)^{+} = \max(a, 0)$
$G_o^{\Sigma}$: orientation maps

• Every location in $G_o^{\Sigma}$ contains a value very similar to what a bin in SIFT contains: a weighted sum of gradient norms computed over an area
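The orientation maps described above can be sketched with NumPy: take the directional derivative for each quantized direction, keep only its positive part, and blur the result with a Gaussian. This is an illustrative outline, not the authors' code; the function names are hypothetical, the blur is parameterized by a standard deviation `sigma`, and 8 directions is an assumed default.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian blur; kernel truncated at 3 sigma, edges replicated."""
    radius = max(1, int(3 * sigma + 0.5))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")


def convolved_orientation_maps(image, num_orientations=8, sigma=2.0):
    """Return an array of shape (num_orientations, H, W): one blurred,
    rectified directional-gradient map per quantized direction."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # axis 0 is y, axis 1 is x
    maps = []
    for o in range(num_orientations):
        theta = 2.0 * np.pi * o / num_orientations
        directional = np.cos(theta) * gx + np.sin(theta) * gy
        rectified = np.maximum(directional, 0.0)  # the (.)+ operator
        maps.append(gaussian_blur(rectified, sigma))
    return np.stack(maps)
```

Because the maps are computed once for the whole image, the histogram at any pixel is just a read of one value per map, which is what makes dense descriptor computation cheap.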

Page 48: Paper Overviews

DAISY local image descriptor

Page 49: Paper Overviews

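DAISY local image descriptor
I. Histograms at every pixel location are computed:

$$h_{\Sigma}(u, v) = \left[ G_1^{\Sigma}(u, v), \ldots, G_8^{\Sigma}(u, v) \right]^{T}$$

$h_{\Sigma}(u, v)$: histogram at location (u, v)
$G_o^{\Sigma}$: Gaussian convolved orientation maps

II. Histograms are normalized to unit norm
III. Local image descriptor is computed as

$$D(u_0, v_0) = \left[ \tilde{h}_{\Sigma_1}^{T}(u_0, v_0),\ \tilde{h}_{\Sigma_1}^{T}(l_1(u_0, v_0, R_1)), \ldots, \tilde{h}_{\Sigma_1}^{T}(l_N(u_0, v_0, R_1)),\ \tilde{h}_{\Sigma_2}^{T}(l_1(u_0, v_0, R_2)), \ldots, \tilde{h}_{\Sigma_3}^{T}(l_N(u_0, v_0, R_3)) \right]^{T}$$

$l_j(u, v, R)$: the location with distance R from (u, v) in the direction given by j when the directions are quantized into N values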
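The sampling pattern can be sketched as a center histogram plus N histograms on each of several concentric rings, each normalized to unit norm and concatenated. The layout below (one smoothing level per ring, directions indexed from 0, the callback `histogram_at`) is an assumption made for illustration; the slide only defines the ring location.

```python
import math


def ring_location(u, v, radius, j, num_directions):
    """Location at distance `radius` from (u, v) in the j-th of N quantized directions."""
    angle = 2.0 * math.pi * j / num_directions
    return u + radius * math.cos(angle), v + radius * math.sin(angle)


def unit_norm(hist):
    norm = math.sqrt(sum(x * x for x in hist))
    return [x / norm for x in hist] if norm > 0 else list(hist)


def daisy_descriptor(histogram_at, u, v, radii, num_directions=8):
    """histogram_at(level, u, v) is a hypothetical callback returning the
    orientation histogram at (u, v) for smoothing level `level`."""
    desc = unit_norm(histogram_at(0, u, v))  # center histogram
    for level, radius in enumerate(radii):
        for j in range(num_directions):
            lu, lv = ring_location(u, v, radius, j, num_directions)
            desc += unit_norm(histogram_at(level, lu, lv))
    return desc
```

With 3 rings, 8 directions and 8-bin histograms this yields (1 + 3 × 8) × 8 = 200 values.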

Page 50: Paper Overviews

From Descriptor to Depth Map
• The model uses EM to estimate depth map Z and occlusion map O by maximizing an objective that depends on the descriptor of each image n [equation not preserved in transcript]

Page 51: Paper Overviews

Results

Page 52: Paper Overviews

Results

Page 53: Paper Overviews

Results

Page 54: Paper Overviews

Picking the Best Daisy
Simon Winder, Gang Hua, Matthew Brown

Page 55: Paper Overviews

Paper Contribution

• Utilize a novel ground-truth training set
• Test multiple configurations of low-level filters and DAISY pooling, and optimize over their parameters
• Investigate the effects of robust normalization
• Apply PCA dimension reduction and dynamic range reduction to compress the representation of descriptors
• Discuss computational efficiency and provide a list of recommendations for descriptors that are useful in different scenarios

Page 56: Paper Overviews

Descriptor Pipeline

• T-block takes the pixels from the image patch and transforms them to produce a vector of k non-linear filter responses at each pixel.
– Block T1 involves computing gradients at each pixel and bilinearly quantizing the gradient angle into k orientation bins as in SIFT
– Block T2 rectifies the x and y components of the gradient to produce a vector of length 4: $\{ |\nabla_x| - \nabla_x;\ |\nabla_x| + \nabla_x;\ |\nabla_y| - \nabla_y;\ |\nabla_y| + \nabla_y \}$
– Block T3 uses steerable filters evaluated at a number of different orientations
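The first two T-blocks can be sketched per pixel as follows. The function names are hypothetical, and the length-4 vector used for T2 (each gradient component split into its negative and positive parts) is an assumption, since the formula itself is not in the transcript.

```python
import math


def t1_soft_bins(gx, gy, k=8):
    """T1 sketch: split the gradient magnitude between the two orientation
    bins adjacent to the gradient angle (bilinear quantization)."""
    magnitude = math.hypot(gx, gy)
    position = (math.atan2(gy, gx) % (2 * math.pi)) / (2 * math.pi) * k
    lower = int(position) % k
    frac = position - int(position)
    out = [0.0] * k
    out[lower] += magnitude * (1.0 - frac)
    out[(lower + 1) % k] += magnitude * frac
    return out


def t2_rectify(gx, gy):
    """T2 sketch: rectified x and y gradient components, a vector of length 4."""
    return [abs(gx) - gx, abs(gx) + gx, abs(gy) - gy, abs(gy) + gy]
```

Both functions return non-negative responses, which is what lets the later S-block sum them linearly over a region without positive and negative gradients cancelling.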

Page 57: Paper Overviews

Descriptor Pipeline

• S-block spatially accumulates weighted filter vectors to give N linearly summed vectors of length k and these are concatenated to form a descriptor of kN dimensions.


Page 59: Paper Overviews

Descriptor Pipeline

• N-block normalizes the complete descriptor to provide invariance to lighting changes. It uses a form of threshold normalization with the following stages:
– Normalize the descriptor to a unit vector
– Clip all the elements of the vector that are above a threshold $\kappa$ by computing $v_i' = \min(v_i, \kappa)$
– Scale the vector to a byte range
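The three normalization stages can be sketched as below. The threshold value, the function name, and the choice to map the clipping threshold onto 255 are assumptions for illustration; the sketch also assumes non-negative descriptor elements.

```python
import math


def threshold_normalize(desc, kappa=0.2, byte_max=255):
    """Unit-normalize, clip at kappa, then scale so that kappa maps to byte_max."""
    norm = math.sqrt(sum(v * v for v in desc))
    unit = [v / norm for v in desc] if norm > 0 else list(desc)
    clipped = [min(v, kappa) for v in unit]
    scale = byte_max / kappa
    return [int(round(v * scale)) for v in clipped]
```

Clipping limits the influence of a few very large gradient responses (for example from a specular highlight), so that they do not dominate the distance between two descriptors.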

Page 60: Paper Overviews

Descriptor Pipeline

• Dimension reduction. Apply principal components analysis to compress the descriptor.
– First optimize the parameters of the descriptor and then compute the matrix of principal components based on all descriptors computed on the training set.
– Next find the best dimensionality for reduction by computing the error rate on random subsets of the training data.
– Progressively increase the dimensionality by adding PCA bases until the minimum error is found.

Page 61: Paper Overviews

Descriptor Pipeline

• Quantization further compresses the descriptor, reducing the memory requirement for a large database of descriptors, by quantizing descriptor elements into L levels.
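As a rough sketch of this step, uniform quantization of byte-valued elements into L levels could look like the following. Uniform step sizes and the function name are assumptions here; the paper's actual quantizer may differ.

```python
def quantize(desc, levels, max_value=255):
    """Map each element in [0, max_value] to an integer level in [0, levels - 1]."""
    step = (max_value + 1) / levels
    return [min(int(v / step), levels - 1) for v in desc]


print(quantize([0, 63, 64, 128, 255], levels=4))  # [0, 0, 1, 2, 3]
```

With L = 4, each element needs only 2 bits instead of 8, which is the memory saving the slide refers to.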

Page 62: Paper Overviews

Training

• Use 3D reconstructions as a source of training data.

• Use machine learning approach to optimize parameters.

Page 63: Paper Overviews

Results

• Gradient-based descriptor

Page 64: Paper Overviews

Results

• Dimension Reduction

Page 65: Paper Overviews

Results

• Descriptor Quantization