A Comparison of Feature Descriptors

Subhransu Maji, University of California at Berkeley

[email protected]

Abstract

Local features provide a compact description of an image and are widely used in a number of vision tasks such as stereo reconstruction, object recognition, and tracking. Owing to their importance, they have received considerable attention in the computer vision community and have been evaluated in restricted settings. In this work we do two things: (1) we study feature descriptors and empirically find the best set of parameters for the geometric blur feature in the setting proposed by Mikolajczyk et al.; (2) we explore a looser notion of matching in which objects within the same class vary considerably. This more general notion of matching is more relevant in the context of object categorization. We take the Caltech 101 dataset and obtain a rough feature alignment by aligning the contours of objects within the same class. We then perform a similar repeatability and descriptor-performance evaluation on a small fraction of the dataset.

1. Introduction

Local feature representations of images are widely used for recognition and matching in the context of recovering geometry, dealing with occlusion, tracking, etc. One would want these features to be both repeatable and invariant to various viewing conditions such as lighting, viewpoint, and object orientation. Depending on the task at hand, different invariances become important. For stereo matching, where the views are related by a projective transformation (approximately affine if the objects are far from the camera), a very discriminative descriptor is useful, and the features should be affine invariant. On the other hand, if we look at examples of images within the same class, the same local feature can have much larger variance (e.g., the lip corners across human faces). The feature descriptor should accommodate this intra-class variation while still giving good inter-class discrimination. Instead of complete rotation invariance, we would like the features to be robust to small transformations of the object.

2. Geometric Blur

Geometric blur features [4] give image descriptors that are robust to small transformations. They do so by averaging the signal over small transformations and then sampling the signal at fixed locations to construct a feature. For images, the oriented edge response has been shown to be a good signal. In practice, the averaging over transformations can be modelled by convolving the signal with a kernel that weighs the contribution of neighbouring signals at each point. Gaussian kernels whose support increases linearly with the distance from the feature point were used in that work; we use the same here.
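
To make the construction concrete, here is a minimal sketch (in Python, assuming NumPy and SciPy) of a geometric-blur-style descriptor for one edge channel: the channel is blurred with a Gaussian whose standard deviation grows linearly with distance from the feature point, sigma(r) = beta + alpha*r, and the blurred signal is sampled at fixed locations on concentric circles. The sampling grid, radii and function names are illustrative assumptions, not the reference implementation of [4].

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def geometric_blur_channel(edge_channel, center, alpha=0.5, beta=1.0,
                           radii=(4, 8, 12, 16), n_angles=8):
    """Sketch of a geometric-blur-style descriptor for one oriented-edge channel.

    edge_channel : 2D array of edge energy for one orientation.
    center       : (row, col) of the interest point in the normalized patch.
    alpha, beta  : blur grows linearly with distance, sigma(r) = beta + alpha * r.
    """
    h, w = edge_channel.shape
    samples = []
    for r in radii:
        # Sample points at radius r see the channel blurred with sigma(r).
        blurred = gaussian_filter(edge_channel, beta + alpha * r)
        for a in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
            rr = int(round(center[0] + r * np.sin(a)))
            cc = int(round(center[1] + r * np.cos(a)))
            samples.append(blurred[rr, cc] if 0 <= rr < h and 0 <= cc < w else 0.0)
    # Concatenating the vectors from all four edge channels would give the
    # full descriptor; this returns the samples for a single channel only.
    return np.asarray(samples)
```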

2.1. Evaluation of GB features

In this section we find the best set of parameters for the task proposed in the work of Mikolajczyk et al. [2]. Briefly, these datasets contain images that are affine transformations of one another; they vary rotation, scale, viewpoint, resolution and lighting conditions independently of one another, and the ground-truth homography between the images is provided. This gives a platform to evaluate a descriptor's performance over the varying conditions. In earlier work Mikolajczyk [2] compared descriptors on the same dataset, which gives us a baseline against which to compare the GB features.

The tunable parameters of the geometric blur feature include the number of channels, the blur gradient, the base blur and the feature scale. In the experiments we tune the blur gradient, the base blur and the feature scale. We use oriented edge energy as the edge response and keep the number of edge channels fixed at four. The effect of each of these is described as follows:

• Feature scale (s): The Harris-affine detector returns points with a characteristic scale obtained by finding the maxima of the Laplacian over scale space. The support of the feature descriptor is taken to be a multiple of the scale returned by the detector, so as to include some context around the blob-like structure; the bigger the scale, the more context around the region. The support region is then normalized to a 51x51 image. Since GB features are not rotation invariant, the image is steered along the dominant orientation and the GB features are computed on these normalized patches. The performance of the descriptor is computed at scales 3, 5, 7 and 9. We observe that the performance increases with scale on all the datasets.

• Base blur (β): Fixing the scale at s = 9, which performs best in the previous step, we test the performance at β = 1, 10, 20 and 40. We notice that increasing the base blur consistently reduces the performance; it is best at β = 1.

• Blur gradient (α): Fixing the scale (s = 9) and base blur (β = 1) that perform best in the previous steps, we test the performance at blur gradients 0, 0.5 and 1. The best performance is observed at α = 0.5.

The performance curves for the varying sets of parameters are shown in Figure 1. Empirically, the parameters that work best on this dataset are s = 9, β = 1 and α = 0.5.

2.2. Comparison with Other Feature Descriptors

We use an evaluation framework similar to [3] and plot recall vs. 1−precision graphs, where

recall = #correct matches / #correspondences,   (1)

1 − precision = #false matches / (#false matches + #correct matches).   (2)
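
The sketch below illustrates how such a curve can be computed from nearest-neighbour-within-threshold matching, assuming descriptor arrays for the two images and a set of ground-truth index pairs; the names and data layout are assumptions for illustration, not the evaluation code of [3].

```python
import numpy as np

def recall_precision_curve(desc1, desc2, gt_pairs, thresholds):
    """Recall and 1-precision for nearest-neighbour-within-threshold matching.

    desc1, desc2 : (n1, d) and (n2, d) descriptor arrays for the two images.
    gt_pairs     : set of (i, j) index pairs that are ground-truth correspondences.
    thresholds   : distance thresholds to sweep (each gives one point on the curve).
    """
    # Pairwise descriptor distances and the nearest neighbour in image 2.
    dists = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    nn = dists.argmin(axis=1)
    nn_dist = dists[np.arange(len(desc1)), nn]

    curve = []
    for t in thresholds:
        matched = np.where(nn_dist < t)[0]
        correct = sum((int(i), int(nn[i])) in gt_pairs for i in matched)
        false = len(matched) - correct
        recall = correct / max(len(gt_pairs), 1)
        one_minus_precision = false / max(correct + false, 1)
        curve.append((one_minus_precision, recall))
    return curve
```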

The correspondences are given by the ground truth. We compare the performance of the GB features with other features: SIFT, shape context, spin images, image moments and jet descriptors. All features that are not rotationally invariant were made invariant by steering the local feature patch along the dominant orientation. SIFT was observed to give close to the best performance in an earlier experiment conducted on these datasets. We match the descriptors using nearest neighbour within a threshold. The recall vs. 1−precision graphs are shown in Figure 2. We observe the following trends:

1. SIFT and shape context do better on the wall and bark datasets.

2. GB does better on the bikes and graffiti datasets.

3. The two are comparable on the ubc, leuven, boat and trees datasets.

Notice that the performance difference is significant in cases 1 and 2. One should pause to consider the reason for this. One difference between GB and SIFT is the way they gather information from different frequencies in the image. SIFT, by computing histograms of gradients, is able to use the high-frequency information explicitly. GB, on the other hand, while it does compute edge energies, blurs this information linearly with distance. The datasets on which SIFT does better have much more texture, while the ones on which GB does better have less texture. Perhaps this explains why SIFT does better on more textured scenes, though one would have to conduct more experiments to establish this.

3. A Softer Notion of Matching

We consider a notion of matching based on an alignment of the features within two images. For different instances of the same object, the strong notion of homography does not give good correspondence of features, as there is large intra-class variation. Many classification algorithms use the structure of the image for computing similarity, for example the GB framework and shape contexts with thin-plate splines (TPS) for character recognition. The performance of these algorithms depends on detecting the right features at the right positions. Ideally, we would want the descriptor performance to be better on such a softer notion of matching.

We take the Caltech 101 dataset [5], which is annotated with foreground masks. To compute the rough alignment we take the contours of the two objects and find a scale and translation that gives a maximal area overlap of the contours. This optimization is non-convex, so a rough estimate of the scale and translation is first computed using the bounding boxes and centroids of the contours, and the overlap is then improved using local search. This last step typically improves the overlap by 5-10% for most classes. Figure 3 shows the alignment of the first 15 images in each class to the first image and the average area overlap.
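
A rough sketch of this two-stage alignment on binary foreground masks is given below: the initial scale and translation are estimated from bounding-box extents and centroids, and a greedy local search then perturbs them to increase the area overlap. The overlap measure (intersection over union), the step sizes and the function names are assumptions for illustration, not the exact procedure used here.

```python
import numpy as np
from scipy.ndimage import affine_transform

def overlap(a, b):
    """One possible area-overlap score: intersection over union of two binary masks."""
    return np.logical_and(a, b).sum() / max(np.logical_or(a, b).sum(), 1)

def warp(mask, scale, tx, ty, out_shape):
    """Apply out = scale * in + (ty, tx) to a binary mask, resampled on out_shape."""
    warped = affine_transform(mask.astype(float), matrix=np.eye(2) / scale,
                              offset=(-ty / scale, -tx / scale),
                              output_shape=out_shape, order=0)
    return warped > 0.5

def align_masks(src, dst, n_iters=50):
    """Coarse scale/translation from bounding boxes and centroids, then local search."""
    ys, xs = np.nonzero(src)
    yd, xd = np.nonzero(dst)
    # Initial scale from bounding-box extents, translation from centroids.
    scale = (np.ptp(yd) + np.ptp(xd)) / max(np.ptp(ys) + np.ptp(xs), 1)
    ty = yd.mean() - scale * ys.mean()
    tx = xd.mean() - scale * xs.mean()
    best = overlap(warp(src, scale, tx, ty, dst.shape), dst)

    # Greedy local search over small perturbations of (scale, tx, ty).
    steps = [(0.05, 0, 0), (-0.05, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
    for _ in range(n_iters):
        improved = False
        for ds, dx, dy in steps:
            cand = overlap(warp(src, scale + ds, tx + dx, ty + dy, dst.shape), dst)
            if cand > best:
                best, scale, tx, ty = cand, scale + ds, tx + dx, ty + dy
                improved = True
        if not improved:
            break
    return scale, tx, ty, best
```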

The classes are then scored by the average alignment of the first 15 images with the first image, and the top 8 classes are chosen. Figure 3 shows the best 8 and worst 8 classes in terms of alignment in the Caltech 101 dataset. Some categories, such as soccer ball, dollar bill and buddha, have good alignment, but their local features do not align well due to widely varying internal texture; for example, the hexagonal patches of the football do not align even though the external contour aligns perfectly. We ignore these classes in this study. The relatively rigid objects give good alignment, while the animal and other non-rigid classes are harder to align with the simple alignment model we use.

To establish a ground truth for the feature correspondence, we simply consider features of the two images that lie within a threshold distance (10 pixels) under the transformation to be a match. This matching scheme can potentially match features that correspond to different 'local' features if they are sufficiently close, but it will capture most of the correct matches, and the incorrect matches should affect any 'non-trivial' descriptor more or less equally. Note that this also results in lower recall scores. Figure 4 shows what the correspondences look like for a few categories. Empirically we find that the alignment gives reasonable matches; in most cases the error is only a few pixels (about 5).
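
A minimal sketch of this correspondence rule, assuming keypoint locations as (x, y) arrays and the scale and translation recovered by the alignment step (names and data layout are illustrative):

```python
import numpy as np

def ground_truth_matches(pts1, pts2, scale, tx, ty, thresh=10.0):
    """Keypoint pairs within `thresh` pixels after mapping image 1 into image 2.

    pts1, pts2    : (n1, 2) and (n2, 2) arrays of (x, y) keypoint locations.
    scale, tx, ty : alignment that maps image-1 coordinates into image-2 coordinates.
    """
    mapped = scale * pts1 + np.array([tx, ty])
    d = np.linalg.norm(mapped[:, None, :] - pts2[None, :, :], axis=2)
    i, j = np.nonzero(d < thresh)
    return set(zip(i.tolist(), j.tolist()))
```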

3.1. Comparison of Local Feature Detectors

We consider the following scale-invariant detectors for comparison:

• Harris detector finds points at a fixed scale.

• Harris-Laplace detector uses the scale-adapted Harris function to localize points in scale space. It then selects the points for which the Laplacian-of-Gaussian attains a maximum over scale.

• Hessian-Laplace detector localizes points in space at the local maxima of the Hessian determinant, and in scale at the local maxima of the Laplacian-of-Gaussian.

• Harris/Hessian-affine detector performs an affine adaptation of the Harris/Hessian-Laplace detector using the second moment matrix.

Figure 1. Effect of scale (s = 3, 5, 7, 9; blue curves), base blur (β = 1, 10, 20, 40; green curves) and blur gradient (α = 0, 0.5, 1; red curves) on recall vs. 1−precision. Panels: (1) bikes, (2) trees, (3) graffiti, (4) wall, (5) bark, (6) boat, (7) leuven, (8) ubc.

Figure 2. Comparison with other features (gb, sift, sc, spin, mom, jla): recall vs. 1−precision on (1) bikes, (2) trees, (3) graffiti, (4) wall, (5) bark, (6) boat, (7) leuven, (8) ubc.

Figure 3. Rows 1-2: top 8 categories and their alignments (left to right: yin yang, Faces easy, Faces, pizza, barrel, car side, stop sign, Motorbikes), with average fractional overlaps 0.9781, 0.9717, 0.9642, 0.9490, 0.9486, 0.9483, 0.9405, 0.9223. Rows 3-4: worst 8 categories and their alignments (left to right: brontosaurus, ceiling fan, emu, crayfish, seahorse, cougar body, mayfly, wrench), with average fractional overlaps 0.7097, 0.6934, 0.6919, 0.6658, 0.6614, 0.6444, 0.6426, 0.3318.

• Maximally Stable Extremal Regions (MSER) detector finds regions such that the pixels inside the region have either higher (bright extremal regions) or lower (dark extremal regions) intensity than all the pixels on its outer boundary.

• Uniform detector (unif) selects 500 points uniformly on the edge maps by rejection sampling.

Figure 5 shows the performance of the various interest point detectors for different distance thresholds (in pixels). The y-axis is the number of matches normalized by the product of the number of points in each image, plotted against the distance threshold. We see that all the local feature detectors perform similarly, with the Harris-affine/Harris-Laplace detectors performing best in most cases. Note that for the strict matching in Mikolajczyk's framework, MSER was shown to be the most repeatable.
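
Reusing the ground_truth_matches sketch above, the repeatability score plotted in Figure 5 can be illustrated as follows; the threshold range of 2-10 pixels follows the figure's x-axis, and the exact scaling of the normalization is an assumption.

```python
def repeatability(pts1, pts2, scale, tx, ty, thresholds=range(2, 11)):
    """Number of matches divided by the product of the point counts, per threshold."""
    scores = []
    for t in thresholds:
        m = ground_truth_matches(pts1, pts2, scale, tx, ty, thresh=float(t))
        scores.append(len(m) / max(len(pts1) * len(pts2), 1))
    return scores
```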

3.2. Comparison of the Feature Performance

We can now test the performance of the feature descriptors given the ground-truth correspondence computed in the previous steps. We compare the following descriptors with the geometric blur descriptor:

• Scale Invariant Feature Transform (SIFT): a local image patch is divided into a grid (typically 4x4) and an orientation histogram is computed for each of these cells.

• Shape contexts compute a histogram of the distance and orientation of other points relative to the interest point.

• Image moments compute the descriptor by taking various higher-order image moments.

• Jet descriptors are essentially higher-order derivatives of the image at the interest point.

• Gradient Location and Orientation Histogram (GLOH): as the name suggests, it constructs a feature from the histogram of the location and orientation of points in a window around the interest point.

Figure 6 shows the performance of the various descriptors as recall vs. 1−precision plots. We find that the SIFT, GB and SC features perform best in most cases. The GLOH descriptor, which worked best on the earlier dataset, performs poorly on this one.

4. Conclusions and Future Work

In this work we provide a framework for comparing the performance of features where the goal is object recognition. We then perform a comparison study of the features on the Caltech 101 dataset, which is widely used for evaluating the performance of object recognition algorithms. The trends are likely to change even more on the PASCAL dataset, where there is even more variation within the classes. We find that the popular interest point detectors do not perform as well as they do on the affine dataset of Mikolajczyk. Geometric blur, SIFT and shape context features perform best on these datasets. The performance of SIFT and SC is highly correlated, while that of GB and SIFT is highly negatively correlated; we observe the same trend on the affine dataset. A further investigation into the reasons for this would be insightful. A better alignment model, e.g. thin plate splines, could be used to get a better ground truth for the dataset, and including rotations would align categories like ceiling fan much better.

Figure 5. Repeatability (fraction of correct matches vs. distance threshold) of various interest point detectors (hesaff, haraff, heslap, harlap, har, mser, unif) on (1) yin yang, (2) Faces, (3) Faces (easy), (4) pizza, (5) barrel, (6) car side, (7) stop sign, (8) Motorbikes.

Figure 6. Descriptor performance on Caltech 101 (recall vs. 1−precision) for various descriptors (sift, sc, mom, gloh, gb, jet) on yin yang, Faces, Faces Easy, pizza, barrel, car side, stop sign and Motorbikes.

Figure 4. Ground-truth matches for Faces, car side, stop sign and Motorbikes, using the Harris-affine detector with a distance threshold of 5 pixels.

5. Acknowledgements

Thanks to Prof. Malik and Alex Berg for the Geometric Blur code and interesting discussions. Thanks to Prof. Ruzena Bajcsy, Prof. Shankar Sastry and Dr. Allen Yang for the wonderful course.

References

[1] K. Mikolajczyk and C. Schmid. Scale and affine invariant interest point detectors (Harris-affine, Hessian-affine). IJCV, 60(1):63-86, 2004.

[2] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool. A comparison of affine region detectors. IJCV, 65(1/2):43-72, 2005.

[3] K. Mikolajczyk and C. Schmid. A performance evaluation of local descriptors. PAMI, 27(10):1615-1630, 2005.

[4] A. C. Berg. Shape Matching and Object Recognition. Ph.D. thesis, Computer Science Division, U.C. Berkeley, December 2005.

[5] L. Fei-Fei, R. Fergus and P. Perona. One-shot learning of object categories. IEEE Trans. Pattern Analysis and Machine Intelligence, 28(4):594-611, 2006.