
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost

Florent Perronnin 1

Work published at ECCV 2012 with: Thomas Mensink 1,2, Jakob Verbeek 2, Gabriela Csurka 1

1 Xerox Research Centre Europe, 2 INRIA

NIPS BigVision Workshop, December 7, 2012

Motivation

Real-life image datasets are always evolving:
• new images are added every second
• new labels, tags, faces and products appear over time
• for example: Facebook, Flickr, Twitter, Amazon, ...

We need to annotate these items for indexing and retrieval.

Therefore, we are interested in methods for large-scale visual classification where we can add new images and new classes at near-zero cost, on the fly.

Outline

1. Introduction

2. Distance Based Classifiers

3. Metric learning for NCM Classifier

4. Experimental Evaluation

5. Conclusion


Introduction
Recent focus on large-scale image classification:

• ImageNet data set [1]
• Currently over 14 million images and 20 thousand classes

Standard large-scale classification pipeline:
• High-dimensional features: Super Vector [3] and Fisher Vector [4]
• Linear 1-vs-Rest SVM classifiers [2,3,4]
• Stochastic Gradient Descent (SGD) training [3,4]

→ In this work, we take the features for granted and focus on the learning problem.

1. Deng et al., ImageNet: A large-scale hierarchical image database, CVPR'09
2. Deng et al., What does classifying 10,000 image categories tell us?, ECCV'10
3. Lin et al., Large-scale image classification: Fast feature extraction, CVPR'11
4. Sanchez and Perronnin, High-dimensional signature compression for large-scale image classification, CVPR'11

Challenges of open-ended datasets
1-vs-Rest + SGD might look ideal for our problem:

• 1-vs-Rest: classes are trained independently
• SGD: the online algorithm can accommodate new data

Still, several issues need to be addressed:
• Given a new sample, feed it to all classifiers? → costly and suboptimal [1]
• How to balance the negatives and positives?
• How to regularize (and choose the step size)?

→ We turn to distance-based classifiers.

1. Perronnin et al., Towards good practice in large-scale learning for image classification, CVPR'12


Distance Based Classifiers

Classify based on the distance between images, or between an image and class representatives:

• k-Nearest Neighbors
• Nearest Class Mean Classification

Trivial addition of new images or new classes

Critically depends on the distance function

k-Nearest Neighbor Classifier
Assign an image i to the most common class among the k closest images from the training set

✓ Very flexible non-linear model
✓ Easy to integrate new images
✓ Easy to integrate new classes
✗ Expensive at test time!

Metric Learning: Large Margin Nearest Neighbors (LMNN) [1]

1. Weinberger et al., Distance Metric Learning for LMNN Classification, NIPS'06

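As an illustration of this rule, here is a minimal NumPy sketch (not from the talk) of k-NN classification with an optional linear projection W standing in for a learned metric; the array names and shapes are assumptions.

import numpy as np

def knn_predict(X_train, y_train, x, k=5, W=None):
    """Assign x to the most common class among its k closest training images.

    If a projection W (m x D) is given, distances are computed in the projected
    space, i.e. d_W(x, x') = ||W x - W x'||^2.
    """
    if W is not None:
        X_train, x = X_train @ W.T, W @ x
    d2 = ((X_train - x) ** 2).sum(axis=1)      # squared distance to every training image
    nearest = np.argsort(d2)[:k]               # indices of the k closest images
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]           # majority vote

Note that every query touches the whole training set, which is exactly the "expensive at test time" drawback noted above.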

Nearest Class Mean Classifier
Assign an image i to the class with the closest class mean:

µc = (1/Nc) ∑_{i: yi = c} xi

c* = argmin_c d(x, µc)

✓ Very fast at test time: linear model
✓ Easy to integrate new images
✓ Easy to integrate new classes
✗ Class only represented by its mean: not flexible enough?

We introduce metric learning

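A minimal sketch of the NCM classifier, assuming NumPy arrays X (N x D features) and y (labels) plus an optional projection W; adding a new class only requires computing its mean.

import numpy as np

def class_means(X, y):
    """mu_c = (1/N_c) * sum_{i: y_i = c} x_i, for every class c."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def ncm_predict(x, classes, mus, W=None):
    """c* = argmin_c d(x, mu_c); with W given, d is ||W x - W mu_c||^2."""
    if W is not None:
        x, mus = W @ x, mus @ W.T
    d2 = ((mus - x) ** 2).sum(axis=1)          # squared distance to every class mean
    return classes[np.argmin(d2)]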


Mahalanobis Distance Learning

d(x, x′) = (x − x′)ᵀ M (x − x′)

dW(x, x′) = ||Wx − Wx′||₂²

1. M = I: Euclidean distance
• Likely to be suboptimal

2. M: D × D, full Mahalanobis distance
• Huge number of parameters for large D
• Expensive to compute distances: O(D²)

3. M = WᵀW: low-rank projection W: m × D
• Controllable number of parameters: m × D
• Allows for compression of images to only m dimensions
• Cheap computation of distances: O(m²)

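To make the factored option concrete, here is a small sketch (dimensions are illustrative, not from the talk) checking that the Mahalanobis distance with M = WᵀW equals the squared Euclidean distance between the m-dimensional projections.

import numpy as np

rng = np.random.default_rng(0)
D, m = 512, 64                          # feature and projection dimensions (illustrative)
x, xp = rng.normal(size=D), rng.normal(size=D)
W = rng.normal(size=(m, D))             # low-rank projection: m x D parameters

M = W.T @ W                             # full D x D Mahalanobis matrix
d_full = (x - xp) @ M @ (x - xp)        # quadratic form on the original features

d_low = np.sum((W @ x - W @ xp) ** 2)   # project once, then compare in m dimensions

assert np.allclose(d_full, d_low)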

NCM Metric Learning (NCMML)

Probabilistic formulation using the soft-min function:

p(c|x) = exp(−dW(x, µc)) / ∑_{c′=1..C} exp(−dW(x, µc′))

Corresponds to the class posterior in a generative model:
→ p(x|c) = N(x; µc, Σ), with a shared covariance matrix

Crucial point: the parameters W and {µc, c = 1, ..., C} can be learned independently on different data subsets.

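A sketch of the soft-min posterior above, computed with a standard log-sum-exp shift for numerical stability; mus holds the C class means (as rows) and W the projection, and these names are assumptions.

import numpy as np

def ncm_posteriors(x, mus, W):
    """p(c|x) = exp(-d_W(x, mu_c)) / sum_{c'} exp(-d_W(x, mu_c'))."""
    z = -np.sum((W @ x - mus @ W.T) ** 2, axis=1)   # -d_W(x, mu_c), shape (C,)
    z -= z.max()                                     # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()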
NCM Metric Learning (NCMML)

Discriminative maximum likelihood training:
• We maximize with respect to W:

L(W) = ∑_{i=1..N} ln p(yi|xi)

• Implicit regularization through the rank of W

Stochastic Gradient Descent (SGD): at time t
• Pick a random sample (xt, yt)
• Update:

W(t) = W(t−1) + ηt ∇W ln p(yt|xt), with the gradient evaluated at W = W(t−1)

→ Mini-batches are more efficient

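A sketch of a single SGD step on this objective; the explicit gradient expression is my own derivation from the soft-min formulation above (the slides only state the abstract update), so treat it as an assumption to verify.

import numpy as np

def ncmml_sgd_step(W, x, y, mus, eta=0.01):
    """One ascent step on ln p(y|x) with respect to the projection W."""
    z = -np.sum((W @ x - mus @ W.T) ** 2, axis=1)    # -d_W(x, mu_c) for every class
    p = np.exp(z - z.max()); p /= p.sum()            # soft-min posteriors p(c|x)
    coeff = p.copy()
    coeff[y] -= 1.0                                  # p(c|x) - [c == y]
    diffs = x - mus                                  # x - mu_c, shape (C, D)
    # grad of ln p(y|x):  2 W sum_c (p_c - [c==y]) (x - mu_c)(x - mu_c)^T
    grad = 2.0 * ((W @ diffs.T) * coeff) @ diffs
    return W + eta * grad                            # gradient ascent on the log-likelihood

Averaging this gradient over a mini-batch before the update gives the more efficient variant mentioned on the slide.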
Illustration of Learned Distances

Relationship to FDA

• Three non-linearly separable classes
• Fisher Discriminant Analysis: maximizes the variance between all class means
• NCMML: maximizes the variance between nearby class means

Relation to other linear classifiers

fc(x) = bc + wcᵀx

Linear SVM
• Learn {bc, wc} per class

WSABIE [1]
• wc = Wᵀvc, with a shared W: m × D
• Learn {vc} per class and the shared W

Nearest Class Mean
• bc = ||Wµc||₂², wc = −2 WᵀW µc
• Learn only the shared W (here the class with the smallest score is chosen, since ||Wx||₂² is the same for all classes)

1. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11

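The following check (random data, purely illustrative) confirms that the NCM decision can indeed be written in this linear form, with the smallest score winning.

import numpy as np

rng = np.random.default_rng(1)
C, D, m = 10, 128, 32
mus = rng.normal(size=(C, D))            # class means
W = rng.normal(size=(m, D))              # shared learned projection
x = rng.normal(size=D)

# distance-based decision: argmin_c ||W x - W mu_c||^2
d2 = np.sum((W @ x - mus @ W.T) ** 2, axis=1)

# linear form: f_c(x) = b_c + w_c^T x with b_c = ||W mu_c||^2 and w_c = -2 W^T W mu_c
b = np.sum((mus @ W.T) ** 2, axis=1)
w = -2.0 * (mus @ W.T) @ W               # row c is w_c^T
scores = b + w @ x

assert np.argmin(d2) == np.argmin(scores)   # smallest score = nearest class mean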

Experimental Evaluation

Data sets:
• ILSVRC'10: 1,000 classes; 1.2M training + 50K validation + 150K test images
• INET10K: ≈10K classes; 4.5M training + 50K validation + 4.5M test images

Features:
• 4K- and 64K-dimensional Fisher Vectors [1]
• PQ compression of the 64K features [2]

1. Perronnin et al., Improving the Fisher kernel for image classification, ECCV'10
2. Jegou et al., Product quantization for nearest neighbor search, PAMI'11


Evaluation: ILSVRC'10 (Top-5 accuracy)
• k-NN and NCM improve with metric learning
• NCM outperforms the more flexible k-NN
• NCM is competitive with SVM and WSABIE

4K Fisher Vectors, Top-5 accuracy (%):

Projection dimensionality    256    512    1024   ℓ2
k-NN, LMNN [1], dynamic      61.0   60.9   59.6   44.1
NCM, learned metric          62.6   63.0   63.0   32.0
WSABIE [2]                   61.6   61.3   61.5   -

Baseline: 1-vs-Rest SVM: 61.8

1. Weinberger et al., Distance Metric Learning for LMNN Classification, NIPS'06
2. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11

Generalization on INET10K (Top-1 accuracy)
Nearest Class Mean classifier:
• Compute the means of the 10K classes in about 1 CPU hour
• Re-use the metric learned on ILSVRC'10

1-vs-Rest SVM baseline:
• Train 10K SVM classifiers in about 280 CPU days

Method        NCM    SVM    SVM [1]   SVM [2]   DL [3]
Feat. dim.    64K    64K    21K       128K      ≈60K
Flat top-1    13.9   21.9   6.4       19.1      19.2

1. Deng et al., What does classifying 10,000 image categories tell us?, ECCV'10
2. Perronnin et al., Good practice in large-scale image classification, CVPR'12
3. Le et al., Building high-level features using large scale unsupervised learning, ICML'12


Transfer Learning - Zero-Shot Prior
Use the ImageNet class hierarchy to estimate a mean for a new class [1]

[Figure: class hierarchy with internal nodes, training nodes, and the new class]

1. Rohrbach et al., Evaluating knowledge transfer and zero-shot learning in a large-scale setting, CVPR'11


Transfer Learning - Results on ILSVRC'10

Step 1: Metric learning on 800 classes
Step 2: Estimate the means of the remaining 200 classes for evaluation:
• Data mean (Maximum Likelihood)
• Zero-Shot prior + data mean (Maximum a Posteriori)

[Figure: Top-5 accuracy as a function of the number of samples per class (0, 1, 10, 100, 1000), comparing the two mean estimates; accuracy axis from 0 to 80]


Conclusion
Nearest Class Mean (NCM) Classification

We proposed NCM Metric Learning:
• Outperforms k-NN, on par with SVM and WSABIE

Advantages of NCM over alternatives:
• Allows adding new images and classes at near-zero cost
• Shows competitive results on unseen classes
• Can benefit from class priors for small sample sizes

Further improvements:
• Extension using multiple class centroids [1]

1. Mensink et al., Large Scale Metric Learning for Distance-Based Image Classification, tech report, 2012
