
Metric Learning for Large-Scale Image Classification: Generalizing to New Classes at Near-Zero Cost

Florent Perronnin 1

Work published at ECCV 2012 with: Thomas Mensink 1,2, Jakob Verbeek 2, Gabriela Csurka 1

1 Xerox Research Centre Europe, 2 INRIA

NIPS BigVision Workshop, December 7, 2012

Motivation

Real-life image datasets are always evolving:
• new images are added every second
• new labels, tags, faces and products appear over time
• for example: Facebook, Flickr, Twitter, Amazon, ...

We need to annotate these items for indexing and retrieval.

Therefore, we are interested in methods for large-scale visual classification where we can add new images and new classes at near-zero cost, on the fly.

Outline

1. Introduction

2. Distance Based Classifiers

3. Metric learning for NCM Classifier

4. Experimental Evaluation

5. Conclusion


Introduction
Recent focus on large-scale image classification:

• ImageNet data set [1]
• Currently over 14 million images and 20 thousand classes

Standard large-scale classification pipeline:
• High-dimensional features: Super Vector [3] and Fisher Vector [4]
• Linear 1-vs-Rest SVM classifiers [2,3,4]
• Stochastic Gradient Descent (SGD) training [3,4]

→ In this work, we take the features for granted and focus on the learning problem.

1. Deng et al., ImageNet: A large-scale hierarchical image database, CVPR'09
2. Deng et al., What does classifying 10,000 image categories tell us?, ECCV'10
3. Lin et al., Large-scale image classification: Fast feature extraction, CVPR'11
4. Sanchez and Perronnin, High-dimensional signature compression for large-scale image classification, CVPR'11

Challenges of open-ended datasets
1-vs-Rest + SGD might look ideal for our problem:

• 1-vs-Rest: classes are trained independently
• SGD: the online algorithm can accommodate new data

Still, several issues need to be addressed:
• Given a new sample, feed it to all classifiers? → costly and suboptimal [1]
• How to balance the negatives and positives?
• How to regularize (and choose the step size)?

→ We turn to distance-based classifiers.

1. Perronnin et al., Towards good practice in large-scale learning for image classification, CVPR'12


Distance Based Classifiers

Classify based on the distance between images, or between an image and class representatives:

• k-Nearest Neighbors
• Nearest Class Mean Classification

Trivial addition of new images or new classes

Critically depends on the distance function

k-Nearest Neighbor Classifier
Assign an image i to the most common class among the k closest images from the training set

✓ Very flexible non-linear model
✓ Easy to integrate new images
✓ Easy to integrate new classes
✗ Expensive at test time!

Metric Learning: Large Margin Nearest Neighbors (LMNN) [1]

1. Weinberger et al., Distance Metric Learning for LMNN Classification, NIPS'06

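As an illustration of this rule, here is a minimal NumPy sketch (not from the talk) of k-NN classification with an optional linear projection W standing in for a learned metric; the array names and shapes are assumptions.

import numpy as np

def knn_predict(X_train, y_train, x, k=5, W=None):
    """Assign x to the most common class among its k closest training images.

    If a projection W (m x D) is given, distances are computed in the projected
    space, i.e. d_W(x, x') = ||W x - W x'||^2.
    """
    if W is not None:
        X_train, x = X_train @ W.T, W @ x
    d2 = ((X_train - x) ** 2).sum(axis=1)      # squared distance to every training image
    nearest = np.argsort(d2)[:k]               # indices of the k closest images
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]           # majority vote

Note that every query touches the whole training set, which is exactly the "expensive at test time" drawback noted above.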

Nearest Class Mean Classifier
Assign an image i to the class with the closest class mean:

µc = (1/Nc) ∑_{i: yi = c} xi

c* = argmin_c d(x, µc)

✓ Very fast at test time: linear model
✓ Easy to integrate new images
✓ Easy to integrate new classes
✗ Class only represented by its mean: not flexible enough?

We introduce metric learning

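A minimal sketch of the NCM classifier, assuming NumPy arrays X (N x D features) and y (labels) plus an optional projection W; adding a new class only requires computing its mean.

import numpy as np

def class_means(X, y):
    """mu_c = (1/N_c) * sum_{i: y_i = c} x_i, for every class c."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def ncm_predict(x, classes, mus, W=None):
    """c* = argmin_c d(x, mu_c); with W given, d is ||W x - W mu_c||^2."""
    if W is not None:
        x, mus = W @ x, mus @ W.T
    d2 = ((mus - x) ** 2).sum(axis=1)          # squared distance to every class mean
    return classes[np.argmin(d2)]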


Mahalanobis Distance Learning

d(x, x′) = (x − x′)ᵀ M (x − x′)

dW(x, x′) = ||Wx − Wx′||₂²

1. M = I: Euclidean distance
• Likely to be suboptimal

2. M: D × D, full Mahalanobis distance
• Huge number of parameters for large D
• Expensive to compute distances: O(D²)

3. M = WᵀW: low-rank projection W: m × D
• Controllable number of parameters: m × D
• Allows for compression of images to only m dimensions
• Cheap computation of distances: O(m²)

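To make the factored option concrete, here is a small sketch (dimensions are illustrative, not from the talk) checking that the Mahalanobis distance with M = WᵀW equals the squared Euclidean distance between the m-dimensional projections.

import numpy as np

rng = np.random.default_rng(0)
D, m = 512, 64                          # feature and projection dimensions (illustrative)
x, xp = rng.normal(size=D), rng.normal(size=D)
W = rng.normal(size=(m, D))             # low-rank projection: m x D parameters

M = W.T @ W                             # full D x D Mahalanobis matrix
d_full = (x - xp) @ M @ (x - xp)        # quadratic form on the original features

d_low = np.sum((W @ x - W @ xp) ** 2)   # project once, then compare in m dimensions

assert np.allclose(d_full, d_low)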

NCM Metric Learning (NCMML)

Probabilistic formulation using the soft-min function:

p(c|x) = exp(−dW(x, µc)) / ∑_{c′=1..C} exp(−dW(x, µc′))

Corresponds to the class posterior in a generative model:
→ p(x|c) = N(x; µc, Σ), with a shared covariance matrix

Crucial point: the parameters W and {µc, c = 1, ..., C} can be learned independently on different data subsets.

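A sketch of the soft-min posterior above, computed with a standard log-sum-exp shift for numerical stability; mus holds the C class means (as rows) and W the projection, and these names are assumptions.

import numpy as np

def ncm_posteriors(x, mus, W):
    """p(c|x) = exp(-d_W(x, mu_c)) / sum_{c'} exp(-d_W(x, mu_c'))."""
    z = -np.sum((W @ x - mus @ W.T) ** 2, axis=1)   # -d_W(x, mu_c), shape (C,)
    z -= z.max()                                     # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()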
NCM Metric Learning (NCMML)

Discriminative maximum likelihood training:
• We maximize with respect to W:

L(W) = ∑_{i=1..N} ln p(yi|xi)

• Implicit regularization through the rank of W

Stochastic Gradient Descent (SGD): at time t
• Pick a random sample (xt, yt)
• Update:

W(t) = W(t−1) + ηt ∇W ln p(yt|xt), with the gradient evaluated at W = W(t−1)

→ Mini-batches are more efficient

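A sketch of a single SGD step on this objective; the explicit gradient expression is my own derivation from the soft-min formulation above (the slides only state the abstract update), so treat it as an assumption to verify.

import numpy as np

def ncmml_sgd_step(W, x, y, mus, eta=0.01):
    """One ascent step on ln p(y|x) with respect to the projection W."""
    z = -np.sum((W @ x - mus @ W.T) ** 2, axis=1)    # -d_W(x, mu_c) for every class
    p = np.exp(z - z.max()); p /= p.sum()            # soft-min posteriors p(c|x)
    coeff = p.copy()
    coeff[y] -= 1.0                                  # p(c|x) - [c == y]
    diffs = x - mus                                  # x - mu_c, shape (C, D)
    # grad of ln p(y|x):  2 W sum_c (p_c - [c==y]) (x - mu_c)(x - mu_c)^T
    grad = 2.0 * ((W @ diffs.T) * coeff) @ diffs
    return W + eta * grad                            # gradient ascent on the log-likelihood

Averaging this gradient over a mini-batch before the update gives the more efficient variant mentioned on the slide.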
Illustration of Learned Distances

Relationship to FDA

• Three non-linearly separable classes
• Fisher Discriminant Analysis: maximizes the variance between all class means
• NCMML: maximizes the variance between nearby class means

Relation to other linear classifiers

fc(x) = bc + wcᵀx

Linear SVM
• Learn {bc, wc} per class

WSABIE [1]
• wc = Wᵀvc, with a shared W: m × D
• Learn {vc} per class and the shared W

Nearest Class Mean
• bc = ||Wµc||₂², wc = −2 WᵀW µc
• Learn only the shared W (here the class with the smallest score is chosen, since ||Wx||₂² is the same for all classes)

1. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11

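The following check (random data, purely illustrative) confirms that the NCM decision can indeed be written in this linear form, with the smallest score winning.

import numpy as np

rng = np.random.default_rng(1)
C, D, m = 10, 128, 32
mus = rng.normal(size=(C, D))            # class means
W = rng.normal(size=(m, D))              # shared learned projection
x = rng.normal(size=D)

# distance-based decision: argmin_c ||W x - W mu_c||^2
d2 = np.sum((W @ x - mus @ W.T) ** 2, axis=1)

# linear form: f_c(x) = b_c + w_c^T x with b_c = ||W mu_c||^2 and w_c = -2 W^T W mu_c
b = np.sum((mus @ W.T) ** 2, axis=1)
w = -2.0 * (mus @ W.T) @ W               # row c is w_c^T
scores = b + w @ x

assert np.argmin(d2) == np.argmin(scores)   # smallest score = nearest class mean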

Experimental Evaluation

Data sets:
• ILSVRC'10: 1,000 classes; 1.2M training + 50K validation + 150K test images
• INET10K: ≈10K classes; 4.5M training + 50K validation + 4.5M test images

Features:
• 4K- and 64K-dimensional Fisher Vectors [1]
• PQ compression of the 64K features [2]

1. Perronnin et al., Improving the Fisher kernel for image classification, ECCV'10
2. Jegou et al., Product quantization for nearest neighbor search, PAMI'11


Evaluation: ILSVRC'10 (Top-5 accuracy)
• k-NN and NCM improve with metric learning
• NCM outperforms the more flexible k-NN
• NCM is competitive with SVM and WSABIE

4K Fisher Vectors, Top-5 accuracy (%):

Projection dimensionality    256    512    1024   ℓ2
k-NN, LMNN [1], dynamic      61.0   60.9   59.6   44.1
NCM, learned metric          62.6   63.0   63.0   32.0
WSABIE [2]                   61.6   61.3   61.5   -

Baseline: 1-vs-Rest SVM: 61.8

1. Weinberger et al., Distance Metric Learning for LMNN Classification, NIPS'06
2. Weston et al., Scaling up to large vocabulary image annotation, IJCAI'11

Generalization on INET10K (Top-1 accuracy)
Nearest Class Mean classifier:
• Compute the means of the 10K classes in about 1 CPU hour
• Re-use the metric learned on ILSVRC'10

1-vs-Rest SVM baseline:
• Train 10K SVM classifiers in about 280 CPU days

Method        NCM    SVM    SVM [1]   SVM [2]   DL [3]
Feat. dim.    64K    64K    21K       128K      ≈60K
Flat top-1    13.9   21.9   6.4       19.1      19.2

1. Deng et al., What does classifying 10,000 image categories tell us?, ECCV'10
2. Perronnin et al., Good practice in large-scale image classification, CVPR'12
3. Le et al., Building high-level features using large scale unsupervised learning, ICML'12


Transfer Learning - Zero-Shot Prior
Use the ImageNet class hierarchy to estimate a mean for a new class [1]

[Figure: class hierarchy with internal nodes, training nodes, and the new class]

1. Rohrbach et al., Evaluating knowledge transfer and zero-shot learning in a large-scale setting, CVPR'11


Transfer Learning - Results on ILSVRC'10

Step 1: Metric learning on 800 classes
Step 2: Estimate the means of the remaining 200 classes for evaluation:
• Data mean (Maximum Likelihood)
• Zero-Shot prior + data mean (Maximum a Posteriori)

[Figure: Top-5 accuracy as a function of the number of samples per class (0, 1, 10, 100, 1000), comparing the two mean estimates; accuracy axis from 0 to 80]


Conclusion
Nearest Class Mean (NCM) Classification

We proposed NCM Metric Learning:
• Outperforms k-NN, on par with SVM and WSABIE

Advantages of NCM over alternatives:
• Allows adding new images and classes at near-zero cost
• Shows competitive results on unseen classes
• Can benefit from class priors for small sample sizes

Further improvements:
• Extension using multiple class centroids [1]

1. Mensink et al., Large Scale Metric Learning for Distance-Based Image Classification, tech report, 2012
