Bag-of-features for category recognition
Transcript of Bag-of-features for category recognition
![Page 1: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/1.jpg)
Bag-of-features for category recognition
Cordelia Schmid
![Page 2: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/2.jpg)
Bag-of-features for image classification
• Origin: texture recognition
• Texture is characterized by the repetition of basic elements, or textons
Julesz, 1981; Cula & Dana, 2001; Leung & Malik 2001; Mori, Belongie & Malik, 2001;
Schmid 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
![Page 3: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/3.jpg)
Texture recognition
Universal texton dictionary
histogram
![Page 4: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/4.jpg)
Bag-of-features for image classification
• Origin: bag-of-words
• Orderless document representation: frequencies of words from a dictionary
• Classification to determine document categories
![Page 5: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/5.jpg)
Bag-of-features for image classification
Extract regions → Compute descriptors → Find clusters and frequencies → Compute distance matrix → Classification (SVM)
[Nowak, Jurie & Triggs, ECCV’06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV’07]
![Page 6: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/6.jpg)
Bag-of-features for image classification
Step 1: Extract regions, compute descriptors → Step 2: Find clusters and frequencies → Step 3: Compute distance matrix, classification (SVM)
[Nowak, Jurie & Triggs, ECCV’06], [Zhang, Marszalek, Lazebnik & Schmid, IJCV’07]
![Page 7: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/7.jpg)
Bag-of-features for image classification
• Excellent results in the presence of background clutter
bikes books building cars people phones trees
![Page 8: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/8.jpg)
Examples of misclassified images
• Books misclassified as: faces, faces, buildings
• Buildings misclassified as: faces, trees, trees
• Cars misclassified as: buildings, phones, phones
![Page 9: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/9.jpg)
Bag-of-features for image classification
Classification
SVM
Extract regions Compute descriptors
Find clusters and frequencies
Compute distance matrix
[Nowak,Jurie&Triggs,ECCV’06], [Zhang,Marszalek,Lazebnik&Schmid,IJCV’07]
Step 1 Step 2 Step 3
![Page 10: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/10.jpg)
Step 1: feature extraction
• Scale-invariant image regions + SIFT
  – Selection of characteristic points (Harris-Laplace and Laplacian detectors)
![Page 11: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/11.jpg)
Step 1: feature extraction
• Scale-invariant image regions + SIFT
  – Robust description of the extracted image regions
• SIFT [Lowe’99]
  – 8 orientations of the gradient
  – 4x4 spatial grid
  – 3D histogram over the image patch: gradient orientation × x × y
![Page 12: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/12.jpg)
Step 1: feature extraction
• Scale-invariant image regions + SIFT
  – Selection of characteristic points
  – Robust description of these characteristic points
  – Affine-invariant regions give “too much” invariance
  – Rotation invariance is in many cases “too much” invariance
![Page 13: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/13.jpg)
Step 1: feature extraction
• Scale-invariant image regions + SIFT
  – Selection of characteristic points
  – Robust description of these characteristic points
  – Affine-invariant regions give “too much” invariance
  – Rotation invariance is in many cases “too much” invariance
• Dense descriptors
  – Improve results in the context of categories (for most categories)
  – Interest points do not necessarily capture “all” features
![Page 14: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/14.jpg)
Dense features
- Multi-scale dense grid: extraction of small overlapping patches at multiple scales
- Computation of a SIFT descriptor for each grid cell
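As a concrete sketch, multi-scale dense sampling is a nested loop over scales and grid positions; the patch sizes and step below are illustrative choices, not the values used in these slides:

```python
def dense_grid(width, height, patch_sizes=(16, 24, 32), step=8):
    """Yield (x, y, size) for overlapping patches fully inside the image.

    Patches at neighboring grid positions overlap because step < size;
    each patch would then be described with a SIFT descriptor.
    """
    patches = []
    for size in patch_sizes:
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                patches.append((x, y, size))
    return patches

patches = dense_grid(64, 48)
```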
![Page 15: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/15.jpg)
Step 1: feature extraction
• Scale-invariant image regions + SIFT (see lecture 2)
• Dense descriptors
• Color-based descriptors
• Shape-based descriptors
![Page 16: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/16.jpg)
Bag-of-features for image classification
Classification
SVM
Extract regions Compute descriptors
Find clusters and frequencies
Compute distance matrix
Step 1 Step 2 Step 3
![Page 17: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/17.jpg)
Step 2: Quantization
![Page 18: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/18.jpg)
Step 2: Quantization
Clustering
![Page 19: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/19.jpg)
Step 2: Quantization
Clustering
Visual vocabulary
![Page 20: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/20.jpg)
Examples for visual words
Airplanes
Motorbikes
Faces
Wild Cats
Leaves
People
Bikes
![Page 21: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/21.jpg)
Step 2: Quantization
• Cluster descriptors
  – K-means
  – Gaussian mixture model
• Assign each descriptor to a cluster (visual word)
  – Hard or soft assignment
• Build frequency histogram
![Page 22: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/22.jpg)
K-means clustering
• We want to minimize the sum of squared Euclidean distances between points x_i and their nearest cluster centers m_k:

  D = Σ_k Σ_{i ∈ cluster k} ||x_i − m_k||²

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  – Assign each data point to the nearest center
  – Recompute each cluster center as the mean of all points assigned to it
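The two-step iteration above can be sketched in plain Python; the toy data and the fixed iteration count are illustrative:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means: random init, then alternate assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each center becomes the mean of its assigned points.
        for j, cl in enumerate(clusters):
            if cl:
                centers[j] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return centers

# Two well-separated blobs; k-means recovers their means.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = sorted(kmeans(points, 2), key=lambda c: c[0])
```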
![Page 23: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/23.jpg)
K-means clustering
• Converges only to a local minimum; the solution depends on the initialization
• Initialization is important: run several times and select the best solution (minimum cost)
![Page 24: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/24.jpg)
From clustering to vector quantization
• Clustering is a common method for learning a visual vocabulary or codebook
  – Unsupervised learning process
  – Each cluster center produced by k-means becomes a codevector
  – Provided the training set is sufficiently representative, the codebook will be “universal”

• The codebook is used for quantizing features
  – A vector quantizer takes a feature vector and maps it to the index of the nearest codevector in the codebook
  – Codebook = visual vocabulary
  – Codevector = visual word
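A minimal vector quantizer, assuming a toy 2-D codebook for illustration:

```python
def quantize(descriptor, codebook):
    """Map a feature vector to the index of the nearest codevector (visual word)."""
    return min(range(len(codebook)),
               key=lambda j: sum((a - b) ** 2
                                 for a, b in zip(descriptor, codebook[j])))

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # hypothetical codevectors
word = quantize((0.9, 0.2), codebook)             # nearest codevector is (1.0, 0.0)
```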
![Page 25: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/25.jpg)
Visual vocabularies: Issues
• How to choose vocabulary size?
  – Too small: visual words not representative of all patches
  – Too large: quantization artifacts, overfitting
• Computational efficiency
  – Vocabulary trees (Nister & Stewenius, 2006)
• Soft quantization: Gaussian mixture instead of k-means
![Page 26: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/26.jpg)
Hard or soft assignment
• K-means: hard assignment
  – Assign each descriptor to the closest cluster center
  – Count the number of descriptors assigned to each center
• Gaussian mixture model: soft assignment
  – Estimate the distance to all centers
  – Sum the soft assignments over all descriptors
• Result: frequency histogram
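A sketch of the two assignment schemes; the soft version uses an isotropic Gaussian per center as a simple stand-in for full GMM posteriors (sigma is an illustrative parameter):

```python
import math

def hard_histogram(descriptors, centers):
    """Hard assignment: each descriptor votes for its single nearest center."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        j = min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(d, centers[c])))
        hist[j] += 1.0
    return hist

def soft_histogram(descriptors, centers, sigma=1.0):
    """Soft assignment: each descriptor spreads a unit of weight over all centers."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        w = [math.exp(-sum((a - b) ** 2 for a, b in zip(d, c)) / (2 * sigma ** 2))
             for c in centers]
        s = sum(w)
        for j in range(len(centers)):
            hist[j] += w[j] / s
    return hist

centers = [(0.0, 0.0), (4.0, 0.0)]
descriptors = [(0.0, 0.0), (0.5, 0.0), (4.0, 0.0)]
hard = hard_histogram(descriptors, centers)
soft = soft_histogram(descriptors, centers)
```

Both histograms sum to the number of descriptors; the soft one distributes fractional counts over nearby centers instead of all-or-nothing votes.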
![Page 27: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/27.jpg)
Image representation
[Figure: bar histogram of frequency over the codewords]
![Page 28: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/28.jpg)
Bag-of-features for image classification
Classification
SVM
Extract regions Compute descriptors
Find clusters and frequencies
Compute distance matrix
Step 1 Step 2 Step 3
![Page 29: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/29.jpg)
Step 3: Classification
• Learn a decision rule (classifier) assigning bag-of-features representations of images to different classes
[Figure: zebra and non-zebra feature points separated by a decision boundary]
![Page 30: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/30.jpg)
Classification
• Assign input vector to one of two or more classes
• Any decision rule divides input space into decision regions separated by decision boundaries
![Page 31: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/31.jpg)
Nearest Neighbor Classifier
• Assign label of nearest training data point to each test data point
Voronoi partitioning of feature space for 2-category 2-D and 3-D data
from Duda et al.
Source: D. Lowe
![Page 32: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/32.jpg)
K-Nearest Neighbors
• For a new point, find the k closest points from the training data (e.g. k = 5)
• Labels of the k points “vote” to classify
• Works well provided there is lots of data and the distance function is good

Source: D. Lowe
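A minimal k-NN classifier over a toy 2-D training set; squared Euclidean distance is used here as an assumption, but any distance function works:

```python
from collections import Counter

def knn_classify(query, train, k=5):
    """train is a list of (point, label); the labels of the k nearest points vote."""
    nearest = sorted(train,
                     key=lambda pl: sum((a - b) ** 2
                                        for a, b in zip(query, pl[0])))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "a"), ((1, 1), "a"),
         ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
```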
![Page 33: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/33.jpg)
Linear classifiers
• Find a linear function (hyperplane) to separate positive and negative examples:

  x_i positive:  w · x_i + b ≥ 0
  x_i negative:  w · x_i + b < 0

Which hyperplane is best?
SVM (Support vector machine)
![Page 34: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/34.jpg)
Functions for comparing histograms
• L1 distance:

  D(h1, h2) = Σ_{i=1}^{N} |h1(i) − h2(i)|

• χ² distance:

  D(h1, h2) = Σ_{i=1}^{N} (h1(i) − h2(i))² / (h1(i) + h2(i))

• Quadratic distance (cross-bin):

  D(h1, h2) = Σ_{i,j} A_ij (h1(i) − h2(j))²
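The L1 and χ² distances translate directly to code; the small eps guarding against empty bins is an implementation detail, not part of the formula:

```python
def l1_distance(h1, h2):
    """D(h1, h2) = sum_i |h1(i) - h2(i)|"""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def chi2_distance(h1, h2, eps=1e-10):
    """D(h1, h2) = sum_i (h1(i) - h2(i))^2 / (h1(i) + h2(i))"""
    return sum((a - b) ** 2 / (a + b + eps) for a, b in zip(h1, h2))

h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
```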
![Page 35: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/35.jpg)
Kernels for bags of features
• Histogram intersection kernel:

  I(h1, h2) = Σ_{i=1}^{N} min(h1(i), h2(i))

• Generalized Gaussian kernel:

  K(h1, h2) = exp(−(1/A) · D(h1, h2)²)

• D can be the Euclidean distance, χ² distance, Earth Mover’s Distance, etc.
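A sketch of both kernels; the Euclidean distance is used as the plug-in D here, as one of the options listed above, and A = 1 is an illustrative choice:

```python
import math

def intersection_kernel(h1, h2):
    """I(h1, h2) = sum_i min(h1(i), h2(i))"""
    return sum(min(a, b) for a, b in zip(h1, h2))

def euclidean(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def gaussian_kernel(h1, h2, dist=euclidean, A=1.0):
    """K(h1, h2) = exp(-(1/A) * D(h1, h2)^2)"""
    return math.exp(-dist(h1, h2) ** 2 / A)

h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
```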
![Page 36: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/36.jpg)
Chi-square kernel
• Multi-channel chi-square kernel:

  K(H_i, H_j) = exp(−Σ_c (1/A_c) D_c(H_i, H_j))

  ● Channel c is a combination of detector and descriptor
  ● D_c(H_i, H_j) is the chi-square distance between histograms:

    D_c(H_1, H_2) = (1/2) Σ_{i=1}^{m} [h_1(i) − h_2(i)]² / (h_1(i) + h_2(i))

  ● A_c is the mean value of the distances between all training samples
  ● Extension: learning of the weights, for example with MKL (multiple kernel learning)
![Page 37: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/37.jpg)
Pyramid match kernel
• Weighted sum of histogram intersections at multiple resolutions (linear in the number of features instead of cubic)
• Approximates the optimal partial matching between sets of features
![Page 38: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/38.jpg)
Pyramid Match
Histogram intersection
![Page 39: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/39.jpg)
• The difference in histogram intersections across levels counts the number of new pairs matched:
  (matches at this level) − (matches at the previous level)
Histogram intersection
Pyramid Match
![Page 40: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/40.jpg)
Pyramid match kernel
• Weights inversely proportional to bin size (a measure of the difficulty of a match at level i)
• Kernel value: weighted sum, over the histogram pyramids, of the number of newly matched pairs at each level i
• Normalize kernel values to avoid favoring large sets
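A toy 1-D version of the pyramid match: features are integer bin coordinates, the bin width doubles per level, and each level's newly matched pairs get weight 1/2^i. The 1-D setting and the parameters are simplifications for illustration:

```python
def histogram(points, level, nbins):
    """1-D histogram; at level i the bin width is 2**i, so bins get coarser."""
    width = 2 ** level
    hist = [0] * ((nbins + width - 1) // width)
    for p in points:
        hist[p // width] += 1
    return hist

def pyramid_match(x, y, levels=3, nbins=8):
    """Weighted sum of *new* histogram intersections, weight halving per level."""
    score, prev = 0.0, 0
    for i in range(levels):
        inter = sum(min(a, b) for a, b in zip(histogram(x, i, nbins),
                                              histogram(y, i, nbins)))
        score += (inter - prev) / (2 ** i)   # new matches at level i, weight 1/2^i
        prev = inter
    return score
```

For identical sets every feature matches at the finest level, so the score equals the set size; near-misses are caught at coarser levels with reduced weight.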
![Page 41: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/41.jpg)
Example pyramid match: Level 0
![Page 42: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/42.jpg)
Example pyramid match: Level 1
![Page 43: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/43.jpg)
Example pyramid match: Level 2
![Page 44: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/44.jpg)
Example pyramid match
[Figure: pyramid match score compared with the optimal match score]
![Page 45: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/45.jpg)
Summary: Pyramid match kernel
• Approximates the optimal partial matching between sets of features
• Kernel value: sum over levels of the number of new matches at level i, weighted by the difficulty of a match at level i
![Page 46: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/46.jpg)
Spatial pyramid matching
• Add spatial information to the bag-of-features
• Perform matching in 2D image space
[Lazebnik, Schmid & Ponce, CVPR 2006]
![Page 47: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/47.jpg)
Related work
Similar approaches:
• Subblock description [Szummer & Picard, 1997]
• SIFT [Lowe, 1999, 2004]
• GIST [Torralba et al., 2003]
![Page 48: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/48.jpg)
Spatial pyramid representation
• Locally orderless representation at several levels of spatial resolution (level 0)
![Page 49: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/49.jpg)
Spatial pyramid representation
• Locally orderless representation at several levels of spatial resolution (levels 0 and 1)
![Page 50: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/50.jpg)
Spatial pyramid representation
• Locally orderless representation at several levels of spatial resolution (levels 0, 1, and 2)
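A sketch of the spatial pyramid representation: per-cell visual-word histograms concatenated over a 1x1, 2x2, 4x4, ... grid. Normalized coordinates and the unweighted concatenation are simplifications; the full method weights levels as in the pyramid match kernel:

```python
def spatial_pyramid(features, vocab_size, levels=2):
    """features: list of (x, y, word) with x, y in [0, 1).

    For each level a g x g grid (g = 2**level) partitions the image;
    each cell gets its own word histogram and all are concatenated.
    """
    rep = []
    for level in range(levels + 1):
        g = 2 ** level
        cells = [[0] * vocab_size for _ in range(g * g)]
        for x, y, w in features:
            cells[int(y * g) * g + int(x * g)][w] += 1
        for cell in cells:
            rep.extend(cell)
    return rep

# Two features in opposite corners, vocabulary of 2 words, levels 0 and 1.
features = [(0.1, 0.1, 0), (0.9, 0.9, 1)]
rep = spatial_pyramid(features, vocab_size=2, levels=1)
```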
![Page 51: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/51.jpg)
Spatial pyramid matching
• Combination of spatial levels with the pyramid match kernel [Grauman & Darrell’05]
![Page 52: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/52.jpg)
Scene classification
| L | Single-level | Pyramid |
|---|---|---|
| 0 (1x1) | 72.2 ±0.6 | |
| 1 (2x2) | 77.9 ±0.6 | 79.0 ±0.5 |
| 2 (4x4) | 79.4 ±0.3 | 81.1 ±0.3 |
| 3 (8x8) | 77.2 ±0.4 | 80.7 ±0.3 |
![Page 53: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/53.jpg)
Retrieval examples
![Page 54: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/54.jpg)
Category classification – CalTech101
| L | Single-level | Pyramid |
|---|---|---|
| 0 (1x1) | 41.2 ±1.2 | |
| 1 (2x2) | 55.9 ±0.9 | 57.0 ±0.8 |
| 2 (4x4) | 63.6 ±0.9 | 64.6 ±0.8 |
| 3 (8x8) | 60.3 ±0.9 | 64.6 ±0.7 |
Bag-of-features approach by Zhang et al.’07: 54 %
![Page 55: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/55.jpg)
CalTech101
Easiest and hardest classes
• Sources of difficulty:
  – Lack of texture
  – Camouflage
  – Thin, articulated limbs
  – Highly deformable shape
![Page 56: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/56.jpg)
Discussion
• Summary
  – Spatial pyramid representation: appearance of local image patches + coarse global position information
  – Substantial improvement over bag-of-features
  – Performance depends on the similarity of image layout

• Extensions
  – Integrating different types of features, learning weights, use of different grids [Zhang’07, Bosch & Zisserman’07, Varma et al.’07, Marszalek et al.’07]
  – Flexible, object-centered grid
![Page 57: Bag-of-features for category recognition](https://reader036.fdocuments.us/reader036/viewer/2022062500/56815432550346895dc2309e/html5/thumbnails/57.jpg)
Evaluation of image classification
• Image classification task: PASCAL VOC 2007-2009
• Precision-recall curves for evaluation
• Mean average precision (mAP)
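Average precision over a ranked result list, and mAP as its mean over classes, can be sketched as follows. This is the basic non-interpolated AP; the official PASCAL VOC protocol uses an interpolated variant:

```python
def average_precision(ranked_labels):
    """AP over a ranked list of binary relevance labels (1 = correct class):
    the mean of the precision values at each rank where a positive occurs."""
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_average_precision(per_class_rankings):
    """mAP: the average of the per-class AP values."""
    return sum(average_precision(r) for r in per_class_rankings) / len(per_class_rankings)
```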