Multi-layer Orthogonal Codebook for Image Classification
Presented by Xia Li
Outline
• Introduction
  – Motivation
  – Related work
• Multi-layer orthogonal codebook
• Experiments
• Conclusion
Image Classification
local feature extraction → visual codebook construction → vector quantization → spatial pooling → linear/nonlinear classifier

Sampling: dense, uniform over a grid vs. sparse, at interest points
• For object categorization, dense sampling offers better coverage. [Nowak, Jurie & Triggs, ECCV 2006]

Descriptor: SIFT
• Orientation histograms within sub-patches build the 4×4×8 = 128-dim SIFT descriptor vector. [David Lowe, 1999, 2004]

Image credits: F-F. Li, E. Nowak, J. Sivic
Image Classification
• Visual codebook construction
  – Supervised vs. unsupervised clustering
  – K-Means (typical choice), agglomerative clustering, mean-shift, …
• Vector quantization via clustering
  – Let cluster centers be the prototype "visual words" in descriptor space
  – Assign the closest cluster center to each new image patch descriptor
Image credits: K. Grauman, B. Leibe
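The assignment step described above can be sketched in a few lines of NumPy (toy data; the 4-word, 128-dim codebook sizes here are illustrative, not from the slides):

```python
import numpy as np

# Hard vector quantization: map each descriptor to its nearest cluster
# center ("visual word") under squared L2 distance.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 128))      # 4 visual words, SIFT-sized (128-dim)
descriptors = rng.normal(size=(10, 128))  # 10 image patch descriptors

# Squared distance from every descriptor to every word, then argmin per row.
d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
assignments = d2.argmin(axis=1)           # index of the closest word per descriptor
```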
Image Classification
Bags of visual words
• Represent entire image based on its distribution (histogram) of word occurrences.
• Analogous to the bag-of-words representation used for document classification/retrieval.
Image credit: Fei-Fei Li
Image Classification
Image credit: S. Lazebnik [S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006]
Image Classification
Histogram intersection kernel: $K(h_1, h_2) = \sum_i \min(h_1(i), h_2(i))$

Linear kernel: $K(h_1, h_2) = h_1^\top h_2$
Image credit: S. Lazebnik
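Both kernels are simple to implement; a minimal sketch (the example histograms are made up):

```python
import numpy as np

# Histogram intersection kernel: sum of elementwise minima of two histograms.
def intersection_kernel(h1, h2):
    return float(np.minimum(h1, h2).sum())

# Linear kernel: plain dot product of the two histograms.
def linear_kernel(h1, h2):
    return float(np.dot(h1, h2))

h1 = np.array([0.5, 0.3, 0.2])
h2 = np.array([0.2, 0.6, 0.2])
```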
Image Classification
Image credit: S. Lazebnik [S. Lazebnik, C. Schmid, and J. Ponce, CVPR 2006]
Motivation
• Codebook quality depends on:
  – Feature type
  – Codebook creation
    • Algorithm, e.g. K-Means
    • Distance metric, e.g. L2
    • Number of words
  – Quantization process
    • Hard quantization: only one word is assigned to each descriptor
    • Soft quantization: multiple words may be assigned to each descriptor
Motivation
• Quantization error
  – The squared Euclidean distance between a descriptor vector and its mapped visual word
  – Hard quantization leads to large error

Effect of descriptor hard quantization: a severe drop in descriptor discriminative power. [Figure: scatter plot of descriptor discriminative power before and after quantization; both axes on a logarithmic scale.]
O. Boiman, E. Shechtman, M. Irani, CVPR 2008
Motivation
• Codebook size is an important factor for applications that need efficiency
  – Simply enlarging the codebook can reduce the overall quantization error
  – but cannot guarantee that every descriptor's error is reduced

| Codebook size | Percent of descriptors |
|---|---|
| 128 vs. 256 | 72.06% |
| 128 vs. 512 | 84.18% |

The right column is the percentage of descriptors whose quantization error is reduced when the codebook size grows.
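The measurement behind this table can be reproduced in spirit with a short sketch (synthetic descriptors and hypothetical codebook sizes, not the paper's data):

```python
import numpy as np

# Per-descriptor quantization error: squared distance to the nearest word.
def quant_error(descriptors, codebook):
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))                        # synthetic descriptors
small = rng.normal(size=(8, 16))                      # small codebook
large = np.vstack([small, rng.normal(size=(8, 16))])  # enlarged codebook

# Fraction of descriptors whose individual error went down after enlarging.
improved = quant_error(X, large) < quant_error(X, small)
fraction = improved.mean()
```

Because `large` is a superset of `small` here, no descriptor's error can increase, yet `fraction` need not be 1: descriptors already well-served by the small codebook gain nothing, which is the slide's point.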
Motivation
• A good codebook for classification is:
  – Discriminative: small individual quantization error
  – Compact in size
• These goals conflict to some extent
  – Overemphasizing discriminative ability may increase the dictionary size and weaken its generalization ability
  – Over-compressing the dictionary loses information and discriminative power
• Find a balance
[X. Lian, Z. Li, C. Wang, B. Lu, and L. Zhang, CVPR 2010]
Related Work
• No quantization
  – NBNN [6]
• Supervised codebook
  – Probabilistic models [5]
• Unsupervised codebook
  – Kernel codebook [2]
  – Sparse coding [3]
  – Locality-constrained linear coding [4]
Multi-layer Orthogonal Codebook (MOC)
• Uses standard K-Means for efficiency; any other clustering algorithm can be adopted
• Builds codebooks from residues to reduce quantization error explicitly
MOC Creation
• First layer codebook: K-Means
• Residue: $r_i = d_i - w(d_i)$, $i = 1, \dots, N$, where $w(d_i)$ is the nearest first-layer word to $d_i$

N is the number of descriptors randomly sampled to build the codebook; $d_i$ is one of the descriptors.
MOC Creation
• Orthogonal residue: the component of the residue orthogonal to its assigned word, $\tilde{r}_i = r_i - \frac{r_i^\top w(d_i)}{\|w(d_i)\|^2}\, w(d_i)$
• Second layer codebook: K-Means on the orthogonal residues
• Third layer: repeat the same procedure on the next layer's residues …
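The two-layer construction can be sketched as follows. This is a simplified illustration: it clusters the plain residues $r_i = d_i - w(d_i)$ and omits the orthogonalization step, and the tiny K-Means implementation, data sizes, and layer sizes are all stand-ins, not the paper's settings:

```python
import numpy as np

# Toy Lloyd's-algorithm K-Means: initialize centers from data points, then
# alternate nearest-center assignment and per-cluster mean updates.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        a = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(a == j):              # keep old center if cluster empties
                centers[j] = X[a == j].mean(0)
    return centers

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))               # synthetic descriptors

layer1 = kmeans(X, 4)                       # first-layer codebook
a1 = ((X[:, None] - layer1[None]) ** 2).sum(-1).argmin(1)
residues = X - layer1[a1]                   # r_i = d_i - nearest first-layer word
layer2 = kmeans(residues, 4, seed=1)        # second layer clusters the residues
```

Quantizing the residues with the second layer can only shrink the total residual energy relative to stopping after layer one, which is exactly the "explicit residue minimization" the method aims at.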
Vector Quantization
• How to use MOC?
  – Kernel fusion: use the layers separately
    • Compute a kernel from each layer's codebook separately
    • Let the final kernel be a combination of the multiple kernels
  – Soft weighting: adjust the weights of words from different layers individually for each descriptor
    • Select the nearest word in each layer's codebook for the descriptor
    • Use the selected words from all layers to reconstruct the descriptor, minimizing the reconstruction error
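The soft-weighting reconstruction can be sketched as a small least-squares problem: stack the selected words (one per layer here) as a basis and solve for the weights that best reconstruct the descriptor. The vectors below are random stand-ins, and using unconstrained least squares is an assumption about the exact objective:

```python
import numpy as np

rng = np.random.default_rng(2)
d = rng.normal(size=8)        # one descriptor
w1 = rng.normal(size=8)       # nearest layer-1 word (illustrative)
w2 = rng.normal(size=8)       # nearest layer-2 word (illustrative)

# Basis of selected words (8 x 2); solve min_w ||W w - d||^2 for the weights.
W = np.stack([w1, w2], axis=1)
weights, *_ = np.linalg.lstsq(W, d, rcond=None)
reconstruction = W @ weights  # best approximation of d in span(w1, w2)
```

By construction the result is at least as accurate as hard quantization to `w1` alone, since the weights `(1, 0)` are one feasible solution.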
Hard Quantization and Kernel Fusion (HQKF)
• Hard quantization on each layer
  – Average pooling: with M sub-regions on an image, the histogram for the m-th sub-region counts the word occurrences of the descriptors in that sub-region
• Histogram intersection kernel computed per layer
• Linearly combine the kernel values from each codebook
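A minimal sketch of the fusion step: one intersection kernel per layer's histogram, linearly combined. The slides only say the kernels are linearly combined; the equal weights and the toy two-word histograms below are assumptions:

```python
import numpy as np

def intersection(h1, h2):
    return float(np.minimum(h1, h2).sum())

# Fused kernel: weighted sum of per-layer intersection kernels
# (equal weights by default -- an illustrative choice).
def fused_kernel(hists_a, hists_b, weights=None):
    weights = weights or [1.0 / len(hists_a)] * len(hists_a)
    return sum(w * intersection(ha, hb)
               for w, ha, hb in zip(weights, hists_a, hists_b))

layer1_a, layer1_b = np.array([0.5, 0.5]), np.array([0.4, 0.6])
layer2_a, layer2_b = np.array([0.8, 0.2]), np.array([0.1, 0.9])
k = fused_kernel([layer1_a, layer2_a], [layer1_b, layer2_b])
```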
Soft Weighting (SW)
• Weight the words selected for each descriptor
• Max pooling
• Linear kernel (K is the codebook size)
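Max pooling over a region takes, for each visual word, the maximum response across all descriptors in that region, producing one K-dimensional vector per region (the response matrix below is made up):

```python
import numpy as np

# Rows: descriptors in the region; columns: per-word responses (weights).
responses = np.array([
    [0.1, 0.0, 0.7, 0.0],
    [0.3, 0.2, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
])
pooled = responses.max(axis=0)  # per-word maximum over the region's descriptors
```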
Soft Weighting (SW-NN)
• Further considers the relationships between words from multiple layers
• Select two or more nearest words in each layer's codebook, then weight them to reconstruct the descriptor
• Each descriptor is represented more accurately by multiple words on each layer
• The correlation between similar descriptors is captured through shared words
[J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, Y. Gong, CVPR 2010]
Experiment
• Single feature type: SIFT
  – 16×16 pixel patches densely sampled over a grid with 6-pixel spacing
• Spatial pyramid: 21 = 16 + 4 + 1 sub-regions at three resolution levels
• Clustering method on each layer: K-Means
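The sampling and pyramid setup above can be sanity-checked with a short sketch (the 64×64 image size is arbitrary; the patch size, spacing, and pyramid levels are from the slides):

```python
# Dense sampling: top-left corners of all 16x16 patches that fit entirely
# inside the image, stepping 6 pixels in each direction.
def grid_positions(width, height, patch=16, step=6):
    return [(x, y)
            for y in range(0, height - patch + 1, step)
            for x in range(0, width - patch + 1, step)]

# Three-level spatial pyramid: 4^0 + 4^1 + 4^2 = 1 + 4 + 16 = 21 sub-regions.
pyramid_regions = sum(4 ** level for level in range(3))
positions = grid_positions(64, 64)
```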
Datasets
• Caltech-101: 101 categories, 31-800 images per category
• 15 Scenes: 15 scene classes, 4485 images
Quantization Error
• Quantization error is reduced more effectively by MOC compared with simply enlarging codebook size
• Experiments on Caltech-101

| Codebook size | Percent of descriptors |
|---|---|
| 128 vs. 256 | 72.06% |
| 128 vs. 512 | 84.18% |
| 128 vs. 128+128 | 91.22% |
| 256 vs. 128+128 | 87.04% |
| 512 vs. 128+128 | 63.80% |

The right column is the percentage of descriptors whose quantization error is reduced when the codebook changes.
Codebook Size
• Classification accuracy comparisons with single layer codebook
[Figure: classification accuracy (0.58-0.74) vs. codebook size (64, 128, 256, 512, 1024) for a single-layer codebook, 2-layer HQKF, and 2-layer SW.]

Comparison with a single codebook (Caltech-101). The 2-layer codebook has the same size on each layer, which is also the size of the single-layer codebook.
Comparisons with existing methods
• Classification accuracy comparisons with existing methods

| Method | Caltech-101 (15 train) | Caltech-101 (30 train) | 15 Scenes (100 train) |
|---|---|---|---|
| SPM [1] | 56.40 (200) | 64.60 (200) | 81.40±0.5 (1024) |
| KC [2] | - | 64.14±1.18 | 76.67±0.39 |
| ScSPM [3] | 67.0±0.45 (1024) | 73.2±0.54 (1024) | 80.28±0.93 (1024) |
| LLC [4] | 65.43 (2048) | 73.44* (2048) | - |
| HQKF | 60.66±0.7 (3-layer 512) | 69.28±0.8 (3-layer 512) | 83.21±0.6 (3-layer 1024) |
| SW | 64.48±0.5 (3-layer 512) | 71.60±1.1 (3-layer 512) | 82.27±0.6 (3-layer 1024) |
| SW+2NN | 65.90±0.5 (2-layer 1024) | 72.97±0.8 (2-layer 1024) | - |

All listed methods use a single descriptor type. *Only LLC used HoG instead of SIFT; repeating their method with the descriptors used here gives 71.63±1.2.
Conclusion
• Compared with existing methods, the proposed approach has the following merits:
  – 1) Simple algorithm, easy to implement.
  – 2) No time-consuming learning or clustering stage; applicable to large-scale computer vision systems.
  – 3) Even more efficient than traditional K-Means clustering.
  – 4) Explicit residue minimization to exploit the discriminative power of descriptors.
  – 5) The basic idea can be combined with many state-of-the-art methods.
References
• [1] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories," CVPR, pp. 2169-2178, 2006.
• [2] J. van Gemert, J. Geusebroek, C. Veenman, and A. Smeulders, "Kernel codebooks for scene categorization," ECCV, pp. 696-709, 2008.
• [3] J. Yang, K. Yu, Y. Gong, and T. Huang, "Linear spatial pyramid matching using sparse coding for image classification," CVPR, pp. 1794-1801, 2009.
• [4] J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, "Locality-constrained linear coding for image classification," CVPR, pp. 3360-3367, 2010.
• [5] X. Lian, Z. Li, C. Wang, B. Lu, and L. Zhang, "Probabilistic models for supervised dictionary learning," CVPR, pp. 2305-2312, 2010.
• [6] O. Boiman, E. Shechtman, and M. Irani, "In defense of nearest-neighbor based image classification," CVPR, pp. 1-8, 2008.
• Thank you!
Codebook Size
• Different size combinations on a 2-layer MOC

[Figure: Caltech-101 accuracy (0.61-0.71). X-axis: size of the 1st-layer codebook (64, 128, 256); colors: size of the 2nd-layer codebook (64, 128, 256).]