Review: Intro to recognition
• Recognition tasks
• Machine learning approach: training, testing, generalization
• Example classifiers
  • Nearest neighbor
  • Linear classifiers
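As a rough illustration of the first example classifier, here is a minimal 1-nearest-neighbor sketch in Python with NumPy. It assumes feature vectors are stored as rows of a 2-D array and that distance is Euclidean; the function name `nearest_neighbor_predict` and the toy data are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, test_x):
    """Label each test vector with the label of its closest training vector.

    train_x: (N, D) array of training features (assumed layout)
    train_y: (N,) array of training labels
    test_x:  (M, D) array of test features
    """
    # Squared Euclidean distance between every test and training vector: (M, N)
    diffs = test_x[:, None, :] - train_x[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Index of the closest training example for each test example
    nearest = np.argmin(dists, axis=1)
    return train_y[nearest]

train_x = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
train_y = np.array(["cat", "cat", "dog"])
test_x = np.array([[0.2, 0.1], [4.0, 4.5]])
print(nearest_neighbor_predict(train_x, train_y, test_x))  # ['cat' 'dog']
```

There is no training step beyond storing the examples; all the cost is paid at test time, when every test vector is compared with every training vector.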
Image features
• Spatial support: pixel or local patch, segmentation region, bounding box, or whole image
Image features
• We will focus mainly on global image features for whole-image classification tasks:
  • GIST descriptors
  • Bags of features
  • Spatial pyramids
GIST descriptors
• Oliva & Torralba (2001)
http://people.csail.mit.edu/torralba/code/spatialenvelope/
Bags of features
Origin 1: Texture recognition
• Texture is characterized by the repetition of basic elements or textons
• For stochastic textures, it is the identity of the textons, not their spatial arrangement, that matters
Julesz, 1981; Cula & Dana, 2001; Leung & Malik, 2001; Mori, Belongie & Malik, 2001; Schmid, 2001; Varma & Zisserman, 2002, 2003; Lazebnik, Schmid & Ponce, 2003
(Figure: universal texton dictionary and the resulting texton histograms)
Origin 2: Bag-of-words models
• Orderless document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
US Presidential Speeches Tag Cloud: http://chir.ag/projects/preztags/
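A minimal sketch of the orderless document representation: count how often each dictionary word occurs and ignore word order entirely. The tokenization (lowercase, split on whitespace), the toy dictionary, and the function name `bag_of_words` are assumptions for illustration.

```python
from collections import Counter

def bag_of_words(text, dictionary):
    """Return the frequency of each dictionary word in `text`, ignoring order."""
    # Naive tokenization: lowercase and split on whitespace (an assumption)
    counts = Counter(text.lower().split())
    # One entry per dictionary word; words outside the dictionary are dropped
    return [counts[word] for word in dictionary]

dictionary = ["economy", "freedom", "war"]
print(bag_of_words("Freedom and economy and freedom", dictionary))  # [1, 2, 0]
print(bag_of_words("economy freedom freedom", dictionary))  # [1, 2, 0]
```

Both documents map to the same vector even though their word order differs, which is exactly the "orderless" property that bags of visual words inherit.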
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary”
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
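Steps 3 and 4 can be sketched as follows, assuming the local descriptors of one image are rows of a NumPy array and the vocabulary from step 2 is a (K, D) array of cluster centers. Each descriptor is hard-assigned to its nearest visual word, and the image becomes a normalized histogram of word frequencies; the function name `bag_of_features` and the toy data are hypothetical.

```python
import numpy as np

def bag_of_features(descriptors, vocabulary):
    """Quantize descriptors to the nearest visual word and return word frequencies.

    descriptors: (N, D) local features from one image
    vocabulary:  (K, D) visual words (cluster centers)
    """
    # Squared Euclidean distance from every descriptor to every visual word: (N, K)
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    # Step 3: hard-assign each descriptor to its nearest visual word
    words = dists.argmin(axis=1)
    # Step 4: histogram of word occurrences, normalized to sum to 1
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()

vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [9.0, 9.5], [0.5, 0.5], [0.0, 1.0]])
print(bag_of_features(descriptors, vocabulary))  # [0.75 0.25]
```

Normalizing by the total count makes images with different numbers of detected features comparable; whether to normalize, and how, is a design choice that varies between systems.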
1. Local feature extraction
• Regular grid or interest regions
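For the regular-grid option, here is a minimal sketch that cuts a grayscale image into fixed-size square patches at a fixed stride. The patch size, stride, and the name `dense_patches` are hypothetical choices; interest-region detection is not shown, and a real system would normalize each patch and compute a descriptor rather than use raw pixels.

```python
import numpy as np

def dense_patches(image, patch_size=8, stride=4):
    """Extract square patches on a regular grid from a 2-D grayscale image.

    Returns an array of shape (num_patches, patch_size * patch_size) and the
    (row, col) top-left corner of each patch.
    """
    height, width = image.shape
    patches, corners = [], []
    # Slide a patch_size x patch_size window over the image with the given stride
    for r in range(0, height - patch_size + 1, stride):
        for c in range(0, width - patch_size + 1, stride):
            patches.append(image[r:r + patch_size, c:c + patch_size].ravel())
            corners.append((r, c))
    return np.array(patches), corners

image = np.arange(16 * 16, dtype=float).reshape(16, 16)
patches, corners = dense_patches(image)
print(patches.shape)  # (9, 64)
```

A 16 × 16 image with 8-pixel patches and a stride of 4 yields a 3 × 3 grid of overlapping patches.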
Detect patches → Normalize patch → Compute descriptor
Slide credit: Josef Sivic
2. Learning the visual vocabulary
Clustering → Visual vocabulary
Slide credit: Josef Sivic
Review: K-means clustering
• Want to minimize the sum of squared Euclidean distances between features $x_i$ and their nearest cluster centers $m_k$:

$$D(X, M) = \sum_{\text{cluster } k} \; \sum_{\text{point } i \text{ in cluster } k} \lVert x_i - m_k \rVert^2$$

Algorithm:
• Randomly initialize K cluster centers
• Iterate until convergence:
  • Assign each feature to the nearest center
  • Recompute each cluster center as the mean of all features assigned to it
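The algorithm above can be sketched in NumPy as follows. Initialization picks K distinct features at random, which is one common reading of "randomly initialize"; the convergence test (stop when no assignment changes), the iteration cap, the empty-cluster handling, and the names `kmeans` and `kmeans_objective` are assumptions.

```python
import numpy as np

def kmeans(features, k, max_iters=100, seed=0):
    """Cluster rows of a float array `features` into k groups; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centers by picking K distinct features
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    assignments = None
    for _ in range(max_iters):
        # Assign each feature to the nearest center (squared Euclidean distance)
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_assignments = dists.argmin(axis=1)
        # Converged when no assignment changes
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Recompute each center as the mean of the features assigned to it
        for j in range(k):
            members = features[assignments == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers, assignments

def kmeans_objective(features, centers, assignments):
    """D(X, M): sum of squared distances from each feature to its cluster center."""
    return float(((features - centers[assignments]) ** 2).sum())

features = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, assignments = kmeans(features, k=2)
print(kmeans_objective(features, centers, assignments))  # 1.0
```

Each iteration can only decrease $D(X, M)$, so the loop terminates, but only at a local minimum; in practice the algorithm is often restarted from several random initializations and the lowest-objective result is kept.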
Example codebooks
(Figures: two example appearance codebooks)
Source: B. Leibe
Visual vocabularies: Details
• How to choose vocabulary size?
  • Too small: visual words not representative of all patches
  • Too large: quantization artifacts, overfitting
  • Right size is application-dependent
• Improving efficiency of quantization
  • Vocabulary trees (Nister and Stewenius, 2005)
• Improving vocabulary quality
  • Discriminative/supervised training of codebooks
  • Sparse coding, non-exclusive assignment to codewords
• More discriminative bag-of-words representations
  • Fisher Vectors (Perronnin et al., 2007), VLAD (Jegou et al., 2010)
• Incorporating spatial information
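To make one of the "more discriminative" representations concrete, here is a minimal VLAD-style sketch. Instead of counting how many descriptors fall into each visual word, it sums the residuals between descriptors and their nearest word, concatenates these sums, and L2-normalizes the result. This is a simplified reading of Jegou et al. (2010); later variants add power normalization and other refinements not shown here, and the name `vlad` is hypothetical.

```python
import numpy as np

def vlad(descriptors, vocabulary):
    """Aggregate local descriptors into a single K*D vector of residual sums."""
    k, d = vocabulary.shape
    # Nearest visual word for each descriptor (hard assignment, as in bag of words)
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for j in range(k):
        # Sum of (descriptor - word j) over descriptors assigned to word j
        v[j] = (descriptors[words == j] - vocabulary[j]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    # L2-normalize, guarding against an all-zero vector
    return v / norm if norm > 0 else v

vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [10.0, 12.0]])
print(vlad(descriptors, vocabulary))
```

The result has K × D dimensions rather than K, so a small vocabulary can still give a rich descriptor: it records not just which words occur but how the descriptors deviate from them.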
Bags of features for action recognition
Juan Carlos Niebles, Hongcheng Wang and Li Fei-Fei, Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, IJCV 2008.
Space-time interest points
Spatial pyramids
(Figure: pyramid levels 0, 1, and 2)
Lazebnik, Schmid & Ponce (CVPR 2006)
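A minimal sketch of a spatial pyramid representation. It assumes each local feature has already been quantized to a visual word and has (x, y) coordinates normalized to [0, 1). At level l the image is divided into a 2^l × 2^l grid, a word histogram is computed per cell, and all histograms are concatenated. The per-level weights follow our understanding of Lazebnik, Schmid & Ponce (1/2^L for level 0 and 1/2^(L-l+1) for level l ≥ 1); the function names and defaults are hypothetical.

```python
import numpy as np

def spatial_pyramid(words, xs, ys, vocab_size, levels=2):
    """Concatenate weighted visual-word histograms over a 2^l x 2^l grid per level.

    words:  (N,) integer visual-word index per feature
    xs, ys: (N,) feature coordinates normalized to [0, 1)
    """
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Weight coarse levels less: 1/2^L at level 0, 1/2^(L-l+1) otherwise
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        # Grid cell (row, col) of each feature at this level
        col = np.minimum((xs * cells).astype(int), cells - 1)
        row = np.minimum((ys * cells).astype(int), cells - 1)
        cell_index = row * cells + col
        for c in range(cells * cells):
            hist = np.bincount(words[cell_index == c], minlength=vocab_size)
            parts.append(weight * hist)
    return np.concatenate(parts).astype(float)

def histogram_intersection(h1, h2):
    """Similarity between two pyramids: sum of element-wise minima."""
    return float(np.minimum(h1, h2).sum())

words = np.array([0, 1, 1])
xs = np.array([0.1, 0.9, 0.6])
ys = np.array([0.1, 0.9, 0.2])
pyramid = spatial_pyramid(words, xs, ys, vocab_size=2)
print(pyramid.shape)  # (42,)
```

With L = 2 the vector has (1 + 4 + 16) × K entries. Because min(w·a, w·b) = w·min(a, b) for w > 0, intersecting the pre-weighted histograms gives the same weighted sum of per-level intersections as applying the weights after matching.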
Results: Scene category dataset
Multi-class classification results (100 training images per class)
Results: Caltech101 dataset
Multi-class classification results (30 training images per class)