6.S093 Visual Recognition through Machine Learning Competition
![Page 1: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/1.jpg)
6.S093 Visual Recognition through Machine Learning Competition
Image by kirkh.deviantart.com
Aditya Khosla
![Page 2: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/2.jpg)
Today’s class
• Part 1: Competition details
• Part 2: Image representation lecture
– Bag-of-words
– Spatial pyramid
• Part 3: Feature extraction tutorial
![Page 3: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/3.jpg)
Competition details: dataset
10 object categories
airplane, bicycle, car, cup/mug, dog(s), guitar, hamburger, sofa, traffic light, person
![Page 4: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/4.jpg)
Competition details: dataset
Training set: 8,000 images
Validation set: 2,000 images
Testing set: 5,000 images
labels provided NO labels provided
Leaderboard set
![Page 5: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/5.jpg)
Competition details: submission
• For each image, you provide the probability that each class is present in it (as returned by your algorithm)
[Figure: bar chart of per-class probabilities on a 0 to 1 axis for airplane, bicycle, car, cup, dog, guitar, hamburger, sofa, traffic light, person]
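A minimal sketch of how such per-image probabilities could be held in memory. The slides do not specify the submission file format, so the layout here (one row per image, one column per class), the column order, and the sigmoid squashing of raw scores are all assumptions for illustration.

```python
import numpy as np

# Class names from the dataset slide; the column order is an assumption.
CLASSES = ["airplane", "bicycle", "car", "cup", "dog", "guitar",
           "hamburger", "sofa", "traffic light", "person"]

def to_probabilities(scores):
    """Squash raw per-class classifier scores into [0, 1] with a sigmoid.

    Each class is scored independently here, so a row need not sum to 1.
    """
    scores = np.asarray(scores, dtype=float)
    return 1.0 / (1.0 + np.exp(-scores))

# Hypothetical raw scores for 3 images (rows) and 10 classes (columns).
rng = np.random.default_rng(0)
scores = rng.normal(size=(3, len(CLASSES)))
probs = to_probabilities(scores)
```

Each row of `probs` then corresponds to one bar chart like the one on the slide.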
![Page 6: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/6.jpg)
Competition details: evaluation
• Average precision
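A small sketch of average precision for a single class, assuming the common non-interpolated definition: rank the images by predicted probability and average the precision measured at the rank of each positive image. The slide does not say which variant the competition uses, so treat this as one plausible reading; `average_precision` is an illustrative name.

```python
import numpy as np

def average_precision(labels, scores):
    """AP for one class: mean of precision@k over the ranks k of the positives."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")    # highest score first
    ranked = labels[order]
    if not ranked.any():
        return 0.0                                # no positives for this class
    hits = np.cumsum(ranked)                      # positives seen so far
    precision_at_k = hits / np.arange(1, len(ranked) + 1)
    return float(precision_at_k[ranked].mean())

# Toy example: positives are ranked 1st and 3rd out of 4 images.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)            # (1/1 + 2/3) / 2
```

A ranking that places every positive image above every negative one scores 1.0.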
![Page 7: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/7.jpg)
Competition details: prizes
[Figure: prizes for first, second, and third place; the prize labels read "+ cash", "+ cash", and "Cash"]
![Page 8: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/8.jpg)
Competition details: thank you!
![Page 9: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/9.jpg)
Image representation: bag-of-words
![Page 10: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/10.jpg)
Document representation: bag-of-words
• Order-less document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
![Page 11: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/11.jpg)
Document representation: bag-of-words
• Order-less document representation: frequencies of words from a dictionary (Salton & McGill, 1983)
US Presidential Speeches Tag Cloud
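To make the order-less property concrete, here is a minimal sketch that counts dictionary words in a text. The three-word dictionary and the two sentences are made up for illustration.

```python
from collections import Counter
import re

def bag_of_words(text, dictionary):
    """Count how often each dictionary word occurs in text, ignoring order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in dictionary]

dictionary = ["people", "nation", "freedom"]
a = bag_of_words("The people of this nation cherish freedom, and the people endure.",
                 dictionary)
b = bag_of_words("Freedom the people cherish; the people of this nation endure.",
                 dictionary)
```

Both sentences yield the same count vector, because only word frequencies are kept and word order is discarded.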
![Page 14: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/14.jpg)
Image representation: bag-of-words
document
bag-of-words
![Page 15: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/15.jpg)
Image representation: bag-of-words
document
bag-of-words
image bag-of-visual words
![Page 16: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/16.jpg)
Object Bag of ‘words’
![Page 17: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/17.jpg)
Object
Ugly bag of ‘words’
![Page 18: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/18.jpg)
Object
Stylish bag of ‘words’
![Page 20: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/20.jpg)
visual dictionary
![Page 21: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/21.jpg)
Image representation: bag-of-words
1. Extract descriptors
![Page 22: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/22.jpg)
Image representation: bag-of-words
1. Extract descriptors
2. Learn “visual dictionary”
![Page 23: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/23.jpg)
Image representation: bag-of-words
1. Extract descriptors
2. Learn “visual dictionary”
3. Quantize features using visual vocabulary
![Page 25: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/25.jpg)
Image representation: bag-of-words
1. Extract descriptors
2. Learn “visual dictionary”
3. Quantize features using visual vocabulary
4. Represent images by frequencies of “visual words”
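Steps 3 and 4 can be sketched as follows, assuming descriptors have already been extracted and a vocabulary already learned, both as NumPy arrays. Nearest-neighbor assignment under Euclidean distance is one common way to quantize; the slides do not fix the distance, and the function names and toy values are illustrative.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Step 3: index of the nearest visual word for each descriptor."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bow_histogram(descriptors, vocabulary):
    """Step 4: normalized frequency of each visual word in one image."""
    words = quantize(descriptors, vocabulary)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy 2-D "descriptors" and a 3-word vocabulary (illustrative values only).
vocabulary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]])
hist = bow_histogram(descriptors, vocabulary)
```

The resulting `hist` has one entry per visual word and is the fixed-length vector that represents the image, regardless of how many descriptors it produced.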
![Page 26: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/26.jpg)
1. Extracting descriptors
regular grid
interest points
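A minimal sketch of the regular-grid option: patches are cut at fixed strides over a grayscale image. The patch size and stride are arbitrary illustrative values. The interest-point option would instead take its locations from a keypoint detector and is not shown here.

```python
import numpy as np

def grid_patches(image, patch_size=16, stride=8):
    """Return (patches, centers) sampled on a regular grid over a 2-D image."""
    h, w = image.shape
    patches, centers = [], []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
            centers.append((y + patch_size // 2, x + patch_size // 2))
    return np.array(patches), np.array(centers)

# A blank 64x48 image stands in for real data.
image = np.zeros((64, 48))
patches, centers = grid_patches(image)
```

With these settings a 64x48 image yields a 7x5 grid of overlapping 16x16 patches, and a descriptor would then be computed on each patch.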
![Page 27: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/27.jpg)
Image representation: yesterday
gradient magnitude
gradient orientation
feature vector
![Page 28: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/28.jpg)
Image representation: yesterday
gradient magnitude
gradient orientation
descriptor
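As a rough illustration of turning gradient magnitude and orientation into a descriptor, the sketch below builds a single magnitude-weighted orientation histogram for one patch. It is a simplification under stated assumptions (one cell, 8 bins, L2 normalization), not the exact descriptor from the previous lecture.

```python
import numpy as np

def orientation_histogram(patch, n_bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))        # derivatives along rows, cols
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % (2 * np.pi)   # angle in [0, 2*pi)
    bins = np.minimum((orientation / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A horizontal ramp: intensity grows left to right, so all gradients point along +x.
patch = np.tile(np.arange(16.0), (16, 1))
descriptor = orientation_histogram(patch)
```

For the ramp patch all gradient energy falls into the first orientation bin. Descriptors such as SIFT concatenate several such histograms computed over a grid of sub-cells.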
![Page 29: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/29.jpg)
2. Learning “visual dictionary”
Compute descriptor
![Page 30: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/30.jpg)
2. Learning “visual dictionary”
descriptors
…
![Page 31: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/31.jpg)
2. Learning visual dictionary
descriptors
…
![Page 32: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/32.jpg)
2. Learning visual dictionary
descriptors
…
Clustering
![Page 33: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/33.jpg)
2. Learning visual dictionary
descriptors
…
Clustering
visual vocabulary
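The slide says only "Clustering"; k-means is a common choice for this step, and the sketch below assumes it. It runs a fixed number of Lloyd iterations on descriptors pooled from the training images and returns the cluster centers as the visual words. The function name, iteration count, and toy data are illustrative.

```python
import numpy as np

def learn_vocabulary(descriptors, k, n_iter=20, seed=0):
    """Cluster descriptors with Lloyd's k-means; the centers are the visual words."""
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct descriptors chosen at random.
    centers = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        for j in range(k):
            members = descriptors[assign == j]
            if len(members) > 0:          # an empty cluster keeps its old center
                centers[j] = members.mean(axis=0)
    return centers

# Two well-separated blobs of toy 2-D descriptors (illustrative data only).
rng = np.random.default_rng(1)
descriptors = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                         rng.normal(5.0, 0.1, size=(50, 2))])
vocabulary = learn_vocabulary(descriptors, k=2)
```

Here `k` is the vocabulary size. On real data it is typically in the hundreds or thousands, and the pairwise-distance array above would need to be computed in batches.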
![Page 34: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/34.jpg)
Example visual vocabulary
Fei-Fei et al. 2005
![Page 35: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/35.jpg)
Image patch examples
Sivic et al. 2005
![Page 36: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/36.jpg)
Image patch examples
Sivic et al. 2005
How to choose the vocabulary size?
![Page 37: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/37.jpg)
Bag-of-words: limitations
• What about the structure of the image?
=?
![Page 38: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/38.jpg)
Image representation: spatial pyramids
level 0
![Page 39: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/39.jpg)
Image representation: spatial pyramids
level 0 level 1
![Page 40: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/40.jpg)
Image representation: spatial pyramids
level 0 level 1 level 2
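A minimal sketch of the spatial pyramid with the three levels shown: the image is divided into 1x1, 2x2, and 4x4 grids, a visual-word histogram is computed per cell, and all histograms are concatenated. It assumes each descriptor comes with its quantized word index and its (row, col) position. Some formulations also weight the levels, which is omitted here.

```python
import numpy as np

def spatial_pyramid(words, positions, image_shape, n_words, levels=3):
    """Concatenate visual-word histograms over 1x1, 2x2, 4x4, ... grids.

    words:     (n,) visual-word index of each descriptor
    positions: (n, 2) integer (row, col) location of each descriptor
    """
    h, w = image_shape
    features = []
    for level in range(levels):
        cells = 2 ** level                        # cells per side at this level
        rows = np.minimum(positions[:, 0] * cells // h, cells - 1)
        cols = np.minimum(positions[:, 1] * cells // w, cells - 1)
        cell_id = rows * cells + cols
        # One histogram per cell, flattened into a single vector for this level.
        joint = cell_id * n_words + words
        features.append(np.bincount(joint, minlength=cells * cells * n_words))
    return np.concatenate(features).astype(float)

# Four toy descriptors, one in each quadrant of a 16x16 image.
words = np.array([0, 1, 1, 2])
positions = np.array([[2, 3], [2, 12], [12, 3], [12, 12]])
feature = spatial_pyramid(words, positions, image_shape=(16, 16), n_words=3)
```

Level 0 is the plain bag-of-words histogram. The finer levels record where in the image each visual word occurred, which addresses the loss of structure noted on the limitations slide.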
![Page 41: 6.S093 Visual Recognition through Machine Learning Competitionviscomp.csail.mit.edu/resource/slides/lecture3.pdf · •Part 1: Competition details •Part 2: Image representation](https://reader034.fdocuments.us/reader034/viewer/2022050517/5fa13a581f4af522244dd2b2/html5/thumbnails/41.jpg)
Tutorial