Scalable Learning in Computer Vision
Transcript of Scalable Learning in Computer Vision
Scalable Learning in Computer Vision
Adam Coates
Honglak Lee
Rajat Raina
Andrew Y. Ng
Stanford University
Adam Coates, Honglak Lee, Rajat Raina, Andrew Y. Ng – Stanford University
Computer Vision is Hard
Introduction
• One reason for difficulty: small datasets.
Common Dataset Sizes (positives per class)

| Dataset | Positives per class |
|---|---|
| Caltech 101 | 800 |
| Caltech 256 | 827 |
| PASCAL 2008 (Car) | 840 |
| PASCAL 2008 (Person) | 4168 |
| LabelMe (Pedestrian) | 25330 |
| NORB (Synthetic) | 38880 |
Introduction
• But the world is complex.
– Hard to get extremely high accuracy on real images if we haven’t seen enough examples.
[Figure: test error (area under curve) for claw hammers vs. training set size; y-axis AUC from 0.75 to 1, x-axis from 10^3 to 10^4 examples.]
Introduction
• Small datasets:
– Clever features
• Carefully design to be robust to lighting, distortion, etc.
– Clever models
• Try to use knowledge of object structure.
– Some machine learning on top.
• Large datasets:
– Simple features
• Favor speed over invariance and expressive power.
– Simple model
• Generic; little human knowledge.
– Rely on machine learning to solve everything else.
SUPERVISED LEARNING FROM SYNTHETIC DATA
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Need to scale up each part of the learning process to really large datasets.
Synthetic Data
• Not enough labeled data for algorithms to learn all the knowledge they need.
– Lighting variation
– Object pose variation
– Intra-class variation
• Synthesize positive examples to include this knowledge.
– Much easier than building this knowledge into the algorithms.
Synthetic Data
• Collect images of object on a green-screen turntable.
Green Screen image
Segmented Object
Synthetic Background
Photometric/Geometric Distortion
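The compositing step of this pipeline can be sketched as follows. This is a minimal illustration, not the talk's actual code: `composite` is a hypothetical helper, and a random brightness gain stands in for the full photometric/geometric distortions the slide mentions.

```python
import numpy as np

rng = np.random.default_rng(0)

def composite(obj_rgb, obj_mask, background, brightness_range=(0.8, 1.2)):
    """Paste a segmented (green-screened) object onto a synthetic
    background with a random photometric distortion. Arrays are HxWx3
    floats in [0, 1]; obj_mask is HxW in [0, 1] (1 = object pixel)."""
    gain = rng.uniform(*brightness_range)          # photometric distortion
    obj = np.clip(obj_rgb * gain, 0.0, 1.0)
    mask = obj_mask[..., None]                     # broadcast over channels
    return mask * obj + (1.0 - mask) * background  # alpha-style blend

# Toy example: a 2x2 white patch composited onto a black background.
obj = np.ones((4, 4, 3))
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0
bg = np.zeros((4, 4, 3))
example = composite(obj, mask, bg)
```

Each call with a fresh background and distortion yields a new positive example, which is how one labeled turntable image can be multiplied into many training images.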
Synthetic Data: Example
• Claw hammers:
– Synthetic examples (training set)
– Real examples (test set)
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Feature computations can be prohibitive for large numbers of images.
– E.g., 100 million examples × 1000 features = 100 billion feature values to compute.
Features on CPUs vs. GPUs
• Difficult to keep scaling features on CPUs.
– CPUs are designed for general-purpose computing.
• GPUs outpacing CPUs dramatically.
(nVidia CUDA Programming Guide)
Features on GPUs
• Features: Cross-correlation with image patches.
– High data locality; high arithmetic intensity.
• Implemented brute-force.
– Faster than FFT for small filter sizes.
– Orders of magnitude faster than FFT on CPU.
• 20x to 100x speedups (depending on filter size).
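The brute-force cross-correlation described here is just an independent dot product per output pixel, which is why it has high data locality and arithmetic intensity and maps well to GPUs. A minimal CPU reference version in NumPy (names are illustrative, not from the talk):

```python
import numpy as np

def cross_correlate_valid(image, filt):
    """Brute-force 'valid' cross-correlation of a 2-D image with a
    small filter. Each output pixel is an independent dot product of
    the filter with one image patch -- the work a GPU parallelizes."""
    H, W = image.shape
    fh, fw = filt.shape
    out = np.empty((H - fh + 1, W - fw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + fh, j:j + fw] * filt)
    return out

# Toy example: 2x2 box filter over a 4x4 image.
img = np.arange(16, dtype=float).reshape(4, 4)
f = np.ones((2, 2))
result = cross_correlate_valid(img, f)
```

For small filters this direct form beats an FFT-based approach because it skips the transform overhead entirely, consistent with the speedups quoted above.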
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Large number of feature vectors on disk are too slow to access repeatedly.
– E.g., an online algorithm can run on one machine, but disk access becomes a difficult bottleneck.
Distributed Training
• Solution: must store everything in RAM.
• No problem!
– RAM as low as $20/GB
• Our cluster with 120GB RAM:
– Capacity of >100 million examples.
• For 1000 features, 1 byte each.
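The capacity claim is easy to check with a back-of-the-envelope calculation, using the numbers from the slide:

```python
# How many examples fit in 120 GB of cluster RAM, at 1000 features
# per example and 1 byte per feature?
features_per_example = 1000
bytes_per_feature = 1
cluster_ram_bytes = 120e9

capacity = cluster_ram_bytes / (features_per_example * bytes_per_feature)
print(f"{capacity:.0f} examples")  # 120 million -- over 100 million
```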
Distributed Training
• Algorithms that can be trained from sufficient statistics are easy to distribute.
• Decision tree splits can be trained using histograms of each feature.
– Histograms can be computed for small chunks of data on separate machines, then combined.
[Diagram: Slave 1 and Slave 2 each compute a histogram over their chunk of the data; the Master adds the histograms (+ … =) and selects the split.]
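The scheme above can be sketched in a few lines. This is a minimal illustration with assumptions the talk does not specify: binary labels, a single feature, fixed equal-width bins, and Gini impurity as the split criterion.

```python
import numpy as np

BINS = np.linspace(0.0, 1.0, 11)  # 10 equal-width bins over the feature range

def local_histograms(x, y):
    """Sufficient statistics one worker sends to the master: per-class
    histograms of one feature over its chunk of the data."""
    h_pos, _ = np.histogram(x[y == 1], bins=BINS)
    h_neg, _ = np.histogram(x[y == 0], bins=BINS)
    return h_pos, h_neg

def best_split(h_pos, h_neg):
    """Master side: given summed histograms, pick the bin edge that
    minimizes the weighted Gini impurity of the two sides."""
    cp, cn = np.cumsum(h_pos), np.cumsum(h_neg)   # counts left of each edge
    tp, tn = cp[-1], cn[-1]
    best, best_t = np.inf, None
    for k in range(len(BINS) - 2):                # interior bin edges
        lp, ln = cp[k], cn[k]
        rp, rn = tp - lp, tn - ln
        l, r = lp + ln, rp + rn
        if l == 0 or r == 0:
            continue
        gini = (l * (1 - (lp / l) ** 2 - (ln / l) ** 2)
                + r * (1 - (rp / r) ** 2 - (rn / r) ** 2))
        if gini < best:
            best, best_t = gini, BINS[k + 1]
    return best_t

# Two "workers" summarize their chunks; the master only adds histograms.
x_pos, x_neg = np.full(50, 0.85), np.full(50, 0.15)
chunk1 = (np.concatenate([x_pos[:25], x_neg[:25]]), np.array([1] * 25 + [0] * 25))
chunk2 = (np.concatenate([x_pos[25:], x_neg[25:]]), np.array([1] * 25 + [0] * 25))
h1, h2 = local_histograms(*chunk1), local_histograms(*chunk2)
threshold = best_split(h1[0] + h2[0], h1[1] + h2[1])
```

The key property is that histograms add: each worker touches only its chunk, and the master's work is independent of the dataset size.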
The Learning Pipeline
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• We’ve scaled up each piece of the pipeline by a large factor over traditional approaches:
– Image Data: > 1000x · Low-level features: 20x – 100x · Learning Algorithm: > 10x
Size Matters
[Figure: test error (area under curve) for claw hammers vs. training set size; y-axis AUC from 0.75 to 1, x-axis from 10^3 to 10^8 examples.]
UNSUPERVISED FEATURE LEARNING
Traditional supervised learning
Testing:
What is this?
Cars Motorcycles
Self-taught learning
Natural scenes
Testing:
What is this?
Car Motorcycle
Learning representations
[Pipeline: Image Data → Low-level features → Learning Algorithm]
• Where do we get good low-level representations?
Computer vision features
SIFT, Spin image, HoG, RIFT, Textons, GLOH
Unsupervised feature learning
Input image (pixels)
“Sparse coding” (edges; cf. V1)
[Related work: Hinton, Bengio, LeCun, and others.]
DBN (Hinton et al., 2006) with additional sparseness constraint.
Higher layer (combinations of edges; cf. V2)
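The sparse-coding step can be summarized by the standard objective below (a common formulation; the talk may use a variant), where the $x^{(i)}$ are image patches, the columns of $D$ are learned basis functions (the edge-like filters), and the sparse activations $a^{(i)}$ serve as features:

```latex
\min_{D,\; a^{(i)}} \;
\sum_{i} \left\| x^{(i)} - D\, a^{(i)} \right\|_2^2
\;+\; \lambda \sum_{i} \left\| a^{(i)} \right\|_1
```

The $\ell_1$ penalty forces most entries of each $a^{(i)}$ to zero, so every patch is explained by a few basis functions; that sparsity is what makes the learned bases resemble V1-style edge detectors.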
Unsupervised feature learning
[Stack: Input image → Model V1 → Higher layer (Model V2?) → Higher layer (Model V3?)]
• Very expensive to train:
– > 1 million examples.
– > 1 million parameters.
Learning Large RBMs on GPUs
(Rajat Raina, Anand Madhavan, Andrew Y. Ng)
[Chart: learning time for 10 million examples (log scale) vs. millions of parameters (1, 18, 36, 45). GPU: roughly ½ hour to 5 hours; dual-core CPU: roughly 8 hours to 2 weeks. Up to 72x faster.]
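RBM training of the kind timed above is dominated by large matrix products, which is what the GPU accelerates. A minimal one-step contrastive-divergence (CD-1) update in NumPy, with biases omitted for brevity (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(W, v0, lr=0.1):
    """One CD-1 step for a binary RBM with weight matrix W
    (visible x hidden). Every @ below is a dense matrix product --
    the operation that dominates training time."""
    h0 = sigmoid(v0 @ W)                                 # hidden probabilities
    h_sample = (rng.uniform(size=h0.shape) < h0).astype(float)
    v1 = sigmoid(h_sample @ W.T)                         # reconstruction
    h1 = sigmoid(v1 @ W)
    grad = v0.T @ h0 - v1.T @ h1                         # positive - negative phase
    return W + lr * grad / v0.shape[0]

# Toy: 8 visible units, 4 hidden units, batch of 16 binary vectors.
W = 0.01 * rng.standard_normal((8, 4))
batch = (rng.uniform(size=(16, 8)) < 0.5).astype(float)
W = cd1_update(W, batch)
```

With millions of parameters, W and the batch become large dense matrices, so the whole update maps onto the GPU's strength.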
Learning features
[Feature hierarchy: Pixels → Edges → Object parts (combinations of edges) → Object models]
• Can now train very complex networks.
• Can learn increasingly complex features.
• Both more specific and more general-purpose than hand-engineered features.
Conclusion
• Performance gains from large training sets are significant, even for very simple learning algorithms.
– Scalability of the system allows these algorithms to improve “for free” over time.
• Unsupervised algorithms promise high-quality features and representations without the need for hand-collected data.
• GPUs are a major enabling technology.
THANK YOU