CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision...
Transcript of CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision...
![Page 1: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/1.jpg)
CMPSCI 670: Computer Vision!Introduction to machine learning
University of Massachusetts, Amherst October 29, 2014
Instructor: Subhransu Maji
![Page 2: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/2.jpg)
• Conclude “introduction to recognition” • Introduction to machine learning
• learning to recognize • machine learning framework • properties of learning algorithms
• Common datasets in computer vision
Today
2
![Page 3: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/3.jpg)
1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s – present: sliding window approaches Late 1990s: local features Early 2000s: parts-and-shape models Mid-2000s: bags of features Present trends: “big data”, context, attributes, combining geometry and recognition, advanced scene understanding tasks, deep learning
History of ideas in recognition
3
![Page 4: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/4.jpg)
The “gist” of a scene: Oliva & Torralba (2001)
Global appearance models revisited
4
http://people.csail.mit.edu/torralba/code/spatialenvelope/
![Page 5: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/5.jpg)
New applications in graphics
5J. Hays and A. Efros, Scene Completion using Millions of Photographs, SIGGRAPH 2007
![Page 6: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/6.jpg)
D. Hoiem, A. Efros, and M. Herbert, Putting Objects in Perspective, CVPR 2006
Geometric context
6
![Page 7: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/7.jpg)
Geometry and recognition
7
V. Hedau, D. Hoiem, and D. Forsyth, Recovering the Spatial Layout of Cluttered Rooms, ICCV 2009.
![Page 8: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/8.jpg)
Geometry and recognition
8
A. Gupta, A. Efros and M. Hebert, Blocks World Revisited: Image Understanding Using Qualitative Geometry and Mechanics, ECCV 2010
![Page 9: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/9.jpg)
Recognition from RGBD Images
9
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, and A. Blake, Real-Time Human Pose Recognition in Parts from a Single Depth Image, CVPR 2011
![Page 10: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/10.jpg)
Attributes for recognition
10
A. Farhadi, I. Endres, D. Hoiem, and D Forsyth, Describing Objects by their Attributes, CVPR 2009
![Page 11: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/11.jpg)
Human “in the loop” recognition
11Branson S., Wah C., Babenko B., Schroff F., Welinder P., Perona P., Belongie S., “Visual Recognition with Humans in the Loop”, European Conference on Computer Vision (ECCV), Heraklion, Crete, Sept., 2010.
![Page 12: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/12.jpg)
• Large datasets become the norm (real world settings) • LabelMe, PASCAL VOC, ImageNet • Enable new machine learning methods (e.g., deep learning)
Crowdsourcing
12
http://www.blogging4jobs.com/hr/solve-your-workplace-issues-by-crowdsourcing/ amazon mechanical turk
![Page 13: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/13.jpg)
Fine-grained recognition
13
often confused
many related classes
![Page 14: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/14.jpg)
Understanding objects in detail
14
Vedaldi et al., CVPR 14
![Page 15: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/15.jpg)
Sentence generation
15
G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, T. Berg, Baby Talk: Understanding and Generating Simple Image Descriptions, CVPR 2011
![Page 16: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/16.jpg)
NY Times article
Deep learning
16
![Page 17: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/17.jpg)
Recent deep learning breakthroughs…
17
96 filters learned in layer 1ImageNet Classification with Deep Convolutional Neural Networks Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton NIPS 2014
![Page 18: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/18.jpg)
• Conclude “introduction to recognition” • Introduction to machine learning
• learning to recognize • machine learning framework • properties of learning algorithms
• Common datasets in computer vision
Today
18
![Page 19: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/19.jpg)
Recognition: A machine learning approach
19
![Page 20: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/20.jpg)
Apply a prediction function to a feature representation of the image to get the desired output:
f( ) = “apple” f( ) = “tomato” f( ) = “cow”
The machine learning framework
20
![Page 21: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/21.jpg)
y = f(x) !
!
Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set Testing: apply f to a never before seen test example x and output the predicted value y = f(x)
The machine learning framework
21
output prediction function
Image feature
![Page 22: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/22.jpg)
Prediction
Steps
Training LabelsTraining
Images
Training
Training
Image Features
Image Features
Testing
Test Image
Learned model
Learned model
Slide credit: D. Hoiem
![Page 23: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/23.jpg)
Features (examples)
23
Histograms, bags of features
GIST descriptors Histograms of oriented gradients(HOG)
Raw pixels (and simple functions of raw pixels)
![Page 24: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/24.jpg)
Classifiers: Nearest neighbor
f(x) = label of the training example nearest to x !All we need is a distance function for our inputs No training required!
Test example
Training examples
from class 1
Training examples
from class 2
![Page 25: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/25.jpg)
Classifiers: Linear
Find a linear function to separate the classes: !
f(x) = sgn(w ⋅ x + b)
![Page 26: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/26.jpg)
Classifier: Decision treesPlay tennis?
26
![Page 27: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/27.jpg)
Images in the training set must be annotated with the “correct answer” that the model is expected to produce
Contains a motorbike
Recognition task and supervision
![Page 28: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/28.jpg)
Unsupervised “Weakly” supervised Fully supervised
Definition depends on task
![Page 29: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/29.jpg)
Generalization
How well does a learned model generalize from the data it was trained on to a new test set?
Training set (labels known) Test set (labels unknown)
![Page 30: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/30.jpg)
Diagnosing generalization abilityTraining error: how well does the model perform at
prediction on the data on which it was trained? Test error: how well does it perform on a never
before seen test set?
Training and test error are both high: underfitting • Model does an equally poor job on the training and the
test set • Either the training procedure is ineffective or the model is
too “simple” to represent the data
Training error is low but test error is high: overfitting • Model has fit irrelevant characteristics (noise) in the
training data • Model is too complex or amount of training data is
insufficient
![Page 31: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/31.jpg)
Underfitting and overfitting
Underfitting OverfittingGood generalization
Figure source
![Page 32: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/32.jpg)
Effect of model complexity
Training error
Test error
Slide credit: D. Hoiem
Underfitting Overfitting
Model complexity HighLow
Err
or
![Page 33: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/33.jpg)
Effect of training set size
Many training examples
Few training examples
Model complexity HighLow
Test
Err
or
Slide credit: D. Hoiem
![Page 34: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/34.jpg)
Validation
Split the dataset into training, validation, and test sets
Use training set to optimize model parameters
Use validation test to choose the best model
Use test set only to evaluate performance
Model complexity
Training set loss
Test set loss
Error
Validation set loss
Stopping point
![Page 35: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/35.jpg)
DatasetsCirca 2001: five categories, hundreds of
images per category Circa 2004: 101 categories Today: up to thousands of categories, millions
of images
![Page 36: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/36.jpg)
Caltech 101 & 256
Griffin, Holub, Perona, 2007
Fei-Fei, Fergus, Perona, 2004
http://www.vision.caltech.edu/Image_Datasets/Caltech101/ http://www.vision.caltech.edu/Image_Datasets/Caltech256/
![Page 37: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/37.jpg)
Caltech-101: Intra-class variability
![Page 38: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/38.jpg)
The PASCAL Visual Object Classes Challenge (2005-2012)
• Challenge classes:Person: person Animal: bird, cat, cow, dog, horse, sheep Vehicle: aeroplane, bicycle, boat, bus, car, motorbike, train Indoor: bottle, chair, dining table, potted plant, sofa, tv/monitor !
• Dataset size (by 2012): 11.5K training/validation images, 27K bounding boxes, 7K segmentations
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 39: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/39.jpg)
PASCAL competitions
Classification: For each of the twenty classes, predicting presence/absence of an example of that class in the test image
Detection: Predicting the bounding box and label of each object from the twenty target classes in the test image
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 40: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/40.jpg)
PASCAL competitions
Segmentation: Generating pixel-wise segmentations giving the class of the object visible at each pixel, or "background" otherwise
Person layout: Predicting the bounding box and label of each part of a person (head, hands, feet)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 41: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/41.jpg)
PASCAL competitions
Action classification (10 action classes)
http://pascallin.ecs.soton.ac.uk/challenges/VOC/
![Page 42: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/42.jpg)
Russell, Torralba, Murphy, Freeman, 2008
LabelMe Dataset http://labelme.csail.mit.edu/
![Page 44: CMPSCI 670: Computer Visionsmaji/cmpsci670/fa14/lectures/...• Common datasets in computer vision Today 2 1960s – early 1990s: the geometric era 1990s: appearance-based models 1990s](https://reader035.fdocuments.us/reader035/viewer/2022062604/5f75f71a7d1e2f475933757e/html5/thumbnails/44.jpg)
• Chapter 14 of Szeliski’s book • A good reference especially for discriminative learning:
Further thoughts and readings
44
The Elements of Statistical Learning T. Hastie and R Tibshirani (2001,2009) http://www.stanford.edu/~hastie/local.ftp/Springer/ESLII_print10.pdf