CS 1699: Intro to Computer Vision
Bias-Variance Trade-off +Other Models and Problems
Prof. Adriana Kovashka, University of Pittsburgh
November 3, 2015
Outline
• Support Vector Machines (review + other uses)
• Bias-variance trade-off
• Scene recognition: Spatial pyramid matching
• Other classifiers
  – Decision trees
  – Hidden Markov models
• Other problems
  – Clustering: agglomerative clustering
  – Dimensionality reduction
Lines in R²

A line ax + cy + b = 0. Let w = (a, c) and x = (x, y); then the line can be written as w·x + b = 0.

Distance from a point (x0, y0) to the line:
D = |a x0 + c y0 + b| / √(a² + c²) = |w·(x0, y0) + b| / ||w||

Kristen Grauman
Support vector machines
• Want line that maximizes the margin.
xi positive (yi = +1):  w·xi + b ≥ 1
xi negative (yi = −1):  w·xi + b ≤ −1
For support vectors, w·xi + b = ±1.
Distance between a point and the line: |w·xi + b| / ||w||
For support vectors, that distance is ±1 / ||w||, so the margin is M = 2 / ||w||.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line
1. Maximize margin 2/||w||
2. Correctly classify all training data points:
Quadratic optimization problem:
Minimize (1/2) wᵀw
Subject to yi(w·xi + b) ≥ 1

xi positive (yi = +1):  w·xi + b ≥ 1
xi negative (yi = −1):  w·xi + b ≤ −1
One constraint for each training point. Note the sign trick: multiplying by yi folds both cases into the single constraint yi(w·xi + b) ≥ 1.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line
• Solution: w = Σi αi yi xi
  b = yi − w·xi (for any support vector)
• Classification function:
  f(x) = sign(w·x + b) = sign(Σi αi yi xi·x + b)
• Notice that it relies on an inner product between the test point x and the support vectors xi
• (Solving the optimization problem also involves computing the inner products xi·xj between all pairs of training points)
If f(x) < 0, classify as negative, otherwise classify as positive.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
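As a concrete illustration, here is a minimal NumPy sketch of this decision function, assuming the multipliers `alphas`, support-vector labels `sv_labels`, support vectors `support_vectors`, and bias `b` have already been learned (the variable names are ours, not from the slides):

```python
import numpy as np

def svm_decision(x, support_vectors, sv_labels, alphas, b):
    """Evaluate f(x) = sign(sum_i alpha_i y_i (x_i . x) + b)."""
    score = np.sum(alphas * sv_labels * (support_vectors @ x)) + b
    return 1 if score >= 0 else -1   # f(x) < 0 -> negative class, otherwise positive
```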
Nonlinear SVMs
• Datasets that are linearly separable work out great:
• But what if the dataset is just too hard?
• We can map it to a higher-dimensional space, e.g. map 1-D data x to the 2-D point (x, x²):
Andrew Moore
Examples of kernel functions
Linear:  K(xi, xj) = xiᵀxj
Gaussian RBF:  K(xi, xj) = exp(−||xi − xj||² / (2σ²))
Histogram intersection:  K(xi, xj) = Σk min(xi(k), xj(k))
Andrew Moore
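A minimal NumPy sketch of the three kernels above; `sigma` and the histogram-valued inputs for the intersection kernel are placeholder choices for illustration:

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                                 # K(xi, xj) = xi^T xj

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def histogram_intersection_kernel(xi, xj):
    return np.sum(np.minimum(xi, xj))              # sum_k min(xi(k), xj(k))
```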
Allowing misclassifications
Maximize the margin while minimizing misclassification: find the w that minimizes
(1/2) ||w||² + C Σi=1..n ξi
where the ξi are slack variables, C is the misclassification cost, and n is the number of data samples.
What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM formulation
• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs
• One vs. others
  – Training: learn an SVM for each class vs. the others
  – Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value
• One vs. one
  – Training: learn an SVM for each pair of classes
  – Testing: each learned SVM “votes” for a class to assign to the test example
Svetlana Lazebnik
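As a sketch of how the two strategies can be wired up in practice, here is one way to do it with scikit-learn's two-class SVMs (the library choice and variable names are ours; the slides do not prescribe an implementation):

```python
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# One vs. others: one SVM per class; assign the class whose SVM gives the highest decision value.
ovr = OneVsRestClassifier(LinearSVC())

# One vs. one: one SVM per pair of classes; each SVM votes for one of its two classes.
ovo = OneVsOneClassifier(LinearSVC())

# Usage (X_train, y_train, X_test are your own data):
# ovr.fit(X_train, y_train); predictions = ovr.predict(X_test)
```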
Evaluating Classifiers
• Accuracy
– # correctly classified / # all test examples
• Precision/recall
– Precision = # predicted true pos / # predicted pos
– Recall = # predicted true pos / # true pos
• F-measure
= 2PR / (P + R)
• Want evaluation metric to be in some range, e.g. [0, 1]
– 0 = worst possible classifier, 1 = best possible classifier
Precision / Recall / F-measure
• Precision = 2 / 5 = 0.4
• Recall = 2 / 4 = 0.5
• F-measure = 2 · 0.4 · 0.5 / (0.4 + 0.5) ≈ 0.44
True positives: images that contain people
True negatives: images that do not contain people
Predicted positives: images predicted to contain people
Predicted negatives: images predicted not to contain people
Accuracy: 5 / 10 = 0.5
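The same numbers, computed in a few lines of Python (the counts are taken from the example above):

```python
def precision_recall_f(true_pos_predicted, predicted_pos, true_pos):
    precision = true_pos_predicted / predicted_pos
    recall = true_pos_predicted / true_pos
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# 2 correct positive predictions, 5 predicted positives, 4 true positives:
print(precision_recall_f(2, 5, 4))   # (0.4, 0.5, 0.444...)
```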
Support Vector Regression
Regression
Regression is like classification except the labels are real valued
Example applications:
‣ Stock value prediction
‣ Income prediction
‣ CPU power consumption
Subhransu Maji
Regularized Error Function for Regression
In linear regression, we minimize the error function:
(1/2) Σn=1..N (y(xn) − tn)² + (λ/2) ||w||²

Use the ε-insensitive error function instead:
C Σn=1..N E_ε(y(xn) − tn) + (1/2) ||w||²

An example of an ε-insensitive error function (t is the true value, y(x) the predicted value):
E_ε(y(x) − t) = 0 if |y(x) − t| < ε, and |y(x) − t| − ε otherwise

Adapted from Huan Liu
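A minimal sketch of the ε-insensitive error function defined above (the value of ε is a placeholder):

```python
import numpy as np

def eps_insensitive_error(y_pred, t, eps=0.1):
    """E_eps = 0 if |y(x) - t| < eps, else |y(x) - t| - eps."""
    abs_err = np.abs(y_pred - t)
    return np.where(abs_err < eps, 0.0, abs_err - eps)
```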
Slack Variables for Regression
For a target point to lie inside the tube:
yn − ε ≤ tn ≤ yn + ε

Introduce slack variables ξn ≥ 0, ξ̂n ≥ 0 to allow points to lie outside the tube.

Minimize:
C Σn=1..N (ξn + ξ̂n) + (1/2) ||w||²

Subject to:
tn ≤ y(xn) + ε + ξn
tn ≥ y(xn) − ε − ξ̂n
ξn ≥ 0, ξ̂n ≥ 0

Adapted from Huan Liu
Next time: Support Vector Ranking
Outline
• Support Vector Machines (review + other uses)
• Bias-variance trade-off
• Scene recognition: Spatial pyramid matching
• Other classifiers
  – Decision trees
  – Hidden Markov models
• Other problems
  – Clustering: agglomerative clustering
  – Dimensionality reduction
Generalization
• How well does a learned model generalize from
the data it was trained on to a new test set?
Training set (labels known); test set (labels unknown)
Slide credit: L. Lazebnik
Generalization
• Components of generalization error
– Bias: how much the average model over all training sets differs
from the true model
• Error due to inaccurate assumptions/simplifications made by
the model
– Variance: how much models estimated from different training
sets differ from each other
• Underfitting: model is too “simple” to represent all the
relevant class characteristics
– High bias and low variance
– High training error and high test error
• Overfitting: model is too “complex” and fits irrelevant
characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error
Slide credit: L. Lazebnik
Bias-Variance Trade-off
• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).
• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).
Slide credit: D. Hoiem
Fitting a model
Figures from Bishop
Is this a good fit?
With more training data
Figures from Bishop
Bias-variance tradeoff
[Figure: training and test error vs. model complexity. Low complexity: underfitting (high bias, low variance; high training and test error). High complexity: overfitting (low bias, high variance; low training error, high test error).]
Slide credit: D. Hoiem
Bias-variance tradeoff
[Figure: test error vs. model complexity for many vs. few training examples (low complexity: high bias, low variance; high complexity: low bias, high variance).]
Slide credit: D. Hoiem
Choosing the trade-off
• Need validation set
• Validation set is separate from the test set
[Figure: training and test error vs. model complexity, as on the previous slides; pick the complexity where error on held-out data is lowest.]
Slide credit: D. Hoiem
Effect of Training Size
[Figure: for a fixed prediction model, training and testing error vs. number of training examples; the two curves approach the generalization error as the training set grows.]
Adapted from D. Hoiem
How to reduce variance?
• Choose a simpler classifier
• Use fewer features
• Get more training data
• Regularize the parameters
Slide credit: D. Hoiem
Regularization
Figures from Bishop
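To make the effect of regularization concrete, here is a small NumPy sketch of regularized (ridge) polynomial fitting in the spirit of the Bishop figures; the degree, λ, and toy data are our own choices, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)   # noisy samples of a sine curve

degree, lam = 9, 1e-3
Phi = np.vander(x, degree + 1)                               # degree-9 polynomial features

# Regularized least squares: w = (Phi^T Phi + lambda I)^-1 Phi^T t
# lam = 0 reproduces the wiggly overfit; larger lam smooths the fit (more bias, less variance).
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ t)
```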
Characteristics of vision learning problems
• Lots of continuous features
– Spatial pyramid may have ~15,000 features
• Imbalanced classes
– Often limited positive examples, practically infinite negative examples
• Difficult prediction tasks
• Recently, massive training sets became available
– If we have a massive training set, we want classifiers with low bias (high variance is ok) and reasonably efficient training
Adapted from D. Hoiem
Remember…
• No free lunch: machine learning algorithms are tools
• Three kinds of error
– Inherent: unavoidable
– Bias: due to over-simplifications
– Variance: due to inability to perfectly estimate parameters from limited data
• Try simple classifiers first
• Better to have smart features and simple classifiers than simple features and smart classifiers
• Use increasingly powerful classifiers with more training data (bias-variance tradeoff)
Adapted from D. Hoiem
Outline
• Support Vector Machines (review + other uses)
• Bias-variance trade-off
• Scene recognition: Spatial pyramid matching
• Other classifiers
  – Decision trees
  – Hidden Markov models
• Other problems
  – Clustering: agglomerative clustering
  – Dimensionality reduction
Beyond Bags of Features: Spatial Pyramid Matching
for Recognizing Natural Scene Categories
CVPR 2006
Svetlana Lazebnik ([email protected])
Beckman Institute, University of Illinois at Urbana-Champaign
Cordelia Schmid ([email protected])
INRIA Rhône-Alpes, France
Jean Ponce ([email protected])
Ecole Normale Supérieure, France
http://www-cvr.ai.uiuc.edu/ponce_grp
Bags of words
Slide credit: L. Lazebnik
Bag-of-features steps
1. Extract local features
2. Learn “visual vocabulary” using clustering
3. Quantize local features using visual vocabulary
4. Represent images by frequencies of “visual words”
Slide credit: L. Lazebnik
Local feature extraction
Slide credit: Josef Sivic
Learning the visual vocabulary
Clustering the local features gives the visual vocabulary (the cluster centers).
Slide credit: Josef Sivic
Image categorization with bag of words
Training
1. Extract bag-of-words representation
2. Train classifier on labeled examples using histogram values as features
Testing
1. Extract keypoints/descriptors
2. Quantize into visual words using the clusters computed at training time
3. Compute visual word histogram
4. Compute label using classifier
Slide credit: D. Hoiem
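A hedged sketch of the vocabulary-learning and histogram steps using k-means (the scikit-learn calls, vocabulary size, and function names are our assumptions; any local descriptor, e.g. SIFT, would supply `all_descriptors`):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, k=200):
    """Learn the visual vocabulary by clustering local descriptors from the training set."""
    return KMeans(n_clusters=k, n_init=10).fit(all_descriptors)

def bow_histogram(descriptors, vocab):
    """Quantize an image's descriptors into visual words and build a normalized histogram."""
    words = vocab.predict(descriptors)
    hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
    return hist / hist.sum()   # frequencies of visual words; feed these to the classifier
```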
What about spatial layout?
All of these images have the same color histogram. Slide credit: D. Hoiem
Spatial pyramid
Compute histogram in each spatial bin
Slide credit: D. Hoiem
Spatial pyramid
[Lazebnik et al. CVPR 2006] Slide credit: D. Hoiem
Pyramid matching: Indyk & Thaper (2003), Grauman & Darrell (2005)
Matching using pyramid and histogram intersection for some particular visual word:
[Figure: original images xi and xj; feature histograms at levels 0, 1, 2, 3; matches at each level are weighted and summed to give the total weight, the value of the pyramid match kernel K(xi, xj).]
Adapted from L. Lazebnik
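A rough sketch of this matching scheme for one visual word: intersect the (suitably normalized) histograms level by level and weight matches found at finer levels more heavily. The 1/2^(L−l) weights follow the usual spatial-pyramid weighting; this is our simplification, not the paper's exact code:

```python
import numpy as np

def histogram_intersection(h1, h2):
    return np.sum(np.minimum(h1, h2))

def pyramid_match(hists_i, hists_j):
    """hists_i[l], hists_j[l]: histograms at level l (0 = coarsest, L = finest)."""
    L = len(hists_i) - 1
    I = [histogram_intersection(hists_i[l], hists_j[l]) for l in range(L + 1)]
    score = I[L]                              # matches at the finest level get weight 1
    for l in range(L):
        new_matches = I[l] - I[l + 1]         # matches that first appear at coarser level l
        score += new_matches / 2 ** (L - l)   # coarser matches are weighted less
    return score
```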
Scene category dataset
Fei-Fei & Perona: 65.2%
Multi-class classification results (100 training images per class)
Fei-Fei & Perona (2005), Oliva & Torralba (2001)
http://www-cvr.ai.uiuc.edu/ponce_grp/data
Slide credit: L. Lazebnik
Scene category confusions
Difficult indoor images
kitchen, living room, bedroom. Slide credit: L. Lazebnik
Caltech101 dataset
Multi-class classification results (30 training images per class)
Fei-Fei et al. (2004)
http://www.vision.caltech.edu/Image_Datasets/Caltech101/Caltech101.html
Slide credit: L. Lazebnik
Outline
• Support Vector Machines (review + other uses)
• Bias-variance trade-off
• Scene recognition: Spatial pyramid matching
• Other classifiers
  – Decision trees
  – Hidden Markov models
• Other problems
  – Clustering: agglomerative clustering
  – Dimensionality reduction
Decision tree classifier
Example problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Slide credit: L. Lazebnik
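For intuition, a tiny scikit-learn sketch on a hand-made encoding of a few of these attributes; the data, encoding, and labels are invented for illustration and are not from the slides:

```python
from sklearn.tree import DecisionTreeClassifier

# Toy encoding: [Hungry (0/1), Patrons (0=None, 1=Some, 2=Full), Raining (0/1)]
X = [[1, 1, 0], [0, 0, 0], [1, 2, 1], [0, 2, 0], [1, 1, 1]]
y = [1, 0, 0, 0, 1]   # 1 = wait for a table, 0 = leave (made-up labels)

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict([[1, 1, 0]]))   # prediction for a new (hypothetical) situation
```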
Decision tree classifier
Slide credit: L. Lazebnik
Sequence Labeling Problem
• Unlike most computer vision problems, many
NLP problems can be viewed as sequence labeling.
• Each token in a sequence is assigned a label.
• Labels of tokens are dependent on the labels of
other tokens in the sequence, particularly their
neighbors (not i.i.d).
foo bar blam zonk zonk bar blam
Adapted from Ray Mooney
Markov Model / Markov Chain
• A finite state machine with probabilistic
state transitions.
• Makes Markov assumption that next state
only depends on the current state and
is independent of previous history.
Ray Mooney
Sample Markov Model for POS
[Figure: state-transition diagram over the POS states Det, Noun, PropNoun, and Verb, with start and stop states and transition probabilities on the arcs.]
Ray Mooney
Sample Markov Model for POS
[Figure: the same state-transition diagram over the POS states Det, Noun, PropNoun, and Verb, with start and stop states and transition probabilities on the arcs.]
P(PropNoun Verb Det Noun) = 0.4 · 0.8 · 0.25 · 0.95 · 0.1 = 0.0076
Adapted from Ray Mooney. Example sentence: “Mary ate the cake.”
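The slide's computation in a few lines of Python; the transition probabilities below are exactly the ones read off the example, and the rest of the chain is omitted:

```python
# Transitions used in the example: start -> PropNoun -> Verb -> Det -> Noun -> stop
transitions = {
    ('start', 'PropNoun'): 0.4,
    ('PropNoun', 'Verb'): 0.8,
    ('Verb', 'Det'): 0.25,
    ('Det', 'Noun'): 0.95,
    ('Noun', 'stop'): 0.1,
}

def sequence_probability(states):
    p = 1.0
    for prev, curr in zip(states, states[1:]):   # product of transition probabilities
        p *= transitions[(prev, curr)]
    return p

print(sequence_probability(['start', 'PropNoun', 'Verb', 'Det', 'Noun', 'stop']))   # 0.0076
```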
Hidden Markov Model
• Probabilistic generative model for sequences.
• Assume an underlying set of hidden (unobserved) states in which the model can be (e.g. parts of speech).
• Assume probabilistic transitions between states over time (e.g. transition from POS to another POS as sequence is generated).
• Assume a probabilistic generation of tokens from states (e.g. words generated for each POS).
Ray Mooney
Sample HMM for POS
[Figure: HMM for POS tagging with hidden states PropNoun (emitting John, Mary, Alice, Jerry, Tom), Noun (cat, dog, car, pen, bed, apple), Det (a, the, that), and Verb (bit, ate, saw, played, hit, gave), plus start and stop states, with transition probabilities on the arcs.]
Ray Mooney
Outline
• Support Vector Machines (review + other uses)
• Bias-variance trade-off
• Scene recognition: Spatial pyramid matching
• Other classifiers
  – Decision trees
  – Hidden Markov models
• Other problems
  – Clustering: agglomerative clustering
  – Dimensionality reduction
Clustering Strategies
• K-means
  – Iteratively re-assign points to the nearest cluster center
• Mean-shift clustering
  – Estimate modes
• Graph cuts
  – Split the nodes in a graph based on assigned links with similarity weights
• Agglomerative clustering
  – Start with each point as its own cluster and iteratively merge the closest clusters
Agglomerative clustering
How to define cluster similarity?
- Average distance between points, maximum distance, minimum distance
How many clusters?
- Clustering creates a dendrogram (a tree)
- Threshold based on max number of clusters or based on distance between merges
[Figure: dendrogram, with distance on the vertical axis.]
Adapted from J. Hays
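A short SciPy sketch of agglomerative clustering with both stopping criteria mentioned above; the toy data, linkage choice, and thresholds are our placeholders:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.random.default_rng(0).random((20, 2))       # toy 2-D points
Z = linkage(points, method='average')                    # merge by average distance between points

labels_by_distance = fcluster(Z, t=0.5, criterion='distance')   # cut the dendrogram at distance 0.5
labels_by_count = fcluster(Z, t=3, criterion='maxclust')         # or ask for at most 3 clusters
```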
Why do we cluster?
• Summarizing data
  – Look at large amounts of data
  – Represent a large continuous vector with the cluster number
• Counting
  – Histograms of texture, color, SIFT vectors
• Segmentation
  – Separate the image into different regions
• Prediction
  – Images in the same cluster may have the same labels
Slide credit: J. Hays, D. Hoiem
Dimensionality Reduction
Figure from Genevieve Patterson, IJCV 2014