CS 1699: Intro to Computer Vision
Support Vector Machines
Prof. Adriana Kovashka, University of Pittsburgh
October 29, 2015
Homework
• HW3: due Nov. 3
– Note: You need to perform k-means on features from multiple frames joined together, not one k-means per frame!
• HW4: due Nov. 24
• HW5: half-length, still 10% of grade, out Nov. 24, due Dec. 10
Plan for today
• Support vector machines
• Bias-variance tradeoff
• Scene recognition: Spatial pyramid matching
The machine learning framework
y = f(x)
• Training: given a training set of labeled examples {(x1,y1), …, (xN,yN)}, estimate the prediction function f by minimizing the prediction error on the training set
• Testing: apply f to a never before seen test example x and output the predicted value y = f(x)
output prediction function
image feature
Slide credit: L. Lazebnik
Nearest neighbor classifier
f(x) = label of the training example nearest to x
• All we need is a distance function for our inputs
• No training required!
Test example
Training examples
from class 1
Training examples
from class 2
Slide credit: L. Lazebnik
K-Nearest Neighbors classification
k = 5
Slide credit: D. Lowe
• For a new point, find the k closest points from the training data
• Labels of the k points “vote” to classify
If query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative.
Black = negative, Red = positive
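The voting rule above can be sketched in a few lines of plain Python (a minimal illustration; the function and variable names are our own, not from the slides):

```python
import math
from collections import Counter

def knn_classify(train, query, k=5):
    """Majority vote among the k training points nearest to `query`.
    `train` is a list of (point, label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With the data of the figure (three negatives near the query, two positives farther away), a 5-NN query returns the negative label, exactly as described.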
Scene Matches
[Hays and Efros. im2gps: Estimating Geographic Information from a Single Image. CVPR 2008.] Slides: James Hays
Nearest Neighbors: Pros and Cons
• NN pros:
+ Simple to implement
+ Decision boundaries not necessarily linear
+ Works for any number of classes
+ Nonparametric method
• NN cons:
– Need a good distance function
– Slow at test time (large search problem to find neighbors)
– Storage of data
Adapted from L. Lazebnik
Discriminative classifiers
Learn a simple function of the input features that correctly predicts the true labels on the training set
Training goals:
1. Accurate classification of training data
2. Correct classifications are confident
3. Classification function is simple

y = f(x)
Slide credit: D. Hoiem
Linear classifier
• Find a linear function to separate the classes
f(x) = sgn(w1x1 + w2x2 + … + wDxD) = sgn(w · x)
Svetlana Lazebnik
Lines in R2

A line: ax + cy + b = 0

Let w = (a, c) and x = (x, y). Then the line is w · x + b = 0.

For a point (x0, y0), the signed distance to the line is

D = (a x0 + c y0 + b) / sqrt(a² + c²) = (wᵀx + b) / ||w||

and the (unsigned) distance from the point to the line is

D = |a x0 + c y0 + b| / sqrt(a² + c²) = |wᵀx + b| / ||w||

Kristen Grauman
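As a quick numerical check of the point-to-line distance formula, here is a small sketch (the function name is our own):

```python
import math

def point_line_distance(w, b, x0):
    """Unsigned distance from point x0 to the line w·x + b = 0:
    D = |w·x0 + b| / ||w||, with w = (a, c) as on the slide."""
    return abs(sum(wi * xi for wi, xi in zip(w, x0)) + b) / math.hypot(*w)
```

For example, the line x + y − 2 = 0 (w = (1, 1), b = −2) is at distance sqrt(2) from the origin.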
Linear classifiers
• Find a linear function to separate positive and negative examples:

Positive: x_i · w + b ≥ 0
Negative: x_i · w + b < 0

Which line is best?
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines
• Discriminative classifier based on optimal separating line (for 2d case)
• Maximize the margin between the positive and negative training examples
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Support vector machines
• Want the line that maximizes the margin.

Positive (y_i = 1): x_i · w + b ≥ 1
Negative (y_i = −1): x_i · w + b ≤ −1

For support vectors, x_i · w + b = ±1. The support vectors lie on the lines w·x + b = 1 and w·x + b = −1, on either side of the decision boundary w·x + b = 0.

Distance between a point and the line: |x_i · w + b| / ||w||

For support vectors this distance is (wᵀx + b) / ||w|| = ±1 / ||w||, so the margin is

M = 1/||w|| + 1/||w|| = 2/||w||

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line

1. Maximize the margin 2/||w||
2. Correctly classify all training data points:

Positive (y_i = 1): x_i · w + b ≥ 1
Negative (y_i = −1): x_i · w + b ≤ −1

Quadratic optimization problem:

Minimize (1/2) wᵀw subject to y_i(w · x_i + b) ≥ 1

One constraint for each training point. Note the sign trick: multiplying by y_i folds the positive and negative constraints into a single inequality.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
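A tiny sketch to sanity-check a candidate (w, b) against this formulation (a hypothetical helper of our own, not a solver; a real QP solver would find w and b itself):

```python
import math

def feasible_and_margin(w, b, data):
    """Check y_i (w·x_i + b) >= 1 for every training point and return
    the margin 2 / ||w||.  `data` is a list of (x, y), y in {+1, -1}."""
    ok = all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1
             for x, y in data)
    return ok, 2.0 / math.hypot(*w)
```

For two points (2, 0) labeled +1 and (−2, 0) labeled −1, the candidate w = (1, 0), b = 0 satisfies both constraints with margin 2; shrinking w would widen the reported margin but violate the constraints.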
Finding the maximum margin line
• Solution: w = Σ_i α_i y_i x_i
(x_i: support vector; α_i: learned weight)
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
Finding the maximum margin line
• Solution: w = Σ_i α_i y_i x_i and b = y_i − w · x_i (for any support vector)
• Classification function:

f(x) = sign(w · x + b) = sign(Σ_i α_i y_i (x_i · x) + b)

• Notice that it relies on an inner product between the test point x and the support vectors x_i
• (Solving the optimization problem also involves computing the inner products x_i · x_j between all pairs of training points)
• If f(x) < 0, classify as negative; otherwise classify as positive.
C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
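The classification function can be evaluated directly from the support vectors, as a short sketch (variable names are our own):

```python
def svm_decision(alphas, labels, support_vecs, b, x):
    """Evaluate f(x) = sign( sum_i alpha_i * y_i * (x_i · x) + b ),
    the linear-kernel form from the slide."""
    s = b + sum(a * y * sum(si * xi for si, xi in zip(sv, x))
                for a, y, sv in zip(alphas, labels, support_vecs))
    return 1 if s >= 0 else -1
```

A single positive support vector at (1, 0) with α = 1 and b = 0 reproduces the linear classifier w = (1, 0).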
Nonlinear SVMs
• Datasets that are linearly separable work out great
• But what if the dataset is just too hard?
• We can map it to a higher-dimensional space:

[Figures: 1-D points along the x axis; mapping them to (x, x²) makes the classes linearly separable]

Andrew Moore
Nonlinear SVMs
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable: Φ: x → φ(x)

Andrew Moore
Nonlinear kernel: Example
• Consider the mapping φ(x) = (x, x²)

φ(x) · φ(y) = (x, x²) · (y, y²) = xy + x²y²
K(x, y) = xy + x²y²
Svetlana Lazebnik
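We can verify numerically that the closed-form kernel equals the dot product in the lifted space:

```python
def phi(x):
    # The slide's lifting map: phi(x) = (x, x^2)
    return (x, x * x)

def K(x, y):
    # Closed form from the slide: K(x, y) = xy + x^2 y^2
    return x * y + (x * y) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For any scalars x, y, `dot(phi(x), phi(y))` and `K(x, y)` agree, which is exactly what makes the kernel trick on the next slide work.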
The “Kernel Trick”
• The linear classifier relies on the dot product between vectors: K(x_i, x_j) = x_i · x_j
• If every data point is mapped into a high-dimensional space via some transformation Φ: x_i → φ(x_i), the dot product becomes: K(x_i, x_j) = φ(x_i) · φ(x_j)
• A kernel function is a similarity function that corresponds to an inner product in some expanded feature space
• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that: K(x_i, x_j) = φ(x_i) · φ(x_j)
Andrew Moore
Examples of kernel functions
• Linear: K(x_i, x_j) = x_iᵀ x_j
• Gaussian RBF: K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²))
• Histogram intersection: K(x_i, x_j) = Σ_k min(x_i(k), x_j(k))
Andrew Moore
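The three kernels above are simple enough to sketch directly (illustrative implementations, our own naming):

```python
import math

def linear_kernel(x, y):
    # K(x, y) = x^T y
    return sum(a * b for a, b in zip(x, y))

def rbf_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * sigma ** 2))

def hist_intersection_kernel(x, y):
    # K(x, y) = sum_k min(x(k), y(k))
    return sum(min(a, b) for a, b in zip(x, y))
```

Note the RBF kernel equals 1 when the two points coincide and decays toward 0 as they move apart, with σ controlling the decay rate.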
Allowing misclassifications
• Maximize the margin while also minimizing misclassification.
• Find the w that minimizes

(1/2)||w||² + C Σ_{i=1..N} ξ_i

where ξ_i is a slack variable for each example, C is the misclassification cost, and N is the number of data samples.
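The soft-margin objective can be evaluated for a candidate (w, b), using the standard slack values ξ_i = max(0, 1 − y_i(w·x_i + b)) (a sketch of our own; an SVM solver minimizes this quantity over w and b):

```python
def soft_margin_objective(w, b, data, C=1.0):
    """(1/2)||w||^2 + C * sum_i xi_i, with slack
    xi_i = max(0, 1 - y_i (w·x_i + b))."""
    slack = [max(0.0, 1 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
             for x, y in data]
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slack)
```

Points classified correctly and outside the margin contribute zero slack, so only margin violations and misclassifications are penalized, scaled by C.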
What about multi-class SVMs?
• Unfortunately, there is no “definitive” multi-class SVM formulation
• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs
• One vs. others
– Training: learn an SVM for each class vs. the others
– Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value
• One vs. one
– Training: learn an SVM for each pair of classes
– Testing: each learned SVM “votes” for a class to assign to the test example
Svetlana Lazebnik
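The one-vs-others test rule amounts to an argmax over per-class decision values; a minimal sketch (the mapping from class name to decision function is our own hypothetical interface):

```python
def one_vs_others_predict(decision_fns, x):
    """Each class's binary SVM returns a decision value for x;
    assign the class whose SVM scores highest."""
    return max(decision_fns, key=lambda c: decision_fns[c](x))
```

For one-vs-one, the analogous rule would tally the votes of all pairwise SVMs and return the most-voted class.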
SVMs for recognition
1. Define your representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Use this “kernel matrix” to solve for SVM support vectors & weights.
5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output.
Kristen Grauman
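Step 3 of this recipe is just a table of pairwise kernel evaluations; a one-line sketch (our own naming):

```python
def kernel_matrix(examples, kernel):
    """Matrix of pairwise kernel values K[i][j] = kernel(x_i, x_j)
    between labeled examples, as a nested list."""
    return [[kernel(a, b) for b in examples] for a in examples]
```

This matrix is what the solver in step 4 consumes; note it is symmetric whenever the kernel is.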
Example: learning gender with SVMs
Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.
Moghaddam and Yang, Face & Gesture 2000.
Kristen Grauman
Learning gender with SVMs
• Training examples:
– 1044 males
– 713 females
• Experiment with various kernels, select Gaussian RBF:

K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²))
Kristen Grauman
Support Faces
Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.
Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.
Gender perception experiment:How well can humans do?
• Subjects:
– 30 people (22 male, 8 female)
– Ages mid-20’s to mid-40’s
• Test data:
– 254 face images (6 males, 4 females)
– Low res and high res versions
• Task:
– Classify as male or female, forced choice
– No time limit
Moghaddam and Yang, Face & Gesture 2000.
Moghaddam and Yang, Face & Gesture 2000.
Gender perception experiment:How well can humans do?
[Bar charts: error rates at low and high resolution]
Human vs. Machine
• SVMs performed better than any single human test subject, at either resolution
Kristen Grauman
Hardest examples for humans
Moghaddam and Yang, Face & Gesture 2000.
SVMs: Pros and cons
• Pros
+ Many publicly available SVM packages: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (or use the built-in Matlab version, but it is slower)
+ Kernel-based framework is very powerful, flexible
+ Often a sparse set of support vectors – compact at test time
+ Work very well in practice, even with very small training sample sizes
• Cons
– No “direct” multi-class SVM; must combine two-class SVMs
– Can be tricky to select the best kernel function for a problem
– Computation, memory
· During training, must compute a matrix of kernel values for every pair of examples
· Learning can take a very long time for large-scale problems
Adapted from Lana Lazebnik