Support Vector Machines - UCF Computer Science (CAP5610/CAP5610Lecture07.pdf)
CAP 5610: Machine Learning
Instructor: Guo-Jun QI
Support Vector Machines
Linear Classifier
Naive Bayes
Assumes each attribute is drawn from a Gaussian
distribution with the same variance
Generative model: the mean and variance are
estimated with a closed-form solution
Logistic regression
Directly maximizes the log likelihood to fit the
model to the training data
Discriminative model: no closed-form solution;
a gradient ascent method is used.
Drawback
Both lack a geometric intuition explaining
what makes a good linear classifier in a high-
dimensional space.
SVM
Supervised learning methods used for
Classification
Regression
A special property: simultaneously
minimize the classification error and
maximize the geometric margin,
hence the name: maximum margin classifier
Excellent theory and good performance
Outline
Linear SVM – hard margin
Linear SVM – soft margin
Non-linear SVM
Application
Linear Classifiers
An input x is mapped to an estimated label yest by
f(x, w, b) = sign(w · x + b)
with parameters w and b.
Label y: denotes +1 or denotes -1.
The hyperplane w · x + b = 0 separates the region where
w · x + b > 0 from the region where w · x + b < 0.
How would you classify this data?
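The decision rule can be sketched in a few lines of NumPy (a minimal illustration; the weight vector w, bias b, and test points below are made-up values, not from the slides):

```python
import numpy as np

def linear_classify(x, w, b):
    """Linear classifier: f(x, w, b) = sign(w . x + b)."""
    return int(np.sign(np.dot(w, x) + b))

# Hypothetical parameters and points for illustration
w = np.array([2.0, -1.0])
b = 0.5

print(linear_classify(np.array([1.0, 0.0]), w, b))   # w.x + b = 2.5 > 0 -> 1
print(linear_classify(np.array([-1.0, 1.0]), w, b))  # w.x + b = -2.5 < 0 -> -1
```

Points on opposite sides of the hyperplane w · x + b = 0 get opposite labels.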
Linear Classifiers
f(x, w, b) = sign(w · x + b)
[A sequence of slides shows the same data set separated by several different candidate lines.]
Any of these would be fine..
..but which is best?
A poorly chosen boundary misclassifies points into the +1 class.
Classifier Margin
f(x, w, b) = sign(w · x + b)
Define the margin of a linear classifier as the width that the
boundary could be increased by before hitting a data point.
Maximum Margin
f(x, w, b) = sign(w · x + b)
The maximum margin linear classifier is the linear classifier with the maximum margin.
This is the simplest kind of SVM (called a linear SVM, or LSVM).
Support vectors are the data points that the margin pushes up against.
1. Maximizing the margin makes sense intuitively.
2. It implies that only the support vectors are important; the other training examples can be discarded without affecting the training result.
Maximum Margin (continued)
Keeping only the support vectors will not change the maximum margin classifier.
The classifier is robust to small changes (noise) in the non-support vectors.
Basics of SVM math
The vector w/||w||:
is perpendicular to the hyperplane w · x + b = 0
has unit length
The margin between two parallel hyperplanes w · x + b1 = 0 and w · x + b2 = 0 is
|b1 - b2| / ||w||
Derivation: take x1 with w · x1 + b1 = 0 and x2 with w · x2 + b2 = 0.
Subtracting gives w · (x1 - x2) = b2 - b1, so the distance along the unit normal is
|w · (x1 - x2)| / ||w|| = |b1 - b2| / ||w||.
Decision rule:
Positive examples: w · x+ + b ≥ +1
Negative examples: w · x- + b ≤ -1
For points x+ and x- lying exactly on the two margin hyperplanes these hold with equality, and subtracting the two equations gives w · (x+ - x-) = 2.
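The distance formula between two parallel hyperplanes can be checked numerically. A minimal sketch (the vector w and offsets b1, b2 below are arbitrary example values, not from the slides):

```python
import numpy as np

w = np.array([3.0, 4.0])      # normal vector, ||w|| = 5
b1, b2 = -2.0, 8.0            # two parallel hyperplanes w.x + b = 0

# A point on each hyperplane along the normal direction: x = -b * w / ||w||^2
x1 = -b1 * w / np.dot(w, w)
x2 = -b2 * w / np.dot(w, w)

margin = abs(b1 - b2) / np.linalg.norm(w)   # formula: |b1 - b2| / ||w||
distance = np.linalg.norm(x1 - x2)          # direct distance between the points

print(margin, distance)  # both equal 2.0
```

Both computations agree, confirming that the offset difference divided by ||w|| measures the perpendicular gap between the hyperplanes.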
Linear SVM Mathematically
What we know:
w · x+ + b = +1
w · x- + b = -1
w · (x+ - x-) = 2
Margin width: M = (x+ - x-) · w/||w|| = 2/||w||
Linear SVM Mathematically
Goal:
1) Correctly classify all training data:
w · xi + b ≥ +1 if yi = +1
w · xi + b ≤ -1 if yi = -1
equivalently, yi(w · xi + b) ≥ 1 for all i
2) Maximize the margin M = 2/||w||, which is the same as minimizing Φ(w) = ½wTw
We can formulate a quadratic optimization problem and solve for w and b:
Minimize ½wTw
subject to yi(w · xi + b) ≥ 1 for all i
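For the special case of exactly two training points, one per class, this optimization has a closed-form answer: the max-margin hyperplane is the perpendicular bisector of the segment joining them. A pure-NumPy sketch under that simplifying assumption (the two data points are invented for illustration):

```python
import numpy as np

# One positive and one negative example (hypothetical data)
x_pos = np.array([2.0, 2.0])
x_neg = np.array([0.0, 0.0])

# With only two points, w is parallel to (x_pos - x_neg), scaled so that
# w.x_pos + b = +1 and w.x_neg + b = -1 hold with equality:
d = x_pos - x_neg
w = 2.0 * d / np.dot(d, d)
b = 1.0 - np.dot(w, x_pos)

margin = 2.0 / np.linalg.norm(w)   # here this equals ||x_pos - x_neg||

print(np.dot(w, x_pos) + b)  # +1.0
print(np.dot(w, x_neg) + b)  # -1.0
print(margin)
```

A general data set needs a quadratic programming solver, but the two-point case already shows the constraints yi(w · xi + b) = 1 being active at the optimum.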
Solving the Optimization Problem
We need to optimize a quadratic function subject to linear
constraints.
Introduce a Lagrange multiplier αi for every constraint
yi(w · xi + b) ≥ 1; this yields the dual problem:
Find α1…αN such that
Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj is maximized and
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
Refer: Christopher J. C. Burges: A Tutorial on Support Vector Machines for
Pattern Recognition, Data Mining and Knowledge Discovery, 1998
The Optimization Problem Solution
The solution has the form:
w = Σαiyixi
b = yk - wTxk for any xk such that αk ≠ 0
αi must satisfy the Karush-Kuhn-Tucker (KKT) conditions:
αi [yi(wTxi+b) - 1] = 0, for every i
If αi > 0, then yi(wTxi+b) = 1, so xi is on the margin
If yi(wTxi+b) > 1, then αi = 0
Each non-zero αi indicates that the corresponding xi
is a support vector.
Maximum Margin
w and b depend only on the support vectors, via the active constraints
yi(wTxi+b) - 1 = 0
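These formulas can be exercised on a tiny example. In the sketch below the data and the multipliers α are hand-picked so that the KKT conditions hold (in practice α would come from a QP solver):

```python
import numpy as np

# Hypothetical separable toy data; alpha chosen so the KKT checks pass.
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])

# w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X

# b = y_k - w.x_k for any support vector (alpha_k != 0)
k = np.argmax(alpha > 0)
b = y[k] - w @ X[k]

# KKT complementarity: alpha_i * [y_i (w.x_i + b) - 1] = 0 for all i
slack = y * (X @ w + b) - 1.0
print(w, b)               # recovered hyperplane parameters
print(alpha * slack)      # all zeros: complementarity holds
```

Both points have nonzero αi here, so both are support vectors and both constraints are active.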
The Optimization Problem Solution
To classify a new test point x, we use
f(x) = wTx + b = ΣαiyixiTx + b
where α1…αN are found by maximizing
Q(α) = Σαi - ½ΣΣαiαjyiyjxiTxj subject to
(1) Σαiyi = 0
(2) αi ≥ 0 for all αi
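Note that f(x) uses the training points only through inner products xiTx, so w never needs to be formed explicitly (this is what later enables the kernel trick). A sketch with hypothetical toy data and multipliers (α would come from the dual maximization in practice):

```python
import numpy as np

# Invented training data, labels, multipliers, and bias for illustration
X = np.array([[2.0, 2.0], [0.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
b = -1.0

def f(x):
    """Dual-form decision value: sum_i alpha_i y_i (x_i . x) + b."""
    return np.sum(alpha * y * (X @ x)) + b

print(np.sign(f(np.array([3.0, 3.0]))))  # classified as +1
print(np.sign(f(np.array([0.5, 0.0]))))  # classified as -1
```

Only points with nonzero αi contribute to the sum, so at test time the classifier touches just the support vectors.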