Support Vector Machines - UCF Computer Science (gqi/CAP5610/CAP5610Lecture07.pdf)


Transcript of the lecture slides:

Page 1:


CAP 5610: Machine Learning

Instructor: Guo-Jun QI

Support Vector Machines

Page 2:

Linear Classifier

Naive Bayes
- Assume each attribute is drawn from a Gaussian distribution with the same variance
- Generative model: estimate the mean and variance with a closed-form solution

Logistic regression
- Directly maximize the log likelihood to fit the model to the training data
- Discriminative model: no closed-form solution; a gradient ascent method is used
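As a minimal sketch of the generative closed-form fit, the per-class means and the shared variance can be computed directly (the one-dimensional two-class data below is hypothetical, chosen only for illustration):

```python
# Closed-form generative fit: per-class means and one shared (pooled) variance
# Toy 1-D two-class data; the values are hypothetical
data = {+1: [2.0, 2.5, 3.0], -1: [0.0, 0.5, 1.0]}

means = {c: sum(xs) / len(xs) for c, xs in data.items()}   # class means
n = sum(len(xs) for xs in data.values())
shared_var = sum((x - means[c]) ** 2                        # pooled variance
                 for c, xs in data.items() for x in xs) / n

print(means, shared_var)
```

With Gaussian class-conditionals sharing a single variance, the resulting decision boundary is linear in x, which is why Naive Bayes appears on this slide as a linear classifier.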


Page 3:

Drawback

These models lack a geometric intuition for what makes a good linear classifier in high-dimensional space.


Page 4:

SVM

Supervised learning methods used for
- Classification
- Regression

A special property: simultaneously
- minimize the classification error
- maximize the geometric margin

Hence: a maximum margin classifier. Excellent theory and good performance.


Page 5:

Outline

Linear SVM – hard margin

Linear SVM – soft margin

Non-linear SVM

Application


Page 6:

Outline

Linear SVM – hard margin

Linear SVM – soft margin

Non-linear SVM

Application


Page 7:

Linear Classifiers: x → f(x, w, b) → y

(Figure: datapoints with labels y, one marker denoting +1 and the other denoting -1; the boundary separates the region w·x + b > 0 from the region w·x + b < 0.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

How would you classify this data?

Page 8:

Linear Classifiers: x → f(x, w, b) → y_est

(Figure: the labeled datapoints (+1 / -1) with one candidate linear boundary.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

How would you classify this data?

Page 9:

Linear Classifiers: x → f(x, w, b) → y_est

(Figure: the labeled datapoints (+1 / -1) with another candidate linear boundary.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

How would you classify this data?

Page 10:

Linear Classifiers: x → f(x, w, b) → y_est

(Figure: the labeled datapoints (+1 / -1) with several candidate linear boundaries.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

Any of these would be fine... but which is best?

Page 11:

Linear Classifiers: x → f(x, w, b) → y_est

(Figure: a candidate boundary that misclassifies a datapoint to the +1 class.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

How would you classify this data?

Page 12:

Classifier Margin: x → f(x, w, b) → y_est

(Figure: labeled datapoints (+1 / -1) with a linear boundary and its margin.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.

Page 13:

Maximum Margin: x → f(x, w, b) → y_est

(Figure: the maximum margin boundary; the support vectors lie on the margin.)

f(x, w, b) = sign(w·x + b), with parameters w and b.

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM, or LSVM).

Support vectors are those datapoints that the margin pushes up against.

1. Maximizing the margin makes sense according to intuition.
2. It implies that only the support vectors are important; the other training examples can be discarded without affecting the training result.

Page 14:

Maximum Margin: x → f(x, w, b) → y_est

(Figure: the maximum margin boundary with its support vectors.)

Keeping only the support vectors will not change the maximum margin classifier, so the classifier is robust to small changes (noise) in the non-support vectors.

Page 15:

Basics to SVM math

w/||w||:
- Perpendicular to the line w·x + b = 0
- Unit length

The margin between two parallel lines w·x + b1 = 0 and w·x + b2 = 0 is

  M = |b1 - b2| / ||w||

since for a point x1 on the first line and a point x2 on the second,

  w·x1 + b1 = 0
  w·x2 + b2 = 0
  => w·(x1 - x2) = b2 - b1,

and projecting x1 - x2 onto the unit normal w/||w|| gives |w·(x1 - x2)| / ||w|| = |b1 - b2| / ||w||.
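This margin formula can be checked numerically; the sketch below uses toy values for w, b1, b2 chosen only for illustration:

```python
import math

# Two parallel lines w.x + b1 = 0 and w.x + b2 = 0 (toy values)
w = (3.0, 4.0)                          # ||w|| = 5
b1, b2 = 2.0, -8.0

margin = abs(b1 - b2) / math.hypot(*w)  # |b1 - b2| / ||w||

# Cross-check with one point on each line: x_i = -b_i * w / ||w||^2
nw2 = sum(wi * wi for wi in w)          # ||w||^2
x1 = tuple(-b1 * wi / nw2 for wi in w)  # satisfies w.x1 + b1 = 0
x2 = tuple(-b2 * wi / nw2 for wi in w)  # satisfies w.x2 + b2 = 0
dist = math.dist(x1, x2)                # x1 - x2 points along w/||w||
print(margin, dist)                     # both 2.0
```

The chosen x1 and x2 differ only along the unit normal, so their Euclidean distance equals the margin.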

Page 16:

Linear SVM Mathematically

Decision rule:
- Positive examples: w·x+ + b ≥ +1
- Negative examples: w·x- + b ≤ -1

(Figure: x+ and x- are the closest positive and negative examples, sitting on the margin lines.)

For such boundary points the constraints hold with equality, so subtracting the two equations gives w·(x+ - x-) = 2.

Page 17:

Linear SVM Mathematically

What we know:

  w·x+ + b = +1
  w·x- + b = -1
  => w·(x+ - x-) = 2

(Figure: x+ and x- on the two margin lines, separated by the margin width M.)

  M = w·(x+ - x-) / ||w|| = 2 / ||w||   (M = margin width)
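A minimal numeric check of the margin width, using hypothetical support vectors that sit exactly on the two margin lines:

```python
import math

w, b = (1.0, 0.0), -1.0                    # toy classifier parameters
x_plus, x_minus = (2.0, 0.0), (0.0, 0.0)   # hypothetical support vectors

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
assert dot(w, x_plus) + b == +1            # w.x+ + b = +1
assert dot(w, x_minus) + b == -1           # w.x- + b = -1

# M = w.(x+ - x-) / ||w|| = 2 / ||w||
M = dot(w, tuple(p - m for p, m in zip(x_plus, x_minus))) / math.hypot(*w)
print(M)    # 2.0
```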

Page 18:

Linear SVM Mathematically

Goal:

1) Correctly classify all training data:

   w·xi + b ≥ +1 if yi = +1
   w·xi + b ≤ -1 if yi = -1
   i.e. yi(w·xi + b) ≥ 1 for all i

2) Maximize the margin M = 2/||w||; the same as minimizing (1/2) wᵀw.

We can formulate a quadratic optimization problem and solve for w and b:

  Minimize (1/2) wᵀw
  subject to yi(w·xi + b) ≥ 1 for all i
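The constraints and objective can be checked directly on a small example; the data and the parameters below are hypothetical, with w and b picked by hand to be feasible (and in fact maximal-margin) for this toy set:

```python
import math

# Toy separable training set: (x_i, y_i) pairs (hypothetical)
data = [((2.0, 0.0), +1), ((3.0, 1.0), +1), ((0.0, 0.0), -1), ((0.0, 1.0), -1)]
w, b = (1.0, 0.0), -1.0     # hand-picked feasible parameters

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

# Feasibility: y_i (w.x_i + b) >= 1 for every training point
assert all(yi * (dot(w, xi) + b) >= 1 for xi, yi in data)

objective = 0.5 * dot(w, w)            # (1/2) w^T w, what the QP minimizes
print(objective, 2 / math.hypot(*w))   # 0.5 and margin M = 2.0
```

Minimizing (1/2) wᵀw over all feasible (w, b) is exactly what makes the margin 2/||w|| as large as possible.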

Page 19:

Solving the Optimization Problem

We need to optimize a quadratic function subject to linear constraints. Use Lagrange multipliers: a multiplier αi is associated with each constraint yi(w·xi + b) ≥ 1, which leads to the dual problem:

Find α1…αN such that
  Q(α) = Σαi - ½ ΣΣ αiαj yiyj xiᵀxj is maximized and
  (1) Σαiyi = 0
  (2) αi ≥ 0 for all i

Refer: Christopher J. C. Burges: A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 1998.
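As a sketch, the dual objective and its constraints can be evaluated directly on a tiny two-point problem (the data and the candidate α below are hypothetical; this particular α happens to be the dual optimum for these two points):

```python
X = [(2.0, 0.0), (0.0, 0.0)]    # toy training points
y = [+1, -1]
alpha = [0.5, 0.5]              # candidate dual variables

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

# Constraint (1): sum_i alpha_i y_i = 0; constraint (2): alpha_i >= 0
assert sum(a * yi for a, yi in zip(alpha, y)) == 0
assert all(a >= 0 for a in alpha)

# Q(alpha) = sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j x_i.x_j
n = len(X)
Q = sum(alpha) - 0.5 * sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
                           for i in range(n) for j in range(n))
print(Q)    # 0.5
```

For this toy problem the primal optimum is w = (1, 0), so (1/2) wᵀw = 0.5, matching the dual value Q(α) = 0.5 as strong duality requires.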

Page 20:

The Optimization Problem Solution

The solution has the form:

  w = Σαiyixi
  b = yk - wᵀxk for any xk such that αk > 0

The αi must satisfy the Karush-Kuhn-Tucker (KKT) conditions:

  αi [yi(wᵀxi + b) - 1] = 0, for any i

- If αi > 0, then yi(wᵀxi + b) - 1 = 0, i.e. xi is on the margin.
- If yi(wᵀxi + b) > 1, then αi = 0.

Each non-zero αi indicates that the corresponding xi is a support vector.
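Given a dual solution, w and b follow directly from these formulas; a sketch on a hypothetical two-point problem where both points are support vectors:

```python
X = [(2.0, 0.0), (0.0, 0.0)]    # toy training points (both support vectors)
y = [+1, -1]
alpha = [0.5, 0.5]              # dual solution for this toy problem

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

# w = sum_i alpha_i y_i x_i
w = tuple(sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X)) for d in range(2))
# b = y_k - w.x_k for any support vector (alpha_k > 0)
b = y[0] - dot(w, X[0])
print(w, b)   # (1.0, 0.0) -1.0

# KKT: alpha_i [y_i (w.x_i + b) - 1] = 0 for every i
assert all(a * (yi * (dot(w, xi) + b) - 1) == 0 for a, yi, xi in zip(alpha, y, X))
```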

Page 21:

Maximum Margin

(Figure: datapoints labeled +1 and -1 with the maximum margin boundary.)

w and b depend only on the support vectors, via the active constraints

  yi(wᵀxi + b) - 1 = 0

Page 22:

The Optimization Problem Solution

Recall the dual: find α1…αN such that
  Q(α) = Σαi - ½ ΣΣ αiαj yiyj xiᵀxj is maximized and
  (1) Σαiyi = 0
  (2) αi ≥ 0 for all i

To classify a new test point x, we use

  f(x) = wᵀx + b = Σαiyi xiᵀx + b
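A sketch of classifying a new point with the dual form of f; the toy data, α, and the test point are hypothetical, and the dual form is compared against the primal wᵀx + b:

```python
X = [(2.0, 0.0), (0.0, 0.0)]    # toy support vectors
y = [+1, -1]
alpha = [0.5, 0.5]
w, b = (1.0, 0.0), -1.0         # the corresponding primal solution

dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    # f(x) = sum_i alpha_i y_i (x_i . x) + b
    return sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X)) + b

x_new = (3.0, 5.0)
print(f(x_new), dot(w, x_new) + b)   # 2.0 2.0 -> classified as +1
```

Note that f only touches the data through inner products xiᵀx, which is what later enables the kernel trick in the non-linear SVM.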