UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü...

84
UVA CS 4501: Machine Learning Lecture 16: Support Vector Machine Dr. Yanjun Qi University of Virginia Department of Computer Science

Transcript of UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü...

Page 1: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

UVACS4501:MachineLearning

Lecture16:SupportVectorMachine

Dr.Yanjun Qi

UniversityofVirginiaDepartmentofComputerScience

Page 2: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Wherearewe?èFivemajorsectionsofthiscourse

q Regression(supervised)q Classification(supervised)q Unsupervisedmodelsq Learningtheoryq Graphicalmodels

4/3/18 2

Dr.YanjunQi/UVA

Page 3: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Today

qSupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü MulticlassSVM

4/3/18 3

Dr.YanjunQi/UVA

Page 4: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

HistoryofSVM• SVMisinspiredfromstatisticallearningtheory[3]• SVMwasfirstintroducedin1992[1]

• SVMbecomespopularbecauseofitssuccessinhandwrittendigitrecognition(1994)– 1.1%testerror rateforSVM.– Thesameastheerrorratesofacarefullyconstructedneuralnetwork,LeNet 4.

• Section5.11in[2]orthediscussionin[3]fordetails

• Regardedasanimportantexampleof“kernelmethods”,arguablythehottestareainmachinelearning20yearsago

[1] B.E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory 5 144-152, Pittsburgh, 1992.

[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77-82, 1994.

[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.

4/3/18 4

Dr.YanjunQi/UVA

Theoreticallysound/Impactful

Page 5: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

ApplicationsofSVMs

• ComputerVision• TextCategorization• Ranking(e.g.,Googlesearches)• HandwrittenCharacterRecognition• Timeseriesanalysis• Bioinformatics• ……….

àLotsofverysuccessfulapplications!!!

4/3/18

Dr.YanjunQi/UVA

5

Page 6: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Handwrittendigitrecognition

4/3/18

Dr.YanjunQi/UVA

6

Page 7: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

ADatasetforbinary

classification

• Data/points/instances/examples/samples/records:[rows]• Features/attributes/dimensions/independent

variables/covariates/predictors/regressors:[columns,exceptthelast]• Target/outcome/response/label/dependentvariable:specialcolumntobe

predicted[lastcolumn]

4/3/18 7

Output as Binary Class:

only two possibilities

Dr.YanjunQi/UVA

Page 8: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

AffineHyperplanes

• https://en.wikipedia.org/wiki/Hyperplane• Anyhyperplanecanbegivenin coordinates asthesolutionofasinglelinear(algebraic)equation ofdegree 1.

4/3/18

Dr.YanjunQi/UVA

8

Page 9: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Review:VectorProduct,Orthogonal,andNorm

Fortwovectorsx andy,

xTy

iscalledthe(inner)vectorproduct.

Thesquarerootoftheproductofavectorwithitself,

iscalledthe2-norm (|x|2),canalsowriteas|x|

x andy arecalledorthogonal if

xTy=0

4/3/18 9

Dr.YanjunQi/UVA

Page 10: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Today

qSupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü MulticlassSVM

4/3/18 10

Dr.YanjunQi/UVA

Page 11: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

LinearClassifiersfx yest

denotes+1denotes-1

Howwouldyouclassifythisdata?

4/3/18 11

Dr.YanjunQi/UVA

Page 12: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

LinearClassifiersfx yest

denotes+1denotes-1

Howwouldyouclassifythisdata?

4/3/18 12

Dr.YanjunQi/UVA

Page 13: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

LinearClassifiersfx yest

denotes+1denotes-1

Howwouldyouclassifythisdata?

4/3/18 13

Dr.YanjunQi/UVA

Page 14: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

LinearClassifiersfx yest

denotes+1denotes-1

Howwouldyouclassifythisdata?

4/3/18 14

Dr.YanjunQi/UVA

Page 15: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

LinearClassifiersfx yest

denotes+1denotes-1

Anyofthesewouldbefine..

..butwhichisbest?

4/3/18 15

Dr.YanjunQi/UVA

Page 16: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

ClassifierMarginfx yest

denotes+1denotes-1 Definethemargin of

alinearclassifierasthewidththattheboundarycouldbeincreasedbybeforehittingadatapoint.

4/3/18 16

Dr.YanjunQi/UVA

Page 17: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

MaximumMarginfx yest

denotes+1denotes-1 Themaximum

marginlinearclassifier isthelinearclassifierwiththe,maximummargin.ThisisthesimplestkindofSVM(CalledanLSVM)

LinearSVM4/3/18 17

Dr.YanjunQi/UVA

Page 18: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

MaximumMarginfx yest

denotes+1denotes-1 Themaximum

marginlinearclassifier isthelinearclassifierwiththe,maximummargin.ThisisthesimplestkindofSVM(CalledanLSVM)

Support Vectorsarethosedatapointsthatthemarginpushesupagainst

LinearSVM4/3/18 18

Dr.YanjunQi/UVA

x2

x1

Page 19: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

MaximumMarginfx yest

denotes+1denotes-1 Themaximum

marginlinearclassifier isthelinearclassifierwiththe,maximummargin.ThisisthesimplestkindofSVM(CalledanLSVM)

Support Vectorsarethosedatapointsthatthemarginpushesupagainst

LinearSVM

f(x,w,b) =sign(wTx +b)

4/3/18 19

Dr.YanjunQi/UVA

x2

x1

Page 20: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Max margin classifiers

• Instead of fitting all points, focus on boundary points

• Learn a boundary that leads to the largest margin from both sets of points

From all the possible boundary lines, this leads to the largest margin on both sides

4/3/18 20

Dr.YanjunQi/UVA

x2

x1

Page 21: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Max margin classifiers

• Instead of fitting all points, focus on boundary points

• Learn a boundary that leads to the largest margin from points on both sides

D

DWhy MAX margin?

• Intuitive, ‘makes sense’

• Some theoretical support

• Works well in practice

4/3/18 21

Dr.YanjunQi/UVA

x2

x1

Page 22: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Max margin classifiers

• Instead of fitting all points, focus on boundary points

• Learn a boundary that leads to the largest margin from points on both sides

D

DThat is why Also known as linear support vector machines (SVMs)

These are the vectors supporting the boundary

4/3/18 22

Dr.YanjunQi/UVA

x2

x1

Page 23: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Max-margin&DecisionBoundary

Class -1

Class 1

4/3/18

Dr.YanjunQi/UVA

23

W is a p-dim vector; b is a

scalar

x2

x1

Page 24: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Max-margin&DecisionBoundary• Thedecisionboundaryshouldbeasfarawayfrom

thedataofbothclassesaspossible

Class -1

Class 1

W is a p-dim vector; b is a

scalar

4/3/18

Dr.YanjunQi/UVA

24

Page 25: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Specifying a max margin classifier

Classify as +1 if wTx+b >= 1

Classify as -1 if wTx+b <= - 1

Undefined if -1 <wTx+b < 1

Class +1 plane

boundary

Class -1 plane

4/3/18 25

Dr.YanjunQi/UVA

Page 26: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Specifying a max margin classifier

Classify as +1 if wTx+b >= 1

Classify as -1 if wTx+b <= - 1

Undefined if -1 <wTx+b < 1

Is the linear separation assumption realistic?

We will deal with this shortly, but lets assume it for now

4/3/18 26

Dr.YanjunQi/UVA

Page 27: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Today

q SupervisedClassificationq SupportVectorMachine(SVM)

ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü MulticlassSVM

4/3/18 27

Dr.YanjunQi/UVA

Page 28: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Maximizing the marginClassify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1

• Lets define the width of the margin by M

• How can we encode our goal of maximizing M in terms of our parameters (w and b)?

• Lets start with a few obsevrations

4/3/18 28

Dr.YanjunQi/UVA

Page 29: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

MarginM

4/3/18

Dr.YanjunQi/UVA

29

Page 30: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

30

• wT x+ + b = +1

• wT x- + b = -1

• M = | x+ - x- | = ?

Page 31: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Maximizing the margin: observation-1

• Observation 1: the vector w is orthogonal to the +1 plane

• Why?

Corollary: the vector w is orthogonal to the -1 plane 4/3/18 31

Dr.YanjunQi/UVA

Page 32: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Maximizing the margin: observation-1

Classify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1

• Observation 1: the vector w is orthogonal to the +1 plane

• Why?

Let u and v be two points on the +1 plane, then for the vector defined by u and v we have wT(u-v) = 0

Corollary: the vector w is orthogonal to the -1 plane 4/3/18 32

Dr.YanjunQi/UVA

Page 33: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Maximizing the margin: observation-1

Classify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1

• Observation 1: the vector w is orthogonal to the +1 plane

• Why?

Let u and v be two points on the +1 plane, then for the vector defined by u and v we have wT(u-v) = 0

Corollary: the vector w is orthogonal to the -1 plane 4/3/18 33

Dr.YanjunQi/UVA

Page 34: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Observation1è Review: Orthogonal

4/3/18 34

Dr.YanjunQi/UVA

Page 35: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Maximizing the margin: observation-1

• Observation1:thevectorwisorthogonaltothe+1plane

Class 1

Class 2

M

4/3/18

Dr.YanjunQi/UVA

35

Page 36: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Maximizing the margin: observation-2

Classify as +1 if wTx+b >= 1Classify as -1 if wTx+b <= - 1Undefined if -1 <wTx+b < 1

• Observation 1: the vector w is orthogonal to the +1 and -1 planes

• Observation 2: if x+ is a point on the +1 plane and x- is the closest point to x+ on the -1 plane then

x+ = λ w + x-

Since w is orthogonal to both planes we need to ‘travel’ some distance along w to get from x+ to x-

4/3/18 36

Dr.YanjunQi/UVA

Page 37: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

37

• wT x+ + b = +1

• wT x- + b = -1

• M = | x+ - x- | = ?

• x+ = λ w + x-

Putting it together

Page 38: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Putting it together

• wT x+ + b = +1

• wT x- + b = -1

• x+ = λ w + x-

• | x+ - x- | = M

We can now define M in terms of w and b

4/3/18 38

Dr.YanjunQi/UVA

Page 39: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

39

Page 40: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Putting it together

• wT x+ + b = +1

• wT x- + b = -1

• x+ = λw + x-

• | x+ - x- | = M

We can now define M in terms of w and b

wT x+ + b = +1

=>

wT (λw + x-) + b = +1=>

wTx- + b + λwTw = +1

=>

-1 + λwTw = +1

=>

λ = 2/wTw

4/3/18 40

Dr.YanjunQi/UVA

Page 41: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Putting it together

• wT x+ + b = +1

• wT x- + b = -1

• x+ = λ w + x-

• | x+ - x- | = M

• λ = 2/wTw

We can now define M in terms of w and b

M = |x+ - x-|

=>

=>

M =| λw |= λ | w |= λ wTw

M = 2 wTwwTw

=2wTw

4/3/18 41

Dr.YanjunQi/UVA

Page 42: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Finding the optimal parameters

M =2wTw

We can now search for the optimal parameters by finding a solution that:

1. Correctly classifies all points

2. Maximizes the margin (or equivalently minimizes wTw)

Several optimization methods can be used: Gradient descent, OR SMO (see extra slides)

4/3/18 42

Dr.YanjunQi/UVA

Page 43: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

43

Page 44: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

44

Thegradientpointsinthedirection ofthegreatestrateofincreaseofthefunctionanditsmagnitude istheslopeofthegraphinthatdirection

Page 45: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Today

q SupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü PracticalGuide

4/3/18 45

Dr.YanjunQi/UVA

Page 46: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Step i.e. learning optimal parameter for SVM

M =2wTw

4/3/18 46

Dr.YanjunQi/UVA

Page 47: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Step i.e. learning optimal parameter for SVM

M =2wTw

Min (wTw)/2 subject to the following constraints:

4/3/18 47

Dr.YanjunQi/UVA

Page 48: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Step i.e. learning optimal parameter for SVM

M =2wTw

Min (wTw)/2 subject to the following constraints:

For all x in class + 1

wTx+b >= 1

For all x in class - 1

wTx+b <= -1

} A total of n constraints if we have n training samples

4/3/18 48

Dr.YanjunQi/UVA

Page 49: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Reformulation

Min (wTw)/2 subject to the following constraints:

For all x in class + 1

wTx+b >= 1

For all x in class - 1

wTx+b <= -1

} A total of n constraints if we have n input samples

4/3/18 49

Dr.YanjunQi/UVA

Page 50: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Reformulation

Min (wTw)/2 subject to the following constraints:

For all x in class + 1

wTx+b >= 1

For all x in class - 1

wTx+b <= -1

} A total of n constraints if we have n input samples

4/3/18 50

Dr.YanjunQi/UVA

argminw,b

wi2

i=1p∑

subject to ∀x i ∈Dtrain : yi x i ⋅w + b( ) ≥1

Page 51: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Reformulation

Min (wTw)/2 subject to the following constraints:

For all x in class + 1

wTx+b >= 1

For all x in class - 1

wTx+b <= -1

}A total of n constraints if we have n input samples

4/3/18 51

Dr.YanjunQi/UVA

argminw,b

wi2

i=1p∑

subject to ∀x i ∈Dtrain : yi wT x i + b( ) ≥1

Quadratic Objective

Quadratic programming i.e., - Quadratic objective - Linear constraints

Page 52: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Today

q SupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)ü LinearlyNon-separablecase(softSVM)ü Optimizationwithdualformü Nonlineardecisionboundaryü PracticalGuide

4/3/18 52

Dr.YanjunQi/UVA

Page 53: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Linearly Non separable case• So far we assumed that a linear hyperplane can perfectly separate the points

• But this is not usually the case

- noise, outliers How can we convert this to a QP problem?

- Minimize training errors?

min wTw

min #errors

Hard to solve (two minimization problems)

4/3/18 53

Dr.YanjunQi/UVA

Page 54: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Linearly Non separable case• So far we assumed that a linear plane can perfectly separate the points

• But this is not usally the case

- noise, outliersHow can we convert this to a QP problem?

- Minimize training errors?

min wTw

min #errors

- Penalize training errors:

min wTw+C*(#errors)

Hard to solve (two minimization problems)

Hard to encode in a QP problem

4/3/18 54

Dr.YanjunQi/UVA

Page 55: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Linearly Non separable case• Instead of minimizing the number of misclassified points we can minimize the distance between these points and their correct plane

-1 plane

+1 plane

jk

The new optimization problem is:

!!minw

wTw2 + Cε i

i=1

n

4/3/18 55

Dr.YanjunQi/UVA

ε ε

Page 56: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Linearly Non separable case• Instead of minimizing the number of misclassified points we can minimize the distance between these points and their correct plane

-1 plane

+1 plane

jk

The new optimization problem is:

subject to the following inequality constraints:For all xi in class + 1

wTxi+b >= 1- i

For all xi in class - 1

wTxi+b <= -1+ i

minw

wTw2 +C ε i

i=1

n

Wait. Are we missing something?

4/3/18 56

Dr.YanjunQi/UVA

ε

εε ε

Page 57: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Final optimization for linearly non-separable case

The new optimization problem is:

subject to the following inequality constraints:

minw

wTw2 +C ε i

i=1

n

For all i

} A total of n constraints

} Another n constraints

4/3/18 57

Dr.YanjunQi/UVA

!!ε i ≥0

Page 58: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

58

Two optimization problems:

For the separable and non separable cases

For all x in class + 1

wTx+b >= 1

For all x in class - 1

wTx+b <=-1

minw

wTw2 +C ε i

i=1

n

For all i

minwwTw2

!!ε i ≥0

Page 59: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

HingeLossforSoftSVM

4/3/18

Dr.YanjunQi/UVA

59

argminw,b

wTw2

+C max(0,1− yi wT x i + b( ))i=1

n

subject to:

yi wT x i + b( ) ≥1− ε i

ε i ≥ 0

minw

wTw2 +C ε i

i=1

n

For all i

ε i ≥0

soft

argminw,b

wi2

i=1p∑

subject to ∀x i ∈Dtrain : yi wT x i + b( ) ≥1

vs.HardSVM

Page 60: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

60

ModelSelection,findrightC

Selecttheright

penaltyparameter

C

Page 61: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

61

ModelSelection,findrightC

AlargevalueofCmeansthatmisclassificationsarebad- resultinginsmallermarginsandlesstrainingerror(butmoreexpectedtrueerror).

AsmallCresultsinmoretrainingerror,hopefullybettertrueerror.

Page 62: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Today

q SupportVectorMachine(SVM)ü HistoryofSVMü LargeMarginLinearClassifierü DefineMargin(M)intermsofmodelparameterü Optimizationtolearnmodelparameters(w,b)üLinearlynon-separablecaseü Optimizationwithdualformü Nonlineardecisionboundaryü PracticalGuide

4/3/18 62

Dr.YanjunQi/UVA

Page 63: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Two optimization problems: For the separable and non separable cases

Min (wTw)/2 minw

wTw2 +C ε i

i=1

n

For all i

• Instead of solving these QPs directly we will solve a dual formulation of the SVM optimization problem

• The main reason for switching to this type of representation is that it would allow us to use a neat trick that will make our lives easier (and the run time faster)

4/3/18 63

Dr.YanjunQi/UVA

!!ε i ≥0

Page 64: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Review: ConstrainedOptimization

4/3/18

Dr.YanjunQi/UVA

64

minu u2

s.t. u>=b

b Global min

Allowed min

b Global min

Allowed min

Case1:

Case2:

Page 65: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Review: ConstrainedOptimization

4/3/18

Dr.YanjunQi/UVA

65

minu u2

s.t. u>=b

b Global min

Allowed min

b Global min

Allowed min

Case1:

Case2:

Page 66: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Review: ConstrainedOptimization

4/3/18

Dr.YanjunQi/UVA

66

minu u2

s.t. u>=b

b Global min

Allowed min

b Global min

Allowed min

Case1:

Case2:

Page 67: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Optimization Review: Ingredients

• Objectivefunction• Variables• Constraints

Findvaluesofthevariablesthatminimizeormaximizetheobjectivefunctionwhilesatisfyingtheconstraints

674/3/18

Dr.YanjunQi/UVA

Page 68: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dr.YanjunQi/UVA

Optimization Review: Lagrangian Duality(Extra)

• ThePrimalProblem

Primal:

ThegeneralizedLagrangian:

theα's(αι≥0)arecalledtheLagarangianmultipliersLemma:

Are-writtenPrimal:

minws.t.

f0(w)fi(w)≤0,i =1,…,k

L(w ,α )= f0(w)+ α i fi(w)

i=1

k

maxα ,α i≥0L(w ,α )=

f0(w) ifw satisfiesprimalconstraints∞ o/w

⎧⎨⎪

⎩⎪

minwmaxα ,α i≥0L(w ,α ) ©EricXing@CMU,2006-20084/3/18

“MethodofLagrangemultipliers”converttoahigher-dimensional problem

Page 69: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dr.YanjunQi/UVA

69

Optimization Review: Lagrangian Duality,cont.(Extra)

• RecallthePrimalProblem:

• TheDualProblem:

• Theorem(weakduality):

• Theorem(strongduality):Iff thereexistasaddlepointof

wehave

minwmaxα ,α i≥0L(w ,α )

maxα ,α i≥0minwL(w ,α )

d* =maxα ,α i≥0minwL(w ,α ) ≤ minwmaxα ,α i≥0L(w ,α )= p

*

** pd = L(w ,α )

4/3/18

Page 70: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

An alternative representation of the SVM QP

• We will start with the linearly separable case

• Instead of encoding the correct classification rule and constraint we will use Lagrange multiplies to encode it as part of the our minimization problem

Min (wTw)/2

s.t.

(wTxi+b)yi >= 1

4/3/18 70

Dr.YanjunQi/UVA

Recall that Lagrange multipliers can be applied to turn the following problem:

Lprimal(w ,b,α )=

12 w

2− α i yi(w ⋅x i +b)−1( )

i=1

N

Page 71: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dr.YanjunQi/UVA

71

***( )

TheDualProblem(Extra)

• WeminimizeL withrespecttow andb first:

Notethat(*) implies:

• Plus(***)backtoL ,andusing(**),wehave:

maxα i≥0minw ,bL(w ,b,α )

!! ∇wL(w ,b,α )! =w− α i yixi =0

i=1

train

∑ ,

!! ∇bL(w ,b,α )! = α i yi =0

i=1

train

∑ ,

!!w = α i yixi

i=1

train

*()

!!! L(w ,b,α )= α i

i=1∑ − 12 α iα j yi y j(x iTx j )

i , j=1∑

**( )

4/3/18

Page 72: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Solvingforw thatgivesmaximummargin:1. Combineobjectivefunctionandconstraintsintonew

objectivefunction,usingLagrangemultipliers \alphai

2. TominimizethisLagrangian,wetakederivativesofw andbandsetthemto0:

Summary:DualforSVM(Extra)

!!!Lprimal =

12 w

2− α i yi(w ⋅x i +b)−1( )

i=1

N

4/3/18

Dr.YanjunQi/UVA

72

Page 73: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

3. Substitutingandrearranginggivesthedual oftheLagrangian:

whichwetrytomaximize(notminimize).

4. Oncewehavethe\alphai,wecansubstituteintopreviousequationstogetw andb.

5. Thisdefinesw andb aslinearcombinationsofthetrainingdata.

Summary:DualforSVM(Extra)

Ldual = α i

i=1

N

∑ − 12 i∑ α iα j yi y jx i ⋅x j

j∑

4/3/18

Dr.YanjunQi/UVA

73!!w = α i yixi

i=1

train

Page 74: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Summary: Dual SVM for linearly separable case

Dual formulation

maxα αi −i∑ 1

2αiα jyiyj

i,j∑ xi

Txj

αiyi = 0i∑

αi ≥ 0 ∀i

4/3/18 74

Dr.YanjunQi/UVA

EasierthanoriginalQP,moreefficientalgorithmsexisttofindαi

Page 75: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dual SVM for linearly separable case – Training

Our dual target function: maxα αi −i∑ 1

2αiα jyiyj

i,j∑ xi

Txj

αiyi = 0i∑

αi ≥ 0 ∀i

Dot product for all training samples

Dr.YanjunQi/UVA

4/3/18 75

Page 76: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dr.YanjunQi/UVA

76

Supportvectors:non-zero αi• onlyafewαi's canbenonzero!!

α i yi(w ⋅x i +b)−1( ) =0,i =1,…,n

Wecallthetrainingdatapointswhoseαi's arenonzerothesupportvectors (SV)

4/3/18

Page 77: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

77

α6=1.4

Class 1

Class 2

α1=0.8

α2=0

α3=0

α4=0

α5=0

α7=0

α8=0.6

α9=0

α10=0

!!! α i yi(w ⋅x i +b)−1( ) =0,!!!!i =1,…,n

Page 78: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

4/3/18

Dr.YanjunQi/UVA

• onlyafewαi canbenonzero!!

!!! α i yi(w ⋅x i +b)−1( ) =0,!!!!i =1,…,n

α6=1.4

Class 1

Class 2

α1=0.8

α2=0

α3=0

α4=0

α5=0α7=0

α8=0.6

α9=0

α10=0

Page 79: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dual SVM - interpretation

w = α ixiyii∑

For i that are 0, no influence

4/3/18 79

Dr.YanjunQi/UVA

α

Page 80: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dual SVM for linearly separable case –

Testing To evaluate a new sample xtswe need to compute:

yts! = sign(wTxts +b)= sign( α iyi

i=1..n∑ xi

Txts +b)

Dot product with (“all” ??) training samples

4/3/18 80

Dr.YanjunQi/UVA

yts! = sign α i yi x i

Txts( )i∈SupportVectors

∑ +b⎛

⎝⎜⎞

⎠⎟

For \alphai that are 0,

no influence

Page 81: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dual formulation for linearly non-separable case

Dual target function:

!!

maxα α i −i∑ 1

2 α iα jyi y ji,j∑ xi

Txj

α iyi =0i∑C >α i ≥0,∀i

The only difference is that the \alpha are now bounded

4/3/18 81

Dr.YanjunQi/UVA

Hyperparameter Cshouldbetunedthrough k-foldsCV

Thisisverysimilartotheoptimizationproblem inthelinearseparablecase,exceptthatthereisanupperbound C onαi now

Onceagain,efficientalgorithmexisttofindαi

Page 82: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Dual formulation for linearly non-separable case

Dual target function:

!!

maxα α i −i∑ 1

2 α iα jyi y ji,j∑ xi

Txj

α iyi =0i∑C >α i ≥0,∀i

To evaluate a new sample xtswe need to compute:

wTxts +b= α iyi

i∈supportV∑ xi

Txts +b

The only difference is that the \alpha are now bounded

4/3/18 82

Dr.YanjunQi/UVA

Hyperparameter Cshouldbetunedthrough k-foldsCV

Thisisverysimilartotheoptimizationproblem inthelinearseparablecase,exceptthatthereisanupperbound C onαi now

Onceagain,efficientalgorithmexisttofindαi

Page 83: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

Support Vector Machine

classification

Kernel Trick Func K(xi, xj)

Margin + Hinge Loss

QP with Dual form

Dual Weights

Task

Representation

Score Function

Search/Optimization

Models, Parameters

w = α ixiyii∑

argminw,b

wi2

i=1p∑ +C εi

i=1

n

subject to ∀xi ∈ Dtrain : yi xi ⋅w+b( ) ≥1−εi

K(x, z) :=Φ(x)TΦ(z)

83

maxα α i −i∑ 1

2 α iα jyi y ji,j∑ xi

Txj

α iyi =0i∑ , α i ≥0 ∀i

Page 84: UVA CS 4501: Machine Learning Lecture 16: Support Vector ...q Support Vector Machine (SVM) ü History of SVM ü Large Margin Linear Classifier ü Define Margin (M) in terms of model

References• BigthankstoProf.Ziv Bar-JosephandProf.EricXing@CMUforallowingmetoreusesomeofhisslides

• ElementsofStatisticalLearning,byHastie,Tibshirani andFriedman

• Prof.AndrewMoore@CMU’sslides• TutorialslidesfromDr.Tie-YanLiu,MSRAsia• APracticalGuidetoSupportVectorClassificationChih-WeiHsu,Chih-ChungChang,andChih-JenLin,2003-2010

• TutorialslidesfromStanford“ConvexOptimizationI—Boyd&Vandenberghe

4/3/18 84

Dr.YanjunQi/UVACS