UVA CS 4501: Machine Learning
Lecture 16: Support Vector Machine
Dr. Yanjun Qi
University of Virginia, Department of Computer Science
Where are we? → Five major sections of this course
• Regression (supervised)
• Classification (supervised)
• Unsupervised models
• Learning theory
• Graphical models
Today
• Support Vector Machine (SVM)
  – History of SVM
  – Large-margin linear classifier
  – Defining the margin (M) in terms of the model parameters
  – Optimization to learn the model parameters (w, b)
  – Linearly non-separable case
  – Optimization with dual form
  – Nonlinear decision boundary
  – Multiclass SVM
History of SVM
• SVM is inspired by statistical learning theory [3].
• SVM was first introduced in 1992 [1].
• SVM became popular because of its success in handwritten digit recognition (1994): a 1.1% test error rate for SVM, the same as the error rate of a carefully constructed neural network, LeNet 4. See Section 5.11 in [2] or the discussion in [3] for details.
• SVM is regarded as an important example of "kernel methods", arguably the hottest area in machine learning twenty years ago.
[1] B. E. Boser et al. A Training Algorithm for Optimal Margin Classifiers. Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144–152, Pittsburgh, 1992.
[2] L. Bottou et al. Comparison of classifier methods: a case study in handwritten digit recognition. Proceedings of the 12th IAPR International Conference on Pattern Recognition, vol. 2, pp. 77–82, 1994.
[3] V. Vapnik. The Nature of Statistical Learning Theory. 2nd edition, Springer, 1999.
Theoretically sound / Impactful
Applications of SVMs
• Computer vision
• Text categorization
• Ranking (e.g., Google searches)
• Handwritten character recognition
• Time series analysis
• Bioinformatics
• …
→ Lots of very successful applications!
Handwritten digit recognition [figure-only slide]
A Dataset for Binary Classification
• Data / points / instances / examples / samples / records: [rows]
• Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
• Target / outcome / response / label / dependent variable: the special column to be predicted [last column]
Output as binary class: only two possibilities.
Affine Hyperplanes
• https://en.wikipedia.org/wiki/Hyperplane
• Any hyperplane can be given in coordinates as the solution of a single linear (algebraic) equation of degree 1.
Review: Vector Product, Orthogonality, and Norm
For two vectors x and y, xᵀy is called the (inner) vector product.
The square root of the product of a vector with itself, √(xᵀx), is called the 2-norm (‖x‖₂); it can also be written as ‖x‖.
x and y are called orthogonal if xᵀy = 0.
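As a quick illustration of these three definitions, here is a minimal NumPy sketch (the example vectors are ours, not from the lecture):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([-4.0, 3.0])

print(x @ y)              # inner product x^T y -> 0.0, so x and y are orthogonal
print(np.sqrt(x @ x))     # 2-norm of x -> 5.0
print(np.linalg.norm(x))  # the same value via the built-in 2-norm
```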
Today
• Support Vector Machine (SVM)
  – History of SVM
  – Large-margin linear classifier
  – Defining the margin (M) in terms of the model parameters
  – Optimization to learn the model parameters (w, b)
  – Linearly non-separable case
  – Optimization with dual form
  – Nonlinear decision boundary
  – Multiclass SVM
Linear Classifiers: f(x) → y_est
[Figure legend: one marker denotes +1, the other denotes −1.]
How would you classify this data? [A sequence of slides draws several different straight lines, each separating the same two-class scatter plot.]
Any of these would be fine…
… but which is best?
Classifier Margin: f(x) → y_est
Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a data point.
Maximum Margin: f(x) → y_est
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM, or LSVM).
Support vectors are those data points that the margin pushes up against.
f(x, w, b) = sign(wᵀx + b)
[Figure: two-class scatter plot in the (x1, x2) plane with the maximum-margin separator.]
Max-Margin Classifiers
• Instead of fitting all points, focus on boundary points.
• Learn a boundary that leads to the largest margin from points on both sides.
[Figure: the separator in the (x1, x2) plane with margin distance D on each side.]
Why MAX margin?
• Intuitive: it 'makes sense'.
• Some theoretical support.
• Works well in practice.
That is why these are also known as linear support vector machines (SVMs): the boundary points are the vectors supporting the boundary.
Max Margin & Decision Boundary
• The decision boundary should be as far away from the data of both classes as possible.
[Figure: classes −1 and +1 in the (x1, x2) plane, separated by the hyperplane defined by w and b; w is a p-dimensional vector, b is a scalar.]
Specifying a Max-Margin Classifier
• Classify as +1 if wᵀx + b ≥ 1
• Classify as −1 if wᵀx + b ≤ −1
• Undefined if −1 < wᵀx + b < 1
[Figure: the class +1 plane, the boundary, and the class −1 plane.]
Is the linear separation assumption realistic? We will deal with this shortly, but let's assume it for now.
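A minimal sketch of this three-way rule (the function name and the NumPy representation are our assumptions; the 'undefined' band matters only inside the margin):

```python
import numpy as np

def classify(w, b, x):
    # w, x: NumPy arrays; b: scalar
    score = w @ x + b
    if score >= 1:
        return +1
    if score <= -1:
        return -1
    return None  # -1 < w^T x + b < 1: undefined under this specification
```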
Today
• Supervised Classification
• Support Vector Machine (SVM)
  – History of SVM
  – Large-margin linear classifier
  – Defining the margin (M) in terms of the model parameters
  – Optimization to learn the model parameters (w, b)
  – Linearly non-separable case
  – Optimization with dual form
  – Nonlinear decision boundary
  – Multiclass SVM
Maximizing the Margin
Classify as +1 if wᵀx + b ≥ 1; classify as −1 if wᵀx + b ≤ −1; undefined if −1 < wᵀx + b < 1.
• Let's define the width of the margin by M.
• How can we encode our goal of maximizing M in terms of our parameters (w and b)?
• Let's start with a few observations.
Margin M
• wᵀx⁺ + b = +1
• wᵀx⁻ + b = −1
• M = |x⁺ − x⁻| = ?
Maximizing the Margin: Observation 1
• Observation 1: the vector w is orthogonal to the +1 plane.
• Why? Let u and v be two points on the +1 plane; then wᵀu + b = 1 and wᵀv + b = 1, so for the vector defined by u and v we have wᵀ(u − v) = 0.
• Corollary: the vector w is orthogonal to the −1 plane.
[Figure: Class 1 and Class 2 with the ±1 planes and the margin width M measured along w.]
Maximizing the Margin: Observation 2
• Observation 1: the vector w is orthogonal to the +1 and −1 planes.
• Observation 2: if x⁺ is a point on the +1 plane and x⁻ is the closest point to x⁺ on the −1 plane, then x⁺ = λw + x⁻ for some scalar λ. Since w is orthogonal to both planes, we need to 'travel' some distance along w to get from x⁻ to x⁺.
Putting It Together
• wᵀx⁺ + b = +1
• wᵀx⁻ + b = −1
• M = |x⁺ − x⁻| = ?
• x⁺ = λw + x⁻
Putting It Together
• wᵀx⁺ + b = +1
• wᵀx⁻ + b = −1
• x⁺ = λw + x⁻
• |x⁺ − x⁻| = M
We can now define M in terms of w and b. First solve for λ:
wᵀx⁺ + b = +1
=> wᵀ(λw + x⁻) + b = +1
=> wᵀx⁻ + b + λwᵀw = +1
=> −1 + λwᵀw = +1
=> λ = 2/(wᵀw)
Then:
M = |x⁺ − x⁻| = |λw| = λ|w| = λ√(wᵀw) = 2√(wᵀw)/(wᵀw) = 2/√(wᵀw)
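As a quick sanity check on M = 2/√(wᵀw), here is a tiny NumPy computation (the example w is made up):

```python
import numpy as np

w = np.array([3.0, 4.0])   # hypothetical weight vector, ||w|| = 5
M = 2.0 / np.sqrt(w @ w)   # margin width M = 2 / sqrt(w^T w)
print(M)                   # 0.4 -- shrinking ||w|| widens the margin
```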
Finding the Optimal Parameters
M = 2/√(wᵀw)
We can now search for the optimal parameters by finding a solution that:
1. Correctly classifies all points
2. Maximizes the margin (or, equivalently, minimizes wᵀw)
Several optimization methods can be used: gradient descent, or SMO (see the extra slides).
[Figure: the gradient field of a function.] The gradient points in the direction of the greatest rate of increase of the function, and its magnitude is the slope of the graph in that direction.
Today
• Support Vector Machine (SVM)
  – History of SVM
  – Large-margin linear classifier
  – Defining the margin (M) in terms of the model parameters
  – Optimization to learn the model parameters (w, b)
  – Linearly non-separable case
  – Optimization with dual form
  – Nonlinear decision boundary
  – Practical guide
Optimization Step, i.e., Learning the Optimal Parameters for SVM
M = 2/√(wᵀw)
Minimize (wᵀw)/2 subject to the following constraints:
• For all x in class +1: wᵀx + b ≥ 1
• For all x in class −1: wᵀx + b ≤ −1
} A total of n constraints if we have n training samples.
Optimization Reformulation
Since yᵢ ∈ {+1, −1}, the two families of constraints above combine into one, giving the standard form:
argmin_{w,b} Σ_{i=1}^{p} wᵢ²
subject to ∀ xᵢ ∈ D_train: yᵢ(wᵀxᵢ + b) ≥ 1
This is quadratic programming, i.e., a quadratic objective with linear constraints.
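The lecture leaves the QP solver abstract (gradient descent and SMO are mentioned on an earlier slide); as a hedged sketch, this program can also be handed to an off-the-shelf convex solver. Here is a minimal example using the cvxpy library (the library choice, the toy data, and all variable names are our assumptions):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data (made up for illustration)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(X.shape[1])
b = cp.Variable()

# Hard-margin SVM: minimize (1/2) w^T w  s.t.  y_i (w^T x_i + b) >= 1
constraints = [cp.multiply(y, X @ w + b) >= 1]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()
print(w.value, b.value)
```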
Today
• Support Vector Machine (SVM)
  – History of SVM
  – Large-margin linear classifier
  – Defining the margin (M) in terms of the model parameters
  – Optimization to learn the model parameters (w, b)
  – Linearly non-separable case (soft SVM)
  – Optimization with dual form
  – Nonlinear decision boundary
  – Practical guide
Linearly Non-Separable Case
• So far we assumed that a linear hyperplane can perfectly separate the points.
• But this is not usually the case: noise, outliers.
How can we convert this to a QP problem?
• Minimize training errors? "min wᵀw" together with "min #errors" is hard to solve (two minimization problems).
• Penalize training errors: min wᵀw + C·(#errors). But counting errors is hard to encode in a QP problem.
Linearly Non-Separable Case
• Instead of minimizing the number of misclassified points, we can minimize the distance between these points and their correct plane.
[Figure: the +1 and −1 planes, with slack distances ε for two misclassified points j and k.]
The new optimization problem is:
min_w (wᵀw)/2 + C Σ_{i=1}^{n} εᵢ
subject to the following inequality constraints:
• For all xᵢ in class +1: wᵀxᵢ + b ≥ 1 − εᵢ
• For all xᵢ in class −1: wᵀxᵢ + b ≤ −1 + εᵢ
Wait. Are we missing something?
Final Optimization for the Linearly Non-Separable Case
The new optimization problem is:
min_w (wᵀw)/2 + C Σ_{i=1}^{n} εᵢ
subject to the following inequality constraints:
• yᵢ(wᵀxᵢ + b) ≥ 1 − εᵢ for all i  } A total of n constraints
• εᵢ ≥ 0 for all i  } Another n constraints
(This is the missing piece: without εᵢ ≥ 0, points classified with margin larger than 1 would earn a negative penalty and drive the objective down.)
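Extending the earlier hard-margin sketch, the soft-margin QP just adds the slack variables εᵢ; again a hedged cvxpy example (library, toy data, and names are our assumptions):

```python
import cvxpy as cp
import numpy as np

# Toy data that is NOT linearly separable (the last point is an outlier)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [2.5, 2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 1.0  # penalty hyperparameter

w = cp.Variable(X.shape[1])
b = cp.Variable()
eps = cp.Variable(X.shape[0], nonneg=True)  # slack variables, eps_i >= 0

# Soft-margin SVM: minimize (1/2) w^T w + C * sum_i eps_i
#                  s.t. y_i (w^T x_i + b) >= 1 - eps_i
constraints = [cp.multiply(y, X @ w + b) >= 1 - eps]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(eps)), constraints)
prob.solve()
print(w.value, b.value, eps.value)
```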
Two Optimization Problems: the Separable and Non-Separable Cases
Separable:
min_w (wᵀw)/2
subject to: for all x in class +1, wᵀx + b ≥ 1; for all x in class −1, wᵀx + b ≤ −1.
Non-separable:
min_w (wᵀw)/2 + C Σ_{i=1}^{n} εᵢ
subject to: yᵢ(wᵀxᵢ + b) ≥ 1 − εᵢ and εᵢ ≥ 0 for all i.
Hinge Loss for Soft SVM
The constrained soft-SVM problem
min_w (wᵀw)/2 + C Σ_{i=1}^{n} εᵢ
subject to: yᵢ(wᵀxᵢ + b) ≥ 1 − εᵢ and εᵢ ≥ 0 for all i
is equivalent to the unconstrained hinge-loss form
argmin_{w,b} (wᵀw)/2 + C Σ_{i=1}^{n} max(0, 1 − yᵢ(wᵀxᵢ + b))
vs. the hard SVM:
argmin_{w,b} Σ_{i=1}^{p} wᵢ²
subject to ∀ xᵢ ∈ D_train: yᵢ(wᵀxᵢ + b) ≥ 1
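The unconstrained hinge form is easy to evaluate directly; here is a minimal NumPy sketch of the objective (the function name is ours):

```python
import numpy as np

def soft_svm_objective(w, b, X, y, C):
    margins = y * (X @ w + b)               # y_i (w^T x_i + b) for every sample
    hinge = np.maximum(0.0, 1.0 - margins)  # max(0, 1 - y_i (w^T x_i + b))
    return 0.5 * (w @ w) + C * hinge.sum()
```

Because this form is unconstrained, it can be minimized with (sub)gradient descent, which is one reason the hinge-loss view is popular.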
Model Selection: Finding the Right C
Select the right penalty parameter C.
• A large value of C means that misclassifications are bad, resulting in smaller margins and less training error (but more expected true error).
• A small C results in more training error but, hopefully, better true error.
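A later slide recommends tuning C by k-fold cross-validation; as a hedged sketch with scikit-learn (the grid of C values and the toy data are our assumptions):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = np.where(X_train[:, 0] + X_train[:, 1] > 0, 1, -1)  # toy labels

param_grid = {"C": [0.01, 0.1, 1, 10, 100]}  # hypothetical grid
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_["C"])
```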
Today
• Support Vector Machine (SVM)
  – History of SVM
  – Large-margin linear classifier
  – Defining the margin (M) in terms of the model parameters
  – Optimization to learn the model parameters (w, b)
  – Linearly non-separable case
  – Optimization with dual form
  – Nonlinear decision boundary
  – Practical guide
Two Optimization Problems: the Separable and Non-Separable Cases
Separable: min (wᵀw)/2. Non-separable: min_w (wᵀw)/2 + C Σ_{i=1}^{n} εᵢ, with εᵢ ≥ 0 for all i.
• Instead of solving these QPs directly, we will solve a dual formulation of the SVM optimization problem.
• The main reason for switching to this type of representation is that it allows us to use a neat trick that will make our lives easier (and the run time faster).
Optimization Review: Constrained Optimization
min_u u²  s.t.  u ≥ b
• Case 1 (b ≤ 0): the unconstrained global minimum u = 0 satisfies the constraint, so the allowed minimum is the global minimum.
• Case 2 (b > 0): the global minimum is infeasible; the allowed minimum lies on the constraint boundary, at u = b.
[Figure: the parabola u² with the feasible region u ≥ b marked for both cases.]
Optimization Review: Ingredients
• Objective function
• Variables
• Constraints
Find values of the variables that minimize or maximize the objective function while satisfying the constraints.
Optimization Review: Lagrangian Duality (Extra)
• The primal problem:
  min_w f₀(w)  s.t.  fᵢ(w) ≤ 0, i = 1, …, k
• The generalized Lagrangian:
  L(w, α) = f₀(w) + Σ_{i=1}^{k} αᵢ fᵢ(w)
  where the αᵢ (αᵢ ≥ 0) are called the Lagrange multipliers.
• Lemma:
  max_{α, αᵢ≥0} L(w, α) = f₀(w) if w satisfies the primal constraints, and ∞ otherwise.
• A rewritten primal:
  min_w max_{α, αᵢ≥0} L(w, α)
The "method of Lagrange multipliers" converts the problem to a higher-dimensional one. (© Eric Xing @ CMU, 2006–2008)
Optimization Review: Lagrangian Duality, cont. (Extra)
• Recall the primal problem: min_w max_{α, αᵢ≥0} L(w, α)
• The dual problem: max_{α, αᵢ≥0} min_w L(w, α)
• Theorem (weak duality):
  d* = max_{α, αᵢ≥0} min_w L(w, α) ≤ min_w max_{α, αᵢ≥0} L(w, α) = p*
• Theorem (strong duality): iff there exists a saddle point of L(w, α), we have
  d* = p* = L(w*, α*)
An Alternative Representation of the SVM QP
• We will start with the linearly separable case.
• Instead of encoding the correct classification rule as an explicit constraint, we will use Lagrange multipliers to encode it as part of our minimization problem.
Recall the primal QP:
min (wᵀw)/2  s.t.  yᵢ(wᵀxᵢ + b) ≥ 1
Lagrange multipliers turn it into:
L_primal(w, b, α) = (1/2)‖w‖² − Σ_{i=1}^{N} αᵢ ( yᵢ(w·xᵢ + b) − 1 )
The Dual Problem (Extra)
max_{α, αᵢ≥0} min_{w,b} L(w, b, α)
• We minimize L with respect to w and b first:
  ∇_w L(w, b, α) = w − Σᵢ αᵢ yᵢ xᵢ = 0   (*)
  ∇_b L(w, b, α) = Σᵢ αᵢ yᵢ = 0   (**)
• Note that (*) implies:
  w = Σᵢ αᵢ yᵢ xᵢ   (***)
• Plugging (***) back into L, and using (**), we have:
  L(w, b, α) = Σᵢ αᵢ − (1/2) Σ_{i,j} αᵢ αⱼ yᵢ yⱼ (xᵢᵀxⱼ)
Summary: Dual for SVM (Extra)
Solving for the w that gives the maximum margin:
1. Combine the objective function and constraints into a new objective function, using Lagrange multipliers αᵢ:
   L_primal = (1/2)‖w‖² − Σ_{i=1}^{N} αᵢ ( yᵢ(w·xᵢ + b) − 1 )
2. To minimize this Lagrangian, take the derivatives with respect to w and b and set them to 0:
   w = Σᵢ αᵢ yᵢ xᵢ,   Σᵢ αᵢ yᵢ = 0
3. Substituting and rearranging gives the dual of the Lagrangian:
   L_dual = Σ_{i=1}^{N} αᵢ − (1/2) Σ_{i,j} αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ)
   which we try to maximize (not minimize).
4. Once we have the αᵢ, we can substitute into the previous equations to get w and b.
5. This defines w and b as linear combinations of the training data.
Summary: Dual SVM for the Linearly Separable Case
Dual formulation:
max_α Σᵢ αᵢ − (1/2) Σ_{i,j} αᵢ αⱼ yᵢ yⱼ xᵢᵀxⱼ
subject to: Σᵢ αᵢ yᵢ = 0 and αᵢ ≥ 0 ∀i
This is easier than the original QP; more efficient algorithms exist to find the αᵢ.
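As a hedged sketch of solving this dual with a generic convex solver (cvxpy again assumed; the identity Σᵢⱼ αᵢαⱼyᵢyⱼxᵢᵀxⱼ = ‖Σᵢ αᵢyᵢxᵢ‖² keeps the objective in a form the solver accepts):

```python
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

a = cp.Variable(X.shape[0], nonneg=True)  # the alpha_i, with alpha_i >= 0
# sum_i alpha_i - (1/2) || sum_i alpha_i y_i x_i ||^2  (equals the double-sum form)
objective = cp.Maximize(cp.sum(a) - 0.5 * cp.sum_squares(X.T @ cp.multiply(a, y)))
prob = cp.Problem(objective, [y @ a == 0])
prob.solve()

w = X.T @ (a.value * y)  # recover w = sum_i alpha_i y_i x_i
print(a.value, w)
```

Given the αᵢ, b can then be recovered from any support vector xᵢ via yᵢ(wᵀxᵢ + b) = 1.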
Dual SVM for the Linearly Separable Case: Training
Our dual target function:
max_α Σᵢ αᵢ − (1/2) Σ_{i,j} αᵢ αⱼ yᵢ yⱼ xᵢᵀxⱼ
subject to: Σᵢ αᵢ yᵢ = 0 and αᵢ ≥ 0 ∀i
Training requires the dot product xᵢᵀxⱼ for all pairs of training samples.
Support Vectors: Nonzero αᵢ
• Only a few αᵢ's can be nonzero!
• At the optimum, complementary slackness holds: αᵢ ( yᵢ(w·xᵢ + b) − 1 ) = 0, i = 1, …, n
• We call the training data points whose αᵢ's are nonzero the support vectors (SV).
[Figure: Class 1 and Class 2 with α₁ = 0.8, α₆ = 1.4, and α₈ = 0.6 on the margin planes; all other αᵢ = 0.]
Dual SVM: Interpretation
w = Σᵢ αᵢ xᵢ yᵢ
For the αᵢ that are 0, the corresponding training points have no influence on w.
Dual SVM for the Linearly Separable Case: Testing
To evaluate a new sample x_ts, we need to compute:
y_ts = sign(wᵀx_ts + b) = sign( Σ_{i=1..n} αᵢ yᵢ xᵢᵀx_ts + b )
A dot product with ("all"??) training samples? No. Since the αᵢ that are 0 have no influence:
y_ts = sign( Σ_{i ∈ SupportVectors} αᵢ yᵢ (xᵢᵀx_ts) + b )
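A minimal NumPy sketch of this test-time rule from a dual solution (the function name and the numeric threshold are our assumptions):

```python
import numpy as np

def svm_predict(alpha, b, X_train, y_train, x_new, tol=1e-8):
    sv = alpha > tol  # support vectors: the alpha_i that are (numerically) nonzero
    score = (alpha[sv] * y_train[sv]) @ (X_train[sv] @ x_new) + b
    return np.sign(score)
```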
Dual Formulation for the Linearly Non-Separable Case
Dual target function:
max_α Σᵢ αᵢ − (1/2) Σ_{i,j} αᵢ αⱼ yᵢ yⱼ xᵢᵀxⱼ
subject to: Σᵢ αᵢ yᵢ = 0 and 0 ≤ αᵢ ≤ C ∀i
• The only difference is that the αᵢ are now bounded: this is very similar to the optimization problem in the linearly separable case, except that there is an upper bound C on αᵢ.
• The hyperparameter C should be tuned through k-fold cross-validation.
• Once again, efficient algorithms exist to find the αᵢ.
To evaluate a new sample x_ts, we need to compute:
wᵀx_ts + b = Σ_{i ∈ SupportVectors} αᵢ yᵢ xᵢᵀx_ts + b
Support Vector Machine: Summary
• Task: classification
• Representation: kernel trick, with kernel function K(xᵢ, xⱼ); K(x, z) := Φ(x)ᵀΦ(z)
• Score function: margin + hinge loss
  argmin_{w,b} Σ_{i=1}^{p} wᵢ² + C Σ_{i=1}^{n} εᵢ
  subject to ∀ xᵢ ∈ D_train: yᵢ(xᵢ·w + b) ≥ 1 − εᵢ
• Search/Optimization: QP with dual form
  max_α Σᵢ αᵢ − (1/2) Σ_{i,j} αᵢ αⱼ yᵢ yⱼ xᵢᵀxⱼ, subject to Σᵢ αᵢ yᵢ = 0 and αᵢ ≥ 0 ∀i
• Models/Parameters: dual weights, w = Σᵢ αᵢ xᵢ yᵢ
References
• Big thanks to Prof. Ziv Bar-Joseph and Prof. Eric Xing @ CMU for allowing me to reuse some of their slides.
• The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman.
• Prof. Andrew Moore @ CMU's slides.
• Tutorial slides from Dr. Tie-Yan Liu, MSR Asia.
• A Practical Guide to Support Vector Classification, Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin, 2003–2010.
• Tutorial slides from Stanford "Convex Optimization I", Boyd & Vandenberghe.