
1

3.6 Support Vector Machines

K. M. Koo

2

Goal of SVM: Find Maximum Margin

goal: find a separating hyperplane with maximum margin

margin: the minimum distance between a separating hyperplane and the point sets of $\omega_1$ or $\omega_2$

3

Goal of SVM: Find Maximum Margin

assume that $\omega_1, \omega_2$ are linearly separable

find the separating hyperplane with maximum margin

[Figure: two linearly separable classes $\omega_1$ and $\omega_2$, a separating hyperplane, and its margin]

4

Calculate margin

for a separating hyperplane $g(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + w_0 = 0$, the parameters $\mathbf{w}$ and $w_0$ are not uniquely determined

under the constraint $\min_{\mathbf{x}} |\mathbf{w}^T\mathbf{x} + w_0| = 1$, $\mathbf{w}$ and $w_0$ are uniquely determined

5

Calculate margin

the distance between a point $\mathbf{x}$ and the hyperplane $g(\mathbf{x}) = 0$ is given by

$$ d(\mathbf{x}) = \frac{|g(\mathbf{x})|}{\|\mathbf{w}\|} = \frac{|\mathbf{w}^T\mathbf{x} + w_0|}{\|\mathbf{w}\|} $$

thus, the margin is given by

$$ \min_{\mathbf{x}} \frac{|\mathbf{w}^T\mathbf{x} + w_0|}{\|\mathbf{w}\|} = \frac{1}{\|\mathbf{w}\|} $$

on each side of the hyperplane, i.e. $2/\|\mathbf{w}\|$ in total
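As a quick illustration (added here, not part of the original slides), a minimal Python sketch that computes this distance and the resulting margin for a given hyperplane; the names `w`, `w0`, `X` are placeholders chosen for this example:

```python
import numpy as np

def margin(w, w0, X):
    """Geometric margin: minimum distance from the points in X
    to the hyperplane w^T x + w0 = 0."""
    w = np.asarray(w, dtype=float)
    d = np.abs(X @ w + w0) / np.linalg.norm(w)  # |g(x)| / ||w||
    return d.min()

# toy usage: hyperplane x1 = 0 separating points at x1 = +/-1
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
print(margin([1.0, 0.0], 0.0, X))  # -> 1.0, so total margin 2/||w|| = 2.0
```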

6

Optimization of margin

maximization of the margin

$$ \frac{1}{\|\mathbf{w}\|} + \frac{1}{\|\mathbf{w}\|} = \frac{2}{\|\mathbf{w}\|} $$

requiring that

$$ \mathbf{w}^T\mathbf{x} + w_0 \ge 1, \quad \forall \mathbf{x} \in \omega_1 $$

$$ \mathbf{w}^T\mathbf{x} + w_0 \le -1, \quad \forall \mathbf{x} \in \omega_2 $$

7

Optimization of margin

therefore, we want to

$$ \text{minimize } \; J(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2 $$

$$ \text{subject to } \; y_i(\mathbf{w}^T\mathbf{x}_i + w_0) \ge 1, \quad i = 1, 2, \ldots, N $$

separating hyperplane with maximal margin $\Leftrightarrow$ separating hyperplane with minimum $\|\mathbf{w}\|$

This is an optimization problem with inequality constraints
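The slides do not include code; as a hedged illustration of this primal problem, here is a small Python sketch using scipy's general-purpose SLSQP solver (a convenient choice for a toy set, not the method proposed later in the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data; labels y_i in {+1, -1}.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Optimization variable theta = [w_1, w_2, w_0].
def J(theta):
    w = theta[:-1]
    return 0.5 * w @ w                      # (1/2) ||w||^2

def constraints(theta):
    w, w0 = theta[:-1], theta[-1]
    return y * (X @ w + w0) - 1.0           # y_i (w^T x_i + w_0) - 1 >= 0

res = minimize(J, x0=np.zeros(X.shape[1] + 1), method="SLSQP",
               constraints=[{"type": "ineq", "fun": constraints}])
w, w0 = res.x[:-1], res.x[-1]
print(w, w0)   # expect w close to [1, 0] and w0 close to 0 for this toy set
```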

8

optimization with constraints

[Figure: contours of a cost function $J(\boldsymbol{\theta})$ and its constrained minimum; one panel for optimization with equality constraints, one for optimization with inequality constraints]

9

Lagrange Multiplier

an optimization problem under constraints can be solved by the method of Lagrange multipliers

let $f, g : U \to \mathbb{R}$ be real-valued functions on $U \subset \mathbb{R}^n$, let $\mathbf{x}_0 \in U$ and $c = g(\mathbf{x}_0)$, and let $S = g^{-1}(c)$, the level set of $g$ with value $c$. assume $\nabla g(\mathbf{x}_0) \neq 0$. if $f|_S$ has a local minimum or maximum on $S$ at $\mathbf{x}_0$, which is called a critical point of $f|_S$, then there is a real number $\lambda$, called a Lagrange multiplier, such that

$$ \nabla f(\mathbf{x}_0) = \lambda \nabla g(\mathbf{x}_0), \quad \text{equivalently } \nabla_{\mathbf{x}} L(\mathbf{x}_0, \lambda) = 0 $$
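As a small worked illustration (added here, not on the original slide): minimize $f(x, y) = x^2 + y^2$ subject to $g(x, y) = x + y = 1$.

$$ \nabla f = \lambda \nabla g \;\Rightarrow\; (2x, 2y) = \lambda(1, 1) \;\Rightarrow\; x = y = \tfrac{\lambda}{2} $$

$$ x + y = 1 \;\Rightarrow\; \lambda = 1, \quad x = y = \tfrac{1}{2}, \quad f_{\min} = \tfrac{1}{2} $$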

10

The Method of Lagrange Multiplier

$$ \text{minimize } \; J(\boldsymbol{\theta}) \quad \text{subject to } \; \mathbf{a}^T\boldsymbol{\theta} = b $$

at the constrained minimum $\boldsymbol{\theta}^*$, the gradient of $J$ is parallel to $\mathbf{a}$:

$$ \nabla J(\boldsymbol{\theta}^*) = \lambda \mathbf{a} $$

[Figure: level sets $J(\boldsymbol{\theta}) = c_1, c_2, c_3$ and the constraint line $\mathbf{a}^T\boldsymbol{\theta} = b$; the solution lies where a level set is tangent to the constraint]

11

Lagrange Multiplier

the Lagrangian is obtained as follows: for equality constraints $\mathbf{a}_i^T\boldsymbol{\theta} = b_i$,

$$ L(\boldsymbol{\theta}, \boldsymbol{\lambda}) = J(\boldsymbol{\theta}) - \sum_{i=1}^{m} \lambda_i (\mathbf{a}_i^T\boldsymbol{\theta} - b_i) $$

for inequality constraints $\mathbf{a}_i^T\boldsymbol{\theta} \ge b_i$,

$$ L(\boldsymbol{\theta}, \boldsymbol{\lambda}) = J(\boldsymbol{\theta}) - \sum_{i=1}^{m} \lambda_i (\mathbf{a}_i^T\boldsymbol{\theta} - b_i), \quad \lambda_i \ge 0, \; i = 1, \ldots, m $$

in our case (inequality constraints)

$$ L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right], \quad \lambda_i \ge 0, \; i = 1, 2, \ldots, N $$

12

Convex

a subset $C \subseteq X$ is convex iff for any $\mathbf{x}, \mathbf{y} \in C$, the line segment joining $\mathbf{x}$ and $\mathbf{y}$ is also a subset of $C$, i.e. for any $\lambda \in [0, 1]$, $\lambda\mathbf{x} + (1 - \lambda)\mathbf{y} \in C$

a real-valued function $f$ on $C$ is convex iff for any two points $\mathbf{x}, \mathbf{y} \in C$ and for any $\lambda \in [0, 1]$,

$$ f(\lambda\mathbf{x} + (1 - \lambda)\mathbf{y}) \le \lambda f(\mathbf{x}) + (1 - \lambda) f(\mathbf{y}) $$

13

Convex

[Figure: examples of a convex set and a concave (non-convex) set; graphs of a convex function, a concave function, and a function that is neither convex nor concave]

14

Convex Optimization

an optimization problem is said to be convex iff the cost function as well as the constraints are convex; the optimization problem for SVM is convex

the solution to a convex problem, if it exists, is unique; that is, there is no local optimum

for a convex optimization problem, the KKT (Karush-Kuhn-Tucker) conditions are necessary and sufficient for the solution

15

KKT (Karush-Kuhn-Tucker) condition

1. The gradient of the Lagrangian with respect to the original variables is 0

$$ \frac{\partial L(\mathbf{w}, w_0, \boldsymbol{\lambda})}{\partial \mathbf{w}} = 0, \qquad \frac{\partial L(\mathbf{w}, w_0, \boldsymbol{\lambda})}{\partial w_0} = 0 $$

2. The original constraints are satisfied

3. The multipliers for the inequality constraints are nonnegative: $\lambda_i \ge 0, \; i = 1, 2, \ldots, N$

4. (Complementary KKT condition) the product of each multiplier and its constraint equals 0:

$$ \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right] = 0, \quad i = 1, 2, \ldots, N $$

for convex optimization problems, conditions 1-4 are necessary and sufficient for the solution

16

KKT condition for the optimization of margin

recall

$$ \text{minimize } \; J(\mathbf{w}) = \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } \; y_i(\mathbf{w}^T\mathbf{x}_i + w_0) \ge 1, \; i = 1, 2, \ldots, N $$

the Lagrangian is

$$ L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right] \qquad (3.66) $$

KKT condition

$$ \frac{\partial}{\partial \mathbf{w}} L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = 0 \qquad (3.62) $$

$$ \frac{\partial}{\partial w_0} L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = 0 \qquad (3.63) $$

$$ \lambda_i \ge 0, \quad i = 1, 2, \ldots, N \qquad (3.64) $$

$$ \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right] = 0, \quad i = 1, 2, \ldots, N \qquad (3.65) $$

17

KKT condition for the optimization of margin

Combining (3.66) with (3.62) and (3.63) gives

$$ \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i \qquad (3.67) $$

$$ \sum_{i=1}^{N} \lambda_i y_i = 0 \qquad (3.68) $$

18

Remarks-support vector

the optimal $\mathbf{w}$ is a linear combination of the $N_s \le N$ feature vectors $\mathbf{x}_i$ which are associated with $\lambda_i \neq 0$:

$$ \mathbf{w} = \sum_{i=1}^{N_s} \lambda_i y_i \mathbf{x}_i $$

these vectors are the support vectors; they are associated with

$$ \lambda_i \neq 0 \;\Rightarrow\; y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 = 0 $$

19

Remarks-support vector

$\lambda_i = 0$: non-support vector

$\lambda_i \neq 0$: support vector; support vectors lie on either of the two hyperplanes

$$ \mathbf{w}^T\mathbf{x} + w_0 = 1, \qquad \mathbf{w}^T\mathbf{x} + w_0 = -1 $$

i.e. they are the training vectors closest to the separating hyperplane $\mathbf{w}^T\mathbf{x} + w_0 = 0$

The resulting hyperplane classifier is insensitive to the number and position of non-support vectors

20

Remark-computation of w0

$w_0$ can be implicitly obtained from any one of the conditions satisfying strict complementarity (i.e. $\lambda_i \neq 0$):

$$ \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right] = 0 $$

In practice, $w_0$ is computed as an average of the values obtained from all conditions of this type
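A small Python sketch (added for illustration; `lam`, `X`, `y` are assumed to hold the optimal multipliers, training vectors, and labels in {+1, -1}) showing how $\mathbf{w}$ and $w_0$ could be recovered once the $\lambda_i$ are known:

```python
import numpy as np

def recover_hyperplane(lam, X, y, tol=1e-8):
    """Recover (w, w0) from optimal multipliers lam, data X, labels y in {+1, -1}."""
    w = (lam * y) @ X                 # w = sum_i lambda_i y_i x_i   (3.67)
    sv = lam > tol                    # support vectors: lambda_i != 0
    # For each support vector, y_i (w^T x_i + w0) = 1  =>  w0 = y_i - w^T x_i;
    # average over all support vectors for numerical stability.
    w0 = np.mean(y[sv] - X[sv] @ w)
    return w, w0
```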

21

Remark-optimal hyperplane is unique

the optimal hyperplane classifier of a support vector machine is unique under two conditions: the cost function is convex, and the inequality constraints consist of linear functions (so the constraints are convex)

an optimization problem is said to be convex iff the target (or cost) function as well as the constraints are convex (the optimization problem for SVM is convex)

the solution to a convex problem, if it exists, is unique; that is, there is no local optimum

22

Computation of the optimal Lagrange multipliers

the optimization problem belongs to the convex programming family of problems (convex optimization problems)

it can be solved by considering the so-called Lagrangian duality and can be stated equivalently by its Wolfe dual representation form

Lagrangian duality:

$$ \min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) = \min_{\boldsymbol{\theta}} \max_{\boldsymbol{\lambda} \ge 0} L(\boldsymbol{\theta}, \boldsymbol{\lambda}) = \max_{\boldsymbol{\lambda} \ge 0} \min_{\boldsymbol{\theta}} L(\boldsymbol{\theta}, \boldsymbol{\lambda}) $$

Wolfe dual representation:

$$ \max_{\boldsymbol{\lambda} \ge 0} L(\boldsymbol{\theta}, \boldsymbol{\lambda}) \quad \text{subject to } \; \nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}, \boldsymbol{\lambda}) = 0 $$

23

Wolfe dual representation form

$$ \text{maximize } \; L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \frac{1}{2}\mathbf{w}^T\mathbf{w} - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right] $$

$$ \text{subject to } \; \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{N} \lambda_i y_i = 0, \qquad \boldsymbol{\lambda} \ge \mathbf{0} $$

24

Computation of the optimal Lagrange multipliers

once the optimal Lagrange multipliers have been computed, the optimal hyperplane is obtained; substituting the constraints into the Lagrangian, the dual problem becomes

$$ \max_{\boldsymbol{\lambda}} \left[ \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \right] \qquad (3.75) $$

$$ \text{subject to } \; \sum_{i=1}^{N} \lambda_i y_i = 0, \quad \boldsymbol{\lambda} \ge \mathbf{0} \qquad (3.76) $$
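For concreteness, here is a minimal Python sketch (illustrative only, not from the slides) that solves the dual (3.75)-(3.76) with a general-purpose solver; in practice a dedicated QP solver would normally be preferred:

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, y):
    """Solve max_l sum(l) - 0.5 l^T H l  s.t.  l >= 0, y^T l = 0,
    where H_ij = y_i y_j x_i^T x_j."""
    N = X.shape[0]
    H = (y[:, None] * X) @ (y[:, None] * X).T          # H_ij = y_i y_j x_i^T x_j
    neg_dual = lambda lam: 0.5 * lam @ H @ lam - lam.sum()
    res = minimize(neg_dual, x0=np.ones(N) / N, method="SLSQP",
                   bounds=[(0.0, None)] * N,
                   constraints=[{"type": "eq", "fun": lambda lam: lam @ y}])
    return res.x

# toy usage on the four-point example used later in the slides
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
lam = solve_dual(X, y)
w = (lam * y) @ X                                      # w = sum_i lambda_i y_i x_i
print(lam, w)                                          # w should be close to [1, 0]
```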

25

Remarks

the cost function does not depend explicitly on the dimensionality of the input space; this allows for efficient generalizations in the case of nonlinearly separable classes

although the resulting optimal hyperplane is unique, there is no such guarantee for the Lagrange multipliers

26

Simple example

consider the two-class classification task that consists of the following points:

$$ \omega_1: \; [1, 1]^T, \; [1, -1]^T \qquad \omega_2: \; [-1, 1]^T, \; [-1, -1]^T $$

its Lagrangian function, with $\mathbf{w} = [w_1, w_2]^T$, is

$$ L(\mathbf{w}, w_0, \boldsymbol{\lambda}) = \frac{1}{2}\|\mathbf{w}\|^2 - \lambda_1 (w_1 + w_2 + w_0 - 1) - \lambda_2 (w_1 - w_2 + w_0 - 1) - \lambda_3 (w_1 - w_2 - w_0 - 1) - \lambda_4 (w_1 + w_2 - w_0 - 1) $$

KKT condition

$$ \frac{\partial L}{\partial w_1} = w_1 - (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4) = 0 $$

$$ \frac{\partial L}{\partial w_2} = w_2 - (\lambda_1 - \lambda_2 - \lambda_3 + \lambda_4) = 0 $$

$$ \frac{\partial L}{\partial w_0} = 0 \;\Rightarrow\; \lambda_1 + \lambda_2 - \lambda_3 - \lambda_4 = 0 $$

$$ \lambda_i \ge 0, \quad \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 \right] = 0, \quad i = 1, 2, 3, 4 $$

27

Simple example - Lagrangian duality

the Wolfe dual (optimization with the equality constraint):

$$ \max_{\boldsymbol{\lambda} \ge \mathbf{0}} \left[ \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 - \frac{1}{2}\left( (\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4)^2 + (\lambda_1 - \lambda_2 - \lambda_3 + \lambda_4)^2 \right) \right] $$

$$ \text{subject to } \; \lambda_1 + \lambda_2 - \lambda_3 - \lambda_4 = 0 $$

result: more than one solution for $\boldsymbol{\lambda}$ (e.g. $\boldsymbol{\lambda} = (\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4})$ is one of them), but all of them give the same hyperplane

$$ \mathbf{w} = [1, 0]^T, \quad w_0 = 0 $$
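A short numerical check (added for illustration; the two multiplier vectors below are hand-picked admissible maximizers) that different multiplier solutions give the same hyperplane $\mathbf{w} = [1, 0]^T$, $w_0 = 0$:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# two different multiplier vectors, both feasible and both maximizing the dual
for lam in (np.array([0.25, 0.25, 0.25, 0.25]),
            np.array([0.50, 0.00, 0.50, 0.00])):
    w = (lam * y) @ X                       # w = sum_i lambda_i y_i x_i
    w0 = np.mean(y[lam > 0] - X[lam > 0] @ w)
    print(lam, "->", w, w0)                 # both print w = [1. 0.], w0 = 0.0
```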

28

SVM for Non-separable Classes

in the non-separable case, the training feature vectors belong to one of the following three categories:

1. vectors that fall outside the band and are correctly classified: $y_i(\mathbf{w}^T\mathbf{x} + w_0) \ge 1$

2. vectors that fall inside the band and are correctly classified: $0 \le y_i(\mathbf{w}^T\mathbf{x} + w_0) < 1$

3. vectors that are misclassified: $y_i(\mathbf{w}^T\mathbf{x} + w_0) < 0$

[Figure: separating hyperplane $\mathbf{w}^T\mathbf{x} + w_0 = 0$ and the band defined by $\mathbf{w}^T\mathbf{x} + w_0 = \pm 1$]

29

SVM for Non-separable Classes

all three cases can be treated under a single type of constraint by introducing slack variables $\xi_i$:

$$ y_i\left[ \mathbf{w}^T\mathbf{x}_i + w_0 \right] \ge 1 - \xi_i $$

first category: $\xi_i = 0$, second category: $0 < \xi_i \le 1$, third category: $\xi_i > 1$

30

SVM for Non-separable Classes

the goal is to make the margin as large as possible while keeping the number of points with $\xi_i > 0$ as small as possible:

$$ J(\mathbf{w}, w_0, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} I(\xi_i), \qquad I(\xi_i) = \begin{cases} 1, & \xi_i > 0 \\ 0, & \xi_i = 0 \end{cases} \qquad (3.79) $$

(3.79) is intractable because of the discontinuous function $I(\cdot)$

31

SVM for Non-separable Classes

as is common in such cases, we choose to optimize a closely related cost function:

$$ \text{minimize } \; J(\mathbf{w}, w_0, \boldsymbol{\xi}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i $$

$$ \text{subject to } \; y_i\left[ \mathbf{w}^T\mathbf{x}_i + w_0 \right] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, N $$
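As a practical aside (not part of the slides), this soft-margin formulation is what standard libraries solve; a minimal sketch with scikit-learn's linear-kernel SVC, where the toy data and the value C = 1.0 are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# small, slightly overlapping two-class toy set; labels in {+1, -1}
X = np.array([[1.0, 1.0], [1.2, -0.5], [0.3, 0.2],
              [-1.0, 1.0], [-1.1, -0.8], [-0.2, -0.1]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0)   # C weights the slack term C * sum(xi_i)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # the resulting w and w0
print(clf.support_)                 # indices of the support vectors
```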

32

SVM for Non-separable Classes

the corresponding Lagrangian is

$$ L(\mathbf{w}, w_0, \boldsymbol{\xi}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \xi_i - \sum_{i=1}^{N} \mu_i \xi_i - \sum_{i=1}^{N} \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 + \xi_i \right] $$

33

SVM for Non-separable Classes

The corresponding KKT conditions are

$$ \frac{\partial L}{\partial \mathbf{w}} = 0 \;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i \qquad (3.85) $$

$$ \frac{\partial L}{\partial w_0} = 0 \;\Rightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0 \qquad (3.86) $$

$$ \frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; C - \mu_i - \lambda_i = 0, \quad i = 1, 2, \ldots, N \qquad (3.87) $$

$$ \lambda_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + w_0) - 1 + \xi_i \right] = 0, \quad i = 1, 2, \ldots, N \qquad (3.88) $$

$$ \mu_i \xi_i = 0, \quad i = 1, 2, \ldots, N \qquad (3.89) $$

$$ \mu_i \ge 0, \; \lambda_i \ge 0, \quad i = 1, 2, \ldots, N \qquad (3.90) $$

34

SVM for Non-separable Classes

The associated Wolfe dual representation now becomes

$$ \text{maximize } \; L(\mathbf{w}, w_0, \boldsymbol{\lambda}, \boldsymbol{\xi}, \boldsymbol{\mu}) $$

$$ \text{subject to } \; \mathbf{w} = \sum_{i=1}^{N} \lambda_i y_i \mathbf{x}_i, \quad \sum_{i=1}^{N} \lambda_i y_i = 0, \quad C - \mu_i - \lambda_i = 0, \quad \lambda_i \ge 0, \; \mu_i \ge 0, \quad i = 1, 2, \ldots, N $$

35

SVM for Non-separable Classes

equivalent to

$$ \max_{\boldsymbol{\lambda}} \left[ \sum_{i=1}^{N} \lambda_i - \frac{1}{2} \sum_{i,j} \lambda_i \lambda_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \right] $$

$$ \text{subject to } \; 0 \le \lambda_i \le C, \quad i = 1, 2, \ldots, N, \qquad \sum_{i=1}^{N} \lambda_i y_i = 0 $$

36

Remarks - differences from the linearly separable case

the Lagrange multipliers $\lambda_i$ need to be bounded above by $C$

the slack variables $\xi_i$ and their associated Lagrange multipliers $\mu_i$ do not enter the problem explicitly; they are reflected indirectly through $C$

37

Remarks - M-class problem

SVM for the M-class problem: design M separating hyperplanes (discriminant functions) $g_i(\mathbf{x})$ so that each separates class $\omega_i$ from all the others, i.e. $g_i(\mathbf{x}) > 0$ for $\mathbf{x} \in \omega_i$ and $g_i(\mathbf{x}) < 0$ otherwise

assign $\mathbf{x}$ to $\omega_i$ if

$$ i = \arg\max_{k} \, g_k(\mathbf{x}) $$
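A minimal sketch (added for illustration; a one-vs-rest scheme, with the function and variable names chosen here, not prescribed by the slides beyond the argmax rule) of how the M trained linear discriminants could be combined:

```python
import numpy as np

def ovr_predict(W, w0, X):
    """One-vs-rest decision: g_i(x) = w_i^T x + w0_i, assign x to argmax_i g_i(x).

    W  : (M, d) array, one weight vector per class
    w0 : (M,)   array, one bias per class
    X  : (n, d) array of feature vectors
    """
    G = X @ W.T + w0          # G[n, i] = g_i(x_n)
    return np.argmax(G, axis=1)

# toy usage with M = 3 hypothetical hyperplanes in 2-D
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])
w0 = np.zeros(3)
print(ovr_predict(W, w0, np.array([[2.0, 0.5], [-1.5, 0.2], [0.1, 3.0]])))  # -> [0 1 2]
```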