Support Vector Classification (Linearly Separable Case, Primal)

The hyperplane that solves the minimization problem

$$\min_{(w,b)\in\mathbb{R}^{n+1}} \ \tfrac{1}{2}\|w\|_2^2 \quad \text{subject to} \quad D(Aw + eb) \geq e$$

realizes the maximal margin hyperplane with geometric margin $\gamma = \frac{1}{\|w\|_2}$. (Notation: $A$ is the matrix whose $i$-th row is the training point $x_i'$, $D = \operatorname{diag}(y_1,\dots,y_l)$ holds the $\pm 1$ labels, and $e$ is the vector of ones, so the constraint reads $y_i(w'x_i + b) \geq 1$ for all $i$.)
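As a concrete illustration, the primal problem can be handed to a general-purpose solver on a toy dataset. This is a minimal sketch, not the slides' method: the data is invented for the example and `scipy.optimize.minimize` (SLSQP) stands in for a proper QP solver.

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data: rows of A are the points x_i, y holds the +/-1 labels.
A = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = A.shape[1]

# Decision variables packed as z = (w_1, ..., w_n, b).
def objective(z):
    w = z[:n]
    return 0.5 * w @ w                     # (1/2)||w||_2^2

# D(Aw + eb) >= e, i.e. y_i (w'x_i + b) - 1 >= 0 for every i.
constraints = [{"type": "ineq",
                "fun": lambda z, i=i: y[i] * (A[i] @ z[:n] + z[n]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=np.zeros(n + 1), constraints=constraints)
w, b = res.x[:n], res.x[n]
print("w =", w, " b =", b, " margin =", 1.0 / np.linalg.norm(w))
```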
Support Vector Classification (Linearly Separable Case, Dual Form)

The dual problem of the previous MP:

$$\max_{\alpha\in\mathbb{R}^{l}} \ e'\alpha - \tfrac{1}{2}\alpha'DAA'D\alpha \quad \text{subject to} \quad e'D\alpha = 0,\ \alpha \geq 0.$$

Applying the KKT optimality conditions, we have $w = A'D\alpha$. But where is $b$? Don't forget:

$$0 \leq \alpha \ \perp \ D(Aw + eb) - e \geq 0$$
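How $b$ falls out of the complementarity condition: for any $i$ with $\alpha_i > 0$ the corresponding constraint is active, so $y_i(w'x_i + b) = 1$ and $b = y_i - w'x_i$. A sketch continuing the toy example above (same `A`, `y`, and solver pattern):

```python
l = len(y)
D = np.diag(y)
Q = D @ A @ A.T @ D                        # dual Hessian DAA'D

# Maximize e'alpha - (1/2) alpha'Q alpha by minimizing its negation.
res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
               x0=np.zeros(l),
               bounds=[(0.0, None)] * l,                              # alpha >= 0
               constraints=[{"type": "eq", "fun": lambda a: y @ a}])  # e'D alpha = 0
alpha = res.x

w = A.T @ D @ alpha                        # KKT stationarity: w = A'D alpha
sv = int(np.argmax(alpha))                 # an index with alpha_i > 0 (a support vector)
b = y[sv] - A[sv] @ w                      # active constraint: y_i (w'x_i + b) = 1
print("alpha =", alpha.round(4), " w =", w, " b =", b)
```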
Dual Representation of SVM (Key of Kernel Methods)

The hypothesis is determined by $(\alpha^*, b^*)$:

$$h(x) = \operatorname{sgn}\bigl(\langle x, A'D\alpha^*\rangle + b^*\bigr) = \operatorname{sgn}\Bigl(\sum_{i=1}^{l} y_i\alpha_i^* \langle x_i, x\rangle + b^*\Bigr) = \operatorname{sgn}\Bigl(\sum_{\alpha_i^* > 0} y_i\alpha_i^* \langle x_i, x\rangle + b^*\Bigr)$$

$$w = A'D\alpha^* = \sum_{i=1}^{l} y_i\alpha_i^* A_i'$$

Remember: $A_i' = x_i$.
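Since only the terms with $\alpha_i^* > 0$ survive, prediction touches just the support vectors. A minimal sketch (the `predict` helper and the zero tolerance are my additions):

```python
import numpy as np

def predict(x, A, y, alpha, b, tol=1e-8):
    """Dual-form prediction: sgn(sum over alpha_i > 0 of y_i alpha_i <x_i, x> + b)."""
    sv = alpha > tol                       # keep only the support vectors
    return np.sign((y[sv] * alpha[sv]) @ (A[sv] @ x) + b)
```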
Compute the Geometric Margin via the Dual Solution

The geometric margin is $\gamma = \frac{1}{\|w^*\|_2}$ and $\langle w^*, w^*\rangle = (\alpha^*)'DAA'D\alpha^*$, hence we can compute $\gamma$ using $\alpha^*$. Use KKT again (in the dual)!

$$0 \leq \alpha^* \ \perp \ D(AA'D\alpha^* + b^*e) - e \geq 0 \qquad \text{(don't forget } e'D\alpha^* = 0\text{)}$$

Multiplying the complementarity condition through by $(\alpha^*)'$ and using $e'D\alpha^* = 0$ gives $\|w^*\|_2^2 = e'\alpha^*$, so

$$\gamma = (e'\alpha^*)^{-\frac{1}{2}} = \Bigl(\sum_{\alpha_i^* > 0} \alpha_i^*\Bigr)^{-\frac{1}{2}}$$
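Continuing the dual sketch above, the margin needs nothing but the positive multipliers:

```python
gamma = alpha[alpha > 1e-8].sum() ** -0.5   # gamma = (e'alpha*)^(-1/2)
print("geometric margin =", gamma)
```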
Soft Margin SVM (Nonseparable Case)

If the data are not linearly separable, the primal problem is infeasible and the dual problem is unbounded above. Introduce a slack variable $\xi_i$ for each training point:

$$y_i(w'x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \quad \forall i$$

This inequality system is always feasible, e.g. $w = 0$, $b = 0$, $\xi = e$.
![Page 6: Support Vector Classification (Linearly Separable Case, Primal) The hyperplanethat solves the minimization problem: realizes the maximal margin hyperplane.](https://reader030.fdocuments.us/reader030/viewer/2022032704/56649d455503460f94a22759/html5/thumbnails/6.jpg)
[Figure: two classes of training points ("x" and "o") separated by a hyperplane with margin $\gamma$; the points $x_j$ and $o_i$ violate their margins, with slacks $\xi_j$ and $\xi_i$ measuring the violations.]
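A point's slack is just its margin violation; a one-line sketch reusing names from the earlier snippets:

```python
xi = np.maximum(0.0, 1.0 - y * (A @ w + b))   # xi_i = max(0, 1 - y_i (w'x_i + b))
```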
Two Different Measures of Training Error

2-norm soft margin:

$$\min_{(w,b,\xi)\in\mathbb{R}^{n+1+l}} \ \tfrac{1}{2}\|w\|_2^2 + \tfrac{C}{2}\|\xi\|_2^2 \quad \text{subject to} \quad D(Aw + eb) + \xi \geq e$$

1-norm soft margin:

$$\min_{(w,b,\xi)\in\mathbb{R}^{n+1+l}} \ \tfrac{1}{2}\|w\|_2^2 + Ce'\xi \quad \text{subject to} \quad D(Aw + eb) + \xi \geq e, \quad \xi \geq 0$$

(The 2-norm version needs no explicit $\xi \geq 0$: raising any negative $\xi_i$ to zero preserves feasibility while lowering the objective.)
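The two penalties side by side, for a candidate $(w, b, \xi)$; a sketch in which `C` is arbitrary and the other names come from the earlier snippets:

```python
C = 10.0
obj_2norm = 0.5 * w @ w + 0.5 * C * xi @ xi   # (1/2)||w||^2 + (C/2)||xi||^2
obj_1norm = 0.5 * w @ w + C * xi.sum()        # (1/2)||w||^2 + C e'xi
```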
2-Norm Soft Margin Dual Formulation

The Lagrangian for the 2-norm soft margin:

$$L(w,b,\xi,\alpha) = \tfrac{1}{2}w'w + \tfrac{C}{2}\xi'\xi + \alpha'\bigl[e - D(Aw + eb) - \xi\bigr], \quad \text{where } \alpha \geq 0$$

The partial derivatives with respect to the primal variables equal zero:

$$\frac{\partial L}{\partial w} = w - A'D\alpha = 0, \qquad \frac{\partial L}{\partial b} = -e'D\alpha = 0, \qquad \frac{\partial L}{\partial \xi} = C\xi - \alpha = 0$$
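Substituting these three conditions back into $L$ gives the dual objective on the next slide; a sketch of the algebra (using $D^2 = I$ since the labels are $\pm 1$):

```latex
% With w = A'D\alpha, \xi = \alpha/C, and e'D\alpha = 0 (so the b-term drops):
\begin{aligned}
L &= \tfrac{1}{2}\,\alpha'DAA'D\alpha + \tfrac{1}{2C}\,\alpha'\alpha
     + e'\alpha - \alpha'DAA'D\alpha - \tfrac{1}{C}\,\alpha'\alpha \\
  &= e'\alpha - \tfrac{1}{2}\,\alpha'DAA'D\alpha - \tfrac{1}{2C}\,\alpha'\alpha \\
  &= e'\alpha - \tfrac{1}{2}\,\alpha'D\bigl(AA' + \tfrac{1}{C}I\bigr)D\alpha.
\end{aligned}
```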
Dual Maximization Problem for 2-Norm Soft Margin

Dual:

$$\max_{\alpha\in\mathbb{R}^{l}} \ e'\alpha - \tfrac{1}{2}\alpha'D\bigl(AA' + \tfrac{1}{C}I\bigr)D\alpha \quad \text{subject to} \quad e'D\alpha = 0,\ \alpha \geq 0$$

The corresponding KKT complementarity:

$$0 \leq \alpha \ \perp \ D(Aw + eb) + \xi - e \geq 0$$

Use the above conditions to find $b^*$.
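The only change from the hard-margin dual sketch is the ridge term $\tfrac{1}{C}I$ on the Gram matrix; a sketch continuing with the same names:

```python
C = 10.0
Q_soft = D @ (A @ A.T + np.eye(l) / C) @ D    # D(AA' + I/C)D

res = minimize(lambda a: 0.5 * a @ Q_soft @ a - a.sum(),
               x0=np.zeros(l),
               bounds=[(0.0, None)] * l,
               constraints=[{"type": "eq", "fun": lambda a: y @ a}])
alpha = res.x
```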
Linear Machine in Feature Space

Let $\phi : X \to F$ be a nonlinear map from the input space to some feature space. The classifier will be in the form (primal):

$$f(x) = \Bigl(\sum_{i} w_i\phi_i(x)\Bigr) + b$$

Make it in the dual form:

$$f(x) = \Bigl(\sum_{i=1}^{l} \alpha_i y_i \langle \phi(x_i)\cdot\phi(x)\rangle\Bigr) + b$$
Kernel: Represent Inner Product in Feature Space

Definition: A kernel is a function $K : X \times X \to \mathbb{R}$ such that for all $x, z \in X$

$$K(x,z) = \langle \phi(x)\cdot\phi(z)\rangle, \qquad \text{where } \phi : X \to F.$$

The classifier will become:

$$f(x) = \Bigl(\sum_{i=1}^{l} \alpha_i y_i K(x_i, x)\Bigr) + b$$
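A quick numeric check that a kernel really computes a feature-space inner product; the explicit degree-2 map below is a standard textbook example, chosen here for illustration:

```python
import numpy as np

def phi(x):
    """Explicit feature map whose inner product equals the kernel (x'z)^2."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z))    # inner product computed in feature space
print((x @ z) ** 2)       # same value computed directly by the kernel
```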
Introduce Kernel into Dual Formulation

Let $S = \{(x_1,y_1), (x_2,y_2), \ldots, (x_l,y_l)\}$ be a linearly separable training sample in the feature space implicitly defined by the kernel $K(x,z)$. The SV classifier is determined by the $\alpha^*$ that solves

$$\max_{\alpha\in\mathbb{R}^{l}} \ e'\alpha - \tfrac{1}{2}\alpha'DK(A,A')D\alpha \quad \text{subject to} \quad e'D\alpha = 0,\ \alpha \geq 0,$$

where $K(A,A')$ is the kernel (Gram) matrix with entries $K(x_i, x_j)$.
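The kernelized dual only swaps $AA'$ for the Gram matrix; a sketch with a Gaussian kernel (an arbitrary choice), reusing the earlier data and solver pattern:

```python
def rbf_kernel(X, Z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

K = rbf_kernel(A, A)     # K(A, A'): Gram matrix with K[i, j] = K(x_i, x_j)
Q = D @ K @ D            # dual Hessian D K(A,A') D; then solve as before
```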
Kernel Technique Based on Mercer's Condition (1909)

The value of the kernel function represents an inner product in feature space. Kernel functions merge two steps:
1. map the input data from input space to feature space (which might be infinite-dimensional);
2. take the inner product in the feature space.
Mercer's Condition Guarantees the Convexity of the QP

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a finite space and let $k(x,z)$ be a symmetric function on $X$. Then $k(x,z)$ is a kernel function if and only if the matrix $K \in \mathbb{R}^{n\times n}$ with $K_{ij} = k(x_i, x_j)$ is positive semi-definite.
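On a finite sample the condition is directly checkable: form the Gram matrix and inspect its spectrum. A minimal sketch with arbitrary data:

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(20, 3))            # arbitrary finite sample
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1) / 2)    # Gaussian Gram matrix

eigvals = np.linalg.eigvalsh(K)          # symmetric matrix, so use eigvalsh
print("min eigenvalue:", eigvals.min())  # >= 0 (up to roundoff) => PSD => valid kernel
```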
Introduce Kernel in Dual Formulation for 2-Norm Soft Margin

The feature space is implicitly defined by $k(x,z)$. Suppose $\alpha^*$ solves the QP problem:

$$\max_{\alpha\in\mathbb{R}^{l}} \ e'\alpha - \tfrac{1}{2}\alpha'D\bigl(K(A,A') + \tfrac{1}{C}I\bigr)D\alpha \quad \text{subject to} \quad e'D\alpha = 0,\ \alpha \geq 0$$

Then the decision rule is defined by

$$h(x) = \operatorname{sgn}\bigl(K(x, A')D\alpha^* + b^*\bigr)$$

Use the above conditions to find $b^*$, as on the next slide.
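A sketch of the kernelized decision rule, assuming `rbf_kernel`, `A`, `D`, `alpha`, and `b_star` from the surrounding sketches:

```python
def h(x, b_star):
    """h(x) = sgn(K(x, A') D alpha* + b*)."""
    k_row = rbf_kernel(x[None, :], A)      # K(x, A'): one entry per training point
    return np.sign(k_row @ D @ alpha + b_star)[0]
```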
Introduce Kernel in Dual Formulation for 2-Norm Soft Margin

$b^*$ is chosen so that

$$y_i\bigl[K(A_i', A')D\alpha^* + b^*\bigr] = 1 - \frac{\alpha_i^*}{C}$$

for any $i$ with $\alpha_i^* \neq 0$. Because:

$$0 \leq \alpha^* \ \perp \ D\bigl(K(A,A')D\alpha^* + eb^*\bigr) + \xi^* - e \geq 0 \quad \text{and} \quad \alpha^* = C\xi^*$$
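Continuing the sketch (`K`, `D`, `alpha`, `C`, `y` as before), $b^*$ from any index with $\alpha_i^* > 0$:

```python
i = int(np.argmax(alpha))                               # an index with alpha_i > 0
b_star = (1.0 - alpha[i] / C) / y[i] - K[i] @ D @ alpha
```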
Geometric Margin in Feature Space for 2-Norm Soft Margin

The geometric margin in the feature space is defined by

$$\gamma = \frac{1}{\|w^*\|_2} = \Bigl(e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2\Bigr)^{-\frac{1}{2}}$$

because

$$\|w^*\|_2^2 = (\alpha^*)'DK(A,A')D\alpha^* = \cdots = e'\alpha^* - \tfrac{1}{C}\|\alpha^*\|_2^2$$

(as before, multiply the complementarity condition through by $(\alpha^*)'$ and use $e'D\alpha^* = 0$ and $\xi^* = \alpha^*/C$ to fill in the dots). Why is $e'\xi^* \geq \|\xi^*\|_2^2$? (This guarantees the quantity under the square root is nonnegative.)
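And the feature-space margin from the same quantities, continuing the sketch:

```python
gamma = (alpha.sum() - (alpha @ alpha) / C) ** -0.5   # (e'alpha* - ||alpha*||^2/C)^(-1/2)
```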
Discussion about C for 2-Norm Soft Margin

The only difference between "hard margin" and 2-norm soft margin is the objective function in the optimization problem.

A larger C will give you a smaller margin in the feature space.

Compare $K(A,A')$ and $\bigl(K(A,A') + \tfrac{1}{C}I\bigr)$.

A smaller C will give you a better numerical condition.
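The conditioning claim is easy to see numerically: the ridge term $\tfrac{1}{C}I$ lifts the small eigenvalues of the Gram matrix. A minimal sketch with an arbitrary Gaussian Gram matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
K = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))   # Gaussian Gram matrix

for C in (0.1, 10.0, 1000.0):
    cond = np.linalg.cond(K + np.eye(len(X)) / C)
    print(f"C = {C:6.1f}   cond(K + I/C) = {cond:.3e}")  # smaller C, better conditioning
```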