Transcript of "An Introduction to Support Vector Machines"
An Introduction to Support Vector Machines
Seong-Bae Park
Kyungpook National University
http://sejong.knu.ac.kr/~sbpark
2
Supervised Learning
[Diagram: an Environment poses a problem x; the Teacher supplies the desired solution d; the Learner (Student) produces f(x) = y and adjusts itself using the feedback d − y.]
3
Quality of a Learning Machine
Loss L(y, f(x, w)) ≥ 0
The discrepancy between the true output and the output of the learning machine.
Risk functional
The expected value of the loss:
R(w) = ∫ L(y, f(x, w)) p(x, y) dx dy
Learning
The process of estimating the function f(x, w) which minimizes the risk functional using only the training data.
4
Common Learning Tasks (1): Classification
L(y, f(x, w)) = 0 if y = f(x, w); 1 if y ≠ f(x, w)
R(w) = ∫ L(y, f(x, w)) p(x, y) dx dy
5
Common Learning Tasks (2): Regression
Common loss function: squared error (L2)
L(y, f(x, w)) = (y − f(x, w))²
Risk:
R(w) = ∫ (y − f(x, w))² p(x, y) dx dy
6
ML Hypothesis
Maximum Likelihood hypothesis:
hML = argmax_{h∈H} P(h|D)
    = argmax_{h∈H} P(D|h)P(h) / P(D)
    = argmax_{h∈H} P(D|h)P(h)
    = argmax_{h∈H} P(D|h)   (assuming a uniform prior over H)
7
Maximum Likelihood Revisited
If x1, …, xn are i.i.d. samples from a pdf f(x|w), the likelihood is defined by
P(w|x) = ∏_{i=1}^n f(xi|w).
Maximum Likelihood Estimator
Choose w* that maximizes the likelihood.
Relation to loss: take the negative log-likelihood as a risk,
R_ML(w) = −Σ_{i=1}^n ln f(xi|w).
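As a concrete illustration of the negative log-likelihood risk R_ML(w) = −Σ ln f(xi|w), here is a minimal Python sketch assuming a Gaussian density f(x|w) with w = (μ, σ); the samples and parameter values are made up for illustration:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # Density f(x | w) for w = (mu, sigma).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def nll_risk(samples, mu, sigma):
    # R_ML(w) = -sum_i ln f(x_i | w): the negative log-likelihood.
    return -sum(math.log(gaussian_pdf(x, mu, sigma)) for x in samples)

samples = [0.9, 1.1, 1.0]
# The sample mean (here 1.0) minimizes the NLL over mu for a fixed sigma,
# so the risk at mu = 1.0 is lower than at mu = 2.0.
assert nll_risk(samples, 1.0, 1.0) < nll_risk(samples, 2.0, 1.0)
```

Minimizing this risk over (μ, σ) is exactly maximum-likelihood estimation.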
8
Empirical Risk Minimization
Do we know p(x, y)? Generally NO!!! What we have is only the training data.
True risk:
R(w) = ∫ L(y, f(x, w)) p(x, y) dx dy
Empirical risk:
R_emp(w) = (1/n) Σ_{i=1}^n L(yi, f(xi, w))
ERM is more general than ML: in density estimation, ERM with the loss L(f(x, w)) = −ln f(x|w) is equivalent to ML.
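The empirical risk with the 0-1 classification loss can be computed directly from a sample; a minimal sketch with a made-up dataset and a fixed threshold hypothesis standing in for f(x, w):

```python
def zero_one_loss(y, y_hat):
    # L(y, f(x, w)) = 0 if y = f(x, w), 1 otherwise.
    return 0 if y == y_hat else 1

def empirical_risk(data, f):
    # R_emp(w) = (1/n) * sum_i L(y_i, f(x_i, w)).
    return sum(zero_one_loss(y, f(x)) for x, y in data) / len(data)

data = [(0.2, -1), (0.7, 1), (0.9, 1), (0.4, -1), (0.6, -1)]
f = lambda x: 1 if x > 0.5 else -1   # a fixed hypothesis f(x, w)
print(empirical_risk(data, f))       # misclassifies (0.6, -1) only -> 0.2
```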
9
Risk and Empirical Risk
When the loss is ½|y − f(x, w)|:
R(w) = ∫ ½|y − f(x, w)| dP(x, y)
R_emp(w) = (1/2l) Σ_{i=1}^l |yi − f(xi, w)|
Relation between them (Vapnik, 1995): with probability 1 − η,
R(w) ≤ R_emp(w) + sqrt( (h(log(2l/h) + 1) − log(η/4)) / l )
h: VC dimension (≥ 0). The bound holds regardless of P(x, y).
10
Risk and Empirical Risk (cont.)
The square-root term in the bound,
sqrt( (h(log(2l/h) + 1) − log(η/4)) / l ),
is called the VC confidence.
11
VC dimension
A set of instances S is shattered by {f(w)} iff for every dichotomy of S there exists some f(w) consistent with that dichotomy.
For l points, there are 2^l dichotomies. The Vapnik-Chervonenkis dimension, VC, is the maximum number of training points that can be shattered by {f(w)}.
12
Minimizing R(w) by minimizing h
13
Perceptron Revisited: Linear Separators
wTx + b = 0
wTx + b < 0        wTx + b > 0
f(x) = sign(wTx + b)
14
Learning Perceptron (1)
Perceptron Learning Algorithm
Given a training set S = {(x1, y1), …, (xl, yl)} and learning rate η ∈ R+:

w0 ← 0; b0 ← 0; k ← 0
R ← max_{1≤i≤l} ||xi||
while (there are errors)
    for i = 1 to l
        if yi(⟨wk ⋅ xi⟩ + bk) ≤ 0 then
            wk+1 ← wk + η yi xi
            bk+1 ← bk + η yi R²
            k ← k + 1
        end if
    end for
end while
return (wk, bk)
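The algorithm above can be sketched in Python; the toy training set is made up, and the bias update uses the R² scaling from the slide:

```python
def train_perceptron(S, eta=1.0, max_epochs=100):
    # Primal perceptron with the bias update b <- b + eta * y * R^2 from the slide.
    dim = len(S[0][0])
    w, b = [0.0] * dim, 0.0
    R = max(sum(v * v for v in x) ** 0.5 for x, _ in S)
    for _ in range(max_epochs):
        mistakes = False
        for x, y in S:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y * R ** 2
                mistakes = True
        if not mistakes:            # no errors in a full pass: converged
            break
    return w, b

# Made-up linearly separable toy set.
S = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -1.0), -1), ((-2.0, 1.0), -1)]
w, b = train_perceptron(S)
assert all(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 for x, y in S)
```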
15
Learning Perceptron (2)
16
Linear Separators
Which of the linear separators is optimal?
17
Margin
The distance from an example x to the separator is
r = (wTx + b) / ||w||.
Examples closest to the hyperplane are support vectors. The margin ρ of the separator is the width of separation between classes.
18
Maximum Margin Classification
Maximizing the margin is good according to intuition and PAC theory. It implies that only support vectors are important; other training examples are ignorable.
19
Linear SVM Mathematically (1)
Assuming all data are at least distance 1 from the hyperplane, the following two constraints hold for a training set {(xi, yi)}:
wTxi + b ≥ 1 if yi = 1
wTxi + b ≤ −1 if yi = −1
For support vectors, the inequality becomes an equality. Since each example's distance from the hyperplane is r = (wTxi + b) / ||w||, the margin is
ρ = 2 / ||w||
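A quick numeric check of these two formulas, with a hypothetical w and b chosen so the arithmetic is easy to follow:

```python
import math

def distance_to_hyperplane(w, b, x):
    # r = (w^T x + b) / ||w||  (signed distance)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w):
    # rho = 2 / ||w||, once the support vectors satisfy |w^T x + b| = 1
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -5.0                           # ||w|| = 5
print(distance_to_hyperplane(w, b, [1.0, 2.0]))   # (3 + 8 - 5) / 5 = 1.2
print(margin(w))                                  # 2 / 5 = 0.4
```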
20
Linear SVMs Mathematically (2)
Quadratic optimization problem:
Find w and b such that ρ = 2/||w|| is maximized and, for all {(xi, yi)}: wTxi + b ≥ 1 if yi = 1; wTxi + b ≤ −1 if yi = −1.
A better formulation:
Find w and b such that Φ(w) = ½ wTw is minimized and, for all {(xi, yi)}: yi(wTxi + b) ≥ 1.
21
Solving the Optimization Problem
We need to optimize a quadratic function subject to linear constraints. Quadratic optimization problems are a well-known class of mathematical programming problems, and many (rather intricate) algorithms exist for solving them. The solution involves constructing a dual problem in which a Lagrange multiplier αi is associated with every constraint in the primal problem:
Find w and b such that Φ(w) = ½ wTw is minimized and, for all {(xi, yi)}: yi(wTxi + b) ≥ 1.
Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαj yiyj xiTxj is maximized and (1) Σαiyi = 0, (2) αi ≥ 0 for all αi.
22
The Optimization Problem Solution
The solution has the form:
w = Σ αi yi xi,  b = yk − wTxk for any xk such that αk ≠ 0
Each non-zero αi indicates that the corresponding xi is a support vector. The classifying function then has the form:
f(x) = Σ αi yi xiTx + b
Notice that it relies on an inner product between the test point x and the support vectors xi! Also keep in mind that solving the optimization problem involved computing the inner products xiTxj between all training points!
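The recovery of w and b from a dual solution can be sketched as follows; the two-point dataset and its multipliers α = (0.25, 0.25) form a tiny problem whose dual solution can be verified by hand:

```python
def recover_w(alphas, ys, xs):
    # w = sum_i alpha_i y_i x_i
    dim = len(xs[0])
    w = [0.0] * dim
    for a, y, x in zip(alphas, ys, xs):
        for d in range(dim):
            w[d] += a * y * x[d]
    return w

def recover_b(w, alphas, ys, xs):
    # b = y_k - w^T x_k for any support vector (alpha_k != 0)
    k = next(i for i, a in enumerate(alphas) if a != 0)
    return ys[k] - sum(wi * xi for wi, xi in zip(w, xs[k]))

def classify(w, b, x):
    # f(x) = sign(w^T x + b)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.25, 0.25]              # dual solution of this 2-point problem
w = recover_w(alphas, ys, xs)      # [0.5, 0.5]
b = recover_b(w, alphas, ys, xs)   # 0.0
assert classify(w, b, (2.0, 0.0)) == 1
```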
23
Support Vectors in Dual Form
[Figure: the training points labeled with their dual coefficients; only the support vectors carry non-zero multipliers (α3 = 0.8, α6 = 0.4, α13 = 1.4), while all other αi = 0.]
24
Soft Margin Classification (1)
What if the training set is not linearly separable? Slack variables ξi can be added to allow misclassification of difficult or noisy examples.
25
Soft Margin Classification (2)
Optimization situation
Minimize
½||w||² + C Σ_{i=1}^l ξi
subject to
w ⋅ xi + b ≥ +1 − ξi for yi = +1
w ⋅ xi + b ≤ −1 + ξi for yi = −1
ξi ≥ 0
(C > 0 is a user-defined parameter.)
Dual problem
Maximize
Σ αi − ½ Σ_{i,j} αiαj yiyj xi ⋅ xj
subject to
0 ≤ αi ≤ C,  Σ αi yi = 0
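The primal objective with slack variables can be evaluated directly once each ξi is written as the hinge ξi = max(0, 1 − yi(w ⋅ xi + b)); the data and the (w, b) below are made up for illustration:

```python
def soft_margin_objective(w, b, C, data):
    # (1/2)||w||^2 + C * sum_i xi_i, where xi_i = max(0, 1 - y_i (w . x_i + b))
    wTw = sum(wi * wi for wi in w)
    slack = sum(max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
                for x, y in data)
    return 0.5 * wTw + C * slack

# Made-up data: the third point sits on the margin boundary of the wrong side (xi = 1).
data = [((2.0, 0.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]
print(soft_margin_objective([1.0, 0.0], -1.0, 1.0, data))   # 0.5*1 + 1*1 = 1.5
```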
26
Non-linear SVMs
Datasets that are linearly separable with some noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space, e.g. x → x²?
[Figure: 1-dimensional data on the x axis, not linearly separable; after mapping each point to (x, x²), the classes become separable.]
27
Non-linear SVMs: Feature spaces
General idea
The original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:
Φ: x → φ(x)
28
The “Kernel Trick”
The linear classifier relies on an inner product between vectors, K(xi,xj) = xiTxj. If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the inner product becomes:
K(xi,xj) = φ(xi)Tφ(xj)
A kernel function is a function that corresponds to an inner product in some feature space.
Example: 2-dimensional vectors x = [x1 x2]; let K(xi,xj) = (1 + xiTxj)².
We need to show that K(xi,xj) = φ(xi)Tφ(xj):
K(xi,xj) = (1 + xiTxj)²
= 1 + xi1²xj1² + 2 xi1xj1 xi2xj2 + xi2²xj2² + 2xi1xj1 + 2xi2xj2
= [1  xi1²  √2 xi1xi2  xi2²  √2 xi1  √2 xi2]T [1  xj1²  √2 xj1xj2  xj2²  √2 xj1  √2 xj2]
= φ(xi)Tφ(xj), where φ(x) = [1  x1²  √2 x1x2  x2²  √2 x1  √2 x2]
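The identity above is easy to confirm numerically; a minimal check for the quadratic kernel on arbitrary 2-d inputs:

```python
import math

def poly_kernel(x, z):
    # K(x, z) = (1 + x^T z)^2 for 2-d inputs
    return (1.0 + x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    # phi(x) = [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]
    s = math.sqrt(2.0)
    return [1.0, x[0] ** 2, s * x[0] * x[1], x[1] ** 2, s * x[0], s * x[1]]

x, z = (1.0, 2.0), (3.0, 0.5)
lhs = poly_kernel(x, z)                           # (1 + 4)^2 = 25
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))  # explicit inner product in feature space
assert abs(lhs - rhs) < 1e-9
```

The kernel evaluates the 6-dimensional inner product at the cost of a 2-dimensional one.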
29
What Functions are Kernels?
For some functions K(xi,xj), checking that K(xi,xj) = φ(xi)Tφ(xj) can be cumbersome.
Mercer's theorem: every positive semi-definite symmetric function is a kernel. Positive semi-definite symmetric functions correspond to a positive semi-definite symmetric Gram matrix:

K = | K(x1,x1)  K(x1,x2)  K(x1,x3)  …  K(x1,xN) |
    | K(x2,x1)  K(x2,x2)  K(x2,x3)  …  K(x2,xN) |
    | …         …         …         …  …        |
    | K(xN,x1)  K(xN,x2)  K(xN,x3)  …  K(xN,xN) |
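Mercer's condition can be probed numerically: build the Gram matrix of a candidate kernel on some points and check that its eigenvalues are non-negative. A sketch using NumPy and a Gaussian kernel (the sample points are arbitrary):

```python
import numpy as np

def gram_matrix(kernel, points):
    # K[i][j] = K(x_i, x_j)
    return np.array([[kernel(a, b) for b in points] for a in points])

def rbf(a, b, sigma=1.0):
    # Gaussian kernel exp(-||a - b||^2 / (2 sigma^2))
    d2 = float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))
    return np.exp(-d2 / (2.0 * sigma ** 2))

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
K = gram_matrix(rbf, pts)
eigvals = np.linalg.eigvalsh(K)   # symmetric matrix -> real eigenvalues
assert np.all(eigvals > -1e-10)   # positive semi-definite, up to rounding
```

A negative eigenvalue on some point set would prove the candidate is not a valid kernel; passing this check on one set is, of course, only evidence, not a proof.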
30
Examples of Kernel Functions
Linear: K(xi,xj) = xiTxj
Polynomial of power p: K(xi,xj) = (1 + xiTxj)^p
Gaussian (radial-basis function network): K(xi,xj) = exp(−||xi − xj||² / (2σ²))
Two-layer perceptron: K(xi,xj) = tanh(β0 xiTxj + β1)
31
Non-linear SVMs Mathematically
Dual problem formulation:
Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαj yiyj K(xi, xj) is maximized and (1) Σαiyi = 0, (2) αi ≥ 0 for all αi.
The solution is:
f(x) = Σ αi yi K(xi, x) + b
Optimization techniques for finding the αi's remain the same!
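The kernelized decision function can be sketched directly; the support vectors, multipliers, and bias below are hypothetical values for illustration, with an RBF kernel standing in for K:

```python
import math

def rbf_kernel(a, b, sigma=1.0):
    # K(x_i, x) = exp(-||x_i - x||^2 / (2 sigma^2))
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def decision(alphas, ys, svs, b, x, kernel=rbf_kernel):
    # f(x) = sum_i alpha_i y_i K(x_i, x) + b
    return sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, ys, svs)) + b

# Hypothetical support vectors, multipliers, and bias.
svs = [(0.0, 0.0), (2.0, 2.0)]
ys = [1, -1]
alphas = [1.0, 1.0]
assert decision(alphas, ys, svs, 0.0, (0.1, 0.0)) > 0   # near the positive support vector
```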
32
SVM Structure
33
VC Dimension of SVM
Minimal embedding space: an embedding space of minimal dimension for a given kernel.
Let K be a kernel that corresponds to a minimal embedding space H. Then the VC dimension of the corresponding SVM is dim(H) + 1.
The VC dimension of an SVM can be ∞. A striking conundrum:
high VC dimension, but good performance!
34
Generalization Error by Margin
Risk bound by margin ρ: with probability 1 − η,
R(w, b) ≤ (c/l) ( (R²/ρ²) log² l + log(1/η) )
A large margin makes the SVM stronger!
35
SVMlight (1)
Author: T. Joachims
Download: http://svmlight.joachims.org
Two executable files:
svm_learn: svm_learn training_data model_file
svm_classify: svm_classify test_data model_file
[Diagram: training data → svm_learn → generated model; test data + model → svm_classify → classified result.]
36
SVMlight (2)
Written in C
Applicable to classification, regression, and ranking tasks
Can handle thousands of support vectors
Can handle hundreds of thousands of training examples
Supports standard kernel functions and user-defined kernels
Uses a sparse vector representation
37
Why is handling many SVs important?
Learning an SVM:
Find α1…αN such that Q(α) = Σαi − ½ΣΣ αiαj Qij is maximized and (1) Σαiyi = 0, (2) αi ≥ 0 for all αi.
Q: an n × n matrix (Qij = yiyj K(xi, xj)). For many real-world applications, Q is too large for standard computers.
SMO decomposition: the overall QP is split into QP subproblems. Joachims presented an optimization method for SMO decomposition.
38
svm_learn options
-z {c, r, p}  Selection of task: classification (c), regression (r), preference ranking (p) (default is c)
-c float      C parameter for the soft-margin SVM (default: E[xTx]^-1)
-t int        Type of kernel function:
              0: linear
              1: polynomial (s x⋅y + c)^d
              2: RBF
              3: sigmoid tanh(s x⋅y + c)
              4: user-defined kernel
-d int        Parameter d in the polynomial kernel
-g float      Parameter gamma in the RBF kernel
-s float      Parameter s in the sigmoid/polynomial kernel
-r float      Parameter c in the sigmoid/polynomial kernel
-u string     Parameter of a user-defined kernel
39
Format of data
Each example is represented as one line:
<line>    .=. <target> <feature>:<value> … <feature>:<value> # <info>
<target>  .=. +1 | -1 | 0 | <float>
<feature> .=. <integer>
<value>   .=. <float>
<info>    .=. <string>
Feature/value pairs must be ordered by increasing feature number. Features with value zero can be skipped.
Example:
-1 1:0.43 3:0.12 9284:0.2 # comment
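A minimal parser for this line format (the helper name is ours, not part of SVMlight):

```python
def parse_svmlight_line(line):
    # '<target> <feature>:<value> ... # <info>' -> (target, {feature: value})
    body = line.split('#', 1)[0].strip()   # drop the optional trailing comment
    parts = body.split()
    target = float(parts[0])
    features = {}
    for pair in parts[1:]:
        feat, val = pair.split(':')
        features[int(feat)] = float(val)
    return target, features

target, feats = parse_svmlight_line("-1 1:0.43 3:0.12 9284:0.2 # comment")
assert target == -1.0
assert feats == {1: 0.43, 3: 0.12, 9284: 0.2}
```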
40
Text Chunking
Corpus: http://sejong.knu.ac.kr/~sbpark/Chunk
Example annotation (POS tag / chunk label pairs):
maj B-ADVP | mmd B-NP | ncn I-NP | jxt I-NP | ncn B-NP | jcm I-NP | ncps I-NP | jca I-NP | mag B-ADVP | paa B-VP | ef I-VP | nbn I-VP | paa I-VP | ef I-VP | . sf O

Information   Value
Vocabulary    16,838
Total Words   321,328
Chunk Types   9
POS Tags      52
Sentences     12,092
Phrases       112,658
41
Context
42
Data Format (BNP.data)
1 1:1 16315:1 32630:1 50221:1 66411:1 82496:1 97890:1 114205:1 114258:1 114311:1 114401:1 114447:1 114492:1 114553:1 114576:1 114586:1 114596:1
-1 1:1 16315:1 33906:1 50096:1 66181:1 81575:1 98759:1 114205:1 114258:1 114348:1 114394:1 114439:1 114500:1 114535:1 114576:1 114586:1 114599:1
-1 1:1 17591:1 33781:1 49866:1 65260:1 82444:1 97890:1 114205:1 114295:1 114341:1 114386:1 114447:1 114482:1 114553:1 114576:1 114589:1 114603:1
1 1276:1 17466:1 33551:1 48945:1 66129:1 81575:1 97894:1 114242:1 114288:1 114333:1 114394:1 114429:1 114500:1 114556:1 114579:1 114593:1 114603:1
-1 1276:1 17466:1 33551:1 49814:1 65260:1 81579:1 97890:1 114242:1 114288:1 114333:1 114376:1 114447:1 114503:1 114552:1 114583:1 114593:1 114599:1
-1 1151:1 17236:1 33499:1 48945:1 65264:1 81575:1 98803:1 114235:1 114280:1 114323:1 114394:1 114450:1 114499:1 114533:1 114583:1 114589:1 114603:1
…
43
Running SVMlight
svm_learn BNP.data BNP.model

SVM-light Version V3.50
0 # kernel type
3 # kernel parameter -d
1 # kernel parameter -g
1 # kernel parameter -s
1 # kernel parameter -r
empty # kernel parameter -u
114605 # highest feature index
290465 # number of training documents
13947 # number of support vectors plus 1
0.94731663 # threshold b
-0.05882352941165028270553705169732 456:1 16683:1 33555:1 48945:1 65260:1 81981:1 98703:1 114229:1 114309:1 114324:1 114394:1 114447:1 114480:1 114564:1 114579:1 114593:1 114603:1
-0.05882352941165028270553705169732 1:1 17591:1 33555:1 49634:1 65472:1 82444:1 98054:1 114205:1 114295:1 114324:1 114401:1 114447:1 114482:1 114550:1 114576:1 114589:1 114603:1
…
44
Performance
            Decision Tree   SVM           MBL
Accuracy    97.95±0.24%     98.15±0.20%   97.79±0.29%
F-score     91.36±0.85      92.54±0.72    91.38±1.01
45
Another Example Task
Korean Clause Boundary Detection
Word | POS | Chunk | Output
[Table: the example sentence 기지에서 보이는 위버반도에서 가장 높ㄴ 봉우리를 서울봉이 라 부르ㄴ다 . annotated per morpheme with POS tags (ncn, jca, pvg, etm, nq, mag, paa, jco, jp, ecs, ef, sf), chunk labels (B-NP, I-NP, B-VP, I-VP, B-ADJP, O), and the clause-boundary output sequence S S S X X E S X X X E X X X X E X X E.]
46
Clause Boundary Detection
Two binary classification tasks:
Finding the ending point (S vs. X)
Finding the starting point (E vs. X)
[Diagram: for each word wi in a sentence S: w1, w2, …, wi, …, wn, the pipeline feature set → feature selection → learning → classification is run twice: once to label ending points (S/X) and once to label starting points (E/X).]
47
Features
Dimension of a vector per position (= 4,232):
# of words: 4,171
# of POS tags: 52
# of chunk tags: 9
Trigram model:
wi-1: 1 ~ 4,232
wi: 4,233 ~ 8,464
wi+1: 8,465 ~ 12,696
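The block layout above maps a per-position feature index into the full trigram vector; a small sketch of the arithmetic (the helper name feature_index is ours):

```python
BLOCK = 4171 + 52 + 9   # words + POS tags + chunk tags = 4,232 dims per position

def feature_index(position, local_index):
    # Map a per-position feature index (1..4232) into the trigram vector:
    # position -1 (w_{i-1}) -> dims 1..4232, 0 (w_i) -> 4233..8464, +1 (w_{i+1}) -> 8465..12696.
    assert position in (-1, 0, 1) and 1 <= local_index <= BLOCK
    return (position + 1) * BLOCK + local_index

assert feature_index(-1, 1) == 1
assert feature_index(0, 1) == 4233
assert feature_index(1, BLOCK) == 12696
```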
48
Vector Representation
Consider the same example sentence (Word / POS / Chunk / Output columns) and the window wi-1 wi wi+1 = 는 위버반도 에서:

              wi-1    wi        wi+1
Word          는      위버반도   에서
POS           etm     nq        jca
Chunk         I-VP    B-NP      I-NP
Ending point: E

Feature indices by row (wi-1, wi, wi+1):
Word:   30:1    6302:1   9921:1
POS:    4215:1  8423:1   12664:1
Chunk:  4229:1  8462:1   12692:1
plus 4232:1

Resulting training example:
-1 30:1 4215:1 4229:1 4232:1 6302:1 8423:1 8462:1 9921:1 12664:1 12692:1
49
Execution of SVMlight (1)
50
Execution of SVMlight (2)
51
Third Example: Text Classification
Document as a vector: binary vector
x = ⟨w1, w2, …, w|V|⟩
Commonly-used corpus: Reuters-21578
12,902 Reuters stories, 118 categories
ModApte split: 75% for training (9,603 stories), 25% for test (3,299 stories)
Feature selection: the 300 words with the highest mutual information with each category; |V| = 300
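Building the binary document vector is straightforward; a sketch with a toy four-word vocabulary standing in for the 300 selected words:

```python
def binary_vector(document_tokens, vocabulary):
    # x = <w_1, ..., w_|V|>, where w_j = 1 iff vocabulary word j occurs in the document.
    tokens = set(document_tokens)
    return [1 if word in tokens else 0 for word in vocabulary]

vocab = ["rate", "interest", "oil", "wheat"]   # toy stand-in for the 300 selected words
doc = "the interest rate rose while bond prices fell".split()
assert binary_vector(doc, vocab) == [1, 1, 0, 0]
```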
52
Text Classification Results
53
Interpreting the Weight Vector
Category "interest"
Terms with the highest weight:
Prime: 0.70
Rate: 0.67
Interest: 0.63
Rates: 0.60
Discount: 0.46
Terms with the lowest weight:
Group: -0.24
Year: -0.25
Sees: -0.33
World: -0.35
Dlrs: -0.71