Support Vector Machine
Page 1: Support Vector Machine
Le Do Hoang Nam – CNTN08
Page 2: Linear Programming
General form, with $x \in \mathbb{R}^n$:
linear objective, linear constraints, …
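The general form itself is only in the slide image; the standard statement is:

$$\min_{x \in \mathbb{R}^n} \; c^T x \quad \text{subject to} \quad A x \le b, \qquad x \ge 0.$$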
Page 3: Linear Programming
An example: the Diet Problem.
How do we come up with the cheapest meal that meets all nutrition standards?
Page 4: Linear Programming
Let $x_1$, $x_2$, and $x_3$ be the amounts, in kilos, of carrot, cabbage, and cucumber in the dish.
Mathematically:
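The concrete numbers are lost in the image, so the symbols below are placeholders: with prices $p_j$ per kilo, nutrient contents $a_{ij}$ (units of nutrient $i$ per kilo of vegetable $j$), and daily minimums $b_i$, the model is:

$$\min_{x \ge 0} \; \sum_{j=1}^{3} p_j x_j \quad \text{subject to} \quad \sum_{j=1}^{3} a_{ij} x_j \ge b_i \quad \text{for every nutrient } i.$$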
Page 5: Linear Programming
In canonical form:
How to solve it? The simplex method, Newton's method, or gradient descent.
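As a minimal sketch of solving such an LP (all prices and nutrient values here are invented, since the slide's numbers are not in the transcript), `scipy.optimize.linprog` can be used:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data: the slide's real numbers are not recoverable.
cost = np.array([0.75, 0.50, 0.45])    # price per kilo: carrot, cabbage, cucumber

# Rows: nutrients; columns: vegetables. We need A_min @ x >= b_min.
A_min = np.array([[35.0,  0.5,  0.5],
                  [60.0, 20.0, 10.0],
                  [30.0, 25.0,  5.0]])
b_min = np.array([10.0, 15.0, 12.0])   # daily minimums

# linprog minimizes cost @ x subject to A_ub @ x <= b_ub,
# so the ">= minimums" constraints are encoded by negating both sides.
res = linprog(c=cost, A_ub=-A_min, b_ub=-b_min,
              bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)                  # amounts in kilos, total cost
```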
Page 6: LP and Classification
Given a set of $N$ samples $(m_i, l_i)$, where $m_i$ is the feature vector and $l_i \in \{-1, +1\}$ is the label.
If a sample is correctly classified by the hyperplane $w^T x + c = 0$ (a linear function of $x$), then:

$$l_i (w^T m_i + c) \ge 1$$
Page 7: LP and Classification
$(w, c)$ is a good classifier if it satisfies:

$$l_i (w^T m_i + c) \ge 1, \qquad i = 1, \ldots, N,$$

which are linear constraints.
In LP form:
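The program itself appears only in the slide image; with no objective it is a pure feasibility problem, presumably:

$$\text{find } (w, c) \quad \text{such that} \quad l_i (w^T m_i + c) \ge 1, \qquad i = 1, \ldots, N.$$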
Page 8: LP and Classification
Without an objective function, ALL feasible solutions are equally acceptable:
[Figure: two panels, each showing Class 1 and Class 2 separated by a different feasible hyperplane]
Page 9: LP and Classification
If the data is not linearly separable, minimize the number of errors:
[Figure: overlapping Class 1 and Class 2 samples that no single hyperplane separates perfectly]
Page 10: LP and Classification
Our objective becomes:
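The objective itself is only in the slide image; it is presumably the number of misclassified samples:

$$\min_{w,\,c} \; \#\{\, i : l_i (w^T m_i + c) < 1 \,\}$$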
But the cardinality (counting) function is non-linear, so this is not an LP.
Page 11: LP and Classification
The cardinality function:

[Figure: plot of $f(x)$, a step function equal to $1$ for $x < 1$ and $0$ for $x \ge 1$, with axes $x$ and $f(x)$]
Solution: approximate it with the hinge-loss function.
Page 12: LP and Classification
The hinge-loss function, $f(x) = \max(0,\, 1 - x)$:

[Figure: plot of the hinge loss, decreasing linearly to $0$ at $x = 1$ and flat afterwards]
Or, equivalently:
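The equivalent form is only in the image; the standard slack-variable encoding of the hinge loss, matching the $\varepsilon_i$ in the geometry figure below, is:

$$\varepsilon_i \ge 1 - l_i (w^T m_i + c), \qquad \varepsilon_i \ge 0,$$

with $\varepsilon_i$ pushed down by the objective so that $\varepsilon_i = \max(0,\, 1 - l_i (w^T m_i + c))$ at the optimum.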
Page 13: LP and Classification
The classification problem now becomes the following, which can be solved as an LP:
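The program is only in the slide image; assembling the pieces above, it is presumably:

$$\min_{w,\,c,\,\varepsilon} \; \sum_{i=1}^{N} \varepsilon_i \quad \text{subject to} \quad l_i (w^T m_i + c) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0, \qquad i = 1, \ldots, N.$$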
Page 14: LP and Classification
Geometric view:

[Figure: the parallel hyperplanes $w^T x + c = 1$, $w^T x + c = 0$, and $w^T x + c = -1$; samples $m_i$ (Class 1) and $m_j$ (Class 2) violate the margin with slacks $\varepsilon_i$ and $\varepsilon_j$]
Page 15: LP and Classification
Another problem: some samples are uncertain (they lie very close to the boundary).
[Figure: Class 1 and Class 2 separated by a hyperplane that passes close to several samples]
Page 16: LP and Classification
Solution: maximize the margin $d$.
[Figure: Class 1 and Class 2 separated by a hyperplane with a margin of width $d$]
Page 17: LP and Classification
All samples are outside the margin:
every sample's distance to the boundary is at least $d/2$. That means:
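The condition is only in the image; using the point-to-hyperplane distance it presumably reads:

$$\frac{l_i (w^T m_i + c)}{\lVert w \rVert} \ge \frac{d}{2}, \qquad i = 1, \ldots, N.$$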
Page 18: LP and Classification
Because the hyperplane is invariant to rescaling of $(w, c)$ (homogeneous), we can choose $w$ such that:
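The normalization itself is only in the image; the standard choice is

$$\min_{i} \; l_i (w^T m_i + c) = 1,$$

which fixes the margin width at $d = 2 / \lVert w \rVert$.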
The objective function:
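The formula is only in the image; maximizing $d$ presumably becomes:

$$\max_{w,\,c} \; \frac{2}{\lVert w \rVert} \quad \Longleftrightarrow \quad \min_{w,\,c} \; \lVert w \rVert.$$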
Page 19: LP and Classification
The problem now becomes:
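The program is only in the image; in the standard form (usually stated with the squared norm, making it a quadratic program) it is:

$$\min_{w,\,c} \; \frac{1}{2} \lVert w \rVert^2 \quad \text{subject to} \quad l_i (w^T m_i + c) \ge 1, \qquad i = 1, \ldots, N.$$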
Page 20: Support Vector Machine
Together with the error minimization, we have the SVM:
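The formulation is only in the image; the standard soft-margin SVM combining margin maximization with the hinge-loss errors is presumably:

$$\min_{w,\,c,\,\varepsilon} \; \frac{1}{2} \lVert w \rVert^2 + \lambda \sum_{i=1}^{N} \varepsilon_i \quad \text{subject to} \quad l_i (w^T m_i + c) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0.$$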
$\lambda$ sets the trade-off between training error and robustness (margin width).
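As a minimal sketch of this trade-off in practice, scikit-learn's `LinearSVC` solves a problem of this form; its `C` parameter plays the role of $\lambda$ above (the toy data here is made up):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy 2-D data with labels in {-1, +1}, as in the slides.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
              [3.0, 2.5], [4.0, 3.0], [3.5, 3.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# C trades off training error against margin width (the slides' lambda).
clf = LinearSVC(C=1.0, loss="hinge")
clf.fit(X, y)
print(clf.coef_, clf.intercept_)  # the learned (w, c)
```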
Page 21: Kernel Method