SVM / ν-SVC




Support Vector Machine

By: Amr Koura

Agenda

Definition.

Kernel Functions.

Optimization Problem.

Soft Margin Hyperplanes.

ν-SVC.

SMO algorithm.

Demo.




Definition

Definition

A supervised learning model with associated learning algorithms that analyze data and recognize patterns.

Applications:
- Machine learning.
- Pattern recognition.
- Classification and regression analysis.

Binary Classifier

Given a set of points $P = \{(x_i, y_i)\}_{i=1}^{m}$ such that $x_i \in \mathbb{R}^n$ and $y_i \in \{-1, +1\}$,
build a model that assigns a new example $x$ to one of the two classes $y \in \{-1, +1\}$.
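In the linear case the learned model is commonly written as a hyperplane decision rule (a standard formulation; the W, b notation matches the later slides):

$f(x) = \operatorname{sgn}(W \cdot x + b)$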

Question

What if the examples are not linearly separable?

(Figure: example of a data set that is not linearly separable.)

http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html




Kernel Function

Kernel Function

SVM can efficiently perform non-linear classification using the kernel trick.


The kernel trick implicitly maps the input into a high-dimensional feature space in which the examples become linearly separable.
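As an illustrative sketch (not from the original slides): for 2-D data separable only by a circle, the hypothetical explicit map phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2) makes the data linearly separable in 3-D, and the degree-2 polynomial kernel computes the same inner product without ever constructing phi:

#include <array>
#include <cmath>
#include <iostream>

// Hypothetical explicit feature map for the degree-2 polynomial kernel (c = 0).
std::array<double, 3> phi(const std::array<double, 2>& x) {
    return { x[0]*x[0], std::sqrt(2.0)*x[0]*x[1], x[1]*x[1] };
}

double dot3(const std::array<double, 3>& a, const std::array<double, 3>& b) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

int main() {
    std::array<double, 2> x = {1.0, 2.0}, y = {3.0, -1.0};
    // Inner product computed in the mapped 3-D space ...
    double explicit_ip = dot3(phi(x), phi(y));
    // ... equals the kernel evaluated directly in the 2-D input space.
    double xy = x[0]*y[0] + x[1]*y[1];
    double kernel_ip = xy * xy;                        // K(x, y) = (x . y)^2
    std::cout << explicit_ip << " == " << kernel_ip << "\n";   // both print 1
}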



Kernel Function

(Figure: illustration of the kernel trick, input space mapped to a feature space.)

https://en.wikipedia.org/wiki/Support_vector_machine

Kernel Function

Linear Kernel.

Polynomial Kernel.

Gaussian RBF Kernel.

Sigmoid Kernel.

Linear Kernel Function

$K(X, Y) = X^\top Y$

The dot product between X and Y.

Polynomial Kernel Function

$K(X, Y) = (X^\top Y + c)^d$

Where d is the degree of the polynomial and c is a free parameter that trades off between the influence of higher-order and lower-order terms in the polynomial.

Gaussian RBF Kernel

$K(X, Y) = \exp\!\left(-\dfrac{\|X - Y\|^2}{2\sigma^2}\right)$

Where $\|X - Y\|^2$ denotes the squared Euclidean distance between the two feature vectors.

Other form: $K(X, Y) = \exp\!\left(-\gamma \|X - Y\|^2\right)$, with $\gamma = \dfrac{1}{2\sigma^2}$.

Sigmoid Kernel Function

$K(X, Y) = \tanh(\gamma\, X^\top Y + r)$

Where $\gamma$ is a scaling factor and r is a shifting parameter.
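A minimal sketch (not part of the original slides) of the four kernels listed above, assuming dense std::vector inputs; the parameter names c, degree, gamma and r follow the notation of the preceding slides:

#include <cmath>
#include <vector>

// Dot product <x, y>, used by all four kernels.
double dot(const std::vector<double>& x, const std::vector<double>& y) {
    double s = 0.0;
    for (size_t k = 0; k < x.size(); ++k) s += x[k] * y[k];
    return s;
}

// Linear kernel: K(x, y) = <x, y>
double linear_kernel(const std::vector<double>& x, const std::vector<double>& y) {
    return dot(x, y);
}

// Polynomial kernel: K(x, y) = (<x, y> + c)^d
double poly_kernel(const std::vector<double>& x, const std::vector<double>& y,
                   double c, int degree) {
    return std::pow(dot(x, y) + c, degree);
}

// Gaussian RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2)
double rbf_kernel(const std::vector<double>& x, const std::vector<double>& y,
                  double gamma) {
    double d2 = 0.0;
    for (size_t k = 0; k < x.size(); ++k) {
        double diff = x[k] - y[k];
        d2 += diff * diff;
    }
    return std::exp(-gamma * d2);
}

// Sigmoid kernel: K(x, y) = tanh(gamma * <x, y> + r)
double sigmoid_kernel(const std::vector<double>& x, const std::vector<double>& y,
                      double gamma, double r) {
    return std::tanh(gamma * dot(x, y) + r);
}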




Optimization Problem

Optimization Problem

We need to find the separating hyperplane with the maximum margin.

(Figure: maximum-margin hyperplane and margins for an SVM trained on two classes.)

https://en.wikipedia.org/wiki/Support_vector_machine

Optimization Problem

Distance between the two margin hyperplanes $= \dfrac{2}{\|W\|}$.

Goal:
1- Minimize $\|W\|$.
2- Prevent points from falling into the margin.

Constraints:
$W \cdot x_i + b \geq 1$ for $y_i = +1$, and $W \cdot x_i + b \leq -1$ for $y_i = -1$.

Together:
Minimize $\|W\|$, s.t. $y_i (W \cdot x_i + b) \geq 1$ for all $i = 1, \dots, m$.

Optimization Problem

Mathematically more convenient:
Minimize $\dfrac{1}{2}\|W\|^2$, s.t. $y_i (W \cdot x_i + b) \geq 1$.

By introducing Lagrange multipliers $\alpha_i \geq 0$, the problem becomes

$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i^\top x_j$, s.t. $\alpha_i \geq 0$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$,

a quadratic optimization problem.
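For completeness (a standard derivation, not spelled out on the slide), the dual above comes from the Lagrangian of the primal problem:

$$L(W, b, \alpha) = \tfrac{1}{2}\|W\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i (W \cdot x_i + b) - 1 \right],
\qquad
\frac{\partial L}{\partial W} = 0 \;\Rightarrow\; W = \sum_{i=1}^{m} \alpha_i y_i x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{m} \alpha_i y_i = 0.$$

Substituting these back into $L$ eliminates $W$ and $b$ and yields the quadratic objective in $\alpha$ shown above.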

Optimization Problem

The solution can be expressed as a linear combination of the training examples:
$W = \sum_{i=1}^{m} \alpha_i y_i x_i$.

$\alpha_i > 0$ only for the points that lie on the margin; these points are the support vectors.

Optimization problem

The QP is solved iff:
1) The KKT conditions are fulfilled for every example.
2) $Q$, with $Q_{ij} = y_i y_j K(x_i, x_j)$, is positive semi-definite.

The KKT conditions are:
$\alpha_i \geq 0$, $\quad y_i (W \cdot x_i + b) - 1 \geq 0$, $\quad \alpha_i \left[ y_i (W \cdot x_i + b) - 1 \right] = 0$.
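As an illustration only (variable names are assumptions), a sketch of checking these conditions within a numerical tolerance; this is the test an iterative solver such as SMO repeats for every example:

#include <cmath>
#include <vector>

// Returns true if alpha satisfies the (hard-margin) KKT conditions to tolerance tol.
// fx[i] is the decision value W.x_i + b for example i, y[i] is its label (+1 or -1).
bool kkt_satisfied(const std::vector<double>& alpha,
                   const std::vector<double>& y,
                   const std::vector<double>& fx,
                   double tol = 1e-3) {
    for (size_t i = 0; i < alpha.size(); ++i) {
        double margin = y[i] * fx[i] - 1.0;                     // y_i (W.x_i + b) - 1
        if (alpha[i] < -tol) return false;                      // alpha_i >= 0
        if (margin < -tol) return false;                        // primal feasibility
        if (std::fabs(alpha[i] * margin) > tol) return false;   // complementary slackness
    }
    return true;
}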




Soft Margin Hyperplanes

Soft Margin Hyperplanes

The soft margin formulation chooses a hyperplane that splits the examples as cleanly as possible while still maximizing the margin.

Non-negative slack variables $\xi_i \geq 0$ measure the degree of misclassification of each example $x_i$.
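Equivalently (a standard identity, not stated explicitly on the slide), at the optimum each slack variable equals the hinge loss of its example:

$$\xi_i = \max\bigl(0,\; 1 - y_i (W \cdot x_i + b)\bigr)$$

so $\xi_i = 0$ for correctly classified points outside the margin, $0 < \xi_i \leq 1$ for points inside the margin, and $\xi_i > 1$ for misclassified points.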

Soft Margin Hyperplanes

(Figure: soft margin hyperplane with slack variables $\xi_i$.)

Learning with Kernels, by B. Schölkopf and A. Smola.

Soft Margin Hyperplanes

The optimization problem:

Minimize $\dfrac{1}{2}\|W\|^2 + C \sum_{i=1}^{m} \xi_i$, s.t. $y_i (W \cdot x_i + b) \geq 1 - \xi_i$, $\xi_i \geq 0$.

Using Lagrange multipliers:

$\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \dfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$

s.t. $0 \leq \alpha_i \leq C$, $\sum_{i=1}^{m} \alpha_i y_i = 0$.

C is essentially a regularisation parameter, which controls the trade-off between achieving a low error on the training data and minimising the norm of the weights.

After the optimizer computes the $\alpha_i$, W can be computed as $W = \sum_{i=1}^{m} \alpha_i y_i x_i$.
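A minimal sketch (not from the original slides) of evaluating the resulting decision function $f(x) = \operatorname{sgn}\bigl(\sum_i \alpha_i y_i K(x_i, x) + b\bigr)$ from a dual solution; the variable names are assumptions and the RBF kernel from the earlier sketch is reused:

#include <cmath>
#include <vector>

// Gaussian RBF kernel, as sketched earlier.
double rbf_kernel(const std::vector<double>& a, const std::vector<double>& b, double gamma) {
    double d2 = 0.0;
    for (size_t k = 0; k < a.size(); ++k) { double d = a[k] - b[k]; d2 += d * d; }
    return std::exp(-gamma * d2);
}

// Decision value f(x) = sum_i alpha_i * y_i * K(x_i, x) + b for a new example x.
double decision_value(const std::vector<std::vector<double>>& sv,   // training points x_i
                      const std::vector<double>& y,                 // labels y_i in {-1, +1}
                      const std::vector<double>& alpha,             // dual solution
                      double b, double gamma,
                      const std::vector<double>& x) {
    double f = b;
    for (size_t i = 0; i < sv.size(); ++i)
        if (alpha[i] > 0.0)                                         // only support vectors contribute
            f += alpha[i] * y[i] * rbf_kernel(sv[i], x, gamma);
    return f;                                                       // predicted label is the sign of f
}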




ν-SVC

ν-SVC

In the previous formulation, the variable C controlled the trade-off between (1) minimizing training errors and (2) maximizing the margin.

Replace C with a parameter ν that controls the number of margin errors and support vectors.

ν is an upper bound on the fraction of margin errors (and hence on the training error rate) and a lower bound on the fraction of support vectors.

ν-SVC

The optimization problem becomes:

Minimize $\dfrac{1}{2}\|W\|^2 - \nu\rho + \dfrac{1}{m}\sum_{i=1}^{m}\xi_i$, s.t.

$y_i (W \cdot x_i + b) \geq \rho - \xi_i$, $\xi_i \geq 0$, and $\rho \geq 0$.

ν-SVC

Using Lagrange multipliers:

$\max_{\alpha} \; -\dfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j y_i y_j K(x_i, x_j)$

s.t.

$0 \leq \alpha_i \leq \dfrac{1}{m}$, $\sum_{i=1}^{m}\alpha_i y_i = 0$, and $\sum_{i=1}^{m}\alpha_i \geq \nu$,

and the decision function is $f(X) = \operatorname{sgn}\!\left(\sum_{i=1}^{m}\alpha_i y_i K(x_i, X) + b\right)$.
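As an illustration only (variable names are assumptions), a sketch that checks the ν-property on a trained ν-SVC: the fraction of margin errors ($\xi_i > 0$) is at most ν and the fraction of support vectors ($\alpha_i > 0$) is at least ν:

#include <cstdio>
#include <vector>

// Given the dual solution alpha and the slacks xi of a trained nu-SVC,
// report the two fractions that nu bounds.
void report_nu_bounds(const std::vector<double>& alpha,
                      const std::vector<double>& xi,
                      double nu) {
    int margin_errors = 0, support_vectors = 0;
    const int m = static_cast<int>(alpha.size());
    for (int i = 0; i < m; ++i) {
        if (xi[i] > 0.0)    ++margin_errors;     // point inside the margin or misclassified
        if (alpha[i] > 0.0) ++support_vectors;   // point contributes to the solution
    }
    std::printf("margin errors:   %.3f <= nu = %.3f\n", (double)margin_errors / m, nu);
    std::printf("support vectors: %.3f >= nu = %.3f\n", (double)support_vectors / m, nu);
}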




SMO Algorithm

SMO Algorithm

The Sequential Minimal Optimization (SMO) algorithm is used to solve the quadratic programming problem.

Algorithm:

1- Select a pair of examples (details follow).
2- Optimize the target function with respect to the selected pair analytically.
3- Repeat until the pair selected in step 1 is already optimal or the number of iterations exceeds a user-defined limit. (A sketch of this loop follows.)
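A minimal sketch of that loop (an illustration only; select_pair and optimize_pair are hypothetical helpers standing for the two steps described on the following slides):

#include <utility>
#include <vector>

// Hypothetical helpers: working-set selection and the analytic two-variable update.
std::pair<int, int> select_pair(const std::vector<double>& alpha);   // returns {-1, -1} if optimal
void optimize_pair(std::vector<double>& alpha, int i, int j);

// Outline of the SMO loop: pick a violating pair, solve the 2-variable
// sub-problem analytically, repeat until optimal or out of iterations.
void smo(std::vector<double>& alpha, int max_iterations) {
    for (int it = 0; it < max_iterations; ++it) {
        auto [i, j] = select_pair(alpha);
        if (i < 0) break;            // no violating pair left: KKT conditions hold
        optimize_pair(alpha, i, j);
    }
}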

SMO Algorithm

2- Optimize the target function with respect to the selected pair analytically.

- The updates to the values of $\alpha_i$ and $\alpha_j$ depend on the difference between the approximation errors at $x_i$ and $x_j$.

Solve for two Lagrange multipliers

http://research.microsoft.com/pubs/68391/smo-book.pdf

Solve for two Lagrange multipliers

http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf

Solve for two Lagrange multipliers

// Analytic update of the selected pair (alpha[i], alpha[j]).
// Kii, Kjj, Kij: kernel/Hessian entries; G[k]: gradient of the dual objective.
double X = Kii + Kjj + 2*Kij;                 // curvature along the update direction
double delta = (-G[i] - G[j]) / X;            // unconstrained optimal step
double diff = alpha[i] - alpha[j];            // preserved by the joint update
alpha[i] += delta;
alpha[j] += delta;
// Clip the pair back into the feasible box, depending on which region it landed in:
if (alpha[i] > C_i)      { alpha[i] = C_i; alpha[j] = C_i - diff; }   // region I
else if (alpha[j] > C_j) { alpha[j] = C_j; alpha[i] = C_j + diff; }   // region II
else if (alpha[j] < 0)   { alpha[j] = 0;   alpha[i] = diff; }         // region III
else if (alpha[i] < 0)   { alpha[i] = 0;   alpha[j] = -diff; }        // region IV

SMO Algorithm

1- Select a pair of examples:

We need to find the pair (i, j) for which the difference between the classification errors is maximal.

The pair is already optimal if the difference between the classification errors is less than a user-defined tolerance $\epsilon$.

SMO Algorithm

1- Select a pair of examples (continued):

Define the following variables over the gradient $\nabla f(\alpha)$ of the dual objective:

$m(\alpha) = \max_{i \in I_{up}} \{-y_i \nabla f(\alpha)_i\}$ (max difference)

$M(\alpha) = \min_{j \in I_{low}} \{-y_j \nabla f(\alpha)_j\}$ (min difference)
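A sketch of this selection under those definitions (an illustration only; it scans the gradient G and picks the maximal violating pair, returning {-1, -1} when m(alpha) - M(alpha) <= eps):

#include <utility>
#include <vector>

// y[k] in {-1, +1}; G[k] is the gradient of the dual objective at alpha[k];
// in_I_up / in_I_low encode whether k may move up / down without leaving its box.
std::pair<int, int> select_pair(const std::vector<double>& y,
                                const std::vector<double>& G,
                                const std::vector<bool>& in_I_up,
                                const std::vector<bool>& in_I_low,
                                double eps) {
    int i = -1, j = -1;
    double m_alpha = -1e300, M_alpha = 1e300;
    for (size_t k = 0; k < G.size(); ++k) {
        double v = -y[k] * G[k];
        if (in_I_up[k]  && v > m_alpha) { m_alpha = v; i = (int)k; }   // max difference
        if (in_I_low[k] && v < M_alpha) { M_alpha = v; j = (int)k; }   // min difference
    }
    if (m_alpha - M_alpha <= eps) return { -1, -1 };                   // already optimal
    return { i, j };
}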

SMO algorithm complexity

Memory complexity: no additional matrix is required to solve the problem; only a 2x2 matrix is needed in each iteration.

Memory complexity is therefore linear in the size of the training set.

The running time of the SMO algorithm scales between linear and quadratic in the size of the training set.