Support Vector Machine
By: Amr Koura
Agenda
Definition.
Kernel Functions.
Optimization Problem.
Soft Margin Hyperplanes.
ν-SVC.
SMO algorithm.
Demo.
Definition
Definition
A supervised learning model, used in machine learning, with associated learning algorithms that analyze data and recognize patterns.
Applications:
- Pattern recognition.
- Classification and regression analysis.
Binary Classifier
Given a set of points $P = \{(x_i, y_i)\}_{i=1}^{m}$ such that $x_i \in \mathbb{R}^n$
and $y_i \in \{-1, +1\}$,
build a model that assigns a new example $x$ to one of the two classes.
Question
What if the examples are not linearly separable?
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html
Kernel Function
Kernel Function
SVM can efficiently perform non-linear classification using the
kernel trick.
The kernel trick maps the input into a high-dimensional feature space where the examples become linearly separable.
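As a minimal illustration (NumPy; the explicit feature map phi and the sample points are assumptions for this example, not from the slides), the degree-2 polynomial kernel $K(x, y) = (x \cdot y)^2$ equals a dot product taken after mapping both points into a higher-dimensional space:

import numpy as np

def phi(v):
    # Explicit degree-2 feature map for 2-D input:
    # (v1, v2) -> (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

lhs = np.dot(x, y) ** 2          # kernel trick: stays in input space
rhs = np.dot(phi(x), phi(y))     # explicit map to feature space
print(lhs, rhs)                  # both print 121.0

The left-hand side never constructs the mapped vectors, which is why the trick stays cheap even when the feature space is very large.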
Kernel Function
https://en.wikipedia.org/wiki/Support_vector_machine
Kernel Function
Linear Kernel.
Polynomial Kernel.
Gaussian RBF Kernel.
Sigmoid Kernel.
Linear Kernel Function
$K(X, Y) = X \cdot Y$:
the dot product between X and Y.
Polynomial Kernel Function
$K(X, Y) = (X \cdot Y + c)^d$,
where d is the degree of the polynomial and c is a free parameter trading off
the influence of higher-order versus lower-order terms in the polynomial.
Gaussian RBF Kernel
$K(X, Y) = \exp\left(-\frac{\|X - Y\|^2}{2\sigma^2}\right)$, where $\|X - Y\|^2$ denotes the squared Euclidean distance.
Other form: $K(X, Y) = \exp(-\gamma \|X - Y\|^2)$ with $\gamma = \frac{1}{2\sigma^2}$.
Sigmoid Kernel Function
$K(X, Y) = \tanh(\gamma \, X \cdot Y + r)$, where $\gamma$ is a scaling factor and r is a shifting parameter.
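A compact sketch of the four kernels in NumPy (the parameter names c, d, gamma, and r follow the slides; the default values are illustrative assumptions):

import numpy as np

def linear(x, y):
    # K(X, Y) = X . Y
    return np.dot(x, y)

def polynomial(x, y, c=1.0, d=3):
    # K(X, Y) = (X . Y + c)^d
    return (np.dot(x, y) + c) ** d

def gaussian_rbf(x, y, gamma=0.5):
    # K(X, Y) = exp(-gamma * ||X - Y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid(x, y, gamma=0.5, r=0.0):
    # K(X, Y) = tanh(gamma * (X . Y) + r)
    return np.tanh(gamma * np.dot(x, y) + r)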
Optimization Problem
Optimization Problem
We need to find the separating hyperplane with the maximum margin.
https://en.wikipedia.org/wiki/Support_vector_machine
Optimization Problem
For the margin hyperplanes $W \cdot X - b = 1$ and $W \cdot X - b = -1$, the distance between the two hyperplanes is $\frac{2}{\|W\|}$.
Goal:
1- Minimize $\|W\|$ (this maximizes the margin).
2- Prevent points from falling into the margin.
Constraints:
$W \cdot x_i - b \geq 1$ for $y_i = 1$, and $W \cdot x_i - b \leq -1$ for $y_i = -1$.
Together:
minimize $\|W\|$, s.t. $y_i (W \cdot x_i - b) \geq 1$ for all $i$.
Optimization Problem
A mathematically convenient form:
minimize $\frac{1}{2} \|W\|^2$, s.t. $y_i (W \cdot x_i - b) \geq 1$.
By introducing Lagrange multipliers $\alpha_i \geq 0$, the problem becomes a
quadratic optimization problem.
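For reference, the standard dual form of this quadratic problem over the multipliers is:

\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j), \quad \text{s.t.} \quad \alpha_i \geq 0, \;\; \sum_{i=1}^{m} \alpha_i y_i = 0.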
Optimization Problem
The solution can be expressed as a linear combination of the training examples:
$W = \sum_i \alpha_i y_i x_i$.
$\alpha_i > 0$ only for the points that are support vectors; for all other points $\alpha_i = 0$.
Optimization problem
The QP is solved iff:
1) The KKT conditions are fulfilled for every example.
2) The matrix $Q$ with $Q_{ij} = y_i y_j \, K(x_i, x_j)$ is positive semi-definite.
The KKT conditions are:
$\alpha_i \geq 0$, $y_i (W \cdot x_i - b) \geq 1$, and $\alpha_i \left[ y_i (W \cdot x_i - b) - 1 \right] = 0$ for all $i$,
i.e. $\alpha_i = 0 \Rightarrow y_i (W \cdot x_i - b) \geq 1$, and $\alpha_i > 0 \Rightarrow y_i (W \cdot x_i - b) = 1$.
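A small numerical check of these conditions (an illustrative NumPy helper, not from the slides; tol absorbs floating-point error):

import numpy as np

def kkt_satisfied(alpha, y, f_x, tol=1e-3):
    """Check the hard-margin KKT conditions for every example.

    alpha : Lagrange multipliers, shape (m,)
    y     : labels in {-1, +1}, shape (m,)
    f_x   : decision values W.x_i - b, shape (m,)
    """
    margin = y * f_x
    feasible = margin >= 1 - tol                        # primal feasibility
    # complementary slackness: alpha_i > 0 only on the margin
    slack_ok = (alpha <= tol) | (np.abs(margin - 1) <= tol)
    return bool(np.all(feasible & slack_ok) and np.all(alpha >= -tol))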
Soft Margin Hyperplanes
Soft Margin Hyperplanes
The soft margin method chooses a hyperplane that splits the examples as cleanly as possible while still maximizing the margin.
Non-negative slack variables $\xi_i \geq 0$ measure the degree of misclassification of each example.
Soft Margin Hyperplanes
Learning with Kernels, by B. Schölkopf and A. Smola
Soft Margin Hyperplanes
The optimization problem:
minimize $\frac{1}{2}\|W\|^2 + C \sum_i \xi_i$, s.t. $y_i (W \cdot x_i - b) \geq 1 - \xi_i$, $\xi_i \geq 0$.
Using Lagrange multipliers, the dual becomes:
maximize $\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$,
s.t. $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$.
C is essentially a regularization parameter, which controls the trade-off between achieving a low error on the training data and minimizing the norm of the weights.
After the optimizer computes $\alpha$, W can be computed as $W = \sum_i \alpha_i y_i x_i$.
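A short NumPy sketch (illustrative; the helper name is mine) of recovering W and b once the optimizer has returned the multipliers:

import numpy as np

def recover_w_b(alpha, y, X, C, eps=1e-8):
    # W is a linear combination of the training examples
    w = (alpha * y) @ X
    # b from any "free" support vector (0 < alpha_i < C), where
    # y_i (w . x_i - b) = 1 holds exactly; assumes one such point exists
    free = (alpha > eps) & (alpha < C - eps)
    i = np.flatnonzero(free)[0]
    b = X[i] @ w - y[i]
    return w, b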
ν-SVC
ν-SVC
In the previous formulation, the variable C was the trade-off between (1) minimizing training errors and (2) maximizing the margin.
ν-SVC replaces C by a parameter $\nu \in (0, 1]$ that controls the number of margin errors and support vectors.
ν is an upper bound on the fraction of margin errors (and hence on the training error rate) and a lower bound on the fraction of support vectors.
ν-SVC
The optimization problem becomes:
minimize $\frac{1}{2}\|W\|^2 - \nu\rho + \frac{1}{m} \sum_i \xi_i$, s.t.
$y_i (W \cdot x_i - b) \geq \rho - \xi_i$, $\xi_i \geq 0$, and $\rho \geq 0$.
ν-SVC
Using Lagrange multipliers:
maximize $-\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$,
s.t. $0 \leq \alpha_i \leq \frac{1}{m}$, $\sum_i \alpha_i y_i = 0$, and $\sum_i \alpha_i \geq \nu$,
and the decision function is $f(X) = \operatorname{sgn}\left(\sum_i \alpha_i y_i K(x_i, X) - b\right)$.
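For a usage sketch, scikit-learn exposes this formulation as sklearn.svm.NuSVC; the toy XOR data here is an illustrative assumption:

import numpy as np
from sklearn.svm import NuSVC

X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
y = np.array([-1, -1, 1, 1])

clf = NuSVC(nu=0.5, kernel='rbf', gamma=0.5)  # nu plays the role of the ν parameter
clf.fit(X, y)
print(clf.predict([[0.9, 0.2]]))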
SMO Algorithm
SMO Algorithm
The Sequential Minimal Optimization algorithm is used to solve the quadratic programming problem.
Algorithm:
1- Select a pair of examples (details are coming).
2- Optimize the target function with respect to the selected pair
analytically.
3- Repeat until the pairs selected in step 1 are already optimal or the number of
iterations exceeds a user-defined limit (a simplified sketch follows this list).
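A heavily simplified SMO loop in NumPy (an illustrative sketch that follows the three steps above; Platt's full pair-selection heuristics are replaced by a random second index, and a linear kernel is assumed):

import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Toy SMO; this sketch uses the convention f(x) = w.x + b."""
    m = X.shape[0]
    K = X @ X.T                                # precomputed linear kernel matrix
    alpha, b = np.zeros(m), 0.0
    rng = np.random.default_rng(0)
    f = lambda i: (alpha * y) @ K[:, i] + b    # decision value of example i

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            E_i = f(i) - y[i]
            # step 1: pick i only if it violates the KKT conditions
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = int(rng.integers(m - 1)); j += (j >= i)   # random j != i
            E_j = f(j) - y[j]
            ai_old, aj_old = alpha[i], alpha[j]
            # box bounds L, H on alpha[j]
            if y[i] != y[j]:
                L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]
            if L == H or eta >= 0:
                continue
            # step 2: analytic update of the selected pair
            alpha[j] = np.clip(aj_old - y[j] * (E_i - E_j) / eta, L, H)
            if abs(alpha[j] - aj_old) < 1e-5:
                continue
            alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
            # keep the threshold b consistent with the new alphas
            b1 = b - E_i - y[i] * (alpha[i] - ai_old) * K[i, i] \
                         - y[j] * (alpha[j] - aj_old) * K[i, j]
            b2 = b - E_j - y[i] * (alpha[i] - ai_old) * K[i, j] \
                         - y[j] * (alpha[j] - aj_old) * K[j, j]
            b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
            changed += 1
        # step 3: stop after max_passes sweeps with no updates
        passes = passes + 1 if changed == 0 else 0
    return alpha, b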
SMO Algorithm
2- Optimize the target function with respect to the selected pair
analytically:
the update to the values of $\alpha_i$ and $\alpha_j$ depends on the difference between the
approximation errors of example $i$ and example $j$.
Solve for two Lagrange multipliers
http://research.microsoft.com/pubs/68391/smo-book.pdf
Solve for two Lagrange multipliers
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
Solve for two Lagrange multipliers
// Two-variable analytic update (case y_i != y_j); Kii, Kjj, Kij are
// kernel entries, G is the gradient of the dual objective, and the
// regions refer to the corners of the box-constraint diagram above.
double X = Kii + Kjj + 2*Kij;
double delta = (-G[i] - G[j]) / X;
double diff = alpha[i] - alpha[j];
alpha[i] += delta;
alpha[j] += delta;
if (region I):   alpha[i] = C_i; alpha[j] = C_i - diff;
if (region II):  alpha[j] = C_j; alpha[i] = C_j + diff;
if (region III): alpha[j] = 0;   alpha[i] = diff;
if (region IV):  alpha[i] = 0;   alpha[j] = -diff;
SMO Algorithm
1- Select a pair of examples:
we need to find the pair (i, j) for which the difference between the
classification errors is maximal.
A pair is optimal if the difference between the classification errors
is less than a user-defined tolerance $\epsilon$.
SMO Algorithm
1- Select a pair of examples (continued):
Define the following quantities over the feasible index sets $I_{up}$ and $I_{low}$:
$m(\alpha) = \max_{i \in I_{up}} -y_i \nabla f(\alpha)_i$ (max difference) and $M(\alpha) = \min_{j \in I_{low}} -y_j \nabla f(\alpha)_j$ (min difference); the selected pair is optimal when $m(\alpha) - M(\alpha) < \epsilon$.
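A NumPy sketch of this maximal-violating-pair selection (illustrative; grad holds the gradient of the dual objective, and the index sets follow the LIBSVM paper cited above):

import numpy as np

def select_pair(alpha, y, grad, C, eps=1e-3):
    """Return the maximal violating pair (i, j), or None if optimal."""
    up = ((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    low = ((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    viol = -y * grad
    i = np.flatnonzero(up)[np.argmax(viol[up])]     # max difference m(alpha)
    j = np.flatnonzero(low)[np.argmin(viol[low])]   # min difference M(alpha)
    if viol[i] - viol[j] < eps:
        return None                                  # optimal within eps
    return i, j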
SMO algorithm complexity
Memory complexity: no additional kernel matrix is required to solve the problem; only a 2x2 matrix is needed in each iteration.
Memory complexity is therefore linear in the training set size.
SMO's runtime scales between linear and quadratic in the size of the training set.