Support Vector Machine
By: Amr Koura
Agenda
Definition.
Kernel Functions.
Optimization Problem.
Soft Margin Hyperplanes.
ν-SVC.
SMO algorithm.
Demo.
Definition
Definition
A supervised learning model, used in machine learning, with associated learning algorithms that analyze data and recognize patterns.
Applications:
- Pattern recognition.
- Classification and regression analysis.
Binary Classifier
Given a set of points $P = \{(x_i, y_i)\}_{i=1}^{m}$ such that $x_i \in \mathbb{R}^n$
and $y_i \in \{-1, +1\}$,
build a model that assigns a new example $x$ to one of the two classes.
Question
What if the examples are not linearly separable?
http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=MachineLearning&doc=exercises/ex8/ex8.html
Kernel Function
Kernel Function
SVM can efficiently perform non-linear classification using the
kernel trick.
The kernel trick maps the input into a high-dimensional feature space where the examples become linearly separable.
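As a minimal illustration (NumPy; the explicit feature map phi and the sample points are assumptions for this example, not from the slides), the degree-2 polynomial kernel $K(x, y) = (x \cdot y)^2$ equals a dot product taken after mapping both points into a higher-dimensional space:

import numpy as np

def phi(v):
    # Explicit degree-2 feature map for 2-D input:
    # (v1, v2) -> (v1^2, sqrt(2)*v1*v2, v2^2)
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

lhs = np.dot(x, y) ** 2          # kernel trick: stays in input space
rhs = np.dot(phi(x), phi(y))     # explicit map to feature space
print(lhs, rhs)                  # both print 121.0

The left-hand side never constructs the mapped vectors, which is why the trick stays cheap even when the feature space is very large.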
Kernel Function
https://en.wikipedia.org/wiki/Support_vector_machine
Kernel Function
Linear Kernel.
Polynomial Kernel.
Gaussian RBF Kernel.
Sigmoid Kernel.
Linear Kernel Function
$K(X, Y) = X \cdot Y$:
the dot product between X and Y.
Polynomial Kernel Function
$K(X, Y) = (X \cdot Y + c)^d$,
where d is the degree of the polynomial and c is a free parameter trading off
the influence of higher-order versus lower-order terms in the polynomial.
Gaussian RBF Kernel
$K(X, Y) = \exp\left(-\frac{\|X - Y\|^2}{2\sigma^2}\right)$, where $\|X - Y\|^2$ denotes the squared Euclidean distance.
Other form: $K(X, Y) = \exp(-\gamma \|X - Y\|^2)$ with $\gamma = \frac{1}{2\sigma^2}$.
Sigmoid Kernel Function
$K(X, Y) = \tanh(\gamma \, X \cdot Y + r)$, where $\gamma$ is a scaling factor and r is a shifting parameter.
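A compact sketch of the four kernels in NumPy (the parameter names c, d, gamma, and r follow the slides; the default values are illustrative assumptions):

import numpy as np

def linear(x, y):
    # K(X, Y) = X . Y
    return np.dot(x, y)

def polynomial(x, y, c=1.0, d=3):
    # K(X, Y) = (X . Y + c)^d
    return (np.dot(x, y) + c) ** d

def gaussian_rbf(x, y, gamma=0.5):
    # K(X, Y) = exp(-gamma * ||X - Y||^2)
    return np.exp(-gamma * np.sum((x - y) ** 2))

def sigmoid(x, y, gamma=0.5, r=0.0):
    # K(X, Y) = tanh(gamma * (X . Y) + r)
    return np.tanh(gamma * np.dot(x, y) + r)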
Optimization Problem
Optimization Problem
We need to find the separating hyperplane with the maximum margin.
https://en.wikipedia.org/wiki/Support_vector_machine
Optimization Problem
For the margin hyperplanes $W \cdot X - b = 1$ and $W \cdot X - b = -1$, the distance between the two hyperplanes is $\frac{2}{\|W\|}$.
Goal:
1- Minimize $\|W\|$ (this maximizes the margin).
2- Prevent points from falling into the margin.
Constraints:
$W \cdot x_i - b \geq 1$ for $y_i = 1$, and $W \cdot x_i - b \leq -1$ for $y_i = -1$.
Together:
minimize $\|W\|$, s.t. $y_i (W \cdot x_i - b) \geq 1$ for all $i$.
Optimization Problem
A mathematically convenient form:
minimize $\frac{1}{2} \|W\|^2$, s.t. $y_i (W \cdot x_i - b) \geq 1$.
By introducing Lagrange multipliers $\alpha_i \geq 0$, the problem becomes a
quadratic optimization problem.
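For reference, the standard dual form of this quadratic problem over the multipliers is:

\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, (x_i \cdot x_j), \quad \text{s.t.} \quad \alpha_i \geq 0, \;\; \sum_{i=1}^{m} \alpha_i y_i = 0.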
Optimization Problem
The solution can be expressed as a linear combination of the training examples:
$W = \sum_i \alpha_i y_i x_i$.
$\alpha_i > 0$ only for the points that are support vectors; for all other points $\alpha_i = 0$.
Optimization problem
The QP is solved iff:
1) The KKT conditions are fulfilled for every example.
2) The matrix $Q$ with $Q_{ij} = y_i y_j \, K(x_i, x_j)$ is positive semi-definite.
The KKT conditions are:
$\alpha_i \geq 0$, $y_i (W \cdot x_i - b) \geq 1$, and $\alpha_i \left[ y_i (W \cdot x_i - b) - 1 \right] = 0$ for all $i$,
i.e. $\alpha_i = 0 \Rightarrow y_i (W \cdot x_i - b) \geq 1$, and $\alpha_i > 0 \Rightarrow y_i (W \cdot x_i - b) = 1$.
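A small numerical check of these conditions (an illustrative NumPy helper, not from the slides; tol absorbs floating-point error):

import numpy as np

def kkt_satisfied(alpha, y, f_x, tol=1e-3):
    """Check the hard-margin KKT conditions for every example.

    alpha : Lagrange multipliers, shape (m,)
    y     : labels in {-1, +1}, shape (m,)
    f_x   : decision values W.x_i - b, shape (m,)
    """
    margin = y * f_x
    feasible = margin >= 1 - tol                        # primal feasibility
    # complementary slackness: alpha_i > 0 only on the margin
    slack_ok = (alpha <= tol) | (np.abs(margin - 1) <= tol)
    return bool(np.all(feasible & slack_ok) and np.all(alpha >= -tol))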
Soft Margin Hyperplanes
Soft Margin Hyperplanes
The soft margin method chooses a hyperplane that splits the examples as cleanly as possible while still maximizing the margin.
Non-negative slack variables $\xi_i \geq 0$ measure the degree of misclassification of each example.
Soft Margin Hyperplanes
Learning with Kernels, by B. Schölkopf and A. Smola
Soft Margin Hyperplanes
The optimization problem:
minimize $\frac{1}{2}\|W\|^2 + C \sum_i \xi_i$, s.t. $y_i (W \cdot x_i - b) \geq 1 - \xi_i$, $\xi_i \geq 0$.
Using Lagrange multipliers, the dual becomes:
maximize $\sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$,
s.t. $0 \leq \alpha_i \leq C$ and $\sum_i \alpha_i y_i = 0$.
C is essentially a regularization parameter, which controls the trade-off between achieving a low error on the training data and minimizing the norm of the weights.
After the optimizer computes $\alpha$, W can be computed as $W = \sum_i \alpha_i y_i x_i$.
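A short NumPy sketch (illustrative; the helper name is mine) of recovering W and b once the optimizer has returned the multipliers:

import numpy as np

def recover_w_b(alpha, y, X, C, eps=1e-8):
    # W is a linear combination of the training examples
    w = (alpha * y) @ X
    # b from any "free" support vector (0 < alpha_i < C), where
    # y_i (w . x_i - b) = 1 holds exactly; assumes one such point exists
    free = (alpha > eps) & (alpha < C - eps)
    i = np.flatnonzero(free)[0]
    b = X[i] @ w - y[i]
    return w, b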
ν-SVC
ν-SVC
In the previous formulation, the variable C was the trade-off between (1) minimizing training errors and (2) maximizing the margin.
ν-SVC replaces C by a parameter $\nu \in (0, 1]$ that controls the number of margin errors and support vectors.
ν is an upper bound on the fraction of margin errors (and hence on the training error rate) and a lower bound on the fraction of support vectors.
ν-SVC
The optimization problem becomes:
minimize $\frac{1}{2}\|W\|^2 - \nu\rho + \frac{1}{m} \sum_i \xi_i$, s.t.
$y_i (W \cdot x_i - b) \geq \rho - \xi_i$, $\xi_i \geq 0$, and $\rho \geq 0$.
ν-SVC
Using Lagrange multipliers:
maximize $-\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$,
s.t. $0 \leq \alpha_i \leq \frac{1}{m}$, $\sum_i \alpha_i y_i = 0$, and $\sum_i \alpha_i \geq \nu$,
and the decision function is $f(X) = \operatorname{sgn}\left(\sum_i \alpha_i y_i K(x_i, X) - b\right)$.
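For a usage sketch, scikit-learn exposes this formulation as sklearn.svm.NuSVC; the toy XOR data here is an illustrative assumption:

import numpy as np
from sklearn.svm import NuSVC

X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]])
y = np.array([-1, -1, 1, 1])

clf = NuSVC(nu=0.5, kernel='rbf', gamma=0.5)  # nu plays the role of the ν parameter
clf.fit(X, y)
print(clf.predict([[0.9, 0.2]]))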
SMO Algorithm
SMO Algorithm
The Sequential Minimal Optimization algorithm is used to solve the quadratic programming problem.
Algorithm:
1- Select a pair of examples (details are coming).
2- Optimize the target function with respect to the selected pair
analytically.
3- Repeat until the pairs selected in step 1 are already optimal or the number of
iterations exceeds a user-defined limit (a simplified sketch follows this list).
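A heavily simplified SMO loop in NumPy (an illustrative sketch that follows the three steps above; Platt's full pair-selection heuristics are replaced by a random second index, and a linear kernel is assumed):

import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10):
    """Toy SMO; this sketch uses the convention f(x) = w.x + b."""
    m = X.shape[0]
    K = X @ X.T                                # precomputed linear kernel matrix
    alpha, b = np.zeros(m), 0.0
    rng = np.random.default_rng(0)
    f = lambda i: (alpha * y) @ K[:, i] + b    # decision value of example i

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(m):
            E_i = f(i) - y[i]
            # step 1: pick i only if it violates the KKT conditions
            if not ((y[i] * E_i < -tol and alpha[i] < C) or
                    (y[i] * E_i > tol and alpha[i] > 0)):
                continue
            j = int(rng.integers(m - 1)); j += (j >= i)   # random j != i
            E_j = f(j) - y[j]
            ai_old, aj_old = alpha[i], alpha[j]
            # box bounds L, H on alpha[j]
            if y[i] != y[j]:
                L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = 2 * K[i, j] - K[i, i] - K[j, j]
            if L == H or eta >= 0:
                continue
            # step 2: analytic update of the selected pair
            alpha[j] = np.clip(aj_old - y[j] * (E_i - E_j) / eta, L, H)
            if abs(alpha[j] - aj_old) < 1e-5:
                continue
            alpha[i] = ai_old + y[i] * y[j] * (aj_old - alpha[j])
            # keep the threshold b consistent with the new alphas
            b1 = b - E_i - y[i] * (alpha[i] - ai_old) * K[i, i] \
                         - y[j] * (alpha[j] - aj_old) * K[i, j]
            b2 = b - E_j - y[i] * (alpha[i] - ai_old) * K[i, j] \
                         - y[j] * (alpha[j] - aj_old) * K[j, j]
            b = b1 if 0 < alpha[i] < C else (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
            changed += 1
        # step 3: stop after max_passes sweeps with no updates
        passes = passes + 1 if changed == 0 else 0
    return alpha, b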
SMO Algorithm
2- Optimize the target function with respect to the selected pair
analytically:
the update to the values of $\alpha_i$ and $\alpha_j$ depends on the difference between the
approximation errors of example $i$ and example $j$.
Solve for two Lagrange multipliers
http://research.microsoft.com/pubs/68391/smo-book.pdf
Solve for two Lagrange multipliers
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf
Solve for two Lagrange multipliers
// Two-variable analytic update (case y_i != y_j); Kii, Kjj, Kij are
// kernel entries, G is the gradient of the dual objective, and the
// regions refer to the corners of the box-constraint diagram above.
double X = Kii + Kjj + 2*Kij;
double delta = (-G[i] - G[j]) / X;
double diff = alpha[i] - alpha[j];
alpha[i] += delta;
alpha[j] += delta;
if (region I):   alpha[i] = C_i; alpha[j] = C_i - diff;
if (region II):  alpha[j] = C_j; alpha[i] = C_j + diff;
if (region III): alpha[j] = 0;   alpha[i] = diff;
if (region IV):  alpha[i] = 0;   alpha[j] = -diff;
SMO Algorithm
1- Select a pair of examples:
we need to find the pair (i, j) for which the difference between the
classification errors is maximal.
A pair is optimal if the difference between the classification errors
is less than a user-defined tolerance $\epsilon$.
SMO Algorithm
1- Select a pair of examples (continued):
Define the following quantities over the feasible index sets $I_{up}$ and $I_{low}$:
$m(\alpha) = \max_{i \in I_{up}} -y_i \nabla f(\alpha)_i$ (max difference) and $M(\alpha) = \min_{j \in I_{low}} -y_j \nabla f(\alpha)_j$ (min difference); the selected pair is optimal when $m(\alpha) - M(\alpha) < \epsilon$.
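A NumPy sketch of this maximal-violating-pair selection (illustrative; grad holds the gradient of the dual objective, and the index sets follow the LIBSVM paper cited above):

import numpy as np

def select_pair(alpha, y, grad, C, eps=1e-3):
    """Return the maximal violating pair (i, j), or None if optimal."""
    up = ((y == 1) & (alpha < C)) | ((y == -1) & (alpha > 0))
    low = ((y == 1) & (alpha > 0)) | ((y == -1) & (alpha < C))
    viol = -y * grad
    i = np.flatnonzero(up)[np.argmax(viol[up])]     # max difference m(alpha)
    j = np.flatnonzero(low)[np.argmin(viol[low])]   # min difference M(alpha)
    if viol[i] - viol[j] < eps:
        return None                                  # optimal within eps
    return i, j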
SMO algorithm complexity
Memory complexity: no additional kernel matrix is required to solve the problem; only a 2x2 matrix is needed in each iteration.
Memory complexity is therefore linear in the training set size.
SMO's runtime scales between linear and quadratic in the size of the training set.