Support Vector Machines Project
Submitted by: Gil Tal and Oren Agam
Supervisor: Miki Elad
November 1999
Technion - Israel Institute of Technology, Faculty of Electrical Engineering
The Image Processing and Analysis Laboratory
Introduction
• SVM is an emerging technique for supervised learning problems, which might replace neural networks.
• Main features:
– Good generalization error - maximal margin.
– Convex optimization problem.
– Linear and non-linear decision surfaces.
• Proposed initially by Vapnik ('82).
Project Objectives
• Learn the theory of SVM,
• Design an efficient training algorithm,
• Create a 2D demo in order to explain the features of the SVM, and the parameters involved, and
• Create a C++ software package which can serve as a platform for learning problems.
Supervised Learning
Input: $(y_1, x_1), (y_2, x_2), \ldots, (y_L, x_L)$
where: $\{x_k\}_{k=1}^{L} \subset \mathbb{R}^n$ are the input vectors,
$\{y_k\}_{k=1}^{L} \in \{-1, +1\}$ are the classification values.
Purpose: Find a machine I(z) that classifies the training data correctly and generalizes well to other inputs.
Neural Networks
1. Training involves the solution of a non-convex optimization problem.
2. Generalization error is typically not satisfactory.
3. Hard to choose the structure of the net.
Support Vector Machine (SVM)
• Input Vectors are mapped to a high dimensional feature space Z. (1. How ?)
• In this space a hyper-plane decision surface is constructed. (2. How ?)
• This decision surface has special properties that ensure high generalization. (3. How ?)
• Training is done in a numerically feasible approach. (4. How ?)
1. Mapping to Higher Dimension
• Map the vectors from $\mathbb{R}^n$ to a higher dimension $\mathbb{R}^N$ (N > n) using a non-linear mapping function $\Phi: \mathbb{R}^n \to \mathbb{R}^N$ chosen a priori.
• Basic idea: a linear separation in the N-dim. space is a non-linear separating surface in the n-dim. space.
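To make this concrete, here is a standard textbook instance (our addition, not from the original slides): for n = 2 and a degree-2 polynomial map,

$\Phi(x_1, x_2) = \left( x_1^2, \; \sqrt{2}\, x_1 x_2, \; x_2^2 \right) \in \mathbb{R}^3 .$

A circular boundary $x_1^2 + x_2^2 = r^2$, which no line in $\mathbb{R}^2$ can realize, becomes the plane $z_1 + z_3 = r^2$ in $\mathbb{R}^3$, i.e. a linear separation. Note also that $\Phi(x) \cdot \Phi(z) = (x \cdot z)^2$, which anticipates the kernel functions of part 4.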
Example: Non-Linear Mapping
As a different example, if the input vectors have n=200 entries and we use a 5th-order polynomial, the feature space has BILLIONS OF ENTRIES.
There is a computational problem here that must be taken care of.
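The count behind this claim (a reconstruction we add for completeness): the number of monomials of degree at most d in n variables is $\binom{n+d}{d}$, so for n = 200 and d = 5

$N = \binom{205}{5} = \frac{205 \cdot 204 \cdot 203 \cdot 202 \cdot 201}{5!} = 2{,}872{,}408{,}791 \approx 2.9 \times 10^9 .$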
2. Separating Hyper-plane
Input: $(y_1, x_1), (y_2, x_2), \ldots, (y_L, x_L)$
The input is linearly separable if there exists a vector $W$ and a scalar $b$ such that:
$W \cdot x_k + b \geq +1$ for $y_k = +1$
$W \cdot x_k + b \leq -1$ for $y_k = -1$
or, compactly, $y_k (W \cdot x_k + b) \geq 1$.
The separating hyperplane is given by $W \cdot x + b = 0$.
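As a small illustration, a check of these constraints in C++ (our sketch under the definitions above; it is not part of the original package):

#include <cstddef>
#include <vector>

// Returns true if the pair (W, b) separates the labeled data with margin >= 1,
// i.e. y_k * (W . x_k + b) >= 1 for every training sample k.
bool separates(const std::vector<std::vector<double>>& X,
               const std::vector<int>& y,
               const std::vector<double>& W, double b) {
    for (std::size_t k = 0; k < X.size(); ++k) {
        double s = b;
        for (std::size_t j = 0; j < W.size(); ++j) s += W[j] * X[k][j];
        if (y[k] * s < 1.0) return false;
    }
    return true;
}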
3. Optimal Hyper-plane
1. SVM defines the optimal hyper-plane as the one with maximal margin.
2. It can be shown that the margin is given by $2 / \|W\|$.
Minimize $\|W\|^2$
Subject to $y_k (W \cdot x_k + b) \geq 1, \quad k = 1, 2, \ldots, L$
$\Rightarrow$ a QP (Quadratic Programming) problem.
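Why the margin equals $2/\|W\|$ (a one-step justification we add; the slides state it without proof): the distance from a point $x_0$ to the plane $W \cdot x + b = 0$ is $|W \cdot x_0 + b| / \|W\|$. The closest points on each side satisfy $W \cdot x + b = \pm 1$, so each lies at distance $1/\|W\|$ from the plane, giving a total margin of

$\frac{1}{\|W\|} + \frac{1}{\|W\|} = \frac{2}{\|W\|} .$

Minimizing $\|W\|^2$ therefore maximizes the margin.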
Lagrange Multipliers
To do so we construct a Lagrangian:
$L(W, b, A) = \frac{1}{2} W^t W - \sum_{i=1}^{L} \alpha_i \left[ y_i (W \cdot X_i + b) - 1 \right]$
At the point of minimum we get:
$\frac{\partial L(W, b, A)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{L} \alpha_i y_i = 0$
$\frac{\partial L(W, b, A)}{\partial W} = 0 \;\Rightarrow\; W = \sum_{i=1}^{L} \alpha_i y_i X_i$
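Substituting these two conditions back into $L(W, b, A)$ eliminates W and b (a standard step, spelled out here for completeness):

$L = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} \alpha_i \alpha_j y_i y_j \, X_i \cdot X_j ,$

which is exactly the dual objective maximized in "The QP Problem" slide below.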
• Most of the α's are zeros.
• The non-zero α's correspond to the points satisfying the inequalities as equalities.
These points are called the SUPPORT-VECTORS
$W_0 = \sum_{\text{Support Vectors}} \alpha_i y_i X_i$
Decision Law: $I(Z) = \mathrm{sign}(W_0 \cdot Z + b_0) = \mathrm{sign}\Big( \sum_{\text{Support Vectors}} \alpha_i y_i \, X_i \cdot Z + b_0 \Big)$
Classification by SVM
4. Using Kernel functions
Let us restrict the kind of mapping functions $\Phi: \mathbb{R}^n \to \mathbb{R}^N$ to those satisfying
$\Phi(Z_1) \cdot \Phi(Z_2) = K(Z_1, Z_2)$
Examples:
$K(Z_1, Z_2) = \exp\Big( -\frac{\|Z_1 - Z_2\|^2}{2\sigma^2} \Big)$ or $K(Z_1, Z_2) = (1 + Z_1 \cdot Z_2)^d$
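A minimal C++ sketch of these two kernels (our illustration; the names and signatures are ours, not the original package's):

#include <cmath>
#include <cstddef>
#include <vector>

// Inner product of two n-dimensional vectors.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// RBF kernel: K(z1, z2) = exp(-||z1 - z2||^2 / (2 sigma^2)).
double rbf_kernel(const std::vector<double>& z1,
                  const std::vector<double>& z2, double sigma) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < z1.size(); ++i) {
        double diff = z1[i] - z2[i];
        d2 += diff * diff;
    }
    return std::exp(-d2 / (2.0 * sigma * sigma));
}

// Polynomial kernel: K(z1, z2) = (1 + z1 . z2)^d.
double poly_kernel(const std::vector<double>& z1,
                   const std::vector<double>& z2, int d) {
    return std::pow(1.0 + dot(z1, z2), d);
}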
The QP Problem
Maximize $\Lambda^T \mathbf{1} - \frac{1}{2} \Lambda^T D \Lambda$
Subject to $\Lambda \geq 0, \quad \Lambda^T Y = 0$
where $\Lambda$ is the vector of weights (the Lagrange multipliers), $\mathbf{1}$ is a vector of ones, and D is a matrix with the entries:
$D_{ij} = y_i y_j \, X_i \cdot X_j = y_i y_j \, K(X_i, X_j)$
Using kernel functions, the overall problem remains QP.
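Building D from the training set is then one kernel evaluation per pair; a sketch continuing the code above (it reuses rbf_kernel and is our assumption, not the original implementation):

// Build D with D[i][j] = y_i * y_j * K(x_i, x_j); this is what the QP solver consumes.
std::vector<std::vector<double>> build_D(
        const std::vector<std::vector<double>>& X,
        const std::vector<int>& y, double sigma) {
    std::size_t L = X.size();
    std::vector<std::vector<double>> D(L, std::vector<double>(L));
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t j = 0; j < L; ++j)
            D[i][j] = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma);
    return D;
}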
The Decision Rule
$I(Z) = \mathrm{sign}\Big( \sum_{\text{Support Vectors}} \alpha_i y_i \, K(X_i, Z) + b \Big)$
• Using kernel functions, we are required to perform inner products in the lower (n) dimension only, both for training and for applying the machine to input patterns.
• By solving for the optimal $\Lambda$ we actually find the support vectors.
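Applying the rule then needs only the support vectors, their multipliers and labels, and b (a sketch under the same assumptions as the kernel code above):

// Kernelized decision law: I(Z) = sign( sum_i alpha_i * y_i * K(x_i, Z) + b ),
// where the sum runs over the support vectors only.
int classify(const std::vector<std::vector<double>>& sv,  // support vectors
             const std::vector<double>& alpha,            // their multipliers
             const std::vector<int>& y,                   // their labels (+/-1)
             double b, double sigma,
             const std::vector<double>& Z) {
    double s = b;
    for (std::size_t i = 0; i < sv.size(); ++i)
        s += alpha[i] * y[i] * rbf_kernel(sv[i], Z, sigma);
    return s >= 0.0 ? +1 : -1;
}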
Results
1. Write here about the software that you developed
2. Cut and paste an image which will show the application window
3. Add more examples (for example - show how the same non-linear problem is treated with growing d - the polynomial degree)
4. Say something about the algorithm that you have implemented (main features)
Example 1: Linear Classification
Example 2: Non-Linear Separation
Conclusions