Lecture 12 - SVM
Transcript of Lecture 12 - SVM
Introduction to Machine Learning
Lecture 12: Support Vector Machines
Albert Orriols i Puig ([email protected])
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
Recap of Lecture 11
1st generation NN: perceptrons and others
Also multi-layer perceptrons
Slide 2 – Artificial Intelligence Machine Learning
Recap of Lecture 11
2nd generation NN
Some people figured out how to adapt the weights of internal layers
Seemed to be very powerful and able to solve almost anything
The reality showed that this was not exactly true
Slide 3
Today’s Agenda
Moving to SVM
Linear SVM
The separable case
The non-separable case
Non-Linear SVM
Slide 4
Introduction
SVM (Vapnik, 1995)
A clever type of perceptron
Instead of hand-coding the layer of non-adaptive features, each training example is used to create a new feature using a fixed recipe
A clever optimization technique is used to select the best subset of features
Many NN researchers switched to SVM in the 1990s because they work better
Here, we’ll take a slow path into SVM concepts
Slide 5
Shattering Points with Oriented Hyperplanes
Remember the idea
I want to build hyperplanes that separate points of two classes
In a two-dimensional space: lines
E.g.: a linear classifier
Which is the best separating line?
Remember, a hyperplane is represented by the equation w · x + b = 0
Slide 6
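As a quick sketch of the decision rule behind that equation (the values of w and b below are made up for illustration, not taken from the slides):

```python
import numpy as np

# Toy illustration of the hyperplane w . x + b = 0 used as a classifier.
# w and b are invented for the example: the line x1 = x2.
w = np.array([1.0, -1.0])   # normal vector of the separating line
b = 0.0

def classify(x):
    """Assign +1 or -1 according to the side of the hyperplane."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([2.0, 1.0])))   # x1 > x2, so +1
print(classify(np.array([0.0, 3.0])))   # x1 < x2, so -1
```

Any w and b that separate the training points define a valid classifier; the next slides ask which of those hyperplanes is best.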
Linear SVM
I want the line that maximizes the margin between examples of both classes!
Support Vectors
Slide 7
Linear SVM
In more detail
Let’s assume two classes: yi = {-1, 1}
Each example is described by a set of features x (x is a vector; for clarity, we will mark vectors in bold in the remainder of the slides)
The problem can be formulated as follows. All training examples must satisfy (in the separable case):
xi · w + b ≥ +1 for yi = +1
xi · w + b ≤ −1 for yi = −1
This can be combined into one set of inequalities: yi (xi · w + b) − 1 ≥ 0 for all i
Slide 8
Linear SVM
What are the support vectors?
Let’s find the points that lie on the hyperplane H1: xi · w + b = +1
Their perpendicular distance to the origin is |1 − b| / ||w||
Let’s find the points that lie on the hyperplane H2: xi · w + b = −1
Their perpendicular distance to the origin is |−1 − b| / ||w||
The margin is: 2 / ||w||
Slide 9
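The margin value follows in one line from the two distances above (a standard derivation, written in the slides' notation):

```latex
% H_1: x \cdot w + b = +1 and H_2: x \cdot w + b = -1 are the parallel
% hyperplanes x \cdot w = 1-b and x \cdot w = -1-b, whose distances to the
% origin are |1-b|/\|w\| and |-1-b|/\|w\|. The margin is the distance
% between them:
\[
  m \;=\; \frac{\lvert (1-b) - (-1-b) \rvert}{\lVert w \rVert}
    \;=\; \frac{2}{\lVert w \rVert}.
\]
```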
Linear SVM
Therefore, the problem is:
Find the hyperplane that minimizes ½ ||w||²
Subject to yi (xi · w + b) − 1 ≥ 0 for all i
But let us change to the Lagrange formulation because:
The constraints will be placed on the Lagrange multipliers themselves (easier to handle)
Training data will appear only in the form of dot products between vectors
Slide 10
Linear SVM
The Lagrangian formulation comes to be
L_P = ½ ||w||² − Σi αi yi (xi · w + b) + Σi αi
where the αi are the Lagrange multipliers
So, now we need to:
Minimize L_P w.r.t. w, b
Simultaneously require that the derivatives of L_P w.r.t. all the αi vanish
All subject to the constraints αi ≥ 0
Slide 11
Linear SVM
Transformation to the dual problem
This is a convex problem
We can equivalently solve the dual problem
That is, maximize L_D = Σi αi − ½ Σi Σj αi αj yi yj xi · xj
w.r.t. the αi
Subject to the constraint Σi αi yi = 0
And with αi ≥ 0
Slide 12
Linear SVM
This is a quadratic programming problem. You can solve it with many methods, such as gradient descent
We’ll not see these methods in class
Slide 13
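To make the dual concrete, here is a crude projected-gradient sketch on a four-point separable data set. Real solvers use SMO or dedicated QP routines; the data, learning rate, and iteration count below are all invented for illustration:

```python
import numpy as np

# Projected gradient ascent on the dual
#   L_D = sum_i alpha_i - (1/2) sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)
# subject to alpha_i >= 0 and sum_i alpha_i y_i = 0.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n = len(y)

K = X @ X.T                            # Gram matrix of dot products x_i . x_j
Q = (y[:, None] * y[None, :]) * K      # Q_ij = y_i y_j (x_i . x_j)

alpha = np.zeros(n)
lr = 0.01
for _ in range(2000):
    grad = 1.0 - Q @ alpha             # gradient of L_D w.r.t. alpha
    alpha += lr * grad                 # ascent step (L_D is concave)
    alpha -= y * (y @ alpha) / n       # project onto sum_i alpha_i y_i = 0
    alpha = np.clip(alpha, 0.0, None)  # enforce alpha_i >= 0

w = (alpha * y) @ X                    # recover w = sum_i alpha_i y_i x_i
sv = int(np.argmax(alpha))             # a point with alpha_i > 0 is a support vector
b = y[sv] - w @ X[sv]                  # from y_sv (w . x_sv + b) = 1
print(w, b)
```

Only the two points closest to the boundary end up with nonzero αi, which is exactly the "support vector" picture from the earlier slides.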
The Non-Separable Case
What if I cannot separate the two classes?
We will not be able to solve the Lagrangian formulation proposed
Any idea?
Slide 14
The Non-Separable Case
Just relax the constraints by permitting some errors
Slide 15
The Non-Separable Case
That means that the Lagrangian is rewritten
We change the objective function to be minimized to ½ ||w||² + C Σi ξi, where the ξi are slack variables that absorb the errors
Therefore, we are maximizing the margin and minimizing the error
C is a constant to be chosen by the user
The dual problem becomes: maximize L_D = Σi αi − ½ Σi Σj αi αj yi yj xi · xj
Subject to 0 ≤ αi ≤ C and Σi αi yi = 0
Slide 16
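One way to see the role of C is to attack the equivalent primal objective ½||w||² + C Σi ξi directly with subgradient descent on the hinge loss. This is a Pegasos-style sketch on synthetic data, not the dual formulation from the slide; the data, learning rate, and epoch count are made up:

```python
import numpy as np

# Subgradient descent on the soft-margin primal
#   (1/2) ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)),
# where the hinge term plays the role of the slack variables xi_i.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2.0, 2.0], 0.5, (20, 2)),
               rng.normal([-2.0, -2.0], 0.5, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

def train(X, y, C=1.0, lr=0.01, epochs=500):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1        # margin violators (xi_i > 0)
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train(X, y, C=1.0)
acc = (np.sign(X @ w + b) == y).mean()
print(acc)   # training accuracy
```

A larger C punishes margin violations more heavily (narrower margin, fewer training errors); a smaller C tolerates more slack in exchange for a wider margin.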
Non-Linear SVM
What happens if the decision function is not a linear function of the data?
In our equations, data appears in the form of dot products xi · xj
Wouldn’t you like to have polynomial, logarithmic, … functions to fit the data?
Slide 17
Non-Linear SVM
The kernel trick
Map the data into a higher-dimensional space
Mercer’s theorem: any continuous, symmetric, positive semi-definite kernel function K(x, y) can be expressed as a dot product in a high-dimensional space
Now, we have a kernel function K(xi, xj) = Φ(xi) · Φ(xj)
An example
All we have talked about still holds when using the kernel function
The only difference is that now my decision function will be f(x) = Σi αi yi K(xi, x) + b
Slide 18
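The trick is easy to check numerically. For the homogeneous polynomial kernel of degree 2 in two dimensions, K(x, z) = (x · z)² equals the dot product Φ(x) · Φ(z) under the explicit map Φ(x) = (x1², √2 x1 x2, x2²) (the example vectors below are made up):

```python
import numpy as np

# Verifying the kernel trick for K(x, z) = (x . z)^2:
# the kernel value computed in the 2-D input space matches the dot
# product in the explicit 3-D feature space.
def phi(x):
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, 1.0])

lhs = (x @ z) ** 2          # kernel computed in the input space
rhs = phi(x) @ phi(z)       # dot product in the feature space
print(lhs, rhs)             # both equal 25.0
```

The point of the trick is that the left-hand side never materializes Φ: for high-degree kernels the feature space is huge (or infinite, for the RBF kernel), yet the kernel value stays cheap to compute.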
Non-Linear SVM
Some typical kernels
A visual example of a polynomial kernel with p = 3
Slide 19
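The kernel table itself did not survive the transcript; the kernels usually listed at this point are the polynomial, Gaussian (RBF), and sigmoid kernels, sketched below with made-up parameter defaults:

```python
import numpy as np

# Typical SVM kernels, written out explicitly.
def polynomial(x, z, p=3):
    """Inhomogeneous polynomial kernel of degree p."""
    return (x @ z + 1.0) ** p

def rbf(x, z, sigma=1.0):
    """Gaussian (radial basis function) kernel."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def sigmoid(x, z, kappa=1.0, delta=1.0):
    """Sigmoid kernel (Mercer's condition holds only for some kappa, delta)."""
    return np.tanh(kappa * (x @ z) - delta)

x = np.array([1.0, 0.0])
z = np.array([0.0, 1.0])
print(polynomial(x, z), rbf(x, z), sigmoid(x, z))
```

Swapping one of these functions in for the plain dot product xi · xj is the only change needed in the dual problem and in the decision function.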
Some Further Issues
We have to classify data:
Described by nominal attributes and continuous attributes
Probably with missing values
That may have more than two classes
How does SVM deal with them?
SVM is defined over continuous attributes: no problem!
Nominal attributes: map them into a continuous space
Multiple classes: build SVMs that discriminate each pair of classes
Slide 20
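The pairwise (one-vs-one) scheme for multiple classes can be sketched as follows. The hand-written decision rules below are hypothetical stand-ins for trained pairwise SVMs:

```python
from collections import Counter
import numpy as np

# One-vs-one multi-class handling: one binary classifier per pair of
# classes, final prediction by majority vote over all pairs.
def one_vs_one_predict(classifiers, x):
    """classifiers maps (class_a, class_b) to a function returning
    class_a or class_b for the input x."""
    votes = Counter(f(x) for f in classifiers.values())
    return votes.most_common(1)[0][0]

# Made-up rules standing in for three pairwise SVMs over classes 0, 1, 2.
classifiers = {
    (0, 1): lambda x: 0 if x[0] < 1 else 1,
    (0, 2): lambda x: 0 if x[1] < 1 else 2,
    (1, 2): lambda x: 1 if x[0] > x[1] else 2,
}
print(one_vs_one_predict(classifiers, np.array([2.0, 0.0])))   # votes 1, 0, 1 -> 1
```

With k classes this trains k(k−1)/2 binary SVMs; the alternative one-vs-rest scheme trains only k but on more imbalanced problems.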
Some Further Issues
I’ve seen lots of formulas… But I want to program an SVM builder. How do I get my SVM?
We have already mentioned that there are many methods to solve the quadratic programming problem
Many algorithms have been designed specifically for SVM
One of the most significant: Sequential Minimal Optimization (SMO)
Currently, there are many new algorithms
Slide 21
Next Class
Association Rules
Slide 22