Welcome to SVMS Welcome to SVMS Open House 2014 Cougar Team.
SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos...
Transcript of SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos...
![Page 1: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/1.jpg)
©2005-2007 Carlos Guestrin 1
SVMs, Duality and theKernel Trick
Machine Learning – 10701/15781Carlos GuestrinCarnegie Mellon University
February 26th, 2007
![Page 2: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/2.jpg)
©2005-2007 Carlos Guestrin 2
SVMs reminder
![Page 3: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/3.jpg)
©2005-2007 Carlos Guestrin 3
Today’s lecture
Learn one of the most interesting and excitingrecent advancements in machine learning The “kernel trick” High dimensional feature spaces at no extra cost!
But first, a detour Constrained optimization!
![Page 4: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/4.jpg)
©2005-2007 Carlos Guestrin 4
Constrained optimization
![Page 5: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/5.jpg)
©2005-2007 Carlos Guestrin 5
Lagrange multipliers – Dual variables
Moving the constraint to objective functionLagrangian:
Solve:
![Page 6: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/6.jpg)
©2005-2007 Carlos Guestrin 6
Lagrange multipliers – Dual variables
Solving:
![Page 7: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/7.jpg)
©2005-2007 Carlos Guestrin 7
Dual SVM derivation (1) –the linearly separable case
![Page 8: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/8.jpg)
©2005-2007 Carlos Guestrin 8
Dual SVM derivation (2) –the linearly separable case
![Page 9: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/9.jpg)
©2005-2007 Carlos Guestrin 9
Dual SVM interpretation
w.x
+ b
= 0
![Page 10: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/10.jpg)
©2005-2007 Carlos Guestrin 10
Dual SVM formulation –the linearly separable case
![Page 11: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/11.jpg)
©2005-2007 Carlos Guestrin 11
Dual SVM derivation –the non-separable case
![Page 12: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/12.jpg)
©2005-2007 Carlos Guestrin 12
Dual SVM formulation –the non-separable case
![Page 13: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/13.jpg)
©2005-2007 Carlos Guestrin 13
Announcements
Class projects out later this week
![Page 14: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/14.jpg)
©2005-2007 Carlos Guestrin 14
Why did we learn about the dualSVM?
There are some quadratic programmingalgorithms that can solve the dual faster than theprimal
But, more importantly, the “kernel trick”!!! Another little detour…
![Page 15: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/15.jpg)
©2005-2007 Carlos Guestrin 15
Reminder from last time: What if thedata is not linearly separable?
Use features of features of features of features….
Feature space can get really large really quickly!
![Page 16: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/16.jpg)
©2005-2007 Carlos Guestrin 16
Higher order polynomials
number of input dimensions
num
ber o
f mon
omia
l ter
ms
d=2
d=4
d=3
m – input featuresd – degree of polynomial
grows fast!d = 6, m = 100about 1.6 billion terms
![Page 17: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/17.jpg)
©2005-2007 Carlos Guestrin 17
Dual formulation only depends ondot-products, not on w!
![Page 18: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/18.jpg)
©2005-2007 Carlos Guestrin 18
Dot-product of polynomials
![Page 19: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/19.jpg)
©2005-2007 Carlos Guestrin 19
Finally: the “kernel trick”!
Never represent features explicitly Compute dot products in closed form
Constant-time high-dimensional dot-products for many classes of features
Very interesting theory – ReproducingKernel Hilbert Spaces Not covered in detail in 10701/15781,
more in 10702
![Page 20: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/20.jpg)
©2005-2007 Carlos Guestrin 20
Polynomial kernels
All monomials of degree d in O(d) operations:
How about all monomials of degree up to d? Solution 0:
Better solution:
![Page 21: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/21.jpg)
©2005-2007 Carlos Guestrin 21
Common kernels
Polynomials of degree d
Polynomials of degree up to d
Gaussian kernels
Sigmoid
![Page 22: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/22.jpg)
©2005-2007 Carlos Guestrin 22
Overfitting?
Huge feature space with kernels, what aboutoverfitting??? Maximizing margin leads to sparse set of support
vectors Some interesting theory says that SVMs search for
simple hypothesis with large margin Often robust to overfitting
![Page 23: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/23.jpg)
©2005-2007 Carlos Guestrin 23
What about at classification time
For a new input x, if we need to represent Φ(x),we are in trouble!
Recall classifier: sign(w.Φ(x)+b) Using kernels we are cool!
![Page 24: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/24.jpg)
©2005-2007 Carlos Guestrin 24
SVMs with kernels
Choose a set of features and kernel function Solve dual problem to obtain support vectors αi
At classification time, compute:
Classify as
![Page 25: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/25.jpg)
©2005-2007 Carlos Guestrin 25
What’s the difference betweenSVMs and Logistic Regression?
High dimensionalfeatures withkernels
Loss function
NoYes!
Log-lossHinge loss
LogisticRegression
SVMs
![Page 26: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/26.jpg)
©2005-2007 Carlos Guestrin 26
Kernels in logistic regression
Define weights in terms of support vectors:
Derive simple gradient descent rule on αi
![Page 27: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/27.jpg)
©2005-2007 Carlos Guestrin 27
What’s the difference between SVMsand Logistic Regression? (Revisited)
Almost always no!Often yes!Solution sparse
Yes!Yes!High dimensionalfeatures withkernels
Real probabilities“Margin”Semantics ofoutput
Loss function Log-lossHinge loss
LogisticRegression
SVMs
![Page 28: SVMs, Duality and the Kernel Trickguestrin/Class/10701-S07/Slides/kernels.pdf · ©2005-2007 Carlos Guestrin 1 SVMs, Duality and the Kernel Trick Machine Learning – 10701/15781](https://reader030.fdocuments.us/reader030/viewer/2022040608/5ec589b5a8184f5b061e1640/html5/thumbnails/28.jpg)
©2005-2007 Carlos Guestrin 28
What you need to know
Dual SVM formulation How it’s derived
The kernel trick Derive polynomial kernel Common kernels Kernelized logistic regression Differences between SVMs and logistic regression