Machine Learning Week 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is...
clarence-smith -
![Page 1: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/1.jpg)
Machine Learning
Week 4, Lecture 2
![Page 2: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/2.jpg)
Hand in Data
• It is online
• Only around 6000 images!!!
• Deadline is one week.
• Next Thursday's lecture will be only one hour and only about the hand in
• If you nailed it you can stay home
![Page 3: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/3.jpg)
Support Vector Machines Last Time
Today Today
![Page 4: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/4.jpg)
Functional Margins
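For each point $(x_i, y_i)$ with label $y_i \in \{-1, +1\}$ we define the functional margin $\hat{\gamma}_i = y_i (w^T x_i + b)$
Define the functional margin of the hyperplane, i.e. of the parameters $w, b$, as $\hat{\gamma} = \min_i \hat{\gamma}_i$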
“functional distance”
![Page 5: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/5.jpg)
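Geometric Margin
How far is $x_i$ from the hyperplane? How long is the segment from $x_i$ to $L$?
[Figure: the hyperplane with normal vector $w$, the point $x_i$, and the point $L$ on the hyperplane closest to $x_i$]
Since $L$ is on the hyperplane: $w^T L + b = 0$
Definition of $L$: $L = x_i - \gamma_i y_i \frac{w}{\|w\|}$
Multiply in: $w^T x_i - \gamma_i y_i \|w\| + b = 0$
Solve: $\gamma_i = y_i \frac{w^T x_i + b}{\|w\|}$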
![Page 6: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/6.jpg)
Margins: Functional and Geometric
[Figure: hyperplane with normal vector $w$]
Related by $\|w\|$: geometric margin = functional margin / $\|w\|$
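As a small numeric sketch of the two margins: the code below assumes the standard definitions, functional margin y(w·x + b) and geometric margin equal to that value divided by ||w||. The function names and the sample numbers are illustrative only and do not come from the slides.

```python
import math

def functional_margin(w, b, x, y):
    """y * (w . x + b); positive exactly when x lies on the side given by y."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def geometric_margin(w, b, x, y):
    """Functional margin divided by ||w||: the signed distance to the hyperplane."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return functional_margin(w, b, x, y) / norm

w, b = [3.0, 4.0], -5.0
x, y = [3.0, 4.0], 1

fm = functional_margin(w, b, x, y)
gm = geometric_margin(w, b, x, y)

# Rescaling (w, b) by 10 describes the same hyperplane.
w10, b10 = [10 * wi for wi in w], 10 * b
fm10 = functional_margin(w10, b10, x, y)
gm10 = geometric_margin(w10, b10, x, y)

print(fm, gm, fm10, gm10)
```

Rescaling (w, b) multiplies the functional margin by the same factor but leaves the geometric margin unchanged, which is why the optimization needs a scale constraint.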
![Page 7: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/7.jpg)
Optimizing Margins
Maximize $\gamma$ (Geometric Margin)
Subject To
$y_i (w^T x_i + b) \ge \gamma$ for all $i$ (Point Margins)
$\|w\| = 1$ (Scale Constraint)
![Page 8: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/8.jpg)
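Optimization
Minimize $\frac{1}{2} \|w\|^2$
Subject To $y_i (w^T x_i + b) \ge 1$ for all $i$
Quadratic Programming - Convex
[Figure: hyperplane with normal vector $w$ and the margin on both sides]
Functional margin = 1 means sitting on the margin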
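One way to see why maximizing the margin turns into this minimization, assuming the usual normalization in which the functional margin of the hyperplane is fixed to 1. The symbols $\gamma$ (geometric margin) and $\hat{\gamma}$ (functional margin) are the notation assumed for this sketch.

```latex
\begin{align*}
\gamma &= \frac{\hat{\gamma}}{\|w\|}, \qquad \hat{\gamma} = \min_i \, y_i (w^T x_i + b) \\
\text{fix } \hat{\gamma} = 1 &\;\Rightarrow\; \gamma = \frac{1}{\|w\|} \\
\max_{w,b} \frac{1}{\|w\|} &\;\Longleftrightarrow\; \min_{w,b} \tfrac{1}{2}\|w\|^2
\quad \text{subject to } y_i (w^T x_i + b) \ge 1 \text{ for all } i
\end{align*}
```

Fixing the scale is harmless because rescaling $(w, b)$ does not move the hyperplane.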
![Page 9: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/9.jpg)
Linearly Separable SVM
Minimize $\frac{1}{2} \|w\|^2$
Subject To $y_i (w^T x_i + b) \ge 1$ for all $i$
Constrained Problem
We need to study the theory of Lagrange multipliers to understand the SVM
![Page 10: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/10.jpg)
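Lagrange Multipliers
Minimize $f(x)$ subject to $g_i(x) \le 0$ and $h_i(x) = 0$ for all $i$
Define the Lagrangian $L(x, \alpha, \beta) = f(x) + \sum_i \alpha_i g_i(x) + \sum_i \beta_i h_i(x)$
Only consider convex $f$, $g_i$, and affine $h_i$ (the method is more general)
$\alpha, \beta$ are called Lagrange multipliers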
![Page 11: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/11.jpg)
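Primal Problem
$\theta_P(x) = \max_{\alpha \ge 0, \beta} L(x, \alpha, \beta)$, which equals $f(x)$ if $x$ is feasible and $\infty$ otherwise
$p^* = \min_x \theta_P(x)$
Which is what we are looking for!!!
We denote the solution $x^*$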
![Page 12: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/12.jpg)
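Dual Problem
$\theta_D(\alpha, \beta) = \min_x L(x, \alpha, \beta)$ and $d^* = \max_{\alpha \ge 0, \beta} \theta_D(\alpha, \beta)$
$\alpha, \beta$ are dual feasible if $\alpha_i \ge 0$ for all $i$
This implies for dual feasible $\alpha, \beta$: $\theta_D(\alpha, \beta) \le p^*$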
![Page 13: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/13.jpg)
Weak and Strong Duality
Weak duality: $d^* \le p^*$ always holds
Question: When are they equal? Technical
Assume Strong Duality $d^* = p^*$
![Page 14: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/14.jpg)
Complementary Slackness
Let x* be primal optimal and α*, β* dual optimal (p* = d*)
All Non-Negative
Must be zero since squeezed between p*
αi* gi(x*) = 0 for all i: Complementary Slackness
![Page 15: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/15.jpg)
Karush-Kuhn-Tucker (KKT) Conditions
Let x* be primal optimal and α*, β* dual optimal (p* = d*)
Primal Feasibility: gi(x*) ≤ 0 and hi(x*) = 0, for all i
Dual Feasibility: αi* ≥ 0 for all i
Complementary Slackness: αi* gi(x*) = 0 for all i
Stationarity: ∇x L(x*, α*, β*) = 0
For convex f, gi and affine hi with strong duality, the KKT conditions are necessary and sufficient for optimality.
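A tiny worked example may make the conditions concrete. The problem below (minimize x² subject to 1 − x ≤ 0) is a hypothetical toy chosen for this sketch, not taken from the lecture; its optimum x* = 1 with multiplier α* = 2 can be checked by hand.

```python
# Toy problem (hypothetical): minimize f(x) = x^2 subject to g(x) = 1 - x <= 0.

def f(x):
    return x * x

def g(x):
    return 1.0 - x

def lagrangian(x, alpha):
    return f(x) + alpha * g(x)

def dual_function(alpha):
    # L(x, alpha) is minimized over x where 2x - alpha = 0, i.e. at x = alpha / 2.
    return lagrangian(alpha / 2.0, alpha)

x_star, alpha_star = 1.0, 2.0

kkt = {
    "primal feasibility": g(x_star) <= 0,
    "dual feasibility": alpha_star >= 0,
    "complementary slackness": alpha_star * g(x_star) == 0,
    "stationarity": 2 * x_star - alpha_star == 0,  # d/dx L(x, alpha) at x*
}

p_star = f(x_star)                 # primal optimum
d_star = dual_function(alpha_star) # dual optimum
print(kkt, p_star, d_star)
```

Here the constraint is active (g(x*) = 0), so α* is allowed to be positive; any other dual feasible α gives a dual value below p*, for example `dual_function(1.0)` is 0.75.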
![Page 16: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/16.jpg)
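Finally Back To SVM
Minimize $\frac{1}{2} \|w\|^2$
Subject To $y_i (w^T x_i + b) \ge 1$ for all $i$, i.e. $g_i(w, b) = 1 - y_i (w^T x_i + b) \le 0$
Define the Lagrangian (no β required): $L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_i \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$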
![Page 17: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/17.jpg)
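SVM Summary Dual
Maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j x_i^T x_j$
S. t. $\alpha_i \ge 0$ for all $i$ and $\sum_i \alpha_i y_i = 0$
Support Vectors: the points with $\alpha_i > 0$
$w = \sum_i \alpha_i y_i x_i$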
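A minimal sketch of how the dual solution is used, assuming the standard hard-margin dual with objective W(α) = Σ αi − ½ Σ yi yj αi αj xi·xj and the recovery w = Σ αi yi xi. The three data points and the α values are a hypothetical toy whose optimality can be verified by hand through the KKT conditions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical toy data with its dual solution alpha.
X = [(1.0, 1.0), (-1.0, -1.0), (2.0, 3.0)]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]

# w = sum_i alpha_i y_i x_i; only points with alpha_i > 0 contribute.
w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(2)]
support_vectors = [xi for a, xi in zip(alpha, X) if a > 0]

# Dual objective W(alpha) and primal objective (1/2)||w||^2.
n = len(X)
W = sum(alpha) - 0.5 * sum(
    y[i] * y[j] * alpha[i] * alpha[j] * dot(X[i], X[j])
    for i in range(n) for j in range(n)
)
primal = 0.5 * dot(w, w)
balance = sum(a * yi for a, yi in zip(alpha, y))  # should be 0

print(w, support_vectors, W, primal, balance)
```

The third point has α = 0 and does not influence w; the primal and dual objective values coincide, as strong duality predicts.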
![Page 18: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/18.jpg)
SVM Generalization
VC Dimension for hyperplanes is the number of parameters
Theoretically Speaking
Why bother finding large margin hyperplanes?
There are other bounds: Rich Theory
![Page 19: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/19.jpg)
Kernel Support Vector Machines
![Page 20: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/20.jpg)
Kernels
Nonlinear feature transforms
Define Kernel $K(x, z) = \Phi(x)^T \Phi(z)$ and replace every inner product $x_i^T x_j$ by $K(x_i, x_j)$
The two optimization problems are identical!!!
Kernel is an inner product in another space
![Page 21: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/21.jpg)
Kernels
K is an inner product in Φ space!
![Page 22: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/22.jpg)
Polynomial Kernel
$n^d$ Dimensional Space!!!
Feature Transform would take $n^d$ time
Computing the kernel takes $n$ time
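The saving can be checked numerically. The sketch below assumes the homogeneous polynomial kernel K(x, z) = (x·z)^d with d = 2 and the explicit feature map made of all n² pairwise products; the slide's exact kernel formula is not in the transcript, so this particular form is an assumption.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z, d=2):
    """(x . z)^d, computed in O(n) time."""
    return dot(x, z) ** d

def phi_degree2(x):
    """Explicit degree-2 feature map: all n^2 products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]

x = [1.0, 2.0, 3.0]
z = [4.0, 5.0, 6.0]

via_kernel = poly_kernel(x, z)                        # n multiplications
via_features = dot(phi_degree2(x), phi_degree2(z))    # n^2 features
num_features = len(phi_degree2(x))

print(via_kernel, via_features, num_features)
```

Both routes give the same number, but the kernel never builds the n²-dimensional vectors.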
![Page 23: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/23.jpg)
Gaussian Kernel
$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$
Think of this as a similarity measure
It is essentially 0 if x and z are not close
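A short sketch of the similarity interpretation, assuming the common parametrization K(x, z) = exp(−||x − z||² / (2σ²)); the bandwidth name `sigma` and the sample points are choices made for this example.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """exp(-||x - z||^2 / (2 sigma^2)): 1 for identical points, near 0 for distant ones."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

same = gaussian_kernel([0.0, 0.0], [0.0, 0.0])
near = gaussian_kernel([0.0, 0.0], [0.1, 0.0])
far = gaussian_kernel([0.0, 0.0], [10.0, 10.0])

print(same, near, far)
```

A smaller `sigma` makes the kernel drop to zero faster, so each point is only "similar" to a tighter neighbourhood.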
![Page 24: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/24.jpg)
Gaussian Kernel Nonlinear Transform
Simplest case: x, z are 1D, i.e. numbers
Inner product between infinitely long feature mapped x,z
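The infinite feature map can be made tangible by truncating it. The sketch assumes σ = 1 in one dimension, where exp(−(x − z)²/2) = exp(−x²/2) exp(−z²/2) Σₖ (xz)ᵏ/k!, so coordinate k of the feature map is exp(−x²/2) xᵏ/√(k!). The function names and the truncation length are illustrative choices.

```python
import math

def phi(x, terms):
    """First `terms` coordinates of a feature map for K(x, z) = exp(-(x - z)^2 / 2)."""
    return [
        math.exp(-x * x / 2.0) * x ** k / math.sqrt(math.factorial(k))
        for k in range(terms)
    ]

def gaussian_kernel_1d(x, z):
    return math.exp(-((x - z) ** 2) / 2.0)

x, z = 0.7, -0.3
exact = gaussian_kernel_1d(x, z)
approx = sum(a * b for a, b in zip(phi(x, 20), phi(z, 20)))

print(exact, approx)
```

Twenty coordinates already reproduce the kernel value to high precision for these inputs; the kernel itself evaluates the full infinite sum in constant time.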
![Page 25: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/25.jpg)
Let's Apply It
![Page 26: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/26.jpg)
![Page 27: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/27.jpg)
![Page 28: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/28.jpg)
![Page 29: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/29.jpg)
![Page 30: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/30.jpg)
![Page 31: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/31.jpg)
Kernel Matrix
Points $x_1, \dots, x_m$, Kernel $K(x, z) = \Phi(x)^T \Phi(z)$
Kernel Matrix (same name): $K_{ij} = K(x_i, x_j)$
If K is a valid kernel, i.e. $K(x, z) = \Phi(x)^T \Phi(z)$ for some $\Phi$
Then K is symmetric positive semidefinite ($x^T K x \ge 0$)
Mercer Kernels: positive semidefiniteness is a sufficient and necessary condition
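A quick numerical spot check of these two properties, assuming a Gaussian kernel with σ = 1 and a handful of random points. Random test vectors cannot prove positive semidefiniteness; they only fail to find a counterexample. All names here are illustrative.

```python
import math
import random

def gaussian_kernel(x, z, sigma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def gram_matrix(points, kernel):
    """Kernel matrix with entries K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]

def quadratic_form(K, v):
    """v^T K v."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

random.seed(0)
points = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(6)]
K = gram_matrix(points, gaussian_kernel)

is_symmetric = all(
    abs(K[i][j] - K[j][i]) < 1e-12 for i in range(6) for j in range(6)
)
smallest = min(
    quadratic_form(K, [random.uniform(-1, 1) for _ in range(6)])
    for _ in range(200)
)

print(is_symmetric, smallest)
```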
![Page 32: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/32.jpg)
Kernels
• Add nonlinearity to our SVM
• Efficient computation in high and even infinite dimensional spaces.
• Few support vectors (on margin) help us in generalization (in theory) and runtime
• Kernels are not limited to SVM
• Kernel Perceptrons, Kernel Logistic Regression, …, any place where we only depend on the inner product
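To illustrate that kernels work wherever only inner products are needed, here is a minimal kernel perceptron sketch on XOR-style data, which no linear classifier can separate. The Gaussian bandwidth σ = 0.5, the dataset, and the function names are assumptions made for this example.

```python
import math

def gaussian_kernel(x, z, sigma=0.5):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def train_kernel_perceptron(X, y, kernel, epochs=100):
    """alpha[i] counts the mistakes made on point i; w is never formed explicitly."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for i in range(len(X)):
            score = sum(alpha[j] * y[j] * kernel(X[j], X[i]) for j in range(len(X)))
            if y[i] * score <= 0:   # misclassified (or undecided): update
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alpha

def predict(X, y, alpha, kernel, x):
    score = sum(alpha[j] * y[j] * kernel(X[j], x) for j in range(len(X)))
    return 1 if score > 0 else -1

# XOR-like labels: not linearly separable in the input space.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [-1, -1, 1, 1]

alpha = train_kernel_perceptron(X, y, gaussian_kernel)
predictions = [predict(X, y, alpha, gaussian_kernel, x) for x in X]
print(alpha, predictions)
```

The only place the data enters is through `kernel(X[j], X[i])`, which is exactly the substitution of an inner product by a kernel.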
![Page 33: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/33.jpg)
Non Separable Data SVM
![Page 34: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/34.jpg)
Violating Margin
[Figure: hyperplane with normal vector w; a point on the wrong side of the track, at distance ξ from its margin]
![Page 35: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/35.jpg)
Minimize $\frac{1}{2} \|w\|^2 + C \sum_i \xi_i$
Subject To $y_i (w^T x_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all $i$
If a point is on the wrong side of the margin at distance ξ we penalize by Cξ
Hyperparameter C controls the competing goals of a large margin and points being on the right side of it
How to find C? Validation (Model Selection)
Does this look like regularization to you?
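A small sketch of how the penalty behaves, assuming the standard soft-margin objective ½||w||² + C Σ ξi, where for fixed (w, b) the smallest feasible slack is ξi = max(0, 1 − yi(w·xi + b)). The hyperplane and points are made up for the illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def slacks(w, b, X, y):
    """Smallest xi_i >= 0 with y_i (w . x_i + b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C):
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))

w, b = [1.0, 0.0], 0.0
X = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0)]  # outside margin, inside margin, misclassified
y = [1, 1, 1]

xi = slacks(w, b, X, y)
objective_c1 = soft_margin_objective(w, b, X, y, C=1.0)
objective_c100 = soft_margin_objective(w, b, X, y, C=100.0)
print(xi, objective_c1, objective_c100)
```

With C = 1 the margin term and the violations are of similar size; with C = 100 the violations dominate, so the optimizer would trade margin width for fewer violations. The sum of slacks is the hinge loss, which is where the resemblance to a regularized loss comes from.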
![Page 36: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/36.jpg)
Effect of C
C=1
C=100
![Page 37: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/37.jpg)
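Minimize $\frac{1}{2} \|w\|^2 + C \sum_i \xi_i$ (Primal Var.: $w, b, \xi$)
S. To $y_i (w^T x_i + b) \ge 1 - \xi_i$ (Lagrange Mult. $\alpha_i$) and $\xi_i \ge 0$ (Lagrange Mult. $\beta_i$)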
![Page 38: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/38.jpg)
Defining The Problems
Dual
Primal
Dual Opt
Primal Opt
![Page 39: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/39.jpg)
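Find minimizing w, b, ξ: Use Gradients
Gradient in $w$ is zero: $w = \sum_i \alpha_i y_i x_i$
Gradient in $b$ is zero: $\sum_i \alpha_i y_i = 0$
Gradient in $\xi_i$ is zero: $C - \alpha_i - \beta_i = 0$ (New Constraint)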
![Page 40: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/40.jpg)
Constraints
![Page 41: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/41.jpg)
Look Familiar!
![Page 42: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/42.jpg)
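Maximize $W(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} y_i y_j \alpha_i \alpha_j x_i^T x_j$
S. t. $0 \le \alpha_i \le C$ for all $i$ and $\sum_i \alpha_i y_i = 0$
When done optimizing set β = C − α
Convex Quadratic Program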
![Page 43: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/43.jpg)
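KKT Complementary Slackness
Optimal solution must have, for all inequality constraints: $\alpha_i \left[ y_i (w^T x_i + b) - 1 + \xi_i \right] = 0$ and $\beta_i \xi_i = 0$
We know $\beta_i = C - \alpha_i$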
![Page 44: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/44.jpg)
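On Margin: $0 < \alpha_i < C$ implies $y_i (w^T x_i + b) = 1$
Right Side: $\alpha_i = 0$ implies $y_i (w^T x_i + b) \ge 1$
Wrong Side: $\alpha_i = C$ implies $y_i (w^T x_i + b) \le 1$
Find b*: Use a point on the margin, $b^* = y_i - w^T x_i$
In practice: average over all margin points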
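The three cases and the averaging rule can be sketched as follows, assuming a linear kernel, the usual reading of the multipliers (αi = 0: right side, 0 < αi < C: on the margin, αi = C: violating the margin), and b = yi − w·xi for an on-margin point. The data, α values, and tolerance `eps` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def categorize(alpha_i, C, eps=1e-8):
    if alpha_i <= eps:
        return "right side"
    if alpha_i >= C - eps:
        return "wrong side"
    return "on margin"

def bias_from_margin_points(alpha, X, y, C, eps=1e-8):
    """Average of y_i - w . x_i over points with 0 < alpha_i < C (needs at least one)."""
    dim = len(X[0])
    w = [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(dim)]
    on_margin = [i for i, a in enumerate(alpha) if eps < a < C - eps]
    return sum(y[i] - dot(w, X[i]) for i in on_margin) / len(on_margin)

X = [(1.0, 1.0), (-1.0, -1.0), (2.0, 3.0)]
y = [1, -1, 1]
alpha = [0.25, 0.25, 0.0]
C = 10.0

categories = [categorize(a, C) for a in alpha]
b = bias_from_margin_points(alpha, X, y, C)
print(categories, b)
```

Averaging over all on-margin points instead of using a single one damps the numerical noise left over by an approximate optimizer.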
![Page 45: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/45.jpg)
S. t.
S. To
![Page 46: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/46.jpg)
Coordinate Ascent
Pick one variable, fix all the others
Solve for the picked variable
Repeat until done
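Coordinate ascent in its plainest form, on a hypothetical concave function W(a1, a2) = −a1² − a2² + a1·a2 + a1 + a2 chosen so that each one-variable maximization has a closed form. This is an unconstrained illustration, not the SVM dual itself.

```python
def W(a1, a2):
    """A concave quadratic with its maximum at (1, 1)."""
    return -a1 * a1 - a2 * a2 + a1 * a2 + a1 + a2

def coordinate_ascent(sweeps=50):
    a1, a2 = 0.0, 0.0
    for _ in range(sweeps):
        # Fix a2, maximize over a1: dW/da1 = -2*a1 + a2 + 1 = 0.
        a1 = (a2 + 1.0) / 2.0
        # Fix a1, maximize over a2: dW/da2 = -2*a2 + a1 + 1 = 0.
        a2 = (a1 + 1.0) / 2.0
    return a1, a2

a1, a2 = coordinate_ascent()
print(a1, a2, W(a1, a2))
```

Each update can only increase W, and for this function the distance to the maximizer halves with every update.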
![Page 47: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/47.jpg)
Sequential Minimal Optimization (SMO) Algorithm
S. t.
Coordinate Ascent:
Cannot change only one variable: the constraint $\sum_i \alpha_i y_i = 0$ fixes it once all the others are fixed. Take two.
![Page 48: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/48.jpg)
Algorithm Outline
Pick 2 indexes
Fix nonpicked
Optimize W for the α's selected, subject to additional constraint
Repeat Until Done
![Page 49: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/49.jpg)
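Linear Equation in α1, α2
Constraints we have
[Figure: the box 0 ≤ α1, α2 ≤ C with axes α1 and α2; the feasible line segment runs between its end points L and H]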
![Page 50: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/50.jpg)
y1 is either 1 or -1
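[Figure: the box 0 ≤ α1, α2 ≤ C with axes α1 and α2; the feasible line segment runs between its end points L and H]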
Optimize
Subject to
![Page 51: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/51.jpg)
i=j=1
i=1,j=2, i=2,j=1
i=j=2
i=1, j>2
i>2, j=2
Trying to say it is a second degree polynomial in α2
![Page 52: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/52.jpg)
= Second degree polynomial
We can maximize such things:
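[Figure: the box 0 ≤ α1, α2 ≤ C with axes α1 and α2; the feasible line segment runs between its end points L and H]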
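Maximizing a second degree polynomial over an interval comes down to computing the vertex and clipping it to the end points. A minimal sketch, assuming a concave parabola a·t² + b·t + c with a < 0; the function name and the sample coefficients are illustrative.

```python
def maximize_quadratic_on_interval(a, b, c, low, high):
    """Maximize a*t^2 + b*t + c over [low, high], assuming a < 0 (concave)."""
    if a >= 0:
        raise ValueError("expected a concave parabola (a < 0)")
    t = -b / (2.0 * a)              # unconstrained maximizer (vertex)
    t = min(high, max(low, t))      # clip to the feasible interval
    return t, a * t * t + b * t + c

# Vertex of -t^2 + 4t is at t = 2.
inside = maximize_quadratic_on_interval(-1.0, 4.0, 0.0, 0.0, 5.0)
clipped_high = maximize_quadratic_on_interval(-1.0, 4.0, 0.0, 0.0, 1.0)
clipped_low = maximize_quadratic_on_interval(-1.0, 4.0, 0.0, 3.0, 5.0)
print(inside, clipped_high, clipped_low)
```

Clipping is valid because a concave parabola is monotone on either side of its vertex, so the nearest end point is the best feasible value.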
![Page 53: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/53.jpg)
Remains
• How to pick α's
– Pick one that violates KKT or use a heuristic
– Pick another one and optimize
• Stopping Criterion
– Close enough to KKT conditions or tired of waiting
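Putting the pieces together, the following is a simplified SMO sketch, not the lecture's implementation and not Platt's full algorithm. It assumes the soft-margin dual with constraints 0 ≤ αi ≤ C and Σ αi yi = 0, picks the first α that violates the KKT conditions within tolerance `tol`, pairs it with the first other index that allows progress, and stops after `max_passes` sweeps without change. All names and default values are choices made for this example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def simplified_smo(X, y, C, kernel=dot, tol=1e-4, max_passes=5, max_sweeps=1000):
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0

    def error(i):
        # f(x_i) - y_i with f(x) = sum_k alpha_k y_k K(x_k, x) + b
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b - y[i]

    passes = 0
    for _ in range(max_sweeps):
        if passes >= max_passes:
            break
        changed = 0
        for i in range(n):
            E_i = error(i)
            violates = (y[i] * E_i < -tol and alpha[i] < C) or \
                       (y[i] * E_i > tol and alpha[i] > 0)
            if not violates:
                continue
            for j in range(n):
                if j == i:
                    continue
                E_j = error(j)
                a_i, a_j = alpha[i], alpha[j]
                # End points L and H of the feasible segment for alpha_j.
                if y[i] != y[j]:
                    low, high = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    low, high = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]  # curvature along the segment
                if low == high or eta >= 0:
                    continue
                # Maximize the 1-D quadratic, then clip to [low, high].
                new_j = min(high, max(low, a_j - y[j] * (E_i - E_j) / eta))
                if abs(new_j - a_j) < 1e-7:
                    continue
                new_i = a_i + y[i] * y[j] * (a_j - new_j)
                b1 = b - E_i - y[i] * (new_i - a_i) * K[i][i] - y[j] * (new_j - a_j) * K[i][j]
                b2 = b - E_j - y[i] * (new_i - a_i) * K[i][j] - y[j] * (new_j - a_j) * K[j][j]
                if 0 < new_i < C:
                    b = b1
                elif 0 < new_j < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2.0
                alpha[i], alpha[j] = new_i, new_j
                changed += 1
                break
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

X = [(1.0, 1.0), (-1.0, -1.0), (2.0, 3.0)]
y = [1, -1, 1]
alpha, b = simplified_smo(X, y, C=10.0)
print(alpha, b)
```

On this toy data a single pair update already reaches a point that satisfies the KKT conditions; real implementations add heuristics for choosing the pair and cache the errors instead of recomputing them.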
![Page 54: Machine Learning Weak 4 Lecture 2. Hand in Data It is online Only around 6000 images!!! Deadline is one week. Next Thursday lecture will be only one hour.](https://reader035.fdocuments.us/reader035/viewer/2022081519/56649f295503460f94c421b1/html5/thumbnails/54.jpg)
The End Of SVMs
• Except you will use them in hand in 2…