Machine Learning Week 4 Lecture 2

Page 1

Machine Learning

Week 4, Lecture 2

Page 2

Hand in Data

• It is online.
• Only around 6000 images!
• The deadline is in one week.
• Next Thursday's lecture will be only one hour and only about the hand-in.
• If you nailed it, you can stay home.

Page 3

Support Vector Machines

[Figure panels: Last Time, Today]

Page 4

Functional Margins

For each point (xi, yi) we define the functional margin.

We then define the functional margin of the hyperplane, i.e. of the parameters w, b, as a “functional distance”.
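The formulas on this slide did not survive extraction. Assuming the lecture follows the standard definitions (labels yi in {−1, +1}, hyperplane wᵀx + b = 0; the hat notation is an assumption, not taken from the slide), they would read:

```latex
\hat{\gamma}_i = y_i\,(w^{T} x_i + b), \qquad
\hat{\gamma} = \min_{i} \hat{\gamma}_i
```

A positive value means xi is classified correctly. Scaling (w, b) by a constant scales the functional margin by the same constant, which is why it is only a “functional” distance.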

Page 5

Geometric Margin

How far is xi from the hyperplane? How long is the segment from xi to L?

[Figure: hyperplane with normal vector w, the point xi, and the point L on the hyperplane closest to xi]

Derivation steps:

• Since L is on the hyperplane
• Definition of L
• Multiply in
• Solve
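The four derivation steps can be written out as follows. This is a reconstruction under the usual assumptions (yi in {−1, +1}, γi denotes the length of the segment from xi to L); the slide's own formulas were lost.

```latex
\begin{aligned}
w^{T} L + b &= 0
  && \text{(since } L \text{ is on the hyperplane)}\\
w^{T}\!\left(x_i - \gamma_i\, y_i \frac{w}{\lVert w\rVert}\right) + b &= 0
  && \text{(definition of } L\text{)}\\
w^{T} x_i - \gamma_i\, y_i \lVert w\rVert + b &= 0
  && \text{(multiply in, using } w^{T} w = \lVert w\rVert^{2}\text{)}\\
\gamma_i &= y_i\,\frac{w^{T} x_i + b}{\lVert w\rVert}
  && \text{(solve, using } y_i^{2} = 1\text{)}
\end{aligned}
```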

Page 6

Margins: Functional and Geometric

The two margins are related by ||w||.
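A minimal sketch of the relation, assuming the standard definitions (functional margin yi(wᵀxi + b), geometric margin equal to the functional margin divided by ||w||). The function names and the toy data are made up for illustration.

```python
import numpy as np

def functional_margins(w, b, X, y):
    # y_i * (w . x_i + b) for every row x_i of X
    return y * (X @ w + b)

def geometric_margins(w, b, X, y):
    # Same quantity, divided by ||w||
    return functional_margins(w, b, X, y) / np.linalg.norm(w)

# Toy data (hypothetical): three points, labels in {-1, +1}
X = np.array([[2.0, 2.0], [0.0, 0.0], [3.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([3.0, 4.0])   # ||w|| = 5
b = -7.0

print(functional_margins(w, b, X, y))            # changes if (w, b) is rescaled
print(geometric_margins(w, b, X, y))             # does not change
print(geometric_margins(10 * w, 10 * b, X, y))   # same as the line above
```

Rescaling (w, b) changes the functional margins but leaves the geometric margins untouched, which is what makes the geometric margin the quantity worth maximizing.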

Page 7

Optimizing Margins

Maximize: the geometric margin

Subject to: the point margins and a scale constraint
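The slide's formulas were lost. Matching the three labels (geometric margin, point margins, scale constraint) to the standard formulation gives the following, where γ is an assumed symbol for the margin:

```latex
\begin{aligned}
\max_{\gamma,\, w,\, b}\quad & \gamma
  && \text{(geometric margin)}\\
\text{subject to}\quad & y_i\,(w^{T} x_i + b) \ge \gamma \quad \text{for all } i
  && \text{(point margins)}\\
& \lVert w\rVert = 1
  && \text{(scale constraint)}
\end{aligned}
```

With ||w|| = 1 the functional and geometric margins coincide, so γ really is a distance.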

Page 8

Optimization

Minimize

Subject To

Quadratic programming: the problem is convex.

A functional margin of 1 means the point sits on the margin.
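A reconstruction of the problem this slide most likely shows, assuming the usual rescaling: fix the smallest functional margin to 1, so the geometric margin becomes 1/||w||, and maximizing it is the same as minimizing ||w||²/2.

```latex
\begin{aligned}
\min_{w,\, b}\quad & \tfrac{1}{2}\lVert w\rVert^{2}\\
\text{subject to}\quad & y_i\,(w^{T} x_i + b) \ge 1 \quad \text{for all } i
\end{aligned}
```

The objective is quadratic and the constraints are linear in (w, b), which is what makes this a convex quadratic program.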

Page 9

Linearly Separable SVM

Minimize

Subject To

This is a constrained problem. We need to study the theory of Lagrange multipliers to understand the SVM.

Page 10

Lagrange Multipliers

Define The Lagrangian

Only consider convex f, gi, and affine hi (method is more general)

α,β are called Lagrange Multipliers
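The Lagrangian itself is missing from the transcript. For a problem of the form "minimize f(x) subject to gi(x) ≤ 0 and hi(x) = 0" (the form implied by the KKT slide later on), the standard definition is:

```latex
\mathcal{L}(x, \alpha, \beta)
  = f(x) + \sum_{i} \alpha_i\, g_i(x) + \sum_{i} \beta_i\, h_i(x)
```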

Page 11

Primal Problem

Which is what we are looking for!!!

We denote the solution x*

Page 12

Dual Problem

α, β are dual feasible if αi ≥ 0 for all i.

This implies that, for dual feasible α, β, the value of the dual function is a lower bound on the primal optimum p*.
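The primal and dual problems can be written side by side as below. The symbols θP and θD are assumptions borrowed from the common textbook presentation; p* and d* are the names the slides use.

```latex
\begin{aligned}
\theta_P(x) &= \max_{\alpha \ge 0,\ \beta} \mathcal{L}(x, \alpha, \beta),
  & p^{*} &= \min_{x} \theta_P(x)\\
\theta_D(\alpha, \beta) &= \min_{x} \mathcal{L}(x, \alpha, \beta),
  & d^{*} &= \max_{\alpha \ge 0,\ \beta} \theta_D(\alpha, \beta)\\
\theta_D(\alpha, \beta) &\le p^{*} \ \text{ for all dual feasible } \alpha, \beta,
  & d^{*} &\le p^{*}
\end{aligned}
```

θP(x) equals f(x) when x satisfies the constraints and is infinite otherwise, so minimizing θP is the original constrained problem.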

Page 13

Weak and Strong Duality

Question: when are they equal? The conditions are technical.

Assume strong duality: d* = p*
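A small numeric sketch of weak and strong duality on a toy problem that is not from the lecture: minimize x² subject to 1 − x ≤ 0. The Lagrangian is x² + α(1 − x), its minimizer over x is x = α/2, and both optima can be found on a grid.

```python
import numpy as np

def f(x):
    return x ** 2

def g(x):
    return 1.0 - x          # constraint g(x) <= 0, i.e. x >= 1

def dual_function(alpha):
    # min over x of f(x) + alpha * g(x); setting the derivative to zero gives x = alpha / 2
    x = alpha / 2.0
    return f(x) + alpha * g(x)

# Primal optimum: smallest f over the feasible grid points
xs = np.linspace(-3.0, 3.0, 6001)
feasible = xs[g(xs) <= 1e-12]
p_star = f(feasible).min()

# Dual optimum: largest dual value over alpha >= 0
alphas = np.linspace(0.0, 5.0, 5001)
d_values = dual_function(alphas)
d_star = d_values.max()
alpha_star = alphas[d_values.argmax()]

print(p_star, d_star, alpha_star)
```

Every dual value stays below p* (weak duality), and the best one reaches it (strong duality), as expected for a convex problem of this kind.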

Page 14

Complementary Slackness

Let x* be primal optimal and α*, β* dual optimal (p* = d*).

All αi* are non-negative and all gi(x*) ≤ 0, so every product αi* gi(x*) must be zero, since the chain of inequalities is squeezed between p* and p*.

αi* gi(x*) = 0 for all i (complementary slackness)

Page 15

Karush-Kuhn-Tucker (KKT) Conditions

Let x* be primal optimal and α*, β* dual optimal (p* = d*).

• Primal feasibility: gi(x*) ≤ 0 for all i, and hi(x*) = 0 for all i
• Dual feasibility: αi* ≥ 0 for all i
• Complementary slackness: αi* gi(x*) = 0 for all i
• Stationarity: the gradient of the Lagrangian with respect to x is zero at (x*, α*, β*)

For the convex problems considered here (with strong duality), the KKT conditions are necessary and sufficient for optimality.

Page 16

Finally Back to SVM

Minimize

Subject To

Define the Lagrangian (no β required, since there are no equality constraints).
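The slide's formulas are missing. A sketch of the standard derivation, writing each constraint as 1 − yi(wᵀxi + b) ≤ 0 and using W for the dual objective as the later SMO slides do:

```latex
\begin{aligned}
\mathcal{L}(w, b, \alpha)
  &= \tfrac{1}{2}\lVert w\rVert^{2}
     - \sum_{i} \alpha_i \left[\, y_i\,(w^{T} x_i + b) - 1 \,\right]\\
\nabla_w \mathcal{L} = 0
  &\;\Rightarrow\; w = \sum_{i} \alpha_i\, y_i\, x_i\\
\frac{\partial \mathcal{L}}{\partial b} = 0
  &\;\Rightarrow\; \sum_{i} \alpha_i\, y_i = 0\\
\max_{\alpha}\ W(\alpha)
  &= \sum_{i} \alpha_i
     - \tfrac{1}{2} \sum_{i}\sum_{j} y_i\, y_j\, \alpha_i\, \alpha_j\, x_i^{T} x_j
  \quad \text{s.t. } \alpha_i \ge 0,\ \ \sum_{i} \alpha_i\, y_i = 0
\end{aligned}
```

The data enter the dual only through the inner products xiᵀxj, which is what the kernel slides exploit.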

Page 17

SVM Summary: Dual

Maximize the dual objective subject to (s. t.) the dual constraints.

Support vectors: the points with nonzero multipliers, which lie on the margin.

[Figure: separating hyperplane with normal vector w]
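A minimal sketch of how a dual solution yields the classifier, assuming the standard hard-margin dual and the relation w = Σ αi yi xi. The two-point data set is hypothetical and chosen so that the dual can be maximized by a simple grid search instead of a QP solver.

```python
import numpy as np

# Two points, one per class (hypothetical toy data)
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

def dual_objective(alpha, X, y):
    # W(alpha) = sum(alpha) - 1/2 * sum_ij y_i y_j alpha_i alpha_j x_i . x_j
    K = X @ X.T
    v = alpha * y
    return alpha.sum() - 0.5 * v @ K @ v

# With two points, sum(alpha_i * y_i) = 0 forces alpha_1 = alpha_2 = a
grid = np.linspace(0.0, 1.0, 1001)
values = np.array([dual_objective(np.array([a, a]), X, y) for a in grid])
a_star = grid[values.argmax()]
alpha = np.array([a_star, a_star])

# Recover the primal solution from the multipliers
w = (alpha * y) @ X
support = alpha > 1e-8                      # support vectors have alpha_i > 0
b = np.mean(y[support] - X[support] @ w)    # y_i (w . x_i + b) = 1 on the margin

print(alpha, w, b, y * (X @ w + b))
```

Both points end up as support vectors with functional margin exactly 1.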

Page 18

SVM Generalization

The VC dimension for hyperplanes is the number of parameters.

Theoretically speaking: why bother finding large-margin hyperplanes?

There are other bounds: a rich theory.

Page 19

Kernel Support Vector Machines

Page 20

Kernels

Nonlinear feature transforms

Define the kernel and replace the inner products with it. The two optimization problems are identical! The kernel is an inner product in another space.

Page 21

Kernels

K is an inner product in Φ space!

Page 22

Polynomial Kernel

An n^d-dimensional space!

The feature transform would take n^d time.

Computing the kernel takes n time.
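A small check of the claim for degree d = 2, assuming the kernel on the slide is K(x, z) = (xᵀz)^d (the slide's formula is lost, so this choice is an assumption). The explicit feature map below lists all n² ordered products xi·xj.

```python
import numpy as np

def poly_kernel(x, z, d=2):
    # One inner product in n dimensions, then a power: O(n) work
    return (x @ z) ** d

def explicit_features_degree2(x):
    # All n^2 ordered products x_i * x_j: O(n^2) work and memory
    return np.outer(x, x).ravel()

rng = np.random.default_rng(0)
x = rng.normal(size=5)
z = rng.normal(size=5)

lhs = poly_kernel(x, z)
rhs = explicit_features_degree2(x) @ explicit_features_degree2(z)
print(lhs, rhs)   # the two numbers agree up to rounding
```

The kernel gives the inner product in the 25-dimensional feature space without ever building the 25 features.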

Page 23

Gaussian Kernel

Think of this as a similarity measure

It is essentially 0 if x and z are not close

Page 24

Gaussian Kernel: Nonlinear Transform

Simplest case: x, z are 1D, i.e. numbers.

The kernel is the inner product between the infinitely long feature vectors of x and z.
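A sketch of both Gaussian kernel slides, assuming the parametrization K(x, z) = exp(−||x − z||² / (2σ²)), which may differ from the slide's. For 1D inputs and σ = 1, expanding exp(xz) as a power series gives the feature map Φk(x) = exp(−x²/2) · x^k / sqrt(k!), truncated here to a finite number of terms.

```python
import math
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def truncated_features(x, terms=20):
    # First `terms` coordinates of the infinite feature vector (1D input, sigma = 1)
    return np.array([
        math.exp(-x ** 2 / 2.0) * x ** k / math.sqrt(math.factorial(k))
        for k in range(terms)
    ])

x, z = 0.7, -0.4
exact = gaussian_kernel(x, z)
approx = truncated_features(x) @ truncated_features(z)
print(exact, approx)               # nearly identical

print(gaussian_kernel(0.0, 10.0))  # essentially 0 for points that are far apart
```

The kernel costs one subtraction and one exponential, while the feature vector it corresponds to has infinitely many coordinates.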

Page 25

Let's Apply It

Page 31

Kernel Matrix

Given a set of points and a kernel K(x,z) = Φ(x)^T Φ(z), the kernel matrix (same name, K) holds the kernel value for every pair of points.

If K is a valid kernel, i.e. K(x,z) = Φ(x)^T Φ(z) for some Φ, then the kernel matrix K is symmetric positive semidefinite (x^T K x ≥ 0 for all x).

Mercer kernels: positive semidefiniteness is a necessary and sufficient condition.
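A minimal numeric check of the condition, assuming a Gaussian kernel with σ = 1 as the valid example. The helper names and the counterexample (negative squared distance) are illustrative choices, not from the lecture.

```python
import numpy as np

def kernel_matrix(X, kernel):
    # K[i, j] = kernel(x_i, x_j) for every pair of rows of X
    m = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(m)] for i in range(m)])

def is_symmetric_psd(K, tol=1e-9):
    # Symmetric, and the smallest eigenvalue is non-negative up to rounding
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)

def gaussian(x, z):
    return np.exp(-np.sum((x - z) ** 2) / 2.0)

def not_a_kernel(x, z):
    # Zero diagonal and negative off-diagonal entries: the eigenvalues sum to 0,
    # so at least one of them is negative
    return -np.sum((x - z) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))

print(is_symmetric_psd(kernel_matrix(X, gaussian)))
print(is_symmetric_psd(kernel_matrix(X, not_a_kernel)))
```

Passing the check on one sample of points does not prove that a function is a kernel, but a single failure proves that it is not.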

Page 32

Kernels

• Add nonlinearity to our SVM
• Efficient computation in high and even infinite dimensional spaces
• Few support vectors (on the margin) help us in generalization (in theory) and runtime
• Kernels are not limited to SVM
• Kernel perceptrons, kernel logistic regression, …, any place where we only depend on the inner product

Page 33

Non Separable Data SVM

Page 34

Violating the Margin

[Figure: hyperplane with normal vector w; a point on the wrong side of the track, at distance ξ from its margin]

Page 35

Minimize

Subject To

If a point is on the wrong side of the margin at distance ξ, we penalize it by Cξ.

The hyperparameter C controls the competing goals of a large margin and of points being on the right side of it.

How to find C? Validation (model selection).

Does this look like regularization to you?
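A sketch of the penalized objective, assuming the standard soft-margin formulation: minimize ½||w||² + C Σ ξi subject to yi(wᵀxi + b) ≥ 1 − ξi and ξi ≥ 0. For a fixed (w, b) the smallest feasible slack is max(0, 1 − yi(wᵀxi + b)). The data and function names are hypothetical.

```python
import numpy as np

def slacks(w, b, X, y):
    # Smallest xi_i >= 0 satisfying y_i (w . x_i + b) >= 1 - xi_i
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def soft_margin_objective(w, b, X, y, C):
    # Margin term plus C times the total slack
    return 0.5 * w @ w + C * slacks(w, b, X, y).sum()

# Two well-separated points, one inside the margin, one misclassified
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0], [-0.5, 0.0]])
y = np.array([1.0, -1.0, 1.0, 1.0])
w = np.array([1.0, 0.0])
b = 0.0

print(slacks(w, b, X, y))
print(soft_margin_objective(w, b, X, y, C=1.0))
print(soft_margin_objective(w, b, X, y, C=100.0))
```

With a large C the slack term dominates, so the optimizer is pushed to fit the training points. With a small C the ½||w||² term dominates, which is the sense in which the formulation resembles regularization.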

Page 36

Effect of C

[Figure panels: C=1, C=100]

Page 37

Minimize

Subject To

Primal variables: w, b, ξ

Lagrange multipliers: α, β

Page 38

Defining The Problems

Dual

Primal

Dual Opt

Primal Opt

Page 39

Find the minimizing w, b, ξ: use gradients (set them to zero).

New Constraint
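The gradient step can be sketched as follows, assuming the standard soft-margin Lagrangian with multipliers αi for the margin constraints and βi for ξi ≥ 0 (the slide's own formulas were lost):

```latex
\begin{aligned}
\mathcal{L}(w, b, \xi, \alpha, \beta)
  &= \tfrac{1}{2}\lVert w\rVert^{2} + C \sum_{i} \xi_i
     - \sum_{i} \alpha_i \left[\, y_i\,(w^{T} x_i + b) - 1 + \xi_i \,\right]
     - \sum_{i} \beta_i\, \xi_i\\
\nabla_w \mathcal{L} = 0
  &\;\Rightarrow\; w = \sum_{i} \alpha_i\, y_i\, x_i\\
\frac{\partial \mathcal{L}}{\partial b} = 0
  &\;\Rightarrow\; \sum_{i} \alpha_i\, y_i = 0\\
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0
  &\;\Rightarrow\; C - \alpha_i - \beta_i = 0
  \qquad \text{(the new constraint)}
\end{aligned}
```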

Page 40

Constraints

Page 41

Looks Familiar!

Page 42

S. t.

When done optimizing, set β = C − α.

This is a convex quadratic program.
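The dual this slide refers to is most likely the standard soft-margin dual, reconstructed here. Since βi ≥ 0 and βi = C − αi, the multipliers β can be eliminated and leave only the upper bound αi ≤ C.

```latex
\begin{aligned}
\max_{\alpha}\quad & W(\alpha) = \sum_{i} \alpha_i
  - \tfrac{1}{2} \sum_{i}\sum_{j} y_i\, y_j\, \alpha_i\, \alpha_j\, x_i^{T} x_j\\
\text{s.t.}\quad & 0 \le \alpha_i \le C \quad \text{for all } i\\
& \sum_{i} \alpha_i\, y_i = 0
\end{aligned}
```

Compared with the separable case, the only change is the box constraint on each αi.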

Page 43

KKT Complementary Slackness

The optimal solution must satisfy complementary slackness for all inequality constraints.

We know that βi = C − αi.

Page 44

Cases: on the margin, on the right side, on the wrong side.

Find b*: use a point on the margin.

In practice: average over the margin points.
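A sketch of the averaging recipe, assuming a linear kernel and the usual reading of complementary slackness: αi = 0 for points on the right side, 0 < αi < C for points exactly on the margin, and αi = C for points that may violate it. The function name, tolerance, and toy data are hypothetical.

```python
import numpy as np

def intercept_from_margin_points(alpha, X, y, C, tol=1e-8):
    w = (alpha * y) @ X
    on_margin = (alpha > tol) & (alpha < C - tol)
    # On the margin y_i (w . x_i + b) = 1, so b = y_i - w . x_i (because y_i is +1 or -1)
    return np.mean(y[on_margin] - X[on_margin] @ w)

# Toy data: the first two points are on the margin, the third is far on the right side
X = np.array([[3.0, 1.0], [1.0, 1.0], [5.0, 1.0]])
y = np.array([1.0, -1.0, 1.0])
alpha = np.array([0.5, 0.5, 0.0])   # dual solution for this data
C = 1.0

b = intercept_from_margin_points(alpha, X, y, C)
w = (alpha * y) @ X
print(w, b, y * (X @ w + b))
```

Averaging over all margin points rather than using a single one reduces the effect of numerical error in the multipliers.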

Page 45

S. t.

S. To

Page 46

Coordinate Ascent

• Pick one variable and fix all the others
• Solve the resulting one-variable problem
• Repeat until done
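A minimal sketch of coordinate ascent on an unconstrained concave quadratic W(a) = cᵀa − ½ aᵀAa. This is an illustrative stand-in, not the SVM dual, which has constraints that the SMO slides deal with.

```python
import numpy as np

def coordinate_ascent(A, c, iters=100):
    """Maximize W(a) = c.a - 0.5 * a^T A a for a symmetric positive definite A."""
    a = np.zeros(len(c))
    for _ in range(iters):
        for i in range(len(c)):
            # Fix every coordinate except a_i and solve dW/da_i = 0 exactly:
            # c_i - sum_j A_ij a_j = 0
            rest = A[i] @ a - A[i, i] * a[i]
            a[i] = (c[i] - rest) / A[i, i]
    return a

A = np.array([[2.0, 0.5], [0.5, 1.0]])
c = np.array([1.0, 1.0])

a = coordinate_ascent(A, c)
print(a, np.linalg.solve(A, c))   # both give the maximizer of W
```

Each inner step is a one-dimensional problem with a closed-form solution, which is the property SMO relies on as well.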

Page 47

Sequential Minimal Optimization (SMO) Algorithm

S. t.

Coordinate ascent: we cannot change only one variable, because the equality constraint would be violated. So take two.

Page 48

Algorithm Outline

• Pick 2 indices
• Fix the non-picked α’s
• Optimize W for the selected α’s, subject to the additional constraint
• Repeat until done

Page 49

Constraints we have

Linear equation in α1, α2

[Figure: the box from 0 to C on the axes α1 and α2, crossed by the constraint line; L and H mark the lower and upper ends of the feasible segment]

Page 50

y1 is either 1 or −1

[Figure: the box from 0 to C on the axes α1 and α2, with the bounds L and H]

Optimize

Subject to

Page 51

Terms of the double sum, split by index:

• i=j=1
• i=1, j=2 and i=2, j=1
• i=j=2
• i=1, j>2
• i>2, j=2

The point is that the objective is a second-degree polynomial in α2.

Page 52

= Second-degree polynomial

We can maximize such things.

[Figure: the box from 0 to C on the axes α1 and α2, with the bounds L and H]
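A sketch of one pair update, assuming the standard SMO formulas: take the unconstrained maximum of the second-degree polynomial, clip it to [L, H], then restore the equality constraint through the other variable. The function name and the convention of skipping degenerate pairs are choices made for this sketch.

```python
import numpy as np

def smo_pair_update(alpha, i, j, K, y, C):
    """One SMO step: maximize the dual over alpha[i], alpha[j], all others fixed."""
    f = (alpha * y) @ K                 # decision values without b (b cancels in E_i - E_j)
    E_i, E_j = f[i] - y[i], f[j] - y[j]
    if y[i] != y[j]:
        L, H = max(0.0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
    else:
        L, H = max(0.0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
    eta = K[i, i] + K[j, j] - 2.0 * K[i, j]    # curvature along the constraint line
    if eta <= 0 or L >= H:
        return alpha                            # skip degenerate pairs in this sketch
    new = alpha.copy()
    new[j] = np.clip(alpha[j] + y[j] * (E_i - E_j) / eta, L, H)
    new[i] = alpha[i] + y[i] * y[j] * (alpha[j] - new[j])   # keeps sum(alpha * y) fixed
    return new

X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])
K = X @ X.T                                     # linear kernel matrix

print(smo_pair_update(np.zeros(2), 0, 1, K, y, C=1.0))   # interior optimum
print(smo_pair_update(np.zeros(2), 0, 1, K, y, C=0.1))   # clipped at H
```

With a large enough C the update lands on the unconstrained maximum. With a small C it stops at the end of the feasible segment.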

Page 53

Remains

• How to pick α’s
  – Pick one that violates KKT, or use a heuristic
  – Pick another one and optimize
• Stopping criterion
  – Close enough to the KKT conditions, or tired of waiting

Page 54

The End Of SVMs

• Except you will use them in hand in 2…