Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf ·...

35
Kernels (SVMs, Logistic Regression) Aarti Singh Machine Learning 101315 Oct 2, 2019

Transcript of Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf ·...

Page 1: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Kernels'(SVMs,'Logistic'Regression)

Aarti&Singh

Machine&Learning&101315Oct&2,&2019

Page 2: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Constrained+optimization+– dual+problem

2

b++ve

Primal+problem:

Moving+the+constraint+to+objective+functionLagrangian:

Dual+problem:If$strong$duality$holds,$then$d*$=$p*$and$x*,$!*$satisfy$KKT$conditions$including

!*(x*<b)$=$0

Page 3: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Dual%SVM%– linearly%separable%case

• Primal'problem:

• Dual'problem'(derivation):

3

w – weights on features (d-dim problem)

! – weights on training pts (n-dim problem)

n'training'points,'d'features (x1,'…,'xn)'where'xi is'a'd?dimensional'vector'

Page 4: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Dual%SVM%– linearly%separable%case

Dual%problem%is%also%QPSolution%gives%!js

4

Use%support%vectors%with%!k>0%to%compute%b%since%constraint%is%tight%(w.xk +%b)yk =%1

Page 5: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Dual%SVM%– non,separable%case

5

• Primal(problem:

• Dual(problem:((Lagrange%Multipliers

,{ξj}(

,{ξj}(L(w, b, ⇠,↵, µ)

Page 6: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Dual%SVM%– non,separable%case

6

Dual&problem&is&also&QPSolution&gives&!js

comes&from Intuition:If&C→∞,&recover&hard@margin&SVM

@L

@⇠= 0

Page 7: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

So#why#solve#the#dual#SVM?

• There%are%some%quadratic%programming%algorithms%that%can%solve%the%dual%faster%than%the%primal,%(specially%in%high%dimensions%d>>n)

• But,%more%importantly,%the%“kernel#trick”!!!

7

Page 8: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Separable(using(higher/order(features

8

x1

x2

r&=&√x12+x22

!

x1

x1

x 12

Page 9: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

What%if%data%is%not%linearly%separable?

9

Use%features%of%features%of%features%of%features….

Feature(space(becomes(really(large(very(quickly!

Φ(x)(=((x12,(x22,(x1x2,(….,(exp(x1))

Page 10: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Higher'Order'Polynomials

10

m$– input$features$ d$– degree$of$polynomial

grows$fast!d$=$6,$m$=$100about$1.6$billion$terms

Page 11: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Dual%formulation%only%depends%on%dot2products,%not%on%w!

11

Φ(x)%– High+dimensional%feature%space,%but%never%need%it%explicitly%as%long%as%we%can%compute%the%dot%product%fast%using%some%Kernel%K

Page 12: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Dot$Product$of$Polynomials

12

d=1

d=2

d

Page 13: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Finally:(The(Kernel(Trick!

13

• Never'represent'features'explicitly– Compute'dot'products'in'closed'

form

• Constant8time'high8dimensional'dot8products'for'many'classes'of'features

Page 14: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Common%Kernels

14

• Polynomials,of,degree,d

• Polynomials,of,degree,up,to,d

• Gaussian/Radial,kernels,(polynomials,of,all,orders,– recall,series,expansion,of,exp)

• Sigmoid

Page 15: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Mercer%Kernels

15

What'functions'are'valid'kernels'that'correspond'to'feature'vectors'!(x)?

Answer:'Mercer'kernels'K• K'is'continuous'• K'is'symmetric• K'is'positive'semi?definite,'i.e.''xTKx ≥'0'for'all'x

Page 16: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Overfitting

16

• Huge'feature'space'with'kernels,'what'about'overfitting???–Maximizing'margin'leads'to'sparse'set'of'support'vectors'

– Some'interesting'theory'says'that'SVMs'search'for'simple'hypothesis'with'large'margin

– Often'robust'to'overfitting

Page 17: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

What%about%classification%time?

17

• For&a&new&input&x,&if&we&need&to&represent&!(x),&we&are&in&trouble!• Recall&classifier:&sign(w.!(x)+b)

• Using&kernels&we&are&cool!

Page 18: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels

18

• Choose(a(set(of(features(and(kernel(function• Solve(dual(problem(to(obtain(support(vectors(!i

• At(classification(time,(compute:

Classify%as

Page 19: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels• Iris%dataset,%2%vs 13,%Linear%Kernel

19

Page 20: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels• Iris%dataset,%1%vs 23,%Polynomial%Kernel%degree%2

20

Page 21: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels• Iris%dataset,%1%vs 23,%Gaussian%RBF%kernel

21

Page 22: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels• Iris%dataset,%1%vs 23,%Gaussian%RBF%kernel

22

Page 23: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels• Chessboard*dataset,*Gaussian*RBF*kernel

23

Page 24: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%with%Kernels• Chessboard*dataset,*Polynomial*kernel

24

Page 25: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Corel&Dataset

25

Page 26: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Corel&Dataset

26Olivier)Chapelle 1998

Page 27: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

USPS$Handwritten$digits

27

Page 28: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%vs.%Logistic%Regression

28

SVMs LogisticRegression

Loss/function Hinge&loss Log+loss

Page 29: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%vs.%Logistic%Regression

29

SVM : Hinge%loss

051%loss

051 1

Logistic.Regression :.Log%loss% (.4ve log.conditional.likelihood)

Hinge%lossLog%loss

Page 30: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%vs.%Logistic%Regression

30

SVMs LogisticRegression

Loss/function Hinge&loss Log+loss

High/dimensional/features/with/kernels

Yes! Yes!

Solution/sparse Often&yes! Almost&always&no!

Semantics/of/output

“Margin” Real&probabilities

Page 31: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

Kernels'in'Logistic'Regression

31

• Define(weights(in(terms(of(features:

• Derive(simple(gradient(descent(rule(on(!i

Page 32: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%vs.%Logistic%Regression

32

SVMs LogisticRegression

Loss/function Hinge&loss Log+loss

High/dimensional/features/with/kernels

Yes! Yes!

Solution/sparse Often&yes! Almost&always&no!

Semantics/of/output

“Margin” Real&probabilities

Page 33: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%vs.%Logistic%Regression

33

SVMs LogisticRegression

Loss/function Hinge&loss Log+loss

High/dimensional/features/with/kernels

Yes! Yes!

Solution/sparse Often&yes! Almost&always&no!

Semantics/of/output

“Margin” Real&probabilities

Page 34: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

SVMs%vs.%Logistic%Regression

34

SVMs LogisticRegression

Loss/function Hinge&loss Log+loss

High/dimensional/features/with/kernels

Yes! Yes!

Solution/sparse Often&yes! Almost&always&no!

Semantics/of/output

“Margin” Real&probabilities

Page 35: Kernels'(SVMs,'Logistic' Regression)aarti/Class/10315_Fall19/lecs/Lecture11.pdf · Kernels'(SVMs,'Logistic' Regression) Aarti&Singh Machine&Learning&101315 Oct&2,&2019

What%you%need%to%know• Maximizing)margin• Derivation)of)SVM)formulation• Slack)variables)and)hinge)loss• Relationship)between)SVMs)and)logistic)regression

– 0/1)loss– Hinge)loss– Log)loss

• Tackling)multiple)class– One)against)All– Multiclass SVMs

• Dual)SVM)formulation– Easier to solve)when dimension high)d >)n– Kernel Trick 35