Introduction to Kernel Methods
ML Workshop, ISI Kolkata

Chiranjib Bhattacharyya
Machine Learning Lab, Dept of CSA, IISc
chiru@csa.iisc.ernet.in
http://drona.csa.iisc.ernet.in/~chiru

19th Oct, 2012
Introduction
- Kernel methods make Machine Learning more applicable.
- Kernels are similarity measures.
- Kernels can help integrate different sources of data.
Agenda
1 Kernel Trick: SVM and Non-linear Classification
2 Definition of Kernel Functions
3 Kernels and Hilbert Spaces: RKHS, the Representer Theorem, etc.
PART 1: KERNEL TRICK
Binary classification
Classifier
$f : \mathcal{X} \to \{-1, 1\}$, $\quad f(x) = \mathrm{sign}(w^\top x + b)$

Data: $D = \{ (x_i, y_i) \mid i = 1, \dots, m \}$, $\ x_i \in \mathcal{X}$, $\ y_i \in \{-1, 1\}$

Find $f$ from $D$.
Review of C-SVM
$$\min_{w,b}\; C \sum_{i=1}^{m} \max\big(1 - y_i(w^\top x_i + b),\, 0\big) + \frac{1}{2}\|w\|^2$$

C-SVM dual formulation

$$\begin{aligned}
\underset{\alpha}{\text{maximize}}\quad & -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\, x_i^\top x_j + \sum_{i=1}^{m}\alpha_i \\
\text{subject to}\quad & 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0
\end{aligned}$$

At optimality $w = \sum_{i=1}^{m} \alpha_i y_i x_i$, so

$$f(x) = \mathrm{sign}\Big(\sum_{i=1}^{m} \alpha_i y_i\, x_i^\top x + b\Big)$$
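The optimality condition $w = \sum_{i=1}^{m} \alpha_i y_i x_i$ can be checked numerically. A minimal sketch, assuming scikit-learn (whose SVC stores $\alpha_i y_i$ for the support vectors in dual_coef_; all other $\alpha_i$ are zero at optimality):

```python
# A minimal sketch (assuming scikit-learn): verify w = sum_i alpha_i y_i x_i
# for a linear C-SVM fitted on separable toy data.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only.
w = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w, clf.coef_))   # True: w is a weighted sum of support vectors
```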
C-SVM in feature spaces
Let us work with a feature map $\Phi(x)$.

$$\begin{aligned}
\underset{\alpha}{\text{maximize}}\quad & -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\, \Phi(x_i)^\top \Phi(x_j) + \sum_{i=1}^{m}\alpha_i \\
\text{subject to}\quad & 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0
\end{aligned}$$

and our classifier is

$$f(x) = \mathrm{sign}\Big(\sum_{i=1}^{m} \alpha_i y_i\, \Phi(x_i)^\top \Phi(x) + b\Big)$$

Let the dot product between any pair of examples, computed in the feature space, be denoted by

$$K(x, z) = \Phi(x)^\top \Phi(z)$$
C-SVM in feature spaces
Substituting $K$ for every feature-space dot product:

$$\begin{aligned}
\underset{\alpha}{\text{maximize}}\quad & -\frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j\, K(x_i, x_j) + \sum_{i=1}^{m}\alpha_i \\
\text{subject to}\quad & 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0
\end{aligned}$$

and our classifier is

$$f(x) = \mathrm{sign}\Big(\sum_{i=1}^{m} \alpha_i y_i\, K(x_i, x) + b\Big)$$

Both the dual and the classifier access the data only through $K(x, z) = \Phi(x)^\top \Phi(z)$.
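Since the data enter only through the Gram matrix, the SVM can be trained without ever materializing $\Phi$. A minimal sketch, assuming scikit-learn (its precomputed-kernel mode takes the train Gram matrix at fit time and the test-vs-train kernel values at prediction time):

```python
# A minimal sketch (assuming scikit-learn): solve the C-SVM dual from a
# precomputed Gram matrix K alone, never forming Phi explicitly.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_circles(n_samples=200, noise=0.1, factor=0.4, random_state=0)
X_tr, y_tr, X_te, y_te = X[:150], y[:150], X[150:], y[150:]

clf = SVC(kernel="precomputed", C=1.0)
clf.fit(rbf_kernel(X_tr, X_tr, gamma=1.0), y_tr)       # train on K(x_i, x_j)

# Prediction needs only K(x, x_i) between test and training points.
acc = clf.score(rbf_kernel(X_te, X_tr, gamma=1.0), y_te)
print(f"test accuracy: {acc:.2f}")
```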
An example
Let $x \in \mathbb{R}^2$ and $\Phi(x) = [x_1^2 \;\; x_2^2 \;\; \sqrt{2}\, x_1 x_2]^\top$. Then

$$K(x, z) = \Phi(x)^\top \Phi(z) = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2 = \langle x, z \rangle^2$$

In general, $K(x, z) = (x^\top z)^r$ is a dot product in a $\binom{d+r-1}{r}$-dimensional feature space for $x, z \in \mathbb{R}^d$.

If $d = 256$ and $r = 4$, the feature space has dimension $\binom{259}{4} = 183{,}181{,}376$.

However, if we know $K$, one can still solve the SVM formulation without explicitly evaluating $\Phi$.
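A minimal numeric check of the $d = 2$, $r = 2$ example above, assuming numpy:

```python
# A minimal sketch: the explicit 3-d feature map Phi reproduces the
# polynomial kernel <x, z>^2 evaluated directly in the 2-d input space.
import numpy as np

def phi(x):
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)

lhs = phi(x) @ phi(z)        # dot product in feature space
rhs = (x @ z) ** 2           # kernel in input space
print(np.isclose(lhs, rhs))  # True
```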
Kernel function
$K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is a kernel function if

- $K(x, z) = K(z, x)$ (symmetric)
- $K$ is positive semidefinite, i.e. for all $n$ and all $x_1, \dots, x_n \in \mathcal{X}$, the matrix $K_{ij} = K(x_i, x_j)$ is psd.

Recall that a symmetric matrix $K \in \mathbb{R}^{d \times d}$ is psd if $u^\top K u \ge 0$ for all $u \in \mathbb{R}^d$.
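A minimal sketch of the definition in practice, assuming numpy: form the Gram matrix of the Gaussian kernel (shown to be valid shortly) on arbitrary points and check both conditions:

```python
# A minimal sketch: check symmetry and positive semidefiniteness of a
# Gram matrix K_ij = K(x_i, x_j) built from the Gaussian kernel.
import numpy as np

def gaussian_kernel(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                    # 50 arbitrary points in R^3
K = np.array([[gaussian_kernel(a, b) for b in X] for a in X])

print(np.allclose(K, K.T))                      # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)    # all eigenvalues >= 0 (up to round-off)
```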
Examples of Kernel functions

$K(x, z) = \Phi(x)^\top \Phi(z)$, where $\Phi : \mathcal{E} \to \mathbb{R}^d$, is always a kernel function:

- $K$ is symmetric, i.e. $K(x, z) = K(z, x)$.
- Positive semidefinite: let $D = \{x_1, x_2, \dots, x_n\}$ be a set of $n$ arbitrarily chosen elements of $\mathcal{E}$, and define $K_{ij} = \Phi(x_i)^\top \Phi(x_j)$. For any $u \in \mathbb{R}^n$ it is straightforward to see that

$$u^\top K u = \|\Phi(D)\, u\|_2^2 \ge 0, \qquad \Phi(D) = [\Phi(x_1), \dots, \Phi(x_n)]$$
Examples of Kernel functions
- Linear: $K(x, z) = x^\top z$, with $\Phi(x) = x$
- Polynomial: $K(x, z) = (x^\top z)^r$, with $\Phi_{t_1 t_2 \dots t_d}(x) = \sqrt{\dfrac{r!}{t_1!\, t_2! \cdots t_d!}}\; x_1^{t_1} x_2^{t_2} \cdots x_d^{t_d}$, $\quad \sum_{i=1}^{d} t_i = r$
- Gaussian: $K(x, z) = e^{-\gamma \|x - z\|^2}$
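All three are available off the shelf. A minimal sketch, assuming scikit-learn's pairwise kernel helpers, checking that each Gram matrix is psd:

```python
# A minimal sketch (assuming scikit-learn): evaluate the three kernels
# above on the same data and confirm each Gram matrix is psd.
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))

K_lin = linear_kernel(X)                                        # x^T z
K_poly = polynomial_kernel(X, degree=3, gamma=1.0, coef0=0.0)   # (x^T z)^3
K_rbf = rbf_kernel(X, gamma=0.5)                                # exp(-gamma ||x - z||^2)

for K in (K_lin, K_poly, K_rbf):
    print(np.linalg.eigvalsh(K).min() >= -1e-10)                # True for all three
```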
Kernel Construction
$K(x, y) = \Phi(x)^\top \Phi(y)$ is a kernel for any feature map $\Phi$. Let $K_1$ and $K_2$ be two valid kernels. Then the following are also valid kernels:

- Product: $K(u, v) = K_1(u, v)\, K_2(u, v)$
- Conic combination: $K = \alpha K_1 + \beta K_2$, $\quad \alpha, \beta \ge 0$
- Normalization: $\tilde{K}(x, y) = \dfrac{K(x, y)}{\sqrt{K(x, x)}\, \sqrt{K(y, y)}}$

These rules build up the Gaussian kernel step by step:

- $K(x, y) = x^\top y$ is a kernel
- $K(x, y) = (x^\top y)^i$ is a kernel (repeated products)
- $K(x, y) = \lim_{N \to \infty} \sum_{i=0}^{N} \dfrac{(x^\top y)^i}{i!} = e^{x^\top y}$ is a kernel (a limit of conic combinations)
- Normalizing $e^{x^\top y}$ gives $K(x, y) = e^{-\frac{1}{2}\|x - y\|^2}$, as the sketch below verifies numerically
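A minimal numeric check of that last step, assuming numpy:

```python
# A minimal sketch: normalizing K(x, y) = exp(x^T y) yields the Gaussian
# kernel exp(-||x - y||^2 / 2).
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)

K = lambda a, b: np.exp(a @ b)            # exp kernel, the limit above
normalized = K(x, y) / (np.sqrt(K(x, x)) * np.sqrt(K(y, y)))
gaussian = np.exp(-0.5 * np.sum((x - y) ** 2))
print(np.isclose(normalized, gaussian))   # True
```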
Kernel function and feature map
A theorem due to Mercer guarantees a feature map for symmetric, psd kernel functions. Loosely stated: for a symmetric function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, there exists an expansion $K(x, z) = \Phi(x)^\top \Phi(z)$ iff

$$\int_{\mathcal{X}} \int_{\mathcal{X}} g(x)\, g(z)\, K(x, z)\, dx\, dz \ge 0$$

for all square-integrable functions $g$.
PART 2: Kernels and Hilbert spaces
What is a Dot Product (aka Inner Product)?

Let $\mathcal{X}$ be a vector space. A dot product $\langle \cdot, \cdot \rangle$ on $\mathcal{X}$ satisfies:

- Symmetry: $\langle u, v \rangle = \langle v, u \rangle$ for $u, v \in \mathcal{X}$
- Bilinearity: $\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle$ for $u, v, w \in \mathcal{X}$
- Positive definiteness: $\langle u, u \rangle \ge 0$ for $u \in \mathcal{X}$, with $\langle u, u \rangle = 0$ iff $u = 0$

Norm

$$\|x\| = \sqrt{\langle x, x \rangle}, \qquad \|x\| = 0 \implies x = 0$$
Examples of Dot products
- $\mathcal{X} = \mathbb{R}^n$, $\langle u, v \rangle = u^\top v$
- $\mathcal{X} = \mathbb{R}^n$, $\langle u, v \rangle = \sum_{i=1}^{n} \lambda_i u_i v_i$ with $\lambda_i > 0$
- $\mathcal{X} = L_2(X) = \big\{ f : \int_X f(x)^2\, dx < \infty \big\}$, with $\langle f, g \rangle = \int_X f(x)\, g(x)\, dx$ for $f, g \in \mathcal{X}$
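A minimal sketch of the last two examples, assuming numpy (the weights, grid, and test functions are illustrative choices):

```python
# A minimal sketch: the weighted dot product on R^4, and a grid average
# approximating the L2 inner product on [0, 1].
import numpy as np

rng = np.random.default_rng(0)
lam = rng.uniform(0.1, 2.0, size=4)               # lambda_i > 0
u, v = rng.normal(size=4), rng.normal(size=4)
print(np.isclose(u @ (lam * v), v @ (lam * u)))   # symmetry
print(u @ (lam * u) >= 0)                         # positivity

xs = np.linspace(0.0, 1.0, 10_001)                # uniform grid on [0, 1]
f, g = np.sin(2 * np.pi * xs), np.cos(2 * np.pi * xs)
print(np.mean(f * g))                             # ~0: sin and cos are orthogonal in L2
```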
Cauchy-Schwarz Inequality

Let $\mathcal{X}$ be an inner product space. Then

$$|\langle x, y \rangle| \le \|x\|\, \|y\| \quad \forall\, x, y \in \mathcal{X}$$

and equality holds iff $x = \alpha y$ for some scalar $\alpha$.

Proof: For all $\alpha \in \mathbb{R}$, $\|x - \alpha y\|^2 \ge 0$, i.e.

$$\|x\|^2 - 2\alpha \langle x, y \rangle + \alpha^2 \|y\|^2 \ge 0 \quad \forall\, \alpha$$

The case $y = 0$ is trivial; otherwise set $\alpha = \frac{\langle x, y \rangle}{\|y\|^2}$, which yields $\|x\|^2 - \frac{\langle x, y \rangle^2}{\|y\|^2} \ge 0$, and the inequality follows by taking square roots. The claim about equality follows from the definition of the norm: equality forces $\|x - \alpha y\| = 0$, i.e. $x = \alpha y$.
Hilbert Space: Basic facts
Defn: An inner product space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$ is a Hilbert space if it is separable and complete. We will denote the norm by $\| \cdot \|_{\mathcal{H}}$. The orthogonal complement of a subspace $M \subset \mathcal{H}$ is defined as

$$M^{\perp} = \{ z \mid \langle x, z \rangle_{\mathcal{H}} = 0 \ \forall x \in M \}$$
Hilbert Space Projection Theorem

Let $M$ be a closed subspace of a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle_{\mathcal{H}})$. For every $x \in \mathcal{H}$ the following hold:

- There exists a unique $\Pi_M(x) \in M$ such that $\Pi_M(x) = \operatorname{argmin}_{z \in M} \|x - z\|_{\mathcal{H}}$.
- $x - \Pi_M(x) \in M^{\perp}$, i.e. $\langle z,\, x - \Pi_M(x) \rangle_{\mathcal{H}} = 0$ for all $z \in M$.
- $\|x\|_{\mathcal{H}}^2 = \|\Pi_M(x)\|_{\mathcal{H}}^2 + \|y\|_{\mathcal{H}}^2$, where $x = \Pi_M(x) + y$ with $y \in M^{\perp}$.

A concrete finite-dimensional instance is sketched below.
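A minimal sketch, assuming numpy, with $\mathcal{H} = \mathbb{R}^5$ and $M$ the column span of a matrix $A$ (an illustrative choice); least squares computes $\Pi_M(x)$:

```python
# A minimal sketch of the projection theorem in R^5: project x onto
# M = span of A's columns and check the two stated properties.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))                   # M = column span of A
x = rng.normal(size=5)

coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
proj = A @ coeffs                             # Pi_M(x), the closest point in M
resid = x - proj                              # y = x - Pi_M(x)

print(np.allclose(A.T @ resid, 0))                        # resid lies in M_perp
print(np.isclose(x @ x, proj @ proj + resid @ resid))     # Pythagoras
```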
Reproducing Kernel Hilbert Space (RKHS)

Let $K$ be any kernel function. Consider the following set:

$$\mathcal{H} = \Big\{ f \,\Big|\, f(\cdot) = \sum_{i=1}^{m} \alpha_i K(\cdot, x_i), \ x_i \in \mathcal{X}, \ m \in \mathbb{N} \Big\}$$

Dot product

For any $f, g \in \mathcal{H}$,

$$f(\cdot) = \sum_{i=1}^{m_1} \alpha_i K(\cdot, x_i), \qquad g(\cdot) = \sum_{j=1}^{m_2} \beta_j K(\cdot, z_j)$$

$$\langle f, g \rangle_{\mathcal{H}} = \sum_{i=1}^{m_1} \sum_{j=1}^{m_2} \alpha_i \beta_j K(x_i, z_j)$$

Is it a dot product?
Reproducing Kernel Hilbert Space (RKHS)

- As $K$ is symmetric, $\langle f, g \rangle_{\mathcal{H}} = \langle g, f \rangle_{\mathcal{H}}$.
- Positive semidefinite:

$$\langle f, f \rangle_{\mathcal{H}} = \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j K(x_i, x_j)$$

Recall that the Gram matrix $K_{ij} = K(x_i, x_j)$ is psd whenever $K$ is a kernel function, so $\langle f, f \rangle_{\mathcal{H}} = \alpha^\top K \alpha \ge 0$.

Reproducing property

For any $f \in \mathcal{H}$,

$$f(x) = \sum_{i=1}^{m} \alpha_i K(x, x_i) = \Big\langle \sum_{i=1}^{m} \alpha_i K(\cdot, x_i),\, K(\cdot, x) \Big\rangle = \langle f(\cdot), K(\cdot, x) \rangle$$

Applying the Cauchy-Schwarz inequality, $|f(x)| \le \sqrt{\langle f, f \rangle_{\mathcal{H}}}\, \sqrt{K(x, x)}$, so $|f(x)| = 0$ for every $x$ whenever $\langle f, f \rangle_{\mathcal{H}} = 0$. A numeric check of these properties follows below.
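A minimal sketch, assuming numpy and the Gaussian kernel: build $f = \sum_i \alpha_i K(\cdot, x_i)$, compute $\langle f, f \rangle_{\mathcal{H}} = \alpha^\top K \alpha$, and check the Cauchy-Schwarz bound:

```python
# A minimal sketch: the reproducing property gives f(x) = <f, K(., x)>,
# and Cauchy-Schwarz bounds |f(x)| by sqrt(<f, f>_H) * sqrt(K(x, x)).
import numpy as np

def K(a, b):
    return np.exp(-0.5 * np.sum((a - b) ** 2))   # Gaussian kernel

rng = np.random.default_rng(0)
Xs = rng.normal(size=(10, 2))                    # centers x_1, ..., x_m
alpha = rng.normal(size=10)

G = np.array([[K(a, b) for b in Xs] for a in Xs])
f_norm_sq = alpha @ G @ alpha                    # <f, f>_H = alpha^T K alpha

x = rng.normal(size=2)
f_x = sum(a_i * K(x, x_i) for a_i, x_i in zip(alpha, Xs))   # f(x)
print(abs(f_x) <= np.sqrt(f_norm_sq) * np.sqrt(K(x, x)))    # True
```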
Representer theorem
Let $K$ be a valid kernel defined on $\mathcal{X}$ and let $\mathcal{H}$ be the corresponding RKHS. Let $\Omega$ be an increasing function. The optimization problem

$$\min_{g \in \mathcal{H}}\; G(g) = \sum_{i=1}^{m} l(g(x_i), y_i) + \Omega(\|g\|_{\mathcal{H}}^2)$$

is solved by some $g^* = \sum_{i=1}^{m} \alpha_i K(\cdot, x_i)$.

Proof: Let $M = \big\{ \sum_{i=1}^{m} \alpha_i K(\cdot, x_i) \mid \alpha \in \mathbb{R}^m \big\}$. Clearly $M$ is a subspace of $\mathcal{H}$. Take any $g \in \mathcal{H}$ and write $g = g_M + g_{\perp}$ with $g_M = \Pi_M(g)$ and $g_{\perp} \in M^{\perp}$. By the reproducing property,

$$g(x_i) = \langle g, K(\cdot, x_i) \rangle = \langle g_M + g_{\perp}, K(\cdot, x_i) \rangle = \langle g_M, K(\cdot, x_i) \rangle + \langle g_{\perp}, K(\cdot, x_i) \rangle = \langle g_M, K(\cdot, x_i) \rangle = g_M(x_i)$$

so the loss terms are unchanged. As $\Omega$ is an increasing function and $\|g\|_{\mathcal{H}}^2 = \|g_M\|_{\mathcal{H}}^2 + \|g_{\perp}\|_{\mathcal{H}}^2$, we get $\Omega(\|g\|_{\mathcal{H}}^2) \ge \Omega(\|g_M\|_{\mathcal{H}}^2)$, hence $G(g_M) \le G(g)$ and a minimizer can always be taken in $M$.
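A minimal sketch of the theorem in action, assuming numpy: for squared loss and $\Omega(t) = \lambda t$ (kernel ridge regression, a standard instance not spelled out in the slides), the coefficients have the closed form $\alpha = (K + \lambda I)^{-1} y$:

```python
# A minimal sketch: kernel ridge regression, whose minimizer has the
# form g* = sum_i alpha_i K(., x_i) promised by the representer theorem.
import numpy as np

def K(a, b):
    return np.exp(-0.5 * np.sum((a - b) ** 2))   # Gaussian kernel

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

lam = 0.1
G = np.array([[K(a, b) for b in X] for a in X])
alpha = np.linalg.solve(G + lam * np.eye(30), y)             # (K + lambda I)^-1 y

g_star = lambda x: sum(a_i * K(x, x_i) for a_i, x_i in zip(alpha, X))
print(g_star(np.array([0.5])), np.sin(0.5))                  # fit vs. ground truth
```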
References

- B. Schölkopf, K. Tsuda, J.-P. Vert (eds.), Kernel Methods in Computational Biology, 2004.
- J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, 2004.
- B. Schölkopf and A. Smola, Learning with Kernels, 2002.