Transcript of 10-701/15-781 Recitation: Kernels (KernelTheory.pdf)

Page 1:

10-701/15-781 Recitation : Kernels

Manojit Nandi

February 27, 2014

Page 2:

Outline

Mathematical Theory
  Banach Spaces and Hilbert Spaces
  Kernels
  Commonly Used Kernels

Kernel Theory
  One Weird Kernel Trick
  Representer Theorem

Do You Want to Build a ~~Snowman~~ Kernel? It Doesn't Have to Be a ~~Snowman~~ Kernel.
  Operations that Preserve/Create Kernels

Page 3:

10-701/15-781 Kernel Theory Recitation

Current Research in Kernel Theory
  Flaxman Test
  Fastfood

References

Manojit Nandi | Carnegie Mellon University — Machine Learning Department 3/36

Page 4:

Page 5:

IMPORTANT: The kernels used in Support Vector Machines are different from the kernels used in kernel regression.

To avoid confusion, I'll refer to the Support Vector Machine kernels as Mercer Kernels and the kernel regression kernels as Smoothing Kernels.

Page 6:

Recall: A square matrix A ∈ R^{n×n} is positive semi-definite if for all vectors u ∈ R^n, u^T A u ≥ 0.

For a kernel function K : X × X → R defined over a set X with |X| = n, the Gram matrix G is the n × n matrix with entries G_ij = K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩.
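The slides contain no code, but this definition is easy to check numerically. Below is a minimal pure-Python sketch (all names hypothetical, not from the slides) that builds the Gram matrix of a linear kernel and spot-checks symmetry and positive semi-definiteness:

```python
import random

def linear_kernel(x, z):
    # Inner product <x, z> for vectors in R^d
    return sum(xi * zi for xi, zi in zip(x, z))

def gram_matrix(kernel, points):
    # G_ij = K(x_i, x_j)
    return [[kernel(xi, xj) for xj in points] for xi in points]

def quad_form(G, u):
    # u^T G u
    n = len(u)
    return sum(u[i] * G[i][j] * u[j] for i in range(n) for j in range(n))

points = [(1.0, 2.0), (0.0, -1.0), (3.0, 0.5)]
G = gram_matrix(linear_kernel, points)

# Symmetry: G_ij == G_ji
assert all(G[i][j] == G[j][i] for i in range(3) for j in range(3))

# PSD: u^T G u >= 0 for many random u (a spot check, not a proof)
random.seed(0)
for _ in range(1000):
    u = [random.uniform(-1, 1) for _ in range(3)]
    assert quad_form(G, u) >= -1e-12
```

Note the spot check only samples random u; it cannot prove positive semi-definiteness, but it will catch most invalid kernels.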

Page 7:

Mathematical Theory

Page 8:

Banach Space and Hilbert Spaces

A Banach Space is a complete vector space V equipped with a norm ‖·‖.

Examples:

• ℓ_p^m spaces: R^m equipped with the norm ‖x‖ = (∑_{i=1}^m |x_i|^p)^{1/p}

• Function spaces: ‖f‖ := (∫_X |f(x)|^p dx)^{1/p}
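As a quick illustration of the ℓ_p norm formula above, here is a small pure-Python sketch (hypothetical, not part of the original slides):

```python
def lp_norm(x, p):
    # ||x|| = (sum_i |x_i|^p)^(1/p)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, 4.0]
assert lp_norm(x, 2) == 5.0   # Euclidean (p=2) norm of a 3-4-5 triangle
assert lp_norm(x, 1) == 7.0   # Manhattan (p=1) norm
```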

Page 9:

A Hilbert Space is a complete vector space V equipped with an inner product ⟨·, ·⟩. The inner product induces a norm (‖x‖ = √⟨x, x⟩), so Hilbert Spaces are a special case of Banach Spaces.

Example: Euclidean space with the standard inner product: for x, y ∈ R^n, ⟨x, y⟩ = ∑_{i=1}^n x_i y_i = x^T y.

Because kernel functions are inner products in some higher-dimensional space, each kernel function corresponds to some Hilbert Space H.

Page 10:

Kernels

A Mercer Kernel is a function K defined over a set X such that K : X × X → R. The Mercer Kernel represents an inner product in some higher-dimensional space.

A Mercer Kernel is symmetric and positive semi-definite. By Mercer's theorem, any symmetric positive semi-definite kernel represents the inner product in some higher-dimensional Hilbert Space H.

Symmetric and positive semi-definite ⇔ kernel function ⇔ K(x, x′) = ⟨φ(x), φ(x′)⟩ for some φ(·).

Manojit Nandi | Carnegie Mellon University — Machine Learning Department 10/36

Page 11: 10-701/15-781Recitation: Kernelsaarti/Class/10701_Spring14/KernelTheory.pdfJust because it is a valid kernel does not mean it is a good kernel ManojitNandi | CarnegieMellonUniversity—MachineLearningDepartment28/36

10-701/15-781 Kernel Theory Recitation

Why Symmetric?
Inner products are symmetric by definition, so if the kernel function represents an inner product in some Hilbert Space, the kernel function must be symmetric as well:
K(x, z) = ⟨φ(x), φ(z)⟩ = ⟨φ(z), φ(x)⟩ = K(z, x).

Page 12:

Why Positive Semi-definite?
To be positive semi-definite, the Gram matrix G ∈ R^{n×n} must satisfy u^T G u ≥ 0 for all u ∈ R^n. Using functional analysis, one can show that u^T G u corresponds to ⟨h_u, h_u⟩ for some h_u in some Hilbert Space H. Because ⟨h_u, h_u⟩ ≥ 0 by the definition of an inner product, and ⟨h_u, h_u⟩ = u^T G u, it follows that u^T G u ≥ 0.

Page 13:

Why are Kernels useful?
Let x = (x₁, x₂) and z = (z₁, z₂). Then

K(x, z) = ⟨x, z⟩^2 = (x₁z₁ + x₂z₂)^2 = x₁²z₁² + 2x₁z₁x₂z₂ + x₂²z₂²
= ⟨(x₁², √2·x₁x₂, x₂²), (z₁², √2·z₁z₂, z₂²)⟩ = ⟨φ(x), φ(z)⟩,

where φ(x) = (x₁², √2·x₁x₂, x₂²).

What if x = (x₁, x₂, x₃, x₄), z = (z₁, z₂, z₃, z₄), and K(x, z) = ⟨x, z⟩^50? In this case, φ(x) and φ(z) are 23426-dimensional* vectors.

*: From combinatorics, 50 unlabeled balls into 4 labeled boxes = C(53, 3).

Page 14:

We can either:

1. Calculate ⟨x, z⟩ (the inner product of two four-dimensional vectors) and raise this value (a scalar) to the 50th power.

2. OR map x and z to φ(x) and φ(z), and calculate the inner product of two 23426-dimensional vectors.

Which is faster?
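The first option, clearly. As a sanity check of the d = 2 identity above and of the C(53, 3) count, here is a small Python sketch (names hypothetical, not from the slides):

```python
import math

def poly_kernel(x, z, d):
    # K(x, z) = <x, z>^d, computed without any feature map
    return sum(xi * zi for xi, zi in zip(x, z)) ** d

def phi(x):
    # Explicit feature map for <x, z>^2 with x in R^2
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
lhs = poly_kernel(x, z, 2)
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))
assert abs(lhs - rhs) < 1e-9   # kernel value equals the feature-space inner product

# Feature-space dimension for <x, z>^50 with x in R^4:
# 50 unlabeled balls into 4 labeled boxes = C(53, 3)
assert math.comb(53, 3) == 23426
```

Option 1 stays O(n) in the input dimension no matter how large d gets, while the feature map grows combinatorially.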

Page 15:

Commonly Used Kernels

Some commonly used Kernels are:

• Linear Kernel: K(x, z) = ⟨x, z⟩

• Polynomial Kernel: K(x, z) = ⟨x, z⟩^d, where d is the degree of the polynomial

• Gaussian RBF: K(x, z) = exp(−‖x − z‖² / (2σ²)) = exp(−γ‖x − z‖²)

• Sigmoid Kernel: K(x, z) = tanh(αx^T z + β)
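These four kernels can be written directly as functions. A minimal Python sketch (names and parameter defaults hypothetical):

```python
import math

def dot(x, z):
    return sum(a * b for a, b in zip(x, z))

def linear(x, z):
    # K(x, z) = <x, z>
    return dot(x, z)

def polynomial(x, z, d=2):
    # K(x, z) = <x, z>^d
    return dot(x, z) ** d

def gaussian_rbf(x, z, gamma=0.5):
    # K(x, z) = exp(-gamma * ||x - z||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def sigmoid(x, z, alpha=1.0, beta=0.0):
    # K(x, z) = tanh(alpha * x^T z + beta)
    return math.tanh(alpha * dot(x, z) + beta)

x, z = (1.0, 0.0), (0.0, 1.0)
assert linear(x, z) == 0.0
assert gaussian_rbf(x, x) == 1.0   # K(x, x) = exp(0) = 1
```

Note the two RBF parameterizations are related by γ = 1/(2σ²).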

Page 16:

Other Kernels:

1. Laplacian Kernel

2. ANOVA Kernel

3. Circular Kernel

4. Spherical Kernel

5. Wave Kernel

6. Power Kernel

7. Log Kernel

8. B-Spline Kernel

9. Bessel Kernel

10. Cauchy Kernel

11. χ2 Kernel

12. Wavelet Kernel

13. Bayesian Kernel

14. Histogram Kernel

As you can see, there are a lot of kernels.

Page 17:

The Gaussian RBF Kernel is infinite-dimensional
In class, Barnabas mentioned that the Gaussian RBF kernel corresponds to an infinite-dimensional vector space. From the Moore-Aronszajn theorem, we know there is a unique Hilbert space of functions for which the Gaussian RBF is a reproducing kernel.

One can show that this Hilbert space has an infinite basis, so the Gaussian RBF kernel corresponds to an infinite-dimensional vector space. A YouTube video of Abu-Mostafa explaining this in more detail is included in the references at the end of this presentation.

Page 18:

Kernel Theory

Page 19:

One Weird Kernel Trick
The kernel trick is one of the reasons kernel methods are so popular in machine learning. With the kernel trick, we can substitute any dot product with a kernel function: any linear dot product in the formulation of a machine learning algorithm can be replaced with a non-linear kernel function. The non-linear kernel function may let us find separation or structure in the data that the linear dot product could not.

Kernel Trick: ⟨x_i, x_j⟩ can be replaced with K(x_i, x_j) for any valid kernel function K. Furthermore, we can swap any kernel function with any other kernel function.
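As one concrete instance of the trick, kernel ridge regression replaces the matrix of dot products ⟨x_i, x_j⟩ with a kernel Gram matrix. A self-contained sketch (all names hypothetical, not from the slides); note the prediction has exactly the form f(x) = ∑_i α_i K(x_i, x):

```python
import math

def rbf(x, z, gamma=1.0):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def solve(A, b):
    # Gaussian elimination with partial pivoting (small systems only)
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def kernel_ridge_fit(X, y, kernel, lam=0.1):
    # alpha = (G + lam * I)^(-1) y, where G_ij = K(x_i, x_j)
    n = len(X)
    G = [[kernel(X[i], X[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(G, y)

def predict(X, alpha, kernel, x_new):
    # f(x) = sum_i alpha_i K(x_i, x)  -- the Representer Theorem form
    return sum(a * kernel(xi, x_new) for a, xi in zip(alpha, X))

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 1.0, 4.0, 9.0]   # samples of y = x^2, a non-linear target
alpha = kernel_ridge_fit(X, y, rbf, lam=1e-3)
```

With a small ridge term, the fit nearly interpolates the training data even though the target is non-linear, which no linear-dot-product model in one dimension could do.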

Page 20:

Some examples of algorithms that take advantage of the kernel trick:

• Support Vector Machines (Poster Child for Kernel Methods).

• Kernel Principal Component Analysis

• Kernel Independent Component Analysis

• Kernel K-Means algorithm

• Kernel Gaussian Process Regression

• Kernel Deep Learning

Page 21:

Representer Theorem
Theorem. Let X be a non-empty set and k a positive-definite real-valued kernel on X × X with corresponding reproducing kernel Hilbert space H_k. Given a training sample [(x₁, y₁), ..., (xₙ, yₙ)] ∈ X × R, a strictly monotonically increasing real-valued function g : [0, ∞) → R, and an arbitrary empirical loss function E : (X × R²)ⁿ → R ∪ {∞}, any f* satisfying

f* = argmin_{f∈H_k} { E[(x₁, y₁, f(x₁)), ..., (xₙ, yₙ, f(xₙ))] + g(‖f‖) }

can be represented in the form

f*(·) = ∑_{i=1}^n α_i k(·, x_i),

where α_i ∈ R for all i.

Page 22:

Example: Let H_k be some RKHS function space with kernel k(·, ·). Let [(x₁, y₁), ..., (xₙ, yₙ)] be a training input-output set; our task is to find the f* that minimizes the following regularized empirical loss:

f̂ = argmin_{f∈H_k} (∏_{i=1}^n |f(x_i)|^6) ∑_{i=1}^n [ |sin(‖x_i‖y_i − f(x_i))|^25 + y_i|f(x_i)|^249 ] + exp(‖f‖_F)

Page 23:

f̂ = argmin_{f∈H_k} (∏_{i=1}^n |f(x_i)|^6) ∑_{i=1}^n [ |sin(‖x_i‖y_i − f(x_i))|^25 + y_i|f(x_i)|^249 ] + exp(‖f‖_F)

exp(·) is a strictly monotonically increasing function, so exp(‖f‖_F) fits the g(‖f‖) part.

(∏_{i=1}^n |f(x_i)|^6) ∑_{i=1}^n [ |sin(‖x_i‖y_i − f(x_i))|^25 + y_i|f(x_i)|^249 ] is some arbitrary empirical loss function of the (x_i, y_i, f(x_i)). So this optimization can be expressed as

f̂ = argmin_{f∈H_k} { E[(x₁, y₁, f(x₁)), ..., (xₙ, yₙ, f(xₙ))] + g(‖f‖) }

Therefore, by the Representer Theorem, f*(·) = ∑_{i=1}^n α_i K(x_i, ·).

Page 24:

Do You Want to Build a ~~Snowman~~ Kernel? It Doesn't Have to Be a ~~Snowman~~ Kernel.

Page 25:

Operations that Preserve/Create Kernels
Let k₁, ..., k_m be valid kernel functions defined over some set X with |X| = n, and let α₁, ..., α_m be non-negative coefficients. The following are valid kernels:

• K(x, z) = ∑_{i=1}^m α_i k_i(x, z) (closed under non-negative linear combination)

• K(x, z) = ∏_{i=1}^m k_i(x, z) (closed under multiplication)

• K(x, z) = k₁(f(x), f(z)) for any function f : X → X

• K(x, z) = g(x)g(z) for any function g : X → R

• K(x, z) = x^T A^T A z for any matrix A ∈ R^{m×n}
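A numeric spot check of three of these closure rules on a two-point set, using the fact that a symmetric 2×2 matrix is PSD iff its trace and determinant are non-negative (a hypothetical sketch, not from the slides):

```python
import math

def is_psd_2x2(G, tol=1e-12):
    # A symmetric 2x2 matrix is PSD iff trace >= 0 and det >= 0
    trace = G[0][0] + G[1][1]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return trace >= -tol and det >= -tol

def gram(kernel, pts):
    return [[kernel(a, b) for b in pts] for a in pts]

k1 = lambda x, z: sum(a * b for a, b in zip(x, z))                  # linear
k2 = lambda x, z: math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)))  # RBF

pts = [(1.0, 2.0), (-0.5, 3.0)]

ksum  = lambda x, z: 2.0 * k1(x, z) + 0.5 * k2(x, z)  # non-negative combination
kprod = lambda x, z: k1(x, z) * k2(x, z)              # product of kernels
g     = lambda x: x[0] - x[1]
kg    = lambda x, z: g(x) * g(z)                      # K(x, z) = g(x)g(z)

for k in (ksum, kprod, kg):
    assert is_psd_2x2(gram(k, pts))
```

Again, a two-point check is evidence rather than proof; the proofs on the following slides are what actually establish validity.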

Page 26:

Proof: K(x, z) = ∑_{i=1}^m α_i k_i(x, z) is a valid kernel

Symmetry: K(z, x) = ∑_{i=1}^m α_i k_i(z, x). Because each k_i is a kernel function, it is symmetric, so k_i(z, x) = k_i(x, z). Therefore,

K(z, x) = ∑_{i=1}^m α_i k_i(z, x) = ∑_{i=1}^m α_i k_i(x, z) = K(x, z).

Positive semi-definiteness: Let u ∈ R^n be arbitrary. The Gram matrix of K, denoted G, has entries G_{jl} = K(x_j, x_l) = ∑_{i=1}^m α_i k_i(x_j, x_l), so G = α₁G₁ + ... + α_mG_m. Now

u^T G u = u^T (α₁G₁ + ... + α_mG_m) u = α₁u^T G₁ u + ... + α_m u^T G_m u = ∑_{i=1}^m α_i u^T G_i u.

Each u^T G_i u ≥ 0 and each α_i ≥ 0, so α_i u^T G_i u ≥ 0 and the sum is non-negative.

Page 27:

Proof: K(x, z) = x^T A^T A z for any matrix A ∈ R^{m×n} is a valid kernel.

For this proof, we show that K(x, z) is an inner product on some Hilbert space. Let φ(x) = Ax. Then ⟨φ(x), φ(z)⟩ = φ(x)^T φ(z) = (Ax)^T (Az) = x^T A^T A z = K(x, z).

Therefore, K(x, z) is an inner product on some Hilbert space.
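The proof amounts to taking φ(x) = Ax. A small numeric check of the identity (hypothetical names, not from the slides):

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, -1.0]]   # A in R^{2x3}
x = [1.0, 0.0, 2.0]
z = [0.0, 1.0, 1.0]

# K(x, z) computed as <phi(x), phi(z)> with phi(x) = Ax ...
lhs = dot(matvec(A, x), matvec(A, z))

# ... and directly as x^T (A^T A) z
AtA = [[dot([A[r][i] for r in range(2)], [A[r][j] for r in range(2)])
        for j in range(3)] for i in range(3)]
rhs = dot(x, matvec(AtA, z))

assert abs(lhs - rhs) < 1e-12
```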

Page 28:

Just because it is a valid kernel does not mean it is a good kernel

Page 29:

Page 30:

Some Exercises
Prove the following are not valid kernels:

1. K(x, z) = k₁(x, z) − k₂(x, z) for valid kernels k₁, k₂

2. K(x, z) = exp(γ‖x − z‖²) for some γ > 0
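For exercise 2, a hint in code: on just two distinct points, the Gram matrix of K(x, z) = exp(γ‖x − z‖²) has off-diagonal entries larger than its diagonal, so its determinant is negative and it cannot be PSD (a hypothetical sketch, not from the slides):

```python
import math

def bad_kernel(x, z, gamma=1.0):
    # Note the +gamma: this is the *invalid* kernel from exercise 2
    return math.exp(gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

x1, x2 = (0.0,), (2.0,)
G = [[bad_kernel(x1, x1), bad_kernel(x1, x2)],
     [bad_kernel(x2, x1), bad_kernel(x2, x2)]]

# det(G) = 1 - exp(2 * gamma * ||x1 - x2||^2) < 0,
# so G has a negative eigenvalue and is not PSD
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
assert det < 0
```

Turning this counterexample into a proof is the exercise.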

Page 31:

Current Research in Kernel Theory

Page 32:

Flaxman Test

Correlates of homicide: New space/time interaction tests for spatiotemporal point processes

Goal: Develop a statistical test for the strength of interaction between time and space for different types of homicides.

Result: Using reproducing kernel Hilbert spaces, one can project the space-time data into a higher-dimensional space and test for a significant interaction of a kernelized distance of space and time using the Hilbert-Schmidt Independence Criterion.

Page 33:

Fastfood
Fastfood - Approximating Kernel Expansions in Loglinear Time

Problem: Kernel methods do not scale well to large datasets, especially at prediction time, because the algorithm has to compute the kernel distance between the new vector and all of the data in the training set.

Solution: Using advanced mathematical black magic, dense random Gaussian matrices can be approximated using Hadamard matrices and diagonal Gaussian matrices:

V = (1/(σ√d)) S H G Π H B

Page 34:

where Π ∈ {0, 1}^{d×d} is a permutation matrix, H is a Hadamard matrix, S is a random scaling matrix with elements along the diagonal, B has random ±1 entries along the diagonal, and G has random Gaussian entries.

This algorithm performs 100x faster than the previous leading algorithm (RandomKitchen Sinks) and uses 1000x less memory.
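Part of that speedup comes from never materializing H: multiplying by a Hadamard matrix can be done in O(d log d) time with the fast Walsh-Hadamard transform. A pure-Python sketch of that one ingredient (hypothetical names; this is not the Fastfood implementation):

```python
def fwht(v):
    # Fast Walsh-Hadamard transform, O(d log d); len(v) must be a power of 2
    v = list(v)
    h = 1
    d = len(v)
    while h < d:
        for i in range(0, d, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def hadamard(d):
    # Dense Sylvester Hadamard matrix: O(d^2) memory, which Fastfood avoids
    if d == 1:
        return [[1]]
    H = hadamard(d // 2)
    return [row + row for row in H] + [row + [-x for x in row] for row in H]

v = [1.0, -2.0, 0.5, 3.0]
dense = [sum(h * x for h, x in zip(row, v)) for row in hadamard(4)]
assert fwht(v) == dense   # same result, without storing H
```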

Page 35:

References

Representer Theorem and Kernel Methods
Predicting Structured Data by Alex Smola
Kernel Machines
String and Tree Kernels
Why Gaussian Kernel is Infinite

Page 36:

Questions?