Advanced Introduction to Machine Learning CMU-10715bapoczos/Classes/ML10715_2015... ·...

Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes Barnabás Póczos


Page 1: Advanced Introduction to Machine Learning CMU-10715bapoczos/Classes/ML10715_2015... · 2015-10-14 · Advanced Introduction to Machine Learning CMU-10715 Gaussian Processes ... R.

Advanced Introduction to Machine Learning

CMU-10715

Gaussian Processes

Barnabás Póczos


Introduction


http://www.gaussianprocess.org/

Some of the slides in the introduction are taken from D. Lizotte, R. Parr, and C. Guestrin


Contents

Introduction

• Regression

• Properties of Multivariate Gaussian distributions

Ridge Regression

Gaussian Processes

• Weight space view: Bayesian Ridge Regression + Kernel trick

• Function space view: Prior distribution over functions + calculation of posterior distributions


Regression


Why GPs for Regression?

Regression methods:

Linear regression, ridge regression, support vector regression, kNN regression, etc.

Motivation 1:

All of the above regression methods give point estimates. We would like a method that also provides confidence in its estimates.

Motivation 2:

Let us kernelize linear ridge regression and see what we get…


Why GPs for Regression?

GPs can answer the following questions:

• Here’s where the function will most likely be. (expected function)

• Here are some examples of what it might look like. (sampling from the posterior distribution)

• Here is a prediction of what you’ll see if you evaluate your function at x’, with confidence.


Properties of Multivariate Gaussian Distributions


1D Gaussian Distribution

Parameters

• Mean, $\mu$

• Variance, $\sigma^2$


Multivariate Gaussian


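Multivariate Gaussian

A 2-dimensional Gaussian is defined by

• a mean vector $\mu = [\mu_1, \mu_2]$

• a covariance matrix $\Sigma = \begin{bmatrix} \sigma^2_{1,1} & \sigma^2_{1,2} \\ \sigma^2_{2,1} & \sigma^2_{2,2} \end{bmatrix}$, where $\sigma^2_{i,j} = E[(x_i - \mu_i)(x_j - \mu_j)]$ is the (co)variance

Note: $\Sigma$ is symmetric and “positive semi-definite”: $\forall x:\ x^T \Sigma x \ge 0$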


Multivariate Gaussian examples

$\mu = (0, 0)$, $\Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}$



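Useful Properties of Gaussians

Marginal distributions of Gaussians are Gaussian

Given: $x = (x_a, x_b)$, $\mu = (\mu_a, \mu_b)$, $\Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}$

Marginal Distribution: $x_a \sim N(\mu_a, \Sigma_{aa})$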


Marginal distributions of Gaussians are Gaussian


Block Matrix Inversion

Theorem

Definition: Schur complements
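The statement of the theorem did not survive extraction. For reference, the standard block matrix inversion identity can be written as follows; the block names $A, B, C, D$ and the symbol $S$ are notation assumed here, not taken from the slide, and the identity requires $D$ and $S$ to be invertible.

```latex
M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \qquad
S = A - B D^{-1} C \quad \text{(Schur complement of } D \text{ in } M\text{)}

M^{-1} = \begin{bmatrix}
S^{-1} & -S^{-1} B D^{-1} \\
-D^{-1} C S^{-1} & D^{-1} + D^{-1} C S^{-1} B D^{-1}
\end{bmatrix}
```

Applied to a partitioned covariance matrix, the inverse of the Schur complement is the top-left block of the precision matrix, which is what makes Gaussian conditionals tractable.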


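Useful Properties of Gaussians

Conditional distributions of Gaussians are Gaussian

Notation: $\Sigma = \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix}$

Conditional Distribution: $x_a \mid x_b \sim N(\mu_{a|b}, \Sigma_{a|b})$, with $\mu_{a|b} = \mu_a + \Sigma_{ab}\Sigma_{bb}^{-1}(x_b - \mu_b)$ and $\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba}$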
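A minimal NumPy sketch of Gaussian conditioning, using the standard formulas for the conditional mean and covariance. The function name `condition_gaussian`, its arguments, and the example values (zero mean, correlation 0.8) are illustrative assumptions.

```python
import numpy as np

def condition_gaussian(mu, Sigma, idx_a, idx_b, x_b):
    """Return mean and covariance of x_a | x_b for x ~ N(mu, Sigma)."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    mu_a, mu_b = mu[idx_a], mu[idx_b]
    S_aa = Sigma[np.ix_(idx_a, idx_a)]
    S_ab = Sigma[np.ix_(idx_a, idx_b)]
    S_bb = Sigma[np.ix_(idx_b, idx_b)]
    # gain = S_ab S_bb^{-1}, computed with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(S_bb, S_ab.T).T
    mu_cond = mu_a + gain @ (np.asarray(x_b, dtype=float) - mu_b)
    Sigma_cond = S_aa - gain @ S_ab.T
    return mu_cond, Sigma_cond

# Illustrative 2-D example: observe the second coordinate, condition the first on it.
mu_cond, Sigma_cond = condition_gaussian(
    mu=[0.0, 0.0],
    Sigma=[[1.0, 0.8], [0.8, 1.0]],
    idx_a=[0], idx_b=[1], x_b=[1.0],
)
```

Note that the conditional covariance does not depend on the observed value `x_b`, only on which coordinates were observed.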


Higher Dimensions

Visualizing > 3 dimensions is… difficult

Means and marginals are practical, but then we don’t see correlations between those variables

Marginals are Gaussian, e.g., f(6) ~ N(µ(6), σ²(6))

Visualizing an 8-dimensional Gaussian f:

[Figure: marginal mean µ(i) and variance σ²(i) plotted against the index i = 1, …, 8, with µ(6) and σ²(6) labeled]


Yet Higher Dimensions

Why stop there?

Don’t panic: It’s just a function


Getting Ridiculous

Why stop there?


Gaussian Process

Definition:

Probability distribution indexed by an arbitrary set (integers, reals, finite-dimensional vectors, etc.)

Each element gets a Gaussian distribution over the reals with mean µ(x)

These distributions are dependent/correlated as defined by k(x,z)

Any finite subset of indices defines a multivariate Gaussian distribution


Gaussian Process

Distribution over functions…

Yayyy! If our regression model is a GP, then it won’t be a point estimate anymore! It can provide regression estimates with confidence.

Domain (index set) of the functions can be pretty much whatever

• Reals

• Real vectors

• Graphs

• Strings

• Sets

• …


Bayesian Updates for GPs

• How can we do regression and learn the GP from data?

• We will be Bayesians today:

• Start with GP prior

• Get some data

• Compute a posterior


Samples from the prior distribution

Picture is taken from Rasmussen and Williams
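A minimal sketch of how such prior samples can be drawn: evaluate the kernel on a finite grid of inputs, which gives a multivariate Gaussian, and sample from it through a Cholesky factor. The zero mean, the squared-exponential kernel, the length scale, and the jitter value are all assumptions made for illustration; `rbf_kernel` is a hypothetical helper.

```python
import numpy as np

def rbf_kernel(x, z, length_scale=1.0):
    """Squared-exponential kernel k(x, z) = exp(-(x - z)^2 / (2 l^2)) for 1-D inputs."""
    diff = x[:, None] - z[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-5.0, 5.0, 100)          # finite set of index points
K = rbf_kernel(x, x)                     # covariance of (f(x_1), ..., f(x_100))

# A small diagonal jitter keeps the Cholesky factorization numerically stable.
L = np.linalg.cholesky(K + 1e-6 * np.eye(len(x)))

# If u ~ N(0, I), then L u ~ N(0, K); each column is one sampled function.
samples = L @ rng.standard_normal((len(x), 3))
```

Plotting each column of `samples` against `x` gives curves of the kind shown in the figure.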


Samples from the posterior distribution

Picture is taken from Rasmussen and Williams


Prior


Data


Posterior


Contents

Introduction

Ridge Regression

Gaussian Processes

• Weight space view: Bayesian Ridge Regression + Kernel trick

• Function space view: Prior distribution over functions + calculation of posterior distributions


Ridge Regression

Linear regression:

Ridge regression:

The Gaussian Process is a Bayesian generalization of kernelized ridge regression
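The formulas for the two estimators did not survive extraction. In standard notation they can be written as below; the symbols are assumptions here: inputs $x_i \in \mathbb{R}^d$ stacked as columns of $X \in \mathbb{R}^{d \times n}$, targets $y \in \mathbb{R}^n$, and regularization strength $\lambda > 0$.

```latex
\text{Linear regression:} \quad
\hat{w} = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - w^\top x_i \right)^2

\text{Ridge regression:} \quad
\hat{w} = \arg\min_{w} \sum_{i=1}^{n} \left( y_i - w^\top x_i \right)^2 + \lambda \|w\|_2^2
        = \left( X X^\top + \lambda I \right)^{-1} X y
```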



Weight Space View

GP = Bayesian ridge regression in feature space + Kernel trick to carry out computations

The training data


Bayesian Analysis of Linear Regression with Gaussian noise


Bayesian Analysis of Linear Regression with Gaussian noise

The likelihood:


Bayesian Analysis of Linear Regression with Gaussian noise

The prior:

Now, we can calculate the posterior:


Bayesian Analysis of Linear Regression with Gaussian noise

After “completing the square”

MAP estimation

Ridge Regression
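A hedged numerical sketch of this result, assuming the usual model $y = x^\top w + \varepsilon$ with $\varepsilon \sim N(0, \sigma_n^2)$ and prior $w \sim N(0, \Sigma_p)$. Under these assumptions the posterior is $N(\sigma_n^{-2} A^{-1} X y,\ A^{-1})$ with $A = \sigma_n^{-2} X X^\top + \Sigma_p^{-1}$. The names `blr_posterior`, `noise_var`, and `prior_cov` are hypothetical, and the data are synthetic.

```python
import numpy as np

def blr_posterior(X, y, noise_var, prior_cov):
    """Posterior N(w_mean, w_cov) over weights; X has shape (d, n), columns are inputs."""
    A = X @ X.T / noise_var + np.linalg.inv(prior_cov)   # posterior precision
    w_cov = np.linalg.inv(A)
    w_mean = w_cov @ X @ y / noise_var                   # also the MAP estimate
    return w_mean, w_cov

rng = np.random.default_rng(0)
d, n = 3, 50
X = rng.standard_normal((d, n))
w_true = np.array([1.0, -2.0, 0.5])
noise_var, prior_var = 0.1, 2.0
y = X.T @ w_true + np.sqrt(noise_var) * rng.standard_normal(n)

w_mean, w_cov = blr_posterior(X, y, noise_var, prior_var * np.eye(d))

# With an isotropic prior, the MAP estimate equals ridge regression
# with lambda = noise_var / prior_var.
lam = noise_var / prior_var
w_ridge = np.linalg.solve(X @ X.T + lam * np.eye(d), X @ y)
```

Because the posterior is Gaussian, its mean and its mode coincide, which is why the MAP estimate and the ridge solution agree.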


Bayesian Analysis of Linear Regression with Gaussian noise

This posterior covariance matrix doesn’t depend on the observations y: a strange property of Gaussian Processes


Projections of Inputs into Feature Space

The reviewed Bayesian linear regression suffers from limited expressiveness

To overcome the problem ⇒ go to a feature space and do linear regression there

a. explicit features

b. implicit features (kernels)


Explicit Features

Linear regression in the feature space


Explicit Features

The predictive distribution after feature map:


Explicit Features

Shorthands:

The predictive distribution after feature map:


Explicit Features

The predictive distribution after feature map: (*)

A problem with (*) is that it needs an N×N matrix inversion...

Theorem: (*) can be rewritten:
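The formulas on this slide were lost in extraction. As a hedged reconstruction in the notation of Rasmussen and Williams (assumed here, since the slides borrow figures from that book): $\Phi$ is the $N \times n$ matrix of feature vectors, $\phi_* = \phi(x_*)$, $\Sigma_p$ is the prior covariance of the weights, and $\sigma_n^2$ is the noise variance.

```latex
% (*): requires inverting the N x N matrix A
f_* \mid x_*, X, y \sim N\!\left( \sigma_n^{-2}\, \phi_*^\top A^{-1} \Phi y,\;
                                  \phi_*^\top A^{-1} \phi_* \right),
\qquad A = \sigma_n^{-2} \Phi \Phi^\top + \Sigma_p^{-1}

% Equivalent form: requires inverting only an n x n matrix
f_* \mid x_*, X, y \sim N\!\left( \phi_*^\top \Sigma_p \Phi \,(K + \sigma_n^2 I)^{-1} y,\;
      \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi \,(K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_* \right),
\qquad K = \Phi^\top \Sigma_p \Phi
```

The second form is preferable when the feature dimension $N$ exceeds the number of training points $n$.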


Proofs

• Mean expression. We need:

• Variance expression. We need:

Lemma:

Matrix inversion Lemma:


From Explicit to Implicit Features

Reminder: This was the original formulation:


From Explicit to Implicit Features

The feature space always enters in the form of:

Lemma:

No need to know the explicit N-dimensional features.

Their inner products are enough.
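A small illustration of this point, using the homogeneous degree-2 polynomial kernel on 2-D inputs as an assumed example (the slides do not name a specific kernel). The explicit feature map `phi` and the kernel `poly2_kernel` are hypothetical helpers; the kernel returns the feature-space inner product without ever forming the features.

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2, evaluated directly in input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)

# Inner product computed through the explicit features...
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))
# ...and the same quantity computed through the kernel alone.
implicit = poly2_kernel(x, z)
```

For kernels such as the squared exponential, the corresponding feature space is infinite-dimensional, so the kernel route is the only practical one.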


Results


Results using Netlab, Sin function


Results using Netlab, Sin function

Increased # of training points


Results using Netlab, Sin function

Increased noise


Results using Netlab, Sinc function


Thanks for the Attention!


Extra Material



Function Space View

An alternative way to get the previous results

Inference directly in function space

Definition: (Gaussian Processes)

A GP is a collection of random variables such that any finite number of them have a joint Gaussian distribution


Function Space View

Notations:


Function Space View

Gaussian Processes:


Function Space View

Bayesian linear regression is an example of a GP
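The derivation on this slide was lost in extraction. A short worked version, assuming the feature-space model $f(x) = \phi(x)^\top w$ with prior $w \sim N(0, \Sigma_p)$ (symbols assumed, following Rasmussen and Williams):

```latex
f(x) = \phi(x)^\top w, \qquad w \sim N(0, \Sigma_p)

E[f(x)] = \phi(x)^\top E[w] = 0

E[f(x)\, f(x')] = \phi(x)^\top E[w w^\top]\, \phi(x') = \phi(x)^\top \Sigma_p\, \phi(x')
```

Any finite collection $f(x_1), \dots, f(x_n)$ is a linear map of the Gaussian vector $w$ and is therefore jointly Gaussian, so $f$ is a GP with zero mean function and covariance function $k(x, x') = \phi(x)^\top \Sigma_p \phi(x')$.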


Function Space View

Special case


Function Space View

Picture is taken from Rasmussen and Williams


Function Space View

Observation

Explanation


Prediction with noise free observations

noise free observations


Prediction with noise free observations

Goal:


Prediction with noise free observations

Lemma:

Proof: a bit of calculation using the joint (n+m)-dimensional density

Remarks:


Prediction with noise free observations

Picture is taken from Rasmussen and Williams
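A minimal sketch of noise-free GP prediction, obtained by conditioning the joint Gaussian over training and test function values: the posterior mean is $K_*^\top K^{-1} f$ and the posterior covariance is $K_{**} - K_*^\top K^{-1} K_*$. The zero prior mean, the squared-exponential kernel, the tiny jitter, and the helper names are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(x, z, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs."""
    diff = x[:, None] - z[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_predict_noise_free(x_train, f_train, x_test, jitter=1e-10):
    """Posterior mean and covariance of f(x_test) given exact values f_train."""
    K = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)        # shape (n, m)
    K_ss = rbf_kernel(x_test, x_test)        # shape (m, m)
    mean = K_s.T @ np.linalg.solve(K, f_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    return mean, cov

x_train = np.array([-4.0, -2.0, 0.0, 1.5, 3.0])
f_train = np.sin(x_train)
# Test at the training inputs plus one new input between two of them.
x_test = np.concatenate([x_train, [0.75]])
mean, cov = gp_predict_noise_free(x_train, f_train, x_test)
```

At the training inputs the posterior mean reproduces the observed values and the variance collapses to (numerically) zero, while it stays positive at the unobserved input.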


Prediction using noisy observations

The joint distribution:
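The formula did not survive extraction. Assuming a zero-mean GP prior and observations $y = f(X) + \varepsilon$ with i.i.d. noise $\varepsilon \sim N(0, \sigma_n^2 I)$ (notation assumed, following Rasmussen and Williams), the joint distribution of the training targets $y$ and the test function values $f_*$ is:

```latex
\begin{bmatrix} y \\ f_* \end{bmatrix}
\sim N\!\left( 0,\;
\begin{bmatrix}
K(X, X) + \sigma_n^2 I & K(X, X_*) \\
K(X_*, X) & K(X_*, X_*)
\end{bmatrix} \right)
```

The only change from the noise-free case is the $\sigma_n^2 I$ term added to the training block.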


Prediction using noisy observations

The posterior for the noisy observations:

where

In the weight space view we had:
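The formulas on this slide were lost in extraction. A hedged reconstruction, with the notation assumed from Rasmussen and Williams:

```latex
% Function space view
f_* \mid X, y, X_* \sim N\!\left( \bar{f}_*,\ \operatorname{cov}(f_*) \right), \quad \text{where}

\bar{f}_* = K(X_*, X) \left[ K(X, X) + \sigma_n^2 I \right]^{-1} y

\operatorname{cov}(f_*) = K(X_*, X_*) - K(X_*, X) \left[ K(X, X) + \sigma_n^2 I \right]^{-1} K(X, X_*)

% Weight space view, single test point, with K = \Phi^\top \Sigma_p \Phi
f_* \mid x_*, X, y \sim N\!\left( \phi_*^\top \Sigma_p \Phi \,(K + \sigma_n^2 I)^{-1} y,\;
      \phi_*^\top \Sigma_p \phi_* - \phi_*^\top \Sigma_p \Phi \,(K + \sigma_n^2 I)^{-1} \Phi^\top \Sigma_p \phi_* \right)
```

Setting $k(x, x') = \phi(x)^\top \Sigma_p \phi(x')$ makes the two expressions identical term by term.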


Prediction using noisy observations

Short notations:


Prediction using noisy observations

Two ways to look at it:

• Linear predictor

• Manifestation of the Representer Theorem
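Both readings can be made explicit from the predictive mean at a single test point $x_*$. The symbols $k_*$, $\beta$, and $\alpha$ are shorthand assumed here, with $k_* = K(X, x_*)$:

```latex
\bar{f}(x_*) = k_*^\top (K + \sigma_n^2 I)^{-1} y

% Linear predictor: a linear combination of the observed targets
\bar{f}(x_*) = \sum_{i=1}^{n} \beta_i\, y_i, \qquad \beta = (K + \sigma_n^2 I)^{-1} k_*

% Representer theorem: a linear combination of kernel functions centered at the training inputs
\bar{f}(x_*) = \sum_{i=1}^{n} \alpha_i\, k(x_i, x_*), \qquad \alpha = (K + \sigma_n^2 I)^{-1} y
```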


Prediction using noisy observations

Remarks:


GP pseudo code

Inputs:


GP pseudo code (continued)

Outputs:
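The pseudo code itself did not survive extraction. As a hedged sketch, the following assumes the slide follows the Cholesky-based GP regression algorithm of Rasmussen and Williams (Algorithm 2.1): inputs are the training data, a kernel, the noise variance, and test inputs; outputs are the predictive mean, the predictive variance, and the log marginal likelihood. The function and argument names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, z, length_scale=1.0):
    """Squared-exponential kernel for 1-D inputs (an assumed choice)."""
    diff = x[:, None] - z[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_regression(x_train, y_train, x_test, noise_var, kernel=rbf_kernel):
    """Return predictive mean, predictive variance and log marginal likelihood."""
    n = len(x_train)
    K = kernel(x_train, x_train)
    K_s = kernel(x_train, x_test)                      # shape (n, m)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))  # K + noise_var I = L L^T
    # alpha = (K + noise_var I)^{-1} y through the Cholesky factor.
    # np.linalg.solve is a general solver; a dedicated triangular solver
    # would exploit the structure of L.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(kernel(x_test, x_test)) - np.sum(v * v, axis=0)
    log_ml = (-0.5 * y_train @ alpha
              - np.sum(np.log(np.diag(L)))
              - 0.5 * n * np.log(2.0 * np.pi))
    return mean, var, log_ml

rng = np.random.default_rng(1)
x_train = np.linspace(-3.0, 3.0, 20)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(-3.0, 3.0, 7)
mean, var, log_ml = gp_regression(x_train, y_train, x_test, noise_var=0.01)
```

The returned variance is that of the latent function at the test inputs; adding `noise_var` gives the variance of a new noisy observation.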


Thanks for the Attention!