Transcript of "Fitting Covariance and Multioutput Gaussian Processes"
Fitting Covariance and Multioutput Gaussian Processes
Neil D. Lawrence
GPSS, 16th September 2014
Outline
Parametric Models are a Bottleneck
Constructing Covariance
GP Limitations
Kalman Filter
Nonparametric Gaussian Processes
- We've seen how we go from parametric to non-parametric.
- The limit implies infinite dimensional w.
- Gaussian processes are generally non-parametric: combine data with covariance function to get model.
- This representation cannot be summarized by a parameter vector of a fixed size.
The Parametric Bottleneck
- Parametric models have a representation that does not respond to increasing training set size.
- Bayesian posterior distributions over parameters contain the information about the training data.
- Use Bayes' rule on the training data, p(w|y, X), then make predictions on test data:

p(y∗|X∗, y, X) = ∫ p(y∗|w, X∗) p(w|y, X) dw.

- w becomes a bottleneck for information about the training set to pass to the test set.
- Solution: increase m so that the bottleneck is so large that it no longer presents a problem.
- How big is big enough for m? Non-parametrics says m → ∞.
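The bottleneck is easy to see in Bayesian linear regression, where everything the test data can learn from the training data passes through the posterior p(w|y, X), and the predictive integral is available in closed form. A minimal sketch; the polynomial basis, prior scale and noise level here are illustrative assumptions, not values from the talk:

```python
import numpy as np

def posterior_w(Phi, y, alpha=1.0, sigma2=0.1):
    """Posterior p(w|y,X) for y = Phi w + noise, with prior w ~ N(0, alpha I)."""
    m = Phi.shape[1]
    A = Phi.T @ Phi / sigma2 + np.eye(m) / alpha   # posterior precision
    C = np.linalg.inv(A)                           # posterior covariance
    mu = C @ Phi.T @ y / sigma2                    # posterior mean
    return mu, C

def predict(Phi_star, mu, C, sigma2=0.1):
    """p(y*|X*,y,X) = integral of p(y*|w,X*) p(w|y,X) dw, in closed form."""
    mean = Phi_star @ mu
    var = np.sum(Phi_star @ C * Phi_star, axis=1) + sigma2
    return mean, var

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
Phi = np.vander(X, 4)                # m = 4 basis functions: the bottleneck
mu, C = posterior_w(Phi, y)
mean, var = predict(np.vander(np.array([0.5]), 4), mu, C)
```

However many training points arrive, only the m-dimensional pair (mu, C) carries information forward to the test set.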
The Parametric Bottleneck
- Now no longer possible to manipulate the model through the standard parametric form.
- However, it is possible to express parametric models as GPs:

k(xᵢ, xⱼ) = φ(xᵢ)ᵀ φ(xⱼ).

- These are known as degenerate covariance matrices.
- Their rank is at most m; non-parametric models have full rank covariance matrices.
- The most well known is the "linear kernel", k(xᵢ, xⱼ) = xᵢᵀxⱼ.
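The degeneracy is easy to verify numerically: a covariance built from m basis functions has rank at most m no matter how many data points we evaluate it on. A small sketch (the polynomial basis is an illustrative assumption):

```python
import numpy as np

def degenerate_cov(X, basis):
    """k(x_i, x_j) = phi(x_i)^T phi(x_j) for a finite basis."""
    Phi = basis(X)                  # n x m design matrix
    return Phi @ Phi.T              # n x n covariance, rank <= m

basis = lambda X: np.vander(X, 3)   # m = 3 polynomial basis functions
X = np.linspace(-1, 1, 50)
K = degenerate_cov(X, basis)        # 50 x 50 matrix, but rank <= 3
```

The linear kernel is the extreme one-dimensional-input case: `np.outer(X, X)` has rank 1.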
Making Predictions
- For non-parametrics, prediction at new points f∗ is made by conditioning on f in the joint distribution.
- In GPs this involves combining the training data with the covariance function and the mean function.
- Parametric is a special case where conditional prediction can be summarized in a fixed number of parameters.
- Complexity of a parametric model remains fixed regardless of the size of our training data set.
- For a non-parametric model the required number of parameters grows with the size of the training data.
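Conditioning f∗ on f in the joint Gaussian uses the standard partitioned-Gaussian formulae. A sketch with a zero mean function; the RBF covariance and its lengthscale are illustrative assumptions:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_condition(X, f, X_star):
    """Mean and covariance of f* | f under a zero-mean GP prior."""
    K = rbf(X, X) + 1e-9 * np.eye(len(X))   # tiny jitter for stability
    K_s = rbf(X_star, X)
    K_ss = rbf(X_star, X_star)
    mean = K_s @ np.linalg.solve(K, f)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov

X = np.array([-1.0, 0.0, 1.0])
f = np.sin(X)
mean, cov = gp_condition(X, f, np.array([0.0, 2.0]))
```

At a training input the (noiseless) conditional reproduces the observed value with essentially zero variance; far from the data the variance returns towards the prior.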
Covariance Functions and Mercer Kernels
- Mercer kernels and covariance functions are similar.
- The kernel perspective does not make a probabilistic interpretation of the covariance function.
- Algorithms can be simpler, but the probabilistic interpretation is crucial for kernel parameter optimization.
Constructing Covariance Functions
- The sum of two covariance functions is also a covariance function.
k(x, x′) = k1(x, x′) + k2(x, x′)
Constructing Covariance Functions
- The product of two covariance functions is also a covariance function.
k(x, x′) = k1(x, x′)k2(x, x′)
Multiply by Deterministic Function
- If f(x) is a Gaussian process,
- and g(x) is a deterministic function,
- then h(x) = f(x)g(x) is a Gaussian process with covariance

kₕ(x, x′) = g(x) k_f(x, x′) g(x′)

where kₕ is the covariance for h(·) and k_f is the covariance for f(·).
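These closure rules (sum, product, and scaling by a deterministic function) can each be checked numerically: every construction should leave the Gram matrix positive semi-definite. A sketch with illustrative base kernels and an illustrative choice of g:

```python
import numpy as np

def rbf(X, lengthscale=1.0):
    d2 = (X[:, None] - X[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

def min_eig(K):
    """Smallest eigenvalue of a symmetric matrix; >= 0 (up to rounding) iff PSD."""
    return np.linalg.eigvalsh(K).min()

X = np.linspace(-2, 2, 30)
K1 = rbf(X, 0.5)
K2 = np.outer(X, X)                        # linear kernel
g = np.cos(X)                              # a deterministic function g(x)

K_sum = K1 + K2                            # sum of covariances
K_prod = K1 * K2                           # element-wise (Schur) product
K_scaled = g[:, None] * K1 * g[None, :]    # g(x) k(x, x') g(x')
```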
Covariance Functions
MLP Covariance Function
k(x, x′) = α asin( (wxᵀx′ + b) / ( √(wxᵀx + b + 1) √(wx′ᵀx′ + b + 1) ) )

- Based on the infinite neural network model.
- Parameters shown: w = 40, b = 4.
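A direct transcription of the MLP (arcsine) covariance for one-dimensional inputs. The +1 terms in the denominator keep the argument of asin inside [−1, 1], so the function is always well defined; the test inputs below are illustrative:

```python
import numpy as np

def mlp_kernel(X1, X2, alpha=1.0, w=40.0, b=4.0):
    """MLP covariance: alpha*asin((w x x' + b)/sqrt((w x^2+b+1)(w x'^2+b+1)))."""
    num = w * X1[:, None] * X2[None, :] + b
    d1 = np.sqrt(w * X1**2 + b + 1.0)
    d2 = np.sqrt(w * X2**2 + b + 1.0)
    return alpha * np.arcsin(num / (d1[:, None] * d2[None, :]))

X = np.linspace(-1, 1, 5)
K = mlp_kernel(X, X)   # symmetric, entries bounded by alpha * pi/2
```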
Covariance Functions
Linear Covariance Function
k(x, x′) = αxᵀx′

- Corresponds to Bayesian linear regression.
- Parameter shown: α = 1.
Gaussian Process Interpolation
Figure: Real example, BACCO (see e.g. Oakley and O'Hagan, 2002). Interpolation of f(x) against x through outputs from slow computer simulations (e.g. atmospheric carbon levels).
Gaussian Noise
- Gaussian noise model,

p(yᵢ|fᵢ) = N(yᵢ|fᵢ, σ²)

where σ² is the variance of the noise.
- Equivalent to a covariance function of the form

k(xᵢ, xⱼ) = δᵢ,ⱼσ²

where δᵢ,ⱼ is the Kronecker delta function.
- The additive nature of Gaussians means we can simply add this term to existing covariance matrices.
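In code, the noise covariance is just σ² added to the diagonal of the Gram matrix, which also regularises the matrix inversion in GP regression. A sketch; the RBF kernel, σ², and the sine data are illustrative assumptions:

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0):
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return np.exp(-0.5 * d2 / lengthscale**2)

sigma2 = 0.05
X = np.linspace(-2, 2, 15)
y = np.sin(X)

K = rbf(X, X)
K_noisy = K + sigma2 * np.eye(len(X))   # k(x_i, x_j) + delta_ij * sigma^2

# GP regression posterior mean at a test point
X_star = np.array([0.0])
mean = rbf(X_star, X) @ np.linalg.solve(K_noisy, y)
```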
Gaussian Process Regression
Figure: Gaussian process regression of y(x) against x. Examples include WiFi localization and the C14 calibration curve.
General Noise Models
Graph of a GP:
- Relates input variables, X, to the vector, y, through f, given kernel parameters θ.
- Plate notation indicates the independence of yᵢ|fᵢ.
- In general p(yᵢ|fᵢ) is non-Gaussian.
- We approximate it with a Gaussian, p(yᵢ|fᵢ) ≈ N(mᵢ|fᵢ, βᵢ⁻¹).

Figure: The Gaussian process depicted graphically (nodes yᵢ, fᵢ, X, θ; plate over i = 1 … n).
Gaussian Noise

Figure: Inclusion of a data point with Gaussian noise. The plot shows the prediction p(f∗|X, x∗, y), the likelihood p(y∗ = 0.6|f∗), and the updated posterior p(f∗|X, x∗, y, y∗).
Expectation Propagation
Local Moment Matching
- Easiest to consider a single previously unseen data point, y∗, x∗.
- Before seeing the data point, the prediction of f∗ is a GP, q(f∗|y, X).
- Update the prediction using Bayes' rule,

p(f∗|y, y∗, X, x∗) = p(y∗|f∗) p(f∗|y, X, x∗) / p(y, y∗|X, x∗).

This posterior is not a Gaussian process if p(y∗|f∗) is non-Gaussian.
Classification Noise Model
Probit Noise Model

Figure: The probit model (classification). The plot shows p(yᵢ|fᵢ) against fᵢ for yᵢ = −1 and yᵢ = 1. For yᵢ = 1 we have

p(yᵢ|fᵢ) = Φ(fᵢ) = ∫₋∞^fᵢ N(z|0, 1) dz.
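With Φ the cumulative Gaussian, the two class probabilities are Φ(fᵢ) and Φ(−fᵢ), which sum to one. A minimal sketch using scipy; writing the likelihood as Φ(yᵢfᵢ) for labels in {−1, +1} is a standard compact form, not notation from the slide:

```python
import numpy as np
from scipy.stats import norm

def probit_likelihood(y, f):
    """p(y_i|f_i) = Phi(y_i * f_i) for labels y_i in {-1, +1}."""
    return norm.cdf(y * f)

f = np.linspace(-4, 4, 9)
p_pos = probit_likelihood(+1, f)   # p(y_i = +1 | f_i)
p_neg = probit_likelihood(-1, f)   # p(y_i = -1 | f_i)
```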
Expectation Propagation II
Match Moments
- Idea behind EP: approximate with a Gaussian process at this stage by matching moments.
- This is equivalent to minimizing the following KL divergence, where q(f∗|y, y∗, X, x∗) is constrained to be a GP:

q(f∗|y, y∗, X, x∗) = argmin_q KL( p(f∗|y, y∗, X, x∗) ‖ q(f∗|y, y∗, X, x∗) )

- This is equivalent to setting

⟨f∗⟩_q = ⟨f∗⟩_p   and   ⟨f∗²⟩_q = ⟨f∗²⟩_p,

where the expectations are taken under q(f∗|y, y∗, X, x∗) and p(f∗|y, y∗, X, x∗) respectively.
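The moment-matching step can be checked numerically: take a Gaussian prediction for f∗, tilt it by a probit likelihood, compute the tilted mean and variance by quadrature, and read those off as the matched Gaussian. The standard-normal prediction below is an illustrative assumption; the expected values in the comments come from the known closed-form moments of a probit-tilted Gaussian:

```python
import numpy as np
from scipy.stats import norm

# Gaussian prediction of f* before seeing y*, and a probit likelihood p(y*=1|f*)
mu0, var0 = 0.0, 1.0
f = np.linspace(-10, 10, 20001)
df = f[1] - f[0]
prior = norm.pdf(f, mu0, np.sqrt(var0))
lik = norm.cdf(f)                      # p(y* = 1 | f*)

tilted = prior * lik
Z = tilted.sum() * df                  # normaliser p(y* = 1); here 0.5
post = tilted / Z
mean = (f * post).sum() * df           # matched first moment, approx 0.5642
second = (f**2 * post).sum() * df
var = second - mean**2                 # matched second moment, approx 0.6817
```

The matched Gaussian N(mean, var) is the moment-matched (EP-style) approximation to the non-Gaussian tilted posterior; note the variance shrinks below the prediction's variance.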
Expectation Propagation III
Equivalent Gaussian
- This is achieved by replacing p(y∗|f∗) with a Gaussian distribution:

p(f∗|y, y∗, X, x∗) = p(y∗|f∗) p(f∗|y, X, x∗) / p(y, y∗|X, x∗)

becomes

q(f∗|y, y∗, X, x∗) = N(m∗|f∗, βₘ⁻¹) p(f∗|y, X, x∗) / p(y, y∗|X, x∗).
Classification
Figure: An EP style update with a classification noise model. The plot shows the prediction p(f∗|X, x∗, y), the likelihood p(y∗ = 1|f∗), the resulting posterior p(f∗|X, x∗, y, y∗), and its Gaussian approximation q(f∗|X, x∗, y).
Ordinal Noise Model
Ordered Categories

Figure: The ordered categorical noise model (ordinal regression). The plot shows p(yᵢ|fᵢ) against fᵢ for yᵢ = −1, yᵢ = 0 and yᵢ = 1. Here we have assumed three categories.
Laplace Approximation

I Equivalent Gaussian is found by making a local 2nd order Taylor approximation at the mode.
I Laplace was the first to suggest this, so it's known as the Laplace approximation.
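As a 1-d illustration (my own sketch, not from the slides): Newton's method finds the mode of an unnormalized log density, and the negative inverse curvature there gives the variance of the approximating Gaussian. The target here is an assumed example combining a logistic likelihood with a standard Gaussian prior.

```python
import numpy as np

def log_density(f):
    # log sigmoid(f) + log N(f | 0, 1), up to a constant.
    return -np.log1p(np.exp(-f)) - 0.5 * f ** 2

def grad(f):
    # Derivative of log_density: (1 - sigmoid(f)) - f.
    sigma = 1.0 / (1.0 + np.exp(-f))
    return (1.0 - sigma) - f

def hess(f):
    # Second derivative: -sigmoid(f)(1 - sigmoid(f)) - 1, always negative here.
    sigma = 1.0 / (1.0 + np.exp(-f))
    return -sigma * (1.0 - sigma) - 1.0

# Newton's method to find the mode of the log density.
f = 0.0
for _ in range(20):
    f = f - grad(f) / hess(f)

mode = f
variance = -1.0 / hess(mode)  # Laplace: 2nd order Taylor expansion at the mode
```

The approximating Gaussian is N( f | mode, variance); because the Hessian is strictly negative, the approximation is always well defined for this likelihood.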
Learning Covariance Parameters
Can we determine covariance parameters from the data?

N(y | 0, K) = (2π)^(−n/2) |K|^(−1/2) exp(−y⊤K⁻¹y / 2)

The parameters are inside the covariance function (matrix),

ki,j = k(xi, xj; θ).

Taking the log,

log N(y | 0, K) = −(1/2) log |K| − y⊤K⁻¹y / 2 − (n/2) log 2π,

and dropping the constant gives the negative log likelihood objective

E(θ) = (1/2) log |K| + y⊤K⁻¹y / 2.
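A quick numerical check (my own, not from the slides) that the objective E(θ) differs from the negative log density of N(y | 0, K) only by the constant (n/2) log 2π:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.normal(size=(n, n))
K = A @ A.T + n * np.eye(n)   # a random symmetric positive definite covariance
y = rng.normal(size=n)

# Objective from the slides: E = 1/2 log|K| + y^T K^{-1} y / 2.
sign, logdetK = np.linalg.slogdet(K)
E = 0.5 * logdetK + 0.5 * y @ np.linalg.solve(K, y)

# Density evaluated independently from the Gaussian formula (det and inv).
pdf = np.exp(-0.5 * y @ np.linalg.inv(K) @ y) / np.sqrt((2 * np.pi) ** n * np.linalg.det(K))
```

The two paths use different linear algebra routines (slogdet/solve versus det/inv), so agreement is a genuine cross-check rather than a tautology.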
Eigendecomposition of Covariance

A useful decomposition for understanding the objective function:

K = RΛ²R⊤,

where Λ is a diagonal matrix and R⊤R = I. The diagonal of Λ represents distance along the axes; R gives a rotation of these axes.

Useful representation since |K| = |Λ²| = |Λ|².
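A small numerical check (my own, not from the slides) that for K = RΛ²R⊤ with orthogonal R, the determinant satisfies |K| = |Λ|²:

```python
import numpy as np

rng = np.random.default_rng(0)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # a random orthogonal matrix, R^T R = I
lam = np.array([1.0, 2.0, 0.5])                # diagonal of Lambda
K = R @ np.diag(lam ** 2) @ R.T

det_K = np.linalg.det(K)
det_Lambda = np.prod(lam)
# det_K should equal det_Lambda ** 2 = (1 * 2 * 0.5)^2 = 1
```

The rotation drops out because |R| = ±1, so only the eigenvalues on the diagonal of Λ contribute to the volume.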
Capacity control: log |K|

For a diagonal

Λ = [ λ1 0
      0 λ2 ]

the determinant |Λ| = λ1λ2 is the area of the box with sides λ1 and λ2. In three dimensions,

Λ = [ λ1 0 0
      0 λ2 0
      0 0 λ3 ]

gives |Λ| = λ1λ2λ3, a volume. Applying the rotation R changes neither: |RΛ| = λ1λ2 since |R| = 1. So log |K| = 2 log |Λ| measures the volume occupied by the density, and the log |K| term in E(θ) penalizes large volumes: it controls capacity.
Data Fit: y⊤K⁻¹y / 2

Figure : Contours of the Gaussian density over (y1, y2), with λ1 and λ2 marking the extent of the density along its principal axes. The data fit term y⊤K⁻¹y / 2 is small when the contours are large enough to cover the observed y, so it pushes against the volume penalty above.
Learning Covariance Parameters
Can we determine length scales and noise levels from the data?

E(θ) = (1/2) log |K| + y⊤K⁻¹y / 2

Figure : A sequence of GP fits y(x) (for x from −2 to 2), each shown alongside the objective E(θ) plotted against the length scale ` on a log scale from 10⁻¹ to 10¹.
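The scan over length scales can be reproduced numerically; a minimal sketch (my own code, not from the talk), assuming an exponentiated quadratic covariance with a fixed small noise variance:

```python
import numpy as np

def rbf_kernel(X, lengthscale, variance=1.0):
    # Exponentiated quadratic covariance k(x, x') = v exp(-(x - x')^2 / (2 l^2)).
    sq = (X[:, None] - X[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def objective(y, K):
    # E(theta) = 1/2 log|K| + y^T K^{-1} y / 2, via a Cholesky factorization.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return np.sum(np.log(np.diag(L))) + 0.5 * y @ alpha  # sum log L_ii = 1/2 log|K|

# Draw a sample from a GP with length scale 1 plus a little noise,
# then scan the objective over candidate length scales.
rng = np.random.default_rng(0)
X = np.linspace(-2, 2, 30)
noise = 0.01 * np.eye(len(X))
y = rng.multivariate_normal(np.zeros(len(X)), rbf_kernel(X, 1.0) + noise)

lengthscales = np.logspace(-1, 1, 21)
E = np.array([objective(y, rbf_kernel(X, l) + noise) for l in lengthscales])
best = lengthscales[np.argmin(E)]
```

In practice one would minimize E with a gradient-based optimizer over all hyperparameters jointly; the grid scan is only to mirror the plots above.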
Gene Expression Example
I Given given expression levels in the form of a time seriesfrom Della Gatta et al. (2008).
I Want to detect if a gene is expressed or not, fit a GP to eachgene (Kalaitzis and Lawrence, 2011).
RESEARCH ARTICLE (Open Access)

A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression
Alfredo A Kalaitzis and Neil D Lawrence

Background: The analysis of gene expression from time series underpins many biological studies. Two basic forms of analysis recur for data of this type: removing inactive (quiet) genes from the study and determining which genes are differentially expressed. Often these analysis stages are applied disregarding the fact that the data is drawn from a time series. In this paper we propose a simple model for accounting for the underlying temporal nature of the data based on a Gaussian process.

Results: We review Gaussian process (GP) regression for estimating the continuous trajectories underlying gene expression time-series. We present a simple approach which can be used to filter quiet genes or, for the case of time series in the form of expression ratios, quantify differential expression. We assess via ROC curves the rankings produced by our regression framework and compare them to a recently proposed hierarchical Bayesian model for the analysis of gene expression time-series (BATS). We compare on both simulated and experimental data, showing that the proposed approach considerably outperforms the current state of the art.

Conclusions: Gaussian processes offer an attractive trade-off between efficiency and usability for the analysis of microarray time series. The Gaussian process framework offers a natural way of handling biological replicates and missing values and provides confidence intervals along the estimated curves of gene expression. Therefore, we believe Gaussian processes should be a standard tool in the analysis of gene expression time series.

Kalaitzis and Lawrence, BMC Bioinformatics 2011, 12:180. http://www.biomedcentral.com/1471-2105/12/180
Figure : Contour plot of the Gaussian process likelihood as a function of log10 length scale and log10 SNR.
Figure : The likelihood contour plot together with the corresponding fit y(x). Optimum: length scale of 1.2221 and log10 SNR of 1.9654; log likelihood is −0.22317.
Figure : The likelihood contour plot together with the corresponding fit y(x). Optimum: length scale of 1.5162 and log10 SNR of 0.21306; log likelihood is −0.23604.
Figure : The likelihood contour plot together with the corresponding fit y(x). Optimum: length scale of 2.9886 and log10 SNR of −4.506; log likelihood is −2.1056.
Outline
Parametric Models are a Bottleneck
Constructing Covariance
GP Limitations
Kalman Filter
Limitations of Gaussian Processes

I Inference is O(n³) due to the matrix inverse (in practice, use the Cholesky decomposition).
I Gaussian processes don't deal well with discontinuities (financial crises, phosphorylation, collisions, edges in images).
I The widely used exponentiated quadratic covariance (RBF) can be too smooth in practice (but there are many alternatives!).
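To illustrate the first point, a minimal sketch (my own, not from the talk) of GP posterior-mean prediction using a Cholesky factorization rather than an explicit inverse:

```python
import numpy as np

def rbf(A, B, lengthscale=1.0):
    # Exponentiated quadratic covariance between two sets of 1-d inputs.
    sq = (A[:, None] - B[None, :]) ** 2
    return np.exp(-0.5 * sq / lengthscale ** 2)

def gp_predict(X, y, X_star, k, noise=0.1):
    # The Cholesky factorization is still O(n^3), but it is faster and more
    # numerically stable than forming K^{-1} explicitly with np.linalg.inv.
    K = k(X, X) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)                            # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y
    return k(X_star, X) @ alpha                          # posterior mean

X = np.linspace(-2, 2, 20)
y = np.sin(X)
mean = gp_predict(X, y, np.array([0.0]), rbf)
```

Two triangular solves replace the matrix inverse; the factor L can also be reused for the log determinant when evaluating E(θ).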
Outline
Parametric Models are a Bottleneck
Constructing Covariance
GP Limitations
Kalman Filter
Simple Markov Chain

I Assume a 1-d latent state, a vector over time, x = [x1 . . . xT].
I Markov property:

  xi = xi−1 + εi,   εi ∼ N(0, α)   =⇒   xi ∼ N(xi−1, α)

I Initial state: x0 ∼ N(0, α0).
I If x0 ∼ N(0, α) we have a Markov chain for the latent states.
I The Markov chain is specified by an initial distribution (Gaussian) and a transition distribution (Gaussian).
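A minimal simulation of this chain (my own sketch, not from the talk), with x0 = 0 and α = 1 as in the figures that follow:

```python
import numpy as np

# Simulate x_i = x_{i-1} + eps_i, eps_i ~ N(0, alpha), starting from x_0 = 0.
rng = np.random.default_rng(42)
T = 10
alpha = 1.0
eps = rng.normal(0.0, np.sqrt(alpha), size=T)
x = np.concatenate([[0.0], np.cumsum(eps)])  # x_i is the running sum of the noise
```

The running-sum form is exactly what the matrix representation below makes explicit.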
Gauss Markov Chain

Figure : A sample path of the chain for t = 0, . . . , 9 with x0 = 0 and εi ∼ N(0, 1), built up one step at a time:

x1 = 0.000 − 2.24 = −2.24
x2 = −2.24 + 0.457 = −1.78
x3 = −1.78 + 0.178 = −1.6
x4 = −1.6 − 0.292 = −1.89
x5 = −1.89 − 0.501 = −2.39
x6 = −2.39 + 1.32 = −1.08
x7 = −1.08 + 0.989 = −0.0881
x8 = −0.0881 − 0.842 = −0.93
x9 = −0.93 − 0.410 = −1.34
Multivariate Gaussian Properties: Reminder

If z ∼ N(µ, C) and x = Wz + b, then

x ∼ N(Wµ + b, WCW⊤).

Simplified: if z ∼ N(0, σ²I) and x = Wz, then

x ∼ N(0, σ²WW⊤).
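A quick Monte Carlo check (my own, not from the talk) of the affine property — if z ∼ N(µ, C) and x = Wz + b, the sample mean and covariance of x should approach Wµ + b and WCW⊤:

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
A = rng.normal(size=(2, 2))
C = A @ A.T + np.eye(2)            # a random positive definite covariance
W = np.array([[2.0, 0.0], [1.0, 1.0]])
b = np.array([0.5, -0.5])

z = rng.multivariate_normal(mu, C, size=200_000)
x = z @ W.T + b                    # apply the affine map to every sample

emp_mean = x.mean(axis=0)          # should approach W mu + b
emp_cov = np.cov(x.T)              # should approach W C W^T
```

With 200,000 samples the empirical moments match the closed forms to a couple of decimal places.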
Matrix Representation of Latent Variables

Stacking the states gives a linear relationship between x and ε:

[ x1 ]   [ 1 0 0 0 0 ]   [ ε1 ]
[ x2 ]   [ 1 1 0 0 0 ]   [ ε2 ]
[ x3 ] = [ 1 1 1 0 0 ] × [ ε3 ]
[ x4 ]   [ 1 1 1 1 0 ]   [ ε4 ]
[ x5 ]   [ 1 1 1 1 1 ]   [ ε5 ]

so that

x1 = ε1
x2 = ε1 + ε2
x3 = ε1 + ε2 + ε3
x4 = ε1 + ε2 + ε3 + ε4
x5 = ε1 + ε2 + ε3 + ε4 + ε5.

In short, x = L1 × ε, where L1 is the lower triangular matrix of ones.
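Multiplying by the lower triangular matrix of ones is just a cumulative sum; a small check (my own, not from the talk):

```python
import numpy as np

T = 5
L1 = np.tril(np.ones((T, T)))                 # lower triangular matrix of ones
eps = np.array([0.5, -1.0, 2.0, 0.25, -0.75])
x = L1 @ eps                                  # chain states from the noise vector
# x equals np.cumsum(eps): [0.5, -0.5, 1.5, 1.75, 1.0]
```

This is why sampling the chain never needs the matrix in practice: the O(T²) product collapses to an O(T) running sum.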
Multivariate Process

I Since x is linearly related to ε, we know x is a Gaussian process.
I Trick: we only need to compute the mean and covariance of x to determine that Gaussian.
Latent Process Mean

Starting from x = L1ε with ε ∼ N(0, αI),

⟨x⟩ = ⟨L1ε⟩ = L1⟨ε⟩ = L1 0 = 0.

Latent Process Covariance

Using x⊤ = ε⊤L1⊤,

⟨xx⊤⟩ = ⟨L1εε⊤L1⊤⟩ = L1⟨εε⊤⟩L1⊤ = αL1L1⊤,

since ⟨εε⊤⟩ = αI.
Latent Process

x = L1ε,   ε ∼ N(0, αI)

=⇒ x ∼ N(0, αL1L1^⊤)
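This construction is easy to check numerically. A minimal sketch (not from the slides; variable names are illustrative) draws the latent process x = L1ε, where L1 is the lower-triangular matrix of ones, so x is a cumulative sum of independent Gaussian steps — a discrete random walk:

```python
import numpy as np

# Sample x = L1 @ eps with L1 the lower-triangular matrix of ones.
T = 5
alpha = 2.0
rng = np.random.default_rng(0)

L1 = np.tril(np.ones((T, T)))          # lower-triangular matrix of ones
eps = rng.normal(0.0, np.sqrt(alpha), size=T)
x = L1 @ eps                           # equivalently np.cumsum(eps)

# The implied covariance of x is alpha * L1 @ L1.T,
# whose (i, j) entry is alpha * min(i+1, j+1).
K = alpha * L1 @ L1.T
```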
![Page 129: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/129.jpg)
Covariance for Latent Process II
I Make the variance dependent on the time interval.
I Assume variance grows linearly with time.
I Justification: the sum of two Gaussian distributed random variables is distributed as a Gaussian with the sum of the variances.
I If the variable’s movement is additive over time (as described), the variance scales linearly with time.
![Page 130: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/130.jpg)
Covariance for Latent Process II
I Given

ε ∼ N(0, αI) =⇒ x ∼ N(0, αL1L1^⊤).

Then

ε ∼ N(0, ∆t αI) =⇒ x ∼ N(0, ∆t αL1L1^⊤),

where ∆t is the time interval between observations.
![Page 131: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/131.jpg)
Covariance for Latent Process II

ε ∼ N(0, α∆tI),   x ∼ N(0, α∆tL1L1^⊤)

K = α∆tL1L1^⊤

k_{i,j} = α∆t l_{:,i}^⊤ l_{:,j}

where l_{:,k} is a vector from the kth row of L1: the first k elements are one, the next T − k are zero.

k_{i,j} = α∆t min(i, j)

Define t_i = ∆t · i, so

k_{i,j} = α min(t_i, t_j) = k(t_i, t_j)
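The two expressions for K can be checked against each other. A small sketch (illustrative names; assumes evenly spaced times t_i = ∆t · i as above):

```python
import numpy as np

# Build the Brownian-motion covariance k(t, t') = alpha * min(t, t')
# two ways: directly from the inputs, and via alpha * dt * L1 @ L1.T.
alpha, dt, T = 2.0, 0.5, 6
t = dt * np.arange(1, T + 1)                  # t_i = dt * i

K_direct = alpha * np.minimum.outer(t, t)     # alpha * min(t_i, t_j)

L1 = np.tril(np.ones((T, T)))
K_chol = alpha * dt * L1 @ L1.T               # alpha * dt * min(i, j)
```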
![Page 135: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/135.jpg)
Covariance Functions
Where did this covariance matrix come from?

Markov Process

k(t, t′) = α min(t, t′)

I Covariance matrix is built using the inputs to the function, t.
![Page 137: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/137.jpg)
Covariance Functions
Where did this covariance matrix come from?

Markov Process

Visualization of the inverse covariance (precision).

I Precision matrix is sparse: only neighbours in the matrix are non-zero.
I This reflects conditional independencies in the data.
I In this case, Markov structure.
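The sparsity claim can be verified directly: inverting the Brownian/Markov covariance yields a tridiagonal precision matrix. A sketch (illustrative, not from the slides):

```python
import numpy as np

# The precision (inverse covariance) of k(t, t') = alpha * min(t, t')
# is tridiagonal: each point depends only on its neighbours.
alpha, T = 1.0, 8
t = np.arange(1, T + 1, dtype=float)
K = np.minimum.outer(t, t) * alpha
P = np.linalg.inv(K)

# Entries beyond the first off-diagonal are zero up to numerical noise.
mask = np.abs(P) > 1e-8
```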
![Page 138: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/138.jpg)
Covariance Functions
Where did this covariance matrix come from?

Exponentiated Quadratic Kernel Function (RBF, Squared Exponential, Gaussian)

k(x, x′) = α exp( −‖x − x′‖₂² / (2ℓ²) )

I Covariance matrix is built using the inputs to the function, x.
I For the example above it was based on Euclidean distance.
I The covariance function is also known as a kernel.
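A sketch implementation of the exponentiated quadratic covariance (function and parameter names are illustrative, not a library API):

```python
import numpy as np

def exp_quad(X, X2=None, alpha=1.0, lengthscale=1.0):
    """k(x, x') = alpha * exp(-||x - x'||^2 / (2 l^2)) between rows of X, X2."""
    if X2 is None:
        X2 = X
    sq_dist = np.sum((X[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return alpha * np.exp(-sq_dist / (2.0 * lengthscale ** 2))

X = np.linspace(0, 1, 5)[:, None]
K = exp_quad(X, alpha=2.0, lengthscale=0.3)
```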
![Page 140: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/140.jpg)
Covariance Functions
Where did this covariance matrix come from?

Exponentiated Quadratic

Visualization of the inverse covariance (precision).

I Precision matrix is not sparse.
I Each point is dependent on all the others.
I In this case, non-Markovian.
![Page 142: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/142.jpg)
Simple Kalman Filter I

I We have a state vector X = [x_{:,1} . . . x_{:,q}] ∈ ℝ^{T×q} and if each state evolves independently we have

p(X) = ∏_{i=1}^{q} p(x_{:,i}),   p(x_{:,i}) = N(x_{:,i} | 0, K).

I We want to obtain outputs through:

y_{i,:} = W x_{i,:}
![Page 143: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/143.jpg)
Stacking and Kronecker Products I
I Represent with a ‘stacked’ system:

p(x) = N(x | 0, I ⊗ K)

where the stacking is placing each column of X one on top of another as

x = [x_{:,1}; x_{:,2}; . . . ; x_{:,q}]
![Page 144: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/144.jpg)
Kronecker Product
[a b; c d] ⊗ K = [aK bK; cK dK]
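The Kronecker product tiles K scaled by each entry of the left factor. A quick numerical check (illustrative values):

```python
import numpy as np

# [[a, b], [c, d]] kron K places a*K, b*K, c*K, d*K as blocks.
a, b, c, d = 1.0, 2.0, 3.0, 4.0
A = np.array([[a, b], [c, d]])
K = np.array([[2.0, 1.0],
              [1.0, 2.0]])
AK = np.kron(A, K)
```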
![Page 147: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/147.jpg)
Column Stacking
![Page 148: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/148.jpg)
For this stacking the marginal distribution over time is given bythe block diagonals.
![Page 153: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/153.jpg)
Two Ways of Stacking
Can also stack each row of X to form a column vector:

x = [x_{1,:}; x_{2,:}; . . . ; x_{T,:}]

p(x) = N(x | 0, K ⊗ I)
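The two stackings describe the same joint distribution, related by a permutation of the entries. A sketch verifying that I ⊗ K (column stacking) and K ⊗ I (row stacking) are permutation-equivalent (illustrative sizes):

```python
import numpy as np

T, q = 4, 3
t = np.arange(1, T + 1, dtype=float)
K = np.minimum.outer(t, t)            # any valid T x T covariance

C_col = np.kron(np.eye(q), K)         # covariance of column-stacked X
C_row = np.kron(K, np.eye(q))         # covariance of row-stacked X
```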
![Page 154: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/154.jpg)
Row Stacking
![Page 155: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/155.jpg)
For this stacking the marginal distribution over the latentdimensions is given by the block diagonals.
![Page 160: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/160.jpg)
Observed Process
The observations are related to the latent points by a linear mapping matrix,

y_{i,:} = W x_{i,:} + ε_{i,:},   ε ∼ N(0, σ²I)
![Page 161: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/161.jpg)
Mapping from Latent Process to Observed
[W 0 0; 0 W 0; 0 0 W] × [x_{1,:}; x_{2,:}; x_{3,:}] = [W x_{1,:}; W x_{2,:}; W x_{3,:}]
![Page 162: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/162.jpg)
Output Covariance
This leads to a covariance of the form

(I ⊗ W)(K ⊗ I)(I ⊗ W^⊤) + σ²I

Using (A ⊗ B)(C ⊗ D) = AC ⊗ BD this leads to

K ⊗ WW^⊤ + σ²I

or, for the column stacking,

y ∼ N(0, WW^⊤ ⊗ K + σ²I)
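The mixed-product identity is easy to confirm numerically. A sketch (illustrative sizes; W and K are arbitrary valid choices, not from the slides):

```python
import numpy as np

# Verify (I kron W)(K kron I)(I kron W.T) = K kron (W @ W.T).
rng = np.random.default_rng(2)
T, q, p = 4, 3, 2
t = np.arange(1, T + 1, dtype=float)
K = np.minimum.outer(t, t)            # T x T latent covariance
W = rng.normal(size=(p, q))           # output mapping

lhs = np.kron(np.eye(T), W) @ np.kron(K, np.eye(q)) @ np.kron(np.eye(T), W.T)
rhs = np.kron(K, W @ W.T)
```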
![Page 163: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/163.jpg)
Kernels for Vector Valued Outputs: A Review
M. A. Alvarez, L. Rosasco and N. D. Lawrence. Kernels for Vector-Valued Functions: A Review. Foundations and Trends® in Machine Learning, Vol. 4, No. 3 (2011), pp. 195–266. DOI: 10.1561/2200000036.
![Page 164: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/164.jpg)
Kronecker Structure GPs
I This Kronecker structure leads to several published models:

(K(x, x′))_{d,d′} = k(x, x′) k_T(d, d′),

where k takes x as input and k_T takes the output index d as input.
I Can think of multiple output covariance functions as covariances with an augmented input.
I Alongside x we also input the d associated with the output of interest.
![Page 165: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/165.jpg)
Separable Covariance Functions
I Taking B = WW^⊤ we have a matrix expression across outputs:

K(x, x′) = k(x, x′) B,

where B is a p × p symmetric and positive semi-definite matrix.
I B is called the coregionalization matrix.
I We call this class of covariance functions separable due to their product structure.
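A separable multi-output covariance is just a Kronecker product of B with the kernel matrix on the inputs. A sketch with a rank-one B = ww^⊤ (illustrative values; the exponentiated quadratic is one possible choice of k):

```python
import numpy as np

def exp_quad(X, lengthscale=0.3):
    # k(x, x') = exp(-||x - x'||^2 / (2 l^2))
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * lengthscale ** 2))

X = np.linspace(0, 1, 6)[:, None]
w = np.array([[1.0], [5.0]])
B = w @ w.T                       # coregionalization matrix [[1, 5], [5, 25]]
K_full = np.kron(B, exp_quad(X))  # K(X, X) = B kron k(X, X)
```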
![Page 166: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/166.jpg)
Sum of Separable Covariance Functions
I In the same spirit a more general class of kernels is given by

K(x, x′) = ∑_{j=1}^{q} k_j(x, x′) B_j.

I This can also be written as

K(X, X) = ∑_{j=1}^{q} B_j ⊗ k_j(X, X).

I This is like several Kalman filter-type models added together, but each one with a different set of latent functions.
I We call this class of kernels sum of separable kernels (SoS kernels).
![Page 167: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/167.jpg)
Geostatistics
I Use of GPs in geostatistics is called kriging.
I These multi-output GPs were pioneered in geostatistics: prediction over vector-valued output data is known as cokriging.
I The model in geostatistics is known as the linear model of coregionalization (LMC; Journel and Huijbregts (1978); Goovaerts (1997)).
I Most machine learning multitask models can be placed in the context of the LMC model.
![Page 168: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/168.jpg)
Weighted sum of Latent Functions
I In the linear model of coregionalization (LMC) outputs are expressed as linear combinations of independent random functions.
I In the LMC, each component f_d is expressed as a linear sum

f_d(x) = ∑_{j=1}^{q} w_{d,j} u_j(x),

where the latent functions are independent and have covariance functions k_j(x, x′).
I The processes {u_j(x)}_{j=1}^{q} are mutually independent: u_j and u_{j′} are independent for j ≠ j′.
![Page 169: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/169.jpg)
Kalman Filter Special Case
I The Kalman filter is an example of the LMC where u_i(x) → x_i(t).
I I.e. we’ve moved from a time input to a more general input space.
I In matrix notation:
  1. Kalman filter: F = WX
  2. LMC: F = WU
where the rows of the matrices F, X, U each contain q samples from their corresponding functions at a different time (Kalman filter) or spatial location (LMC).
![Page 170: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/170.jpg)
Intrinsic Coregionalization Model
I If one covariance is used for all latent functions (as in the Kalman filter).
I This is called the intrinsic coregionalization model (ICM; Goovaerts (1997)).
I The kernel matrix corresponding to a dataset X takes the form

K(X, X) = B ⊗ k(X, X).
![Page 171: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/171.jpg)
Autokrigeability
I If outputs are noise-free, maximum likelihood is equivalent to independent fits of B and k(x, x′) (Helterbrand and Cressie, 1994).
I In geostatistics this is known as autokrigeability (Wackernagel, 2003).
I In multitask learning it’s the cancellation of intertask transfer (Bonilla et al., 2008).
![Page 172: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/172.jpg)
Intrinsic Coregionalization Model
K(X, X) = ww^⊤ ⊗ k(X, X).

w = [1; 5]   ⇒   B = ww^⊤ = [1 5; 5 25]
![Page 177: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/177.jpg)
Intrinsic Coregionalization Model
K(X, X) = B ⊗ k(X, X).

B = [1 0.5; 0.5 1.5]
![Page 182: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/182.jpg)
LMC Samples
K(X, X) = B1 ⊗ k1(X, X) + B2 ⊗ k2(X, X)

B1 = [1.4 0.5; 0.5 1.2],   ℓ1 = 1
B2 = [1 0.5; 0.5 1.3],   ℓ2 = 0.2
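Samples like those on this slide can be drawn by assembling the sum-of-separable covariance and taking a Cholesky factor. A sketch with the B1, B2 and length-scales shown above (exponentiated quadratic kernels assumed; names illustrative):

```python
import numpy as np

def exp_quad(X, lengthscale):
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2 * lengthscale ** 2))

X = np.linspace(0, 1, 20)[:, None]
B1 = np.array([[1.4, 0.5], [0.5, 1.2]])
B2 = np.array([[1.0, 0.5], [0.5, 1.3]])
K = np.kron(B1, exp_quad(X, 1.0)) + np.kron(B2, exp_quad(X, 0.2))

rng = np.random.default_rng(3)
jitter = 1e-6 * np.eye(K.shape[0])          # numerical stabiliser
f = np.linalg.cholesky(K + jitter) @ rng.normal(size=K.shape[0])
f1, f2 = f[:20], f[20:]                     # the two correlated outputs
```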
![Page 187: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/187.jpg)
LMC in Machine Learning and Statistics
I Used in machine learning for GPs for multivariate regression and in statistics for computer emulation of expensive multivariate computer codes.
I Imposes the correlation of the outputs explicitly through the set of coregionalization matrices.
I Setting B = I_p assumes the outputs are conditionally independent given the parameters θ (Minka and Picard, 1997; Lawrence and Platt, 2004; Yu et al., 2005).
I More recent approaches for multiple output modelling are different versions of the linear model of coregionalization.
![Page 188: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/188.jpg)
Semiparametric Latent Factor Model
I Coregionalization matrices are rank 1 (Teh et al., 2005). Rewrite the sum of separable kernels as

K(X, X) = ∑_{j=1}^{q} w_{:,j} w_{:,j}^⊤ ⊗ k_j(X, X).

I Like the Kalman filter, but each latent function has a different covariance.
I Authors suggest using an exponentiated quadratic with a characteristic length-scale for each input dimension.
![Page 189: Fitting Covariance and Multioutput Gaussian Processesml.dcs.shef.ac.uk/gpss/gpss14/talks/gp_gpss14_session2.pdf · Fitting Covariance and Multioutput Gaussian Processes Neil D. Lawrence](https://reader031.fdocuments.us/reader031/viewer/2022021904/5ba46aec09d3f2af168d6de6/html5/thumbnails/189.jpg)
Semiparametric Latent Factor Model Samples

$$K(X,X) = w_{:,1} w_{:,1}^\top \otimes k_1(X,X) + w_{:,2} w_{:,2}^\top \otimes k_2(X,X)$$

$$w_1 = \begin{bmatrix} 0.5 \\ 1 \end{bmatrix}, \qquad w_2 = \begin{bmatrix} 1 \\ 0.5 \end{bmatrix}$$
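The rank-1 structure is easy to verify in code: each coregionalization matrix w w^⊤ has rank one, so the SLFM is the LMC restricted to rank-1 B_j. A sketch using the slide's weight vectors, with NumPy; the length-scales are illustrative choices, not given on the slide:

```python
import numpy as np

def rbf(X, X2, lengthscale):
    """Exponentiated quadratic covariance between 1-D input sets."""
    d2 = (X[:, None, 0] - X2[None, :, 0]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

w1 = np.array([0.5, 1.0])
w2 = np.array([1.0, 0.5])

# Rank-1 coregionalization matrices: B_j = w_j w_j^T.
B1, B2 = np.outer(w1, w1), np.outer(w2, w2)

X = np.linspace(0, 1, 30)[:, None]
# Length-scales 1.0 and 0.2 are assumptions for illustration.
K = np.kron(B1, rbf(X, X, 1.0)) + np.kron(B2, rbf(X, X, 0.2))
```

Each term contributes one latent function, scaled into each output by the corresponding entry of w_j, which is why the two sampled outputs share structure at both length-scales.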
Gaussian Processes for Multi-task, Multi-output and Multi-class

- Bonilla et al. (2008) suggest the ICM for multitask learning.
- Use a PPCA form for B: similar to our Kalman filter example.
- Refer to the autokrigeability effect as the cancellation of inter-task transfer.
- Also discuss the similarities between the multi-task GP and the ICM, and its relationship to the SLFM and the LMC.
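The contrast between a PPCA-style B and the independent case B = I_p shows up directly in the kernel matrix: with B = I_p the joint covariance is block-diagonal, so no information transfers between tasks. A sketch, assuming NumPy; the number of tasks, rank, noise level, and `rbf` helper are illustrative assumptions:

```python
import numpy as np

def rbf(X, X2, lengthscale=1.0):
    """Exponentiated quadratic covariance between 1-D input sets."""
    d2 = (X[:, None, 0] - X2[None, :, 0]) ** 2
    return np.exp(-0.5 * d2 / lengthscale ** 2)

p, r = 3, 1                              # number of tasks, rank of shared part
W = np.random.default_rng(1).standard_normal((p, r))
B_ppca = W @ W.T + 0.1 * np.eye(p)       # PPCA form: low rank plus diagonal

X = np.linspace(0, 1, 20)[:, None]
K_icm = np.kron(B_ppca, rbf(X, X))       # off-diagonal blocks couple the tasks
K_indep = np.kron(np.eye(p), rbf(X, X))  # B = I_p: block-diagonal, no transfer
```

With `K_indep`, predictions for one task are unaffected by observations of the others; the off-diagonal blocks of `K_icm` are what carry the inter-task transfer.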
Multitask Classification

- Mostly restricted to the case where the outputs are conditionally independent given the hyperparameters φ (Minka and Picard, 1997; Williams and Barber, 1998; Lawrence and Platt, 2004; Seeger and Jordan, 2004; Yu et al., 2005; Rasmussen and Williams, 2006).
- The intrinsic coregionalization model has been used in the multiclass scenario. Skolidis and Sanguinetti (2011) use it for classification by introducing a probit noise model as the likelihood.
- The posterior distribution is no longer analytically tractable: approximate inference is required.
Computer Emulation

- A statistical model used as a surrogate for a computationally expensive computer model.
- Higdon et al. (2008) use the linear model of coregionalization to model images representing the evolution of the implosion of steel cylinders.
- Conti and O'Hagan (2009) use the ICM to model a vegetation model: the Sheffield Dynamic Global Vegetation Model (Woodward et al., 1998).
References I
E. V. Bonilla, K. M. Chai, and C. K. I. Williams. Multi-task Gaussian process prediction. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20, Cambridge, MA, 2008. MIT Press.

S. Conti and A. O'Hagan. Bayesian emulation of complex multi-output and dynamic computer models. Journal of Statistical Planning and Inference, 140(3):640–651, 2009. [DOI].

G. Della Gatta, M. Bansal, A. Ambesi-Impiombato, D. Antonini, C. Missero, and D. di Bernardo. Direct targets of the trp63 transcription factor revealed by a combination of gene expression profiling and reverse engineering. Genome Research, 18(6):939–948, Jun 2008. [URL]. [DOI].

P. Goovaerts. Geostatistics For Natural Resources Evaluation. Oxford University Press, 1997. [Google Books].

J. D. Helterbrand and N. A. C. Cressie. Universal cokriging under intrinsic coregionalization. Mathematical Geology, 26(2):205–226, 1994.

D. M. Higdon, J. Gattiker, B. Williams, and M. Rightley. Computer model calibration using high dimensional output. Journal of the American Statistical Association, 103(482):570–583, 2008.

A. G. Journel and C. J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978. [Google Books].

A. A. Kalaitzis and N. D. Lawrence. A simple approach to ranking differentially expressed gene expression time courses through Gaussian process regression. BMC Bioinformatics, 12(180), 2011. [DOI].

N. D. Lawrence and J. C. Platt. Learning to learn with the informative vector machine. In R. Greiner and D. Schuurmans, editors, Proceedings of the International Conference in Machine Learning, volume 21, pages 512–519. Omnipress, 2004. [PDF].

T. P. Minka and R. W. Picard. Learning how to learn is learning with point sets. Available on-line, 1997. [URL]. Revised 1999, available at http://www.stat.cmu.edu/˜minka/.

J. Oakley and A. O'Hagan. Bayesian inference for the uncertainty distribution of computer model outputs. Biometrika, 89(4):769–784, 2002.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006. [Google Books].
M. Seeger and M. I. Jordan. Sparse Gaussian Process Classification With Multiple Classes. Technical Report 661, Department of Statistics, University of California at Berkeley, 2004.
G. Skolidis and G. Sanguinetti. Bayesian multitask classification with Gaussian process priors. IEEE Transactions on Neural Networks, 22(12):2011–2021, 2011.

Y. W. Teh, M. Seeger, and M. I. Jordan. Semiparametric latent factor models. In R. G. Cowell and Z. Ghahramani, editors, Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, pages 333–340, Barbados, 6-8 January 2005. Society for Artificial Intelligence and Statistics.

H. Wackernagel. Multivariate Geostatistics: An Introduction With Applications. Springer-Verlag, 3rd edition, 2003. [Google Books].

C. K. Williams and D. Barber. Bayesian Classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.

I. Woodward, M. R. Lomas, and R. A. Betts. Vegetation-climate feedbacks in a greenhouse world. Philosophical Transactions: Biological Sciences, 353(1365):29–39, 1998.
K. Yu, V. Tresp, and A. Schwaighofer. Learning Gaussian processes from multiple tasks. In Proceedings of the 22ndInternational Conference on Machine Learning (ICML 2005), pages 1012–1019, 2005.