Gaussian Process Regression Networks
Andrew Gordon Wilson
[email protected] · University of Cambridge
Joint work with David A. Knowles and Zoubin Ghahramani
June 27, 2012 · ICML, Edinburgh
Multiple responses with input dependent covariances
- Two response variables: y1(x), the concentration of cadmium at a spatial location x, and y2(x), the concentration of zinc at a spatial location x.
- The values of these responses, at a given spatial location x*, are correlated.
- We can account for these correlations (rather than assuming y1(x) and y2(x) are independent) to enhance predictions.
- We can further enhance predictions by accounting for how these correlations vary with geographical location x. Accounting for input dependent correlations is a distinctive feature of the Gaussian process regression network.
[Figure: colour map over longitude (1–5) and latitude (1–6) showing how the correlation between the two responses varies with spatial location.]
Motivation for modelling dependent covariances
Promise
- Many problems in fact have input dependent uncertainties and correlations.
- Accounting for dependent covariances (uncertainties and correlations) can greatly improve statistical inferences.

Uncharted Territory
- For convenience, response variables are typically seen as independent, or as having fixed covariances (e.g. the multi-task literature).
- The few existing models of dependent covariances are typically not expressive (e.g. Brownian motion covariance structure) or scalable (e.g. < 5 response variables).

Goal
- We want to develop expressive and scalable models (> 1000 response variables) for dependent uncertainties and correlations.
Outline
- Gaussian process review
- Gaussian process regression networks
- Applications
Gaussian processes
Definition
A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Nonparametric Regression Model
- Prior: f(x) ∼ GP(m(x), k(x, x′)), meaning (f(x_1), …, f(x_N)) ∼ N(µ, K), with µ_i = m(x_i) and K_ij = cov(f(x_i), f(x_j)) = k(x_i, x_j).
- Posterior: p(f(x) | D) ∝ p(D | f(x)) p(f(x)), i.e. GP posterior ∝ likelihood × GP prior.
[Figure: a) sample functions from the Gaussian process prior; b) sample functions from the Gaussian process posterior; input t ∈ [−10, 10], output f(t) ∈ [−3, 3].]
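The prior-to-posterior update above can be sketched numerically. This is a minimal NumPy illustration, not from the slides: the RBF kernel, noise level, and toy data are illustrative assumptions.

```python
# Sketch: sampling from a GP prior and posterior with an RBF kernel.
import numpy as np

def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(x, x') = v * exp(-(x - x')^2 / (2 l^2))."""
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

rng = np.random.default_rng(0)
t = np.linspace(-10, 10, 100)              # test inputs
K = rbf_kernel(t, t)

# Draw sample functions from the GP prior (jitter for numerical stability).
prior_samples = rng.multivariate_normal(
    np.zeros(len(t)), K + 1e-8 * np.eye(len(t)), size=3)

# Condition on a few noisy observations D = {(t_i, y_i)}.
t_obs = np.array([-6.0, -2.0, 1.0, 5.0])
y_obs = np.sin(t_obs)                      # toy data
noise = 0.1
K_oo = rbf_kernel(t_obs, t_obs) + noise**2 * np.eye(len(t_obs))
K_to = rbf_kernel(t, t_obs)
K_oo_inv = np.linalg.solve(K_oo, np.eye(len(t_obs)))

# Standard GP regression posterior: mean and covariance at the test inputs.
mu_post = K_to @ K_oo_inv @ y_obs
K_post = K - K_to @ K_oo_inv @ K_to.T
posterior_samples = rng.multivariate_normal(
    mu_post, K_post + 1e-8 * np.eye(len(t)), size=3)
```

The posterior samples agree with the data near the observed inputs and revert to the prior away from them, matching the two panels of the figure.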
Gaussian processes
“How can Gaussian processes possibly replace neural networks? Did we throw the baby out with the bathwater?”

David MacKay, 1998.
Semiparametric Latent Factor Model
The semiparametric latent factor model (SLFM) (Teh, Seeger, and Jordan, 2005) is a popular multi-output (multi-task) GP model with fixed signal correlations between outputs (response variables):
y(x) = W f(x) + σ_y z(x),

where y(x) is p×1, W is p×q, f(x) is q×1, and z(x) ∼ N(0, I_p).
- x: input variable (e.g. geographical location).
- y(x): p × 1 vector of output variables (responses) evaluated at x.
- W: p × q matrix of mixing weights.
- f(x): q × 1 vector of Gaussian process functions.
- σ_y: hyperparameter controlling noise variance.
- z(x): i.i.d. Gaussian white noise with p × p identity covariance I_p.
[Diagram: SLFM network. Input x feeds node functions f_1(x), …, f_q(x), which are mixed by fixed weights W_11, …, W_pq into outputs y_1(x), …, y_p(x).]
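Generating data from the SLFM makes the fixed-correlation structure concrete. A minimal NumPy sketch, not from the slides; the RBF kernel and the sizes p, q, N are illustrative assumptions.

```python
# Sketch: sampling from the SLFM, y(x) = W f(x) + sigma_y z(x),
# with fixed p x q mixing weights W.
import numpy as np

rng = np.random.default_rng(1)
p, q, N = 3, 2, 50                       # outputs, latent GPs, input locations
x = np.linspace(0, 5, N)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # RBF kernel, lengthscale 1

# q independent GP node functions, stacked as a q x N matrix.
f = rng.multivariate_normal(np.zeros(N), K + 1e-8 * np.eye(N), size=q)

W = rng.standard_normal((p, q))          # fixed mixing weights
sigma_y = 0.05
y = W @ f + sigma_y * rng.standard_normal((p, N))  # p x N outputs

# Signal covariance between outputs is W W^T: the same at every x.
signal_cov = W @ W.T
```

The last line shows the SLFM limitation the GPRN removes: because W does not depend on x, the between-output signal covariance W Wᵀ is constant over the input space.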
Deriving the GPRN
[Diagram: network structure at x = x_1 and at x = x_2, showing which basis functions f_1, f_2 connect to the outputs y_1, y_2 at each input.]
At x = x_1 the two outputs (responses) y_1 and y_2 are correlated since they share the basis function f_1. At x = x_2 the outputs are independent.
From the SLFM to the GPRN
[Diagram: the SLFM network (input x, node functions f_1(x), …, f_q(x), fixed weights W_11, …, W_pq, outputs y_1(x), …, y_p(x)) beside a placeholder marked “?” for the GPRN.]

y(x) = W f(x) + σ_y z(x),  with y(x) p×1, W p×q, f(x) q×1, and z(x) ∼ N(0, I_p).
From the SLFM to the GPRN
[Diagram: the SLFM network (fixed weights W_ij) beside the GPRN network, in which the weights W_ij(x) and the noisy node functions f̂_i(x) are themselves functions of the input x.]

SLFM:  y(x) = W f(x) + σ_y z(x),  with W a fixed p×q matrix and z(x) ∼ N(0, I_p).

GPRN:  y(x) = W(x) [ f(x) + σ_f ε(x) ] + σ_y z(x),  where f̂(x) = f(x) + σ_f ε(x), ε(x) ∼ N(0, I_q), z(x) ∼ N(0, I_p), W(x) is p×q, and f(x) is q×1.
Gaussian process regression networks
y(x) = W(x) [ f(x) + σ_f ε(x) ] + σ_y z(x),

with f̂(x) = f(x) + σ_f ε(x), ε(x) ∼ N(0, I_q), and z(x) ∼ N(0, I_p), or, equivalently,

y(x) = W(x) f(x) [signal] + σ_f W(x) ε(x) + σ_y z(x) [noise].
- y(x): p × 1 vector of output variables (responses) evaluated at x.
- W(x): p × q matrix of weight functions, W_ij(x) ∼ GP(0, k_w).
- f(x): q × 1 vector of Gaussian process node functions, f_i(x) ∼ GP(0, k_f).
- σ_f, σ_y: hyperparameters controlling noise variance.
- ε(x), z(x): Gaussian white noise.
[Diagram: the GPRN network. Input x feeds node functions f̂_1(x), …, f̂_q(x), mixed by weight functions W_11(x), …, W_pq(x) into outputs y_1(x), …, y_p(x).]
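The GPRN construction can be sketched generatively: draw both the node functions and the weight functions from GPs, then mix them pointwise. A minimal NumPy sketch, not from the slides; the kernels, lengthscales, and sizes are illustrative assumptions.

```python
# Sketch: sampling from the GPRN prior,
# y(x) = W(x) [f(x) + sigma_f eps(x)] + sigma_y z(x),
# where every entry of W(x) and f(x) is a GP draw over the inputs.
import numpy as np

rng = np.random.default_rng(2)
p, q, N = 2, 2, 60                       # outputs, nodes, input locations
x = np.linspace(0, 5, N)

def rbf(x, lengthscale):
    return (np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / lengthscale**2)
            + 1e-8 * np.eye(len(x)))

Kf = rbf(x, 0.5)                         # node kernel k_f (faster varying)
Kw = rbf(x, 2.0)                         # weight kernel k_w (slower varying)

f = rng.multivariate_normal(np.zeros(N), Kf, size=q)        # q x N nodes
W = rng.multivariate_normal(np.zeros(N), Kw, size=(p, q))   # p x q x N weights
sigma_f, sigma_y = 0.1, 0.05

f_hat = f + sigma_f * rng.standard_normal((q, N))  # f(x) + sigma_f eps(x)
# At each input x_n, multiply the p x q weight matrix W(x_n) by f_hat(x_n).
y = np.einsum('pqn,qn->pn', W, f_hat) + sigma_y * rng.standard_normal((p, N))
```

Because W varies with x, the mixing of the nodes, and hence the covariance between the outputs, changes across the input space, unlike in the SLFM.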
GPRN Inference
- We sample from the posterior over Gaussian processes in the weight and node functions using elliptical slice sampling (ESS) (Murray, Adams, and MacKay, 2010). ESS is especially good for sampling from posteriors with correlated Gaussian priors.
- We also approximate this posterior using a message passing implementation of variational Bayes (VB).
- The computational complexity is cubic in the number of data points and linear in the number of response variables, per iteration of ESS or VB.
- Details are in the paper.
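A single elliptical slice sampling transition is short enough to sketch. This follows the Murray, Adams, and MacKay (2010) update; the function names and the assumption of a zero-mean prior given by a Cholesky factor are mine, not the paper's.

```python
# Sketch: one elliptical slice sampling transition for a latent vector u
# with Gaussian prior N(0, L L^T) and log-likelihood log_lik.
import numpy as np

def ess_step(u, log_lik, chol_prior, rng):
    """One ESS update; always accepts by shrinking the bracket toward u."""
    nu = chol_prior @ rng.standard_normal(u.shape)   # auxiliary prior draw
    log_y = log_lik(u) + np.log(rng.uniform())       # slice threshold
    theta = rng.uniform(0.0, 2.0 * np.pi)            # initial angle
    lo, hi = theta - 2.0 * np.pi, theta              # shrinking bracket
    while True:
        # Proposals lie on the ellipse through u and nu.
        proposal = u * np.cos(theta) + nu * np.sin(theta)
        if log_lik(proposal) > log_y:
            return proposal
        if theta < 0.0:
            lo = theta
        else:
            hi = theta
        theta = rng.uniform(lo, hi)                  # shrink toward theta = 0
```

In the GPRN, u would stack the node and weight function values, with chol_prior derived from the prior covariance C_B built from k_f and k_w; the loop terminates because the bracket shrinks toward θ = 0, where the proposal recovers u itself.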
GPRN Results, Jura Heavy Metal Dataset
[Figure: colour map over longitude (1–5) and latitude (1–6) of the spatially varying correlations on the Jura dataset, alongside the learned GPRN with nodes f_1(x), f_2(x), weights W_11(x), …, W_32(x), and outputs y_1(x), y_2(x), y_3(x).]

y(x) = W(x) f(x) [signal] + σ_f W(x) ε(x) + σ_y z(x) [noise].
GPRN Results, Gene Expression 50D
[Bar chart: SMSE on the GENE dataset (p = 50) for GPRN (ESS), GPRN (VB), CMOGP, LMC, and SLFM; y-axis from 0 to 0.8.]
GPRN Results, Gene Expression 1000D
[Bar chart: SMSE on the GENE dataset (p = 1000) for GPRN (ESS), GPRN (VB), CMOFITC, CMOPITC, and CMODTC; y-axis from 0 to 0.7.]
Training Times on GENE
Training time       GENE (50D)   GENE (1000D)
GPRN (VB)           12 s         330 s
GPRN (ESS)          40 s         9000 s
LMC, CMOGP, SLFM    minutes      days
Multivariate Volatility Results
[Bar chart: MSE on the EQUITY dataset for GPRN (VB), GPRN (ESS), GWP, WP, and MGARCH; y-axis from 0 to 7.]
Summary
- A Gaussian process regression network is used for multi-task regression and multivariate volatility, and can account for input dependent signal and noise covariances.
- Can scale to thousands of dimensions.
- Outperforms multi-task Gaussian process models and multivariate volatility models.
Generalised Wishart Processes

Recall that the GPRN model can be written as

y(x) = W(x) f(x) [signal] + σ_f W(x) ε(x) + σ_y z(x) [noise].

The induced noise process,

Σ_noise(x) = σ_f² W(x) W(x)ᵀ + σ_y² I,

is an example of a Generalised Wishart Process (Wilson and Ghahramani, 2010). At every x, Σ(x) is marginally Wishart, and the dynamics of Σ(x) are governed by the GP covariance kernel used for the weight functions in W(x).
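The induced noise covariance above is easy to compute from GP-drawn weights. A minimal NumPy sketch, not from the slides; the weight kernel and the sizes p, q, N are illustrative assumptions.

```python
# Sketch: Sigma(x) = sigma_f^2 W(x) W(x)^T + sigma_y^2 I, evaluated at a
# grid of inputs from GP-drawn weight functions.
import numpy as np

rng = np.random.default_rng(3)
p, q, N = 3, 2, 40                       # outputs, nodes, input locations
x = np.linspace(0, 5, N)
Kw = (np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 2.0**2)
      + 1e-8 * np.eye(N))                # weight kernel k_w

W = rng.multivariate_normal(np.zeros(N), Kw, size=(p, q))   # p x q x N
sigma_f, sigma_y = 0.2, 0.05

# One p x p covariance matrix per input: Sigma[n] = sf^2 W(x_n) W(x_n)^T + sy^2 I.
Sigma = np.einsum('ian,jan->nij', W, W) * sigma_f**2 + sigma_y**2 * np.eye(p)
```

Because the weight functions vary smoothly under k_w, the resulting p × p covariance matrices also vary smoothly with x, which is the Generalised Wishart Process behaviour described above.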
GPRN Inference

y(x) = W(x) f(x) [signal] + σ_f W(x) ε(x) + σ_y z(x) [noise].

The prior is induced through GP priors on the nodes and weights:

p(u | σ_f, γ) = N(0, C_B).

Likelihood:

p(D | u, σ_f, σ_y) = ∏_{i=1}^{N} N(y(x_i); W(x_i) f̂(x_i), σ_y² I_p).

Posterior:

p(u | D, σ_f, σ_y, γ) ∝ p(D | u, σ_f, σ_y) p(u | σ_f, γ).

We sample from the posterior using elliptical slice sampling (Murray, Adams, and MacKay, 2010) or approximate it using a message passing implementation of variational Bayes.