Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi (Tsinghua University)
Joint work with Jun Zhu and Mohammad Emtiyaz Khan
Gaussian Process
● mean function
● covariance function / kernel
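As a minimal illustration (not from the talk; the RBF kernel and all values here are assumptions for the sketch), a GP prior is fully specified by its mean and covariance functions, and a draw from it can be sampled as:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance: k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
mean = np.zeros_like(x)             # zero mean function
cov = rbf_kernel(x, x)              # covariance function / kernel
# One draw from the GP prior at the 50 input locations (jitter for stability)
f = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(x)))
```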
Sparse variational GP [Titsias, 09; Hensman et al., 13]
[Figure: Gaussian field with inducing points]
● Posterior inference: reduced complexity, conjugate likelihoods
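A hedged sketch of the sparse idea (all names and values here are hypothetical, not the talk's notation): M inducing points Z with variational mean m summarize the posterior, and the predictive mean at test points is K_{*Z} K_{ZZ}^{-1} m:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # RBF kernel between 1-D input vectors a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

# Hypothetical 1-D example: M = 3 inducing points summarize the posterior.
Z = np.array([-1.0, 0.0, 1.0])      # inducing point locations
m = np.array([0.5, -0.2, 0.3])      # variational mean of q(u) = N(m, S)
x_star = np.linspace(-2, 2, 5)      # test inputs

Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
Ksz = rbf(x_star, Z)
mu_star = Ksz @ np.linalg.solve(Kzz, m)   # q(f*) mean: K_{*Z} K_{ZZ}^{-1} m
```

At an inducing location the predictive mean reproduces the corresponding entry of m, which is a quick sanity check on the algebra.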
Inference Networks for GP Models: Remove the sparse assumption
[Figure: Gaussian field over inputs and observations; an inference network maps data to predictions]
Examples of Inference Networks
weight space → function space
● Bayesian neural networks:
○ intractable output density
[Figure: random-feature network with sin/cos features, frequencies (s) and weights (w)]
[Sun et al., 19]
● Inference network architecture can be derived from the weight-space posterior
○ Random feature expansions
○ Deep neural nets
[Cutajar et al., 18]
Minibatch Training is Difficult: Functional Variational Bayesian Neural Networks (Sun et al., 19)
● Consider matching the variational and true posterior processes at arbitrary finite sets of measurement points
● This objective performs an improper minibatch approximation of the KL divergence term
● Full-batch fELBO
● Practical fELBO: the KL term is estimated at sampled measurement points
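The fELBO of Sun et al. (2019) can be written as follows (notation reconstructed from the paper; B denotes a minibatch, X_M the sampled measurement points):

```latex
% Full-batch fELBO: match the processes at all finite index sets
\mathcal{L}(q) \;=\; \sum_{n=1}^{N} \mathbb{E}_{q}\!\left[\log p(y_n \mid f(\mathbf{x}_n))\right]
\;-\; \sup_{\mathbf{X}} \mathrm{KL}\!\left[\, q(\mathbf{f}^{\mathbf{X}}) \,\|\, p(\mathbf{f}^{\mathbf{X}}) \,\right]

% Practical fELBO: the sup is replaced by an estimate at M sampled
% measurement points (minibatch inputs plus random locations)
\hat{\mathcal{L}}(q) \;=\; \frac{N}{|B|}\sum_{n \in B} \mathbb{E}_{q}\!\left[\log p(y_n \mid f(\mathbf{x}_n))\right]
\;-\; \mathrm{KL}\!\left[\, q(\mathbf{f}^{\mathbf{X}_M}) \,\|\, p(\mathbf{f}^{\mathbf{X}_M}) \,\right]
```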
Scalable Training of Inference Networks for GP Models: Stochastic, functional mirror descent
● work with the functional density directly
○ natural gradient in the density space
○ minibatch approximation with stochastic functional gradient
● closed-form solution as an adaptive Bayesian filter
[Dai et al., 16; Cheng & Boots, 16]
○ the posterior after seeing the next data point becomes the adapted prior
● sequentially applying Bayes’ rule is the most natural gradient
○ in conjugate models: equivalent to natural gradient for exponential families
[Raskutti & Mukherjee, 13; Khan & Lin, 17]
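A sketch of the resulting update, reconstructed from the mirror-descent literature cited above (the step size β_t and minibatch B are assumed notation):

```latex
% Stochastic functional mirror descent step on the variational objective:
q_{t+1}(f) \;\propto\; q_t(f)^{\,1-\beta_t}\;
\Big[\, p(f) \prod_{n \in B} p(y_n \mid f(\mathbf{x}_n))^{N/|B|} \Big]^{\beta_t}

% With \beta_t = 1 and a full batch this reduces to Bayes' rule, which is
% why sequentially applying Bayes' rule is the "most natural" gradient step.
```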
Scalable Training of Inference Networks for GP Models: Minibatch training of inference networks
[Figure: the inference network (student) is trained toward a teacher distribution]
● an idea from filtering: bootstrap
○ similar idea: temporal difference (TD) learning with function approximation
Scalable Training of Inference Networks for GP Models: Minibatch training of inference networks
● (Gaussian likelihood case) closed-form marginals of the updated posterior at the measurement locations
○ equivalent to GP regression
● (Nonconjugate case) optimize an upper bound instead
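In the Gaussian-likelihood case the teacher's marginals are just GP-regression posteriors at the measurement locations; a minimal NumPy sketch (illustrative data and hyperparameters, not the authors' code):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # RBF kernel between 1-D input vectors a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_regression_marginals(x_train, y_train, x_meas, noise=0.1):
    """Closed-form GP-regression posterior mean/variance at measurement points."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_meas, x_train)
    Kss = rbf(x_meas, x_meas)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

x = np.array([-1.0, 0.0, 1.0])
y = np.array([0.2, 0.9, 0.1])
# Marginals at one in-sample location and one far-away location
mean, var = gp_regression_marginals(x, y, np.array([0.0, 2.0]))
```

The posterior variance shrinks near the data and reverts toward the prior far from it, which is the behavior the teacher distribution exploits.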
Scalable Training of Inference Networks for GP Models: Measurement points vs. inducing points
[Figure: GPNet vs. SVGP fits for M = 2, 5, 20]
● inducing points: expressiveness of the variational approximation
● measurement points: variance of training
Scalable Training of Inference Networks for GP Models: Effect of proper minibatch training
[Figure: FBNN (M=20) vs. GPNet (M=20) on Airline Delay (700K)]
● Fixes underfitting (N=100, batch size = 20)
● Better performance with more measurement points
Scalable Training of Inference Networks for GP Models: Regression & Classification
● Regression benchmarks
● GP classification with a prior derived from infinite-width Bayesian ConvNets
Poster #227
Code: https://github.com/thjashin/gp-infer-net