Scalable Training of Inference Networks for Gaussian-Process Models

Jiaxin Shi (Tsinghua University)
Joint work with Jun Zhu and Mohammad Emtiyaz Khan
Gaussian Process
● mean function
● covariance function / kernel
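As a minimal illustration (not from the talk; the RBF kernel and all values here are assumptions for the sketch), a GP prior is fully specified by its mean and covariance functions, and a draw from it can be sampled as:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    # Squared-exponential (RBF) covariance: k(x, x') = s^2 exp(-(x - x')^2 / (2 l^2))
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
mean = np.zeros_like(x)             # zero mean function
cov = rbf_kernel(x, x)              # covariance function / kernel
# One draw from the GP prior at the 50 input locations (jitter for stability)
f = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(x)))
```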
Sparse variational GP [Titsias, 09; Hensman et al., 13]
[Figure: Gaussian field with inducing points]
● Posterior inference: reduced complexity, conjugate likelihoods
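A hedged sketch of the sparse idea (all names and values here are hypothetical, not the talk's notation): M inducing points Z with variational mean m summarize the posterior, and the predictive mean at test points is K_{*Z} K_{ZZ}^{-1} m:

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # RBF kernel between 1-D input vectors a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

# Hypothetical 1-D example: M = 3 inducing points summarize the posterior.
Z = np.array([-1.0, 0.0, 1.0])      # inducing point locations
m = np.array([0.5, -0.2, 0.3])      # variational mean of q(u) = N(m, S)
x_star = np.linspace(-2, 2, 5)      # test inputs

Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
Ksz = rbf(x_star, Z)
mu_star = Ksz @ np.linalg.solve(Kzz, m)   # q(f*) mean: K_{*Z} K_{ZZ}^{-1} m
```

At an inducing location the predictive mean reproduces the corresponding entry of m, which is a quick sanity check on the algebra.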
Inference Networks for GP Models: Remove the sparse assumption
[Figure: Gaussian field over inputs and observations; an inference network maps data to predictions]
Examples of Inference Networks
weight space → function space
● Bayesian neural networks:
○ intractable output density
[Figure: random-feature network with sin/cos features, frequencies (s) and weights (w)]
[Sun et al., 19]
● Inference network architecture can be derived from the weight-space posterior
○ Random feature expansions
○ Deep neural nets
[Cutajar et al., 18]
Minibatch Training is Difficult: Functional Variational Bayesian Neural Networks (Sun et al., 19)
● Consider matching the variational and true posterior processes at arbitrary finite sets of measurement points
● This objective performs an improper minibatch approximation of the KL divergence term
● Full-batch fELBO
● Practical fELBO: the KL term is estimated at sampled measurement points
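The fELBO of Sun et al. (2019) can be written as follows (notation reconstructed from the paper; B denotes a minibatch, X_M the sampled measurement points):

```latex
% Full-batch fELBO: match the processes at all finite index sets
\mathcal{L}(q) \;=\; \sum_{n=1}^{N} \mathbb{E}_{q}\!\left[\log p(y_n \mid f(\mathbf{x}_n))\right]
\;-\; \sup_{\mathbf{X}} \mathrm{KL}\!\left[\, q(\mathbf{f}^{\mathbf{X}}) \,\|\, p(\mathbf{f}^{\mathbf{X}}) \,\right]

% Practical fELBO: the sup is replaced by an estimate at M sampled
% measurement points (minibatch inputs plus random locations)
\hat{\mathcal{L}}(q) \;=\; \frac{N}{|B|}\sum_{n \in B} \mathbb{E}_{q}\!\left[\log p(y_n \mid f(\mathbf{x}_n))\right]
\;-\; \mathrm{KL}\!\left[\, q(\mathbf{f}^{\mathbf{X}_M}) \,\|\, p(\mathbf{f}^{\mathbf{X}_M}) \,\right]
```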
Scalable Training of Inference Networks for GP Models: Stochastic, functional mirror descent
● work with the functional density directly
○ natural gradient in the density space
○ minibatch approximation with stochastic functional gradient
● closed-form solution as an adaptive Bayesian filter
[Dai et al., 16; Cheng & Boots, 16]
○ the posterior after seeing the next data point becomes the adapted prior
● sequentially applying Bayes’ rule is the most natural gradient
○ in conjugate models: equivalent to natural gradient for exponential families
[Raskutti & Mukherjee, 13; Khan & Lin, 17]
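A sketch of the resulting update, reconstructed from the mirror-descent literature cited above (the step size β_t and minibatch B are assumed notation):

```latex
% Stochastic functional mirror descent step on the variational objective:
q_{t+1}(f) \;\propto\; q_t(f)^{\,1-\beta_t}\;
\Big[\, p(f) \prod_{n \in B} p(y_n \mid f(\mathbf{x}_n))^{N/|B|} \Big]^{\beta_t}

% With \beta_t = 1 and a full batch this reduces to Bayes' rule, which is
% why sequentially applying Bayes' rule is the "most natural" gradient step.
```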
Scalable Training of Inference Networks for GP Models: Minibatch training of inference networks
[Figure: the inference network (student) is trained toward a teacher distribution]
● an idea from filtering: bootstrap
○ similar idea: temporal difference (TD) learning with function approximation
Scalable Training of Inference Networks for GP Models: Minibatch training of inference networks
● (Gaussian likelihood case) closed-form marginals of the updated posterior at the measurement locations
○ equivalent to GP regression
● (Nonconjugate case) optimize an upper bound instead
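In the Gaussian-likelihood case the teacher's marginals are just GP-regression posteriors at the measurement locations; a minimal NumPy sketch (illustrative data and hyperparameters, not the authors' code):

```python
import numpy as np

def rbf(a, b, ls=1.0):
    # RBF kernel between 1-D input vectors a and b
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls ** 2)

def gp_regression_marginals(x_train, y_train, x_meas, noise=0.1):
    """Closed-form GP-regression posterior mean/variance at measurement points."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_meas, x_train)
    Kss = rbf(x_meas, x_meas)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

x = np.array([-1.0, 0.0, 1.0])
y = np.array([0.2, 0.9, 0.1])
# Marginals at one in-sample location and one far-away location
mean, var = gp_regression_marginals(x, y, np.array([0.0, 2.0]))
```

The posterior variance shrinks near the data and reverts toward the prior far from it, which is the behavior the teacher distribution exploits.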
Scalable Training of Inference Networks for GP Models: Measurement points vs. inducing points
[Figure: GPNet vs. SVGP fits for M = 2, 5, 20]
● inducing points: expressiveness of the variational approximation
● measurement points: variance of training
Scalable Training of Inference Networks for GP Models: Effect of proper minibatch training
[Figure: FBNN (M=20) vs. GPNet (M=20) on Airline Delay (700K)]
● Fixes underfitting (N=100, batch size = 20)
● Better performance with more measurement points
Scalable Training of Inference Networks for GP Models: Regression & Classification
● Regression benchmarks
● GP classification with a prior derived from infinite-width Bayesian ConvNets
Poster #227
Code: https://github.com/thjashin/gp-infer-net