2. Outline
- Introduction
- Stochastic Variational Inference
  - Variational Inference 101
  - Stochastic Variational Inference
  - Deep Generative Models with SVB
- MCMC with mini-batches
  - MCMC 101
  - MCMC using noisy gradients
  - MCMC using noisy Metropolis-Hastings
  - Theoretical results
- Conclusion

3. Big Data (mine is bigger than yours)
- The Square Kilometer Array (SKA) will produce 1 exabyte per day by 2024 (interested in doing approximate inference on this data? talk to me).

4. Introduction
- Why do we need posterior inference if the datasets are BIG?

5. p >> N
- Big data may also mean large p, small N: gene expression data, fMRI data.

6. Planning
- Planning against uncertainty needs probabilities.

7. Little data inside Big data
- Not every data-case carries information about every model component.
- Example: a new user with no ratings (the cold-start problem).

8. Big Models!
- 1943: first neural network (N ≈ 10)
- 1988: NetTalk (N ≈ 20K)
- 2009: Hinton's Deep Belief Net (N ≈ 10M)
- 2013: Google/Y! (N ≈ 10B)
- Models grow faster than the useful information in the data.

9. Two Ingredients for Big Data Bayes
Any big-data posterior inference algorithm should:
1. easily run on a distributed architecture, and
2. only use a small mini-batch of the data at every iteration.

10. Bayesian Posterior Inference
Variational Inference          | Sampling
-------------------------------|----------------------------------
Variational family Q           | All probability distributions
Deterministic                  | Stochastic (sample error)
Biased                         | Unbiased
Local minima                   | Hard to mix between modes
Easy to assess convergence     | Hard to assess convergence

11. Variational Bayes
- Hinton & van Camp (1993); Neal & Hinton (1999); Saul & Jordan (1996); Saul, Jaakkola & Jordan (1996); Attias (1999, 2000); Wiegerinck (2000); Ghahramani & Beal (2000, 2001)
- Coordinate descent on Q (the objective and the mean-field update are written out below).
- [Figure: variational approximation Q to the posterior P, from Bishop, Pattern Recognition and Machine Learning]

12. Stochastic VB (Hoffman, Blei & Bach, 2010)
- Stochastic natural gradient descent on Q.
- P and Q in the exponential family; Q factorized.
- At every iteration: subsample a mini-batch of n data-cases (a minimal code sketch of this update follows below).
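
Slide 11 names coordinate descent on Q without stating the objective. For reference, here is the standard variational lower bound and the mean-field coordinate update (as in Bishop, Pattern Recognition and Machine Learning, Ch. 10); the notation is generic and not taken verbatim from the slides.

```latex
% Standard variational Bayes objective and mean-field coordinate update.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
\[
\log p(x) \;=\;
\underbrace{\mathbb{E}_{q(z)}\!\left[\log \frac{p(x,z)}{q(z)}\right]}_{\mathcal{L}(q)}
\;+\; \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right),
\]
so maximizing the lower bound $\mathcal{L}(q)$ over a restricted family $Q$
is equivalent to minimizing the KL divergence from $q$ to the posterior.
For a factorized (mean-field) family $q(z)=\prod_j q_j(z_j)$, coordinate
ascent updates one factor at a time while holding the others fixed:
\[
\log q_j^{\ast}(z_j) \;=\; \mathbb{E}_{q_{-j}}\!\left[\log p(x,z)\right] + \text{const}.
\]
\end{document}
```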
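Slide 12 describes the stochastic VB update only in outline. The sketch below shows the mini-batch natural-gradient step in the spirit of Hoffman, Blei & Bach (2010) on a deliberately simple conjugate model (Poisson likelihood, Gamma prior, no local latent variables); the toy model, hyperparameters, and step-size schedule are illustrative assumptions, not details taken from the talk.

```python
# Sketch of stochastic variational Bayes: noisy natural-gradient ascent on the
# global variational parameters, touching only a small mini-batch per iteration.
# Toy model (assumption for illustration): x_i ~ Poisson(theta), theta ~ Gamma(a0, b0),
# so q(theta) = Gamma(a, b) and the conjugate update is available in closed form.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
x = rng.poisson(lam=3.5, size=N)            # full data set; only read in mini-batches

a0, b0 = 1.0, 1.0                           # Gamma prior hyperparameters
lam = np.array([a0, b0], dtype=float)       # variational parameters [a, b]
n = 100                                     # mini-batch size
tau, kappa = 1.0, 0.7                       # Robbins-Monro schedule: rho_t = (t + tau)^(-kappa)

for t in range(1, 2001):
    batch = x[rng.integers(0, N, size=n)]   # subsample a mini-batch of n data-cases
    # "Intermediate" parameters: treat the mini-batch as if it were the whole data set
    # by rescaling its sufficient statistics with N / n.
    lam_hat = np.array([a0 + (N / n) * batch.sum(),
                        b0 + (N / n) * n])
    rho = (t + tau) ** (-kappa)             # decreasing step size
    lam = (1.0 - rho) * lam + rho * lam_hat # stochastic natural-gradient step

a, b = lam
print(f"E_q[theta] ~= {a / b:.3f}")         # should be close to the true rate 3.5
```

In a model with local latent variables (e.g., LDA, as in Hoffman et al.), one would first optimize the local factors for the mini-batch and then form the rescaled intermediate global parameters in the same way before taking the averaged step.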