Learning Weight Uncertainty with Stochastic Gradient MCMC for Shape Classification
Presenter: Chunyuan Li
Chunyuan Li, Andrew Stevens, Changyou Chen, Yunchen Pu, Zhe Gan, Lawrence Carin
Duke University
June 30, 2016
Deep Neural Nets for Shape Representations
Shapes in the real world exhibit rich variability.
We learn deep representations of shapes with DNNs. While SGD with backpropagation is the popular training method, issues exist:
Overfitting: the network makes overly confident predictions
Stochastic Gradient MCMC: Motivation
Weight Uncertainty of DNNs
! " ! "
! : Input Object " : Output Label : Hidden Units
: Weight with8a fixed value : Weight with a distribution
Posterior inference of weight distributions
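Concretely, the target is the posterior over the weights W given the data D, and predictions average over it (standard Bayesian neural network formulation; S denotes the number of collected weight samples):

    p(W | D) ∝ p(D | W) p(W)
    p(y | x, D) = ∫ p(y | x, W) p(W | D) dW ≈ (1/S) Σ_{s=1}^{S} p(y | x, W_s)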
Bring MCMC back to the CV community to tackle "big data":
Traditional MCMC (including Gibbs sampling, HMC, MH, etc.) was popular in CV a decade ago, but it is NOT scalable.
We propose to use scalable MCMC to fill this gap.
Stochastic Gradient MCMC: Algorithm
Implementation:
1. Training: add noise to the parameter update
2. Testing: model averaging
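Both steps can be illustrated with a minimal sketch of the basic SGLD variant (the "Basic" row of the table below). This is an illustration under standard assumptions, not the paper's code; grad_log_post and predict_probs are hypothetical placeholders for a stochastic gradient of the log-posterior and a network's predictive distribution.

    import numpy as np

    rng = np.random.default_rng(0)

    def sgld_step(theta, grad_log_post, step_size):
        # Training: a gradient step on a stochastic estimate of the
        # log-posterior, plus Gaussian noise with variance step_size.
        noise = rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
        return theta + 0.5 * step_size * grad_log_post(theta) + noise

    def bayesian_predict(samples, predict_probs, x):
        # Testing: model averaging, i.e. average the predictive
        # probabilities over the collected weight samples.
        return np.mean([predict_probs(theta, x) for theta in samples], axis=0)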
SG-MCMC algorithms and their optimization counterparts:

                     SG-MCMC      Optimization
    Basic            SGLD         SGD
    Preconditioning  pSGLD [1]    RMSprop/Adagrad
    Momentum         SGHMC        momentum SGD
    Thermostat       SGNHT        Santa [2]
[1] Li et al., "Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks," AAAI 2016.
[2] Chen et al., "Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization," AISTATS 2016.
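To make the table's correspondence concrete, here is a sketch of the preconditioned variant pSGLD [1], which reuses an RMSprop-style running average of squared gradients as a diagonal preconditioner. The curvature-correction term of the full algorithm is omitted (a common simplification), so treat this as an approximation rather than the exact method of [1]; np and rng are as in the sketch above.

    def psgld_step(theta, v, grad_log_post, step_size, alpha=0.99, lam=1e-5):
        g = grad_log_post(theta)
        # RMSprop-style running average of squared gradients.
        v = alpha * v + (1 - alpha) * g * g
        precond = 1.0 / (lam + np.sqrt(v))  # diagonal preconditioner
        # Preconditioned gradient step + preconditioned Gaussian noise.
        # (The curvature-correction term of pSGLD is omitted here.)
        noise = rng.normal(0.0, np.sqrt(step_size * precond))
        theta = theta + 0.5 * step_size * precond * g + noise
        return theta, v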
Interpretation of Dropout and Batch Normalization
Dropout/DropConnect and SGLD share the same form of update rule; the only difference is the level of injected noise (see the sketch after this list).
Integrating binary Dropout with SG-MCMC can be viewed as learning the weight uncertainty of a mixture of neural networks.
Batch Normalization can accelerate SG-MCMC training: it helps prevent the sampler from getting stuck in the saturated regimes of nonlinearities.
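A rough illustration of the first point (our own sketch for exposition, not the paper's derivation): both rules perturb the plain gradient update with noise, multiplicative Bernoulli noise on the weights for DropConnect versus additive Gaussian noise for SGLD (as in sgld_step above). Here np and rng are as before, and grad_at is a hypothetical minibatch-gradient function.

    def dropconnect_step(theta, grad_at, step_size, keep_prob=0.5):
        # Multiplicative Bernoulli noise: randomly mask the weights for
        # this minibatch's forward/backward pass, then take the step.
        mask = rng.binomial(1, keep_prob, size=theta.shape)
        return theta + step_size * grad_at(theta * mask)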
Results: Applications to Shape Classification
A variety of 2D and 3D datasets, including SHREC and ShapeNet.
Empirical observations:
The use of Bayesian learning (SG-MCMC or Dropout) slows down training initially. This is likely due to the higher uncertainty imposed during learning, resulting in more exploration of the parameter space.
Increased uncertainty, however, prevents overfitting and eventually results in improved performance.