MCMC: Gibbs Sampler


Building Complex Models: Conditional Sampling

● Consider a more complex model

● When sampling each parameter iteratively, we only need to consider the distributions that contain that parameter

p(b, μ, σ | y) ∝ p(y | b) p(b | μ, σ) p(μ) p(σ)

p(b | μ, σ, y) ∝ p(y | b) p(b | μ, σ)

p(μ | b, σ, y) ∝ p(b | μ, σ) p(μ)

p(σ | b, μ, y) ∝ p(b | μ, σ) p(σ)

In one MCMC step

p(b^(g+1) | μ^(g), σ^(g), y) ∝ p(y | b) p(b | μ^(g), σ^(g))

p(μ^(g+1) | b^(g+1), σ^(g), y) ∝ p(b^(g+1) | μ, σ^(g)) p(μ)

p(σ^(g+1) | b^(g+1), μ^(g+1), y) ∝ p(b^(g+1) | μ^(g+1), σ) p(σ)

Gibbs Sampling

● If we can solve a conditional distribution analytically
– We can sample from that conditional directly
– Even if we can't solve the full joint distribution analytically

● Example

p(μ, σ² | y) ∝ N(y | μ, σ²) N(μ | μ_0, V_0) IG(σ² | α, β)

p(μ^(g+1) | σ²^(g), y) ∝ N(y | μ, σ²^(g)) N(μ | μ_0, V_0)

= N( μ | [n ȳ / σ²^(g) + μ_0 / V_0] / [n / σ²^(g) + 1 / V_0] , [n / σ²^(g) + 1 / V_0]⁻¹ )

p(σ²^(g+1) | μ^(g+1), y) ∝ N(y | μ^(g+1), σ²) IG(σ² | α, β)

= IG( σ² | α + n/2 , β + ½ Σ_i (y_i − μ^(g+1))² )
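To make the two-step recipe concrete, here is a minimal Python sketch of this sampler (not from the original slides; the toy data and the prior constants mu0, V0, alpha, beta are invented for illustration):

    import numpy as np

    rng = np.random.default_rng(42)

    # Toy data and assumed prior constants
    y = rng.normal(5.0, 2.0, size=50)
    n, ybar = len(y), y.mean()
    mu0, V0 = 0.0, 100.0      # prior: mu ~ N(mu0, V0)
    alpha, beta = 0.1, 0.1    # prior: sigma2 ~ IG(alpha, beta)

    n_iter = 5000
    mu, sigma2 = 0.0, 1.0     # initial values
    chain = np.zeros((n_iter, 2))

    for g in range(n_iter):
        # Draw mu from its Normal full conditional
        prec = n / sigma2 + 1.0 / V0
        mean = (n * ybar / sigma2 + mu0 / V0) / prec
        mu = rng.normal(mean, np.sqrt(1.0 / prec))

        # Draw sigma2 from its Inverse-Gamma full conditional;
        # an IG(a, b) draw is the reciprocal of a Gamma(a, scale=1/b) draw
        a = alpha + n / 2.0
        b = beta + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(a, 1.0 / b)

        chain[g] = (mu, sigma2)

Every proposal is accepted, so the chain moves on every iteration; this is the source of the efficiency claims on the next slide.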

Trade-offs of Gibbs Sampling

● 100% acceptance rate
● Quicker mixing, quicker convergence
● No need to specify a jump distribution
● No need to tune a jump variance
● Requires that we know the analytical solution for the conditional
– Conjugate distributions
● Can mix different samplers within an MCMC

Gibbs Sampler

1) Start from some initial parameter value θ^(c)

2) Evaluate the unnormalized posterior

3) Propose a new parameter value: a random draw from its conditional distribution given the current values of all other parameters

4) Evaluate the new unnormalized posterior

5) Decide whether or not to accept the new value: accept the new value 100% of the time

6) Repeat 3-5

Metropolis Algorithm

1) Start from some initial parameter value θ^(c)

2) Evaluate the unnormalized posterior p(θ^(c))

3) Propose a new parameter value θ*: a random draw from a "jump" distribution centered on the current parameter value

4) Evaluate the new unnormalized posterior p(θ*)

5) Decide whether or not to accept the new value: accept the new value with probability

a = p(θ*) / p(θ^(c))

6) Repeat 3-5
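For comparison, a minimal Python sketch of these six steps, using a standard Normal as a hypothetical stand-in for the unnormalized posterior and working on the log scale to avoid underflow:

    import numpy as np

    rng = np.random.default_rng(1)

    def log_post(theta):
        # Hypothetical unnormalized log-posterior (standard Normal here)
        return -0.5 * theta ** 2

    n_iter, jump_sd = 10000, 1.0
    theta = 0.0                                 # step 1: initial value
    chain = np.zeros(n_iter)

    for g in range(n_iter):
        theta_star = rng.normal(theta, jump_sd)  # step 3: symmetric jump
        # steps 4-5: accept with probability a = p(theta*) / p(theta_c);
        # comparing log(u) to the log ratio implements this acceptance rule
        if np.log(rng.uniform()) < log_post(theta_star) - log_post(theta):
            theta = theta_star
        chain[g] = theta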

Bayesian Regression via Gibbs Sampling

● Consider the standard regression model

– Y = response variable

– Xj = covariates

– bj = parameters

y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + ⋯ + b_n x_{i,n} + ε_i

ε_i ~ N(0, σ²)

Matrix version

● As noted earlier, each observation can be expressed as

y_i = b_0 + b_1 x_{i,1} + b_2 x_{i,2} + ⋯ + b_n x_{i,n}

y_i = X_i b

● The full data set can be expressed as

y = X b
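For instance, a toy numpy version of this matrix form (the sizes and coefficients here are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy design matrix: an intercept column plus two covariates,
    # so each row gives y_i = b0 + b1*x_i1 + b2*x_i2 + eps_i
    n = 5
    X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    b = np.array([1.0, 2.0, -0.5])
    y = X @ b + rng.normal(scale=0.1, size=n)   # y = X b + eps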

Quick review of matrix math

● Vector

b = [ b_1
      b_2
      b_3 ]   (3 × 1)

● Transpose

bᵀ = [ b_1  b_2  b_3 ]   (1 × 3)

● Vector × vector

bᵀ b = b_1² + b_2² + b_3²   ((1 × 3)(3 × 1) = 1 × 1)

See Appendix C for more details

Quick review of matrix math

● Vector × vector

b bᵀ = [ b_1²    b_1 b_2  b_1 b_3
         b_2 b_1  b_2²    b_2 b_3
         b_3 b_1  b_3 b_2  b_3² ]   ((3 × 1)(1 × 3) = 3 × 3)

● Vector × matrix

X b   ((n × 3)(3 × 1) = n × 1)

● Matrix × matrix

X Xᵀ = Z   ((n × 3)(3 × n) = n × n)

Xᵀ X = [ Σ x_{i,1}²         Σ x_{i,1} x_{i,2}  Σ x_{i,1} x_{i,3}
         Σ x_{i,2} x_{i,1}  Σ x_{i,2}²         Σ x_{i,2} x_{i,3}
         Σ x_{i,3} x_{i,1}  Σ x_{i,3} x_{i,2}  Σ x_{i,3}² ]   ((3 × n)(n × 3) = 3 × 3)

● Identity Matrix

I = [ 1 0 0
      0 1 0
      0 0 1 ]

● Matrix Inversion

X⁻¹ X = X X⁻¹ = I
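These identities are easy to verify numerically; a short numpy check, with toy sizes assumed:

    import numpy as np

    rng = np.random.default_rng(0)
    b = rng.normal(size=3)                 # 3 x 1 vector
    X = rng.normal(size=(5, 3))            # n x 3 matrix with n = 5

    inner = b @ b                          # b'b: scalar b1^2 + b2^2 + b3^2
    outer = np.outer(b, b)                 # b b': 3 x 3 matrix of products
    XtX = X.T @ X                          # X'X: 3 x 3 matrix of cross-products
    check = np.linalg.inv(XtX) @ XtX       # A^-1 A = I (X itself is not square,
                                           # so we invert the square matrix X'X)
    print(np.allclose(check, np.eye(3)))   # True up to rounding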

...back to regression

y = X b + ε

ε ~ N_n(0, σ² I)

Likelihood:

p(y | b, σ², X) = N_n(y | X b, σ² I)

Priors:

p(b) = N_p(b | b_0, V_b)

p(σ²) = IG(σ² | s_1, s_2)

The data model and process model together make up the likelihood; the priors are the parameter model.

General pattern

● Write down likelihood
– Data model
– Process model
● Write down priors for the free parameters in the likelihood
– Parameter model
● Solve for posterior distribution of model parameters
– Analytical
– Numerical

Posterior

p(b, σ² | X, y) ∝ N_n(y | X b, σ² I) × N_p(b | b_0, V_b) IG(σ² | s_1, s_2)

Sample in terms of conditionals

p(b | σ², X, y) ∝ N_n(y | X b, σ² I) N_p(b | b_0, V_b)

p(σ² | b, X, y) ∝ N_n(y | X b, σ² I) IG(σ² | s_1, s_2)

p(b | σ², X, y) ∝ N_n(y | X b, σ² I) N_p(b | b_0, V_b)

Regression Parameters

Will have a posterior that is multivariate Normal...

N_p(b | Vv, V) = (2π)^(−p/2) |V|^(−1/2) exp[ −½ (b − Vv)ᵀ V⁻¹ (b − Vv) ]

What are V and v?

Expand the quadratic form of the multivariate Normal:

(b − Vv)ᵀ V⁻¹ (b − Vv)
= bᵀ V⁻¹ b − bᵀ V⁻¹ V v − vᵀ V V⁻¹ b + vᵀ V V⁻¹ V v
= bᵀ V⁻¹ b − bᵀ v − vᵀ b + vᵀ V v
= bᵀ V⁻¹ b − 2 bᵀ v + vᵀ V v

Expand the exponent of the likelihood times the prior:

(y − Xb)ᵀ (σ² I)⁻¹ (y − Xb) + (b − b_0)ᵀ V_b⁻¹ (b − b_0)
= σ⁻² yᵀ y − σ⁻² yᵀ X b − σ⁻² bᵀ Xᵀ y + σ⁻² bᵀ Xᵀ X b
  + bᵀ V_b⁻¹ b − bᵀ V_b⁻¹ b_0 − b_0ᵀ V_b⁻¹ b + b_0ᵀ V_b⁻¹ b_0

Match terms against the target form:

bᵀ V⁻¹ b  ⟷  σ⁻² bᵀ Xᵀ X b + bᵀ V_b⁻¹ b   ⟹   V⁻¹ = σ⁻² Xᵀ X + V_b⁻¹

−2 bᵀ v  ⟷  −σ⁻² yᵀ X b − σ⁻² bᵀ Xᵀ y − bᵀ V_b⁻¹ b_0 − b_0ᵀ V_b⁻¹ b   ⟹   v = σ⁻² Xᵀ y + V_b⁻¹ b_0
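Translated into code, a single Gibbs update for b under these formulas might look like the following sketch (a hypothetical helper, with X, y, sigma2, b0, and the prior precision Vb_inv assumed given):

    import numpy as np

    def draw_b(X, y, sigma2, b0, Vb_inv, rng):
        """One Gibbs draw of b from N(Vv, V)."""
        V_inv = X.T @ X / sigma2 + Vb_inv     # V^-1 = X'X/sigma2 + Vb^-1
        v = X.T @ y / sigma2 + Vb_inv @ b0    # v = X'y/sigma2 + Vb^-1 b0
        V = np.linalg.inv(V_inv)
        return rng.multivariate_normal(V @ v, V)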

p(σ² | b, X, y) ∝ N_n(y | X b, σ² I) IG(σ² | s_1, s_2)

Variance

Will have a posterior that is Inverse Gamma...

IG(σ² | u_1, u_2) ∝ (σ²)^(−u_1−1) exp[ −u_2 / σ² ]

The likelihood times the prior gives:

∝ (σ²)^(−n/2) exp[ −(1/(2σ²)) (y − Xb)ᵀ (y − Xb) ] (σ²)^(−s_1−1) exp[ −s_2 / σ² ]

Matching terms:

u_1 = s_1 + n/2

u_2 = s_2 + ½ (y − Xb)ᵀ (y − Xb)
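And the matching update for σ², again as a hypothetical helper (numpy's Gamma generator is parameterized by shape and scale, so an IG(u1, u2) draw is the reciprocal of a Gamma(u1, scale = 1/u2) draw):

    import numpy as np

    def draw_sigma2(X, y, b, s1, s2, rng):
        """One Gibbs draw of sigma^2 from IG(u1, u2)."""
        resid = y - X @ b
        u1 = s1 + len(y) / 2.0            # u1 = s1 + n/2
        u2 = s2 + 0.5 * resid @ resid     # u2 = s2 + (y - Xb)'(y - Xb)/2
        return 1.0 / rng.gamma(u1, 1.0 / u2)

Alternating draw_b and draw_sigma2 inside a loop gives the full Gibbs sampler for the regression model.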

BUGS implementation

model{

  ## prior distributions
  b[1] ~ dnorm(0, 1.0E-5)
  b[2] ~ dnorm(0, 1.0E-5)
  tau ~ dgamma(0.1, 0.001)

  ## likelihood
  for(i in 1:n){
    mu[i] <- b[1] + x[i] * b[2]
    y[i] ~ dnorm(mu[i], tau)
  }

}
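Note that BUGS parameterizes dnorm by a precision, tau = 1/σ², which is why the prior above is a Gamma on tau rather than an inverse Gamma on σ²; the two formulations are equivalent.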

[Figures: marginal posterior densities P(b[1]), P(b[2]), and P(tau); joint scatterplot of b[1] vs b[2]; autocorrelation of b[1] as a function of lag]

Centering the data

model{

  ## prior distributions
  b[1] ~ dnorm(0, 1.0E-5)
  b[2] ~ dnorm(0, 1.0E-5)
  tau ~ dgamma(0.1, 0.001)
  mX <- mean(x[])

  ## likelihood
  for(i in 1:n){
    mu[i] <- b[1] + (x[i] - mX) * b[2]
    y[i] ~ dnorm(mu[i], tau)
  }

}
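Subtracting the covariate mean does not change the fitted line, but it removes much of the posterior correlation between the intercept b[1] and the slope b[2], which improves mixing of the chain.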

[Figure: joint scatterplot of b[1] vs b[2] after centering]

Outline

● Different numerical techniques for sampling from the posterior
– Rejection Sampling
– Inverse Distribution Sampling
– Markov Chain Monte Carlo (MCMC)
  ● Metropolis
  ● Metropolis-Hastings
  ● Gibbs sampling
● Sampling conditionals vs full model
● Flexibility to specify complex models

Where things stand...

● We've now filled our "toolbox" with the basic tools that will enable us to explore more advanced topics
– Probability rules & distributions

– Data models & Likelihoods

– Process models

– Parameter models & Prior distributions

– Analytical and Numerical Methods in ML and Bayes

General pattern - Bayes

● Write down likelihood
– Data model
– Process model
● Write down priors for the free parameters in the likelihood
– Parameter model
● Solve for posterior distribution of model parameters
– Analytical
– Numerical

General Pattern - Maximum Likelihood

● Step 1: Write a likelihood function describing the likelihood of the observations

● Step 2: Find the value of the model parameter that maximizes the likelihood
– Calculate −ln L
– Analytical
  ● Take derivative with respect to EACH parameter
  ● Set = 0, solve for parameter
– Numerical
  ● Find minimum of −ln L surface

dL/dθ = 0

Where we're going...

● Section 2 is the "meat" of the course:
– Interval estimation
  ● Confidence and Credible intervals
  ● Predictive intervals

– Model selection and hypothesis testing

– GLMs

– Nonlinear models

– Hierarchical models and random effects

– Measurement error and missing data

● Section 3 is more advanced applications