Post on 16-Jan-2016
Inference V: MCMC Methods
Stochastic Sampling
In the previous class, we examined methods that use independent samples to estimate P(X = x | e)
Problem: It is difficult to sample from P(X1, …, Xn | e)
We had to use likelihood weighting to reweigh our samples
This introduced bias into the estimation
In some cases, such as when the evidence is on the leaves, these methods are inefficient
MCMC Methods
We are going to discuss sampling methods that are based on Markov chains:
Markov Chain Monte Carlo (MCMC) methods
Key ideas:
The sampling process is a Markov chain
The next sample depends on the previous one
These methods can approximate any posterior distribution
We start by reviewing key ideas from the theory of Markov chains
Markov Chains
Suppose X1, X2, … take values in some set; w.l.o.g. these values are 1, 2, …
A Markov chain is a process that corresponds to the network:
[Figure: chain-structured network X1 → X2 → X3 → … → Xn]
To quantify the chain, we need to specify
Initial probability: P(X1)
Transition probability: P(Xt+1|Xt)
A Markov chain has stationary transition probabilities:
P(Xt+1|Xt) is the same for all times t
Irreducible Chains
A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0
There is a positive probability of reaching j from i after some number of steps
A chain is irreducible if every state is accessible from every state
Ergodic Chains
A state i is positively recurrent if the expected time to return to state i after being in state i is finite
If X has a finite number of states, then it suffices that i is accessible from itself
A chain is ergodic if it is irreducible and every state is positively recurrent
(A)periodic Chains
A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 whenever n is not divisible by d
A chain is aperiodic if it contains no periodic state
Stationary Probabilities
Thm: If a chain is ergodic and aperiodic, then the limit
lim_{n→∞} P(Xn = j | X1 = i)
exists, and does not depend on i
Moreover, let
P*(X = j) = lim_{n→∞} P(Xn = j | X1 = i)
then P*(X) is the unique probability satisfying
P*(X = j) = Σ_i P(Xt+1 = j | Xt = i) P*(X = i)
Stationary Probabilities
Stationary Probabilities
The probability P*(X) is the stationary probability of the process
Regardless of the starting point, the process will converge to this probability
The rate of convergence depends on properties of the transition probability
Sampling from the stationary probability
This theory suggests how to sample from the stationary probability:
Set X1 = i, for some random/arbitrary i
For t = 1, 2, …, n:
Sample a value xt+1 for Xt+1 from P(Xt+1 | Xt = xt)
Return xn
If n is large enough, then this is a sample from P*(X)
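This procedure can be sketched in Python. The two-state chain and its transition numbers below are illustrative assumptions, not from the lecture; for this chain the stationary distribution works out to P*(X = 0) = 2/3.

```python
import random

# Hypothetical two-state chain; all probabilities are made up for illustration.
P_INIT = [0.5, 0.5]                   # P(X1)
P_TRANS = [[0.9, 0.1],                # P(X_{t+1} | X_t = 0)
           [0.2, 0.8]]                # P(X_{t+1} | X_t = 1)

def sample_chain(n, rng=random):
    """Run the chain for n steps and return the final state
    (approximately a draw from P* when n is large)."""
    x = rng.choices([0, 1], weights=P_INIT)[0]   # set X1 arbitrarily
    for _ in range(n):                           # simulate n transitions
        x = rng.choices([0, 1], weights=P_TRANS[x])[0]
    return x

# Solving p* = p* T for this chain gives p*(0) = 2/3.
draws = [sample_chain(30) for _ in range(5000)]
print(draws.count(0) / len(draws))   # close to 2/3
```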
Designing Markov Chains
How do we construct the right chain to sample from?
Ensuring aperiodicity and irreducibility is usually easy
The problem is ensuring the desired stationary probability
Designing Markov Chains
Key tool: If the transition probability satisfies

P(Xt+1 = j | Xt = i) / P(Xt+1 = i | Xt = j) = Q(X = j) / Q(X = i)  whenever P(Xt+1 = j | Xt = i) > 0

then P*(X) = Q(X)
This gives a local criterion for checking that the chain will have the right stationary distribution
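The ratio criterion (detailed balance) can be checked numerically: Q(i)·P(j|i) must equal Q(j)·P(i|j) for every pair of states. A minimal sketch, using an assumed 3-state transition matrix and a candidate distribution Q (both invented for illustration):

```python
import numpy as np

# Illustrative 3-state chain (numbers are assumptions, not from the text).
T = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])   # T[i, j] = P(X_{t+1} = j | X_t = i)
Q = np.array([1/3, 1/3, 1/3])        # candidate stationary distribution

# Detailed balance: Q[i] * T[i, j] == Q[j] * T[j, i] for all i, j,
# i.e. the matrix M[i, j] = Q[i] * T[i, j] must be symmetric.
M = Q[:, None] * T
balanced = np.allclose(M, M.T)
print(balanced)   # True -> Q is the stationary distribution P*
```

If the check passes, the theorem above guarantees Q is P* without computing any limits.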
MCMC Methods
We can use these results to sample from P(X1,…,Xn|e)
Idea: Construct an ergodic & aperiodic Markov Chain
such that P*(X1,…,Xn) = P(X1,…,Xn|e)
Simulate the chain for n steps to get a sample
MCMC Methods
Notes: The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence
For simplicity, we will denote such a state using the vector of variables:
V(Y) = { (x1, …, xn) ∈ V(X1) × … × V(Xn) | x1, …, xn satisfy e }
Gibbs Sampler
One of the simplest MCMC methods
At each transition, change the state of just one Xi
We can describe the transition probability as a stochastic procedure:
Input: a state x1, …, xn
Choose i at random (using uniform probability)
Sample x'i from P(Xi | x1, …, xi-1, xi+1, …, xn, e)
Let x'j = xj for all j ≠ i
Return x'1, …, x'n
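The transition procedure can be sketched directly. Here `full_conditional(i, state)` is a hypothetical helper, assumed to return the distribution P(Xi | all other variables, e) as a value → probability dict:

```python
import random

def gibbs_step(state, full_conditional, rng=random):
    """One Gibbs transition: resample a single, randomly chosen coordinate.

    `full_conditional(i, state)` is an assumed callback returning
    P(X_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n, e) as {value: probability}.
    """
    i = rng.randrange(len(state))             # choose i uniformly at random
    dist = full_conditional(i, state)
    values, probs = zip(*dist.items())
    new_value = rng.choices(values, weights=probs)[0]   # sample x'_i
    new_state = list(state)
    new_state[i] = new_value                  # x'_j = x_j for all j != i
    return new_state
```

Note that exactly one coordinate can change per transition, which is what makes each step cheap.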
Correctness of Gibbs Sampler
By the chain rule,
P(x1, …, xi-1, xi, xi+1, …, xn | e) = P(x1, …, xi-1, xi+1, …, xn | e) P(xi | x1, …, xi-1, xi+1, …, xn, e)
Thus, we get

P(x1, …, xi-1, xi, xi+1, …, xn | e) / P(x1, …, xi-1, x'i, xi+1, …, xn | e) = P(xi | x1, …, xi-1, xi+1, …, xn, e) / P(x'i | x1, …, xi-1, xi+1, …, xn, e)

Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion
Gibbs Sampling for Bayesian Networks
Why is the Gibbs sampler “easy” in BNs?
Recall that the Markov blanket of a variable separates it from the other variables in the network:
P(Xi | X1, …, Xi-1, Xi+1, …, Xn) = P(Xi | Mbi)
This property allows us to use local computations to perform sampling in each transition
Gibbs Sampling in Bayesian Networks
How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn) ?
Let Y1, …, Yk be the children of Xi
By the definition of Mbi, the parents of each Yj are in Mbi ∪ {Xi}
It is easy to show that

P(xi | Mbi) = P(xi | Pai) Π_j P(yj | pa_yj) / Σ_{x'i} P(x'i | Pai) Π_j P(yj | pa_yj)

where pa_yj in the denominator is the assignment to Yj's parents with Xi set to x'i
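This formula means a Gibbs transition only touches Xi's CPT and its children's CPTs. A sketch on an assumed toy network A → B → C (all binary, with invented CPT numbers), sampling B given its Markov blanket {A, C}:

```python
import random

# Toy network A -> B -> C; the CPT numbers are illustrative assumptions.
P_B_given_A = {0: [0.8, 0.2], 1: [0.3, 0.7]}   # P(B | A=a) as [P(B=0), P(B=1)]
P_C_given_B = {0: [0.9, 0.1], 1: [0.4, 0.6]}   # P(C | B=b)

def sample_B(a, c, rng=random):
    """Sample B from P(B | A=a, C=c) using only B's Markov blanket {A, C}."""
    # Unnormalized weights: P(B=b | A=a) * P(C=c | B=b);
    # random.choices normalizes over the sum, matching the formula above.
    weights = [P_B_given_A[a][b] * P_C_given_B[b][c] for b in (0, 1)]
    return rng.choices([0, 1], weights=weights)[0]

# With A=1 and evidence C=1:
# P(B=1 | A=1, C=1) = 0.7*0.6 / (0.3*0.1 + 0.7*0.6) = 0.42/0.45
draws = [sample_B(1, 1) for _ in range(10000)]
print(draws.count(1) / len(draws))   # close to 0.42/0.45 ≈ 0.933
```

Only two CPT rows per value of B are read, no matter how large the rest of the network is.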
Sampling Strategy
How do we collect the samples?
Strategy I: Run the chain M times, each run for N steps
Each run starts from a different starting point
Return the last state in each run
[Figure: M independent chains]
Sampling Strategy
Strategy II: Run one chain for a long time
After some “burn in” period, sample points every fixed number of steps
[Figure: M samples from one chain after the “burn in” period]
Comparing Strategies
Strategy I: Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity
Have to perform “burn in” steps for each chain
Strategy II: Perform “burn in” only once
Samples might be correlated (although only weakly)
Hybrid strategy: run several chains, and take a few samples from each
Combines the benefits of both strategies
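The hybrid strategy can be sketched as follows. Here `step` and `init` are placeholders for the chain's transition and an arbitrary initialization, and all run lengths (number of chains, burn-in, thinning interval) are illustrative choices, not values from the lecture:

```python
import random

def run_hybrid(step, init, n_chains=4, burn_in=500, thin=50, per_chain=10,
               rng=random):
    """Hybrid strategy: several chains, one burn-in each,
    then a few thinned samples per chain."""
    samples = []
    for _ in range(n_chains):
        x = init()                      # arbitrary starting point per chain
        for _ in range(burn_in):        # "burn in" once per chain
            x = step(x)
        for _ in range(per_chain):      # collect a sample every `thin` steps
            for _ in range(thin):
                x = step(x)
            samples.append(x)
    return samples

# Usage with an assumed two-state chain (transition numbers are illustrative):
def flip(x, rng=random):
    return rng.choices([0, 1], weights=[0.9, 0.1] if x == 0 else [0.2, 0.8])[0]

samples = run_hybrid(flip, lambda: 0)
print(len(samples))   # n_chains * per_chain = 40 samples
```

Thinning reduces (but does not eliminate) the correlation between consecutive samples from the same chain, while the multiple chains improve coverage of the state space.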