Inference V: MCMC Methods


Page 1: Inference V: MCMC Methods

Inference V: MCMC Methods

Page 2: Inference V: MCMC Methods

Stochastic Sampling

In the previous class, we examined methods that use independent samples to estimate P(X = x | e)

Problem: It is difficult to sample from P(X1, …, Xn | e)

We had to use likelihood weighting to reweight our samples

This introduced bias into the estimation. In some cases, such as when the evidence is on the leaves, these methods are inefficient

Page 3: Inference V: MCMC Methods

MCMC Methods

We are going to discuss sampling methods that are based on Markov chains

Markov Chain Monte Carlo (MCMC) methods

Key ideas: Treat the sampling process as a Markov chain

The next sample depends on the previous one. These methods can approximate any posterior distribution

We start by reviewing key ideas from the theory of Markov chains

Page 4: Inference V: MCMC Methods

Markov Chains

Suppose X1, X2, … take values in some set; w.l.o.g., these values are 1, 2, …

A Markov chain is a process that corresponds to the network:

To quantify the chain, we need to specify the initial probability P(X1) and the transition probability P(Xt+1 | Xt)

A Markov chain has stationary transition probability:

A Markov chain has stationary transition probability

P(Xt+1|Xt) is the same for all times t

[Diagram: chain-structured network X1 → X2 → X3 → … → Xn]
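As a concrete illustration, here is a minimal sketch in Python of specifying such a chain; the two-state chain and its particular probabilities are assumptions for the example, not from the slides:

    import numpy as np

    # A hypothetical two-state chain (states 0 and 1).
    init = np.array([0.6, 0.4])      # initial probability P(X1)
    trans = np.array([[0.7, 0.3],    # row i gives P(Xt+1 | Xt = i);
                      [0.2, 0.8]])   # the same matrix is used for all t
                                     # (stationary transition probability)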

Page 5: Inference V: MCMC Methods

Irreducible Chains

A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0

There is a positive probability of reaching j from i after some number of steps

A chain is irreducible if every state is accessible from every state

Page 6: Inference V: MCMC Methods

Ergodic Chains

A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i

If X has a finite number of states, then it suffices that i is accessible from itself

A chain is ergodic if it is irreducible and every state is positively recurrent

Page 7: Inference V: MCMC Methods

(A)periodic Chains

A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 when n is not divisible by d

A chain is aperiodic if it contains no periodic state. For example, a two-state chain that deterministically alternates between its states is periodic with d = 2

Page 8: Inference V: MCMC Methods

Stationary Probabilities

Thm: If a chain is ergodic and aperiodic, then the limit

lim n→∞ P(Xn | X1 = i)

exists and does not depend on i

Moreover, let

P*(X = j) = lim n→∞ P(Xn = j | X1 = i)

then P*(X) is the unique probability satisfying

P*(X = j) = Σi P(Xt+1 = j | Xt = i) P*(X = i)
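As a quick numerical check, a minimal sketch (continuing the assumed two-state chain from above) shows the distribution converging to P* regardless of the starting state:

    import numpy as np

    trans = np.array([[0.7, 0.3],
                      [0.2, 0.8]])

    dist = np.array([1.0, 0.0])   # start deterministically in state 0
    for _ in range(100):          # repeatedly apply the transition matrix
        dist = dist @ trans
    print(dist)                   # converges to P* = [0.4, 0.6] for this chain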

Page 9: Inference V: MCMC Methods

Stationary Probabilities

The probability P*(X) is the stationary probability of the process

Regardless of the starting point, the process will converge to this probability

The rate of convergence depends on properties of the transition probability

Page 10: Inference V: MCMC Methods

Sampling from the stationary probability

This theory suggests how to sample from the stationary probability:

Set X1 = i, for some random/arbitrary i

For t = 1, 2, …, n−1: sample a value xt+1 for Xt+1 from P(Xt+1 | Xt = xt)

Return xn

If n is large enough, then this is a sample from P*(X)
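A minimal sketch of this procedure in Python, again using the assumed two-state chain (the chain and the chosen n are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    trans = np.array([[0.7, 0.3],
                      [0.2, 0.8]])

    def sample_stationary(n, start=0):
        # run n-1 transitions and return x_n, an approximate draw from P*
        x = start
        for _ in range(n - 1):
            x = rng.choice(2, p=trans[x])
        return x

    draws = [sample_stationary(500) for _ in range(1000)]
    print(sum(draws) / len(draws))   # fraction of 1s, close to P*(1) = 0.6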

Page 11: Inference V: MCMC Methods

Designing Markov Chains

How do we construct the right chain to sample from?

Ensuring aperiodicity and irreducibility is usually easy

The problem is ensuring the desired stationary probability

Page 12: Inference V: MCMC Methods

Designing Markov Chains

Key tool: If the transition probability satisfies

P(Xt+1 = j | Xt = i) / P(Xt+1 = i | Xt = j) = Q(X = j) / Q(X = i) whenever P(Xt+1 = j | Xt = i) > 0

then P*(X) = Q(X)

This gives a local criterion for checking that the chain will have the right stationary distribution
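A minimal sketch of checking this ratio criterion numerically for the assumed two-state chain, whose stationary probability Q = [0.4, 0.6] was computed above; the criterion is verified in its cross-multiplied form Q(i) P(j | i) = Q(j) P(i | j):

    import numpy as np

    trans = np.array([[0.7, 0.3],
                      [0.2, 0.8]])
    Q = np.array([0.4, 0.6])

    for i in range(2):
        for j in range(2):
            if trans[i, j] > 0:
                # ratio criterion, cross-multiplied to avoid division
                assert np.isclose(Q[i] * trans[i, j], Q[j] * trans[j, i])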

Page 13: Inference V: MCMC Methods

MCMC Methods

We can use these results to sample from P(X1,…,Xn|e)

Idea: Construct an ergodic & aperiodic Markov Chain

such that P*(X1,…,Xn) = P(X1,…,Xn|e)

Simulate the chain for n steps to get a sample

Page 14: Inference V: MCMC Methods

MCMC Methods

Notes: The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence

For simplicity, we will denote such a state using the vector of variables

Val(Y) = { (x1, …, xn) ∈ Val(X1) × … × Val(Xn) | x1, …, xn satisfy e }

Page 15: Inference V: MCMC Methods

Gibbs Sampler

One of the simplest MCMC methods. At each transition we change the state of just one Xi

We can describe the transition probability as a stochastic procedure:

Input: a state x1, …, xn

Choose i at random (using uniform probability)

Sample x'i from P(Xi | x1, …, xi−1, xi+1, …, xn, e)

Let x'j = xj for all j ≠ i

Return x'1, …, x'n
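A minimal sketch of this transition in Python; full_conditional(i, x) is an assumed user-supplied routine that samples a value from P(Xi | x1, …, xi−1, xi+1, …, xn, e):

    import random

    def gibbs_step(x, full_conditional):
        # x is the current state (x1, ..., xn) as a list
        i = random.randrange(len(x))          # choose i uniformly at random
        x_new = list(x)                       # x'j = xj for all j != i
        x_new[i] = full_conditional(i, x)     # resample coordinate i only
        return x_new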

Page 16: Inference V: MCMC Methods

Correctness of Gibbs Sampler

By chain rule

P(x1, …, xi−1, xi, xi+1, …, xn | e) = P(x1, …, xi−1, xi+1, …, xn | e) P(xi | x1, …, xi−1, xi+1, …, xn, e)

Thus, we get

P(x1, …, xi−1, xi, xi+1, …, xn | e) / P(x1, …, xi−1, x'i, xi+1, …, xn | e) = P(xi | x1, …, xi−1, xi+1, …, xn, e) / P(x'i | x1, …, xi−1, xi+1, …, xn, e)

Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion

Page 17: Inference V: MCMC Methods

Gibbs Sampling for Bayesian Network

Why is the Gibbs sampler “easy” in BNs? Recall that the Markov blanket of a variable

separates it from the other variables in the network: P(Xi | X1, …, Xi−1, Xi+1, …, Xn) = P(Xi | Mbi)

This property allows us to use local computations to perform sampling in each transition

Page 18: Inference V: MCMC Methods

Gibbs Sampling in Bayesian Networks

How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn) ?

Let Y1, …, Yk be the children of Xi

By definition of Mbi, the parents of Yj are in Mbi ∪ {Xi}

It is easy to show that

P(xi | Mbi) = [ P(xi | Pai) Πj P(yj | payj) ] / [ Σx'i P(x'i | Pai) Πj P(yj | pa'yj) ]

where Pai denotes the parents of Xi, payj denotes the assignment to Yj's parents under (x1, …, xn), and pa'yj is the same assignment with xi replaced by x'i
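A minimal sketch of this computation in Python, assuming a hypothetical CPT lookup cpt(v, assignment) that returns P(Xv = assignment[v] | assignment to Xv's parents), and a children map giving each variable's children:

    def markov_blanket_conditional(i, x, values_i, cpt, children):
        # Score each candidate value v for Xi by
        # P(v | Pai) * product over children Yj of P(yj | payj),
        # then normalize; only Xi's Markov blanket is ever consulted.
        scores = []
        for v in values_i:
            x_try = dict(x)
            x_try[i] = v                  # substitute the candidate value
            s = cpt(i, x_try)             # P(v | Pai)
            for c in children[i]:
                s *= cpt(c, x_try)        # P(yj | payj) with xi := v
            scores.append(s)
        z = sum(scores)
        return {v: s / z for v, s in zip(values_i, scores)}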

Page 19: Inference V: MCMC Methods

Sampling Strategy

How do we collect the samples?

Strategy I: Run the chain M times, each run for N steps

Each run starts from a different starting point. Return the last state in each run

[Figure: M chains]

Page 20: Inference V: MCMC Methods

Sampling Strategy

Strategy II: Run one chain for a long time. After some “burn in” period, sample points every fixed number of steps

[Figure: one chain; after the “burn in” period, M samples are taken from it]

Page 21: Inference V: MCMC Methods

Comparing Strategies

Strategy I: Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity. Have to perform the “burn in” steps for each chain

Strategy II: Perform “burn in” only once. Samples might be correlated (although only weakly)

Hybrid strategy: run several chains, and draw a few samples from each. This combines the benefits of both strategies (see the sketch below)
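A minimal sketch of the two collection strategies in Python; step is an assumed function performing one transition of the chain:

    def run(x, k, step):
        # apply k transitions of the chain starting from state x
        for _ in range(k):
            x = step(x)
        return x

    def strategy_one(starts, n, step):
        # Strategy I: M independent runs; keep only each run's last state
        return [run(x, n, step) for x in starts]

    def strategy_two(start, burn_in, gap, m, step):
        # Strategy II: one long run; discard the burn-in,
        # then keep every gap-th state until m samples are collected
        x = run(start, burn_in, step)
        samples = []
        for _ in range(m):
            x = run(x, gap, step)
            samples.append(x)
        return samples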