Inference V: MCMC Methods


Page 1: Inference V: MCMC Methods

Inference V: MCMC Methods

Page 2: Inference V: MCMC Methods

Stochastic Sampling

In the previous class, we examined methods that use independent samples to estimate P(X = x | e)

Problem: It is difficult to sample from P(X1, …, Xn | e)

We had to use likelihood weighting to reweight our samples

This introduced bias into the estimation. In some cases, such as when the evidence is on the leaves, these methods are inefficient

Page 3: Inference V: MCMC Methods

MCMC Methods

We are going to discuss sampling methods that are based on Markov chains

Markov Chain Monte Carlo (MCMC) methods

Key ideas: Treat the sampling process as a Markov chain

The next sample depends on the previous one. These methods can approximate any posterior distribution

We start by reviewing key ideas from the theory of Markov chains

Page 4: Inference V: MCMC Methods

Markov Chains

Suppose X1, X2, … take values in some set; w.l.o.g., these values are 1, 2, …

A Markov chain is a process that corresponds to the network:

To quantify the chain, we need to specify the initial probability P(X1) and the transition probability P(Xt+1 | Xt)

A Markov chain has stationary transition probability:

A Markov chain has stationary transition probability

P(Xt+1|Xt) is the same for all times t

[Diagram: chain-structured network X1 → X2 → X3 → … → Xn]
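As a concrete illustration, here is a minimal sketch in Python of specifying such a chain; the two-state chain and its particular probabilities are assumptions for the example, not from the slides:

    import numpy as np

    # A hypothetical two-state chain (states 0 and 1).
    init = np.array([0.6, 0.4])      # initial probability P(X1)
    trans = np.array([[0.7, 0.3],    # row i gives P(Xt+1 | Xt = i);
                      [0.2, 0.8]])   # the same matrix is used for all t
                                     # (stationary transition probability)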

Page 5: Inference V: MCMC Methods

Irreducible Chains

A state j is accessible from state i if there is an n such that P(Xn = j | X1 = i) > 0

There is a positive probability of reaching j from i after some number of steps

A chain is irreducible if every state is accessible from every state

Page 6: Inference V: MCMC Methods

Ergodic Chains

A state i is positively recurrent if there is a finite expected time to get back to state i after being in state i

If X has a finite number of states, then it suffices that i is accessible from itself

A chain is ergodic if it is irreducible and every state is positively recurrent

Page 7: Inference V: MCMC Methods

(A)periodic Chains

A state i is periodic if there is an integer d > 1 such that P(Xn = i | X1 = i) = 0 when n is not divisible by d

A chain is aperiodic if it contains no periodic state. For example, a two-state chain that deterministically alternates between its states is periodic with d = 2

Page 8: Inference V: MCMC Methods

Stationary Probabilities

Thm: If a chain is ergodic and aperiodic, then the limit

lim n→∞ P(Xn | X1 = i)

exists and does not depend on i

Moreover, let

P*(X = j) = lim n→∞ P(Xn = j | X1 = i)

then P*(X) is the unique probability satisfying

P*(X = j) = Σi P(Xt+1 = j | Xt = i) P*(X = i)
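As a quick numerical check, a minimal sketch (continuing the assumed two-state chain from above) shows the distribution converging to P* regardless of the starting state:

    import numpy as np

    trans = np.array([[0.7, 0.3],
                      [0.2, 0.8]])

    dist = np.array([1.0, 0.0])   # start deterministically in state 0
    for _ in range(100):          # repeatedly apply the transition matrix
        dist = dist @ trans
    print(dist)                   # converges to P* = [0.4, 0.6] for this chain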

Page 9: Inference V: MCMC Methods

Stationary Probabilities

The probability P*(X) is the stationary probability of the process

Regardless of the starting point, the process will converge to this probability

The rate of convergence depends on properties of the transition probability

Page 10: Inference V: MCMC Methods

Sampling from the stationary probability

This theory suggests how to sample from the stationary probability:

Set X1 = i, for some random/arbitrary i

For t = 1, 2, …, n−1: sample a value xt+1 for Xt+1 from P(Xt+1 | Xt = xt)

Return xn

If n is large enough, then this is a sample from P*(X)
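A minimal sketch of this procedure in Python, again using the assumed two-state chain (the chain and the chosen n are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    trans = np.array([[0.7, 0.3],
                      [0.2, 0.8]])

    def sample_stationary(n, start=0):
        # run n-1 transitions and return x_n, an approximate draw from P*
        x = start
        for _ in range(n - 1):
            x = rng.choice(2, p=trans[x])
        return x

    draws = [sample_stationary(500) for _ in range(1000)]
    print(sum(draws) / len(draws))   # fraction of 1s, close to P*(1) = 0.6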

Page 11: Inference V: MCMC Methods

Designing Markov Chains

How do we construct the right chain to sample from?

Ensuring aperiodicity and irreducibility is usually easy

The problem is ensuring the desired stationary probability

Page 12: Inference V: MCMC Methods

Designing Markov Chains

Key tool: If the transition probability satisfies

P(Xt+1 = j | Xt = i) / P(Xt+1 = i | Xt = j) = Q(X = j) / Q(X = i) whenever P(Xt+1 = j | Xt = i) > 0

then P*(X) = Q(X)

This gives a local criterion for checking that the chain will have the right stationary distribution
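A minimal sketch of checking this ratio criterion numerically for the assumed two-state chain, whose stationary probability Q = [0.4, 0.6] was computed above; the criterion is verified in its cross-multiplied form Q(i) P(j | i) = Q(j) P(i | j):

    import numpy as np

    trans = np.array([[0.7, 0.3],
                      [0.2, 0.8]])
    Q = np.array([0.4, 0.6])

    for i in range(2):
        for j in range(2):
            if trans[i, j] > 0:
                # ratio criterion, cross-multiplied to avoid division
                assert np.isclose(Q[i] * trans[i, j], Q[j] * trans[j, i])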

Page 13: Inference V: MCMC Methods

MCMC Methods

We can use these results to sample from P(X1,…,Xn|e)

Idea: Construct an ergodic & aperiodic Markov Chain

such that P*(X1,…,Xn) = P(X1,…,Xn|e)

Simulate the chain for n steps to get a sample

Page 14: Inference V: MCMC Methods

MCMC Methods

Notes: The Markov chain variable Y takes as values assignments to all variables that are consistent with the evidence

For simplicity, we will denote such a state using the vector of variables

Val(Y) = { (x1, …, xn) ∈ Val(X1) × … × Val(Xn) | x1, …, xn satisfy e }

Page 15: Inference V: MCMC Methods

Gibbs Sampler

One of the simplest MCMC methods. At each transition we change the state of just one Xi

We can describe the transition probability as a stochastic procedure:

Input: a state x1, …, xn

Choose i at random (using uniform probability)

Sample x'i from P(Xi | x1, …, xi−1, xi+1, …, xn, e)

Let x'j = xj for all j ≠ i

Return x'1, …, x'n
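A minimal sketch of this transition in Python; full_conditional(i, x) is an assumed user-supplied routine that samples a value from P(Xi | x1, …, xi−1, xi+1, …, xn, e):

    import random

    def gibbs_step(x, full_conditional):
        # x is the current state (x1, ..., xn) as a list
        i = random.randrange(len(x))          # choose i uniformly at random
        x_new = list(x)                       # x'j = xj for all j != i
        x_new[i] = full_conditional(i, x)     # resample coordinate i only
        return x_new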

Page 16: Inference V: MCMC Methods

Correctness of Gibbs Sampler

By chain rule

P(x1, …, xi−1, xi, xi+1, …, xn | e) = P(x1, …, xi−1, xi+1, …, xn | e) P(xi | x1, …, xi−1, xi+1, …, xn, e)

Thus, we get

P(x1, …, xi−1, xi, xi+1, …, xn | e) / P(x1, …, xi−1, x'i, xi+1, …, xn | e) = P(xi | x1, …, xi−1, xi+1, …, xn, e) / P(x'i | x1, …, xi−1, xi+1, …, xn, e)

Since we choose i from the same distribution at each stage, this procedure satisfies the ratio criterion

Page 17: Inference V: MCMC Methods

Gibbs Sampling for Bayesian Network

Why is the Gibbs sampler “easy” in BNs? Recall that the Markov blanket of a variable

separates it from the other variables in the network: P(Xi | X1, …, Xi−1, Xi+1, …, Xn) = P(Xi | Mbi)

This property allows us to use local computations to perform sampling in each transition

Page 18: Inference V: MCMC Methods

Gibbs Sampling in Bayesian Networks

How do we evaluate P(Xi | x1,…,xi-1,xi+1,…,xn) ?

Let Y1, …, Yk be the children of Xi

By definition of Mbi, the parents of Yj are in Mbi ∪ {Xi}

It is easy to show that

P(xi | Mbi) = [ P(xi | Pai) Πj P(yj | payj) ] / [ Σx'i P(x'i | Pai) Πj P(yj | pa'yj) ]

where Pai denotes the parents of Xi, payj denotes the assignment to Yj's parents under (x1, …, xn), and pa'yj is the same assignment with xi replaced by x'i
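A minimal sketch of this computation in Python, assuming a hypothetical CPT lookup cpt(v, assignment) that returns P(Xv = assignment[v] | assignment to Xv's parents), and a children map giving each variable's children:

    def markov_blanket_conditional(i, x, values_i, cpt, children):
        # Score each candidate value v for Xi by
        # P(v | Pai) * product over children Yj of P(yj | payj),
        # then normalize; only Xi's Markov blanket is ever consulted.
        scores = []
        for v in values_i:
            x_try = dict(x)
            x_try[i] = v                  # substitute the candidate value
            s = cpt(i, x_try)             # P(v | Pai)
            for c in children[i]:
                s *= cpt(c, x_try)        # P(yj | payj) with xi := v
            scores.append(s)
        z = sum(scores)
        return {v: s / z for v, s in zip(values_i, scores)}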

Page 19: Inference V: MCMC Methods

Sampling Strategy

How do we collect the samples?

Strategy I: Run the chain M times, each run for N steps

Each run starts from a different starting point. Return the last state in each run

[Figure: M chains]

Page 20: Inference V: MCMC Methods

Sampling Strategy

Strategy II: Run one chain for a long time. After some “burn in” period, sample points every fixed number of steps

[Figure: one chain; after the “burn in” period, M samples are taken from it]

Page 21: Inference V: MCMC Methods

Comparing Strategies

Strategy I: Better chance of “covering” the space of points, especially if the chain is slow to reach stationarity. Have to perform the “burn in” steps for each chain

Strategy II: Perform “burn in” only once. Samples might be correlated (although only weakly)

Hybrid strategy: run several chains, and draw a few samples from each. This combines the benefits of both strategies (see the sketch below)
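A minimal sketch of the two collection strategies in Python; step is an assumed function performing one transition of the chain:

    def run(x, k, step):
        # apply k transitions of the chain starting from state x
        for _ in range(k):
            x = step(x)
        return x

    def strategy_one(starts, n, step):
        # Strategy I: M independent runs; keep only each run's last state
        return [run(x, n, step) for x in starts]

    def strategy_two(start, burn_in, gap, m, step):
        # Strategy II: one long run; discard the burn-in,
        # then keep every gap-th state until m samples are collected
        x = run(start, burn_in, step)
        samples = []
        for _ in range(m):
            x = run(x, gap, step)
            samples.append(x)
        return samples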