Markov Chain Monte Carlo: theory and worked examples
Dario Digiuni
A.A. 2007/2008
Markov Chain Monte Carlo
• Class of sampling algorithms
• High sampling efficiency
• Sample from a distribution whose normalization constant is unknown
• Often the only way to solve problems in time polynomial in the number of dimensions
e.g. evaluating the volume of a convex body
MCMC: applications
• Statistical Mechanics
▫ Metropolis-Hastings
• Optimization
▫ Simulated annealing
• Bayesian Inference
▫ Metropolis-Hastings
▫ Gibbs sampling
The Monte Carlo principle
• Sample a set of N independent and identically distributed variables
• Approximation of the target p.d.f. with the empirical expression
… then approximation of the integrals!
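In formulas, the empirical approximation referred to above takes the standard Monte Carlo form:

```latex
p_N(x) = \frac{1}{N}\sum_{i=1}^{N} \delta\left(x - x^{(i)}\right),
\qquad
\int f(x)\,p(x)\,dx \;\approx\; \frac{1}{N}\sum_{i=1}^{N} f\left(x^{(i)}\right)
```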
Rejection Sampling
1. It requires finding the bound M!
2. Low acceptance rate
Idea
• The previously sampled value can be used to generate the next one
• Exploration of the configuration space by means of Markov Chains:
def.: Markov process
def.: Markov chain
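The two definitions can be written out explicitly; a (first-order) Markov process satisfies

```latex
p\left(x^{(i)} \mid x^{(i-1)}, \ldots, x^{(1)}\right) = T\left(x^{(i)} \mid x^{(i-1)}\right)
```

and a Markov chain is the sequence of states generated by iterating the transition kernel T; the chain is homogeneous when T does not depend on the step i.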
Invariant distribution
• Stability conditions:
1. Irreducibility = from every state there is a non-zero probability of visiting any other state
2. Aperiodicity = there are no loops (the chain does not get trapped in cycles)
• Sufficient condition
1. Detailed balance principle
MCMC algorithms are aperiodic, irreducible Markov chains having the target pdf as the invariant distribution
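In the usual notation, the detailed balance condition reads

```latex
p(x)\,T(x' \mid x) = p(x')\,T(x \mid x')
\;\Longrightarrow\;
\int p(x)\,T(x' \mid x)\,dx = p(x')
```

so a pdf satisfying detailed balance is automatically invariant under T.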
Example
• What is the probability of finding the lift at the ground floor of a three-floor building?
▫ 3 states Markov chain
▫ Lift = random walker
▫ Transition matrix
▫ Looking for the invariant distribution
… burn-in …
Example - 2
• The matrix T can be applied on the right to any of the states, e.g.
• Google’s PageRank:
▫ Websites are the states, T is defined by the hyperlinks among them, and the user is the random walker:
The webpages are ranked according to the invariant distribution!
~50% is the probability of finding the lift at the ground floor
homogeneous Markov chain
Metropolis-Hastings
• Given the target distribution
1. Choose an initial value
2. Sample from a proposal distribution
3. Accept the new value with probability
4. Return to step 2
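The acceptance probability left implicit in step 3 has the standard Metropolis-Hastings form

```latex
a(x, x') = \min\left(1, \frac{p(x')\,q(x \mid x')}{p(x)\,q(x' \mid x)}\right)
```

which only involves a ratio of target values, and in which the q factors cancel for a symmetric proposal.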
Ratio independent of the normalization!
Equal in the Metropolis algorithm (symmetric proposal)
equivalent to T
M.-H. – Pros and Cons
• Very general sampling method:
▫ It can sample from an unnormalized distribution
▫ It does not require an upper bound for the function
• Good performance depends on the choice of the proposal distribution
▫ well-mixing condition
M.-H. - Example
• In Statistical Mechanics it is important to evaluate the partition function,
e.g. the Ising model
Sum over every possible spin state: in a 10 x 10 x 10 spin cube, one would have to sum over 2^1000
possible states = UNFEASIBLE
MCMC APPROACH:
1. Evaluate the system’s energy
2. Pick a spin at random and flip it:
1. If the energy decreases, this is the new spin configuration
2. If the energy increases, this is the new spin configuration with probability exp(-ΔE/kT)
Simulated Annealing
• It allows one to find the global maximum of a generic pdf
▫ No explicit comparison among the values of the local maxima is required
▫ Applicable to the maximum-likelihood method
• It is a non-homogeneous Markov chain whose invariant distribution keeps changing as follows:
Simulated Annealing: example
• Let us apply the algorithm to a simple, 1-dimensional case
• The optimal cooling scheme is
Simulated Annealing: Pros and Cons
• The global maximum is uniquely determined
▫ Even if the walker starts next to a local (non-global!) maximum, it converges to the true global maximum
• It requires a good tuning of the parameters
Gibbs Sampler
• Optimal method to marginalize multidimensional distributions
• Let us assume we have an n-dimensional vector and that we know all the conditional probability expressions for the pdf
• We take the following proposal distribution:
Gibbs Sampler - 2
• Then:
very efficient method!
Gibbs Sampler – practically
1. Initialize x(0)
2. for (i = 0; i < N; i++)
• Sample x1(i+1) from p(x1 | x2(i), …, xn(i))
• Sample x2(i+1) from p(x2 | x1(i+1), x3(i), …, xn(i))
• …
• Sample xn(i+1) from p(xn | x1(i+1), …, xn-1(i+1))
fix the other n-1 coordinates and sample from the resulting one-dimensional pdf
Gibbs Sampler – example
• Let us pretend we cannot determine the normalization constant…
… but we can make a comparison with the true marginalized pdf…
Gibbs Sampler – results
• Comparison between Gibbs sampling and true M.-H. sampling from the marginalized pdf
• Good χ² agreement
A complex MCMC application
A radioactive source decays with frequency λ1 and a detector records only every k1-th event; then, at the moment tc, the decay rate changes to λ2 and only one event out of k2 is recorded.
The parameters λ1, k1, tc, λ2 and k2 are all undetermined.
We wish to find them.
Preparation
• The waiting time for the k-th event in a Poissonian process with frequency λ is distributed according to:
• One can sample a large number of events from this pdf, switching the parameters from λ1 and k1 to λ2 and k2 at time tc
• I evaluate the likelihood:
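The waiting-time pdf is the standard Erlang (Gamma) form, and the likelihood is the product over the recorded waiting times:

```latex
p(t \mid \lambda, k) = \frac{\lambda^{k}\, t^{k-1}\, e^{-\lambda t}}{(k-1)!},
\qquad
\mathcal{L}(\lambda_1, k_1, t_c, \lambda_2, k_2) = \prod_{i} p\left(t_i \mid \lambda(t_i), k(t_i)\right)
```

where λ(t) and k(t) switch from (λ1, k1) to (λ2, k2) at t = tc.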
Idea
• Take the likelihood (handled through its logarithm) as the invariant distribution!
▫ What are the Markov chain states?
struct State {
    double lambda1, lambda2;
    double tc;
    int k1, k2;
    double plog;
    State(double la1, double la2, double t, int kk1, int kk2) :
        lambda1(la1), lambda2(la2), tc(t), k1(kk1), k2(kk2) {}
    State() {}
};
Parameter space
Corresponding log-likelihood value
Practically
• I have to find an appropriate proposal distribution to move among the states
▫ Attention: when varying λi and ki, the acceptance rate must be prevented from becoming too low… but also too high!
• The ratio a is evaluated as the ratio between the final-state and initial-state likelihood values.
• Try to guess the values for λi, ki and tc
• Let the chain evolve for a burn-in time and then record the results.
Results
• Even if the initial guess is quite far from the real value, the random walker converges.
guess: λ1 = 5, λ2 = 5, k1 = 3, k2 = 2
real: λ1 = 1, λ2 = 2, k1 = 1, k2 = 1
Results - 2
• Estimate of the uncertainty
(histograms of the sampled λ1 and λ2 values)
Results - 3
• All the parameters can be determined quickly
guess: tc = 150, real: tc = 300
References
• C. Andrieu, N. de Freitas, A. Doucet and M.I. Jordan, Machine Learning 50 (2003), 5-43.
• G. Casella and E.I. George, The American Statistician 46, 3 (1992), 167-174.
• W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes, Third Edition, Cambridge University Press, 2007.
• M. Loreti, Teoria degli errori e fondamenti di statistica, Decibel, Zanichelli (1998).
• B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581.