Metropolis-Hastings Algorithm for Mixture Model and its Weak Convergence

Kengo Kamatani

University of Tokyo, Japan

2010. 8. 31.

Page 2

1 The Gibbs sampler usually works well.

2 However, in certain settings it works poorly. Example: the mixture model.

3 Fortunately, we found an alternative MCMC method which works better in simulation.

Problem

The samplers in 2 and 3 are both uniformly ergodic. Therefore, to compare the two methods, we would have to calculate their convergence rates, which is very difficult!

Hence the comparison is difficult in the Harris recurrence approach. We take another approach.

Page 3

Summary of the talk

Sec. 1 I show the bad behavior of the Gibbs sampler.

Sec. 2 Define efficiency (consistency) of MCMC. Prove that the Gibbs sampler has a bad convergence property.

Sec. 3 Propose a new MCMC method. Prove that the new method is better than the Gibbs sampler.

Page 4

Note that...

The Harris recurrence property is also very important for our approach; without it, our approach is useless.

Another motivation of our approach is to separate two different convergence issues: 1) convergence to the local area and 2) consistency.

Only the mixture model is considered here, but the approach may be useful for other models.

Page 5

Outline

1 Bad behavior of the Gibbs sampler
   Model description
   Gibbs sampler

2 Efficiency of MCMC
   What is MCMC?
   Consistency
   Degeneracy

3 MH algorithm converges faster
   MH proposal construction
   MH performance

Page 6

Outline

1 Bad behavior of the Gibbs sampler
   Model description
   Gibbs sampler

2 Efficiency of MCMC
   What is MCMC?
   Consistency
   Degeneracy

3 MH algorithm converges faster
   MH proposal construction
   MH performance

Page 7

Bad behavior of the Gibbs sampler: Model description

1 Consider a model

p_{X|Θ}(dx|θ) = (1 − θ)F_0(dx) + θF_1(dx).

2 Flip a coin with probability of heads θ. If the coin is heads, generate x from F_1; otherwise, from F_0.

3 We do not observe the coin, only x.

4 Observation x^n = (x_1, x_2, . . . , x_n), x_i ∼ p_{X|Θ}(dx|θ_0). Prior distribution p_Θ = Beta(α_1, α_0).

We want to calculate the posterior distribution.
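As a concrete illustration, the generating mechanism of steps 2–3 can be sketched in a few lines (a minimal sketch; the choices F_0 = N(0, 1) and F_1 = N(2, 1) and all function names are illustrative assumptions, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixture(n, theta0, f0_sampler, f1_sampler):
    """Draw n observations from (1 - theta0) F0 + theta0 F1.

    The latent coin y_i ~ Bernoulli(theta0) is discarded: we observe x only.
    """
    y = rng.binomial(1, theta0, size=n)                 # hidden coin flips
    x = np.where(y == 1, f1_sampler(n), f0_sampler(n))  # pick F1 or F0 per flip
    return x

# Illustrative choice F0 = N(0,1), F1 = N(2,1); theta0 = 0 is the case
# studied in the talk (the true model is F0).
x = sample_mixture(10_000, theta0=0.0,
                   f0_sampler=lambda n: rng.normal(0.0, 1.0, n),
                   f1_sampler=lambda n: rng.normal(2.0, 1.0, n))
```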

Page 8

Bad behavior of the Gibbs sampler: Gibbs sampler

1 Set θ(0) ∈ Θ.

2 Generate y_i ∼ Bi(1, p_i) (i = 1, 2, . . . , n), where

p_i = θ(0)f_1(x_i) / ((1 − θ(0))f_0(x_i) + θ(0)f_1(x_i))

and F_i(dx) = f_i(x)dx. Count m = Σ_{i=1}^n y_i.

3 Generate θ(1) ∼ Beta(α_1 + m, α_0 + n − m).

4 The empirical measure of (θ(0), θ(1), . . . , θ(N − 1)) is an estimator of the posterior distribution.

The next figure shows a path of the Gibbs sampler when the true model is F_0, that is, θ_0 = 0.
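The four steps above can be sketched as follows (a hypothetical implementation assuming the illustrative densities f_0 = N(0, 1) and f_1 = N(2, 1); the talk leaves F_0 and F_1 abstract):

```python
import numpy as np

def normal_pdf(x, mu):
    """Density of N(mu, 1), used as an illustrative f_i."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

def gibbs(x, alpha1, alpha0, n_iter, rng):
    """Data-augmentation Gibbs sampler for the two-component mixture."""
    n = len(x)
    theta = rng.beta(alpha1, alpha0)          # theta(0) drawn from the prior
    f0, f1 = normal_pdf(x, 0.0), normal_pdf(x, 2.0)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # Step 2: p_i = P(y_i = 1 | x_i, theta), then sample the latent labels.
        p = theta * f1 / ((1 - theta) * f0 + theta * f1)
        y = rng.binomial(1, p)
        m = y.sum()
        # Step 3: conjugate Beta update given the labels.
        theta = rng.beta(alpha1 + m, alpha0 + n - m)
        draws[t] = theta
    return draws  # Step 4: its empirical measure estimates the posterior

# Example run under the true model F0 (theta_0 = 0):
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 1000)
draws = gibbs(x, alpha1=1.0, alpha0=1.0, n_iter=500, rng=rng)
```

With θ_0 = 0 the chain drifts only slowly toward small θ, which is the bad behavior shown in the next figure.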

Page 9

Bad behavior of the Gibbs sampler: Gibbs sampler

Figure: Plot of paths of MCMC methods ("deviance" against "iteration", 0–1000 iterations) for n = 10^4. The dashed line is a path from the Gibbs sampler and the solid line is the MH algorithm.

Page 10

Bad behavior of the Gibbs sampler: How to define efficiency

1 MCMC methods produce complicated Markov chains.

2 We make an approximation of the MCMC method.

We observe the behavior of MCMC methods as the sample size n → ∞.

Page 11

Outline

1 Bad behavior of the Gibbs sampler
   Model description
   Gibbs sampler

2 Efficiency of MCMC
   What is MCMC?
   Consistency
   Degeneracy

3 MH algorithm converges faster
   MH proposal construction
   MH performance

Page 12

Weak convergence of MCMC: What is MCMC?

Write s instead of θ.

1 For each observation x, the Gibbs sampler produces paths s = (s(0), s(1), . . .) in S^∞.

2 In other words, for x ∈ X, the Gibbs sampler defines a law G_x ∈ P(S^∞).

3 Therefore, a Gibbs sampler is a set of probability measures G = (G_x; x ∈ X). (Later, we will consider G as a random variable G(x) = G_x.)

Let ν̂_m(s) be the empirical measure of s(0), . . . , s(m − 1), and let ν_x be the target distribution for each x.

We expect that d(ν̂_m(s), ν_x) → 0 in a certain sense.

Page 13

Weak convergence of MCMC: Consistency

1 We expect that as m → ∞,

E_G(d(ν̂_m(s), ν)) → 0.

But G and ν depend on x!

2 We expect that as m → ∞,

E_{G_x}(d(ν̂_m(s), ν_x)) = o_P(1).

But G_x and ν_x may depend on n!

Page 14

Weak convergence of MCMC: Consistency

Definition

Let (M_n = (M_{x,n}); n ∈ N) be a sequence of MCMC methods. We call (M_n; n ∈ N) consistent for ν_n = (ν_{x,n}) if for any m(n) → ∞,

E_{M_{x,n}}(d(ν̂_{m(n)}(s), ν_{x,n})) = o_{P_n}(1).

For a regular model, the Gibbs sampler has consistency with scaling θ ↦ n^{1/2}(θ − θ_0).

Page 15

Weak convergence of MCMC: Degeneracy

Definition

1 If a measure ω ∈ P(S^∞) satisfies

ω({s; s(0) = s(1) = s(2) = · · · }) = 1, (1)

we call it degenerate.

2 We also call M degenerate (in P) if M_x is degenerate for a.e. x.

3 If M_n ⇒ M and M is degenerate, we call M_n degenerate in the limit.

The Gibbs sampler G_n for the mixture model is degenerate with scaling θ ↦ n^{1/2}θ if θ_0 = 0, as n → ∞.

Page 16

Weak convergence of MCMC: Degeneracy

In fact, G_n tends to a diffusion-process-type variable with time scaling 0, 1, 2, . . . ↦ 0, n^{−1/2}, 2n^{−1/2}, . . .!

Under both space and time scaling, G_{x,n} is similar to the law of

dS_t = (α_1 + S_t Z_n − S_t^2 I) dt + S_t dB_t

where Z_n ⇒ N(0, I) and I is the Fisher information matrix.

If we take m(n)n^{−1/2} → ∞, the empirical measure converges to the posterior distribution.

We call G_n n^{1/2}-weakly consistent.
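The limiting behavior can be visualized with a simple Euler-Maruyama discretization of the SDE above (a sketch for the scalar case; the values α_1 = 1, I = 1 and a fixed realization z of Z_n are illustrative assumptions, not from the talk):

```python
import numpy as np

def simulate_limit(s0, z, alpha1=1.0, fisher=1.0,
                   dt=1e-3, n_steps=10_000, seed=0):
    """Euler-Maruyama scheme for dS_t = (alpha1 + S_t z - I S_t^2) dt + S_t dB_t.

    z stands for a fixed realization of Z_n; fisher is the (scalar)
    Fisher information I.
    """
    rng = np.random.default_rng(seed)
    s = np.empty(n_steps + 1)
    s[0] = s0
    for k in range(n_steps):
        drift = alpha1 + s[k] * z - fisher * s[k] ** 2
        # Multiplicative noise S_t dB_t discretized as S_k * sqrt(dt) * N(0,1).
        s[k + 1] = s[k] + drift * dt + s[k] * np.sqrt(dt) * rng.standard_normal()
    return s

path = simulate_limit(s0=0.5, z=0.0)
```

The slow O(n^{1/2}) mixing of the Gibbs sampler corresponds to one unit of diffusion time costing n^{1/2} iterations under the time scaling above.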

Page 17

Outline

1 Bad behavior of the Gibbs sampler
   Model description
   Gibbs sampler

2 Efficiency of MCMC
   What is MCMC?
   Consistency
   Degeneracy

3 MH algorithm converges faster
   MH proposal construction
   MH performance

Page 18

MH algorithm converges faster: MH proposal construction

Construct a posterior distribution for another parametric family:

1 Fix Q ⊂ P(X).

2 For each θ, set

q_{X|Θ}(dx|θ) := argmin_{q ∈ Q} d(p_{X|Θ}(dx|θ), q)

where d is a certain discrepancy measure, e.g. the Kullback-Leibler divergence.

3 Calculate the posterior q^n_{Θ|X^n}(dθ|x^n).

Remark

We assume that we can generate θ ∼ q^n_{Θ|X^n}(dθ|x^n) on a PC.

This construction is similar to

1 the quasi-Bayes method (see e.g. Smith and Makov 1978),

2 the variational Bayes method (see e.g. Humphreys and Titterington 2000).

Page 19

MH algorithm converges faster: MH proposal construction

Construct an independent-type Metropolis-Hastings algorithm with target distribution p^n_{Θ|X^n}(dθ|x^n).

Step 0 Generate θ(0) ∼ q^n_{Θ|X^n}(dθ|x^n). Go to Step 1.

Step i Generate θ*(i) ∼ q^n_{Θ|X^n}(dθ|x^n). Then set

θ(i) = θ*(i) with probability α(θ(i − 1), θ*(i)), and θ(i) = θ(i − 1) with probability 1 − α(θ(i − 1), θ*(i)).

Go to Step i + 1.
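Steps 0 and i amount to a standard independence sampler, which can be sketched generically as follows (the callables `log_target`, `proposal_sampler`, and `log_proposal` stand in for p^n_{Θ|X^n} and q^n_{Θ|X^n}, which are model-specific; all names are illustrative):

```python
import numpy as np

def independence_mh(log_target, proposal_sampler, log_proposal, n_iter, rng):
    """Independent-type Metropolis-Hastings.

    Every proposal is drawn from the fixed law q; the acceptance probability
    alpha(current, prop) = min(1, w(prop)/w(current)) uses the importance
    weight w = target/proposal.
    """
    theta = proposal_sampler()                     # Step 0: theta(0) ~ q
    log_w = log_target(theta) - log_proposal(theta)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        prop = proposal_sampler()                  # Step i: theta*(i) ~ q
        log_w_prop = log_target(prop) - log_proposal(prop)
        if np.log(rng.uniform()) < log_w_prop - log_w:   # accept w.p. alpha
            theta, log_w = prop, log_w_prop
        draws[i] = theta
    return draws

# Toy check with target = proposal = N(0,1), so every proposal is accepted:
rng = np.random.default_rng(0)
log_std_normal = lambda t: -0.5 * t * t
draws = independence_mh(log_std_normal, rng.standard_normal, log_std_normal,
                        n_iter=2000, rng=rng)
```

The better the approximating posterior q matches p, the closer the weights are to constant and the faster the chain mixes, which is the source of the speed-up over the Gibbs sampler.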

Page 20

MH algorithm converges faster: MH performance (Normal): Mean squared error

Figure (left): MCMC standard error ("sd" against "mcmc" iteration, 0–10000). The dashed line is a path from the Gibbs sampler and the solid line is the MH algorithm, for n = 10.

Figure (right): The same figure as the left. The sample size is 10^2.

Page 21

MH algorithm converges faster: Remarks

Remarks...

If F_0 and F_1 are similar, the Gibbs sampler becomes even worse. This fact can be verified by setting F_ε and letting ε → 0 instead of ε ≡ 1. In this case, we have to take m(n)εn^{−1/2} → ∞ (G_n is ε^{−1}n^{1/2}-weakly consistent).

For other models, there are cases where m(n)n^{−1} → ∞ is required.

Page 22

MH algorithm converges faster

Thank you!
