Talk in Telecom-Paris, Nov. 15, 2011
Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms

Christian P. Robert
Université Paris-Dauphine, IUF, and CREST
Joint works with Randal Douc, Pierre Jacob and Murray Smith

November 16, 2011
Main themes

1. Rao–Blackwellisation of MCMC
2. Can be performed in any Metropolis–Hastings algorithm
3. Asymptotically more efficient than usual MCMC, with a controlled additional computing cost
4. Takes advantage of parallel capacities at a very basic level (GPUs)
![Page 3: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/3.jpg)
Main themes
1 Rao–Blackwellisation on MCMC
2 Can be performed in any Hastings Metropolis algorithm
3 Asymptotically more efficient than usual MCMC with acontrolled additional computing
4 Takes advantage of parallel capacities at a very basic level(GPUs)
2 / 36
![Page 4: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/4.jpg)
Main themes
1 Rao–Blackwellisation on MCMC
2 Can be performed in any Hastings Metropolis algorithm
3 Asymptotically more efficient than usual MCMC with acontrolled additional computing
4 Takes advantage of parallel capacities at a very basic level(GPUs)
2 / 36
![Page 5: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/5.jpg)
Main themes
1 Rao–Blackwellisation on MCMC
2 Can be performed in any Hastings Metropolis algorithm
3 Asymptotically more efficient than usual MCMC with acontrolled additional computing
4 Takes advantage of parallel capacities at a very basic level(GPUs)
2 / 36
![Page 6: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/6.jpg)
Outline

1. Metropolis–Hastings revisited
2. Rao–Blackwellisation
   - Formal importance sampling
   - Variance reduction
   - Asymptotic results
   - Illustrations
3. Rao–Blackwellisation (2)
   - Independent case
   - General MH algorithms
![Page 7: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/7.jpg)
Metropolis Hastings revisitedRao–Blackwellisation
Rao-Blackwellisation (2)
Outline
1 Metropolis Hastings revisited
2 Rao–BlackwellisationFormal importance samplingVariance reductionAsymptotic resultsIllustrations
3 Rao-Blackwellisation (2)Independent caseGeneral MH algorithms
3 / 36
![Page 8: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/8.jpg)
Metropolis Hastings revisitedRao–Blackwellisation
Rao-Blackwellisation (2)
Outline
1 Metropolis Hastings revisited
2 Rao–BlackwellisationFormal importance samplingVariance reductionAsymptotic resultsIllustrations
3 Rao-Blackwellisation (2)Independent caseGeneral MH algorithms
3 / 36
![Page 9: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/9.jpg)
Metropolis Hastings revisitedRao–Blackwellisation
Rao-Blackwellisation (2)
Outline
1 Metropolis Hastings revisited
2 Rao–BlackwellisationFormal importance samplingVariance reductionAsymptotic resultsIllustrations
3 Rao-Blackwellisation (2)Independent caseGeneral MH algorithms
4 / 36
![Page 10: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/10.jpg)
Metropolis–Hastings algorithm

1. We wish to approximate
$$I = \frac{\int h(x)\,\pi(x)\,\mathrm{d}x}{\int \pi(x)\,\mathrm{d}x} = \int h(x)\,\bar\pi(x)\,\mathrm{d}x$$
2. $\pi(x)$ is known but not $\int \pi(x)\,\mathrm{d}x$.
3. Approximate $I$ with $\delta = \frac{1}{N}\sum_{t=1}^{N} h(x^{(t)})$, where $(x^{(t)})$ is a Markov chain with limiting distribution $\bar\pi$.
4. Convergence obtained from the Law of Large Numbers or the CLT for Markov chains.
Metropolis–Hastings algorithm

Suppose that $x^{(t)}$ is drawn.

1. Simulate $y_t \sim q(\cdot\mid x^{(t)})$.
2. Set $x^{(t+1)} = y_t$ with probability
$$\alpha(x^{(t)}, y_t) = \min\left\{1,\ \frac{\pi(y_t)}{\pi(x^{(t)})}\,\frac{q(x^{(t)}\mid y_t)}{q(y_t\mid x^{(t)})}\right\}$$
Otherwise, set $x^{(t+1)} = x^{(t)}$.
3. $\alpha$ is such that the detailed balance equation is satisfied:
$$\pi(x)\,q(y\mid x)\,\alpha(x,y) = \pi(y)\,q(x\mid y)\,\alpha(y,x)\,,$$
hence $\pi$ is the stationary distribution of $(x^{(t)})$.

▶ The accepted candidates are simulated with the rejection algorithm.
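The two steps above translate directly into code. A minimal sketch in Python, with a symmetric Gaussian random-walk proposal so that the Hastings ratio reduces to $\pi(y_t)/\pi(x^{(t)})$; the function names and the N(0, 1) example target are illustrative, not from the talk.

```python
import math
import random

def mh_sampler(log_pi, x0, tau, n_iter, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal of
    scale tau, so that alpha(x, y) = min{1, pi(y)/pi(x)}."""
    x, chain = x0, []
    for _ in range(n_iter):
        y = x + tau * rng.gauss(0.0, 1.0)                  # simulate y_t ~ q(.|x^(t))
        alpha = math.exp(min(0.0, log_pi(y) - log_pi(x)))  # acceptance probability
        if rng.random() < alpha:
            x = y                                          # accept: x^(t+1) = y_t
        chain.append(x)                                    # else keep x^(t+1) = x^(t)
    return chain

# Target pi(x) ∝ exp(-x^2/2); estimate I = E[X] = 0 by the ergodic average delta
rng = random.Random(42)
chain = mh_sampler(lambda x: -0.5 * x * x, x0=0.0, tau=2.0, n_iter=20_000, rng=rng)
delta = sum(chain) / len(chain)
```

Working on the log scale avoids overflow in the acceptance ratio; the seeded generator only makes the sketch reproducible.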
Some properties of the MH algorithm

1. An alternative representation of the estimator $\delta$ is
$$\delta = \frac{1}{N}\sum_{t=1}^{N} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i)\,,$$
where
- the $z_i$'s are the accepted $y_j$'s,
- $M_N$ is the number of accepted $y_j$'s till time $N$,
- $n_i$ is the number of times $z_i$ appears in the sequence $(x^{(t)})_t$.
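This alternative representation is easy to check numerically: record the accepted values $z_i$ together with their multiplicities $n_i$ while running the chain, and the two sums coincide exactly. A sketch in the same illustrative N(0, 1) random-walk setting (not from the talk):

```python
import math
import random

def mh_with_multiplicities(log_pi, x0, tau, n_iter, rng):
    """Run random-walk MH, recording the accepted values z_i and the number of
    times n_i each one is repeated in the chain (x^(t))."""
    x, chain = x0, []
    zs, ns = [x0], [0]                     # n_0 = 0: x0 itself is not in the chain
    for _ in range(n_iter):
        y = x + tau * rng.gauss(0.0, 1.0)
        if rng.random() < math.exp(min(0.0, log_pi(y) - log_pi(x))):
            x = y
            zs.append(y)
            ns.append(1)                   # new accepted value, multiplicity starts at 1
        else:
            ns[-1] += 1                    # rejection: the current z_i is repeated
        chain.append(x)
    return chain, zs, ns

rng = random.Random(1)
h = lambda x: x * x
chain, zs, ns = mh_with_multiplicities(lambda x: -0.5 * x * x, 0.0, 2.0, 5_000, rng)
N = len(chain)
delta = sum(h(x) for x in chain) / N                  # (1/N) sum_t h(x^(t))
delta_z = sum(n * h(z) for z, n in zip(zs, ns)) / N   # (1/N) sum_i n_i h(z_i)
```

The multiplicities sum to $N$, so both expressions are the same average written over different index sets.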
The transition of the chain of accepted values $(z_i)$ is

$$\tilde q(\cdot\mid z_i) = \frac{\alpha(z_i,\cdot)\,q(\cdot\mid z_i)}{p(z_i)} \le \frac{q(\cdot\mid z_i)}{p(z_i)}\,,$$

where $p(z_i) = \int \alpha(z_i, y)\,q(y\mid z_i)\,\mathrm{d}y$. To simulate from $\tilde q(\cdot\mid z_i)$:

1. Propose a candidate $y \sim q(\cdot\mid z_i)$.
2. Accept with probability
$$\tilde q(y\mid z_i)\Big/\frac{q(y\mid z_i)}{p(z_i)} = \alpha(z_i, y)\,.$$
Otherwise, reject it and start again.

▶ This is the transition of the MH algorithm. The transition kernel $\tilde q$ admits $\tilde\pi$ as a stationary distribution:

$$\tilde\pi(x)\,\tilde q(y\mid x)
= \underbrace{\frac{\pi(x)\,p(x)}{\int \pi(u)\,p(u)\,\mathrm{d}u}}_{\tilde\pi(x)}\;
\underbrace{\frac{\alpha(x,y)\,q(y\mid x)}{p(x)}}_{\tilde q(y\mid x)}
= \frac{\pi(y)\,\alpha(y,x)\,q(x\mid y)}{\int \pi(u)\,p(u)\,\mathrm{d}u}
= \tilde\pi(y)\,\tilde q(x\mid y)\,,$$

using detailed balance for the middle equality.
Lemma (Douc & X., AoS, 2011)

The sequence $(z_i, n_i)$ satisfies:

1. $(z_i, n_i)_i$ is a Markov chain;
2. $z_{i+1}$ and $n_i$ are independent given $z_i$;
3. $n_i$ is distributed as a geometric random variable with probability parameter
$$p(z_i) := \int \alpha(z_i, y)\,q(y\mid z_i)\,\mathrm{d}y\,; \qquad (1)$$
4. $(z_i)_i$ is a Markov chain with transition kernel $\tilde Q(z, \mathrm{d}y) = \tilde q(y\mid z)\,\mathrm{d}y$ and stationary distribution $\tilde\pi$ such that
$$\tilde q(\cdot\mid z) \propto \alpha(z, \cdot)\,q(\cdot\mid z) \quad\text{and}\quad \tilde\pi(\cdot) \propto \pi(\cdot)\,p(\cdot)\,.$$
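Point 3 of the Lemma can be checked empirically: fix $z$, estimate $p(z)$ by Monte Carlo, and compare the average waiting time until acceptance with $1/p(z)$. A sketch with an illustrative N(0, 1) target and Gaussian random-walk proposal; the particular values of `tau` and `z` are chosen for the example, not taken from the talk.

```python
import math
import random

rng = random.Random(7)
tau, z = 2.0, 0.5
log_pi = lambda x: -0.5 * x * x
# Symmetric proposal, so alpha(z, y) = min{1, pi(y)/pi(z)}
alpha = lambda z, y: math.exp(min(0.0, log_pi(y) - log_pi(z)))

# Monte Carlo estimate of p(z) = ∫ alpha(z, y) q(y|z) dy
M = 200_000
p_hat = sum(alpha(z, z + tau * rng.gauss(0.0, 1.0)) for _ in range(M)) / M

def waiting_time():
    """Number of proposals until one is accepted: geometric with parameter p(z)."""
    n = 1
    while rng.random() >= alpha(z, z + tau * rng.gauss(0.0, 1.0)):
        n += 1
    return n

draws = [waiting_time() for _ in range(50_000)]
mean_n = sum(draws) / len(draws)   # should be close to 1/p(z)
```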
Old bottle, new wine [or vice-versa]

[Dependence graph: $z_{i-1} \to z_i \to z_{i+1}$ is a Markov chain, and each $n_i$ depends on $z_i$ alone, independently of the neighbouring variables.]

$$\delta = \frac{1}{N}\sum_{t=1}^{N} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\,h(z_i)\,.$$
Importance sampling perspective

1. A natural idea:
$$\delta^* = \frac{1}{N}\sum_{i=1}^{M_N} \frac{h(z_i)}{p(z_i)}\,,$$
or, in self-normalised form,
$$\delta^* \simeq \frac{\sum_{i=1}^{M_N} h(z_i)/p(z_i)}{\sum_{i=1}^{M_N} 1/p(z_i)}
= \frac{\sum_{i=1}^{M_N} \frac{\pi(z_i)}{\tilde\pi(z_i)}\,h(z_i)}{\sum_{i=1}^{M_N} \frac{\pi(z_i)}{\tilde\pi(z_i)}}\,.$$
2. But $p$ is not available in closed form.
3. The geometric $n_i$ is the obvious replacement used in the original Metropolis–Hastings estimate, since $\mathbb{E}[n_i \mid z_i] = 1/p(z_i)$.
The Bernoulli factory

The crude estimate of $1/p(z_i)$,

$$n_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}\,,$$

can be improved:

Lemma (Douc & X., AoS, 2011)

If $(y_j)_j$ is an iid sequence with distribution $q(y\mid z_i)$, the quantity

$$\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\}$$

is an unbiased estimator of $1/p(z_i)$ whose variance, conditional on $z_i$, is lower than the conditional variance of $n_i$, $\{1 - p(z_i)\}/p^2(z_i)$.
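The improvement is visible in simulation: for a fixed $z_i$, draw many replicates of the crude geometric $n_i$ and of $\xi_i$ and compare empirical variances. The infinite sum in $\xi_i$ is truncated once the running product of $(1-\alpha)$ factors becomes negligible, a numerical shortcut that is not part of the Lemma; the N(0, 1) random-walk setting is again only illustrative.

```python
import math
import random

rng = random.Random(3)
tau, z = 2.0, 0.5
log_pi = lambda x: -0.5 * x * x
alpha = lambda z, y: math.exp(min(0.0, log_pi(y) - log_pi(z)))
prop = lambda z: z + tau * rng.gauss(0.0, 1.0)

def n_draw():
    """Crude estimate of 1/p(z): the raw geometric waiting time n_i."""
    n = 1
    while rng.random() >= alpha(z, prop(z)):
        n += 1
    return n

def xi_draw(tol=1e-12):
    """Rao-Blackwellised estimate xi_i: the indicator I{u_l >= alpha} is replaced
    by its conditional expectation 1 - alpha(z, y_l).  The sum is truncated once
    the running product falls below tol (a numerical shortcut)."""
    xi, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - alpha(z, prop(z))
        xi += prod
    return xi

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((x - m) ** 2 for x in v) / len(v)

R = 20_000
ns = [n_draw() for _ in range(R)]
xis = [xi_draw() for _ in range(R)]
```

Both samples share the same mean, an estimate of $1/p(z)$, while the $\xi_i$ sample shows the smaller spread.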
Rao-Blackwellised, for sure?

$$\xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell \le j} \{1 - \alpha(z_i, y_\ell)\}$$

1. The sum is infinite, but it terminates (all remaining terms vanish) as soon as some $\alpha(z_i, y_\ell) = 1$, which happens with positive probability since
$$\alpha(x^{(t)}, y_t) = \min\left\{1,\ \frac{\pi(y_t)}{\pi(x^{(t)})}\,\frac{q(x^{(t)}\mid y_t)}{q(y_t\mid x^{(t)})}\right\}$$
For example, take a symmetric random walk as a proposal.
2. What if we wish to be sure that the sum is finite? Finite horizon improvement:
$$\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1\le\ell\le k\wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1\le\ell\le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}$$
Variance improvement

Proposition (Douc & X., AoS, 2011)

If $(y_j)_j$ is an iid sequence with distribution $q(y\mid z_i)$ and $(u_j)_j$ is an iid uniform sequence, for any $k \ge 0$, the quantity

$$\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1\le\ell\le k\wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1\le\ell\le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}$$

is an unbiased estimator of $1/p(z_i)$ with an almost surely finite number of terms. Moreover, for $k \ge 1$,

$$\mathbb{V}\left[\xi_i^k \,\middle|\, z_i\right]
= \frac{1 - p(z_i)}{p^2(z_i)}
- \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}
\left(\frac{2 - p(z_i)}{p^2(z_i)}\right)(p(z_i) - r(z_i))\,,$$

where $p(z_i) := \int \alpha(z_i, y)\,q(y\mid z_i)\,\mathrm{d}y$ and $r(z_i) := \int \alpha^2(z_i, y)\,q(y\mid z_i)\,\mathrm{d}y$. Therefore,

$$\mathbb{V}\left[\xi_i \,\middle|\, z_i\right] \le \mathbb{V}\left[\xi_i^k \,\middle|\, z_i\right] \le \mathbb{V}\left[\xi_i^0 \,\middle|\, z_i\right] = \mathbb{V}[n_i \mid z_i]\,.$$
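The closed-form conditional variance in the Proposition can be evaluated directly, which makes the $k = 0$ boundary case and the monotone decrease in $k$ easy to inspect. A sketch with illustrative values of $p(z_i)$ and $r(z_i)$: any pair with $0 < r \le p < 1$ works, since $r = \int \alpha^2 q \le \int \alpha\, q = p$.

```python
def var_n(p):
    """V[n_i | z_i] for the geometric waiting time: (1 - p)/p^2."""
    return (1.0 - p) / p ** 2

def var_xi_k(p, r, k):
    """Conditional variance of xi_i^k from the Proposition; at k = 0 the
    reduction term vanishes and the geometric variance is recovered."""
    reduction = (1.0 - (1.0 - 2.0 * p + r) ** k) / (2.0 * p - r) \
        * ((2.0 - p) / p ** 2) * (p - r)
    return var_n(p) - reduction

p, r = 0.45, 0.30   # illustrative values with r <= p
vs = [var_xi_k(p, r, k) for k in range(0, 30)]
```

For these values the variance decreases from the geometric bound at $k = 0$ towards its limit as $k \to \infty$, while staying positive throughout.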
[Dependence graph: $z_{i-1} \to z_i \to z_{i+1}$ as before, but now the weights $\xi_{i-1}^k, \xi_i^k$ are not independent of the neighbouring $z$'s.]

$$\xi_i^k = 1 + \sum_{j=1}^{\infty} \prod_{1\le\ell\le k\wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1\le\ell\le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\}$$

$$\delta_M^k = \frac{\sum_{i=1}^{M} \xi_i^k\,h(z_i)}{\sum_{i=1}^{M} \xi_i^k}\,.$$
Asymptotic results

Let

$$\delta_M^k = \frac{\sum_{i=1}^{M} \xi_i^k\,h(z_i)}{\sum_{i=1}^{M} \xi_i^k}\,.$$

For any positive function $\varphi$, we denote $C_\varphi = \{h;\ |h/\varphi|_\infty < \infty\}$. Assume that there exists a positive function $\varphi \ge 1$ such that

$$\forall h \in C_\varphi, \quad \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} \xrightarrow{P} \pi(h)$$

and a positive function $\psi$ such that

$$\forall h \in C_\psi, \quad \sqrt{M}\left(\frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h)\right) \xrightarrow{\mathcal{L}} \mathcal{N}(0, \Gamma(h))$$

Theorem (Douc & X., AoS, 2011)

Under the assumption that $\pi(p) > 0$, the following convergence properties hold:

i) If $h \in C_\varphi$, then $\delta_M^k \xrightarrow{P}_{M\to\infty} \pi(h)$ (▶ Consistency);

ii) if, in addition, $h^2/p \in C_\varphi$ and $h \in C_\psi$, then
$$\sqrt{M}\,(\delta_M^k - \pi(h)) \xrightarrow{\mathcal{L}}_{M\to\infty} \mathcal{N}(0, V_k[h - \pi(h)])\,, \qquad (\text{▶ CLT})$$
where $V_k(h) := \pi(p) \int \pi(\mathrm{d}z)\,\mathbb{V}\left[\xi_i^k \mid z\right] h^2(z)\,p(z) + \Gamma(h)\,.$
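Putting the pieces together, a sketch of the Rao–Blackwellised estimator $\delta_M^k$ (here with $k$ effectively infinite, truncating the weight sum numerically): the $z$-chain is advanced by accept-reject, each $z_i$ receives a weight estimating $1/p(z_i)$, and the estimate is self-normalised. For simplicity the weight uses fresh proposals rather than recycling the rejected candidates as in the talk, which still keeps the weights unbiased for $1/p(z_i)$; the N(0, 1) setting is illustrative.

```python
import math
import random

rng = random.Random(11)
tau = 2.0
log_pi = lambda x: -0.5 * x * x
alpha = lambda z, y: math.exp(min(0.0, log_pi(y) - log_pi(z)))
prop = lambda z: z + tau * rng.gauss(0.0, 1.0)

def next_accepted(z):
    """One step of the z-chain: propose from q(.|z) until acceptance."""
    while True:
        y = prop(z)
        if rng.random() < alpha(z, y):
            return y

def xi_weight(z, tol=1e-12):
    """Estimate 1/p(z) by the Rao-Blackwellised sum of products of (1 - alpha),
    truncated numerically; fresh proposals are used, a simplification."""
    xi, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - alpha(z, prop(z))
        xi += prod
    return xi

M, z = 5_000, 0.0
zs = []
for _ in range(M):
    z = next_accepted(z)
    zs.append(z)
ws = [xi_weight(z) for z in zs]

h = lambda x: x * x
# Self-normalised estimate of E_pi[h(X)] (= 1 for h(x) = x^2 under N(0, 1))
delta_rb = sum(w * h(z) for w, z in zip(ws, zs)) / sum(ws)
```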
![Page 54: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/54.jpg)
Metropolis Hastings revisitedRao–Blackwellisation
Rao-Blackwellisation (2)
Formal importance samplingVariance reductionAsymptotic resultsIllustrations
We will need some additional assumptions. Assume a maximal inequalityfor the Markov chain (zi )i : there exists a measurable function ζ such thatfor any starting point x ,
∀h ∈ Cζ , Px
∣∣∣∣∣∣
sup0≤i≤N
i∑
j=0
[h(zi )− π(h)]
∣∣∣∣∣∣
> ǫ
≤ NCh(x)
ǫ2
Theorem (Douc & X., AoS, 2011)
Assume that h is such that h/p ∈ Cζ and {Ch/p, h2/p2} ⊂ Cφ. Assume
moreover that
√M(δ0M − π(h)
) L−→ N (0,V0[h − π(h)]) .
Then, for any starting point x,
√
MN
(∑Nt=1 h(x
(t))
N− π(h)
)
N→+∞−→ N (0,V0[h − π(h)]) ,
where MN is defined by 18 / 36
![Page 55: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/55.jpg)
Asymptotic results

We will need some additional assumptions. Assume a maximal inequality for the Markov chain (z_i)_i: there exists a measurable function ζ such that, for any starting point x,

$$\forall h \in \mathcal{C}_\zeta\,, \qquad \mathbb{P}_x\left( \sup_{0 \le i \le N} \Bigl| \sum_{j=0}^{i} \bigl[ h(z_j) - \pi(h) \bigr] \Bigr| > \epsilon \right) \le \frac{N\, C_h(x)}{\epsilon^2}\,.$$

Moreover, assume that there exists φ ≥ 1 such that, for any starting point x,

$$\forall h \in \mathcal{C}_\varphi\,, \qquad Q^n(x, h) \stackrel{P}{\longrightarrow} \pi(h) = \pi(ph)/\pi(p)\,.$$

Theorem (Douc & X., AoS, 2011)

Assume that h is such that h/p ∈ C_ζ and {C_{h/p}, h²/p²} ⊂ C_φ. Assume moreover that

$$\sqrt{M}\bigl( \delta^0_M - \pi(h) \bigr) \stackrel{\mathcal{L}}{\longrightarrow} \mathcal{N}\bigl(0, V_0[h - \pi(h)]\bigr)\,.$$

Then, for any starting point x,

$$\sqrt{M_N}\left( \frac{\sum_{t=1}^{N} h(x^{(t)})}{N} - \pi(h) \right) \;\xrightarrow[N \to +\infty]{}\; \mathcal{N}\bigl(0, V_0[h - \pi(h)]\bigr)\,,$$

where M_N is defined by

$$\sum_{i=1}^{M_N} \xi^0_i \le N < \sum_{i=1}^{M_N + 1} \xi^0_i\,.$$

18 / 36
![Page 59: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/59.jpg)
Variance gain (1)

| h(x)   | x     | x²    | I_{X>0} | p(x)  |
|--------|-------|-------|---------|-------|
| τ = .1 | 0.971 | 0.953 | 0.957   | 0.207 |
| τ = 2  | 0.965 | 0.942 | 0.875   | 0.861 |
| τ = 5  | 0.913 | 0.982 | 0.785   | 0.826 |
| τ = 7  | 0.899 | 0.982 | 0.768   | 0.820 |

Ratios of the empirical variances of δ∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10³ replications of a random-walk Gaussian proposal with scale τ.
19 / 36
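The comparison above rests on the decomposition of the standard MH estimator δ as a sojourn-weighted average over the accepted values, with the Rao–Blackwellised versions replacing the random sojourn times by their (conditional) expectations. A minimal sketch of that decomposition, with function and variable names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

def rwmh(n_iter, tau):
    """Random-walk MH targeting N(0,1): return the accepted values z_i
    and their sojourn times n_i (a sketch; names are ours)."""
    x = 0.0
    zs, ns = [x], [1]
    for _ in range(n_iter):
        y = x + tau * rng.normal()
        # log acceptance ratio for the standard normal target
        if np.log(rng.random()) < 0.5 * (x * x - y * y):
            zs.append(y)
            ns.append(1)
            x = y
        else:
            ns[-1] += 1
    return np.array(zs), np.array(ns)

zs, ns = rwmh(10_000, 2.0)
# The usual ergodic average delta equals the sojourn-weighted average of
# the accepted values; the RB estimators replace the random n_i by
# expectations, which is where the variance reduction comes from.
delta = np.sum(ns * zs) / np.sum(ns)
```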
![Page 60: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/60.jpg)
Illustration (1)
Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ∞ (pink), in the setting of a random-walk Gaussian proposal with scale τ = 10.
20 / 36
![Page 61: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/61.jpg)
Extra computational effort
|         | median | mean | q.8 | q.9 | time |
|---------|--------|------|-----|-----|------|
| τ = .25 | 0.0    | 8.85 | 4.9 | 13  | 4.2  |
| τ = .50 | 0.0    | 6.76 | 4   | 11  | 2.25 |
| τ = 1.0 | 0.25   | 6.15 | 4   | 10  | 2.5  |
| τ = 2.0 | 0.20   | 5.90 | 3.5 | 8.5 | 4.5  |

Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10⁵ simulations.
21 / 36
![Page 62: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/62.jpg)
Illustration (2)
Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02. 22 / 36
![Page 63: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/63.jpg)
Outline
1 Metropolis Hastings revisited

2 Rao–Blackwellisation
   Formal importance sampling
   Variance reduction
   Asymptotic results
   Illustrations

3 Rao-Blackwellisation (2)
   Independent case
   General MH algorithms
23 / 36
![Page 64: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/64.jpg)
Integrating out white noise
In the Casella & X. (1996) paper, possible past and future histories are averaged out (by integrating out the uniforms) to improve the weights of the accepted values. The Rao–Blackwellised weight on the proposed value y_t is

$$\varphi^{(i)}_t = \delta_t \sum_{j=t}^{p} \xi_{tj}\,,$$

with

$$\delta_0 = 1\,, \qquad \delta_t = \sum_{j=0}^{t-1} \delta_j\, \xi_{j(t-1)}\, \rho_{jt}\,, \qquad \xi_{tt} = 1\,, \qquad \xi_{tj} = \prod_{u=t+1}^{j} (1 - \rho_{tu})\,,$$

the occurrence survivals of the y_t's, associated with the Metropolis–Hastings ratios

$$\omega_t = \pi(y_t)/\mu(y_t)\,, \qquad \rho_{tu} = \omega_u/\omega_t \wedge 1\,.$$
24 / 36
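The recursions above can be coded directly from the importance ratios ω_t. A sketch (function name ours; `omega[0]` is taken to correspond to the state in place at index t = 0):

```python
import numpy as np

def rb_weights(omega):
    """Rao-Blackwellised weights phi_t from the slide's recursions
    (sketch; omega[t] = pi(y_t)/mu(y_t) for t = 0, ..., p)."""
    p = len(omega) - 1
    # rho[t, u] = omega_u / omega_t  ∧  1
    rho = np.minimum(omega[None, :] / omega[:, None], 1.0)
    # xi[t, j] = prod_{u=t+1}^{j} (1 - rho[t, u]),  with xi[t, t] = 1
    xi = np.ones((p + 1, p + 1))
    for t in range(p + 1):
        for j in range(t + 1, p + 1):
            xi[t, j] = xi[t, j - 1] * (1.0 - rho[t, j])
    # delta_0 = 1,  delta_t = sum_{j < t} delta_j xi[j, t-1] rho[j, t]
    delta = np.zeros(p + 1)
    delta[0] = 1.0
    for t in range(1, p + 1):
        delta[t] = sum(delta[j] * xi[j, t - 1] * rho[j, t] for j in range(t))
    # phi_t = delta_t * sum_{j=t}^{p} xi[t, j]
    return np.array([delta[t] * xi[t, t:].sum() for t in range(p + 1)])
```

Sanity check of the recursions: when all ratios ω_t are equal, every ρ_{tu} = 1, the survivals ξ_{tj} vanish for j > t, and every weight reduces to 1.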
![Page 66: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/66.jpg)
Integrating out white noise
Potentially large variance improvement, but at a cost of O(T²)...

Possible recovery of efficiency thanks to parallelisation: moving from (ε_1, . . . , ε_p) towards (ε_(1), . . . , ε_(p)) by averaging over "all" possible orders.
25 / 36
![Page 69: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/69.jpg)
Case of the independent Metropolis–Hastings algorithm
Starting at time t with p processors and a pool of p proposed values (y_1, . . . , y_p), use the processors to examine p different "histories" in parallel.
26 / 36
![Page 71: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/71.jpg)
Improvement
The standard estimator τ₁ of E_π[h(X)],

$$\tau_1(x_t, y_{1:p}) = \frac{1}{p} \sum_{k=1}^{p} h(x_{t+k})\,,$$

is necessarily dominated by the average

$$\tau_2(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} n_k\, h(y_k)\,,$$

where y_0 = x_t and n_0 is the number of times x_t is repeated.
27 / 36
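A minimal sketch of τ₂: run several independent-MH "histories" over the same proposal pool, each visiting the proposals in a random order, and tally how often each value is occupied. Names and the fresh-uniform scheme are our assumptions; the slide's 1/p² normalisation corresponds to `p_hist = p`.

```python
import numpy as np

def tau2(h_vals, omega, p_hist, rng):
    """Average of p_hist permuted IMH histories sharing one proposal pool.
    h_vals[0] and omega[0] refer to the current state x_t (sketch)."""
    p = len(h_vals) - 1
    counts = np.zeros(p + 1)             # occupancy counts n_0, ..., n_p
    for _ in range(p_hist):
        order = rng.permutation(p) + 1   # visit y_1..y_p in a random order
        cur = 0                           # each history restarts at x_t
        for k in order:
            # independent MH acceptance based on importance ratios omega
            if rng.random() < min(omega[k] / omega[cur], 1.0):
                cur = k
            counts[cur] += 1
    return counts @ h_vals / (p_hist * p), counts

rng = np.random.default_rng(1)
h_vals = np.array([0.0, 1.0, -1.0, 2.0])   # h evaluated at x_t, y_1, y_2, y_3
omega = np.array([1.0, 0.5, 2.0, 1.0])     # importance ratios (hypothetical)
est, counts = tau2(h_vals, omega, p_hist=3, rng=rng)
```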
![Page 72: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/72.jpg)
Further Rao-Blackwellisation
E.g., use of the Metropolis–Hastings weights w_j: with j the index such that x_{t+i−1} = y_j, the weights are updated at each time t + i as

$$w_j = w_j + 1 - \rho(x_{t+i-1}, y_i)\,, \qquad w_i = w_i + \rho(x_{t+i-1}, y_i)\,,$$

resulting in the more stable estimator

$$\tau_3(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} w_k\, h(y_k)\,.$$

E.g., Casella & X. (1996):

$$\tau_4(x_t, y_{1:p}) = \frac{1}{p^2} \sum_{k=0}^{p} \varphi_k\, h(y_k)\,.$$
28 / 36
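The w-update above replaces the binary occupancy count of one step by its conditional expectation: the pair (current state, proposal) receives (1 − ρ, ρ) instead of (1, 0) or (0, 1). A sketch for a single history (function signature and the explicit `order`/`uniforms` arguments are ours):

```python
import numpy as np

def rb_occupancy_weights(omega, order, uniforms):
    """One IMH history with the slide's Rao-Blackwellised w-update.
    omega[0] is the ratio of the current state x_t (sketch)."""
    w = np.zeros(len(omega))
    cur = 0
    for i, u in zip(order, uniforms):
        rho = min(omega[i] / omega[cur], 1.0)
        w[cur] += 1.0 - rho   # expected part of the step spent staying put
        w[i] += rho           # expected part of the step spent at the proposal
        if u < rho:           # the chain itself still moves with the same uniform
            cur = i
    return w

# Two steps: move to y_1 (rho = 1), then reject y_2 (rho = 0.25, u = 0.9)
w = rb_occupancy_weights(np.array([1.0, 2.0, 0.5]), [1, 2], [0.3, 0.9])
# w is [0.0, 1.75, 0.25]; the weights always sum to the number of steps
```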
![Page 74: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/74.jpg)
Markovian continuity
The Markov validity of the chain is not jeopardised! The chain continues by picking one sequence at random and taking the corresponding x^{(j)}_{t+p} as the starting point of the next parallel block.
29 / 36
![Page 76: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/76.jpg)
Impact of Rao-Blackwellisations
Comparison of
τ₁ the basic IMH estimator of E_π[h(X)];

τ₂ improving upon τ₁ by averaging over permutations of the proposed values, using p times more uniforms;

τ₃ improving upon τ₂ by a basic Rao–Blackwell argument;

τ₄ improving upon τ₂ by integrating out the ancillary uniforms, at a cost of O(p²).
30 / 36
![Page 77: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/77.jpg)
Illustration
Variations of estimates based on the RB and standard versions of parallel chains, and on a standard MCMC chain, for the mean and variance of the target N(0,1) distribution (based on 10,000 independent replicas).
31 / 36
![Page 81: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/81.jpg)
Impact of the order
Parallelisation allows for the partial integration of the uniforms.

What about the permutation order? Comparison of

τ₂N with no permutation,

τ₂C with circular permutations,

τ₂R with random permutations,

τ₂H with half-random permutations,

τ₂S with stratified permutations.
32 / 36
![Page 86: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/86.jpg)
Importance target
Comparison with the ultimate importance sampling
33 / 36
![Page 90: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/90.jpg)
Extension to the general case
The same principle can be applied to any Markov update: if

$$x_{t+1} = \Psi(x_t, \epsilon_t)\,,$$

then generate (ε_1, . . . , ε_p) in advance and distribute them to the p processors in different permutation orders. Plus, use of Douc & X's (2011) Rao–Blackwellisation weights ξ^k_i.
34 / 36
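A sketch of the noise-recycling scheme: draw the block of noise once, then let each of the p processors consume it in its own random order through the generic update Ψ (function names and the toy additive update are our assumptions, not the talk's implementation):

```python
import numpy as np

def parallel_block(x_t, eps, psi, rng):
    """Run p copies of the update x_{t+1} = Psi(x_t, eps_t), each consuming
    the pre-generated noise pool eps in a different random order (sketch)."""
    p = len(eps)
    paths = np.empty((p, p))
    for j in range(p):                         # one row per "processor"
        x = x_t
        for i, k in enumerate(rng.permutation(p)):
            x = psi(x, eps[k])                 # same noise, different order
            paths[j, i] = x
    return paths

rng = np.random.default_rng(2)
eps = rng.normal(size=8)                       # the shared noise pool
paths = parallel_block(1.5, eps, lambda x, e: x + e, rng)
# For this additive toy update the endpoint is order-invariant:
# every path ends at x_t + sum(eps).
```

The block average over all p² entries plays the role of τ₂, and the next block starts from a `paths[j, -1]` picked at random, which preserves the Markov validity discussed above.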
![Page 92: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/92.jpg)
Implementation
Similar run of p parallel chains (x^{(j)}_{t+i}), use of the averages

$$\tau_2\bigl(x^{(1:p)}_{1:p}\bigr) = \frac{1}{p^2} \sum_{k=1}^{p} \sum_{j=1}^{p} h\bigl(x^{(j)}_{t+k}\bigr)\,,$$

and selection of a new starting value at random at time t + p.
35 / 36
![Page 94: Talk in Telecom-Paris, Nov. 15, 2011](https://reader033.fdocuments.us/reader033/viewer/2022052907/559139851a28ab14498b4798/html5/thumbnails/94.jpg)
Illustration
Variations of estimates based on the RB and standard versions of parallel chains, and on a standard MCMC chain, for the mean and variance of the target distribution (based on p = 64 parallel processors, 50 blocks of p MCMC steps and 500 independent replicas).

[Boxplots of the RB, parallel ("par") and original ("org") estimators: mean estimates on the left (scale −0.10 to 0.10), variance estimates on the right (scale 0.9 to 1.3).]
36 / 36