Transcript of Lecture 16: Sampling (dfg/ProbabilisticInference/IDAPISlides16.pdf)
Intelligent Data Analysis and Probabilistic Inference
Lecture 16: Sampling
Recommended reading: Bishop, Chapter 11; MacKay, Chapter 29
Duncan Gillies and Marc Deisenroth
Department of Computing, Imperial College London
February 22–24, 2016
Monte Carlo Methods—Motivation

- Monte Carlo methods are computational techniques that make use of random numbers.
- Two typical problems:
  1. Problem 1: generate samples $\{x^{(s)}\}$ from a given probability distribution $p(x)$.
  2. Problem 2: estimate expectations of functions under that distribution:
     $$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x$$
     Examples: means/variances of distributions, the marginal likelihood.
     Complication: the integral cannot be evaluated analytically.
Monte Carlo Estimation

- Statistical sampling can be applied to compute expectations:
  $$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x \approx \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim p(x)$$
- Example: making predictions (e.g., Bayesian linear regression with a training set $\mathcal{D} = \{X, y\}$ at a test input $x_*$):
  $$p(y_* \mid x_*, \mathcal{D}) = \int p(y_* \mid \theta, x_*)\,p(\theta \mid \mathcal{D})\,\mathrm{d}\theta \approx \frac{1}{S}\sum_{s=1}^{S} p(y_* \mid \theta^{(s)}, x_*), \qquad \theta^{(s)} \sim p(\theta \mid \mathcal{D})$$
- If we can sample from $p(x)$ (or $p(\theta \mid \mathcal{D})$) we can approximate these integrals.
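The estimator above is easy to exercise in code. Here is a minimal sketch (not from the slides), assuming the toy choices $p(x) = \mathcal{N}(0, 1)$ and $f(x) = x^2$, whose true expectation is 1:

```python
import numpy as np

rng = np.random.default_rng(0)

S = 100_000
x = rng.standard_normal(S)   # x^(s) ~ p(x) = N(0, 1)
estimate = np.mean(x**2)     # (1/S) * sum_s f(x^(s))
print(estimate)              # close to the true value E[x^2] = 1
```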
Properties of Monte Carlo Sampling

$$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x \approx \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim p(x)$$

- The estimator is unbiased.
- The variance shrinks $\propto 1/S$, regardless of the dimensionality of $x$.
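A small numerical check of the $1/S$ claim, again under the assumed toy setting $p = \mathcal{N}(0,1)$, $f(x) = x^2$: the standard deviation of the estimator across repeated runs should drop by about $\sqrt{10}$ each time $S$ grows tenfold.

```python
import numpy as np

rng = np.random.default_rng(1)

# 500 independent estimators of E[x^2] = 1 for each sample size S
for S in [100, 1_000, 10_000]:
    estimates = np.mean(rng.standard_normal((500, S)) ** 2, axis=1)
    print(S, estimates.std())  # ~ sqrt(2/S): drops ~3.2x per 10x more samples
```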
Alternatives to Monte Carlo

$$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x$$

To evaluate these expectations we can use methods other than Monte Carlo:

- Numerical integration (low-dimensional problems)
- Deterministic approximations, e.g., Variational Bayes, Expectation Propagation
Back to Monte Carlo Estimation

$$\mathbb{E}[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x \approx \frac{1}{S}\sum_{s=1}^{S} f(x^{(s)}), \qquad x^{(s)} \sim p(x)$$

- How do we get these samples? We need to solve Problem 1.
- Sampling from simple distributions
- Sampling from complicated distributions
Important Example

- By specifying the model, we know the prior $p(\theta)$ and the likelihood $p(\mathcal{D} \mid \theta)$.
- The unnormalized posterior is
  $$p(\theta \mid \mathcal{D}) \propto p(\mathcal{D} \mid \theta)\,p(\theta),$$
  and there is often no hope of computing the normalization constant.
- Samples are a good way to characterize this posterior (important for model comparison, Bayesian predictions, ...).
Sampling Discrete Values

[Figure: the unit interval partitioned into segments of length $p(a) = 0.3$, $p(b) = 0.2$, $p(c) = 0.5$; the draw $u = 0.55$ falls in the segment for $c$.]

- Draw $u \sim \mathcal{U}[0, 1]$, where $\mathcal{U}$ is the uniform distribution.
- $u = 0.55 \Rightarrow x = c$
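A minimal sketch of this scheme (inverse-CDF sampling over a discrete set); the values a, b, c and their probabilities come from the figure, everything else is illustrative:

```python
import numpy as np

values = ["a", "b", "c"]
probs = np.array([0.3, 0.2, 0.5])
cdf = np.cumsum(probs)               # [0.3, 0.5, 1.0] partitions [0, 1]

def sample_discrete(rng):
    u = rng.uniform()                # u ~ U[0, 1]
    return values[int(np.searchsorted(cdf, u, side="right"))]

rng = np.random.default_rng(2)
print(sample_discrete(rng))          # "c" whenever u lands in [0.5, 1.0)
```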
Continuous Variables

[Figure: an unnormalized density $\tilde{p}(x)$ over $x$.]

More complicated. Geometrically: sample uniformly from the area under the curve.
Rejection Sampling
Rejection Sampling: Setting

[Figure: an unnormalized density $\tilde{p}(x)$ over $x$.]

- Assume sampling from $p(z)$ is difficult.
- Evaluating $\tilde{p}(z) = Z\,p(z)$ is easy (and $Z$ may be unknown).
- Find a simpler distribution (proposal distribution) $q(z)$ from which we can easily draw samples (e.g., a Gaussian).
- Find an upper bound $k\,q(z) \ge \tilde{p}(z)$.
Algorithm

[Figure: the target $\tilde{p}(z)$ under the envelope $k\,q(z)$; a proposal $z_0$ with $u_0$ drawn up to $k\,q(z_0)$, showing the acceptance and rejection areas. Adapted from PRML (Bishop, 2006).]

1. Generate $z_0 \sim q(z)$.
2. Generate $u_0 \sim \mathcal{U}[0, k\,q(z_0)]$.
3. If $u_0 > \tilde{p}(z_0)$, reject the sample. Otherwise, retain $z_0$.
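A minimal sketch of the three steps above, under assumed toy choices: target $\tilde{p}(z) = \exp(-z^2/2)$ (an unnormalized $\mathcal{N}(0,1)$), proposal $q = \mathcal{N}(0, 2^2)$, and $k$ picked so that $k\,q(z) \ge \tilde{p}(z)$ everywhere.

```python
import numpy as np

rng = np.random.default_rng(3)

p_tilde = lambda z: np.exp(-0.5 * z**2)     # unnormalized target (Z unknown)
sigma_q = 2.0                               # proposal q(z) = N(0, sigma_q^2)
q = lambda z: np.exp(-0.5 * (z / sigma_q) ** 2) / (sigma_q * np.sqrt(2 * np.pi))
k = sigma_q * np.sqrt(2 * np.pi)            # k*q(z) = exp(-z^2/8) >= p_tilde(z)

samples = []
while len(samples) < 10_000:
    z0 = rng.normal(0.0, sigma_q)           # 1. z0 ~ q(z)
    u0 = rng.uniform(0.0, k * q(z0))        # 2. u0 ~ U[0, k q(z0)]
    if u0 <= p_tilde(z0):                   # 3. reject if u0 > p_tilde(z0)
        samples.append(z0)

print(np.mean(samples), np.std(samples))    # both close to N(0, 1): 0 and 1
```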
Properties

[Figure: as above; adapted from PRML (Bishop, 2006).]

- Accepted pairs $(z, u)$ are uniformly distributed under the curve of $\tilde{p}(z)$.
- The probability density of the $z$-coordinates of accepted points must therefore be proportional to $\tilde{p}(z)$.
- The samples are independent samples from $p(z)$.
Shortcomings

[Figure: as above; adapted from PRML (Bishop, 2006).]

- Finding $k$ is tricky.
- In high dimensions the factor $k$ is probably huge.
- Low acceptance rate
Importance Sampling
Importance Sampling

Key idea: Do not throw away all rejected samples, but give them lower weight, by rewriting the integral as an expectation under a simpler distribution $q$ (the proposal distribution):

$$\mathbb{E}_p[f(x)] = \int f(x)\,p(x)\,\mathrm{d}x = \int f(x)\,\frac{p(x)}{q(x)}\,q(x)\,\mathrm{d}x = \mathbb{E}_q\!\left[\frac{f(x)\,p(x)}{q(x)}\right]$$

If we choose $q$ in a way that we can easily sample from it, we can approximate this last expectation by Monte Carlo:

$$\mathbb{E}_q\!\left[\frac{f(x)\,p(x)}{q(x)}\right] \approx \frac{1}{S}\sum_{s=1}^{S} \frac{p(x^{(s)})}{q(x^{(s)})}\,f(x^{(s)}) = \frac{1}{S}\sum_{s=1}^{S} w_s\,f(x^{(s)}), \qquad x^{(s)} \sim q(x)$$
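A minimal sketch of this estimator under assumed toy choices (target $p = \mathcal{N}(0,1)$, proposal $q = \mathcal{N}(0, 2^2)$, $f(x) = x^2$ with true expectation 1); the weights $w_s = p(x^{(s)})/q(x^{(s)})$ are computed in log space for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(4)

log_p = lambda x: -0.5 * x**2 - 0.5 * np.log(2 * np.pi)          # target N(0, 1)
sigma_q = 2.0
log_q = lambda x: (-0.5 * (x / sigma_q) ** 2
                   - np.log(sigma_q) - 0.5 * np.log(2 * np.pi))  # proposal N(0, 4)

S = 100_000
x = rng.normal(0.0, sigma_q, size=S)     # x^(s) ~ q(x)
w = np.exp(log_p(x) - log_q(x))          # importance weights w_s = p/q
print(np.mean(w * x**2))                 # close to E_p[x^2] = 1
```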
Properties

- Unbiased if $q > 0$ wherever $p > 0$, and if we can evaluate $p$.
- Breaks down if we do not have enough samples (nearly all weight is placed on a single sample): degeneracy (see also particle filtering).
- Many draws from the proposal density $q$ are required, especially in high dimensions.
- Requires that we can evaluate the true $p$. A generalization exists for $\tilde{p}$; this generalization is biased (but consistent).
- Does not scale to interesting problems.

We need a different approach to sample from complicated (high-dimensional) distributions.
Markov Chains
Objective
Generate samples from an unknown target distribution.
Markov Chains

Key idea: Instead of independent samples, use a proposal density $q$ that depends on the current state $x^{(t)}$.

- Markov property: $p(x^{(t+1)} \mid x^{(1)}, \ldots, x^{(t)}) = T(x^{(t+1)} \mid x^{(t)})$ depends only on the previous setting/state of the chain.
- $T$ is called a transition operator.
- Example: $T(x^{(t+1)} \mid x^{(t)}) = \mathcal{N}\big(x^{(t+1)} \mid x^{(t)}, \sigma^2 I\big)$
- The samples $x^{(1)}, \ldots, x^{(t)}$ form a Markov chain.
- The samples $x^{(1)}, \ldots, x^{(t)}$ are no longer independent, but unbiased: we can still average them.
Behavior of Markov Chains

From Iain Murray's MCMC Tutorial

Four different behaviors of Markov chains:

- Diverge (e.g., random-walk diffusion, where $x^{(t+1)} \sim \mathcal{N}\big(x^{(t)}, I\big)$)
- Converge to an absorbing state
- Converge to a (deterministic) limit cycle
- Converge to an equilibrium distribution $p^*$: the Markov chain remains in a region, bouncing around in a random way
Converging to an Equilibrium Distribution

- Remember the objective: explore/sample parameters that may have generated our data (generate samples from the posterior). Bouncing around in an equilibrium distribution is a good thing.
- Design the Markov chain such that the equilibrium distribution is the desired posterior.
- We know the equilibrium distribution (the one we want to sample from): generate a Markov chain that converges to that equilibrium distribution (independent of the start state).
- Although successive samples are dependent, we can effectively generate independent samples by running the Markov chain long enough: discard most of the samples, retain only every $M$th sample.
Conditions for Converging to an Equilibrium Distribution

Markov chain conditions:

- Invariance/stationarity: If you run the chain for a long time and you are in the equilibrium distribution, you stay in equilibrium if you take another step. A self-consistency property.
- Ergodicity: Any state can be reached from any state. The equilibrium distribution is the same no matter where we start.

Property: Ergodic Markov chains only have one equilibrium distribution.

Use ergodic and stationary Markov chains to generate samples from the equilibrium distribution.
Invariance and Detailed Balance

- Invariance: Each step leaves the distribution $p^*$ invariant (we stay in $p^*$):
  $$p^*(x') = \sum_{x} T(x' \mid x)\,p^*(x)$$
  Once we sample from $p^*$, the transition operator will not change this, i.e., we do not fall back to some funny distribution $p \neq p^*$.
- A sufficient condition for $p^*$ being invariant is detailed balance:
  $$p^*(x)\,T(x' \mid x) = p^*(x')\,T(x \mid x')$$
  Detailed balance also ensures that the Markov chain is reversible.
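A small numerical illustration of these two properties on a hypothetical 3-state chain (the chain and $p^*$ are assumptions for illustration, not from the slides): a Metropolis-style transition matrix built from $p^*$ satisfies detailed balance, and $p^*$ is then invariant under $T$.

```python
import numpy as np

p_star = np.array([0.2, 0.3, 0.5])       # assumed equilibrium distribution

# Metropolis-style transitions on {0, 1, 2}: propose one of the other two
# states uniformly, accept with probability min(1, p*(new)/p*(old)).
T = np.zeros((3, 3))                     # T[y, x] = T(y | x)
for x in range(3):
    for y in range(3):
        if y != x:
            T[y, x] = 0.5 * min(1.0, p_star[y] / p_star[x])
    T[x, x] = 1.0 - T[:, x].sum()        # remaining mass: stay at x

balance = T * p_star[None, :]            # entry (y, x) is p*(x) T(y|x)
print(np.allclose(balance, balance.T))   # detailed balance holds -> True
print(np.allclose(T @ p_star, p_star))   # hence p* is invariant -> True
```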
Metropolis-Hastings
Metropolis-Hastings

- Assume that $\tilde{p} = Z\,p$ can be evaluated easily (in practice: $\log \tilde{p}$).
- The proposal density $q(x' \mid x^{(t)})$ depends on the last sample $x^{(t)}$. Example: a Gaussian centered at $x^{(t)}$.

Metropolis-Hastings Algorithm

1. Generate $x' \sim q(x' \mid x^{(t)})$.
2. If
   $$\frac{q(x^{(t)} \mid x')\,\tilde{p}(x')}{q(x' \mid x^{(t)})\,\tilde{p}(x^{(t)})} \ge u, \qquad u \sim \mathcal{U}[0, 1],$$
   accept the sample: $x^{(t+1)} = x'$. Otherwise set $x^{(t+1)} = x^{(t)}$.

- If the proposal distribution is symmetric, this is the Metropolis algorithm (Metropolis et al., 1953); otherwise it is the Metropolis-Hastings algorithm (Hastings, 1970).
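A minimal sketch of the algorithm, assuming a symmetric Gaussian random-walk proposal (so the $q$-ratio cancels and this reduces to the plain Metropolis algorithm) and the toy target $\log \tilde{p}(x) = -x^2/2$ used in the step-size demo below:

```python
import numpy as np

def metropolis(log_p_tilde, x0, sigma, n_steps, rng):
    """Random-walk Metropolis with proposal N(x' | x, sigma^2)."""
    x, chain = x0, []
    for _ in range(n_steps):
        x_new = rng.normal(x, sigma)              # 1. x' ~ q(x' | x^(t))
        log_ratio = log_p_tilde(x_new) - log_p_tilde(x)
        if log_ratio >= 0 or rng.uniform() < np.exp(log_ratio):
            x = x_new                             # 2. accept: x^(t+1) = x'
        chain.append(x)                           #    reject: keep x^(t)
    return np.array(chain)

rng = np.random.default_rng(5)
chain = metropolis(lambda x: -0.5 * x**2, 0.0, 1.0, 50_000, rng)
print(chain.mean(), chain.std())                  # close to 0 and 1
```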
Example

[Figure from Iain Murray's MCMC Tutorial.]
Step-Size Demo

- Explore $p(x) = \mathcal{N}(x \mid 0, 1)$ for different step sizes $\sigma$.
- We can only evaluate $\log \tilde{p}(x) = -x^2/2$.
- Proposal distribution $q$: a Gaussian $\mathcal{N}(x^{(t+1)} \mid x^{(t)}, \sigma^2)$ centered at the current state, for various step sizes $\sigma$.
- Expect to explore the space between $-2$ and $2$.
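A sketch reproducing the spirit of this demo: run the random-walk sampler on $\log \tilde{p}(x) = -x^2/2$ for several assumed step sizes and record the acceptance rate.

```python
import numpy as np

log_p_tilde = lambda x: -0.5 * x**2      # we can only evaluate log p_tilde
rng = np.random.default_rng(6)

for sigma in [0.1, 1.0, 10.0, 100.0]:
    x, accepted = 0.0, 0
    for _ in range(20_000):
        x_new = rng.normal(x, sigma)
        log_ratio = log_p_tilde(x_new) - log_p_tilde(x)
        if log_ratio >= 0 or rng.uniform() < np.exp(log_ratio):
            x, accepted = x_new, accepted + 1
    print(f"sigma = {sigma:6.1f}   acceptance rate = {accepted / 20_000:.2f}")
# Tiny steps accept almost everything but crawl across [-2, 2];
# huge steps are almost always rejected, so the chain barely moves.
```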
Step-Size Demo: Discussion

- The acceptance rate depends on the step size of the proposal distribution: an exploration parameter.
- If we do not reject enough, the method does not work.
- In rejection sampling we do not like rejections, but in MH rejections tell you where the target distribution is.
- Theoretical results: about 44% acceptance in 1D, and about 25% in higher dimensions, gives good mixing properties.
- Tune the step size.
Properties

- Unlike rejection sampling, the previous sample is used to re-set the chain if a proposed sample is discarded.
- If $q > 0$, we will end up in the equilibrium distribution: $p^{(t)}(x) \xrightarrow{t \to \infty} p^*(x)$.
- Explores the state space by a random walk, which may take a while in high dimensions.
- No further catastrophic problems in high dimensions.
Gibbs Sampling
Gibbs Sampling

- Assumption: $p(x)$ is too complicated to draw samples from directly, but its conditionals $p(x_i \mid x_{\setminus i})$ are tractable to work with.
- Example:
  $$y_i \sim \mathcal{N}(\mu, \tau^{-1}), \qquad \mu \sim \mathcal{N}(0, 1), \qquad \tau \sim \mathrm{Gamma}(2, 1)$$
  Then
  $$p(y, \mu, \tau) = \prod_{i=1}^{n} p(y_i \mid \mu, \tau)\,p(\mu)\,p(\tau) \propto \tau^{n/2} \exp\!\Big(-\frac{\tau}{2}\sum_i (y_i - \mu)^2\Big) \exp\!\big(-\tfrac{1}{2}\mu^2\big)\,\tau \exp(-\tau)$$
  $$p(\mu \mid \tau, y) = \mathcal{N}\Big(\frac{\tau \sum_i y_i}{1 + n\tau},\ (1 + n\tau)^{-1}\Big), \qquad p(\tau \mid \mu, y) = \mathrm{Gamma}\Big(2 + \frac{n}{2},\ 1 + \frac{1}{2}\sum_i (y_i - \mu)^2\Big)$$
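A minimal Gibbs sampler for this example, alternating draws from the two conditionals above; the synthetic data set (true mean 1.5, true standard deviation 0.5), the seed, and the run lengths are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
y = rng.normal(1.5, 0.5, size=50)            # synthetic data (assumed)
n, sum_y = len(y), y.sum()

mu, tau = 0.0, 1.0                           # arbitrary initialization
samples = []
for _ in range(5_000):
    # mu | tau, y ~ N(tau * sum(y) / (1 + n*tau), (1 + n*tau)^(-1))
    prec = 1.0 + n * tau
    mu = rng.normal(tau * sum_y / prec, np.sqrt(1.0 / prec))
    # tau | mu, y ~ Gamma(2 + n/2, 1 + 0.5 * sum((y_i - mu)^2)), rate form
    rate = 1.0 + 0.5 * np.sum((y - mu) ** 2)
    tau = rng.gamma(shape=2.0 + n / 2.0, scale=1.0 / rate)
    samples.append((mu, tau))

mu_s, tau_s = np.array(samples[1_000:]).T    # discard burn-in
print(mu_s.mean(), (1.0 / np.sqrt(tau_s)).mean())  # near 1.5 and 0.5
```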
Algorithm

[Figure: Gibbs sampling for two correlated variables $z_1$, $z_2$: axis-aligned conditional updates of width $l$ explore a distribution with length scale $L$. From PRML (Bishop, 2006).]

Assuming $n$ parameters $x_1, \ldots, x_n$, Gibbs sampling samples individual variables conditioned on all others:

1. $x_1^{(t+1)} \sim p(x_1 \mid x_2^{(t)}, \ldots, x_n^{(t)})$
2. $x_2^{(t+1)} \sim p(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_n^{(t)})$
3. $\vdots$
4. $x_n^{(t+1)} \sim p(x_n \mid x_1^{(t+1)}, \ldots, x_{n-1}^{(t+1)})$
Gibbs Sampling: Ergodicity

- $p(x)$ is invariant.
- Ergodicity: It is sufficient to show that all conditionals are greater than 0. Then any point in $x$-space can be reached from any other point (potentially with low probability) in a finite number of steps involving one update of each of the component variables.
Properties

- Gibbs sampling is Metropolis-Hastings with acceptance probability 1: the sequence of proposal distributions $q$ is defined in terms of the conditional distributions of the joint $p(x)$.
- Converges to the equilibrium distribution: $p^{(t)}(x) \xrightarrow{t \to \infty} p(x)$.
- Exploration by random-walk behavior can be slow.
- No adjustable parameters (e.g., step size).
- Applicability depends on how easy it is to draw samples from the conditionals.
- May not work well if the variables are correlated.
- Statistical software derives the conditionals of the model and works out how to do the updates: STAN (http://mc-stan.org/), WinBUGS (http://www.mrc-bsu.cam.ac.uk/software/bugs/), JAGS (http://mcmc-jags.sourceforge.net/).
Slice Sampling
Key Idea behind Slice Sampling

[Figure: an unnormalized density $\tilde{p}(x)$ over $x$.]

- Idea: Sample a point (random walk) uniformly under the curve $\tilde{p}(x)$.
- Introduce an additional variable $u$ and define the joint $\hat{p}(x, u)$:
  $$\hat{p}(x, u) = \begin{cases} 1/Z_p & \text{if } 0 \le u \le \tilde{p}(x) \\ 0 & \text{otherwise} \end{cases}, \qquad Z_p = \int \tilde{p}(x)\,\mathrm{d}x$$
- The marginal distribution over $x$ is then
  $$\int \hat{p}(x, u)\,\mathrm{d}u = \int_0^{\tilde{p}(x)} \frac{1}{Z_p}\,\mathrm{d}u = \frac{\tilde{p}(x)}{Z_p} = p(x)$$
  Obtain samples from the unknown $p(x)$ by sampling from $\hat{p}(x, u)$ and then ignoring the $u$ values.
- As in Gibbs sampling: update one variable at a time ($u$ given $x$, then $x$ given $u$).
Slice Sampling Algorithm

[Figure: left, drawing $u$ under $\tilde{p}(x^{(t)})$; right, sampling $x^{(t+1)}$ from the slice, in practice from an interval $[x_{\min}, x_{\max}]$ around $x^{(t)}$. Adapted from PRML (Bishop, 2006).]

Repeat the following steps:

1. Draw $u \mid x^{(t)} \sim \mathcal{U}\big[0, \tilde{p}(x^{(t)})\big]$
2. Draw $x^{(t+1)} \mid u \sim \mathcal{U}\big[\{x : \tilde{p}(x) > u\}\big]$ (the "slice")

- In practice, we sample $x^{(t+1)} \mid u$ uniformly from an interval $[x_{\min}, x_{\max}]$ around $x^{(t)}$.
- The interval is found adaptively (see Neal (2003) for details).
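A minimal sketch of this algorithm with a simple stepping-out/shrinkage search for $[x_{\min}, x_{\max}]$ (in the spirit of Neal (2003), not his exact procedure), for the assumed toy target $\tilde{p}(x) = \exp(-x^2/2)$:

```python
import numpy as np

def slice_sample_step(p_tilde, x, w, rng):
    u = rng.uniform(0.0, p_tilde(x))          # 1. u | x^(t) ~ U[0, p_tilde(x^(t))]
    # Step out: place a window of width w around x, then grow it until
    # both endpoints leave the slice {x : p_tilde(x) > u}
    x_min = x - w * rng.uniform()
    x_max = x_min + w
    while p_tilde(x_min) > u:
        x_min -= w
    while p_tilde(x_max) > u:
        x_max += w
    # 2. Draw uniformly from [x_min, x_max], shrinking on rejections
    while True:
        x_new = rng.uniform(x_min, x_max)
        if p_tilde(x_new) > u:
            return x_new                      # x_new lies in the slice
        if x_new < x:
            x_min = x_new                     # shrink toward x^(t)
        else:
            x_max = x_new

rng = np.random.default_rng(8)
p_tilde = lambda x: np.exp(-0.5 * x**2)
x, chain = 0.0, []
for _ in range(20_000):
    x = slice_sample_step(p_tilde, x, w=1.0, rng=rng)
    chain.append(x)
print(np.mean(chain), np.std(chain))          # close to 0 and 1
```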
Relation to other Sampling Methods

Slice sampling is similar to:

- Metropolis: we just need to be able to evaluate $\tilde{p}(x)$, but slice sampling is more robust to the choice of parameters (e.g., the step size is automatically adapted).
- Gibbs: 1-dimensional transitions in state space, but it is no longer required that we can easily sample from the 1-D conditionals.
- Rejection: asymptotically draws samples from the volume under the curve described by $\tilde{p}$, but no upper bounding of $\tilde{p}$ is required.
Properties

- Slice sampling can be applied to multivariate distributions by repeatedly sampling each variable in turn (similar to Gibbs sampling); see Neal (2003) and Murray et al. (2010) for more details. This requires computing a function that is proportional to $p(x_i \mid x_{\setminus i})$ for all variables $x_i$.
- No rejections
- Adaptive step sizes
- Easy to implement
- Broadly applicable
Discussion MCMC

- Asymptotic guarantee of converging to the equilibrium distribution for any kind of model
- General-purpose method to draw samples in any kind of probabilistic model (Probabilistic Programming)
- Convergence is difficult to assess
- Long chains required in high dimensions
- Choice of proposal distribution is hard
- Need to store all samples (subsequent computations have to work with these samples)
References

[1] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.

[2] D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, UK, 2003.