Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
Introduction to Sequential Monte Carlo Methods
Importance Sampling for Nonlinear Non-Gaussian Dynamic Models
Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
101 Frank H. T. Rhodes Hall
Cornell University, Ithaca, NY 14853-3801
Email: [email protected]
URL: http://mpdc.mae.cornell.edu/
April 28, 2013
Contents

- References and SMC Resources
- INTRODUCING THE STATE SPACE MODEL: Discrete-Time Markov Models, Tracking Problem, Speech Enhancement, the Dynamics of an Asset, the State Space Model with Observations, Target Distribution and Point Estimates, Particle Motion in a Random Medium
- BAYESIAN INFERENCE IN STATE SPACE MODELS: Bayesian Recursion for the State-Space Model, Forward Filtering, Forward-Backward Filter
- ONLINE BAYESIAN PARAMETER ESTIMATION: Online Parameter Estimation, MLE, EM, Need for a Numerical Approach to HMM
- IMPORTANCE SAMPLING: Review of IS, Monte Carlo for the State Space Example, IS for the State Space Model, Optimal Importance Distribution, Effective Sample Size, Sequential IS
References

C.P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, Chapter 11.
J.S. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, New York, Chapter 3.
A. Doucet, N. de Freitas and N. Gordon (eds), Sequential Monte Carlo Methods in Practice, Springer-Verlag, 2001.
A. Doucet, N. de Freitas and N.J. Gordon, An Introduction to Sequential Monte Carlo, in Sequential Monte Carlo Methods in Practice, 2001.
D. Wilkinson, Stochastic Modelling for Systems Biology, Second Edition, 2006.
E. Ionides, Inference for Nonlinear Dynamical Systems, PNAS, 2006.
J.S. Liu and R. Chen, Sequential Monte Carlo Methods for Dynamic Systems, JASA, 1998.
A. Doucet, Sequential Monte Carlo Methods, Short Course at SAMSI.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters Resources.
P. Del Moral, Feynman-Kac Models and Interacting Particle Systems (SMC resources).
A. Doucet, Sequential Monte Carlo Methods, Video Lectures, 2007.
N. de Freitas and A. Doucet, Sequential MC Methods, Video Lectures, 2010.
References

M.K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filters, JASA, 1999.
A. Doucet, S.J. Godsill and C. Andrieu, On Sequential Monte Carlo Sampling Methods for Bayesian Filtering, Statistics and Computing, 2000.
J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE Proceedings, 1999.
A. Kong, J.S. Liu and W.H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994.
O. Cappe, E. Moulines and T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005.
W. Gilks and C. Berzuini, Following a Moving Target: Monte Carlo Inference for Dynamic Bayesian Models, JRSS B, 2001.
G. Poyadjis, A. Doucet and S.S. Singh, Maximum Likelihood Parameter Estimation Using Particle Methods, Joint Statistical Meeting, 2005.
N. Gordon, D.J. Salmond and A.F.M. Smith, Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation, IEE Proceedings, 1993.
S. Godsill, Particle Filters, 2009 (Video Lectures).
R. Chen and J.S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996.
References

C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002.
R. Chen and J.S. Liu, Mixture Kalman Filters, JRSS B, 2000.
A. Doucet, S.J. Godsill and C. Andrieu, On SMC Sampling Methods for Bayesian Filtering, Statistics and Computing, 2000.
N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models, Proceedings IFAC System Identification (SysId) Meeting, 2009.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov Chain Monte Carlo Methods, JRSS B, 2010.
C. Andrieu, N. de Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, Sufficient Statistics and Particle Filters, JCGS, 2002.
G. Storvik, Particle Filters for State-Space Models with the Presence of Unknown Static Parameters, IEEE Trans. Signal Processing, 2002.
References

C. Andrieu, A. Doucet and V.B. Tadic, Online EM for Parameter Estimation in Nonlinear/Non-Gaussian State-Space Models, Proc. IEEE CDC, 2005.
G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011.
F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical Report, 2008.
R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov Chain Monte Carlo Methods (with discussion), JRSS B, 2010.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters: List of Papers, Codes, and Video Lectures on SMC and Particle Filters.
P. Del Moral, Feynman-Kac Models and Interacting Particle Systems.
References

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo Samplers, JRSS B, 2006.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
P. Del Moral, A. Doucet and S.S. Singh, Forward Smoothing Using Sequential Monte Carlo, Technical Report, Cambridge University, 2009.
P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
M. Davy, Sequential MC Methods, 2007.
A. Doucet and A. Johansen, Particle Filtering and Smoothing: Fifteen Years Later, in Handbook of Nonlinear Filtering (eds D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Statistics and Probability Letters, 2008.
A. Doucet, M. Briers and S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006.
F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Statistics and Computing, 2011.
Introduction

Sequential Monte Carlo (SMC) methods are used to approximate any sequence of probability distributions.

They are often used in physics to:
- compute eigenvalues of positive operators,
- compute free energies,
- solve differential or integral equations,
- simulate polymer chains, etc.

Hidden Markov Models (HMM) are used in these notes, and in most tutorials, for introducing SMC, but SMC is clearly a method for a much bigger class of problems.

In the HMM setting, SMC methods are often known as Particle Filtering or Smoothing Methods.
Introducing the State Space Model
Discrete-Time Markov Model

Consider a discrete-time Markov process \{X_n\}, n \ge 1. It is defined by an initial density and a transition density:

X_1 \sim \mu(\cdot), \qquad X_n \mid X_{n-1} = x \sim f(\cdot \mid x)

We can then write the prior distribution of the states:

p(x_{1:n}) = p(x_1, \ldots, x_n) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})

[Figure: Markov chain x_1 -> x_2 -> ... -> x_n]
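As a minimal sketch of this structure (the Gaussian random-walk choices for mu and f below are illustrative assumptions, not from the notes), one can sample a path from the prior and evaluate its log density:

```python
import numpy as np

# Hypothetical choice: mu(.) = N(0, 1), f(.|x) = N(x, 1).
rng = np.random.default_rng(0)
n = 5

def log_norm_pdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Draw one path x_{1:n} from the prior: X_1 ~ mu, X_k | X_{k-1} ~ f.
x = np.empty(n)
x[0] = rng.normal(0.0, 1.0)
for k in range(1, n):
    x[k] = rng.normal(x[k - 1], 1.0)

# Evaluate log p(x_{1:n}) = log mu(x_1) + sum_k log f(x_k | x_{k-1}).
log_prior = log_norm_pdf(x[0], 0.0, 1.0)
for k in range(1, n):
    log_prior += log_norm_pdf(x[k], x[k - 1], 1.0)
print(log_prior)
```

The product of densities is computed in log space, which is also how such quantities are handled numerically in particle methods.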
Tracking Example

Consider tracking a target in the XY plane (location and speed in x and y):

X_k = \left(X_{k,1}, V_{k,1}, X_{k,2}, V_{k,2}\right)^T

We consider the constant velocity model:

X_k = A X_{k-1} + W_k, \qquad W_k \sim \mathcal{N}(0, \Sigma)

A = \begin{pmatrix} A_{CV} & 0 \\ 0 & A_{CV} \end{pmatrix}, \quad
A_{CV} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{CV} & 0 \\ 0 & \Sigma_{CV} \end{pmatrix}, \quad
\Sigma_{CV} = \sigma^2 \begin{pmatrix} T^3/3 & T^2/2 \\ T^2/2 & T \end{pmatrix}

The transition density for this model is then:

f(x_k \mid x_{k-1}) = \mathcal{N}(x_k; A x_{k-1}, \Sigma)
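The matrices above can be assembled and the model simulated directly; the sampling period T and noise scale sigma below are illustrative assumptions:

```python
import numpy as np

T, sigma = 1.0, 0.5
A_cv = np.array([[1.0, T],
                 [0.0, 1.0]])
S_cv = sigma**2 * np.array([[T**3 / 3, T**2 / 2],
                            [T**2 / 2, T]])
# Block-diagonal A and Sigma: the (position, velocity) pairs in the
# x and y coordinates evolve independently.
Z2 = np.zeros((2, 2))
A = np.block([[A_cv, Z2], [Z2, A_cv]])
Sigma = np.block([[S_cv, Z2], [Z2, S_cv]])

rng = np.random.default_rng(1)
x = np.zeros(4)                      # (x-pos, x-vel, y-pos, y-vel)
for _ in range(10):                  # X_k = A X_{k-1} + W_k, W_k ~ N(0, Sigma)
    x = A @ x + rng.multivariate_normal(np.zeros(4), Sigma)
print(x)
```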
Speech Enhancement

We model speech signals as an autoregressive (AR) process, i.e.

S_k = \sum_{i=1}^{d} \alpha_i S_{k-i} + V_k, \qquad V_k \sim \mathcal{N}(0, \sigma_s^2)

We can write this in matrix form as follows:

U_k = A U_{k-1} + B V_k, \qquad U_k = \left(S_k, \ldots, S_{k-d+1}\right)^T

A = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_d \\ 1 & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad
B = (1, 0, \ldots, 0)^T

The transition density is now:

f_U(u_k \mid u_{k-1}) = \mathcal{N}\!\left(u_k^{(1)}; A^{(1)} u_{k-1}, \sigma_s^2\right) \delta_{u_{k-1}^{(1:d-1)}}\!\left(u_k^{(2:d)}\right)

where A^{(1)} = (\alpha_1, \ldots, \alpha_d) denotes the first row of A.
Speech Enhancement

We can also consider the AR coefficients to be time dependent:

\alpha_k = \alpha_{k-1} + W_k, \qquad W_k \sim \mathcal{N}(0, \sigma_\alpha^2 I_d), \quad \text{where } \alpha_k = \left(\alpha_{k,1}, \ldots, \alpha_{k,d}\right)^T

f(\alpha_k \mid \alpha_{k-1}) = \mathcal{N}\!\left(\alpha_k; \alpha_{k-1}, \sigma_\alpha^2 I_d\right)

Thus, for non-stationary speech signals, the process X_k = (\alpha_k, U_k) is Markov with transition density

f(x_k \mid x_{k-1}) = \mathcal{N}\!\left(\alpha_k; \alpha_{k-1}, \sigma_\alpha^2 I_d\right) \mathcal{N}\!\left(u_k^{(1)}; \alpha_k^T u_{k-1}, \sigma_s^2\right) \delta_{u_{k-1}^{(1:d-1)}}\!\left(u_k^{(2:d)}\right)
The Dynamics of an Asset

The Heston model (1993) describes the dynamics of an asset price S_t with the following model for X_t = \log(S_t):

dX_t = \mu\, dt + dW_t + dZ_t

where Z_t is a jump process and W_t is Brownian motion. We approximate this (time integration) by a discrete-time Markov process

X_{t+\delta} = X_t + \delta \mu + W_{t+\delta,t} + Z_{t+\delta,t}

The same type of model is used for biochemical networks, disease and population dynamics, etc.

D. Wilkinson, Stochastic Modelling for Systems Biology, Second Edition, 2006.
E. Ionides, Inference for Nonlinear Dynamical Systems, PNAS, 2006.
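The discrete-time approximation can be simulated directly; in the sketch below the jump process is taken, as an illustrative assumption, to be compound Poisson with Gaussian jump sizes, and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
delta, mu, sigma = 1.0 / 252, 0.05, 0.2       # daily steps, drift, diffusion
jump_rate, jump_scale = 5.0, 0.03             # jumps per unit time, jump size
n_steps = 252

x = np.empty(n_steps + 1)
x[0] = np.log(100.0)                          # X_0 = log(S_0)
for t in range(n_steps):
    w = sigma * np.sqrt(delta) * rng.normal()          # Brownian increment
    n_jumps = rng.poisson(jump_rate * delta)           # jumps in (t, t+delta]
    z = jump_scale * rng.normal(size=n_jumps).sum()    # summed jump sizes
    x[t + 1] = x[t] + delta * mu + w + z               # X_{t+d} = X_t + d*mu + W + Z
prices = np.exp(x)                            # back to the asset price S_t
```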
Example: The State Space Model

Let us discuss in some detail a very popular dynamic system: the state space model, now including observations.

A state space model is an extension of a Markov chain that captures the sequential relations among hidden variables. It is a dynamic system with two major parts: a hidden Markov chain x_1, x_2, ..., x_n and the observations y_1, y_2, ..., y_n.

[Figure: hidden Markov chain x_1 -> x_2 -> ... -> x_n with observations y_1, y_2, ..., y_n]
The State Space Model

The two parts can be expressed by equations:

State equation: \{X_n\}, n \ge 1, is a latent/hidden Markov process with

X_1 \sim \mu(\cdot) \quad \text{and} \quad X_n \mid X_{n-1} = x_{n-1} \sim f(\cdot \mid x_{n-1})

Observation equation: \{Y_n\}, n \ge 1, is an observation process, with the observations being conditionally independent given \{X_n\}, n \ge 1:

Y_n \mid X_n = x_n \sim g(\cdot \mid x_n)

Since the observations \{y_n\} are conditionally independent given the Markov states \{x_n\}, the likelihood is

p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} g(y_i \mid x_i)

Our aim is to recover \{X_n\}, n \ge 1, given \{Y_n\}, n \ge 1.
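A state space model is easy to simulate forward; the sketch below uses an assumed scalar Gaussian model (AR(1) states observed in noise, hypothetical parameters) just to make the two equations concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
phi, sigma_v, sigma_w = 0.9, 1.0, 0.5   # illustrative parameter values

x = np.empty(n)
y = np.empty(n)
x[0] = rng.normal(0.0, 1.0)                           # X_1 ~ mu = N(0, 1)
y[0] = x[0] + sigma_w * rng.normal()                  # Y_1 | X_1 ~ g
for k in range(1, n):
    x[k] = phi * x[k - 1] + sigma_v * rng.normal()    # f(.|x) = N(phi*x, sigma_v^2)
    y[k] = x[k] + sigma_w * rng.normal()              # g(.|x) = N(x, sigma_w^2)
```

The inference problem of the notes is the reverse direction: recover the hidden path x given only y.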
The State Space Model: Examples

A Linear Gaussian State Space Model:

X_1 \sim \mathcal{N}(m, \Sigma), \qquad X_n = A X_{n-1} + B V_n
Y_n = C X_n + D W_n, \qquad \text{where } V_n \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_v) \text{ and } W_n \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_w)

A Stochastic Volatility Model:

X_1 \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{1 - \alpha^2}\right), \qquad X_n = \alpha X_{n-1} + \sigma V_n
Y_n = \beta \exp(X_n / 2)\, W_n, \qquad \text{where } |\alpha| < 1, \; V_n \overset{i.i.d.}{\sim} \mathcal{N}(0, 1) \text{ and } W_n \overset{i.i.d.}{\sim} \mathcal{N}(0, 1)

For the stochastic volatility model, the transition and observation densities are

f(x_n \mid x_{n-1}) = \mathcal{N}\!\left(x_n; \alpha x_{n-1}, \sigma^2\right), \qquad g(y_n \mid x_n) = \mathcal{N}\!\left(y_n; 0, \beta^2 \exp(x_n)\right)
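The stochastic volatility model above can be simulated in a few lines (the parameter values are illustrative assumptions); note the stationary initial law N(0, sigma^2/(1 - alpha^2)):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
alpha, sigma, beta = 0.91, 1.0, 0.5     # assumed values, |alpha| < 1

x = np.empty(n)
y = np.empty(n)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2))   # stationary X_1
y[0] = beta * np.exp(x[0] / 2) * rng.normal()
for k in range(1, n):
    x[k] = alpha * x[k - 1] + sigma * rng.normal()      # X_n = alpha X_{n-1} + sigma V_n
    y[k] = beta * np.exp(x[k] / 2) * rng.normal()       # Y_n = beta exp(X_n/2) W_n
```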
Tracking Example

The simplest linear observation model is of the form:

Y_k = C X_k + D E_k, \quad E_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_e) \;\Rightarrow\; g(y_k \mid x_k) = \mathcal{N}\!\left(y_k; C x_k, \Sigma_e\right)

The non-linear version (bearings-only tracking) is more popular:

Y_k = \tan^{-1}\!\left(\frac{X_{k,2}}{X_{k,1}}\right) + E_k, \quad E_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2) \;\Rightarrow\; g(y_k \mid x_k) = \mathcal{N}\!\left(y_k; \tan^{-1}\!\left(\frac{x_{k,2}}{x_{k,1}}\right), \sigma^2\right)

Note that the mean of the Gaussian is a highly non-linear function of the state.
The State Space Model

At time n, we have a total of n observations, and the target distribution to be estimated is p(x_{1:n} \mid y_{1:n}).

The target distribution is "time-varying": the posterior distribution should be updated as new observations arrive. Thus we need to estimate a sequence of distributions indexed by time,

p(x_1 \mid y_1), \; p(x_{1:2} \mid y_{1:2}), \; \ldots, \; p(x_{1:n} \mid y_{1:n})

with likelihood and prior

\text{Likelihood:} \quad p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} g(y_i \mid x_i)

\text{Prior:} \quad p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})
Bayesian Inference in State Space Models
Bayesian Inference in State-Space Models

In Bayesian estimation, the target distribution (posterior) for such a state-space model is p(x_{1:n} \mid y_{1:n}).

The state equation for the Markov process defines a prior:

p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})

The observation equation defines the likelihood:

p(y_{1:n} \mid x_{1:n}) = \prod_{k=1}^{n} g(y_k \mid x_k)

The posterior distribution is known up to a normalizing constant:

p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})} \propto p(x_{1:n}, y_{1:n}) = \underbrace{\mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})}_{\text{Prior}} \, \underbrace{\prod_{k=1}^{n} g(y_k \mid x_k)}_{\text{Likelihood}}

and the normalizing constant is

p(y_{1:n}) = \int \cdots \int p(x_{1:n})\, p(y_{1:n} \mid x_{1:n})\, dx_{1:n}
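The unnormalized posterior is just the product of prior and likelihood terms, so its log is cheap to evaluate pointwise. A sketch for an assumed scalar Gaussian model (hypothetical parameters phi, sv, sw):

```python
import numpy as np

def log_norm(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_joint(x, y, phi=0.9, sv=1.0, sw=0.5):
    """log p(x_{1:n}, y_{1:n}) = log mu + sum log f + sum log g."""
    lp = log_norm(x[0], 0.0, 1.0)                       # log mu(x_1)
    lp += np.sum(log_norm(x[1:], phi * x[:-1], sv**2))  # sum_k log f(x_k | x_{k-1})
    lp += np.sum(log_norm(y, x, sw**2))                 # sum_k log g(y_k | x_k)
    return lp

x = np.array([0.2, 0.1, -0.3])
y = np.array([0.1, 0.0, -0.2])
val = log_joint(x, y)
print(val)
```

It is exactly this pointwise-evaluable, unnormalized density gamma_n that importance sampling and SMC exploit; the intractable part is the normalizing integral p(y_{1:n}).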
Bayesian Inference in State-Space Models

In this lecture, our target distribution is:

\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n}) = \frac{\gamma_n(x_{1:n})}{Z_n}, \qquad \gamma_n(x_{1:n}) = p(x_{1:n}, y_{1:n}), \qquad Z_n = p(y_{1:n})

The posterior and the marginal likelihood do not admit closed forms unless \{X_n\} and \{Y_n\} follow linear Gaussian equations, or \{X_n\} takes values in a finite state space.
Bayesian Inference in State-Space Models

From the posterior distribution, one can compute useful point estimates, e.g. the joint MAP estimate

\arg\max_{x_{1:n}} p(x_{1:n} \mid y_{1:n})

One can also compute the MAP estimate for components,

\arg\max_{x_k} p(x_k \mid y_{1:n}), \qquad p(x_k \mid y_{1:n}) = \int \cdots \int p(x_{1:n} \mid y_{1:n})\, dx_{1:k-1}\, dx_{k+1:n}

The posterior mean (minimum mean square error estimate) can also be computed:

\mathbb{E}\left[X_k \mid y_{1:n}\right] = \int x_k\, p(x_k \mid y_{1:n})\, dx_k
Particle Motion in a Random Medium

Consider a Markovian particle \{X_n\}, n \ge 1, evolving in a random medium as follows:

X_1 \sim \mu(\cdot) \quad \text{and} \quad X_{n+1} \mid X_n = x \sim f(\cdot \mid x)

At time n, the probability for the particle to be killed is 1 - g(X_n), where 0 \le g(x) \le 1 for any x \in E.

Let T be the time at which the particle is killed. We want to compute the probability \Pr(T > n).
Particle Motion in a Random Medium

Starting from t = 1, given the current state x_1, the probability for the particle to survive is g(x_1).

Thus the joint probability of {particle at state x_1, particle survives} is

\mu(x_1)\, g(x_1)

Integrating over x_1, the probability that a particle survives at time t = 1 is

\int \mu(x_1)\, g(x_1)\, dx_1
Particle Motion in a Random Medium

At t = 2, given the state x_1, the current state x_2 is determined by the transition probability f(x_2 \mid x_1).

The probability for such a particle to survive at time t = 2 is also determined by the current state x_2, i.e. the probability is g(x_2).

If the particle survives to time t = 2, then:
1. at time 1, the particle survives with probability g(x_1);
2. state x_1 determines the current state x_2 with probability f(x_2 \mid x_1);
3. the probability to survive at time t = 2 is g(x_2).

The joint probability of the three events is

\mu(x_1)\, f(x_2 \mid x_1)\, g(x_1)\, g(x_2)

where \mu(x_1) f(x_2 \mid x_1) determines the random states and g(x_1) g(x_2) the probability to survive at each time.
Particle Motion in a Random Medium

This can be considered as a typical Hidden Markov Model:

Markov chain (state equation): \quad x_k \sim f(x_k \mid x_{k-1})
Survival (observation equation): \quad y_k \sim g(x_k)

The probability density for the particle to survive to time t = n is

\mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)

Integrating over the state variables x_k, we obtain the probability for the particle to survive to time t = n:

\Pr(T > n) = \mathbb{E}\left[\text{probability of not being killed given } X_{1:n}\right] = \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)\, dx_{1:n}
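This survival probability can be estimated by plain Monte Carlo over particle paths; the Gaussian random-walk dynamics and the survival function g(x) = exp(-x^2/4) below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 10, 100_000

def g(x):
    # assumed survival probability, 0 <= g(x) <= 1
    return np.exp(-x**2 / 4.0)

# Vectorised over N independent paths: accumulate the product
# g(X_1) * ... * g(X_n) along each path, then average.
x = rng.normal(0.0, 1.0, size=N)          # X_1 ~ mu = N(0, 1)
surv = g(x)
for _ in range(n - 1):
    x = x + rng.normal(0.0, 1.0, size=N)  # X_{k+1} | X_k ~ N(X_k, 1)
    surv *= g(x)
prob_estimate = surv.mean()               # estimates Pr(T > n)
print(prob_estimate)
```

For large n this naive estimator degrades (most paths carry negligible survival weight), which is one motivation for the interacting-particle methods of these notes.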
Particle Motion in a Random Medium

To place this calculation in our SMC framework, we define

\gamma_n(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)

Then the integration needed to compute the required probability is just the normalization constant of \gamma_n(x_{1:n}):

Z_n = \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)\, dx_{1:n}, \qquad \pi_n(x_{1:n}) = \frac{\gamma_n(x_{1:n})}{Z_n}, \qquad Z_n = \Pr(T > n)
Bayesian Recursion Formulas for the State Space Model
Bayesian Recursion for the State Space Model

Let us return to our state space model, where the objective is to compute p(x_{1:n} \mid y_{1:n}). We want to calculate this sequentially.

We can write the following recursion equation:

p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})} = \frac{p(x_{1:n-1}, y_{1:n-1})}{p(y_{1:n-1})} \cdot \frac{g(y_n \mid x_n)\, f(x_n \mid x_{n-1})}{p(y_n \mid y_{1:n-1})} = p(x_{1:n-1} \mid y_{1:n-1})\, \frac{g(y_n \mid x_n)\, f(x_n \mid x_{n-1})}{p(y_n \mid y_{1:n-1})}

where the prediction of y_n given y_{1:n-1} is:

p(y_n \mid y_{1:n-1}) = \int p(y_n, x_n \mid y_{1:n-1})\, dx_n = \int g(y_n \mid x_n)\, p(x_n \mid y_{1:n-1})\, dx_n = \iint g(y_n \mid x_n)\, f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}\, dx_n

We can write our update equation above in two recursive steps:

\text{Step I - Prediction:} \quad p(x_{1:n} \mid y_{1:n-1}) = f(x_n \mid x_{n-1})\, p(x_{1:n-1} \mid y_{1:n-1})

\text{Step II - Update:} \quad p(x_{1:n} \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})} \propto g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})
Prediction-Updating for the Marginal

A two-step prediction/update recursion for the marginal (filtering) distributions p(x_n \mid y_{1:n}) can also easily be derived:

\text{Step I - Prediction:} \quad p(x_n \mid y_{1:n-1}) = \int p(x_n, x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \int p(x_n \mid x_{n-1}, y_{1:n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \int f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}

\text{Step II - Update:} \quad p(x_n \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_n \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})}

where

p(y_n \mid y_{1:n-1}) = \iint g(y_n \mid x_n)\, f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}\, dx_n

This recursion leads to the Kalman filter for linear Gaussian models and to the standard HMM filter for finite state spaces.

Note that in the context of SMC these are not directly useful results. Our key emphasis remains the calculation of p(x_{1:n} \mid y_{1:n}), even if our interest is in computing the marginals \{p(x_n \mid y_{1:n})\}.
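On a finite state space the prediction/update integrals become matrix-vector products, so the recursion can be run exactly. The 3-state transition matrix and Gaussian observation model below are illustrative assumptions, not from the notes:

```python
import numpy as np

states = np.array([-1.0, 0.0, 1.0])
F = np.array([[0.80, 0.15, 0.05],     # F[i, j] = f(x_n = s_j | x_{n-1} = s_i)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
sigma_w = 0.5

def g(y, x):                          # g(y | x) proportional to N(y; x, sigma_w^2)
    return np.exp(-0.5 * ((y - x) / sigma_w) ** 2)

ys = [0.9, 1.1, -0.2, 0.1]            # hypothetical observations
p = np.full(3, 1.0 / 3.0)             # initial law p(x_1), assumed uniform
log_Z = 0.0                           # accumulates log p(y_{1:n}) up to a constant
for k, y in enumerate(ys):
    if k > 0:
        p = F.T @ p                   # prediction: p(x_k | y_{1:k-1})
    lik = g(y, states)
    norm = np.sum(lik * p)            # p(y_k | y_{1:k-1}), up to a constant
    log_Z += np.log(norm)
    p = lik * p / norm                # update: p(x_k | y_{1:k})
print(p)
```

This is exactly the "standard HMM filter" the slide refers to; the same two lines (predict, then reweight and normalize) are what a particle filter approximates when the integrals cannot be done exactly.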
Recursive Calculation of the Marginal Likelihood

To compute the normalizing factor p(y_{1:n}), one can use a recursive calculation that avoids high-dimensional integration:

p(y_{1:n}) = p(y_1) \prod_{k=2}^{n} p(y_k \mid y_{1:k-1})

To compute each factor p(y_k \mid y_{1:k-1}), we use the following recursion:

p(y_k \mid y_{1:k-1}) = \int p(y_k, x_k \mid y_{1:k-1})\, dx_k = \int g(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, dx_k = \iint g(y_k \mid x_k)\, f(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1}\, dx_k

We can now see that the calculation of p(y_{1:n}) is a product of lower-dimensional integrals.
Forward Filtering Backward Smoothing

One can also estimate the marginal smoothing distributions p(x_k \mid y_{1:n}), k = 1, \ldots, n (an offline estimate, once all measurements y_{1:n} are collected).

Indeed, one can show:

\text{Step I - Forward pass:} compute and store p(x_k \mid y_{1:k}) and p(x_k \mid y_{1:k-1}), k = 1, 2, \ldots, n (use the prediction and update recursions from an earlier slide).

\text{Step II - Backward pass } (k = n-1, n-2, \ldots, 1):

p(x_k \mid y_{1:n}) = p(x_k \mid y_{1:k}) \int \frac{f(x_{k+1} \mid x_k)\, p(x_{k+1} \mid y_{1:n})}{p(x_{k+1} \mid y_{1:k})}\, dx_{k+1}

This follows from

p(x_k \mid y_{1:n}) = \int p(x_k, x_{k+1} \mid y_{1:n})\, dx_{k+1} = \int p(x_k \mid x_{k+1}, y_{1:n})\, p(x_{k+1} \mid y_{1:n})\, dx_{k+1} = \int p(x_k \mid x_{k+1}, y_{1:k})\, p(x_{k+1} \mid y_{1:n})\, dx_{k+1}

where we used (see Appendix next):

p(x_k \mid x_{k+1}, y_{1:n}) = p(x_k \mid x_{k+1}, y_{1:k})
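The two passes can be run exactly on a finite state space. The sketch below reuses the kind of 3-state model assumed earlier (all numerical values illustrative):

```python
import numpy as np

states = np.array([-1.0, 0.0, 1.0])
F = np.array([[0.80, 0.15, 0.05],     # F[i, j] = f(s_j | s_i)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
sigma_w = 0.5
ys = [0.9, 1.1, -0.2, 0.1]
n = len(ys)

def g(y, x):
    return np.exp(-0.5 * ((y - x) / sigma_w) ** 2)

# Forward pass: store filtering p(x_k | y_{1:k}) and predictive
# p(x_k | y_{1:k-1}) distributions for every k.
filt = np.empty((n, 3))
pred = np.empty((n, 3))
p = np.full(3, 1.0 / 3.0)
for k, y in enumerate(ys):
    pred[k] = p                       # p(x_k | y_{1:k-1})  (prior at k = 0)
    p = g(y, states) * p
    p /= p.sum()
    filt[k] = p                       # p(x_k | y_{1:k})
    p = F.T @ p                       # predictive for step k + 1

# Backward pass: p(x_k | y_{1:n}) = filt_k(x_k) *
#   sum_j f(s_j | x_k) * smooth_{k+1}(s_j) / pred_{k+1}(s_j)
smooth = np.empty((n, 3))
smooth[-1] = filt[-1]
for k in range(n - 2, -1, -1):
    ratio = smooth[k + 1] / pred[k + 1]
    smooth[k] = filt[k] * (F @ ratio)
print(smooth[0])
```

Each backward step only reweights the stored filtering distribution, so no new normalization is needed: the recursion preserves total probability one at every k.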
Appendix

Here we prove the identity used on the earlier slide:

p(x_k \mid x_{k+1}, y_{1:n}) = p(x_k \mid x_{k+1}, y_{1:k})

Note that

p(x_k \mid x_{k+1}, y_{1:n}) = \frac{p(x_k, x_{k+1}, y_{1:n})}{p(x_{k+1}, y_{1:n})} = \frac{\int p(x_{1:n}, y_{1:n})\, dx_{1:k-1}\, dx_{k+2:n}}{\int p(x_{1:n}, y_{1:n})\, dx_{1:k}\, dx_{k+2:n}}

Substituting p(x_{1:n}, y_{1:n}) = \mu(x_1) \prod_{i=2}^{n} f(x_i \mid x_{i-1}) \prod_{i=1}^{n} g(y_i \mid x_i), the factors involving x_{k+2:n} integrate to the common term

h(x_{k+1}, y_{k+2:n}) = \int \prod_{i=k+2}^{n} f(x_i \mid x_{i-1})\, g(y_i \mid x_i)\, dx_{k+2:n}

which depends only on x_{k+1} (and y_{k+2:n}) and therefore, together with g(y_{k+1} \mid x_{k+1}), cancels between numerator and denominator. What remains is

p(x_k \mid x_{k+1}, y_{1:n}) = \frac{\int \mu(x_1) \prod_{i=2}^{k+1} f(x_i \mid x_{i-1}) \prod_{i=1}^{k} g(y_i \mid x_i)\, dx_{1:k-1}}{\int \mu(x_1) \prod_{i=2}^{k+1} f(x_i \mid x_{i-1}) \prod_{i=1}^{k} g(y_i \mid x_i)\, dx_{1:k}} = \frac{p(x_k, x_{k+1}, y_{1:k})}{p(x_{k+1}, y_{1:k})} = p(x_k \mid x_{k+1}, y_{1:k})
Forward-Backward (Two-Filter) Smoother

One can also estimate the marginal smoothing distribution p(x_k \mid y_{1:n}), k = 1, \ldots, n, as follows (see proof on the following slide):

\text{Step I - Backward information filter:}

p(y_{k+1:n} \mid x_k) = \int p(y_{k+1:n}, x_{k+1} \mid x_k)\, dx_{k+1} = \int p(y_{k+1:n} \mid x_{k+1})\, f(x_{k+1} \mid x_k)\, dx_{k+1} = \int p(y_{k+2:n} \mid x_{k+1})\, g(y_{k+1} \mid x_{k+1})\, f(x_{k+1} \mid x_k)\, dx_{k+1}

\text{Step II - Update:}

p(x_k \mid y_{1:n}) = \frac{p(x_k \mid y_{1:k})\, p(y_{k+1:n} \mid x_k)}{p(y_{k+1:n} \mid y_{1:k})}

Note that p(y_{k+1:n} \mid x_k) is not a probability density in x_k: we can have \int p(y_{k+1:n} \mid x_k)\, dx_k = \infty. Ignoring this can lead to wrong algorithms.

This is known as the forward-backward smoother.
Proof of the Two-Filter Smoother

Note that p(y_{k+1:n} \mid x_k, y_{1:k}) = p(y_{k+1:n} \mid x_k): given x_k, the future observations are independent of the past. Indeed, writing each term with the joint density p(x_{1:n}, y_{1:n}) = \mu(x_1) \prod_{i=2}^{n} f(x_i \mid x_{i-1}) \prod_{i=1}^{n} g(y_i \mid x_i),

p(y_{k+1:n} \mid x_k, y_{1:k}) = \frac{p(x_k, y_{1:n})}{p(x_k, y_{1:k})} = \int \prod_{i=k+1}^{n} f(x_i \mid x_{i-1})\, g(y_i \mid x_i)\, dx_{k+1:n}

which depends on x_k only (not on y_{1:k}) and hence coincides with p(y_{k+1:n} \mid x_k).

The update rule is then:

p(x_k \mid y_{1:n}) = \frac{p(x_k, y_{k+1:n} \mid y_{1:k})}{p(y_{k+1:n} \mid y_{1:k})} = \frac{p(y_{k+1:n} \mid x_k, y_{1:k})\, p(x_k \mid y_{1:k})}{p(y_{k+1:n} \mid y_{1:k})} = \frac{p(x_k \mid y_{1:k})\, p(y_{k+1:n} \mid x_k)}{p(y_{k+1:n} \mid y_{1:k})}
Bayesian Recursion for the State Space Model

Let us return to our main objective: computing p(x_{1:n} \mid y_{1:n}) through the recursion

\text{Step I - Prediction:} \quad p(x_{1:n} \mid y_{1:n-1}) = f(x_n \mid x_{n-1})\, p(x_{1:n-1} \mid y_{1:n-1})

\text{Step II - Update:} \quad p(x_{1:n} \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})} \propto g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})

We will apply sequential Monte Carlo methods to approximate this target distribution.
Online Bayesian Parameter Estimation
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter \theta with some prior p(\theta):

X_1 \sim \mu_\theta(\cdot) \quad \text{and} \quad X_n \mid X_{n-1} = x_{n-1} \sim f_\theta(\cdot \mid x_{n-1})

Y_n \mid X_n = x_n \sim g_\theta(\cdot \mid x_n)

Given data y_{1:n}, inference now is based on:

p(\theta, x_{1:n} \mid y_{1:n}) = p(x_{1:n} \mid y_{1:n}, \theta)\, p(\theta \mid y_{1:n}), \qquad \text{where} \quad p(\theta \mid y_{1:n}) \propto p(y_{1:n} \mid \theta)\, p(\theta)

We can use standard SMC, but on the extended space Z_n = (X_n, \theta_n), with

f(z_n \mid z_{n-1}) = f_{\theta_{n-1}}(x_n \mid x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n), \qquad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n)

Note that \theta is a static parameter: it does not evolve with n.
Maximum Likelihood Parameter Estimation

Standard approaches for parameter estimation consist of computing the Maximum Likelihood (ML) estimate

\theta_{ML} = \arg\max_\theta \log p_\theta(y_{1:n})

The likelihood function can be multimodal, and there is no guarantee of finding its global optimum.

Standard (stochastic) gradient algorithms can be used to find a local maximum, e.g. based on Fisher's identity:

\nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n}, y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})\, dx_{1:n}

These algorithms can work decently, but it can be difficult to scale the components of the gradients.

Note that these algorithms involve computing p_\theta(x_{1:n} \mid y_{1:n}), which is one of our main SMC algorithmic results.
Expectation/Maximization for HMM

One can also use the EM algorithm:

\theta_i = \arg\max_\theta Q(\theta_{i-1}, \theta), \qquad Q(\theta_{i-1}, \theta) = \int \log p_\theta(x_{1:n}, y_{1:n})\, p_{\theta_{i-1}}(x_{1:n} \mid y_{1:n})\, dx_{1:n}

Where we used:

p_\theta(x_{1:n}, y_{1:n}) = \mu_\theta(x_1) \prod_{k=2}^{n} f_\theta(x_k \mid x_{k-1}) \prod_{k=1}^{n} g_\theta(y_k \mid x_k)

so that

Q(\theta_{i-1}, \theta) = \int \log\left[\mu_\theta(x_1)\, g_\theta(y_1 \mid x_1)\right] p_{\theta_{i-1}}(x_1 \mid y_{1:n})\, dx_1 + \sum_{k=2}^{n} \int \log\left[f_\theta(x_k \mid x_{k-1})\, g_\theta(y_k \mid x_k)\right] p_{\theta_{i-1}}(x_{k-1:k} \mid y_{1:n})\, dx_{k-1:k}

Implementation of the EM algorithm therefore requires computing expectations with respect to the smoothing distributions p_{\theta_{i-1}}(x_{k-1:k} \mid y_{1:n}).
Closed Form Inference in HMM

We have closed-form solutions for finite state-space HMMs, as all integrals become finite sums.

For linear Gaussian models, all the posterior distributions are Gaussian (Kalman filter).

In most cases of interest, it is not possible to compute the solution in closed form, and we need numerical approximations. This is the case for all non-linear non-Gaussian models.

SMC methods for such problems are, in some sense, asymptotically consistent.
Closed Form Inference in HMM

Common deterministic approximations include:
- Gaussian approximations: Extended Kalman filter, Unscented Kalman filter.
- Gaussian sum approximations.
- Projection filters (similar to variational methods in machine learning).
- Simple discretization of the state space.

Analytical methods work in simple cases, but they are not reliable and it is difficult to diagnose when they fail. Standard discretization of the space is expensive and difficult to implement in high-dimensional scenarios.

We need numerical approximations.
Importance Sampling and its Application to Nonlinear Non-Gaussian Dynamic Models
Review: Importance Sampling

Our goal is to compute an expectation of the form

\pi(f) = \int_A f(\mathbf{x})\, \pi(\mathbf{x})\, d\mathbf{x}

where \pi(\mathbf{x}) is a probability distribution (posterior inference in Bayesian models, Bayesian model validation, etc.).

We assume that \pi(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{Z}, where Z = \int \gamma(\mathbf{x})\, d\mathbf{x} is unknown and \gamma is known pointwise.

The basic idea in Monte Carlo methods is to sample N i.i.d. random variables \mathbf{X}^{(i)} \sim \pi(\cdot) and build the empirical measure

\hat{\pi}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})

Using this:

\hat{\pi}(f) = \frac{1}{N} \sum_{i=1}^{N} f\!\left(\mathbf{X}^{(i)}\right), \qquad \mathbf{X}^{(i)} \overset{i.i.d.}{\sim} \pi(\cdot)

J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York.
Monte Carlo Methods

Using the approximation \hat{\pi} of \pi, the following hold:

\mathbb{E}\!\left[\hat{\pi}(f)\right] = \pi(f), \qquad \mathbb{V}\!\left[\hat{\pi}(f)\right] = \frac{1}{N}\left(\pi(f^2) - \pi(f)^2\right), \qquad \sqrt{N}\left(\hat{\pi}(f) - \pi(f)\right) \Rightarrow \mathcal{N}\!\left(0, \pi(f^2) - \pi(f)^2\right)

Similarly, marginalization is also simple:

\hat{\pi}(x_p) = \int \hat{\pi}(x_1, x_2, \ldots, x_n)\, dx_{1:p-1}\, dx_{p+1:n} = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_p^{(i)}}(x_p)

In MC, the samples automatically concentrate in regions of high probability mass, regardless of the dimension of the space.

However, it is not always easy or effective to sample from the original probability distribution \pi(\mathbf{x}). A more effective strategy is to focus on the regions of "importance" in \pi(\mathbf{x}), so as to save computational resources.

J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York.
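The unbiasedness and 1/sqrt(N) error rate above are easy to check numerically; the target N(0, 1) and test function f(x) = x^2 (so pi(f) = 1) below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
X = rng.normal(0.0, 1.0, size=N)        # X^(i) ~ pi, i.i.d.
f = X**2
estimate = f.mean()                     # pi_hat(f), unbiased for pi(f) = 1
std_err = f.std(ddof=1) / np.sqrt(N)    # CLT: error shrinks like 1/sqrt(N)
print(estimate, std_err)
```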
Review: Importance Sampling

We assume that \pi(\mathbf{x}) is only known up to a normalizing constant:

\pi(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{Z}

For any distribution q(\mathbf{x}) such that \pi(\mathbf{x}) > 0 \Rightarrow q(\mathbf{x}) > 0, we can write:

\pi(\mathbf{x}) = \frac{w(\mathbf{x})\, q(\mathbf{x})}{Z}, \qquad \text{where} \quad w(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{q(\mathbf{x})} \quad \text{and} \quad Z = \int w(\mathbf{x})\, q(\mathbf{x})\, d\mathbf{x}

The proposal distribution q(\mathbf{x}) is known as the "importance density" or "trial density"; w(\mathbf{x}) is called the importance weight.

The importance density can be chosen arbitrarily, as any proposal that is easy to sample from:

\mathbf{X}^{(i)} \overset{i.i.d.}{\sim} q \;\Rightarrow\; \hat{q}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})
Review: Importance Sampling

Substitution of \hat{q}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{X}^{(i)}}(d\mathbf{x}) in the importance sampling identity gives:

\hat{\pi}(d\mathbf{x}) = \frac{w(\mathbf{x})\, \hat{q}(d\mathbf{x})}{\int w(\mathbf{x})\, \hat{q}(d\mathbf{x})} = \frac{\frac{1}{N}\sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right) \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})}{\frac{1}{N}\sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right)} = \sum_{i=1}^{N} W^{(i)}\, \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})

where W^{(i)} \propto w\!\left(\mathbf{X}^{(i)}\right) and \sum_{i=1}^{N} W^{(i)} = 1.

Similarly, we can approximate the normalization factor of our target distribution as follows:

\hat{Z} = \int \frac{\gamma(\mathbf{x})}{q(\mathbf{x})}\, \hat{q}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \frac{\gamma\!\left(\mathbf{X}^{(i)}\right)}{q\!\left(\mathbf{X}^{(i)}\right)} = \frac{1}{N} \sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right)
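A self-contained sketch of these two estimators, for an assumed target known only up to a constant, gamma(x) = exp(-x^2/2) (so Z = sqrt(2*pi)), with a heavier-tailed Gaussian proposal q = N(0, 2^2):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000
s_q = 2.0

X = rng.normal(0.0, s_q, size=N)                        # X^(i) ~ q, i.i.d.
gamma = np.exp(-0.5 * X**2)                             # unnormalised target
q_pdf = np.exp(-0.5 * (X / s_q) ** 2) / (s_q * np.sqrt(2 * np.pi))
w = gamma / q_pdf                                       # w(x) = gamma(x) / q(x)
Z_hat = w.mean()                                        # estimates Z = sqrt(2*pi)
W = w / w.sum()                                         # normalised weights W^(i)
mean_hat = np.sum(W * X)                                # estimates pi(x) = 0
print(Z_hat, mean_hat)
```

Note that Z_hat uses the raw weights (unbiased for Z), while expectations under pi use the self-normalised weights W.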
Review: Importance Sampling

The distribution \pi(\mathbf{x}) is now approximated by a weighted sum of delta masses, where the weights compensate for the discrepancy between \pi(\mathbf{x}) and q(\mathbf{x}):

\hat{\pi}(d\mathbf{x}) = \sum_{i=1}^{N} W^{(i)}\, \delta_{\mathbf{X}^{(i)}}(d\mathbf{x}), \qquad \text{where} \quad W^{(i)} \propto w\!\left(\mathbf{X}^{(i)}\right) \quad \text{and} \quad \sum_{i=1}^{N} W^{(i)} = 1
Review: Importance Sampling

Similarly, calculation of \pi(f) using importance sampling gives:

\hat{\pi}(f) = \int f(\mathbf{x})\, \hat{\pi}(d\mathbf{x}) = \sum_{i=1}^{N} W^{(i)}\, f\!\left(\mathbf{X}^{(i)}\right)

The statistics of this estimate are given, for N \gg 1, with \bar{w}(\mathbf{x}) = \pi(\mathbf{x})/q(\mathbf{x}), as:

\mathbb{E}\!\left[\hat{\pi}(f)\right] - \pi(f) \approx -\frac{1}{N}\, \mathbb{E}_\pi\!\left[\bar{w}(\mathbf{x})\left(f(\mathbf{x}) - \pi(f)\right)\right] \quad \text{(negligible bias)}

\mathbb{V}\!\left[\hat{\pi}(f)\right] \approx \frac{1}{N}\, \mathbb{E}_\pi\!\left[\bar{w}(\mathbf{x})\left(f(\mathbf{x}) - \pi(f)\right)^2\right]
Review: Importance Sampling

We can similarly compute the statistics of the normalization constant estimate

\hat{Z} = \frac{1}{N} \sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\gamma\!\left(\mathbf{X}^{(i)}\right)}{q\!\left(\mathbf{X}^{(i)}\right)}

They are given as:

\mathbb{E}\!\left[\hat{Z}\right] = Z, \qquad \mathbb{V}\!\left[\hat{Z}\right] = \frac{Z^2}{N}\left(\mathbb{E}_q\!\left[\left(\frac{\pi(\mathbf{x})}{q(\mathbf{x})}\right)^{\!2}\right] - 1\right)
Review: Importance Sampling

We select q(\mathbf{x}) as close as possible to \pi(\mathbf{x}).

The variance of the weights is bounded iff

\int \frac{\gamma^2(\mathbf{x})}{q(\mathbf{x})}\, d\mathbf{x} < \infty

In practice, it is sufficient to ensure that the weights are bounded:

w(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{q(\mathbf{x})} < \infty

This is equivalent to saying that q(\mathbf{x}) should have heavier tails than \pi(\mathbf{x}).
Monte Carlo for the State Space Model Recall that we are interested to estimate the high-dimensional
density
For now let us start with a fixed n.
A Monte Carlo approximation (empirical measure) of our target distribution is of the form:
For any function , we can use a Monte Carlo
approximation of its expectation as:
53
( ) ( )( ) ( )1: 1:
1: 1: 1: 1:1:
| n nn n n n
n
pp p
p= ∝
x , yx y x , y
y
( ) ( ) ( )( )1:
( )1: 1: 1: 1: 1: 1:
1
1 , ~in
Ni
n n n n n nNi
p where pN
δ=
= ∑ Xx | y x X x | y
( )1: : nnϕ →x X
( ) ( ) ( ) ( )
( ) ( ) ( )
1: 1:
( )1:
1: 1: 1: 1:
( )1: 1: 1: 1:
1 1
1 1
n nNn
in
n
n n n nNp
N Ni
n n n ni i
p d
dN N
ϕ ϕ
ϕ δ ϕ= =
=
= =
∫
∑ ∑∫
x | y
X
x x | y x
x x x X
X
X
Monte Carlo for the State Space Model

This estimate is asymptotically consistent (it converges towards $\mathbb{E}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]$).

The estimate is unbiased, and its variance gives the following convergence properties:

$$\mathbb{V}\big[\hat{\varphi}_n\big] = \frac{\mathbb{V}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]}{N}, \qquad \sqrt{N}\Big(\hat{\varphi}_n - \mathbb{E}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]\Big) \xrightarrow{d} \mathcal{N}\Big(0,\ \mathbb{V}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]\Big).$$

The rate of convergence is independent of $n$. This does not imply that Monte Carlo beats the curse of dimensionality, since $\mathbb{V}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]$ may itself increase with $n$ (i.e., with time).
Monte Carlo for the State Space Model

The Monte Carlo approximation can easily be used to compute any marginal distribution, e.g. $p(x_k\,|\,\mathbf{y}_{1:n})$:

$$\hat{p}(x_k\,|\,\mathbf{y}_{1:n}) = \int \hat{p}(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\, d\mathbf{x}_{1:k-1}\, d\mathbf{x}_{k+1:n} = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_k^{(i)}}(x_k).$$

Note that the marginal likelihood $p(\mathbf{y}_{1:n})$ cannot be estimated as easily using $\mathbf{X}_{1:n}^{(i)} \sim p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.
Difficulties with Standard Monte Carlo Sampling

It is difficult to sample from our target high-dimensional distribution: $\mathbf{X}_{1:n}^{(i)} \sim p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

MCMC methods are not useful in this context.

As $n$ increases, we would like to be able to sample from $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ with an algorithm that keeps the computational cost fixed at each time step $n$.
Importance Sampling for our State Space Model

Rather than sampling directly from our target distribution $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$, we sample from an importance distribution $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

Note that in the notation here for $q$, $\mathbf{y}_{1:n}$ is used as a parameter, not to indicate any posterior distribution.

The importance distribution needs to satisfy the following properties:

The support of $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ includes the support of $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$, i.e.

$$p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) > 0 \ \Rightarrow\ q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) > 0.$$

It is easy to sample from $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

We use the following identity:

$$p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = \frac{p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})}{\int p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\,d\mathbf{x}_{1:n}} = \frac{w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{\int w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\,d\mathbf{x}_{1:n}}.$$
Importance Sampling for our State Space Model

Let us draw $N$ samples from our importance distribution:

$$\mathbf{X}_{1:n}^{(i)} \sim q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}), \qquad q^N(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\mathbf{X}_{1:n}^{(i)}}(\mathbf{x}_{1:n}).$$

Then, using the identity in the earlier slide, we obtain the following approximation of our target distribution:

$$\hat{p}(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = \frac{\frac{1}{N}\sum_{i=1}^{N} w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)\,\delta_{\mathbf{X}_{1:n}^{(i)}}(\mathbf{x}_{1:n})}{\frac{1}{N}\sum_{i=1}^{N} w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)} = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{\mathbf{X}_{1:n}^{(i)}}(\mathbf{x}_{1:n}), \qquad W_n^{(i)} = \frac{w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{\sum_{j=1}^{N} w\big(\mathbf{X}_{1:n}^{(j)}, \mathbf{y}_{1:n}\big)}.$$

Note that:

$$\hat{p}(\mathbf{y}_{1:n}) = \int w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q^N(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\,d\mathbf{x}_{1:n} = \frac{1}{N}\sum_{i=1}^{N} w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big).$$
Normalized Weights in Importance Sampling

We defined earlier the unnormalized weights as follows:

$$\text{Unnormalized weights:}\quad w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}) = \frac{p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})} = p(\mathbf{y}_{1:n})\,\underbrace{\frac{p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}}_{\substack{\text{discrepancy between target distribution} \\ \text{and importance distribution}}}.$$

The normalized weights were introduced as:

$$\text{Normalized weights:}\quad W_n^{(i)} = \frac{w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{\sum_{j=1}^{N} w\big(\mathbf{X}_{1:n}^{(j)}, \mathbf{y}_{1:n}\big)}.$$
Optimal Importance Sampling Distribution

$\hat{p}^N(\mathbf{y}_{1:n})$ is an unbiased estimate of $p(\mathbf{y}_{1:n})$ with variance:

$$\frac{1}{N}\left(\int w^2(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\,d\mathbf{x}_{1:n} - p^2(\mathbf{y}_{1:n})\right).$$

You can bring this variance to zero with the selection

$$q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}).$$

Of course, this is what we wanted to avoid (we want to sample from an easier distribution). However, this result points to the fact that the choice of $q$ needs to be as close as possible to the target distribution.
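The zero-variance claim can be sanity-checked numerically (the Gaussian target below is an illustrative assumption): when $q$ equals the normalized target, every weight $\gamma(x)/q(x)$ is identically the constant $Z$, so the estimate is exact with zero spread.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unnormalized target gamma(x) = exp(-x^2/2); its normalizing constant
# is Z = sqrt(2*pi).
Z_true = np.sqrt(2.0 * np.pi)

def gamma_unnorm(x):
    return np.exp(-0.5 * x ** 2)

def p_pdf(x):                        # the normalized target itself
    return gamma_unnorm(x) / Z_true

# Choose q = p: then w(x) = gamma(x) / p(x) = Z for every x.
X = rng.normal(0.0, 1.0, size=1000)  # samples from q = p = N(0,1)
w = gamma_unnorm(X) / p_pdf(X)       # constant weights, all equal to Z
Z_hat = w.mean()
weight_spread = w.max() - w.min()    # zero up to floating-point rounding
```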
Importance Sampling Estimates

We are interested in an importance sampling approximation of $\mathbb{E}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi]$:

$$\hat{\varphi}_N = \sum_{i=1}^{N} W_n^{(i)}\,\varphi\big(\mathbf{X}_{1:n}^{(i)}\big).$$

This is a biased estimate for a finite $N$, and one can show:

$$\lim_{N\to\infty} N\Big(\mathbb{E}\big[\hat{\varphi}_N\big] - \mathbb{E}_p[\varphi]\Big) = -\int \frac{p^2(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}\Big(\varphi(\mathbf{x}_{1:n}) - \mathbb{E}_p[\varphi]\Big)\,d\mathbf{x}_{1:n},$$

$$\sqrt{N}\Big(\hat{\varphi}_N - \mathbb{E}_p[\varphi]\Big) \xrightarrow{d} \mathcal{N}\left(0,\ \int \frac{p^2(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}\Big(\varphi(\mathbf{x}_{1:n}) - \mathbb{E}_p[\varphi]\Big)^2\,d\mathbf{x}_{1:n}\right).$$

The asymptotic bias is of the order $1/N$ (negligible), and the MSE error is:

$$\text{MSE} = \underbrace{\text{bias}^2}_{O(N^{-2})} + \underbrace{\text{variance}}_{O(N^{-1})}.$$
Selection of Importance Sampling Distribution

As discussed before, the importance sampling distribution should be selected so that the weights are bounded,

$$w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}) \le C \quad \forall\, \mathbf{x}_{1:n} \in \mathcal{X}^n,$$

or, equivalently, so that $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ has heavier tails than $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

To minimize the asymptotic bias, we aim for a $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ that is as close as possible to $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

Note that the importance distribution must not only cover the support of the target; it also needs to be chosen cleverly for the particular problem of interest.

For numerical examples and MatLab implementations, please see an earlier lecture on importance sampling.
Effective Sample Size

In our importance sampling approximation of the target $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ using the importance distribution $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ (for a fixed $n$), we would ideally like to have

$$q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}).$$

In this case, all the unnormalized importance weights are equal and their variance is zero.

To assess the quality of the importance sampling approximation, note that for flat functions,

$$\frac{\text{Variance of IS estimate}}{\text{Variance of standard MC estimate}} \approx 1 + \mathbb{V}_{q(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}\big[w(\mathbf{X}_{1:n}\,|\,\mathbf{y}_{1:n})\big].$$

This is often interpreted through the effective sample size ($N$ weighted samples from $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ are approximately equivalent to $M$ unweighted samples from $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$):

$$M = \frac{N}{1 + \mathbb{V}_{q(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}\big[w(\mathbf{X}_{1:n}\,|\,\mathbf{y}_{1:n})\big]} \le N.$$
Effective Sample Size

We often approximate the effective sample size $M$ as follows:

$$ESS = \left(\sum_{i=1}^{N}\big(W_n^{(i)}\big)^2\right)^{-1},$$

since, with weights scaled so that $\mathbb{E}_q[w] = 1$,

$$\mathbb{V}_{q(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}\big[w(\mathbf{X}_{1:n}\,|\,\mathbf{y}_{1:n})\big] \approx \frac{1}{N}\sum_{i=1}^{N} w^2\big(\mathbf{X}_{1:n}^{(i)}\,|\,\mathbf{y}_{1:n}\big) - 1.$$

We can clearly see that

$$1 \le ESS = \left(\sum_{i=1}^{N}\big(W_n^{(i)}\big)^2\right)^{-1} \le N.$$

We can thus range from $ESS = 1$ (one weight equal to 1, all others zero: very inefficient) to $ESS = N$ (all weights equal to $1/N$: excellent sampling).
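The two extremes of the ESS formula can be verified directly with a minimal sketch:

```python
import numpy as np

def ess(W):
    """Effective sample size from normalized weights W (sum(W) == 1)."""
    return 1.0 / np.sum(W ** 2)

N = 5

# All weights equal to 1/N: ESS = N (excellent sampling).
uniform = np.full(N, 1.0 / N)
ess_uniform = ess(uniform)

# One weight carries all the mass: ESS = 1 (very inefficient).
degenerate = np.zeros(N)
degenerate[0] = 1.0
ess_degenerate = ess(degenerate)
```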
Sequential Importance Sampling

Let us return to our state space model and consider a sequential Monte Carlo approximation of

$$\pi_n(\mathbf{x}_{1:n}) = p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}), \qquad n = 1, 2, \ldots$$

The distributions are known up to a normalizing constant:

$$\pi_n(\mathbf{x}_{1:n}) = \frac{\gamma_n(\mathbf{x}_{1:n})}{Z_n} = \frac{p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})}{Z_n}, \qquad p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) \propto p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}).$$

We want to estimate the expectations of functions $\varphi_n : \mathcal{X}^n \to \mathbb{R}$,

$$\pi_n(\varphi_n) = \int \varphi_n(\mathbf{x}_{1:n})\, \pi_n(\mathbf{x}_{1:n})\, d\mathbf{x}_{1:n},$$

and/or the normalizing constants $Z_n$.

One could use MCMC to sample from $\{\pi_n\},\ n = 1, 2, \ldots$, but this calculation will be slow and cannot compute $\{Z_n\},\ n = 1, 2, \ldots$.
Sequential Importance Sampling

We want to do these calculations sequentially: starting with $\pi_1$ and $Z_1$ at step (time) 1, then proceeding to $\pi_2$ and $Z_2$, etc.

Sequential Monte Carlo (SMC) provides the means to do so, as an alternative algorithm to MCMC.

The key idea is that if $\pi_{n-1}$ does not differ much from $\pi_n$, we should be able to reuse our estimate of $\pi_{n-1}$ to approximate $\pi_n$.
Sequential Importance Sampling

We want to design a sequential importance sampling method to approximate $\{\pi_n\}_{n\ge 1}$ and $\{Z_n\}_{n\ge 1}$.

Assume that at `time 1' we have the approximations $\pi_1^N(x_1) \approx \pi_1(x_1) = p(x_1\,|\,y_1)$ and $Z_1^N \approx Z_1$, built using an importance density $q_1(x_1\,|\,y_1)$:

$$X_1^{(i)} \sim q_1(x_1\,|\,y_1), \quad i = 1, 2, \ldots, N,$$

$$\hat{p}(x_1\,|\,y_1)\,dx_1 = \sum_{i=1}^{N} W_1^{(i)}\,\delta_{X_1^{(i)}}(dx_1), \qquad W_1^{(i)} = \frac{w_1\big(X_1^{(i)}, y_1\big)}{\sum_{j=1}^{N} w_1\big(X_1^{(j)}, y_1\big)},$$

$$Z_1^N = \frac{1}{N}\sum_{i=1}^{N} w_1\big(X_1^{(i)}, y_1\big), \qquad \text{with } w_1(x_1, y_1) = \frac{\gamma_1(x_1)}{q_1(x_1\,|\,y_1)} = \frac{p(x_1, y_1)}{q_1(x_1\,|\,y_1)}.$$
Sequential Importance Sampling

At `time 2', we want to approximate $\pi_2(\mathbf{x}_{1:2}) = p(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})$ and $Z_2$ using an importance density $q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})$.

We want to reuse the samples $X_1^{(i)}$ and $q_1(x_1\,|\,y_1)$ in building the importance sampling approximation of $\pi_2(\mathbf{x}_{1:2})$ and $Z_2$. Let us select

$$q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2}) = q_1(x_1\,|\,y_1)\, q_2(x_2\,|\,\mathbf{y}_{1:2}, x_1).$$

To obtain $\mathbf{X}_{1:2}^{(i)} \sim q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})$, we need to sample as follows:

$$X_2^{(i)}\,\big|\,X_1^{(i)} \sim q_2\big(x_2\,|\,\mathbf{y}_{1:2}, X_1^{(i)}\big).$$

The importance sampling weight for this step is then:

$$w_2(\mathbf{x}_{1:2}, \mathbf{y}_{1:2}) = \frac{p(\mathbf{x}_{1:2}, \mathbf{y}_{1:2})}{q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})} = \frac{p(\mathbf{x}_{1:2}, \mathbf{y}_{1:2})}{q_1(x_1\,|\,y_1)\, q_2(x_2\,|\,\mathbf{y}_{1:2}, x_1)} = \underbrace{\frac{p(x_1, y_1)}{q_1(x_1\,|\,y_1)}}_{\text{weight from step 1}}\ \underbrace{\frac{p(\mathbf{x}_{1:2}, \mathbf{y}_{1:2})}{p(x_1, y_1)\, q_2(x_2\,|\,\mathbf{y}_{1:2}, x_1)}}_{\text{incremental weight}}.$$
Sequential Importance Sampling

The normalized weights for step 2 are then given as:

$$W_2^{(i)} \propto w_2\big(\mathbf{X}_{1:2}^{(i)}, \mathbf{y}_{1:2}\big) = \underbrace{w_1\big(X_1^{(i)}, y_1\big)}_{\text{weight from step 1}}\ \underbrace{\frac{p\big(\mathbf{X}_{1:2}^{(i)}, \mathbf{y}_{1:2}\big)}{p\big(X_1^{(i)}, y_1\big)\, q_2\big(X_2^{(i)}\,|\,\mathbf{y}_{1:2}, X_1^{(i)}\big)}}_{\text{incremental weight}}.$$

Generalizing to step $n$, we can write:

$$q_n(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = q_{n-1}(\mathbf{x}_{1:n-1}\,|\,\mathbf{y}_{1:n-1})\, q_n(x_n\,|\,\mathbf{y}_{1:n}, \mathbf{x}_{1:n-1}) = q_1(x_1\,|\,y_1)\prod_{k=2}^{n} q_k(x_k\,|\,\mathbf{y}_{1:k}, \mathbf{x}_{1:k-1}).$$

Thus, if

$$\mathbf{X}_{1:n-1}^{(i)} \sim q_{n-1}(\mathbf{x}_{1:n-1}\,|\,\mathbf{y}_{1:n-1}),$$

we sample $X_n$ from

$$X_n^{(i)} \sim q_n\big(x_n\,|\,\mathbf{y}_{1:n}, \mathbf{X}_{1:n-1}^{(i)}\big).$$
Sequential Importance Sampling

The weights for step $n$ are then given as:

$$w_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big) = \frac{p\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{q_n\big(\mathbf{X}_{1:n}^{(i)}\,|\,\mathbf{y}_{1:n}\big)} = \underbrace{\frac{p\big(\mathbf{X}_{1:n-1}^{(i)}, \mathbf{y}_{1:n-1}\big)}{q_{n-1}\big(\mathbf{X}_{1:n-1}^{(i)}\,|\,\mathbf{y}_{1:n-1}\big)}}_{w_{n-1}\big(\mathbf{X}_{1:n-1}^{(i)},\, \mathbf{y}_{1:n-1}\big)}\ \frac{p\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{p\big(\mathbf{X}_{1:n-1}^{(i)}, \mathbf{y}_{1:n-1}\big)\, q_n\big(X_n^{(i)}\,|\,\mathbf{y}_{1:n}, \mathbf{X}_{1:n-1}^{(i)}\big)}.$$

Similarly, the normalized weights are as follows:

$$W_n^{(i)} \equiv W_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big) \propto w_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big).$$

For our state space model, the above update formula takes the form:

$$w_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big) = w_{n-1}\big(\mathbf{X}_{1:n-1}^{(i)}, \mathbf{y}_{1:n-1}\big)\ \frac{f\big(X_n^{(i)}\,|\,X_{n-1}^{(i)}\big)\, g\big(y_n\,|\,X_n^{(i)}\big)}{q_n\big(X_n^{(i)}\,|\,\mathbf{y}_{1:n}, \mathbf{X}_{1:n-1}^{(i)}\big)}.$$

In general, we may need to store all the paths $\big\{\mathbf{X}_{1:n}^{(i)}\big\}$, even if our interest is only to compute $\pi_n(x_n) = p(x_n\,|\,\mathbf{y}_{1:n})$.

In the next lecture, we will return to the sequential importance sampling algorithm and consider a simplified proposal distribution.
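The full recursion can be sketched in a few lines of Python. The linear-Gaussian model below, its parameters, and the choice of the prior $f(x_n\,|\,x_{n-1})$ as proposal (so the incremental weight reduces to $g(y_n\,|\,x_n)$) are all illustrative assumptions for the demo, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy model: x_n = a x_{n-1} + v_n, v_n ~ N(0,1);
#                    y_n = x_n + e_n,      e_n ~ N(0,1).
a, n_steps, N = 0.9, 20, 5000

def log_g(y, x):
    # log observation density log g(y_n | x_n) for the assumed model
    return -0.5 * (y - x) ** 2 - 0.5 * np.log(2.0 * np.pi)

# Simulate one realization of the model to obtain data y_{1:n}.
x_true = np.zeros(n_steps)
y = np.zeros(n_steps)
x_true[0] = rng.normal()
y[0] = x_true[0] + rng.normal()
for n in range(1, n_steps):
    x_true[n] = a * x_true[n - 1] + rng.normal()
    y[n] = x_true[n] + rng.normal()

# SIS with the prior f(x_n | x_{n-1}) as proposal: the incremental
# weight then reduces to g(y_n | x_n).
X = rng.normal(size=N)              # X_1^(i) ~ N(0,1) (assumed initial law)
logw = log_g(y[0], X)               # log w_1
for n in range(1, n_steps):
    X = a * X + rng.normal(size=N)  # X_n^(i) ~ f(x_n | X_{n-1}^(i))
    logw = logw + log_g(y[n], X)    # w_n = w_{n-1} * g(y_n | X_n^(i))

W = np.exp(logw - logw.max())       # stabilized exponentiation
W /= W.sum()                        # normalized weights W_n^(i)
mean_est = np.sum(W * X)            # estimate of E[x_n | y_{1:n}]
ess = 1.0 / np.sum(W ** 2)          # weight degeneracy sets in as n grows
```

Working in log-weights avoids numerical underflow as the products accumulate; monitoring the ESS here shows the weight degeneracy that motivates the resampling step discussed in the next lecture.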