Bayesian Scientific Computing, Spring 2013 (N. Zabaras)
Introduction to Sequential Monte Carlo Methods
Importance Sampling for Nonlinear Non-Gaussian Dynamic Models
Prof. Nicholas Zabaras
Materials Process Design and Control Laboratory
Sibley School of Mechanical and Aerospace Engineering
101 Frank H. T. Rhodes Hall
Cornell University, Ithaca, NY 14853-3801
Email: [email protected]
URL: http://mpdc.mae.cornell.edu/
April 28, 2013
Contents

- References and SMC Resources
- INTRODUCING THE STATE SPACE MODEL: Discrete-Time Markov Models, Tracking Problem, Speech Enhancement, the Dynamics of an Asset, the State Space Model with Observations, Target Distribution and Point Estimates, Particle Motion in a Random Medium
- BAYESIAN INFERENCE IN STATE SPACE MODELS: Bayesian Recursion for the State-Space Model, Forward Filtering, Forward-Backward Filter
- ONLINE BAYESIAN PARAMETER ESTIMATION: Online Parameter Estimation, MLE, EM, Need for a Numerical Approach to HMM
- IMPORTANCE SAMPLING: Review of IS, Monte Carlo for the State Space Example, IS for the State Space Model, Optimal Importance Distribution, Effective Sample Size, Sequential IS
References

C.P. Robert and G. Casella, Monte Carlo Statistical Methods, Springer, Chapter 11.
J.S. Liu, Monte Carlo Strategies in Scientific Computing, Springer-Verlag, New York, Chapter 3.
A. Doucet, N. de Freitas and N. Gordon (eds), Sequential Monte Carlo Methods in Practice, Springer-Verlag, 2001.
A. Doucet, N. de Freitas and N.J. Gordon, An Introduction to Sequential Monte Carlo, in Sequential Monte Carlo Methods in Practice, 2001.
D. Wilkinson, Stochastic Modelling for Systems Biology, Second Edition, 2006.
E. Ionides, Inference for Nonlinear Dynamical Systems, PNAS, 2006.
J.S. Liu and R. Chen, Sequential Monte Carlo Methods for Dynamic Systems, JASA, 1998.
A. Doucet, Sequential Monte Carlo Methods, Short Course at SAMSI.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters Resources.
P. Del Moral, Feynman-Kac Models and Interacting Particle Systems (SMC resources).
A. Doucet, Sequential Monte Carlo Methods, Video Lectures, 2007.
N. de Freitas and A. Doucet, Sequential MC Methods, Video Lectures, 2010.
References

M.K. Pitt and N. Shephard, Filtering via Simulation: Auxiliary Particle Filters, JASA, 1999.
A. Doucet, S.J. Godsill and C. Andrieu, On Sequential Monte Carlo Sampling Methods for Bayesian Filtering, Statistics and Computing, 2000.
J. Carpenter, P. Clifford and P. Fearnhead, An Improved Particle Filter for Non-linear Problems, IEE Proceedings, 1999.
A. Kong, J.S. Liu and W.H. Wong, Sequential Imputations and Bayesian Missing Data Problems, JASA, 1994.
O. Cappe, E. Moulines and T. Ryden, Inference in Hidden Markov Models, Springer-Verlag, 2005.
W. Gilks and C. Berzuini, Following a Moving Target: Monte Carlo Inference for Dynamic Bayesian Models, JRSS B, 2001.
G. Poyadjis, A. Doucet and S.S. Singh, Maximum Likelihood Parameter Estimation Using Particle Methods, Joint Statistical Meeting, 2005.
N. Gordon, D.J. Salmond and A.F.M. Smith, Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation, IEE Proceedings, 1993.
S. Godsill, Particle Filters, 2009 (Video Lectures).
R. Chen and J.S. Liu, Predictive Updating Methods with Application to Bayesian Classification, JRSS B, 1996.
References

C. Andrieu and A. Doucet, Particle Filtering for Partially Observed Gaussian State-Space Models, JRSS B, 2002.
R. Chen and J.S. Liu, Mixture Kalman Filters, JRSS B, 2000.
A. Doucet, S.J. Godsill and C. Andrieu, On SMC Sampling Methods for Bayesian Filtering, Statistics and Computing, 2000.
N. Kantas, A. Doucet, S.S. Singh and J.M. Maciejowski, An Overview of Sequential Monte Carlo Methods for Parameter Estimation in General State-Space Models, Proceedings IFAC System Identification (SysId) Meeting, 2009.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov Chain Monte Carlo Methods, JRSS B, 2010.
C. Andrieu, N. de Freitas and A. Doucet, Sequential MCMC for Bayesian Model Selection, Proc. IEEE Workshop HOS, 1999.
P. Fearnhead, MCMC, Sufficient Statistics and Particle Filters, JCGS, 2002.
G. Storvik, Particle Filters for State-Space Models with the Presence of Unknown Static Parameters, IEEE Trans. Signal Processing, 2002.
References

C. Andrieu, A. Doucet and V.B. Tadic, Online EM for Parameter Estimation in Nonlinear/Non-Gaussian State-Space Models, Proc. IEEE CDC, 2005.
G. Poyadjis, A. Doucet and S.S. Singh, Particle Approximations of the Score and Observed Information Matrix in State-Space Models with Application to Parameter Estimation, Biometrika, 2011.
F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Technical Report, 2008.
R. Martinez-Cantin, J. Castellanos and N. de Freitas, Analysis of Particle Methods for Simultaneous Robot Localization and Mapping and a New Algorithm: Marginal-SLAM, International Conference on Robotics and Automation.
C. Andrieu, A. Doucet and R. Holenstein, Particle Markov Chain Monte Carlo Methods (with discussion), JRSS B, 2010.
A. Doucet, Sequential Monte Carlo Methods and Particle Filters: List of Papers, Codes, and Video Lectures on SMC and Particle Filters.
P. Del Moral, Feynman-Kac Models and Interacting Particle Systems.
References

P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo Samplers, JRSS B, 2006.
P. Del Moral, A. Doucet and A. Jasra, Sequential Monte Carlo for Bayesian Computation, Bayesian Statistics, 2006.
P. Del Moral, A. Doucet and S.S. Singh, Forward Smoothing Using Sequential Monte Carlo, Technical Report, Cambridge University, 2009.
P. Del Moral, Feynman-Kac Formulae, Springer-Verlag, 2004.
M. Davy, Sequential MC Methods, 2007.
A. Doucet and A. Johansen, Particle Filtering and Smoothing: Fifteen Years Later, in Handbook of Nonlinear Filtering (eds D. Crisan and B. Rozovsky), Oxford Univ. Press, 2011.
A. Johansen and A. Doucet, A Note on Auxiliary Particle Filters, Statistics and Probability Letters, 2008.
A. Doucet, M. Briers and S. Senecal, Efficient Block Sampling Strategies for Sequential Monte Carlo, JCGS, 2006.
F. Caron, R. Gottardo and A. Doucet, On-line Changepoint Detection and Parameter Estimation for Genome Wide Transcript Analysis, Statistics and Computing, 2011.
Introduction

Sequential Monte Carlo (SMC) methods are used to approximate any sequence of probability distributions.

They are often used in physics to:
- compute eigenvalues of positive operators,
- compute free energies,
- solve differential or integral equations,
- simulate polymer chains, etc.

Hidden Markov Models (HMM) are used in these notes, and in most tutorials, for introducing SMC, but SMC is clearly a method for a much bigger class of problems.

In the HMM setting, SMC methods are often known as Particle Filtering or Smoothing Methods.
Introducing the State Space Model
Discrete-Time Markov Model

Consider a discrete-time Markov process \{X_n\}, n \ge 1. It is defined by an initial density and a transition density:

X_1 \sim \mu(\cdot), \qquad X_n \mid X_{n-1} = x \sim f(\cdot \mid x)

We can then write the prior distribution of the states:

p(x_{1:n}) = p(x_1, \ldots, x_n) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})

[Figure: Markov chain x_1 -> x_2 -> ... -> x_n]
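As a minimal sketch of this structure (the Gaussian random-walk choices for mu and f below are illustrative assumptions, not from the notes), one can sample a path from the prior and evaluate its log density:

```python
import numpy as np

# Hypothetical choice: mu(.) = N(0, 1), f(.|x) = N(x, 1).
rng = np.random.default_rng(0)
n = 5

def log_norm_pdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Draw one path x_{1:n} from the prior: X_1 ~ mu, X_k | X_{k-1} ~ f.
x = np.empty(n)
x[0] = rng.normal(0.0, 1.0)
for k in range(1, n):
    x[k] = rng.normal(x[k - 1], 1.0)

# Evaluate log p(x_{1:n}) = log mu(x_1) + sum_k log f(x_k | x_{k-1}).
log_prior = log_norm_pdf(x[0], 0.0, 1.0)
for k in range(1, n):
    log_prior += log_norm_pdf(x[k], x[k - 1], 1.0)
print(log_prior)
```

The product of densities is computed in log space, which is also how such quantities are handled numerically in particle methods.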
Tracking Example

Consider tracking a target in the XY plane (location and speed in x and y):

X_k = \left(X_{k,1}, V_{k,1}, X_{k,2}, V_{k,2}\right)^T

We consider the constant velocity model:

X_k = A X_{k-1} + W_k, \qquad W_k \sim \mathcal{N}(0, \Sigma)

A = \begin{pmatrix} A_{CV} & 0 \\ 0 & A_{CV} \end{pmatrix}, \quad
A_{CV} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix}, \quad
\Sigma = \begin{pmatrix} \Sigma_{CV} & 0 \\ 0 & \Sigma_{CV} \end{pmatrix}, \quad
\Sigma_{CV} = \sigma^2 \begin{pmatrix} T^3/3 & T^2/2 \\ T^2/2 & T \end{pmatrix}

The transition density for this model is then:

f(x_k \mid x_{k-1}) = \mathcal{N}(x_k; A x_{k-1}, \Sigma)
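The matrices above can be assembled and the model simulated directly; the sampling period T and noise scale sigma below are illustrative assumptions:

```python
import numpy as np

T, sigma = 1.0, 0.5
A_cv = np.array([[1.0, T],
                 [0.0, 1.0]])
S_cv = sigma**2 * np.array([[T**3 / 3, T**2 / 2],
                            [T**2 / 2, T]])
# Block-diagonal A and Sigma: the (position, velocity) pairs in the
# x and y coordinates evolve independently.
Z2 = np.zeros((2, 2))
A = np.block([[A_cv, Z2], [Z2, A_cv]])
Sigma = np.block([[S_cv, Z2], [Z2, S_cv]])

rng = np.random.default_rng(1)
x = np.zeros(4)                      # (x-pos, x-vel, y-pos, y-vel)
for _ in range(10):                  # X_k = A X_{k-1} + W_k, W_k ~ N(0, Sigma)
    x = A @ x + rng.multivariate_normal(np.zeros(4), Sigma)
print(x)
```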
Speech Enhancement

We model speech signals as an autoregressive (AR) process, i.e.

S_k = \sum_{i=1}^{d} \alpha_i S_{k-i} + V_k, \qquad V_k \sim \mathcal{N}(0, \sigma_s^2)

We can write this in matrix form as follows:

U_k = A U_{k-1} + B V_k, \qquad U_k = \left(S_k, \ldots, S_{k-d+1}\right)^T

A = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_d \\ 1 & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix}, \qquad
B = (1, 0, \ldots, 0)^T

The transition density is now:

f_U(u_k \mid u_{k-1}) = \mathcal{N}\!\left(u_k^{(1)}; A^{(1)} u_{k-1}, \sigma_s^2\right) \delta_{u_{k-1}^{(1:d-1)}}\!\left(u_k^{(2:d)}\right)

where A^{(1)} = (\alpha_1, \ldots, \alpha_d) denotes the first row of A.
Speech Enhancement

We can also consider the AR coefficients to be time dependent:

\alpha_k = \alpha_{k-1} + W_k, \qquad W_k \sim \mathcal{N}(0, \sigma_\alpha^2 I_d), \quad \text{where } \alpha_k = \left(\alpha_{k,1}, \ldots, \alpha_{k,d}\right)^T

f(\alpha_k \mid \alpha_{k-1}) = \mathcal{N}\!\left(\alpha_k; \alpha_{k-1}, \sigma_\alpha^2 I_d\right)

Thus, for non-stationary speech signals, the process X_k = (\alpha_k, U_k) is Markov with transition density

f(x_k \mid x_{k-1}) = \mathcal{N}\!\left(\alpha_k; \alpha_{k-1}, \sigma_\alpha^2 I_d\right) \mathcal{N}\!\left(u_k^{(1)}; \alpha_k^T u_{k-1}, \sigma_s^2\right) \delta_{u_{k-1}^{(1:d-1)}}\!\left(u_k^{(2:d)}\right)
The Dynamics of an Asset

The Heston model (1993) describes the dynamics of an asset price S_t with the following model for X_t = \log(S_t):

dX_t = \mu\, dt + dW_t + dZ_t

where Z_t is a jump process and W_t is Brownian motion. We approximate this (time integration) by a discrete-time Markov process

X_{t+\delta} = X_t + \delta \mu + W_{t+\delta,t} + Z_{t+\delta,t}

The same type of model is used for biochemical networks, disease and population dynamics, etc.

D. Wilkinson, Stochastic Modelling for Systems Biology, Second Edition, 2006.
E. Ionides, Inference for Nonlinear Dynamical Systems, PNAS, 2006.
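The discrete-time approximation can be simulated directly; in the sketch below the jump process is taken, as an illustrative assumption, to be compound Poisson with Gaussian jump sizes, and all parameter values are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
delta, mu, sigma = 1.0 / 252, 0.05, 0.2       # daily steps, drift, diffusion
jump_rate, jump_scale = 5.0, 0.03             # jumps per unit time, jump size
n_steps = 252

x = np.empty(n_steps + 1)
x[0] = np.log(100.0)                          # X_0 = log(S_0)
for t in range(n_steps):
    w = sigma * np.sqrt(delta) * rng.normal()          # Brownian increment
    n_jumps = rng.poisson(jump_rate * delta)           # jumps in (t, t+delta]
    z = jump_scale * rng.normal(size=n_jumps).sum()    # summed jump sizes
    x[t + 1] = x[t] + delta * mu + w + z               # X_{t+d} = X_t + d*mu + W + Z
prices = np.exp(x)                            # back to the asset price S_t
```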
Example: The State Space Model

Let us discuss in some detail a very popular dynamic system: the state space model, now including observations.

A state space model is an extension of a Markov chain that captures the sequential relations among hidden variables. It is a dynamic system with two major parts: a hidden Markov chain x_1, x_2, ..., x_n and the observations y_1, y_2, ..., y_n.

[Figure: hidden Markov chain x_1 -> x_2 -> ... -> x_n with observations y_1, y_2, ..., y_n]
The State Space Model

The two parts can be expressed by equations:

State equation: \{X_n\}, n \ge 1, is a latent/hidden Markov process with

X_1 \sim \mu(\cdot) \quad \text{and} \quad X_n \mid X_{n-1} = x_{n-1} \sim f(\cdot \mid x_{n-1})

Observation equation: \{Y_n\}, n \ge 1, is an observation process, with the observations being conditionally independent given \{X_n\}, n \ge 1:

Y_n \mid X_n = x_n \sim g(\cdot \mid x_n)

Since the observations \{y_n\} are conditionally independent given the Markov states \{x_n\}, the likelihood is

p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} g(y_i \mid x_i)

Our aim is to recover \{X_n\}, n \ge 1, given \{Y_n\}, n \ge 1.
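A state space model is easy to simulate forward; the sketch below uses an assumed scalar Gaussian model (AR(1) states observed in noise, hypothetical parameters) just to make the two equations concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
phi, sigma_v, sigma_w = 0.9, 1.0, 0.5   # illustrative parameter values

x = np.empty(n)
y = np.empty(n)
x[0] = rng.normal(0.0, 1.0)                           # X_1 ~ mu = N(0, 1)
y[0] = x[0] + sigma_w * rng.normal()                  # Y_1 | X_1 ~ g
for k in range(1, n):
    x[k] = phi * x[k - 1] + sigma_v * rng.normal()    # f(.|x) = N(phi*x, sigma_v^2)
    y[k] = x[k] + sigma_w * rng.normal()              # g(.|x) = N(x, sigma_w^2)
```

The inference problem of the notes is the reverse direction: recover the hidden path x given only y.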
The State Space Model: Examples

A Linear Gaussian State Space Model:

X_1 \sim \mathcal{N}(m, \Sigma), \qquad X_n = A X_{n-1} + B V_n
Y_n = C X_n + D W_n, \qquad \text{where } V_n \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_v) \text{ and } W_n \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_w)

A Stochastic Volatility Model:

X_1 \sim \mathcal{N}\!\left(0, \frac{\sigma^2}{1 - \alpha^2}\right), \qquad X_n = \alpha X_{n-1} + \sigma V_n
Y_n = \beta \exp(X_n / 2)\, W_n, \qquad \text{where } |\alpha| < 1, \; V_n \overset{i.i.d.}{\sim} \mathcal{N}(0, 1) \text{ and } W_n \overset{i.i.d.}{\sim} \mathcal{N}(0, 1)

For the stochastic volatility model, the transition and observation densities are

f(x_n \mid x_{n-1}) = \mathcal{N}\!\left(x_n; \alpha x_{n-1}, \sigma^2\right), \qquad g(y_n \mid x_n) = \mathcal{N}\!\left(y_n; 0, \beta^2 \exp(x_n)\right)
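The stochastic volatility model above can be simulated in a few lines (the parameter values are illustrative assumptions); note the stationary initial law N(0, sigma^2/(1 - alpha^2)):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
alpha, sigma, beta = 0.91, 1.0, 0.5     # assumed values, |alpha| < 1

x = np.empty(n)
y = np.empty(n)
x[0] = rng.normal(0.0, sigma / np.sqrt(1 - alpha**2))   # stationary X_1
y[0] = beta * np.exp(x[0] / 2) * rng.normal()
for k in range(1, n):
    x[k] = alpha * x[k - 1] + sigma * rng.normal()      # X_n = alpha X_{n-1} + sigma V_n
    y[k] = beta * np.exp(x[k] / 2) * rng.normal()       # Y_n = beta exp(X_n/2) W_n
```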
Tracking Example

The simplest linear observation model is of the form:

Y_k = C X_k + D E_k, \quad E_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \Sigma_e) \;\Rightarrow\; g(y_k \mid x_k) = \mathcal{N}\!\left(y_k; C x_k, \Sigma_e\right)

The non-linear version (bearings-only tracking) is more popular:

Y_k = \tan^{-1}\!\left(\frac{X_{k,2}}{X_{k,1}}\right) + E_k, \quad E_k \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2) \;\Rightarrow\; g(y_k \mid x_k) = \mathcal{N}\!\left(y_k; \tan^{-1}\!\left(\frac{x_{k,2}}{x_{k,1}}\right), \sigma^2\right)

Note that the mean of the Gaussian is a highly non-linear function of the state.
The State Space Model

At time n, we have a total of n observations, and the target distribution to be estimated is p(x_{1:n} \mid y_{1:n}).

The target distribution is "time-varying": the posterior distribution should be updated as new observations arrive. Thus we need to estimate a sequence of distributions indexed by time,

p(x_1 \mid y_1), \; p(x_{1:2} \mid y_{1:2}), \; \ldots, \; p(x_{1:n} \mid y_{1:n})

with likelihood and prior

\text{Likelihood:} \quad p(y_1, \ldots, y_n \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} g(y_i \mid x_i)

\text{Prior:} \quad p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})
Bayesian Inference in State Space Models
Bayesian Inference in State-Space Models

In Bayesian estimation, the target distribution (posterior) for such a state-space model is p(x_{1:n} \mid y_{1:n}).

The state equation for the Markov process defines a prior:

p(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})

The observation equation defines the likelihood:

p(y_{1:n} \mid x_{1:n}) = \prod_{k=1}^{n} g(y_k \mid x_k)

The posterior distribution is known up to a normalizing constant:

p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})} \propto p(x_{1:n}, y_{1:n}) = \underbrace{\mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1})}_{\text{Prior}} \, \underbrace{\prod_{k=1}^{n} g(y_k \mid x_k)}_{\text{Likelihood}}

and the normalizing constant is

p(y_{1:n}) = \int \cdots \int p(x_{1:n})\, p(y_{1:n} \mid x_{1:n})\, dx_{1:n}
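The unnormalized posterior is just the product of prior and likelihood terms, so its log is cheap to evaluate pointwise. A sketch for an assumed scalar Gaussian model (hypothetical parameters phi, sv, sw):

```python
import numpy as np

def log_norm(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def log_joint(x, y, phi=0.9, sv=1.0, sw=0.5):
    """log p(x_{1:n}, y_{1:n}) = log mu + sum log f + sum log g."""
    lp = log_norm(x[0], 0.0, 1.0)                       # log mu(x_1)
    lp += np.sum(log_norm(x[1:], phi * x[:-1], sv**2))  # sum_k log f(x_k | x_{k-1})
    lp += np.sum(log_norm(y, x, sw**2))                 # sum_k log g(y_k | x_k)
    return lp

x = np.array([0.2, 0.1, -0.3])
y = np.array([0.1, 0.0, -0.2])
val = log_joint(x, y)
print(val)
```

It is exactly this pointwise-evaluable, unnormalized density gamma_n that importance sampling and SMC exploit; the intractable part is the normalizing integral p(y_{1:n}).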
Bayesian Inference in State-Space Models

In this lecture, our target distribution is:

\pi_n(x_{1:n}) = p(x_{1:n} \mid y_{1:n}) = \frac{\gamma_n(x_{1:n})}{Z_n}, \qquad \gamma_n(x_{1:n}) = p(x_{1:n}, y_{1:n}), \qquad Z_n = p(y_{1:n})

The posterior and the marginal likelihood do not admit closed forms unless \{X_n\} and \{Y_n\} follow linear Gaussian equations, or \{X_n\} takes values in a finite state space.
Bayesian Inference in State-Space Models

From the posterior distribution, one can compute useful point estimates, e.g. the joint MAP estimate

\arg\max_{x_{1:n}} p(x_{1:n} \mid y_{1:n})

One can also compute the MAP estimate for components,

\arg\max_{x_k} p(x_k \mid y_{1:n}), \qquad p(x_k \mid y_{1:n}) = \int \cdots \int p(x_{1:n} \mid y_{1:n})\, dx_{1:k-1}\, dx_{k+1:n}

The posterior mean (minimum mean square error estimate) can also be computed:

\mathbb{E}\left[X_k \mid y_{1:n}\right] = \int x_k\, p(x_k \mid y_{1:n})\, dx_k
Particle Motion in a Random Medium

Consider a Markovian particle \{X_n\}, n \ge 1, evolving in a random medium as follows:

X_1 \sim \mu(\cdot) \quad \text{and} \quad X_{n+1} \mid X_n = x \sim f(\cdot \mid x)

At time n, the probability for the particle to be killed is 1 - g(X_n), where 0 \le g(x) \le 1 for any x \in E.

Let T be the time at which the particle is killed. We want to compute the probability \Pr(T > n).
Particle Motion in a Random Medium

Starting from t = 1, given the current state x_1, the probability for the particle to survive is g(x_1).

Thus the joint probability of {particle at state x_1, particle survives} is

\mu(x_1)\, g(x_1)

Integrating over x_1, the probability that a particle survives at time t = 1 is

\int \mu(x_1)\, g(x_1)\, dx_1
Particle Motion in a Random Medium

At t = 2, given the state x_1, the current state x_2 is determined by the transition probability f(x_2 \mid x_1).

The probability for such a particle to survive at time t = 2 is also determined by the current state x_2, i.e. the probability is g(x_2).

If the particle survives to time t = 2, then:
1. at time 1, the particle survives with probability g(x_1);
2. state x_1 determines the current state x_2 with probability f(x_2 \mid x_1);
3. the probability to survive at time t = 2 is g(x_2).

The joint probability of the three events is

\mu(x_1)\, f(x_2 \mid x_1)\, g(x_1)\, g(x_2)

where \mu(x_1) f(x_2 \mid x_1) determines the random states and g(x_1) g(x_2) the probability to survive at each time.
Particle Motion in a Random Medium

This can be considered as a typical Hidden Markov Model:

Markov chain (state equation): \quad x_k \sim f(x_k \mid x_{k-1})
Survival (observation equation): \quad y_k \sim g(x_k)

The probability density for the particle to survive to time t = n is

\mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)

Integrating over the state variables x_k, we obtain the probability for the particle to survive to time t = n:

\Pr(T > n) = \mathbb{E}\left[\text{probability of not being killed given } X_{1:n}\right] = \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)\, dx_{1:n}
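This survival probability can be estimated by plain Monte Carlo over particle paths; the Gaussian random-walk dynamics and the survival function g(x) = exp(-x^2/4) below are hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 10, 100_000

def g(x):
    # assumed survival probability, 0 <= g(x) <= 1
    return np.exp(-x**2 / 4.0)

# Vectorised over N independent paths: accumulate the product
# g(X_1) * ... * g(X_n) along each path, then average.
x = rng.normal(0.0, 1.0, size=N)          # X_1 ~ mu = N(0, 1)
surv = g(x)
for _ in range(n - 1):
    x = x + rng.normal(0.0, 1.0, size=N)  # X_{k+1} | X_k ~ N(X_k, 1)
    surv *= g(x)
prob_estimate = surv.mean()               # estimates Pr(T > n)
print(prob_estimate)
```

For large n this naive estimator degrades (most paths carry negligible survival weight), which is one motivation for the interacting-particle methods of these notes.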
Particle Motion in a Random Medium

To place this calculation in our SMC framework, we define

\gamma_n(x_{1:n}) = \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)

Then the integration needed to compute the required probability is just the normalization constant of \gamma_n(x_{1:n}):

Z_n = \int \mu(x_1) \prod_{k=2}^{n} f(x_k \mid x_{k-1}) \cdot \prod_{k=1}^{n} g(x_k)\, dx_{1:n}, \qquad \pi_n(x_{1:n}) = \frac{\gamma_n(x_{1:n})}{Z_n}, \qquad Z_n = \Pr(T > n)
Bayesian Recursion Formulas for the State Space Model
Bayesian Recursion for the State Space Model

Let us return to our state space model, where the objective is to compute p(x_{1:n} \mid y_{1:n}). We want to calculate this sequentially.

We can write the following recursion equation:

p(x_{1:n} \mid y_{1:n}) = \frac{p(x_{1:n}, y_{1:n})}{p(y_{1:n})} = \frac{p(x_{1:n-1}, y_{1:n-1})}{p(y_{1:n-1})} \cdot \frac{g(y_n \mid x_n)\, f(x_n \mid x_{n-1})}{p(y_n \mid y_{1:n-1})} = p(x_{1:n-1} \mid y_{1:n-1})\, \frac{g(y_n \mid x_n)\, f(x_n \mid x_{n-1})}{p(y_n \mid y_{1:n-1})}

where the prediction of y_n given y_{1:n-1} is:

p(y_n \mid y_{1:n-1}) = \int p(y_n, x_n \mid y_{1:n-1})\, dx_n = \int g(y_n \mid x_n)\, p(x_n \mid y_{1:n-1})\, dx_n = \iint g(y_n \mid x_n)\, f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}\, dx_n

We can write our update equation above in two recursive steps:

\text{Step I - Prediction:} \quad p(x_{1:n} \mid y_{1:n-1}) = f(x_n \mid x_{n-1})\, p(x_{1:n-1} \mid y_{1:n-1})

\text{Step II - Update:} \quad p(x_{1:n} \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})} \propto g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})
Prediction-Updating for the Marginal

A two-step prediction/update recursion for the marginal (filtering) distributions p(x_n \mid y_{1:n}) can also easily be derived:

\text{Step I - Prediction:} \quad p(x_n \mid y_{1:n-1}) = \int p(x_n, x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \int p(x_n \mid x_{n-1}, y_{1:n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1} = \int f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}

\text{Step II - Update:} \quad p(x_n \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_n \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})}

where

p(y_n \mid y_{1:n-1}) = \iint g(y_n \mid x_n)\, f(x_n \mid x_{n-1})\, p(x_{n-1} \mid y_{1:n-1})\, dx_{n-1}\, dx_n

This recursion leads to the Kalman filter for linear Gaussian models and to the standard HMM filter for finite state spaces.

Note that in the context of SMC these are not directly useful results. Our key emphasis remains the calculation of p(x_{1:n} \mid y_{1:n}), even if our interest is in computing the marginals \{p(x_n \mid y_{1:n})\}.
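On a finite state space the prediction/update integrals become matrix-vector products, so the recursion can be run exactly. The 3-state transition matrix and Gaussian observation model below are illustrative assumptions, not from the notes:

```python
import numpy as np

states = np.array([-1.0, 0.0, 1.0])
F = np.array([[0.80, 0.15, 0.05],     # F[i, j] = f(x_n = s_j | x_{n-1} = s_i)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
sigma_w = 0.5

def g(y, x):                          # g(y | x) proportional to N(y; x, sigma_w^2)
    return np.exp(-0.5 * ((y - x) / sigma_w) ** 2)

ys = [0.9, 1.1, -0.2, 0.1]            # hypothetical observations
p = np.full(3, 1.0 / 3.0)             # initial law p(x_1), assumed uniform
log_Z = 0.0                           # accumulates log p(y_{1:n}) up to a constant
for k, y in enumerate(ys):
    if k > 0:
        p = F.T @ p                   # prediction: p(x_k | y_{1:k-1})
    lik = g(y, states)
    norm = np.sum(lik * p)            # p(y_k | y_{1:k-1}), up to a constant
    log_Z += np.log(norm)
    p = lik * p / norm                # update: p(x_k | y_{1:k})
print(p)
```

This is exactly the "standard HMM filter" the slide refers to; the same two lines (predict, then reweight and normalize) are what a particle filter approximates when the integrals cannot be done exactly.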
Recursive Calculation of the Marginal Likelihood

To compute the normalizing factor p(y_{1:n}), one can use a recursive calculation that avoids high-dimensional integration:

p(y_{1:n}) = p(y_1) \prod_{k=2}^{n} p(y_k \mid y_{1:k-1})

To compute each factor p(y_k \mid y_{1:k-1}), we use the following recursion:

p(y_k \mid y_{1:k-1}) = \int p(y_k, x_k \mid y_{1:k-1})\, dx_k = \int g(y_k \mid x_k)\, p(x_k \mid y_{1:k-1})\, dx_k = \iint g(y_k \mid x_k)\, f(x_k \mid x_{k-1})\, p(x_{k-1} \mid y_{1:k-1})\, dx_{k-1}\, dx_k

We can now see that the calculation of p(y_{1:n}) is a product of lower-dimensional integrals.
Forward Filtering Backward Smoothing

One can also estimate the marginal smoothing distributions p(x_k \mid y_{1:n}), k = 1, \ldots, n (an offline estimate, once all measurements y_{1:n} are collected).

Indeed, one can show:

\text{Step I - Forward pass:} compute and store p(x_k \mid y_{1:k}) and p(x_k \mid y_{1:k-1}), k = 1, 2, \ldots, n (use the prediction and update recursions from an earlier slide).

\text{Step II - Backward pass } (k = n-1, n-2, \ldots, 1):

p(x_k \mid y_{1:n}) = p(x_k \mid y_{1:k}) \int \frac{f(x_{k+1} \mid x_k)\, p(x_{k+1} \mid y_{1:n})}{p(x_{k+1} \mid y_{1:k})}\, dx_{k+1}

This follows from

p(x_k \mid y_{1:n}) = \int p(x_k, x_{k+1} \mid y_{1:n})\, dx_{k+1} = \int p(x_k \mid x_{k+1}, y_{1:n})\, p(x_{k+1} \mid y_{1:n})\, dx_{k+1} = \int p(x_k \mid x_{k+1}, y_{1:k})\, p(x_{k+1} \mid y_{1:n})\, dx_{k+1}

where we used (see Appendix next):

p(x_k \mid x_{k+1}, y_{1:n}) = p(x_k \mid x_{k+1}, y_{1:k})
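The two passes can be run exactly on a finite state space. The sketch below reuses the kind of 3-state model assumed earlier (all numerical values illustrative):

```python
import numpy as np

states = np.array([-1.0, 0.0, 1.0])
F = np.array([[0.80, 0.15, 0.05],     # F[i, j] = f(s_j | s_i)
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
sigma_w = 0.5
ys = [0.9, 1.1, -0.2, 0.1]
n = len(ys)

def g(y, x):
    return np.exp(-0.5 * ((y - x) / sigma_w) ** 2)

# Forward pass: store filtering p(x_k | y_{1:k}) and predictive
# p(x_k | y_{1:k-1}) distributions for every k.
filt = np.empty((n, 3))
pred = np.empty((n, 3))
p = np.full(3, 1.0 / 3.0)
for k, y in enumerate(ys):
    pred[k] = p                       # p(x_k | y_{1:k-1})  (prior at k = 0)
    p = g(y, states) * p
    p /= p.sum()
    filt[k] = p                       # p(x_k | y_{1:k})
    p = F.T @ p                       # predictive for step k + 1

# Backward pass: p(x_k | y_{1:n}) = filt_k(x_k) *
#   sum_j f(s_j | x_k) * smooth_{k+1}(s_j) / pred_{k+1}(s_j)
smooth = np.empty((n, 3))
smooth[-1] = filt[-1]
for k in range(n - 2, -1, -1):
    ratio = smooth[k + 1] / pred[k + 1]
    smooth[k] = filt[k] * (F @ ratio)
print(smooth[0])
```

Each backward step only reweights the stored filtering distribution, so no new normalization is needed: the recursion preserves total probability one at every k.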
Appendix

Here we prove the identity used on the earlier slide:

p(x_k \mid x_{k+1}, y_{1:n}) = p(x_k \mid x_{k+1}, y_{1:k})

Note that

p(x_k \mid x_{k+1}, y_{1:n}) = \frac{p(x_k, x_{k+1}, y_{1:n})}{p(x_{k+1}, y_{1:n})} = \frac{\int p(x_{1:n}, y_{1:n})\, dx_{1:k-1}\, dx_{k+2:n}}{\int p(x_{1:n}, y_{1:n})\, dx_{1:k}\, dx_{k+2:n}}

Substituting p(x_{1:n}, y_{1:n}) = \mu(x_1) \prod_{i=2}^{n} f(x_i \mid x_{i-1}) \prod_{i=1}^{n} g(y_i \mid x_i), the factors involving x_{k+2:n} integrate to the common term

h(x_{k+1}, y_{k+2:n}) = \int \prod_{i=k+2}^{n} f(x_i \mid x_{i-1})\, g(y_i \mid x_i)\, dx_{k+2:n}

which depends only on x_{k+1} (and y_{k+2:n}) and therefore, together with g(y_{k+1} \mid x_{k+1}), cancels between numerator and denominator. What remains is

p(x_k \mid x_{k+1}, y_{1:n}) = \frac{\int \mu(x_1) \prod_{i=2}^{k+1} f(x_i \mid x_{i-1}) \prod_{i=1}^{k} g(y_i \mid x_i)\, dx_{1:k-1}}{\int \mu(x_1) \prod_{i=2}^{k+1} f(x_i \mid x_{i-1}) \prod_{i=1}^{k} g(y_i \mid x_i)\, dx_{1:k}} = \frac{p(x_k, x_{k+1}, y_{1:k})}{p(x_{k+1}, y_{1:k})} = p(x_k \mid x_{k+1}, y_{1:k})
Forward-Backward (Two-Filter) Smoother

One can also estimate the marginal smoothing distribution p(x_k \mid y_{1:n}), k = 1, \ldots, n, as follows (see proof on the following slide):

\text{Step I - Backward information filter:}

p(y_{k+1:n} \mid x_k) = \int p(y_{k+1:n}, x_{k+1} \mid x_k)\, dx_{k+1} = \int p(y_{k+1:n} \mid x_{k+1})\, f(x_{k+1} \mid x_k)\, dx_{k+1} = \int p(y_{k+2:n} \mid x_{k+1})\, g(y_{k+1} \mid x_{k+1})\, f(x_{k+1} \mid x_k)\, dx_{k+1}

\text{Step II - Update:}

p(x_k \mid y_{1:n}) = \frac{p(x_k \mid y_{1:k})\, p(y_{k+1:n} \mid x_k)}{p(y_{k+1:n} \mid y_{1:k})}

Note that p(y_{k+1:n} \mid x_k) is not a probability density in x_k: we can have \int p(y_{k+1:n} \mid x_k)\, dx_k = \infty. Ignoring this can lead to wrong algorithms.

This is known as the forward-backward smoother.
Proof of the Two-Filter Smoother

Note that p(y_{k+1:n} \mid x_k, y_{1:k}) = p(y_{k+1:n} \mid x_k): given x_k, the future observations are independent of the past. Indeed, writing each term with the joint density p(x_{1:n}, y_{1:n}) = \mu(x_1) \prod_{i=2}^{n} f(x_i \mid x_{i-1}) \prod_{i=1}^{n} g(y_i \mid x_i),

p(y_{k+1:n} \mid x_k, y_{1:k}) = \frac{p(x_k, y_{1:n})}{p(x_k, y_{1:k})} = \int \prod_{i=k+1}^{n} f(x_i \mid x_{i-1})\, g(y_i \mid x_i)\, dx_{k+1:n}

which depends on x_k only (not on y_{1:k}) and hence coincides with p(y_{k+1:n} \mid x_k).

The update rule is then:

p(x_k \mid y_{1:n}) = \frac{p(x_k, y_{k+1:n} \mid y_{1:k})}{p(y_{k+1:n} \mid y_{1:k})} = \frac{p(y_{k+1:n} \mid x_k, y_{1:k})\, p(x_k \mid y_{1:k})}{p(y_{k+1:n} \mid y_{1:k})} = \frac{p(x_k \mid y_{1:k})\, p(y_{k+1:n} \mid x_k)}{p(y_{k+1:n} \mid y_{1:k})}
Bayesian Recursion for the State Space Model

Let us return to our main objective: computing p(x_{1:n} \mid y_{1:n}) through the recursion

\text{Step I - Prediction:} \quad p(x_{1:n} \mid y_{1:n-1}) = f(x_n \mid x_{n-1})\, p(x_{1:n-1} \mid y_{1:n-1})

\text{Step II - Update:} \quad p(x_{1:n} \mid y_{1:n}) = \frac{g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})}{p(y_n \mid y_{1:n-1})} \propto g(y_n \mid x_n)\, p(x_{1:n} \mid y_{1:n-1})

We will apply sequential Monte Carlo methods to approximate this target distribution.
Online Bayesian Parameter Estimation
Online Bayesian Parameter Estimation

Assume that our state model is defined with some unknown static parameter \theta with some prior p(\theta):

X_1 \sim \mu_\theta(\cdot) \quad \text{and} \quad X_n \mid X_{n-1} = x_{n-1} \sim f_\theta(\cdot \mid x_{n-1})

Y_n \mid X_n = x_n \sim g_\theta(\cdot \mid x_n)

Given data y_{1:n}, inference now is based on:

p(\theta, x_{1:n} \mid y_{1:n}) = p(x_{1:n} \mid y_{1:n}, \theta)\, p(\theta \mid y_{1:n}), \qquad \text{where} \quad p(\theta \mid y_{1:n}) \propto p(y_{1:n} \mid \theta)\, p(\theta)

We can use standard SMC, but on the extended space Z_n = (X_n, \theta_n), with

f(z_n \mid z_{n-1}) = f_{\theta_{n-1}}(x_n \mid x_{n-1})\, \delta_{\theta_{n-1}}(\theta_n), \qquad g(y_n \mid z_n) = g_{\theta_n}(y_n \mid x_n)

Note that \theta is a static parameter: it does not evolve with n.
Maximum Likelihood Parameter Estimation

Standard approaches for parameter estimation consist of computing the Maximum Likelihood (ML) estimate

\theta_{ML} = \arg\max_\theta \log p_\theta(y_{1:n})

The likelihood function can be multimodal, and there is no guarantee of finding its global optimum.

Standard (stochastic) gradient algorithms can be used to find a local maximum, e.g. based on Fisher's identity:

\nabla_\theta \log p_\theta(y_{1:n}) = \int \nabla_\theta \log p_\theta(x_{1:n}, y_{1:n})\, p_\theta(x_{1:n} \mid y_{1:n})\, dx_{1:n}

These algorithms can work decently, but it can be difficult to scale the components of the gradients.

Note that these algorithms involve computing p_\theta(x_{1:n} \mid y_{1:n}), which is one of our main SMC algorithmic results.
Expectation/Maximization for HMM

One can also use the EM algorithm:

\theta_i = \arg\max_\theta Q(\theta_{i-1}, \theta), \qquad Q(\theta_{i-1}, \theta) = \int \log p_\theta(x_{1:n}, y_{1:n})\, p_{\theta_{i-1}}(x_{1:n} \mid y_{1:n})\, dx_{1:n}

Where we used:

p_\theta(x_{1:n}, y_{1:n}) = \mu_\theta(x_1) \prod_{k=2}^{n} f_\theta(x_k \mid x_{k-1}) \prod_{k=1}^{n} g_\theta(y_k \mid x_k)

so that

Q(\theta_{i-1}, \theta) = \int \log\left[\mu_\theta(x_1)\, g_\theta(y_1 \mid x_1)\right] p_{\theta_{i-1}}(x_1 \mid y_{1:n})\, dx_1 + \sum_{k=2}^{n} \int \log\left[f_\theta(x_k \mid x_{k-1})\, g_\theta(y_k \mid x_k)\right] p_{\theta_{i-1}}(x_{k-1:k} \mid y_{1:n})\, dx_{k-1:k}

Implementation of the EM algorithm therefore requires computing expectations with respect to the smoothing distributions p_{\theta_{i-1}}(x_{k-1:k} \mid y_{1:n}).
Closed Form Inference in HMM

We have closed-form solutions for finite state-space HMMs, as all integrals become finite sums.

For linear Gaussian models, all the posterior distributions are Gaussian (Kalman filter).

In most cases of interest, it is not possible to compute the solution in closed form, and we need numerical approximations. This is the case for all non-linear non-Gaussian models.

SMC methods for such problems are, in some sense, asymptotically consistent.
Closed Form Inference in HMM

Common deterministic approximations include:
- Gaussian approximations: Extended Kalman filter, Unscented Kalman filter.
- Gaussian sum approximations.
- Projection filters (similar to variational methods in machine learning).
- Simple discretization of the state space.

Analytical methods work in simple cases, but they are not reliable and it is difficult to diagnose when they fail. Standard discretization of the space is expensive and difficult to implement in high-dimensional scenarios.

We need numerical approximations.
Importance Sampling and its Application to Nonlinear Non-Gaussian Dynamic Models
Review: Importance Sampling

Our goal is to compute an expectation of the form

\pi(f) = \int_A f(\mathbf{x})\, \pi(\mathbf{x})\, d\mathbf{x}

where \pi(\mathbf{x}) is a probability distribution (posterior inference in Bayesian models, Bayesian model validation, etc.).

We assume that \pi(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{Z}, where Z = \int \gamma(\mathbf{x})\, d\mathbf{x} is unknown and \gamma is known pointwise.

The basic idea in Monte Carlo methods is to sample N i.i.d. random variables \mathbf{X}^{(i)} \sim \pi(\cdot) and build the empirical measure

\hat{\pi}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})

Using this:

\hat{\pi}(f) = \frac{1}{N} \sum_{i=1}^{N} f\!\left(\mathbf{X}^{(i)}\right), \qquad \mathbf{X}^{(i)} \overset{i.i.d.}{\sim} \pi(\cdot)

J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York.
Monte Carlo Methods

Using the approximation \hat{\pi} of \pi, the following hold:

\mathbb{E}\!\left[\hat{\pi}(f)\right] = \pi(f), \qquad \mathbb{V}\!\left[\hat{\pi}(f)\right] = \frac{1}{N}\left(\pi(f^2) - \pi(f)^2\right), \qquad \sqrt{N}\left(\hat{\pi}(f) - \pi(f)\right) \Rightarrow \mathcal{N}\!\left(0, \pi(f^2) - \pi(f)^2\right)

Similarly, marginalization is also simple:

\hat{\pi}(x_p) = \int \hat{\pi}(x_1, x_2, \ldots, x_n)\, dx_{1:p-1}\, dx_{p+1:n} = \frac{1}{N} \sum_{i=1}^{N} \delta_{X_p^{(i)}}(x_p)

In MC, the samples automatically concentrate in regions of high probability mass, regardless of the dimension of the space.

However, it is not always easy or effective to sample from the original probability distribution \pi(\mathbf{x}). A more effective strategy is to focus on the regions of "importance" in \pi(\mathbf{x}), so as to save computational resources.

J.S. Liu, Monte Carlo Strategies in Scientific Computing, Chapter 3, Springer-Verlag, New York.
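The unbiasedness and 1/sqrt(N) error rate above are easy to check numerically; the target N(0, 1) and test function f(x) = x^2 (so pi(f) = 1) below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
N = 100_000
X = rng.normal(0.0, 1.0, size=N)        # X^(i) ~ pi, i.i.d.
f = X**2
estimate = f.mean()                     # pi_hat(f), unbiased for pi(f) = 1
std_err = f.std(ddof=1) / np.sqrt(N)    # CLT: error shrinks like 1/sqrt(N)
print(estimate, std_err)
```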
Review: Importance Sampling

We assume that \pi(\mathbf{x}) is only known up to a normalizing constant:

\pi(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{Z}

For any distribution q(\mathbf{x}) such that \pi(\mathbf{x}) > 0 \Rightarrow q(\mathbf{x}) > 0, we can write:

\pi(\mathbf{x}) = \frac{w(\mathbf{x})\, q(\mathbf{x})}{Z}, \qquad \text{where} \quad w(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{q(\mathbf{x})} \quad \text{and} \quad Z = \int w(\mathbf{x})\, q(\mathbf{x})\, d\mathbf{x}

The proposal distribution q(\mathbf{x}) is known as the "importance density" or "trial density"; w(\mathbf{x}) is called the importance weight.

The importance density can be chosen arbitrarily, as any proposal that is easy to sample from:

\mathbf{X}^{(i)} \overset{i.i.d.}{\sim} q \;\Rightarrow\; \hat{q}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})
Review: Importance Sampling

Substitution of \hat{q}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \delta_{\mathbf{X}^{(i)}}(d\mathbf{x}) in the importance sampling identity gives:

\hat{\pi}(d\mathbf{x}) = \frac{w(\mathbf{x})\, \hat{q}(d\mathbf{x})}{\int w(\mathbf{x})\, \hat{q}(d\mathbf{x})} = \frac{\frac{1}{N}\sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right) \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})}{\frac{1}{N}\sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right)} = \sum_{i=1}^{N} W^{(i)}\, \delta_{\mathbf{X}^{(i)}}(d\mathbf{x})

where W^{(i)} \propto w\!\left(\mathbf{X}^{(i)}\right) and \sum_{i=1}^{N} W^{(i)} = 1.

Similarly, we can approximate the normalization factor of our target distribution as follows:

\hat{Z} = \int \frac{\gamma(\mathbf{x})}{q(\mathbf{x})}\, \hat{q}(d\mathbf{x}) = \frac{1}{N} \sum_{i=1}^{N} \frac{\gamma\!\left(\mathbf{X}^{(i)}\right)}{q\!\left(\mathbf{X}^{(i)}\right)} = \frac{1}{N} \sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right)
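A self-contained sketch of these two estimators, for an assumed target known only up to a constant, gamma(x) = exp(-x^2/2) (so Z = sqrt(2*pi)), with a heavier-tailed Gaussian proposal q = N(0, 2^2):

```python
import numpy as np

rng = np.random.default_rng(6)
N = 200_000
s_q = 2.0

X = rng.normal(0.0, s_q, size=N)                        # X^(i) ~ q, i.i.d.
gamma = np.exp(-0.5 * X**2)                             # unnormalised target
q_pdf = np.exp(-0.5 * (X / s_q) ** 2) / (s_q * np.sqrt(2 * np.pi))
w = gamma / q_pdf                                       # w(x) = gamma(x) / q(x)
Z_hat = w.mean()                                        # estimates Z = sqrt(2*pi)
W = w / w.sum()                                         # normalised weights W^(i)
mean_hat = np.sum(W * X)                                # estimates pi(x) = 0
print(Z_hat, mean_hat)
```

Note that Z_hat uses the raw weights (unbiased for Z), while expectations under pi use the self-normalised weights W.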
Review: Importance Sampling

The distribution \pi(\mathbf{x}) is now approximated by a weighted sum of delta masses, where the weights compensate for the discrepancy between \pi(\mathbf{x}) and q(\mathbf{x}):

\hat{\pi}(d\mathbf{x}) = \sum_{i=1}^{N} W^{(i)}\, \delta_{\mathbf{X}^{(i)}}(d\mathbf{x}), \qquad \text{where} \quad W^{(i)} \propto w\!\left(\mathbf{X}^{(i)}\right) \quad \text{and} \quad \sum_{i=1}^{N} W^{(i)} = 1
Review: Importance Sampling

Similarly, calculation of \pi(f) using importance sampling gives:

\hat{\pi}(f) = \int f(\mathbf{x})\, \hat{\pi}(d\mathbf{x}) = \sum_{i=1}^{N} W^{(i)}\, f\!\left(\mathbf{X}^{(i)}\right)

The statistics of this estimate are given, for N \gg 1, with \bar{w}(\mathbf{x}) = \pi(\mathbf{x})/q(\mathbf{x}), as:

\mathbb{E}\!\left[\hat{\pi}(f)\right] - \pi(f) \approx -\frac{1}{N}\, \mathbb{E}_\pi\!\left[\bar{w}(\mathbf{x})\left(f(\mathbf{x}) - \pi(f)\right)\right] \quad \text{(negligible bias)}

\mathbb{V}\!\left[\hat{\pi}(f)\right] \approx \frac{1}{N}\, \mathbb{E}_\pi\!\left[\bar{w}(\mathbf{x})\left(f(\mathbf{x}) - \pi(f)\right)^2\right]
Review: Importance Sampling

We can similarly compute the statistics of the normalization constant estimate

\hat{Z} = \frac{1}{N} \sum_{i=1}^{N} w\!\left(\mathbf{X}^{(i)}\right) = \frac{1}{N} \sum_{i=1}^{N} \frac{\gamma\!\left(\mathbf{X}^{(i)}\right)}{q\!\left(\mathbf{X}^{(i)}\right)}

They are given as:

\mathbb{E}\!\left[\hat{Z}\right] = Z, \qquad \mathbb{V}\!\left[\hat{Z}\right] = \frac{Z^2}{N}\left(\mathbb{E}_q\!\left[\left(\frac{\pi(\mathbf{x})}{q(\mathbf{x})}\right)^{\!2}\right] - 1\right)
Review: Importance Sampling

We select q(\mathbf{x}) as close as possible to \pi(\mathbf{x}).

The variance of the weights is bounded iff

\int \frac{\gamma^2(\mathbf{x})}{q(\mathbf{x})}\, d\mathbf{x} < \infty

In practice, it is sufficient to ensure that the weights are bounded:

w(\mathbf{x}) = \frac{\gamma(\mathbf{x})}{q(\mathbf{x})} < \infty

This is equivalent to saying that q(\mathbf{x}) should have heavier tails than \pi(\mathbf{x}).
Monte Carlo for the State Space Model Recall that we are interested to estimate the high-dimensional
density
For now let us start with a fixed n.
A Monte Carlo approximation (empirical measure) of our target distribution is of the form:
For any function , we can use a Monte Carlo
approximation of its expectation as:
53
( ) ( )( ) ( )1: 1:
1: 1: 1: 1:1:
| n nn n n n
n
pp p
p= ∝
x , yx y x , y
y
( ) ( ) ( )( )1:
( )1: 1: 1: 1: 1: 1:
1
1 , ~in
Ni
n n n n n nNi
p where pN
δ=
= ∑ Xx | y x X x | y
( )1: : nnϕ →x X
( ) ( ) ( ) ( )
( ) ( ) ( )
1: 1:
( )1:
1: 1: 1: 1:
( )1: 1: 1: 1:
1 1
1 1
n nNn
in
n
n n n nNp
N Ni
n n n ni i
p d
dN N
ϕ ϕ
ϕ δ ϕ= =
=
= =
∫
∑ ∑∫
x | y
X
x x | y x
x x x X
X
X
Monte Carlo for the State Space Model

This estimate is asymptotically consistent (it converges towards $\mathbb{E}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]$).

The estimate is unbiased, and its variance gives the following convergence properties:

$$\mathbb{V}\big[\hat{\varphi}_n\big] = \frac{\mathbb{V}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]}{N}, \qquad \sqrt{N}\Big(\hat{\varphi}_n - \mathbb{E}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]\Big) \xrightarrow{d} \mathcal{N}\Big(0,\ \mathbb{V}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]\Big).$$

The rate of convergence is independent of $n$. This does not imply that Monte Carlo beats the curse of dimensionality, since $\mathbb{V}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi_n]$ may itself increase with $n$ (i.e., with time).
Monte Carlo for the State Space Model

The Monte Carlo approximation can easily be used to compute any marginal distribution, e.g. $p(x_k\,|\,\mathbf{y}_{1:n})$:

$$\hat{p}(x_k\,|\,\mathbf{y}_{1:n}) = \int \hat{p}(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\, d\mathbf{x}_{1:k-1}\, d\mathbf{x}_{k+1:n} = \frac{1}{N}\sum_{i=1}^{N}\delta_{X_k^{(i)}}(x_k).$$

Note that the marginal likelihood $p(\mathbf{y}_{1:n})$ cannot be estimated as easily using $\mathbf{X}_{1:n}^{(i)} \sim p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.
Difficulties with Standard Monte Carlo Sampling

It is difficult to sample from our target high-dimensional distribution: $\mathbf{X}_{1:n}^{(i)} \sim p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

MCMC methods are not useful in this context.

As $n$ increases, we would like to be able to sample from $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ with an algorithm that keeps the computational cost fixed at each time step $n$.
Importance Sampling for our State Space Model

Rather than sampling directly from our target distribution $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$, we sample from an importance distribution $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

Note that in the notation here for $q$, $\mathbf{y}_{1:n}$ is used as a parameter, not to indicate any posterior distribution.

The importance distribution needs to satisfy the following properties:

The support of $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ includes the support of $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$, i.e.

$$p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) > 0 \ \Rightarrow\ q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) > 0.$$

It is easy to sample from $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

We use the following identity:

$$p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = \frac{p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})}{\int p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\,d\mathbf{x}_{1:n}} = \frac{w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{\int w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\,d\mathbf{x}_{1:n}}.$$
Importance Sampling for our State Space Model

Let us draw $N$ samples from our importance distribution:

$$\mathbf{X}_{1:n}^{(i)} \sim q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}), \qquad q^N(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = \frac{1}{N}\sum_{i=1}^{N}\delta_{\mathbf{X}_{1:n}^{(i)}}(\mathbf{x}_{1:n}).$$

Then, using the identity in the earlier slide, we obtain the following approximation of our target distribution:

$$\hat{p}(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = \frac{\frac{1}{N}\sum_{i=1}^{N} w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)\,\delta_{\mathbf{X}_{1:n}^{(i)}}(\mathbf{x}_{1:n})}{\frac{1}{N}\sum_{i=1}^{N} w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)} = \sum_{i=1}^{N} W_n^{(i)}\,\delta_{\mathbf{X}_{1:n}^{(i)}}(\mathbf{x}_{1:n}), \qquad W_n^{(i)} = \frac{w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{\sum_{j=1}^{N} w\big(\mathbf{X}_{1:n}^{(j)}, \mathbf{y}_{1:n}\big)}.$$

Note that:

$$\hat{p}(\mathbf{y}_{1:n}) = \int w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q^N(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\,d\mathbf{x}_{1:n} = \frac{1}{N}\sum_{i=1}^{N} w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big).$$
Normalized Weights in Importance Sampling

We defined earlier the unnormalized weights as follows:

$$\text{Unnormalized weights:}\quad w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}) = \frac{p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})} = p(\mathbf{y}_{1:n})\,\underbrace{\frac{p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}}_{\substack{\text{discrepancy between target distribution} \\ \text{and importance distribution}}}.$$

The normalized weights were introduced as:

$$\text{Normalized weights:}\quad W_n^{(i)} = \frac{w\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{\sum_{j=1}^{N} w\big(\mathbf{X}_{1:n}^{(j)}, \mathbf{y}_{1:n}\big)}.$$
Optimal Importance Sampling Distribution

$\hat{p}^N(\mathbf{y}_{1:n})$ is an unbiased estimate of $p(\mathbf{y}_{1:n})$ with variance:

$$\frac{1}{N}\left(\int w^2(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})\, q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})\,d\mathbf{x}_{1:n} - p^2(\mathbf{y}_{1:n})\right).$$

You can bring this variance to zero with the selection

$$q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}).$$

Of course, this is what we wanted to avoid (we want to sample from an easier distribution). However, this result points to the fact that the choice of $q$ needs to be as close as possible to the target distribution.
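The zero-variance claim can be sanity-checked numerically (the Gaussian target below is an illustrative assumption): when $q$ equals the normalized target, every weight $\gamma(x)/q(x)$ is identically the constant $Z$, so the estimate is exact with zero spread.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unnormalized target gamma(x) = exp(-x^2/2); its normalizing constant
# is Z = sqrt(2*pi).
Z_true = np.sqrt(2.0 * np.pi)

def gamma_unnorm(x):
    return np.exp(-0.5 * x ** 2)

def p_pdf(x):                        # the normalized target itself
    return gamma_unnorm(x) / Z_true

# Choose q = p: then w(x) = gamma(x) / p(x) = Z for every x.
X = rng.normal(0.0, 1.0, size=1000)  # samples from q = p = N(0,1)
w = gamma_unnorm(X) / p_pdf(X)       # constant weights, all equal to Z
Z_hat = w.mean()
weight_spread = w.max() - w.min()    # zero up to floating-point rounding
```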
Importance Sampling Estimates

We are interested in an importance sampling approximation of $\mathbb{E}_{p(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}[\varphi]$:

$$\hat{\varphi}_N = \sum_{i=1}^{N} W_n^{(i)}\,\varphi\big(\mathbf{X}_{1:n}^{(i)}\big).$$

This is a biased estimate for a finite $N$, and one can show:

$$\lim_{N\to\infty} N\Big(\mathbb{E}\big[\hat{\varphi}_N\big] - \mathbb{E}_p[\varphi]\Big) = -\int \frac{p^2(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}\Big(\varphi(\mathbf{x}_{1:n}) - \mathbb{E}_p[\varphi]\Big)\,d\mathbf{x}_{1:n},$$

$$\sqrt{N}\Big(\hat{\varphi}_N - \mathbb{E}_p[\varphi]\Big) \xrightarrow{d} \mathcal{N}\left(0,\ \int \frac{p^2(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}{q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})}\Big(\varphi(\mathbf{x}_{1:n}) - \mathbb{E}_p[\varphi]\Big)^2\,d\mathbf{x}_{1:n}\right).$$

The asymptotic bias is of the order $1/N$ (negligible), and the MSE error is:

$$\text{MSE} = \underbrace{\text{bias}^2}_{O(N^{-2})} + \underbrace{\text{variance}}_{O(N^{-1})}.$$
Selection of Importance Sampling Distribution

As discussed before, the importance sampling distribution should be selected so that the weights are bounded,

$$w(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}) \le C \quad \forall\, \mathbf{x}_{1:n} \in \mathcal{X}^n,$$

or, equivalently, so that $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ has heavier tails than $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

To minimize the asymptotic bias, we aim for a $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ that is as close as possible to $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$.

Note that the importance distribution must not only cover the support of the target; it also needs to be chosen cleverly for the particular problem of interest.

For numerical examples and MatLab implementations, please see an earlier lecture on importance sampling.
Effective Sample Size

In our importance sampling approximation of the target $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ using the importance distribution $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ (for a fixed $n$), we would ideally like to have

$$q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}).$$

In this case, all the unnormalized importance weights are equal and their variance is zero.

To assess the quality of the importance sampling approximation, note that for flat functions,

$$\frac{\text{Variance of IS estimate}}{\text{Variance of standard MC estimate}} \approx 1 + \mathbb{V}_{q(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}\big[w(\mathbf{X}_{1:n}\,|\,\mathbf{y}_{1:n})\big].$$

This is often interpreted through the effective sample size ($N$ weighted samples from $q(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$ are approximately equivalent to $M$ unweighted samples from $p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n})$):

$$M = \frac{N}{1 + \mathbb{V}_{q(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}\big[w(\mathbf{X}_{1:n}\,|\,\mathbf{y}_{1:n})\big]} \le N.$$
Effective Sample Size

We often approximate the effective sample size $M$ as follows:

$$ESS = \left(\sum_{i=1}^{N}\big(W_n^{(i)}\big)^2\right)^{-1},$$

since, with weights scaled so that $\mathbb{E}_q[w] = 1$,

$$\mathbb{V}_{q(\mathbf{x}_{1:n}|\mathbf{y}_{1:n})}\big[w(\mathbf{X}_{1:n}\,|\,\mathbf{y}_{1:n})\big] \approx \frac{1}{N}\sum_{i=1}^{N} w^2\big(\mathbf{X}_{1:n}^{(i)}\,|\,\mathbf{y}_{1:n}\big) - 1.$$

We can clearly see that

$$1 \le ESS = \left(\sum_{i=1}^{N}\big(W_n^{(i)}\big)^2\right)^{-1} \le N.$$

We can thus range from $ESS = 1$ (one weight equal to 1, all others zero: very inefficient) to $ESS = N$ (all weights equal to $1/N$: excellent sampling).
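The two extremes of the ESS formula can be verified directly with a minimal sketch:

```python
import numpy as np

def ess(W):
    """Effective sample size from normalized weights W (sum(W) == 1)."""
    return 1.0 / np.sum(W ** 2)

N = 5

# All weights equal to 1/N: ESS = N (excellent sampling).
uniform = np.full(N, 1.0 / N)
ess_uniform = ess(uniform)

# One weight carries all the mass: ESS = 1 (very inefficient).
degenerate = np.zeros(N)
degenerate[0] = 1.0
ess_degenerate = ess(degenerate)
```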
Sequential Importance Sampling

Let us return to our state space model and consider a sequential Monte Carlo approximation of

$$\pi_n(\mathbf{x}_{1:n}) = p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}), \qquad n = 1, 2, \ldots$$

The distributions are known up to a normalizing constant:

$$\pi_n(\mathbf{x}_{1:n}) = \frac{\gamma_n(\mathbf{x}_{1:n})}{Z_n} = \frac{p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n})}{Z_n}, \qquad p(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) \propto p(\mathbf{x}_{1:n}, \mathbf{y}_{1:n}).$$

We want to estimate the expectations of functions $\varphi_n : \mathcal{X}^n \to \mathbb{R}$,

$$\pi_n(\varphi_n) = \int \varphi_n(\mathbf{x}_{1:n})\, \pi_n(\mathbf{x}_{1:n})\, d\mathbf{x}_{1:n},$$

and/or the normalizing constants $Z_n$.

One could use MCMC to sample from $\{\pi_n\},\ n = 1, 2, \ldots$, but this calculation will be slow and cannot compute $\{Z_n\},\ n = 1, 2, \ldots$.
Sequential Importance Sampling

We want to do these calculations sequentially: starting with $\pi_1$ and $Z_1$ at step (time) 1, then proceeding to $\pi_2$ and $Z_2$, etc.

Sequential Monte Carlo (SMC) provides the means to do so, as an alternative algorithm to MCMC.

The key idea is that if $\pi_{n-1}$ does not differ much from $\pi_n$, we should be able to reuse our estimate of $\pi_{n-1}$ to approximate $\pi_n$.
Sequential Importance Sampling

We want to design a sequential importance sampling method to approximate $\{\pi_n\}_{n\ge 1}$ and $\{Z_n\}_{n\ge 1}$.

Assume that at `time 1' we have the approximations $\pi_1^N(x_1) \approx \pi_1(x_1) = p(x_1\,|\,y_1)$ and $Z_1^N \approx Z_1$, built using an importance density $q_1(x_1\,|\,y_1)$:

$$X_1^{(i)} \sim q_1(x_1\,|\,y_1), \quad i = 1, 2, \ldots, N,$$

$$\hat{p}(x_1\,|\,y_1)\,dx_1 = \sum_{i=1}^{N} W_1^{(i)}\,\delta_{X_1^{(i)}}(dx_1), \qquad W_1^{(i)} = \frac{w_1\big(X_1^{(i)}, y_1\big)}{\sum_{j=1}^{N} w_1\big(X_1^{(j)}, y_1\big)},$$

$$Z_1^N = \frac{1}{N}\sum_{i=1}^{N} w_1\big(X_1^{(i)}, y_1\big), \qquad \text{with } w_1(x_1, y_1) = \frac{\gamma_1(x_1)}{q_1(x_1\,|\,y_1)} = \frac{p(x_1, y_1)}{q_1(x_1\,|\,y_1)}.$$
Sequential Importance Sampling

At `time 2', we want to approximate $\pi_2(\mathbf{x}_{1:2}) = p(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})$ and $Z_2$ using an importance density $q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})$.

We want to reuse the samples $X_1^{(i)}$ and $q_1(x_1\,|\,y_1)$ in building the importance sampling approximation of $\pi_2(\mathbf{x}_{1:2})$ and $Z_2$. Let us select

$$q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2}) = q_1(x_1\,|\,y_1)\, q_2(x_2\,|\,\mathbf{y}_{1:2}, x_1).$$

To obtain $\mathbf{X}_{1:2}^{(i)} \sim q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})$, we need to sample as follows:

$$X_2^{(i)}\,\big|\,X_1^{(i)} \sim q_2\big(x_2\,|\,\mathbf{y}_{1:2}, X_1^{(i)}\big).$$

The importance sampling weight for this step is then:

$$w_2(\mathbf{x}_{1:2}, \mathbf{y}_{1:2}) = \frac{p(\mathbf{x}_{1:2}, \mathbf{y}_{1:2})}{q_2(\mathbf{x}_{1:2}\,|\,\mathbf{y}_{1:2})} = \frac{p(\mathbf{x}_{1:2}, \mathbf{y}_{1:2})}{q_1(x_1\,|\,y_1)\, q_2(x_2\,|\,\mathbf{y}_{1:2}, x_1)} = \underbrace{\frac{p(x_1, y_1)}{q_1(x_1\,|\,y_1)}}_{\text{weight from step 1}}\ \underbrace{\frac{p(\mathbf{x}_{1:2}, \mathbf{y}_{1:2})}{p(x_1, y_1)\, q_2(x_2\,|\,\mathbf{y}_{1:2}, x_1)}}_{\text{incremental weight}}.$$
Sequential Importance Sampling

The normalized weights for step 2 are then given as:

$$W_2^{(i)} \propto w_2\big(\mathbf{X}_{1:2}^{(i)}, \mathbf{y}_{1:2}\big) = \underbrace{w_1\big(X_1^{(i)}, y_1\big)}_{\text{weight from step 1}}\ \underbrace{\frac{p\big(\mathbf{X}_{1:2}^{(i)}, \mathbf{y}_{1:2}\big)}{p\big(X_1^{(i)}, y_1\big)\, q_2\big(X_2^{(i)}\,|\,\mathbf{y}_{1:2}, X_1^{(i)}\big)}}_{\text{incremental weight}}.$$

Generalizing to step $n$, we can write:

$$q_n(\mathbf{x}_{1:n}\,|\,\mathbf{y}_{1:n}) = q_{n-1}(\mathbf{x}_{1:n-1}\,|\,\mathbf{y}_{1:n-1})\, q_n(x_n\,|\,\mathbf{y}_{1:n}, \mathbf{x}_{1:n-1}) = q_1(x_1\,|\,y_1)\prod_{k=2}^{n} q_k(x_k\,|\,\mathbf{y}_{1:k}, \mathbf{x}_{1:k-1}).$$

Thus, if

$$\mathbf{X}_{1:n-1}^{(i)} \sim q_{n-1}(\mathbf{x}_{1:n-1}\,|\,\mathbf{y}_{1:n-1}),$$

we sample $X_n$ from

$$X_n^{(i)} \sim q_n\big(x_n\,|\,\mathbf{y}_{1:n}, \mathbf{X}_{1:n-1}^{(i)}\big).$$
Sequential Importance Sampling

The weights for step $n$ are then given as:

$$w_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big) = \frac{p\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{q_n\big(\mathbf{X}_{1:n}^{(i)}\,|\,\mathbf{y}_{1:n}\big)} = \underbrace{\frac{p\big(\mathbf{X}_{1:n-1}^{(i)}, \mathbf{y}_{1:n-1}\big)}{q_{n-1}\big(\mathbf{X}_{1:n-1}^{(i)}\,|\,\mathbf{y}_{1:n-1}\big)}}_{w_{n-1}\big(\mathbf{X}_{1:n-1}^{(i)},\, \mathbf{y}_{1:n-1}\big)}\ \frac{p\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big)}{p\big(\mathbf{X}_{1:n-1}^{(i)}, \mathbf{y}_{1:n-1}\big)\, q_n\big(X_n^{(i)}\,|\,\mathbf{y}_{1:n}, \mathbf{X}_{1:n-1}^{(i)}\big)}.$$

Similarly, the normalized weights are as follows:

$$W_n^{(i)} \equiv W_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big) \propto w_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big).$$

For our state space model, the above update formula takes the form:

$$w_n\big(\mathbf{X}_{1:n}^{(i)}, \mathbf{y}_{1:n}\big) = w_{n-1}\big(\mathbf{X}_{1:n-1}^{(i)}, \mathbf{y}_{1:n-1}\big)\ \frac{f\big(X_n^{(i)}\,|\,X_{n-1}^{(i)}\big)\, g\big(y_n\,|\,X_n^{(i)}\big)}{q_n\big(X_n^{(i)}\,|\,\mathbf{y}_{1:n}, \mathbf{X}_{1:n-1}^{(i)}\big)}.$$

In general, we may need to store all the paths $\big\{\mathbf{X}_{1:n}^{(i)}\big\}$, even if our interest is only to compute $\pi_n(x_n) = p(x_n\,|\,\mathbf{y}_{1:n})$.

In the next lecture, we will return to the sequential importance sampling algorithm and consider a simplified proposal distribution.
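The full recursion can be sketched in a few lines of Python. The linear-Gaussian model below, its parameters, and the choice of the prior $f(x_n\,|\,x_{n-1})$ as proposal (so the incremental weight reduces to $g(y_n\,|\,x_n)$) are all illustrative assumptions for the demo, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy model: x_n = a x_{n-1} + v_n, v_n ~ N(0,1);
#                    y_n = x_n + e_n,      e_n ~ N(0,1).
a, n_steps, N = 0.9, 20, 5000

def log_g(y, x):
    # log observation density log g(y_n | x_n) for the assumed model
    return -0.5 * (y - x) ** 2 - 0.5 * np.log(2.0 * np.pi)

# Simulate one realization of the model to obtain data y_{1:n}.
x_true = np.zeros(n_steps)
y = np.zeros(n_steps)
x_true[0] = rng.normal()
y[0] = x_true[0] + rng.normal()
for n in range(1, n_steps):
    x_true[n] = a * x_true[n - 1] + rng.normal()
    y[n] = x_true[n] + rng.normal()

# SIS with the prior f(x_n | x_{n-1}) as proposal: the incremental
# weight then reduces to g(y_n | x_n).
X = rng.normal(size=N)              # X_1^(i) ~ N(0,1) (assumed initial law)
logw = log_g(y[0], X)               # log w_1
for n in range(1, n_steps):
    X = a * X + rng.normal(size=N)  # X_n^(i) ~ f(x_n | X_{n-1}^(i))
    logw = logw + log_g(y[n], X)    # w_n = w_{n-1} * g(y_n | X_n^(i))

W = np.exp(logw - logw.max())       # stabilized exponentiation
W /= W.sum()                        # normalized weights W_n^(i)
mean_est = np.sum(W * X)            # estimate of E[x_n | y_{1:n}]
ess = 1.0 / np.sum(W ** 2)          # weight degeneracy sets in as n grows
```

Working in log-weights avoids numerical underflow as the products accumulate; monitoring the ESS here shows the weight degeneracy that motivates the resampling step discussed in the next lecture.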