Anderson Tutorial - University of Wisconsin–Madison


Transcript of Anderson Tutorial - University of Wisconsin–Madison

Page 1: Anderson Tutorial - University of Wisconsin–Madison

Stochastic Simulation of Models Arising in the Life Sciences

David F. Anderson∗

[email protected]

Department of Mathematics

University of Wisconsin - Madison

SIAM Life Sciences 2012

August 9th, 2012

Page 2: Anderson Tutorial - University of Wisconsin–Madison

Broad Outline

1. Give two slides on facts for exponential random variables.

2. Discuss mathematical models of interest: stochastic models of biochemical (population) processes; continuous time Markov chains.

3. Gillespie’s algorithm

4. Derive different stochastic representations: the random time change; the next reaction method.

5. Derive “tau-leaping”

6. Derive ODEs and diffusion/Langevin approximations.

7. If time: one more advanced idea – parameter sensitivities.

Page 3: Anderson Tutorial - University of Wisconsin–Madison

The exponential random variable

Lemma. If for i = 1, . . . , n the random variables X_i ∼ Exp(λ_i) are independent, then

X_0 ≡ min_i {X_i} ∼ Exp(λ_0), where λ_0 = ∑_{i=1}^n λ_i.

Lemma. For i = 1, . . . , n, let the random variables X_i ∼ Exp(λ_i) be independent, and let j be the index of the minimum of {X_i}. Then j is a discrete random variable with probability mass function

P{j = i} = λ_i / λ_0, where λ_0 = ∑_{i=1}^n λ_i.

Page 4: Anderson Tutorial - University of Wisconsin–Madison

How to generate an exponential

Suppose we want an exp(λ) random variable from a uniform.

Lemma. If U is uniform(0, 1) and λ > 0, then

X = ln(1/U)/λ

is exponentially distributed with parameter λ.

Proof. For t ≥ 0,

P{X ≤ t} = P{ln(1/U)/λ ≤ t} = P{1/U ≤ e^{λt}} = P{e^{−λt} ≤ U} = 1 − e^{−λt}.
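As a concrete illustration (my addition, not from the slides), here is a minimal Python sketch of this inverse-transform recipe; the rate lam and the sample size are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.5                         # rate parameter lambda > 0 (arbitrary choice)
u = rng.uniform(0.0, 1.0, size=100_000)
x = np.log(1.0 / u) / lam         # X = ln(1/U)/lambda, Exp(lam) by the lemma

# Sanity check: Exp(lam) has mean 1/lam and variance 1/lam^2.
print(x.mean(), 1 / lam)
print(x.var(), 1 / lam**2)
```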

Page 5: Anderson Tutorial - University of Wisconsin–Madison

What type of models:

Substrate-enzyme kinetics:

S + E ⇌ SE (rate constants κ1 forward, κ−1 back),    SE → P + E (rate constant κ2).

Gene transcription & translation:

G → G + M (κ1)    transcription

M → M + P (κ2)    translation

M → ∅ (κ3)    degradation

P → ∅ (κ4)    degradation

G + P ⇌ B (κ5, κ−5)    binding/unbinding of gene

Cartoon representation: [figure omitted].¹

¹ J. Paulsson, Physics of Life Reviews, 2 (2005), 157–175.

Page 6: Anderson Tutorial - University of Wisconsin–Madison

Another example: Viral infection

Let

1. T = viral template.

2. G = viral genome.

3. S = viral structure.

4. V = virus.

Reactions:

R1) T + “stuff” → T + G,    κ1 = 1

R2) G → T,    κ2 = 0.025

R3) T + “stuff” → T + S,    κ3 = 1000

R4) T → ∅,    κ4 = 0.25

R5) S → ∅,    κ5 = 2

R6) G + S → V,    κ6 = 7.5 × 10⁻⁶

• R. Srivastava, L. You, J. Summers, and J. Yin, J. Theoret. Biol., 2002.
• E. Haseltine and J. Rawlings, J. Chem. Phys., 2002.
• K. Ball, T. Kurtz, L. Popovic, and G. Rempala, Annals of Applied Probability, 2006.
• W. E, D. Liu, and E. Vanden-Eijnden, J. Comput. Phys., 2006.

Page 7: Anderson Tutorial - University of Wisconsin–Madison

Population Example: Lotka-Volterra predator-prey model

Think of A as a prey and B as a predator.

A → 2A (κ1),    A + B → 2B (κ2),    B → ∅ (κ3),

with X_A(0) = X_B(0) = 1000 and κ1 = 2, κ2 = 0.002, κ3 = 2.

Deterministic model. Let x(t) = [A(t), B(t)]^T. Then

x′(t) = κ1 x_1(t) [1, 0]^T + κ2 x_1(t) x_2(t) [−1, 1]^T + κ3 x_2(t) [0, −1]^T,

or, in integrated form,

x(t) = x(0) + κ1 ∫_0^t x_1(s) ds [1, 0]^T + κ2 ∫_0^t x_1(s) x_2(s) ds [−1, 1]^T + κ3 ∫_0^t x_2(s) ds [0, −1]^T.

Page 8: Anderson Tutorial - University of Wisconsin–Madison

What type of models:

Modeling choices:

1. Discrete vs. continuous state space.

2. Deterministic vs stochastic dynamics.

Deterministic dynamics (continuous): write down a system of ODEs or integral equations.

Discrete stochastic dynamics. Make the following assumptions:
• well mixed (Markov),
• state changes occur only due to reactions happening,
• the reaction choice is random (this gives a DTMC),
• holding times are exponentially distributed.

It is known that we can:
• simulate paths using Gillespie’s algorithm,
• write down the chemical master equation (dynamics of probabilities),
• write down stochastic equations.

Page 9: Anderson Tutorial - University of Wisconsin–Madison

What is Gillespie’s algorithm?

We can already state the basics of Gillespie’s algorithm.

Repeat the following steps:

1. Determine how long until the next reaction takes place: ∆. This requires an appropriate exponential random variable.

2. Determine which reaction is the next to occur (a discrete RV over the reactions).

3. Update the state by addition/subtraction according to the correct reaction, and advance time by ∆.

4. Repeat for as long as you want.

As with someone simply telling you Euler’s method for an ODE, I doubt this gave you much insight into the processes.

My hope is that more mathematical equations/structure will.

Page 10: Anderson Tutorial - University of Wisconsin–Madison

Biochemical models

• We consider a network of reactions involving d chemical species, S_1, . . . , S_d:

∑_{i=1}^d ν_ik S_i → ∑_{i=1}^d ν′_ik S_i.

• ν_k ∈ Z^d_{≥0}: number of molecules of each chemical species consumed in the kth reaction.

• ν′_k ∈ Z^d_{≥0}: number of molecules of each chemical species created in the kth reaction.

Denote the reaction vector as

ζ_k = ν′_k − ν_k.

Page 11: Anderson Tutorial - University of Wisconsin–Madison

Biochemical models

• Example: each instance of the reaction S1 + S2 → S3 changes the state of the system by the reaction vector:

ζ_k = ν′_k − ν_k = [0, 0, 1]^T − [1, 1, 0]^T = [−1, −1, 1]^T.

• The state of the system X(t) ∈ Z^d_{≥0} gives the number of molecules of each species in the system at time t.

• So if the kth reaction happens at time s ≥ 0, then

X(s) = X(s−) + ζ_k.

• The intensity (or propensity) of the kth reaction is λ_k : Z^d_{≥0} → R.

NOTE:
• λ_k(x) gives the transition rate from x to x + ζ_k.
• The total rate out of x is λ_0(x) = ∑_k λ_k(x).

Page 12: Anderson Tutorial - University of Wisconsin–Madison

Mass-action kinetics

The standard intensity function chosen is mass-action kinetics:

λ_k(x) = κ_k (∏_i ν_ik!) (x choose ν_k) = κ_k ∏_i x_i! / (x_i − ν_ik)!.

Example: If S1 → anything, then λk (x) = κk x1.

Example: If S1 + S2 → anything, then λk (x) = κk x1x2.

Example: If 2S2 → anything, then λk (x) = κk x2(x2 − 1).
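A short Python sketch (my addition) of this mass-action intensity; the helper name mass_action_intensity and the example numbers are illustrative, with nu_k the vector of reactant counts and kappa_k the rate constant.

```python
import math

def mass_action_intensity(x, nu_k, kappa_k):
    """lambda_k(x) = kappa_k * prod_i x_i! / (x_i - nu_ik)!  (falling factorials)."""
    rate = kappa_k
    for xi, vi in zip(x, nu_k):
        if xi < vi:
            return 0.0                          # not enough molecules for the reaction to fire
        rate *= math.factorial(xi) // math.factorial(xi - vi)
    return rate

# The three examples from the slide, with kappa_k = 2:
print(mass_action_intensity([7, 0], [1, 0], 2.0))   # S1 -> ...,    kappa*x1          = 14
print(mass_action_intensity([7, 5], [1, 1], 2.0))   # S1+S2 -> ..., kappa*x1*x2       = 70
print(mass_action_intensity([0, 5], [0, 2], 2.0))   # 2S2 -> ...,   kappa*x2*(x2 - 1) = 40
```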

Page 13: Anderson Tutorial - University of Wisconsin–Madison

Chemical master equation

Called the forward equation in mathematics.

1. For x ∈ Z^d_{≥0}, let

P(x, t | π)

be the probability that the system is in state x at time t, given an initial distribution of π.

2. Then, the forward equations are

(d/dt) P(x, t | π) = −λ_0(x) P(x, t | π) + ∑_k λ_k(x − ζ_k) P(x − ζ_k, t | π).

Some points:

1. Tough to get intuition from these equations for a given system (I think).

2. Can be solved for small, simple systems. Can then compute, for example, expectations.

3. System is often infinite dimensional and impossible to solve.

People often turn to simulation: Gillespie’s algorithm.

Page 14: Anderson Tutorial - University of Wisconsin–Madison

Gillespie’s Algorithm: simulate DTMC + exponential holding times

1. Initialize. Set the initial number of molecules of each species and set t = 0.

2. Calculate the intensity/propensity function, λk , for each reaction.

3. Set λ_0 = ∑_{k=1}^M λ_k.

4. Generate two independent uniform(0,1) random numbers r1 and r2.

5. Set ∆ = (1/λ_0) ln(1/r_1)

(equivalent to generating an exponential RV with parameter λ0).

6. Find µ ∈ {1, . . . , M} such that

λ_0^{−1} ∑_{k=1}^{µ−1} λ_k < r_2 ≤ λ_0^{−1} ∑_{k=1}^{µ} λ_k,

which is equivalent to choosing from reactions {1, . . . , M} with the kth reaction having probability λ_k/λ_0.

7. Set t = t + ∆ and update the number of each molecular species according to reaction µ: X ← X + ζ_µ.

8. Return to step 2 or quit.
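For concreteness, here is a minimal Python sketch of these eight steps (my own illustration, not code from the tutorial), run on the Lotka-Volterra network from the earlier slide; the function and variable names are my own.

```python
import numpy as np

def gillespie(x0, zeta, propensities, t_final, seed=0):
    """Simulate one path of the CTMC with Gillespie's algorithm (steps 1-8 above)."""
    rng = np.random.default_rng(seed)
    t, x = 0.0, np.array(x0, dtype=np.int64)
    times, states = [t], [x.copy()]
    while t < t_final:
        lam = np.array([f(x) for f in propensities])          # step 2: intensities
        lam0 = lam.sum()                                       # step 3
        if lam0 == 0.0:                                        # absorbing state: nothing can fire
            break
        r1, r2 = rng.uniform(size=2)                           # step 4
        t += np.log(1.0 / r1) / lam0                           # step 5: Exp(lam0) holding time
        mu = int(np.searchsorted(np.cumsum(lam) / lam0, r2))   # step 6: which reaction fires
        mu = min(mu, len(lam) - 1)                             # guard against roundoff
        x = x + zeta[mu]                                       # step 7: update the state
        times.append(t); states.append(x.copy())
    return np.array(times), np.array(states)

# Lotka-Volterra: A -> 2A, A + B -> 2B, B -> 0, with kappa = (2, 0.002, 2).
zeta = np.array([[1, 0], [-1, 1], [0, -1]])
props = [lambda x: 2.0 * x[0],
         lambda x: 0.002 * x[0] * x[1],
         lambda x: 2.0 * x[1]]
T, X = gillespie([1000, 1000], zeta, props, t_final=5.0)
print(T[-1], X[-1])
```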

Page 15: Anderson Tutorial - University of Wisconsin–Madison

Simulation and Monte Carlo

We can now generate paths of biochemical and/or population systems and can consider the following general question:

• How can I approximate Ef(X(T)) to some desired tolerance, ε > 0?

Note that for us (those studying biochemical/population processes), this covers many topics:

1. Straight expectations and variances:

EX_i(t),    E[X_i(t)²],

for some i ∈ {1, . . . , d} and t ≥ 0.

2. Expected average values:

E[(1/t) ∫_0^t X_i(s) ds],

3. Expected hitting times,

4. Any probability: P(X(t) ∈ A) = E[1{X(t) ∈ A}].

Page 16: Anderson Tutorial - University of Wisconsin–Madison

A standard simulation problem

Problem: Approximate Ef (X (T )) to some desired tolerance, ε > 0.

Easy!:

• Simulate the CTMC exactly,

• generate independent paths X_[i](t) and use the unbiased, consistent estimator

µ_n = (1/n) ∑_{i=1}^n f(X_[i](t)),

• stop when the desired confidence interval is ±ε.

Page 17: Anderson Tutorial - University of Wisconsin–Madison

What is the computational cost?

Recall,

µ_n = (1/n) ∑_{i=1}^n f(X_[i](t)).

Thus,

Var(µ_n) = O(1/n).

So, if we want σ_n = O(ε), we need

1/√n = O(ε)  =⇒  n = O(ε⁻²).

If N gives average cost (steps) of a path using the exact algorithm:

Total computational complexity = (cost per path) × (# paths) = O(Nε⁻²).

Can be bad if (i) N is large, or (ii) ε is small.

Page 18: Anderson Tutorial - University of Wisconsin–Madison

Benefits/drawbacks

Benefits:

1. Easy to implement.

2. The estimator

µ_n = (1/n) ∑_{i=1}^n f(X_[i](t))

is unbiased and consistent.

Drawbacks:

1. The cost of O(Nε⁻²) could be prohibitively large.

2. For our models, we often have that N is very large.

• For different ideas, a better mathematical understanding of the underlying processes is then useful.

Question: is there a mathematical equation that this CTMC satisfies?

Answer: Yes, there are a few. Each needs an understanding of Poisson processes.

Page 19: Anderson Tutorial - University of Wisconsin–Madison

Background information: The Poisson process

A Poisson process, Y(·), is a model for a series of random observations occurring in time.

(a) Let {e_i} be i.i.d. exponential random variables with parameter one.

(b) Now, put points down on a line with spacing equal to the e_i. [figure: points on a time line with spacings e_1, e_2, e_3, . . .]

• Let Y(t) denote the number of points hit by time t.

• In the figure above, Y(t) = 6.

[figure: sample path of a unit-rate Poisson process, λ = 1]
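As an illustration (my addition), a short Python sketch that builds a unit-rate Poisson process exactly this way, from i.i.d. Exp(1) spacings, and then reads off Y(t):

```python
import numpy as np

rng = np.random.default_rng(1)

def unit_rate_poisson_jump_times(t_max):
    """Jump times of a unit-rate Poisson process Y on [0, t_max], built from Exp(1) spacings."""
    jump_times = []
    s = 0.0
    while True:
        s += rng.exponential(1.0)          # i.i.d. Exp(1) spacings e_i
        if s > t_max:
            return np.array(jump_times)
        jump_times.append(s)

jumps = unit_rate_poisson_jump_times(20.0)
t = 7.5
print("Y(t) =", int(np.searchsorted(jumps, t)))   # number of points hit by time t
```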

Page 20: Anderson Tutorial - University of Wisconsin–Madison

The Poisson process

Let

• Y be a unit-rate Poisson process,

• and define Y_λ(t) ≡ Y(λt).

Then Y_λ is a Poisson process with parameter λ. [figure: points on a time line with spacings e_1, e_2, e_3, . . .]

Intuition: The Poisson process with rate λ is simply the number of points hit (of the unit-rate point process) when we run along the time frame at rate λ.

[figure: sample path of a Poisson process with λ = 3]

Page 21: Anderson Tutorial - University of Wisconsin–Madison

The Poisson process

There is no reason λ needs to be constant in time, in which case

Y_λ(t) ≡ Y(∫_0^t λ(s) ds)

is a nonhomogeneous Poisson process with propensity/intensity λ(t) ≥ 0. Thus

P{Y_λ(t + ∆t) − Y_λ(t) > 0 | F_t} = 1 − exp{−∫_t^{t+∆t} λ(s) ds} ≈ λ(t)∆t.

Points:

1. We have “changed time” to convert a unit-rate Poisson process to one which has rate or intensity or propensity λ(t).

2. We will use similar time changes of unit-rate processes to build models of interest.

Page 22: Anderson Tutorial - University of Wisconsin–Madison

Models of interest

Consider the simple system

A + B → C,

where one molecule each of A and B is being converted to one of C.

Biological intuition for standard stochastic model:

P{reaction occurs in (t, t + ∆t] | F_t} ≈ κ X_A(t) X_B(t) ∆t,

where κ is a positive constant, the reaction rate constant.

Page 23: Anderson Tutorial - University of Wisconsin–Madison

Models of interest

A + B → C

Simple book-keeping: if X(t) = (X_A(t), X_B(t), X_C(t)) gives the state at time t,

X(t) = X(0) + R(t) [−1, −1, 1]^T,

where

• R(t) is the # of times the reaction has occurred by time t, and
• X(0) is the initial condition.

Goal: represent R(t) in terms of Poisson process.

Page 24: Anderson Tutorial - University of Wisconsin–Madison

Models of interest

Recall that for A + B → C our intuition was

P{reaction occurs in (t, t + ∆t] | F_t} ≈ κ X_A(t) X_B(t) ∆t,

and that for an inhomogeneous Poisson process with rate λ(t) we have

P{Y_λ(t + ∆t) − Y_λ(t) > 0 | F_t} = 1 − exp{−∫_t^{t+∆t} λ(s) ds} ≈ λ(t)∆t.

This suggests we can model

R(t) = Y(∫_0^t κ X_A(s) X_B(s) ds),

where Y is a unit-rate Poisson process. Hence

[X_A(t), X_B(t), X_C(t)]^T ≡ X(t) = X(0) + [−1, −1, 1]^T Y(∫_0^t κ X_A(s) X_B(s) ds).

This equation uniquely determines X for all t ≥ 0.

Page 25: Anderson Tutorial - University of Wisconsin–Madison

General stochastic models of biochemical reactions

• We consider a network of reactions involving d chemical species, S_1, . . . , S_d:

∑_{i=1}^d ν_ik S_i → ∑_{i=1}^d ν′_ik S_i.

• ν_k ∈ Z^d_{≥0}: number of molecules of each chemical species consumed in the kth reaction.

• ν′_k ∈ Z^d_{≥0}: number of molecules of each chemical species created in the kth reaction.

Denote the reaction vector as ζ_k = ν′_k − ν_k.

• The intensity (or propensity) of the kth reaction is λ_k : Z^d_{≥0} → R.

• By analogy with before,

X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k,

where the Y_k are independent, unit-rate Poisson processes.

Page 26: Anderson Tutorial - University of Wisconsin–Madison

Population Example: Lotka-Volterra predator-prey model

Think of A as a prey and B as a predator.

A → 2A (κ1),    A + B → 2B (κ2),    B → ∅ (κ3),

with X_A(0) = X_B(0) = 1000 and κ1 = 2, κ2 = 0.002, κ3 = 2.

Deterministic model. Let x(t) = [A(t), B(t)]^T. Then

x(t) = x(0) + κ1 ∫_0^t x_1(s) ds [1, 0]^T + κ2 ∫_0^t x_1(s) x_2(s) ds [−1, 1]^T + κ3 ∫_0^t x_2(s) ds [0, −1]^T.

Stochastic model. Let X(t) = [A(t), B(t)]^T. Then

X(t) = X(0) + Y_1(κ1 ∫_0^t X_1(s) ds) [1, 0]^T + Y_2(κ2 ∫_0^t X_1(s) X_2(s) ds) [−1, 1]^T + Y_3(κ3 ∫_0^t X_2(s) ds) [0, −1]^T.

Page 27: Anderson Tutorial - University of Wisconsin–Madison

Lotka-Volterra

Think of A as a prey and B as a predator.

A → 2A (κ1),    A + B → 2B (κ2),    B → ∅ (κ3),

with A(0) = B(0) = 1000 and κ1 = 2, κ2 = 0.002, κ3 = 2.

[figures: phase-plane plot of predator vs. prey for the stochastic and ODE models, and two time-series plots of prey and predator counts from the stochastic model]

Page 28: Anderson Tutorial - University of Wisconsin–Madison

Wait! I just pulled a fast one on you.

1. I told you how to simulate a CTMC (DTMC + holding times)

2. Then developed a CTMC model for biochemical processes.

3. Is simulation of the model I developed equivalent to the simulation strategy I proposed for CTMCs?

Answer: no, but this is more subtle than you might expect.

X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k.

Page 29: Anderson Tutorial - University of Wisconsin–Madison

Statistically exact simulation methods

X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k.

Alternative to the Gillespie algorithm: at time t let

T_k = ∫_0^t λ_k(X(s)) ds,    P_k = min{s > T_k : Y_k(s) > Y_k(T_k)}.

Assuming no other reactions fire first, the amount of time until reaction k fires is given by

λ_k ∆t_k = P_k − T_k  =⇒  ∆t_k = (P_k − T_k)/λ_k.

Page 30: Anderson Tutorial - University of Wisconsin–Madison

The (modified) Next Reaction Method: simulating the random time change representation

1. Initialize system and set Tk = 0 for all k .

2. For each k , set Pk = ln(1/rk ), where rk ∼ unif(0, 1).

3. For each k set: ∆tk = (Pk − Tk )/λk .

4. Set ∆ = min_k{∆t_k}. The reaction µ that fires is the one where the minimum is achieved.

5. Set Pµ = Pµ + ln(1/r), r ∼ unif(0, 1).

6. For each k , Tk = Tk + λk ∆.

7. Update system and propensities.

8. Return to step 3 or quit.
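A minimal Python sketch of these steps (my addition; the function name and the Lotka-Volterra test case are illustrative):

```python
import numpy as np

def next_reaction_method(x0, zeta, propensities, t_final, seed=0):
    """Modified next reaction method for the random time change representation."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=np.int64)
    t = 0.0
    K = len(propensities)
    T = np.zeros(K)                              # internal times T_k
    P = rng.exponential(1.0, size=K)             # first jump times P_k of the unit-rate Y_k
    times, states = [t], [x.copy()]
    while True:
        lam = np.array([f(x) for f in propensities])
        with np.errstate(divide="ignore"):
            dt = np.where(lam > 0, (P - T) / lam, np.inf)   # step 3
        mu = int(np.argmin(dt))                  # step 4: reaction achieving the minimum
        delta = dt[mu]
        if not np.isfinite(delta) or t + delta > t_final:
            break
        t += delta
        P[mu] += rng.exponential(1.0)            # step 5: next jump of Y_mu
        T += lam * delta                         # step 6: advance all internal times
        x = x + zeta[mu]                         # step 7: update state (and hence propensities)
        times.append(t); states.append(x.copy())
    return np.array(times), np.array(states)

# Same Lotka-Volterra network as before: A -> 2A, A + B -> 2B, B -> 0.
zeta = np.array([[1, 0], [-1, 1], [0, -1]])
props = [lambda x: 2.0 * x[0], lambda x: 0.002 * x[0] * x[1], lambda x: 2.0 * x[1]]
T_path, X_path = next_reaction_method([1000, 1000], zeta, props, t_final=5.0)
print(T_path[-1], X_path[-1])
```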

Page 31: Anderson Tutorial - University of Wisconsin–Madison

Representation of Gillespie’s algorithm

Question: Can we derive a stochastic representation for Gillespie’s algorithm?

What do I need?

1. DTMC: uniform random variables!

• Let {U_0, U_1, U_2, . . .} be independent uniform(0,1) RVs.

2. Exponential holding times with rates λ_0(X(t)).

• Let Y be a unit-rate Poisson process: it will run at rate λ_0(X(t)).

Then, let

R_0(t) = Y(∫_0^t λ_0(X(s)) ds),

X(t) = X(0) + ∑_k ζ_k ∫_0^t 1_{(q_{k−1}(X(s−)), q_k(X(s−))]}(U_{R_0(s−)}) dR_0(s),

where

q_k(x) = ∑_{ℓ=1}^k λ_ℓ(x).

Page 32: Anderson Tutorial - University of Wisconsin–Madison

Recap

We now have two representations and, hence, simulation strategies for the generation of sample paths. One is more useful than the other:

1. Random time change representation of Tom Kurtz, which looks more like an integrated ODE:

X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k,

where the Y_k are independent, unit-rate Poisson processes.

2. Gillespie’s algorithm (thinning of a Poisson process):

R_0(t) = Y(∫_0^t λ_0(X(s)) ds),

X(t) = X(0) + ∑_k ζ_k ∫_0^t 1_{(q_{k−1}(X(s−)), q_k(X(s−))]}(U_{R_0(s−)}) dR_0(s),

where

q_k(x) = ∑_{ℓ=1}^k λ_ℓ(x).

Page 33: Anderson Tutorial - University of Wisconsin–Madison

At this point you could/should/may be asking....

Who cares?

I will try to convince you that the RTC

1. yields different computational/simulation methods,

2. gives more analytical understanding of the models

– “Ito equations” for these models.

Page 34: Anderson Tutorial - University of Wisconsin–Madison

tau-leaping

Deriving “Tau-leaping” is a snap with RTC.

Explicit “τ-leaping”² was developed by Dan Gillespie in an effort to overcome the problem that each time-step may be prohibitively small.

Tau-leaping is essentially an Euler approximation of ∫_0^t λ_k(X(s)) ds:

Z(τ) = Z(0) + ∑_k Y_k(∫_0^τ λ_k(Z(s)) ds) ζ_k

     ≈ Z(0) + ∑_k Y_k(λ_k(Z(0)) τ) ζ_k

     =_d Z(0) + ∑_k Poisson(λ_k(Z(0)) τ) ζ_k.

² D. T. Gillespie, J. Chem. Phys., 115, 1716–1733.
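A minimal explicit tau-leaping sketch in Python (my addition; the fixed step tau and the Lotka-Volterra test case are arbitrary illustrative choices):

```python
import numpy as np

def tau_leap(x0, zeta, propensities, tau, t_final, seed=0):
    """Explicit tau-leaping: Z(t + tau) = Z(t) + sum_k Poisson(lambda_k(Z(t)) * tau) * zeta_k."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=np.int64)
    t = 0.0
    times, states = [t], [x.copy()]
    while t < t_final:
        lam = np.maximum(np.array([f(x) for f in propensities]), 0.0)  # crude guard: no negative rates
        jumps = rng.poisson(lam * tau)       # number of firings of each reaction over [t, t + tau)
        x = x + zeta.T @ jumps               # counts may go negative if tau is too large
        t += tau
        times.append(t); states.append(x.copy())
    return np.array(times), np.array(states)

# Lotka-Volterra again, with step tau = 0.01.
zeta = np.array([[1, 0], [-1, 1], [0, -1]])
props = [lambda x: 2.0 * x[0], lambda x: 0.002 * x[0] * x[1], lambda x: 2.0 * x[1]]
T, Z = tau_leap([1000, 1000], zeta, props, tau=0.01, t_final=5.0)
print(Z[-1])
```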

Page 35: Anderson Tutorial - University of Wisconsin–Madison

tau-leaping

One “intuitive” representation for Z(t) generated by τ-leaping is

Z(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(Z ∘ η(s)) ds) ζ_k,

where

η(s) = t_n, if t_n ≤ s < t_{n+1} = t_n + τ,

is a step function giving the left endpoints of the time discretization.

Question: Is there a point to the diffusion approximation if we are only going to use it to simulate?

Page 36: Anderson Tutorial - University of Wisconsin–Madison

What do we know about Poisson processes?

Recall that for a Poisson process, Y, the law of large numbers says

(1/N) Y(Nu) ≈ u

when N ≫ 1.

What about fluctuations? The CLT gives

(1/N) Y(Nu) ≈ u + (1/√N) W(u).

What do these imply for us?

Page 37: Anderson Tutorial - University of Wisconsin–Madison

LLN - mean field

Recall that the process satisfies

X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k.

Suppose that each X_i(t) = O(N) for some large N ≫ 1. Then, scaling by N yields the equations:

X^N(t) = X^N(0) + ∑_k (1/N) Y_k(N ∫_0^t λ_k(X^N(s)) ds) ζ_k.

So, the LLN gives

X^N(t) ≈ X^N(0) + ∑_k ζ_k ∫_0^t λ_k(X^N(s)) ds.

Hence, a good ODE approximation is

x′(t) = ∑_k ζ_k λ_k(x(t)).
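As a sketch (my addition, using scipy and the Lotka-Volterra rates from earlier), the mean-field ODE can be integrated directly:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Mean-field ODE x'(t) = sum_k zeta_k * lambda_k(x(t)) for the Lotka-Volterra network.
zeta = np.array([[1, 0], [-1, 1], [0, -1]], dtype=float)
kappa = np.array([2.0, 0.002, 2.0])

def rhs(t, x):
    lam = np.array([kappa[0] * x[0], kappa[1] * x[0] * x[1], kappa[2] * x[1]])
    return zeta.T @ lam

sol = solve_ivp(rhs, (0.0, 5.0), [1000.0, 1000.0], rtol=1e-8)
print(sol.y[:, -1])   # deterministic (mean-field) approximation of the stochastic path at t = 5
```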

Page 38: Anderson Tutorial - University of Wisconsin–Madison

LLN - mean field

Please note the very common error: taking expectations of

X(t) = X(0) + ∑_k Y_k(∫_0^t λ_k(X(s)) ds) ζ_k

yields

EX(t) = EX(0) + ∑_k ζ_k ∫_0^t E λ_k(X(s)) ds.

This is NOT the same as

EX(t) = EX(0) + ∑_k ζ_k ∫_0^t λ_k(EX(s)) ds,

which is equivalent to the usual ODE

x′(t) = ∑_k ζ_k λ_k(x).

Page 39: Anderson Tutorial - University of Wisconsin–Madison

Diffusion/Langevin approximation

Recall that Poisson processes satisfy (by the CLT)

(1/N) Y(Nu) ≈ u + (1/√N) W(u),

where W is a Brownian motion. Hence,

X^N(t) = X^N(0) + ∑_k (1/N) Y_k(N ∫_0^t λ_k(X^N(s)) ds) ζ_k

       ≈ X^N(0) + ∑_k ζ_k ∫_0^t λ_k(X^N(s)) ds + ∑_k ζ_k (1/√N) W_k(∫_0^t λ_k(X^N(s)) ds),

or

X^N(t) ≈ X^N(0) + ∑_k ζ_k ∫_0^t λ_k(X^N(s)) ds + ∑_k ζ_k (1/√N) ∫_0^t √(λ_k(X^N(s))) dW_k.

This suggests the approximate process

D^N(t) = X^N(0) + ∑_k ζ_k ∫_0^t λ_k(D^N(s)) ds + ∑_k ζ_k (1/√N) ∫_0^t √(λ_k(D^N(s))) dW_k,

which can be simulated. (Why not use tau-leaping, though?)
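A minimal Euler-Maruyama sketch (my addition) of simulating this diffusion approximation, written in unscaled counts so the 1/√N factor is absorbed into the √(λ_k) noise terms (the usual chemical Langevin form); the step size and test model are illustrative:

```python
import numpy as np

def langevin_em(x0, zeta, propensities, h, t_final, seed=0):
    """Euler-Maruyama for the Langevin/diffusion approximation, in terms of counts:
       X(t+h) ~ X(t) + sum_k zeta_k lambda_k(X) h + sum_k zeta_k sqrt(lambda_k(X) h) N(0,1)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    t = 0.0
    while t < t_final:
        lam = np.maximum(np.array([f(x) for f in propensities]), 0.0)  # guard against negative rates
        xi = rng.standard_normal(len(lam))
        x = x + zeta.T @ (lam * h) + zeta.T @ (np.sqrt(lam * h) * xi)
        t += h
    return x

zeta = np.array([[1, 0], [-1, 1], [0, -1]])
props = [lambda x: 2.0 * x[0], lambda x: 0.002 * x[0] * x[1], lambda x: 2.0 * x[1]]
print(langevin_em([1000.0, 1000.0], zeta, props, h=0.001, t_final=5.0))
```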

Page 40: Anderson Tutorial - University of Wisconsin–Madison

A specific computational problem: Gradient estimation

We have

X^θ(t) = X^θ(0) + ∑_k Y_k(∫_0^t λ^θ_k(X^θ(s)) ds) ζ_k,

and we define

J(θ) = Ef(X^θ(t)).

We know how to estimate J(θ) using Monte Carlo.

However, what if we want

J′(θ) = (d/dθ) Ef(X^θ(t))?

Thus, we want to know how sensitive our statistic is to perturbations in θ. This tells us:

1. How best to control a system for a desired outcome.
2. Robustness of the system to perturbations in parameters.
3. Which parameters we need to estimate well from data, etc.

There are multiple methods. We will consider: finite differences.

Page 41: Anderson Tutorial - University of Wisconsin–Madison

Finite differencing

This method is pretty straightforward and is therefore used most.

Simply note that

J′(θ) ≈ [J(θ + ε) − J(θ)]/ε = [Ef(X^{θ+ε}(t)) − Ef(X^θ(t))]/ε + O(ε).

Centered differencing reduces the bias to O(ε²).

The usual finite difference estimator is

D_R(ε) = (1/R) ∑_{i=1}^R [f(X^{θ+ε}_{[i]}(t)) − f(X^θ_{[i]}(t))]/ε.

If the paths are generated independently, then

Var(D_R(ε)) = R⁻¹ ε⁻² Var(f(X^{θ+ε}_{[i]}(t)) − f(X^θ_{[i]}(t))) = O(R⁻¹ ε⁻²).

Terrible. Worse than expectations.

How about common random numbers for variance reduction?

Page 42: Anderson Tutorial - University of Wisconsin–Madison

Common random numbers

It’s exactly what it sounds like. Reuse the random numbers used in the generation of

X^{θ+ε}_{[i]}(t) and X^θ_{[i]}(t).

Why? Because:

Var(f(X^{θ+ε}_{[i]}(t)) − f(X^θ_{[i]}(t))) = Var(f(X^{θ+ε}_{[i]}(t))) + Var(f(X^θ_{[i]}(t))) − 2 Cov(f(X^{θ+ε}_{[i]}(t)), f(X^θ_{[i]}(t))).

So, if we can “couple” the random variables, we can get a variance reduction! Sometimes substantial.

Page 43: Anderson Tutorial - University of Wisconsin–Madison

Common random numbers

• In the context of Gillespie’s algorithm, we simply reuse all the same random numbers (uniforms).

• This can be achieved simply by setting the “seed” of the random number generator before generating X^{θ+ε} and X^θ.

Mathematically, this is equivalent to using the same Poisson process and uniforms in:

R_0^{θ+ε}(t) = Y(∫_0^t λ_0^{θ+ε}(X^{θ+ε}(s)) ds),

X^{θ+ε}(t) = X^{θ+ε}(0) + ∑_k ζ_k ∫_0^t 1_{(q^{θ+ε}_{k−1}(X^{θ+ε}(s−)), q^{θ+ε}_k(X^{θ+ε}(s−))]}(U_{R_0(s−)}) dR_0^{θ+ε}(s),

R_0^θ(t) = Y(∫_0^t λ_0^θ(X^θ(s)) ds),

X^θ(t) = X^θ(0) + ∑_k ζ_k ∫_0^t 1_{(q^θ_{k−1}(X^θ(s−)), q^θ_k(X^θ(s−))]}(U_{R_0(s−)}) dR_0^θ(s).
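A small sketch (my addition) of common random numbers on the birth-death example that appears on a later slide (∅ ⇌ M with birth rate θ and per-molecule death rate 0.1); reusing the seed couples the two paths and visibly shrinks the variance of the finite difference:

```python
import numpy as np

def birth_death_gillespie(theta, t_final, seed):
    """Gillespie path of 0 -> M (rate theta), M -> 0 (rate 0.1*M); returns M(t_final)."""
    rng = np.random.default_rng(seed)
    m, t = 0, 0.0
    while True:
        lam0 = theta + 0.1 * m
        t += rng.exponential(1.0 / lam0)
        if t > t_final:
            return m
        m += 1 if rng.uniform() * lam0 < theta else -1

theta, eps, R, T = 2.0, 0.1, 2000, 30.0
crn = [(birth_death_gillespie(theta + eps, T, seed=i)
        - birth_death_gillespie(theta, T, seed=i)) / eps for i in range(R)]
indep = [(birth_death_gillespie(theta + eps, T, seed=2 * i)
          - birth_death_gillespie(theta, T, seed=2 * i + 1)) / eps for i in range(R)]
print("CRN estimate:", np.mean(crn), " variance:", np.var(crn))
print("independent estimate:", np.mean(indep), " variance:", np.var(indep))
```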

Page 44: Anderson Tutorial - University of Wisconsin–Madison

Common random numbers

CRN + Gillespie is good idea.

1. Costs little in terms of implementation.

2. Variance reduction and gains in efficiency can be huge.

Thus, it is probably the most common method used today.

However: over time, the processes decouple, often completely.

Question: Can we do better? Is there a natural way to couple processes using the random time change?

Page 45: Anderson Tutorial - University of Wisconsin–Madison

How do we generate processes simultaneously?

Suppose I want to generate:

• a Poisson process with intensity 13.1,
• a Poisson process with intensity 13.

We could let Y_1 and Y_2 be independent unit-rate Poisson processes, and set

Z_{13.1}(t) = Y_1(13t) + Y_2(0.1t),

Z_{13}(t) = Y_1(13t).

The variance of the difference is much smaller:

Var(Z_{13.1}(t) − Z_{13}(t)) = Var(Y_2(0.1t)) = 0.1t.

This uses the fact that the sum of independent homogeneous Poisson processes is again a Poisson process.
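A quick numerical check (my addition) of the variance claim, using Poisson marginals at a single time t rather than full paths:

```python
import numpy as np

rng = np.random.default_rng(3)
t, n = 5.0, 100_000

# Coupled construction: share Y1 and add a small independent correction Y2.
y1 = rng.poisson(13.0 * t, size=n)          # Y1(13 t)
y2 = rng.poisson(0.1 * t, size=n)           # Y2(0.1 t), independent of Y1
coupled_diff = (y1 + y2) - y1               # Z_{13.1}(t) - Z_{13}(t) = Y2(0.1 t)

# Independent construction for comparison.
indep_diff = rng.poisson(13.1 * t, size=n) - rng.poisson(13.0 * t, size=n)

print("coupled variance    :", coupled_diff.var(), " (theory 0.1 t =", 0.1 * t, ")")
print("independent variance:", indep_diff.var(), " (theory 26.1 t =", 26.1 * t, ")")
```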

Page 46: Anderson Tutorial - University of Wisconsin–Madison

Next problem: parameter sensitivities.

Couple the processes:

X^{θ+ε}(t) = X^{θ+ε}(0) + ∑_k Y_{k,1}(∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds) ζ_k

           + ∑_k Y_{k,2}(∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds) ζ_k,

X^θ(t) = X^θ(0) + ∑_k Y_{k,1}(∫_0^t λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds) ζ_k

        + ∑_k Y_{k,3}(∫_0^t λ^θ_k(X^θ(s)) − λ^{θ+ε}_k(X^{θ+ε}(s)) ∧ λ^θ_k(X^θ(s)) ds) ζ_k,

where the Y_{k,1}, Y_{k,2}, Y_{k,3} are independent, unit-rate Poisson processes and a ∧ b = min(a, b).

Page 47: Anderson Tutorial - University of Wisconsin–Madison

Next problem: parameter sensitivities.

Theorem.³ Suppose (X^{θ+ε}, X^θ) satisfy the coupling. Then, for any T > 0 there is a C_{T,f} > 0 for which

E sup_{t≤T} (f(X^{θ+ε}(t)) − f(X^θ(t)))² ≤ C_{T,f} ε.

This lowers the variance of the estimator from

O(R⁻¹ε⁻²)

to

O(R⁻¹ε⁻¹),

lowered by an order of magnitude (in ε).

Point: a deeper mathematical understanding led to a better computational method.

³ David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, accepted for publication at SIAM: Journal on Numerical Analysis. Available on arxiv.org at arxiv.org/abs/1109.2890.

Page 48: Anderson Tutorial - University of Wisconsin–Madison

Example: birth and death

∅ ⇌ M (birth rate θ, per-molecule degradation rate 0.1). (1)

[figures: variance of the sensitivity estimator versus time for six methods: (a) Crude Monte Carlo; (b) Girsanov transformation (likelihood ratio); (c) Common Reaction Path, T = 10,000; (d) Common Reaction Path, T = 100; (e) Coupled Finite Differences; (f) Gillespie's algorithm with common random numbers]

Page 49: Anderson Tutorial - University of Wisconsin–Madison

Example: genetic toggle switch

∅ ⇌ X_1 (rates λ_1(X), λ_2(X)),    ∅ ⇌ X_2 (rates λ_3(X), λ_4(X)), (2)

with intensity functions

λ_1(X(t)) = α_1 / (1 + X_2(t)^β),    λ_2(X(t)) = X_1(t),

λ_3(X(t)) = α_2 / (1 + X_1(t)^γ),    λ_4(X(t)) = X_2(t),

and parameter choice

α1 = 50, α2 = 16, β = 2.5, γ = 1.

• Begin the process with initial condition [0, 0], and
• consider the sensitivity of X_1 as a function of α_1.

Page 50: Anderson Tutorial - University of Wisconsin–Madison

Example: genetic toggle switch

[figure (g): variance to time T = 40 for the Coupled Finite Difference and Common Reaction Path estimators]

Figure: Time plot of the variance of the Coupled Finite Difference estimator versus the Common Reaction Path estimator for the model (2). Each plot was generated using 10,000 sample paths. A perturbation of ε = 1/10 was used.

Page 51: Anderson Tutorial - University of Wisconsin–Madison

Recap

• Simulation of CTMCs found in biology is usually relatively straightforward if you just want a few paths:

  • Gillespie’s algorithm.

  • Next reaction method.

• These algorithms are precise simulations of different representations for the same models.

• Once the naive method fails, the random time change understanding is usually the most helpful in deriving advanced computational methods:

1. tau-leaping,
2. variance reduction methods...

Page 52: Anderson Tutorial - University of Wisconsin–Madison

Thanks!

Some references:

1. Dan Gillespie, J. Comput. Phys. 22, 403 (1976).

2. Dan Gillespie, J. Phys. Chem. 81, 2340 (1977).

3. M. Gibson and J. Bruck, J. Phys. Chem. A, 105, 1876, 2000.

4. David F. Anderson, A modified next reaction method for simulating chemical systems with time dependent propensities and delays, J. Chem. Phys. 127 (2007), no. 21, 214107.

5. David F. Anderson and Thomas G. Kurtz, Continuous time Markov chain models for chemical reaction networks, chapter in Design and Analysis of Biomolecular Circuits: Engineering Approaches to Systems and Synthetic Biology, H. Koeppl et al. (eds.), Springer, 2011.

6. David F. Anderson, An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains, accepted for publication at SIAM: Jour. Num. Analysis. Available on arxiv.org at arxiv.org/abs/1109.2890.
6. David F. Anderson, An Efficient Finite Difference Method for ParameterSensitivities of Continuous Time Markov Chains, accepted forpublication at SIAM: Jour. Num. Analysis. Available on arxiv.org atarxiv.org/abs/1109.2890.