Intro to Numerical Stochastic Differential Equations

Matt Davison, Università degli Studi di Verona

Wed May 24 2017

Wiener Process

• Construct a Wiener process W(t) on the set of points 0 = t0 < t1 < t2 < … < tN = T by:

• W(t0) = W(0) = 0

• W(t1) = W(0) + √t1*Z0

• W(tk+1) = W(tk) + √(tk+1-tk)*Zk

• Where the Zk are iid N(0,1) draws

• In the natural setting where tk+1-tk = T/N

• We often compress notation to write Wk = W(tk). A minimal Matlab sketch of this construction follows.
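A sketch of the construction in Matlab (the parameter values here are illustrative, not from the slides):

% Simulate one Wiener path on [0,T] with N uniform steps
T = 10; N = 1000; h = T/N;
Z = randn(1,N);              % iid N(0,1) draws Z0, ..., Z_{N-1}
W = [0, cumsum(sqrt(h)*Z)];  % W(t0) = 0; W(t_{k+1}) = W(t_k) + sqrt(h)*Z_k
plot(0:h:T, W)               % one sample path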

Recall what a stochastic process is

• It lives in a space (Ω,Ft,P) where:

• Ω contains every continuous function in the time interval [0,T] (Ω is a huge space)

• Ft is the sigma algebra saying which elements ω of Ω we can distinguish at time t

• (intuitively, paths that agree up to time t are lumped together by Ft, even if they differ after time t)

• P is the way to put probabilities on everything.

Two ways to think about it:

• At a fixed time, and over all sample space (consistent with Ft), you have a probability density

• For the Wiener process this is N(0,t) if we condition only on time-0 information; it is N(Ws, t-s) if we condition on information at time s < t

• Over all time for a given point ω in Ω it is a (continuous, nondifferentiable) function.

Stochastic (Ito) integral

• By analogy to the left endpoint rule for quadrature, we define, on the partition

• 0 = t0 < t1 < t2 < … < tN = T

• ∫0T b(W(s),s)dWs ≈ Σk=0N-1 b(Wk,tk)(Wk+1-Wk)

• The actual integral is the standard limit as the partition becomes infinitely fine.

Example 1

• ∫0T dWs = Σk=0N-1 (Wk+1-Wk) = WN – W0 = WT

• Since W0 = 0 and tN = T

• So far it looks like “regular” calculus rules apply

• Don’t have to make it much harder before they don’t:

• ∫0T WsdWs ≈ Σk=0N-1 Wk(Wk+1-Wk)

New Example (Analytic)

• ∫0T WsdWs ≈ Σk=0N-1 Wk(Wk+1-Wk)

• = ½ Σk=0N-1 (Wk+1+Wk)(Wk+1-Wk) − ½ Σk=0N-1 (Wk+1-Wk)(Wk+1-Wk)

• = ½ Σk=0N-1 (Wk+1² − Wk²) − ½ Σk=0N-1 (Wk+1-Wk)²

• The first sum telescopes to yield WT² − W0² = WT², so the first term is ½WT².

• The second sum is

• Σk=0N-1 (Wk+1-Wk)² = Σk=0N-1 (tk+1-tk)Zk²

More tricks…

• = ½WT² − ½ Σk=0N-1 (Wk+1-Wk)²

• The second sum is

• Σk=0N-1 (Wk+1-Wk)² = Σk=0N-1 (tk+1-tk)Zk²

• Suppose that tk = kh, Nh = T. As we make the partition finer, N → ∞.

• Σk=0N-1 (Wk+1-Wk)² = h Σk=0N-1 Zk²

• Since E[Zk²] = 1, this converges (by the law of large numbers) to

• h Σk=0N-1 1 = hN = T.

• So ∫0T WsdWs = ½WT² − ½T.
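As a cross-check (anticipating Ito's lemma, which appears later in these notes), applying Ito to F(W) = ½W² gives the same answer:

\[
d\left(\tfrac{1}{2}W_t^2\right) = W_t\,dW_t + \tfrac{1}{2}\,dt
\quad\Longrightarrow\quad
\int_0^T W_s\,dW_s = \tfrac{1}{2}W_T^2 - \tfrac{1}{2}T.
\]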

Pause to look at this numerically

• We’ve probably all seen this trick, and it is cute, and it makes the point that we need Ito Calculus, and Ito’s lemma.

• But it sort of obscures what is going on.

• If we simulate it using tk = kh, Nh = T, say T = 10 and N = 1000, we get the results in the following table:

Stochastic Integral

Trial     Approx   Theory   Rel Err
1           7.41     7.87    -5.84%
2          -2.93    -2.42    21.02%
3          -2.22    -2.24    -0.66%
4          -2.94    -3.40   -13.57%
5          10.47    11.42    -8.29%
6          -4.31    -4.63    -6.88%
7          12.80    11.94     7.19%
8          -5.01    -4.60     8.78%
9          11.25    11.87    -5.22%
10          4.84     4.49     7.88%
Averages    2.94     3.03     0.44%

Some notes

• Analytical result is convincing

• But the numerical method doesn’t approximate it all that well.

• Lots of noise.

• However the numerical method overshoots and undershoots --- “on average” it might perform OK.

General Matlab Code

%STINT Approximate Ito integrals from 0 to T of W dW
function [ito_err,ito] = STINT(T,N,M)
dt = T/N;
for k = 1:M
    dW = sqrt(dt)*randn(1,N);          % Brownian increments
    W = cumsum(dW);                    % discretized Brownian path
    ito(k) = sum([0,W(1:end-1)].*dW);  % left-endpoint sum; .* is elementwise multiplication
    ito_err(k) = ito(k) - 0.5*(W(end)^2-T);  % error vs the analytic value
end
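A sketch of how one might call this (outputs vary with the random state):

[ito_err, ito] = STINT(10, 1000, 10);  % T = 10, N = 1000 steps, M = 10 trials
mean(abs(ito_err))   % pathwise (strong) error
abs(mean(ito))       % error in the mean (weak); the theoretical mean is 0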

Note: sqrt(10) = 3.1623 (the error ratio to expect per tenfold increase in N).

Runs (M)   T   N        mean(abs(ito_err))   Ratio    abs(mean(ito))
10,000     1   10       0.1784               3.1687   0.005
10,000     1   100      0.0563               3.1629   0.0049
10,000     1   1000     0.0178               3.1786   0.00036
10,000     1   10,000   0.0056               3.111    0.00073
10,000     1   100,000  0.0018                        0.0063

Strong convergence

• Looking at two definitions of convergence.

• The first is “strong convergence”.

• It is the average difference between the numerical and analytic solution over paths

• So mean(abs(ito_err)) or E[|X-X*|]

• This converges so that every time we take 10 times as many time steps we get sqrt(10) better accuracy. Not surprising – like Monte Carlo!

Weak Convergence

• This is the absolute value of the average of the approximation minus the average of the analytic expression: |E[X-X*]|

• E[∫0T Ws dWs] = ∫0T E[Ws] dWs = ∫0T 0 dWs = 0

• This is just abs(mean(ito)).

Hard to tell the rate of weak convergence

• Because it is already so close to being converged even with a ridiculously small number of integration points (10).

• In fact, even with 2 integration points it is STILL very small.

• With just 1 integration point it is precisely zero, by construction. (Convince yourself of why!)
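One can see this with the STINT code from above (a sketch; exact values vary with the random state):

[~, ito] = STINT(10, 2, 100000);   % only 2 integration points
abs(mean(ito))                     % still very small
[~, ito] = STINT(10, 1, 100000);   % a single integration point
abs(mean(ito))                     % identically zero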

Stochastic Differential Equations

• We use Wiener Processes to “add noise” to ordinary differential equations to obtain stochastic differential equations:

• x(t) = x0 + ∫0t a(x(s),s)ds + ∫0t b(x(s),s)dWs

• Some of these SDEs have analytical solutions (a set of PDFs indexed by t)

• But most must be solved numerically.

• If b(x(s),s) = 0 for all s we return to a first order ODE

• dx/dt = a(x,t); x(0) = x0,

• Recall how we solved this ODE numerically

Forward Euler Method

• dx/dt = f(x,t) (1a)

• x(0) = x0 (1b)

• We want an approximate solution on [0,T]

• Divide [0,T] into N intervals of length h = T/N

• (more general is also possible and adaptive methods do this).

• Call the endpoints of these intervals tk, k = 0…N where t0 = 0 and tk+1 = tk + h.

Left endpoint approx of derivative

• dx/dt(tk) ≈ [x(tk+1)-x(tk)]/[tk+1-tk] = [x(tk+1)-x(tk)]/h, which we set equal to f(x(tk),tk)

• Then the ODE turns into a set of N algebraic difference equations:

• x(tk+1) = x(tk) + hf(x(tk),tk) (2a)

• x(t0) = x0 (2b)
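A minimal sketch of this iteration in Matlab (the handle f and all parameter values are illustrative):

f = @(x,t) -2*x;                 % example right-hand side
T = 1; N = 100; h = T/N;
x = zeros(1,N+1); x(1) = 1;      % x(1) stores x(t0) = x0
for k = 1:N
    x(k+1) = x(k) + h*f(x(k),(k-1)*h);   % equation (2a)
end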

Simple Example

• dx/dt = -Bx (thus f(x,t) = -Bx) (3a)

• x(0) = A (3b)

• Solve on the interval [0,T] with N steps; h = T/N

• Then x(tk+1) = x(tk) + hf(x(tk),tk) reduces to

With solution

• x(tk+1) = x(tk) - Bhx(tk)

• or

• x(tk+1) = (1-Bh)x(tk)

• x(t0) = A

• Unusually, we can solve this difference equation in closed form to obtain

• x(tk) = A(1-Bh)^k (4)

Investigating errors

• Let’s compare (4) to the exact solution of the ODE, X(t) = Aexp(-Bt) (we distinguish X and x because the two solutions are not the same). Setting t = tk = kh, we get

• X(tk) = Aexp[-Btk]

• Now let’s look at the difference between the solutions, e(tk) = |X(tk)-x(tk)|

• If h is very small (h << 1/B), we can expand both X and x into rapidly converging Taylor series in Bh:

• X(tk) = X(kh) = A[1 - Bkh + ½(Bkh)² - (1/6)(Bkh)³ + ….]

O(h) error for ODEs

• x(tk) = x(kh) = A[1 - Bkh + [k(k-1)/2](Bh)² - [k(k-1)(k-2)/6](Bh)³ + ….]

• Since k³ - k(k²-3k+2) = 3k² - 2k,

• the difference between these is X(tk)-x(tk) = ½Ak(Bh)² - A[(3k²-2k)/6](Bh)³ + ….

• Writing kh = tk this becomes Atk[½B²h - ((3k-2)/6)B³h² + ….]

• This error approaches zero in such a way that lim h→0 [e(tk)/h] is a constant, in this case ½AB²tk.
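A quick numeric check of this limit (a sketch; the A, B, t values are illustrative):

A = 1; B = 2; t = 1;
for N = [100 200 400]
    h = t/N;
    x = A*(1 - B*h)^N;           % Euler solution at t = Nh, equation (4)
    e = abs(A*exp(-B*t) - x);    % error against the exact solution
    fprintf('h = %.4f   e/h = %.4f\n', h, e/h)  % should approach (1/2)*A*B^2*t = 2
end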

SDEs

• SDEs are written in integral form:

• x(t) = x0 + ∫0t a(x(s),s)ds + ∫0t b(x(s),s)dWs

• We want to approximate solution on interval [0,T]

• Divide [0,T] into N intervals of length h = T/N

• Call the endpoints of these intervals tk, k = 0…N where t0 = 0 and tk+1 = tk + h.

Forward Euler is O(h)

• This general “O(h)” behaviour applies more generally for the Forward Euler method and, for the “weak error”, also for the version of Forward Euler applied to stochastic integrals.

Euler Maruyama

• x(tk+1) = x0 + ∫0tk+1 a(x(s),s)ds + ∫0tk+1 b(x(s),s)dWs

• = x(tk) + ∫tktk+1 a(x(s),s)ds + ∫tktk+1 b(x(s),s)dWs

• If tk+1-tk = h is small, we can approximate both integrals by the left endpoint rule:

• x(tk+1) = x(tk) + a(x(tk),tk)h + b(x(tk),tk)(Wk+1-Wk) (5)

• This is the Euler-Maruyama method.
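A generic sketch of a full Euler-Maruyama solve, for user-supplied drift and diffusion handles (all names and values here are illustrative):

a = @(x,t) 0.05*x;  b = @(x,t) 0.2*x;   % GBM-like drift and diffusion
T = 10; N = 1000; h = T/N;
X = zeros(1,N+1); X(1) = 10;            % X(1) stores x0
for k = 1:N
    dWk = sqrt(h)*randn;                % Brownian increment over [tk, tk+1]
    tk = (k-1)*h;
    X(k+1) = X(k) + a(X(k),tk)*h + b(X(k),tk)*dWk;   % equation (5)
end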

Euler-Maruyama for GBM

• dS = μSdt + σSdW

• Or, a(S,t) = μS and b(S,t) = σS

• Then Sk+1 = Sk + μSk h + σSk(Wk+1 – Wk)

• S0 given.

• Of course we have an analytic solution for GBM as well:

• St = S0 exp[(μ – ½σ²)t] exp[σWt]

Numerical implementation

• First simulate a Wiener process W0 = 0, W1, W2, …, WN via Wk+1 = Wk + √h Zk, with Zk iid N(0,1)

• Then use this sequence of points on the Wiener process to solve:

• Sk+1 = Sk + μSk h + σSk(Wk+1 – Wk)

• And the exact solution is Sexact(t) = S0 exp[(μ – ½σ²)t] exp[σWt]

• To see how this works we experiment with the following Matlab code:

Matlab code

%% EM1 Euler Maruyama method on dX = mu*X dt + sigma*X dW, X(0) = Xzero
randn('state',100)  % set seed for randn
mu = 0.05; sigma = 0.2; Xzero = 10; T = 10; N = 2^8; dt = T/N;

dW = sqrt(dt)*randn(1,N);  % Brownian increments
W = cumsum(dW);            % discretized Brownian path
Xtrue = Xzero*exp((mu - 0.5*sigma^2)*([dt:dt:T]) + sigma*W);  % exact solution on this path
plot([0:dt:T],[Xzero,Xtrue],'m-'), hold on

R = 4; Dt = R*dt; L = N/R;  % L EM steps of size Dt = R*dt
Xem = zeros(1,L);           % preallocate for efficiency
Xtemp = Xzero;
for j = 1:L
    Winc = sum(dW(R*(j-1)+1:R*j));  % Winc is the sum of the appropriate R smaller steps
    Xtemp = Xtemp + Dt*mu*Xtemp + sigma*Xtemp*Winc;
    Xem(j) = Xtemp;
end
plot([0:Dt:T],[Xzero,Xem],'r--*'), hold off
xlabel('t','FontSize',12)
ylabel('X','FontSize',16,'Rotation',0,'HorizontalAlignment','right')

Very useful reference

• The EM1 Matlab code (as well as the EMSTRONG and MILSTRONG codes we will investigate next) is a small modification of codes in a very nice paper:

• Desmond J. Higham (2001), “An Algorithmic Introduction to Numerical Simulation of Stochastic Differential Equations”, SIAM Review 43(3), 525–546.

Results

Works pretty well!

• How can we discuss convergence?

• Strong convergence is E[|XEM(T)-Xex(T)|]

• So it is a pathwise construct. The average is over different sample paths ω:

• (1/M) Σω=1M |XωEM(T) - Xωex(T)|, over M sampled paths

• It can be seen, or demonstrated, that for E-M the strong convergence rate is quite slow: only like h^(1/2)

First let’s show it

• Just work with the same GBM model we’ve already played with.

• Make a Matlab script that:

• Does a GBM with 5 different step sizes (multiples of 2 of each other)

• Loops over 1000 paths and finds the average absolute difference between time-T values of the true and EM values

• Plots these differences vs. step size and fits an LS curve to it. Gets slope pretty close to 1/2.

The Matlab Code part 1

%% EMSTRONG Test strong convergence of Euler Maruyama
%
% Solves dX = mu*X dt + sigma*X dW, X(0) = Xzero,
% where mu = 0.05, sigma = 0.2, and Xzero = 1.
%
% Discretized Brownian path over [0,10] has dt = 10*2^(-8).
% E-M uses 5 different time steps: 16dt, 8dt, 4dt, 2dt, dt.
% Examine strong convergence at T = 10: E|X_L - X(T)|

randn('state',100)
mu = 0.05; sigma = 0.2; Xzero = 1;  % problem parameters
T = 10; N = 2^8; dt = T/N;
M = 1000;                           % number of paths sampled

EMStrong code part 2

Xerr = zeros(M,5);  % preallocate array
for s = 1:M         % sample over discrete Brownian paths
    dW = sqrt(dt)*randn(1,N);  % Brownian increments
    W = cumsum(dW);            % discretized Brownian path
    Xtrue = Xzero*exp((mu - 0.5*sigma^2)*T + sigma*W(end));  % exact value at T
    for p = 1:5
        R = 2^(p-1); Dt = R*dt; L = N/R;  % L Euler steps of size Dt = R*dt
        Xtemp = Xzero;
        for j = 1:L
            Winc = sum(dW(R*(j-1)+1:R*j));
            Xtemp = Xtemp + Dt*mu*Xtemp + sigma*Xtemp*Winc;
        end
        Xerr(s,p) = abs(Xtemp - Xtrue);  % store the error at t = T
    end
end

EM Strong Code part 3

Dtvals = dt*(2.^([0:4]));
loglog(Dtvals,mean(Xerr),'b*-'), hold on
loglog(Dtvals,(Dtvals.^(.5)),'r--'), hold off  % reference slope of 1/2
axis([1e-2 1 1e-2 10])
xlabel('\Delta t'), ylabel('Sample average of |X(T)-X_L|')
title('emstrong.m','FontSize',10)
% Least squares fit of error = C*Dt^q
A = [ones(5,1),log(Dtvals)']; rhs = log(mean(Xerr)');
sol = A\rhs; q = sol(2)
resid = norm(A*sol-rhs)

EM Results: µ = 5%, σ = 20%, T = 10. LS slope estimate is 0.4859.

Strong convergence ½

• This can be proved.

• It requires application of the Cauchy-Schwarz inequality, the Gronwall lemma, and the Ito isometry. You can find the proof in Kloeden and Platen, or in some slides that Tony Ware presented to your predecessors in 2013.

• For me at least the proof does not really bring with it much intuition

• But I have a calculation, much like the one we did for the dx/dt = -Bx ODE, that I find very insightful.

Computing E-M for GBM

• dX = aXdt + bXdW, X(0) = x0

• So a special case: a(x,t) = ax, b(x,t) = bx. Then, writing x(tk) = Xk, we get

• Xk+1 = Xk + aXk h + bXk√h Zk

• X0 = x0

• Xk+1 = (1 + ah + b√h Zk)Xk, X0 = x0

• So Xk = x0 Πj=0k-1 (1 + ah + b√h Zj)
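The product form is easy to check against the step-by-step iteration (a sketch; parameter values are illustrative):

a = 0.05; b = 0.2; h = 0.01; k = 100; x0 = 10;
Z = randn(1,k);                                % one draw per step
Xiter = x0;
for j = 1:k
    Xiter = (1 + a*h + b*sqrt(h)*Z(j))*Xiter;  % one E-M step
end
Xprod = x0*prod(1 + a*h + b*sqrt(h)*Z);        % closed product form
Xiter - Xprod                                  % zero up to rounding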

Exact Solution

• Exact solution is:

• x0 exp((a – ½b²)kh) exp(b√h(Z0 + … + Zk-1))

• We need to compute Err(k) = |x0 Πj=0k-1 (1 + ah + b√h Zj) – x0 exp((a – ½b²)kh) exp(b√h(Z0 + … + Zk-1))|

This scales with x0, so we can drop x0; we need to expand everything up to O(h^3/2). Later we will take the expectation.

This is a real mess, but let’s get started….

EM Approx to h 1.5 order

• Let’s expand the product to O(h^3/2) but not higher.

• We get all the products of 1 with 1, ah and b√h Zj

• We get all the products of b√h Zj with b√h Zk and with ah

• We also get all triple products of b√h Zm

• That’s it (e.g. ah × ah is already too high order in h).

• So we get 1 + kah + b√h(Z0 + Z1 + … + Zk-1) + b²h (sum over all pairs of different indices ZiZj) + abh^3/2 (Z0 + Z1 + … + Zk-1) + b³h^3/2 (sum over all triples of different indices ZiZjZl)

Exact soln to O(h1.5)

• To compare the exact solution to the approximate solution on the same path, use Z = Z0 + … + Zk-1

• x0 exp[(a – ½b²)kh] exp(b√hZ)

• If we expand this to order 3/2 in h we get

• (1 + (a – ½b²)kh)(1 + b√hZ + ½b²hZ² + (1/6)b³h^3/2 Z³)

• which will be 1 + (a – ½b²)kh + b√hZ(1 + (a – ½b²)kh) + ½b²hZ² + (1/6)b³h^3/2 Z³

Now take the difference

• [1 + kah + b√h(Z0 + Z1 + … + Zk-1) + b²h (sum over all pairs of different indices ZiZj) + abh^3/2 (Z0 + Z1 + … + Zk-1)] − [1 + (a – ½b²)kh + b√hZ(1 + (a – ½b²)kh) + ½b²hZ²]

• This tidies quite a bit, remembering that (Z0 + Z1 + … + Zk-1) = Z and that ½Z² − (sum over all pairs of different indices ZiZj) = ½(Z0² + Z1² + … + Zk-1²), to yield:

• ½b²kh + ½b³kh√hZ – ½b²h(Z0² + Z1² + … + Zk-1²)

• There will also be some b³h^3/2 terms (of the form Z1³, Z1²Z2 and similar) that vanish in the next expectation step.

• Recall Nh = T (and take k = N) to get:

• Err(T) = ½b²|x0| |Tb√hZ – h(Z0² + Z1² + … + ZN-1² – N)|

• Now both of these terms → 0 as N → ∞.

• This makes sense, as it says that E-M works in the limit.

• For the strong convergence the expectation is of Err(T) and is *outside* the absolute values!

• But the size of |Z| is √N,

• and the size of |Z0² + Z1² + … + ZN-1² – N| is also √N.

• So, for finite N and h,

Strong convergence O(h^1/2)

• E[Err(T)] = ½b²|x0| E| bT^1.5 (Z/√N) – √h√T [(Z0² + Z1² + … + ZN-1² – N)/√N] |

• So we have an O(h^0.5) term in our strong error.

• We will see, when we study the Milstein method, how this term gets removed.

• We will see, when we study the Milstein method, how this term gets removed.

Weak Convergence

• This is E[XEM(T)] – E[XEx(T)], which for Euler-Maruyama scales like h.

• Here we must take the two expectations on their own. This turns out to be easier:

E[XEM(T)]

• Dead easy.

• E[Xk] = E[x0 Πj=0k-1 (1 + ah + b√h Zj)]

• Now use the tower law to take the expectation over each time step successively

• Since E[Zj] = 0 this simplifies right down to:

• E[Xk] = x0 Πj=0k-1 (1 + ah) = x0(1 + ah)^k

• Note this is just the Euler method solution to the deterministic ODE from earlier.
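A Monte Carlo sanity check of this expectation (a sketch; parameter values are illustrative, and the E-M step is vectorized over all M paths at once):

a = 0.05; b = 0.2; T = 10; N = 100; h = T/N; x0 = 10; M = 1e5;
X = x0*ones(M,1);
for k = 1:N
    X = X.*(1 + a*h + b*sqrt(h)*randn(M,1));   % one E-M step on all M paths
end
[mean(X), x0*(1+a*h)^N, x0*exp(a*T)]           % sample mean vs (1+ah)^N vs exp(aT)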

E[XExact(T)]

• E[x0 exp[(a – ½b²)kh] exp(b√hZ)]

• When k = N, Nh = T, this becomes:

• x0 exp[(a – ½b²)T] E[exp(b√hZ)]

• Expand the exponential in a Taylor series to get:

• E[exp(b√hZ)] = E[1 + b√hZ + ½b²hZ² + …]

Lots of integrals

• All the odd terms vanish, since the odd moments of a Gaussian are zero (the integrand is antisymmetric)

• After seeing the pattern in the Gaussian integrals it turns out that:

• E[exp(b√hZ)] = exp(½b²T) (the integral is worked after the next slide)

• (Could probably also do this with the Ito isometry in some clever way.)

• That boils together to yield:

E[XExact(T)] conclusion

• E[x0 exp[(a – ½ b2)T] exp(b√hZ)] = x0 exp(aT)

• Again, the exact solution for the ODE.

• This means we’ve already done all the work needed to get the O(h) weak convergence!

• And seen why we need that –½σ² correction in the exponent of the analytic solution.
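For completeness, the Gaussian integral behind E[exp(b√hZ)] = exp(½b²T), worked by completing the square (here Z ~ N(0,N), since it is a sum of N iid N(0,1) draws, and hN = T):

\[
E\!\left[e^{cZ}\right]
= \int_{-\infty}^{\infty} e^{cz}\,\frac{e^{-z^2/2N}}{\sqrt{2\pi N}}\,dz
= e^{c^2N/2}\int_{-\infty}^{\infty}\frac{e^{-(z-cN)^2/2N}}{\sqrt{2\pi N}}\,dz
= e^{c^2N/2},
\]

so with c = b√h, E[exp(b√hZ)] = exp(½b²hN) = exp(½b²T).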

Weak vs. Strong

• For many options pricing applications it is really only the weak version of convergence we are using.

• And, for GBM at least, the results are really good for E-M – so the scaling isn’t crucial.

• We can go to the Milstein method if we want to do better

Milstein I

• Euler-Maruyama seemed to be confused.

• It was accurate to order h in the drift term, but only to √h in the diffusion term.

• Despite this its weak order was h, but that was in some sense “lucky”.

• Recall Ito: dF(X) = Fx dX + ½Fxx b² dt

• Plug this into the diffusion term using

• db(x) = b’(x)dX + ½b’’(x)b² dt

• (The second term here will drop out as higher order; for GBM it vanishes anyhow.)

Milstein II

• Then we will get, for the diffusion term,

• ∫tktk+1 b(x(s))dWs

• ≈ ∫tktk+1 [b(xk) + b’(xk)b(xk) ∫tks dWu] dWs

• The first term here is already in the E-M expression, but the second,

• b’(xk)b(xk) ∫tktk+1 ∫tks dWu dWs,

• is the new Milstein term.

Milstein III

• The double integral here reduces to

• ∫tktk+1 ∫tks dWu dWs = ½(Wk+1-Wk)² – ½h

• [To see this note that

• ∫0h ∫0s dWu dWs = ∫0h Ws dWs = ½Wh² – ½h

• where the second equality was actually worked out at the beginning of class,

• and also follows from Ito’s lemma:

• d(½Wt² – ½t) = Wt dWt + ½dt – ½dt = Wt dWt ]

Milstein IV

• So Milstein is:

• Xk+1 = Xk + a(Xk)h + b(Xk)(Wk+1-Wk) + ½b(Xk)b’(Xk){(Wk+1-Wk)² – h}

• Or, Xk+1 = Xk + a(Xk)h + b(Xk)Zk√h + ½b(Xk)b’(Xk)(Zk² − 1)h

• where the Zk are iid N(0,1) draws.
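For GBM, b(S) = σS so b(S)b’(S) = σ²S, and a single Milstein step reads (a sketch; values are illustrative):

mu = 0.05; sigma = 0.2; h = 0.01; Xk = 10;
Zk = randn;
Xk1 = Xk + mu*Xk*h + sigma*Xk*sqrt(h)*Zk ...
      + 0.5*sigma^2*Xk*h*(Zk^2 - 1);   % the extra Milstein correction term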

S0=10, μ = 5%, σ = 20%, h = 0.01, N = 100, T = 1

Milstein Strong convergence is 1

• Can demonstrate this for GBM with a very close “cousin” of the EMSTRONG code, MILSTRONG.

• MILSTRONG has just one different line in its “guts” than EMSTRONG:

Xtemp = Xtemp + Dt*mu*Xtemp + sigma*Xtemp*Winc ...
        + 0.5*sigma^2*Xtemp*(Winc^2 - Dt);

• And a slightly different display (comparing with an m = 1 reference slope line, not m = ½).

Milstein Result: µ = 5%, σ = 20%, T = 10. LS slope estimate is 0.9923.

After all this

• Despite the dramatic improvement in strong solutions, the weak solutions don’t converge any faster.

• For a lot of MC applications in finance the “weak” error estimate is the right one (think of pricing a European option for instance).

• For such problems Milstein is not worth doing.

• (Strong) convergence is a bit faster as a function of h, but the method is more work, and you can get a slightly smaller h in the same time with E-M.

Now

• Next we will take a look at simulating Delta Hedging, to get some more insights about why strong methods aren’t so important (Hedging is super strong anyway!)

• See attached spreadsheet.