
SIAM J. FINANCIAL MATH. © xxxx Society for Industrial and Applied Mathematics, Vol. xx, pp. x–x

Pricing Bermudan options via multilevel approximation methods∗

Denis Belomestny†, Fabian Dickmann‡, and Tigran Nagapetyan§

Abstract. In this article we propose a novel approach to reduce the computational complexity of various approximation methods for pricing discrete time American or Bermudan options. Given a sequence of continuation value estimates corresponding to different levels of spatial approximation, we propose a multilevel low biased estimate for the price of the option. It turns out that the resulting complexity gain can be of order $\varepsilon^{-1}$, with $\varepsilon$ denoting the desired precision. The performance of the proposed multilevel algorithms is illustrated by a numerical example.

Key words. Bermudan options, Multilevel Monte Carlo, mesh method, global regression, complexity analysis

AMS subject classifications. 65C30, 65C20, 65C05, 60H35

1. Introduction. Pricing of an American option usually reduces to solving an optimal stopping problem that can be efficiently solved in low dimensions via the dynamic programming algorithm. However, many problems arising in practice (see e.g. [11]) are high dimensional, and these applications have motivated the development of Monte Carlo methods for pricing American options. Pricing American style derivatives via Monte Carlo is a challenging task, because it requires the backwards dynamic programming algorithm that seems to be incompatible with the forward structure of Monte Carlo methods. In recent years much research has focused on the development of fast methods to compute approximations to the optimal exercise policy. Eminent examples include the functional optimization approach of [2], the mesh method of [5], and the regression-based approaches of [7], [13], [14], [9] and [3]. The complexity of these fast approximation algorithms depends on the desired precision $\varepsilon$ in a quite nonlinear way that, in turn, is determined by some fine properties of the underlying exercise boundary and the continuation values (see, e.g., [3]). In some situations (e.g. in the case of the stochastic mesh method or local regression) this complexity is of order $\varepsilon^{-3}$, which is rather high. One way to reduce the complexity of the fast approximation methods is to use various variance reduction methods. However, the latter methods are often ad hoc and, more importantly, do not lead to provably reduced asymptotic complexity. In this paper we propose a generic approach which is able to reduce the order of asymptotic complexity and which is applicable to various fast approximation methods, such as global regression, local regression or the stochastic mesh method. The main idea of the method is inspired by the path-breaking work of [10] that introduced the multilevel idea into stochastics. Similarly to the recent work of [4], we consider not levels corresponding to different discretization steps, but levels related to different degrees of approximation of the continuation values. For example, in the case of the Longstaff-Schwartz algorithm, the latter degree is basically governed by

∗The research by Denis Belomestny was carried out at IITP RAS and supported by a Russian Scientific Foundation grant (project N 14-50-00150).

†Duisburg-Essen University and IITP RAS. ([email protected])
‡Duisburg-Essen University. ([email protected])
§Weierstrass Institute of Applied Mathematics. ([email protected])


the number of basis functions, and in the case of the mesh method by the number of training paths used to approximate the continuation values. The new multilevel approach is able to significantly reduce the complexity of the fast approximation methods, leading in some cases to a complexity gain of order $\varepsilon^{-1}$. The paper is organised as follows. In Section 2 the pricing problem is formulated and the main assumptions are introduced and illustrated. In Section 3 the complexity analysis of a generic approximation algorithm is carried out. The main multilevel Monte Carlo algorithm is introduced in Section 4, where also its complexity is studied. In Section 5 we numerically test our approach for the problem of pricing Bermudan max-call options via the mesh method. The proofs are collected in Section 7.

2. Main setup. An American option grants its holder the right to select the time at which to exercise the option, and in this differs from a European option that may be exercised only at a fixed date. A general class of American option pricing problems can be formulated through an $\mathbb{R}^d$-valued Markov process $\{X_t,\,0\le t\le T\}$ defined on a filtered probability space $(\Omega,\mathcal F,(\mathcal F_t)_{0\le t\le T},P)$. It is assumed that the process $(X_t)$ is adapted to $(\mathcal F_t)_{0\le t\le T}$ in the sense that each $X_t$ is $\mathcal F_t$-measurable. Recall that each $\mathcal F_t$ is a $\sigma$-algebra of subsets of $\Omega$ such that $\mathcal F_s\subseteq\mathcal F_t\subseteq\mathcal F$ for $s\le t$. We restrict attention to options admitting a finite set of exercise opportunities $0=t_0<t_1<t_2<\ldots<t_{\mathcal J}=T$, called Bermudan options. Then
$$Z_j:=X_{t_j},\quad j=0,\ldots,\mathcal J,$$
is a Markov chain. If exercised at time $t_j$, $j=1,\ldots,\mathcal J$, the option pays $g_j(Z_j)$, for some known functions $g_0,g_1,\ldots,g_{\mathcal J}$ mapping $\mathbb R^d$ into $[0,\infty)$. Let $\mathcal T_j$ denote the set of stopping times taking values in $\{j,j+1,\ldots,\mathcal J\}$. A standard result in the theory of contingent claims states that the equilibrium price $V^*_j(z)$ of the Bermudan option at time $t_j$ in state $z$, given that the option was not exercised prior to $t_j$, is its value under the optimal exercise policy:
$$V^*_j(z)=\sup_{\tau\in\mathcal T_j}E[g_\tau(Z_\tau)\,|\,Z_j=z],\quad z\in\mathbb R^d.$$
A common feature of all fast approximation algorithms is that they deliver estimates $C_{k,0}(z),\ldots,C_{k,\mathcal J-1}(z)$ for the so-called continuation values:
$$C^*_j(z):=E[V^*_{j+1}(Z_{j+1})\,|\,Z_j=z],\quad j=0,\ldots,\mathcal J-1. \tag{2.1}$$

Here the index $k$ indicates that the above estimates are based on the set of “training” trajectories $(Z^{(i)}_0,\ldots,Z^{(i)}_{\mathcal J})$, $i=1,\ldots,k$, all starting from one point, i.e., $Z^{(1)}_0=\ldots=Z^{(k)}_0$. In the case of the so-called regression methods and the mesh method, the estimates for the continuation values are obtained via the recursion (dynamic programming principle)
$$C^*_{\mathcal J}(z)=0,\qquad C^*_j(z)=E\bigl[\max\bigl(g_{j+1}(Z_{j+1}),C^*_{j+1}(Z_{j+1})\bigr)\,\big|\,Z_j=z\bigr],$$
combined with Monte Carlo: at the $(\mathcal J-j)$th step one estimates the expectation
$$E\bigl[\max\bigl(g_{j+1}(Z_{j+1}),C_{k,j+1}(Z_{j+1})\bigr)\,\big|\,Z_j=z\bigr] \tag{2.2}$$


via regression (global or local) based on the set of paths
$$\bigl(Z^{(i)}_j,\;C_{k,j+1}(Z^{(i)}_{j+1})\bigr),\quad i=1,\ldots,k,$$
where $C_{k,j+1}(z)$ is the estimate for $C^*_{j+1}(z)$ obtained in the previous step.

Given the estimates $C_{k,0}(z),\ldots,C_{k,\mathcal J-1}(z)$, we can construct a lower bound (low biased estimate) for $V^*_0$ using the (generally suboptimal) stopping rule
$$\tau_k=\min\bigl\{0\le j\le\mathcal J:\;g_j(Z_j)\ge C_{k,j}(Z_j)\bigr\}$$
with $C_{k,\mathcal J}\equiv 0$ by definition. Indeed, fix a natural number $n$ and simulate $n$ new independent trajectories of the process $Z$. A low-biased estimate for $V^*_0$ can then be defined as
$$V^{n,k}_0=\frac1n\sum_{r=1}^{n}g_{\tau^{(r)}_k}\Bigl(Z^{(r)}_{\tau^{(r)}_k}\Bigr) \tag{2.3}$$
with
$$\tau^{(r)}_k=\inf\bigl\{0\le j\le\mathcal J:\;g_j(Z^{(r)}_j)\ge C_{k,j}(Z^{(r)}_j)\bigr\}.$$
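As an illustration, here is a minimal sketch of how the stopping rule $\tau_k$ and the estimate (2.3) can be evaluated on testing paths; it assumes that the continuation value estimates $C_{k,j}$ are supplied as callables (the function and argument names are illustrative, not from the paper):

```python
import numpy as np

def lower_bound_estimate(paths, payoff, cont_values):
    """Low-biased estimate (2.3): stop at the first exercise date where the
    payoff dominates the estimated continuation value.

    paths       : array of shape (n, J+1, d) with testing trajectories Z^(r)_j
    payoff      : payoff(j, z) -> array g_j(z) for points z of shape (m, d)
    cont_values : list of callables C_{k,j}(z), j = 0, ..., J-1 (C_{k,J} is taken as 0)
    """
    n, J_plus_1, _ = paths.shape
    J = J_plus_1 - 1
    values = np.empty(n)
    stopped = np.zeros(n, dtype=bool)
    for j in range(J + 1):
        z = paths[:, j, :]
        g = payoff(j, z)
        c = cont_values[j](z) if j < J else np.zeros(n)   # C_{k,J} ≡ 0
        exercise = (~stopped) & (g >= c)                  # stopping rule tau_k
        values[exercise] = g[exercise]
        stopped |= exercise
    return values.mean()                                  # V_0^{n,k}
```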

Thus any fast approximation algorithm can be viewed as consisting of the following two steps.

Step 1. Construction of the estimates $C_{k,j}$, $j=1,\ldots,\mathcal J$, on $k$ training paths.

Step 2. Construction of the low-biased estimate $V^{n,k}_0$ by evaluating the functions $C_{k,j}$, $j=1,\ldots,\mathcal J$, on each of $n$ new testing trajectories.

Let us now consider a generic family of continuation value estimates $C_{k,0}(z),\ldots,C_{k,\mathcal J-1}(z)$ with the natural number $k$ determining the quality of the estimates as well as their complexity. In particular, we make the following assumptions.

(AP) For any $k\in\mathbb N$, the estimates $C_{k,0}(z),\ldots,C_{k,\mathcal J-1}(z)$ are defined on some probability space $(\Omega_k,\mathcal F_k,P_k)$ which is independent of $(\Omega,\mathcal F,P)$.

(AC) For any $j=1,\ldots,\mathcal J$, the cost of constructing the estimate $C_{k,j}$ on $k$ training paths, i.e., of $C_{k,j}\bigl(Z^{(i)}_j\bigr)$, $i=1,\ldots,k$, is of order $k\times k^{\kappa_1}$ for some $\kappa_1>0$, and the cost of evaluating $C_{k,j}(z)$ at a new point $z\notin\{Z^{(1)}_j,\ldots,Z^{(k)}_j\}$ is of order $k^{\kappa_2}$ for some $\kappa_2>0$.

(AQ) There is a sequence of positive real numbers $\gamma_k$ with $\gamma_k\to 0$, $k\to\infty$, such that
$$P_k\Bigl(\sup_z\bigl|C_{k,j}(z)-C^*_j(z)\bigr|>\eta\sqrt{\gamma_k}\Bigr)<B_1 e^{-B_2\eta},\quad \eta>0,$$
for some constants $B_1>0$ and $B_2>0$ not depending on $k$ and $\eta$.

Discussion.
• Given (AC), the overall complexity of the corresponding fast approximation algorithm is proportional to
$$k^{1+\kappa_1}+n\times k^{\kappa_2}, \tag{2.4}$$
where the first term in (2.4) represents the cost of constructing the estimates $C_{k,j}$, $j=1,\ldots,\mathcal J$, on the training paths and the second one gives the cost of evaluating the estimated continuation values on $n$ testing paths.


• Additionally, one usually has to take into account the cost of path simulation. If the process $X$ solves a stochastic differential equation and the Euler discretisation scheme with a time step $h$ is used to generate paths, then the term $k\times h^{-1}+n\times h^{-1}$ needs to be added to (2.4). In order to make the analysis more focused and transparent, we do not take the path generation costs and discretisation errors into account here.

Let us now illustrate the above assumptions for three well known fast approximation methods.

Example 2.1 (Global regression). Fix a vector of real-valued functions $\psi=(\psi_1,\ldots,\psi_M)$ on $\mathbb R^d$. Suppose that the estimate $C_{k,j+1}$ is already constructed and has the form
$$C_{k,j+1}(z)=\alpha^k_{j+1,1}\psi_1(z)+\ldots+\alpha^k_{j+1,M}\psi_M(z)$$
for some $(\alpha^k_{j+1,1},\ldots,\alpha^k_{j+1,M})\in\mathbb R^M$. Let $\alpha^k_j=(\alpha^k_{j,1},\ldots,\alpha^k_{j,M})$ be a solution of the following least squares optimization problem:
$$\operatorname*{arg\,inf}_{\alpha\in\mathbb R^M}\sum_{i=1}^{k}\Bigl[\zeta_{j+1,k}(Z^{(i)}_{j+1})-\alpha_1\psi_1(Z^{(i)}_j)-\ldots-\alpha_M\psi_M(Z^{(i)}_j)\Bigr]^2 \tag{2.5}$$
with $\zeta_{j+1,k}(z)=\max\bigl(g_{j+1}(z),C_{k,j+1}(z)\bigr)$. Define an approximation for $C^*_j$ via
$$C_{k,j}(z)=\alpha^k_{j,1}\psi_1(z)+\ldots+\alpha^k_{j,M}\psi_M(z),\quad z\in\mathbb R^d.$$
It is clear that all estimates $C_{k,j}$ are well defined on the cartesian product of $k$ independent copies of $(\Omega,\mathcal F,P)$. The complexity $\mathrm{comp}(\alpha^k_j)$ of computing $\alpha^k_j$ is of order $k\cdot M^2+\mathrm{comp}(\alpha^k_{j+1})$, since each $\alpha^k_j$ is of the form $\alpha^k_j=B^{-1}b$ with
$$B_{p,q}=\frac1k\sum_{i=1}^{k}\psi_p(Z^{(i)}_j)\psi_q(Z^{(i)}_j)\qquad\text{and}\qquad b_p=\frac1k\sum_{i=1}^{k}\psi_p(Z^{(i)}_j)\zeta_{j+1,k}(Z^{(i)}_{j+1}),$$
$p,q\in\{1,\ldots,M\}$. Iterating backwards in time, we get $\mathrm{comp}(\alpha^k_j)\sim(\mathcal J-j)\cdot k\cdot M^2$. Furthermore, it can be shown that the estimates $C_{k,0}(z),\ldots,C_{k,\mathcal J-1}(z)$ satisfy the assumption (AQ) with $\gamma_k=1/k$, provided $M$ increases with $k$ at a polynomial rate, i.e., $M=k^\rho$ for some $\rho>0$ (see, e.g., [15]). Thus the parameters $\kappa_1$ and $\kappa_2$ in (AC) are given by $2\rho$ and $\rho$, respectively.

[12] argue that the number of paths $k$ must increase exponentially with $M$. In fact, the paper of Glasserman and Yu deals with a rather special case, as they consider the Black-Scholes model and use orthogonal Hermite polynomials to estimate the corresponding continuation values. In that situation the variance of the least squares estimator increases exponentially with the number of Hermite polynomials $M$, and this leads to the aforementioned requirement. In general, the variance increases polynomially in $M$, and this is the case studied in [15].
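A minimal sketch of one backward step of the global regression of Example 2.1, assuming the basis functions and the next-step coefficients are given (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def regression_step(Z_j, Z_j1, payoff_j1, alpha_j1, basis):
    """One backward step of global regression (Example 2.1).

    Z_j, Z_j1 : arrays of shape (k, d) with training points at times t_j, t_{j+1}
    payoff_j1 : callable, g_{j+1}(z) for an array of points z
    alpha_j1  : coefficients of C_{k,j+1}, shape (M,)
    basis     : callable, psi(z) -> design matrix of shape (k, M)
    Returns the coefficients alpha_j of C_{k,j}.
    """
    # responses zeta_{j+1,k} = max(g_{j+1}, C_{k,j+1}) on the training points
    zeta = np.maximum(payoff_j1(Z_j1), basis(Z_j1) @ alpha_j1)
    Psi = basis(Z_j)                              # design matrix at time t_j
    B = Psi.T @ Psi / len(Z_j)                    # matrix B_{p,q} of Example 2.1
    b = Psi.T @ zeta / len(Z_j)                   # vector b_p of Example 2.1
    return np.linalg.solve(B, b)                  # alpha_j = B^{-1} b
```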

Example 2.2 (Local regression). Local polynomial regression estimates can be defined as follows. Fix some $j$ such that $0\le j<\mathcal J$ and suppose that we want to compute the expectation
$$E[\zeta_{j+1,k}(Z_{j+1})\,|\,Z_j=z],\quad z\in\mathbb R^d,$$
with $\zeta_{j+1,k}(z)=\max\bigl(g_{j+1}(z),C_{k,j+1}(z)\bigr)$. For some $\delta>0$, $z\in\mathbb R^d$, an integer $l\ge 0$ and a function $K:\mathbb R^d\to\mathbb R_+$, denote by $q_{z,k}$ a polynomial on $\mathbb R^d$ of degree $l$ (i.e. the maximal order of the multi-index is less than or equal to $l$) which minimizes
$$\sum_{i=1}^{k}\Bigl[\zeta_{j+1,k}(Z^{(i)}_{j+1})-q(Z^{(i)}_j-z)\Bigr]^2 K\biggl(\frac{Z^{(i)}_j-z}{\delta}\biggr) \tag{2.6}$$
over the set of all polynomials $q$ of degree $l$. The local polynomial estimator of order $l$ for $C^*_j(z)$ is then defined as $C_{k,j}(z)=q_{z,k}(0)$ if $q_{z,k}$ is the unique minimizer of (2.6) and $C_{k,j}(z)=0$ otherwise. The value $\delta$ is called a bandwidth and the function $K$ is called a kernel function. In [3], it is shown that the local polynomial estimates $C_{k,0}(z),\ldots,C_{k,\mathcal J-1}(z)$ of degree $l$ satisfy the assumption (AQ) with $\gamma_k=k^{-2\beta/(2\beta+d)}$ under $\beta$-Hölder smoothness of the continuation values $C^*_0(z),\ldots,C^*_{\mathcal J-1}(z)$, provided $\delta=k^{-1/(2l+d)}$. Since in general the summation in (2.6) runs over all $k$ paths (see Figure 2.1), we have $\kappa_1=1$ and $\kappa_2=1$ in (AC).

Figure 2.1. Local regression and mesh methods: in order to compute the continuation value estimate $C_{k,5}$ at a point (red) lying on a testing path (blue), all $k$ points (yellow) on the training paths at time 5 have to be used.
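A minimal sketch of the local regression evaluation of Example 2.2 for degree $l=1$ (local linear), assuming a Gaussian-type kernel; the names are illustrative only:

```python
import numpy as np

def local_linear_estimate(z, Z_j, zeta_j1, delta):
    """Local polynomial estimator of degree l = 1 at a single point z (Example 2.2).

    z       : query point, shape (d,)
    Z_j     : training points at time t_j, shape (k, d)
    zeta_j1 : responses zeta_{j+1,k}(Z^(i)_{j+1}), shape (k,)
    delta   : bandwidth
    """
    diff = Z_j - z                                           # centred regressors Z^(i)_j - z
    w = np.exp(-0.5 * np.sum((diff / delta) ** 2, axis=1))   # kernel weights K((Z_j - z)/delta)
    Q = np.hstack([np.ones((len(Z_j), 1)), diff])            # design for polynomials of degree 1
    A = Q.T @ (w[:, None] * Q)
    rhs = Q.T @ (w * zeta_j1)
    try:
        coef = np.linalg.solve(A, rhs)
    except np.linalg.LinAlgError:                            # no unique minimizer of (2.6)
        return 0.0
    return coef[0]                                           # q_{z,k}(0) = fitted intercept
```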

Example 2.3 (Mesh method). In the mesh method of [6], the continuation value $C^*_j$ at a point $z$ is approximated via
$$C_{k,j}(z)=\frac1k\sum_{i=1}^{k}\zeta_{k,j+1}(Z^{(i)}_{j+1})\cdot w^i_j(z), \tag{2.7}$$
where $\zeta_{k,j+1}(z)=\max\bigl(g_{j+1}(z),C_{k,j+1}(z)\bigr)$ and
$$w^i_j(z)=\frac{p_j\bigl(z,Z^{(i)}_{j+1}\bigr)}{\frac1k\sum_{l=1}^{k}p_j\bigl(Z^{(l)}_j,Z^{(i)}_{j+1}\bigr)},$$
where $p_j(x,\cdot)$ is the conditional density of $Z_{j+1}$ given $Z_j=x$. Again the summation in (2.7) runs over all $k$ paths. Hence $\kappa_1=1$ in (AC), and for any $j=0,\ldots,\mathcal J-1$ the complexity of computing $C_{k,j}(z)$ at a point $z$ not belonging to the set of training trajectories is of order $k$ (see Figure 2.1), provided the transition density $p_j(x,y)$ is analytically known. For assumption (AQ) see, e.g., [1].
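A minimal sketch of the mesh estimator (2.7), assuming the transition densities are available as a callable; the names are illustrative:

```python
import numpy as np

def mesh_continuation_value(z, Z_j, Z_j1, zeta_j1, density_j):
    """Mesh estimate C_{k,j}(z) of Example 2.3 at a single point z.

    Z_j, Z_j1 : training points at times t_j and t_{j+1}, shape (k, d)
    zeta_j1   : values zeta_{k,j+1}(Z^(i)_{j+1}) = max(g_{j+1}, C_{k,j+1}), shape (k,)
    density_j : callable taking two (m, d) arrays and returning the m densities p_j(x_r, y_r)
    """
    k = len(Z_j)
    num = density_j(np.tile(z, (k, 1)), Z_j1)        # p_j(z, Z^(i)_{j+1}) for each i
    # denominator of the mesh weights: (1/k) sum_l p_j(Z^(l)_j, Z^(i)_{j+1}) for each i
    denom = np.array([density_j(Z_j, np.tile(Z_j1[i], (k, 1))).mean() for i in range(k)])
    weights = num / denom                             # mesh weights w^i_j(z)
    return np.mean(zeta_j1 * weights)                 # estimate (2.7)
```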

3. Complexity analysis of $V^{n,k}_0$. We shall use throughout the notation $A\lesssim B$ if $A$ is bounded by a constant multiple of $B$, independently of the parameters involved, that is, $A=O(B)$ in the Landau notation. Similarly, $A\gtrsim B$ means $B\lesssim A$, and $A\sim B$ stands for $A\lesssim B$ and $A\gtrsim B$ simultaneously.

In order to carry out the complexity analysis of the estimate (2.3), we need the so-called “margin” or boundary assumption.

(AM) There exist constants $A>0$, $\delta_0>0$ and $\alpha>0$ such that
$$P\bigl(|C^*_j(Z_j)-g_j(Z_j)|\le\delta\bigr)\le A\delta^\alpha$$
for all $j=0,\ldots,\mathcal J$ and all $0<\delta\le\delta_0$.

Remark 3.1. Assumption (AM) provides a useful characterization of the behavior of the continuation values $(C^*_j)$ and payoffs $(g_j)$ near the exercise boundary $\partial\mathcal E$ with
$$\mathcal E=\bigl\{(j,x):\;g_j(x)\ge C^*_j(x)\bigr\}.$$
In the situation when all functions $C^*_j-g_j$, $j=0,\ldots,\mathcal J-1$, are smooth and have non-vanishing derivatives in the vicinity of the exercise boundary, we have $\alpha=1$. Other values of $\alpha$ are possible as well, see [3]. Let us now turn to the properties of the estimate $V^{n,k}_0$. While the variance of the estimate $V^{n,k}_0$ is given by
$$\mathrm{Var}[V^{n,k}_0]=\mathrm{Var}[g_{\tau_k}(Z_{\tau_k})]/n, \tag{3.1}$$
its bias is analysed in the following theorem.

Theorem 3.2. Suppose that (AP), (AM) and (AQ) hold with some $\alpha>0$; then
$$\bigl|V^*_0-E[V^{n,k}_0]\bigr|\lesssim\gamma_k^{(1+\alpha)/2},\quad k\to\infty.$$

The next theorem gives an upper estimate for the complexity of $V^{n,k}_0$.

Theorem 3.3. Let assumptions (AP), (AC), (AQ) and (AM) hold with
$$\gamma_k=k^{-\mu},\quad k\in\mathbb N,$$
for some $\mu>0$. Then for some constant $c>0$ and any $\varepsilon>0$ the choice
$$k^*=c\,\varepsilon^{-\frac{2}{\mu(1+\alpha)}},\qquad n^*=c\,\varepsilon^{-2}$$
leads to
$$E\bigl[V^{n^*,k^*}_0-V^*_0\bigr]^2\le\varepsilon^2, \tag{3.2}$$


and the complexity of the estimate $V^{n^*,k^*}_0$ (i.e. the cost needed to achieve (3.2)) is bounded from above by $\mathcal C_{n^*,k^*}(\varepsilon)$ with
$$\mathcal C_{n^*,k^*}(\varepsilon)\lesssim\varepsilon^{-2\cdot\max\left(\frac{\kappa_1+1}{\mu(1+\alpha)},\,1+\frac{\kappa_2}{\mu(1+\alpha)}\right)},\quad\varepsilon\to 0. \tag{3.3}$$

Discussion. Theorem 3.3 implies that the complexity of $V^{n^*,k^*}_0$ is always larger than $\varepsilon^{-2}$. In the case $\kappa_1=1$ and $\kappa_2=1$ (mesh method or local regression) we get
$$\mathcal C_{n^*,k^*}(\varepsilon)\lesssim\varepsilon^{-2\cdot\max\left(\frac{2}{\mu(1+\alpha)},\,1+\frac{1}{\mu(1+\alpha)}\right)}. \tag{3.4}$$
Furthermore, in the most common case $\alpha=1$, the bound (3.4) simplifies to
$$\mathcal C_{n^*,k^*}(\varepsilon)\lesssim\varepsilon^{-2\cdot\max\left(\frac{1}{\mu},\,1+\frac{1}{2\mu}\right)}.$$
Since for all regression methods and the mesh method $\mu\le 1$, the asymptotic complexity is always at least of order $\varepsilon^{-3}$. In the next section, we present a multilevel approach that can reduce the asymptotic complexity to $\varepsilon^{-2}$ in some cases.

4. Multilevel approach. Fix some natural number $L$ and let $\mathbf k=(k_0,k_1,\ldots,k_L)$ and $\mathbf n=(n_0,n_1,\ldots,n_L)$ be two sequences of natural numbers satisfying $k_0<k_1<\ldots<k_L$ and $n_0>n_1>\ldots>n_L$. Define
$$V^{\mathbf n,\mathbf k}_0=\frac{1}{n_0}\sum_{r=1}^{n_0}g_{\tau^{(r)}_{k_0}}\Bigl(Z^{(r)}_{\tau^{(r)}_{k_0}}\Bigr)+\sum_{l=1}^{L}\frac{1}{n_l}\sum_{r=1}^{n_l}\Bigl[g_{\tau^{(r)}_{k_l}}\Bigl(Z^{(r)}_{\tau^{(r)}_{k_l}}\Bigr)-g_{\tau^{(r)}_{k_{l-1}}}\Bigl(Z^{(r)}_{\tau^{(r)}_{k_{l-1}}}\Bigr)\Bigr]$$
with
$$\tau^{(r)}_k=\inf\bigl\{0\le j\le\mathcal J:\;g_j(Z^{(r)}_j)\ge C_{k,j}(Z^{(r)}_j)\bigr\},\quad k\in\mathbb N,$$
where for any $l=1,\ldots,L$, both estimates $C_{k_l,j}$ and $C_{k_{l-1},j}$ are based on one set of $k_l$ training trajectories, i.e. to estimate $C_{k_{l-1},j}$ we use a subset of $k_{l-1}$ trajectories from the set of $k_l$ trajectories used to approximate $C_{k_l,j}$. Note that in all levels $l=1,\ldots,L$, we use the same testing paths $Z^{(r)}_\cdot$, $r=1,\ldots,n_l$, in $g_{\tau^{(r)}_{k_l}}\bigl(Z^{(r)}_{\tau^{(r)}_{k_l}}\bigr)$ and $g_{\tau^{(r)}_{k_{l-1}}}\bigl(Z^{(r)}_{\tau^{(r)}_{k_{l-1}}}\bigr)$. Let us analyse the properties of the estimate $V^{\mathbf n,\mathbf k}_0$. First note that its bias coincides with the bias of $g_{\tau_{k_L}}(Z_{\tau_{k_L}})$ corresponding to the finest approximation level. As to the variance of $V^{\mathbf n,\mathbf k}_0$, it can be significantly reduced due to the use of “good” continuation value estimates $C_{k_{l-1},j}$ and $C_{k_l,j}$ (that are both close to $C^*_j$) on the same set of testing trajectories in each level. In this way a “coupling” effect is achieved. The following theorem quantifies the above heuristics.
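A minimal sketch of the multilevel estimator above, assuming a helper `lower_bound_samples` (illustrative, not from the paper) that returns the individual payoffs $g_{\tau^{(r)}_k}\bigl(Z^{(r)}_{\tau^{(r)}_k}\bigr)$ for a batch of testing paths and a given set of continuation value estimates:

```python
import numpy as np

def multilevel_estimate(simulate_paths, lower_bound_samples, cont_values_per_level, n):
    """Multilevel low-biased estimate of Section 4.

    simulate_paths        : callable n_l -> array of n_l fresh testing trajectories
    lower_bound_samples   : callable (paths, cont_values) -> array of per-path payoffs
                            g_{tau_k}(Z_{tau_k})
    cont_values_per_level : list indexed by l = 0, ..., L of continuation value estimates
                            built with k_0 < k_1 < ... < k_L training paths
    n                     : list of sample sizes n_0 > n_1 > ... > n_L
    """
    paths0 = simulate_paths(n[0])
    estimate = lower_bound_samples(paths0, cont_values_per_level[0]).mean()
    for l in range(1, len(cont_values_per_level)):
        paths = simulate_paths(n[l])          # fresh testing paths for level l ...
        fine = lower_bound_samples(paths, cont_values_per_level[l])
        coarse = lower_bound_samples(paths, cont_values_per_level[l - 1])
        estimate += (fine - coarse).mean()    # ... but the SAME paths for both stopping rules
    return estimate
```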

Theorem 4.1. Let (AP), (AQ) and (AM) hold with some $\alpha>0$. If
$$\mathcal M_p:=E\Bigl[\bigl|\max_{l=0,\ldots,\mathcal J}g_l(Z_l)\bigr|^{2p}\Bigr]<\infty$$
for some $p\ge 1$, then
$$E\Bigl[\bigl|g_{\tau_{k_l}}\bigl(Z_{\tau_{k_l}}\bigr)-g_{\tau_{k_{l-1}}}\bigl(Z_{\tau_{k_{l-1}}}\bigr)\bigr|^2\Bigr]\le C\mathcal M_p^{1/p}\,\gamma_{k_{l-1}}^{\alpha/(2q)}$$


for any $l=1,\ldots,L$, some absolute constant $C>0$ and $q$ satisfying $1/p+1/q=1$. As a corollary we get the following result.

Theorem 4.2. Suppose that the conditions of Theorem 4.1 are fulfilled; then the estimate $V^{\mathbf n,\mathbf k}_0$ has a bias of order $\gamma_{k_L}^{(1+\alpha)/2}$ and a variance of order
$$\frac{\mathrm{Var}[g_{\tau_{k_0}}(Z_{\tau_{k_0}})]}{n_0}+\sum_{l=1}^{L}\frac{\gamma_{k_{l-1}}^{\alpha/(2q)}}{n_l}.$$
Furthermore, under assumption (AC), the cost of $V^{\mathbf n,\mathbf k}_0$ is bounded from above by a multiple of
$$\sum_{l=0}^{L}\bigl(k_l^{\kappa_1+1}+n_l\cdot k_l^{\kappa_2}\bigr).$$

Finally, the complexity of $V^{\mathbf n,\mathbf k}_0$ is given in the following theorem.

Theorem 4.3. Let assumptions (AP), (AC), (AQ) and (AM) hold with
$$\gamma_{k_l}=k_l^{-\mu},\quad k_l\in\mathbb N,$$
for some $\mu>0$. Then under the choice $k^*_l=k_0\cdot\theta^l$, $l=0,1,\ldots,L$, with $\theta>1$,
$$L=\Bigl\lceil c\,\frac{2}{\mu(1+\alpha)}\log_\theta\bigl(\varepsilon^{-1}\cdot k_0^{-\mu(1+\alpha)/2}\bigr)\Bigr\rceil$$
and
$$n^*_l=c\,\varepsilon^{-2}\Bigl(\sum_{i=1}^{L}k_i^{(\kappa_2-\mu\alpha/(2q))/2}\Bigr)\cdot\sqrt{k_l^{-\kappa_2-\mu\alpha/(2q)}}$$
for some constant $c$ not depending on $\varepsilon$, the multilevel estimate $V^{\mathbf n^*,\mathbf k^*}_0$ fulfils
$$E\bigl[V^{\mathbf n^*,\mathbf k^*}_0-V^*_0\bigr]^2\le\varepsilon^2.$$
As a result, the complexity of $V^{\mathbf n^*,\mathbf k^*}_0$ is bounded from above, up to a constant, by
$$\mathcal C_{\mathbf n^*,\mathbf k^*}(\varepsilon)\lesssim\begin{cases}\varepsilon^{-2\cdot\max\left(\frac{\kappa_1+1}{\mu(1+\alpha)},\,1\right)}, & 2q\kappa_2<\mu\alpha,\\[4pt]\varepsilon^{-2\cdot\frac{\kappa_1+1}{\mu(1+\alpha)}}, & 2q\kappa_2=\mu\alpha\text{ and }\frac{\kappa_1+1}{\mu(1+\alpha)}>1,\\[4pt]\varepsilon^{-2}\cdot(\log\varepsilon)^2, & 2q\kappa_2=\mu\alpha\text{ and }\frac{\kappa_1+1}{\mu(1+\alpha)}\le1,\\[4pt]\varepsilon^{-2\cdot\max\left(\frac{\kappa_1+1}{\mu(1+\alpha)},\,1+\frac{\kappa_2-\mu\alpha/(2q)}{\mu(1+\alpha)}\right)}, & 2q\kappa_2>\mu\alpha.\end{cases} \tag{4.1}$$


Discussion. Let us compare the complexities of the estimates $V^{n^*,k^*}_0$ and $V^{\mathbf n^*,\mathbf k^*}_0$. For the sake of clarity, we will assume that $\kappa_1=\kappa_2=\kappa$, as in the mesh or local regression methods. Then (4.1) versus (3.3) can be written as
$$\begin{cases}\varepsilon^{-2\cdot\max\left(\frac{\kappa+1}{\mu(1+\alpha)},\,1\right)}, & 2q\kappa<\mu\alpha,\\[4pt]\varepsilon^{-2\cdot\frac{\kappa+1}{\mu(1+\alpha)}}, & 2q\kappa=\mu\alpha\text{ and }\frac{\kappa+1}{\mu(1+\alpha)}>1,\\[4pt]\varepsilon^{-2}\cdot(\log\varepsilon)^2, & 2q\kappa=\mu\alpha\text{ and }\frac{\kappa+1}{\mu(1+\alpha)}\le1,\\[4pt]\varepsilon^{-2\cdot\max\left(\frac{\kappa+1}{\mu(1+\alpha)},\,1+\frac{\kappa-\mu\alpha/(2q)}{\mu(1+\alpha)}\right)}, & 2q\kappa>\mu\alpha,\end{cases}\qquad\text{versus}\qquad\varepsilon^{-2\cdot\max\left(\frac{\kappa+1}{\mu(1+\alpha)},\,1+\frac{\kappa}{\mu(1+\alpha)}\right)}.$$
Now it is clear that the multilevel algorithm will not be superior to the standard Monte Carlo algorithm as long as $\mu(1+\alpha)\le1$. In the case $\mu(1+\alpha)>1$, the computational gain, up to a logarithmic factor, is given by
$$\begin{cases}\varepsilon^{-2\cdot\min\left(\frac{\kappa}{\mu(1+\alpha)},\,1-\frac{1}{\mu(1+\alpha)}\right)}, & 2q\kappa<\mu\alpha,\\[4pt]\varepsilon^{-2\cdot\left(1-\frac{1}{\mu(1+\alpha)}\right)}, & 2q\kappa=\mu\alpha\text{ and }\frac{\kappa+1}{\mu(1+\alpha)}>1,\\[4pt]\varepsilon^{-2\cdot\frac{\kappa}{\mu(1+\alpha)}}, & 2q\kappa=\mu\alpha\text{ and }\frac{\kappa+1}{\mu(1+\alpha)}\le1,\\[4pt]\varepsilon^{-2\cdot\min\left(1-\frac{1}{\mu(1+\alpha)},\,\frac{\mu\alpha/(2q)}{\mu(1+\alpha)}\right)}, & 2q\kappa>\mu\alpha.\end{cases}$$
Taking into account the fact that $\alpha=1$ in the usual situation, we conclude that it is advantageous to use MLMC as long as $\mu>1/(2q)$.

5. Numerical example: Bermudan max calls on multiple assets. The numerical experiments below aim to illustrate the potential of our algorithm as a general complexity reduction tool. Further computational savings can be achieved by using additional variance reduction techniques.

5.1. Computational problem. Suppose that the price of the underlying asset $X=(X^1,\ldots,X^d)$ follows a geometric Brownian motion (GBM) under the risk-neutral measure, i.e.,
$$dX^i_t=(r-\delta)X^i_t\,dt+\sigma X^i_t\,dB^i_t, \tag{5.1}$$
where $r$ is the risk-free interest rate, $\delta$ the dividend rate, $\sigma$ the volatility, and $B_t=(B^1_t,\ldots,B^d_t)$ is a $d$-dimensional Brownian motion. At any time $t\in\{t_0,\ldots,t_{\mathcal J}\}$ the holder of the option may exercise it and receive the payoff
$$h(X_t)=e^{-rt}\bigl(\max(X^1_t,\ldots,X^d_t)-S\bigr)^+. \tag{5.2}$$
We consider a benchmark example (see, e.g., [6], p. 462) with parameters $x_0=(90,\ldots,90)$, $\sigma=0.2$, $r=0.05$, $\delta=0.1$, $S=100$, $t_j=jT/\mathcal J$, $j=0,\ldots,\mathcal J$, with $T=3$ and $\mathcal J=9$.
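A minimal sketch of the benchmark setup of Section 5.1, assuming exact GBM simulation over the exercise grid (the parameter values mirror the text; the functions themselves are illustrative):

```python
import numpy as np

# benchmark parameters from Section 5.1
r, delta, sigma, S, T, J, d = 0.05, 0.1, 0.2, 100.0, 3.0, 9, 2
x0 = np.full(d, 90.0)
dt = T / J

def simulate_paths(n, rng=np.random.default_rng(0)):
    """Exact GBM simulation of Z_j = X_{t_j} on the exercise grid; output shape (n, J+1, d)."""
    xi = rng.standard_normal((n, J, d))
    log_increments = (r - delta - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * xi
    log_paths = np.concatenate(
        [np.zeros((n, 1, d)), np.cumsum(log_increments, axis=1)], axis=1
    )
    return x0 * np.exp(log_paths)

def payoff(j, z):
    """Discounted max-call payoff (5.2) at exercise date t_j for points z of shape (m, d)."""
    return np.exp(-r * j * dt) * np.maximum(z.max(axis=1) - S, 0.0)
```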

5.2. MLMC algorithm implementation. Let us first present a general version of the algorithm, which does not rely on the type of the continuation value estimation method. Write a multilevel estimate based on $L$ levels as
$$P_L=\sum_{l=0}^{L}\widehat Y_l=\frac{1}{n_0}\sum_{r=1}^{n_0}g_{\tau^{(r)}_{k_0}}\Bigl(Z^{(r)}_{\tau^{(r)}_{k_0}}\Bigr)+\sum_{l=1}^{L}\frac{1}{n_l}\sum_{r=1}^{n_l}\Bigl[g_{\tau^{(r)}_{k_l}}\Bigl(Z^{(r)}_{\tau^{(r)}_{k_l}}\Bigr)-g_{\tau^{(r)}_{k_{l-1}}}\Bigl(Z^{(r)}_{\tau^{(r)}_{k_{l-1}}}\Bigr)\Bigr]$$
with
$$\tau^{(r)}_k=\inf\bigl\{0\le j\le\mathcal J:\;g_j(Z^{(r)}_j)\ge C_{k,j}(Z^{(r)}_j)\bigr\},\quad k\in\mathbb N.$$
Here $\widehat Y_l$ is the Monte Carlo estimate for
$$Y_l=\begin{cases}E\Bigl[g_{\tau_{k_l}}\bigl(Z_{\tau_{k_l}}\bigr)-g_{\tau_{k_{l-1}}}\bigl(Z_{\tau_{k_{l-1}}}\bigr)\Bigr], & l>0,\\[4pt]E\Bigl[g_{\tau_{k_0}}\bigl(Z_{\tau_{k_0}}\bigr)\Bigr], & l=0.\end{cases}$$

1. Set $L=2$.
2. For $l=0,\ldots,L$:
• generate $k_l$ and $k_{l-1}$ training paths;
• estimate the continuation values $C^l_1,\ldots,C^l_{\mathcal J}$ and $C^{l-1}_1,\ldots,C^{l-1}_{\mathcal J}$;
• generate $10^4$ testing paths and estimate the variance $\mathrm{Var}\,\widehat Y_l$.
3. Calculate $n_l$, $l=0,\ldots,L$, according to
$$n_l=\Bigl\lceil 3\cdot\varepsilon^{-2}\cdot\Bigl(\sum_{i=1}^{L}\sqrt{k_i^{\kappa_2}\cdot\mathrm{Var}\,\widehat Y_i}\Bigr)\cdot\sqrt{k_l^{-\kappa_2}\cdot\mathrm{Var}\,\widehat Y_l}\Bigr\rceil. \tag{5.3}$$
4. Estimate/update $\widehat Y_0,\ldots,\widehat Y_L$. If
$$\max\bigl(|\widehat Y_{L-1}|/2,\,|\widehat Y_L|\bigr)>\varepsilon/\sqrt3,$$
then
• set $L=L+1$;
• generate $k_L$ and $k_{L-1}$ training trajectories;
• estimate the continuation values $C^L_1,\ldots,C^L_{\mathcal J}$ and $C^{L-1}_1,\ldots,C^{L-1}_{\mathcal J}$;
• generate $10^4$ testing paths and estimate the variance $\mathrm{Var}\,\widehat Y_L$;
• go to step 3.
5. Return $P_L$.

While the cost of the standard MC algorithm is of order
$$k_L^{\kappa_1+1}+3\cdot k_L^{\kappa_2}\cdot\varepsilon^{-2}\cdot\sum_{l=0}^{L}\mathrm{Var}\,\widehat Y_l,$$
the cost of the above MLMC algorithm can be calculated as
$$\sum_{l=0}^{L}\Bigl[k_l^{\kappa_1+1}+k_{l-1}^{\kappa_1+1}+n_l\cdot\bigl(k_l^{\kappa_2}+k_{l-1}^{\kappa_2}\bigr)\Bigr]$$

with $n_l$ given by (5.3).

Remark 5.1. The constant 3 in (5.3) is motivated by the following decomposition:
$$E\bigl[|P_L-V^*_0|^2\bigr]=|E[P_L]-V^*_0|^2+\mathrm{Var}\bigl[E[P_L\,|\,\mathcal F(k_1,\ldots,k_L)]\bigr]+E\bigl[\mathrm{Var}[P_L\,|\,\mathcal F(k_1,\ldots,k_L)]\bigr],$$
where $\mathcal F(k_1,\ldots,k_L)$ is the $\sigma$-algebra generated by all training paths used to construct the continuation value estimates on all levels.
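To make the procedure above concrete, here is a minimal sketch of the adaptive driver (steps 1-5), assuming helper routines `build_cont_values(k)` and `level_samples(l, n, cont)` that return continuation value estimates trained on $k$ paths and $n$ sampled level-$l$ increments, respectively; all names are illustrative, not from the paper:

```python
import numpy as np

def mlmc_driver(eps, build_cont_values, level_samples, k0=20, theta=2, kappa2=1.0):
    """Adaptive MLMC driver following steps 1-5 of Section 5.2 (illustrative sketch).

    build_cont_values(k)      : trains continuation value estimates on k paths
    level_samples(l, n, cont) : returns n samples of the level-l increment Y_l
    """
    L = 2
    cont = [build_cont_values(k0 * theta**l) for l in range(L + 1)]    # training phase
    samples = [level_samples(l, 10**4, cont) for l in range(L + 1)]    # pilot runs
    while True:
        k = np.array([k0 * theta**l for l in range(L + 1)], dtype=float)
        var = np.array([y.var() for y in samples])
        # step 3: sample sizes n_l from (5.3)
        s = np.sum(np.sqrt(k[1:]**kappa2 * var[1:]))
        n = np.maximum(np.ceil(3 * eps**-2 * s * np.sqrt(var / k**kappa2)), 1).astype(int)
        # step 4: estimate/update the level means with the chosen n_l
        samples = [level_samples(l, n[l], cont) for l in range(L + 1)]
        means = np.array([y.mean() for y in samples])
        if max(abs(means[-2]) / 2, abs(means[-1])) <= eps / np.sqrt(3):
            return means.sum()                  # step 5: return P_L
        L += 1                                  # bias indicator still too large: add a level
        cont.append(build_cont_values(k0 * theta**L))
        samples.append(level_samples(L, 10**4, cont))
```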


5.3. Mesh method. In this section we report numerical results for a two dimensional case ($d=2$), where the true price of the resulting Bermudan option is 8.08 (see [11], p. 462). We shall use the mesh method combined with variance reduction to estimate the continuation values via
$$C^k_j(x)=\sum_{i=1}^{k}w^i_j(x)\cdot\max\Bigl(g_{j+1}(Z^{(i)}_{j+1}),\,C^k_{j+1}(Z^{(i)}_{j+1})\Bigr)-b_j(x)\cdot\Bigl(e^{-r(t_{\mathcal J}-t_j)}\max_{m=1,\ldots,d}\bigl(Z^m_{\mathcal J}-S\bigr)^+-v_j(x)\Bigr), \tag{5.4}$$
where $v_j(x)$ is the expectation of the control, i.e.,
$$v_j(x):=E\Bigl[e^{-r(t_{\mathcal J}-t_j)}\max_{m=1,\ldots,d}\bigl(Z^m_{\mathcal J}-S\bigr)^+\,\Bigm|\,Z_j=x\Bigr].$$
In fact, $v_j(x)$ can be obtained analytically, as the value of the corresponding European max-call option.

It is easy to see that the conditions of Theorem 3.3 and Theorem 4.3 are fulfilled with $\gamma_k=1/k$ in (AQ) and $\kappa_1=\kappa_2=1$ in (AC) for the mesh method. Moreover, for the problem at hand, the assumption (AM) holds with $\alpha=1$. We set the number of training paths for the mesh method to be
$$k_l=20\times 2^l,$$
and simulate independently $k_l$ training paths of the process $Z$ using the exact simulation formula
$$Z^{(i)}_j=Z^{(i)}_{j-1}\exp\Bigl(\bigl[r-\delta-\tfrac12\sigma^2\bigr](t_j-t_{j-1})+\sigma\sqrt{t_j-t_{j-1}}\cdot\xi^i_j\Bigr),$$

where $\xi^i_j$, $i=1,\ldots,k$, are i.i.d. standard normal random variables. The conditional density of $Z_j$ given $Z_{j-1}$ is explicitly given by
$$p_j(x,y)=\prod_{i=1}^{d}p_j(x^i,y^i),\quad x=(x^1,\ldots,x^d),\;y=(y^1,\ldots,y^d),$$
where
$$p_j(x^i,y^i)=\frac{1}{y^i\sigma\sqrt{2\pi(t_j-t_{j-1})}}\exp\Biggl(-\frac{\Bigl(\log\bigl(\tfrac{y^i}{x^i}\bigr)-\bigl(r-\delta-\tfrac12\sigma^2\bigr)(t_j-t_{j-1})\Bigr)^2}{2\sigma^2(t_j-t_{j-1})}\Biggr).$$
Using the above paths, we construct the sequence of estimates (training phase)
$$C_{k,0}(x),\ldots,C_{k,\mathcal J-1}(x)$$
as described in Example 2.3.
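A minimal sketch of the lognormal transition density used for the mesh weights above (a plain NumPy translation of the displayed formula; the function name is illustrative and it matches the `density_j` signature assumed in the mesh sketch after Example 2.3):

```python
import numpy as np

def transition_density(x, y, dt, r=0.05, delta=0.1, sigma=0.2):
    """GBM transition density p_j(x, y) over a time step dt, for arrays of points
    x, y of shape (m, d); returns the m product densities (one per row)."""
    drift = (r - delta - 0.5 * sigma**2) * dt
    z = np.log(y / x) - drift
    marginal = np.exp(-z**2 / (2 * sigma**2 * dt)) / (y * sigma * np.sqrt(2 * np.pi * dt))
    return np.prod(marginal, axis=1)          # product over the d coordinates
```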


Figure 5.1. Mesh method: level log-variances $\mathrm{Var}[\widehat Y_l]$ (estimated decay rate 0.6, theoretical decay rate 0.5), log absolute increments $|\widehat Y_l|$ (estimated decay rate 1.09, theoretical decay rate 1), RMSE and gain in cost.

In Figure 5.1, the logarithms of the estimated level variances $\mathrm{Var}[\widehat Y_l]$ (top left plot) and of the absolute values of the level means $|\widehat Y_l|$ (top right plot) are shown as functions of $l$, along with the resulting RMSE (bottom plot). The corresponding fitted lines $-0.6054\cdot l+0.0596$ and $-1.0941\cdot l-2.9253$ are in agreement with our theoretical analysis. The RMSE estimates are done for the range of precisions $\varepsilon=0.2\cdot 2^{-k}$, $k=0,1,2,3$.

5.4. Global regression. Here a one dimensional situation is studied ($d=1$) and a regression method with piecewise constant basis functions is used. For any natural number $m$, set $\Delta=100/m$ and define
$$\psi_i(x)=\begin{cases}1, & (i-1)\Delta<x-50\le i\Delta,\\ 0, & \text{otherwise},\end{cases}$$
for all $i=1,\ldots,m$. Given a sequence of natural numbers $m(k)$, $k\in\mathbb N$, the continuation value estimates are
$$C^k_j(x)=\begin{cases}E(x,j,\mathcal J), & x<50,\\[2pt]\sum_{i=1}^{m(k)}\alpha_i\psi_i(x), & 50\le x\le 150,\\[2pt]E(x,j,j+1), & x>150,\end{cases} \tag{5.5}$$

where
$$E(x,j,l)=E\Bigl[e^{-r(t_l-t_j)}\max_{k=1,\ldots,d}\bigl(Z^k_l-S\bigr)^+\,\Bigm|\,Z_j=x\Bigr]$$
for any $\mathcal J\ge l\ge j$. The number of training paths in each level is chosen to be
$$k_l=31250\times 2^l,\quad l=1,\ldots,L,$$
while the sequence $m(k)$ is given by $m(k)=\lceil 7\cdot k^\rho\rceil$ with $\rho=0.5$ (see Example 2.1). We use the same control variates as in (5.4) while estimating the continuation values. This guarantees that the variance of the continuation value estimates is small.

In Figure 5.2 the estimated level variances $\mathrm{Var}[\widehat Y_l]$ (top left plot) and the absolute values of the level means $|\widehat Y_l|$ (top right plot) are depicted along with the RMSE (bottom plot). The fitted lines for the log variances ($-0.4592\cdot l-1.4615$) and for the log absolute means ($-0.9230\cdot l-4.9415$) are in agreement with our theoretical results. The RMSE estimates are shown for $\varepsilon=0.004\cdot k$, $k=1,2,3,4,5,6$. Computational savings are greater than in the mesh method case, and they can be increased further by combining our approach with the simple variance reduction technique presented in [8].

6. Conclusion. In this paper we introduced a novel MLMC algorithm for the evaluation problem of Bermudan options. We developed a general framework which can be used in combination with various fast approximation algorithms for continuation value estimation. As a matter of fact, the Bermudan option price evaluation problem is considered to be one of the most difficult and computationally demanding problems in financial mathematics (see [11]), and any gains are of importance. Our algorithm can be extended in other directions; see, e.g., the recent work [8], where additional computational gains were achieved by combining our MLMC paradigm with a control variate technique. In the future we plan to develop a new version of our algorithms which combines different levels of accuracy for the continuation value estimation with different time discretization step sizes. Such an extension can be useful for the much more difficult problem of American option pricing.

7. Proofs.

7.1. Proof of Theorem 3.2. A family of stopping times $(\tau_j)_{j=0,\ldots,\mathcal J}$ w.r.t. the filtration $(\mathcal F_j)_{j=0,\ldots,\mathcal J}$ is called consistent if
$$j\le\tau_j\le\mathcal J,\qquad \tau_{\mathcal J}=\mathcal J,$$
and
$$\tau_j>j\;\Longrightarrow\;\tau_j=\tau_{j+1}.$$


Figure 5.2. Global regression method: variance decay (estimated decay rate 0.4592, theoretical decay rate 0.5), increments decay (estimated decay rate 0.9230, theoretical decay rate 1), RMSE and gain in cost.

Lemma 7.1. Let $(Y_j)$ be a process adapted to the filtration $(\mathcal F_j)$ and let $(\tau^1_j)$ and $(\tau^2_j)$ be two consistent families of stopping times. Then
$$E_{\mathcal F_j}\bigl[Y_{\tau^1_j}-Y_{\tau^2_j}\bigr]=\bigl(Y_j-E_{\mathcal F_j}\bigl[Y_{\tau^1_{j+1}}\bigr]\bigr)\bigl(\mathbf 1_{\{\tau^1_j=j,\,\tau^2_j>j\}}-\mathbf 1_{\{\tau^1_j>j,\,\tau^2_j=j\}}\bigr)+E_{\mathcal F_j}\Biggl[\sum_{l=j+1}^{\mathcal J-1}\bigl(Y_l-E_{\mathcal F_l}\bigl[Y_{\tau^1_{l+1}}\bigr]\bigr)\bigl(\mathbf 1_{\{\tau^1_l=l,\,\tau^2_l>l\}}-\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l=l\}}\bigr)\mathbf 1_{\{\tau^2_{l-1}>l-1\}}\Biggr]$$
and
$$E_{\mathcal F_j}\Bigl[\bigl|Y_{\tau^1_j}-Y_{\tau^2_j}\bigr|^2\Bigr]\le E_{\mathcal F_j}\Biggl[\sum_{l=j}^{\mathcal J-1}2^{l}\Bigl(E_{\mathcal F_l}\Bigl[\bigl|Y_l-Y_{\tau^1_{l+1}}\bigr|^2\Bigr]\Bigr)\bigl(\mathbf 1_{\{\tau^1_l=l,\,\tau^2_l>l\}}+\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l=l\}}\bigr)\Biggr]$$
for any $j=0,\ldots,\mathcal J-1$.


Proof. We have
$$\begin{aligned}Y_{\tau^1_l}-Y_{\tau^2_l}&=\bigl[Y_l-Y_{\tau^2_l}\bigr]\mathbf 1_{\{\tau^1_l=l,\,\tau^2_l>l\}}+\bigl[Y_{\tau^1_l}-Y_l\bigr]\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l=l\}}+\bigl[Y_{\tau^1_l}-Y_{\tau^2_l}\bigr]\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l>l\}}\\&=\bigl[Y_l-Y_{\tau^1_{l+1}}\bigr]\mathbf 1_{\{\tau^1_l=l,\,\tau^2_l>l\}}+\bigl[Y_{\tau^1_{l+1}}-Y_l\bigr]\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l=l\}}\\&\quad+\bigl[Y_{\tau^1_{l+1}}-Y_{\tau^2_{l+1}}\bigr]\mathbf 1_{\{\tau^1_l=l,\,\tau^2_l>l\}}+\bigl[Y_{\tau^1_{l+1}}-Y_{\tau^2_{l+1}}\bigr]\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l>l\}}.\end{aligned}$$
Therefore it holds for $\Delta_l=E_{\mathcal F_l}\bigl[Y_{\tau^1_l}-Y_{\tau^2_l}\bigr]$ that
$$\Delta_l=\bigl[Y_l-E_{\mathcal F_l}\bigl[Y_{\tau^1_{l+1}}\bigr]\bigr]\bigl(\mathbf 1_{\{\tau^1_l=l,\,\tau^2_l>l\}}-\mathbf 1_{\{\tau^1_l>l,\,\tau^2_l=l\}}\bigr)+E_{\mathcal F_l}\bigl[\Delta_{l+1}\mathbf 1_{\{\tau^2_l>l\}}\bigr]$$
with $\Delta_{\mathcal J}=0$, and the first statement follows. To prove the second inequality, note that
$$E_{\mathcal F_j}\Bigl[\bigl|Y_{\tau^1_j}-Y_{\tau^2_j}\bigr|^2\Bigr]\le 2\Bigl(E_{\mathcal F_j}\Bigl[\bigl|Y_j-Y_{\tau^1_{j+1}}\bigr|^2\Bigr]\Bigr)\bigl(\mathbf 1_{\{\tau^1_j=j,\,\tau^2_j>j\}}+\mathbf 1_{\{\tau^1_j>j,\,\tau^2_j=j\}}\bigr)+2E_{\mathcal F_j}\Bigl[E_{\mathcal F_{j+1}}\bigl|Y_{\tau^1_{j+1}}-Y_{\tau^2_{j+1}}\bigr|^2\Bigr].$$

Introduce
$$\tau_{k,l}=\min\bigl\{l\le j\le\mathcal J:\;g_j(Z_j)\ge C_{k,j}(Z_j)\bigr\},\quad l=1,\ldots,\mathcal J,$$
and note that $\tau_{k,l}$, $l=1,\ldots,\mathcal J$, is a consistent family of stopping times. Taking into account that
$$C^*_l(Z_l)=E_{\mathcal F_l}\bigl[g_{\tau^*_{l+1}}(Z_{\tau^*_{l+1}})\bigr]\le g_l(Z_l)$$
on $\{\tau^*_l=l\}$ and $C^*_l(Z_l)>g_l(Z_l)$ on $\{\tau^*_l>l\}$, we get from Lemma 7.1 with $\tau^1_j=\tau^*_j$, $\tau^2_j=\tau_{k,j}$ and $Y_j=g_j(Z_j)$, for $R:=E[V^{n,k}_0]-V^*_0$,
$$|R|=\bigl|E\bigl[g_{\tau^*}(Z_{\tau^*})-g_{\tau_k}(Z_{\tau_k})\bigr]\bigr|\le E\Biggl[\sum_{l=0}^{\mathcal J-1}\bigl|C^*_l(Z_l)-g_l(Z_l)\bigr|\bigl(\mathbf 1_{\{\tau^*_l=l,\,\tau_{k,l}>l\}}+\mathbf 1_{\{\tau^*_l>l,\,\tau_{k,l}=l\}}\bigr)\Biggr].$$
Introduce
$$E_{k,j}=\bigl\{g_j(Z_j)\ge C^*_j(Z_j),\,g_j(Z_j)<C_{k,j}(Z_j)\bigr\}\cup\bigl\{g_j(Z_j)<C^*_j(Z_j),\,g_j(Z_j)\ge C_{k,j}(Z_j)\bigr\},$$
$$A_{k,j,0}=\bigl\{0<\bigl|g_j(Z_j)-C^*_j(Z_j)\bigr|\le\gamma_k^{1/2}\bigr\},\qquad A_{k,j,i}=\bigl\{2^{i-1}\gamma_k^{1/2}<\bigl|g_j(Z_j)-C^*_j(Z_j)\bigr|\le 2^{i}\gamma_k^{1/2}\bigr\}$$


for $j=0,\ldots,\mathcal J-1$ and $i>0$. It holds that
$$|R|\le E\Biggl[\sum_{l=0}^{\mathcal J-1}\bigl|C^*_l(Z_l)-g_l(Z_l)\bigr|\mathbf 1_{E_{k,l}}\Biggr]=E\Biggl[\sum_{i=0}^{\infty}\sum_{l=0}^{\mathcal J-1}\bigl|C^*_l(Z_l)-g_l(Z_l)\bigr|\mathbf 1_{E_{k,l}\cap A_{k,l,i}}\Biggr]\le\gamma_k^{1/2}\sum_{l=0}^{\mathcal J-1}P\bigl(|g_l(Z_l)-C^*_l(Z_l)|\le\gamma_k^{1/2}\bigr)+E\Biggl[\sum_{i=1}^{\infty}\sum_{l=0}^{\mathcal J-1}\bigl|C^*_l(Z_l)-g_l(Z_l)\bigr|\mathbf 1_{E_{k,l}\cap A_{k,l,i}}\Biggr].$$
Using the fact that $|g_l(Z_l)-C^*_l(Z_l)|\le|C_{k,l}(Z_l)-C^*_l(Z_l)|$ on $E_{k,l}$, we derive
$$\begin{aligned}|R|&\le\gamma_k^{1/2}\sum_{l=0}^{\mathcal J-1}P\bigl(|g_l(Z_l)-C^*_l(Z_l)|\le\gamma_k^{1/2}\bigr)\\&\quad+\sum_{i=1}^{\infty}2^{i}\gamma_k^{1/2}\,E\Biggl[\sum_{l=0}^{\mathcal J-1}\mathbf 1_{\{|g_l(Z_l)-C^*_l(Z_l)|\le 2^{i}\gamma_k^{1/2}\}}\,P_k\bigl(|C_{k,l}(Z_l)-C^*_l(Z_l)|>2^{i-1}\gamma_k^{1/2}\bigr)\Biggr]\\&\le A\mathcal J\gamma_k^{(1+\alpha)/2}+A\mathcal J\gamma_k^{(1+\alpha)/2}\sum_{i=1}^{\infty}2^{i(1+\alpha)}B_1\exp(-B_2 2^{i-1}),\end{aligned}$$
where we have made use of the assumptions (AQ) and (AM).

7.2. Proof of Theorem 3.3. Based on (3.1), we have the optimization problem
$$k^{\kappa_1+1}+n\cdot k^{\kappa_2}\to\min\qquad\text{subject to}\qquad\gamma_k^{(1+\alpha)/2}\lesssim\varepsilon,\quad n\gtrsim\varepsilon^{-2}.$$
It is clear that
$$\gamma_k^{(1+\alpha)/2}=k^{-\mu(1+\alpha)/2}\;\Rightarrow\;k\gtrsim\varepsilon^{-\frac{2}{\mu(1+\alpha)}},$$
which immediately leads to the statement.

7.3. Proof of Theorem 4.1. We have
$$E\Bigl[\bigl(g_{\tau_{k_l}}(Z_{\tau_{k_l}})-g_{\tau_{k_{l-1}}}(Z_{\tau_{k_{l-1}}})\bigr)^2\Bigr]\le 2E\Bigl[\bigl(g_{\tau^*}(Z_{\tau^*})-g_{\tau_{k_{l-1}}}(Z_{\tau_{k_{l-1}}})\bigr)^2\Bigr]+2E\Bigl[\bigl(g_{\tau_{k_l}}(Z_{\tau_{k_l}})-g_{\tau^*}(Z_{\tau^*})\bigr)^2\Bigr].$$
It follows from Lemma 7.1 that
$$E\Bigl[\bigl(g_{\tau^*}(Z_{\tau^*})-g_{\tau_{k_{l-1}}}(Z_{\tau_{k_{l-1}}})\bigr)^2\Bigr]\le E\Biggl[\sum_{s=0}^{\mathcal J-1}2^{s}\xi_s\bigl(\mathbf 1_{\{\tau^*_s=s,\,\tau_{k_{l-1},s}>s\}}+\mathbf 1_{\{\tau^*_s>s,\,\tau_{k_{l-1},s}=s\}}\bigr)\Biggr]$$


with $\xi_s=E_{\mathcal F_s}\bigl[\bigl|g_s(Z_s)-V^*_{s+1}(Z_{s+1})\bigr|^2\bigr]$. By the Hölder inequality we get
$$E\bigl[\xi_s\bigl(\mathbf 1_{\{\tau^*_s=s,\,\tau_{k_{l-1},s}>s\}}+\mathbf 1_{\{\tau^*_s>s,\,\tau_{k_{l-1},s}=s\}}\bigr)\bigr]\le 4\mathcal M_p^{1/p}\bigl[P(\tau^*_s=s,\,\tau_{k_{l-1},s}>s)+P(\tau^*_s>s,\,\tau_{k_{l-1},s}=s)\bigr]^{1/q}=4\mathcal M_p^{1/p}\bigl[P(E_{k_{l-1},s})\bigr]^{1/q}$$
with $1/p+1/q=1$, since
$$E\bigl[|\xi_s|^p\bigr]\le E\Bigl[\bigl|g_s(Z_s)-V^*_{s+1}(Z_{s+1})\bigr|^{2p}\Bigr]\le 2^{2p-1}E\bigl[|g_s(Z_s)|^{2p}\bigr]+2^{2p-1}E\bigl[|V^*_{s+1}(Z_{s+1})|^{2p}\bigr]\le 2^{2p-1}E\bigl[|g_s(Z_s)|^{2p}\bigr]+2^{2p-1}E\Bigl[\bigl|\max_{k=s+1,\ldots,\mathcal J}g_k(Z_k)\bigr|^{2p}\Bigr]\le 2^{2p}\mathcal M_p.$$
Due to
$$P(E_{k_{l-1},s})=\sum_{i=0}^{\infty}P(E_{k_{l-1},s}\cap A_{k_{l-1},s,i}),$$
we have
$$E\Bigl[\bigl(g_{\tau^*}(Z_{\tau^*})-g_{\tau_{k_{l-1}}}(Z_{\tau_{k_{l-1}}})\bigr)^2\Bigr]\le C\mathcal M_p^{1/p}\sum_{i=0}^{\infty}\sum_{s=0}^{\mathcal J-1}2^{s}\bigl[P(E_{k_{l-1},s}\cap A_{k_{l-1},s,i})\bigr]^{1/q},$$
where
$$P(E_{k_{l-1},s}\cap A_{k_{l-1},s,0})\le P\bigl(|g_s(Z_s)-C^*_s(Z_s)|\le\gamma_{k_{l-1}}^{1/2}\bigr)\le A\gamma_{k_{l-1}}^{\alpha/2}$$
and
$$P(E_{k_{l-1},s}\cap A_{k_{l-1},s,i})\le E\Bigl[\mathbf 1_{\{|g_s(Z_s)-C^*_s(Z_s)|\le 2^{i}\gamma_{k_{l-1}}^{1/2}\}}\,P_{k_{l-1}}\bigl(\bigl|C_{k_{l-1},s}(Z_s)-C^*_s(Z_s)\bigr|>2^{i-1}\gamma_{k_{l-1}}^{1/2}\bigr)\Bigr]\le A\gamma_{k_{l-1}}^{\alpha/2}\,2^{i\alpha}B_1\exp(-B_2 2^{i-1})$$
for $i>0$. As a result,
$$E\Bigl[\bigl(g_{\tau^*}(Z_{\tau^*})-g_{\tau_{k_{l-1}}}(Z_{\tau_{k_{l-1}}})\bigr)^2\Bigr]\le C\gamma_{k_{l-1}}^{\alpha/(2q)}$$
for some $C>0$.


7.4. Proof of Theorem 4.3. Due to the monotone structure of the functional, we can consider the following optimization problem:
$$\sum_{l=0}^{L}\bigl(k_l^{\kappa_1+1}+n_l\cdot k_l^{\kappa_2}\bigr)\to\min \tag{7.1}$$
subject to
$$\gamma_{k_L}^{(1+\alpha)/2}=k_L^{-\mu(1+\alpha)/2}=\bigl(k_0\cdot\theta^L\bigr)^{-\mu(1+\alpha)/2}\lesssim\varepsilon, \tag{7.2}$$
$$\sum_{l=1}^{L}\frac{\gamma_{k_{l-1}}^{\alpha/(2q)}}{n_l}=k_0^{-\mu\alpha/(2q)}\cdot\sum_{l=1}^{L}\frac{\theta^{-(l-1)\mu\alpha/(2q)}}{n_l}\lesssim\varepsilon^2, \tag{7.3}$$
$$n_0\sim\varepsilon^{-2}. \tag{7.4}$$
Now the Lagrange multiplier method with respect to $n_l$ gives us
$$k_l^{\kappa_2}=-\lambda\,\frac{k_l^{-\mu\alpha/(2q)}}{n_l^{2}}\;\Rightarrow\;n_l=\sqrt{(-\lambda)\cdot k_l^{-\kappa_2-\mu\alpha/(2q)}}.$$
Plugging this value of $n_l$ into (7.3) gives
$$\sum_{l=1}^{L}\frac{\gamma_{k_{l-1}}^{\alpha/(2q)}}{n_l}\sim\sum_{l=1}^{L}\frac{k_l^{-\mu\alpha/(2q)}}{\sqrt{(-\lambda)\cdot k_l^{-\kappa_2-\mu\alpha/(2q)}}}\sim\varepsilon^2\;\Rightarrow\;\sqrt{-\lambda}\sim\varepsilon^{-2}\sum_{l=1}^{L}k_l^{(\kappa_2-\mu\alpha/(2q))/2}\;\Rightarrow\;n_l\sim\varepsilon^{-2}\Bigl(\sum_{i=1}^{L}k_i^{(\kappa_2-\mu\alpha/(2q))/2}\Bigr)\cdot\sqrt{k_l^{-\kappa_2-\mu\alpha/(2q)}}.$$
For the total number of levels we have from (7.2):
$$\bigl(k_0\cdot\theta^L\bigr)^{-\mu(1+\alpha)/2}\lesssim\varepsilon\;\Rightarrow\;L\ge\frac{2}{\mu(1+\alpha)}\log_\theta\bigl(\varepsilon^{-1}\cdot k_0^{-\mu(1+\alpha)/2}\bigr).$$
Due to the special structure of the constraints and the functional, we can set
$$L=\Bigl\lceil c\,\frac{2}{\mu(1+\alpha)}\log_\theta\bigl(\varepsilon^{-1}\cdot k_0^{-\mu(1+\alpha)/2}\bigr)\Bigr\rceil.$$
Now we can rewrite (7.1) as
$$\sum_{l=0}^{L}\bigl(k_l^{\kappa_1+1}+n_l\cdot k_l^{\kappa_2}\bigr)\sim k_L^{\kappa_1+1}+\varepsilon^{-2}\cdot\Bigl(\sum_{l=1}^{L}k_l^{(\kappa_2-\mu\alpha/(2q))/2}\Bigr)^2+\varepsilon^{-2}\cdot k_0^{\kappa_2},$$
so we have to distinguish three cases.


Case 1: $2q\kappa_2=\mu\alpha$. Then
$$k_L^{\kappa_1+1}+\sum_{l=0}^{L}n_l\cdot k_l^{\kappa_2}\lesssim k_L^{\kappa_1+1}+\varepsilon^{-2}\cdot L^2\lesssim\varepsilon^{-\frac{2(\kappa_1+1)}{\mu(1+\alpha)}}+\varepsilon^{-2}\cdot L^2.$$
Case 2: $2q\kappa_2<\mu\alpha$. Then
$$k_L^{\kappa_1+1}+\sum_{l=0}^{L}n_l\cdot k_l^{\kappa_2}\lesssim k_L^{\kappa_1+1}+\varepsilon^{-2}\lesssim\varepsilon^{-\frac{2(\kappa_1+1)}{\mu(1+\alpha)}}+\varepsilon^{-2}.$$
Case 3: $2q\kappa_2>\mu\alpha$. Then
$$k_L^{\kappa_1+1}+\sum_{l=0}^{L}n_l\cdot k_l^{\kappa_2}\lesssim k_L^{\kappa_1+1}+\varepsilon^{-2}\cdot k_L^{\kappa_2-\mu\alpha/(2q)}\lesssim\varepsilon^{-\frac{2(\kappa_1+1)}{\mu(1+\alpha)}}+\varepsilon^{-2-\frac{2\kappa_2-\mu\alpha/q}{\mu(1+\alpha)}}.$$
Combining all three cases, we get (4.1).

REFERENCES

[1] Ankush Agarwal and Sandeep Juneja, Comparing optimal convergence rate of stochastic mesh and least squares method for Bermudan option pricing, in Proceedings of the 2013 Winter Simulation Conference: Simulation: Making Decisions in a Complex World, IEEE Press, 2013, pp. 701–712.
[2] Leif B. G. Andersen, A simple approach to the pricing of Bermudan swaptions in the multi-factor LIBOR market model, Journal of Computational Finance, 3 (1999), pp. 5–32.
[3] Denis Belomestny, Pricing Bermudan options by nonparametric regression: optimal rates of convergence for lower estimates, Finance and Stochastics, 15 (2011), pp. 655–683.
[4] Denis Belomestny, John Schoenmakers, and Fabian Dickmann, Multilevel dual approach for pricing American style derivatives, Finance and Stochastics, 17 (2013), pp. 717–742.
[5] Mark Broadie and Paul Glasserman, Pricing American-style securities using simulation, Journal of Economic Dynamics and Control, 21 (1997), pp. 1323–1352.
[6] Mark Broadie and Paul Glasserman, A stochastic mesh method for pricing high-dimensional American options, Journal of Computational Finance, 7 (2004), pp. 35–72.
[7] Jacques F. Carriere, Valuation of the early-exercise price for options using simulations and nonparametric regression, Insurance: Mathematics and Economics, 19 (1996), pp. 19–30.
[8] Fabian Dickmann and Nikolaus Schweizer, Faster comparison of stopping times by nested conditional Monte Carlo, arXiv preprint arXiv:1402.0243, (2014).
[9] Daniel Egloff, Monte Carlo algorithms for optimal stopping and statistical learning, The Annals of Applied Probability, 15 (2005), pp. 1396–1432.
[10] Michael B. Giles, Multilevel Monte Carlo path simulation, Operations Research, 56 (2008), pp. 607–617.
[11] Paul Glasserman, Monte Carlo Methods in Financial Engineering, vol. 53, Springer Science & Business Media, 2003.
[12] Paul Glasserman and Bin Yu, Number of paths versus number of basis functions in American option pricing, The Annals of Applied Probability, 14 (2004), pp. 2090–2119.
[13] Francis A. Longstaff and Eduardo S. Schwartz, Valuing American options by simulation: a simple least-squares approach, Review of Financial Studies, 14 (2001), pp. 113–147.
[14] John N. Tsitsiklis and Benjamin Van Roy, Regression methods for pricing complex American-style options, IEEE Transactions on Neural Networks, 12 (2001), pp. 694–703.
[15] Daniel Z. Zanger, Quantitative error estimates for a least-squares Monte Carlo algorithm for American option pricing, Finance and Stochastics, 17 (2013), pp. 503–534.