From CVA to the Resolution of a Large Number of Small...

38
From CVA to the Resolution of a Large Number of Small Random Systems Lokman Abbas-Turki First part from a joint work with M. A. Mikou Last part from a joint work with S. Graillat UPMC, LPMA 6 April 2016 Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 1 / 38

Transcript of From CVA to the Resolution of a Large Number of Small...

Page 1: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

From CVA to the Resolution of a LargeNumber of Small Random Systems

Lokman Abbas-TurkiFirst part from a joint work with M. A. MikouLast part from a joint work with S. Graillat

UPMC, LPMA

6 April 2016

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 1 / 38

Page 2: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Plan

IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear

Simulation algorithmsWithout funding constraintsWith funding constraints

Adaptation issues toAmerican options

The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem

Some simulationresults

Conclusion

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 2 / 38

Page 3: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Introduction

Plan

IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear

Simulation algorithmsWithout funding constraintsWith funding constraints

Adaptation issues toAmerican options

The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem

Some simulationresults

Conclusion

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 3 / 38

Page 4: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Introduction From linear/linear to linear/nonlinear

Credit ValuationAdjustment

In a financial transaction between a party C that has to pay another partyB some amount V , the CVA value is the price of the insurance contractthat covers the default of party C to pay the whole sum V .

CVAt,T = (1− R)Et(V+τ 1t<τ≤T

)(1)

I R is the recovery to make if the counterparty defaults (Assume R = 0),I τ is the random default time of the counterparty,I T is the protection time horizon.

Numericalsimulation

CVA0,T ≈N−1∑k=0

E(V+tk1τ∈(tk ,tk+1]

), (2)

N ≤ the number of time steps used for SDEs discretization.

ImportanceI Hold sufficient amount of liquid assets to face the counterparty default.I Basel III includes the calculation of the CVA (Credit Valuation

Adjustment) as an important part of the prudential rules.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 4 / 38

Page 5: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Introduction From linear/linear to linear/nonlinear

Various kind ofcontracts

Simulating assets St = (S1t , ..., S

dt ) trajectories then contracts trajectories

to get Vt as a sum:

Vt =∑ie

φexpie (St) +∑ii

φeuiii (St) +∑id

φeudid,t(St) +∑ia

φamia,t(St), (3)

where ie, ii , id and ia are the exposure indices and:

φexp explicit function, for example: φexp(Stk ) = S1tk− S2

tk.

φeui is a path-independent European contract,

φeui (Stk ) = E(f eui (ST )|Stk ). (4)

φeudt is a path-dependent European contract,

φeudtk(Stk ) = E(f eudtk

(Stk+1 )|Stk ), (5)

for example: f eudtk(Stk+1 ) = ( max

i=0,..,kS1ti∨ S1

tk+1− S2

tk+1)+.

φamt is an American contract, involving an optimal stopping problem

φamtk (Stk ) = f (Stk ) ∨ E(φamtk+1(Stk+1 )|Stk ) (6)

with f an explicit payoff that generally does not depend on the asset path.

Common problem ϕ(x) = E(f (Stk+1 )|Stk = x).

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 5 / 38

Page 6: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Introduction From nonlinear/linear to nonlinear/nonlinear

TVA definition Total valuation adjustment (>CVA+DVA+FVA), it covers:I Both defaults: τ = τ c ∧ τb, CVA and DVA.I Funding our risk and the risk of the counterparty: Nonlinear BSDE part,

FVA.

S. Crépey(2012) Ignoring the external funding and denoting βt = e−∫ t0 rudu where r is the

risk-free short rate process, Θ satisfies the following BSDE on [0, τ ∧ T ]

βtΘt = E

[βτ1τ<T (Vτ − Rτ ) +

∫ τ∧T

tβsgs(Vs −Θs)ds

∣∣Gt] (7)

where G is the extension of F by the natural filtration generated by τ cand by τb. R is the total close-out cash-flow specified thanks to CSA(Credit Support Annex) and g is the funding coefficient.

TVA BSDEsimulation

I Only for European contracts.I Requires a good approximation of the exposure V .I Practitioners usually use rough approximations.I No trustable procedure in the general case.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 6 / 38

Page 7: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Introduction From nonlinear/linear to nonlinear/nonlinear

WOLOG: V = P

For one Americancontract

I Pt is the American derivative exposition.I Let τ∗ ∈ [0,T ] be the optimal stopping time associated to Pt .

Theorem Θ = ΘJ + (1− J)ξτ , where Jt = 1{τ>t} and the pre-default TVA Θsatisfies the following BSDE on [0, τ∗]

βtΘt = Et

(∫ τ∗

tβs gs(Ps − Θs)ds

), gt(Pt − Θt) = gt(Pt − Θt) + γt ξt . (8)

I βt = e−∫ t0 (γu+ru)du .

I ξt := 1Et (1{τ>t})

Et(ξt1{τ>t}) with ξt := Pt − E(R1{t=τ}|Gt).

Extension Possible to extend (8) on a portfolio of various American options.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 7 / 38

Page 8: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Simulation algorithms

Plan

IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear

Simulation algorithmsWithout funding constraintsWith funding constraints

Adaptation issues toAmerican options

The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem

Some simulationresults

Conclusion

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 8 / 38

Page 9: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Simulation algorithms

Without fundingconstraints CVA0,T =

N−1∑k=0

E(P+k+11τ∈(kh,(k+1)h]

), h =

T

N.

With fundingconstraints Θk = Ek (Θk+1 + hg(k + 1,Pk+1,Θk+1)) , ΘN = 0.

The exposure Pk (x) = E(Φk,τk (Sτk )|Sk = x

)with

τN = N,∀k ∈ {N − 1, ..., 0}, τk = k1Γk + τk+11Γc

k,

(9)

where Γk ={

Φk+1,k (Sk ) > E(Pk+1(Sk+1)

∣∣∣Sk)}. The conditionalexpectation involved in Γk is approximated using a regression on a basisof monomial functions where Kk is its cardinal.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 9 / 38

Page 10: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Simulation algorithms

Without fundingconstraints CVA0,T =

N−1∑k=0

E(P+k+11τ∈(kh,(k+1)h]

), h =

T

N.

With fundingconstraints Θk = Ek (Θk+1 + hg(k + 1,Pk+1,Θk+1)) , ΘN = 0.

An example of atwo stage

simulation withM0 = 2, M6 = 8

and M8 = 4

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 10 / 38

Page 11: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Simulation algorithms

CVA0,Tapproximation CVA0,T =

N−1∑k=0

1M0

M0∑i=1

F 2k+1

(P1(S i

1), ..., Pk+1(S ik+1)

)(10)

With{F 1k+1(x1, ..., xk+1) = E

(1τ∈(kh,(k+1)h]|P1 = x1, ...,Pk+1 = xk+1

),

F 2k+1(x1, ..., xk+1) = (xk+1)+F 1

k+1(x1, ..., xk+1).(11)

Θk approximation

For k = 1, ...,N − 1

Θk (x)= tψ(x)Ψ−1k

1M0

M0∑j=1

ψ(S jk )

(Θk+1(S j

k+1) +1Ng(k+1, Θk+1(S j

k+1), Pk+1(S jk+1)

))and ΘN(x) = 0, Θ0(S0) =

1M0

M0∑j=1

(Θ1(S j

1) +1Ng(1, Θ1(S j

1), P1(S j1))).

(12)

Where Ψk = T

1M0

M0∑i=0

ψ(S ik )tψ(S i

k )

with: {S i}i∈{1,...,M0} and {S i}i∈{1,...,M0} are two

independent simulations of the underlying asset S, ψ is a basis of monomial functions where Kis its cardinal and T is an operator that must satisfy some desired properties.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 11 / 38

Page 12: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Simulation algorithms Without funding constraints

Theorem

E

[(CVA0,T − CVA0,T

)2]≤

N2

M0max

k∈{0,...,N−1}Var

(F 2k+1

(P1(S i

1), ..., Pk+1(S ik+1)

))

+N∑j=1

14NM2

j

(E[Vj (S

ij )f ′′j (Pj (S

ij ))F 3

j (P1(S i1), ...,Pj (S

ij ))])2

+N∑j=1

14NM2

j

EVj (S

ij )f ′′j (Pj (S

ij ))

N−1∑k=j

F 4k+1(P1(S i

1), ...,Pj (Sij ), S i

j )

2

+N∑j=1

N

4M2j

(E[Vj (S

ij )F 1

j (P1(S i1), ...,Pj (S

ij ))|Pj (S

ij ) = 0

]ϕj (0)

)2+ N

N∑j=1

(N − j + 1)2O

(1M4

j

)

Where ϕj is the density of Pj (Sij ), Vj (x) = Var

(√Mj

(Pj (x)− Pj (x)

)).

Good choices If ϕj (0) is big then take Mj ∼√M0, otherwise Mj ∼

√M0/N. In both

cases, N must be small when compared to√M0. When American options

are involved, make sure that K3j /Mj is small enough.

For example

Mj =N − j

N − 1M1 with either M1 =

√M0

Nor M1 =

√M0. (13)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 12 / 38

Page 13: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Simulation algorithms With funding constraints

Theorem As long as {Θi (x)}0≤i≤N−1 are of class Cs on the support of S ∈ Rd ,there exists a positive constant C such that for each 0 ≤ k ≤ N − 1

E

[(Θk (S i

k )−Θk (S ik ))2]≤

CK

N2

N−1∑l=k

EVl+1(S j

l+1)∂2Pg(l + 1,Θl+1(S j

l+1),Pl+1(S jl+1)

)2Ml

2

+O

(K

M0+

K2

N2M0+

K

N4M2l

+K1−2s/d

N2 + K−2s/d

).

Good choice Take Ml ∼√

M0/N, N must be sufficiently small N ∼ 10. WhenAmerican options are involved, make sure that K3

l /Ml is small enough.

For example

Ml =N − l

N − 1M1 with M1 =

√M0

N. (14)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 13 / 38

Page 14: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options

Plan

IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear

Simulation algorithmsWithout funding constraintsWith funding constraints

Adaptation issues toAmerican options

The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem

Some simulationresults

Conclusion

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 14 / 38

Page 15: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options

An example of atwo stage

simulation withM0 = 2, M6 = 8

and M8 = 4

Inner dynamicprogramming

τN = N,∀k ∈ {N − 1, ..., 0}, τk = k1Γk + τk+11Γc

k,

(15)

where Γk ={

Φk+1,k (Sk ) > Rk · ψ(Sk )}. The vector Rk minimizes the

quadratic error||Pk+1(Sk+1)− Rk · ψ(Sk )||L2 (16)

and denoting Ak = E (ψ(Sk )tψ(Sk )), we get

Rk = A−1k E (ψ(Sk )Pk+1(Sk+1)) . (17)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 15 / 38

Page 16: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options The difference with the references

Three main methods for symmetric big matricesCholesky

factorizationI V. Volkov and J. Demmel. LU, QR and Cholesky Factorizations using

Vector Capabilities of GPUs. Berkeley Technical Report. 2008.I G. Ballard, J. Demmel, O. Holtz and O. Schwartz,

Communication-Optimal Parallel and Sequential Cholesky Decomposition.SIAM J. SCI. COMPUT. 32(6), 3495–3523. 2010.

Tridiagonal form +cyclic reduction

I Y. Zhang , J. Cohen and J. D. Owens. 15th ACM SIGPLAN Symposiumon Principles and Practice of Parallel Programming, 127–136. 2010.

I D. Goddeke and R. Strzodka. Cyclic Reduction Tridiagonal Solvers onGPUs Applied to Mixed Precision Multigrid. Parallel and DistributedSystems, IEEE Trans. 22(1), 22–32. 2010.

Tridiagonal form +eigenproblem

I J. W. Demmel, O. A. Marques, B. N. Parlett and C. Vomel. Performanceand Accuracy of Lapack’s Symmetric Tridiagonal Eigensolvers. SIAM J.SCI. COMPUT. 30(3), 1508–1526. 2008.

I C. Vomel, S. Tomov and J. Dongarra. Divide & Conquer on HybridGpu-Accelerated Multicore Systems. SIAM J. SCI. COMPUT. 34(2),70–82. 2012.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 16 / 38

Page 17: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options The difference with the references

None of the previous works can be used directlyThe reason

I Large number of small random linear systems: The size does not exceed64 and the communication is reduced.

I Some of these random systems could be ill-conditioned.

Ak,l =1Mk

Mk∑j=1

ψl (S(j)tk

)ψl (S(j)tk

)t (18)

Typical conditionnumbers for linearregression n = 30

in the Black &Scholes model

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 17 / 38

Page 18: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options The difference with the references

Must we systematically use Householdertridiagonalization with divide & conquer when wesuspect the random linear systems to be ill-conditioned?

Our answer

I Perform Householder tridiagonalization O(4n3/3) and solvethe linear systems cheaply using parallel cyclic reductionO(n log2(n)).

I Take a decision according to the value of the residueerror:

* If the residue error is small then we already have goodsolutions.

* Otherwise, we must perform divide & conquer O(4n3/3)diagonalizations and discard the smallest eigenvalues.

I The next time we solve this same kind of linear systems:* If they used to be well-conditioned then we just process LDLtO(n3/6).

* Otherwise we execute directly the combination ofHouseholder tridiagonalization and divide & conquerdiagonalization.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 18 / 38

Page 19: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options LDLt

Shared occupation n(n + 1)/2 + n and complexityO(n3/6)

A = LDLt , Dj,j = Aj,j −j−1∑k=1

L2j,kDk,k ,

Li,j =1

Dj,j

Ai,j −j−1∑k=1

Li,kLj,kDk,k

if i > j .

(19)

Standard LDLtparallel strategy

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 19 / 38

Page 20: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options LDLt

Three different versions studied

1. An SIMD version that requires only independent threads, one for eachlinear system.

2. A collaborative version that involves n collaborative threads for eachlinear system with n unknowns.

3. An optimal hybrid solution that involves n∗ (n∗ < n) collaborativethreads for each linear system with n unknowns.

The speedup of thecollaborative and

the hybrid versionswhen compared to

the SIMDimplementation.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 20 / 38

Page 21: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options LDLt

(a) Optimal number of collaborative threads(b) Number of systems solved within a second

(a) (b)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 21 / 38

Page 22: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Householder tridiagonalization + PCR

Householder tridiagonalization: Shared occupationn2 + 2n and complexity O(4n3/3)

1. An SIMD version that requires only independent threads, one for eachlinear system.

2. A collaborative version that involves n collaborative threads for eachlinear system with n unknowns.

For symmetric A

U = Ht3...H

tnAHn...H3 =

d1 c1

c1 d2 c2 0

c2 d3. . .

. . .. . .

. . .

0. . .

. . . cn−1cn−1 dn

, (20)

with each Householder matrix H given by

H = I − uut/b, b = utu/2. (21)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 22 / 38

Page 23: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Householder tridiagonalization + PCR

From CR: Shared occupation 3n and complexityO(nlog2(n))

CR when n = 8

e1 e7e5

z2 z8z4 z6

Stepg1:gForwardgreductiongtogag4-unknowngsystemginvolvinggz2,gz4,gz6gandgz8

Stepg2:gForwardgreductiongtoag2-unknowngsystemginvolvinggz4gandgz8

Stepg3:gSolveg2-unknowngsystem

Stepg4:gBackwardgsubstitutiongtosolvegthegrestg2gunknowns

Stepg5:gBackwardgsubstitutiongtosolvegthegrestg4gunknowns

A A A AA AA A

AAA

AAAA

AA

AA

AA

A AAA AAA

AA

AA

A

A

AA

e3

z3z1 z5 z7

e1 e2 e3 e4 e5 e6 e7 e8

e'8e'6e'4e'2

e''4 e''8

e'6e'2 z4 z8

z2 z4 z6 z8

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 23 / 38

Page 24: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Householder tridiagonalization + PCR

To a new version of PCR: Shared occupation 4n andcomplexity O(nlog2(n))

PCR when n = 8

Step 1: Reduced to 2 systems of 4 unknowns

Step 2: Reduced to 4 systems of 2 unknowns

Step 3: Solve

AAAA AA AA AAA AAAA AA AA AA A

e'7e'5e'3e'1AAAA AA A AAA AAAA AA A AA A

e''6e''2e''5e''1 e''8e''4e''7e''3

e'7e'5e'3e'1 e'2 e'4 e'6 e'8

AAAA A A AA AAAAA A A A

z4z2z3z1 z8z6z7z5

e''7e''3e''5e''1 e''2 e''6 e''4 e''8

e'8e'6e'4e'2

e4e3e2e1 e5 e6 e7 e8

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 24 / 38

Page 25: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Householder tridiagonalization + PCR

d1 c1c1 d2 c2

c2 d3 c3c3 d4 c4

c4 d5 c5c5 d6 c6

c6 d7

(R)−→

d ′1 0 c ′20 d ′2 0 c ′3c ′2 0 d ′3 0 c ′4

c ′3 0 d ′4 0 c ′5c ′4 0 d ′5 0 c ′6

c ′5 0 d ′6 0c ′6 0 d ′7

(P)−→

d ′1 c ′2c ′2 d ′3 c ′4

c ′4 d ′5 c ′6c ′6 d ′7 0

0 d ′2 c ′3c ′3 d ′4 c ′5

c ′5 d ′6

(R)−→

d ′′1 0 c ′′20 d ′′3 0 c ′′4c ′′2 0 d ′′5 0

c ′4 0 d ′7 00 d ′′2 0 c ′′3

0 d ′′4 0c ′′3 0 d ′′6

(P)−→

d ′′1 c ′′2c ′′2 d ′′5 0

0 d ′′3 c ′′4c ′′4 d ′7 0

0 d ′′2 c ′′3c ′′3 d ′′6 0

0 d ′′4

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 25 / 38

Page 26: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Householder tridiagonalization + PCR

(a) Number of PCRs + two matrix/vectormultiplications performed per second.(b) LDLt vs. tridiagonal + PCR

(a) (b)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 26 / 38

Page 27: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Divide and conquer for eigenproblem

For ill-conditioned systemsStandardprocedure

I Tridiagonal Householder decomposition A = QUQt where Q is orthogonaland U is symmetric tridiagonal.

I Divide & conquer algorithm for symmetric tridiagonal eigenproblems toestablish U = ODOt where O is orthogonal and D is diagonal.

I Discard the smallest eigenvalues of D that provide a condition numberlarger than 105.

U =

d1 c1

c1. . .

. . .. . .

. . . cm−1cm−1 dm − cm 0

0 dm+1 − cm cm+1

cm+1. . .

. . .. . .

. . . cn−1cn−1 dn

+ cm1m,m+11tm,m+1

=

(U1 00 U2

)+ cm1m,m+11tm,m+1

(22)

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 27 / 38

Page 28: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Divide and conquer for eigenproblem

Shared occupation 2n(n + 2) + 21+blog2(n−1)c andcomplexity O(4n3/3)

Steps1.

U =

(O1 00 O2

)((D1 00 D2

)+ cmuu

t

)(Ot

1 00 Ot

2

)where u =

(Ot

1 00 Ot

2

)1m,m+1 =

(last column of Ot

1first column of Ot

2

).

2. Let Λ = {λ1, ..., λn}, ordered family of eigenvalues of(

D1 00 D2

). If

cm 6= 0 and the eigenvalue λ of U satisfies λ /∈ Λ, then its value isobtained as a solution of

n∑i=1

u2i

λi − λ+

1cm

= 0. (23)

3. From u and the solutions of (23), Löwner’s Theorem provides vector u

that is used to compute the eigenvector Vλ of(

D1 00 D2

)+ cmuut

4. Let W = (Vλ)λ eigenvalue of U , we get the eigenvectors of U thanks to the

multiplication(

Q1 00 Q2

)W .

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 28 / 38

Page 29: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Divide and conquer for eigenproblem

Additional details on steps 1. and 2.

Step 1.

Advantage: Pure divide and conquer algorithm, it prevents to haveeigenvalues of multiplicity bigger than two at each conquering step.

Step 2. Using Gragg’s scheme (based on Newton’s method): Choose hk such thathk (λ) = xk,0 + xk,1/(λk − λ) + xk,2/(λk+1 − λ) matchesn∑

i=1

u2i

λi − λ+

1cm

at its root ∈ (λk , λk+1) up to the second derivative.

Advantage: Cubic monotonic convergence.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 29 / 38

Page 30: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Adaptation issues to American options Divide and conquer for eigenproblem

Comparison with Householder tridiagonalization

Execution timecomparison:

TDC/THouseholder

0 10 20 30 40 50 60 702

3

4

5

6

7

8

9

10

System size

CommmentsI Small matrices.I Iterative algorithm to solve the secular equation.I Divergence produced by deflation.

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 30 / 38

Page 31: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Some simulation results

Plan

IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear

Simulation algorithmsWithout funding constraintsWith funding constraints

Adaptation issues toAmerican options

The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem

Some simulationresults

Conclusion

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 31 / 38

Page 32: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Some simulation results

Within less than 1 minute simulation on GPU:M0 = 131K , N = 10, Neds = 50

EuropeanPath-dependent

option Φ(ST ) =

(S1T

2+

S2T

2− S

3T

)+

M1 Θ0 Θ0 std CVA0,T CVA0,T std√M0

N0.01364 4 ∗ 10−5 0.0296 2 ∗ 10−4

√M0√N

0.01307 4 ∗ 10−5 0.0294 2 ∗ 10−4

√M0 0.01265 3 ∗ 10−5 0.0291 2 ∗ 10−4

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 32 / 38

Page 33: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Some simulation results

Within less than 1 minute simulation on GPU:M0 = 131K , N = 10, Neds = 50

EuropeanPath-dependent

option Φ(ST ) =

(3S1

T

10+

7S2T

10− S

3T

)+

−(7S1

T

10+

3S2T

10− S

3T

)+

M1 Θ0 Θ0 std CVA0,T CVA0,T std√M0

N2.72 ∗ 10−3 10−5 0.0365 8 ∗ 10−4

√M0√N

2.44 ∗ 10−3 10−5 0.0453 8 ∗ 10−4

√M0 2.28 ∗ 10−3 10−5 0.0520 8 ∗ 10−4

√N√M0 2.24× 10−3 10−5 0.0528 8× 10−4

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 33 / 38

Page 34: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Some simulation results

Within less than 1 minute simulation on GPU:M0 = 131K , N = 10, Neds = 50 and n = 10

American option

Φ(ST ) =

(K −

S1T

3−

S2T

3−

S3T

3

)+

M1 Θ0 Θ0 std CVA0,T CVA0,T std√M0√N

0.0242 10−4 0.0356 2 ∗ 10−4

√M0 0.0229 10−4 0.0351 2 ∗ 10−4

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 34 / 38

Page 35: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Conclusion

Plan

IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear

Simulation algorithmsWithout funding constraintsWith funding constraints

Adaptation issues toAmerican options

The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem

Some simulationresults

Conclusion

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 35 / 38

Page 36: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Conclusion

Mathematical and computing work suited to GPUs

Mathematical partI Extending CVA (TVA) on American options.I Judicious choice of the number of inner and outer trajectories.

Computing partI CUDA source code of: LDLt, Householder reduction, parallel cyclic

reduction that is not necessary a power of two and divide and conquer foreigenproblem.

I Execution time comparison of the different methods mentioned above.I Original method to further optimize the adaptation of LDLt to our

context.I Original parallel cyclic reduction that can be used for any vector size and

not only a power of two.I Precise answer to the following question: Must we systematically use

Householder tridiagonalization with divide & conquer when we suspectthe random linear systems to be ill-conditioned?

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 36 / 38

Page 37: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

Conclusion

Future workI Further developments on the use of Nested Monte Carlo on GPUs for

BSDEs.I Studying the rounding errors and error propagation.I Use CADNA library to test each procedure:

http://www-pequan.lip6.fr/cadna/

Source codeI http://www.proba.jussieu.fr/~abbasturki/soft.htmI Premia library:

https://www.rocq.inria.fr/mathfi/Premia/index.html

ReferencesI L.A. Abbas-Turki and Stef Graillat. Resolution of a large number of small

random symmetric linear systems in single precision arithmetic on GPUs:https://hal.archives-ouvertes.fr/hal-01295549

I L.A. Abbas-Turki and M.A. Mikou. TVA on American Derivatives:https://hal.archives-ouvertes.fr/hal-01142874

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 37 / 38

Page 38: From CVA to the Resolution of a Large Number of Small ...on-demand.gputechconf.com/gtc/2016/presentation/s6239-lokman-abbas-turki-cva...From CVA to the Resolution of a Large Number

The End

Thank you

Questions?

Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 38 / 38