From CVA to the Resolution of a Large Number of Small...
Transcript of From CVA to the Resolution of a Large Number of Small...
From CVA to the Resolution of a LargeNumber of Small Random Systems
Lokman Abbas-TurkiFirst part from a joint work with M. A. MikouLast part from a joint work with S. Graillat
UPMC, LPMA
6 April 2016
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 1 / 38
Plan
IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear
Simulation algorithmsWithout funding constraintsWith funding constraints
Adaptation issues toAmerican options
The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem
Some simulationresults
Conclusion
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 2 / 38
Introduction
Plan
IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear
Simulation algorithmsWithout funding constraintsWith funding constraints
Adaptation issues toAmerican options
The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem
Some simulationresults
Conclusion
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 3 / 38
Introduction From linear/linear to linear/nonlinear
Credit ValuationAdjustment
In a financial transaction between a party C that has to pay another partyB some amount V , the CVA value is the price of the insurance contractthat covers the default of party C to pay the whole sum V .
CVAt,T = (1− R)Et(V+τ 1t<τ≤T
)(1)
I R is the recovery to make if the counterparty defaults (Assume R = 0),I τ is the random default time of the counterparty,I T is the protection time horizon.
Numericalsimulation
CVA0,T ≈N−1∑k=0
E(V+tk1τ∈(tk ,tk+1]
), (2)
N ≤ the number of time steps used for SDEs discretization.
ImportanceI Hold sufficient amount of liquid assets to face the counterparty default.I Basel III includes the calculation of the CVA (Credit Valuation
Adjustment) as an important part of the prudential rules.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 4 / 38
Introduction From linear/linear to linear/nonlinear
Various kind ofcontracts
Simulating assets St = (S1t , ..., S
dt ) trajectories then contracts trajectories
to get Vt as a sum:
Vt =∑ie
φexpie (St) +∑ii
φeuiii (St) +∑id
φeudid,t(St) +∑ia
φamia,t(St), (3)
where ie, ii , id and ia are the exposure indices and:
φexp explicit function, for example: φexp(Stk ) = S1tk− S2
tk.
φeui is a path-independent European contract,
φeui (Stk ) = E(f eui (ST )|Stk ). (4)
φeudt is a path-dependent European contract,
φeudtk(Stk ) = E(f eudtk
(Stk+1 )|Stk ), (5)
for example: f eudtk(Stk+1 ) = ( max
i=0,..,kS1ti∨ S1
tk+1− S2
tk+1)+.
φamt is an American contract, involving an optimal stopping problem
φamtk (Stk ) = f (Stk ) ∨ E(φamtk+1(Stk+1 )|Stk ) (6)
with f an explicit payoff that generally does not depend on the asset path.
Common problem ϕ(x) = E(f (Stk+1 )|Stk = x).
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 5 / 38
Introduction From nonlinear/linear to nonlinear/nonlinear
TVA definition Total valuation adjustment (>CVA+DVA+FVA), it covers:I Both defaults: τ = τ c ∧ τb, CVA and DVA.I Funding our risk and the risk of the counterparty: Nonlinear BSDE part,
FVA.
S. Crépey(2012) Ignoring the external funding and denoting βt = e−∫ t0 rudu where r is the
risk-free short rate process, Θ satisfies the following BSDE on [0, τ ∧ T ]
βtΘt = E
[βτ1τ<T (Vτ − Rτ ) +
∫ τ∧T
tβsgs(Vs −Θs)ds
∣∣Gt] (7)
where G is the extension of F by the natural filtration generated by τ cand by τb. R is the total close-out cash-flow specified thanks to CSA(Credit Support Annex) and g is the funding coefficient.
TVA BSDEsimulation
I Only for European contracts.I Requires a good approximation of the exposure V .I Practitioners usually use rough approximations.I No trustable procedure in the general case.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 6 / 38
Introduction From nonlinear/linear to nonlinear/nonlinear
WOLOG: V = P
For one Americancontract
I Pt is the American derivative exposition.I Let τ∗ ∈ [0,T ] be the optimal stopping time associated to Pt .
Theorem Θ = ΘJ + (1− J)ξτ , where Jt = 1{τ>t} and the pre-default TVA Θsatisfies the following BSDE on [0, τ∗]
βtΘt = Et
(∫ τ∗
tβs gs(Ps − Θs)ds
), gt(Pt − Θt) = gt(Pt − Θt) + γt ξt . (8)
I βt = e−∫ t0 (γu+ru)du .
I ξt := 1Et (1{τ>t})
Et(ξt1{τ>t}) with ξt := Pt − E(R1{t=τ}|Gt).
Extension Possible to extend (8) on a portfolio of various American options.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 7 / 38
Simulation algorithms
Plan
IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear
Simulation algorithmsWithout funding constraintsWith funding constraints
Adaptation issues toAmerican options
The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem
Some simulationresults
Conclusion
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 8 / 38
Simulation algorithms
Without fundingconstraints CVA0,T =
N−1∑k=0
E(P+k+11τ∈(kh,(k+1)h]
), h =
T
N.
With fundingconstraints Θk = Ek (Θk+1 + hg(k + 1,Pk+1,Θk+1)) , ΘN = 0.
The exposure Pk (x) = E(Φk,τk (Sτk )|Sk = x
)with
τN = N,∀k ∈ {N − 1, ..., 0}, τk = k1Γk + τk+11Γc
k,
(9)
where Γk ={
Φk+1,k (Sk ) > E(Pk+1(Sk+1)
∣∣∣Sk)}. The conditionalexpectation involved in Γk is approximated using a regression on a basisof monomial functions where Kk is its cardinal.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 9 / 38
Simulation algorithms
Without fundingconstraints CVA0,T =
N−1∑k=0
E(P+k+11τ∈(kh,(k+1)h]
), h =
T
N.
With fundingconstraints Θk = Ek (Θk+1 + hg(k + 1,Pk+1,Θk+1)) , ΘN = 0.
An example of atwo stage
simulation withM0 = 2, M6 = 8
and M8 = 4
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 10 / 38
Simulation algorithms
CVA0,Tapproximation CVA0,T =
N−1∑k=0
1M0
M0∑i=1
F 2k+1
(P1(S i
1), ..., Pk+1(S ik+1)
)(10)
With{F 1k+1(x1, ..., xk+1) = E
(1τ∈(kh,(k+1)h]|P1 = x1, ...,Pk+1 = xk+1
),
F 2k+1(x1, ..., xk+1) = (xk+1)+F 1
k+1(x1, ..., xk+1).(11)
Θk approximation
For k = 1, ...,N − 1
Θk (x)= tψ(x)Ψ−1k
1M0
M0∑j=1
ψ(S jk )
(Θk+1(S j
k+1) +1Ng(k+1, Θk+1(S j
k+1), Pk+1(S jk+1)
))and ΘN(x) = 0, Θ0(S0) =
1M0
M0∑j=1
(Θ1(S j
1) +1Ng(1, Θ1(S j
1), P1(S j1))).
(12)
Where Ψk = T
1M0
M0∑i=0
ψ(S ik )tψ(S i
k )
with: {S i}i∈{1,...,M0} and {S i}i∈{1,...,M0} are two
independent simulations of the underlying asset S, ψ is a basis of monomial functions where Kis its cardinal and T is an operator that must satisfy some desired properties.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 11 / 38
Simulation algorithms Without funding constraints
Theorem
E
[(CVA0,T − CVA0,T
)2]≤
N2
M0max
k∈{0,...,N−1}Var
(F 2k+1
(P1(S i
1), ..., Pk+1(S ik+1)
))
+N∑j=1
14NM2
j
(E[Vj (S
ij )f ′′j (Pj (S
ij ))F 3
j (P1(S i1), ...,Pj (S
ij ))])2
+N∑j=1
14NM2
j
EVj (S
ij )f ′′j (Pj (S
ij ))
N−1∑k=j
F 4k+1(P1(S i
1), ...,Pj (Sij ), S i
j )
2
+N∑j=1
N
4M2j
(E[Vj (S
ij )F 1
j (P1(S i1), ...,Pj (S
ij ))|Pj (S
ij ) = 0
]ϕj (0)
)2+ N
N∑j=1
(N − j + 1)2O
(1M4
j
)
Where ϕj is the density of Pj (Sij ), Vj (x) = Var
(√Mj
(Pj (x)− Pj (x)
)).
Good choices If ϕj (0) is big then take Mj ∼√M0, otherwise Mj ∼
√M0/N. In both
cases, N must be small when compared to√M0. When American options
are involved, make sure that K3j /Mj is small enough.
For example
Mj =N − j
N − 1M1 with either M1 =
√M0
Nor M1 =
√M0. (13)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 12 / 38
Simulation algorithms With funding constraints
Theorem As long as {Θi (x)}0≤i≤N−1 are of class Cs on the support of S ∈ Rd ,there exists a positive constant C such that for each 0 ≤ k ≤ N − 1
E
[(Θk (S i
k )−Θk (S ik ))2]≤
CK
N2
N−1∑l=k
EVl+1(S j
l+1)∂2Pg(l + 1,Θl+1(S j
l+1),Pl+1(S jl+1)
)2Ml
2
+O
(K
M0+
K2
N2M0+
K
N4M2l
+K1−2s/d
N2 + K−2s/d
).
Good choice Take Ml ∼√
M0/N, N must be sufficiently small N ∼ 10. WhenAmerican options are involved, make sure that K3
l /Ml is small enough.
For example
Ml =N − l
N − 1M1 with M1 =
√M0
N. (14)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 13 / 38
Adaptation issues to American options
Plan
IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear
Simulation algorithmsWithout funding constraintsWith funding constraints
Adaptation issues toAmerican options
The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem
Some simulationresults
Conclusion
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 14 / 38
Adaptation issues to American options
An example of atwo stage
simulation withM0 = 2, M6 = 8
and M8 = 4
Inner dynamicprogramming
τN = N,∀k ∈ {N − 1, ..., 0}, τk = k1Γk + τk+11Γc
k,
(15)
where Γk ={
Φk+1,k (Sk ) > Rk · ψ(Sk )}. The vector Rk minimizes the
quadratic error||Pk+1(Sk+1)− Rk · ψ(Sk )||L2 (16)
and denoting Ak = E (ψ(Sk )tψ(Sk )), we get
Rk = A−1k E (ψ(Sk )Pk+1(Sk+1)) . (17)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 15 / 38
Adaptation issues to American options The difference with the references
Three main methods for symmetric big matricesCholesky
factorizationI V. Volkov and J. Demmel. LU, QR and Cholesky Factorizations using
Vector Capabilities of GPUs. Berkeley Technical Report. 2008.I G. Ballard, J. Demmel, O. Holtz and O. Schwartz,
Communication-Optimal Parallel and Sequential Cholesky Decomposition.SIAM J. SCI. COMPUT. 32(6), 3495–3523. 2010.
Tridiagonal form +cyclic reduction
I Y. Zhang , J. Cohen and J. D. Owens. 15th ACM SIGPLAN Symposiumon Principles and Practice of Parallel Programming, 127–136. 2010.
I D. Goddeke and R. Strzodka. Cyclic Reduction Tridiagonal Solvers onGPUs Applied to Mixed Precision Multigrid. Parallel and DistributedSystems, IEEE Trans. 22(1), 22–32. 2010.
Tridiagonal form +eigenproblem
I J. W. Demmel, O. A. Marques, B. N. Parlett and C. Vomel. Performanceand Accuracy of Lapack’s Symmetric Tridiagonal Eigensolvers. SIAM J.SCI. COMPUT. 30(3), 1508–1526. 2008.
I C. Vomel, S. Tomov and J. Dongarra. Divide & Conquer on HybridGpu-Accelerated Multicore Systems. SIAM J. SCI. COMPUT. 34(2),70–82. 2012.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 16 / 38
Adaptation issues to American options The difference with the references
None of the previous works can be used directlyThe reason
I Large number of small random linear systems: The size does not exceed64 and the communication is reduced.
I Some of these random systems could be ill-conditioned.
Ak,l =1Mk
Mk∑j=1
ψl (S(j)tk
)ψl (S(j)tk
)t (18)
Typical conditionnumbers for linearregression n = 30
in the Black &Scholes model
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 17 / 38
Adaptation issues to American options The difference with the references
Must we systematically use Householdertridiagonalization with divide & conquer when wesuspect the random linear systems to be ill-conditioned?
Our answer
I Perform Householder tridiagonalization O(4n3/3) and solvethe linear systems cheaply using parallel cyclic reductionO(n log2(n)).
I Take a decision according to the value of the residueerror:
* If the residue error is small then we already have goodsolutions.
* Otherwise, we must perform divide & conquer O(4n3/3)diagonalizations and discard the smallest eigenvalues.
I The next time we solve this same kind of linear systems:* If they used to be well-conditioned then we just process LDLtO(n3/6).
* Otherwise we execute directly the combination ofHouseholder tridiagonalization and divide & conquerdiagonalization.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 18 / 38
Adaptation issues to American options LDLt
Shared occupation n(n + 1)/2 + n and complexityO(n3/6)
A = LDLt , Dj,j = Aj,j −j−1∑k=1
L2j,kDk,k ,
Li,j =1
Dj,j
Ai,j −j−1∑k=1
Li,kLj,kDk,k
if i > j .
(19)
Standard LDLtparallel strategy
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 19 / 38
Adaptation issues to American options LDLt
Three different versions studied
1. An SIMD version that requires only independent threads, one for eachlinear system.
2. A collaborative version that involves n collaborative threads for eachlinear system with n unknowns.
3. An optimal hybrid solution that involves n∗ (n∗ < n) collaborativethreads for each linear system with n unknowns.
The speedup of thecollaborative and
the hybrid versionswhen compared to
the SIMDimplementation.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 20 / 38
Adaptation issues to American options LDLt
(a) Optimal number of collaborative threads(b) Number of systems solved within a second
(a) (b)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 21 / 38
Adaptation issues to American options Householder tridiagonalization + PCR
Householder tridiagonalization: Shared occupationn2 + 2n and complexity O(4n3/3)
1. An SIMD version that requires only independent threads, one for eachlinear system.
2. A collaborative version that involves n collaborative threads for eachlinear system with n unknowns.
For symmetric A
U = Ht3...H
tnAHn...H3 =
d1 c1
c1 d2 c2 0
c2 d3. . .
. . .. . .
. . .
0. . .
. . . cn−1cn−1 dn
, (20)
with each Householder matrix H given by
H = I − uut/b, b = utu/2. (21)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 22 / 38
Adaptation issues to American options Householder tridiagonalization + PCR
From CR: Shared occupation 3n and complexityO(nlog2(n))
CR when n = 8
e1 e7e5
z2 z8z4 z6
Stepg1:gForwardgreductiongtogag4-unknowngsystemginvolvinggz2,gz4,gz6gandgz8
Stepg2:gForwardgreductiongtoag2-unknowngsystemginvolvinggz4gandgz8
Stepg3:gSolveg2-unknowngsystem
Stepg4:gBackwardgsubstitutiongtosolvegthegrestg2gunknowns
Stepg5:gBackwardgsubstitutiongtosolvegthegrestg4gunknowns
A A A AA AA A
AAA
AAAA
AA
AA
AA
A AAA AAA
AA
AA
A
A
AA
e3
z3z1 z5 z7
e1 e2 e3 e4 e5 e6 e7 e8
e'8e'6e'4e'2
e''4 e''8
e'6e'2 z4 z8
z2 z4 z6 z8
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 23 / 38
Adaptation issues to American options Householder tridiagonalization + PCR
To a new version of PCR: Shared occupation 4n andcomplexity O(nlog2(n))
PCR when n = 8
Step 1: Reduced to 2 systems of 4 unknowns
Step 2: Reduced to 4 systems of 2 unknowns
Step 3: Solve
AAAA AA AA AAA AAAA AA AA AA A
e'7e'5e'3e'1AAAA AA A AAA AAAA AA A AA A
e''6e''2e''5e''1 e''8e''4e''7e''3
e'7e'5e'3e'1 e'2 e'4 e'6 e'8
AAAA A A AA AAAAA A A A
z4z2z3z1 z8z6z7z5
e''7e''3e''5e''1 e''2 e''6 e''4 e''8
e'8e'6e'4e'2
e4e3e2e1 e5 e6 e7 e8
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 24 / 38
Adaptation issues to American options Householder tridiagonalization + PCR
d1 c1c1 d2 c2
c2 d3 c3c3 d4 c4
c4 d5 c5c5 d6 c6
c6 d7
(R)−→
d ′1 0 c ′20 d ′2 0 c ′3c ′2 0 d ′3 0 c ′4
c ′3 0 d ′4 0 c ′5c ′4 0 d ′5 0 c ′6
c ′5 0 d ′6 0c ′6 0 d ′7
(P)−→
d ′1 c ′2c ′2 d ′3 c ′4
c ′4 d ′5 c ′6c ′6 d ′7 0
0 d ′2 c ′3c ′3 d ′4 c ′5
c ′5 d ′6
(R)−→
d ′′1 0 c ′′20 d ′′3 0 c ′′4c ′′2 0 d ′′5 0
c ′4 0 d ′7 00 d ′′2 0 c ′′3
0 d ′′4 0c ′′3 0 d ′′6
(P)−→
d ′′1 c ′′2c ′′2 d ′′5 0
0 d ′′3 c ′′4c ′′4 d ′7 0
0 d ′′2 c ′′3c ′′3 d ′′6 0
0 d ′′4
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 25 / 38
Adaptation issues to American options Householder tridiagonalization + PCR
(a) Number of PCRs + two matrix/vectormultiplications performed per second.(b) LDLt vs. tridiagonal + PCR
(a) (b)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 26 / 38
Adaptation issues to American options Divide and conquer for eigenproblem
For ill-conditioned systemsStandardprocedure
I Tridiagonal Householder decomposition A = QUQt where Q is orthogonaland U is symmetric tridiagonal.
I Divide & conquer algorithm for symmetric tridiagonal eigenproblems toestablish U = ODOt where O is orthogonal and D is diagonal.
I Discard the smallest eigenvalues of D that provide a condition numberlarger than 105.
U =
d1 c1
c1. . .
. . .. . .
. . . cm−1cm−1 dm − cm 0
0 dm+1 − cm cm+1
cm+1. . .
. . .. . .
. . . cn−1cn−1 dn
+ cm1m,m+11tm,m+1
=
(U1 00 U2
)+ cm1m,m+11tm,m+1
(22)
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 27 / 38
Adaptation issues to American options Divide and conquer for eigenproblem
Shared occupation 2n(n + 2) + 21+blog2(n−1)c andcomplexity O(4n3/3)
Steps1.
U =
(O1 00 O2
)((D1 00 D2
)+ cmuu
t
)(Ot
1 00 Ot
2
)where u =
(Ot
1 00 Ot
2
)1m,m+1 =
(last column of Ot
1first column of Ot
2
).
2. Let Λ = {λ1, ..., λn}, ordered family of eigenvalues of(
D1 00 D2
). If
cm 6= 0 and the eigenvalue λ of U satisfies λ /∈ Λ, then its value isobtained as a solution of
n∑i=1
u2i
λi − λ+
1cm
= 0. (23)
3. From u and the solutions of (23), Löwner’s Theorem provides vector u
that is used to compute the eigenvector Vλ of(
D1 00 D2
)+ cmuut
4. Let W = (Vλ)λ eigenvalue of U , we get the eigenvectors of U thanks to the
multiplication(
Q1 00 Q2
)W .
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 28 / 38
Adaptation issues to American options Divide and conquer for eigenproblem
Additional details on steps 1. and 2.
Step 1.
Advantage: Pure divide and conquer algorithm, it prevents to haveeigenvalues of multiplicity bigger than two at each conquering step.
Step 2. Using Gragg’s scheme (based on Newton’s method): Choose hk such thathk (λ) = xk,0 + xk,1/(λk − λ) + xk,2/(λk+1 − λ) matchesn∑
i=1
u2i
λi − λ+
1cm
at its root ∈ (λk , λk+1) up to the second derivative.
Advantage: Cubic monotonic convergence.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 29 / 38
Adaptation issues to American options Divide and conquer for eigenproblem
Comparison with Householder tridiagonalization
Execution timecomparison:
TDC/THouseholder
0 10 20 30 40 50 60 702
3
4
5
6
7
8
9
10
System size
CommmentsI Small matrices.I Iterative algorithm to solve the secular equation.I Divergence produced by deflation.
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 30 / 38
Some simulation results
Plan
IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear
Simulation algorithmsWithout funding constraintsWith funding constraints
Adaptation issues toAmerican options
The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem
Some simulationresults
Conclusion
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 31 / 38
Some simulation results
Within less than 1 minute simulation on GPU:M0 = 131K , N = 10, Neds = 50
EuropeanPath-dependent
option Φ(ST ) =
(S1T
2+
S2T
2− S
3T
)+
M1 Θ0 Θ0 std CVA0,T CVA0,T std√M0
N0.01364 4 ∗ 10−5 0.0296 2 ∗ 10−4
√M0√N
0.01307 4 ∗ 10−5 0.0294 2 ∗ 10−4
√M0 0.01265 3 ∗ 10−5 0.0291 2 ∗ 10−4
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 32 / 38
Some simulation results
Within less than 1 minute simulation on GPU:M0 = 131K , N = 10, Neds = 50
EuropeanPath-dependent
option Φ(ST ) =
(3S1
T
10+
7S2T
10− S
3T
)+
−(7S1
T
10+
3S2T
10− S
3T
)+
M1 Θ0 Θ0 std CVA0,T CVA0,T std√M0
N2.72 ∗ 10−3 10−5 0.0365 8 ∗ 10−4
√M0√N
2.44 ∗ 10−3 10−5 0.0453 8 ∗ 10−4
√M0 2.28 ∗ 10−3 10−5 0.0520 8 ∗ 10−4
√N√M0 2.24× 10−3 10−5 0.0528 8× 10−4
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 33 / 38
Some simulation results
Within less than 1 minute simulation on GPU:M0 = 131K , N = 10, Neds = 50 and n = 10
American option
Φ(ST ) =
(K −
S1T
3−
S2T
3−
S3T
3
)+
M1 Θ0 Θ0 std CVA0,T CVA0,T std√M0√N
0.0242 10−4 0.0356 2 ∗ 10−4
√M0 0.0229 10−4 0.0351 2 ∗ 10−4
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 34 / 38
Conclusion
Plan
IntroductionFrom linear/linear to linear/nonlinearFrom nonlinear/linear to nonlinear/nonlinear
Simulation algorithmsWithout funding constraintsWith funding constraints
Adaptation issues toAmerican options
The difference with the referencesLDLtHouseholder tridiagonalization + PCRDivide and conquer for eigenproblem
Some simulationresults
Conclusion
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 35 / 38
Conclusion
Mathematical and computing work suited to GPUs
Mathematical partI Extending CVA (TVA) on American options.I Judicious choice of the number of inner and outer trajectories.
Computing partI CUDA source code of: LDLt, Householder reduction, parallel cyclic
reduction that is not necessary a power of two and divide and conquer foreigenproblem.
I Execution time comparison of the different methods mentioned above.I Original method to further optimize the adaptation of LDLt to our
context.I Original parallel cyclic reduction that can be used for any vector size and
not only a power of two.I Precise answer to the following question: Must we systematically use
Householder tridiagonalization with divide & conquer when we suspectthe random linear systems to be ill-conditioned?
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 36 / 38
Conclusion
Future workI Further developments on the use of Nested Monte Carlo on GPUs for
BSDEs.I Studying the rounding errors and error propagation.I Use CADNA library to test each procedure:
http://www-pequan.lip6.fr/cadna/
Source codeI http://www.proba.jussieu.fr/~abbasturki/soft.htmI Premia library:
https://www.rocq.inria.fr/mathfi/Premia/index.html
ReferencesI L.A. Abbas-Turki and Stef Graillat. Resolution of a large number of small
random symmetric linear systems in single precision arithmetic on GPUs:https://hal.archives-ouvertes.fr/hal-01295549
I L.A. Abbas-Turki and M.A. Mikou. TVA on American Derivatives:https://hal.archives-ouvertes.fr/hal-01142874
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 37 / 38
The End
Thank you
Questions?
Lokman (UPMC, LPMA) GPU Tech. Conf. 2016 38 / 38