Non-parametric calibration of the local volatility surface for European options

Dec 5th, 2011

Jian Geng
Department of Mathematics, Florida State University, Room 208, 1017 Academic Way, Tallahassee, FL, 32306-4510, USA.
Email: [email protected]  Tel: (01) 727-686-1176

I. Michael Navon
Department of Scientific Computing, Florida State University, 400 Dirac Science Library, Tallahassee, FL, 32306-4120, USA.
Email: [email protected]  Tel: (01) 850-644-6560

Xiao Chen
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, CA 94550, USA.
Email: [email protected]  Tel: (01) 925-422-6037

Abstract

In this paper, we explore a robust method for calibration of the local volatility surface for European options. Assuming the volatility surface is smooth, we apply a second order Tikhonov regularization to the calibration problem. Additionally, we propose a new approach for choosing the Tikhonov regularization parameter. The TAPENADE automatic differentiation tool is used to obtain the adjoint code of the direct model, which provides an efficient way to compute the gradient of the cost function with respect to the local volatility surface. Finally, we perform four numerical tests aimed at assessing and verifying the aforementioned techniques.

Key words: local volatility surface, second order Tikhonov, iterative regularization, inverse problems


1 Introduction

The celebrated Black-Scholes model under the assumption of constant volatility has established itself both in theory and practice as the classical model for pricing European style options (Black and Scholes (1973)). Under this assumption, the implied volatility of all options on the same underlying would be the same. However, it is usually observed in the market that the Black-Scholes implied volatility varies with both strike and maturity; these variations are referred to, respectively, as the volatility smile (or sometimes volatility skew) and the term structure of the volatility surface, reflecting the change of implied volatility in the space and time directions (Hull (2009)). Sometimes the volatility smile is simply used as a general term for any variation of the implied volatility surface.

There have been many studies extending the Black-Scholes theory to account for the volatility smile and its term structure. Broadly speaking, such studies follow two directions: one direction introduces jumps (Merton (1976)), stochastic volatility (Heston (1993)), or both, while the other direction considers the volatility as a deterministic function that depends on both price (or strike) and time (or maturity), which is usually called the local volatility model. The local volatility model is a one-factor model and thus retains the completeness of the model, which means hedging options with just the underlying asset is possible (Coleman et al. (1999)). Which volatility model is better is not the subject of this paper. Crepey (2004) showed that the local delta (the delta of an option under local volatility) provides a better hedge than the implied delta from the Black-Scholes model, using both simulated and real time series of equity-index data. Gatheral (2006) showed that the local variance is a conditional expectation of the instantaneous variance. This paper addresses the calibration of local volatility models with respect to European options. The techniques introduced here can be applied to calibrate other volatility models or other options.

There has been a series of studies on the calibration of local volatility models to European style options. It was established in the seminal work of Dupire (1994) that the local volatility function can be uniquely determined given the existence of European options of all strikes and maturities. However, only a limited number of European options are available. Interpolating or extrapolating the sparse market prices to fill the gap, as in the studies of Dupire (1994); Derman and Kani (1994); Rubinstein (1994) and Avellaneda et al. (1997), is known to be subject to both artificial misinterpretation and stability issues (Crepey (2003b)).

There is another direction of research that solves the problem as an inverse problem without any interpolation or extrapolation of market prices. Generally, providing the parameters of a model to compute its output is referred to as a forward problem, while providing the output of a model to recover its parameters is referred to as an inverse problem. Most inverse problems are ill-posed, which is due to the nature of inverse problems; see Hansen (1998) for good explanations. Our calibration problem is also ill-posed. We want to point out that the ill-posedness is due to both the presence of noisy data and the nature of the inverse problem, rather than to the discrete and finite observations available, as pointed out in Lagnado and Osher (1997). To control the ill-posedness of the inverse problem, some regularization is needed. The most popular regularization is Tikhonov regularization.

In this research direction, Lagnado and Osher (1997) solve the inverse problem in a non-parametric space, i.e., without assuming any shape of the local volatility surface. They use the first order derivatives of the volatility surface to regularize the inverse problem, together with an expensive way to compute the gradient of the cost function. Most research that followed used the same approach, namely using the first order derivatives of the volatility surface to regularize the inverse problem, such as Bouchouev and Isakov (1997, 1999); Jiang and Tao (2001); Jiang et al. (2003); Crepey (2003a); Egger and Engl (2005); Hein (2005); Achdou and Pironneau (2005) and Turinici (2009). However, the resulting optimal volatility surface is usually not smooth enough.

Theoretical studies such as Bouchouev and Isakov (1997, 1999); Jiang and Tao (2001); Crepey (2003a); Egger and Engl (2005) and Hein (2005) explore issues related to stability, uniqueness and convergence rates. However, to the authors' knowledge, there is no conclusive theoretical study to date about the existence, uniqueness and stability of this inverse problem. Our research also assumes that a unique local volatility function exists.

Bodurtha and Jermakyan (1999) also solve the inverse problem in a non-parametric space. However, as in Lagnado and Osher (1997), the optimal volatility surface lacks smoothness. Coleman et al. (1999); Achdou and Pironneau (2005) and Turinici (2009) all solve the inverse problem by parameterizing the local volatility surface, either through cubic splines or through piecewise linear segments. By using a cubic spline interpolation to construct the volatility surface, the volatility surface has nice smoothness properties. Turinici (2009) proposes to calibrate the local volatility using the variance of implied volatility rather than option prices. This parametric approach, by essentially reducing the number of parameters, works well when the selected knots represent the key regions of the true volatility surface well enough. However, it runs the danger of allowing too few degrees of freedom to explain the data. How many knots should be placed, and where, may also be problem dependent.

As illustrated in the following two sections, all of the above studies except Coleman et al. (1999) solve the inverse problem by minimizing a cost function that measures the misfit between model output and observed prices, together with a Tikhonov regularization. To successfully carry out this process, two key ingredients are required: the gradient of the cost function and the Tikhonov regularization parameter. A gradient-based optimization routine is usually used to carry out the minimization. To date, most research papers compute the gradient by first deriving the analytical adjoint equations and then discretizing and solving them numerically, such as Jiang et al. (2003); Egger and Engl (2005); Turinici (2009). However, the gradient generated by this approach can be inconsistent with the true gradient, an inconsistency that arises from the discretized approximation of the analytical adjoint equations (Giering (2000)). It is also infeasible when the analytical adjoint model is not available, for example when the model is complicated. There is also no archival paper addressing how to suitably choose the Tikhonov regularization parameter. At most, it is selected based on some ad hoc experience, such as in Crepey (2003b).

In this paper, we first propose a new method to generate the gradient of the cost function using just the numerical code of the original model. This gradient has better numerical consistency with the true gradient, as demonstrated by the successful minimization of the cost function even without regularization. Second, we carry out a second order Tikhonov regularization, which to the authors' knowledge has not been used before in the context of quantitative finance, to make use of the smoothness of the volatility surface. Third, by analyzing why ill-posedness occurs, we propose a new way to select the Tikhonov regularization parameter. This approach turns out to be very robust.

This paper is arranged in the following order. First, in section 2, the mathematical formulation of the calibration problem is set up, the complex issues related to the inverse problem are discussed in more detail, and possible solutions are suggested. In section 3, we address the use of automatic differentiation tools to derive adjoint code for computing the gradient of the cost function. In section 4, by analyzing how ill-posedness arises for linear inverse problems, we propose a robust way to select the Tikhonov regularization parameter. Last, in section 5, numerical results are presented.

2 Formulation of the calibration problem

2.1 Set up as a least-squares problem

For consistency, the notation used here is similar to that of Lagnado and Osher (1997). The local volatility model assumes that the price S of an underlying follows a general diffusion process:

$$\frac{dS}{S} = \mu\,dt + \sigma(S,t)\,dW_t \qquad (1)$$

where µ is the asset return rate in a risk-neutral world, W_t is a standard Brownian motion process, and the local volatility σ is a deterministic function that may depend on both the price S of the underlying and time t.

Let V(S_0, 0, K, T, σ) denote the theoretical price of a European option with strike K and maturity T at time 0 for an underlying with spot price S_0. Assuming the price S follows the stochastic process specified in equation (1), the price function V satisfies the following generalized Black-Scholes PDE:

$$\frac{\partial V}{\partial t} + \frac{1}{2}\,S^2\sigma^2(S,t)\,\frac{\partial^2 V}{\partial S^2} + (r-q)\,S\,\frac{\partial V}{\partial S} - rV = 0 \qquad (2)$$

where r is the risk-free continuously compounded interest rate and q is the continuous dividend yield of the underlying. r and q are both deterministic functions, and in this paper we assume they are constant.

If the functional form of σ(s, t) is specified, then the price V(S_0, 0, K, T, σ) can be uniquely determined by solving equation (2) together with appropriate initial and boundary conditions. Suppose we are given the market prices of European options (calls, puts, or both) spanning a set of expiration dates T_1, ..., T_N. Assume that for each expiration date T_i there is a set of options with strikes K_i1, ..., K_iM_i, where M_i represents the total number of strikes for expiration date T_i. Let V^b_ij and V^a_ij denote, respectively, the bid and ask prices for an option with maturity T_i and strike K_ij at time t = 0.

The calibration of the local volatility surface to the market is to find a local volatility function σ(s, t) such that the solution of (2) lies between the corresponding bid and ask prices for any option (K_ij, T_i), i.e.,

$$V^b_{ij} \;\leq\; V(S_0, 0, K_{ij}, T_i, \sigma) \;\leq\; V^a_{ij}$$

for i = 1, ..., N and j = 1, ..., M_i.

This problem is usually solved by solving the following optimization problem:

$$\min_{\sigma(s,t)}\; G(\sigma) = \sum_{i=1}^{N}\sum_{j=1}^{M_i}\left[\frac{V(S_0,0,K_{ij},T_i,\sigma)-\bar{V}_{ij}}{w(i,j)}\right]^2 \qquad (3)$$

where V̄_ij = (V^b_ij + V^a_ij)/2 is the mean of the bid and ask prices, and w(i, j) is a scaling factor reflecting the relative importance of different options. In this paper, we assume w(i, j) is 1 for all options, which means we assume that every available option is equally important. However, the calibration techniques introduced in this paper can easily cater to non-constant scaling cases. This cost functional G(σ) reasonably quantifies the misfit between model-predicted option prices and market-observed option prices. By minimizing this functional G, the model prediction would best fit the market.

In the above minimization problem, we need to solve PDE (2) once for each option price V(S_0, 0, K_ij, T_i, σ). Instead of solving PDE (2) many times, we can use the Dupire equation, the dual of the Black-Scholes equation, to solve for the option prices V(S_0, 0, K_ij, T_i, σ) for all maturities T_i and strikes K_ij at one time.

The Dupire equation establishes the option price as a function of strike k and maturity τ for a fixed underlying price S_0 at reference time t = 0. Let C(k, τ) = C(S_0, 0, k, τ) be the price of a European call option with strike k and maturity τ. Then C(k, τ) satisfies the following Dupire equation:

$$\frac{\partial C}{\partial \tau} - \frac{1}{2}\,k^2\sigma^2(k,\tau)\,\frac{\partial^2 C}{\partial k^2} + (r-q)\,k\,\frac{\partial C}{\partial k} + qC = 0 \qquad (4)$$

with boundary and initial conditions for European call options given as:

$$C(k, 0) = (S_0 - k)^+,\quad k \in (0, \bar{k})$$
$$C(0, \tau) = S_0\,e^{-q\tau},\quad \tau \in (0, \bar{\tau}]$$
$$C(\bar{k}, \tau) = 0,\quad \tau \in (0, \bar{\tau}]$$

where k̄ and τ̄ represent the upper boundaries of our computational domain in the k and τ directions, respectively. r and q are still the deterministic continuously compounded interest rate and dividend yield, respectively.

Please see Dupire (1994) for the derivation of the Dupire equation. Observing the similarity between the Dupire equation and the Black-Scholes equation, the numerical code for solving the Black-Scholes equation only needs to be slightly modified in order to solve the Dupire equation.
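To make the discretization concrete, the following sketch solves the Dupire equation (4) with the scheme used later in section 5 (backward Euler in maturity, central differences in strike). It is a minimal Python illustration rather than the authors' Fortran code; the function name solve_dupire, the dense tridiagonal solve and the example volatility function are our own choices, and in practice the grid solution would still need to be interpolated to the quoted strikes and maturities.

```python
import numpy as np

def solve_dupire(sigma, S0, r, q, K_max, T_max, Nx=200, Nt=100):
    """Solve the Dupire equation (4) for call prices C(k, tau) on a grid,
    using backward Euler in maturity and central differences in strike."""
    k = np.linspace(0.0, K_max, Nx + 1)          # strike grid, k_bar = K_max
    tau = np.linspace(0.0, T_max, Nt + 1)        # maturity grid, tau_bar = T_max
    dk, dtau = k[1] - k[0], tau[1] - tau[0]

    C = np.zeros((Nt + 1, Nx + 1))
    C[0] = np.maximum(S0 - k, 0.0)               # initial condition C(k, 0) = (S0 - k)^+

    for n in range(Nt):
        t_new = tau[n + 1]
        a = 0.5 * sigma(k[1:-1], t_new) ** 2 * k[1:-1] ** 2 / dk ** 2
        b = (r - q) * k[1:-1] / (2.0 * dk)

        # dC/dtau = a*(C_{j+1} - 2C_j + C_{j-1}) - b*(C_{j+1} - C_{j-1}) - q*C_j,
        # written implicitly as a tridiagonal system (I - dtau*L) C^{n+1} = C^n.
        lower = -dtau * (a + b)                  # multiplies C_{j-1}
        diag = 1.0 + dtau * (2.0 * a + q)        # multiplies C_j
        upper = -dtau * (a - b)                  # multiplies C_{j+1}

        A = np.diag(diag) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)

        rhs = C[n, 1:-1].copy()
        left_bc = S0 * np.exp(-q * t_new)        # C(0, tau) = S0*exp(-q*tau)
        rhs[0] -= lower[0] * left_bc             # move the known boundary value to the RHS
        # right boundary C(K_max, tau) = 0 contributes nothing

        C[n + 1, 1:-1] = np.linalg.solve(A, rhs)
        C[n + 1, 0] = left_bc
        C[n + 1, -1] = 0.0
    return k, tau, C

# Example: flat 15% local volatility, S0 = 100, r = 5%, q = 2%
k, tau, C = solve_dupire(lambda k, t: 0.15 + 0.0 * k,
                         S0=100.0, r=0.05, q=0.02, K_max=200.0, T_max=1.0,
                         Nx=100, Nt=50)
```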

2.2 Issues and proposed solution

Before trying to solve problem (3), we first point out some aspects of problem (3) that make it complicated. Problem (3) is a large scale nonlinear inverse problem. First, to solve equation (4), we discretize the computational domain into Nx × Nt grid points. Estimating a volatility surface σ(k, τ) that best fits market prices means estimating σ at each grid point. The total number of parameters to estimate is thus Nt × (Nx − 1), considering that no volatility is needed at the boundaries. Although, as other archival material as well as our research demonstrates, only the section of the volatility surface near the money can be estimated from market prices, the number of parameters to estimate is still quite large, so this is a large scale problem. The total number of options available is usually less than the number of parameters to be estimated, so it is also an under-determined problem. Second, although the Dupire or Black-Scholes equation is a linear operator on the option price V when the volatility σ is independent of V, it is a nonlinear operator on σ or σ². Third, as for most inverse problems, it is ill-posed in the sense that small changes in the option prices may lead to big changes in the volatility surface. But the ill-posedness does not imply that we cannot extract meaningful information about the volatility surface from the option prices. Regularization is typically the tool introduced to control the ill-posedness.

Most research papers on the calibration of local volatility models use gradient-based optimization routines to solve problem (3). The gradient is typically computed from the adjoint equation of either (2) or (4). That is due to the fact that, when the volatility surface is not parametrized, the dimension of the gradient is so large that approximating the gradient by finite differences becomes computationally very expensive.

In this paper, we propose a new way to derive the gradient of the functional G in (3) using just the code for solving model (4), together with freely available automatic differentiation tools. This method has several benefits. First, it is not necessary to derive the theoretical adjoint equation of the model. Only the code for solving the model is needed, and this needs to be constructed in any case. This makes it a model-free approach to computing the gradient of a cost function of the form (3) for any model, and a good alternative for complex models whose theoretical adjoint models cannot be easily derived. Second, there are a number of automatic differentiation tools available, written for different computer languages, that can be used to speed up the process of constructing the code that computes the gradient. Third, the gradient derived using this approach possesses better numerical consistency with the true gradient than the gradient computed from the continuous adjoint model. This approach is addressed in the next section.

3 Using automatic differentiation to derive the gradient of the cost function

3.1 Description of adjoint model

The adjoint method has recently gained popularity in the quantitative finance field. For example, Giles and Glassman (2005) and Capriotti and Giles (2010) used it to speed up the calculation of Greeks. This paper addresses the application of adjoint methods to optimal parameter estimation.

We establish here the relationship between the gradient of a cost function of the form (3) and its adjoint model in a very general framework, to show that this relationship is independent of the model used. We use a derivation similar to Giering (2000). Consider a general dynamical system and a model describing this system. Assume the model is of the form

$$F: \mathbb{R}^n \rightarrow \mathbb{R}^m, \qquad X \mapsto Y \qquad (5)$$

where X ∈ R^n is the vector of input or control parameters of the model and Y ∈ R^m is the output of the model corresponding to the input X. For our calibration problem, F is the Dupire model (4), X is the local volatility surface, and Y is the vector of computed option prices.

Let Ȳ ∈ R^m be a set of observations of the system output and suppose that the model can compute the values Y ∈ R^m corresponding to these observations.

By selecting an appropriate inner product (·, ·), we can measure the misfit between observations and computed model output by introducing a cost function:

$$J = \frac{1}{2}\,\bigl(Y - \bar{Y},\; Y - \bar{Y}\bigr)$$

or

$$J(X) = \frac{1}{2}\,\bigl(F(X) - \bar{Y},\; F(X) - \bar{Y}\bigr) \qquad (6)$$

By finding the minimum of this cost function, we are looking for input or control parameters X that best fit the model forecast to the observations.


To find the minimum of the cost function J, the gradient of J with respect to X is usually needed.

If we apply a Taylor expansion to the cost function,

$$J(X) = J(X_i) + \bigl(\nabla J(X_i),\, X - X_i\bigr) + o\bigl(\lVert X - X_i\rVert\bigr)$$

and neglect the higher order terms, we have

$$\delta J = \bigl(\nabla J(X_i),\, X - X_i\bigr) = \bigl(\nabla J(X_i),\, \delta X_i\bigr) \qquad (7)$$

Now suppose F is sufficiently smooth; then for a small perturbation δX_i at X_i, we can linearly approximate the variation in Y by

$$\delta Y = A(X_i)\,\delta X_i \qquad (8)$$

where A is the Jacobian of F at X_i, which is also usually called the tangent linear model of F. The tangent linear model of F is a model that computes the linear approximation of δY given a perturbation δX at X. Unless the model F is severely nonlinear, the tangent linear model is usually a good approximation of δY when δX is small relative to X. Since this model will be implemented in computer code, we will talk about the code for this tangent linear model, which we shall refer to as the tangent linear code. Similarly, we will refer to the code of the adjoint model of F as the adjoint code.

From (6), the variation of the cost function J around X_i is:

$$\delta J = \bigl(\delta Y,\; F(X_i) - \bar{Y}\bigr) = \bigl(A(X_i)\,\delta X_i,\; F(X_i) - \bar{Y}\bigr)$$

Using the definition of the adjoint operator, we have

$$\delta J = \bigl(A(X_i)\,\delta X_i,\; F(X_i) - \bar{Y}\bigr) = \bigl(\delta X_i,\; A^*(X_i)\,(F(X_i) - \bar{Y})\bigr) = \bigl(A^*(X_i)\,(F(X_i) - \bar{Y}),\; \delta X_i\bigr) \qquad (9)$$

where the operator A* is the adjoint of the linear operator A. A* is also called the adjoint model of F. In the discrete case, A is the Jacobian matrix, and A* is simply the transpose of the Jacobian matrix A for real numbers.

Comparing (9) with (7), we have:

$$\nabla J(X_i) = A^*(X_i)\,\bigl(F(X_i) - \bar{Y}\bigr) = A^*(X_i)\,\bigl(Y(X_i) - \bar{Y}\bigr) \qquad (10)$$

Equation (10) establishes that we can compute the gradient of the cost function J(X) using the adjoint model of F. But why do we want to use the adjoint model to compute the gradient? The reason is that when the dimension of the input X is very large, we would have to run the model F n + 1 times to compute the gradient of the cost function J(X) by finite differences. This becomes computationally very expensive when the model F is large and complicated. Using (10), we need to run the adjoint model only once to compute the gradient of the cost function J(X). Griewank (1989) shows that the required numerical operations take only 2 to 5 times the computation required for the cost function itself.
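As a small illustration of equation (10), and of why the adjoint route is preferred when n is large, the following Python sketch compares the adjoint-product gradient with a finite-difference gradient on a toy model. Here F, jacobian and the data are hypothetical stand-ins for the Dupire pricer and its tangent linear/adjoint codes; in the paper the product A^T(F(X) − Ȳ) is produced directly by the TAPENADE-generated adjoint code, without ever forming the Jacobian.

```python
import numpy as np

def F(X):
    """Toy nonlinear model R^n -> R^m standing in for the option pricer."""
    return np.array([np.sum(X ** 2), np.sum(np.sin(X))])

def jacobian(X):
    """Jacobian A of F at X (what the tangent linear code linearizes)."""
    return np.vstack([2.0 * X, np.cos(X)])

def grad_adjoint(X, Y_obs):
    """Gradient of J = 0.5*||F(X) - Y_obs||^2 via equation (10): A^T * residual."""
    return jacobian(X).T @ (F(X) - Y_obs)

def grad_fd(X, Y_obs, eps=1e-7):
    """Same gradient by one-sided finite differences: n + 1 model evaluations."""
    J = lambda x: 0.5 * np.sum((F(x) - Y_obs) ** 2)
    g = np.zeros_like(X)
    for i in range(X.size):
        e = np.zeros_like(X)
        e[i] = eps
        g[i] = (J(X + e) - J(X)) / eps
    return g

X = np.linspace(0.1, 1.0, 10)
Y_obs = F(X) + 0.01
# The two gradients agree up to finite-difference truncation error.
print(np.max(np.abs(grad_adjoint(X, Y_obs) - grad_fd(X, Y_obs))))
```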


3.2 Derivation of adjoint code using automatic differentiation

A complete, detailed discussion of the rationale of automatic differentiation is beyond the scope of this paper; see Giering and Kaminski (1998) for details. We will just list some main aspects of automatic differentiation and some of the resources available. There are a few automatic differentiation tools available, with details to be found on the website www.autodif.org, for numerical codes written in C, Fortran or Matlab, such as TAPENADE, TAMC and ADIFOR, to cite but a few. Automatic differentiation is based on the idea of the chain rule. A numerical model is an algorithm that can be viewed as a composition of differentiable functions (assuming non-differentiable points are not encountered), each represented by either a statement or a subroutine in the numerical code. Automatic differentiation computes the derivative of each statement or subroutine and then combines them together. Some automatic differentiation tools, such as ADIFOR, will give warnings when a non-differentiable point occurs.

There are two modes in automatic differentiation: the forward mode and the adjoint or reverse mode. The forward mode computes the derivatives in a top-down approach while the adjoint mode computes the derivatives in a bottom-up approach. Feeding the numerical source code of model F, with input X and output Y, to an automatic differentiation tool, the forward mode generates the tangent linear code of model F, while the reverse mode generates the adjoint code of F.

As pointed out in (8), the output of the tangent linear model is actually δY = A δX rather than the Jacobian matrix A itself. By letting δX be a unit vector with 1 in the ith component and 0 in all other components, we can use the tangent linear model to compute the ith column of the matrix A. Iterating, we can find all n columns of the Jacobian matrix A. Like the finite difference approach, this is not the best way to compute the Jacobian matrix A when n, the number of columns of A, is much greater than its number of rows m.

In the application of this paper, the Jacobian matrix A is not required explicitly. Instead, we just need the matrix-vector product A*(X_i)(Y(X_i) − Ȳ) as described in (10). However, adjoint code can also be used to compute the Jacobian matrix A: by feeding the adjoint code the jth column of an identity matrix, we obtain the jth row of the Jacobian matrix A. The number of adjoint model runs would then be m, which makes it a better choice for computing A when n is much greater than m. This is also the rationale behind the work of Giles and Glassman (2005) and Capriotti and Giles (2010).

3.3 Verification of adjoint code

Even though the automatic differentiation tools available now are quite robust, it is still good practice to check whether the adjoint code generated by these tools is correct, especially for complex models. The adjoint code is tested using the two strategies suggested by Navon et al. (1992). The first strategy is the following identity test and the second strategy is the gradient test.

$$(AQ)^T(AQ) = Q^T\bigl(A^T(AQ)\bigr) \qquad (11)$$

where Q represents the input of A, and A represents the tangent linear code or a segment of it, say a subroutine, a loop or even a single statement. A^T is the adjoint of A. If (11) holds within machine accuracy for every segment of the tangent linear code A, it can be said that the adjoint code is correct with respect to the tangent linear code. In our numerical tests, we checked the adjoint code segment by segment, loop by loop, and subroutine by subroutine. With double precision, the identity (11) is always accurate to within 13 digits or better. This verifies the correctness of the adjoint code against the tangent linear code.
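A compact version of the identity test (11) is sketched below. A dense random matrix stands in for the tangent linear code; in the paper the test is applied segment by segment to the TAPENADE-generated Fortran tangent linear and adjoint routines.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((57, 300))   # stand-in for the tangent linear operator

tlm = lambda q: A @ q                # tangent linear code:  q -> A q
adj = lambda y: A.T @ y              # adjoint code:         y -> A^T y

Q = rng.standard_normal(300)         # random input perturbation
AQ = tlm(Q)
lhs = AQ @ AQ                        # (AQ)^T (AQ)
rhs = Q @ adj(AQ)                    # Q^T (A^T (A Q))
print(abs(lhs - rhs) / abs(lhs))     # agrees to near machine precision
```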

3.3.1 Test of accuracy of the Tangent Linear Model (TLM)

Test (11) makes use of the tangent linear code to check the correctness of the adjoint code. The tangent linear code generated by automatic differentiation tools has a much higher chance of being correct than the adjoint code; from the authors' own experience, the tangent linear code is correct most of the time. But since the tangent linear model depends on the linearization assumption about the model F, this assumption needs to be checked. The following test is used to check both the validity of this assumption and the correctness of the tangent linear code. The accuracy of the tangent linear model determines both the accuracy of the adjoint model and the accuracy of the gradient of the cost function with respect to the control variables, all of which rest on the linearization assumption.

To verify A, we use the fact that A is the linearization of the model F:

$$F(X + \alpha\,\delta X) - F(X) = A(\alpha\,\delta X) + O(\alpha^2)$$

where δX is a small perturbation around X.

We compare the output of the tangent linear code forced by a small perturbation δX with the difference between two model runs, with and without the perturbation respectively. If the linearization holds and its code is correct, then the ratio between these two quantities should approach one as α gets close to zero, as illustrated by the following equation.

$$r = \frac{F(X + \alpha\,\delta X) - F(X)}{A(\alpha\,\delta X)} = 1 + O(\alpha) \qquad (12)$$

After verifying the tangent linear code, we can use (11) to check the correctness of the adjoint code.

3.3.2 Gradient Test

Even when both the tangent linear code and the adjoint code are correct, the gradient generated using the adjoint model still needs to be verified, since the accuracy of the gradient depends not only on the accuracy of the tangent linear and adjoint models, but also on the approximation involved in linearizing the cost function (7). The gradient can be verified by again using a Taylor expansion.

Suppose the initial X has a perturbation αh, where α is a small scalar and h is a vector of unit length (such as h = ∇J/‖∇J‖). According to the Taylor expansion, the cost function satisfies:

$$J(X + \alpha h) = J(X) + \alpha\,h^T\,\nabla J(X) + O(\alpha^2)$$

We can define a function of α as:

$$\Phi(\alpha) = \frac{J(X + \alpha h) - J(X)}{\alpha\,h^T\,\nabla J(X)} = 1 + O(\alpha) \qquad (13)$$

The gradient ∇J(X) is calculated using the adjoint model. So as α tends to zero, but not so close to machine precision, the ratio Φ should be close to 1. When α is close to machine precision, the ratio will deviate from 1 as the machine error dominates. Figure 1 shows the gradient test of our numerical code. We can see that for α between 10^-15 and 10^-4, Φ(α) approaches 1 with high accuracy. The gradient can also be verified indirectly by the reduction of the cost function. Figure 2 shows the decrease of the cost function for S&P 500 index European call options in October 1995 before any regularization. Details of the S&P 500 options are described in the second example of the numerical test section. The cost function for the other three sets of options discussed in the numerical test section can likewise be reduced close to zero before any regularization is applied. However, the optimal volatility surfaces obtained are very oscillatory and unstable; see Figure 3. This illustrates the under-determined and ill-posed nature of our calibration problem. Regularization is thus necessary to control both the under-determination and the ill-posedness.
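The ratio test (13) is easy to automate; a minimal sketch is given below. J and grad_J can be any cost function and candidate gradient (for instance the toy pair from the sketch in section 3.1); the names gradient_test, J and grad_J are our own.

```python
import numpy as np

def gradient_test(J, grad_J, X, alphas=np.logspace(-12, -1, 12)):
    """Print Phi(alpha) from equation (13); it should approach 1 as alpha -> 0
    until rounding error takes over near machine precision."""
    g = grad_J(X)
    h = g / np.linalg.norm(g)                 # unit-length direction
    J0 = J(X)
    for a in alphas:
        phi = (J(X + a * h) - J0) / (a * (h @ g))
        print(f"alpha = {a:8.1e}   Phi(alpha) = {phi:.10f}")

# Example with a simple quadratic cost whose gradient is known exactly.
J = lambda x: 0.5 * np.sum((x - 1.0) ** 2)
grad_J = lambda x: x - 1.0
gradient_test(J, grad_J, np.linspace(0.0, 2.0, 20))
```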


Figure 1: Verification of the gradient calculation: log10(Φ(α)) plotted against log10(α).

Figure 2: Reduction of the cost function (log10(G) versus iteration) without any regularization for S&P 500 index European call options in October 1995.

Figure 3: The optimal local volatility surface (σ plotted against maturity T and moneyness K/S0) reconstructed before applying any regularization for S&P 500 index European call options in October 1995.

4 Tikhonov Regularization

4.1 Second order Tikhonov Regularization

Tikhonov regularization is the most popular regularization method for inverse problems.It seeks a compromise between the size of the cost function and the size of the solution.It assumes the following form.

$$J(\sigma) = G(\sigma) + \lambda\,\bigl\|L_m(\sigma - \sigma_0)\bigr\|_2^2 \qquad (14)$$

where λ is the regularization parameter and σ_0 is an a priori estimate of σ (taken to be 0 if there is no prior estimate). L_m is an operator. When m = 0, L is the identity matrix and the regularization is called zeroth order Tikhonov regularization. When m = 1, L assumes the form of the first derivative of σ and the regularization is called first order Tikhonov regularization. As mentioned in the introduction, most papers on the calibration of local volatility surfaces use the first order Tikhonov regularization, which assumes the form:

$$J(\sigma) = G(\sigma) + \lambda\,\|\nabla\sigma\|_2^2 \qquad (15)$$

However, the volatility surface generated by the first order Tikhonov regularization is usually rough. Assuming the volatility surface is smooth, as in Coleman et al. (1999), we propose to use the following second order Tikhonov regularization. Since only volatilities near the money are sensitive to option prices, the regularization is applied only to the part of the volatility surface whose moneyness, defined as the ratio between strike K and spot S_0, lies between 0.8 and 1.2.

$$J(\sigma) = G(\sigma) + \lambda\left\|\frac{\partial^2\sigma}{\partial x^2} + \frac{\partial^2\sigma}{\partial t^2} + \frac{\partial\sigma}{\partial t}\frac{\partial\sigma}{\partial x}\right\|_2^2 \qquad (16)$$

The calibration problem now assumes the form of a constrained minimization problem:

$$\min_{0 < \sigma(s,t) < 1} J(\sigma) \qquad (17)$$

Usually a gradient-based optimization routine is used to find a local minimum of J. The gradient of the cost function J is composed of both the gradient of G, which is derived from the adjoint model, and the gradient of the regularization part in (16). Stochastic optimization strategies could be applied in order to find a global minimum of J; here, however, we are just interested in finding a dependable and smooth volatility surface that traders can use to hedge their positions.

Since the parameters are bounded between 0 and 1, we use the L-BFGS-B code (Zhu et al. (1997); Morales and Nocedal (2011)) to minimize the regularized cost function (16). L-BFGS-B is an optimization routine for bounded or unbounded large scale minimization problems that uses just the gradient of the cost function. It has a super-linear convergence rate yet requires only a small amount of memory, since it does not store the Hessian matrix of the cost function.
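The sketch below shows the shape of the regularized problem (16)-(17) and the L-BFGS-B call with the 0 < σ < 1 bounds. It is a self-contained toy: a random linear map M stands in for the linearized pricer, the penalty of (16) is evaluated with numpy gradients on the near-the-money strip, λ is a fixed placeholder (in the paper it is re-selected at each iteration as in section 4.2), and, to keep the sketch short, L-BFGS-B's built-in finite-difference gradient is used instead of the adjoint gradient plus the analytic penalty gradient.

```python
import numpy as np
from scipy.optimize import minimize

Nt, Nx = 12, 24                                  # small grid so the toy runs quickly
S0 = 100.0
k = np.linspace(0.0, 2.0 * S0, Nx)               # strike axis
t = np.linspace(0.0, 1.0, Nt)                    # maturity axis
dk, dt = k[1] - k[0], t[1] - t[0]
near_money = (k / S0 > 0.8) & (k / S0 < 1.2)     # regularize only for moneyness in (0.8, 1.2)

rng = np.random.default_rng(1)
M = rng.standard_normal((30, Nt * Nx)) / Nx      # stand-in for the linearized pricer
d = M @ (0.15 * np.ones(Nt * Nx))                # synthetic "market" prices
lam = 1e-2                                       # placeholder Tikhonov parameter

def penalty(sig):
    """Second order Tikhonov term of (16) evaluated on the near-the-money strip."""
    s = sig.reshape(Nt, Nx)
    s_t, s_k = np.gradient(s, dt, dk)            # first derivatives in t and k
    s_tt = np.gradient(s_t, dt, axis=0)
    s_kk = np.gradient(s_k, dk, axis=1)
    term = s_kk + s_tt + s_t * s_k
    return np.sum(term[:, near_money] ** 2)

def J(sig):
    misfit = M @ sig - d                          # G(sigma): toy least-squares misfit
    return 0.5 * misfit @ misfit + lam * penalty(sig)

sigma0 = 0.15 * np.ones(Nt * Nx)                  # initial guess, as in section 5
res = minimize(J, sigma0, method="L-BFGS-B",
               bounds=[(0.0, 1.0)] * sigma0.size, # constraint 0 < sigma < 1 of (17)
               options={"maxfun": 500})           # stop after 500 function calls
print(res.fun, float(res.x.min()), float(res.x.max()))
```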

4.2 Strategy for selecting Tikhonov regularization parameter λ

A Tikhonov solution of the inverse problem depends critically on a suitable selection of the regularization parameter λ. How to suitably choose a regularization parameter is still an area of active research. For linear inverse problems, λ is usually selected by either the L-curve method or generalized cross validation; see Hansen (1998) and Aster et al. (2005). For nonlinear inverse problems, only the L-curve method is available as an empirical way to select the optimal λ. However, we found that the L-curve method cannot generate a smooth volatility surface. In addition, the L-curve does not retain its shape when plotted on a log-log scale, a technique typically used for linear problems in order to locate the L-corner.

Since many nonlinear problems are solved iteratively by solving a linear problem at each iteration, we adopt an iterative regularization strategy to solve the nonlinear inverse problem, in which a suitable regularization parameter λ is selected at each iteration rather than using the same value throughout the minimization process, as in the L-curve method. By linearizing the problem at each iteration, some of the analysis for linear inverse problems can be applied.

To determine how to select a suitable regularization parameter at each iteration, we consider the following analysis. This approach is inspired by Aster et al. (2005); note, however, that our problem is under-determined, while Aster et al. (2005) studied an over-determined problem.

We are actually solving for a vector M from

$$GM = D \qquad (18)$$

where G is a nonlinear model operator, M is the input characterizing a model, and D consists of observation data. In our case, G is the Dupire equation (4), M is a large vector containing all the points of the volatility surface, with size Nt × (Nx − 1), and D is a vector containing all available option prices.

This problem cannot be solved directly due to its nonlinearity. Instead, it is solved by minimizing a cost function of the form (6). When we use L-BFGS-B to iteratively find the minimum of (16) without regularization, it finds M iteratively using the gradient information of G. In other words, it finds M_{k+1} at iteration k + 1 such that

$$GM_{k+1} = GM_k + A\,\delta M \approx D \qquad (19)$$

where δM is a small perturbation in the vicinity of M_k and A is the Jacobian matrix of the nonlinear operator G; the higher powers of δM in the Taylor expansion are neglected.


If equation (19) is not well-posed, then the optimization routine L-BFGS-B may find an unstable solution. If equation (19) is well-posed, the optimization routine is more likely to find stable solutions. Equation (19) is a linear equation of the form:

$$A\,\delta M = D - GM_k \qquad (20)$$

Considering M_{k+1} = M_k + δM, (20) is equivalent to:

$$AM_{k+1} = D - GM_k + AM_k \qquad (21)$$

Let B = D − GM_k + AM_k, which can be computed after iteration k; then (21) is equivalent to:

$$AM_{k+1} = B \qquad (22)$$

Using the pseudo-inverse of A, we find that the solution to (22) can be expressed as:

$$M_{k+1} = V_p\,S_p^{-1}\,U_p^T\,B = \sum_{i=1}^{p} \frac{(U_{\cdot,i})^T B}{s_i}\; V_{\cdot,i} \qquad (23)$$

where p is the number of singular values of the matrix A. Assuming the m × n matrix A is not rank deficient, its rank is p = min(m, n). In our case, n is the number of parameters to estimate and m is the number of options. Since m is less than n in our problem, p = m.

Considering (23), we conclude that the inverse solution can become extremely unstable when the small singular values s_i decay faster than (U_{·,i})^T B. The discrete Picard number, defined as the ratio between (U_{·,i})^T B and s_i, is usually used to quantify the difference between the two decay rates. Figure 4 is a plot of the discrete Picard numbers. We can see that as the singular values get smaller at the end of the spectrum, the discrete Picard number increases. Figure 5 shows the singular values normalized by the largest singular value. The decay of the singular values is of the order O(i^-µ), where µ ≤ 1, so the linear problem (22) is, according to the definition in Hofmann (1986), a mildly ill-posed problem. Having identified where the ill-posedness originates, we can control it by eliminating the effects of the small singular values s_i at the end of the spectrum.

Thus our criterion for selecting the regularization parameter λ at each iteration is to choose λ such that it is greater than some of the small singular values of A while smaller than the largest singular value. In this way, the effect of the small singular values is eliminated while the main information represented by the dominant singular values is retained. This is a feasible strategy for the following reasons.

First, after adding the Tikhonov regularization to the inverse problem, the linear equation (22) is replaced by the regularized least-squares problem:

$$\min_{M_{k+1}}\; \|AM_{k+1} - B\|_2^2 + \lambda^2\,\|L_m M_{k+1}\|_2^2 \qquad (24)$$


where L_m is the second order operator defined in (16). Since this is a regularization of a linear problem, (24) can be solved using a generalized singular value decomposition (GSVD), and its solution assumes the form:

$$M_{k+1} = \sum_{i=1}^{k} \frac{\gamma_i^2}{\gamma_i^2 + \lambda^2}\;\frac{(U_{\cdot,i})^T B}{\alpha_i}\; X_{\cdot,i} \qquad (25)$$

where the γ_i are the generalized singular values. The factors f_i = γ_i²/(γ_i² + λ²), which are called filter factors, are close to zero when γ_i is much smaller than λ and close to 1 when γ_i is much greater than λ. Please see the Appendix for a detailed derivation and a description of each term of (25).

Second, by reducing the effect of the small singular values, we are essentially removing the corresponding singular vectors, which usually contain many sign changes. By removing these singular vectors, our solution becomes much smoother.

In our numerical experiments, at each L-BFGS-B iteration from M_k to M_{k+1}, we first sort the singular values in decreasing order. We then find the first singular value prior to which the sum of the dominant singular values accounts for 50% of the total sum of all singular values; this singular value is selected as the regularization parameter λ for that iteration. This percentage, which will be referred to as the truncation level of the singular values, determines which singular value is used as the regularization parameter. It is the only parameter that is subject to change in our numerical method. The higher the truncation level, the smaller the chosen Tikhonov regularization parameter. The value 50% is used here for illustration purposes; we present a reasonable interval for the truncation level in the numerical tests section.

We use the package ARPACK to find the singular values of A. This package is available at www.caam.rice.edu/software/ARPACK. All it requires is a code computing the product of the matrix A with a vector, rather than the matrix A itself, which suits our case perfectly. First, A has a high dimension and would require a serious amount of memory to be stored explicitly. Second, the adjoint code derived in section 3 actually computes the product of A^T with its input vector. Thus we can use the ARPACK package to compute the singular values of A^T, which are also the singular values of A.
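The parameter selection just described reduces to a few lines once the singular values are available. The sketch below mirrors the ARPACK setup: a LinearOperator exposes only the products A v and A^T w (here backed by a synthetic dense matrix B, purely a stand-in), scipy's svds computes the dominant singular values, and select_lambda (our own helper name) picks the first singular value at which the running sum of the sorted values reaches the chosen truncation level.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, svds

rng = np.random.default_rng(2)
m, n = 57, 400                                   # number of options vs. surface parameters
B = rng.standard_normal((m, n)) * np.logspace(0, -3, m)[:, None]  # synthetic Jacobian

# Only matrix-vector products are exposed: the tangent linear code gives A v
# and the adjoint code gives A^T w, which is all ARPACK/svds needs.
A_op = LinearOperator((m, n), matvec=lambda v: B @ v, rmatvec=lambda w: B.T @ w)

s = svds(A_op, k=m - 1, return_singular_vectors=False)   # dominant singular values
s = np.sort(s)[::-1]                                      # decreasing order

def select_lambda(singular_values, truncation_level=0.5):
    """First singular value at which the cumulative sum of the (sorted, decreasing)
    singular values reaches `truncation_level` of their total (section 4.2)."""
    cumulative = np.cumsum(singular_values)
    idx = np.searchsorted(cumulative, truncation_level * cumulative[-1])
    return singular_values[idx]

lam = select_lambda(s, truncation_level=0.5)
print(lam)
```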

Figure 4: The Picard plot of equation (22) (|U_{·,i}^T B|, the singular values s_i, and their ratio) after the first L-BFGS-B iteration for calibration of the local volatility surface for S&P 500 index European call options in October 1995.

20

0 10 20 30 40 50 600

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

scaled singular values

i−0.2

i−1

Figure 5: Normalized singular values of the Jacobian matrix A in the first L-BFGS-B iteration forcalibration of the local volatility surface S&P 500 index European call options in October 1995

21

5 Numerical Results

Prior to discussing our numerical tests, let us first summarize our algorithm:

1. Initialize volatility surface σ0(s, t)

2. Use (4) to compute option prices Vcmpt and cost function G in (3)

3. Feed the difference between Vcmpt and Vobs into the adjoint model A^T derived in section 3.2, using (10), to compute the gradient of G with respect to σ(s, t)

4. Use ARPACK to compute the singular values of the Jacobian matrix A and compute the regularization parameter λ of (16) at truncation level 50%, as described in section 4.2

5. Add the regularization part of (16) to G to form the regularized cost function J of (16). Add the gradient of the regularization part of (16) to the gradient obtained in step 3 to compute the gradient of the cost function J with respect to σ(s, t)

6. Insert the cost function J and its gradient with respect to σ(s, t) into the L-BFGS-B routine to find the next estimate σk+1(s, t), k = 0, 1, 2, · · ·

7. When either the stopping criterion of L-BFGS-B is satisfied or 500 function calls of the cost function J are exceeded, stop. Otherwise, go back to step 2.
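Read as code, the loop above might look like the following toy driver. Everything here is a stand-in: a mildly nonlinear map plays the role of the Dupire pricer, its Jacobian plays the role of the tangent linear/adjoint pair, and a one-dimensional second-difference penalty replaces the full term (16); the structure, not the model, is the point.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
m, n = 40, 200
M = rng.standard_normal((m, n)) / np.sqrt(n)

def forward(sig):                       # step 2: stand-in for the Dupire pricer
    y = M @ sig
    return y + 0.1 * y ** 2             # mild nonlinearity so the Jacobian changes

def jac(sig):                           # stand-in for the tangent linear / adjoint pair
    return (1.0 + 0.2 * (M @ sig))[:, None] * M

d = forward(0.15 + 0.02 * np.sin(np.linspace(0.0, 6.0, n)))   # synthetic "market" prices

def select_lambda(s, level=0.5):        # step 4: truncation-level rule of section 4.2
    s = np.sort(s)[::-1]
    return s[np.searchsorted(np.cumsum(s), level * s.sum())]

sig = 0.15 * np.ones(n)                 # step 1: initial surface
for outer in range(5):                  # steps 2-7, with lambda re-selected each pass
    lam = select_lambda(np.linalg.svd(jac(sig), compute_uv=False))

    def J(x):                           # steps 2 and 5: misfit plus smoothness penalty
        r = forward(x) - d
        return 0.5 * r @ r + lam * np.sum(np.diff(x, 2) ** 2)

    def grad_J(x):                      # steps 3 and 5: adjoint product, eq. (10)
        g = jac(x).T @ (forward(x) - d)
        dd = np.diff(x, 2)
        p = np.zeros_like(x)
        p[:-2] += dd
        p[1:-1] -= 2.0 * dd
        p[2:] += dd
        return g + 2.0 * lam * p

    res = minimize(J, sig, jac=grad_J, method="L-BFGS-B",      # step 6
                   bounds=[(0.0, 1.0)] * n, options={"maxiter": 20})
    sig = res.x
    print(outer, res.fun, lam)
```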

The Dupire equation (4) is solved using a backward Euler scheme in time and a central difference scheme in the space direction. The computational domain [0, T̄] × [0, K̄] is set with K̄ = 2S_0, as in Lagnado and Osher (1997), while T̄ is the longest maturity. Nx = 200 and Nt = 100 are used for all four numerical tests; this satisfies the CFL stability condition for the parabolic PDE. The lower and upper bounds for σ when running L-BFGS-B are set to 0 and 1. The stopping criteria in L-BFGS-B are set as factr = 100 and pgtol = 10^-6, which means L-BFGS-B stops when the projected gradient is less than 10^-6, or when the relative reduction of f between two consecutive iterations is less than factr times machine precision (for f greater than or equal to 1), or when the absolute difference of f between two consecutive iterations is less than factr times machine precision (for f less than 1). For details of L-BFGS-B, please see Zhu et al. (1997). The initial guess is σ_0 = 0.15 for all four cases.

To demonstrate the robustness of our method, we start with a theoretical model used in both Lagnado and Osher (1997) and Coleman et al. (1999). In this example, the local volatility function assumes the form

$$\sigma(s,t) = \frac{15}{s} \qquad (26)$$

The option prices have closed-form solutions, as in Cox and Ross (1976). Twenty-two European call option prices are generated using the closed-form solution for two maturities, T = 0.5 and T = 1.0, with eleven options for each maturity. These option prices are then used to recover the volatility surface (26). As in the studies of Lagnado and Osher (1997) and Coleman et al. (1999), S_0 = 100, the risk-free interest rate is r = 0.05, and the dividend yield is q = 0.02. Figure 6 shows the recovered volatility surface and the true volatility surface. The recovered volatility can barely be distinguished from the true volatility surface, which shows an almost exact recovery. Figure 7 shows the relative error of the computed option prices, using the recovered volatility surface, with respect to the true option prices. The relative error is of the order of 10^-3.

To test the robustness of our method further, we added noise to the true option prices to assess whether we can still recover the volatility surface. The noise is introduced by:

$$\tilde{v}_i = v_i\,\bigl(1 + \text{noise-level} \times (0.5 - \text{rand})\bigr)$$

where v_i is the exact price of option i, ṽ_i is the perturbed price, rand is a uniformly distributed random number between 0 and 1 generated by GNU gfortran, and noise-level is the percentage of noise added to each option price. In our example, we tested noise levels of 2%, 5%, and 10%. Figures 8, 9, and 10 show the recovered volatility surfaces for these three noise levels compared to the true volatility surface. We observe that when the noise level is low, for example 2%, the recovered optimal volatility surface still approximates the true volatility surface very well. We use equation (27) to measure the relative error of the optimal volatility surface compared to the true volatility surface; the relative error at 2% noise is 6.5%. Even when the noise level is as high as 10%, the optimal volatility surface still reasonably approximates the true volatility surface, although the deviation is higher than in the low noise case. The relative errors calculated using (27) are 12% and 19% for noise levels of 5% and 10%, respectively.

$$r = \frac{\|\sigma_{\text{true}} - \sigma_{\text{optimal}}\|_2}{\|\sigma_{\text{true}}\|_2} \qquad (27)$$
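Returning to the noise experiment, the perturbation applied to the exact prices is a one-liner; the sketch below uses numpy's uniform generator where the paper uses GNU gfortran's rand.

```python
import numpy as np

rng = np.random.default_rng(4)

def perturb(prices, noise_level):
    """Multiply each exact price by 1 + noise_level*(0.5 - U), with U ~ Uniform[0, 1)."""
    return prices * (1.0 + noise_level * (0.5 - rng.random(prices.shape)))

noisy = perturb(np.array([12.3, 8.7, 4.1]), noise_level=0.02)   # 2% noise level
print(noisy)
```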

Table 1 shows the absolute relative error of some Greeks computed using the optimal volatility surface with respect to the Greeks computed using the true volatility surface. Delta and Rho are computed using the forward mode of automatic differentiation, which computes the exact value of the Greeks. Vega is computed approximately by finite differences, as in Coleman et al. (1999): a constant perturbation of both volatility surfaces is used to compute the relative error. We can see that when the noise level is low, the mean absolute relative error is less than 1.5% for all three Greeks, and the mean absolute relative error for Rho is as small as 0.2%. Even when the noise level is as high as 10%, the reconstructed Greeks still approximate the true Greeks fairly well, with a mean absolute relative error of less than 5% for all three Greeks. Gamma is not computed, since Gamma would be zero in both cases due to the setup of the boundary conditions of (4). The good approximation of the Greeks even when the option prices are contaminated by noise further demonstrates the robustness of our calibration method. This robustness may imply a better hedge for traders.


The robustness of our calibration method even for noisy option prices is very different from the findings of Coleman et al. (1999), where a parametric volatility surface spanned by cubic splines is solved for. In Coleman et al. (1999), it is argued that when the number of interpolation knots exceeds the number of options available, even a small amount of noise will render the optimal volatility surface invalid. Our method, however, is robust not only to noisy data but also to an increasing number of degrees of freedom. We have verified that refining the mesh, i.e., adding more parameters or degrees of freedom, does not reduce the effectiveness of the method: the general shape of the optimal volatility surface remains the same when Nx and Nt increase. However, Nx and Nt should be large enough to ensure that the CFL condition is satisfied.


noise level   Greek    max |relative error|   mean |relative error|   min |relative error|

2%            Delta           1.3%                   0.7%                   0.2%
              Vega            5.2%                   1.3%                   0.01%
              Rho             1.0%                   0.2%                   0.02%

5%            Delta           3.4%                   2.2%                   0.7%
              Vega            12%                    3.0%                   0.3%
              Rho             2.3%                   0.5%                   0.02%

10%           Delta           6.6%                   4.7%                   1.8%
              Vega            21%                    4.6%                   0.6%
              Rho             4.0%                   1.2%                   0.02%

Table 1: The absolute relative errors of the Greeks computed using the volatility surface reconstructed from noisy prices, compared to the Greeks computed from the true volatility surface, for the volatility model σ(s, t) = 15/s.

The second example is another benchmark test, used in Coleman et al. (1999); Andersen and Brotherton-Ratcliffe (1998) and Turinici (2009). The options are European call options on the S&P 500 index in October 1995. There are a total of 57 options with seven maturities. The initial index level, interest rate, and dividend yield are given in the caption of Figure 11. Figure 11 shows the optimal volatility surface obtained. Compared to other studies, our volatility surface not only has the best smoothness but also lies in a reasonable range between 0.1 and 0.35, which is not the case in other studies such as Coleman et al. (1999). It also has a nice skew structure, in agreement with the statement by Hull (2009) that traders use a skewed volatility to price European stock index options. The relative errors of the computed prices with respect to the observed prices are plotted in Figure 12. The relative errors are mostly close to zero, except for options whose prices are close to zero and whose maturities are short. This is acceptable, since options with nearly zero prices allow a much higher relative difference between bid and ask, or in other words a much higher degree of approximation error. The mean absolute relative error is 4.6%; excluding the five options with large absolute relative errors, it is as small as 0.27%.

The last two examples concern European call options in the foreign exchange market. The first was studied by both Avellaneda et al. (1997) and Turinici (2009). There are 15 European call options on the US dollar/Deutsche mark rate, with 5 maturities, computed from 20, 25 and 50 delta risk-reversals quoted on Aug 23, 1995. The spot price and interest rates are given in the caption of Figure 13. The optimal volatility surface and the relative errors of the computed prices are shown in Figures 13 and 14, respectively. The volatility surface has a nice smile shape, as expected for European options on foreign exchange rates. The mean absolute relative error is 1.8%.

The last example concerns European call options on the euro/US dollar rate dated Mar 18, 2008, as studied in Turinici (2009). There are a total of 30 options with 6 maturities. The option prices are computed from quoted 10, 25, and 50 delta risk-reversals and strangles. The spot rate and the interest rates for each currency are listed in the caption of Figure 15. As Figure 15 demonstrates, the volatility surface exhibits a sort of bell-shaped structure in the long run; it also displays some short term structure. As the location of the maturities shows, the near term volatility structure has something to do with the clustering of options with short maturities. Figure 16 shows the relative errors of the computed prices with respect to the observed prices. Again, large relative errors occur when the option prices are close to zero and the maturities are short. The mean absolute relative error is 6%, but it is as small as 0.9% when the options with short maturities or nearly zero prices are excluded.

5.1 Truncation level, scaling and computation time

The only parameter that is subject to change in our algorithm is the truncation level; it is fixed at 50% throughout our four tests. Other truncation levels were also tested. The higher the truncation level, the more oscillatory the volatility surface, since in this case the Tikhonov regularization parameter is small and some noise due to discretization or market noise is not efficiently filtered out. The lower the truncation level, the more stable and smoother the volatility surface, though at the risk of increasing the relative error. However, the relative error and the general shape of the optimal volatility surface actually do not change much overall when the truncation level is less than 0.9, which means this method is fairly robust with respect to different truncation levels as long as the regularization parameter selected is not close to the small singular values at the end of the spectrum. This can also be deduced from the fact that we use a fixed truncation level for all four of our numerical tests. The authors suggest that (0, 0.9) is a good range for the truncation level.

One might be concerned that, when the truncation level is low, some useful information could be smoothed out. However, we found that this is not a problem in our case: our numerical tests show that even when the truncation level is close to 0, the regularization still works very well.

For interest rate options, it is good practice to scale the spot price to 100 and also scale the option prices accordingly. Because the prices of such options are small, after squaring them in the cost function they become too small to be accounted for. After scaling the spot rate S_0 to 100, the relative errors are reduced. When the spot price is large, for example the stock index option with S_0 = 590, there is neither significant improvement nor worsening of the relative error from scaling S_0 to 100. However, we find it good practice to carry out the scaling, since in this way we just need to pre-process the spot price and option prices for any kind of European option on any underlying, such as an equity index or a foreign exchange rate, without changing anything in the numerical code that recovers the optimal volatility surface.

For the last three numerical tests, the computation times are 166, 12, and 59 seconds, respectively, using a Dell Vostro 1720 with an Intel Core Duo CPU at 2.2 GHz and 2 GB of RAM. For the first numerical test, no limit on the maximum number of iterations was imposed in order to reach high accuracy; the computation time is 407 seconds on the same computer.

6 Summary and Conclusions

Our present research addresses the calibration of the local volatility surface for European options in a non-parametric way. We propose a new way to use the automatic differentiation tool TAPENADE to develop the adjoint model of the Dupire equation, which is then used to compute the gradient of the cost function. The gradient generated in this way is numerically more consistent with the true gradient of the cost function than the continuous adjoint approach used in most research papers so far. We also propose, for the first time to the best of the authors' knowledge, using a second order Tikhonov regularization to regularize the calibration problem. Additionally, we propose an efficient way to choose the Tikhonov regularization parameter by exploring the causes of the ill-posedness: it is selected as a suitable singular value of the Jacobian matrix that is implicitly used during the optimization process.


Our method turns out to be robust, as evidenced by its successful recovery of the volatility surface of a theoretical model even when the option prices are contaminated with noise. This is a big improvement compared to other published papers so far, to the best of the authors' knowledge. The robustness of our method is also validated by the reasonable volatility surfaces recovered from three real world examples that have also been studied in other papers. Furthermore, although we are dealing with a large scale problem, the computation time is short enough that the method can be used for real time applications; this is due to the fact that both the L-BFGS-B routine and the ARPACK package that we employ save time and storage. Last, although our numerical tests are carried out exclusively on European call options, the method introduced here should also work for European put options or a combination of both. The calibration technique proposed is developed in a very general framework; it can be generalized to the calibration of other volatility models or to calibration with respect to other options.


Figure 6: The true volatility surface and the optimal volatility surface for the volatility model σ(s, t) = 15/s as studied in Lagnado and Osher (1997) and Coleman et al. (1999), using 22 option prices computed from Cox and Ross (1976).

Figure 7: Left: the true option prices. Right: the relative error of the computed option prices, using the optimal volatility surface, with respect to the true prices for the model σ(s, t) = 15/s. Option prices are plotted in order of increasing maturity and decreasing price. S_0 = 100, the risk-free interest rate is r = 0.05 and the dividend yield is q = 0.02.

Figure 8: The optimal volatility surface compared to the true volatility surface when 2% uniform noise is added to the true option prices, for the volatility model σ(s, t) = 15/s.

Figure 9: The optimal volatility surface compared to the true volatility surface when 5% uniform noise is added to the true option prices, for the volatility model σ(s, t) = 15/s.

Figure 10: The optimal volatility surface compared to the true volatility surface when 10% uniform noise is added to the true option prices, for the volatility model σ(s, t) = 15/s.

Figure 11: The optimal volatility surface obtained for S&P 500 index European call options in October 1995. S_0 = $590, r = 0.06, q = 0.0262. Note: the available maturities are plotted on the T axis in units of years.

Figure 12: Left: the prices of S&P 500 index European call options in October 1995 (Andersen and Brotherton-Ratcliffe (1998)). Right: the relative errors of the computed option prices with respect to the observed prices. Option prices are plotted in order of increasing maturity and decreasing price.

Figure 13: The optimal volatility surface obtained for European call options on the US dollar/Deutsche mark rate. The spot price was S_0 = 1.48875; the US dollar interest rate was r_US = 5.91%; the Deutsche mark interest rate was 4.27%. Note: the available maturities are plotted on the T axis in units of years.

0 5 10 150

0.01

0.02

0.03

0.04

0.05

0.06

optio

n pr

ices

(Vob

s)

0 5 10 15−0.06

−0.04

−0.02

0

0.02

0.04

0.06

rela

tive

erro

r : r

= (V

cmpt

−V

obs)

/Vob

s

relative erroroption prices

Figure 14: Left: The prices of European call options on US dollar/Deutsche mark rate recovered from20, 25, and 50 delta risk reversals (Avellaneda et al. (1997)). Right: The relative error of computedoption price with respect to observed price. Option prices are plotted in an order of increasing maturitiesand prices.

Figure 15: The optimal volatility surface obtained for European call options on the euro/US dollar rate on Mar 18, 2008. The spot price was 1.5755; the US dollar interest rate was rUSD = 2.485%; the euro interest rate was rEUR = 4.550%.

Figure 16: Left: the prices of European call options on the euro/US dollar rate on Mar 18, 2008, recovered from quoted 10, 25, and 50 delta risk reversals and straddles (Turinici (2009)). Right: the relative error of the computed option prices with respect to the recovered prices. Option prices are plotted in order of increasing maturity and price.


References

Achdou, Y. and Pironneau, O. (2005), Computational Methods for Option Pricing, Society for Industrial and Applied Mathematics (SIAM), Philadelphia.

Andersen, L. and Brotherton-Ratcliffe, R. (1998), "The equity option volatility smile: an implicit finite difference approach", The Journal of Computational Finance, Vol. 1, pp. 37–64.

Aster, R., Borchers, B. and Thurber, C. (2005), Parameter Estimation and Inverse Problems, Elsevier Academic Press, Burlington.

Avellaneda, M., Friedman, C., Holmes, R. and Samperi, D. (1997), "Calibrating volatility surfaces via relative entropy minimization", Applied Mathematical Finance, Vol. 4, pp. 667–686.

Black, F. and Scholes, M. (1973), "The pricing of options and corporate liabilities", Journal of Political Economy, Vol. 81, pp. 637–659.

Bodurtha, Jr., J. and Jermakyan, M. (1999), "Nonparametric estimation of an implied volatility surface", The Journal of Computational Finance, Vol. 2, pp. 29–61.

Bouchouev, I. and Isakov, V. (1997), "The inverse problem of option pricing", Inverse Problems, Vol. 13, pp. L11–L17.

Bouchouev, I. and Isakov, V. (1999), "Uniqueness, stability and numerical methods for the inverse problem that arises in financial markets", Inverse Problems, Vol. 15, pp. R95–R116.

Capriotti, L. and Giles, M. (2010), "Fast correlation Greeks by adjoint algorithmic differentiation", Risk, Vol. 23, pp. 79–83.

Coleman, T., Li, Y. and Verma, A. (1999), "Reconstructing the unknown local volatility function", The Journal of Computational Finance, Vol. 2, pp. 77–100.

Cox, J. and Ross, S. (1976), "The valuation of options for alternative stochastic processes", Journal of Financial Economics, Vol. 3, pp. 145–166.

Crepey, S. (2003a), "Calibration of the local volatility in a generalized Black-Scholes model using Tikhonov regularization", SIAM Journal on Mathematical Analysis, Vol. 34, pp. 1183–1206.

Crepey, S. (2003b), "Calibration of the local volatility in a trinomial tree using Tikhonov regularization", Inverse Problems, Vol. 19, pp. 91–127.

Crepey, S. (2004), "Delta-hedging vega risk", Quantitative Finance, Vol. 4, pp. 559–579.

Derman, E. and Kani, I. (1994), "Riding on a smile", Risk, Vol. 7, pp. 32–39.

Dupire, B. (1994), "Pricing with a smile", Risk, Vol. 7, pp. 18–20.

Egger, H. and Engl, H. (2005), "Tikhonov regularization applied to the inverse problem of option pricing: convergence analysis and rates", Inverse Problems, Vol. 21, pp. 1027–1045.

Gatheral, J. (2006), The Volatility Surface: A Practitioner's Guide, John Wiley & Sons, Inc., New Jersey.

Giering, R. (2000), Tangent linear and adjoint biogeochemical models, in P. Kasibhatla, M. Heimann, P. Rayner, N. Mahowald, R. Prinn and D. Hartley, eds, 'Inverse Methods in Global Biogeochemical Cycles', American Geophysical Union, Washington DC, pp. 33–47.

Giering, R. and Kaminski, T. (1998), "Recipes for adjoint code construction", ACM Transactions on Mathematical Software, Vol. 24, pp. 437–474.

Giles, M. and Glasserman, P. (2005), "Smoking adjoints: fast calculation of Greeks in Monte Carlo calculations", Technical Report NA-05/15.

Griewank, A. (1989), On automatic differentiation, in M. Iri and K. Tanabe, eds, 'Mathematical Programming: Recent Developments and Applications', Kluwer Academic Publishers, Dordrecht, pp. 83–108.

Hansen, P. (1998), Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, chapter 1.

Hein, T. (2005), "Some analysis of Tikhonov regularization for the inverse problem of option pricing in the price dependent case", Journal for Analysis and its Applications, Vol. 24, pp. 593–609.

Heston, S. (1993), "A closed-form solution for options with stochastic volatility with applications to bond and currency options", The Review of Financial Studies, Vol. 6, pp. 327–343.

Hofmann, B. (1986), Regularization for Applied Inverse and Ill-Posed Problems, Teubner, Stuttgart, Germany.

Hull, J. (2009), Options, Futures, and Other Derivatives, Pearson Education, New Jersey.

Jiang, L., Chen, Q., Wang, L. and Zhang, J. (2003), "A new well-posed algorithm to recover implied local volatility", Quantitative Finance, Vol. 3, pp. 451–457.

Jiang, L. and Tao, Y. (2001), "Identifying the volatility of underlying assets from option prices", Inverse Problems, Vol. 17, pp. 137–155.

Lagnado, R. and Osher, S. (1997), "Reconciling differences", Risk, Vol. 10, pp. 79–83.

Merton, R. (1976), "Option pricing when underlying stock returns are discontinuous", Journal of Financial Economics, Vol. 3, pp. 125–144.

Morales, J. and Nocedal, J. (2011), "Remark on Algorithm 778: L-BFGS-B, Fortran subroutines for large-scale bound constrained optimization", ACM Transactions on Mathematical Software.

Navon, I., Zou, X., Derber, J. and Sela, J. (1992), "Variational data assimilation with an adiabatic version of the NMC spectral model", Monthly Weather Review, Vol. 120, pp. 1433–1446.

Rubinstein, M. (1994), "Implied binomial trees", Journal of Finance, Vol. 49, pp. 771–818.

Turinici, G. (2009), "Calibration of local volatility using the local and implied instantaneous variance", The Journal of Computational Finance, Vol. 13, pp. 1–18.

Zhu, C., Byrd, R. and Nocedal, J. (1997), "Algorithm 778: L-BFGS-B, Fortran routines for large scale bound constrained optimization", ACM Transactions on Mathematical Software, Vol. 23, pp. 550–560.

Appendix

This Appendix derives equation (25) from equation (24). Part of the analysis is taken directly from Aster et al. (2005). Equation (24) can be rewritten in the following stacked matrix form, to be solved in the least-squares sense:

\[
\begin{bmatrix} A \\ \lambda L \end{bmatrix} M =
\begin{bmatrix} B \\ 0 \end{bmatrix}
\qquad (28)
\]

where we drop the subscripts of both M_{k+1} and L_m for the sake of simplicity. L is a p by n matrix, A is an m by n matrix, M is a vector of size n, where n is the number of parameters to estimate, and B is a vector of size m, where m is the number of options available. Since L is an operator acting on the two-dimensional volatility surface σ, p ≥ n. Since our inverse problem is under-determined, n ≥ m. Thus p ≥ n ≥ m.
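As a small illustration of these dimensions, the snippet below (sizes and matrices are illustrative assumptions, not the paper's operators) builds a second-order smoothing operator for a hypothetical 8 × 6 volatility grid by stacking second differences taken along each grid direction, pairs it with a random m × n pricing Jacobian, and checks that p ≥ n ≥ m.

```python
import numpy as np

nt, ns = 8, 6                 # hypothetical grid: 8 maturities x 6 price nodes
n = nt * ns                   # number of surface parameters
m = 20                        # number of observed options (m <= n: under-determined)

def second_diff(k):
    return np.diff(np.eye(k), n=2, axis=0)       # (k-2) x k second-difference operator

# Smoothing in both grid directions via Kronecker products, stacked into L.
Lt = np.kron(second_diff(nt), np.eye(ns))        # differences along maturities
Ls = np.kron(np.eye(nt), second_diff(ns))        # differences along price nodes
L = np.vstack([Lt, Ls])                          # p x n, p = (nt-2)*ns + nt*(ns-2)

A = np.random.default_rng(1).normal(size=(m, n)) # toy pricing Jacobian
p = L.shape[0]
print(p, n, m, p >= n >= m)                      # 68 48 20 True
```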

Problem (28) is usually solved by applying the generalized singular value decomposition (GSVD) to [A; L], as in Aster et al. (2005). However, the GSVD cannot be applied directly here, since we are dealing with an under-determined problem. In order to still use the GSVD to analyze our problem, we simply switch the order of A and L; that is, the GSVD is applied to [L; A] instead.

According to the theory of the GSVD,
\[
L = U \begin{bmatrix} \Lambda & 0 \\ 0 & I \end{bmatrix} X^{-1},
\qquad
A = V \begin{bmatrix} \Theta & 0 \end{bmatrix} X^{-1}
\]
where U is p by n with orthonormal columns, V is m by m and orthogonal, and X is n by n and non-singular. Λ = diag(μ_1, μ_2, ..., μ_k) and Θ = diag(θ_1, θ_2, ..., θ_k), where k is the rank of A. Λ and Θ are sorted and normalized such that
\[
0 \le \mu_1 \le \mu_2 \le \dots \le \mu_k \le 1,
\qquad
1 \ge \theta_1 \ge \theta_2 \ge \dots \ge \theta_k \ge 0,
\]
and
\[
\mu_i^2 + \theta_i^2 = 1.
\]

Here I is an identity matrix of size (n − k) by (n − k). We assume that the null space of L intersects the null space of A only at zero.
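This assumption is easy to verify numerically for a given pair (A, L): the two null spaces intersect only at zero exactly when the stacked matrix [A; L] has full column rank n. A minimal check with toy matrices (a random A and a one-dimensional second-difference L, used purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 12, 5
A = rng.normal(size=(m, n))                    # toy pricing Jacobian
L = np.diff(np.eye(n), n=2, axis=0)            # second-difference operator, p = n - 2

# null(A) and null(L) intersect only at {0}  <=>  rank([A; L]) = n
stacked = np.vstack([A, L])
print(np.linalg.matrix_rank(stacked) == n)     # True for a generic A
```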

It follows that
\[
\lambda^2 L^T L + A^T A =
X^{-T}
\begin{bmatrix} \lambda^2 \Lambda^2 + \Theta^2 & 0 \\ 0 & \lambda^2 I \end{bmatrix}
X^{-1}.
\qquad (29)
\]

From (28), we obtain the normal equations
\[
\left(\lambda^2 L^T L + A^T A\right) M = A^T B.
\qquad (30)
\]
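Equation (30) is simply the normal-equations form of the stacked least-squares problem (28). The small check below, with random toy A, L and B (placeholders, not the calibration operators of the paper), confirms that solving (30) directly and solving the stacked system in the least-squares sense give the same M.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, lam = 15, 6, 0.1
A = rng.normal(size=(m, n))
L = np.diff(np.eye(n), n=2, axis=0)
B = rng.normal(size=m)

# Normal equations: (lam^2 L^T L + A^T A) M = A^T B
M_normal = np.linalg.solve(lam**2 * L.T @ L + A.T @ A, A.T @ B)

# Stacked least-squares form of (28): [A; lam*L] M ~ [B; 0]
K = np.vstack([A, lam * L])
rhs = np.concatenate([B, np.zeros(L.shape[0])])
M_stacked = np.linalg.lstsq(K, rhs, rcond=None)[0]

print(np.allclose(M_normal, M_stacked))        # True
```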


Substituting (29) into (30) and solving for M gives
\[
M = X
\begin{bmatrix} \lambda^2 \Lambda^2 + \Theta^2 & 0 \\ 0 & \lambda^2 I \end{bmatrix}^{-1}
X^T A^T B
= \sum_{i=1}^{k} \frac{\theta_i^2}{\theta_i^2 + \lambda^2 \mu_i^2}\,
\frac{V_{\cdot,i}^T B}{\theta_i}\, X_{\cdot,i}.
\]

Let γ_i = θ_i / μ_i denote the generalized singular values. Then
\[
M = \sum_{i=1}^{k} \frac{\gamma_i^2}{\gamma_i^2 + \lambda^2}\,
\frac{V_{\cdot,i}^T B}{\theta_i}\, X_{\cdot,i},
\]
where
\[
f(i) = \frac{\gamma_i^2}{\gamma_i^2 + \lambda^2}
\]
are called the filter factors.
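For intuition about the filter factors it helps to look at the standard-form special case L = I, where the generalized singular values reduce to the ordinary singular values of A and the Tikhonov solution can be written directly from the SVD. The toy example below (an arbitrary ill-conditioned matrix, not the calibration operator) shows f(i) ≈ 1 for singular values well above λ and f(i) ≈ 0 for those well below it, which is what prevents noise from being amplified by the small singular values.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, lam = 8, 8, 1e-2
A = rng.normal(size=(m, n)) @ np.diag(10.0 ** -np.arange(n))  # rapidly decaying spectrum
b = rng.normal(size=m)

U, s, Vt = np.linalg.svd(A)
f = s**2 / (s**2 + lam**2)                  # filter factors for L = I
x_tik = Vt.T @ (f * (U.T @ b) / s)          # filtered (Tikhonov) solution
x_naive = Vt.T @ ((U.T @ b) / s)            # unregularized solution A^{-1} b

print(np.round(f, 3))                       # ~1 for s_i >> lam, ~0 for s_i << lam
print(np.linalg.norm(x_naive), np.linalg.norm(x_tik))
```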
