
A Data-Driven Optimization Heuristic for Downside Risk Minimization

Manfred Gilli a,∗,1, Evis Këllezi b, Hilda Hysi a,2

a Department of Econometrics, University of Geneva

b Mirabaud & Cie, Geneva

Abstract

In practical portfolio choice models risk is often defined as VaR, expected shortfall, maximum loss, Omega function, etc. and is computed from simulated future scenarios of the portfolio value. It is well known that the minimization of these functions cannot, in general, be performed with standard methods. We present a multi-purpose data-driven optimization heuristic capable of dealing efficiently with a variety of risk functions and practical constraints on the positions in the portfolio. The efficiency and robustness of the heuristic are illustrated by solving a collection of real-world portfolio optimization problems using different risk functions such as VaR, expected shortfall, maximum loss and Omega function with the same algorithm.

Key words: Portfolio optimization, Heuristic optimization, Threshold accepting, Downside risk

1 Introduction

Modern portfolio optimization originated with the mean-variance framework introduced by Markowitz (1952). One of the main reasons for its popularity is the fact that it can be performed efficiently using standard quadratic programming techniques. Since then many features of the Markowitz model have been subject to criticism. Alternative approaches attempt to conform the model's assumptions to reality by, for example, dismissing the normality hypothesis in order to account for the fat-tailedness and the asymmetry of asset returns. As a consequence, other measures of risk, such as value-at-risk, expected shortfall, mean semi-absolute deviation, semi-variance and so on are used, leading to problems that cannot always be reduced to standard linear or quadratic programs. The resulting optimization problem often becomes fairly complex as it exhibits multiple local extrema and discontinuities, in particular when we introduce constraints restricting the trading variables to integers, constraints on the holding size of assets, constraints on the maximum number of different assets in the portfolio, etc.

∗ Corresponding author: Department of Econometrics, University of Geneva, Bd du Pont d'Arve 40, 1211 Geneva 4, Switzerland. Tel.: +41 22 379 8222; fax: +41 22 379 8299. Email addresses: [email protected] (Manfred Gilli), [email protected] (Evis Këllezi), [email protected] (Hilda Hysi).

1 We are grateful to Algorithmics, Inc. (www.algorithmics.com) for providing data and we thank Stan Uryasev and Arun Verma for personal communications. We also thank the editor, an earlier referee and Patrick Burns for their comments.

2 Supported by the Swiss National Science Foundation (project 1214–067809).

Article published in Journal of Risk 8(3), 2006, 1–19.

In such situations, classical optimization methods do not work efficiently and heuristic optimization techniques may be the only way out. The use of heuristic optimization techniques for portfolio selection has already been suggested in the literature. Dueck and Winker (1992) were the first to apply a local search technique, called threshold accepting, to portfolio choice problems. Gilli and Këllezi (2002a,b) use threshold accepting for minimizing value-at-risk and expected shortfall, and Maringer (2005) solves a broad class of portfolio management problems using heuristic optimization. Comparisons of the performance of different heuristic techniques applied to portfolio choice problems are given by Chang et al. (2000) and Beasley et al. (2003).

As opposed to a standard optimization algorithm, the use of heuristic techniques necessitates the setting of several problem-specific parameters. Despite the algorithmic simplicity of most heuristics, in particular of local search methods, their efficiency crucially depends on the choice of appropriate values for these parameters. Without some experience, the selection of these parameters can be delicate. This might be one of the reasons for the so far limited use of heuristic optimization techniques. In the following we provide an algorithm that is suitable for solving a broad class of portfolio optimization problems, where the setting of the problem-specific parameters is driven by the data.

In Section 2 we give an overview of the kind of portfolio choice problems that can be solved using the heuristic. Details of the heuristic optimization method are given in Section 3. Applications of the algorithm to a variety of risk minimization problems as well as comparisons of the results with other approaches are presented in Section 4. Section 5 concludes.


2 Models for portfolio choice

In the following we introduce the notation used to formalize different portfolio choice models. At time $t = 0$ we consider an initial wealth $v_0$ to be invested in a universe $A$ of $n_A$ assets with prices $p_{0j}$, $j = 1, \ldots, n_A$. The quantities of assets $x_j$, such that

$$\sum_{j \in J} x_j\, p_{0j} = v_0 ,$$

constitute a portfolio with $J = \{\, j \mid x_j \neq 0 \,\}$ denoting the set of indices of assets in the portfolio. At the planning horizon, chosen to be $t = 1$, the assets generate returns $r_j$, still unknown at time $t = 0$.

Portfolio choice then consists in finding an optimal allocation $x_j \ge 0$, or equivalently expressed in terms of weights $w_j = (x_j\, p_{0j})/v_0$, that minimizes a particular risk function for a given return target under some additional constraints on the holding size of the assets. For a discussion of the proper use of risk measures in portfolio optimization see Ortobelli et al. (2005).

In the mean-variance framework, for example, the returns $r_j$ are treated as normal random variables with mean $E(r) = \mu$ and variance-covariance matrix $\Sigma$. Risk is measured as the variance of the portfolio return and, for a given return target $r_d$, the mean-variance portfolio is obtained by solving the following quadratic program

$$\min_{w}\; w' \Sigma\, w \tag{1}$$

$$\sum_{j \in A} w_j\, \mu_j \ge r_d \tag{1′}$$

$$\sum_{j \in A} w_j = 1 \tag{1′′}$$

$$w_j^{\inf} \le w_j \le w_j^{\sup}, \qquad j \in A . \tag{1′′′}$$

The vectors $w_j^{\inf}$, $w_j^{\sup}$, $j \in A$, represent additional constraints on the minimum and maximum holding size of the individual assets in the portfolio.

Generation of price scenarios

In the following models the normality assumption of the returns is relaxed and the uncertainty about future returns, i.e. about the future portfolio value $v$, is modelled through a set of possible realizations, called scenarios. These scenarios can be generated relying on a statistical model, past returns or experts' opinions.

For the purpose of our analysis, the $n_S$ scenarios for the returns $r_{sj}$, $s = 1, \ldots, n_S$, $j = 1, \ldots, n_A$, are bootstrapped from the observed historical returns $R = [\, r_{tj} \,]$. The corresponding price scenarios for the planning period $t = 1$ are computed as $p_{1sj} = p_{0j}\,(1 + r_{sj})$, $s = 1, \ldots, n_S$, $j = 1, \ldots, n_A$, giving rise to a different portfolio value $v_s$ for each scenario:

$$v_s = \sum_{j \in J} x_j\, p_{0j}\,(1 + r_{sj}), \qquad s = 1, \ldots, n_S .$$
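As an illustration of the scenario portfolio values, the following Python sketch evaluates $v_s$ for every scenario. The function name `scenario_values`, the list-of-rows layout of the return scenarios and the numbers are assumptions made for this example; they are not taken from the authors' implementation. Assets with $x_j = 0$ contribute nothing, so summing over all assets is the same as summing over $J$.

```python
def scenario_values(x, p0, r):
    """Return [v_1, ..., v_nS] with v_s = sum_j x_j * p0_j * (1 + r_sj).

    x  : quantities held, one per asset
    p0 : prices at t = 0, one per asset
    r  : return scenarios, one row of per-asset returns per scenario
    """
    return [
        sum(xj * pj * (1.0 + rsj) for xj, pj, rsj in zip(x, p0, row))
        for row in r
    ]


# Two assets and two scenarios (made-up numbers, for illustration only).
x = [10, 5]
p0 = [100.0, 200.0]
r = [[0.10, 0.00], [-0.10, 0.05]]
values = scenario_values(x, p0, r)
```

Each entry of `values` is the portfolio value under one scenario, which is the input needed by the risk functions introduced next.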

Downside risk framework

This framework takes into consideration the fact that investors are often more concerned about losses, or the risk that their portfolio value falls below a certain target. We therefore define the losses $\ell$ as

$$\ell = v_0 - v$$

and write the portfolio choice problem as

$$\min_{x \in \mathbb{R}^{n_A}} \Phi(\ell) \tag{2}$$

$$E(\ell) \le -v_0\, r_d \tag{2′}$$

$$\sum_{j \in J} x_j\, p_{0j} = v_0 \tag{2′′}$$

$$x_j^{\inf} \le x_j \le x_j^{\sup}, \qquad j \in J \tag{2′′′}$$

$$\# J \le K \tag{2′′′′}$$

where $\Phi(\ell)$ is a function defining the risk, (2′) is the constraint on the portfolio return with $r_d$ the desired return target, (2′′) is the budget constraint, (2′′′) restricts the holding size of the assets in the portfolio and (2′′′′) is a cardinality constraint limiting the number of assets in the portfolio to a maximum of $K$. In the following we enumerate a number of possible definitions of the risk function $\Phi$.

Value-at-risk (VaR) is defined as the $(1-\beta)$-quantile of the distribution function $F$ of losses of the portfolio, i.e.

$$\mathrm{VaR}_{(1-\beta)} = F^{-1}(1 - \beta)$$

where

$$\beta = P(\ell > \mathrm{VaR})$$

is called the shortfall probability. In other words, losses exceed $\mathrm{VaR}_{(1-\beta)}$ only with probability $\beta$.

The conditional mean value of the losses given that the losses have exceeded VaR is called expected shortfall (ES) and is defined as

$$\mathrm{ES} = E(\ell \mid \ell > \mathrm{VaR}) .$$


Another characterization of risk is given by the ratio of the weighted conditional expectation of losses over the weighted conditional expectation of gains

$$\Omega = \frac{P(\ell > 0)\, E(\ell \mid \ell > 0)}{-P(\ell < 0)\, E(\ell \mid \ell < 0)} .$$

This measure is called Omega and was introduced by Keating and Shadwick (2002) as a performance measure.

Figure 1 illustrates these different risk measures for a continuous distribution $f$ of losses and the corresponding cumulative distribution function $F$. The expected shortfall is given by

$$\mathrm{ES} = \mathrm{VaR} + \frac{1}{\beta}\, I_3 \tag{3}$$

where $I_3 = \int_{\mathrm{VaR}}^{\infty} (1 - F(z))\, dz$. The integral $I_1$ represents $-P(\ell < 0)\, E(\ell \mid \ell < 0)$, the weighted conditional expectation of gains (negative losses), and the integral $I_2$ the weighted conditional expectation of losses $P(\ell > 0)\, E(\ell \mid \ell > 0)$. These expectations can be computed from the cumulative distribution function as

$$I_1 = \int_{-\infty}^{0} F(z)\, dz \qquad \text{and} \qquad I_2 = \int_{0}^{\infty} (1 - F(z))\, dz .$$

The Omega is then computed as

$$\Omega = I_2 / I_1 \tag{4}$$

and from $I_1$ and $I_2$ we also recover the expected loss as (see footnote 3)

$$E(\ell) = I_2 - I_1 .$$
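The identity for the expected loss follows from splitting the expectation at zero and exchanging the order of integration. The short derivation below is a sketch under the added assumptions that $\ell$ has a density $f$ and a finite mean; these assumptions are introduced here for illustration only.

$$
\begin{aligned}
\int_{0}^{\infty} z f(z)\, dz &= \int_{0}^{\infty} \!\! \int_{0}^{z} du\, f(z)\, dz = \int_{0}^{\infty} \!\! \int_{u}^{\infty} f(z)\, dz\, du = \int_{0}^{\infty} (1 - F(u))\, du = I_2 , \\
\int_{-\infty}^{0} z f(z)\, dz &= -\int_{-\infty}^{0} \!\! \int_{z}^{0} du\, f(z)\, dz = -\int_{-\infty}^{0} \!\! \int_{-\infty}^{u} f(z)\, dz\, du = -\int_{-\infty}^{0} F(u)\, du = -I_1 .
\end{aligned}
$$

Adding the two lines gives $E(\ell) = I_2 - I_1$. The first line also equals $P(\ell > 0)\, E(\ell \mid \ell > 0)$ and the second equals $P(\ell < 0)\, E(\ell \mid \ell < 0)$, which is how $I_2$ and $I_1$ enter the definition of $\Omega$.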

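[Figure 1 not reproduced: density $f$ and cumulative distribution function $F$ of losses, marking VaR, the level $1-\beta$, the areas $I_1$, $I_2$ and $I_3 = \int_{\mathrm{VaR}}^{\infty} (1 - F(z))\, dz$, and $\beta = \int_{\mathrm{VaR}}^{\infty} f(z)\, dz$.]

Fig. 1. Example of a distribution $f$ of losses and the corresponding cumulative distribution function $F$.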

3 See Parzen (1960, p. 211) for details.

As mentioned before, in order to avoid any parametric assumptions about the distribution of losses, we can use scenarios based on observed data. We consider the $n_S$ simulated losses ordered such that $\ell_{(1)} \le \ell_{(2)} \le \cdots \le \ell_{(n_S)}$. The $\mathrm{VaR}_{1-\beta}$ can then be defined as the order statistic

$$\mathrm{VaR} = \ell_{(\lceil (1-\beta)\, n_S \rceil)} . \tag{5}$$

An alternative measure of risk can be the maximum loss over all scenarios defined as

$$\mathrm{Max} = \ell_{(n_S)} \tag{6}$$

or $\mathrm{Max} = \max(\ell)$ for the unordered loss vector $\ell$.
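The order-statistic definitions (5) and (6) translate directly into a few lines of Python. The sketch below is illustrative only: the function names, the small floating-point tolerance and the sample losses are assumptions of this example rather than part of the paper.

```python
import math


def empirical_var(losses, beta):
    """VaR as the order statistic l_(ceil((1 - beta) * nS)) of equation (5)."""
    ordered = sorted(losses)
    n = len(ordered)
    # The small tolerance guards against floating-point noise in (1 - beta) * n;
    # it is an implementation detail assumed here, not part of equation (5).
    rank = math.ceil((1.0 - beta) * n - 1e-9)   # 1-based rank
    rank = min(max(rank, 1), n)
    return ordered[rank - 1]


def max_loss(losses):
    """Maximum loss over all scenarios, equation (6)."""
    return max(losses)


losses = [5, -3, 8, 1, -2, 7, 0, 4, -6, 10]   # made-up scenario losses
var_75 = empirical_var(losses, beta=0.25)      # rank ceil(7.5) = 8
worst = max_loss(losses)
```

With ten scenarios and $\beta = 0.25$ the rank is $\lceil 7.5 \rceil = 8$, so the VaR is the eighth smallest loss.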

The expressions (3) and (4) can be evaluated in two different ways, either by computing $I_1$, $I_2$ and $I_3$ by numerical integration of the empirical cumulative distribution of losses or by estimating $I_1$, $I_2$ and ES as an arithmetic mean over discrete scenario losses.

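The second approach consists in estimating $I_1$ as

$$I_1 = -P(\ell < 0)\, E(\ell \mid \ell < 0) = -\frac{\sum_s 1_{\ell_s < 0}}{n_S}\; \frac{1}{\sum_s 1_{\ell_s < 0}} \sum_s \ell_s\, 1_{\ell_s < 0} = -\frac{1}{n_S} \sum_s \ell_s\, 1_{\ell_s < 0} .$$

Correspondingly, the expression for $I_2$ is

$$I_2 = \frac{1}{n_S} \sum_s \ell_s\, 1_{\ell_s > 0} .$$

Using expression (4), the Omega value can also be estimated as

$$\Omega = \frac{\sum_s \ell_s\, 1_{\ell_s > 0}}{-\sum_s \ell_s\, 1_{\ell_s < 0}} . \tag{7}$$

The expected shortfall can be computed as

$$\mathrm{ES} = \frac{1}{\sum_s 1_{\ell_s > \mathrm{VaR}}} \sum_s \ell_s\, 1_{\ell_s > \mathrm{VaR}} \tag{8}$$

and the expected loss as

$$E(\ell) = \frac{1}{n_S} \sum_{s=1}^{n_S} \ell_s .$$

The functions $\Phi(\ell)$ defined in (3–8) quantify the risk for a given portfolio $x$ in terms of $n_S$ scenario losses $\ell_s$ and are all highly non-convex.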
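The arithmetic-mean estimators can be sketched in Python as follows. The function names and the handling of the degenerate cases (no gains, empty tail) are assumptions made for this illustration; the paper does not specify them.

```python
def expected_loss(losses):
    """Arithmetic mean of the scenario losses."""
    return sum(losses) / len(losses)


def omega(losses):
    """Omega as in (7): sum of positive losses over minus the sum of negative ones."""
    up = sum(l for l in losses if l > 0)
    down = -sum(l for l in losses if l < 0)
    if down == 0:
        return float("inf")   # assumption: no gains at all
    return up / down


def expected_shortfall(losses, var):
    """ES as in (8): mean of the losses strictly above VaR."""
    tail = [l for l in losses if l > var]
    if not tail:
        return var            # assumption: empty tail falls back to VaR
    return sum(tail) / len(tail)


losses = [5, -3, 8, 1, -2, 7, 0, 4, -6, 10]   # made-up scenario losses
n = len(losses)
i1 = -sum(l for l in losses if l < 0) / n     # estimate of I1
i2 = sum(l for l in losses if l > 0) / n      # estimate of I2
om = omega(losses)
es = expected_shortfall(losses, var=7)
el = expected_loss(losses)
```

On this sample the estimate `i2 - i1` agrees with the mean loss, which mirrors the relation between $I_1$, $I_2$ and the expected loss.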

In order to illustrate the non-convexity of the problem, we plot in Figure 2 the objective function for the VaR and Omega minimization for a portfolio of three equities quoted on the Swiss market (see footnote 4). The objective functions are evaluated based upon 800 price scenarios on a grid of 50 × 50 points for quantities $x_1$ and $x_2$. The quantity $x_3$ of the third asset is determined by the budget constraint, allowing short selling. The presence of multiple local minima and the non-smoothness of the surfaces indicate that classical gradient-based methods cannot be used to solve problem (2).

[Figure 2 not reproduced: two surface plots of the objective function over the quantities $x_1$ and $x_2$.]

Fig. 2. Objective functions for VaR (left panel) and Omega (right panel) minimization for a portfolio of three assets.

3 The local search algorithm

The non-convexities of the functions for the examples displayed in Figure 2 show the necessity of a global optimization approach. Strategies for global optimization generally resort to local search methods, e.g. simulated annealing (Kirkpatrick et al., 1983), multiple starting points or sequential local minimization. The latter, also called the smoothing/continuation method, has been used by Gaivoronski and Pflug (2005) and Verma (2005) for VaR minimization. We use a modified local search method called threshold accepting (TA).

The classical local search for minimization of a function $f(x)$ is formalized in Algorithm 1. Given $f(x)$, $x \in \Omega$, with $\Omega \subset \mathbb{R}^n$ the search space (possibly discrete), and a current solution $x^c$, a new solution $x^n$ is computed in the neighborhood $N(x^c)$ and is accepted if $f(x^n) < f(x^c)$.

The stopping criterion is generally defined as a given number of iterations. Different rules for the choice and the acceptance of the neighbor $x^n$ (statements 3–4) define a particular heuristic.

4 The equities are Crealogix, Swiss Steel and Swissquote.

Algorithm 1 Local search for minimization.
1: Generate current solution $x^c$
2: while stopping criteria not met do
3:   Select $x^n \in N(x^c)$ (neighbor to current solution)
4:   if $f(x^n) < f(x^c)$ then $x^c = x^n$
5: end while

If we want to escape local minima the algorithm must accept uphill moves, which can be achieved by modifying statement 4 as

$$\text{if } \big( f(x^n) - f(x^c) \big) < \tau \text{ then } x^c = x^n$$

where $\tau$ is a threshold that is gradually reduced to zero. Modifying the local search in this way leads to a heuristic called threshold accepting, which was introduced by Dueck and Scheuer (1990). It can be considered a deterministic analog of simulated annealing, in which uphill moves are accepted according to a probabilistic criterion.

The implementation of the TA algorithm involves the definition of the objective function $f$, the neighborhood $N(x)$ and the threshold sequence $\tau$, which decreases toward zero in a given number of rounds $n_{\mathrm{Rounds}}$.

The objective function corresponds to the risk functions $\Phi(\ell)$ defined in (3–8). In each iteration we generate a new element $x^n$ in the neighborhood of the current solution $x^c$. In the case of portfolio selection the solutions are vectors representing the positions in each asset and thus the search space $\Omega$ can be considered as a subset of a real-valued vector space $\mathbb{R}^{n_A}$ and the concept of neighborhood can be defined using $\varepsilon$-spheres

$$N(x^c) = \{\, x^n \mid x^n \in \Omega,\ \| x^n - x^c \| < \varepsilon \,\}$$

where $\| \cdot \|$ is a distance measure. The neighbor solutions are generated with Algorithm 2.

Algorithm 2 Generation of a neighbor solution.
1: Randomly select asset $i \in J$
2: Sell quantity $q_i$ of asset $i$
3: Randomly select asset $j \in A$, $j \neq i$
4: Buy quantity $q_j$ of asset $j$

The quantities $q_i$ and $q_j$ sold and bought in statements 2 and 4 of Algorithm 2 are integers and have to be adjusted in order to satisfy the constraints on the holding size (2′′′). If the neighbor solution computed with Algorithm 2 violates constraint (2′), we add the penalty term $c\, \max(E(\ell) + v_0\, r_d,\ 0)$ to the objective function, where $c$ is a positive scaling parameter. The argument of the max is the amount by which the expected loss exceeds the bound $-v_0\, r_d$ in (2′).
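One possible way to generate a neighbor in the spirit of Algorithm 2 is sketched below in Python. Several details are assumptions of this sketch and are not specified in the text: the traded amount `trade_value`, the rule that the sale proceeds buy whole units of the second asset with leftover cash ignored, and the choice to reject a move that breaks the holding-size bounds. The cardinality constraint is not handled here.

```python
import random


def neighbor(x, p0, trade_value, x_inf, x_sup, rng):
    """Return a neighbor of the integer position vector x (sketch of Algorithm 2).

    Assumptions of this sketch: about `trade_value` worth of asset i is sold,
    the proceeds buy whole units of asset j, leftover cash is ignored, and a
    move that breaks the holding-size bounds is rejected (a copy of x is returned).
    """
    held = [k for k, q in enumerate(x) if q > 0]            # index set J
    i = rng.choice(held)                                     # asset to sell
    j = rng.choice([k for k in range(len(x)) if k != i])     # asset to buy
    q_i = min(x[i], max(1, int(trade_value // p0[i])))       # integer units sold
    q_j = int((q_i * p0[i]) // p0[j])                        # integer units bought
    if q_j == 0:
        return list(x)
    new = list(x)
    new[i] -= q_i
    new[j] += q_j
    for k in (i, j):
        # Holding-size bounds (2''') apply only to assets that are held.
        if new[k] != 0 and not (x_inf[k] <= new[k] <= x_sup[k]):
            return list(x)
    return new


rng = random.Random(0)
x = [10, 0, 5]                 # made-up integer positions
p0 = [100.0, 50.0, 20.0]       # made-up prices
x_new = neighbor(x, p0, 200.0, [1, 1, 1], [20, 20, 20], rng)
```

Because only whole units are bought, the value of the neighbor never exceeds that of the current portfolio; an implementation that must meet the budget constraint exactly would need an additional rule for the residual cash.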

Althöfer and Koschnick (1991) proved the convergence of the TA algorithm given an “appropriate threshold sequence”. Their proof, however, does not allow the construction of the appropriate sequence. In practice the threshold sequence is retrieved from the empirical distribution of $n_{\mathrm{Steps}}$ distances $\Delta$ between objective function values for successive neighbors. This procedure is detailed in Algorithm 3.

Algorithm 3 Generation of threshold sequence.
1: Randomly choose $x^c \in \Omega$
2: for $i = 1$ to $n_{\mathrm{Steps}}$ do
3:   Compute $x^n \in N(x^c)$ and $\Delta_i = | f(x^c) - f(x^n) |$
4: end for
5: Compute empirical distribution $F$ of $\Delta_i$, $i = 1, \ldots, n_{\mathrm{Steps}}$
6: Compute threshold sequence $\tau_r = F^{-1}\!\left( 0.8\, \dfrac{n_{\mathrm{Rounds}} - r}{n_{\mathrm{Rounds}}} \right)$, $r = 1, \ldots, n_{\mathrm{Rounds}}$
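The construction of the threshold sequence can be sketched in Python as follows. The function name, the callable signatures `f(x)` and `neighbor(x, rng)`, the order-statistic form of the empirical quantile and the convention that the quantile at probability zero equals zero are assumptions of this example. The sketch follows the listing literally and keeps the starting point fixed; a variant that walks from neighbor to neighbor would update the current solution inside the loop.

```python
import math
import random


def threshold_sequence(f, neighbor, x_c, n_steps, n_rounds, rng):
    """Data-driven thresholds tau_1 >= ... >= tau_nRounds (sketch of Algorithm 3)."""
    f_c = f(x_c)
    # Sorted distances; the empirical quantile F^{-1}(p) is read off this list.
    deltas = sorted(abs(f_c - f(neighbor(x_c, rng))) for _ in range(n_steps))
    taus = []
    for r in range(1, n_rounds + 1):
        p = 0.8 * (n_rounds - r) / n_rounds
        if p <= 0.0:
            taus.append(0.0)          # assumption: F^{-1}(0) is taken as zero
        else:
            rank = min(max(math.ceil(p * n_steps), 1), n_steps)
            taus.append(deltas[rank - 1])
    return taus


# Toy one-dimensional problem (made up for illustration).
taus = threshold_sequence(
    f=lambda x: x * x,
    neighbor=lambda x, rng: x + rng.uniform(-1.0, 1.0),
    x_c=3.0,
    n_steps=200,
    n_rounds=10,
    rng=random.Random(1),
)
```

The resulting sequence is non-increasing and ends at zero, so the last round accepts only strict improvements.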

The TA algorithm is a so-called trajectory method, as the current solution is slightly modified by searching within its neighborhood. In order to explore the solution space more efficiently one may restart the algorithm $n_{\mathrm{Restarts}}$ times from randomly chosen points in the search space and then take the best solution out of all restarts. Algorithm 4 summarizes the complete optimization procedure.

Algorithm 4 TA algorithm with restarts.
 1: Initialize $n_{\mathrm{Restarts}}$, $n_{\mathrm{Rounds}}$ and $n_{\mathrm{Steps}}$
 2: Compute threshold sequence $\tau_r$ (Algorithm 3)
 3: for $k = 1$ to $n_{\mathrm{Restarts}}$ do
 4:   Randomly generate current solution $x^c \in X$
 5:   for $r = 1$ to $n_{\mathrm{Rounds}}$ do
 6:     for $i = 1$ to $n_{\mathrm{Steps}}$ do
 7:       Generate $x^n \in N(x^c)$ and compute $\Delta = f(x^n) - f(x^c)$
 8:       if $\Delta < \tau_r$ then $x^c = x^n$
 9:     end for
10:   end for
11:   $\xi_k = f(x^c)$, $x^{\mathrm{sol}}_{(k)} = x^c$
12: end for
13: $x^{\mathrm{sol}} = x^{\mathrm{sol}}_{(k)}$, $k \mid \xi_k = \min\{ \xi_1, \ldots, \xi_{n_{\mathrm{Restarts}}} \}$
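The complete procedure with restarts can be written compactly in Python. The sketch below is a generic illustration under stated assumptions: the function and parameter names, the fixed threshold list passed in by the caller, and the one-dimensional test function are all hypothetical and do not reproduce the authors' Matlab code.

```python
import math
import random


def threshold_accepting(f, neighbor, random_solution, taus, n_steps, n_restarts, rng):
    """Threshold accepting with restarts; returns the best (solution, value)."""
    best_x, best_f = None, float("inf")
    for _ in range(n_restarts):
        x_c = random_solution(rng)
        f_c = f(x_c)
        for tau in taus:                      # one round per threshold
            for _ in range(n_steps):
                x_n = neighbor(x_c, rng)
                f_n = f(x_n)
                if f_n - f_c < tau:           # accept small uphill moves
                    x_c, f_c = x_n, f_n
        if f_c < best_f:                      # keep the best restart
            best_x, best_f = x_c, f_c
    return best_x, best_f


def wiggly(x):
    """Made-up objective with many local minima near x = 2."""
    return (x - 2.0) ** 2 + math.sin(8.0 * x) ** 2


best_x, best_f = threshold_accepting(
    wiggly,
    neighbor=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    random_solution=lambda rng: rng.uniform(-10.0, 10.0),
    taus=[2.0, 1.0, 0.5, 0.0],
    n_steps=500,
    n_restarts=5,
    rng=random.Random(42),
)
```

The restarts are independent of one another, which is why they lend themselves to distributed execution.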

4 Applications

The TA algorithm has been implemented in Matlab 7.xx (see footnote 5) and the restarts are executed in a distributed computing environment with 32 PCs. To illustrate its performance we present two applications of portfolio choice minimizing the risk functions discussed in Section 2. The first application uses daily observations from 30 June 2002 to 30 June 2005 of the 213 stock prices forming the Swiss Performance Index (SPI). The second application considers a large problem of bond portfolio optimization where the future portfolio values, due to credit migration, are given by a set of 20 000 scenarios.

5 The code can be obtained upon request from the corresponding author.

4.1 Equity portfolio optimization

We use daily observations from 30 June 2002 to 30 June 2005 of the 213 stock prices forming the Swiss Performance Index (SPI) and consider a planning period of one month. We construct a set of return scenarios by bootstrapping $n_S = 800$ overlapping blocks of length 20 from the set of 736 daily returns. The sum of the log-returns of each block defines a monthly log-return scenario. The same set of return scenarios is used to compute mean-variance and mean-downside risk optimal portfolios.
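The scenario construction can be illustrated with a short Python sketch of an overlapping block bootstrap. The function name and data layout are assumptions of this example, as is the choice to draw the same block of dates for all assets so that cross-sectional dependence is preserved; the synthetic input below merely stands in for the 736 observed daily log-returns.

```python
import random


def block_bootstrap(log_returns, n_scenarios, block_len, rng):
    """Draw overlapping blocks of daily log-returns and sum each block.

    log_returns : list of T rows, each holding one daily log-return per asset
    returns     : n_scenarios rows of per-asset block sums (period log-returns)
    """
    n_days = len(log_returns)
    n_assets = len(log_returns[0])
    scenarios = []
    for _ in range(n_scenarios):
        start = rng.randrange(n_days - block_len + 1)   # blocks may overlap
        block = log_returns[start:start + block_len]
        scenarios.append([sum(row[j] for row in block) for j in range(n_assets)])
    return scenarios


# Synthetic daily log-returns for two assets (made-up numbers).
daily = [[0.001 * t, -0.002] for t in range(736)]
scenarios = block_bootstrap(daily, n_scenarios=800, block_len=20, rng=random.Random(0))
```

A log-return scenario $\rho$ can be turned into a simple return with $e^{\rho} - 1$ before it is used in the portfolio value formula.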

Benchmarking the TA in the mean-variance framework

In order to provide first evidence about the reliability of TA solutions we compute the mean-variance efficient frontier by solving the quadratic program (1) and compare it with the solutions obtained with the TA algorithm. Constraint (1′′′) has been set to $0 \le w \le 1$, in order to make the problem tractable by the QP algorithm. The covariance matrix $Q$ and the mean return vector $r$ are estimated from the 800 return scenarios.

In Figure 3 we reproduce the efficient frontier for mean-variance portfolios computed with QP and TA for monthly return targets ranging from 0.77% to 3.86%. The TA solution has been computed with 60 restarts of 10 rounds with 5 000 steps each. Figure 4 compares the portfolio weights for a particular portfolio on the frontier (see footnote 6). The similarity of the two frontiers and of the portfolio compositions confirms the quality of the TA solutions.

6 For better readability, positions below 0.5% have been removed.


[Figure 3 not reproduced: monthly return in % (vertical axis) against monthly volatility in % (horizontal axis); series QP and TA.]

Fig. 3. Mean-variance efficient frontier computed with QP and TA.

[Figure 4 not reproduced: bar chart of portfolio weights by stock index; series QP and TA.]

Fig. 4. Mean-variance portfolio weights computed with QP and TA for a target expected return of 0.77% per month. The numbers refer to individual stocks in the SPI index.

Downside risk minimization

For the same set of scenarios, we compute the efficient frontiers of portfolios minimizing the VaR, the ES and the Ω for a given return target. We also give the ES corresponding to the VaR-optimized portfolios and the VaR corresponding to the portfolios optimized with respect to ES. We specify an initial wealth of $v_0 = 10^7$ and the holding sizes as $0.01\, v_0 \le x_j \le 0.30\, v_0$, $j \in J$. This constraint implies that the portfolio weight should be at least 1% and at most 30% for all the assets that are held. Moreover we constrain the cardinality of the set of assets in the portfolio to be at most 30. The TA algorithm is executed with 60 restarts of 10 rounds and 5 000 steps.

Figure 5 reproduces the efficient frontiers obtained for VaR, ES and Ω minimization. The frontier obtained for portfolios minimizing VaR and their corresponding ES are marked with a circle, the portfolios minimizing ES and their corresponding VaR with a star, and the triangles mark the VaR and ES corresponding to the portfolios optimizing Ω. The cardinality constraint is active only for the first portfolio in the mean-VaR frontier. The number of assets varies from 16 to 30 for the mean-VaR frontier, from 9 to 22 for the mean-ES frontier and from 10 to 27 for the mean-Omega frontier. In general, we remark that mean-VaR portfolios are more diversified than those satisfying the mean-ES criterion.

These results show that minimizing VaR or ES is not equivalent in terms of tail risk efficiency. For example, for a return target of 2.26% per month, the minimum ES that can be achieved is 1.79%. Minimizing VaR instead of ES would yield a portfolio with an expected shortfall of 4.68%, more than twice as large as the minimum achievable.

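[Figure 5 not reproduced: monthly expected return in % (vertical axis) against monthly loss risk in % (horizontal axis); series: VaR of VaR-optimal portfolios, ES of VaR-optimal portfolios, ES of ES-optimal portfolios, VaR of ES-optimal portfolios, VaR and ES of Ω-optimal portfolios.]

Fig. 5. Efficient frontiers for mean-$\mathrm{VaR}_{.95}$, mean-$\mathrm{ES}_{.95}$ and mean-Ω portfolios.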

In Figure 6 we reproduce the empirical distribution for the different efficient portfolios with a monthly return target of 2.26%. The VaR-optimal portfolio is composed of 29 assets, the ES-optimal of 15 and the Ω-optimal of 17. Except for one asset in the ES-optimal portfolio, all asset weights are less than 15%. The right panel shows the distribution for losses and we observe that, for this data set, the ES minimization strategy dominates the other strategies with respect to extreme losses. This comes at the cost of higher probabilities for smaller losses.


[Figure 6 not reproduced: two panels showing the empirical cumulative distribution against monthly losses, gains in the left panel and losses in the right panel; series VaR, ES and Ω.]

Fig. 6. Empirical cumulative distribution of monthly losses for mean-$\mathrm{VaR}_{.95}$, mean-$\mathrm{ES}_{.95}$ and mean-Ω portfolios with 2.26% monthly return target.

It is interesting to observe that the portfolio minimizing Ω dominates the VaR strategy for almost the entire distribution of losses. The left panel shows the distribution of gains for the different portfolios.

4.2 Bond portfolio optimization

In order to further assess the efficiency of the TA algorithm we apply it to a bond portfolio optimization problem introduced by Bucay and Rosen (1999) and Mausser and Rosen (1999).

The portfolio is constructed from a set of bonds issued by $n_A = 80$ obligors with a mark-to-market value of 8.8 billion USD. We denote by $x = [x_1, \ldots, x_{n_A}]$ the vector of positions of the obligors in the portfolio expressed as multiples of the initial holdings, by $b = [b_1, \ldots, b_{n_A}]$ the mark-to-future value of the instruments if no credit migration occurs and by $y_s = [y_{s1}, \ldots, y_{s,n_A}]$, $s = 1, \ldots, n_S$, the scenario values of joint credit states due to events such as default and credit migration. The simulated losses are then

$$\ell_s = \sum_{i=1}^{n_A} x_i\, (b_i - y_{si}), \qquad s = 1, \ldots, n_S . \tag{9}$$

Figure 7 shows the right tail of the empirical distribution of $n_S = 20\,000$ scenarios of the one-year credit losses (9) corresponding to a portfolio with initial holdings $x = [1, \ldots, 1]$. In million USD, the VaR and expected shortfall (ES) at the 95% level are $\mathrm{VaR}_{.95} = 518$ and $\mathrm{ES}_{.95} = 824$, and at the 99% level $\mathrm{VaR}_{.99} = 1\,026$ and $\mathrm{ES}_{.99} = 1\,320$. The maximum loss is 2 585.

[Figure 7 not reproduced: right tail of the empirical cumulative distribution, with losses 518, 824, 1026, 1320 and 2585 and probability levels 0.95, 0.99 and 1 marked.]

Fig. 7. Empirical distribution of one-year credit losses (million USD) for the initial positions $x = [1, \ldots, 1]$.

The credit risk optimization problem for this portfolio has been approached in different ways in the literature. Bucay and Rosen (1999) applied the CreditMetrics methodology (see footnote 7), Mausser and Rosen (1999) the regret optimization framework, Andersson et al. (2001) minimize the expected shortfall (also called CVaR) and provide the corresponding VaR, and Larsen et al. (2002) suggest a heuristic approach where VaR is minimized by constraining a sequence of CVaR solutions (see footnote 8). We apply the optimization model (10) for different specifications of the risk function $\Phi(\ell)$ and solve it with the threshold accepting heuristic.

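$$\min_{x \in \mathbb{R}^{n_A}} \Phi(\ell) \tag{10}$$

$$\ell_s = \sum_{i=1}^{n_A} x_i\, (b_i - y_{si}), \qquad s = 1, \ldots, n_S \tag{10′}$$

$$\sum_{j \in J} x_j\, b_j \ge \sum_{i=1}^{n_A} b_i \tag{10′′}$$

$$0 \le x_j \le 2, \qquad j \in J \tag{10′′′}$$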

Constraint (10′′) maintains the future portfolio value and constraint (10′′′) avoids unrealistic positions in any obligor. Verma (2005) considers the same data set and computes the portfolio minimizing the VaR (without the upper bound in constraint (10′′′)) using smoothing methods.
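The loss definition and the two constraints of the credit model can be checked with a few lines of Python. The function names and the tiny two-obligor data set are assumptions made for this illustration only.

```python
def credit_losses(x, b, y):
    """Scenario losses l_s = sum_i x_i * (b_i - y_si), as in (10')."""
    return [sum(xi * (bi - ysi) for xi, bi, ysi in zip(x, b, row)) for row in y]


def is_feasible(x, b, upper=2.0):
    """Check (10'') and (10'''); pass upper=None to drop the upper bound."""
    keeps_value = sum(xi * bi for xi, bi in zip(x, b)) >= sum(b)
    in_bounds = all(xi >= 0 and (upper is None or xi <= upper) for xi in x)
    return keeps_value and in_bounds


# Two obligors and three credit scenarios (made-up values).
b = [100.0, 50.0]
y = [[100.0, 50.0], [80.0, 50.0], [100.0, 0.0]]
initial = credit_losses([1.0, 1.0], b, y)   # initial holdings
tilted = credit_losses([0.5, 2.0], b, y)    # reweighted positions
```

Setting `upper=None` corresponds to the variant of the problem in which only non-negativity of the positions is imposed.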

The settings for the TA algorithm are 64 restarts, 10 rounds and 2 000 steps. The computing times, for the distributed execution on 32 Pentium 4 PCs, range from 3 to 7 minutes. For the particular portfolio minimizing $\mathrm{VaR}_{.99}$, Figure 8 shows the distribution of the 64 TA solutions and the obligor weights in the best solution. Constraint (10′′′) is binding for 24 out of the 73 positions in the portfolio. We observe that the 64 solutions lie in a relatively narrow range, confirming again that TA works well.

Table 1 and Figures 9 and 10 summarize the results obtained by solving (10)–(10′′′) for different functions Φ.

7 Cf. RiskMetrics Group (1996).

8 Without the cardinality constraint, the CVaR minimization can be formulated as a linear program, see Rockafellar and Uryasev (2000).


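[Figure 8 not reproduced: upper panel, empirical distribution of the 64 solution values (horizontal axis ticks at 401, 404 and 415); lower panel, ordered obligor weights for obligors 1 to 80.]

Fig. 8. $\mathrm{VaR}_{.99}$ minimization with upper size constraints $0 \le x_j \le 2$. Upper panel: Empirical distribution of the 64 TA solutions. Lower panel: Ordered obligor weights in the best solution portfolio.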

Table 1
Optimization results for different risk functions Φ and constraint 0 ≤ xj ≤ 2.

β       Φ      VaR     ES      Max     Ω       #J

VaR minimization
0.05    (5)    230     421     1795    5.50    67
0.01    (5)    398     604     1480    5.06    66

ES minimization
0.05    (8)    240     360     1448    4.98    69
0.01    (8)    454     559     1072    5.23    72
0.01    (3)    455     558     1074    5.21    72

Max minimization
0.01    (6)    551     645     834     5.08    60

Omega minimization
0.01    (4)    1034    1425    2703    2.22    31

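[Figure 9 not reproduced: right tails of the empirical cumulative loss distributions between probability levels 0.95 and 1; series VaR, ES, Max and Omega.]

Fig. 9. Right tails of loss distribution for portfolios minimizing $\mathrm{VaR}_{.95}$, $\mathrm{ES}_{.95}$, Max and Omega.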

Looking at Figure 9 we observe that the portfolio minimizing the expected shortfall almost completely dominates the other portfolios in the right tail, i.e. has smaller losses with higher probability. If we look at the center of the distribution, illustrated in Figure 10, we see that the advantage of the Omega portfolio is its significantly higher probability of producing gains.

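[Figure 10 not reproduced: empirical cumulative loss distributions for losses between −300 and 300; series VaR, ES, Max and Omega.]

Fig. 10. Center of loss distribution for portfolios minimizing $\mathrm{VaR}_{.95}$, $\mathrm{ES}_{.95}$, Max and Omega.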

The results for portfolios without an upper constraint on the positions, i.e. solving (10)–(10′′), are given in Table 2. The minimum $\mathrm{VaR}_{.99}$ portfolio weights obtained without upper size constraint on positions are shown in Figure 11. We observe that the portfolio is concentrated in only 34 positions.

Table 2
Optimization results for different risk functions Φ and constraint xj ≥ 0. Column Φ refers to the equation number.

β       Φ      VaR     ES      Max     Ω       #J

VaR minimization
0.05    (5)    20      169     1911    16.72   9
0.01    (5)    46      552     1754    26.50   9

ES minimization
0.05    (3)    45      84      1603    12.53   19
0.01    (3)    98      157     587     26.72   35

Max minimization
0.01    (6)    200     236     281     7.27    47

Omega minimization
0.01    (4)    1864    1945    2428    0.42    9

Fig. 11. Ordered weights for the minimum VaR.99 portfolio without upper size constraint on positions.

For the expected shortfall minimization our results coincide with the ones obtained by Andersson et al. (2001), whereas for the VaR.95 minimization we achieve a solution of VaR.95 = 20 instead of the value of 94 reported by Verma (2005).


In these applications, we used both ways of computing expected shortfall, Omega and expected loss. The first one, which consists in numerically computing the integrals I1, I2 and I3, can be useful when the loss distribution is defined as a parametric function. In our case, the results based on the numerical integration of the empirical distribution function of scenarios are not significantly different from the results obtained using arithmetic means. Also, the computational efficiency is not affected by the choice of a particular method.
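The agreement between the two computations can be illustrated for expected shortfall with a short Python sketch. It does not reproduce the integrals I1, I2 and I3 defined earlier in the paper. As an assumption, it uses the standard representation of expected shortfall as the average of the quantile function over the tail, ES = (1/β) ∫ q(u) du for u from 1 − β to 1, and integrates the empirical quantile function with the trapezoidal rule. The function names and the simulated data are hypothetical.

```python
import numpy as np

def es_tail_mean(losses, beta):
    """ES as the arithmetic mean of the worst floor(beta * n) losses."""
    x = np.sort(np.asarray(losses, dtype=float))
    n_tail = max(1, int(np.floor(beta * x.size + 1e-12)))
    return x[-n_tail:].mean()

def es_quantile_integral(losses, beta, grid_points=20001):
    """ES by trapezoidal integration of the empirical quantile function."""
    x = np.sort(np.asarray(losses, dtype=float))
    n = x.size
    u = np.linspace(1.0 - beta, 1.0, grid_points)
    # Empirical (step) quantile function: q(u) = x[ceil(u * n) - 1].
    idx = np.clip(np.ceil(u * n).astype(int) - 1, 0, n - 1)
    q = x[idx]
    du = u[1] - u[0]
    # Trapezoidal rule on an equally spaced grid.
    integral = du * (q.sum() - 0.5 * (q[0] + q[-1]))
    return integral / beta

# Simulated losses, for illustration only (not the data sets of the paper).
rng = np.random.default_rng(0)
simulated_losses = rng.normal(0.0, 100.0, size=1000)
es_mean = es_tail_mean(simulated_losses, beta=0.05)
es_integral = es_quantile_integral(simulated_losses, beta=0.05)
```

For an empirical distribution the two values differ only by the discretization error of the integration grid, so the choice between them matters mainly when the loss distribution is given in parametric form.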

The comments about the relative attractiveness of VaR, expected shortfall or Omega function in this section should not be interpreted as general conclusions about their usefulness in portfolio optimization. They concern the particular data sets used, and general statements would require further investigation.

5 Conclusions

Threshold accepting is an optimization heuristic that can be used for a wide class of portfolio choice problems where the objective function is non-convex and has many local minima. This is in particular the case when the risk is expressed as VaR, expected shortfall, Omega, maximum loss, etc., and when the future returns of the individual assets are modelled as scenarios. The method has been successfully tested on a set of large real world problems and has been benchmarked with results provided in the literature. We compare portfolios optimized for different risk measures, highlighting the features of the different risk functions.

A major advantage of the proposed method is its flexibility. The same tool can be used for all sorts of risk functions, side constraints or assumptions on returns. Furthermore, the algorithm parameters are driven by the problem data themselves, which implies that there is no need for particular expertise and makes its use almost as simple as a classical method.
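The idea of a threshold sequence driven by the problem data can be sketched in Python as follows. This is a generic threshold accepting loop in the spirit of Dueck and Scheuer (1990), not the implementation used in this paper. All names are hypothetical, the simulated scenarios are for illustration only, and positions are simplified to non-negative weights summing to one. The thresholds are taken as decreasing quantiles of the absolute objective changes observed along a random walk through neighbouring portfolios.

```python
import numpy as np

def empirical_var(weights, scenario_losses, beta=0.05):
    """Smallest of the worst floor(beta * n) portfolio losses (assumed convention)."""
    x = np.sort(scenario_losses @ weights)
    n_tail = max(1, int(np.floor(beta * x.size + 1e-12)))
    return x[-n_tail]

def shift_weight(weights, rng, step=0.05):
    """Neighbour: move a small amount of weight from asset i to asset j."""
    w = weights.copy()
    i, j = rng.choice(w.size, size=2, replace=False)
    amount = min(step, w[i])      # keeps w[i] non-negative
    w[i] -= amount
    w[j] += amount                # budget (sum of weights) is preserved
    return w

def data_driven_thresholds(objective, neighbor, x0, rng, n_samples=500, n_rounds=10):
    """Decreasing quantiles of |objective change| along a random walk."""
    x, fx = x0, objective(x0)
    deltas = []
    for _ in range(n_samples):
        y = neighbor(x, rng)
        fy = objective(y)
        deltas.append(abs(fy - fx))
        x, fx = y, fy
    thresholds = np.quantile(deltas, np.linspace(0.8, 0.0, n_rounds))
    thresholds[-1] = 0.0          # last round accepts no deterioration
    return thresholds

def threshold_accepting(objective, neighbor, x0, thresholds, steps_per_round, rng):
    x, fx = x0, objective(x0)
    best, f_best = x, fx
    for tau in thresholds:
        for _ in range(steps_per_round):
            y = neighbor(x, rng)
            fy = objective(y)
            if fy - fx <= tau:    # accept unless the deterioration exceeds tau
                x, fx = y, fy
                if fx < f_best:
                    best, f_best = x, fx
    return best, f_best

# Simulated heavy-tailed scenario losses: 2000 scenarios, 8 assets.
rng = np.random.default_rng(42)
scenario_losses = rng.standard_t(df=4, size=(2000, 8)) * 100.0
objective = lambda w: empirical_var(w, scenario_losses)
w0 = np.full(8, 1.0 / 8.0)
thresholds = data_driven_thresholds(objective, shift_weight, w0, rng)
w_best, var_best = threshold_accepting(objective, shift_weight, w0, thresholds, 300, rng)
```

Replacing `empirical_var` by another risk function leaves the rest of the sketch unchanged, which is the flexibility referred to above. The quantile levels and the number of steps per round are arbitrary choices made for this example.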

References

Althofer, I. and Koschnick, K.-U. (1991). On the Convergence of ‘Threshold Accepting’. Applied Mathematics and Optimization, 24:183–195.

Andersson, F., Mausser, H., Rosen, D., and Uryasev, S. (2001). Credit risk optimization with Conditional Value-at-Risk criterion. Mathematical Programming, 89(2):273–291.

Beasley, J., Meade, N., and Chang, T.-J. (2003). An evolutionary heuristic for the index tracking problem. European Journal of Operational Research, 148:621–643.

Bucay, N. and Rosen, D. (1999). Credit risk of an international bond portfolio: a case study. Algo Research Quarterly, 1(2):9–29.

Chang, T.-J., Meade, N., Beasley, J. E., and Sharaiha, Y. M. (2000). Heuristics for cardinality constrained portfolio optimization. Computers & Operations Research, 27:1271–1302.

Dueck, G. and Scheuer, T. (1990). Threshold Accepting: A general purpose algorithm appearing superior to Simulated Annealing. Journal of Computational Physics, 90:161–175.

Dueck, G. and Winker, P. (1992). New concepts and algorithms for portfolio choice. Applied Stochastic Models and Data Analysis, 8:159–178.

Gaivoronski, A. and Pflug, G. (2005). Value-at-risk in portfolio optimization: properties and computational approach. Journal of Risk, 7(2):1–31.

Gilli, M. and Kellezi, E. (2002a). A Global Optimization Heuristic for Portfolio Choice with VaR and Expected Shortfall. In Kontoghiorghes, E. J., Rustem, B., and Siokos, S., editors, Computational Methods in Decision-making, Economics and Finance, Applied Optimization Series, pages 167–183. Kluwer Academic Publishers.

Gilli, M. and Kellezi, E. (2002b). The Threshold Accepting Heuristic for Index Tracking. In Pardalos, P. and Tsitsiringos, V. K., editors, Financial Engineering, E-Commerce and Supply Chain, Applied Optimization Series, pages 1–18. Kluwer Academic Publishers, Boston.

Keating, C. and Shadwick, W. (2002). A Universal Performance Measure. The Finance Development Centre, London, http://faculty.fuqua.duke.edu/∼charvey/Teaching/BA453 2005/Keating A universal performance.pdf.

Kirkpatrick, S., Gelatt, C., and Vecchi, M. (1983). Optimization by simulated annealing. Science, 220:671–680.

Larsen, N., Mausser, H., and Uryasev, S. (2002). Algorithms for Optimization of Value-at-Risk. In Pardalos, P. and Tsitsiringos, V. K., editors, Financial Engineering, E-Commerce and Supply Chain, Applied Optimization Series, pages 19–46. Kluwer Academic Publishers, Boston.

Maringer, D. (2005). Portfolio Management with Heuristic Optimization, volume 8 of Advances in Computational Management Science. Springer.

Markowitz, H. (1952). Portfolio selection. Journal of Finance, 7:77–91.

Mausser, H. and Rosen, D. (1999). Applying Scenario Optimization to Portfolio Credit Risk. Algo Research Quarterly, 2(2):19–34.

RiskMetrics Group (1996). RiskMetrics – Technical Document. J.P. Morgan/Reuters, NY. www.jpmorgan.com/RiskManagement/RiskMetrics/RiskMetrics.html.

Rockafellar, R. T. and Uryasev, S. (2000). Optimization of Conditional Value-at-Risk. Journal of Risk, 2(3):21–41.

Ortobelli, S., Rachev, S. T., Stoyanov, S., Fabozzi, F. J., and Biglova, A. (2005). The Proper Use of Risk Measures in Portfolio Theory. International Journal of Theoretical and Applied Finance, 8(8):1–27.

Parzen, E. (1960). Modern Probability Theory and its Applications. Wiley.

Verma, A. (2005). VaR optimal portfolios. A Global Optimization Approach. Workshop on Optimization in Finance, Coimbra.
