LONG-TERM ASSET ALLOCATION BASED ON STOCHASTIC MULTISTAGE MULTI-OBJECTIVE PORTFOLIO OPTIMIZATION
Guido Chagas (EESP – FGV and Banco Itaú Unibanco)
Pedro L. Valls Pereira (EESP – FGV and CEQEF – FGV)
ABSTRACT
Multi-Period Stochastic Optimization (MSO) offers an appealing approach to identify
optimal portfolios, particularly over longer investment horizons, because it is inherently
suited to handle uncertainty. Moreover, it provides flexibility to accommodate coherent
risk measures, market frictions and, most importantly, major stylized facts such as volatility
clustering, heavy tails, leverage effects and tail co-dependence. However, to achieve
satisfactory results, an MSO model relies on representative and arbitrage-free scenarios
of the pertaining multivariate financial series. Only after we have constructed such
scenarios can we exploit them, using suitable risk measures, to achieve robust portfolio
allocations. In this thesis, we discuss a comprehensive framework to accomplish that.
First, we construct joint scenarios based on a combined GJR-GARCH + EVT-GPD +
t-Copula approach. Then, we reduce the original scenario tree and remove arbitrage
opportunities using a method based on Optimal Discretization and Process Distances.
Lastly, using the approximated scenario tree
we perform a multi-period Mean-Variance-CVaR optimization taking into account
market frictions such as transaction costs and liquidity restrictions. The proposed
framework is particularly valuable to real applications because it handles various key
features of real markets that are often dismissed by more common optimization
approaches.
Keywords: Multiperiod Asset Allocation, Scenario Tree Generation and Reduction,
Optimal Discretization, Mean-Variance-CVaR Optimization
JEL Codes: G10, G11, G17
1 Introduction
Most, if not all, activities in financial markets pursue risk-adjusted returns.
They look for investment opportunities that offer attractive returns within acceptable
risk limits. However, this seemingly simple goal entails great challenges. First, we
need to choose a proper way to weigh risks. But what is a good risk measure? Should
we consider just one measure when comparing opportunities? Secondly, we seldom
have exact information regarding the future performance of potential investments. In
fact, we must take into account data uncertainty to make optimal decisions. Lastly, real
markets are not perfect. They have restrictions, costs and limitations, known as market
frictions, that often affect our decisions.
Furthermore, when considering a portfolio of such investment opportunities,
we face an additional challenge: their performances are usually mutually dependent.
Therefore, to find an optimal portfolio of investment opportunities (or, at least, one that
is not dominated by any other), we also need to understand their dependence structure
and use it to our advantage.
These insights suggest that, to perform a realistic portfolio optimization, we
need a model that considers the (actual) individual and joint behaviour of the
opportunities’ expected returns, takes into account suitable risk measures and
evaluates the impacts of market frictions. This thesis proposes an integrated framework
to accomplish that.
We start by modelling the individual and joint behaviour of the investment
opportunities. These opportunities usually involve instruments whose series of returns
(i) usually have particular statistical properties, (ii) are seldom independent and
identically distributed and (iii) often display nonlinear joint behaviour. The more
accurate our ability to explain these empirical features, the better we can appraise and
allocate such opportunities.
These features, which are persistent across different times, markets and
instruments, are known in the literature as stylized facts. Because of their importance,
many empirical financial studies have suggested models to explain and, more
interestingly, simulate multivariate financial time series that are capable of
accommodating such key (individual and joint) empirical statistical facts. In fact, some
essential financial activities as risk management and portfolio optimization depend on
elaborated models to account properly for these stylized facts in order to provide
reliable conclusions.
Unfortunately, as we discuss later, there are many stylized facts and, to the
best of our knowledge, most models are capable of explaining simultaneously, at most,
three of them. For instance, McNeil and Frey [213] suggested a combined approach
using GARCH models and EVT to model extreme stock returns. Rockinger and
Jondeau [268] proposed a different approach employing GARCH models and copulas
to fit multivariate stock returns. Recent works, like Huang, et al. [149] and Wang et al.
[310], combine these previous studies into a GARCH-EVT-Copula approach to study
exchange rate markets.
In this thesis, we follow a similar framework and propose a unified approach
to model four of the most significant facts: heavy tails, volatility clustering, leverage
effects and tail co-dependence, which are particularly relevant to daily financial series
(e.g., daily log returns).
Heavy tailed distributions are distributions whose tails exhibit seemingly
polynomial decay. That means that extreme (absolute) returns are more common than
those predicted by thin-tailed distributions, like the popular Gaussian distribution. For
obvious reasons, the probability of occurrence of extreme observations is essential
information for reliable risk management. Unfortunately, conventional econometric
approaches are not suited to handle extreme returns because they are relatively rare.
To do so, we need an alternative such as EVT, discussed in later sections.
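To make the polynomial-decay idea concrete, the sketch below implements the Hill estimator, a standard EVT tool for the tail index. The sample is a deterministic grid of Pareto quantiles, used purely for illustration; the estimator, not the data, is the point here.

```python
import math

def hill_estimator(sample, k):
    """Hill estimator of the tail index alpha from the k largest observations."""
    xs = sorted(sample, reverse=True)          # descending order statistics
    threshold = xs[k]                          # the (k+1)-th largest value
    # average log-excess of the k largest observations over the threshold
    mean_log_excess = sum(math.log(x / threshold) for x in xs[:k]) / k
    return 1.0 / mean_log_excess

# Deterministic Pareto(alpha = 3) quantile grid: heavy-tailed by construction
alpha = 3.0
sample = [(1 - (i + 0.5) / 5000) ** (-1 / alpha) for i in range(5000)]
estimate = hill_estimator(sample, k=500)       # close to the true alpha = 3
```

Smaller estimates indicate heavier tails; thin-tailed data such as Gaussian samples have no finite tail index in this sense.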
Volatility clustering is another key property. Many statistical tools and
techniques, like EVT for instance, rely on the assumption that series are independent
and identically distributed (i.i.d.). Although daily raw returns do not show evidence of
significant serial correlation, daily absolute or squared returns clearly exhibit strong
serial correlation. In other words, large (absolute) returns are often followed by large
returns while small (absolute) returns are typically followed by small returns, creating
clusters of volatility. This empirical fact plainly invalidates the assumption of i.i.d. series.
Conveniently, GARCH or SV models allow us to handle these volatility clusters and
use (approximately) i.i.d. filtered standardized residuals to perform subsequent
analyses and modelling.
Leverage effects are also particularly relevant because they explain how “bad
news” (i.e., negative shocks) has a greater impact on volatility than “good news” (i.e.,
positive shocks). This behaviour can be accommodated using, for instance,
asymmetric extensions of the GARCH models such as EGARCH, GJR-GARCH or
TGARCH. These models are rather popular because they can reasonably explain the
volatility clustering, the leverage effects and, to some extent, the heavy tails observed
in financial daily series.
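The GJR-GARCH(1,1) variance recursion just mentioned can be sketched in a few lines. The parameter values below are illustrative, not estimated from data; the indicator term is what lets negative returns raise next-period variance more than positive ones.

```python
def gjr_garch_filter(returns, omega, alpha, gamma, beta):
    """GJR-GARCH(1,1) recursion: returns (conditional variances, std. residuals).

    sigma2_t = omega + (alpha + gamma * 1{r_{t-1} < 0}) * r_{t-1}^2 + beta * sigma2_{t-1}
    """
    # start at the unconditional variance (the leverage term counts half)
    sigma2 = omega / (1.0 - alpha - gamma / 2.0 - beta)
    variances, std_resid = [], []
    for r in returns:
        variances.append(sigma2)
        std_resid.append(r / sigma2 ** 0.5)   # approximately i.i.d. if the model fits
        leverage = gamma if r < 0 else 0.0    # "bad news" raises next-period variance
        sigma2 = omega + (alpha + leverage) * r * r + beta * sigma2
    return variances, std_resid

# Illustrative (not estimated) parameters on a tiny daily log-return series
variances, z = gjr_garch_filter([0.01, -0.03, 0.02, -0.01],
                                omega=1e-6, alpha=0.05, gamma=0.10, beta=0.85)
```

Standardizing each return by the filtered conditional volatility yields the approximately i.i.d. residuals used in the subsequent EVT and copula steps.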
Lastly, tail co-dependence is likely one of the most important stylized facts
because it describes how dependence among instruments and markets is stronger
(in absolute terms) during turbulent periods. It indicates how cross-correlations change
during these periods leading to increasing risk (or, similarly, decreasing diversification).
Although some multivariate GARCH models like DCC and TVC are able to describe
partially the joint behaviour of financial series, copulas offer a more flexible alternative
because they detach the marginal modelling from the dependence structure modelling.
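This separation of marginals from dependence can be illustrated with a minimal bivariate Gaussian copula sampler; a t copula would additionally divide the normal draws by an independent chi-square mixing variable, which is what produces tail dependence. Everything below is an illustrative stdlib-only sketch, not the fitted model used later.

```python
import math, random

def gaussian_copula_pair(rho, rng):
    """One draw (u1, u2) from a bivariate Gaussian copula with correlation rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)  # Cholesky step
    cdf = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    return cdf(z1), cdf(z2)                                           # uniforms on (0, 1)

rng = random.Random(42)
pairs = [gaussian_copula_pair(0.8, rng) for _ in range(2000)]
# The uniforms can now be pushed through any marginal quantile functions
# (e.g., semi-parametric GPD-tail marginals) without changing the dependence.
```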
Once we have modelled these stylized facts, we use the calibrated model to
simulate a sufficiently large number of paths and construct a discrete scenario tree that
appropriately describes the expected behaviour of the daily log returns. Compared to
other common scenario construction approaches, the method suggested in this thesis
provides a better fit to the individual and joint dynamics of the financial series. However,
the generated scenario tree is neither free of arbitrage opportunities nor tractable in
real multi-period stochastic portfolio optimizations. Therefore, our next step is to
construct a smaller but sufficiently similar (in some proper sense discussed later)
scenario tree that is both manageable and arbitrage-free. To accomplish that, we
extend the optimal discretization method based on process distances, proposed by
Pflug [243] and further developed by Pflug and Pichler [245], [246] and Kovacevic and
Pichler [186].
The process distance is a generalization of the Wasserstein distance. While
the latter is a distance function between probability distributions, the former is a
distance functional between stochastic processes. Recalling that scenario trees are
simply filtered discrete probability spaces that represent the stochastic process model
of the simulated data, we can use process distances to evaluate how similar (i.e., how
close) two scenario trees are. Moreover, we can combine them with discretization
methods to approximate the original scenario trees by smaller, more manageable
scenario trees.
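In one dimension, the first-order Wasserstein distance between two discrete distributions reduces to the area between their CDFs, which the short sketch below computes. The process distance used later adds the filtration structure on top of this idea.

```python
def wasserstein_1d(points_p, probs_p, points_q, probs_q):
    """Wasserstein-1 distance between two discrete 1-D distributions.

    Equals the integral of |F_P(x) - F_Q(x)| over the real line.
    """
    events = sorted(set(points_p) | set(points_q))
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for x, x_next in zip(events, events[1:]):
        cdf_p += sum(p for pt, p in zip(points_p, probs_p) if pt == x)
        cdf_q += sum(q for pt, q in zip(points_q, probs_q) if pt == x)
        distance += abs(cdf_p - cdf_q) * (x_next - x)
    return distance

# All mass at 0 vs all mass at 1: the mass must travel a distance of 1
assert wasserstein_1d([0.0], [1.0], [1.0], [1.0]) == 1.0
```

The nested process distance additionally penalizes transport plans that violate the tree's filtration, which is why two trees with similar final distributions but different branching structures can still be far apart.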
These distances and their variants, which are inherently related to Optimal
Transport Theory, have been widely used in engineering applications, as pointed out
by Villani [307]. Conversely, finance applications have been scarce because usual
discretization methods based on process distances are not arbitrage-free. Therefore,
we describe in the next sections how we can extend the optimal discretization
approach to exclude arbitrage opportunities in order to construct a smaller but
reasonably similar scenario tree. We also briefly discuss other possibilities to construct
scenario trees for stochastic optimization such as conditional sampling methods,
moment-matching approaches and scenario clustering techniques, and point out why
we should exercise caution when using them in financial applications.
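As an illustration of why arbitrage removal matters, the sketch below implements only a simple per-asset necessary condition for one node of a one-period tree with a riskless asset: if every child scenario of some asset beats (or is beaten by) the riskless return, the node admits an arbitrage. It is a sketch of the idea, not the full condition: for several assets jointly, no-arbitrage requires a strictly positive state-price vector, which is what the optimal discretization extension discussed later enforces.

```python
def has_single_asset_arbitrage(gross_returns, riskless):
    """Necessary no-arbitrage check for one node of a one-period scenario tree.

    If every child-scenario gross return of some asset lies strictly above (or
    strictly below) the riskless gross return, then borrowing to buy the asset
    (or shorting it to lend) is a riskless profit: an arbitrage.
    """
    for asset in gross_returns:               # one list of child returns per asset
        if min(asset) > riskless or max(asset) < riskless:
            return True
    return False

# Two assets, three child scenarios each; riskless gross return 1.01
node = [[0.97, 1.02, 1.08],   # straddles 1.01 -> passes the check
        [1.02, 1.03, 1.05]]   # always beats 1.01 -> arbitrage
```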
Lastly, once we have constructed the approximated scenario tree, we perform
our concluding step: a multi-period multi-objective stochastic portfolio optimization. It
is particularly important because there is a wide literature regarding Multi-Period
Stochastic Optimization (MSO) and Multi-Objective Optimization (MOO) but very little
research combining them.
MSO has been extensively applied to capacity planning, transportation,
logistics, and long-term asset allocation over the last decades, as pointed out by
Wallace and Ziemba [309]. In these applications, future decisions depend on both
preceding decisions and observations. Moreover, any optimal solution (i.e., optimal
sequence of decisions) must take into account uncertainty. For financial applications
in particular, Multi-Period Stochastic Programming (MSP) provides a sound alternative
to handle such uncertainty, provided we can estimate the expected joint probability
distribution of the returns using scenario trees. Besides, MSP also offers flexibility to
accommodate market frictions and elaborated risk constraints.
MOO, on the other hand, has been a little less popular. Although intuitively
appealing, applications with multiple objectives have a formidable challenge: there is
no unique optimal solution. Instead, there are many (possibly infinitely many) Pareto optimal
solutions. The more complex and numerous the objective functions, the more difficult
it is to identify the non-dominated solutions. As discussed by Marler and Arora [210],
Ehrgott [87], and Abraham, Jain and Goldberg [1], there are various techniques to
perform a MOO such as weighted sum approaches, epsilon-constraint methods, goal
attainment techniques and genetic algorithms, which we discuss in the literature
review. Among them, the epsilon-constraint method is one of the few that can be
efficiently combined with MSP, provided we have a small number of objectives.
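The epsilon-constraint idea can be illustrated with a deliberately tiny example: maximize expected return subject to a variance cap ε, sweeping ε to trace Pareto points. The two-asset scenarios and the brute-force weight grid below are illustrative stand-ins for the full stochastic programming formulation.

```python
def epsilon_constraint_frontier(scenarios, epsilons, steps=100):
    """Trace Pareto points: maximise mean return s.t. portfolio variance <= eps.

    scenarios: equiprobable (r1, r2) scenario returns for two assets.
    Brute-force over a long-only, fully invested weight grid w in [0, 1].
    """
    frontier = []
    for eps in epsilons:
        best = None
        for i in range(steps + 1):
            w = i / steps
            rets = [w * r1 + (1 - w) * r2 for r1, r2 in scenarios]
            mean = sum(rets) / len(rets)
            var = sum((r - mean) ** 2 for r in rets) / len(rets)
            if var <= eps and (best is None or mean > best[1]):
                best = (w, mean, var)          # (weight on asset 1, return, variance)
        frontier.append(best)
    return frontier

# Asset 1: high mean, high variance; asset 2: low mean, low variance
scen = [(0.10, 0.02), (-0.04, 0.01), (0.12, 0.03), (-0.02, 0.00)]
frontier = epsilon_constraint_frontier(scen, epsilons=[0.0004, 0.002, 0.01])
```

Tightening the variance cap pushes the optimum toward the defensive asset; relaxing it recovers the unconstrained return maximizer. In the full model the same scalarization runs over the scenario tree with CVaR as a second epsilon-constrained objective.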
Thus, in this work we combine an MSP approach with the epsilon-constraint
method to perform our multi-period multi-objective stochastic portfolio optimization.
The objectives are simple: (i) maximize expected return, (ii) minimize variance and (iii)
minimize conditional value-at-risk. While the first two objectives are nearly mandatory
in most practical portfolio optimization problems, the last one is chosen to manage
market risk more accurately in real applications. Although the objectives’ selection is
arbitrary, it reflects some of the most common objectives observed in real asset
allocation applications, as remarked by Roman, Darby-Dowman and Mitra [269].
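For a finite set of equiprobable scenarios, conditional value-at-risk has a simple discrete form: the average of the worst (1 − α) fraction of losses. The sketch below uses that definition directly, with the tail size rounded up so at least one scenario is always included; it is a simplified illustration, not the Rockafellar–Uryasev linear-programming reformulation typically embedded in the optimization itself.

```python
import math

def cvar(losses, alpha=0.95):
    """Average of the worst (1 - alpha) fraction of equiprobable scenario losses."""
    ordered = sorted(losses, reverse=True)     # worst losses first
    # round before ceil so float fuzz (e.g. 20 * 0.05 = 1.0000000000000009) cannot
    # spuriously enlarge the tail; keep at least one scenario
    tail = max(1, math.ceil(round(len(losses) * (1.0 - alpha), 9)))
    return sum(ordered[:tail]) / tail

losses = [0.01 * i for i in range(1, 21)]      # 20 scenarios: losses 0.01 .. 0.20
tail_loss = cvar(losses, alpha=0.95)           # the single worst scenario, 0.20
```

Unlike variance, this measure looks only at the loss tail, which is why it complements the first two objectives rather than duplicating them.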
This thesis’ contributions to the literature are twofold: (i) it constructs arbitrage-free
scenario trees that, unlike in engineering applications (e.g., supply chain
management, inventory control, energy production), are a fundamental cornerstone of
financial applications and (ii) it provides an alternative framework for multi-period
multi-objective stochastic portfolio optimization. Although quite useful for real financial
applications, both subjects have been only mildly explored in the literature.
The remainder of this work is organized as follows. Section 2 presents a brief
review of the related literature. Section 3 discusses the theoretical background related
to the scenario tree generation (i.e., GARCH models, EVT and copulas), the scenario
tree approximation (i.e., optimal discretization using process distances) and finally the
multi-period multi-objective portfolio optimization (i.e., Mean-Variance-CVaR
optimization). Section 4 describes how to integrate these methods and theories into a
unified framework. Finally, Section 5 presents some empirical results and Section 6
concludes, discussing possible directions for further research.
2 Literature Review
Many peculiar statistical features of financial time series have been
comprehensively discussed in the literature. As stressed by Campbell, Lo, and
MacKinlay [195], returns are hardly random walks. This belief dates back to Mandelbrot
[202] [203], who was among the first to suggest that financial series are seldom
Gaussian, often displaying higher kurtosis and more peaked distributions. In fact,
financial returns exhibit a set of important empirical properties, known as stylized facts,
that are observed in different markets and instruments and that influence various
financial decision problems.
More precisely, Rydberg [274], Cont [58] and McNeil, Frey and Embrechts
[214] point out that, among the most studied stylized facts regarding daily financial time
series, we can underline:
(i) Presence of heavy tails
(ii) Presence of volatility clustering
(iii) Asymmetric distribution of raw returns
(iv) Lack of significant serial correlation in raw returns
(v) Strong serial correlation in squared and absolute returns
(vi) Negative correlation between volatility and returns (known as leverage
effect)
(vii) Serial correlations of powers of absolute returns are strongest at power
one (known as Taylor effect)
(viii) Considerable evidence of cross-correlation in squared and absolute
returns
(ix) Presence of tail co-dependence between series of raw returns
(x) Time-varying cross-correlations
(xi) Long-range dependence
As stressed by Cont [58], it is somewhat difficult, if not impossible, to model all
these features together. Keeping that in mind, in this brief review we focus mostly on
four key stylized facts: heavy tails, volatility clustering, leverage effects and tail
co-dependence, which we will accommodate in the framework proposed later. A more
detailed review of stylized facts is offered by Campbell, Lo and MacKinlay [42], Taylor
[295] [296], and Tsay [301].
Mandelbrot [202] [203] was one of the first who claimed that commodity and
stock extreme returns are rather more common than suggested by Gaussian
distributions. As an alternative, the author suggested stable Paretian distributions (i.e.,
distributions with tail index α < 2) to accommodate series with such extreme
observations and preserve the assumption of independent and identically distributed
returns. Fama [97] [98] also supported this hypothesis pointing out that further studies
would be required to use it properly.
Although Mandelbrot and Fama correctly suggested that the Gaussian
distribution was inadequate to model financial returns, the proposed stable distribution
presented some disadvantages such as infinite variance (since α < 2). Although there
are techniques to manage this family of distributions as discussed for instance by
Mittnik, Paolella and Rachev [222] [223] and Taleb [292], many studies like Hsu et al.
[147], Hagerman [122], Perry [239], Ghose and Kroner [110] and Campbell, Lo and
MacKinlay [42] advocate that there are more suitable approaches, with less
counterfactual implications (e.g., variance does not appear to increase as sample size
increases, long-horizon financial returns seem to follow Gaussian distributions) to
accomplish this objective.
Consequently, alternative approaches were proposed to handle extreme
returns properly. Blattberg and Gonedes [42] suggested the Student’s t-distribution as
a better alternative to capture the heavy-tail behaviour. Kon [180] questioned the
assumption that financial daily returns are stationary and pointed out that mixtures of
normal distributions are better suited than Student t distributions to describe such
leptokurtic distributions. Barndorff-Nielsen [16] introduced the flexible Generalized
Hyperbolic (GH) distributions whose particular cases were later applied to model
heavy-tails and, sometimes, asymmetric gains and losses. For example, Barndorff-
Nielsen [17] [18] recommended the Normal Inverse Gaussian (NIG) distribution.
Eberlein and Keller [85], and Eberlein, Keller and Prause [86] proposed a solution
based on hyperbolic distributions that was further generalized by Prause [250]. Hansen
[125] suggested a less common special case of the GH distribution: a skewed Student
t-distribution. These later studies appear to support Cont [58], who inferred that any
distribution capable of properly modelling the distribution of financial returns requires
at least four parameters: location, scale, shape and asymmetry.
All approaches mentioned previously focused on modelling the whole
distribution of returns. Other interesting methods available to handle extreme
observations, which have been extensively applied in hydrology, meteorology and
insurance, comprise modelling the distribution tail’s behaviour separately. These
methods are based on Extreme Value Theory (EVT). Some of the first applications of
EVT in finance were proposed by Jansen and Vries [154], Koedijk et al. [178][179],
Longin [197] [198], Tsay [300], and Dacorogna et al. [59]. As underlined by Cont [58],
various works such as Longin [197], Lux [200] and Hauksson et al. [130] indicate that
a Fréchet distribution with shape parameter 0.2 ≤ ξ < 0.4 (i.e., with tail index 2.5 < α ≤ 5)
seems suitable to model the distribution tails of daily financial series. Cont [58] also
points out that such shape parameter range suggests that the variance is indeed finite
and tails are not as heavy as those achieved by stable Paretian distributions. In the
next section we briefly discuss these methods and refer to Embrechts, Klüppelberg
and Mikosch [90], Coles [53], McNeil, Frey and Embrechts [214], and Finkenstädt
and Holger [102] for comprehensive reviews and applications of EVT.
Another major stylized fact in financial series is volatility clustering. Although
Mandelbrot [202] first remarked that large (small) returns tend to be followed by large
(small) returns, it was Engle [93] who introduced the Autoregressive Conditional
Heteroscedastic (ARCH) model, one of the first serious approaches trying to explain
this empirical property. Soon after, Bollerslev [34] proposed a more parsimonious
extension: the Generalized Autoregressive Conditional Heteroscedastic (GARCH)
model, which became a keystone for real-world volatility modelling problems.
GARCH models provided an appealing alternative to earlier approaches
because they were able to better explain relevant properties of empirical daily returns.
First, daily returns do not appear to be independent and identically distributed.
Secondly, squared returns appear to be time varying and exhibit strong evidence of
serial correlation. Lastly, extreme returns appear to cluster together, independently of
their sign.
Furthermore, many sophisticated extensions of the original GARCH have been
proposed over the last decades. For instance, there are various nonlinear
specifications such as the Nonlinear ARCH (NARCH) model developed by Engle and
Ng [96], the Asymmetric Power GARCH (APGARCH) model introduced by Ding et al.
[77], and some variations of the Smooth Transition GARCH (STGARCH) model
suggested by Hagerud [123], Gonzalez-Rivera [115] and Lubrano [199].
Other extensions focus on substantial evidence of long memory (i.e., very slow
decay) in the empirical serial correlations of absolute or squared daily returns as
discussed by Taylor [296], Ding et al. [77] and Ding and Granger [76]. The Fractionally
Integrated GARCH (FIGARCH) proposed by Baillie, Bollerslev and Mikkelsen [12] and
the Long Memory GARCH (LMGARCH) suggested by Karanasos et al. [167] are
examples.
Additionally, fitting traditional GARCH models to series with structural changes
may lead to potential misspecifications, as discussed by Hamilton and Susmel [124],
Cai [41], and Mikosch and Stărică [220]. Consequently, regime-switching models, such
as the Markov-Switching GARCH (MS-GARCH) models suggested by Gray [117],
Dueker [80] and Klaassen [172], were developed to take into account potential
structural breaks.
Besides GARCH approaches, there are alternatives based on stochastic
volatility such as Taylor [294] and Harvey and Shephard [128], and more recently, on
realized volatility like Andersen et al. [1] [6], and Barndorff-Nielsen and Shephard [19]
that also provide invaluable tools to properly explain some important stylized facts
discussed in the literature. We do not discuss them here and refer to Bauwens, Hafner
and Laurent [20], Andersen et al. [1] and Tsay [301] for further details regarding these
alternatives.
Despite these various enhancements and alternatives, the basic
univariate GARCH(1,1) formulation is still the most used according to Bauwens, Hafner
and Laurent [20], and McNeil, Frey and Embrechts [214]. The reasons are twofold:
they are easier to fit and they provide reasonable estimates, particularly when
combined with complementary techniques like extreme value theory and copulas. We
will discuss some details of this particular specification in the next section and refer to
Andersen et al. [7], Tsay [301], and Bauwens, Hafner and Laurent [20] for a
comprehensive review.
Moreover, straightforward extensions of the GARCH(1,1) formulation are also
commonly used to address other additional statistical features of daily financial series.
In particular, some of these modifications lead to asymmetric models that can
accommodate another key stylized fact: the leverage effect.
This feature was first discussed by Black [31] and Christie [52] who noticed
that declining asset prices (i.e., negative log returns) are often followed by increasing
volatility. Further research on this asymmetric response to shocks led, for example,
to the Exponential GARCH (EGARCH) model by Nelson [232], the GJR-GARCH model
by Glosten, Jagannathan and Runkle [111] and the Threshold GARCH (TGARCH) by
Rabemananjara and Zakoian [252] and Zakoian [312].
There are some differences among these extensions. For instance, moment
conditions for the EGARCH are less strict than those for GJR-GARCH and TGARCH,
and GJR-GARCH models the conditional variance while TGARCH focuses on the
conditional standard deviation. Nonetheless, to the best of our knowledge, there are
not many studies comparing their results. Engle and Ng [96] is one exception that
suggests that the GJR-GARCH model appears more appropriate to explain extreme
shocks than the EGARCH model. Another exception can be found in Tsay [301] who
provides a few examples suggesting that no model is statistically superior. Regarding
practical applications, parsimonious EGARCH(1,1) and GJR-GARCH(1,1)
specifications are often preferred according to Andersen et al. [7], and Bauwens, Hafner
and Laurent [20].
To this point we have discussed only univariate approaches but,
unsurprisingly, various multivariate specifications were proposed over the last
decades. Focusing on multivariate GARCH models for instance, Bollerslev et al. [37]
introduced the Diagonal VECH (DVECH) model. Although quite general, it has limited
practical applications because it requires too many parameters to be estimated. Baba,
Engle, Kraft and Kroner [11] and Engle and Kroner [95] suggested the BEKK model,
which further restricted the DVECH specification but still faced limitations with large
dimensions. To address this curse of dimensionality, Bollerslev [36] proposed the
Constant Conditional Correlation (CCC) model, whose strong constant-correlation
assumption was later relaxed in the Time-Varying Correlation (TVC) model of Tse and
Tsui [302] and the Dynamic Conditional Correlation (DCC) model suggested by Engle
[94]. More recent specifications of these dynamic correlation models, such as
Cappiello, Engle and Sheppard [44] and Hafner and Franses [121], also address
additional features of financial series like leverage effects and explain elaborate
joint behaviours of financial markets such as contagion and volatility spillover effects.
For further details, Bauwens, Laurent and Rombouts [21], and Silvennoinen and
Teräsvirta [281] provide an inclusive review of multivariate GARCH models.
Despite the appealing properties of multivariate GARCH models, some
works such as McNeil, Frey, and Embrechts [214] and Patton [238] suggest that
copulas, which were introduced by Sklar [282] and later promoted by Joe [160] and
Nelsen [231], may offer a compelling alternative because they separate the
dependence structure from the univariate marginal distributions allowing more
flexibility (a key aspect for many practical applications). However, depending on the
required degree of flexibility, accuracy and tractability, copula models may range from
straightforward static combinations of different univariate distributions, such as those
presented by Patton [235], to much more elaborated applications comprising dynamic
multivariate processes like those discussed in Patton [236].
Among the most common copula types we have the (i) elliptical copulas like
the Gaussian and t copulas, which are readily extended to multivariate settings but can
explain only simple dependence structures; (ii) Archimedean copulas such as Frank,
Gumbel and Clayton copulas that, conversely, are difficult to apply to high dimensional
problems but offer greater flexibility to explain the joint dependence; (iii) vine copulas,
which handle high dimensional dependence by combining flexible bivariate copula
families; and (iv) mixture copulas that are able to accommodate quite sophisticated
dependence structures. We recommend Embrechts, Lindskog and McNeil [91] for a
good review of elliptical and Archimedean families, Kurowicka and Joe [189] for a
detailed discussion of vine copulas, and Joe [161] for a comprehensive review,
including some mixture copula models.
Even though elaborated models like vine and mixture copulas provide greater
flexibility, practical financial applications (i.e., applications that comprise many
instruments) often favour more tractable approaches like elliptical copulas. Moreover,
tail co-dependence is often the most pressing concern in these applications. For that
reason, t copulas and their extensions are usually the preferred choice. See for
example Demarta and McNeil [64].
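One reason for this preference is worth spelling out: the Gaussian copula has zero tail dependence, while the bivariate t copula with ν degrees of freedom and correlation ρ has coefficient of (upper and lower) tail dependence

```latex
\lambda = 2\, t_{\nu+1}\!\left( -\sqrt{\frac{(\nu+1)(1-\rho)}{1+\rho}} \right),
```

where t_{ν+1} denotes the univariate Student t distribution function with ν + 1 degrees of freedom (see Demarta and McNeil [64]). This λ is strictly positive for any finite ν and vanishes as ν → ∞, recovering the Gaussian case.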
Additionally, many recent financial applications combine copulas with other
models to achieve better results. Important examples are GARCH-Copula
models, discussed by Rockinger and Jondeau [162][268], Huang et al. [150], and
Riccetti [263][264]. These models can be fitted using a two-step estimation approach,
conceptually similar to that used by McNeil and Frey [213] to combine EVT and
GARCH models.
As we elaborate in the next sections, we can integrate these two-step
estimation approaches into a single three-step GJR-GARCH + EVT (GPD) + t-Copula
framework to simulate multivariate trajectories (also known as paths) that have a
common starting point. This simulated multivariate scenario fan, which is a particular
case of a scenario tree, not only captures the uncertainty of the daily log returns but
also accommodates reasonably well the four key stylized facts mentioned previously:
heavy tails, volatility clustering, leverage effects and tail co-dependence.
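The three-step pipeline can be summarized in pseudocode; every helper name below is a placeholder for the corresponding step described in the text, not an actual implementation.

```python
def build_scenario_fan(log_returns, horizon, n_paths):
    """Pseudocode skeleton of the GJR-GARCH + EVT (GPD) + t-Copula pipeline.

    All helper names are placeholders for the steps described in the text.
    """
    # Step 1: fit a GJR-GARCH(1,1) to each series and keep the
    # (approximately i.i.d.) standardized residuals.
    garch_fits = [fit_gjr_garch(series) for series in log_returns]
    residuals = [fit.std_residuals for fit in garch_fits]

    # Step 2: model each residual semi-parametrically -- empirical body,
    # GPD tails above and below high thresholds (peaks over threshold).
    marginals = [fit_gpd_tails(z) for z in residuals]

    # Step 3: map residuals to uniforms via their marginals and fit a
    # t copula to capture the tail co-dependence.
    copula = fit_t_copula([m.cdf(z) for m, z in zip(marginals, residuals)])

    # Simulation: draw dependent uniforms, invert the marginals, and push the
    # innovations through each GARCH recursion; all paths share today's state,
    # so the result is a scenario fan.
    paths = []
    for _ in range(n_paths):
        us = copula.sample(horizon)
        innovations = [m.ppf(u) for m, u in zip(marginals, us)]
        paths.append([f.simulate(z) for f, z in zip(garch_fits, innovations)])
    return paths
```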
Similar works applied to exchange rates and commodities can be found in Wang,
Jin and Zhou [310] and Tang et al. [293]. Besides, other studies that jointly explain at
least three stylized facts can be found in Xiong and Idzorek [311], which accounts for
heavy tails, skewness, and asymmetric dependence, and Hu and Kercheval [148],
which proposes a GHD approach to explain asymmetric dependence, volatility
clustering and (partially) heavy tails. There are also a few elaborated works that
capture four stylized facts. Patton [236], and Viebig and Poddig [306] are examples.
So far, we have reviewed combined methods based exclusively on historical daily
log returns. Nevertheless, there are several alternative methods available to simulate
multivariate scenarios that use additional information such as economic variables or
forward-looking indicators. Boender [33], Cariño et al. [45], and Cariño, Myers and
Ziemba [46], for instance, construct scenarios using Vector Autoregressive (VAR)
models. Mulvey and Vladimirou [228], on the other hand, generate scenarios using
decomposition methods based on Principal Component Analysis (PCA). Mulvey [225],
and Mulvey and Thorlacius [227] employ dynamic stochastic simulation to forecast
both economic series and asset prices. For additional information regarding these
different scenario generation methods, we refer to Ziemba and Mulvey [314], and
Kouwenberg and Zenios [185].
These alternative methods based on economic variables or forward-looking
indicators are vastly used in practical applications. However, they are not particularly
tailored to accommodate the stylized facts mentioned earlier, which are especially
relevant to portfolio optimization and risk management applications. For that reason,
in this thesis we simulate the multivariate scenarios using the suggested GJR-GARCH
+ EVT (GPD) + t-Copula three-step approach, which can properly describe the
multivariate underlying stochastic process of the daily log returns (at the expense of
additional information that could prove useful in some cases).
This combined approach, however, exhibits two major limitations: the
generated multivariate scenario tree (i) is often too large (i.e., computationally
demanding) to be applied in practical finance applications and (ii) is not free of arbitrage
opportunities. Therefore, we require additional steps to handle these limitations.
Regarding the first shortcoming, numerous techniques have been proposed to
approximate the original scenario trees by more tractable ones, as highlighted by
Dupačová, Consigli and Wallace [82] and Kouwenberg [184]. Dupačová, Consigli and
Wallace [82], Høyland and Wallace [144] and Prékopa [251] also point out that the
features considered by each particular stochastic optimization problem define which
properties should be contemplated by a good scenario tree approximation (e.g., a
traditional Mean-Variance optimization only requires matching the first two moments
of the original multivariate distribution). Therefore, true convergence towards the
original scenario tree (based on the true underlying stochastic process) may be desired
but is not mandatory to achieve optimal solutions for Stochastic Programming
problems.
These approximation techniques seek to generate tractable scenario trees and
guarantee that the different approximated scenario trees lead to similar, stable and
optimal objective values, with small approximation errors. Following the reviews in
Dupačová, Consigli and Wallace [82], and Heitsch and Römisch [133], we discuss
some common techniques used in finance, energy and logistics problems, which are
usually grouped into four categories: sampling algorithms, moment-matching
approaches, clustering techniques and optimal discretization methods.
Sampling algorithms are likely the first approaches used to approximate
stochastic processes. They comprise different techniques that basically draw possible
values from a given distribution or stochastic process for each node of a predefined
scenario tree. Among these sampling techniques, bootstrap sampling, random
sampling, importance sampling, rejection sampling, conditional sampling and Markov
Chain Monte Carlo (MCMC) sampling are the most used in practical applications.
Traditional or adjusted random sampling, also known as Monte Carlo
sampling, is the simplest, and often the least efficient, of these methods. Essentially,
they draw values randomly and associate them to probabilities using the predefined
distribution. Similarly, bootstrapping sampling methods are just random sampling
methods, with replacement, that draw values from the sample itself. Cariño et al. [45],
Consigli [54], Consigli and Dempster [55], and Consigli, Ianquita and Moriggia [56],
for instance, use these methods to construct their scenario trees.
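A minimal sketch of plain bootstrap scenario generation: i.i.d. resampling with replacement from the historical sample. The works cited above use considerably richer constructions, and a block bootstrap would be needed to preserve serial dependence.

```python
import random

def bootstrap_scenarios(returns, horizon, n_scenarios, seed=0):
    """Build scenario paths by drawing historical returns with replacement.
    Plain i.i.d. bootstrap: any serial dependence in the data is lost."""
    rng = random.Random(seed)
    return [[rng.choice(returns) for _ in range(horizon)]
            for _ in range(n_scenarios)]

# hypothetical historical daily returns
hist = [0.01, -0.02, 0.005, 0.03, -0.01]
paths = bootstrap_scenarios(hist, horizon=3, n_scenarios=4)
```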
Despite their popularity, random sampling techniques usually require too many
draws to be reliable. Therefore, we often consider other, more efficient, techniques
such as (sequential) importance sampling methods. These methods restrict (or give
more importance to) the random draws to relevant regions of alternative distributions
that are easier to handle. Dupačová, Consigli and Wallace [82], Dempster and
Thompson [71], and Dempster [67] offer interesting applications of these methods.
Moreover, Dupačová, Consigli and Wallace [82] also combine the sequential
importance sampling with the moment matching and clustering approaches (described
later) to achieve more reliable scenario approximations. Likewise, rejection sampling
follows an approach similar to importance sampling but trades simplicity for efficiency,
as discussed by Pflug and Pichler [247].
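The variance-reduction idea behind importance sampling can be illustrated with a toy tail-probability estimate: sampling from a proposal shifted into the region of interest and reweighting by the likelihood ratio. The shift value below is a hypothetical choice.

```python
import math
import random

def tail_prob_importance(threshold, n, shift, seed=1):
    """Estimate P(Z < threshold) for Z ~ N(0, 1) by drawing from the shifted
    proposal N(shift, 1) and reweighting each draw with the likelihood ratio
    phi(y) / phi(y - shift)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = rng.gauss(shift, 1.0)
        if y < threshold:
            # likelihood ratio of target N(0,1) over proposal N(shift,1)
            total += math.exp(-0.5 * y * y + 0.5 * (y - shift) ** 2)
    return total / n

# centre the proposal on the rare region: estimate P(Z < -3) ~ 0.00135
est = tail_prob_importance(-3.0, 20000, shift=-3.0)
```

With the proposal centred on the rare event, roughly half the draws land in the region of interest, whereas naive Monte Carlo would waste almost all of its draws.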
One important drawback of most of the previous sampling techniques is that
current draws do not depend on previous draws. Consequently, they preclude any type
of temporal dependence, which is quite relevant for numerous financial applications.
Conditional sampling methods remove this shortcoming by keeping track of
previously drawn values. The popular Sample Average Approximation method is an
example of conditional sampling extensively discussed by Kleywegt, Shapiro and
Homem-de-Mello [176] and Shapiro [278]. More recently, MCMC algorithms (such as
Gibbs sampling and Metropolis-Hastings-Green) have been mixed with some of the
previously mentioned methods like importance sampling, leading to advances in
multidimensional optimization problems. We refer to Ekin, Polson and Soyer [88], and
Parpas et al. [234] for recent examples.
Unfortunately, although sampling methods exhibit very nice asymptotic
properties, large scenario trees are often intractable computationally, particularly in
high dimensional spaces (except for some very sophisticated MCMC sampling
models), hindering most practical applications. Likewise, small scenario trees based
on sampling methods usually lead to poor accuracy (i.e., large approximation errors
and instability). For those reasons, alternative approximation techniques are often
considered in practice.
Moment-matching (or, more generally, property-matching) approaches are
among these alternatives. As highlighted by Dupačová, Consigli and Wallace [82],
these methods do not require full knowledge of the true underlying stochastic process
and, consequently, are particularly convenient when we have limited information about
its probability distribution. Instead, property-matching approaches need only a few
statistical properties of the distribution (e.g., moments, co-moments, percentiles,
principal components, serial correlations) to simulate an approximated scenario tree,
i.e., a discrete probability distribution that is consistent with the given properties.
Besides requiring less information (i.e., just some statistical properties of the
underlying stochastic process), moment-matching approaches can match most
properties using relatively few scenarios when compared to traditional sampling
methods. They can also easily accommodate more elaborate dependency structures
such as serial correlations.
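A deliberately minimal property-matching sketch: an affine rescaling that makes an equally weighted scenario set match a target mean and standard deviation exactly. Full moment-matching methods also target skewness, kurtosis and cross-correlations, typically via nonlinear optimization.

```python
def match_two_moments(scenarios, target_mean, target_std):
    """Affinely rescale equally weighted scenario values so the sample mean
    and (population) standard deviation hit the targets exactly."""
    n = len(scenarios)
    m = sum(scenarios) / n
    s = (sum((x - m) ** 2 for x in scenarios) / n) ** 0.5
    return [target_mean + target_std * (x - m) / s for x in scenarios]

# hypothetical raw scenarios rescaled to mean 1% and std 2%
adj = match_two_moments([0.02, -0.01, 0.05, -0.03], target_mean=0.01, target_std=0.02)
```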
However, they also present relevant shortcomings. First, as discussed by
Dupačová, Consigli and Wallace [82], all relevant statistical properties for the
optimization problem are assumed to be known before the optimization problem is
solved, but that might not always be true. Second, Hochreiter and Pflug [141], and
Pflug and Pichler [246] show that completely distinct approximated scenario trees (i.e.,
quite different stochastic processes) might lead to the desired statistical properties,
possibly leading to unreliable solutions. Lastly, Kaut and Wallace [168] emphasise
that, differently from sampling methods, property-matching approaches do not
guarantee convergence by simply increasing the number of simulated scenarios, which
may produce instability and large optimality gaps.
We suggest Høyland and Wallace [144], Høyland, Kaut and Wallace [145] and
Dupačová, Consigli and Wallace [82] as well-known references regarding moment-
matching methods and, more recently, Date, Mamon and Jalen [62], Consiglio, Carollo
and Zenios [57], and Beraldi, Simone and Violi [26] as more elaborate applications of
such approximation methods. We also recommend King and Wallace [171] for a
comprehensive review about the subject.
Scenario clustering techniques have also been employed as alternatives to
traditional sampling methods. They use different methods of cluster analysis to bundle
similar scenarios together to approximate the original scenario trees by smaller ones.
However, as pointed out in Aggarwal [1], clustering techniques for time-series are more
challenging because they fall within the class of contextual representation, i.e.,
techniques that should consider both behavioural (e.g., price) and contextual (e.g.,
time) attributes. In that regard, Liao [190], and Rani and Sikka [258] offer interesting
surveys about time-series clustering techniques. They underline that among the many
clustering categories available, three have been used (directly or indirectly) to handle
time-series clustering: partitioning methods, hierarchical methods, and model-based
methods.
Partitioning clustering methods construct (crisp or fuzzy) partitions of the
data, where each partition represents a cluster containing at least one element. As a
popular example in this category, we can point out the k-means algorithms, in which
each cluster is characterized by the mean value of its elements. Pranevičius and
Šutienė [290][291], and Šutienė, Makackas, and Pranevičius [289] applied elaborate
variations of the k-means algorithm to generate approximated scenario trees that
preserve important statistical features of the underlying stochastic process such as
intertemporal dependency structures. In addition, Beraldi and Bruni [25], and Gulpinar,
Rustem and Settergren [120] discuss interesting variations of the k-means algorithm
applied to stochastic optimization problems. However, as explained by Liao [190],
basic partitioning methods perform well with spherical-shaped clusters and small or
medium data sets but do not properly handle clusters with non-spherical or complex
shapes, which are quite common in financial applications.
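A bare-bones illustration of k-means scenario reduction (Lloyd's algorithm with a deterministic initialisation): each centroid becomes a representative scenario whose probability is the fraction of original scenarios assigned to it.

```python
def kmeans_reduce(points, k, iters=20):
    """Reduce scenario vectors to k representatives with plain k-means.
    Returns the centroids and the probability mass assigned to each one."""
    centroids = points[:k]  # deterministic initialisation for reproducibility
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each scenario to its nearest centroid (squared distance)
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[j].append(p)
        # recompute centroids; keep the old one if a cluster goes empty
        centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    probs = [len(cl) / len(points) for cl in clusters]
    return centroids, probs

# four hypothetical two-asset return scenarios forming two obvious groups
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 4.9)]
cents, probs = kmeans_reduce(pts, k=2)
```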
Hierarchical clustering methods construct a multilevel cluster tree by grouping
the data using agglomerative or divisive algorithms. Agglomerative (bottom-up)
clustering puts each element in its own cluster and then merges clusters into larger
ones until predefined conditions (e.g., a desired number of clusters) are achieved.
Divisive (top-down) clustering, on the other hand, starts at the top with all elements in
a single cluster and then splits it recursively until the predefined conditions are met.
Unfortunately, plain hierarchical methods are also poorly equipped to handle complex
scenarios as discussed by Liao [190], and Pranevičius and Šutienė [290] because they
cannot make adjustments once a merge or split action has been performed, which may
lead to poor scenario structures. On the other hand, as demonstrated by Musmeci,
Aste and Di Matteo [229], and Tumminello, Lillo and Mantegna [303], some elaborate
extensions of hierarchical methods can achieve good results, at the expense of
tractability.
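For contrast with the partitioning approach above, a minimal single-linkage agglomerative (bottom-up) sketch on one-dimensional scenario values; the input values are hypothetical.

```python
def agglomerate(points, n_clusters):
    """Bottom-up clustering with single linkage: start with one cluster per
    scenario value and repeatedly merge the two closest clusters until the
    desired number of clusters remains."""
    clusters = [[p] for p in points]

    def linkage(c1, c2):
        # single linkage: distance between the closest pair of members
        return min(abs(a - b) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

groups = agglomerate([0.01, 0.02, 0.50, 0.52, 0.49], n_clusters=2)
```

As the text notes, once two clusters are merged the decision cannot be undone, which is the basic weakness of plain hierarchical schemes.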
Model-based clustering methods assume a model for each of the clusters and
attempt to find the best fit for the data. Statistics-based methods fall into this category.
Although fairly robust due to the use of more sophisticated frameworks (e.g., Hidden
Markov Models) and similarity measures (e.g., Dynamic Time Warping, Longest
Common Subsequence), these methods are usually rather complex and rare in the
financial literature. As examples, we refer to Knab et al. [177], and Dias, Vermunt and
Ramos [73][74] for sophisticated applications involving Hidden Markov Models (HMM)
to construct clusters of financial time-series.
Lastly, optimal discretization methods have also been used as alternative
approaches to approximate scenario trees, according to Pflug [241][243], Römisch
[270], Pflug and Römisch [248], and Heitsch and Römisch [132][133][134][135][137].
As stressed by Pflug [241][243], although more elaborate, these methods offer a
sound theoretical basis for approximating the true (and often intractable) underlying
probability distribution by a manageable and suitable probability measure with finite
support. In simple terms, such an approximation is an optimization that seeks to
minimize an appropriate probability distance metric between the original set of
scenarios and the reduced set of scenarios.
In the literature, this approach is also known as the Monge-Kantorovich mass
transport problem, which in the scenario tree context corresponds to the optimal
redistribution of the approximated scenario probabilities in order to minimize a
probability distance between the original and the approximated scenario tree. We refer
to Villani [307] and Rachev and Rüschendorf [255][256] for detailed discussions about
mass transportation theory and to Gröwe-Kuska, Heitsch and Römisch [118],
Hochreiter, Pflug and Wozabal [142], and Heitsch and Römisch [133] for examples of
applications outside financial markets.
Optimal discretization theory relies deeply on the choice of a suitable
probability distance metric. As discussed by Rachev [253], Rachev, Stoyanov and
Fabozzi [257], and Gibbs and Su [112], numerous probability metrics are available but
most, such as the Kolmogorov metric or the total variation distance, are poorly
equipped to properly measure the distance between probability distributions. Among
the metrics that can accomplish that, even fewer are suited to multi-period stochastic
optimization problems, i.e., can be extended from probability distribution distances to
stochastic process distances. The most common candidates in the literature are the
Fortet-Mourier and Wasserstein distances.
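For discrete distributions on the real line, the Wasserstein-1 distance reduces to the area between the two step CDFs, which a few lines of code can compute exactly:

```python
def wasserstein_1d(xs, ps, ys, qs):
    """Wasserstein-1 distance between two discrete distributions on the real
    line (atoms xs with weights ps, atoms ys with weights qs), computed as
    the integral of |F - G| between the two step CDFs."""
    points = sorted(set(xs) | set(ys))

    def cdf(atoms, weights, t):
        return sum(w for a, w in zip(atoms, weights) if a <= t)

    dist = 0.0
    for left, right in zip(points, points[1:]):
        # the CDFs are constant on [left, right), so the integral is exact
        dist += abs(cdf(xs, ps, left) - cdf(ys, qs, left)) * (right - left)
    return dist

# moving half of the mass a distance of 1.0 costs 0.5
d = wasserstein_1d([0.0, 1.0], [0.5, 0.5], [0.0], [1.0])
```

The multi-period Process (Nested) distance discussed below generalizes this idea by additionally respecting the filtration structure of the scenario tree.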
Römisch [270], Heitsch and Römisch [132] and Dupačová, Gröwe-Kuska, and
Römisch [83] are renowned examples of scenario reduction using the Fortet-Mourier
metric. Likewise, Heitsch and Römisch [133], Hochreiter and Pflug [141], Pflug
[241][243], Pflug and Pichler [244][245][246][247], and Kovacevic and Pichler [186] are
popular examples of scenario approximation using the Wasserstein metric. Moreover,
Dempster, Medova and Sook [68] provide an interesting discussion about the
statistical properties of the Wasserstein distance and its application in optimal
discretization for stochastic programming.
In particular, Pflug and Pichler [244][245][246], and Kovacevic and Pichler
[186] also explore in detail the Process (or Nested) distance, which is an extension of
the Wasserstein distance for multi-period problems (i.e., for problems where we should
use some type of filtration distance, as explained by Heitsch and Römisch [132], and
Heitsch, Römisch and Strugarek [138]). They show that, in such problems, we need to
consider both the probability measures and the value-and-information structure
(represented by a proper filtration or scenario tree) during the discretization process of
a stochastic optimization problem. In this thesis, we follow this approach as well and
explain it in detail in later sections.
Concerning the second limitation of the GJR-GARCH + EVT (GPD) + t-Copula
framework to generate scenario trees, arbitrage-free scenarios are mandatory in real
finance applications to avoid biased solutions, as stressed by Klaassen
[173][174][175], Sodhi [286], Geyer, Hanke and Weissensteiner [107][108] and
Consigli, Carollo, and Zenios [57]. They also underline that most of the scenario
approximation methods do not take that into account. Accordingly, the literature
regarding arbitrage-free approximation methods is, at best, limited.
Klaassen [173], for instance, shows how arbitrage opportunities can lead to
biased solutions (i.e., biased investment strategies) in stochastic programming
optimization problems. Klaassen [174] goes further and discusses a two-step process:
one to construct an arbitrage-free scenario tree and a second to reduce the tree size
by state and time aggregation while preserving the absence of arbitrage opportunities.
Unfortunately, such an approach does not guarantee that other statistical features (e.g.,
kurtosis, covariance or serial correlation) are preserved. Klaassen [175], on the other
hand, extends Høyland and Wallace [144]'s moment-matching method, which preserves
some predetermined statistical features of the underlying stochastic process but does
not preclude arbitrage opportunities. Klaassen [175] describes how we can identify,
ex-post, such opportunities and suggests how we can avoid them, ex-ante, by adding
no-arbitrage constraints to Høyland and Wallace [144]’s moment-matching
approximation method.
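In the spirit of such ex-post checks, the sketch below detects one-period arbitrage in the simplest possible setting: a single risky asset plus a riskless account. The general multi-asset test requires verifying the existence of strictly positive state prices (e.g., via linear programming) and is not attempted here; the numbers are hypothetical.

```python
def has_one_period_arbitrage(gross_returns, riskless):
    """One risky asset, one riskless account, one period: the scenario fan is
    arbitrage-free iff the risky gross returns fall on both sides of (1 + r),
    or all equal it (in which case the asset replicates cash)."""
    rf = 1.0 + riskless
    above = any(g > rf for g in gross_returns)
    below = any(g < rf for g in gross_returns)
    # arbitrage iff the asset (weakly) dominates cash or is dominated by it
    return not ((above and below) or (not above and not below))

ok_fan = has_one_period_arbitrage([1.08, 0.95, 1.01], 0.02)   # straddles cash
bad_fan = has_one_period_arbitrage([1.05, 1.03, 1.08], 0.02)  # dominates cash
```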
Klaassen [175] also suggests that we can rerun the scenario approximation
method until an arbitrage-free scenario tree is achieved. However, Geyer, Hanke and
Weissensteiner [107][108] show that simply rerunning, or even randomly enlarging
the branching scheme of the scenario tree, does not guarantee an arbitrage-free scenario
tree, but it does make the optimization problem less tractable. Adding constraints to
exclude arbitrage opportunities leads to non-convex optimization problems, which
usually require a global optimization approach. Consigli, Carollo, and Zenios [57], for
instance, propose a refinement of the moment-matching method with no-arbitrage
constraints that uses such an approach (based on lower bounding techniques).
Geyer, Hanke and Weissensteiner [107][108][109] also point out that the
absence of arbitrage implies a lower bound on the branching scheme of the scenario
tree. The number of possible child scenarios at each node (i.e., each state of nature,
characterized by a particular realization of prices or returns) of the scenario tree usually
must be greater than the number of non-redundant instruments considered in the
optimization problem so that some statistical properties of the underlying stochastic
process can be properly preserved. Note that this condition is more stringent than
Harrison and Kreps [127]’s no-arbitrage condition.
Additionally, Jobst and Zenios [159], Ferstl and Weissensteiner [100], and
Geyer, Hanke and Weissensteiner [107] stress that multi-period portfolio optimization
problems should use objective (i.e., real-world) probability measures rather than risk-
neutral probability measures to account properly for the (estimated) market price of
risk.
Finally, assuming that tractability and no-arbitrage conditions have been
met, we still need to evaluate the quality of the optimal solutions provided by these
techniques. As we mentioned earlier, such quality is typically evaluated in the literature
based on two key characteristics: optimality gap and stability.
The optimality gap, also known as the approximation error, between the
approximated and the original scenario tree, which represents the true underlying
stochastic process, is a major aspect of the approximation quality. Since the optimal
objective values or policies for the true underlying stochastic problem are hardly
available (otherwise we would not need approximations in the first place), we must find
indirect methods to estimate such a gap. Many studies like Kaut and Wallace [168],
Römisch [270], Heitsch and Römisch [136], and Pflug [241] debate possible methods
to control it as well as their implications for the discretization process. Pflug [241], in
particular, provides a good review of the subject and proposes an important lemma that
establishes an upper bound on the optimality gap, which we discuss in later sections.
This result is paramount to our optimal discretization approach based on the
Wasserstein and Process distances, as we will see later.
Stability is another major aspect related to the quality of the approximated
scenario tree. According to Kaut and Wallace [168], King and Wallace [171], and
Dempster, Medova and Sook [69], stability analyses usually focus on both the in-sample
and out-of-sample stability of the objective function rather than the stability of the
solutions, which may vary significantly due to the high nonlinearity and non-convexity
of the optimization problem.
Definitions of in-sample and out-of-sample stability vary in the related
literature. In this thesis, we follow King and Wallace [171]’s definitions. In-sample
stability implies that optimal solutions do not vary significantly if we generate new
scenario trees with similar cardinality, using the same method. The concept of in-
sample stability implicitly assumes that the entertained scenario approximation method
does not produce exactly the same scenario tree for the same input data. Out-of-
sample stability, on the other hand, guarantees that the generated optimal solutions
lead to optimal objective function values that are close to the true objective value.
Clearly, out-of-sample stability is achieved if the optimality gap is sufficiently small.
Unfortunately, since the true objective function value is typically unknown in practical
applications, out-of-sample stability must be evaluated approximately by alternative
tests, as we discuss later.
Kaut and Wallace [168], and King and Wallace [171] also highlight that in-
sample stability does not imply out-of-sample stability and vice-versa, but both help to
establish a lower bound on the cardinality of the scenario tree. Various works in the
literature evaluate the stability of different methods to approximate scenario trees. For
a more comprehensive review of the subject, we refer to Dupačová, Gröwe-Kuska and
Römisch [83], Rachev and Römisch [254], Heitsch and Römisch [136], and Heitsch,
Römisch and Strugarek [138].
The various concepts and techniques mentioned previously suggest that, in
general, no particular scenario construction approach is superior in all circumstances.
They often depend heavily on the context and problem at hand. Dempster, Medova
and Sook [68], and Löhndorf [196], for example, compare sampling algorithms,
moment-matching approaches, clustering techniques and optimal discretization
methods, and conclude that differences are minimal, except for simple sampling
approaches, which are outperformed.
It is also worth mentioning that some methods perform the scenario tree
generation and reduction steps simultaneously, rather than generating a large scenario
tree and then approximating it by a tractable one. In this thesis, we focus on the latter,
which seems to be more common in the recent literature. For studies and applications
regarding the former, we suggest Cariño et al. [45], Mulvey [225], Dempster and
Thorlacius [72], and Gulpinar, Rustem and Settergren [120].
Lastly, provided a manageable, sufficiently accurate and arbitrage-free
scenario tree is available, we proceed to the portfolio optimization step. Since
Markowitz’s seminal work [208], portfolio optimization has been a major subject in
financial markets. Although elegant, his single-period Mean-Variance approach entails
various simplifications that are hardly satisfied in real markets. For example, variance
is not a coherent risk measure; actual financial problems commonly require decisions
involving time-varying (i.e., uncertain) variables over multiple periods (e.g., weeks,
months or years); portfolio rebalances incur transaction costs and, sometimes,
liquidity issues; and taxes and regulatory restrictions often influence portfolio
performance.
A myopic (i.e., single-period) decision is optimal only in very particular
circumstances. Typically, as pointed out by Mulvey, Pauling and Madey [226] and
Brandt [40], (i) it fails to benefit or hedge against the presence of dynamic patterns and
relationships among financial instruments, (ii) it does not account for market liquidity
and transaction costs and (iii) it deals poorly with uncertainty. To accommodate such
elaborate market dynamics and frictions, various alternatives and extensions were
suggested. We refer to Guastaroba [119], Jarraya [155], Mansini, Ogryczak, and
Speranza [205] [206], and Kolm, Tütüncü and Fabozzi [166] for recent surveys. Among
these various alternatives, Multi-Period Stochastic Optimization (MSO) approaches
were particularly appealing because of their flexibility to handle key features of real
markets.
One key difference of multi-period approaches from single-period ones is that
their optimal solutions do not consist of a single allocation decision. Instead, they
comprise a whole sequence of successive allocation decisions (i.e., rebalances),
known as allocation policy or strategy. Although an optimal policy rarely offers the best
outcome for any particular scenario, it does suggest a noninferior course of action (in
the Pareto sense) that considers the impacts and probabilities of all possible scenarios,
i.e., it suggests an allocation strategy that takes into account uncertainty. Shapiro,
Dentcheva and Ruszczyński [279], Dupačová, Hurt and Štěpán [84], Kall and Wallace
[165], and King and Wallace [171] provide numerous examples that discuss this
difference and emphasize the advantages of the multi-period stochastic approach.
There are many approaches to multi-period optimization such as dynamic
stochastic control, multi-period stochastic programming, sampling-based methods
(e.g., acceptance-rejection sampling, and importance sampling) and metaheuristic
methods (e.g., genetic algorithms, simulated annealing, and particle swarm
optimization). Early models like Mossin [224], Samuelson [276] and Merton [215], [216]
focused on manageable approaches that could be solved analytically, often based on
continuous-time frameworks with unrealistic simplifying assumptions. Conversely,
recent advances in processing power have greatly improved the tractability of discrete-
time Multi-Period Stochastic Programming (MSP) models, which offer greater flexibility
in financial applications as demonstrated, for instance, by Mulvey and Vladimirou [228],
Dantzig and Infanger [60], and Dupačová, Hurt and Štěpán [84]. Ziemba [313] goes
further and states that scenario-based discrete multi-period stochastic programming
usually offers a superior alternative to simulation, control theory and continuous-time
approaches.
Such advantage, however, relies on a critical and rather challenging step: the
construction and approximation of (often) multivariate scenarios. Stability, tractability
and accuracy of MSP’s allocation policies or strategies depend heavily on the quality
and number of the simulated scenarios, commonly represented as scenario trees.
Therefore, the steps mentioned earlier, and explained in detail in the next sections, are
essential to provide a reliable and manageable scenario tree that can be used by a
proper MSP approach. For further details regarding discrete-time stochastic decision
models, we suggest Uryasev and Pardalos [305], and Dupačová, Hurt and Štěpán [84].
In this thesis, a proper MSP approach refers to a stochastic programming
setup that satisfactorily handles, in the sense of Artzner et al. [9] and Föllmer and
Schied [105], the risk described by the approximated scenario tree, as discussed, for
instance, by Ruszczyński and Shapiro [273] and Rockafellar [265]. Markowitz's original
Mean-Variance approach is certainly tractable and elegant but it does not account
properly for actual market risk. To accomplish that, various risk measures have been
proposed in the literature but few have achieved wide acceptance among academics
and practitioners, as discussed by Nawrocki [230].
The Lower Partial Moment (LPM) family of risk measures (or risk functionals),
proposed and explored by Bawa [20], Bawa and Lindenberg [21] and Fishburn [95],
includes some of these measures. Such LPM measures are often called safety risk
measures in contrast to the more traditional dispersion risk measures, such as variance
or absolute deviation. As discussed by Nawrocki [230], the advantages of LPM risk
measures are twofold: they can handle non-normal returns and accommodate a myriad
of different investors’ utility functions (i.e., risk seeking, risk-neutral or risk averse
behaviours).
These properties are particularly relevant for practical financial applications
because: (i) non-normality of financial returns is commonly observed in real markets,
as pointed out by Campbell, Lo and MacKinlay [42], and Taylor [295] [296]; and (ii)
derivation of a precise (and unique) utility function is challenging, if not impossible,
as stressed by Roy [271], and Kahneman and Tversky [163].
Therefore, LPM provides more realistic, albeit not perfect (e.g., usually difficult
to optimize or extend to standard portfolio theory), risk measures that are consistent
with stochastic dominance (for positive exponents) and focus on what typically matters
most for investors: the downside risk. We refer to Roy [271], Markowitz [209], Mao
[207], and Fishburn [103] for the original arguments regarding these characteristics
and to Unser [304], Bertsimas, Lauprete and Samarov [28], and Pflug and Römisch
[248] for recent discussions about LPM risk measures.
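With equally weighted scenarios, the LPM of order n around a target τ is simply the average of (τ − r)ⁿ over the scenarios that fall below τ; order 0 recovers the shortfall probability and order 2 the target semivariance. A minimal sketch with hypothetical scenario returns:

```python
def lpm(returns, target, order):
    """Lower partial moment of a given order around a target return:
    the average of (target - r)**order over scenarios falling below target."""
    return sum((target - r) ** order for r in returns if r < target) / len(returns)

scen = [0.05, -0.02, 0.01, -0.04]
p_short = lpm(scen, target=0.0, order=0)  # order 0: shortfall probability
sv = lpm(scen, target=0.0, order=2)       # order 2: target semivariance
```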
Among the most renowned downside risk measures, we underline the
Semivariance (SV), suggested by Markowitz [209], the Semi-Absolute Deviation
(SAD), proposed by Speranza [288], the Conditional Value-at-Risk (CVaR), proposed
by Rockafellar and Uryasev [266][267] and the Conditional Drawdown-at-Risk (CDaR),
suggested by Chekhlov, Uryasev and Zabarankin [49][50]. In particular, the CVaR, also
known as Average Value-at-Risk (AVaR) or Expected Shortfall (ES), has been quite
popular in recent practical applications according to Xiong and Idzorek [311],
Krokhmal, Palmquist and Uryasev [187], and Krokhmal, Zabarankin and Uryasev
[188], because it captures tail risks reasonably well and is computationally easy to optimize.
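For equally weighted loss scenarios, the empirical CVaR at level α is just the average of the worst (1 − α) fraction of losses, as in this sketch (the loss values are hypothetical; the Rockafellar-Uryasev formulation additionally makes CVaR minimizable by linear programming):

```python
def cvar(losses, alpha):
    """Empirical CVaR (expected shortfall) at level alpha for equally
    weighted loss scenarios: the mean of the worst (1 - alpha) fraction."""
    srt = sorted(losses, reverse=True)
    k = max(1, int(round(len(srt) * (1.0 - alpha))))
    return sum(srt[:k]) / k

c = cvar([-0.02, 0.01, 0.05, 0.10, -0.01, 0.03, 0.07, 0.00, 0.04, 0.12], alpha=0.8)
```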
A comprehensive review of these various risk measures, their strengths and
limitations can be found in Dowd [79], Pflug and Römisch [248], Dempster, Pflug and
Mitra [70], and Rachev, Stoyanov, and Fabozzi [257]. Pflug and Römisch [248], in
particular, describe in detail the optimization of two types of risk measures:
acceptability-type and deviation-type functionals, which are especially suited for MSP.
As we discuss in later sections, we use the same approach to accommodate the risk
measures in the multi-period portfolio optimization step.
Unfortunately, a single risk measure is rarely enough to adequately accommodate
all of an investor's goals in most practical portfolio optimization problems, as
pointed out by Roman, Darby-Dowman and Mitra [269]. As an example, variance (or,
correspondingly, standard deviation) is not a coherent risk measure but it is usually a
compulsory objective, from most investors’ perspectives. In such cases, a second (and
coherent) risk measure could be considered in order to manage risk properly. These
real-world cases, where multiple objectives must be evaluated simultaneously, often
lead to Multi-Objective Optimization (MOO) problems.
However, differently from the MSP literature, the MOO literature is somewhat more
recent. There are a few early examples like Jean [156] and Konno, Shirakawa and
Yamazaki [182] but, only recently, improvements in processing power made it possible
to evaluate and explore new avenues and ideas in practical applications. As examples
of promising applications for portfolio optimization, we can mention the Multi-Objective
Ant Colony Algorithm (MOACO), applied by Doerner et al. [78], the Non-Dominated
Sorting Genetic Algorithm (NSGA-II), used by Lin, Wang and Yan [191], and the
Multiple Objective Evolutionary Algorithms (MOEA), entertained by Fieldsend, Matatko
and Peng [101]. We suggest Abraham, Jain and Goldberg [1] and Ehrgott [87] for a
comprehensive review on multicriteria optimization and Jarraya [155] for a recent
survey of MOO applications in asset allocation.
The recent advances in MOO are very appealing but most of them lack a key
feature required for many practical financial applications: they cannot handle multi-
period settings. In fact, the financial literature has barely explored approaches that can
solve stochastic multistage multi-objective problems. Roman, Darby-Dowman and
Mitra [269] and Abdelaziz [1] are among the few examples. Most studies combining
multi-period and multi-objective optimization, such as Fazlollahi and Maréchal [99],
are found in engineering. To contribute on this front, in this thesis we discuss an
additional, but narrowly explored, advantage of MSP approaches: the flexibility to
accommodate multi-objective functions in multi-period portfolio optimization problems.
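As a toy illustration of such a multi-objective trade-off, the sketch below scalarises mean, variance and CVaR into a single weighted-sum cost and searches a coarse grid of two-asset weights. The scenario values and penalty weights are hypothetical, and the thesis's actual model is multi-period and solved by stochastic programming rather than by grid search.

```python
def portfolio_objective(weights, scenarios, lam_var=1.0, lam_cvar=1.0, alpha=0.9):
    """Weighted-sum scalarisation of a single-period Mean-Variance-CVaR
    trade-off: the cost lam_cvar*CVaR + lam_var*variance - mean (lower is better)."""
    rets = [sum(w * r for w, r in zip(weights, s)) for s in scenarios]
    n = len(rets)
    mean = sum(rets) / n
    var = sum((r - mean) ** 2 for r in rets) / n
    losses = sorted((-r for r in rets), reverse=True)
    k = max(1, int(round(n * (1.0 - alpha))))  # worst (1 - alpha) tail
    cvar = sum(losses[:k]) / k
    return lam_cvar * cvar + lam_var * var - mean

# hypothetical two-asset scenario fan: one volatile asset, one stable asset
scenarios = [(0.05, 0.01), (-0.03, 0.02), (0.10, 0.00), (-0.08, 0.015)]
grid = [i / 10 for i in range(11)]
best = min(((w, 1.0 - w) for w in grid),
           key=lambda ws: portfolio_objective(ws, scenarios))
```

Different choices of the penalty weights trace out different noninferior (Pareto) solutions, which is precisely the multi-objective structure discussed above.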
In the next section, we discuss in detail the theoretical background required to
accomplish that.
3 Theoretical Background
In this section we discuss the theoretical framework required to:
(i) Estimate a multivariate probability distribution of daily log returns based
on a GJR-GARCH + EVT (GPD) + t-Copula combined approach that is
able to explain four important stylized facts: heavy-tails, volatility
clustering, leverage effect and tail co-dependence.
(ii) Simulate a sufficiently large tree of multivariate scenarios using the
combined approach, which is assumed to be close to the true underlying
stochastic process governing the multivariate price evolution of the
respective financial instruments.
(iii) Approximate the original large scenario tree by a smaller one with
tractable cardinality and no arbitrage opportunities.
(iv) Optimize a Mean-Variance-CVaR portfolio using Stochastic
Programming together with the approximated scenario tree, which is
assumed to describe the parameter uncertainty properly.
Figure 1 – Stochastic Portfolio Optimization Process Overview
Following McNeil, Frey and Embrechts [214], King and Wallace [171], and
Pflug and Pichler [246], before we elaborate on each one of these steps, we start with
some fundamental concepts and definitions.
Definition 1 A σ-algebra (sigma-algebra) $\mathcal{F}$ on a scenario set $\Omega$ is a collection
of subsets of $\Omega$ such that:
(i) $\Omega \in \mathcal{F}$
(ii) $\mathcal{F}$ is closed under complementation (i.e., if $A \in \mathcal{F}$ then $A^{c} \in \mathcal{F}$)
(iii) $\mathcal{F}$ is closed under countable unions (i.e., if $A_1, A_2, \cdots \in \mathcal{F}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$)
(iv) $\mathcal{F}$ is closed under countable intersections (i.e., if $A_1, A_2, \cdots \in \mathcal{F}$ then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{F}$)
The elements of $\mathcal{F}$ are also called events of $\mathcal{F}$. Two important σ-algebras on $\Omega$
are the trivial σ-algebra $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and the complete σ-algebra $\mathcal{F}_C = 2^{\Omega}$, which
comprises the power set (i.e., the collection of all subsets) of $\Omega$. Additionally, given any
collection $\mathcal{C}$ of subsets of $\Omega$, the smallest σ-algebra containing all subsets of $\mathcal{C}$, denoted $\sigma(\mathcal{C})$, is called the σ-algebra generated by $\mathcal{C}$.
Definition 2 Given a scenario set $\Omega$ and a σ-algebra $\mathcal{F}$ on $\Omega$, a measure is a
mapping $\mu$ from $\mathcal{F}$ to the extended real line $\overline{\mathbb{R}} = [-\infty, +\infty]$ such that:
(i) $\mu(\emptyset) = 0$
(ii) $\mu(A) \geq 0, \ \forall A \in \mathcal{F}$
(iii) $\mu$ is countably additive (i.e., if $A_1, A_2, \cdots \in \mathcal{F}$ are pairwise disjoint sets in $\mathcal{F}$
then $\mu\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mu(A_i)$)
Definition 3 Given a scenario set $\Omega$ and a σ-algebra $\mathcal{F}$ on $\Omega$, the pair $(\Omega, \mathcal{F})$ forms a measurable space. A measurable space is simply a set $\Omega$ equipped with a σ-
algebra $\mathcal{F}$, whose elements are called measurable sets.
Definition 4 Given a scenario set $\Omega$, a σ-algebra $\mathcal{F}$ on $\Omega$ and a measure $\mu$, the
triple $(\Omega, \mathcal{F}, \mu)$ forms a measure space.
Definition 5 Given a measure space $(\Omega, \mathcal{F}, \mu)$, the measure $\mu$ is a probability
measure if:
(i) $\mu(A) \in [0,1], \ \forall A \in \mathcal{F}$
(ii) $\mu(\Omega) = 1$
Definition 6 A probability space is a measure space $(\Omega, \mathcal{F}, P)$ such that the
measure $P$ is a probability measure.
Definition 7 Given a probability space $(\Omega, \mathcal{F}, P)$ such that $\Omega$ is a scenario
space, $\mathcal{F}$ is a σ-algebra on $\Omega$ and $P$ is a probability measure, a (real-valued) random
variable $X$ and a (real-valued) random vector $\mathbf{X} = (X_1, \cdots, X_n)$ correspond, respectively,
to the mappings:
$X: \Omega \to \mathbb{R}$ ( 1 )
$\mathbf{X}: \Omega \to \mathbb{R}^n$ ( 2 )
Such that $X$ and $\mathbf{X}$ are $\mathcal{F}$-measurable.
Definition 8 Given a random variable $X$, $F$ and $f$ are, respectively, the
cumulative distribution function and the probability density function of $X$, provided the
derivative of $F$ exists, such that:
$F: \mathbb{R} \to [0,1]$ ( 3 )
$F(x) = \int_{-\infty}^{x} f(t)\,dt = P(X \leq x)$ ( 4 )
Definition 9 Given a random vector $\mathbf{X} = (X_1, \cdots, X_n)$, the multivariate
distribution function $F$ of $X_1, \cdots, X_n$ corresponds to:
$F(x_1, \cdots, x_n) = P(X_1 \leq x_1, \cdots, X_n \leq x_n)$ ( 5 )
Such that the marginal distributions of $F$ are:
$F_i(x_i) = \lim_{x_1, \cdots, x_{i-1}, x_{i+1}, \cdots, x_n \to \infty} F(x_1, \cdots, x_n), \quad i = 1, \cdots, n$ ( 6 )
3.1 Scenario Tree Generation
As we discuss later in detail, an optimization based on stochastic programming
assumes that the underlying stochastic process or probability distribution of the
parameters is known. However, except for very unrealistic cases (i.e., cases that rely
on very strong simplifying assumptions), we cannot easily solve multiperiod stochastic
optimization problems using continuous processes or distributions because they are
functional optimization problems (i.e., solutions are functions, not values).
In practice, we often need to approximate the continuous stochastic process
or probability distribution by a similar (in some statistical sense) process or distribution
that has finitely many discrete scenarios like a scenario tree structure. Scenario trees
offer an intuitive way to represent the underlying stochastic process or probability
distribution, providing an easy structure to accommodate the uncertainty of the relevant
parameters and, in the former case, the way that information is revealed over time. In
other words, we can represent a stochastic process $\xi = (\xi_0, \xi_1, \cdots, \xi_T)$ or a
probability distribution associated to $\xi_T$ with a discrete scenario process represented
by a tree structure.
Definition 10 Scenario trees are acyclic directed graphs comprising nodes
and leaves (i.e., nodes without descendants) that represent discrete conditional
realizations of a particular random n-vector at a given decision stage t. Nodes may
have siblings but no more than one parent. Consequently, scenario trees have only a
single root node. Each node is connected to its descendants by arcs (or branches) and
each arc is associated to a conditional probability of occurrence. Each possible
scenario path (or trajectory) from the root node to a leaf node is also associated to a
(unconditional) probability, which is given by the product of the arc probabilities
pertaining to that path. The number of stages and the branching scheme determine
the topology of the scenario tree. The n-vector values of the root node, which defines
the beginning of the optimization horizon, are assumed to be known.
For instance, following Pflug and Pichler [246] notation, let us assume a
particular scenario tree with $N + 1 = 11$ nodes labelled $\{0, \cdots, 10\}$ and $T = 2$ stages $\{\mathcal{N}_0, \mathcal{N}_1, \mathcal{N}_2\}$ such that:
Figure 2 – Scenario Tree Structure Example
Note that each scenario corresponds to a specific scenario path (or trajectory):
$\omega_1$ → Path 0, 1 and 4
$\omega_2$ → Path 0, 1 and 5
$\omega_3$ → Path 0, 2 and 6
$\omega_4$ → Path 0, 2 and 7
$\omega_5$ → Path 0, 2 and 8
$\omega_6$ → Path 0, 3 and 9
$\omega_7$ → Path 0, 3 and 10
In this example, besides the root node stage $\mathcal{N}_0 = \{0\}$ we have two other
stages: $\mathcal{N}_1 = \{1,2,3\}$ and $\mathcal{N}_2 = \{4,5,6,7,8,9,10\}$. Moreover, for a given node $n \in \{0, \cdots, 10\}$, we define that:
$n \in \mathcal{N}_t \Rightarrow \mathrm{pred}(n) = n_- \in \mathcal{N}_{t-1}$ (parent node) ( 7 )
$n \in \mathcal{N}_t,\ m \in \mathcal{N}_{t+1},\ \mathrm{pred}(m) = n \Rightarrow m \in n_+$ (children nodes) ( 8 )
$n \in \mathcal{N}_t,\ m \in \mathcal{N}_s,\ s < t \Rightarrow \mathrm{pred}^{s}(n) = m \Rightarrow n \succ m$ (all descendant nodes) ( 9 )
Each leaf node $n \in \mathcal{N}_T$ is associated with a possible scenario $\omega_i \in \mathbb{R}^d$, $i = 1, \cdots, 7$, which has an unconditional probability $P(n) = P(\omega_i)$. Consequently, each non-
leaf node $m \in \mathcal{N}_t$, $t < T$, has probability:
$P(m) = \sum_{n \succ m} P(n)$ ( 10 )
Moreover, given the conditional probability $P(n \mid n_-)$ between nodes $n$ and $n_-$
(its parent node), we have:
$P(n \mid n_-) = \dfrac{P(n, n_-)}{P(n_-)} = \dfrac{P(n)}{P(n_-)}$ ( 11 )
$P(n) = \prod_{m:\, n \succ m} P(m \mid m_-), \quad n \in \mathcal{N}_T$ ( 12 )
$P(n = 0) = 1$ (root node is known) ( 13 )
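To make equations ( 10 )–( 13 ) concrete, the short Python sketch below computes the unconditional node probabilities for the 11-node example tree; the conditional arc probabilities are hypothetical values chosen only for illustration.

```python
# Sketch of equations (10)-(13) on the 11-node example tree; the arc
# probabilities below are hypothetical, chosen only for illustration.
parent = {1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 2, 7: 2, 8: 2, 9: 3, 10: 3}
cond_p = {1: 0.5, 2: 0.3, 3: 0.2,          # arcs leaving the root
          4: 0.6, 5: 0.4,                  # arcs leaving node 1
          6: 0.5, 7: 0.3, 8: 0.2,          # arcs leaving node 2
          9: 0.7, 10: 0.3}                 # arcs leaving node 3

def path_prob(leaf):
    """Unconditional probability of a leaf: the product of the arc
    probabilities along the path back to the root (equation 12)."""
    p = 1.0
    while leaf != 0:                       # P(root) = 1 (equation 13)
        p *= cond_p[leaf]
        leaf = parent[leaf]
    return p

leaves = [4, 5, 6, 7, 8, 9, 10]
P = {n: path_prob(n) for n in leaves}
# Equation (10): a non-leaf node's probability is the sum over its leaves
P[1] = P[4] + P[5]
assert abs(sum(P[n] for n in leaves) - 1.0) < 1e-12
```

Note that the leaf probabilities necessarily sum to one, since each stage's conditional probabilities do.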
Many simulation approaches applied in financial applications, such as the one used
in this thesis, often generate a scenario fan, which is a particular case of
a general scenario tree, rather than a traditional scenario tree. Scenario fans are usually easier
to generate but less efficient (in the sense that they require too many nodes). Once a
scenario fan is simulated, we can approximate it by a smaller scenario tree using
optimal discretization, as we discuss in the next section.
Definition 11 A scenario fan is a scenario tree whose nodes, except for the
root and the leaf nodes, have a single descendant node.
Figure 3 – Scenario Fan Example
A key aspect of any scenario tree structure is that its topology governs how
information is revealed at each stage. Consequently, the tree is implicitly linked to the
natural filtration associated to the underlying stochastic process and, as pointed out by
Pflug and Pichler [246], it can be assumed to be equivalent to the pertaining filtered
probability space.
Definition 12 Given a probability space $(\Omega, \mathcal{F}, P)$, an increasing sequence of
σ-algebras $\mathcal{F} = (\mathcal{F}_0, \mathcal{F}_1, \cdots, \mathcal{F}_T)$ on $\Omega$ is a natural filtration if $\mathcal{F}_t \subseteq \mathcal{F}_{t+1} \subseteq \mathcal{F}$ for $t = 0, 1, \cdots, T-1$. A filtration represents the evolution of the available information up to time $t$. Additionally, the probability space $(\Omega, \mathcal{F}, P)$ endowed with a filtration $\mathcal{F}$ is called a
filtered probability space.
Definition 13 A stochastic process $X = (X_0, X_1, \cdots, X_T)$ on $(\Omega, \mathcal{F}, P)$ is said to
be adapted to the filtration $\mathcal{F}$ if $X_t$ is $\mathcal{F}_t$-measurable for $t \geq 0$. It means that the
stochastic process at time $t$ depends only on the past information imposed by $\mathcal{F}_t$. This
property is known as a nonanticipativity constraint. Moreover, any stochastic process $X$ is adapted to its natural filtration (i.e., its history process).
For instance, in the previous example suppose that the σ-algebras $\mathcal{F}_t$ are
generated by the possible scenarios (i.e., atoms) associated with the nodes pertaining
to each stage of the tree. Consequently, the filtration $\mathcal{F} = (\mathcal{F}_0, \mathcal{F}_1, \mathcal{F}_2)$ comprises:
$\mathcal{F}_0 = \sigma(\{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6, \omega_7\}) = \sigma(\Omega)$
$\mathcal{F}_1 = \sigma(\{\omega_1, \omega_2\}, \{\omega_3, \omega_4, \omega_5\}, \{\omega_6, \omega_7\})$
$\mathcal{F}_2 = \sigma(\{\omega_1\}, \{\omega_2\}, \{\omega_3\}, \{\omega_4\}, \{\omega_5\}, \{\omega_6\}, \{\omega_7\}) = 2^{\Omega}$
Figure 4 – Link between a Filtration and a Scenario Tree
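The stage-by-stage refinement of the atoms can be checked mechanically; the sketch below encodes each $\mathcal{F}_t$ of the example by the partition of the seven scenarios that generates it (scenario indices are used in place of the $\omega_i$ symbols).

```python
# Sketch of the filtration example: each stage partitions the seven
# scenarios into atoms, and later partitions refine earlier ones.
partitions = [
    [{1, 2, 3, 4, 5, 6, 7}],                    # F_0 = sigma(Omega)
    [{1, 2}, {3, 4, 5}, {6, 7}],                # F_1: one atom per node
    [{1}, {2}, {3}, {4}, {5}, {6}, {7}],        # F_2 = power set
]

def refines(finer, coarser):
    """True if every atom of the finer partition sits inside one coarser atom."""
    return all(any(a <= b for b in coarser) for a in finer)

assert refines(partitions[1], partitions[0])
assert refines(partitions[2], partitions[1])
```

The refinement property is exactly the inclusion $\mathcal{F}_t \subseteq \mathcal{F}_{t+1}$ of Definition 12, expressed at the level of atoms.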
As $t$ increases, additional information is revealed according to the structure
imposed by the scenario tree and the natural filtration of the underlying stochastic
process. There is a one-to-one relationship between them.
For multistage problems in particular, this information structure is crucial.
Although two stochastic processes may exhibit the same discrete distribution at
the final stage (i.e., the same scenarios and unconditional probabilities), they might not
reveal information in the same way, which may lead to different optimal solutions. To
see that, let us consider another example. Suppose we have the following scenario
trees A and B:
Figure 5 – Scenario Tree Comparison – Tree A
Figure 6 – Scenario Tree Comparison – Tree B
Both scenario trees lead to the same multivariate discrete distribution (i.e.,
same realizations with identical unconditional probabilities) at the final stage $T = 2$.
However, the underlying stochastic processes related to trees A and B are different.
To see that, just consider the node $n = 1$ in both scenario trees. On scenario tree A,
node 1 leads to one single outcome and, consequently, there is no uncertainty
regarding the expected scenario at stage $T$ (i.e., the realized scenario is
acknowledged before stage $T$). Conversely, on scenario tree B, node 1 leads to three
possible scenarios. Therefore, the realized scenario is not known until an event
(conditional on node 1) at stage $T$ materializes. In that case, the optimal solutions for
the underlying stochastic processes represented by scenario trees A and B are likely
to be different (depending on the values sitting on each node). Hence, any suitable optimal
discretization process should take into account the (dis)similarity between the
information structures (i.e., the natural filtrations) pertaining to the original and
approximated scenarios (i.e., discrete stochastic processes).
Another famous example in the literature highlighting the importance of the
information structure (i.e., the way information is revealed as the process evolves)
involves a fair coin tossed three times. In this example, one dollar is paid if (A) a head
appears at least two times or if (B) a head appears at the last toss. The following scenario
trees represent these payoffs.
Figure 7 – Scenario Tree for Coin Bet A
Figure 8 – Scenario Tree for Coin Bet B
After the last toss, the payoff distributions for both bets are equal, but the
payoff for bet A can be anticipated (i.e., can be revealed before the last toss) for some
scenarios. Therefore, earlier tosses might provide additional (and relevant) information
for bet A. Accordingly, the information structures of these bets are not the same and
any optimal decision based on them should consider this dissimilarity.
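The coin-toss example can be verified by brute force; the Python sketch below enumerates the eight equally likely paths and confirms that the two bets share the same terminal payoff distribution while only bet A can be resolved early on some paths.

```python
from itertools import product

# Sketch of the coin-toss example: same terminal payoff distribution,
# different information structure.
outcomes = list(product("HT", repeat=3))                   # 8 equally likely paths

payoff_a = {w: int(w.count("H") >= 2) for w in outcomes}   # bet A: >= 2 heads
payoff_b = {w: int(w[-1] == "H") for w in outcomes}        # bet B: last toss is a head

# Identical unconditional payoff distributions at the final stage:
assert sorted(payoff_a.values()) == sorted(payoff_b.values())

# But bet A is decided after two tosses on some paths:
# after HH the payoff is 1 whatever the last toss; after TT it is 0.
assert {payoff_a[("H", "H", x)] for x in "HT"} == {1}
assert {payoff_a[("T", "T", x)] for x in "HT"} == {0}
# Bet B is never resolved before the last toss:
assert all(payoff_b[w[:2] + ("H",)] != payoff_b[w[:2] + ("T",)] for w in outcomes)
```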
These examples show that the evolution of information disclosure is
paramount to any decision process. Obviously, we also require the content (i.e., the
values) of the information disclosed since decisions are based on both the possible
values and their (conditional) probabilities.
Fortunately, a scenario tree can describe both the information process and the
value process. The information (or tree) process $Y$, related to the filtration $\mathcal{F}$,
corresponds to the tree structure that establishes how information is revealed. The
discrete value process $X$ corresponds to the values or vectors sitting in the tree nodes
that comprise the revealed content. The pair $(X, Y)$ is called a (scenario) value-and-
information pair.
Definition 14 A tree process is a stochastic process $Y = (Y_0, Y_1, \cdots, Y_T)$ with
values on some (Polish) state space $\mathcal{N}_t$, $t = 0, \cdots, T$, such that:
(i) $\mathcal{N}_0$ is a singleton
(ii) the $\mathcal{N}_t$ are pairwise disjoint
(iii) $\sigma(Y_t) = \sigma(Y_0, \cdots, Y_t), \quad t = 0, \cdots, T$
(iv) $\mathcal{F} = \sigma(Y) = \left(\sigma(Y_0), \sigma(Y_1), \cdots, \sigma(Y_T)\right)$ is a filtration
A tree process $Y = (Y_0, Y_1, \cdots, Y_T)$ can be constructed using the history process
of any stochastic process $X = (X_0, X_1, \cdots, X_T)$:
$Y_0 = X_0$
$Y_1 = (X_0, X_1)$
$\vdots$
$Y_T = (X_0, X_1, \cdots, X_T)$
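The history-process construction can be sketched in a few lines of Python; the numeric trajectory below is arbitrary and serves only as an example.

```python
# Sketch: the history (tree) process of one trajectory of X is the tuple
# of values observed so far, Y_t = (X_0, ..., X_t); the path is arbitrary.
def history_process(path):
    return [tuple(path[: t + 1]) for t in range(len(path))]

Y = history_process([0.0, 1.2, -0.4])   # one hypothetical trajectory of X
```

Two trajectories sharing the same history up to time $t$ are mapped to the same value of $Y_t$, which is precisely how a scenario tree merges paths into nodes.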
Assuming that all state spaces $\mathcal{N}_t$ of the tree process $Y$ are Polish and given
general Borel sets $B_t$, $t = 1, \cdots, T$, Shiryaev [283] demonstrates that we can compute
the joint distribution $P$ of the tree process $Y = (Y_0, Y_1, \cdots, Y_T)$ using its conditional
probabilities:
$P_0(B_0) = P(Y_0 \in B_0)$
$P_1(B_1 \mid y_0) = P(Y_1 \in B_1 \mid Y_0 = y_0)$
$\vdots$
$P_t(B_t \mid y_0, \cdots, y_{t-1}) = P(Y_t \in B_t \mid Y_0 = y_0, Y_1 = y_1, \cdots, Y_{t-1} = y_{t-1})$
In addition, Shiryaev [283] also shows that given a filtration $\mathcal{F} = \sigma(Y)$, a
stochastic process $X$ adapted to $\mathcal{F}$ and measurable functions $f_t$, we have:
$X_t = f_t(Y_t)$ ( 14 )
Finally, a tree process induces a probability distribution $P$ on $\mathcal{N}_T$. Therefore, if
we choose $\Omega = \mathcal{N}_T$, we get a filtration $\mathcal{F} = \sigma(Y) = \left(\sigma(Y_0), \cdots, \sigma(Y_T)\right) = (\mathcal{F}_0, \cdots, \mathcal{F}_T)$ on $\Omega$, which leads to a filtered probability space $(\Omega, \mathcal{F}, P)$.
Definition 15 Given two tree processes $Y$ and $\tilde{Y}$ defined on possibly different
filtered probability spaces $(\Omega, \mathcal{F}, P)$ and $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P})$, Borel sets $B_t$, $t = 1, \cdots, T$, and
conditional probabilities such that:
$P_t(B_t \mid y_0, \cdots, y_{t-1}) = P(Y_t \in B_t \mid Y_0 = y_0, Y_1 = y_1, \cdots, Y_{t-1} = y_{t-1}), \quad t = 1, \cdots, T$ ( 15 )
$\tilde{P}_t(B_t \mid y_0, \cdots, y_{t-1}) = \tilde{P}(\tilde{Y}_t \in B_t \mid \tilde{Y}_0 = y_0, \tilde{Y}_1 = y_1, \cdots, \tilde{Y}_{t-1} = y_{t-1}), \quad t = 1, \cdots, T$ ( 16 )
The tree processes are equivalent if:
(i) There are bijective functions $k_1, \cdots, k_T$ mapping the state spaces $\mathcal{N}_t$ of $Y_t$ to the state spaces $\tilde{\mathcal{N}}_t$ of $\tilde{Y}_t$ such that the processes $(Y_1, \cdots, Y_T)$ and $\left(k_1^{-1}(\tilde{Y}_1), \cdots, k_T^{-1}(\tilde{Y}_T)\right)$ have the same distribution.
(ii) $\tilde{P}$ is the pushforward measure of $P$ for the measurable functions $k_t$:
$\tilde{P}_1(B_1) = P_1\left(k_1^{-1}(B_1)\right)$
$\tilde{P}_t(B_t \mid y_1, \cdots, y_{t-1}) = P_t\left(k_t^{-1}(B_t) \mid k_1^{-1}(y_1), \cdots, k_{t-1}^{-1}(y_{t-1})\right), \quad t = 2, \cdots, T$
Figure 9 – Pushforward Measure of P
Finally,
Definition 16 Given a tree process $Y$ and a value process $X$ such that $X$ is
adapted to the filtration $\mathcal{F}$ generated by $\sigma(Y)$, the pair $(X, Y)$ is called a scenario value-
and-information pair.
Definition 17 Given two scenario value-and-information pairs $(X, Y)$ and $(\tilde{X}, \tilde{Y})$
such that $X_t = f_t(Y_t)$ and $\tilde{X}_t = \tilde{f}_t(\tilde{Y}_t)$, the pairs are equivalent if:
(i) Tree processes $Y$ and $\tilde{Y}$ are equivalent (in the sense of Definition 15).
(ii) $\tilde{f}_t = f_t \circ k_t^{-1}$ a.s. for the equivalent mappings $k_t$.
These definitions indicate that scenario value-and-information pairs and
scenario trees are interchangeable. Consequently, given a multi-period stochastic
optimization problem and two equivalent scenario value-and-information pairs, we
should achieve similar solutions.
In addition, as we discuss later, a scenario value-and-information pair can also
be described by a nested distribution, a concept proposed by Pflug [243] and further
discussed by Pflug and Pichler [245]. The nested distribution and the related process
distance concepts are particularly useful when dealing with approximations of
stochastic processes rather than probability measures (i.e., approximations in multi-
period settings rather than single-period settings) because they do not require two separate
measures of dissimilarity (i.e., one for distributions and another for filtrations) to govern
the quality of the approximation.
However, before we delve into how we can perform such multi-period
approximations using process distances, in the next subsections we cover the
theoretical background required to simulate a (usually large) multivariate scenario tree
that properly describes real financial series. They discuss how to (i) elaborate and
calibrate a combined econometric model capable of simulating the four major financial
stylized facts discussed earlier: heavy-tails, volatility clustering, leverage effect and tail
co-dependence, and (ii) generate a large set of multivariate trajectories of log returns,
represented by a scenario fan, based on this combined econometric model.
3.1.1 GARCH Models
As we discussed in the previous section, Generalized Autoregressive
Conditional Heteroscedastic (GARCH) models are particularly appealing because they
can explain reasonably well key stylized facts of financial series like volatility clustering,
leverage effects and, to some extent, heavy-tails.
Despite some quite interesting nonlinear and regime-switching extensions
proposed recently, in this study we focus on simpler asymmetric GARCH models that
are more tractable and can still properly accommodate these key statistical features.
A simple GARCH(p,q) process entails:
$\varepsilon_t = r_t - E_{t-1}[r_t]$ ( 17 )
$\varepsilon_t = \eta_t \sigma_t$ ( 18 )
$E[\varepsilon_t] = 0, \quad E[\varepsilon_t \varepsilon_{t-s}] = 0 \text{ for } s \neq 0$
$\sigma_t^2 = \alpha_0 + \sum_{j=1}^{q} \alpha_j \varepsilon_{t-j}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2$ ( 19 )
$\alpha_0 > 0, \quad \alpha_j \geq 0, \ j = 1, \cdots, q, \quad \beta_i \geq 0, \ i = 1, \cdots, p$
$\sum_{j=1}^{q} \alpha_j + \sum_{i=1}^{p} \beta_i < 1$
Where $\{r_t\}$ is a series of daily log returns, $\{\eta_t\}$ is a white noise sequence and $\sigma_t^2$ is the conditional variance at time $t$. The restrictions in equation ( 19 ) ensure that
the conditional variance is positive and that the unconditional variance is finite:
$\mathrm{var}(\varepsilon_t) = E[\varepsilon_t^2] = \dfrac{\alpha_0}{1 - \left(\sum_{j=1}^{q} \alpha_j + \sum_{i=1}^{p} \beta_i\right)}$ ( 20 )
The conditional mean $E_{t-1}[r_t]$ is usually specified as a low-order ARMA model.
For daily log returns in particular, empirical evidence suggests that serial correlation in
raw returns is often statistically insignificant, leading to no relevant autoregressive term
in the conditional mean specification.
Using a basic GARCH(1,1) formulation as a reference:
$\sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2$
$\alpha_0 > 0, \quad \alpha_1 > 0, \quad \beta_1 > 0$
$\alpha_1 + \beta_1 < 1$
( 21 )
We can see how volatility clustering and, to some extent, heavy tails can be
explained. According to Bauwens, Hafner and Laurent [20], most estimates of the
coefficients $\alpha_1$ and $\beta_1$ lie, respectively, in the ranges [0.02, 0.05] and [0.75, 0.98]. We
notice that for these values of $\alpha_1$ and $\beta_1$, a large value $\sigma_{t-1}^2$ prompts a large value $\sigma_t^2$
and a small value of $\sigma_{t-1}^2$ induces a small value $\sigma_t^2$, provided $\varepsilon_{t-1}^2$ is not too large.
Consequently, the volatility clustering observed in daily financial returns can be
reasonably explained by estimating appropriate values for $\alpha_1$ and $\beta_1$. Besides, provided $1 - 2\alpha_1^2 - (\alpha_1 + \beta_1)^2 > 0$ and assuming that $\eta_t$ follows a Gaussian distribution, He [131]
shows that
$\kappa_4 = \dfrac{E[\varepsilon_t^4]}{\left(E[\varepsilon_t^2]\right)^2} = \dfrac{3\left[1 - (\alpha_1 + \beta_1)^2\right]}{1 - (\alpha_1 + \beta_1)^2 - 2\alpha_1^2} > 3$ ( 22 )
That means that a GARCH(1,1) process also induces a leptokurtic
unconditional distribution, i.e., an unconditional distribution that exhibits heavier tails
than those suggested by a Gaussian distribution. Unfortunately, assuming Gaussian
innovations, the induced unconditional kurtosis is not large enough to fully explain the
tail behaviour of daily financial returns. There are, however, alternatives: (i) we can
assume innovations with Student t-distributions, as recommended by Bollerslev [35],
or Generalized Error Distributions (GED), as suggested by Nelson [232]; or (ii) we can
combine GARCH models with Extreme Value Theory (EVT) as proposed by McNeil
and Frey [213]. We shall explore the latter alternative in the next section.
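To make the clustering and kurtosis arguments concrete, the following pure-Python sketch simulates a GARCH(1,1) path with Gaussian innovations; the coefficients are hypothetical ($\alpha_1$ is set a bit above the typical range quoted above so the effects stand out), and the code is illustrative rather than the estimation procedure used in this thesis.

```python
import math
import random

# Illustrative GARCH(1,1) simulation (equation 21) with Gaussian
# innovations; coefficients are hypothetical, alpha1 chosen slightly
# above the typical [0.02, 0.05] range so the effects are visible.
def simulate_garch11(n, a0=1e-6, a1=0.10, b1=0.85, seed=42):
    rng = random.Random(seed)
    var = a0 / (1 - a1 - b1)                # start at the unconditional variance
    eps = []
    for _ in range(n):
        e = rng.gauss(0.0, math.sqrt(var))
        eps.append(e)
        var = a0 + a1 * e ** 2 + b1 * var   # conditional variance recursion
    return eps

eps = simulate_garch11(50_000)
m2 = sum(e ** 2 for e in eps) / len(eps)
m4 = sum(e ** 4 for e in eps) / len(eps)
kurtosis = m4 / m2 ** 2                     # equation (22) predicts > 3
acf1 = (sum((eps[i] ** 2 - m2) * (eps[i + 1] ** 2 - m2)
            for i in range(len(eps) - 1))
        / sum((e ** 2 - m2) ** 2 for e in eps))   # clustering in eps^2
```

Despite conditionally Gaussian innovations, the simulated series exhibits excess kurtosis and positively autocorrelated squared returns, i.e., volatility clustering.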
Parameters of GARCH models are often estimated using Maximum-Likelihood
Estimation (MLE) or Quasi-Maximum-Likelihood Estimation (QMLE), depending on the
choice of the innovation distribution. Unfortunately, as stressed in Andersen et al. [7],
GARCH log-likelihood functions are often ill-behaved and poor choices of starting
values can lead to convergence problems. Moreover, constraints such as positive
variance can compromise optimization performance. As a consequence, simpler
specifications are often preferred in practical applications.
These simpler specifications comprise, for example, early formulations of
asymmetric GARCH models like the EGARCH and GJR-GARCH models. They
are also widely used because they can explain a third important stylized fact: the
leverage effect.
The EGARCH(p,q) process substitutes equation ( 19 ) for:
$\ln \sigma_t^2 = \alpha_0 + \sum_{j=1}^{q} \left[ \alpha_j \left( \left| \dfrac{\varepsilon_{t-j}}{\sigma_{t-j}} \right| - E\left| \dfrac{\varepsilon_{t-j}}{\sigma_{t-j}} \right| \right) + \delta_j \dfrac{\varepsilon_{t-j}}{\sigma_{t-j}} \right] + \sum_{i=1}^{p} \beta_i \ln \sigma_{t-i}^2$ ( 23 )
The additional standardized terms can capture potential asymmetric impacts
of innovations. For instance, if $\delta_j$ is negative, bad news has a larger impact on volatility.
Besides, the logarithmic transformation guarantees a positive variance and relaxes the
positivity constraint requirements. On the other hand, EGARCH processes might lead
to infinite variance under non-Gaussian innovations (for example, with Student-t
innovations), as underlined by Nelson [232]. In addition, conditional variance
forecasts are biased according to Jensen's inequality:
$E[\sigma_t^2] \geq e^{E[\ln \sigma_t^2]}$
Alternatively, the GJR-GARCH formulation substitutes equation ( 19 ) for:
"Z� = i� + c i�|Z:������ + c ��"Z:���
��� + c 8��o|Z:� < 0p|Z:������ ( 24 )
i� > 0, i� ≥ 0, 8� ≥ 0, n = 1 ⋯ �, �� ≥ 0, C = 1 ⋯ W
Where $I(\cdot)$ is an indicator function such that:
$I\left(\varepsilon_{t-j} < 0\right) = \begin{cases} 1 & \text{if } \varepsilon_{t-j} < 0 \\ 0 & \text{if } \varepsilon_{t-j} \geq 0 \end{cases}$ ( 25 )
Notice that potential asymmetric impacts on the conditional variance are
explained by the last term. In this specification, we need to impose restrictions on the
coefficients to ensure a positive conditional variance. Moreover, GJR-GARCH
processes have more stringent stationarity conditions than EGARCH processes, as
discussed by Ling and McAleer [192].
For example, a GJR-GARCH(1,1) process is stationary and has finite variance
provided:
$\alpha_1 + \beta_1 + \dfrac{\delta_1}{2} < 1$ ( 26 )
Moreover, the fourth moment exists if:
$\beta_1^2 + 2\alpha_1\beta_1 + \kappa_\eta\alpha_1^2 + \beta_1\delta_1 + \kappa_\eta\left(\alpha_1\delta_1 + \dfrac{\delta_1^2}{2}\right) < 1$ ( 27 )
$\kappa_\eta = \begin{cases} 3 & \text{if } \eta_t \sim N(0,1) \\ \dfrac{3(\nu-2)}{\nu-4} & \text{if } \eta_t \sim t_\nu(0,1) \end{cases}$
To the best of our knowledge, there are few studies comparing the performance
of EGARCH and GJR-GARCH models. Engle and Ng [96], and Peters [240]
recommend the GJR-GARCH over the EGARCH specification while Alberg, Shalit and
Yosef [1] suggest otherwise. Without loss of generality, we follow the former and focus
on the GJR-GARCH formulation in our analysis.
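The GJR-GARCH(1,1) recursion and its leverage effect can be sketched directly from equations ( 24 )–( 26 ); the coefficients below are hypothetical, chosen only to satisfy the stationarity condition.

```python
# Sketch of the GJR-GARCH(1,1) recursion (equations 24-26); the
# coefficients are hypothetical, chosen only to satisfy the
# stationarity condition alpha1 + beta1 + delta1/2 < 1.
a0, a1, b1, d1 = 1e-6, 0.03, 0.90, 0.08
assert a1 + b1 + d1 / 2 < 1              # equation (26)

def next_variance(eps_prev, var_prev):
    """One step of equation (24): negative shocks add delta1 * eps^2."""
    leverage = d1 * eps_prev ** 2 if eps_prev < 0 else 0.0
    return a0 + a1 * eps_prev ** 2 + b1 * var_prev + leverage

v = a0 / (1 - a1 - b1 - d1 / 2)          # unconditional variance
# A negative shock raises tomorrow's variance more than a positive one
# of the same size: the leverage effect.
assert next_variance(-0.02, v) > next_variance(+0.02, v)
```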
3.1.2 Extreme Value Theory
Although some models of the GARCH family can adequately emulate
important key stylized facts such as volatility clustering and leverage effects, they are
often still unable to fully model heavy-tailed distributions. To accomplish that, an
additional technique such as Extreme Value Theory (EVT) may be used to handle tails
properly, where very few observations are usually available.
EVT encompasses two major approaches to model extreme returns: Block
Maxima (BM) and Peaks over Threshold (POT), which use, respectively, two fundamental
results from EVT. We briefly discuss both approaches next.
Block Maxima Method
The central result underlying block maxima methods is the Fisher-Tippett-
Gnedenko theorem, initially proposed by Fisher and Tippett [104], and fully proved by
Gnedenko [114]. It shows that, under certain conditions, the asymptotic distribution for
the series of maxima (minima) converges to the Weibull, Gumbel or Fréchet
distributions, independently of the distribution of the series.
Theorem 1 (Fisher-Tippett-Gnedenko theorem) Let $(X_1, \cdots, X_n)$ be a sequence
of independent and identically distributed random variables with common distribution
function $F$, $M_n = \max(X_1, \cdots, X_n)$ the maximum order statistic and $P(M_n \leq x) = P(X_1 \leq x, \cdots, X_n \leq x) = [F(x)]^n$. If there exist constants $\sigma_n, \mu_n \in \mathbb{R}$, $\sigma_n > 0$, and a non-
degenerate distribution function $H$ such that:
$\lim_{n \to \infty} P\left( \dfrac{M_n - \mu_n}{\sigma_n} \leq x \right) = \lim_{n \to \infty} \left[F(\sigma_n x + \mu_n)\right]^n = H(x)$ ( 28 )
Then $H$ converges in distribution to one of the three limiting families:
Gumbel (or Type I): $H(x) = e^{-e^{-x}}, \quad -\infty < x < +\infty$ ( 29 )
Fréchet (or Type II): $H(x) = \begin{cases} e^{-(1+\xi x)^{-1/\xi}} & \text{if } x > -\frac{1}{\xi} \\ 0 & \text{otherwise} \end{cases}$ ( 30 )
Weibull (or Type III): $H(x) = \begin{cases} e^{-(1+\xi x)^{-1/\xi}} & \text{if } x < -\frac{1}{\xi} \\ 1 & \text{otherwise} \end{cases}$ ( 31 )
In other words, the Fisher-Tippett-Gnedenko theorem shows that if $F$ belongs
to the maximum domain of attraction (MDA) of $H$ for some non-degenerate distribution
function $H$, then $H$ must converge weakly to one of these distributions of properly
normalized maxima. Moreover, these limiting distributions are particular cases of the
Generalized Extreme Value (GEV) distribution (Figure 10), introduced by von Mises
[221] and Jenkinson [157]:
$H_\xi(x) = \begin{cases} e^{-(1+\xi x)^{-1/\xi}} & \text{if } \xi \neq 0 \\ e^{-e^{-x}} & \text{if } \xi = 0 \end{cases} \quad \text{such that } 1 + \xi x > 0$ ( 32 )
Figure 10 – Generalized Extreme Value Distributions
The shape parameter $\xi$ determines the corresponding limiting distribution for
the tail and, therefore, its thickness. It is also common to refer to the tail behaviour
using the tail index $\alpha = \xi^{-1}$. If $\xi < 0$, $H_\xi$ converges to the Weibull family that includes
distributions with finite support like the beta distribution. If $\xi = 0$, $H_\xi$ converges to the
Gumbel family that encompasses distributions like the normal, log-normal and
hyperbolic whose right tails exhibit an exponential decay (i.e., thin-tailed behaviour).
Finally, if $\xi > 0$, $H_\xi$ converges to the Fréchet family that comprehends, for example,
Student t, Pareto and stable distributions whose right tails follow a polynomial decay
(i.e., heavy-tailed behaviour).
Given that daily financial series of returns typically display heavy tails as
empirical research suggests, the Fréchet distribution seems a suitable candidate to
model the asymptotic behaviour of their tails, even if we do not know, as is usually the
case, the actual distributions of the financial series.
As we do not know a priori the normalized maxima $x$ required by $H_\xi(x)$, we
need to identify, besides the shape parameter $\xi$, the scale $\sigma$ and location $\mu$ parameters.
Differently from the former, the latter two often depend on the distribution of the
financial series. Therefore, we use an alternative definition for the GEV distribution:
$H_{\xi,\sigma,\mu}(x) = \begin{cases} e^{-\left(1+\xi \frac{x-\mu}{\sigma}\right)^{-1/\xi}} & \text{if } \xi \neq 0 \\ e^{-e^{-\frac{x-\mu}{\sigma}}} & \text{if } \xi = 0 \end{cases} \quad \text{such that } 1 + \xi \dfrac{x-\mu}{\sigma} > 0$ ( 33 )
Then, we can apply a parametric approach like maximum likelihood to
estimate these parameters, as suggested by Embrechts, Klüppelberg and Mikosch [90]
and Tsay [301]. First, we divide the sample into $m$ non-overlapping subperiods of length $k$. Then we select the maximum return in each subperiod and construct a sequence of
maximal returns $\{M_i\}_{i=1,\cdots,m}$, assumed to have a $H_\xi(x)$ distribution. Supposing that the
sequence of maximal returns is independent and identically distributed, the likelihood
function is:
$L(M_1, \cdots, M_m \mid \xi, \sigma, \mu) = \prod_{i=1}^{m} h_{\xi,\sigma,\mu}(M_i)$ ( 34 )
$h_{\xi,\sigma,\mu}(M_i) = \begin{cases} \dfrac{1}{\sigma}\left(1+\xi\dfrac{M_i-\mu}{\sigma}\right)^{-\frac{1+\xi}{\xi}} e^{-\left(1+\xi\frac{M_i-\mu}{\sigma}\right)^{-1/\xi}} & \text{if } \xi \neq 0 \\ \dfrac{1}{\sigma}\, e^{-\frac{M_i-\mu}{\sigma} - e^{-\frac{M_i-\mu}{\sigma}}} & \text{if } \xi = 0 \end{cases}$ ( 35 )
And, consequently, the log-likelihood function is:
$\ell(M_1, \cdots, M_m \mid \xi, \sigma, \mu) = \sum_{i=1}^{m} \ln h_{\xi,\sigma,\mu}(M_i)$ ( 36 )
$\ln h_{\xi,\sigma,\mu}(M_i) = \begin{cases} -\ln \sigma - \dfrac{1+\xi}{\xi} \ln\left(1+\xi\dfrac{M_i-\mu}{\sigma}\right) - \left(1+\xi\dfrac{M_i-\mu}{\sigma}\right)^{-\frac{1}{\xi}} & \text{if } \xi \neq 0 \\ -\ln \sigma - \dfrac{M_i-\mu}{\sigma} - e^{-\frac{M_i-\mu}{\sigma}} & \text{if } \xi = 0 \end{cases}$ ( 37 )
Embrechts, Klüppelberg and Mikosch [90] and Hosking [143] discuss potential
numerical procedures to find the maximum-likelihood estimates for the shape, scale
and location parameters of the GEV distribution. Based on Smith [284], we can show
that these estimates are consistent and asymptotically efficient as long as $\xi > -1/2$.
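As an illustration of equations ( 36 )–( 37 ), the pure-Python sketch below evaluates the GEV log-likelihood ($\xi \neq 0$ branch) on synthetic maxima drawn from a known GEV distribution and recovers the shape parameter by a crude grid search; the sample size, the grid, and the fixed scale and location are all hypothetical choices made only for illustration.

```python
import math
import random

# Pure-Python sketch of the GEV log-likelihood (equations 36-37) and a
# crude grid search over the shape parameter; all numbers are synthetic.
def gev_sample(n, xi, sigma, mu, seed=1):
    rng = random.Random(seed)
    # Inverse-CDF sampling from H_{xi,sigma,mu} (xi != 0 branch)
    return [mu + sigma * ((-math.log(rng.random())) ** (-xi) - 1) / xi
            for _ in range(n)]

def log_lik(data, xi, sigma, mu):
    """Equation (36): sum of the log-densities in equation (37), xi != 0."""
    ll = 0.0
    for x in data:
        z = 1 + xi * (x - mu) / sigma
        if z <= 0:
            return -math.inf               # outside the support
        ll += -math.log(sigma) - (1 + 1 / xi) * math.log(z) - z ** (-1 / xi)
    return ll

maxima = gev_sample(2000, xi=0.25, sigma=1.0, mu=0.0)
grid = [i / 100 for i in range(5, 61, 5)]  # candidate shapes 0.05, ..., 0.60
xi_hat = max(grid, key=lambda s: log_lik(maxima, s, 1.0, 0.0))
```

With the scale and location fixed at their true values, the profile grid search recovers a shape estimate close to the true $\xi = 0.25$; a real application would optimize all three parameters jointly with a numerical optimizer.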
Although straightforward, the previous parametric approach relies on the
strong simplifying hypothesis that extreme returns follow precisely a GEV distribution.
A more realistic approach, for example, would be to assume that extreme returns follow
approximately a GEV distribution. Unfortunately, this alternative semi-parametric
method is more involved. We refer to Embrechts, Klüppelberg and Mikosch [90] for
further details regarding this approach.
The traditional extreme value approach based on the limiting GEV distribution
is rather simple and extensively used, but it presents some shortcomings, as pointed
out by Embrechts [89], Tsay [301] and Gilli and Këllezi [113]. First, it does not use all
available data efficiently, i.e., it uses only maxima from (possibly sizable) subsamples.
Second, estimation results might vary greatly depending on the number of subperiods $m$ and the subperiod sample size $k$, and it is not clear how to identify, supposing it
exists, a suitable trade-off between $m$ and $k$. Lastly, the Fisher-Tippett-Gnedenko
theorem holds if financial series are (almost) independent and identically distributed or
if the sample size $k$ is excessively large so that the block maxima sequence can be
assumed to be independent. So we need to handle these limitations before we can
apply this approach to financial series. The technique discussed next avoids the first
and second problems while the last one is handled in a later section, where we
combine GARCH, EVT and copula methods to properly simulate daily multivariate
return series.
Peaks over Threshold Method
As an alternative to the block maxima approach, Davison and Smith [63] and Smith
[285] introduced an approach focused on the exceedances beyond a given
(large) threshold, also known as Peaks over Threshold (POT), that does not exhibit
some of the shortcomings of the traditional approach.
Rather than focusing on the distribution function $F(x)$ of the independent and
identically distributed returns, this method focuses on the conditional excess
distribution $F_u(y)$ over a sufficiently large threshold $u$ (i.e., on the exceedances
distribution), where $y = x - u$, $x > u$. $F_u(y)$ is given by:
$F_u(y) = P(X \leq y + u \mid X > u) = \dfrac{P(X \leq y + u,\ X > u)}{P(X > u)} \Rightarrow$
$F_u(y) = \dfrac{P(u < X \leq y + u)}{1 - P(X \leq u)}$
$F_u(y) = \dfrac{F(y + u) - F(u)}{1 - F(u)}$ ( 38 )
Or, in terms of the tail distribution $\bar{F} = 1 - F$:
$P(X > y + u \mid X > u) = 1 - F_u(y) = 1 - \dfrac{F(y+u) - F(u)}{1 - F(u)} \Rightarrow$
$1 - F_u(y) = \dfrac{1 - F(u) - F(y+u) + F(u)}{1 - F(u)} \Rightarrow$
$1 - F_u(y) = \dfrac{1 - F(y+u)}{1 - F(u)} \Rightarrow$
$\bar{F}_u(y) = \dfrac{\bar{F}(y+u)}{\bar{F}(u)} \Rightarrow$
$\bar{F}(y+u) = \bar{F}(u)\,\bar{F}_u(y)$ ( 39 )
So, we can estimate the tail distribution $\bar{F}(y+u)$ by individually estimating $\bar{F}(u)$ and $\bar{F}_u(y)$. The fundamental result underlying the estimation of $\bar{F}_u(y)$ was
proposed by Pickands [249], and Balkema and de Haan [14]. Basically, it shows that
under certain conditions the conditional distribution $F_u(x)$ converges to a Generalized
Pareto Distribution (GPD) (Figure 11), where:
$G_{\xi,\sigma(u)}(y) = \begin{cases} 1 - \left(1 + \xi \dfrac{y}{\sigma(u)}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\ 1 - e^{-\frac{y}{\sigma(u)}} & \text{if } \xi = 0 \end{cases} \quad \text{where } \sigma(u) > 0$ ( 40 )
Figure 11 – Generalized Pareto Distributions
With support $y \geq 0$ when $\xi \geq 0$ and $0 \leq y \leq -\sigma(u)/\xi$ when $\xi < 0$. Similarly to the
GEV distribution, the GPD can be defined by a shape parameter $\xi$, a scale parameter $\sigma$ and a location parameter $\mu$ (here assumed to be zero).
Theorem 2 (Pickands-Balkema-de Haan theorem) Let $(X_1, \cdots, X_n)$ be a
sequence of independent and identically distributed random variables with common
distribution function $F$ and common conditional excess distribution $F_u$ over a sufficiently large
threshold $u$. If $F \in \mathrm{MDA}(H_\xi)$, there is a positive measurable function $\sigma(u)$ such that:
$\lim_{u \to x_F} \sup_{0 \leq x \leq x_F - u} \left| F_u(x) - G_{\xi,\sigma(u)}(x) \right| = 0$ ( 41 )
Where $x_F$ is the right endpoint (finite or infinite) of the distribution $F$.
We observe that, while the GEV distribution $H_{\xi,\sigma,\mu}$ describes the limiting
distributions of normalized maxima of a distribution $F$ that belongs to the $\mathrm{MDA}(H_\xi)$, the
GPD $G_{\xi,\sigma(u)}$ describes the respective limiting distribution of exceedances beyond a
given sufficiently large threshold. Incidentally, the shape parameter $\xi$ is the same for
both distributions.
If $\xi < 0$, $G_{\xi,\sigma(u)}$ converges to distributions whose right tails are finite like the
beta distribution. If $\xi = 0$, $G_{\xi,\sigma(u)}$ converges to distributions whose right tails exhibit an
exponential decay like the normal. Lastly, if $\xi > 0$, $G_{\xi,\sigma(u)}$ converges to distributions
whose right tails follow a polynomial decay like the Student t or Pareto distributions.
Consequently, a GPD with positive shape parameter $\xi$ seems a likely candidate to
model the asymptotic behaviour of the conditional tails of daily return series.
To find $\bar{F}_u(y)$, we start by applying Theorem 1:
$\lim_{n \to \infty} \left[F(\sigma_n x + \mu_n)\right]^n = H_\xi(x) \Rightarrow$
$\lim_{n \to \infty} \left[1 - \bar{F}(\sigma_n x + \mu_n)\right]^n = H_\xi(x) \Rightarrow$
$\lim_{n \to \infty} n \ln\left[1 - \bar{F}(\sigma_n x + \mu_n)\right] = \ln H_\xi(x)$ ( 42 )
Defining a sequence of thresholds $u_n(x) = \sigma_n x + \mu_n$ and assuming, for large
values of $u_n(x)$ such that $F\left(u_n(x)\right) \approx 1$, a first-order Taylor series approximation:
$\ln\left[1 - \bar{F}\left(u_n(x)\right)\right] \approx -\bar{F}\left(u_n(x)\right) \Rightarrow$
$\lim_{n \to \infty} n \ln\left[1 - \bar{F}(\sigma_n x + \mu_n)\right] \approx \lim_{n \to \infty} n\left[-\bar{F}\left(u_n(x)\right)\right] \approx \ln H_\xi(x) \Rightarrow$
$\lim_{n \to \infty} n \bar{F}\left(u_n(x)\right) \approx -\ln H_\xi(x)$ ( 43 )
Finally, for $n$ sufficiently large:
$n \bar{F}(u) \approx -\ln H_\xi\left(\dfrac{u-\mu}{\sigma}\right) \Rightarrow$
$\bar{F}(u) \approx -\dfrac{1}{n} \ln H_\xi\left(\dfrac{u-\mu}{\sigma}\right) \Rightarrow$
$\bar{F}(u) \approx \begin{cases} -\dfrac{1}{n} \ln e^{-\left(1+\xi\frac{u-\mu}{\sigma}\right)^{-1/\xi}} & \text{if } \xi \neq 0 \\ -\dfrac{1}{n} \ln e^{-e^{-\frac{u-\mu}{\sigma}}} & \text{if } \xi = 0 \end{cases} \Rightarrow$
$\bar{F}(u) \approx \begin{cases} \dfrac{1}{n}\left(1+\xi\dfrac{u-\mu}{\sigma}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\ \dfrac{1}{n}\, e^{-\frac{u-\mu}{\sigma}} & \text{if } \xi = 0 \end{cases}$ ( 44 )
$\bar{F}(y+u) \approx \begin{cases} \dfrac{1}{n}\left(1+\xi\dfrac{y+u-\mu}{\sigma}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\ \dfrac{1}{n}\, e^{-\frac{y+u-\mu}{\sigma}} & \text{if } \xi = 0 \end{cases}$ ( 45 )
Using these results, we can estimate $\bar{F}_u(y)$ by recalling that:
$\bar{F}_u(y) = \dfrac{\bar{F}(y+u)}{\bar{F}(u)} \Rightarrow$
$\bar{F}_u(y) = \dfrac{\frac{1}{n}\left(1+\xi\frac{y+u-\mu}{\sigma}\right)^{-1/\xi}}{\frac{1}{n}\left(1+\xi\frac{u-\mu}{\sigma}\right)^{-1/\xi}} \Rightarrow$
$\bar{F}_u(y) = \left[\dfrac{1+\xi\frac{y+u-\mu}{\sigma}}{1+\xi\frac{u-\mu}{\sigma}}\right]^{-1/\xi} \Rightarrow$
$\bar{F}_u(y) = \left[\dfrac{\sigma+\xi(y+u-\mu)}{\sigma+\xi(u-\mu)}\right]^{-1/\xi} = \left[1+\dfrac{\xi y}{\sigma+\xi(u-\mu)}\right]^{-1/\xi}$ ( 46 )
Alternatively, we could find $\bar{F}_u(x)$ based on Theorem 2:

$\lim_{u \to x_F} \sup_{0 \le x < x_F - u} \left| F_u(x) - G_{\xi,\beta(u)}(x) \right| = 0 \;\Rightarrow\;$
$\lim_{u \to x_F} \sup_{0 \le x < x_F - u} \left| 1 - F_u(x) - 1 + G_{\xi,\beta(u)}(x) \right| = 0 \;\Rightarrow\;$
$\lim_{u \to x_F} \sup_{0 \le x < x_F - u} \left| \bar{F}_u(x) - \bar{G}_{\xi,\beta(u)}(x) \right| = 0$ ( 47 )

As a result, for $u$ sufficiently large, we can assume that:

$\bar{F}_u(x) \approx \bar{G}_{\xi,\beta(u)}(x)$ ( 48 )
Lastly, supposing that we have a sufficiently large number $N_u$ of observations greater than the threshold $u$, as proposed by Embrechts, Klüppelberg and Mikosch [90], we can consider the empirical approximation:

$\bar{F}(u) \approx \bar{F}_n(u) = \frac{N_u}{n}$ ( 49 )

Using these results, we can estimate $\bar{F}(x+u)$ by:

$\bar{F}(x+u) = \bar{F}_u(x)\, \bar{F}(u) \;\Rightarrow\; \bar{F}(x+u) = \frac{N_u}{n}\left(1 + \xi \frac{x}{\beta}\right)^{-1/\xi}, \quad \beta = a + \xi(u-b)$ ( 50 )

Then, we can use a maximum-likelihood approach to estimate the shape and scale parameters $\xi$ and $\beta$ of the tail distribution $\bar{F}(x+u)$, as discussed by Embrechts, Klüppelberg and Mikosch [90] and Hosking and Wallis [146].
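The tail estimator (50) is simple to implement once $\xi$ and $\beta$ are available. The sketch below is an illustrative, self-contained version that estimates the GPD parameters from threshold excesses by probability-weighted moments, in the spirit of Hosking and Wallis; the function names, the synthetic sample and the plotting position are our own assumptions, not part of the thesis framework:

```python
import random

def gpd_pwm(excesses):
    """Estimate GPD shape (xi) and scale (beta) from threshold excesses
    using probability-weighted moments (Hosking-Wallis style)."""
    y = sorted(excesses)
    n = len(y)
    w0 = sum(y) / n                                   # first PWM: sample mean
    # second PWM with the (i - 0.35)/n plotting position
    w1 = sum(yi * (1.0 - (i + 0.65) / n) for i, yi in enumerate(y)) / n
    xi = 2.0 - w0 / (w0 - 2.0 * w1)                   # shape parameter
    beta = w0 * (1.0 - xi)                            # scale parameter
    return xi, beta

def tail_prob(x, u, xi, beta, n_u, n):
    """POT tail estimator (50): F_bar(x + u) = (N_u/n)(1 + xi x / beta)^(-1/xi)."""
    return (n_u / n) * (1.0 + xi * x / beta) ** (-1.0 / xi)

# Illustrative check on synthetic excesses drawn from a known GPD
random.seed(42)
xi_true, beta_true = 0.25, 1.0
excesses = [beta_true / xi_true * ((1.0 - random.random()) ** -xi_true - 1.0)
            for _ in range(5000)]
xi_hat, beta_hat = gpd_pwm(excesses)
```

With a representative excess sample, the PWM estimates land close to the true shape and scale, and the estimated tail probability decreases in $x$ as expected.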
As pointed out by McNeil, Frey and Embrechts [92], the POT approach uses the data more efficiently than the BM approach, but it still presents two major challenges: (i) the appropriate choice of thresholds and (ii) unreliable results when return series exhibit serial correlation.
The first challenge refers to the trade-off between variance and bias, as discussed by McNeil [211] and McNeil and Saladin [212]. A large threshold $u$ means low bias because fewer observations outside the tail are selected to estimate the GPD parameters. Conversely, it also means higher variance because the tail itself contains fewer observations. Castillo and Hadi [48] further emphasize that the threshold $u$ must select significantly large quantiles (preferably above 0.99) so we can apply the Pickands-Balkema-de Haan theorem.
The two most common approaches to choosing suitable thresholds are the Hill estimator plot, discussed by Hill [139] and Resnick and Stărică [261], and the empirical mean excess function plot, recommended by Embrechts, Klüppelberg and Mikosch [41].
Following McNeil, Frey and Embrechts [214], using the order statistics $X_{n,n} \le \cdots \le X_{1,n}$ and assuming a large sample size $n$ and a sufficiently small number of extremes $k$ beyond the threshold $u$, we can employ the standard form of the Hill estimator:

$\hat{\alpha}_{k,n} = \left[\frac{1}{k} \sum_{j=1}^{k} \ln X_{j,n} - \ln X_{k,n}\right]^{-1}, \quad 2 \le k \le n$ ( 51 )

We then plot estimates for different sets of order statistics (i.e., different values of $k$) and look for a stable region in the Hill plot, where we can assume that $\hat{\alpha}^{-1} \approx \hat{\xi}$. Similarly, we can also use the empirical mean excess function plot, as described by McNeil, Frey and Embrechts [214], to identify appropriate thresholds.
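A minimal sketch of the Hill estimator in (51), evaluated on an exact Pareto sample for illustration (the function name and the synthetic data are our own assumptions):

```python
import math
import random

def hill_estimator(data, k):
    """Standard Hill estimator of the tail index alpha (equation 51),
    using the k largest order statistics X_(1) >= ... >= X_(k)."""
    x = sorted(data, reverse=True)            # descending order statistics
    mean_log_excess = sum(math.log(x[j]) - math.log(x[k - 1])
                          for j in range(k)) / k
    return 1.0 / mean_log_excess              # alpha_hat = xi_hat^(-1)

# Pareto sample with tail index alpha = 2 via inverse transform
random.seed(7)
sample = [random.random() ** (-1.0 / 2.0) for _ in range(5000)]
alphas = [hill_estimator(sample, k) for k in (100, 200, 400)]
```

Plotting `hill_estimator(sample, k)` against $k$ gives the Hill plot discussed above; for this synthetic sample the estimates hover around the true value 2 over a wide range of $k$.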
Embrechts, Klüppelberg and Mikosch [90] show that a key property of the GPD is that the mean excess function is linear in the threshold $u$, provided $u$ is sufficiently large. Thus, we can use the empirical estimator of the mean excess function:

$e_n(u) = \frac{\sum_{i=1}^{n} (X_i - u)\, \mathbb{1}_{\{X_i > u\}}}{\sum_{i=1}^{n} \mathbb{1}_{\{X_i > u\}}}, \qquad \mathbb{1}_{\{X_i > u\}} = \begin{cases} 1 & \text{if } X_i > u \\ 0 & \text{if } X_i \le u \end{cases}$ ( 52 )

We then plot estimates of the empirical mean excess function for different values of $u$. If the distribution tail follows a GPD with positive shape parameter $\xi$, the plot should exhibit a linear upward trend for sufficiently high threshold values.
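The empirical mean excess function (52) is equally direct to compute; the sketch below evaluates it on a synthetic heavy-tailed sample, where an upward trend is expected (names and data are illustrative assumptions):

```python
import random

def mean_excess(data, u):
    """Empirical mean excess function e_n(u) (equation 52):
    the average of (X_i - u) over observations exceeding the threshold u."""
    excesses = [x - u for x in data if x > u]
    return sum(excesses) / len(excesses)

# Pareto sample with alpha = 3 (GPD shape xi = 1/3): e(u) grows linearly in u
random.seed(11)
sample = [random.random() ** (-1.0 / 3.0) for _ in range(20000)]
e = [mean_excess(sample, u) for u in (1.5, 2.0, 2.5)]
```

Evaluating `mean_excess` on a grid of thresholds and plotting the results reproduces the mean excess plot described in the text; here the estimates increase roughly linearly, consistent with a positive shape parameter.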
The second challenge requires that we handle dependence before employing the POT approach. Resnick and Stărică [262], for instance, suggest using standardized residuals from ARMA processes to mitigate this problem. McNeil and Frey [213] extend the idea by using the standardized residuals from GARCH processes. In the next section, we develop the latter approach to account for leverage effects and tail co-dependence.
Lastly, we can also extend the POT approach previously described to a point
process framework, i.e., rather than handling the peaks over threshold and the
occurrence times as separate processes, we can combine them into a point process
of exceedances. We do not explore this alternative in this work and refer to Resnick
[260] for further details.
This brief review suggests that EVT provides an elegant way to deal with univariate extreme returns. However, it is not easily extended to a multivariate framework because there is no longer a unique definition of an extreme event (i.e., we may entertain different concepts of order). Fortunately, as we discuss next, copulas provide an interesting alternative to handle multivariate heavy-tailed distributions.
3.1.3 Copulas
Although Pearson’s correlation is likely one of the most widely applied measures of dependence in practical finance models, it is not always able to describe properly how closely assets move together. Financial series often display higher (in absolute terms) dependence during turbulent periods than in calmer times. In other words, linear measures such as Pearson’s correlation cannot deal appropriately with the nonlinear dependence structures exhibited by financial markets, particularly in the tails of the multivariate distribution.
As we mentioned earlier, some elaborate multivariate GARCH models, like the DCC proposed by Engle [94] and the TVC suggested by Tse and Tsui [302], are capable of accounting for time-varying dependence. However, they often impose restrictions that may reduce the flexibility to accommodate stylized facts of the individual series.
An alternative to explain and, more interestingly, simulate the dependence structure observed among financial series is the copula, proposed by Sklar [282]. In basic terms, copulas are links between multivariate distributions and their marginal distributions. They provide greater flexibility to (i) model particular properties of each univariate series and (ii) select a suitable dependence structure for their joint behaviour. Besides, they also offer a straightforward method to simulate draws from joint distributions, once we estimate the marginal distributions and the copula parameters.
According to Nelsen [231] and McNeil, Frey and Embrechts [214], an n-dimensional copula is simply a multivariate uniform distribution function $C(u_1, \dots, u_n)$ such that $C: [0,1]^n \to [0,1]$, which exhibits the following properties:

(i) $C(u_1, \dots, u_n)$ is grounded and n-increasing for all $u_i \in [0,1]$, $i = 1, \dots, n$.

(ii) $C(1, \dots, u_i, \dots, 1) = u_i$ for all $u_i \in [0,1]$, $i = 1, \dots, n$.

(iii) $C(u_1, \dots, u_{i-1}, 0, u_{i+1}, \dots, u_n) = 0$ for all $u_i \in [0,1]$, $i = 1, \dots, n$.
In addition, the key result underlying the copula approach is Sklar's theorem, which lets us decouple the modelling of the marginal distributions from the dependence structure. It states that we can either extract a copula from a multivariate distribution function or combine marginal distributions with a copula to construct a multivariate distribution function.

Theorem 3 (Sklar's theorem) Let $F(x_1, \dots, x_n)$ be an n-dimensional distribution function with continuous margins $F_1, \dots, F_n$. Then there exists a unique copula $C: [0,1]^n \to [0,1]$ with uniform margins such that

$F(x_1, \dots, x_n) = C\left(F_1(x_1), \dots, F_n(x_n)\right)$ ( 53 )
Another important result is the Fréchet–Hoeffding theorem, which defines the
lower and upper limits of a particular copula. This is important because some copulas
can describe only a certain range of dependence.
Theorem 4 (Fréchet–Hoeffding theorem) Let $C$ be an n-dimensional copula. Then for any $(u_1, \dots, u_n) \in [0,1]^n$ we have

$\max\left\{1 - n + \sum_{i=1}^{n} u_i \,;\; 0\right\} \le C(u_1, \dots, u_n) \le \min\{u_1, \dots, u_n\}$ ( 54 )
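The bounds in (54) can be checked numerically. A small sketch, using the independence copula $\Pi(u) = u_1 \cdots u_n$ as an arbitrary test case (the code and grid are purely illustrative):

```python
from itertools import product

def lower_bound(u):
    """Fréchet-Hoeffding lower bound W(u) = max(1 - n + sum(u), 0)."""
    return max(1.0 - len(u) + sum(u), 0.0)

def independence_copula(u):
    """Independence copula Pi(u) = u_1 * ... * u_n."""
    prod = 1.0
    for ui in u:
        prod *= ui
    return prod

# Verify W(u) <= Pi(u) <= M(u) = min(u) on a grid of the unit cube
grid = [0.1 * j for j in range(11)]
for u in product(grid, repeat=3):
    c = independence_copula(u)
    assert lower_bound(u) - 1e-12 <= c <= min(u) + 1e-12
```

The same check can be rerun with any other copula function in place of `independence_copula`; the bounds must hold for all of them.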
Given these results, the next step involves selecting an appropriate copula to describe the dependence structure. As we pointed out previously, in this study we use a Student t copula, from the elliptical family, to accommodate tail dependence. Although elliptical copulas impose tail symmetry and do not offer a closed form, they are tractable in higher dimensions. Conversely, other families like the Archimedean and vine copulas usually offer more flexibility at the cost of tractability. Even t copulas can be extended to more elaborate models such as the skewed t or grouped t copulas, as discussed by Demarta and McNeil [64], but these extensions are less tractable as well.
The Student t copula $C^{t}_{\nu,P}$ is easily defined in terms of an n-dimensional standardized multivariate Student t distribution function $T_{\nu,P}$ and a univariate standardized Student t distribution function $t_\nu$. Given the correlation matrix $P$, the degrees of freedom $\nu$ and the gamma function $\Gamma(\cdot)$, we have:

$T_{\nu,P}(x_1, \dots, x_n) = T_{\nu,P}(x) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} \frac{\Gamma\!\left(\frac{\nu+n}{2}\right) |P|^{-\frac{1}{2}}}{\Gamma\!\left(\frac{\nu}{2}\right) (\nu\pi)^{\frac{n}{2}}} \left(1 + \frac{s^{\top} P^{-1} s}{\nu}\right)^{-\frac{\nu+n}{2}} ds$ ( 55 )

$t_\nu(x_i) = u_i, \quad u_i \in [0,1] \text{ and } i = 1, \dots, n$ ( 56 )

$C^{t}_{\nu,P}(u_1, \dots, u_n; P, \nu) = C^{t}_{\nu,P}(u; P, \nu) = T_{\nu,P}\!\left(t_\nu^{-1}(u_1), \dots, t_\nu^{-1}(u_n)\right) = T_{\nu,P}(x)$ ( 57 )

Additionally, we can express the corresponding density function $c^{t}_{\nu,P}$ as:

$c^{t}_{\nu,P}(u_1, \dots, u_n; P, \nu) = |P|^{-\frac{1}{2}}\, \frac{\Gamma\!\left(\frac{\nu+n}{2}\right) \left[\Gamma\!\left(\frac{\nu}{2}\right)\right]^{n}}{\left[\Gamma\!\left(\frac{\nu+1}{2}\right)\right]^{n} \Gamma\!\left(\frac{\nu}{2}\right)}\; \frac{\left(1 + \frac{x^{\top} P^{-1} x}{\nu}\right)^{-\frac{\nu+n}{2}}}{\prod_{i=1}^{n} \left(1 + \frac{\left[t_\nu^{-1}(u_i)\right]^2}{\nu}\right)^{-\frac{\nu+1}{2}}}$ ( 58 )
Following Bouyé et al. [39] and Cherubini, Luciano and Vecchiato [51], for a vector of parameters $\theta$, $T$ observations and a log-likelihood function $l(\theta)$ such that:

$l(\theta) = \sum_{k=1}^{T} \ln c^{t}_{\nu,P}\!\left(F_1(x_{1k}), \dots, F_n(x_{nk}); \theta\right) + \sum_{i=1}^{n} \sum_{k=1}^{T} \ln f_i(x_{ik}; \theta_i)$ ( 59 )

$l(\theta) = \sum_{k=1}^{T} \ln c^{t}_{\nu,P}\!\left(u_{1k}, \dots, u_{nk}; \theta\right)$ ( 60 )

$l(\theta) \propto -\frac{T}{2} \ln |P| - \frac{\nu+n}{2} \sum_{k=1}^{T} \ln\!\left(1 + \frac{x_k^{\top} P^{-1} x_k}{\nu}\right) + \frac{\nu+1}{2} \sum_{i=1}^{n} \sum_{k=1}^{T} \ln\!\left(1 + \frac{\left[t_\nu^{-1}(u_{ik})\right]^2}{\nu}\right)$ ( 61 )
We can use a parametric method such as the Exact Maximum Likelihood
(EML) to estimate jointly both the dependence structure and the marginal parameters.
Unfortunately, the EML method is computationally intensive for high dimensional
distributions.
Alternatively, we can use a two-step approach, proposed by Joe [160] and Genest, Ghoudi and Rivest [106], that first estimates the individual parameters of the marginal distributions and then estimates the copula parameters. Joe's [160] method, called Inference Functions for Margins (IFM), assumes that the univariate marginal distributions are known, so we can start by separately estimating their parameters $\hat{\theta}_i$:
$l(\theta) = \sum_{k=1}^{T} \ln c^{t}_{\nu,P}\!\left(F_1(x_{1k}; \theta_1), \dots, F_n(x_{nk}; \theta_n); \alpha\right) + \sum_{i=1}^{n} \sum_{k=1}^{T} \ln f_i(x_{ik}; \theta_i)$ ( 62 )

$\hat{\theta}_i = \arg\max_{\theta_i} \sum_{k=1}^{T} \ln f_i(x_{ik}; \theta_i)$ ( 63 )

Then, we use these values to estimate the parameters of the dependence structure:

$\hat{\alpha} = \arg\max_{\alpha} \sum_{k=1}^{T} \ln c^{t}_{\nu,P}\!\left(F_1(x_{1k}; \hat{\theta}_1), \dots, F_n(x_{nk}; \hat{\theta}_n); \alpha\right)$ ( 64 )
Conversely, the method of Genest, Ghoudi and Rivest [106], known as Canonical Maximum Likelihood (CML) or Pseudo Maximum Likelihood, does not assume parametric distributions for the margins. Instead, it relies on the univariate empirical cumulative distribution functions $\hat{F}_i$ to generate uniform variates $\hat{u}_{ik}$, which are subsequently used to estimate the copula parameters:

$\hat{\alpha} = \arg\max_{\alpha} \sum_{k=1}^{T} \ln c^{t}_{\nu,P}\!\left(\hat{u}_{1k}, \dots, \hat{u}_{nk}; \alpha\right)$ ( 65 )
For large samples, both methods yield similar results, but for a small number of observations Kim, Silvapulle and Silvapulle [170] strongly recommend the CML approach. Their study suggests that the IFM method is not robust against misspecifications of the marginal distributions and that, consequently, the CML method generally performs better in practical applications, where the margins are often unknown.
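As a small illustration of the first CML step, the sketch below builds pseudo-observations $\hat{u} = \text{rank}/(T+1)$ from a sample, a common rank-based convention; the data, the function name, and the assumption of no ties are our own illustrative choices:

```python
def pseudo_observations(series):
    """Rank-transform a sample (assumed free of ties) into approximately
    uniform variates u_hat = rank / (T + 1), as used by CML / pseudo-ML."""
    T = len(series)
    order = sorted(range(T), key=lambda i: series[i])
    u = [0.0] * T
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (T + 1)
    return u

# Hypothetical daily returns
returns = [0.012, -0.034, 0.005, 0.021, -0.010, 0.000, -0.027, 0.018]
u_hat = pseudo_observations(returns)
```

The resulting variates lie strictly inside (0, 1) and can be passed directly to a copula log-likelihood such as (65).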
3.1.4 Simulating a Multivariate Scenario Tree
Once we have estimated the copula parameters and the marginal distributions, we have a powerful tool to jointly simulate the daily financial series. Based on Sklar's theorem, given an n-dimensional copula $C^{t}_{\nu,P}(u_1, \dots, u_n)$ and a multivariate distribution $F(x_1, \dots, x_n)$ with margins $F_1(x_1), \dots, F_n(x_n)$, such that:

$F(x_1, \dots, x_n) = C^{t}_{\nu,P}\!\left(F_1(x_1), \dots, F_n(x_n)\right)$ ( 66 )

we have:

$u_i = F_i(x_i) \;\Rightarrow\; x_i = F_i^{-1}(u_i), \quad i = 1, \dots, n$ ( 67 )
Therefore, we simply need to generate n dependent uniform variates. According to Cherubini, Luciano and Vecchiato [51], the general iterative approach to accomplish that involves the conditional copula distribution function:

$C(u_i | u_1, \dots, u_{i-1}) = \int_0^{u_i} c(s | u_1, \dots, u_{i-1})\, ds$, where

$c(s | u_1, \dots, u_{i-1}) = \frac{c(u_1, \dots, u_i)}{c(u_1, \dots, u_{i-1}, 1)} = \frac{\partial^i C(u_1, \dots, u_i) / \partial u_1 \cdots \partial u_i}{\partial^{i-1} C(u_1, \dots, u_{i-1}, 1) / \partial u_1 \cdots \partial u_{i-1}} \;\Rightarrow\;$

$C(u_i | u_1, \dots, u_{i-1}) = \frac{\partial^{i-1} C(u_1, \dots, u_{i-1}, u_i) / \partial u_1 \cdots \partial u_{i-1}}{\partial^{i-1} C(u_1, \dots, u_{i-1}, 1) / \partial u_1 \cdots \partial u_{i-1}}$ ( 68 )

We begin by drawing $u_1$ from a standard uniform distribution and then use the inverse of the conditional copula distribution to compute the subsequent dependent uniform variates $u_2, \dots, u_n$:

$u_i = C^{-1}(v_i | u_1, \dots, u_{i-1}), \quad i = 2, \dots, n$ ( 69 )

where each $v_i$ is also drawn from a standard uniform distribution.
Fortunately, Cherubini, Luciano and Vecchiato [51] also point out that there is an easier method to generate dependent uniform variates from a t copula. Given a correlation matrix $P$ and the degrees of freedom $\nu$, we can:

(i) Find the Cholesky decomposition $L$ of $P$

(ii) Draw n random variates $z = (z_1, \dots, z_n)'$ from the standard normal distribution

(iii) Draw a random variate $s$ from the chi-squared distribution with $\nu$ degrees of freedom

(iv) Set $y = Lz$ and $x = y\sqrt{\nu / s}$

(v) Set $u_i = t_\nu(x_i)$, $i = 1, \dots, n$, where $t_\nu$ is the univariate Student t distribution with $\nu$ degrees of freedom
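Steps (i)-(v) above can be sketched as follows. This is an illustrative pure-Python version: the numerical Student t CDF, the function names and the example correlation are our own simplifications (in practice a statistical library routine would replace the hand-rolled integration):

```python
import math
import random

def t_cdf(x, nu):
    """Univariate Student t CDF via Simpson integration of the density
    (adequate for illustration only)."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    f = lambda t: c * (1.0 + t * t / nu) ** (-(nu + 1) / 2)
    n, a, b = 400, 0.0, abs(x)
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if i % 2 else 2) * f(a + i * h) for i in range(1, n))
    half = s * h / 3.0
    return 0.5 + half if x >= 0 else 0.5 - half

def sample_t_copula(L, nu, m, seed=0):
    """Draw m dependent uniform vectors following steps (i)-(v):
    z ~ N(0, I), s ~ chi2(nu), y = L z, x = y sqrt(nu / s), u_i = t_nu(x_i)."""
    rng = random.Random(seed)
    n = len(L)
    draws = []
    for _ in range(m):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        s = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))  # chi2, integer nu
        y = [sum(L[i][j] * z[j] for j in range(n)) for i in range(n)]
        x = [yi * math.sqrt(nu / s) for yi in y]
        draws.append([t_cdf(xi, nu) for xi in x])
    return draws

# Bivariate example with correlation 0.6: Cholesky factor of [[1, .6], [.6, 1]]
L = [[1.0, 0.0], [0.6, 0.8]]
u = sample_t_copula(L, nu=5, m=2000, seed=3)
```

The generated pairs lie in the unit square and exhibit the positive dependence implied by the correlation matrix, as a scatter plot of the two coordinates quickly confirms.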
We use the method above to generate dependent multivariate random residuals. Then, we combine each series of residuals with its previously fitted GJR-GARCH + EVT (GPD) + t-Copula model to simulate fully dependent univariate trajectories for all daily financial series. For example, suppose we simulate $m$ multivariate scenarios (i.e., paths) for $n$ instruments over $T$ periods. We are basically generating the following scenario tree:

Figure 12 – Multivariate Scenario Fan Graph

Each multivariate scenario $i$ is simulated independently from the other scenarios and has probability $p(\omega_i) = p_i = 1/m$, $i = 1, \dots, m$. Moreover, following Pflug and Pichler [247], we can also represent the scenario tree as a nested distribution (further explained in later subsections):

Figure 13 – Multivariate Scenario Fan Structure
Note that in the structural scenario representation, we omit the root node because it is not stochastic (i.e., it is assumed to be known a priori). As a numerical example, suppose we have $m = 4$ scenarios, $n = 3$ instruments and $T = 2$ periods. Suppose also that the node values are:

$\omega_0 = (1.0, 1.0, 1.0)$

$\omega_{1,1} = (2.0, 1.3, 1.7)$, $\omega_{1,2} = (0.9, 1.4, 1.2)$, $\omega_{1,3} = (0.9, 1.4, 1.2)$, $\omega_{1,4} = (0.7, 1.1, 0.8)$

$\omega_{2,1} = (2.2, 1.7, 1.6)$, $\omega_{2,2} = (1.0, 1.3, 1.0)$, $\omega_{2,3} = (0.8, 1.2, 1.0)$, $\omega_{2,4} = (0.6, 1.0, 0.9)$

The respective scenario tree can be represented as:

Figure 14 – Multivariate Scenario Fan Graph Example (m=4, n=3 and T=2)

Figure 15 – Multivariate Scenario Fan Structure Example (m=4, n=3 and T=2)
As we discussed previously, this simulated multivariate scenario tree offers a
reasonably accurate way to represent a (continuous) underlying stochastic process by
a discrete approximation. However, in practical financial applications it faces two
important shortcomings: it is often too large (i.e., too computationally demanding to be
optimized within a reasonable amount of time) and it is not free of arbitrage
opportunities. Therefore, in the next section we introduce an additional step to address
these limitations.
3.2 Scenario Tree Approximation
Although scenario trees based on the combined GJR-GARCH + EVT (GPD) +
t-Copula approach provide an interesting framework for representing daily log returns,
they lead to two relevant shortcomings in the optimization process that must be
addressed before they can be used. First, stochastic portfolio optimizations based on
simulated scenarios are computationally demanding, particularly in multi-period
problems, because a huge number of scenarios are often required to achieve proper
accuracy. Second, there is no guarantee that the simulated scenarios preclude
arbitrage opportunities, which may induce unreliable optimization solutions.
An alternative to eliminate the first shortcoming is to approximate the original set of simulated scenarios by a much smaller set, with adjusted probabilities, using some metric distance to keep both sets reasonably similar. This approach is often known as optimal discretization. As we saw in the literature review, several authors have explored this approach to approximate the original scenario tree by a smaller (and, consequently, more tractable) tree.
Optimal discretization does not require knowledge of the corresponding stochastic optimization model, unlike, for example, moment-matching methods. This is particularly attractive for general portfolio optimization applications, where the structure of the constraints and objective functions (and, therefore, the relevant statistical properties of the optimization problem) differs from one investor to another. On the other hand, like most conditional sampling and clustering techniques, optimal discretization must be particularly careful to avoid biased and unstable (in-sample and out-of-sample) solutions.
In-sample stability is related to internal consistency. It requires that if we simulate many approximated scenario trees using the same discretization method, all of them must lead to similar objective function values. If the optimal values are not similar for slightly different scenario trees generated using the same process and data, the discretization method is not stable. Following Kaut and Wallace [168], most in-sample stability tests focus on the objective function values rather than on the solution values, because objective functions are often flat, which means that an optimal objective function value may be attained by quite different solutions, making solution stability very hard to achieve.

Hence, given an objective function $f(x, \omega)$, where $x$ is a solution and $\omega$ is the parameter uncertainty embodied by the approximated scenario tree, in-sample stability means that, for two approximated scenario trees generated by the same discretization method leading to $\omega_1$ and $\omega_2$, we should have:

$\min_{x \in \mathbb{X}} f(x, \omega_1) \approx \min_{x \in \mathbb{X}} f(x, \omega_2)$ ( 70 )
In addition to the potential instability issues, we also need to keep the optimality gap small. This optimality gap is simply the approximation error (or bias) between the optimal objective function values achieved using the true underlying stochastic process and the approximated scenario tree. Therefore, given a stochastic optimization problem such that $f(x, u)$ is the objective function, $G$ is the true underlying distribution function and $\tilde{G}$ is the approximated discrete distribution, we can define the approximation error $e(F, \tilde{F})$:

$\min_{x \in \mathbb{X}} F(x) = \min_{x \in \mathbb{X}} \int f(x, u)\, dG(u)$ ( 71 )

$\min_{x \in \mathbb{X}} \tilde{F}(x) = \min_{x \in \mathbb{X}} \int f(x, u)\, d\tilde{G}(u)$ ( 72 )

$e(F, \tilde{F}) = F\!\left(\arg\min_x \tilde{F}(x)\right) - F\!\left(\arg\min_x F(x)\right) \ge 0$ ( 73 )

$e(F, \tilde{F}) = F(\tilde{x}^*) - F(x^*) \ge 0$ ( 74 )
Pflug [241] shows that, although the true optimal objective value $F(x^*)$ is unavailable for most practical applications, we can define an upper bound on the approximation error $e(F, \tilde{F})$ based on the following lemma:

Lemma 1

$0 \le e(F, \tilde{F}) \le 2 \sup_x \left| F(x) - \tilde{F}(x) \right|$ ( 75 )

Proof. Let $x^* \in \arg\min F$, $\tilde{x}^* \in \arg\min \tilde{F}$ and $\epsilon = \sup_x |F(x) - \tilde{F}(x)|$. Let also $A = \{x: F(x) \le F(x^*) + 2\epsilon\}$ and suppose that $\tilde{x}^* \notin A$. Then:

$F(x^*) + 2\epsilon < F(\tilde{x}^*) \le \tilde{F}(\tilde{x}^*) + \epsilon \le \tilde{F}(x^*) + \epsilon \le F(x^*) + 2\epsilon$ ( 76 )

This contradiction shows that $\tilde{x}^* \in A$ and, consequently:

$e(F, \tilde{F}) = F(\tilde{x}^*) - F(x^*) \le 2\epsilon$ ( 77 )
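Lemma 1 can be illustrated numerically: for randomly perturbed objective values over a finite decision set, the realized optimality gap never exceeds twice the sup-norm distance between the two objectives. The toy setup below is purely illustrative:

```python
import random

# Toy check of Lemma 1: e(F, F_tilde) = F(x_tilde*) - F(x*) <= 2 sup|F - F_tilde|
random.seed(1)
X = range(50)                                    # finite decision set
for _ in range(200):
    F = {x: random.uniform(0.0, 10.0) for x in X}              # "true" objective
    Ft = {x: F[x] + random.uniform(-0.5, 0.5) for x in X}      # approximation
    eps = max(abs(F[x] - Ft[x]) for x in X)      # sup-norm distance
    x_star = min(X, key=F.get)                   # argmin of F
    xt_star = min(X, key=Ft.get)                 # argmin of F_tilde
    gap = F[xt_star] - F[x_star]                 # optimality gap e(F, F_tilde)
    assert 0.0 <= gap <= 2.0 * eps
```

The bound holds on every trial, as the proof guarantees; in practice the gap is usually much smaller than $2\epsilon$.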
The quality of a stochastic optimization solution relies heavily on the level of stability and the optimality gap. These levels, in turn, depend greatly on the metric distance chosen for the scenario approximation. Among the many available metrics, the Wasserstein or Kantorovich distance and its extensions seem particularly suited, according to various works such as Hochreiter and Pflug [141], Pflug [241][243], Pflug and Pichler [245][246], Heitsch and Römisch [270], and Römisch and Strugarek [138]. The Wasserstein distance seems to be particularly popular because it exhibits some interesting properties when applied to stochastic programming problems. For instance, (i) it metricizes convergence in distribution, (ii) it extends intuitively to stochastic processes and (iii) it is computationally straightforward to handle.
Unfortunately, most traditional discretization methods do not handle the second shortcoming. In fact, despite the recent advances in optimal discretization, there is very limited research concerning the prevention of arbitrage opportunities. To address that, we need to adjust the discretization approach to avoid scenarios that lead to these opportunities. Arbitrage-free scenarios might not be particularly essential for energy generation and logistics applications, as we pointed out in the literature review, but they are critical for financial applications. For that reason, additional constraints must be imposed on the optimal discretization approach to prevent scenario tree nodes that could lead to these opportunities.
In the following subsections, we discuss in detail the Wasserstein and Process
distance concepts, and describe a method to approximate a large scenario tree by a
tractable and arbitrage-free one. First, we describe how we can use the Wasserstein
and Process distances to compute, respectively, the distance between distributions
and stochastic processes. Then, we describe the approach proposed by Pflug and
Pichler [246], which performs optimal discretizations using the Process distance, and
extend it to prevent arbitrage opportunities.
3.2.1 Distributions Distance
Optimal discretization relies on the concept of probability metrics to evaluate
the dissimilarity between different probability distributions. Such discretization
approaches minimize an appropriately chosen probability metric to approximate the
original and often continuous distribution by a smaller but similar (in some statistical
sense) discrete one, which can be used in practical optimization problems.
Definition 18 Given a set of probability measures $\mathcal{P}$ on $\mathbb{R}^d$ and $P_1, P_2, P_3 \in \mathcal{P}$, a probability metric $d$ on $\mathcal{P} \times \mathcal{P}$ is a distance function between probability distributions such that:

(i) $d(P_1, P_2) = 0 \Leftrightarrow P_1 = P_2$ ( 78 )

(ii) $d(P_1, P_2) \ge 0$ ( 79 )

(iii) $d(P_1, P_2) = d(P_2, P_1)$ ( 80 )

(iv) $d(P_1, P_2) \le d(P_1, P_3) + d(P_3, P_2)$ ( 81 )

Definition 19 Given a set $\Xi$ and a probability metric $d: \Xi \times \Xi \to \mathbb{R}$, the pair $(\Xi, d)$ is called a metric space. A metric space $(\Xi, d)$ is said to be separable and complete if it has a countable dense subset and if every Cauchy sequence in $(\Xi, d)$ converges. A complete separable metric space is called a Polish space.
Various metrics or distances (e.g., Kolmogorov metric, Lévy-Prokhorov metric,
total variation distance) can be defined on a given set but few are suited to measure
distances between discrete probability distributions. Two metrics in particular, the
Fortet-Mourier and the Wasserstein distances, have been favoured in the optimal
discretization literature because of their appealing computational and statistical
properties, particularly in Polish spaces. As pointed out by Pflug [241], Pflug and
Pichler [244], and Kovacevic and Pichler [186], they are especially relevant for scenario
approximation because they metricize weak convergence and because discrete
probability measures are dense in the corresponding spaces of probability measures.
Definition 20 Given all continuous bounded functions $h$, a sequence of probability measures $P_n$ on $\mathbb{R}^d$ converges weakly as $n \to \infty$ if:

$\int h\, dP_n \to \int h\, dP$ ( 82 )

In other words, assuming a Polish space, a proper distance $d$ metricizes weak convergence from a discrete approximated distribution $P_n$ to the true underlying distribution $P$, i.e. $d(P_n, P) \to 0$ if and only if $P_n \Rightarrow P$. This property, exhibited by the Wasserstein distance, is paramount to the scenario approximation technique used in this thesis.
The Wasserstein distance is a generalization of the Kantorovich distance (also known as the Wasserstein distance of order $r = 1$) that can also be generalized from probability measures to stochastic processes, as we discuss in the next subsection. It is directly associated with the concept of optimal transportation cost between distributions, which is extensively applied in engineering.
Definition 21 Given two probability spaces $(\Omega, \mathcal{F}, P)$ and $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P})$ with elements $\omega \in \Omega$ and $\tilde{\omega} \in \tilde{\Omega}$, a joint probability measure $\pi(\omega, \tilde{\omega})$ on $\Omega \times \tilde{\Omega}$, and a (transportation) cost function $c(\omega, \tilde{\omega})$ between them, the optimal transportation cost is given by:

$\inf_\pi \int_{\Omega \times \tilde{\Omega}} c(\omega, \tilde{\omega})\, \pi(d\omega, d\tilde{\omega})$ ( 83 )

such that, for all measurable sets $A \in \mathcal{F}$ and $B \in \tilde{\mathcal{F}}$:

$\pi(A \times \tilde{\Omega}) = P(A)$ ( 84 )

$\pi(\Omega \times B) = \tilde{P}(B)$ ( 85 )

Definition 22 Given two probability spaces $(\Omega, \mathcal{F}, P)$ and $(\tilde{\Omega}, \tilde{\mathcal{F}}, \tilde{P})$ and two $\mathbb{R}^d$-valued random variables $\xi: \Omega \to \mathbb{R}^d$ and $\tilde{\xi}: \tilde{\Omega} \to \mathbb{R}^d$, the inherited distance (from $\xi$ and $\tilde{\xi}$) between $P$ and $\tilde{P}$ is defined either as:

$d(\omega, \tilde{\omega}) = \| \xi(\omega) - \tilde{\xi}(\tilde{\omega}) \|$ or ( 86 )

$d\!\left(\xi(\omega), \tilde{\xi}(\tilde{\omega})\right) = \| \xi(\omega) - \tilde{\xi}(\tilde{\omega}) \|$ ( 87 )

for some norm $\|\cdot\|$ in $\mathbb{R}^d$.
Definition 23 Given an inherited distance $d(\omega, \tilde{\omega})$ between $P$ and $\tilde{P}$, the Wasserstein distance of order $r$ is defined either as:

$d_r(P, \tilde{P}) = \left[\inf_\pi \int_{\Omega \times \tilde{\Omega}} d(\omega, \tilde{\omega})^r\, \pi(d\omega, d\tilde{\omega})\right]^{\frac{1}{r}}$ ( 88 )

$d_r(P, \tilde{P}) = \left[\inf_\pi \int_{\mathbb{R}^d \times \mathbb{R}^d} d(u, v)^r\, \pi(du, dv)\right]^{\frac{1}{r}}$ ( 89 )
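For two equally weighted discrete samples of the same size on the real line, the infimum in (88) is attained by matching sorted order statistics, so the distance can be sketched in a few lines (the function name and examples are illustrative):

```python
def wasserstein_1d(xs, ys, r=1):
    """Wasserstein distance of order r between two equally weighted,
    equal-size discrete samples in one dimension: the optimal coupling
    matches the sorted order statistics."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    n = len(xs)
    return (sum(abs(a - b) ** r for a, b in zip(xs, ys)) / n) ** (1.0 / r)

d1 = wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])   # a pure shift by 1
d2 = wasserstein_1d([0.0, 0.0, 0.0], [0.0, 0.0, 3.0], r=2)
```

Shifting a sample by a constant shifts the distance by exactly that constant, which is one reason the Wasserstein distance behaves intuitively for scenario approximation; general multivariate cases require solving a small linear program instead.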
As we will discuss in detail later, multi-period stochastic programs depend on multivariate stochastic processes $(\xi_t)_{t=1}^{T}$, defined on some probability space $(\Omega, \mathcal{F}, P)$ such that $\xi_t \in \mathbb{R}^m$. In addition, a decision $x_t$ at time $t$ cannot depend on future realizations (i.e., $\xi_{t+k}$, $k \ge 1$), meaning that $x_t$ is measurable with respect to $\mathcal{F}_t$. This restriction is known as the nonanticipativity condition.
3.2.2 Stochastic Processes Distance
As we have underlined earlier, multi-period stochastic optimization problems often require a simpler (i.e., tractable) discrete stochastic process $\tilde{\xi}$ with some filtration $\tilde{\mathcal{F}}$ that approximates the original underlying stochastic process $\xi$ with its filtration $\mathcal{F}$. In other words, given two general multi-period stochastic optimization problems:

$\max\left\{ \mathbb{E}\left[Q(x_0, \xi_1, \dots, x_{T-1}, \xi_T)\right] : x \lhd \mathcal{F} \right\}$ ( 90 )

$\max\left\{ \mathbb{E}\left[Q(x_0, \tilde{\xi}_1, \dots, x_{T-1}, \tilde{\xi}_T)\right] : x \lhd \tilde{\mathcal{F}} \right\}$ ( 91 )

we need to identify $\tilde{\xi}$ and $\tilde{\mathcal{F}}$ such that the optimality gap (i.e., the approximation error) between the solutions of these optimization problems is stable and small.
However, an important detail cannot be overlooked: in practical applications, these processes and filtrations are rarely defined on the same probability space $(\Omega, \mathcal{F}, P)$. Therefore, the approximation metric used to measure the distance (i.e., the optimality gap) between these processes must take this into account. The process or nested distance, proposed by Pflug [243] and further detailed in Pflug and Pichler [244][245][246], is particularly well suited for this purpose.
3.2.3 Scenario Tree Reduction
The optimal discretization approach applied in this section follows the
techniques presented in Pflug [243], Pflug and Pichler [246], and Kovacevic and
Pichler [186], which are based on process distances.
3.2.4 No-Arbitrage Scenario Tree Reduction
Informally, an arbitrage opportunity consists of a riskless investment with a positive return; in real financial markets, such opportunities do not persist for long because investors quickly exploit them. Unfortunately, as discussed by Klassen [173][174][175], and Geyer, Hanke and Weissensteiner [107], portfolio optimization models are ill equipped to handle arbitrage opportunities, because these often lead to unbounded or biased solutions. For that reason, scenario trees must preclude such opportunities to achieve reliable solutions. Formally, following Ingersoll [151]:
Definition 24 Given non-redundant instruments $k = 1, \dots, K$, their allocations $\theta_{k,t}$ and prices $S^{n}_{k,t}$ at stage $t$ and state (i.e., scenario tree node) $n$, and their prices $S^{m}_{k,t+1}$ at stage $t+1$ and states $m$ such that $m \in n_+$ (i.e., the direct descendant nodes in the scenario tree), an arbitrage opportunity exists if either:

(i) It is possible to construct a zero net investment portfolio or investment strategy $\theta_t = (\theta_{1,t}, \dots, \theta_{K,t})$ that is profitable in at least one future state of nature (i.e., that delivers a positive return in at least one scenario arc) and never produces a loss, whatever the future state of nature (i.e., that does not deliver negative returns in any scenario arc):

$\sum_{k=1}^{K} \theta_{k,t} S^{n}_{k,t} = 0$ ( 92 )

$\sum_{k=1}^{K} \theta_{k,t} S^{m}_{k,t+1} \ge 0, \quad \text{for all } m \in n_+$ ( 93 )

$\sum_{k=1}^{K} \theta_{k,t} S^{m}_{k,t+1} > 0, \quad \text{for at least one } m \in n_+$ ( 94 )

(ii) It is possible to construct an investment portfolio or strategy $\theta_t = (\theta_{1,t}, \dots, \theta_{K,t})$ that has a positive cash flow today and generates a nonnegative return, whatever the future state of nature:

$\sum_{k=1}^{K} \theta_{k,t} S^{n}_{k,t} < 0$ ( 95 )

$\sum_{k=1}^{K} \theta_{k,t} S^{m}_{k,t+1} \ge 0, \quad \text{for all } m \in n_+$ ( 96 )
Putting it differently, as imposed by the Harrison and Kreps theorem, given $S^{n}_{k,t}$ and $S^{m}_{k,t+1}$, $m \in n_+$, the scenario tree does not exhibit arbitrage opportunities if and only if there is a strictly positive state price vector $\pi = (\pi_m)_{m \in n_+}$ such that:

$\sum_{m \in n_+} \pi_m S^{m}_{k,t+1} = S^{n}_{k,t}, \quad \text{for all } k = 1, \dots, K$ ( 97 )
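A minimal numerical illustration of condition (97): for a one-period tree with two instruments and two successor states, the state prices solve a 2x2 linear system, and strict positivity certifies the absence of arbitrage. The prices below are hypothetical:

```python
def state_prices_two_assets(S0, S1):
    """Solve sum_m pi_m S1[k][m] = S0[k] (condition 97) for a one-period
    tree with two instruments and two successor states, by Cramer's rule.
    The node is arbitrage-free iff both state prices are strictly positive."""
    (a, b), (c, d) = S1[0], S1[1]   # payoffs: rows = instruments, cols = states
    det = a * d - b * c
    pi1 = (S0[0] * d - b * S0[1]) / det
    pi2 = (a * S0[1] - S0[0] * c) / det
    return pi1, pi2

# Bond priced at 1 paying 1.01 in both states; stock at 10 paying 11 or 9
pi = state_prices_two_assets([1.0, 10.0], [[1.01, 1.01], [11.0, 9.0]])
arbitrage_free = all(p > 0 for p in pi)
```

For larger trees the same check becomes a linear feasibility problem (does a strictly positive solution of the pricing equations exist?), which is exactly the constraint added to the discretization step later in this chapter.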
However, as explained by Jobst and Zenios [159], Fersti and Weissensteiner [100], and Geyer, Hanke and Weissensteiner [107], portfolio optimization must be carried out under objective (i.e., real-world) probability measures rather than risk-neutral probability measures.
3.3 Stochastic Portfolio Optimization
Once we have constructed our tractable scenario tree, we need to choose an appropriate optimization method that properly explores the impacts and outcomes of the potential sequences of allocations in each scenario and, as we discussed earlier, Stochastic Programming (SP) is a suitable approach to accomplish that.
4 Methodology
4.1 Data Description
The combined GJR-GARCH + EVT (GPD) + t-Copula approach that we discuss next is applied to fit the joint behaviour of five global stock indexes: IBX, S&P 500, FTSE 100, DAX and Nikkei 225, from January 4, 2005 to August 15, 2015, comprising 2,673 trading days.

According to Rydberg [274], most daily financial series analyzed in the literature are based on log returns of closing prices. We adopt the same approach and use Dollar closing prices to compute the multivariate financial series of $N = 5$ assets and $T = 2{,}672$ daily log returns:

$r_{i,t} = \ln \frac{S_{i,t}}{S_{i,t-1}}, \quad i = 1, \dots, N; \; t = 1, \dots, T$ ( 98 )
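Equation (98) in code, as a trivial sketch with illustrative prices:

```python
import math

def log_returns(prices):
    """Daily log returns r_t = ln(S_t / S_{t-1}), as in equation (98)."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

close = [100.0, 102.0, 99.5, 101.0]   # hypothetical closing prices
r = log_returns(close)
```

A convenient property of log returns is that they telescope: the sum of the daily returns equals the log return over the whole window.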
Table 1 – Univariate Statistics – Daily Log Returns
Unfortunately, there are only 2,379 trading days with no missing prices (Table 2 and Figure 16), and we need to handle the missing information before we can proceed. Standard techniques for missing data cannot be applied without distorting the multivariate series considerably. For example, row-wise removal decreases the sample size and creates irregular (i.e., non-daily) time series. Single imputation techniques, such as copying the last available price or interpolating the nearest prices, may introduce substantial biases.
Instead, we use a better alternative: an Expectation-Maximization (EM) algorithm for recovering multivariate normal missing data, proposed by Meucci [217]. It still induces some distortions, but they do not appear to be as severe (Figure 16 and Figure 17) as those usually caused by the standard methods.
IBX S&P 500 FTSE 100 DAX Nikkei 225
Mean 0.0262% 0.0207% 0.0039% 0.0277% 0.0151%
Standard Deviation 2.3230% 1.2639% 1.4560% 1.6460% 1.5018%
Skewness -0.2630 -0.3327 -0.0310 -0.0317 -0.4776
Kurtosis 10.1024 14.2830 13.2818 9.0508 8.8348
Median 0.097% 0.074% 0.067% 0.055% 0.048%
Mean Absolute Deviation 1.631% 0.804% 0.978% 1.133% 1.073%
Lower Semi Standard Deviation 1.679% 0.920% 1.050% 1.173% 1.098%
Upper Semi Standard Deviation 1.605% 0.866% 1.009% 1.154% 1.024%
Table 2 – Missing Return Patterns

Figure 16 – Missing Returns Histogram

In summary, 2,378 rows have no missing returns; 295 rows have one missing return; 53 rows have two; 18 rows have three; and 18 rows have four. The per-series totals of missing returns are DAX 62, FTSE 79, S&P 91, IBX 139 and Nikkei 156, i.e. 527 missing returns overall.
Figure 17 – Distributions with Adjusted Missing Data
4.2 Scenario Tree Generation
The framework proposed in this study combines the approaches proposed by Diebold, Schuermann and Stroughair [75], McNeil and Frey [213], and Rockinger and Jondeau [162][268]. We substitute an asymmetric GJR-GARCH model for the original GARCH model to encompass potential leverage effects, and expand the univariate analysis to a multivariate setting using a t-Copula to account for the dependence structure among the financial series.
Our combined approach comprises four major steps:
(i) Fit a GJR-GARCH model to each of the univariate series to generate
approximately i.i.d. standardized residuals
(ii) Fit GPD tails and a smoothed-kernel body to each series of standardized
residuals
(iii) Estimate a t-copula using the transformed residuals of the previous
margins
(iv) Simulate dependent paths using the combined models
Before the initial step, we examine the series looking for indications of the four
stylized facts: heavy tails, volatility clustering, leverage effects and tail co-dependence.
Cumulative returns and daily log returns can help us identify signs of these features.
Additionally, correlograms and tests such as the Augmented Dickey-Fuller test, Ljung-
Box Q test and Engle test for ARCH effects can further support our analysis.
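The Ljung-Box Q statistic and Engle's LM test mentioned above can be sketched directly from their definitions. This is a minimal Python illustration (the thesis relied on Matlab toolbox implementations; these hand-rolled versions are for exposition only):

```python
import numpy as np
from scipy import stats

def ljung_box(x, lags=8):
    """Ljung-Box Q statistic and p-value for serial correlation up to `lags`."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # sample autocorrelations at lags 1..lags
    acf = np.array([np.sum(x[k:] * x[:-k]) / np.sum(x * x)
                    for k in range(1, lags + 1)])
    q = n * (n + 2) * np.sum(acf ** 2 / (n - np.arange(1, lags + 1)))
    return q, stats.chi2.sf(q, df=lags)

def engle_arch_lm(x, lags=8):
    """Engle's LM test: regress the squared series on its own lags; LM = n * R^2."""
    e2 = np.asarray(x, dtype=float) ** 2
    y = e2[lags:]
    Z = np.column_stack([np.ones(len(y))]
                        + [e2[lags - k:-k] for k in range(1, lags + 1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - resid.var() / y.var()
    lm = len(y) * r2
    return lm, stats.chi2.sf(lm, df=lags)
```

Applied to raw returns, the Ljung-Box p-value should be large (little serial correlation), while the ARCH LM test applied to the same returns should strongly reject homoscedasticity when volatility clustering is present.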
Once we confirm that these stylized facts seem to be present, we can model
them. First, we take into account volatility clustering and leverage effects by fitting a
GJR-GARCH process with Student t innovations to each series of log daily returns.
This step constructs sequences of approximately i.i.d. standardized residuals with
heavy tails that we use in the subsequent steps.
As pointed out earlier, we selected a parsimonious GJR-GARCH (1,1) specification
to model the conditional variance because it is more tractable and still provides
adequate estimates. Conversely, we do not assume a priori that the conditional mean
has no significant autoregressive term. Instead, we compare low-order ARMA
specifications to verify that conditional raw returns indeed exhibit serial correlations
close to zero, as suggested by the literature.
Thus, we assume that:
\varepsilon_t = r_t - \mu_{t-1}(r_t) \qquad (99)

\mu_{t-1}(r_t) = c + \phi\, r_{t-1} + \theta\, \varepsilon_{t-1} \qquad (100)

\varepsilon_t = \sigma_t z_t \qquad (101)

\sigma_t^2 = \omega + \alpha\, \varepsilon_{t-1}^2 + \beta\, \sigma_{t-1}^2 + \gamma\, \mathbb{1}\{\varepsilon_{t-1} < 0\}\, \varepsilon_{t-1}^2 \qquad (102)

\omega > 0, \quad \alpha \ge 0, \quad \beta \ge 0, \quad \gamma \ge 0

where \{r_t\} is the series of daily log returns, \mu_{t-1}(r_t) is the conditional mean return,
whose best specification is likely to omit autoregressive and moving-average terms,
\{z_t\} is an i.i.d. sequence with Student t distribution and \sigma_t^2 is the conditional variance at
time t. We then model each univariate series of daily log returns using the formulation
above and examine the residual diagnostics.
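The variance recursion in equations (101)-(102) can be sketched in a few lines; the parameter values used below are placeholders, not the estimates reported later in Table 7:

```python
import numpy as np

def gjr_garch_filter(eps, omega, alpha, beta, gamma):
    """Run the GJR-GARCH(1,1) variance recursion over residuals `eps`.

    Returns (sigma2, z): conditional variances and standardized residuals.
    """
    n = len(eps)
    sigma2 = np.empty(n)
    # start the recursion at the unconditional variance implied by the model
    # (for symmetric shocks the leverage term fires half of the time)
    sigma2[0] = omega / max(1e-12, 1 - alpha - beta - 0.5 * gamma)
    for t in range(1, n):
        neg = 1.0 if eps[t - 1] < 0 else 0.0   # indicator of a negative shock
        sigma2[t] = (omega + alpha * eps[t - 1] ** 2
                     + beta * sigma2[t - 1]
                     + gamma * neg * eps[t - 1] ** 2)
    z = eps / np.sqrt(sigma2)
    return sigma2, z
```

Negative shocks raise next-period variance by an extra `gamma * eps**2`, which is exactly the leverage effect the GJR term is meant to capture; the returned `z` series is the (approximately i.i.d.) standardized residual input of the next step.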
We perform a visual analysis of the daily standardized residuals to check how
reasonable the hypothesis of (approximately) i.i.d. standardized residuals is. We also
check for remaining signs of volatility clustering, leverage effects and heavy-tails.
Given the discussed literature, we expect only the latter to be present. Further checks
employ correlograms, Ljung-Box Q tests and Engle tests for ARCH effects of the raw
and squared standardized residuals to confirm that the former two stylized facts have
been adequately modelled.
Once we have the i.i.d. standardized residual series, we can improve the fit
by separating each series' distribution into three parts: the lower tail, the upper tail and
the central part of the distribution. First, we need to decide which particular distribution is
suitable for each one of these parts. Based on the Peaks over Threshold (POT) method
of Extreme Value Theory discussed earlier, it seems reasonable to assume that, given
a sufficiently large threshold u and shape parameter \xi > 0, the tails approximately follow
a GPD:

G_{\xi,\beta}(y) = \begin{cases} 1 - \left(1 + \xi\, \dfrac{y}{\beta}\right)^{-1/\xi} & \text{if } \xi \neq 0 \\[4pt] 1 - e^{-y/\beta} & \text{if } \xi = 0 \end{cases}, \quad \text{where } \beta > 0 \qquad (103)
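The POT fit of equation (103) can be sketched with `scipy.stats.genpareto`. Choosing the threshold as a fixed empirical quantile is an illustrative simplification of the graphical threshold selection described in the text:

```python
import numpy as np
from scipy import stats

def fit_gpd_tail(z, q=0.90):
    """Fit a GPD to the exceedances over the empirical q-quantile of `z` (POT).

    Returns (threshold u, shape xi, scale beta).
    For the lower tail, pass -z, as done in the text for negative residuals.
    """
    z = np.asarray(z, dtype=float)
    u = np.quantile(z, q)
    excess = z[z > u] - u                 # exceedances y = z - u > 0
    # fix the location at 0 so only (shape, scale) are estimated
    xi, _, beta = stats.genpareto.fit(excess, floc=0)
    return u, xi, beta
```

The same routine serves both tails: the upper tail is fitted to the standardized residuals directly and the lower tail to their negatives.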
Note that for the upper tails (i.e., the positive extreme observations), we can fit
a GPD directly to the daily standardized residuals while for the lower tails (i.e., the
negative extreme observations), we need to fit a GPD to the negative standardized
residuals.
The first step in estimating each GPD is to define proper thresholds,
which separate the tails from the centre of the distributions. As we underlined in the
previous section, these thresholds are not easily identified. One alternative is to
analyze the quantile-quantile and empirical excess mean plots to obtain an
approximate idea of each tail's threshold. Once the lower and upper thresholds are
selected, we proceed with the GPD estimation and identify, for each standardized
residual series, the shape parameter \xi (or, equivalently, the tail index \alpha = 1/\xi) and the
scale parameter \beta of each tail.
Then, based on the selected thresholds, we use the central part of the standardized
residuals to fit the body of each distribution with a Gaussian kernel using a Kernel Density
Estimator (KDE). This approach assumes that a large number of observations is
available to properly describe the central part of the distribution.
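Stitching the GPD tails onto a kernel body yields a piecewise semi-parametric CDF. The sketch below assumes the tail probability masses are set to the empirical frequencies beyond the thresholds, and it ignores the small kernel mass leaking outside the body (all parameter names are illustrative):

```python
import numpy as np
from scipy import stats

def semiparametric_cdf(z, u_lo, u_hi, xi_lo, b_lo, xi_hi, b_hi):
    """Piecewise CDF: GPD below u_lo and above u_hi, kernel body in between.

    z: the fitted standardized-residual sample; returns a vectorized F(x).
    """
    z = np.asarray(z, dtype=float)
    p_lo = np.mean(z < u_lo)              # probability mass in the lower tail
    p_hi = np.mean(z > u_hi)              # probability mass in the upper tail
    kde = stats.gaussian_kde(z[(z >= u_lo) & (z <= u_hi)])

    def F(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        out = np.empty_like(x)
        lo, hi = x < u_lo, x > u_hi
        mid = ~(lo | hi)
        # lower tail: F(x) = p_lo * GPD survival of the distance below u_lo
        out[lo] = p_lo * stats.genpareto.sf(u_lo - x[lo], xi_lo, scale=b_lo)
        # upper tail: F(x) = 1 - p_hi * GPD survival of the exceedance
        out[hi] = 1 - p_hi * stats.genpareto.sf(x[hi] - u_hi, xi_hi, scale=b_hi)
        # body: kernel CDF rescaled to the central probability mass
        body = np.array([kde.integrate_box_1d(-np.inf, v) for v in x[mid]])
        out[mid] = p_lo + (1 - p_lo - p_hi) * body
        return out
    return F
```

This `F` (and its inverse) is what later maps standardized residuals to uniform variates for the copula step, and uniform variates back to residuals in the simulation step.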
After we have fitted the parametric tails and kernel body of each distribution of
standardized residuals, we evaluate the results. First, we compare the empirical upper
and lower tail distributions to the suggested GPD model. Secondly, we compare the
empirical cumulative distribution functions to the semi-parametric ones generated by
each combined GJR-GARCH + EVT (GPD) model. Lastly, we evaluate how well
each proposed semi-parametric distribution fits the empirical distribution using
quantile-quantile and probability density function plots.
Provided the combined GJR-GARCH + EVT(GPD) models are adequate, we
can proceed and assess the joint dependence structure by using the standardized
residuals to fit a t-copula. The approach is straightforward: employing the GJR-
GARCH + EVT (GPD) cumulative distribution functions, we transform the residuals into
uniform variates that are used to fit the t-copula using the CML method described
earlier. As pointed out in the literature, we expect to explain some of the tail co-
dependence between the margins observed in the bivariate scatter and density plots.
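A compact sketch of the CML step: the correlation matrix is obtained by Kendall's tau inversion and the degrees of freedom by profiling the copula likelihood. This is a common simplification for exposition; the exact routine used in the thesis is Matlab's, and the function name here is illustrative:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

def fit_t_copula(U):
    """CML sketch for a t-copula. U: (n, d) pseudo-uniform observations."""
    U = np.clip(np.asarray(U, dtype=float), 1e-6, 1 - 1e-6)
    n, d = U.shape
    # Kendall's tau -> correlation: rho = sin(pi * tau / 2)
    R = np.eye(d)
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = stats.kendalltau(U[:, i], U[:, j])
            R[i, j] = R[j, i] = np.sin(np.pi * tau / 2)

    def neg_loglik(nu):
        X = stats.t.ppf(U, df=nu)                       # map uniforms to t-scores
        mvt = stats.multivariate_t(loc=np.zeros(d), shape=R, df=nu)
        # copula log-density: joint t log-density minus the marginal ones
        return -(mvt.logpdf(X).sum() - stats.t.logpdf(X, df=nu).sum())

    res = minimize_scalar(neg_loglik, bounds=(2.1, 60.0), method='bounded')
    return R, res.x
```

A low fitted degrees-of-freedom value signals strong tail co-dependence between the margins; as the value grows the t-copula approaches the Gaussian copula, which has none.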
As the final step, we use the combined GJR-GARCH + EVT (GPD) + t Copula
model to simulate dependent samples (or trajectories) of the daily log returns. As
described in the previous section, we first generate samples of dependent uniform
variates using the fitted t copula. Then, we transform back these variates into
standardized residuals using the inverse of their distributions based on calibrated GPD
tails and kernel-smoothed bodies. Lastly, we filter the standardized residuals through
the ARMA + GJR-GARCH models to simulate the dependent daily log returns.
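The simulation step can be sketched as follows. The empirical quantile stands in for the semi-parametric inverse CDF of the fitted margins, and all parameter values are placeholders:

```python
import numpy as np
from scipy import stats

def simulate_t_copula(R, nu, n, rng=None):
    """Draw n dependent uniform vectors from a t-copula (correlation R, df nu)."""
    rng = np.random.default_rng(rng)
    d = R.shape[0]
    L = np.linalg.cholesky(R)
    Z = rng.standard_normal((n, d)) @ L.T           # correlated normals
    W = rng.chisquare(nu, size=(n, 1)) / nu         # common chi-square mixing
    X = Z / np.sqrt(W)                              # multivariate t samples
    return stats.t.cdf(X, df=nu)                    # map the margins to U(0,1)

def simulate_returns(u, z_sample, mu, omega, alpha, beta, gamma):
    """Map one margin's uniforms to standardized residuals (via the empirical
    quantile of `z_sample`, a stand-in for the semi-parametric inverse CDF),
    then run the GJR-GARCH recursion forward to build a return path."""
    z = np.quantile(z_sample, u)
    sigma2 = omega / (1 - alpha - beta - 0.5 * gamma)   # unconditional variance
    r = np.empty(len(z))
    for t in range(len(z)):
        eps = np.sqrt(sigma2) * z[t]
        r[t] = mu + eps
        neg = 1.0 if eps < 0 else 0.0
        sigma2 = omega + (alpha + gamma * neg) * eps ** 2 + beta * sigma2
    return r
```

Because every margin is driven by the same copula draw, the simulated paths inherit both the cross-sectional dependence (from the t-copula) and the serial dependence in volatility (from the GARCH filter).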
This combined approach generates multivariate samples that exhibit the four
key stylized facts discussed previously: heavy tails, volatility clustering, leverage
effects and tail co-dependence. As a practical application, we perform a multi-objective
Mean-Variance-CVaR portfolio optimization using samples simulated from a multivariate
Gaussian distribution and from the proposed GJR-GARCH + EVT (GPD) + t Copula
approach. We present the results and analyses in the next section.
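For a single period, the Mean-CVaR part of such an optimization reduces to the Rockafellar-Uryasev linear program over scenarios. The sketch below shows that LP with `scipy.optimize.linprog`; the scenario matrix and target return are toy inputs, and the multi-period, multi-objective extensions developed in the thesis are omitted:

```python
import numpy as np
from scipy.optimize import linprog

def min_cvar_portfolio(R, beta=0.95, target=None):
    """Minimize CVaR_beta of the portfolio loss over return scenarios.

    R: (S, n) matrix of scenario returns. Decision variables are the weights
    w (n), the VaR level a (1), and the scenario excess losses z (S).
    """
    S, n = R.shape
    # objective: a + (1 / ((1 - beta) S)) * sum(z)
    c = np.concatenate([np.zeros(n), [1.0], np.full(S, 1.0 / ((1 - beta) * S))])
    # z_s >= -R_s.w - a   rewritten as   -R_s.w - a - z_s <= 0
    A_ub = np.hstack([-R, -np.ones((S, 1)), -np.eye(S)])
    b_ub = np.zeros(S)
    # budget constraint: weights sum to one
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, S + 1))])
    b_eq = np.array([1.0])
    if target is not None:
        # expected return >= target   rewritten as   -mean(R).w <= -target
        row = np.concatenate([-R.mean(axis=0), np.zeros(S + 1)])
        A_ub = np.vstack([A_ub, row])
        b_ub = np.append(b_ub, -target)
    bounds = [(0, None)] * n + [(None, None)] + [(0, None)] * S
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method='highs')
    return res.x[:n], res.fun          # optimal weights and minimized CVaR
```

Sweeping the return target traces out the Mean-CVaR efficient frontier of Figure 38; transaction-cost and liquidity constraints would enter as additional linear rows in the same program.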
5 Empirical Results
In this section, we present our empirical findings and comment on some of the
results. All analyses were performed using Matlab R2015a (version 8.5), except for the
missing data analysis discussed in the previous section, which was partially
implemented using R 3.2.3.
We begin with an initial inspection of the cumulative log returns of these financial
series (Figure 18), which suggests some interesting features such as time-varying
volatilities, erratic extreme variations and some degree of co-dependence.
Figure 18 – Stock Indexes’ Cumulative Returns
We first examine the univariate behaviour of each series. A closer look at the
daily log returns (Figure 21) seems to confirm the presence of volatility clustering,
heavy tails and some leverage effects in all series.
Figure 19 – ACF and PACF of Raw Returns
Figure 20 – ACF and PACF of Squared Returns
Figure 21 – Daily Log Returns
Augmented Dickey-Fuller tests (Table 3) at the 5% significance level reject
the hypothesis of unit roots in these series. Additionally, as suggested by previous
research, the sample Autocorrelation Functions (ACFs) (Figure 19) and sample
Partial Autocorrelation Functions (PACFs) (Figure 20), as well as the Ljung-Box Q-tests
(Table 4), show little evidence of serial correlation in raw returns. They do, however,
suggest strong evidence of serial correlation in squared returns (with apparently very
slow decay) and consequently reject the assumption of i.i.d. series. Similar analyses
of the absolute returns also corroborate these findings.
Table 3 – ADF Tests

ADF Test (Critical Value = -1.94)
             p-Value    t-Statistic
IBX          < 0.001    -49.91
S&P 500      < 0.001    -57.85
FTSE 100     < 0.001    -52.33
DAX          < 0.001    -51.15
Nikkei 225   < 0.001    -59.68

Following Tsay [301], we set the number of lags to ln T = ln 2672 ≅ 7.89, rounded up to
the next integer, to increase the performance of the Ljung-Box Q test.

Table 4 – Ljung-Box Q-test on Squared Residuals

Q-Statistic
Lag             1        2        3        4        5        6        7        8
IBX          177.30   736.70   947.32  1230.76  1783.94  1981.57  2506.86  2596.46
S&P 500      122.33   539.67   643.83   854.88  1193.11  1448.50  1761.37  1962.63
FTSE 100      81.20   229.19   401.58   558.66  1005.58  1117.65  1221.16  1305.16
DAX           66.57   219.65   412.30   505.39   736.86   794.04   929.37   988.69
Nikkei 225   143.81   585.49   670.36  1005.09  1088.10  1285.66  1373.98  1532.22
Critical
Values         3.84     5.99     7.81     9.49    11.07    12.59    14.07    15.51

Table 5 – ARCH Engle test for Residual Heteroscedasticity

F-Statistic
Lag             1        2        3        4        5        6        7        8
IBX          177.05   614.10   651.65   675.28   887.74   889.99   939.63   951.04
S&P 500      122.15   463.30   476.97   510.91   662.00   705.58   738.34   751.14
FTSE 100      81.09   196.56   294.74   353.44   599.38   608.99   609.21   610.24
DAX           66.47   192.20   312.01   334.02   423.16   423.40   441.99   441.78
Nikkei 225   143.60   494.23   497.49   594.37   596.66   607.92   613.54   619.85
Critical
Values         3.84     5.99     7.81     9.49    11.07    12.59    14.07    15.51

Fitting an ARMA (p,q) + GJR-GARCH (1,1) model to each one of the univariate
series, we observe that most of them are best explained by parsimonious ARMA (0,0)
+ GJR-GARCH (1,1) processes (Table 6), which suggests that the conditional means
are constant and, in these cases, not statistically different from zero (Table 7). The
exceptions are the IBX and S&P 500 series. However, the autoregressive term is not
statistically relevant in the former (Table 8) and neither is the moving average term in
the latter (Table 9). Therefore, in this study we also assume ARMA (0,0) + GJR-
GARCH (1,1) specifications for these series (Table 7).
Table 6 – BIC Comparison of ARMA (P,Q) + GJR-GARCH (1,1) Models

Bayesian Information Criterion (rows: P, columns: Q)

IBX
P \ Q        0           1           2
0        -13,561.9   -13,562.2   -13,554.4
1        -13,562.2   -13,554.4   -13,550.7
2        -13,554.4   -13,550.1   -13,542.9

S&P 500
P \ Q        0           1           2
0        -17,559.9   -17,560.1   -17,554.8
1        -17,559.6   -17,560.1   -17,546.9
2        -17,554.5   -17,546.7   -17,544.6

FTSE 100
P \ Q        0           1           2
0        -16,407.0   -16,399.1   -16,391.2
1        -16,399.1   -16,391.6   -16,383.7
2        -16,391.2   -16,386.3   -16,377.5

DAX
P \ Q        0           1           2
0        -15,433.0   -15,425.3   -15,417.6
1        -15,425.3   -15,418.4   -15,410.8
2        -15,417.6   -15,410.8   -15,404.2

Nikkei 225
P \ Q        0           1           2
0        -15,786.9   -15,731.8   -15,779.1
1        -15,784.4   -15,779.1   -15,773.2
2        -15,779.3   -15,773.1   -15,765.5
Table 7 – ARMA (0,0) + GJR-GARCH (1,1) Specifications

(columns: Value, Std. Error, t Stat)

IBX
ARMA(0,0) Model
Constant      0.0003    0.0003       0.91
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       5.66
GARCH{1}      0.9194    0.0105       87.65
ARCH{1}       0.0061    0.0108       0.57
Leverage{1}   0.1106    0.0156       7.09
Student t Innovations
DoF           12.7322   2.8893       4.41

S&P 500
ARMA(0,0) Model
Constant      0.0004    0.0001       3.13
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       3.02
GARCH{1}      0.8864    0.0136       64.99
ARCH{1}       0.0000    0.0168       0.00
Leverage{1}   0.1953    0.0256       7.63
Student t Innovations
DoF           6.0985    0.8277       7.37

FTSE 100
ARMA(0,0) Model
Constant      0.0002    0.0002       0.88
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       2.75
GARCH{1}      0.9080    0.0108       83.90
ARCH{1}       0.0092    0.0105       0.88
Leverage{1}   0.1444    0.0191       7.57
Student t Innovations
DoF           9.4403    1.5830       5.96

DAX
ARMA(0,0) Model
Constant      0.0005    0.0002       2.35
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       3.54
GARCH{1}      0.9078    0.0122       74.49
ARCH{1}       0.0019    0.0122       0.16
Leverage{1}   0.1493    0.0199       7.51
Student t Innovations
DoF           8.3243    1.3512       6.16

Nikkei 225
ARMA(0,0) Model
Constant      0.0002    0.0002       0.95
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       4.05
GARCH{1}      0.8802    0.0155       56.66
ARCH{1}       0.0296    0.0159       1.86
Leverage{1}   0.1154    0.0224       5.15
Student t Innovations
DoF           9.2877    1.4095       6.59
Table 8 – ARMA (1,0) + GJR-GARCH (1,1) Specification – IBX

(columns: Value, Std. Error, t Stat)
ARMA(1,0) Model
AR{1}         0.0002    0.0003       0.66
Constant      0.0567    0.0207       2.73
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       5.72
GARCH{1}      0.9185    0.0105       87.86
ARCH{1}       0.0040    0.0108       0.37
Leverage{1}   0.1168    0.0163       7.15
Student t Innovations
DoF           12.9052   2.9286       4.41

Table 9 – ARMA (0,1) + GJR-GARCH (1,1) Specification – S&P 500

(columns: Value, Std. Error, t Stat)
ARMA(0,1) Model
MA{1}         0.0005    0.0003       1.47
Constant     -0.0559    0.0211      -2.66
GJR-GARCH(1,1) Model
Constant      0.0000    0.0000       2.95
GARCH{1}      0.8899    0.0134       66.43
ARCH{1}       0.0000    0.0164       0.00
Leverage{1}   0.1863    0.0244       7.63
Student t Innovations
DoF           6.0824    0.8226       7.39

We notice that there is strong evidence of volatility clustering and leverage
effects in all series. On the other hand, we also notice that the ARCH terms are not
statistically different from zero, which might have led to identification problems and
compromised the quasi-maximum likelihood estimates, as underlined in Andersen et
al. [7]. Fortunately, the residual diagnostics suggest this is not the case.

A visual examination of the daily standardized residuals appears to support
that they are nearly i.i.d. (Figure 23). Analyses of the ACFs and PACFs further
corroborate this assumption (Figure 24 and Figure 25). Lastly, Quantile-Quantile (QQ)
plots against Gaussian and Student t distributions indicate that we can improve the
fitting of the tails (Figure 26), except maybe for the FTSE 100 series. Although Gaussian
distributions are too thin-tailed to accommodate the extreme observations, Student t
distributions do not seem to achieve good results regarding tail behaviour either, possibly
because they are fitted to the whole distributions rather than to the tails individually.
Figure 22 – Filtered Residuals and Filtered Conditional Volatility
Figure 23 – Standardized Residuals
Figure 24 – ACF and PACF of Raw Standardized Residuals
Figure 25 – ACF and PACF of Squared Standardized Residuals
Figure 26 – QQ Plots of the Standardized Residuals vs Normal and Student t Distributions
As we saw previously, an interesting alternative to improve the tail fitting is to use
the Peaks over Threshold (POT) method to fit each tail of the standardized residuals
individually. First, we use the QQ plots (Figure 26) and the Empirical Excess Mean
(EEM) plots (Figure 27 and Figure 28) to select an appropriate threshold for each tail
(Table 10).
Figure 27 – Empirical Mean Excess Plots – Lower Tails
Figure 28 – Empirical Mean Excess Plots – Upper Tails
Table 10 – GPD Thresholds

GPD Thresholds
             Lower Tail   Upper Tail
IBX          -3.0         2.2
S&P 500      -2.5         2.0
FTSE 100     -2.5         2.0
DAX          -3.2         2.6
Nikkei 225   -2.5         2.1

Given the selected lower and upper thresholds, we estimate the GPD
parameters of each tail, fit the kernel-smoothed bodies and evaluate the results (Figure
29 to Figure 33). We see that the lower tails of the IBX and DAX distributions and the
upper tail of the FTSE 100 distribution have finite variance but not finite fourth moments
(Table 11). We also notice that the distribution tails do not seem to be symmetric.
Nonetheless, an overall analysis of the cumulative function, density function and QQ
plots suggests that the proposed GJR-GARCH + EVT (GPD) approach provides a good
fit to the heavy tails, volatility clustering and leverage effects observed in the empirical
distributions (Figure 31, Figure 32 and Figure 33).

Table 11 – Tail Fitting – GPD Parameters

GPD Parameters
             Lower Tail                 Upper Tail
             Tail Index   Scale         Tail Index   Scale
                          Parameter                  Parameter
IBX          2.8416       0.2710        5.0607       0.2591
S&P 500      6.3026       0.4457        56.5416      0.3355
FTSE 100     34.8760      0.5119        2.5257       0.3028
DAX          3.3312       0.4719        257.8469     0.2498
Nikkei 225   4.2835       0.4692        28.5293      0.3123
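The moment conditions read off Table 11 follow from the GPD property that a tail with index α (= 1/ξ) has finite k-th moment only for k < α; a quick check against the reported estimates:

```python
def finite_moments(alpha):
    """For a GPD tail index alpha, return (finite variance, finite 4th moment)."""
    return alpha > 2, alpha > 4

# Tail-index estimates taken from Table 11
heavy_tails = {'IBX lower': 2.8416, 'DAX lower': 3.3312, 'FTSE 100 upper': 2.5257}
flags = {name: finite_moments(a) for name, a in heavy_tails.items()}
# each of these tails has a finite second but infinite fourth moment
```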
Figure 29 – Lower Tail Fitting – Standardized Residuals
Figure 30 – Upper Tail Fitting – Standardized Residuals
Figure 31 – Empirical Cumulative Distribution Functions
Figure 32 – QQ Plots of the Standardized Residuals vs GJR-GARCH + GPD Distributions
Figure 33 – Standardized Residuals Fitting
Therefore, assuming that the GJR-GARCH + EVT (GPD) models are suited to
model the marginal distributions, we use them to fit a t-copula that captures their
dependence structure. As we described previously, the approach is simple: we use the
GJR-GARCH + EVT (GPD) cumulative distribution functions to transform the residuals
into uniform variates. Then we fit the t-copula to the uniform variates using the CML
method. As we observe in the bivariate scatter and density plots (Figure 34 to Figure
37), all series, except for the Nikkei 225, appear to exhibit some degree of tail co-
dependence. The estimated t copula parameters (Table 12) also indicate a high
correlation between most of the series.
Table 12 – t-Copula Parameters

t-Copula Correlations
             IBX    S&P 500   FTSE 100   DAX    Nikkei 225
IBX          1.00   0.63      0.60       0.57   0.11
S&P 500      0.63   1.00      0.57       0.59   0.00
FTSE 100     0.60   0.57      1.00       0.86   0.22
DAX          0.57   0.59      0.86       1.00   0.20
Nikkei 225   0.11   0.00      0.22       0.20   1.00

Degrees of Freedom: 13.41

Figure 34 – Dependence between Series – Part 1
Figure 35 – Dependence between Series – Part 2
Figure 36 – Bivariate Histograms – Part 1
Figure 37 – Bivariate Histograms – Part 2
Lastly, we use the complete combined GJR-GARCH + EVT (GPD) + t Copula
approach to simulate 500,000 cross-sectionally and serially dependent samples of the
daily series. We begin by constructing dependent uniform variates using the fitted
t copula, as explained in the previous section.
Figure 38 – Mean-CVaR Efficient Frontiers
Figure 39 – Mean-Variance Efficient Frontiers
6 Conclusions and Further Research
In this thesis, we discuss three key steps to perform a long-term multistage
multi-objective portfolio optimization based on stochastic programming. First, we
develop and estimate a combined GJR-GARCH + EVT-GPD + t-Copula econometric
model to simulate a large number of discrete multivariate trajectories, which is
represented by a large scenario tree. Secondly, we approximate the original scenario
tree by a smaller one that is both tractable and arbitrage-free. Lastly, we use the
approximated scenario tree to optimize a Mean-Variance-CVaR portfolio.
The first step combines different econometric models, extending approaches
similar to those suggested by McNeil and Frey [213], and Jondeau and Rockinger
[162], to simulate four major stylized facts: volatility clustering, heavy tails, leverage
effects and tail co-dependence, which are typically indispensable to proper risk
management and portfolio optimization in financial markets. The reason is
straightforward: models that avoid unrealistic simplifying assumptions (i.e., models
that can handle real-world multivariate distributions) often achieve superior results
because the simulated scenario tree (or, more precisely, scenario fan) better
describes key empirical features of the financial series.
However, the original scenario tree is neither tractable nor arbitrage-free.
Therefore, we need a second step that (i) approximates the original tree by a smaller
one (i.e., reduces its size) and (ii) removes potential arbitrage opportunities.
Otherwise, practical portfolio optimization problems are likely to be either unsolvable
(within a reasonable amount of time) or biased. To accomplish both objectives, this
step extends the optimal discretization approach, proposed by Pflug [203] and
developed by Pflug and Pichler [205][206], and Kovacevic and Pichler [155], to
preclude arbitrage opportunities.
Finally, the last step exploits the approximated scenario tree to perform a multi-
objective multi-period stochastic portfolio Mean-Variance-CVaR optimization that
considers market frictions like transaction costs and liquidity restrictions. Maximizing
expected return and minimizing variance are virtually mandatory objectives in most
real-world portfolio optimization problems. On the other hand, minimizing the
Conditional Value-at-Risk is incorporated to handle market risk more accurately in
practical asset allocation applications. This final step extends the approach discussed
by Roman, Darby-Dowman and Mitra [269] to our multi-period setting.
Results discussed in the previous section suggest that these combined steps
offer a thorough and reliable framework to optimize such portfolios. The absence of
questionable assumptions (e.g., no market frictions or liquidity restrictions) and better
uncertainty modelling (e.g., co-dependence and heavy tails), combined with a more
realistic set of objectives, make the framework better equipped to manage practical
portfolios.
Although some of these portfolio optimization enhancements have been proposed in
the literature, to the best of our knowledge they have never been combined into a single
framework.
Such improvements have only been possible because of the recent advances
in computing power, which proved invaluable to implement approaches like multi-
period stochastic programming. Such approaches are instrumental to real applications
because they handle key features of real markets that are often dismissed by common
optimization approaches.
Furthermore, the techniques and models used in the proposed framework can
be replaced by newer extensions that have been recently explored and/or further
developed. The Student t distribution, for instance, can be replaced by Hansen’s
skewed-t distribution [125] to accommodate also potential gain and loss asymmetries
in daily returns, a fifth and potentially relevant stylized fact. Bayesian Stochastic
Volatility models, based on Bayesian Markov Chain Monte Carlo (MCMC) algorithms,
can substitute the GJR-GARCH models to mitigate potential model
misspecifications, as discussed by Jacquier, Polson and Rossi [152][153], and
Bauwens, Hafner and Laurent [20]. The t copulas can be replaced by vine copulas, as
suggested by Kurowicka and Joe [189], and Patton [237][238], which are able to
explain more elaborate dependence structures and capture some additional
multivariate financial properties like tail dependence asymmetries. Lastly, we assumed that the
true probability distribution was fully known. In practice, however, this is hardly true. In
financial applications, the probability model is usually unknown and can only be
estimated. This model “ambiguity”, as discussed by Pflug and Pichler [246], should also
be accounted for in the stochastic optimization problems to produce solutions that are
more robust.
References
[1] Abdelaziz, F., 2012. Solution Approaches for the Multiobjective Stochastic
Programming. European Journal of Operational Research, Volume 216, Issue
1, Pages 1–16.
[2] Abraham, A., Jain, L., Goldberg, R., 2005. Evolutionary Multiobjective
Optimization. Springer-Verlag, London.
[3] Aggarwal, C., 2014. Data Classification – Algorithms and Applications.
Chapman and Hall/CRC, Boca Raton.
[4] Alberg, D., Shalit, H., Yosef, R., 2008. Estimating Stock Market Volatility
using Asymmetric GARCH Models. Applied Financial Economics, Volume 18,
Issue 15, Pages 1201-1208.
[5] Andersen, T., Bollerslev, T., Diebold, F., Ebens, H., 2001. The Distribution
of Realized Stock Return Volatility. Journal of Financial Economics, Volume
61, Issue 1, Pages 43-67.
[6] Andersen, T., Bollerslev, T., Diebold, F., Labys, P., 2001a. The Distribution
of Realized Exchange Rate Volatility. Journal of the American Statistical
Association, Volume 96, Issue 453, Pages 42–55.
[7] Andersen, T., et al., 2009. Handbook of Financial Time Series. Springer-
Verlag. Heidelberg, Berlin.
[8] Artzner, P., Delbaen, F., Heath, D., 1997. Thinking Coherently. Risk, Volume
10, Issue 11, Pages 68–71.
[9] Artzner, P., Delbaen, F., Eber, J., Heath, D., 1999. Coherent Measures of
Risk. Mathematical Finance, Volume 9, Issue 3, Pages 203–228.
[10] Artzner, P., Delbaen, F., Eber, J., Heath, D., Ku, H., 2007. Coherent
Multiperiod Risk Adjusted Values and Bellman Principle. Annals of Operations
Research, Volume 152, Issue 1, Pages 5-22.
[11] Baba, Y., Engle, R., Kraft, D., Kroner F., 1990. Multivariate Simultaneous
Generalized ARCH. Working Paper. Department of Economics, University of
California, San Diego.
[12] Baillie, R., Bollerslev, T., Mikkelsen, H., 1996a. Fractionally integrated
Generalized Autoregressive Conditional Heteroskedasticity. Journal of
Econometrics, Volume 74, Issue 1, Pages 3–30.
[13] Baillie, R., Chung, C., Tieslau, M., 1996b. Analyzing Inflation by the
Fractionally Integrated ARFIMA-GARCH Model. Journal of Applied
Econometrics, Volume 11, Issue 1, Pages 23–40.
[14] Balkema, A., de Haan, L., 1974. Residual Life Time at Great Age. Annals of
Probability, Volume 2, Issue 5, Pages 792-804.
[15] Balzer, L., 1995. Measuring Investment Risk: A Review. Journal of Investing,
Volume 4, Issue 3, Pages 5-16.
[16] Barndorff-Nielsen, O. 1977. Exponentially Decreasing Distributions for the
Logarithm of Particle Size. Proceedings of the Royal Society of London A,
Volume 353, Issue 1674, Pages 401–419.
[17] Barndorff-Nielsen, O., 1995. Normal Inverse Gaussian Processes and the
Modelling of Stock Returns. Research Report Issue 300, Department of
Theoretical Statistics, University of Aarhus.
[18] Barndorff-Nielsen, O., 1997. Normal Inverse Gaussian Distributions and
Stochastic Volatility Modelling. Scandinavian Journal of Statistics, Volume
24, Issue 1, Pages 1-13.
[19] Barndorff-Nielsen, O., Shephard, N. 2002. Econometric Analysis of
Realised Volatility and its use in Estimating Stochastic Volatility Models.
Journal of the Royal Statistical Society, Series B, Volume 64, Issue 2, Pages
253–280.
[20] Bauwens, L., Hafner, C., Laurent, S., 2012. Handbook of Volatility Models
and their Applications. John Wiley & Sons. Hoboken, New Jersey.
[21] Bauwens, L., Laurent, S., Rombouts, J., 2006. Multivariate GARCH
Models: A Survey. Journal of Applied Econometrics, Volume 21, Issue 1,
Pages 79-109.
[22] Bawa, V., 1975. Optimal Rules for Ordering Uncertain Prospects. Journal of
Financial Economics, Volume 2, Issue 1, Pages 95-121.
[23] Bawa, V., Lindenberg, E., 1977. Capital Market Equilibrium in a Mean-
Lower Partial Moment Framework. Journal of Financial Economics, Volume
5, Issue 2, Pages 189-200.
[24] Ben-Tal, A., Ghaoui, L., Nemirovski, A., 2009. Robust Optimization.
Princeton University Press, New Jersey.
[25] Beraldi, P., Bruni, M., 2014. A Clustering Approach for Scenario Tree
Reduction: An Application to a Stochastic Programming Portfolio
Optimization Problem. TOP, Volume 22, Issue 3, Pages 934-949.
[26] Beraldi, P., Simone, F., Violi, A., 2010. Generating Scenario Trees: A
Parallel Integrated Simulation-Optimization Approach. Journal of
Computational and Applied Mathematics, Volume 233, Issue 9, Pages 2322-
2331.
[27] Berman, M., 1964. Limiting Theorems for the Maximum Term in Stationary
Sequences. Annals of Mathematical Statistics, Volume 35, Issue 2, Pages
502-516.
[28] Bertsimas, D., Lauprete, G., Samarov, A., 2004. Shortfall as a Risk
Measure: Properties, Optimization and Applications. Journal of Economic
Dynamics & Control, Volume 28, Issue 7, Pages 1353-1381.
[29] Bertocchi, M., Consigli, G., Dempster, M., 2011. Stochastic Optimization
Methods in Finance and Energy. Springer, New York.
[30] Birge, J., Louveaux, F., 2011. Introduction to Stochastic Programming,
Second Edition. Springer, New York.
[31] Black, F. 1976. Studies of Stock Price Volatility Changes. Proceedings of the
1976 Meetings of the American Statistical Association, Business, and
Economics Section, Pages 177-181.
[32] Blattberg, R., Gonedes, N., 1974. A Comparison of the Stable and Student
Distributions as Statistical Models for Stock Prices. Journal of Business,
Volume 47, Issue 2, Pages 244-280.
[33] Boender, G., 1997. A Hybrid Simulation/Optimisation Scenario Model for
Asset/Liability Management. European Journal of Operational Research,
Volume 99, Issue 1, Pages 126–135.
[34] Bollerslev, T., 1986. Generalized Autoregressive Conditional
Heteroscedasticity. Journal of Econometrics, Volume 31, Issue 3, Pages 307-
327.
[35] Bollerslev, T., 1987. A Conditional Heteroskedastic Time Series Model for
Speculative Prices and Rates of Return. Review of Economics and Statistics,
Volume 69, Issue 3, Pages 542–547.
[36] Bollerslev, T., 1990. Modelling the Coherence in Short-Run Nominal
Exchange Rates: A Multivariate Generalized ARCH Model. Review of
Economics and Statistics, Volume 72, Issue 3, Pages 498-505.
[37] Bollerslev, T., Engle, R., Wooldridge, J., 1988. A Capital Asset Pricing
Model with Time-Varying Covariances. Journal of Political Economy, Volume
96, Issue 1, Pages 116-131.
[38] Bollerslev, T., Wooldridge, J., 1992. Quasi-Maximum Likelihood Estimation
and Inference in Dynamic Models with Time-Varying Covariances.
Econometric Reviews, Volume 11, Issue 2, Pages 143-172.
[39] Bouyé, E., et al., 2000. Copulas for Finance - A Reading Guide and Some
Applications. Working Paper, City University Business School – Financial
Econometrics Research Center. London.
[40] Brandt, M., 2010. Portfolio Choice Problems. In: Aït-Sahalia, Y., and
Hansen, L., 2010. Handbook of Financial Econometrics: Tools and
Techniques. Elsevier B.V., Pages 269-336.
[41] Cai, J., 1994. A Markov Model of Switching-Regime ARCH. Journal of
Business and Economic Statistics, Volume 12, Issue 3, Pages 309-316.
[42] Campbell, J., Lo, A., MacKinlay, C. 1997. The Econometrics of Financial
Markets. Princeton University Press. Princeton, New Jersey.
[43] Campbell, J., Vieira, L., 2002. Strategic Asset Allocation: Portfolio Choice
for Long-Term Investors. Oxford University Press. Oxford.
[44] Cappiello, L. Engle, R., Sheppard, K., 2006. Asymmetric Dynamics in the
Correlations of Global Equity and Bond Returns. Journal of Financial
Econometrics, Volume 4, Issue 4, Pages 537-572.
[45] Cariño, D., et al., 1994. The Russell-Yasuda Kasai Model: An Asset Liability
Model for a Japanese Insurance Company using Multi-Stage Stochastic
Programming. Operations Research, Volume 24, Issue 1, Pages 29-49.
[46] Cariño, D., Myers, D., Ziemba, W., 1998. Concepts, Technical Issues, and
Uses of the Russell-Yasuda Kasai Financial Planning Model. Operations
Research, Volume 46, Issue 4, Pages 450-462.
[47] Cariño, D., Turner, A., 1998. Multiperiod Asset Allocation with Derivative
Assets. In: Ziemba, W., Mulvey, J.. Worldwide Asset and Liability Modeling.
Cambridge University Press, Cambridge.
[48] Castillo, E., Hadi, A., 1997. Fitting the Generalized Pareto Distribution to
Data. Journal of the American Statistical Association, Volume 92, Issue 440,
Pages 1609-1620.
[49] Chekhlov, A., Uryasev, S., Zabarankin, M., 2000. Portfolio Optimization
with Drawdown Constraints. Research Report 2000-5, Center for Applied
Optimization, Department of Industrial and Systems Engineering, University
of Florida. Available at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=223323
[50] Chekhlov, A., Uryasev, S., Zabarankin, M., 2005. Drawdown Measure in
Portfolio Optimization. International Journal of Theoretical and Applied
Finance, Volume 8, Issue 1, Pages 13-58.
[51] Cherubini, U., Luciano, E., Vecchiato, W., 2004. Copulas Methods in
Finance. John Wiley & Sons. Chichester, UK.
[52] Christie, A., 1982. The Stochastic Behavior of Common Stock Variances:
Value, Leverage and Interest Rate Effects. Journal of Financial Economics,
Volume 10, Issue 4, Pages 407–432.
[53] Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values.
Springer Series in Statistics. London.
[54] Consigli, G., 1997. Dynamic Stochastic Programming for Asset and Liability
Management. Doctoral Thesis, University of Essex, London.
[55] Consigli, G., Dempster, M., 1998. Dynamic Stochastic Programming for
Asset and Liability Management. Annals of Operations Research, Volume 81,
Issue 0, Pages 131-162.
[56] Consigli, G., Iaquinta, G., Moriggia, V., 2010. Path-Dependent Scenario
Trees for Multistage Stochastic Programmes in Finance. Quantitative
Finance, Volume 12, Issue 8, Pages 1265-1281.
[57] Consiglio, A., Carollo, A., Zenios, S., 2015. A Parsimonious Model for
Generating Arbitrage-Free Scenario Trees. Quantitative Finance.
Forthcoming. Available at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2362014
[58] Cont, R., 2001. Empirical Properties of Asset Returns: Stylized Facts and
Statistical Issues. Quantitative Finance, Volume 1, Issue 2, Pages 223-236.
[59] Dacorogna, M., Müller, U., Pictet, O., Vries, C., 2001. Extremal Forex
Returns in Extremely Large Data Sets. Extremes, Volume 4, Issue 2, Pages
105–127.
[60] Dantzig, G., Infanger, G., 1991. Large-Scale Stochastic Linear Programs:
Importance Sampling and Benders Decomposition. Technical Report SOL
91-4, Department of Operations Research, Stanford University.
[61] Dantzig, G., Infanger, G., 1993. Multi-Stage Stochastic Linear Programs for
Portfolio Optimization. Annals of Operations Research, Volume 45, Issue 1,
Pages 59-76.
[62] Date, P., Mamon, R., Jalen, L., 2008. A New Moment Matching Algorithm for
Sampling from Partially Specified Symmetric Distributions. Operations
Research Letters, Volume 36, Issue 6, Pages 669–672.
[63] Davison, A., Smith, R., 1990. Models for Exceedances over High
Thresholds. Journal of the Royal Statistical Society – Series B. Volume 52,
Issue 3, Pages 393-442.
[64] Demarta, S., McNeil, A., 2005. The t Copula and Related Copulas.
International Statistical Review, Volume 73, Issue 1, Pages 111-129.
[65] DeMiguel, V., Garlappi, L., Uppal, R., 2009. Optimal versus Naive
Diversification: How Inefficient is the 1/N Portfolio Strategy? Review of
Financial Studies. Volume 22, Issue 5, Pages 1915-1953.
[66] DeMiguel, V., Plyakha, Y., Uppal, R., Vilkov, G., 2009. Improving Portfolio
Selection using Option-Implied Volatility and Skewness. Journal of Financial
and Quantitative Analysis, Volume 48, Issue 6, Pages 1813-1845.
[67] Dempster, M., 2006. Sequential Importance Sampling Algorithms for
Dynamic Stochastic Programming. Journal of Mathematical Sciences,
Volume 133, Issue 4, Pages 1422-1444.
[68] Dempster, M., Medova, E., Sook, Y., 2011. Comparison of Sampling
Methods for Dynamic Stochastic Programming. In: Bertocchi, M., Consigli,
G., and Dempster, M.. Stochastic Optimization Methods in Finance and
Energy. Springer, New York.
[69] Dempster, M., Medova, E., Sook, Y., 2015. Stabilizing Implementable
Decisions in Dynamic Stochastic Programming. Available at:
http://ssrn.com/abstract=2545992
[70] Dempster, M., Pflug, G., Mitra, G., 2008. Quantitative Fund Management.
Chapman and Hall/CRC, Boca Raton.
[71] Dempster, M., Thompson, R., 1999. EVPI-Based Importance Sampling
Solution Procedures for Multistage Stochastic Linear Programmes on Parallel
MIMD Architectures, Annals of Operations Research, Volume 90, Issue 0,
Pages 161-184.
[72] Dempster, M., Thorlacius, A., 1998. Stochastic Simulation of Internal
Economic Variables and Asset Returns: The Falcon Asset Model.
Proceedings of the 8th International AFIR Colloquium, London Institute of
Actuaries, Pages 29–45.
[73] Dias, J., Vermunt, J., Ramos, S., 2009. Mixture Hidden Markov Models in
Finance Research. In: Fink, A., et al.. Advances in Data Analysis, Data
Handling and Business Intelligence. Springer, New York.
[74] Dias, J., Vermunt, J., Ramos, S., 2015. Clustering Financial Time Series:
New Insights from an Extended Hidden Markov Model. European Journal of
Operational Research, Volume 243, Issue 3, Pages 852-864.
[75] Diebold, F., Schuermann, T., Stroughair, J., 1998. Pitfalls and
Opportunities in the Use of Extreme Value Theory in Risk Management. In:
Advances in Computational Finance, Volume 2, Pages 3-12, Kluwer
Academic Publishers, Amsterdam.
[76] Ding, Z., Granger, C., 1996. Modeling Volatility Persistence of Speculative
Returns: A New Approach. Journal of Econometrics, Volume 73, Issue 1,
Pages 185–215.
[77] Ding, Z., Granger, C., Engle, R., 1993. A Long Memory Property of Stock
Market Returns and a New Model. Journal of Empirical Finance, Volume 1,
Issue 1, Pages 83–106.
[78] Doerner, K., et al., 2004. Pareto Ant Colony Optimization: A Metaheuristic
Approach to Multiobjective Portfolio Selection. Annals of Operations
Research, Volume 131, Issue 1, Pages 79-99.
[79] Dowd, K., 2002. Measuring Market Risk. John Wiley & Sons, Hoboken, New
Jersey.
[80] Dueker, M., 1997. Markov Switching in GARCH Process and Mean-
Reverting Stock-Market Volatility. Journal of Business and Economic
Statistics, Volume 15, Issue 1, Pages 26-34.
[81] Dupačová, J., 1999. Portfolio Optimization via Stochastic Programming:
Methods of Output Analysis. Mathematical Methods of Operational Research,
Volume 50, Pages 245-270.
[82] Dupačová, J., Consigli, G., Wallace, S., 2000. Scenarios for Multistage
Stochastic Programs. Annals of Operations Research, Volume 100, Issue 1,
Pages 25-53.
[83] Dupačová, J., Gröwe-Kuska, N., Römisch, W., 2003. Scenario Reduction
in Stochastic Programming. Mathematical Programming, Volume 95, Issue 3,
Pages 493-511.
[84] Dupačová, J., Hurt, J., Štěpán, J., 2003. Stochastic Modeling in Economics
and Finance. Kluwer Academic Publishers, New York.
[85] Eberlein, E., Keller, U., 1995. Hyperbolic Distributions in Finance. Bernoulli,
Volume 1, Issue 3, Pages 281-299.
[86] Eberlein, E., Keller, U., Prause, K., 1998. New Insights into Smile,
Mispricing, and Value at Risk – The Hyperbolic Model. Journal of Business,
Volume 71, Issue 3, Pages 371-405.
[87] Ehrgott, M., 2005. Multicriteria Optimization, Second Edition. Springer-
Verlag, Heidelberg.
[88] Ekin, T., Polson, N., Soyer, R., 2014. Augmented Markov Chain Monte
Carlo Simulation for Two-Stage Stochastic Programs with Recourse.
Decision Analysis, Volume 11, Issue 4, Pages 250-264.
[89] Embrechts, P., 2000. Extreme Value Theory: Potential and Limitations as an
Integrated Risk Management Tool. Derivatives Use, Trading and
Regulation, Volume 6, Issue 1, Pages 449-456.
[90] Embrechts, P., Klüppelberg, C., Mikosch, T., 1997. Modelling Extremal
Events. Springer, Berlin.
[91] Embrechts, P., Lindskog, F., McNeil, A., 2001. Modelling Dependence with
Copulas and Applications to Risk Management. In: Handbook of Heavy
Tailed Distributions in Finance, Elsevier, North Holland.
[92] Embrechts, P., McNeil, A., Straumann, D., 2002. Correlation and
Dependence in Risk Management: Properties and Pitfalls. In: Risk
Management: Value at Risk and Beyond, Pages 176-223, Cambridge
University Press, Cambridge.
[93] Engle, R., 1982. Autoregressive Conditional Heteroscedasticity with
Estimates of the Variance of United Kingdom Inflation. Econometrica,
Volume 50, Issue 4, Pages 987-1007.
[94] Engle, R., 2002a. Dynamic Conditional Correlation: A Simple Class of
Multivariate Generalized Autoregressive Conditional Heteroskedasticity
Models. Journal of Business and Economic Statistics, Volume 20, Issue 3,
Pages 339-350.
[95] Engle, R., Kroner, F., 1995. Multivariate Simultaneous Generalized ARCH.
Economic Theory, Volume 11, Pages 122-150.
[96] Engle, R., Ng, V., 1993. Measuring and Testing the Impact of News on
Volatility. Journal of Finance, Volume 48, Issue 5, Pages 1749-1778.
[97] Fama, E., 1963. Mandelbrot and the Stable Paretian Hypothesis. Journal of
Business, Volume 36, Issue 4, Pages 420-429.
[98] Fama, E., 1965. Portfolio Analysis in a Stable Paretian Market. Management
Science, Volume 11, Issue 3, Pages 404-419.
[99] Fazlollahi, S., Maréchal, F., 2013. Multi-Objective, Multi-Period Optimization
of Biomass Conversion Technologies using Evolutionary Algorithms and
Mixed Integer Linear Programming (MILP). Applied Thermal Engineering,
Volume 50, Issue 2, Pages 1504–1513.
[100] Ferstl, R., Weissensteiner, A., 2010. Cash Management using Multi-Stage
Stochastic Programming. Quantitative Finance, Volume 10, Issue 2, Pages
209-219.
[101] Fieldsend, J., Matatko, J., Peng, M., 2003. Cardinality Constrained Portfolio
Optimisation Problem. Proceedings of the Fifth International Conference on
Intelligent Data Engineering and Automated Learning, Exeter.
[102] Finkenstädt, B., Holger, R. 2003. Extreme Values in Finance,
Telecommunications, and the Environment. Chapman and Hall/CRC.
London, UK.
[103] Fishburn, P., 1977. Mean-Risk Analysis with Risk Associated with Below-
Target Returns. American Economic Review, Volume 67, Issue 2, Pages
116-126.
[104] Fisher, R., Tippett, L., 1928. Limiting Forms of the Frequency Distribution of
the Largest or Smallest Member of a Sample. Proceedings of the Cambridge
Philosophical Society, Volume 24, Issue 2, Pages 180-190.
[105] Föllmer, H., Schied, A., 2002. Convex Measures of Risk and Trading
Constraints. Finance and Stochastics, Volume 6, Issue 4, Pages 429-447.
[106] Genest, C., Ghoudi, K., Rivest, L., 1995. A Semiparametric Estimation
Procedure of Dependence Parameters in Multivariate Families of
Distributions. Biometrika, Volume 82, Issue 3, Pages 543-552.
[107] Geyer, A., Hanke, M., Weissensteiner, A., 2010. No-Arbitrage Conditions,
Scenario Trees, and Multi-Asset Financial Optimization. European Journal of
Operational Research, Volume 206, Issue 3, Pages 609-613.
[108] Geyer, A., Hanke, M., Weissensteiner, A., 2013. Scenario Tree Generation
and Multi-Asset Financial Optimization Problems. Operations Research
Letters, Volume 41, Issue 5, Pages 494–498.
[109] Geyer, A., Hanke, M., Weissensteiner, A., 2014. No-Arbitrage Bounds for
Financial Scenarios. European Journal of Operational Research, Volume
236, Issue 2, Pages 657–663.
[110] Ghose, D., Kroner, K., 1995. The Relationship between GARCH and
Symmetric Stable Processes - Finding the Source of Fat Tails in Financial
Data. Journal of Empirical Finance, Volume 2, Issue 3, Pages 225-251.
[111] Glosten, L., Jagannathan, R., Runkle, D., 1993. On the Relation between the
Expected Value and the Volatility of the Nominal Excess Return on Stocks.
Journal of Finance, Volume 48, Issue 5, Pages 1779–1801.
[112] Gibbs, A., Su, F., 2002. On Choosing and Bounding Probability Metrics.
International Statistical Review, Volume 70, Issue 3, Pages 419-435.
[113] Gilli, M., Këllezi, E., 2006. An Application of Extreme Value Theory for
Measuring Financial Risk. Computational Economics, Volume 27, Issue 1,
Pages 1-23.
[114] Gnedenko, B., 1943. Sur la Distribution Limite du Terme Maximum d’une
Série Aléatoire. The Annals of Mathematics, Volume 44, Issue 3, Pages 423–
453.
[115] Gonzalez-Rivera, G., 1998. Smooth Transition GARCH Models. Studies in
Nonlinear Dynamics and Econometrics, Volume 3, Issue 2, Pages 161-178.
[116] Granger, C., Spear, S., Ding, Z., 2000. Stylized Facts on the Temporal and
Distributional Properties of Absolute Returns: An Update. In: Statistics and
Finance: An Interface. Imperial College Press, London.
[117] Gray, S., 1996. Modeling the Conditional Distribution of Interest Rates as a
Regime-Switching Process. Journal of Financial Economics, Volume 42,
Issue 1, Pages 27-62.
[118] Gröwe-Kuska, N., Heitsch, H., Römisch, W., 2003. Scenario Reduction
and Scenario Tree Construction for Power Management Problems.
Proceedings of the IEEE Bologna Power Tech Conference. Bologna, Italy.
[119] Guastaroba, G., 2010. Portfolio Optimization: Scenario Generation, Models
and Algorithms. Doctoral Thesis, University of Bergamo, Bergamo.
[120] Gulpinar, N., Rustem, B., Settergren, R., 2004. Simulation and
Optimization Approaches to Scenario Tree Generation. Journal of Economic
Dynamics & Control, Volume 28, Issue 7, Pages 1291-1315.
[121] Hafner, C., Franses, P., 2009. A Generalized Dynamic Conditional
Correlation Model: Simulation and Application to Many Assets. Econometric
Reviews, Volume 28, Issue 6, Pages 612-631.
[122] Hagerman, R., 1978. More Evidence on the Distribution of Security Returns.
Journal of Finance, Volume 33, Issue 4, Pages 1213-1221.
[123] Hagerud, G., 1997. A New Non-Linear GARCH Model. Doctoral Thesis,
Stockholm School of Economics, Stockholm.
[124] Hamilton, J., Susmel, R., 1994. Autoregressive Conditional
Heteroskedasticity and Changes in Regime. Journal of Econometrics,
Volume 64, Issues 1-2, Pages 307-333.
[125] Hansen, B., 1994. Autoregressive Conditional Density Estimation.
International Economic Review, Volume 35, Issue 3, Pages 705-730.
[126] Hansen, P., Lunde, A., 2005. A Forecast Comparison of Volatility Models:
Does Anything beat a GARCH (1,1)? Journal of Applied Econometrics,
Volume 20, Issue 7, Pages 873–889.
[127] Harrison, M., Kreps, D., 1979. Martingales and Arbitrage in Multiperiod
Securities Markets. Journal of Economic Theory, Volume 20, Issue 3, Pages
381–408.
[128] Harvey, A., Shephard, N., 1996. Estimation of an Asymmetric Stochastic
Volatility Model for Asset Returns. Journal of Business and Economic
Statistics, Volume 14, Issue 4, Pages 429-434.
[129] Hansen, B., 1994. Autoregressive Conditional Density Estimation.
International Economic Review, Volume 35, Issue 3, Pages 705–730
[130] Hauksson, A., Dacorogna, M., Domenig, T., Müller, U., Samorodnitsky,
G., 2001. Multivariate Extremes, Aggregation and Risk Estimation.
Quantitative Finance, Volume 1, Issue 1, Pages 79–95.
[131] He, C., 1997. Statistical Properties of GARCH Processes. Doctoral Thesis,
Stockholm School of Economics, Stockholm.
[132] Heitsch, H., Römisch, W., 2003. Scenario Reduction in Stochastic
Programming. Computational Optimization and Applications, Volume 24,
Issue 2, Pages 187-206.
[133] Heitsch, H., Römisch, W., 2005. Generation of Multivariate Scenario Trees
to model Stochasticity in Power Management. IEEE St. Petersburg
PowerTech Proceedings, St. Petersburg.
[134] Heitsch, H., Römisch, W., 2007. Scenario Tree Modelling for Multistage
Stochastic Programs. Mathematical Programming, Volume 118, Issue 2,
Pages 371-406.
[135] Heitsch, H., Römisch, W., 2008. Scenario Tree Reduction for Multistage
Stochastic Programs. Computational Management Science, Volume 6, Issue
2, Pages 117-133.
[136] Heitsch, H., Römisch, W., 2011. Stability and Scenario Trees for Multistage
Stochastic Programs. In: Infanger, G.. Stochastic Programming. Springer,
New York.
[137] Heitsch, H., Römisch, W., 2011. Scenario Tree Generation for Multi-Stage
Stochastic Programs. In: Bertocchi, M., Consigli, G., Dempster, M..
Stochastic Optimization Methods in Finance and Energy. Springer, New
York.
[138] Heitsch, H., Römisch, W., Strugarek, C., 2006. Stability of Multistage
Stochastic Programs. SIAM Journal on Optimization, Volume 17, Issue 2,
Pages 511-525.
[139] Hill, B., 1975. A Simple General Approach to Inference about the Tail of a
Distribution. Annals of Statistics, Volume 3, Issue 5, Pages 1163–1174.
[140] Hochreiter, R., 2014. Modeling Multi-Stage Decision Optimization Problems.
In: Fonseca, R., Weber, G., Telhada, J.. Computational Management
Science: State of the Art 2014. Pages 209-214. Springer, New York.
[141] Hochreiter, R., Pflug, G., 2006. Financial Scenario Generation for
Stochastic Multi-Stage Decision Processes as Facility Location Problems.
Annals of Operations Research, Volume 152, Issue 1, Pages 257-272.
[142] Hochreiter, R., Pflug, G., Wozabal, D., 2006. Multi-Stage Stochastic
Electricity Portfolio Optimization in Liberalized Energy Markets.
[143] Hosking, J., 1985. Maximum Likelihood Estimation of the Parameters of the
Generalized Extreme-Value Distribution. Applied Statistics, Volume 34, Issue
3, Pages 301–310.
[144] Høyland, K., Wallace, S., 2001. Generating Scenario Trees for Multistage
Decision Problems. Management Science, Volume 47, Issue 2, Pages 295-
307.
[145] Høyland, K., Kaut, M., Wallace, S., 2003. A Heuristic for Moment-Matching
Scenario Generation. Computational Optimization and Applications, Volume
24, Issue 2, Pages 169-185.
[146] Hosking, J., Wallis, J., 1987. Parameter and Quantile Estimation for the
Generalized Pareto Distribution. Technometrics, Volume 29, Issue 3, Pages
339–349.
[147] Hsu, D., Miller, R., Wichern, D., 1974. On the Stable Paretian Behavior of
Stock-Market Prices. Journal of American Statistical Association, Volume 69,
Issue 345, Pages 108-113.
[148] Hu, W., Kercheval, A., 2007. Risk Management with Generalized Hyperbolic
Distributions. Proceedings of the Fourth IASTED International Conference on
Financial Engineering and Applications, Pages 19-24. Berkeley, California.
[149] Huang, S., Chien, Y., Wang, R., 2011. Applying GARCH-EVT-Copula
Models for Portfolio Value-at-Risk on G7 Currency Markets. International
Research Journal of Finance and Economics, Issue 74, Pages 136-151.
[150] Huang, J., Lee, K., Liang, H., Lin, W., 2009. Estimating Value at Risk of
Portfolio by Conditional Copula-GARCH Method. Insurance: Mathematics
and Economics, Volume 45, Issue 3, Pages 315–324.
[151] Ingersoll, J., 1987. Theory of Financial Decision Making. Rowman &
Littlefield Publishers, Lanham.
[152] Jacquier, E., Polson, N., Rossi, P., 1994. Bayesian Analysis of Stochastic
Volatility Models. Journal of Business & Economic Statistics, Volume 12,
Issue 4, Pages 371–389.
[153] Jacquier, E., Polson, N., Rossi, P., 2004. Bayesian Analysis of Stochastic
Volatility Models with Fat-Tails and Correlated Errors. Journal of
Econometrics, Volume 122, Issue 1, Pages 185–212.
[154] Jansen, D., Vries, C., 1991. On the Frequency of Large Stock Market
Returns: Putting Booms and Busts into Perspective. Review of Economics
and Statistics, Volume 73, Issue 1, Pages 18-24.
[155] Jarraya, B., 2013. Asset Allocation and Portfolio Optimization Problems with
Metaheuristics: A Literature Survey. Business Excellence and Management,
Volume 3, Issue 4, Pages 38-56.
[156] Jean, W., 1971. The Extension of Portfolio Analysis to Three or More
Parameters. Journal of Financial and Quantitative Analysis, Volume 6, Issue
1, Pages 505-515.
[157] Jenkinson, A., 1955. The Frequency Distribution of the Annual Maximum (or
Minimum) Values of Meteorological Elements. Quarterly Journal of the Royal
Meteorological Society, Volume 81, Issue 348, Pages 158–171.
[158] Jenkinson, A., 1969. Statistics of Extremes. In: Estimation of Maximum
Floods. World Meteorological Organization. Technical Note 98, Pages 183-
228.
[159] Jobst, N., Zenios, S., 2005. On the Simulation of Portfolios of Interest Rate
and Credit Risk Sensitive Securities. European Journal of Operational
Research, Volume 161, Issue 2, Pages 298-324.
[160] Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman &
Hall, London, UK.
[161] Joe, H., 2015. Dependence Modelling With Copulas. CRC Press, Taylor &
Francis Group, Boca Raton.
[162] Jondeau, E., Rockinger, M., 2006. A Copula-GARCH Model of Conditional
Dependencies: An International Stock Market Application. Journal of
International Money and Finance, Volume 25, Issue 5, Pages 827-853.
[163] Kahneman, D., Tversky, A., 1979. Prospect Theory: An Analysis of Decision
under Risk. Econometrica, Volume 47, Issue 2, Pages 263-291.
[164] Kall, P., Mayer, J., 2011. Stochastic Linear Programming: Models, Theory
and Computation, Second Edition. Springer, New York.
[165] Kall, P., Wallace, S., 2003. Stochastic Programming. John Wiley & Sons.
Hoboken, New Jersey.
[166] Kolm, P., Tütüncü, R., Fabozzi, F., 2014. 60 Years of Portfolio Optimization:
Practical Challenges and Current Trends. European Journal of Operational
Research, Volume 234, Issue 2, Pages 356-371.
[167] Karanasos, M., Karanassou, M., Fountas, S., 2004. Analyzing US Inflation
by a GARCH Model with Simultaneous Feedback. WSEAS Transactions on
Information Science and Applications, Volume 2, Issue 1, Pages 767–772.
[168] Kaut, M., Wallace, S., 2007. Evaluation of Scenario Generation Methods for
Stochastic Programming. Pacific Journal of Optimization, Volume 3, Issue 2,
Pages 257-271.
[169] Keefer, D., 1994. Certainty Equivalents for Three-Point Discrete-Distribution
Approximations. Management Science, Volume 40, Issue 6,
Pages 760-773.
[170] Kim, G., Silvapulle, M., Silvapulle, P., 2007. Comparison of Semiparametric
and Parametric Methods for Estimating Copulas. Computational Statistics &
Data Analysis, Volume 51, Issue 6, Pages 2836-2850.
[171] King, A., Wallace, S., 2012. Modeling with Stochastic Programming.
Springer, New York.
[172] Klaassen, F., 2002. Improving GARCH Volatility Forecasts with Regime-
Switching GARCH. Empirical Economics, Volume 27, Issue 2, Pages 363-
394.
[173] Klaassen, P., 1997. Discretized Reality and Spurious Profits in Stochastic
Programming Models in Asset/Liability Management. European Journal of
Operational Research, Volume 44, Issue 1, Pages 31-48.
[174] Klaassen, P., 1998. Financial Asset-Pricing Theory and Stochastic
Programming Models for Asset/Liability Management: A Synthesis.
Management Science, Volume 44, Issue 1, Pages 31-48.
[175] Klaassen, P., 2002. Comment on “Generating Scenario Trees for Multistage
Decision Problems”. Management Science, Volume 48, Issue 11, Pages
1512-1516.
[176] Kleywegt, A., Shapiro, A., Homem-de-Mello, T., 2002. The Sample
Average Approximation Method for Stochastic Discrete Optimization. SIAM
Journal on Optimization, Volume 12, Issue 2, Pages 479-502.
[177] Knab, B., et al., 2003. Model-Based Clustering with Hidden Markov Models
and its Applications to Financial Time-Series Data. In: Schader, M., Gaul, W.,
Vichi, M.. Between Data Science and Applied Data Analysis. Springer, New
York.
[178] Koedijk, G., Schafgans, M., Vries, C., 1990. The Tail Index of Exchange
Rate Returns. Journal of International Economics, Volume 29, Issues 1-2,
Pages 93-108.
[179] Koedijk, G., Stork, P., Vries, C., 1992. Differences between Foreign
Exchange Rate Regimes: The View from the Tails. Journal of International
Money and Finance, Volume 11, Issue 5, Pages 462-473.
[180] Kon, S., 1984. Models of Stock Returns – A Comparison. Journal of Finance,
Volume 39, Issue 1, Pages 147-165.
[181] Konno, H., Yamazaki, H., 1991. Mean-Absolute Deviation Portfolio
Optimization Model and its Applications to Tokyo Stock Market. Management
Science, Volume 37, Issue 5, Pages 519-531.
[182] Konno, H., Shirakawa, H., Yamazaki, H., 1993. A Mean-Absolute Deviation-
Skewness Portfolio Optimization Model. Annals of Operations Research,
Volume 45, Issue 1, Pages 205-220.
[183] Konno, H., Waki, H., Yuuki, A., 2002. Portfolio Optimization under Lower
Partial Risk Measures. Asian-Pacific Financial Markets, Volume 9, Issue 2,
Pages 127-140.
[184] Kouwenberg, R., 2001. Scenario Generation and Stochastic Programming
Models for Asset Liability Management. European Journal of Operational
Research, Volume 134, Issue 2, Pages 279-292.
[185] Kouwenberg, R., Zenios, S., 2001. Stochastic Programming Models for
Asset Liability Management. In: Zenios, S., Ziemba, W., 2001. Handbook of
Asset and Liability Management: Applications and Case Studies. Elsevier,
Amsterdam.
[186] Kovacevic, R., Pichler, A., 2015. Tree Approximation for Discrete Time
Stochastic Processes: A Process Distance Approach. Annals of Operations
Research, Volume 235, Issue 1, Pages 395-421.
[187] Krokhmal, P., Palmquist, J., Uryasev, S., 2002. Portfolio Optimization with
Conditional Value-at-Risk Objective and Constraints. Journal of Risk, Volume
4, Pages 43-68.
[188] Krokhmal, P., Zabarankin, M., Uryasev, S., 2011. Modeling and
Optimization of Risk. Surveys in Operations Research and Management
Science, Volume 16, Pages 49-66.
[189] Kurowicka, D., Joe, H., 2011. Dependence Modelling: Vine Copula
Handbook. World Scientific Publishing, Singapore.
[190] Liao, W., 2005. Clustering of Time Series Data – A Survey. Pattern
Recognition, Volume 38, Issue 11, Pages 1857-1874.
[191] Lin, D., Wang, S., Yan, H., 2001. A Multiobjective Genetic Algorithm for
Portfolio Selection Problem. Proceedings of ICOTA, Hong Kong.
[192] Ling, S., McAleer, M., 2002. Stationarity and the Existence of Moments of a
Family of GARCH Processes. Journal of Econometrics, Volume 106, Issue 1,
Pages 109-117.
[193] Lo, A., MacKinlay, A., 1988. Stock Market Prices do not follow Random
Walks: Evidence from a Simple Specification Test. Review of Financial
Studies, Volume 1, Issue 1, Pages 41–66.
[194] Lo, A., MacKinlay, A., 1990. When are Contrarian Profits due to Stock
Market Overreaction? Review of Financial Studies, Volume 3, Issue 2, Pages
175–205.
[195] Lo, A., MacKinlay, A., 1999. A Non-Random Walk Down Wall Street.
Princeton University Press, Princeton, New Jersey.
[196] Löhndorf, N., 2016. An Empirical Analysis of Scenario Generation Methods
for Stochastic Optimization. Working Paper. Vienna University of Economics
and Business. Available at: http://www.optimization-online.org/DB_HTML/2016/02/5319.html, 2016-03-03.
[197] Longin, F., 1996. The Asymptotic Distribution of Extreme Stock Market
Returns. Journal of Business, Volume 69, Issue 3, Pages 383–408.
[198] Longin, F., 1999. Optimal Margin Level in Futures Markets: Extreme Price
Movements. Journal of Futures Markets, Volume 19, Issue 2, Pages 127–
152.
[199] Lubrano, M., 2001. Smooth Transition GARCH Models: A Bayesian
Perspective. Recherches Economiques de Louvain, Louvain Economic
Review, Volume 67, Issue 3, Pages 257-287.
[200] Lux, T., 2001. The Limiting Extremal Behaviour of Speculative Returns: An
Analysis of Intra-Daily Data from the Frankfurt Stock Exchange. Applied
Financial Economics, Volume 11, Issue 3, Pages 299-315.
[201] Malmsten, H., Teräsvirta, T., 2010. Stylized Facts of Financial Series and
Three Popular Models of Volatility. European Journal of Pure and Applied
Mathematics, Volume 3, Issue 3, Pages 443-477.
[202] Mandelbrot, B., 1963. The Variation of Certain Speculative Prices. Journal
of Business, Volume 36, Issue 4, Pages 394-419.
[203] Mandelbrot, B., 1967. The Variation of Some Other Speculative Prices.
Journal of Business, Volume 40, Issue 4, Pages 393-413.
[204] Mandelbrot, B., Taylor, H., 1967. On the Distribution of Stock Price
Differences. Operations Research. Volume 15, Issue 6, Pages 1057-1062.
[205] Mansini, R., Ogryczak, W., Speranza, M., 2014. Twenty Years of Linear
Programming based Portfolio Optimization. European Journal of Operational
Research, Volume 234, Issue 2, Pages 518-
535.
[206] Mansini, R., Ogryczak, W., Speranza, M., 2015. Linear and Mixed Integer
Programming for Portfolio Optimization. Springer International Publishing,
Switzerland.
[207] Mao, J., 1970. Models of Capital Budgeting, E-V vs E-S. Journal of Financial
and Quantitative Analysis, Volume 4, Issue 5, Pages 657-675.
[208] Markowitz, H., 1952. Portfolio Selection, Journal of Finance, Volume 7,
Issue 1, Pages 77-91.
[209] Markowitz, H., 1959. Portfolio Selection: Efficient Diversification of
Investments, Second Edition. John Wiley & Sons, New York.
[210] Marler, R., Arora, J., 2004. Survey of Multi-Objective Methods for
Engineering. Structural and Multidisciplinary Optimization, Volume 26, Issue
6, Pages 369-395.
[211] McNeil, A., 1997. Estimating the Tails of Loss Severity Distributions using
Extreme Value Theory. ASTIN Bulletin, Volume 27, Issue 1, Pages 117-137.
[212] McNeil, A., Saladin, T., 1997. The Peak over Threshold Method for
Estimating High Quantiles of Loss Distributions. In: Proceedings of 28th
International ASTIN Colloquium.
[213] McNeil, A., Frey, R., 2000. Estimation of Tail-Related Risk Measures for
Heteroscedastic Financial Time Series: An Extreme Value Approach. Journal
of Empirical Finance, Volume 7, Issues 3-4, Pages 271-300.
[214] McNeil, A., Frey, R., Embrechts, P., 2005. Quantitative Risk Management.
Princeton University Press. Princeton, New Jersey.
[215] Merton, R., 1969. Lifetime Portfolio Selection under Uncertainty: The
Continuous-Time Case. Review of Economics and Statistics. Volume 51,
Issue 3, Pages 247-257.
[216] Merton, R., 1971. Optimum Consumption and Portfolio Rules in a
Continuous-Time Model. Journal of Economic Theory. Volume 3, Issue 4,
Pages 373-413.
[217] Meucci, A., 2005. Risk and Asset Allocation. Springer-Verlag. Heidelberg,
Berlin.
[218] Mikosch, T., 2003. Modeling Dependence and Tails of Financial Time
Series. In: Extreme Values in Finance, Telecommunications, and the
Environment. Chapman & Hall, London.
[219] Mikosch, T., Stărică, C., 2000. Limit Theory for the Sample Autocorrelations
and Extremes of a GARCH (1,1) Process. Annals of Statistics, Volume 28,
Issue 5, Pages 1427–1451.
[220] Mikosch, T., Stărică, C., 2004. Nonstationarities in Financial Time Series,
the Long-Range Dependence, and the IGARCH effects. Review of
Economics and Statistics, Volume 86, Issue 1, Pages 378-390.
[221] Mises, R., 1936. La distribution de la Plus Grande de N Valeurs. Revue
Mathématique de l'Union Interbalkanique, Volume 1, Pages 141-160.
[222] Mittnik, S., Rachev, S., Paolella, M., 1998. Stable Paretian Modelling in
Finance: Some Empirical and Theoretical Aspects. In: A Practical Guide to
Heavy Tails: Statistical Techniques and Applications. Birkhäuser. Pages 79–
110.
[223] Mittnik, S., Paolella, M., Rachev, S., 2000. Diagnosing and Treating the Fat
Tails in Financial Returns Data. Journal of Empirical Finance, Volume 7,
Issues 3–4, Pages 389–416.
[224] Mossin, J., 1968. Optimal Multiperiod Portfolio Policies. Journal of Business,
Volume 41, Issue 2, Pages 215-229.
[225] Mulvey, J., 1996. Generating Scenarios for the Towers Perrin Investment
System. Interfaces, Volume 26, Issue 2, Pages 1-15.
[226] Mulvey, J., Pauling, W., Madey, R., 2003. Advantages of Multiperiod
Portfolio Models. Journal of Portfolio Management, Volume 29, Issue 2,
Pages 35-45.
[227] Mulvey, J., Thorlacius, A., 1998. The Towers Perrin Global Capital Market
Scenario Generation System. In: World Asset and Liability Management,
Cambridge University Press, Cambridge.
[228] Mulvey, J., Vladimirou H., 1989. Stochastic Network Optimization Models
for Investment Planning. Annals of Operations Research. Volume 20, Issue
1, Pages 187-217.
[229] Musmeci, N., Aste, T., Di Matteo, T., 2014. Relation between Financial
Market Structure and the Real Economy: Comparison between Clustering
Methods. PLOS One, Volume 10, Issue 4, Pages 1-24.
[230] Nawrocki, D., 1999. A Brief History of Downside Risk Measures. Journal of
Investing, Volume 8, Issue 3, Pages 9-25.
[231] Nelsen, R., 2006. An Introduction to Copulas, Second Edition. Springer, New
York.
[232] Nelson, D., 1991. Conditional Heteroskedasticity in Asset Returns: A New
Approach. Econometrica, Volume 59, Issue 2, Pages 347-370.
[233] Nystrom, K., Skoglund, J., 2002. Univariate Extreme Value Theory,
GARCH and Measures of Risk. Preprint, Swedbank.
[234] Parpas, P., et al., 2015. Importance Sampling in Stochastic Programming: A
Markov Chain Monte Carlo Approach. INFORMS Journal on Computing,
Volume 27, Issue 2, Pages 358-377.
[235] Patton, A., 2002. Applications of Copula Theory in Financial Econometrics.
PhD Dissertation. University of California, San Diego.
[236] Patton, A., 2004. On the Out-of-Sample Importance of Skewness and
Asymmetric Dependence for Asset Allocation. Journal of Financial of
Econometrics, Volume 2, Issue 1, Pages 130-168.
[237] Patton, A., 2011. Copula Methods for Forecasting Multivariate Time Series.
In: Handbook of Economic Forecasting, Volume 2, Elsevier, Oxford.
[238] Patton, A., 2012. A Review of Copula Models for Economic Time Series.
Journal of Multivariate Analysis, Volume 110, Pages 4-18.
[239] Perry, P., 1983. More Evidence on the Nature of Distribution of Security
Returns. Journal of Financial and Quantitative Analysis, Volume 18, Issue 2,
Pages 211-221.
[240] Peters, J., 2001. Estimating and Forecasting Volatility of Stock Indices using
Asymmetric GARCH Models and (Skewed) Student-t Densities. Preprint.
University of Liege, Belgium.
[241] Pflug, G., 1996. Optimization of Stochastic Models: The Interface between
Simulation and Optimization. Kluwer Academic Publishers, New York.
[242] Pflug, G., 2001. Scenario Tree Generation for Multiperiod Financial
Optimization by Optimal Discretization. Mathematical Programming, Volume
89, Issue 2, Pages 251-271.
[243] Pflug, G., 2009. Version-Independence and Nested Distributions in
Multistage Stochastic Optimization. SIAM Journal on Optimization, Volume
20, Issue 3, Pages 1406-1420.
[244] Pflug, G., Pichler, A., 2011. Approximations for Probability Distributions and
Stochastic Optimization Problems. In: Bertocchi, M., Consigli, G., and
Dempster, M.. Stochastic Optimization Methods in Finance and Energy.
Springer, New York.
[245] Pflug, G., Pichler, A., 2012. A Distance for Multistage Stochastic
Optimization Models. SIAM Journal on Optimization, Volume 22, Issue 1,
Pages 1-23.
[246] Pflug, G., Pichler, A., 2014. Multistage Stochastic Optimization. Springer
International Publishing, Switzerland.
[247] Pflug, G., Pichler, A., 2015. Dynamic Generation of Scenario Trees.
Computational Optimization and Applications, Volume 62, Issue 3, Pages
641-668.
[248] Pflug, G., Römisch, W., 2007. Modeling, Measuring and Managing Risk.
World Scientific Publishing Company, Singapore.
[249] Pickands, J., 1975. Statistical Inference using Extreme Value Order
Statistics. Annals of Statistics, Volume 3, Issue 1, Pages 119-131.
[250] Prause, K., 1999. The Generalized Hyperbolic Model: Estimation, Financial
Derivatives, and Risk Measures. PhD Dissertation. University of Freiburg,
Freiburg im Breisgau.
[251] Prékopa, A., 1995. Stochastic Programming. Springer, Netherlands.
[252] Rabemananjara, R., Zakoian, J., 1993. Threshold ARCH Models and
Asymmetries in Volatility. Journal of Applied Econometrics, Volume 8, Issue
1, Pages 31-49.
[253] Rachev, S., 1991. Probability Metrics and the Stability of Stochastic Models.
John Wiley & Sons. Chichester, UK.
[254] Rachev, S., Römisch, W., 2002. Quantitative Stability in Stochastic
Programming: The Method of Probability Metrics. Mathematics of Operations
Research, Volume 27, Issue 4, Pages 792-818.
[255] Rachev, S., Rüschendorf, L., 1998. Mass Transportation Problems, Volume
I: Theory. Springer Verlag, New York.
[256] Rachev, S., Rüschendorf, L., 1998. Mass Transportation Problems, Volume
II: Applications. Springer Verlag, New York.
[257] Rachev, S., Stoyanov, S., Fabozzi, F., 2011. A Probability Metrics
Approach to Financial Risk Measures. Wiley-Blackwell, Chichester, UK.
[258] Rani, S., Sikka, G., 2012. Recent Techniques of Clustering of Time Series
Data: A Survey. International Journal of Computer Applications, Volume 52,
Issue 15, Pages 1-9.
[259] Reiss, R., Thomas, M., 2007. Statistical Analysis of Extreme Value with
Applications to Insurance, Finance, Hydrology and Other Fields, 3rd Edition.
Birkhäuser, Basel.
[260] Resnick, S., 1987. Extreme Values, Regular Variation, and Point Processes.
Springer Verlag, New York.
[261] Resnick, S., Stărică, C., 1995. Consistency of Hill's Estimator for Dependent
Data. Journal of Applied Probability, Volume 32, Issue 1, Pages 139-167.
[262] Resnick, S., Stărică, C., 1998. Tail Index Estimation for Dependent Data.
Annals of Applied Probability, Volume 8, Issue 4, Pages 1156-1183.
[263] Riccetti, L., 2010. The Use of Copulas in Asset Allocation: When and How a
Copula Model can be Useful? Lambert Academic Publishing, Saarbrücken.
[264] Riccetti, L., 2013. A Copula–GARCH Model for Macro Asset Allocation of a
Portfolio with Commodities. Empirical Economics, Volume 44, Issue 3, Pages
1315-1336.
[265] Rockafellar, T., 2007. Coherent Approaches to Risk in Optimization under
Uncertainty. In: Klastorin, T., Gray, P., Greenberg, H., 2007. INFORMS
Tutorials in Operations Research: OR Tools and Applications – Glimpses of
Future Technologies, Chapter 3, Pages 38-61.
[266] Rockafellar, T., Uryasev, S., 2000. Optimization of Conditional Value-at-
Risk. Journal of Risk, Volume 2, Pages 21-42.
[267] Rockafellar, T., Uryasev, S., 2002. Conditional Value-at-Risk for General
Loss Distributions. Journal of Banking & Finance, Volume 26, Issue 7, Pages
1443-1471.
[268] Rockinger, M., Jondeau, E., 2001. Conditional Dependency of Financial
Series: An Application of Copulas. Working Paper Nº 82, Banque de France.
[269] Roman, D., Darby-Dowman, K., Mitra, G., 2008. Mean-Risk Models using
Two Risk Measures: A Multi-Objective Approach. In: Dempster, M., Pflug, G.,
Mitra, G., 2008. Quantitative Fund Management. Chapman and Hall/CRC,
Boca Raton.
[270] Römisch, W., 2009. Scenario Reduction Techniques in Stochastic
Programming. In: Watanabe, O., Zeugmann, T.. Stochastic Algorithms:
Foundations and Applications. Springer Berlin Heidelberg. Pages 1-14.
[271] Roy, A., 1952. Safety First and the Holding of Assets. Econometrica, Volume
20, Issue 3, Pages 431-449.
[272] Ruszczyński, A., Shapiro, A., 2002. Handbooks in Operations Research
and Management Science - Stochastic Programming, Volume 10. Elsevier.
[273] Ruszczyński, A., Shapiro, A., 2006. Optimization of Risk Measures. In:
Calafiore, G., Dabbene, F., 2006. Probabilistic and Randomized Methods for
Design and Uncertainty. Springer-Verlag, London.
[274] Rydberg, T., 2000. Realistic Statistical Modelling of Financial Data.
International Statistical Review, Volume 68, Issue 3, Pages 233-258.
[275] Salvadori, G., Michele, C., Kottegoda, N., Rosso, R., 2007. Multivariate
Extreme Value Theory. In: Extremes in Nature, Volume 56, Pages 113-129.
[276] Samuelson, P., 1969. Lifetime Portfolio Selection by Dynamic Stochastic
Programming. Review of Economics and Statistics, Volume 51, Issue 3,
Pages 239-246.
[277] Schilling, R., 2005. Measures, Integrals and Martingales. Cambridge
University Press, Cambridge.
[278] Shapiro, A., 2013. Sample Average Approximation. In: Gass, S., Fu, M..
Encyclopedia of Operations Research and Management Science, 3rd Edition.
Springer, New York.
[279] Shapiro, A., Dentcheva, D., Ruszczyński, A., 2009. Lectures on Stochastic
Programming: Modeling and Theory. SIAM Series on Optimization.
[280] Sigurbjörnsson, S., 2008. Scenario Tree Generation by Optimal
Discretization. Master Dissertation. Technical University of Denmark,
Denmark.
[281] Silvennoinen, A., Teräsvirta, T., 2009. Modeling Multivariate
Autoregressive Conditional Heteroskedasticity with the Double Smooth
Transition Conditional Correlation GARCH Model. Journal of Financial
Econometrics, Volume 7, Issue 4, Pages 373-411.
[282] Sklar, A., 1959. Fonctions de Répartition à n Dimensions et Leurs Marges.
Publication de l’Institut de Statistique de l’Université de Paris, Volume 8,
Pages 229-231.
[283] Shiryaev, A., 1996. Probability, Second Edition. Springer, New York.
[284] Smith, R., 1985. Maximum Likelihood Estimation in a Class of Nonregular
Cases. Biometrika, Volume 72, Issue 1, Pages 67-90.
[285] Smith, R., 1989. Extreme Value Analysis of Environmental Time Series: An
Application to Trend Detection in Ground-Level Ozone. Statistical Science,
Volume 4, Issue 4, Pages 367-377.
[286] Sodhi, M., 2005. LP Modeling for Asset-Liability Management: A Survey of
Choices and Simplifications. Operations Research, Volume 53, Issue 2,
Pages 181-196.
[287] Spall, J., Hill, S., Stark, D., 2006. Theoretical Framework for Comparing
Several Stochastic Optimization Approaches. In: Calafiore, G., Dabbene, F.,
2006. Probabilistic and Randomized Methods for Design and Uncertainty.
Springer-Verlag, London.
[289] Speranza, M., 1993. Linear Programming Models for Portfolio Optimization.
Finance, Volume 14, Pages 107-123.
[289] Šutienė, K., Makackas, D., Pranevičius, H., 2010. Multistage K-Means
Clustering for Scenario Tree Construction. Informatica, Volume 21, Issue 1,
Pages 123-138.
[290] Šutienė, K., Pranevičius, H., 2007. Scenario Tree Generation by Clustering
the Simulated Data Paths. Proceedings of the 21st European Conference on
Modelling and Simulation. Prague, Czech Republic.
[291] Šutienė, K., Pranevičius, H., 2008. Copula Effect on Scenario Tree.
International Journal of Applied Mathematics, Volume 37, Issue 2, Pages
112-127.
[292] Taleb, N., 2009. Finiteness of Variance is Irrelevant in the Practice of
Quantitative Finance. Complexity, Volume 14, Issue 3, Pages 66-76.
[293] Tang, J., Zhou, C., Yuan, X., Sriboonchitta, S., 2015. Estimating Risk of
Natural Gas Portfolios by Using GARCH-EVT-Copula Model. Scientific World
Journal, Volume 2015, Article ID 125958, Pages 1-7.
[294] Taylor, S., 1982. Financial Returns Modelled by the Product of Two
Stochastic Processes: A Study of Daily Sugar Prices 1961-79. In: Time
Series Analysis: Theory and Practice 1. North-Holland, Amsterdam.
[295] Taylor, S., 2005. Asset Price Dynamics, Volatility, and Prediction. Princeton
University Press. Princeton, New Jersey.
[296] Taylor, S., 2007. Modelling Financial Time Series, Second Edition. World
Scientific Publishing. New York.
[297] Teräsvirta, T., 1996. Two Stylized Facts and the GARCH(1,1) Model.
Working Paper Series in Economics and Finance, Stockholm School of
Economics, Nº 96.
[298] Teräsvirta, T., Zhao, Z., 2007. Stylized Facts of Return Series, Robust
Estimates, and Three Popular Models of Volatility. Working Paper Series in
Economics and Finance, Stockholm School of Economics, No. 662.
[299] Timonina, A., 2015. Multi-Stage Stochastic Optimization: The Distance
between Stochastic Scenario Processes. Computational Management Science,
Volume 12, Issue 1, Pages 171-195.
[300] Tsay, R., 1999. Extreme Value Analysis of Financial Data. Working paper.
Graduate School of Business, University of Chicago.
[301] Tsay, R., 2010. Analysis of Financial Time Series, 3rd Edition. John Wiley & Sons.
Hoboken, New Jersey.
[302] Tse, Y., Tsui, A., 2002. A Multivariate Generalized Autoregressive
Conditional Heteroscedasticity Model with Time-Varying Correlations. Journal of
Business and Economic Statistics, Volume 20, Issue 3, Pages 351-362.
[303] Tumminello, M., Lillo, F., Mantegna, R., 2010. Correlation, Hierarchies, and
Networks in Financial Markets. Journal of Economic Behavior &
Organization, Volume 75, Issue 1, Pages 40-58.
[304] Unser, M., 2000. Lower Partial Moments as Measures of Perceived Risk: An
Experimental Study. Journal of Economic Psychology, Volume 21, Issue 3,
Pages 253-280.
[305] Uryasev, S., Pardalos, P., 2001. Stochastic Optimization: Algorithms and
Applications. Springer, New York.
[306] Viebig, J., Poddig, T., 2010. Modeling Extreme Returns and Asymmetric
Dependence Structures of Hedge Fund Strategies using Extreme Value
Theory and Copula Theory. Journal of Risk, Volume 13, Issue 2, Pages 23-
55.
[307] Villani, C., 2009. Optimal Transport, Old and New. Springer-Verlag,
Heidelberg, Berlin.
[308] Vladimirou, H., Zenios, S., 2012. Parallel Algorithms for Large-Scale
Stochastic Programming. In: Migdalas, A., Pardalos, P., and Storøy, S..
Parallel Computing in Optimization, Springer Publishing Company, London.
[309] Wallace, S., Ziemba, W., 2005. Applications of Stochastic Programming.
SIAM Series on Optimization.
[310] Wang, Z., Jin, Y., Zhou, Y., 2010. Estimating Portfolio Risk Using GARCH-
EVT-Copula Model: An Empirical Study on Exchange Rate Market. Advances
in Neural Network Research and Applications, Volume 67, Pages 65-72.
[311] Xiong, J., Idzorek, T., 2011. The Impact of Skewness and Fat Tails on the
Asset Allocation Decision. Financial Analysts Journal, Volume 67, Issue 2,
Pages 23-35.
[312] Zakoian, J., 1994. Threshold Heteroskedastic Models. Journal of Economic
Dynamics and Control, Volume 18, Issue 5, Pages 931-955.
[313] Ziemba, W., 2003. The Stochastic Programming Approach to Asset, Liability,
and Wealth Management. Research Foundation of AIMR, Charlottesville,
Virginia.
[314] Ziemba, W., Mulvey, J., 1998. World Asset and Liability Management.
Cambridge University Press, Cambridge.