
An introduction to coherent risk measures

Giacomo Scandolo

Università di Firenze

Risk measures: frontiers of mathematics and regulations

Università di Bologna - 2015

Outline

- The problem of risk assessment
- Value-at-Risk (VaR) and Expected Shortfall (ES)
  - Quantiles, VaR, ES
  - Subadditivity property
  - Convexity property and portfolio allocation
- Coherent risk measures
  - Definition of coherence
  - Examples of coherence
  - Combinations preserving coherence
- Risk estimation
  - Non-parametric risk estimation
  - Parametric risk estimation
  - Robustness issues in risk estimation

An introduction to coherent risk measures Giacomo Scandolo (Unifi)


The problem of risk assessment

- Assessing the "risk" of something (an asset, a portfolio) is tricky, as there is no precise definition of risk.
- In the 1970s, some psychological tests investigated the subjective notion of risk (positive attitude).
- For sure, "risk" is not exactly the same as
  - dispersion
  - uncertainty (or ambiguity)
- Within financial regulation, "risk" is defined by the procedure we use to quantify it (normative attitude).

Horizon

- Whatever notion we consider, (financial) "risk" depends on the time horizon we look at:
  - very short (intra-day): liquidity risk
  - short (1 to 10 days): market risk
  - medium (1 month to 1 year): credit and operational risk
  - long (several years): longevity and other social risks
- So, a risk figure is a number attached to a portfolio for a given time horizon.

Profit and Loss

- Once the horizon $T$ is fixed, the Profit & Loss (PL) of a portfolio is

  $$PL = V_T - V_0$$

  where $V_t$ is the portfolio "value" at time $t$ ($t = 0, T$)
  - positive values are profits, negative values are losses
  - in insurance, the loss variable $L = -PL$ is more often used
  - today, at time $t = 0$, $V_0$ is known while $V_T$ is unknown
  - in practice, even $V_0$ is often "unknown" (lack of quoted prices, liquidity issues, etc.)

Risk factors and risk mapping

- Actual portfolios are often far too complex. Some simplification is introduced by selecting a few basic financial variables

  $$\mathbf{Y} = (Y_1, \dots, Y_d)' \quad (d \text{ "small"})$$

  and writing

  $$V_t \approx v(\mathbf{Y}_t) = v(Y_{1,t}, \dots, Y_{d,t})$$

  - the $Y_i$ are the risk factors
  - typical choices are log-prices, log-indices, interest/FX rates, credit spreads
  - $v = v(\mathbf{y})$ is the risk mapping
  - the higher $d$ is, the more accurate the approximation $V \approx v(\mathbf{Y})$

Risk factors and risk mapping

- If $\Delta\mathbf{Y} = \mathbf{Y}_T - \mathbf{Y}_0$, we can write (from now on "$=$" in place of "$\approx$")

  $$PL = v(\mathbf{Y}_T) - v(\mathbf{Y}_0) = v(\mathbf{Y}_0 + \Delta\mathbf{Y}) - v(\mathbf{Y}_0)$$

- $\Delta\mathbf{Y}$ is a log-return (prices, indices) or just a variation (rates, spreads).
- We can then write

  $$PL = pl(\Delta\mathbf{Y})$$

  for the function $pl(\mathbf{z}) = v(\mathbf{Y}_0 + \mathbf{z}) - v(\mathbf{Y}_0)$.
- Note: $\mathbf{Y}_0$ is known, so $\Delta\mathbf{Y}$ and $pl$ can be seen as the (actual) risk factors and risk mapping.

Risk factors and risk mapping

- Four different situations:
  1. One risk factor, linear risk mapping. The easiest (but least realistic) case.
  2. One risk factor, non-linear risk mapping. Typical for option portfolios; problems in deriving the distribution of PL.
  3. Several risk factors, linear risk mapping. This is the case of realistic equity or bond portfolios; requires multivariate models for the returns R.
  4. Several risk factors, non-linear risk mapping. Combines the difficulties of cases 2 and 3.

Approaches for assessing risk

- Three different approaches for quantifying portfolio risk:
  - sensitivities approach: risk is the strength of dependence of the portfolio value w.r.t. an underlying variable
  - stress testing approach: risk is the response of the PL to some extreme scenarios
  - probabilistic approach: a risk statistic (like VaR or ES) is applied to the PL distribution

Sensitivity approach

- Consider just one risk factor $Y$. If $T$ is short, we can write (Delta approximation)

  $$PL = v(Y_0 + \Delta Y) - v(Y_0) \approx v'(Y_0)\,\Delta Y$$

- Within the sensitivity approach, risk is quantified as the strength of the dependence of PL on $\Delta Y$, i.e.

  $$\text{risk} = v'(Y_0)$$

- For an options portfolio, $v'(Y_0)$ is the Delta ($Y_0$ is the underlying price). For a bond portfolio, $v'(Y_0)$ is proportional to the duration ($Y_0$ is the reference yield).
- Pros: 1. no probabilistic model, 2. very common in practice for a long time
- Cons: 1. just one variable at a time, 2. only for a short horizon, 3. risk expressed in non-monetary terms

Stress test approach

- Within the stress test approach we select a certain number of scenarios, i.e. hypothetical evolutions for $\mathbf{Y}$:

  $$\Delta\mathbf{Y}(1), \dots, \Delta\mathbf{Y}(K)$$

  and obtain the corresponding $K$ test values for PL:

  $$PL(i) = pl(\Delta\mathbf{Y}(i)), \quad i = 1, \dots, K$$

- Then we can retain all test values or consider the worst case, i.e.

  $$\text{risk} = \min_i PL(i)$$

- Pros: 1. favoured by regulators, 2. for all horizons, 3. captures extreme scenarios ruled out by common probabilistic models (e.g. −10% daily stock returns, 200bp yield increases, etc.), 4. risk is expressed in monetary terms
- Cons (a big one): highly dependent on the choice of scenarios
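The risk mapping, the Delta approximation and the stress-test figure can be put side by side in a few lines. The following is only a sketch under illustrative assumptions that are not in the slides: a single position worth 100, the log-price as its only risk factor (so `v(y) = exp(y)`), and four made-up scenarios.

```python
import math

V0 = 100.0           # hypothetical current value of the position
Y0 = math.log(V0)    # risk factor today: the log-price


def v(y):
    """Risk mapping: position value as a function of the log-price."""
    return math.exp(y)


def pl(z):
    """PL as a function of the risk-factor change z (a log-return)."""
    return v(Y0 + z) - v(Y0)


# Hypothetical stress scenarios for the log-return over the horizon
scenarios = [-0.10, -0.05, 0.00, 0.05]
test_values = [pl(z) for z in scenarios]

# Stress-test risk figure: the worst PL over the chosen scenarios
worst_case = min(test_values)

# Delta approximation: PL ~ v'(Y0) * z, and v'(Y0) = exp(Y0) for this mapping
delta = v(Y0)
delta_approx = [delta * z for z in scenarios]
```

In the −10% scenario the exact PL is about −9.52 while the Delta approximation gives −10, which illustrates why the sensitivity approach is reliable only for small moves.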

Probabilistic approach

- Within the probabilistic approach we consider a risk measure $\rho : \mathcal{L} \to \mathbb{R}$ ($\mathcal{L}$ is a set of r.v.) which is law-invariant, i.e.

  $$X \sim Y \implies \rho(X) = \rho(Y)$$

  So $\rho$ depends only on $F_X$ and can be called a risk statistic.
- For any portfolio, we develop a probabilistic model for PL (over a given horizon) and compute

  $$\text{risk} = \rho(PL)$$

- In practice, we first build a probabilistic model for $\Delta\mathbf{Y}$ and then derive the distribution of $PL = pl(\Delta\mathbf{Y})$
- Pros: 1. for all horizons, 2. expressed in monetary terms, 3. results can be checked on statistical grounds (backtesting), 4. different $\rho$ capture different facets of risk
- Cons: 1. building a probabilistic model is not easy, 2. non-technical people sometimes find $\rho$ difficult to understand

The three basic approaches

- Let $PL = pl(\Delta\mathbf{Y})$. There are three approaches to derive the distribution of PL:
  - Analytical (or variance-covariance). We "know" a distribution for $\Delta\mathbf{Y}$ and we can obtain the distribution of $pl(\Delta\mathbf{Y})$ analytically.
  - Monte Carlo. We "know" a distribution for $\Delta\mathbf{Y}$, but we need to resort to simulation to get an (empirical) distribution for $pl(\Delta\mathbf{Y})$.
  - Historical. We don't "know" a distribution for $\Delta\mathbf{Y}$. We use its empirical (or historical) distribution and derive the corresponding empirical distribution of PL.
- Typical use:
  - Analytical: linear portfolios ($pl$ linear)
  - Monte Carlo: non-linear portfolios
  - Historical: can always be used (indeed, the "standard" method)


Quantiles - continuous r.v.

- Consider $X$ with invertible cdf $F(x) = P(X \le x)$ (no jumps, no flat sections).
- The quantile of order $\alpha \in (0, 1)$ is

  $$q(\alpha) = F^{-1}(\alpha)$$

  Other notation: $q_X(\alpha)$ and $q_\alpha(X)$
- By definition, it is the unique $q \in \mathbb{R}$ s.t.

  $$F(q) = P(X \le q) = \int_{-\infty}^{q} f(x)\,dx = \alpha$$

- Note: $q(n/100)$ is the $n$-th percentile, $q(0.5)$ is the median, $q(0.75) - q(0.25)$ is the interquartile range, etc.

Quantiles - continuous r.v.

Figure: Quantile of order $\alpha = 20\%$ in terms of $F$

Figure: Quantile of order $\alpha = 20\%$ in terms of $f$ (the area under $f$ to the left of $q(20\%)$ is 20%)

Quantile function

- The quantile function $\alpha \mapsto q(\alpha)$ is increasing, with $q(0^+) = -\infty$ and $q(1^-) = +\infty$ when the support of $X$ is $\mathbb{R}$ (i.e. $f > 0$ on all of $\mathbb{R}$).

Figure: Black: $F$ (standard normal). Red: $q$. To invert a function, just mirror its graph w.r.t. the main bisector (dashed line).

Examples

- If $U \sim U(0, 1)$, then $F(x) = x$ ($x \in [0, 1]$), so $q(\alpha) = \alpha$
- If $X \sim \mathrm{Exp}(\lambda)$, then $F(x) = 1 - e^{-\lambda x}$ ($x > 0$), so that

  $$q(\alpha) = -\frac{1}{\lambda}\log(1 - \alpha)$$

  If $\lambda = 1$, then $q(0.9) = 2.303$, which means $P(X \le 2.303) = 90\%$
- For $X \sim N(0, 1)$ and $X \sim t(\nu)$, the cdf is not known in closed form. However,

  $$z_\alpha = q_{N(0,1)}(\alpha) = \Phi^{-1}(\alpha), \qquad t_{\nu,\alpha} = q_{t(\nu)}(\alpha)$$

  can easily be approximated with great precision (e.g. $z_{1\%} \approx -2.3263$)
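The two numerical values quoted above are easy to reproduce. The sketch below uses only the Python standard library; the function names are illustrative choices, not notation from the slides.

```python
import math
from statistics import NormalDist


def exp_quantile(alpha, lam=1.0):
    """Quantile of Exp(lam): q(alpha) = -log(1 - alpha) / lam."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return -math.log(1.0 - alpha) / lam


def exp_cdf(x, lam=1.0):
    """Cdf of Exp(lam): F(x) = 1 - exp(-lam * x) for x > 0."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0


q90 = exp_quantile(0.9)            # about 2.303
check = exp_cdf(q90)               # back to 0.9, since q inverts F
z01 = NormalDist().inv_cdf(0.01)   # about -2.3263, computed numerically
```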

Quantiles - general case

- $F$ is not invertible when either
  - $F$ is not strictly increasing at level $\alpha$ (flat section), or
  - $F$ has a jump at level $\alpha$
- In both cases, define

  $$q(\alpha) = \inf\{x : F(x) \ge \alpha\}$$

- If $F$ is invertible, then this definition coincides with the previous one.

Quantiles - discrete r.v.

- Applying the definition to an empirical distribution (values $x_1, \dots, x_N$, each with probability $1/N$)

  $$X \sim \begin{pmatrix} x_1, & \dots, & x_N \\ 1/N, & \dots, & 1/N \end{pmatrix}$$

  (with $x_k < x_{k+1}$) we have

  $$q_X(\alpha) = \begin{cases} x_1 & \text{if } \alpha \in (0, 1/N] \\ x_2 & \text{if } \alpha \in (1/N, 2/N] \\ \dots \\ x_k & \text{if } \alpha \in ((k-1)/N, k/N] \\ \dots \end{cases}$$

- So, if $N = 200$, then

  $$q(1\%) = x_2, \qquad q(5\%) = x_{10}$$

  and so on.
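The case formula above amounts to picking the $k$-th smallest observation with $k = \lceil \alpha N \rceil$. A minimal sketch, in which the function name and the small rounding tolerance are assumptions of this example:

```python
import math


def empirical_quantile(sample, alpha):
    """Lower quantile q(alpha) = inf{x : F(x) >= alpha} of the empirical cdf."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    xs = sorted(sample)
    n = len(xs)
    # Smallest k with k/N >= alpha; the tiny tolerance guards against
    # floating-point rounding in alpha * N.
    k = max(1, math.ceil(alpha * n - 1e-9))
    return xs[k - 1]


# Hypothetical data with N = 200, built so that x_k = k after sorting
sample = list(range(200, 0, -1))
q1 = empirical_quantile(sample, 0.01)   # x_2
q5 = empirical_quantile(sample, 0.05)   # x_10
```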

First key result

- If $h$ is continuous and strictly increasing (hence invertible), then

  $$q_\alpha(h(X)) = h(q_\alpha(X))$$

- So, for instance,

  $$q(X^3) = q(X)^3$$

  $$q(e^X) = e^{q(X)}$$

  $$q(\log X) = \log q(X) \quad (X > 0)$$

  $$q(aX + b) = a\,q(X) + b \quad (a > 0)$$

- Instead, $q(X^2) \ne q(X)^2$ and $q(-X) \ne -q(X)$ in general
- However, if $F_X$ is invertible, then $q_\alpha(-X) = -q_{1-\alpha}(X)$

Quantiles and location-scale families

- Let $X$ have (finite) mean $\mu$ and variance $\sigma^2$. If $\tilde{X} = (X - \mu)/\sigma$ (note: it is standard), then

  $$q(X) = \sigma\,q(\tilde{X}) + \mu$$

- For instance, if $X \sim N(5, 4)$, then $\tilde{X} \sim N(0, 1)$ and

  $$q_{5\%}(X) = 2\,q_{5\%}(\tilde{X}) + 5 = 2 z_{5\%} + 5 = 1.710$$

- For a given standard $Z$, the associated location-scale family is

  $$\{\sigma Z + \mu : \sigma > 0,\ \mu \in \mathbb{R}\} = \{X : \tilde{X} \sim Z\}$$

  So, knowing $q(Z)$ is "enough" for computing $q(X)$ for any $X$ in the location-scale family.
- Note: a location-scale family can be defined even if $Z$ has no mean/variance (e.g. Pareto or Cauchy): $\sigma$ and $\mu$ are then just parameters and $\tilde{X}$ makes no sense

Second key result

- Consider $X$ and let $q = q_X$. Then there exists $U_X \sim U(0, 1)$ such that $X = q(U_X)$. In particular

  $$X \sim q(U), \qquad U \sim U(0, 1)$$

- It follows that

  $$\int_0^1 q(u)\,du = E[q(U)] = E[X]$$

- More generally, if $\alpha \in (0, 1)$ and $X$ is absolutely continuous (a.c.), then

  $$\int_0^\alpha q(u)\,du = \alpha\,E[q(U) \mid U \le \alpha] = \alpha\,E[X \mid X \le q(\alpha)]$$

  or

  $$\frac{1}{\alpha}\int_0^\alpha q(u)\,du = E[X \mid X \le q(\alpha)].$$
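The representation $X \sim q(U)$ is what inverse-transform sampling relies on, and both identities can be checked by simulation. The sketch below takes $X \sim \mathrm{Exp}(1)$ as a test case; the distribution, sample size, seed and level $\alpha = 0.2$ are arbitrary choices of this example.

```python
import math
import random


def q_exp(u):
    """Quantile function of Exp(1): q(u) = -log(1 - u)."""
    return -math.log(1.0 - u)


random.seed(0)
n = 200_000
xs = [q_exp(random.random()) for _ in range(n)]   # X = q(U) with U ~ U(0, 1)
sample_mean = sum(xs) / n                         # close to E[X] = 1

alpha = 0.2
cutoff = q_exp(alpha)
tail = [x for x in xs if x <= cutoff]
tail_mean = sum(tail) / len(tail)   # Monte Carlo estimate of E[X | X <= q(alpha)]

# (1/alpha) * integral of q over (0, alpha), using the antiderivative
# (1 - u) * log(1 - u) + u of -log(1 - u)
tail_integral = ((1 - alpha) * math.log(1 - alpha) + alpha) / alpha
```

Both `sample_mean` and `tail_mean` are random estimates, so they match the exact values only up to Monte Carlo error.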

Value-at-Risk

- The Value-at-Risk of order $\alpha \in (0, 1)$ is

  $$\mathrm{VaR}_\alpha(X) = -q_\alpha(X)$$

  where $X$ is the portfolio PL over a certain horizon $T$
  - The minus sign translates high losses ($X \ll 0$) into a high risk ($\mathrm{VaR} \gg 0$)
  - Order $\alpha$: 1%-5% (market risk), 1% or less (credit, operational, insurance risk)
  - Horizon $\Delta t$: 1 day (internal monitoring), 10 days (market risk under Basel III), 1 year (credit, operational, insurance risk)

Expected Shortfall

- The Expected Shortfall of order $\alpha \in (0, 1)$ is

  $$\mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\int_0^\alpha q_X(u)\,du$$

  or

  $$\mathrm{ES}_\alpha(X) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_u(X)\,du$$

  (again, $X$ is the portfolio PL)
- Other names are used (but beware):
  - Average/Conditional/Tail VaR (AVaR/CVaR/TVaR)
  - Tail Conditional Expectation (TCE)
  - Expected Tail Loss (ETL)
- $\alpha$ and $\Delta t$ are chosen similarly as for VaR

Expected Shortfall

- If $X$ is a.c. we know that

  $$\mathrm{ES}_\alpha(X) = -E[X \mid X \le q_\alpha(X)] \tag{1}$$

  and this better explains the name "Expected Shortfall"
- Recall that

  $$E[X \mid A] \overset{\text{def}}{=} \frac{E[X \cdot I_A]}{P(A)}$$

  whenever $A$ is an event with positive probability.
- This is no longer true if $X$ is not a.c. In that case

  $$\mathrm{ES}_\alpha(X) = -E[X \mid X \le q_\alpha] \cdot \frac{F(q_\alpha)}{\alpha} + q_\alpha \cdot \left(\frac{F(q_\alpha)}{\alpha} - 1\right)$$

  However, for practical applications (1) is still a good approximation.

VaR vs ES

- If PL is continuous and $v = \mathrm{VaR}_\alpha(PL)$:

  $$\alpha = P(PL \le -v) = P(\text{Loss} \ge v), \qquad \text{Loss} = -PL$$

- $\mathrm{VaR}_{5\%} = 10000$ Euro means: the probability of a loss of 10000 Euro or more is 5%
- Regarding ES:

  $$\mathrm{ES}_\alpha(PL) = E[-PL \mid PL \le q_\alpha(PL)] = E[\text{Loss} \mid \text{Loss} \ge \mathrm{VaR}_\alpha(PL)]$$

- $\mathrm{ES}_{5\%} = 12000$ Euro means: the average of the 5% worst losses (all of them at least 10000) is 12000

Some basic inequalities

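- VaR and ES are decreasing in $\alpha$. So

  $$\mathrm{VaR}_{5\%}(X) \le \mathrm{VaR}_{1\%}(X)$$

  $$\mathrm{ES}_{5\%}(X) \le \mathrm{ES}_{1\%}(X)$$

  Remarkably, when $X$ is normal

  $$\mathrm{VaR}_{1\%}(X) \approx \mathrm{ES}_{2.5\%}(X)$$

- As VaR is decreasing in $\alpha$, and $\mathrm{ES}_\alpha$ is the average of $\mathrm{VaR}_u$ over $u \in (0, \alpha)$:

  $$\mathrm{ES}_\alpha(X) \ge \mathrm{VaR}_\alpha(X)$$

  i.e. ES is more conservative than VaR.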

VaR vs ES

- VaR:
  - introduced in the early 1990s by JP Morgan (RiskMetrics procedure)
  - the standard in the financial sector
  - Basel III capital requirement is based on VaR
- ES:
  - increasingly used by fund managers and in the insurance sector
  - current debate about replacing VaR (at 1%) with ES (at 2.5%) in Basel rules
- Also:
  - the estimation process is similar
  - ES is coherent, VaR is not (see later)
  - VaR is defined for all distributions; ES requires an "integrable" tail, so it is $+\infty$ for a Cauchy (a possible problem only for operational and insurance risks)

VaR and ES computation

- VaR computation: just a sign change; ES: a bit more work
- If $X \sim \mathrm{Lap}(1)$, i.e. $f(x) = e^{-|x|}/2$, then $q(u) = \log(2u)$ ($u < 1/2$) and

  $$\mathrm{VaR}_\alpha(X) = -\log(2\alpha), \qquad \alpha < 1/2$$

- Concerning ES, we compute ($\alpha < 1/2$)

  $$\mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\int_0^\alpha \log(2u)\,du = -\frac{1}{\alpha}\Big[u(\log(2u) - 1)\Big]_0^\alpha = -\frac{\alpha(\log(2\alpha) - 1)}{\alpha} = 1 - \log(2\alpha).$$
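As a sanity check of the closed forms for the Laplace case, ES can also be computed numerically as the average of $\mathrm{VaR}_u$ over $(0, \alpha)$. The midpoint rule and the grid size below are choices of this sketch, not part of the slides.

```python
import math


def var_laplace(alpha):
    """VaR of Lap(1) for alpha < 1/2: -log(2 * alpha)."""
    return -math.log(2.0 * alpha)


def es_laplace(alpha):
    """Closed-form ES of Lap(1) for alpha < 1/2: 1 - log(2 * alpha)."""
    return 1.0 - math.log(2.0 * alpha)


def es_numeric(alpha, n=100_000):
    """ES as (1/alpha) * integral of VaR_u over (0, alpha), midpoint rule."""
    h = alpha / n
    return sum(var_laplace((i + 0.5) * h) for i in range(n)) * h / alpha


alpha = 0.05
var_5 = var_laplace(alpha)       # about 2.303
es_5 = es_laplace(alpha)         # about 3.303
es_5_check = es_numeric(alpha)   # agrees with es_5 up to discretisation error
```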

VaR and ES computation

- If $Z \sim N(0, 1)$ and $\alpha < 1/2$ then ($z_\alpha < 0$)

  $$\mathrm{VaR}_\alpha(Z) = |z_\alpha|$$

- Concerning ES:

  $$\mathrm{ES}_\alpha(Z) = -E[Z \mid Z \le z_\alpha] = -\frac{E[Z \cdot I_{Z \le z_\alpha}]}{P(Z \le z_\alpha)} = -\frac{1}{\alpha}\int_{-\infty}^{z_\alpha} x\varphi(x)\,dx$$

  Observing that $\varphi' = -x\varphi$ and $\varphi(-\infty) = 0$ we readily obtain

  $$\mathrm{ES}_\alpha(Z) = \frac{\varphi(z_\alpha)}{\alpha}$$

- Key values of VaR and ES for a standard normal are as follows

  α        0.1%     1%       5%
  VaRα     3.090    2.326    1.645
  ESα      3.367    2.665    2.063
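The table can be regenerated from the two formulas $\mathrm{VaR}_\alpha(Z) = -z_\alpha$ and $\mathrm{ES}_\alpha(Z) = \varphi(z_\alpha)/\alpha$. A small sketch using `statistics.NormalDist` from the standard library (the helper names are illustrative):

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()   # N(0, 1)


def var_std_normal(alpha):
    """VaR of a standard normal: -z_alpha."""
    return -STD_NORMAL.inv_cdf(alpha)


def es_std_normal(alpha):
    """ES of a standard normal: phi(z_alpha) / alpha."""
    z = STD_NORMAL.inv_cdf(alpha)
    return STD_NORMAL.pdf(z) / alpha


for a in (0.001, 0.01, 0.05):
    print(f"alpha={a:.1%}  VaR={var_std_normal(a):.3f}  ES={es_std_normal(a):.3f}")

# ES at 2.5% is close to VaR at 1% (about 2.338 versus 2.326)
gap = es_std_normal(0.025) - var_std_normal(0.01)
```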

VaR and ES computation

Figure: VaR (solid) and ES (dashed) as functions of $\alpha \in (0, 0.05]$. Black: N(0,1); red: Lap(1). Note: the heavier the tail (Lap), the higher the VaR and ES.

VaR and ES of transformed variables

- As $q(aX + b) = a\,q(X) + b$ for $a > 0$,

  $$\mathrm{VaR}_\alpha(aX + b) = a\,\mathrm{VaR}_\alpha(X) - b$$

  and

  $$\mathrm{ES}_\alpha(aX + b) = \frac{1}{\alpha}\int_0^\alpha \big(a\,\mathrm{VaR}_u(X) - b\big)\,du = \frac{a}{\alpha}\int_0^\alpha \mathrm{VaR}_u(X)\,du - \frac{b}{\alpha}\int_0^\alpha du = a\,\mathrm{ES}_\alpha(X) - b$$

- Both are
  - positively homogeneous: $\rho(aX) = a\rho(X)$, $a > 0$
  - translation equivariant: $\rho(X + b) = \rho(X) - b$
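Both properties can be observed on historical (empirical) estimates. In the sketch below the ten PL observations, the constants `a` and `b` and the helper names are all hypothetical; the ES helper assumes that $\alpha N$ is an integer.

```python
import math


def hist_var(sample, alpha):
    """Historical VaR: minus the k-th smallest observation, k = ceil(alpha * N)."""
    xs = sorted(sample)
    k = max(1, math.ceil(alpha * len(xs) - 1e-9))
    return -xs[k - 1]


def hist_es(sample, alpha):
    """Historical ES: minus the average of the k worst observations
    (exact when alpha * N is an integer)."""
    xs = sorted(sample)
    k = max(1, math.ceil(alpha * len(xs) - 1e-9))
    return -sum(xs[:k]) / k


# Hypothetical PL observations
sample = [-12.0, -7.5, -3.0, -1.0, 0.5, 1.0, 2.0, 2.5, 4.0, 6.0]
a, b, alpha = 2.0, 3.0, 0.2
scaled = [a * x + b for x in sample]

var_x, es_x = hist_var(sample, alpha), hist_es(sample, alpha)   # 7.5 and 9.75
var_s, es_s = hist_var(scaled, alpha), hist_es(scaled, alpha)   # a*7.5 - b and a*9.75 - b
```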

VaR and ES and location-scale families

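- If $X$ has (finite) mean $\mu$ and variance $\sigma^2$, and $\tilde{X} = (X - \mu)/\sigma$ is its standardized version, then

  $$\mathrm{VaR}_\alpha(X) = \sigma\,\mathrm{VaR}_\alpha(\tilde{X}) - \mu$$

  $$\mathrm{ES}_\alpha(X) = \sigma\,\mathrm{ES}_\alpha(\tilde{X}) - \mu$$

- If $X \sim N(0, \sigma^2)$, then ($\alpha < 1/2$)

  $$\mathrm{VaR}_\alpha(X) = \sigma \cdot |z_\alpha|, \qquad \mathrm{ES}_\alpha(X) = \sigma\,\frac{\varphi(z_\alpha)}{\alpha}$$

  so that, for instance,

  $$\mathrm{VaR}_{5\%}(X) = 1.645 \cdot \sigma, \qquad \mathrm{ES}_{5\%}(X) = 2.063 \cdot \sigma$$

- Under a location-scale family assumption, estimation of VaR/ES reduces to the estimation of $\mu$ and $\sigma$.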

VaR and ES for Loss variables

- In insurance the loss variable $L = -PL$ is most often used.
- Modified definitions of VaR and ES:

  $$\mathrm{VaR}^{(Ins)}_\beta(L) = q_\beta(L)$$

  $$\mathrm{ES}^{(Ins)}_\beta(L) = \frac{1}{1 - \beta}\int_\beta^1 \mathrm{VaR}^{(Ins)}_u(L)\,du = E[L \mid L \ge q_\beta(L)] \quad (L \text{ a.c.})$$

  where $\beta$ is close to 1 (e.g. $\beta$ = 95%, 99%, 99.9%).
- Note that, when PL is a.c.,

  $$\mathrm{VaR}^{(Bank)}_\alpha(PL) = -q_\alpha(PL) = q_{1-\alpha}(L) = \mathrm{VaR}^{(Ins)}_{1-\alpha}(L)$$

  and similarly for ES.


Sub-additivity

- A risk measure is sub-additive if

  $$\rho(X + Y) \le \rho(X) + \rho(Y) \quad \text{for all } X, Y$$

- There are (at least) three advantages in choosing a sub-additive $\rho$:
  1. Sub-additivity deters a financial institution from splitting up in order to lower the total risk (hence, the capital requirement imposed by the regulator).
  2. If $PL = PL_1 + \dots + PL_N$, where $PL_i$ is the PL of internal unit $i$, then

     $$\rho(PL) \le \rho(PL_1) + \dots + \rho(PL_N)$$

     Estimation of the partial risks (i.e. $\rho(PL_i)$) is usually more accurate, and $\rho(PL_1) + \dots + \rho(PL_N)$ is a reliable upper bound.
  3. A sub-additive and positively homogeneous $\rho$ is also convex, and this is a key property in portfolio optimization.
- But not everybody is so fascinated by sub-additivity...

VaR and elliptical returns

- Are VaR and ES sub-additive?
- We start with a particular case:
  - Assume that $\Delta\mathbf{Y} \in \mathbb{R}^N$ is elliptical and let

    $$\mathcal{L} = \{X = \mathbf{v}'\Delta\mathbf{Y} : \mathbf{v} \in \mathbb{R}^N\}$$

    be the vector space of all r.v. that are linear combinations of $\Delta\mathbf{Y}$
  - If $\alpha < 0.5$ there exists a constant $c > 0$ such that

    $$\mathrm{VaR}_\alpha(X) = c \cdot \sigma(X) - E[X] \quad \text{for all } X \in \mathcal{L}$$

    As a consequence (the standard deviation is sub-additive and the mean is additive)

    $$\mathrm{VaR}_\alpha(X + Y) \le \mathrm{VaR}_\alpha(X) + \mathrm{VaR}_\alpha(Y) \quad \text{for all } X, Y \in \mathcal{L}$$

    A similar result holds for ES as well.
- "Elliptical" means that the level sets of the density are "ellipsoids". Key example: multivariate normal.

Beyond elliptical returns

- Put briefly, VaR and ES are certainly sub-additive when $\Delta\mathbf{Y}$ is elliptical and we only consider linear portfolios (linear $pl$).
- These assumptions are often reasonable when dealing with equity portfolios
- This is no longer the case for portfolios containing options or when considering other types of risk (credit, operational)
- Is VaR always sub-additive? No.
- Is ES always sub-additive? Yes.

VaR is not sub-additive - Example 1

- Consider a portfolio made up of $D = 100$ different zero-coupon (ZC) defaultable bonds, each with face value 105, current price 100 and maturity $T = 1$.
- Defaults are independent and occur with probability $p = 2\%$ for each bond
- The PL of bond $d$ is

  $$PL_d = 105(1 - B_d) - 100 = 5 - 105 B_d$$

  where $B_d = 1$ if bond $d$ defaults and $B_d = 0$ otherwise.
- The total PL is

  $$PL = \sum_d PL_d = 500 - 105 \sum_d B_d,$$

  where the $B_d$ are IID Ber(2%)

VaR is not sub-additive - Example 1

- For any $d$, $\mathrm{VaR}_{5\%}(PL_d) = -5$
- Using a MC simulation we estimate

  $$\mathrm{VaR}_{5\%}(PL) \approx 25$$

- Therefore

  $$\mathrm{VaR}_{5\%}\Big(\sum_d PL_d\Big) \approx 25 > -500 = \sum_d \mathrm{VaR}_{5\%}(PL_d) = 100\,\mathrm{VaR}_{5\%}(PL_1)$$

  and sub-additivity is broken at $\alpha = 5\%$
- Instead

  $$\mathrm{VaR}_{1\%}\Big(\sum_d PL_d\Big) \approx 130 < 10000 = 100\,\mathrm{VaR}_{1\%}(PL_1)$$
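The slides obtain the portfolio VaR by Monte Carlo. Because the number of defaults is Bin(100, 2%), the same figures can also be computed exactly from the binomial probabilities, which is the route taken in the sketch below; this is a swapped-in technique, and the helper name `discrete_var` is an assumption of the example.

```python
from math import comb


def discrete_var(dist, alpha):
    """VaR_alpha = -q_alpha for a finite distribution given as (value, prob) pairs,
    with q_alpha = inf{x : F(x) >= alpha}."""
    cum = 0.0
    for x, p in sorted(dist):       # scan outcomes from the worst PL upwards
        cum += p
        if cum >= alpha:
            return -x
    raise ValueError("probabilities sum to less than alpha")


D, p = 100, 0.02

# One bond: PL = 5 with probability 98%, PL = -100 with probability 2%
single_bond = [(5, 1 - p), (-100, p)]

# Portfolio: PL = 500 - 105 * k, where k ~ Bin(D, p) is the number of defaults
portfolio = [(500 - 105 * k, comb(D, k) * p**k * (1 - p) ** (D - k))
             for k in range(D + 1)]

var5_single = discrete_var(single_bond, 0.05)   # -5
var5_port = discrete_var(portfolio, 0.05)       # 25
var1_single = discrete_var(single_bond, 0.01)   # 100
var1_port = discrete_var(portfolio, 0.01)       # 130
```

At 5% the portfolio VaR (25) exceeds the sum of the stand-alone VaRs (−500), while at 1% the comparison goes the other way (130 versus 10000).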

VaR is not sub-additive - Example 1

- Consider
  - Portfolio A: we invest 100 Euro in each bond
  - Portfolio B: we invest 10000 Euro in bond 1
- We have just seen that $\mathrm{VaR}_{5\%}(PL_A) > \mathrm{VaR}_{5\%}(PL_B)$

Figure: PL histogram for portfolio A (blue) and B (red); horizontal axis: PL from −10000 to 5000

VaR is not sub-additive - Example 2

- A Cauchy r.v. (or Student t with $\nu = 1$) has density

  $$f(x) = \frac{1}{\pi(1 + x^2)}$$

  so that $f(x) \sim x^{-2}$ for $|x| \to \infty$ and $F(x) \sim |x|^{-1}$ for $x \to -\infty$
- The following is a known (and odd...) fact:
  - if $X, Y$ are Cauchy and independent, then $X + Y \sim 2X$
  - Note: if $X, Y$ are IID standard normal (finite variance), then $X + Y \sim \sqrt{2}X$. A Cauchy r.v. has infinite variance
- Therefore, for IID Cauchy

  $$\mathrm{VaR}_\alpha(X + Y) = \mathrm{VaR}_\alpha(2X) = 2\,\mathrm{VaR}_\alpha(X) = \mathrm{VaR}_\alpha(X) + \mathrm{VaR}_\alpha(Y)$$

- Weird, but still not a counterexample to sub-additivity...

VaR is not sub-additive - Example 2

- Consider a Pareto-like distribution

  $$F(x) = |x|^{-1/2}, \qquad x < -1,$$

  for which $f(x) \sim |x|^{-3/2}$ (like a Student t with $\nu = 1/2$).
- If $X, Y$ both have the distribution $F$ above and are independent, then it can be computed that

  $$P(X + Y \le -z) = \frac{2\sqrt{z - 1}}{z} > \sqrt{\frac{2}{z}} = P(2X \le -z), \qquad z > 2$$

  (squaring both sides, the inequality reads $4(z - 1) > 2z$, i.e. $z > 2$)
- As a consequence, for any $\alpha \in (0, 1)$

  $$\mathrm{VaR}_\alpha(X + Y) > \mathrm{VaR}_\alpha(2X) = \mathrm{VaR}_\alpha(X) + \mathrm{VaR}_\alpha(Y)$$

- Lack of sub-additivity is uniform in $\alpha$
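For this Pareto-like example both VaRs are available in closed form, so the violation can be checked at any level. The expressions below are derived here from the tail probabilities stated in the slide, not quoted from it: solving $2\sqrt{z-1}/z = \alpha$ for $z \ge 2$ gives $z = 2(1 + \sqrt{1 - \alpha^2})/\alpha^2$, while $\sqrt{2/z} = \alpha$ gives $z = 2/\alpha^2$.

```python
import math


def tail_sum(z):
    """P(X + Y <= -z) for z >= 2, as stated in the text."""
    return 2.0 * math.sqrt(z - 1.0) / z


def tail_double(z):
    """P(2X <= -z) = sqrt(2 / z) for z >= 2."""
    return math.sqrt(2.0 / z)


def var_sum(alpha):
    """VaR_alpha(X + Y): the root z >= 2 of alpha^2 z^2 - 4 z + 4 = 0."""
    return 2.0 * (1.0 + math.sqrt(1.0 - alpha**2)) / alpha**2


def var_double(alpha):
    """VaR_alpha(2X) = VaR_alpha(X) + VaR_alpha(Y) = 2 / alpha^2."""
    return 2.0 / alpha**2


levels = (0.01, 0.05, 0.5, 0.9)
violations = {a: var_sum(a) - var_double(a) for a in levels}   # all positive
```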

VaR is not sub-additive - Example 3

- The following problem has recently received attention:

  $$\max\{\mathrm{VaR}_\alpha(X + Y) : X \sim F,\ Y \sim G\},$$

  where $F$ and $G$ are fixed distributions. We are looking for the worst VaR of the sum for given marginals
- There is no comprehensive solution, but some particular cases can be treated effectively
- For instance, if $F$ and $G$ are standard normal, then it has been shown that

  $$\max\{\mathrm{VaR}_{5\%}(X + Y) : X \sim Y \sim N(0, 1)\} \approx 3.92 > 2 \cdot 1.645$$

- Therefore, we can violate sub-additivity even when the marginal distributions are simple (normal).
- However, the (worst-case) dependence structure (copula) between $X$ and $Y$ is far from a normal one (and highly non-elliptical)

VaR is not sub-additive

- We have seen 3 counterexamples
  - Bond: IID risks with a pronounced skewness. Examples: credit risk, portfolios with highly non-linear options
  - Cauchy: IID risks with null skewness, but very thick tails. Examples: operational risk, insurance products
  - Copula: ID normal risks, but highly asymmetric dependence structure. Example: interactions during crisis periods

Why VaR is not sub-additive

- Assume $PL = X + Y$
- Consider two time series of length 100 for the vector $(X, Y)$:

  $$x_1, \dots, x_{100}$$

  $$y_1, \dots, y_{100}$$

  where $x_n$ and $y_n$ are the values observed $n$ days ago
- Using the historical method amounts to using the empirical distribution for $X$

  $$F_X(x) = \frac{1}{100}\sum_{n=1}^{100} I_{\{x \ge x_n\}}$$

  and similarly for $Y$

Why VaR is not sub-additive

- For $\alpha = 2\%$ we have

  $$\mathrm{VaR}_{2\%}(X) = -x_{(2)}, \qquad \mathrm{VaR}_{2\%}(Y) = -y_{(2)}$$

  where

  $$x_{(1)} \le x_{(2)} \le \dots \le x_{(100)}$$

  $$y_{(1)} \le y_{(2)} \le \dots \le y_{(100)}$$

  are the order statistics of $(x_n)$ and $(y_n)$
- If $z_n = x_n + y_n$, then

  $$\mathrm{VaR}_{2\%}(X + Y) = -z_{(2)},$$

  where $z_{(1)} \le \dots \le z_{(100)}$ are the order statistics of the sum $X + Y$
- Warning! It can be the case that $z_{(2)} \ne x_{(2)} + y_{(2)}$

Why VaR is not sub-additive

- Consider

  n                  ...   22    ...   45    ...   91    ...
  x_n                 +    −20    +    −10    +    −5     +
  y_n                 +    −5     +    −10    +    −19    +
  z_n = x_n + y_n     +    −25    +    −20    +    −24    +

  where + indicates some positive value of the variable.
- We have

  $$\mathrm{VaR}_{2\%}(X + Y) = -z_{(2)} = 24$$

  while

  $$\mathrm{VaR}_{2\%}(X) + \mathrm{VaR}_{2\%}(Y) = -x_{(2)} - y_{(2)} = 20$$

- Therefore

  $$\mathrm{VaR}_{2\%}(X + Y) > \mathrm{VaR}_{2\%}(X) + \mathrm{VaR}_{2\%}(Y)$$

Why ES is sub-additive

- Recall

  n                  ...   22    ...   45    ...   91    ...
  x_n                 +    −20    +    −10    +    −5     +
  y_n                 +    −5     +    −10    +    −19    +
  z_n = x_n + y_n     +    −25    +    −20    +    −24    +

- We have

  $$\mathrm{ES}_{2\%}(X + Y) = -\frac{z_{(1)} + z_{(2)}}{2} = 24.5$$

  while

  $$\mathrm{ES}_{2\%}(X) + \mathrm{ES}_{2\%}(Y) = -\frac{x_{(1)} + x_{(2)} + y_{(1)} + y_{(2)}}{2} = 29.5$$
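The four numbers of this example (24 versus 20 for VaR, 24.5 versus 29.5 for ES) can be reproduced with the historical method. In the sketch below every "+" entry of the table is replaced by 1.0, an arbitrary positive placeholder; the helper names are also assumptions of the example.

```python
import math


def hist_var(sample, alpha):
    """Historical VaR: minus the k-th order statistic, k = ceil(alpha * N)."""
    xs = sorted(sample)
    k = max(1, math.ceil(alpha * len(xs) - 1e-9))
    return -xs[k - 1]


def hist_es(sample, alpha):
    """Historical ES: minus the average of the k smallest observations
    (exact when alpha * N is an integer)."""
    xs = sorted(sample)
    k = max(1, math.ceil(alpha * len(xs) - 1e-9))
    return -sum(xs[:k]) / k


# 100 daily observations; 1.0 stands in for the unspecified positive values
x = [1.0] * 100
y = [1.0] * 100
losses = {22: (-20.0, -5.0), 45: (-10.0, -10.0), 91: (-5.0, -19.0)}
for day, (x_val, y_val) in losses.items():
    x[day - 1], y[day - 1] = x_val, y_val
z = [xn + yn for xn, yn in zip(x, y)]

alpha = 0.02
var_sum, var_parts = hist_var(z, alpha), hist_var(x, alpha) + hist_var(y, alpha)
es_sum, es_parts = hist_es(z, alpha), hist_es(x, alpha) + hist_es(y, alpha)
```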

Why ES is sub-additive

- The crucial point is:
  - we can have

    $$(x + y)_{(2)} < x_{(2)} + y_{(2)}$$

    and this leads to lack of sub-additivity for VaR
  - we always have

    $$(x + y)_{(1)} + (x + y)_{(2)} \ge x_{(1)} + x_{(2)} + y_{(1)} + y_{(2)}$$

    and this implies sub-additivity for ES
- It is possible to prove that ES is sub-additive for general distributions. Embrechts and Wang (WP, 2015) provide 7 (!) different (rigorous) proofs.


Coherence and diversification

- A good risk measure should encourage diversification, but this is difficult to write in precise terms.
- Sub-additivity of a risk measure does not seem to be equivalent to this.
- While

  $$\sigma(X + Y) \le \sigma(X) + \sigma(Y)$$

  always holds,

  $$\sigma^2(X + Y) \le \sigma^2(X) + \sigma^2(Y)$$

  only holds when $\mathrm{corr}(X, Y) \le 0$.
- However, both are used in Markowitz portfolio theory.

Coherence and independence

- It would be tempting to think that the risks of two independent positions should add up, i.e.

  $$\rho(X + Y) = \rho(X) + \rho(Y), \qquad X, Y \text{ independent}$$

- This is always true for $\sigma^2$ (not for $\sigma$), but false in general:
  - If $X$ and $Y$ are IID normal with zero mean, then $X + Y \sim \sqrt{2}X$. If $\rho$ is positively homogeneous (like VaR and ES), then

    $$\rho(X + Y) = \rho(\sqrt{2}X) = \sqrt{2}\rho(X) < 2\rho(X) = \rho(X) + \rho(Y)$$

    provided $\rho(X) > 0$
  - We have seen before that if $X, Y$ are Pareto-like distributed and independent, then

    $$\mathrm{VaR}_\alpha(X + Y) > \mathrm{VaR}_\alpha(X) + \mathrm{VaR}_\alpha(Y)$$

    for any $\alpha$

Coherence and comonotonicity

- $X$ and $Y$ are comonotonic if $X = f(Z)$ and $Y = g(Z)$ for two increasing functions $f$ and $g$ and a third r.v. $Z$.
- When this happens, neither of the two r.v. is a hedge for the other. So, it seems natural that the two risks add up in this case.
- A risk measure is comonotonic additive if

  $$\rho(X + Y) = \rho(X) + \rho(Y)$$

  for any comonotonic $X, Y$.
- It is quite easy to verify that both VaR and ES are comonotonic additive. When $f$ and $g$ are increasing and invertible, just observe that

  $$q(X) + q(Y) = f(q(Z)) + g(q(Z)) = (f + g)(q(Z)) = q((f + g)(Z)) = q(X + Y)$$

  as $f + g$ is also increasing and invertible.

Coherence and comonotonicity

- Comonotonicity attains the maximum correlation for given marginals:
  - If $X$ and $Y$ are comonotonic, then

    $$\mathrm{corr}(X, Y) \ge \mathrm{corr}(X', Y')$$

    for any $X' \sim X$ and $Y' \sim Y$ such that $X'$ and $Y'$ are not comonotonic.
- Let $\rho$ be sub-additive and comonotonic additive. If $X$ and $Y$ are comonotonic and $X' \sim X$ and $Y' \sim Y$, then of course

  $$\rho(X' + Y') \le \rho(X') + \rho(Y') = \rho(X) + \rho(Y) = \rho(X + Y)$$

- So, for the ES, the maximum risk of a sum $X + Y$ with given marginals is attained in the comonotonic case or, equivalently, in the maximum correlation case.
- This is not true for VaR, which is not sub-additive.

Subadditivity and convexity

- If $\rho$ is positively homogeneous (PH), then sub-additivity (Sub) is equivalent to convexity (C)

  $$\rho(\lambda X + (1 - \lambda)Y) \le \lambda\rho(X) + (1 - \lambda)\rho(Y), \qquad \lambda \in [0, 1]$$

- Under (PH), (Sub) implies (C):

  $$\rho(\lambda X + (1 - \lambda)Y) \le \rho(\lambda X) + \rho((1 - \lambda)Y) = \lambda\rho(X) + (1 - \lambda)\rho(Y)$$

- Under (PH), (C) implies (Sub):

  $$\rho(X + Y) = \rho\Big(\tfrac{1}{2}\,2X + \tfrac{1}{2}\,2Y\Big) \le \tfrac{1}{2}\rho(2X) + \tfrac{1}{2}\rho(2Y) = \rho(X) + \rho(Y)$$

- So, ES is convex, VaR is not

Subadditivity and convexity

- Note: $\rho$ is convex if and only if, for all $X$ and $Y$,

  $$\lambda \mapsto \rho(\lambda X + (1 - \lambda)Y), \qquad \lambda \in [0, 1]$$

  is convex or, more generally,

  $$\boldsymbol{\lambda} = (\lambda_1, \dots, \lambda_N) \mapsto \rho\Big(\sum_n \lambda_n X_n\Big), \qquad \lambda_n \ge 0,\ \sum_n \lambda_n = 1$$

  is convex.
- Important: (strict) convexity ensures uniqueness of the minimum; it also greatly helps in building numerical algorithms

Optimization with ES/VaR

- Consider two bonds A and B; both have current price 104.6, face value 100 and coupon 8.
- Both can have a soft default, with probability 2%, or a hard default, with probability 3%. In a soft (resp. hard) default the coupon (resp. the coupon and the face value) is not paid back
- The probability that A and B default together is 0
- The following table shows the PL of bond A, of bond B and of the portfolio A+B

  event         prob.    PL_A      PL_B      PL_{A+B}
  no default    90%      +3.4      +3.4      +6.8
  soft for A    2%       −4.6      +3.4      −1.2
  hard for A    3%       −104.6    +3.4      −101.2
  soft for B    2%       +3.4      −4.6      −1.2
  hard for B    3%       +3.4      −104.6    −101.2

Optimization with ES/VaR

- Note that

  $$\mathrm{VaR}_{5\%}(PL_{A+B}) = 101.2 > 9.2 = \mathrm{VaR}_{5\%}(PL_A) + \mathrm{VaR}_{5\%}(PL_B)$$

  while

  $$\mathrm{ES}_{5\%}(PL_{A+B}) = 101.2 < 2 \cdot 64.6 = \mathrm{ES}_{5\%}(PL_A) + \mathrm{ES}_{5\%}(PL_B)$$

- For $\lambda \in [0, 1]$ consider the portfolio $\lambda A + (1 - \lambda)B$, with current value 104.6. Its PL is

  $$PL_\lambda = \lambda PL_A + (1 - \lambda)PL_B$$
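The figures 9.2, 64.6 and 101.2 follow directly from the five-event table. The sketch below computes VaR and ES of $PL_\lambda$ for any $\lambda$ from the discrete distribution; the helper names are assumptions of this example, and probabilities are kept as exact fractions so that cumulative sums compare cleanly with $\alpha$.

```python
from fractions import Fraction


def var_es(dist, alpha):
    """VaR and ES at level alpha for a finite distribution of (value, prob) pairs.
    ES = -(1/alpha) * integral of the quantile function over (0, alpha)."""
    cum, tail = Fraction(0), 0.0
    for x, p in sorted(dist):            # from the worst outcome upwards
        take = min(p, alpha - cum)       # mass of this outcome inside the alpha-tail
        tail += float(take) * x
        cum += p
        if cum >= alpha:
            return -x, -tail / float(alpha)
    raise ValueError("probabilities sum to less than alpha")


# (probability, PL of bond A, PL of bond B) for the five events of the table
EVENTS = [
    (Fraction(90, 100), 3.4, 3.4),       # no default
    (Fraction(2, 100), -4.6, 3.4),       # soft default of A
    (Fraction(3, 100), -104.6, 3.4),     # hard default of A
    (Fraction(2, 100), 3.4, -4.6),       # soft default of B
    (Fraction(3, 100), 3.4, -104.6),     # hard default of B
]


def pl_dist(lam):
    """Distribution of PL_lambda = lam * PL_A + (1 - lam) * PL_B."""
    return [(lam * a + (1 - lam) * b, p) for p, a, b in EVENTS]


alpha = Fraction(5, 100)
var_a, es_a = var_es(pl_dist(1.0), alpha)       # bond A alone: 4.6 and 64.6
var_mix, es_mix = var_es(pl_dist(0.5), alpha)   # half of A+B: 50.6 and 50.6
```

By positive homogeneity, doubling the $\lambda = 0.5$ figures gives the 101.2 quoted for A+B. The 50/50 mix has a higher VaR than either single bond (50.6 versus 4.6), which is the non-convexity of VaR, while its ES is lower (50.6 versus 64.6).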

Optimization with ES/VaR

Figure: $\mathrm{VaR}_{5\%}$ (black) and $\mathrm{ES}_{5\%}$ (red) of $PL_\lambda$ as a function of $\lambda \in [0, 1]$

Optimization with ES/VaR

- Consider $N$ bonds with face value 100 and current price 97. Bonds can default, independently of each other, each with probability $p = 3\%$. We invest 1 Euro, allocating $1/N$ to each bond.
- The PL of each of the $N$ partial investments is

  $$PL_i = \frac{1}{N}\left(\frac{100(1 - B_i)}{97} - 1\right) = \frac{3 - 100 B_i}{97N}, \qquad i = 1, \dots, N$$

  where $B_i = 1$ if bond $i$ defaults and 0 otherwise. Therefore, the total PL is

  $$PL_{tot,N} = \sum_{i=1}^{N} PL_i = \frac{3N - 100\sum_i B_i}{97N} = \frac{3}{97} - \frac{100X}{97N},$$

  where $X = \sum_i B_i \sim \mathrm{Bin}(N, p)$ counts the number of defaults in the portfolio
- We expect that the larger $N$ is, the lower $\mathrm{VaR}_{5\%}$ is (diversification effect). Instead...
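Since $X \sim \mathrm{Bin}(N, p)$, VaR and ES of $PL_{tot,N}$ can be computed exactly for each $N$. The following sketch does so for a few portfolio sizes; the function name and the chosen values of $N$ are assumptions of this example.

```python
from math import comb


def var_es_equal_weight(n_bonds, p=0.03, alpha=0.05):
    """VaR and ES at level alpha of PL = (3N - 100 X) / (97 N), X ~ Bin(N, p)."""
    cum, tail = 0.0, 0.0
    for k in range(n_bonds, -1, -1):     # most defaults (worst PL) first
        prob = comb(n_bonds, k) * p**k * (1 - p) ** (n_bonds - k)
        pl = (3 * n_bonds - 100 * k) / (97 * n_bonds)
        tail += min(prob, alpha - cum) * pl   # only the mass inside the alpha-tail
        cum += prob
        if cum >= alpha:
            return -pl, -tail / alpha
    raise ValueError("probabilities sum to less than alpha")


profile = {n: var_es_equal_weight(n) for n in (1, 2, 5, 10, 50)}
for n, (var, es) in profile.items():
    print(f"N={n:3d}  VaR5%={var:8.4f}  ES5%={es:8.4f}")
```

With a single bond the default probability (3%) is below 5%, so the 5% VaR does not see the default at all and equals −3/97. With two bonds the probability of at least one default is about 5.9%, and the VaR jumps to 94/194, roughly 0.48, while the ES goes down from about 0.59 to about 0.49.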

Optimization with ES/VaR

Figure: $\mathrm{VaR}_{5\%}(PL_N)$ (black) and $\mathrm{ES}_{5\%}(PL_N)$ (red) as a function of $N$ (from 0 to 50)

Optimization with ES

I Rockafellar and Uryasev (2000) showed that

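$$ES_\alpha(X) = \min_{\eta \in \mathbb{R}} \left\{ -\eta + \alpha^{-1} E[(\eta - X)^+] \right\}$$

and that the minimum is attained at η = qα(X) = −VaRα(X)

I This is a useful result for the minimization of ES

I If X(λ) is the PL depending on the parameters λ,

$$\min_{\lambda} ES_\alpha(X(\lambda)) = \min_{\eta \in \mathbb{R},\, \lambda} \left\{ -\eta + \alpha^{-1} E[(\eta - X(\lambda))^+] \right\}$$

I This formulation avoids sorting the sample inside the objective function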
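The Rockafellar-Uryasev formula can be checked on a simulated sample. This is a minimal sketch, assuming the empirical distribution of an IID sample is used in place of the law of X; the function names are illustrative. For an empirical distribution the objective is convex and piecewise linear in η, so it is enough to evaluate it at the sample points.

```python
import random
from math import floor

def es_sorted(x, alpha):
    """Empirical ES computed by sorting: average of the worst alpha-fraction of the sample."""
    xs, n = sorted(x), len(x)
    m = floor(n * alpha)
    tail = sum(xs[:m])
    if m < n:
        tail += (n * alpha - m) * xs[m]          # fractional weight on the next order statistic
    return -tail / (n * alpha)

def ru_objective(x, alpha, eta):
    """Rockafellar-Uryasev objective: -eta + E[(eta - X)^+] / alpha under the empirical law."""
    return -eta + sum(max(eta - xi, 0.0) for xi in x) / (alpha * len(x))

def es_ru(x, alpha):
    """Empirical ES obtained by minimizing the objective over eta (kinks are at sample points)."""
    return min(ru_objective(x, alpha, eta) for eta in x)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]
print(es_sorted(sample, 0.05), es_ru(sample, 0.05))
```

In a portfolio optimization the brute-force search over η would be replaced by a joint minimization over (η, λ), typically a linear program; that step is not shown here.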


Coherent risk measures

I A seminal paper by Artzner, Delbaen, Eber and Heath (Math. Finance, 1999) introduced the following key definition.

I A risk measure ρ : L → R is coherent if it satisfies

I (TE) Translation Equivariance (originally, t. invariance)

ρ(X + b) = ρ(X )− b ∀b ∈ R

I (PH) Positive Homogeneity

ρ(aX ) = aρ(X ) ∀a > 0

I (M) Monotonicity

X ≥ Y =⇒ ρ(X) ≤ ρ(Y)

I (Sub) Sub-additivity

ρ(X + Y) ≤ ρ(X) + ρ(Y)


Coherent risk measures

I ADEH is basically the first theoretical paper dealing with general risk measures. A huge stream of literature and a continuous debate followed.

I Three main reasons for this:

I Clear financial motivation for all 4 properties (but some disagree)

I Several connections with diverse areas in Analysis and Probability and many different examples of coherence

I But, above all: ES is coherent, VaR is not

I We have already seen that VaR and ES both satisfy TE and PH; they also both satisfy M.

I However, only ES also satisfies Sub, so only ES is coherent.


Translation equivariance

I Risk measures satisfying TE are also called monetary risk measures or capital requirements. The reason lies in the following results

I If ρ satisfies TE and A = {X : ρ(X) ≤ 0}, then

ρ(X) = min{b : X + b ∈ A} (2)

Indeed

min{b : X + b ∈ A} = min{b : ρ(X + b) ≤ 0} = min{b : ρ(X) ≤ b} = ρ(X)

I Conversely, any ρ of the form (2) satisfies TE when finite.

I The set A is called the acceptance set and ρ(X) is the amount of money that has to be added to X to make it acceptable.


Translation equivariance

I The acceptance set for VaRα is

A = {X : VaRα(X) ≤ 0} = {X : qα(X) ≥ 0}

I Therefore an (a.c.) PL is acceptable if and only if

P(PL ≤ 0) = P(VT ≤ V0) ≤ α

I If α = 1% or 5%, a PL is very seldom acceptable with this criterion, but after injecting a suitable amount it will fall in A, that is

VaRα(PL) = min{b : P(VT + b ≤ V0) ≤ α}

I Similar arguments for ES


Positive homogeneity

I PH says that when we change currency (or, more generally, numeraire), the risk changes accordingly.

I PH has been criticized: for instance, it would imply that

ρ(100000 · X) = 100000 · ρ(X),

whereas in illiquid markets the risk of 100000 shares may be higher than 100000 times the risk of a single share.

I This critique seems wrong: in an illiquid market, the link between the number of assets and the PL is non-linear (so the PL of 100000 shares is not necessarily 100000 · X).

I Recall: TE and PH together imply the very useful identity

ρ(X) = −E[X] + σ(X) ρ(X̃), where X̃ = (X − E[X])/σ(X) is the standardized variable.


Monotonicity

I Property M is almost trivial and, actually, rather weak

I X ≥ Y is a very strong requirement and implies (1st order) stochastic dominance of X over Y:

FX(x) = P(X ≤ x) ≤ P(Y ≤ x) = FY(x) for all x

I It readily follows that qu(X) ≥ qu(Y) for all u, so that

X ≥ Y =⇒ VaRα(X) ≤ VaRα(Y)

and VaR satisfies M.

I Integrating in u we see that ES too satisfies M


Standard deviation

I In Markowitz Portfolio Theory standard deviation is used:

ρ(X ) = σ(X )

I σ satisfies PH and Sub, but neither TE nor M. Instead, it is an example of a dispersion measure, as:

I σ(−X ) = σ(X )

I σ(X + b) = σ(X )

I We can then consider

ρ(X ) = −E [X ] + a · σ(X ) (a > 0)

I it is still PH and Sub (note: E is linear)

I it becomes TE (note: −E [X + b] = −E [X ]− b)

I however, it is not M (check with a r.v. with 2 values)
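The failure of monotonicity can be seen with a two-valued random variable, as suggested above. The following sketch uses a hypothetical example (X = 1 with probability 0.1 and X = 0 otherwise, compared with Y = 0, and a = 1); the function name `mean_std_risk` is illustrative.

```python
from math import sqrt

def mean_std_risk(values, probs, a):
    """rho(X) = -E[X] + a * sigma(X) for a discrete random variable."""
    mean = sum(v * p for v, p in zip(values, probs))
    variance = sum(p * (v - mean) ** 2 for v, p in zip(values, probs))
    return -mean + a * sqrt(variance)

# X >= Y everywhere: X pays 0 or 1, Y pays 0 for sure
rho_x = mean_std_risk([0.0, 1.0], [0.9, 0.1], a=1.0)
rho_y = mean_std_risk([0.0], [1.0], a=1.0)
print(rho_x, rho_y)   # X dominates Y, yet it is assigned the larger risk
```

Here E[X] = 0.1 and σ(X) = 0.3, so ρ(X) is about 0.2 while ρ(Y) = 0: the dispersion penalty outweighs the gain in the mean even though X is never worse than Y.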


Standard semi-deviation

I A variant of σ is the (lower) standard semi-deviation

$$\rho(X) = \sigma_-(X) = \sqrt{E\left[(X - E[X])_-^2\right]},$$

where x_− = −min(x, 0). It is PH and Sub.

I Plainly

ρ(X) = −E[X] + a · σ−(X) (a ≥ 0)

is TE, PH and Sub. It can be proved that it is also M, hence coherent, if and only if a ∈ [0, 1]

I More generally, Fisher (2003) proved that

$$\rho(X) = -E[X] + a \cdot E\left[(X - E[X])_-^p\right]^{1/p}$$

is coherent provided p ≥ 2 and a ∈ [0, 1]. These are called deviation-based risk measures.


Spectral risk measures

I We can write

$$ES_\alpha(X) = \frac{1}{\alpha}\int_0^\alpha VaR_u(X)\,du = \int_0^1 VaR_u(X)\,\psi(u)\,du,$$

where

$$\psi(u) = \alpha^{-1} I_{\{0 \le u \le \alpha\}}.$$

Notice that ψ is a density over [0, 1]

I We can then generalize and consider a risk measure of the form

$$\rho_\psi(X) = \int_0^1 VaR_u(X)\,\psi(u)\,du,$$

where ψ is a generic density over [0, 1], i.e. ψ ≥ 0 and ∫₀¹ ψ = 1.

I These measures are weighted averages of VaR at different levels


Spectral risk measures

I We observe that (a > 0)

$$\begin{aligned}
\rho_\psi(aX + b) &= \int_0^1 VaR_u(aX + b)\,\psi(u)\,du \\
&= a\int_0^1 VaR_u(X)\,\psi(u)\,du - b\int_0^1 \psi(u)\,du \\
&= a\,\rho_\psi(X) - b
\end{aligned}$$

I Moreover, if X ≥ Y, then VaRu(X) ≤ VaRu(Y) for any u and therefore ρψ(X) ≤ ρψ(Y)

I Concerning (Sub), we have the following result: ρψ is sub-additive if and only if ψ is decreasing (in the weak sense).

I When ψ is decreasing, ρψ is called a spectral risk measure (with spectrum ψ). This class of coherent risk measures was introduced by Acerbi (2002).


Spectral risk measures

I ψ(u) = α⁻¹ I_{0≤u≤α} is decreasing and therefore ESα is coherent

I Reasoning in a heuristic (but basically correct) way, we observe that VaRα corresponds to

$$VaR_\alpha(X) = \int_0^1 VaR_u(X)\,\delta_\alpha(u)\,du,$$

where δα is a point mass at α, which is not a decreasing density. In fact VaRα is not coherent


Concave distortions

I We observe that ρ(X) = −E[X] is trivially coherent. A basic identity (F = FX) is

$$E[X] = -\int_{-\infty}^0 F(x)\,dx + \int_0^{+\infty} (1 - F(x))\,dx$$

I Consider a function h : [0, 1] → [0, 1] such that h(0) = 0 and h(1) = 1 and define

$$E_h[X] = -\int_{-\infty}^0 h(F(x))\,dx + \int_0^{+\infty} (1 - h(F(x)))\,dx$$

I This is sometimes called a Choquet integral or a distorted expectation (note: the probabilities F are distorted by h). The risk measure ρh(X) = −Eh[X] satisfies TE, PH, M

I Remarkably: ρh satisfies (Sub), i.e. it is coherent, if and only if h is convex

I Usually, Eh[X] is expressed in terms of the dual function g(u) = 1 − h(1 − u). Plainly, h is convex iff g is concave.


Generalized quantiles

I It is a known fact in Statistics that

$$q_\alpha(X) = \arg\min_q \left\{ \alpha E[(X - q)^+] + (1 - \alpha) E[(X - q)^-] \right\}$$

I We can then consider generalized quantiles by defining

$$q_{\alpha,\Phi}(X) = \arg\min_q \left\{ \alpha E[\Phi((X - q)^+)] + (1 - \alpha) E[\Phi((X - q)^-)] \right\}$$

for some function Φ : R+ → R+

I For instance, if Φ(x) = x², we obtain the so-called α-expectiles

$$e_\alpha = \arg\min_q \left\{ \alpha E[((X - q)^+)^2] + (1 - \alpha) E[((X - q)^-)^2] \right\}$$

I It can be proved that if Φ is strictly increasing and strictly convex, then qα,Φ is uniquely defined.

I If in addition Φ is differentiable and X is continuous, then q = qα,Φ is the unique solution of

$$\alpha E[\Phi'((X - q)^+)] = (1 - \alpha) E[\Phi'((X - q)^-)]$$
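For Φ(x) = x² the first-order condition reads αE[(X − q)+] = (1 − α)E[(X − q)−], which can be solved by bisection. The sketch below applies it to the empirical distribution of a small sample (an assumption made here for illustration; the slide states the condition for continuous X), and the function name `expectile` is hypothetical.

```python
def expectile(x, alpha, tol=1e-12):
    """alpha-expectile of the empirical distribution of x, found by bisection."""
    def g(q):
        # alpha * E[(X - q)^+] - (1 - alpha) * E[(X - q)^-], up to the common factor 1/N;
        # g is decreasing in q and vanishes at the expectile
        up = sum(max(xi - q, 0.0) for xi in x)
        down = sum(max(q - xi, 0.0) for xi in x)
        return alpha * up - (1 - alpha) * down

    lo, hi = min(x), max(x)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [-3.0, 0.0, 1.0, 1.5, 3.0]
for a in (0.1, 0.5, 0.9):
    print(a, expectile(sample, a))
```

For α = 0.5 the condition reduces to E[X − q] = 0, so the expectile coincides with the mean (0.5 for this sample).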

Generalized quantiles

I For given α and Φ (assume w.l.o.g. Φ(0) = 0, Φ(1) = 1) we can put

ρ(X ) = −qα,Φ(X )

and wonder whether ρ is coherent.

I A bit surprisingly:

I ρ is coherent if and only if Φ(x) = x2, i.e. qα,Φ = eα, and α > 0.5

I We can generalize further and consider risk measures in the form

ρ(X ) = −argminy E [Ψ(X , y)],

where Ψ = Ψ(x, y) is some convex function (with reasonable properties ensuring ρ is well-defined)

I Such risk measures are called elicitable and backtesting them seems to be ”easier”. Remarkably, VaR is elicitable, ES is not


Two remarks

I We have seen different classes of coherent r.m.: deviation-based, spectral, concave distortions, generalized quantiles.

I There is some overlap:

I ρ(X) = −E[X] is deviation-based (a = 0), spectral (ψ(u) = 1 for all u) and a concave distortion (h(x) = x)

I ES and all other spectral risk measures can be represented as concave distortions (actually the two classes nearly coincide)

I Note: VaR is defined for all distributions, while all the coherent r.m. we have presented are not (they usually require X ∈ L1 at least).

I Does there exist a coherent r.m. defined for all distributions? Remarkably: No (Delbaen, 2000)


Combinations preserving coherence

I If ρ1 and ρ2 are coherent risk measures, then

ρ(X) = ρ1(X) + ρ2(X)

is not coherent: TE is not satisfied

I However, it is easy to show that the following are coherent:

maximum

ρ(X) = max{ρ1(X), ρ2(X)}

convex combination

ρ(X) = λρ1(X) + (1 − λ)ρ2(X), λ ∈ (0, 1)

inf-convolution

ρ(X) = inf{ρ1(X1) + ρ2(X2) : X1 + X2 = X}

I The solution of the inf-convolution problem may be interpreted as an optimal sharing of the risk X between two agents.


Combinations preserving coherence

I Maximum and convex combinations can be generalized considerably

I Let (ρa)a∈A be an arbitrary family of coherent risk measures; then

$$\rho(X) = \sup_{a \in A} \rho_a(X)$$

is coherent (provided it is finite)

I If (A, 𝒜) is a measurable space and µ is a probability measure on it, then

$$\rho(X) = \int_A \rho_a(X)\,\mu(da)$$

is coherent. In particular, if (ρa)a∈(0,1) is a family of coherent risk measures and ψ is a density on [0, 1], then

$$\rho(X) = \int_0^1 \rho_a(X)\,\psi(a)\,da$$

is coherent.


Convex combinations of ES

I We know that (ESα)α∈(0,1) is a family of coherent risk measures.

I Therefore, if ξ is a density on [0, 1]

$$\rho(X) = \int_0^1 ES_a(X)\,\xi(a)\,da$$

is a coherent risk measure.

I We also have (V(u) = VaRu(X))

$$\begin{aligned}
\rho(X) &= \int_0^1 \frac{1}{a}\int_0^a V(u)\,du\;\xi(a)\,da = \int_0^1\!\!\int_0^1 \frac{V(u)}{a}\, I_{\{u \le a\}}\,\xi(a)\,du\,da \\
&= \int_0^1\!\!\int_0^1 \frac{1}{a}\, V(u)\, I_{\{a \ge u\}}\,\xi(a)\,da\,du \quad \text{(Fubini)} \\
&= \int_0^1 V(u) \int_u^1 \frac{\xi(a)}{a}\,da\,du = \int_0^1 V(u)\,\psi(u)\,du
\end{aligned}$$

An easy check shows that ψ(u) = ∫_u^1 a⁻¹ ξ(a) da is a decreasing density on [0, 1]. So, ρ is a spectral risk measure.
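As an illustration of the "easy check", take the uniform mixing density ξ ≡ 1 on [0, 1] (a choice made here only as an example). The resulting spectrum is computed explicitly, and the general normalization follows from exchanging the order of integration once more:

```latex
\[
\xi \equiv 1
\;\Longrightarrow\;
\psi(u) = \int_u^1 \frac{da}{a} = -\ln u ,
\qquad
\psi \ge 0, \quad \psi \text{ decreasing}, \quad
\int_0^1 (-\ln u)\,du = \bigl[\,u - u\ln u\,\bigr]_0^1 = 1 .
\]
\[
\text{In general:}\qquad
\int_0^1 \psi(u)\,du
= \int_0^1 \frac{\xi(a)}{a}\left(\int_0^a du\right) da
= \int_0^1 \xi(a)\,da = 1 ,
\]
\[
\text{and for } u_1 < u_2:\qquad
\psi(u_1) - \psi(u_2) = \int_{u_1}^{u_2} \frac{\xi(a)}{a}\,da \;\ge\; 0 .
\]
```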


Convex combinations of ES

I More generally, consider a risk measure of the form

$$\rho(X) = \int_0^1 ES_a(X)\,\mu(da), \qquad (3)$$

where µ is a probability on [0, 1]

I Reasoning as before, we can prove that ρ can be written in the form

$$\rho(X) = \int_0^1 VaR_u(X)\,\nu(du), \qquad (4)$$

where ν is decreasing, i.e. ν([a, b]) ≥ ν([a + ε, b + ε]) for ε > 0

I Risk measures of the form (4) with ν decreasing are coherent and can be seen as generalized spectral risk measures.

I Indeed, more is true...


Kusuoka representation

I The following result is known as the Kusuoka representation (Kusuoka, 2001)

I Let ρ be a risk measure on L1 satisfying a (mild) continuity condition. Then the following are equivalent:

1. ρ is coherent and comonotonic additive, i.e.

ρ(X + Y ) = ρ(X ) + ρ(Y )

for any X ,Y comonotonic.

2. ρ is a convex combination of ES as in (3)

3. ρ is a (coherent) generalized spectral r.m. as in (4)

I So, generalized spectral r.m. are, in a sense, a large class of coherent risk measures.


Dual representation

I The Kusuoka representation exploits convex combinations and uses ES as a building block

I Dropping comonotonic additivity, there is an even more general representation: any coherent and law-invariant risk measure can be written as a supremum of a family of generalized spectral risk measures

I The dual representation exploits maxima of risk measures and uses expected values as building blocks. It holds even for non-law-invariant risk measures.


Dual representation

I Consider a finite space Ω = (ω1, . . . , ωN)′, so that a random variable X is represented by the vector

x = (x1, . . . , xN), xn = X(ωn)

I A probability measure P on Ω is represented by

p = (p1, . . . , pN)′, pn = P(ωn) ≥ 0

I Let

$$\mathcal{P} = \left\{ p \in [0, 1]^N : \sum_n p_n = 1 \right\}$$

be the set of all probabilities.

I If P ∈ 𝒫, then the expectation of X under P is

$$E_P[X] = \sum_n p_n x_n = p'x$$


Dual representation

I Let 𝒬 ⊆ 𝒫 be a set of probabilities and define

$$\rho_{\mathcal{Q}}(X) = \sup\left\{ -E_Q[X] : Q \in \mathcal{Q} \right\}$$

I As each ρ(X) = −E_Q[X] is coherent (immediate), ρ_𝒬 is coherent as well.

I Remarkably, the converse holds as well. This is the dual representation:

I Any coherent risk measure can be written in the form ρ_𝒬 for some set 𝒬

I This result also holds in general probability spaces, provided ρ satisfies a mild continuity condition.
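On a finite space the construction is easy to experiment with. The following is a minimal sketch with a hypothetical three-state space and an arbitrary finite set of scenarios Q (both chosen here only for illustration); it evaluates ρ as the worst expected loss over Q and spot-checks sub-additivity, TE and PH numerically.

```python
def rho_q(x, scenarios):
    """rho_Q(X) = max over Q in scenarios of -E_Q[X], on a finite sample space."""
    return max(-sum(q_n * x_n for q_n, x_n in zip(q, x)) for q in scenarios)

# Three probability vectors on a three-point space (illustrative values)
scenarios = [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5), (1 / 3, 1 / 3, 1 / 3)]

x = (-2.0, 1.0, 4.0)
y = (3.0, 0.0, -1.0)
x_plus_y = tuple(a + b for a, b in zip(x, y))

print(rho_q(x, scenarios), rho_q(y, scenarios), rho_q(x_plus_y, scenarios))
```

Each scenario contributes a linear functional, and taking the maximum is what produces sub-additivity: the worst scenario for X + Y cannot be worse than the worst scenarios for X and Y taken separately.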


Dual representation

I Here is a sketch of the proof

I The risk measure ρ may be seen as a map ρ : RN → R

I Since it is coherent, it is a convex and PH map, so that, according to a classical result in Convex Analysis, we may write

$$\rho(x) = \sup_{v \in V} \; -v'x,$$

for a subset V ⊆ R^N. In other words, any convex and PH function is the pointwise supremum of a family of linear functionals.

I Using standard algebraic arguments, we can prove that

$$M \implies V \subset \{ v \in \mathbb{R}^N : v_n \ge 0 \ \forall n \}$$

$$TE \implies V \subset \Big\{ v \in \mathbb{R}^N : \sum_n v_n = 1 \Big\},$$

so that V ⊂ 𝒫 and, since v′x = E_Q[X] for some Q, we conclude.


The process of risk measurement

I Recall (some notation is changed below):

I A risk measure ρ is fixed, by a regulator or internally. Popular choices are VaR5%, VaR0.1%, ES1%.

I A time horizon ∆t is fixed, again by a regulator or internally. Popular choices are 1 day or 2 weeks (market risk), 1 year (credit and operational risk).

I The portfolio of interest is singled out. For instance, the focus is on a derivatives trader, on the Euro bond unit, or on the overall portfolio of the institution.

I The two regulatory frameworks are Basel III (banks) and Solvency II (insurance companies)


The process of risk measurement

I At date t (today):

I We assess the current portfolio ”value” Vt

I We propose a (multivariate) distribution for

∆Yt+T = Yt+T − Yt

The distribution can be conditional on It (information set at time t) or unconditional.

I We compute/estimate the distribution of PLt,t+T = pl(∆Yt+T) and

ρt = ρ(PLt,t+T | It) (conditional)

ρt = ρ(PLt,t+T) (unconditional)

I Afterwards:

I After some runs of the steps above, we can check whether ourmethodology is sufficiently correct (this is called back-testing)


Conditional and unconditional risk measurement

I Let I = It and X = ∆Y (i.e. d = 1 for simplicity)

I When measuring market risk, the horizon is short (2 weeks or less). So

I Recent information matters for future events: I must be exploited

I The conditional distribution FX|I(x) = P(X ≤ x | I) differs from the unconditional one FX(x) = P(X ≤ x). Example: a GARCH process is conditionally normal, but unconditionally thick-tailed

I The risk measurement is conditional:

ρ(X | I) = ρ(FX|I)

I When measuring credit, operational or insurance risk, the horizon is long (1 year). So

I Recent information matters less and we may assume FX|I ≡ FX (e.g. X part of an IID sequence)

I The risk measurement is unconditional, i.e. we are interested in

ρ(X ) = ρ(FX )


The parametric approach

I Within the (plain) parametric approach we:

1. Assume that the (multivariate, conditional or unconditional) distribution of X = ∆Y is in the parametric family

FX(x) = F(x | θ), x = (x1, . . . , xd)′

Here, θ = (θ1, . . . , θK)′ is the parameter vector.

2. Estimate the parameters from data:

θ̂ = θ̂(x1, . . . , xN)

Here, xn is the n-th historical observation for X.

3. Analytically compute the distribution of PL = pl(X) (it depends on the estimate θ̂)

4. Compute ρ(PL) (or ρ(PL | It))


The parametric approach - basic example

I Equity portfolio: vi is invested (at time t) in asset i (i ≤ d).

I If Ri is the return for asset i (from t to t + T), then

$$PL = \sum_i v_i R_i = v'R$$

with v = (v1, . . . , vd)′, R = (R1, . . . , Rd)′. Note: pl(x) = v′x is linear

I A common (rough) assumption is:

R | It ∼ Nd(µ, Σ)

where µ and Σ are conditional on It (e.g. a multivariate ARMA-GARCH). Note: θ = (µ, Σ)

I Under this assumption

$$PL \mid I_t \sim N_1(\mu_{PL}, \sigma^2_{PL}), \qquad \mu_{PL} = v'\mu, \quad \sigma^2_{PL} = v'\Sigma v$$

and risk measures are easily computed.
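Under the normality assumption the computation is explicit. The sketch below assumes the standard closed forms for a normal PL, VaRα = −(µPL + σPL zα) and ESα = −µPL + σPL ϕ(zα)/α with zα the α-quantile of N(0, 1); the function name and the two-asset figures are illustrative and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def normal_var_es(v, mu, sigma, alpha):
    """VaR and ES of PL = v'R when R ~ N(mu, sigma), with sigma the covariance matrix."""
    d = len(v)
    mu_pl = sum(v[i] * mu[i] for i in range(d))                       # v' mu
    var_pl = sum(v[i] * sigma[i][j] * v[j] for i in range(d) for j in range(d))  # v' Sigma v
    s_pl = sqrt(var_pl)
    z = NormalDist().inv_cdf(alpha)              # alpha-quantile of N(0, 1), negative for small alpha
    value_at_risk = -(mu_pl + s_pl * z)
    expected_shortfall = -mu_pl + s_pl * NormalDist().pdf(z) / alpha
    return value_at_risk, expected_shortfall

# Hypothetical two-asset portfolio: amounts invested, expected returns, covariance of returns
v = (100.0, 50.0)
mu = (0.001, 0.002)
sigma = ((0.0004, 0.0001), (0.0001, 0.0009))
print(normal_var_es(v, mu, sigma, alpha=0.05))
```

For a standard normal PL (one asset, v = 1, µ = 0, Σ = 1) the function returns approximately 1.645 for VaR5% and 2.063 for ES5%.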


The Monte Carlo approach

I Within the (plain) Monte Carlo approach we:

1. As before, assume

FX(x) = F(x | θ)

for some parameter vector θ.

2. As before, estimate the parameters θ from observed data for X, obtaining θ̂.

3. Simulate a large IID sample w1, . . . , wM with common distribution F(· | θ̂)

4. Obtain the corresponding IID sample for the PL

PL1 = pl(w1), . . . , PLM = pl(wM)

5. Compute ρ(PL) using the obtained empirical distribution


The historical approach

I Within the (plain) historical approach we:

1. Using the observed series x1, . . . , xN, obtain the corresponding series for the PL

PL1 = pl(x1), . . . ,PLN = pl(xN )

2. Compute ρ(PL) using the obtained empirical distribution


An IID framework

I Consider the simplest framework in which PL = X, i.e. d = 1 and pl is the identity.

I Also, assume that an IID sequence X1, X2, . . . with the same distribution as X is available

I For instance, this can be the case when

I we are dealing with operational or insurance losses

I we are using a MC step to evaluate PL (then we get an IID sequence for PL)

I In order to estimate ρ(PL), we can proceed non-parametrically, i.e. without postulating any parametric family for X

I Instead, we work with the empirical distribution and the associated plug-in estimators


Empirical distribution

I Given a data set x ∈ R^N, the empirical distribution function is defined as

$$F_x(y) = \frac{1}{N}\sum_{n=1}^{N} I_{\{x_n \le y\}}$$

Figure: Empirical distribution function for x = (−3, 0, 1, 1.5, 3) (obviously, the order of the observations does not matter)
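The empirical distribution function of the figure can be reproduced in a few lines. This is a minimal sketch; the helper name `ecdf` is illustrative.

```python
def ecdf(x):
    """Return the empirical distribution function y -> (1/N) * #{n : x_n <= y}."""
    n = len(x)
    return lambda y: sum(1 for xn in x if xn <= y) / n

F = ecdf([-3, 0, 1, 1.5, 3])
for y in (-3.5, -3, 0, 1.2, 3):
    print(y, F(y))
```

The function is a right-continuous step function that jumps by 1/N at each observation, and it is unchanged if the data are permuted.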


Glivenko-Cantelli Theorem - 1

I Let (Xn)n be an IID sequence with common distribution F. For a given y ∈ R consider the random sequence

FN(y) = Fx(y), x = (X1, . . . , XN)

I We expect FN(y) to approach F(y) in some sense (in probability, a.s.). Much (much) more is true

I Glivenko-Cantelli Theorem. It holds that

$$\sup_{x \in \mathbb{R}} \left| F_N(x) - F(x) \right| \xrightarrow{a.s.} 0, \qquad N \to \infty$$

I So, the convergence is strong (a.s.) and uniform across x, and FN(x) is a consistent estimator of F(x).


Glivenko-Cantelli Theorem - 2

I The probability of large errors is also uniformly bounded through the DKW inequality

$$P\left( \sup_{x \in \mathbb{R}} \left| F_N(x) - F(x) \right| > \varepsilon \right) \le 2e^{-2N\varepsilon^2}$$

Note that the upper bound decays exponentially with N

I As N · FN(x) has (clearly) a binomial distribution Bin(N, F(x)), asymptotic normality is easily established and we get

$$\sqrt{N}\left( F_N(x) - F(x) \right) \xrightarrow{Law} N(0, \sigma^2_\infty)$$

where σ²∞ = F(x) − F(x)² = F(x)(1 − F(x))

I This last result can be generalized considerably through the Donsker Theorem: basically, the sequence of processes VN = (√N (FN(t) − F(t)))t∈R converges in law to a Gaussian process with mean 0 and a simple covariance structure that can be written explicitly in terms of F.
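Returning to the DKW inequality, the bound can be inverted to obtain a sample size: requiring 2e^{−2Nε²} ≤ δ gives N ≥ ln(2/δ)/(2ε²). The sketch below implements this elementary rearrangement; the function names and the values of ε and δ are illustrative.

```python
from math import ceil, exp, log

def dkw_bound(n, eps):
    """DKW upper bound on P(sup_x |F_N(x) - F(x)| > eps), capped at 1."""
    return min(1.0, 2.0 * exp(-2.0 * n * eps**2))

def dkw_sample_size(eps, delta):
    """Smallest N for which the DKW bound is at most delta."""
    return ceil(log(2.0 / delta) / (2.0 * eps**2))

n_needed = dkw_sample_size(eps=0.01, delta=0.05)
print(n_needed, dkw_bound(n_needed, 0.01))
```

Halving ε multiplies the required sample size by four, which reflects the usual 1/√N rate of the empirical distribution function.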


Plug-in estimators

I Consider a (law-invariant) risk measure ρ.

I The plug-in estimator (or sample or empirical estimator) for ρ is

ρ̂(x) = ρ(Fx)

that is, we just plug the empirical distribution into ρ.

I We have seen that FN converges in a very strong sense to F and is asymptotically normal.

I If ρ displays some continuity or even differentiability properties, then ρ̂N = ρ(FN) is a good estimator for ρ.


Plug-in estimators for risk measures - 1

I Recall that

qα(Fx) = x_{k:N}

where x_{k:N} is the k-th smallest element in x, whenever

$$\alpha \in \left( \frac{k-1}{N}, \frac{k}{N} \right] \quad \text{or, equivalently,} \quad N\alpha \in (k-1, k]$$

I As a consequence, if ρ(X) = VaRα(X), then the plug-in estimator is

$$\hat\rho(x) = -x_{k:N}, \qquad k = \lceil N\alpha \rceil$$

I If ρ(X) = −∫₀¹ qX(u)ψ(u) du, we easily compute

$$\hat\rho(x) = -\sum_{k=1}^{N} c_{k,N}\, x_{k:N},$$

where

$$c_{k,N} = \int_{\frac{k-1}{N}}^{\frac{k}{N}} \psi(u)\,du$$


Plug-in estimators for risk measures - 2

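I In particular, for ρ = ESα, ψ(u) = α⁻¹ I_{u≤α} and we immediately compute

$$c_{k,N} = \begin{cases}
\dfrac{1}{N\alpha} & k \le \lfloor N\alpha \rfloor \\[2mm]
1 - \dfrac{\lfloor N\alpha \rfloor}{N\alpha} & k = \lfloor N\alpha \rfloor + 1 \\[2mm]
0 & k \ge \lfloor N\alpha \rfloor + 2
\end{cases}$$

so that

$$\hat\rho(x) = -\frac{1}{N\alpha}\sum_{k=1}^{\lfloor N\alpha \rfloor} x_{k:N} - \left( 1 - \frac{\lfloor N\alpha \rfloor}{N\alpha} \right) x_{\lfloor N\alpha \rfloor + 1:N}$$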
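The two plug-in estimators translate directly into code. The following is a minimal sketch of the formulas above (VaR as minus the ⌈Nα⌉-th order statistic, ES with the weights c_{k,N}); the function names are illustrative, and no safeguard is included for the case where Nα is an integer only up to floating-point rounding.

```python
from math import ceil, floor

def plug_in_var(x, alpha):
    """Plug-in VaR: minus the k-th smallest observation, with k = ceil(N * alpha)."""
    xs = sorted(x)
    k = ceil(len(xs) * alpha)
    return -xs[k - 1]

def plug_in_es(x, alpha):
    """Plug-in ES: weighted sum of the lowest order statistics with the weights c_{k,N}."""
    xs, n = sorted(x), len(x)
    m = floor(n * alpha)
    total = sum(xs[:m]) / (n * alpha)            # weight 1/(N alpha) on the m lowest values
    if m < n:
        total += (1 - m / (n * alpha)) * xs[m]   # residual weight on the (m+1)-th value
    return -total

data = [-3, 0, 1, 1.5, 3]
print(plug_in_var(data, 0.3), plug_in_es(data, 0.3))
```

With N = 5 and α = 0.3 we have Nα = 1.5: VaR picks the second smallest observation, while ES puts weight 2/3 on the smallest observation and 1/3 on the second.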


Order statistics - definition

I If x = (x1, . . . , xN) is a vector and 1 ≤ k ≤ N, we indicate with x_{k:N} the k-th smallest element. In other words

x_{1:N} ≤ x_{2:N} ≤ . . . ≤ x_{N:N}

and {x_k}_k = {x_{k:N}}_k.

I Note: x_{1:N} = x_{k(1)} for some k(1), x_{2:N} = x_{k(2)} for some k(2) ≠ k(1), and so on

I If X = (X1, . . . , XN) is a random vector, then the random vector (X_{k:N})_k is called the order statistics for X. Notice that a.s.

X_{1:N} ≤ X_{2:N} ≤ . . . ≤ X_{N:N}

I Note: X_{1:N} = X_{k(1)} for some k(1), but k(1) depends on ω (i.e. it is itself random)


Order statistics - finite sample distribution

I Assume X has IID components with common a.c. distribution F

I The distribution of X_{N:N} = max_k X_k is easy to derive

$$F_{N:N}(x) = P(X_k \le x,\ \forall k) = \prod_{k=1}^{N} P(X_k \le x) = F(x)^N$$

Concerning the distribution of X_{1:N} = min_k X_k

$$F_{1:N}(x) = 1 - P(X_k > x,\ \forall k) = 1 - (1 - F(x))^N$$

I For a general k,

$$\begin{aligned}
F_{k:N}(x) &= P(X_{k:N} \le x) = P(\text{at least } k \text{ elements are } \le x) \\
&= \sum_{i=k}^{N} P(\text{exactly } i \text{ elements are } \le x) \\
&= \sum_{i=k}^{N} P(X_1, \dots, X_i \le x) \cdot P(X_{i+1}, \dots, X_N > x) \cdot (\#\ \text{choices of the } i \text{ elements}) \\
&= \sum_{i=k}^{N} \binom{N}{i} F(x)^i (1 - F(x))^{N-i}
\end{aligned}$$
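The general formula can be checked against the two special cases derived first. This is a small sketch that takes the value F = F(x) as input; the function name is illustrative.

```python
from math import comb

def order_stat_cdf(k, n, F):
    """P(X_{k:N} <= x) for an IID sample of size n, where F = F(x) is the common cdf at x."""
    return sum(comb(n, i) * F**i * (1 - F) ** (n - i) for i in range(k, n + 1))

n, F = 7, 0.3
print(order_stat_cdf(n, n, F), F**n)               # maximum: F(x)^N
print(order_stat_cdf(1, n, F), 1 - (1 - F) ** n)   # minimum: 1 - (1 - F(x))^N
```

For fixed x the value decreases in k, since a higher order statistic is less likely to lie below x.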


Properties of the plug-in estimator for VaR

I Let (Xn)n be IID with common distribution F. Assume F is a.c. with positive density f.

I From Glivenko-Cantelli it follows that

$$\hat q_{\alpha,N} = q_\alpha(F_X) = X_{\lceil N\alpha \rceil:N} \xrightarrow{a.s.} q_\alpha(F), \qquad N \to \infty,$$

meaning that the plug-in estimator for the quantile, and therefore for VaR, is strongly consistent.

I More generally, it can be proved that

$$X_{k_N:N} \xrightarrow{a.s.} q_\alpha(F), \qquad N \to \infty$$

provided k_N/N → α. An example is k_N = ⌊Nα⌋

I Asymptotic normality also holds (N → ∞)

$$\sqrt{N}\left( \hat q_{\alpha,N} - q_\alpha(F) \right) \xrightarrow{law} N(0, \sigma^2_\infty), \qquad \sigma_\infty = \frac{\sqrt{\alpha(1-\alpha)}}{f(q_\alpha(F))}$$


Properties of the plug-in estimator for ES and ρψ - 1

I The key result (corollary of a Van Zwet Theorem) is

I Let ψ be piecewise continuous and satisfy a mild technical condition and let

$$c_{k,N} = \int_{\frac{k-1}{N}}^{\frac{k}{N}} \psi(u)\,du$$

If (Xn)n are IID with common distribution F, then

$$\sum_{k=1}^{N} c_{k,N}\, X_{k:N} \xrightarrow{a.s.} \int_0^1 q(u)\,\psi(u)\,du, \qquad N \to \infty$$

I The spectrum of ES satisfies these conditions, so that its plug-in estimator is strongly consistent.


Properties of the plug-in estimator for ES and ρψ - 2

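I Asymptotic normality of q̂α can be generalized:

I Let α1 < . . . < αD and let q̂_{d,N} be the plug-in estimator for q_d = q_F(α_d). Then

$$\sqrt{N}\left( (\hat q_{1,N}, \dots, \hat q_{D,N}) - (q_1, \dots, q_D) \right) \xrightarrow{law} N_D(0, \Sigma), \qquad N \to \infty,$$

where

$$\Sigma_{i,j} = \frac{\alpha_i (1 - \alpha_j)}{f(q_i)\, f(q_j)} \qquad (i \le j)$$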


Properties of the plug-in estimator for ES and ρψ - 3

I From the previous result (and some work), it is possible to derive asymptotic normality for the plug-in estimator of ES and spectral risk measures.

I For general ρψ (under mild conditions on ψ):

$$\sigma^2_\infty = 2\int_0^1\!\!\int_0^t \psi(s)\,\psi(t)\,\frac{s(1-t)}{f(q(s))\,f(q(t))}\,ds\,dt$$

I In particular, for ESα, where ψ(s) = α⁻¹ I_{s≤α}, this is

$$\sigma^2_\infty = \frac{2}{\alpha^2}\int_{\{(s,t)\,:\,0 \le s \le t \le \alpha\}} \frac{s(1-t)}{f(q(s))\,f(q(t))}\,ds\,dt$$


Parametric estimation - 1

I Within the simplified setting (PL = X), parametric estimation consists in:

1. Postulating that FX(x) = F(x | θ), where θ = (θ1, . . . , θK) is the parameter vector (the value of some of the θk may be fixed in advance)

2. Estimating θ̂ = θ̂(x), where x = (x1, . . . , xN) is the sample of observed values for X

3. Computing ρ(F(· | θ̂))


Parametric estimation - 2

I Having fixed F(· | θ), θ ∈ Θ, and ρ (defined at least on the parametric family), let

h(θ) = ρ(F(· | θ))

I We know that if h is well-behaved and θ̂ is a good estimator for θ (i.e. consistent and asymptotically normal), then h(θ̂) is a good estimator for ρ (at any F in the parametric family). This is a consequence of two results:

I if Yn → c a.s. and h is continuous, then h(Yn) → h(c) a.s.

I Delta-method: if θ̂ is asymptotically normal for θ and h is differentiable with h′(θ) ≠ 0, then h(θ̂) is asymptotically normal for h(θ) and, moreover, the asymptotic standard deviations satisfy

$$\sigma^{h(\theta)}_\infty = |h'(\theta)| \cdot \sigma^{\theta}_\infty$$

(and similarly for dimension ≥ 2)


Maximum Likelihood estimators - 1

I A general recipe to obtain a good parameter estimator is to maximize the likelihood.

I Assume F(· | θ) has density f(· | θ). The log-likelihood at a given data set x ∈ R^N is

$$l(\theta \mid x) = \sum_{n=1}^{N} \log f(x_n \mid \theta), \qquad \theta \in \Theta,$$

which has to be viewed as a function of θ

I The Maximum Likelihood estimator (MLE) of θ is then defined as

$$\hat\theta^{mle}(x) = \arg\max_\theta\; l(\theta \mid x)$$

I The MLE, when it is well defined, is consistent, asymptotically normal and has, in a sense, the lowest possible asymptotic variance


Maximum Likelihood estimators - 2

I The normal parametric family has density

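$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where θ = (µ, σ) ∈ R × R+ are the parameters.

I A simple computation shows that the MLE for µ and σ are

$$\hat\mu^{mle}(x) = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \hat\sigma^{mle}(x) = \sqrt{\frac{1}{N}\sum_{n=1}^{N} (x_n - \hat\mu^{mle})^2}$$

I Notice that they are the usual (or sample) estimators for the mean and standard deviation.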


Maximum Likelihood estimators - 3

I For the Laplace parametric family

$$f(x \mid \mu, \lambda) = \frac{1}{2\lambda}\, e^{-\frac{|x-\mu|}{\lambda}}$$

we get instead

$$\hat\mu^{mle}(x) = \operatorname{median}(x_1, \dots, x_N), \qquad \hat\lambda^{mle}(x) = \frac{1}{N}\sum_{n=1}^{N} |x_n - \hat\mu^{mle}|$$

I For this family, the mean is µ and the standard deviation is λ√2

I In this case, σ is better estimated through √2 λ̂mle than through the sample (plug-in) estimator.

I In general, the MLE may be difficult to compute (for instance for t-Student distributions), calling for numerical methods. It may not even be well defined (because the maximization problem has no solution or has multiple solutions)


Location-scale families

I A 2-parameter family of distributions is called a location-scale family if it comes in the form

$$F(x \mid a, b) = F_0\left( \frac{x - b}{a} \right), \qquad a > 0,\ b \in \mathbb{R}$$

for some given reference distribution F0. In this case, a is the scale parameter and b is the location parameter.

I The reason is: if F0 is the distribution of X0, then F(· | a, b) is the distribution of aX0 + b

I If F0 is standard (mean 0, variance 1), then F(· | a, b) has mean b and variance a²

I In terms of densities

$$f(x \mid a, b) = \frac{1}{a}\, f_0\left( \frac{x - b}{a} \right), \qquad a > 0,\ b \in \mathbb{R}$$


Location-scale families - basic examples

I Normal family, where a = σ and b = µ

I Laplace family, where a = λ and b = µ. Note that F0 (corresponding to µ = 0, λ = 1) is not standard as σ(F0) = √2

I t-Student family with a fixed ν > 0 (degrees of freedom), corresponding to

$$f_0(x) = c_\nu \left( 1 + \frac{x^2}{\nu} \right)^{-\frac{\nu+1}{2}}$$

where cν is the normalizing constant. This is the classical form for the density: however σ(F0) > 1.

I When ν = 1 we have the Cauchy family, corresponding to

$$f_0(x) = \frac{1}{\pi(1 + x^2)}$$

This distribution does not have a mean, so location and scale have to be interpreted in a loose sense here.


Location-scale families and risk measures - 2

I Then, if F(x | a, b) is a location-scale family and ρ is VaR, ES, ρψ or any other risk measure such that ρ(aX + b) = aρ(X) − b, we have

$$h(a, b) = \rho(F(\cdot \mid a, b)) = a \cdot \rho(F_0) - b$$

where ρ(F0) can be considered as a constant

I If $\hat{a}$ and $\hat{b}$ are good estimators (possibly MLE) for a and b, then

$$\hat{r} = \hat{a}\,\rho(F_0) - \hat{b}$$

is a good estimator for ρ(F) and its asymptotic variance may be computed.

I However (a big however), this is the case only if F is indeed in the location-scale family.
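A minimal Python sketch of this recipe for the normal family, where a = σ and b = µ. It assumes α = 5%, the loss-positive convention VaR = −q_α, and the closed form ES_α(F0) = ϕ(z_α)/α for the standard normal. The function names `normal_mle` and `parametric_risk` are hypothetical.

```python
import math
from statistics import NormalDist, fmean

ALPHA = 0.05
Z = NormalDist()

# Risk of the reference distribution F0 = N(0, 1); these are constants.
VAR_F0 = -Z.inv_cdf(ALPHA)               # VaR_alpha(F0) = -q_alpha, about 1.645
ES_F0 = Z.pdf(Z.inv_cdf(ALPHA)) / ALPHA  # ES_alpha(F0), assuming ES = -(1/alpha) * integral of quantiles

def normal_mle(xs):
    """MLE of (a, b) = (sigma, mu) in the normal family."""
    b_hat = fmean(xs)
    a_hat = math.sqrt(fmean([(x - b_hat) ** 2 for x in xs]))
    return a_hat, b_hat

def parametric_risk(xs, rho_f0):
    """r_hat = a_hat * rho(F0) - b_hat."""
    a_hat, b_hat = normal_mle(xs)
    return a_hat * rho_f0 - b_hat

sample = [-2.0, -1.0, 0.0, 1.0, 2.0]     # toy data, not taken from the slides
print(parametric_risk(sample, VAR_F0))   # parametric VaR estimate
print(parametric_risk(sample, ES_F0))    # parametric ES estimate
```

Only ρ(F0) changes from one risk measure to another; the estimation of a and b is shared, which is what makes the location-scale setting convenient.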


A comparison

I Assume (Xn)n are IID standard normal.

I Postulating that the data come from the parametric family N(0, σ), the MLE for VaR5% is

$$\widehat{\mathrm{VaR}}^{mle}_{\alpha,N}(x) = 1.645 \sqrt{\frac{1}{N} \sum_{k=1}^{N} x_k^2}$$

The estimator is consistent and asymptotically normal with $\sigma_\infty \simeq 1.16$; however, if the data come from a non-normal distribution, this estimator is not consistent

I Using a non-parametric approach, the plug-in estimator is

$$\widehat{\mathrm{VaR}}_{\alpha,N}(x) = -x_{\lceil N \cdot 0.05 \rceil : N}$$

The estimator is consistent and asymptotically normal with

$$\sigma_\infty = \frac{\sqrt{0.05 \cdot 0.95}}{\varphi(z_{0.05})} \simeq 2.11 \quad (\gg 1.16)$$

This estimator remains consistent even if the data are not normal.
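The two estimators and the two asymptotic constants can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the constant for the parametric estimator is obtained from the standard fact that the MLE of σ in N(0, σ) has asymptotic variance σ²/2, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist()
z_alpha = Z.inv_cdf(ALPHA)  # about -1.645

def var_mle_normal(xs):
    """Parametric estimator under N(0, sigma): 1.645 * sqrt(mean of x_k^2)."""
    return -z_alpha * math.sqrt(sum(x * x for x in xs) / len(xs))

def var_plugin(xs, alpha=ALPHA):
    """Non-parametric plug-in: minus the ceil(N * alpha)-th smallest observation."""
    k = math.ceil(len(xs) * alpha)  # 1-based order-statistic index
    return -sorted(xs)[k - 1]

# Asymptotic standard deviations when the data are IID standard normal.
sigma_inf_mle = -z_alpha / math.sqrt(2.0)
sigma_inf_plugin = math.sqrt(ALPHA * (1 - ALPHA)) / Z.pdf(z_alpha)
print(round(sigma_inf_mle, 2), round(sigma_inf_plugin, 2))
```

The ratio of the two constants is about 1.8, so under correct specification the plug-in estimator needs roughly 3.3 times as many observations to reach the same precision; the price of the parametric estimator is its inconsistency under misspecification.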


Robust Statistics

I Robust Statistics is concerned with statistical procedures, primarily estimation, for which the transition from good to bad behaviour is not too fast

I Loosely speaking, a parametric estimator is robust if it retains good properties even in a neighborhood of the parametric family

I There are several notions of robustness for an estimator, among which

I Qualitative robustness. An estimator is qualitatively robust if a small change in the distribution generating the data produces only a small change in the sampling distribution. It is a continuity property.

I Quantitative robustness. An estimator is quantitatively robust if adding an outlier to a large data set does not change the estimate too much.
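The quantitative notion can be illustrated by adding one extreme loss to a large data set and recomputing two estimates. The sketch below is an illustration only: the data are synthetic, and the tail-average ES estimator is an assumed, simple form that need not coincide with the one used elsewhere in these slides.

```python
import math

def var_plugin(xs, alpha=0.05):
    """Minus the ceil(N * alpha)-th smallest observation."""
    k = math.ceil(len(xs) * alpha)
    return -sorted(xs)[k - 1]

def es_tail_average(xs, alpha=0.05):
    """Minus the average of the ceil(N * alpha) smallest observations (assumed ES estimator)."""
    k = math.ceil(len(xs) * alpha)
    worst = sorted(xs)[:k]
    return -sum(worst) / k

# Synthetic "large" data set: 1000 equally spaced P&L values in [-1, 1).
data = [-1.0 + 2.0 * i / 1000 for i in range(1000)]
contaminated = data + [-1000.0]  # one extreme loss added

print(var_plugin(data), var_plugin(contaminated))            # almost unchanged
print(es_tail_average(data), es_tail_average(contaminated))  # changes a lot
```

The quantile-based estimate barely moves, while the tail average is dragged by the single outlier, in line with the definition above.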


Qualitative robustness - 1

I Consider an estimator $\hat{r}$ and a sequence of IID (Xn)n with common distribution F.

I The sampling distribution is defined as

$$H_{\hat{r},F,N} = \mathrm{Law}(\hat{r}(X_1, \ldots, X_N)), \qquad X_n \sim F$$

I We know that for an estimator $\hat{r}$ that is asymptotically normal at F, the sampling distribution increasingly resembles a normal one as N grows.

I In general, qualitative robustness is concerned with the (asymptotic in N) dependence of $H_{\hat{r},F,N}$ on F
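The sampling distribution can be approximated by Monte Carlo: draw many independent samples of size N from F, apply the estimator to each, and look at the resulting values. The sketch below does this for the plug-in VaR estimator at 5% with F standard normal. The seed, the number of replications and the function names are arbitrary, hypothetical choices.

```python
import math
import random
import statistics

def var_plugin(xs, alpha=0.05):
    """Minus the ceil(N * alpha)-th smallest observation."""
    k = math.ceil(len(xs) * alpha)
    return -sorted(xs)[k - 1]

def sampling_distribution(estimator, draw, n, replications, seed=0):
    """Monte Carlo draws from the law of estimator(X_1, ..., X_n), with X_i IID."""
    rng = random.Random(seed)
    return [estimator([draw(rng) for _ in range(n)]) for _ in range(replications)]

N = 1000
draws = sampling_distribution(var_plugin, lambda rng: rng.gauss(0.0, 1.0), N, 400)

# The centre should be near VaR_5% of N(0, 1), about 1.645, and the
# rescaled spread near the asymptotic standard deviation of the estimator.
print(statistics.fmean(draws))
print(math.sqrt(N) * statistics.stdev(draws))
```

Replacing the `draw` function by a slightly different distribution G and comparing the two sets of draws gives an empirical feel for how the sampling distribution depends on F.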


Qualitative robustness - 2

I Consider two distributions, G and F. The Lévy distance between G and F is defined as

$$d_L(G, F) = \inf\{\varepsilon > 0 : G(x - \varepsilon) - \varepsilon \le F(x) \le G(x + \varepsilon) + \varepsilon \ \text{for all } x \in \mathbb{R}\}$$

I It is indeed a distance. Geometrically, it measures the maximum distance between the graphs of G and F in the NW-SE direction.

I Also, if (Gn)n is a sequence of distributions, then Gn tends to F in the weak sense if and only if dL(Gn, F) → 0.

I Weak convergence is the classical convergence criterion among distributions: it corresponds to convergence in law of random variables distributed as Gn and F.
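The Lévy distance can be approximated numerically by checking the defining inequalities on a grid of x values and bisecting on ε. The sketch below is a rough illustration, not a production routine: the grid, the tolerance and the function names are hypothetical choices, and the result is only as accurate as the grid allows. The example uses two uniform laws shifted by 0.4, for which the inequalities can be solved by hand and give a distance of 0.2.

```python
def levy_distance(G, F, lo=-5.0, hi=5.0, points=2001, tol=1e-6):
    """Approximate d_L(G, F) by bisection on eps, checking the inequalities on a grid."""
    xs = [lo + (hi - lo) * i / (points - 1) for i in range(points)]

    def feasible(eps):
        return all(G(x - eps) - eps <= F(x) <= G(x + eps) + eps for x in xs)

    eps_lo, eps_hi = 0.0, 1.0  # eps = 1 is always feasible for distribution functions
    while eps_hi - eps_lo > tol:
        mid = 0.5 * (eps_lo + eps_hi)
        if feasible(mid):
            eps_hi = mid   # feasibility is monotone in eps, so shrink from above
        else:
            eps_lo = mid
    return eps_hi

def uniform_cdf(left):
    """Distribution function of the uniform law on [left, left + 1]."""
    return lambda x: min(1.0, max(0.0, x - left))

G = uniform_cdf(0.0)
F = uniform_cdf(0.4)
print(levy_distance(G, F))  # close to 0.2
print(levy_distance(G, G))  # close to 0
```

Bisection is valid here because G is nondecreasing: if the inequalities hold for some ε, they hold for every larger ε.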


Qualitative robustness - 3

I The estimator $\hat{r}$ is (qualitatively) robust at F if for any ε > 0 there exist δ > 0 and $\bar{n}$ such that

$$d_L(G, F) < \delta \quad \text{implies} \quad d_L(H_{\hat{r},G,n}, H_{\hat{r},F,n}) < \varepsilon \quad \text{for all } n \ge \bar{n}$$

I In other words, we require the sampling distribution to depend continuously (uniformly in n) on the distribution generating the data.

I If C is a set of distributions containing F, we may define C-robustness by requiring the previous property just for G ∈ C. It is a weaker, and more realistic, notion of robustness.

I Also, we may use distances other than the Lévy one. We can even use two different distances on the two sides of the implication. Several recent works go in this direction.


Qualitative robustness - 4

I A crucial result is

I Hampel Theorem. Let $\hat{\rho}$ be the plug-in estimator of ρ and assume it is consistent with ρ at G for any G in a neighborhood of F. Then $\hat{\rho}$ is qualitatively robust at F if and only if ρ is continuous (w.r.t. Lévy distance) at F.

I If F is invertible at α, then ρ = −qα is continuous at F. Under the same assumptions, ρψ is continuous at F if and only if ψ(u) = 0 around 0 and 1. As the plug-in estimators of both −qα and ρψ are consistent:

I the plug-in estimator for VaRα is qualitatively robust

I the plug-in estimator for ρψ is not qualitatively robust whenever ψ is decreasing (note: ψ(0) > 0)

I so: the plug-in estimator for ESα is not qualitatively robust

I We can adapt Hampel Theorem to other estimators $\hat{r}$ (not necessarily plug-in). In general, MLEs of VaR or ρψ for location-scale families are not robust. See Cont, Deguest, Scandolo 2010.
