
Loss Estimation using Monte Carlo Simulation

Tony Bellotti, Department of Mathematics, Imperial College London

Credit Scoring and Credit Control Conference XV

Edinburgh, 29 August to 1 September 2017

Motivation

• Accurate estimation of loss based on underlying models of PD, LGD and EAD.

• Use of Monte Carlo simulation (integration) to avoid a complex analytic solution, giving a distribution of possible losses.

• Confidence intervals to quantify error in expected loss estimates.

• Applications:

  • Internal risk management,

  • Regulation (Basel III),

  • Accounting rules (IFRS 9, CECL),

  • Stress testing,

  • Profit estimation.

Basic Idea

• Simple idea: for a portfolio of loans with accounts $i = 1$ to $n$, simulate Loss given by

$$\text{Loss} = \sum_{i=1}^{n} \mathrm{PD}_i \times \mathrm{LGD}_i \times \mathrm{EAD}_i$$

where

• $\mathrm{PD}_i$ = probability of default;

• $\mathrm{LGD}_i$ = loss given default;

• $\mathrm{EAD}_i$ = exposure at default,

across distributions of these risk factors, informed by models.

• Devil in the detail: relationship between these three risk factors.

Scope of this study

For this study, we considered the simplified problem:

• Assume no population change between training and forecast data (i.e. IID data).

• Do not consider inclusion of economic conditions just yet.

• Show results from both a simulation study and real credit card data.

The Maths: Defining Loss

Consider estimating loss on $n$ accounts in a portfolio. For each account $i \in \{1, \dots, n\}$:

• Let $\mathbf{x}_i$ be a vector of characteristics of mixed data types.

• Let $Y_i \in \{0, 1\}$ be the default event for account $i$; 1 = default, 0 = non-default.

• Let $L_i \in \mathbb{R}$ be loss-given-default (LGD).

• Let $E_i > 0$ be exposure-at-default (EAD).

Then, total loss on the portfolio is $V = \sum_{i=1}^{n} Y_i L_i E_i$.

Then, expected loss is $E(V) = \sum_{i=1}^{n} E(Y_i L_i E_i)$.

Introducing the risk models

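• Suppose we have models $m_1$, $m_2$, $m_3$ for probability of default (PD), LGD and log-EAD respectively. Hence,

$$P(Y_i = 1 \mid \mathbf{x}_i) = m_1(\mathbf{x}_i)$$

$$L_i = m_2(\mathbf{x}_i) + \epsilon_{2,i}, \qquad \log E_i = m_3(\mathbf{x}_i) + \epsilon_{3,i}$$

where $\epsilon_{2,i}$ and $\epsilon_{3,i}$ are residual terms.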

The Maths: Expected Loss

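Then, with a change of variables, the expected loss $E(Y_i L_i E_i)$ can be rewritten as

$$m_1(\mathbf{x}_i) \iint \left(m_2(\mathbf{x}_i) + \epsilon_{2,i}\right) \exp\left(m_3(\mathbf{x}_i) + \epsilon_{3,i}\right) f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1, \mathbf{x}_i) \, \mathrm{d}\epsilon_{2,i} \, \mathrm{d}\epsilon_{3,i}$$

which can be approximated using Monte Carlo integration by

$$\mathrm{EL} \approx \frac{1}{M} \sum_{m=1}^{M} m_1(\mathbf{x}_i) \left(m_2(\mathbf{x}_i) + \epsilon_{2,i}\right) \exp\left(m_3(\mathbf{x}_i) + \epsilon_{3,i}\right)$$

for random samples $(\epsilon_{2,i}, \epsilon_{3,i}) \sim f$: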

• Assume independence of residuals from $\mathbf{x}_i$, i.e. simulate from the density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$.

• Estimate $f$ using either the empirical distribution or kernel density estimation on a training or validation data set.

• Note: I will not show the derivation of these formulae, but it is available upon request by email.
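The Monte Carlo approximation above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: `pd_i`, `lgd_hat` and `log_ead_hat` are hypothetical names standing in for $m_1(\mathbf{x}_i)$, $m_2(\mathbf{x}_i)$ and $m_3(\mathbf{x}_i)$, the residual pairs are made-up numbers rather than data from the study, and sampling uses the empirical distribution of the residuals.

```python
import numpy as np

def mc_expected_loss(pd_i, lgd_hat, log_ead_hat, resid, M=5000, rng=None):
    """Monte Carlo estimate of E(Y_i L_i E_i) for a single account.

    pd_i, lgd_hat, log_ead_hat play the roles of m1(x_i), m2(x_i), m3(x_i).
    resid is an (N, 2) array of (eps2, eps3) residual pairs observed on
    defaulted accounts; rows are resampled jointly so that any correlation
    between the LGD and log-EAD residuals is preserved.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.integers(0, len(resid), size=M)  # draw from the empirical distribution
    eps2, eps3 = resid[idx, 0], resid[idx, 1]
    return pd_i * np.mean((lgd_hat + eps2) * np.exp(log_ead_hat + eps3))

# Illustrative (made-up) residuals, not data from the study.
rng = np.random.default_rng(0)
resid = rng.normal(0.0, [0.2, 0.5], size=(1000, 2))
el = mc_expected_loss(0.05, 0.6, np.log(2000.0), resid, M=5000, rng=rng)
```

Because the log-EAD residual enters through `exp`, the estimate is generally larger than the plug-in value `pd_i * lgd_hat * exp(log_ead_hat)`, which is one reason the residual distribution matters.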

Quantile estimation of Loss

• It is valuable to consider the distribution of possible losses, and in particular to compute quantiles.

• This allows confidence intervals (CI) on Loss estimates.

• The $q$th quantile $v_q$ of $V$ is defined by

$$q = \int_C f(v \mid \mathbf{x}_1, \dots, \mathbf{x}_n) \, \mathrm{d}v$$

where $f$ is the density over $V$, conditional on characteristics, and $C = \{v : v \le v_q\}$.

• Note: here $q$ is known and $v_q$ is unknown.

For example, to compute a 95% CI, find $v_q$ for $q = 0.025$ and $q = 0.975$: $(v_{0.025}, v_{0.975})$.

Quantile estimation of Loss using Monte Carlo

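• Using Monte Carlo integration, this integral can be approximated by

$$q \approx \frac{1}{M} \sum_{m=1}^{M} I\left(\sum_{i=1}^{n} v_i \le v_q\right)$$

where $v_i = y_i \left(m_2(\mathbf{x}_i) + \epsilon_{2,i}\right) \exp\left(m_3(\mathbf{x}_i) + \epsilon_{3,i}\right)$, $I$ is the indicator function, and random samples $(y_i, \epsilon_{2,i}, \epsilon_{3,i}) \sim f$.

• The loss quantile $v_q$ is easily estimated by ranking the $M$ simulated values of $\sum_{i=1}^{n} v_i$ in ascending order and choosing the value at rank $Mq$.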
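The ranking step can be illustrated as follows. This is a minimal sketch, assuming the $M$ simulated portfolio losses are already available in an array; the function name `loss_quantile` and the rounding rule for a non-integer rank $Mq$ (round up) are my assumptions, not something the slides specify.

```python
import numpy as np

def loss_quantile(total_losses, q):
    """Estimate v_q by ranking the M simulated portfolio losses."""
    ranked = np.sort(np.asarray(total_losses, dtype=float))
    M = len(ranked)
    # Rank Mq, rounded up; the small tolerance guards against
    # floating-point round-off in M * q, and the rank is clipped to 1..M.
    rank = min(max(int(np.ceil(M * q - 1e-9)), 1), M)
    return ranked[rank - 1]

# Stand-in for simulated losses: the values 1, 2, ..., 1000 in reverse order.
sims = np.arange(1000, 0, -1)
ci_95 = (loss_quantile(sims, 0.025), loss_quantile(sims, 0.975))
```

With these stand-in values the interval limits are simply the 25th and 975th smallest numbers, which makes the ranking rule easy to check by hand.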

Quantile estimation: Sampling

• We need to sample $(y_i, \epsilon_{2,i}, \epsilon_{3,i}) \sim f$.

1. Notice $f(y_i, \epsilon_{2,i}, \epsilon_{3,i} \mid \mathbf{x}_i) = f(\epsilon_{2,i}, \epsilon_{3,i} \mid y_i, \mathbf{x}_i) \, P(y_i \mid \mathbf{x}_i)$.

2. Hence, for each account $i$, simulate $y_i = 0$ or $1$ using $P(y_i = 1 \mid \mathbf{x}_i) = m_1(\mathbf{x}_i)$.

3. If $y_i = 0$, it does not matter how $\epsilon_{2,i}, \epsilon_{3,i}$ are simulated, since $y_i = 0 \Rightarrow v_i = 0$, always.

4. If $y_i = 1$, simulate $(\epsilon_{2,i}, \epsilon_{3,i})$ from $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$, assuming that $\epsilon_{2,i}, \epsilon_{3,i}$ are independent of $\mathbf{x}_i$.

5. The density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ can be estimated based on a validation data set of previous defaults. Either the empirical distribution or a kernel density estimator (KDE) can be used.

Note: it is easy to simulate from a KDE: randomly sample an example from the validation/training data, then add random noise corresponding to the kernel function.
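The sampling steps above can be put together as in the sketch below. It is an illustration only: the function and argument names are hypothetical, the inputs are made-up constants, and the KDE is assumed to use a Gaussian kernel with the same bandwidth in both residual dimensions (the slides do not state the kernel). Setting `bandwidth=0` falls back to the empirical distribution.

```python
import numpy as np

def simulate_portfolio_losses(pd_hat, lgd_hat, log_ead_hat, resid,
                              M=1000, bandwidth=0.0, rng=None):
    """Return M simulated total portfolio losses, each a sum of v_i.

    pd_hat, lgd_hat, log_ead_hat: length-n arrays standing in for
    m1(x_i), m2(x_i), m3(x_i). resid: (N, 2) array of (eps2, eps3) pairs
    from past defaults. bandwidth > 0 adds Gaussian kernel noise to the
    resampled pairs, i.e. samples from a KDE (assumed Gaussian kernel).
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(pd_hat)
    totals = np.empty(M)
    for m in range(M):
        y = rng.random(n) < pd_hat                   # step 2: simulate defaults
        k = int(y.sum())
        idx = rng.integers(0, len(resid), size=k)    # step 4: pick past defaults
        eps = resid[idx] + bandwidth * rng.standard_normal((k, 2))
        v = (lgd_hat[y] + eps[:, 0]) * np.exp(log_ead_hat[y] + eps[:, 1])
        totals[m] = v.sum()                          # step 3: non-defaults add 0
    return totals

# Made-up portfolio: 500 identical accounts, zero residuals for easy checking.
rng = np.random.default_rng(1)
n = 500
totals = simulate_portfolio_losses(
    np.full(n, 0.1), np.full(n, 0.5), np.full(n, np.log(1000.0)),
    resid=np.zeros((10, 2)), M=200, rng=rng)
lo, hi = np.quantile(totals, [0.025, 0.975])
```

Here each default costs 500, so the simulated totals scatter around 500 × 50 = 25,000; `lo` and `hi` give a 95% interval from the simulated distribution.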

Why a simulation study?

• Simulate credit accounts with default, LGD and EAD outcomes and correlations controlled by different predictor variables.

• Allows us to control the generating distribution for the data.

• Allows for testing and debugging of models and the loss estimation technique, since we know the true values.

• Endless supply of artificial data allows for repeat experiments and hence samples of results for statistical analysis.

Simulation study: Data generation

• A credit portfolio was simulated with multiple risk factors to simulate default events, LGD and EAD.

• All variables are standard normally distributed.

• All variables are expressed as the sum of an observable and an unobservable component; only the observable component can be used in model building, hence simulating uncertainty.

• X1 and X2 are common to more than one component, hence inducing a correlation.

Risk factors: X1 X2 X3 X4 X5

Default * * *

LGD * * *

EAD * *

Simulation study: models and distribution of residuals

LGD model $R^2$ = 0.29

Log-EAD model $R^2$ = 0.25

[Figure: contour map of density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ using KDE; one axis is the LGD residual $\epsilon_{2,i}$.]

Simulation study: results

Model details                N train  N test  EL error  EL MC error  95% CI           % below Q2.5%  % above Q97.5%
                             100      10      +1.38     +1.04        (-9.5,+9.8)      3              3
                             100      100     -0.13     -0.47        (-3.1,+3.1)      4              10
                             10       10      +0.14     -0.10        (-9.3,+10.1)     9              8
Bandwidth=high               100      10      +0.81     +5.96        (-10.2,+10.6)    15             0
Fix LGD                      100      10                -0.11        (-9.5,+9.7)      3              5
Fix LGD, $\epsilon_{2,i}$=0  100      10                -5.90        (-8.5,+8.7)      0              34
Poor PD model                100      10      +11.4     +10.5        (-8.8,+11.6)     46             0
No EAD in LGD model          100      10      -5.84     -0.05        (-9.44,+9.73)    3              3

• $M$ = 5000; each experiment is repeated 100 times.

• N train and N test are the numbers of examples in the train and test data sets (in 1000s).

• EL error = % error for the analytic expected loss estimate, compared to actual loss.

• EL MC error = % error for the Monte Carlo expected loss estimate.

• 95% CI is given as % difference from the EL estimate.
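To make the table columns concrete, the sketch below shows one way such summaries could be computed from repeated experiments. It is my reading of the column definitions, not the author's code: the sign convention for the % error (estimate minus actual), the averaging over repeats, and all names are assumptions. For a well-calibrated 95% interval, roughly 2.5% of realised losses should fall on each side.

```python
import numpy as np

def reliability_summary(actual, el_mc, q_low, q_high):
    """Summarise repeated experiments; each argument has one entry per repeat.

    actual: realised portfolio loss; el_mc: Monte Carlo expected loss;
    q_low, q_high: simulated 2.5% and 97.5% loss quantiles.
    """
    actual, el_mc = np.asarray(actual, float), np.asarray(el_mc, float)
    q_low, q_high = np.asarray(q_low, float), np.asarray(q_high, float)
    return {
        # mean % error of the MC estimate against the realised loss
        "el_mc_error_pct": 100 * np.mean((el_mc - actual) / actual),
        # interval limits as mean % difference from the EL estimate
        "ci_pct": (100 * np.mean(q_low / el_mc - 1),
                   100 * np.mean(q_high / el_mc - 1)),
        # % of repeats where the realised loss falls outside the interval
        "pct_below": 100 * np.mean(actual < q_low),
        "pct_above": 100 * np.mean(actual > q_high),
    }

# Four made-up repeats: one realised loss below and one above its interval.
summary = reliability_summary(
    actual=[100, 100, 100, 100], el_mc=[110, 90, 100, 100],
    q_low=[90, 101, 80, 80], q_high=[120, 120, 99, 130])
```

Large values of `pct_below` or `pct_above` signal intervals that miss the realised loss more often than intended, which is how the "Poor PD model" row can be read.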


Main result: reliable and accurate predictions, but high error (95% CI of about +/-10%).


Increasing the sample size gives more accuracy, but less reliability.


Poor models (due to a small training set) lead to poor reliability.


Accuracy is sensitive to the bandwidth in KDE: perhaps just use the empirical distribution for sampling.


Using a fixed value for LGD is fine, so long as the residual error for LGD is used in MC sampling. A similar result holds when using a fixed value for EAD.


A poor PD model (just one predictor variable) leads to poor reliability.


No need to include EAD as a predictor variable in the LGD model.

UK credit card data study

• Behavioural data for UK credit cards, observed during 2008-2011.

• Define default as three months of missed payments within a 12-month period.

• Predictor variables include client and account ages, application data (employment status, tenure status, months at current address) and behavioural data (balance, utilization, past delinquency).

• Build simple underlying models: logistic regression for PD, and OLS linear regression for LGD and log-EAD.

• Train / test over two different periods:

Data set  Observation date  N train  N test
A         July 2008         21067    10533
B         September 2009    15525    7762

Credit card data: models and distribution of residuals

Data set A: LGD model $R^2$ = 0.09; log-EAD model $R^2$ = 0.74

Data set B: LGD model $R^2$ = 0.11; log-EAD model $R^2$ = 0.81

[Figure: contour maps of density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ using KDE, one panel per data set.]


Credit card data study: Results

                             Data set A                              Data set B
Model details                EL error  EL MC error  95% CI           EL error  EL MC error  95% CI
                             -4.05     +2.03        (-14.7,+20.0)    -5.75     +0.09        (-17.9,+27.8)
Bandwidth=high               -5.00     +5.59        (-15.2,+20.7)    -4.30     +7.71        (-19.0,+29.3)
Fix LGD                      -6.24     -5.05        (-14.0,+18.3)    -2.54     -4.45        (-17.7,+28.6)
Fix LGD, $\epsilon_{2,i}$=0  -5.57     -6.53        (-12.9,+16.4)    -4.01     -8.56        (-15.1,+21.7)
Poor PD model                -5.65     -3.11        (-15.2,+20.5)    -4.66     -22.4        (-20.1,+31.9)
No EAD in LGD model          -5.12     -0.65        (-14.4,+19.2)    -2.26     +2.80        (-18.5,+32.4)

• $M$ = 10000; results are averaged over 50 runs with different train / test splits.


Monte Carlo simulation gives accurate EL estimates, on average. However, the CI is broad (+/-20%).


Accuracy is sensitive to the bandwidth used in KDE.


Accuracy is affected by using a fixed value for LGD, with a similar result for EAD. A poor PD model (i.e. one with insufficient predictors) can also give a bad result.


No need to include EAD as a predictor in the LGD model.

Credit card data: LGD/EAD model residuals

• When EAD is not explicitly included as a predictor in the LGD model, the correlation between the LGD and log-EAD model residuals is stronger, to compensate.

[Figure: contour maps of density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ using KDE, for data sets A and B.]


Conclusions and future work

• Monte Carlo simulation can be used to give reliable estimates of Loss, and estimates of error in expected loss estimation.

• However, the estimates are sensitive to model risk. Care is needed to ensure the underlying models are correctly specified.

• Future work:

  • Test the procedure on other data (e.g. mortgages).

  • Extend the exercise to include dynamic components: environmental/macroeconomic conditions and forecasting.

  • Use reliable prediction techniques (conformal predictors) to output reliable confidence intervals, even with model error.

Loss Estimation using Monte Carlo Simulation

Thank you!

I hope you have found this presentation useful.

Any questions?

Dr Tony Bellotti

Senior Lecturer in Statistics

Department of Mathematics

Imperial College London

[email protected]

Part of the Statistics in Finance Research Group at Imperial College London.

Research, Training, Consultancy.

ICON: www.imperial-consultants.co.uk

www.imperial-business-partners.com



