Loss Estimation using Monte Carlo
Simulation
Tony Bellotti, Department of Mathematics, Imperial College London
Credit Scoring and Credit Control Conference XV
Edinburgh, 29 August to 1 September 2017
Motivation
• Accurate estimation of loss based on underlying models of PD, LGD
and EAD.
• Use of Monte Carlo Simulation (integration) to avoid complex analytic
solution: giving a distribution of possible loss.
• Confidence intervals to quantify error in expected loss estimates.
• Applications:
• Internal risk management,
• Regulation (Basel 3),
• Accounting rules (IFRS9, CECL),
• Stress testing,
• Profit estimation.
Basic Idea
• Simple idea: for a portfolio of loans, with accounts $i = 1$ to $n$, simulate the loss given by
$$\text{Loss} = \sum_{i=1}^{n} PD_i \times LGD_i \times EAD_i$$
where
• $PD_i$ = probability of default;
• $LGD_i$ = loss given default;
• $EAD_i$ = exposure at default,
across distributions of these risk factors, informed by models.
• Devil in the detail: relationship between these three risk factors.
Scope of this study
For this study, we considered the simplified problem:
• Assume no population change between training and forecast data (i.e.
IID data).
• Do not consider inclusion of economic conditions just yet.
• Show results from both a simulation study and real credit card
data.
The Maths: Defining Loss
Consider estimating loss on $n$ accounts in a portfolio. For each account
$i \in \{1, \dots, n\}$:
• Let $\mathbf{x}_i$ be a vector of characteristics of mixed data types.
• Let $Y_i \in \{0, 1\}$ be the default event for account $i$; 1 = default, 0 = non-default.
• Let $L_i \in \mathbb{R}$ be loss-given-default (LGD).
• Let $E_i > 0$ be exposure-at-default (EAD).
Then, total loss on the portfolio is $V = \sum_{i=1}^{n} Y_i L_i E_i$.
Then, expected loss is $E(V) = E\left(\sum_{i=1}^{n} Y_i L_i E_i\right)$.
Introducing the risk models
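• Suppose we have models $m_1$, $m_2$, $m_3$ for probability of default (PD),
LGD and log-EAD respectively. Hence,
$$P(Y_i = 1 \mid \mathbf{x}_i) = m_1(\mathbf{x}_i)$$
$$L_i = m_2(\mathbf{x}_i) + \epsilon_{2,i}, \qquad \log E_i = m_3(\mathbf{x}_i) + \epsilon_{3,i}$$
where $\epsilon_{2,i}$ and $\epsilon_{3,i}$ are residual terms.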
The Maths: Expected Loss
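Then with change of variables, expected loss $E(Y_i L_i E_i)$ can be rewritten as
$$\iint m_1(\mathbf{x}_i) \left(m_2(\mathbf{x}_i) + \epsilon_{2,i}\right) \exp\left(m_3(\mathbf{x}_i) + \epsilon_{3,i}\right) f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1, \mathbf{x}_i)\, d\epsilon_{2,i}\, d\epsilon_{3,i}$$
which can be approximated using Monte Carlo integration by
$$EL \approx \frac{1}{M} \sum_{m=1}^{M} m_1(\mathbf{x}_i) \left(m_2(\mathbf{x}_i) + \epsilon_{2,i}\right) \exp\left(m_3(\mathbf{x}_i) + \epsilon_{3,i}\right)$$
for random samples $(\epsilon_{2,i}, \epsilon_{3,i}) \sim f$:
• Assume independence of residuals from $\mathbf{x}_i$, i.e. simulate from the
density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$.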
• Estimate $f$ using either the empirical distribution or kernel density
estimation on a training or validation data set.
• Note: I will not show derivation of these formulae, but these are available upon
request by email.
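The Monte Carlo approximation above can be sketched for a single account as follows. This is a minimal illustration rather than the code used in the study: the function name `mc_expected_loss`, its parameter names and the use of the empirical distribution of residual pairs are all assumptions made for the example.

```python
import math
import random


def mc_expected_loss(pd_i, lgd_hat, log_ead_hat, residual_pairs, M=5000, rng=None):
    """Monte Carlo estimate of E(Y_i L_i E_i) for one account.

    pd_i           : m1(x_i), modelled probability of default
    lgd_hat        : m2(x_i), modelled LGD
    log_ead_hat    : m3(x_i), modelled log-EAD
    residual_pairs : (eps2, eps3) pairs observed on past defaults
                     (hypothetical input; empirical distribution assumed)
    """
    rng = rng or random.Random()
    total = 0.0
    for _ in range(M):
        # Draw the pair jointly so that any dependence between the
        # LGD and log-EAD residuals is preserved.
        eps2, eps3 = rng.choice(residual_pairs)
        total += pd_i * (lgd_hat + eps2) * math.exp(log_ead_hat + eps3)
    return total / M
```

With a single residual pair `(0.0, 0.0)` the estimate reduces to $m_1 m_2 \exp(m_3)$, which is a convenient sanity check before using real residuals.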
Quantile estimation of Loss
• It is valuable to consider the distribution of possible losses, and in
particular compute quantiles.
• This allows confidence intervals (CI) on Loss estimates.
• The $q$th quantile $v_q$ of $V$ is defined by
$$q = \int_C f(v \mid \mathbf{x}_1, \dots, \mathbf{x}_n)\, dv$$
where $f$ is the density over $V$, conditional on characteristics, and
$C = \{v : v \le v_q\}$.
• Note: here $q$ is known and $v_q$ is unknown.
For example, to compute a 95% CI, find $v_q$ for $q = 0.025$ and
$q = 0.975$: $(v_{0.025}, v_{0.975})$.
Quantile estimation of Loss using Monte Carlo
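• Using Monte Carlo integration, this integral can be approximated by
$$q \approx \frac{1}{M} \sum_{m=1}^{M} I\left(\sum_{i=1}^{n} v_i \le v_q\right)$$
where $v_i = y_i \left(m_2(\mathbf{x}_i) + \epsilon_{2,i}\right) \exp\left(m_3(\mathbf{x}_i) + \epsilon_{3,i}\right)$
and random samples $(y_i, \epsilon_{2,i}, \epsilon_{3,i}) \sim f$.
• The loss quantile $v_q$ is easily estimated by ranking the $M$ simulated values of
$\sum_{i=1}^{n} v_i$ in ascending order and choosing the value at the $Mq$ rank.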
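The ranking step can be sketched as below. The function names are hypothetical, and rounding $Mq$ to the nearest integer rank (clamped to 1..M) is an assumption for cases where $Mq$ is not a whole number.

```python
def quantile_by_rank(simulated_losses, q):
    """Return the value at rank M*q among M simulated portfolio losses."""
    ranked = sorted(simulated_losses)
    M = len(ranked)
    # Rank M*q, rounded to the nearest integer and clamped to 1..M,
    # then converted to a 0-based list index.
    rank = min(max(round(M * q), 1), M)
    return ranked[rank - 1]


def confidence_interval(simulated_losses, level=0.95):
    """Two-sided interval from the lower and upper tail quantiles."""
    tail = (1.0 - level) / 2.0
    return (quantile_by_rank(simulated_losses, tail),
            quantile_by_rank(simulated_losses, 1.0 - tail))
```

For example, with $M = 1000$ simulated losses, a 95% CI is read off at ranks 25 and 975 of the sorted values.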
Quantile estimation: Sampling
• We need to sample $(y_i, \epsilon_{2,i}, \epsilon_{3,i}) \sim f$.
1. Notice $f(y_i, \epsilon_{2,i}, \epsilon_{3,i} \mid \mathbf{x}_i) = f(\epsilon_{2,i}, \epsilon_{3,i} \mid y_i, \mathbf{x}_i)\, P(y_i \mid \mathbf{x}_i)$.
2. Hence, for each account $i$, simulate $y_i = 0$ or $1$ using $P(y_i = 1 \mid \mathbf{x}_i) = m_1(\mathbf{x}_i)$.
3. If $y_i = 0$, it does not matter how $\epsilon_{2,i}, \epsilon_{3,i}$ are simulated, since $y_i = 0 \Rightarrow v_i = 0$, always.
4. If $y_i = 1$, simulate $(\epsilon_{2,i}, \epsilon_{3,i})$ from $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$, assuming that $\epsilon_{2,i}, \epsilon_{3,i}$
are independent of $\mathbf{x}_i$.
5. The density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ can be estimated from a validation
data set of previous defaults. Either the empirical distribution or a kernel
density estimator (KDE) can be used.
Note: it is easy to simulate from a KDE: randomly sample an example from
the validation/training data, then add random noise corresponding to the
kernel function.
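The sampling steps above can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: a Gaussian kernel with one common bandwidth for both residuals is assumed, and the names `sample_kde` and `simulate_portfolio_loss` are hypothetical.

```python
import math
import random


def sample_kde(residual_pairs, bandwidth, rng):
    """Draw (eps2, eps3) from a Gaussian-kernel KDE over observed pairs."""
    eps2, eps3 = rng.choice(residual_pairs)    # pick a past default at random
    return (eps2 + rng.gauss(0.0, bandwidth),  # then add kernel noise
            eps3 + rng.gauss(0.0, bandwidth))


def simulate_portfolio_loss(accounts, residual_pairs, bandwidth, rng):
    """One simulated portfolio loss, the sum of v_i over all accounts.

    accounts: list of (m1, m2, m3) model outputs, one tuple per account.
    """
    loss = 0.0
    for pd_i, lgd_hat, log_ead_hat in accounts:
        if rng.random() >= pd_i:
            continue  # y_i = 0, so v_i = 0 and no residuals are needed
        eps2, eps3 = sample_kde(residual_pairs, bandwidth, rng)
        loss += (lgd_hat + eps2) * math.exp(log_ead_hat + eps3)
    return loss
```

Setting `bandwidth=0.0` adds no noise, so the sketch then samples from the empirical distribution of residual pairs. Calling `simulate_portfolio_loss` $M$ times gives the simulated values that are ranked to obtain quantiles.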
Why a simulation study?
• Simulate credit accounts with default, LGD and EAD outcomes and
correlations controlled by different predictor variables.
• Allows us to control the generating distribution for the data.
• Allows for testing and debugging of models and the loss estimation technique,
since we know the true values.
• Endless supply of artificial data allows for repeat experiments and
hence samples of results for statistical analysis.
Simulation study: Data generation
• A credit portfolio was simulated with multiple risk factors to simulate
default events, LGD and EAD.
• All variables are standard normally distributed.
• All variables are expressed as the sum of an observable and
unobservable component; only the observable component can be used
when building the model, hence simulating uncertainty.
• X1 and X2 are common to more than one component, hence inducing
a correlation.
Risk factors: X1 X2 X3 X4 X5
Default * * *
LGD * * *
EAD * *
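One way to generate a risk factor of this kind is sketched below. The slides do not give the weighting between the two components, so the parameter `observable_share` and the square-root scaling (chosen so the full factor stays standard normal) are assumptions, as is the function name.

```python
import math
import random


def make_risk_factor(n, observable_share, rng):
    """Return (observable, full) values for one standard normal risk factor.

    full = sqrt(w) * observable + sqrt(1 - w) * unobservable, where
    w = observable_share is an assumed weight. Both components are
    standard normal, so the full factor keeps unit variance.
    """
    w = observable_share
    observable, full = [], []
    for _ in range(n):
        obs = rng.gauss(0.0, 1.0)
        hidden = rng.gauss(0.0, 1.0)  # never shown to the fitted models
        observable.append(obs)
        full.append(math.sqrt(w) * obs + math.sqrt(1.0 - w) * hidden)
    return observable, full
```

In such a setup the outcomes would be driven by the full factors, while the models are fitted on the observable parts only, so the hidden part ends up in the residuals.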
Simulation study: models and distribution of residuals
• LGD model: $R^2 = 0.29$.
• Log-EAD model: $R^2 = 0.25$.
[Figure: contour map of the density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ using KDE; axis label: LGD residual $\epsilon_{2,i}$.]
Simulation study: results
Model details               N train  N test  EL error  EL MC error  95% CI           % below Q2.5%  % above Q97.5%
-                           100      10      +1.38     +1.04        (-9.5,+9.8)      3              3
-                           100      100     -0.13     -0.47        (-3.1,+3.1)      4              10
-                           10       10      +0.14     -0.10        (-9.3,+10.1)     9              8
Bandwidth=high              100      10      +0.81     +5.96        (-10.2,+10.6)    15             0
Fix LGD                     100      10      -         -0.11        (-9.5,+9.7)      3              5
Fix LGD, $\epsilon_{2,i}=0$ 100      10      -         -5.90        (-8.5,+8.7)      0              34
Poor PD model               100      10      +11.4     +10.5        (-8.8,+11.6)     46             0
No EAD in LGD model         100      10      -5.84     -0.05        (-9.44,+9.73)    3              3
• $M = 5000$; each experiment is repeated 100 times.
• N train, N test are the numbers of examples in the train and test data sets (in 1000s).
• EL error = % error for analytic expected loss estimate, compared to actual loss.
• EL MC error = % error for Monte Carlo expected loss estimate.
• 95% CI is % difference from EL estimate.
• Main result: reliable and accurate predictions, but with a wide 95% CI
of about ±10%.
• Increasing the test sample size (N test = 100k): more accuracy,
but less reliability.
• Poor models (due to a small training set)
lead to poor reliability.
• Accuracy is sensitive to the bandwidth in
KDE: perhaps just use the empirical
distribution for sampling.
• Using a fixed value for LGD is fine,
so long as the residual error for LGD is
used in MC sampling.
A similar result holds when using a fixed
value for EAD.
• A poor PD model (just one predictor
variable) leads to poor reliability.
• No need to include EAD as a
predictor variable in the LGD
model.
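The error and reliability columns in the simulation results can be computed along the following lines. The sign convention (estimate minus actual) and the function names are assumptions for illustration; the slides only state that the errors are percentages relative to the actual loss.

```python
def pct_error(estimate, actual):
    """Signed percentage error of an estimate relative to the actual loss."""
    return 100.0 * (estimate - actual) / actual


def tail_rates(actual_losses, intervals):
    """Percentage of repeats where the actual loss falls outside the interval.

    intervals: one (q_low, q_high) pair per repeat, for example the
    simulated 2.5% and 97.5% loss quantiles.
    Returns (% below the lower bound, % above the upper bound).
    """
    n = len(actual_losses)
    below = sum(a < lo for a, (lo, hi) in zip(actual_losses, intervals))
    above = sum(a > hi for a, (lo, hi) in zip(actual_losses, intervals))
    return 100.0 * below / n, 100.0 * above / n
```

If the 95% interval is well calibrated, each tail rate should be close to 2.5% over repeated experiments; rates far from that indicate poor reliability.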
UK credit card data study
• Behavioural data for UK credit cards, observed during 2008-2011.
• Define default as 3 months of missed payments within a 12-month period.
• Predictor variables include client and account ages, application data
(employment status, tenure status, months at current address) and
behavioural data (balance, utilization, past delinquency).
• Build simple underlying models: logistic regression for PD, and OLS
linear regression for LGD and log-EAD.
• Train / test over two different periods:
Data set  Observation date  N train  N test
A         July 2008         21067    10533
B         September 2009    15525    7762
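Credit card data: models and distribution of residuals
• Data set A: LGD model $R^2 = 0.09$; log-EAD model $R^2 = 0.74$.
• Data set B: LGD model $R^2 = 0.11$; log-EAD model $R^2 = 0.81$.
[Figure: contour maps of the density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ using KDE, one panel each for data set A and data set B; axis label: LGD residual $\epsilon_{2,i}$.]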
Credit card data study: Results
                            Data set A                               Data set B
Model details               EL error  EL MC error  95% CI            EL error  EL MC error  95% CI
-                           -4.05     +2.03        (-14.7,+20.0)     -5.75     +0.09        (-17.9,+27.8)
Bandwidth=high              -5.00     +5.59        (-15.2,+20.7)     -4.30     +7.71        (-19.0,+29.3)
Fix LGD                     -6.24     -5.05        (-14.0,+18.3)     -2.54     -4.45        (-17.7,+28.6)
Fix LGD, $\epsilon_{2,i}=0$ -5.57     -6.53        (-12.9,+16.4)     -4.01     -8.56        (-15.1,+21.7)
Poor PD model               -5.65     -3.11        (-15.2,+20.5)     -4.66     -22.4        (-20.1,+31.9)
No EAD in LGD model         -5.12     -0.65        (-14.4,+19.2)     -2.26     +2.80        (-18.5,+32.4)
• $M = 10000$, average over 50 runs with different train / test splits.
• EL error = % error for analytic expected loss estimate, compared to actual loss.
• EL MC error = % error for Monte Carlo expected loss estimate.
• 95% CI is % difference from EL estimate.
• Monte Carlo simulation gives
accurate EL estimates, on
average. However, the CI is
broad (roughly ±20%).
• Accuracy is sensitive to the
bandwidth used in KDE.
• Accuracy is affected by using a
fixed value for LGD, with a similar
result for EAD. A poor PD model
(i.e. insufficient predictors) can also
give a bad result.
• No need to include EAD as a
predictor in the LGD model.
Credit card data: LGD/EAD model residuals
• When EAD is not explicitly included as a predictor in the LGD model,
the correlation between the LGD and log-EAD model residuals is
stronger, to compensate.
[Figure: contour maps of the density $f(\epsilon_{2,i}, \epsilon_{3,i} \mid Y_i = 1)$ using KDE, one panel each for data set A and data set B; axis label: LGD residual $\epsilon_{2,i}$.]
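The strength of the relationship between the two sets of residuals can be summarised with a Pearson correlation coefficient, sketched below. The slides show this relationship only through contour plots, so using this particular statistic and the function name are assumptions.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation between two equal-length lists of residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

Comparing this value for LGD models fitted with and without EAD as a predictor would put a number on the compensation effect described above.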
Conclusions and future work
• Monte Carlo simulation can be used to give reliable estimates of Loss,
and estimates of error in expected loss estimation.
• However, the estimates are sensitive to model risk: care is needed to ensure the underlying
models are correctly specified.
• Future work:
• Test procedure on other data (e.g. mortgage).
• Extend the exercise to include dynamic components:
environmental/macroeconomic conditions and forecasting.
• Use reliable prediction techniques (conformal predictors) to output
reliable confidence intervals, even with model error.
Thank you!
I hope you have found this presentation useful.
Any questions?
Dr Tony Bellotti
Senior Lecturer in Statistics
Department of Mathematics
Imperial College London
Part of the Statistics in Finance
Research Group at Imperial College
London.
Research, Training, Consultancy.
ICON: www.imperial-consultants.co.uk
www.imperial-business-partners.com