survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic...

18
1 Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software Specialist – Analytics, SAS R&D (India) Pvt. Ltd., and Naeem Siddiqi, Senior Solutions Specialist, SAS Institute Inc.) Introduction The recent global credit crisis has reiterated the need for bankers to focus on macro-economic factors (such as inflation, house price index, and so on) that severely impact the default behavior of credit portfolios. At the micro level, the response to these economic changes varies with individual capacity and tolerance. For example, although accounts within the same credit score or rating group show homogeneous behavior in default performance, the individual borrowers respond to macroeconomic shocks in a different manner than others in the same segment. This results in scattered default distributions within the predefined performance window. Because defaults are scattered across time, assessing the macro economic impact on a credit portfolio is a challenging task. In credit scoring, the traditional probability-of-default (PD) model is used to estimate the individual likelihood to default. This involves generating a sample of new or existing accounts at one point in time and monitoring the performance of those accounts across a specified performance window to determine their default or non-default status. It is generally assumed that the probability of default is constant through the entire performance window, and the outcome of the model is a probability of default value (PD) that is valid for the full window. Because the defaults are scattered across time in the performance window, PD varies with respect to time within the same window. Along with the PD, the time of default can be a key piece of information for bankers, allowing them to create better account management strategies. The traditional methodology makes it difficult to calculate PD by different time points and the time of default within the specified performance window. In this paper, we propose a survival analysis workflow for two purposes: to observe the impact of macro-economic shocks on default behavior of credit exposures to predict the time of default We intend to use traditional PD models in conjunction with a survival analysis technique (explained later in this paper) for overcoming the challenges associated with traditional PD models. In this paper, we explain a survival model called “Cox Proportional Hazard Model” that can be used to establish a relationship between macro-economic variables and the hazard

Transcript of survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic...

Page 1: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

1

Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software Specialist – Analytics, SAS R&D (India) Pvt. Ltd., and Naeem Siddiqi, Senior Solutions Specialist, SAS Institute Inc.)

Introduction

The recent global credit crisis has reiterated the need for bankers to focus on macro-economic factors (such as inflation, house price index, and so on) that severely impact the default behavior of credit portfolios. At the micro level, the response to these economic changes varies with individual capacity and tolerance. For example, although accounts within the same credit score or rating group show homogeneous behavior in default performance, the individual borrowers respond to macroeconomic shocks in a different manner than others in the same segment. This results in scattered default distributions within the predefined performance window. Because defaults are scattered across time, assessing the macro economic impact on a credit portfolio is a challenging task.

In credit scoring, the traditional probability-of-default (PD) model is used to estimate the individual likelihood to default. This involves generating a sample of new or existing accounts at one point in time and monitoring the performance of those accounts across a specified performance window to determine their default or non-default status. It is generally assumed that the probability of default is constant through the entire performance window, and the outcome of the model is a probability of default value (PD) that is valid for the full window. Because the defaults are scattered across time in the performance window, PD varies with respect to time within the same window. Along with the PD, the time of default can be a key piece of information for bankers, allowing them to create better account management strategies. The traditional methodology makes it difficult to calculate PD by different time points and the time of default within the specified performance window.

In this paper, we propose a survival analysis workflow for two purposes:

• to observe the impact of macro-economic shocks on default behavior of credit exposures

• to predict the time of default

We intend to use traditional PD models in conjunction with a survival analysis technique (explained later in this paper) for overcoming the challenges associated with traditional PD models. In this paper, we explain a survival model called “Cox Proportional Hazard Model” that can be used to establish a relationship between macro-economic variables and the hazard

Page 2: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

2

function (that is, the rate of change of probability of default at time t) at different time points in a observation window. The survival distribution of default can be estimated by this model. The survival function is used to determine the time of default. Further analysis is then performed to obtain the expected time-dependent default count distribution per segment. Finally, variances of macro-economic factors are used to stress-test the credit portfolio.

This paper also gives insights into the data requirements, SAS procedures for survival analysis, and the techniques to validate and monitor the workflow for accuracy and consistency.

Benefits of Using Survival Analysis in Credit Scoring

1. Modelers in banks are aware of the fact that macro-economic variables affect probabilities of default that are generated using models. These macro-economic variables can change at any time during the observation period. However, such variables, and in particular, the leading indicators, have not been included in the traditional logistic regression models used in banking. The impact of macro-economic variables on the PD value can be gauged by considering those variables as time-dependent covariates in the survival analysis model.

The survival function of default as well as the time of default can be derived from that survival model. Time of default here refers to the survival time of a particular account. Survival time is the time from when the account was opened to the date of default. If the account does not default by the end of the observation period, that observation can be considered as a censored event.

For analyzing the survival data, the hazard function gives the rate of change of probability of default at time t. ℎ( ) = lim∆ → ( ∆ | )∆

The probability of survival at time t can be given in terms of the hazard function: ( ) = exp{− ℎ( ) }

This is the probability of survival at time t. In credit scoring scenario, this is the probability that an account has not defaulted by time t after the account was opened (that is, 1-PD at time t). If the interval censoring is considered, then this indicates the probability that an account has not defaulted by time t after the start of the observation period.

There are several ways to generate the hazard function. We will use the Cox Proportional Hazard (Cox PH) model because it allows the inclusion of macro-economic variables as time-dependent covariates in the model. By Cox PH model,

Page 3: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

3

h (t) = λ (t) exp ( β X (t) + β X (t) + … β X (t))

where,

i is a subscript for observation i=1(1)n. h (t) = Hazard of the ith individual at time t.

λ0 (t) = Baseline hazard function. X (t) … … X (t) are the k time-dependent covariates of the ith individual at time t. β …..β are the regression parameters associated with the covariates, estimated by the Partial Likelihood method in Cox model.

For applying Cox PH model in the credit scoring environment, macro-economic variables should be captured at account default date, prior to default date (any lag period can be considered), at account open date, and at regular time intervals during the observation period. The change in macro-economic values between the account open date and the default date or prior to the default date (considering any lag period, for example, 6 months, 3 months, and so on) can also be considered as a time-dependent covariate in the model. The time of default and the default indicators in the observation window should also be captured. The survival probabilities for each of the individuals at failure time points can be estimated by using the baseline survival function and the parameter estimates β of β by maximum likelihood estimation and the macro-economic values at those time points. If a user changes the macro-economic values, the user can get the different survival probabilities of those accounts at default time points. In this way, the Cox PH model can help to see the impact of those variables on the probability of default.

2. Survival analysis can be used to determine the probability of default across time in an observation window. It is also possible to derive the survival function of default for future time points. This represents an advantage over the traditional PD-estimation methods, which limit the prediction to a predefined observation window. By generating PDs across time, a risk manager can adjust account treatment strategies according to an account’s risk level at that point in time, rather than rely on a single, long-term PD forecast that might not represent the account’s current risk levels.

Figure 1 shows an example of the probability of default across time in an observation window, as generated by using survival analysis. Figure 2 shows an example of the survival function of default across time in that observation window. Both the graphs are based on the same sample data. From Figure 1 and Figure 2, it is evident that as time passes, the probability of default increases, and the corresponding survival function decreases. At the early stages of the performance window, the change in the macro-economic values might not be significant to

Page 4: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

cause dconsideof the phigh. Hodefault

Figure 1

Figure 2

efault evenrably, which

performanceowever, as tmight incre

1:

2:

ts. Howeveh might force window, ttime passes

ease, and th

r, as time pace the accouhe probabil, due to chae survival p

4

asses, thoseunts to defality of defauanges in ecorobability m

e economic ault. Thus, itult is low, anonomic condmight decrea

conditions t indicates tnd the survivditions, the ase.

might vary that at an eaval probabiprobability

arly stage lity is of

Page 5: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

3. Bankeforecastbuild seinsight ihow thegives thsimple Psurvival

Figure 3subpopuaccountsegmenscore seaccount

Figure 3observesegmenFigure 4segmen

Figure 3

ers generallts for specif

eparate survinto how the PDs of fivee risk mana

PD predictio functions o

3 and Figureulations arets—that is, t

nt, the probaegments. Sets.

3 shows the ed that, for snts. Howeve4, it can be onts. Howeve

3:

ly build sevefic populatiovival modelse PDs of the

e separate sager deeperon for the wof default fo

e 4 are basee based on tthe highest-ability of de

egment E ha

trend of PDsegment A,

er, for segmeobserved th

er, for segme

eral segmenons in that ps for the vare various susubpopulatior insights int

whole perforor the variou

d on the samthe scores o-risk-profile

efault is highas the highe

D values for the distribu

ent E, the dhat for segment E, the su

5

nted modelsportfolio. Asrious segmeub-populatioons within ato account brmance winus segments

me sample of the accou accounts. T

her comparest-score acc

segments aution of PD istribution o

ment A, survurvival func

s for any givs an extensi

ents in the pons vary acra portfolio vbehavior thadow. Figures can be com

data. Let usnts. Segme

This means ed to the accounts—tha

across time.values acroof PD valuesival functiontion of defa

ven portfolioon of the ab

population tross time. Fivary across tan when coe 4 shows hmpared.

s assume thnt A has lowthat for the

ccounts thatat is, lowest

. In the figurss time is his across timn of defaultault is high.

o, to generabove, a useo generate igure 3 illusttime. Againmpared to oow the diffe

at the west-score e accounts int belong to t-risk-profile

re, it can beigh in all the

me is low. In t is low in al

ate better r can further trates , this one erent

n that higher-

e

e e the l the

Page 6: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

Figure 4

4. Usingof defauderive tFigure 5monitorcohort a

Figure 5accountmeans tthe accoaccountparticulsegmen

4:

g the survivault for each he expected

5 shows an er the observanalysis use

5 is based onts. Segmentthat for the ounts that bts—that is, tar time poin

nt E.

al functions account, wd number oexample of vation of poed widely in

n sample dat A has lowe

accounts inbelong to hithe lowest-rnt, the perc

generated ithin all the

of defaults fosuch an ana

ortfolios willthe industr

ata. Let us aest-score accn that segmgher-score risk-profile

centage of d

6

by the surv segments. or each segalysis for fiv recognize t

ry.

ssume thatcounts—thaent, the prosegments. Saccounts. F

default is rel

ival modelsWe can thement in con

ve pools acrothis chart as

segments aat is, the higobability of Segment E hrom Figure atively decr

s, it is possiben use thesensecutive fuoss five tims being simi

are based oghest-risk-pdefault is hihas the high5, it is evide

reasing from

ble to derivee predictionuture time pe points. Thlar to vintag

n the scoresrofile accouigher compahest-score ent that, form segment A

e the time ns to points. hose who ge or

s of the unts. This ared to

r a A to

Page 7: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

Figure 5

5. Becauimpact odefault changesvariable

Let us aFigure 6based o

Figure 6

5:

use macro-eof changes can be recas in default es, can be m

ssume that 6 shows the on certain m

6:

economic vain macro-ec

alculated at behavior, w

monitored.

five differeexpected p

macro-econo

ariables areconomic confuture time

which result

ent survival mpercentage oomic values.

7

e being usednditions on e points. Fig

due to chan

models havof default a.

d as time-dethe predicture 6 showsnges in five

ve been builtcross future

ependent coed default rs an exampselected m

t for five dife time point

ovariates herates and thle of how thacro-econo

fferent segmts for five se

ere, the he time of he mic

ments. egments

Page 8: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

Now, a on this son the mupdatedmight b

Table 1:

Time

2345

The usesegmenfrom thtime. It percentappearsthe varichanges

Figure 7

user might sort of repomacro-econd macro-ecoe interested

:

Updated 1 <value> 2 <value> 3 <value> 4 <value> 5 <value>

er can then rnts for thosee survival fuis also poss

tage of defas different fation in pes in the mac

7:

be interesteort. Now, letnomic valuesonomic valud to see its i

Macro-econ

recalculate te future timunctions, it sible to geneult distributrom the earrcentage of

cro-econom

ed to see wt us assumes for future

ues are signiimpact on t

nomic Values

the survivale points, fois possible terate same tion across rlier one (Figf default acr

mic values.

8

hat impact e that the us

time pointsificantly diffhe default b

s

l function ofr updated mto predict wkind of repotime for thegure 6). Thiross time (w

the changesser has the cs from any tferent frombehavior (Ta

f default formacro-econowhich accouort (Figure 6e five segmes sort of ana

within the su

s in macro-ecapability totime series m the previouable 1).

r each accouomic valuesnts will defa6) that definents. This realysis can heub-populatio

economic vao perform itmodel. Nowus values, th

unt within as (Table 1). Tault and at wnes the expeeport (Figurelp bankersons), with a

alues has teration

w, if the he user

all Then, what ected e 7)

s to see any

Page 9: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

Proce

Data rTherbe uand as thchanas it

ss Flow o

requiremere are no lim

used for thiswhether th

he followingnges in the t incorporat

infla

inte

unem

hous

cons

aver

stoc

hous

GDP

mon

of Surviva

ents mitations ons exercise. There is a busg are considindicator oves the direc

ation rate

rest rates a

mployment

sing price in

sumer price

rage person

k market in

sing starts a

P

ney supply

al Analys

n the numbeThe exact lissiness buy-indered. Althover a 3 to 6 ction in whic

nd bond yie

t rate and em

ndex

e index

al income g

dex

and similar s

9

sis in Cred

er or the nast will depenn for using t

ough the rawmonths perch that indic

elds

mployment

growth rate,

statistics su

dit Scorin

ature of macnd on the avthe data. In w value of eriod can be cator is mov

growth rat

, growth rat

ch as buildi

ng

cro-economvailability ofgeneral, lea

each indicatoa better preving.

e

te

ng permits

mic variablesf data in eacading indicaor can be usedictor of th

issued

s that can ch market ators such sed, he future,

Page 10: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

10

manufacturing output

corporate profits and inventory levels

In addition to the macro-economic variables above, demographic (postal code, residential status, age, and so on) and behavioral variables (account-related variables) can also be considered. If the segmentation was based on these variables, then for the Survival model, only the macro-economic variables should be considered. The default time of each defaulted account should be captured in the modeling data. If the account does not default in the outcome period, then the entire outcome period should be considered as survival time for that account. The default indicator should also be considered in the modeling data. Macro-economic variables should be captured at account default date, prior to default date (any lag period can be considered), at account open date, and at regular time intervals during the observation period. The change in macro-economic values between the account open date and the default date or prior to the default date (considering any lag period, for example, 6 months, 3 months, and so on) can also be considered as time-dependent covariate in the model. For the survival model, interval censoring can be considered. If the defaults are captured for a specific time period, then the censoring technique can be considered as interval censoring. Table 2 shows the structure of a sample modeling analytical base table (ABT) for using survival analysis in a credit-scoring environment. Consider a sample data consisting of the following: 1. ID (account identification number) 2. Time (time of default of the account in a predefined observation period) 3. Default (censoring status where 1 = default and 0 = censored) 4. Consider that a time-dependent covariate is captured at default dates cov1 – cov6 (a covariate value at six default times in the observation period). covi denotes a covariate value at the ith time and i=1(1)6. For instance, the first account defaults at time3. Therefore, up to time 3, the covariate values have been captured. The second account defaults at time 4. Therefore, up to time 4, the covariate values have been captured. The third account does not default in the outcome period. Therefore, up to time 6, its covariate values can be captured. The third account will be considered a censored event.

Page 11: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

11

Table2:

Account id time default cov1 cov2 cov3 cov4 cov5 cov6

1 3 1 <value> <value> <value> 2 4 1 <value> <value> <value> <value> 3 6 0 <value> <value> <value> <value> <value> <value>

Study the impact of macro-economic variables on default behavior (using Cox model by PROC PHREG)

Many types of models have been used for survival analysis. In this paper, the Cox proportional hazard model has been mentioned. In SAS, the PHREG procedure performs regression analysis of the survival data based on the Cox proportional hazard model.

1. To handle time-dependent covariate in PHREG procedure, the counting process style of input is used. The advantage of this format that user can easily get the residual and influence statistics. In this format, for each record, multiple records are crated, one record for each distinct pattern of the time-dependent measurements. Each record contains a T1 value and a T2 value, representing the time interval T1and T2 during which the values of the explanatory variables remain unchanged. Each record also contains the censoring status at T2.

Following is an example of a modeling data set that is created in counting process style of input format. PHREG can be run on this type of data set.

Table 3:

Account id Time default T1 T2 status covariate

1 3 1 0 1 0 <value> 1 3 1 1 2 0 <value> 1 3 1 2 3 1 <value>

2. Consider that we have a scoring data set, and we want to find the predicted survival probability of default across time for the accounts in that data set.

Table 4:

Page 12: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

12

Account id time covariate

1 1 <value> 1 2 <value> 1 3 <value> 1 4 <value>

PROC PHREG can be run on the modeling data set (Table 3) and can generate survival probability for the accounts in the scoring data set (Table 4). Following is a sample SAS code for generating survival probabilities.

Data Zerocov; covariate= 0; Run;

proc sort data = scoredata; by time; run;

proc phreg data= inputdata; ods output ParameterEstimates = para_est FitStatistics = fit_stat ; model (T1,T2)*Status(0)=covariate; baseline covariates=Zerocov out=Basesurv Survival=Basesurvival; id ID Time Default; run; proc sql; select Estimate into:Estimate from para_est ; quit; data Scoredata(drop=lastS);

retain lastS 1;

merge Scoredata Basesurv(rename=(t2=time)drop= covariate);

by time;

if missing(Basesurvival) then Basesurvival=lastS;

else lastS= Basesurvival;

survival= Basesurvival ** exp(&Estimate*covariate);

run;

Page 13: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

This wayfollowinprobabi

Table 5:

Accounid

A user cprobabimacro-e

3. One aresidual

proc p mode outp run;

In the duser pro

DifferenThe couwhich is

A. The M

Consideindicatecountin

y, it is possing table (Tabilities for ea

:

nt time

1 11 21 31 41 51 6

can change tility for eacheconomic va

advantage ols and influe

phreg da

el (T1,Tput out=

ev_est dataoceed on re

nt Survival unting proces known as t

Multiplicativ

er a set of n es the numbg process

ble to geneble 5) showch account

survival probabil

1 <value> 2 <value> 3 <value> 4 <value> 5 <value> 6 <value>

the covariath account foariables on

of using the ence statisti

ata= inpu

T2)*Statu= dev_est

a set, the vasidual analy

Methods ess formulatthe multipli

ve Hazards

accounts sober of obser

are step f

rate survivas the structacross time

lity

te values foor those timdefault beh

counting pics. To do th

utdata;

us(0)=cot xbe

riable “marysis of the m

tion enablescative haza

Model

o that the corved events unctions th

13

al probabiliture of the d

e:

or future timme points. Inhavior.

rocess formhis, you can

ovariate;ta = est

rting” contamodel.

s PROC PHRrds model.

ounting prothat occur oat have jum

ty for each adata set tha

me points ann this way, it

mulation is thuse the foll

; t resmar

ins marting

REG to fit a s

ocess over time t.

mps of size

account acrot contains t

nd can get dt is possible

hat you canlowing code

t= mart

gale residua

superset of

. The sample, with

oss time. Thhe survival

ifferent sure to see the

easily obtae:

ting ;

ls that can h

the Cox mo

for the ith e paths of t

. Let

he

rvival impact of

ain various

help the

odel,

account he

denote

Page 14: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

the vect is de

where,

indotherwi

is t

is

B. Piece

In PieceThen, cohazard f

ConsideLet

When h

where,

The bas

where,

tor of unknonoted by

dicates whetise

the vector o

an unspecif

e-wise Const

e-wise Constonsidering afunction is g

ering single f

hazards are i

seline cumu

own regress

ther the th).

of explanato

fied baseline

tant Baselin

tant Baselina baseline hgenerated.

failure time

in original s

lative hazar

sion coeffici

h account de

ory variables

e hazard fun

ne Hazard M

ne Hazard Mazard value

e variable cabe a

cale, the ha

rd function i

14

ients. The m

efaults at tim

s for the th

nction.

Model

Model, the pe for each of

ase, let partition of

azard functio

is denoted b

multiplicativ

me (specif

h account at

period is divf the interva

f the time ax

on for accou

by

e hazards fu

fically,

t time .

vided into cals, the base

be xis.

unt is den

unction

if defau

ertain intereline cumul

the observe

oted by

for

ults,

vals. ative

ed data.

Page 15: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

For usindata. Le

Then,

the form

The BAYbaselinecan be u

proc p ods mode baye run;

The codpartitionof event

Algor

After a mthe surv

The timTheoretto get a Now, if time mijudgme

ng counting et

.

should

mulation fo

YES statemee hazard moused.

phreg dataoutput Pa

el (T1,T2)es seed=1

de specifies ns the time ts in each in

rithm for P

model has bvival functio

e point at wtically, the s zero probathe predicteght be connt. It can va

process sty

be replaced

or the single

ent invokes odel (also kn

a= inputdaarameterEs)*Status(0piecewise

the numberaxis into th

nterval.

Predicting

been createon of default

which the susurvival probability of sured survival sidered the

ary industry

yle of input, be a partit

d with

e failure tim

a Bayesian nown as the

ata; stimates =0)=covariae=hazard (

r of intervalhe given num

g the Time

ed, it can be t for an acco

urvival probbability can rvival. To coprobability time of defto industry

15

let tion of the t

me variable

analysis of te piecewise

= para_estate; (n =interv

ls with consmber of inte

e of Defaul

used to preount can be

ability is zerbe zero. Hompensate tis less than fault. The th

y.

ime axis, w

e applies.

the Cox moexponentia

t ;

val);

stant baselinervals with a

lt

edict defaule generated

ro can be coowever, prathis, a user cor equal to

hreshold val

bhere

del or the pal model). Th

ne hazard raapproximate

ts. By using.

onsidered asctically, it mcan predefin

o the thresholue can be b

e the obser for all

piecewise cohe following

ates. PROC Pely equal nu

g the Cox PH

s the time omight not bene a threshoold value, thbased on bu

rved

onstant g code

PHREG umber

H model,

of default. e possible old value. hen that

usiness

Page 16: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

Generalopenedaccountconsideobservatime wo

Validat1. A sta

modpast(preof abe pclos

Figure 8

This gradefaultseach oth

In additbetwee

A measu

lly, the surv. If the accot was openered for the

ation periodould be the

tion Technandard pracdeling procet can be useedicted timen account is

predicted. Ine to the act

8:

ph is baseds of accounther.

ion to a visun the actua

ure can be c

vival time ofount never ded to the ensurvival mo

d starts to thentire obse

ique for Prctice in any ess. For the ed. The surve to default)s known forn an ideal sctual time of

on sample ts are plotte

ual confirmal and the pr

calculated a

f an accountdefaults, thend of the obodel, then thhe date of dervation per

redicting Tpredictive mvalidation p

vival functio) based on ar the develocenario, thedefault.

data. The aed across tim

ation, someredicted obs

as:

16

t is measureen the surviservation phe survival tefault. If th

riod.

Time of Defmodeling prprocess, a van is applied

a specified topment datae predicted t

actual numbme. In an id

e statistics cservations.

ed from the val time caneriod. If thetime can bee account d

fault roject is to validation da on this dat

threshold vaa set. The timtime of defa

ber of defaueal case, bo

an be gener

date when n be measue interval ce measured

does not def

validate theata set froma set to gen

alue. The acme of defauault for an a

lts and the oth the plots

rated to me

the accounred from w

ensoring is from when fault then su

e outcome om some poinnerate scorectual time ofult of an accaccount sho

predicted ns should be

easure the d

nt was hen the

the urvival

of the t in the

es f default count can uld be

number of close to

difference

Page 17: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

17

( ℎ ℎ − ℎ )

Where, n is the total number of accounts in the data.

The absolute difference of the actual and the predicted time of default can be considered. The discriminatory power of a time of default model gets better as the above measure gets closer to zero, indicating a closer match between the predicted and actual values.

Some other statistics (for example, Brier score) can also be generated in this scenario. A non-parametric test, such as median test, can also be performed to compare the median actual time of default with the median predicted time of default.

Page 18: survival analysis workflow...Survival Analysis Workflow assessing the impact of Macro-Economic Shocks on Credit Portfolios and predicting the Time of Default (By Anupam Saha, Software

18

Acknowledgements

I would like to thank Mrs. Billie Anderson, Mr. Ying So, Mr. David Park, Mr. Rajesh Khanwelkar, and the SAS Credit Scoring for Banking team.

References

Paul D. Allison. Survival Analysis using SAS A Practical Guide

SAS/STAT 9.2 User’s Guide

Elisa T Lee, John Wenyu Wang. Statistical methods for survival data analysis

John P Klein, Melvin L. M. Moeschberger. Survival analysis: techniques for censored and

truncated data

Shenyang Guo. Survival Analysis

J D Kalbfleisch, Ross L. prentice. The statistical analysis of failure time data

Miller R.G. 1981 “Survival Analysis.” New York Wiley

Peto R. 1973. ”Experimental Survival curves for interval-Censored Data.” Applied Statistics 22:86-93

Alan B. Cantor. SAS Survival Analysis Techniques

Jeraid F. Lawless. Statistical models and methods for lifetime data

David W.Hosmer, stanleylemeshow. Applied Survival Analysis: Regression Modeling of Time to Event Data (Wiley Series in Probability and Statistics)