Combining GLM and data mining techniques
Greg Taylor
Taylor Fry Consulting Actuaries
University of Melbourne
University of New South Wales
Casualty Actuarial Society
Special Interest Seminar on Predictive Modeling
Boston, October 4-5 2006
Overview
• Examine general form of model of claims data
• Examine the specific case of a GLM to represent the data
• Consider how the GLM structure is chosen
• Introduce and discuss Artificial Neural Networks (ANNs)
• Consider how these may assist in formulating a GLM
• Presentation draws heavily on work of colleague Dr Peter Mulquiney
Model of claims data
• General form of claims data model
$Y_i = f(X_i; \beta) + \varepsilon_i$
• $Y_i$ = some observation on claims experience
• $\beta$ = vector of parameters that apply to all observations
• $X_i$ = vector of attributes (covariates) of i-th observation
• $\varepsilon_i$ = vector of centred stochastic error terms
• Examples
  1. $Y_i = Y_{ad}$ = paid losses in (a,d) cell
     • a = accident period
     • d = development period
  2. $Y_i$ = cost of i-th completed claim
Examples (cont’d)
• $Y_{ad}$ = paid losses in (a,d) cell
  1. $E[Y_{ad}] = \beta_d \sum_{r=1}^{d-1} Y_{ar}$ (chain ladder)
  2. $E[Y_{ad}] = A d^b \exp(-cd) = \exp[\alpha + \beta \ln d - \gamma d]$ (Hoerl curve for each accident period’s payments)
• $Y_i$ = cost of i-th completed claim
  • $Y_i \sim$ Gamma
  • $E[Y_i] = \exp[\alpha + \beta t_i]$
  where
  1. $a_i$ = accident period to which i-th claim belongs
  2. $t_i$ = operational time at completion of i-th claim = proportion of claims from the accident period $a_i$ completed before i-th claim
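The three mean structures above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the parameter values would come from a fitted model, and `operational_times` assumes that every claim of the accident period is listed unless a total count is supplied.

```python
import math


def chain_ladder_mean(beta_d, payments_to_date):
    """E[Y_ad] = beta_d * (sum of Y_ar for r = 1, ..., d-1)."""
    return beta_d * sum(payments_to_date)


def hoerl_mean(d, alpha, beta, gamma):
    """E[Y_ad] = exp(alpha + beta*ln(d) - gamma*d).

    Same curve as A * d**b * exp(-c*d) with A = exp(alpha), b = beta, c = gamma.
    """
    return math.exp(alpha + beta * math.log(d) - gamma * d)


def operational_times(completion_dates, n_claims=None):
    """Operational time t_i for the claims of ONE accident period.

    t_i = (number of claims completed before claim i) / (claims in the period).
    Assumption: if n_claims is not given, every claim in the period is listed.
    Tied completion dates are ranked in input order.
    """
    n = len(completion_dates) if n_claims is None else n_claims
    order = sorted(range(len(completion_dates)), key=lambda i: completion_dates[i])
    t = [0.0] * len(completion_dates)
    for rank, i in enumerate(order):
        t[i] = rank / n
    return t


def expected_claim_size(t_i, alpha, beta):
    """E[Y_i] = exp(alpha + beta * t_i), the mean of the Gamma claim-size model."""
    return math.exp(alpha + beta * t_i)
```

In practice the number of claims in an accident period is itself an estimate (claims still open or not yet reported), which is why `n_claims` can be passed explicitly.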
Examples of individual claim models
More generally
$E[Y_i]$ = exp {function of operational time
+ function of accident period (legislative change)
+ function of completion period (superimposed inflation)
+ joint function (interaction) of operational time & accident period (change in payment pattern attributable to legislative change)}
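A sketch of how such a linear predictor might be assembled term by term is given below. Every functional form and coefficient here is hypothetical (a linear operational time effect, a step change at an assumed accident quarter `change_q`, a constant rate of superimposed inflation); the source does not specify them.

```python
import math


def linear_predictor(optime, accident_q, completion_q, change_q=20):
    """Sum of four illustrative terms; all forms and numbers are assumptions."""
    post_change = 1.0 if accident_q >= change_q else 0.0

    f_optime = 8.0 + 4.0 * optime              # operational time (payment pattern)
    f_accident = -0.3 * post_change            # accident period (legislative change)
    f_completion = 0.01 * completion_q         # completion period (superimposed inflation)
    f_interaction = 0.5 * post_change * optime # changed payment pattern after the change

    return f_optime + f_accident + f_completion + f_interaction


def expected_claim_size(optime, accident_q, completion_q, change_q=20):
    """E[Y_i] = exp(linear predictor), i.e. a log link."""
    return math.exp(linear_predictor(optime, accident_q, completion_q, change_q))
```

The interaction term is the product of the accident-period indicator and operational time, so it only alters the payment pattern for accident quarters after the assumed change.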
Examples of individual claim models (cont’d)
• Models of this type may be very detailed
• May include
  • Operational time effect (payment pattern)
  • Seasonality
  • Creeping change in payment pattern
  • Abrupt change in payment pattern
  • Accident period effect (legislative change)
  • Completion quarter effect (superimposed inflation)
  • Variations in superimposed inflation over time
  • Variations of superimposed inflation with operational time
  • etc
Identification of data features
• Typically largely ad hoc, using
  • Trial and error regressions
  • Diagnostics, e.g. residual plots
Identification of data features - illustration
• Modelling about 60,000 Auto Bodily Injury claims
• First fitting just an operational time effect
[Figure: Linear Predictor (vertical axis, 8.0 to 12.5) plotted against operational time (optime)]
Identification of data features - illustration
• But there appear to be unmodelled trends by
  • Accident quarter
  • Completion (finalisation) quarter
[Figure: Studentized Standardized Deviance Residuals by finalisation quarter, Dec-94 to Sep-03; vertical axis from -2.5 to 2.0]
[Figure: Studentized Standardized Deviance Residuals by accident quarter, Sep-94 to Sep-03; vertical axis from -8 to 10]
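A diagnostic of this kind can be approximated by averaging residuals within each quarter and looking for a drift away from zero. The sketch below assumes the residuals have already been computed by the fitting software (the slides use studentized standardized deviance residuals); the function name and the quarter labels are illustrative.

```python
from collections import defaultdict


def mean_residual_by_group(residuals, groups):
    """Average residual within each group (e.g. accident or finalisation quarter).

    A well-specified model should give group means scattered around zero,
    with no trend across successive quarters.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r, g in zip(residuals, groups):
        sums[g] += r
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}
```

For example, `mean_residual_by_group(resid, accident_quarter)` gives one number per accident quarter, which can then be plotted in time order.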
Identification of data features - illustration
• Final model includes terms for:
  • Operational time
  • Seasonality
  • Claim frequency
    • Decrease induces increased claim sizes
  • Accident quarter
    • Change in Scheme rules
  • Change in operational time effect with change in Scheme rules
  • Superimposed inflation
    • Varying with operational time
Identification of data features – alternative approach
• Final model is complex in structure
• Structure identified in ad hoc manner
• More rigorous approach desirable
• Try Artificial Neural Network (ANN)
  • Essentially a form of non-linear regression
(Feed-forward) ANN for regression problem Y = f(X)
• Start with vector of P inputs $X = \{x_p\}$
• Create hidden layer with M hidden units
  • Make M linear combinations of inputs
    $h_m = \sum_p w_{pm} x_p$
  • Linear combinations then passed through layer of activation functions $g(h_m)$
    $Z_m = g(h_m) = g\left(\sum_p w_{pm} x_p\right)$
ANN for Regression problem Y = f(X)
• Activation function
  • Commonly a sigmoidal curve
    $g(h) = \frac{1}{1 + e^{-h}}$
  • Function introduces non-linearity to model and keeps response bounded
[Figure: g(h) plotted against h for h from -5 to 5; g(h) rises from 0 to 1]
ANN for Regression problem Y = f(X)
• Y is then given by a linear combination of the outputs from the hidden layer
  $Y = \sum_m W_m Z_m = \sum_m W_m \, g\left(\sum_p w_{pm} x_p\right)$
• With enough hidden units, this function can approximate any continuous function
• An ANN with 2 hidden layers can approximate any function
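The single-hidden-layer formula can be sketched in plain Python as below. This is a minimal illustration rather than the presenter's implementation: the nested-list layout `w[p][m]` and the function names are assumptions, and, following the formula literally, there are no bias (intercept) terms.

```python
import math


def sigmoid(h):
    """g(h) = 1 / (1 + exp(-h)), written to avoid overflow for large |h|."""
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    e = math.exp(h)
    return e / (1.0 + e)


def ann_forward(x, w, W):
    """Output Y of a feed-forward network with one hidden layer.

    x : list of P inputs x_p
    w : P x M nested list, w[p][m] = weight from input p to hidden unit m
    W : list of M weights from the hidden units to the output
    """
    P, M = len(x), len(W)
    h = [sum(w[p][m] * x[p] for p in range(P)) for m in range(M)]  # h_m
    Z = [sigmoid(h_m) for h_m in h]                                # Z_m = g(h_m)
    return sum(W[m] * Z[m] for m in range(M))                      # Y
```

With all input weights set to zero every hidden unit outputs g(0) = 0.5, so the network returns half the sum of the output weights, which is a convenient sanity check.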
Illustration of ANN
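[Figure: network diagram showing inputs $X_i$, input weights $w_m$, linear combinations $h_m$, activation function $g$, hidden units $Z_m$, output weights $W_m$ and output $Y$]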
Training of ANN
• Weights are usually determined by minimising the least-squares error
  $Err = \frac{1}{2} \sum_{i=1}^{N} (y_i - f(X_i))^2$
• Weight decay penalty function stops overfitting
  $Err + \lambda \left( \sum_m W_m^2 + \sum_m \sum_p w_{pm}^2 \right)$
  • Larger λ → smaller weights
  • Smaller weights → smoother fit
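The penalised criterion and one gradient-descent update can be sketched as follows. Plain batch gradient descent is an assumption made for illustration; the source does not say which optimiser was used, and the names `penalised_error`, `gradients` and `gradient_step` are hypothetical.

```python
import math


def sigmoid(h):
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    e = math.exp(h)
    return e / (1.0 + e)


def forward(x, w, W):
    """Return (hidden outputs Z, prediction) for one input vector x."""
    Z = [sigmoid(sum(w[p][m] * x[p] for p in range(len(x)))) for m in range(len(W))]
    return Z, sum(Wm * Zm for Wm, Zm in zip(W, Z))


def penalised_error(X, y, w, W, lam):
    """0.5 * sum of squared errors + lam * (sum of all squared weights)."""
    err = 0.5 * sum((yi - forward(xi, w, W)[1]) ** 2 for xi, yi in zip(X, y))
    penalty = sum(Wm ** 2 for Wm in W) + sum(v ** 2 for row in w for v in row)
    return err + lam * penalty


def gradients(X, y, w, W, lam):
    """Gradient of penalised_error with respect to w and W (back-propagation)."""
    P, M = len(w), len(W)
    grad_W = [2.0 * lam * W[m] for m in range(M)]
    grad_w = [[2.0 * lam * w[p][m] for m in range(M)] for p in range(P)]
    for xi, yi in zip(X, y):
        Z, pred = forward(xi, w, W)
        delta = pred - yi                              # d(error)/d(prediction)
        for m in range(M):
            grad_W[m] += delta * Z[m]
            back = delta * W[m] * Z[m] * (1.0 - Z[m])  # through the sigmoid
            for p in range(P):
                grad_w[p][m] += back * xi[p]
    return grad_w, grad_W


def gradient_step(X, y, w, W, lam, rate):
    """One batch gradient-descent update; returns new (w, W)."""
    grad_w, grad_W = gradients(X, y, w, W, lam)
    new_w = [[w[p][m] - rate * grad_w[p][m] for m in range(len(W))] for p in range(len(w))]
    new_W = [W[m] - rate * grad_W[m] for m in range(len(W))]
    return new_w, new_W
```

The `2.0 * lam * weight` terms are what pull the weights towards zero: a larger `lam` shrinks them more strongly at every step.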
Training of ANN - example
• Training data set: 70% of available data
• Test data set: 30% of available data
• Network structure:
  • Single hidden layer
  • 20 units
  • Weight decay λ=0.05
• These tuning parameters determined by cross-validation
• Prediction error in test data set
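A 70%/30% split of this kind can be sketched as follows. The random shuffle, the seed and the function name are assumptions; the source does not say how claims were allocated to the two sets.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=0):
    """Randomly split records into (training set, test set).

    The split is reproducible for a given seed. The input is not modified.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

Tuning parameters such as the number of hidden units and λ would then be chosen using the training portion only, with the test portion held back for the final comparison of prediction error.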
Comparison of GLM and ANN
• GLM
Average absolute error =$33,777
• ANN
Average absolute error =$33,559
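The slides do not define the error measure. Assuming "average absolute error" means the mean of |actual − predicted| over the claims in the test set, it could be computed as in this sketch (the function name is hypothetical):

```python
def average_absolute_error(actual, predicted):
    """Mean of |actual - predicted| over paired observations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Applying the same function to the GLM and ANN predictions on the same test claims gives directly comparable figures.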
GLM and ANN forecasts
• Both by simple extrapolation of trends here
• ANN case
  • Development quarter 10: red
  • Development quarter 20: green
  • Development quarter 30: yellow
  • Development quarter 40: blue
• Note negative superimposed inflation
  • May be undesirable
[Figure: ANN extrapolation]
• But ANN useful in searching out general form of past superimposed inflation
  • Which can then be modelled explicitly in GLM
Application of ANN
• Generalisation of preceding remark
• ANN may be most useful as an automated tool for seeking out detailed trends in data
  • Apply ANN to data set
  • Study trends in fitted model against a range of predictors or pairs of predictors
  • Use this knowledge to choose the functional forms of the terms included in the linear predictor of the GLM
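One common way to study the trend of a fitted model against a single predictor is a partial dependence curve: fix that predictor at each value of a grid, leave the other predictors as observed, and average the predictions. The source does not name a specific technique, so this is an assumed choice; `model` stands for any fitted predictor (ANN or otherwise) callable on one row of inputs.

```python
def partial_dependence(model, rows, feature_index, grid):
    """Average prediction as one predictor is varied over a grid.

    model         : callable taking a list of predictor values, returning a number
    rows          : observed predictor vectors (list of lists), assumed non-empty
    feature_index : position of the predictor being studied
    grid          : values at which to evaluate that predictor
    """
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)            # copy so the data is not altered
            modified[feature_index] = value
            total += model(modified)
        curve.append(total / len(rows))
    return curve
```

Plotting the returned curve against `grid` (for example, against completion quarter) suggests a shape, such as linear, piecewise linear or step, for the corresponding GLM term.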
Application of ANN (cont’d)
• Ultimate test of the GLM is to apply ANN to its residuals, seeking structure
  • There should be none
• The example indicates that the chosen GLM structure may:
  • Over-estimate the more recent experience at the mid-ages of claim
  • Under-estimate it at the older ages
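The slides use an ANN for this residual check. As a much simpler stand-in (not the method of the source), residuals can be averaged within two-way cells, for instance period band by age-of-claim band, and cells whose mean is far from zero can be flagged. The 2-standard-error threshold and all names below are assumptions, and no allowance is made for testing many cells at once.

```python
import math
from collections import defaultdict


def flag_residual_cells(residuals, row_keys, col_keys, z=2.0):
    """Return {(row, col): mean residual} for cells whose mean residual
    lies more than z standard errors away from zero."""
    cells = defaultdict(list)
    for r, a, b in zip(residuals, row_keys, col_keys):
        cells[(a, b)].append(r)

    flagged = {}
    for key, values in cells.items():
        n = len(values)
        if n < 2:
            continue  # a standard error cannot be estimated from one residual
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
        std_error = math.sqrt(variance / n)
        if std_error > 0 and abs(mean) > z * std_error:
            flagged[key] = mean
    return flagged
```

A positive flagged mean (observed above fitted) points to under-estimation by the GLM in that cell, and a negative one to over-estimation, assuming residuals are defined as observed minus fitted.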
Conclusions
• GLMs provide a powerful and flexible family of models for claims data
• Complex GLM structures may be required for adequate representation of the data
  • The identification of these may be difficult
  • The identification procedures are likely to be ad hoc
• ANNs provide an alternative form of non-linear regression
  • These are likely to involve their own shortcomings if left to stand on their own
  • They may, however, provide considerable assistance if used in parallel with GLMs to identify GLM structure