Using Machine Learning Techniques to Enhance Predictive Models 1030... · Using Machine Learning...
Transcript of Using Machine Learning Techniques to Enhance Predictive Models 1030... · Using Machine Learning...
17th17th
7 – 10 November 2010 Sheraton Mirage, Gold Coast
Using Machine Learning Techniques to Enhance Predictive Models
Nelson Henwood, Wei Lin and John Yick
Finity Consulting, 2010
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
What is Machine Learning?• “…computer algorithms that improve
automatically through experience”
• Learner (algorithm) processes data representing past experiences and tries to either:– Develop an appropriate response to future data –
Supervised Learning– Understand the relationships between the data
components – Unsupervised Learning
[ Mitchell,T. (1997) Machine Learning. McGraw Hill]
[Lecture Notes, CIS 526, Temple University]
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Machine Learning Algorithms• Include:
– Decision Trees– Neural Networks– Support Vector Machines (SVMs)– Gaussian process regression– Ridge Regression– Nearest Neighbours– Partial least squares
• Presentation results are based on Talon algorithms:– Supervised learning– Proprietary algorithms– Fusion of techniques– Tailored for insurance data
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Ensemble Techniques• Systems for combining predictions from multiple models
to obtain a more powerful predictor than could be obtained from any of the constituent models
• Requires:– Prediction diversity– Combination rules
Ensemble System How it achieves diversity?Combination method -
Classification / Regression
Bagging Bootstrap resampling Majority vote / mean
Random Forests Bagging + random selection of predictors Majority vote / mean
Boosting Biased resampling Weighted average majority vote / weighted mean
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
How is ML different to current predictive modelling practice? (GLMs)
Current Practice Assumes that variables are
independent unless specifically defined otherwise “Optimal” predictors are based
on assumptions Can’t solve what you don’t
know
Large potential volume of interactions results in only a small subset being investigated
Pricing cost models are done at a peril level versus a policy or customer level
Additional model assumptions required eg linearity, error distribution
Machine Learning Allows data to interact naturally to
find the patterns between characteristics within the data
“Data Speaks For Itself”
Does not require the user to specify the predictors and interactions to be included in the model - it discovers them
Extremely Fast and Efficient
Can be performed at peril, policy or customer level
No assumptions regarding linearity, response distribution
Current Practice Assumes that variables are
independent unless specifically defined otherwise “Optimal” predictors are based
on assumptions Can’t solve what you don’t
know
Large potential volume of interactions results in only a small subset being investigated
Pricing cost models are done at a peril level versus a policy or customer level
Additional model assumptions required eg linearity, error distribution
Machine Learning Allows data to interact naturally to
find the patterns between characteristics within the data
“Data Speaks For Itself”
Does not require the user to specify the predictors and interactions to be included in the model - it discovers them
Extremely Fast and Efficient
Can be performed at peril, policy or customer level
No assumptions regarding linearity,assumed error distribution much less important
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Signal ValidationValidation by Random Partition• Training using randomly
selected 70% of the data• Validate on remaining 30%
A More Cautious Strategy• 3-way split of the data• Training using say 50%-70%
of the data• Validate using 20% (or random
cross-section)• Use 30% unseen data for post-
modelling test
Segmentation Validation
0%10%20%
30%40%50%60%
70%80%90%
5 10 9 1 3 4 7 8 2 6
Training
Validation
0%10%20%30%40%50%60%70%80%90%100%
12 5 13 10 2 11 4 8 7 9 16 14 18 6 15 1 3 17
Loss
Rat
io
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Processes Used in Talon for Analysis• Complete Segmentation – partitions the portfolio into mutually
exclusive and exhaustive segments where each segment contains policies that have common attributes and have similar and consistent loss ratio, frequency, severity, loss cost, profit per policy, retention or conversion
• Multi-split – undertakes multiple segmentation runs and incorporates a random element in the splitting process – doesn’t always take the “optimum path”. Can result in different areas of the data being explored and can lower the risk of finding a local minima
• Scoring – a practically continuous segmentation of a portfolio achieved through application of ensemble techniques. Produces greater differentiation than Complete Segmentation with respect to the factor of interest
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Our Business Problem• Build an accurate predictive model of claims cost for a
comprehensive motor portfolio• Frequency and size models constructed at peril (group) level• GLMs and Decision Tree techniques (CART) were employed
• Some of our constraints include:– Model assumptions:
• inherent in the GLM – linearity, error distribution• typically assume independence of perils and frequency / size
– Sleeping and ageing
• Can machine learning techniques be used to find consistent signals in the model residuals and thus improve the predictions?– Results shown for at-fault collision models and total cost
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
At-fault Collision models fit “well”• For frequency we have:
– Reasonable gains chart– Model Pearson residual 38% lower than null model– CART won’t split the residual further than shown with 5,000
exposure year threshold
Gains Chart - GLM for Multi Vehicle at fault collision
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Key Parameters for Segmentation• Our segmentation target is the actual vs modelled claim
frequency / size• Minimum volume threshold for a segment is a key
parameter for segmentation to ensure a credible segmentation– Frequency and Cost – exposure– Size – number of claims
• One test we undertook was to explore a number of different volume thresholds using a Complete Segmentation
• We examined the degree of segmentation vs persistency of the signal– Lift = ratio of prediction (A vs E) in highest to lowest segment– Correlation is measured on A vs E across segments between
the training (70%) and validation (30%) data
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Frequency – Volume Threshold TestingLift vs Correlation
30
35
40
45
70
50 65
55
60
120%
125%
130%
135%
140%
145%
150%
155%
160%
165%
80% 85% 90% 95% 100%
Correlation
Lift
Selected Volume Threshold
Complete Segmentation for Volume Threshold (000)
Change in Gini vs Correlation
30
35
40
45
70
5065
55
60
0.000
0.001
0.002
0.003
0.004
0.005
0.006
0.007
80% 85% 90% 95% 100%
Correlation
Cha
nge
in G
ini -
Val
idat
ion
Lift vs Correlation
30
35
40
45
70
50 65
55
60
120%
125%
130%
135%
140%
145%
150%
155%
160%
165%
80% 85% 90% 95% 100%
Correlation
Lift
Selected Volume Threshold
Complete Segmentation for Volume Threshold (000)
Change in Gini vs Correlation
30
35
40
45
70
5065
55
60
0.000
0.001
0.002
0.003
0.004
0.005
0.006
0.007
80% 85% 90% 95% 100%
Correlation
Cha
nge
in G
ini -
Val
idat
ion
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Size – Volume Threshold TestingLift vs Correlation
1
1.5 2
2.5
3
4
3.5
4.55
110%
115%
120%
125%
130%
135%
140%
80% 85% 90% 95% 100%
Correlation
Lift
Selected Volume Threshold
Complete Segmentation for Volume Threshold (000)
Change in Gini vs Correlation
1
2
5
1.5
2.53
4.5
3.5
4
0.0000
0.0002
0.0004
0.0006
0.0008
0.0010
0.0012
0.0014
0.0016
0.0018
80% 85% 90% 95% 100%
Correlation
Cha
nge
in G
ini -
Val
idat
ion
Lift vs Correlation
1
1.5 2
2.5
3
4
3.5
4.55
110%
115%
120%
125%
130%
135%
140%
80% 85% 90% 95% 100%
Correlation
Lift
Selected Volume Threshold
Complete Segmentation for Volume Threshold (000)
Change in Gini vs Correlation
1
2
5
1.5
2.53
4.5
3.5
4
0.0000
0.0002
0.0004
0.0006
0.0008
0.0010
0.0012
0.0014
0.0016
0.0018
80% 85% 90% 95% 100%
Correlation
Cha
nge
in G
ini -
Val
idat
ion
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Multi Split RunsAverage Size
110%
115%
120%
125%
130%
135%
50% 60% 70% 80% 90% 100%
Correlation
Lift
Frequency
110%
115%
120%
125%
130%
135%
140%
145%
150%
50% 60% 70% 80% 90% 100%
Correlation
Lift
Individual Multi-Split RunComplete SegmentationSelected Segmentation
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Complete Segmentation - Frequency• A number of segments are identified which show
significant and persistent deviation from the model fit
60%
80%
100%
120%
140%
2005 2006 2007 2008Year
A vs
E
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Frequency Signal Validation• Signal persists across the validation data – better
persistence at the extremesActual / Modelled Frequency
70%
80%
90%
100%
110%
120%
130%
13 4 7 9 1 14 12 2 11 3 6 5 10 8 15 16Segments
Training Validation
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Frequency Example Segments• The segments identified are quite complex• Minimum of 4 variable conditions required to describe• A vs E shown over training + validation data
Segment # % Exposure
Actual / Model Segment
13 8% 0.82 6 way interaction involving vehicle characterstics, NCB level, policy status and others
4 5% 0.87 7 way interaction involving vehicle characterstics, NCB level, policy status and others
.. .. .. ..
5 5% 1.07 4 way interaction involving vehicle characteristics, NCB level and policy status
10 6% 1.09 4 way interaction, including vehicle characteristics, policy duration and NCB level
15 6% 1.11 4 way interaction, including vehicle characteristics, policy duration and NCB level
8 7% 1.11 4 way interaction including age, NCB level and others
16 6% 1.20 4 way interaction including vehicle characteristics, policy duration and others
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Frequency Scoring• The scoring run shows a strong persistent signal• The signal validates quite well across the entire spectrum of mis-fit
identifiedActual / Modelled Frequency
60%
70%
80%
90%
100%
110%
120%
130%
0%-5%
5%-10%
10%-15%
15%-25%
25%-35%
35%-50%
50%-65%
65%-75%
75%-85%
85%-90%
90%-95%
95%-100%
Score Percentile
Training Validation
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Frequency – Scoring vs Segmentation• Scoring delivers more lift than segmentation – here shown across
the validation data
Actual / Modelled Frequency
70%
80%
90%
100%
110%
120%
130%
0%-5%
5%-10%
10%-15%
15%-25%
25%-35%
35%-50%
50%-65%
65%-75%
75%-85%
85%-90%
90%-95%
95%-100%
Score Percentile
Segmentation (Val) Scoring (Val)
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Modification of GLM Prediction -Frequency
• Data divided into deciles based on GLM predicted frequency• Chart illustrates how the Complete Segmentation and Scoring “pull-
apart” the GLM prediction• Nb: log scale used
0% 1% 10% 100%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
GLM
Fre
q Pe
rcen
tile
Claim Frequency
GLM prediction rangeSegmentation upperSegmentation lowerScoring upperScoring lower
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Complete Segmentation - Average Size• Analysis of the average size model also reveals a number of
segments with significant deviation from the model fit
80%
90%
100%
110%
120%
130%
2005 2006 2007 2008Year
A vs
E
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Average Size – Signal validation• The lift is not as great as for the frequency model• Persistence of the signal across the validation data is good at the
extremesActual / Modelled Average Size
80%
85%
90%
95%
100%
105%
110%
115%
120%
7 8 10 3 9 4 1 2 6 5Segments
Training Validation
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Modification of GLM Prediction - Size• The segmentation and scoring also significantly “pull-
apart” the GLM prediction across the range of average size
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
GLM
Siz
e Pe
rcen
tile
Average Size ($)
GLM prediction rangeSegmentation upperSegmentation lowerScoring upperScoring lower
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Cost = modelled frequency x modelled size?• We adjusted the component GLM predictions by the output from the scoring runs• Determined a cost per policy prediction as the product of these “enhanced” models• Ran a Complete Segmentation at cost level• Discovered a few segments where the cost prediction did not fit well• Examined the component model fit for those segments (also checked the base GLM)• The size model is the culprit but this is not clear until we look at the fitted average
size weighted by the expected frequency
Segment % Exposure Freq Size1 Cost Size2 Freq Size1 Cost Size2
1 5.8% 96 99 87 90 99 93 84 852 8.9% 99 97 91 91 100 96 91 91
Other 78.0% 100 100 100 100 100 100 100 1003 7.2% 103 104 126 122 108 105 134 124
Scoring Enhanced Base GLMActual vs Expected
1 – Based on scoring of policies with a claim used in size model
2 – Implied average size based on cost scoring of entire exposure file
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Scoring Run against Total Cost Prediction• Chart shows a scoring run against a CART enhanced total cost
prediction• Model shows good lift and good persistence of signal across the
validation data
Actual / Modelled Total Cost
70%
80%
90%
100%
110%
120%
130%
140%
0%-5%
5%-10%
10%-15%
15%-25%
25%-35%
35%-50%
50%-65%
65%-75%
75%-85%
85%-90%
90%-95%
95%-100%
Score Percentile
Training Validation
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Using Machine Learning• Why use a GLM? (vs ensemble prediction)
– Enables a deep understanding of drivers– Forces us to get close to the data and understand it– “Easy” to communicate
• There are limitations– Linearity– Response distribution– Independence– Practicality
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Predictive Model Improvement• Component model by peril:
– Use segmentation to guide search for complex interactions– Incorporate these findings into the GLM
• Perform segmentation on peril cost prediction– Adjust prediction outside the GLM
• Examine total cost prediction– Perform segmentation and adjust outside the GLM
• Use scoring to incorporate an additional complex layer which improves predictive power
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Other Applications of Machine Learning
Useful segmentation targets include:• Loss ratio (or COR)• Joint profitability / sales or profitability /
competitive position• Instability• Price elasticity
7 – 10 November 2010 Sheraton Mirage, Gold Coast
17th
Conclusions• GLMs are a powerful tool for building predictive
models• There are inherent assumptions and other
practical challenges– Probabilistic inference to test a hypothesis
• Limitations compromise predictive accuracy– “Sub” optimisation
• Machine learning techniques can be used to improve predictive models– Using CART is not enough