Using Machine Learning Techniques to Enhance Predictive Models 1030... · Using Machine Learning...

17th17th

7 – 10 November 2010 Sheraton Mirage, Gold Coast

Using Machine Learning Techniques to Enhance Predictive Models

Nelson Henwood, Wei Lin and John Yick

Finity Consulting, 2010


17th

What is Machine Learning?• “…computer algorithms that improve

automatically through experience”

• Learner (algorithm) processes data representing past experiences and tries to either:– Develop an appropriate response to future data –

Supervised Learning– Understand the relationships between the data

components – Unsupervised Learning

[ Mitchell,T. (1997) Machine Learning. McGraw Hill]

[Lecture Notes, CIS 526, Temple University]


17th

Machine Learning Algorithms• Include:

– Decision Trees– Neural Networks– Support Vector Machines (SVMs)– Gaussian process regression– Ridge Regression– Nearest Neighbours– Partial least squares

• Presentation results are based on Talon algorithms:– Supervised learning– Proprietary algorithms– Fusion of techniques– Tailored for insurance data


17th

Ensemble Techniques• Systems for combining predictions from multiple models

to obtain a more powerful predictor than could be obtained from any of the constituent models

• Requires:– Prediction diversity– Combination rules

Ensemble System How it achieves diversity?Combination method -

Classification / Regression

Bagging Bootstrap resampling Majority vote / mean

Random Forests Bagging + random selection of predictors Majority vote / mean

Boosting Biased resampling Weighted average majority vote / weighted mean


17th

How is ML different to current predictive modelling practice? (GLMs)

Current Practice Assumes that variables are

independent unless specifically defined otherwise “Optimal” predictors are based

on assumptions Can’t solve what you don’t

know

Large potential volume of interactions results in only a small subset being investigated

Pricing cost models are done at a peril level versus a policy or customer level

Additional model assumptions required eg linearity, error distribution

Machine Learning Allows data to interact naturally to

find the patterns between characteristics within the data

“Data Speaks For Itself”

Does not require the user to specify the predictors and interactions to be included in the model - it discovers them

Extremely Fast and Efficient

Can be performed at peril, policy or customer level

No assumptions regarding linearity, response distribution

Current Practice Assumes that variables are

independent unless specifically defined otherwise “Optimal” predictors are based

on assumptions Can’t solve what you don’t

know

Large potential volume of interactions results in only a small subset being investigated

Pricing cost models are done at a peril level versus a policy or customer level

Additional model assumptions required eg linearity, error distribution

Machine Learning Allows data to interact naturally to

find the patterns between characteristics within the data

“Data Speaks For Itself”

Does not require the user to specify the predictors and interactions to be included in the model - it discovers them

Extremely Fast and Efficient

Can be performed at peril, policy or customer level

No assumptions regarding linearity,assumed error distribution much less important


17th

Signal ValidationValidation by Random Partition• Training using randomly

selected 70% of the data• Validate on remaining 30%

A More Cautious Strategy• 3-way split of the data• Training using say 50%-70%

of the data• Validate using 20% (or random

cross-section)• Use 30% unseen data for post-

modelling test

Segmentation Validation

0%10%20%

30%40%50%60%

70%80%90%

5 10 9 1 3 4 7 8 2 6

Training

Validation

0%10%20%30%40%50%60%70%80%90%100%

12 5 13 10 2 11 4 8 7 9 16 14 18 6 15 1 3 17

Loss

Rat

io


17th

Processes Used in Talon for Analysis• Complete Segmentation – partitions the portfolio into mutually

exclusive and exhaustive segments where each segment contains policies that have common attributes and have similar and consistent loss ratio, frequency, severity, loss cost, profit per policy, retention or conversion

• Multi-split – undertakes multiple segmentation runs and incorporates a random element in the splitting process – doesn’t always take the “optimum path”. Can result in different areas of the data being explored and can lower the risk of finding a local minima

• Scoring – a practically continuous segmentation of a portfolio achieved through application of ensemble techniques. Produces greater differentiation than Complete Segmentation with respect to the factor of interest


17th

Our Business Problem• Build an accurate predictive model of claims cost for a

comprehensive motor portfolio• Frequency and size models constructed at peril (group) level• GLMs and Decision Tree techniques (CART) were employed

• Some of our constraints include:– Model assumptions:

• inherent in the GLM – linearity, error distribution• typically assume independence of perils and frequency / size

– Sleeping and ageing

• Can machine learning techniques be used to find consistent signals in the model residuals and thus improve the predictions?– Results shown for at-fault collision models and total cost


17th

At-fault Collision models fit “well”• For frequency we have:

– Reasonable gains chart– Model Pearson residual 38% lower than null model– CART won’t split the residual further than shown with 5,000

exposure year threshold

Gains Chart - GLM for Multi Vehicle at fault collision

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96


17th

Key Parameters for Segmentation• Our segmentation target is the actual vs modelled claim

frequency / size• Minimum volume threshold for a segment is a key

parameter for segmentation to ensure a credible segmentation– Frequency and Cost – exposure– Size – number of claims

• One test we undertook was to explore a number of different volume thresholds using a Complete Segmentation

• We examined the degree of segmentation vs persistency of the signal– Lift = ratio of prediction (A vs E) in highest to lowest segment– Correlation is measured on A vs E across segments between

the training (70%) and validation (30%) data


17th

Frequency – Volume Threshold TestingLift vs Correlation

30

35

40

45

70

50 65

55

60

120%

125%

130%

135%

140%

145%

150%

155%

160%

165%

80% 85% 90% 95% 100%

Correlation

Lift

Selected Volume Threshold

Complete Segmentation for Volume Threshold (000)

Change in Gini vs Correlation

30

35

40

45

70

5065

55

60

0.000

0.001

0.002

0.003

0.004

0.005

0.006

0.007

80% 85% 90% 95% 100%

Correlation

Cha

nge

in G

ini -

Val

idat

ion

Lift vs Correlation

30

35

40

45

70

50 65

55

60

120%

125%

130%

135%

140%

145%

150%

155%

160%

165%

80% 85% 90% 95% 100%

Correlation

Lift




30

35

40

45

70

5065

55

60

0.000

0.001

0.002

0.003

0.004

0.005

0.006

0.007

80% 85% 90% 95% 100%

Correlation

Cha

nge

in G

ini -

Val

idat

ion


17th

Size – Volume Threshold TestingLift vs Correlation

1

1.5 2

2.5

3

4

3.5

4.55

110%

115%

120%

125%

130%

135%

140%

80% 85% 90% 95% 100%

Correlation

Lift




1

2

5

1.5

2.53

4.5

3.5

4

0.0000

0.0002

0.0004

0.0006

0.0008

0.0010

0.0012

0.0014

0.0016

0.0018

80% 85% 90% 95% 100%

Correlation

Cha

nge

in G

ini -

Val

idat

ion

Lift vs Correlation

1

1.5 2

2.5

3

4

3.5

4.55

110%

115%

120%

125%

130%

135%

140%

80% 85% 90% 95% 100%

Correlation

Lift




1

2

5

1.5

2.53

4.5

3.5

4

0.0000

0.0002

0.0004

0.0006

0.0008

0.0010

0.0012

0.0014

0.0016

0.0018

80% 85% 90% 95% 100%

Correlation

Cha

nge

in G

ini -

Val

idat

ion


17th

Multi Split RunsAverage Size

110%

115%

120%

125%

130%

135%

50% 60% 70% 80% 90% 100%

Correlation

Lift

Frequency

110%

115%

120%

125%

130%

135%

140%

145%

150%

50% 60% 70% 80% 90% 100%

Correlation

Lift

Individual Multi-Split RunComplete SegmentationSelected Segmentation


17th

Complete Segmentation - Frequency• A number of segments are identified which show

significant and persistent deviation from the model fit

60%

80%

100%

120%

140%

2005 2006 2007 2008Year

A vs

E


17th

Frequency Signal Validation• Signal persists across the validation data – better

persistence at the extremesActual / Modelled Frequency

70%

80%

90%

100%

110%

120%

130%

13 4 7 9 1 14 12 2 11 3 6 5 10 8 15 16Segments

Training Validation


17th

Frequency Example Segments• The segments identified are quite complex• Minimum of 4 variable conditions required to describe• A vs E shown over training + validation data

Segment # % Exposure

Actual / Model Segment

13 8% 0.82 6 way interaction involving vehicle characterstics, NCB level, policy status and others

4 5% 0.87 7 way interaction involving vehicle characterstics, NCB level, policy status and others

.. .. .. ..

5 5% 1.07 4 way interaction involving vehicle characteristics, NCB level and policy status

10 6% 1.09 4 way interaction, including vehicle characteristics, policy duration and NCB level

15 6% 1.11 4 way interaction, including vehicle characteristics, policy duration and NCB level

8 7% 1.11 4 way interaction including age, NCB level and others

16 6% 1.20 4 way interaction including vehicle characteristics, policy duration and others


17th

Frequency Scoring• The scoring run shows a strong persistent signal• The signal validates quite well across the entire spectrum of mis-fit

identifiedActual / Modelled Frequency

60%

70%

80%

90%

100%

110%

120%

130%

0%-5%

5%-10%

10%-15%

15%-25%

25%-35%

35%-50%

50%-65%

65%-75%

75%-85%

85%-90%

90%-95%

95%-100%

Score Percentile

Training Validation


17th

Frequency – Scoring vs Segmentation• Scoring delivers more lift than segmentation – here shown across

the validation data

Actual / Modelled Frequency

70%

80%

90%

100%

110%

120%

130%

0%-5%

5%-10%

10%-15%

15%-25%

25%-35%

35%-50%

50%-65%

65%-75%

75%-85%

85%-90%

90%-95%

95%-100%

Score Percentile

Segmentation (Val) Scoring (Val)


17th

Modification of GLM Prediction -Frequency

• Data divided into deciles based on GLM predicted frequency• Chart illustrates how the Complete Segmentation and Scoring “pull-

apart” the GLM prediction• Nb: log scale used

0% 1% 10% 100%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

GLM

Fre

q Pe

rcen

tile

Claim Frequency

GLM prediction rangeSegmentation upperSegmentation lowerScoring upperScoring lower


17th

Complete Segmentation - Average Size• Analysis of the average size model also reveals a number of

segments with significant deviation from the model fit

80%

90%

100%

110%

120%

130%

2005 2006 2007 2008Year

A vs

E


17th

Average Size – Signal validation• The lift is not as great as for the frequency model• Persistence of the signal across the validation data is good at the

extremesActual / Modelled Average Size

80%

85%

90%

95%

100%

105%

110%

115%

120%

7 8 10 3 9 4 1 2 6 5Segments

Training Validation


17th

Modification of GLM Prediction - Size• The segmentation and scoring also significantly “pull-

apart” the GLM prediction across the range of average size

0 2,000 4,000 6,000 8,000 10,000 12,000 14,000 16,000 18,000

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

GLM

Siz

e Pe

rcen

tile

Average Size ($)

GLM prediction rangeSegmentation upperSegmentation lowerScoring upperScoring lower


17th

Cost = modelled frequency x modelled size?• We adjusted the component GLM predictions by the output from the scoring runs• Determined a cost per policy prediction as the product of these “enhanced” models• Ran a Complete Segmentation at cost level• Discovered a few segments where the cost prediction did not fit well• Examined the component model fit for those segments (also checked the base GLM)• The size model is the culprit but this is not clear until we look at the fitted average

size weighted by the expected frequency

Segment % Exposure Freq Size1 Cost Size2 Freq Size1 Cost Size2

1 5.8% 96 99 87 90 99 93 84 852 8.9% 99 97 91 91 100 96 91 91

Other 78.0% 100 100 100 100 100 100 100 1003 7.2% 103 104 126 122 108 105 134 124

Scoring Enhanced Base GLMActual vs Expected

1 – Based on scoring of policies with a claim used in size model

2 – Implied average size based on cost scoring of entire exposure file


17th

Scoring Run against Total Cost Prediction• Chart shows a scoring run against a CART enhanced total cost

prediction• Model shows good lift and good persistence of signal across the

validation data

Actual / Modelled Total Cost

70%

80%

90%

100%

110%

120%

130%

140%

0%-5%

5%-10%

10%-15%

15%-25%

25%-35%

35%-50%

50%-65%

65%-75%

75%-85%

85%-90%

90%-95%

95%-100%

Score Percentile

Training Validation


17th

Using Machine Learning• Why use a GLM? (vs ensemble prediction)

– Enables a deep understanding of drivers– Forces us to get close to the data and understand it– “Easy” to communicate

• There are limitations– Linearity– Response distribution– Independence– Practicality


17th

Predictive Model Improvement• Component model by peril:

– Use segmentation to guide search for complex interactions– Incorporate these findings into the GLM

• Perform segmentation on peril cost prediction– Adjust prediction outside the GLM

• Examine total cost prediction– Perform segmentation and adjust outside the GLM

• Use scoring to incorporate an additional complex layer which improves predictive power


17th

Other Applications of Machine Learning

Useful segmentation targets include:• Loss ratio (or COR)• Joint profitability / sales or profitability /

competitive position• Instability• Price elasticity


17th

Conclusions• GLMs are a powerful tool for building predictive

models• There are inherent assumptions and other

practical challenges– Probabilistic inference to test a hypothesis

• Limitations compromise predictive accuracy– “Sub” optimisation

• Machine learning techniques can be used to improve predictive models– Using CART is not enough

Using Machine Learning Techniques to Enhance Predictive Models 1030... · Using Machine Learning...

Documents

Transcript of Using Machine Learning Techniques to Enhance Predictive Models 1030... · Using Machine Learning...