Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates


Page 1

Using Neural Networks to Predict Claim Duration in the Presence of Right Censoring and Covariates

David Speights

Senior Research Statistician

HNC Insurance Solutions

Irvine, California

Session CPP-53

Page 2

Presentation Outline

• Introduction to Neural Networks

• Introduction to Survival Analysis

• Neural Networks with Right Censoring

• Simulated Example

• Predicting Claim Duration

Page 3

Introduction to Neural Networks: Motivation

• Complex classification

– Character recognition

– Voice recognition

• Humans have no trouble with these tasks

– We can read even distorted documents

– We can recognize voices over poor telephone lines

• Neural networks attempt to model the human brain

Page 4

Introduction to Neural Networks: Connection to Brain Functionality

• Brain – made up of billions of neurons sending signals to the body and each other

• Neural Networks – collection of “neurons” which send “signals” to produce an output

Page 5

Introduction to Neural Networks: Common Representation

[Figure: network diagram with P predictors (inputs) X1, X2, …, XP feeding 1 hidden layer with M neurons (1, 2, …, M), which feeds 1 output Y.]

Page 6

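Introduction to Neural Networks: Architecture of the ith Neuron

• Represents a neuron in the brain

• Inputs X1, X2, …, Xp are combined linearly: O = bi0 + bi1X1 + … + bipXp

• The neuron outputs s(O), where s is a function taking values in the interval (0,1) that represents the strength of the output

[Figure: activation function, s (vertical axis, 0 to 1) plotted against O.]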
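
The neuron described on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the logistic function for s (the slide only says that s takes values in (0,1)); the function names and the numeric weights are hypothetical.

```python
import math

def logistic(o):
    """Logistic activation: maps a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-o))

def neuron_output(b, x):
    """Output of the ith neuron.

    b: weights [b_i0, b_i1, ..., b_ip], where b_i0 is the intercept.
    x: inputs [X_1, ..., X_p].
    """
    # O = b_i0 + b_i1*X_1 + ... + b_ip*X_p
    o = b[0] + sum(b_k * x_k for b_k, x_k in zip(b[1:], x))
    return logistic(o)

# Hypothetical weights and inputs, chosen only for illustration.
b = [0.5, 1.0, -2.0]
x = [1.5, 1.0]
signal = neuron_output(b, x)  # O = 0.5 + 1.5 - 2.0 = 0, so s(O) = 0.5
```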

Page 7

Introduction to Neural Networks: Connection to Multiple Regression

• Similarities

– Both describe relationships between variables

– Both can create predictions

• Differences

– Function describing the relationships is more complex

– Response variables are typically called outputs

– Predictor variables are typically called inputs

– Estimating the parameters is usually called training

Page 8

Introduction to Neural Networks: Functional Representation

Y = f(X1, …, Xp) + error

• Multiple Linear Regression

– f() = linear combination of regressors

– Forced to model only specified relationships

• Neural Network

– f() = nonlinear combination of regressors

– Can deal with nonlinearities and interactions without special designation

Page 9

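Introduction to Neural Networks: Functional Specification

• For a neural network f() is written

$$f(X_1, \ldots, X_p) = g\Big(\alpha_0 + \sum_{j=1}^{M} \alpha_j \, s\Big(\beta_{j0} + \sum_{k=1}^{p} \beta_{jk} X_k\Big)\Big)$$

• The inner term βj0 + βj1X1 + … + βjpXp is a multiple linear regression equation

• Here g and s are transformation functions specified in advance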
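
A minimal Python sketch of this functional form, assuming the network is f = g(α0 + Σj αj s(βj0 + Σk βjk Xk)) with one hidden layer of M neurons. The choices of the identity for g and the logistic function for s are assumptions, as are the names `mlp`, `alpha`, and `beta` and all numeric values.

```python
import math

def logistic(o):
    """Logistic transformation used here for s (an assumption)."""
    return 1.0 / (1.0 + math.exp(-o))

def mlp(x, alpha, beta, g=lambda v: v, s=logistic):
    """Single-hidden-layer network.

    x:     inputs [X_1, ..., X_p]
    alpha: output weights [alpha_0, alpha_1, ..., alpha_M]
    beta:  one row per hidden neuron, [beta_j0, beta_j1, ..., beta_jp]
    g, s:  transformation functions fixed in advance
    """
    # Each hidden neuron applies s to its own linear regression equation.
    hidden = [s(row[0] + sum(b * xk for b, xk in zip(row[1:], x))) for row in beta]
    # The output applies g to a linear combination of the hidden signals.
    return g(alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden)))

# Hypothetical example with p = 2 inputs and M = 2 hidden neurons.
alpha = [1.0, 2.0, -1.0]
beta = [[0.0, 0.0, 0.0],
        [0.0, 1.0, -1.0]]
y_hat = mlp([3.0, 3.0], alpha, beta)  # both hidden signals are 0.5, so y_hat = 1.5
```

With g and s both set to the identity the same function collapses to an ordinary linear combination of the inputs, which is the multiple regression case.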

Page 10

Introduction to Survival Analysis: What is Survival Analysis?

• Used to model time-to-event data (example: time until a claim ends)

• Usually characterized by (1) right-skewed data, (2) multiplicative error structure, and (3) right censoring

• Common in cancer clinical trials, component failure analysis, and AIDS data analysis, among other examples

Page 11

Introduction to Survival Analysis: Notation

• T1, ..., Tn - independent failure times with distribution F and density function f

• C1, ..., Cn - independent censoring times with distribution G and density function g

• Yi = min(Ti, Ci) - observed time

• δi = I(Yi = Ti) - censoring indicator (1 if the failure time is observed, 0 if it is censored)

• Xi = (Xi1, ..., Xip) - vector of known covariates associated with the ith individual
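
The notation can be made concrete with a short Python sketch that builds the observed pairs (Yi, δi) from failure and censoring times. The function name `observe` and the numbers are hypothetical. In the claims setting, δ = 1 would plausibly correspond to a closed claim and δ = 0 to a claim still open when the data are extracted, though the slides do not state this mapping explicitly.

```python
def observe(failure_times, censoring_times):
    """Return a list of (Y_i, delta_i) pairs.

    Y_i = min(T_i, C_i) is the observed time.
    delta_i = 1 if the failure time was observed (T_i <= C_i), else 0.
    """
    observed = []
    for t, c in zip(failure_times, censoring_times):
        y = min(t, c)
        delta = 1 if t <= c else 0
        observed.append((y, delta))
    return observed

# Hypothetical failure times T and censoring times C.
T = [5.0, 12.0, 3.0]
C = [8.0, 10.0, 7.0]
data = observe(T, C)  # the second observation is censored at 10.0
```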

Page 12

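Introduction to Survival Analysis: Likelihood Analysis (Parametric Models)

• (Yi, δi, Xi), i = 1, …, n, independent observations

• Likelihood written

$$L(\theta) = \prod_{i=1}^{n} f(Y_i, \delta_i \mid X_i)$$

• Joint density of one observation

$$f(Y, \delta \mid X) = \big[f_\theta(Y \mid X)\,(1 - G(Y \mid X))\big]^{\delta}\,\big[g(Y \mid X)\,(1 - F_\theta(Y \mid X))\big]^{1-\delta}$$

• Collecting the terms that involve θ

$$L(\theta) = L_2 \prod_{i=1}^{n} f_\theta(Y_i \mid X_i)^{\delta_i}\,\big(1 - F_\theta(Y_i \mid X_i)\big)^{1-\delta_i}$$

• Here L2 collects the censoring-distribution terms (G and g) and does not depend on θ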
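
As a worked illustration of the θ-dependent part of this likelihood, the sketch below uses an exponential failure-time model with no covariates. That model is swapped in here only because its maximizer has a closed form (number of observed failures divided by total observed time); it is not the model used in the presentation, and the names and data are hypothetical.

```python
import math

def censored_log_lik(rate, data):
    """Log of prod_i f(y_i)^delta_i * (1 - F(y_i))^(1 - delta_i)
    for an exponential model: f(y) = rate * exp(-rate * y)."""
    total = 0.0
    for y, delta in data:
        log_density = math.log(rate) - rate * y   # log f(y)
        log_survival = -rate * y                  # log(1 - F(y))
        total += delta * log_density + (1 - delta) * log_survival
    return total

# Hypothetical (Y_i, delta_i) pairs; the second observation is censored.
data = [(5.0, 1), (10.0, 0), (3.0, 1)]

# Closed-form maximizer for this model: events / total observed time.
rate_hat = sum(d for _, d in data) / sum(y for y, _ in data)
```

The censored observation contributes only through its survival term, so it raises the total observed time without adding an event.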

Page 13

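Neural Networks with Right Censoring: Model Specification

• Neural Network Model

$$\log(T) = \alpha_0 + \sum_{j=1}^{M} \alpha_j \, s(\beta_j' x) + \sigma\varepsilon = \mathrm{mlp}(x, \theta) + \sigma\varepsilon$$

• Here ε has distribution function F and density f

• θ = {α0, …, αM, β1, …, βM}

• The likelihood is

$$L(\theta, \sigma) = L_2 \prod_{i=1}^{n} \left[\frac{1}{\sigma} f\!\left(\frac{\log(Y_i) - \mathrm{mlp}(x_i, \theta)}{\sigma}\right)\right]^{\delta_i} \left[1 - F\!\left(\frac{\log(Y_i) - \mathrm{mlp}(x_i, \theta)}{\sigma}\right)\right]^{1-\delta_i}$$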
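
A minimal Python sketch of the censored likelihood for the model log(T) = mlp(x, θ) + σε, written as a negative log-likelihood. It assumes ε is standard normal, which is one possible choice of F and f. The function name and arguments are hypothetical, the network predictions are passed in as plain numbers, and the constant factor that does not involve θ or σ is dropped.

```python
import math

def neg_log_lik(log_y, delta, pred, sigma):
    """Negative censored log-likelihood under normal errors.

    log_y: log observed times, log(Y_i)
    delta: 1 if the event was observed, 0 if censored
    pred:  network predictions mlp(x_i, theta)
    sigma: scale parameter
    """
    total = 0.0
    for ly, d, m in zip(log_y, delta, pred):
        z = (ly - m) / sigma
        if d == 1:
            # log of (1/sigma) * f(z), with f the standard normal density
            total += -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
        else:
            # log of 1 - F(z), with F the standard normal distribution function
            total += math.log(0.5 * math.erfc(z / math.sqrt(2)))
    return -total

# Hypothetical values: one observed event and one censored time.
nll = neg_log_lik(log_y=[1.0, 2.0], delta=[1, 0], pred=[1.0, 2.0], sigma=1.0)
```

An observed event is scored by the density at its residual, while a censored time is scored by the probability that the true time exceeds it.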

Page 14

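Neural Networks with Right Censoring: Fitting Neural Networks without Censoring

• θ estimated by minimizing squared error

$$C(\theta) = \sum_{i=1}^{n} \big(\log(Y_i) - \mathrm{mlp}(x_i, \theta)\big)^2 = \sum_{i=1}^{n} C_i(x_i, \theta)$$

• With normal errors the likelihood and log-likelihood are

$$L(\theta, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\big(\log(Y_i) - \mathrm{mlp}(x_i, \theta)\big)^2}{2\sigma^2}\right)$$

$$\log L(\theta, \sigma) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} \big(\log(Y_i) - \mathrm{mlp}(x_i, \theta)\big)^2$$

• If ε is normal, minimizing squared error is the same as maximizing the likelihood.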
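
The equivalence stated on this slide can be checked numerically. In the sketch below, which assumes normal errors with a fixed σ, the negative log-likelihood equals a constant plus the squared error divided by 2σ², so the fit with the smaller squared error always has the larger likelihood. The function names and numbers are hypothetical.

```python
import math

def squared_error(log_y, pred):
    """C(theta): sum of squared residuals on the log scale."""
    return sum((ly - m) ** 2 for ly, m in zip(log_y, pred))

def gaussian_neg_log_lik(log_y, pred, sigma):
    """-log L = (n/2) * log(2*pi*sigma^2) + C(theta) / (2*sigma^2)."""
    n = len(log_y)
    return 0.5 * n * math.log(2 * math.pi * sigma ** 2) \
        + squared_error(log_y, pred) / (2 * sigma ** 2)

# Hypothetical log durations and two competing sets of predictions.
log_y = [1.0, 2.0, 3.0]
fit_a = [1.1, 1.9, 3.2]
fit_b = [0.5, 2.5, 3.5]

sse_a, sse_b = squared_error(log_y, fit_a), squared_error(log_y, fit_b)
nll_a = gaussian_neg_log_lik(log_y, fit_a, sigma=1.0)
nll_b = gaussian_neg_log_lik(log_y, fit_b, sigma=1.0)
```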

Page 15

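Neural Networks with Right Censoring: Fitting Neural Networks without Censoring

• Gradient descent algorithm for estimating θ

$$\theta_{j:i} = \theta_{j:i-1} - \eta \, \frac{\partial C_i(x_i, \theta_{j:i-1})}{\partial \theta}$$

• Algorithm updated at each observation

• η is known as the learning rate

• θj:0 = θj−1:n (each pass through the data starts from the values at the end of the previous pass)

• Known as back-propagation algorithm

• To generalize to right censored data, replace C() with the negative log-likelihood for censored neural networks.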
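
The per-observation update can be sketched for a small network in plain Python. This is an illustration under stated assumptions rather than the presenter's implementation: one scalar input, three logistic hidden neurons, an identity output, squared-error cost Ci, and made-up data with y = x². All names and the learning rate are hypothetical.

```python
import math
import random

def logistic(o):
    """Numerically stable logistic function."""
    if o >= 0:
        return 1.0 / (1.0 + math.exp(-o))
    e = math.exp(o)
    return e / (1.0 + e)

def forward(x, alpha, beta):
    """Return (prediction, hidden signals) for one scalar input x."""
    hidden = [logistic(b0 + b1 * x) for b0, b1 in beta]
    pred = alpha[0] + sum(a * h for a, h in zip(alpha[1:], hidden))
    return pred, hidden

def sgd_pass(xs, ys, alpha, beta, eta):
    """One pass through the data, updating theta after every observation."""
    for x, y in zip(xs, ys):
        pred, hidden = forward(x, alpha, beta)
        grad_out = 2.0 * (pred - y)  # dC_i/dpred for C_i = (y - pred)^2
        # Back-propagate to the hidden weights using the current alpha.
        for j, h in enumerate(hidden):
            grad_h = grad_out * alpha[j + 1] * h * (1.0 - h)
            beta[j][0] -= eta * grad_h
            beta[j][1] -= eta * grad_h * x
        # Then update the output weights.
        alpha[0] -= eta * grad_out
        for j, h in enumerate(hidden):
            alpha[j + 1] -= eta * grad_out * h

def total_cost(xs, ys, alpha, beta):
    return sum((forward(x, alpha, beta)[0] - y) ** 2 for x, y in zip(xs, ys))

random.seed(0)
xs = [random.uniform(-3.0, 3.0) for _ in range(200)]
ys = [x * x for x in xs]
alpha = [random.uniform(-0.5, 0.5) for _ in range(4)]
beta = [[random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5)] for _ in range(3)]

cost_before = total_cost(xs, ys, alpha, beta)
for j in range(100):  # pass j starts from the parameters left by pass j-1
    sgd_pass(xs, ys, alpha, beta, eta=0.01)
cost_after = total_cost(xs, ys, alpha, beta)
```

Because the parameter lists are updated in place, the starting point of each pass is automatically the end point of the previous one.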

Page 16

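Neural Networks with Right Censoring: Fitting Neural Networks with Censoring

• Step 1 - Estimating θ

– Fix σ and pass through data once using

$$\theta_{j:i} = \theta_{j:i-1} + \eta \, \frac{\partial \log L_i(\theta_{j:i-1}, \sigma_{j-1})}{\partial \theta}$$

– Here Li is the ith observation's contribution to the likelihood

• Step 2 - Estimating σ

– Fix θ at end of pass through data

– Iterate until |σj − σj−1| falls below a chosen tolerance, using the Newton-Raphson algorithm

$$\sigma_j = \sigma_{j-1} - \left[\frac{\partial^2 \log L(\theta_j, \sigma_{j-1})}{\partial \sigma^2}\right]^{-1} \frac{\partial \log L(\theta_j, \sigma_{j-1})}{\partial \sigma}$$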
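
Step 2 can be sketched on its own in Python. The sketch assumes normal errors, takes the residuals log(Yi) − mlp(xi, θ) from Step 1 as given, and approximates the two derivatives with central finite differences instead of analytic formulas; those choices, the names, the starting value, and the data are all assumptions made for brevity.

```python
import math

def log_lik(residuals, delta, sigma):
    """Censored log-likelihood as a function of sigma, with theta held fixed."""
    total = 0.0
    for r, d in zip(residuals, delta):
        z = r / sigma
        if d == 1:
            total += -math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
        else:
            total += math.log(0.5 * math.erfc(z / math.sqrt(2)))
    return total

def newton_sigma(residuals, delta, sigma, tol=1e-8, max_iter=100, h=1e-4):
    """Newton-Raphson iterations on sigma until successive values differ by < tol."""
    for _ in range(max_iter):
        up = log_lik(residuals, delta, sigma + h)
        mid = log_lik(residuals, delta, sigma)
        down = log_lik(residuals, delta, sigma - h)
        first = (up - down) / (2 * h)              # d log L / d sigma
        second = (up - 2 * mid + down) / (h * h)   # d^2 log L / d sigma^2
        new_sigma = sigma - first / second
        if abs(new_sigma - sigma) < tol:
            return new_sigma
        sigma = new_sigma
    return sigma

# Hypothetical residuals after a Step 1 pass; two observations are censored.
residuals = [0.3, -0.4, 0.5, -0.2, 0.1, -0.6]
delta = [1, 1, 0, 1, 1, 0]
sigma_hat = newton_sigma(residuals, delta, sigma=0.2)
```

Newton-Raphson on σ is sensitive to the starting value: the log-likelihood is not concave for large σ, so this sketch starts from a deliberately small value.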

Page 17

Neural Networks with Right Censoring: Fitting Neural Networks with Censoring

• With highly parameterized neural networks we risk overfitting

Page 18

Neural Networks with Right Censoring: Fitting Neural Networks with Censoring

• We need to design the fitting procedure to find a good fit to the data

Page 19

Neural Networks with Right Censoring: Fitting Neural Networks with Censoring

• The negative of the likelihood is calculated on both sets of data (training and testing) at the same time.

[Figure: data split into 75% training data and 25% testing data; panels labelled "Negative Likelihood" and "Parameter Estimates", each plotted against training cycles.]
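
A minimal Python sketch of the 75%/25% split, together with one common way to use the testing curve: keep the parameters from the training cycle at which the testing negative likelihood is lowest. The slides do not spell out the stopping rule, so that part is an assumption, and the function names and curve values are hypothetical.

```python
import random

def split_75_25(records, seed=0):
    """Shuffle the records and return (training, testing) with a 75/25 split."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (3 * len(shuffled)) // 4
    return shuffled[:cut], shuffled[cut:]

def best_cycle(test_curve):
    """Index of the training cycle with the lowest testing negative likelihood."""
    return min(range(len(test_curve)), key=test_curve.__getitem__)

train, test = split_75_25(range(100))

# Hypothetical testing curve: it falls at first, then rises as the
# network starts to overfit the training data.
test_curve = [5.0, 3.2, 2.5, 2.1, 2.0, 2.2, 2.6]
stop_at = best_cycle(test_curve)  # cycle index 4
```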

Page 20

Neural Networks with Right Censoring: Fitting Neural Networks with Censoring

• Potential drawbacks to neural networks

– Hard to tell the individual effects of each predictor variable on the response

– Can have poor extrapolation properties

• Potential gains from neural networks

– Can reduce preliminary analysis in modeling: discovery of interactions and nonlinear relationships becomes automatic

– Can increase predictive power of models

Page 21

Simulated Example

• True time model: log(t) = x² + 0.5ε

• Censoring model: log(c) = 0.25 + x² + 0.5ε

• x ~ U(−3, 3)

• ε ~ N(0, 1)

• Censored if c < t

• ~35% censoring

• 3 node neural network fit
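
The simulation setup can be sketched in Python. The sketch reads the two models as log(t) = x² + 0.5ε and log(c) = 0.25 + x² + 0.5ε with ε ~ N(0,1), and assumes the error terms for t and c are drawn independently, which the slide does not state. Under that reading the censoring probability is Φ(−0.25 / (0.5·√2)), about 0.36, in line with the roughly 35% quoted. The function name and seed are hypothetical.

```python
import math
import random

def simulate(n, seed=0):
    """Return rows (x, observed time y, delta) for the simulated example."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(-3.0, 3.0)
        log_t = x ** 2 + 0.5 * rng.gauss(0.0, 1.0)         # true time model
        log_c = 0.25 + x ** 2 + 0.5 * rng.gauss(0.0, 1.0)  # censoring model
        delta = 0 if log_c < log_t else 1                   # censored if c < t
        y = math.exp(min(log_t, log_c))
        rows.append((x, y, delta))
    return rows

rows = simulate(20000)
censored_fraction = sum(1 for _, _, d in rows if d == 0) / len(rows)
```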

Page 22

Simulated Example

[Figure: log prediction/actual (vertical axis, −2 to 10) plotted against X (horizontal axis, −3 to 3).]

• Scatter points are true times versus x

• Solid line represents NN fit to data

Page 23

Predicting Claim Duration

• Predictor Variables

– NCCI Codes: Body Part Code, Injury Type, Nature of Injury, Industry Class Code

– Demographic Information: Age, Gender, Weekly Wage, Zip Code

• Response Variable

– Time from report until the claim is closed

Page 24

Predicting Claim Duration

• Ratio of prediction to actual duration on log10 scale

• Difficult to represent open claim results

[Figure: two panels, "Open Claim" and "Closed Claim".]
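
The comparison measure on this slide is easy to compute, and a short sketch also shows why open claims are awkward to display. For a closed claim the actual duration is known. For an open claim only the elapsed time so far is known, so the ratio computed from it is an upper bound on the eventual ratio. The function name and the durations below are hypothetical.

```python
import math

def log10_ratio(predicted, actual):
    """Ratio of predicted to actual duration on the log10 scale.

    0 means a perfect prediction; +1 means a ten-fold overprediction.
    """
    return math.log10(predicted / actual)

# Closed claim: actual duration known (hypothetical values, in days).
closed = log10_ratio(predicted=120.0, actual=12.0)

# Open claim: 30 days elapsed so far, final duration unknown but at least 30.
open_bound = log10_ratio(predicted=120.0, actual=30.0)
```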

Page 25

Conclusions

• Provides an intuitive method to address right censored data with a neural network

• Allows for more flexible mean function

• Can be used in many time-to-event data situations