This article was downloaded by: [Winchester School of Art] On: 26 May 2015, At: 08:36 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Click for updates Scandinavian Actuarial Journal Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/sact20 Modelling critical illness claim diagnosis rates I: methodology Erengul Ozkok a , George Streftaris b , Howard R. Waters b & A. David Wilkie b a Department of Actuarial Sciences , Hacettepe University , Ankara , Turkey b Department of Actuarial Mathematics and Statistics , Heriot- Watt University , Edinburgh , UK Published online: 12 Dec 2012. To cite this article: Erengul Ozkok , George Streftaris , Howard R. Waters & A. David Wilkie (2014) Modelling critical illness claim diagnosis rates I: methodology, Scandinavian Actuarial Journal, 2014:5, 439-457, DOI: 10.1080/03461238.2012.728537 To link to this article: http://dx.doi.org/10.1080/03461238.2012.728537 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. 
Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &

Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


  • Original Article

    Modelling critical illness claim diagnosis rates I: methodology

    ERENGUL OZKOK a*, GEORGE STREFTARIS b, HOWARD R. WATERS b and A. DAVID WILKIE b

    a Department of Actuarial Sciences, Hacettepe University, Ankara, Turkey; b Department of Actuarial Mathematics and Statistics, Heriot-Watt University, Edinburgh, UK

    (Accepted September 2012)

    In a series of two papers, this paper and the one by Ozkok et al. (Modelling critical illness claim

    diagnosis rates II: results), we develop statistical models to be used as a framework for estimating, and

    graduating, Critical Illness (CI) insurance diagnosis rates. We use UK data for 1999-2005 supplied by
    the Continuous Mortality Investigation (CMI) to illustrate their use. In this paper, we set out the basic

    methodology. In particular, we set out some models, we describe the data available to us and we discuss

    the statistical distribution of estimators proposed for CI diagnosis inception rates. A feature of CI

    insurance is the delay, on average about 6 months but in some cases much longer, between the

    diagnosis of an illness and the settlement of the subsequent claim. Modelling this delay, the so-called

    Claim Delay Distribution, is a necessary first step in the estimation of the claim diagnosis rates and

    this is discussed in the present paper. In the subsequent paper, we derive and discuss diagnosis rates for

    CI claims from all causes and also from specific causes.

    Keywords: critical illness insurance; diagnosis rates; statistical models; Burr generalised linear-type

    model; Claim Delay Distribution; Continuous Mortality Investigation

    1. Introduction

    Critical Illness (CI) insurance is a type of long-term insurance, typically secured by

    regular premiums throughout the term of the policy, that provides a lump sum on the

    diagnosis of one of a specified list of critical illnesses within the policy conditions. In the

    UK, there are two types of CI policy: Full Accelerated (FA), which covers both CI and

    death and Stand Alone (SA), which covers only CI. The former is far more popular than

    the latter. CI coverage includes, but is not limited to, cancer, heart attack, stroke, coronary

    artery by-pass graft (CABG), kidney failure (KF), major organ transplant (MOT) and

    multiple sclerosis (MS). Most policies also include total and permanent disability (TPD)

    for completeness, essentially to cover disability arising from other causes not covered

    explicitly in the policy.

    CI insurance has been very popular in the UK since it was introduced in the 1980s.

    Around 700,000 new policies were sold in 1998 (Dinani et al. (2000)) and more than

    1 million new policies were issued in 2002 (CMI WP 50 (2011)), many of them linked to
    mortgage repayments. Further background information on CI insurance can be found in
    CMI WP 50 (2011).

    *Corresponding author. E-mail: [email protected]

    © 2012 Taylor & Francis. http://dx.doi.org/10.1080/03461238.2012.728537
    Scandinavian Actuarial Journal, 2014, Vol. 2014, No. 5, 439-457

    This is Paper I in a series of two papers; the other paper, Ozkok et al. (2012a), is

    referred to as Paper II. Our objective in these papers is to set out, and to illustrate, a

    methodology for the estimation and graduation of CI insurance claim diagnosis rates.

    These rates are clearly needed for the calculation of premium rates and policy values and

    also for monitoring CI experience. The CMI has published all causes (CMI WP 50 (2011))

    and cause specific (CMI WP 52 (2011)) CI claim diagnosis rates using its own

    methodology. Our methodology differs from that of the CMI mainly because we start

    by specifying a statistical model. From this starting point we can:

    (1) determine the statistical properties of our estimators, and,

    (2) use modern statistical methodology to smooth our estimates, for example, by

    specifying (Cox-type) regression models incorporating a range of covariates.

    In Section 2, we introduce in very general terms the statistical models whose

    parameterisation is discussed in Paper II. The data used to parameterise these models

    are described in Section 3. This data set, supplied by the CMI, relates to policies in force,

    and claims settled, in the UK in the seven calendar years 1999-2005. A feature of CI
    insurance is the delay between diagnosis of an illness and the settlement of the subsequent
    claim. Our estimation procedure requires the distribution of this delay to be modelled;
    this is discussed in Section 4. In Section 5, we set out equations for point estimates of the

    claim diagnosis rates, discuss the distribution of these estimates and how we can smooth

    them using generalised linear (GL)-type models.

    In Paper II, we illustrate our methodology by using it to derive numerical results for all

    cause claim diagnosis rates and for cause specific claim diagnosis rates.

    More details of most of the research reported in these papers can be found in Ozkok

    (2011).

    2. Models

    Our most detailed model for CI insurance is a cause specific model which is represented in

    Figure 1.

    Points to note about the model represented in Figure 1 are:

    (1) Healthy indicates that the individual has not yet been diagnosed with a CI or died.

    (2) An individual exits the Healthy state on death or on the diagnosis of a CI, as specified

    in the policy conditions.

    (3) The model is specified in terms of transition intensities, labelled $\lambda^{j}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$.

    Transition intensities are analogous to the force of mortality and there are good

    reasons for specifying the model in this way. See, for example, Waters (1984).

    (4) A transition from Healthy to Dead means death before the diagnosis of a CI, so that,
    numerically, we might expect $\lambda^{D}_{x,\theta}$ to be different from, possibly lower than, the total
    force of mortality for a corresponding set of individuals.

    (5) The transition intensities, $\lambda^{j}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$, depend on the cause, on the current age of
    the individual, $x$, and also on a set of other covariates, labelled $\theta$. These covariates are
    the important characteristics of the individual and/or the policy which affect the
    likelihood of the diagnosis of a CI or death; for example, Sex, Benefit amount, Office.
    The set of covariates cannot include any characteristics which are not recorded in our
    data and a major part of the statistical modelling is to determine which of those
    characteristics recorded in our data are important and hence should be included in $\theta$.

    (6) This model can be used for both FA and SA policies. Each transition intensity would
    be estimated separately and data from FA policies only would be used to estimate $\lambda^{D}_{x,\theta}$.

    Figure 2 represents a simpler, all causes, model for CI insurance. We could use the model

    in Figure 2 to model FA and SA policies separately. In this case, a transition from Healthy

    to Insured event means:

    (1) diagnosis with a CI or death before diagnosis with a CI (FA policies), or,

    (2) diagnosis with a CI (SA policies).

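    In this case, $\lambda_{x,\theta}$ in Figure 2 corresponds, in terms of the model in Figure 1, to:

    $$\sum_{j=1}^{n} \lambda^{j}_{x,\theta} + \lambda^{D}_{x,\theta} \quad \text{for FA policies, and}$$

    $$\sum_{j=1}^{n} \lambda^{j}_{x,\theta} \quad \text{for SA policies.}$$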

    Figure 1. A cause specific model for critical illness insurance.

    Figure 2. An all causes model for critical illness insurance.

    Alternatively, we could use the model in Figure 2 to model FA and SA policies together.

    In this case, Insured event has different meanings for FA and SA policies: for FA policies

    it would include death before diagnosis with a CI whereas for SA policies it would not.

    We might reasonably expect that Benefit type, FA or SA, would be an important covariate

    in this model and that, other things being equal, the total claim rate for FA policies

    would be higher than for SA policies.

    A more satisfactory model, in terms of Benefit type, FA or SA, is illustrated in Figure 3.
    The transition intensities $\lambda^{CI}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$ in Figure 3 correspond to
    $\sum_{j=1}^{n} \lambda^{j}_{x,\theta}$ and $\lambda^{D}_{x,\theta}$ in
    Figure 1. For FA policies, a transition to either of the two exit states would result in a

    claim. For SA policies only a transition to Diagnosed with a CI would result in a claim;

    death before diagnosis with a CI would terminate the policy.

    In Paper II, we discuss the parameterisation of the models represented in
    Figures 1-3.
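    The relationship between the cause specific intensities and the total claim intensity for each Benefit type can be sketched in Python as follows. This is only an illustration: the function name and all numerical intensities are hypothetical and are not taken from the data or from Paper II.

```python
def total_claim_intensity(cause_intensities, death_intensity, benefit_type):
    """Total intensity of a claim at a given age and covariate set.

    cause_intensities: dict mapping CI cause -> transition intensity
    death_intensity:   intensity of death before diagnosis with a CI
    benefit_type:      "FA" (full accelerated) or "SA" (stand alone)
    """
    ci_total = sum(cause_intensities.values())  # sum over the n CI causes
    if benefit_type == "FA":
        # FA policies pay on CI diagnosis or on death before diagnosis
        return ci_total + death_intensity
    if benefit_type == "SA":
        # SA policies pay on CI diagnosis only
        return ci_total
    raise ValueError("benefit_type must be 'FA' or 'SA'")


# Hypothetical annual intensities, for illustration only
causes = {"cancer": 0.002, "heart attack": 0.001}
fa_rate = total_claim_intensity(causes, 0.0005, "FA")
sa_rate = total_claim_intensity(causes, 0.0005, "SA")
print(fa_rate, sa_rate)
```

    With any positive death intensity the FA total exceeds the SA total, which is the ordering anticipated in the discussion of Figure 2.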

    3. Data

    3.1. Covariates

    We were provided by the CMI with a set of CI data relating to UK policies in the seven

    calendar years from 1999 to 2005. The data consisted of records of policies in force at the

    start and at the end of each of the seven years and details of claims settled within the seven

    years. The covariates included in each data record were as follows:

    (1) Sex.

    (2) Smoker status: non-smoker or smoker.

    (3) Benefit type: FA or SA.

    (4) Office (coded anonymously).

    (5) Policy type: joint or single life.

    (6) Benefit amount in pounds.

    (7) Date of birth.

    (8) Date of commencement of the policy.

    The original data set contained details of 27,244 claims. Data from some offices could not

    be used because of problems associated with missing claims information. Data from these
    offices, both in-force and claims, were removed from our analyses, leaving us with data
    from a total of 13 offices consisting of 19,127 claims and approximately 18,000,000 policy
    years of exposure.

    Figure 3. An all CI causes and death model for critical illness insurance (Ozkok et al. (2012a)).

    An additional covariate included in our data was Sales channel, which took one of five

    possible values: Bancassurer, Direct sales, IFA, Other or Unknown. There was a very close

    association between Sales channel and Office: 6 of the 13 offices used only one sales
    channel, a further three offices used just two, and one office classified all its data as sales

    channel Unknown. We decided that it was unnecessary to include both Sales channel and

    Office as possible covariates so we excluded the former from our analyses.

    For Joint Life policies, both lives are included in the in-force data, but only one claim

    can occur.

    The presence of duplicate policies in the data would not affect point estimates of the

    claim diagnosis rates, but would affect the standard deviations (SD) of these estimates.

    This would distort the goodness of fit statistics, making the fit appear to be worse. No

    attempt was made to remove duplicate policies from either our in-force or our claims files.

    An investigation by the CMI of their 1999-2004 data indicated that this was not likely to
    be a serious problem [see CMI WP 33 (2008, paragraphs 4.10-4.12)].

    3.2. Cause of claim

    For the claims records, we were also provided with information about the type of claim,

    death or CI, and, in the latter case, the cause of claim. There were 55 different causes of

    claim, including death, in our data set, of which many related to different sites for cancer.

    Among the specified sites for cancer, the largest group was female breast cancer with 1838

    claims. Since we have a significant amount of data for this cancer, it is tempting to analyse

    it as a separate cause. However, cancer claims include 'site not specified', which has the
    largest number of claims, 4363 out of a total of 19,127. Since 'site not specified' includes
    female breast cancer claims, as well as other cancers, we cannot reliably analyse female

    breast cancer as a separate cause [see CMI WP 43 (2010, p. 37)].

    We grouped the claims into 10 separate causes, including death, with cancer treated as a

    single cause and the numerically minor causes, such as Motor Neurone Disease and

    Angioplasty grouped together as Other causes. Table 1 shows the split of our claims data

    by various factors, including cause.

    The Association of British Insurers (ABI) has issued a series of reports on CI

    insurance, starting in 1999, with the most recent (to date) being ABI (2011), designed to

    clarify and standardise the definitions of illnesses covered by CI policies in the UK.

    These definitions are accepted for most of the illnesses as a standard guide by the

    insurance companies.

    3.3. Missing dates

    The natural sequence of events for a CI claim is:

    Diagnosis: the illness is diagnosed, or death occurs, then,

    Notification: a claim is notified to the insurer, then,

    Admission: the insurer admits the claim, and finally,

    Settlement: the insurer settles the claim.

    For each claim, the CMI asks contributing offices to provide the date for each of these

    four events. However, some claim records have one or more of these dates missing. Table 2

    shows the mean delay (in days) between selected pairs of these dates and the percentage of

    the 19,127 claims which have both dates recorded. Just over 9% of the claims records had

    no date of diagnosis, but all of these claims had a date of settlement. We need the date of

    diagnosis because diagnosis with a CI, rather than the settlement of the subsequent claim,

    is the insured event (note that for all three models in Section 2, diagnosis with a CI
    triggers the transition from Healthy) and the date of diagnosis determines important
    characteristics of the claim, for example, the age of the insured life and the duration of the
    policy when the diagnosis occurred. By modelling the time delay between the dates of
    diagnosis and settlement of a claim, the so-called Claim Delay Distribution (CDD), we
    can estimate missing dates of diagnosis by subtracting from the date of settlement the
    median of the CDD. The sensitivity of our final results to the use of the median in this
    context is discussed in Paper II.

    Table 1. Number of claims and percentages by various factors.

    Factor              Level                             Number of claims (%)
    Benefit type        Full accelerated                  16,875 (88.2%)
                        Stand alone                       2252 (11.8%)
    Joint/single life   Joint life                        9743 (50.9%)
                        Single life                       9384 (49.1%)
    Gender              Female                            8173 (42.7%)
                        Male                              10,954 (57.3%)
    Smoker status       Non-smoker                        14,129 (73.9%)
                        Smoker                            4998 (26.1%)
    Cause of claim      Coronary artery bypass graft      393 (2.1%)
                        Cancer                            9381 (49.0%)
                        Death                             3371 (17.6%)
                        Heart attack                      2220 (11.6%)
                        Kidney failure                    110 (0.6%)
                        Major organ transplant            36 (0.2%)
                        Multiple sclerosis                825 (4.3%)
                        Other                             1265 (6.6%)
                        Stroke                            1027 (5.4%)
                        Total and permanent disability    499 (2.6%)
    Type of claim       Critical illness                  15,756 (82.4%)
                        Death                             3371 (17.6%)

    Table 2. Average observed delays between dates of diagnosis, notification, admission and settlement (in days).

                                        Diagnosis to    Notification to   Admission to   Diagnosis to
                                        notification    admission         settlement     settlement
    Mean delay (days)                   93              80                18             185
    Number of observations              15,585          9190              9752           15,860
    Percentage of observations
    having both dates                   81              48                51             83

    All claim records had a year of settlement, but, in some cases, the exact date of

    settlement was missing. For all these cases, the date of diagnosis was given and a date of

    settlement was estimated using the median of the CDD.

    The modelling of the CDD is discussed in Section 4 below [see also CMI WP 14

    (2005)].
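    The date estimation described in this section can be sketched in Python as follows. This is a minimal illustration only: the function names are hypothetical, and the 150-day median is an assumed placeholder rather than a value taken from the fitted CDD.

```python
from datetime import date, timedelta


def estimate_diagnosis_date(settlement: date, median_delay_days: float) -> date:
    """Missing date of diagnosis: date of settlement minus the CDD median."""
    return settlement - timedelta(days=round(median_delay_days))


def estimate_settlement_date(diagnosis: date, median_delay_days: float) -> date:
    """Missing exact date of settlement: date of diagnosis plus the CDD median."""
    return diagnosis + timedelta(days=round(median_delay_days))


# ASSUMPTION: an illustrative median delay, not a fitted value
ASSUMED_MEDIAN_DAYS = 150.0

settled = date(2004, 7, 1)
diagnosed = estimate_diagnosis_date(settled, ASSUMED_MEDIAN_DAYS)
print(diagnosed)
```

    In practice the median would depend on the covariates of the claim, because the scale of the fitted CDD varies from claim to claim.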

    4. Modelling the CDD

    4.1. Earlier work allowing for missing dates

    In an earlier paper, Ozkok et al. (2012b), the authors described how to model the CDD

    using the same data set being used in the present paper. The details are not repeated here.

    Key points about this modelling exercise were:

    (1) Various GL-type models were fitted to the data, with different error distributions,

    most notably the lognormal and the three parameter Burr. The latter gave the best fit.

    (2) We fitted three parameter Burr distributions, parameterised as follows:

    $$f(u; \alpha, \tau, s) = \frac{\alpha \tau (u/s)^{\tau}}{u \left[1 + (u/s)^{\tau}\right]^{\alpha + 1}} \qquad (1)$$

    where $f(\cdot)$ is the probability density function of the CDD, $\alpha$ and $\tau$ are (positive) shape
    parameters and $s$ is a (positive) scale parameter. With this parametrisation, the $k$th
    moment of the delay between diagnosis and settlement is:

    $$\frac{s^{k}\, \Gamma(\alpha - k/\tau)\, \Gamma(1 + k/\tau)}{\Gamma(\alpha)} \qquad (2)$$

    for $\alpha > k/\tau$, and $\infty$ otherwise.

    Note that the CMI has also used Burr distributions to fit CDDs to CI data sets
    (CMI WP 33 (2008)), but not in a GLM setting.

    (3) The mean of our CDD is a loglinear function of a selected set of covariates denoted
    by the vector $\theta$, so that:

    $$E[X] = \exp(\beta \theta^{T}) \qquad (3)$$

    where $X$ denotes the delay and $\beta$ is a set of regression coefficients. The equation for
    the mean given by Equation (3) is achieved by modelling the parameter $s$ as follows:

    $$s = \frac{\Gamma(\alpha)}{\Gamma(\alpha - 1/\tau)\, \Gamma(1 + 1/\tau)} \exp(\beta \theta^{T}).$$

    The variance of the distribution is given by:

    $$V[X] = s^{2}\, \frac{\Gamma(\alpha)\, \Gamma(\alpha - 2/\tau)\, \Gamma(1 + 2/\tau) - \Gamma(\alpha - 1/\tau)^{2}\, \Gamma(1 + 1/\tau)^{2}}{\Gamma(\alpha)^{2}} \qquad (4)$$

    and the loglikelihood is given by:

    $$n \log \alpha + n \log \tau + \alpha \tau \sum_{i} \log s + (\tau - 1) \sum_{i} \log d_{i} - (\alpha + 1) \sum_{i} \log\left(s^{\tau} + d_{i}^{\tau}\right) \qquad (5)$$

    where $d_{i}$ is the delay for the $i$th claim and $n$ is the number of claims.

    (4) The parameters $\alpha$ and $\tau$ and the regression coefficients $\beta$ were modelled using

    Bayesian techniques using the full data set consisting of 19,127 claims. Missing event

    dates were treated as additional parameters and estimated using their posterior

    predictive distributions. Truncation was used where appropriate, so that, for example,

    a missing date of diagnosis could not be before the date of commencement of the

    policy or after the date of notification of the claim. The Bayesian analysis results in a

    posterior distribution for each parameter, and, in particular, for each missing date of

    diagnosis. A point estimate of the missing date could be, for example, the median of

    the posterior distribution.

    (5) Gibbs variable selection was used to determine which covariates should be retained in

    the models.
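    The Burr quantities used in this section can be checked numerically with a short Python sketch. It assumes the parameterisation of Equation (1); the function names and the 180-day target mean are illustrative assumptions, while the shape parameters are the posterior means reported later in Table 4. The median uses the standard Burr distribution function $F(u) = 1 - [1 + (u/s)^{\tau}]^{-\alpha}$.

```python
import math


def burr_pdf(u, alpha, tau, s):
    """Density of the three parameter Burr distribution, Equation (1)."""
    z = (u / s) ** tau
    return alpha * tau * z / (u * (1.0 + z) ** (alpha + 1.0))


def burr_moment(k, alpha, tau, s):
    """k-th moment, Equation (2); infinite unless alpha > k / tau."""
    if alpha <= k / tau:
        return math.inf
    return (s ** k * math.gamma(alpha - k / tau) * math.gamma(1.0 + k / tau)
            / math.gamma(alpha))


def scale_from_mean(mean, alpha, tau):
    """Scale s giving the requested mean (needs alpha > 1 / tau)."""
    return (mean * math.gamma(alpha)
            / (math.gamma(alpha - 1.0 / tau) * math.gamma(1.0 + 1.0 / tau)))


def burr_median(alpha, tau, s):
    """Median, from solving F(u) = 1/2 for the Burr distribution function."""
    return s * (2.0 ** (1.0 / alpha) - 1.0) ** (1.0 / tau)


alpha, tau = 0.618, 2.570                # posterior means of the shape parameters
s = scale_from_mean(180.0, alpha, tau)   # ASSUMPTION: illustrative mean of 180 days
print(burr_moment(1, alpha, tau, s))     # recovers the target mean
print(burr_moment(2, alpha, tau, s))     # infinite, since alpha < 2 / tau
print(burr_median(alpha, tau, s))
```

    Because the second moment does not exist for these shape parameters, the median is a more stable summary of the delay than the mean plus or minus a multiple of the standard deviation.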

    4.2. Allowing for business growth

    The analysis in Ozkok et al. (2012b) did not take into account one significant factor:

    business growth. For almost all the offices contributing to our data and for almost all

    years, the number of CI policies in force increased year on year. If this is not taken into

    account, it can introduce bias into the modelling of the CDD. Recall that our claims data

    consist of claims settled in the years 1999-2005. For claims settled in any of these 7 years,
    those with relatively short delays relate to claims from policies in force in more recent

    years; those with relatively longer delays relate to policies in force in earlier years. The

    growth in the numbers of policies in force means that claims with shorter delays are likely

    to be relatively over-represented in our data. We allow for business growth in our

    modelling of the CDD as follows:

    (1) For each office and each year of diagnosis we assign a growth rate, denoted GR. This

    depends only on office and year of diagnosis and not on any other characteristic of

    the claim, for example type of policy (FA or SA) or age of the policyholder.

    (2) For the most recent year for which the office contributed data, 2005 for all but 2 of the

    13 offices, GR is set at 1. For each earlier year of diagnosis, GR is set equal to the ratio

    of the average number of policies in force in the following year to the average number

    in force in the year in question. In this context, average number in force is the average

    of the numbers of policies in force at the start and at the end of the year. For years of

    diagnosis prior to the earliest for which the office contributed data, the growth rate is

    assumed to be the same as in the earliest year for which data exists.

    (3) For each office and each year of diagnosis we assign a growth factor, denoted GF. The

    growth factor is the product of the growth rates for that year of diagnosis and all

    subsequent years of diagnosis up to the final year for which the office contributed

    data.

    (4) For the purposes of parameter estimation, the parameter $s$ in the three parameter
    Burr distribution is replaced in the loglikelihood function (5) by $s_w$, where:

    $$s_w = s / \sqrt{GF}.$$

    The effect of this is to decrease the variance for this claim by a factor GF, as can be

    seen from Equation (4), so giving more weight to data from years where relatively few

    policies were in force. Using weights inversely proportional to the variance is common

    in weighted least squares estimation (see, for example, Greene (1990)).

    The procedure described above requires the date of diagnosis to be known so that GF

    can be estimated. For claims where the date of diagnosis was not known, an iterative

    procedure was used. A CDD was parameterised without allowing for business growth and

    a preliminary estimate of the year of diagnosis was calculated from the date of settlement

    minus the median of the CDD. A value for GF was calculated using this preliminary

    estimate, a revised CDD was parameterised and a revised estimate of the year of diagnosis

    was calculated. The process ended when two consecutive estimates of the year of diagnosis
    were the same; this never took more than three iterations.
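    The growth rate and growth factor calculation described in steps (1) to (4) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the in-force counts are hypothetical, the years are assumed to be consecutive for a single office, and years of diagnosis before the earliest contributed year are omitted for brevity.

```python
import math


def growth_rates_and_factors(in_force):
    """in_force: dict year -> (policies at start of year, policies at end of year).

    Returns dict year -> (GR, GF) for one office.
    """
    years = sorted(in_force)
    # Average number in force: mean of the start-of-year and end-of-year counts
    avg = {y: (in_force[y][0] + in_force[y][1]) / 2.0 for y in years}

    gr = {}
    for y in years:
        # GR is 1 for the most recent year, else next year's average / this year's
        gr[y] = avg[y + 1] / avg[y] if (y + 1) in avg else 1.0

    gf = {}
    running = 1.0
    for y in reversed(years):
        # GF: product of GR for this year and all subsequent years
        running *= gr[y]
        gf[y] = running
    return {y: (gr[y], gf[y]) for y in years}


def weighted_scale(s, gf):
    """Scale parameter used in the loglikelihood: s_w = s / sqrt(GF)."""
    return s / math.sqrt(gf)


# Hypothetical in-force counts for one office
counts = {2003: (100, 140), 2004: (140, 180), 2005: (180, 220)}
result = growth_rates_and_factors(counts)
print(result)
print(weighted_scale(100.0, result[2003][1]))
```

    In this toy example the growth factor for 2003 equals the ratio of the 2005 average in force to the 2003 average, so claims diagnosed in 2003 receive a smaller scale parameter and hence more weight.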

    4.3. Details of the covariates

    Details of the covariates used in the modelling of the CDD are given in Table 3. These

    covariates are labelled $x$ and $\theta_1$ to $\theta_9$. The values of $x$ and $\theta_1$ to $\theta_7$ for each claims record
    have been standardised by subtracting the mean and dividing by the standard deviation,
    calculated from the claims data. This makes sense for covariates where the non-standardised
    value can be very large, for example, benefit amount, and has been done
    for consistency for other covariates. For example, Sex has been coded 0 for females, 1 for
    males and then standardised for each record by subtracting the mean, 0.573, and
    dividing by the standard deviation, 0.495, so that a claims record for a female has a
    value $\theta_1 = (0 - 0.573)/0.495 = -1.158$.

    Note that Settlement year is a covariate but Year of diagnosis is not. It would not be

    appropriate to have both as covariates. We used the former because we have full

    information about Settlement year whereas we have had to estimate the latter in some

    cases. This causes minor complications in the estimation of diagnosis rates (see Section 5.4).

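    Equation (3) can then be written in more detail as follows:

    $$E[X] = \exp(\beta \theta^{T}) = \exp\Big(\beta_{0} + \beta_{1} x + \sum_{j=1}^{7} \beta_{j+1} \theta_{j} + \beta_{9,\text{Office}_i} + \beta_{10,\text{Cause}_i}\Big) \qquad (6)$$

    where $\beta_0$ is an intercept and individual $\beta$s are taken to be zero if the corresponding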

    covariate is not included in the model.

    4.4. The best fitting CDD

    To estimate the missing dates of diagnosis, the best fitting CDD was selected. The

    covariates to be included in this model were selected by Gibbs variable selection; those

    excluded were $x$ (Age), $\theta_1$ (Sex), $\theta_3$ (Smoker status) and $\theta_5$ (Settlement year). The means

    and the standard deviations of the regression coefficients and the two Burr parameters are

    given in Table 4.

    Some points to note about the regression coefficients in Table 4 are:

    (1) Benefit amount has a negative coefficient, so that the larger the amount, the shorter

    the expected delay.

    (2) The expected delay depends on Office and can differ by up to a factor of 2.4, since
    exp(0.581 + 0.315) = 2.4.

    (3) The expected delay depends on Cause and can differ by up to a factor of 2.1, since
    exp(0.215 + 0.542) = 2.1, with death giving the shortest delays, and TPD the longest.

    (4) The mean and the variance of the posterior distributions for the parameters $\alpha$ and $\tau$
    are shown in Table 4. From these values, we can see that whatever (reasonable) point
    estimates we choose for these parameters, $\alpha$ is less than $2/\tau$, and so the standard
    deviation of the posterior distribution for the delay is infinite.

    (5) 95% credible intervals for all the parameters are given approximately by the mean

    plus and minus twice the standard deviation.

    The best fitting CDD is taken to mean the three-parameter Burr distribution, as

    specified in Section 4.1, whose coefficients/parameters are equal to the means shown in

    Table 4.

    Table 5 shows 11 scenarios. Scenario 1 has typical values for its covariates; bold font

    marks a change in a covariate from Scenario 1. The mean of the posterior distribution for

    Table 3. Definitions of the covariates used in the modelling of the CDDs.

    Covariate                    Number of levels   Additional information       Mean     SD
    x   Age                                         Age last birthday            44.424   9.478
    θ1  Sex                      2                  F = 0, M = 1                 0.573    0.495
    θ2  Benefit type             2                  FA = 0, SA = 1               0.118    0.322
    θ3  Smoker status            2                  N = 0, S = 1                 0.261    0.439
    θ4  Policy type              2                  JL = 0, SL = 1               0.491    0.499
    θ5  Settlement year          7                  1999 – 1, 2000 – 2, ...      4.917    1.786
    θ6  Benefit amount (£)       Continuous                                      55,397   56,988
    θ7  Policy duration (days)   Continuous                                      1167     946
    θ8  Office                   13
    θ9  Cause of claim           10                 1. CABG
                                                    2. Cancer
                                                    3. Death
                                                    4. Heart attack
                                                    5. Kidney failure
                                                    6. Major organ transplant
                                                    7. Multiple sclerosis
                                                    8. Other
                                                    9. Stroke
                                                    10. Total and permanent disability


    the delay between diagnosis and settlement for each of these 11 scenarios is shown in Table 6, together with the standard deviation and some percentage points of the estimate of the mean. Note that these are not the standard deviation and percentage points of the posterior distribution itself; the standard deviation is infinite in every case, as pointed out in comment (4) earlier in this section. The means in Table 6 can be obtained from Equation (6), noting that for this model $\theta = (\theta_2, \theta_4, \theta_6, \theta_7, \theta_8, \theta_9)$, and using the information in Table 3 and the parameters in Table 4. For example, the mean delay for scenario 1 is calculated as follows:

    $$E[X] = \exp\left(5.469 - 0.023\,\frac{0 - 0.118}{0.322} + 0.034\,\frac{0 - 0.491}{0.499} - 0.032\,\frac{50\,000 - 55\,397}{56\,988} - 0.098\,\frac{1460 - 1167}{946} - 0.158 - 0.101\right) \approx 174 \text{ days}$$
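    The scenario 1 calculation can be checked with a short script. This is a minimal sketch, not the authors' code: the function and dictionary names are hypothetical, only the offices and causes appearing in Table 5 are included, and the signs of the coefficients are an assumption, taken to be those that reproduce the scenario means in Table 6 (for example, a negative coefficient for Benefit amount, as stated in point (1)).

```python
import math

# Means and standard deviations from Table 3, used to standardise covariates.
MOMENTS = {
    "benefit_type": (0.118, 0.322),
    "policy_type": (0.491, 0.499),
    "benefit_amount": (55397.0, 56988.0),
    "policy_duration": (1167.0, 946.0),
}

# Posterior means from Table 4. Signs are an assumption: they are the ones
# that reproduce the means reported in Table 6.
INTERCEPT = 5.469
COEFFS = {
    "benefit_type": -0.023,
    "policy_type": 0.034,
    "benefit_amount": -0.032,
    "policy_duration": -0.098,
}
OFFICE = {6: -0.050, 10: 0.201, 11: -0.158}
CAUSE = {"Cancer": -0.101, "Death": -0.542, "TPD": 0.121}


def mean_delay(benefit_type, policy_type, benefit_amount,
               policy_duration, office, cause):
    """Mean delay in days: exp of the linear predictor on standardised covariates."""
    values = {
        "benefit_type": benefit_type,        # FA = 0, SA = 1
        "policy_type": policy_type,          # JL = 0, SL = 1
        "benefit_amount": benefit_amount,
        "policy_duration": policy_duration,  # in days
    }
    eta = INTERCEPT + OFFICE[office] + CAUSE[cause]
    for name, value in values.items():
        mean, sd = MOMENTS[name]
        eta += COEFFS[name] * (value - mean) / sd
    return math.exp(eta)


scenario_1 = mean_delay(0, 0, 50_000, 1460, 11, "Cancer")
scenario_10 = mean_delay(0, 0, 50_000, 1460, 11, "Death")
print(round(scenario_1), round(scenario_10))
```

    Changing one argument at a time corresponds to moving between the scenarios of Table 5.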

    Table 4. Coefficients for the best fitting CDD model.

    Covariate         Parameter       Mean     SD
    Intercept         β0              5.469    0.025
    Benefit type      β3             -0.023    0.006
    Policy type       β5              0.034    0.006
    Benefit amount    β7             -0.032    0.007
    Policy duration   β8             -0.098    0.007
    Office            β9,Office1      0.303    0.022
                      β9,Office2      0.215    0.020
                      β9,Office3      0.205    0.061
                      β9,Office4      0.249    0.050
                      β9,Office5      0.090    0.037
                      β9,Office6     -0.050    0.085
                      β9,Office7      0.129    0.118
                      β9,Office8      0.106    0.021
                      β9,Office9      0.315    0.025
                      β9,Office10     0.201    0.033
                      β9,Office11    -0.158    0.017
                      β9,Office12     0.209    0.023
                      β9,Office13     0.581    0.047
    Cause of claim    β10,Cause1      0.145    0.040
                      β10,Cause2     -0.101    0.018
                      β10,Cause3     -0.542    0.026
                      β10,Cause4      0.029    0.023
                      β10,Cause5      0.129    0.079
                      β10,Cause6      0.194    0.116
                      β10,Cause7      0.152    0.033
                      β10,Cause8      0.006    0.028
                      β10,Cause9      0.215    0.027
                      β10,Cause10     0.121    0.056
                      α               0.618    0.015
                      τ               2.570    0.034

    Table 5. Scenarios for prediction of the CDD under the best fitting model.

    Scenario 1 2 3 4 5 6

    Benefit type FA SA FA FA FA FA

    Joint/single life J J S J J J

    Benefit amount 50,000 50,000 50,000 10,000 250,000 50,000

    Policy duration 1460 1460 1460 1460 1460 365

    Office code 11 11 11 11 11 11

    Cause of claim Cancer Cancer Cancer Cancer Cancer Cancer

    Scenario 7 8 9 10 11

    Benefit type FA FA FA FA FA

    Joint/single life J J J J J

    Benefit amount 50,000 50,000 50,000 50,000 50,000

    Policy duration 3650 1460 1460 1460 1460

    Office code 11 6 10 11 11

    Cause of claim Cancer Cancer Cancer Death TPD


    4.5. CDDs incorporating all covariates

    For the purpose of estimating CI diagnosis rates, it is convenient to have a CDD which

    incorporates all possible covariates which could be included in our model. Since we

    estimate CI diagnosis rates for an all causes model and for a cause specific model

    (Paper II), we need two further CDDs: one which incorporates all the covariates x and $\theta_1$–$\theta_8$, but does not include cause, and a separate CDD which incorporates all the covariates x and $\theta_1$–$\theta_9$, which includes cause. The coefficients for these two CDDs are set out in Tables 7 and 8.

    Points to note about these two CDDs are:

    (1) They were fitted in the same way as the best fitting CDD in Section 4.4, with the one

    difference that Gibbs variable selection was not used to determine which covariates

    are important and so should be retained. In particular, the fitting used Bayesian

    methodology, allowing for business growth and including claim records where data

    were missing.

    (2) For each covariate incorporated, the methodology produces a posterior distribution

    for the regression coefficient. Tables 7 and 8 show the mean and the standard

    deviation of the posterior distribution for each coefficient.

    (3) For each of these two CDDs, $\alpha < 2/\tau$ using the means, or any reasonable estimates, of these parameters, so that the standard deviation of the posterior distribution for the delay is always infinite.

    (4) 95% credible intervals for all the parameters in Tables 7 and 8 are given

    approximately by the mean plus and minus twice the standard deviation.

    (5) These two CDDs are used in Paper II when we discuss the estimation of CI diagnosis

    rates. See Section 5.4 for an outline of the methodology.

    The CDD with all covariates excluding (resp. including) cause is taken to mean the three-

    parameter Burr distribution, as specified in Section 4.1, whose coefficients/parameters are

    equal to the means shown in Table 7 (resp. Table 8).

    Table 6. The mean of the posterior distribution of the CDD under the different scenarios given in Table 5 using

    the best fitting model, and the standard deviation and some percentage points of the estimate of the mean.

    Scenario Mean SD 2.5% 50% 97.5%

    1 174 4.0 167 174 182

    2 162 4.8 153 162 172

    3 186 4.3 178 186 195

    4 178 4.1 170 178 186

    5 156 5.1 146 155 166

    6 195 4.4 187 194 204

    7 139 4.2 131 139 147

    8 194 17.8 162 194 231

    9 249 10.0 230 249 270

    10 112 3.2 106 112 119

    11 217 12.7 193 217 243


    5. Estimating and smoothing CI diagnosis rates

    5.1. Preliminaries

    In this section, we outline the procedure for the estimation and smoothing of the CI diagnosis rates for the models in Figures 1–3. We use the generic notation $\lambda_{x,\theta}$ for this diagnosis rate, even though this may be, for example, a cause specific rate, as in Figure 1.

    We assume the following general functional form for the diagnosis rate:

    $$\lambda_{x,\theta} = g_r(x) + \exp\{f_s(x) + \beta\theta^{T}\}, \qquad r, s = 0, 1, \ldots \tag{7}$$

    Table 7. Coefficients for the CDD with all covariates except cause.

    Covariate         Parameter       Mean     SD
    Intercept         β0              5.288    0.020
    Age               β1              0.006    0.006
    Sex               β2              0.022    0.005
    Benefit type      β3              0.010    0.005
    Smoker status     β4              0.015    0.005
    Policy type       β5              0.033    0.005
    Settlement year   β6              0.008    0.006
    Benefit amount    β7              0.026    0.006
    Policy duration   β8              0.083    0.007
    Office            β9,Office1      0.279    0.020
                      β9,Office2      0.203    0.018
                      β9,Office3      0.184    0.056
                      β9,Office4      0.279    0.046
                      β9,Office5      0.112    0.035
                      β9,Office6      0.025    0.068
                      β9,Office7      0.086    0.120
                      β9,Office8      0.122    0.019
                      β9,Office9      0.302    0.023
                      β9,Office10     0.201    0.030
                      β9,Office11     0.170    0.017
                      β9,Office12     0.226    0.021
                      β9,Office13     0.581    0.030
                      α               0.543    0.011
                      τ               2.958    0.036

    Table 8. Coefficients for the CDD with all covariates, including cause.

    Covariate         Parameter       Mean     SD
    Intercept         β0              5.206    0.022
    Age               β1              0.014    0.006
    Sex               β2              0.010    0.005
    Benefit type      β3              0.026    0.005
    Smoker status     β4              0.011    0.005
    Policy type       β5              0.030    0.005
    Settlement year   β6              0.116    0.006
    Benefit amount    β7              0.036    0.006
    Policy duration   β8              0.103    0.006
    Office            β9,Office1      0.217    0.019
                      β9,Office2      0.095    0.017
                      β9,Office3      0.209    0.053
                      β9,Office4      0.177    0.043
                      β9,Office5      0.190    0.033
                      β9,Office6      0.391    0.062
                      β9,Office7      0.344    0.112
                      β9,Office8      0.004    0.019
                      β9,Office9      0.193    0.022
                      β9,Office10     0.178    0.029
                      β9,Office11     0.120    0.016
                      β9,Office12     0.197    0.020
                      β9,Office13     0.587    0.028
    Cause of claim    β10,Cause1      0.137    0.036
                      β10,Cause2      0.120    0.018
                      β10,Cause3      0.498    0.019
                      β10,Cause4      0.026    0.021
                      β10,Cause5      0.106    0.067
                      β10,Cause6      0.149    0.109
                      β10,Cause7      0.137    0.029
                      β10,Cause8      0.003    0.024
                      β10,Cause9      0.203    0.025
                      β10,Cause10     0.182    0.047
                      α               0.660    0.015
                      τ               2.850    0.034


    where $g_r(x)$ and $f_s(x)$ are polynomials in age x (last birthday) of degree r and s, respectively, so that:

    $$\lambda_{x,\theta} = \sum_{i=1}^{r} \kappa_i x^{i-1} + \exp\left(\sum_{j=1}^{s} \delta_j x^{j-1} + \beta\theta^{T}\right) \tag{8}$$

    where $\kappa_i$ and $\delta_j$, $i = 1, \ldots, r$, $j = 1, \ldots, s$, are constants, $\theta$ is a vector of covariates, and $\beta$ is a vector of regression coefficients.

    Points to note about this general functional form are:

    (1) Without the covariates in $\theta$ this is a Gompertz–Makeham GM(r,s) function of age.

    (2) Without the term $g_r(x)$ this gives the linear predictor in a generalised linear model incorporating the covariates in $\theta$ and age, x, albeit with powers of age included up to $x^{s-1}$.

    (3) Although it is not explicit in the notation, we can, and do, allow for interaction terms involving two covariates, including age.

    (4) For any given set of covariates, $\theta$, and regression coefficients, $\beta$, $\lambda_{x,\theta}$ is necessarily a smooth function of age.

    (5) Many different functional forms could have been chosen for $\lambda_{x,\theta}$. The particular functional form in Equation (7) was chosen because:

    (i) it is very flexible,

    (ii) it allows one component of the diagnosis rate, gr(x), to depend only on age, and,

    (iii) if, as happened in almost all cases, the optimal value of r is 0, it reduces to a

    generalised linear model, making it possible to use standard statistical software

    to estimate the parameters.

    To determine which covariates should be included in the model and to estimate the parameters r, s, $\kappa_i$, $\delta_j$ and $\beta$, we need an estimator for $\lambda_{x,\theta}$, calculated from our data, with known statistical properties.
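    As an illustration of Equation (8), the following minimal sketch evaluates the functional form for given parameter vectors. The function name and the numerical values are hypothetical and are not taken from the fitted models; age is used unstandardised here for simplicity.

```python
import math


def diagnosis_rate(x, kappa, delta, beta, theta):
    """Evaluate Equation (8).

    kappa: sequence (kappa_1, ..., kappa_r), coefficients of x**0, ..., x**(r-1)
    delta: sequence (delta_1, ..., delta_s), coefficients of x**0, ..., x**(s-1)
    beta, theta: regression coefficients and covariate values, same length
    """
    # enumerate starts at 0, so the i-th coefficient multiplies x**(i-1).
    polynomial_part = sum(k * x ** i for i, k in enumerate(kappa))
    exponent = sum(d * x ** j for j, d in enumerate(delta))
    exponent += sum(b * t for b, t in zip(beta, theta))
    return polynomial_part + math.exp(exponent)


# With r = 0 (empty kappa) the rate is exp of a linear predictor, as in a GLM.
glm_rate = diagnosis_rate(40, kappa=[], delta=[-6.0, 0.05], beta=[0.3], theta=[1])

# With r = 1 a constant term is added outside the exponential.
gm_rate = diagnosis_rate(40, kappa=[0.001], delta=[-6.0, 0.05], beta=[0.3], theta=[1])
```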

    5.2. The covariates used in the modelling of the intensity rates

    The full set of covariates to be considered in the modelling of the diagnosis rates is shown

    in Table 9.

    Points to note about the covariates in Table 9 are:

    (1) The list of covariates is the same as in Table 3, with the exception of Cause of claim, which is no longer needed as a covariate, and $\theta_5$, which was Settlement year but is now Year of exposure for the in-force or Year of diagnosis for the claims. However, the treatment of the covariates is, in some cases, different.

    (2) The maximum values of r and s required in Equation (7) were 1 and 3, respectively. The values of Age, Age$^2$ and Year were standardised by subtracting the mean and dividing by the standard deviation. The values of these moments, calculated from the in-force data, are shown in Table 9.


    (3) Two covariates, Benefit amount and Policy duration, were treated as continuous in the modelling of the CDD but are now categorised as shown in Table 9. The reason for this in both cases is computational convenience.

    (4) The regression coefficients for the covariates $\theta_6$–$\theta_8$ were chosen so that they summed to 0.

    (5) The regression coefficients for the covariates $\theta_1$–$\theta_4$ were chosen so that the base category, as indicated in Table 9, has coefficient zero and the alternative category has, if appropriate, a non-zero coefficient.

    5.3. Calculation of the exposure

    For each office we have in-force data for the start and end of all or some of the calendar

    years 1999, 2000, . . ., 2005. For each calendar year for which we have in-force data, we

    have details of claims settled in that year. Many offices contributed data for all seven

    calendar years. Those that did not, contributed data for a contiguous set of years, so that

    no offices contributed for two or more periods with breaks between them.

    For any given calendar year for which an office contributed data, we can count the number of policies in force at the start and end of the year classified by age x last birthday and by a set of covariates, $\theta$. Using linear interpolation, we can then estimate $E(x, u; \theta)$, the number of policies in force at time u, $0 \le u \le 1$, after the start of the year, classified by x and $\theta$.

    We make the simplifying assumption that a policy is removed from the in-force data as soon as a CI is diagnosed, or death occurs. In practice, there is at least a short period between diagnosis and the policy's removal because of the delay between diagnosis and notification, and there may be a significant period. With our assumption, we regard $E(x, u; \theta)$ as the number of policies exposed to the risk of the diagnosis of a CI, or death, at

    Table 9. Definitions of the covariates used in the modelling of the intensity rates.

    Covariate               Number of levels               Additional information
    x   Age last birthday   Integer values                 Age: mean = 39.75, SD = 11.21
                                                           Age²: mean = 1705, SD = 930
    θ1  Sex                 2 (F & M)                      F is the base category
    θ2  Benefit type        2 (FA & SA)                    FA is the base category
    θ3  Smoker status       2 (N & S)                      N is the base category
    θ4  Policy type         2 (Joint/Single life)          J is the base category
    θ5  Year                Numerical (1999, ..., 2005)    Calendar year of exposure/diagnosis
                                                           Year: mean = 2002.36, SD = 1.86
    θ6  Benefit amount      4                              1: Benefit amount < 25,000
                                                           2: 25,000 < Benefit amount < 50,000
                                                           3: 50,000 < Benefit amount < 75,000
                                                           4: Benefit amount > 75,000
    θ7  Policy duration     6                              Duration between the commencement of the policy and the
                                                           beginning of the year of exposure or diagnosis
                                                           Duration 0: Policy Duration < 1 year
                                                           Duration 1: 1 year < Policy Duration ≤ 2 years
                                                           Duration 2: 2 years < Policy Duration ≤ 3 years
                                                           Duration 3: 3 years < Policy Duration ≤ 4 years
                                                           Duration 4: 4 years < Policy Duration ≤ 5 years
                                                           Duration 5: Policy Duration > 5 years
    θ8  Office              13


    time u from the start of a given calendar year, for a given office, classified by x and $\theta$. In conventional actuarial terminology, this is a central exposure. Note that this exposure does not depend on whether we are estimating cause specific diagnosis rates or all causes rates.

    If we knew the number of critical illnesses (cause specific, all causes, including or excluding deaths, as appropriate) diagnosed in this year, for this office, classified by x and $\theta$, say $D(x; \theta)$, then, using standard methodology (see, for example, Macdonald (1996)), we could write:

    $$D(x; \theta) \sim \text{Poisson}\left(\lambda_{x,\theta} \int_{u=0}^{1} E(x, u; \theta)\, du\right)$$

    so that our estimator for the diagnosis rate, $\hat{\lambda}_{x,\theta}$, would be given by:

    $$\hat{\lambda}_{x,\theta} = \frac{D(x; \theta)}{\int_{u=0}^{1} E(x, u; \theta)\, du} \tag{9}$$

    which has a standard deviation which could be estimated by:

    $$\frac{\sqrt{D(x; \theta)}}{\int_{u=0}^{1} E(x, u; \theta)\, du}.$$

    The difficulty with this approach is that we do not know the number of critical illnesses

    diagnosed in this year; what we know is the number of critical illnesses settled in this year,

    and in the subsequent years within the observation period for which this office

    contributed data.
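    To make Equation (9) concrete, the following minimal sketch computes the central exposure and the crude rate for a single cell, assuming (hypothetically) that the number of diagnoses in the year were known. Under linear interpolation between the start and end of year in-force counts, the integral of the exposure over the year is the average of the two counts. The function names and the figures are illustrative only.

```python
import math


def central_exposure(in_force_start, in_force_end):
    """Integral over [0, 1] of the linearly interpolated in-force count."""
    return 0.5 * (in_force_start + in_force_end)


def crude_rate(diagnosed, in_force_start, in_force_end):
    """Estimator of Equation (9) and its estimated standard deviation."""
    exposure = central_exposure(in_force_start, in_force_end)
    rate = diagnosed / exposure
    std_dev = math.sqrt(diagnosed) / exposure
    return rate, std_dev


# Illustrative cell: 1000 policies at the start, 1400 at the end, 9 diagnoses.
rate, std_dev = crude_rate(9, 1000, 1400)
```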

    5.4. An estimator for $\lambda_{x,\theta}$

    We can get around the estimation problem outlined above as follows. Consider a specific office, calendar year for the exposure and diagnosis, age last birthday, x, and set of covariates, $\theta$.

    Let:

    $E(x, u; \theta)$ denote the exposure at time u years after the start of the specific calendar year,

    t denote the time in years from the start of the specific calendar year until the end of the last year, within the observation period, for which the specific office submitted data,

    $F(s; x, \theta)$ denote the cumulative distribution function for the CDD incorporating all the covariates in $\theta$ (in practice this is one of the two CDDs in Section 4.5, depending on whether we are estimating an all causes or a cause specific diagnosis rate), and

    $N(x; \theta)$ denote the number of critical illnesses (all causes or cause specific as required) diagnosed in the specific calendar year, for this office, at age x last birthday and with covariates $\theta$, and settled within one of the years (in the observation period) for which this office submitted data.

    Note that $N(x; \theta)$ differs from $D(x; \theta)$ since some critical illness claims included in the latter will not be settled until after the period in which the office contributes data. The


    probability that a CI diagnosed at time u will be settled by the end of the last year of contribution is $F(t - u; x, \theta)$. Hence, we can write:

    $$N(x; \theta) \sim \text{Poisson}\left(\lambda_{x,\theta} \int_{u=0}^{1} E(x, u; \theta)\, F(t - u; x, \theta)\, du\right)$$

    so that our estimator for the diagnosis rate, $\hat{\lambda}_{x,\theta}$, is given by:

    $$\hat{\lambda}_{x,\theta} = \frac{N(x; \theta)}{\int_{u=0}^{1} E(x, u; \theta)\, F(t - u; x, \theta)\, du} \tag{10}$$

    which has a standard deviation which can be estimated by:

    $$\frac{\sqrt{N(x; \theta)}}{\int_{u=0}^{1} E(x, u; \theta)\, F(t - u; x, \theta)\, du}. \tag{11}$$

    Comparing Equations (9) and (10), we can see that the numerators are different, as explained above, and that the denominator of the latter has been reduced by the inclusion of the term $F(t - u; x, \theta)$ to allow for the probability that a CI diagnosed in the specific year will be settled within the observation period.
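    The delay adjustment in Equations (10) and (11) can be sketched numerically as follows. This is an illustration under stated assumptions rather than the authors' implementation: the exposure is linearly interpolated as in Section 5.3, the integral is approximated by a midpoint rule, and `burr_cdf` uses the standard Burr XII form with a hypothetical scale parameter and with delays measured in years, which may differ from the parametrisation specified in Section 4.1. All names and figures are illustrative.

```python
import math


def burr_cdf(s, alpha, tau, scale):
    """Standard Burr XII CDF (assumed form); s and scale are in years."""
    if s <= 0:
        return 0.0
    return 1.0 - (1.0 + (s / scale) ** tau) ** (-alpha)


def adjusted_exposure(in_force_start, in_force_end, t, cdf, steps=1000):
    """Midpoint-rule approximation to the integral of E(u) * F(t - u) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        exposure = in_force_start + u * (in_force_end - in_force_start)
        total += exposure * cdf(t - u) * h
    return total


def delay_adjusted_rate(settled, in_force_start, in_force_end, t, cdf):
    """Estimator of Equation (10) and its standard deviation, Equation (11)."""
    denominator = adjusted_exposure(in_force_start, in_force_end, t, cdf)
    return settled / denominator, math.sqrt(settled) / denominator


# Illustrative cell: two years remain until the end of the observation period.
cdd = lambda s: burr_cdf(s, alpha=0.6, tau=2.8, scale=0.4)
reduced = adjusted_exposure(1000, 1400, 2.0, cdd)
rate, std_dev = delay_adjusted_rate(9, 1000, 1400, 2.0, cdd)
```

    If every claim were certain to be settled within the observation period (F identically 1), the adjusted exposure would reduce to the central exposure in Equation (9).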

    Points to note about this estimation methodology are:

    (1) As a starting point, the exposure, $E(x, u; \theta)$, and the claims count, $N(x; \theta)$, are classified by every combination of all possible covariates, as listed in Table 9. It is computationally convenient, but not essential, that the CDD also includes each of these covariates. If the claims count relates to a specific cause, then it is convenient for the model for the CDD to incorporate cause of claim. If it is found that a covariate is statistically unimportant for the modelling of the diagnosis rates, then the claims count, $N(x; \theta)$, and the adjusted exposure, $\int_{u=0}^{1} E(x, u; \theta)\, F(t - u; x, \theta)\, du$, can be aggregated over the values for that covariate.

    (2) The estimator in Equation (10) is based on critical illnesses diagnosed in a particular

    year. This year is specified in the covariate $\theta_5$ for the exposure and the claims count

    (see Table 9). However, the CDD used in the estimator has Year of settlement rather

    than Year of diagnosis as a covariate (see Table 3). This slight mismatch is unfortunate

    but is not likely to be of any numerical significance since:

    (i) Year of settlement was not an important covariate for the best fitting CDD, and,

    (ii) many claims are settled in, or very soon after the end of, their Year of diagnosis.

    (3) The two CDDs in Section 4.5 incorporate Benefit amount ($\theta_6$) and Policy duration ($\theta_7$)

    as continuous covariates, whereas for the estimation of the diagnosis rates these

    covariates have been categorised as shown in Table 9. The value of the CDD in the

    calculation of the estimator in Equation (10) uses a mid-point value for these two

    covariates, as shown in Table 10, although the mid-point for the upper end is fixed

    somewhat arbitrarily. The categories for Benefit amount correspond approximately to

    the quartiles from the data.


    5.5. Parameter estimation

    The parameters r, s, $\kappa_i$, $\delta_j$ and $\beta$ were estimated under the assumed Poisson model using either maximum likelihood (when the term $g_r(x)$ was present) or GLM methodology otherwise. The covariates to be included in the model were chosen by minimising the Bayes Information Criterion (BIC), given by:

    $$\mathrm{BIC} = -2 \log L(\hat{\kappa}, \hat{\delta}, \hat{\beta}) + p \log n$$

    where $L(\cdot)$ is the likelihood function, $\hat{\kappa}$, $\hat{\delta}$ and $\hat{\beta}$ are the (vectors of) estimates of the model parameters, p is the total number of estimated parameters, and n is the number of data points.
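    For a Poisson model of this kind, the BIC can be computed directly from the observed counts and the fitted expected counts (fitted rate multiplied by adjusted exposure). The sketch below is illustrative only; the function names and the toy figures are hypothetical.

```python
import math


def poisson_log_likelihood(counts, expected):
    """Log-likelihood of independent Poisson counts with the given means."""
    return sum(
        n * math.log(m) - m - math.lgamma(n + 1)
        for n, m in zip(counts, expected)
    )


def bic(log_likelihood, n_params, n_points):
    """BIC = -2 log L + p log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_points)


# Toy data: three cells, one fitted parameter.
counts = [0, 1, 2]
expected = [1.0, 1.0, 1.0]
log_lik = poisson_log_likelihood(counts, expected)
criterion = bic(log_lik, n_params=1, n_points=len(counts))
```

    A model with more parameters is preferred only if its gain in log-likelihood outweighs the penalty p log n.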

    In principle, we could try to minimise the BIC as a function of the complete set of

    parameters. In practice, this would cause computational difficulties and so a pragmatic

    approach was employed. We used the following procedure to determine the best model(s):

    (1) First we set r = 0 and s = 1. We then choose the value of $\delta_1$ and the set of covariates, $\theta$, together with their parameter values, $\beta$, which minimises the BIC. In choosing the optimal set of covariates, we allow for an interaction only if there is a prima facie case for including it. In practice, the only interaction investigated (and, in some cases, included) was Age × Smoker.

    (2) Keeping r = 0, we then increase s by 1 and choose the values for $\delta_1$ and $\delta_2$, $\theta$, and the corresponding parameter values, $\beta$, which minimise the BIC.

    (3) We repeat step (2) until the BIC increases. The value of s and the corresponding values for $\delta_1, \ldots, \delta_s$, set of covariates, $\theta$, and parameters, $\beta$, which minimise the BIC, at least locally, are then our selected values.

    (4) For the selected values of s and $\theta$ we increase the value of r by 1 and check whether, by optimising over the $\kappa$s, $\delta$s and $\beta$s, the BIC decreases or not. If it decreases, we repeat step (4). If it increases, we choose the value of r which (locally) minimises the BIC. In almost all cases, the optimal value of r was zero. The only exception was the diagnosis rate for death for the models in Figures 1 and 3, where the optimal value for r was 1.
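    The search over s and r in steps (1) to (4) can be sketched as a pair of one-dimensional local searches. In this illustration `bic_for_model(r, s)` is a hypothetical user-supplied function that fits the model for the given r and s (including the choice of covariates where applicable) and returns the minimised BIC; the fitting itself is not shown, and the toy BIC surface is invented purely to exercise the loop.

```python
def select_order(bic_for_order, start, max_order=10):
    """Increase the order from `start` until the BIC stops decreasing."""
    best_order = start
    best_bic = bic_for_order(start)
    for order in range(start + 1, max_order + 1):
        candidate = bic_for_order(order)
        if candidate >= best_bic:
            break  # BIC no longer decreases: keep the previous (local) minimum
        best_order, best_bic = order, candidate
    return best_order, best_bic


def select_r_and_s(bic_for_model, max_order=10):
    """Steps (1)-(3): choose s with r = 0. Step (4): choose r for that s."""
    s, _ = select_order(lambda order: bic_for_model(0, order), 1, max_order)
    r, best_bic = select_order(lambda order: bic_for_model(order, s), 0, max_order)
    return r, s, best_bic


# Toy BIC surface (hypothetical): minimised at s = 3, and worse for any r > 0.
toy_bic = lambda r, s: (s - 3) ** 2 + 5 * r + 100
r, s, best = select_r_and_s(toy_bic)
```

    As in the procedure described above, the result is a local rather than a guaranteed global minimum of the BIC.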

    The calculations were carried out using the statistical package R.

    Table 10. Values of benefit amount and policy duration used in the CDDs for the estimation of CI diagnosis rates.

    Benefit amount                      Policy duration
    Category            Mid-point       Category         Mid-point
    1: ≤ 25,000         12,500          0: < 1 year      183 days
    2: 25,000–50,000    37,500          1: 1–2 years     548 days
    3: 50,000–75,000    62,500          2: 2–3 years     913 days
    4: ≥ 75,000         100,000         3: 3–4 years     1278 days
                                        4: 4–5 years     1643 days
                                        5: ≥ 5 years     2585 days


    The results of our modelling are set out and discussed in Paper II. More details of the

    procedures and results can be found in Ozkok (2011).

    Acknowledgements

    The authors are grateful to the Continuous Mortality Investigation for supplying the data

    and for advice and support throughout the course of this research, and also to Hacettepe

    University for their financial support for one of the authors, Erengul Ozkok, while this

    research was being carried out.

    References

    Association of British Insurers. (2011). Statement of best practice for critical illness. London: ABI.

    CMI WP 14. (2005). Continuous Mortality Investigation Committee Working Paper 14 – Methodology underlying the 1999–2002 CMI critical illness experience investigation. Institute of Actuaries and Faculty of Actuaries.

    CMI WP 33. (2008). Continuous Mortality Investigation Committee Working Paper 33 – A new methodology for analysing CMI critical illness experience. Institute of Actuaries and Faculty of Actuaries.

    CMI WP 43. (2010). Continuous Mortality Investigation Committee Working Paper 43 – CMI critical illness diagnosis rates for accelerated business, 1999–2004. Institute of Actuaries and Faculty of Actuaries.

    CMI WP 50. (2011). Continuous Mortality Investigation Committee Working Paper 50 – CMI critical illness diagnosis rates for accelerated business, 2003–2006. Institute and Faculty of Actuaries.

    CMI WP 52. (2011). Continuous Mortality Investigation Committee Working Paper 52 – Cause-specific CMI critical illness diagnosis rates for accelerated business, 2003–2006. Institute and Faculty of Actuaries.

    Dinani, A., Grimshaw, D., Robjohns, N., Somerville, S., Spry, A., Staffurth, J. (2000). A critical review: report of the critical illness healthcare study group. Presented to the Staple Inn Actuarial Society.

    Greene, W. H. (1990). Econometric analysis. New York: Macmillan.

    Macdonald, A. S. (1996). An actuarial survey of statistical models for decrement and transition data. I: multiple state, binomial and Poisson models. British Actuarial Journal 2, 129–155.

    Ozkok, E. (2011). A stochastic model for critical illness insurance. PhD thesis. Heriot-Watt University, 213 p.

    Ozkok, E., Streftaris, G., Waters, H. R. & Wilkie, A. D. (2012a). Modelling critical illness claim diagnosis rates II: results. The Scandinavian Actuarial Journal, DOI:10.1080/03461238.2012.728538.

    Ozkok, E., Streftaris, G., Waters, H. R. & Wilkie, A. D. (2012b). Bayesian modelling of the time delay between diagnosis and settlement for critical illness insurance using a Burr generalised-linear-type model. Insurance: Mathematics and Economics 50, 266–279.

    Waters, H. R. (1984). An approach to the study of multiple state models. Journal of the Institute of Actuaries 111, 363–374.
