
  • ST3054/ST6004 - Survival Analysis

    Sections: Introduction | Survival models | Lifetime distribution functions | Cox regression

    Eric Wolsztynski

    [email protected]

    Department of Statistics
    School of Mathematical Sciences
    University College Cork, Ireland

    2014-2015
    Version 1.0

  • Acknowledgment

    These lecture notes follow and adapt a section of the Institute and Faculty of Actuaries CT4 notes, in respect of the exemption programme in place for ST3054 and ST6004. However, this document does not reproduce the CT4 notes fully or exactly, and also presents notions, developments and examples not found in those notes.

    These notes also use a large part of former ST3054 notes written by Dr Kingshuk Roy Choudhury and Dr Tony Fitzgerald for a previous course syllabus.

    For any comment or query about this document, please contact [email protected]

  • Timetable and assessment

    - ST3054 is part of ST6004
    - ST3054 is taught in Semester 1
        Lectures / Tutorials: Tuesdays 2-3pm in G15, Fridays 11am-12pm in G13
        Tutorials / Practicals: Mondays 10-11am in lab G34
    - Continuous assessment: 2 home assignments (10 marks each)
    - 90-minute exam in December (80 marks)

  • Module objective and content

    - Module objective: to develop techniques for the analysis of survival data
    - Module content:
        1) Parametric models of survival, use of life tables, types of censoring, hazard functions
        2) Non-parametric estimation of hazard and survival functions, Kaplan-Meier and Nelson-Aalen estimators
        3) Proportional hazards model with covariates
    - Use of software

  • Learning Outcomes

    - Explain the concept of a survival model and be able to describe the more commonly used mortality / survival functions and apply these to solve practical problems
    - Define the distribution and density functions of the random future lifetime, the survival function, the force of mortality and derive relationships between them
    - State the Gompertz and Makeham laws of mortality and be able to apply both to solve practical problems
    - Describe various ways in which lifetime data might be censored and be able to describe the various problems introduced by censoring

  • Learning Outcomes (continued)

    - Describe both the Kaplan-Meier and Nelson-Aalen estimates of the survival function in the presence of censoring, explain how they arise as maximum likelihood estimates, compute them from typical data and estimate their variance
    - Describe the Cox model for proportional hazards, derive the partial likelihood estimate in the absence of ties, and state its asymptotic distribution
    - Interpret the effect of covariates on the hazard of a population at risk in the Cox proportional hazards model
    - Explain and apply the concept of proportional hazards model selection using likelihood ratio tests

  • Related material

    - Pre-requisite: ST2053, ST2054
    - Co-requisite: ST3053
    - Textbook: CT4 Notes from the Institute and Faculty of Actuaries
        - Contact Damian or Linda
        - Before the end of September
    - ST3054 syllabus also connects with ST3074
    - ST3054 used to include IFA CT4 content:
        Ch4 The two-state Markov model
        Ch10 The Binomial and Poisson models

  • Outline

    I Introduction
    II Survival models (Ch7 of CT4 notes 2013)
    III Lifetime distribution functions (Ch8 of CT4)
    IV The Cox regression model (Ch9 of CT4)

  • Introduction

  • Demography & public health

    - How long will we live?
    - Do men live longer than women?
    - How do lifestyle factors affect your lifespan?
    - How long will our children live?
    - Which people have the longest lifespan?
    - What implications does an increasing lifespan have?

  • Life expectancy in Ireland

    http://www.cso.ie/Quicktables/GetQuickTables.aspx?FileName=VSA30.asp&TableName=Life+Expectancy&StatisticalProduct=DB_VS
    (CSO QuickTables - VSA30 - Life Expectancy)

    At age    0      10     20     35     55     65     75
    1926      57.4   55.2   46.4   34.4   19.1   12.8    7.7
    2006      76.8   67.2   57.5   43.3   24.8   16.6    9.8

    Table: Life expectancy in years (Males)

    At age    0      10     20     35     55     65     75
    1926      57.9   54.9   46.4   34.7   19.6   13.4    8.4
    2006      81.6   72.0   62.1   47.4   28.5   19.8   12.1

    Table: Life expectancy in years (Females)

  • Life expectancy worldwide

    https://www.cia.gov/library/publications/the-world-factbook/rankorder/2102rank.html
    (CIA World Factbook: Country Comparison :: Life expectancy at birth)

    Rank   Country             L.E. (years)
    1      Monaco              89.68
    2      Macau               84.43
    3      Japan               83.91
    4      Singapore           83.75
    5      San Marino          83.07
    6      Andorra             82.50
    7      Guernsey            82.24
    8      Hong Kong           82.12
    9      Australia           81.90
    10     Italy               81.86

    Rank   Country             L.E. (years)
    212    Mozambique          52.02
    213    Lesotho             51.86
    214    Zimbabwe            51.82
    215    Somalia             50.80
    216    Central Afr. Rep.   50.48
    217    Afghanistan         49.72
    218    Swaziland           49.42
    219    South Africa        49.41
    220    Guinea-Bissau       49.11
    221    Chad                48.69

  • Lifestyle factors

  • Insurance & pension

    - Insurance: pay a lump sum on death
    - Pension: pay an annuity (fixed amount) till death
    - Times (& cost) of payment are dependent on human lifetimes
    - Calculation of expected cashflow depends on the distribution of human lifetime
    - The study of the distribution of lifetimes is called survival analysis

  • Machine reliability

    - Air conditioner (AC), working at high temperature
    - Begins working at time $t = 0$
    - $S(t) = P(\text{AC functioning at future time } t)$
    - $S(t)$ = survival probability function
    - $T$ = future lifetime (failure time)
    - $S(t) = P(T > t) = 1 - F_T(t)$
    - Machines are not humans: different models are needed

  • Biostatistics (clinical studies)

    - Is a new drug for cancer more effective?
    - $t = 0$ is the date of diagnosis / injection
    - $S(t) = P(\text{alive at future time } t)$
    - $S(t)$ can be used to judge the efficacy of new treatments, indicators
    - Regression with $S(t)$ as response

  • Statistics in survival analysis

    - Estimating the survival function
    - Modelling dependence of the survival function on covariates
    - Predicting survival
    - Estimating significance / standard errors

  • Types of data

    - Actuarial and demographic studies deal with large samples
    - Clinical studies deal with smaller samples
    - Actuarial and demographic studies are generally cross-sectional (or transversal) studies:
        - observation of a population/sample at one point in time
        - aim to provide data on the whole population
    - Clinical studies are generally longitudinal:
        - observations are repeated on the same individuals over periods of time

  • Aggregate model

    - Consider $S(t)$ where $t = 0$ denotes time of birth
    - $X$ (age at death) $\overset{?}{=}$ $T$ (time of death)
    - $x$ = attained age
    - Survival function $S(x) = S(t)$
    - When survival is not dependent on age, use $S(t)$
    - If age is important, use $S_x(t)$

  • Select model

    - Person opting for insurance, age $x$
    - Want to find $S(t)$
    - $S_x(10)$ is different if $x = 25$ than if $x = 55$
    - Survival function is really $S(t, x)$
    - $x$ is a concomitant variable, or covariate
    - Other variables also affect $S(t)$
    - Study the effect of other variables on $S(t)$
    - Actuarial method: study $S_x(t)$

  • Section I: Survival models

    Subsections: Simple survival model | Life table functions | Expected future lifetime | Some important formulae | Simple parametric survival models

  • I.1 Simple survival models

  • A simple survival model

    - We consider a first model of random lifetimes
    - The future lifetime of an individual is treated as a continuous random variable
    - This model already provides a set of fundamental tools for the analysis of human mortality

  • Future lifetime

    - The lifetime of a person (or life) is not known in advance
    - Lifetimes (random variables) range from 0 to over 100 years
    - Let $\omega$ denote the limiting age (maximum age)

    Assumption: the future lifetime of a new-born person, denoted $T$, is a random variable continuously distributed on an interval $[0, \omega]$ where $0 < \omega < \infty$.

  • Future lifetime

    Definition:

    $F(t) = P(T \le t)$ is the distribution function of $T$
    $S(t) = P(T > t) = 1 - F(t)$ is the survival function of $T$

    - $S(t)$ is the probability of a new-born surviving to age $t$
    - Let $T_x$ be the future lifetime after age $x$ of a life who survives to age $x$, with $0 \le x \le \omega$ and $T_0 = T$

    Definition ($0 \le x \le \omega$):

    $F_x(t) = P(T_x \le t)$ is the distribution function of $T_x$
    $S_x(t) = P(T_x > t) = 1 - F_x(t)$ is the survival function of $T_x$

  • Future lifetime

    Examples:

    - $F_{30}(50)$ denotes the probability that a 30-year old dies before his/her 80th birthday
    - $S_{25}(32)$ represents the probability that a 25-year old survives at least another 32 years

    For consistency with $T$, the distribution function of the random variable $T_x$ must satisfy the following:

    \[ F_x(t) = P(T_x \le t) = P(T \le x + t \mid T > x) = \frac{F(x + t) - F(x)}{S(x)} \]

  • Probabilities of death and survival

    Actuarial notation for death and survival probabilities:

    ${}_tq_x$   probability that someone aged $x$ dies within $t$ years
    $q_x$       probability that someone aged $x$ dies within 1 year
    ${}_tp_x$   probability that someone aged $x$ is still alive after $t$ years
    $p_x$       probability that someone aged $x$ is still alive after 1 year

    In particular we have

    \[ {}_tq_x = F_x(t) \]
    \[ {}_tp_x = 1 - {}_tq_x = S_x(t) \]

  • Survival probabilities: example 1

    Age x   l_x     d_x     p_x       q_x
    90      9,253   2,035   0.78006   0.21994
    91      7,218   1,711   0.76297   0.23703
    92      5,507   1,403   0.74515   0.25485
    93      4,104   1,122   0.72659   0.27341
    94      2,982     873   0.70730   0.29270
    95      2,109     660   0.68728   0.31272
    96      1,449     483   0.66652   0.33348
    97        966     343   0.64503   0.35497
    98        623     235   0.62281   0.37719
    99        388     155   0.59985   0.40015

    Table: Irish Life Table No. 14 2001-2003 (Males)

    What is the probability that a 90 year old man survives to 95, i.e. ${}_5p_{90}$?

  • Survival probabilities: example 1

    For a 90 year old man to survive to 95 he must

    Event                    Probability
    survive from 90 to 91    ${}_1p_{90} = 0.78006$
    survive from 91 to 92    ${}_1p_{91} = 0.76297$
    survive from 92 to 93    $p_{92} = 0.74515$
    survive from 93 to 94    $p_{93} = 0.72659$
    survive from 94 to 95    $p_{94} = 0.70730$

    Thus

    \[ {}_5p_{90} = {}_1p_{90} \times {}_1p_{91} \times {}_1p_{92} \times {}_1p_{93} \times {}_1p_{94} \approx 0.2279 \]
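    The product above can be checked numerically. The following is a minimal sketch that uses only the values of the Irish Life Table No. 14 extract; the function name `t_p_x` is an illustrative choice, not notation from the notes. It also compares the product with the ratio $l_{95}/l_{90}$ taken from the $l_x$ column of the same table.

```python
# One-year survival probabilities p_x for ages 90..94, copied from the
# Irish Life Table No. 14 (Males) extract.
p = {90: 0.78006, 91: 0.76297, 92: 0.74515, 93: 0.72659, 94: 0.70730}

# Expected numbers of lives l_x at ages 90 and 95 from the same table.
lx = {90: 9253, 95: 2109}


def t_p_x(t, x, p):
    """Return t_p_x as the product p_x * p_{x+1} * ... * p_{x+t-1}."""
    prob = 1.0
    for age in range(x, x + t):
        prob *= p[age]
    return prob


five_p_90 = t_p_x(5, 90, p)
ratio = lx[95] / lx[90]  # the same probability read off the l_x column

print(round(five_p_90, 4), round(ratio, 4))
```

    The two routes agree to about four decimal places; the small difference comes from the rounding of the tabulated $p_x$ and $l_x$ values.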

  • Factoring survival probabilities

    ${}_5p_{90} = P(\text{90 year old man will survive to 95})$

    \[ {}_5p_{90} = {}_1p_{90} \times {}_1p_{91} \times {}_1p_{92} \times {}_1p_{93} \times {}_1p_{94} \]
    \[ {}_5p_{90} = ({}_1p_{90} \times {}_1p_{91} \times {}_1p_{92}) \times ({}_1p_{93} \times {}_1p_{94}) = {}_3p_{90} \times {}_2p_{93} \]
    \[ {}_5p_{90} = ({}_1p_{90} \times {}_1p_{91}) \times ({}_1p_{92} \times {}_1p_{93} \times {}_1p_{94}) = {}_2p_{90} \times {}_3p_{92} \]

    In general:

    \[ {}_{s+t}p_x = {}_sp_x \times {}_tp_{x+s} \]
    \[ {}_{s+t}p_x = {}_tp_x \times {}_sp_{x+t} \]
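    As a quick numerical illustration of the factoring rule, the sketch below recomputes ${}_5p_{90}$ in three ways from the $l_x$ column of the Irish Life Table No. 14 extract. It assumes the life-table identity ${}_tp_x = l_{x+t}/l_x$ from Section I.2; the helper name `surv` is hypothetical.

```python
# l_x values for ages 90..95 from the Irish Life Table No. 14 (Males) extract.
lx = {90: 9253, 91: 7218, 92: 5507, 93: 4104, 94: 2982, 95: 2109}


def surv(t, x):
    """t_p_x computed as l_{x+t} / l_x."""
    return lx[x + t] / lx[x]


direct = surv(5, 90)                      # 5_p_90
split_3_2 = surv(3, 90) * surv(2, 93)     # 3_p_90 * 2_p_93
split_2_3 = surv(2, 90) * surv(3, 92)     # 2_p_90 * 3_p_92

print(direct, split_3_2, split_2_3)
```

    All three values coincide up to floating-point error, because the intermediate $l_{93}$ (or $l_{92}$) cancels in the product.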

  • Survival probabilities: example 2

    Question: which is bigger, ${}_5p_{34}$ or ${}_7p_{33}$?

  • Relating conditional and aggregate survival probabilities

    - Person opting for insurance, age $x$
    - Survival function is really $S(t, x)$
    - Actuarial method: study $S_x(t) = S(t, x)$
    - $S_x(t)$ is the (select) survival function for the r.v. $T_x$
    - $T_x$ is the future (select) lifetime after age $x$
    - $T = T_0$ is called the aggregate lifetime

  • Relating conditional and aggregate survival probabilities

    $S_x(t)$ = probability someone aged $x$ survives for $t$ years or more
             = probability someone survives to age $x + t$ given that they have already survived to age $x$

    \[ S_x(t) = P(T_x > t) = P(T > x + t \mid T > x) = \frac{P(T > x + t \text{ and } T > x)}{P(T > x)} = \frac{S(x + t)}{S(x)} \]

    Equivalently,

    \[ {}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0} \]

  • Relating conditional and aggregate survival probabilities

    From this relationship

    \[ {}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0} \]

    we may thus also derive

    \[ {}_{s+t}p_x = \frac{{}_{x+s+t}p_0}{{}_xp_0} = \frac{{}_{x+s}p_0}{{}_xp_0} \times \frac{{}_{x+s+t}p_0}{{}_{x+s}p_0} = {}_sp_x \times {}_tp_{x+s} \]

    Similarly,

    \[ {}_{s+t}p_x = {}_tp_x \times {}_sp_{x+t} \]

    (i.e. the result seen previously on factoring survival probabilities)

  • The distribution of mortality (age at death)

    SDF: $S(x) = 1 - F(x) = P(T > x)$
    PDF: $f(x) = dF(x)/dx = -dS(x)/dx$ (unconditional)

    \[ f(x) = \lim_{h \to 0^+} \frac{1}{h}\,[F(x + h) - F(x)] = \lim_{h \to 0^+} \frac{1}{h}\,P(x < T \le x + h) \]

    Figure: Distribution of the random variable $T$: number of deaths vs age

  • The force of mortality $\mu_x$

    Definition: the force of mortality $\mu_x$ at age $x$ ($0 \le x \le \omega$) is

    \[ \mu_x = \lim_{h \to 0^+} \frac{1}{h}\,P(T \le x + h \mid T > x) \]

    - This limit is always assumed to exist
    - $\mu_x$ is an instantaneous measure of mortality at age $x$, or the conditional instantaneous failure rate given survival to time $x$
    - It is the continuous equivalent of $q_x$
    - Statisticians call it the hazard rate function

  • The force of mortality $\mu_x$

    Intuitively, the force of mortality may be seen as the probability of instant death for an individual aged $x$.

    The probability $P(T \le x + h \mid T > x)$ is $F_x(h) = {}_hq_x$. For small $h$, we can ignore the limit and write

    \[ {}_hq_x \approx h\,\mu_x \]

    This means that the probability of death in a short time $h$ after age $x$ is roughly proportional to $h$. Moreover, the constant of proportionality for this relationship is $\mu_x$.

  • The force of mortality $\mu_x$

    The expected number of deaths in a very short time interval $h$ in a very large population of $n$ individuals aged exactly $x$ is therefore $n\,\mu_x\,h$.

    Example: estimate $\mu_{50}$ from a very large population of 50-year olds by counting how many die within 1 hour. The annual rate of mortality would then be approximated by the proportion of the group that had died multiplied by $24 \times 365$.

    This would not yield an exact value for $\mu_{50}$ because of (i) statistical fluctuations, (ii) rounding errors, (iii) leap years being ignored, and (iv) the 1-hour period still being very long for an instantaneous measure.
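    The arithmetic of this example can be sketched as follows. The death and population counts are hypothetical numbers chosen purely for illustration, and `annualised_force` is an assumed helper name; the scaling by $24 \times 365$ is the one described in the example.

```python
HOURS_PER_YEAR = 24 * 365  # leap years are ignored, as in the example


def annualised_force(deaths, population, hours_observed=1.0):
    """Crude estimate of mu_x: proportion dying per hour, scaled to one year."""
    proportion_dead = deaths / population
    return proportion_dead / hours_observed * HOURS_PER_YEAR


# Hypothetical counts (illustration only): 4 deaths within one hour
# among 10 million lives aged exactly 50.
mu_50_hat = annualised_force(deaths=4, population=10_000_000)
print(mu_50_hat)
```

    With so few deaths in the observation window, the estimate is dominated by statistical fluctuation, which is point (i) of the caveats listed above.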

  • The force of mortality $\mu_x$

    There is a close connection to the distribution of mortality:

    \[ \mu_x = \lim_{h \to 0^+} \frac{P(x < T \le x + h)}{h} \times \frac{1}{P(T > x)} \]

    thus the 1-1 relationship between $\mu_x$ and $f(x)$:

    \[ \mu_x = \frac{f(x)}{S(x)} = \frac{-dS(x)/dx}{S(x)} \]

    Inversion: since $\mu_x = -\frac{d}{dx} \log S(x)$, we have

    \[ S(x) = \exp\left[-\int_0^x \mu_s\,ds\right] = \exp[-\Lambda(x)] \]

    where $\Lambda(x) = \int_0^x \mu_s\,ds$ denotes the cumulative (integrated) hazard.
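    The two directions of this relationship can be illustrated numerically. The sketch below assumes a Gompertz-type hazard $\mu_x = B c^x$ with hypothetical parameter values (they are not taken from the notes), integrates it to obtain $S(x)$, and then recovers $\mu_x$ as $-\frac{d}{dx}\log S(x)$ by a finite difference.

```python
import math

# Hypothetical Gompertz parameters, for illustration only.
B, C = 5e-5, 1.10


def mu(x):
    """Assumed force of mortality mu_x = B * C**x."""
    return B * C ** x


def cumulative_hazard(x, steps=10_000):
    """Trapezoidal approximation of the integral of mu(s) over [0, x]."""
    h = x / steps
    total = 0.5 * (mu(0.0) + mu(x))
    for i in range(1, steps):
        total += mu(i * h)
    return total * h


def survival(x):
    """S(x) = exp(-integral of mu), using the numerical integral."""
    return math.exp(-cumulative_hazard(x))


def survival_closed_form(x):
    """Closed form of S(x) for the Gompertz hazard above."""
    return math.exp(-B * (C ** x - 1.0) / math.log(C))


def mu_from_survival(x, eps=1e-4):
    """Recover mu_x as -d/dx log S(x) via a central difference."""
    upper = math.log(survival_closed_form(x + eps))
    lower = math.log(survival_closed_form(x - eps))
    return -(upper - lower) / (2 * eps)


print(survival(60), survival_closed_form(60))
print(mu(60), mu_from_survival(60))
```

    Any other positive hazard function could be substituted for `mu`; only `survival_closed_form` is specific to the Gompertz choice.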

  • The force of mortality $\mu_x$

    Figure: UK mortality 2003-2005 (Males, Office for National Statistics): $\mu_x$ vs age. [Plot of hazard estimates (0.0 to 2.0) against age in years (0 to 100)]

  • The force of mortality $\mu_x$

    Figure: UK mortality 2003-2005 (Males, ONS): $\mu_x = -\frac{d \log S(x)}{dx}$ (increasing) and $S(x) = \exp\left[-\int_0^x \mu_s\,ds\right]$ (decreasing) vs age. [Plot of hazard estimates (log scale, 0.0001 to 1) and survival estimates (0.0 to 1.0) against age in years (0 to 100)]

  • The force of mortality $\mu_x$

    Figure: UK mortality 2003-2005 (Males, ONS): $\mu_x$ (increasing) and $f(x)$ (unimodal) vs age. [Plot of hazard estimates (log scale, 0.0001 to 1) and density (0.00 to 0.04) against age in years (0 to 100)]

  • The force of mortality $\mu_{x+t}$

    Definition: for $x \ge 0$ and $t > 0$, we could define the force of mortality $\mu_{x+t}$ in two ways:

    \[ (1) \quad \mu_{x+t} = \lim_{h \to 0^+} \frac{1}{h}\,P(T \le x + t + h \mid T > x + t) \]

    \[ (2) \quad \mu_{x+t} = \lim_{h \to 0^+} \frac{1}{h}\,P(T_x \le t + h \mid T_x > t) \]

    The two definitions agree, since $P(T_x \le t + h \mid T_x > t) = P(T \le x + t + h \mid T > x + t)$.

    We often use $\mu_{x+t}$ for a fixed age $x$ and $0 \le t < \omega - x$.

  • Density of select model (pdf of $T_x$)

    By definition, the distribution function of $T_x$ is $F_x(t)$. Thus its pdf is

    \[ f_x(t) = \frac{d}{dt} F_x(t) = \frac{d}{dt} P(T_x \le t) = \lim_{h \to 0^+} \frac{1}{h}\,\big(P(T_x \le t + h) - P(T_x \le t)\big) \]

    \[ = \lim_{h \to 0^+} \frac{P(T \le x + t + h \mid T > x) - P(T \le x + t \mid T > x)}{h} \]

    \[ = \lim_{h \to 0^+} \left\{ \frac{P(T \le x + t + h) - P(T \le x)}{S(x)\,h} - \frac{P(T \le x + t) - P(T \le x)}{S(x)\,h} \right\} \]

    \[ = \lim_{h \to 0^+} \frac{P(T \le x + t + h) - P(T \le x + t)}{S(x)\,h} \]

    Note that this is the same as $-\frac{1}{S(x)} \left[ \frac{d}{dt} S(x + t) \right]$.

  • Density of select model (pdf of $T_x$)

    Now by multiplying by $S(x + t)/S(x + t)$ we obtain

    \[ f_x(t) = \frac{S(x + t)}{S(x + t)} \times \lim_{h \to 0^+} \frac{P(T \le x + t + h) - P(T \le x + t)}{S(x)\,h} \]

    \[ = S_x(t) \times \lim_{h \to 0^+} \frac{1}{h}\,P(T \le x + t + h \mid T > x + t) \]

    \[ = S_x(t)\,\mu_{x+t} \]

    In actuarial notation, for a fixed age $0 \le x \le \omega$, this is equivalent to the following very important relationship

    \[ f_x(t) = {}_tp_x\,\mu_{x+t} \quad (0 \le t < \omega - x) \]

  • Density of select model (pdf of $T_x$)

    Alternatively, we may observe that

    \[ f_x(t) = -\frac{d}{dt} S_x(t) = -\frac{d}{dt} \frac{S(x + t)}{S(x)} = -\frac{1}{S(x)} \left[ \frac{d}{dt} S(x + t) \right] \]

    and since

    \[ \frac{d}{dt} S(x + t) = \frac{d}{dt} \int_{x+t}^{\omega} f(u)\,du = -f(x + t) \]

    we have

    \[ f_x(t) = \frac{f(x + t)}{S(x)} = \frac{S(x + t)}{S(x)} \times \frac{f(x + t)}{S(x + t)} = {}_tp_x\,\mu_{x+t} \]
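    The identity $f_x(t) = {}_tp_x\,\mu_{x+t}$ can be checked numerically under an assumed model. The sketch below uses a Gompertz-type hazard with hypothetical parameters (an illustrative choice, not part of the notes) and compares a finite-difference derivative of $-S_x(t)$ with the product ${}_tp_x\,\mu_{x+t}$.

```python
import math

# Hypothetical Gompertz parameters, for illustration only.
B, C = 5e-5, 1.10


def mu(age):
    """Assumed force of mortality at a given age."""
    return B * C ** age


def S(age):
    """Aggregate survival function implied by the assumed hazard."""
    return math.exp(-B * (C ** age - 1.0) / math.log(C))


def t_p_x(t, x):
    """Conditional survival probability S(x + t) / S(x)."""
    return S(x + t) / S(x)


def density_numeric(t, x, eps=1e-5):
    """f_x(t) as minus the derivative of S_x(t), by central difference."""
    return -(t_p_x(t + eps, x) - t_p_x(t - eps, x)) / (2 * eps)


def density_formula(t, x):
    """f_x(t) from the relationship t_p_x * mu_{x+t}."""
    return t_p_x(t, x) * mu(x + t)


print(density_numeric(25, 40), density_formula(25, 40))
```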

  • Summary

    $T_x$       random future lifetime after age $x$ (continuous r.v.)
    ${}_tq_x$   probability that someone aged $x$ dies within $t$ years
    $q_x$       probability that someone aged $x$ dies within 1 year
    ${}_tp_x$   probability that someone aged $x$ is still alive after $t$ years
    $p_x$       probability that someone aged $x$ is still alive after 1 year

    Probabilistic notation                               Actuarial notation
    $\mu_x h \approx P(T \le x + h \mid T > x)$          $\mu_x h \approx {}_hq_x$
    $\mu_x = f(x)/S(x)$ or $f(x) = \mu_x S(x)$           $f_x(t) = \mu_{x+t}\,{}_tp_x$

    \[ {}_{s+t}p_x = {}_sp_x \times {}_tp_{x+s} \quad (\forall\, s, t > 0) \]

  • I.2 Life table functions

  • Life table functions

    A life table provides the expected number of survivors to each age in a hypothetical group of lives. We use:

    $l_x$ = expected number of lives at age $x$
    $d_x$ = expected number of deaths between ages $x$ and $x + 1$

    \[ d_x = l_x - l_{x+1} \]

    \[ p_x = \frac{l_{x+1}}{l_x} \]

    \[ q_x = 1 - p_x = 1 - \frac{l_{x+1}}{l_x} = \frac{l_x - l_{x+1}}{l_x} = \frac{d_x}{l_x} \]

    \[ {}_tp_x = \frac{l_{x+t}}{l_x} \]
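    These definitions translate directly into code. The sketch below rebuilds $d_x$, $p_x$ and $q_x$ from the $l_x$ column of the Irish Life Table No. 14 extract shown in example 1; the function names are illustrative. The recomputed rates match the tabulated ones only up to rounding, presumably because the published rates were derived from unrounded $l_x$ values.

```python
# l_x column (ages 90..99) from the Irish Life Table No. 14 (Males) extract.
lx = {90: 9253, 91: 7218, 92: 5507, 93: 4104, 94: 2982,
      95: 2109, 96: 1449, 97: 966, 98: 623, 99: 388}


def d(x):
    """Expected deaths between ages x and x + 1: d_x = l_x - l_{x+1}."""
    return lx[x] - lx[x + 1]


def p(x):
    """One-year survival probability: p_x = l_{x+1} / l_x."""
    return lx[x + 1] / lx[x]


def q(x):
    """One-year death probability: q_x = d_x / l_x."""
    return d(x) / lx[x]


def t_p(t, x):
    """t-year survival probability: t_p_x = l_{x+t} / l_x (integer t)."""
    return lx[x + t] / lx[x]


for age in range(90, 99):
    print(age, d(age), round(p(age), 5), round(q(age), 5))
```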

  • Life table functions

    In life tables, values are tabulated only for integer ages. When working with continuous age variables, assumptions on the variation of mortality between integer ages are required. Examples:

    - deaths occur uniformly between integer ages
    - the force of mortality is constant between integer ages
    - the Balducci assumption holds

    For integer ages $x$ and $0 \le t \le 1$, the Balducci assumption states that

    \[ {}_{1-t}q_{x+t} = (1 - t)\,q_x \]

    This assumption is useful particularly in Cox regression analysis.
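    To see how the three assumptions differ, the sketch below computes ${}_tq_x$ for a fractional $t$ under each of them. The closed forms are derived here rather than quoted from the notes: ${}_tq_x = t\,q_x$ under uniform deaths, ${}_tq_x = 1 - (1 - q_x)^t$ under a constant force of mortality, and ${}_tq_x = t\,q_x / (1 - (1 - t)\,q_x)$ under Balducci (obtained from $p_x = {}_tp_x \times {}_{1-t}p_{x+t}$). The function names are illustrative.

```python
def tqx_udd(t, q):
    """t_q_x when deaths are uniformly distributed over the year of age."""
    return t * q


def tqx_constant_force(t, q):
    """t_q_x when the force of mortality is constant over the year of age."""
    return 1.0 - (1.0 - q) ** t


def tqx_balducci(t, q):
    """t_q_x implied by the Balducci assumption (1-t)_q_{x+t} = (1 - t) q_x."""
    return t * q / (1.0 - (1.0 - t) * q)


q90 = 0.21994  # q_90 from the Irish Life Table No. 14 (Males) extract
for t in (0.25, 0.5, 0.75):
    print(t, tqx_udd(t, q90), tqx_constant_force(t, q90), tqx_balducci(t, q90))
```

    All three agree at $t = 0$ and $t = 1$ and differ only in how mortality is spread within the year.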

  • Life table: example including life expectancy

    ...

  • Life table: example 1

    x      S(x)      l_x
    0      1.0000    100,000
    1      0.97408    97,408
    2      0.97259    97,259
    3      0.97160    97,160
    4      0.97082    97,082
    ...    ...       ...
    109    0.00001         1
    110    0.00000         0

    - Traditional description of the survival function
    - Start e.g. with radix $l_0 = 100{,}000$
    - For subsequent ages: $l_x = {}_xp_0 \times l_0$
    - Survival probabilities: ${}_tp_x = l_{x+t}/l_x$

  • Life table: example 2

    Estimate $l_{58.25}$ assuming a uniform distribution of deaths between exact ages 58 and 59, from the English Life Table 15 (Males):

    Age x   l_x
    58      88,792
    59      87,805

    There are 88,792 - 87,805 = 987 deaths expected between the ages of 58 and 59. Assuming deaths within this interval are uniformly distributed, the number of deaths expected between the ages of 58 and 58.25 is

    987/4 = 246.75

    So the expected number of lives at age 58.25 is

    88,792 - 246.75 = 88,545.25
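    The same calculation can be written as a linear interpolation between $l_{58}$ and $l_{59}$. The sketch below is a minimal illustration; `l_udd` is a hypothetical helper name.

```python
def l_udd(l_x, l_x_plus_1, t):
    """Expected lives at age x + t (0 <= t <= 1) under uniform deaths."""
    return (1.0 - t) * l_x + t * l_x_plus_1


# English Life Table 15 (Males) values quoted in the example.
l_58, l_59 = 88_792, 87_805

deaths_first_quarter = (l_58 - l_59) / 4
l_58_25 = l_udd(l_58, l_59, 0.25)
print(deaths_first_quarter, l_58_25)
```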

  • Life tables: uniform distribution of deaths assumption

    Note: Under the assumption of uniform distribution of deaths, the surviving population at the start of each quarter is decreasing while the number of deaths per quarter stays the same. This assumption therefore implies that the force of mortality is increasing over the year of age (58, 59).

    Result: If deaths are assumed uniformly distributed between the ages of $x$ and $x + 1$, it follows that

    \[ {}_tq_x = t\,q_x \]

    for $0 \le t \le 1$.

  • Life tables: uniform distribution of deaths assumption

    Proof: Under the assumption we have by linear interpolation

    \[ l_{x+t} = (1 - t)\,l_x + t\,l_{x+1} \]

    for $0 \le t \le 1$. So

    \[ {}_tq_x = 1 - \frac{l_{x+t}}{l_x} = 1 - \frac{(1 - t)\,l_x + t\,l_{x+1}}{l_x} = \frac{t\,l_x - t\,l_{x+1}}{l_x} = t\,(1 - p_x) = t\,q_x \]

  • Initial and central rates of mortality

    Since $q_x$ is the probability that a life alive at age $x$ (the initial time) dies before age $x + 1$, it is called an initial rate of mortality.

    Definition: The central rate of mortality $m_x$ is defined as

    \[ m_x = \frac{d_x}{\int_0^1 l_{x+t}\,dt} = \frac{q_x}{\int_0^1 {}_tp_x\,dt} \]

    This alternative represents the probability that a life alive between ages $x$ and $x + 1$ dies. The denominator $\int_0^1 {}_tp_x\,dt$ represents the expected amount of time spent alive between ages $x$ and $x + 1$ by a life alive at age $x$.

  • Initial and central rates of mortality

    Note: $m_x$ measures the rate of mortality over the whole year from exact age $x$ to exact age $x + 1$. By contrast, $\mu_x$ measures the instantaneous rate of mortality at exact age $x$.

    $m_x$ is useful for projecting numbers of deaths, given the number of lives alive in age groups. It constitutes one of the basic components of a population projection.

    In practice, the age groups used in population projection are often broader than exactly one year, in which case the definition of $m_x$ must be adjusted accordingly.

  • Initial and central rates of mortality

    If $\mu_{x+t}$ is a constant, $\mu$, between ages $x$ and $x + 1$, then

    \[ m_x = \frac{q_x}{\int_0^1 {}_tp_x\,dt} = \frac{\int_0^1 {}_tp_x\,\mu\,dt}{\int_0^1 {}_tp_x\,dt} = \mu \]

    This quantity is close to an occurrence-exposure rate statistic

    \[ \frac{\text{Number of deaths}}{\text{Total time spent alive and at risk}} \]

    which can be used to estimate the force of mortality $\mu_x$.

    Note from the definition that $m_x$ can never be less than $q_x$, since $\int_0^1 {}_tp_x\,dt \le 1$.
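    A small numerical sketch of the central rate follows. It evaluates $m_x = q_x / \int_0^1 {}_tp_x\,dt$ with a trapezoidal rule under two assumed shapes for ${}_tp_x$ within the year: uniform deaths (${}_tp_x = 1 - t\,q_x$, giving $m_x = q_x/(1 - q_x/2)$) and constant force (${}_tp_x = (1 - q_x)^t$, giving $m_x = \mu = -\log(1 - q_x)$). These closed forms are worked out here and the function names are illustrative.

```python
import math


def central_rate(q, t_p, steps=1000):
    """m_x = q_x / integral over [0, 1] of t_p_x dt (trapezoidal rule)."""
    h = 1.0 / steps
    inner = sum(t_p(i * h) for i in range(1, steps))
    integral = h * (0.5 * (t_p(0.0) + t_p(1.0)) + inner)
    return q / integral


q90 = 0.21994  # q_90 from the Irish Life Table No. 14 (Males) extract

# Uniform distribution of deaths: t_p_x = 1 - t * q_x.
m_udd = central_rate(q90, lambda t: 1.0 - t * q90)

# Constant force of mortality: t_p_x = (1 - q_x) ** t.
m_cf = central_rate(q90, lambda t: (1.0 - q90) ** t)

print(m_udd, q90 / (1.0 - q90 / 2.0))
print(m_cf, -math.log(1.0 - q90))
```

    In both cases the central rate exceeds $q_{90}$, in line with the note that $m_x$ can never be less than $q_x$.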

  • I.3 Expected future lifetime

  • IntroductionSurvival models

    Lifetime distribution functionsCox regression

    Simple survival modelLife table functionsExpected future lifetimeSome important formulaeSimple parametric survival models

    Complete expectation of life

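    Definition: The expected future lifetime after age $x$ (or expectation of life at age $x$) is defined as $E[T_x]$ and is denoted $\mathring{e}_x$:

    $$\begin{aligned}
    \mathring{e}_x &= \int_0^{\omega - x} t\,{}_tp_x\,\mu_{x+t}\,dt \\
    &= \int_0^{\omega - x} t\left(-\frac{\partial}{\partial t}\,{}_tp_x\right)dt \\
    &= -\big[t\,{}_tp_x\big]_0^{\omega - x} + \int_0^{\omega - x} {}_tp_x\,dt \quad \text{(integration by parts)} \\
    &= \int_0^{\omega - x} {}_tp_x\,dt
    \end{aligned}$$

    (using ${}_tp_x\,\mu_{x+t} = f_x(t) = \partial\,{}_tq_x/\partial t = -\partial\,{}_tp_x/\partial t$ and $\big[t\,{}_tp_x\big]_0^{\omega - x} = 0$)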


    Curtate future lifetime

    Definition: The curtate future lifetime of a life aged $x$ is $K_x = [T_x]$, where the square brackets $[\cdot]$ denote the integer part.

    $K_x$ is a discrete r.v. taking values on $\{0, 1, 2, \ldots, [\omega - x]\}$, with probability function

    $$\begin{aligned}
    P(K_x = k) &= P(k \le T_x < k+1) \\
    &= P(k < T_x \le k+1) \quad \text{(assuming } F_x(t) \text{ is continuous in } t\text{)} \\
    &= {}_kp_x\,q_{x+k}
    \end{aligned}$$

    $P(K_x = k)$ is often denoted ${}_{k|}q_x$ ("$k$ deferred $q_x$"), as we consider here deferring the event of death until the year that begins in $k$ years from now.


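    Curtate expectation of life

    Definition: The curtate expectation of life, denoted $e_x$, is

    $$e_x = E[K_x]$$

    We have

    $$\begin{aligned}
    e_x &= \sum_{k=0}^{[\omega - x]} k\,{}_kp_x\,q_{x+k} \\
    &= {}_1p_x\,q_{x+1} \\
    &\quad + {}_2p_x\,q_{x+2} + {}_2p_x\,q_{x+2} \\
    &\quad + {}_3p_x\,q_{x+3} + {}_3p_x\,q_{x+3} + {}_3p_x\,q_{x+3} \\
    &\quad + \ldots \\
    &= \sum_{k=1}^{[\omega - x]} \sum_{j=k}^{[\omega - x]} {}_jp_x\,q_{x+j} = \sum_{k=1}^{[\omega - x]} {}_kp_x
    \end{aligned}$$

    (the inner sum is $P(K_x \ge k) = {}_kp_x$).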
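The equality between the two expressions for the curtate expectation can be checked on a toy life table. The sketch below is only an illustration: the list of one-year death probabilities is made up, and the last entry is set to 1 so that nobody survives past the limiting age.

```python
def curtate_expectation(q):
    """Compute the curtate expectation two ways from q[k] = q_{x+k}.

    Returns (sum of k * kpx * q_{x+k}, sum over k >= 1 of kpx).
    """
    kpx = 1.0          # 0px = 1
    weighted = 0.0     # running sum of k * kpx * q_{x+k}
    survival_sum = 0.0 # running sum of kpx for k >= 1
    for k, q_k in enumerate(q):
        weighted += k * kpx * q_k
        kpx *= 1.0 - q_k        # kpx now holds (k+1)px
        survival_sum += kpx
    return weighted, survival_sum

toy_q = [0.1, 0.2, 0.5, 1.0]    # hypothetical death probabilities, last one is 1
e_weighted, e_survival = curtate_expectation(toy_q)
print(e_weighted, e_survival)
```

Both numbers agree, which is the identity used in the last step of the derivation.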


    Curtate expectation of life

    Additional years from age x    Probability
    1                              ${}_1p_x\,q_{x+1}$ (survives one year and then dies in the next year)
    2                              ${}_2p_x\,q_{x+2}$
    ...                            ...
    k                              ${}_kp_x\,q_{x+k}$
    ...                            ...
    Sum (years $\times$ probability)   $\sum_k k\,{}_kp_x\,q_{x+k}$ ($k = 1$ to $[\omega - x]$)


    Relationship between $\mathring{e}_x$ and $e_x$

    Considering the two formulae

    $$\mathring{e}_x = \int_0^{\omega - x} {}_tp_x\,dt \quad \text{and} \quad e_x = \sum_{k=1}^{[\omega - x]} {}_kp_x$$

    the complete and curtate expectations of life are related by the approximate equation

    $$\mathring{e}_x \approx e_x + \frac{1}{2}$$

    I Define $J_x = T_x - K_x$ to be the random lifetime after the highest integer age to which a life aged $x$ survives. Approximately, $E[J_x] = 1/2$ (assuming deaths occur uniformly within each year of age), and since $E[T_x] = E[K_x] + E[J_x]$, we have $\mathring{e}_x \simeq e_x + 1/2$.
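The half-year approximation can be illustrated numerically. The sketch below assumes a Gompertz-type hazard with made-up parameters (0.00015 and 1.1) and a 120-year horizon standing in for the limiting age; none of these values come from the notes.

```python
import math

B, C = 0.00015, 1.1   # assumed hazard parameters, for illustration only

def survival(x, t):
    """tpx when the force of mortality at age y is B * C**y."""
    return math.exp(-B * C**x * (C**t - 1.0) / math.log(C))

def complete_expectation(x, horizon=120.0, steps=12_000):
    """Trapezoidal approximation of the integral of tpx over [0, horizon]."""
    h = horizon / steps
    inner = sum(survival(x, i * h) for i in range(1, steps))
    return h * (0.5 * survival(x, 0.0) + inner + 0.5 * survival(x, horizon))

def curtate_expectation(x, horizon=120):
    """Sum of kpx for k = 1, ..., horizon."""
    return sum(survival(x, k) for k in range(1, horizon + 1))

e_complete = complete_expectation(50)
e_curtate = curtate_expectation(50)
print(f"complete = {e_complete:.3f}, curtate = {e_curtate:.3f}, "
      f"difference = {e_complete - e_curtate:.3f}")
```

With these assumed parameters the difference is close to, but not exactly, one half, which is consistent with the relationship being an approximation.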


    Life table: example 3

    Figure: Irish Life Table No. 13, 1995-97 (Males)


    Life table: example 3

    Why $e_{81} \neq e_{80} - 1$? How would you write $e_{81}$ w.r.t. $e_{80}$ and $p_x$?

    Figure: Irish Life Table No. 13, 1995-97 (Males), continued


    Future lifetimes: variance

    The variances of the complete and curtate future lifetimes are

    $$\mathrm{Var}[T_x] = \int_0^{\omega - x} t^2\,{}_tp_x\,\mu_{x+t}\,dt - \mathring{e}_x^{\,2}$$

    $$\mathrm{Var}[K_x] = \sum_{k=0}^{[\omega - x]} k^2\,{}_kp_x\,q_{x+k} - e_x^2$$

    These expressions may be useful to:

    I derive the variance of functions based on future lifetimes

    I further quantify the likely variation in some index of interest (e.g. the profits from a life insurance policy, or the cost of providing a benefit from a pension scheme)


    Uses of the expectation of life

    The expectation of life is often used as a measure of the standard of living and health care in a given country.

    The complete expectation of life is used e.g. for premium calculations, and for longevity studies.

    35-40   Angola, Zambia
    40-45   Afghanistan, Malawi
    45-50   Nigeria, Rwanda, South Africa, Zimbabwe
    50-55   Cameroon, Ethiopia, Uganda
    55-60   Bangladesh, Ghana, Haiti, Kenya, Russia
    60-65   Botswana, Burma, Guyana, Pakistan, Yemen
    65-70   Brazil, Guatemala, India
    70-75   Barbados, China, Serbia
    75-80   Australia, Japan, New Zealand, USA, most Western European countries

    Average life expectancy at birth (in years) for males (2009, CIA World Factbook and IFA 2011)


    I.4 Some important formulae


    A formula for ${}_tq_x$

    $${}_tq_x = \int_0^t {}_sp_x\,\mu_{x+s}\,ds$$

    This follows from the relationship $f_x(t) = {}_tp_x\,\mu_{x+t}$. For each time $s \in [0, t]$, the integrand is the product of

    (i) ${}_sp_x$, the probability of surviving to age $x+s$

    (ii) $\mu_{x+s}\,ds$, which is approximately equal to ${}_{ds}q_{x+s}$, the probability of dying just after age $x+s$

    The corresponding events (death in distinct small intervals) are mutually exclusive, so their probabilities are just added up (or in the limit integrated). This result allows deriving an important relationship between ${}_tp_x$ and $\mu_x$.


    A formula for ${}_tp_x$

    $${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds + c\right\}$$


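    A formula for ${}_tp_x$

    Proof: This follows from

    $$\frac{\partial}{\partial s}\,{}_sp_x = -\frac{\partial}{\partial s}\,{}_sq_x = -f_x(s) = -{}_sp_x\,\mu_{x+s}$$

    Note that

    $$\frac{\partial}{\partial s}\log {}_sp_x = \frac{\frac{\partial}{\partial s}\,{}_sp_x}{{}_sp_x} = -\mu_{x+s}$$

    hence, for some constant of integration $c$ (which is 0),

    $$\int_0^t \frac{\partial}{\partial s}\log {}_sp_x\,ds = -\int_0^t \mu_{x+s}\,ds + c$$

    Since ${}_0p_x = 1$ we have $\big[\log {}_sp_x\big]_0^t = \log {}_tp_x$ and

    $${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds + c\right\} = \exp\left\{-\int_0^t \mu_{x+s}\,ds\right\}$$

    (since $e^0 = 1$, we use $c = 0$)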
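Both integral formulae can be checked numerically for any smooth force of mortality. The sketch below uses a hypothetical hazard (a constant plus an exponential term, with made-up coefficients) and a hand-written trapezoidal rule; it only verifies that the survival and death probabilities obtained from the two formulae add up to one.

```python
import math

def mu(age):
    """Hypothetical force of mortality, used for illustration only."""
    return 0.0005 + 0.00015 * 1.1**age

def integrate(f, a, b, steps):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, steps)) + 0.5 * f(b))

def tpx(x, t):
    """Survival probability: exp of minus the integrated hazard from x to x + t."""
    return math.exp(-integrate(lambda s: mu(x + s), 0.0, t, steps=2000))

def tqx(x, t):
    """Death probability: integral of spx * mu_{x+s} over s in [0, t]."""
    return integrate(lambda s: tpx(x, s) * mu(x + s), 0.0, t, steps=200)

p = tpx(50, 10)
q = tqx(50, 10)
print(f"tpx = {p:.6f}, tqx = {q:.6f}, sum = {p + q:.6f}")
```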


    Summary: integral expressions for ${}_tq_x$ and ${}_tp_x$

    $${}_tq_x = \int_0^t {}_sp_x\,\mu_{x+s}\,ds$$

    $${}_tp_x = \exp\left\{-\int_0^t \mu_{x+s}\,ds + c\right\}$$

    (yes, it is so important that we repeat it)


    I.5 Simple parametric survival models


    Parametric models?

    Figure: UK mortality 2003-2005 (Males, Office for National Statistics): hazard estimates $\mu_x$ (increasing, log scale) and survival estimates $S(x)$ (decreasing) vs age (years). WHAT MODEL?


    Parametric models?

    Figure: UK mortality 2003-2005 (Males, Office for National Statistics): hazard estimates $\mu_x$ (increasing, log scale) and density $f(x)$ (unimodal) vs age (years). WHAT MODEL?


    The exponential model

    I Various survival models may be used

    I In simpler models, the distribution of future lifetime uses a small number of parameters

    I One of the simplest models is the exponential model

    I In this model the hazard rate is constant, i.e. $\mu_x = \lambda \ge 0$, and for $t \ge 0$

    $${}_tp_x = S_x(t) = e^{-\int_0^t \lambda\,ds} = e^{-[\lambda s]_0^t} = e^{-\lambda t}$$

    and

    $${}_tq_x = 1 - {}_tp_x = 1 - e^{-\lambda t}$$


    The exponential model

    We also have

    $$f(t) = \lambda e^{-\lambda t}, \qquad E[T] = \frac{1}{\lambda}, \qquad \mathrm{Var}[T] = \frac{1}{\lambda^2}$$

    I Not appropriate for human survival over broad ranges

    I Can be used over short ranges

    I Is used for machine reliability


    The exponential model

    Q1 If $\mu_x = 0.001$ (constant) between ages 25 and 30, calculate the probability that a life aged exactly 25 will survive to age 30

    Q2 If $\mu_x = 0.02$ at all ages, calculate the age $x$ for which ${}_xp_0 = 0.5$. What does this age represent?

    Q3 Given that $\mathring{e}_{50} = 30$ and $\mu_{50+t} = 0.005$ for $0 \le t \le 1$, what is the value of $\mathring{e}_{51}$?
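One possible way to check answers to these three questions numerically is sketched below. It assumes the constant-hazard survival formula for Q1 and Q2, and for Q3 it assumes that the expectation given is the complete one and splits it into the first year plus the remainder (expected time lived in the first year, plus the one-year survival probability times the expectation at age 51). The variable names are illustrative.

```python
import math

# Q1: five-year survival probability under a constant hazard of 0.001
q1_survival = math.exp(-0.001 * 5)

# Q2: age at which the survival probability from birth drops to 0.5
# under a constant hazard of 0.02 (this is the median lifetime)
q2_median_age = math.log(2) / 0.02

# Q3: split the expectation at age 50 into the first year and the rest
hazard = 0.005
p_50 = math.exp(-hazard)                 # one-year survival probability
first_year = (1.0 - p_50) / hazard       # integral of exp(-hazard * t) over [0, 1]
q3_e51 = (30.0 - first_year) / p_50

print(q1_survival, q2_median_age, q3_e51)
```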


    The exponential model

    Figure: Exponential survival models as a function of age (in years), with $S(t) = e^{-\lambda t}$ for $\lambda = 0.10, 0.05, 0.03$. Then $E[T] = 1/\lambda = 10, 20, 33$ resp.


    The Weibull model

    I The Weibull model is a simple extension of the exponential model

    I The exponential model is a special case of Weibull ($\gamma = 1$)

    I In this model the survival function $S(t)$ is

    $$S(t) = e^{-\lambda t^{\gamma}}$$

    I Weibull hazard is monotonically increasing ($\gamma > 1$) or decreasing ($\gamma < 1$):

    $$\mu_t = \lambda \gamma\, t^{\gamma - 1}$$


    The Weibull model

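    I Since $\mu_t = -\frac{d}{dt}\log[S(t)]$ we have

    $$\mu_t = -\frac{d}{dt}\left[-\lambda t^{\gamma}\right] = \lambda \gamma\, t^{\gamma - 1}$$

    I Weibull is a special case of the generalized Gamma distribution

    I Its moments can be calculated relatively easily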
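The link between the Weibull survival function and its hazard can be checked with a finite-difference derivative of the log-survival. In the sketch below the parameter names `lam` and `gamma` and their values are assumptions chosen for illustration.

```python
import math

def weibull_survival(t, lam, gamma):
    """Weibull survival function exp(-lam * t**gamma)."""
    return math.exp(-lam * t**gamma)

def weibull_hazard(t, lam, gamma):
    """Closed-form Weibull hazard lam * gamma * t**(gamma - 1)."""
    return lam * gamma * t ** (gamma - 1.0)

def numeric_hazard(t, lam, gamma, h=1e-6):
    """Minus the central-difference derivative of log S(t)."""
    upper = math.log(weibull_survival(t + h, lam, gamma))
    lower = math.log(weibull_survival(t - h, lam, gamma))
    return -(upper - lower) / (2.0 * h)

closed = weibull_hazard(2.0, lam=0.1, gamma=1.5)
approx = numeric_hazard(2.0, lam=0.1, gamma=1.5)
print(closed, approx)
```

Setting `gamma=1` in the same functions gives a hazard that does not depend on `t`, which is the exponential special case.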


    The Gompertz distribution (1825)

    I Gompertz law of mortality ($t \ge 0$, $B > 0$, $c > 1$):

    $$S(t) = \exp\left\{\frac{B}{\log(c)}\,(1 - c^t)\right\}, \qquad \mu_t = B c^t$$

    I Gompertz law yields an exponentially increasing hazard rate throughout life

    $$\log(\mu_t) = \log(B) + t \log(c), \qquad \mu_t = B e^{t \log(c)}$$

    I Often a reasonable assumption for middle and older ages

    I Simple expressions for mean and variance are not available


    The Gompertz distribution (1825)

    Figure: Hazard estimates (log scale) vs age (years). The hazard function plotted on log-scale here is approximately linear beyond 30 years of age


    The Makeham distribution (1860)

    I Makeham's law of mortality ($t \ge 0$, $B > 0$, $c > 1$, $A > -B$):

    $$S(t) = \exp\left\{\frac{B}{\log(c)}\,(1 - c^t) - At\right\}, \qquad \mu_t = A + B c^t$$

    I The constant term is sometimes interpreted as an allowance for accidental deaths

    I Suggests that part of the hazard rate is age-independent

    I Otherwise same as Gompertz law

    I Simple expressions for mean and variance are again not available


    Calculating the parameter values

    If a life table is known to follow Gompertz law, the parameters $B$ and $c$ can be determined given the values of $\mu_t$ at any two ages.

    In the case of a life table following Makeham's law, the parameters $A$, $B$ and $c$ can be determined given the values of $\mu_t$ at any three ages.

    Question: given that $\mu_{50} = 0.017609$ and $\mu_{55} = 0.028359$, calculate $B$ and $c$ for a force of mortality $\mu_t$ known to follow Gompertz law.
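A minimal sketch of the two-age calculation follows, assuming the Gompertz form in which the force of mortality at age t equals B times c to the power t. Taking the ratio of the two given values eliminates B, and B is then recovered from either age. The function name is hypothetical.

```python
def gompertz_from_two_ages(age_1, mu_1, age_2, mu_2):
    """Solve mu_1 = B * c**age_1 and mu_2 = B * c**age_2 for (B, c)."""
    c = (mu_2 / mu_1) ** (1.0 / (age_2 - age_1))   # the ratio removes B
    B = mu_1 / c**age_1
    return B, c

B, c = gompertz_from_two_ages(50, 0.017609, 55, 0.028359)
print(f"B = {B:.6f}, c = {c:.4f}")
```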


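    Survival probabilities

    Survival probabilities ${}_tp_x$ can be found using

    $${}_tp_x = \exp\left(-\int_0^t \mu_{x+s}\,ds\right)$$

    Gompertz law: for $g = \exp\left(-\frac{B}{\log(c)}\right)$, we have

    $${}_tp_x = g^{c^x (c^t - 1)}$$

    Makeham's law: for $g = \exp\left(-\frac{B}{\log(c)}\right)$ and $s = \exp(-A)$, we have

    $${}_tp_x = s^t\, g^{c^x (c^t - 1)}$$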
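The closed forms can be compared against direct numerical integration of the hazard. The sketch below assumes a Makeham hazard (a constant A plus B times c to the power of age) with made-up parameter values; setting A to zero gives the Gompertz case.

```python
import math

def makeham_tpx(x, t, A, B, c):
    """Closed-form survival probability under Makeham's law (A = 0 gives Gompertz)."""
    g = math.exp(-B / math.log(c))
    s = math.exp(-A)
    return s**t * g ** (c**x * (c**t - 1.0))

def numeric_tpx(x, t, A, B, c, steps=10_000):
    """exp of minus the midpoint-rule integral of the hazard between ages x and x + t."""
    h = t / steps
    integral = h * sum(A + B * c ** (x + (i + 0.5) * h) for i in range(steps))
    return math.exp(-integral)

params = dict(A=0.0005, B=0.00015, c=1.1)   # illustrative values only
closed = makeham_tpx(50, 10, **params)
approx = numeric_tpx(50, 10, **params)
print(closed, approx)
```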


    Survival probabilities

    Proof ...?


    Gompertz-Makeham family

    The force of mortality can be modeled using one of the Gompertz-Makeham curves. This family of functions is of the form

    $$GM(r, s) = \alpha_1 + \alpha_2 t + \cdots + \alpha_r t^{r-1} + \exp\left\{\alpha_{r+1} + \alpha_{r+2} t + \cdots + \alpha_{r+s} t^{s-1}\right\}$$

    where the $\alpha_1, \ldots, \alpha_{r+s}$ are constants independent of $t$.

    This is the most widely used form of the GM family. Another popular form is

    $$\mu_x = GM(r, s) = \mathrm{poly}_1(t) + e^{\mathrm{poly}_2(t)}$$

    where $t$ is a linear function of $x$ and $\mathrm{poly}_1(t)$ and $\mathrm{poly}_2(t)$ are polynomials of degrees $r$ and $s$ respectively.
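To make the first form concrete, the sketch below evaluates a polynomial with r coefficients plus the exponential of a polynomial with s coefficients. The function name and the parameter values are assumptions; the example shows that two exponential-part coefficients reproduce a Gompertz hazard, and one extra constant reproduces a Makeham hazard.

```python
import math

def gm_hazard(t, coefficients, r, s):
    """Evaluate a GM(r, s) curve: polynomial part plus exponential of a polynomial."""
    if len(coefficients) != r + s:
        raise ValueError("expected r + s coefficients")
    poly_part = sum(a * t**i for i, a in enumerate(coefficients[:r]))
    exp_poly = sum(a * t**i for i, a in enumerate(coefficients[r:]))
    return poly_part + (math.exp(exp_poly) if s > 0 else 0.0)

A, B, c = 0.0005, 0.00015, 1.1   # illustrative values only
gompertz = gm_hazard(60, [math.log(B), math.log(c)], r=0, s=2)
makeham = gm_hazard(60, [A, math.log(B), math.log(c)], r=1, s=2)
print(gompertz, makeham)
```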


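    Suitable distribution families (summary)

    Key constraint for a parametric model: the domain of $f(t)$ must be $\mathbb{R}^+$.

    Here is a summary of some suitable families:

    | | Exp | Weibull | Gompertz | Makeham | Log-logistic |
    |---|---|---|---|---|---|
    | $f(t)$ | $\lambda e^{-\lambda t}$ | $\lambda\gamma t^{\gamma-1} e^{-\lambda t^{\gamma}}$ | $B c^t\, e^{\frac{B(1-c^t)}{\log(c)}}$ | $(A + B c^t)\, e^{\frac{B(1-c^t)}{\log(c)} - At}$ | $\frac{\lambda\gamma t^{\gamma-1}}{(1+\lambda t^{\gamma})^2}$ |
    | $F(t)$ | $1 - e^{-\lambda t}$ | $1 - e^{-\lambda t^{\gamma}}$ | $1 - e^{\frac{B(1-c^t)}{\log(c)}}$ | $1 - e^{\frac{B(1-c^t)}{\log(c)} - At}$ | $1 - \frac{1}{1+\lambda t^{\gamma}}$ |
    | $S(t)$ | $e^{-\lambda t}$ | $e^{-\lambda t^{\gamma}}$ | $e^{\frac{B(1-c^t)}{\log(c)}}$ | $e^{\frac{B(1-c^t)}{\log(c)} - At}$ | $\frac{1}{1+\lambda t^{\gamma}}$ |
    | $\mu_t$ | $\lambda$ | $\lambda\gamma t^{\gamma-1}$ | $B c^t$ | $A + B c^t$ | $\frac{\lambda\gamma t^{\gamma-1}}{1+\lambda t^{\gamma}}$ |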


    Exam-style question (1/2)

    1. An investigation is undertaken into the mortality of men between exact ages 50 and 55 years. A sample of $n$ men is followed from their 50th birthday until they either die or reach their 55th birthday.

    The force of mortality (or hazard rate) is assumed to have the following form

    $$\mu_x = \alpha + \beta x$$

    where $\alpha$ and $\beta$ are parameters to be estimated and $x$ is measured in years since the 50th birthday.

    (a) Derive an expression for the survival function between ages 50 and 55 years

    (b) Sketch this on a graph


    Exam-style question (2/2)

    [...] The force of mortality (or hazard rate) is assumed to have the following form

    $$\mu_x = \alpha + \beta x$$

    where $\alpha$ and $\beta$ are parameters to be estimated and $x$ is measured in years since the 50th birthday.

    (c) Comment on the appropriateness of the assumed form of the hazard function for modelling mortality over this age range

    (d) If there were 100,000 men aged 50 then how many deaths would you expect between ages 50 and 55 years

    (e) Describe the distribution of the number of deaths between ages 50 and 55 years among the $n$ men


    Statistical inference | Censoring | The Kaplan-Meier (product-limit) model | Parametric estimation of the survival function

    Section II

    Estimating lifetime distribution functions


    II.1 Statistical inference


    Introduction

    I Previous section introduced the continuous r.v. $T$ of future lifetime

    I This section presents a methodology for using observations from an investigation to estimate the lifetime distribution function $F(t) = P(T \le t)$ empirically

    I Statistical properties for the estimates may be derived to construct variances and confidence intervals

    I The possibility of incomplete data will also be considered

    I Defining a decrement of interest by death can easily be extended to the analysis of other decrements such as sickness, mechanical breakdowns, etc.


    Overview

    I Empirical Survival Function

    I Parametric MLE

    I Non-parametric MLE

    I Kaplan-Meier survival function


    Estimating the lifetime distribution function

    Statistical inference: given some mild conditions on the distribution of $T$, we can obtain all information by estimating $F(t)$, $S(t)$, $f(t)$ or $\mu(t)$ for all $t \ge 0$.

    Simple experiment:

    I Observe a large number of new-born lives

    I The proportion alive at age $t > 0$ provides an estimate of $S(t)$

    I Use a step function $\hat{S}(t)$ to estimate $S(t)$


    Estimating the lifetime distribution function

    I This is known as the empirical distribution function of T

    I It is a non-parametric approach to estimation

    I It is not necessary to assume that $T$ is a member of any parametric family


    Estimating the lifetime distribution function

    Non-parametric approach:

    I No prior assumption about the distribution shape or form

    I Use the data to estimate this shape/form

    Parametric approach:

    I The distribution is assumed to belong to a certain family (e.g. exponential)

    I Use the data to estimate the appropriate parameters of this family (e.g. mean and variance)


    Estimating the lifetime distribution function

    Ex: A life insurance company prefers to base its premium calculations on a smooth estimate to ensure the premiums change gradually from one age to the next, without sudden jumps.

    I A larger sample yields a smoother estimate

    I Further smoothing also possible


    Example

    Time  Died  Alive  SDF
       5     1      9  0.9
      10     1      8  0.8
      15     1      7  0.7
      20     1      4  0.4
      20     1      4  0.4
      20     1      4  0.4
      35     1      3  0.3
      40     0      3  0.3
      40     0      3  0.3
      40     0      3  0.3
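The SDF column can be reproduced by counting, at each distinct death time, the proportion of the 10 lives still alive. The sketch below is one way to do this for complete (uncensored) death times; the function name is hypothetical and the three lives still alive at time 40 are simply left out of the list of deaths.

```python
def empirical_survival(death_times, n_lives):
    """Return [(time, proportion alive just after time)] at each distinct death time."""
    steps = []
    for t in sorted(set(death_times)):
        deaths_so_far = sum(1 for d in death_times if d <= t)
        steps.append((t, (n_lives - deaths_so_far) / n_lives))
    return steps

observed_deaths = [5, 10, 15, 20, 20, 20, 35]   # death times from the example table
survival_steps = empirical_survival(observed_deaths, n_lives=10)
for time, proportion in survival_steps:
    print(time, proportion)
```

The estimate stays at 0.3 after time 35 because no further deaths are observed up to time 40.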


    Example

    Figure: Observed proportion $\hat{S}(t)$ surviving a given time $t$ (survival vs time)


    Example

    Figure: Empirical survival, i.e. observed proportion $\hat{S}(t)$ surviving a given time $t$ (vs analysis time). Note the step down of 0.1 at each single death, the step down of 0.3 at time 20 (3 deaths), and that 3 of the 10 lives (0.3) are still alive at 40 months.


    Estimating the lifetime distribution function

    Limitations:

    I Difficult to find a satisfactory group of lives for study

    I The experiment would take about 100 years to complete

    I Deaths of all the lives must be recorded (highly impractical)

    I Censoring is therefore nearly always required

    I All we know in respect of some lives is that they died after a certain age


    Cohorts

    I Complete data: all units under observation until failure

    I Incomplete data: units can withdraw/become lost from observation before death

    I Analysis of complete data is far simpler, so we do this first

    I Cohort: all units come under observation at time $t = 0$

    I No entrants after $t = 0$

    I All are observed until failure/death:
        I lab experiment with mice injected with nicotine; $t = 0$ is the beginning of the experiment
        I people diagnosed with a certain type of cancer; $t = 0$ on the day of diagnosis


    Follow-up time

    I Important to distinguish between calendar/chronological time and time under observation (follow-up time)

    I Patient A diagnosed on March 1, 1990, dies on March 11, 1991

    I Patient B diagnosed on July 1, 1991, dies on August 1, 1991

    Figure: Patients A and B shown on a calendar-time axis (1/1/90, 1/1/91, 1/1/92) and on a follow-up-time axis (0, 365, 730 days)
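The follow-up times of the two patients can be computed directly from the dates of diagnosis and death. The sketch below uses Python's standard `datetime.date`; the dictionary layout is just an illustrative choice.

```python
from datetime import date

# (date of diagnosis, date of death) for the two patients in the example
patients = {
    "A": (date(1990, 3, 1), date(1991, 3, 11)),
    "B": (date(1991, 7, 1), date(1991, 8, 1)),
}

# Follow-up time is measured from each patient's own diagnosis, not from a common calendar origin
follow_up_days = {name: (death - diagnosis).days
                  for name, (diagnosis, death) in patients.items()}
print(follow_up_days)
```

Patient A is diagnosed earlier in calendar time but has the longer follow-up time, which is why the two time scales must be kept apart.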


    II.2 Censoring


    Censoring

    I Censoring results in loss of data

    I Depending on the type of censoring (informative), it may also yield biased mortality rates

    I An observational plan is required in a mortality investigation, to specify start and end dates and categories of lives to be included

    I In e.g. medical statistics, non-parametric estimation is very important

    I Experiments can be amended to allow for censoring

    I Otherwise, inference must be based on data with shorter times (e.g. 3 or 4 years)


    Censoring

    If inference is based on data with shorter times:

    I We no longer observe the same cohort throughout their joint lifetimes

    I We might not be sampling from the same distribution

    I Model assumptions may thus need to be widened so that the mortality of lives born in year $y$ is modelled by $T^y$

    I In practice, the investigation is divided up into single years of age (outside scope of ST3054)


    Censoring

    I Observing lives between integer ages $x$ and $x+1$, and limiting the period of investigation, are also forms of censoring

    I Censoring might still occur at unpredictable times (e.g. lapsing of a policy)

    I Time of observation corresponding to loss of survivors is known: either age $x+1$ or end of investigation


    Censoring mechanisms

    Data are censored if we do not know the exact values of each observation but we do have information about the value of each observation in relation to one or more bounds (e.g. we know that a person was still alive at age 20 at end of investigation).

    I Censoring is the key feature of survival data

    I Survival analysis may be seen as the analysis of censored data

    I Censoring mechanisms play an important role in statistical inference


    Censoring mechanisms

    Most common censoring assumptions (not all mutually exclusive):

    I Right censoring

    I Left censoring

    I Interval censoring

    I Random censoring

    I Informative and non-informative censoring

    I Type I censoring

    I Type II censoring


    Right censoring

    I Data are right censored if observations in progress are cut short

    I Most common form of censoring in actuarial investigations

    I Ex: end of mortality study before all lives observed have died

    I Persons still alive when the investigation ends are right censored

    I We only know that their lifetime exceeds some value

    I Ex: life insurance policy holders surrender their policy, active lives of a pension scheme retire, endowment assurance policies mature
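Right-censored observations are commonly stored as a pair: the time observed and a flag saying whether death was actually seen. The sketch below applies a fixed end of study to a list of lifetimes; the lifetimes beyond 40 are hypothetical values that would never be known in practice, which is the point of the example.

```python
def right_censor(lifetimes, study_end):
    """Return (observed_time, death_observed) pairs when observation stops at study_end."""
    return [(min(t, study_end), t <= study_end) for t in lifetimes]

# Seven deaths within the study and three (hypothetical) lifetimes beyond its end
true_lifetimes = [5, 10, 15, 20, 20, 20, 35, 52, 61, 70]
observations = right_censor(true_lifetimes, study_end=40)
censored = [time for time, death_observed in observations if not death_observed]
print(observations)
print(censored)
```

For the censored lives all that is recorded is that the lifetime exceeds 40.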


    Left censoring

    I Data are left censored if we cannot know when entry into the state we wish to observe took place

    I Ex: medical studies where time elapsed between onset and baseline diagnosis is unknown

    I Ex: estimating functions of exact age without knowledge of DOB, estimating functions of exact policy duration without knowledge of exact date of policy entry, estimating functions of duration since onset of sickness without knowledge of exact date of start of sickness

    I Left censoring is different from left truncation


    Truncation

    I Truncation: when estimating functions of exact age without info from before the start of investigation period, or before entry date of policy, etc.

    I Observed data: time of occurrence (or censored observation of) the event

    I Ascertainment time $B$, earliest initiation time 0

    I Ex: estimate incubation distribution based on retrospective samples of AIDS cases with known infection times

    I Proba: for an individual observed to have experienced the event after $t$ time units

    f (t)

    F (B t)ST3054 - ST6004 117

    Interval censoring

    I Data are interval censored if the observational plan only allows an event of interest to be dated within a time interval

    I Ex: actuarial studies where only calendar year of death is known

    I Right and left censoring are special cases of interval censoring

    I Ex: estimating functions of exact age when deaths are known up to nearest birthday only

    I Ex: knowing calendar date of death and calendar year of birth (example of left censoring and also interval censoring since we only know the lifetime falls within a certain range)

    Random censoring

    I Censoring is random if the time $C_i$ at which observation of the $i$-th lifetime is censored is a random variable

    I The observation will be censored if $C_i < T_i$, where $T_i$ is the random lifetime of the $i$-th life

    I Ex: when individuals may leave the observation by a means other than death, and where the time of leaving is not known in advance

    I Ex: life insurance withdrawals, emigration from a population, members of a company pension scheme may leave voluntarily when moving to another employer
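
As an illustration of the definition above, here is a minimal simulation sketch. The exponential distributions, the rates and the function name `simulate_random_censoring` are assumptions chosen for illustration only; they do not come from the slides.

```python
import random

def simulate_random_censoring(n, death_rate, censor_rate, seed=0):
    """Return (observed_time, died) pairs under independent random censoring."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        t = rng.expovariate(death_rate)    # lifetime T_i (assumed exponential)
        c = rng.expovariate(censor_rate)   # censoring time C_i (assumed exponential)
        # Only the earlier of the two is observed, plus which one it was.
        sample.append((min(t, c), t <= c))
    return sample

data = simulate_random_censoring(n=1000, death_rate=0.1, censor_rate=0.05)
# Share of lives whose observation was censored, i.e. with C_i < T_i.
censored_share = sum(1 for _, died in data if not died) / len(data)
```

With these assumed rates, the censored share should be close to 0.05 / (0.1 + 0.05), about one third.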

    Random censoring

    I Random censoring is a special case of right censoring

    I The case in which the censoring mechanism is a second decrement of interest gives rise to multiple decrement models

    I Ex: suppose that lives can leave a pension scheme through death, age retirement or withdrawal. The rates of decrement for all these causes of decrement can be estimated by a multiple decrement model.

    Informative and non-informative censoring

    I Censoring is non-informative if it gives no information about the lifetimes $\{T_i\}$

    I Under random censoring, the independence of each pair $(T_i, C_i)$ is sufficient to ensure non-informative censoring

    I Informative censoring is more difficult to analyse

    I Essentially this is because the resulting likelihoods cannot usually be factorised (recall that statistical independence greatly simplifies calculation of likelihoods)
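
To make the factorisation point concrete, here is a sketch of the likelihood under independent random censoring. The notation is introduced here for illustration and is not taken from the slides: $x_i = \min(T_i, C_i)$ is the observed time, $\delta_i = 1$ if the death is observed and 0 otherwise, $f$ and $S$ are the density and survival function of the lifetimes, and $g$ and $G$ those of the censoring times, with $G(x) = P[C_i > x]$.

```latex
L = \prod_{i} \bigl[ f(x_i)\, G(x_i) \bigr]^{\delta_i}
              \bigl[ g(x_i)\, S(x_i) \bigr]^{1 - \delta_i}
  = \Bigl( \prod_{i} f(x_i)^{\delta_i}\, S(x_i)^{1 - \delta_i} \Bigr)
    \Bigl( \prod_{i} g(x_i)^{1 - \delta_i}\, G(x_i)^{\delta_i} \Bigr)
```

If the censoring distribution does not involve the lifetime parameters, the second factor is a constant and only the first needs to be maximised. Without independence the joint distribution of $(T_i, C_i)$ does not split this way.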

    Informative and non-informative censoring

    Examples of informative censoring:

    I Withdrawal of life insurance policies (those who withdraw are likely to be in better average health than those who do not). The mortality rates of the lives that remain in the at-risk group are likely to be higher than the mortality rates of the lives that surrender their policy.

    I Ill-health retirements from pension schemes (those who retire are likely to be in worse average health than continuing members). Mortality rates of those who remain in the pension scheme are likely to be lower than those of the lives that left through ill-health retirement.

    Informative and non-informative censoring

    Example of non-informative censoring:

    I The end of the investigation period, because it affects all lives equally, regardless of their propensity to die at that point

    Type I censoring

    I Type I censoring occurs if the censoring times $\{C_i\}$ are known in advance

    I This is a degenerate case of random censoring

    I Also a special case of right censoring

    I Lives censored at the end of the investigation period might also be considered as an example of Type I censoring

    Type I censoring

    Examples of right censoring mechanisms:

    I When estimating functions of exact age, individuals are not followed up any more once they have reached 60

    I When lives retire from a pension scheme at normal retirement age (if this is a pre-determined exact age)

    I When estimating functions of policy duration, observing individuals only up to their 10th policy anniversary

    I When measuring functions of duration since having a particular medical operation, and only observing people for a maximum of 12 months from the date of operation

    Type II censoring

    I Type II censoring is present if observation is continued until a predetermined number of deaths has occurred

    I Can simplify the analysis: non-random number of events

    I Ex: when a medical trial is ended after 100 lives on a particular course of treatment have died

    I Observational plan is likely to introduce censoring

    I Consideration should be given to the effect on the analysis in specifying this plan

    I Censoring might also depend on the results of the observations to date (oncologic trials)
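
A minimal sketch of a Type II observational plan follows. The helper name `type_ii_censor`, the exponential lifetimes and the sample sizes are hypothetical choices made for illustration; they are not part of the slides.

```python
import random

def type_ii_censor(lifetimes, r):
    """Stop observation at the r-th death; surviving lives are censored there."""
    ordered = sorted(lifetimes)
    stop = ordered[r - 1]                  # time of the r-th death
    # Each life contributes (observed time, died flag).
    return [(min(t, stop), t <= stop) for t in lifetimes], stop

rng = random.Random(1)
lifetimes = [rng.expovariate(0.2) for _ in range(20)]   # assumed lifetimes
observed, stop_time = type_ii_censor(lifetimes, r=5)
deaths = sum(1 for _, died in observed if died)
```

The number of deaths is fixed by design (here 5), while the stopping time is random, which is the reverse of Type I censoring.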

    Type I and II censoring

    I Many actuarial investigations are characterised by a combination of random and Type I censoring

    I Ex: in life office mortality studies where policies rather than lives are observed, and observation ceases either when a policy lapses (random censoring) or at some predetermined date marking the end of investigation (Type I censoring)

    I Type I and Type II censoring are most frequently met in the design of medical survival studies

    I See Question 8.4

    II.3 The Kaplan-Meier (product limit) model

    The Kaplan-Meier model: introduction

    I Derive the empirical distribution function from the data to allow for censoring

    I Consider lifetimes as a function of time $t$ without specifying a starting age $x$

    I Applies equally to new-born lives, lives aged $x$ at outset, or lives sharing a common property at time $t$ (e.g. diagnosis of a medical condition)

    Note: patient age may be important but not the sole determinant, and is usually treated as an explanatory variable in a multivariate regression model (cf. next section). Ex: measure mortality amongst patients suffering from a virulent tropical disease.

    The Kaplan-Meier model: assumptions

    I Suppose we observe a population of $n$ lives in the presence of non-informative right censoring, and suppose we observe $m$ deaths

    I Non-informative censoring: the mortality of the lives still alive in the group is not systematically higher or lower than that of the censored lives

    I Estimates of the distribution and survival functions will be biased if informative censoring actually occurs

    I If informative censoring is allowed, the lifetimes and censoring times are no longer independent

    The Kaplan-Meier model: assumptions

    I Define $t_0 = 0$ and $t_{k+1} = \infty$, and let $t_1 < \dots < t_k$, $k \le m$, be the ordered times at which deaths were observed

    I $k \le m$: more than one death may be observed at a single failure time

    I Assume $d_j$ deaths are observed at time $t_j$ ($1 \le j \le k$) so that $d_1 + \dots + d_k = m$

    I Observation of the remaining $n - m$ lives is censored (i.e. these remaining lives are not tracked further)

    The Kaplan-Meier model: assumptions

    I Assume $c_j$ lives are censored (i.e. removed from investigation) between times $t_j$ and $t_{j+1}$ ($0 \le j \le k$)

    I Then $c_0 + c_1 + \dots + c_k = n - m$

    I Let $d_j$ be the number of individuals experiencing the event at duration $t_j$

    I Let $n_j$ be the number of individuals at risk of experiencing the event just prior to duration $t_j$

    The Kaplan-Meier model: assumptions

    The Kaplan-Meier (KM) estimator of the survivor function adopts the following conventions:

    (a) The hazard of experiencing the event is zero at all durations except those where an event actually happens in our sample

    (b) The hazard of experiencing the event at any particular duration $t_j$ when an event takes place is equal to $d_j / n_j$

    (c) For any $0 \le j \le k$, if $c_j > 0$, then

    I If $d_j = 0$, the persons censored are removed from observation at duration $t_j$ (at which censoring takes place)

    I If $d_j > 0$, persons who are censored at $t_j$ are assumed to be censored immediately after the events have taken place (so that they are still at risk at that duration)

    The Kaplan-Meier model: assumptions

    Example [IFA notes]: a group of 15 lab rats are injected with a new drug. They are observed over the next 30 days. The following events occur:

    Day   Event
    3     Rat 4 dies from effects of drug
    4     Rat 13 dies from effects of drug
    6     Rat 7 gnaws through bars of cage and escapes
    11    Rats 6 and 9 die from effects of drug
    17    Rat 1 killed by other rats
    21    Rat 10 dies from effects of drug
    24    Rat 8 freed during raid by animal liberation activists
    25    Rat 12 accidentally freed by journalist reporting earlier raid
    26    Rat 5 dies from effects of drug
    30    Investigation closes. Remaining rats hold street party.

    [Figure: timeline of the lab rat data on a Day axis, with deaths marked at t1 = 3, t2 = 4, t3 = 11 (2 rats), t4 = 21, t5 = 26 and censored observations at days 6, 17, 24, 25 and 30 (5 rats)]

    The Kaplan-Meier model: assumptions

    I $n = 15$ lives under investigation, $m = 6$ drug-related deaths

    I $k = 5$ death time points; times at which deaths were observed: $t_1 = 3, t_2 = 4, t_3 = 11, t_4 = 21, t_5 = 26$

    I Number of deaths observed at each failure time: $d_1 = 1, d_2 = 1, d_3 = 2, d_4 = 1, d_5 = 1$

    I $n - m = 9$ lives did not die due to the drug

    I Number of lives censored: $c_0 = 0, c_1 = 0, c_2 = 1, c_3 = 1, c_4 = 2, c_5 = 5$ ($\sum_{j=0}^{k} c_j = n - m$)

    I Number of lives alive and at risk at time $t_j$: $n_1 = 15, n_2 = 14, n_3 = 12, n_4 = 9, n_5 = 6$
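
The counts above can be recovered mechanically from the event table. The sketch below is one possible way to do it; the `events` encoding and variable names are assumptions made for this illustration.

```python
from collections import Counter

# (day, number of lives, died?) transcribed from the lab rat table;
# died=False marks a censored observation (escape, other cause, end of study).
events = [
    (3, 1, True), (4, 1, True), (6, 1, False), (11, 2, True),
    (17, 1, False), (21, 1, True), (24, 1, False), (25, 1, False),
    (26, 1, True), (30, 5, False),
]
n = 15

deaths = Counter()
for day, count, died in events:
    if died:
        deaths[day] += count
t = sorted(deaths)                    # t_1 < ... < t_k
d = [deaths[tj] for tj in t]          # d_j

# c_j: lives censored in [t_j, t_{j+1}), with t_0 = 0 and t_{k+1} = infinity.
bounds = [0] + t + [float("inf")]
c = [
    sum(cnt for day, cnt, died in events
        if not died and bounds[j] <= day < bounds[j + 1])
    for j in range(len(bounds) - 1)
]

# n_j: lives still under observation just before t_j.
n_at_risk = [n - sum(cnt for day, cnt, _ in events if day < tj) for tj in t]
```

Equivalently, $n_1 = n - c_0$ and $n_{j+1} = n_j - d_j - c_j$, which reproduces 15, 14, 12, 9, 6.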

    The Kaplan-Meier model: assumptions

    I We can see the approach as a partition of duration into very small intervals

    I The risk of the event happening is 0 at those intervals where no event occurs

    I The data offers no evidence to suppose anything else

    I In those intervals in which events do occur, the hazard is assumed constant (i.e. piecewise exponential) within each interval

    I The hazard is allowed to vary between eventful intervals

    The Kaplan-Meier model: assumptions

    I Recall that if $\mu_{x+t} = \mu$, then $S_x(t) = {}_t p_x = e^{-\mu t}$

    I The survival function is exponential over each short interval over which the force of mortality (or hazard) is constant

    I The hazard within the interval containing event time $t_j$ is estimated for $1 \le j \le k$ as

    $$\hat\lambda_j = \frac{d_j}{n_j}$$

    I This is a non-parametric MLE that maximises

    $$\prod_{j=1}^{k} \lambda_j^{d_j} (1 - \lambda_j)^{n_j - d_j}$$

    (product of independent binomial likelihoods)

    I In eventless intervals, $d_j = 0$ and the hazard becomes 0
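
For completeness, a short sketch of why $d_j / n_j$ maximises the binomial likelihood (a standard argument, written out here rather than quoted from the slides):

```latex
\log L = \sum_{j=1}^{k} \bigl[ d_j \log \lambda_j + (n_j - d_j) \log (1 - \lambda_j) \bigr]
\qquad\Longrightarrow\qquad
\frac{\partial \log L}{\partial \lambda_j}
  = \frac{d_j}{\lambda_j} - \frac{n_j - d_j}{1 - \lambda_j} = 0
\quad\Longleftrightarrow\quad
\hat\lambda_j = \frac{d_j}{n_j}
```

Each $\lambda_j$ appears in only one term of the sum, so the maximisation can be done one event time at a time.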

    Extending the force of mortality to discrete distributions

    Definition: Suppose $F(t)$ has probability masses at the points $t_1, \dots, t_k$. Then the discrete hazard function is defined as

    $$\lambda_j = P[T = t_j \mid T \ge t_j] \quad (1 \le j \le k)$$

    I $\lambda_j$ may be seen as the probability that a given individual dies on day $t_j$, given that they were still alive at the start of that day

    Extending the force of mortality to discrete distributions

    Ex: butterflies of a certain species have short lives. After hatching, each butterfly experiences a lifetime defined by the following probability distribution:

    Lifetime (days)   Probability
    1                 0.10
    2                 0.30
    3                 0.25
    4                 0.20
    5                 0.15

    Calculate $\lambda_j$ for $j = 1, 2, \dots, 5$ (to 3 decimal places) and sketch a graph of the discrete hazard function.
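
One way to check a hand calculation for this exercise is to apply the definition $\lambda_j = P[T = t_j] / P[T \ge t_j]$ directly. The sketch below does this; the variable names are illustrative.

```python
lifetimes = [1, 2, 3, 4, 5]
probs = [0.10, 0.30, 0.25, 0.20, 0.15]

hazard = []
for j in range(len(probs)):
    still_alive = sum(probs[j:])       # P[T >= t_j]
    hazard.append(round(probs[j] / still_alive, 3))
```

This gives 0.100, 0.333, 0.417, 0.571 and 1.000: the discrete hazard increases with age and reaches 1 at the maximum possible lifetime.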

    Calculating the KM estimate of the survival function

    If we assume that $T$ has a discrete distribution then

    $$1 - F(t) = \prod_{t_j \le t} (1 - \lambda_j)$$

    Since $1 - F(t) = S(t)$, we can estimate the survival function using the formula

    $$\hat S(t) = \prod_{t_j \le t} (1 - \hat\lambda_j)$$

    This is the Kaplan-Meier estimator.

    Calculating the KM estimate of the survival function

    To compute $\hat S(t)$, we multiply the survival probabilities within each of the intervals up to and including duration $t$. The survival probability at time $t_j$ is estimated by

    $$1 - \hat\lambda_j = \frac{n_j - d_j}{n_j} = \frac{\text{number of survivors}}{\text{number at risk}}$$

    So the probability of survival at time $t$ is estimated by

    $$\hat S(t) = \prod_{t_j \le t} \frac{n_j - d_j}{n_j}$$

    The KM estimate is also called the product limit estimate as a result of this expression.
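
The product formula translates directly into a step function. The following is a minimal sketch, with the function name `km_survival` chosen for illustration and the lab rat counts from the earlier example used as input.

```python
def km_survival(t, event_times, deaths, at_risk):
    """Kaplan-Meier estimate of S(t): product of (n_j - d_j)/n_j over t_j <= t."""
    s = 1.0
    for tj, dj, nj in zip(event_times, deaths, at_risk):
        if tj > t:
            break              # event times are sorted, nothing more to include
        s *= (nj - dj) / nj
    return s

# Lab rat data from the slides.
t_j = [3, 4, 11, 21, 26]
d_j = [1, 1, 2, 1, 1]
n_j = [15, 14, 12, 9, 6]

s_before_first_death = km_survival(2, t_j, d_j, n_j)   # no factor applies yet
s_day_10 = km_survival(10, t_j, d_j, n_j)              # (14/15) * (13/14)
s_day_11 = km_survival(11, t_j, d_j, n_j)              # additionally * (10/12)
```

The estimate only changes at the observed death times, which is why the graph of the estimate is a step function.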

    Calculating the KM estimate of the survival function

    To summarize the approach:

    I Finer and finer partitions of the time axis are chosen ($1 \le j \le k$)

    I $1 - F(t)$ is estimated as the product of the probabilities of surviving each sub-interval

    I Then the KM estimate is obtained using $\lambda_j = P[T = t_j \mid T \ge t_j]$, as the mesh of the partition tends to 0

    I This KM estimate is constant after the last duration at which an event occurred: it is not defined at durations longer than the duration of the last censored observation

    I Only those at risk at $\{t_j\}$ contribute to the estimate

    Calculating the KM estimate of the survival function

    I It is unnecessary to start observation on all lives at the same time or age

    I The estimate is valid for data truncated from the left, provided truncation is non-informative in the sense that entry to the study at a particular age or time is independent of the remaining lifetime

    Calculating the KM estimate of the survival function

    Ex: using the data from the observation of lab rats, calculate the Kaplan-Meier estimate of $F(t)$.

    $j$   $t_j$   $d_j$   $n_j$   $\hat\lambda_j = d_j/n_j$   $1 - \hat\lambda_j$   $1 - \prod_{k=1}^{j} (1 - \hat\lambda_k)$
    1     3       1       15      0.0667                      0.9333                0.0667
    2     4       1       14      0.0714                      0.9286                0.1333
    3     11      2       12      0.1667                      0.8333                0.2778
    4     21      1       9       0.1111                      0.8889                0.3580
    5     26      1       6       0.1667                      0.8333                0.4650

    $$\hat F(t) = \begin{cases}
    0      & \text{for } 0 \le t < 3 \\
    0.0667 & \text{for } 3 \le t < 4 \\
    0.1333 & \text{for } 4 \le t < 11 \\
    0.2778 & \text{for } 11 \le t < 21 \\
    0.3580 & \text{for } 21 \le t < 26 \\
    0.4650 & \text{for } t \ge 26
    \end{cases}$$
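
The table can be reproduced with exact fractions, which also shows where the rounded decimals come from. This is an illustrative sketch; the variable names are not from the slides.

```python
from fractions import Fraction

t_j = [3, 4, 11, 21, 26]
d_j = [1, 1, 2, 1, 1]
n_j = [15, 14, 12, 9, 6]

surv = Fraction(1)
rows = []
for tj, dj, nj in zip(t_j, d_j, n_j):
    hazard = Fraction(dj, nj)            # estimated discrete hazard at t_j
    surv *= 1 - hazard                   # running product, i.e. S-hat(t_j)
    rows.append((tj, hazard, 1 - surv))  # last entry is F-hat(t_j)

for tj, hazard, F in rows:
    print(f"t={tj:>2}  hazard={float(hazard):.4f}  F={float(F):.4f}  ({F})")
```

The exact values of the estimated distribution function are 1/15, 2/15, 5/18, 29/81 and 113/243, matching the last column of the table after rounding.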

    A graphical approach

    I We can use a graphical approach to carry out KM estimation

    I Ex: derive an estimate $\hat S(t)$ of the survival function $S(t)$ to obtain $\hat F(t) = 1 - \hat S(t)$

    I The graph of $\hat S(t)$ is a step function starting at 1 and stepping down at each new death

    I The height of each step must be calculated to specify $\hat S(t)$

    A graphical approach: example (lab rat data)

    [Figure: estimate of the survival function; step plot of survival probability (0.00 to 1.00) against time, with ticks at 0, 3, 4, 11, 21, 26 and 30]

    $\hat S(t)$        $t$
    1                  $0 \le t < 3$
    14/15              $3 \le t < 4$
    14/15 × 13/14      $4 \le t < 11$
    13/15 × 10/12      $11 \le t < 21$
    ...                ...

    Comparing lifetime distributions

    I Ex: compare lifetime distributions of two populations following different drug treatments

    I Use statistical properties of KM estimates for comparison

    I Greenwood's formula for the MLE $\hat F$:

    $$\mathrm{Var}\left[\hat F(t)\right] \approx \left(1 - \hat F(t)\right)^2 \sum_{t_j \le t} \frac{d_j}{n_j (n_j - d_j)}$$

    I Accurate if there is a large number of uncensored data (20+) and when $S(t)$ is not close to 0 or 1; otherwise estimates may fall beyond 0 or 1

    I This variance estimate can be used to construct CIs
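
As a sketch of how Greenwood's formula might be applied, the code below evaluates it on the lab rat data and builds an interval with a normal approximation. The 95% normal-approximation interval is an assumption made here for illustration, and with only 6 deaths the accuracy condition above is not met, so the numbers are illustrative only.

```python
import math

t_j = [3, 4, 11, 21, 26]
d_j = [1, 1, 2, 1, 1]
n_j = [15, 14, 12, 9, 6]

def greenwood(t):
    """Return (F-hat(t), approximate variance) using Greenwood's formula."""
    surv, total = 1.0, 0.0
    for tj, dj, nj in zip(t_j, d_j, n_j):
        if tj > t:
            break
        surv *= (nj - dj) / nj               # running KM survival estimate
        total += dj / (nj * (nj - dj))       # Greenwood sum over t_j <= t
    return 1.0 - surv, surv ** 2 * total

F_hat, var = greenwood(11)
half_width = 1.96 * math.sqrt(var)           # assumed normal approximation
interval = (F_hat - half_width, F_hat + half_width)
```

Here the point estimate at day 11 is 5/18, and the wide interval reflects the small sample.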

    Points to note on the KM estimator

    I KM estimator is based on non-informative censoring

    I Value of estimator not well defined if last data point is censored

    I With no censoring, KM is the same as empirical SDF

    I KM is implemented in most statistical packages, including R

    I Can also be derived from the theory of counting processes

    Pointwise confidence intervals for KM estimator

    [S(x)1/, S(x)

    ], where = exp