st3054_slides.pdf
-
Introduction | Survival models | Lifetime distribution functions | Cox regression
ST3054/ST6004 - Survival Analysis
Eric Wolsztynski
Department of Statistics
School of Mathematical Sciences
University College Cork, Ireland
2014-2015, Version 1.0
-
Acknowledgment
These lecture notes follow and adapt a section of the Institute and Faculty of Actuaries CT4 notes, in respect of the exemption programme in place for ST3054 and ST6004. However, this document does not reproduce the CT4 notes fully or exactly, and also presents notions, developments and examples not found in those notes.
These notes also use a large part of former ST3054 notes written by Dr Kingshuk Roy Choudhury and Dr Tony Fitzgerald for a previous course syllabus.
For any comment or query about this document, please [email protected]
ST3054 - ST6004 2
-
Timetable and assessment
- ST3054 is part of ST6004
- ST3054 is taught in Semester 1
  Lectures / Tutorials: Tuesdays 2-3pm in G15, Fridays 11am-12pm in G13
  Tutorials / Practicals: Mondays 10-11am in lab G34
- Continuous assessment: 2 home assignments (10 marks each)
- 90-minute exam in December (80 marks)
-
Module objective and content
- Module Objective: To develop techniques for the analysis of survival data
- Module Content:
  1) Parametric models of survival, use of life tables, types of censoring, hazard functions
  2) Non-parametric estimation of hazard and survival functions, Kaplan-Meier and Nelson-Aalen estimators
  3) Proportional hazards model with covariates
- Use of software
-
Learning Outcomes
- Explain the concept of a survival model and be able to describe the more commonly used mortality / survival functions and apply these to solve practical problems
- Define the distribution and density functions of the random future lifetime, the survival function, the force of mortality and derive relationships between them
- State the Gompertz and Makeham laws of mortality and be able to apply both to solve practical problems
- Describe various ways in which lifetime data might be censored and be able to describe the various problems introduced by censoring
-
Learning Outcomes
- Describe the Kaplan-Meier and Nelson-Aalen estimates of the survival function in the presence of censoring, explain how they arise as maximum likelihood estimates, compute them from typical data and estimate their variance
- Describe the Cox model for proportional hazards, derive the partial likelihood estimate in the absence of ties, and state its asymptotic distribution
- Interpret the effect of covariates on the hazard of a population at risk in the Cox proportional hazards model
- Explain and apply the concept of proportional hazards model selection using likelihood ratio tests
-
Related material
- Pre-requisite: ST2053, ST2054
- Co-requisite: ST3053
- Textbook: CT4 Notes from the Institute and Faculty of Actuaries
  - Contact Damian or Linda
  - Before the end of September
- ST3054 syllabus also connects with ST3074
- ST3054 used to include IFA CT4 content:
  Ch4 The two-state Markov model
  Ch10 The Binomial and Poisson models
-
Outline
I Introduction
II Survival models (Ch7 of CT4 notes 2013)
III Lifetime distribution functions (Ch8 of CT4)
IV The Cox regression model (Ch9 of CT4)
-
Introduction
-
Demography & public health
- How long will we live?
- Do men live longer than women?
- How do lifestyle factors affect your lifespan?
- How long will our children live?
- Which people have the longest lifespan?
- What implications does an increasing lifespan have?
-
Life expectancy in Ireland
http://www.cso.ie/Quicktables/GetQuickTables.aspx?FileName=VSA30.asp&TableName=Life+Expectancy&StatisticalProduct=DB_VS
(CSO QuickTables - VSA30 - Life Expectancy)

At age   0     10    20    35    55    65    75
1926     57.4  55.2  46.4  34.4  19.1  12.8  7.7
2006     76.8  67.2  57.5  43.3  24.8  16.6  9.8
Table: Life expectancy (Males)

At age   0     10    20    35    55    65    75
1926     57.9  54.9  46.4  34.7  19.6  13.4  8.4
2006     81.6  72.0  62.1  47.4  28.5  19.8  12.1
Table: Life expectancy (Females)
-
Life expectancy worldwide
https://www.cia.gov/library/publications/the-world-factbook/rankorder/2102rank.html
(CIA World Factbook: Country Comparison :: Life expectancy at birth)

Rank  Country            L.E.
1     Monaco             89.68
2     Macau              84.43
3     Japan              83.91
4     Singapore          83.75
5     San Marino         83.07
6     Andorra            82.50
7     Guernsey           82.24
8     Hong Kong          82.12
9     Australia          81.90
10    Italy              81.86

Rank  Country            L.E.
212   Mozambique         52.02
213   Lesotho            51.86
214   Zimbabwe           51.82
215   Somalia            50.80
216   Central Afr. Rep.  50.48
217   Afghanistan        49.72
218   Swaziland          49.42
219   South Africa       49.41
220   Guinea-Bissau      49.11
221   Chad               48.69
-
Lifestyle factors
-
Insurance & pension
- Insurance: pay a lump sum on death
- Pension: pay an annuity (fixed amount) till death
- Times (& cost) of payment are dependent on human lifetimes
- Calculation of expected cashflow depends on the distribution of human lifetime
- The study of the distribution of lifetimes is called survival analysis
-
Machine reliability
- Air conditioner (AC), working at high temperature
- Begins working at time t = 0
- S(t) = P(AC functioning at future time t)
- S(t) = survival probability function
- T = future lifetime (failure time)
- $S(t) = P(T > t) = 1 - F_T(t)$
- Machines are not humans: different models are needed
-
Biostatistics (clinical studies)
- Is a new drug for cancer more effective?
- t = 0 is date of diagnosis / injection
- S(t) = P(Alive at future time t)
- S(t) can be used to judge efficacy of new treatments, indicators
- Regression with S(t) as response
-
Statistics in survival analysis
- Estimating the survival function
- Modelling dependence of the survival function on covariates
- Predicting survival
- Estimating significance / standard errors
-
Types of data
- Actuarial and demographic studies deal with large samples
- Clinical studies deal with smaller samples
- Actuarial and demographic studies are generally cross-sectional (or transversal) studies:
  - observation of a population/sample at one point in time
  - aim to provide data on the whole population
- Clinical studies are generally longitudinal:
  - observations are repeated on the same individuals over periods of time
-
Aggregate model
- Consider S(t) where t = 0 denotes time of birth
- X (age at death) ?= T (time of death)
- x = attained age
- Survival function S(x) = S(t)
- When survival is not dependent on age, use S(t)
- If age is important, use $S_x(t)$
-
Select model
- Person opting for insurance, age x
- Want to find S(t)
- $S_x(10)$ is different if x = 25 than if x = 55
- Survival function is really S(t, x)
- x is a concomitant variable, or covariate
- Other variables also affect S(t)
- Study the effect of other variables on S(t)
- Actuarial method: study $S_x(t)$
-
Simple survival model | Life table functions | Expected future lifetime | Some important formulae | Simple parametric survival models
Section I
Survival models
-
I.1 Simple survival models
-
A simple survival model
- We consider a first model of random lifetimes
- The future lifetime of an individual is treated as a continuous random variable
- This model already provides a set of fundamental tools for the analysis of human mortality
-
Future lifetime
- The lifetime of a person (or life) is not known in advance
- Lifetimes (random variables) range from 0 to over 100 years
- Let $\omega$ denote the limiting age (maximum age)
Assumption: the future lifetime of a new-born person, denoted $T$, is a random variable continuously distributed on an interval $[0, \omega]$ where $0 < \omega < \infty$
-
Future lifetime
Definition:
$F(t) = P(T \le t)$ is the distribution function of $T$
$S(t) = P(T > t) = 1 - F(t)$ is the survival function of $T$
- $S(t)$ is the probability of a new-born surviving to age t
- Let $T_x$ be the future lifetime after age x, of a life who survives to age x, with $0 \le x \le \omega$ and $T_0 = T$
Definition ($0 \le x \le \omega$):
$F_x(t) = P(T_x \le t)$ is the distribution function of $T_x$
$S_x(t) = P(T_x > t) = 1 - F_x(t)$ is the survival function of $T_x$
-
Future lifetime
Examples:
- $F_{30}(50)$ denotes the probability that a 30-year old dies before his/her 80th birthday
- $S_{25}(32)$ represents the probability that a 25-year old survives at least another 32 years
For consistency with $T$, the distribution function of the random variable $T_x$ must satisfy the following:
$$F_x(t) = P(T_x \le t) = P(T \le x + t \mid T > x) = \frac{F(x + t) - F(x)}{S(x)}$$
-
Probabilities of death and survival
Actuarial notation for death and survival probabilities:
${}_tq_x$: probability that someone aged x dies within t years
$q_x$: probability that someone aged x dies within 1 year
${}_tp_x$: probability that someone aged x is still alive after t years
$p_x$: probability that someone aged x is still alive after 1 year
In particular we have
$${}_tq_x = F_x(t)$$
$${}_tp_x = 1 - {}_tq_x = S_x(t)$$
-
Survival probabilities: example 1
Age x  l_x    d_x    p_x      q_x
90     9,253  2,035  0.78006  0.21994
91     7,218  1,711  0.76297  0.23703
92     5,507  1,403  0.74515  0.25485
93     4,104  1,122  0.72659  0.27341
94     2,982  873    0.70730  0.29270
95     2,109  660    0.68728  0.31272
96     1,449  483    0.66652  0.33348
97     966    343    0.64503  0.35497
98     623    235    0.62281  0.37719
99     388    155    0.59985  0.40015
Table: Irish Life Table No. 14 2001-2003 (Males)
Probability that a 90 year old man survives to 95, i.e. ${}_5p_{90}$?
-
Survival probabilities: example 1
For a 90 year old man to survive to 95 he must

Event                  Probability
survive from 90 to 91  $p_{90}$ = 0.78006
survive from 91 to 92  $p_{91}$ = 0.76297
survive from 92 to 93  $p_{92}$ = 0.74515
survive from 93 to 94  $p_{93}$ = 0.72659
survive from 94 to 95  $p_{94}$ = 0.70730

Thus
$${}_5p_{90} = p_{90}\, p_{91}\, p_{92}\, p_{93}\, p_{94} \approx 0.2279$$
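As an illustration, the product of one-year survival probabilities can be checked numerically. The sketch below uses the values from Irish Life Table No. 14 quoted earlier; the helper name `t_p_x` is an illustrative choice, not part of the notes. It also compares the product with the ratio of tabulated survivors at ages 95 and 90, which should agree up to the rounding of the tabulated probabilities.

```python
from math import prod

# One-year survival probabilities p_x and survivors l_x,
# copied from Irish Life Table No. 14 (Males).
p = {90: 0.78006, 91: 0.76297, 92: 0.74515, 93: 0.72659, 94: 0.70730}
l = {90: 9253, 95: 2109}

def t_p_x(p_by_age, x, t):
    """t-year survival probability from age x, as a product of one-year p's."""
    return prod(p_by_age[age] for age in range(x, x + t))

five_p_90 = t_p_x(p, 90, 5)
ratio = l[95] / l[90]  # same probability computed directly from l_x
print(round(five_p_90, 4), round(ratio, 4))
```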
-
Factoring survival probabilities
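${}_5p_{90}$ = P(90 year old man will survive to 95)
$$
\begin{aligned}
{}_5p_{90} &= p_{90}\, p_{91}\, p_{92}\, p_{93}\, p_{94} \\
&= (p_{90}\, p_{91}\, p_{92})\,(p_{93}\, p_{94}) = {}_3p_{90}\; {}_2p_{93} \\
&= (p_{90}\, p_{91})\,(p_{92}\, p_{93}\, p_{94}) = {}_2p_{90}\; {}_3p_{92}
\end{aligned}
$$
In general,
$${}_{s+t}p_x = {}_sp_x\; {}_tp_{x+s} = {}_tp_x\; {}_sp_{x+t}$$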
-
Survival probabilities: example 2
Question: which is bigger, ${}_5p_{34}$ or ${}_7p_{33}$?
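One possible line of reasoning (a sketch, not taken from the original slides) applies the factoring rule twice, splitting the seven years from age 33 into the year 33 to 34, the five years 34 to 39 and the year 39 to 40. Since each factor is a probability, the longer interval cannot have the larger survival probability:

```latex
{}_7p_{33} \;=\; p_{33}\,\cdot\,{}_5p_{34}\,\cdot\,p_{39} \;\le\; {}_5p_{34}
```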
-
Relating conditional and aggregate survival probabilities
- Person opting for insurance, age x
- Survival function is really S(t, x)
- Actuarial method: study $S_x(t) = S(t, x)$
- $S_x(t)$ is the (select) survival function for the r.v. $T_x$
- $T_x$ is the future (select) lifetime after age x
- $T = T_0$ is called the aggregate lifetime
-
Relating conditional and aggregate survival probabilities
$S_x(t)$ = probability someone aged x survives for t years or more
= probability someone survives to age x + t given that they have already survived to age x
$$S_x(t) = P(T_x > t) = P(T > x + t \mid T > x) = \frac{P(T > x + t \text{ and } T > x)}{P(T > x)} = \frac{S(x + t)}{S(x)}$$
Equivalently,
$${}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0}$$
-
Relating conditional and aggregate survival probabilities
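From this relationship
$${}_tp_x = \frac{{}_{x+t}p_0}{{}_xp_0}$$
we may thus also derive
$${}_{s+t}p_x = \frac{{}_{x+s+t}p_0}{{}_xp_0} = \frac{{}_{x+s}p_0}{{}_xp_0} \times \frac{{}_{x+s+t}p_0}{{}_{x+s}p_0} = {}_sp_x\; {}_tp_{x+s}$$
Similarly,
$${}_{s+t}p_x = {}_tp_x\; {}_sp_{x+t}$$
(i.e. the result seen previously on factoring survival probabilities)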
-
The distribution of mortality (age at death)
SDF: $S(x) = 1 - F(x) = P(T > x)$
PDF: $f(x) = dF(x)/dx = -dS(x)/dx$ (unconditional)
$$f(x) = \lim_{h \to 0^+} \frac{1}{h}\left[F(x + h) - F(x)\right] = \lim_{h \to 0^+} \frac{1}{h} P(x < T \le x + h)$$
Figure: Distribution of the random variable T: number of deaths vs age
-
The force of mortality $\mu_x$
Definition: the force of mortality $\mu_x$ at age x ($0 \le x \le \omega$) is
$$\mu_x = \lim_{h \to 0^+} \frac{1}{h} P(T \le x + h \mid T > x)$$
- This limit is always assumed to exist
- $\mu_x$ is an instantaneous measure of mortality at age x, or the conditional instantaneous failure rate given survival to time x
- It is the continuous equivalent of $q_x$
- Statisticians call it the hazard rate function
-
The force of mortality $\mu_x$
Intuitively, the force of mortality may be seen as the probability of instant death for an individual aged x.
The probability $P(T \le x + h \mid T > x)$ is $F_x(h) = {}_hq_x$. For small h, we can ignore the limit and write
$${}_hq_x \simeq h\, \mu_x$$
This means that the probability of death in a short time h after age x is roughly proportional to h. Moreover, the constant of proportionality for this relationship is $\mu_x$.
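To see how quickly this approximation improves as h shrinks, here is a minimal sketch. It assumes, for illustration only, a force of mortality that is constant at a hypothetical value `mu`, in which case the exact probability of death within h years is 1 - exp(-mu h); the function name is also an illustrative choice.

```python
import math

def h_q_x_constant_force(mu, h):
    """Exact probability of death within h years when the force of
    mortality is constant and equal to mu (illustrative assumption)."""
    return 1.0 - math.exp(-mu * h)

mu = 0.02  # hypothetical force of mortality, per year
for h in (1.0, 1 / 12, 1 / 365):
    exact = h_q_x_constant_force(mu, h)
    approx = h * mu  # the small-h approximation
    rel_err = (approx - exact) / exact
    print(f"h={h:.5f}  exact={exact:.8f}  approx={approx:.8f}  rel.err={rel_err:.2e}")
```

The relative error shrinks roughly in proportion to h, which is why the approximation is only used over short intervals.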
-
The force of mortality $\mu_x$
The expected number of deaths in a very short time interval h in a very large population of n individuals aged exactly x is therefore $n\, \mu_x\, h$.
Example: estimate $\mu_{50}$ from a very large population of 50-year olds by counting how many die within 1 hour. The annual rate of mortality would then be approximated by the proportion of the group that had died multiplied by $24 \times 365$.
This would not yield an exact value for $\mu_{50}$ due to (i) statistical fluctuations, (ii) rounding errors, (iii) leap years being ignored, (iv) the 1-hour period still being very large for an instantaneous measure.
-
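The force of mortality $\mu_x$
There is a close connection to the distribution of mortality:
$$\mu_x = \lim_{h \to 0^+} \frac{P(x < T \le x + h)}{h} \times \frac{1}{P(T > x)}$$
thus the 1-1 relationship between $\mu_x$ and $f(x)$
$$\mu_x = \frac{f(x)}{S(x)} = \frac{-dS(x)/dx}{S(x)}$$
Inversion: since $\mu_x = -\frac{d}{dx} \log S(x)$, we have
$$S(x) = \exp\left[-\int_0^x \mu_s\, ds\right] = \exp\left[-\Lambda(x)\right]$$
where $\Lambda(x) = \int_0^x \mu_s\, ds$ denotes the cumulative hazard.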
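The inversion formula can be checked numerically. The sketch below integrates a hazard with the trapezoidal rule and compares the result with a closed form. It assumes a Gompertz hazard of the form B c^x with hypothetical parameter values chosen only for illustration; the function names are not from the notes.

```python
import math

def survival_from_hazard(mu, x, steps=10_000):
    """S(x) = exp(-integral_0^x mu(s) ds), using the trapezoidal rule."""
    if x == 0:
        return 1.0
    h = x / steps
    total = 0.5 * (mu(0.0) + mu(x))
    for i in range(1, steps):
        total += mu(i * h)
    return math.exp(-h * total)

B, c = 5e-5, 1.1  # hypothetical Gompertz parameters

def gompertz(s):
    """Gompertz force of mortality B * c**s."""
    return B * c ** s

def gompertz_survival(x):
    """Closed form: the integral of B * c**s over [0, x] is B (c**x - 1) / log(c)."""
    return math.exp(-B * (c ** x - 1.0) / math.log(c))

for age in (40, 60, 80):
    print(age, survival_from_hazard(gompertz, age), gompertz_survival(age))
```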
-
The force of mortality $\mu_x$
Figure: UK mortality 2003-2005 (Males, Office for National Statistics): hazard estimates $\mu_x$ vs age (years)
-
The force of mortality $\mu_x$
Figure: UK mortality 2003-2005 (Males, ONS): hazard estimates $\mu_x = -\frac{d \log S(x)}{dx}$ (log scale, increasing) and survival estimates $S(x) = \exp\left[-\int_0^x \mu_s\, ds\right]$ (decreasing) vs age (years)
-
The force of mortality $\mu_x$
Figure: UK mortality 2003-2005 (Males, ONS): hazard estimates $\mu_x$ (log scale, increasing) and density $f(x)$ (unimodal) vs age (years)
-
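The force of mortality $\mu_{x+t}$
Definition: for $x \ge 0$ and $t > 0$, we could define the force of mortality $\mu_{x+t}$ in two ways:
$$(1)\quad \mu_{x+t} = \lim_{h \to 0^+} \frac{1}{h} P(T \le x + t + h \mid T > x + t)$$
$$(2)\quad \mu_{x+t} = \lim_{h \to 0^+} \frac{1}{h} P(T_x \le t + h \mid T_x > t)$$
We often use $\mu_{x+t}$ for a fixed age x and $0 \le t < \omega - x$.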
-
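Density of select model (pdf of $T_x$)
By definition, the distribution function of $T_x$ is $F_x(t)$. Thus its pdf:
$$
\begin{aligned}
f_x(t) &= \frac{d}{dt} F_x(t) = \frac{d}{dt} P(T_x \le t) = \frac{1}{S(x)} \left[-\frac{d}{dt} S(x + t)\right] \\
&= \lim_{h \to 0^+} \frac{1}{h} \left(P(T_x \le t + h) - P(T_x \le t)\right) \\
&= \lim_{h \to 0^+} \frac{P(T \le x + t + h \mid T > x) - P(T \le x + t \mid T > x)}{h} \\
&= \lim_{h \to 0^+} \left\{ \frac{P(T \le x + t + h) - P(T \le x)}{S(x)\, h} - \frac{P(T \le x + t) - P(T \le x)}{S(x)\, h} \right\} \\
&= \lim_{h \to 0^+} \frac{P(T \le x + t + h) - P(T \le x + t)}{S(x)\, h}
\end{aligned}
$$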
-
Density of select model (pdf of Tx)
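Now by multiplying by $S(x + t)/S(x + t)$ we obtain
$$
\begin{aligned}
f_x(t) &= \frac{S(x + t)}{S(x + t)} \times \lim_{h \to 0^+} \frac{P(T \le x + t + h) - P(T \le x + t)}{S(x)\, h} \\
&= S_x(t) \times \lim_{h \to 0^+} \frac{1}{h} P(T \le x + t + h \mid T > x + t) \\
&= S_x(t)\, \mu_{x+t}
\end{aligned}
$$
In actuarial notation, for a fixed age $0 \le x \le \omega$, this is equivalent to the following very important relationship
$$f_x(t) = {}_tp_x\, \mu_{x+t} \qquad (0 \le t < \omega - x)$$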
-
Density of select model (pdf of Tx)
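Alternatively, we may observe that
$$f_x(t) = -\frac{d}{dt} S_x(t) = -\frac{d}{dt} \frac{S(x + t)}{S(x)} = \frac{1}{S(x)} \left[-\frac{d}{dt} S(x + t)\right]$$
and since
$$-\frac{d}{dt} S(x + t) = -\frac{d}{dt} \int_{x+t}^{\omega} f(u)\, du = f(x + t)$$
we have
$$f_x(t) = \frac{f(x + t)}{S(x)} = \frac{S(x + t)}{S(x)} \times \frac{f(x + t)}{S(x + t)} = {}_tp_x\, \mu_{x+t}$$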
-
Summary
$T_x$: random future lifetime after age x (continuous r.v.)
${}_tq_x$: probability that someone aged x dies within t years
$q_x$: probability that someone aged x dies within 1 year
${}_tp_x$: probability that someone aged x is still alive after t years
$p_x$: probability that someone aged x is still alive after 1 year

Probabilistic notation                          | Actuarial notation
$\mu_x h \simeq P(T \le x + h \mid T > x)$      | $\mu_x h \simeq {}_hq_x$
$\mu_x = f(x)/S(x)$ or $f(x) = \mu_x S(x)$      | $f_x(t) = \mu_{x+t}\, {}_tp_x$

${}_{s+t}p_x = {}_sp_x\, {}_tp_{x+s}$ (for all $s, t > 0$)
-
I.2 Life table functions
-
Life table functions
A life table provides the expected number of survivors to each age in a hypothetical group of lives. We use:
$l_x$ = expected number of lives at age x
$d_x$ = expected number of deaths between ages x and x + 1
$$d_x = l_x - l_{x+1}$$
$$p_x = \frac{l_{x+1}}{l_x}$$
$$q_x = 1 - p_x = 1 - \frac{l_{x+1}}{l_x} = \frac{l_x - l_{x+1}}{l_x} = \frac{d_x}{l_x}$$
$${}_tp_x = \frac{l_{x+t}}{l_x}$$
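As a small sketch of these definitions, the columns of a life table can be rebuilt from the survivors column alone. The values below are copied from Irish Life Table No. 14 (Males), ages 90 to 95; the helper name `life_table_row` is illustrative. Probabilities recomputed from rounded survivor counts may differ from the published ones in the last decimal place.

```python
# l_x values copied from Irish Life Table No. 14 (Males), ages 90-95.
l = {90: 9253, 91: 7218, 92: 5507, 93: 4104, 94: 2982, 95: 2109}

def life_table_row(l, x):
    """Return (d_x, p_x, q_x) computed from l_x and l_{x+1}."""
    d_x = l[x] - l[x + 1]   # expected deaths between ages x and x + 1
    p_x = l[x + 1] / l[x]   # one-year survival probability
    q_x = d_x / l[x]        # one-year death probability
    return d_x, p_x, q_x

for x in range(90, 95):
    d_x, p_x, q_x = life_table_row(l, x)
    print(x, d_x, round(p_x, 5), round(q_x, 5))
```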
-
Life table functions
In life tables, values are tabulated only for integer ages. When working with continuous age variables, assumptions on the variation of mortality between integer ages are required. Examples:
- deaths occur uniformly between integer ages
- the force of mortality is constant between integer ages
- the Balducci assumption holds
For integer ages x and $0 \le t \le 1$, the Balducci assumption states that
$${}_{1-t}q_{x+t} = (1 - t)\, q_x$$
This assumption is useful particularly in Cox regression analysis.
-
Life table: example including life expectancy
...
-
Life table: example 1
x    S(x)     l_x
0    1.0000   100,000
1    0.97408  97,408
2    0.97259  97,259
3    0.97160  97,160
4    0.97082  97,082
...  ...      ...
109  0.00001  1
110  0.00000  0

- Traditional description of the survival function
- Start e.g. with radix $l_\alpha = 100{,}000$, where $\alpha$ is the lowest age in the table (here $\alpha = 0$)
- For subsequent ages:
$$l_x = {}_{x-\alpha}p_\alpha\; l_\alpha$$
- Survival probabilities:
$${}_tp_x = \frac{l_{x+t}}{l_x}$$
-
Life table: example 2
Estimate $l_{58.25}$ assuming a uniform distribution of deaths between exact ages 58 and 59, from the English Life Table 15 (Males):

Age x  l_x
58     88,792
59     87,805

There are 88,792 - 87,805 = 987 deaths expected between the ages of 58 and 59. Assuming deaths within this interval are uniformly distributed, the number of deaths expected between the ages of 58 and 58.25 is
987/4 = 246.75
So the expected number of lives at age 58.25 is
88,792 - 246.75 = 88,545.25
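The same calculation can be written as a linear interpolation between the two tabulated values. This is a minimal sketch; the function name `l_udd` is illustrative. It also checks numerically that the probability of death within a quarter of a year is a quarter of the one-year probability, as the uniform distribution of deaths assumption implies.

```python
def l_udd(l_x, l_x1, t):
    """Linear interpolation l_{x+t} = (1 - t) l_x + t l_{x+1}, for 0 <= t <= 1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return (1.0 - t) * l_x + t * l_x1

l_58, l_59 = 88_792, 87_805  # English Life Table 15 (Males)

l_58_25 = l_udd(l_58, l_59, 0.25)
q_58 = (l_58 - l_59) / l_58          # one-year death probability
quarter_q_58 = 1.0 - l_58_25 / l_58  # death probability over a quarter year
print(l_58_25, quarter_q_58, 0.25 * q_58)
```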
-
Life tables: uniform distribution of deaths assumption
Note: Under the assumption of a uniform distribution of deaths, the surviving population at the start of each quarter is decreasing. This assumption therefore implies that the force of mortality is increasing over the year of age (58, 59).
Result: If deaths are assumed uniformly distributed between the ages of x and x + 1, it follows that
$${}_tq_x = t\, q_x$$
for $0 \le t \le 1$.
-
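Life tables: uniform distribution of deaths assumption
Proof: Under the assumption we have by linear interpolation
$$l_{x+t} = (1 - t)\, l_x + t\, l_{x+1}$$
for $0 \le t \le 1$. So
$$
\begin{aligned}
{}_tq_x &= 1 - \frac{l_{x+t}}{l_x} \\
&= 1 - \frac{(1 - t)\, l_x + t\, l_{x+1}}{l_x} \\
&= \frac{t\, l_x - t\, l_{x+1}}{l_x} \\
&= t\, (1 - p_x) \\
&= t\, q_x
\end{aligned}
$$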
-
Initial and central rates of mortality
Since $q_x$ is the probability that a life alive at age x (the initial time) dies before age x + 1, it is called an initial rate of mortality.
Definition: The central rate of mortality $m_x$ is defined as
$$m_x = \frac{d_x}{\int_0^1 l_{x+t}\, dt} = \frac{q_x}{\int_0^1 {}_tp_x\, dt}$$
This alternative represents the probability that a life alive between ages x and x + 1 dies. The denominator $\int_0^1 {}_tp_x\, dt$ represents the expected amount of time spent alive between ages x and x + 1 by a life alive at age x.
-
Initial and central rates of mortality
Note: $m_x$ measures the rate of mortality over the whole year from exact age x to exact age x + 1. By contrast, $\mu_x$ measures the instantaneous rate of mortality at exact age x.
$m_x$ is useful for projecting numbers of deaths, given the number of lives alive in age groups. It constitutes one of the basic components of a population projection.
In practice, the age groups used in population projection are often broader than exactly one year, in which case the definition of $m_x$ must be adjusted accordingly.
-
Initial and central rates of mortality
If $\mu_{x+t}$ is a constant, $\mu$, between ages x and x + 1, then
$$m_x = \frac{q_x}{\int_0^1 {}_tp_x\, dt} = \frac{\int_0^1 {}_tp_x\, \mu\, dt}{\int_0^1 {}_tp_x\, dt} = \mu$$
This quantity is close to an occurrence-exposure rate statistic
(Number of deaths) / (Total time spent alive and at risk)
which can be used to estimate the force of mortality $\mu_x$.
Note from this that $m_x$ can never be less than $q_x$.
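A quick numerical sketch of this result, assuming a constant force of mortality over the year of age so that the t-year survival probability is exp(-mu t); the values of `mu` are hypothetical and the function name is illustrative. The central rate recovers mu, and it is never below the initial rate.

```python
import math

def rates_constant_force(mu):
    """Return (q_x, m_x) when the force of mortality equals mu on [x, x+1]."""
    q_x = 1.0 - math.exp(-mu)               # since t_p_x = exp(-mu * t)
    exposure = (1.0 - math.exp(-mu)) / mu   # integral of exp(-mu * t) over [0, 1]
    m_x = q_x / exposure
    return q_x, m_x

for mu in (0.01, 0.1, 0.5):  # hypothetical forces of mortality
    q_x, m_x = rates_constant_force(mu)
    print(mu, round(q_x, 6), round(m_x, 6))
```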
-
I.3 Expected future lifetime
-
Complete expectation of life
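Definition: The expected future lifetime after age x (or expectation of life at age x) is defined as $E[T_x]$ and is denoted $\mathring{e}_x$:
$$
\begin{aligned}
\mathring{e}_x &= \int_0^{\omega - x} t\; {}_tp_x\, \mu_{x+t}\, dt \\
&= \int_0^{\omega - x} t \left(-\frac{\partial}{\partial t}\, {}_tp_x\right) dt \\
&= -\Big[t\; {}_tp_x\Big]_0^{\omega - x} + \int_0^{\omega - x} {}_tp_x\, dt \quad \text{(integration by parts)} \\
&= \int_0^{\omega - x} {}_tp_x\, dt
\end{aligned}
$$
(using ${}_tp_x\, \mu_{x+t} = f_x(t) = \partial\, {}_tq_x / \partial t = -\partial\, {}_tp_x / \partial t$ and $\big[t\; {}_tp_x\big]_0^{\omega - x} = 0$)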
-
Curtate future lifetime
Definition: The curtate future lifetime of a life aged x is $K_x = [T_x]$, where the square brackets [.] denote the integer part.
$K_x$ is a discrete r.v. taking values in $\{0, 1, 2, \dots, [\omega - x]\}$, with probability function
$$
\begin{aligned}
P(K_x = k) &= P(k \le T_x < k + 1) \\
&= P(k < T_x \le k + 1) \quad \text{(assuming } F_x(t) \text{ is continuous in } t\text{)} \\
&= {}_kp_x\; q_{x+k}
\end{aligned}
$$
$P(K_x = k)$ is often denoted ${}_{k|}q_x$ (k deferred $q_x$), as we consider here deferring the event of death until the year that begins k years from now.
-
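Curtate expectation of life
Definition: The curtate expectation of life, denoted $e_x$, is
$$e_x = E[K_x]$$
We have
$$
\begin{aligned}
e_x &= \sum_{k=0}^{[\omega - x]} k\; {}_kp_x\; q_{x+k} \\
&= {}_1p_x\, q_{x+1} \\
&\quad + {}_2p_x\, q_{x+2} + {}_2p_x\, q_{x+2} \\
&\quad + {}_3p_x\, q_{x+3} + {}_3p_x\, q_{x+3} + {}_3p_x\, q_{x+3} \\
&\quad + \dots \\
&= \sum_{k=1}^{[\omega - x]} \sum_{j=k}^{[\omega - x]} {}_jp_x\; q_{x+j} = \sum_{k=1}^{[\omega - x]} {}_kp_x
\end{aligned}
$$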
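The equality of the two expressions for the curtate expectation can be verified on a toy example. The survivors column below is entirely hypothetical and chosen so that everyone has died by the last age; the helper names are illustrative.

```python
# Hypothetical survivors l_x, l_{x+1}, ..., reaching 0 at the limiting age.
l = [1000, 800, 500, 200, 0]

def k_p_x(l, k):
    """Probability of surviving k more years from the first age in the table."""
    return l[k] / l[0]

def q_at(l, k):
    """One-year death probability at age x + k: d_{x+k} / l_{x+k}."""
    return (l[k] - l[k + 1]) / l[k]

n = len(l) - 1  # number of years until nobody survives

# Definition: sum over k of k * P(K_x = k), with P(K_x = k) = k_p_x * q_{x+k}.
e_from_definition = sum(k * k_p_x(l, k) * q_at(l, k) for k in range(n))

# Simplified form: sum of survival probabilities k_p_x for k = 1, ..., n.
e_from_survival = sum(k_p_x(l, k) for k in range(1, n + 1))

print(e_from_definition, e_from_survival)
```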
-
Curtate expectation of life
Additional years from age x | Probability
1                           | ${}_1p_x\, q_{x+1}$ (survives one year and then dies in the next year)
2                           | ${}_2p_x\, q_{x+2}$
...                         | ...
k                           | ${}_kp_x\, q_{x+k}$
...                         | ...
Sum                         | $\sum_k {}_kp_x\, q_{x+k}$ (k = 1 to $\omega - x$)
-
Relationship between $\mathring{e}_x$ and $e_x$
Considering the two formulae
$$\mathring{e}_x = \int_0^{\omega - x} {}_tp_x\, dt \quad \text{and} \quad e_x = \sum_{k=1}^{[\omega - x]} {}_kp_x$$
the complete and curtate expectations of life are related by the approximate equation
$$\mathring{e}_x \simeq e_x + \frac{1}{2}$$
- Define $J_x = T_x - K_x$ to be the random lifetime after the highest integer age to which a life aged x survives. Approximately, $E[J_x] = 1/2$ (assuming deaths occur uniformly within each year of age), and since $E[T_x] = E[K_x] + E[J_x]$, we have $\mathring{e}_x \simeq e_x + 1/2$.
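To illustrate the half-year gap, the sketch below assumes a future lifetime that is uniformly distributed up to the limiting age, with hypothetical values for the limiting age and current age. In this special case deaths are exactly uniform within each year, so the difference between the complete and curtate expectations is exactly one half.

```python
omega, x = 100, 60  # hypothetical limiting age and current age
n = omega - x

def t_p_x(t):
    """Survival probability when the future lifetime is uniform on [0, omega - x]."""
    return 1.0 - t / n

# Complete expectation: integral of t_p_x over [0, n]
# (trapezoidal rule, exact here because the integrand is linear).
steps = 1000
h = n / steps
e_complete = h * (0.5 * t_p_x(0) + sum(t_p_x(i * h) for i in range(1, steps)) + 0.5 * t_p_x(n))

# Curtate expectation: sum of k_p_x for k = 1, ..., n.
e_curtate = sum(t_p_x(k) for k in range(1, n + 1))

print(e_complete, e_curtate, e_complete - e_curtate)
```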
-
Life table: example 3
Figure: Irish Life Table No. 13, 1995-97 (Males)
-
Life table: example 3
Why is $e_{81} \ne e_{80} - 1$? How would you write $e_{81}$ in terms of $e_{80}$ and $p_x$?
Figure: Irish Life Table No. 13, 1995-97 (Males), continued
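One way to approach this question, sketched here under the assumption that it refers to the curtate expectation of life (for the complete expectation a similar, slightly longer argument applies), starts from the sum of survival probabilities and factors out the first year:

```latex
e_x \;=\; \sum_{k \ge 1} {}_kp_x
    \;=\; p_x \sum_{k \ge 1} {}_{k-1}p_{x+1}
    \;=\; p_x \left(1 + e_{x+1}\right)
\qquad\Longrightarrow\qquad
e_{81} \;=\; \frac{e_{80}}{p_{80}} - 1
```

Since $p_{80} < 1$, a life that has already survived to 81 has an expectation larger than $e_{80} - 1$.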
-
Future lifetimes: varianceThe variances of the complete and curtate future lifetimes are
Var[Tx ] =
x0
t2tpxx+tdt e2x
Var[Kx ] =
[x]k=0
k2kpxqx+kdt e2x
These expressions may be useful to:
I derive the variance of functions based on future lifetimes
I further quantify the likely variation in some index of interest(e.g. the profits from a life insurance policy, or the cost ofproviding a benefit from a pension scheme)
ST3054 - ST6004 67
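As a small illustration of the curtate formulae, the sketch below computes $E[K_x]$ and $\mathrm{Var}[K_x]$ from a made-up list of one-year death probabilities `q`. The values are hypothetical; the final 1.0 simply closes the table at the limiting age.

```python
# Hypothetical one-year death probabilities q_x, q_{x+1}, ... (illustration only).
q = [0.10, 0.20, 0.40, 1.00]

def curtate_moments(q):
    """Return (E[K_x], Var[K_x]) from successive one-year death probabilities."""
    kpx = 1.0      # k_p_x, starting from 0_p_x = 1
    mean = 0.0
    second = 0.0
    for k, qk in enumerate(q):
        prob = kpx * qk          # P(K_x = k) = k_p_x * q_{x+k}
        mean += k * prob
        second += k * k * prob
        kpx *= 1.0 - qk          # (k+1)_p_x = k_p_x * p_{x+k}
    return mean, second - mean ** 2

e_x, var_k = curtate_moments(q)
print(e_x, var_k)
```

The mean returned here also equals the sum of the $_kp_x$ for $k \geq 1$ (0.9 + 0.72 + 0.432), which is the identity used to define $e_x$.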
-
Uses of the expectation of life

The expectation of life is often used as a measure of the standard of living and health care in a given country.

The complete expectation of life is used e.g. for premium calculations, and for longevity studies.

Life expectancy (years)   Countries
35-40                     Angola, Zambia
40-45                     Afghanistan, Malawi
45-50                     Nigeria, Rwanda, South Africa, Zimbabwe
50-55                     Cameroon, Ethiopia, Uganda
55-60                     Bangladesh, Ghana, Haiti, Kenya, Russia
60-65                     Botswana, Burma, Guyana, Pakistan, Yemen
65-70                     Brazil, Guatemala, India
70-75                     Barbados, China, Serbia
75-80                     Australia, Japan, New Zealand, USA, most Western European countries

Average life expectancy at birth for males (2009, CIA World Factbook and IFA 2011)
-
I.4 Some important formulae
-
A formula for ${}_tq_x$
$${}_tq_x = \int_0^t {}_sp_x \, \mu_{x+s} \, ds$$
This follows from the relationship $f_x(t) = {}_tp_x \, \mu_{x+t}$. For each time $s \in [0, t]$, the integrand is the product of
(i) ${}_sp_x$, the probability of surviving to age $x + s$
(ii) $\mu_{x+s} \, ds$, which is approximately equal to ${}_{ds}q_{x+s}$, the probability of dying just after age $x + s$
These probabilities are mutually exclusive and are thus just added up (or, in the limit, integrated). This result allows us to derive an important relationship between ${}_tp_x$ and $\mu_x$.
-
A formula for ${}_tp_x$
$${}_tp_x = \exp\left\{ -\int_0^t \mu_{x+s} \, ds + c \right\}$$
-
A formula for ${}_tp_x$
Proof: This follows from
$$\frac{\partial}{\partial s} {}_sp_x = -\frac{\partial}{\partial s} {}_sq_x = -f_x(s) = -{}_sp_x \, \mu_{x+s}$$
Note that
$$\frac{\partial}{\partial s} \log {}_sp_x = \frac{\frac{\partial}{\partial s} {}_sp_x}{{}_sp_x} = -\mu_{x+s}$$
hence, for some constant of integration $c$ (which is 0),
$$\int_0^t \frac{\partial}{\partial s} \log {}_sp_x \, ds = -\int_0^t \mu_{x+s} \, ds + c$$
Since ${}_0p_x = 1$ we have $\left[ \log {}_sp_x \right]_0^t = \log {}_tp_x$ and
$${}_tp_x = \exp\left\{ -\int_0^t \mu_{x+s} \, ds + c \right\} = \exp\left\{ -\int_0^t \mu_{x+s} \, ds \right\}$$
(since $e^0 = 1$, we use $c = 0$)
-
Summary: integral expressions for ${}_tq_x$ and ${}_tp_x$
$${}_tq_x = \int_0^t {}_sp_x \, \mu_{x+s} \, ds$$
$${}_tp_x = \exp\left\{ -\int_0^t \mu_{x+s} \, ds + c \right\}$$
(yes, it is so important that we repeat it)
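The two integral expressions can be cross-checked numerically, since ${}_tp_x + {}_tq_x$ must equal 1. The sketch below assumes a hypothetical linear force of mortality `mu` (not a model from these notes), takes $c = 0$, and uses a basic trapezoidal rule.

```python
import math

def mu(s):
    """Hypothetical force of mortality mu_{x+s} (illustrative linear form)."""
    return 0.01 + 0.002 * s

def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def tpx(t):
    """t_p_x = exp(-integral_0^t mu_{x+s} ds), i.e. taking c = 0."""
    return math.exp(-integrate(mu, 0.0, t, n=200))

def tqx(t):
    """t_q_x = integral_0^t s_p_x * mu_{x+s} ds."""
    return integrate(lambda s: tpx(s) * mu(s), 0.0, t, n=2_000)

t = 10.0
print(tpx(t), tqx(t), tpx(t) + tqx(t))
```

For this choice of `mu` the integral of the hazard over 10 years is 0.2, so `tpx(10.0)` can also be compared with `exp(-0.2)` directly.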
-
I.5 Simple parametric survival models
-
Parametric models?

Figure: UK mortality 2003-2005 (Males, Office for National Statistics): hazard estimates $\mu_x$ (increasing, log scale) and survival estimates $S(x)$ (decreasing) vs age (years)
- WHAT MODEL?
-
Parametric models?

Figure: UK mortality 2003-2005 (Males, Office for National Statistics): hazard estimates $\mu_x$ (increasing, log scale) and density $f(x)$ (unimodal) vs age (years)
- WHAT MODEL?
-
The exponential model
- Various survival models may be used
- In simpler models, the distribution of future lifetime uses a small number of parameters
- One of the simplest models is the exponential model
- In this model the hazard rate is constant, i.e. $\mu_x = \lambda > 0$, and for $t \geq 0$
$${}_tp_x = S_x(t) = e^{-\int_0^t \lambda \, ds} = e^{-[\lambda s]_0^t} = e^{-\lambda t}$$
and
$${}_tq_x = 1 - {}_tp_x = 1 - e^{-\lambda t}$$
-
The exponential model

We also have
$$f(t) = \lambda e^{-\lambda t}$$
$$E[T] = \frac{1}{\lambda}$$
$$\mathrm{Var}[T] = \frac{1}{\lambda^2}$$
- Not appropriate for human survival over broad ranges
- Can be used over short ranges
- Is used for machine reliability
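The mean and variance of the exponential model can be illustrated by simulation. The sketch below assumes a hypothetical constant hazard `lam = 0.05` and a fixed random seed; it compares sample moments with $1/\lambda$ and $1/\lambda^2$ and is not part of the original notes.

```python
import random
import statistics

lam = 0.05                 # hypothetical constant hazard rate, for illustration
rng = random.Random(2014)  # fixed seed so the run is reproducible

# Draw lifetimes T ~ Exp(lam) and compare sample moments with theory.
sample = [rng.expovariate(lam) for _ in range(200_000)]
sample_mean = statistics.fmean(sample)
sample_var = statistics.variance(sample)

print(sample_mean, 1 / lam)
print(sample_var, 1 / lam**2)
```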
-
The exponential model

Q1 If $\mu_x = 0.001$ (constant) between ages 25 and 30, calculate the probability that a life aged exactly 25 will survive to age 30

Q2 If $\mu_x = 0.02$ at all ages, calculate the age $x$ for which ${}_xp_0 = 0.5$. What does this age represent?

Q3 Given that $e_{50} = 30$ and $\mu_{50+t} = 0.005$ for $0 \leq t \leq 1$, what is the value of $e_{51}$?
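One possible way to work Q1 and Q2 is sketched below, using only ${}_tp_x = e^{-\lambda t}$ for a constant hazard. The function name is illustrative; Q3 is left as stated.

```python
import math

def survival_constant_hazard(lam, t):
    """t_p_x when the force of mortality equals lam throughout [x, x + t]."""
    return math.exp(-lam * t)

# Q1: mu = 0.001 between ages 25 and 30, so 5_p_25 = exp(-0.001 * 5).
q1 = survival_constant_hazard(0.001, 5)

# Q2: solve exp(-0.02 * x) = 0.5 for x, i.e. x = log(2) / 0.02.
q2 = math.log(2) / 0.02

print(q1, q2)
```

In Q2 the age found is the one by which half of all new-born lives are expected to have died, i.e. the median lifetime under this model.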
-
The exponential model

Figure: Exponential models as a function of age (in years), with $S(t) = e^{-\lambda t}$ for $\lambda = 0.10, 0.05, 0.03$. Then $E[T] = 1/\lambda = 10, 20, 33$ resp.
-
The Weibull model
- The Weibull model is a simple extension of the exponential model
- The exponential model is a special case of Weibull ($\gamma = 1$)
- In this model the survival function $S(t)$ is
$$S(t) = e^{-\lambda t^{\gamma}}$$
- Weibull hazard is monotonically increasing (or decreasing):
$$\mu_t = \lambda \gamma t^{\gamma - 1}$$
-
The Weibull model
- Since $\mu_t = -\frac{d}{dt} \log[S(t)]$ we have
$$\mu_t = -\frac{d}{dt}\left[-\lambda t^{\gamma}\right] = \lambda \gamma t^{\gamma - 1}$$
- Weibull is a special case of the generalized Gamma distribution
- Its moments can be calculated relatively easily
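The link between the Weibull survival function and its hazard can be verified numerically. The sketch below assumes the parametrisation $S(t) = e^{-\lambda t^{\gamma}}$ with hypothetical values `lam` and `gamma`, and compares the closed-form hazard with a finite-difference estimate of $-\frac{d}{dt}\log S(t)$.

```python
import math

lam, gamma = 0.01, 1.5  # hypothetical Weibull parameters, for illustration only

def survival(t):
    """Weibull survival function S(t) = exp(-lam * t**gamma)."""
    return math.exp(-lam * t**gamma)

def hazard(t):
    """Closed-form Weibull hazard lam * gamma * t**(gamma - 1)."""
    return lam * gamma * t ** (gamma - 1)

def hazard_numeric(t, h=1e-5):
    """Central-difference approximation of -d/dt log S(t)."""
    return -(math.log(survival(t + h)) - math.log(survival(t - h))) / (2 * h)

for t in (1.0, 5.0, 20.0):
    print(t, hazard(t), hazard_numeric(t))
```

With `gamma > 1` the hazard increases with `t`; choosing `gamma < 1` in the same sketch gives a decreasing hazard, and `gamma = 1` recovers the constant hazard of the exponential model.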
-
The Gompertz distribution (1825)
- Gompertz law of mortality ($t \geq 0$, $B > 0$, $c > 1$):
$$S(t) = e^{\frac{B}{\log(c)}(1 - c^t)}$$
$$\mu_t = B c^t$$
- Gompertz law yields an exponentially increasing hazard rate throughout life
$$\log(\mu_t) = \log(B) + t \log(c)$$
$$\mu_t = B e^{t \log(c)}$$
- Often a reasonable assumption for middle and older ages
- Simple expressions for mean and variance are not available
-
The Gompertz distribution (1825)

Figure: Hazard estimates vs age (years). The hazard function plotted on log-scale here is approximately linear beyond 30 years of age
-
The Makeham distribution (1860)
- Makeham's law of mortality ($t \geq 0$, $B > 0$, $c > 1$, $A > -B$):
$$S(t) = e^{\frac{B}{\log(c)}(1 - c^t) - At}$$
$$\mu_t = A + B c^t$$
- The constant term is sometimes interpreted as an allowance for accidental deaths
- Suggests that part of the hazard rate is age-independent
- Otherwise same as Gompertz law
- Simple expressions for mean and variance are again not available
-
Calculating the parameter values

If a life table is known to follow Gompertz law, the parameters $B$ and $c$ can be determined given the values of $\mu_t$ at any two ages.

In the case of a life table following Makeham's law, the parameters $A$, $B$ and $c$ can be determined given the values of $\mu_t$ at any three ages.

Question: given that $\mu_{50} = 0.017609$ and $\mu_{55} = 0.028359$, calculate $B$ and $c$ for a force of mortality $\mu_t$ known to follow Gompertz law.
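One possible way to work the Gompertz question is sketched below. It relies only on $\mu_t = Bc^t$, so that the ratio of the two given values equals $c^5$; the variable names are illustrative.

```python
mu_50, mu_55 = 0.017609, 0.028359  # values given in the question

# Under Gompertz law mu_t = B * c**t, so mu_55 / mu_50 = c**5.
c = (mu_55 / mu_50) ** (1 / 5)

# With c known, B follows from mu_50 = B * c**50.
B = mu_50 / c**50

print(c, B)
```

The same idea extends to Makeham's law with three ages: differences of the $\mu_t$ values remove $A$, after which $c$ and $B$ are found as above.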
-
Survival probabilities
Survival probabilities ${}_tp_x$ can be found using
$${}_tp_x = \exp\left( -\int_0^t \mu_{x+s} \, ds \right)$$
Gompertz law: for $g = \exp\left( -\frac{B}{\log(c)} \right)$, we have
$${}_tp_x = g^{c^x (c^t - 1)}$$
Makeham's law: for $g = \exp\left( -\frac{B}{\log(c)} \right)$ and $s = \exp(-A)$, we have
$${}_tp_x = s^t \, g^{c^x (c^t - 1)}$$
-
Survival probabilities
Proof ...?
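One way the proof might go is sketched below, assuming the Gompertz form $\mu_t = Bc^t$ and the Makeham form $\mu_t = A + Bc^t$ and integrating the force of mortality from age $x$ to age $x + t$.

```latex
\begin{aligned}
\text{Gompertz:}\quad \int_0^t \mu_{x+s}\,ds &= \int_0^t B c^{x+s}\,ds = \frac{B c^x}{\log(c)}\,(c^t - 1), \\
{}_tp_x &= \exp\left(-\frac{B}{\log(c)}\,c^x (c^t - 1)\right) = g^{c^x (c^t - 1)},
\qquad g = \exp\left(-\frac{B}{\log(c)}\right). \\[6pt]
\text{Makeham:}\quad \int_0^t \mu_{x+s}\,ds &= At + \frac{B c^x}{\log(c)}\,(c^t - 1), \\
{}_tp_x &= e^{-At}\, g^{c^x (c^t - 1)} = s^t\, g^{c^x (c^t - 1)},
\qquad s = \exp(-A).
\end{aligned}
```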
-
Gompertz-Makeham family

The force of mortality can be modeled using one of the Gompertz-Makeham curves. This family of functions is of the form
$$GM(r, s) = \alpha_1 + \alpha_2 t + \dots + \alpha_r t^{r-1} + e^{\alpha_{r+1} + \alpha_{r+2} t + \dots + \alpha_{r+s} t^{s-1}}$$
where the $\alpha_1, \dots, \alpha_{r+s}$ are constants independent of $t$.

This is the most widely used form of the GM family. Another popular form is
$$\mu_x = GM(r, s) = \mathrm{poly}_1(t) + e^{\mathrm{poly}_2(t)}$$
where $t$ is a linear function of $x$ and $\mathrm{poly}_1(t)$ and $\mathrm{poly}_2(t)$ are polynomials of degrees $r$ and $s$ respectively.
-
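Suitable distribution families (summary)
Key constraint for a parametric model: the domain of $f(t)$ must be $\mathbb{R}^+$.

Here is a summary of some suitable families:

$$
\begin{array}{l|lllll}
 & \text{Exp} & \text{Weibull} & \text{Gompertz} & \text{Makeham} & \text{Log-logistic} \\
\hline
f(t) & \lambda e^{-\lambda t} & \lambda\gamma t^{\gamma-1} e^{-\lambda t^{\gamma}} & B c^t e^{\frac{B(1-c^t)}{\log(c)}} & (A + B c^t)\, e^{\frac{B(1-c^t)}{\log(c)} - At} & \frac{\lambda\gamma t^{\gamma-1}}{(1+\lambda t^{\gamma})^2} \\
F(t) & 1 - e^{-\lambda t} & 1 - e^{-\lambda t^{\gamma}} & 1 - e^{\frac{B(1-c^t)}{\log(c)}} & 1 - e^{\frac{B(1-c^t)}{\log(c)} - At} & 1 - \frac{1}{1+\lambda t^{\gamma}} \\
S(t) & e^{-\lambda t} & e^{-\lambda t^{\gamma}} & e^{\frac{B(1-c^t)}{\log(c)}} & e^{\frac{B(1-c^t)}{\log(c)} - At} & \frac{1}{1+\lambda t^{\gamma}} \\
\mu_t & \lambda & \lambda\gamma t^{\gamma-1} & B c^t & A + B c^t & \frac{\lambda\gamma t^{\gamma-1}}{1+\lambda t^{\gamma}}
\end{array}
$$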
-
Exam-style question (1/2)
1. An investigation is undertaken into the mortality of men between exact ages 50 and 55 years. A sample of $n$ men is followed from their 50th birthday until they either die or reach their 55th birthday.

The force of mortality (or hazard rate) is assumed to have the following form
$$\mu_x = \alpha + \beta x$$
where $\alpha$ and $\beta$ are parameters to be estimated and $x$ is measured in years since the 50th birthday.

(a) Derive an expression for the survival function between ages 50 and 55 years
(b) Sketch this on a graph
-
Exam-style question (2/2)

[...] The force of mortality (or hazard rate) is assumed to have the following form
$$\mu_x = \alpha + \beta x$$
where $\alpha$ and $\beta$ are parameters to be estimated and $x$ is measured in years since the 50th birthday.

(c) Comment on the appropriateness of the assumed form of the hazard function for modelling mortality over this age range
(d) If there were 100,000 men aged 50, how many deaths would you expect between ages 50 and 55 years?
(e) Describe the distribution of the number of deaths between ages 50 and 55 years among the $n$ men
-
Section II

Estimating lifetime distribution functions

Statistical inference, Censoring, The Kaplan-Meier (product-limit) model, Parametric estimation of the survival function
-
II.1 Statistical inference
-
Introduction
- The previous section introduced the continuous r.v. $T$ of future lifetime
- This section presents a methodology for using observations from an investigation to estimate the lifetime distribution function $F(t) = P(T \leq t)$ empirically
- Statistical properties for the estimates may be derived to construct variances and confidence intervals
- The possibility of incomplete data will also be considered
- Defining a decrement of interest by death can easily be extended to the analysis of other decrements such as sickness, mechanical breakdowns, etc.
-
Overview
- Empirical Survival Function
- Parametric MLE
- Non-parametric MLE
- Kaplan-Meier survival function
-
Estimating the lifetime distribution function

Statistical inference: given some mild conditions on the distribution of $T$, we can obtain all information by estimating $F(t)$, $S(t)$, $f(t)$ or $\mu(t)$ for all $t \geq 0$.

Simple experiment:
- Observe a large number of new-born lives
- The proportion alive at age $t > 0$ provides an estimate of $S(t)$
- Use a step function $\hat{S}(t)$ to estimate $S(t)$
-
Estimating the lifetime distribution function
- This is known as the empirical distribution function of $T$
- It is a non-parametric approach to estimation
- It is not necessary to assume that $T$ is a member of any parametric family
-
Estimating the lifetime distribution function

Non-parametric approach:
- No prior assumption about the distribution shape or form
- Use the data to estimate this shape/form

Parametric approach:
- The distribution is assumed to belong to a certain family (e.g. exponential)
- Use the data to estimate the appropriate parameters of this family (e.g. mean and variance)
-
Estimating the lifetime distribution function

Ex: A life insurance company prefers to base its premium calculations on a smooth estimate to ensure the premiums change gradually from one age to the next, without sudden jumps.
- A larger sample yields a smoother estimate
- Further smoothing also possible
-
Example

Time   Died   Alive   SDF
5      1      9       0.9
10     1      8       0.8
15     1      7       0.7
20     1      4       0.4
20     1      4       0.4
20     1      4       0.4
35     1      3       0.3
40     0      3       0.3
40     0      3       0.3
40     0      3       0.3
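The SDF column of the example table can be reproduced with a few lines of code. The sketch below assumes the ten `(time, died)` pairs as read from the table, with `died = 0` marking the three lives still alive at time 40; the function name is illustrative. It is only valid here because no life leaves observation before the final time.

```python
# (time, died) pairs from the example table; died = 0 marks lives still alive at 40.
observations = [(5, 1), (10, 1), (15, 1), (20, 1), (20, 1), (20, 1),
                (35, 1), (40, 0), (40, 0), (40, 0)]

def empirical_survival(observations, t):
    """Proportion of lives not yet observed to have died by time t."""
    n = len(observations)
    deaths = sum(1 for time, died in observations if died and time <= t)
    return (n - deaths) / n

for t in (5, 10, 15, 20, 35, 40):
    print(t, empirical_survival(observations, t))
```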
-
Example

Figure: Observed proportion $\hat{S}(t)$ surviving a given time $t$ (survival vs time)
-
Example

Figure: Empirical survival, observed proportion $\hat{S}(t)$ surviving a given time $t$ (analysis time in months). Annotations on the plot:
- Note step down of 0.1
- Note step down of 0.3 (3 deaths)
- Note 3 of the 10 (0.3) are still alive at 40 months.
-
Estimating the lifetime distribution function

Limitations:
- Difficult to find a satisfactory group of lives for study
- The experiment would take about 100 years to complete
- Deaths of all the lives must be recorded (highly impractical)
- Censoring is therefore nearly always required
- All we know in respect of some lives is that they died after a certain age
-
Cohorts
- Complete data: All units under observation until failure
- Incomplete data: units can withdraw/become lost from observation before death
- Analysis of complete data is far simpler, so we do this first
- Cohort: All units come under observation at time $t = 0$
- No entrants after $t = 0$
- All are observed until failure/death:
  - lab experiment with mice injected with nicotine; $t = 0$ is beginning of experiment
  - People diagnosed with certain type of cancer; $t = 0$ on day of diagnosis
-
Follow-up time
- Important to distinguish between calendar/chronological time and time under observation (follow-up time)
- Patient A diagnosed on March 1, 1990, dies on March 11, 1991
- Patient B diagnosed on July 1, 1991, dies on August 1, 1991

Figure: timelines of patients A and B in calendar time (1/1/90 to 1/1/92) and in follow-up time (0 to 730 days)
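The difference between calendar time and follow-up time can be made concrete with the two patients above. The sketch below uses the standard `datetime` module; the dictionary layout is an illustrative choice.

```python
from datetime import date

# Diagnosis and death dates as stated for patients A and B.
patients = {
    "A": (date(1990, 3, 1), date(1991, 3, 11)),
    "B": (date(1991, 7, 1), date(1991, 8, 1)),
}

# Follow-up time is measured from each patient's own time origin (diagnosis).
follow_up_days = {
    name: (death - start).days for name, (start, death) in patients.items()
}
print(follow_up_days)
```

Although patient B enters observation more than a year after patient A in calendar time, both follow-up times start at 0 on the analysis scale.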
-
II.2 Censoring
-
Censoring
- Censoring results in loss of data
- Depending on the type of censoring (informative), it may also yield biased mortality rates
- An observational plan is required in a mortality investigation, to specify start and end dates and categories of lives to be included
- In e.g. medical statistics, non-parametric estimation is very important
- Experiments can be amended to allow for censoring
- Otherwise, inference must be based on data with shorter times (e.g. 3 or 4 years)
-
Censoring

If inference is based on data with shorter times:
- We no longer observe the same cohort throughout their joint lifetimes
- We might not be sampling from the same distribution
- Model assumptions may thus need to be widened so that the mortality of lives born in year $y$ is modelled by $T^y$
- In practice, the investigation is divided up into single years of age (outside scope of ST3054)
-
Censoring
- Observing lives between integer ages $x$ and $x + 1$, and limiting the period of investigation, are also forms of censoring
- Censoring might still occur at unpredictable times (e.g. lapsing of a policy)
- Time of observation corresponding to loss of survivors is known: either age $x + 1$ or end of investigation
-
Censoring mechanisms

Data are censored if we do not know the exact value of each observation but we do have information about the value of each observation in relation to one or more bounds (e.g. we know that a person was still alive at age 20 at the end of the investigation).
- Censoring is the key feature of survival data
- Survival analysis may be seen as the analysis of censored data
- Censoring mechanisms play an important role in statistical inference
-
Censoring mechanisms

Most common censoring assumptions (not all mutually exclusive):
- Right censoring
- Left censoring
- Interval censoring
- Random censoring
- Informative and non-informative censoring
- Type I censoring
- Type II censoring
-
Right censoring
- Data are right censored if observations in progress are cut short
- Most common form of censoring in actuarial investigations
- Ex: end of mortality study before all lives observed have died
- Persons still alive when the investigation ends are right censored
- We only know that their lifetime exceeds some value
- Ex: life insurance policy holders surrender their policy, active lives of a pension scheme retire, endowment assurance policies mature
-
Right censoring
-
Left censoring
- Data are left censored if we cannot know when entry into the state we wish to observe took place
- Ex: medical studies where time elapsed between onset and baseline diagnosis is unknown
- Ex: estimating functions of exact age without knowledge of DOB, estimating functions of exact policy duration without knowledge of exact date of policy entry, estimating functions of duration since onset of sickness without knowledge of exact date of start of sickness
- Left censoring is different to left truncation
-
Left censoring
-
Truncation
- Truncation: when estimating functions of exact age without information from before the start of the investigation period, or before the entry date of a policy, etc.
- Observed data: time of occurrence (or censored observation of) the event
- Ascertainment time $B$, earliest initiation time 0
- Ex: estimate incubation distribution based on retrospective samples of AIDS cases with known infection times
- Probability: for an individual observed to have experienced the event after $t$ time units
$$\frac{f(t)}{F(B - t)}$$
-
Truncation
-
Interval censoring
- Data are interval censored if the observational plan only allows an event of interest to be dated within a time interval
- Ex: actuarial studies where only calendar year of death is known
- Right and left censoring are special cases of interval censoring
- Ex: estimating functions of exact age when deaths are known up to nearest birthday only
- Ex: knowing calendar date of death and calendar year of birth (example of left censoring and also interval censoring since we only know the lifetime falls within a certain range)
-
Interval censoring
-
Random censoring
- Censoring is random if the time $C_i$ at which observation of the $i$th lifetime is censored is a random variable
- The observation will be censored if $C_i < T_i$, where $T_i$ is the random lifetime of the $i$th life
- Ex: when individuals may leave the observation by a means other than death, and where the time of leaving is not known in advance
- Ex: life insurance withdrawals, emigration from a population, members of a company pension scheme may leave voluntarily when moving to another employer
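The rule "censored if $C_i < T_i$" can be illustrated by simulation. The sketch below assumes independent exponential lifetimes and censoring times with hypothetical rates (an illustrative choice, not a model from these notes) and records, for each life, the observed time $\min(T_i, C_i)$ and whether death was observed.

```python
import random

rng = random.Random(3054)  # fixed seed for reproducibility

def simulate_random_censoring(n, death_rate=0.05, censor_rate=0.03):
    """Return (observed_time, died) pairs with independent exponential T_i and C_i.

    The exponential distributions and their rates are illustrative assumptions.
    """
    data = []
    for _ in range(n):
        t_i = rng.expovariate(death_rate)   # random lifetime T_i
        c_i = rng.expovariate(censor_rate)  # random censoring time C_i
        died = t_i <= c_i                   # censored when C_i < T_i
        data.append((min(t_i, c_i), died))
    return data

data = simulate_random_censoring(10_000)
censored_fraction = sum(1 for _, died in data if not died) / len(data)
print(censored_fraction)
```

With these rates the long-run censored proportion is `censor_rate / (death_rate + censor_rate)`, so the printed value should be near 0.375.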
-
Random censoring
- Random censoring is a special case of right censoring
- The case in which the censoring mechanism is a second decrement of interest gives rise to multiple decrement models
- Ex: suppose that lives can leave a pension scheme through death, age retirement or withdrawal. The rates of decrement for all these causes of decrement can be estimated by a multiple decrement model.
-
Informative and non-informative censoring
- Censoring is non-informative if it gives no information about the lifetimes $\{T_i\}$
- Under random censoring, the independence of each pair $T_i$, $C_i$ is sufficient to ensure non-informative censoring
- Informative censoring is more difficult to analyse
- Essentially this is because the resulting likelihoods cannot usually be factorised (recall that statistical independence greatly simplifies calculation of likelihoods)
-
Informative and non-informative censoring

Examples of informative censoring:
- Withdrawal of life insurance policies (likely to be in better average health than those who do not withdraw). The mortality rates of the lives that remain in the at-risk group are likely to be higher than the mortality rates of the lives that surrender their policy.
- Ill-health retirements from pension schemes (likely to be in worse average health than continuing members). Mortality rates of those who remain in the pension scheme are likely to be lower than those of the lives that left through ill-health retirement.
-
Informative and non-informative censoring

Example of non-informative censoring:
- The end of the investigation period, because it affects all lives equally, regardless of their propensity to die at that point
-
Type I censoring
- Type I censoring occurs if the censoring times $\{C_i\}$ are known in advance
- This is a degenerate case of random censoring
- Also a special case of right censoring
- Lives censored at end of investigation period might also be considered as an example of Type I censoring
-
Type I censoring

Examples of right censoring mechanisms:
- When estimating functions of exact age, individuals are no longer followed up once they have reached 60
- When lives retire from a pension scheme at normal retirement age (if this is a pre-determined exact age)
- When estimating functions of policy duration, observing individuals only up to their 10th policy anniversary
- When measuring functions of duration since having a particular medical operation, and only observing people for a maximum of 12 months from the date of operation
-
Type II censoring
- Type II censoring is present if observation is continued until a predetermined number of deaths has occurred
- Can simplify the analysis: non-random number of events
- Ex: when a medical trial is ended after 100 lives on a particular course of treatment have died
- Observational plan is likely to introduce censoring
- Consideration should be given to the effect on the analysis in specifying this plan
- Censoring might also depend on the results of the observations to date (oncologic trials)
-
IntroductionSurvival models
Lifetime distribution functionsCox regression
Statistical inferenceCensoringThe Kaplan-Meier (product-limit) modelParametric estimation of the survival function
Type I and II censoring
I Many actuarial investigations are characterised by a combination of random and Type I censoring
I Ex: in life office mortality studies where policies rather than lives are observed, and observation ceases either when a policy lapses (random censoring) or at some predetermined date marking the end of the investigation (Type I censoring)
I Type I and Type II censoring are most frequently met with in the design of medical survival studies
I See Question 8.4
-
II.3 The Kaplan-Meier (product limit) model
-
The Kaplan-Meier model: introduction
I Derive the empirical distribution function from the data to allow for censoring
I Consider lifetimes as a function of time $t$ without specifying a starting age $x$
I Applies equally to new-born lives, lives aged $x$ at outset, or lives sharing a common property at time $t$ (e.g. diagnosis of a medical condition)
Note: patient age may be important but not the sole determinant, and is usually treated as an explanatory variable in a multivariate regression model (cf. next section). Ex: measure mortality amongst patients suffering from a virulent tropical disease.
-
The Kaplan-Meier model: assumptions
I Suppose we observe a population of $n$ lives in the presence of non-informative right censoring, and suppose we observe $m$ deaths
I Non-informative censoring means that the mortality of the lives remaining in the group is not systematically higher or lower than that of the censored lives
I Estimates of the distribution and survival functions will be biased if informative censoring actually occurs
I If informative censoring is allowed, the lifetimes and censoring times are no longer independent
-
The Kaplan-Meier model: assumptions
I Define $t_0 = 0$ and $t_{k+1} = \infty$, and let $t_1 < \dots < t_k$, $k \le m$, be the ordered times at which deaths were observed
I $k \le m$: more than one death may be observed at a single failure time
I Assume $d_j$ deaths are observed at time $t_j$ ($1 \le j \le k$) so that $d_1 + \dots + d_k = m$
I Observation of the remaining $n - m$ lives is censored (i.e. these remaining lives are not tracked further)
-
The Kaplan-Meier model: assumptions
I Assume $c_j$ lives are censored (i.e. removed from investigation) between times $t_j$ and $t_{j+1}$ ($0 \le j \le k$)
I Then $c_0 + c_1 + \dots + c_k = n - m$
I Let $d_j$ be the number of individuals experiencing the event at duration $t_j$
I Let $n_j$ be the number of individuals at risk of experiencing the event just prior to duration $t_j$
-
The Kaplan-Meier model: assumptions
The Kaplan-Meier (KM) estimator of the survivor function adopts the following conventions:
(a) The hazard of experiencing the event is zero at all durations except those where an event actually happens in our sample
(b) The hazard of experiencing the event at any particular duration $t_j$ when an event takes place is equal to $d_j / n_j$
(c) For any $0 \le j \le k$, if $c_j > 0$, then
I If $d_j = 0$, the persons censored are removed from observation at duration $t_j$ (at which censoring takes place)
I If $d_j > 0$, persons who are censored at $t_j$ are assumed to be censored immediately after the events have taken place (so that they are still at risk at that duration)
-
The Kaplan-Meier model: assumptions
Example [IFA notes]: a group of 15 lab rats are injected with a new drug. They are observed over the next 30 days. The following events occur:

Day   Event
3     Rat 4 dies from effects of drug
4     Rat 13 dies from effects of drug
6     Rat 7 gnaws through bars of cage and escapes
11    Rats 6 and 9 die from effects of drug
17    Rat 1 killed by other rats
21    Rat 10 dies from effects of drug
24    Rat 8 freed during raid by animal liberation activists
25    Rat 12 accidentally freed by journalist reporting earlier raid
26    Rat 5 dies from effects of drug
30    Investigation closes. Remaining rats hold street party.
-
[Figure: timeline of the lab rat data by day, marking deaths at days 3, 4, 11 (2 rats), 21 and 26 (labelled t1 to t5) and censored observations at days 6, 17, 24, 25 and 30 (5 rats)]
-
The Kaplan-Meier model: assumptions
I $n = 15$ lives under investigation, $m = 6$ drug-related deaths
I $k = 5$ death time points; times at which deaths were observed: $t_1 = 3$, $t_2 = 4$, $t_3 = 11$, $t_4 = 21$, $t_5 = 26$
I Number of deaths observed at each failure time: $d_1 = 1$, $d_2 = 1$, $d_3 = 2$, $d_4 = 1$, $d_5 = 1$
I $n - m = 9$ lives did not die due to drugs
I Number of lives censored: $c_0 = 0$, $c_1 = 0$, $c_2 = 1$, $c_3 = 1$, $c_4 = 2$, $c_5 = 5$ ($\sum_{j=0}^{k} c_j = n - m$)
I Number of lives alive and at risk at time $t_j$: $n_1 = 15$, $n_2 = 14$, $n_3 = 12$, $n_4 = 9$, $n_5 = 6$
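The quantities listed above can be recovered mechanically from the event table. The following is a minimal sketch, assuming the events are encoded as (day, number of rats, drug-related death?) tuples; this encoding and the variable names are choices made here for illustration.

```python
# Lab rat data from the example: (day, number of rats, drug-related death?)
events = [
    (3, 1, True), (4, 1, True), (6, 1, False), (11, 2, True), (17, 1, False),
    (21, 1, True), (24, 1, False), (25, 1, False), (26, 1, True), (30, 5, False),
]
n = 15

# Ordered distinct death times t_1 < ... < t_k and deaths d_j at each
death_times = sorted({day for day, _, died in events if died})
d = [sum(cnt for day, cnt, died in events if died and day == t) for t in death_times]

# c_j: lives censored in [t_j, t_{j+1}), with t_0 = 0 and t_{k+1} = infinity
bounds = [0] + death_times + [float("inf")]
c = [
    sum(cnt for day, cnt, died in events
        if not died and bounds[j] <= day < bounds[j + 1])
    for j in range(len(bounds) - 1)
]

# n_j: lives at risk just before t_j, i.e. n minus all deaths and censorings before t_j
n_at_risk = [n - sum(cnt for day, cnt, _ in events if day < t) for t in death_times]

print(death_times, d, c, n_at_risk)
```

Counting only events strictly before $t_j$ matches convention (c): a life censored at a death time is treated as still at risk at that duration.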
-
The Kaplan-Meier model: assumptions
I We can see the approach as a partition of duration into very small intervals
I The risk of the event happening is 0 in those intervals where no event occurs
I The data offer no evidence to suppose anything else
I In those intervals in which events do occur, the hazard is assumed constant (i.e. piecewise exponential) within each interval
I The hazard is allowed to vary between eventful intervals
-
The Kaplan-Meier model: assumptions
I Recall that if $\mu_{x+t} = \mu$, then $S_x(t) = {}_t p_x = e^{-\mu t}$
I The survival function is exponential over each short interval over which the force of mortality (or hazard) is constant
I The hazard within the interval containing event time $t_j$ is estimated for $1 \le j \le k$ as
$\hat\lambda_j = \frac{d_j}{n_j}$
I This is a non-parametric MLE that maximises
$\prod_{j=1}^{k} \lambda_j^{d_j} (1 - \lambda_j)^{n_j - d_j}$ (product of independent binomial likelihoods)
I In eventless intervals, $d_j = 0$ and the hazard becomes 0
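The maximisation step can be written out as follows. This is a sketch of the standard argument, assuming $0 < d_j < n_j$ so that the stationary point lies inside $(0, 1)$; $\ell$ denotes the log of the binomial likelihood above.

```latex
\begin{align*}
\ell(\lambda_1, \dots, \lambda_k)
  &= \sum_{j=1}^{k} \bigl[ d_j \log \lambda_j + (n_j - d_j) \log(1 - \lambda_j) \bigr], \\
\frac{\partial \ell}{\partial \lambda_j}
  &= \frac{d_j}{\lambda_j} - \frac{n_j - d_j}{1 - \lambda_j} = 0
  \;\Longrightarrow\; d_j (1 - \lambda_j) = (n_j - d_j) \lambda_j
  \;\Longrightarrow\; \hat\lambda_j = \frac{d_j}{n_j}.
\end{align*}
```

Each $\lambda_j$ appears in only one term of the sum, so the $k$ hazards are estimated separately.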
-
Extending the force of mortality to discrete distributions
Definition: Suppose $F(t)$ has probability masses at the points $t_1, \dots, t_k$. Then the discrete hazard function is defined as
$\lambda_j = P[T = t_j \mid T \ge t_j]$ ($1 \le j \le k$)
I $\lambda_j$ may be seen as the probability that a given individual dies on day $t_j$, given that they were still alive at the start of that day
-
Extending the force of mortality to discrete distributions
Ex: butterflies of a certain species have short lives. After hatching, each butterfly experiences a lifetime defined by the following probability distribution:

Lifetime (days)   Probability
1                 0.10
2                 0.30
3                 0.25
4                 0.20
5                 0.15

Calculate $\lambda_j$ for $j = 1, 2, \dots, 5$ (to 3 decimal places) and sketch a graph of the discrete hazard function.
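One way to check the hand calculation is the short sketch below, which applies the definition $\lambda_j = P[T = t_j] / P[T \ge t_j]$ directly. The variable names are illustrative only.

```python
# Lifetime distribution from the butterfly example: P[T = t] for t = 1..5 days
pmf = {1: 0.10, 2: 0.30, 3: 0.25, 4: 0.20, 5: 0.15}

hazard = {}
survival = 1.0  # P[T >= t], equal to 1 at the first possible lifetime
for t in sorted(pmf):
    hazard[t] = pmf[t] / survival  # lambda_j = P[T = t_j] / P[T >= t_j]
    survival -= pmf[t]             # remaining mass, P[T >= next t]

for t, lam in hazard.items():
    print(f"lambda_{t} = {lam:.3f}")
```

The hazard increases with age, from 0.100 on day 1 to 1.000 on day 5, since no butterfly survives beyond the fifth day.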
-
Calculating the KM estimate of the survival function
If we assume that $T$ has a discrete distribution then
$1 - F(t) = \prod_{t_j \le t} (1 - \lambda_j)$
Since $1 - F(t) = S(t)$, we can estimate the survival function using the formula
$\hat S(t) = \prod_{t_j \le t} (1 - \hat\lambda_j)$
This is the Kaplan-Meier estimator.
-
Calculating the KM estimate of the survival function
To compute $\hat S(t)$, we multiply the survival probabilities within each of the intervals up to and including duration $t$. The survival probability at time $t_j$ is estimated by
$1 - \hat\lambda_j = \frac{n_j - d_j}{n_j} = \frac{\text{number of survivors}}{\text{number at risk}}$
So the probability of survival at time $t$ is estimated by
$\hat S(t) = \prod_{t_j \le t} \frac{n_j - d_j}{n_j}$
The KM estimate is also called the product limit estimate as a result of this expression.
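The product formula translates into a running product over the event times. The sketch below assumes the deaths $d_j$ and numbers at risk $n_j$ are already available as lists; the function name and the small data set are hypothetical and not taken from the notes.

```python
def kaplan_meier(d, n):
    """Return the KM estimate of S just after each event time t_1, ..., t_k."""
    s_hat, out = 1.0, []
    for dj, nj in zip(d, n):
        s_hat *= (nj - dj) / nj  # multiply by the estimated survival probability at t_j
        out.append(s_hat)
    return out

# Hypothetical data (not from the notes): three event times
d = [1, 2, 1]   # deaths at t_1, t_2, t_3
n = [10, 8, 4]  # numbers at risk just before t_1, t_2, t_3
s = kaplan_meier(d, n)
print([round(x, 5) for x in s])
```

Between event times the estimate stays at its last value, so the list of products is enough to describe the whole step function.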
-
Calculating the KM estimate of the survival function
To summarize the approach:
I Finer and finer partitions of the time axis are chosen ($1 \le j \le k$)
I $(1 - F(t))$ is estimated as the product of the probabilities of surviving each sub-interval
I Then the KM estimate is obtained using $\lambda_j = P[T = t_j \mid T \ge t_j]$, as the mesh of the partition tends to 0
I This KM estimate is constant after the last duration at which an event occurred, and it is not defined at durations longer than the duration of the last censored observation
I Only those at risk at $\{t_j\}$ contribute to the estimate
-
Calculating the KM estimate of the survival function
I It is unnecessary to start observation on all lives at the same time or age
I The estimate is valid for data truncated from the left, provided truncation is non-informative in the sense that entry to the study at a particular age or time is independent of the remaining lifetime
-
Calculating the KM estimate of the survival function
Ex: using the data from the observation of lab rats, calculate the Kaplan-Meier estimate of $F(t)$.

j   $t_j$   $d_j$   $n_j$   $\hat\lambda_j = d_j/n_j$   $1 - \hat\lambda_j$   $1 - \prod_{k=1}^{j}(1 - \hat\lambda_k)$
1   3       1       15      0.0667                      0.9333                0.0667
2   4       1       14      0.0714                      0.9286                0.1333
3   11      2       12      0.1667                      0.8333                0.2778
4   21      1       9       0.1111                      0.8889                0.3580
5   26      1       6       0.1667                      0.8333                0.4650

$$\hat F(t) = \begin{cases}
0 & \text{for } 0 \le t < 3 \\
0.0667 & \text{for } 3 \le t < 4 \\
0.1333 & \text{for } 4 \le t < 11 \\
0.2778 & \text{for } 11 \le t < 21 \\
0.3580 & \text{for } 21 \le t < 26 \\
0.4650 & \text{for } t \ge 26
\end{cases}$$
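The piecewise estimate can be evaluated at any duration by locating the last death time not exceeding $t$. The sketch below uses a binary search for this; the function name `f_hat` is a choice made here, and the 30-day limit reflects the end of the investigation in the example.

```python
from bisect import bisect_right

death_times = [3, 4, 11, 21, 26]  # t_j from the lab rat example
d = [1, 1, 2, 1, 1]               # deaths d_j
n = [15, 14, 12, 9, 6]            # numbers at risk n_j

# Cumulative products of (n_j - d_j) / n_j, with the survival estimate 1 before the first death
s_steps = [1.0]
for dj, nj in zip(d, n):
    s_steps.append(s_steps[-1] * (nj - dj) / nj)

def f_hat(t):
    """KM estimate of F(t) = 1 - S(t) for 0 <= t <= 30 (end of the investigation)."""
    j = bisect_right(death_times, t)  # number of death times t_j <= t
    return 1.0 - s_steps[j]

print(round(f_hat(2), 4), round(f_hat(10), 4), round(f_hat(21), 4), round(f_hat(30), 4))
```

Using `bisect_right` makes each step right-continuous, so the estimate jumps at the death time itself, in line with the intervals $t_j \le t < t_{j+1}$ in the table.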
-
A graphical approach
I We can use a graphical approach to carry out KM estimation
I Ex: derive an estimate $\hat S(t)$ of the survival function $S(t)$ to obtain $\hat F(t) = 1 - \hat S(t)$
I The graph of $\hat S(t)$ is a step function starting at 1 and stepping down at each new death
I The height of each step must be calculated to specify $\hat S(t)$
-
A graphical approach: example (lab rat data)
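[Figure: Estimate of the survival function. Step plot of survival probability (0.00 to 1.00) against time (0 to 30 days), with ticks at days 0, 3, 4, 11, 21, 26 and 30]

$\hat S(t)$                               $t$
1                                         $0 \le t < 3$
14/15                                     $3 \le t < 4$
14/15 $\times$ 13/14                      $4 \le t < 11$
13/15 $\times$ 10/12                      $11 \le t < 21$
...                                       ...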
-
Comparing lifetime distributions
I Ex: compare lifetime distributions of two populations following different drug treatments
I Use statistical properties of KM estimates for comparison
I Greenwood's formula for the MLE $\hat F$:
$\mathrm{Var}[\hat F(t)] \approx (1 - \hat F(t))^2 \sum_{t_j \le t} \frac{d_j}{n_j (n_j - d_j)}$
I Accurate for a large number of uncensored observations (20+) and for $\hat S(t)$ away from 0 and 1; otherwise interval estimates may extend beyond 0 or 1
I This variance estimate can be used to construct CIs
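Greenwood's formula can be evaluated alongside the KM estimate with one extra running sum. The sketch below applies it to the lab rat data purely as an arithmetic illustration; with only 6 deaths the approximation is rough. The normal-based 95% interval with quantile 1.96 is an assumption made here as one common way of building a CI from a variance, and the function name is hypothetical.

```python
import math

def greenwood(d, n):
    """Return (F_hat, approximate Var[F_hat]) just after each event time."""
    s_hat, cum, out = 1.0, 0.0, []
    for dj, nj in zip(d, n):
        s_hat *= (nj - dj) / nj       # KM estimate of S, i.e. 1 - F_hat
        cum += dj / (nj * (nj - dj))  # running sum of d_j / (n_j (n_j - d_j)); needs n_j > d_j
        out.append((1.0 - s_hat, s_hat ** 2 * cum))
    return out

# Lab rat data
d = [1, 1, 2, 1, 1]
n = [15, 14, 12, 9, 6]
results = greenwood(d, n)

f_26, var_26 = results[-1]  # values for t >= 26
se_26 = math.sqrt(var_26)
# Assumed normal approximation for a 95% interval
ci = (f_26 - 1.96 * se_26, f_26 + 1.96 * se_26)
print(round(f_26, 4), round(var_26, 5), tuple(round(x, 3) for x in ci))
```

The sum is undefined when $n_j = d_j$ (everyone at risk dies at $t_j$), so such a term would need separate handling.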
-
Points to note on the KM estimator
I KM estimator is based on non-informative censoring
I Value of estimator not well defined if last data point is censored
I With no censoring, KM is the same as empirical SDF
I KM is implemented in most statistical packages, including R
I Can also be derived from the theory of counting processes
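The point about the absence of censoring can be checked numerically: when every lifetime is observed, the product telescopes to the proportion of lives surviving beyond $t$. The sketch below uses a small hypothetical sample; the function names are illustrative.

```python
from collections import Counter

def km_no_censoring(lifetimes):
    """KM estimate at each distinct death time when every lifetime is fully observed."""
    counts = Counter(lifetimes)
    at_risk, s_hat, out = len(lifetimes), 1.0, {}
    for t in sorted(counts):
        s_hat *= (at_risk - counts[t]) / at_risk
        out[t] = s_hat
        at_risk -= counts[t]  # with no censoring, only deaths leave the risk set
    return out

def empirical_survival(lifetimes, t):
    """Proportion of lifetimes strictly greater than t."""
    return sum(x > t for x in lifetimes) / len(lifetimes)

lifetimes = [2, 3, 3, 5, 8, 8, 8, 13]  # hypothetical, uncensored
km = km_no_censoring(lifetimes)
agree = all(abs(s - empirical_survival(lifetimes, t)) < 1e-12 for t, s in km.items())
print(km, agree)
```

Ties are handled by grouping equal lifetimes into a single death time, as in the definition of $d_j$.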
-
Pointwise confidence intervals for KM estimator
[S(x)1/, S(x)
], where = exp