
Lecture 6: Survival Analysis


Dankmar Bohning

Southampton Statistical Sciences Research Institute, University of Southampton

S3RI, 2 - 4 March 2015



Outline

Introduction

Basic definitions

The hazard



Introduction

A couple of questions and...

- What makes survival data so special that their analysis needs a special treatment, even as long as a one-term course?
- Why isn’t it simply covered as a sub-topic in, let’s say, regression analysis?

...a clarification

- Survival data subsume more than only times from birth to death for some individuals.
- Survival analysis is the analysis of duration data, that is, the time from a well-defined starting point until the event of interest occurs.

Examples

- how long patients survived after diagnosis or treatment
- the length of unemployment spells
- how long a marriage lasts
- how long PhD students need to finish writing their theses
- and more...

Features

- Survival data result from a dynamic process and we want to capture these dynamics properly in the analysis.
- The observation scheme for duration data can be rather complex, leading to data that are incomplete, for example censored or truncated.


Basic definitions

The basic functions

In the following we will assume that time runs continuously, and we will therefore describe duration by a continuous random variable, denoted by T.

- T ≥ 0
- f(t) ⇒ density function
- F(t) ⇒ cumulative distribution function (cdf)
- S(t) ⇒ survival function

Recall that...

- The density function f(t) describes how the total probability of 1 is distributed over the domain of T.
- The function f(t) itself is not a probability and can take values bigger than 1. Still, one can derive basic properties from looking at the density.
- For regions where the density has large values, the area under the curve over an interval of given length will be larger than over an interval of the same length where the density is lower.
- Regions over which the density is high are regions where we expect to observe more data points than in regions with low density.

- The cdf F(t) is defined as F(t) := P(T ≤ t), which can be computed from the density as

$$F(t) = \int_0^t f(s)\,ds.$$

- A cdf is an increasing function, even strictly increasing if the density f(t) > 0 everywhere.
- F(0) = 0 and $\lim_{t \to \infty} F(t) = 1$.
- There is a one-to-one link between f(t) and F(t), as F′(t) = f(t). Knowing one of the functions means, at least in principle, knowing the other (you may have to take the derivative or perhaps solve an ugly integral).

- Instead of looking at the cdf, which gives the probability of surviving at most t time units, one prefers to look at survival beyond a given point in time. This is described by the survival function S(t):

  S(t) = P(T > t) = 1 − P(T ≤ t) = 1 − F(t)

- Consequently, S(t) starts at 1 for t = 0 and then declines to 0 for t → ∞.
- It should be obvious that knowing any one of f(t), F(t) and S(t) allows one to derive the other two functions.

To summarize

Pr(a ≤ T ≤ b)

All three functions introduced so far allow us to describe, in one way or another, how the survival times are distributed over the potential range.
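As a worked equation, the interval probability named on this slide can be written with any of the three functions. This is the standard identity for a continuous T, spelled out here for illustration rather than taken from the slide itself:

```latex
\Pr(a \le T \le b) = \int_a^b f(s)\,ds = F(b) - F(a) = S(a) - S(b),
\qquad 0 \le a \le b .
```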


The hazard

The dynamic process

- Density, cdf and survival function look at the marginal distribution.
- Conditioning on the survival experience so far, we have

  Pr(t < T ≤ t + ∆t | T > t)

- Defining the hazard rate:

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t}$$

The hazard in more detail

The basic information in the hazard is, first of all, its qualitative behavior.

Some useful identities

- $h(t) = \frac{f(t)}{S(t)} \;\Rightarrow\; f(t) = h(t)\,S(t)$
- $h(t) = [-\log S(t)]'$
- $S(t) = \exp\{-\int_0^t h(s)\,ds\}$
- Define the cumulative hazard H(t):

$$H(t) = \int_0^t h(s)\,ds \;\Rightarrow\; S(t) = \exp\{-H(t)\} \quad\text{or}\quad \log S(t) = -H(t)$$
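The identities can be checked numerically. The following is a minimal sketch, assuming a hypothetical hazard h(t) = 2t (not from the lecture), for which H(t) = t² and S(t) = exp(−t²) are known in closed form; the function names are illustrative only.

```python
import math

def cumulative_hazard(h, t, steps=10_000):
    """Approximate H(t), the integral of h(s) over [0, t], by the trapezoid rule."""
    dt = t / steps
    total = 0.5 * (h(0.0) + h(t))
    for k in range(1, steps):
        total += h(k * dt)
    return total * dt

def hazard(t):
    # Assumed example hazard (not from the slides): linearly increasing in t.
    return 2.0 * t

def survival(t):
    # S(t) = exp{-H(t)}
    return math.exp(-cumulative_hazard(hazard, t))

def density(t):
    # f(t) = h(t) S(t)
    return hazard(t) * survival(t)

t = 1.5
closed_form = math.exp(-t ** 2)          # S(t) = exp(-t^2) when h(t) = 2t

# h(t) = [-log S(t)]' checked with a central finite difference.
eps = 1e-4
slope = -(math.log(survival(t + eps)) - math.log(survival(t - eps))) / (2 * eps)

print(survival(t), closed_form)          # the two values should agree closely
print(slope, hazard(t))                  # the two values should agree closely
```

Any other non-negative hazard function can be passed to `cumulative_hazard` in the same way; only the closed-form comparison is specific to the assumed h(t) = 2t.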

By using the definition of conditional probabilities

$$\Pr(t < T \le t + \Delta t \mid T > t) = \frac{\Pr([t < T \le t + \Delta t] \cap [T > t])}{\Pr(T > t)} = \frac{\Pr(t < T \le t + \Delta t)}{\Pr(T > t)}$$

It may be helpful to sketch this relation graphically.
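Carrying this relation through the limit in the definition of the hazard gives the first two identities listed earlier. The steps below are a sketch of that standard argument, using only F, f and S as defined in this lecture:

```latex
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0} \frac{1}{\Delta t}\,
        \frac{\Pr(t < T \le t + \Delta t)}{\Pr(T > t)}
      = \frac{1}{S(t)} \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}
      = \frac{f(t)}{S(t)}, \\
h(t) &= \frac{-S'(t)}{S(t)} = \bigl[-\log S(t)\bigr]'
      \quad \text{since } S'(t) = -f(t).
\end{aligned}
```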

An example

[Figure not reproduced in this transcript.]

Survival Analysis: Non-Parametric Estimation



Outline

General Concepts

Non-Parametric Estimation (no censoring)

Non-Parametric Estimation (including censoring)



General Concepts

A few remarks before starting

- Each subject has a beginning and an end anywhere along the time line of the complete study.
- In many clinical trials, subjects may enter or begin the study and reach the end-point at vastly differing points.
- Each subject is characterized by
  1. Survival time
  2. Status at the end of the survival time (event occurrence or censored)
  3. The study group they are in.

Censoring

- An observation is censored when the total survival time for that subject cannot be accurately determined.
- A subject drops out, is lost to follow-up, or required data are not available.
- The study ends before the subject has the event of interest, i.e., they survived at least until the end of the study.
- There is no knowledge of what happened thereafter.

- Right censoring: the period of observation expires, or an individual is removed from the study, before the event occurs.
- Left censoring: the initial time at risk is unknown.
- Interval censoring: the event time is only known to lie between two observation times (both right and left censored).



Estimation

- Random variable T with cdf F(t)
- S(t) = 1 − F(t)
- With no censored observations: $\hat{S}(t) = 1 - \hat{F}(t)$
- To estimate F(t) at each time t:
  - data $t_1, \dots, t_n$
  - parameter of interest $\theta = F(t) = \Pr(T \le t)$
  - $\hat{\theta} = \frac{\#\{\text{obs.} \le t\}}{n} = \frac{1}{n}\sum_{i=1}^{n} I(t_i \le t)$



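Confidence intervals

- Confidence interval for F(t):

$$\hat{\theta} \mp z_{\alpha/2} \sqrt{\frac{\hat{\theta}(1 - \hat{\theta})}{n}}$$

- Confidence interval for S(t):

$$1 - \hat{\theta} \mp z_{\alpha/2} \sqrt{\frac{\hat{\theta}(1 - \hat{\theta})}{n}}$$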
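The estimate and both intervals can be computed in a few lines. This is a minimal sketch assuming fully observed (uncensored) durations; the data values and the function name `empirical_cdf_ci` are hypothetical and not part of the lecture.

```python
import math
from statistics import NormalDist

def empirical_cdf_ci(times, t, alpha=0.05):
    """Estimate theta = F(t) with intervals for F(t) and S(t); assumes no censoring."""
    n = len(times)
    theta = sum(1 for ti in times if ti <= t) / n     # share of observations <= t
    z = NormalDist().inv_cdf(1 - alpha / 2)           # z_{alpha/2}
    half = z * math.sqrt(theta * (1 - theta) / n)     # half-width of the interval
    f_ci = (theta - half, theta + half)
    s_ci = (1 - theta - half, 1 - theta + half)
    return theta, f_ci, s_ci

# Hypothetical, fully observed durations (not data from the lecture).
durations = [1.0, 2.0, 3.0, 4.0, 4.5, 5.0]
theta_hat, f_interval, s_interval = empirical_cdf_ci(durations, t=3.0)
print(theta_hat, f_interval, s_interval)
```

With so few observations the interval is very wide and can leave [0, 1] for t near the extremes of the data, which is a known limitation of this normal approximation.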






Estimation

- To estimate the proportions θ_i:
  - n_i = # of individuals at risk at the beginning of the i-th interval
  - d_i = # of individuals experiencing the event

$$\hat{\theta}_i = \frac{n_i - d_i}{n_i}$$

- Kaplan-Meier estimator:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}$$

- It reduces to $1 - \hat{F}(t)$ with no censored observations.



Example

Subject  Group  Survival  # at risk in  Event  # surviving  Cumulative
                time      the interval         after event  survival rate
1        1      1         6             1      5            1 × 5/6
2        1      2         5             1      4            1 × 5/6 × 4/5
3        1      3         4             1      3            1 × 5/6 × 4/5 × 3/4
4        1      4         3             1      2            1 × 5/6 × 4/5 × 3/4 × 2/3
5        1      4.5       2             1      1            1 × 5/6 × 4/5 × 3/4 × 2/3 × 1/2
6        1      5                       0
7        2      0.5       6             1      5            1 × 5/6
8        2      0.75      5             1      4            1 × 5/6 × 4/5
9        2      1         4             1      3            1 × 5/6 × 4/5 × 3/4
10       2      1.5                     0
11       2      2         2             1      1            1 × 5/6 × 4/5 × 3/4 × 1/2
12       2      3.5       1             1      0            1 × 5/6 × 4/5 × 3/4 × 1/2 × 0/1
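The table can be reproduced with a short implementation of the product formula. This is a minimal sketch, not the routine used to produce the slides; `kaplan_meier` is an illustrative name, and the data are the twelve subjects from the table with Event = 0 read as censored.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] at each distinct event time.

    times  -- observed durations
    events -- 1 if the event occurred at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        n_i = sum(1 for u in times if u >= t)   # at risk just before t
        d_i = deaths[t]                         # events at t
        surv *= (n_i - d_i) / n_i
        curve.append((t, surv))
    return curve

# Group 1 and group 2 from the example table.
km_group1 = kaplan_meier([1, 2, 3, 4, 4.5, 5], [1, 1, 1, 1, 1, 0])
km_group2 = kaplan_meier([0.5, 0.75, 1, 1.5, 2, 3.5], [1, 1, 1, 0, 1, 1])

for t, s in km_group1:
    print("group 1", t, round(s, 4))
for t, s in km_group2:
    print("group 2", t, round(s, 4))
```

Censored subjects contribute to the at-risk count up to their censoring time but never trigger a step, which is why group 2 drops from 3/4 of its previous value directly to 1/2 of it at t = 2.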



Example

[Figure: Kaplan-Meier survival estimates for group = 1 and group = 2; x-axis: analysis time (0 to 5); y-axis ticks from 0.00 to 1.00.]



Understanding KM analysis

- The lengths of the horizontal lines represent the survival duration for that interval.
- The interval is terminated by the occurrence of the event of interest.
- The vertical distances between horizontal lines illustrate the change in the cumulative probability.
- The KM curve is a step-wise estimator, not a smooth function.
- What about point estimates of survival?
- What is the effect of censoring?



Comparison of KM estimates

- It is simple to visualize the difference between two survival curves.
- The difference must be quantified in order to assess statistical significance.
- Methods
  - log-rank test ⇒ most sensitive to consistent differences
  - Wilcoxon test ⇒ most sensitive to early differences
  - hazard ratio ⇒ gives the relative event rate in the groups



Log-Rank test: Example

Time  Group 1  Group 2  Group 1  Group 2  Group 1   Group 2
      Event    Event    At Risk  At Risk  Expected  Expected
0.5   0        1        6        6        0.50      0.50
0.75  0        1        6        5        0.55      0.45
1     1        1        6        4        1.20      0.80
2     1        1        5        2        1.43      0.57
3     1        0        4        1        0.80      0.20
3.5   0        1        3        1        0.75      0.25
4     1        0        3        0        1.00      0.00
4.5   1        0        2        0        1.00      0.00

The log-rank test statistic is constructed by computing the observed and expected number of events in one of the groups at each observed event time and then adding these to obtain an overall summary across all time points where there is an event.

χ² = 3.07; p-value = 0.0798
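The expected counts in the table and the reported statistic can be recomputed from the twelve subjects of the earlier example. This is a sketch under one assumption that the slides do not state: the variance of the observed-minus-expected sum is taken to be the usual hypergeometric variance at each event time. The function name is illustrative.

```python
import math

def logrank_two_groups(times, events, groups):
    """Two-group log-rank test; returns (chi2, p_value, expected events in group 1).

    times  -- observed durations
    events -- 1 = event, 0 = censored
    groups -- group label per subject, here 1 or 2
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_1 = expected_1 = variance = 0.0
    for t in event_times:
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == 1)
        n2 = sum(1 for u, g in zip(times, groups) if u >= t and g == 2)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e == 1 and g == 1)
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        n = n1 + n2
        observed_1 += d1
        expected_1 += d * n1 / n
        if n > 1:
            # Assumed hypergeometric variance of the group 1 event count at time t.
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    chi2 = (observed_1 - expected_1) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square, 1 df
    return chi2, p_value, expected_1

times = [1, 2, 3, 4, 4.5, 5, 0.5, 0.75, 1, 1.5, 2, 3.5]
events = [1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1]
groups = [1] * 6 + [2] * 6

chi2, p_value, expected_1 = logrank_two_groups(times, events, groups)
print(round(chi2, 2), round(p_value, 4), round(expected_1, 2))
```

Summing the "Group 1 Expected" column gives about 7.22 expected events against 5 observed, which is the difference the statistic is built on.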



What to avoid

- Compare mean survival ⇒ Censoring makes this meaningless
- Overinterpret the tail of a survival curve ⇒ There are generally few subjects in the tails
- Compare proportions surviving at a fixed time ⇒ Fine for description, not for hypothesis testing

Cox Proportional Hazards Regression for Survival Data



Outline

Some simple distributions

The Cox PH model

Model diagnostics




Survival distributions

- Survival analysis focuses on the distribution of survival times.
- Although there are well-known methods for estimating unconditional survival distributions, most interesting survival modeling examines the relationship between survival and one or more predictors.
- In principle, every distribution on R+ can serve to characterize survival data.
  - Constant hazard
  - Gompertz distribution
  - Weibull distribution

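Modeling of survival data usually employs the hazard function

$$h(t) = \lim_{\Delta t \to 0} \frac{\Pr(t < T \le t + \Delta t \mid T > t)}{\Delta t}$$

- Constant hazard: $h(t) = \lambda \;\Rightarrow\; S(t) = e^{-\lambda t}$
- Gompertz: $h(t) = a e^{bt},\ a > 0,\ b > 0 \;\Rightarrow\; S(t) = e^{\frac{a}{b}[1 - e^{bt}]}$
- Weibull: $h(t) = \lambda a t^{a-1} \;\Rightarrow\; S(t) = e^{-\lambda t^{a}}$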
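Each survival function follows from integrating the hazard and applying S(t) = exp{−H(t)} from Lecture 6. The intermediate cumulative hazards are not shown on the slide; they are worked out here as a check:

```latex
\begin{aligned}
\text{Constant hazard:}\quad & H(t) = \int_0^t \lambda\,ds = \lambda t
  && \Rightarrow\; S(t) = e^{-\lambda t}, \\
\text{Gompertz:}\quad & H(t) = \int_0^t a e^{bs}\,ds = \frac{a}{b}\bigl(e^{bt} - 1\bigr)
  && \Rightarrow\; S(t) = e^{\frac{a}{b}[1 - e^{bt}]}, \\
\text{Weibull:}\quad & H(t) = \int_0^t \lambda a s^{a-1}\,ds = \lambda t^{a}
  && \Rightarrow\; S(t) = e^{-\lambda t^{a}}.
\end{aligned}
```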


The Cox PH model

Regression-like model

A parametric model based on the exponential distribution may be written as

$$\log h_i(t) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$$

The constant $\beta_0$ represents a kind of log-baseline hazard.

The Cox model

The Cox model leaves the log-baseline hazard function $\beta_0(t) = \log h_0(t)$ unspecified:

$$\log h_i(t) = \beta_0(t) + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}$$

The model is semiparametric because, while the baseline hazard can take any form, the covariates enter the model linearly.

- The baseline hazard does not depend on covariates, but only on time.
- The covariates are time-constant.
- The proportional hazards assumption follows.

The hazard ratio

For two observations i and j, the hazard ratio

$$
\begin{aligned}
\frac{h_i(t)}{h_j(t)}
  &= \frac{h_0(t)\exp(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})}{h_0(t)\exp(\beta_1 x_{j1} + \cdots + \beta_p x_{jp})} \\
  &= \frac{\exp(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})}{\exp(\beta_1 x_{j1} + \cdots + \beta_p x_{jp})} \\
  &= \exp\left(\sum_{l=1}^{p} \beta_l (x_{il} - x_{jl})\right)
\end{aligned}
$$

is independent of time t. Consequently, the Cox model is a proportional hazards model.

The hazard ratio: an example

- Only one covariate: Treatment
  - x_i = 1 ⇒ Placebo
  - x_j = 0 ⇒ Treatment
- The hazard ratio is then exp(β1).
- We expect that the hazard is larger in the placebo group, i.e. the hazard ratio is expected to be greater than 1.
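The cancellation of the baseline hazard can be illustrated numerically. This is a minimal sketch with made-up inputs: the coefficient 0.7 and the baseline hazard are hypothetical values chosen for illustration, not estimates from the lecture, and the function names are not from any library.

```python
import math

def hazard(t, x, beta, baseline):
    """Cox-type hazard h(t | x) = h0(t) * exp(beta . x)."""
    linear = sum(b * xv for b, xv in zip(beta, x))
    return baseline(t) * math.exp(linear)

def hazard_ratio(x_i, x_j, beta):
    """exp(sum_l beta_l (x_il - x_jl)); the baseline hazard cancels."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x_i, x_j)))

def baseline(t):
    # Hypothetical baseline hazard h0(t); any positive function of time works.
    return 0.1 + 0.05 * t

beta = [0.7]                 # hypothetical coefficient for the treatment covariate
placebo, treated = [1], [0]  # coding from the slide: 1 = placebo, 0 = treatment

ratios = [hazard(t, placebo, beta, baseline) / hazard(t, treated, beta, baseline)
          for t in (0.5, 2.0, 10.0)]
print(ratios)                                  # the same value at every time point
print(hazard_ratio(placebo, treated, beta))    # equals exp(beta_1)
```

A positive coefficient gives a ratio above 1, matching the expectation that the hazard is larger in the placebo group.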

Time-constant covariates

- Not changing over time (e.g. gender)
- Values are set at time t = 0
- Variables unlikely to change are often considered time-constant
- Other variables are sometimes treated as time-independent
- Time-dependent covariates are allowed, but then the PH assumption is not satisfied (an extended Cox model is needed)

Advantages

- Robustness
- Because of the model form, the estimated hazards are always non-negative
- We can estimate fixed effects and compute the hazard ratio even though the baseline hazard is left unspecified


Model diagnostics

Checking proportional hazards

- Tests and graphical diagnostics for PH may be based on scaled Schoenfeld residuals
- Influential observations
- Nonlinearity

Survival Analysis in STATA


Outline

Introduction

Coding

Kaplan-Meier

PH Cox model



Introduction

Aim

Illustrate how to use Stata to

- prepare survival data for analysis
- estimate hazard and survival functions

Data manipulation

Some manipulation of the data is needed to facilitate summary and analysis.

help st

Assumptions

- Continuous time survival data
- Single failure data, i.e. one record per unit
- No complications such as truncation and/or missing values
- Data do not need to be weighted

Data structure

Data have a very simple structure

- One row per unit (e.g. subject)
- The survival time and the censoring status must be included as variables (1 = failure, 0 = otherwise)
- Covariates (explanatory variables) could be included

Data description


Coding

stset

stset declares the data in memory to be st data

- Main
  - Time variable ⇒ survival time
  - Failure variable ⇒ censoring status
- Options
  - Origin time expression sets when a subject becomes at risk
  - Enter time expression specifies when a subject first comes under observation
  - Exit time expression specifies the latest time under which the subject is both under observation and at risk.

stset in practice

stset: example

Using stset

New variables appear in the data: why? What do they mean? Should you use them?

- _st is a binary variable indicating cases included (1) or excluded (0) from the analysis
- _d is a censoring indicator
- _t is the survival time
- _t0 is the time at which units are observed to be at risk

Summary statistics

You must stset your data before using

- stdescribe produces a summary of the st data
- stsum summarizes survival-time data


Kaplan-Meier

- Simple single-spell type
- Right censoring
- No left censoring (truncation)

sts

Survival times are treated as observations on a continuous variable

- sts list
- sts graph
- sts test
- sts generate

sts list

sts list: example

sts graph

sts graph: example

[Figure: Kaplan-Meier survival estimate; x-axis: analysis time (0 to 1000); y-axis ticks from 0.00 to 1.00.]

[Figure: Nelson-Aalen cumulative hazard estimate; x-axis: analysis time (0 to 1000); y-axis ticks from 0.00 to 3.00.]

[Figure: Smoothed hazard estimate; x-axis: analysis time (0 to 800); y-axis ticks from .001 to .005.]

sts graph: stratification

[Figure: Kaplan-Meier survival estimates for sex = 1 and sex = 2; x-axis: analysis time (0 to 1000); y-axis ticks from 0.00 to 1.00.]

sts test


PH Cox model

stcox

stcox: options for model checking

stcox: example

stphplot: model checking

[Figure: −ln[−ln(Survival Probability)] plotted against ln(analysis time) for sex = 1 and sex = 2.]

estat phtest: model checking