Joint Modelling - Challenges and Future Directions
Joint Modelling of Longitudinal and Time-to-Event Data: Challenges and Future Directions
Dimitris Rizopoulos
Department of Biostatistics, Erasmus Medical Center, the Netherlands
Italian Statistical Society Annual Meeting
June 16th, 2010, Padova
Outline
• Basics of the Joint Modelling Framework
• Residuals
• Dynamic Predictions
SIS 2010, Padova 1/33
1.1 Introduction
• Over the last 10-15 years increasing interest in joint modelling of longitudinal and time-to-event data (Tsiatis & Davidian, Stat. Sinica, 2004; Yu et al., Stat. Sinica, 2004)
• Wide range of applications
◃ HIV studies
◃ cancer studies
◃ surrogate markers
◃ . . .
1.2 When Are These Models Applicable?
• In many cases, we are interested in the effect of time-dependent covariates on survival times
◃ treatment changes with time (e.g., dose)
◃ time-dependent exposure (e.g., smoking)
◃ longitudinal measurements on the patient level (e.g., blood values)
◃ . . .
1.3 Case Study – AIDS Data
• Longitudinal study on 467 HIV infected patients who had failed or were intolerant of zidovudine therapy
• Outcomes
◃ Time to death
◃ randomized treatment: didanosine (ddI) and zalcitabine (ddC)
◃ longitudinal measurements of CD4 cell counts
1.3 Case Study – AIDS Data (cont’d)
[Figure: CD4 cell count against time for a single patient, annotated "Patient Died" and "Time to Death = 2.9 months"]
1.3 Case Study – AIDS Data (cont’d)
[Figure: CD4 cell count against months (0, 2, 6, 12, 18), one panel per treatment group (ddC, ddI)]
1.3 Case Study – AIDS Data (cont’d)
[Figure: survival probability against months for the ddC and ddI groups. Died: 188 (40.3%)]
2.1 Requirement for Joint Modelling
• Aim: we want to measure the effect of a time-dependent covariate on the hazard for an event
• Problem: CD4 cell count is an internal time-dependent covariate with a stochastic nature
◃ we do not observe the true values, only values contaminated with measurement error
◃ the complete history is not available
2.1 Requirement for Joint Modelling (cont’d)
[Figure: CD4 cell count against time up to the event time, contrasting the values measured with error with the underlying true evolution]
2.1 Requirement for Joint Modelling (cont’d)
• Aim: we want to measure the effect of a time-dependent covariate on the hazard for an event
• Problem: internal covariates have a stochastic nature
◃ we do not observe the true values, only values contaminated with measurement error
◃ the complete history is not available
• Solution: Joint Modelling of Longitudinal and Time-to-Event Data
2.2 Joint Modelling Framework
• Step 1: let’s assume that we know mi(t), i.e., the true & unobserved value of CD4 cell count at time t
• Then, we can define a standard relative risk model
hi(t | Mi(t)) = h0(t) exp{γ⊤wi + αmi(t)},
where
◃ Mi(t) = {mi(s), 0 ≤ s < t} is the CD4 cell count history up to time t
◃ α quantifies the effect of CD4 cell count on the hazard for death
◃ wi are baseline covariates
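The relative risk model above can be evaluated directly once a baseline hazard and a marker trajectory are supplied. The following is a minimal sketch, not the speaker's implementation: the function name `hazard` and all numerical inputs are hypothetical and chosen only for illustration.

```python
import math

def hazard(t, baseline_hazard, gamma, w, alpha, m):
    """h_i(t | M_i(t)) = h0(t) * exp(gamma' w_i + alpha * m_i(t))."""
    linear_predictor = sum(g * x for g, x in zip(gamma, w)) + alpha * m(t)
    return baseline_hazard(t) * math.exp(linear_predictor)

# Hypothetical inputs (assumptions, not estimates from the AIDS data).
h0 = lambda t: 0.05              # constant baseline hazard
m_i = lambda t: 3.0 - 0.1 * t    # assumed true marker trajectory
gamma, w_i, alpha = [0.3], [1.0], -1.0

h_at_2 = hazard(2.0, h0, gamma, w_i, alpha, m_i)

# A one-unit increase in m_i(t) multiplies the hazard by exp(alpha).
hazard_ratio_per_unit = math.exp(alpha)
```

With a negative α, as in this sketch, higher marker values correspond to a lower hazard.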
2.2 Joint Modelling Framework (cont’d)
• Step 2: from the observed longitudinal response yi(t), reconstruct the covariate history for each subject
• Mixed effects model
\[ y_i(t) \mid b_i = m_i(t) + \varepsilon_i(t) = x_i^\top(t)\beta + z_i^\top(t) b_i + \varepsilon_i(t), \qquad \varepsilon_i(t) \sim N(0, \sigma^2) \]
where
◃ xi(t) and β: fixed effects part
◃ zi(t) and bi: random effects part
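To make the distinction between mi(t) and yi(t) concrete, one can simulate a subject from a linear mixed model with a random intercept and a random slope. This is an illustrative sketch: the function name, the parameter values and the choice of independent random effects (a diagonal D) are assumptions, not part of the slides.

```python
import random

def simulate_subject(times, beta, d_sd, sigma, rng):
    """Return (true values m_i(t), observed values y_i(t)) at the given times.

    beta  = (fixed intercept, fixed slope)
    d_sd  = (sd of random intercept, sd of random slope), assumed independent
    sigma = sd of the measurement error
    """
    b0 = rng.gauss(0.0, d_sd[0])
    b1 = rng.gauss(0.0, d_sd[1])
    true_values = [beta[0] + b0 + (beta[1] + b1) * t for t in times]
    observed = [m + rng.gauss(0.0, sigma) for m in true_values]
    return true_values, observed

rng = random.Random(2010)
visit_times = [0, 2, 6, 12, 18]  # months, as in the AIDS data visits
m_true, y_obs = simulate_subject(visit_times, (2.5, -0.04), (0.9, 0.04), 0.4, rng)
```

The survival submodel uses `m_true`; only `y_obs` would be available in practice.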
2.2 Joint Modelling Framework (cont’d)
• Step 3: the two processes are associated ⇒ define a model for their joint distribution
• Joint models for such joint distributions are of the following form (Tsiatis & Davidian, Stat. Sinica, 2004)
\[ p(y_i, T_i, \delta_i) = \int p(y_i \mid b_i) \left\{ h_i(T_i \mid b_i)^{\delta_i} \, S_i(T_i \mid b_i) \right\} p(b_i) \, db_i \]
where
◃ yi: longitudinal measurements; Ti: time to event; δi: event indicator
◃ bi a vector of random effects that explains the interdependencies
◃ p(·) density function; S(·) survival function
2.2 Joint Modelling Framework (cont’d)
• Assumptions for the baseline hazard function h0(t): a choice that often works satisfactorily is a piecewise-constant baseline hazard
\[ h_0(t) = \sum_{q=1}^{Q} \xi_q \, I(v_{q-1} < t \le v_q) \]
where 0 = v0 < v1 < · · · < vQ denotes a split of the time scale
• For the random effects: bi ∼ N(0, D)
◃ sensitivity to misspecification: minimal, especially as ni increases (Rizopoulos, Verbeke & Molenberghs, Biometrika, 2008)
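The piecewise-constant baseline hazard is straightforward to code, and its integral (the cumulative baseline hazard) has a closed form. A minimal sketch, with hypothetical knots and ξ values chosen for illustration:

```python
import bisect

def baseline_hazard(t, knots, xi):
    """h0(t) = xi_q for v_{q-1} < t <= v_q, with knots = [v_0 = 0, v_1, ..., v_Q]."""
    if not knots[0] < t <= knots[-1]:
        raise ValueError("t must lie in (v_0, v_Q]")
    q = bisect.bisect_left(knots, t)  # smallest q with v_q >= t
    return xi[q - 1]

def cumulative_baseline_hazard(t, knots, xi):
    """Integral of h0 over (0, t]: sum of xi_q times the overlap with each interval."""
    total = 0.0
    for q in range(1, len(knots)):
        width = min(t, knots[q]) - knots[q - 1]
        if width <= 0:
            break
        total += xi[q - 1] * width
    return total

# Hypothetical split of the time scale into Q = 3 intervals.
knots = [0.0, 2.0, 5.0, 10.0]
xi = [0.1, 0.2, 0.4]
h_at_knot = baseline_hazard(2.0, knots, xi)        # t = v_1 belongs to the first interval
H_at_7 = cumulative_baseline_hazard(7.0, knots, xi)
```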
2.3 Estimation
• Estimation of joint models is a computationally challenging task
◃ computation of survival function: integration with respect to time
\[ S_i(t \mid b_i) = \exp\left( -\int_0^t h_0(s) \exp\{\gamma^\top w_i + \alpha m_i(s)\} \, ds \right) \]
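◃ computation of log-likelihood: integration with respect to the random effects
\[ \ell(\theta) = \sum_i \log \int p(y_i \mid b_i; \theta) \left\{ p(T_i \mid b_i; \theta)^{\delta_i} \, S_i(T_i \mid b_i; \theta)^{1-\delta_i} \right\} p(b_i; \theta) \, db_i \]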
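The time integral in the survival function generally has no closed form, so it is approximated numerically. The sketch below uses composite Simpson's rule, which is a stand-in chosen for brevity and not necessarily the rule used in practice; the function name and all inputs are hypothetical. A linear marker trajectory with a constant baseline hazard is used because the exact answer is then available for comparison.

```python
import math

def survival(t, h0, gamma_w, alpha, m, n_steps=200):
    """S_i(t | b_i) = exp(-int_0^t h0(s) exp{gamma'w_i + alpha m_i(s)} ds),
    with the integral approximated by composite Simpson's rule."""
    if t == 0:
        return 1.0
    n = n_steps + (n_steps % 2)  # Simpson's rule needs an even number of steps
    step = t / n
    integrand = lambda s: h0(s) * math.exp(gamma_w + alpha * m(s))
    total = integrand(0.0) + integrand(t)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(k * step)
    return math.exp(-total * step / 3.0)

# Hypothetical inputs: constant baseline hazard, linear marker trajectory.
h0 = lambda s: 0.1
m_i = lambda s: 2.0 + 0.3 * s
alpha, gamma_w, t = -0.5, 0.0, 4.0

s_numeric = survival(t, h0, gamma_w, alpha, m_i)

# Closed form for this special case, used only as a check.
cum_hazard = 0.1 * math.exp(alpha * 2.0) * (1.0 - math.exp(alpha * 0.3 * t)) / (-alpha * 0.3)
s_exact = math.exp(-cum_hazard)
```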
2.3 Estimation (cont’d)
• Combination of optimization and numerical integration
• Numerical integration
◃ Gaussian quadrature
◃ Monte Carlo
◃ Laplace (especially useful in high-dimensional settings) (Rizopoulos, Verbeke & Lesaffre, JRSSB, 2009)
• Optimization
◃ EM algorithm
◃ Newton-Raphson or quasi-Newton algorithm
◃ hybrid algorithms (combination of EM & quasi-Newton)
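Of the numerical integration options listed above, plain Monte Carlo is the simplest to sketch. The example approximates one subject's likelihood contribution by averaging the integrand over draws of the random effect. It assumes a deliberately simplified model (random intercept only, a marker that is constant in time, a constant baseline hazard); the function names and the parameter dictionary are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mc_likelihood(y, T, delta, theta, n_draws=20_000, seed=1):
    """Monte Carlo estimate of int p(y | b) h(T | b)^delta S(T | b) p(b) db.

    Simplifying assumptions: m_i(t) = beta0 + b for all t, b ~ N(0, d^2),
    and hazard h(t | b) = h0 * exp(alpha * m_i(t)), which is constant in t.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        b = rng.gauss(0.0, theta["d"])        # draw from p(b)
        marker = theta["beta0"] + b
        p_y = 1.0
        for y_j in y:                          # p(y | b): independent normal errors
            p_y *= normal_pdf(y_j, marker, theta["sigma"])
        haz = theta["h0"] * math.exp(theta["alpha"] * marker)
        surv = math.exp(-haz * T)              # S(T | b) for a constant hazard
        total += p_y * (haz ** delta) * surv
    return total / n_draws

theta = {"beta0": 2.5, "d": 1.0, "sigma": 0.5, "h0": 0.1, "alpha": -1.0}
contribution = mc_likelihood(y=[2.8, 2.6], T=3.0, delta=1, theta=theta)
```

An optimizer such as EM or quasi-Newton would then maximise the sum of the logs of these contributions over θ. Gaussian quadrature or a Laplace approximation replaces the random draws with deterministic evaluation points.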
3.1 A Joint Model for the AIDS Data
• Longitudinal submodel
◃ fixed effects: time effect + interaction of treatment with time
◃ random effects: intercept + time effect
• Survival submodel
◃ treatment effect + underlying CD4 cell count effect
◃ piecewise-constant baseline hazard in 7 intervals
3.1 A Joint Model for the AIDS Data (cont’d)
| Survival | value (std.err) | Longitudinal | value (std.err) | Random Effects | value |
| Treat | 0.35 (0.15) | Inter | 2.56 (0.04) | Inter | 0.87 |
| CD4 | −1.10 (0.12) | Time | −0.04 (0.005) | Time | 0.04 |
| | | Treat:Time | 0.01 (0.01) | ρ | 0.07 |
| | | | | σ | 0.38 |
• Highly significant association between the underlying CD4 cell count at time t and the risk for death at t
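As a rough check on the strength of this association, one can form a Wald statistic from the reported CD4 estimate and its standard error. This is a sketch under the assumption of an approximately normal sampling distribution; the slides do not state which test was used.

```python
import math

# Reported CD4 effect in the survival submodel.
estimate, std_err = -1.10, 0.12

wald_z = estimate / std_err
# Two-sided p-value under a standard normal reference distribution (an assumption).
p_value = math.erfc(abs(wald_z) / math.sqrt(2.0))
```

The statistic is about nine standard errors away from zero, which is in line with the statement on the slide.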
3.2 A Comparison of JM vs naive TD Cox
• To illustrate the virtues of joint modelling, we compare with the standard time-dependent Cox model
◃ i.e., we ignore the measurement error in the CD4 cell count
| | Joint Model: value (std.err) | Naive TD Cox: value (std.err) |
| Treat | 0.35 (0.15) | 0.33 (0.15) |
| CD4 | −1.10 (0.12) | −0.72 (0.08) |
• Clearly, ignoring the measurement error has a considerable effect, especially on the estimated effect of CD4!
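The size of the difference is easier to judge on the hazard-ratio scale. The following arithmetic sketch uses only the two CD4 estimates from the table; the variable names are illustrative.

```python
import math

alpha_joint, alpha_naive = -1.10, -0.72

# Hazard ratio for a one-unit increase in the CD4 measure under each model.
hr_joint = math.exp(alpha_joint)
hr_naive = math.exp(alpha_naive)

# Proportion by which the naive estimate is smaller in magnitude.
relative_shrinkage = 1.0 - alpha_naive / alpha_joint
```

The naive time-dependent Cox estimate is roughly a third smaller in magnitude than the joint model estimate, so the naive analysis understates the protective association of higher CD4 values.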
3.2 A Comparison of JM vs naive TD Cox (cont’d)
[Figure: CD4 cell count against time up to the event time, with one curve for the joint model and one for the time-dependent Cox model]
4.1 Extensions
• Joint modelling of longitudinal and time-to-event data is a very active area of current Biostatistics research
• Several extensions of the standard joint model have been proposed
◃ recurrent events
◃ competing risks
◃ semiparametric modelling of longitudinal profiles
◃ semiparametric modelling of random effects distribution
◃ . . .
4.2 Challenges: Checking Model Assumptions
• Various types of residuals have been proposed to check the fit of mixed models and survival models separately
• However, not all of these can be used directly for joint models
• Problems arise due to nonrandom dropout, i.e., the longitudinal evolutions that we end up observing are not a random sample of the target population
⇓
The reference distribution of the observed residuals is not directly evident
4.2 Challenges: Checking Model Assumptions
• $y_i^o$: longitudinal measurements before Ti; $y_i^m$: longitudinal measurements after Ti
• Missing data mechanism:
\[ p(T_i \mid y_i^o, y_i^m) = \int p(T_i \mid b_i) \, p(b_i \mid y_i^o, y_i^m) \, db_i \]
• This still depends on $y_i^m$, which means nonrandom dropout
• Observed measurements are not a random sample of the target population
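A small simulation illustrates why the observed measurements stop being a random sample. In this sketch the marker level of each subject is a random intercept, the hazard decreases with the marker, and we compare the average random effect in the whole population with the average among subjects still under follow-up at a later time. All names and parameter values are hypothetical.

```python
import math
import random

def simulate_dropout(n_subjects=20_000, beta0=0.0, h0=0.5, alpha=-1.0,
                     t_star=2.0, seed=1):
    """Return (mean random effect overall, mean among subjects with T > t_star)."""
    rng = random.Random(seed)
    all_b, surviving_b = [], []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, 1.0)                       # random intercept
        rate = h0 * math.exp(alpha * (beta0 + b))      # constant hazard given b
        event_time = rng.expovariate(rate)
        all_b.append(b)
        if event_time > t_star:                        # still observed at t_star
            surviving_b.append(b)
    mean = lambda values: sum(values) / len(values)
    return mean(all_b), mean(surviving_b)

population_mean, survivor_mean = simulate_dropout()
```

With a negative α, subjects with low marker values tend to have the event earlier, so `survivor_mean` lies above `population_mean`. Residuals computed from the observed data inherit this selection.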
4.2 Challenges: Checking Model Assumptions
• Little work has been done on checking the assumptions of joint models
• Two approaches
◃ Conditional residuals (Dobson & Henderson, Biometrics, 2003)
◃ Multiple Imputation residuals (Rizopoulos, Verbeke & Molenberghs, Biometrics, 2010)
* appropriately multiply impute $y_i^m$
* complete data ⇒ use standard residual plots
• More work is required
4.3 Challenges: Dynamic Prediction
• Increasing interest in individualized predictions
◃ e.g., how treatment will work on a specific patient (with a specific medical history)
• Within the Joint Modelling Framework
◃ estimate the probability for an event for a specific patient
◃ how this probability changes as longitudinal measurements are collected
• Dynamic Prediction: update probabilities for an event dynamically as longitudinal information is recorded
4.3 Challenges: Dynamic Prediction (cont’d)
• Example: we take two patients who provided 4 CD4 cell count measurements each
4.3 Challenges: Dynamic Prediction (cont’d)
[Figure: CD4 cell count against months (0, 2, 6, 12) for Patient 20 and Patient 7]
• We are interested in comparing predicted survival probabilities for these two patients
4.3 Challenges: Dynamic Prediction (cont’d)
• More formally, we have available measurements up to time point t
\[ \mathcal{Y}_i(t) = \{ y_i(s), \; 0 \le s \le t \} \]
and we are interested in
\[ \pi_i(u \mid t) = \Pr\{ T_i \ge u \mid T_i > t, \mathcal{Y}_i(t), \mathcal{D}_n \} \]
where
◃ u > t, and
◃ Dn denotes the sample on which the joint model was fitted
4.3 Challenges: Dynamic Prediction (cont’d)
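• It is convenient to proceed using a Bayesian formulation of the problem. More specifically, πi(u | t) can be written as
\[ \Pr\{ T_i \ge u \mid T_i > t, \mathcal{Y}_i(t), \mathcal{D}_n \} = \int \Pr\{ T_i \ge u \mid T_i > t, \mathcal{Y}_i(t); \theta \} \, p(\theta \mid \mathcal{D}_n) \, d\theta \]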
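• The first part of the integrand reduces to
\[ \Pr\{ T_i \ge u \mid T_i > t, \mathcal{Y}_i(t); \theta \} = \int \frac{ S_i\{ u \mid \mathcal{M}_i(u, b_i, \theta); \theta \} }{ S_i\{ t \mid \mathcal{M}_i(t, b_i, \theta); \theta \} } \, p(b_i \mid T_i > t, \mathcal{Y}_i(t); \theta) \, db_i \]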
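For a fixed θ, the inner integral can be approximated by importance sampling: draw random effects from their prior, weight each draw by p(yi | bi) Si(t | bi), and average the ratio Si(u | bi) / Si(t | bi). The sketch below is one possible way to do this, not the method used in the JM package. It assumes a linear marker trajectory with independent random intercept and slope and a constant baseline hazard, and it plugs in a single θ instead of integrating over p(θ | Dn). All names and values are hypothetical.

```python
import math
import random

def cumulative_hazard(t, h0, alpha, intercept, slope):
    """Closed form of int_0^t h0 * exp(alpha * (intercept + slope * s)) ds."""
    rate = alpha * slope
    scale = h0 * math.exp(alpha * intercept)
    if abs(rate) < 1e-12:
        return scale * t
    return scale * math.expm1(rate * t) / rate

def conditional_survival(u, t, times, y, theta, n_draws=5000, seed=1):
    """Importance-sampling estimate of Pr(T >= u | T > t, Y(t); theta)."""
    rng = random.Random(seed)
    numerator = denominator = 0.0
    for _ in range(n_draws):
        # Draw b from its prior; the weight below turns this into a posterior average.
        intercept = theta["beta0"] + rng.gauss(0.0, theta["d0"])
        slope = theta["beta1"] + rng.gauss(0.0, theta["d1"])
        sq_err = sum((y_j - intercept - slope * t_j) ** 2 for t_j, y_j in zip(times, y))
        H_t = cumulative_hazard(t, theta["h0"], theta["alpha"], intercept, slope)
        H_u = cumulative_hazard(u, theta["h0"], theta["alpha"], intercept, slope)
        # Weight proportional to p(y | b) * S(t | b).
        weight = math.exp(-sq_err / (2.0 * theta["sigma"] ** 2) - H_t)
        numerator += weight * math.exp(-(H_u - H_t))  # S(u | b) / S(t | b)
        denominator += weight
    return numerator / denominator

theta = {"beta0": 3.0, "beta1": -0.02, "d0": 0.8, "d1": 0.05,
         "sigma": 0.4, "h0": 1.0, "alpha": -1.0}
times = [0.0, 2.0, 6.0, 12.0]
y_high = [3.6, 3.5, 3.5, 3.4]   # hypothetical patient with high, stable values
y_low = [2.4, 2.3, 2.1, 2.0]    # hypothetical patient with low, declining values

pi_high = conditional_survival(18.0, 12.0, times, y_high, theta)
pi_low = conditional_survival(18.0, 12.0, times, y_low, theta)
```

Calling `conditional_survival` again after each new measurement, with the longer `times` and `y` and the later `t`, gives the dynamic update. A full implementation would also average over draws of θ.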
4.3 Challenges: Dynamic Prediction (cont’d)
[Figure: predicted probabilities (axis from 0.6 to 1.0) of surviving an extra 2, 4 and 6 months for Patient 20 and Patient 7, computed at baseline and after 2, 6 and 12 months]
5 Software
• Software: R package JM freely available via http://cran.r-project.org/
◃ http://rwiki.sciviews.org/doku.php?id=packages:cran:jm
• Longitudinal process
◃ linear mixed effects model
• Survival process
◃ Wulfsohn and Tsiatis model (Biometrics, 1997)
◃ Weibull accelerated failure time & PH model
◃ PH model with piecewise-constant baseline hazard
◃ PH model with spline-approximated baseline hazard
Thank you for your attention!