Slides: leemcdaniel.github.io/presentations/addhaz.pdf
Additive Hazards
1 / 34
Motivation
Let $\lambda(t \mid Z_i)$ be the conditional hazard function given the covariate process
$$Z_i(t) = (Z_{i1}(t), \ldots, Z_{ip}(t))'.$$
Then an extraordinarily flexible model is
$$\lambda(t \mid Z_i) = \alpha(t, Z_i(t)), \tag{1}$$
with $\alpha$ an unknown function of time and the covariates.
2 / 34
Motivation
• First assume $\alpha(t, 0) = 0$.
• Then take a first-order Taylor expansion of $\alpha(t, z)$ about $z = 0$.
To first order, (1) then reduces to
$$\lambda(t \mid Z_i) = \sum_{j=1}^{p} \alpha_j(t) Z_{ij}(t). \tag{2}$$
This is Aalen’s additive hazard model.
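The reduction can be spelled out in one line. This is only a sketch: it assumes $\alpha$ is differentiable in $z$ and reads $\alpha_j(t)$ as the partial derivative at $z = 0$, which the slides do not state explicitly.

$$
\alpha(t, z) \approx \alpha(t, 0) + \sum_{j=1}^{p} \left.\frac{\partial \alpha(t, z)}{\partial z_j}\right|_{z=0} z_j
= \sum_{j=1}^{p} \alpha_j(t)\, z_j,
\qquad
\alpha_j(t) := \left.\frac{\partial \alpha(t, z)}{\partial z_j}\right|_{z=0}.
$$

Substituting $z = Z_i(t)$ gives the additive form.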
3 / 34
Another Form
The previous model is the original formulation. Let’s instead use
$$\lambda(t \mid Z_j(t)) = \beta_0(t) + \sum_{k=1}^{p} \beta_k(t) Z_{jk}(t) \tag{3}$$
to follow the notation in Klein and Moeschberger.
4 / 34
A Quick Definition
First we make a design matrix $X(t)$, an $n \times (p + 1)$ matrix. For the $i$th row, set
$$X_i(t) = Y_i(t)\,(1, Z_i(t)).$$
So the $i$th row of $X(t)$ is $(1, Z_i(t))$ if subject $i$ is at risk at time $t$, and $0$ otherwise.
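As a rough illustration, the design matrix can be built in a few lines of base R. The sketch assumes time-fixed covariates and that a subject is at risk up to and including its exit time; the names `design_matrix`, `time`, and `Z` are illustrative and do not come from the slides.

```r
# Build X(t): row i is (1, Z_i) if subject i is at risk at time t, else all zeros.
# Assumption: covariates are fixed in time; `time` holds each subject's exit time.
design_matrix <- function(t, time, Z) {
  Z <- as.matrix(Z)
  at_risk <- as.numeric(time >= t)  # Y_i(t)
  at_risk * cbind(1, Z)             # row i is multiplied by Y_i(t)
}

time <- c(2, 5, 7)
Z <- cbind(z1 = c(0, 1, 1), z2 = c(1, 0, 1))
X <- design_matrix(4, time, Z)
print(X)
```

At t = 4 the first subject has already left the risk set, so its row is zero.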
5 / 34
Martingale Form
By the form in (3),
$$M(t) = N(t) - \int_0^t X(u)\beta(u)\,du,$$
where $M(t)$ is an $n \times 1$ vector of martingales. So
$$dN(t) = X(t)\beta(t)\,dt + dM(t).$$
Now set $dM(t) = 0$, since the martingale increments have mean zero.
6 / 34
Martingale Form
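$$\int_0^t \beta(u)\,du = \int_0^t X^-(u)\,dN(u) = \sum_{T_j \le t} \delta_j X^-(T_j), \quad \text{for } t \le \tau.$$
This gives a way to estimate the cumulative coefficients $\int_0^t \beta(u)\,du$.
Note that $X^-(u)$ can be any generalized inverse of $X(u)$.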
7 / 34
Estimation
We won't estimate $\beta_k(t)$ directly; instead we estimate
$$B_k(t) = \int_0^t \beta_k(u)\,du$$
for each $k$.
$B_k(t)$ is called a cumulative regression function.
We will estimate it by least squares.
8 / 34
Another Definition
Let $I(t)$ be the $n \times 1$ vector with $i$th element equal to 1 if subject $i$ dies at $t$ and 0 otherwise.
This is essentially dN(t).
9 / 34
Estimation
The least-squares estimate of $B(t) = (B_0(t), \ldots, B_p(t))'$ is
$$\hat{B}(t) = \sum_{T_i \le t} [X'(T_i)X(T_i)]^{-1} X'(T_i) I(T_i). \tag{4}$$
This uses the generalized inverse suggested by Aalen.
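A minimal base-R sketch of estimator (4) follows. It assumes time-fixed covariates and no tied event times, and it stops accumulating once X'X becomes numerically singular. The function name `aalen_ls` and the tolerance `1e-10` are illustrative choices, not taken from the slides or from any package.

```r
# Least-squares estimate of the cumulative regression functions B(t).
# Assumptions: time-fixed covariates, no tied event times, status == 1 marks a death.
aalen_ls <- function(time, status, Z) {
  Z <- as.matrix(Z)
  event_times <- sort(time[status == 1])
  p1 <- ncol(Z) + 1
  B <- matrix(0, nrow = length(event_times), ncol = p1)
  cum <- numeric(p1)
  n_used <- 0
  for (k in seq_along(event_times)) {
    tk <- event_times[k]
    X <- as.numeric(time >= tk) * cbind(1, Z)  # design matrix X(tk)
    I <- as.numeric(time == tk & status == 1)  # death indicator I(tk)
    XtX <- crossprod(X)
    if (rcond(XtX) < 1e-10) break              # X'X not invertible: stop here
    cum <- cum + drop(solve(XtX, crossprod(X, I)))
    B[k, ] <- cum
    n_used <- k
  }
  keep <- seq_len(n_used)
  list(time = event_times[keep], B = B[keep, , drop = FALSE])
}

# Tiny hand-checkable example with one binary covariate.
fit <- aalen_ls(time = c(1, 2, 3, 4), status = c(1, 1, 1, 1), Z = c(0, 1, 0, 1))
print(cbind(time = fit$time, fit$B))
```

In this example the last event time is dropped because only one subject remains at risk, so X'X is singular there.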
10 / 34
Variance
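$$
\begin{aligned}
(\hat{B} - B)(t) &= \int_0^t X^-(u)\,dN(u) - \int_0^t \beta(u)\,du \\
&= \int_0^t X^-(u)\{X(u)\beta(u)\,du + dM(u)\} - \int_0^t \beta(u)\,du \\
&= \int_0^t X^-(u)\,dM(u),
\end{aligned}
$$
since $X^-(u)X(u)$ is the identity matrix, and so
$$
\begin{aligned}
\langle \hat{B} - B \rangle(t) &= \int_0^t X^-(u)\,d\langle M \rangle(u)\,X^-(u)' \\
&= \int_0^t X^-(u)\,\operatorname{diag}(Y(u)\lambda(u))\,X^-(u)'\,du.
\end{aligned}
$$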
11 / 34
Variance
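A good estimator for $Y(t)\lambda(t)\,dt$ is $dN(t)$.
So, using $I(t)$ for $dN(t)$ and the specific choice of generalized inverse,
$$\widehat{\operatorname{Var}}(\hat{B}(t)) = \sum_{T_i \le t} [X'(T_i)X(T_i)]^{-1} X'(T_i)\, D(I(T_i))\, X(T_i) \left\{[X'(T_i)X(T_i)]^{-1}\right\}',$$
where $D(I(t))$ is $\operatorname{diag}(I(t))$.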
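The variance estimator can be sketched the same way. The code below accumulates the sandwich term over event times and returns the estimate at the last event time where X'X is invertible. It assumes time-fixed covariates and no ties; `aalen_var` is an illustrative name, not a function from the slides or from the survival package.

```r
# Variance estimate for B_hat, summed over usable event times.
# Assumptions: time-fixed covariates, no tied event times, status == 1 marks a death.
aalen_var <- function(time, status, Z) {
  Z <- as.matrix(Z)
  p1 <- ncol(Z) + 1
  V <- matrix(0, p1, p1)
  for (tk in sort(time[status == 1])) {
    X <- as.numeric(time >= tk) * cbind(1, Z)  # design matrix X(tk)
    I <- as.numeric(time == tk & status == 1)  # death indicator I(tk)
    XtX <- crossprod(X)
    if (rcond(XtX) < 1e-10) break              # estimator undefined from here on
    Xminus <- solve(XtX, t(X))                 # (X'X)^{-1} X'
    V <- V + Xminus %*% diag(I, nrow = length(I)) %*% t(Xminus)
  }
  V
}

V <- aalen_var(time = c(1, 2, 3, 4), status = c(1, 1, 1, 1), Z = c(0, 1, 0, 1))
print(V)
```

With a single death at each event time, every summand is the outer product of that time's increment in the estimate, which makes small examples easy to check by hand.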
12 / 34
Estimation
• $[X'(T_i)X(T_i)]^{-1}X'(T_i)$ is one specific choice for $X^-(T_i)$
• Based on least squares
• Non-optimal
• Optimality requires the true values of parameter functions
• Estimator is defined as long as $X'(T_i)X(T_i)$ is invertible
13 / 34
Simulation
Similar (but not identical) to Aalen’s 1989 paper, we will use
λ(t) = 1 + 1.75I{t ≤ 0.2}Z1 + 3Z2
with Z1, Z2 binary and
P(Z1 = 0) = P(Z2 = 0) = 1/2
No censoring
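One way to draw survival times from this hazard is to invert the cumulative hazard, which is piecewise linear with a kink at t = 0.2. The sketch below is an assumption about how such data could be generated, not the code behind the slides; `simulate_aalen` and the seed are illustrative.

```r
# Simulate from lambda(t) = 1 + 1.75 * I{t <= 0.2} * z1 + 3 * z2 with no censoring.
# Method: draw E ~ Exp(1) and solve Lambda(T) = E for T.
set.seed(1)
simulate_aalen <- function(n) {
  z1 <- rbinom(n, 1, 0.5)
  z2 <- rbinom(n, 1, 0.5)
  early <- 1 + 1.75 * z1 + 3 * z2  # hazard on t <= 0.2
  late <- 1 + 3 * z2               # hazard on t > 0.2
  e <- rexp(n)
  time <- ifelse(e <= 0.2 * early,
                 e / early,
                 0.2 + (e - 0.2 * early) / late)
  data.frame(time = time, status = 1, z1 = z1, z2 = z2)
}

sim <- simulate_aalen(1000)
head(sim)
```

Subjects with z1 = z2 = 0 have constant hazard 1, so their times are unit exponential, which gives a quick sanity check on the generator.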
14 / 34
Cumulative Regression Functions
Initial risk set = 100
[Figure: estimated cumulative regression functions for Z1, Z2, and the intercept, plotted against time from 0 to 1.]
15 / 34
Cumulative Regression Functions
Initial risk set = 1000
[Figure: estimated cumulative regression functions for Z1, Z2, and the intercept, plotted against time from 0 to 1.]
16 / 34
Cumulative Regression Functions
Initial risk set = 10000
[Figure: estimated cumulative regression functions for Z1, Z2, and the intercept, plotted against time from 0 to 1.]
17 / 34
Testing
Aalen presents the null hypothesis:
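$$H_j : \beta_j(t) = 0 \quad \forall t$$
A test statistic for $H_j$ is given by the $(j + 1)$th element of
$$U = \sum_{T_i} K(T_i) I(T_i) = \sum_{T_i} W(T_i)[X'(T_i)X(T_i)]^{-1} X'(T_i) I(T_i), \tag{5}$$
with
$$W(t) = \left\{\operatorname{diag}\!\left[(X'(t)X(t))^{-1}\right]\right\}^{-1}.$$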
18 / 34
Testing
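The covariance matrix of $U$ can be estimated by
$$V = \sum_{T_i} K(T_i)\operatorname{diag}(I(T_i))K'(T_i). \tag{6}$$
Then the quadratic form $U'V^{-1}U$, taken over the $p$ covariate components of $U$ and the matching block of $V$, is asymptotically $\chi^2_p$ when $H_j$ holds for all $j$.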
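The test can be sketched in base R as below. The code takes W(t) to be the inverse of the diagonal of the inverse of X'X, and it drops the intercept component before forming the chi-square statistic. It assumes at least one covariate, time-fixed covariates, and no ties. `aalen_test` is an illustrative name, and the sketch is not claimed to reproduce the output of `aareg()`.

```r
# Aalen-weighted test statistic U, its covariance estimate V, and the chi-square test.
# Assumptions: p >= 1 time-fixed covariates, no tied event times.
aalen_test <- function(time, status, Z) {
  Z <- as.matrix(Z)
  p1 <- ncol(Z) + 1
  U <- numeric(p1)
  V <- matrix(0, p1, p1)
  for (tk in sort(time[status == 1])) {
    X <- as.numeric(time >= tk) * cbind(1, Z)  # design matrix X(tk)
    I <- as.numeric(time == tk & status == 1)  # death indicator I(tk)
    XtX <- crossprod(X)
    if (rcond(XtX) < 1e-10) break              # skip times where X'X is singular
    XtX_inv <- solve(XtX)
    W <- diag(1 / diag(XtX_inv), nrow = p1)    # Aalen's weights
    K <- W %*% XtX_inv %*% t(X)
    U <- U + drop(K %*% I)
    V <- V + K %*% diag(I, nrow = length(I)) %*% t(K)
  }
  idx <- 2:p1                                  # covariate components only
  chisq <- sum(U[idx] * solve(V[idx, idx, drop = FALSE], U[idx]))
  list(U = U, V = V, chisq = chisq, df = p1 - 1,
       p = pchisq(chisq, df = p1 - 1, lower.tail = FALSE))
}

res <- aalen_test(time = c(1, 2, 3, 4), status = c(1, 1, 1, 1), Z = c(0, 1, 0, 1))
print(res$chisq)
```

The four-subject example is far too small for the chi-square approximation to mean anything; it is there only so the arithmetic can be followed by hand.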
19 / 34
Efficiency Considerations
• Based on least squares
• No guarantee estimates are optimal
• No guarantee tests are optimal
20 / 34
Lung Data
• From the survival package in R: survival in patients with advanced lung cancer from the North Central Cancer Treatment Group.
• Performance scores rate how well the patient can perform usual daily activities.
• We will use the ECOG performance score as a covariate.
21 / 34
ECOG Score
Score  ECOG
0      Fully active
1      Restricted activity but ambulatory and capable of light work
2      Ambulatory, capable of self care, no work
3      Limited self care, bed or chair most of the time
4      Completely disabled
5      Dead
22 / 34
aareg() Function
> library(survival)
> aalen <- aareg(Surv(time, status) ~ age + sex + ph.ecog,
+                data = lung)
> aalen
              slope      coef se(coef)      z        p
Intercept  5.05e-03  5.87e-03 4.74e-03  1.240 0.216000
age        4.01e-05  7.15e-05 7.23e-05  0.989 0.323000
sex       -3.16e-03 -4.03e-03 1.22e-03 -3.310 0.000935
ph.ecog    3.01e-03  3.67e-03 1.02e-03  3.610 0.000303

Chisq=26.18 on 3 df, p=8.7e-06; test weights=aalen
23 / 34
ECOG Cumulative Regression Function
[Figure: estimated cumulative regression function for ph.ecog, plotted against time from 0 to 800.]
24 / 34
Lin and Ying’s Model
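Lin and Ying (1994) present the reduced model
$$\lambda(t \mid Z_j(t)) = \beta_0(t) + \sum_{k=1}^{p} \beta_k Z_{jk}(t), \tag{7}$$
in which the covariate effects $\beta_k$ no longer depend on time.
With the usual definitions, the intensity function for $N_i(t)$ is
$$Y_i(t)\,d\Lambda(t; Z_i) = Y_i(t)\{d\Lambda_0(t) + \beta_0' Z_i(t)\,dt\},$$
with $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$. Here $\lambda_0(t)$ is the baseline hazard, written $\beta_0(t)$ in (7), and $\beta_0$ now denotes the true value of the coefficient vector $(\beta_1, \ldots, \beta_p)'$.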
26 / 34
Martingale Structure
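Now it's useful to note that $N_i(\cdot)$ can be decomposed into
$$
\begin{aligned}
N_i(t) &= M_i(t) + \int_0^t Y_i(u)\,d\Lambda(u; Z_i) \\
&= M_i(t) + \int_0^t Y_i(u)\{d\Lambda_0(u) + \beta_0' Z_i(u)\,du\}.
\end{aligned} \tag{8}
$$
Then a natural way to estimate $\Lambda_0(t)$ is by
$$\hat{\Lambda}_0(\hat{\beta}, t) = \int_0^t \frac{\sum_{i=1}^{n} \{dN_i(u) - Y_i(u)\hat{\beta}' Z_i(u)\,du\}}{\sum_{j=1}^{n} Y_j(u)}.$$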
27 / 34
Estimation
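Use the estimating function
$$U(\beta) = \sum_{i=1}^{n} \int_0^\infty Z_i(t)\{dN_i(t) - Y_i(t)\,d\hat{\Lambda}_0(\beta, t) - Y_i(t)\beta' Z_i(t)\,dt\},$$
which is equivalent to
$$U(\beta) = \sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}\{dN_i(t) - Y_i(t)\beta' Z_i(t)\,dt\},$$
where
$$\bar{Z}(t) = \frac{\sum_{j=1}^{n} Y_j(t) Z_j(t)}{\sum_{j=1}^{n} Y_j(t)}.$$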
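The equivalence of the two forms follows from substituting the estimator of $\Lambda_0$. This is a sketch of the step, using only the definitions already given, with $dN_i(t)$ counted as a jump only while subject $i$ is at risk:

$$
\sum_{i=1}^{n} Z_i(t) Y_i(t)\, d\hat{\Lambda}_0(\beta, t)
= \frac{\sum_{i=1}^{n} Y_i(t) Z_i(t)}{\sum_{l=1}^{n} Y_l(t)} \sum_{j=1}^{n} \{dN_j(t) - Y_j(t)\beta' Z_j(t)\,dt\}
= \bar{Z}(t) \sum_{j=1}^{n} \{dN_j(t) - Y_j(t)\beta' Z_j(t)\,dt\}.
$$

Subtracting this from $\sum_i Z_i(t)\{dN_i(t) - Y_i(t)\beta' Z_i(t)\,dt\}$ and integrating over $t$ gives the centered form with $Z_i(t) - \bar{Z}(t)$.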
28 / 34
Estimation
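Setting $U(\beta) = 0$ and solving, we get
$$\hat{\beta} = A^{-1}\left[\sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}\,dN_i(t)\right],$$
with
$$A = \sum_{i=1}^{n} \int_0^\infty Y_i(t)\{Z_i(t) - \bar{Z}(t)\}^{\otimes 2}\,dt,$$
where $a^{\otimes 2} = aa'$.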
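For time-fixed covariates the closed form is easy to compute, because the risk-set mean of the covariates is constant between consecutive observed times and the integral in A becomes a finite sum. The base-R sketch below assumes time-fixed covariates and no tied times; `lin_ying` is an illustrative name and is not the implementation used by the ahaz package.

```r
# Closed-form Lin-Ying estimate for time-fixed covariates.
# Assumptions: no tied times; status == 1 marks an event, anything else is censoring.
lin_ying <- function(time, status, Z) {
  Z <- as.matrix(Z)
  p <- ncol(Z)
  ord <- order(time)
  time <- time[ord]
  status <- status[ord]
  Z <- Z[ord, , drop = FALSE]
  n <- length(time)
  A <- matrix(0, p, p)
  b <- numeric(p)
  prev <- 0
  for (k in seq_len(n)) {
    Zr <- Z[k:n, , drop = FALSE]               # subjects at risk on (prev, time[k]]
    zbar <- colMeans(Zr)                       # risk-set mean of the covariates
    Zc <- sweep(Zr, 2, zbar)                   # centered covariates
    A <- A + (time[k] - prev) * crossprod(Zc)  # contribution of this interval to A
    if (status[k] == 1) b <- b + (Z[k, ] - zbar)
    prev <- time[k]
  }
  drop(solve(A, b))
}

# Hand-checkable example: three subjects, all events.
beta_small <- lin_ying(c(1, 2, 3), c(1, 1, 1), c(1, 0, 1))

# Simulated check with true hazard 1 + 2 * z and no censoring.
set.seed(42)
z <- rbinom(2000, 1, 0.5)
beta_sim <- lin_ying(rexp(2000, rate = 1 + 2 * z), rep(1, 2000), z)
print(c(beta_small, beta_sim))
```

Unlike the Aalen fit, no matrix inversion is needed at each event time; a single p × p system is solved at the end.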
29 / 34
Variance Estimation
The covariance of $n^{1/2}(\hat{\beta} - \beta_0)$ can be estimated by
$$(n^{-1}A)^{-1}\left[\frac{1}{n}\sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}^{\otimes 2}\,dN_i(t)\right](n^{-1}A)^{-1}.$$
30 / 34
Properties of the Estimating Equation
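If $\beta_0$ is the true coefficient vector, then
$$U(\beta_0) = \sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}\,dM_i(t)$$
and $n^{-1/2}U(\beta_0) \to_d N(0, \Sigma)$, with $\Sigma$ estimated by
$$n^{-1}\sum_{i=1}^{n} \int_0^\infty \{Z_i(t) - \bar{Z}(t)\}^{\otimes 2}\,dN_i(t).$$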
31 / 34
Asymptotic Considerations
• The estimator is consistent, but there is no guarantee of optimal efficiency
• Weighting by true hazard leads to optimality
• Can estimate hazard, then weight
32 / 34
ahaz() Function
library(survival)  # provides Surv() and the lung data
library(ahaz)

# Keep subjects with an observed ECOG score and follow-up time
keep <- !is.na(lung$ph.ecog) & !is.na(lung$time)

# Add a small random jitter to the survival times to break ties
time2 <- lung$time[keep] + runif(sum(keep), 0, 0.001)
sex2 <- lung$sex[keep]
age2 <- lung$age[keep]
status2 <- lung$status[keep]  # coded 1 = censored, 2 = dead
ph.ecog2 <- lung$ph.ecog[keep]

# Fit the Lin and Ying additive hazards model
ly <- ahaz(Surv(time2, status2),
           cbind(age2, sex2, ph.ecog2))
33 / 34
Results
Coefficients:
           Estimate Std. Error Z value Pr(>|z|)
age2      2.109e-05  2.149e-05   0.982 0.326304
sex2     -1.216e-03  3.595e-04  -3.384 0.000715 ***
ph.ecog2  1.116e-03  3.039e-04   3.672 0.000240 ***
34 / 34