4. Comparison of Two (K) Samples
K=2
$Z$: treatment indicator, i.e. $Z = 1$ for treatment 1 (new treatment); $Z = 0$ for treatment 0 (standard treatment or placebo)
Problem: compare the survival distributions between two groups.
Ex: comparing treatments on patients with a particular disease.
Null Hypothesis:
H0: no treatment (group) difference
H0: $S_0(t) = S_1(t)$, for $t \ge 0$, or equivalently
H0: $\lambda_0(t) = \lambda_1(t)$, for $t \ge 0$
Alternative Hypothesis:
Ha: the survival time for one treatment is stochastically larger or smaller than the survival time for the other treatment.
Ha: $S_1(t) \ge S_0(t)$, for $t \ge 0$, with strict inequality for some $t$ (one-sided)
Ha: either $S_1(t) \ge S_0(t)$ or $S_0(t) \ge S_1(t)$, for $t \ge 0$, with strict inequality for some $t$ (two-sided)
Solution: In biomedical applications, it has become common practice to use nonparametric tests, that is, test statistics whose distribution under the null hypothesis does not depend on specific parametric assumptions about the shape of the probability distribution. With censored survival data, the class of weighted logrank tests is the most widely used, and the logrank test is the most common member of that class.
Notations
A sample of triplets $(X_i, \Delta_i, Z_i)$, $i = 1, 2, \ldots, n$, where
$X_i = \min(T_i, C_i)$
$T_i$ = latent failure time; $C_i$ = latent censoring time
$\Delta_i = I(T_i \le C_i)$
$Z_i = 1$ for the new treatment; $Z_i = 0$ for the standard treatment
Also, define
$n_1$ = number of individuals in group 1
$n_0$ = number of individuals in group 0
$n_j = \sum_{i=1}^{n} I(Z_i = j)$, $j = 0, 1$
$n = n_0 + n_1$
$Y_1(x)$ = number of individuals at risk at time $x$ from trt 1 $= \sum_{i=1}^{n} I(X_i \ge x, Z_i = 1)$
$Y_0(x)$ = number of individuals at risk at time $x$ from trt 0 $= \sum_{i=1}^{n} I(X_i \ge x, Z_i = 0)$
$Y(x) = Y_0(x) + Y_1(x)$
$dN_1(x)$ = # of deaths observed at time $x$ from trt 1 $= \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1, Z_i = 1)$
$dN_0(x)$ = # of deaths observed at time $x$ from trt 0 $= \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1, Z_i = 0)$
$dN(x) = dN_0(x) + dN_1(x) = \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1)$
Note: $dN(x)$ actually corresponds to the observed number of deaths in the time window $[x, x + \Delta x)$ for some partition of the time axis into intervals of length $\Delta x$. If the partition is sufficiently fine, then thinking of the number of deaths occurring exactly at $x$ or in $[x, x + \Delta x)$ makes little difference, and in the limit makes no difference at all.
Weighted logrank Test Statistic
$$T(w) = \frac{U(w)}{se(U(w))}$$
Where,
$$U(w) = \sum_x w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]$$
$se(U(w))$ will be given later.
The null hypothesis of treatment equality will be rejected if $T(w)$ is sufficiently different from zero.
Note:
1. At any time $x$ for which there is no observed death, $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} = 0$. This means that the sum above is only over distinct failure times.
2. $U(w)$ is a weighted sum over the distinct failure times of the observed number of deaths from treatment 1 minus the expected number of deaths from treatment 1 if the null hypothesis were true.
3. When $w(x) = 1$, $T(w)$ is the logrank test statistic.
Motivation
Take a slice of time $[x, x + \Delta x)$:
The following $2 \times 2$ table can be formulated:
                 Treatment 1            Treatment 0            Total
# of deaths      $dN_1(x)$              $dN_0(x)$              $dN(x)$
# alive          $Y_1(x) - dN_1(x)$     $Y_0(x) - dN_0(x)$     $Y(x) - dN(x)$
# at risk        $Y_1(x)$               $Y_0(x)$               $Y(x)$
Under H0:
$$dN_1(x) \mid Y_1(x), Y(x), dN(x) \sim \text{Hypergeometric}\big(Y_1(x), dN(x), Y(x)\big)$$
So,
$$E[dN_1(x) \mid Y_1(x), Y(x), dN(x)] = \frac{Y_1(x)\, dN(x)}{Y(x)}$$
$dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ is the observed number of deaths minus the expected number of deaths in treatment 1. Hence,
• If H0 is true, the sum of $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ over $x$ is expected to be near zero.
• If the hazard rate for treatment 1 were lower than that for treatment 0 consistently over $x$, then on average we expect $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ to be negative.
• If the hazard rate for treatment 1 were higher than that for treatment 0 consistently over $x$, then on average we expect $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ to be positive.
Specifically, the weighted logrank test statistic is given by
$$T(w) = \frac{\sum_x w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]}{\left\{\sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}\right\}^{1/2}}$$
Under H0: $T(w) \overset{a}{\sim} N(0, 1)$
Therefore, a level $\alpha$ test (two-sided) will reject H0: $S_0(t) = S_1(t)$ when
$$|T(w)| \ge z_{\alpha/2}$$
Remarks:
1. Logrank test statistic:
$$\frac{\sum_x \left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]}{\left\{\sum_x \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}\right\}^{1/2}}$$
2. The statistic in the numerator is a weighted sum of observed minus expected counts over the $k$ $2 \times 2$ tables, where $k$ is the number of distinct failure times.
3. The weight function $w(x)$ can be used to emphasize differences in the hazard rates over time according to their relative values. For example, if the weight is larger early in time and becomes smaller later, then the test statistic emphasizes early differences in the survival curves.
4. If the weights $w(x)$ are stochastic (functions of the data), then they need to be a function of the censoring and survival information prior to time $x$.
5. $w(x) = 1$: logrank test
6. $w(x) = Y(x)$: Gehan's generalization of the Wilcoxon test
7. $w(x) = KM(x)$: Peto-Prentice's generalization of the Wilcoxon test
Note: Since both $Y(x)$ and $KM(x)$ are non-increasing functions of $x$, both Gehan's and Peto-Prentice's tests emphasize differences early in the survival curves.
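As a rough illustration of $T(w)$ and the three weight choices in the remarks, here is a base-R sketch that loops over the distinct failure times and accumulates the numerator and the variance estimate. The function name `weighted_logrank`, its arguments, and the six-observation toy data are hypothetical. The `km` option uses the pooled Kaplan-Meier estimate just before each failure time, which is an assumption about how $KM(x)$ is evaluated.

```r
# Weighted logrank statistic T(w) for two groups; z must be coded 0/1.
weighted_logrank <- function(time, status, z,
                             weight = c("logrank", "gehan", "km")) {
  weight <- match.arg(weight)
  xs <- sort(unique(time[status == 1]))   # distinct failure times
  U <- 0; V <- 0
  km <- 1                                 # pooled KM just before x (assumed convention)
  for (x in xs) {
    Y1  <- sum(time >= x & z == 1)        # at risk in treatment 1
    Y0  <- sum(time >= x & z == 0)        # at risk in treatment 0
    Y   <- Y1 + Y0
    dN1 <- sum(time == x & status == 1 & z == 1)
    dN  <- sum(time == x & status == 1)
    w <- switch(weight, logrank = 1, gehan = Y, km = km)
    U <- U + w * (dN1 - Y1 * dN / Y)      # observed minus expected
    if (Y > 1) {                          # variance term is 0/0 when Y = 1
      V <- V + w^2 * Y1 * Y0 * dN * (Y - dN) / (Y^2 * (Y - 1))
    }
    km <- km * (1 - dN / Y)               # update KM after it has been used
  }
  Tw <- U / sqrt(V)
  list(U = U, V = V, T = Tw, p.value = 2 * pnorm(-abs(Tw)))
}

# Hypothetical toy data: six subjects, one censored at time 4
time   <- c(1, 2, 3, 4, 5, 6)
status <- c(1, 1, 1, 0, 1, 1)
z      <- c(1, 0, 1, 0, 1, 0)

lr <- weighted_logrank(time, status, z, "logrank")
gh <- weighted_logrank(time, status, z, "gehan")
c(logrank_U = lr$U, logrank_V = lr$V, gehan_U = gh$U)
```

For these toy data the logrank numerator is 1.1 with variance estimate 0.99, while the Gehan numerator is 4. The Gehan numerator is larger because each term is multiplied by the number at risk.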
A Heuristic Proof
Define a set of random variables:
$$\mathcal{F}(x) = \{dN_0(u), dN_1(u), Y_1(u), Y_0(u), w_1(u), w_0(u), dN(x) \text{ for all grid points } u < x\}$$
Assume H0 is true. Knowing $\mathcal{F}(x)$ would imply (with respect to the $2 \times 2$ table) that:
We know $Y_1(x)$, $Y_0(x)$ (i.e., the number at risk at time $x$ from either treatment group), and, in addition, we know $dN(x)$ (i.e., the total number of deaths from both treatment groups occurring in $[x, x + \Delta x)$). The only thing we don't know is $dN_1(x)$.
Conditional on $\mathcal{F}(x)$, we have a $2 \times 2$ table which, under the null hypothesis, follows independence, and we know the marginal counts of the table (i.e., the marginal counts are fixed conditional on $\mathcal{F}(x)$). Therefore, the conditional distribution of one of the cell counts, say $dN_1(x)$, given $\mathcal{F}(x)$ follows a hypergeometric distribution:
$$P[dN_1(x) = c \mid Y_1(x), Y(x), dN(x)] = \frac{\binom{dN(x)}{c}\binom{Y(x) - dN(x)}{Y_1(x) - c}}{\binom{Y(x)}{Y_1(x)}}$$
$$E[dN_1(x) \mid \mathcal{F}(x)] = \frac{Y_1(x)\, dN(x)}{Y(x)}$$
$$Var[dN_1(x) \mid \mathcal{F}(x)] = \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$
The numerator of the weighted logrank test statistic is:
$$U(w) = \sum_x w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]$$
Notice that under H0:
$$E[U(w)] = \sum_x E\left\{w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]\right\}$$
$$= \sum_x E\left(E\left\{w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right] \,\Big|\, \mathcal{F}(x)\right\}\right)$$
$$= \sum_x E\left(w(x)\left\{E[dN_1(x) \mid \mathcal{F}(x)] - \frac{Y_1(x) \times dN(x)}{Y(x)}\right\}\right) = 0$$
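The hypergeometric mean and variance used in the heuristic proof can be checked numerically for one risk set. The sketch below uses made-up counts (`Y1`, `Y0`, `dN` are hypothetical values) and compares the probability mass function written above with base R's `dhyper`, taking group 1 subjects as the "white balls" and the deaths as the draws.

```r
# Hypothetical risk-set counts at one failure time
Y1 <- 7; Y0 <- 5; dN <- 3
Y  <- Y1 + Y0
cs <- 0:dN                                  # possible values of dN1(x)

# pmf as written in the text
p_formula <- choose(dN, cs) * choose(Y - dN, Y1 - cs) / choose(Y, Y1)
# same distribution via dhyper: m = group 1 at risk, n = group 0 at risk, k = deaths
p_dhyper <- dhyper(cs, m = Y1, n = Y0, k = dN)

mean_c <- sum(cs * p_formula)
var_c  <- sum((cs - mean_c)^2 * p_formula)

rbind(enumerated   = c(mean = mean_c, var = var_c),
      closed_form  = c(mean = Y1 * dN / Y,
                       var  = Y1 * Y0 * dN * (Y - dN) / (Y^2 * (Y - 1))))
```

Both rows should agree, with mean $7 \times 3 / 12 = 1.75$.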
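Next, we will find an unbiased estimator for the variance of $U(w)$. Let
$$A(x) = w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right].$$
Then,
$$Var[U(w)] = Var\left[\sum_x A(x)\right] = \sum_x Var[A(x)] + \sum_{x \ne y} Cov[A(x), A(y)].$$
Notice that we have already shown $E[A(x)] = E[A(y)] = 0$. WLOG, suppose $y < x$; then $A(y)$ is determined by $\mathcal{F}(x)$, so
$$Cov[A(x), A(y)] = E[A(x)\, A(y)] = E\{E[A(x)\, A(y) \mid \mathcal{F}(x)]\} = E\{A(y)\, E[A(x) \mid \mathcal{F}(x)]\} = 0.$$
Now,
$$Var[U(w)] = \sum_x Var[A(x)] = \sum_x E[A^2(x)] = \sum_x E\{E[A^2(x) \mid \mathcal{F}(x)]\}$$
$$= \sum_x E\left(E\left\{w^2(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]^2 \,\Big|\, \mathcal{F}(x)\right\}\right)$$
$$= \sum_x E\left(w^2(x)\, E\left\{\big[dN_1(x) - E(dN_1(x) \mid \mathcal{F}(x))\big]^2 \,\Big|\, \mathcal{F}(x)\right\}\right)$$
$$= \sum_x E\left\{w^2(x)\, Var[dN_1(x) \mid \mathcal{F}(x)]\right\}$$
$$= \sum_x E\left\{w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}\right\}$$
This means:
$$\sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$
is an unbiased estimator for $Var[U(w)]$.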
Recapping:
Under H0: $S_0(t) = S_1(t)$,
1. The statistic $U(w) = \sum_x A(x)$ has expectation equal to zero, i.e. $E[U(w)] = 0$.
2. $U(w) = \sum_x A(x)$ is made up of a sum of conditionally uncorrelated terms, each with mean zero. By the central limit theory for such martingale structures, $U(w)$ properly normalized will be approximately a standard normal random variable. That is:
$$\frac{U(w)}{se(U(w))} \overset{a}{\sim} N(0, 1)$$
3. An unbiased estimate of the variance of $U(w)$ was given by
$$\sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$
Therefore,
$$T(w) = \frac{U(w)}{se(U(w))} = \frac{\sum_x w(x)\left[dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}\right]}{\left\{\sum_x w^2(x)\, \frac{Y_1(x)\, Y_0(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}\right\}^{1/2}} \overset{a}{\sim} N(0, 1)$$
An Example
The data give the survival times for 25 myelomatosis patients randomized to two treatments (1 or 2):
dur    status  trt  renal
8      1       1    1
180    1       2    0
…
1296   1       2    0
dur is the patient's survival or censoring time, status is the censoring indicator, trt is the treatment indicator, and renal is the indicator of impaired renal function (0 = normal; 1 = impaired).
We test the null hypothesis that the treatment trt has no effect, i.e. H0: $S_0(t) = S_1(t)$.
SAS & R codes
Note:
1. The numerator of Gehan's Wilcoxon test is much larger than that of the logrank test, since Gehan's Wilcoxon test uses the number at risk as the weight while the logrank test uses the identity weight.
2. The likelihood ratio test is based on the exponential model.
3. In this example, the logrank test gives a more significant result than Gehan's Wilcoxon test (although neither provides strong evidence against the null hypothesis). Why is that?
The treatment-specific Kaplan-Meier survival estimates were generated using the following R functions:
library(survival)  # provides Surv(), survfit() and survdiff()
# Open a PDF device for the figure
pdf(file = "fig_myel.pdf", height = 6, width = 8.5, pointsize = 14)
# par(mfrow=c(1,2))
# Read the myelomatosis data (columns: dur, status, trt, renal)
example <- read.table(file = "chap4_myel.txt", header = TRUE)
# Kaplan-Meier estimates by treatment group
fit <- survfit(Surv(dur, status) ~ trt, data = example)
plot(fit, xlab = "Patient time (months)", ylab = "survival probability", lty = c(1, 2))
legend(1000, 1, c("trt = 1", "trt = 2"), lty = c(1, 2), cex = 0.8)
dev.off()
[Figure: Kaplan-Meier estimates for two treatments]
logrank test in R:
> survdiff(Surv(dur, status) ~ trt, example)
Call:
survdiff(formula = Surv(dur, status) ~ trt, data = example)
       N Observed Expected (O-E)^2/E (O-E)^2/V
trt=1 12        6     8.34     0.655      1.31
trt=2 13       11     8.66     0.631      1.31
 Chisq= 1.3  on 1 degrees of freedom, p= 0.252
Peto-Prentice's Wilcoxon test:
> survdiff(Surv(dur, status) ~ trt, rho=1, example)
       N Observed Expected (O-E)^2/E (O-E)^2/V
trt=1 12     4.80     5.60     0.115     0.304
trt=2 13     6.83     6.03     0.106     0.304
 Chisq= 0.3  on 1 degrees of freedom, p= 0.581
Power and Sample Size
Since a survival curve is infinite dimensional, describing departures from the null as differences at every point in time over the survival curve would be complicated. Clearly, some simplifying conditions must be given. In clinical trials, proportional hazards alternatives have become very popular. That is,
$$\frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta), \text{ for all } t \ge 0$$
1. $\beta > 0$: individuals on treatment 1 have worse survival (i.e., die faster).
2. $\beta = 0$: no treatment difference (null is true).
3. $\beta < 0$: individuals on treatment 1 have better survival (i.e., live longer).
Under proportional hazards,
$$\frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta) \;\Rightarrow\; -\frac{d \log S_1(t)}{dt} = -\frac{d \log S_0(t)}{dt}\, \exp(\beta)$$
$$\Rightarrow\; \log S_1(t) = \log S_0(t)\, \exp(\beta) + C \qquad (t = 0 \Rightarrow C = 0)$$
$$\Rightarrow\; S_1(t) = [S_0(t)]^{\gamma}, \quad \gamma = \exp(\beta)$$
$$\Rightarrow\; \log[-\log S_1(t)] = \log[-\log S_0(t)] + \beta$$
Based on the last equation, by plotting estimated survival curves (say, Kaplan-Meier estimates) for two treatments (groups) on a log[-log] scale, we would see a constant vertical shift between the two curves if the hazards are proportional.
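The constant vertical shift on the log[-log] scale can be seen numerically. This sketch assumes a hypothetical Weibull baseline survival curve and a hypothetical hazard ratio of 1.5; all numeric values are chosen for illustration only.

```r
# Hypothetical baseline: Weibull survival with scale 5 and shape 1.5
t    <- c(0.5, 1, 2, 5, 10)
S0   <- exp(-(t / 5)^1.5)
beta <- log(1.5)                 # assumed log hazard ratio
S1   <- S0^exp(beta)             # proportional hazards: S1 = S0^gamma

# Difference of the two curves on the log[-log] scale at each time point
shift <- log(-log(S1)) - log(-log(S0))
cbind(t, S0, S1, shift)
```

Every entry of `shift` equals `beta`, whatever the shape of the baseline curve, which is why parallel curves on this scale are taken as informal support for proportional hazards.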
EX:
Note: Do not be misled by the visual impression of the curves near the origin.
For the specific case where the survival times for the two groups are exponentially distributed (i.e., constant hazard), we automatically have proportional hazards, since
$$\frac{\lambda_1(t)}{\lambda_0(t)} = \frac{\lambda_1}{\lambda_0}, \text{ for all } t \ge 0$$
The ratio of median $m$ or mean $\mu$ survival times for two groups having exponential distributions is
$$\frac{m_1}{m_0} = \frac{\log 2 / \lambda_1}{\log 2 / \lambda_0} = \frac{\lambda_0}{\lambda_1} = \frac{1/\lambda_1}{1/\lambda_0} = \frac{\mu_1}{\mu_0}$$
logrank Test & Proportional Hazards
The logrank test is the most powerful test among the weighted logrank tests to detect proportional hazards alternatives. In fact, it is the most powerful test among all nonparametric tests for detecting proportional hazards alternatives. Therefore, the proportional hazards alternative has not only a nice interpretation but also nice statistical properties. These features lead to the natural use of (unweighted) logrank tests.
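For $H_a$: $\frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta_A)$, $\beta_A \ne 0$, when censoring does not depend on treatment (e.g., randomized experiments), the logrank test statistic has a distribution approximated by
$$T_n \overset{a}{\sim} N\left(\beta_A \sqrt{d\, \theta (1 - \theta)},\; 1\right)$$
where $d$ is the total number of deaths (events), $\theta$ is the proportion in group 1, and $\beta_A$ is the log hazard ratio under the alternative.
Let $\mu = \beta_A \sqrt{d\, \theta (1 - \theta)}$, so that $T_n \overset{a}{\sim} N(\mu, 1)$.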
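Sample Size Formula
Recall that our test procedure is: reject $H_0$ when $|T_n| > z_{\alpha/2}$, and
under $H_0$, $T_n \overset{a}{\sim} N(0, 1)$; under $H_a$, $T_n \overset{a}{\sim} N(\mu, 1)$.
By the definition of power, we have
$$P[|T_n| > z_{\alpha/2} \mid H_a] = 1 - \gamma, \quad \text{where } (1 - \gamma) \text{ is the desired power,}$$
that is,
$$P[T_n > z_{\alpha/2} \mid H_a] + P[T_n < -z_{\alpha/2} \mid H_a] = 1 - \gamma$$
Assume $\beta_A > 0$, then $\mu > 0$. In this case, with $Z \sim N(0, 1)$,
$$P[T_n > z_{\alpha/2} \mid H_a] = P[T_n - \mu > z_{\alpha/2} - \mu \mid H_a] = P[Z > z_{\alpha/2} - \mu]$$
$$P[T_n < -z_{\alpha/2} \mid H_a] = P[T_n - \mu < -z_{\alpha/2} - \mu \mid H_a] = P[Z < -z_{\alpha/2} - \mu] = P[Z > z_{\alpha/2} + \mu] \approx 0$$
Therefore,
$$P[Z > z_{\alpha/2} - \mu] \approx 1 - \gamma \;\Rightarrow\; P[Z < z_{\alpha/2} - \mu] \approx \gamma \;\Rightarrow\; P[Z > -z_{\alpha/2} + \mu] \approx \gamma$$
$$\Rightarrow\; -z_{\alpha/2} + \mu = z_\gamma \;\Rightarrow\; \mu = z_\gamma + z_{\alpha/2} \;\Rightarrow\; \beta_A \sqrt{d\, \theta (1 - \theta)} = z_\gamma + z_{\alpha/2}$$
$$\Rightarrow\; d = \frac{(z_\gamma + z_{\alpha/2})^2}{\beta_A^2\, \theta (1 - \theta)}$$
• Exactly the same formula for $d$ can be derived if $\beta_A < 0$.
• $d$ acts as the sample size.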
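A small numerical check of the sample size formula: compute $d$ from the formula and plug it back into the normal approximation for the power. The function names `events_needed` and `logrank_power` are hypothetical. The quantile $z_\gamma$ is taken as the upper-tail quantile, i.e. `qnorm(1 - gamma)`.

```r
# Number of events from d = (z_gamma + z_{alpha/2})^2 / (beta_A^2 theta (1 - theta))
events_needed <- function(beta_A, power = 0.90, theta = 0.5, alpha = 0.05) {
  z_gamma <- qnorm(power)            # upper quantile for gamma = 1 - power
  z_alpha <- qnorm(1 - alpha / 2)
  (z_gamma + z_alpha)^2 / (beta_A^2 * theta * (1 - theta))
}

# Two-sided power under T_n ~ N(mu, 1), mu = beta_A sqrt(d theta (1 - theta))
logrank_power <- function(d, beta_A, theta = 0.5, alpha = 0.05) {
  mu <- abs(beta_A) * sqrt(d * theta * (1 - theta))
  z  <- qnorm(1 - alpha / 2)
  pnorm(mu - z) + pnorm(-mu - z)     # P[Z > z - mu] + P[Z < -z - mu]
}

d_req <- events_needed(beta_A = log(1.5))
pw    <- logrank_power(d_req, beta_A = log(1.5))
c(events = d_req, power = pw)
```

The recovered power sits just above the 0.90 target, because the second tail probability that the derivation sets to zero is tiny but not exactly zero.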
Take a two-sided logrank test with level $\alpha = 0.05$, power $1 - \gamma = 0.90$, $\theta = 0.5$. Then
$$d = \frac{4 (1.96 + 1.28)^2}{\beta_A^2}$$
The following table gives the required number of events for different hazard ratios $\exp(\beta_A)$.
Hazard ratio $\exp(\beta_A)$    $d$
2.00                            88
1.50                            256
1.25                            844
1.10                            4623
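The table can be reproduced in a few lines of R. This sketch uses the rounded quantiles 1.96 and 1.28 exactly as in the formula above and rounds each result up to the next whole event. With unrounded `qnorm` values, a borderline entry could differ by one event.

```r
hr <- c(2.00, 1.50, 1.25, 1.10)             # hazard ratios exp(beta_A)
d  <- 4 * (1.96 + 1.28)^2 / log(hr)^2       # theta = 0.5, so 1 / (theta (1 - theta)) = 4
events <- ceiling(d)                        # round up to whole events
data.frame(hazard_ratio = hr, d = round(d, 1), events = events)
```

The required number of events grows quickly as the hazard ratio approaches 1, since $d$ is proportional to $1/\beta_A^2$.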
One strategy is to enter some larger number of patients, say 350 patients (about 175 patients on each treatment arm) and then continue following until we have 256 deaths.
EX: Suppose patients with advanced lung cancer have a median survival time of 6 months. We have a new treatment which we hope will increase the median survival time to 9 months. If the survival times follow exponential distributions, then this difference would correspond to a hazard ratio of
$$\exp(\beta_A) = \frac{\lambda_1(t)}{\lambda_0(t)} = \frac{\lambda_1}{\lambda_0} = \frac{m_0}{m_1} = \frac{6}{9} = \frac{2}{3}.$$
$$d = \frac{4 (1.96 + 1.28)^2}{[\log(2/3)]^2} = 256$$
Design Specification
More often in survival studies we need to be able to specify to the investigators the following:
1. number of patients;
2. accrual period;
3. follow-up time.
It was shown by Schoenfeld that a reasonable approximation for obtaining the desired power is to ensure that the total expected number of deaths (events) from both groups, computed under the alternative, equals (assuming equal probability of assigning treatments)
$$E(d) = \frac{4 (z_\gamma + z_{\alpha/2})^2}{\beta_A^2}$$
That is, we compute the expected number of deaths for groups "0" and "1" separately under the alternative hypothesis; the sum of these should equal the value given by the above formula.
Computing $E(d)$ in One Sample
Suppose $(X_i, \Delta_i)$, $i = 1, \ldots, n$, represents a sample of possibly censored survival data, with $X_i = \min(T_i, C_i)$, $\Delta_i = I(T_i \le C_i)$, and the following notations:
$T$              Quantity             $C$
$f(t)$           Density function     $g(t)$
$F(t)$           C.D.F.               $G(t)$
$S(t)$           Survival function    $H(t)$
$\lambda(t)$     Hazard function      $\mu(t)$
The expected number of deaths is $E(d) = n \cdot P(\Delta = 1)$, where
$$P(\Delta = 1) = \int_0^\infty f(x, \Delta = 1)\, dx = \int_0^\infty f(x)\, H(x)\, dx$$
Ex: Suppose $T$ is exponential with hazard $\lambda$ and $C$ is exponential with hazard $\mu$; then
$$P(\Delta = 1) = \int_0^\infty f(x)\, H(x)\, dx = \int_0^\infty \lambda e^{-\lambda x} e^{-\mu x}\, dx = \frac{\lambda}{\lambda + \mu}$$
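The closed form $\lambda / (\lambda + \mu)$ can be compared with direct numerical integration of $f(x) H(x)$. The hazards and the sample size below are hypothetical values chosen for illustration.

```r
lambda <- 0.3    # hypothetical failure hazard
mu     <- 0.1    # hypothetical censoring hazard

# Integrand f(x) H(x): failure density times censoring survival function
integrand <- function(x) {
  dexp(x, rate = lambda) * pexp(x, rate = mu, lower.tail = FALSE)
}
p_numeric <- integrate(integrand, lower = 0, upper = Inf)$value
p_closed  <- lambda / (lambda + mu)

n <- 200         # hypothetical sample size
c(numeric = p_numeric, closed_form = p_closed, expected_deaths = n * p_closed)
```

With these values three quarters of the subjects are expected to be observed to die, so a sample of 200 yields about 150 expected deaths.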
Design with Censoring Due To Staggered Entry
Suppose $n$ patients enter the study at times $E_1, E_2, \ldots, E_n$, assumed to be independent and identically distributed (i.i.d.) with distribution function $Q_E(u) = P[E \le u]$. If the analysis takes place at time $L$ after the start of the study and there is no other loss to follow-up or competing risk, the censoring random variable is $C = L - E$. Hence,
$$H_C(u) = P[L - E \ge u] = P[E \le L - u] = Q_E(L - u), \quad u \in [0, L].$$
Therefore, for such an experiment, the expected number of deaths in a sample of size $n$ would be equal to
$$n \cdot P(\Delta = 1) = n \int_0^L \lambda_T(u)\, S_T(u)\, Q_E(L - u)\, du$$
Ex: Suppose the underlying survival of a population follows an exponential distribution with hazard $\lambda$. A study will accrue patients uniformly for $A$ years, and the analysis will be conducted after an additional $F$ years of follow-up, so that $L = A + F$. What is the expected number of deaths for a sample of $n$ patients?
The entry times follow a uniform distribution on $[0, A]$. That is,
$$Q_E(u) = P[E \le u] = \begin{cases} 0 & \text{if } u \le 0 \\ u/A & \text{if } 0 < u < A \\ 1 & \text{if } u \ge A \end{cases}$$
Consequently,
$$H_C(u) = Q_E(L - u) = \begin{cases} 1 & \text{if } u \le L - A \\ (L - u)/A & \text{if } L - A < u \le L \\ 0 & \text{if } u > L \end{cases}$$
Hence,
$$P(\Delta = 1) = \int_0^L \lambda_T(u)\, S_T(u)\, H_C(u)\, du$$
$$= \int_0^{L-A} \lambda e^{-\lambda u}\, du + \int_{L-A}^{L} \lambda e^{-\lambda u}\, \frac{L - u}{A}\, du$$
$$= \int_0^{L-A} \lambda e^{-\lambda u}\, du + \frac{L}{A} \int_{L-A}^{L} \lambda e^{-\lambda u}\, du - \frac{1}{A} \int_{L-A}^{L} u\, \lambda e^{-\lambda u}\, du$$
$$= \cdots$$
$$= 1 - \frac{e^{-\lambda L}}{\lambda A}\left(e^{\lambda A} - 1\right)$$
Therefore, if we accrue $n$ patients uniformly over $A$ years, who fail according to an exponential distribution with hazard $\lambda$, and follow them for an additional $F$ years, then the expected number of deaths in the sample is
$$n \cdot \left[1 - \frac{e^{-\lambda L}}{\lambda A}\left(e^{\lambda A} - 1\right)\right]$$
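The steps hidden behind the "$\cdots$" can be checked numerically by integrating $\lambda e^{-\lambda u} H_C(u)$ over $[0, L]$ and comparing with the closed form. The function names `p_death_uniform` and `H_C` and the design values below are hypothetical.

```r
# Closed form: P(Delta = 1) under exponential failure and uniform accrual
p_death_uniform <- function(lambda, A, L) {
  1 - exp(-lambda * L) / (lambda * A) * (exp(lambda * A) - 1)
}

# Censoring survival function H_C(u) = Q_E(L - u) for uniform entry on [0, A]
H_C <- function(u, A, L) pmin(1, pmax(0, (L - u) / A))

lambda <- 0.25; A <- 3; L <- 5      # hypothetical hazard and design
p_numeric <- integrate(function(u) lambda * exp(-lambda * u) * H_C(u, A, L),
                       lower = 0, upper = L)$value
p_closed  <- p_death_uniform(lambda, A, L)
c(numeric = p_numeric, closed_form = p_closed)
```

The two values agree to within the numerical integration error. Multiplying by $n$ gives the expected number of deaths.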
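Lung cancer example (continued)
$m_0 = 4$ years; $\lambda_0 = \frac{\log 2}{m_0} = 0.173$; $m_1 = 6$ years; $\lambda_1 = \frac{\log 2}{m_1} = 0.116$, which again gives the hazard ratio $\lambda_1 / \lambda_0 = 2/3$, so
$$d = \frac{4 (1.96 + 1.28)^2}{[\log(2/3)]^2} = 256$$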
Suppose we decide to accrue patients for $A = 5$ years and then follow them for an additional $F = 3$ years, so $L = A + F = 8$ years. How large a sample size is necessary?
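In a randomized trial where we randomize the patients to the two treatments with equal probability, the expected number of deaths would be equal to $D_1 + D_0$, where
$$D_j = \frac{n}{2} \times \left[1 - \frac{e^{-\lambda_j L}}{\lambda_j A}\left(e^{\lambda_j A} - 1\right)\right], \quad j = 0, 1$$
For this problem, the expected number of deaths is
$$D_1 + D_0 = \frac{n}{2} \times \left[1 - \frac{e^{-0.173 \times 8}}{0.173 \times 5}\left(e^{0.173 \times 5} - 1\right)\right] + \frac{n}{2} \times \left[1 - \frac{e^{-0.116 \times 8}}{0.116 \times 5}\left(e^{0.116 \times 5} - 1\right)\right]$$
$$= \frac{n}{2} \times 0.6017 + \frac{n}{2} \times 0.4642 = \frac{n}{2} \times 1.0659$$
Thus if we want the expected number of deaths to equal 256, then
$$\frac{n}{2} \times 1.0659 = 256 \;\Rightarrow\; n \approx 480$$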
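The lung cancer calculation can be reproduced as follows. This sketch uses the rounded hazards 0.173 and 0.116 from the text; the function name `p_death_uniform` is hypothetical.

```r
# P(Delta = 1) under exponential failure with uniform accrual over A years
p_death_uniform <- function(lambda, A, L) {
  1 - exp(-lambda * L) / (lambda * A) * (exp(lambda * A) - 1)
}

A <- 5; L <- 8                      # accrual period and total study length (years)
lambda0 <- 0.173                    # control hazard, log(2) / 4 rounded as in the text
lambda1 <- 0.116                    # treatment hazard, log(2) / 6 rounded as in the text
d_target <- 256                     # required number of deaths

p0 <- p_death_uniform(lambda0, A, L)
p1 <- p_death_uniform(lambda1, A, L)

# Solve (n / 2) * (p0 + p1) = d_target for n
n_required <- 2 * d_target / (p0 + p1)
c(p0 = p0, p1 = p1, n = n_required)
```

The result is a little above 480 patients. Using unrounded hazards `log(2) / 4` and `log(2) / 6` changes the death probabilities slightly and hence the answer by a patient or so.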
Note:
1. Different combinations of sample size, accrual period and follow-up period can be tried to find a design that gives the desired number of deaths and best suits the needs of the experiment being conducted.
2. The above calculation for the sample size requires that we are able to get $n = 480$ patients within $A = 5$ years. If this is not the case, we will be underpowered to detect the difference of interest.
3. The sample size $n$ and the accrual period $A$ are tied together by the accrual rate $R$ (number of patients available per year) through $n = AR$. If we have information on $R$, the above calculation has to be modified.
4. Other issues that affect power and may have to be considered are: a) loss to follow-up; b) competing risks; c) non-compliance.
5. Originally, we introduced a class of weighted logrank tests to test H0: $S_1(t) = S_0(t)$, for $t \ge 0$. The weighted logrank test with weight function $w(t)$ is optimal to detect the following alternative hypothesis:
$$\lambda_1(t) = \lambda_0(t)\, e^{\beta w(t)} \quad \text{or} \quad \log\frac{\lambda_1(t)}{\lambda_0(t)} = \beta w(t); \quad \beta \ne 0$$
$K$ sample weighted logrank test
Testing the null hypothesis that the survival distributions are the same for $K > 2$ groups.
Notations:
A sample of triplets $(X_i, \Delta_i, Z_i)$, $i = 1, 2, \ldots, n$, where
$X_i = \min(T_i, C_i)$, $\Delta_i = I(T_i \le C_i)$
$Z_i \in \{1, 2, \ldots, K\}$, corresponding to group membership in one of the $K$ groups
Denote $S_j(t) = P[T_j \ge t]$ as the survival function for the $j$th group. The null hypothesis can then be represented as:
H0: $S_1(t) = S_2(t) = \cdots = S_K(t)$, for $t \ge 0$, or equivalently H0: $\lambda_1(t) = \lambda_2(t) = \cdots = \lambda_K(t)$, for $t \ge 0$
$Y_j(x)$ = # at risk at time $x$ from group $j$
$dN_j(x)$ = # of deaths observed at time $x$ (i.e., in $[x, x + \Delta x)$) from group $j = 1, 2, \ldots, K$
$dN(x) = \sum_{j=1}^{K} dN_j(x)$, total # of observed deaths at time $x$
$Y(x) = \sum_{j=1}^{K} Y_j(x)$, total # at risk at time $x$
$\mathcal{F}(x) = \{dN_j(u), Y_j(x);\ j = 1, 2, \ldots, K \text{ for all grid points } u < x, \text{ and } dN(x)\}$
At a slice of time $[x, x + \Delta x)$, the data can be viewed as a $2 \times K$ contingency table:
                 Group 1              Group 2              …    Group $K$              Total
# of deaths      $dN_1(x)$            $dN_2(x)$            …    $dN_K(x)$              $dN(x)$
# alive          $Y_1(x) - dN_1(x)$   $Y_2(x) - dN_2(x)$   …    $Y_K(x) - dN_K(x)$     $Y(x) - dN(x)$
# at risk        $Y_1(x)$             $Y_2(x)$             …    $Y_K(x)$               $Y(x)$
Conditioning on $\mathcal{F}(x)$, we know the marginal counts of this $2 \times K$ table, in which case the vector $(dN_1(x), dN_2(x), \ldots, dN_K(x))^T$ is distributed as a multivariate version of a hypergeometric distribution. In particular,
$$E[dN_j(x) \mid \mathcal{F}(x)] = \frac{Y_j(x)\, dN(x)}{Y(x)}, \quad j = 1, 2, \ldots, K$$
$$Var[dN_j(x) \mid \mathcal{F}(x)] = \frac{Y_j(x)\, [Y(x) - Y_j(x)]\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$
$$Cov[dN_j(x), dN_{j'}(x) \mid \mathcal{F}(x)] = -\frac{Y_j(x)\, Y_{j'}(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$
Consider a $(K - 1)$-dimensional vector $U(w)$, made up of the weighted sum of the observed number of deaths minus the expected number of deaths for each treatment group $j = 1, 2, \ldots, K - 1$, summed over $x$:
$$U(w) = \begin{pmatrix} \sum_x w(x)\left[dN_1(x) - Y_1(x)\, \frac{dN(x)}{Y(x)}\right] \\ \sum_x w(x)\left[dN_2(x) - Y_2(x)\, \frac{dN(x)}{Y(x)}\right] \\ \vdots \\ \sum_x w(x)\left[dN_{K-1}(x) - Y_{K-1}(x)\, \frac{dN(x)}{Y(x)}\right] \end{pmatrix}$$
Note:
1. The $(K - 1)$-dimensional vector is considered here since the sum of all $K$ elements is equal to zero and hence we have redundancy. If we included all $K$ elements, then the resulting vector would have a singular variance matrix.
2. Using arguments similar to the two-sample test, it can be shown that the vectors of observed minus expected counts computed at different times $x$ and $x'$ are uncorrelated. Consequently, the corresponding $(K - 1) \times (K - 1)$ covariance matrix of the vector $U(w)$ is given by $V = [V_{jj'}]$, $j, j' = 1, 2, \ldots, K - 1$, where
$$V_{jj'} = -\sum_x w^2(x)\, \frac{Y_j(x)\, Y_{j'}(x)\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}, \quad j \ne j'$$
$$V_{jj} = \sum_x w^2(x)\, \frac{Y_j(x)\, [Y(x) - Y_j(x)]\, dN(x)\, [Y(x) - dN(x)]}{Y^2(x)\, [Y(x) - 1]}$$
Test Statistic: $K$ sample weighted logrank test
The test statistic used to test the null hypothesis is given by the quadratic form
$$T(w) = U(w)^T\, V^{-1}\, U(w)$$
Note: This statistic would be numerically identical regardless of which $K - 1$ groups were included to avoid redundancy.
Under $H_0$, this is distributed asymptotically as a $\chi^2$ distribution with $K - 1$ degrees of freedom. Hence, a level $\alpha$ test would reject the null hypothesis whenever
$$T(w) = U(w)^T\, V^{-1}\, U(w) \ge \chi^2_{\alpha; K-1},$$
where $\chi^2_{\alpha; K-1}$ is the quantity that satisfies $P[\chi^2_{K-1} \ge \chi^2_{\alpha; K-1}] = \alpha$.
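The quadratic form can be sketched in base R for the unweighted case $w(x) \equiv 1$. The function name `k_sample_logrank`, its `omit` argument (which group is left out to avoid redundancy), and the nine-observation toy data are hypothetical.

```r
# K-sample logrank test (w(x) = 1): U' V^{-1} U over K - 1 groups
k_sample_logrank <- function(time, status, group, omit = NULL) {
  g <- factor(group)
  K <- nlevels(g)
  if (is.null(omit)) omit <- K                 # drop the last group by default
  U <- numeric(K)
  V <- matrix(0, K, K)
  for (x in sort(unique(time[status == 1]))) {
    Yj  <- as.numeric(table(g[time >= x]))                # at risk per group
    dNj <- as.numeric(table(g[time == x & status == 1]))  # deaths per group
    Y <- sum(Yj); dN <- sum(dNj)
    U <- U + (dNj - Yj * dN / Y)
    if (Y > 1) {
      c0 <- dN * (Y - dN) / (Y^2 * (Y - 1))
      # diagonal: Yj (Y - Yj) c0; off-diagonal: -Yj Yj' c0
      V <- V + c0 * (diag(Yj * Y, nrow = K) - outer(Yj, Yj))
    }
  }
  idx  <- setdiff(seq_len(K), omit)
  stat <- sum(U[idx] * solve(V[idx, idx, drop = FALSE], U[idx]))
  list(statistic = stat, df = K - 1,
       p.value = pchisq(stat, df = K - 1, lower.tail = FALSE))
}

# Hypothetical toy data with three groups and two censored observations
time   <- 1:9
status <- c(1, 1, 1, 0, 1, 1, 1, 0, 1)
group  <- rep(c("A", "B", "C"), 3)

res_drop_C <- k_sample_logrank(time, status, group)            # omits group C
res_drop_A <- k_sample_logrank(time, status, group, omit = 1)  # omits group A
c(drop_C = res_drop_C$statistic, drop_A = res_drop_A$statistic)
```

The two statistics coincide, illustrating the note that the choice of omitted group does not matter. With two groups the statistic reduces to the square of the two-sample $T(w)$.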
Remark:
1. As with the two-sample tests, if the weight function $w(x)$ is stochastic, then it must be a function of the survival and censoring data prior to time $x$.
2. The most popular test uses the weight $w(x) \equiv 1$ and is referred to as the $K$-sample logrank test. These tests are available in most major software packages such as SAS, S+, etc. For example, the SAS code is exactly the same as that for two-sample tests.
Stratified Test: Do We Need it?
• When comparing survival distributions among groups, especially in non-randomized studies, confounding, i.e., other factors that may affect the interpretation of the relationship between survival and groups, is a concern.
• For example, suppose we extract hospital records for patients who were treated after a myocardial infarction (heart attack) with either bypass surgery or angioplasty, and wish to test whether or not there is a difference in the survival distributions between these treatments. If we believe that these two groups of patients are comparable, a logrank or weighted logrank test can be used.
• However, since this study was not randomized, there is no guarantee that the patients being compared are prognostically similar; e.g., it may be that the group of patients receiving angioplasty is younger on average or prognostically better in other ways. Then we wouldn't know whether significant differences between treatment groups, if they occurred, were due to treatment or to other prognostic factors.
• Or the treatments do have different effects, but the difference is masked by other factors that are distributed unevenly between the treatment groups.
• The effect of these prognostic factors can be adjusted for either through stratification or through regression modeling (discussed later).
• To adjust by stratification, strata of our population are defined according to combinations of factors which make individuals within each stratum more prognostically similar. Comparisons of the survival distributions between groups are made within each stratum, and these results are then combined across the strata.
Stratified logrank Test
Consider a population being sampled as consisting of $p$ strata. The strata, for example, could be those used in balanced randomization of a clinical trial, or combinations of factors that make individuals within each stratum prognostically similar. Consider two-sample comparisons (treatments 0 vs. 1), and let $j$ index the strata, $j = 1, 2, \ldots, p$. The null hypothesis being tested in a stratified test is
H0: $S_{1j}(t) = S_{0j}(t)$, for $t \ge 0$, $j = 1, 2, \ldots, p$
The stratified logrank test consists of computing the two-sample test statistic within each stratum and then combining these results across strata. For example,
$$T(w) = \frac{\sum_{j=1}^{p} \sum_x w_j(x)\left[dN_{1j}(x) - \frac{dN_j(x) \times Y_{1j}(x)}{Y_j(x)}\right]}{\left\{\sum_{j=1}^{p} \sum_x w_j^2(x)\, \frac{Y_{1j}(x)\, Y_{0j}(x)\, dN_j(x)\, [Y_j(x) - dN_j(x)]}{Y_j^2(x)\, [Y_j(x) - 1]}\right\}^{1/2}}$$
Since within each of the strata there was no additional balance being forced between the two groups beyond chance, the mean and variance of the test statistics computed within strata under the null hypothesis are correct. The combining of the statistics and their variances over independent strata is now also correct. The resulting stratified logrank test has a standard normal distribution (asymptotically) under $H_0$, i.e.,
$$T(w) \overset{a}{\sim} N(0, 1) \quad \text{or} \quad [T(w)]^2 \overset{a}{\sim} \chi^2_1$$
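A minimal base-R sketch of the stratified logrank statistic with $w_j(x) \equiv 1$: risk sets and death counts are formed within each stratum, and the numerators and variance terms are then added across strata. The function name `stratified_logrank` and the toy data (two strata with identical layouts) are hypothetical.

```r
# Stratified two-sample logrank test (w = 1); z must be coded 0/1
stratified_logrank <- function(time, status, z, stratum) {
  U <- 0; V <- 0
  for (s in unique(stratum)) {
    in_s <- stratum == s
    tt <- time[in_s]; dd <- status[in_s]; zz <- z[in_s]
    for (x in sort(unique(tt[dd == 1]))) {     # failure times in this stratum
      Y1  <- sum(tt >= x & zz == 1)
      Y0  <- sum(tt >= x & zz == 0)
      Y   <- Y1 + Y0
      dN1 <- sum(tt == x & dd == 1 & zz == 1)
      dN  <- sum(tt == x & dd == 1)
      U <- U + (dN1 - Y1 * dN / Y)
      if (Y > 1) V <- V + Y1 * Y0 * dN * (Y - dN) / (Y^2 * (Y - 1))
    }
  }
  Tw <- U / sqrt(V)
  list(U = U, V = V, T = Tw, p.value = 2 * pnorm(-abs(Tw)))
}

# Hypothetical toy data: the same six-subject layout in each of two strata
time    <- rep(c(1, 2, 3, 4, 5, 6), 2)
status  <- rep(c(1, 1, 1, 0, 1, 1), 2)
z       <- rep(c(1, 0, 1, 0, 1, 0), 2)
stratum <- rep(c(1, 2), each = 6)

res <- stratified_logrank(time, status, z, stratum)
c(U = res$U, V = res$V, T = res$T)
```

Each stratum contributes 1.1 to the numerator and 0.99 to the variance, so the combined values are 2.2 and 1.98. In practice, the `survival` package offers the same kind of analysis through `survdiff()` with a `strata()` term in the formula.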
Remarks:
• Stratified tests can be constructed for $K$ samples as well. We just add the vectors of test statistics over strata, as well as the covariance matrices, before computing the quadratic form leading to the $\chi^2$ statistic with $(K - 1)$ degrees of freedom.
• Sample size considerations are similar to those for the unstratified tests. Power depends on the number of observed deaths and the hazard ratio between groups within strata. For example, the stratified logrank test with $w_j(x) \equiv 1$ for all $x$ and $j$ is most powerful to detect proportional hazards alternatives within strata, where the hazard ratio is also assumed constant between strata. Namely,
Ha: $\lambda_{1j}(x) = \lambda_{0j}(x) \exp(\beta_A)$
The total number of deaths in the study necessary to obtain power $1 - \gamma$ for detecting a difference corresponding to $\beta_A$ above, using a stratified logrank test at the $\alpha$ level of significance (two-sided), is equal to
$$d = \frac{4 (z_{\alpha/2} + z_\gamma)^2}{\beta_A^2}$$
This assumes equal randomization to the two treatments and is the same value as that obtained for unstratified tests. To compute the expected number of deaths at the design stage, we must compute it separately over treatments and strata, and these should add up to the desired number above.
SAS Example