4. Comparison of Two (K) Samples

Page 1: 4. Comparison of Two (K) Samples

4. Comparison of Two (K) Samples

Page 2: 4. Comparison of Two (K) Samples

K = 2

$Z$: treatment indicator, i.e., $Z = 1$ for treatment 1 (new treatment); $Z = 0$ for treatment 0 (standard treatment or placebo).

Problem: compare the survival distributions between two groups.

Ex: comparing treatments on patients with a particular disease.

Null Hypothesis:

H0: no treatment (group) difference
H0: $S_0(t) = S_1(t)$, for $t \ge 0$
H0: $\lambda_0(t) = \lambda_1(t)$, for $t \ge 0$

Alternative Hypothesis:

Ha: the survival time for one treatment is stochastically larger or smaller than the survival time for the other treatment.
Ha: $S_1(t) \ge S_0(t)$, for $t \ge 0$, with strict inequality for some $t$ (one-sided)
Ha: either $S_1(t) \ge S_0(t)$ or $S_0(t) \ge S_1(t)$, for $t \ge 0$, with strict inequality for some $t$ (two-sided)

Solution: In biomedical applications, it has become common practice to use nonparametric tests, that is, test statistics whose distribution under the null hypothesis does not depend on specific parametric assumptions about the shape of the probability distribution. With censored survival data, the class of weighted logrank tests is most often used, with the logrank test being the most common.

Page 3: 4. Comparison of Two (K) Samples

Notations

A sample of triplets $(X_i, \Delta_i, Z_i)$, $i = 1, 2, \ldots, n$, where

$X_i = \min(T_i, C_i)$

$T_i$ = latent failure time; $C_i$ = latent censoring time

$\Delta_i = I(T_i \le C_i)$

$Z_i = 1$ for the new treatment; $Z_i = 0$ for the standard treatment

Also, define

$n_1$ = number of individuals in group 1; $n_0$ = number of individuals in group 0, i.e.,

\[ n_j = \sum_{i=1}^{n} I(Z_i = j), \quad j = 0, 1 \]

$n = n_0 + n_1$

$Y_1(x)$ = number of individuals at risk at time $x$ from trt 1 $= \sum_{i=1}^{n} I(X_i \ge x, Z_i = 1)$

$Y_0(x)$ = number of individuals at risk at time $x$ from trt 0 $= \sum_{i=1}^{n} I(X_i \ge x, Z_i = 0)$

$Y(x) = Y_0(x) + Y_1(x)$

$dN_1(x)$ = # of deaths observed at time $x$ from trt 1 $= \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1, Z_i = 1)$

$dN_0(x)$ = # of deaths observed at time $x$ from trt 0 $= \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1, Z_i = 0)$

$dN(x) = dN_0(x) + dN_1(x) = \sum_{i=1}^{n} I(X_i = x, \Delta_i = 1)$

Note: $dN(x)$ actually corresponds to the observed number of deaths in the time window $[x, x + \Delta x)$ for some partition of the time axis into intervals of length $\Delta x$. If the partition is sufficiently fine, then thinking of the number of deaths occurring exactly at $x$ or in $[x, x + \Delta x)$ makes little difference, and in the limit makes no difference at all.

Page 4: 4. Comparison of Two (K) Samples

Weighted logrank Test Statistic

\[ T(w) = \frac{U(w)}{se(U(w))} \]

where

\[ U(w) = \sum_{x} w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right] \]

and $se(U(w))$ will be given later.

The null hypothesis of treatment equality will be rejected if $T(w)$ is sufficiently different from zero.

Note:

1. At any time $x$ for which there is no observed death,
\[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} = 0. \]
This means that the sum above is only over distinct failure times.

2. $U(w)$ is a weighted sum, over the distinct failure times, of the observed number of deaths from treatment 1 minus the expected number of deaths from treatment 1 if the null hypothesis were true.

3. When $w(x) = 1$, $T(w)$ is the logrank test statistic.

Page 5: 4. Comparison of Two (K) Samples

Motivation

Take a slice of time $[x, x + \Delta x)$. The following 2 × 2 table can be formulated:

              Treatment 1          Treatment 0          Total
Deaths        dN_1(x)              dN_0(x)              dN(x)
Survivors     Y_1(x) - dN_1(x)     Y_0(x) - dN_0(x)     Y(x) - dN(x)
At risk       Y_1(x)               Y_0(x)               Y(x)

Under H0:

\[ dN_1(x) \mid Y_1(x), Y(x), dN(x) \sim \text{Hypergeometric}(Y_1(x), dN(x), Y(x)) \]

So,

\[ E[dN_1(x) \mid Y_1(x), Y(x), dN(x)] = \frac{Y_1(x)\, dN(x)}{Y(x)} \]

Page 6: 4. Comparison of Two (K) Samples

$dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ is the observed number of deaths minus the expected number of deaths from treatment 1. Hence,

• If H0 is true, the sum of $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ over $x$ is expected to be near zero.

• If the hazard rate for treatment 1 were lower than that for treatment 0 consistently over $x$, then on average we expect $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ to be negative.

• If the hazard rate for treatment 1 were higher than that for treatment 0 consistently over $x$, then on average we expect $dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)}$ to be positive.

Specifically, the weighted logrank test statistic is given by

\[ T(w) = \frac{\sum_{x} w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]}{\left\{ \sum_{x} w^2(x) \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \right\}^{1/2}} \]

Under H0: $T(w) \overset{a}{\sim} N(0, 1)$

Therefore, a level $\alpha$ test (two-sided) will reject H0: $S_0(t) = S_1(t)$ when

\[ |T(w)| \ge z_{\alpha/2} \]

Page 7: 4. Comparison of Two (K) Samples

Remarks:

1. Logrank test statistic:

\[ T = \frac{\sum_{x} \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]}{\left\{ \sum_{x} \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \right\}^{1/2}} \]

2. The statistic in the numerator is a weighted sum of observed minus expected counts over the $k$ 2 × 2 tables, where $k$ is the number of distinct failure times.

3. The weight function $w(x)$ can be used to emphasize differences in the hazard rates over time according to their relative values. For example, if the weight is larger early in time and becomes smaller later, then the test statistic emphasizes early differences in the survival curves.

4. If the weights $w(x)$ are stochastic (functions of the data), then they need to be a function of the censoring and survival information prior to time $x$.

5. $w(x) = 1$: logrank test

6. $w(x) = Y(x)$: Gehan's generalization of the Wilcoxon test

7. $w(x) = KM(x)$: Peto-Prentice's generalization of the Wilcoxon test

Note: Since both $Y(x)$ and $KM(x)$ are non-increasing functions of $x$, both Gehan's and Peto-Prentice's tests emphasize differences early in the survival curves.
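The statistic $T(w)$ and the weight choices in the remarks can be sketched directly from the definitions of $Y_1(x)$, $Y_0(x)$, $dN_1(x)$ and $dN(x)$. The following is a minimal Python illustration, not the code used in these slides; the function name `weighted_logrank` and its arguments are assumptions made here, and only the logrank ($w = 1$) and Gehan ($w = Y(x)$) weights are covered.

```python
import math

def weighted_logrank(times, events, groups, weight="logrank"):
    """Return (U, V, T) for the two-sample weighted logrank test.

    times  : observed times X_i = min(T_i, C_i)
    events : 1 if a death was observed (Delta_i = 1), 0 if censored
    groups : 1 for treatment 1, 0 for treatment 0 (Z_i)
    weight : "logrank" for w(x) = 1, anything else for Gehan's w(x) = Y(x)
    """
    data = list(zip(times, events, groups))
    # Only distinct failure times contribute to the sums.
    death_times = sorted({t for t, e, _ in data if e == 1})
    U = 0.0  # numerator: weighted sum of observed minus expected deaths
    V = 0.0  # variance estimate of U
    for x in death_times:
        Y1 = sum(1 for t, _, z in data if t >= x and z == 1)
        Y0 = sum(1 for t, _, z in data if t >= x and z == 0)
        Y = Y1 + Y0
        dN1 = sum(1 for t, e, z in data if t == x and e == 1 and z == 1)
        dN = sum(1 for t, e, _ in data if t == x and e == 1)
        w = 1.0 if weight == "logrank" else float(Y)
        U += w * (dN1 - Y1 * dN / Y)
        if Y > 1:  # the hypergeometric variance is zero when Y = 1
            V += w ** 2 * Y1 * Y0 * dN * (Y - dN) / (Y ** 2 * (Y - 1))
    T = U / math.sqrt(V) if V > 0 else float("nan")
    return U, V, T

# Toy data (hypothetical): two patients per arm, no censoring.
U, V, T = weighted_logrank([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0])
```

A two-sided level $\alpha$ test would compare $|T|$ with $z_{\alpha/2}$. For real analyses a tested routine such as R's `survdiff` is preferable to this sketch.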

Page 8: 4. Comparison of Two (K) Samples

A Heuristic Proof

Define a set of random variables:

\[ F(x) = \{ dN_0(u), dN_1(u), Y_1(u), Y_0(u), w_1(u), w_0(u), dN(x) \text{ for all grid points } u < x \} \]

Assume H0 is true. Knowing $F(x)$ would imply (with respect to the 2 × 2 table) that:

We know $Y_1(x)$ and $Y_0(x)$ (i.e., the number at risk at time $x$ from either treatment group), and, in addition, we know $dN(x)$ (i.e., the total number of deaths from both treatment groups occurring in $[x, x + \Delta x)$). The only thing we don't know is $dN_1(x)$.

Conditional on $F(x)$, we have a 2 × 2 table which under the null hypothesis follows independence, and we know the marginal counts of the table (i.e., the marginal counts are fixed conditional on $F(x)$). Therefore, the conditional distribution of one of the cell counts, say $dN_1(x)$, given $F(x)$ follows a hypergeometric distribution:

\[ P[dN_1(x) = c \mid Y_1(x), Y(x), dN(x)] = \frac{\binom{dN(x)}{c} \binom{Y(x) - dN(x)}{Y_1(x) - c}}{\binom{Y(x)}{Y_1(x)}} \]

\[ E[dN_1(x) \mid F(x)] = \frac{Y_1(x)\, dN(x)}{Y(x)} \]

\[ Var[dN_1(x) \mid F(x)] = \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \]
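The hypergeometric mean and variance used here can be checked numerically by enumerating the probability mass function. The sketch below is only an illustration in Python; the function names and the example margins (5 and 7 at risk, 3 deaths) are hypothetical.

```python
from math import comb

def hypergeom_moments(Y1, Y0, dN):
    """Mean and variance of dN1 given the margins, by direct enumeration."""
    Y = Y1 + Y0
    lo, hi = max(0, dN - Y0), min(dN, Y1)  # feasible values of dN1
    total = comb(Y, Y1)
    pmf = {c: comb(dN, c) * comb(Y - dN, Y1 - c) / total
           for c in range(lo, hi + 1)}
    mean = sum(c * p for c, p in pmf.items())
    var = sum((c - mean) ** 2 * p for c, p in pmf.items())
    return mean, var

def closed_form(Y1, Y0, dN):
    """Closed-form conditional mean and variance of dN1."""
    Y = Y1 + Y0
    mean = Y1 * dN / Y
    var = Y1 * Y0 * dN * (Y - dN) / (Y ** 2 * (Y - 1))
    return mean, var

# Hypothetical margins: 5 at risk on treatment 1, 7 on treatment 0, 3 deaths.
enumerated = hypergeom_moments(5, 7, 3)
formula = closed_form(5, 7, 3)
```

The two pairs should agree up to floating-point error, which is the fact that the variance estimator of $U(w)$ relies on.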

Page 9: 4. Comparison of Two (K) Samples

The numerator of the weighted logrank test statistic is:

\[ U(w) = \sum_{x} w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right] \]

Notice that under H0:

\[
\begin{aligned}
E[U(w)] &= \sum_{x} E\left\{ w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right] \right\} \\
&= \sum_{x} E\left\{ E\left( w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right] \,\Big|\, F(x) \right) \right\} \\
&= \sum_{x} E\left\{ w(x) \left[ E(dN_1(x) \mid F(x)) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right] \right\} = 0
\end{aligned}
\]

Next, we will find an unbiased estimator for the variance of $U(w)$. Let

\[ A(x) = w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]. \]

Then,

\[ Var[U(w)] = Var\left[ \sum_{x} A(x) \right] = \sum_{x} Var[A(x)] + \sum_{x \ne y} Cov[A(x), A(y)]. \]

Page 10: 4. Comparison of Two (K) Samples

Notice that we have already shown $E[A(x)] = E[A(y)] = 0$. WLOG, suppose $y < x$. Then $A(y)$ is determined by $F(x)$, so

\[
\begin{aligned}
Cov[A(x), A(y)] &= E[A(x) A(y)] = E\{ E[A(x) A(y) \mid F(x)] \} \\
&= E\{ A(y)\, E[A(x) \mid F(x)] \} = 0
\end{aligned}
\]

Now,

\[
\begin{aligned}
Var[U(w)] &= \sum_{x} Var[A(x)] = \sum_{x} E[A^2(x)] = \sum_{x} E\{ E[A^2(x) \mid F(x)] \} \\
&= \sum_{x} E\left\{ E\left( w^2(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]^2 \,\Big|\, F(x) \right) \right\} \\
&= \sum_{x} E\left\{ w^2(x)\, E\left( \left[ dN_1(x) - E(dN_1(x) \mid F(x)) \right]^2 \,\Big|\, F(x) \right) \right\} \\
&= \sum_{x} E\left\{ w^2(x)\, Var[dN_1(x) \mid F(x)] \right\} \\
&= \sum_{x} E\left\{ w^2(x) \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \right\}
\end{aligned}
\]

This means that

\[ \sum_{x} w^2(x) \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \]

is an unbiased estimator for $Var[U(w)]$.

Page 11: 4. Comparison of Two (K) Samples

Recapping:

Under H0: $S_0(t) = S_1(t)$,

1. The statistic $U(w) = \sum_{x} A(x)$ has expectation equal to zero, i.e., $E[U(w)] = 0$.

2. $U(w) = \sum_{x} A(x)$ is made up of a sum of conditionally uncorrelated terms, each with mean zero. By the central limit theory for such martingale structures, $U(w)$ properly normalized will be approximately a standard normal random variable. That is:

\[ T(w) = \frac{U(w)}{se(U(w))} \overset{a}{\sim} N(0, 1) \]

3. An unbiased estimate of the variance of $U(w)$ was given by

\[ \sum_{x} w^2(x) \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \]

Therefore,

\[ T(w) = \frac{U(w)}{se(U(w))} = \frac{\sum_{x} w(x) \left[ dN_1(x) - \frac{Y_1(x) \times dN(x)}{Y(x)} \right]}{\left\{ \sum_{x} w^2(x) \frac{Y_1(x) Y_0(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \right\}^{1/2}} \overset{a}{\sim} N(0, 1) \]

Page 12: 4. Comparison of Two (K) Samples

An Example

The data give the survival times for 25 myelomatosis patients randomized to two treatments (1 or 2):

dur    status  trt  renal
8      1       1    1
180    1       2    0
…
1296   1       2    0

dur is the patient's survival or censoring time, status is the censoring indicator, trt is the treatment indicator, and renal is the indicator of impaired renal function (0 = normal; 1 = impaired).

We test the null hypothesis that the treatment trt has no effect, i.e., H0: $S_0(t) = S_1(t)$.

SAS & R codes

Page 13: 4. Comparison of Two (K) Samples

Note:

1. The numerator of Gehan's Wilcoxon test is much larger than that of the logrank test, since Gehan's Wilcoxon test uses the number at risk as the weight and the logrank test uses the identity weight.

2. The likelihood ratio test is based on an exponential model.

3. In this example, the logrank test gives a more significant result than Gehan's Wilcoxon test (although neither provides strong evidence against the null hypothesis). Why is that?

The treatment-specific Kaplan-Meier survival estimates were generated using the following R functions:

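library(survival)  # provides Surv(), survfit() and survdiff()

# Open a PDF device for the plot (pdf() has no 'horizontal' argument, so it is dropped)
pdf(file="fig_myel.pdf", height=6, width=8.5, pointsize=14)

# par(mfrow=c(1,2))

# Read the myelomatosis data (columns: dur, status, trt, renal)
example <- read.table(file="chap4_myel.txt", header=TRUE)

# Treatment-specific Kaplan-Meier estimates
fit <- survfit(Surv(dur, status) ~ trt, data=example)

plot(fit, xlab="Patient time (months)", ylab="survival probability", lty=c(1,2))

legend(1000, 1, c("trt = 1", "trt = 2"), lty=c(1,2), cex=0.8)

dev.off()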

Page 14: 4. Comparison of Two (K) Samples

logrank test in R

> survdiff(Surv(dur, status) ~ trt, example)
Call:
survdiff(formula = Surv(dur, status) ~ trt, data = example)

       N Observed Expected (O-E)^2/E (O-E)^2/V
trt=1 12        6     8.34     0.655      1.31
trt=2 13       11     8.66     0.631      1.31

 Chisq= 1.3  on 1 degrees of freedom, p= 0.252

Peto-Prentice's Wilcoxon test

> survdiff(Surv(dur, status) ~ trt, rho=1, example)

       N Observed Expected (O-E)^2/E (O-E)^2/V
trt=1 12     4.80     5.60     0.115     0.304
trt=2 13     6.83     6.03     0.106     0.304

 Chisq= 0.3  on 1 degrees of freedom, p= 0.581

[Figure: Kaplan-Meier estimates for two treatments]

Page 15: 4. Comparison of Two (K) Samples

Power and Sample Size

Since a survival curve is infinite dimensional, describing departures from the null as differences at every point in time over the survival curve would be complicated. Clearly, some simplifying conditions must be given. In clinical trials, proportional hazards alternatives have become very popular. That is,

\[ \frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta), \quad \text{for all } t \ge 0 \]

1. $\beta > 0$: individuals on treatment 1 have worse survival (i.e., die faster).
2. $\beta = 0$: no treatment difference (null is true).
3. $\beta < 0$: individuals on treatment 1 have better survival (i.e., live longer).

\[
\begin{aligned}
\frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta)
&\;\Rightarrow\; -\frac{d \log S_1(t)}{dt} = -\frac{d \log S_0(t)}{dt} \exp(\beta) \\
&\;\Rightarrow\; \log S_1(t) = \log S_0(t) \exp(\beta) + C \quad (t = 0 \Rightarrow C = 0) \\
&\;\Rightarrow\; S_1(t) = [S_0(t)]^{\gamma}, \quad \gamma = \exp(\beta) \\
&\;\Rightarrow\; \log[-\log S_1(t)] = \log[-\log S_0(t)] + \beta
\end{aligned}
\]

Based on the last equation, by plotting estimated survival curves (say, Kaplan-Meier estimates) for two treatments (groups) on a log[-log] scale, we would see a constant vertical shift between the two curves if the hazards are proportional.
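The constant vertical shift on the log[-log] scale can be illustrated numerically. The sketch below assumes an exponential baseline with a hypothetical hazard of 0.1 and a hypothetical hazard ratio of 2/3; it builds $S_1(t) = [S_0(t)]^{\gamma}$ and computes the difference of the transformed curves at a few time points.

```python
import math

def cloglog(s):
    """log(-log(s)) for a survival probability 0 < s < 1."""
    return math.log(-math.log(s))

beta = math.log(2.0 / 3.0)   # hypothetical log hazard ratio
lam0 = 0.1                   # hypothetical baseline hazard (exponential S0)
grid = [0.5, 1.0, 2.0, 5.0, 10.0]

S0 = [math.exp(-lam0 * t) for t in grid]
S1 = [s ** math.exp(beta) for s in S0]   # S1(t) = S0(t)^gamma, gamma = exp(beta)

# Vertical distance between the two curves on the log[-log] scale.
shifts = [cloglog(s1) - cloglog(s0) for s0, s1 in zip(S0, S1)]
```

Every entry of `shifts` equals `beta` up to rounding, whatever the baseline. With estimated Kaplan-Meier curves the shift would only be approximately constant.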

Page 16: 4. Comparison of Two (K) Samples

EX:

Note: Do not be misled by the visual impression of the curves near the origin.

For the specific case where the survival times for the two groups are exponentially distributed (i.e., constant hazard), we automatically have proportional hazards, since

\[ \frac{\lambda_1(t)}{\lambda_0(t)} = \frac{\lambda_1}{\lambda_0}, \quad \text{for all } t \ge 0 \]

The ratio of median $m$ or mean $\mu$ survival times for two groups having exponential distributions is

\[ \frac{m_1}{m_0} = \frac{\log 2 / \lambda_1}{\log 2 / \lambda_0} = \frac{\lambda_0}{\lambda_1} = \frac{1/\lambda_1}{1/\lambda_0} = \frac{\mu_1}{\mu_0} \]

Page 17: 4. Comparison of Two (K) Samples

logrank Test & Proportional Hazards

The logrank test is the most powerful test among the weighted logrank tests to detect proportional hazards alternatives. In fact, it is the most powerful test among all nonparametric tests for detecting proportional hazards alternatives. Therefore, the proportional hazards alternative has not only a nice interpretation but also nice statistical properties. These features lead to the natural use of (unweighted) logrank tests.

For Ha: $\frac{\lambda_1(t)}{\lambda_0(t)} = \exp(\beta_A)$, $\beta_A \ne 0$, when censoring does not depend on treatment (e.g., randomized experiments), the logrank test statistic $T_n$ has a distribution approximated by

\[ T_n \overset{a}{\sim} N\left( \beta_A \sqrt{d\,\theta(1 - \theta)},\; 1 \right) \]

where $d$ is the total number of deaths (events), $\theta$ is the proportion in group 1, and $\beta_A$ is the log hazard ratio under the alternative.

Let $\mu = \beta_A \sqrt{d\,\theta(1 - \theta)}$; then $T_n \overset{a}{\sim} N(\mu, 1)$.

Page 18: 4. Comparison of Two (K) Samples

Sample Size Formula

Recall that our test procedure is: reject $H_0$ when $|T_n| > z_{\alpha/2}$, where

under $H_0$, $T_n \overset{a}{\sim} N(0, 1)$, and under $H_a$, $T_n \overset{a}{\sim} N(\mu, 1)$.

By the definition of power, we have

\[ P(|T_n| > z_{\alpha/2} \mid H_a) = 1 - \gamma, \quad \text{where } (1 - \gamma) \text{ is the desired power,} \]

that is,

\[ P(T_n > z_{\alpha/2} \mid H_a) + P(T_n < -z_{\alpha/2} \mid H_a) = 1 - \gamma \]

Assume $\beta_A > 0$, then $\mu > 0$. In this case, with $Z \sim N(0, 1)$,

\[ P(T_n > z_{\alpha/2} \mid H_a) = P(T_n - \mu > z_{\alpha/2} - \mu \mid H_a) = P(Z > z_{\alpha/2} - \mu) \]

\[ P(T_n < -z_{\alpha/2} \mid H_a) = P(T_n - \mu < -z_{\alpha/2} - \mu \mid H_a) = P(Z < -z_{\alpha/2} - \mu) = P(Z > z_{\alpha/2} + \mu) \approx 0 \]

Hence,

\[
\begin{aligned}
P(Z > z_{\alpha/2} - \mu) \approx 1 - \gamma
&\;\Rightarrow\; P(Z < z_{\alpha/2} - \mu) \approx \gamma
\;\Rightarrow\; P(Z > -z_{\alpha/2} + \mu) \approx \gamma \\
&\;\Rightarrow\; -z_{\alpha/2} + \mu = z_{\gamma}
\;\Rightarrow\; \mu = z_{\gamma} + z_{\alpha/2} \\
&\;\Rightarrow\; \beta_A \sqrt{d\,\theta(1 - \theta)} = z_{\gamma} + z_{\alpha/2}
\;\Rightarrow\; d = \frac{(z_{\gamma} + z_{\alpha/2})^2}{\beta_A^2\, \theta(1 - \theta)}
\end{aligned}
\]

• Exactly the same formula for $d$ can be derived if $\beta_A < 0$.
• $d$ acts as the sample size.

Page 19: 4. Comparison of Two (K) Samples

Take a two-sided logrank test with level $\alpha = 0.05$, power $1 - \gamma = 0.90$, $\theta = 0.5$. Then

\[ d = \frac{4 (1.96 + 1.28)^2}{\beta_A^2} \]

The following table gives the required number of events for different hazard ratios $\exp(\beta_A)$.

Hazard ratio exp(beta_A)    d
2.00                        88
1.50                        256
1.25                        844
1.10                        4623

One strategy is to enter some larger number of patients, say 350 patients (about 175 patients on each treatment arm), and then continue following until we have 256 deaths.

EX: Suppose patients with advanced lung cancer have a median survival time of 6 months. We have a new treatment which we hope will increase the median survival time to 9 months. If the survival times follow exponential distributions, then this difference would correspond to a hazard ratio of

\[ \exp(\beta_A) = \frac{\lambda_1(t)}{\lambda_0(t)} = \frac{\lambda_1}{\lambda_0} = \frac{m_0}{m_1} = \frac{6}{9} = \frac{2}{3}, \]

so that

\[ d = \frac{4 (1.96 + 1.28)^2}{[\log(2/3)]^2} = 256 \]
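The event counts in the table and in the lung cancer example follow from the formula for $d$. A small Python sketch is given below; the function name `required_deaths` is made up for illustration, the defaults 1.96 and 1.28 are the rounded normal quantiles used in the slides, and the result is rounded up to the next whole event.

```python
import math

def required_deaths(hazard_ratio, z_alpha_half=1.96, z_gamma=1.28, theta=0.5):
    """Number of deaths d = (z_gamma + z_{alpha/2})^2 / (beta_A^2 * theta * (1 - theta)).

    hazard_ratio : exp(beta_A) under the alternative
    theta        : proportion of patients in group 1
    """
    beta_a = math.log(hazard_ratio)
    d = (z_gamma + z_alpha_half) ** 2 / (beta_a ** 2 * theta * (1.0 - theta))
    return math.ceil(d)

# Hazard ratios from the table, plus the 6-month vs 9-month median example.
table = {hr: required_deaths(hr) for hr in (2.00, 1.50, 1.25, 1.10)}
lung_example = required_deaths(2.0 / 3.0)
```

Because only $\beta_A^2$ enters the formula, a hazard ratio of 2/3 requires the same number of deaths as a hazard ratio of 1.5. Using unrounded quantiles would change some entries slightly.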

Page 20: 4. Comparison of Two (K) Samples

Design Specification

More often in survival studies we need to be able to specify to the investigators the following:
1. number of patients;
2. accrual period;
3. follow-up time.

It was shown by Schoenfeld that reasonable approximations for obtaining the desired power can be made by ensuring that the total expected number of deaths (events) from both groups, computed under the alternative, equals (assuming equal probability of assigning treatments)

\[ E(d) = \frac{4 (z_{\gamma} + z_{\alpha/2})^2}{\beta_A^2} \]

That is, we compute the expected number of deaths for groups "0" and "1" separately under the alternative hypothesis; the sum of these should equal the value given by the above formula.

Page 21: 4. Comparison of Two (K) Samples

Computing $E(d)$ in One Sample

Suppose $(X_i, \Delta_i)$, $i = 1, \ldots, n$, represents a sample of possibly censored survival data, with $X_i = \min(T_i, C_i)$, $\Delta_i = I(T_i \le C_i)$, and the following notation:

                     T          C
Density function     f(t)       g(t)
C.D.F.               F(t)       G(t)
Survival function    S(t)       H(t)
Hazard function      lambda(t)  mu(t)

The expected number of deaths is $E(d) = n \times P(\Delta = 1)$, where

\[ P(\Delta = 1) = \int_0^{\infty} f(x, \Delta = 1)\, dx = \int_0^{\infty} f(x) H(x)\, dx \]

Ex: Suppose $T$ is exponential with hazard $\lambda$ and $C$ is exponential with hazard $\mu$. Then

\[ P(\Delta = 1) = \int_0^{\infty} f(x) H(x)\, dx = \int_0^{\infty} \lambda e^{-\lambda x} e^{-\mu x}\, dx = \frac{\lambda}{\lambda + \mu} \]
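The closed form $\lambda / (\lambda + \mu)$ can be checked against a numerical evaluation of $\int f(x) H(x)\, dx$. The sketch below uses a plain trapezoid rule on a truncated range; the function name, the truncation point and the hazards 0.2 and 0.1 are arbitrary choices for illustration.

```python
import math

def prob_death_observed(lam, mu, upper=200.0, steps=200_000):
    """Trapezoid approximation of the integral of f(x) * H(x) over [0, upper].

    f(x) = lam * exp(-lam * x) is the failure density,
    H(x) = exp(-mu * x) is the censoring survival function.
    """
    h = upper / steps

    def integrand(x):
        return lam * math.exp(-lam * x) * math.exp(-mu * x)

    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, steps))
    return total * h

lam, mu = 0.2, 0.1                       # hypothetical hazards
numeric = prob_death_observed(lam, mu)
closed = lam / (lam + mu)                # closed form for exponential T and C
```

Multiplying either value by $n$ gives the expected number of deaths in a sample of size $n$.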

Page 22: 4. Comparison of Two (K) Samples

Design with Censoring Due To Staggered Entry

Suppose $n$ patients enter the study at times $E_1, E_2, \ldots, E_n$, assumed to be independent and identically distributed (i.i.d.) with distribution function $Q_E(u) = P[E \le u]$. If there were no other loss to follow-up or competing risk, the censoring random variable would be $C = L - E$, where $L$ is the time of analysis measured from the start of the study. Hence,

\[ H_C(u) = P[L - E \ge u] = P[E \le L - u] = Q_E(L - u), \quad u \in [0, L]. \]

Therefore, for such an experiment, the expected number of deaths in a sample of size $n$ would be equal to

\[ n \times P(\Delta = 1) = n \int_0^{L} \lambda_T(u) S_T(u) Q_E(L - u)\, du \]

Ex: Suppose the underlying survival of a population follows an exponential distribution. A study will accrue patients uniformly for $A$ years, and the analysis will be conducted after an additional $F$ years of follow-up. What is the expected number of deaths for a sample of $n$ patients?

Page 23: 4. Comparison of Two (K) Samples

The entry times follow a uniform distribution on $[0, A]$. That is,

\[
Q_E(u) = P[E \le u] =
\begin{cases}
0 & \text{if } u \le 0 \\
u / A & \text{if } 0 < u \le A \\
1 & \text{if } u > A
\end{cases}
\]

Consequently,

\[
H_C(u) = Q_E(L - u) =
\begin{cases}
1 & \text{if } u \le L - A \\
(L - u) / A & \text{if } L - A < u \le L \\
0 & \text{if } u > L
\end{cases}
\]

Hence,

\[
\begin{aligned}
P(\Delta = 1) &= \int_0^{L} \lambda_T(u) S_T(u) H_C(u)\, du \\
&= \int_0^{L-A} \lambda e^{-\lambda u}\, du + \int_{L-A}^{L} \lambda e^{-\lambda u} \frac{L - u}{A}\, du \\
&= \int_0^{L-A} \lambda e^{-\lambda u}\, du + \frac{L}{A} \int_{L-A}^{L} \lambda e^{-\lambda u}\, du - \frac{1}{A} \int_{L-A}^{L} \lambda u e^{-\lambda u}\, du \\
&= \cdots \\
&= 1 - \frac{e^{-\lambda L}}{\lambda A} \left( e^{\lambda A} - 1 \right)
\end{aligned}
\]

Therefore, if we accrue $n$ patients uniformly over $A$ years, who fail according to an exponential distribution with hazard $\lambda$, and follow them for an additional $F$ years (so that $L = A + F$), then the expected number of deaths in the sample is

\[ n \times \left[ 1 - \frac{e^{-\lambda L}}{\lambda A} \left( e^{\lambda A} - 1 \right) \right] \]
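The closed form for $P(\Delta = 1)$ under uniform accrual can be cross-checked by averaging $P(T \le L - E)$ over the entry time $E$, which is an equivalent way of writing the same probability. The Python sketch below does this with a midpoint rule; the function names are hypothetical, and the values $\lambda = 0.173$, $A = 5$, $F = 3$ are the ones used in the lung cancer example of these slides.

```python
import math

def prob_death_uniform_accrual(lam, A, F):
    """Closed form: 1 - exp(-lam*L) * (exp(lam*A) - 1) / (lam*A), with L = A + F."""
    L = A + F
    return 1.0 - math.exp(-lam * L) * (math.exp(lam * A) - 1.0) / (lam * A)

def prob_death_numeric(lam, A, F, steps=100_000):
    """Midpoint-rule average over E ~ Uniform(0, A) of P(T <= L - E)."""
    L = A + F
    h = A / steps
    total = sum(1.0 - math.exp(-lam * (L - (i + 0.5) * h)) for i in range(steps))
    return total * h / A

p_closed = prob_death_uniform_accrual(0.173, 5.0, 3.0)
p_numeric = prob_death_numeric(0.173, 5.0, 3.0)
```

Multiplying `p_closed` by the number of patients gives the expected number of deaths for that group.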

Page 24: 4. Comparison of Two (K) Samples

Lung cancer example (continued)

$m_0 = 4$ years, $\lambda_0 = \frac{\log 2}{m_0} = 0.173$; $m_1 = 6$ years, $\lambda_1 = \frac{\log 2}{m_1} = 0.116$;

\[ d = \frac{4 (1.96 + 1.28)^2}{[\log(2/3)]^2} = 256 \]

Suppose we decide to accrue patients for $A = 5$ years and then follow them for an additional $F = 3$ years, so $L = A + F = 8$ years. How large a sample size is necessary?

In a randomized trial where we randomize the patients to the two treatments with equal probability, the expected number of deaths would be equal to $D_1 + D_0$, where

\[ D_j = \frac{n}{2} \times \left[ 1 - \frac{e^{-\lambda_j L}}{\lambda_j A} \left( e^{\lambda_j A} - 1 \right) \right], \quad j = 0, 1 \]

For this problem, the expected number of deaths is

\[
\begin{aligned}
D_1 + D_0 &= \frac{n}{2} \times \left[ 1 - \frac{e^{-0.173 \times 8}}{0.173 \times 5} \left( e^{0.173 \times 5} - 1 \right) \right] + \frac{n}{2} \times \left[ 1 - \frac{e^{-0.116 \times 8}}{0.116 \times 5} \left( e^{0.116 \times 5} - 1 \right) \right] \\
&= \frac{n}{2} \times 0.6017 + \frac{n}{2} \times 0.4642 = \frac{n}{2} \times 1.0658
\end{aligned}
\]

Thus, if we want the expected number of deaths to equal 256, then

\[ \frac{n}{2} \times 1.0658 = 256 \;\Rightarrow\; n = 480 \]
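The sample size calculation can be reproduced in a few lines. This Python sketch plugs the rounded hazards from the example into the expected-death probability for each arm and solves for $n$; the helper name `death_prob` is an assumption made here.

```python
import math

def death_prob(lam, A, L):
    """P(Delta = 1) with uniform accrual over A years and analysis at time L."""
    return 1.0 - math.exp(-lam * L) * (math.exp(lam * A) - 1.0) / (lam * A)

lam0, lam1 = 0.173, 0.116   # hazards as rounded in the example
A, L = 5.0, 8.0             # accrual period and total study length in years
d_required = 256            # deaths needed for 90% power

# With equal randomization, each patient dies with the average of the two probabilities.
per_patient = 0.5 * death_prob(lam0, A, L) + 0.5 * death_prob(lam1, A, L)
n = d_required / per_patient
```

The resulting `n` is slightly above 480, so the slide value of 480 is this number rounded to the nearest integer; rounding up instead would give 481 patients.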

Page 25: 4. Comparison of Two (K) Samples

Note:

1. Different combinations of sample sizes, accrual periods and follow-up periods can be tried in order to find the design that gives the desired number of deaths and best suits the needs of the experiment being conducted.

2. The above calculation for the sample size requires that we are able to get $n = 480$ patients within $A = 5$ years. If this is not the case, we will be underpowered to detect the difference of interest.

3. The sample size $n$ and the accrual period $A$ are tied to the accrual rate $R$ (number of patients available per year) by $n = AR$. If we have information on $R$, the above calculation has to be modified.

4. Other issues that affect power and may have to be considered are: a) loss to follow-up; b) competing risks; c) non-compliance.

5. Originally, we introduced a class of weighted logrank tests to test H0: $S_1(t) = S_0(t)$, for $t \ge 0$. The weighted logrank test with weight function $w(t)$ is optimal for detecting the following alternative hypothesis:

\[ \lambda_1(t) = \lambda_0(t)\, e^{\beta w(t)} \quad \text{or} \quad \log \frac{\lambda_1(t)}{\lambda_0(t)} = \beta w(t); \quad \beta \ne 0 \]

Page 26: 4. Comparison of Two (K) Samples

$K$ sample weighted logrank test

Testing the null hypothesis that the survival distributions are the same for $K > 2$ groups.

A sample of triplets $(X_i, \Delta_i, Z_i)$, $i = 1, 2, \ldots, n$, where

$X_i = \min(T_i, C_i)$, $\Delta_i = I(T_i \le C_i)$

$Z_i \in \{1, 2, \ldots, K\}$, corresponding to group membership in one of the $K$ groups

Denote $S_j(t) = P(T_j \ge t)$ as the survival function for the $j$th group. The null hypothesis can then be represented as:

H0: $S_1(t) = S_2(t) = \cdots = S_K(t)$, for $t \ge 0$, or equivalently H0: $\lambda_1(t) = \lambda_2(t) = \cdots = \lambda_K(t)$, for $t \ge 0$

Notations:

$Y_j(x)$ = # at risk at time $x$ from group $j$

$dN_j(x)$ = # of deaths observed at time $x$ (in $[x, x + \Delta x)$) from group $j = 1, 2, \ldots, K$

$dN(x) = \sum_{j=1}^{K} dN_j(x)$, total # of observed deaths at time $x$

$Y(x) = \sum_{j=1}^{K} Y_j(x)$, total # at risk at time $x$

$F(x) = \{ dN_j(u), Y_j(x);\; j = 1, 2, \ldots, K, \text{ for all grid points } u < x, \text{ and } dN(x) \}$

Page 27: 4. Comparison of Two (K) Samples

At a slice of time $[x, x + \Delta x)$, the data can be viewed as a 2 × $K$ contingency table:

              Group 1             ...   Group K             Total
Deaths        dN_1(x)             ...   dN_K(x)             dN(x)
Survivors     Y_1(x) - dN_1(x)    ...   Y_K(x) - dN_K(x)    Y(x) - dN(x)
At risk       Y_1(x)              ...   Y_K(x)              Y(x)

Conditioning on $F(x)$, we know the marginal counts of this 2 × $K$ table, in which case the vector $(dN_1(x), dN_2(x), \ldots, dN_K(x))^T$ is distributed as a multivariate version of a hypergeometric distribution. In particular,

\[ E[dN_j(x) \mid F(x)] = \frac{Y_j(x)\, dN(x)}{Y(x)}, \quad j = 1, 2, \ldots, K \]

\[ Var[dN_j(x) \mid F(x)] = \frac{Y_j(x) [Y(x) - Y_j(x)]\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \]

\[ Cov[dN_j(x), dN_{j'}(x) \mid F(x)] = -\frac{Y_j(x) Y_{j'}(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]}, \quad j \ne j' \]

Page 28: 4. Comparison of Two (K) Samples

Consider a $(K - 1)$-dimensional vector $U(w)$, made up of the weighted sums of the observed number of deaths minus the expected number of deaths for each treatment group $j = 1, 2, \ldots, K - 1$, summed over $x$:

\[
U(w) =
\begin{pmatrix}
\sum_{x} w(x) \left[ dN_1(x) - Y_1(x) \frac{dN(x)}{Y(x)} \right] \\
\sum_{x} w(x) \left[ dN_2(x) - Y_2(x) \frac{dN(x)}{Y(x)} \right] \\
\vdots \\
\sum_{x} w(x) \left[ dN_{K-1}(x) - Y_{K-1}(x) \frac{dN(x)}{Y(x)} \right]
\end{pmatrix}
\]

Note:

1. The $(K - 1)$-dimensional vector is considered here since the sum of all $K$ elements is equal to zero and hence we have redundancy. If we included all $K$ elements, then the resulting vector would have a singular variance matrix.

2. Using arguments similar to the two-sample test, it can be shown that the vectors of observed minus expected counts computed at different times $x$ and $x'$ are uncorrelated. Consequently, the corresponding $(K - 1) \times (K - 1)$ covariance matrix of the vector $U(w)$ is given by $V = [V_{jj'}]$, $j, j' = 1, 2, \ldots, K - 1$, where

\[ V_{jj'} = -\sum_{x} w^2(x) \frac{Y_j(x) Y_{j'}(x)\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]}, \quad j \ne j' \]

\[ V_{jj} = \sum_{x} w^2(x) \frac{Y_j(x) [Y(x) - Y_j(x)]\, dN(x) [Y(x) - dN(x)]}{Y^2(x) [Y(x) - 1]} \]

Page 29: 4. Comparison of Two (K) Samples

Test Statistic: $K$ sample weighted logrank test

The test statistic used to test the null hypothesis is given by the quadratic form

\[ T(w) = [U(w)]^T V^{-1} U(w) \]

Note: This statistic is numerically identical regardless of which $K - 1$ of the $K$ groups are included to avoid redundancy.

Under $H_0$, this is distributed asymptotically as a $\chi^2$ distribution with $K - 1$ degrees of freedom. Hence, a level $\alpha$ test would reject the null hypothesis whenever

\[ T(w) = [U(w)]^T V^{-1} U(w) \ge \chi^2_{\alpha; K-1}, \]

where $\chi^2_{\alpha; K-1}$ is the quantity that satisfies $P(\chi^2_{K-1} \ge \chi^2_{\alpha; K-1}) = \alpha$.

Remark:

1. As with the two-sample tests, if the weight function $w(x)$ is stochastic, then it must be a function of the survival and censoring data prior to time $x$.

2. The most popular test uses the weight $w(x) \equiv 1$ and is referred to as the $K$-sample logrank test. These tests are available in most major software packages such as SAS, S+, etc. For example, the SAS code is exactly the same as that for two-sample tests.
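For readers who want to see the quadratic form computed explicitly, here is a minimal Python sketch of the $K$-sample logrank statistic with $w(x) \equiv 1$. It is not the SAS or R implementation; the function names, the `drop` argument (the group left out to avoid redundancy) and the small Gaussian elimination helper are all choices made for this illustration.

```python
def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def ksample_logrank(times, events, groups, drop=None):
    """Quadratic form U' V^{-1} U for the K-sample logrank test (w = 1)."""
    labels = sorted(set(groups))
    if drop is None:
        drop = labels[-1]
    keep = [g for g in labels if g != drop]   # the K - 1 groups entering U
    m = len(keep)
    U = [0.0] * m
    V = [[0.0] * m for _ in range(m)]
    data = list(zip(times, events, groups))
    for x in sorted({t for t, e, _ in data if e == 1}):
        Y = sum(1 for t, _, _ in data if t >= x)
        dN = sum(1 for t, e, _ in data if t == x and e == 1)
        Yj = [sum(1 for t, _, g in data if t >= x and g == k) for k in keep]
        dNj = [sum(1 for t, e, g in data if t == x and e == 1 and g == k)
               for k in keep]
        c = dN * (Y - dN) / (Y ** 2 * (Y - 1)) if Y > 1 else 0.0
        for a in range(m):
            U[a] += dNj[a] - Yj[a] * dN / Y
            for b in range(m):
                if a == b:
                    V[a][b] += Yj[a] * (Y - Yj[a]) * c
                else:
                    V[a][b] -= Yj[a] * Yj[b] * c
    z = solve(V, U)
    return sum(u * zi for u, zi in zip(U, z))

# Hypothetical data: nine patients in three groups, two censored.
times = [1, 2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 0, 1, 1, 1, 0, 1, 1]
groups = [1, 2, 3, 1, 2, 3, 1, 2, 3]
stat = ksample_logrank(times, events, groups)
```

The value of `stat` would be compared with the upper $\alpha$ quantile of a $\chi^2$ distribution with $K - 1$ degrees of freedom. Calling the function with a different `drop` group should return the same number, which illustrates the note on redundancy.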

Page 30: 4. Comparison of Two (K) Samples

Stratified Test: Do We Need it?

• When comparing survival distributions among groups, especially in non-randomized studies, confounding (i.e., other factors that may affect the interpretation of the relationship between survival and groups) is a concern.

• For example, suppose we extract hospital records for patients who were treated after a myocardial infarction (heart attack) with either bypass surgery or angioplasty, and wish to test whether or not there is a difference in the survival distributions between these treatments. If we believe that these two groups of patients are comparable, a logrank or weighted logrank test can be used.

• However, since this study was not randomized, there is no guarantee that the patients being compared are prognostically similar; e.g., it may be that the patients receiving angioplasty are younger on average or prognostically better in other ways. Then we wouldn't know whether significant differences between treatment groups, if they occurred, were due to treatment or to other prognostic factors.

• Or the treatments do have different effects, but the difference is masked by other factors that are unevenly distributed between the treatment groups.

• The effect of these prognostic factors can be adjusted for either through stratification or through regression modeling (discussed later).

• To adjust by stratification, strata of our population are defined according to combinations of factors which make individuals within each stratum more prognostically similar. Comparisons of survival distributions between groups are made within each stratum, and then these results are combined across the strata.

Page 31: 4. Comparison of Two (K) Samples

Stratified logrank Test

Consider a population being sampled as consisting of $p$ strata. The strata, for example, could be those used in balanced randomization of a clinical trial, or combinations of factors that make individuals within each stratum prognostically similar. Consider two-sample comparisons (treatments 0 vs. 1), and let $j$ index the strata, $j = 1, 2, \ldots, p$. The null hypothesis being tested in a stratified test is

H0: $S_{1j}(t) = S_{0j}(t)$, for $t \ge 0$, $j = 1, 2, \ldots, p$

The stratified logrank test consists of computing the two-sample test statistic within each stratum and then combining these results across strata. For example,

\[ T(w) = \frac{\sum_{j=1}^{p} \sum_{x} w_j(x) \left[ dN_{1j}(x) - \frac{dN_j(x) \times Y_{1j}(x)}{Y_j(x)} \right]}{\left\{ \sum_{j=1}^{p} \sum_{x} w_j^2(x) \frac{Y_{1j}(x) Y_{0j}(x)\, dN_j(x) [Y_j(x) - dN_j(x)]}{Y_j^2(x) [Y_j(x) - 1]} \right\}^{1/2}} \]

Since within each of the strata there was no additional balance being forced between the two groups beyond chance, the mean and variance of the test statistics computed within strata under the null hypothesis are correct. The combining of the statistics and their variances over independent strata is then also correct. The resulting stratified logrank test has a standard normal distribution (asymptotically) under $H_0$, i.e.,

\[ T(w) \overset{a}{\sim} N(0, 1) \quad \text{or} \quad [T(w)]^2 \overset{a}{\sim} \chi^2_1 \]
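The way the stratified statistic pools information can be sketched as follows: compute the logrank numerator and variance within each stratum, add them across strata, and only then form the ratio. This Python illustration uses $w_j(x) \equiv 1$; the function names and the toy strata are hypothetical.

```python
import math

def stratum_terms(times, events, groups):
    """Logrank (w = 1) numerator and variance contribution of one stratum."""
    data = list(zip(times, events, groups))
    U = 0.0
    V = 0.0
    for x in sorted({t for t, e, _ in data if e == 1}):
        Y1 = sum(1 for t, _, z in data if t >= x and z == 1)
        Y0 = sum(1 for t, _, z in data if t >= x and z == 0)
        Y = Y1 + Y0
        dN1 = sum(1 for t, e, z in data if t == x and e == 1 and z == 1)
        dN = sum(1 for t, e, _ in data if t == x and e == 1)
        U += dN1 - Y1 * dN / Y
        if Y > 1:
            V += Y1 * Y0 * dN * (Y - dN) / (Y ** 2 * (Y - 1))
    return U, V

def stratified_logrank(strata):
    """strata: list of (times, events, groups) tuples, one per stratum."""
    terms = [stratum_terms(*s) for s in strata]
    U = sum(u for u, _ in terms)   # numerators add across strata
    V = sum(v for _, v in terms)   # variances add across independent strata
    return U / math.sqrt(V)

# Two hypothetical strata with identical toy data (groups coded 1 and 0).
one_stratum = ([1, 3, 2, 4], [1, 1, 1, 1], [1, 1, 0, 0])
T = stratified_logrank([one_stratum, one_stratum])
```

Note that risk sets are never mixed across strata: a patient in one stratum is compared only with patients at risk in the same stratum.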

Page 32: 4. Comparison of Two (K) Samples

Remarks:

• Stratified tests can be constructed for $K$ samples as well. We just add the vectors of test statistics over strata, as well as the covariance matrices, before computing the quadratic form leading to the $\chi^2$ statistic with $(K - 1)$ degrees of freedom.

• Sample size considerations are similar to those for the unstratified tests. Power depends on the number of observed deaths and the hazard ratio between groups within strata. For example, the stratified logrank test with $w_j(x) \equiv 1$ for all $x$ and $j$ is most powerful for detecting proportional hazards alternatives within strata, where the hazard ratio is also assumed constant between strata. Namely,

Ha: $\lambda_{1j}(x) = \lambda_{0j}(x) \exp(\beta_A)$

The total number of deaths in the study necessary to obtain power $1 - \gamma$ for detecting a difference corresponding to $\beta_A$ above, using a stratified logrank test at the $\alpha$ level of significance (two-sided), is equal to

\[ d = \frac{4 (z_{\alpha/2} + z_{\gamma})^2}{\beta_A^2} \]

This assumes equal randomization to the two treatments and is the same value as that obtained for unstratified tests. To compute the expected number of deaths at the design stage, we must compute it separately over treatments and strata, and these should add up to the desired number above.

SAS Example