
Power and Sample Size Calculations A Review and Computer Program

William D. Dupont, PhD, and Walton D. Plummer Jr., Department of Preventive Medicine, Vanderbilt University School of Medicine, Nashville, Tennessee


ABSTRACT: Methods of sample size and power calculations are reviewed for the most common study designs. The sample size and power equations for these designs are shown to be special cases of two generic formulae for sample size and power calculations. A computer program is available that can be used for studies with dichotomous, continuous, or survival response measures. The alternative hypotheses of interest may be specified either in terms of differing response rates, means, or survival times, or in terms of relative risks or odds ratios. Studies with dichotomous or continuous outcomes may involve either a matched or independent study design. The program can determine the sample size needed to detect a specified alternative hypothesis with the required power, the power with which a specific alternative hypothesis can be detected with a given sample size, or the specific alternative hypotheses that can be detected with a given power and sample size. The program can generate help messages on request that facilitate the use of this software. It writes a log file of all calculated estimates and can produce an output file for plotting power curves. It is written in FORTRAN-77 and is in the public domain.

KEY WORDS: Power and sample size calculations, cohort studies, case-control studies, dichotomous or continuous outcomes

INTRODUCTION

Sample size and power calculations for clinical trials and observational studies are typically performed either by hand [1-5], through the use of published graphs or tables [6-10], or through the use of specialized computer programs [11-17]. Selecting the sample size for a study inevitably requires a compromise balancing the needs for power, economy, and timeliness. Investigators must determine their study's sample size, power, and detectable alternative hypotheses. To do this, it is useful to have a program that, given any two of the preceding parameters, can calculate the third.

The purpose of this article is to introduce such a program (POWER) and to review the power and sample size calculations that are required for the most common study designs. For each design considered in this article, POWER calculates the sample size needed to detect a particular difference in treatment efficacy with a specified power, the power with which a particular difference can be detected with a given sample size, and the difference that can be detected with a specified power and sample size.

Address reprint requests to: William D. Dupont, S-3301 Medical Center North, Department of Preventive Medicine, Vanderbilt University School of Medicine, Nashville, TN 37232-2637.

Received May 10, 1989; revised October 11, 1989.

Controlled Clinical Trials 11:116-128 (1990) 0197-2456/1990/$3.50 © Elsevier Science Publishing Co., Inc. 1990

655 Avenue of Americas, New York, New York 10010

The study designs that can be evaluated by this program are summarized in Table 1. In this table, independent study designs refer to those in which subjects are independently selected at random from some target population. Matched designs are ones in which one or more control subjects are matched to each case patient with respect to certain attributes. Paired designs are matched designs with one control per case. In cohort studies, subjects are followed forward in time until some event occurs [18]. All clinical trials are cohort studies. Case-control studies look for risk factors in samples of case patients with a specific disease and control patients who do not have this disease [2]. A survival outcome variable consists of the time until death or some morbid event occurs, or the total follow-up time for a patient who does not suffer this event. Continuous outcome variables like weight or serum creatinine may take a wide range of values. Dichotomous outcomes take only two values such as success or failure, or the presence or absence of some risk factor.

In justifying our study design, the actual power that will be achieved with the selected sample size is more relevant than the power that would have been achieved with other sample sizes that were considered but ultimately rejected. The power associated with the selected sample size can be most effectively demonstrated by plotting the power curve as a function of the true value of the parameter of interest under different alternative hypotheses. The coordinates for such curves can be generated by POWER for input into graphics software packages.

POWER is in the public domain and is available from the authors on request for the cost of distribution. It is written in ANSI standard FORTRAN-77 and has been run successfully on both PC computers running under MS-DOS and VAX computers running VMS.

METHODS

Generic Power and Sample Size Formulas

All the methods discussed in this paper are variations on a familiar theme [3, sect. 5.2]. Suppose that we observe responses on $n$ patients (or groups of patients) that are dependent on some parameter $\theta$. Let $f(\theta)$ be a known monotonic function of $\theta$ and let $S$ be a statistic derived from the $n$ responses that has a normal distribution with mean $\sqrt{n}\, f(\theta)$ and standard deviation $\sigma(\theta)$. Let $\Phi[z]$ be the cumulative probability distribution for a standard normal random variable and let $z_\alpha = \Phi^{-1}[1 - \alpha]$ denote the critical value that is exceeded by a standard normal random variable with probability $\alpha$. Let $\theta_0$ and $\theta_a$ denote the values of $\theta$ under the null and a specific alternative hypothesis, respectively. Let $\sigma_0 = \sigma(\theta_0)$, $\sigma_a = \sigma(\theta_a)$ and let $\delta = \{f(\theta_a) - f(\theta_0)\}/\sigma_a$ denote the difference between $f(\theta_a)$ and $f(\theta_0)$ expressed in standard deviations of $S$ under the alternative hypothesis. Testing the null hypothesis against a two-sided alternative hypothesis with type I error probability $\alpha$ leads to rejection of the null hypothesis when

$$|S - \sqrt{n}\, f(\theta_0)| > \sigma_0 z_{\alpha/2}. \qquad (1)$$

Table 1 Study Designs That Can Be Evaluated by the POWER Computer Program(a)

Method  Outcome      Test Statistic                        Independent   Cohort vs.    Reference
Number  Variable                                           vs. Matched   Case-Control
------  -----------  ------------------------------------  ------------  ------------  ---------------------------
1       Survival     log rank                              Independent   Cohort        Schoenfeld and Richter [10]
2       Continuous   t                                     Paired        Either        Pearson and Hartley [9]
3       Continuous   t                                     Independent   Either        Pearson and Hartley [9]
4       Dichotomous  $\chi^2$                              Matched       Case-control  Dupont [6]
5       Dichotomous  $\chi^2$                              Paired        Cohort        This paper
6       Dichotomous  uncorrected $\chi^2$                  Independent   Case-control  Schlesselman [2]
7       Dichotomous  Fisher's exact or corrected $\chi^2$  Independent   Either        Casagrande et al. [5]
8       Dichotomous  uncorrected $\chi^2$                  Independent   Cohort        Meinert [1]

(a) The terms used in this table are defined in the Introduction.


When $\theta = \theta_a$, the probability that $S$ will satisfy Eq. 1 equals the power associated with this alternative hypothesis, which is

$$1 - \beta = \Phi[\delta\sqrt{n} - (\sigma_0/\sigma_a) z_{\alpha/2}] + \Phi[-\delta\sqrt{n} - (\sigma_0/\sigma_a) z_{\alpha/2}]. \qquad (2)$$

The first and second terms on the right-hand side of Eq. 2 give the probabilities under the alternative hypothesis of obtaining $S > \sqrt{n}\, f(\theta_0) + \sigma_0 z_{\alpha/2}$ and $S < \sqrt{n}\, f(\theta_0) - \sigma_0 z_{\alpha/2}$, respectively. One or the other of these terms is usually negligible for relevant values of $\alpha$ and $\beta$. The smaller of these two terms will be less than 0.001 as long as $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta \geq 3.1$ (see Appendix). Hence, approximating this probability by zero in Eq. 2 yields

$$n = [(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta]^2/\delta^2. \qquad (3)$$

To illustrate the use of Eq. 3, consider a sample of size $n$ randomly drawn from a normal population with mean $\mu$ and known variance $\sigma^2$. Let $\bar{y}$ denote the sample mean, $S = \sqrt{n}\, \bar{y}$, $f(\mu) = \mu$ be the identity function, $\sigma_0 = \sigma_a = \sigma$ and $\delta = \{f(\mu_a) - f(\mu_0)\}/\sigma_a = (\mu_a - \mu_0)/\sigma$. Then $S$ has a normal distribution with mean $\sqrt{n}\, \mu$ and standard deviation $\sigma$. Substituting $\sigma_0$, $\sigma_a$, and $\delta$ into Eq. 3 gives $n = (z_{\alpha/2} + z_\beta)^2 \sigma^2/(\mu_a - \mu_0)^2$, which is Eq. 5.34 in Ref. 3.

The power and sample size formulas for the study designs considered in this article can now be obtained by substituting the appropriate definitions of $\sigma_0$, $\sigma_a$, and $\delta$ into Eqs. 2 and 3. Equation 3 can also be used to find the specific alternative hypothesis that can be detected with power $1 - \beta$ and sample size $n$. These equations do not yield a closed solution when $\sigma_0 \neq \sigma_a$, but can be readily solved by a computer using iterative methods [19].

POWER provides a warning message with sample size estimates whenever $2(\sigma_0/\sigma_a) z_{\alpha/2} + z_\beta \leq 3.1$. When this happens Eq. 3 will provide a sample size estimate whose power exceeds $1 - \beta$ by no more than $\alpha/2$. POWER assumes a two-sided alternative hypothesis for all study designs. Sample size calculations for one-sided tests may be obtained by doubling the value of $\alpha$.
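As a hedged illustration, Eqs. 2 and 3 can be evaluated directly with a standard normal distribution. The sketch below is not the POWER program's FORTRAN source; the function names and the numerical inputs (a null mean of 100, an alternative mean of 105, and a standard deviation of 10) are hypothetical and were chosen only to exercise the known-variance example above.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)


def z_upper(prob):
    """Critical value exceeded by a standard normal variable with probability prob."""
    return STD_NORMAL.inv_cdf(1.0 - prob)


def generic_power(n, delta, sigma0, sigma_a, alpha):
    """Two-sided power from Eq. 2 for sample size n."""
    shift = (sigma0 / sigma_a) * z_upper(alpha / 2.0)
    upper = STD_NORMAL.cdf(delta * sqrt(n) - shift)
    lower = STD_NORMAL.cdf(-delta * sqrt(n) - shift)
    return upper + lower


def generic_sample_size(delta, sigma0, sigma_a, alpha, beta):
    """Sample size from Eq. 3, left unrounded."""
    numerator = (sigma0 / sigma_a) * z_upper(alpha / 2.0) + z_upper(beta)
    return (numerator / delta) ** 2


# Hypothetical inputs for the known-variance example (not taken from the article).
mu0, mu_a, sigma = 100.0, 105.0, 10.0
delta = (mu_a - mu0) / sigma
n_required = generic_sample_size(delta, sigma, sigma, alpha=0.05, beta=0.20)
achieved = generic_power(n_required, delta, sigma, sigma, alpha=0.05)
print(f"n = {n_required:.2f}, power at that n = {achieved:.4f}")
```

Evaluating Eq. 2 at the unrounded n from Eq. 3 returns a power marginally above 1 - beta, because Eq. 3 drops the smaller of the two terms in Eq. 2.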

Log Rank Tests of Survival Data

Suppose that $n$ patients are to be recruited into each of two treatment groups during an accrual period of length $A$ who are then followed for an additional follow-up period $F$. (In other words, follow-up for all patients ends on the same day, with follow-up intervals ranging from $F$ through $A + F$ days.) Assume that recruitment follows a uniform distribution over the accrual interval $A$ and that the survival times for patients on treatments 1 and 2 have exponential distributions with medians $m_1$ and $m_2$, respectively.

Let $R = m_2/m_1$ be the ratio of median survival times on the two treatments. ($R$ is also the relative hazard or instantaneous relative risk for patients on treatment 1 relative to patients on treatment 2.) Let $t_j$ be the total number of patient days of follow-up on treatment $j$ and let $a_j$ be the number of observed events on treatment $j$: $j = 1, 2$. Let $S = \sqrt{n}\, \log(t_2 a_1/(t_1 a_2))$, $m = (m_1 + m_2)/2$, $P(A) = \{1 - \exp(-\log(2)A/m)\}/(\log(2)A/m)$, $G(F) = \exp(-\log(2)F/m)$, and $p = 1 - P(A)G(F)$. Then Schoenfeld and Richter [10] have shown that $S$ has an asymptotically normal distribution with mean $\sqrt{n}\, \log(R)$ and approximate standard deviation $\sigma = \sqrt{2/p}$. We wish to test the null hypothesis that $R = 1$. Letting $f(R) = \log(R)$, $\sigma_0 = \sigma_a = \sigma$, and $\delta = \{f(R) - f(1)\}/\sigma_a = \log(R)/\sigma$, and substituting $\delta$, $\sigma_0$, and $\sigma_a$ into Eqs. 2 and 3 gives the power and sample size formulas associated with the specific alternative hypothesis that the ratio of median survival times equals $R$. This version of Eq. 3 is identical to the sample size formula derived by Schoenfeld and Richter [10, p. 169]. These formulas are appropriate for studies that will be analyzed using the log-rank test [20] in addition to the parametric test of Schoenfeld and Richter [10].
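The survival version of Eq. 3 can be sketched in a few lines. This is an illustrative reading of the formulas in this section, not the POWER source code: the function name is hypothetical, the standard deviation is taken as the square root of 2/p, and the result is left unrounded because the article does not state how POWER rounds. The inputs are those of the example session later in the article (ALPHA = 0.1, POWER = 0.8, M1 = 11, M2 = 16.5, A = 24, F = 12), for which the sketch gives a value just above 110.

```python
from math import exp, log
from statistics import NormalDist


def logrank_sample_size(alpha, power, m1, m2, accrual, followup):
    """Patients per group from Eq. 3 with sigma0 = sigma_a = sqrt(2/p)."""
    z = NormalDist().inv_cdf
    m = (m1 + m2) / 2.0                  # average of the two median survival times
    rate = log(2.0) / m                  # exponential hazard whose median is m
    if accrual > 0:
        p_accrual = (1.0 - exp(-rate * accrual)) / (rate * accrual)  # P(A)
    else:
        p_accrual = 1.0                  # limiting value of P(A) as A tends to 0
    g_follow = exp(-rate * followup)     # G(F)
    p = 1.0 - p_accrual * g_follow
    sigma_sq = 2.0 / p
    delta_sq = log(m2 / m1) ** 2 / sigma_sq
    # z(power) equals z_beta because z_beta is the (1 - beta) quantile.
    return (z(1.0 - alpha / 2.0) + z(power)) ** 2 / delta_sq


n_per_group = logrank_sample_size(0.1, 0.8, 11.0, 16.5, 24.0, 12.0)
print(f"patients per group (unrounded): {n_per_group:.2f}")
```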

t Tests of Paired Continuous Response Data

Let $T_k(x)$ denote the cumulative probability distribution for a t statistic with $k$ degrees of freedom. Let $t_{k,\alpha} = T_k^{-1}(1 - \alpha)$ be the critical value that will be exceeded with probability $\alpha$ by a t statistic with $k$ degrees of freedom. Suppose that a continuous response measure is observed on $n$ patients before and after treatment and that the difference of these measures on a given patient has a normal distribution with mean $\Delta$ and unknown standard deviation $\sigma$. Let $\bar{d}$ denote the average difference in these response measures and let $S = \sqrt{n}\, \bar{d}$. We wish to test the null hypothesis that $\Delta = 0$. Letting $\delta = \Delta/\sigma$ and approximating the sample standard deviation by $\sigma$ gives that

$$1 - \beta = T_{n-1}[\delta\sqrt{n} - t_{n-1,\alpha/2}] + T_{n-1}[-\delta\sqrt{n} - t_{n-1,\alpha/2}], \qquad (4)$$

$$n = (t_{n-1,\alpha/2} + t_{n-1,\beta})^2/\delta^2, \qquad (5)$$

by precisely the same argument used to derive Eqs. 2 and 3. Equation 5 must now be solved using iterative methods because $n$ appears on both sides of the equal sign [19].

Exact power and sample size calculations for t tests can be derived in terms of the noncentral t distribution [21]. Table 10 of Pearson and Hartley [9] provides graphs for power calculations that are based on this derivation. Their graphs are in close agreement with Eqs. 4 and 5.

t Tests for Independent Continuous Response Data

Suppose that independent normal response measures are observed on patients who either receive an experimental or control treatment. Suppose further that $n$ patients receive the experimental treatment and that the ratio of control to experimental subjects is $m$. Let the mean response for experimental and control patients be $\mu_1$ and $\mu_2$, respectively, and assume that the standard deviation of responses within each treatment group is $\sigma_w$. Let $\bar{y}_1$ and $\bar{y}_2$ denote the sample mean response of the experimental and control groups respectively, and let $S = \sqrt{n}(\bar{y}_1 - \bar{y}_2)$. Then $S$ has mean $\sqrt{n}(\mu_1 - \mu_2)$ and standard deviation $\sigma = \sigma_w\sqrt{1 + 1/m}$. Thus, if we let $\delta = (\mu_1 - \mu_2)/\sigma$, then the analogous power and sample size formulas corresponding to Eqs. 4 and 5 are

$$1 - \beta = T_{n(m+1)-2}[\delta\sqrt{n} - t_{n(m+1)-2,\alpha/2}] + T_{n(m+1)-2}[-\delta\sqrt{n} - t_{n(m+1)-2,\alpha/2}] \qquad (6)$$

and

$$n = (t_{n(m+1)-2,\alpha/2} + t_{n(m+1)-2,\beta})^2/\delta^2. \qquad (7)$$

Equations 6 and 7 give power and sample size estimates that are in close agreement with Table 10 of Pearson and Hartley [9].

$\chi^2$ Contingency Table Tests of Independent Dichotomous Response Data

Suppose that independent dichotomous responses are observed on patients who either receive an experimental or control treatment. Suppose further that $n$ patients receive the experimental treatment and that the ratio of control to experimental subjects is $m$. Let $p_0$ and $p_1$ denote the probabilities that a patient on the control or experimental treatments will have an event (positive response). Let $p$ denote this probability for all subjects combined and let $\hat{p}_0$ and $\hat{p}_1$ denote the observed proportion of events in the two treatment groups. Let $q = 1 - p$ and $q_i = 1 - p_i$: $i = 0, 1$. Then $S = \sqrt{n}(\hat{p}_1 - \hat{p}_0)$ will have an asymptotically normal distribution with mean $\sqrt{n}(p_1 - p_0)$. Under the null hypothesis that $p_1 = p_0 = p$, the variance of $S$ will be $\sigma_0^2 = pq(1 + 1/m)$. Under the alternative hypothesis that $p_1 - p_0 = \Delta$, this variance will be $\sigma_a^2 = p_0 q_0/m + p_1 q_1$. Thus, the power and sample size formulas associated with the specific alternative hypothesis that $p_1 - p_0 = \Delta$ may be obtained by substituting $\delta = \Delta/\sigma_a$, $\sigma_0$, and $\sigma_a$ into Eqs. 2 and 3. This version of Eq. 3 is identical to Eq. 6.6 of Schlesselman [2] and to Eq. 9.7 of Meinert [1]. When $m = 1$, this equation corresponds to the sample size formula given by Friedman et al. [8, p. 75].

Equations 2 and 3 may also be used for case-control studies. In these studies $p_0$ and $p_1$ denote the probability of exposure in control and case patients, respectively. The alternative hypothesis for such studies may also be expressed in terms of $p_0$ and the odds ratio $\psi$. In this case $p_1 = p_0\psi/(1 + p_0(\psi - 1))$. For prospective studies, it is generally more useful to express the alternative hypothesis in terms of $p_0$ and the relative risk $R = p_1/p_0$.
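A minimal sketch of the uncorrected two-proportion version of Eq. 3 follows. The function names and the inputs (control rate 0.3, experimental rate 0.5, odds ratio 2) are hypothetical. The pooled probability p is taken as the allocation-weighted average $(p_1 + m p_0)/(1 + m)$, which is an assumption about what "all subjects combined" means when there are n experimental and mn control subjects.

```python
from math import sqrt
from statistics import NormalDist


def p1_from_odds_ratio(p0, psi):
    """Case exposure probability implied by control exposure p0 and odds ratio psi."""
    return p0 * psi / (1.0 + p0 * (psi - 1.0))


def two_proportion_sample_size(alpha, power, p0, p1, m=1.0):
    """Experimental-group size from Eq. 3 for an uncorrected chi-square test."""
    z = NormalDist().inv_cdf
    p = (p1 + m * p0) / (1.0 + m)              # pooled event probability (assumed weighting)
    sigma0 = sqrt(p * (1.0 - p) * (1.0 + 1.0 / m))
    sigma_a = sqrt(p0 * (1.0 - p0) / m + p1 * (1.0 - p1))
    delta = (p1 - p0) / sigma_a
    return ((sigma0 / sigma_a) * z(1.0 - alpha / 2.0) + z(power)) ** 2 / delta ** 2


# Hypothetical cohort example: 30% versus 50% event rates, equal group sizes.
n_experimental = two_proportion_sample_size(0.05, 0.80, p0=0.3, p1=0.5)

# Hypothetical case-control example: exposure 0.3 in controls, odds ratio 2.
p1_case = p1_from_odds_ratio(0.3, 2.0)
n_cases = two_proportion_sample_size(0.05, 0.80, p0=0.3, p1=p1_case)
print(f"{n_experimental:.1f} per group; p1 = {p1_case:.4f}; {n_cases:.1f} cases")
```

Expressing the alternative as an odds ratio only changes how $p_1$ is obtained; the sample size formula itself is unchanged.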

The preceding version of Eq. 3 is appropriate for studies that can be adequately assessed with an uncorrected $\chi^2$ statistic. Casagrande et al. [5] proposed a continuity correction which should be used for studies that will be analyzed with Fisher's exact test or with a $\chi^2$ test that uses a continuity correction. The method of Casagrande et al. has been generalized to the case of unequal sample sizes by Fleiss [4, Eq. 3.18]. Let $n'$ denote the estimated number of experimental subjects obtained from Eq. 3. Then the corresponding corrected sample size is

$$n = \frac{n'}{4}\left[1 + \sqrt{1 + \frac{2(m + 1)}{n' m |p_0 - p_1|}}\right]^2. \qquad (8)$$
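The continuity correction can be sketched as a small function applied to the uncorrected estimate. The form used here is the Fleiss generalization cited above, $n = (n'/4)[1 + \sqrt{1 + 2(m+1)/(n' m |p_0 - p_1|)}]^2$; the function name and the inputs (an uncorrected estimate of 93 with rates 0.3 and 0.5) are hypothetical.

```python
from math import sqrt


def continuity_corrected_size(n_uncorrected, p0, p1, m=1.0):
    """Corrected experimental-group size for Fisher's exact or corrected chi-square tests."""
    inner = 1.0 + 2.0 * (m + 1.0) / (n_uncorrected * m * abs(p0 - p1))
    return (n_uncorrected / 4.0) * (1.0 + sqrt(inner)) ** 2


# Hypothetical example: uncorrected estimate of 93 per group, rates 0.3 versus 0.5.
n_corrected = continuity_corrected_size(93.0, 0.3, 0.5)
print(f"corrected sample size (unrounded): {n_corrected:.1f}")
```

The corrected value always exceeds the uncorrected one, and the gap narrows as the uncorrected sample size or the difference between the two rates grows.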

$\chi^2$ Tests for Matched Case-Control Studies

Suppose we have $n$ matched sets, each of which consists of a case patient and $m$ matched controls. Let $p_0$ and $p_1$ denote the probability of exposure to some risk factor of interest among control and case patients, respectively, let $\phi$ denote the correlation coefficient for exposure between a case and one of his matched controls, and let $\psi$ denote the odds ratio for exposure in cases and controls. Let $q_1 = 1 - p_1$, $q_0 = 1 - p_0$, $p_{11} = p_1 p_0 + \phi\sqrt{p_1 q_1 p_0 q_0}$, $p_{10} = q_1 p_0 - \phi\sqrt{p_1 q_1 p_0 q_0}$, $p_{0+} = p_{11}/p_1$, $p_{0-} = p_{10}/q_1$, $q_{0+} = 1 - p_{0+}$, $q_{0-} = 1 - p_{0-}$, and

$$t_k = p_1 \binom{m}{k-1} p_{0+}^{k-1} q_{0+}^{m-k+1} + q_1 \binom{m}{k} p_{0-}^{k} q_{0-}^{m-k} : k = 1, \ldots, m.$$

Let $n_{ij}$ denote the number of matched sets of subjects in which the case patient was ($i = 1$) or was not ($i = 0$) exposed and $j$ of the $m$ matched controls were exposed. Let $T_k = n_{1,k-1} + n_{0,k}$ be the number of sets in which $k$ subjects were exposed. Let $S = \sum_{k=1}^{m} n_{1,k-1}/\sqrt{n}$, $f(\psi) = \sum_{k=1}^{m} k t_k \psi/(k\psi + m - k + 1)$, and $\sigma^2(\psi) = \sum_{k=1}^{m} k t_k \psi (m - k + 1)/(k\psi + m - k + 1)^2$. Then Dupont [6] shows the conditional distribution of $S$ given the ancillary statistics $T_k$: $k = 1, \ldots, m$ has an asymptotically normal distribution with mean $\sqrt{n}\, f(\psi)$ and standard deviation $\sigma(\psi)$. We wish to test the null hypothesis that $\psi = 1$. Substituting $\sigma_0 = \sigma(1)$, $\sigma_a = \sigma(\psi)$, and $\delta = \{f(\psi) - f(1)\}/\sigma_a$ into Eqs. 2 and 3 gives the power and sample size formulas associated with the specific alternative hypothesis that the odds ratio equals $\psi$. These versions of Eqs. 2 and 3 are identical to Eqs. 6 and 7 of Dupont [6], respectively.
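The matched-set calculation can be sketched as follows. This is one reading of the definitions in this section rather than the POWER source: $t_k$ is taken to be the probability that a matched set contains exactly $k$ exposed subjects, the function name is hypothetical, and the inputs (control exposure 0.3, odds ratio 2, correlation 0.2) are invented for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def matched_sample_size(alpha, power, p0, psi, phi, m):
    """Number of matched sets (one case, m controls) from Eq. 3."""
    z = NormalDist().inv_cdf
    p1 = p0 * psi / (1.0 + p0 * (psi - 1.0))      # case exposure probability
    q0, q1 = 1.0 - p0, 1.0 - p1
    cov = phi * sqrt(p1 * q1 * p0 * q0)           # covariance of case and control exposure
    p0_plus = (p1 * p0 + cov) / p1                # P(control exposed | case exposed)
    p0_minus = (q1 * p0 - cov) / q1               # P(control exposed | case unexposed)
    q0_plus, q0_minus = 1.0 - p0_plus, 1.0 - p0_minus

    # t[k]: probability that exactly k of the m + 1 subjects in a set are exposed.
    t = {}
    for k in range(1, m + 1):
        case_exposed = p1 * comb(m, k - 1) * p0_plus ** (k - 1) * q0_plus ** (m - k + 1)
        case_unexposed = q1 * comb(m, k) * p0_minus ** k * q0_minus ** (m - k)
        t[k] = case_exposed + case_unexposed

    def f(x):
        return sum(k * t[k] * x / (k * x + m - k + 1) for k in t)

    def sigma(x):
        return sqrt(sum(k * t[k] * x * (m - k + 1) / (k * x + m - k + 1) ** 2 for k in t))

    sigma0, sigma_a = sigma(1.0), sigma(psi)
    delta = (f(psi) - f(1.0)) / sigma_a
    return ((sigma0 / sigma_a) * z(1.0 - alpha / 2.0) + z(power)) ** 2 / delta ** 2


# Hypothetical inputs: control exposure 0.3, odds ratio 2, correlation 0.2.
sets_one_control = matched_sample_size(0.05, 0.80, 0.3, 2.0, 0.2, m=1)
sets_two_controls = matched_sample_size(0.05, 0.80, 0.3, 2.0, 0.2, m=2)
print(f"{sets_one_control:.1f} pairs, or {sets_two_controls:.1f} sets with two controls")
```

With a single control per case the expression collapses to the familiar discordant-pair calculation for McNemar's test, which gives a convenient check on the implementation.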

McNemar's Test for Paired Dichotomous Response Data from Prospective Studies

The test statistic discussed in the preceding section reduces to McNemar's test for paired studies with a single control per case (Eq. 5.5 of Breslow and Day [22]). Thus, the power calculations of Dupont [6] are also appropriate for paired case-control studies that are evaluated using McNemar's test. For paired prospective studies with dichotomous response variables we may be primarily interested in the relative risk of failure in experimental subjects relative to controls. Let $p_0$ and $p_1$ denote the probability of failure among control and experimental patients, respectively, and let $R = p_1/p_0$. The sample size and power calculations associated with a specific relative risk $R$ can then be derived in an analogous fashion to those of the preceding section. These calculations differ only in that now $p_1 = Rp_0$, whereas in the paired case-control study, $p_1$ is a function of $p_0$, $\psi$, and $\phi$ [6].

USING THE POWER PROGRAM: AN EXAMPLE

The power and sample size formulas that are reviewed in the preceding section have been implemented in a program called POWER. The power and sample size calculations generated by POWER agree with examples published by other authors [2, 4, 6, 8-10]. Users of this program may obtain help messages at any time by typing a question mark (?). These messages give the definitions of the terms that the user must enter into the program and eliminate the need for a separate reference manual. An illustration of the use of this program for survival cohort studies follows. In this example prompting messages are written in regular type while input from the user is written in boldface:

$ RUN POWER

POWER program. Type ? for help, R for references, (ctrl)z to exit

Please enter a log file name

(The default log file name is POWER.LOG):


Type of outcome variable?

(1 = survival, 2 = continuous, 3 = dichotomous): 1

What do you want to know?

(1 = sample size, 2 = power, 3 = detectable alternative): 1

How is the alternative hypothesis expressed?

(1 = two survival times, 2 = hazard ratio or relative risk): 1

Enter ALPHA, POWER, M1, M2, A, AND F: ?

Input required from the user:

ALPHA   Type I error probability for two-sided test

POWER   The desired statistical power

M1      Median survival time on control treatment

M2      Median survival time on experimental treatment

A       Accrual time during which patients are recruited

F       Additional follow-up time after end of recruitment

Output:

N       Number of patients per group that must be recruited to detect a true ratio of median survival times M2/M1 with power 1-BETA and type I error probability ALPHA.

Type E to edit previously entered values

Enter ALPHA, POWER, M1, M2, A and F: 0.1, 0.8, 11, 16.5, 24, 12

ALPHA = 0.1000 POWER = 0.8000 M1 = 11.0000

M2 = 16.5000 A = 24.0000 F = 12.0000

Required sample size: 110

Enter ALPHA, POWER, M1, M2, A, AND F:

Answering "2" to the second question in the preceding example permits the derivation of the power curve associated with a range of different alternative hypotheses. The coordinates of these curves can be written to a data file for subsequent use by graphics software packages. Figure 1 shows such a curve for survival data with a sample size of 110 patients per group, a median control survival time of 11 months, a two-sided type I error probability of 0.1 (one-sided $\alpha$ = 0.05), and accrual and follow-up times of 24 and 12 months, respectively. Note that this figure is in agreement with the sample size calculations of the preceding example.
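Coordinates for a power curve of this kind can be sketched by evaluating Eq. 2 over a grid of experimental median survival times. The code below is an illustration under the survival formulas of the Methods section, not the POWER program's output routine; the function name and the grid of integer months from 5 to 25 are assumptions.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def logrank_power(n, alpha, m1, m2, accrual, followup):
    """Two-sided power from Eq. 2 for n patients per group."""
    norm = NormalDist()
    m = (m1 + m2) / 2.0
    rate = log(2.0) / m
    if accrual > 0:
        p_accrual = (1.0 - exp(-rate * accrual)) / (rate * accrual)  # P(A)
    else:
        p_accrual = 1.0
    p = 1.0 - p_accrual * exp(-rate * followup)   # p = 1 - P(A)G(F)
    sigma = sqrt(2.0 / p)
    delta = log(m2 / m1) / sigma
    z_half = norm.inv_cdf(1.0 - alpha / 2.0)
    return norm.cdf(delta * sqrt(n) - z_half) + norm.cdf(-delta * sqrt(n) - z_half)


# Settings of Figure 1: 110 per group, alpha = 0.1, control median 11 months,
# 24 months of accrual and 12 months of additional follow-up.
curve = [(m2, logrank_power(110, 0.1, 11.0, float(m2), 24.0, 12.0)) for m2 in range(5, 26)]
for m2, pw in curve:
    print(f"{m2:3d} {pw:.3f}")
```

The curve reaches its minimum, equal to the type I error probability, where the experimental median equals the control median of 11 months, and passes close to 0.8 at 16.5 months.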

The questions asked by POWER to specify the sample size and power calculation method used are given in Table 2. This table also shows the acceptable answers to these questions and the resulting method that is used.


[Figure 1: plot of power (vertical axis) against median survival time for experimental patients in months (horizontal axis, 5 to 25). Plot annotations: patients per treatment group = 110; two-sided type I error probability = 0.1; patient accrual time = 24 months; additional follow-up time = 12 months; median survival time for control patients = 11 months.]

Figure 1 Power curve for a clinical trial in which 110 patients are randomized into each of two treatments. The coordinates of this curve were generated by the POWER computer program using the method of Schoenfeld and Richter [10].

Table 2 Questions Asked by the POWER Program, the Acceptable Answers, and the Resulting Sample Size and Power Calculation Methods That Are Used(a)

                                                                  Answers
Question                        --------------------------------------------------------------------------------------------------
Type of outcome variable?       Survival   Continuous  Continuous   Dichotomous  Dichotomous  Dichotomous  Dichotomous
What is the study design?       Q.N.A.(b)  Paired      Independent  Matched or   Matched or   Independent  Independent
                                                                    Paired       Paired
Is this a case-control study?   Q.N.A.     Q.N.A.      Q.N.A.       Yes          No           Yes          No
Method number                   1          2           3            4            5            6,7          7,8

(a) The method number given above is defined in Table 1.
(b) Question not asked.

ADDITIONAL EXAMPLES AND COMMENTS

Survival Data

Consider a clinical trial in which patients are randomized to one of two treatments and then followed for some specified length of time, or until death. A statistic that is commonly used to assess treatment efficacy for such trials is the log-rank test [20]. This test makes no assumptions about how mortal risk varies with time since randomization in either group, and is the optimal test with respect to alternative hypotheses in which the hazard ratio (instantaneous relative risk) between the treatment groups remains constant over time [23]. The POWER program uses the method of Schoenfeld and Richter [10] to assess the power of trials that will be analyzed with the log-rank test. In this method, the alternative hypothesis of interest is specified by the median survival times on the two treatments. If preliminary data are available, these times may be estimated from Kaplan-Meier survival curves [20,24] to be the times at which 50% of patients in each group will have died. These curves must be extrapolated beyond the available follow-up period if less than 50% of patients have died during this time. If the survival curves follow an exponential distribution then the ratio of the median survival times of the experimental patients relative to the controls will equal the hazard ratio of controls relative to experimental patients. Thus, if we only have preliminary data on the control group, we can base our power calculations on the expected median survival time among control patients and the relative risk or hazard ratio between experimental and control subjects that we wish to detect. POWER permits power calculations for survival studies to be formulated in this way. In the preceding example, this could be done by answering "2" to the third question to specify that the alternative hypothesis is to be expressed as a hazard ratio. POWER will then ask for this ratio, which, in this example, is 16.5/11 = 1.5. In other words a trial with 110 patients per group will be able to detect the alternative hypothesis of 50% greater morbidity on the control treatment with 80% power and a 10% type I error.

The Schoenfeld and Richter [10] method permits the follow-up interval to be specified as an accrual interval A when patients are recruited plus an additional follow-up interval F. Note that if all patients are followed for the same length of time then A equals zero and F equals the uniform follow-up interval.

Continuous Response Data

The power of studies of continuous response data is affected by the variation of patient responses within treatment groups. It is for this reason that it is necessary to estimate the standard deviation of patient responses. Although this can be very difficult to do in the absence of good pilot data, it is helpful to bear in mind that 95% of patient responses should lie within a range of four standard deviations.

Paired study designs will be more powerful than independent designs if the matching variables account for a sizable amount of the patient variation. It can, however, be difficult to find suitable pairs of patients if there are several matching variables and the matching criteria are sufficiently strict. In observational studies it is often easier to recruit control patients than case patients. The power of such studies may be increased by recruiting multiple controls per case.

A common clinical trial design involves measuring a response variable on patients before and after treatment. Suppose that patients are randomized to a control and experimental treatment, that the response measure (say weight) is normally distributed, that we measure each patient's weight before and after each treatment, and that we wish to determine whether the change in weight varies between the experimental and control groups. Such studies can be analyzed by an independent t test on the change in weight for each patient. To perform power calculations for such a study using POWER, one must specify a continuous independent study design and then estimate the standard deviation of the change in weight among patients who receive the same treatment. If it is easier to specify the standard deviation $\sigma_b$ of a patient's baseline weight, and the correlation coefficient $\rho$ between baseline and follow-up weight on the same patient, then the standard deviation of the patient's weight change can be calculated to be $\sigma = \sigma_b\sqrt{2(1 - \rho)}$ [1, sect. 9.4.2.2]. This standard deviation is then entered as S after specifying a continuous paired study design to the POWER program.
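The conversion from a baseline standard deviation and a baseline to follow-up correlation can be sketched as below. The formula implicitly assumes that the baseline and follow-up measurements share the same standard deviation; the function name and the inputs (12 kg and a correlation of 0.75) are hypothetical.

```python
from math import sqrt


def sd_of_change(sd_baseline, rho):
    """SD of (follow-up minus baseline) when both measurements have SD sd_baseline."""
    return sd_baseline * sqrt(2.0 * (1.0 - rho))


# Hypothetical inputs: baseline SD of 12 kg, correlation 0.75 between visits.
sd_change = sd_of_change(12.0, 0.75)
print(f"SD of weight change: {sd_change:.2f} kg")
```

Whenever the correlation exceeds 0.5 the change score is less variable than a single measurement, which is the situation in which analyzing change rather than the follow-up value alone improves power.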

Dichotomous Response Data

Suppose that patients are randomized to an experimental or control treatment whose outcome is either success or failure. The success rates of these treatments may be assessed using a $\chi^2$ test or Fisher's exact test [4]. When the numbers of successes and failures on each treatment are large, all of these methods are equivalent. There has, however, been considerable controversy over the most appropriate method when the number of patients in one of the four cells of the outcome table becomes moderate or small [25]. Meinert [1] recommends using the uncorrected $\chi^2$ statistic if there are at least 15 patients in each cell of this table. Many authors recommend that Yates's corrected $\chi^2$ statistic be used for tables with moderate minimum cell size and that Fisher's exact test be used when the minimum expected cell size is less than five [4]. The correct sample size calculation for such studies depends on the test statistic that will be used. POWER performs the appropriate sample size calculations for each of these test statistics.
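One possible reading of these recommendations is sketched below. The order in which the rules are applied, and the treatment of every table that falls between the two thresholds as having "moderate" cell sizes, are assumptions made for illustration. The function is hypothetical and does not describe how POWER chooses a test.

```python
def choose_test(table):
    """Suggest a test for a 2 x 2 table [[a, b], [c, d]] of successes and
    failures by treatment, following one reading of the cited guidance."""
    (a, b), (c, d) = table
    n = a + b + c + d
    if n == 0:
        raise ValueError("table is empty")
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected cell counts under the null hypothesis of equal success rates.
    expected = [r * k / n for r in row_totals for k in col_totals]
    if min(expected) < 5:
        return "Fisher's exact test"
    if min(a, b, c, d) >= 15:
        return "uncorrected chi-squared test"
    return "Yates's corrected chi-squared test"


print(choose_test([[40, 60], [55, 45]]))  # uncorrected chi-squared test
print(choose_test([[8, 22], [12, 18]]))   # Yates's corrected chi-squared test
print(choose_test([[2, 8], [5, 5]]))      # Fisher's exact test
```

Whatever rule is used, the test should be fixed at the design stage, because the sample size calculation depends on it.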

The alternative hypothesis is often expressed in terms of relative risks. For prospective studies this is simply the ratio of failure rates of patients on the two treatments. For case-control studies, relative risk is estimated by the odds ratio if the disease under study is rare [4]. POWER asks whether the user is planning a case-control study in order to allow the user to express the alternative hypothesis as an odds ratio. The choice of the test statistic is not affected by whether the study is prospective or retrospective.
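The rare-disease approximation can be illustrated numerically. The sketch below computes the relative risk and the odds ratio from two event probabilities and converts an odds ratio back into a probability. All function and parameter names are hypothetical, and the probabilities are made-up examples.

```python
def relative_risk(p_exposed, p_control):
    """Ratio of event probabilities in the two groups."""
    return p_exposed / p_control


def odds_ratio(p_exposed, p_control):
    """Ratio of event odds in the two groups."""
    return (p_exposed / (1.0 - p_exposed)) / (p_control / (1.0 - p_control))


def p_from_odds_ratio(p_control, psi):
    """Event probability in the exposed group implied by odds ratio psi."""
    return p_control * psi / (1.0 + p_control * (psi - 1.0))


# Rare outcome: the odds ratio is close to the relative risk of 2.
print(relative_risk(0.02, 0.01), round(odds_ratio(0.02, 0.01), 4))
# Common outcome: the odds ratio (3.5) overstates the relative risk of 2.
print(relative_risk(0.6, 0.3), round(odds_ratio(0.6, 0.3), 4))
```

The second case shows why the odds ratio should be read as a relative risk only when the outcome is rare.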

We thank Robert A. Parker, George W. Reed, Gordon R. Bernard, Curtis L. Meinert, and the referees for helpful advice, and Janelle Steele and Virginia McKinney for assistance in preparing this manuscript. This research was supported in part by NIH grants and contracts HL-14192, N01-AI-52593, R01-CA40517, and R01-CA46492.

REFERENCES

1. Meinert CL: Clinical Trials: Design, Conduct, and Analysis. New York: Oxford University Press, 1986

2. Schlesselman JJ: Case-Control Studies: Design, Conduct, Analysis. New York: Oxford University Press, 1982

3. Steel RGD, Torrie JH: Principles and Procedures of Statistics: A Biometrical Approach, 2nd ed. New York: McGraw-Hill, 1980

4. Fleiss JL: Statistical Methods for Rates and Proportions, 2nd ed. New York: Wiley, 1981

5. Casagrande JT, Pike MC, Smith PG: An improved approximate formula for calculating sample sizes for comparing two binomial distributions. Biometrics 34:483-486, 1978

6. Dupont WD: Power calculations for matched case-control studies. Biometrics 44: 1157-1168, 1988

7. Feigl P: A graphical aid for determining sample size when comparing two independent proportions. Biometrics 34:111-122, 1978

8. Friedman LM, Furberg CD, DeMets DL: Fundamentals of Clinical Trials. Boston: John Wright PSG, 1982

9. Pearson ES, Hartley HO: Biometrika Tables for Statisticians, 3rd ed. Cambridge: Cambridge University Press, 1970, vol I

10. Schoenfeld DA, Richter JR: Nomograms for calculating the number of patients needed for a clinical trial with survival as an endpoint. Biometrics 38:163-170, 1982

11. Gross AJ, Hunt HH, Cantor AB, Clark BC: Sample size determination in clinical trials with an emphasis on exponentially distributed responses. Biometrics 43:875-883, 1987

12. Halpern J, Brown BW Jr: Designing clinical trials with arbitrary specification of survival functions and for the log rank or generalized Wilcoxon test. Controlled Clin Trials 8:177-189, 1987

13. Lachin JM, Foulkes MA: Evaluation of sample size and power for analysis of survival with allowance for nonuniform patient entry, losses to follow-up, noncompliance and stratification. Biometrics 42:507-519, 1986

14. Lakatos E: Sample sizes based on the log rank statistic in complex clinical trials. Biometrics 44:229-241, 1988

15. Parker RA, Bregman DJ: Sample size for individually matched case-control studies. Biometrics 42:919-926, 1986

16. Self SG, Mauritsen RH: Power/sample size for generalized linear models. Biometrics 44:79-86, 1988

17. Taulbee JD, Symons MJ: Sample size and duration for cohort studies of survival time with covariables. Biometrics 39:351-360, 1983

18. Kelsey JL, Thompson WD, Evans AS: Methods in Observational Epidemiology. New York: Oxford University Press, 1986

19. Ralston A: A First Course in Numerical Analysis. New York: McGraw-Hill, 1965

20. Peto R, Pike MC, Armitage P, et al: Design and analysis of randomized clinical trials requiring prolonged observation of each patient: II. Analysis and examples. Br J Cancer 35:1-39, 1977

21. Johnson NL, Kotz S: Distributions in Statistics: Continuous Univariate Distributions-2. New York: Wiley, 1970

22. Breslow NE, Day NE: Statistical Methods in Cancer Research: Vol I--The Analysis of Case-Control Studies. Lyon: International Agency for Research on Cancer, 1980

23. Peto R: Rank tests of maximal power against Lehmann-type alternatives. Biometrika 59:472-475, 1972

24. Lee ET: Statistical Methods for Survival Data Analysis. Belmont, CA: Lifetime Learning, 1980, pp 76-87

25. Dupont WD: Sensitivity of Fisher's exact test to minor perturbations in 2 x 2 contingency tables. Stat Med 5:629-635, 1986

APPENDIX

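Suppose that $2(\sigma_0/\sigma_\delta)z_{\alpha/2} + z_\beta \ge 3.1$ and we select $n$ according to Eq. 3. Then

$$|\delta|\sqrt{n} = \sigma_0 z_{\alpha/2} + \sigma_\delta z_\beta$$

and hence

$$-\frac{|\delta|\sqrt{n}}{\sigma_\delta} - \frac{\sigma_0}{\sigma_\delta}\,z_{\alpha/2} \le -3.1 = \Phi^{-1}[0.001]. \qquad (9)$$

When $\delta > 0$, Eq. 9 implies that the right-most term in Eq. 2 is less than 0.001. When $\delta < 0$ the other term in Eq. 2 will be $\le 0.001$. Hence the approximation used to derive Eq. 3 from Eq. 2 is excellent as long as $2(\sigma_0/\sigma_\delta)z_{\alpha/2} + z_\beta \ge 3.1$.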
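The bound in this appendix can be checked numerically. The sketch below assumes that Eq. 2 is the usual two-term, two-sided power formula for a test statistic that is normal with mean 0 and standard deviation sigma0 under the null hypothesis, and with mean delta times the square root of n and standard deviation sigma_d under the alternative, and that Eq. 3 is the sample size obtained by dropping the smaller term. These forms, the function names, and the numerical inputs are assumptions made for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_upper(p):
    """Value exceeded with probability p by a standard normal variable."""
    return STD_NORMAL.inv_cdf(1.0 - p)


def sample_size(delta, sigma0, sigma_d, alpha, beta):
    """Assumed form of Eq. 3; n is left unrounded."""
    return ((sigma0 * z_upper(alpha / 2) + sigma_d * z_upper(beta)) / delta) ** 2


def power_terms(n, delta, sigma0, sigma_d, alpha):
    """The two terms of the assumed two-sided power formula (Eq. 2)."""
    shift = delta * n ** 0.5 / sigma_d
    crit = sigma0 / sigma_d * z_upper(alpha / 2)
    return STD_NORMAL.cdf(shift - crit), STD_NORMAL.cdf(-shift - crit)


alpha, beta = 0.05, 0.20
n = sample_size(delta=0.5, sigma0=1.0, sigma_d=1.0, alpha=alpha, beta=beta)
main_term, neglected_term = power_terms(n, 0.5, 1.0, 1.0, alpha)
print(round(n, 2), round(main_term, 4), neglected_term < 0.001)
```

With these inputs the condition of the appendix holds (2 x 1.96 + 0.84 is well above 3.1), the retained term equals the target power of 0.80, and the neglected term is far below 0.001.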