Approximate multinormal probabilities applied to correlated multiple endpoints in clinical trials

STATISTICS IN MEDICINE, VOL. 10, 1123-1135 (1991)

APPROXIMATE MULTINORMAL PROBABILITIES APPLIED TO CORRELATED MULTIPLE ENDPOINTS IN CLINICAL TRIALS

SUSAN JAMES
Department of Mathematics, Leicester University, University Road, Leicester, U.K.

SUMMARY

Clinical trials with multiple endpoints incur increased familywise type I errors. The Bonferroni correction is a common method used to modify the p-values to account for multiple significance testing. For independent endpoints the Bonferroni method is slightly conservative whereas with high correlation the conservatism is extreme, as demonstrated by Pocock et al.1 This paper presents a procedure which allows for the correlation present, whilst adjusting the multiple p-values. The method is based on an approximation derived for multinormal probabilities.

1. INTRODUCTION

Clinical trials are often concerned with investigating a treatment effect for several endpoints. By considering a variety of outcomes it is possible to gain better overall knowledge of the treatment regime. However, performing multiple significance tests incurs an increased risk of a false positive result.

A method frequently used to adjust for multiple significance tests is the Bonferroni correction. The standard procedure is to perform each significance test and then multiply pmin, the minimum p-value obtained, by k, the number of significance tests. This adjusted p-value represents the overall significance level of the trial.
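As a minimal sketch of the procedure just described, the following Python fragment multiplies the smallest p-value by the number of tests. The function name `bonferroni_adjust` is hypothetical, the cap at 1 is an added convenience rather than part of the description above, and the four p-values are those of the example in Section 2.1.

```python
def bonferroni_adjust(p_values):
    """Overall Bonferroni-adjusted p-value: k times the smallest p-value.

    The cap at 1 is an assumption added here so that the result is
    always a valid probability.
    """
    k = len(p_values)
    return min(1.0, k * min(p_values))


# p-values of the four endpoints in the example of Section 2.1
p_values = [0.019, 0.015, 0.02, 0.032]
p_bonf = bonferroni_adjust(p_values)
print(f"Bonferroni-adjusted p-value: {p_bonf:.3f}")
```

With these inputs the adjusted value exceeds 0.05 even though every individual p-value is below 0.05, which is the conservatism discussed in the next paragraph.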

The Bonferroni procedure, although simple to apply, suffers from two major drawbacks. The first is that it only concentrates on the smallest p-value, ignoring the collective information. Secondly, it is conservative, particularly when correlation is present. In fact, with high degrees of correlation the conservatism is excessive, and results in a lower power of detecting treatment differences.1

Simes2 and Armitage and Parmar3 have proposed amended procedures which nullify the first objection by adjusting the ordered p-values in a progressive chain. Both methods reject the overall null hypothesis if one or more of these adjusted p-values are less than α, the chosen type I error rate. When correlation is present these modifications are less conservative than the standard Bonferroni procedure. However, they do not consider the correlation directly, and consequently the problem is not eliminated.
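The paper does not spell out either amended procedure. As a hedged illustration only, the sketch below uses the usual statement of Simes' test (reject the overall null hypothesis if the i-th smallest p-value is at most iα/k for some i), which is equivalent to comparing the minimum of k p_(i)/i with α. The function name `simes_adjusted_p` is hypothetical, and Armitage and Parmar's procedure is not reproduced here.

```python
def simes_adjusted_p(p_values):
    """Global p-value under the usual form of Simes' procedure.

    Each ordered p-value p_(i) is scaled by k / i and the smallest
    scaled value is returned (capped at 1).
    """
    k = len(p_values)
    ordered = sorted(p_values)
    scaled = [k * p / rank for rank, p in enumerate(ordered, start=1)]
    return min(1.0, min(scaled))


# p-values of the four endpoints in the example of Section 2.1
p_values = [0.019, 0.015, 0.02, 0.032]
p_simes = simes_adjusted_p(p_values)
p_bonf = min(1.0, len(p_values) * min(p_values))
print(f"Simes: {p_simes:.4f}, Bonferroni: {p_bonf:.4f}")
```

Because the first scaled value is exactly k pmin, the Simes value can never exceed the Bonferroni value, which is why the chain is less conservative.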

Since clinical trials with multiple endpoints often demonstrate a degree of correlation between outcomes, a method of adjusting p-values for multiple significance tests in the presence of correlation would be desirable.

Unfortunately, for normally distributed test statistics, a multinormal probability must be evaluated and this can be quite complex. Johnson and Kotz4 relate various reduction methods for evaluating multinormal probabilities, but they describe them as 'somewhat laborious in practise, even with the assistance of electronic computers'.

0277-6715/91/071123-13$06.50 © 1991 by John Wiley & Sons, Ltd.

Received June 1990; Revised November 1990

Although technology has progressed dramatically, it would still be convenient to calculate the adjusted p-values without the use of a computer. To this end Armitage and Parmar produced an empirical approximation which they justify for up to five endpoints. This paper provides an analytic approximation for multinormal probabilities with equal correlation, which can be applied to calculate the adjusted p-values in both the one-sided and the two-sided case, and for any number of endpoints. In addition, it can be used to calculate the power for trials with multiple testing.

2. APPROXIMATING THE ADJUSTED p-VALUE

Assume that the test statistics follow a multivariate normal distribution with equal correlation ρ between the k endpoints. The first assumption is reasonable since most non-normal test statistics are either asymptotically normal or can be transformed into such a form. Unequal correlation is discussed later in the paper.

Now let pmin denote the smallest of the per-experiment error rates; this requires correcting for multiple significance tests and correlation. If padj denotes the adjusted p-value then

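\[
p_{\mathrm{adj}} = \Pr(\text{minimum } p < p_{\min}) = 1 - \Pr(\text{all } p > p_{\min}) = 1 - \Pr\Bigl(\bigcap_{i=1}^{k} \{a < X_i < b\}\Bigr),
\]

where X_i (i = 1, ..., k) are standardized multinormal with equal correlation ρ such that

\[
a = -b, \qquad b = \Phi^{-1}(1 - p_{\min}/2),
\]

for the two-sided case and

\[
a = -\infty, \qquad b = \Phi^{-1}(1 - p_{\min}),
\]

for the one-sided case. Here Φ and φ denote the standard normal distribution function and density. The approximation is

\[
p_{\mathrm{adj}} \approx 1 - D_1(1 - \rho^2) - D_2\rho^2 - D_3\rho(1 - \rho) - D_4\bigl(2 - 2(1 - \rho)^{1/2} - \rho - \rho^2\bigr) \tag{1}
\]

where, for the two-sided case:

\[
D_1 = (1 - p_{\min})^k, \qquad D_2 = 1 - p_{\min}, \qquad D_3 = 0,
\]

\[
D_4 = k(k - 1)\,\phi(b) \int_{-\infty}^{\infty} \Phi(z)^{k-2}\phi(z)^2\,dz = k(k - 1)\,\phi(b)\,G(k),
\]

and for the one-sided case:

\[
D_1 = (1 - p_{\min})^k, \qquad D_2 = 1 - p_{\min},
\]

\[
D_3 = \tfrac{1}{2}k(k - 1)(1 - p_{\min})^{k-2}\phi(b)^2, \qquad D_4 = \tfrac{1}{2}k(k - 1)\,\phi(b)\,G(k).
\]

The derivation of the approximation and values for G(k) are given in Appendix I. Note that the above formulae are simpler than those given in Appendix I because the integration limits are the same for each endpoint.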

2.1. Example of use

A two-sided trial with α = 0.05, four endpoints, and correlation ρ = 0.7 between them has the following p-values:

0.019, 0.015, 0.02, 0.032.

The usual Bonferroni correction accepts the null hypothesis of no treatment difference since

\[
k\,p_{\min} = 4 \times 0.015 = 0.06 > 0.05,
\]

whereas the 'true' adjusted p-value should be 0.04344 (calculated using a multivariate normal program, see later).

Using the approximation,

\[
D_1 = 0.9413, \quad D_2 = 0.985, \quad D_3 = 0, \quad D_4 = 0.0213,
\]

giving padj = 0.04335, which is quite close to the true value (note that Armitage and Parmar's approximation gives 0.04262).
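The arithmetic of this example can be checked with a short Python sketch of equation (1) for the two-sided case. The function names are hypothetical, and G(k) is computed here by Simpson's rule from its integral definition in Appendix I rather than read from the table; both choices are assumptions made for illustration, not the author's program.

```python
from math import erf, exp, pi, sqrt
from statistics import NormalDist


def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def G(k, half_width=10.0, n=4000):
    """Integral of Phi(z)^(k-2) * phi(z)^2 over the real line (Simpson's rule)."""
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        z = -half_width + j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * Phi(z) ** (k - 2) * phi(z) ** 2
    return total * h / 3.0


def padj_two_sided(p_min, k, rho):
    """Equation (1) with the two-sided D terms of Section 2."""
    b = NormalDist().inv_cdf(1.0 - p_min / 2.0)
    d1 = (1.0 - p_min) ** k
    d2 = 1.0 - p_min
    d3 = 0.0
    d4 = k * (k - 1) * phi(b) * G(k)
    f = (d1 * (1 - rho**2) + d2 * rho**2 + d3 * rho * (1 - rho)
         + d4 * (2 - 2 * sqrt(1 - rho) - rho - rho**2))
    return 1.0 - f


p_adj = padj_two_sided(p_min=0.015, k=4, rho=0.7)
print(f"G(4) = {G(4):.8f}, padj = {p_adj:.5f}")
```

Setting rho to 0 in the same function returns 1 - (1 - pmin)^k, the value for independent endpoints, which is a quick sanity check on the coefficients.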

3. ADEQUACY OF APPROXIMATION

Tables I(a) and I(b) present the adjusted p-values obtained using a minimum p-value equal to α/k, for various levels of correlation. The tables show the effectiveness of the approximation at estimating the true adjusted p-value. They also demonstrate how conservative the standard Bonferroni procedure will be when correlation is present.

The values obtained from Armitage and Parmar's empirical formula are included for comparison, although it should be noted that the authors do not certify its use for more than five endpoints.

A more accurate approximation can be achieved by extending equation (1) to include more derivatives (see Appendix II). The obvious ramification of this is that the greater the accuracy gained, the more complex the formula becomes. As the error of the approximation is small anyway, this complexity, and potential confusion, seems unnecessary for most practical situations. Thus the rest of this paper continues to use the simpler form of the approximation defined earlier.

Table I. Adjusted p-values for a clinical trial with k endpoints and pmin = 0.05/k

(a) Two-sided case

                          Correlation ρ
 k   Row            0       0.1     0.3     0.5     0.7     0.9
 2   True         0.0494  0.0493  0.0484  0.0465  0.0430  0.0362
     Approx. (1)  0.0494  0.0493  0.0483  0.0463  0.0426  0.0359
     A & P        0.0494  0.0492  0.0479  0.0455  0.0415  0.0342
 3   True         0.0492  0.0490  0.0476  0.0445  0.0392  0.0298
     Approx. (1)  0.0492  0.0490  0.0475  0.0442  0.0387  0.0294
     A & P        0.0492  0.0488  0.0467  0.0431  0.0382  0.0303
 4   True         0.0491  0.0488  0.0470  0.0431  0.0367  0.0259
     Approx. (1)  0.0491  0.0488  0.0469  0.0429  0.0363  0.0257
     A & P        0.0491  0.0487  0.0461  0.0418  0.0362  0.0280
 5   True         0.0490  0.0487  0.0466  0.0421  0.0348  0.0232
     Approx. (1)  0.0490  0.0487  0.0466  0.0420  0.0347  0.0232
     A & P        0.0490  0.0488  0.0459  0.0412  0.0351  0.0267
 6   True         0.0490  0.0487  0.0462  0.0412  0.0333  0.0212
     Approx. (1)  0.0490  0.0487  0.0463  0.0414  0.0335  0.0214
     A & P        0.0490  0.0488  0.0459  0.0408  0.0344  0.0259
 7   True         0.0489  0.0486  0.0459  0.0405  0.0321  0.0197
     Approx. (1)  0.0489  0.0486  0.0461  0.0409  0.0326  0.0200
     A & P        0.0489  0.0488  0.0459  0.0406  0.0339  0.0300
 8   True         0.0489  0.0486  0.0457  0.0399  0.0310  0.0185
     Approx. (1)  0.0489  0.0486  0.0459  0.0405  0.0319  0.0190
     A & P        0.0489  0.0489  0.0459  0.0404  0.0335  0.0247
 9   True         0.0489  0.0485  0.0455  0.0393  0.0301  0.0174
     Approx. (1)  0.0489  0.0486  0.0458  0.0401  0.0313  0.0181
     A & P        0.0489  0.0489  0.0459  0.0403  0.0332  0.0243
10   True         0.0489  0.0485  0.0453  0.0388  0.0293  0.0166
     Approx. (1)  0.0489  0.0485  0.0457  0.0399  0.0308  0.0174
     A & P        0.0489  0.0489  0.0460  0.0402  0.0329  0.0239

(b) One-sided case

                          Correlation ρ
 k   Row            0       0.1     0.3     0.5     0.7     0.9
 2   True         0.0494  0.0490  0.0476  0.0454  0.0417  0.0351
     Approx. (1)  0.0494  0.0489  0.0475  0.0451  0.0414  0.0350
 3   True         0.0492  0.0485  0.0465  0.0429  0.0375  0.0286
     Approx. (1)  0.0492  0.0485  0.0463  0.0427  0.0372  0.0284
 4   True         0.0491  0.0483  0.0457  0.0413  0.0348  0.0247
     Approx. (1)  0.0491  0.0483  0.0455  0.0411  0.0346  0.0246
 5   True         0.0490  0.0481  0.0451  0.0400  0.0328  0.0221
     Approx. (1)  0.0490  0.0481  0.0450  0.0401  0.0329  0.0221
 6   True         0.0490  0.0480  0.0446  0.0390  0.0312  0.0201
     Approx. (1)  0.0490  0.0480  0.0446  0.0393  0.0316  0.0203
 7   True         0.0489  0.0479  0.0442  0.0383  0.0299  0.0185
     Approx. (1)  0.0489  0.0479  0.0443  0.0387  0.0306  0.0190
 8   True         0.0489  0.0478  0.0439  0.0375  0.0288  0.0173
     Approx. (1)  0.0489  0.0478  0.0441  0.0382  0.0298  0.0179
 9   True         0.0489  0.0477  0.0436  0.0369  0.0279  0.0162
     Approx. (1)  0.0489  0.0478  0.0439  0.0378  0.0292  0.0170
10   True         0.0489  0.0477  0.0433  0.0363  0.0271  0.0153
     Approx. (1)  0.0489  0.0477  0.0437  0.0375  0.0287  0.0163

True: values obtained using a multivariate normal program.
Approx. (1): values using approximation (1).
A & P: Armitage and Parmar's approximation. In Table I(b) the values for Armitage and Parmar's approximation are omitted (since their approximation does not appear to make any distinction between one- and two-sided cases).

4. GENERAL CORRELATION STRUCTURE

Hitherto the assumption of equal correlation between endpoints has been used, although this is rather unrealistic. To apply the approximation for a general correlation structure, a simple and plausible approach is to replace ρ with the mean correlation, ρ_m, defined by

\[
\rho_m = \frac{1}{m} \sum_{i<j} \rho_{ij},
\]

where ρ_ij is the correlation between endpoint i and endpoint j, and m = k(k - 1)/2.

However, a more accurate procedure, suggested by the Armitage and Parmar approximation, is to replace ρ by ρ_mv where,

[defining equation for ρ_mv illegible in the source]

Presently, the only justification for this approach is an empirical one.

Furthermore, it is often necessary to estimate the correlation matrix, either from a previous study or preferably a pilot trial. If all the endpoints are distributed normally we can estimate the correlation matrix using standard techniques, since the correlations between the standardized test statistics are the same as the correlations between the raw data. In addition, Pocock et al.1 explain how to determine the correlation matrix for a whole variety of situations; for example, asymptotically normal test statistics derived from non-normal data such as binary or survival data.

4.1. Example with unequal correlation

A one-sided trial with eight endpoints and correlation matrix ρ_ij such that

              j
 i      2     3     4     5     6     7     8
 1    0.54  0.42  0.24  0.57  0.33  0.51  0.54
 2          0.63  0.36  0.85  0.50  0.77  0.81
 3                0.28  0.66  0.39  0.59  0.63
 4                      0.38  0.22  0.34  0.36
 5                            0.52  0.81  0.85
 6                                  0.47  0.50
 7                                        0.77

has a minimum p-value of 0.0063. The standard Bonferroni procedure would adjust the p-value to 0.0504 and accept the null hypothesis.

Using the correlation structure given above, ρ_m = 0.5297 and ρ_mv = 0.5983. Therefore, applying equation (1) to approximate the true adjusted p-values, we obtain

padj = 0.037418 using ρ_m

and

padj = 0.034696 using ρ_mv.

Note, the true adjusted value should be 0.034408 (Armitage and Parmar's approximation gives 0.037470).
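The figures in this example can be retraced with the following Python sketch. It averages the printed correlations and then evaluates equation (1) for the one-sided case. Three points are assumptions rather than statements from the paper: the function name `padj_one_sided` is hypothetical; the one-sided coefficients are taken as D3 = k(k-1)(1-pmin)^(k-2) φ(b)^2 / 2 and D4 = k(k-1) φ(b) G(k) / 2, chosen because they reproduce the adjusted p-values quoted above; and the values of ρ_m and ρ_mv are copied from the text. The two-decimal matrix as printed averages to about 0.530, marginally different from the quoted 0.5297, presumably because of rounding.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

# Upper triangle of the printed matrix (rows i = 1..7, columns j = i+1..8).
upper = [
    [0.54, 0.42, 0.24, 0.57, 0.33, 0.51, 0.54],
    [0.63, 0.36, 0.85, 0.50, 0.77, 0.81],
    [0.28, 0.66, 0.39, 0.59, 0.63],
    [0.38, 0.22, 0.34, 0.36],
    [0.52, 0.81, 0.85],
    [0.47, 0.50],
    [0.77],
]
k = len(upper) + 1
m = k * (k - 1) // 2
entries = [r for row in upper for r in row]
rho_m_printed = sum(entries) / m  # mean of the m = 28 printed correlations

G8 = 0.02542143  # G(8), copied from the table in Appendix I


def padj_one_sided(p_min, k, rho, g_k):
    """Equation (1) with the assumed one-sided D terms described above."""
    b = NormalDist().inv_cdf(1.0 - p_min)
    dens = exp(-0.5 * b * b) / sqrt(2.0 * pi)
    d1 = (1.0 - p_min) ** k
    d2 = 1.0 - p_min
    d3 = 0.5 * k * (k - 1) * (1.0 - p_min) ** (k - 2) * dens**2
    d4 = 0.5 * k * (k - 1) * dens * g_k
    f = (d1 * (1 - rho**2) + d2 * rho**2 + d3 * rho * (1 - rho)
         + d4 * (2 - 2 * sqrt(1 - rho) - rho - rho**2))
    return 1.0 - f


padj_m = padj_one_sided(0.0063, k, 0.5297, G8)   # rho_m as quoted in the text
padj_mv = padj_one_sided(0.0063, k, 0.5983, G8)  # rho_mv as quoted in the text
print(f"mean of printed correlations: {rho_m_printed:.4f}")
print(f"padj using rho_m: {padj_m:.6f}, using rho_mv: {padj_mv:.6f}")
```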

5. CLINICAL TRIAL OF ANTIHYPERTENSIVE TREATMENT

A double-blind crossover trial was undertaken to study the antihypertensive effects of an active drug versus placebo for 18 patients suffering from primary hypertension. Patients were allocated randomly to the two treatment sequences, treated for 9 days in each period and examined physically after 0, 3, 6 and 9 days. The 10 measurements (9 physiological and 1 haematological) listed in Table II were considered suitable for reflecting changes in the patients' condition.

Analysis of covariance was used for all response variables, taking the initial value of the variable in each treatment period as a covariate, and the average of the measurements on days 3, 6 and 9 as the response. A brief outline of the comparison between treatments is given in Table II (one patient who did not comply with the treatment regime has been excluded).

A possible summary for Table II might be, 'it was found that the active drug reduced the systolic, diastolic and mean arterial blood pressures significantly for both erect and supine measurements. It also significantly increased the heart rate (erect but not supine) and decreased packed cell volume. There was no significant effect on body weight.'

These p-values could be amended to account for multiple significance tests using the Bonferroni procedure and multiplying each probability by 10. However, this would be overly conservative since a fair degree of correlation between the endpoints was expected.

An estimate of the correlation matrix ρ_ij obtained from the data is:

                    j
 i      2      3      4      5      6      7      8      9     10
 1    0.67   0.94   0.11   0.89   0.53   0.84   0.22   0.02   0.10
 2           0.88   0.28   0.48   0.75   0.66   0.11   0.20   0.24
 3                  0.20   0.79   0.68   0.84   0.30   0.11   0.17
 4                         0.11   0.29   0.20   0.96   0.17  -0.32
 5                                0.55   0.93   0.17   0.23   0.13
 6                                       0.82   0.30   0.34   0.46
 7                                              0.25   0.34   0.30
 8                                                    -0.23  -0.36
 9                                                            0.55

Thus the mean correlation is 0.4238 and the variance is 0.0799. Table II also provides padj, the two-sided p-values adjusted for multiple significance testing, using ρ_mv with equation (1).

The adjusted p-values do not drastically alter the conclusions drawn above but instead give more realistic estimates for the actual levels of significance. The only endpoints for which the summary is altered are erect heart rate and packed cell volume, both of which now become non-significant. This endorses the original report which was hesitant in claiming any real treatment effect for either of these endpoints.

Table II. Comparison between treatments

                                                 Mean difference    95 per cent
    Variable                                     (placebo-active)   confidence limits   d.f.      F      P         padj
 1  Supine systolic blood pressure (mmHg)             12.9          (4.6, 21.2)         1, 14    13.34   0.00261   0.01933
 2  Supine diastolic blood pressure (mmHg)             7.2          (3.6, 10.8)         1, 14    20.22   0.00050   0.00379
 3  Supine mean arterial blood pressure (mmHg)         9.0          (4.0, 14.0)         1, 14    17.59   0.00090   0.00676
 4  Supine heart rate (beats/min)                     -1.2          (-3.1, 0.7)         1, 14     1.77   0.20465   0.72117
 5  Erect systolic blood pressure (mmHg)              15.4          (7.1, 23.7)         1, 14    16.77   0.00109   0.00818
 6  Erect diastolic blood pressure (mmHg)              8.1          (3.8, 12.4)         1, 14    18.13   0.00080   0.00597
 7  Erect mean arterial blood pressure (mmHg)         10.6          (5.4, 15.8)         1, 14    20.53   0.00047   0.00355
 8  Erect heart rate (beats/min)                      -2.1          (-3.9, -0.3)        1, 14     5.79   0.03050   0.19869
 9  Body weight (kg)                                   0.4          (-0.4, 1.2)         1, 6*     1.32   0.27298   0.79736
10  Packed cell volume (L/L)                           0.009        (0.000, 0.018)      1, 12*    7.26   0.03584   0.22837

* The degrees of freedom for body weight and packed cell volume are reduced by missing data.

6. POWER

The approximation may also be used to calculate the power of any trial with multiple testing. For example, if we have k multinormal test statistics with correlation ρ, the power of detecting H1: θ_i = d_i, for i = 1, ..., k, is

\[
1 - \Pr(z_1 < Z_1, \ldots, z_k < Z_k),
\]

where Z_i (i = 1, ..., k) are the appropriate critical points. This power can be estimated using equation (2) in Appendix I and by setting b_i = Z_i and a_i = -∞ for i = 1, ..., k.

With normally distributed data

\[
Z_i = u_{\alpha_n} - \delta_i \sqrt{n/2},
\]

where δ_i = d_i/σ_i, α_n is the nominal significance level for each endpoint, and u_{α_n} is the corresponding upper percentage point of the standard normal distribution. Therefore, for a one-sided test the Bonferroni procedure requires α_n = α/k, and for a two-sided test α_n = α/2k.

To calculate the power of the test described earlier, it is necessary to find the nominal level, α_n, required for an overall type I error of α. This requires solving 1 - α = F(ρ, α_n) iteratively.

A rough solution is given by

\[
\alpha_n = \frac{\alpha^2}{k\bigl(1 - F(\rho, \alpha/k)\bigr)},
\]

where (1 - F(ρ, α/k)) are the values given in Tables I(a) and I(b). Alternatively, Pocock et al.1 contains a table of true nominal levels required for α = 0.05 or 0.025 and certain values of ρ.
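The iterative solution can be sketched with simple bisection, as below. This is an illustration under stated assumptions, not the author's routine: the function names are hypothetical; 1 - F(ρ, α_n) is evaluated with approximation (1) using the one-sided coefficients D3 = k(k-1)(1-α_n)^(k-2) φ(b)^2 / 2 and D4 = k(k-1) φ(b) G(k) / 2, which are assumed here; and the inputs k = 6, ρ = 0.8, α = 0.05 are those of the example in Section 6.1.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

G6 = 0.04224021  # G(6), copied from the table in Appendix I


def overall_level(alpha_n, k, rho, g_k):
    """Overall one-sided type I error, 1 - F(rho, alpha_n), via approximation (1)."""
    b = NormalDist().inv_cdf(1.0 - alpha_n)
    dens = exp(-0.5 * b * b) / sqrt(2.0 * pi)
    d1 = (1.0 - alpha_n) ** k
    d2 = 1.0 - alpha_n
    d3 = 0.5 * k * (k - 1) * (1.0 - alpha_n) ** (k - 2) * dens**2
    d4 = 0.5 * k * (k - 1) * dens * g_k
    f = (d1 * (1 - rho**2) + d2 * rho**2 + d3 * rho * (1 - rho)
         + d4 * (2 - 2 * sqrt(1 - rho) - rho - rho**2))
    return 1.0 - f


def nominal_level(alpha, k, rho, g_k, tol=1e-10):
    """Bisection for alpha_n such that the overall level equals alpha."""
    lo, hi = alpha / k, alpha  # Bonferroni level is too small, alpha too large
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if overall_level(mid, k, rho, g_k) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


alpha, k, rho = 0.05, 6, 0.8
rough = alpha**2 / (k * overall_level(alpha / k, k, rho, G6))
alpha_n = nominal_level(alpha, k, rho, G6)
print(f"rough solution: {rough:.6f}, iterative solution: {alpha_n:.6f}")
```

The rough formula needs only one evaluation of the approximation, while the bisection refines it; in this setting the rough value falls between the Bonferroni level α/k and the iterated one.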

6.1. Example

A one-sided trial has six multinormal endpoints, with correlation ρ = 0.8,

\[
H_1: \delta_1 = 0.7,\ \delta_2 = 0.9,\ \delta_3 = 0.6,\ \delta_4 = 1.2,\ \delta_5 = 0.3 \text{ and } \delta_6 = 0.8,
\]

and sample size n = 10. Using equation (2) and α_n = 0.0083, the power of the Bonferroni test is calculated to be 65.64 per cent, whereas the approximate power of the test adjusted for correlation using equation (1) is 75.59 per cent, since α_n = 0.016619.

Note, the true power of the Bonferroni test is 64.18 per cent and the true power for the test using approximation (1) is 73.75 per cent.

7. MULTIVARIATE NORMAL PROGRAMS

For the correlation structure defined by

\[
\rho_{ij} = \lambda_i \lambda_j \qquad (i \neq j;\ -1 < \lambda_i < 1)
\]

it is possible to reduce the multiple integral for

\[
\Pr\Bigl(\bigcap_{i=1}^{k} \{a_i \le X_i \le b_i\}\Bigr)
\]

to a univariate integral. Dunnett6 provides an algorithm with specified error bound for evaluating the integral in this case. Alternatively, a compact routine may be constructed using Gaussian quadrature, provided a set of weights and abscissae are available. The function D01BBF from the NAG library7 will provide these. The author's own listing is available on request.
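The author's listing is not reproduced in the paper. As a rough stand-in, the sketch below evaluates the univariate integral with Simpson's rule instead of Gaussian quadrature. It assumes the standard representation X_i = λ_i Z_0 + (1 - λ_i^2)^(1/2) Z_i with independent standard normal Z_0, Z_1, ..., Z_k, under which the probability is the integral over z of φ(z) times the product over i of Φ((b_i - λ_i z)/(1 - λ_i^2)^(1/2)) - Φ((a_i - λ_i z)/(1 - λ_i^2)^(1/2)). The function name and the integration settings are hypothetical.

```python
from math import erfc, exp, pi, sqrt
from statistics import NormalDist


def Phi(x):
    """Standard normal distribution function (erfc form, safe for infinite x)."""
    return 0.5 * erfc(-x / sqrt(2.0))


def product_structure_prob(a, b, lam, half_width=8.0, n=4000):
    """Pr(a_i <= X_i <= b_i for all i) when corr(X_i, X_j) = lam_i * lam_j.

    One-dimensional reduction integrated by Simpson's rule; the paper
    suggests Gaussian quadrature, so this is only a substitute.
    """
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        z = -half_width + j * h
        prod = 1.0
        for a_i, b_i, l_i in zip(a, b, lam):
            s = sqrt(1.0 - l_i * l_i)
            prod *= Phi((b_i - l_i * z) / s) - Phi((a_i - l_i * z) / s)
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * exp(-0.5 * z * z) / sqrt(2.0 * pi) * prod
    return total * h / 3.0


# Example of Section 2.1: two-sided, k = 4, pmin = 0.015, equal correlation 0.7,
# which is the product structure with every lam_i equal to sqrt(0.7).
k, p_min, rho = 4, 0.015, 0.7
b = NormalDist().inv_cdf(1.0 - p_min / 2.0)
p_true = 1.0 - product_structure_prob([-b] * k, [b] * k, [sqrt(rho)] * k)
p_indep = 1.0 - product_structure_prob([-b] * k, [b] * k, [0.0] * k)
print(f"adjusted p-value by numerical integration: {p_true:.5f}")
```

For a one-sided problem the lower limits can be passed as `float("-inf")`. Setting every λ_i to zero recovers the independent case, 1 - (1 - pmin)^k, which gives a simple check of the routine.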

8. DISCUSSION

When reporting a clinical trial with several endpoints we are faced with a dilemma. Do we summarize the results considered most important, or report all the p-values obtained and leave the reader to cope with the mass of information? The first option could lead to all kinds of biased conclusions and even the most honest report will be viewed with suspicion. The second allows the reader to draw their own conclusions, but may lead to confusion over differing p-values and what effect the multiple testing has had on the results.

This difficulty can be alleviated by publishing both the individual p-values and the adjusted familywise ones, so that the problem of increased type I errors no longer exists. In addition, the overall level of the trial can be quoted as the minimum adjusted p-value.

Alternatively, there are other methods available for analysing multiple endpoints. O'Brien8 compares five procedures, one of which is the standard Bonferroni and another a global test statistic. A global test results in an overall probability for a trial and therefore eliminates the problem of summarizing several p-values. Pocock et al.1 elaborated on O'Brien's test and examined its performance for non-normal situations. It was then expanded by Tang et al.9 to include interim analyses; they also discussed the fact that the test can lead to a substantial reduction in sample size compared with existing univariate methods. However, this is also true when applying a Bonferroni-type procedure, especially if an adjustment for correlation is used.

The effectiveness of a particular method will depend on the form of the alternative hypothesis. If all the endpoints are expected to have standardized differences of about the same order, the global test is usually the most powerful. However, it is also very sensitive to changes in the alternative hypothesis. Consider the design of a one-sided trial with 7 endpoints and equal correlation, ρ = 0.7, such that

\[
H_0: \delta_i = 0, \quad i = 1, \ldots, 7,
\]

and

\[
H_1: \delta_1 = 0.8,\ \delta_2 = 0.95,\ \delta_3 = 0.9,\ \delta_4 = 1.0,\ \delta_5 = 0.85,\ \delta_6 = 0.7,\ \delta_7 = 0.75.
\]

Suppose trial organizers decide to use O'Brien's test because the standardized differences are around the same order. The sample size is set at n = 15 since this gives 85.4 per cent power of detecting H1, which is considered adequate. Given this sample size the usual Bonferroni procedure would have 78.4 per cent power and the test using approximation (1) would have 84.3 per cent power.

Next, suppose the trial organizers have been a little over-optimistic, and the real situation is such that H1 is true with the exception that two of the standardized differences are only 0.1. Given this new alternative hypothesis the global test now has only 67.9 per cent power, whereas the others have hardly changed: the Bonferroni procedure has 77.3 per cent power and the approximation now has 83.3 per cent.

It is clear from the above example that no one method provides a panacea; indeed, O'Brien also highly recommends a new non-parametric rank-sum test. Further work is needed to ascertain under which conditions the various methods should be applied. Nonetheless, using the adjustment for correlation will certainly weaken one of the main arguments against a Bonferroni-type procedure.

In addition, the adjustment still allows inferences to be made about an individual hypothesis, whereas the alternative methods previously described only give a test of the overall null hypothesis without providing detailed inferences. Hochberg10 describes an extended version of Simes' modified Bonferroni procedure which makes statements on individual hypotheses. However, there is no mention of the level of effect due to correlation.

This paper has mainly been concerned with the methodology for testing multiple endpoints. However, estimates of the treatment effect for each endpoint can be calculated using the familiar univariate confidence intervals with α replaced by α_n, the nominal significance level described in Section 6. The problem of estimation when several endpoints are included in a trial has been aired by various authors. Geller and Pocock11 question whether univariate intervals should be employed at all, or whether multidimensional ellipsoids should be constructed. Obviously further work is needed in this area since, as yet, no solution is forthcoming.

A further concern of this paper has been the approximation of multinormal probabilities. For practical usage, such as the one described, it appears to work adequately. Nevertheless, further investigation needs to be done to find an error bound for the approximation.

APPENDIX I: APPROXIMATE MULTINORMAL PROBABILITIES

If X_i (i = 1, ..., k) are standardized multinormal with equal correlation ρ, it is possible to evaluate Pr(a_i ≤ X_i ≤ b_i for all i). However, normally this involves complex procedures which require the use of a computer. The following section describes briefly how to approximate the above probability.

Define

\[
F(\rho) = \Pr\Bigl(\bigcap_{i=1}^{k} \{a_i \le X_i \le b_i\}\Bigr).
\]

This is a multiple integral of order k. It is possible, by expressing X_1, ..., X_k in terms of k + 1 independent standardized normal variates, to reduce F(ρ) to a single integral; see Dunnett and Sobel12 or Curnow and Dunnett,13 such that,

[single-integral expression illegible in the source]

Let

[definitions of D_1, ..., D_4 for general limits a_i, b_i illegible in the source; the surviving fragments show that they are piecewise, taking the value 0 'otherwise']

and

\[
G(k) = \int_{-\infty}^{\infty} \Phi(z)^{k-2}\phi(z)^2\,dz,
\]

which is tabulated below. The probability is approximated using

\[
F(\rho) = c_1 + c_2(1 - \rho)^{1/2} + c_3\rho + c_4\rho^2.
\]

Thus

\[
F(\rho) = D_1(1 - \rho^2) + D_2\rho^2 + D_3\rho(1 - \rho) + D_4\bigl(2 - 2(1 - \rho)^{1/2} - \rho - \rho^2\bigr). \tag{2}
\]

Note, it is possible to improve the approximation by adding more derivatives. Appendix II contains formulae for the extended approximation which includes the second derivative at zero.

G(k) can be obtained from the following table:

  k    G(k)              k    G(k)
  2    0.28209479       12    0.01234263
  3    0.14104740       13    0.01069224
  4    0.08578128       14    0.00935924
  5    0.05814822       15    0.00826625
  6    0.04224021       16    0.00735830
  7    0.03219472       17    0.00659537
  8    0.02542143       18    0.00594782
  9    0.02062518       19    0.00539322
 10    0.01709725       20    0.00491441
 11    0.01442215

Note that k(k - 1)G(k) is the expected value of the largest in a sample of k independent observations of a standard normal random variable.
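For values of k outside the table, G(k) can be obtained numerically. The following sketch integrates Φ(z)^(k-2) φ(z)^2 by Simpson's rule; the function name, the integration range and the number of intervals are assumptions chosen for illustration. Two closed forms help to check it: G(2) = 1/(2√π), and G(3) = G(2)/2 by symmetry.

```python
from math import erf, exp, pi, sqrt


def G(k, half_width=10.0, n=4000):
    """Integral of Phi(z)^(k-2) * phi(z)^2 over the real line (Simpson's rule)."""
    h = 2.0 * half_width / n
    total = 0.0
    for j in range(n + 1):
        z = -half_width + j * h
        dens = exp(-0.5 * z * z) / sqrt(2.0 * pi)   # phi(z)
        dist = 0.5 * (1.0 + erf(z / sqrt(2.0)))     # Phi(z)
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * dist ** (k - 2) * dens**2
    return total * h / 3.0


g2, g3, g5 = G(2), G(3), G(5)
expected_max_5 = 5 * 4 * g5  # expected largest of 5 standard normal observations
print(f"G(2) = {g2:.8f}, G(3) = {g3:.8f}, G(5) = {g5:.8f}")
print(f"E[max of 5] = {expected_max_5:.5f}")
```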

APPENDIX II

Let

\[
F(\rho) = c_1 + c_2(1 - \rho)^{1/2} + c_3\rho + c_4\rho^2 + c_5\rho^3.
\]

Therefore

\[
F(\rho) = D_1(1 - \rho^3) + D_2\rho^3 + D_3\rho(1 - \rho^2) + D_4\Bigl(2 - 2(1 - \rho)^{1/2} - \rho - \frac{\rho^2}{4} - \frac{3\rho^3}{4}\Bigr) + \frac{D_5}{2}\rho^2(1 - \rho),
\]

where b_i, a_i, D_1, D_2, D_3 and D_4 are specified as in Appendix I and D_5 = F''(0). Thus for the general probability Pr(a_i ≤ X_i ≤ b_i for all i),

[general expression for D_5 illegible in the source]

Note: To calculate padj use these simplified formulae:

(i) Two-sided case:

\[
D_5 = 2k(k - 1)(1 - p_{\min})^{k-2}\phi(b)^2 b^2.
\]

(ii) One-sided case:

\[
D_5 = \frac{k}{4}(k - 1)(k - 2)(k - 3)(1 - p_{\min})^{k-4}\phi(b)^4 - k(k - 1)(k - 2)(1 - p_{\min})^{k-3}\phi(b)^3 b + \frac{k}{2}(k - 1)(1 - p_{\min})^{k-2}\phi(b)^2 b^2.
\]

ACKNOWLEDGEMENTS

This work was supported by a SERC research award and by Boots the Chemist Plc, who also provided data for the paper. I thank Brian English and Peter Freeman for all their help and the two referees for their helpful comments.

REFERENCES

1. Pocock, S. J., Geller, N. L. and Tsiatis, A. A. 'The analysis of multiple endpoints in clinical trials', Biometrics, 43, 487-498 (1987).
2. Simes, R. J. 'An improved Bonferroni procedure for multiple tests of significance', Biometrika, 73, 751-754 (1986).
3. Armitage, P. and Parmar, M. 'Some approaches to the problem of multiplicity in clinical trials', Proceedings of the XIIIth International Biometric Conference, Biometric Society, Seattle, 1986.
4. Johnson, N. L. and Kotz, S. Distributions in Statistics: Continuous Multivariate Distributions, Volume 4, Wiley, 1972, pp. 43-83.
5. Winer, B. J. Statistical Principles in Experimental Design, 2nd Edition, McGraw-Hill, New York, 1971.
6. Dunnett, C. W. 'Multivariate normal probability integrals with product correlation structure', Applied Statistics, 38, 564-579 (1989).
7. Numerical Algorithms Group. NAG Fortran Library Manual, Mark 10, Numerical Algorithms Group, Oxford, 1983.
8. O'Brien, P. C. 'Procedures for comparing samples with multiple endpoints', Biometrics, 40, 1079-1087 (1984).
9. Tang, D., Gnecco, C. and Geller, N. L. 'Design of group sequential trials with multiple endpoints', Journal of the American Statistical Association, 84, 776-779 (1989).
10. Hochberg, Y. 'A sharper Bonferroni procedure for multiple tests of significance', Biometrika, 75, 800-802 (1988).
11. Geller, N. L. and Pocock, S. J. 'Interim analyses in randomized clinical trials: ramifications and guidelines for practitioners', Biometrics, 43, 213-223 (1987).
12. Dunnett, C. W. and Sobel, M. 'Approximations to the probability integral and certain percentage points of a multivariate analogue of Student's t-distribution', Biometrika, 42, 258-260 (1955).
13. Curnow, R. N. and Dunnett, C. W. 'The numerical evaluation of certain multivariate normal integrals', Annals of Mathematical Statistics, 33, 571-579 (1962).