Tests for Rank Correlation Coefficients: II E. C. Fieller - Solar MURI

Tests for Rank Correlation Coefficients: II. E. C. Fieller; E. S. Pearson. Biometrika, Vol. 48, No. 1/2. (Jun., 1961), pp. 29-40. Stable URL: http://links.jstor.org/sici?sici=0006-3444%28196106%2948%3A1%2F2%3C29%3ATFRCCI%3E2.0.CO%3B2-W Biometrika is currently published by Biometrika Trust.


Biometrika (1961), 48, 1 and 2, p. 29

Printed in Great Britain

Tests for rank correlation coefficients. II

BY E. C. FIELLER AND E. S. PEARSON

Statistical Advisory Unit, War Office, and University College London

(This paper must, I fear, be introduced with a short explanatory statement. Before his untimely death on 1 December 1960, Edgar Fieller had organized and seen completed all the heavy calculations for the distributions of the coefficient $r_F$ and its transform $z_F$. Publication was delayed because as far as he and I were aware there were no theoretical results for the mean and variance of $r_F$. However, since the empirical results establish clearly the utility of $z_F$ from the practical point of view, it seems now desirable to put these results on record, and to include in the same paper the new results for the variance of $r_S$. For the arrangement of the material and its discussion I must accept responsibility, but it should be stated that of the three authors of part I of this series it was Fieller who had from the beginning insisted on the inclusion of the $r_F$ and $z_F$ coefficients in the investigation. He was undoubtedly right in pressing this point. E.S.P.)

1. INTRODUCTION

In the first paper in this series, which will be described as part I (Fieller, Hartley & Pearson, 1957), 25,000 sets of correlated random normal deviates were used to examine the distributions of Spearman's and Kendall's rank correlation coefficients, $r_S$ and $r_K$, in samples of $n$ = 10, 30 and 50 observations. In particular, it was shown that, as in the case of the product moment correlation coefficient $r_{xy}$, R. A. Fisher's transformation

$$z = \tanh^{-1} r = \tfrac{1}{2}\log_e \frac{1+r}{1-r} \qquad (1)$$

had a remarkable normalizing property when applied both to $r_S$ and $r_K$. In this respect, and also as regards power of discrimination, it appeared that there was little to choose between these two coefficients. It was also pointed out that the conclusions reached would apply not only to rankings generated by sampling from a bivariate normal parent having correlation $\rho$, but to a much wider class of parental distributions which have the property of being convertible to the normal by monotonic transformations applied to the marginal distributions.
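Fisher's transformation is the inverse hyperbolic tangent of the coefficient, so it can be evaluated directly. The following is a minimal sketch; the function names `fisher_z` and `inverse_fisher_z` are illustrative and do not come from the paper.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's transformation z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    if not -1.0 < r < 1.0:
        # r = +1 or -1 would give an infinite z
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def inverse_fisher_z(z: float) -> float:
    """Back-transform a z value to the correlation scale."""
    return math.tanh(z)


if __name__ == "__main__":
    for r in (0.1, 0.5, 0.9):
        print(f"r = {r:.1f}  z = {fisher_z(r):.4f}")
```

The same function applies unchanged to any of the coefficients discussed here, since the transformation depends only on the numerical value of the coefficient.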

The present part II contains:

(a) A short discussion of the more exact results for the variance of Spearman's $r_S$, derived from the work of David & Mallows (1961).
(b) Results for the Fisher-Yates coefficient $r_F$, which had been incomplete in 1957.
(c) Some comparison of the distributions of the three transformed coefficients $z_S$, $z_K$ and $z_F$.
(d) A numerical illustration.

2. THE VARIANCE OF SPEARMAN'S COEFFICIENT $r_S$

The results given in Table 1 have been calculated from equation (X) for var($r_S$) given by David & Mallows (1961) on p. 22 of the preceding paper. The computation involved was carried out by Barbara Snow in the Department of Statistics, University College; as a check on arithmetic she also used the David & Mallows alternative formula (2) involving the expansion of $S_1 = \sin^{-1}\rho$ and $S_2 = \sin^{-1}\tfrac{1}{2}\rho$ in powers of $\rho$. From the computing point of view it is of interest to note that the results based on (X) and (2) agreed to the five decimal places tabled, except for the case $\rho$ = 0.9 where there were differences of 2 units ($n$ = 10) and 1 unit ($n$ = 30, 50) in the last place.


Table 1. Theoretical values of var($r_S$)

Since the $\alpha$, $\beta$ and $\gamma$ terms of (X) consist of expansions in powers of $S_1$ and $S_2$, the curtailment of these series at the 12th power must involve some inaccuracy. No precise estimate of this error is possible without further investigation. David & Mallows considered the cases of $\rho$ = 0.5 and 0.6 numerically. If we take $\rho$ = 0.9, it is found that it is the $\beta$-term which converges most slowly. For $n$ = 10, $\rho$ = 0.9 the full contributions to var($r_S$) from this term are:

From terms in:    $S^2$        $S^4$    $S^6$    $S^8$    $S^{10}$   $S^{12}$
                  +0.062465    9552     2635     1126     388        183      (all positive)

Without exploring further, one might hazard a guess that in this case the calculated variance may be too small by 2 or 3 units in the 4th decimal place. For $n$ = 30 and 50, the error should be no more than 1 or 2 units in the 5th place, even when $\rho$ = 0.9. With this caution, it is justifiable to table results to 5 decimal places throughout.

The figures for var($r_S$) in Table 1 may be compared with those given in Table 1 of part I. It will then be noted that:

(a) As expected, the Kendall approximation breaks down for high $\rho$ and low $n$.
(b) The results of our experimental sampling conform with the improved theoretical values.
(c) The smoothing of the experimental values, used in the further analysis, was on the whole successful in providing estimates of the true values.

With regard to (c), we had used these smoothed values to calculate approximations to the mean and variance of $z_S$ and $z_K$, obtained from

$$\mathcal{E}(z) = \tanh^{-1}\bar r + \frac{\bar r\,\mathrm{var}(r)}{(1-\bar r^2)^2}, \qquad (2)$$

$$\mathrm{var}(z) = \frac{\mathrm{var}(r)}{(1-\bar r^2)^2}. \qquad (3)$$

The substitution of the true value of var($r_S$) therefore makes no appreciable alteration in the figures given in Tables 2 and 3 of part I. It may therefore still be concluded that:

(i) The two terms of the expansion of equation (2) are adequate to give the expected value of $z_S$ to all necessary accuracy, if $n \ge 10$, $\rho < 0.9$.†

† The changes in var($r_S$) from the earlier smoothed values bring the observed and theoretical values of $\bar z_S$ closer together.


(ii) The single term of equation (3) is not adequate to give var($z_S$). For example we find:

var($z_S$) for                                   n = 10    n = 30    n = 50
From (3), using smoothed var($r_S$)              0.1224    0.0370    0.0219
From (3), using true var($r_S$)                  0.1216    0.0374    0.0221
From sampling experiment                         0.1487    0.0388    0.0228
From empirical estimate, 1.060/(n - 3)           0.1514    0.0392    0.0226

There appears to be no reason, therefore, to alter the purely empirical suggestion of part I, namely to take

$$\mathrm{var}(z_S) = \frac{1.060}{n-3} \quad\text{or}\quad \sigma_{z_S} = \frac{1.03}{\sqrt{(n-3)}}.$$
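As a quick arithmetical check on the empirical rule for Spearman's coefficient, the variance 1.060/(n - 3) can be evaluated for the three sample sizes used in the experiment. This is only a sketch; the function names are illustrative.

```python
import math


def var_z_spearman(n: int) -> float:
    """Empirical variance of z_S suggested in part I: 1.060 / (n - 3)."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    return 1.060 / (n - 3)


def sd_z_spearman(n: int) -> float:
    """Corresponding standard deviation, close to 1.03 / sqrt(n - 3)."""
    return math.sqrt(var_z_spearman(n))


if __name__ == "__main__":
    for n in (10, 30, 50):
        print(f"n = {n:2d}  var = {var_z_spearman(n):.4f}  sd = {sd_z_spearman(n):.4f}")
```

For n = 10, 30 and 50 the variances agree with the last row of the comparison above to the four decimals tabled.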

3. THE FISHER-YATES COEFFICIENT $r_F$ AND ITS TRANSFORM $z_F$

In the notation of part I, we have $n$ pairs of associated rankings

$$u_1, u_2, \ldots, u_n \quad\text{and}\quad v_1, v_2, \ldots, v_n,$$

where the integers $u_i$ ($i = 1, 2, \ldots, n$) may be taken in ascending order 1, 2, ..., $n$ and the $v_i$ are a permutation of these integers. If we then attach to both these rankings the appropriate normal order statistics $\xi(i|n)$, i.e. the expected values of the $i$th largest standardized deviate in a sample of size $n$ from a normal population, the Fisher-Yates coefficient (1938, pp. 14-15) is

$$r_F = \frac{\sum_{i=1}^{n} \xi(u_i|n)\,\xi(v_i|n)}{\sum_{i=1}^{n} \xi^2(i|n)}. \qquad (4)$$

Tables of $\xi(i|n)$ have been given in several publications (e.g. Fisher & Yates, 1938, Table XX; Pearson & Hartley, 1954, Table 28; Harter, 1961); Fisher & Yates (1938, Table XXI) give the expressions $\sum_i \xi^2(i|n)$ for $n$ = 2 (1) 50.
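Where tables of normal order statistics are not to hand, the scores can be obtained by numerical integration and the coefficient of equation (4) computed directly. The sketch below assumes Simpson's rule on a wide finite interval is accurate enough for illustration; the names `normal_score` and `fisher_yates_r` are hypothetical, and the scores are indexed from the smallest deviate upwards, which leaves $r_F$ unchanged because both rankings are scored in the same way.

```python
import math


def normal_score(i: int, n: int, lo: float = -10.0, hi: float = 10.0,
                 steps: int = 4000) -> float:
    """Expected value of the i-th smallest of n standard normal deviates.

    Evaluated by Simpson's rule on [lo, hi]; `steps` must be even.
    """
    coef = i * math.comb(n, i)  # equals n! / ((i-1)! (n-i)!)

    def integrand(x: float) -> float:
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
        sf = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - cdf, stable in the tail
        return x * pdf * cdf ** (i - 1) * sf ** (n - i)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return coef * total * h / 3.0


def fisher_yates_r(u: list, v: list) -> float:
    """Fisher-Yates coefficient: normal scores replace the ranks in u and v."""
    n = len(u)
    ranks = list(range(1, n + 1))
    if sorted(u) != ranks or sorted(v) != ranks:
        raise ValueError("u and v must each be a permutation of 1..n")
    xi = [normal_score(i, n) for i in ranks]
    numerator = sum(xi[a - 1] * xi[b - 1] for a, b in zip(u, v))
    denominator = sum(s * s for s in xi)
    return numerator / denominator


if __name__ == "__main__":
    u = [1, 2, 3, 4, 5, 6]
    v = [2, 1, 4, 3, 6, 5]  # hypothetical ranking, for illustration only
    print(round(fisher_yates_r(u, v), 4))
```

In practice the tabulated scores cited above would serve equally well; the integration is used here only to keep the example self-contained.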

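(3.2) Test of the hypothesis that $\rho = 0$

Whatever be the form of the parent distribution, it follows from Pitman (1937) that if the two variables are independent

$$\mu_2(r_F) = \frac{1}{n-1}, \qquad (5)$$

$$\beta_2(r_F) = \left\{ \frac{3(n-1)}{n+1} + \frac{(n-2)(n-3)}{n(n-1)(n+1)} \left(\frac{k_4}{k_2^2}\right)^2 \right\}, \qquad (6)$$

where $k_2$ and $k_4$ are the 2nd and 4th cumulants of the $\xi(i|n)$, i.e.

$$k_2 = \frac{1}{n-1}\sum_i \xi^2(i|n), \qquad k_4 = \frac{n}{(n-1)(n-2)(n-3)} \left\{ (n+1)\sum_i \xi^4(i|n) - \frac{3(n-1)}{n}\Bigl(\sum_i \xi^2(i|n)\Bigr)^2 \right\}.$$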

† As before, in taking averages, values for $\rho$ = 0.9 have been omitted.


The odd moments of $r_F$ are of course zero. The second term within the braces in (6) will tend rapidly to zero as $n$ increases. For example, when $n$ = 10 it is found that

$$k_4/k_2^2 = -0.4911,$$

so that the two terms inside the braces become 2.4545 and 0.0136, respectively. Thus the second term is effectively negligible even when $n$ = 10. As Pitman pointed out, $1/(n-1)$ and $3(n-1)/(n+1)$ are the variance and $\beta_2$ of the distribution

$$p(r) = \text{constant} \times (1-r^2)^{\frac{1}{2}(n-4)} \qquad (-1 < r < 1).$$

It follows that without much loss of accuracy, in testing whether $\rho$ differs from zero we may treat $r_F$ as though it were the product moment correlation, $r_{xy}$, of $n$ pairs of independently distributed normal variables. This result was implied, though not specifically stated, on pp. 14-15 of Fisher & Yates's Introduction (1938) and was again referred to by Hoeffding (1951, p. 88).
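The figures quoted for $n$ = 10 can be checked numerically. The sketch below computes the normal scores by Simpson's rule, forms $k_2$ and $k_4$ as Fisher's k-statistics of a set of scores with zero mean, and evaluates the two terms inside the braces. The form assumed for the second term, $(n-2)(n-3)/\{n(n-1)(n+1)\}\,(k_4/k_2^2)^2$, is an assumption inferred from the quoted values 2.4545 and 0.0136; all function names are illustrative.

```python
import math


def normal_score(i: int, n: int, lo: float = -10.0, hi: float = 10.0,
                 steps: int = 4000) -> float:
    """Expected value of the i-th smallest of n standard normal deviates."""
    coef = i * math.comb(n, i)

    def integrand(x: float) -> float:
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        cdf = 0.5 * math.erfc(-x / math.sqrt(2.0))
        sf = 0.5 * math.erfc(x / math.sqrt(2.0))
        return x * pdf * cdf ** (i - 1) * sf ** (n - i)

    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):  # Simpson weights 4, 2, 4, ...
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return coef * total * h / 3.0


def beta2_terms(n: int):
    """Return (k4 / k2^2, first term, second term) for the null beta_2 of r_F.

    The k-statistic formulas below assume the scores sum to zero.
    The second term uses a form inferred from the quoted numbers (assumption).
    """
    xi = [normal_score(i, n) for i in range(1, n + 1)]
    s2 = sum(x ** 2 for x in xi)
    s4 = sum(x ** 4 for x in xi)
    k2 = s2 / (n - 1)
    k4 = n / ((n - 1) * (n - 2) * (n - 3)) * ((n + 1) * s4 - 3 * (n - 1) / n * s2 ** 2)
    ratio = k4 / k2 ** 2
    first = 3 * (n - 1) / (n + 1)
    second = (n - 2) * (n - 3) / (n * (n - 1) * (n + 1)) * ratio ** 2
    return ratio, first, second


if __name__ == "__main__":
    ratio, first, second = beta2_terms(10)
    print(f"k4/k2^2 = {ratio:.4f}, terms = {first:.4f} and {second:.4f}")
```

Repeating the calculation for larger $n$ shows how quickly the second term falls away.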

When $\rho \ne 0$, the distributions of $r_{xy}$ and $r_F$ will not be as similar; for example

(3.3) The distributions of $r_F$ and their mean and variance

As in the case of Spearman and Kendall coefficients, values of $r_F$ defined by equation (4) were computed for each of the 2500 samples of 10, 833 samples of 30 and 500 samples of 50. The resulting distributions are not published here, but are available should any statistician wish to make use of them in exploratory work. Table 2 shows the mean and variance of each of the 27 distributions, $n$ = 10, 30, 50; $\rho$ = 0.1 (0.1) 0.9.

Table 2. Mean and variance of $r_F$. Experimental values

           n = 10                     n = 30                     n = 50
 ρ     Mean          Variance     Mean          Variance     Mean      Variance
0.1    0.0868        0.10639      0.0965        0.03278      0.0959    0.01887
0.2    0.1740        0.10269      0.1906        0.03285      0.1904    0.02143
0.3    0.2640        0.09833      0.2857        0.03032      0.2909    0.01866
0.4    0.3550        0.09018      0.3857        0.02506      0.3896    0.01512
0.5    0.4380        0.07778      0.4719        0.02152      0.4806    0.01228
0.6    0.5387        0.06063      0.5725        0.01603      0.5837    0.009582
0.7    0.6415        0.04221      0.6760        0.009859     0.6863    0.005465
0.8    [illegible]   0.02952      [illegible]   0.005240     0.7856    0.002807
0.9    [illegible]   0.01161      [illegible]   0.001954     0.8873    0.000896

As far as is known, the true moments of $r_F$ for any $\rho$ have not been found, although there is little doubt that with sufficient effort they could be obtained following the approach of David & Mallows (1961). We have therefore proceeded to an empirical examination of the distribution of $z_F$, each experimental value of $r_F$ being converted by the transformation of equation (1) and the results tabled.


Table 3. Comparison of observed and theoretical values for the mean and variance of $z_F$

Columns: (1) $\rho$; (2) smoothed $\bar r_F$; (3) smoothed var($r_F$); (4) var($r_F$)/$(1-\bar r_F^2)^2$; (5) experimental var($z_F$); (6) $\tanh^{-1}\bar r_F$; (7) $\bar r_F$ var($r_F$)/$(1-\bar r_F^2)^2$; (8) cols. (6) + (7); (9) experimental $\bar z_F$; (10) $\tanh^{-1}\rho$.

n = 10
(1)    (2)      (3)       (4)      (5)       (6)     (7)     (8)     (9)       (10)
0.1    0.0868   0.1076    0.109    0.1368    0.087   0.009   0.096   0.0966    0.1003
0.2    0.1743   0.1034    0.110    0.1370    0.176   0.019   0.195   0.1971    0.2027
0.3    0.2626   0.0980    0.113    0.1426    0.269   0.030   0.299   0.3041    0.3095
0.4    0.3518   0.0897    0.117    0.1482    0.367   0.041   0.408   0.4185    0.4236
0.5    0.4423   0.0773    0.119    0.1465    0.475   0.053   0.528   0.5268    0.5493
0.6    0.5346   0.0619    0.121    0.1410    0.596   0.065   0.661   0.6709    0.6931
0.7    0.6300   0.0450    0.124    0.1349*   0.741   0.078   0.819   0.8401*   0.8673
0.8    0.7320   0.0282    0.131    0.1543*   0.933   0.096   1.029   1.0290*   1.0986
0.9    0.8460   0.0121    0.150    0.1654*   1.242   0.127   1.369   1.3637*   1.4722

n = 30
(1)    (2)      (3)       (4)      (5)       (6)     (7)     (8)     (9)
0.1    0.0950   0.03370   0.0343   0.0360    0.095   0.003   0.099   0.1004
0.2    0.1902   0.03260   0.0351   0.0379    0.193   0.007   0.199   0.2001
0.3    0.2854   0.03025   0.0359   0.0384    0.294   0.010   0.304   0.3050
0.4    0.3810   0.02600   0.0356   0.0359    0.401   0.014   0.416   0.4193
0.5    0.4770   0.02125   0.0356   0.0382    0.519   0.017   0.536   0.5300
0.6    0.5739   0.01585   0.0352   0.0358    0.653   0.020   0.674   0.6706
0.7    0.6724   0.01000   0.0333   0.0331    0.815   0.022   0.837   0.8432
0.8    0.7740   0.00525   0.0327   0.0342    1.030   0.025   1.056   1.0625
0.9    0.8804   0.00195   0.0386   0.0397    1.377   0.034   1.411   1.4085

n = 50
(1)    (2)      (3)       (4)      (5)       (6)     (7)     (8)     (9)
0.1    0.0967   0.02035   0.0207   0.0199    0.097   0.002   0.099   0.0988
0.2    0.1935   0.01970   0.0213   0.0240    0.196   0.004   0.200   0.1969
0.3    0.2905   0.01825   0.0218   0.0228    0.299   0.006   0.305   0.3063
0.4    0.3880   0.01565   0.0217   0.0217    0.409   0.008   0.418   0.4198
0.5    0.4859   0.01255   0.0215   0.0216    0.531   0.010   0.541   0.5338
0.6    0.5845   0.00910   0.0210   0.0221    0.669   0.012   0.681   0.6809
0.7    0.6841   0.00565   0.0200   0.0204    0.837   0.014   0.851   0.8541
0.8    0.7852   0.00281   0.0191   0.0195    1.059   0.015   1.074   1.0746
0.9    0.8890   0.00090   0.0205   0.0198    1.417   0.018   1.435   1.4264

* For $n$ = 10, the few infinite values of $z$ have been ignored for $\rho$ = 0.7, 0.8 and 0.9.


To improve the approximations, (2) and (3), to $\mathcal{E}(z_F)$ and var($z_F$), the observed values of $\bar r_F$ and var($r_F$) were smoothed. This was done graphically, making use of the known values of these statistics corresponding to the terminal values $\rho$ = 0 and 1. The results are given in cols. (2) and (3) of Table 3.
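The approximations (2) and (3) are easily applied once a mean and variance of the coefficient are supplied. The sketch below evaluates them for the smoothed values read from Table 3 at $n$ = 10, $\rho$ = 0.5 (mean 0.4423, variance 0.0773); the function names are illustrative.

```python
import math


def approx_mean_z(r_bar: float, var_r: float) -> float:
    """Two-term approximation (2): atanh(r_bar) + r_bar * var_r / (1 - r_bar^2)^2."""
    return math.atanh(r_bar) + r_bar * var_r / (1.0 - r_bar ** 2) ** 2


def approx_var_z(r_bar: float, var_r: float) -> float:
    """Single-term approximation (3): var_r / (1 - r_bar^2)^2."""
    return var_r / (1.0 - r_bar ** 2) ** 2


if __name__ == "__main__":
    r_bar, var_r = 0.4423, 0.0773  # smoothed values, n = 10, rho = 0.5
    print(f"approximate mean of z: {approx_mean_z(r_bar, var_r):.3f}")
    print(f"approximate variance of z: {approx_var_z(r_bar, var_r):.3f}")
```

The two results correspond to the entries in cols. (8) and (4) of Table 3 for that row.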

(3.4) The variance of $z_F$

Clearly the most important function of the z-transformation is to provide approximate normality and a variance as nearly as possible independent of the population $\rho$. We shall consider first the variance. It is clear from a comparison of cols. (4) and (5) of Table 3 that, as in the cases of $z_S$ and $z_K$ considered in part I, the approximation of equation (3) is far from adequate when $n$ = 10 and still gives too small values when $n$ = 30. As the true mean and variance of $r_F$ are at present undetermined, this is of no great consequence.

What is of much more importance is that the experimental values of var($z_F$) have an average of nearly $1/(n-3)$, i.e. the same value as is used for the transformed product moment correlation coefficient, $z_{xy}$. Thus we find:

Average of estimates of var($z_F$) for $\rho$ = 0.1-0.8     n = 10    n = 30    n = 50
Approx. (3) smoothed                                        0.118     0.0347    0.0209
Experiment                                                  0.143     0.0362    0.0215
1/(n - 3)                                                   0.143     0.0370    0.0213

Certainly the variance will not be exactly constant, but the sampling error obscures such trend with $\rho$ as is present. We may note that the variance of the product moment $z_{xy}$, as given by Gayen's (1951) amendment to the original expression of Fisher, takes the following values:

(3.5) The expectation of $z_F$

Column (9) of Table 3 gives the experimental values of $\bar z_F$; the standard errors are, approximately: for $n$ = 10, 0.008; for $n$ = 30, 50, 0.007. Cols. (6), (7) and (8) were obtained by inserting the smoothed values of $\bar r_F$ and var($r_F$) from cols. (2) and (3) into the approximation to $\mathcal{E}(z_F)$ of equation (2). It seems probable that this expression would be adequate were the true values of $\bar r_F$ and var($r_F$) known.

Because of the very small change in $\mathcal{E}(z_F \mid \rho, n)$ with $n$, a knowledge of this function is not in general required in using tests of significance. As, however, it might be needed in connexion with estimation, it was thought that a purely empirical formula (with no theoretical significance) would be worth recording. For simplicity this will need to be expressed directly in terms of $\rho$ and $n$ rather than the unknown $\bar r_F$ and var($r_F$).

Column (10) of the top section of Table 3 gives values of $\tanh^{-1}\rho$. It is seen that the experimental means lie between these values and $\tanh^{-1}\bar r_F$. This suggested that an expression of the form $\tanh^{-1}\rho'$, with $\rho' < \rho$, might be used. After several attempts, it was found that a good approximation could be obtained by putting $\rho' = \rho\{1 - 0.6/(n+8)\}$, i.e. taking

$$\mathcal{E}(z_F) = \tanh^{-1}\left[\rho\left\{1 - \frac{0.6}{n+8}\right\}\right].$$


Table 4 compares this approximation with the figures obtained previously by inserting smoothed values of $\bar r_F$ and var($r_F$) into equation (2). The agreement certainly appears adequate for practical use and is likely to hold to the same order of approximation for intermediate values of $\rho$ and $n$.
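The empirical rule $\rho' = \rho\{1 - 0.6/(n+8)\}$ is simple to evaluate. A minimal sketch follows, with an illustrative function name, checked against two experimental means read from col. (9) of Table 3.

```python
import math


def expected_z_fisher_yates(rho: float, n: int) -> float:
    """Empirical approximation atanh(rho') with rho' = rho * (1 - 0.6 / (n + 8))."""
    rho_adjusted = rho * (1.0 - 0.6 / (n + 8))
    return math.atanh(rho_adjusted)


if __name__ == "__main__":
    # Experimental means quoted in Table 3: 0.5268 (n = 10) and 1.4264 (n = 50).
    print(f"n = 10, rho = 0.5: {expected_z_fisher_yates(0.5, 10):.4f}")
    print(f"n = 50, rho = 0.9: {expected_z_fisher_yates(0.9, 50):.4f}")
```

Because the correction factor depends on $n$ only through $0.6/(n+8)$, the adjusted value approaches $\tanh^{-1}\rho$ as the sample size grows.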

Table 4. Comparison of (i) $\tanh^{-1}(\rho') = \tanh^{-1}[\rho\{1 - 0.6/(n+8)\}]$ and (ii) the approximation $\mathcal{E}(z_F) = \tanh^{-1}\bar r_F + \bar r_F\,\mathrm{var}(r_F)/(1-\bar r_F^2)^2$ (smoothed)

(3.6) Shape of the distribution of $z_F$

Values of the moment ratios $\sqrt{b_1}$ and $b_2$ were calculated for all distributions except the three containing infinite values of $z$ (i.e. $n$ = 10; $\rho$ = 0.7, 0.8, 0.9). They are shown in Tables 5 and 6, together with the corresponding values for $z_S$.† The large sample approximations to the standard errors of $\sqrt{b_1}$ and $b_2$ in sampling from a normal distribution, namely $\sqrt{(6/N)}$ and $\sqrt{(24/N)}$, give

N = 2500 for n = 10     N = 833 for n = 30     N = 500 for n = 50
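The large-sample standard errors follow directly from $\sqrt{(6/N)}$ and $\sqrt{(24/N)}$. The sketch below evaluates them for the three values of $N$ and also shows how the moment ratios themselves would be computed from a set of $z$ values; the function names are illustrative, and the moments are taken about the sample mean with divisor $N$, which is an assumption about the convention used.

```python
import math


def se_sqrt_b1(n_samples: int) -> float:
    """Large-sample standard error of sqrt(b1) under normality: sqrt(6 / N)."""
    return math.sqrt(6.0 / n_samples)


def se_b2(n_samples: int) -> float:
    """Large-sample standard error of b2 under normality: sqrt(24 / N)."""
    return math.sqrt(24.0 / n_samples)


def moment_ratios(values):
    """Return (sqrt_b1, b2) = (m3 / m2^1.5, m4 / m2^2) using divisor N."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


if __name__ == "__main__":
    for n_samples in (2500, 833, 500):
        print(f"N = {n_samples}: s.e. sqrt(b1) = {se_sqrt_b1(n_samples):.4f}, "
              f"s.e. b2 = {se_b2(n_samples):.4f}")
```

An observed $b_2$ exceeding 3 by several multiples of the second standard error would indicate leptokurtosis of the kind discussed in this section.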

Except perhaps in the case of samples of $n$ = 10, there is no consistent evidence of asymmetry in the distributions of either $z_S$ or $z_F$. On the other hand the distributions are clearly leptokurtic, with $\beta_2 > 3$. This effect seems to be appreciably the same for both $z_S$ and $z_F$ and is definitely rather greater than that for the transformed product moment coefficient $z_{xy}$. The $\beta_2$ values for the last, averaged for $\rho$ = 0.1-0.8, are shown at the bottom of Table 6.

To obtain some idea of the effect of this amount of kurtosis on tests of significance (based on the assumption of normality), we may suppose that the distribution of $z_F$ is represented by a symmetrical Pearson curve (a type VII or t-distribution) with the same $\beta_2$ values.

† (It appears that the values of these statistics were not completely calculated by E.C.F. for Kendall's coefficient. E.S.P.)


Table 5. Values of $\sqrt{b_1}$ for distributions of $z$

Columns: $z_S$ (experiment); $z_F$ (experiment); $z_{xy}$ for $n$ = 10 (theory)*

* From the Fisher-Gayen formula.

Table 6. Values of $b_2$ for distributions of $z$

Columns: $z_S$ (experiment); $z_F$ (experiment)

* For $n$ = 10, the mean is only for $\rho$ = 0.1-0.6.
† From the Fisher-Gayen formula.


Table 42 of Pearson & Hartley (1954) shows the following modification of standardized percentage points, according to $\beta_2$ (when $\beta_1$ = 0):

                                  5%      2.5%    1.0%    0.5%    0.25%†
Normal curve, $\beta_2$ = 3.0     1.64    1.96    2.33    2.58    2.81
Type VII,     $\beta_2$ = 3.2     1.64    1.97    2.37    2.65    2.91
Type VII,     $\beta_2$ = 3.4     1.64    1.98    2.40    2.71    3.00

The effect only becomes of importance at extreme significance levels and, presumably, has never been taken into account in the practical application of the $z_{xy}$ transformation in small samples.

Normal curves were fitted to the distributions and the values of $\chi^2$ obtained were of about the same average size as those given for $z_S$ and $z_K$ in Table 7 of part I.

In § 5 of part I brief consideration was given to the question of power. It was pointed out that the power of discrimination of any one of these alternative correlation measures would depend on the rapidity with which its sampling distributions 'drew clear' of one another as the population value of $\rho$ changed. It was suggested that a rough measure of local sensitivity would be given by the ratios $(\bar z_2 - \bar z_1)/\sqrt{(s_1^2 + s_2^2)}$ of

(a) the difference between pairs of consecutive entries for $\bar z_F$ in col. (9) of Table 3 (or corresponding values for $\bar z_S$ and $\bar z_K$ in part I), and

(b) the square root of the sum of the corresponding pair of variances from col. (5) of the same table.
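The ratio just defined can be formed for every pair of consecutive $\rho$ values in one pass. The sketch below applies it to the experimental means and variances of $z_F$ for $n$ = 30, entered as read from cols. (9) and (5) of Table 3; the function name is illustrative and the output is not claimed to reproduce any tabled figure.

```python
import math


def sensitivity_ratios(means, variances):
    """Local sensitivity (z2 - z1) / sqrt(s1^2 + s2^2) for consecutive entries."""
    if len(means) != len(variances):
        raise ValueError("means and variances must have the same length")
    return [
        (means[k + 1] - means[k]) / math.sqrt(variances[k] + variances[k + 1])
        for k in range(len(means) - 1)
    ]


# Experimental values for n = 30, rho = 0.1 (0.1) 0.9, as read from Table 3.
Z_MEANS_N30 = [0.1004, 0.2001, 0.3050, 0.4193, 0.5300, 0.6706, 0.8432, 1.0625, 1.4085]
Z_VARS_N30 = [0.0360, 0.0379, 0.0384, 0.0359, 0.0382, 0.0358, 0.0331, 0.0342, 0.0397]

if __name__ == "__main__":
    for k, ratio in enumerate(sensitivity_ratios(Z_MEANS_N30, Z_VARS_N30)):
        print(f"rho {0.1 * (k + 1):.1f} to {0.1 * (k + 2):.1f}: {ratio:.3f}")
```

Since the variances are nearly constant, the ratios mainly reflect how fast the mean of $z$ moves with $\rho$.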

Table 7. Sensitivity ratios $(\bar z_2 - \bar z_1)/\sqrt{(s_1^2 + s_2^2)}$ for $z_F$

Table 8 of part I compared these sensitivity ratios for the z transforms of the product moment and of Spearman's and Kendall's rank correlation coefficients. Table 7 gives similar ratios for the Fisher-Yates coefficient. On comparing the two tables it will be found that with very few exceptions, which are probably the result of sampling fluctuations,‡ the sensitivity ratio for $z_F$ lies between that for $z_{xy}$ and those for $z_S$ and $z_K$. In so far as the variances of the $z$'s are nearly constant, the sensitivity in detecting differences in $\rho$ greater than 0.1 will be found by adding two or more of the consecutive local ratios. The superiority of $z_F$ over $z_S$ and $z_K$ will then stand out further still. It follows from this evidence that for the population model considered a test based on the Fisher-Yates coefficient is more likely to detect differences in the population $\rho$, if they exist, than tests based on the other two rank coefficients. This empirical result confirms what would be expected on theoretical grounds.

† Results for 0.25% from a table in preparation by N. L. Johnson & E. Nixon.
‡ As in part I, the ratios are calculated from the unsmoothed experimental means and variances.

5. A NUMERICAL EXAMPLE

In Table 8 there are shown for each of four samples, $n_t$ pairs of associated rankings ($t$ = 1, 2, 3, 4), where $n_1$ = 15, $n_2$ = 12, $n_3$ = 19 and $n_4$ = 14. Assuming that the underlying ranked variables $x_{ti}$, $y_{ti}$ follow four bivariate normal distributions, are the results consistent with the hypothesis that these parent distributions have a common coefficient of correlation $\rho$? If the parent distributions are not bivariate normal, it is assumed that they each

Table 8. Comparison of four tests for heterogeneity of correlation

            Values of $v_i$
$i = u_i$   1st sample ($n_1$ = 15)   2nd sample ($n_2$ = 12)   3rd sample ($n_3$ = 19)   4th sample ($n_4$ = 14)

P 9 2 3 2 11 3 8 3 4 6 1 4 9 5 1 2 6 1 10 7 129 5 9 8 1; 8 6 8 9 13 5 18 5

10          14       10        4       13
11           6       11       19       10
12          15        7       11       12
13          12        -       16       11
14           3        -       17       14
15           8        -       15        -
16           -        -       13        -
17           -        -        7        -
18           -        -       14        -
19           -        -       12        -

$r_S$        0.211    0.622    0.583    0.894
$r_K$        0.143    0.424    0.368    0.714
$r_F$        0.164    0.552    0.518    0.875
$r_{xy}$     0.161    0.500    0.562    0.864

                                                 $\bar z$   $\chi^2$
$z_S$        0.214    0.729    0.666    1.444    0.743      8.36
$z_K$        0.144    0.453    0.387    0.896    0.454      7.04
$z_F$        0.166    0.621    0.573    1.354    0.659      8.36
$z_{xy}$     0.163    0.550    0.636    1.308    0.656      7.70


have the property of being convertible to the normal by monotonic transformations applied to their margins. We should then be testing whether the distributions, after transformation, had a common $\rho$.

We can now proceed by using any one of the three rank correlation coefficients $r_S$, $r_K$ or $r_F$, applying the z-transformation to the coefficients calculated for each sample and then determining

$$\chi^2 = \sum_{t=1}^{4} w_t (z_t - \bar z)^2 \quad\text{with } \nu = 3 \text{ degrees of freedom}, \qquad (9)$$

where

$$\bar z = \sum_{t=1}^{4} w_t z_t \Big/ \sum_{t=1}^{4} w_t \qquad (10)$$

and $w_t$ is the reciprocal of the variance appropriate to the type of coefficient chosen. That is to say, using the empirical results:

for Spearman's coefficient $w_t = (n_t - 3)/1.060$,

for Kendall's coefficient $w_t = (n_t - 4)/0.437$,

for the Fisher-Yates coefficient $w_t = n_t - 3$.

At the bottom of Table 8, the correlation coefficients are shown with their z-transforms and also the mean $\bar z$ of equation (10) and the $\chi^2$ values of equation (9). The rankings were in fact based on samples of 15, 12, 19 and 14 taken from Fieller, Lewis & Pearson's Tables of Correlated Normal Deviates (1955), the parent correlations being different, namely

$$\rho_1 = 0.1, \quad \rho_2 = 0.3, \quad \rho_3 = 0.6, \quad \rho_4 = 0.7.$$

We have also, therefore, shown the four values of $r_{xy}$ and their z-transforms and the resulting weighted mean $\bar z$ and $\chi^2$. For $\nu$ = 3 degrees of freedom, the 10% point of $\chi^2$ is at 6.25, the 5% at 7.81 and the 2.5% at 9.35. All four values are therefore on the borderline of significance. The $\chi^2$'s for the Spearman and Fisher-Yates coefficients have come out with the top value of 8.36, followed by that for the product moment coefficient with 7.70 and Kendall's coefficient with 7.04.
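The weighted mean of equation (10) and the $\chi^2$ of equation (9) can be recomputed from the z-transforms at the foot of Table 8. The sketch below does so for all four coefficients, using the weights stated in the text (and $n_t - 3$ for the product moment coefficient); the function and variable names are illustrative.

```python
SAMPLE_SIZES = [15, 12, 19, 14]

# z-transforms as read from the foot of Table 8.
Z_VALUES = {
    "spearman": [0.214, 0.729, 0.666, 1.444],
    "kendall": [0.144, 0.453, 0.387, 0.896],
    "fisher_yates": [0.166, 0.621, 0.573, 1.354],
    "product_moment": [0.163, 0.550, 0.636, 1.308],
}


def weights(kind: str, sizes):
    """Reciprocal variances of z for each type of coefficient."""
    if kind == "spearman":
        return [(n - 3) / 1.060 for n in sizes]
    if kind == "kendall":
        return [(n - 4) / 0.437 for n in sizes]
    if kind in ("fisher_yates", "product_moment"):
        return [float(n - 3) for n in sizes]
    raise ValueError(f"unknown coefficient type: {kind}")


def heterogeneity_chi2(z, w):
    """Return (weighted mean of z, chi-square with len(z) - 1 degrees of freedom)."""
    z_bar = sum(wt * zt for wt, zt in zip(w, z)) / sum(w)
    chi2 = sum(wt * (zt - z_bar) ** 2 for wt, zt in zip(w, z))
    return z_bar, chi2


if __name__ == "__main__":
    for kind, z in Z_VALUES.items():
        z_bar, chi2 = heterogeneity_chi2(z, weights(kind, SAMPLE_SIZES))
        print(f"{kind:15s} mean z = {z_bar:.3f}  chi2 = {chi2:.2f}")
```

Each $\chi^2$ is then referred to the percentage points for 3 degrees of freedom quoted above.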

It must, of course, be emphasized that the ordering of $\chi^2$ in a single set of samples will not necessarily agree with the order of efficiency with which the tests would be expected to detect differences in the $\rho$-values in the long run. For given differences in $\rho$, we should expect that $\chi^2$ based on $z_{xy}$ would on the average be greatest, the $\chi^2$ based on $z_F$ coming second.

From the point of view of computation, the coefficients $r_S$ and $r_K$ are obtained most easily while $r_{xy}$ involves the longest calculation; after this stage, the procedure is in all cases the same apart from the difference in weights.

The results discussed in the two papers of this series apply to rankings generated by sampling either from (a) a bivariate normal distribution, or (b) a class of bivariate distributions which are convertible to the normal form by monotonic transformations applied to the margins. In case (b), $\rho$ is the coefficient of correlation in the transformed, not the original distribution. The properties of the class under (b), which contains a wide range of skew bivariate forms, have not been extensively explored (but see Johnson, 1949).

On theoretical grounds it would be expected that when dealing with a bivariate normal parent, rank order tests based on $r_F$ would be more powerful than those based on $r_S$. The coefficient $r_K$ based on paired comparisons falls in a rather different category.


By making use of the z-transformation, all three coefficients are put into a simple comparable form. In all three cases the distributions of $z$ have variances nearly independent of $\rho$, are closely symmetrical but slightly leptokurtic.

The advantage of the Fisher-Yates coefficient $z_F$ over the other two lies in its somewhat greater power of discrimination, and in the fact that the sampling variance of $z_F$ seems to have nearly the same value as that of $z_{xy}$, i.e. $1/(n-3)$, an easy expression to remember. A disadvantage, which certainly carries some practical weight, is that a table or tables are needed to calculate $r_F$.† Thus the test does not fall within the class of quick methods which, to quote a description once used by Student, can be worked out in a railway train on the back of an envelope.

As in connection with part I, the authors have been greatly indebted to Miss M. U. Thomas, Mrs Esmé Hill and Mr T. Vickers for extensive help in computation.

DAVID, F. N. & MALLOWS, C. L. (1961). The variance of Spearman's 'rho' in normal samples. Biometrika, 48, 19-28.

FIELLER, E. C., HARTLEY, H. O. & PEARSON, E. S. (1957). Tests for rank correlation coefficients. I. Biometrika, 44, 470-81.

FIELLER, E. C., LEWIS, T. & PEARSON, E. S. (1955). Correlated random normal deviates. Tracts for Computers, no. XXXVI. Cambridge University Press.

FISHER, R. A. & YATES, F. (1938). Statistical Tables for Biological, Agricultural and Medical Research (5th edition, 1957). Edinburgh: Oliver and Boyd.

GAYEN, A. K. (1951). The frequency distribution of the product-moment correlation coefficient in random samples of any size drawn from non-normal universes. Biometrika, 38, 219-47.

HARTER, H. L. (1961). Expected values of normal order statistics. Biometrika, 48, 151-165.

HOEFFDING, W. (1951). Optimum non-parametric tests. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, pp. 83-92. University of California Press.

JOHNSON, N. L. (1949). Bivariate distributions based on simple transformation systems. Biometrika, 36, 297-304.

PEARSON, E. S. & HARTLEY, H. O. (1954). Biometrika Tables for Statisticians. Cambridge University Press.

PITMAN, E. J. G. (1937). Significance tests which may be applied to samples from any populations. II. The correlation coefficient. J.R. Statist. Soc. Suppl. 4, 225-32.

† A table is of course required in all cases to convert $r$ to $z$.