The Behavior of Some Standard Statistical Tests Under

7/28/2019 The Behavior of Some Standard Statistical Tests Under

http://slidepdf.com/reader/full/the-behavior-of-some-standard-statistical-tests-under 1/70

THE BEHAVIOR OF SOME STANDARD STATISTICAL TESTS UNDER

NON-STANDARD CONDITIONS

by

Harold Hotelling

University of North Carolina,

Chapel Hill

Contents

l.

2.. 3·

4.

5·

6.

PageIntroduction and summary 1

Geometry of the Student rat io and,projections on a sphere 4

Probabilities of large values of t in non-standard cases. 8

Samples from a Cauchy distribution 10

Accuracy of approximation . Inequalities 13

Exact probabilities for N=3.Cauchy distribution' 16

7. The double and one-sided exponential distributions 25

8. Student ratios from certain other distributions . 31

9. Conditions for approximation to Student distribution. 34

10. Correlated and heteroscedastic observations . Time series 39

11. Sample variances and correlations 48

12. -Inequalit ies for variance distributions

13. What can be done about non-standard conditions? .

52

58

.Bibliography 65

Research carried out in part under Office of Naval Research Project NR 042 031,Contract Nonr-855(06) and i ts predecessor contract. Reproduction is permittedfor any purpose of th e United States Government.



THE BEHAVIOR OF SOME STANDARD STATISTICAL TESTS

UNDER NON-STANDARD CONDITIONS*

Harold Hotelling

University of North Carolina

1. Introduction and Summary

In the burst of new stat is t ical developments that have followed the

/.

work of Sir Ronald Fisher in introducing and popularizing methods involving

exact probability distributions in place of the old approximations, questions

as to the effects of departures from the assumed normality, independence and

uniform variance have often been subordinated. I t i s true that much recogni-

tion has been given to the exis tence of ser ia l cor re la tion in time ser ies,

with the resultant vitiat ion of stat is t ical tests carried ou t in the absence

of due precautions, and to some other special situations, such as manifestly

unequal variances in l e a s t - s q u a ~ e problems. Wassily Hoeffding has established

some general considerations on the role of hypotheses in stat is t ical decisions

B!:T '. Also, there have been many stu die s o f distributions of the Student

rat io, the sample variance, variance ratio, and correlation coefficient In

samples from non-normal popUlations. C''$ome are c i : t ~ d ' a t the eng. of this

p a p e r . ~ . These efforts have encountered formidable mathematical difficul-

t ies , Btnce the distribution functions sought cannot usually, except in t r iv ial

cases, be easily speci fi ed or calculated in terms of familiar or t abula ted

* This research was carried out in part under Office of Naval ResearchProject NR 042 031, Contract Nonr-855(06) and i ts p r e d e c e s s o ~ contract .



,

2

functions. Because of these difficult ies, mathematics has in some such

studies been supplemented or replaced by experiment ' ~ 2 0 , 5 8 , 5 4 ~ ' ~ t c ~ 7 .) .

or, as in some important w o r ~ 1 .

led to approximations for which definite error bounds are not

a t hand.

Practical statist icians have tended to disregard non-normality, partly

for lack of an adequate body of mathematical theory to which an appeal can

be made, partly because they think i t is too much trouble, and partly because

of a hazy t radi ti on tha t a l l mathematical i l l s arising from non-normality will be

cured by sufficiently large numbers. This las t idea presumably stems from

central l imit theorems, or rumors or inaccurate recollections of them.

Central limit theorems have usually dealt only with linear functions

of a large number of variates, and under various assumptions have proved

convergence to normality as the number increases. For a large but fi xed

number th e approximation of the distribution to normality i s typically close

within a restricted por ti on of i ts range, but bad in the ta i ls . Yet i t is

the ta i ls that are used in tests of significance, and the statist ic used

for a tes t is seldom a linear funct ion o f the observations.I

The non-linear ti

sta t i s t ics t , F and r in common use can be shown ~ 9 , p. 2 6 6 ~ under

certain wide sets of assumptions to have distributions approaching the

normal form, though slowly in the ta i ls . But in the absence of a normal

d is tr ibu tion in the basic population, with independence, there is a lack of

convincing evidence that the errors in these approximations are less than

in the nineteenth-century methods now supplanted, on grounds o f g re at er

accuracy , by the new sta t i s t ics .



3

In ta i ls corresponding to small probabilit ies the behavior of t ,

r and various related s ta t is t ics may be examined with the help of a method

of tubes and spheres proposed in C10J, CllJ and ['12J, and applied by

Daisy Starkey in 1939 to periodogram analysis C57_7 j and by Bradley .. ,"

["4'.,5,6:::.,7,to,t,"'Fand multivar.iate T;! ,',,'" :' : : ' , ; ~ , . :':'.c .: ' : ' ~

, by E. S. Keeping C29J to testing

the f i t of exponential regressions, and by Siddiqui L-55,56 ~ to th e dis

tribution of a serial correlation coefficient.

The distribution of the Student-Fisher t , as used for testing devia

tions of means in samples from various non-normal populations, and in c er ta in

cases of observations of unequal variances and inter-correlations, will be

examined in th e next few sections, with special reference to large values of

t . New exact distributions of t will be found for a few cases. A l imit

for increasing t , with wide applicability, will be obtained for the rat io

of two probabilit ies of t being exceeded, one probabi li ty for the standard

normal theory with independence on which the tables in use ar e based, the

other per ta in ing to var ious cases of non-normality, dependence, and

heteroscedasticity. This l imit depends on the sample size, which as i t

increases provides a double limit for th e rat io. This double l imit, for

random samples from a non-normal population, is for many commonly considered

populations either ° or 1. After considering the accuracy of the f i r s t l imit

and exact distributions of t in c er ta in cases of small samples, we obtain

the conditions on the populat ion that the double limit be unity, a situation

favorable to the use of the standard table. The nature of the condi tions

leads to certain remarks on proposals to modify th e t - tes t for non-normality.



It

4

The marked effects of non-normality on th e distributions of sample

standard deviations and correlation coefficients will then be shown. The

paper will conclude with a discussion of th e dilemmas with which statis-

t icians are confronted by the anomolous behavior in many cases of

sta t is t ica l methods that have become standard, and of possible means of

escape from such ~ i f f i c u l t i e s .

2. Geometry of th e Student Rat io and Projections on a Sphere

To test th e deviation of the mean x of observations ~ , •.• , XNfrom a hypothetical value, which is here assumed without loss of essential

general ity to be zero, the sta t is t ic appropriate in case the observations

constitute a random sample from a normal population, i . e . , are independently

distributed with the same normal distribution, is

(2.1)_ 1

t = x N2/s ,

where s is th e positive square root of

S stands for summation over th e sample, x = Sx/N, and n = N - 1. We shall

continue to adhere to Fisher's convention that S shall denote summation

over a sample and ~ other kinds of summation.

In space of Cartesian coordinates Xl ' . . . , XN le t 0 be the origin

and X a random point whose coord ina te s a re the sample values. Let Q be the

angle, between 0 and inclusive, made by OX with the positive extension of

th e eqUiangular line, i .e . the line on which every point has a l l i t s co-

ordinates equal. I t will soon be shown that

(2.3) t = cot Q.



5

We deal only with population distributions specified everywhere by a

joint density function. This we cal l f(xl, x

2' ••• , ~ ) . On changing to

spher ical polar coordinates by means of the N equations xi = P ~ i ' where

••. , are functions of angular coordinates $1' ••• , $N-l and sat isfy

identically

(2.4) =1

+ ..• + 1,

while th e rad ius vector p is the positive square root of

2+ ~ ;

.'

the Jacobian by which f is multiplied is of the form

Thus the element of probability is

(2.6)

where

is th e element of area on the unit sphere.

Let th e random point X be projected onto the unit sphere by the

radius vector OX into the unique point I on th e same side of 0 as X.

The Cartesian coordinates of are ••• , £N'

(2.7 )

in accordance with (2.6).



6.

Any statist ic that is a continuous function of the N observations,

that is , of the coordinates throughout a region of the n-space,determines

through each point of this region what Fisher has called an i sosta t i s t ical

surface, on a l l poin ts o f which the sta t is t ic takes the same constant value.

I f , as in the cases of t , F and r , the sta t is t ic remains invariant when a l l

observations are multiplied by anyone positive constant, the i sosta t i s t ical

surface is a cone with vertex a t the origin, or a nappe of such a cone. The

probability that the statist ic l ies between specified values is the integral

of (2.1) over the portion of the unit sphere between i ts intersections with

the corresponding cones.

Since t is unaffected by the division of each Xi by p, and therefore

by i ts replacement by the corresponding we may rewrite (2.1) and (2.2) in1

terms of the ~ i ' s and then simplify with the help of (2.4). We also use

the notation

(2.8)

and obvain

1 1

t =a(N - a2)-2(N _ 1)2 •

Thus the locus on the unit sphere for which t is constant is also a locus

on which has the constan t value a. This locus is therefore a sub-sPhere

of intrinsic dimensionality N - 2 =n - 1, that of the original unit sphere

being n =N - 1.

The distance between 0 and the hyperplane (2.8) is a N - ~ , and also

equals cos Q. From these facts (2.3) easily follows.

The probability that t i s greater than a constant T, which we shall

write p(t > T), is the integral of (2.1) over a spherical cap. In the



7

case of independent observations a l l having the same central normal

distribution, a case for which we attaCh a star to each of the symbols f ,

D and P already introduced, we readily find

(2.10)

independently of S.

The constancy of probability density over the sphere in this case,

with the necessity that the integral over the whole sphere must in every case

be unity, provides a ready derivation of the (N - I)-dimensional "area" of

*..1a unit sphere, which is at once seen to be DN

, t h a t is

This is one of s evera l contributions of statistics to geometry arising in

connection with contributions of geometry to statistics.

The N-dimensional "volume" enclosed by such a sphere is found by

integrating with respect to r from 0 to 1 the product of (2.11) by rN-l,

and therefore equals

Evaluation of the probability of t being exceeded in this standard

normal case is equivalent to finding the area of a spherical cap, a problem

solved by Arbhimedes for the case N = 3. A geometrical solution in any

number of dimensions may be obtained by noticing that the subsphere of

N - 2 =n - 1 dimensions considered above for a particular value of t has



..

8

radius sin Q and that i t s " area" is proportional to the (n - l ) s t power of

this radius. By considering a 11 zone" of breadth dQ obtained by differ-

entiating (2.3), the Student distribution may be reached in the form obtained

by Fisher 1127 in 1926 by quite different reasoning and for a broader

class of uses,

The integral of this from T to 00 is the proba.bility P ( t > T). This must

of course be doubled to get the usual two-tailed proba.bility.

3. Probabilities of large values of t in non-standard cases.

The probabi li ty pet > T) when T > 0 i s the integral of the n-dimen

s ional densi ty DN(s) over a spherical cap. This cap is the locus of points

on the unit sphere in N-space whose distance from the equiangular l ine is

1

less than sin Q, where T = n2 cot Q and 0 S Q < in. Central to such a cap

is the point A on the unit sphere a t which a l l the Car tesian coordinates

_1..ar e equal and positive, and are therefore a l l equal to N 2 . The projected

densi ty (2.7) thus takes a t A the value

A similar situation holds a t the point At d i a m e ~ r i c a l l y opposite to A,

1

whose coordinates a?e a l l N-2.

We ccnsider the a p ~ r o x i m a t i o n to pet < T) consisting of the product

of the n-dimensiona.l area of the cap centered a t A by too d ~ s i t y a t A.



9

If corresponding to two different population distributions both

the DN

functions are continuous, the l imit of the ratio of these as T

increases, i f i t eXists, equals the l imit of the ratios of the probabilities

in the two cases of the same event t > T. This also equals the limit of

the ratio of the two probability densities of t when t increases, prOVided

that this las t l imit eXists, as L'Hosp it al 's rule shows.

We are particularly concerned with comparing pet > T) for

various populations with the probability of the same event for a normally

distributed population with zero mean and a fixed variance. Using a star

attached to P or D to identify this case of normality, we introduce for

each alternative population d ist ribu t ion for which the indicated limits

exist the following notation:

pet > T).. --- =

P (t > T)

in accordance with (2.10). By L'Hospital's rule,

lim

t c>·co

get)

*(t),

where get) is the probability density of t in the non-standard case and

g*(t) ( )s given by the standard t density 2.13 .

R = lim ~N ~ 00

We also put



10

I f RNi s less than unity, a standard one-tailed t - tes t of signi

ficance on a sufficiently stringent level, i .e . corresponding to a suffi-

ciently small probability of rejecting a null hypothesis because t exceeds

a standard level given in tables of th e Student distribution, will in

samples of N overstate th e probability of the Type I error of wrongfully

rejecting the null hypothesis. If ~ 1, th e probability of such an error

is understated. Such biases exist even for very large e ~ p l e s i f R ~ 1.

Negative values of t may be treated just as are positive ones above,

replacing A by the diametrically opposite poin t A ', with only t r ivial ly

different results. For symmetrical two-tailed t - tests , DN(A) is to be

replaced by i t s average with DN(A').

In random samples of N from a population with density f(x), i .e .

with independence, (3.1) takes the form

Substituting this in (3.2) yields

l.N -11 OJ N N- ,~ = 2(nN)2 ff<!N)]' 0 g( z17 z ~ dz

4. Samples from a Cauchy distribution

When th e Cauchy density

(4.1) f(x) -1 2 -1= n (1 + x )

is substituted in (3.5) th e integral may be reduced to a beta function by

2/( 2, f dubstitution of a new le t ter for z 1 + Z ) , and we in

(4.2) ~ = ( N / n ) ~ N r ( ~ N ) / r ( N ) .



II

For large samples we find with the help of Stirl ing 's formula that ~approximates

and thus approaches zero. For N = 2,3,4, the respective values of ~ are

.6366, .4134, .2703·

The fa ct that these are markedly less than unity points to substantial over..

estimation of probabilities of getting large values of t when a population

really of the Cauchy type is treated as i f i t were normal.

A desirable but diff icul t enterprise is to provide for these

approximations, which are l imits as t increases to inf inity , definite and

close upper and lower bounds for a fixed t . This can be done for samples

of any size from a Cauchy distribution by a method th at w ill be outlined

below, but the a lgeb ra is lengthy except for very small samples. The case

of th e double exponential, or Laplace,distrtbution is more tractable, as

will be seen later.

For N = 2 the exact distribution of t , which then equals

(xl+ x

2) / Ix

l- x21, is readily found directly by change of variables in

f(Xl

)f(X2) and integration. On the basis of the Cauchy population (4.1)

this yields

(4.4) 21 log (tt_+ll) dt.

2 i t

To prove this, we oeserve that It I has the same distribution as lui,

where

u = (x+y)/(x-y)

and the joint distribution of x and y is specified by



12.

-2 2 -1 2 - 1( l+x) ( l+y) dx dy.

Substituting

x =y(u+l) /(u- l) , dx = -2y (u_l)-2 du,

we have for the joint distribution of y and u,

In spite of a singularity a t the point u = 1, y =0 this may be integrated

With respect to y over a l l real values. The resul t , when t is substituted

for u, is ( 4 ~ 4 ) .The density given by (4.4;) is infinite for t = .:!: 1, but is elsewhere

continuous, and is integrable over any interval. Upon expending in a series

of powers of t -l

and integrating, i t 1s seen that th e probability that any

t > 1 should be exceeded in absolute value is given by the series

411 1

T( ' t + 2 3 + 2 5 + • • . ) ,3 t 5 t

* 1which converges even for t = 1, and then takes the same value '2 as the

corresponding probabi li ty for t = 1 based on a fundamentally normal population,

for which the distribution (2.13) becomes

dt

+ t2

)

( -1Expansion of 4.6) in powers of t and integration gives for the

probability corresponding to (4.5),

(4.6)

* co -2 2/The formula El

n = 8 used here may be derived by integrating

from 0 to the well known Fourier series sin x + (sin 3x) + (sin 5x)/5+ . . .=n/4



(4.7)

','

The ratio of (4.5) to (4.7), or of (4.4) to (4.6), approaches when t 2

increases, agreeing with th e result for N = 2 in (4.3), but exceeds this

limit for f inite values of t .

The 5 per cent point for t based on normal theory, i . e . , th e value

of t for which the probability (4.7 ) equals .05, is 12.706. When this

value of t is substituted in (4.5) th e resulting exact probability is found

to be .0218. The approximation obtained by multiplying R2 from (4.3) by

the originally chosen probability .05 is .0318.

The true 5 per cent point when the underlying population is of the

Cauchy form is 8.119.

When we go further out in the t a i l we may expect a better approxi-

mation in using the product of R2by the "normal theory" probabi li ty to

estima te the true "Cauchy" theory probability, and indeed this is soon

verified: the one per cent point from standard normal theory tables for

n =1, i .e . N = 2 in th e present case, is 63.657. The approximation to the

true Cauchy probability obtained on multiplying the chosen "normal theory'

probability .01 by R2is .006366. The true value given by (4.5) is .006364.

Thus the approximations by the R2method are rough a t th e five per cent

point, but qUite satisfactory at the one per cent point or beyond.

5. Accuracy of a p p r o x i m a t i o n ~ I n e q u a l i t i e s

Various inequalities may be obtained r el evan t t o the accuracy of the

approximations that must usually be used to est imate the probabilities

associated With standard statist ical tests under non-standard conditions.



14

For the Cauchy distribution and others similarly expressible by

rational functions, and some others, th e theorem that the geometric mean of .

posi tive quant i ties cannot exceed the arithmetic mean can be brought into

play. Thus when the n-dimensional density n(g) on th e unit sphere is

specified by substituting in (2.7) for f the product of N Cauchy functions

(4.1) with different arguments, the result ,

_N[CD 2 2 2 2 -1 N-lD ( g > = ~ Lrl + P gl ) ••. (1 + P gN 17 pdp,

in which = 1 and = a, can be shown to exceed D(A), in which each

in (5.1) is replaced by N-land which is given by (3.5). This readily

follows from the fact that the product within the square brackets in

(5.1) is the Nth power of th e geometric mean of these binomials, and is

therefore not greater than the Nth power of their mean 1 + p2/N. Since

th e density on th e sphere thus takes a minimum value a t A, the estimate

of the t a i l probability obtained by mult iplying the area of the polar cap

by the density at i ts center A, equivalent to mult iplying the ta i l probability

given by "normal theory' by the factors ~ given in (4.2) and (4.3), is

an underestimate. A numerical example of this is a t the end of Sec.4.

Another kind of inequality may be used to set upper bounds on the

difference between the probability densities on the sphere a t different

points, and from these may be derived others setting lower bounds. Consider

for example samples of three from a Cauchy population, and instead of

(5.1) use th e notation DOOO

' This will be changed to DIOO

i f gl is

changed to a new value while and g3 remain unchanged. In relation

to a new set of values g2', g3' and the old se t gl' g2' g3' we shall

use D with three subscripts, aach subscript equal to 0 i f the corresponding



15

argument is an old g, or to 1 i f i t is a new g'. Now

ro-N( 2 ,2)JIJ 2 ,2)( 2 2) ( 2 217-1 N+lD

100- D

OOO =n gl - Sl 1 + P Sl 1 + P Sl . . . 1 + P SN pdp.

o

This difference is of the same sign as S12- S22. The integrand on the right

is the product of that of DOOO

by p2/(1+p2s1,2). This last factor is less

-2than for a l l positive values of p. Thus we find

~ In the same way,

and

Combining these results gives

and



16

6. Exact probabi li ties for N =3: Cauchy distribution

For samples of three or more from non-normal populations the

direct method used for samples of two in Section 4 leads usual ly to almost

inextricable difficul t ies. Another method, particularly simple for finding

the por tion of the distribution of t pertaining to values of t greater

than n , will be applied in this section to samples of three from the

Cauchy distribution, and in the next section to samples from the double

exponential. We continue to use what may seem an odd dualism of notation,

with N for the sample value and n

=N-l, in dealing with the Student ratio,

partly in deference to d if fe ring traditions, but primatily because some

of the algebra is simpler in terms of the sample number and some in terms

of the number of degrees of freedom.

I t is easy to prove the following lemma, in which we continue

to use the notation xi for the ith observation, Si= pXi' (i=l, • . .N),

2P > 0, Ss = 1, a = SS:

1LEMMA: If a ~ n2", then a l l Si ~ 0. On the other hand, i f a ll

S. > 0, then a > 1.-If the f i rs t statement were not true, one or more of the

must be negative; then since the sum of the SIS is positive, there must

be a subset o f them, say Sl"" 'S , with 1 < r < N, such thatr -

The maximum possible value of this upper bound for a, subject to the1

last condition, is r2 . Since r is an integer less than N, this leads

1.to a < n2 , which contradicts the hypothesis. The second part of the

lemma follows from the relations



17

222 2a = (SS) = sg + 2Sg.S., Ss =1,

J

and, because a ll the SIS are non-negat ive, SSiS' > O.J -

The Cauchy distribution (4.1) generates on the unit sphere

in N-space the n-dimensional density

(6.1)

This may be decomposed into par t ia l fractions and integrated by ele-

mentary methods. The process is slightly more troublesome for even than

for odd values of N because of the infinite l imit, but the integral

converges to a f ini te positive value for every positive integral N.

For N =3 the expression under the integral sign equals

where

(6.2)

and a2

and a3

may be obtained from this by cyclic permutations.

00r 2 2 -1Jo

(1 + P s) dp = ~ / lsi,

we find

Since

When this sum is evaluated with the help of (6.2) and reduced to a single

fraction, both numerator and denominator equal determinants of the form

known as alternants, and their quotient is a symmetric function of the

form called by Muir a bialternantj this gives a form of (6.1) for a l l odd



18

values of N. In the present case we obtain after cance llat ion of the

common factors,

(6.4)

as the density on an ordinary sphere. This density is continuous excepting

at the six s ingular points where two observations, and hence two g's , are

zero. At each of these six points, t =

We shall use on the unit sphere polar coordinates Q, ¢ with the

equiangular line as polar axis,and

~ the anglebetween

this axisand a

random point having the density (6.4). For samples of th ree , n = 2, and

so we have from (2.3),

(6.5)1.

t = 22 cot Q.

The element of area on the sphere is sin Q dQ d¢. When this is multiplied

by the density (6.4), with the replaced by appropr ia te funct ions of

the polar coordinates, and then integrated with respect to ¢ from 0 to

2n, the result will be the element of probability for Q, which will be

transformed through (6.5) into that for t .

Let Yl' Y2' Y3 be rectangular coordinates of which the third is

measured along the old equiangular l ine. These are re la te d to the old

coordinates g by an orthogonal transformation

(6.6)i= 1, 2, 3.

The orthogonal ity of the transformation implies the following conditions,

in which the sums are with r es pect t o k from 1 to 3:

(6.7) ( j = 1,2).



19

The orthogonality also implies that

(6.8)

2On the sphere, whose equation in terms of the y 's is of course E Yk = 1,

the transformation to the polar coordinates may be written

(6.9) y = sin ¢1

sin Q, Y2 = cos ¢ sin Q, Y3 = cos Q.

The five conditions (6.7) on the six unspecified c i j ' s are suff icient

for the orthogonality of the transformation (6.6), so one degree of

freedom remains in choosing th e CiJ's. This degree of freedom permits

adding an arbitrary constant to ¢, thus changing at will the point on a

circle Q = constant from which ¢ is measured. In going around one of

these circles, ¢ always ranges over an interval of length 2n. This degree

of freedom will be used la ter to simplify a complicated expression by taking

c32

= O.

We shall now derive as a single integral and as a series tbe proba-

bil i ty p(t ~ T) that th e Student t should in a random sample of three

from the Cauchy distribution (4.1) exceed a number T which is i tse lf

greater than 2. We thus confine attention to values of t greater than 2,

~and these correspond to values of a greater than 22 , as is seen by

1putting N =3 and a = 22 in (2.9), which specifies a monotonic increasing

relation between a and t . Then from the lemma a t the beginning of this

section i t follows that a l l the observations and a l l th e e 's are positive

in th e samples with which we are now concerned, as well as in some others.

The positiveness of the S iS permits for the present purpose th e

discarding of the absolute value signs in (6.4). The resulting cubic

expression may, since

(6.10)



20

be written

(6,11)

2With the help of (6.10) and the relation, derived from (6.10) and Ss = 1,

(6.12)

the cubic (6.11) reduces to

With the new symbols

(6.14)

the transformation (6.6) can be rewritten

(6.15)

(6.16)

th e last equation holding because solving (6.6) for Y3 gives

1 1

Y3= 3-2 (s l + £2 + S3) = 3-2a.

From (6.14), (6.7) and (6.9) i t will be seen that

2 2 2 . 2I ;u

k= 0, I ;u

k= Y

1+ Y

2= Q

From (6.16), by reasoning analogous to that leading to (6.12), we find

(6.17)

From (6. 14) ,

(6.18)

where



In each term of kl, the f i rs t two factors may be eliminated with the

help of (6.8), which may also be written

( i , j = 1, 2, 3; i ~ j ) .

Thus k = -3k .1 3

(6.19)

The linear terms thus introduced cance l out in accordance with the second

of (6.7). The three remaining terms are each equal to k3

.

Likewise, k2= -3kl.

At this point a substantial simplification is introduced by using

the one available degree of freedom to make c32

=0, thereby making ~

and kl vanish. The orthogonal matrix in which c . . is the element in thel.J

ith row and jth column is now

1 1 1 -I6-2 _2-2 3-2

e1 1 1

6-2 2-2 3-2

1 - ~-2 x 6-2

° 3

Thus ko = - 2 - ~ 3-3/

2, ~ =O. From (6.18) we now find, with the help of

(6.9),

ulu2u3

= ko

(y13- 3Y1Y22) = ko Sin3g (sin

3¢ - 3 sin ¢ COs2¢)

= 3-3/2sin3 9 sin 3 ¢.

From (6.15), (6.16), (6.17), and (6.19) we find

~ 1 ~ 2 ~ 3 = (a/3 + u1)(a/3 + u2 )(a/3 +u3 )

= a3/2'7 - (a/6)sin2g -+ 3-'i

2 Sin3g: sin 3¢.



22

Substituting this in (6.13) and s impl ifying g ives the express ion that

appears in th e denominator of the density (6.4) in the form

This may be put into terms of t and ¢ alone by substituting in i t the

expressions, derivable when N = 3 from (2.9) and (6.5),

(6.20)

The result is

'e and this equals the conten t of the square bracket in (6.4) when a ll

th e are positive, as they are for t > 2.

In the probability element D 3 ( ~ ) sinQ dQ d¢, the factor sinQ dQ

may be replaced by I d cos Q Ifrom (6.20) that is , by 2 (2 + t2) -3/

2dt.

Substituting (6.21) in (6.4) and multiplying by th e element of area as

thus transformed we have as the element of probability

The probability element for t when i t is greater than 2 will be

found by integrating (6.22)with respect to ¢ from 0 to We put

and observe that

w = 4t3

- 3t,

w - 1 = ( t - 1) (2t +1)2,2

w + 1 = ( t +1) (2t - 1) •



23

From the f i rs t of these relations i t fol lows, s ince ~ ~ 2, that w ~ 25,

and therefore that the integral

(6.25) J =

i

2

"(W - sin 31W

1

d¢

exists. If the substitution ¢, =3¢ is made in this integral, a factor

1/3 is introduced by the differential, but the range of integration is

tripled, and we have

12Jr

-1J = 0 (w - sin ¢, ) d¢' •

This is evaluated by a well known device. For the complex variablei¢'

Z = e , t he path of integration runs counterclockwise along the unit

circle C around 0, and d¢' = dZ/(iz). Since sin ¢, = (z - z-1)/(2i), we

find

Poles ar e at

of which only the la tte r is with in the unit circle. Hence

Substitution in this from (6.24) yields a function of t alone. When this

is multiplied by the facto rs of (6.22) no t included in the integrand of (6.25)

the element of probability g(t) dt is obtained, where

(6.2<5 ) ( t > 2)- .

A check on the reasoning of this whole section is provided by dividing

this expression for g(t) by the corresponding density of t as given by

normal theory, that is , by g*(t) as given by (2.13), and letting t increase.



24

3/2 -1The limit, 3 (4n) , agrees with ~ as given by (4.2), in accordance

with (3.3).

The Laurent series obtained by expanding (6.25) in inverse powers of t

is

and converges uniformly for a l l values of t ~ 2. For any such value i t may

therefore be integra ted to infinity term by term. After doubling to include

the like probabil ity for negative values from th e synmetric distribution

this gives

e a rapidly convergent series.;

Thisprob.ability also ava ilable i n closed form. Indeed, · t h e ~ /

integral of (6.26) may be evaluated after rationalizing the integrand by

means of the substitution t = (u2+ 1)/(u

2_ 1). The indefinite integral then

becomes

1. -1 1. -1 _1.) . .32 (tan 32u - tan 3 2U + constant.

The two inverse tangents may be combined by the usual formula, and th e

probability may be expressed in various ways, of which the most compact

appears to be

(6.28)-1 rz_1. - l ( 2 ~ (

p( It I ~ T) = 1 - 6n arctan L3 2 T T - l ) ~ / , T ~ 2).

One consequence is that p( ItI ~ 2) is about .115.

This equation may be solved for T to obtain a "percentage pointl l

with specif ied probabil ity P of being exceeded in absolute value when th e

null hypothesis 1s t rue. We have thus the formula



25

which is true only i f the value of IT I obtained from i t is at least 2.

Taking in turn P as .05 and .01, we find from (6.29):

7. The double and one-sided exponential distributions

We now derive for random samples of N =n + 1 the probability

that t should exceed any value T ~ n, when the populat ion sampled has a

frequency function of either of the forms

(7.1)-kx

f(x) = ke for x ~ 0, =0 for x < 0;

where k is in each case a positive scale factor. Since t is invariant

under changes of scale, k does not ente r in to i ts distribution and we

shall take k = 1 without any loss of generality. Thus in the following

we shall use the simpler frequency functions

flex) = e-

x

for x~

0, =0 for x < 0;

(7.4) f2(X) = ~ - I x l for a ll real x;

and the results will be applicable without change to (7.1) and (7.2).

We are free also to interchange the signs ~ and >, and to say "positivell

for IInon-negative" without changing our probabilities. The limitation of

T and therefore t to values not less than n implies, in accordance with

the lemma at the beginning of Section 6, that a l l observations are non-

negative in the samples with which we are dealing .

Such samples are represented by points in the "octant" of the

sample space for which a l l coordinates are positive, and in this part of

the space the density, which is the product of N values of the frequency



26

-Sx () -N -Sxfunction for independent variates, is e for 7.3 and 2 e "; for

(7.4). Both these densities are constant over each n-dimensional simplex

(generalized triangle or te trahedron) defined, for c > 0, by an equation

of the form

and the N inequalities xi ~ 0. Since this definition is unchanged by

permuting the XIS, the simplex is regular. I ts n-dimensional volume,

which we designate v , may be determined by noticing that this simplex

nis a face, which may be called the "base'" of an N-dimensional simplex

with vertices at the origin and a t th e N points where the coordinate

axes meet the hyperplane (7.5). These last N points are a l l at distance

c from the origin. A generalization, easily established inductively

by integration, of the ordinary formulae for the area of a triangle and

the volume of a pyramid, shows that the volume of an N-dimensional simplex

is the product of the n-dimensional area of any face, which may be desig-

nated the "base,"hy one-Nth of the length of a perpendicular ("altitude")

dropped upon this base from the opposite vertex. Since the perpendicular

distance from the o rig in t o the hyperplane (7.5) is• .!.

cN 2 , the volume

of th e N-dimensional simplex enclosed between this hyperplane and the

coordinate hyperplanes is v CN-3/2• But this N-dimensional volume mayn

also be computed by taking another face as base, with c as alt i tude,

evaluating the n-dimensional area of this base by the same method, and

so on through a l l lower dimensions. This gives c N / N ~ as the N-dimen-

sional volume. Equating, we find



27

The samples for which t exceeds T are represented by points with-

in a r ight c ir cu la r cone with vertex a t the origin, aXis on the equiangular

1

l ine, and semivertical angle Q, where T = n2cot Q. This cone inter-

sects the hyperplane (7.5) in a sphere whose radius we may cal l r . Then

1

since the d is tance o f th e hyperplane from the origin is CN-2, we find

1 ~ -1r =CN-2 tan Q = cn2N 2T • The n-dimensional volume enclosed by this

sphere is within the simplex and, in accordance with (2.12), equals

nr n2 / r ( ~ n + 1).

When the express ion for r is inserted here and the resul t is divided

by the volume v of the simplex, the result isn

(7.6)

,e

This is the ratio of the n-dimensional volume of the sphere to that of

the simplex, and is independent of c. I t is thus the same on a l l the

hyperplanes specified by (7.5) with positive values of c. Because

of the constancy of the density on each of these hyperplanes, (7.6) is

the probability that t ~ T when i t is given that the probability is unity

that a l l observations are positive, as is the case when the frequency

function (7.1) or (7.3) is the true one. But i f (7.2) or (7.4) holds,

(7.6) is only the conditional probability that t ~ T, and to give the

unconditional probability must be multiplied by the probability of the

-Ncondition, that is , by 2 .

If a two-tailed t - t e s t of the null hypothesis that the mean is

zero is applied and the double exponential (7.2) is the actual popula-

t ion dist ribution, the probability of a verdi ct o f s igni ficance because

of t exceeding T is double the foregoing, and therefore 2-n

times (7.6).



28

With a slight change in the la t ter through reducing the argument of the

gamma function, this gives

(7.7)

The obvious relation

(7.8)

e

which is a special case of the duplication formula for the gamma function,

makes i t possible to write (7.7) in the simpler form

Such probabilities of errors of the f i r s t kind in a two-tailed

t - test are illustrated in Table 1 for the values of T found in tables

based on normal theory corresponding to the frequently used .05, .01

and .001 probabilities. One of these probabilities appears at the top le f t

of each of the three main panels into which the table-- is divided, and

is to be compared with the exact probabilities in the second column of

the pane l, computed as indicated above for the values t* of T obtained

from a table of the Student distribution and often re ferre d to as "percentage

points" or "levels." I t will a t once be evident that the normal-theory

probabilities at the hea,ds of t he panel s are material ly greater than the

respective probabilities of the same events when the basic distribution of

the observed variate is of the double exponential type (7.2).

This tabl e a lso i l lustrates the comparison of the exact probabilities

with the approximations introduced in Section 3 for large values of T.

These approximations, presented in the: bottom row;' of each panel, are

obtained by multiplying the normal-theory probability at the head of the



29

panel by the expression ~ of (3.6) as adapted to the particular case.

For the double exponential (7.4) we find

(7.10)

This may, with the help of (7.8), be put in the simpler alternative form

( 7.11)

e

The table i l lustrates the manner in which the approximation slowly improves

as T increases for a fixed n but grows poorer when n increases and T re-

mains fixed.

TABLE 1

PROBABILITIES FOR t IN SAMPLES FROM THE DOUBLE EXPONENTIAL

Stars refer to normal theory.

Sample2 3 4 5 6

size N

P = .05: 12.706 4.303 3.182 2.776 2.571t .05*

p( It I > t .05

*):

Exact .039 35 .032 65 .031 051 . . . . . .Approximate .039 27 .030 23 .023 132 .017 656 .013 458

P = .01: 63.657 9·925 5.841 4.604 4.032

t .01*

p( It/ > t .ot) :

Exact .007 855 .006 138 .005 120 .004 715 . . .Approximate .007 854 .006 046 .004 626 .003 531 .002 692

P = .001 636.619 31.598 12.941 8.610 6.859

t . 001*

p(lt l > t .001

*):

Exact .000 799 .000 606 .000 471 .000 386 .000 337Approximate .000 785 .000 605 .000 463 .000 353 .000 269



30

The blank spaces in Table 1 correspond to cases in which T < n,

to which our formulae do not apply. Expressions of different analytical

forms could be, but have not been, derived for such cases.

All the probabilities in the body of Table 1 are less than the

normal-theory probabilities .05, .Dl and .001 a t the heads of the respect ive

columns, i l lustrating the greater concentr ati on of the new distribution

about i t s center in comparison with the familiar Student distribution.

As N increases this

N- 2

( N ~ 2 ) ,

~ + 2 n N+l~ = 2 N + 2factors a ll less than unity.he product of three

This concentration is further i l lustrated by the circumstance that ~ < 1

for a l l values of N. Indeed, R

2

= n/4 and R

3

= n/33/2are obviously less

than unity, and for a l l N,

approaches n/(2e) < 1, and R, the l imit of ~ , is zero.

Percentage points, the values that t has assigned probabilities

P of exceeding, can be found for the double exponential population by

solving (7.7) or (7.9) for T, provided th e value thus found is not less

than n. For samples of three (n = 2) the 5 07 point thus found iso

3 . 4 ~ and the 1 070point is 7.78. Each of these is between the corresponding

points for the normal and the Cauchy parent distributions, th e la t ter of

which were found a t the end of Sec. 6. These results are summarized in

Table 2, along with th e values of R3for the same populations, which

provide good approximations when t is very large. The degree of concen-

tration of t about zero is in th e same order by a l l three of the measures

in the columns of Table 2.



31

TABLE 2

PERCENTAGE POINTS OF t AND MULTIPLIERS OF SMALL EXTREME-TALL NORMAL-THEORY

PROBABILITIES FOR SAMPLES OF THREE

Population T. 05 T.OlR3

Normal 4.30 9.92 1.000

Double exponential 3.48 7.78 .785

Cauchy 2·95 3.69 .413

For a f ixed bas ic distribution and a fixed probability of T being

exceeded by t , T is a functi on of n alone. The behavior of this function as

n increases depends on the basic distribution. If this is normal, a

percentage point T approaches a fixed value, th e corresponding percentage

point. of a standard normal distribution of unit variance. If the basic

distribution is the double exponential, i t is deducible from (7.9) with

.!.the help of Stir ling's formula that T = 0(n2 ) when P is fixed.

8. Student ratios ' from eertain other distributions

The distribution of the Student ratio in samples from the "rec-

tangular" distribution with density constant between two limits symmetric

about zero, and elsewhere zero, has been the obje ct o f considerable atten-

tion. The exact distribution for samples of two is easily found by a geo-

metrical method. Perlo 51.-7 in 1931 obtained the exact distribution for

samples of three in terms of elementary functions--not th e same analytic

function over th e whole range--again by geometry, but with such greatly

increased complexity attendant on the extension from two to three dimen-

sions as to discourage attempts to go on in this way to samples larger than

three. Rider ~ _ 7 gave the distribution for N = 2, and also, by enumeration



32

of a ll possibi l i t ies, investigated the distribution in samples of two and

of four from a discrete distribution with equal probabilit ies at point s uni -

formly spaced along an interval; the resul t presumably resembles that for

samples from th e continuous rectangular distribution.

The distribution of t is independent of the scale of the original

1variate, and we take this to have density 2 from -1 to 1, and zero out-

sid e th ese limits. In an N-dimensional sample space the density is then

-N2 within a cube of vertices whose coordinates are a l l t 1, and zero out-

side i t . The projection of the volume of this cube from the center onto

a surrounding sphere will obviously produce a concentration of density at

the points A and AI on the sphere where the coord inates a l l have equal values,

greater than a t the points of minimum density in the ratio of the length of

1

th e diagonal of the cube to an edge, that is , of N2 to 1. Substituting

the frequency function

(8.1)

(8.2)

f(x) = ~ for -1 < x < 1, =0 otherwise, in (3.6) gives

~ = ( n N / 4 ) ~ N / r ( ~ N + 1).

This increases rapidly with N and without bound; indeed, R N + 2 / ~ is nearly

ne, so R = 00 . The lowest values are

e

This provides a sharp contrast with the cases of the Cauchy and the

single and double exponential distributions, for which R = O. There seems

to be a real danger that a s tat is t ic ian applying the st anda rd t - tes t to a

sample of moderate size from a population which, unknown to him, is from a

rectangular population, will mistakenly declare the deviation in th e mean

significant. The opposite was true for the populat ions considered earlier;



33

that is , use of the standard normal-theory tables of t where th e Cauchy or

exponential distribution is the actual one, with central value zero the

hypothesis to be tested, will not lead to rejec tion of this hypothesis as

often as expected.

An interesting generalization of the rectangular distribution is th e

Pearson Type I I frequency function

2 p-lf(x) = c (1 - x ) ,p > 0, -1 < x < 1,

P

which reduces to the rectangular for p =1. Values of p greater than unity

determine frequency curves looking something like normal curves, but the in-

duced distribution of t is very different in i ts ta i ls . These are given

approximately for large t by the p roduct of the corresponding normal theory

probability by

~ = ~ / 2 L F { p ~ ) / r { p 1 7 N r { p N - N + l ) / r { P N - ~ N + l ) .This is a complicated function of which only an incomplete exploration has

been made; i t appears generally to increase rapidly with N, less rapidly

with p; wllether i t has an upper bound has not been determined. If i t ap-

proaches a definite l imit as N increases, the t a i l distribution of t has a

feature not yet encountered in other examples: th e probabilities of the

extreme ta i ls on this Type I I hypothesis , divided by their probabilities

on a normal hypothesis, may have a definite positive l imit as the sample

size incresases. Such l imits of ra t ios have in previous examples been

zero or infinity.

Values of p between °and 1 (not inclusive) yield U-shaped frequency



curves. The expression just given for ~ may then present difficul t ies

because of zero or negative values of the gamma functions. The unusual

nature of the function and the variety of possibi l i t ies connected with i t

appear particularly when studying large samples, as the application of

Stirl ing 's formula is not quite so straightforward when the arguments of

the gamma functions are not a l l positive; and the factor in square brackets

with the exponent Nmay be either greater or less than unity.

Skew distributions and distributions symmetric about values different

from zero generate distributions of t which can in many cases be studied by

th e methods used in this paper, particularly by means of ~ , which is bften

easy to compute and gives approximations good for sufficiently large values

4It of t , even with skew distributions. In ~ a r t i c u l a r , power functions of one

sided or two-sided t - tests can readily be studied in this way for numerous

non-normal populations.

9. Conditions for apprOXimation to Student distribution

We now derive conditions on probability density functions o fa wide

class, necessary and sufficient for the distribution of t in random samples

to converge to the Student distribution in th e special sense that R = 1; that

is , that

R = lim... R_= l'l'l=oo--N '

where P is the probability that t exceeds T in a random sample of N from

a distribution of density f(x), and P* is the probability of the same event

on th e hypothesis of a normal distribution with zero mean. A l ike result

~ will hold, with only t r ivia l and obvious modifications, for the limiting

form when t approaches - 00 . I t will be seen from th e results of this section



35

that the condi tions for this kind of approach to the Student distribution

are of a different character, and are far more restr ict ive t han those

treated in cen tr al l im it theorems for approach to a central normal dist r i -

bution, which is also approached by a Student distribution, with increasing

sample size within a fixed f ini te interval.

The probability density (frequency) function we suppose such that

the probability that x is positive i s i tself positive. In the oppos it e case

the t r iv ial variation mentioned above comes into play. We rewrite (3.6)

in the form

where

and consider only case s in which this last integra l ex is ts and can be ap-

proximated asymptotically by a method of Laplace, of which many accounts are

available, including those of D. V. Widder in The Laplace Transform and of

A. Erdelyi in ASymptotic Expansions. Direc t app li ca tion of this method re

quires that xf(x), and therefore

g(x) = log x + log f(x),

shall be twice differentiable and shall attain i t s maximum value for a

single positive value x of x; and that this be an ordinary maximum in theo

sense that the f i r s t derivative is zero and the second derivative i s nega-

tive a t this point. I t is further required that , for a l l positive x x ,o

the strong inequality g(x) < g(x ) shall hold. Variations of the methodo

may be applied when these conditions are somewhat relaxed; for instance,



when there are several equal maxima, the interval of integration may broken

into subintervals each satisfying the conditions just stated; but we shall

not in this paper consider any such variations.

Thus we deal with cases in which there is a uniquely determined

x > 0 for which g(x ) > g(x) when x x> 0, g' (x ) =0, g"(xo) < 0, and

o 000g(x) = g(x ) +l(x_X )2 g" (x ) + o(x-x )2 •

0 2000

Then (9.2) may be written

lJq=[00 eNg( x) x-1 dx = eNg( "0) [a;,tNg" ("0)(x-Xc)2+ No (x-"o)2x-i ldx

When f , g and their derivatives appear without explicit arguments the

argument x will be understood. With this convention, a new variable ofo

integration

1

Z = (_Ng")2 (x-x)o

leads to the form

IN = eNg JOO e_z2 /21 o(l /N)[Xo+ O ( N - ~ ) J dz

(_Ng")-2o

1(-Ng") -"2 •

An asymptotic approximation to IN' which by definition has a ratio to ~that approaches unity as N increases, is found here by le t t ing the lower

2/ ( _1..l imit tend to -00 and dropping the terms o(z N) and 0 N 2), which are small

in comparison with the terms to which they are added. This gives

(1.. -1 _1.. Ng

I - 21t ) 2 X ( - Ng") 2 eN 0



37

With the understanding that the argument is x in (9.3) and i tso

f i rs t and second derivatives, we find:

whence

g = log Xo+ log f , g'= x-

l+ f ' / f = 0,

o

eg=x f , x = - f / f ' ,

o 0

By st ir l ing 's formula, r(tN) is asymptotically equivalent to

(2n)'LrN_2)/g/(N-l)/2 e-(N-2)/2

N/2 ~ -1This con ta in s the factor (1 - 2/N) - 2 , which may be replaced bye.

In this way we obtain

Making both this substitution and (9.5) in (9.1) and simplifying gives

R_ 2t ( , , )- t -1(2 2g+l)N/2- ~ - -g Xo ne .

Since g" <°by assumption, i t is clear from (9,8) that the l imit

R of ~ as N increases is zero, a positive number, or infinite, according

2 2g+l i 1 th 1 t t th 't Since thiss ne s eSB an, equa 0 , or grea er an y.

2cr i t ical quantity is , by the f i rs t of (9.7), equal to 2ne(x f) , we have

o

the fol lowing result :

THEOREM I . For a frequency function f(x) satisfying the gene ral con

ditions for application of the Laplace aSymptotic approximation to xr(x)

a necessary and sUfficient condition that the corresponding distribution of

the Student ratio have the property that R is a positive constant is that

1~ f = (2ne(2 is the absolute maximum of xr(x), t aken only where x = xo' and

is an ordinary maximum.



In this we replace x and g' byo

If this condition is satisfied, then i t is obvious from (9.8) that

a further necessary and sufficient condition for R to be unity is that

.1. _.1. -1 222 (_gll) 2 XO = 1, that is , xogll + 2 =O.

the expressions for them in the second and third equations respectively in

(9.7) and simplify. The result is simply fit =O. Hence

THEOREM II . In order that R =1 for the distribution of the Student

ratio in random samples from a distriubtion of x of the kind just described,

i t is necessary and suff ic ien t that th e conditions of Theorem I hold and that

fll (xo) =O.

A graphic version of these conditions is as follows. Under the por-

t ion of the smooth frequency curve y = f(x) for which x is positive le t a

~ rectangle be inscribed, with two sides on the coordinate axes and the opposite

vertex at a point L on the curve. Let L be so chosen that th e rectangle has

the greatest possible area. Le t the tangent to the curve a t L meet the x-

axis a t a point M. Then th e tr iangle formed by L and M With th e origin 0

is isosceles, sinee th e first-order condition for maximization, th e second

equation of (9.6) or (9.7), sta te s th at th e subtangent a t L equals i ts

abscissa and is measured in the opposite direction.

The area of the isosceles tr iangle OLM equals that of the maximum rec-

tangle. The necessary and sufficient condition that R be a positive constant

1

is that this area have the same value ( 2 ~ e ) - 2 as for a normal curve sym-

metric about the y-axis. If this is satisfied, the further necessary and

sufficient condition that R = 1 is that L be a point of inflection of the

frequency curve for which x is negative.

I t is partiCUlarly to be noticed that these conditions, which indicate

th e circumstances under which the familiar Student distribution can be



39

tr us ted fo r large values of t and large samples, have nothing to do with

the behavior of the distribution of x for extremely large or extremely small

values of this , the observed variate, nor withi t s

moments, but are solely

concerned with certain relat ions of the frequency function and i t s f i rs t

and second derivatives a t points of inflection.

Proposals to lIcor rectl l t for non-normality by means of higher moments

of x (up to the seventh, according to one suggestion), encounter the diff i

culty that such moments reflect primarily the behavior of the distribution

for very large values, whereas the mainly relevant cri ter ia seem from the

results of this section to concern rather the vicinity of the points of in

flection, which for a normal distribution are at a distance of one standard

deviation from the center. I t is such intermediate or "shoulder" portions

of a frequency curve that seem chiefly to cal l for exploration when the

SUitability of the t tables fo ra part icu la r application is in question,

espec ia lly for stringent tests with large samples. Moments of high order,

even i f known exactly, do not appear to be at a l l sensitive to var ia tions of

frequency curves in th e shoulders. Moreoever, i f moments are est imated from

samples, their standard errors tend to increase rapidly with the order of the

moment. More efficient s tat is t ics for the purpose are evidently to be sought

by other methods.

10. Correlated and heteroscedastic observations. Time series

Much waswritten in the nineteenth century, principally by astronomers

and surveyers, about the problem of unequal variances of errors of observa

tion, and hence different weights, which ar e inversely proportional to the

variances. Much has been written in the twentieth century, largely by ec

nomists and others concerned with time series, about the problems raised by



40

th e la c k of independence o f o bs er va ti on s, p a r t i c u l a r l y by the tendency

fo r observations close to each o th e r in time o r space to be more a like

t ha n o b se rv at io ns remote from each other. The two problems are closely

re la te d, though t h i s f a c t i s seldom mentioned i n the two separate l i t e r a -

ture s . They combine into one problem where the o b s e rv a tio n s , o r th e e r r o r s

in them, have j o i n t l y a multivariate normal d i s t r i b u t i o n i n which both

correlations and unequal variances may be present. Problems o f estimating

or t e s t i n g means and regression c oe ffic ie nts may in such cases e a s i l y be

reduced by l i n e a r transformation to the standard forms in which the obser-

va ti on s a re independent with a uniform variance, provided only t h a t the

correlations and the r a t i o s of the variances are known, s in ce t he se para-

meters e nte r in to t he t ra n sf o rm a ti on .

Thus, i f a multivariate normal d i s t r i b u t i o n is assumed, the leading

methodological problem is the determination o f th is s e t o f parameters, which

c o l l e c t i v e l y are equivalent to the covariance matrix o r i ts inverse, a p a r t from

a common f a c t o r . The number o f independent parameters needed to reduce th e

problem to one of l i n e a r e s tima tio n of standard type i s l e s s by unity than

the t o t a l number of variances and correlations o f the N observations, and

th e re fo re equals N + ~ N ( N - l ) - 1 = ~ ( N - l ) ( N + 2 ) . Since t h i s exceeds the num-

b er o f o bse rv at io ns , often g r e a t l y , th e re i s no hope of obtaining reasonable

estimates from these N observations alone. Numerous expedients, none of

them universally s a t i s f a c t o r y , are available f or a rr iv in g a t values fo r

these parameters i n various applications. Thus astronomers add estimated

variances due to d i f f e r e n t sources, one of which, th e "personal equation,"

~ involves a sum of squares of estimated e r r o r s committed by an in d iv id u a l

observer. Fo r observations made at d i f f e r e n t times, a maintained assumption,



41

ostensibly a priori knowledge, that the observations or their errors con-

sti tute.a stationary stochastic process of a particular sort, often reduces

the number of independent parameters enough so that, with large aamples,

estimates can be made with small calculated standard error, but valid

only i f the maintained assumption is close to the real t ruth. Another ex

pedient, often in time series the most satisfactory of al l , is to adjust

the observations by means of concurrent variables, either supplementary

observations, such as temperature and barometric pressure in biological

experiments, or variables known a priori, such ·as the time as used in es

timating seasonal variation and secular t rend with the help of a number of

estimated parameters sufficiently small to leave an adequate number of de

grees of freedom for the effec ts pr inc ipally to be examined. The simplest

of a l l such expedients is merely to ignore any possible dif fe rences of vari

ances or of intercorrelations that may exist.

If methods of this kind leave something to be desired, the stat ist ician

using them may be well advised at the next stage, when estimating the ex

pected value common to a l l the observations, or that of a specified linear

function of a l l the observations, or testing a hypothesis that such a linear

function has a specified expected value, to modify or supplement methods

based on the Student distribution so as to b rin g ou t as clearly as possible

how the probability used is affected by variations in the covariance matrix.

In addition, when as here the maintained hypotheses about the ancillary pa

rameters of the exact model used are somewhat suspect, a general protective

measure, in the nature of insurance against too-frequent assertions that

part icular effec ts are signiticant, is simply to require for such a judgment

of s igni ficance an unusually small value of the probability of greater dis-



42

crepancy than that actually found for the sample. Thus instead of such

conventional values of this probability P as .05 and .01, i t may be rea-

sonable, where some small intercorrelations or dif ferences in variances in

the observations are suspected but are not well enough known to be used in

the calcu la tion , to disclaim any finding of significance unless P is below

some smaller value such as .001 or .0001. Such circumstances point to use

of the t -s tat is t ic , with a new distribution based on a distribution of ob-

servations which, though multivariate normal, differs from that usually

assumed in that the several observations may be intercorrelated and may

have unequal variances. Moreover, the ta i ls of the new t-distribution be

yond large values of It I are here of particular interest. The probabilities

of such ta ilS w ill now be approximated by methods s im ila r to those already

used in this paper when dealing with t in samples from non-normal distri-

butions.

Consider N observations, whose deviations from their respective ex-

pectations will be calledx l " " ' ~ ' having

a non-singularmultivariate

normal distribution whose density throughout an N-dimensional euclidean

space is

(10.1)

where ~ is the inverse of the covariance matrix and is symmetric, is

i ts determinant, and . .= Aj,. is the element in the i th row and j th column

o f ~ .Let the Student t be computed from these

X iS

by the formula (2.1),

just as i f were a scalar matrix as in familiar usage, wi th the object of

testing whether the expecta tions of the x's, assumed equal to each other,

were zero. Then, as betore",t='(N-l)"cotQWhere Q is the angle between the

equiangular line xl= •.•= ~ and the line through the origin 0 and the point



x whose coordinates are x l " " ' ~ ' When the points x are projected onto

the (N-l)-dimenSional unit sphere about 0, the density of the projected points

~ ( E l " " ' ~ N ) is given by (2.7), which for the part icu lar d ist r ibu tion (10.1)

becomes

(10.2) D N ( ~ ) = [ ' : (PS 1, .. · ,P5N

)pN

-ldP

= (2n)-N/2 IA /i J : ~ N . l e - t p 2 E E A 1 J ~ 1 ~ J 4p

When ~ = 1 (the identity matrix), the double sum in the last parenthesis

reduces to unity, since 512

+. . . + 5N2= 1, and in this case

The ratio of (10.2) to (10.3) is

(10.4)

and is the ratio of the density in our case to that in th e standard case

a t point S on the sphere. I f ' 5 moves to either of the points A, where a l l

5i= N - ~ , or AI, where a l l 5

i= - N - ~ , (10.4) approaches

this we cal l s i m p l y ~ . For large values of t , (10.5) approximates both the

ratio of the probabili ty densit ies and the ratio of the probabilities of a

~ greater value of t in our case to that in t he st anda rd case.



44

A small bias may here be mentioned in our earl ier method of approxi

mating the distribution of t in random s am pl es, a bias generally absent

from the applications in the present sec tion. The approximation is , as

was seen earl ier, equivalent to replacing the integral of the density over

a spherical cap centered at A or AI by the product of D(A) or D(A I ) respec

tively by the (N-l)-dimensional area of the cap. In the applications we have

considered to the distribution of t in random samples from non-normal popula

tions, A and AI are likely to be maximum or minimum points of the density on

the sphere, in which case the approximation will tend to be biased, with

the product of ~ by the probability in the s tandard case of t taking a

greater value tending in case of a maximum to overestimate slightly the pro

bability of this event and in case of a minimum to underestimate i t . In non

random sampling of the kind dealt with in this section, maxima and minima of

the density on the sphere can occur only at points that are also on latent

vectors of the quadratic form. A and AI will be such points only for a

special subset of the m a t r i c e s ~ . Apart from these special cases, the ap

proximation should usually be more accurate in a certain qualitative sense

with correlated and unequally variable observations than with random samples

from non-normal distributions.

The simplest and oldest case is that of heteroscedasticity with in

dependence. Here both the covariance matrix and i ts inverse are of diag

onal form. Let wl, ••• ,w

N

be the principal diagonal elements of while

a l l i ts other elements are zero. These w's are the reciprocals of the vari

ances, and are therefore the true weights of the observations. From (10.5),



(10.6)

= (geOmetric mean )N/2 < 1

arithmetic mean -

Hence, fo r suff ic ien tly large T,

(10.7) *( t > T) ~ P ( t > T),

where th e right-hand probability is that found i n famil ia r tables.

Moreover, the ratio of the tru e p robability to the standard one is seen

from (10.6) to approach zero i f the observations are replicated more and

more times with th e same unequal but correct weights, and with concurrent

appropriate decreases in the cri t ica l probability used. The equality signs

in (10.6) and (10.7) hold only i f the true weights are a l l equal.

In con tr as t t o th e foregoing cases are those of correlated normal vari-

ates with equal variances. The common variance may without affecting th e

distribution of t be taken as unity, and th e covariances then become the

correlations.We

consider two special cases.

A simple model useful in time series analysis has as the correlation

between the i th and j th observations pl i - j l , with p of course the serial cor-



46

From this we find

The determinant of is of course the reciprocal of that of the correlation

matrix, and therefore equals (1 - p2)-N+l. Substituting in (10.5) gives

(10.8)N'2

(2 .l. -N r , ( )-L_-17" ,.

RN= 1 - p )2(1 - p) . Ll + 2p 1 - p ~ _ •

If N increases this becomes infinite , in contrast with the case of in-

dependent observations of unequal variances, where th e limit was zero.

RN also becomes infinite as p approaches 1 with N fixed. But when p tends

to -1 with N fixed, RNvanishes.

As another example, consider N observations, each with correlation p

~ with each of the others, and a l l with equal variances. The correlation

matrix has a determinant equal to

which must be positive, implying that p > _(N_l)-l; and th e inve rse matrix

may be written

l+g g

g l+g

~ = (l_p)-l

g •

g .

g

g

where

g

g = -pLl + ( N _ l ) ~ l

g g l+g

A short calcu la t ion now gives

~ =I i + NP/(1_p11N... l )/2 .



47

As in the previous example, ~ becomes infinite i f N increases with p fixed,

or i f p tends to unity with N fixed. The possibility that p approaches -1

cannot here arise.

A further application of the methods of this section having some

importance is to situations in which the mean is in question of a normally

distributed set of observations whose correlations and relative variances

are known approximately but not exactly. An exact knowledge of these para-

meters would provide transformations of the observations ~ , ••. , ~ into new

* *ariates x l " " ' ~ ' independently and normally distributed with equal vari-

ance. If now t is computed from the starred quantities, i t will have ex-

actly the same distribution as in the standard case. But errors in the es-

timates used of the correlations and relative variances will generate errors

in the transformation obtained by means of them, and therefore in the dis-

tribution of t . The true and erroneous values of the parameters together

determine a matrix A which, with (10.4) or (10.5), gives indications re-

garding the distribution of t , from which approximate percentage points may

be obtained, wi th an accuracy depending on that of the original estimates

of the correlations and relative variances. In this way general theory and

independent observations leading to knowledge or estimates of these para-

meters may be used to improve the accuracy of the probabilities employed

in a t - tes t . If a confidence region of known probability is available for

the parameters, an interesting set of values of ~ corresponding to the

boundary of the confidence region might be considered.



48

11. Sample variances and correlations

The sensitiveness to non-normality of th e Student t - tes t is ,

as we have seen, chiefly a matter of the very non-standard forms

of tai ls distant from the center of the true t-distribution to an

extent increas ing with the sample number. Between fixed l imits , or

for fixed probabili ty levels, i t is easy to deduce from tDOw.n l imit

theorems of probabili ty that the Student distribution is s t i l l often

reasonably accurate for SUffic iently large samples. However these

theorems do not apply in a l l cases; for instance, they do not hold

i f the basic population has the Cauchy distribution; and unequal vari

ances and intercorrelations of observations are notorious sources of

fallacies. Moreover, the whole point in using the Student distribution

in st ead of the older and simpler procedure of a sc ribing the normal

distribution to the Student ratio is to improve the accuracy in a way

having importance only for small samples from a normal distribution

with independence and uniform variance. Deviations from these s tandard

conditions may well, for small samples with fixed probabili ty levels

such as .05, or for larger samples with m o r e ~ r i n g e n t levels,produce

much greater errors than the int roduct ion of the Student distribution

was designed to avoid. The l imit theorems are of l i t t l e use for small

samples. To substitute actual observations uncr it ica lly in the beauti

ful formulae produced by modern mathematical sta t is t ics , without an y

examination of the applicability of th e basic assumptions in the par

t icular ciecumstances, may be straining a t a gnat and swallowing a

camel.



49

Even the poor consolation provided by limit theorems in the

central part of some of the distributions of t is further weakened when

we pass to more complicated statist ics suCh as sample variances and

correlat ion coeffic ients . Here the moments of the statist ic in samples

from non-normal populations betray the huge errors t ha t a ri se so easily

when non-normality is neglected. Thus the familiar expression for th e

variance of the sample variance,

( l l . l )

~ h e n d±y1ded by' the squaredpopulattontvariahce' y i e ~ a s a q u o ~ i e n t'two ' a n d ' I ' ~ 4 h a J . P " . l ' t 1 . m . e s 8:8 great ' for a' normal as, fOZJl'a rectangular 'pop-

ulation, ·and has no general upper bound.

The asymptotic standard error of the correlat ion coeffic ient

(see, e.g. Kendall, vol. 1 , p. 211 ) reduces for a bivariate normal

population to the well known approximation

( l l . 2)2 2

(1 - p ) In ,

e

where p is th e correlation in the population. This may now be compared

with the asymptotic variance of the correlat ion coeffic ient between x

and y when the density o f poin ts representing these variates is uniform

within an ellipse in the xy-plane and zero outside i t . The popUlation

correlation p and the moments of the distribution are determined by the

shape of the ellipse and the inclinations of i t s axes to the coordinate

axes. On substituting the moments in the general asymptotic formula for

the s tandard error of the correlation coeffic ient r , the result reduces

to



50

2 22(1 - P ) /(3n),

only two-thirds of the s tandard express ion (11.2). I t is note-

worthy that in the special case p = 0, in thich the ellipse reduces to

a circle , the variance is only 2/<3n), differing from l /n , the exact

variance of r in samples from a bivariate normal distribution with

p = 0, and the asymptotic variance of r in samples from any population

in which there is independence between x and y, or even in which the

moments o f f ou rth and lower orders satisfy the conditions for indepen-

dence.

Fisher1s ingenious u s ~ s L14, Sec.327 of the transformation

(11.4) z = f (r) =! log (1+r ) - ! log (1- r ) , t = f ( p) ,

whose surprising accuracy has been verified in a study by Florence David

[ i ~ 7 , and which has since been studied and modified [ 2 ~owes i ts value primari ly to the properties that, for large and even

moderately large samples, z has a nearly normal distribution with

r -1mean close to ':I and variance n , apa r t from terms of higher order.

This transformation may be derived (though i t was not by Fisher) from

the c ri te ri on tha t z shall be such a function of r as to have the

-1asymptotic variance n , i n d e p e n d e n ~ l y of the parameter p. This

problem has been s tUdied L02§7, and an asymptotic series

for the solution has been found to have z .as i ts leading term; two

-1 -2additional terms, mUltiples respectively of nand n ,have been

calculated, with resultant modifications of z that presumably improve

somewhat the accuracy of the procedures using i t .



51

All uses of z, however, depend for the accuracy of their

probabilities on the applicability of the familiar formula (11.2)

for the var iance of r , which has been der ived only from assumptions

including a bivariate normal population, and even then, only as an

approximation; cf. /J§]. If the bivar ia te d ist ribut ion is

not normal, and in spite of this fact the var iance (11 .2) is used

e i ther d ir ec tly, or indirectly through the z-transformation, the

errors in th e final conclusions may be very substantial. This trans-

fonnation does not in any way atone for non-normality or any other

deviation from t he s tandard conditions assumed in i t s derivation.

A different transformation having the s ~ m e desirable asymp-

totic properties as Fisher 's may be obtained for cor re la tions in

random samples from a non-normal bivariate distribution of known form,

or even one whose moments of order not exceeding four are a l l known,

provided certain weak conditions are satisfied. In these cases the

_.1-asymptotic standard error rr of r is of order n 2 and may be found inrthe usual way as a function of p. If z = f ( r ) and = f(p), with f

a sufficiently regular increas ing function;" ' a time-honored method

of .getting approximate standard errors may be applied: Expand z -

in a series of powers of r - p; square, and take the mathematical ex-

pectation; neglect terms of the resulting series that are of order

-1

higher than the f i r s t in n , t hus ordinar ily retaining only the

f i r s t term, which emerges as asymptotical ly equivalent to the vari-

ance of r. A theorem justifying this procedure under suitable con

ditions of regUlar ity and boundedness is given by Cramer 19 _7, but



52

the technique has been in frequent use for more than a century with-

out the benefit of a careful proof such as Cramer's. In the present

case, after taking a square root, i t yields

rr - rr ( d ~ / d p ) .z r

1If now we put rr =n-2 , integrate, and impose the additional con

z

dition that f(O) = 0, we find:

(11.5 )

I f we wish to obtain a transformation resembling Fisher 's in being

independent of n, as well as in other respects, we may replace the

sign of asymptotic approximation in (11.5) by one o f equalit y.

SUbstituting in (11.5) the reciprocal of the square root of the

asymptotic variance (11.2) of r in samples from a normal distribution

leads to Fisher 's transformation (11.4). But for the b ivar ia te d is

tr ibution having uniform density in an ellipse we substitute from (11.3)

instead of (11.2), thus redefining z and by multiplying Fisher 's

1 -1values by {3/2)2, while retaining the approximate variance n •

12. Inequali ties for variance distributions

Light may often be thrown on distributions of statist ics by

means of inequalities determined by maximum and minimum values, even

where i t is excessively diff icul t to calculate exact distributions.

An example of this approach is provided by the following brief study

of the sample variance in certain non-normal populations. I t will

be seen that this method sometimes yie lds rel evan t information where



53

none can be obtained from the asymptotic standard error formula

because the fourth moment is infinite.

One family of populations presenting special interest for the

study of sample variances is specified by th e Pearson Type VII f re-

quency curves

(12.1)

where

(12.2)

(p ~ 1)

This family of symmetric-distributions may be said to interpolate

between the Cauchy and normal forms, which i t takes respectively

for p = 1 and p = 00 .

The even moments, so far as they exist , are given by

Putting k =1, 2 we fizld

( 1 ~ . 4 ) 2 2a = =a /(2p-3)

2i f P > 3/2,

4( - l( -13a 2p-3) 2p-5) i f P > 5/2.

We conside r the case in which the populat ion mean is known but

methods appropriate to a normal distribution are used in connection

with an estimate of a and tests of hypotheses concerning a. In a

random sample le t x1, .•• ,x

nbe t he dev ia tions from the known popula

tion mean; then n is both th e sample number and th e number of degrees222f freedomj and s = Sx /n is an unbiased estimate of the var iance a

in any population for which th e la t ter exists. But despite th is fact



and the similarity of the Type VII and normal curves, the accuracy

2 2of s as an estimate of ~ differs widely for the different populations,

even i f has the same value.

When the formula (11.1) for the variance (i .e. squared standard

error) of the

the result is

sample variance s2 is applied to a normal distribution,

42 ~ In, as is well known. But when the moments (12.4)

of the Type VII distribution are inserted in the same formula, the

result is

(p > 5/2).

In this , a2

may by (12.4) be replaced by ( 2 P _ 3 ) ~ 2 , yielding

22= 4 ~ 4 ( p _ l ) L t 2 P - 5 ) ~ 7 - 1s

2The variance of s is infinite when p ~ 5/2, for instance when p=l and

the Type VII take s the Cauchy form. When p > 5/2 the ratio of the

2variance of s in samples from a Type VII to that in samples from a

normal population of the same variance is equal to 1 + 3/(2p-5);

this may take any value greater than unity.

In studying the distributions of s le t us put22222u = ns = SX

i,(u ~ 0), Q = Slog (1 + xi /a ).

Then in the sample space u is the radius of a sphere whose (n-l)-

dimensional surface has , by (2.11), the"area"

Also, le t us denote by ~ ( u ) , q(u) and ~ ( u ) , or simply ~ , q and qM'

the minimum, mean and maximum respect ive ly of Q over the spherical



55

(n- l) -d imensional surface of rad ius u. Thus

(12.6) ~ ( u ) < q(u) < ~ ( u ) .The density in the sample space found by multiplying together

n values of (12.1) with independent arguments is , in the new notation,

(12.7)n -pQ

y e •o

The probability element for u is seen, on considering a thin spherical

shell , to be

(12.8)

upper and lower bounds will be found by replacimg q(u) here by ~and ~ respectively.

At the extremes of Q for a fixed value of u, differentiation

yields

( 2 / 2 -1X. l+x. a) = AX

i,

1. 1.

where A is a". Lagrange multiplier.

(i=1,2, • .. ,n),

This shows that a ll non-zero

coordinates of an extreme point must have equal squares. Let K be

the number of non-zero coordinates at a point satisfying th e equations.

Since th e sum of t he squares is u2, each o f these coordinates must

1

be I K-2U; and i f u 0, K can have only the values 1,2, • . . ,n. At such

a cri t ica l point,

(2 -1 -2

Q = K log 1 + U K a ) .

This is a monotonic increasing funct ion o f K. Hence,

(12.10) (2 -2~ = log 1 + U a ) , (

2 -1 -2 n~ = log 1 + una )



56

A slightly larger but simpler upper bound is also ava ilab le ; for the

2 2monotonic character of the funct ion shows that qM < u /a .

22 2The well known and tabulated distribution of X =ns , based

on normal theory, is used for various purposes, one of which is to

test the hypothesis that ~ has a certain value ~ against largero

values o f ~ . Such a test might for example help to decide whether to

launch an investigation of possible excessive errors of measurement.

The simplest equivalent of the s tandard distribution is that of

(12.11)

of which the element of probability is

(12.12)

We now inquire how this must be modif ied i f the population is really

of Type VII rather than normal.

Let usf i r s t

consider the cases for which the Type VII dis

tribution has a variance, that is , for which p > 3/2; then by

2 2(12.4), a = ( 2 p - 3 ) ~ •

where

Eliminating a between this and (12.2) gives

1

C = (P-3/2)-2 r(p)/r(p-i).p

When p increases, c tends to unity through values greater than unity.p

Also, c3is approximately 1.23.

2 2 2 J..Since ns = Sx = u , we find from (12.11) that u = ~ ( 2 W ) 2 . This

and (12.13) we substitute in the distribution (12.8), which may then

be written with the help of the notation (12.12) in the form



57

(12.14)

Making the subs ti tutions for u and a also in (12.10), where u2/a

2

becomes 2w/(2p-3), and referring to (12.6), we observe that the

q function in (12.14) is greater than

and less than

fJ. - l( )-17nlog 1 + 2w n 2p-3 - I .Consequently, for a l l w > 0,

h' (w) < h(w) < h" (w) ,

where

h' (w) = c nLr2P-3)/(2P-3+2nw17np eW

g (w),p n

If p increases while nand w remain fixed, both h' (w) and h"(w) tend

to g (w), and th eir r ati o tends to unity. Thus the ordinary normaln2

theory tables and usages of X give substantially correct results

when applied to samples from a Type VII population if only p is large

enough. This of course might have been expected since the Type VII

distribution i tself approaches the normal form when p grows large; but

the bounds just determined for the error of assuming normality may

often be useful in doubtful cases, or where p is not very large.

In a l l cases , h'(w) and h"(w) may be integrated by elementary

methods. If n=2, h' (w) coincides with h" (w).



13. What can be done about non-standard conditions?

Standard sta t is t ica l methods involving probabilities calculated

on standard assumptions of normality, independence and uniform variance

of the observed variates lead to many fallacies when any of these assumed

condi tions a re not fully present, and adequate correc tions , often re

quiring additional knowledge are not made. The object of this paper is

to clar i ty the behavior of these aberrant probabilities with the help

of some new mathematical techniques, and to take some soundings in the

vast ocean of possible cases .

We have concentrated for the most part on the effects of non

standard conditions on the distribution of Student's t , and especially

on the por tions of this distribution for which It I is rather large. I t

is such probabilities (or their complements) that appear to supply the

main channel for passing from numerical values of t to decisions and

actions. A knowledge of the "tail" areas of the frequency distribution

of t (as also of F, r and other sta t i s t ics) seems t he re fo re of greater

value than information about re la t ive probabil i ties of different regions

a ll so close to the center that no one would ever be l ikely to notice

them,to say nothing of distinguishing between them. Thus instead of

moments, which describe a distribution in an exceedingly diffuse manner

and are often not sufficiently accurate when only conventional numbers

of them are used, i t seems better to ut i l ize for the present purpose a

functional only of outer portions of the distribution of t . Such a

measure, here called ~ for samples of N, and R in the l imit for very

large samples, is the l imit as T increases of the ratio of two proba

bil i t ies of th e same event, such as t > T or I t I > T, with the denom-



59

inator probabil ity based on standard normal central random sampling

and the numerator on some other condition which the investigator wishes

to study. Additional methods are here developed for evaluating the

exact probabilities of t a i l areas for which T is a t least as great

as the number of degrees of freedom n, in such leading cases as the

Cauchy and the single and double s imple exponential distributions.

These findings could be pushed further inward toward the center, but

with increasing labor and, a t least in some cases, diminishing uti l i ty .

To heal the i l ls t ha t r esul t from attempts to absorb into the

smooth routine o f standard normal theory a mass of misbehaving data of

doubtful antecedents, an inquiry into the natu re and circumstances

of the trouble is a natural part of the preparations for a prescription.

Such an inquiry may in diff icul t cases require the combined efforts of

a mathematical sta t i s t ic ian, a special ist in the field of applica tion ,

and a psychiatrist . But many relatively simple situations yield to

knowledge obtainable with a few simple instruments. One of these

instruments is ~ , which in case of erratiCally high or low values oft will eliminate many possible explanations of the peculiarity. Positive

ser ia l correlations in time series, one of Markoff type and the other

with equal correlations, tend to make ~ and t too large, whereas in-dependence without accurate weights makes them too small. The Cauchy

distribution makes them too low, some others too high. Beldom will

~ be close to unity, especially in small samples, unless the standardconditions come close to being sat isfied.

The conditions that R, which is 1 for random sampling from a

normal distribution, be also 1 with independent random sampling, but



e

60

without actual normality, are very special, and are developed in Section

9. These are rather picturesque, and show that this kind of approximate

norrJality has absolutely nothing to do with moments, or with the behavior

of the frequency curve a t infinity, or very near the cente r o f symmetry

i f there is one, but are solely concerned with what happens in a small

neighborhood of each point of inflection. Any correct ion of t for

non-normality should evidently be responsive to deviations from the

conditions of Section 9.

In some cases the trouble can be dispelled as an i l lusion, gener

ated perhaps by too few or too i l l controlled observations.

When a remedy is required for the error s resul ting from non-nor

mality, a f i r s t procedure sometimes available is to transform the variate

to a normal form, or some approximation thereto. The transformation

must be based on some real or assumed form of the populat ion distribution,

and this should be obtained from some positive evidence, which may be

in an empirical good f i t or in reasoning grounded on an acceptable body

of theory regarding the nature of the variates observed. One form of

the f i rs t , practiced by at least some psychologists, consi st s o f new

scores constituting a monotonic func tion of the raw scores but adjusted

by means of a large s a m p ~ e so as to have a nearly normal distribution,

thus bring ing the battery of tests closer to the domain of normal cor

relation analysis. This is satisfactory from the s tandpoin t of uni

variate normal distributions, though for consistency of mult ivariate

normal. theory additional condi tions a re necessa ry , which mayor may no t

be contradicted by the facts.



61-

The replacement of price relatives by their logarithms appears

to be a sound practice. Indeed, a positive factor, the general level,

persisting through a l l prices, has a meaning and movement of i ts own;

and with i t are independent fluctuations of a multiplicative character.

All this points to the logari thms of the prices as having the right

general properties of approximate symmetry and normality instead of

the very skew distributions of price relatives themselves, taken in

some definite sense such as a price divided by the price of the same

good a t a definite earlier time. There is also some empirical evidence

that logarithms of such price relatives have normal, or a t least sym

metric, distributions. In preparing a program of logarithmic trans

formations,it:lsdes1rable to provide f or suitable steps in the rare

ins tances of a price becoming zero, infinite or negative. The ad

vantages of rep lacing each price,with these rare exceptions,by i ts

logarithm (perhaps 4-place logarithms are best) a t the very beginning

of a s tudy a re considerable, and extend also to some other time

variables in economics.

Non-parametric methods, including rank correlation and th e use

of contingency tables and of order sta t i s t ics such as quantiles, prOVide

a means of escape from such errors as applying normal theory, for

example through the Student rat io, to d is tr ibu tions that are not normal.

Getting suitable exact probabilities based purely on ranks is acom-

binatorial problem after a decision has been reached in detai l as to

the alternative hypotheses to be compared, and the steps to be taken

after each relevant set of observations. These however ar e often diff i

cult decisions, lacking such definite cr i ter ia as occur naturally in



62

parametric situations like those involving correlation coefficients

in normal distributions. Changing from measures to ranks often implies

some sacrifice of information. For example, in a sample from a bivari

ate normal distribution, i t is possible to est imate the correlation

parameter either by the sample product-moment correl at ion or by a

function of the rank correlation coefficient; but the lat ter has a

higher standard error.

In spite of such str ictures, non-parametric methods may suit

ably be preferred in some situations. The median is a definitely

useful sta t is t ic , and so, for some purposes, is M. G. Kendall's rank

correlation coefficient.

One caution about non-parametric methods should be kept in mind:

Their use escapes the possible errors due to non-normality, but does

nothing to avoid those of falsely assumed independence, or of hetero

scedasticity, or of bias; and a l l these may be even greater threats

to sound sta t is t ica l inference than is non-normality. These prevalent

sources of trouble, and the relative inefficiency of non-parametric

methods manifested in their higher standard errors where they compete

with parametr ic methods based on correct models, must be weighed against

their advantage of not assuming any particular type of population.

We have not unti l the last paragraph even mentioned biased errors

of observation. Such errors, though troublesome and common enough, are

in g reat part best discussed in connection with specific applications

rather than general theory. However, the sta t is t ica l theory of design

of experiments does prOVide very material help in combatting the age

old menace of bias, and a t the same time contributes part ia l answers to



t he question at the head of this section.

Objective randomization in the allocation of treatments to exper

imental units, for example with the help of mechanisms of games of

chance, serves an important function that has been described in at

least two different ways. I t may be called elimination of bias,

since in a balanced and randomized experiment any bias tends to be

distributed equally in opposite directions, and thereby to be trans

formed into an increment of variance. Randomization may also be re

garded as a restoration of independence impaired by spec ia l ident i fiable

fe atu re s o f individual units, as in shuffling cards to destroy any

relations between their positions that may be known or suspected. This

aspect of randomization offers a substantial opportunity to get rid of

one of th e th re e chie f kinds of non-standard conditions that weaken the

applicability of standard procedures of mathematical stat is t ics .

Normality, independence and uniformity of variance can a l l be pro

moted by a tt en ti on to their desirability during careful design and ex

ecution of experiments and analogous investigations, such as sample surveys.

Emphasis during such research planning should be placed on features tend

ing to normali ty , including use of composite observations, such as mental

test scores, formed by add iti on of ind iv idua l item scores; and the removal

of major individual causes of var iance leav ing as dominant a usually larger

number of minor causes, which by their convolutions produce a tendency

to normality.

After factorial and other modern balanced experiments, the unknown

quantities are commonly estimated by linear func tions o f a considerable



64

number of observations , frequently with equal coefficients. In addition

to the other benefits of such designs, this feature is conducive to close

approach ton o ~ i t y

in the distributions of these estimates. A case

in point is in weighing several l ight objects, which is much better done

together in various combinations than separately one a t a time, especially

i f both pans of the scale can be used, weighing some combinations against

others. In the development of this subject (Yates [bg7, Hotelling 12'2.7,

Kishen 51.7, Mood 15,-7) the primary motives were cance ll at ion of b ia s

and reduction of variance, fol lowed by advancement of the combinatorial

theory. But from the point of view of the present inqui ry , an important

part of the gain is the increase in the number of independent observations

combined l inearly to estimate each unknown weight. Errors in weights

arrived a t through such experiments must have not only sharply reduced

bias and variance, but much closer adherence to the normal form.

One other protection is available from the dangerous fallacies

that have been th e main sub je ct o f this paper. I t l ies in continual

scrut iny of the sources of the observations in order to understand as

fUlly as possible the n atu re of the random elements in them, as well as

of the biases, to the end that the most accurate and reasonable models

possible be employed. To employ the new models most effectively will

also cal l frequently for dealing in an informed and imaginative manner

with new problems of mathematicalstat is t ics .



BIBLIOGRAPHY

The references below have been selected mainly because of theirbearing on the problems of distributions of. t and F in non-standardcases, with a few dealing instead with distributions of correlationand serial correlation coefficients and other s tat is t ics . The many

papers on distributions of means alone or of standard deviationslone have been barred, save in a few cases such as Rietz 1 derivation of the distribution of s in samples of three from a rectangula r distribution, which throws l ight on the nature of the difficulty of th e problem for larger samples. A few papers on weighingdesigns have been included for a reason arising in the f inal paragraphs of the paper.

An effort has been made to l i s t a l l studies giving particularattention to extreme t a i l probabilities of th e several sta t is t icsjas to this approach, which is also much used in the p resent paper,i t is possib le the l i s t is complete. But in the general field indicated by the t i t le , completeness has not been attained, and could

not be without a t r u ~ y enormous expansion. I t is with great regretthat many meritorious papers have had to be omitted. But referencesare abundant, and well-selected l is ts can be found in many of thepublications listed below. The short bibliographies of Bradley andRietz are extremely informative regarding the condition of our sub

ject when they wrote. Kendall's is virtually all-inclusive, and whentaken with appropriate parts of his text gives the most nearly adequate conspectus of th e great and devoted efforts that have goneinto th e field since the ! i r s t World War.

A relatively new non-standard situation for app lication o f s tandard methods is that in which different members of a sample are drawnfrom normal populations with different means. This is not treatedin th e present paper, but is in papers by Hyrenius, Quensel , Robbins

and zackrisson l isted below.

1:!7 G. A. Baker, I lDistribution of the mean divided by the standard

deviations of samples from non-homogeneous populations/1Ann. Math. Stat. ,

vol. 3 (1932), pp.1-9.

I:gl, M. S. Bartlett, liThe effects of non-normality on the t distribution, ' ~ Camb. Phil. Soc., vol. 31 (1935), pp. 223-231.

L"L7 G. E. P. Box, "Non-normality and tests on variances," Bio

metrika, vol. 40 (1953) , pp. 318-335.

I:fl Ralph A. Bradley, "Sampling dis tr ibu tions for non-normal universes, I Ph.D. Thesis, Universi ty of North Carolina library, Chapel Hill,1949.

1:2.7 Ralph A. Bradiey, "The d i s t r i ~ u t i o n of th e t and F stat ist icsfor a class of non-normal populat ions ,' Virginia Journal of Science, vol.3(NS), No. l(Jan. 1952), pp. 1-32. - -



66

l"fl Ralph A. Bradley, "Corrections for non-normality in the use

of the two-sample t and F toasts at high significance levels," Ann. Math.Stat. , vol. 23 (1952), pp.103-113. -

l"17 Chung, Kai Lai, "The approximate distribution of Student's

stat ist ic," ~ M a t h . ~ . , v o l ~ 17 (1946),pp. 447-465.

LfJfl A. T. Craig, "The simultaneous distribution of mean and standarddeviation in small samples," ~ Math. ~ ,vol. 3 (1932), PP.126-140.

£:27 Harold Cramer, Mathematical Methods of Statistics,Princeton, 1946.

LI27 F. N. David, Tables of the ordinates and probability integral ofthe distribution of the corre la t ion coeff icient in small samples. London,Biometrika office. 1935.

f i l l F. N. David, "The moments of the z and F distributions," Biometrika

vol. 3b (1949), pp.394-403.

[ig7 F. N. David and N. L. Johnson, "The effect of non-normality onthe power function of the F-test in the ana ly sis o f variance," Biometrika,

vol. 38(1951), pp. 43-57.

[iil A. Erdelyi, Asymptotic expansions. New York, Dover. 1956.

[ i ~ R. A. (Sir Ronald) Fisher, Stat is t ica l Methods for Research WorkersEdinburgh, Oliver and Boyd. 1st ed. 1925.

[i17 R. A. (Sir Ronald) Fisher, "Applications of Student's distribution,"Metron; vol. 5, no. 3, (1925) , pp . 90-104.

[i§] A. K. Gayen, "The distribution of 'Student 's ' t in random samplesof any size drawn from non-normal universes," Biometrika,vol.36(1949),

pp. 353-369.

[i17 A. K. Gayen, "The distribution of t he var iance rat io in randomsamples of any size drawn from non-normal universes," Biometrika, vol. 37(1950), pp.236-255.

[i§.7 A. K. Gayen, "The frequency distribution of the product-moment correlation coefficient in random samples of any size drawn from non-normaluniverses," Biometrika, vol. 38 (1951), pp. 219-247.

/f.cjJ R. C. Geary, "The distribution of S tudent 's ratio for non-normalsamples,'! Supp. Journ. Roy. Stat. Soc., vol. 3(1946), pp. 178-184.

8,07 G. B. Hey, "A new method of experimental sampling i l lustrated incertain non-normal populations," Biometrika, vol. 30 (1938), pp. 8-80.(Contains an important bibliography).

ffi.!7 Wassily Hoeffding, "The role of hypotheses in stat is t ical decisionsProceedings of the third Berkeley symposium on mathematical s ta t i s t ica l andprobability, vol. 1, pp. 105-116, (1956).



67

@27 Harold Hotelling, "An application of analysis situs to sta t is t ics ,"Blll l . Amer. Math. ~ , vol.33 (1927) , pp. 467-476.

l2g Harold Hotel*ing, "Tubes and spheres in n-spaces and a class ofstat is t ical problems, Amer. Journ. of ~ , vol. 61, (1939), pp. 440-460.

(247 Harold Hotelling, "Effects of non-normality a t high significance

levels-;" (abstract) , ~ Math. Stat. , vol. 18 (1947), p. 608.

122.7 Harold Hotell!ng, "Some improvements in weighing and other ex perimental techniques, 1 ~ ~ Stat. , vol. 15 (1944), pp. 297-306.

12§.7 Harold Hotelling, "New l ight on the corre .lat ion coefficient andi ts t r a n s f ~ r m s , " Journ. Roy. ~ Soc., Ser ie s B, vol. 15 (1953), pp. 193-23

@77 Hannes Hyrenius, "Sampling distributions from a compound normalparent-population, " Skand. Aktuar. .Tidskrift . , vol. 32 (1949), pp. 180-187.

12§.7 Hannes Hyrenius, "Distribution of 'Student'-Fisher 's t in samplesfrom compound ,normal functions," Biometrika, vol. 37 (1950), pp . 429-442.

[297 E. S. Keeping, "A significance test for exponential regression,"Ann. Math. ~ , vol.22 (1950), pp.180-l98.

1327 M. G. Kendall, The advanced theory of stat ist ics,2 vols. London,ttiliffin, 1948.

/317 K. Kishen, "On the design of experiments for weighing and makingother types of measurements," ~ Math. ~ , vol. 16 (1945), pp. 2 9 4 ~ 3 0 b .

13'5} Jack Laderman, "The distribution of 'Student's ' rat io .for samplesof two items drawn from non-normal universes," Ann. Math. Stat. , vol. 10(1939), pp. 376-379. --! 5 ~ 7 Alexander M. Mood, "On Hotelling's weighing problem," Ann. Math.~ , - v o l . 17(1946), pp.432-446. ---

f5'±7 Jerzy Ne;yman, "On the correlation of the mean and variance insamples drawn from an infinite population;' Biometrika, vol. 18 (1926),pp. 40l-4l}.

132.7 Jerzy Neyman and E. S. Pearson, "On th e use and interpretation ofcertain tes t cri teria for purposes of stat is t ical inference," Biometrika,

vol. 20A (1928) , pp. 175-240.

139 Egon S. Pearson, assisted by N. K. Adayantha and others, "Thedistribution of frequency constants in small samples from non-normalsymmetrical and skew populations," Biometrika, vol. 21 (1929) , pp. 259-286.

1517 Egon S. Pearson, "The analysis of variance in cases of non-normalvariation," Biometrika, vol. 23 (1931), pp. 114-133.



68

!5ffJ Leone Chesire, Elena Oldis, and Egon S. Pearson, "Furtherexperiments on the sampling distribution of the correlation coefficient,"Journ. ~ ~ ~ Assoc., vol. 27 (1932), pp . 121-128.

/5.97 Egon S. Pearson, "The test of significance for the correlationcoefficient: some further results," Journ. Am. Stat. Assoc., vol. 27,

(1932), pp. 424-426. -

1497 Egon S. Pearson, "The test of significance for th e correlationcoefficient, " Journ. ~ Stat. Assoc., vol. 26 (1931), pp. 128-134.

f!i.17 Victor Perlo, "On the distribution of S tudent 's ratio for samplesof three drawn from a rectangular distribution," Biometrika, vol. 25,(1933), pp. 203-204.

/.4.27 Carl-Erik Quensel, "An extension of the validity of StudentFisherTs law of distribution," Skandinavisk Aktuarietidskrift, vol. 26(1943), 210-219.

The distribution is examined of t for normally distributed observationshaving a common mean but different variances, and is found to be more closely concentrated about zero than in the standard case.

/.4.37 Carl-Erik Quensel, "The validity of tlE z-criterion when the variates are taken from different normal populations," Skandinavisk Aktuarietids

krift , vol. 30, (1947), pp. 44-55.~ e n e r a 1 1 z e s th e foregoing to analysis of variance, with similar results.

fF4J Carl-Erik Quensel, "The distribution of th e second-order momentsin random samples from non-normal multivariate universes," Lund Univ.Arsskrift. N. F. Avd. 2. Bd. 48 No.4. 1952. - - - - - - - -

14,7 Paul R. Rider, "On the distribution of the ratio of mean tostandard deviation in small samples from certain non-normal universes,ll

Biometrika, vol. 21 (1929), pp. 124-143.

ffi§] Paul R. Rider, "On small samples from certain non-normal universes,"~ Math. S t a ~ . : . , vol. 2 (1931), pp. 48-65.

ffiJJ Paul R. Rider, "A note on small sample theory,ll Journ. Amer. Stat.Assn., vol. 26 (1931), pp. 172-174. --

L'4g Paul R. R ~ d e r , "On the distribution of the correlation coefficientsin small samples-," Biometrika, vol. 24 (1932), pp. 382-403.

ffiil H. L. Rietz, "Note on th e distribution of the s tandard deviationof sets of three variates drawn at random from a rectangular distribution,"Biometrika, vol. 23, (1931), pp. 424-426.

897 H. L. Rietz, "Comments on applications of recently developedtheory of small samples," Journ. Amer. Stat. Assoc., vol. 26 (1931), PP.36-44

15.17 H. L. Rietz, IlSome topics in sampling theory," Bull. Amer. Math.Soc., vol. 43, (1937), pp. 209-230. -- -



69

.6g7 H. L. Rietz, "On the distribution of the 'Student' ratio forsmall samples from certain non-normal populations," Ann. Math. Stat . ,vol. 10 (1939), pp. 265-274. - - -

1327Herbert Robbins, "The distribution of Student's twhen the

population means are unequal, II ~ Math. Stat . , vol. 19 (1948), pp. 406-410.

/51£1 W. A. Sh,ewhart and F. W. Winters, "Small samples - new experimenta l results," Journ. Amer. Stat. Assn., vol. 23 (1928), pp. 144-153.

This art icle combIneS experimental with other approaches, especiallythat of NeJiUlan f.3}:7 .

.6'2..7 M. M. Siddiqui, "Distribution of a serial correlat ion coefficientnear the ends of the range ," Ann. Math. Stat . , vol. 29 (1958), pp.852-86l.

/527 M. M. Siddiqui, Distributions of some serial correlation coeffi- 'cients. University of North Carolina Ph.D. thesis, 1957. Chapel Hill:

University of North Carolina Library.

8 ]] D. M. Starkey, "The distribution of the multiple correlationcoefficient in periodogram analysis, II Ann. Math. Stat . , vol. 10 (1939),pp. 327-336. - -

8 ~ Martin Weibull, "The distribution of the t and z variables in thecase of stratified sample with individuals taken from normal parent populations with varying means,1I Skandinavisk Aktuarietidskrift, vol. 33,(1950), pp. 137-167.

8il Martin Weibull, liThe regression problem involving non-randomvariates in the case of stratif ied sample from normal parent populations

with varying regression coefficients,1I Skandinavisk Aktuariatidskrift,vol. 34 (1951), pp. 53-71.

lbQ7 Martin Weibull, "The distribution of t - and F- statistics and ofcorrelation and regression coefficients in stratified samples from normalpopUlations wi,th different means," Skand. Aktuar. Tidskrift, vol. 36, (1953)pp. 106, Supp. 1'-2.

lb!7 D. V. Widder, The Laplace Transform. Princeton, 1941.

&27 F. Yates, " C o ~ p l e x experiments," Supp. Journ. Roy. Stat. Soc.,vol. 2-(1935), pp. 181-247. - --

The Behavior of Some Standard Statistical Tests Under

Documents

Transcript of The Behavior of Some Standard Statistical Tests Under