
A Two-sample Nonparametric Multivariate Scale Test based on Data Depth

Shoja’eddin Chenouri Thomas J. Farrar

Department of Statistics and Actuarial Science

University of Waterloo

Waterloo, N2L 3G1

Abstract

In this paper, a percentile modification is applied to a multivariate extension of the well-known Siegel-Tukey test based on data depth. The modification consists of removing the deepest proportion of the data. The asymptotic null distribution of the modified test statistic is proven to be standard normal. The test is compared empirically with Box’s M test and the F product test, and it performs favourably in terms of type I error and power, especially for non-normal data. The empirical results also show that the percentile modification increases power for low-dimensional data. The test is applied to a real data set. We argue that the percentile-modified multivariate Siegel-Tukey test is a viable option for testing for a scale change between two multivariate populations.

KEYWORDS: Data depth, Multivariate nonparametric, Scale test, Siegel-Tukey test


1 Introduction

In multivariate statistical analysis it is often desirable to compare the dispersion of two or more populations. Box’s M test (Box 1949) and the F product test (Mardia et al. 1979; Liu and Singh 2006) are two available options for this purpose. Both tests assume that the underlying distributions are multivariate normal. Box’s M test is a modified likelihood ratio test of the homogeneity of several covariance matrices, and it reduces to Bartlett’s test in the univariate case. The F product test is a two-sample test of scale expansion. Recently, Chenouri (2004) and Liu and Singh (2006) proposed a family of multivariate nonparametric tests based on the center-outward ranks generated by depth functions. These tests can be regarded as multivariate extensions of the Siegel-Tukey and Ansari-Bradley-Freund tests.

It is known that when the samples are drawn from normal populations, the Wilcoxon rank-sum test for a shift in location is nearly as efficient as the t-test in terms of asymptotic relative efficiency (A.R.E. of 0.955). The Siegel-Tukey and Ansari-Bradley-Freund tests for a change in scale, however, have a low A.R.E. (0.61) relative to the F test when the sampled distributions are normal.

Gastwirth (1965) developed a method for improving the univariate Ansari-Bradley-Freund test in terms of Pitman efficiency. He reasoned that data from two distributions with equal location but different scales are both likely to be clustered near the mean, but as one moves away from the mean one will predominantly find points from the distribution with the greater scale. That is, the scale difference between the distributions is most evident in the tails. His idea, then, was to place more weight on the extreme data points than on the central ones when computing the rank-sum statistic and, in fact, to eliminate the most central ranks from the statistic altogether.

In this paper we adopt Gastwirth’s idea to improve the power of the depth-based multivariate Siegel-Tukey test of Chenouri (2004) and Liu and Singh (2006).


To summarize, the method in p dimensions is to obtain a center-outward ranking, from 1 to n, of the combined sample using a depth function, and then to discard the deepest 100(1 − r)% of the points, 0 < r ≤ 1, when computing the test statistic. In other words, we retain only the $n'=\lfloor rn\rfloor$ most outlying points, where n is the combined sample size.

2 Statistical depth functions

In the univariate case, the idea of data depth was first used by Hotelling (1929). Hodges (1955) introduced the idea of halfspace depth for bivariate data in order to construct his bivariate sign test. Tukey (1974) formally defined the halfspace depth function and applied it to visualize bivariate data. Since Tukey (1974), numerous other notions of data depth have been developed, including projection depth (Stahel 1981; Donoho 1982), simplicial volume depth (Oja 1983; Zuo and Serfling 2000), simplicial depth (Liu 1990), majority depth (Singh 1991), Mahalanobis depth (Liu and Singh 1993), regression depth (Rousseeuw and Hubert 1999) and spatial depth (cf. Serfling 2002).

Small (1990) and Zuo and Serfling (2000) summarize the study of statistical depth up to those dates. Zuo and Serfling (2000), inspired by Liu (1990), list four properties that every useful depth function ought to satisfy: affine invariance, maximality at the center, monotonicity relative to the deepest point, and vanishing at infinity. A point that maximizes data depth is called a multidimensional median; when there is more than one maximizer, we take the centroid of the set of maximizers as the multidimensional median (see Small 1990 for an interesting historical review). An obvious application of data depth is the center-outward ordering and ranking of multivariate data. Such orderings and rankings have been used in multivariate nonparametric inference and in robust statistics (cf. Chenouri and Small 2005; Liu et al. 1999).

To formally define a few depth functions, let $X_1,\dots,X_n$ be a random sample from a p-dimensional distribution F, and let $F_n$ be the corresponding empirical distribution, taken as a nonparametric estimate of F. The spatial depth of a point $x\in\mathbb{R}^p$ with respect to F and with respect to the data cloud are

$$\mathrm{SPD}(x;F)=\frac{1}{1+E_F\|X-x\|} \tag{1}$$

and

$$\mathrm{SPD}(x;F_n)=\frac{1}{1+\frac{1}{n}\sum_{i=1}^{n}\|X_i-x\|}, \tag{2}$$

respectively, where $\|\cdot\|$ is the usual Euclidean norm.
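As an illustration, the sample spatial depth in (2) can be computed directly from its definition. The following is a minimal NumPy sketch; the function name `spatial_depth` is our own and is not taken from any published implementation.

```python
import numpy as np


def spatial_depth(x, data):
    """Sample spatial depth SPD(x; F_n) of eq. (2).

    data: array of shape (n, p) holding X_1, ..., X_n; x: a point in R^p.
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    # Average Euclidean distance from x to the sample points.
    mean_dist = np.linalg.norm(data - x, axis=1).mean()
    return 1.0 / (1.0 + mean_dist)


pts = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
print(spatial_depth([0.0, 0.0], pts))  # 0.5: every sample point is at distance 1
print(spatial_depth([3.0, 0.0], pts))  # smaller: x lies outside the cloud
```

Points far from the data cloud have a large average distance and therefore a depth close to 0, in line with the "vanishing at infinity" property.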

The Mahalanobis depth of x is defined by

$$\mathrm{MHD}(x;F)=\frac{1}{1+(x-\mu_F)^{T}\Sigma_F^{-1}(x-\mu_F)}, \tag{3}$$

where $\mu_F$ is a center and $\Sigma_F$ a dispersion matrix of the distribution F. A sample version of MHD is obtained by replacing $\mu_F$ and $\Sigma_F$ with appropriate estimates.
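A sample Mahalanobis depth requires a choice of plug-in estimates for $\mu_F$ and $\Sigma_F$. The sketch below assumes the sample mean and the unbiased sample covariance matrix; robust estimates would be equally admissible. The helper name `mahalanobis_depth` is hypothetical.

```python
import numpy as np


def mahalanobis_depth(x, data):
    """Sample Mahalanobis depth of eq. (3) for one point (shape (p,)) or several (shape (m, p)).

    Plug-in estimates (an assumption of this sketch): the sample mean and the
    unbiased sample covariance matrix of `data` (shape (n, p)).
    """
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)
    sigma = np.atleast_2d(np.cov(data, rowvar=False))
    diff = np.atleast_2d(np.asarray(x, dtype=float)) - mu
    # Quadratic forms (x - mu)^T Sigma^{-1} (x - mu), via a linear solve.
    q = np.einsum("ij,ji->i", diff, np.linalg.solve(sigma, diff.T))
    return 1.0 / (1.0 + q)


rng = np.random.default_rng(0)
sample = rng.normal(size=(100, 3))
print(mahalanobis_depth(sample.mean(axis=0), sample))  # [1.] at the sample mean
```

Passing the whole sample as `x` returns the depth of every sample point at once, which is what a center-outward ranking of the sample needs.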

The halfspace depth, or Tukey depth, of a point $x\in\mathbb{R}^p$ with respect to F is defined to be

$$\mathrm{HSD}(x;F)=\inf_{u\in\mathbb{R}^p,\,u\neq 0}\Pr_F(H_{x,u})=\inf_{\|u\|=1}\Pr_F(H_{x,u}), \tag{4}$$

where $H_{x,u}=\{y\in\mathbb{R}^p : u^{T}y\le u^{T}x\}$ is a closed halfspace. Note that the empirical version of the halfspace depth is obtained by simply replacing F by $F_n$:

$$\mathrm{HSD}(x;F_n)=\min_{\|u\|=1}\frac{\#\{i : u^{T}X_i\le u^{T}x,\ i=1,\dots,n\}}{n}. \tag{5}$$

In other words, for p-variate data, the empirical halfspace depth of a point x is the minimal proportion of data points contained in a closed halfspace whose boundary, a (p − 1)-dimensional hyperplane, passes through x.
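Computing (5) exactly requires specialised algorithms. A simple alternative, shown here only as an illustration and not as the method used in this paper, is to take the minimum in (5) over a finite set of random unit directions. Because only some directions are examined, the result is an upper bound on the exact empirical depth that tightens as more directions are used. The function `halfspace_depth_approx` and its parameters are hypothetical.

```python
import numpy as np


def halfspace_depth_approx(x, data, n_dir=2000, seed=0):
    """Random-direction approximation to HSD(x; F_n) in eq. (5).

    Returns the smallest fraction of points in a closed halfspace
    {y : u'y <= u'x} over `n_dir` random unit vectors u. Since not every
    direction is tried, this can overestimate the exact depth.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    u = rng.normal(size=(n_dir, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # normalise to unit length
    # Entry (i, k) is u_k'(X_i - x); count non-positive entries per direction.
    counts = ((data - x) @ u.T <= 0).sum(axis=0)
    return counts.min() / data.shape[0]


pts = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, -1.0], [0.0, 1.0]])
print(halfspace_depth_approx([0.0, 0.0], pts))  # 0.5
print(halfspace_depth_approx([5.0, 5.0], pts))  # 0.0
```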

Finally, the simplicial depth (Liu 1990) of a point $x\in\mathbb{R}^p$ with respect to F is defined to be

$$\mathrm{SD}(x;F)=\Pr_F\{x\in S[X_1,X_2,\dots,X_{p+1}]\}, \tag{6}$$

where $S[X_1,X_2,\dots,X_{p+1}]$ is the closed simplex whose vertices $X_1,X_2,\dots,X_{p+1}$ are p + 1 randomly chosen observations from F. The sample version of SD(x; F) is defined by replacing F by $F_n$, or by computing the fraction of sample simplices containing x, i.e.,

$$\mathrm{SD}(x;F_n)=\binom{n}{p+1}^{-1}\sum_{*}I\big(x\in S[X_{i_1},X_{i_2},\dots,X_{i_{p+1}}]\big). \tag{7}$$

Here $\sum_*$ runs over all subsets $\{X_{i_1},\dots,X_{i_{p+1}}\}$ of $\{X_1,X_2,\dots,X_n\}$ of size p + 1, and I(·) is the indicator function.
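For p = 2 the simplices in (7) are triangles, so the sample simplicial depth can be computed by brute force over all $\binom{n}{3}$ triangles. The sketch below decides closed-triangle membership with signed-area (orientation) tests; it is an $O(n^3)$ illustration intended for small samples, and the function names are our own.

```python
from itertools import combinations


def _orient(a, b, c):
    """Twice the signed area of triangle (a, b, c); > 0 if counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def simplicial_depth_2d(x, data):
    """Sample simplicial depth SD(x; F_n) of eq. (7) for bivariate data."""
    inside = total = 0
    for a, b, c in combinations(data, 3):
        signs = (_orient(a, b, x), _orient(b, c, x), _orient(c, a, x))
        has_neg = any(s < 0 for s in signs)
        has_pos = any(s > 0 for s in signs)
        # x lies in the closed triangle iff the orientations never disagree in sign.
        inside += not (has_neg and has_pos)
        total += 1
    return inside / total


pts = [(-1.0, 0.0), (1.0, 0.0), (0.0, -1.0), (0.0, 1.0)]
print(simplicial_depth_2d((0.0, 0.0), pts))  # 1.0: the origin lies in every closed triangle
print(simplicial_depth_2d((5.0, 5.0), pts))  # 0.0
```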

Given a depth function D(· ; ·), one can compute the depths of the sample points $X_1,\dots,X_n$ with respect to a given (theoretical or empirical) distribution F, that is, $D(X_1;F),\dots,D(X_n;F)$, and order them by decreasing depth. This gives a ranking of the sample points from the center outward. More precisely, for a sample point $X_i$,

$$R_i=R(X_i)=\#\{X_j : D(X_j;F_n)\ge D(X_i;F_n),\ j=1,\dots,n\}$$

is the center-outward rank of $X_i$ with respect to the data cloud $\{X_1,X_2,\dots,X_n\}$. Under this ranking procedure, a larger rank corresponds to a more outlying point within the data cloud.
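Given any vector of depth values, the center-outward ranks follow directly from the definition of $R_i$: for each point, count how many points are at least as deep. A short NumPy sketch (our own helper; it is quadratic in n, which is fine for moderate samples):

```python
import numpy as np


def center_outward_ranks(depths):
    """R_i = #{j : D(X_j) >= D(X_i)}: rank 1 is the deepest point, rank n the most outlying."""
    d = np.asarray(depths, dtype=float)
    # Entry (i, j) is True when point j is at least as deep as point i.
    return (d[None, :] >= d[:, None]).sum(axis=1)


print(center_outward_ranks([0.9, 0.5, 0.7]))  # [1 3 2]
```

Because of the ≥ in the definition, tied depths share the larger (more outlying) rank; for absolutely continuous data ties occur with probability zero.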


3 Multivariate scale tests

Let $\mathcal{X}=\{X_1,X_2,\dots,X_{n_1}\}$ and $\mathcal{Y}=\{Y_1,Y_2,\dots,Y_{n_2}\}$ be two independent random samples from p-variate absolutely continuous distributions F and G, respectively. Consider a depth function D(· ; ·). We assume that F and G have the same center (deepest point) with respect to this depth function, but possibly different scales. We are interested in testing

$$H_0: F(x)=G(x)\ \text{for all } x\in\mathbb{R}^p\quad\text{versus}\quad H_A: \text{there exists } \sigma=(\sigma_1,\dots,\sigma_p)\neq\mathbf{1} \text{ such that } F(x)=G(\sigma x)\ \text{for all } x\in\mathbb{R}^p, \tag{8}$$

where $\sigma x=(\sigma_1x_1,\dots,\sigma_px_p)$. Note that if $\Sigma_1$ and $\Sigma_2$ are the covariance matrices of F and G respectively, the hypotheses can be written equivalently as $H_0:\Sigma_1=\Sigma_2$ versus $H_A:\Sigma_1-\Sigma_2$ semidefinite.

In the univariate case (p = 1), one may use the Siegel-Tukey or Ansari-Bradley tests, which are strictly distribution-free. The multivariate extension of the Siegel-Tukey test is no different from the univariate test except that a depth function is used to order the combined sample in a center-outward manner. The rationale for the test is that if two random samples are drawn from two distributions that differ only in scale and are then combined, we would expect the sample from the distribution with the smaller scale to be scattered tightly around the center, while observations from the distribution with the larger scale would tend to occupy more outlying positions. Data depth provides a center-outward ranking that is suitable for capturing this difference with a rank-sum test.

More precisely, let $R_i$ be the rank, in decreasing order, of $D(X_i;H_n)$ within the set

$$\{D(X_1;H_n),\dots,D(X_{n_1};H_n),D(Y_1;H_n),\dots,D(Y_{n_2};H_n)\},$$

where $H_n$ is the empirical distribution of the pooled dataset $\mathcal{Z}=\mathcal{X}\cup\mathcal{Y}$ and $n=n_1+n_2$. That is, $R_i$, $i=1,\dots,n_1$, is the center-outward rank of $X_i$ within $\mathcal{Z}$. A natural multivariate extension of the Siegel-Tukey test is based on the Wilcoxon-type rank-sum statistic $W=\sum_{i=1}^{n_1}R_i$. Under the null hypothesis, assuming no ties, the $R_i$ are identically and uniformly distributed on the set $\{1,2,\dots,n\}$, so W is distribution-free under $H_0$. We reject the null hypothesis if W is sufficiently large or sufficiently small. For small sample sizes the null distribution of W can be tabulated, but for large samples this is computationally expensive, if not impossible. Note that under $H_0$

$$E_{H_0}(R_i)=\frac{n+1}{2},\qquad \mathrm{Var}_{H_0}(R_i)=\frac{n^2-1}{12},\qquad \mathrm{Cov}_{H_0}(R_i,R_j)=-\frac{n+1}{12},\quad i\neq j. \tag{9}$$

Hence

$$E_{H_0}(W)=\frac{n_1(n+1)}{2},\qquad \mathrm{Var}_{H_0}(W)=\frac{n_1n_2(n+1)}{12}, \tag{10}$$

and, as $n\to\infty$ with $n_1/n$ bounded away from 0 and 1,

$$\frac{W-E_{H_0}(W)}{\sqrt{\mathrm{Var}_{H_0}(W)}}\xrightarrow{D}N(0,1). \tag{11}$$

So for large sample sizes we can use quantiles of N(0, 1) to test $H_0$.
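Once the center-outward ranks of the $\mathcal{X}$-sample within the pooled data are available (for instance from a depth function as in Section 2), the large-sample test based on (10) and (11) reduces to a few lines of arithmetic. A minimal sketch; the function name is hypothetical:

```python
import math


def depth_siegel_tukey(ranks_x, n):
    """Rank sum W of the X-sample's center-outward ranks and its z-score from (10)-(11).

    ranks_x: center-outward ranks (1 = deepest) of the n1 X-points within the
    pooled sample of size n = n1 + n2, assuming no ties.
    """
    n1 = len(ranks_x)
    n2 = n - n1
    w = sum(ranks_x)
    mean = n1 * (n + 1) / 2       # E_{H0}(W)
    var = n1 * n2 * (n + 1) / 12  # Var_{H0}(W)
    return w, (w - mean) / math.sqrt(var)


w, z = depth_siegel_tukey([1, 2], n=4)  # X holds the two deepest points
print(w, round(z, 4))                   # 3 -1.5492
```

A negative z indicates that the $\mathcal{X}$-sample sits closer to the center (smaller scale); for a two-sided level-α test one rejects when $|z|>z_{1-\alpha/2}$.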

4 A modified multivariate Siegel-Tukey test

Gastwirth (1965) developed a method for improving the univariate Ansari-Bradley test. In this section we adopt Gastwirth’s idea to improve the power of the depth-based multivariate Siegel-Tukey test. As illustrated in Figure 1 for p = 2, we discard the deepest points of the combined sample; that is, we ignore the deepest 100(1 − r)% of the data, 0 < r ≤ 1, when computing the test statistic.

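For i = 1, …, n, define $\delta_i=1$ if the i-th most outlying point of the combined dataset $\mathcal{Z}=\mathcal{X}\cup\mathcal{Y}$ (the point with center-outward rank n + 1 − i) is an observation from $\mathcal{X}$, and $\delta_i=0$ otherwise. With this indexing, the indices $i\le n'$ correspond to the retained points, and the deepest $n-n'$ points are discarded.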


Figure 1: Combined sample before (left panel) and after (right panel) removing the deepest 30% of the points (r = 0.7)

The retained sample is denoted by $\mathcal{Z}'=\{X'_1,\dots,X'_{n'_1},Y'_1,\dots,Y'_{n'_2}\}$, where

$$n'_1=\sum_{i=1}^{n'}\delta_i\qquad\text{and}\qquad n'_2=n'-n'_1$$

are the random variables representing the numbers of points retained from $\mathcal{X}$ and $\mathcal{Y}$, respectively. Note that n', the total number of points retained, is fixed.

We define a new test statistic

$$V_{n'}=\sum_{i=1}^{n'}(n'+1-i)\,\delta_i.$$

That is, $V_{n'}$ is the sum of the reversed ranks of the retained portion of the dataset $\mathcal{X}$. The ranks are reversed so that points farther from the center receive higher weight. We reject the null hypothesis if $V_{n'}$ is sufficiently large or sufficiently small.
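The statistic $V_{n'}$ can be computed directly from the pooled depths. In the sketch below we index the pooled sample from the outside in (i = 1 is the least deep point), which is the reading under which the deepest $n-n'$ points receive no weight and the most outlying point receives weight n'. This indexing convention and the helper name `v_statistic` are our assumptions.

```python
import math


def v_statistic(depths, from_x, r):
    """Percentile-modified statistic V_{n'} = sum_{i<=n'} (n' + 1 - i) * delta_i.

    depths: depth of each pooled point; from_x: True where the point comes from X;
    r: retained fraction, so the n' = floor(r * n) least deep points are kept.
    """
    n = len(depths)
    n_keep = math.floor(r * n)
    # Outside-in order: position i = 1 is the least deep (most outlying) point.
    order = sorted(range(n), key=lambda k: depths[k])
    v = 0
    for i, k in enumerate(order[:n_keep], start=1):
        if from_x[k]:  # delta_i = 1
            v += n_keep + 1 - i
    return v, n_keep


depths = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
from_x = [True, False, True, False, True, False]
print(v_statistic(depths, from_x, r=0.5))  # (4, 3)
print(v_statistic(depths, from_x, r=1.0))  # (12, 6)
```

With r = 1 nothing is discarded and the weight n + 1 − i equals the center-outward rank, so $V_n$ coincides with the unmodified statistic W (here 6 + 4 + 2 = 12).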

Under $H_0$, assuming no ties, the $R_i$ are identically and uniformly distributed on the set {1, 2, …, n}; more precisely, each of the $\binom{n}{n_1}$ possible sets of ranks for the sample $\mathcal{X}$ is equally likely. This implies that $V_{n'}$ is distribution-free under $H_0$. For small sample sizes $n_1$, $n_2$, the exact distribution of $V_{n'}$, for any given n', can be tabulated. This is done by enumerating all $\binom{n}{n_1}$ possible combinations of the ranks of sample $\mathcal{X}$, computing $V_{n'}$ on each of them (for the given n'), and counting the number of times $V_{n'}$ takes on each integer value.

By way of example, for $n_1=n_2=5$ (so n = 10) and n' = 7, the exact distribution of $V_7$ is given by

$$\Pr(V_7=k)=\binom{10}{5}^{-1}\times\begin{cases}1, & k=3,4,24,25,\\ 2, & k=5,23,\\ 5, & k=6,22,\\ 6, & k=7,21,\\ 9, & k=8,20,\\ 12, & k=9,19,\\ 17, & k=10,11,17,18,\\ 22, & k=12,13,15,16,\\ 24, & k=14,\\ 0, & \text{otherwise.}\end{cases} \tag{12}$$

Note that the distribution of $V_{n'}$ is not always symmetric; it is symmetric only if $n_1$ and $n_2$ are either both less than n' or both greater than n'.

The support of $V_{n'}$ is $\{k_{\min},\dots,k_{\max}\}$, where

$$k_{\min}=\frac{(n'-n_2)(n'-n_2+1)}{2}\,I(n_2<n'),\qquad k_{\max}=\frac{n'(n'+1)}{2}-\frac{(n'-n_1)(n'-n_1+1)}{2}\,I(n_1<n').$$

This exhaustive approach to computing the exact distribution quickly becomes computationally burdensome as $n_1$ and $n_2$ increase; it is only feasible up to around $n_1=n_2=10$.
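The exhaustive enumeration described above is easy to write down. The sketch below (our own code) reproduces the counts in (12), i.e. the numerators of $\Pr(V_7=k)$ for $n_1=n_2=5$ and n' = 7, and makes plain why the cost grows like $\binom{n}{n_1}$, already 184,756 placements for $n_1=n_2=10$.

```python
from collections import Counter
from itertools import combinations
from math import comb


def exact_null_counts(n1, n2, n_keep):
    """Exact H0 distribution of V_{n'}: counts over all C(n, n1) placements of the X-sample.

    Position i (outside in) carries weight c_i = n' + 1 - i for i <= n' and 0 otherwise.
    """
    n = n1 + n2
    c = [n_keep + 1 - i if i <= n_keep else 0 for i in range(1, n + 1)]
    counts = Counter(sum(c[j] for j in idx) for idx in combinations(range(n), n1))
    return dict(sorted(counts.items())), comb(n, n1)


counts, total = exact_null_counts(5, 5, 7)
print(total)                                            # 252
print(counts[3], counts[14], min(counts), max(counts))  # 1 24 3 25
```

The smallest and largest keys agree with $k_{\min}=3$ and $k_{\max}=25$ from the support formula.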

We shall therefore develop a large-sample analysis of the test statistic $V_{n'}$ under the null hypothesis. Note that $V_{n'}$ can be written in the form of a simple linear rank statistic, $V_{n'}=\sum_{i=1}^{n}c_i\,a(i)$, where, for i = 1, …, n, $a(i)=\delta_i$ and

$$c_i=\begin{cases}n'+1-i, & i\in\{1,\dots,n'\},\\ 0, & \text{otherwise.}\end{cases}$$

Thus,

$$\bar a=\frac{1}{n}\sum_{i=1}^{n}\delta_i=\frac{n_1}{n},\qquad \bar c=\frac{1}{n}\sum_{i=1}^{n}c_i=\frac{1}{n}\sum_{i=1}^{n'}(n'+1-i)=\frac{n'(n'+1)}{2n},$$

$$\sigma_a^2=\frac{1}{n}\sum_{i=1}^{n}\big(a(i)-\bar a\big)^2=\frac{1}{n}\sum_{i=1}^{n}a(i)^2-\bar a^2=\frac{n_1}{n}-\frac{n_1^2}{n^2}=\frac{n_1}{n}\Big(1-\frac{n_1}{n}\Big),$$

$$\sigma_c^2=\frac{1}{n}\sum_{i=1}^{n}(c_i-\bar c)^2=\frac{1}{n}\sum_{i=1}^{n'}(n'+1-i)^2-\bar c^2=\frac{1}{n}\Big[\frac{n'(n'+1)(2n'+1)}{6}-\frac{n'^2(n'+1)^2}{4n}\Big]=\frac{n'(n'+1)}{2n}\Big[\frac{2n'+1}{3}-\frac{n'(n'+1)}{2n}\Big].$$

Now from Theorem c on page 61 of Hájek and Šidák (1967), or Lemma 1 on page 78 of Ferguson (1996), we conclude that

$$E_{H_0}(V_{n'})=n\,\bar a\,\bar c=n\cdot\frac{n_1}{n}\cdot\frac{n'(n'+1)}{2n}=\frac{n_1n'(n'+1)}{2n},$$

$$\mathrm{Var}_{H_0}(V_{n'})=\frac{n^2}{n-1}\,\sigma_a^2\,\sigma_c^2=\frac{n^2}{n-1}\cdot\frac{n_1}{n}\Big(1-\frac{n_1}{n}\Big)\cdot\frac{n'(n'+1)}{2n}\Big[\frac{2n'+1}{3}-\frac{n'(n'+1)}{2n}\Big]=\frac{n_1n'(n'+1)}{2(n-1)}\Big(1-\frac{n_1}{n}\Big)\Big[\frac{2n'+1}{3}-\frac{n'(n'+1)}{2n}\Big]. \tag{13}$$
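The closed forms in (13) can be checked numerically against brute-force enumeration in small cases; for $n_1=n_2=5$ and n' = 7 both give mean 14 and variance 154/9 ≈ 17.11. A minimal sketch with hypothetical helper names:

```python
from itertools import combinations
from statistics import fmean, pvariance


def v_null_moments(n1, n2, n_keep):
    """E_{H0}(V_{n'}) and Var_{H0}(V_{n'}) from eq. (13)."""
    n = n1 + n2
    mean = n1 * n_keep * (n_keep + 1) / (2 * n)
    var = (n1 * n_keep * (n_keep + 1) / (2 * (n - 1)) * (1 - n1 / n)
           * ((2 * n_keep + 1) / 3 - n_keep * (n_keep + 1) / (2 * n)))
    return mean, var


def v_null_moments_bruteforce(n1, n2, n_keep):
    """The same moments by averaging over all equally likely placements of X (small n only)."""
    n = n1 + n2
    c = [n_keep + 1 - i if i <= n_keep else 0 for i in range(1, n + 1)]
    values = [sum(c[j] for j in idx) for idx in combinations(range(n), n1)]
    return fmean(values), pvariance(values)


print(v_null_moments(5, 5, 7))             # approximately (14.0, 17.111)
print(v_null_moments_bruteforce(5, 5, 7))  # should agree with the line above
```

The agreement is exact rather than asymptotic, because (13) is the finite-population variance of a sum drawn without replacement.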

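Theorem 1. Under the null hypothesis, if $n_1/n\to\lambda_1$ as n and $n_1\to\infty$, with $0<\lambda_1<1$, and $n'=\lfloor rn\rfloor$ for a fixed $0<r\le 1$, then the test statistic

$$T_{n'}=\frac{V_{n'}-E_{H_0}(V_{n'})}{\sqrt{\mathrm{Var}_{H_0}(V_{n'})}}$$

has an asymptotic N(0, 1) distribution.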


Proof. First note that since the function $f(x)=(c-x)^2$ defined on an interval [a, b] attains its maximum at one of the boundary points a or b, we conclude that

$$\max_{i=1,\dots,n}(c_i-\bar c)^2=\max\Big\{\max_{i=1,\dots,n'}\Big(n'+1-i-\frac{n'(n'+1)}{2n}\Big)^2,\ \Big(\frac{n'(n'+1)}{2n}\Big)^2\Big\}=\max\Big\{\Big(n'-\frac{n'(n'+1)}{2n}\Big)^2,\ \Big(1-\frac{n'(n'+1)}{2n}\Big)^2,\ \Big(\frac{n'(n'+1)}{2n}\Big)^2\Big\}=\Big(n'-\frac{n'(n'+1)}{2n}\Big)^2, \tag{14}$$

where the term $\bar c^2$ arises from the indices $i>n'$, for which $c_i=0$. So clearly

$$\frac{\max_{i=1,\dots,n}(c_i-\bar c)^2}{\frac{1}{n}\sum_{i=1}^{n}(c_i-\bar c)^2}=\frac{\Big(n'-\frac{n'(n'+1)}{2n}\Big)^2}{\frac{n'(n'+1)}{2n}\Big[\frac{2n'+1}{3}-\frac{n'(n'+1)}{2n}\Big]}$$

is bounded, since with $n'=\lfloor rn\rfloor$ both the numerator and the denominator are of order $n^2$. In addition,

$$\frac{\max_{i=1,\dots,n}\big(a(i)-\bar a\big)^2}{\sum_{i=1}^{n}\big(a(i)-\bar a\big)^2}=\frac{\max\{n_1^2/n^2,\ n_2^2/n^2\}}{n_1n_2/n}\to 0 \tag{15}$$

as $n\to\infty$ and $n_1/n\to\lambda_1$ with $0<\lambda_1<1$. Thus Theorem 12 on page 82 of Ferguson (1996) implies that $T_{n'}$ has an asymptotic N(0, 1) distribution.

It is worth noting that for values of $n_1,n_2>10$ that are still too small for the normal approximation, we can approximate the exact probabilities under $H_0$ by sampling from the population of all possible combinations.
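A minimal sketch of this sampling approach: draw random placements of the $\mathcal{X}$-sample among the n pooled positions (all equally likely under $H_0$) and use the simulated values of $V_{n'}$ in place of the full enumeration. The two-sided p-value convention used here (distance from $E_{H_0}(V_{n'})$), the outside-in weighting, and all names are our assumptions.

```python
import random


def v_null_pvalue_mc(v_obs, n1, n2, n_keep, draws=20_000, seed=1):
    """Monte Carlo two-sided p-value for V_{n'} under H0.

    Each draw places the X-sample on a uniformly random set of n1 of the n
    outside-in positions; position i <= n' has weight n' + 1 - i, the others 0.
    """
    rng = random.Random(seed)
    n = n1 + n2
    c = [n_keep + 1 - i if i <= n_keep else 0 for i in range(1, n + 1)]
    sims = [sum(c[j] for j in rng.sample(range(n), n1)) for _ in range(draws)]
    center = n1 * n_keep * (n_keep + 1) / (2 * n)  # E_{H0}(V_{n'}), eq. (13)
    extreme = sum(abs(s - center) >= abs(v_obs - center) for s in sims)
    return extreme / draws, sims


p, sims = v_null_pvalue_mc(3, 5, 5, 7)
print(p)  # close to the exact value 2/252, about 0.0079
```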


5 Empirical Results

Before assessing the effectiveness of the percentile modification in terms of power gains, we give a short overview of two parametric competitors to the depth-based tests discussed in the previous two sections. These tests are used in the simulations along with the multivariate Siegel-Tukey test and its percentile modification.

Under the assumption of normality, i.e. when $X_1,\dots,X_{n_1}$ and $Y_1,\dots,Y_{n_2}$ are independent samples from p-variate normal distributions $N(\mu_1,\Sigma_1)$ and $N(\mu_2,\Sigma_2)$ respectively, Box (1949) proposed a modified likelihood ratio test of $H_0:\Sigma_1=\Sigma_2$ versus $H_A:\Sigma_1\neq\Sigma_2$. This test is known as Box’s M test. Its test statistic M follows an asymptotic $\chi^2$ distribution with p(p + 1)/2 degrees of freedom under $H_0$, and is constructed as

$$M=c\,\big[(n_1+n_2-2)\log|S_{\text{pooled}}|-(n_1-1)\log|S_1|-(n_2-1)\log|S_2|\big],$$

where the correction factor c is

$$c=1-\frac{2p^2+3p-1}{6(p+1)}\Big(\frac{1}{n_1-1}+\frac{1}{n_2-1}-\frac{1}{n_1+n_2-2}\Big),$$

and furthermore

$$S_1=\frac{1}{n_1-1}\sum_{i=1}^{n_1}(X_i-\bar X)(X_i-\bar X)^{T},\qquad S_2=\frac{1}{n_2-1}\sum_{i=1}^{n_2}(Y_i-\bar Y)(Y_i-\bar Y)^{T}$$

are unbiased estimators of $\Sigma_1$ and $\Sigma_2$ respectively, and

$$S_{\text{pooled}}=\frac{(n_1-1)S_1+(n_2-1)S_2}{n_1+n_2-2}$$

is the pooled unbiased estimator of the common covariance matrix under $H_0$.

Box’s M test is affine invariant and easy to compute, but it is very sensitive to departures from the normality assumption.
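For reference, the two-sample Box’s M statistic can be computed as follows. This NumPy sketch assumes the standard two-group correction factor, whose denominator is 6(p + 1), and returns M together with its p(p + 1)/2 degrees of freedom; turning M into a p-value needs a chi-square survival function such as `scipy.stats.chi2.sf(M, df)`. The function name is hypothetical.

```python
import numpy as np


def box_m(x, y):
    """Two-sample Box's M statistic and its asymptotic chi-square degrees of freedom."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, p = x.shape
    n2 = y.shape[0]
    s1 = np.atleast_2d(np.cov(x, rowvar=False))  # unbiased: divisor n1 - 1
    s2 = np.atleast_2d(np.cov(y, rowvar=False))
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    def logdet(s):
        # slogdet returns (sign, log|det|); covariance estimates are positive definite here.
        return np.linalg.slogdet(s)[1]

    c = 1 - (2 * p**2 + 3 * p - 1) / (6 * (p + 1)) * (
        1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n1 + n2 - 2))
    m = c * ((n1 + n2 - 2) * logdet(s_pooled)
             - (n1 - 1) * logdet(s1) - (n2 - 1) * logdet(s2))
    return m, p * (p + 1) // 2


rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))
y = rng.normal(size=(50, 3))
print(box_m(x, y))        # (M, 6), with M typically of order 6 under H0
print(box_m(x, 3.0 * y))  # much larger M after a scale change
```

Because log-determinant is concave, the bracketed term is never negative, so M ≥ 0 whenever c > 0.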

Under the same assumptions as Box’s M test, Liu and Singh (2006, p. 26) used Theorem 3.4.8 of Mardia, Kent and Bibby (1979) to develop a likelihood ratio test of the hypothesis

$$H_0:|\Sigma_1|=|\Sigma_2|\quad\text{versus}\quad H_A:|\Sigma_1|<|\Sigma_2|,$$

where |A| denotes the determinant of the matrix A. The test statistic is

$$F_{\text{product}}=\Big(\frac{n_1-1}{n_2-1}\Big)^{p-1}\frac{\prod_{i=1}^{p-1}(n_2-i-1)}{\prod_{i=1}^{p-1}(n_1-i-1)}\,\frac{|S_1|}{|S_2|},$$

which reduces to $|S_1|/|S_2|$ if $n_1=n_2$. Under $H_0$, $F_{\text{product}}$ has the same distribution as $\tau_1\tau_2\cdots\tau_p$, where $\tau_i\sim F(n_1-i,\,n_2-i)$ and the $\tau_i$ are mutually independent.

As the null distribution of $F_{\text{product}}$ is complicated, its quantiles are best computed empirically. This is easily done by sampling independently from each of the required F distributions, multiplying these draws to obtain a draw from the null distribution of $F_{\text{product}}$, repeating until a large number of such draws is available, and taking quantiles of their empirical distribution. This was done for the sample sizes and dimensions used in our simulations, and the resulting quantiles are shown in Table 1.

            n1 = n2 = 50    n1 = n2 = 100    n1 = n2 = 200
p = 2       1.9619          1.6006           1.3916
p = 5       2.9631          2.1207           1.6899
p = 10      4.8576          2.9328           2.1140

Table 1: Selected upper 0.05 quantiles of the F product test statistic
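The Monte Carlo recipe behind Table 1 can be sketched as follows, together with the statistic itself. We assume $\tau_i\sim F(n_1-i,\,n_2-i)$, which is what the Wishart determinant decomposition gives for $|S_1|/|S_2|$; for the equal sample sizes used in Table 1 the order of the degrees of freedom is immaterial. Function names and the default number of draws are our own choices, and because of Monte Carlo error the output agrees with Table 1 only approximately.

```python
import numpy as np


def f_product(x, y):
    """F product statistic: scaled ratio of generalized variances |S1| / |S2|."""
    n1, p = np.shape(x)
    n2 = np.shape(y)[0]
    s1 = np.atleast_2d(np.cov(x, rowvar=False))
    s2 = np.atleast_2d(np.cov(y, rowvar=False))
    i = np.arange(1, p)  # i = 1, ..., p - 1
    const = ((n1 - 1) / (n2 - 1)) ** (p - 1) * np.prod((n2 - i - 1) / (n1 - i - 1))
    return const * np.linalg.det(s1) / np.linalg.det(s2)


def f_product_null_quantile(n1, n2, p, q=0.95, draws=200_000, seed=0):
    """Monte Carlo q-quantile of tau_1 * ... * tau_p with independent tau_i ~ F(n1 - i, n2 - i)."""
    rng = np.random.default_rng(seed)
    prod = np.ones(draws)
    for i in range(1, p + 1):
        prod *= rng.f(n1 - i, n2 - i, size=draws)
    return np.quantile(prod, q)


print(f_product_null_quantile(50, 50, 2))  # roughly 1.96 (Table 1 reports 1.9619)
```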

In our empirical studies in this section we use the multivariate normal N(µ, Σ), multivariate Student’s t, multivariate Cauchy and multivariate double-exponential (Laplace) distributions. As there are several versions of the multivariate Student’s t, Cauchy and double-exponential distributions, we define the versions used in this section.

Definition 1. A p-dimensional random vector X is said to follow a multivariate Student’s t-distribution with ν degrees of freedom, location vector µ and scale matrix Σ if it has probability density function

$$f(x;\nu,\mu,\Sigma)=\frac{\Gamma\big(\frac{\nu+p}{2}\big)}{\Gamma(\nu/2)\,(\nu\pi)^{p/2}\,|\Sigma|^{1/2}\,\big[1+\frac{1}{\nu}(x-\mu)^{T}\Sigma^{-1}(x-\mu)\big]^{(\nu+p)/2}}$$

(cf. Genz and Bretz 1999). For ν > 2 the covariance matrix is νΣ/(ν − 2). If ν = 1, X is said to follow a multivariate Cauchy distribution.

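Definition 2. A p-dimensional random vector X is said to follow a multivariate Laplace (or multivariate double-exponential) distribution with location vector µ and covariance matrix Σ if it has probability density function

$$f(x;\mu,\Sigma)=\frac{2\exp\big(x^{T}\Sigma^{-1}\mu\big)}{(2\pi)^{p/2}|\Sigma|^{1/2}}\left(\frac{x^{T}\Sigma^{-1}x}{2+\mu^{T}\Sigma^{-1}\mu}\right)^{\upsilon/2}K_{\upsilon}\left(\sqrt{\big(2+\mu^{T}\Sigma^{-1}\mu\big)\big(x^{T}\Sigma^{-1}x\big)}\right),$$

where υ = (2 − p)/2 and $K_\upsilon(\cdot)$ is the modified Bessel function of the third kind, given by

$$K_\lambda(u)=\frac{1}{2}\Big(\frac{u}{2}\Big)^{\lambda}\int_0^\infty t^{-\lambda-1}\exp\Big(-t-\frac{u^2}{4t}\Big)\,dt,\qquad u>0$$

(cf. Kotz, Kozubowski and Podgórski 2002).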
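For simulation, both families can be sampled through standard stochastic representations, stated here as assumptions rather than as the authors’ code: a multivariate t vector with ν degrees of freedom is $\mu+Z/\sqrt{W/\nu}$ with $Z\sim N(0,\Sigma)$ and $W\sim\chi^2_\nu$ independent, and a symmetric multivariate Laplace vector (Definition 2 with µ = 0) is $\sqrt{E}\,Z$ with E a standard exponential variable (cf. Kotz, Kozubowski and Podgórski 2002). A NumPy sketch with hypothetical function names:

```python
import numpy as np


def rmvt(n, nu, mu, sigma, rng):
    """n draws from the multivariate t of Definition 1 (nu = 1 gives the multivariate Cauchy)."""
    mu = np.asarray(mu, dtype=float)
    z = rng.multivariate_normal(np.zeros(mu.size), sigma, size=n)
    w = rng.chisquare(nu, size=n)
    return mu + z / np.sqrt(w / nu)[:, None]


def rmvlaplace_symmetric(n, sigma, rng):
    """n draws from the multivariate Laplace of Definition 2 with mu = 0 (covariance sigma)."""
    p = np.shape(sigma)[0]
    z = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    e = rng.exponential(1.0, size=n)
    return np.sqrt(e)[:, None] * z


rng = np.random.default_rng(3)
t10 = rmvt(200_000, 10, [0.0, 0.0], np.eye(2), rng)
lap = rmvlaplace_symmetric(200_000, np.eye(2), rng)
print(t10.var(axis=0))  # near nu / (nu - 2) = 1.25 per coordinate
print(lap.var(axis=0))  # near 1 per coordinate
```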

5.1 Assessing the effectiveness of the percentile modification in terms of power gains

Gastwirth used A.R.E. as the criterion for comparing his modified univariate rank-sum test with other tests. As A.R.E. is difficult to estimate empirically in this case, we instead consider Monte Carlo estimates of power and type I error.


Denote the power (here a function of r) by π(r). To estimate it, we generate two independent samples from a given distribution with equal location vectors but with covariance matrices that differ by a scale change, and compute the standardized statistic $T_{n'}$ of Theorem 1; this is repeated I times. We then estimate the power of our percentile-modified multivariate Siegel-Tukey test by

$$\hat\pi(r)=\frac{1}{I}\sum_{i=1}^{I}I\big(|T_{n',i}|>z_{1-\alpha/2}\big),$$

where $T_{n',i}$ is the value of the statistic for the i-th simulated pair of samples and $z_{1-\alpha/2}$ is the $(1-\alpha/2)$ quantile of N(0, 1). If the type I error is to be estimated instead, we generate two i.i.d. samples and use the same formula. Similar estimates may be obtained for Box’s M test and the F product test.

We also compute the standard error of the power (or type I error) estimate as

$$\frac{1}{I}\Big\{\sum_{i=1}^{I}\big[I\big(|T_{n',i}|>z_{1-\alpha/2}\big)-\hat\pi(r)\big]^2\Big\}^{1/2},$$

where I(·) is again the indicator function.
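The estimator $\hat\pi(r)$ and its standard error can be sketched end to end for normal data with Mahalanobis depth. Several details are our assumptions rather than the paper’s: depths use the pooled sample mean and covariance, the second sample is obtained by multiplying standard normal draws by `scale` (so its covariance is `scale**2` times the identity), the test is two-sided at α = 0.05, the pooled sample is weighted from the outside in as $n'+1-i$, and `reps` is far smaller than the $I=10^5$ used in the paper so that the example runs quickly. The standard error is the sum form of $\sqrt{\hat\pi(1-\hat\pi)/I}$.

```python
import math

import numpy as np


def pooled_mahalanobis_depths(z):
    """Mahalanobis depth of every row of z with respect to the whole pooled sample z."""
    diff = z - z.mean(axis=0)
    s_inv = np.linalg.inv(np.atleast_2d(np.cov(z, rowvar=False)))
    return 1.0 / (1.0 + np.einsum("ij,jk,ik->i", diff, s_inv, diff))


def standardized_v(x, y, r):
    """T_{n'} of Theorem 1; rows of x are the X-sample, rows of y the Y-sample."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    n_keep = math.floor(r * n)
    order = np.argsort(pooled_mahalanobis_depths(np.vstack([x, y])))  # outside in
    weights = np.zeros(n)
    weights[order[:n_keep]] = n_keep - np.arange(n_keep)  # n' + 1 - i for i = 1..n'
    v = weights[:n1].sum()                                # pooled rows 0..n1-1 are X
    mean = n1 * n_keep * (n_keep + 1) / (2 * n)
    var = (n1 * n_keep * (n_keep + 1) / (2 * (n - 1)) * (1 - n1 / n)
           * ((2 * n_keep + 1) / 3 - n_keep * (n_keep + 1) / (2 * n)))
    return (v - mean) / math.sqrt(var)


def rejection_rate(n1, n2, p, scale, r, reps=2000, z_crit=1.959964, seed=0):
    """Monte Carlo estimate of P(|T_{n'}| > z_{1-alpha/2}) and its standard error."""
    rng = np.random.default_rng(seed)
    hits = np.empty(reps)
    for k in range(reps):
        x = rng.standard_normal((n1, p))
        y = scale * rng.standard_normal((n2, p))  # scale = 1.0 gives H0
        hits[k] = abs(standardized_v(x, y, r)) > z_crit
    pi_hat = hits.mean()
    se = math.sqrt(((hits - pi_hat) ** 2).sum()) / reps
    return pi_hat, se


print(rejection_rate(50, 50, 2, scale=1.0, r=0.7))  # type I error near 0.05
print(rejection_rate(50, 50, 2, scale=1.3, r=0.7))  # power under a scale change
```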

We obtained estimates of this kind for the power and type I error of the percentile-modified multivariate Siegel-Tukey test for r = 0.1, 0.2, …, 1, where r = 1 corresponds to the unmodified multivariate Siegel-Tukey test. We also obtained estimates of the power and type I error of Box’s M test and the F product test using the same simulated data (see Section 5.2). We report simulation results only for Mahalanobis depth; other depths yielded similar results.

Figures 2-16 below give the estimated power of the modified multivariate Siegel-Tukey test as a function of r for data from five different multivariate distributions. There is one graph for each of p = 2, 5 and 10, and within each graph the sample sizes $n_1=n_2=50$, 100 and 200 are represented by dotted, dashed and solid lines, respectively.

All estimates are based on $I=10^5$ iterations. Standard error estimates indicate that the power and type I error estimates are correct to at least three decimal places, which is enough accuracy to establish the significance of the trends observed in the plots.

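[Figures 2-16: estimated power as a function of r, one graph for each of p = 2, 5, 10. Figures 2-4: multivariate normal (Figure 2: p = 2, Figure 3: p = 5, Figure 4: p = 10); Figures 5-7: multivariate t(10); Figures 8-10: multivariate t(3); Figures 11-13: multivariate Cauchy (t(1)); Figures 14-16: multivariate double-exponential (Laplace).]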

The distribution of the first sample always had an identity covariance matrix; for the second sample it was either again the identity matrix (to compute type I errors) or the identity matrix multiplied by an appropriate scalar greater than 1.

Note that it is only meaningful to compare the results directly within each distribution, not between distributions, since the scale change of the second sample under the alternative hypothesis was not the same for each distribution. Rather, the scale change was chosen separately for each distribution so that, for every combination of dimension and sample size, the power was well above the target type I error of 0.05 but below 1, allowing comparisons. The scale changes for the normal, t(10), t(3), Cauchy and Laplace distributions were 1.3, 1.3, 1.5, 2.2 and 1.6, respectively.

To make comparisons between distributions at least somewhat meaningful, the y-axis of each plot ranges from the minimum to the maximum power achieved for that distribution.

As can be seen in the figures, for the normal, t(10) and Laplace distributions slight gains in power are made in dimension p = 2. Even slighter gains are maintained in 5 and 10 dimensions for the normal and Laplace cases, while losses are incurred in the t(10) case in these dimensions.

Percentile modifications reduce power for every p for the t(3) and Cauchy distributions. In the Cauchy case this is likely due to the nonexistence of moments, which makes Mahalanobis depth a poor measure, since it is not computing depth with respect to an actual center.

In general, it appears that the heavier-tailed the distribution, the less effective the percentile modification (although the Laplace distribution has greater kurtosis than the t(10)). Intuitively, this may be because the purpose of the modification is to concentrate the test on the tail, and a very heavy-tailed distribution will likely have plenty of points from both samples in the tail in spite of a scale difference. Also, the Mahalanobis depths themselves may be inaccurate in the presence of many outliers.

Increasing the sample size produces an obvious increase in power across the board, but little change in the trends of the graphs in r. Sample size seems to make little difference to the effectiveness of percentile modifications (although this may not be true for very small sample sizes).

To summarize, some gains in power are observed for certain distributions, but they are

small, especially in relation to what Gastwirth observed in one dimension. The gains seem

to disappear as dimensionality increases, which may reflect the decreasing “importance” of

the distributional center in high dimensions.

5.2 Comparison with other tests

Tables 2 and 3 below report the empirical type I errors of Box’s M test and the F product test, and Tables 5-9 in the appendix compare the power of the modified multivariate Siegel-Tukey test (empirically maximized with respect to r) with that of Box’s M test and the F product test for the same data.

Type I error estimates are reported before power estimates, because high power is of little value if the type I error is also high.

The α value used in both cases was 5%; that is, if the two samples are generated with the same covariance matrix, we expect the empirical type I error to be close to 0.05. If this is not the case, the test statistic does not actually follow the specified null distribution.

For the multivariate Siegel-Tukey test and our modified Siegel-Tukey test, the empirical type I errors were invariably very close to the target value of 0.05, so they are not reported. This is, of course, expected, since the test statistics are distribution-free.

For Box’s M test, the empirical type I errors were much higher than 0.05 when the data were not normally distributed. This was true to a lesser degree for the F product test. Indeed, for t(3), Cauchy and Laplace distributed data, and for t(10) data in high dimensions, the type I errors are so high as to make Box’s M test meaningless. Thus, without even looking at power estimates, it is safe to say that the multivariate Siegel-Tukey test is vastly superior to Box’s M test when the normality assumption does not hold.

p    n1 (= n2)   Normal   t(10)    t(3)    t(1) (Cauchy)   Double Exp.
2    50          0.0424   0.109    0.543   0.974           0.284
2    100         0.0462   0.123    0.640   0.990           0.307
2    200         0.0481   0.133    0.718   0.997           0.323
5    50          0.0409   0.175    0.859   1.000           0.593
5    100         0.0455   0.213    0.940   1.000           0.647
5    200         0.0484   0.236    0.975   1.000           0.671
10   50          0.0381   0.285    0.984   1.000           0.921
10   100         0.0434   0.380    0.998   1.000           0.955
10   200         0.0462   0.439    1.000   1.000           0.968

Table 2: Empirical type I errors of Box’s M test

p    n1 (= n2)   Normal   t(10)    t(3)    t(1) (Cauchy)   Double Exp.
2    50          0.0488   0.0927   0.251   0.432           0.163
2    100         0.0509   0.0947   0.278   0.451           0.166
2    200         0.0496   0.0969   0.303   0.450           0.170
5    50          0.0499   0.115    0.284   0.449           0.206
5    100         0.0487   0.123    0.317   0.464           0.212
5    200         0.0492   0.126    0.342   0.461           0.215
10   50          0.0503   0.144    0.306   0.456           0.251
10   100         0.0504   0.154    0.340   0.474           0.261
10   200         0.0678   0.186    0.382   0.474           0.263

Table 3: Empirical type I errors of F product test

From Table 5, it appears that for normally distributed data the multivariate Siegel-Tukey test outperforms Box’s M test, but not the F product test, in terms of power. This establishes, at least on an empirical basis, that the Siegel-Tukey test is superior to Box’s M test for both normal and (at least some) non-normal data. Of the three tests, the F product test is to be preferred for normal data, as long as the one-sided hypothesis on covariance determinants is sufficient for the application in question.

Even under some non-normal circumstances, the F product test may be preferred to the Siegel-Tukey test. For instance, when $n_1=n_2=50$ and p = 2 for t(10) or Laplace data, the power of the F product test is more than twice that of the optimal Siegel-Tukey test. The type I errors of the F product test in these two cases are roughly 0.09 and 0.16, respectively. Thus, as long as it is understood that the actual type I error is somewhat higher than the quantiles would indicate, the F product test may be the better test to use. However, if one needs to control the type I error, our test is preferable.

It should be noted that the estimated power and type I errors of Box’s M and F product tests under Cauchy data (for which moments are undefined) had larger standard errors than the rest of the estimates; this is why they do not always increase as dimension and/or sample size increase.

6 An example

A data set given by Johnson and Wichern (1988, pp. 261-262) is examined here using the Box’s M, $F_{\text{product}}$, W and $V_{n'}$ tests. The data set was originally taken from Jolicoeur and Mosimann (1960), who studied the relationship of size and shape for painted turtles. It contains measurements on the carapace of $n_1=24$ female and $n_2=24$ male turtles.

The marginal scatter plots of the variables in Figure 17 suggest location and scale differences between the groups. To apply a scale test, we first shift the data so that both groups have the same center. A depth-depth plot (see Parelius et al. 1999) and chi-square quantile-quantile plots (cf. Johnson and Wichern 1988), reported in Figures 18 and 19 respectively, suggest that the assumption of multivariate normality is plausible.

We consider a test of $H_0:\sigma=(1,1,1)^T$, where σ is the trivariate scale vector discriminating the two distributions. This is done using four different tests, namely Box’s M, $F_{\text{product}}$, the multivariate Siegel-Tukey test (MST) and its percentile-modified version (PMST). The p-values are reported in Table 4. All p-values suggest that the data do not support the null hypothesis of equal scales, which is also apparent in Figure 17. The lowest p-value is given by $F_{\text{product}}$, followed by PMST with r = 0.6, which is consistent with our simulations in Section 5.

Box’s M    F product    MST        PMST (r = 0.6)
0.00142    0.00013      0.00429    0.00127

Table 4: P-values of various multivariate scale tests for the turtle data

[Figure 17: Scatter plots of Length, Width, and Height versus observation index]

[Figure 18: Depth-depth plot of turtle data vs. simulated normal data. Panels: Female turtles, Male turtles; axes: depths based on turtle data vs. depths based on simulated normal data]

[Figure 19: Chi-square quantile-quantile plot of turtle data. Panels: Female turtle, Male turtle; axes: squared Mahalanobis distances vs. chi-square quantiles with df = 3]

7 Conclusion and future work

In conclusion, the multivariate Siegel-Tukey test using Mahalanobis depth outperforms Box’s M test in terms of power for normal data, and in terms of type I error for t-distributed data. It is also preferable to the F product test for non-normal, heavy-tailed data, particularly in high dimensions.

Extending Gastwirth's percentile modification approach to this test improves the power for normal and Laplace data, especially in two dimensions and to a lesser extent in five and ten dimensions. It also improves the power for t(10) data in two dimensions. In all other cases tested, the modification was counterproductive.
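To make the idea of a depth-based percentile modification concrete, the sketch below scores the pooled sample from the outside in and gives score zero to the deepest points. It is a hypothetical illustration, not the statistic defined in this paper. It assumes Mahalanobis depth with pooled-sample estimates, reads r as the proportion of outermost pooled points that is retained (so r = 1 gives the unmodified test, consistent with the r = 1.0 entries in the appendix tables matching the basic test's power), and replaces the paper's asymptotic normal approximation with a permutation p-value. All function names are hypothetical.

```python
import numpy as np


def mahalanobis_depth(points, sample):
    """MD(x) = 1 / (1 + squared Mahalanobis distance to the mean of `sample`)."""
    mean = sample.mean(axis=0)
    cov = np.cov(sample, rowvar=False)
    diff = points - mean
    d2 = np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))
    return 1.0 / (1.0 + d2)


def outward_scores(depths, r):
    """Centre-outward scores with the deepest (1 - r) fraction set to zero.

    Of the m = ceil(r * N) least-deep points, the most outlying gets score m
    and the deepest retained point gets score 1; all other points get 0."""
    depths = np.asarray(depths, dtype=float)
    m = int(np.ceil(round(r * len(depths), 9)))  # round guards against 0.7 * 10 = 7.000000000000001
    order = np.argsort(depths, kind="stable")  # smallest depth (most outlying) first
    scores = np.zeros(len(depths))
    scores[order[:m]] = np.arange(m, 0, -1)
    return scores


def modified_scale_test(x, y, r=1.0, n_perm=2000, seed=0):
    """One-sided permutation test of H1: x is more dispersed than y.

    Pooled-sample depths do not change when group labels are permuted,
    so only the labels are shuffled."""
    pooled = np.vstack([x, y])
    scores = outward_scores(mahalanobis_depth(pooled, pooled), r)
    n1 = len(x)
    observed = scores[:n1].sum()
    rng = np.random.default_rng(seed)
    exceed = sum(
        scores[rng.permutation(len(pooled))[:n1]].sum() >= observed
        for _ in range(n_perm)
    )
    return observed, (exceed + 1) / (n_perm + 1)


rng = np.random.default_rng(3)
x = 3.0 * rng.normal(size=(100, 2))  # stand-in sample with larger spread
y = rng.normal(size=(100, 2))
stat, p_value = modified_scale_test(x, y, r=0.6)
```

Zeroing the scores of the deepest points mirrors Gastwirth's idea of letting only the extreme observations contribute; the choice of r then trades off how much central information is discarded.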

Although the improvements were in no case comparable to those obtained by Gastwirth's univariate modification, this modification is recommended in any situation where computational cost is not a major issue and the data are not too heavy-tailed, as a way to improve the power of what is already a good test.

However, the failure of the modification for Cauchy (and t(3)) data provides an incentive to investigate the effects of using robust estimates for $\mu_F$ and $\Sigma_F$. One possibility is to use the reweighted minimum covariance determinant (RMCD) estimates (cf. Rousseeuw and van Zomeren 1990). A fast algorithm to compute RMCD estimates is given by Rousseeuw and Van Driessen (1999). This algorithm is implemented in standard statistical software such as R.
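As a rough illustration of the robust alternative, the sketch below replaces the sample mean and covariance in the Mahalanobis depth by a minimum covariance determinant (MCD) estimate found with random starts and concentration steps (C-steps), the core idea of the Rousseeuw and Van Driessen algorithm. It is a simplified stand-in: it returns the raw MCD estimate, omitting the consistency correction and the reweighting step that produce the RMCD, and it assumes continuous data in general position so that every subset covariance is invertible. For the full estimator, `covMcd` in the R package robustbase and `MinCovDet` in scikit-learn are existing implementations; the helper names below are hypothetical.

```python
import numpy as np


def _mean_cov(x):
    return x.mean(axis=0), np.cov(x, rowvar=False)


def _sq_mahalanobis(points, mean, cov):
    diff = points - mean
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))


def raw_mcd(x, h=None, n_starts=50, n_csteps=20, seed=0):
    """Raw MCD location and scatter via random starts and C-steps.

    Each C-step refits the mean and covariance to the h points closest to
    the current fit, which never increases the covariance determinant."""
    n, p = x.shape
    h = (n + p + 1) // 2 if h is None else h
    rng = np.random.default_rng(seed)
    best_det, best_mean, best_cov = np.inf, None, None
    for _ in range(n_starts):
        # Start from the fit to a random (p + 1)-point subset.
        mean, cov = _mean_cov(x[rng.choice(n, size=p + 1, replace=False)])
        for _ in range(n_csteps):
            subset = np.argsort(_sq_mahalanobis(x, mean, cov))[:h]
            mean, cov = _mean_cov(x[subset])
        det = np.linalg.det(cov)
        if det < best_det:
            best_det, best_mean, best_cov = det, mean, cov
    return best_mean, best_cov


def mahalanobis_depth(points, mean, cov):
    return 1.0 / (1.0 + _sq_mahalanobis(points, mean, cov))


# Stand-in data: 200 standard normal points plus 20 gross outliers near (10, 10).
rng = np.random.default_rng(4)
x = np.vstack([rng.normal(size=(200, 2)), rng.normal(loc=10.0, size=(20, 2))])
classical_mean, classical_cov = _mean_cov(x)
robust_mean, robust_cov = raw_mcd(x)
robust_depth = mahalanobis_depth(x, robust_mean, robust_cov)
```

In this stand-in example the classical mean is pulled towards the outliers, while the MCD-based fit stays near the bulk of the data, so the outliers receive very small depth.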

There may be other ways to improve the modification; for instance, instead of discarding a fixed proportion of the deepest points, one could discard all points whose depth exceeds some threshold. Of course, the test statistic would then no longer be distribution-free, because the number of discarded points would itself depend on the underlying distribution.
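The difference between the two trimming rules can be seen in a few lines. In the sketch below (hypothetical helpers; c is an illustrative threshold), the fixed-proportion rule always retains ceil(rN) points whatever the data, whereas the threshold rule retains a data-dependent number of points.

```python
import numpy as np


def keep_outermost_fraction(depths, r):
    """Fixed-proportion rule: indices of the ceil(r * N) least-deep points."""
    depths = np.asarray(depths, dtype=float)
    m = int(np.ceil(round(r * len(depths), 9)))  # round guards against 0.7 * 10 = 7.000000000000001
    return np.sort(np.argsort(depths, kind="stable")[:m])


def keep_below_threshold(depths, c):
    """Threshold rule: indices of all points whose depth does not exceed c."""
    return np.flatnonzero(np.asarray(depths, dtype=float) <= c)


depths_a = [0.9, 0.8, 0.2, 0.1]
depths_b = [0.5, 0.4, 0.3, 0.35]
for d in (depths_a, depths_b):
    print(len(keep_outermost_fraction(d, 0.5)), len(keep_below_threshold(d, 0.45)))
# prints "2 2" for depths_a and "2 3" for depths_b
```

Because the retained count under the threshold rule varies with the data, a rank statistic computed on the retained points no longer has a fixed null distribution.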

8 Appendix

p  | n1 (= n2) | Basic Siegel-Tukey | Modified Siegel-Tukey | Box's M | F product
---+-----------+--------------------+-----------------------+---------+----------
2  | 50        | 0.195              | 0.213 (r = 0.5)       | 0.141   | 0.358
2  | 100       | 0.352              | 0.393 (r = 0.4)       | 0.290   | 0.576
2  | 200       | 0.616              | 0.671 (r = 0.4)       | 0.568   | 0.832
5  | 50        | 0.451              | 0.459 (r = 0.6)       | 0.143   | 0.633
5  | 100       | 0.760              | 0.769 (r = 0.7)       | 0.346   | 0.890
5  | 200       | 0.968              | 0.970 (r = 0.7)       | 0.729   | 0.993
10 | 50        | 0.736              | 0.736 (r = 0.9)       | 0.122   | 0.859
10 | 100       | 0.967              | 0.967 (r = 1.0)       | 0.359   | 0.991
10 | 200       | 1.000              | 1.000 (r = 0.8)       | 0.811   | 1.000

Table 5: Empirical power of all three tests for Normal data


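p  | n1 (= n2) | Basic Siegel-Tukey | Modified Siegel-Tukey | Box's M | F product
---+-----------+--------------------+-----------------------+---------+----------
2  | 50        | 0.171              | 0.179 (r = 0.6)       | 0.229   | 0.387
2  | 100       | 0.299              | 0.314 (r = 0.6)       | 0.372   | 0.558
2  | 200       | 0.538              | 0.561 (r = 0.6)       | 0.594   | 0.778
5  | 50        | 0.324              | 0.326 (r = 0.9)       | 0.332   | 0.600
5  | 100       | 0.581              | 0.581 (r = 0.9)       | 0.544   | 0.808
5  | 200       | 0.871              | 0.871 (r = 1.0)       | 0.797   | 0.955
10 | 50        | 0.462              | 0.462 (r = 1.0)       | 0.463   | 0.760
10 | 100       | 0.768              | 0.768 (r = 1.0)       | 0.722   | 0.929
10 | 200       | 0.971              | 0.971 (r = 1.0)       | 0.925   | 0.995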

Table 6: Empirical power of all three tests for t(10) data

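p  | n1 (= n2) | Basic Siegel-Tukey | Modified Siegel-Tukey | Box's M | F product
---+-----------+--------------------+-----------------------+---------+----------
2  | 50        | 0.242              | 0.245 (r = 0.9)       | 0.652   | 0.554
2  | 100       | 0.441              | 0.443 (r = 0.9)       | 0.783   | 0.665
2  | 200       | 0.736              | 0.737 (r = 0.9)       | 0.883   | 0.769
5  | 50        | 0.377              | 0.377 (r = 1.0)       | 0.915   | 0.688
5  | 100       | 0.664              | 0.664 (r = 1.0)       | 0.976   | 0.790
5  | 200       | 0.926              | 0.926 (r = 1.0)       | 0.995   | 0.873
10 | 50        | 0.461              | 0.461 (r = 1.0)       | 0.993   | 0.785
10 | 100       | 0.765              | 0.765 (r = 1.0)       | 1.000   | 0.872
10 | 200       | 0.970              | 0.970 (r = 1.0)       | 1.000   | 0.938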

Table 7: Empirical power of all three tests for t(3) data


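p  | n1 (= n2) | Basic Siegel-Tukey | Modified Siegel-Tukey | Box's M | F product
---+-----------+--------------------+-----------------------+---------+----------
2  | 50        | 0.247              | 0.258 (r = 0.8)       | 0.978   | 0.589
2  | 100       | 0.432              | 0.456 (r = 0.7)       | 0.992   | 0.608
2  | 200       | 0.685              | 0.727 (r = 0.7)       | 0.997   | 0.605
5  | 50        | 0.398              | 0.398 (r = 1.0)       | 1.000   | 0.668
5  | 100       | 0.677              | 0.677 (r = 1.0)       | 1.000   | 0.681
5  | 200       | 0.924              | 0.924 (r = 1.0)       | 1.000   | 0.679
10 | 50        | 0.478              | 0.478 (r = 1.0)       | 1.000   | 0.745
10 | 100       | 0.777              | 0.777 (r = 1.0)       | 1.000   | 0.760
10 | 200       | 0.972              | 0.972 (r = 1.0)       | 1.000   | 0.762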

Table 8: Empirical power of all three tests for Cauchy data

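p  | n1 (= n2) | Basic Siegel-Tukey | Modified Siegel-Tukey | Box's M | F product
---+-----------+--------------------+-----------------------+---------+----------
2  | 50        | 0.259              | 0.272 (r = 0.6)       | 0.557   | 0.652
2  | 100       | 0.471              | 0.492 (r = 0.6)       | 0.755   | 0.833
2  | 200       | 0.773              | 0.791 (r = 0.6)       | 0.927   | 0.962
5  | 50        | 0.376              | 0.391 (r = 0.6)       | 0.823   | 0.828
5  | 100       | 0.658              | 0.676 (r = 0.6)       | 0.944   | 0.956
5  | 200       | 0.919              | 0.929 (r = 0.6)       | 0.993   | 0.997
10 | 50        | 0.436              | 0.458 (r = 0.6)       | 0.976   | 0.905
10 | 100       | 0.731              | 0.757 (r = 0.5)       | 0.997   | 0.985
10 | 200       | 0.956              | 0.969 (r = 0.5)       | 1.000   | 1.000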

Table 9: Empirical power of all three tests for Laplace data
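Each entry in Tables 5 to 9 is an empirical power, that is, the proportion of simulated data sets on which the test rejects the null hypothesis. The sketch below computes such a proportion together with its binomial Monte Carlo standard error, sqrt(p(1 - p)/N). The number of replications N behind the tables is not stated in this section, so the 1000 replications in the usage line are purely illustrative, as is the function name.

```python
import math


def empirical_power(rejections):
    """Proportion of replications that rejected H0, and its Monte Carlo SE.

    `rejections` holds one boolean per simulated data set (True = rejected)."""
    n = len(rejections)
    if n == 0:
        raise ValueError("need at least one replication")
    p_hat = sum(bool(r) for r in rejections) / n
    return p_hat, math.sqrt(p_hat * (1.0 - p_hat) / n)


# Illustrative only: 195 rejections out of a hypothetical 1000 replications.
power, se = empirical_power([True] * 195 + [False] * 805)
```

Under that hypothetical N, an entry near 0.2 would carry a standard error of about 0.013.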


References

[1] Ansari, A.R., Bradley, R.A. (1960). Rank-Sum Tests for Dispersions. Ann Math Stat 31:1174-1189.

[2] Bartlett, M.S. (1937). Properties of Sufficiency and Statistical Tests. Proceedings of the Royal Society of London, Series A, 160:268-282.

[3] Box, G.E.P. (1949). A General Distribution Theory for a Class of Likelihood Criteria. Biometrika 36:317-346.

[4] Chenouri, S. (2004). Multivariate Robust Nonparametric Inference based on Data Depth. Ph.D. thesis, Dept. of Statistics & Actuarial Science, University of Waterloo, Canada.

[5] Chenouri, S. & Small, C.G. (2004). A Nonparametric Multivariate Multisample Test Based on Data Depth.

[6] Conover, W.J. (1999). Practical Nonparametric Statistics. John Wiley & Sons, Inc.

[7] Donoho, D.L. (1982). Breakdown properties of multivariate location estimators. Ph.D. thesis, Dept. of Statistics, Harvard University.

[8] Ferguson, T.S. (1996). A Course in Large Sample Theory. Chapman and Hall, New York.

[9] Gao, Y. (2003). Data depth based on spatial rank. Statistics & Probability Letters 65:217-225.

[10] Gastwirth, J.L. (1965). Percentile Modifications of Two Sample Rank Tests. Journal of the American Statistical Association, 60:1127-1141.

[11] Genz, A., Bretz, F. (1999). Numerical computation of multivariate t-probabilities with application to power calculation of multiple contrasts. Journal of Statistical Computation and Simulation 63:361-378.


[12] Hájek, J. and Šidák, Z. V. (1967). Theory of Rank Tests. Academic Press, New York.

[13] Hodges, J.L. (1955). A bivariate sign test. Ann. Math. Statist. 26:523-527.

[14] Hotelling, H. (1929). Stability in competition. Econom. J. 39:41-57.

[15] Johnson, R.A. and Wichern, D.W. (1988). Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, NJ.

[16] Jolicoeur, P. and Mosimann, J.E. (1960). Size and shape variation in the painted turtle: A principal component analysis. Growth 24:339-354.

[17] Kotz, S., Kozubowski, T.J., Podgorski, K. (2002). An asymmetric multivariate Laplace distribution. Technical Report No. 367, Department of Statistics and Applied Probability, University of California at Santa Barbara.

[18] Liu, R.Y. (1990). On a notion of data depth based on random simplices. Ann. Statist. 18:405-414.

[19] Liu, R.Y., Singh, K. (1993). A Quality Index Based on Data Depth and Multivariate Rank Tests. Journal of the American Statistical Association, 88:252-260.

[20] Liu, R.Y., Singh, K. (2006). Rank tests for multivariate scale difference based on data depth. Data Depth: Robust Multivariate Analysis, Computational Geometry and Applications, DIMACS Series, AMS, 17-36.

[21] Mahalanobis, P.C. (1936). On the generalized distance in statistics. Proc. Nat. Acad. India, 12:49-55.

[22] Mardia, K.V., Kent, J.T., Bibby, J.M. (1979). Multivariate Analysis. Academic Press Inc., New York.


[23] Oja, H. (1983). Descriptive statistics for multivariate distributions. Statist. Prob. Letters 1:327-333.

[24] Rousseeuw, P.J., Hubert, M. (1999). Regression Depth. Journal of the American Statistical Association, 94:388-402.

[25] Rousseeuw, P.J. and van Zomeren, B.C. (1990). Unmasking multivariate outliers and leverage points. Journal of the American Statistical Association, 85:633-639.

[26] Serfling, R. (2002). A depth function and a scale curve based on spatial quantiles. In Statistical Data Analysis Based On the L1-Norm and Related Methods (Y. Dodge, ed.), pp. 25-38. Birkhäuser, Basel.

[27] Siegel, S., Tukey, J.W. (1960). A Nonparametric Sum of Ranks Procedure for Relative Spread in Unpaired Samples. Journal of the American Statistical Association 55:429-445.

[28] Singh, K. (1991). A notion of majority depth. Preprint.

[29] Small, C.G. (1990). A Survey of Multidimensional Medians. International Statistical Review 58:263-277.

[30] Stahel, W.A. (1981). Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen von Kovarianzmatrizen. Ph.D. thesis, ETH, Zurich.

[31] Tukey, J.W. (1975). Mathematics and picturing data. Proc. Intern. Congr. Math. Vancouver 1974 2:523-531.

[32] Zuo, Y., Serfling, R. (2000). General Notions of Statistical Depth Function. Ann Stat 28:461-482.
