A Two-sample Nonparametric Multivariate
Scale Test based on Data Depth
Shoja’eddin Chenouri Thomas J. Farrar
Department of Statistics and Actuarial Science
University of Waterloo
Waterloo, N2L 3G1
Abstract
In this paper, a percentile modification is applied to a multivariate extension of the
well-known Siegel-Tukey test based on data depth. This modification is done by removing
the deepest proportion of the data. The asymptotic distribution of this modified
test statistic is proven to be standard normal. The test is compared empirically to
Box’s M test and the F product test. It performs favourably in terms of type I error
and power, especially for non-normal data. The empirical results also show that the
percentile modification increases power for low-dimensional data. The test is applied
to a real data set. It is claimed that the percentile-modified multivariate Siegel-Tukey
test is a viable option for testing scale change between two multivariate populations.
KEYWORDS: Data depth, Multivariate nonparametric, Scale test, Siegel-Tukey test
1 Introduction
In multivariate statistical analysis it is often desirable to compare the dispersion between
two or more populations. Box’s M test (Box 1949) and the F product test (Mardia et al.
1979 and Liu and Singh 2006) are two available options for this purpose. Both tests assume
that the underlying distributions are multivariate normal. Box’s M test is a modified
likelihood ratio test of homogeneity of multiple covariance matrices, which is equivalent to
Bartlett’s test in the univariate case. The F product test is a two-sample test of scale expansion.
Recently, Chenouri (2004) and Liu & Singh (2006) have proposed a family of multivariate
nonparametric tests. These tests are based on the center-outward ranks generated by depth
functions. These multivariate nonparametric tests can be treated as multivariate extensions
of the Siegel-Tukey and Ansari-Bradley-Freund tests.
It is known that when the samples are drawn from normal populations, the Wilcoxon rank-sum
test for shift in location is nearly as efficient (0.955) as the t-test in terms of asymptotic
relative efficiency (A.R.E.). The Siegel-Tukey and Ansari-Bradley-Freund tests for change in
scale, however, have low (0.61) A.R.E. relative to the F test when the distributions sampled
are normal.
Gastwirth (1965) developed a method for improving the univariate Ansari-Bradley-Freund
test in terms of Pitman efficiency. He reasoned that data from two distributions with equal
location but different scale are likely to both be clustered near the mean, but as one moves
away from the mean, one will predominantly find points from the distribution with greater
scale. That is, the scale difference between the distributions will be most evident in the tail.
His idea, then, was to place more weight on the extreme data points than the central ones
when computing the rank-sum statistic, and in fact, to eliminate the most central ranks from
the statistic altogether.
In this paper we adopt Gastwirth’s idea to improve the power of the depth-based multivariate
Siegel-Tukey test of Chenouri (2004) and Liu and Singh (2006).
To summarize, the method in p dimensions is to obtain a center-outward ranking from 1
to n of our combined sample using a depth function. Then, discard those points whose rank
is above the rth percentile, 0 < r ≤ 1. That is, ignore the deepest 100(1 − r)% of the data when
computing the test statistic. Another way to say this is that we retain all points with
center-outward rank at most n′, where n′ is r n rounded to an integer and n is the combined sample size.
2 Statistical depth functions
In the univariate case, the idea of data depth was first used by Hotelling (1929). Hodges
(1955) introduced the idea of halfspace depth for bivariate data in order to construct his
bivariate sign test. Tukey (1974) formally defined the halfspace depth function and applied
it to visualize bivariate data. Since Tukey (1974) numerous other notions of data depth have
been developed, including projection depth (Stahel, 1981 and Donoho, 1982), simplicial
volume depth (Oja, 1983, Zuo and Serfling, 2000), simplicial depth (Liu, 1990), majority
depth (Singh 1991), Mahalanobis depth (Liu and Singh, 1993), regression depth (Rousseeuw
and Hubert 1999) and spatial depth (cf. Serfling, 2002).
Small (1990) and Zuo and Serfling (2000) provide a summary of the study of statistical
depth up to those dates. Zuo and Serfling (2000), inspired by Liu (1990), have listed four
criteria that every useful depth function ought to satisfy. These desirable properties are
affine equivariance, maximality at center, monotonicity with respect to the deepest point,
and vanishing at infinity. A point that maximizes data depth is called a multidimensional
median. In the case that there is more than one maximizer, we take the centroid of the set
of maximizers as the multidimensional median (see Small 1990 for an interesting historical
review). An obvious application of data depth is in the center-outward ordering and ranking of
multivariate data. The center-outward ordering and ranking have been used in multivariate
nonparametric inference and also robust statistics (cf. Chenouri and Small 2005, and Liu
et al. 1999).
To formally define a few depth functions, let $X_1, \ldots, X_n$ be a random sample from a
p-dimensional distribution $F$ and suppose that $F_n$ represents the corresponding empirical
distribution, taken as a nonparametric estimate of $F$. The spatial depths of a given point
$x \in \mathbb{R}^p$ with respect to $F$ and the data cloud are
$$\mathrm{SPD}(x; F) = \frac{1}{1 + E_F \|X - x\|} \tag{1}$$
and
$$\mathrm{SPD}(x; F_n) = \frac{1}{1 + \frac{1}{n} \sum_{i=1}^{n} \|X_i - x\|} \tag{2}$$
respectively, where $\|\cdot\|$ is the usual Euclidean norm.
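As a rough illustration of the sample version (2), the following Python sketch computes the depth of a point from its mean Euclidean distance to the data. The function name `spatial_depth` and the toy data cloud are assumptions made for this example only.

```python
import math

def spatial_depth(x, data):
    """Sample depth (2): 1 / (1 + mean Euclidean distance from x to the data)."""
    mean_distance = sum(math.dist(x, xi) for xi in data) / len(data)
    return 1.0 / (1.0 + mean_distance)

# Hypothetical toy data: the corners of the unit square.
cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
depth_center = spatial_depth((0.5, 0.5), cloud)  # point in the middle of the cloud
depth_far = spatial_depth((5.0, 5.0), cloud)     # point far outside the cloud
```

In this toy case the central point receives the larger depth, which is the behaviour a center-outward ordering relies on.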
The Mahalanobis depth of $x$ is defined by
$$\mathrm{MHD}(x; F) = \frac{1}{1 + (x - \mu_F)^T \Sigma_F^{-1} (x - \mu_F)} \tag{3}$$
where $\mu_F$ is a center and $\Sigma_F$ a dispersion matrix of the distribution $F$. A sample version of the
MHD is defined by replacing $\mu_F$ and $\Sigma_F$ with appropriate estimates.
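A minimal sketch of the sample version of (3) is given below. It assumes NumPy is available and plugs in the sample mean and the unbiased sample covariance matrix as the estimates of the center and dispersion; that choice and the name `mahalanobis_depth` are assumptions of this example, and robust estimates could be substituted.

```python
import numpy as np

def mahalanobis_depth(points, data):
    """Depth (3) of each row of `points` with respect to the rows of `data`."""
    data = np.asarray(data, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    center = data.mean(axis=0)                       # assumed estimate of mu_F
    cov = np.atleast_2d(np.cov(data, rowvar=False))  # assumed estimate of Sigma_F
    diff = points - center
    # Squared Mahalanobis distance; solve() avoids forming the inverse explicitly.
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return 1.0 / (1.0 + d2)

# Hypothetical toy data: corners of a square with centre (1, 1).
square = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
depths = mahalanobis_depth([[1.0, 1.0], [2.0, 2.0]], square)
```

The depth equals 1 at the estimated center and decreases as the squared Mahalanobis distance grows.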
The halfspace depth or Tukey depth at a point $x \in \mathbb{R}^p$ with respect to $F$ is defined to be
$$\mathrm{HSD}(x; F) = \inf_{u \in \mathbb{R}^p} \Pr\nolimits_F(H_{x,u}) = \inf_{\|u\| = 1} \Pr\nolimits_F(H_{x,u}) \tag{4}$$
where $H_{x,u} = \{y \in \mathbb{R}^p : u^T y \le u^T x\}$ is a closed halfspace. Note that the empirical version
of the halfspace depth is obtained by simply replacing $F$ by $F_n$:
$$\mathrm{HSD}(x; F_n) = \min_{\|u\| = 1} \frac{\#\{i : u^T X_i \le u^T x,\ i = 1, 2, \ldots, n\}}{n}. \tag{5}$$
In other words, for p-variate data, the empirical halfspace depth of a given point $x$ corresponds
to the minimal proportion of data points contained in a closed halfspace whose
boundary, a $(p-1)$-dimensional hyperplane, passes through $x$.
Finally, the simplicial depth (Liu 1990) of a given point $x \in \mathbb{R}^p$ with respect to $F$ is defined
to be
$$\mathrm{SD}(x; F) = \Pr\nolimits_F\{S[X_1, X_2, \ldots, X_{p+1}] \ni x\} \tag{6}$$
where $S[X_1, X_2, \ldots, X_{p+1}]$ is the closed simplex whose vertices $X_1, X_2, \ldots, X_{p+1}$ are $p + 1$
randomly chosen observations from $F$. The sample version of $\mathrm{SD}(x; F)$ is defined by replacing
$F$ by $F_n$ or by computing the fraction of simplices containing $x$, i.e.,
$$\mathrm{SD}(x; F_n) = \binom{n}{p+1}^{-1} \sum_{*} I\big(S[X_{i_1}, X_{i_2}, \ldots, X_{i_{p+1}}] \ni x\big). \tag{7}$$
Here the sum $\sum_{*}$ runs over all possible subsets of $\{X_1, X_2, \ldots, X_n\}$ of size $p + 1$, and $I(\cdot)$ is the
indicator function.
Now given a depth function $D(\cdot\,; \cdot)$, one can compute the depths of the sample points $X_1, \ldots, X_n$
with respect to a given (either theoretical or empirical) distribution $F$, that is
$$D(X_1; F), \ldots, D(X_n; F),$$
and order them according to decreasing depth values. This gives a ranking of the sample
points from the center outward. More precisely, for a sample point $X_i$,
$$R_i = R(X_i) = \#\{X_j : D(X_j; F_n) \ge D(X_i; F_n),\ j = 1, \ldots, n\}$$
is the center-outward rank of $X_i$ with respect to the data cloud $\{X_1, X_2, \ldots, X_n\}$. The
implication of this ranking procedure is that a larger rank is associated with a more outlying
point within the data cloud.
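The ranking rule above can be sketched directly from its definition. The helper name `center_outward_ranks` is an assumption of this example, and the quadratic-time counting is chosen for transparency rather than speed.

```python
def center_outward_ranks(depths):
    """R_i = #{j : D(X_j) >= D(X_i)}, so rank 1 is the deepest point."""
    return [sum(1 for d_j in depths if d_j >= d_i) for d_i in depths]

# Hypothetical depth values for four points.
ranks = center_outward_ranks([0.9, 0.5, 0.7, 0.1])
```

Here the point with depth 0.9 is ranked first and the point with depth 0.1 last, matching the statement that larger ranks correspond to more outlying points.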
3 Multivariate scale tests
Let $\mathcal{X} = \{X_1, X_2, \ldots, X_{n_1}\}$ and $\mathcal{Y} = \{Y_1, Y_2, \ldots, Y_{n_2}\}$ be two independent random samples
from p-variate absolutely continuous distributions $F$ and $G$ respectively. Consider a
depth function $D(\cdot\,; \cdot)$. We assume that the distributions $F$ and $G$ have the same center
(deepest point) with respect to this depth function, but possibly different scales. We are
interested in testing
$$\begin{aligned} H_0 &: F(x) = G(x) \text{ for all } x \in \mathbb{R}^p \\ H_A &: \text{there exists } \sigma = (\sigma_1, \ldots, \sigma_p) \ne \mathbf{1} \text{ such that } F(x) = G(\sigma x) \text{ for all } x \in \mathbb{R}^p, \end{aligned} \tag{8}$$
where $\sigma x = (\sigma_1 x_1, \ldots, \sigma_p x_p)$. Note that if we let $\Sigma_1$ and $\Sigma_2$ be the covariance matrices of
$F$ and $G$ respectively, the hypothesis can be equivalently written as
$$H_0 : \Sigma_1 = \Sigma_2 \quad \text{versus} \quad H_A : \Sigma_1 - \Sigma_2 \text{ semidefinite}.$$
In the univariate case (p = 1), one may use the Siegel-Tukey or Ansari-Bradley tests,
which are strictly distribution free. The multivariate extension of the Siegel-Tukey test is no
different from the univariate case except that we use a depth function to order the combined
sample in a center-outward manner. The rationale for the test is that if two random samples
are drawn from two distributions that differ only in scale and then combined, we would
expect the sample from the distribution with smaller scale to be scattered tightly around
the center, while observations from the distribution with larger scale would tend to occupy
more outlying positions. The notion of data depth provides a center-outward ranking which
is suitable for capturing this difference using a rank-sum test.
More precisely, let $R_i$ be the linear rank of $D(X_i; H_n)$ in the set
$$\{D(X_1; H_n), \ldots, D(X_{n_1}; H_n), D(Y_1; H_n), \ldots, D(Y_{n_2}; H_n)\},$$
where $H_n$ is the empirical distribution of the dataset $\mathcal{Z} = \mathcal{X} \cup \mathcal{Y}$. Note that $R_i$, $i = 1, \ldots, n_1$, is
the center-outward rank of $X_i$ within the pooled dataset $\mathcal{Z}$. A natural multivariate extension
of the Siegel-Tukey test is based on the Wilcoxon-type rank-sum statistic $W = \sum_{i=1}^{n_1} R_i$. Under
the null hypothesis, assuming no ties, the $R_i$’s are identically and uniformly distributed on the
set $\{1, 2, \ldots, n\}$. This implies that under $H_0$, $W$ is distribution free. We reject the null
hypothesis if $W$ is large or small enough. For small sample sizes the distribution of $W$ can
be tabulated, but for large samples it is computationally expensive if not impossible. Note
that under $H_0$
$$E_{H_0}(R_i) = \frac{n + 1}{2}, \quad \mathrm{Var}_{H_0}(R_i) = \frac{n^2 - 1}{12}, \quad \mathrm{Cov}_{H_0}(R_i, R_j) = -\frac{n + 1}{12}, \quad i \ne j. \tag{9}$$
Hence
$$E_{H_0}(W) = \frac{n_1 (n + 1)}{2}, \quad \mathrm{Var}_{H_0}(W) = \frac{n_1 n_2 (n + 1)}{12} \tag{10}$$
and as $n \to \infty$
$$\frac{W - E_{H_0}(W)}{\sqrt{\mathrm{Var}_{H_0}(W)}} \xrightarrow{D} N(0, 1). \tag{11}$$
So for large sample sizes we can use quantiles of $N(0, 1)$ to test $H_0$.
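The large-sample procedure in (10) and (11) can be sketched as follows. The function name `depth_rank_sum_test` is an assumption of this example; the depths are assumed to have been computed already with respect to the pooled sample, and ties are assumed absent.

```python
import math

def depth_rank_sum_test(depths_x, depths_y):
    """Return (W, z, two-sided p-value) from depths taken in the pooled sample."""
    n1, n2 = len(depths_x), len(depths_y)
    n = n1 + n2
    pooled = list(depths_x) + list(depths_y)
    # Linear (ascending) rank of each pooled depth value; assumes no ties.
    order = sorted(range(n), key=lambda k: pooled[k])
    rank = [0] * n
    for position, k in enumerate(order, start=1):
        rank[k] = position
    w = sum(rank[:n1])                    # rank sum of the first sample
    mean = n1 * (n + 1) / 2               # E(W) in (10)
    var = n1 * n2 * (n + 1) / 12          # Var(W) in (10)
    z = (w - mean) / math.sqrt(var)       # standardized statistic in (11)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return w, z, p_value

# Hypothetical depths: the first sample sits closer to the center.
w, z, p = depth_rank_sum_test([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
```

With samples this small the normal approximation is only indicative; the exact null distribution would normally be preferred.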
4 A modified multivariate SiegelTukey test
Gastwirth (1965) developed a method for improving the univariate Ansari-Bradley test. In
this section we adopt Gastwirth’s idea to improve the power of the depth-based multivariate
Siegel-Tukey test. As illustrated in Figure 1 for p = 2, we discard those points whose
center-outward rank is above the rth percentile, 0 < r ≤ 1. That is, we ignore the deepest
100(1 − r)% of the data when computing the test statistic.
For $i = 1, \ldots, n$, define $\delta_i = 1$ if an observation from the dataset $\mathcal{X}$ has the center-outward
rank of $i$ within the combined dataset $\mathcal{Z} = \mathcal{X} \cup \mathcal{Y}$, and $\delta_i = 0$ otherwise.
Figure 1: Combined sample before (left panel) and after (right panel) removing the 30% deepest points (r = 0.7)
The retained sample is denoted as $\mathcal{Z}' = \{X'_1, \ldots, X'_{n'_1}, Y'_1, \ldots, Y'_{n'_2}\}$, where $n'_1 = \sum_{i=1}^{n'} \delta_i$
and $n'_2 = n' - n'_1$ are the random variables representing the number of points retained from
$\mathcal{X}$ and $\mathcal{Y}$ respectively. Note that $n'$, the total number of points retained, is fixed.
We define a new test statistic
$$V_{n'} = \sum_{i=1}^{n'} [n' + 1 - i]\, \delta_i.$$
That is, $V_{n'}$ is the sum of the reversed ranks of the retained portion of the dataset $\mathcal{X}$. The
ranks are reversed so that the points further from the center receive higher weight. We reject
the null hypothesis if $V_{n'}$ is large or small enough.
Under $H_0$, assuming no ties, the $R_i$’s are identically and uniformly distributed on the set
$\{1, 2, \ldots, n\}$. This implies that under $H_0$, $V_{n'}$ is distribution free. For small sample sizes
$n_1, n_2$, the exact distribution of $V_{n'}$, for any given $n'$, can be tabulated. This is done by
enumerating all $\binom{n}{n_1}$ possible combinations of the ranks of sample $\mathcal{X}$, computing $V_{n'}$ on each
of them (for the given $n'$), and counting the number of times $V_{n'}$ takes on each integer value.
By way of example, for $n_1 = n_2 = 5$, the exact distribution of $V_7$ is given by
$$\Pr(V_7 = k) = \begin{cases}
1 / \binom{10}{5} & k = 3, 4, 24, 25 \\
2 / \binom{10}{5} & k = 5, 23 \\
5 / \binom{10}{5} & k = 6, 22 \\
6 / \binom{10}{5} & k = 7, 21 \\
9 / \binom{10}{5} & k = 8, 20 \\
12 / \binom{10}{5} & k = 9, 19 \\
17 / \binom{10}{5} & k = 10, 11, 17, 18 \\
22 / \binom{10}{5} & k = 12, 13, 15, 16 \\
24 / \binom{10}{5} & k = 14 \\
0 & \text{otherwise}
\end{cases} \tag{12}$$
Note that the distribution of $V_{n'}$ is not always symmetric; this is true only if $n_1$ and $n_2$ are
either both less than $n'$ or both greater than $n'$.
The support of $V_{n'}$ is given by $k = k_{\min}, \ldots, k_{\max}$, where
$$k_{\min} = \frac{(n' - n_2)(n' - n_2 + 1)}{2}\, I(n_2 < n'),$$
$$k_{\max} = \frac{n'(n' + 1)}{2} - \frac{(n' - n_1)(n' - n_1 + 1)}{2}\, I(n_1 < n').$$
This exhaustive approach to computing the exact distribution quickly becomes computationally
burdensome as n1 and n2 increase. It is only feasible up to around n1 = n2 = 10.
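The exhaustive enumeration just described can be sketched in a few lines. The names `exact_null_counts` and `n_ret` (standing for n′) are assumptions of this example; the function returns how many of the rank combinations give each value of the statistic, so dividing by the total number of combinations gives the exact null probabilities.

```python
from collections import Counter
from itertools import combinations
from math import comb

def exact_null_counts(n1, n2, n_ret):
    """Count, for each value k, the rank combinations of sample X with V_{n'} = k."""
    n = n1 + n2
    # Score attached to rank i: n' + 1 - i if the rank is retained, 0 otherwise.
    scores = [n_ret + 1 - i if i <= n_ret else 0 for i in range(1, n + 1)]
    counts = Counter(
        sum(scores[i] for i in chosen) for chosen in combinations(range(n), n1)
    )
    return dict(sorted(counts.items()))

counts = exact_null_counts(5, 5, 7)  # the n1 = n2 = 5, n' = 7 example
total = sum(counts.values())         # should equal comb(10, 5)
```

For n1 = n2 = 5 and n′ = 7 this enumeration runs over 252 combinations and its smallest and largest values agree with the support bounds 3 and 25 given by the formulas for k_min and k_max.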
We shall develop a large sample analysis for the test statistic $V_{n'}$ under the null hypothesis.
Note that $V_{n'}$ can be written in the form of a simple linear rank statistic $V_{n'} = \sum_{i=1}^{n} c_i\, a(i)$,
where for $i = 1, \ldots, n$, $a(i) = \delta_i$ and
$$c_i = \begin{cases} n' + 1 - i & \text{if } i \in \{1, \ldots, n'\} \\ 0 & \text{otherwise.} \end{cases}$$
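Thus,
$$\bar{a} = \frac{1}{n} \sum_{i=1}^{n} \delta_i = \frac{n_1}{n},$$
$$\bar{c} = \frac{1}{n} \sum_{i=1}^{n} c_i = \frac{1}{n} \sum_{i=1}^{n'} (n' + 1 - i) = \frac{n'(n' + 1)}{2n},$$
$$\sigma_a^2 = \frac{1}{n} \sum_{i=1}^{n} (a(i) - \bar{a})^2 = \frac{1}{n} \Big[ \sum_{i=1}^{n} a(i)^2 - n \bar{a}^2 \Big] = \frac{1}{n} \Big[ n_1 - \frac{n_1^2}{n} \Big] = \frac{n_1}{n} \Big( 1 - \frac{n_1}{n} \Big),$$
$$\begin{aligned} \sigma_c^2 &= \frac{1}{n} \sum_{i=1}^{n} (c_i - \bar{c})^2 = \frac{1}{n} \Big[ \sum_{i=1}^{n} c_i^2 - n \bar{c}^2 \Big] = \frac{1}{n} \Big[ \sum_{i=1}^{n'} (n' + 1 - i)^2 - n \bar{c}^2 \Big] \\ &= \frac{1}{n} \Big[ \frac{n'(n' + 1)(2n' + 1)}{6} - \frac{n'^2 (n' + 1)^2}{4n} \Big] = \frac{n'(n' + 1)}{2n} \Big[ \frac{2n' + 1}{3} - \frac{n'(n' + 1)}{2n} \Big]. \end{aligned}$$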
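Now from Theorem c, page 61 of Hájek and Šidák (1967) or Lemma 1, page 78 of Ferguson
(1996) we conclude that
$$E_{H_0}(V_{n'}) = n\, \bar{c}\, \bar{a} = n\, \frac{n_1}{n}\, \frac{n'(n' + 1)}{2n} = \frac{n_1\, n'(n' + 1)}{2n},$$
$$\begin{aligned} \mathrm{Var}_{H_0}(V_{n'}) &= \frac{n^2}{n - 1}\, \sigma_a^2\, \sigma_c^2 = \frac{n^2}{n - 1}\, \frac{n_1}{n} \Big( 1 - \frac{n_1}{n} \Big) \frac{n'(n' + 1)}{2n} \Big[ \frac{2n' + 1}{3} - \frac{n'(n' + 1)}{2n} \Big] \\ &= \frac{n_1\, n'(n' + 1)}{2(n - 1)} \Big( 1 - \frac{n_1}{n} \Big) \Big[ \frac{2n' + 1}{3} - \frac{n'(n' + 1)}{2n} \Big]. \end{aligned} \tag{13}$$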
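As a quick numerical check of the null mean and variance in (13), the sketch below evaluates both closed forms. The function name `null_moments` and the argument name `n_ret` (for n′) are assumptions of this example.

```python
def null_moments(n1, n2, n_ret):
    """Null mean and variance of V_{n'} from the closed forms in (13)."""
    n = n1 + n2
    s = n_ret * (n_ret + 1)
    mean = n1 * s / (2 * n)
    var = (
        n1 * s / (2 * (n - 1))
        * (1 - n1 / n)
        * ((2 * n_ret + 1) / 3 - s / (2 * n))
    )
    return mean, var

mean_7, var_7 = null_moments(5, 5, 7)          # the n1 = n2 = 5, n' = 7 example
mean_full, var_full = null_moments(5, 5, 10)   # r = 1, i.e. no points discarded
```

For n1 = n2 = 5 and n′ = 7 the mean is 14, the centre of the exact distribution (12), and the variance is 154/9, which is also what a direct calculation from (12) gives. With n′ = n the formulas reduce to the moments of W in (10).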
Theorem 1 Under the null hypothesis, if $n_1/n \to \lambda_1$ as $n$ and $n_1 \to \infty$ with $0 < \lambda_1 < 1$,
then the test statistic
$$T_n = \frac{V_{n'} - E_{H_0}(V_{n'})}{\sqrt{\mathrm{Var}_{H_0}(V_{n'})}}$$
has an asymptotic $N(0, 1)$ distribution.
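Proof: First note that since the function $f(x) = (c - x)^2$ defined on the interval $[a, b]$ attains
its maximum in the set of boundary points $\{a, b\}$, we conclude that
$$\begin{aligned} \max_{i = 1, \ldots, n} (c_i - \bar{c})^2 &= \max\Big\{ \max_{i = 1, \ldots, n'} (c_i - \bar{c})^2,\ \bar{c}^2 \Big\} \\ &= \max\Big\{ \max_{i = 1, \ldots, n'} \Big( n' + 1 - i - \frac{n'(n' + 1)}{2n} \Big)^2,\ \Big( \frac{n'(n' + 1)}{2n} \Big)^2 \Big\} \\ &= \max\Big\{ \Big( n' - \frac{n'(n' + 1)}{2n} \Big)^2,\ \Big( 1 - \frac{n'(n' + 1)}{2n} \Big)^2,\ \Big( -\frac{n'(n' + 1)}{2n} \Big)^2 \Big\} \\ &= \Big( n' - \frac{n'(n' + 1)}{2n} \Big)^2 \end{aligned} \tag{14}$$
so clearly
$$\frac{\max_{i = 1, \ldots, n} (c_i - \bar{c})^2}{\sum_{i=1}^{n} (c_i - \bar{c})^2} = \frac{\Big( n' - \frac{n'(n' + 1)}{2n} \Big)^2}{n\, \frac{n'(n' + 1)}{2n} \Big[ \frac{2n' + 1}{3} - \frac{n'(n' + 1)}{2n} \Big]}$$
is bounded. In addition
$$\frac{\max_{i = 1, \ldots, n} (a(i) - \bar{a})^2}{\sum_{i=1}^{n} (a(i) - \bar{a})^2} = \frac{\max\Big\{ \frac{n_1^2}{n^2},\ \frac{n_2^2}{n^2} \Big\}}{\frac{n_1 n_2}{n}} \to 0 \tag{15}$$
as $n \to \infty$ and $n_1/n \to \lambda_1$ with $0 < \lambda_1 < 1$. Thus Theorem 12, page 82 of Ferguson (1996)
implies that
$$T_n = \frac{V_{n'} - E_{H_0}(V_{n'})}{\sqrt{\mathrm{Var}_{H_0}(V_{n'})}}$$
has an asymptotic $N(0, 1)$ distribution.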
It is worth noting that for values of n1, n2 > 10 which are still too small to use the normal
approximation, we can approximate the exact probabilities under Ho by sampling from the
population of all possible combinations.
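One way to carry out this sampling is sketched below, assuming that under the null hypothesis every assignment of n1 of the n ranks to the first sample is equally likely. The helper names, the argument `n_ret` (for n′) and the two-sided p-value convention are assumptions of this example.

```python
import random

def sample_null_statistic(n1, n2, n_ret, draws, seed=0):
    """Draw values of V_{n'} under H0 by sampling rank subsets at random."""
    rng = random.Random(seed)
    n = n1 + n2
    # Score attached to rank i: n' + 1 - i if the rank is retained, 0 otherwise.
    scores = [n_ret + 1 - i if i <= n_ret else 0 for i in range(1, n + 1)]
    # rng.sample picks n1 distinct rank positions without replacement.
    return [sum(rng.sample(scores, n1)) for _ in range(draws)]

def monte_carlo_p_value(observed, null_values):
    """Share of null draws at least as far from their mean as the observed value."""
    center = sum(null_values) / len(null_values)
    extreme = sum(
        1 for v in null_values if abs(v - center) >= abs(observed - center)
    )
    return extreme / len(null_values)

null_values = sample_null_statistic(12, 12, 17, draws=20_000, seed=1)
p_extreme = monte_carlo_p_value(138, null_values)  # 138 is the largest possible value here
```

The number of draws controls the Monte Carlo error of the approximated probabilities, so it should be chosen with the required precision of the p-value in mind.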
5 Empirical Results
Before we assess the effectiveness of the percentile modification in terms of power gains, we
give a short overview of two parametric competitors to the depth-based tests discussed in
the previous two sections. These tests are used in the simulations along with the multivariate
Siegel-Tukey test and its percentile modification.
Under the assumption of normality, i.e. when $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ are independent
samples from p-variate normal distributions $N(\mu_1, \Sigma_1)$ and $N(\mu_2, \Sigma_2)$ respectively, Box
(1949) proposed a modified likelihood ratio test for testing $H_0 : \Sigma_1 = \Sigma_2$ versus $H_A :
\Sigma_1 \ne \Sigma_2$. This test is known as Box’s M test. Its test statistic $M$ follows an asymptotic $\chi^2$
distribution with $p(p + 1)/2$ degrees of freedom under $H_0$, and is constructed as follows:
$$M = c \big[ (n_1 + n_2 - 2) \log |S_{pooled}| - (n_1 - 1) \log |S_1| - (n_2 - 1) \log |S_2| \big],$$
where the correction factor $c$ is
$$c = 1 - \frac{2p^2 + 3p - 1}{6(p + 1)} \Big( \frac{1}{n_1 - 1} + \frac{1}{n_2 - 1} - \frac{1}{n_1 + n_2 - 2} \Big)$$
and furthermore
$$S_1 = \frac{1}{n_1 - 1} \sum_{i=1}^{n_1} (X_i - \bar{X})(X_i - \bar{X})^T, \quad S_2 = \frac{1}{n_2 - 1} \sum_{i=1}^{n_2} (Y_i - \bar{Y})(Y_i - \bar{Y})^T$$
are unbiased estimators of $\Sigma_1$, $\Sigma_2$ respectively and
$$S_{pooled} = \frac{(n_1 - 1) S_1 + (n_2 - 1) S_2}{n_1 + n_2 - 2}$$
is the pooled, unbiased estimator of the common covariance matrix under $H_0$.
Box’s M test is affine invariant and is easy to compute, but it is very sensitive to departures
from the normality assumption.
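A minimal sketch of the two-sample Box’s M statistic is given below, assuming NumPy is available. It uses the standard two-group form of Box’s correction factor, with 6(p + 1) in the denominator; the function name `box_m_statistic` is an assumption of this example, and the p-value would be obtained by referring M to a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def box_m_statistic(x, y):
    """Return (M, degrees of freedom) for two samples with rows as observations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, p = x.shape
    n2 = y.shape[0]
    s1 = np.atleast_2d(np.cov(x, rowvar=False))  # unbiased covariance estimates
    s2 = np.atleast_2d(np.cov(y, rowvar=False))
    s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    def logdet(s):
        # slogdet returns (sign, log|det|); the log-determinant is element 1.
        return np.linalg.slogdet(s)[1]

    c = 1 - (2 * p**2 + 3 * p - 1) / (6 * (p + 1)) * (
        1 / (n1 - 1) + 1 / (n2 - 1) - 1 / (n1 + n2 - 2)
    )
    m = c * (
        (n1 + n2 - 2) * logdet(s_pooled)
        - (n1 - 1) * logdet(s1)
        - (n2 - 1) * logdet(s2)
    )
    return m, p * (p + 1) // 2

rng = np.random.default_rng(0)
sample = rng.normal(size=(30, 3))               # hypothetical data
m_same, df = box_m_statistic(sample, sample.copy())
m_scaled, _ = box_m_statistic(sample, 2.0 * sample)
```

Identical samples give M = 0, while rescaling one sample inflates M, which is the behaviour the test exploits.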
Under the same assumptions as Box’s M test, Liu & Singh (2006, p. 26) used Theorem
3.4.8 from Mardia, Kent & Bibby (1979) to develop a likelihood ratio test for the hypothesis
$$H_0 : |\Sigma_1| = |\Sigma_2| \quad \text{versus} \quad H_A : |\Sigma_1| < |\Sigma_2|,$$
where $|A|$ is the determinant of the matrix $A$. The test statistic is
$$F_{product} = \Big( \frac{n_1 - 1}{n_2 - 1} \Big)^{p-1} \frac{\prod_{i=1}^{p-1} (n_2 - i - 1)}{\prod_{i=1}^{p-1} (n_1 - i - 1)}\, \frac{|S_1|}{|S_2|},$$
which reduces to $|S_1| / |S_2|$ if $n_1 = n_2$. Under $H_0$, $F_{product}$ follows the same distribution
as $(\tau_1 \tau_2 \cdots \tau_p)$, where $\tau_i \sim F(n_2 - i, n_1 - i)$ and the $\tau_i$’s are independent of one another.
As the null distribution of $F_{product}$ is complicated, quantiles are best computed empirically.
This is easily done by sampling independently from each of the necessary F distributions,
multiplying these samples to obtain a sample from the distribution of $F_{product}$, repeating
until there are a large number of $F_{product}$ samples, and taking the quantiles of the empirical
distribution of these samples. This was done for the sample sizes and dimensions we used
for our simulations, and the resulting quantiles can be viewed in Table 1.
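The simulation recipe just described can be sketched with the standard library alone. The function names, the default number of draws and the seed are assumptions of this example; each F variate is built as a ratio of two scaled chi-square variates, which are drawn as gamma variates.

```python
import math
import random

def f_variate(rng, d1, d2):
    """One draw from F(d1, d2); chi-square(k) is Gamma(shape=k/2, scale=2)."""
    numerator = rng.gammavariate(d1 / 2, 2.0) / d1
    denominator = rng.gammavariate(d2 / 2, 2.0) / d2
    return numerator / denominator

def f_product_quantile(n1, n2, p, prob=0.95, draws=100_000, seed=1):
    """Empirical quantile of tau_1 * ... * tau_p with tau_i ~ F(n2 - i, n1 - i)."""
    rng = random.Random(seed)
    products = []
    for _ in range(draws):
        value = 1.0
        for i in range(1, p + 1):
            value *= f_variate(rng, n2 - i, n1 - i)
        products.append(value)
    products.sort()
    index = max(0, math.ceil(prob * draws) - 1)  # order statistic for the quantile
    return products[index]

q_p2_n50 = f_product_quantile(50, 50, 2, draws=40_000)
```

For n1 = n2 = 50 and p = 2 the result should land near the corresponding entry of Table 1, up to Monte Carlo error.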
          n1 = n2 = 50   n1 = n2 = 100   n1 = n2 = 200
p = 2        1.9619          1.6006          1.3916
p = 5        2.9631          2.1207          1.6899
p = 10       4.8576          2.9328          2.1140
Table 1: Selected upper 0.05 quantiles of F product test statistic
In our empirical studies in this section, we use the multivariate normal N(µ, Σ),
multivariate Student’s t, multivariate Cauchy and multivariate double-exponential (Laplace)
distributions. As there are several versions of multivariate Student’s t, Cauchy, and
double-exponential distributions, we define the appropriate version of these distributions used in
this section.
Definition 1 A p-dimensional random vector $X$ is said to follow a multivariate Student’s
t-distribution with degrees of freedom $\nu$, location vector $\mu$ and covariance matrix $\Sigma$ if it has
probability density function
$$f(x; \nu, \mu, \Sigma) = \frac{\Gamma\big(\frac{\nu + p}{2}\big)}{\Gamma(\nu/2)\, (\nu\pi)^{p/2}\, |\Sigma|^{1/2} \big[ 1 + \frac{1}{\nu} (x - \mu)^T \Sigma^{-1} (x - \mu) \big]^{(\nu + p)/2}}$$
(cf. Genz & Bretz, 1999). If $\nu = 1$, $X$ is said to follow a multivariate Cauchy distribution.
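Definition 2 A p-dimensional random vector $X$ is said to follow a multivariate Laplace (or
multivariate double exponential) distribution with location vector $\mu$ and covariance matrix $\Sigma$
if it has probability density function
$$f(x; \mu, \Sigma) = \frac{2\, e^{x^T \Sigma^{-1} \mu}}{(2\pi)^{p/2}\, |\Sigma|^{1/2}} \Big( \frac{x^T \Sigma^{-1} x}{2 + \mu^T \Sigma^{-1} \mu} \Big)^{\upsilon/2} K_\upsilon\Big( \sqrt{(2 + \mu^T \Sigma^{-1} \mu)(x^T \Sigma^{-1} x)} \Big)$$
where $\upsilon = (2 - p)/2$ and $K_\upsilon(\cdot)$ is the modified Bessel function of the third kind given by
$$K_\lambda(u) = \frac{1}{2} \Big( \frac{u}{2} \Big)^{\lambda} \int_0^\infty t^{-\lambda - 1} \exp\Big( -t - \frac{u^2}{4t} \Big)\, dt, \quad u > 0$$
(cf. Kotz, Kozubowski & Podgorski, 2002).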
5.1 Assessing the effectiveness of the percentile modification in
terms of power gains
Gastwirth used A.R.E. as a criterion for comparing his modified univariate rank-sum test to
other tests. As A.R.E. is difficult to estimate empirically in this case, we instead consider
the Monte Carlo estimates of the power and type I error.
Denote the power (here a function of r) by $\pi(r)$. We generate two independent
samples from a given distribution with equal location vectors but with covariance matrices
which differ by a scale change.
We then estimate the power of our percentile-modified multivariate Siegel-Tukey test by
$$\hat{\pi}(r) = \frac{1}{I} \sum_{i=1}^{I} I(W_{n', i} > z_{1 - \alpha/2}),$$
where $W_{n', i}$ is the value of the test statistic on the ith simulated pair of samples and the
upper limit $I$ is the number of Monte Carlo iterations.
If it is instead desired to estimate the type I error, we generate two iid samples and use
the above formula again. Similar estimates may be obtained for Box’s M test and the
F product test.
We also compute the sample standard error of the power estimate (or type I error estimate)
to be
$$\sqrt{\frac{1}{I} \sum_{i=1}^{I} \big[ I(W_{n', i} > z_{1 - \alpha/2}) - \hat{\pi}(r) \big]^2}$$
where $I(\cdot)$ is again the indicator function.
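The Monte Carlo summary can be sketched as below, given a list of 0/1 rejection indicators from the simulated data sets. The function name `power_estimate` is an assumption of this example. The last returned quantity, the spread divided by the square root of the number of iterations, is the usual standard error of a Monte Carlo proportion and is included here as an assumption about what is needed to judge the accuracy of the estimate.

```python
import math

def power_estimate(rejections):
    """rejections[i] is 1 if the test rejected on the ith simulated data set, else 0."""
    iterations = len(rejections)
    pi_hat = sum(rejections) / iterations
    # Root mean squared deviation of the indicators around pi_hat.
    spread = math.sqrt(sum((x - pi_hat) ** 2 for x in rejections) / iterations)
    # Assumption: Monte Carlo standard error of pi_hat as a mean of indicators.
    std_error = spread / math.sqrt(iterations)
    return pi_hat, spread, std_error

pi_hat, spread, std_error = power_estimate([1, 0, 0, 1])  # hypothetical indicators
```

The same function gives a type I error estimate when the indicators come from data simulated under the null hypothesis.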
We obtained estimates (as explained above) for the power and type I error of the
percentile-modified multivariate Siegel-Tukey test for r = 0.1, 0.2, . . . , 1, where r = 1 represents the
unmodified multivariate Siegel-Tukey test. We also obtained estimates for the power and
type I error of Box’s M test and the F product test using the same simulated data (see
Section 5.2). We report the simulation results only for Mahalanobis depth. Other depths
yielded similar results.
Figures 2-16 below give the estimated power of the modified multivariate Siegel-Tukey
test as a function of r using data from five different multivariate distributions. There is one
graph for each of p = 2, 5, 10 and within each graph, the sample sizes of n1 = n2 = 50, 100
and 200 are represented by dotted, dashed and solid lines respectively.
All estimates are based on I = 10^5 iterations. Standard error estimates indicate the
power and type I error estimates are correct to at least three decimal places. This is enough
accuracy to establish the significance of the trends observed in the plots.
Multivariate Normal
Figure 2: p = 2 Figure 3: p = 5 Figure 4: p = 10
Multivariate t(10)
Figure 5: p = 2 Figure 6: p = 5 Figure 7: p = 10
Multivariate t(3)
Figure 8: p = 2 Figure 9: p = 5 Figure 10: p = 10
Multivariate Cauchy (t(1))
Figure 11: p = 2 Figure 12: p = 5 Figure 13: p = 10
Multivariate double-exponential (Laplace)
Figure 14: p = 2 Figure 15: p = 5 Figure 16: p = 10
The distribution of the first sample always had an identity covariance matrix; for the
second sample it was either again the identity matrix (to compute type I errors), or the
identity matrix multiplied by an appropriate scalar > 1.
Note that it is only meaningful to directly compare the results within each distribution, and
not between distributions, since the scale change of the second sample under the alternative
hypothesis was not the same for each distribution. Rather, the scale changes were weighted
as necessary for each distribution to ensure that for every combination of dimension and
sample size, the power was well above the target type I error of 0.05, but below 1 to allow
comparisons. The scale changes corresponding to the above distributions were 1.3, 1.3, 1.5,
2.2 and 1.6, respectively.
To make comparisons between distributions at least somewhat meaningful, the y-axis
viewing window for each plot ranges from the minimum power to the maximum power achieved
for that distribution.
As can be seen in the figures, for the normal, t(10) and Laplace distributions, slight gains
in power are made in dimension p = 2. Even slighter gains are maintained in 5 and 10
dimensions for the normal and Laplace cases, while losses are incurred in the t(10) case in
these dimensions.
Percentile modifications are harmful to the power for any p for the t(3) and Cauchy
distributions. In the Cauchy case, this is likely due to the indefiniteness of the moments,
which makes Mahalanobis depth a poor metric as it is not computing depth with respect to
an actual center.
In general it appears that the heavier-tailed the distribution, the less effective percentile
modifications are (although the Laplace distribution has greater kurtosis than the t(10)).
Intuitively, this may be because the purpose of the modification is to concentrate the test
on the tail, and a very heavy-tailed distribution will likely have plenty of points from both
samples in the tail in spite of a scale difference. Also, the Mahalanobis depths themselves
may be inaccurate in the presence of many outliers.
In increasing the sample size an obvious increase in power is observed across the board,
but not much change in the graphs’ trends in r. It seems that sample size makes little
difference to the effectiveness of percentile modifications (although this may not be true for
very small sample sizes).
To summarize, some gains in power are observed for certain distributions, but they are
small, especially in relation to what Gastwirth observed in one dimension. The gains seem
to disappear as dimensionality increases, which may reflect the decreasing “importance” of
the distributional center in high dimensions.
5.2 Comparison with other tests
Tables 2 and 3 below report the empirical type I errors of Box’s M test and the F product
test, and Tables 5-9 in the appendix compare the power of the modified multivariate
Siegel-Tukey test (empirically maximized with respect to r) with the power of Box’s M test
and the F product test for the same data.
Type I error estimates are reported before power estimates, because high power is of little
value if the type I error is also high.
The α value used in both cases was 5%; this means that, if the two samples are generated
with the same covariance matrix, we expect the empirical type I error to be close to 0.05. If
this is not the case, it implies that the test statistic does not really follow the specified null
distribution.
For the multivariate Siegel-Tukey and our modified Siegel-Tukey tests, the empirical type
I errors were invariably very close to the target value of 0.05, so they are not reported. This
is, of course, expected since the test statistic is distribution-free.
For Box’s M test, the empirical type I errors were much higher than 0.05 when the data
were not normally distributed. This was true to a lesser degree for the F product test.
Indeed, for t(3), Cauchy and Laplace distributed data, and for t(10) data in high dimensions,
the type I errors are so high as to make Box’s M test meaningless. Thus, without
even looking at power estimates it is safe to say that the multivariate Siegel-Tukey test is
vastly superior to Box’s M test if the normality assumption does not hold.

 p    n1 (= n2)   Normal   t(10)   t(3)    t(1) (Cauchy)   Double Exp.
 2       50       0.0424   0.109   0.543       0.974          0.284
 2      100       0.0462   0.123   0.640       0.990          0.307
 2      200       0.0481   0.133   0.718       0.997          0.323
 5       50       0.0409   0.175   0.859       1.000          0.593
 5      100       0.0455   0.213   0.940       1.000          0.647
 5      200       0.0484   0.236   0.975       1.000          0.671
10       50       0.0381   0.285   0.984       1.000          0.921
10      100       0.0434   0.380   0.998       1.000          0.955
10      200       0.0462   0.439   1.000       1.000          0.968
Table 2: Empirical type I errors of Box’s M test

 p    n1 (= n2)   Normal   t(10)    t(3)    t(1) (Cauchy)   Double Exp.
 2       50       0.0488   0.0927   0.251       0.432          0.163
 2      100       0.0509   0.0947   0.278       0.451          0.166
 2      200       0.0496   0.0969   0.303       0.450          0.170
 5       50       0.0499   0.115    0.284       0.449          0.206
 5      100       0.0487   0.123    0.317       0.464          0.212
 5      200       0.0492   0.126    0.342       0.461          0.215
10       50       0.0503   0.144    0.306       0.456          0.251
10      100       0.0504   0.154    0.340       0.474          0.261
10      200       0.0678   0.186    0.382       0.474          0.263
Table 3: Empirical type I errors of F product test
From Table 5, it appears that for normally distributed data, the multivariate Siegel-Tukey
test outperforms Box’s M test, but not the F product test, in terms of power. This
establishes, at least on an empirical basis, that the Siegel-Tukey test is superior to Box’s M
for both normal and (at least some) non-normal data. The F product test is to be preferred
of the three for normal data, as long as the one-sided hypothesis on covariance determinants
is sufficient for the application in question.
Even under some non-normal circumstances, the F product test may be preferred to the
Siegel-Tukey. For instance, when n1 = n2 = 50 and p = 2 for t(10) or Laplace data, the power
of the F product test is more than twice that of the optimal Siegel-Tukey. The type I errors
of the F product test for these two circumstances are roughly 0.09 and 0.16 respectively.
Thus, as long as it is understood that the actual type I error is somewhat higher than what
the quantiles would indicate, the F product may be the better test to use. However, if one
desires to fix the type I error, our test is preferable.
It should be noted that the estimated power and type I errors of the Box’s M and F
product tests under Cauchy data (for which moments are undefined) had larger standard
errors than the rest of the estimates; this is why they do not always increase as dimension
and/or sample size increase.
6 An example
A data set given by Johnson and Wichern (1988, pp. 261-262) is examined here using the Box’s M,
F product, W and V_{n′} tests. The data set originally was taken from Jolicoeur and Mosimann
(1960), who studied the relationship of size and shape for painted turtles. This data set
contains measurements on the carapaces of n1 = 24 female and n2 = 24 male turtles.
The marginal scatter plots of the variables in Figure 17 suggest location and scale differences
between groups. To apply a scale test, we first shift the data in order to have the same
center. A depth-depth plot (see Parelius et al. 1999) and chi-square quantile-quantile plots
(cf. Johnson and Wichern 1988), reported in Figures 18 and 19 respectively, suggest that the
assumption of multivariate normality is plausible.
We consider a test for H0 : σ = (1, 1, 1)^T, where σ is the trivariate scale vector discriminating
the two distributions. This is done using four different tests, namely Box’s M, F product,
multivariate Siegel-Tukey (MST) and its percentile-modified version (PMST). The p-values
are reported in the following table.

Box’s M    F product    MST        PMST with r = 0.6
0.00142    0.00013      0.00429    0.00127
Table 4: P-values of various multivariate scale tests for turtle data

All p-values suggest that the data do not support the
null hypothesis of equal scales, which can be seen in Figure 17. The lowest p-value is given by
F product, followed by PMST with r = 0.6, which is consistent with our simulations in Section
5.
Figure 17: Scatter plots of Length, Width, and Height versus observation index
Figure 18: Depth-depth plot of turtle data vs. simulated normal data (panels: female turtles, male turtles; horizontal axis: depths based on turtle data; vertical axis: depths based on simulated normal data)
Figure 19: Chi-square quantile-quantile plot of turtle data (panels: female turtles, male turtles; horizontal axis: squared Mahalanobis distances; vertical axis: chi-square quantiles with df = 3)
7 Conclusion and future work
In conclusion, the multivariate SiegelTukey test using Mahalanobis depth outperforms Box’s
M test in terms of power for normal data, and in terms of type I error for tdistributed data.
It is also preferable to the F product test for nonnormal, heavytailed data, particularly
in high dimensions.
Extending Gastwirth’s percentile modification approach to this test improves the power
for normal and Laplace data, especially in two dimensions but to a lesser extent in five and
ten. It improves the power for t(10) data in two dimensions. For all other cases tested, the
modification was counterproductive.
While the improvements were in no case comparable to those obtained by Gastwirth’s
univariate modification, in any situation where computational cost is not a major issue, and
the data are not too heavy-tailed, this modification is recommended to improve the power
of what is already a good test.
However, the failure of the modification for Cauchy (and t(3)) data provides an incentive
to investigate the effects of using robust estimates for $\mu_F$ and $\Sigma_F$. One possibility is to use
the reweighted minimum covariance determinant (RMCD) estimates (cf. Rousseeuw and
van Zomeren 1990). A fast algorithm to compute RMCD estimates is given by Rousseeuw
and Van Driessen (1999). This algorithm is implemented in standard statistical software
such as R.
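To make the role of the location and scatter estimates concrete, the following Python sketch computes Mahalanobis depth from whichever estimates are supplied. It assumes the usual definition, depth = 1 / (1 + d^2), where d^2 is the squared Mahalanobis distance. The function names are hypothetical, and only the classical sample mean and covariance are computed here; RMCD estimates obtained from a statistical package could be passed in their place.

```python
def sample_mean_cov(points):
    """Classical estimates: sample mean vector and unbiased sample covariance."""
    n, p = len(points), len(points[0])
    mu = [sum(x[j] for x in points) / n for j in range(p)]
    cov = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in points) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mu, cov


def solve(a, b):
    """Solve a y = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and non-singular (true for a proper scatter matrix).
    """
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        y[i] = (m[i][n] - sum(m[i][c] * y[c] for c in range(i + 1, n))) / m[i][i]
    return y


def mahalanobis_depth(x, mu, sigma):
    """Depth 1 / (1 + d^2), d^2 = (x - mu)' sigma^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    d2 = sum(di * yi for di, yi in zip(diff, solve(sigma, diff)))
    return 1.0 / (1.0 + d2)


# Toy bivariate sample (illustrative values, not the turtle data).
points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mu, sigma = sample_mean_cov(points)
print(round(mahalanobis_depth(mu, mu, sigma), 3))          # deepest point: 1.0
print(round(mahalanobis_depth((0.0, 0.0), mu, sigma), 3))  # corner point: 0.4
```

Because `mahalanobis_depth` takes the estimates as arguments, substituting robust estimates changes only the call, not the depth computation.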
There may be other ways to improve the modification; for instance, instead of discarding
a fixed proportion of deepest points, one could discard all points with a depth value greater
than some threshold. Of course, a test statistic in this case would no longer be
distribution-free.
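The contrast between the two discarding rules can be illustrated with a short Python sketch. The function names, the `fraction` and `threshold` parameters, and the depth values are all hypothetical and chosen only for illustration. With a fixed proportion the number of retained points depends on the sample size alone, whereas with a threshold it depends on the observed depth values themselves.

```python
def drop_deepest_fraction(depths, fraction):
    """Indices kept after discarding the deepest `fraction` of the points."""
    n_drop = int(fraction * len(depths))
    order = sorted(range(len(depths)), key=lambda i: depths[i])  # shallow to deep
    return sorted(order[:len(depths) - n_drop])


def drop_above_threshold(depths, threshold):
    """Indices kept after discarding every point with depth above `threshold`."""
    return [i for i, d in enumerate(depths) if d <= threshold]


# Two hypothetical sets of depth values of the same size.
depths_a = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
depths_b = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45]

for depths in (depths_a, depths_b):
    kept_fixed = drop_deepest_fraction(depths, fraction=0.5)
    kept_threshold = drop_above_threshold(depths, threshold=0.5)
    # Fixed proportion keeps 3 points both times; the threshold rule keeps 3, then 6.
    print(len(kept_fixed), len(kept_threshold))
```

Ties in depth are ignored in this sketch; a full implementation would need a rule for them.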
8 Appendix
p    Test                     n1 = n2 = 50      n1 = n2 = 100     n1 = n2 = 200
2    Basic Siegel-Tukey       0.195             0.352             0.616
     Modified Siegel-Tukey    0.213 (r = 0.5)   0.393 (r = 0.4)   0.671 (r = 0.4)
     Box’s M                  0.141             0.290             0.568
     F product                0.358             0.576             0.832
5    Basic Siegel-Tukey       0.451             0.760             0.968
     Modified Siegel-Tukey    0.459 (r = 0.6)   0.769 (r = 0.7)   0.970 (r = 0.7)
     Box’s M                  0.143             0.346             0.729
     F product                0.633             0.890             0.993
10   Basic Siegel-Tukey       0.736             0.967             1.000
     Modified Siegel-Tukey    0.736 (r = 0.9)   0.967 (r = 1.0)   1.000 (r = 0.8)
     Box’s M                  0.122             0.359             0.811
     F product                0.859             0.991             1.000
Table 5: Empirical power of all three tests for Normal data
p    Test                     n1 = n2 = 50      n1 = n2 = 100     n1 = n2 = 200
2    Basic Siegel-Tukey       0.171             0.299             0.538
     Modified Siegel-Tukey    0.179 (r = 0.6)   0.314 (r = 0.6)   0.561 (r = 0.6)
     Box’s M                  0.229             0.372             0.594
     F product                0.387             0.558             0.778
5    Basic Siegel-Tukey       0.324             0.581             0.871
     Modified Siegel-Tukey    0.326 (r = 0.9)   0.581 (r = 0.9)   0.871 (r = 1.0)
     Box’s M                  0.332             0.544             0.797
     F product                0.600             0.808             0.955
10   Basic Siegel-Tukey       0.462             0.768             0.971
     Modified Siegel-Tukey    0.462 (r = 1.0)   0.768 (r = 1.0)   0.971 (r = 1.0)
     Box’s M                  0.463             0.722             0.925
     F product                0.760             0.929             0.995
Table 6: Empirical power of all three tests for t(10) data
p    Test                     n1 = n2 = 50      n1 = n2 = 100     n1 = n2 = 200
2    Basic Siegel-Tukey       0.242             0.441             0.736
     Modified Siegel-Tukey    0.245 (r = 0.9)   0.443 (r = 0.9)   0.737 (r = 0.9)
     Box’s M                  0.652             0.783             0.883
     F product                0.554             0.665             0.769
5    Basic Siegel-Tukey       0.377             0.664             0.926
     Modified Siegel-Tukey    0.377 (r = 1.0)   0.664 (r = 1.0)   0.926 (r = 1.0)
     Box’s M                  0.915             0.976             0.995
     F product                0.688             0.790             0.873
10   Basic Siegel-Tukey       0.461             0.765             0.970
     Modified Siegel-Tukey    0.461 (r = 1.0)   0.765 (r = 1.0)   0.970 (r = 1.0)
     Box’s M                  0.993             1.000             1.000
     F product                0.785             0.872             0.938
Table 7: Empirical power of all three tests for t(3) data
p    Test                     n1 = n2 = 50      n1 = n2 = 100     n1 = n2 = 200
2    Basic Siegel-Tukey       0.247             0.432             0.685
     Modified Siegel-Tukey    0.258 (r = 0.8)   0.456 (r = 0.7)   0.727 (r = 0.7)
     Box’s M                  0.978             0.992             0.997
     F product                0.589             0.608             0.605
5    Basic Siegel-Tukey       0.398             0.677             0.924
     Modified Siegel-Tukey    0.398 (r = 1.0)   0.677 (r = 1.0)   0.924 (r = 1.0)
     Box’s M                  1.000             1.000             1.000
     F product                0.668             0.681             0.679
10   Basic Siegel-Tukey       0.478             0.777             0.972
     Modified Siegel-Tukey    0.478 (r = 1.0)   0.777 (r = 1.0)   0.972 (r = 1.0)
     Box’s M                  1.000             1.000             1.000
     F product                0.745             0.760             0.762
Table 8: Empirical power of all three tests for Cauchy data
p    Test                     n1 = n2 = 50      n1 = n2 = 100     n1 = n2 = 200
2    Basic Siegel-Tukey       0.259             0.471             0.773
     Modified Siegel-Tukey    0.272 (r = 0.6)   0.492 (r = 0.6)   0.791 (r = 0.6)
     Box’s M                  0.557             0.755             0.927
     F product                0.652             0.833             0.962
5    Basic Siegel-Tukey       0.376             0.658             0.919
     Modified Siegel-Tukey    0.391 (r = 0.6)   0.676 (r = 0.6)   0.929 (r = 0.6)
     Box’s M                  0.823             0.944             0.993
     F product                0.828             0.956             0.997
10   Basic Siegel-Tukey       0.436             0.731             0.956
     Modified Siegel-Tukey    0.458 (r = 0.6)   0.757 (r = 0.5)   0.969 (r = 0.5)
     Box’s M                  0.976             0.997             1.000
     F product                0.905             0.985             1.000
Table 9: Empirical power of all three tests for Laplace data
References
[1] Ansari, A.R., Bradley, R.A. (1960). Rank-Sum Tests for Dispersions. Ann Math Stat
31:1174-1189.
[2] Bartlett, M.S. (1937). Properties of Sufficiency and Statistical Tests. Proceedings of the
Royal Society of London, Series A, 160:268-282.
[3] Box, G.E.P. (1949). A General Distribution Theory for a Class of Likelihood Criteria.
Biometrika 36:317-346.
[4] Chenouri, S. (2004). Multivariate Robust Nonparametric Inference based on Data Depth.
Ph.D thesis, Dept. of Statistics & Actuarial Science, University of Waterloo, Canada.
[5] Chenouri, S. & Small, C.G. (2004). A Nonparametric Multivariate Multisample Test
Based on Data Depth.
[6] Conover, W.J. (1999). Practical Nonparametric Statistics. John Wiley & Sons, Inc.
[7] Donoho, D.L. (1982). Breakdown properties of multivariate location estimators. Ph.D
thesis, Dept. of Statistics, Harvard University.
[8] Ferguson, T.S. (1996). A course in large sample theory. Chapman and Hall, New York.
[9] Gao, Y. (2003). Data depth based on spatial rank. Statistics & Probability Letters
65:217-225.
[10] Gastwirth, J.L. (1965). Percentile Modifications of Two Sample Rank Tests. Journal of
the American Statistical Association, 60:1127-1141.
[11] Genz, A., Bretz, F. (1999). Numerical computation of multivariate t-probabilities with
application to power calculation of multiple contrasts. Journal of Statistical Computation
and Simulation 63:361-378.
[12] Hájek, J. and Šidák, Z. V. (1967). Theory of Rank Tests. Academic Press, New York.
[13] Hodges, J. L. (1955). A bivariate sign test. Ann. Math. Statist. 26, 523-527.
[14] Hotelling, H. (1929). Stability in competition. Econom. J. 39, 41-57.
[15] Johnson, R.A. and Wichern, D. W. (1988). Applied Multivariate Statistical Analysis.
Prentice-Hall, Englewood Cliffs, NJ.
[16] Jolicoeur, P. and Mosimann, J. E. (1960). Size and shape variation in the painted turtle:
A principal component analysis. Growth, 24, 339-354.
[17] Kotz, S., Kozubowski, T.J., Podgorski, K. (2002). An asymmetric multivariate Laplace
distribution. Technical Report No. 367, Department of Statistics and Applied Probability,
University of California at Santa Barbara.
[18] Liu, R.Y. (1990). On a notion of data depth based on random simplices. Ann. Statist.
18:405-414.
[19] Liu, R.Y., Singh, K. (1993). A Quality Index Based on Data Depth and Multivariate
Rank Tests. Journal of the American Statistical Association, 88:252-260.
[20] Liu, R.Y., Singh, K. (2006). Rank tests for multivariate scale difference based on data
depth. Data Depth: Robust Multivariate Analysis, Computational Geometry and
Applications, DIMACS Series, AMS, 17-36.
[21] Mahalanobis, P.C. (1936). On the generalized distance in statistics. Proc. Nat. Acad.
India, 12:49-55.
[22] Mardia, K.V., Kent, J.T., Bibby, J.M. (1979). Multivariate Analysis. Academic Press
Inc., New York.
[23] Oja, H. (1983). Descriptive statistics for multivariate distributions. Statist. Prob. Letters
1:327-333.
[24] Rousseeuw, P.J., Hubert, M. (1999). Regression Depth. Journal of the American
Statistical Association, 94:388-402.
[25] Rousseeuw, P. J. and van Zomeren, B. C. (1990). Unmasking multivariate outliers and
leverage points. Journal of the American Statistical Association, 85, 633-639.
[26] Serfling, R. (2002). A depth function and a scale curve based on spatial quantiles. In
Statistical Data Analysis Based On the L1-Norm and Related Methods (Y. Dodge, ed.),
pp. 25-38. Birkhäuser, Basel.
[27] Siegel, S., Tukey, J.W. (1960). A Nonparametric Sum of Ranks Procedure for Relative
Spread in Unpaired Samples. Journal of the American Statistical Association 55:429-445.
[28] Singh, K. (1991). A notion of majority depth. Preprint.
[29] Small, C.G. (1990). A Survey of Multidimensional Medians. International Statistical
Review 58:263-277.
[30] Stahel, W.A. (1981). Robuste Schätzungen: Infinitesimale Optimalität und Schätzungen
von Kovarianzmatrizen. Ph.D thesis, ETH, Zurich.
[31] Tukey, J. W. (1975). Mathematics and picturing data. Proc. Intern. Congr. Math.
Vancouver 1974, 2, 523-531.
[32] Zuo, Y., Serfling, R. (2000). General Notions of Statistical Depth Function. Ann Stat
28:461-482.