STUDIES OF MULTINOMIAL
MIXTURE MODELS
by
Byung Soo Kim
A Dissertation presented to the faculty of The University of North Carolina at Chapel Hill in partial fulfillment of the requirements of the degree of Doctor of Philosophy in the Department of Statistics
Chapel Hill
April 1984
Approved by:
Reader
BYUNG SOO KIM. Studies of Multinomial Mixture Models (Under the direction of Barry H. Margolin)
We investigate certain inferential aspects of mixtures of multi-
nomial distributions, both in nonparametric and parametric contexts.
As a nonparametric mixture model we propose a k-population finite mixture
of binomial distributions, which can be applied to the analysis of non-iid
data generated from a series of toxicological experiments. A necessary
and sufficient identifiability condition for the k-population finite mixture
of binomials is obtained. The maximum likelihood estimates (MLE's)
of the k-population finite mixture of binomials are computed via the EM
algorithm (Dempster, Laird and Rubin, 1977), and the asymptotic properties
of the MLE's are discussed. The identifiability condition is equivalent
to the positive definiteness of the information matrix for the parameters.
The MLE's and their sampling distributions, together with the data
mentioned above, provide an empirical check of the statistical procedures
proposed by Margolin, Kaplan and Zeiger (1981).
The Dirichlet-multinomial distribution, a parametric mixture of
multinomials, is discussed as a random group effects model for a one-way
layout contingency table. Interest focuses on testing the hypothesis of
no random effects. For this testing problem, Neyman's C(α) procedure
yields a new test statistic, which is asymptotically superior to Pearson's
chi-square test. This superiority is further evidenced by a Monte Carlo
simulation study. A duality between the C(α) statistic and the CATANOVA
statistic proposed by Light and Margolin (1971) is demonstrated.
The random effects model for the one-way layout contingency table is
extended within the framework of the Dirichlet-multinomial distribution
to a balanced nested mixed effects model, and two hypothesis testing
problems are investigated.
ACKNOWLEDGEMENTS
I wish to express my deepest gratitude to my research advisor,
Dr. Barry H. Margolin, for his suggestion of this topic and for his
guidance and encouragement throughout the duration of this research.
I also would like to thank my committee members, Dr. Norman
Johnson, Dr. Gordon Simons, Dr. Doug Kelly, and Dr. David Ruppert,
for their careful reading of the manuscript and many valuable comments.
The financial support from the Statistics Department has been
indispensable; without it my stay in Chapel Hill would not have been
possible. Thanks are also extended to Dr. David Hoel of the National
Institute of Environmental Health Sciences for providing computer
facilities. Credit is also due to Mrs. Judy Harrelson and Mr. K. Doug
Vass for their excellent typing job.
I am especially indebted to my parents at home in Korea, who have
been praying for the successful completion of my study in Chapel Hill.
Finally, I would like to thank my wife Myung Sook, son Stephen, and
mother-in-law for their understanding and support.
TABLE OF CONTENTS
CHAPTER I INTRODUCTION AND SUMMARY
  1.1 The Binomial Distribution
  1.2 Mixture Models of Count Data
  1.3 Scope of the Thesis
    1.3.1 The Finite Mixture of Binomial Distributions
    1.3.2 The Dirichlet-Multinomial Distribution
  1.4 Further Research
CHAPTER II FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS
  2.1 Identifiability Problem
    2.1.1 Preliminaries
    2.1.2 1-Population Finite Mixture of Multinomials
    2.1.3 k-Population Finite Mixture of Multinomials
  2.2 Estimation of the Mixing Distribution
    2.2.1 Preliminaries
    2.2.2 Maximum Likelihood Equations
    2.2.3 Asymptotic Distribution of the ML Estimator
  2.3 k-Population Finite Mixture of Binomials - Application
    2.3.1 Description of the Ames Test
    2.3.2 Statistical Analysis of the Experimental Data
    2.3.3 Further Analysis of the Derived Data: Mixture Model
      2.3.3.1 k-Population Mixture of Two Binomials
      2.3.3.2 Results of the Analysis
      2.3.3.3 Discussion of the Results
CHAPTER III MIXTURE OF MULTINOMIAL DISTRIBUTIONS
  3.1 Binomial Case
  3.2 Random Effects Model of One-Way Layout
    3.2.1 Dirichlet-Multinomial Model
    3.2.2 Test of the Random Effects
      3.2.2.1 Case of P Known
      3.2.2.2 Case of P Unknown
    3.2.3 Approximate Null and Alternative Distributions
      3.2.3.1 Approximate Null Distributions
      3.2.3.2 Approximate Alternative Distributions
    3.2.4 ARE of X_p^2 Relative to T_k
    3.2.5 Monte Carlo Simulation: Power Comparison
    3.2.6 Duality between C and T_k
  Appendix I: Wisniewski-type Alternatives
  Appendix II: The Dirichlet-Multinomial Alternatives
CHAPTER IV BALANCED NESTED MIXED EFFECTS MODEL
  4.1 Introduction
  4.2 Test of the Nested Random Effects
    4.2.1 C(α) Test
    4.2.2 ARE of X(3) Relative to T(3)
  4.3 Test of Equality of the Fixed Row Effects
    4.3.1 Wald Statistic and Chi-Square Statistic
    4.3.2 ARE of a Test F Relative to a Test C
    4.3.3 Wald Statistic for Testing the Equality of Fixed Effects
CHAPTER V FURTHER RESEARCH
  5.1 The Finite Mixture of Binomial Distributions
  5.2 T_k Statistic as a Measure of Association
  5.3 The Nested Random Group Effects Model of Count Data
BIBLIOGRAPHY
CHAPTER I
INTRODUCTION AND SUMMARY
1.1 The Binomial Distribution
Among the class of discrete distributions the binomial distribution
is by far the most widely used for bounded count data, while the Poisson
distribution enjoys the same role for unbounded count data. Since the
binomial and Poisson distributions share certain important properties
and there exist several interesting relations between these two distri-
butions, we include the Poisson distribution in the context of our
discussion of the binomial distribution. To introduce notation, these
two distributions are formally defined as follows:
Definition 1.1: A random variable X has a binomial distribution with
parameters n and p, denoted by X ~ B(n,p), if

$$\Pr(X=x) = \binom{n}{x} p^x (1-p)^{n-x} \tag{1.1}$$

for x = 0,1,...,n, 0 < p < 1, and $n \in I^+$, where $I^+$ is the set of
positive integers. For notational convenience
we write b(x;n,p) for the binomial mass function (1.1) and B(x;n,p) for
the corresponding distribution function.
Definition 1.2: A random variable X has a Poisson distribution with
parameter λ, denoted by X ~ P(λ), if

$$\Pr(X=x) = e^{-\lambda}\lambda^x / x! \tag{1.2}$$

for x = 0,1,2,..., and λ > 0.
These two distributions possess a variety of desirable properties,
which, in part, account for their popularity. Both the Poisson family
$\{P(\lambda);\ \lambda > 0\}$ and the binomial family
$\{B(n,p);\ n \in I^+ \text{ and known},\ 0 < p < 1\}$
are one-parameter exponential families. Hence the ML estimator for the
single parameter is a sufficient statistic and achieves the Cramer-Rao
lower bound. If $X_1, X_2, \ldots, X_k$ are independent Poisson random variables
such that $X_i \sim P(\lambda_i)$ for i=1,...,k, then the conditional distribution of
$X_i$ given $\sum_{j=1}^{k} X_j = n$ is $B(n, \lambda_i/\sum_{j=1}^{k}\lambda_j)$ for i=1,2,...,k.
For large n, the DeMoivre-Laplace limit theorem admits a normal
approximation to the binomial distribution.
There also exist several interesting relations between Poisson and
binomial distributions. Among them we may note a result due to Chatterji
(1963) that if X and Y are independent nonnegative integer-valued random
variables such that Pr(X=x) > 0 and Pr(Y=x) > 0 for x = 0,1,2,..., and
the conditional distribution of X given X + Y is binomial, then both X
and Y are Poisson random variables. Another relation that is usually
referred to as the 'Poisson approximation of a binomial' is cited as a
lemma from Feller (1968, p. 153).
Lemma 1.1: Suppose that $\{X_n\}$ is a sequence of random variables such that
$X_n \sim B(n,p_n)$ and $np_n \to \lambda$ as $n \to \infty$, where $0 < \lambda < \infty$; then

$$\Pr(X_n = x) \to e^{-\lambda}\lambda^x / x! \tag{1.3}$$

for x = 0,1,2,... as $n \to \infty$.
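
The convergence in Lemma 1.1 can be checked numerically. The following is a minimal sketch, assuming the simple choice p_n = λ/n; the function names and the cutoff `x_max` are illustrative and not part of the thesis.

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Binomial mass function b(x; n, p) as in (1.1)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """Poisson mass function as in (1.2)."""
    return exp(-lam) * lam**x / factorial(x)

def max_gap(n, lam, x_max=20):
    """Largest pointwise gap between B(n, lam/n) and P(lam) over x = 0..min(n, x_max).

    Assumption for illustration: p_n = lam / n, so n * p_n = lam exactly.
    """
    p = lam / n
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(n, x_max) + 1))

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, max_gap(n, lam=2.0))
```

The printed gap shrinks as n grows, which is the behaviour the lemma describes.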
From the standpoint of this thesis, the most important property
shared by the binomial and Poisson distributions is that these two
distributions belong to a class of discrete probability distributions
that admit mathematically tractable mixture generalizations.
1.2 Mixture Models for Count Data
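Definition 1.3: Let $F(\mathbf{x};\boldsymbol\theta)$ be a d-dimensional distribution function
indexed by an m-dimensional parameter vector $\boldsymbol\theta$ in a parameter space $\Theta$
and let $G(\boldsymbol\theta)$ be an m-dimensional distribution function. Then

$$H(\mathbf{x}) = \int_{\Theta} F(\mathbf{x};\boldsymbol\theta)\, dG(\boldsymbol\theta) \tag{1.4}$$

is called a G mixture of F or simply a mixture, and F and G are referred
to as the kernel and the mixing distribution, respectively. If the
kernel in (1.4) is expressed in terms of a density function $f(\mathbf{x};\boldsymbol\theta)$, then

$$h(\mathbf{x}) = \int_{\Theta} f(\mathbf{x};\boldsymbol\theta)\, dG(\boldsymbol\theta) \tag{1.5}$$

is called a mixture density when (1.5) exists.
If the mixing distribution G is discrete with finite support, then
we call the resulting mixture a finite mixture. Using the notation in
Johnson and Kotz (1969) we may write $H(\mathbf{x}) = F(\mathbf{x};\boldsymbol\theta) \underset{\boldsymbol\theta}{\wedge} G(\boldsymbol\theta)$ for (1.4) and
$h(\mathbf{x}) = f(\mathbf{x};\boldsymbol\theta) \underset{\boldsymbol\theta}{\wedge} G(\boldsymbol\theta)$ for (1.5). The following definitions are included
for notational purposes.
Definition 1.4: A random variable X has a gamma distribution with
parameters λ > 0 and r > 0, denoted by X ~ G(λ,r), if it has a
probability density function (p.d.f.) $f_G(x)$ given by

$$f_G(x) = \frac{\lambda^r}{\Gamma(r)}\, x^{r-1} e^{-\lambda x} \tag{1.6}$$

for x > 0, where

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx \quad \text{for } t > 0. \tag{1.7}$$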
Definition 1.5: A random variable X has a negative binomial distribution
with parameters m and c, denoted by X ~ NB(m,c), if

$$\Pr(X=x) = \frac{\Gamma(x+c^{-1})}{x!\,\Gamma(c^{-1})} \left(\frac{cm}{1+cm}\right)^{x} \left(\frac{1}{1+cm}\right)^{c^{-1}} \tag{1.8}$$

for x = 0,1,2,..., 0 < m < ∞, and 0 ≤ c < ∞; values for c = 0 are
understood to be evaluated in the limit as c → 0.
It is an easy exercise to show that a negative binomial distribution
can be obtained as a gamma mixture of Poissons, that is,

$$NB(m,c) = P(\theta) \underset{\theta}{\wedge} G((mc)^{-1}, c^{-1}). \tag{1.9}$$
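
The identity (1.9) can be checked numerically. The sketch below compares the mass function (1.8) with a midpoint-rule integral of the Poisson mass function against the gamma density (1.6) with rate (mc)^{-1} and shape c^{-1}. The function names, the integration range and the step count are illustrative assumptions; the quadrature is only adequate when c^{-1} ≥ 1, so that the integrand is bounded near zero.

```python
import math

def nb_pmf(x, m, c):
    """Negative binomial mass function in the (m, c) parametrization of (1.8)."""
    r = 1.0 / c
    log_p = (math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
             + x * math.log(c * m / (1 + c * m))
             - r * math.log(1 + c * m))
    return math.exp(log_p)

def gamma_poisson_pmf(x, m, c, upper=60.0, steps=60_000):
    """Midpoint-rule approximation of the gamma mixture of Poissons in (1.9).

    Assumptions for illustration: 1/c >= 1 and `upper` large enough that the
    neglected tail of the integrand is negligible.
    """
    lam, r = 1.0 / (m * c), 1.0 / c      # gamma rate and shape
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        log_pois = -t + x * math.log(t) - math.lgamma(x + 1)
        log_gam = (r * math.log(lam) - math.lgamma(r)
                   + (r - 1) * math.log(t) - lam * t)
        total += math.exp(log_pois + log_gam) * h
    return total

if __name__ == "__main__":
    for x in range(6):
        print(x, nb_pmf(x, 3.0, 0.5), gamma_poisson_pmf(x, 3.0, 0.5))
```

For m = 3 and c = 0.5 the two columns agree to within the quadrature error.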
Mixture models for count data have been formulated ever since the
limitations of the binomial and Poisson distributions to fit data were
first noted. In the binomial distribution the independence of n
Bernoulli trials and the constancy of the success probability p through-
out the n trials may be suspect in specific applications. For instance,
as is observed in Haseman and Kupper (1978), in certain animal studies
to investigate the toxicological effect of a compound there is a
tendency for implants from the same litter to respond more alike than
implants from different litters. That is, the litter-specific success
probability varies from litter to litter. Therefore, the independence
among implants in the same litter may not be maintained.
For the Poisson distribution the equality of mean and variance
places an important restriction on the applicability of the model in
practice. Thus, for example, Paul and Plackett (1978), and Margolin,
Kaplan and Zeiger (1981) study the effects of non-Poisson distributed
random variables, specifically negative binomial random variables, on
certain aspects of inference for the Poisson based model.
These limitations of the binomial or Poisson distributions are most
evident when there is clear 'heterogeneity' of count data; this in turn
has led to various considerations of alternatives or generalizations of
those two important count data distributions, foremost among which have
been mixture formulations.
As early as 1915, K. Pearson (1915), having noted the 'heterogeneity'
of the data, considered a mixture of two binomial distributions
as a model for the counts on yeast cells analyzed by Student (1907).
Greenwood and Yule (1920) considered accident data and found that a
gamma mixture of Poissons, which as noted is a negative binomial
distribution, gave a closer fit to the data than a single Poisson
distribution. In the same vein Skellam (1948) introduced a beta-bino-
mial distribution in the form of a beta mixture of binomial distribu-
tions, after observing that the association probabilities varied from
nucleus to nucleus in the analysis of the secondary association of
chromosomes in Brassica.
As the above historical examples of mixture distributions indicate,
there have been two distinct subclasses of mixture models. A parametric
mixture is defined to be a mixture in which the mixing distribution has
a specific functional form, whereas in a nonparametric mixture the
mixing distribution does not have a specific functional form. The
mixture of two binomials is an example of a nonparametric mixture and
the negative binomial and the beta-binomial are examples of a parametric
mixture (Johnson and Kotz, 1969).
The flexibility and generality of a mixture model for count data
are gained at the expense of simplicity and the attractive properties
of the binomial or the Poisson model. This is well illustrated in the
search for maximum likelihood estimates (MLE) of parameters from a mix-
ture model of count data; in most cases this involves iterative solu-
tion of a set of equations rather than a closed form solution.
1.3 SCOPE OF THE THESIS
This thesis investigates certain inferential aspects of mixtures of
multinomial distributions, both in nonparametric and parametric mixture
models. In Chapter 2 a class of finite mixtures of binomial distributions
is proposed to model non-iid data generated from certain important
toxicological experiments, and the resultant implications are investigated.
Chapter 3 combines studies of the goodness of fit test of the
binomial distribution against parametric mixture alternatives and the
development of a random effects model for count data in a one-way layout
contingency table by employing a Dirichlet mixture of multinomial
distributions. In Chapter 4 we discuss a balanced nested mixed effects
model based on a Dirichlet mixture of multinomial distributions.
Finally in Chapter 5 we suggest several problems for further research.
1.3.1 The Finite Mixture of Binomial Distributions
Besides K. Pearson's early attempt to fit Student's counts with a
mixture of two binomials (Pearson, 1915), one other application of this
approach is noteworthy. Neyman (1947) developed a finite mixture of
binomial distributions for the analysis of roentgenographic reading
results of tuberculosis tests. A set of four chest films of different
sizes was taken for each subject and each film was interpreted
independently by five expert radiologists. Neyman decomposed the patient
population into three categories: (i) entirely free from the disease,
(ii) moderately affected, and (iii) heavily affected; and associated
1-τ, p, and 1 as the probabilities of correct diagnosis for the components
(i), (ii), and (iii), respectively. Hence the number of positive
results among five independent reader diagnoses for a particular film
follows a mixture of three binomial distributions, or a mixture of two
binomials if the component (iii) is entirely dropped from the model.
However, as Neyman (1947) indicated "in reality we may expect that
the subdivision of human population is much finer than is postulated
here and that the category of 'moderately affected' splits into a
continuous graduation of the intensity of the illness, from very slight
to very heavy ..." Thus a 'finite' mixture may be regarded as a
'simplification', which appears to have been the primary motivation for
introducing the finite mixture of binomial distributions.
There is, however, truly a need for a finite mixture of two binomials
in the context of imperfect testing of various materials for
positive or negative evidence of a specified characteristic that is
either present or absent. Let Φ be a set of r materials that have been
tested and let the i-th subset of Φ consist of $r_i$ materials, indexed by
(i,j), j=1,...,$r_i$, each of which has been tested $n_i$ times for
i = 1,2,...,k. Clearly $\sum_{i=1}^{k} r_i = r$. Let τ and (1-p) denote the
probabilities of a false positive and a false negative, respectively, and
let π be the prevalence rate of the characteristic in question. We
assume independence among all $\sum_{i=1}^{k} n_i r_i$ tests. For the i-th subset of $r_i$
materials, each of which has been tested $n_i$ times, we can summarize the
observed data in a random vector $\mathbf{X}_i = (X_{i1}, X_{i2}, \ldots, X_{ir_i})$, where $X_{ij}$ is
the number of positive findings in the $n_i$ tests with material (i,j).
Under our assumptions, the i-th observation vector $\mathbf{X}_i$ is now a random
sample from a mixture of two binomial distributions, that is,

$$X_{i1}, X_{i2}, \ldots, X_{ir_i} \overset{iid}{\sim} \pi B(n_i,p) + (1-\pi) B(n_i,\tau) \tag{1.10}$$

for i=1,...,k. We note that $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$, the totality of observations,
do not constitute an independent and identically distributed (iid) data
set unless the $n_i$'s are all equal.
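
The sampling scheme behind (1.10) can be sketched as a small simulation. The function name, argument names and parameter values below are hypothetical and serve only to illustrate how the k subsets share (π, p, τ) while differing in $n_i$.

```python
import random

def simulate_k_population(n_list, r_list, prevalence, p, tau, seed=0):
    """Simulate data from the k-population mixture of two binomials (1.10).

    n_list[i]  : number of tests n_i applied to each material in subset i
    r_list[i]  : number of materials r_i in subset i
    prevalence : pi, probability that a material has the characteristic
    p          : probability of a positive test when the characteristic is present
    tau        : false positive probability
    All names are illustrative assumptions, not notation from the thesis.
    """
    rng = random.Random(seed)
    data = []
    for n_i, r_i in zip(n_list, r_list):
        x_i = []
        for _ in range(r_i):
            has_trait = rng.random() < prevalence
            prob_positive = p if has_trait else tau
            # X_ij counts positives among n_i independent tests of one material
            x_i.append(sum(rng.random() < prob_positive for _ in range(n_i)))
        data.append(x_i)
    return data

if __name__ == "__main__":
    print(simulate_k_population([2, 5], [4, 3], prevalence=0.3, p=0.9, tau=0.05))
```

Within each subset the counts are iid, but counts from subsets with different $n_i$ have different supports, which is why the pooled data are not iid.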
Even though the mixture model (1.10) is the primary motivation for
the research to be discussed, we note the generalization of (1.10) to
mixtures of c binomial distributions, where c ≥ 2 is known a priori.
Define $\mathcal{G}_c$ to be a class of all discrete distribution functions with c
atoms. Then we have

$$\begin{aligned}
X_{11}, \ldots, X_{1r_1} &\overset{iid}{\sim} B(n_1,p) \underset{p}{\wedge} G(p) \\
X_{21}, \ldots, X_{2r_2} &\overset{iid}{\sim} B(n_2,p) \underset{p}{\wedge} G(p) \\
&\;\;\vdots \\
X_{k1}, \ldots, X_{kr_k} &\overset{iid}{\sim} B(n_k,p) \underset{p}{\wedge} G(p),
\end{aligned} \tag{1.11}$$

where $G \in \mathcal{G}_c$.
Equation (1.10) is then the special case of (1.11) where c = 2. We call
the mixture model (1.11) a k-population finite mixture of binomial
distributions and define the parameter space of interest to be

$$\Theta = \Big\{\boldsymbol\theta = (\pi_1,\ldots,\pi_{c-1},p_1,\ldots,p_c);\ \pi_i > 0,\ \sum_{i=1}^{c-1}\pi_i < 1,\ 0 < p_1 < p_2 < \cdots < p_c < 1\Big\}.$$
When k=1, the formulation (1.11) reduces to iid data from a finite mixture
of binomial distributions, which is referred to as a 1-population
finite mixture of binomials. Thus the i-th finite mixture of binomials
in (1.11) has a mixture density

$$h_i(x) = \sum_{j=1}^{c} \pi_j \binom{n_i}{x} p_j^{x} (1-p_j)^{n_i-x} \tag{1.12}$$

for x = 0,1,...,$n_i$, where $\boldsymbol\theta = (\pi_1,\ldots,\pi_{c-1},p_1,\ldots,p_c)$ and
$\pi_c = 1 - \sum_{j=1}^{c-1}\pi_j$, i = 1,...,k.
Our primary interest lies in the MLE of the mixing distribution G
in (1.11), which is equivalent to finding an MLE $\hat{\boldsymbol\theta}$ of $\boldsymbol\theta$ in (1.12)
because of the nonparametric nature of the mixing distribution G.
Before estimation is attempted, however, we need to verify that the
k-population finite mixture of binomials model in (1.11) is 'identifiable'.
Teicher (1963) showed that the 1-population finite mixture of
binomial distributions in (1.12), i.e., k=1, is identifiable if and
only if $n_1 \ge 2c-1$. In Chapter 2 we extend Teicher's result to the
multinomial case and find the necessary and sufficient condition for
identifiability of a k-population finite mixture of multinomial
distributions.
The maximum likelihood approach to estimation of the mixing
distribution in the finite mixture problem has been discussed only since
the late 1960's when access to fast electronic digital computers made
it feasible. Hasselblad (1969) developed a set of iterative equations
for obtaining MLE's of parameters in the 1-population mixture of members
of the exponential family, which was later recognized as a special case
of the EM algorithm (Dempster, Laird and Rubin, 1977). By extending
Hasselblad's iterative equations to the case of the k-population
mixture of certain members of the exponential family, we obtain as a
special case an algorithm for the MLE $\hat{\boldsymbol\theta}$ of $\boldsymbol\theta$ for the k-population
finite mixture of binomials model (1.11).
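
To make the iteration concrete, the following is a minimal sketch of a textbook EM update for the mixture density (1.12) when the $n_i$ differ across observations. It is an assumption-laden illustration, not the iterative equations derived in Chapter 2: the data layout (a flat list of (n, x) pairs), the function names and the fixed iteration count are all hypothetical choices.

```python
from math import comb, log

def binom_pmf(x, n, p):
    """Binomial mass function b(x; n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def em_step(data, pi, p):
    """One EM update for a c-component binomial mixture.

    data : list of (n, x) pairs pooled over all k populations (assumed layout)
    pi   : current mixing weights, length c
    p    : current success probabilities, length c
    Returns the updated (pi, p) and the log-likelihood at the input parameters.
    """
    c = len(pi)
    sum_w, sum_wx, sum_wn = [0.0] * c, [0.0] * c, [0.0] * c
    loglik = 0.0
    for n, x in data:
        joint = [pi[l] * binom_pmf(x, n, p[l]) for l in range(c)]
        total = sum(joint)
        loglik += log(total)
        for l in range(c):
            w = joint[l] / total          # E-step: posterior component weight
            sum_w[l] += w
            sum_wx[l] += w * x
            sum_wn[l] += w * n
    new_pi = [s / len(data) for s in sum_w]              # M-step for weights
    new_p = [sum_wx[l] / sum_wn[l] for l in range(c)]    # M-step for p_l
    return new_pi, new_p, loglik

def fit(data, pi, p, iters=100):
    """Run a fixed number of EM steps and record the log-likelihood path."""
    history = []
    for _ in range(iters):
        pi, p, ll = em_step(data, pi, p)
        history.append(ll)
    return pi, p, history

if __name__ == "__main__":
    toy = [(5, 0), (5, 0), (5, 1), (5, 5), (5, 4), (5, 0), (5, 5),
           (10, 0), (10, 1), (10, 9), (10, 10), (10, 8), (10, 0), (10, 1)]
    print(fit(toy, [0.5, 0.5], [0.2, 0.7]))
```

The recorded log-likelihood is nondecreasing across iterations, which is the usual EM guarantee; convergence to the global maximum is not guaranteed and depends on the starting values.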
This theoretical framework of the k-population mixture of
binomials model is applied in Chapter 2 to the analysis of a sizable
database of Ames test data, gathered at considerable cost to the
federal government. The Ames test is a bacterial test that detects
evidence of mutagenicity for chemical compounds. The observed Ames test
data in a given experiment, which are counts, can be transformed into
a 0 or 1 indicating a non-mutagen or mutagen, respectively, through
analysis via a family of mutation models described in Margolin, Kaplan
and Zeiger (1981). Analysis of the derived 0-1 experimental results
based on the k-population mixture of two binomials provides estimates
of (i) the prevalence rate of mutagens among the test compounds, (ii)
the false positive rate and (iii) the false negative rate, as well as
the standard errors of these three estimates. By providing these
estimates of various parameters of interest, the k-population finite
mixture model provides an empirical check of the operational properties
of the mutation models and statistical procedures proposed by Margolin,
Kaplan and Zeiger (1981).
1.3.2 The Dirichlet-Multinomial Distribution
Using an analogy to fixed effects and random effects in linear
models, it appears that almost all the methods for the analysis of
multi-dimensional contingency tables have focused on fixed-effects
models. Fienberg (1975) points this out and lists the development of a
discrete analog to the nested and random effects (Model II) ANOVA models
among the unsolved problems in the analysis of multi-dimensional contin-
gency tables.
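In section 3.2 we develop a random effects model for the one-way
layout contingency table. Define

$$S_p^0 = \Big\{(p_1,\ldots,p_{I-1});\ 0 < p_i < 1,\ \sum_{i=1}^{I-1} p_i < 1\Big\} \tag{1.13}$$

and

$$S_x = \Big\{(x_1,\ldots,x_{I-1});\ x_i \in I^+ \cup \{0\},\ \sum_{i=1}^{I-1} x_i \le n\Big\}. \tag{1.14}$$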
For our development, we need the following definitions.
Definition 1.6: A random vector $\mathbf{X} = (X_1,\ldots,X_{I-1})$ has a multinomial
distribution with n and $\mathbf{p} = (p_1,\ldots,p_{I-1})$, denoted by $\mathbf{X} \sim M(n,\mathbf{p})$, if

$$\Pr(\mathbf{X}=\mathbf{x}) = \binom{n}{x_1,\ldots,x_I} \prod_{i=1}^{I} p_i^{x_i} \tag{1.15}$$

for $\mathbf{x} \in S_x$ and $\mathbf{p} \in S_p^0$, where $x_I = n - \sum_{i=1}^{I-1} x_i$ and $p_I = 1 - \sum_{i=1}^{I-1} p_i$.
Notationally, $m(\mathbf{x};n,\mathbf{p})$ and $M(\mathbf{x};n,\mathbf{p})$ denote the multinomial mass function
(1.15) and corresponding distribution function, respectively.
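Definition 1.7: A random vector $\mathbf{U} = (U_1,\ldots,U_{I-1})$ has a Dirichlet
distribution with $\boldsymbol\beta = (\beta_1,\ldots,\beta_I)$, denoted by $\mathbf{U} \sim D(\boldsymbol\beta)$, if it has a p.d.f.
given by

$$f(\mathbf{u}) = \frac{\Gamma(B)}{\prod_{i=1}^{I}\Gamma(\beta_i)} \Big(\prod_{i=1}^{I-1} u_i^{\beta_i-1}\Big)\Big(1-\sum_{i=1}^{I-1}u_i\Big)^{\beta_I-1} \tag{1.16}$$

for $\mathbf{u} \in S_u^0$, where $\beta_i > 0$ for i=1,2,...,I, and $B = \sum_{i=1}^{I}\beta_i$.
The Dirichlet distribution $D(\boldsymbol\beta)$ can be reparametrized so that it can be
denoted by $D(\boldsymbol\pi,\theta)$, where $\pi_i = \beta_i/B$ and $\theta = 1/B$ for
$\boldsymbol\pi = (\pi_1,\ldots,\pi_{I-1}) \in S_\pi^0$. A Dirichlet mixture of multinomial distributions
is called the Dirichlet-multinomial distribution, and denoted by
$DM(n,\boldsymbol\pi,\theta)$, that is,

$$DM(n,\boldsymbol\pi,\theta) = M(n,\mathbf{p}) \underset{\mathbf{p}}{\wedge} D(\boldsymbol\pi,\theta). \tag{1.17}$$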
Mosimann (1962) provides an extensive study of the Dirichlet-multinomial
distribution, thereby extending Skellam's work on the beta-binomial
distribution. Brier (1980) investigates the effect of the Dirichlet-
multinomial distribution on the chi-square test of a general hypothesis
in the one-way layout contingency table and shows that Pearson's
chi-square statistic is in fact asymptotically a constant multiple of a
chi-square random variable when the hypothesis is true.
Thus it follows that in a contingency table with I response
categories and G groups the Dirichlet-multinomial distribution
$DM(n_{+j},\boldsymbol\pi,\theta)$ can introduce random group effects, since the j-th group
probability vector, say $\mathbf{p}_j$, is now randomly generated from $D(\boldsymbol\pi,\theta)$ and,
conditional on the observed $\mathbf{p}_j$, the j-th group response vector, say $\mathbf{n}_j$,
has a multinomial distribution $M(n_{+j},\mathbf{p}_j)$, where $n_{+j}$ is the j-th group
size. In a handy notation this is described as

$$\begin{aligned}
\mathbf{p}_j &\overset{iid}{\sim} D(\boldsymbol\pi,\theta) \\
(\mathbf{n}_j \mid n_{+j}, \mathbf{p}_j) &\sim M(n_{+j},\mathbf{p}_j)
\end{aligned} \tag{1.18}$$

for j=1,2,...,G.
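
The two-stage scheme (1.18) can be sketched as a sampler. This is an illustration under stated assumptions: the function name is hypothetical, `pi` is passed as the full length-I probability vector (rather than its first I-1 components), and θ must be strictly positive so that $\beta_i = \pi_i/\theta$ is finite.

```python
import random

def sample_dirichlet_multinomial(group_sizes, pi, theta, seed=0):
    """Draw a G x I table of counts following the two stages in (1.18).

    group_sizes : list of group totals n_{+j}, j = 1..G
    pi          : full probability vector (pi_1, ..., pi_I), summing to one
    theta       : dispersion parameter, theta = 1/B > 0
    """
    rng = random.Random(seed)
    beta = [p / theta for p in pi]        # beta_i = pi_i * B with B = 1/theta
    table = []
    for n_plus_j in group_sizes:
        # Stage 1: p_j ~ Dirichlet(beta), via normalized gamma variates
        g = [rng.gammavariate(b, 1.0) for b in beta]
        total = sum(g)
        p_j = [v / total for v in g]
        # Stage 2: n_j | p_j ~ Multinomial(n_{+j}, p_j)
        counts = [0] * len(pi)
        for category in rng.choices(range(len(pi)), weights=p_j, k=n_plus_j):
            counts[category] += 1
        table.append(counts)
    return table

if __name__ == "__main__":
    print(sample_dirichlet_multinomial([20, 30, 25], [0.2, 0.3, 0.5], theta=0.1))
```

As θ approaches 0 the group probability vectors concentrate at π, and the table behaves like independent multinomial samples with a common probability vector.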
The primary concern of section 3.2 is hypothesis testing of the
presence of random group effects, which can be formulated as

$$H_0: \theta = 0 \quad \text{vs.} \quad H_a: \theta > 0. \tag{1.19}$$

For testing (1.19) we find that Neyman's C(α) procedure yields a new test
statistic, denoted by $T_k$. The asymptotic relative efficiency $e(X_p^2 \mid T_k)$
of the classical chi-square statistic $X_p^2$ satisfies
(1.20)
where the equality holds iff the group sizes $\{n_{+j}\}_{j=1}^{G}$ are asymptotically
balanced or G = 2. The superiority of $T_k$ to $X_p^2$ based on (1.20) is
further evidenced by a Monte Carlo simulation that compares the actual
performances of those two statistics in terms of their sizes and powers.
The formulation of the random effects model in the I×G contingency
table (1.18) is extended to a balanced nested mixed effects model in
Chapter 4. Using the conditioning arguments employed in (1.18), nested
mixed effects can be represented as

$$\begin{aligned}
(\mathbf{p}_{jk} \mid \boldsymbol\pi_j) &\overset{ind}{\sim} D(\boldsymbol\pi_j,\theta) \\
(\mathbf{n}_{jk} \mid n_{+jk}, \mathbf{p}_{jk}) &\overset{ind}{\sim} M(n_{+jk},\mathbf{p}_{jk})
\end{aligned} \tag{1.21}$$

for j=1,2,...,R and k=1,2,...,C,
where $\boldsymbol\pi_1, \boldsymbol\pi_2, \ldots, \boldsymbol\pi_R$ are fixed and correspond to R levels of the row
variable, and $\mathbf{p}_{jk}$ corresponds to the k-th replication within the j-th
level of the row variable.
In the model (1.21) interest centers on the hypotheses of no nested
random effects and the equality of the fixed row effects, which are
respectively formulated as

$$H_0: \theta = 0 \quad \text{vs.} \quad H_a: \theta > 0 \tag{1.22}$$

and

$$H_0: \boldsymbol\pi_1 = \boldsymbol\pi_2 = \cdots = \boldsymbol\pi_R \quad \text{vs.} \quad H_a: \boldsymbol\pi_j \ne \boldsymbol\pi_{j'} \text{ for some } j \ne j'. \tag{1.23}$$

The C(α) procedure can be readily extended for problem (1.22); however,
for testing (1.23) two side questions can be raised.
(i) Are the Wald statistic and Pearson's chi-square statistic
asymptotically equivalent in the presence of nested random
effects?
(ii) What is the cost of analyzing the balanced nested mixed effects
model as if it were a crossed mixed effects model?
Complete answers to those two questions are not available for general I,
R, and C; however, based on the results for I = R = 2 and general C in
section 4.3 the answer to (i) appears to be yes. The cost in (ii)
appears to be sizable when the group sizes $\{n_{+jk}\}$ exhibit 'reasonable'
departures from balance. Finally in section 4.3 a Wald test is
constructed for testing (1.23).
1.4 FURTHER RESEARCH
In Chapter 5 four problems for further research are discussed.
(i) Study of the uniqueness of the MLE $\hat{\boldsymbol\theta}$ of a finite mixture of
binomial distributions.
(ii) Development of the likelihood ratio test in the finite mixture
problem for testing $H_0$: c=1 versus $H_a$: c=2, where c refers to
the number of components of a population.
(iii) Use of the $T_k$ statistic as a measure of association.
(iv) Development of a random effects model in a two-way layout
contingency table.
CHAPTER II
FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS
This chapter focuses on problems relating to the k-population
finite mixture of binomial distributions, such as identifiability,
estimation of the mixing distribution and the asymptotic covariance
matrix, and the asymptotic distribution of the ML estimator. Finally
an example will be presented together with extensive numerical
analyses.
2.1 IDENTIFIABILITY PROBLEM
2.1.1 Preliminaries
Estimation of the mixing distribution in any mixture problem is
meaningful only if the mixture distribution is 'identifiable'. Early on
K. Pearson (1894) treated this problem for the case of a mixture of two
normal distributions; later Feller (1943) observed that any mixture of
Poisson distributions was always identifiable due to the uniqueness
property of the Laplace transform.
Teicher (1963) pursued the study of identifiability in the case of
finite mixtures, including the finite mixture of binomials. A portion
of his development of this topic is summarized below.
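Let $\mathcal{G}_c^*$ be the class of all discrete distribution functions with at
most c atoms, and let $\mathcal{H}_F$ be the class of all finite mixtures of F given
by

$$\begin{aligned}
\mathcal{H}_F &= \Big\{H(x);\ H(x) = \int_{\Theta} F(x;\theta)\, dG(\theta),\ G \in \mathcal{G}_c^*\Big\} \\
&= \Big\{H(x);\ H(x) = F(x;\theta) \underset{\theta}{\wedge} G(\theta),\ G \in \mathcal{G}_c^*\Big\}.
\end{aligned}$$

Definition 2.1: If H is considered as the image of the map of G,
then $\mathcal{H}_F$ is said to be identifiable if and only if this map defines a
one to one map of $\mathcal{G}_c^*$ onto $\mathcal{H}_F$.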
Teicher (1963) found a necessary and sufficient identifiability
condition for the class of all finite mixtures of binomial distributions
with n fixed, which is stated as a lemma.
Lemma 2.1 (Teicher, 1963): Let $\mathcal{B} = \{B(x;n,p);\ 0 < p < 1\}$ be a
one-parameter family of binomial distribution functions, n being fixed.
A necessary and sufficient condition that the class

$$\mathcal{H}_B = \Big\{H(x);\ H(x) = B(x;n,p) \underset{p}{\wedge} G(p),\ G \in \mathcal{G}_c^*\Big\}$$

is identifiable is that $n \ge 2c - 1$.
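
A small exact computation illustrates why the bound in Lemma 2.1 matters. With c = 2 and n = 2 < 2c - 1, a mixture of B(2,p) depends on G only through its first two moments, so two different two-atom mixing distributions with equal mean and second moment give the same mixture. The particular atoms and weights below are an illustrative choice, not an example taken from the thesis.

```python
from fractions import Fraction as Fr
from math import comb

def mixture_pmf(n, atoms):
    """Exact mass function of a finite mixture of B(n, p).

    atoms : list of (p, weight) pairs describing the mixing distribution G
    """
    return [sum(w * comb(n, x) * p**x * (1 - p)**(n - x) for p, w in atoms)
            for x in range(n + 1)]

# Two distinct mixing distributions with the same mean (1/2) and the same
# second moment (29/100); chosen purely for illustration.
G1 = [(Fr(3, 10), Fr(1, 2)), (Fr(7, 10), Fr(1, 2))]
G2 = [(Fr(1, 10), Fr(1, 5)), (Fr(3, 5), Fr(4, 5))]

if __name__ == "__main__":
    print(mixture_pmf(2, G1) == mixture_pmf(2, G2))   # n = 2 < 2c - 1
    print(mixture_pmf(3, G1) == mixture_pmf(3, G2))   # n = 3 = 2c - 1
```

With n = 2 the two mixtures coincide, while with n = 3 = 2c - 1 they differ because the third moments of G1 and G2 differ.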
2.1.2 1-Population Finite Mixture of Multinomials
In exploring another dimension of the identifiability problem,
Chandra (1977) related the identifiability of the class of mixtures of
multivariate distributions to the identifiability of the class of
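mixtures of the corresponding marginals. In what follows, $\mathcal{G}$ is defined
to be a class of arbitrary distribution functions. Let $X_i \sim F_i(\cdot;\boldsymbol\theta_i)$
for i=1,...,k and let $\mathbf{X} = (X_1,\ldots,X_k) \sim F(\cdot;\boldsymbol\theta)$. Then Chandra (1977)
in his theorem 2.1 showed that the identifiability of the class

$$\mathcal{H}_{F_i} = \Big\{H_i(x);\ H_i(x) = F_i(x;\boldsymbol\theta_i) \underset{\boldsymbol\theta_i}{\wedge} G_i(\boldsymbol\theta_i),\ G_i \in \mathcal{G}\Big\}$$

for all i=1,2,...,k
implied the identifiability of the class

$$\mathcal{H}_F = \Big\{H(\mathbf{x});\ H(\mathbf{x}) = F(\mathbf{x};\boldsymbol\theta) \underset{\boldsymbol\theta}{\wedge} G(\boldsymbol\theta),\ G \in \mathcal{G}\Big\}.$$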
Chandra's theorem permits an immediate extension of Teicher's
results to yield a new identifiability condition for the class of finite
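mixtures of multinomial distributions.
Lemma 2.2: Let $M(\mathbf{x};n,\mathbf{p})$ be the distribution function of a multinomial
distribution with parameters $(n,\mathbf{p})$, where $\mathbf{p} = (p_1,\ldots,p_r)$, $p_i > 0$, and
$\sum_{i=1}^{r} p_i = 1$. Then the class

$$\mathcal{H}_M = \Big\{H(\mathbf{x});\ H(\mathbf{x}) = M(\mathbf{x};n,\mathbf{p}) \underset{\mathbf{p}}{\wedge} G(\mathbf{p}),\ G \in \mathcal{G}_c^*\Big\} \tag{2.1}$$

is identifiable if and only if $n \ge 2c - 1$.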
Proof. Let $G_i(p_i)$ be the marginal distribution function (d.f.) of
$G(p_1,\ldots,p_r)$ with respect to $p_i$. Then the marginal d.f. $H_1$ of $X_1$ can
be obtained as

$$\begin{aligned}
H_1(x) &= \int_{(-\infty,x]\times Z^{r-1}} dH(x_1,\ldots,x_r) \\
&= \int_0^1 \cdots \int_0^1 \Big[\int_{(-\infty,x]\times Z^{r-1}} dM(\mathbf{x};n,\mathbf{p})\Big]\, dG(p_1,\ldots,p_r) \\
&= \int_0^1 \cdots \int_0^1 \Big(\int_{(-\infty,x]} dB(x_1;n,p_1)\Big)\, dG(p_1,\ldots,p_r) \\
&= \int_0^1 \cdots \int_0^1 B(x;n,p_1)\, dG(p_1,\ldots,p_r) \\
&= B(x;n,p_1) \underset{p_1}{\wedge} G_1(p_1),
\end{aligned}$$

where the interchange of integrations in the second step can be
justified using the result in Neveu (1965, p. 77).
Similarly $X_i \sim B(x;n,p_i) \underset{p_i}{\wedge} G_i(p_i)$ for i=1,...,r. Since $G_i$ is a
marginal distribution of $p_i$, the number of atoms, say $c_i$, is less than
or equal to c. Thus $n \ge 2c - 1$ implies $n \ge 2c_i - 1$ for i=1,...,r.
Hence by lemma 2.1 each class of mixtures of binomial distributions is
identifiable. Thus by theorem 2.1 of Chandra (1977), $\mathcal{H}_M$ is identifiable
if $n \ge 2c - 1$.
For the necessary condition we prove the contrapositive. Suppose
n < 2c - 1. Then it suffices to show that there exist two different
mixing distributions, say $G_1$ and $G_2$ in $\mathcal{G}_c^*$, giving rise to a common
mixture distribution. Consider $G_1$ and $G_2$ whose c atoms are
$\mathbf{p}_i = (p,\ldots,p,\ q_i,\ 1-(r-2)p-q_i)$ for i=1,...,c with corresponding
probabilities $\boldsymbol\pi = (\pi_1,\ldots,\pi_c)$ and $\mathbf{p}_{c+i} = (p,\ldots,p,\ q_{c+i},\ 1-(r-2)p-q_{c+i})$
for i=1,...,c with corresponding probabilities $\boldsymbol\pi^* = (\pi_{c+1},\ldots,\pi_{2c})$,
respectively, where $q_1,\ldots,q_c,q_{c+1},\ldots,q_{2c}$ are all distinct. To
prove the result we need to demonstrate the existence of $\boldsymbol\pi$ and $\boldsymbol\pi^*$ such
that

$$\sum_{i=1}^{c} \pi_i M(\mathbf{x};n,\mathbf{p}_i) = \sum_{i=1}^{c} \pi_{c+i} M(\mathbf{x};n,\mathbf{p}_{c+i}) \quad \text{for all } \mathbf{x}. \tag{2.2}$$

But (2.2) is equivalent to

$$\sum_{i=1}^{2c} \delta_i M(\mathbf{x};n,\mathbf{p}_i) = 0 \quad \text{for all } \mathbf{x}, \tag{2.3}$$

for suitable choices of $\delta_i$'s. Since $M(n,\mathbf{p})$ has the probability
generating function $\{t_1p_1+\cdots+t_{r-1}p_{r-1}+(1-\sum_{i=1}^{r-1}p_i)\}^n$ when $\mathbf{p} = (p_1,\ldots,p_r)$,
(2.3) is equivalent to

$$\sum_{i=1}^{2c} \delta_i \{t_1p+\cdots+t_{r-2}p+t_{r-1}q_i+[1-(r-2)p-q_i]\}^n = 0 \tag{2.4}$$

for all $\mathbf{t} = (t_1,\ldots,t_{r-1})$.
Since (2.4) holds identically in $\mathbf{t}$, (2.4) is equivalent to

$$\sum_{i=1}^{2c} \delta_i (1+up+wq_i)^n = 0 \tag{2.5}$$

for all (u,w), where $u = \sum_{i=1}^{r-2}(t_i-1)$ and $w = t_{r-1}-1$.
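Now, (2.5) holds if the following homogeneous linear equations have a
nontrivial solution:

$$\begin{bmatrix}
1 & 1 & 1 & \cdots & 1 \\
p & p & p & \cdots & p \\
q_1 & q_2 & q_3 & \cdots & q_{2c} \\
p^2 & p^2 & p^2 & \cdots & p^2 \\
pq_1 & pq_2 & pq_3 & \cdots & pq_{2c} \\
q_1^2 & q_2^2 & q_3^2 & \cdots & q_{2c}^2 \\
p^3 & p^3 & p^3 & \cdots & p^3 \\
p^2q_1 & p^2q_2 & p^2q_3 & \cdots & p^2q_{2c} \\
pq_1^2 & pq_2^2 & pq_3^2 & \cdots & pq_{2c}^2 \\
q_1^3 & q_2^3 & q_3^3 & \cdots & q_{2c}^3 \\
\vdots & \vdots & \vdots & & \vdots \\
q_1^n & q_2^n & q_3^n & \cdots & q_{2c}^n
\end{bmatrix}
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \\ \vdots \\ \delta_{2c} \end{bmatrix} = \mathbf{0}. \tag{2.6}$$

We rewrite (2.6) as $P\boldsymbol\delta = \mathbf{0}$, where $P$ is an $\binom{n+2}{2} \times 2c$ matrix and $\boldsymbol\delta$ is a
2c×1 vector. After deleting the linearly dependent rows in $P$, (2.6)
can be reduced to

$$\begin{bmatrix}
1 & 1 & \cdots & 1 \\
q_1 & q_2 & \cdots & q_{2c} \\
q_1^2 & q_2^2 & \cdots & q_{2c}^2 \\
\vdots & \vdots & & \vdots \\
q_1^n & q_2^n & \cdots & q_{2c}^n
\end{bmatrix}
\begin{bmatrix} \delta_1 \\ \delta_2 \\ \vdots \\ \delta_{2c} \end{bmatrix} = \mathbf{0}, \tag{2.7}$$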
which we denote by $Q\boldsymbol\delta = \mathbf{0}$. In order to have a nontrivial solution in
(2.7), we need rank(Q) < 2c. But this is guaranteed since n+1 < 2c, and hence
rank(Q) ≤ min(n+1, 2c) = n+1 < 2c. Thus nontrivial values of $\boldsymbol\delta$ and
hence of $\boldsymbol\pi$ and $\boldsymbol\pi^*$ can be found to satisfy (2.2).
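
The rank argument can be illustrated with an explicit nontrivial solution of the Vandermonde-type system $\sum_i \delta_i q_i^j = 0$, j = 0,...,n. The sketch below uses a standard divided-difference identity on the first n+2 of the distinct points and sets the remaining entries to zero. This construction, the function name and the sample points are assumptions made for illustration; the proof above only requires that some nontrivial solution exists.

```python
from fractions import Fraction

def null_vector(q, n):
    """Return a nonzero delta with sum_i delta_i * q_i**j == 0 for j = 0..n.

    q : list of at least n + 2 distinct values (here exact Fractions)
    Uses delta_i = 1 / prod_{j != i} (q_i - q_j) on the first n + 2 points,
    which annihilates every polynomial of degree at most n.
    """
    m = n + 2
    if len(q) < m:
        raise ValueError("need at least n + 2 distinct points, i.e. n + 1 < len(q)")
    delta = []
    for i in range(m):
        prod = Fraction(1)
        for j in range(m):
            if j != i:
                prod *= q[i] - q[j]
        delta.append(1 / prod)
    return delta + [Fraction(0)] * (len(q) - m)

if __name__ == "__main__":
    q = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5)]  # 2c = 4
    d = null_vector(q, n=2)                                               # n + 1 = 3 < 4
    print(d, [sum(di * qi**j for di, qi in zip(d, q)) for j in range(3)])
```

Each of the n+1 moment sums evaluates to exactly zero while the vector itself is nonzero, in line with rank(Q) ≤ n+1 < 2c.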
2.1.3 k-Population Finite Mixture of Multinomials
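Suppose we observe k sets of independent random variables
$\mathbf{X}_{i1}, \mathbf{X}_{i2}, \ldots, \mathbf{X}_{ir_i}$ generated from $M(n_i,\mathbf{p}) \underset{\mathbf{p}}{\wedge} G(\mathbf{p})$ for i=1,...,k and
$G \in \mathcal{G}_c^*$. In the following discussion we first define the identifiability
problem of the k-population finite mixture model in general and then
specialize it to the multinomial context. Let

$$\mathcal{F}^{(k)} = \Big\{\mathbf{F}^{(k)}(\mathbf{x};\boldsymbol\theta) = \big(F_1(\mathbf{x};\boldsymbol\theta),\ldots,F_k(\mathbf{x};\boldsymbol\theta)\big);\ \boldsymbol\theta \in R_1^m\Big\}$$

be a class of k-vectors, each of whose elements is a d-dimensional
distribution function indexed by a point $\boldsymbol\theta \in R_1^m$ in a Borel subset $R_1^m$
of Euclidean m-space $R^m$ such that each element $F_i(\mathbf{x};\boldsymbol\theta)$ of the vector
$\mathbf{F}^{(k)}(\mathbf{x};\boldsymbol\theta)$ is measurable in $R^d \times R_1^m$. Then a vector of mixtures

$$\mathbf{H}^{(k)}(\mathbf{x}) =
\begin{bmatrix} H_1(\mathbf{x}) \\ H_2(\mathbf{x}) \\ \vdots \\ H_k(\mathbf{x}) \end{bmatrix} =
\begin{bmatrix}
F_1(\mathbf{x};\boldsymbol\theta) \underset{\boldsymbol\theta}{\wedge} G(\boldsymbol\theta) \\
F_2(\mathbf{x};\boldsymbol\theta) \underset{\boldsymbol\theta}{\wedge} G(\boldsymbol\theta) \\
\vdots \\
F_k(\mathbf{x};\boldsymbol\theta) \underset{\boldsymbol\theta}{\wedge} G(\boldsymbol\theta)
\end{bmatrix} \tag{2.8}$$

is the image of the map of $G \in \mathcal{G}_c^*$.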
Definition 2.2: Let $\mathcal{H}^{(k)}$ be the class of k-population mixtures of $F^{(k)}$ induced by the above mapping. Then $\mathcal{H}^{(k)}$ is said to be identifiable if and only if this map is one to one from $\mathcal{G}_c^*$ onto $\mathcal{H}^{(k)}$.
We call $H^{(k)}(\mathbf{x})$ a k-population finite mixture and denote with corresponding small letters the set of marginal probability density functions if they exist.
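We now specialize the argument to the multinomial context. Let
$$\mathcal{M}^{(k)} = \Bigl\{(M(\mathbf{x};n_1,\mathbf{p}), M(\mathbf{x};n_2,\mathbf{p}),\ldots,M(\mathbf{x};n_k,\mathbf{p}));\ \mathbf{p} = (p_1,\ldots,p_{r-1})',\ 0 < p_i < 1 \text{ for } i=1,\ldots,r-1,\ \sum_{i=1}^{r-1}p_i < 1\Bigr\}$$
be a class of vectors of k multinomial distribution functions, $n_1,n_2,\ldots,n_k$ being fixed, and let
$$\mathcal{H}_M^{(k)} = \Bigl\{\bigl(M(\mathbf{x};n_1,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p}),\ M(\mathbf{x};n_2,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p}),\ \ldots,\ M(\mathbf{x};n_k,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p})\bigr)';\ G \in \mathcal{G}_c^*\Bigr\}$$
and
$$\mathcal{H}_j = \{M(\mathbf{x};n_j,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p});\ G \in \mathcal{G}_c^*\},\qquad j=1,\ldots,k.$$
Then we prove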
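Lemma 2.4: Suppose $\mathcal{H}_j$ is not identifiable. Then any $\mathcal{H}_i$ such that $n_i \le n_j$ is not identifiable, and there exists at least one common pair $(G_1,G_2)$, $G_1 \ne G_2$, that is mapped to a given mixture for all $i$ such that $n_i \le n_j$.
Proof. Non-identifiability of $\mathcal{H}_j$ implies that for $G_1, G_2 \in \mathcal{G}_c^*$ with $G_1 \ne G_2$ we have a common $h_j(\mathbf{x})$ such that
$$h_j(\mathbf{x}) = \int_0^1\!\cdots\!\int_0^1 m(\mathbf{x};n_j,\mathbf{p})\,dG_1(\mathbf{p}) = \int_0^1\!\cdots\!\int_0^1 m(\mathbf{x};n_j,\mathbf{p})\,dG_2(\mathbf{p}). \tag{2.9}$$
However, by lemma 2.2, (2.9) is true if and only if $n_j < 2c-1$. Now, we define
$$S(n,r) = \Bigl\{(t_1,\ldots,t_r);\ t_i \in I^+\cup\{0\} \text{ for all } i,\ \sum_{i=1}^{r}t_i = n\Bigr\}$$
and
$$S^0(n,r) = \Bigl\{(t_1,\ldots,t_r);\ t_i \in I^+\cup\{0\} \text{ for all } i,\ \sum_{i=1}^{r}t_i \le n\Bigr\},$$
where $I^+$ is the set of positive integers.
Then, for $G = G_1$ or $G_2$,
$$h_j(\mathbf{x}) = \int_0^1\!\cdots\!\int_0^1\binom{n_j}{x_1,\ldots,x_r}p_1^{x_1}\cdots p_{r-1}^{x_{r-1}}\Bigl(1-\sum_{i=1}^{r-1}p_i\Bigr)^{x_r}dG(\mathbf{p})$$
$$\qquad = \binom{n_j}{x_1,\ldots,x_r}\sum_{\ell=0}^{x_r}(-1)^{\ell}\binom{x_r}{\ell}\sum_{y_1+\cdots+y_{r-1}=\ell}\binom{\ell}{y_1,\ldots,y_{r-1}}\int_0^1\!\cdots\!\int_0^1\prod_{i=1}^{r-1}p_i^{x_i+y_i}\,dG(\mathbf{p})$$
for $(x_1,\ldots,x_r) \in S(n_j,r)$, where $\mathbf{y} = (y_1,\ldots,y_{r-1})$.
Consequently (2.9) is true if and only if
$$\sum_{\ell=0}^{x_r}(-1)^{\ell}\binom{x_r}{\ell}\sum_{y_1+\cdots+y_{r-1}=\ell}\binom{\ell}{y_1,\ldots,y_{r-1}}\bigl[\mu^{(1)}(x_1+y_1,\ldots,x_{r-1}+y_{r-1}) - \mu^{(2)}(x_1+y_1,\ldots,x_{r-1}+y_{r-1})\bigr] = 0$$
for $(x_1,\ldots,x_r) \in S(n_j,r)$, where
$$\mu^{(i)}(v_1,\ldots,v_{r-1}) = \int_0^1\!\cdots\!\int_0^1\prod_{k=1}^{r-1}p_k^{v_k}\,dG_i(\mathbf{p}),\qquad i=1,2,$$
and $\mathbf{v} = (v_1,\ldots,v_{r-1})$.
Hence (2.9) is true if and only if
$$\mu^{(1)}(v_1,\ldots,v_{r-1}) = \mu^{(2)}(v_1,\ldots,v_{r-1})$$
for $\mathbf{v} \in S^0(n_j,r-1)$.
Since $S^0(n_i,r-1) \subseteq S^0(n_j,r-1)$ whenever $n_i \le n_j$, the same result will follow for $n_i \le n_j$ with the same $(G_1,G_2)$. □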
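Lemma 2.5: If there exists $j \ge 1$ such that $\mathcal{H}_j$ is identifiable, then $\mathcal{H}_M^{(k)}$ is identifiable.
Proof. Suppose $\mathcal{H}_M^{(k)}$ is not identifiable. Then there exist two different mixing distributions $G_1,G_2 \in \mathcal{G}_c^*$ such that
$$M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_1(\mathbf{p}) = M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_2(\mathbf{p})\quad\text{for all } \mathbf{x}$$
for $i=1,\ldots,k$ with the same $(G_1,G_2)$.
Consequently no $\mathcal{H}_j$ is identifiable. □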
Theorem 2.1: A necessary and sufficient condition that the class $\mathcal{H}_M^{(k)}$ of all k-population finite mixtures of multinomial distributions be identifiable is
$$\max_{1\le i\le k} n_i \ge 2c-1.$$
Proof: Without loss of generality we assume that $n_1 \ge n_2 \ge \cdots \ge n_k$.
Suppose $n_1 < 2c-1$. Then by lemma 2.3 we have
$$H_1(\mathbf{x}) = M(\mathbf{x};n_1,\mathbf{p})\wedge_{\mathbf{p}}G_1(\mathbf{p}) = M(\mathbf{x};n_1,\mathbf{p})\wedge_{\mathbf{p}}G_2(\mathbf{p}) \tag{2.11}$$
for $G_1 \ne G_2$, where $G_1,G_2 \in \mathcal{G}_c^*$.
Hence by lemma 2.4 we still have
$$H_i(\mathbf{x}) = M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_1(\mathbf{p}) = M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_2(\mathbf{p}) \tag{2.12}$$
for $i=1,\ldots,k$.
Thus $\mathcal{H}_M^{(k)}$ is not identifiable.
The other direction is a direct application of lemma 2.5. □
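As a small illustration of how the condition of Theorem 2.1 might be checked for a given design, the sketch below encodes the inequality directly. The function name `is_identifiable` is a hypothetical helper introduced here, not notation from the text.

```python
def is_identifiable(n_values, c):
    """Theorem 2.1 condition: max_i n_i >= 2c - 1 for a c-component mixture."""
    return max(n_values) >= 2 * c - 1

# With c = 2 components, a single population with n_i >= 3 suffices,
# even if the other populations have smaller n_i.
mixed_design = is_identifiable([1, 2, 3], c=2)
small_design = is_identifiable([1, 2], c=2)
```

Only the largest $n_i$ matters, so populations with small $n_i$ do not prevent identifiability of the joint model.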
2.2 ESTIMATION OF THE MIXING DISTRIBUTION
2.2.1 Preliminaries
Many methods have been suggested for the estimation of the mixing distribution in the 1-population mixture model, ranging from the method of moments to a formal maximum likelihood (ML) approach to methods based on numerical analysis techniques and minimum distance methods. See Pearson (1894, 1915), Rider (1961a, 1961b), and Blischke (1962, 1964) for the method of moments, Kabir (1968) for the numerical analysis technique, Choi and Bulgren (1968) and Deely and Kruse (1968) for the distance methods, and Hasselblad (1969), Sundberg (1974), and Dempster, Laird and Rubin (1977) for the ML approach.
The rationale for concentrating on the moment estimator, or on the
distance method, rather than the ML approach had been that the latter
method yielded 'highly' intractable equations. However, it was not until
the late 1960's, with the emergence of fast electronic digital computers,
that the ML approach was suggested in various forms for incomplete data.
Observations from a finite mixture are considered incomplete, because the
component in the population from which each observation originates is
unknown. Thus Hasselblad (1969) developed a set of iterative equations
for obtaining estimates for the 1-population finite mixture of members
of the exponential family, and Orchard and Woodbury (1972) proposed
the missing information principle (MIP) for a problem that originated
from estimating genotypic frequencies from phenotypic frequency data.
Sundberg (1974) considered a ML approach for incomplete data when the
iid data came from an exponential family member and suggested a
fundamental set of formulae for the current iterative computational
approach to obtaining the ML estimator. Under the assumption of
positive definiteness for the information matrix, he also obtained the
consistency, asymptotic efficiency and asymptotic normality of the ML
estimator.
The statistical analysis tools for incomplete data were finally
unified when Dempster, Laird and Rubin (1977) (henceforth abbreviated
DLR) suggested an EM algorithm, which included as special cases Orchard
and Woodbury's MIP and Hasselblad's iteration equations for the finite
mixture problem. The 'E' in EM stands for the expectation step,
which consists of estimating the complete data sufficient statistic by
constructing the conditional expectation of the complete data given
the observed incomplete data and the current fit of the parameter. The
'M' implies a maximization step, which takes the estimated complete data
and estimates the parameters by the ML methods as though the estimated
complete data were the observed data. The EM algorithm is defined by
cycling back and forth between these two steps.
In their paper DLR showed that the likelihood is nondecreasing at each iteration of the EM algorithm. Recently Wu (1983) presented an elegant study on the convergence of the EM algorithm, which was not clear in the original work of DLR. One of Wu's primary results that
is relevant to our mixture model is that if the unobserved complete data
specification can be described by a curved exponential family or satisfies a mild regularity condition (condition (10) in his paper), then all the limit points of any EM sequence are stationary points of the likelihood function. Also it has been recommended by various authors (Hasselblad (1969), Wolf (1970), Laird (1978) and Wu (1983), among others) that the EM algorithm be run several times from different initial values to minimize the chance of entrapment at a stationary point that is not a local maximum.
Recently Louis (1982) aided applicability of the EM algorithm by
developing an implementation based on the complete data gradient and
the second derivative matrix to find the observed information matrix.
In many cases Efron and Hinkley (1978) have shown the inverse of the observed information to be a more appropriate measure of the covariance matrix than the traditional approximation $I(\hat\theta)^{-1}$, where $\hat\theta$ is a maximum likelihood estimator and $I$ is the Fisher information matrix.
2.2.2 Maximum Likelihood Equations
In this subsection we extend Hasselblad's iterative equations for
obtaining the ML estimator of the 1-population finite mixture of an
exponential family to the case of a k-population finite mixture.
Hence we can obtain the ML estimator of the k-population finite mixture
of binomial distributions as a special case of the resulting algorithm.
Let $(X_{i1},X_{i2},\ldots,X_{ir_i})$ be a random sample from the i-th population distribution $h_i(x)$, which is a mixture of c component distributions, that is,
$$h_i(x) = \sum_{j=1}^{c}\pi_j f_{ij}(x;\theta_j), \tag{2.13}$$
where $f_{ij}(x;\theta_j)$ is the j-th component distribution in the i-th population and is assumed to belong to an s-parameter exponential family, $\pi_j$ is a mixing proportion for the j-th component and $\theta_j = (\theta_{1j},\ldots,\theta_{sj})$ is the parameter vector.
Thus
$$f_{ij}(x;\theta_j) = A_i(x)\,C_i(\theta_{1j},\ldots,\theta_{sj})\,\mathrm{Exp}[\theta_{1j}T_1(x)+\cdots+\theta_{sj}T_s(x)] \tag{2.14}$$
for $i=1,\ldots,k$ and $j=1,\ldots,c$.
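Define $\mathbf{x} = ((x_{11},\ldots,x_{1r_1}),(x_{21},\ldots,x_{2r_2}),\ldots,(x_{k1},\ldots,x_{kr_k}))$, $\pi = (\pi_1,\ldots,\pi_{c-1})$, $\pi_c = 1-\sum_{j=1}^{c-1}\pi_j$, and $\psi = (\pi,\theta_1,\ldots,\theta_c)$.
Then the log-likelihood $L^*(\mathbf{x};\psi)$ of the k-population data becomes
$$L^*(\mathbf{x};\psi) = \sum_{\ell=1}^{r_1}\log\Bigl\{\sum_{j=1}^{c}\pi_jA_1(x_{1\ell})C_1(\theta_{1j},\ldots,\theta_{sj})\,\mathrm{Exp}[\theta_{1j}T_1(x_{1\ell})+\cdots+\theta_{sj}T_s(x_{1\ell})]\Bigr\}$$
$$\qquad + \sum_{\ell=1}^{r_2}\log\Bigl\{\sum_{j=1}^{c}\pi_jA_2(x_{2\ell})C_2(\theta_{1j},\ldots,\theta_{sj})\,\mathrm{Exp}[\theta_{1j}T_1(x_{2\ell})+\cdots+\theta_{sj}T_s(x_{2\ell})]\Bigr\}$$
$$\qquad + \cdots + \sum_{\ell=1}^{r_k}\log\Bigl\{\sum_{j=1}^{c}\pi_jA_k(x_{k\ell})C_k(\theta_{1j},\ldots,\theta_{sj})\,\mathrm{Exp}[\theta_{1j}T_1(x_{k\ell})+\cdots+\theta_{sj}T_s(x_{k\ell})]\Bigr\}. \tag{2.15}$$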
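If $C_1,C_2,\ldots,C_k$ are differentiable, then the ML equations can be derived as follows;
$$\frac{\partial L^*}{\partial\theta_{pj}} = \sum_{\ell=1}^{r_1}\frac{\pi_jf_{1j}(x_{1\ell})}{h_1(x_{1\ell})}\Bigl[\frac{\partial\log C_1}{\partial\theta_{pj}}+T_p(x_{1\ell})\Bigr] + \sum_{\ell=1}^{r_2}\frac{\pi_jf_{2j}(x_{2\ell})}{h_2(x_{2\ell})}\Bigl[\frac{\partial\log C_2}{\partial\theta_{pj}}+T_p(x_{2\ell})\Bigr] + \cdots + \sum_{\ell=1}^{r_k}\frac{\pi_jf_{kj}(x_{k\ell})}{h_k(x_{k\ell})}\Bigl[\frac{\partial\log C_k}{\partial\theta_{pj}}+T_p(x_{k\ell})\Bigr] \tag{2.16}$$
for $p=1,\ldots,s$ and $j=1,\ldots,c$,
and
$$\frac{\partial L^*}{\partial\pi_j} = \sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\frac{f_{ij}(x_{i\ell})-f_{ic}(x_{i\ell})}{h_i(x_{i\ell})} \tag{2.17}$$
for $j=1,\ldots,c-1$.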
We assume there exists a real valued function $C(\theta_j)$ such that
$$\frac{\partial\log C_i(\theta_j)}{\partial\theta_{pj}} = n_i\,\frac{\partial\log C(\theta_j)}{\partial\theta_{pj}} \tag{2.18}$$
for $p=1,\ldots,s$, $j=1,\ldots,c$, where the $n_i$'s are constants. It can be easily checked that the binomial, Poisson, normal and gamma distribution $G(\lambda,r_i)$ with known $r_i$ satisfy the condition (2.18).
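Setting equations (2.16) and (2.17) equal to zero yields the following ML equations for $p=1,\ldots,s$ and $j=1,\ldots,c$;
$$-\frac{\partial\log C(\theta_j)}{\partial\theta_{pj}} = \frac{\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}T_p(x_{i\ell})}{\sum_{i=1}^{k}n_i\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}} \tag{2.19}$$
$$\pi_j = \frac{\pi_j}{r_+}\Bigl[\sum_{\ell=1}^{r_1}\frac{f_{1j}(x_{1\ell})}{h_1(x_{1\ell})} + \sum_{\ell=1}^{r_2}\frac{f_{2j}(x_{2\ell})}{h_2(x_{2\ell})} + \cdots + \sum_{\ell=1}^{r_k}\frac{f_{kj}(x_{k\ell})}{h_k(x_{k\ell})}\Bigr], \tag{2.20}$$
where $r_+ = \sum_{i=1}^{k}r_i$.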
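Next, consider the case where each of the k populations has a single (component) distribution instead of a mixture distribution. If the corresponding ML equations have a closed-form solution for $\theta_p$, we may use that closed-form solution of $\theta_p$ for the solution of equation (2.19) and achieve a major simplification in computation. Thus let $L_s^*(\mathbf{x}:\theta)$ be the log-likelihood of $\mathbf{x}$ when the number of components c is equal to 1.
Then
$$L_s^*(\mathbf{x}:\theta) = \sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\Bigl\{\log A_i(x_{i\ell}) + \log C_i(\theta) + \sum_{p=1}^{s}\theta_pT_p(x_{i\ell})\Bigr\} \tag{2.21}$$
when $(\theta_1,\ldots,\theta_s) = \theta = \theta_1$.
Under assumption (2.18) a set of ML equations is given by
$$-\frac{\partial\log C(\theta)}{\partial\theta_p} = \frac{\sum_{\ell=1}^{r_1}T_p(x_{1\ell})+\sum_{\ell=1}^{r_2}T_p(x_{2\ell})+\cdots+\sum_{\ell=1}^{r_k}T_p(x_{k\ell})}{n_1r_1+n_2r_2+\cdots+n_kr_k} \tag{2.22}$$
for $p=1,\ldots,s$.
If equation (2.22) has a closed form solution for $\theta_p$, say
$$\theta_p = g_p(t_1,\ldots,t_s),\qquad p=1,\ldots,s, \tag{2.23}$$
where
$$t_p = \frac{\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}T_p(x_{i\ell})}{\sum_{i=1}^{k}n_ir_i}, \tag{2.24}$$
then equations (2.19) can be written as
$$\theta_{pj} = g_p(t_{1j},\ldots,t_{sj}), \tag{2.25}$$
where
$$t_{pj} = \frac{\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}T_p(x_{i\ell})}{\sum_{i=1}^{k}n_i\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}} \tag{2.26}$$
for $p=1,\ldots,s$, $j=1,\ldots,c$.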
If we denote the estimate of the parameter $\psi$ at the $\nu$-th iteration by $\psi^{(\nu)}$, then the $t_{pj}$'s in (2.26) can be evaluated by using $\psi^{(\nu)}$. The new estimates are given by (2.25) as
$$\theta_{pj}^{(\nu+1)} = g_p\bigl(t_{1j}^{(\nu)},t_{2j}^{(\nu)},\ldots,t_{sj}^{(\nu)}\bigr). \tag{2.27}$$
Similarly (2.20) can be updated as
$$\pi_j^{(\nu+1)} = \frac{\pi_j^{(\nu)}}{r_+}\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\frac{f_{ij}^{(\nu)}(x_{i\ell})}{h_i^{(\nu)}(x_{i\ell})} \tag{2.28}$$
for $j=1,\ldots,c$, where the superscript $(\nu)$ implies that the reference quantities are evaluated at $\psi^{(\nu)}$.
Thus we can use equations (2.27) and (2.28) as a basis for the iterative
algorithm. In the language of DLR (2.26) corresponds to the E-step
and (2.27)-(2.28) corresponds to the M-step in the EM algorithm.
We note that in the k-population finite mixture of multinomials
case only the largest ni determines the identifiability; hence there may
be elements in the k-population finite mixture that, marginally, lack
identifiability. For estimation purposes, however, data from all
k-populations are used even though some of them may not be marginally
identifiable.
2.2.3 Asymptotic Distribution of the ML Estimator
For the asymptotic normality of the ML estimator $\hat\theta$ for a k-population finite mixture of binomial distributions we rely on the usual maximum likelihood asymptotic theory. Cramér (1946) showed that under certain regularity conditions the likelihood solution is consistent, asymptotically normal, and asymptotically efficient. Cramér's proof was extended by Chanda (1954) to the multivariate case. Chanda's proof of the uniqueness of the consistent root of the likelihood solution is not correct. A correct version is provided by Tarone and
Gruenhage (1975). In a more general setting Sundberg (1974) provides a
maximum likelihood asymptotic theory for incomplete data from an
exponential family member, which employs Chanda's extension of Cramer's
conditions. However, it is not clear in his proof that Sundberg checked
Tarone and Gruenhage's conditions that need to be added to Chanda's
conditions.
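Lemma 2.6 (Chanda (1954)): Suppose $f(y:\theta)$ is a probability density law; $\theta = (\theta_1,\ldots,\theta_k)$ is the unknown parameter vector and $y_1,y_2,\ldots,y_n$ are n independent observations on y. The likelihood equations for estimating $\theta$ are given by
$$\frac{\partial\log L}{\partial\theta_r} = 0,\qquad r=1,2,\ldots,k,$$
where $\log L = \sum_{i=1}^{n}\log f(y_i;\theta)$.
Let $\theta_0$ be the unknown true value of the parameter vector $\theta$, which exists at some point in the region $\Theta$. Then if the conditions (i)-(iii), below, hold, there exists a unique consistent estimator $\hat\theta_n$ corresponding to a solution of the likelihood equation. Further $\sqrt{n}(\hat\theta_n-\theta_0)$ is asymptotically normal with mean 0 and covariance matrix $I(\theta_0)^{-1}$, where $I(\theta_0)$ is the Fisher information matrix.
Condition (i): For almost all y and for all $\theta \in \Theta$
$$\frac{\partial\log f}{\partial\theta_r},\quad \frac{\partial^2\log f}{\partial\theta_r\partial\theta_s}\quad\text{and}\quad \frac{\partial^3\log f}{\partial\theta_r\partial\theta_s\partial\theta_t}$$
exist for all $r,s,t = 1,2,\ldots,k$.
Condition (ii): For almost all y and for all $\theta \in \Theta$
$$\Bigl|\frac{\partial f}{\partial\theta_r}\Bigr| < F_r(y),\quad \Bigl|\frac{\partial^2 f}{\partial\theta_r\partial\theta_s}\Bigr| < F_{rs}(y),\quad \Bigl|\frac{\partial^3\log f}{\partial\theta_r\partial\theta_s\partial\theta_t}\Bigr| < H_{rst}(y),$$
where
$$\int_{-\infty}^{\infty}H_{rst}(y)\,f\,dy < M < \infty$$
and $F_r(y)$ and $F_{rs}(y)$ are bounded for all $r,s,t = 1,\ldots,k$.
Condition (iii): For all $\theta \in \Theta$ the matrix
$$I(\theta) = \Bigl(\int_{-\infty}^{\infty}\Bigl(\frac{\partial\log f}{\partial\theta_r}\Bigr)\Bigl(\frac{\partial\log f}{\partial\theta_s}\Bigr)f\,dy\Bigr)$$
is positive definite.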
For the positive definiteness of the information matrix of the k-population mixture of c binomials when c is known a priori we prove the following:
Lemma 2.7: Let $\theta = (\pi_1,\pi_2,\ldots,\pi_{c-1},p_1,p_2,\ldots,p_c)$ and let the parameter space $\Theta$ be given by
$$\Theta = \Bigl\{(\pi_1,\ldots,\pi_{c-1},p_1,\ldots,p_c):\ 0<\pi_i<1,\ i=1,\ldots,c-1,\ \sum_{i=1}^{c-1}\pi_i<1,\ 0<p_1<p_2<\cdots<p_c<1\Bigr\}.$$
Let $\theta_0$, the true parameter value, be contained in some closed region $\bar\Theta$$^1$ which does not contain the boundary values of $\Theta$.
If the random variable $Y = (Y_1,\ldots,Y_k)$, where $Y_i = (Y_{i1},\ldots,Y_{ir_i})$ for $i=1,2,\ldots,k$, is distributed following the probability mass function
$$h_Y(y:\theta) = \prod_{i=1}^{k}\prod_{j=1}^{r_i}\bigl[\pi_1b(y_{ij};n_i,p_1)+\pi_2b(y_{ij};n_i,p_2)+\cdots+\pi_cb(y_{ij};n_i,p_c)\bigr], \tag{2.29}$$
where $\pi_c = 1-\sum_{i=1}^{c-1}\pi_i$, then the information matrix
$$I(\theta_0) = E_{\theta_0}\Bigl\{\Bigl[\sum_{i=1}^{k}\sum_{j=1}^{r_i}\frac{\partial\log h_i(y_{ij}:\theta)}{\partial\theta}\Bigr]\Bigl[\sum_{i=1}^{k}\sum_{j=1}^{r_i}\frac{\partial\log h_i(y_{ij}:\theta)}{\partial\theta}\Bigr]'\Bigr\}, \tag{2.30}$$
where $h_i(y_{ij}:\theta) = \sum_{t=1}^{c}\pi_tb(y_{ij};n_i,p_t)$ for $j=1,\ldots,r_i$, $i=1,\ldots,k$, is positive definite if and only if the identifiability condition $\max_{1\le i\le k}n_i \ge 2c-1$ holds.
$^1$Such a region always exists, because by the definition of the mixture of c components the boundary values of $\Theta$ are not true parameter values.
Proof. We first prove the positive definiteness of the information matrix, say $I_i(\theta)$, contained in a single observation $Y$ from $h_i(y:\theta)$.
For convenience we write
(2.31)
and
(2.32)
The first partial derivatives of $h_i(y;\theta)$ are given by
(2.33)
Since $I_i(\theta)$ is a dispersion matrix, it is positive definite unless there exists $\gamma' = (\gamma_1,\ldots,\gamma_{2c-1}) \ne 0$ such that
$$\sum_{\ell=1}^{2c-1}\gamma_\ell\,\frac{\partial\log h_i(y:\theta)}{\partial\theta_\ell} = 0 \tag{2.34}$$
for $y = 0,\ldots,n_i$.
The set of equations (2.34) constitutes a set of homogeneous linear equations
$$D_h^{-1}A\gamma = 0 \iff A\gamma = 0, \tag{2.35}$$
where $D_h = \mathrm{diag}(h_i(0),\ldots,h_i(n_i))$, and $A$ is a $(n_i+1)\times(2c-1)$ matrix whose y-th row appears in (2.33). Since the sum of each column of $A$ is equal to zero, we can delete any one row from $A$ in finding the solution of (2.35) and let $A^*$ denote the resulting matrix. When $n_i = 2c-1$, the matrix $A^*$ is nonsingular for $\theta \in \Theta$, which can be shown by elementary, but tedious column operations that transform $A^*$ into an echelon form.
Consequently only a trivial solution exists for $\gamma$ in (2.35) if and only if $n_i \ge 2c-1$. Hence $I_i(\theta)$ is positive definite if and only if $n_i \ge 2c-1$.
Now, for the information matrix $I(\theta)$ contained in $Y = (Y_1,\ldots,Y_k)$ we can decompose $I(\theta)$ as
$$I(\theta) = \sum_{i=1}^{k}r_iI_i(\theta). \tag{2.36}$$
Since $Y_1,\ldots,Y_k$ are independent and within the i-th population $Y_{i1},\ldots,Y_{ir_i}$ are i.i.d. for $i=1,\ldots,k$, (2.36) readily follows. Suppose $\max_{1\le i\le k}n_i = n_1 \ge 2c-1$. Then $I_1(\theta)$ is positive definite. Hence by (2.36) $I(\theta)$ is positive definite.
To prove the necessary condition we assume $\max_{1\le i\le k}n_i < 2c-1$, and solve the following homogeneous linear equations;
$$\sum_{\ell=1}^{2c-1}\gamma_\ell\,\frac{\partial\log h_i(y_{ij}:\theta)}{\partial\theta_\ell} = 0 \tag{2.37}$$
for $y_{ij} = 0,1,\ldots,n_i$ and $i=1,\ldots,k$.
Since (2.37) contains fewer than $2c-1$ linearly independent equations, a nontrivial solution for $\gamma$ exists. Hence $I(\theta)$ is not positive definite. □
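Lemma 2.7 can be illustrated numerically for $c = 2$. The sketch below is an illustration under assumed parameter values, not part of the proof: it computes the single-observation information matrix $I_i(\theta)$ for $\theta = (\pi, p_1, p_2)$ by summing $\nabla h\,\nabla h'/h$ over $y = 0,\ldots,n$ and inspects its smallest eigenvalue. The function names and the default values $\pi = 0.4$, $p_1 = 0.2$, $p_2 = 0.7$ are hypothetical.

```python
from math import comb
import numpy as np

def single_observation_information(n, pi, p1, p2):
    """Fisher information of one draw from pi*Bin(n,p1) + (1-pi)*Bin(n,p2)."""
    info = np.zeros((3, 3))
    for y in range(n + 1):
        b1 = comb(n, y) * p1**y * (1 - p1)**(n - y)
        b2 = comb(n, y) * p2**y * (1 - p2)**(n - y)
        h = pi * b1 + (1 - pi) * b2
        # Gradient of h with respect to (pi, p1, p2).
        grad = np.array([
            b1 - b2,
            pi * b1 * (y - n * p1) / (p1 * (1 - p1)),
            (1 - pi) * b2 * (y - n * p2) / (p2 * (1 - p2)),
        ])
        info += np.outer(grad, grad) / h
    return info

def smallest_eigenvalue(n, pi=0.4, p1=0.2, p2=0.7):
    """Smallest eigenvalue of the information matrix (values are illustrative)."""
    return float(np.linalg.eigvalsh(single_observation_information(n, pi, p1, p2))[0])

eig_n2 = smallest_eigenvalue(2)   # n < 2c - 1: information matrix is singular
eig_n3 = smallest_eigenvalue(3)   # n = 2c - 1: information matrix is positive definite
```

With these values the smallest eigenvalue is numerically zero for $n = 2$ and clearly positive for $n = 3 = 2c-1$.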
Now, we prove the large sample property of the ML estimator $\hat\theta = (\hat\pi_1,\ldots,\hat\pi_{c-1},\hat p_1,\ldots,\hat p_c)$ based on $\sum_{i=1}^{k}r_i = r_+$ observations.
Theorem 2.2:$^2$ Let $\theta$, $\Theta$ and $\bar\Theta$ be defined as in lemma 2.6. Let $\theta_0$ be the true parameter value contained in $\bar\Theta$. If the random variable $Y = (Y_1,\ldots,Y_k)$, where $Y_i = (Y_{i1},\ldots,Y_{ir_i})$ for $i=1,\ldots,k$, is distributed according to the probability mass function (2.29) with $\max_{1\le i\le k}n_i \ge 2c-1$, and if,

then for large enough $r_+$, there exists a unique consistent root $\hat\theta_{r_+}$ of the likelihood equations and $\sqrt{r_+}(\hat\theta_{r_+}-\theta_0)$ is asymptotically normally distributed with mean zero and covariance matrix $\bigl(\frac{1}{r_+}I(\theta_0)\bigr)^{-1}$.
$^2$Similar arguments can be found in N. Kiefer (1978), who considered a mixture of two normal distributions in the switching regression.
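Proof. The proof consists of verification of Chanda's conditions (i)-(iii), modified for the one-way layout nature of our data together with two extra conditions of Tarone and Gruenhage (1975). The condition (iii) is readily verified by the use of lemma 2.7. Verifying conditions (i) and (ii) involves straightforward differentiation. It can be easily seen that $\partial h_i/\partial\theta$ and $\partial^2h_i/\partial\theta\partial\theta'$ are all continuous functions on $\bar\Theta$, hence they are bounded. Now using the relation that
$$\frac{\partial\ln h_i}{\partial\theta_s} = \frac{1}{h_i}\frac{\partial h_i}{\partial\theta_s},$$
$$\frac{\partial^2\ln h_i}{\partial\theta_s\partial\theta_t} = \frac{1}{h_i}\frac{\partial^2h_i}{\partial\theta_s\partial\theta_t} - \frac{1}{h_i^2}\frac{\partial h_i}{\partial\theta_s}\frac{\partial h_i}{\partial\theta_t},$$
$$\frac{\partial^3\ln h_i}{\partial\theta_s\partial\theta_t\partial\theta_u} = \frac{2}{h_i^3}\frac{\partial h_i}{\partial\theta_s}\frac{\partial h_i}{\partial\theta_t}\frac{\partial h_i}{\partial\theta_u} - \frac{1}{h_i^2}\Bigl[\frac{\partial h_i}{\partial\theta_s}\frac{\partial^2h_i}{\partial\theta_t\partial\theta_u} + \frac{\partial h_i}{\partial\theta_t}\frac{\partial^2h_i}{\partial\theta_s\partial\theta_u} + \frac{\partial h_i}{\partial\theta_u}\frac{\partial^2h_i}{\partial\theta_s\partial\theta_t}\Bigr] + \frac{1}{h_i}\frac{\partial^3h_i}{\partial\theta_s\partial\theta_t\partial\theta_u},$$
Chanda's conditions (i) and (ii) are easily verified. The two extra conditions, that $\bar\Theta$ is a convex subset of $R^{2c-1}$ and that $\partial\ln h_i/\partial\theta_s$ and $\partial^2\ln h_i/\partial\theta_s\partial\theta_t$ are continuous for all $\theta \in \bar\Theta$, are readily verified.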
2.3 k-POPULATION FINITE MIXTURE OF BINOMIALS-APPLICATION
2.3.1 Description of the Ames Test
Since publication of the paper by Ames et al. (1975), the Ames test
has gained worldwide use for investigation of a chemical's mutagenicity.
Its extensive use in studies of genetic toxicology is due to the test's
sensitivity in detecting mutagens, its economy both with respect to
time and material, and the well-documented link between carcinogenicity
and mutagenicity. The Ames test is based on a very sensitive
bacterial test. The bacterial test uses several genetically constructed
histidine-dependent (auxotrophic) Salmonella typhimurium strains that
can be reverted to histidine independence (prototrophy) by a wide
variety of mutagens. This bacterial test is adapted for use in detecting chemicals that are potential human carcinogens or mutagens by adding homogenates of mammalian liver, which is a convenient source of the activating enzymes that are an important aspect of mammalian metabolism.
Ames et al. (1975) reported that about 85% of known animal carcinogens had been detected as bacterial mutagens and among 106 known non-carcinogens few were mutagenic in the test. Many Salmonella tester
strains have been developed by Ames and his colleagues; among them
TA 98, TA 100, TA 1535, and TA 1537 are most commonly used. As indi-
cated earlier, if a tester strain is hit by a mutagen, then it may be
reverted to prototrophy. Since prototrophic strains are capable of
synthesizing histidine, an essential amino acid, they continue growing
and dividing without an external supply of histidine, whereas auxotrophic strains, being entirely dependent on an external supply of
histidine, cannot sustain growth. Thus if at least one auxotrophic
bacterium reverts to its prototrophic state through mutation, there
will be continuous growth of its descendants after exhaustion of the
external supply of histidine. Thus mutagen-induced and spontaneous
revertants ultimately yield colonies that are visible to the human
eye.
It has been observed frequently that the toxicity effects of the
chemical increasingly outweigh its mutagenicity effects beyond certain
dose levels. Thus, toxicity must be considered as a competing risk
vis-à-vis the mutation process.
2.3.2 Statistical Analysis of the Experimental Data
The experimental data consist of the results of 763 compounds,
where the experiments followed the standard protocol of the Ames
test. Four tester strains, TA 98, TA 100, TA 1535, and TA 1537, were
employed, and three levels of metabolic activations were considered by
adding (i) no enzyme, (ii) liver homogenate from a hamster, and
(iii) liver homogenate from a rat, respectively, to each of a set of
three petri dishes. Each compound for each of the 12 combinations of four strains and three metabolic activations was tested $n_{at}$ times, for $a = 1,\ldots,763$, $t = 1,\ldots,12$. For each of the $n_{at}$ times the experiment should have consisted of 18 petri dishes, i.e., 3 replicates at control and 3 replicates at each of 5 dose levels, but there was occasional loss of dishes due to breakage, extreme toxicity, etc.
For the analysis of the observed numbers of revertant colonies in a single experiment, Margolin et al. (1981) suggested a family of mechanistic models based on the biological formulation of the Ames test. They also noted the existence of hyper-Poisson variability among the replicated plate counts and advocated the use of a negative binomial distribution. Their negative binomial model for the number $X$ of revertant colonies observed on a plate in a given environment was denoted by
(2.38)
where $\mu$ is shorthand for the environment-dependent parameter, $\mu > 0$ and $c > 0$.
The variability in replicated plates is reflected in c; when $c \to 0$ (2.38) reduces to the Poisson distribution through a standard limit argument.
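The Poisson limit can be checked numerically. Because the body of (2.38) is not legible in this copy, the sketch below assumes the common parameterization of the negative binomial with mean $\mu$ and variance $\mu + c\mu^2$; that form, and the function names, are assumptions made for illustration only.

```python
from math import exp, lgamma, log

def neg_binomial_pmf(x, mu, c):
    """Negative binomial pmf, ASSUMED form: mean mu, variance mu + c*mu**2."""
    r = 1.0 / c
    return exp(lgamma(x + r) - lgamma(r) - lgamma(x + 1)
               + x * log(c * mu / (1 + c * mu)) - r * log(1 + c * mu))

def poisson_pmf(x, mu):
    """Poisson pmf with mean mu."""
    return exp(x * log(mu) - mu - lgamma(x + 1))

# Largest pointwise gap between the two pmfs for a hypothetical mean of 20.
gap_small_c = max(abs(neg_binomial_pmf(x, 20.0, 1e-6) - poisson_pmf(x, 20.0))
                  for x in range(61))
gap_large_c = abs(neg_binomial_pmf(20, 20.0, 0.5) - poisson_pmf(20, 20.0))
```

Under this assumed form the gap shrinks toward zero as c decreases, while a moderate c such as 0.5 gives a visibly flatter distribution than the Poisson.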
In order to disentangle the competing risks of mutagenicity and toxicity, and hence to draw inferences regarding the mutagenicity, Margolin et al. (1981) modeled $\mu$ as $N_0P_D$, where $N_0$ is the known average number of microbes placed on a plate, which is large, e.g., $10^8$, and $P_D$ is the probability that any plated microbe yields a revertant colony when the plate is exposed to dose $D$ of a test chemical. Among those models for $P_D$ considered, they suggested that two models were of primary interest;
$$P_D = \{1-\exp[-(\alpha+\beta D)]\}\cdot\exp(-\gamma D) \tag{2.39}$$
$$P_D = \{1-\exp[-(\alpha+\beta D)]\}\cdot[2-\exp(\gamma D)]_+ \tag{2.40}$$
where $[x]_+ = \max(0,x)$, $\alpha \ge 0$ is related to a spontaneous rate of mutation, $\beta \ge 0$ is related to the induced mutation, and $\gamma \ge 0$ is related to the induced toxicity.
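The two dose-response forms can be written out directly. The sketch below is illustrative only: the function names and the parameter values in the example are assumptions chosen to make the shapes visible, not values from the experiments.

```python
from math import exp, log

def p_dose_exponential_toxicity(dose, alpha, beta, gamma):
    """Model (2.39): mutation probability times exponential survival."""
    return (1 - exp(-(alpha + beta * dose))) * exp(-gamma * dose)

def p_dose_truncated_toxicity(dose, alpha, beta, gamma):
    """Model (2.40): survival factor [2 - exp(gamma * dose)]_+ ."""
    return (1 - exp(-(alpha + beta * dose))) * max(0.0, 2 - exp(gamma * dose))

# Hypothetical parameter values, for illustration only.
alpha, beta, gamma = 0.01, 0.05, 0.2
at_zero_dose = p_dose_exponential_toxicity(0.0, alpha, beta, gamma)
kill_dose = log(2) / gamma          # dose at which model (2.40) reaches zero
at_kill_dose = p_dose_truncated_toxicity(kill_dose + 0.1, alpha, beta, gamma)
```

At dose zero both models reduce to $1-\exp(-\alpha)$, the spontaneous component, and under (2.40) the probability is exactly zero once $\gamma D \ge \log 2$.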
From (2.39) or (2.40) it can be seen that $P_D$ is a product of two probabilities, one for mutagenicity and one for survival from toxicity; hence $P_D$ represents the competing risks of the two. Moreover, a chemical under study is mutagenic if and only if $\beta > 0$. Thus one may formulate the mutagenicity testing problem into a statistical hypothesis test by setting up the hypothesis as
$H_0$: $\beta = 0$ ($\iff$ not mutagenic, or for brevity $-$)
$H_a$: $\beta > 0$ ($\iff$ mutagenic, or $+$)
with a significance level $\alpha$. This significance level $\alpha$ is by definition
$$\alpha = \Pr(\text{judged } + \mid \text{truly } -), \tag{2.41}$$
the false positive probability assumed constant for each compound in each experiment.
In each experiment a chemical is determined to be local-positive iff $[\hat\beta/SE(\hat\beta)] > c^*$, where $\hat\beta$ and $SE(\hat\beta)$ are obtained by the ML methods based on 18 petri dish data under the negative binomial model (2.38) and either (2.39) or (2.40). Under $H_0$ we may claim that
$$[\hat\beta/SE(\hat\beta)] \sim N(0,1). \tag{2.42}$$
Thus the critical value $c^*$ is determined by the given level of $\alpha$. For $\alpha = 0.05$ and compound i we may obtain $x_{it}$, the number of local-positives among $n_{it}$ experiments for each of 12 combinations of strain-activation sets. For example, for chemical 1 (identification number) we observe the following table:
Strain \ Activation   None   Hamster   Rat
TA 98                 0/1    2/3       3/3
TA 100                0/2    0/2       0/2
TA 1535               0/1    0/1       0/1
TA 1537               0/2    1/3       2/3

Table 2.1 The number of local positives out of $\{n_{1t}\}_{t=1}^{12}$ experiments for chemical 1.

where notationally y/n implies y local-positives out of n experiments.
Lastly, Margolin and his colleagues (personal communication) combine
data from different experiments and reach a single conclusion and
determine a chemical to be positive if and only if there is at least
one repeated local-positive in at least one strain-activation set.
Thus in Table 2.1, or in any other such table for another chemical, if any cell contains a number of local-positives $\ge 2$, then that chemical is considered to be positive. In what follows we refer to the
summary data in Table 2.1 obtained through the statistical procedures
described above as the derived data.
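The combination rule just described can be sketched as follows, using the entries of Table 2.1. The dictionary layout and the function name `is_positive` are hypothetical, and the fourth strain is read as TA 1537 (an assumption, since the scanned table prints TA 1535 twice while the text lists four distinct strains).

```python
# (local-positives, experiments) per strain-activation cell for chemical 1.
# Fourth row label TA 1537 is an assumption; see the note above.
table_2_1 = {
    ("TA 98", "None"): (0, 1), ("TA 98", "Hamster"): (2, 3), ("TA 98", "Rat"): (3, 3),
    ("TA 100", "None"): (0, 2), ("TA 100", "Hamster"): (0, 2), ("TA 100", "Rat"): (0, 2),
    ("TA 1535", "None"): (0, 1), ("TA 1535", "Hamster"): (0, 1), ("TA 1535", "Rat"): (0, 1),
    ("TA 1537", "None"): (0, 2), ("TA 1537", "Hamster"): (1, 3), ("TA 1537", "Rat"): (2, 3),
}

def is_positive(cells):
    """A chemical is positive iff some cell has at least two local-positives."""
    return any(positives >= 2 for positives, _ in cells.values())

chemical_1_positive = is_positive(table_2_1)
```

For chemical 1 the rule returns a positive call, driven by the TA 98 cells with hamster and rat activation.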
2.3.3 Further Analysis of the Derived Data: Mixture Model
2.3.3.1 k-Population Mixture of Two Binomials
The derived data in Table 2.1 admits further statistical analysis
that may be focused on the following three problems;
Problem (i): How to perform an empirical check on the operating
properties of Margolin et al.'s statistical procedures?
Problem (ii): What is the proportion of mutagens in the population of compounds tested?
Problem (iii): What is the power of detecting a true positive
chemical in this procedure?
This last problem reflects both the sensitivity of the Ames test as
well as the sensitivity of the statistical analysis. The derived data
can be arranged to yield a lower triangular two way layout for each
strain-activation set by counting the numbers of chemicals that have
$0,1,2,\ldots,n_i$ positive results, respectively, out of $n_i$ experiments, where $n_i = 1,2,\ldots,\max_{a,t}(n_{at})$ for $i=1,2,\ldots,k$.
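The lower triangular layout can be produced by a simple tally. The sketch below assumes the derived data for one strain-activation set arrive as (positives, experiments) pairs, one per chemical; that input format, the function name, and the example data are hypothetical.

```python
from collections import Counter

def lower_triangular_layout(results):
    """Map each n to the list [f_n0, ..., f_nn] of chemical counts by positives."""
    counts = Counter(results)
    sizes = sorted({n for _, n in results})
    return {n: [counts[(t, n)] for t in range(n + 1)] for n in sizes}

# Hypothetical derived data for one strain-activation set.
example_results = [(0, 1), (1, 1), (0, 2), (2, 2), (2, 2), (1, 3)]
layout = lower_triangular_layout(example_results)
```

Row n of the layout has n + 1 entries, which is what gives the table its lower triangular shape.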
To develop a suitable statistical model for problems (i)-(iii),
we may note several characteristics of the experiments and the derived
data;
(i) There are S compounds that have been tested from a hypothetical set $\Phi$ of compounds that have been or will be tested. Note, this is not the universe of chemical compounds. Scientific judgement enters the selection procedure, so that for example, H2O would not be tested nor included in $\Phi$.
(ii) The tests adopted have a probability $\tau$ of yielding false positives, and a probability $1-p$ of false negatives. The latter is somewhat of a simplifying assumption, similar to Neyman's (1947) diagnostic simplification, so as to permit some analytic progress.
(iii) The set $\Phi$ of compounds has a proportion $\pi$ of positive chemicals.
(iv) For each strain-activation set, the chemicals can be grouped into sets such that the i-th set of $r_i$ chemicals has been tested $n_i$ times for positive or negative evidence of mutagenicity, where $n_i = 1,2,\ldots,\max_{a,t}(n_{at})$, $i=1,\ldots,k$.
The probabilities $p$ and $\tau$ can be described in the usual table:

                     Result
State of Nature   positive   negative
positive          $p$        $1-p$
negative          $\tau$     $1-\tau$

Table 2.2

For each strain-activation set the vector $Y_i = (Y_{i1},Y_{i2},\ldots,Y_{ir_i})$ of positive results of the i-th set of chemicals can be considered as an observation from a mixture of two binomial distributions, i.e.,
$$\Pr(Y_i = y_i) = \prod_{j=1}^{r_i}\Bigl[\pi\binom{n_i}{y_{ij}}p^{y_{ij}}(1-p)^{n_i-y_{ij}} + (1-\pi)\binom{n_i}{y_{ij}}\tau^{y_{ij}}(1-\tau)^{n_i-y_{ij}}\Bigr], \tag{2.43}$$
for $i=1,\ldots,k$.
Using the simple notation we denote (2.43) as
$$\{Y_{ij}\} \sim \{b(y_{ij};n_i,p)\}\wedge_pG_2(p) \tag{2.44}$$
for $j=1,\ldots,r_i$, $i=1,\ldots,k$, where $G_2$ is a discrete distribution function with two atoms and $\theta = (\pi,p,\tau)$.
Now, $(Y_1,Y_2,\ldots,Y_k)$ constitutes an independent, non-identically distributed set of data with joint likelihood
$$\prod_{i=1}^{k}h_i(y_i;\theta) = \prod_{i=1}^{k}\Bigl\{\prod_{j=1}^{r_i}[b(y_{ij};n_i,p)\wedge_pG_2(p)]\Bigr\}. \tag{2.45}$$
Specifically, we have
$$Y_{11},\ldots,Y_{1r_1} \overset{iid}{\sim} h_1(\theta) = B(n_1,p)\wedge_pG_2(p)$$
$$Y_{21},\ldots,Y_{2r_2} \overset{iid}{\sim} h_2(\theta) = B(n_2,p)\wedge_pG_2(p)$$
$$\vdots$$
$$Y_{k1},\ldots,Y_{kr_k} \overset{iid}{\sim} h_k(\theta) = B(n_k,p)\wedge_pG_2(p). \tag{2.46}$$
The reader will recognize this formulation to be a k-population mixture of two binomials.
The problems (i)-(iii) above are tied to estimation of parameters $\pi$, $p$ and $\tau$ of the k-population mixture of two binomials model (2.43) and obtaining their sampling distributions. Particularly Problem (i) affords an empirical check of the a priori assumption on the size of the false positive probability, which was set to be 0.05 based on the large sample behavior (2.42). Problem (ii) is identified with estimation of the prevalence rate $\pi$, and Problem (iii) may be partially answered by studying the sampling distribution of the estimator $\hat p$.
The identifiability condition of the class of joint distributions of $(Y_1,\ldots,Y_k)$ is given by Theorem 2.1, which says the class is identifiable if and only if
$$\max_{1\le i\le k}n_i \ge 3. \tag{2.47}$$
In the derived data, $\max_{1\le i\le k}n_i \ge 4$ for each of the 12 combinations of strain-activation set.
In order to find the MLE of $\theta = (\pi,p,\tau)$ we use the iterative equations (2.27) and (2.28). We first define
$$z_{ij} = \begin{cases}1 & \text{if } y_{ij} \text{ is from } b(y_{ij};n_i,p)\\ 0 & \text{if } y_{ij} \text{ is from } b(y_{ij};n_i,\tau)\end{cases} \tag{2.48}$$
for $j=1,\ldots,r_i$, $i=1,\ldots,k$.
Then (2.27) and (2.28) admit the following EM implementations;
Let
$$w_{ij} = w(y_{ij};\theta) = E(Z_{ij}\mid y_{ij},\theta) = \frac{\pi b(y_{ij};n_i,p)}{\pi b(y_{ij};n_i,p)+(1-\pi)b(y_{ij};n_i,\tau)} \tag{2.49}$$
and
$$w_{ij}^{(\nu)} = w_{ij}(y_{ij};\theta^{(\nu)}), \tag{2.50}$$
where the superscript $(\nu)$ implies that the estimated quantity is evaluated at the $\nu$-th iteration step.
Then the parameter vector $\theta = (\pi,p,\tau)$ is updated by
$$p^{(\nu+1)} = \sum_{i=1}^{k}\sum_{j=1}^{r_i}w_{ij}^{(\nu)}y_{ij}\Big/\sum_{i=1}^{k}\sum_{j=1}^{r_i}n_iw_{ij}^{(\nu)} \tag{2.51}$$
$$\tau^{(\nu+1)} = \sum_{i=1}^{k}\sum_{j=1}^{r_i}(1-w_{ij}^{(\nu)})y_{ij}\Big/\sum_{i=1}^{k}\sum_{j=1}^{r_i}n_i(1-w_{ij}^{(\nu)}) \tag{2.52}$$
$$\pi^{(\nu+1)} = \frac{1}{r_+}\sum_{i=1}^{k}\sum_{j=1}^{r_i}w_{ij}^{(\nu)},\quad\text{where } r_+ = \sum_{i=1}^{k}r_i. \tag{2.53}$$
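The equations (2.51)-(2.53) can be simplified by noting that the i-th set of $r_i$ chemicals can assume values $0,1,\ldots,n_i$ for numbers of positives. Hence frequencies, say $\{f_{it}\}$, can be assigned to $\{y_{ij} = t\}$, $t=0,1,\ldots,n_i$. For all those $y_{ij}$'s, the $w_{ij}$'s are equal, say to $w_{it}$. Hence we have the following simplified equations;
$$p^{(\nu+1)} = \sum_{i=1}^{k}\sum_{t=0}^{n_i}f_{it}w_{it}^{(\nu)}t\Big/\sum_{i=1}^{k}\sum_{t=0}^{n_i}n_if_{it}w_{it}^{(\nu)} \tag{2.54}$$
$$\tau^{(\nu+1)} = \sum_{i=1}^{k}\sum_{t=0}^{n_i}f_{it}(1-w_{it}^{(\nu)})t\Big/\sum_{i=1}^{k}\sum_{t=0}^{n_i}n_if_{it}(1-w_{it}^{(\nu)}) \tag{2.55}$$
$$\pi^{(\nu+1)} = \frac{1}{r_+}\sum_{i=1}^{k}\sum_{t=0}^{n_i}f_{it}w_{it}^{(\nu)}. \tag{2.56}$$
Thus cycling back and forth between the equation (2.49) and the equations (2.54)-(2.56) defines the EM algorithm for the k-population mixture of two binomials model.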
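The iteration can be sketched in a few lines. This is a minimal illustration, not the program used in the study: the frequency layout `{n_i: [f_i0, ..., f_in_i]}`, the function names, the starting values, and the example frequencies (exact expected counts under $\pi = 0.3$, $p = 0.8$, $\tau = 0.05$) are all assumptions made for the example.

```python
from math import comb, log

def binom_pmf(t, n, prob):
    return comb(n, t) * prob**t * (1 - prob)**(n - t)

def em_step(freq, pi, p, tau):
    """One E-step (2.49) followed by the M-step (2.54)-(2.56)."""
    sum_w = sum_wt = sum_wn = sum_vt = sum_vn = r_plus = 0.0
    for n, row in freq.items():
        for t, f in enumerate(row):
            a = pi * binom_pmf(t, n, p)
            b = (1 - pi) * binom_pmf(t, n, tau)
            w = a / (a + b)                 # posterior weight of the p-component
            r_plus += f
            sum_w += f * w
            sum_wt += f * w * t
            sum_wn += f * w * n
            sum_vt += f * (1 - w) * t
            sum_vn += f * (1 - w) * n
    return sum_w / r_plus, sum_wt / sum_wn, sum_vt / sum_vn   # (pi, p, tau)

def log_likelihood(freq, pi, p, tau):
    """Observed-data log-likelihood for the frequency layout."""
    return sum(f * log(pi * binom_pmf(t, n, p) + (1 - pi) * binom_pmf(t, n, tau))
               for n, row in freq.items() for t, f in enumerate(row))

def fit(freq, pi, p, tau, iterations=500):
    for _ in range(iterations):
        pi, p, tau = em_step(freq, pi, p, tau)
    return pi, p, tau

# Hypothetical frequencies: exact expected counts for 100 chemicals at n = 3 and n = 4.
freq = {n: [100 * (0.3 * binom_pmf(t, n, 0.8) + 0.7 * binom_pmf(t, n, 0.05))
            for t in range(n + 1)] for n in (3, 4)}
pi_hat, p_hat, tau_hat = fit(freq, pi=0.5, p=0.6, tau=0.2)
```

As the earlier discussion recommends, in practice one would restart from several initial values; labels can also swap (the roles of $p$ and $\tau$) depending on the start.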
The observed information matrix for our model can be obtained following Louis' EM implementation (Louis, 1982). Using EM terminology, the complete data can be defined by specifying the component distribution from which each observation is drawn.
Let
$$Y_i = (Y_{i1},Y_{i2},\ldots,Y_{ir_i})$$
and
$$Z_i = (Z_{i1},Z_{i2},\ldots,Z_{ir_i}), \tag{2.57}$$
where $Z_{ij}$ is defined in (2.48).
Then the complete data $X = (X_1,X_2,\ldots,X_k)$ is defined as
$$X_i = (Z_i,Y_i), \tag{2.58}$$
where $X_i = (X_{i1},\ldots,X_{ir_i})$, $i=1,\ldots,k$.
The likelihood of the complete data $X$ suggests a two-stage experiment, where first a component is picked by a Bernoulli experiment, and then a binomial variate is generated. Therefore the log-likelihood $h^*_{X_{ij}}(\cdot)$ for $X_{ij}$ is given by
$$h^*_{X_{ij}}(x_{ij};\theta) = z_{ij}[\log\pi+\log b(y_{ij};n_i,p)] + (1-z_{ij})[\log(1-\pi)+\log b(y_{ij};n_i,\tau)] \tag{2.59}$$
for $j=1,\ldots,r_i$, $i=1,\ldots,k$.
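Let $S(X_{ij}:\theta)$, $S(X_i:\theta)$, and $S(X:\theta)$ be gradient vectors of $h^*_{X_{ij}}(x_{ij}:\theta)$, $\sum_{j=1}^{r_i}h^*_{X_{ij}}(x_{ij}:\theta)$, and $\sum_{i=1}^{k}\sum_{j=1}^{r_i}h^*_{X_{ij}}(x_{ij}:\theta)$, respectively. Let $B(X_{ij}:\theta)$, $B(X_i:\theta)$ and $B(X:\theta)$ be the negatives of the associated second derivative matrices. Since $X_1,X_2,\ldots,X_k$ are independent,
$$S(X:\theta) = \sum_{i=1}^{k}S(X_i:\theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i}S(X_{ij}:\theta),$$
and
$$B(X:\theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i}B(X_{ij}:\theta).$$
Thus we obtain
$$S(X_{ij}:\theta) = \begin{bmatrix}\dfrac{\partial}{\partial p}h^*_{X_{ij}}(x_{ij}:\theta)\\[2mm] \dfrac{\partial}{\partial\tau}h^*_{X_{ij}}(x_{ij}:\theta)\\[2mm] \dfrac{\partial}{\partial\pi}h^*_{X_{ij}}(x_{ij}:\theta)\end{bmatrix} = \begin{bmatrix}Z_{ij}\,\dfrac{y_{ij}-n_ip}{p(1-p)}\\[2mm] (1-Z_{ij})\,\dfrac{y_{ij}-n_i\tau}{\tau(1-\tau)}\\[2mm] \dfrac{Z_{ij}-\pi}{\pi(1-\pi)}\end{bmatrix} \tag{2.60}$$
and
$$B(X_{ij}:\theta) = \begin{bmatrix}Z_{ij}\,\dfrac{n_ip^2-y_{ij}(2p-1)}{(p-p^2)^2} & 0 & 0\\ 0 & (1-Z_{ij})\,\dfrac{n_i\tau^2-y_{ij}(2\tau-1)}{(\tau-\tau^2)^2} & 0\\ 0 & 0 & \dfrac{Z_{ij}-2\pi Z_{ij}+\pi^2}{\pi^2(1-\pi)^2}\end{bmatrix}. \tag{2.61}$$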
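Hence the conditional complete data observed information matrix becomes
$$I_X = \sum_{i=1}^{k}\sum_{j=1}^{r_i}\begin{bmatrix}w_{ij}\,\dfrac{n_i\hat p^2-y_{ij}(2\hat p-1)}{(\hat p-\hat p^2)^2} & 0 & 0\\ 0 & (1-w_{ij})\,\dfrac{n_i\hat\tau^2-y_{ij}(2\hat\tau-1)}{(\hat\tau-\hat\tau^2)^2} & 0\\ 0 & 0 & \dfrac{w_{ij}-2\hat\pi w_{ij}+\hat\pi^2}{\hat\pi^2(1-\hat\pi)^2}\end{bmatrix}. \tag{2.62}$$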
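Following Louis' development, the lost information due to the
unobservable $\mathbf{Z}$, denoted by $I_{X|Y}$, is obtained as
$$I_{X|Y} = E_{\hat\theta}\bigl[S(\mathbf{X};\hat\theta)S'(\mathbf{X};\hat\theta) \mid \mathbf{Y}\bigr] - S^*(\mathbf{Y};\hat\theta)S^{*\prime}(\mathbf{Y};\hat\theta), \tag{2.63}$$
where $S'(\mathbf{X};\theta)$ is the transpose of $S(\mathbf{X};\theta)$ and $S^*(\mathbf{Y};\theta)$ is the gradient of the incomplete data log-likelihood.
Also by the definition of the MLE $\hat\theta$, which maximizes the incomplete
data likelihood $\prod_{i=1}^{k}\prod_{j=1}^{r_i} h_i(y_{ij};\theta)$, we have
$$S^*(\mathbf{Y};\hat\theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(Y_{ij};\hat\theta) = 0. \tag{2.64}$$
Hence
$$S^*(\mathbf{Y};\hat\theta)S^{*\prime}(\mathbf{Y};\hat\theta) = \Bigl[\sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(Y_{ij};\hat\theta)\Bigr]\Bigl[\sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(Y_{ij};\hat\theta)\Bigr]' = 0. \tag{2.65}$$
Thus by using equations (2.63) and (2.65), and the independence of
the $X_{ij}$'s, we obtain
$$I_{X|Y} = \sum_{i=1}^{k}\sum_{j=1}^{r_i} E_{\hat\theta}\bigl[S(X_{ij};\hat\theta)S'(X_{ij};\hat\theta) \mid Y_{ij}\bigr] - \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(Y_{ij};\hat\theta)S^{*\prime}(Y_{ij};\hat\theta). \tag{2.66}$$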
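After simple algebra it can be shown that the two terms in (2.66)
become
$$\sum_{i=1}^{k}\sum_{j=1}^{r_i} E_{\hat\theta}\bigl[S(X_{ij};\hat\theta)S'(X_{ij};\hat\theta) \mid y_{ij}\bigr] = \sum_{i=1}^{k}\sum_{j=1}^{r_i} \begin{pmatrix} w_{ij}\Bigl[\dfrac{y_{ij}-n_i\hat p}{\hat p(1-\hat p)}\Bigr]^2 & 0 & 0 \\[2mm] 0 & (1-w_{ij})\Bigl[\dfrac{y_{ij}-n_i\hat\tau}{\hat\tau(1-\hat\tau)}\Bigr]^2 & 0 \\[2mm] 0 & 0 & \dfrac{w_{ij} - 2\hat\pi w_{ij} + \hat\pi^2}{[\hat\pi(1-\hat\pi)]^2} \end{pmatrix} \tag{2.67}$$
and
$$S^*(Y_{ij};\hat\theta) = \begin{pmatrix} w_{ij}\,\dfrac{y_{ij}-n_i\hat p}{\hat p(1-\hat p)} \\[2mm] (1-w_{ij})\,\dfrac{y_{ij}-n_i\hat\tau}{\hat\tau(1-\hat\tau)} \\[2mm] \dfrac{w_{ij}-\hat\pi}{\hat\pi(1-\hat\pi)} \end{pmatrix}. \tag{2.68}$$
In equation (2.67) the entries in the (1,3) and (2,3) positions are
equal to zero because of the EM equations (2.54) and (2.55), respectively.
Finally the observed information matrix $I_Y$ is obtained as
$$I_Y = I_X - I_{X|Y}, \tag{2.69}$$
where $I_X$ is obtained in (2.62), and $I_{X|Y}$ is obtained by (2.66)-(2.68).
4 Louis (1982) has a different expression for $I_{X|Y}$, which is algebraically equivalent to (2.66). However, we find that (2.66) was simpler to program.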
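As a rough illustration of the construction above, the following Python sketch accumulates Louis' three pieces observation by observation: the conditional expectation of $B$, the conditional expectation of $SS'$, and the outer product of the incomplete-data score. It is an assumption-laden sketch rather than the program used for the tables: the function name `observed_information` is hypothetical, the parameter order is $(p, \tau, \pi)$ as in (2.60), and the per-observation form is used so that the result equals the negative Hessian of the incomplete-data log-likelihood at any interior parameter value, not only at the MLE.

```python
from math import comb

def binom_pmf(y, n, prob):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * prob**y * (1.0 - prob)**(n - y)

def observed_information(obs, p, tau, pi):
    """obs is a list of (n_i, y_ij) pairs. Returns a 3x3 list of lists in the
    order (p, tau, pi): sum of E[B|y] - E[SS'|y] + S* S*' over observations."""
    info = [[0.0] * 3 for _ in range(3)]
    for n, y in obs:
        a = pi * binom_pmf(y, n, p)
        b = (1.0 - pi) * binom_pmf(y, n, tau)
        w = a / (a + b)                      # posterior weight E[Z | y]
        # Complete-data score (2.60) evaluated at Z = 1 and at Z = 0
        s1 = [(y - n * p) / (p * (1 - p)), 0.0, 1.0 / pi]
        s0 = [0.0, (y - n * tau) / (tau * (1 - tau)), -1.0 / (1 - pi)]
        # Diagonal of E[B | y], i.e. (2.61) with Z replaced by w
        eb = [w * (n * p * p - y * (2 * p - 1)) / (p - p * p) ** 2,
              (1 - w) * (n * tau * tau - y * (2 * tau - 1)) / (tau - tau * tau) ** 2,
              (w - 2 * pi * w + pi * pi) / (pi * (1 - pi)) ** 2]
        # Incomplete-data score S* = E[S | y]
        s_star = [w * u + (1 - w) * v for u, v in zip(s1, s0)]
        for r in range(3):
            for c in range(3):
                e_ss = w * s1[r] * s1[c] + (1 - w) * s0[r] * s0[c]
                diag = eb[r] if r == c else 0.0
                info[r][c] += diag - e_ss + s_star[r] * s_star[c]
    return info

# Small made-up data set (n_i, y_ij), used only to exercise the function
example_obs = [(3, 0), (3, 1), (3, 3), (5, 4), (5, 0), (2, 2), (4, 1)]
example_info = observed_information(example_obs, p=0.7, tau=0.2, pi=0.3)
```

At the MLE the summed off-diagonal terms in positions (1,3) and (2,3) of the $E[SS'|y]$ part cancel, which recovers the simplified form (2.66)-(2.68).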
2.3.3.2 Results of the Analysis.
Two sets of derived data were obtained for further statistical
analysis. The first set of output data was based on the statistical
procedures of Margolin, Kaplan and Zeiger (1981) using the usual
significance level $\alpha = 0.05$. This is now referred to as the stat-call.
For the second set of derived data a senior toxicologist's⁵ subjective
judgment based on his past experience yielded the decisions of whether
a compound being tested was local-positive or local-negative. Hence
there is no formal statistical procedure in the generation of the
second set of derived data. The second set of derived data is called
the Zeiger-call.
The stat-call and Zeiger-call data are presented in Table 2.3 in
lower triangular arrays for each strain-activation set. Tables 2.4.A
and 2.4.B display the corresponding MLE's and the inverses of the
observed information matrices $I_Y$ obtained by the EM algorithm⁶
procedure described in 2.3.3.1.
The total number of compounds tested in each strain-activation set
varies slightly (at most by 1) because some compounds were not tested in
certain strain-activation sets. In Table 2.4.B some MLE's were obtained
at the boundary of the parameter space, i.e., $\hat p = 1$ for TA 100-N and for
TA 1537-R. This yielded singular information matrices (see (2.62),
(2.66)-(2.69) for the singularity of an information matrix at $\hat p = 1$).
Since these estimated values do not belong to the parameter space at
the outset, they must be interpreted without benefit of a corresponding
standard error.
For the overall probability of a false positive, for a compound
when all 12 strain-activations are employed, we note the following
immediate but useful results.
5 Dr. Errol Zeiger of the National Institute of Environmental Health Sciences, who actually supervised all the biological experiments that yielded the experimental data.
6 Several sets of initial values were tried with the EM algorithm. It turned out that the EM algorithm leads to fairly stable stationary values of the estimator with respect to various sets of initial values.
Lemma 2.8: Let $\tau_{ij}$ be the probability of a false local-positive for
a compound in the i-th strain and j-th activation set, for $i = 1, \ldots, 4$,
$j = 1, 2, 3$, and let $\tau_{over}$ be the overall probability of a false positive
for a compound. Then under the independence of the 12 combinations
of strain-activations,
$$\tau_{over} = 1 - \prod_{i=1}^{4}\prod_{j=1}^{3}\bigl(1-\tau_{ij}^2\bigr). \tag{2.70}$$
Proof:
$$1 - \tau_{over} = \Pr(\text{judged negative} \mid \text{truly negative})$$
$$= \Pr(\text{no repeated local-positives in any of the 12 combinations of strain-activation set} \mid \text{truly negative})$$
$$= \prod_{i=1}^{4}\prod_{j=1}^{3}\bigl(1-\tau_{ij}^2\bigr). \qquad \Box$$
Thus by Lemma 2.8 the MLE $\hat\tau_{over}$ of $\tau_{over}$ becomes
$$\hat\tau_{over} = 1 - \prod_{i=1}^{4}\prod_{j=1}^{3}\bigl(1-\hat\tau_{ij}^2\bigr), \tag{2.71}$$
where $\hat\tau_{ij}$ by Theorem 2.2 has an approximate normal distribution
with mean $\tau_{ij}$ and variance equal to the (3,3) entry of $I_Y(\hat\theta)^{-1}$. The
distribution of $\hat\tau_{over}$ can be obtained under the independence of the
$\hat\tau_{ij}$'s when the compound is not mutagenic.
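Lemma 2.9: Let $\{\tau_{ij}\}$, $j = 1, 2, 3$, $i = 1, \ldots, 4$, be reindexed as $\{\tau_t\}_{t=1}^{12}$ and assume
$\hat\tau_1, \ldots, \hat\tau_{12}$ are independent. Then
$$\sqrt{r_+}\,(\hat\tau_{over} - \tau_{over}) \rightarrow N\Bigl(0,\; r_+\sum_{t=1}^{12}\Bigl[2\tau_t\prod_{s\neq t}\bigl(1-\tau_s^2\bigr)\Bigr]^2\sigma_t^2\Bigr), \tag{2.72}$$
where $\tau_{over}$ is defined in (2.70) and $\sigma_t^2 = \mathrm{Var}(\hat\tau_t)$.
Proof. Let $\boldsymbol\tau = (\tau_1, \ldots, \tau_{12})$ and the MLE $\hat{\boldsymbol\tau}$ be defined accordingly.
Then as a result of Theorem 2.2 and the independence assumption
$$\sqrt{r_+}\,(\hat{\boldsymbol\tau} - \boldsymbol\tau) \rightarrow N\bigl(\mathbf{0},\; r_+\,\mathrm{diag}(\sigma_1^2, \ldots, \sigma_{12}^2)\bigr), \tag{2.73}$$
where $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_{12}^2)$ is a diagonal matrix with diagonal entries
$\sigma_1^2, \ldots, \sigma_{12}^2$. Then by using the multivariate $\delta$-method (Bishop,
Fienberg and Holland, 1975, §14.6.3) the result follows. □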
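The two lemmas translate directly into a short calculation. The sketch below, with hypothetical function names `tau_over` and `tau_over_sd`, evaluates the overall false-positive probability of Lemma 2.8 and the delta-method standard deviation implied by Lemma 2.9. The inputs are the twelve Stat-Call estimates $\hat\tau_{ij}$ and their standard deviations as listed in Table 2.6; treating the twelve estimates as independent is the lemma's own assumption.

```python
from math import prod, sqrt

def tau_over(taus):
    """Overall false-positive probability: 1 - prod(1 - tau_t^2)."""
    return 1.0 - prod(1.0 - t * t for t in taus)

def tau_over_gradient(taus):
    """Partial derivatives 2 * tau_t * prod_{s != t} (1 - tau_s^2)."""
    grad = []
    for t, tau_t in enumerate(taus):
        others = prod(1.0 - s * s for u, s in enumerate(taus) if u != t)
        grad.append(2.0 * tau_t * others)
    return grad

def tau_over_sd(taus, sds):
    """Delta-method standard deviation under independence of the estimates."""
    grad = tau_over_gradient(taus)
    return sqrt(sum((g * sd) ** 2 for g, sd in zip(grad, sds)))

# Stat-Call estimates and standard deviations, in the row order of Table 2.6
stat_taus = [0.0441, 0.0493, 0.0659, 0.0536, 0.0969, 0.1026,
             0.0788, 0.0762, 0.0607, 0.0584, 0.0533, 0.0659]
stat_sds = [0.0290, 0.0235, 0.0187, 0.0289, 0.0231, 0.0293,
            0.0107, 0.0170, 0.0154, 0.0207, 0.0143, 0.0111]

overall = tau_over(stat_taus)
overall_sd = tau_over_sd(stat_taus, stat_sds)
```

Because each partial derivative carries the factor $2\tau_t$, sets with a larger estimated false-positive rate dominate the variance of $\hat\tau_{over}$.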
Calculations of $\hat\tau_{over}$ and S.D.($\hat\tau_{over}$) for the stat-call and
Zeiger-call data are given in the table below:

                 $\hat\tau_{over}$    S.D.($\hat\tau_{over}$)
Stat-Call        0.0560      0.0010
Zeiger-Call      0.0013      0.0004

Table 2.5: $\hat\tau_{over}$ and its standard deviation for Stat-Call and Zeiger-Call data
2.3.3.3 Discussion of the Results
Margolin et al.'s statistical procedures described in subsection
2.3.2 assumed that for each set of 18 petri-dishes $\hat\beta/SE(\hat\beta)$ was
distributed as N(0,1) based on the large sample theory. Based on
this normal assumption the cut-off value was determined for the given
level of significance $\alpha = 0.05$ for each experiment to test the
local-positiveness of the compound.
By noting that the significance level $\alpha$ is equivalent to the
probability of a false-positive in each experiment (see (2.41)), the
operating property of Margolin et al.'s statistical procedures can be
checked against the sampling distributions of $\hat\tau_{ij}$ for $i = 1, \ldots, 4$,
$j = 1, 2, 3$. From Table 2.4.A we extract the entries corresponding to
$\hat\tau$ and $\mathrm{Var}(\hat\tau)$ and present them in the table below:
Strain-Activation   $\hat\tau_{ij}$    S.D.($\hat\tau_{ij}$)   $(\hat\tau_{ij}-0.05)/$S.D.($\hat\tau_{ij}$)
TA 98-N             0.0441    0.0290      -0.203
TA 98-H             0.0493    0.0235      -0.030
TA 98-R             0.0659    0.0187       0.850
TA 100-N            0.0536    0.0289       0.125
TA 100-H            0.0969    0.0231       2.030
TA 100-R            0.1026    0.0293       1.795
TA 1535-N*          0.0788    0.0107       2.692
TA 1535-H           0.0762    0.0170       1.541
TA 1535-R           0.0607    0.0154       0.695
TA 1537-N           0.0584    0.0207       0.406
TA 1537-H           0.0533    0.0143       0.231
TA 1537-R           0.0659    0.0111       1.432

Table 2.6: $\hat\tau$ and S.D.($\hat\tau$) for each combination of strain-activation set.
In Table 2.6 we see that among the 12 $\hat\tau_{ij}$'s, 10 have 0.95 confidence
intervals containing the value 0.05. Thus we may conclude that the
stat-call data provide evidence that N(0,1) is a good approximation
of the tail distribution of $\hat\beta/SE(\hat\beta)$.
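The count of intervals quoted above can be reproduced from the tabulated estimates. The following Python sketch is only a check on the arithmetic: it assumes the usual normal-theory interval $\hat\tau_{ij} \pm 1.96\,$S.D.$(\hat\tau_{ij})$, and the names `z_score` and `covers` are illustrative.

```python
# Estimates and standard deviations as listed in Table 2.6
table_2_6 = {
    "TA 98-N": (0.0441, 0.0290), "TA 98-H": (0.0493, 0.0235),
    "TA 98-R": (0.0659, 0.0187), "TA 100-N": (0.0536, 0.0289),
    "TA 100-H": (0.0969, 0.0231), "TA 100-R": (0.1026, 0.0293),
    "TA 1535-N": (0.0788, 0.0107), "TA 1535-H": (0.0762, 0.0170),
    "TA 1535-R": (0.0607, 0.0154), "TA 1537-N": (0.0584, 0.0207),
    "TA 1537-H": (0.0533, 0.0143), "TA 1537-R": (0.0659, 0.0111),
}

def z_score(estimate, sd, null_value=0.05):
    """Standardized distance of the estimate from the nominal level."""
    return (estimate - null_value) / sd

def covers(estimate, sd, null_value=0.05, critical=1.96):
    """True when the 0.95 normal-theory interval contains null_value."""
    return abs(z_score(estimate, sd, null_value)) <= critical

z_values = {name: round(z_score(est, sd), 3) for name, (est, sd) in table_2_6.items()}
n_covering = sum(covers(est, sd) for est, sd in table_2_6.values())
not_covering = [name for name, (est, sd) in table_2_6.items() if not covers(est, sd)]
```

The list `not_covering` identifies the strain-activation sets whose intervals exclude 0.05.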
With biochemical techniques, TA 98 and TA 100 were engineered
from TA 1535 and TA 1537, respectively, to have greater sensitivity
to mutagens. This fact is reflected in the dominance of the $\hat\pi$'s in
TA 98 over the $\hat\pi$'s in TA 1535 uniformly with respect to the activation
sets. The same is true for TA 100 and TA 1537.
Investigation of the Zeiger-call data indicates that his
decision making is too conservative relative to the conventional
range of statistical significance levels commonly employed in
scientific research.
Table 2.3 The Number of j Positive Results in i Experiments in
Each Strain-Activation Set; Stat-Call and Zeiger-Call
(In each array the left block is the Stat-Call, the right block is the Zeiger-Call, and $r_i$ is the row total.)

TA 98-N
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  65  13                        78   |  71   5                        76
  2 512  75  64                   651   | 580  21  44                   645
  3  15   5   1   4                25   |  19   3   2   3                27
  4   3   1   1   2   0             7   |   6   0   0   0   0             6
  5   0   0   1   0   0   0         1   |   0   0   1   0   0   0         1
  6   0   0   0   0   0   0   0     0   |   0   0   0   0   0   0   0     0

TA 98-H
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  40  21                        61   |  54   4                        58
  2 482  88  83                   653   | 563  15  72                   650
  3  17   8   7   9                41   |  24   4   3  10                41
  4   2   0   2   0   1             5   |   3   0   0   1   0             4
  5   1   0   0   0   0   2         3   |   0   1   0   0   1   1         3
  6   0   0   0   0   0   0   0     0   |   0   0   0   0   0   0   0     0

TA 98-R
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  47  12                        59   |  54   3                        57
  2 478  96  80                   654   | 569  17  64                   650
  3  22   4   4  11                41   |  26   2   6   7                41
  4   2   2   1   0   3             8   |   4   0   1   1   1             7
  5   0   0   0   0   1   0         1   |   0   0   0   0   1   0         1
  6   0   0   0   0   0   0   0     0   |   0   0   0   0   0   0   0     0

TA 100-N
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  36  11                        47   |  39   2                        41
  2 409 135 101                   645   | 541  25  79                   645
  3  24   9   9   9                51   |  33  11   0   6                50
  4   3   3   3   5   4            18   |  12   2   3   0   1            18
  5   1   0   0   0   0   0         1   |   0   0   1   0   0   0         1
  6   0   0   0   1   0   0   0     1   |   0   0   0   0   0   0   0     0

TA 100-H
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  11   8                        19   |  16   2                        18
  2 394 129 154                   677   | 530  21 123                   674
  3  12  12   9  12                45   |  19   8   5  10                42
  4   4   1   1   4   9            19   |   7   2   1   4   5            19
  5   0   1   0   0   0   0         1   |   1   0   0   0   0   0         1
  6   1   0   0   0   0   0   0     1   |   0   0   0   0   0   0   0     0

TA 100-R
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1   9  10                        19   |  18   3                        21
  2 386 155 140                   681   | 539  24 113                   676
  3  15   9   7  13                44   |  25   3   5   7                40
  4   1   2   1   4   8            16   |   6   3   0   4   3            16
  5   0   0   0   1   1   0         2   |   1   0   0   0   1   0         2
  6   1   0   0   0   0   0   0     1   |   0   0   0   0   0   0   0     0

TA 1535-N
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  84  12                        96   |  93   0                        93
  2 489  85  42                   616   | 569  14  34                   617
  3  22   7   1   8                38   |  26   2   2   5                35
  4   5   3   1   1   2            12   |   6   2   1   2   0            11
  5   0   0   0   0   0   0         0   |   0   0   0   0   0   0         0
  6   0   0   0   1   0   0   0     1   |   0   0   0   0   0   0   0     0

TA 1535-H
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  68  15                        83   |  82   2                        84
  2 472  87  71                   630   | 556  18  47                   621
  3  19   9   3  10                41   |  22  10   6   5                43
  4   2   1   1   0   1             5   |   4   0   0   0   1             5
  5   0   0   1   1   0   1         3   |   0   1   2   0   0   0         3
  6   1   0   0   0   0   0   0     1   |   0   0   0   0   0   0   0     0

TA 1535-R
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  70  18                        88   |  81   3                        84
  2 476  88  60                   624   | 554  18  50                   622
  3  16  10   7   7                40   |  26   4   5   5                40
  4   4   1   0   1   2             8   |   5   0   1   1   1             8
  5   0   0   0   0   2   0         2   |   0   0   1   1   0   0         2
  6   0   1   0   0   0   0   0     1   |   0   0   0   0   0   0   0     0

TA 1537-N
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  94  20                       114   | 105   0                       105
  2 510  80  28                   618   | 592  10  20                   622
  3  12   7   4   3                26   |  18   2   2   3                25
  4   3   1   0   0   0             4   |   4   0   0   0   0             4
  5   0   0   0   0   0   0         0   |   0   0   0   0   0   0         0
  6   0   0   0   0   0   0   0     0   |   0   0   0   0   0   0   0     0

TA 1537-H
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  76  23                        99   |  95   1                        96
  2 528  68  37                   633   | 579  15  32                   626
  3  11   6   3   7                27   |  18   5   1   6                30
  4   4   0   0   0   0             4   |   4   0   0   0   0             4
  5   0   0   0   0   0   0         0   |   0   0   0   0   0   0         0
  6   0   0   0   0   0   0   0     0   |   0   0   0   0   0   0   0     0

TA 1537-R
i\j   0   1   2   3   4   5   6   r_i   |   0   1   2   3   4   5   6   r_i
  1  87  12                        99   |  92   3                        95
  2 512  73  48                   633   | 583  10  33                   626
  3  14   6   2   4                26   |  20   6   0   4                30
  4   1   3   0   0   1             5   |   4   0   0   0   1             5
  5   0   0   0   0   0   0         0   |   0   0   0   0   0   0         0
  6   0   0   0   0   0   0   0     0   |   0   0   0   0   0   0   0     0

Table 2.4.A: The ML Estimator $\hat\theta = (\hat\pi, \hat p, \hat\tau)$ and the Covariance
Matrix $I_Y(\hat\theta)^{-1}$ via EM Algorithm in Each Strain-
Activation Set Based on the Stat-Call
(Each row gives $\hat\pi$, $\hat p$, $\hat\tau$, followed by the upper triangle of the symmetric covariance matrix in the order $(\hat\pi,\hat\pi)$, $(\hat\pi,\hat p)$, $(\hat\pi,\hat\tau)$, $(\hat p,\hat p)$, $(\hat p,\hat\tau)$, $(\hat\tau,\hat\tau)$.)

Set         pi      p       tau    | (pi,pi)   (pi,p)    (pi,tau)  (p,p)    (p,tau)  (tau,tau)
TA 98-N     0.1595  0.7870  0.0441 | 0.00423  -0.00972  -0.00181  0.02415  0.00430  0.000841
TA 98-H     0.2282  0.7973  0.0493 | 0.00233  -0.00336  -0.00103  0.00594  0.00162  0.00055
TA 98-R     0.1837  0.8550  0.0659 | 0.00143  -0.00247  -0.00061  0.00548  0.00120  0.00035
TA 100-N    0.3489  0.6887  0.0536 | 0.00368  -0.00310  -0.00159  0.00331  0.00141  0.00083
TA 100-H    0.3273  0.8541  0.0969 | 0.00165  -0.00150  -0.00077  0.00193  0.00083  0.00053
TA 100-R    0.3428  0.8070  0.1026 | 0.00291  -0.00254  -0.00140  0.00282  0.00133  0.00086
TA 1535-N   0.0827  0.9376  0.0788 | 0.00032  -0.00100  -0.00011  0.00538  0.00049  0.00011
TA 1535-H   0.1471  0.8953  0.0762 | 0.00103  -0.00228  -0.00044  0.00657  0.00115  0.00029
TA 1535-R   0.1751  0.7846  0.0607 | 0.00115  -0.00187  -0.00042  0.00444  0.00078  0.00024
TA 1537-N   0.1034  0.7027  0.0584 | 0.00303  -0.00890  -0.00105  0.02942  0.00313  0.00043
TA 1537-H   0.1034  0.8472  0.0533 | 0.00093  -0.00289  -0.00036  0.01162  0.00127  0.00021
TA 1537-R   0.0855  0.9382  0.0659 | 0.00038  -0.00129  -0.00014  0.00673  0.00064  0.00012
Table 2.4.B: The ML Estimator $\hat\theta = (\hat\pi, \hat p, \hat\tau)$ and the Covariance
Matrix $I_Y(\hat\theta)^{-1}$ via EM Algorithm in Each Strain-
Activation Set Based on the Zeiger-Call
(Each row gives $\hat\pi$, $\hat p$, $\hat\tau$, followed by the upper triangle of the symmetric covariance matrix in the order $(\hat\pi,\hat\pi)$, $(\hat\pi,\hat p)$, $(\hat\pi,\hat\tau)$, $(\hat p,\hat p)$, $(\hat p,\hat\tau)$, $(\hat\tau,\hat\tau)$.)

Set         pi      p       tau    | (pi,pi)   (pi,p)    (pi,tau)  (p,p)    (p,tau)  (tau,tau)
TA 98-N     0.0962  0.8274  0.0073 | 0.00047  -0.00170  -0.00016  0.00917  0.00074  0.00008
TA 98-H     0.1321  0.9365  0.0101 | 0.00018  -0.00010  -0.00001  0.00063  0.00005  0.00002
TA 98-R     0.1390  0.8484  0.0010 | 0.00022  -0.00024  -0.00003  0.00145  0.00009  0.00001
TA 100-N    0.1150  1.0000  0.0348 | see footnote 7
TA 100-H    0.2139  0.9338  0.0174 | 0.00026  -0.00009  -0.00002  0.00038  0.00005  0.00003
TA 100-R    0.2018  0.9153  0.0123 | 0.00026  -0.00010  -0.00002  0.00048  0.00005  0.00002
TA 1535-N   0.0721  0.8518  0.0071 | 0.00013  -0.00026  -0.00002  0.00272  0.00010  0.00001
TA 1535-H   0.0905  0.8922  0.0203 | 0.00019  -0.00046  -0.00004  0.00339  0.00023  0.00004
TA 1535-R   0.1254  0.7748  0.0014 | 0.00031  -0.00061  -0.00006  0.00316  0.00022  0.00003
TA 1537-N   0.0418  0.8622  0.0048 | 0.00010  -0.00051  -0.00002  0.00728  0.00021  0.00001
TA 1537-H   0.0554  0.9637  0.0136 | 0.00009  -0.00014  -0.00001  0.00151  0.00007  0.00001
TA 1537-R   0.0534  1.0000  0.0122 | see footnote 8

7 When the ML estimator is obtained at the boundary of the parameter space, usual asymptotic results fail to hold (see Chernoff, 1954, and Feder, 1968, 1978).
8 The ML estimator at the boundary of the parameter space does not have the usual asymptotic results. See footnote 7.
CHAPTER III
MIXTURE OF THE MULTINOMIAL DISTRIBUTIONS
In section 1 of this chapter discussion focuses on the goodness
of fit test for a binomial distribution against alternative parametric
mixtures of binomials. We show that the locally optimal test
for detecting extra-binomial variability from a mixture alternative
can exist in the one-way layout. In section 2, where the beta-binomial
discussed in section 1 is generalized, we develop a random
effects model for a one-way layout contingency table by employing a
Dirichlet mixture of multinomial distributions. We then discuss
three tests of whether the random effects are negligible.
3.1 Binomial Case
In the binomial distribution of James Bernoulli the n Bernoulli
trials are assumed to be independent and the success probability is
constant throughout these n trials. It has been frequently observed
in practice, for example in biological experiments, that either one
or both assumptions are not satisfied. Sometimes these observations
can be explained or understood by investigating the underlying
mechanism. For instance, for the number of cavities among children
the 'success' probability may differ from child to child due to
differences in nutrition and other factors; hence, independence among
those cavities in a child may not be maintained if the probability is
viewed as a random variable itself. For other examples of deviations
from the binomial conditions see Chatfield and Goodhardt (1970),
Griffiths (1973), Haseman and Soares (1976) and Haseman and Kupper
(1978).
The inconstancy of the success probability caused heterogeneity
in Student's yeast cell count data (Student, 1907), which led K.
Pearson (1915) to consider a mixture of two binomials. A mixture of
two binomials has three parameters, which may not be sufficiently
parsimonious in a small data set as an alternative to the one-parameter
binomial distribution. In what follows we focus on a well-known
two-parameter generalization of the binomial, the beta-binomial
distribution, which can be described as follows:
Let X have a binomial distribution $B(n,u)$, $0 < u < 1$, and let U itself
be a random variable that has a beta distribution $\mathrm{Beta}(\alpha,\beta)$ with
parameters $\alpha > 0$ and $\beta > 0$, i.e.,
$$g(u) = \frac{u^{\alpha-1}(1-u)^{\beta-1}}{Be(\alpha,\beta)}, \quad 0 < u < 1, \quad \alpha > 0,\ \beta > 0, \tag{3.1}$$
where
$$Be(a,b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx, \quad a > 0,\ b > 0. \tag{3.2}$$
Since the mean and variance of U are
$$E(U) = \alpha/(\alpha+\beta) = p, \qquad \mathrm{Var}(U) = p(1-p)/(\alpha+\beta+1), \qquad (q = 1-p) \tag{3.3}$$
respectively, it is useful to reparametrize $\alpha$ and $\beta$ into
$$p = \alpha/(\alpha+\beta), \qquad \theta = 1/(\alpha+\beta). \tag{3.4}$$
Then $\theta = 0$ implies that $\mathrm{Var}(U) = 0$, which reduces the beta-binomial
to an ordinary binomial distribution. With the new parametrization
(3.4) it can be shown that the marginal distribution of X can be
represented by
$$h(x) = \binom{n}{x}\frac{Be\bigl(\tfrac{p}{\theta}+x,\ \tfrac{q}{\theta}+n-x\bigr)}{Be\bigl(\tfrac{p}{\theta},\ \tfrac{q}{\theta}\bigr)} = \binom{n}{x}\prod_{t=0}^{x-1}(p+t\theta)\prod_{t=0}^{n-x-1}(q+t\theta)\Big/\prod_{t=0}^{n-1}(1+t\theta). \tag{3.5}$$
Using previous notation, (3.5) can be expressed as
$$X \sim B(n,u) \underset{u}{\wedge} \mathrm{Beta}(p,\theta). \tag{3.6}$$
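The product form of (3.5) is convenient for computation because it avoids beta functions and remains well defined at $\theta = 0$. The Python sketch below is illustrative only; the function name `beta_binomial_pmf` and the numerical values of $n$, $p$ and $\theta$ are assumptions chosen for the example. The variance formula used in the comparison, $npq(1+n\theta)/(1+\theta)$, follows from (3.3) and (3.4).

```python
from math import comb, prod

def beta_binomial_pmf(x, n, p, theta):
    """Beta-binomial probability in the (p, theta) parametrization of (3.5)."""
    q = 1.0 - p
    numerator = (prod(p + t * theta for t in range(x))
                 * prod(q + t * theta for t in range(n - x)))
    denominator = prod(1.0 + t * theta for t in range(n))
    return comb(n, x) * numerator / denominator

def moments(pmf_values):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf_values) - 1."""
    mean = sum(x * h for x, h in enumerate(pmf_values))
    var = sum((x - mean) ** 2 * h for x, h in enumerate(pmf_values))
    return mean, var

n, p, theta = 10, 0.3, 0.25
pmf = [beta_binomial_pmf(x, n, p, theta) for x in range(n + 1)]
mean, var = moments(pmf)
# Variance implied by the mixture representation
expected_var = n * p * (1 - p) * (1 + n * theta) / (1 + theta)
# With theta = 0 every factor collapses and the ordinary binomial is recovered
binomial_case = [beta_binomial_pmf(x, n, p, 0.0) for x in range(n + 1)]
```

The inflation factor $(1+n\theta)/(1+\theta)$ over the binomial variance $npq$ is what the goodness of fit tests discussed in this section are designed to detect.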
The beta-binomial distribution in the form of the beta mixture
of binomials in (3.6) appears to have been first introduced by
Skellam (1948). It may be noted, however, that the probability mass
function (3.5) of the beta-binomial distribution has been recognized
since 1923 as the Polya-Eggenberger urn model with stochastic replacements,
which includes the binomial and hypergeometric as special cases.
We refer to Johnson and Kotz (1969; 1977, Ch. 4) for a detailed
description of the Polya-Eggenberger distribution, and for various
other nomenclatures of the beta-binomial distribution.
Since Skellam (1948) introduced the beta-binomial distribution,
many authors have employed it in the analysis of biological data, most
noteworthy among them being Kemp and Kemp (1956), Williams (1975),
Crowder (1978), Feder (1978) and Segreti and Munson (1981).
We may illustrate the difference between the ordinary binomial,
the beta-binomial and other proposed binomial generalizations using
the following example.
In certain toxicological experiments with animals the outcome of
interest is the occurrence of a dead fetus in a litter that receives
a certain treatment. A typical example would be the dominant lethal
test (Haseman and Soares, 1976) to determine the mutagenicity of a
compound.
Let
$$Y_{ijk} = \begin{cases} 1, & \text{if the } k\text{-th fetus in the } j\text{-th litter receiving the } i\text{-th treatment is dead,} \\ 0, & \text{otherwise,} \end{cases} \tag{3.7}$$
and
$$X_{ij} = \sum_{k=1}^{n_{ij}} Y_{ijk} \tag{3.8}$$
for $k = 1, \ldots, n_{ij}$, $j = 1, \ldots, \ell$, $i = 1, \ldots, I$.
Then by assuming that the litter-specific probability of fetal death
has a beta distribution we have
$$X_{ij} \sim B(n_{ij}, u_i) \underset{u_i}{\wedge} \mathrm{Beta}(p_i, \theta_i). \tag{3.9}$$
Now the beta-binomial model in (3.9) is partially analogous to the
random effects model under normal theory
$$Y^*_{ijk} = \eta_{ij} + \varepsilon_{ijk}, \tag{3.10}$$
where $Y^*_{ijk}$ denotes the observed response, the $\eta_{ij}$'s are random
effects that are independent $N(\mu, \sigma_1^2)$, and the $\varepsilon_{ijk}$'s are independent
$N(0, \sigma_2^2)$ errors of observation and are independent of $\eta_{ij}$.
In the random effects model (3.10) $Y^*_{ijk}$ and $Y^*_{ijk'}$ for $k \neq k'$ are
conditionally independent given $\eta_{ij}$, but $Y^*_{ijk}$ and $Y^*_{ijk'}$ are unconditionally
dependent with correlation $\sigma_1^2/(\sigma_1^2+\sigma_2^2)$. Hence with the beta-binomial
model one introduces intra-litter correlation, but it is
always positive.
The correlated binomial model developed by Kupper and Haseman
(1978) is more flexible than the beta-binomial in the sense that it
can allow negative intra-litter correlation. Using Bahadur's technique
(Bahadur, 1961) they 'corrected' the ordinary binomial to
incorporate the intra-litter dependence and arrived at a probability
mass function (p.m.f.)
$$h_c(x_{ij}) = \binom{n_{ij}}{x_{ij}} p_i^{x_{ij}} q_i^{n_{ij}-x_{ij}} \Bigl\{1 + \frac{\tau_i}{2p_i^2q_i^2}\bigl[(x_{ij}-n_{ij}p_i)^2 + x_{ij}(2p_i-1) - n_{ij}p_i^2\bigr]\Bigr\}, \tag{3.11}$$
where
$$\tau_i = \mathrm{Cov}(Y_{ijk}, Y_{ijk'}) \text{ for } k \neq k'$$
and
$$q_i = 1 - p_i.$$
The possible range of $\rho_i = \tau_i/(p_iq_i)$ is calculated in Kupper and
Haseman (1978) for several choices of $p_i$ and $n_{ij}$.
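A short numerical sketch may help to see what the correction factor in (3.11) does. The Python code below, with the hypothetical function name `correlated_binomial_pmf` and arbitrarily chosen values of $n$, $p$ and the covariance, evaluates the p.m.f. and its first two moments. The comparison value for the variance, $npq + n(n-1)\tau$, is the variance of a sum of $n$ exchangeable Bernoulli variables with common pairwise covariance $\tau$; it is stated here as a known property rather than taken from the text.

```python
from math import comb

def correlated_binomial_pmf(x, n, p, cov):
    """Correlated binomial probability of (3.11); cov is the common
    pairwise covariance of the Bernoulli responses within a litter."""
    q = 1.0 - p
    correction = (x - n * p) ** 2 + x * (2 * p - 1) - n * p * p
    return comb(n, x) * p**x * q**(n - x) * (1.0 + cov * correction / (2 * p * p * q * q))

def moments(pmf_values):
    """Mean and variance of a distribution on 0, 1, ..., len(pmf_values) - 1."""
    mean = sum(x * h for x, h in enumerate(pmf_values))
    var = sum((x - mean) ** 2 * h for x, h in enumerate(pmf_values))
    return mean, var

n, p = 6, 0.3
positive = [correlated_binomial_pmf(x, n, p, 0.01) for x in range(n + 1)]
negative = [correlated_binomial_pmf(x, n, p, -0.005) for x in range(n + 1)]
mean_pos, var_pos = moments(positive)
mean_neg, var_neg = moments(negative)
```

The negative-covariance case gives a variance below the binomial value $npq$, which is the flexibility the beta-binomial lacks. For covariances outside the admissible range some probabilities become negative, which is why the range of $\rho_i$ has to be computed for each $p_i$ and $n_{ij}$.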
In the same vein Altham (1978) obtained¹ a multiplicative
generalization of the binomial using the multiplicative definition of
interaction for count data. Even though the multiplicative generalization
has the remarkable property that it belongs to the two-parameter
exponential family, it may not be as easy to work with as the other
two-parameter generalizations.
For a goodness of fit test of a binomial against a mixture
alternative we may classify the data types into three categories
under $H_0$:
(A) A random sample: $X_1, \ldots, X_k \overset{iid}{\sim} B(n,p)$.
(B) Non-iid case, no replication: $X_1, \ldots, X_k$ such that $X_i \overset{ind}{\sim} B(n_i,p)$
for $i = 1, \ldots, k$, and all $n_i$ distinct.
(C) One-way layout: $X_{i1}, \ldots, X_{ir_i} \overset{iid}{\sim} B(n_i,p)$
for all $i = 1, \ldots, k$. (3.12)
1 Altham (1978) actually obtained two generalizations of the binomial. However, the additive generalization is equivalent to the correlated binomial discussed above.
For case (A) Wisniewski (1968) showed that the test based on the
classical chi-square statistic
$$S_A = \sum_{i=1}^{k}(X_i - n\hat p)^2/(n\hat p\hat q),$$
where $\hat p = \sum_{i=1}^{k} X_i/(nk)$ and $\hat q = 1 - \hat p$,
is the locally most powerful (LMP) test having Neyman structure against
a wide class of mixture alternatives. It can be further shown that
the test based on $S_A$ is locally most powerful unbiased (LMPU) against
the same class of mixture alternatives. (See Appendix I for the
proof.)
Potthoff and Whittinghill (1966a) derived the LMP test against
the beta-binomial alternative in case (B) when p is known; their test
rejects the null hypothesis for large values of
$$S_B = \frac{1}{p}\sum_{i=1}^{k} X_i(X_i-1) + \frac{1}{q}\sum_{i=1}^{k}(n_i-X_i)(n_i-X_i-1).$$
When p is unknown in case (B), Wisniewski (1968) proposed a test based
on
$$S_B^* = \sum_{i=1}^{k}(X_i - n_i\hat p)^2/(\hat p\hat q), \tag{3.13}$$
where $\hat p = \sum_{i=1}^{k} X_i\big/\sum_{i=1}^{k} n_i$ and $\hat q = 1 - \hat p$.
Recently Tarone (1979) showed that the test based on $S_B^*$ is the corresponding
$C(\alpha)$ test against the class of general mixture alternatives,
and hence it is asymptotically locally most powerful.
As will be shown below, the general mixture alternative suggested
by Wisniewski is broad enough to include both the beta-binomial, and
the correlated binomial when the correlation is positive. Hence fur-
ther discussion of the detection of extra-binomial variability from
mixture alternatives can be focused on the general mixture alternative
suggested by Wisniewski.
Definition 3.1: Suppose a random variable X has a mixture distribution
of the following form:
$$X \sim B(n,u) \underset{u}{\wedge} g(u) \tag{3.14}$$
for $0 < u < 1$, where U is a random variable having a p.d.f. $g(\cdot)$
with mean p and finite variance $\sigma^2$. Then the mixture distribution
(3.14) is called the Wisniewski-type general mixture.
Even though Kupper and Haseman (1978) derived the correlated
binomial in an attempt to 'correct' the ordinary binomial, it is useful
to derive the p.m.f. (3.11) of the correlated binomial when $\tau_i > 0$
from the Wisniewski-type general mixture of binomials. It is obvious
that the beta-binomial distribution belongs to the class of
Wisniewski-type general mixtures.
Lemma 3.1: Up to the order of $\sigma^2$, the p.m.f. of the Wisniewski-type
general mixture corresponding to (3.14) is equivalent to the correlated
binomial distribution when the correlation is positive.
Proof. If we make a change of variable by putting $U = p(1+cV)$,
where $c = \sigma/p$, (3.14) becomes
$$X \sim \binom{n}{x}p^xq^{n-x}\int(1+cv)^x(1-cpv/q)^{n-x}g^*(v)\,dv, \tag{3.15}$$
where $g^*(\cdot)$ is the p.d.f. of the standardized random variable V, and
$q = 1-p$.
Let $h(x)$ denote the marginal p.m.f. of X. Then
$$h(x) = \binom{n}{x}p^xq^{n-x}\Bigl\{1 + \frac{\sigma^2}{2pq}\Bigl[\frac{x(x-1)}{p} + \frac{(n-x)(n-x-1)}{q} - n(n-1)\Bigr] + O(\sigma^3)\Bigr\}$$
$$= \binom{n}{x}p^xq^{n-x}\Bigl\{1 + \frac{\sigma^2}{2p^2q^2}\bigl[(x-np)^2 + x(2p-1) - np^2\bigr] + O(\sigma^3)\Bigr\}. \tag{3.16}$$
Thus after deleting $O(\sigma^3)$ terms, (3.16) is equivalent to the correlated
binomial when the correlation is positive. □
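Lemma 3.1 can be checked numerically for the beta-binomial, which is one member of the Wisniewski-type class. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the mixing variance is taken as $\sigma^2 = pq\theta/(1+\theta)$ from (3.3) and (3.4), and the values of $n$, $p$ and $\theta$ are arbitrary. If the two p.m.f.'s agree up to the order of $\sigma^2$, the largest discrepancy should shrink faster than linearly as $\theta$ is halved.

```python
from math import comb, prod

def beta_binomial_pmf(x, n, p, theta):
    """Beta-binomial probability in the product form of (3.5)."""
    q = 1.0 - p
    numerator = (prod(p + t * theta for t in range(x))
                 * prod(q + t * theta for t in range(n - x)))
    return comb(n, x) * numerator / prod(1.0 + t * theta for t in range(n))

def correlated_binomial_pmf(x, n, p, cov):
    """Correlated binomial probability of (3.11) with covariance cov."""
    q = 1.0 - p
    correction = (x - n * p) ** 2 + x * (2 * p - 1) - n * p * p
    return comb(n, x) * p**x * q**(n - x) * (1.0 + cov * correction / (2 * p * p * q * q))

def max_gap(n, p, theta):
    """Largest absolute difference between the two p.m.f.'s when the
    correlated binomial uses the variance of the beta mixing law."""
    sigma2 = p * (1.0 - p) * theta / (1.0 + theta)
    return max(abs(beta_binomial_pmf(x, n, p, theta)
                   - correlated_binomial_pmf(x, n, p, sigma2))
               for x in range(n + 1))

gap_coarse = max_gap(6, 0.3, 0.002)
gap_fine = max_gap(6, 0.3, 0.001)
```

Comparing `gap_fine` with `gap_coarse` shows the discrepancy behaving like a second-order term in $\theta$, which is what the deletion of the higher-order terms in (3.16) amounts to.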
Recently, using the $C(\alpha)$ procedure, Tarone (1979) was led to the
same test statistic $S_B^*$ in (3.13) against the correlated binomial,
beta-binomial and the Wisniewski-type general mixture alternatives.
Tarone's result of having the same $C(\alpha)$ test statistic $S_B^*$ against those
three alternatives is seen to be an immediate consequence of Lemma 3.1.
In detecting extra-binomial variability from mixture alternatives
the locally optimal test has been derived only for the iid case. This
is not, however, the situation for the closely related Poisson case.
Collings and Margolin (1983) derive an LMPU test that detects negative
binomial departure from the Poisson in the one-way layout case by
extending Potthoff and Whittinghill's result in the iid case (Potthoff
and Whittinghill, 1966b). In the following lemma we provide a
necessary and sufficient condition on the mixing distribution for the
existence of an LMPU test for extra-binomial variability from a mixture
departure in a one-way layout.
Lemma 3.2: Suppose the mean success probabilities are unknown. Let
the null hypothesis $H_0$ and the alternative hypothesis $H_a$ be represented
as follows:
$$H_0: X_{ij} \sim B(n_i, p_i), \qquad H_a: X_{ij} \sim B(n_i, u_i) \underset{u_i}{\wedge} g_i(u_i), \qquad j = 1, \ldots, r_i,\ i = 1, \ldots, k, \tag{3.17}$$
where $U_1, U_2, \ldots, U_k$ are independent random variables and $U_i$ has a
p.d.f. $g_i(\cdot)$ with mean $p_i$ and finite variance $\xi\sigma_i^2$ for $i = 1, \ldots, k$.
Then the LMPU test of $H_0$ against $H_a$ exists if and only if $\sigma_i^2$ is a
constant multiple of $p_i^2q_i^2$ for $i = 1, \ldots, k$.
Proof. Using transformations $U_i = p_i(1+c_iV_i)$, where $c_i = \sqrt{\xi}\,\sigma_i/p_i$,
(3.17) becomes
$$X_{ij} \sim \binom{n_i}{x_{ij}}p_i^{x_{ij}}q_i^{n_i-x_{ij}}\int(1+c_iv_i)^{x_{ij}}(1-c_ip_iv_i/q_i)^{n_i-x_{ij}}g_i^*(v_i)\,dv_i, \tag{3.18}$$
where $g_i^*(\cdot)$ is a p.d.f. of $V_i$.
Under the null hypothesis $H_0: \xi = 0$, $S(\mathbf{X}) = (X_{1+}, \ldots, X_{k+})$,
where $X_{i+} = \sum_{j=1}^{r_i} X_{ij}$ for $i = 1, \ldots, k$, is complete and sufficient for the
unknown parameter $\mathbf{p} = (p_1, \ldots, p_k)$. We attempt to construct a test
having Neyman structure. The conditional likelihood of
$\mathbf{X} = (\mathbf{X}_1, \ldots, \mathbf{X}_k)$, where $\mathbf{X}_i = (X_{i1}, \ldots, X_{ir_i})$ for $i = 1, \ldots, k$, given the
sufficient statistic $S(\mathbf{X})$ is
$$\prod_{i=1}^{k}\prod_{j=1}^{r_i}\binom{n_i}{x_{ij}}\Big/\prod_{i=1}^{k}\binom{n_ir_i}{x_{i+}} \tag{3.19}$$
under $H_0$, and
$$A\Bigl\{1 + \xi\sum_{i=1}^{k}\frac{\sigma_i^2}{2p_i^2q_i^2}\sum_{j=1}^{r_i}\bigl[(x_{ij}-n_ip_i)^2 + x_{ij}(2p_i-1) - n_ip_i^2\bigr] + O(\xi^{3/2})\Bigr\}\prod_{i=1}^{k}\prod_{j=1}^{r_i}\binom{n_i}{x_{ij}} \tag{3.20}$$
under $H_a$, where A is a quantity depending on the data only through
$S(\mathbf{X})$.
Since the conditional likelihood ratio, i.e., the ratio of (3.20) to
(3.19), depends on the unspecified parameters, no uniformly most
powerful test of Neyman structure exists.
By the Neyman-Pearson fundamental lemma the locally most powerful
test criterion having Neyman structure becomes the ratio of
(3.20) to (3.19) as $\xi \rightarrow 0$, which is
$$S_c = \sum_{i=1}^{k}\frac{\sigma_i^2}{2p_i^2q_i^2}\sum_{j=1}^{r_i}\bigl[(x_{ij}-n_ip_i)^2 + x_{ij}(2p_i-1) - n_ip_i^2\bigr]. \tag{3.21}$$
However, $S_c$ cannot be a test statistic unless the dependence on
$\sigma_i^2/(p_i^2q_i^2)$ is removed. This dependence is removed if and only if
$$\sigma_i^2 = a\,p_i^2q_i^2, \quad i = 1, 2, \ldots, k, \tag{3.22}$$
where a is a constant.
Now under (3.22) the LMP test of Neyman structure based on $S_c$ is
equivalent to a test based on
$$S_c^* = \sum_{i=1}^{k}\sum_{j=1}^{r_i} X_{ij}^2.$$
The argument in Appendix I can be extended to the one-way layout
data to show that the LMP test of Neyman structure based on $S_c^*$ is
LMPU. □
Remark: It is difficult to find a class of mixing distributions that
satisfy (3.22); the beta does not.
3.2 Random Effects Model of One-Way Layout
In the last section, we discussed mixture models for binomial
distributions, so as to allow random variation of the success probability.
We extend the discussion to multinomial distributions and
focus on a Dirichlet mixture of multinomials alternative. This section
consists of a model specification to accommodate random effects in a
multinomial model, the development of an asymptotically optimal test
statistic for detecting these random effects by using Neyman's $C(\alpha)$
procedure (Neyman, 1959), the null and alternative distributions of the
test statistic, and the large-sample comparison of the $C(\alpha)$ statistic
with the classical chi-square statistic. Because of Lemma 3.2, which
includes the case of beta-binomial distributions, it is difficult to
find a locally optimal test for detecting a Dirichlet mixture departure
from a multinomial in the non-iid case. Even in the iid case the local
optimality of a test that detects Wisniewski-type general mixture departures
from a binomial is not preserved in the multinomial case. This is
shown in Appendix II in the case of a Dirichlet mixture of multinomials.
A Monte Carlo simulation of the power comparison of the $C(\alpha)$ statistic
and the chi-square statistic is presented. Finally a duality between
the $C(\alpha)$ statistic and Light and Margolin's Catanova statistic is
discussed (Light and Margolin, 1971; Margolin and Light, 1974).
3.2.1 Dirichlet-Multinomial Model
We consider the multinomial as a generalization of the binomial and
consider a product of multinomials likelihood. Experimentally this can
arise from a situation in which we have G unordered experimental groups
and I unordered response categories with $n_{+j}$ observations taken in group
j for $j = 1, \ldots, G$. Data from such sampling can be represented in the
following contingency table:

                          Group
Response       1        2       ...     G        Response Total
1              n_11     n_12    ...     n_1G     n_1+
2              n_21     n_22    ...     n_2G     n_2+
.              .        .               .        .
.              .        .               .        .
I              n_I1     n_I2    ...     n_IG     n_I+
Group Total    n_+1     n_+2    ...     n_+G     n_++

Table 3.1
I×G Contingency Table

Let the j-th group response vector, given the group total, be denoted by
$\mathbf{n}_j = (n_{1j}, \ldots, n_{I-1,j})'$. One natural way of imposing random group effects
on the j-th group response vector is to generalize the multinomial
distribution by allowing the group probability vector itself to have a
Dirichlet distribution. Thus we have
$$(\mathbf{n}_j \mid \mathbf{U}_j, n_{+j}) \overset{ind}{\sim} M(n_{+j}, \mathbf{U}_j) \tag{3.23}$$
for $j = 1, \ldots, G$ and
$$\mathbf{U}_1, \mathbf{U}_2, \ldots, \mathbf{U}_G \overset{iid}{\sim} D(\boldsymbol\beta). \tag{3.24}$$
From (3.24) it can be easily shown that the means, variances, and
covariances of $\mathbf{U}_j = \mathbf{U} = (U_1, \ldots, U_{I-1})'$ are
$$E(U_i) = \beta_i/B,$$
$$\mathrm{Var}(U_i) = \beta_i(B-\beta_i)/\{B^2(B+1)\}, \tag{3.25}$$
$$\mathrm{Cov}(U_i, U_j) = -\beta_i\beta_j/\{B^2(B+1)\}, \quad i \neq j,$$
for $i, j = 1, \ldots, I-1$, where $B = \sum_{i=1}^{I}\beta_i$.
It is useful to change the parameters by putting
$$p_i = \beta_i/B \text{ for } i = 1, \ldots, I-1, \qquad \theta = 1/B. \tag{3.26}$$
Then it is an easy exercise to show that the marginal distribution of
$\mathbf{n}_j$, which is a Dirichlet mixture of multinomial distributions, has a
p.m.f.
$$h(\mathbf{n}_j; \mathbf{p}, \theta) = \binom{n_{+j}}{n_{1j}, \ldots, n_{I-1,j}}\prod_{i=1}^{I}\prod_{r=0}^{n_{ij}-1}(p_i + r\theta)\Big/\prod_{r=0}^{n_{+j}-1}(1 + r\theta), \tag{3.27}$$
where $n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij}$ for $j = 1, \ldots, G$.
We refer to a Dirichlet mixture of multinomials (3.27) as a Dirichlet-multinomial
distribution and denote it by
$$\mathbf{n}_j \sim DM(n_{+j}, \mathbf{p}, \theta), \tag{3.28}$$
where $\mathbf{p} = (p_1, \ldots, p_{I-1})'$,
which is symbolically described as
$$\mathbf{n}_j \sim M(n_{+j}, \mathbf{u}_j) \underset{\mathbf{u}_j}{\wedge} D(\mathbf{p}, \theta).$$
Thus as an extension of a product of multinomial distributions the joint
distribution of $(\mathbf{n}_1, \ldots, \mathbf{n}_G)$ becomes a product of Dirichlet-multinomial
distributions, i.e.,
$$\mathbf{n}_j \overset{ind}{\sim} DM(n_{+j}, \mathbf{p}, \theta), \quad j = 1, \ldots, G. \tag{3.29}$$
Since a Dirichlet-multinomial distribution is a multi-dimensional extension
of a beta-binomial distribution, there are several other terminologies
(Johnson and Kotz, 1969, 1977).
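The p.m.f. (3.27) has the same product structure as the beta-binomial and can be evaluated in the same way. The following Python sketch is a minimal illustration; the function names and the numerical values are assumptions, and the probability vector is passed with all I components (so that it sums to one) rather than the first $I-1$ only.

```python
from math import factorial, prod

def dirichlet_multinomial_pmf(counts, p, theta):
    """Probability (3.27) of one group's counts; counts and p both list all
    I categories, with sum(p) == 1."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)        # multinomial coefficient
    numerator = prod(p_i + r * theta for p_i, c in zip(p, counts) for r in range(c))
    denominator = prod(1.0 + r * theta for r in range(n))
    return coefficient * numerator / denominator

def compositions(n, parts):
    """All ways of writing n as an ordered sum of `parts` non-negative integers."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

p = (0.2, 0.3, 0.5)
total_mass = sum(dirichlet_multinomial_pmf(c, p, 0.4) for c in compositions(4, 3))
# theta = 0 gives back the ordinary multinomial probability
multinomial_case = dirichlet_multinomial_pmf((1, 1, 2), p, 0.0)
```

With two categories the same function reproduces the beta-binomial p.m.f. (3.5), which is the sense in which the Dirichlet-multinomial is its multi-dimensional extension.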
3.2.2 Test of the Random Effects
In the product Dirichlet-multinomial model (3.29) $\theta$ becomes the
parameter of interest for testing the existence of random group effects,
because if $\theta = 0$ the model reduces to a product of multinomials; this
is a device we and others have employed to allow a single parameter to
introduce random effects. Thus the null hypothesis $H_0$ of no random
effects and the alternative hypothesis $H_a$ of the existence of random
effects can be expressed as
$$H_0: \theta = 0, \qquad H_a: \theta > 0. \tag{3.30}$$
Based on the one-way layout contingency table in Table 3.1 the log-likelihood
function of $\theta$, apart from the additive constant, is given by
$$\ell(\theta) = \sum_{j=1}^{G}\Bigl\{\sum_{i=1}^{I}\sum_{r=0}^{n_{ij}-1}\log(p_i + r\theta) - \sum_{r=0}^{n_{+j}-1}\log(1 + r\theta)\Bigr\}. \tag{3.31}$$
3.2.2.1 Case of $\mathbf{p}$ Known
It is easy to show that the uniformly most powerful (UMP) test for
$H_0$ versus $H_a$ does not exist in this case. However, the locally most
powerful (LMP) test of Potthoff and Whittinghill (1966,a) rejects $H_0$ for large values of
\[ \frac{\partial \ell(\theta)}{\partial \theta}\Big|_{\theta=0} = \frac{1}{2} \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} - n_{+j}(n_{+j}-1) \Big\} \propto \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} \equiv T_1, \tag{3.32} \]
where $n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij}$ and $p_I = 1 - \sum_{i=1}^{I-1} p_i$.
Potthoff and Whittinghill (1966,a) proposed a method-of-moments
approximation to the null distribution of $T_1$ by finding constants e, f, and g
that satisfied
\[ e\,T_1 + f \sim \chi^2(g), \tag{3.33} \]
where $\chi^2(g)$ refers to a chi-square random variable with g degrees of
freedom; however, by expressing $T_1$ in (3.32) in terms of a quadratic
form we can suggest another approximation of the null distribution of
$T_1$. To aid in the development, we introduce some useful results without
proofs. Proofs can be found in Ronning (1982).
Lemma 3.3: Under $H_0$ the covariance matrix of $\mathbf{n}_j = (n_{1j}, \ldots, n_{I-1,j})'$
is given by
\[ \mathrm{Cov}(\mathbf{n}_j) = n_{+j} V, \tag{3.34} \]
where $D_{p_i} = \mathrm{diag}(p_1, \ldots, p_{I-1})$ and $V = D_{p_i} - \mathbf{p}\mathbf{p}'$.
Lemma 3.4: Let $V$ and $D_{p_i}$ be defined as in (3.34). Then
\[ V^{-1} = D_{p_i}^{-1} + (1/p_I) E, \tag{3.35} \]
where E is an (I-1)x(I-1) matrix consisting of ones only.
Lemma 3.5: Let $\mathbf{Z}_j$ be an (I-1)-dimensional vector with elements
$z_{ij} = \sqrt{n_{+j}}\,(n_{ij}/n_{+j} - p_i)$ for i=1,...,I-1, j=1,...,G; then
\[ \mathbf{Z}_j' V^{-1} \mathbf{Z}_j = \sum_{i=1}^{I} (n_{ij} - n_{+j} p_i)^2 / (n_{+j} p_i) \]
is Pearson's chi-square statistic for goodness of fit in the j-th group.
Hence $\mathbf{Z}_j' V^{-1} \mathbf{Z}_j$ has an asymptotic chi-square distribution with I-1 degrees
of freedom under $H_0$.
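Lemmas 3.4 and 3.5 are stated without proof, so a small numerical check may be useful. The sketch below is illustrative only: the helper names (`v_matrix`, `v_inverse`, `quad_form`, `pearson`) and the example counts are assumptions made here, and plain nested lists are used instead of a matrix library.

```python
def v_matrix(p):
    """V = diag(p_1..p_{I-1}) - p p', built from the first I-1 probabilities."""
    q = p[:-1]
    return [[(q[i] if i == k else 0.0) - q[i] * q[k] for k in range(len(q))]
            for i in range(len(q))]


def v_inverse(p):
    """Lemma 3.4: V^{-1} = diag(1/p_i) + (1/p_I) * (matrix of ones)."""
    q, p_last = p[:-1], p[-1]
    return [[(1.0 / q[i] if i == k else 0.0) + 1.0 / p_last
             for k in range(len(q))] for i in range(len(q))]


def matmul(a, b):
    """Plain-list matrix product."""
    return [[sum(a[i][m] * b[m][k] for m in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]


def quad_form(counts, p):
    """Z_j' V^{-1} Z_j with z_ij = sqrt(n) * (n_ij / n - p_i) for i < I."""
    n = sum(counts)
    z = [(c - n * pi) / n ** 0.5 for c, pi in zip(counts[:-1], p[:-1])]
    vinv = v_inverse(p)
    return sum(z[i] * vinv[i][k] * z[k]
               for i in range(len(z)) for k in range(len(z)))


def pearson(counts, p):
    """Pearson goodness-of-fit statistic over all I categories."""
    n = sum(counts)
    return sum((c - n * pi) ** 2 / (n * pi) for c, pi in zip(counts, p))


p = [0.1, 0.2, 0.3, 0.4]        # illustrative probabilities, I = 4
counts = [7, 9, 20, 14]         # illustrative counts for one group
identity_check = matmul(v_matrix(p), v_inverse(p))   # should be the identity
```

For these inputs `quad_form(counts, p)` and `pearson(counts, p)` agree up to rounding, which is the content of Lemma 3.5.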
Simple calculation can show that the test based on $T_1$ in (3.32) is
equivalent to the test based on
\[ T_1^* = \sum_{j=1}^{G} \Big[\mathbf{n}_j - n_{+j}\mathbf{p} + \tfrac{I}{2}\big(\mathbf{p} - \tfrac{1}{I}\mathbf{1}\big)\Big]' V^{-1} \Big[\mathbf{n}_j - n_{+j}\mathbf{p} + \tfrac{I}{2}\big(\mathbf{p} - \tfrac{1}{I}\mathbf{1}\big)\Big], \tag{3.36} \]
where $\mathbf{1}$ is an (I-1)x1 vector of ones only and $I \geq 2$. Thus by use of
Lemma 3.5 we can derive the following results:
(1) When the $n_{+j}$'s are all equal and $\mathbf{p} = (1/I)\mathbf{1}$, $T_1^*$ is equivalent to
Pearson's chi-square statistic.
(2) If we assume that there exist $a_j$, $0 < a_j < 1$ for j=1,...,G such
that $a_j = \lim_{n_{++}\to\infty} n_{+j}/n_{++}$, then the limiting distribution of $(1/n_{++})T_1^*$
is of the form
\[ \frac{1}{n_{++}} T_1^* \xrightarrow[H_0]{\mathcal{D}} \sum_{j=1}^{G} a_j \chi_j^2(I-1, \delta), \tag{3.37} \]
where {X~{I-1,0}; j=l, ... ,G} is a set of independent noncentra1
chi-square random variables with 1-1 degrees of freedom and non
centrality parameter 0=i2L~:~ (Pi _ ~)2 .
(3) In the special case of equal n+j's we have
__1__ T* V ) x2{G (I -l) , Go) ,n++ 1
where 0 is defined in (3.37).
3.2.2.2 Case of $\mathbf{p}$ Unknown
The case of unknown $\mathbf{p}$ is far more interesting, especially in terms
of applicability to real problems. It is shown in Appendix II that a
locally optimal test for testing $H_0: \theta = 0$ versus $H_a: \theta > 0$ does not
exist. However, a $C(\alpha)$ test is readily available.
In order to derive the $C(\alpha)$ test statistic we need the following
partial derivatives of the log-likelihood $\ell(\theta)$ of (3.31) evaluated at
$\theta = 0$.
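\[ \phi_1(\mathbf{p}) = \frac{\partial \ell}{\partial \theta}\Big|_{\theta=0} = \frac{1}{2} \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} - n_{+j}(n_{+j}-1) \Big\}, \tag{3.38} \]
\[ \phi_{2i}(\mathbf{p}) = \frac{\partial^2 \ell}{\partial \theta\, \partial p_i}\Big|_{\theta=0} = -\frac{1}{2} \sum_{j=1}^{G} \Big\{ \frac{n_{ij}(n_{ij}-1)}{p_i^2} - \frac{n_{Ij}(n_{Ij}-1)}{p_I^2} \Big\} \tag{3.39} \]
for i=1,2,...,I-1, with $n_{Ij} = n_{+j} - \sum_{l=1}^{I-1} n_{lj}$,
\[ \phi_3(\mathbf{p}) = \frac{\partial^2 \ell}{\partial \theta^2}\Big|_{\theta=0} = -\frac{1}{6} \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)(2n_{ij}-1)}{p_i^2} - n_{+j}(n_{+j}-1)(2n_{+j}-1) \Big\}, \tag{3.40} \]
and $E_0[\phi_{2i}(\mathbf{p})] = 0$ for i=1,...,I-1,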
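where $E_0$ implies that the expectation is taken under $\theta = 0$. Neyman
(1959) (see also Moran, 1970) has shown that when $E_0[\phi_{2i}(\mathbf{p})] = 0$, the
null hypothesis can be tested using the statistic $\phi_1(\tilde{\mathbf{p}})$, where $\tilde{\mathbf{p}}$ is a
root-$n_{++}$ consistent estimator of $\mathbf{p}$. An obvious choice for $\tilde{\mathbf{p}}$ is the
MLE $\hat{\mathbf{p}} = \frac{1}{n_{++}} \sum_{j=1}^{G} \mathbf{n}_j$ under $H_0$. Substituting the MLE $\hat{\mathbf{p}}$ in (3.38) we
obtain
\[ 2\phi_1(\hat{\mathbf{p}}) = \sum_{j=1}^{G} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}]' \hat{V}^{-1} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}] - (I-1)n_{++} \]
\[ = n_{++} \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{+j}}{n_{i+}} \Big[ \frac{n_{ij}}{\sqrt{n_{+j}}} - \frac{n_{i+}}{n_{++}} \sqrt{n_{+j}} \Big]^2 - (I-1)n_{++}, \tag{3.41} \]
where $\hat{V} = D_{\hat{p}_i} - \hat{\mathbf{p}}\hat{\mathbf{p}}'$. Hence we see that the $C(\alpha)$ test is based on
\[ T_k = \frac{1}{n_{++}} \sum_{j=1}^{G} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}]' \hat{V}^{-1} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}] \tag{3.42} \]
\[ = \frac{1}{n_{++}}\, 2\phi_1(\hat{\mathbf{p}}) + (I-1). \tag{3.43} \]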
In determining the approximate null distribution of $T_k$ two
limiting results are available. One uses the central limit theorem
(CLT) on the iid multinomial random vectors as the sample size G tends
to infinity. In this limiting argument, $T_k$, properly normalized, has
an asymptotic N(0,1) distribution by the result of Neyman's $C(\alpha)$
procedure (Neyman, 1959). Since $E_0[\phi_{2i}(\mathbf{p})] = 0$ for i=1,...,I-1,
the variance of $\phi_1(\hat{\mathbf{p}})$ is estimated by $-E_0[\phi_3(\mathbf{p})]$. From (3.40) it
follows that
\[ -E_0[\phi_3(\mathbf{p})] = \tfrac{1}{2}(I-1) \sum_{j=1}^{G} n_{+j}(n_{+j}-1). \]
Since $T_k = \frac{1}{n_{++}} 2\phi_1(\hat{\mathbf{p}}) + (I-1)$, by normalizing $T_k$, we find that under
$H_0: \theta = 0$ the statistic
\[ \frac{n_{++}^2\, [T_k - (I-1)]^2}{2(I-1) \sum_{j=1}^{G} n_{+j}(n_{+j}-1)} \tag{3.44} \]
has an asymptotic chi-square distribution with 1 degree of freedom.
We may consider another limiting argument that uses the multivariate
normal approximation of the multinomial distribution when the
number of groups, G, is held fixed and the group sizes $\{n_{+j}\}_{j=1}^{G}$ tend
to infinity in such a manner that $n_{+j}/n_{++} \to a_j$, $0 < a_j < 1$, for
j=1,...,G. In the following discussion, the approximate null and
alternative distributions are based on this limiting argument, which
may better reflect practical experimental considerations where the
number of groups is fixed; we conjecture that these results will
provide a better sampling approximation for finite sample sizes.
The hypothesis test $H_0: \theta = 0$ versus $H_a: \theta > 0$ has been described
as detection of a Dirichlet-multinomial departure from the multinomial
distribution. For this purpose two other test statistics that have
been proposed for fixed effects problems are worthy of consideration:
\[ X_P^2 = \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{(n_{ij} - n_{+j} n_{i+}/n_{++})^2}{n_{+j} n_{i+}/n_{++}} \tag{3.45} \]
and
\[ C = (n_{++}-1)(I-1)\, \mathrm{BSS}/\mathrm{TSS}, \tag{3.46} \]
with BSS and TSS the between-group and total sums of squares of Section 3.2.6,
where $X_P^2$ is Pearson's chi-square statistic and C is the Catanova
statistic suggested by Light and Margolin (1971, also Margolin and
Light, 1974). For the relations among these three statistics, $T_k$, C
and $X_P^2$, we observe that
(i) When $n_{+j} = n$ for j=1,...,G, the test based on $T_k$ is identical to
the test based on $X_P^2$.
(ii) When I=2, C is equivalent to $X_P^2$ (Light and Margolin, 1971).
Hence when $n_{+j} = n$ for all j and I=2, these three statistics are
equivalent.
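Relation (i) can be illustrated numerically. The sketch below assumes the representation of $T_k$ as a weighted sum of per-group Pearson contributions, with weights $n_{+j}/n_{++}$, as in (3.68); the function names and the example table are hypothetical and chosen only for illustration.

```python
def margins(table):
    """Row totals n_i+, column totals n_+j and grand total n_++ of table[i][j]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def group_contributions(table):
    """Pearson contribution of each group j: sum_i (n_ij - e_ij)^2 / e_ij,
    with e_ij = n_i+ * n_+j / n_++ (all margins assumed positive)."""
    rows, cols, n = margins(table)
    out = []
    for j, nj in enumerate(cols):
        out.append(sum((table[i][j] - ri * nj / n) ** 2 / (ri * nj / n)
                       for i, ri in enumerate(rows)))
    return out


def pearson_chi2(table):
    """X_P^2: unweighted sum of the group contributions."""
    return sum(group_contributions(table))


def c_alpha_tk(table):
    """T_k: group contributions weighted by n_+j / n_++."""
    _, cols, n = margins(table)
    return sum(nj / n * q for nj, q in zip(cols, group_contributions(table)))


# Illustrative 3 x 3 table (rows = response categories, columns = groups)
# with equal group sizes n_+j = 10.
balanced = [[3, 5, 2], [4, 1, 6], [3, 4, 2]]
ratio = pearson_chi2(balanced) / c_alpha_tk(balanced)   # equals G for equal sizes
```

With equal group sizes every weight is 1/G, so $X_P^2 = G\,T_k$ and the two tests order all tables identically.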
For the comparison of the three statistics in terms of large sample
behavior, we obtain the asymptotic relative efficiency (ARE) of $X_P^2$
relative to $T_k$. The ARE of C relative to $T_k$ does not turn out to be a
tractable form. Later we discuss duality between $T_k$ and C.
3.2.3 Approximate Null and Alternative Distributions
3.2.3.1 Approximate Null Distributions
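We define the following notation for j=1,2,...,G:
\[ \mathbf{Z}_j' \equiv \mathbf{Z}_j'(\mathbf{p}) = \frac{1}{\sqrt{n_{+j}}} (n_{1j} - n_{+j}p_1, \ldots, n_{I-1,j} - n_{+j}p_{I-1}) \tag{3.47} \]
\[ \hat{\mathbf{p}}' = (\hat{p}_1, \ldots, \hat{p}_{I-1}) = \Big( \frac{n_{1+}}{n_{++}}, \ldots, \frac{n_{I-1,+}}{n_{++}} \Big) \tag{3.48} \]
\[ \hat{\mathbf{Z}}_j = \mathbf{Z}_j(\hat{\mathbf{p}}) \tag{3.49} \]
\[ \mathbf{Z}' = (\mathbf{Z}_1', \ldots, \mathbf{Z}_G') \tag{3.50} \]
\[ \hat{\mathbf{Z}}' = (\hat{\mathbf{Z}}_1', \ldots, \hat{\mathbf{Z}}_G') \tag{3.51} \]
\[ \sqrt{\mathbf{M}}' = \Big[ \sqrt{n_{+1}/n_{++}}, \sqrt{n_{+2}/n_{++}}, \ldots, \sqrt{n_{+G}/n_{++}} \Big] \tag{3.52} \]
\[ \mathbf{M} = \Big[ \frac{n_{+1}}{n_{++}}, \frac{n_{+2}}{n_{++}}, \ldots, \frac{n_{+G}}{n_{++}} \Big]' \tag{3.53} \]
\[ a_j = \lim \frac{n_{+j}}{n_{++}} \quad \text{as } n_{+j} \text{ and } n_{++} \text{ tend to infinity} \tag{3.54} \]
\[ \sqrt{\mathbf{A}} = (\sqrt{a_1}, \ldots, \sqrt{a_G})' \tag{3.55} \]
\[ \mathbf{A} = (a_1, \ldots, a_G)' \tag{3.56} \]
We may express $\sqrt{\mathbf{A}} = \lim_{n_{++}\to\infty} \sqrt{\mathbf{M}}$ and $\mathbf{A} = \lim_{n_{++}\to\infty} \mathbf{M}$.
It is well known that as $n_{+j} \to \infty$
\[ \mathbf{Z}_j \xrightarrow[H_0]{\mathcal{D}} N(\mathbf{0}, V) \tag{3.57} \]
for j=1,...,G, where
\[ V = V(\mathbf{p}) = D_{p_i} - \mathbf{p}\mathbf{p}'. \]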
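Also we note for j=1,...,G
\[ \hat{\mathbf{Z}}_j = \mathbf{Z}_j - \sqrt{n_{+j}}\,(\hat{\mathbf{p}} - \mathbf{p}) = \mathbf{Z}_j - \Big(\frac{n_{+j}}{n_{++}}\Big)^{1/2} \sum_{k=1}^{G} \Big(\frac{n_{+k}}{n_{++}}\Big)^{1/2} \mathbf{Z}_k. \tag{3.58} \]
By using the above we can express $\hat{\mathbf{Z}}$ in terms of $\mathbf{Z}$ as
\[ \hat{\mathbf{Z}} = \big[ (I_G - \sqrt{\mathbf{M}}\sqrt{\mathbf{M}}') \otimes I_{I-1} \big] \mathbf{Z}, \tag{3.59} \]
where $I_k$ is a k x k identity matrix and $\otimes$ stands for the Kronecker
product. The asymptotic distribution of $\mathbf{Z}$ can be obtained by using
(3.57) and the independence of $\mathbf{Z}_1, \ldots, \mathbf{Z}_G$:
\[ \mathbf{Z} \xrightarrow[H_0]{\mathcal{D}} N(\mathbf{0}, I_G \otimes V). \tag{3.60} \]
Hence by using (3.59) and the idempotency of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}')$ we obtain
\[ \hat{\mathbf{Z}} \xrightarrow[H_0]{\mathcal{D}} N(\mathbf{0}, (I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes V). \tag{3.61} \]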
Now, Pearson's chi-square statistic $X_P^2$ can be expressed as
\[ X_P^2 = \sum_{j=1}^{G} \hat{\mathbf{Z}}_j' \hat{V}^{-1} \hat{\mathbf{Z}}_j = \hat{\mathbf{Z}}' (I_G \otimes \hat{V}^{-1}) \hat{\mathbf{Z}}. \tag{3.62} \]
For further discussion, the following lemma is useful.
Lemma 3.6: Under $H_0$, $\hat{V} = V + o_p(1)$.
Proof. Using the maximum absolute column sum $\|\cdot\|_1$ for the matrix norm
we haveA 1-1 A A 1-1 A AII V-VIl l = max L--l IP'PI - P.PII = I·-1IP.Pc P.PIIl::=;;i::=;;I-l 1- 1 1 1- 1 1
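where $p_I = 1 - \sum_{i=1}^{I-1} p_i$ and $\hat{p}_I$ is accordingly defined. Since
$n_{i+} \sim B(n_{++}, p_i)$ under $H_0$, $\hat{p}_i = p_i + o_p(1)$ as $n_{++} \to \infty$ for i=1,...,I-1. Thus by
continuity the result follows. □
Thus by Lemma 3.6, (3.62) can be written as
\[ X_P^2 = \hat{\mathbf{Z}}' (I_G \otimes V^{-1}) \hat{\mathbf{Z}}\, (1 + o_p(1)). \tag{3.63} \]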
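By invoking a theorem on quadratic forms it can be seen that $X_P^2$ is
asymptotically distributed as
\[ X_P^2 \xrightarrow[H_0]{\mathcal{D}} \sum_{i=1}^{(I-1)G} \lambda_i^* \chi_i^2(1), \tag{3.64} \]
where $\{\lambda_i^*; i=1,\ldots,(I-1)G\}$ is the set of eigenvalues of
\[ (I_G \otimes V^{-1}) \big[ (I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes V \big] = (I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes I_{I-1} \tag{3.65} \]
and $\{\chi_i^2(1); i=1,\ldots,(I-1)G\}$ are iid chi-square random variables with 1
degree of freedom. The eigenvalues of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes I_{I-1}$ are cross
products of eigenvalues of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}')$ and those of $I_{I-1}$. Since $I_{I-1}$
has an eigenvalue 1 with multiplicity I-1, (3.64) is equivalent to
\[ X_P^2 \xrightarrow[H_0]{\mathcal{D}} \sum_{i=1}^{G} \rho_i \chi_i^2(I-1), \tag{3.66} \]
where the $\rho_i$'s are eigenvalues of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}')$.
Since $I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}'$ is idempotent of rank G-1 we have (G-1) ones and one
zero for its eigenvalues. Thus (3.66) becomes
\[ X_P^2 \xrightarrow[H_0]{\mathcal{D}} \chi^2((I-1)(G-1)), \tag{3.67} \]
a well known result.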
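We now consider the null distribution of the $C(\alpha)$ statistic $T_k$,
which can be expressed as
\[ T_k = \sum_{j=1}^{G} \Big[\frac{n_{+j}}{n_{++}}\Big]^{1/2} \hat{\mathbf{Z}}_j' \hat{V}^{-1} \Big[\frac{n_{+j}}{n_{++}}\Big]^{1/2} \hat{\mathbf{Z}}_j. \tag{3.68} \]
For notational convenience we define
\[ \hat{\mathbf{Z}}_j^* = \Big(\frac{n_{+j}}{n_{++}}\Big)^{1/2} \hat{\mathbf{Z}}_j \tag{3.69} \]
and
\[ \hat{\mathbf{Z}}^{*\prime} = (\hat{\mathbf{Z}}_1^{*\prime}, \ldots, \hat{\mathbf{Z}}_G^{*\prime}). \tag{3.70} \]
By the same arguments for obtaining the distribution of $\hat{\mathbf{Z}}$ in (3.59)
we obtain:
\[ \hat{\mathbf{Z}}^* \xrightarrow[H_0]{\mathcal{D}} N(\mathbf{0}, (D_{a_i} - \mathbf{A}\mathbf{A}') \otimes V), \tag{3.71} \]
where
\[ D_{a_i} = \mathrm{diag}(a_1, \ldots, a_G). \tag{3.72} \]
Now using $\hat{\mathbf{Z}}^*$ and Lemma 3.6, we can express $T_k$ as
\[ T_k = \hat{\mathbf{Z}}^{*\prime} (I_G \otimes V^{-1}) \hat{\mathbf{Z}}^* (1 + o_p(1)). \tag{3.73} \]
Thus, using the same arguments employed in (3.64)-(3.66) the asymptotic
distribution of $T_k$ under $H_0$ is obtained as
\[ T_k \xrightarrow[H_0]{\mathcal{D}} \sum_{j=1}^{G} \lambda_j \chi_j^2(I-1), \tag{3.74} \]
where $\{\lambda_j: j=1,\ldots,G\}$ is the set of eigenvalues of $(D_{a_i} - \mathbf{A}\mathbf{A}')$. We may
note here that $n(D_{a_i} - \mathbf{A}\mathbf{A}')$ is the singular covariance matrix of a
multinomial distribution $M(n, \mathbf{A})$.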
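Even though some computer subroutines can readily provide the
eigenvalues of $(D_{a_i} - \mathbf{A}\mathbf{A}')$, the determination of the eigenvalues appears
to be an algebraically unsolved problem except that one of the eigenvalues
is known to be zero (Roy et al., 1960, Light and Margolin, 1971,
and Ronning, 1982). Since the $a_i$'s are known, however, we may approximate
the distribution of $T_k$ by $g\chi^2(h)$, where the constants g and h are
chosen so that $g\chi^2(h)$ has the same first two moments as those of $T_k$. In
doing this we use the following results on $D_{a_i} - \mathbf{A}\mathbf{A}'$:
\[ \mathrm{trace}(D_{a_i} - \mathbf{A}\mathbf{A}') = \lambda_1 + \cdots + \lambda_{G-1} = 1 - \sum_{j=1}^{G} a_j^2, \]
\[ \mathrm{trace}(D_{a_i} - \mathbf{A}\mathbf{A}')^2 = \lambda_1^2 + \cdots + \lambda_{G-1}^2 = \sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + \Big(\sum_{j=1}^{G} a_j^2\Big)^2. \]
Thus the asymptotic distribution of $T_k$ can be approximated as
\[ g^{-1} T_k \overset{\cdot}{\underset{H_0}{\sim}} \chi^2(h), \tag{3.75} \]
where
\[ g^{-1} = \frac{1 - \sum_{j=1}^{G} a_j^2}{\sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + (\sum_{j=1}^{G} a_j^2)^2} \tag{3.76} \]
and
\[ h = \frac{(I-1)\,(1 - \sum_{j=1}^{G} a_j^2)^2}{\sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + (\sum_{j=1}^{G} a_j^2)^2}. \tag{3.77} \]
3.2.3.2 Approximate Alternative Distributions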
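The two-moment approximation needs only power sums of the $a_j$. A minimal sketch follows, under two assumptions made here: the observed proportions $n_{+j}/n_{++}$ stand in for the limits $a_j$, and g and h are obtained by matching the mean $(I-1)\sum\lambda_j$ and variance $2(I-1)\sum\lambda_j^2$ of the limiting weighted chi-square sum. The function name `moment_match` is hypothetical.

```python
def moment_match(group_sizes, n_categories):
    """Return (g, h) such that g * chi2(h) matches the first two moments of
    sum_j lambda_j * chi2_j(I - 1), with lambda_j the eigenvalues of D_a - AA'.

    group_sizes  : the n_+j; a_j is approximated by n_+j / n_++ (assumption)
    n_categories : I, the number of response categories
    """
    n = sum(group_sizes)
    a = [m / n for m in group_sizes]
    s2 = sum(x ** 2 for x in a)
    s3 = sum(x ** 3 for x in a)
    tr1 = 1.0 - s2                      # trace(D_a - AA')   = sum of lambda_j
    tr2 = s2 - 2.0 * s3 + s2 ** 2       # trace(D_a - AA')^2 = sum of lambda_j^2
    g = tr2 / tr1                       # from g * h = (I-1) tr1, g^2 * h = (I-1) tr2
    h = (n_categories - 1) * tr1 ** 2 / tr2
    return g, h


# Balanced example: the approximation collapses to (1/G) * chi2((I-1)(G-1)).
g_bal, h_bal = moment_match([50, 50, 50, 50], n_categories=4)
```

In the balanced case the nonzero eigenvalues are all equal to 1/G, so the approximation is exact in the limit and h equals the usual (I-1)(G-1) degrees of freedom.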
We next derive the asymptotic distribution of $X_P^2$ and $T_k$ under $H_a$.
Here we use the remarkable resemblance of the mean and covariance
matrix of the Dirichlet-multinomial to those of the multinomial
distribution (Mosimann, 1962):
\[ E_\theta(\mathbf{n}_j) = n_{+j}\mathbf{p}, \quad j=1,\ldots,G, \]
\[ \mathrm{Cov}_\theta(\mathbf{n}_j) = \Big[\frac{n_{+j}\theta + 1}{\theta + 1}\Big] n_{+j} V, \quad j=1,\ldots,G, \]
where the subscript $\theta$ indicates that the underlying distribution is the
Dirichlet-multinomial.
It has been observed that there are four different asymptotic forms
of the Dirichlet-multinomial distribution (Paul and Plackett, 1978).
Among them, one is of particular relevance to our development.
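Theorem 3.1: (Paul and Plackett, 1978). Let
\[ \mathbf{n}_j \sim M(n_{+j}, \mathbf{u}) \underset{\mathbf{u}}{\wedge} D(\boldsymbol{\beta}) = M(n_{+j}, \mathbf{u}) \underset{\mathbf{u}}{\wedge} D(\mathbf{p}, \theta), \tag{3.78} \]
where $\boldsymbol{\beta} = (\beta_1, \ldots, \beta_I)'$
and the $p_i$'s and $\theta$ are defined in (3.26).
Write $\beta_i = n_{++}\phi_i$ for all i, where the $\phi_i$'s are fixed quantities, and let
$n_{++} \to \infty$. Then
\[ n_{+j}^{-1/2} (\mathbf{n}_j - n_{+j}\mathbf{p}) \xrightarrow[H_a]{\mathcal{D}} N(\mathbf{0}, \gamma_j(\theta) V), \]
where
\[ \gamma_j(\theta) = \lim_{n_{++}\to\infty,\ n_{+j}\to\infty} (n_{+j}\theta + 1). \tag{3.79} \]
We may note that by the construction of $\beta_i = n_{++}\phi_i$ for all i we assume
that $\theta = (\sum_{i=1}^{I} \beta_i)^{-1} = \theta_{n_{++}} = O(1/n_{++})$.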
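Hence using this result of Paul and Plackett, it is easy to see that
\[ \mathbf{Z}_j \xrightarrow[H_a]{\mathcal{D}} N(\mathbf{0}, \gamma_j(\theta) V) \tag{3.80} \]
and
\[ \mathbf{Z} \xrightarrow[H_a]{\mathcal{D}} N(\mathbf{0}, D_{\gamma_j} \otimes V), \tag{3.81} \]
where
\[ D_{\gamma_j} = \mathrm{diag}(\gamma_1(\theta), \ldots, \gamma_G(\theta)). \]
Thus by using (3.59) and (3.81) we obtain
\[ \hat{\mathbf{Z}} \xrightarrow[H_a]{\mathcal{D}} N(\mathbf{0}, Q \otimes V), \tag{3.82} \]
where
\[ Q = D_{\gamma_j} - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}' D_{\gamma_j} - D_{\gamma_j} \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}' + \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}' D_{\gamma_j} \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}', \]
i.e., Q is the symmetric matrix with (i,j)-th element
\[ Q_{ij} = \gamma_i \delta_{ij} - \sqrt{a_i a_j}\, \Big(\gamma_i + \gamma_j - \sum_{k=1}^{G} a_k \gamma_k\Big), \]
with $\delta_{ij}$ the Kronecker delta.
Now it becomes straightforward to show that
\[ X_P^2 \xrightarrow[H_a]{\mathcal{D}} \sum_{i=1}^{G} \delta_i \chi_i^2(I-1), \tag{3.83} \]
where $\{\delta_i: i=1,\ldots,G\}$ is a set of eigenvalues of Q, and
\[ T_k \xrightarrow[H_a]{\mathcal{D}} \sum_{i=1}^{G} \delta_i^* \chi_i^2(I-1), \tag{3.84} \]
where $\{\delta_i^*: i=1,\ldots,G\}$ is a set of eigenvalues of $D_{\sqrt{a_j}}\, Q\, D_{\sqrt{a_j}}$.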
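3.2.4 ARE of $X_P^2$ Relative to $T_k$
To summarize the relevant distribution results, we have derived
the following:
(a) $X_P^2 \xrightarrow[H_0]{\mathcal{D}} \chi^2[(I-1)(G-1)]$
(b) $T_k \xrightarrow[H_0]{\mathcal{D}} \sum_{i=1}^{G} \lambda_i \chi_i^2(I-1)$,
where the $\lambda_i$'s are eigenvalues of $D_{a_i} - \mathbf{A}\mathbf{A}'$.
(c) $X_P^2 \xrightarrow[H_a]{\mathcal{D}} \sum_{i=1}^{G} \delta_i \chi_i^2(I-1)$,
where the $\delta_i$'s are eigenvalues of Q.
(d) $T_k \xrightarrow[H_a]{\mathcal{D}} \sum_{i=1}^{G} \delta_i^* \chi_i^2(I-1)$,
where the $\delta_i^*$'s are eigenvalues of $D_{\sqrt{a_i}}\, Q\, D_{\sqrt{a_i}}$.
Thus it can be shown that
\[ \mathrm{Var}(X_P^2 \mid H_0) \to 2(I-1)(G-1), \tag{3.85} \]
\[ \mathrm{Var}(T_k \mid H_0) \to 2(I-1) \sum_{i=1}^{G} \lambda_i^2 = 2(I-1)\,\mathrm{trace}(D_{a_i} - \mathbf{A}\mathbf{A}')^2 = 2(I-1) \Big[ \sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + \Big(\sum_{j=1}^{G} a_j^2\Big)^2 \Big], \tag{3.86} \]
\[ \frac{\partial}{\partial\theta} E_\theta[X_P^2 \mid H_a]\Big|_{\theta=0} \to (I-1) \Big[ 1 - \sum_{j=1}^{G} a_j^2 \Big] = (I-1)\,\mathrm{trace}(D_{a_i} - \mathbf{A}\mathbf{A}'), \tag{3.87} \]
\[ \frac{\partial}{\partial\theta} E_\theta[T_k \mid H_a]\Big|_{\theta=0} \to (I-1) \Big[ \sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + \Big(\sum_{j=1}^{G} a_j^2\Big)^2 \Big] = (I-1)\,\mathrm{trace}(D_{a_i} - \mathbf{A}\mathbf{A}')^2. \tag{3.88} \]
Hence the asymptotic relative efficiency (ARE) of the chi-square
statistic $X_P^2$ relative to the $C(\alpha)$ statistic $T_k$ is given by
\[ e_{P|C} = \frac{(1 - \sum_{j=1}^{G} a_j^2)^2}{(G-1) \big[ \sum_{j=1}^{G} a_j^2 - 2\sum_{j=1}^{G} a_j^3 + (\sum_{j=1}^{G} a_j^2)^2 \big]} = \frac{(\sum_{i=1}^{G-1} \lambda_i)^2}{(G-1) \sum_{i=1}^{G-1} \lambda_i^2}, \tag{3.89} \]
where under $H_a$: $\theta = \theta_{n_{++}} = O(1/n_{++})$.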
Interestingly Collings and Margolin (1983) obtained the same
expression of an ARE as (3.89) when they compared a $C(\alpha)$ test with
another test for detecting a negative binomial departure from a Poisson
in the regression through the origin case. They proved the following:
Theorem 3.1: (Collings and Margolin, 1983)
\[ \frac{1}{G-1} \leq e_{P|C} \leq 1, \]
where the left equality holds if and only if G=2 and the right equality
holds if and only if the group sizes $\{n_{+j}\}_{j=1}^{G}$ are asymptotically
balanced.
Using the expression of ARE $e_{P|C}$ in (3.89) we can prove
Lemma 3.7: The $C(\alpha)$ test is asymptotically equivalent to Pearson's
chi-square test if and only if G=2 or all the group sizes $\{n_{+j}\}_{j=1}^{G}$ are
asymptotically balanced.
Proof. We may express $e_{P|C}$ as
\[ e_{P|C} = \bar{\lambda}^2 / (s_\lambda^2 + \bar{\lambda}^2), \]
where
\[ \bar{\lambda} = (G-1)^{-1} \sum_{i=1}^{G-1} \lambda_i, \quad \text{and} \quad s_\lambda^2 = (G-1)^{-1} \sum_{i=1}^{G-1} (\lambda_i - \bar{\lambda})^2. \]
Thus $e_{P|C} = 1$ if and only if $s_\lambda^2 = 0$. But $s_\lambda^2 = 0$ if and only if G=2 or
$\lambda_1 = \cdots = \lambda_{G-1}$, which is equivalent to $a_1 = \cdots = a_G$ (Light and Margolin,
1971, and Ronning, 1982). □
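The ARE can be computed from the group proportions alone, because the proof of Lemma 3.7 gives $e_{P|C} = (\sum\lambda_i)^2 / \{(G-1)\sum\lambda_i^2\}$ and both eigenvalue sums are traces with closed forms in the $a_j$. The sketch below assumes the observed proportions $n_{+j}/n_{++}$ in place of the limits $a_j$; the function name is hypothetical.

```python
def are_pearson_vs_c_alpha(group_sizes):
    """ARE of Pearson's X_P^2 relative to the C(alpha) statistic T_k.

    Uses e = (sum lambda)^2 / ((G - 1) * sum lambda^2), where the lambda_i are
    the eigenvalues of D_a - AA' and a_j is approximated by n_+j / n_++
    (an assumption of this sketch).
    """
    n = sum(group_sizes)
    G = len(group_sizes)
    a = [m / n for m in group_sizes]
    s2 = sum(x ** 2 for x in a)
    s3 = sum(x ** 3 for x in a)
    sum_lambda = 1.0 - s2                       # trace(D_a - AA')
    sum_lambda_sq = s2 - 2.0 * s3 + s2 ** 2     # trace of the squared matrix
    return sum_lambda ** 2 / ((G - 1) * sum_lambda_sq)


e_unbalanced = are_pearson_vs_c_alpha([20, 20, 20, 200, 400])
e_balanced = are_pearson_vs_c_alpha([100, 100, 100, 100, 100])
e_two_groups = are_pearson_vs_c_alpha([30, 500])
```

For the group sizes 20, 20, 20, 200, 400 this evaluates to about 0.4156, in line with the value 0.415 quoted for the Monte Carlo design of Section 3.2.5; the balanced and two-group cases both give 1, as Lemma 3.7 states.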
3.2.5 Monte Carlo Simulation: Power Comparison
As shown above, the test based on $T_k$ is superior to Pearson's
chi-square test based on considerations of asymptotic relative
efficiency; however, the large sample properties do not necessarily hold for
small samples, nor are the local properties of the asymptotic relative
efficiency readily transferable to practical situations. Therefore, a
Monte Carlo simulation was conducted to compare the performance of the
two tests in terms of their sizes and powers.
The data for the Monte Carlo simulation were generated on the VAX
780 computer system at the National Institute of Environmental Health
Sciences. The program was written in Fortran and used two IMSL
subroutines: GGAMR and GGMTN.
The following lemma is useful to generate random observations from
a Dirichlet distribution, say $D(\mathbf{p}, \theta)$.
Lemma 3.8 (Wilks, 1962): Let $X_1, X_2, \ldots, X_{k+1}$ be independent variables
having gamma distributions $G(1, \beta_1), G(1, \beta_2), \ldots, G(1, \beta_{k+1})$. Define
\[ Y_i = X_i \Big/ \Big( \sum_{j=1}^{k+1} X_j \Big) \]
for i=1,...,k.
Then $\mathbf{Y} = (Y_1, \ldots, Y_k)$ has a Dirichlet distribution $D(\boldsymbol{\beta})$, where
$\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{k+1})$, and $D(\boldsymbol{\beta})$ is defined in (1.16).
The Dirichlet distribution $D(\boldsymbol{\beta})$ can be reparametrized as $D(\mathbf{p}, \theta)$ by
(3.26). The Fortran program of the Monte Carlo simulation is outlined
as follows:
(i) Set $\mathbf{p} = \mathbf{p}_0$ and $\theta = \theta_0$, and initialize $\{n_{+j}\}_{j=1}^{G}$ and the
upper bound (upbound) of $\theta$.
(ii) Generate a set of G independent probability vectors
$\mathbf{u}_1, \ldots, \mathbf{u}_G$ from a Dirichlet distribution $D(\mathbf{p}, \theta)$ using IMSL subroutine
GGAMR and Lemma 3.8.
(iii) Generate a contingency table from a product of multinomial
distributions $\prod_{j=1}^{G} M(n_{+j}, \mathbf{u}_j)$ using IMSL subroutine GGMTN.
(iv) Calculate $T_k$ and $X_P^2$.
(v) Count the number of $T_k$ and $X_P^2$ values exceeding their cut-off
values corresponding to $\alpha = 0.05$.
(vi) Go to step (ii) and repeat 2,000 times.
(vii) Set $\theta = \theta_0 + \Delta$ and go to step (ii) until $\theta \geq$ upbound.
For the calculation of sizes of the $T_k$ test and the $X_P^2$ test a subset consisting
of (iii)-(vi) of the above program was employed, because putting $\theta = 0$
in step (i) involved division by zero in step (ii).
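Steps (ii) to (iv) of the outline can be sketched in Python using only the standard library. This is not the original Fortran/IMSL program: `random.gammavariate` stands in for GGAMR and `random.choices` for GGMTN, the gamma shapes are taken as $\beta_i = p_i/\theta$ (from $p_i = \beta_i/B$, $\theta = 1/B$), all function names are hypothetical, and the cut-off values of step (v) are left to the caller.

```python
import random


def dirichlet(p, theta, rng):
    """Lemma 3.8: normalise independent gamma variates with shapes p_i / theta.
    Requires theta > 0 (theta = 0 would divide by zero, as noted in the text)."""
    x = [rng.gammavariate(pi / theta, 1.0) for pi in p]
    total = sum(x)
    return [v / total for v in x]


def multinomial(n, probs, rng):
    """Draw one multinomial count vector of size n with cell probabilities probs."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[i] += 1
    return counts


def statistics(table):
    """Return (T_k, X_P^2); table[j] holds the I category counts of group j."""
    col = [sum(g) for g in table]            # group totals n_+j
    row = [sum(c) for c in zip(*table)]      # category totals n_i+
    n = sum(col)
    tk = xp = 0.0
    for g, nj in zip(table, col):
        # Pearson contribution of group j; empty categories are skipped.
        q = sum((c - nj * r / n) ** 2 / (nj * r / n)
                for c, r in zip(g, row) if r > 0)
        xp += q
        tk += nj / n * q
    return tk, xp


def simulate(p0, theta, sizes, reps, seed=0):
    """Steps (ii)-(iv) repeated reps times; returns a list of (T_k, X_P^2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        table = [multinomial(nj, dirichlet(p0, theta, rng), rng) for nj in sizes]
        out.append(statistics(table))
    return out


def rejection_rate(values, cutoff):
    """Step (v): fraction of simulated statistics exceeding a given cut-off."""
    return sum(v > cutoff for v in values) / len(values)


draws = simulate([0.05, 0.1, 0.4, 0.45], 0.005, [20, 20, 20, 200, 400], reps=50)
```

An empirical power at a given $\theta$ is then `rejection_rate([t for t, _ in draws], cutoff)` with the appropriate critical value; for the size calculation ($\theta = 0$) the Dirichlet step would be skipped and the table drawn with the fixed vector $\mathbf{p}_0$, as in the thesis.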
The actual program was run for two sets of $\mathbf{p}_0$ and $\theta$ ranges with the
same group sizes, which are listed in Table 3.2.

Table 3.2
Two Sets of Input Values of the Program.

               First Set                  Second Set
p_0            (0.05, 0.1, 0.4, 0.45)     (0.15, 0.2, 0.3, 0.35)
theta_0        0.001                      0.001
Delta          0.002                      0.003
Upbound        0.031                      0.025
Group sizes    20, 20, 20, 200, 400       same
The asymptotic relative efficiency of $X_P^2$ to $T_k$ is 0.415 for these group
sizes. Tables 3.3 and 3.4, respectively, display approximate power
functions of the $T_k$ and $X_P^2$ tests for a 0.05 level based on the first and
the second sets of input values. Over the ranges of $\theta$ values considered
the difference in powers can be as large as 0.086 for the first set of
input values and 0.115 for the second set. The ratio of the power of
the $X_P^2$ test to that of the $T_k$ test falls as low as 0.76 in both cases
considered. Clearly, the $T_k$ test can perform better than the $X_P^2$ test.
Table 3.3 Approximate Powers of T_k Test and X_P^2 Test for
0.05 Size and p_0 = (.05, .1, .40, .45)' and
{n_+j}, j=1,...,5 = {20, 20, 20, 200, 400}

           Approximate Power
theta      T_k       X_P^2     difference
0.000      0.0525    0.0505    0.0020
0.001      0.1060    0.0885    0.0175
0.003      0.2445    0.1700    0.0745
0.005      0.3545    0.2685    0.0860
0.007      0.4535    0.3680    0.0855
0.009      0.5255    0.4470    0.0785
0.011      0.5860    0.5175    0.0685
0.013      0.6385    0.5855    0.0530
0.015      0.7030    0.6480    0.0550
0.017      0.7165    0.6645    0.0520
0.019      0.7555    0.7270    0.0285
0.021      0.7805    0.7600    0.0205
0.023      0.7950    0.7925    0.0025
0.025      0.8110    0.8100    0.0010
0.027      0.8250    0.8175    0.0075
0.029      0.8465    0.8410    0.0055
0.031      0.8640    0.8610    0.0030
Table 3.4 Approximate Powers of T_k Test and X_P^2 Test for
0.05 Size and p_0 = (.15, .2, .3, .35)' and
{n_+j}, j=1,...,5 = {20, 20, 20, 200, 400}

           Approximate Power
theta      T_k       X_P^2     difference
0.000      0.0535    0.0480    0.0055
0.001      0.1050    0.0780    0.0270
0.004      0.2900    0.2095    0.0805
0.007      0.4790    0.3640    0.1150
0.010      0.5825    0.4885    0.0940
0.013      0.6460    0.5840    0.0620
0.016      0.7265    0.6865    0.0400
0.019      0.7640    0.7390    0.0250
0.022      0.7870    0.7815    0.0055
0.025      0.8440    0.8335    0.0105
3.2.6 Duality Between C and $T_k$
Light and Margolin (1971) developed a categorical analysis of
variance (Catanova) procedure for data in an IxG contingency table.
They demonstrated that the measure of variation due to Gini could be
used to develop a measure $R^2$ of explained variation, which in turn
could be viewed as a qualitative analog to the coefficient of
determination for continuous data.
Following Gini's definition of the total variation, Light and
Margolin defined the total sum of squares (TSS), the within-group sum
of squares (WSS) and the between-group sum of squares (BSS) for the
replicated one-way classification under study as follows:
\[ \mathrm{TSS}_{i \circ j} = \frac{n_{++}}{2} - \frac{1}{2n_{++}} \sum_{i=1}^{I} n_{i+}^2 \tag{3.90} \]
\[ \mathrm{WSS}_{i \circ j} = \frac{n_{++}}{2} - \frac{1}{2} \sum_{j=1}^{G} \frac{1}{n_{+j}} \sum_{i=1}^{I} n_{ij}^2 \tag{3.91} \]
\[ \mathrm{BSS}_{i \circ j} = \mathrm{TSS}_{i \circ j} - \mathrm{WSS}_{i \circ j}, \tag{3.92} \]
where the index $i \circ j$ indicates that the row variables are random and are
being predicted from the fixed column variable. Based on these components
a measure $R^2_{i \circ j}$ of the proportion of variation in the row variable
attributable to the column variable was proposed:
\[ R^2_{i \circ j} = \mathrm{BSS}_{i \circ j} / \mathrm{TSS}_{i \circ j}. \tag{3.93} \]
Later Margolin and Light (1974) observed that the $R^2_{i \circ j}$ measure of
association and $t_a$, the sample version of Goodman-Kruskal's $\tau_a$, were
computationally identical. This observation led them to provide a
means of testing in the product multinomial model the hypothesis that
$\tau_a$ was equal to zero, a test for which Goodman-Kruskal's asymptotic
distribution result (Goodman and Kruskal, 1959) was not applicable.
The test statistic was
\[ C_{i \circ j} = (n_{++}-1)(I-1) R^2_{i \circ j} \overset{\cdot}{\underset{H_0}{\sim}} \chi^2((I-1)(G-1)), \tag{3.94} \]
where $C_{i \circ j}$ is Light and Margolin's Catanova statistic and
'$\overset{\cdot}{\sim}$' is for 'is approximately distributed as'.
The $C(\alpha)$ statistic $T_k$ obtained in (3.42) can be rewritten as
\[ T_k = \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{+j}}{n_{i+}} \Big[ \frac{n_{ij}}{\sqrt{n_{+j}}} - \frac{n_{i+}}{n_{++}} \sqrt{n_{+j}} \Big]^2 = \sum_{i=1}^{I} \frac{1}{n_{i+}} \sum_{j=1}^{G} n_{ij}^2 - \frac{1}{n_{++}} \sum_{j=1}^{G} n_{+j}^2. \tag{3.95} \]
From (3.92) and (3.95) we observe that
\[ T_k = 2\,\mathrm{BSS}_{j \circ i}, \tag{3.96} \]
where $\mathrm{BSS}_{j \circ i}$ is obtained from $\mathrm{BSS}_{i \circ j}$ by systematic interchange of
columns and rows. As a corollary to the above relationship (3.96) we
have
Lemma 3.9: When the grouping variable has only two levels, i.e., G=2, the
$C(\alpha)$ test based on $T_k$ is equivalent to the chi-square test based on
$X_P^2$.
Proof. X~ and Tk statistics can be rewritten, respectively, as
(3.97)
(3.98)
If i and j are interchanged, the argument provided by Light and
Margolin (1971) yields the result that
\[ X_P^2 = \Big[ \frac{n_{++}^2}{2\, n_{+1}\, n_{+2}} \Big] T_k, \]
where G=2. □
Remark: This is stronger than one part of Lemma 3.7, i.e., the
asymptotic equivalence is now proven for all ratios of sample sizes.
Since
\[ \mathrm{TSS}_{j \circ i} = \frac{n_{++}}{2} - \frac{1}{2n_{++}} \sum_{j=1}^{G} n_{+j}^2 \tag{3.99} \]
is nonrandom, a test based on $T_k$ is equivalent to a test based on
$C_{j \circ i}$, i.e.,
\[ T_k \propto \mathrm{BSS}_{j \circ i} / \mathrm{TSS}_{j \circ i} \propto C_{j \circ i}. \tag{3.100} \]
It can be shown that $R^2_{j \circ i} = \mathrm{BSS}_{j \circ i} / \mathrm{TSS}_{j \circ i}$ is computationally equivalent
to $t_b$, an estimate of Goodman-Kruskal's $\tau_b$ (Goodman and Kruskal, 1954).
However, $R^2_{j \circ i}$ has a different operational interpretation from
$t_b$ (or $\tau_b$). $R^2_{j \circ i}$, or equivalently $T_k$, is based on the column-wise
product multinomial model.
A possible case in which $R^2_{j \circ i}$ can be interpreted as a measure of
association is discussed in Chapter 5.
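The duality $T_k = 2\,\mathrm{BSS}_{j \circ i}$ and the G=2 relation of Lemma 3.9 can be checked on a small table. The sketch below assumes the Gini-based sums of squares of Light and Margolin (1971) with the roles of rows and columns interchanged, and evaluates $T_k$ through the closed form on the right of (3.95); function names and the example counts are hypothetical.

```python
def totals(table):
    """Row totals n_i+, column totals n_+j, grand total n_++ for table[i][j]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)


def tk_closed_form(table):
    """T_k = sum_i (1/n_i+) sum_j n_ij^2 - (1/n_++) sum_j n_+j^2."""
    rows, cols, n = totals(table)
    return (sum(sum(x * x for x in r) / ri for r, ri in zip(table, rows))
            - sum(c * c for c in cols) / n)


def bss_interchanged(table):
    """BSS_{j o i} = TSS_{j o i} - WSS_{j o i}, with rows acting as the groups
    (assumed Light-Margolin definitions, rows and columns swapped)."""
    rows, cols, n = totals(table)
    tss = n / 2 - sum(c * c for c in cols) / (2 * n)
    wss = n / 2 - 0.5 * sum(sum(x * x for x in r) / ri
                            for r, ri in zip(table, rows))
    return tss - wss


def pearson_chi2(table):
    """Pearson's X_P^2 for the I x G table."""
    rows, cols, n = totals(table)
    return sum((table[i][j] - ri * cj / n) ** 2 / (ri * cj / n)
               for i, ri in enumerate(rows) for j, cj in enumerate(cols))


# Illustrative 3 x 2 table: I = 3 categories, G = 2 groups of sizes 20 and 30.
table = [[5, 9], [7, 3], [8, 18]]
_, (n1, n2), n = totals(table)
factor = n ** 2 / (2 * n1 * n2)     # constant linking X_P^2 and T_k when G = 2
```

Here `pearson_chi2(table)` equals `factor * tk_closed_form(table)` for any 2-group table, so the two tests are equivalent for all ratios of the two group sizes, which is the point of the Remark.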
Appendix I: Wisniewski-type Alternatives
The proof follows the arguments in Lehmann (1959) and Fraser (1957).
Let ¢(~) denote any test for detecting mixture departures from the
binomial distribution.
1. The power function 8¢(a) of any test is given by
under the Wisniewski-type general mixtures alternative (3.16). For a
fixed P 8¢(a) is continuous a = O. Hence any unbiased size-a test is
similar of size a.
2. L~=lXi is the complete sufficient statistic under HO. Thus any
similar test of size a has a Neyman structure with respect to I~=lXi.
3. As Potthoff and Whittinghill (1966,a) noted, a most powerful test
of Neyman structure is necessarily unbiased. Hence the locally most
powerful test of Neyman structure is necessarily LMPU.
Appendix II: The Dirichlet-Multinomial Alternatives
Under $H_0: \theta = 0$, $(n_{1+}, \ldots, n_{I-1,+})$ is the complete sufficient statistic
for the unknown probability vector $\mathbf{p}$. By conditioning on the
sufficient statistic we attempt to find the locally optimal test of
Neyman structure as $\theta \to 0$. Under $H_0$ the conditional likelihood of the
data $\mathbf{n} = (\mathbf{n}_1, \ldots, \mathbf{n}_G)$ given the sufficient statistic $(n_{1+}, \ldots, n_{I-1,+})$ has
a 'generalized multivariate hypergeometric distribution'¹, which is
given by
(Al)
For the development of the conditional likelihood under Ha , the follow
ing lemma is useful.
Lemma A.l Under Ha
n++ IPH {(nl+,···,nI_l +)} =( n) II
a nl+,···, 1+ i=l
(proof) .
n. +-11 e
II {P.+r ,,)r=O 1 U
n -1+~ (l+r~)r=O G
{A2)
Using the multi-urn (G urns) extension of the urn models with
stochastic replacements that generates a multivariate Polya-Eggenberger
distribution and noting the equivalence of the multivariate
Polya-Eggenberger distribution to the Dirichlet-multinomial
distribution, the result readily follows. □
¹This is a generalization of the multivariate hypergeometric distribution discussed in the context of urn models in Johnson and Kotz (1977).
Hence under $H_a$, using Lemma A.1, we obtain the conditional
likelihood of $\mathbf{n}$ given $(n_{1+}, \ldots, n_{I+})$ as
n .. -1lJII (P.+re)
r=O 1
G n+ j IPH {n \(nl +,··· ,n I+)} = A[ IT (n n) II
a j=l lj'"'' Ij i=l
n .-1+JII (l+re)],
r=O
where A is a quantity depending on the data only through the
sufficient statistic.
Now, with some algebra (A3) can be rewritten as
e I n.. {n .. -1) 2{1+ z[ I lJ lJ - n+J.(n+J.-l)]+O(e )}
i =1 Pi
(A3)
(A4)
105
{ [-I G n.. (n .. -1) G D 2 }
1+ eLL 1 J 1 J - In. (n . -1) +0 (e )z . 1 . 1 P. . 1 +J +J .1= J= 1 J=
Thus the LMP test of Neyman structure, if it exists, has critical
region based on the large values of the ratio (A4) to (Al), which is
{e -I ,.. n .. {n .. -lf~ 2 }A 1+ To '\. ,\\; - lJ lJ + o{e )+ constant~ L1=FJ=1 P.
- 1
(A5)
{ e 2 }= A 1+ 2 T1 + o{e ) + constant
where $T_1$ is given by (3.32).
The test criterion $T_1$, which is equivalent to $T_1^*$ in (3.36), involves
unspecified parameters $V^{-1}$. □
Remark: Even for the iid case, because of the multiparameter
$\mathbf{p} = (p_1, \ldots, p_{I-1})'$, the dependence on $V^{-1}$ in (A4) cannot be removed.
Thus the result of Wisniewski that there is a LMP test of Neyman
structure for Wisniewski-type general mixture alternatives does not
extend to the multi-dimensional generalization.
CHAPTER IV
BALANCED NESTED MIXED EFFECTS MODEL
4.1 Introduction
The one-way layout random effects model of Chapter III can be extended
within the framework of the Dirichlet-multinomial distribution to a
balanced nested mixed effects model in which the row variable has fixed
effects and the replications within each level of the row variable have
random effects. An example of a balanced nested mixed effects model for
discrete data may be obtained by modifying an example concerning anneals and
tinplates in Scheffé (1959, p. 178). While the tinplates are regarded as a
random sample from a large population, the anneals are not, the interest
being in individual performance of anneal treatments on a common number of
tinplates in terms of various levels of corrosion resistance. Now,
however, we consider a qualitative response with I levels instead of a
quantitative one.
Let $n_{ijk}$ be the number of observations that were classified into the
i-th level of response for the k-th replication within the j-th level of
the row variable for i=1,...,I, j=1,...,R and k=1,...,C. Because
the random effect is nested within the fixed effects, the $\{n_{ijk}\}$ do not
constitute a true three-dimensional contingency table. Nevertheless, the
data would probably be reported in the form of such a table and might be
analyzed via a Pearson chi-square test by an unthinking statistician. The
data might also be viewed as a three-dimensional table if there were an
attempt at blocking of experimental units, but in fact, the experimental
units were actually a source of random effects. We loosely refer to these
data as a three-dimensional contingency table. Denote the probability
vectors corresponding to the R levels of the row variable by $\boldsymbol{\pi}_1, \boldsymbol{\pi}_2, \ldots, \boldsymbol{\pi}_R$,
where $\boldsymbol{\pi}_j = (\pi_{1j}, \pi_{2j}, \ldots, \pi_{I-1,j})' \in S_{\boldsymbol{\pi}_j}$ for j=1,...,R and $S_{\boldsymbol{\pi}_j}$ is
defined in (1.13). Then by assuming that given $\boldsymbol{\pi}_j$ the k-th replication of
the j-th layer is determined by a Dirichlet-multinomial distribution
$DM(n_{+jk}, \boldsymbol{\pi}_j, \theta)$, the joint distribution of $\{n_{ijk}\}$ is given by
(4.1)
where $n_{+jk} = \sum_{i=1}^{I} n_{ijk}$ for j=1,...,R and k=1,...,C.
The full model (4.1) is specified by parameters $(\boldsymbol{\pi}_1, \ldots, \boldsymbol{\pi}_R, \theta)$.
Using this parametrization we consider the following hypotheses of
interest:
(i) No nested random effects
$H_r: \theta = 0$
(ii) No fixed row effects
$H_f: \boldsymbol{\pi}_1 = \boldsymbol{\pi}_2 = \cdots = \boldsymbol{\pi}_R$.
Discussions in the following sections consist of finding a suitable test
statistic and its null distribution in each of these hypothesis testing
problems.
4.2 Test of the Nested Random Effects
We test the existence of the nested random effects in the presence
of the fixed row effects, which are represented by distinct $\boldsymbol{\pi}_j$'s.
However, if there were no fixed effects, the arguments in section 3.2 could
be employed for this problem.
4.2.1 $C(\alpha)$ Test
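We define the following notation for j=1,...,R, k=1,...,C:
\[ D_j = \mathrm{diag}(\pi_{1j}, \pi_{2j}, \ldots, \pi_{I-1,j}) \tag{4.2} \]
\[ V_j = V_j(\boldsymbol{\pi}_j) = D_j - \boldsymbol{\pi}_j \boldsymbol{\pi}_j' \tag{4.3} \]
\[ \hat{V}_j = V_j(\hat{\boldsymbol{\pi}}_j) \tag{4.4} \]
\[ \mathbf{Z}_{jk} = \mathbf{Z}_{jk}(\boldsymbol{\pi}_j) = (n_{+jk})^{-1/2} (\mathbf{n}_{jk} - n_{+jk} \boldsymbol{\pi}_j) \tag{4.5} \]
\[ \hat{\boldsymbol{\pi}}_j = (n_{+j+})^{-1} \sum_{k=1}^{C} \mathbf{n}_{jk} \tag{4.6} \]
\[ \hat{\mathbf{Z}}_{jk} = \mathbf{Z}_{jk}(\hat{\boldsymbol{\pi}}_j) \tag{4.7} \]
\[ \sqrt{\hat{\mathbf{A}}_j}' = \Big( \sqrt{n_{+j1}/n_{+j+}}, \sqrt{n_{+j2}/n_{+j+}}, \ldots, \sqrt{n_{+jC}/n_{+j+}} \Big) \tag{4.8} \]
\[ a_{jk} = \lim n_{+jk}/n_{+j+} \quad \text{as } n_{+j+} \text{ tends to } \infty \tag{4.9} \]
\[ \hat{a}_{jk} = n_{+jk}/n_{+j+} \tag{4.10} \]
\[ \sqrt{\mathbf{A}_j} = (\sqrt{a_{j1}}, \sqrt{a_{j2}}, \ldots, \sqrt{a_{jC}})' \tag{4.11} \]
\[ \mathbf{A}_j = (a_{j1}, a_{j2}, \ldots, a_{jC})' \tag{4.12} \]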
(4.13)
(4.14)
where nT = n+++.
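Under the full model the joint probability of $\{n_{ijk}\}$ is given by
\[ \Pr(\{n_{ijk}\}) = \prod_{j=1}^{R} \prod_{k=1}^{C} \binom{n_{+jk}}{n_{1jk}, \ldots, n_{Ijk}} \frac{\prod_{i=1}^{I} \pi_{ij}(\pi_{ij}+\theta) \cdots [\pi_{ij} + (n_{ijk}-1)\theta]}{(1+\theta)(1+2\theta) \cdots [1 + (n_{+jk}-1)\theta]}. \tag{4.15} \]
Define $\ell(\theta) = \log \Pr(\{n_{ijk}\})$. In order to obtain a $C(\alpha)$ test statistic
we need the following derivatives evaluated at $\theta = 0$:
\[ \Psi^{(1)}(\{\boldsymbol{\pi}_j\}) = \frac{d\ell(\theta)}{d\theta}\Big|_{\theta=0} = \sum_{j=1}^{R} \sum_{k=1}^{C} \Big\{ \sum_{i=1}^{I} \frac{n_{ijk}(n_{ijk}-1)}{2\pi_{ij}} - \frac{n_{+jk}(n_{+jk}-1)}{2} \Big\}, \tag{4.16} \]
\[ \Psi^{(2)}_{ij}(\{\boldsymbol{\pi}_j\}) = \frac{\partial^2 \ell(\theta)}{\partial \pi_{ij}\, \partial\theta}\Big|_{\theta=0} = \sum_{k=1}^{C} \Big\{ -\frac{n_{ijk}(n_{ijk}-1)}{2\pi_{ij}^2} + \frac{n_{Ijk}(n_{Ijk}-1)}{2\pi_{Ij}^2} \Big\} \]
for i=1,...,I-1, j=1,...,R, where $n_{Ijk} = n_{+jk} - \sum_{i=1}^{I-1} n_{ijk}$. Since
$E[\Psi^{(2)}_{ij}] = 0$ for all (i,j) under $H_r: \theta = 0$, following Neyman (1959) the
hypothesis $H_r: \theta = 0$ can be tested based on the statistic $\Psi^{(1)}(\{\tilde{\boldsymbol{\pi}}_j\})$,
where $\tilde{\boldsymbol{\pi}}_j$ is a root-$n_{+j+}$ consistent estimator of $\boldsymbol{\pi}_j$. Substituting the
MLE $\hat{\boldsymbol{\pi}}_j$ we obtain a $C(\alpha)$ test statistic
\[ T(3) = n_T^{-1} \Big[ \sum_{j=1}^{R} \sum_{k=1}^{C} (\mathbf{n}_{jk} - n_{+jk}\hat{\boldsymbol{\pi}}_j)' \hat{V}_j^{-1} (\mathbf{n}_{jk} - n_{+jk}\hat{\boldsymbol{\pi}}_j) \Big]. \]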
Here we note the following representation of T(3);
(4.17)
where
(4.19)
(4.20)
is a C(a) statistic based on the j-th IxC contingency table for testing
Hr : e = O.
Denote Pearson's chi-square statistic based on the three-dimensional
IxRxC contingency table by X(3). Then by the additivity of chi-square
random variables
\[ X(3) = \sum_{j=1}^{R} X_j(2), \tag{4.21} \]
where $X_j(2)$ is Pearson's chi-square statistic based on the j-th IxC
contingency table.
We may note that the $C(\alpha)$ statistic T(3) based on a three-dimensional
contingency table is a weighted sum of the corresponding $C(\alpha)$ statistics
based on its two-dimensional contingency sub-tables, whereas the chi-square
statistic is a simple sum of chi-square statistics based on lower
dimensional contingency tables. The representation (4.19) of the $C(\alpha)$
statistic T(3) will be used to obtain the asymptotic relative efficiency of
X(3) to T(3) later in this section.
For notational convenience we define for j=1,...,R, k=1,...,C
(4.22)
(4.23)
(4.24)
(4.25)
(4.26)
(4.27)
112
o
and
o
A A "
W= W(V1
, ... ,VR)
(4.28)
(4.29)
Then after some algebra it can be shown that the C(a) statistic T(3) can
be expressed as
.e
(3) _ R C "*~ A_I "*T - \ '-I 'k-l Z'k V. Z'kLJ - L - ~J J ~J
A*~ A_I A*= ~(3) W ~(3)
Let
o
(4.30)
lo I
A>; (Dra: - D;a:: IA IA~) H I I-I iRk Rk ~
(4.31)
"Then by (4.9) and (4.13) and using ~j = ~j+Op(l) it can be verified that
(4.32)
e Since a multinomial random vector converges in distribution to a multivari-
113
ate normal distribution when ~j is constant and n+ jk tends to infinity
for k=l, ... ,C; j=l, ... ,R, ~(3) has a limiting distribution given by
(4.33)
as {n+jk} tends to infinity."*Thus by (4.32) and (4.33) we obtain the limiting distribution of ~(3)
as
(4.34)
"Let I = U"WU. Then using properties of Kronecker products and IAj IAj
= I~=l ajk=l, we can simplify I to
.e I =
o(4.35)
o
LHence from (4.30), (4.34) and (4.35) we can derive that
T(3) ~ ) I~~11)RC Ai X~(l)r
where {Ai; i=l, ... ,(I-l)RC} are eigenvalues of w-1I .
Let
o
(4.36)
G =
o(4.37)
114
Since W- 11 = G s 11_1' (4.36) is reduced to
(3) V \RC * 2T H) Li=1 Ai Xi (I-I) ,
r(4.38)
*where {A.; i=1, ... ,RC} are eigenvalues of G in (4.37) .1
Proceeding as before, we may approximate the asymptotic distribution of
T(3) by equating the first two moments of g*-lT(3) with those of a chi
*square random variable with h degrees of freedom. First we note that
and
.eThus the null distribution of T(3) can be approximated as
g*-lT(3) t i( h*) ,r
where
and
4.2.2 ARE of X(3) relative to T(3)
(4.40)
(4.41)
Employing previous arguments in section 3.2 and noting the represen
tations(4.19) and (4.21), the alternative distributions of x(3) and T(3)
can be easily obtained. Thus omitting much algebraic details we can de
rive the following;
(a) Var (x(3)\Hr ) ------~ 2(I-1)(C-l)R
(4.42)
_e
115
where G is defined in (4.37)
(c) JL E [x(3)\K JI ) (I-l)(l-L L b.a. 2k)
de e r e=O j k J J
(d) :e Ee[T(3)IKr J\ ~ (I-l)trace (G2) = (I-I) trace (G) •8=0
Thus the asymptotic relative efficiency e~lt of K(3) relative to T(3) is
given by
(3) _ (1-f ~~b_j_aj_~_)2 ---:-
epic - (C-l)R{L b. 2 [L a.~ - 2L a.~ + (Lk
aJ.~)2}j J k J k J
(I(C-l)R ;\*)2R,=1 R,=----=------
(C-l)R(Li~il)R ;\~2
*under Kr : 8=enT
=0(I/nT), where {;\R,; R,=l, ..• ,(C-l)R} is a set of eigen-
values of G in (4.37).
Now, as a straightforward extension of Theorem 3.1 we can obtain the
following theorem, given here without proof;
Theorem 4.1
where the left equality holds if and only if $C=2$ and $R=1$, and the right
equality holds if and only if the group sizes $\{n_{+jk}\}_{j=1,\,k=1}^{R,\,C}$ are
asymptotically balanced.
4.3 Test of Equality of the Fixed Row Effects
In testing the equality of the fixed row effects $H_f: \boldsymbol{\pi}_1 = \cdots = \boldsymbol{\pi}_R$ in
the pseudo I×R×C contingency table, two statistics deserve consideration:
the Wald statistic, say $W$, and Pearson's chi-square statistic $X^2$. It
can be seen that the generalized Wald statistic has a simple reference
distribution due to its construction, whereas Pearson's chi-square statistic
has a complicated reference distribution, as is shown in (3.83) in
the case of a one-way layout contingency table; this is because of the
underlying Dirichlet-multinomial distribution.
The comparison of these two statistics, $W$ and $X^2$, in small samples is not
practicable. Even in a large sample comparison, as Puri and Sen (1971) indicate,
a unique answer regarding the relative efficiency of $W$ relative to
$X^2$ may not be possible, since the alternative distributions of the two
statistics depend on more than one parameter. Hence we consider the simpler problem
of testing $H_f^*: \pi_1 = \pi_2$ in a 2×2×C contingency table and calculate the
asymptotic relative efficiency of $W$ relative to $X^2$ in an attempt to gain some
insight into the original problem of testing $H_f: \boldsymbol{\pi}_1 = \boldsymbol{\pi}_2 = \cdots = \boldsymbol{\pi}_R$.
Our product Dirichlet-multinomial model reduces to the product
multinomial model with the same probability vector when $\theta=0$. Hence,
when $\theta=0$, by aggregating the data along the random dimension we obtain
a two-dimensional contingency sub-table of sufficient statistics. Thus
when $\theta=0$, a test of $H_f: \boldsymbol{\pi}_1 = \cdots = \boldsymbol{\pi}_R$ should be based on the collapsed
two-dimensional contingency sub-table of sufficient statistics. In a
product Dirichlet-multinomial model, collapsing a three-dimensional
contingency table along the random dimension does not yield sufficient
statistics for $(\boldsymbol{\pi}_1, \ldots, \boldsymbol{\pi}_R)$. Thus a statistician may want to base his test on
the full three-dimensional contingency table under two possible cases:
(i) He decides that collapsing the pseudo three-dimensional contingency
table along the random dimension may incur loss of information on the
random effects, because collapsing does not yield sufficient statistics
when $\theta>0$.
(ii) He mistakenly treats the balanced nested mixed effects model as a
crossed mixed effects model and bases his test on the pseudo
three-dimensional contingency table.
The effect of employing a test procedure based on the full
three-dimensional contingency table in a product Dirichlet-multinomial model can
be investigated by comparing test procedures based on collapsed and
uncollapsed tables. In doing this we may suggest conditions
under which the test based on the collapsed table is asymptotically more
efficient than the test based on the uncollapsed table. In the remaining
discussion we refer to the tests based on the collapsed two-dimensional
table and the full uncollapsed three-dimensional table as test C
and test F, respectively.
4.3.1 Wald Statistic and Chi-Square Statistic
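and let $\boldsymbol{\pi}_n' = (\pi_{1n}, \pi_{2n})$ be a sequence of points in
Euclidean space $R^2$ of the form $\boldsymbol{\pi}_n = \boldsymbol{\pi}_0 + \boldsymbol{\delta}_n/\sqrt{n}$,
where $\lim_{n\to\infty} \boldsymbol{\delta}_n = \boldsymbol{\delta}$, and $\boldsymbol{\pi}_0$ and $\boldsymbol{\delta}$ are fixed points. In order to have only
one extra parameter under the alternative hypothesis we set $\boldsymbol{\delta}' = (\xi, 0)$.
Define $U(\boldsymbol{\pi}) = (\pi_1 - \pi_2)$, where $\boldsymbol{\pi}' = (\pi_1, \pi_2)$ is a point in $R^2$. Now under
the above formulation the null hypothesis $H_f^*: \pi_1 = \pi_2$
is understood as
$$\sqrt{n}\, U(\boldsymbol{\pi}_n) \to 0 \quad \text{as } n \to \infty,$$
and the alternative hypothesis, say $K_f^*$, is formulated as
$$K_f^*: \sqrt{n}\, U(\boldsymbol{\pi}_n) = \sqrt{n}\,(\pi_{1n} - \pi_{2n}) \to \xi,$$
where $n = n_T$. (See Stroud, 1971, and Shuster and Downing, 1976 for
recent development of the generalized Wald statistic and its applications.)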
We use the following notations throughout this section.
Yjk = nl~m~ n+jk/n+j++J+
CAo = Ao(8) = Lk lYoknok(8)J . J = J J
for j=I,2 and k=I, •.. ,C, and
1 0 n+1+ (1 1 0 n+2+)B = 1m -n--' -(3 = 1m --nT~ T nT~ nT
B" = (B, I-B)
n+1+For simplicity we sometimes denote Band Yjk for n
T
(4.43)
(4.44)
(4.45)
(4.46)
(4.47)
(4.48)
(4.49)
n ° kand ~, re-nT
spectively, where there is no confusion.
Let $W_c$ and $X_c^2$ be the generalized Wald statistic and Pearson's
chi-square statistic, respectively, based on the table collapsed along the
random dimension. We consider the asymptotic relative efficiency (ARE)
of $W_c$ to $X_c^2$. Denote
n11+ n12+ A A
~n= (- ,-) = (TI1n ,TI2n ) (4.50)
n+1+ n+2+
A" n1++ n2++TI = (- , -n-) (4.51)~O nTe T
Then based on the asymptotic normality of the beta-binomial random vari
able as was indicated in Paul and Plackett (1978), we have under the null
*hypothesis HfA-' V • N(2. ~o{1-~O} [a-1\1 0]) .rnr (~n-~O) *Hf o (1-S)-l A2
Thus
hiT U(;n)V ) N(~~ [S-l A1 + (1-S)-l A2 ] TIO(l-TIO))*Hf
andA _ n1++ _ 0 ( -t)~O - -nr- - p n
where n=nTHence
(4.52)
(4.53)
(4.54)
.eA A 2
nT(TIln-TI2n)W
c= -_...:....--=..:..:---::.:.:...._---
[S-IAl+(I-S)-lA2]TIO(1-TIO)(4.55)
Similarly, under the alternative hypothesis
(4.56)
because
lim TI (l-n ) = TIO(I-TIO) •n-+«> n n
Thus is follows that
(4.57)
(4.58)V 2( 2 {[ -1 ( -1* -+ XI, ~ / B A1+ I-S) A2]nO(l-TIO)}Hf
where x2(v,o) is a noncentral chi-square random variable with v degrees
of freedom and noncentrality parameter o.
Now, by using previous argument it can be shown that under the null
*hypothesis Hf
•
where ¢1 and ¢2 are eigenvalues of the matrix 12
~
12 = (I2-1[1[~) A (I2-1[1[~) ,
(4.59)
(4.60)
and xI(l,O) and X~(l,O) are independent.
Since the matrix 12 in (4.60) is singular it can be readily found that
one eigenvalue is equal to zero and the other one is equal to A1(1-B)+A2B.
Thus from (4.59) it follows that
(4.61)
•~ To find the asymptotic distribution of x~ under the alternative K; we pro
ceed as follows. We can derive
(4.62)
o
and
~G ::
A A
=
1-B -/B(l-B) ~I1+
(4.63)A A
-/B(l-B)
Now, because of the singular transform in (4.63) it can be seen that
z = IS- Z2+ 11-S 1+.
Hence Pearson's chi-square statistic becomes
where
"- V N((1-S)IB~, TIO(1-TIO)[A~1-S)2+A2S(1-8)]) .Z1+ *~
Kf
Hence
x2 V ~ [A1(1-B)+A28J x2(1, B(1-B)~
*c Kf TIo(1-TIO)[A1(l-S)+A2S]
(4.64)
(4.65)
(4.66)
(4.67)
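Based on the above results (4.55), (4.58), (4.61) and (4.67) it can
be seen that
$$\mathrm{Var}(W_c) \xrightarrow[K_f^*]{} 2, \qquad (4.68)$$
$$\frac{d}{d\xi^2} E[W_c]\Big|_{\xi=0} \xrightarrow[K_f^*]{} \frac{\beta(1-\beta)}{\pi_0(1-\pi_0)\,[\lambda_1(1-\beta)+\lambda_2\beta]}, \qquad (4.69)$$
$$\mathrm{Var}(X_c^2) \xrightarrow[K_f^*]{} 2\,[\lambda_1(1-\beta)+\lambda_2\beta]^2, \qquad (4.70)$$
$$\frac{d}{d\xi^2} E[X_c^2]\Big|_{\xi=0} \xrightarrow[K_f^*]{} \frac{\beta(1-\beta)}{\pi_0(1-\pi_0)}. \qquad (4.71)$$
Hence the asymptotic relative efficiency $e_{X_c^2,\,W_c}$ of $X_c^2$ relative to $W_c$ is equal to
1. This may be considered as an extension of the equivalence of these
two test procedures in a product multinomial model to a product
Dirichlet-multinomial model.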
4.3.2 ARE of a test F relative to a test C
Since the generalized Wald statistic is asymptotically equivalent to
the Pearson chi-square statistic for testing $H_f^*: \pi_1 = \pi_2$, the Wald
statistic, due to its simpler reference distribution, may be chosen for the
discussion of the ARE of a test F relative to a test C. We compare the
large sample behavior of the generalized Wald statistic based on the
collapsed table and the full uncollapsed table. Let $W_F$ be the generalized
Wald statistic based on the full uncollapsed table. Then $W_F$ becomes the
sum of the generalized Wald statistics on each of the 2×2 tables along the
random dimension.
We define for j=1,2 and k=l, ... ,C
.e A
TI jk = nljk/n+jk
A A
~k= (TI1k , TI2k )
Bk = nlim n++k/nT~
(4.72)
(4.73)
(4.74)
(4.75)
*Then under the alternative hypothesis Kf we can derive
Hence the generalized Wald statistic Wk based on the k-th 2x2 table is
(4.77)
and
(4.78)
Thus
Now by the standard method of asymptotic relative efficiency of WF rela
tive to W can be calculated asc 2
(8) =l~ Sknk(l-Yk) / s(l-sl JeW IW C 1 /-- ! ' ( 4•80 )
F c ,k=l Ykn2k(8)+(1-Yk)n1k(8) A1(1-B)+A2~L__
and we can prove
Theorem 4.3: When 8=0, i.e., under the product multinomial model
eW !W = limF c nrm
(4.81)
where equality holds if and only if {n+1k} is proportional to {n+2k } .
Proof. This can be readily proved by noting a theorem in Hardy, Littlewood
and Polya (1923, p.61, theorem 67), which states that
$$\sum_i (a_i+b_i) \sum_i \frac{a_i b_i}{a_i+b_i} < \sum_i a_i \sum_i b_i$$
unless $\{a_i\}$ and $\{b_i\}$ are proportional. □
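The inequality invoked in the proof can be checked numerically. The following is only an illustrative sketch, not part of the original derivation: it assumes the inequality has the form $\sum_i (a_i+b_i) \sum_i a_i b_i/(a_i+b_i) \le \sum_i a_i \sum_i b_i$, takes $a_k = n_{+1k}$ and $b_k = n_{+2k}$ from the hypothetical group sizes D1 and D3 used later in this section, and adds a proportional pair (an assumption introduced here) to show the equality case. The function name `milne_sides` is hypothetical.

```python
from fractions import Fraction


def milne_sides(a, b):
    """Return (lhs, rhs) of the inequality
    sum(a_i + b_i) * sum(a_i * b_i / (a_i + b_i)) <= sum(a_i) * sum(b_i),
    computed exactly with rational arithmetic."""
    lhs = sum(x + y for x, y in zip(a, b)) * sum(
        Fraction(x * y, x + y) for x, y in zip(a, b)
    )
    rhs = Fraction(sum(a) * sum(b))
    return lhs, rhs


cases = {
    "D1 (seriously unbalanced)": ([10, 20, 75], [80, 5, 40]),
    "D3 (nearly balanced)": ([30, 30, 40], [25, 35, 45]),
    "proportional (illustrative)": ([10, 20, 30], [20, 40, 60]),
}
for name, (a, b) in cases.items():
    lhs, rhs = milne_sides(a, b)
    # The ratio lhs/rhs is at most 1 and equals 1 only for proportional sizes.
    print(name, float(lhs / rhs))
```

The ratio falls further below 1 as the two sets of group sizes depart from proportionality, which is the mechanism behind the equality condition stated in Theorem 4.3.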
When $\theta>0$, we may have a reparametrization by noting that $\theta=\theta_n=O(1/n)$
in the passage to the limit (Paul and Plackett, 1978).
Thus we define
$$\phi_n = n\theta_n \qquad (4.82)$$
and
$$\lim_{n\to\infty} n\theta_n = \lim_{n\to\infty} \phi_n = \phi, \qquad (4.83)$$
where $\phi$ is some positive number. Then by using (4.82) and (4.83) the
(4.84)
Investigation of the formula (4.84) shows that the ARE depends on ¢ and
the group size ratios unless {n+1k} is proportional to {n+2k}. When
{n+1k} is proportional to {n+2k} it can be readily seen that the ARE of
WF relative to We is equal to t .The formula (4.84) of the ARE as a function of ¢ is based on the
$O(1/n)$ assumption of $\theta=\theta_n$. In practice, however, $\theta$ is determined by
nature, and hence is fixed. Thus in order to provide some suggestions
to the practical statistician for the choice between $W_F$ and $W_c$ in terms
of the ARE we may calculate the ARE of $W_F$ to $W_c$ as a function of $\theta$ for
different group sizes with the same group size ratios.
Indication of likely practical values of $\theta$ may be obtained from
past empirical studies on the beta-binomial distribution done by Skellam
(1948), Kemp and Kemp (1956), Chatfield and Goodhardt (1970), Williams
(1975), Feder (1978), and Segreti and Munson (1981), among others. Their
estimated values of $\theta$ and the total number of observations are shown in
Table 4.1.

Table 4.1 Approximate Range of $\hat{\theta}$, $n_T\hat{\theta}$ and the Type of Data

Author                    Total number of   $\hat{\theta}$   $n_T\hat{\theta}$   Type of Data
                          observations
Skellam                   337               0.095    27.37    Number of associations in chromosomes
Kemp and Kemp             200               0.171    34.25    Number of contacts with pins in 200 frames
                          200               0.126    25.35    (in the analysis of point quadrat data)
                          200               0.129    25.77
                          200               0.058    11.62
                          200               0.482    96.48
Chatfield and Goodhardt   50                0.320    16.02    Number r of weeks on which purchases of certain
                          474               1.279    606.14   item are made out of n weeks (r ≤ n)
Williams                  145               0.465    67.43    Number of pups survived in pregnant female rat
Feder                     524               0.073    38.25    Number of fetal deaths among total fetuses per litter
Segreti and Munson        40                0.681    27.24    Number of fetal deaths among total fetuses per litter
We consider a simple case when $C=3$ for the calculation of ARE's of (4.84)
as a function of $\theta$. Three sets of hypothetical group sizes, say $D_1$, $D_2$
and $D_3$, are considered, where
$$D_1 = \begin{bmatrix} n_{+11} & n_{+12} & n_{+13} \\ n_{+21} & n_{+22} & n_{+23} \end{bmatrix} = \begin{bmatrix} 10 & 20 & 75 \\ 80 & 5 & 40 \end{bmatrix}$$
is intended to refer to seriously unbalanced group sizes and
$$D_3 = \begin{bmatrix} 30 & 30 & 40 \\ 25 & 35 & 45 \end{bmatrix}$$
represents reasonable proximity to balanced group sizes.
$$D_2 = \begin{bmatrix} 20 & 15 & 50 \\ 35 & 60 & 20 \end{bmatrix}$$
is considered to represent imbalance somewhere between $D_1$ and $D_3$.
Group sizes are varied but group size ratios are maintained by multiplying
$D_1$, $D_2$ and $D_3$ by a constant $k \ge 1$. ARE's based on $D_1$, $D_2$
and $D_3$ are presented in Figures 4.1, 4.2, and 4.3, respectively.
From the calculation of ARE's we may note that when the group sizes
do not exhibit 'serious' unbalance the total sample size barely affects
the ARE, which is substantially below 1 (see Figures 4.2 and 4.3). If,
however, the group sizes show 'serious' unbalance, the total group sizes
can affect the ARE (see Figure 4.1). It may be concluded that for group
sizes that do not exhibit 'serious' unbalance the ARE of $W_F$ relative to
$W_c$ is less than 1 for a practical range of $\theta$ values. Thus, based on this
conclusion, we may point out that loss of efficiency is the effect of using
a test F procedure based on the full three-dimensional contingency
table in a product Dirichlet-multinomial model with practical $\theta$ values.
However, this must be used with caution in practice. The effect of the
unknown $\theta$ on the size of the test needs further study. In the remaining
section, we derive the form of $W_c$ in the general pseudo I×R×C table.

Figure 4.1 ARE of $W_F$ to $W_c$ based on $kD_1 = k\begin{pmatrix} 10 & 20 & 75 \\ 80 & 5 & 40 \end{pmatrix}$ for k=1,5,10
(vertical axis: ARE; horizontal axis: $\theta$; one curve for each k)

Figure 4.2 ARE of $W_F$ to $W_c$ based on $kD_2 = k\begin{pmatrix} 20 & 15 & 50 \\ 35 & 60 & 20 \end{pmatrix}$ for k=1,5,10
(vertical axis: ARE; horizontal axis: $\theta$; one curve for each k)

Figure 4.3 ARE of $W_F$ to $W_c$ based on $kD_3 = k\begin{pmatrix} 30 & 30 & 40 \\ 25 & 35 & 45 \end{pmatrix}$ for k=1,5,10
(vertical axis: ARE; horizontal axis: $\theta$)
4.3.3 Wald Statistic for testing the Equality of Fixed Effects.
From the results of the previous discussion it may be concluded that,
for most practical cases of the group size ratios and the practical
range of $\theta$ values, the generalized Wald statistic $W_c$ based on the
collapsed table appears asymptotically more efficient than the generalized
Wald statistic $W_F$ based on the full uncollapsed table. Hence in what
follows we employ $W_c$ to test the equality of the fixed row effects in
the general pseudo I×R×C contingency table. We construct a test
statistic together with its asymptotic null distribution.
We use the following notations for $j=1,\ldots,R$, $k=1,\ldots,C$.
(4.85)
Tf-" =-0
(4.86)
A-" 1Tf. = ----- (n1 ·+, n2 ·+,···, nI - 1J.+)-J n+j + J J
-" (-" -" -")~G = ~l' ~2""'~R
A A A
~G = (~1' ~2"'" ~R)
(4.87)
(4.88)
(4.89)
(4.90)
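$$H = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & -2 & 0 & \cdots & 0 \\ 1 & 1 & 1 & -3 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 1 & 1 & 1 & 1 & \cdots & -(R-1) \end{bmatrix}_{(R-1)\times R} \qquad (4.91)$$
$$U^* = H \otimes I_{I-1} \qquad (4.92)$$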
(4.93)
(4.94)
(4.95)
(4.96)
(4.97)
The null hypotheses Hf ~1=~2= ... =~R(=~O' say) can be restated as
*Hf : U ~G = ~ ( 4 •98 )
Now, based on the asymptotic normality of ~jk = (n+jk)-i(~jk-n+jk~j)
we can deri ve
v ~ N(O, 0 -1 (e) D Vo)Hf S. A.
J J
where
(4.99)
thus
U *" V * U*';n,:- ~G ----'---r~ N(~, U [0 -1 (e) D Vo] )Hf S· A.
J J
(4.100)
Using (4.92) the covariance matrix of ;n,:- U*~G in (4.100) can be sim
pl ified as
u*[O -1 (e) D VO]U*' = [HD -1 (e)w'] D VoB· A. B· A.J J J J
by the property of Kronecker product.
4.101 )
*Since U = H III 11_1 is of rank (R-1)(1-1), by the theorem inA A A
Shuster and Downing (1976) and noting Vo = VO+Op(l), where Vo = VO(~O)'
we obtain
v ) x2((1-1)(R-1)). (4.102)Hf
Chapter V
FURTHER RESEARCH
In this chapter, we list four topics that are related to the previous
chapters and deserve further research considerations:
(1) The uniqueness of the MLE $\hat{\theta}$ of a finite mixture of binomial
distributions.
(2) The likelihood ratio test of Ho : c=l vs. Ha : c=2, where c is the
number of components of a finite mixture of binomial distributions.
(3) The development of the TK statistic as a measure of association.
(4) The development of a nested pure random effects model for count
data.
5.1 THE FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS
Aside from the earlier development of some numerical algorithms that
provide an ML estimator of the mixing distribution in a finite mixture of
members of the exponential family, the fundamental properties such as the
existence, uniqueness and consistency of the ML estimator of the mixing
distribution have not been discussed until quite recently in the litera
ture. Simar (1977) presented an extensive examination of these properties
of the MLE in the case of a finite mixture of Poisson distributions, and
Jewell (1982) applied Simar's arguments to a finite mixture of exponential
distributions. Hill et al. (1980) considered these problems in an infinite
mixture of the form $h(t) = \sum_{k=1}^{\infty} p_k f_k(t)$ for known densities $f_k(t)$ that can
be found in the mixture of Poisson distributions, where $f_k(t) = e^{-\lambda t}(\lambda t)^k/k!$
for $\lambda>0$, $k=0,1,2,\ldots$. Lindsay (1983) provided a convex geometric
approach for the solutions of these problems in a finite mixture in general
when identifiability is not an issue. In fact all the families of
mixture models that have been considered for the investigations of these
fundamental properties of the MLE were either always identifiable or
assumed to be identifiable.
In this section we prove the existence of a MLE of a mixing distri
bution in a finite mixture of binomial distributions and indicate that
the MLE may not be unique.
Let $X_1, X_2, \ldots, X_t$ be a random sample from a finite mixture $h(x)$ of
binomial distributions with mixing distribution $G$, i.e.,
$$h(x) = \int_0^1 \binom{n}{x} p^x (1-p)^{n-x}\, dG(p), \qquad (5.1)$$
where $G \in G_c^*$, a class of all discrete distribution functions with at most
$c$ atoms.
Suppose the observation vector $(X_1, \ldots, X_t)$ has $k$ distinct points
$0 \le y_1 < y_2 < \cdots < y_k \le n$. Let $n_i$ be the number of $X_j$'s equal to $y_i$, $i=1,\ldots,k$.
The mixture model (5.1) is poorly specified unless it is identifiable.
Hence we assume $n \ge 2c-1$ to make the mixture model identifiable.
The log-likelihood function of $X_1, \ldots, X_t$ can be written as
$$L = \sum_{j=1}^{t} \log\left\{ \int_0^1 \binom{n}{x_j} p^{x_j} (1-p)^{n-x_j}\, dG(p) \right\} = \sum_{i=1}^{k} n_i \log \beta_i, \qquad (5.2)$$
where
$$\beta_i = \int_0^1 \binom{n}{y_i} p^{y_i} (1-p)^{n-y_i}\, dG(p) \qquad (5.3)$$
for $i=1,2,\ldots,k$.
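The two forms of the log-likelihood, one summed over observations and one grouped over the distinct observed values, can be compared numerically. The sketch below is illustrative only: it assumes the mixing distribution $G$ is discrete with given atoms and weights, and the function names, the sample `xs`, and the parameter values are all hypothetical.

```python
from collections import Counter
from math import comb, log


def mixture_pmf(y, n, atoms, weights):
    """Mixture probability of y successes out of n when G puts mass
    weights[l] on success probability atoms[l]."""
    return sum(
        w * comb(n, y) * p**y * (1 - p) ** (n - y) for p, w in zip(atoms, weights)
    )


def loglik_by_observation(xs, n, atoms, weights):
    """Log-likelihood as a sum over the t individual observations."""
    return sum(log(mixture_pmf(x, n, atoms, weights)) for x in xs)


def loglik_by_distinct_value(xs, n, atoms, weights):
    """Log-likelihood as a sum over distinct values y_i with multiplicities n_i."""
    counts = Counter(xs)  # maps each distinct y_i to its count n_i
    return sum(
        n_i * log(mixture_pmf(y_i, n, atoms, weights)) for y_i, n_i in counts.items()
    )


# Hypothetical data: t = 8 observations with n = 5 trials each, and a
# c = 2 component mixing distribution (so n >= 2c - 1 holds).
xs = [0, 1, 1, 2, 2, 2, 5, 5]
atoms, weights = [0.2, 0.7], [0.6, 0.4]
print(loglik_by_observation(xs, 5, atoms, weights))
print(loglik_by_distinct_value(xs, 5, atoms, weights))
```

The grouped form depends on $G$ only through the $k$ mixture probabilities at the distinct observed values, which is why the subsequent argument works with the set of such $k$-tuples.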
The equation (5.2) defines a many-to-one map $\phi$ from $G_c^*$ to a set $B$ of
$k$-tuples $(\beta_1, \ldots, \beta_k)$ in $R^k$ unless $k \ge c$. If $k \ge c$ then, due to the
identifiability condition, one and only one $G \in G_c^*$ is associated with a single point
in $B$; hence $\phi$ becomes a one-to-one map.
Let $\{G_\ell\}$ be a sequence of distribution functions in $G_c^*$. Since for
each $\ell \ge 1$ $G_\ell$ has support contained in $[0,1]$, $\{G_\ell\}$ is tight. Also by the
Helly-Bray lemma there is a subsequence $\{G_{\ell_k}\}$ that converges to a distribution
function $G^*$.
Lemma 5.1: $G^* \in G_c^*$.
Proof. It suffices to show that $G^*$ does not have $c+1$ atoms. Suppose on
the contrary that $G^*$ has $c+1$ atoms. Let the support points of $G_{\ell_k}$ and $G^*$ be
denoted by $x_{1(k)}, x_{2(k)}, \ldots, x_{c(k)}$ and $x_1, x_2, \ldots, x_c, x_{c+1}$, respectively.
Since $G_{\ell_k}(x)$ converges to $G^*(x)$ as $k$ goes to $\infty$ at all continuity points
of $G^*$, for sufficiently large $k$ we can choose $\varepsilon>0$ such that $x_j \in N_\varepsilon(x_{j(k)})$
for $j=1,\ldots,c$, and an extra support point $x_{c+1} \notin \bigcup_{j=1}^{c} N_\varepsilon(x_{j(k)})$, where $N_\varepsilon(y)$
denotes the $\varepsilon$-neighborhood of $y$. Without loss of generality we assume
$x_{1(k)}+\varepsilon < x_{c+1} < x_{2(k)}-\varepsilon$ for large $k$. Set $G^*(\{x_{c+1}\}) = a$. Hence, for each
verge to G*(xO) due to the jump size a. Thus a contradiction is obtained.o
Lemma 5.2: $B$ is compact.
Proof. The binomial mass function is a bounded and continuous function
of $p$. Hence by the Helly-Bray lemma, Lemma 5.1 and theorem 4.4.2 of Chung
(1968), every sequence of points of $B$ contains a subsequence converging
to a point of $B$. Consequently $B$ is compact. □
However, we note that $B$ is not convex due to the identifiability
condition $n \ge 2c-1$, which gives an upper bound of $c$ to $G_c^*$.
The likelihood function (5.2) is strictly concave on the compact set
$B$. Hence it has a unique maximum value, attained at some point(s) in $B$. But due to the
non-convexity of $B$ the point at which the likelihood function attains its
maximum may not be unique.
The investigation of sufficient conditions under which the likelihood
function attains its maximum at a unique point in B is proposed as further
research.
Another important problem in the finite mixture of binomials is that
the majority of estimation techniques assume that the number of components
of a mixture, which is c in our notation, is known a priori. However,
no really adequate test has been suggested for testing hypotheses
concerning $c$, even for the simple case of testing $H_0: c=1$ versus $H_a: c=2$.
Everitt and Hand (1981) noted that "this may be a consequence of the problem
rather than any lack of ingenuity."
We briefly describe the problems involved in the likelihood ratio
test for testing $H_0: c=1$ versus $H_a: c=2$ in the case of a mixture of two
binomial distributions. A mixture $h(x;\boldsymbol{\theta})$ of two binomial distributions
is represented as
$$h(x;\boldsymbol{\theta}) = \pi \binom{m}{x} p^x (1-p)^{m-x} + (1-\pi) \binom{m}{x} q^x (1-q)^{m-x}, \qquad (5.4)$$
where $m \ge 3$, and $\boldsymbol{\theta} = (\pi, p, q)$ is a parameter in the parameter space $\Omega$ in
which
$$\Omega = \omega_0 \cup \omega_1 \cup \omega_2 \cup \omega_3, \qquad (5.5)$$
where
$\omega_0 = \{(\pi,p,q): 0<\pi<1,\ 0<p<q<1\}$
$\omega_1 = \{(\pi,p,q): 0<\pi<1,\ 0<p=q<1\}$
$\omega_2 = \{(1,p,q): 0<p<q<1\}$
$\omega_3 = \{(0,p,q): 0<p<q<1\}$
The null hypothesis of no mixture $H_0: c=1$ is now equivalent to $H_1: \boldsymbol{\theta} \in \omega_1$,
$H_2: \boldsymbol{\theta} \in \omega_2$, or $H_3: \boldsymbol{\theta} \in \omega_3$. Here we may note that two non-standard
conditions exist in the parameter space $\Omega$. First, the parameter $\boldsymbol{\theta}$ under the
null hypothesis falls on the boundary of $\Omega$; hence the standard chi-square
distribution result of the likelihood ratio statistic $-2\log\Lambda$ does not
hold. Second, the null hypothesis region $\omega_1 \cup \omega_2 \cup \omega_3$ consists of a union
of hyperplanes of different dimensions.
Wilks' original result (Wilks, 1938) on the asymptotic distribution
of $-2\log\Lambda$ was generalized by Chernoff (1954) to the case where the
parameter falls on the boundary between the null hypothesis and alternative
hypothesis regions. Feder (1968) also investigated the asymptotic
distribution of $-2\log\Lambda$ when the parameter is near the boundary between the
null hypothesis and alternative hypothesis regions. Feder (1968) related
Chernoff's and his results to obtain the null and alternative distributions
of $-2\log\Lambda$ for testing $H_0: \theta<0$ vs. $H_a: \theta>0$ in the context of the beta-binomial
distribution described in (3.5), and observed that the asymptotic null
distribution of $-2\log\Lambda$ has a jump of 1/2 at the origin and a chi-square
distribution with 1 degree of freedom when $\Lambda>0$, i.e.,
$$-2\log\Lambda \xrightarrow[H_0]{V} (1/2)\, I(\Lambda=0) + (1/2)\,\chi^2(1)\, I(\Lambda>0),$$
where $I$ is an indicator function.
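A limiting null law of this kind, with mass 1/2 at the origin and the remaining 1/2 spread as a chi-square with 1 degree of freedom, changes how an observed value of the likelihood ratio statistic is converted into a p-value. The following sketch is only an illustration of that arithmetic under the stated 50:50 mixture; the function names are hypothetical, and it does not address the mixture-of-binomials problem for which the text notes the limiting law is unknown.

```python
from math import erfc, sqrt


def chi2_1_upper_tail(c):
    """P(chi-square with 1 d.f. > c) for c >= 0, via the normal tail:
    P(Z^2 > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2))."""
    return erfc(sqrt(c / 2.0))


def boundary_p_value(observed):
    """P(T >= observed) when T equals 0 with probability 1/2 and is
    chi-square(1) distributed with probability 1/2."""
    if observed <= 0:
        return 1.0
    return 0.5 * chi2_1_upper_tail(observed)


# Under the 50:50 mixture, a 5% test rejects when the statistic exceeds the
# 90th percentile of chi-square(1), about 2.7055, not the 95th, about 3.8415.
print(boundary_p_value(2.7055))
print(boundary_p_value(3.8415))
```

Using the ordinary chi-square(1) critical value in this setting would give a test of roughly half the nominal size.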
Quite recently, Symons et al. (1983) provided a Monte Carlo simulation
study of the distribution of $-2\log\Lambda$ for testing $H_0: c=1$ versus $H_a: c=2$
in a mixture of two Poisson distributions, and observed that the
distribution function of $-2\log\Lambda$ has a jump of 0.4 at the origin.
Even though Symons et al. (1983) consider a mixture of two Poissons,
they still have non-standard conditions in their parameter space
analogous to those stated earlier. The fact that their simulation study
supports certain aspects of Feder's results (i.e., a jump at the origin)
suggests the need for further research on the asymptotic distribution of
$-2\log\Lambda$ under the two non-standard conditions in the parameter space
mentioned earlier.
5.2 $T_K$ STATISTIC AS A MEASURE OF ASSOCIATION
We have discussed in section 3.2 that a measure of variation $R^2_{j\cdot i}$
could be constructed from the $C(\alpha)$ statistic $T_K$ by noting the duality
of the $C(\alpha)$ statistic $T_K$ and Light and Margolin's Catanova statistic.
Since $R^2_{j\cdot i}$ is computationally equivalent to Goodman-Kruskal's
$t_b$, an estimate of their $\tau_b$, the following properties of $R^2_{j\cdot i}$, or
equivalently $t_b$, are known (Goodman and Kruskal, 1954, 1963; Margolin and
Light, 1974):
i) If there exists $j$ such that $n_{+j}=n_{++}$, then $TSS_{j\cdot i}$ is equal to zero;
hence $R^2_{j\cdot i}$ is undefined.
ii) If there does not exist a $j$ such that $n_{+j}=n_{++}$ and if $n_{ij}=n_{i+}n_{+j}/n_{++}$ for
all pairs $(i,j)$, then $R^2_{j\cdot i}=0$.
iii) If there does not exist a $j$ such that $n_{+j}=n_{++}$ and if for each $i$
there exists a $j$ such that $n_{ij}=n_{i+}$, then $R^2_{j\cdot i}=1$.
iv) If none of (i), (ii), or (iii) occurs, then $0<R^2_{j\cdot i}<1$.
v) $R^2_{j\cdot i}$ is unchanged if all counts $\{n_{ij}\}$ are multiplied by the same
positive constant.
vi) $R^2_{j\cdot i}$ is asymmetric in its treatment of rows and columns of a
contingency table.
vii) $R^2_{j\cdot i}$ is invariant under the permutation of rows or columns of a
contingency table.
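Several of the listed properties can be verified on small tables. The sketch below is illustrative and rests on an assumption not spelled out in this section: it uses the standard Goodman-Kruskal form with rows $i$ as groups and columns $j$ as response categories, $TSS = n_{++}/2 - \sum_j n_{+j}^2/(2n_{++})$, $WSS = \sum_i [\,n_{i+}/2 - \sum_j n_{ij}^2/(2n_{i+})]$, and $R^2 = (TSS - WSS)/TSS$. The function name `r_squared` and the example tables are hypothetical.

```python
from fractions import Fraction


def r_squared(table):
    """Proportion of response variation explained by the row grouping,
    assuming rows are groups and columns are response categories.
    Exact rational arithmetic avoids rounding in the boundary cases."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    tss = Fraction(n, 2) - Fraction(sum(c * c for c in col_totals), 2 * n)
    if tss == 0:
        # Property (i): every observation falls in one response category.
        raise ValueError("R^2 is undefined when TSS is zero")
    wss = sum(
        Fraction(sum(row), 2) - Fraction(sum(x * x for x in row), 2 * sum(row))
        for row in table
        if sum(row) > 0
    )
    return (tss - wss) / tss


independent = [[10, 20], [20, 40]]   # n_ij = n_i+ n_+j / n_++   -> 0
perfect = [[5, 0], [0, 7]]           # each row in one column    -> 1
t = [[10, 0, 0], [0, 5, 5]]
t_transposed = [list(col) for col in zip(*t)]

print(r_squared(independent), r_squared(perfect))
print(r_squared(t), r_squared(t_transposed))  # differ: the measure is asymmetric
```

Swapping the roles of rows and columns changes the value, which is why the sampling model that fixes one margin matters for interpretation.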
Even though $R^2_{j\cdot i}$ is computationally equivalent to Goodman-Kruskal's
$t_b$, the two are derived under different sampling models. For Goodman and
Kruskal (1954), and Light and Margolin (1971), row margins are fixed group
sample sizes and columns represent the response from a fixed effects or
product-multinomial model. Here the column margins are fixed group sample
sizes and the rows represent the response from a random effects or product
Dirichlet-multinomial model.
A hypothetical example of a fixed group effects model in which
$R^2_{j\cdot i}$ can be used as a measure of association can be envisaged in the
following situation: suppose $n_A$, $n_B$, and $n_C$ represent the numbers of
patients with final diagnostic records A, B, and C, respectively, who
were initially classified into primary diagnostic records A', B', and C'
as in the following contingency table.

                 Final
Primary     A        B        C
A'          n11      n12      n13
B'          n21      n22      n23
C'          n31      n32      n33
Total       nA       nB       nC

Table (5.1)

One of the primary interests in this situation may lie in how much the
primary diagnostic records can account for the final diagnostic records.
The causal relation of interest goes from the row to the column, whereas
the data can be collected so that the probability model of the
contingency table is based on the column-wise product multinomial model,
i.e., $n_A$, $n_B$, and $n_C$ are fixed.
We feel further research needs to be done to study $R^2_{j\cdot i}$ as a measure
of association in the fixed and random group effects models.
5.3 THE NESTED RANDOM GROUP EFFECTS MODEL OF COUNT DATA
In chapter 4 we discussed a nested mixed effects model of count data
in which random effects are nested within fixed effects. A natural
extension of the nested mixed effects model would be the corresponding
nested pure random effects model within the framework of a
Dirichlet-multinomial distribution. By drawing an analogy to nested random
effects models in ANOVA we may explicitly specify the nested random
effects model for count data. Only the balanced case is discussed; here
the data can be presented in the form of an I×R×C contingency table,
where I, R, and C represent the number of response categories, row
categories, and column categories (nested within each row category),
respectively. Let $n_{ijk}$ be the number of subjects that are classified in
the $(i,j,k)$ cell, and let $\mathbf{n}_{jk} = (n_{1jk}, \ldots, n_{Ijk})'$
denote a response vector. '+' notations will be used for denoting the
sum of $n_{ijk}$'s over the corresponding indices.
Now, we may imagine that there is a Dirichlet population of row
categories, labeled by the parameter $(\boldsymbol{\pi}, \theta_1)$, from which the R levels $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_R$
of the row category are sampled. We next suppose that for each $\mathbf{p}_j$ there
exists another Dirichlet population distribution of C levels $\mathbf{p}_{j1}, \mathbf{p}_{j2}, \ldots, \mathbf{p}_{jC}$
of the column category and that $D(\mathbf{p}_j, \theta_2)$ is the population distribution
of $\mathbf{p}_{j1}, \mathbf{p}_{j2}, \ldots, \mathbf{p}_{jC}$ given $\mathbf{p}_j$. Similarly, given $(\mathbf{p}_j, \mathbf{p}_{jk})$ the response
vector $\mathbf{n}_{jk}$ can be conceived as a single observation from the multinomial
distribution $M(n_{+jk}, \mathbf{p}_{jk})$.
Using the conditioning arguments we may express the hierarchy of the
nesting as follows:
$$(\mathbf{p}_j \mid \boldsymbol{\pi}) \overset{iid}{\sim} D(\boldsymbol{\pi}, \theta_1)$$
$$(\mathbf{p}_{jk} \mid \mathbf{p}_j) \overset{iid}{\sim} D(\mathbf{p}_j, \theta_2)$$
$$(\mathbf{n}_{jk} \mid n_{+jk}, \mathbf{p}_j, \mathbf{p}_{jk}) \sim M(n_{+jk}, \mathbf{p}_{jk})$$
for $j=1,2,\ldots,R$ and $k=1,2,\ldots,C$, where $(x \mid a, b) \sim F$ is understood to mean that the
conditional distribution of $x$ given $a$ and $b$ follows the distribution $F$.
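The three-stage hierarchy can be simulated directly, which may help in checking intuitions about the nested model. The sketch below is illustrative and makes an assumption that is not fixed by this section: $D(\mathbf{m}, \theta)$ is taken to be a Dirichlet distribution with mean vector $\mathbf{m}$ and parameters $m_i/\theta$, so that larger $\theta$ means more dispersion. The function names and all numerical values are hypothetical.

```python
import random


def dirichlet(mean, theta, rng):
    """Draw from a Dirichlet with the given mean vector.
    Assumption: the Dirichlet parameters are mean_i / theta."""
    draws = [rng.gammavariate(m / theta, 1.0) for m in mean]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_nested_table(pi, theta1, theta2, R, C, n, seed=0):
    """Return table[j][k][i] = n_ijk for a balanced design with n subjects
    per (row, column-within-row) group."""
    rng = random.Random(seed)
    n_resp = len(pi)
    table = []
    for _ in range(R):
        p_j = dirichlet(pi, theta1, rng)          # row level: p_j | pi
        row = []
        for _ in range(C):
            p_jk = dirichlet(p_j, theta2, rng)    # column within row: p_jk | p_j
            cats = rng.choices(range(n_resp), weights=p_jk, k=n)
            row.append([cats.count(i) for i in range(n_resp)])  # multinomial counts
        table.append(row)
    return table


table = simulate_nested_table([0.5, 0.3, 0.2], theta1=0.1, theta2=0.05, R=4, C=3, n=50)
for j, row in enumerate(table):
    print(j, row)
```

Shrinking `theta1` or `theta2` toward zero concentrates the corresponding Dirichlet stage at its mean, which is the simulated counterpart of removing the row or column random effects.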
We define the following notation for convenience.
$$\boldsymbol{\pi}^* = (\pi_1, \ldots, \pi_{I-1})'$$
$$\mathbf{p}_j^* = (p_{1j}, \ldots, p_{I-1,j})'$$
$$\mathbf{p}_{jk}^* = (p_{1jk}, \ldots, p_{I-1,jk})'$$
$$\mathbf{n}_{jk}^* = (n_{1jk}, \ldots, n_{I-1,jk})'$$
Using the analogy to the nested random effects model of continuous data,
we may represent the mean response vector $n_{+jk}\mathbf{p}_{jk}^*$ in the k-th column
category within the j-th row category as
$$n_{+jk}\mathbf{p}_{jk}^* = n_{+jk}[\boldsymbol{\pi}^* + \mathbf{R}_j + \mathbf{C}_{jk}], \qquad (5.6)$$
where
$$\mathbf{R}_j = \mathbf{p}_j^* - \boldsymbol{\pi}^*, \qquad \mathbf{C}_{jk} = \mathbf{p}_{jk}^* - \mathbf{p}_j^*. \qquad (5.7)$$
The two random vectors $\mathbf{R}_j$ and $\mathbf{C}_{jk}$ have zero means by their construction
and variances given by
$$\mathrm{Var}(\mathbf{R}_j) = \frac{\theta_1}{\theta_1+1}\,[D_{\pi} - \boldsymbol{\pi}^*\boldsymbol{\pi}^{*\prime}], \qquad \mathrm{Var}(\mathbf{C}_{jk}) = \frac{\theta_2}{\theta_2+1}\,[D_{p_{ij}} - \mathbf{p}_j^*\mathbf{p}_j^{*\prime}], \qquad (5.8)$$
where $D_{\pi} = \mathrm{diag}(\pi_1, \ldots, \pi_{I-1})$ and $D_{p_{ij}}$ is similarly defined. We may note
further that the two random vectors $\mathbf{R}_j$ and $\mathbf{C}_{jk}$ are uncorrelated since
$$E[\mathbf{R}_j\mathbf{C}_{jk}' \mid \boldsymbol{\pi}^*, \mathbf{p}_j^*] = \mathbf{R}_j\, E[\mathbf{C}_{jk}' \mid \boldsymbol{\pi}^*, \mathbf{p}_j^*] = \mathbf{0},$$
hence
$$E[\mathbf{R}_j\mathbf{C}_{jk}'] = \mathbf{0}. \qquad (5.9)$$
Eqn. (5.6) resolves the mean response vector into parts which may be
regarded as the overall mean, row effects and column effects (within each
row category). Also by noting (5.8) the hypothesis of no row random
effects $H_0: \mathbf{R}_j = \mathbf{0}$ for all $j$ is equivalent to $H_R: \theta_1=0$, and similarly the
hypothesis of no column random effects (within each row category) is
equivalent to $H_C: \theta_2=0$.
Since the response vector $\mathbf{n}_{jk}$ is a random observation from $M(n_{+jk}, \mathbf{p}_{jk})$,
we may specify the nested random effects model of interest as
$$\mathbf{n}_{jk}^* = n_{+jk}[\boldsymbol{\pi}^* + \mathbf{R}_j + \mathbf{C}_{jk}] + \boldsymbol{\varepsilon}_{jk}, \qquad (5.10)$$
where $\boldsymbol{\varepsilon}_{jk}$ has mean vector $\mathbf{0}$ and covariance $n_{+jk}[D_{p_{ijk}} - \mathbf{p}_{jk}^*\mathbf{p}_{jk}^{*\prime}]$, and $\mathbf{R}_j$
and $\mathbf{C}_{jk}$ are independent, defined in (5.7), and have zero mean vectors and
variances as in (5.8). The model (5.10) is in complete analogy to the
nested random effects model of continuous data except for the normality
assumptions, which are not valid here.
The joint distribution of $\{n_{ijk}\}$ in a nested random effects model
is not in tractable form. Nevertheless, it is mildly encouraging that
one can specify the nested random effects model of count data in the
explicit form of (5.10). Even though we feel that the arguments of
chapter 4 may be similarly employed for the hypothesis tests $H_R: \theta_1=0$ and
$H_C: \theta_2=0$, no results have been obtained at this time.
BIBLIOGRAPHY
Altham, Patricia M. E. (1978). Two generalizations of the binomial distribution. Applied Statistics 27, 162-7.
Ames, Bruce N., McCann, Joyce and Yamasaki, Edith (1975). Methods for detecting carcinogens and mutagens with the salmonella/mammalian-microsome mutagenicity test. Mutation Research 31, 347-64.
Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies on Item Analysis and Prediction, H. Solomon (ed.), Stanford University, Stanford, California.
Blischke, W. R. (1962). Moment estimators for the parameters of a mixture of two binomial distributions. Annals of Mathematical Statistics 33, 444-54.
Blischke, W. R. (1964). Estimating the parameters of mixtures of binomial distributions. Journal of the American Statistical Association 59, 510-28.
Brier, Stephen S. (1980). Analysis of contingency table under cluster sampling. Biometrika 67, 591-6.
Chanda, K. C. (1954). A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41, 56-61.
Chandra, S. (1977). On the mixture of probability distributions. Scandinavian Journal of Statistics 4, 105-12.
Chatfield, C. and Goodhardt, G. J. (1970). The beta-binomial model for consumer purchasing behavior. Applied Statistics 19, 240-50.
Chatterji, S. D. (1963). Some elementary characterizations of the Poisson distribution. American Mathematical Monthly 70, 958-64.
Chernoff, Herman (1954). On the distribution of the likelihood ratio. Annals of Mathematical Statistics 25, 573-8.
Choi, K. and Bulgren, W. G. (1968). An estimation procedure for mixtures of distributions. Journal of the Royal Statistical Society B 30, 444-60.
Chung, Kai Lai (1974). A Course in Probability Theory. Academic Press, New York.
Collings, Bruce J. and Margolin, Barry H. (1983). Testing of fit for the Poisson assumption when observations are not identically distributed. Submitted for Journal of the American Statistical Association.
Cramer, Harald (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton.
Crowder, Martin J. (1978). Beta-binomial anova for proportions. Applied Statistics 27, 34-7.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1-38.
Deely, J. J. and Kruse, R. L. (1968). Construction of sequences estimating the mixing distribution. Annals of Mathematical Statistics 39, 286-88.
Efron, B. and Hinkley, D. V. (1978). The observed versus expected information. Biometrika 65, 581-90.
Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. Chapman and Hall, London.
Feder, Paul I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is "near" the boundaries of the hypothesis regions. Annals of Mathematical Statistics 39, 2044-55.
Feder, Paul I. (1978). The beta binomial likelihood ratio test, with application to the analysis of toxicological data. Unpublished Manuscript, National Institute of Environmental Health Sciences, Research Triangle Park.
Feller, W. (1943). On a general class of "contagious" distributions. Annals of Mathematical Statistics 14, 389-400.
Feller, William (1968). An Introduction to Probability Theory and Its Applications. John Wiley and Sons, New York.
Fienberg, Stephen E. (1975). Comment on "The observational study - a review" by Sonya McKinlay. Journal of the American Statistical Association 70, 521-3.
Fraser, D. A. S. (1957). Nonparametric Methods in Statistics. John Wiley and Sons, New York.
Greenwood, M. and Yule, G. U. (1920). An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society A 83, 255-79.
Griffiths, D. A. (1973). Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29, 637-48.
Goodman, Leo A. and Kruskal, William H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association 49, 732-64.
Goodman, Leo A. and Kruskal, William H. (1959). Measures of association for cross classifications. II: Further discussion and references. Journal of the American Statistical Association 54, 123-63.
Hardy, G. H., Littlewood, J. E. and Polya, G. (1964). Inequalities. Cambridge University Press, Cambridge.
Hasselblad, V. (1969). Estimation of finite mixtures of distributions from the exponential family. Journal of the American Statistical Association 64, 1459-71.
Haseman, J. K. and Kupper, L. L. (1979). An analysis of dichotomous response data from certain toxicological experiments. Biometrics 35, 281-93.
Haseman, J. K. and Soares, E. R. (1976). The distribution of fetal death in control mice and its implications on statistical tests for dominant lethal effects. Mutation Research 41, 277-88.
Hill, David L., Saunders, Roy and Laud, Purushottam W. (1980). Maximum likelihood estimation for mixtures. Canadian Journal of Statistics 8, 87-93.
Jewell, Nicholas P. (1982). Mixtures of exponential distributions. Annals of Statistics 10, 479-84.
Johnson, Norman L. and Kotz, Samuel (1969). Discrete Distributions. John Wiley and Sons, New York.
Johnson, Norman L. and Kotz, Samuel (1977). Urn Models and Their Application. John Wiley and Sons, New York.
Kabir, A. B. M. L. (1968). Estimation of parameters of a finite mixture of distributions. Journal of the Royal Statistical Society B 30, 472-82.
Kemp, C. D. and Kemp, Adrienne W. (1956). The analysis of point quadrat data. Australian Journal of Botany 4, 167-74.
Kiefer, Nicholas M. (1978). Discrete parameter variation: Efficient estimation of a switching regression model. Econometrica 46, 427-34.
Kupper, L. L. and Haseman, J. K. (1978). The use of a correlated binomial model for the analysis of certain toxicological experiments. Biometrics 34, 69-76.
Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association 73, 805-11.
Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley and Sons, New York.
Light, Richard J. and Margolin, Barry H. (1971). An analysis of variance for categorical data. Journal of the American Statistical Association 66, 534-44.
Lindsay, Bruce G. (1983). The geometry of mixture likelihoods: A general theory. Annals of Statistics 11, 86-94.
Louis, Thomas A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society B 44, 226-33.
Margolin, Barry H., Kaplan, Norman and Zeiger, Errol (1981). Statistical analysis of the Ames salmonella/microsome test. Proceedings of the National Academy of Sciences 78, 3779-83.
Margolin, Barry H. and Light, Richard J. (1974). An analysis of variance for categorical data, II: Small sample comparisons with chi square and other competitors. Journal of the American Statistical Association 69, 755-64.
Mosimann, James E. (1962). On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika 49, 65-82.
Moran, P. A. P. (1970). On asymptotically optimal tests of composite hypotheses. Biometrika 57, 47-55.
Neveu, J. (1965). Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
Neyman, J. (1947). Outline of statistical treatment of the problem of diagnosis. Public Health Reports 62, 1449-56.
Neyman, Jerzy (1959). Optimal asymptotic tests of composite hypotheses. Probability and Statistics: The Harald Cramer Volume, ed. Ulf Grenander. John Wiley and Sons, New York.
Orchard, T. and Woodbury, M. A. (1972). A missing information principle: Theory and application. Proceedings of Sixth Berkeley Symposium on Mathematical Statistics and Probability 1, 697-715.
Paul, S. R. and Plackett, R. L. (1978). Inference sensitivity for Poisson mixtures. Biometrika 65, 591-602.
Pearson, K. (1894). Contributions to the Mathematical Theory of Evolution. Philosophical Transactions of the Royal Society of London A 185, 71-110.
Pearson, K. (1915). On certain types of compound frequency distributions in which the components can be individually described by binomial series. Biometrika 11, 139-44.
Potthoff, Richard F. and Whittinghill, Maurice (1969 a). Testing for homogeneity I: The binomial and multinomial distributions. Biometrika 53, 167-82.
Potthoff, Richard F. and Whittinghill, Maurice (1969 b). Testing for homogeneity II: The Poisson distributions. Biometrika 53, 183-90.
Puri, Madan Lal and Sen, Pranab Kumar (1971). Nonparametric Methods in Multivariate Analysis. John Wiley and Sons, New York.
Rider, Paul R. (1961 a). The method of moments applied to a mixture of two exponential distributions. Annals of Mathematical Statistics 32, 143-7.
Rider, Paul R. (1961 b). Estimating the parameters of mixed Poisson, binomial and Weibull distributions by the method of moments. Bulletin of the International Statistical Institute 39 Part 2, 225-32.
Ronning, Gerd (1982). Characteristic values and triangular factorization of the covariance matrix for multinomial, Dirichlet and multivariate hypergeometric distributions and some related results. Statistische Hefte 23, 152-76.
Roy, S. N., Greenberg, B. G. and Sarhan, A. E. (1960). Evaluation of determinants, characteristic equations, and their roots for a class of patterned matrices. Journal of the Royal Statistical Society B 22, 348-59.
Segreti, Anthony C. and Munson, Albert E. (1981). Estimation of the median lethal dose when responses within a litter are correlated. Biometrics 37, 153-6.
Scheffe, Henry (1959). The Analysis of Variance. John Wiley and Sons, New York.
Shuster, J. J. and Downing, D. J. (1976). Two-way contingency tables for complex sampling schemes. Biometrika 63, 271-6.
Simar, Leopold (1976). Maximum likelihood estimation of a compound Poisson process. Annals of Statistics 4, 1200-9.
Skellam, J. G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society B 10, 257-61.
Stroud, T. W. F. (1971). On obtaining large-sample tests from asymptotically normal estimators. Annals of Mathematical Statistics 42, 1412-24.
'Student' (1907). On the error of counting with a hemacytometer. Biometrika 5, 351-60.
Sundberg, R. (1974). Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics 1, 49-58.
Symons, M. J., Grimson, R. C. and Yuan, Y. C. (1983). Clustering of rare events. Biometrics 39, 193-205.
Tarone, R. E. (1979). Testing the goodness of fit of the binomial distribution. Biometrika 66, 585-90.
Tarone, Robert E. and Gruenhage, Gary (1975). A note on the uniqueness of roots of the likelihood equations for vector-valued parameters. Journal of the American Statistical Association 70, 903-4.
Teicher, H. (1963). Identifiability of finite mixtures. Annals of Mathematical Statistics 34, 1265-9.
Wilks, S. S. (1938). The large sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics 9, 60-2.
Wilks, Samuel S. (1962). Mathematical Statistics. John Wiley and Sons, New York.
Williams, D. A. (1975). The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-52.
Wisniewski, T. K. M. (1968). Testing for homogeneity of a binomial series. Biometrika 55, 426-8.
Wolfe, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research 5, 329-50.
Wu, C. F. Jeff (1983). On the convergence property of the EM algorithm. Annals of Statistics 11, 95-103.