STUDIES OF MULTINOMIAL

MIXTURE MODELS

Byung Soo Kim

A Dissertation presented to the faculty of The University of North Carolina at Chapel Hill in partial fulfillment of the requirements of the degree of Doctor of Philosophy in the Department of Statistics

Chapel Hill

April 1984

Approved by:

Reader

BYUNG SOO KIM. Studies of Multinomial Mixture Models (Under the direction of Barry H. Margolin)

We investigate certain inferential aspects of mixtures of multi-

nomial distributions, both in nonparametric and parametric contexts.

As a nonparametric mixture model we propose a k-population finite mixture

of binomial distributions, which can be applied to the analysis of non-iid

data generated from a series of toxicological experiments. A necessary

and sufficient identifiability condition for the k-population finite mixture

of binomials is obtained. The maximum likelihood estimates (MLE's)

of the k-population finite mixture of binomials are computed via the EM

algorithm (Dempster, Laird and Rubin, 1977), and the asymptotic properties

of the MLE's are discussed. The identifiability condition is equivalent

to the positive definiteness of the information matrix for the parameters.

The MLE's and their sampling distributions, together with the data

mentioned above, provide an empirical check of the statistical procedures

proposed by Margolin, Kaplan and Zeiger (1981).

The Dirichlet-multinomial distribution, a parametric mixture of

multinomials, is discussed as a random group effects model for a one-way

layout contingency table. Interest focuses on testing the hypothesis of

no random effects. For this testing problem, Neyman's C(α) procedure

yields a new test statistic, which is asymptotically superior to Pearson's

chi-square test. This superiority is further evidenced by a Monte Carlo

simulation study. A duality between the C(α) statistic and the Catanova

statistic proposed by Light and Margolin (1971) is demonstrated.

The random effects model for the one-way layout contingency table is

extended within the framework of the Dirichlet-multinomial distribution

to a balanced nested mixed effects model, and two hypothesis testing

problems are investigated.

ACKNOWLEDGEMENTS

I wish to express my deepest gratitude to my research advisor,

Dr. Barry H. Margolin, for his suggestion of this topic and for his

guidance and encouragement throughout the duration of this research.

I also would like to thank my committee members, Dr. Norman

Johnson, Dr. Gordon Simons, Dr. Doug Kelly, and Dr. David Ruppert,

for their careful reading of the manuscript and many valuable comments.

The financial support from the Statistics Department has been

indispensable; without it my stay in Chapel Hill would not have been

possible. Thanks are also extended to Dr. David Hoel of the National

Institute of Environmental Health Sciences for providing computer

facilities. Credit is also due to Mrs. Judy Harrelson and Mr. K. Doug

Vass for their excellent typing job.

I am especially indebted to my parents at home in Korea, who have

been praying for the successful completion of my study in Chapel Hill.

Finally, I would like to thank my wife Myung Sook, son Stephen, and

mother-in-law for their understanding and support.

TABLE OF CONTENTS

CHAPTER I INTRODUCTION AND SUMMARY
  1.1 The Binomial Distribution
  1.2 Mixture Models of Count Data
  1.3 Scope of the Thesis
    1.3.1 The Finite Mixture of Binomial Distributions
    1.3.2 The Dirichlet-Multinomial Distribution
  1.4 Further Research

CHAPTER II FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS
  2.1 Identifiability Problem
    2.1.1 Preliminaries
    2.1.2 1-Population Finite Mixture of Multinomials
    2.1.3 k-Population Finite Mixture of Multinomials
  2.2 Estimation of the Mixing Distribution
    2.2.1 Preliminaries
    2.2.2 Maximum Likelihood Equations
    2.2.3 Asymptotic Distribution of the ML Estimator
  2.3 k-Population Finite Mixture of Binomials - Application
    2.3.1 Description of the Ames Test
    2.3.2 Statistical Analysis of the Experimental Data
    2.3.3 Further Analysis of the Derived Data: Mixture Model
      2.3.3.1 k-Population Mixture of Two Binomials
      2.3.3.2 Results of the Analysis
      2.3.3.3 Discussion of the Results

CHAPTER III MIXTURE OF MULTINOMIAL DISTRIBUTIONS
  3.1 Binomial Case
  3.2 Random Effects Model of One-Way Layout
    3.2.1 Dirichlet-Multinomial Model
    3.2.2 Test of the Random Effects
      3.2.2.1 Case of P Known
      3.2.2.2 Case of P Unknown
    3.2.3 Approximate Null and Alternative Distributions
      3.2.3.1 Approximate Null Distributions
      3.2.3.2 Approximate Alternative Distributions
    3.2.4 ARE of $X_p^2$ Relative to $T_k$
    3.2.5 Monte Carlo Simulation: Power Comparison
    3.2.6 Duality between C and $T_k$
  Appendix I: Wisniewski-type Alternatives
  Appendix II: The Dirichlet-Multinomial Alternatives

CHAPTER IV BALANCED NESTED MIXED EFFECTS MODEL
  4.1 Introduction
  4.2 Test of the Nested Random Effects
    4.2.1 C(α) Test
    4.2.2 ARE of X(3) Relative to T(3)
  4.3 Test of Equality of the Fixed Row Effects
    4.3.1 Wald Statistic and Chi-Square Statistic
    4.3.2 ARE of a Test F Relative to a Test C
    4.3.3 Wald Statistic for Testing the Equality of Fixed Effects

CHAPTER V FURTHER RESEARCH
  5.1 The Finite Mixture of Binomial Distributions
  5.2 $T_k$ Statistic as a Measure of Association
  5.3 The Nested Random Group Effects Model of Count Data

BIBLIOGRAPHY

CHAPTER I

INTRODUCTION AND SUMMARY

1.1 The Binomial Distribution

Among the class of discrete distributions the binomial distribution

is by far the most widely used for bounded count data, while the Poisson

distribution enjoys the same role for unbounded count data. Since the

binomial and Poisson distributions share certain important properties

and there exist several interesting relations between these two distri-

butions, we include the Poisson distribution in the context of our

discussion of the binomial distribution. To introduce notation, these

two distributions are formally defined as follows:

Definition 1.1: A random variable X has a binomial distribution with parameters n and p, denoted by $X \sim B(n,p)$, if

$$\Pr(X=x) = \binom{n}{x} p^{x} (1-p)^{n-x} \qquad (1.1)$$

for $x = 0,1,\ldots,n$, $0 < p < 1$, and $n \in I^{+}$, where $I^{+}$ is the set of positive integers. For notational convenience we write b(x;n,p) for the binomial mass function (1.1) and B(x;n,p) for the corresponding distribution function.

Definition 1.2: A random variable X has a Poisson distribution with parameter $\lambda$, denoted by $X \sim P(\lambda)$, if

$$\Pr(X=x) = e^{-\lambda}\lambda^{x}/x! \qquad (1.2)$$

for $x = 0,1,2,\ldots$, and $\lambda > 0$.

These two distributions possess a variety of desirable properties,

which, in part, account for their popularity. Both the Poisson family

$\{P(\lambda);\ \lambda > 0\}$ and the binomial family $\{B(n,p);\ n \in I^{+} \text{ and known},\ 0<p<1\}$ are one-parameter exponential families. Hence the ML estimator for the single parameter is a sufficient statistic and achieves the Cramér-Rao lower bound. If $X_1, X_2, \ldots, X_k$ are independent Poisson random variables such that $X_i \sim P(\lambda_i)$ for $i=1,\ldots,k$, then the conditional distribution of $X_i$ given $\sum_{j=1}^{k} X_j = n$ is $B(n, \lambda_i/\sum_{j=1}^{k}\lambda_j)$ for $i=1,2,\ldots,k$. For large n, the DeMoivre-Laplace limit theorem admits a normal approximation to the binomial distribution.
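The conditioning property can be checked numerically. The following Python sketch is an illustration only: the three Poisson means, the total n = 8, and the function names are assumptions chosen for the example. It enumerates the joint Poisson probabilities for k = 3 and compares the resulting conditional mass function with the binomial mass function (1.1).

```python
import math

def poisson_pmf(x, lam):
    """Poisson mass function (1.2)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def binom_pmf(x, n, p):
    """Binomial mass function (1.1)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def conditional_pmf(x, n, lams):
    """Pr(X_1 = x | X_1 + X_2 + X_3 = n) for independent Poissons, by enumeration."""
    joint = sum(poisson_pmf(x, lams[0]) * poisson_pmf(y, lams[1])
                * poisson_pmf(n - x - y, lams[2])
                for y in range(n - x + 1))
    # The sum of independent Poissons is Poisson with the summed mean.
    return joint / poisson_pmf(n, sum(lams))

lams, n = (1.5, 0.7, 2.8), 8            # hypothetical means and conditioning total
share = lams[0] / sum(lams)             # binomial success probability
max_diff = max(abs(conditional_pmf(x, n, lams) - binom_pmf(x, n, share))
               for x in range(n + 1))
```

Here `max_diff` should differ from zero only by floating-point rounding.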

There also exist several interesting relations between Poisson and

binomial distributions. Among them we may note a result due to Chatterji

(1963) that if X and Y are independent nonnegative integer-valued random

variables such that Pr(X=x) > 0 and Pr(Y=x) > 0 for x = 0,1,2,..., and

the conditional distribution of X given X + Y is binomial, then both X

and Y are Poisson random variables. Another relation that is usually

referred to as the 'Poisson approximation of a binomial' is cited as a

lemma from Feller (1968, p. 153).

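Lemma 1.1: Suppose that $\{X_n\}$ is a sequence of random variables such that $X_n \sim B(n,p_n)$ and $np_n \to \lambda$ as $n \to \infty$, where $0 < \lambda < \infty$; then

$$\Pr(X_n = x) \to e^{-\lambda}\lambda^{x}/x! \qquad (1.3)$$

for $x = 0,1,2,\ldots$ as $n \to \infty$.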
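As a numerical illustration of Lemma 1.1, the following Python sketch takes $p_n = \lambda/n$ and tracks the largest pointwise gap between the binomial and Poisson mass functions as n grows. The value $\lambda = 2$, the truncation at x = 20, and the function names are assumptions made for the example, not part of the lemma.

```python
import math

def binom_pmf(x, n, p):
    """Binomial mass function b(x; n, p) of (1.1)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson mass function of (1.2)."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def max_gap(n, lam, x_max=20):
    """Largest |b(x; n, lam/n) - Poisson(x; lam)| over x = 0, ..., min(n, x_max)."""
    p_n = lam / n
    return max(abs(binom_pmf(x, n, p_n) - poisson_pmf(x, lam))
               for x in range(min(n, x_max) + 1))

# The gaps should shrink as n increases with n * p_n held at lam.
gaps = [max_gap(n, lam=2.0) for n in (10, 100, 1000, 10000)]
```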

From the standpoint of this thesis, the most important property

shared by the binomial and Poisson distributions is that these two

distributions belong to a class of discrete probability distributions

that admit mathematically tractable mixture generalizations.

1.2 Mixture Models for Count Data

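Definition 1.3: Let $F(\mathbf{x};\boldsymbol{\theta})$ be a d-dimensional distribution function indexed by an m-dimensional parameter vector $\boldsymbol{\theta}$ in a parameter space $\Theta$ and let $G(\boldsymbol{\theta})$ be an m-dimensional distribution function. Then

$$H(\mathbf{x}) = \int_{\Theta} F(\mathbf{x};\boldsymbol{\theta})\, dG(\boldsymbol{\theta}) \qquad (1.4)$$

is called a G mixture of F or simply a mixture, and F and G are referred to as the kernel and the mixing distribution, respectively. If the kernel in (1.4) is expressed in terms of a density function $f(\mathbf{x};\boldsymbol{\theta})$, then

$$h(\mathbf{x}) = \int_{\Theta} f(\mathbf{x};\boldsymbol{\theta})\, dG(\boldsymbol{\theta}) \qquad (1.5)$$

is called a mixture density when (1.5) exists.

If the mixing distribution G is discrete with finite support, then we call the resulting mixture a finite mixture. Using the notation in Johnson and Kotz (1969) we may write $H(\mathbf{x}) = F(\mathbf{x};\boldsymbol{\theta}) \underset{\boldsymbol{\theta}}{\wedge} G(\boldsymbol{\theta})$ for (1.4) and $h(\mathbf{x}) = f(\mathbf{x};\boldsymbol{\theta}) \underset{\boldsymbol{\theta}}{\wedge} G(\boldsymbol{\theta})$ for (1.5). The following definitions are included for notational purposes.

Definition 1.4: A random variable X has a gamma distribution with parameters $\lambda > 0$ and $r > 0$, denoted by $X \sim G(\lambda,r)$, if it has a probability density function (p.d.f.) $f_G(x)$ given by

$$f_G(x) = \frac{\lambda^{r}}{\Gamma(r)}\, x^{r-1} e^{-\lambda x} \qquad (1.6)$$

for $x > 0$, where

$$\Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx \quad \text{for } t > 0. \qquad (1.7)$$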

Definition 1.5: A random variable X has a negative binomial distribution with parameters m and c, denoted by $X \sim NB(m,c)$, if

$$\Pr(X=x) = \frac{\Gamma(x+c^{-1})}{x!\,\Gamma(c^{-1})} \left(\frac{cm}{1+cm}\right)^{x} \left(\frac{1}{1+cm}\right)^{c^{-1}} \qquad (1.8)$$

for $x = 0,1,2,\ldots$, $0 < m < \infty$, and $0 \le c < \infty$; values for c = 0 are understood to be evaluated in the limit as $c \to 0$.

It is an easy exercise to show that a negative binomial distribution can be obtained as a gamma mixture of Poissons, that is,

$$NB(m,c) = P(\theta) \underset{\theta}{\wedge} G((mc)^{-1}, c^{-1}). \qquad (1.9)$$
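The identity (1.9) can be checked numerically. The Python sketch below is an illustration under assumed values (m = 3, c = 0.5, a midpoint rule on [0, 60]; the function names are hypothetical): it integrates the Poisson mass function against the gamma density with rate $(mc)^{-1}$ and shape $c^{-1}$ and compares the result with the negative binomial mass function (1.8).

```python
import math

def nb_pmf(x, m, c):
    """Negative binomial mass function (1.8), evaluated on the log scale."""
    r = 1.0 / c
    log_p = (math.lgamma(x + r) - math.lgamma(x + 1) - math.lgamma(r)
             + x * math.log(c * m / (1.0 + c * m))
             - r * math.log(1.0 + c * m))
    return math.exp(log_p)

def gamma_poisson_pmf(x, m, c, upper=60.0, steps=40000):
    """Midpoint-rule approximation to the gamma mixture of Poissons in (1.9)."""
    lam, r = 1.0 / (m * c), 1.0 / c      # gamma rate and shape, G((mc)^-1, c^-1)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h                # midpoint of the k-th subinterval
        log_pois = -t + x * math.log(t) - math.lgamma(x + 1)
        log_gam = (r * math.log(lam) - math.lgamma(r)
                   + (r - 1.0) * math.log(t) - lam * t)
        total += math.exp(log_pois + log_gam)
    return total * h

m, c = 3.0, 0.5                          # assumed values for illustration
max_diff = max(abs(nb_pmf(x, m, c) - gamma_poisson_pmf(x, m, c))
               for x in range(15))
```

The quadrature is crude but adequate here because c = 0.5 gives shape 2, so the integrand has no singularity at the origin; smaller shapes would need a more careful rule.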

Mixture models for count data have been formulated ever since the

limitations of the binomial and Poisson distributions to fit data were

first noted. In the binomial distribution the independence of n

Bernoulli trials and the constancy of the success probability p through-

out the n trials may be suspect in specific applications. For instance,

as is observed in Haseman and Kupper (1978), in certain animal studies

to investigate the toxicological effect of a compound there is a

tendency for implants from the same litter to respond more alike than

implants from different litters. That is, the litter-specific success

probability varies from litter to litter. Therefore, the independence

among implants in the same litter may not be maintained.

For the Poisson distribution the equality of mean and variance

places an important restriction on the applicability of the model in

practice. Thus, for example, Paul and Plackett (1978), and Margolin,

Kaplan and Zeiger (1981) study the effects of non-Poisson distributed

random variables, specifically negative binomial random variables, on

certain aspects of inference for the Poisson based model.

These limitations of the binomial or Poisson distributions are most

evident when there is clear 'heterogeneity' of count data; this in turn

has led to various considerations of alternatives or generalizations of

those two important count data distributions, foremost among which have

been mixture formulations.

As early as 1915, K. Pearson (1915), having noted the 'heterogeneity'

of the data, considered a mixture of two binomial distributions

as a model for the counts on yeast cells analyzed by Student (1907).

Greenwood and Yule (1920) considered accident data and found that a

gamma mixture of Poissons, which as noted is a negative binomial

distribution, gave a closer fit to the data than a single Poisson

distribution. In the same vein Skellam (1948) introduced a beta-bino-

mial distribution in the form of a beta mixture of binomial distribu-

tions, after observing that the association probabilities varied from

nucleus to nucleus in the analysis of the secondary association of

chromosomes in Brassica.

As the above historical examples of mixture distributions indicate,

there have been two distinct subclasses of mixture models. A parametric

mixture is defined to be a mixture in which the mixing distribution has

a specific functional form, whereas in a nonparametric mixture the

mixing distribution does not have a specific functional form. The

mixture of two binomials is an example of a nonparametric mixture and

the negative binomial and the beta-binomial are examples of parametric

mixtures (Johnson and Kotz, 1969).

The flexibility and generality of a mixture model for count data

are gained at the expense of simplicity and the attractive properties

of the binomial or the Poisson model. This is well illustrated in the

search for maximum likelihood estimates (MLE) of parameters from a mix-

ture model of count data; in most cases this involves iterative solu-

tion of a set of equations rather than a closed form solution.

1.3 SCOPE OF THE THESIS

This thesis investigates certain inferential aspects of mixtures of

multinomial distributions, both in nonparametric and parametric mixture

models. In Chapter 2 a class of finite mixtures of binomial distributions

is proposed to model non-iid data generated from certain important

toxicological experiments, and the resultant implications are investigated.

Chapter 3 combines studies of the goodness of fit test of the

binomial distribution against parametric mixture alternatives and the

development of a random effects model for count data in a one-way layout

contingency table by employing a Dirichlet mixture of multinomial

distributions. In Chapter 4 we discuss a balanced nested mixed effects

model based on a Dirichlet mixture of multinomial distributions.

Finally in Chapter 5 we suggest several problems for further research.

1.3.1 The Finite Mixture of Binomial Distributions

Besides K. Pearson's early attempt to fit Student's counts with a

mixture of two binomials (Pearson, 1915), one other application of this

approach is noteworthy. Neyman (1947) developed a finite mixture of

binomial distributions for the analysis of roentgenographic reading

results of tuberculosis tests. A set of four chest films of different

sizes was taken for each subject and each film was interpreted

independently by five expert radiologists. Neyman decomposed the patient

population into three categories: (i) entirely free from the disease,

(ii) moderately affected, and (iii) heavily affected; and associated

1-τ, p, and 1 as the probabilities of correct diagnosis for the components

(i), (ii), and (iii), respectively. Hence the number of positive

results among five independent reader diagnoses for a particular film

follows a mixture of three binomial distributions, or a mixture of two

binomials if the component (iii) is entirely dropped from the model.

However, as Neyman (1947) indicated, "in reality we may expect that

the subdivision of human population is much finer than is postulated

here and that the category of 'moderately affected' splits into a

continuous graduation of the intensity of the illness, from very slight

to very heavy ..." Thus a 'finite' mixture may be regarded as a

'simplification', which appears to have been the primary motivation for

introducing the finite mixture of binomial distributions.

There is, however, truly a need for a finite mixture of two bino-

mials in the context of imperfect testing of various materials for

positive or negative evidence of a specified characteristic that is

either present or absent. Let $\Phi$ be a set of r materials that have been tested and let the i-th subset of $\Phi$ consist of $r_i$ materials, indexed by (i,j), $j=1,\ldots,r_i$, each of which has been tested $n_i$ times for $i = 1,2,\ldots,k$. Clearly $\sum_{i=1}^{k} r_i = r$. Let $\tau$ and $(1-p)$ denote the probabilities of a false positive and a false negative, respectively, and let $\pi$ be the prevalence rate of the characteristic in question. We assume independence among all $\sum_{i=1}^{k} n_i r_i$ tests. For the i-th subset of $r_i$ materials, each of which has been tested $n_i$ times, we can summarize the observed data in a random vector $\mathbf{X}_i = (X_{i1}, X_{i2}, \ldots, X_{ir_i})$, where $X_{ij}$ is the number of positive findings in the $n_i$ tests with material (i,j). Under our assumptions, the i-th observation vector $\mathbf{X}_i$ is now a random sample from a mixture of two binomial distributions, that is,

$$X_{i1}, X_{i2}, \ldots, X_{ir_i} \overset{\text{iid}}{\sim} \pi B(n_i,p) + (1-\pi) B(n_i,\tau) \qquad (1.10)$$

for $i=1,\ldots,k$. We note that $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_k$, the totality of observations, do not constitute an independent and identically distributed (iid) data set unless the $n_i$'s are all equal.
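The sampling scheme behind (1.10) can be sketched in Python as follows. This is an illustration only: the design (three populations of 2000 materials tested 1, 3 and 5 times), the parameter values (prevalence 0.3, p = 0.9, false positive rate 0.05), and the function name are assumptions, not values from the experiments analyzed later.

```python
import random

def simulate_population(r_i, n_i, prevalence, p, tau, rng):
    """Draw X_i1, ..., X_ir_i from prevalence*B(n_i, p) + (1 - prevalence)*B(n_i, tau)."""
    counts = []
    for _ in range(r_i):
        has_trait = rng.random() < prevalence     # latent status of the material
        hit_prob = p if has_trait else tau         # chance that a single test is positive
        counts.append(sum(rng.random() < hit_prob for _ in range(n_i)))
    return counts

rng = random.Random(1984)
design = [(2000, 1), (2000, 3), (2000, 5)]         # hypothetical (r_i, n_i) pairs, k = 3
data = [simulate_population(r_i, n_i, 0.3, 0.9, 0.05, rng) for r_i, n_i in design]

# Per-test positive rate in each population; each estimates pi*p + (1 - pi)*tau.
positive_rates = [sum(x) / (len(x) * n_i) for x, (_, n_i) in zip(data, design)]
```

Every population shares the same per-test positive rate, 0.3(0.9) + 0.7(0.05) = 0.305, yet the counts in different populations follow different distributions because the $n_i$ differ.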

Even though the mixture model (1.10) is the primary motivation for

the research to be discussed, we note the generalization of (1.10) to

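mixtures of c binomial distributions, where $c \ge 2$ is known a priori. Define $\mathcal{G}_c$ to be a class of all discrete distribution functions with c atoms. Then we have

$$\begin{aligned}
X_{11}, \ldots, X_{1r_1} &\overset{\text{iid}}{\sim} B(n_1,p) \underset{p}{\wedge} G(p) \\
X_{21}, \ldots, X_{2r_2} &\overset{\text{iid}}{\sim} B(n_2,p) \underset{p}{\wedge} G(p) \\
&\ \ \vdots \\
X_{k1}, \ldots, X_{kr_k} &\overset{\text{iid}}{\sim} B(n_k,p) \underset{p}{\wedge} G(p),
\end{aligned} \qquad (1.11)$$

where $G \in \mathcal{G}_c$.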

Equation (1.10) is then the special case of (1.11) where c = 2. We call

the mixture model (1.11) a k-population finite mixture of binomial

distributions and define the parameter space of interest to be

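$$\Theta = \{\boldsymbol{\theta} = (\pi_1,\ldots,\pi_{c-1},p_1,\ldots,p_c);\ \pi_i > 0,\ \textstyle\sum_{i=1}^{c-1} \pi_i < 1,\ 0 < p_1 < p_2 < \cdots < p_c < 1\}.$$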

When k=1, the formulation (1.11) reduces to iid data from a finite mixture of binomial distributions, which is referred to as a 1-population finite mixture of binomials. Thus the i-th finite mixture of binomials in (1.11) has a mixture density

$$h_i(x) = \sum_{j=1}^{c} \pi_j \binom{n_i}{x} p_j^{x} (1-p_j)^{n_i-x} \qquad (1.12)$$

for $x = 0,1,\ldots,n_i$, where $\boldsymbol{\theta} = (\pi_1,\ldots,\pi_{c-1},p_1,\ldots,p_c)$ and $\pi_c = 1 - \sum_{j=1}^{c-1}\pi_j$, $i = 1,\ldots,k$.

Our primary interest lies in the MLE of the mixing distribution G

in (1.11), which is equivalent to finding an MLE $\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ in (1.12)

because of the nonparametric nature of the mixing distribution G.

Before estimation is attempted, however, we need to verify that the

k-population finite mixture of binomials model in (1.11) is 'identifiable'.

Teicher (1963) showed that the 1-population finite mixture of

binomial distributions in (1.12), i.e., k=1, is identifiable if and

only if $n_1 \ge 2c-1$. In Chapter 2 we extend Teicher's result to the

multinomial case and find the necessary and sufficient condition for

identifiability of a k-population finite mixture of multinomial

distributions.

The maximum likelihood approach to estimation of the mixing

distribution in the finite mixture problem has been discussed only since

the late 1960's when access to fast electronic digital computers made

it feasible. Hasselblad (1969) developed a set of iterative equations

for obtaining MLE's of parameters in the 1-population mixture of members

of the exponential family, which was later recognized as a special case

of the EM algorithm (Dempster, Laird and Rubin, 1977). By extending

Hasselblad's iterative equations to the case of the k-population

mixture of certain members of the exponential family, we obtain as a

special case an algorithm for the MLE $\hat{\boldsymbol{\theta}}$ of $\boldsymbol{\theta}$ for the k-population

finite mixture of binomials model (1.11).
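To make the idea concrete, the following Python sketch applies the textbook EM updates for a binomial mixture to data pooled over k populations that share one mixing distribution. It is a minimal illustration and is not claimed to reproduce the iterative equations derived in Chapter 2; the function names, starting values, sample sizes and true parameter values are all assumptions made for the example.

```python
import random
from collections import Counter

def em_binomial_mixture(populations, weights, probs, iterations=300):
    """EM sketch for a k-population finite mixture of binomials.

    populations: list of (n_i, [x_i1, ..., x_ir_i]); weights and probs are
    starting values for (pi_1, ..., pi_c) and (p_1, ..., p_c).
    """
    # Observations with the same (n_i, x) contribute identically, so tabulate them.
    cells = Counter((n_i, x) for n_i, counts in populations for x in counts)
    total = sum(cells.values())
    c = len(weights)
    weights, probs = list(weights), list(probs)
    for _ in range(iterations):
        resp = [0.0] * c      # summed posterior component probabilities
        succ = [0.0] * c      # posterior-weighted positive counts
        trials = [0.0] * c    # posterior-weighted numbers of trials
        for (n_i, x), freq in cells.items():
            # E-step: the binomial coefficient cancels in the posterior.
            joint = [weights[j] * probs[j] ** x * (1.0 - probs[j]) ** (n_i - x)
                     for j in range(c)]
            norm = sum(joint)
            for j in range(c):
                w = freq * joint[j] / norm
                resp[j] += w
                succ[j] += w * x
                trials[j] += w * n_i
        # M-step: closed-form updates of the mixing weights and success probabilities.
        weights = [resp[j] / total for j in range(c)]
        probs = [succ[j] / trials[j] for j in range(c)]
    return weights, probs

def simulate(r_i, n_i, weights, probs, rng):
    """Draw r_i counts from the mixture sum_j weights[j] * B(n_i, probs[j])."""
    out = []
    for _ in range(r_i):
        p = rng.choices(probs, weights=weights)[0]
        out.append(sum(rng.random() < p for _ in range(n_i)))
    return out

rng = random.Random(0)
truth_w, truth_p = [0.3, 0.7], [0.9, 0.05]       # hypothetical true values, c = 2
populations = [(n_i, simulate(3000, n_i, truth_w, truth_p, rng)) for n_i in (3, 5)]
est_w, est_p = em_binomial_mixture(populations, [0.5, 0.5], [0.7, 0.2])
```

Both simulated populations have $n_i \ge 2c-1 = 3$, so the example stays within the identifiable case of Teicher's condition. The sketch does not guard against a component probability reaching 0 or 1, nor against label switching from poor starting values.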

This theoretical framework of the k-population mixture of

binomials model is applied in Chapter 2 to the analysis of a sizable

database of Ames test data, gathered at considerable cost to the

federal government. The Ames test is a bacterial test that detects

evidence of mutagenicity for chemical compounds. The observed Ames test

data in a given experiment, which are counts, can be transformed into

a 0 or 1 indicating a non-mutagen or mutagen, respectively, through

analysis via a family of mutation models described in Margolin, Kaplan

and Zeiger (1981). Analysis of the derived 0-1 experimental results

based on the k-population mixture of two binomials provides estimates

of (i) the prevalence rate of mutagens among the test compounds, (ii)

the false positive rate and (iii) the false negative rate, as well as

the standard errors of these three estimates. By providing these

estimates of various parameters of interest, the k-population finite

mixture model provides an empirical check of the operational properties

of the mutation models and statistical procedures proposed by Margolin,

Kaplan and Zeiger (1981).

1.3.2 The Dirichlet-Multinomial Distribution

Using an analogy to fixed effects and random effects in linear

models, it appears that almost all the methods for the analysis of

multi-dimensional contingency tables have focused on fixed-effects

models. Fienberg (1975) points this out and lists the development of a

discrete analog to the nested and random effects (Model II) ANOVA models

among the unsolved problems in the analysis of multi-dimensional contin-

gency tables.

In section 3.2 we develop a random effects model for the one-way

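layout contingency table. Define

$$S_p^{o} = \{(p_1,\ldots,p_{I-1});\ 0 < p_i < 1,\ \textstyle\sum_{i=1}^{I-1} p_i < 1\} \qquad (1.13)$$

$$S_x = \{(x_1,\ldots,x_{I-1});\ x_i \in \{0,1,\ldots,n\},\ \textstyle\sum_{i=1}^{I-1} x_i \le n\} \qquad (1.14)$$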

For our development, we need the following definitions.

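Definition 1.6: A random vector $\mathbf{X} = (X_1,\ldots,X_{I-1})$ has a multinomial distribution with parameters n and $\mathbf{p} = (p_1,\ldots,p_{I-1})$, denoted by $\mathbf{X} \sim M(n,\mathbf{p})$, if

$$\Pr(\mathbf{X}=\mathbf{x}) = \binom{n}{x_1,\ldots,x_I} \prod_{i=1}^{I} p_i^{x_i} \qquad (1.15)$$

for $\mathbf{x} \in S_x$ and $\mathbf{p} \in S_p^{o}$, where $x_I = n - \sum_{i=1}^{I-1} x_i$ and $p_I = 1 - \sum_{i=1}^{I-1} p_i$. Notationally, $m(\mathbf{x};n,\mathbf{p})$ and $M(\mathbf{x};n,\mathbf{p})$ denote the multinomial mass function (1.15) and corresponding distribution function, respectively.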

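Definition 1.7: A random vector $\mathbf{U} = (U_1,\ldots,U_{I-1})$ has a Dirichlet distribution with $\boldsymbol{\beta} = (\beta_1,\ldots,\beta_I)$, denoted by $\mathbf{U} \sim D(\boldsymbol{\beta})$, if it has a p.d.f. given by

$$f(\mathbf{u}) = \frac{\Gamma(B)}{\prod_{i=1}^{I}\Gamma(\beta_i)} \left(\prod_{i=1}^{I-1} u_i^{\beta_i-1}\right) \left(1-\sum_{i=1}^{I-1}u_i\right)^{\beta_I-1} \qquad (1.16)$$

for $\mathbf{u} \in S_u^{o}$, where $\beta_i > 0$ for $i=1,2,\ldots,I$, and $B = \sum_{i=1}^{I}\beta_i$.

The Dirichlet distribution $D(\boldsymbol{\beta})$ can be reparametrized so that it can be denoted by $D(\boldsymbol{\pi},\theta)$, where $\pi_i = \beta_i/B$ and $\theta = 1/B$ for $\boldsymbol{\pi} = (\pi_1,\ldots,\pi_{I-1}) \in S_\pi^{o}$. A Dirichlet mixture of multinomial distributions is called the Dirichlet-multinomial distribution, and denoted by $DM(n,\boldsymbol{\pi},\theta)$, that is,

$$DM(n,\boldsymbol{\pi},\theta) = M(n,\mathbf{p}) \underset{\mathbf{p}}{\wedge} D(\boldsymbol{\pi},\theta). \qquad (1.17)$$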
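The mixture (1.17) has a closed-form mass function built from gamma functions, a standard result for the Dirichlet-multinomial. The Python sketch below evaluates it in the $(\boldsymbol{\pi},\theta)$ parametrization, with $\beta_i = \pi_i/\theta$ and $B = 1/\theta$, and checks two consequences by enumeration: the probabilities sum to one, and the variance of a cell count equals the binomial variance inflated by $(1+n\theta)/(1+\theta)$. The function names and the values n = 6, $\boldsymbol{\pi}$ = (0.2, 0.3, 0.5), $\theta$ = 0.25 are assumptions for the example; all I cell probabilities are passed explicitly.

```python
import math

def dm_log_pmf(x, pi, theta):
    """Log mass function of DM(n, pi, theta); x and pi list all I cells."""
    n, big_b = sum(x), 1.0 / theta
    out = math.lgamma(n + 1) + math.lgamma(big_b) - math.lgamma(big_b + n)
    for x_i, pi_i in zip(x, pi):
        beta_i = pi_i * big_b            # Dirichlet parameter beta_i = pi_i / theta
        out += math.lgamma(x_i + beta_i) - math.lgamma(beta_i) - math.lgamma(x_i + 1)
    return out

def compositions(n, parts):
    """All nonnegative integer vectors of length `parts` that sum to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

n, pi, theta = 6, (0.2, 0.3, 0.5), 0.25  # assumed values for illustration
pmf = {x: math.exp(dm_log_pmf(x, pi, theta)) for x in compositions(n, len(pi))}
total = sum(pmf.values())

# Variance of the first cell count versus the inflated binomial variance.
var_first = sum(prob * (x[0] - n * pi[0]) ** 2 for x, prob in pmf.items())
binomial_var = n * pi[0] * (1.0 - pi[0])
inflation = (1.0 + n * theta) / (1.0 + theta)
```

As $\theta$ tends to zero the inflation factor tends to one, which is the multinomial case.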

Mosimann (1962) provides an extensive study of the Dirichlet-multinomial

distribution, thereby extending Skellam's work on the beta-binomial

distribution. Brier (1980) investigates the effect of the Dirichlet-

multinomial distribution on the chi-square test of a general hypothesis

in the one-way layout contingency table and shows that Pearson's

chi-square statistic is in fact asymptotically a constant multiple of a

chi-square random variable when the hypothesis is true.

Thus it follows that in a contingency table with I response

categories and G groups the Dirichlet-multinomial distribution $DM(n_{+j},\boldsymbol{\pi},\theta)$ can introduce random group effects, since the j-th group probability vector, say $\mathbf{p}_j$, is now randomly generated from $D(\boldsymbol{\pi},\theta)$ and, conditional on the observed $\mathbf{p}_j$, the j-th group response vector, say $\mathbf{n}_j$, has a multinomial distribution $M(n_{+j},\mathbf{p}_j)$, where $n_{+j}$ is the j-th group size. In a handy notation this is described as

$$\begin{aligned}
\mathbf{p}_j &\overset{\text{iid}}{\sim} D(\boldsymbol{\pi},\theta) \\
(\mathbf{n}_j \mid n_{+j}, \mathbf{p}_j) &\sim M(n_{+j},\mathbf{p}_j)
\end{aligned} \qquad (1.18)$$

for $j=1,2,\ldots,G$.
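The two-stage scheme (1.18) is straightforward to simulate. The Python sketch below draws each group probability vector from the Dirichlet distribution by normalizing independent gamma variates, a standard construction, and then draws the group counts from the corresponding multinomial. The values of $\boldsymbol{\pi}$, $\theta$, G and the group sizes, as well as the function names, are assumptions chosen for illustration.

```python
import random

def draw_dirichlet(pi, theta, rng):
    """Draw p ~ D(pi, theta) by normalizing gamma variates with shapes pi_i / theta."""
    g = [rng.gammavariate(pi_i / theta, 1.0) for pi_i in pi]
    s = sum(g)
    return [v / s for v in g]

def draw_multinomial(n, p, rng):
    """Draw one M(n, p) count vector by classifying n independent uniforms."""
    counts = [0] * len(p)
    for _ in range(n):
        u, acc, cell = rng.random(), 0.0, len(p) - 1   # default guards against rounding
        for i, p_i in enumerate(p):
            acc += p_i
            if u < acc:
                cell = i
                break
        counts[cell] += 1
    return counts

rng = random.Random(3)
pi, theta = (0.2, 0.3, 0.5), 0.2          # hypothetical values, I = 3 categories
group_sizes = [25] * 400                   # n_{+j} for j = 1, ..., G with G = 400
table = [draw_multinomial(n, draw_dirichlet(pi, theta, rng), rng) for n in group_sizes]

# Pooled category proportions estimate pi even though each group has its own p_j.
pooled = [sum(col) / sum(group_sizes) for col in zip(*table)]
```

Setting `theta` close to zero makes the drawn $\mathbf{p}_j$ concentrate at $\boldsymbol{\pi}$, which corresponds to the absence of random group effects.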

The primary concern of section 3.2 is hypothesis testing of the

presence of random group effects, which can be formulated as

$$H_0: \theta = 0 \quad \text{vs.} \quad H_a: \theta > 0. \qquad (1.19)$$

For testing (1.19) we find that Neyman's C(α) procedure yields a new test

statistic, denoted by $T_k$. The asymptotic relative efficiency $e(X_p^2 \mid T_k)$

of the classical chi-square statistic $X_p^2$ satisfies

(1.20)

where the equality holds iff the group sizes $\{n_{+j}\}_{j=1}^{G}$ are asymptotically

balanced or G = 2. The superiority of $T_k$ to $X_p^2$ based on (1.20) is

further evidenced by a Monte Carlo simulation that compares the actual

performances of those two statistics in terms of their sizes and powers.

The formulation of the random effects model in the I × G contingency table (1.18) is extended to a balanced nested mixed effects model in Chapter 4. Using the conditioning arguments employed in (1.18), nested mixed effects can be represented as

$$\begin{aligned}
(\mathbf{p}_{jk} \mid \boldsymbol{\pi}_j) &\overset{\text{ind}}{\sim} D(\boldsymbol{\pi}_j,\theta) \\
(\mathbf{n}_{jk} \mid n_{+jk}, \mathbf{p}_{jk}) &\overset{\text{ind}}{\sim} M(n_{+jk},\mathbf{p}_{jk})
\end{aligned} \qquad (1.21)$$

for $j=1,2,\ldots,R$ and $k=1,2,\ldots,C$, where $\boldsymbol{\pi}_1, \boldsymbol{\pi}_2, \ldots, \boldsymbol{\pi}_R$ are fixed and correspond to R levels of the row variable, and $\mathbf{p}_{jk}$ corresponds to the k-th replication within the j-th level of the row variable.

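In the model (1.21) interest centers on the hypotheses of no nested random effects and the equality of the fixed row effects, which are respectively formulated as

$$H_0: \theta = 0 \quad \text{vs.} \quad H_a: \theta > 0 \qquad (1.22)$$

$$H_0: \boldsymbol{\pi}_1 = \boldsymbol{\pi}_2 = \cdots = \boldsymbol{\pi}_R \quad \text{vs.} \quad H_a: \text{not } H_0. \qquad (1.23)$$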

The C(α) procedure can be readily extended for problem (1.22); however,

for testing (1.23) two side questions can be raised.

(i) Are the Wald statistic and Pearson's chi-square statistic

asymptotically equivalent in the presence of nested random

effects?

(ii) What is the cost of analyzing the balanced nested mixed effects

model as if it were a crossed mixed effects model?

Complete answers to those two questions are not available for general I,

R, and C; however, based on the results for I = R = 2 and general C in

section 4.3 the answer to (i) appears to be yes. The cost in (ii)

appears to be sizable when the group sizes $\{n_{+jk}\}$ exhibit 'reasonable'

departures from balance. Finally in section 4.3 a Wald test is

constructed for testing (1.23).

1.4 FURTHER RESEARCH

In Chapter 5 four problems for further research are discussed.

(i) Study of the uniqueness of the MLE $\hat{\boldsymbol{\theta}}$ of a finite mixture of

binomial distributions.

(ii) Development of the likelihood ratio test in the finite mixture

problem for testing $H_0$: c=1 versus $H_a$: c=2, where c refers to

the number of components of a population.

(iii) Use of the $T_k$ statistic as a measure of association.

(iv) Development of a random effects model in a two-way layout

contingency table.

CHAPTER II

FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS

This chapter focuses on problems relating to the k-population

finite mixture of binomial distributions, such as identifiability,

estimation of the mixing distribution and the asymptotic covariance

matrix, and the asymptotic distribution of the ML estimator. Finally

an example will be presented together with extensive numerical

analyses.

2.1 IDENTIFIABILITY PROBLEM

2.1.1 Preliminaries

Estimation of the mixing distribution in any mixture problem is

meaningful only if the mixture distribution is 'identifiable'. Early on

K. Pearson (1894) treated this problem for the case of a mixture of two

normal distributions; later Feller (1943) observed that any mixture of

Poisson distributions was always identifiable due to the uniqueness

property of the Laplace transform.

Teicher (1963) pursued the study of identifiability in the case of

finite mixtures, including the finite mixture of binomials. A portion

of his development of this topic is summarized below.

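Let $\mathcal{G}_c^{*}$ be the class of all discrete distribution functions with at most c atoms, and let $\mathcal{H}_F$ be the class of all finite mixtures of F given by

$$\mathcal{H}_F = \{H(x);\ H(x) = \int_{\Theta} F(x;\theta)\, dG(\theta),\ G \in \mathcal{G}_c^{*}\} = \{H(x);\ H(x) = F(x;\theta) \underset{\theta}{\wedge} G(\theta),\ G \in \mathcal{G}_c^{*}\}.$$

Definition 2.1: If H is considered as the image of the map of G, then $\mathcal{H}_F$ is said to be identifiable if and only if this map defines a one to one map of $\mathcal{G}_c^{*}$ onto $\mathcal{H}_F$.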

Teicher (1963) found a necessary and sufficient identifiability

condition for the class of all finite mixtures of binomial distributions

with n fixed, which is stated as a lemma.

Lemma 2.1 (Teicher, 1963): Let $\mathcal{B} = \{B(x;n,p);\ 0 < p < 1\}$ be a one-parameter family of binomial distribution functions, n being fixed. A necessary and sufficient condition that the class

$$\mathcal{H}_B = \{H(x);\ H(x) = B(x;n,p) \underset{p}{\wedge} G(p),\ G \in \mathcal{G}_c^{*}\}$$

is identifiable is that $n \ge 2c - 1$.
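A small numerical example shows why the bound in Lemma 2.1 matters for c = 2. A mixture of B(x;n,p) depends on G only through its first n moments, so two different two-atom mixing distributions with the same mean and second moment give the same mixture when n = 2, but not when n = 3 = 2c - 1. The Python sketch below uses two such distributions; the particular atoms and weights, and the function names, are assumptions chosen for the illustration.

```python
import math

def mixture_pmf(n, atoms, weights):
    """Mass function of B(x; n, p) mixed over a discrete G with the given atoms."""
    return [sum(w * math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
                for p, w in zip(atoms, weights))
            for x in range(n + 1)]

# Two different mixing distributions with equal first and second moments
# (mean 0.5 and second moment 0.26 in both cases).
g1 = ([0.4, 0.6], [0.5, 0.5])
g2 = ([0.3, 0.55], [0.2, 0.8])

def max_abs_diff(n):
    """Largest pointwise difference between the two mixtures for a given n."""
    a, b = mixture_pmf(n, *g1), mixture_pmf(n, *g2)
    return max(abs(u - v) for u, v in zip(a, b))

diff_n2 = max_abs_diff(2)   # n = 2 < 2c - 1: the two mixtures coincide
diff_n3 = max_abs_diff(3)   # n = 3 = 2c - 1: the third moments differ
```

With n = 2 the two mixtures agree up to rounding error, so G cannot be recovered from the mixture; with n = 3 they are distinguishable.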

2.1.2 1-Population Finite Mixture of Multinomials

In exploring another dimension of the identifiability problem,

Chandra (1977) related the identifiability of the class of mixtures of

multivariate distributions to the identifiability of the class of

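mixtures of the corresponding marginals. In what follows, $\mathcal{G}$ is defined to be a class of arbitrary distribution functions. Let $X_i \sim F_i(\cdot;\boldsymbol{\theta}_i)$ for $i=1,\ldots,k$ and let $\mathbf{X} = (X_1,\ldots,X_k) \sim F(\cdot;\boldsymbol{\theta})$. Then Chandra (1977) in his theorem 2.1 showed that the identifiability of the class

$$\mathcal{H}_{F_i} = \{H_i(x);\ H_i(x) = F_i(x;\boldsymbol{\theta}_i) \underset{\boldsymbol{\theta}_i}{\wedge} G_i(\boldsymbol{\theta}_i),\ G_i \in \mathcal{G}\}$$

for all $i=1,2,\ldots,k$ implied the identifiability of the class

$$\mathcal{H}_F = \{H(\mathbf{x});\ H(\mathbf{x}) = F(\mathbf{x};\boldsymbol{\theta}) \underset{\boldsymbol{\theta}}{\wedge} G(\boldsymbol{\theta}),\ G \in \mathcal{G}\}.$$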

Chandra's theorem permits an immediate extension of Teicher's

results to yield a new identifiability condition for the class of finite

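mixtures of multinomial distributions.

Lemma 2.2: Let $M(\mathbf{x};n,\mathbf{p})$ be the distribution function of a multinomial distribution with parameters $(n,\mathbf{p})$, where $\mathbf{p} = (p_1,\ldots,p_r)$, $p_i > 0$, and $\sum_{i=1}^{r} p_i = 1$. Then the class

$$\mathcal{H}_M = \{H(\mathbf{x});\ H(\mathbf{x}) = M(\mathbf{x};n,\mathbf{p}) \underset{\mathbf{p}}{\wedge} G(\mathbf{p}),\ G \in \mathcal{G}_c^{*}\}$$

is identifiable if and only if $n \ge 2c - 1$.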

Proof. Let $G_i(p_i)$ be the marginal distribution function (d.f.) of $G(p_1,\ldots,p_r)$ with respect to $p_i$. Then the marginal d.f. $H_1$ of $X_1$ can be obtained as

$$\begin{aligned}
H_1(x) &= \int\cdots\int_{(-\infty,x]\times Z\times\cdots\times Z} dH(x_1,\ldots,x_r) \\
&= \int_0^1\cdots\int_0^1 \Big[\int\cdots\int_{(-\infty,x]\times Z\times\cdots\times Z} dM(\mathbf{x};n,\mathbf{p})\Big]\, dG(p_1,\ldots,p_r) \\
&= \int_0^1\cdots\int_0^1 \Big(\int_{(-\infty,x]} dB(x_1;n,p_1)\Big)\, dG(p_1,\ldots,p_r) \\
&= \int_0^1\cdots\int_0^1 B(x;n,p_1)\, dG(p_1,\ldots,p_r) \\
&= B(x;n,p_1) \underset{p_1}{\wedge} G_1(p_1),
\end{aligned}$$

where the interchange of integrations in the second step can be justified using the result in Neveu (1965, p. 77).

Similarly $X_i \sim B(x;n,p_i) \underset{p_i}{\wedge} G_i(p_i)$ for $i=1,\ldots,r$. Since $G_i$ is a marginal distribution of $p_i$, the number of atoms, say $c_i$, is less than or equal to c. Thus $n \ge 2c - 1$ implies $n \ge 2c_i - 1$ for $i=1,\ldots,r$. Hence by lemma 2.1 each class of mixtures of binomial distributions is identifiable. Thus by theorem 2.1 of Chandra (1977), $\mathcal{H}_M$ is identifiable if $n \ge 2c - 1$.

For the necessary condition we prove the contrapositive. Suppose

n < 2c - 1. Thus it suffices to show that there exist two different

mixing distributions, say G1 and G2 in G, giving rise to a common

mixture distribution. Consider G1 and G2 whose c atoms are

2i = (p, ... ,p, qi' 1-(r-2)p-qi) for i=l, ,c with corresponding prob-

abilities n = (nl, ... ,n ) and 2 +' = (p, ,p, q +., 1-{r-2)p-q +.)~ C Cl Cl C1

for i=l, ... ,c with corresponding probabilities n* = (nc+l , ... ,n2c )'

respectively, where ql, ... ,qc' qc+l, ... ,q2c are all distinct. To

prove the result we need to demonstrate the existence of ~ and n* such

But (2.2) is equivalent to

L~:16.M(x;n,p.) = a for all ~ ,1- 1 ~ ~1 .-(2.3)

for suitable choices of 6. Is. Since M(n,p) has the probability1 ~

generating function {tlPl+···+ tr_1Pr_l+(1-L~:~Pi)}nwhen E = (Pl,···,Pr)'

(2.3) is equivalent to

\~:16.{tlP+... + t 2P+t lq·+[1-{r-2)p-q.]}n = a (2.4)L1- 1 r- r- 1 1

for all t = (t1,···, t r _l ) .

Since (2.4) holds identically in t, (2.4) is equivalent to

L~~16i(1+up+Wqi)n = a (2.5)

( \r-2for all u,w), where u = L;=lti-l and w = tr_l-l.

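Now, (2.5) holds if the following homogeneous linear equations have a nontrivial solution:

$$\begin{pmatrix}
1 & 1 & \cdots & 1\\
p & p & \cdots & p\\
q_1 & q_2 & \cdots & q_{2c}\\
p^2 & p^2 & \cdots & p^2\\
pq_1 & pq_2 & \cdots & pq_{2c}\\
q_1^2 & q_2^2 & \cdots & q_{2c}^2\\
p^3 & p^3 & \cdots & p^3\\
p^2q_1 & p^2q_2 & \cdots & p^2q_{2c}\\
pq_1^2 & pq_2^2 & \cdots & pq_{2c}^2\\
q_1^3 & q_2^3 & \cdots & q_{2c}^3\\
\vdots & \vdots & & \vdots\\
q_1^n & q_2^n & \cdots & q_{2c}^n
\end{pmatrix}
\begin{pmatrix}\delta_1\\ \delta_2\\ \delta_3\\ \vdots\\ \delta_{2c}\end{pmatrix} = \mathbf{0}. \tag{2.6}$$

We rewrite (2.6) as $P\boldsymbol{\delta} = \mathbf{0}$, where $P$ is an $\binom{n+2}{2}\times 2c$ matrix and $\boldsymbol{\delta}$ is a $2c\times 1$ vector. After deleting the linearly dependent rows in $P$, (2.6) can be reduced to

$$\begin{pmatrix}
1 & 1 & \cdots & 1\\
q_1 & q_2 & \cdots & q_{2c}\\
q_1^2 & q_2^2 & \cdots & q_{2c}^2\\
\vdots & \vdots & & \vdots\\
q_1^n & q_2^n & \cdots & q_{2c}^n
\end{pmatrix}
\begin{pmatrix}\delta_1\\ \delta_2\\ \vdots\\ \delta_{2c}\end{pmatrix} = \mathbf{0}, \tag{2.7}$$

which we denote by $Q\boldsymbol{\delta} = \mathbf{0}$. In order to have a nontrivial solution in (2.7), we need rank$(Q) < 2c$. But this is guaranteed since $n+1 < 2c$, and hence rank$(Q) \le \min(n+1,2c) = n+1 < 2c$. Thus nontrivial values of $\boldsymbol{\delta}$ and hence of $\boldsymbol{\pi}$ and $\boldsymbol{\pi}^*$ can be found to satisfy (2.2).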
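The construction above can be checked numerically in the binomial special case ($r = 2$). The following is a minimal sketch, not part of the original argument: the atom values, the helper name `binomial_mixture_pmf`, and the use of a singular value decomposition to find a null vector of $Q$ are all illustrative assumptions. The positive and negative entries of $\boldsymbol{\delta}$ are split into two mixing distributions, which then produce the same mixture.

```python
import numpy as np
from math import comb

def binomial_mixture_pmf(n, atoms, weights):
    """pmf on {0,...,n} of a finite mixture of Binomial(n, q) distributions."""
    return np.array([
        sum(w * comb(n, x) * q**x * (1 - q)**(n - x) for q, w in zip(atoms, weights))
        for x in range(n + 1)
    ])

c, n = 2, 2                           # n < 2c - 1 = 3, so identifiability fails
q = np.array([0.1, 0.3, 0.6, 0.8])    # 2c distinct atoms (illustrative values)

# Q is the (n+1) x 2c matrix of (2.7): row j holds q_i ** j
Q = np.vander(q, N=n + 1, increasing=True).T

# rank(Q) <= n + 1 < 2c, so the last right singular vector lies in the null space
delta = np.linalg.svd(Q)[2][-1]

# The row of ones forces sum(delta) = 0, so the positive and negative parts
# have equal total mass and can be normalised into two mixing distributions.
pos = delta > 0
w1 = delta[pos] / delta[pos].sum()
w2 = delta[~pos] / delta[~pos].sum()

h1 = binomial_mixture_pmf(n, q[pos], w1)
h2 = binomial_mixture_pmf(n, q[~pos], w2)
print(np.allclose(h1, h2))
```

The two mixing distributions have disjoint atoms, yet their moments of order $0,\dots,n$ agree, which is all a binomial mixture with index $n$ can detect.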

2.1.3 k-Population Finite Mixture of Multinomials

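Suppose we observe $k$ sets of independent random variables $X_{i1},X_{i2},\dots,X_{ir_i}$ generated from $M(n_i,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p})$ for $i=1,\dots,k$ and $G\in G_c^*$. In the following discussion we first define the identifiability problem of the k-population finite mixture model in general and then specialize it to the multinomial context. Let

$$F^{(k)} = \{(F_1(\mathbf{x};\theta),F_2(\mathbf{x};\theta),\dots,F_k(\mathbf{x};\theta)):\ \theta\in R_1^m\}$$

be a class of $k$-vectors, each of whose elements is a $d$-dimensional distribution function indexed by a point $\theta\in R_1^m$ in a Borel subset $R_1^m$ of Euclidean $m$-space $R^m$ such that each element $F_i(\mathbf{x};\theta)$ of the vector $F^{(k)}(\mathbf{x};\theta)$ is measurable in $R^d\times R_1^m$. Then a vector of mixtures

$$H^{(k)}(\mathbf{x}) = \begin{pmatrix}H_1(\mathbf{x})\\ H_2(\mathbf{x})\\ \vdots\\ H_k(\mathbf{x})\end{pmatrix} = \begin{pmatrix}F_1(\mathbf{x};\theta)\wedge_{\theta}G(\theta)\\ F_2(\mathbf{x};\theta)\wedge_{\theta}G(\theta)\\ \vdots\\ F_k(\mathbf{x};\theta)\wedge_{\theta}G(\theta)\end{pmatrix} \tag{2.8}$$

is the image of the map of $G\in G_c^*$.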

Definition 2.2: Let $H_F^{(k)}$ be the class of k-population mixtures of $F^{(k)}$ induced by the above mapping. Then $H_F^{(k)}$ is said to be identifiable if and only if this map is one to one from $G_c^*$ onto $H_F^{(k)}$.

We call $H^{(k)}(\mathbf{x})$ a k-population finite mixture and denote with corresponding small letters the set of marginal probability density functions if they exist.

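We now specialize the argument to the multinomial context. Let

$$M^{(k)} = \Bigl\{(M(\mathbf{x};n_1,\mathbf{p}),M(\mathbf{x};n_2,\mathbf{p}),\dots,M(\mathbf{x};n_k,\mathbf{p})):\ \mathbf{p} = (p_1,\dots,p_{r-1}),\ 0<p_i<1 \text{ for } i=1,\dots,r-1,\ \sum_{i=1}^{r-1}p_i<1\Bigr\}$$

be a class of vectors of $k$ multinomial distribution functions, $n_1,n_2,\dots,n_k$ being fixed, and let

$$H_M^{(k)} = \left\{\begin{pmatrix}M(\mathbf{x};n_1,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p})\\ M(\mathbf{x};n_2,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p})\\ \vdots\\ M(\mathbf{x};n_k,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p})\end{pmatrix},\ G\in G_c^*\right\},$$

$$H_j = \{M(\mathbf{x};n_j,\mathbf{p})\wedge_{\mathbf{p}}G(\mathbf{p}),\ G\in G_c^*\}, \quad j=1,\dots,k.$$

Then we prove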

Lemma 2.4: Suppose $H_j$ is not identifiable. Then any $H_i$ such that $n_i \le n_j$ is not identifiable, and there exists at least one common pair $(G_1,G_2)$, $G_1\ne G_2$, that are mapped to a given mixture for all $i$ such that $n_i \le n_j$.

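Proof. Non-identifiability of $H_j$ implies that for $G_1,G_2\in G_c^*$ with $G_1\ne G_2$ we have a common $h_j(\mathbf{x})$ such that

$$h_j(\mathbf{x}) = \int_0^1\cdots\int_0^1 m(\mathbf{x};n_j,\mathbf{p})\,dG_1(\mathbf{p}) = \int_0^1\cdots\int_0^1 m(\mathbf{x};n_j,\mathbf{p})\,dG_2(\mathbf{p}). \tag{2.9}$$

However, by lemma 2.2, (2.9) is true if and only if $n_j < 2c-1$. Now, we define

$$S(n,r) = \Bigl\{(t_1,\dots,t_r):\ t_i\in I^+\cup\{0\} \text{ for all } i,\ \sum_{i=1}^{r}t_i = n\Bigr\},$$

$$S^0(n,r) = \Bigl\{(t_1,\dots,t_r):\ t_i\in I^+\cup\{0\} \text{ for all } i,\ \sum_{i=1}^{r}t_i \le n\Bigr\},$$

where $I^+$ is the set of positive integers. Expanding $(1-\sum_{i=1}^{r-1}p_i)^{x_r}$ we obtain, for a mixing distribution $G$,

$$h_j(\mathbf{x}) = \int_0^1\cdots\int_0^1\binom{n_j}{x_1,\dots,x_r}p_1^{x_1}\cdots p_{r-1}^{x_{r-1}}\Bigl(1-\sum_{i=1}^{r-1}p_i\Bigr)^{x_r}dG(\mathbf{p})$$

$$= \binom{n_j}{x_1,\dots,x_r}\sum_{\ell=0}^{x_r}(-1)^{\ell}\binom{x_r}{\ell}\sum_{\mathbf{y}\in S(\ell,r-1)}\binom{\ell}{y_1,\dots,y_{r-1}}\int_0^1\cdots\int_0^1\prod_{i=1}^{r-1}p_i^{x_i+y_i}\,dG(\mathbf{p})$$

for $(x_1,\dots,x_r)\in S(n_j,r)$, where $\mathbf{y} = (y_1,\dots,y_{r-1})$.

Consequently (2.9) is true if and only if

$$\sum_{\ell=0}^{x_r}(-1)^{\ell}\binom{x_r}{\ell}\sum_{\mathbf{y}\in S(\ell,r-1)}\binom{\ell}{y_1,\dots,y_{r-1}}\mu^{(1)}(x_1+y_1,\dots,x_{r-1}+y_{r-1}) = \sum_{\ell=0}^{x_r}(-1)^{\ell}\binom{x_r}{\ell}\sum_{\mathbf{y}\in S(\ell,r-1)}\binom{\ell}{y_1,\dots,y_{r-1}}\mu^{(2)}(x_1+y_1,\dots,x_{r-1}+y_{r-1})$$

for $(x_1,\dots,x_r)\in S(n_j,r)$, where

$$\mu^{(i)}(v_1,\dots,v_{r-1}) = \int_0^1\cdots\int_0^1\prod_{t=1}^{r-1}p_t^{v_t}\,dG_i(\mathbf{p}), \quad i=1,2,$$

and $\mathbf{v} = (v_1,\dots,v_{r-1})$.

Hence (2.9) is true if and only if

$$\mu^{(1)}(v_1,\dots,v_{r-1}) = \mu^{(2)}(v_1,\dots,v_{r-1})$$

for $\mathbf{v}\in S^0(n_j,r-1)$.

Thus the same result will follow for $n_i \le n_j$ with the same $(G_1,G_2)$. □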

Lemma 2.5: If there exists $j \ge 1$ such that $H_j$ is identifiable, then $H_M^{(k)}$ is identifiable.

Proof. Suppose $H_M^{(k)}$ is not identifiable. Then there exist two different mixing distributions $G_1,G_2\in G_c^*$ such that

$$M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_1(\mathbf{p}) = M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_2(\mathbf{p}) \quad \text{for all } \mathbf{x}$$

for $i=1,\dots,k$ with the same $(G_1,G_2)$.

Consequently no $H_j$ is identifiable. □

Theorem 2.1: A necessary and sufficient condition that the class $H_M^{(k)}$ of all k-population finite mixtures of multinomial distributions be identifiable is

$$\max_{1\le i\le k} n_i \ge 2c-1.$$

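Proof: Without loss of generality we assume that $n_1\ge n_2\ge\cdots\ge n_k$. Suppose $n_1 < 2c-1$. Then by lemma 2.3 we have

$$H_1(\mathbf{x}) = M(\mathbf{x};n_1,\mathbf{p})\wedge_{\mathbf{p}}G_1(\mathbf{p}) = M(\mathbf{x};n_1,\mathbf{p})\wedge_{\mathbf{p}}G_2(\mathbf{p}) \tag{2.11}$$

for $G_1\ne G_2$, where $G_1,G_2\in G_c^*$.

Hence by lemma 2.4 we still have

$$H_i(\mathbf{x}) = M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_1(\mathbf{p}) = M(\mathbf{x};n_i,\mathbf{p})\wedge_{\mathbf{p}}G_2(\mathbf{p}) \tag{2.12}$$

for $i=1,\dots,k$.

Thus $H_M^{(k)}$ is not identifiable.

The other direction is a direct application of lemma 2.5. □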
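As a small illustration of how Theorem 2.1 and lemma 2.4 are used in practice, the sketch below checks the condition $\max_i n_i \ge 2c-1$ for a given design and also reports which populations are identifiable on their own. The function names are hypothetical and are introduced only for this example.

```python
def is_identifiable(ns, c):
    """Theorem 2.1: the k-population mixture with c components is
    identifiable if and only if max_i n_i >= 2c - 1."""
    if c < 1 or len(ns) == 0:
        raise ValueError("need c >= 1 and at least one population")
    return max(ns) >= 2 * c - 1

def marginally_identifiable(ns, c):
    """Flags, one per population, telling whether H_i alone is identifiable."""
    return [n >= 2 * c - 1 for n in ns]

# Populations with n_i = 1, 2, 3, 4 trials and a two-component mixture
ns = [1, 2, 3, 4]
print(is_identifiable(ns, c=2))          # True, since max n_i = 4 >= 3
print(marginally_identifiable(ns, c=2))  # [False, False, True, True]
```

The joint model is identifiable as soon as a single population satisfies the bound, even though the populations with $n_i = 1, 2$ are not identifiable marginally.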

2.2 ESTIMATION OF THE MIXING DISTRIBUTION

2.2.1 Preliminaries

Many methods have been suggested for the estimation of the mixing

distribution in the 1-population mixture model, ranging from the method

of moments to a formal maximum likelihood (ML) approach to methods based

on numerical analysis techniques and minimum distance methods. See

Pearson (1894, 1915), Rider (1961a, 1961b), and Blischke (1962, 1964) for

the method of moments, Kabir (1968) for the numerical analysis technique,

Choi and Bulgren (1968) and Deely and Kruse (1968) for the distance

methods, and Hasselblad (1969), Sundberg (1974), and Dempster, Laird and

Rubin (1977) for the ML approach.

The rationale for concentrating on the moment estimator, or on the

distance method, rather than the ML approach had been that the latter

method yielded highly intractable equations. Indeed, it was not until

the late 1960's, with the emergence of fast electronic digital computers,

that the ML approach was suggested in various forms for incomplete data.

Observations from a finite mixture are considered incomplete, because the

component in the population from which each observation originates is

unknown. Thus Hasselblad (1969) developed a set of iterative equations

for obtaining estimates for the 1-population finite mixture of members

of the exponential family, and Orchard and Woodbury (1972) proposed

the missing information principle (MIP) for a problem that originated

from estimating genotypic frequencies from phenotypic frequency data.

Sundberg (1974) considered a ML approach for incomplete data when the

iid data came from an exponential family member and suggested a

fundamental set of formulae for the current iterative computational

approach to obtaining the ML estimator. Under the assumption of

positive definiteness for the information matrix, he also obtained the

consistency, asymptotic efficiency and asymptotic normality of the ML

estimator.

The statistical analysis tools for incomplete data were finally

unified when Dempster, Laird and Rubin (1977) (henceforth abbreviated

DLR) suggested an EM algorithm, which included as special cases Orchard

and Woodbury's MIP and Hasselblad's iteration equations for the finite

mixture problem. The 'E' in EM stands for the expectation step,

which consists of estimating the complete data sufficient statistic by

constructing the conditional expectation of the complete data given

the observed incomplete data and the current fit of the parameter. The

'M' implies a maximization step, which takes the estimated complete data

and estimates the parameters by the ML methods as though the estimated

complete data were the observed data. The EM algorithm is defined by

cycling back and forth between these two steps.

In their paper DLR showed that the likelihood is nondecreasing at

each iteration of the EM algorithm. Recently Wu (1983) presented an

elegant study on the convergence of the EM algorithm, which was not

clear in the original work of DLR. One of Wu's primary results that

is relevant to our mixture model is that if the unobserved complete data

specification can be described by a curved exponential family or satisfies

a mild regularity condition (condition (10) in his paper), then

all the limit points of any EM sequence are stationary points of the

likelihood function. Also it has been recommended by various authors

(Hasselblad (1969), Wolf (1970), Laird (1978) and Wu (1983), among

others) that several EM iterations be tried with different initial

values to minimize the chance of entrapment at a stationary

point that is not a local maximum.

Recently Louis (1982) aided applicability of the EM algorithm by

developing an implementation based on the complete data gradient and

the second derivative matrix to find the observed information matrix.

In many cases Efron and Hinkley (1978) have shown this to be a more appropriate measure of the covariance matrix than the traditional approximation $I(\hat{\theta})$, where $\hat{\theta}$ is a maximum likelihood estimator and $I$ is the Fisher information matrix.

2.2.2 Maximum Likelihood Equations

In this subsection we extend Hasselblad's iterative equations for

obtaining the ML estimator of the 1-population finite mixture of an

exponential family to the case of a k-population finite mixture.

Hence we can obtain the ML estimator of the k-population finite mixture

of binomial distributions as a special case of the resulting algorithm.

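Let $(X_{i1},X_{i2},\dots,X_{ir_i})$ be a random sample from the i-th population distribution $h_i(x)$, which is a mixture of $c$ component distributions, that is,

$$h_i(x) = \sum_{j=1}^{c}\pi_j f_{ij}(x;\theta_j), \tag{2.13}$$

where $f_{ij}(x;\theta_j)$ is the j-th component distribution in the i-th population and is assumed to belong to an $s$-parameter exponential family, $\pi_j$ is a mixing proportion for the j-th component and $\theta_j = (\theta_{1j},\dots,\theta_{sj})$ is the parameter vector. Thus

$$f_{ij}(x;\theta_j) = A_i(x)\,C_i(\theta_{1j},\dots,\theta_{sj})\exp[\theta_{1j}T_1(x)+\cdots+\theta_{sj}T_s(x)] \tag{2.14}$$

for $i=1,\dots,k$ and $j=1,\dots,c$.

Define $\mathbf{x} = ((x_{11},\dots,x_{1r_1}),(x_{21},\dots,x_{2r_2}),\dots,(x_{k1},\dots,x_{kr_k}))$, $\boldsymbol{\pi} = (\pi_1,\dots,\pi_{c-1})$, $\pi_c = 1-\sum_{j=1}^{c-1}\pi_j$, and $\psi = (\boldsymbol{\pi},\theta_1,\dots,\theta_c)$.

Then the log-likelihood $L^*(\mathbf{x};\psi)$ of the k-population data becomes

$$L^*(\mathbf{x};\psi) = \sum_{\ell=1}^{r_1}\log\Bigl\{\sum_{j=1}^{c}\pi_jA_1(x_{1\ell})C_1(\theta_{1j},\dots,\theta_{sj})\exp[\theta_{1j}T_1(x_{1\ell})+\cdots+\theta_{sj}T_s(x_{1\ell})]\Bigr\}$$

$$+ \sum_{\ell=1}^{r_2}\log\Bigl\{\sum_{j=1}^{c}\pi_jA_2(x_{2\ell})C_2(\theta_{1j},\dots,\theta_{sj})\exp[\theta_{1j}T_1(x_{2\ell})+\cdots+\theta_{sj}T_s(x_{2\ell})]\Bigr\}$$

$$+ \cdots + \sum_{\ell=1}^{r_k}\log\Bigl\{\sum_{j=1}^{c}\pi_jA_k(x_{k\ell})C_k(\theta_{1j},\dots,\theta_{sj})\exp[\theta_{1j}T_1(x_{k\ell})+\cdots+\theta_{sj}T_s(x_{k\ell})]\Bigr\}. \tag{2.15}$$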

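If $C_1,C_2,\dots,C_k$ are differentiable, then the ML equations can be derived as follows:

$$\frac{\partial L^*}{\partial\theta_{pj}} = \sum_{\ell=1}^{r_1}\frac{\pi_jf_{1j}(x_{1\ell})}{h_1(x_{1\ell})}\Bigl[\frac{\partial\log C_1}{\partial\theta_{pj}}+T_p(x_{1\ell})\Bigr] + \sum_{\ell=1}^{r_2}\frac{\pi_jf_{2j}(x_{2\ell})}{h_2(x_{2\ell})}\Bigl[\frac{\partial\log C_2}{\partial\theta_{pj}}+T_p(x_{2\ell})\Bigr] + \cdots + \sum_{\ell=1}^{r_k}\frac{\pi_jf_{kj}(x_{k\ell})}{h_k(x_{k\ell})}\Bigl[\frac{\partial\log C_k}{\partial\theta_{pj}}+T_p(x_{k\ell})\Bigr] \tag{2.16}$$

for $p=1,\dots,s$ and $j=1,\dots,c$,

$$\frac{\partial L^*}{\partial\pi_j} = \sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\frac{f_{ij}(x_{i\ell})-f_{ic}(x_{i\ell})}{h_i(x_{i\ell})} \tag{2.17}$$

for $j=1,\dots,c-1$.

We assume there exists a real valued function $C(\theta_j)$ such that

$$\frac{\partial\log C_i(\theta_j)}{\partial\theta_{pj}} = n_i\frac{\partial\log C(\theta_j)}{\partial\theta_{pj}} \tag{2.18}$$

for $p=1,\dots,s$, $j=1,\dots,c$, where the $n_i$'s are constants. It can be easily checked that the binomial, Poisson, normal and gamma distribution $G(\lambda,r_i)$ with known $r_i$ satisfy the condition (2.18).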

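Setting equations (2.16) and (2.17) equal to zero and using (2.18) yields the following ML equations for $p=1,\dots,s$ and $j=1,\dots,c$:

$$-\frac{\partial\log C(\theta_j)}{\partial\theta_{pj}} = \frac{\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}T_p(x_{i\ell})}{\sum_{i=1}^{k}n_i\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}}, \tag{2.19}$$

$$\pi_j = \frac{\pi_j}{r_+}\Bigl[\sum_{\ell=1}^{r_1}\frac{f_{1j}(x_{1\ell})}{h_1(x_{1\ell})}+\sum_{\ell=1}^{r_2}\frac{f_{2j}(x_{2\ell})}{h_2(x_{2\ell})}+\cdots+\sum_{\ell=1}^{r_k}\frac{f_{kj}(x_{k\ell})}{h_k(x_{k\ell})}\Bigr], \tag{2.20}$$

where $r_+ = \sum_{i=1}^{k}r_i$.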

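Next, consider the case where each of the $k$ populations has a single (component) distribution instead of a mixture distribution. If the corresponding ML equations have a closed-form solution for $\theta_p$, we may use that closed-form solution of $\theta_p$ for the solution of equation (2.19) and achieve a major simplification in computation. Thus let $L_s^*(\mathbf{x};\theta)$ be the log-likelihood of $\mathbf{x}$ when the number of components $c$ is equal to 1:

$$L_s^*(\mathbf{x};\theta) = \sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\Bigl\{\log A_i(x_{i\ell})+\log C_i(\theta_1,\dots,\theta_s)+\sum_{p=1}^{s}\theta_pT_p(x_{i\ell})\Bigr\}, \tag{2.21}$$

where $(\theta_1,\dots,\theta_s) = \theta = \theta_1$.

Under assumption (2.18) a set of ML equations is given by

$$-\frac{\partial\log C(\theta)}{\partial\theta_p} = \frac{\sum_{\ell=1}^{r_1}T_p(x_{1\ell})+\sum_{\ell=1}^{r_2}T_p(x_{2\ell})+\cdots+\sum_{\ell=1}^{r_k}T_p(x_{k\ell})}{n_1r_1+n_2r_2+\cdots+n_kr_k} \tag{2.22}$$

for $p=1,\dots,s$.

If equation (2.22) has a closed form solution for $\theta_p$, say

$$\theta_p = g_p(t_1,\dots,t_s), \quad p=1,\dots,s, \tag{2.23}$$

where

$$t_p = \frac{\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}T_p(x_{i\ell})}{\sum_{i=1}^{k}n_ir_i}, \tag{2.24}$$

then equations (2.19) can be written as

$$\theta_{pj} = g_p(t_{1j},t_{2j},\dots,t_{sj}), \tag{2.25}$$

where

$$t_{pj} = \frac{\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}T_p(x_{i\ell})}{\sum_{i=1}^{k}n_i\sum_{\ell=1}^{r_i}\dfrac{f_{ij}(x_{i\ell})}{h_i(x_{i\ell})}} \tag{2.26}$$

for $p=1,\dots,s$, $j=1,\dots,c$.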

If we denote the estimate of the parameter $\psi$ at the $\nu$-th iteration by $\psi^{(\nu)}$, then the $t_{pj}$'s in (2.26) can be evaluated by using $\psi^{(\nu)}$. The new estimates are given by (2.25) as

$$\theta_{pj}^{(\nu+1)} = g_p\bigl(t_{1j}^{(\nu)},t_{2j}^{(\nu)},\dots,t_{sj}^{(\nu)}\bigr). \tag{2.27}$$

Similarly (2.20) can be updated as

$$\pi_j^{(\nu+1)} = \frac{\pi_j^{(\nu)}}{r_+}\sum_{i=1}^{k}\sum_{\ell=1}^{r_i}\frac{f_{ij}^{(\nu)}(x_{i\ell})}{h_i^{(\nu)}(x_{i\ell})} \tag{2.28}$$

for $j=1,\dots,c$, where the superscript $(\nu)$ implies that the reference quantities are evaluated at $\psi^{(\nu)}$.

Thus we can use equations (2.27) and (2.28) as a basis for the iterative

algorithm. In the language of DLR, (2.26) corresponds to the E-step

and (2.27)-(2.28) correspond to the M-step in the EM algorithm.

We note that in the k-population finite mixture of multinomials

case only the largest ni determines the identifiability; hence there may

be elements in the k-population finite mixture that, marginally, lack

identifiability. For estimation purposes, however, data from all

k populations are used even though some of them may not be marginally

identifiable.

2.2.3 Asymptotic Distribution of the ML Estimator

For the asymptotic normality of the ML estimator $\hat{\theta}$ for a k-population finite mixture of binomial distributions we rely on the usual maximum likelihood asymptotic theory. Cramer (1946) showed that under certain regularity conditions the likelihood solution is consistent, asymptotically normal, and asymptotically efficient. Cramer's proof was extended by Chanda (1954) to the multivariate case. Chanda's proof of the uniqueness of the consistent root of the likelihood solution is not correct. A correct version is provided by Tarone and Gruenhage (1975). In a more general setting Sundberg (1974) provides a maximum likelihood asymptotic theory for incomplete data from an exponential family member, which employs Chanda's extension of Cramer's conditions. However, it is not clear in his proof that Sundberg checked Tarone and Gruenhage's conditions that need to be added to Chanda's conditions.

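Lemma 2.6 (Chanda (1954)): Suppose $f(y;\theta)$ is a probability density law; $\theta = (\theta_1,\dots,\theta_k)$ is the unknown parameter vector and $y_1,y_2,\dots,y_n$ are $n$ independent observations on $y$. The likelihood equations for estimating $\theta$ are given by

$$\frac{\partial\log L}{\partial\theta_r} = 0, \quad r=1,2,\dots,k,$$

where $\log L = \sum_{i=1}^{n}\log f(y_i;\theta)$.

Let $\theta_0$ be the unknown true value of the parameter vector $\theta$, which exists at some point in the region $\Theta$. Then if the conditions (i)-(iii), below, hold, there exists a unique consistent estimator $\hat{\theta}_n$ corresponding to a solution of the likelihood equation. Further $\sqrt{n}(\hat{\theta}_n-\theta_0)$ is asymptotically normal with mean $\mathbf{0}$ and covariance matrix $I(\theta_0)^{-1}$, where $I(\theta_0)$ is the Fisher information matrix.

Condition (i): For almost all $y$ and for all $\theta\in\Theta$

$$\frac{\partial\log f}{\partial\theta_r},\quad \frac{\partial^2\log f}{\partial\theta_r\partial\theta_s} \quad\text{and}\quad \frac{\partial^3\log f}{\partial\theta_r\partial\theta_s\partial\theta_t}$$

exist for all $r,s,t = 1,2,\dots,k$.

Condition (ii): For almost all $y$ and for all $\theta\in\Theta$

$$\Bigl|\frac{\partial f}{\partial\theta_r}\Bigr| < F_r(y),\quad \Bigl|\frac{\partial^2f}{\partial\theta_r\partial\theta_s}\Bigr| < F_{rs}(y),\quad \Bigl|\frac{\partial^3\log f}{\partial\theta_r\partial\theta_s\partial\theta_t}\Bigr| < H_{rst}(y),$$

where

$$\int_{-\infty}^{\infty}H_{rst}(y)f\,dy < M < \infty$$

and $F_r(y)$ and $F_{rs}(y)$ are bounded for all $r,s,t = 1,\dots,k$.

Condition (iii): For all $\theta\in\Theta$ the matrix

$$I(\theta) = \int_{-\infty}^{\infty}\Bigl(\frac{\partial\log f}{\partial\theta_r}\Bigr)\Bigl(\frac{\partial\log f}{\partial\theta_s}\Bigr)f\,dy$$

is positive definite.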

For the positive definiteness of the information matrix of the k-population mixture of $c$ binomials when $c$ is known a priori we prove the following:

Lemma 2.7: Let $\theta = (\pi_1,\pi_2,\dots,\pi_{c-1},p_1,p_2,\dots,p_c)$ and let the parameter space $\Theta$ be given by

$$\Theta = \Bigl\{(\pi_1,\dots,\pi_{c-1},p_1,\dots,p_c):\ 0<\pi_i<1,\ i=1,\dots,c-1,\ \sum_{i=1}^{c-1}\pi_i<1,\ 0<p_1<p_2<\cdots<p_c<1\Bigr\}.$$

Let $\theta_0$, the true parameter value, be contained in some closed region $\bar{\Theta}$¹ which does not contain the boundary values of $\Theta$.

If the random variable $Y = (Y_1,\dots,Y_k)$, where $Y_i = (Y_{i1},\dots,Y_{ir_i})$ for $i=1,2,\dots,k$, is distributed following the probability mass function

$$h_Y(\mathbf{y};\theta) = \prod_{i=1}^{k}\prod_{j=1}^{r_i}[\pi_1b(y_{ij};n_i,p_1)+\pi_2b(y_{ij};n_i,p_2)+\cdots+\pi_cb(y_{ij};n_i,p_c)], \tag{2.29}$$

where $\pi_c = 1-\sum_{i=1}^{c-1}\pi_i$, then the information matrix

$$I(\theta) = E_{\theta}\Bigl\{\Bigl[\sum_{i=1}^{k}\sum_{j=1}^{r_i}\frac{\partial\log h_i(y_{ij};\theta)}{\partial\theta}\Bigr]\Bigl[\sum_{i=1}^{k}\sum_{j=1}^{r_i}\frac{\partial\log h_i(y_{ij};\theta)}{\partial\theta}\Bigr]'\Bigr\}, \tag{2.30}$$

where $h_i(y_{ij};\theta) = \sum_{t=1}^{c}\pi_tb(y_{ij};n_i,p_t)$ for $j=1,\dots,r_i$, $i=1,\dots,k$, is positive definite if and only if the identifiability condition $\max_{1\le i\le k}n_i \ge 2c-1$ holds.

Proof. We first prove the positive definiteness of the information matrix, say $I_i(\theta)$, contained in a single observation $Y$ from $h_i(y;\theta)$.

¹ Such a region always exists, because by the definition of the mixture of $c$ components the boundary values of $\Theta$ are not true parameter values.

For convenience we write

(2.31)

(2.32)

The first partial derivatives of $h_i(y;\theta)$ are given by

(2.33)

Since $I_i(\theta)$ is a dispersion matrix, it is positive definite unless there exists $\boldsymbol{\gamma}' = (\gamma_1,\dots,\gamma_{2c-1}) \ne \mathbf{0}$ such that

$$\sum_{\ell=1}^{2c-1}\gamma_{\ell}\frac{\partial\log h_i(y;\theta)}{\partial\theta_{\ell}} = 0 \tag{2.34}$$

for $y = 0,\dots,n_i$.

The set of equations (2.34) constitutes a set of homogeneous linear equations

$$D_h^{-1}A\boldsymbol{\gamma} = \mathbf{0} \iff A\boldsymbol{\gamma} = \mathbf{0}, \tag{2.35}$$

where $D_h = \mathrm{diag}(h_i(0),\dots,h_i(n_i))$, and $A$ is an $(n_i+1)\times(2c-1)$ matrix whose $y$-th row appears in (2.33). Since the sum of each column of $A$ is equal to zero, we can delete any one row from $A$ in finding the solution of (2.35) and let $A^*$ denote the resulting matrix. When $n_i = 2c-1$, the matrix $A^*$ is nonsingular for $\theta\in\Theta$, which can be shown by elementary, but tedious column operations that transform $A^*$ into an echelon form. Consequently only a trivial solution exists for $\boldsymbol{\gamma}$ in (2.35) if and only if $n_i \ge 2c-1$. Hence $I_i(\theta)$ is positive definite if and only if $n_i \ge 2c-1$.

Now, for the information matrix $I(\theta)$ contained in $Y = (Y_1,\dots,Y_k)$ we can decompose $I(\theta)$ as

$$I(\theta) = \sum_{i=1}^{k}r_iI_i(\theta). \tag{2.36}$$

Since $Y_1,\dots,Y_k$ are independent and within the i-th population $Y_{i1},\dots,Y_{ir_i}$ are i.i.d. for $i=1,\dots,k$, (2.36) readily follows. Suppose $\max_{1\le i\le k}n_i = n_1 \ge 2c-1$. Then $I_1(\theta)$ is positive definite. Hence by (2.36) $I(\theta)$ is positive definite.

To prove the necessary condition we assume $\max_{1\le i\le k}n_i < 2c-1$, and solve the following homogeneous linear equations:

$$\sum_{\ell=1}^{2c-1}\gamma_{\ell}\frac{\partial\log h_i(y_{ij};\theta)}{\partial\theta_{\ell}} = 0 \tag{2.37}$$

for $y_{ij} = 0,1,\dots,n_i$ and $i=1,\dots,k$.

Since (2.37) admits fewer than $2c-1$ equations, a nontrivial solution for $\boldsymbol{\gamma}$ exists. Hence $I(\theta)$ is not positive definite. □
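Lemma 2.7 can be illustrated numerically for $c = 2$, where the bound is $n_i \ge 3$. The sketch below is an illustrative check rather than part of the proof: it computes the single-observation information matrix $I_i(\theta) = \sum_y (\partial h_i/\partial\theta)(\partial h_i/\partial\theta)'/h_i$ for $\theta = (\pi_1,p_1,p_2)$ and inspects its smallest eigenvalue. The parameter values and the function names are assumptions made for the example.

```python
import numpy as np
from math import comb

def binom_pmf(y, n, p):
    return comb(n, y) * p**y * (1 - p)**(n - y)

def single_obs_information(n, pi, p1, p2):
    """Fisher information of theta = (pi, p1, p2) carried by one draw from
    pi * Binomial(n, p1) + (1 - pi) * Binomial(n, p2)."""
    info = np.zeros((3, 3))
    for y in range(n + 1):
        b1, b2 = binom_pmf(y, n, p1), binom_pmf(y, n, p2)
        h = pi * b1 + (1 - pi) * b2
        grad_h = np.array([
            b1 - b2,                                          # d h / d pi
            pi * b1 * (y - n * p1) / (p1 * (1 - p1)),         # d h / d p1
            (1 - pi) * b2 * (y - n * p2) / (p2 * (1 - p2)),   # d h / d p2
        ])
        info += np.outer(grad_h, grad_h) / h
    return info

for n in (2, 3, 4):
    smallest = np.linalg.eigvalsh(single_obs_information(n, 0.4, 0.1, 0.8)).min()
    print(n, smallest)
```

For $n = 2$ the three gradient vectors sum to zero over $y = 0,1,2$, so the matrix has rank at most two and the smallest eigenvalue is zero up to rounding; for $n \ge 3$ it is strictly positive.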

Now, we prove the large sample property of the ML estimator $\hat{\theta} = (\hat{\pi}_1,\dots,\hat{\pi}_{c-1},\hat{p}_1,\dots,\hat{p}_c)$ based on $\sum_{i=1}^{k}r_i = r_+$ observations.²

Theorem 2.2: Let $\theta$, $\Theta$ and $\bar{\Theta}$ be defined as in lemma 2.7. Let $\theta_0$ be the true parameter value contained in $\bar{\Theta}$. If the random variable

² Similar arguments can be found in N. Kiefer (1978), who considered a mixture of two normal distributions in the switching regression.

$Y = (Y_1,\dots,Y_k)$, where $Y_i = (Y_{i1},\dots,Y_{ir_i})$ for $i=1,\dots,k$, is distributed according to the probability mass function (2.29) with $\max_{1\le i\le k}n_i \ge 2c-1$, and if

then for large enough $r_+$, there exists a unique consistent root $\hat{\theta}_{r_+}$ of the likelihood equations and $\sqrt{r_+}(\hat{\theta}_{r_+}-\theta_0)$ is asymptotically normally distributed with mean zero and covariance matrix $\bigl(\frac{1}{r_+}I(\theta_0)\bigr)^{-1}$.

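Proof. The proof consists of verification of Chanda's conditions (i)-(iii), modified for the one-way layout nature of our data, together with two extra conditions of Tarone and Gruenhage (1975). The condition (iii) is readily verified by the use of lemma 2.7. Verifying conditions (i) and (ii) involves straightforward differentiation. It can be easily seen that $\partial h_i/\partial\theta$ and $\partial^2h_i/\partial\theta\,\partial\theta'$ are all continuous functions on $\bar{\Theta}$, hence they are bounded. Now using the relations

$$\frac{\partial\ln h_i}{\partial\theta_s} = \frac{\partial h_i}{\partial\theta_s}\frac{1}{h_i},$$

$$\frac{\partial^2\ln h_i}{\partial\theta_s\partial\theta_t} = \frac{\partial^2h_i}{\partial\theta_s\partial\theta_t}\frac{1}{h_i} - \frac{\partial h_i}{\partial\theta_s}\frac{\partial h_i}{\partial\theta_t}\frac{1}{h_i^2},$$

$$\frac{\partial^3\ln h_i}{\partial\theta_s\partial\theta_t\partial\theta_u} = 2\frac{\partial h_i}{\partial\theta_s}\frac{\partial h_i}{\partial\theta_t}\frac{\partial h_i}{\partial\theta_u}\frac{1}{h_i^3} - \frac{\partial^2h_i}{\partial\theta_s\partial\theta_u}\frac{\partial h_i}{\partial\theta_t}\frac{1}{h_i^2} - \frac{\partial h_i}{\partial\theta_s}\frac{\partial^2h_i}{\partial\theta_t\partial\theta_u}\frac{1}{h_i^2} - \frac{\partial h_i}{\partial\theta_u}\frac{\partial^2h_i}{\partial\theta_s\partial\theta_t}\frac{1}{h_i^2} + \frac{\partial^3h_i}{\partial\theta_s\partial\theta_t\partial\theta_u}\frac{1}{h_i},$$

Chanda's conditions (i) and (ii) are easily verified. The two extra conditions, that $\bar{\Theta}$ is a convex subset of $R^{2c-1}$ and that $\partial\ln h_i/\partial\theta_s$ and $\partial^2\ln h_i/\partial\theta_s\partial\theta_t$ are continuous for all $\theta\in\bar{\Theta}$, are readily verified. □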

2.3 k-POPULATION FINITE MIXTURE OF BINOMIALS-APPLICATION

2.3.1 Description of the Ames Test

Since publication of the paper by Ames et al. (1975), the Ames test has gained worldwide use for investigation of a chemical's mutagenicity. Its extensive use in studies of genetic toxicology is due to the test's sensitivity in detecting mutagens, its economy both with respect to time and material, and the well-documented link between carcinogenicity and mutagenicity. The Ames test is based on a very sensitive bacterial test. The bacterial test uses several genetically constructed histidine-dependent (auxotrophic) Salmonella typhimurium strains that can be reverted to histidine independence (prototrophy) by a wide variety of mutagens. This bacterial test is adapted for use in detecting chemicals that are potential human carcinogens or mutagens by adding homogenates of mammalian liver, which is a convenient source of the activating enzymes that are an important aspect of mammalian metabolism.

Ames et al. (1975) reported that about 85% of known animal carcinogens had been detected as bacterial mutagens and among 106 known non-carcinogens few were mutagenic in the test. Many Salmonella tester strains have been developed by Ames and his colleagues; among them TA 98, TA 100, TA 1535, and TA 1537 are most commonly used. As indicated earlier, if a tester strain is hit by a mutagen, then it may be reverted to prototrophy. Since prototrophic strains are capable of synthesizing histidine, an essential amino acid, they continue growing and dividing without an external supply of histidine, whereas auxotrophic strains, being entirely dependent on an external supply of histidine, cannot sustain growth. Thus if at least one auxotrophic bacterium reverts to its prototrophic state through mutation, there will be continuous growth of its descendants after exhaustion of the external supply of histidine. Thus mutagen-induced and spontaneous revertants ultimately yield colonies that are visible to the human eye.

It has been observed frequently that the toxicity effects of the

chemical increasingly outweigh its mutagenicity effects beyond certain

dose levels. Thus, toxicity must be considered as a competing risk

vis-à-vis the mutation process.

2.3.2 Statistical Analysis of the Experimental Data

The experimental data consist of the results of 763 compounds,

where the experiments followed the standard protocol of the Ames

test. Four tester strains, TA 98, TA 100, TA 1535, and TA 1537, were

employed, and three levels of metabolic activations were considered by

adding (i) no enzyme, (ii) liver homogenate from a hamster, and

(iii) liver homogenate from a rat, respectively, to each of a set of

three petri dishes. Each compound for each of the 12 combinations of

four strains and three metabolic activations was tested $n_{at}$ times, for

$a = 1,\dots,763$, $t = 1,\dots,12$. For each of the $n_{at}$ times the experiment

should have consisted of 18 petri dishes, i.e., 3 replicates at

control and 3 replicates at each of 5 dose levels, but there was

occasional loss of dishes due to breakage, extreme toxicity, etc.

For the analysis of the observed numbers of revertant colonies

in a single experiment, Margolin et al. (1981) suggested a family of

mechanistic models based on the biological formulation of the Ames

test. They also noted the existence of hyper-Poisson variability

among the replicated plate counts and advocated the use of a negative

binomial distribution. Their negative binomial model for the number $X_\xi$ of revertant colonies observed on a plate with environment $\xi$ was denoted by

(2.38)

where $\mu$ is shorthand for $\mu(\xi)$, $\mu > 0$ and $c > 0$.

The variability in replicated plates is reflected in $c$; when $c \to 0$, (2.38) reduces to the Poisson distribution through a standard limit argument.
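The Poisson limit can be seen numerically. The sketch below assumes a common parametrization of the negative binomial, with mean $\mu$ and variance $\mu(1+c\mu)$; this specific form is an assumption made for the example, since the display (2.38) is not reproduced here, and the function names are hypothetical.

```python
from math import lgamma, log, log1p, exp

def neg_binomial_pmf(x, mu, c):
    """Negative binomial pmf with mean mu and variance mu * (1 + c * mu)
    (assumed parametrization; c > 0 measures extra-Poisson variability)."""
    r = 1.0 / c
    log_pmf = (lgamma(x + r) - lgamma(r) - lgamma(x + 1)
               + x * (log(c * mu) - log1p(c * mu))
               - r * log1p(c * mu))
    return exp(log_pmf)

def poisson_pmf(x, mu):
    return exp(x * log(mu) - mu - lgamma(x + 1))

mu = 20.0
for c in (0.5, 0.05, 1e-6):
    gap = max(abs(neg_binomial_pmf(x, mu, c) - poisson_pmf(x, mu)) for x in range(200))
    print(c, gap)   # the largest pointwise gap shrinks as c decreases
```

Under this parametrization the variance exceeds the Poisson variance by the factor $1+c\mu$, which is how hyper-Poisson variability among replicate plates enters the model.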

In order to disentangle the competing risks of mutagenicity and toxicity, and hence to draw inferences regarding the mutagenicity, Margolin et al. (1981) modeled $\mu$ as $N_0P_D$, where $N_0$ is the known average number of microbes placed on a plate, which is large, e.g., $10^8$, and $P_D$ is the probability that any plated microbe yields a revertant colony when the plate is exposed to dose $D$ of a test chemical. Among those models for $P_D$ considered, they suggested that two models were of primary interest:

$$P_D = \{1-\exp[-(\alpha+\beta D)]\}\cdot\exp(-\gamma D) \tag{2.39}$$

$$P_D = \{1-\exp[-(\alpha+\beta D)]\}\cdot[2-\exp(\gamma D)]_+ \tag{2.40}$$

where $[x]_+ = \max(0,x)$, $\alpha \ge 0$ is related to a spontaneous rate of mutation, $\beta \ge 0$ is related to the induced mutation, and $\gamma \ge 0$ is related to the induced toxicity.
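The two dose-response models can be written as short functions. This is a minimal sketch with hypothetical function names and illustrative parameter values; `expm1` is used only to keep $1-\exp(-z)$ accurate when $z$ is tiny, as it is when $N_0$ is of the order $10^8$.

```python
from math import exp, expm1

def p_dose_exponential(dose, alpha, beta, gamma):
    """Model (2.39): mutation probability times exponential survival."""
    return -expm1(-(alpha + beta * dose)) * exp(-gamma * dose)

def p_dose_truncated(dose, alpha, beta, gamma):
    """Model (2.40): mutation probability times [2 - exp(gamma * dose)]_+."""
    return -expm1(-(alpha + beta * dose)) * max(0.0, 2.0 - exp(gamma * dose))

# Illustrative values only: expected revertants per plate, mu = N0 * P_D
N0 = 1e8
alpha, beta, gamma = 2e-7, 5e-7, 0.4
for dose in (0.0, 0.5, 1.0, 2.0):
    print(dose,
          N0 * p_dose_exponential(dose, alpha, beta, gamma),
          N0 * p_dose_truncated(dose, alpha, beta, gamma))
```

At dose zero both models reduce to the spontaneous term $1-\exp(-\alpha)$, and under (2.40) the response is exactly zero once $\exp(\gamma D) \ge 2$.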

From (2.39) or (2.40) it can be seen that $P_D$ is a product of two probabilities, one for mutagenicity and one for survival from toxicity; hence $P_D$ represents the competing risks of the two. Moreover, a chemical under study is mutagenic if and only if $\beta > 0$. Thus one may formulate the mutagenicity testing problem into a statistical hypothesis test by setting up the hypotheses as

$H_0$: $\beta = 0$ ($\Leftrightarrow$ not mutagenic, or for brevity $-$)

$H_a$: $\beta > 0$ ($\Leftrightarrow$ mutagenic, or $+$)

with a significance level $\alpha$. This significance level $\alpha$ is by definition

$$\alpha = \Pr(\text{judged } + \mid \text{truly } -), \tag{2.41}$$

the false positive probability assumed constant for each compound in each experiment.

In each experiment a chemical is determined to be local-positive iff $[\hat{\beta}/SE(\hat{\beta})] > c^*$, where $\hat{\beta}$ and $SE(\hat{\beta})$ are obtained by the ML methods based on 18 petri dish data under the negative binomial model (2.38) and either (2.39) or (2.40). Under $H_0$ we may claim that

$$[\hat{\beta}/SE(\hat{\beta})] \sim N(0,1). \tag{2.42}$$

Thus the critical value $c^*$ is determined by the given level of $\alpha$. For $\alpha = 0.05$ and compound $i$ we may obtain $x_{it}$, the number of local-positives among $n_{it}$ experiments for each of 12 combinations of strain-activation sets. For example, for chemical 1 (identification number) we observe the following table:

              Metabolic activation
    Strain    None    Hamster    Rat
    TA 98     0/1     2/3        3/3
    TA 100    0/2     0/2        0/2
    TA 1535   0/1     0/1        0/1
    TA 1537   0/2     1/3        2/3

Table 2.1 The number of local positives out of $\{n_{1t}\}_{t=1}^{12}$ experiments for chemical 1,

where notationally $y/n$ implies $y$ local-positives out of $n$ experiments.

Lastly, Margolin and his colleagues (personal communication) combine

data from different experiments and reach a single conclusion and

determine a chemical to be positive if and only if there is at least

one repeated local-positive in at least one strain-activation set.

Thus in Table 2.1 or in any other such table for another chemical if

any cell contains the number of local-positives ~ 2, then that

chemical is considered to be positive. In what follows we refer to the

summary data in Table 2.1 obtained through the statistical procedures

described above as the derived data.
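The decision rule just described is simple enough to state as code. The sketch below encodes the cells of Table 2.1 as (local-positives, experiments) pairs; the dictionary layout and the function name are illustrative assumptions, and the fourth strain is taken to be TA 1537, following the list of tester strains given earlier.

```python
# Cells of Table 2.1 for chemical 1: (local-positives, experiments)
table_2_1 = {
    ("TA 98", "None"): (0, 1),   ("TA 98", "Hamster"): (2, 3),   ("TA 98", "Rat"): (3, 3),
    ("TA 100", "None"): (0, 2),  ("TA 100", "Hamster"): (0, 2),  ("TA 100", "Rat"): (0, 2),
    ("TA 1535", "None"): (0, 1), ("TA 1535", "Hamster"): (0, 1), ("TA 1535", "Rat"): (0, 1),
    ("TA 1537", "None"): (0, 2), ("TA 1537", "Hamster"): (1, 3), ("TA 1537", "Rat"): (2, 3),
}

def is_positive(cells, repeats=2):
    """A chemical is positive iff some strain-activation cell shows a
    repeated local-positive, i.e. at least `repeats` local-positives."""
    return any(x >= repeats for x, _ in cells.values())

print(is_positive(table_2_1))   # True: e.g. TA 98 with hamster activation has 2/3
```

A chemical with at most one local-positive in every cell would be judged negative under this rule, however many cells show a single positive.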

2.3.3 Further Analysis of the Derived Data: Mixture Model

2.3.3.1 k-Population Mixture of Two Binomials

The derived data in Table 2.1 admits further statistical analysis

that may be focused on the following three problems;

Problem (i): How to perform an empirical check on the operating properties of Margolin et al.'s statistical procedures?

Problem (ii): What is the proportion of mutagens in the population of compounds tested?

Problem (iii): What is the power of detecting a true positive

chemical in this procedure?

This last problem reflects both the sensitivity of the Ames test as well as the sensitivity of the statistical analysis. The derived data can be arranged to yield a lower triangular two way layout for each strain-activation set by counting the numbers of chemicals that have $0,1,2,\dots,n_i$ positive results, respectively, out of $n_i$ experiments, where $n_i = 1,2,\dots,\max_{a,t}(n_{at})$ for $i=1,2,\dots,k$.

To develop a suitable statistical model for problems (i)-(iii), we may note several characteristics of the experiments and the derived data:
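The lower triangular layout can be built by a simple tally. The following sketch uses made-up results for one strain-activation set; the input format, a list of (experiments, positives) pairs with one pair per chemical, and the function name are assumptions for illustration.

```python
from collections import Counter

def frequency_layout(results):
    """Row n lists how many chemicals tested n times gave
    t = 0, 1, ..., n positive results (a lower triangular layout)."""
    counts = Counter(results)                 # keys are (n, t) pairs
    n_max = max(n for n, _ in counts)
    return {n: [counts.get((n, t), 0) for t in range(n + 1)]
            for n in range(1, n_max + 1)}

# Hypothetical derived data: (number of experiments, number of local-positives)
results = [(1, 0), (1, 1), (2, 0), (2, 0), (2, 2), (3, 1), (3, 3), (3, 0)]
layout = frequency_layout(results)
for n, row in layout.items():
    print(n, row)
```

Each row sum is the number of chemicals tested that many times, and row $n$ has $n+1$ cells, which gives the triangular shape.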

(i) There are $S$ compounds that have been tested from a hypothetical set $\Phi$ of compounds that have been or will be tested. Note that this is not the universe of chemical compounds. Scientific judgement enters the selection procedure, so that, for example, H2O would not be tested nor included in $\Phi$.

(ii) The tests adopted have a probability $\tau$ of yielding false positives, and a probability $1-p$ of false negatives. The latter is somewhat of a simplifying assumption, similar to Neyman's (1947) diagnostic simplification, so as to permit some analytic progress.

(iii) The set $\Phi$ of compounds has a proportion $\pi$ of positive chemicals.

(iv) For each strain-activation set, the chemicals can be grouped into sets such that the i-th set of $r_i$ chemicals has been tested $n_i$ times for positive or negative evidence of mutagenicity, where $n_i = 1,2,\dots,\max_{a,t}(n_{at})$, $i=1,\dots,k$.

The probabilities $p$ and $\tau$ can be described in the usual table:

                         Test result
    State of nature    positive    negative
    positive           $p$         $1-p$
    negative           $\tau$      $1-\tau$

Table 2.2

For each strain-activation set the vector $Y_i = (Y_{i1},Y_{i2},\dots,Y_{ir_i})$ of positive results of the i-th set of chemicals can be considered as an observation from a mixture of two binomial distributions, i.e.,

$$\Pr(Y_i = \mathbf{y}_i) = \prod_{j=1}^{r_i}\Bigl[\pi\binom{n_i}{y_{ij}}p^{y_{ij}}(1-p)^{n_i-y_{ij}} + (1-\pi)\binom{n_i}{y_{ij}}\tau^{y_{ij}}(1-\tau)^{n_i-y_{ij}}\Bigr] \tag{2.43}$$

for $i=1,\dots,k$.

Using the simple notation we denote (2.43) as

$$\{Y_{ij}\} \sim \{b(y_{ij};n_i,p)\}\wedge_{p}G_2(p) \tag{2.44}$$

for $j=1,\dots,r_i$, $i=1,\dots,k$, where $G_2$ is a discrete distribution function with two atoms and $\theta = (\pi,p,\tau)$.

Now, $(Y_1,Y_2,\dots,Y_k)$ constitutes an independent, non-identically distributed set of data with joint likelihood

$$\prod_{i=1}^{k}h_i(\mathbf{y}_i;\theta) = \prod_{i=1}^{k}\Bigl\{\prod_{j=1}^{r_i}[b(y_{ij};n_i,p)\wedge_{p}G_2(p)]\Bigr\}. \tag{2.45}$$

Specifically, we have

$$Y_{11},\dots,Y_{1r_1} \overset{iid}{\sim} h_1(\theta) = B(n_1,p)\wedge_{p}G_2(p)$$

$$Y_{21},\dots,Y_{2r_2} \overset{iid}{\sim} h_2(\theta) = B(n_2,p)\wedge_{p}G_2(p) \tag{2.46}$$

$$\vdots$$

$$Y_{k1},\dots,Y_{kr_k} \overset{iid}{\sim} h_k(\theta) = B(n_k,p)\wedge_{p}G_2(p).$$

The reader will recognize this formulation to be a k-population mixture of two binomials.

The problems (i)-(iii) above are tied to estimation of parameters $\pi$, $p$ and $\tau$ of the k-population mixture of two binomials model (2.43) and obtaining their sampling distributions. Particularly Problem (i) affords an empirical check of the a priori assumption on the size of the false positive probability, which was set to be 0.05 based on the large sample behavior (2.42). Problem (ii) is identified with estimation of the prevalence rate $\pi$, and Problem (iii) may be partially answered by studying the sampling distribution of the estimator $\hat{p}$.

The identifiability condition of the class of joint distributions of $(Y_1,\dots,Y_k)$ is given by Theorem 2.1, which says the class is identifiable if and only if

$$\max_{1\le i\le k}n_i \ge 3. \tag{2.47}$$

In the derived data, $\max_{1\le i\le k}n_i \ge 4$ for each of the 12 combinations of strain-activation set.

In order to find the MLE of $\theta = (\pi,p,\tau)$ we use the iterative equations (2.27) and (2.28). We first define

$$Z_{ij} = \begin{cases}1 & \text{if } Y_{ij} \text{ is from } b(y_{ij};n_i,p)\\ 0 & \text{if } Y_{ij} \text{ is from } b(y_{ij};n_i,\tau)\end{cases} \tag{2.48}$$

for $j=1,\dots,r_i$, $i=1,\dots,k$.

Then (2.27) and (2.28) admit the following EM implementations:

$$w_{ij} = w(y_{ij};\theta) = E(Z_{ij}\mid y_{ij},\theta) = \frac{\pi b(y_{ij};n_i,p)}{\pi b(y_{ij};n_i,p)+(1-\pi)b(y_{ij};n_i,\tau)}, \tag{2.49}$$

$$w_{ij}^{(\nu)} = w_{ij}(y_{ij};\theta^{(\nu)}), \tag{2.50}$$

where the superscript $(\nu)$ implies that the estimated quantity is evaluated at the $\nu$-th iteration step.

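Then the parameter vector $\theta = (\pi,p,\tau)$ is updated by

$$p^{(\nu+1)} = \sum_{i=1}^{k}\sum_{j=1}^{r_i}w_{ij}^{(\nu)}y_{ij}\Big/\sum_{i=1}^{k}\sum_{j=1}^{r_i}n_iw_{ij}^{(\nu)}, \tag{2.51}$$

$$\tau^{(\nu+1)} = \sum_{i=1}^{k}\sum_{j=1}^{r_i}(1-w_{ij}^{(\nu)})y_{ij}\Big/\sum_{i=1}^{k}\sum_{j=1}^{r_i}n_i(1-w_{ij}^{(\nu)}), \tag{2.52}$$

$$\pi^{(\nu+1)} = \frac{1}{r_+}\sum_{i=1}^{k}\sum_{j=1}^{r_i}w_{ij}^{(\nu)}, \quad\text{where } r_+ = \sum_{i=1}^{k}r_i. \tag{2.53}$$

The equations (2.51)-(2.53) can be simplified by noting that the i-th set of $r_i$ chemicals can assume values $0,1,\dots,n_i$ for numbers of positives. Hence frequencies, say $\{f_{it}\}$, can be assigned to $\{Y_{ij} = t\}$, $t=0,1,\dots,n_i$. For all those $y_{ij}$'s, the $w_{ij}$'s are equal, say to $w_{it}$. Hence we have the following simplified equations:

$$p^{(\nu+1)} = \sum_{i=1}^{k}\sum_{t=0}^{n_i}t\,f_{it}w_{it}^{(\nu)}\Big/\sum_{i=1}^{k}n_i\sum_{t=0}^{n_i}f_{it}w_{it}^{(\nu)}, \tag{2.54}$$

$$\tau^{(\nu+1)} = \sum_{i=1}^{k}\sum_{t=0}^{n_i}t\,f_{it}(1-w_{it}^{(\nu)})\Big/\sum_{i=1}^{k}n_i\sum_{t=0}^{n_i}f_{it}(1-w_{it}^{(\nu)}), \tag{2.55}$$

$$\pi^{(\nu+1)} = \frac{1}{r_+}\sum_{i=1}^{k}\sum_{t=0}^{n_i}f_{it}w_{it}^{(\nu)}. \tag{2.56}$$

Thus cycling back and forth between the equation (2.49) and the equations (2.54)-(2.56) defines the EM algorithm for the k-population mixture of two binomials model.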
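The iteration can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the program used for the analysis: the data are supplied as a dictionary mapping each $n_i$ to its frequency vector $(f_{i0},\dots,f_{in_i})$, the function names are hypothetical, and the demonstration uses expected frequencies computed from made-up parameter values instead of the Ames data.

```python
import numpy as np
from math import comb

def binom_row(n, p):
    """Binomial(n, p) pmf on t = 0, ..., n."""
    return np.array([comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1)])

def log_likelihood(freq, pi, p, tau):
    return sum(np.dot(f, np.log(pi * binom_row(n, p) + (1 - pi) * binom_row(n, tau)))
               for n, f in freq.items())

def em_step(freq, pi, p, tau):
    num_p = den_p = num_t = den_t = sum_w = r_plus = 0.0
    for n, f in freq.items():
        f = np.asarray(f, dtype=float)
        t = np.arange(n + 1)
        a, b = pi * binom_row(n, p), (1 - pi) * binom_row(n, tau)
        w = a / (a + b)                                   # E-step, as in (2.49)
        num_p += np.sum(t * f * w)
        den_p += n * np.sum(f * w)                        # terms of (2.54)
        num_t += np.sum(t * f * (1 - w))
        den_t += n * np.sum(f * (1 - w))                  # terms of (2.55)
        sum_w += np.sum(f * w)
        r_plus += f.sum()                                 # terms of (2.56)
    return sum_w / r_plus, num_p / den_p, num_t / den_t

def fit_em(freq, start, n_iter=500):
    theta = start
    history = [log_likelihood(freq, *theta)]
    for _ in range(n_iter):
        theta = em_step(freq, *theta)
        history.append(log_likelihood(freq, *theta))
    return theta, history

# Demonstration on expected frequencies from illustrative values (pi, p, tau)
truth = (0.3, 0.8, 0.05)
freq = {n: 100 * (truth[0] * binom_row(n, truth[1]) + (1 - truth[0]) * binom_row(n, truth[2]))
        for n in (1, 2, 3, 4)}
estimate, history = fit_em(freq, start=(0.5, 0.6, 0.2))
print(estimate)
```

The populations with $n_i = 1, 2$ are not identifiable on their own, yet they still contribute to every sum in the M-step, and the recorded log-likelihood never decreases from one iteration to the next.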

The observed information matrix for our model can be obtained following Louis' EM implementation (Louis, 1982). Using EM terminology, the complete data can be defined by specifying the component distribution from which each observation is drawn.

Let

$$Z_i = (Z_{i1},Z_{i2},\dots,Z_{ir_i}), \tag{2.57}$$

where $Z_{ij}$ is defined in (2.48).

Then the complete data $X = (X_1,X_2,\dots,X_k)$ is defined as

$$X_i = (Z_i,Y_i), \tag{2.58}$$

where $X_i = (X_{i1},\dots,X_{ir_i})$, $i=1,\dots,k$.

The likelihood of the complete data ~ suggests a two-stage

experiment, where first a component is picked by a Bernoulli experi-

ment, and then a binomial variate is generated. Therefore the log-

likelihood hX .. (o) for Xij is given by1J

h*X t x 0 • ; e)• 0 1J ~1J

(2.59)

fo r j = I , ••. , r i' i =1 , ... ,k.

Let $S(X_{ij};\theta)$, $S(X_i;\theta)$, and $S(X;\theta)$ be the gradient vectors of $h^*_{X_{ij}}(x_{ij};\theta)$, $\sum_{j=1}^{r_i} h^*_{X_{ij}}(x_{ij};\theta)$, and $\sum_{i=1}^{k}\sum_{j=1}^{r_i} h^*_{X_{ij}}(x_{ij};\theta)$, respectively. Let $B(X_{ij};\theta)$, $B(X_i;\theta)$ and $B(X;\theta)$ be the negatives of the associated second derivative matrices. Since $X_1, X_2, \ldots, X_k$ are independent,

    $S(X;\theta) = \sum_{i=1}^{k} S(X_i;\theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i} S(X_{ij};\theta)$,

    $B(X;\theta) = \sum_{i=1}^{k}\sum_{j=1}^{r_i} B(X_{ij};\theta)$.

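Thus we obtain

    $S(X_{ij};\theta) = \left( \frac{\partial}{\partial p} h^*_{X_{ij}}(x_{ij};\theta),\ \frac{\partial}{\partial \tau} h^*_{X_{ij}}(x_{ij};\theta),\ \frac{\partial}{\partial \pi} h^*_{X_{ij}}(x_{ij};\theta) \right)'$

    $= \left( Z_{ij}\frac{y_{ij} - n_i p}{p(1-p)},\ (1 - Z_{ij})\frac{y_{ij} - n_i \tau}{\tau(1-\tau)},\ \frac{Z_{ij} - \pi}{\pi(1-\pi)} \right)'$,    (2.60)

    $B(X_{ij};\theta) = \mathrm{diag}\left( Z_{ij}\frac{n_i p^2 - y_{ij}(2p-1)}{(p - p^2)^2},\ (1 - Z_{ij})\frac{n_i \tau^2 - y_{ij}(2\tau-1)}{(\tau - \tau^2)^2},\ \frac{Z_{ij} - 2\pi Z_{ij} + \pi^2}{\pi^2(1-\pi)^2} \right)$.    (2.61)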

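Hence the conditional complete data observed information matrix becomes

    $I_X = \sum_{i=1}^{k}\sum_{j=1}^{r_i} \mathrm{diag}\left( w_{ij}\frac{n_i \hat p^2 - y_{ij}(2\hat p-1)}{(\hat p - \hat p^2)^2},\ (1 - w_{ij})\frac{n_i \hat\tau^2 - y_{ij}(2\hat\tau-1)}{(\hat\tau - \hat\tau^2)^2},\ \frac{w_{ij} - 2\hat\pi w_{ij} + \hat\pi^2}{\hat\pi^2(1-\hat\pi)^2} \right)$.    (2.62)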

Following Louis' development, the lost information due to the

unobservable $Z$, denoted by $I_{X|Y}$, is obtained as

(2.63)

where $S'(X;\theta)$ is the transpose of $S(X;\theta)$.

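Also, by the definition of the MLE $\hat\theta$, which maximizes the incomplete data likelihood $\prod_{i=1}^{k}\prod_{j=1}^{r_i} h_i(y_{ij};\theta)$, we have

(2.64)

    $\left[\sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij};\hat\theta)\right]\left[\sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij};\hat\theta)\right]' = 0$.    (2.65)

Thus by using equations (2.63) and (2.65), and the independence of the $X_{ij}$'s,

    $I_{X|Y} = \sum_{i=1}^{k}\sum_{j=1}^{r_i} E_{\hat\theta}[S(X_{ij};\hat\theta) S'(X_{ij};\hat\theta) \mid y_{ij}] - \sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij};\hat\theta) S^{*\prime}(y_{ij};\hat\theta)$.    (2.66)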

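After simple algebra it can be shown that the two terms in (2.66) become

    $\sum_{i=1}^{k}\sum_{j=1}^{r_i} E_{\hat\theta}[S(X_{ij};\hat\theta) S'(X_{ij};\hat\theta) \mid y_{ij}] = \sum_{i=1}^{k}\sum_{j=1}^{r_i} \mathrm{diag}\left( w_{ij}\left[\frac{y_{ij} - n_i \hat p}{\hat p(1-\hat p)}\right]^2,\ (1 - w_{ij})\left[\frac{y_{ij} - n_i \hat\tau}{\hat\tau(1-\hat\tau)}\right]^2,\ \frac{w_{ij} - 2\hat\pi w_{ij} + \hat\pi^2}{[\hat\pi(1-\hat\pi)]^2} \right)$    (2.67)

and $\sum_{i=1}^{k}\sum_{j=1}^{r_i} S^*(y_{ij};\hat\theta) S^{*\prime}(y_{ij};\hat\theta)$, where

    $S^*(y_{ij};\hat\theta) = \left( w_{ij}\frac{y_{ij} - n_i \hat p}{\hat p(1-\hat p)},\ (1 - w_{ij})\frac{y_{ij} - n_i \hat\tau}{\hat\tau(1-\hat\tau)},\ \frac{w_{ij} - \hat\pi}{\hat\pi(1-\hat\pi)} \right)'$.    (2.68)

(2.69)

⁴ Louis (1982) has a different expression for $I_{X|Y}$, which is algebraically equivalent to (2.66). However, we find that (2.66) was simpler to be programmed.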

In equation (2.67) the entries in the (1,3) and (2,3) positions are

equal to zero because of the EM equations (2.54) and (2.55), respectively.

Finally the observed information matrix $I_Y$ is obtained as

    $I_Y = I_X - I_{X|Y}$,

where $I_X$ is obtained in (2.62), and $I_{X|Y}$ is obtained by (2.66)-(2.68).

2.3.3.2 Results of the Analysis.

Two sets of derived data were obtained for further statistical

analysis. The first set of output data was based on the statistical

procedures by Margolin, Kaplan and Zeiger (1981) using the usual

significance level $\alpha = 0.05$. This is now referred to as the stat-call.

For the second set of derived data a senior toxicologist's⁵ subjective

judgment based on his past experience yielded the decisions of whether

a compound being tested was local-positive or local-negative. Hence

there is no formal statistical procedure in the generation of the

second set of derived data. The second set of derived data is called

the Zeiger-call.

The stat-call and Zeiger-call data are presented in Table 2.3 in

lower triangular arrays for each strain-activation set. Tables 2.4.A

and 2.4.B display the corresponding MLE's and the inverses of the

observed information matrices $I_Y$ obtained by the EM algorithm⁶

procedure described in 2.3.3.1.

The total number of compounds tested in each strain-activation set

varies slightly (at most by 1) because some compounds were not tested in

certain strain-activation sets. In Table 2.4.B some MLE's were obtained

at the boundary of the parameter space, i.e., $\hat p = 1$ for TA 100-N and for

TA 1537-R. This yielded singular information matrices (see (2.62),

(2.66)-(2.69) for the singularity of an information matrix at $p = 1$).

Since these estimated values do not belong to the parameter space at

the outset, they must be interpreted without benefit of a corresponding

standard error.

For the overall probability of a false positive, for a compound

when all 12 strain-activations are employed, we note the following

immediate but useful results.

⁵ Dr. Errol Zeiger of the National Institute of Environmental Health Sciences, who actually supervised all the biological experiments that yielded the experimental data.

⁶ Several sets of initial values were tried with the EM algorithm. It turned out that the EM algorithm leads to fairly stable stationary values of the estimates with respect to various sets of initial values.

Lemma 2.8: Let $\tau_{ij}$ be the probability of a false local-positive for a compound in the i-th strain and j-th activation set, for $i = 1, \ldots, 4$, $j = 1, 2, 3$, and let $\tau_{over}$ be the overall probability of a false positive for a compound. Then under the independence of the 12 combinations of strain-activations,

    $\tau_{over} = 1 - \prod_{i=1}^{4}\prod_{j=1}^{3} (1 - \tau_{ij}^2)$.    (2.70)

Proof: $1 - \tau_{over}$ = Pr(judged negative | truly negative)

    = Pr(no repeated local-positives in any of 12 combinations of strain-activation set | truly negative)

    $= \prod_{i=1}^{4}\prod_{j=1}^{3} (1 - \tau_{ij}^2)$.  □

Thus by Lemma 2.8 the MLE $\hat\tau_{over}$ of $\tau_{over}$ becomes

    $\hat\tau_{over} = 1 - \prod_{i=1}^{4}\prod_{j=1}^{3} (1 - \hat\tau_{ij}^2)$,    (2.71)

where $\hat\tau_{ij}$ by Theorem 2.2 has an approximate normal distribution with mean $\tau_{ij}$ and variance equal to the (3,3) entry of $I_Y(\hat\theta)^{-1}$. The distribution of $\hat\tau_{over}$ can be obtained under the independence of the $\hat\tau_{ij}$'s when the compound is not mutagenic.

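Lemma 2.9: Let $\{\tau_{ij}\}_{j=1}^{3}{}_{i=1}^{4}$ be reindexed as $\{\tau_t\}_{t=1}^{12}$ and assume $\hat\tau_1, \ldots, \hat\tau_{12}$ are independent. Then

    $\sqrt{r_+}\,(\hat\tau_{over} - \tau_{over}) \to N\left(0,\ r_+ \sum_{t=1}^{12} \left[2\tau_t \prod_{s \ne t}(1 - \tau_s^2)\right]^2 \sigma_t^2\right)$,    (2.72)

where $\tau_{over}$ is defined in (2.70) and $\sigma_t^2 = \mathrm{Var}(\hat\tau_t)$.

Proof. Let $\tau = (\tau_1, \ldots, \tau_{12})$ and the MLE $\hat\tau$ be defined accordingly. Then as a result of Theorem 2.2 and the independence assumption

    $\sqrt{r_+}\,(\hat\tau - \tau) \to N(0,\ r_+\,\mathrm{diag}(\sigma_1^2, \ldots, \sigma_{12}^2))$,    (2.73)

where $\mathrm{diag}(\sigma_1^2, \ldots, \sigma_{12}^2)$ is a diagonal matrix with diagonal entries $\sigma_1^2, \ldots, \sigma_{12}^2$. Then by using the multivariate $\delta$-method (Bishop, Fienberg and Holland, 1975, §14.6.3) the result follows.  □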
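The two lemmas translate directly into a short computation. The sketch below is only illustrative: the function names `tau_over` and `tau_over_sd` are hypothetical, the inputs are a list of the $\hat\tau_t$ and a list of their variances $\sigma_t^2$ (the (3,3) entries of the inverse observed information matrices), and the standard deviation is the square root of $\sum_t [2\tau_t \prod_{s \ne t}(1 - \tau_s^2)]^2 \sigma_t^2$, i.e. the variance in (2.72) after removing the factor $r_+$.

```python
from math import prod, sqrt


def tau_over(tau):
    """Overall false-positive probability of Lemma 2.8: 1 - prod(1 - tau_t^2)."""
    return 1.0 - prod(1.0 - t * t for t in tau)


def tau_over_sd(tau, var):
    """Delta-method standard deviation of tau_over, following Lemma 2.9.

    tau: list of tau_t estimates; var: list of Var(tau_t), assumed independent.
    """
    total = 0.0
    for t, (tau_t, var_t) in enumerate(zip(tau, var)):
        # Partial derivative of tau_over with respect to tau_t.
        rest = prod(1.0 - s * s for k, s in enumerate(tau) if k != t)
        grad = 2.0 * tau_t * rest
        total += grad * grad * var_t
    return sqrt(total)
```

Both functions accept any number of strain-activation sets, so the same code applies if fewer than 12 combinations are employed.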

Calculations of $\hat\tau_{over}$ and S.D.($\hat\tau_{over}$) for the stat-call and Zeiger-call data are given in the table below:

                   $\hat\tau_{over}$    S.D.($\hat\tau_{over}$)
    Stat-Call          0.0560            0.0010
    Zeiger-Call        0.0013            0.0004

Table 2.5: $\hat\tau_{over}$ and its standard deviation for Stat-Call and Zeiger-Call data

2.3.3.3 Discussion of the Results

Margolin et al.'s statistical procedures described in subsection 2.3.2 assumed that for each set of 18 petri-dishes $\hat\beta/SE(\hat\beta)$ was distributed as N(0,1) based on the large sample theory. Based on this normal assumption the cut-off value was determined for the given level of significance $\alpha = 0.05$ for each experiment to test the local-positiveness of the compound.

By noting that the significance level $\alpha$ is equivalent to the probability of a false-positive in each experiment (see (2.41)), the operating property of Margolin et al.'s statistical procedures can be checked against the sampling distributions of $\hat\tau_{ij}$ for $i = 1, \ldots, 4$, $j = 1, 2, 3$. From Table 2.4.A we extract the entries corresponding to $\hat\tau$ and Var($\hat\tau$) and present them in the table below:

    Strain-Activation   $\hat\tau_{ij}$   S.D.($\hat\tau_{ij}$)   $(\hat\tau_{ij} - 0.05)$/S.D.($\hat\tau_{ij}$)
    TA 98-N             0.0441            0.0290                  -0.203
    TA 98-H             0.0493            0.0235                  -0.030
    TA 98-R             0.0659            0.0187                   0.850
    TA 100-N            0.0536            0.0289                   0.125
    TA 100-H            0.0969            0.0231                   2.030
    TA 100-R            0.1026            0.0293                   1.795
    TA 1535-N*          0.0788            0.0107                   2.692
    TA 1535-H           0.0762            0.0170                   1.541
    TA 1535-R           0.0607            0.0154                   0.695
    TA 1537-N           0.0584            0.0207                   0.406
    TA 1537-H           0.0533            0.0143                   0.231
    TA 1537-R           0.0659            0.0111                   1.432

Table 2.6: $\hat\tau$ and S.D.($\hat\tau$) for each combination of strain-activation set.

In Table 2.6 we see that among the 12 $\hat\tau_{ij}$'s, 10 have 0.95 confidence

intervals containing the value 0.05. Thus we may conclude that the

stat-call data provide evidence that N(0,1) is a good approximation

of the tail distribution of $\hat\beta/SE(\hat\beta)$.
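The count of intervals covering 0.05 can be reproduced from the entries of Table 2.6. The following sketch assumes the usual normal-theory 0.95 interval $\hat\tau_{ij} \pm 1.96\,$S.D.($\hat\tau_{ij}$); the helper names `z_score` and `covers` are hypothetical.

```python
# (strain-activation, tau_hat, S.D.) as listed in Table 2.6.
rows = [
    ("TA 98-N", 0.0441, 0.0290), ("TA 98-H", 0.0493, 0.0235),
    ("TA 98-R", 0.0659, 0.0187), ("TA 100-N", 0.0536, 0.0289),
    ("TA 100-H", 0.0969, 0.0231), ("TA 100-R", 0.1026, 0.0293),
    ("TA 1535-N", 0.0788, 0.0107), ("TA 1535-H", 0.0762, 0.0170),
    ("TA 1535-R", 0.0607, 0.0154), ("TA 1537-N", 0.0584, 0.0207),
    ("TA 1537-H", 0.0533, 0.0143), ("TA 1537-R", 0.0659, 0.0111),
]


def z_score(tau_hat, sd, level=0.05):
    """Standardized distance of tau_hat from the nominal level."""
    return (tau_hat - level) / sd


def covers(tau_hat, sd, level=0.05, z=1.96):
    """True if the interval tau_hat +/- z * sd contains the nominal level."""
    return abs(tau_hat - level) <= z * sd


n_cover = sum(covers(t, s) for _, t, s in rows)
not_covered = [name for name, t, s in rows if not covers(t, s)]
```

With these inputs the two sets whose intervals exclude 0.05 are TA 100-H and TA 1535-N, the rows with standardized values above 1.96.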

With biochemical techniques, TA 98 and TA 100 were engineered

from TA 1535 and TA 1537, respectively, to have greater sensitivity

to mutagens. This fact is reflected in the dominance of $\hat\pi$'s in

TA 98 over $\hat\pi$'s in TA 1535 uniformly with respect to the activation

sets. The same is true for TA 100 and TA 1537.

Investigation of the Zeiger-call data indicates that his

decision making is too conservative relative to the conventional

range of statistical significance levels commonly employed in

scientific research.

Table 2.3 The Number of j Positive Results in i Experiments in Each Strain-Activation Set; Stat-Call and Zeiger-Call

TA 98-N
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    65  13                       78  |  71   5                       76
2   512  75  64                  651  | 580  21  44                  645
3    15   5   1   4               25  |  19   3   2   3               27
4     3   1   1   2   0            7  |   6   0   0   0   0            6
5     0   0   1   0   0   0        1  |   0   0   1   0   0   0        1
6     0   0   0   0   0   0   0    0  |   0   0   0   0   0   0   0    0

TA 98-H
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    40  21                       61  |  54   4                       58
2   482  88  83                  653  | 563  15  72                  650
3    17   8   7   9               41  |  24   4   3  10               41
4     2   0   2   0   1            5  |   3   0   0   1   0            4
5     1   0   0   0   0   2        3  |   0   1   0   0   1   1        3
6     0   0   0   0   0   0   0    0  |   0   0   0   0   0   0   0    0

TA 98-R
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    47  12                       59  |  54   3                       57
2   478  96  80                  654  | 569  17  64                  650
3    22   4   4  11               41  |  26   2   6   7               41
4     2   2   1   0   3            8  |   4   0   1   1   1            7
5     0   0   0   0   1   0        1  |   0   0   0   0   1   0        1
6     0   0   0   0   0   0   0    0  |   0   0   0   0   0   0   0    0

Table 2.3 (continued)

TA 100-N
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    36  11                       47  |  39   2                       41
2   409 135 101                  645  | 541  25  79                  645
3    24   9   9   9               51  |  33  11   0   6               50
4     3   3   3   5   4           18  |  12   2   3   0   1           18
5     1   0   0   0   0   0        1  |   0   0   1   0   0   0        1
6     0   0   0   1   0   0   0    1  |   0   0   0   0   0   0   0    0

TA 100-H
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    11   8                       19  |  16   2                       18
2   394 129 154                  677  | 530  21 123                  674
3    12  12   9  12               45  |  19   8   5  10               42
4     4   1   1   4   9           19  |   7   2   1   4   5           19
5     0   1   0   0   0   0        1  |   1   0   0   0   0   0        1
6     1   0   0   0   0   0   0    1  |   0   0   0   0   0   0   0    0

TA 100-R
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1     9  10                       19  |  18   3                       21
2   386 155 140                  681  | 539  24 113                  676
3    15   9   7  13               44  |  25   3   5   7               40
4     1   2   1   4   8           16  |   6   3   0   4   3           16
5     0   0   0   1   1   0        2  |   1   0   0   0   1   0        2
6     1   0   0   0   0   0   0    1  |   0   0   0   0   0   0   0    0

Table 2.3 (continued)

TA 1535-N
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    84  12                       96  |  93   0                       93
2   489  85  42                  616  | 569  14  34                  617
3    22   7   1   8               38  |  26   2   2   5               35
4     5   3   1   1   2           12  |   6   2   1   2   0           11
5     0   0   0   0   0   0        0  |   0   0   0   0   0   0        0
6     0   0   0   1   0   0   0    1  |   0   0   0   0   0   0   0    0

TA 1535-H
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    68  15                       83  |  82   2                       84
2   472  87  71                  630  | 556  18  47                  621
3    19   9   3  10               41  |  22  10   6   5               43
4     2   1   1   0   1            5  |   4   0   0   0   1            5
5     0   0   1   1   0   1        3  |   0   1   2   0   0   0        3
6     1   0   0   0   0   0   0    1  |   0   0   0   0   0   0   0    0

TA 1535-R
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    70  18                       88  |  81   3                       84
2   476  88  60                  624  | 554  18  50                  622
3    16  10   7   7               40  |  26   4   5   5               40
4     4   1   0   1   2            8  |   5   0   1   1   1            8
5     0   0   0   0   2   0        2  |   0   0   1   1   0   0        2
6     0   1   0   0   0   0   0    1  |   0   0   0   0   0   0   0    0

Table 2.3 (continued)

TA 1537-N
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    94  20                      114  | 105   0                      105
2   510  80  28                  618  | 592  10  20                  622
3    12   7   4   3               26  |  18   2   2   3               25
4     3   1   0   0   0            4  |   4   0   0   0   0            4
5     0   0   0   0   0   0        0  |   0   0   0   0   0   0        0
6     0   0   0   0   0   0   0    0  |   0   0   0   0   0   0   0    0

TA 1537-H
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    76  23                       99  |  95   1                       96
2   528  68  37                  633  | 579  15  32                  626
3    11   6   3   7               27  |  18   5   1   6               30
4     4   0   0   0   0            4  |   4   0   0   0   0            4
5     0   0   0   0   0   0        0  |   0   0   0   0   0   0        0
6     0   0   0   0   0   0   0    0  |   0   0   0   0   0   0   0    0

TA 1537-R
               Stat-Call                       Zeiger-Call
i\j   0   1   2   3   4   5   6  r_i  |   0   1   2   3   4   5   6  r_i
1    87  12                       99  |  92   3                       95
2   512  73  48                  633  | 583  10  33                  626
3    14   6   2   4               26  |  20   6   0   4               30
4     1   3   0   0   1            5  |   4   0   0   0   1            5
5     0   0   0   0   0   0        0  |   0   0   0   0   0   0        0
6     0   0   0   0   0   0   0    0  |   0   0   0   0   0   0   0    0

Table 2.4.A: The ML Estimator $\hat\theta = (\hat\pi, \hat p, \hat\tau)$ and the Covariance Matrix $I_Y(\hat\theta)^{-1}$ via EM Algorithm in Each Strain-Activation Set Based on the Stat-Call

TA 98-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1595       0.00423   -0.00972   -0.00181
    $\hat p$   = 0.7870                  0.02415    0.00430
    $\hat\tau$ = 0.0441       sym.                  0.00084

TA 98-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.2282       0.00233   -0.00336   -0.00103
    $\hat p$   = 0.7973                  0.00594    0.00162
    $\hat\tau$ = 0.0493       sym.                  0.00055

TA 98-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1837       0.00143   -0.00247   -0.00061
    $\hat p$   = 0.8550                  0.00548    0.00120
    $\hat\tau$ = 0.0659       sym.                  0.00035

Table 2.4.A (continued)

TA 100-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.3489       0.00368   -0.00310   -0.00159
    $\hat p$   = 0.6887                  0.00331    0.00141
    $\hat\tau$ = 0.0536       sym.                  0.00083

TA 100-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.3273       0.00165   -0.00150   -0.00077
    $\hat p$   = 0.8541                  0.00193    0.00083
    $\hat\tau$ = 0.0969       sym.                  0.00053

TA 100-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.3428       0.00291   -0.00254   -0.00140
    $\hat p$   = 0.8070                  0.00282    0.00133
    $\hat\tau$ = 0.1026       sym.                  0.00086

Table 2.4.A (continued)

TA 1535-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0827       0.00032   -0.00100   -0.00011
    $\hat p$   = 0.9376                  0.00538    0.00049
    $\hat\tau$ = 0.0788       sym.                  0.00011

TA 1535-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1471       0.00103   -0.00228   -0.00044
    $\hat p$   = 0.8953                  0.00657    0.00115
    $\hat\tau$ = 0.0762       sym.                  0.00029

TA 1535-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1751       0.00115   -0.00187   -0.00042
    $\hat p$   = 0.7846                  0.00444    0.00078
    $\hat\tau$ = 0.0607       sym.                  0.00024

TA 1537-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1034       0.00303   -0.00890   -0.00105
    $\hat p$   = 0.7027                  0.02942    0.00313
    $\hat\tau$ = 0.0584       sym.                  0.00043

TA 1537-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1034       0.00093   -0.00289   -0.00036
    $\hat p$   = 0.8472                  0.01162    0.00127
    $\hat\tau$ = 0.0533       sym.                  0.00021

TA 1537-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0855       0.00038   -0.00129   -0.00014
    $\hat p$   = 0.9382                  0.00673    0.00064
    $\hat\tau$ = 0.0659       sym.                  0.00012

Table 2.4.B: The ML Estimator $\hat\theta = (\hat\pi, \hat p, \hat\tau)$ and the Covariance Matrix $I_Y(\hat\theta)^{-1}$ via EM Algorithm in Each Strain-Activation Set Based on the Zeiger-Call

TA 98-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0962       0.00047   -0.00170   -0.00016
    $\hat p$   = 0.8274                  0.00917    0.00074
    $\hat\tau$ = 0.0073       sym.                  0.00008

TA 98-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1321       0.00018   -0.00010   -0.00001
    $\hat p$   = 0.9365                  0.00063    0.00005
    $\hat\tau$ = 0.0101       sym.                  0.00002

TA 98-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1390       0.00022   -0.00024   -0.00003
    $\hat p$   = 0.8484                  0.00145    0.00009
    $\hat\tau$ = 0.0010       sym.                  0.00001

TA 100-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1150
    $\hat p$   = 1.0000       (see footnote 7)
    $\hat\tau$ = 0.0348

TA 100-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.2139       0.00026   -0.00009   -0.00002
    $\hat p$   = 0.9338                  0.00038    0.00005
    $\hat\tau$ = 0.0174       sym.                  0.00003

TA 100-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.2018       0.00026   -0.00010   -0.00002
    $\hat p$   = 0.9153                  0.00048    0.00005
    $\hat\tau$ = 0.0123       sym.                  0.00002

⁷ When the ML estimator is obtained at the boundary of the parameter space, usual asymptotic results fail to hold (see Chernoff, 1954, and Feder, 1968, 1978).

Table 2.4.B (continued)

TA 1535-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0721       0.00013   -0.00026   -0.00002
    $\hat p$   = 0.8518                  0.00272    0.00010
    $\hat\tau$ = 0.0071       sym.                  0.00001

TA 1535-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0905       0.00019   -0.00046   -0.00004
    $\hat p$   = 0.8922                  0.00339    0.00023
    $\hat\tau$ = 0.0203       sym.                  0.00004

TA 1535-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.1254       0.00031   -0.00061   -0.00006
    $\hat p$   = 0.7748                  0.00316    0.00022
    $\hat\tau$ = 0.0014       sym.                  0.00003

TA 1537-N
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0418       0.00010   -0.00051   -0.00002
    $\hat p$   = 0.8622                  0.00728    0.00021
    $\hat\tau$ = 0.0048       sym.                  0.00001

TA 1537-H
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0554       0.00009   -0.00014   -0.00001
    $\hat p$   = 0.9637                  0.00151    0.00007
    $\hat\tau$ = 0.0136       sym.                  0.00001

TA 1537-R
    $\hat\theta$                $I_Y(\hat\theta)^{-1}$
    $\hat\pi$  = 0.0534
    $\hat p$   = 1.0000       (see footnote 8)
    $\hat\tau$ = 0.0122

⁸ The ML estimator at the boundary of the parameter space does not have the usual asymptotic results. See footnote 7.

CHAPTER III

MIXTURE OF THE MULTINOMIAL DISTRIBUTIONS

In section 1 of this chapter discussion focuses on the goodness

of fit test for a binomial distribution against alternative parametric

mixtures of binomials. We show that the locally optimal test

for detecting extra-binomial variability from a mixture alternative

can exist in the one-way layout. In section 2, where the beta-binomial

discussed in section 1 is generalized, we develop a random

effects model for a one-way layout contingency table by employing a

Dirichlet mixture of multinomial distributions. We then discuss

three tests of whether the random effects are negligible.

3.1 Binomial Case

In the binomial distribution of James Bernoulli the n Bernoulli

trials are assumed to be independent and the success probability is

constant throughout these n trials. It has been frequently observed

in practice, for example in biological experiments, that either one

or both assumptions are not satisfied. Sometimes these observations

can be explained or understood by investigating the underlying

mechanism. For instance, for the number of cavities among children

the 'success' probability may differ from child to child due to

differences in nutrition and other factors; hence, independence among

those cavities in a child may not be maintained if the probability is

viewed as a random variable itself. For other examples of deviations

from the binomial conditions see Chatfield and Goodhardt (1970),

Griffiths (1973), Haseman and Soares (1976) and Haseman and Kupper

(1978) .

The inconstancy of the success probability caused heterogeneity

of Student's yeast cell count data (Student, 1907), which led K.

Pearson (1915) to consider a mixture of two binomials. A mixture of

two binomials has three parameters, which may not be sufficiently

parsimonious in a small data set as an alternative to the one-parameter

binomial distribution. In what follows we focus on a well-known

two-parameter generalization of the binomial, the beta-binomial

distribution, which can be described as follows:

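Let X have a binomial distribution $B(n, u)$, $0 < u < 1$, and let U itself be a random variable that has a beta distribution $\mathrm{Beta}(\alpha, \beta)$ with parameters $\alpha > 0$ and $\beta > 0$, i.e., U has the p.d.f.

    $g(u) = u^{\alpha-1}(1-u)^{\beta-1} / Be(\alpha, \beta)$,  $0 < u < 1$,  $\alpha > 0$, $\beta > 0$,

where

    $Be(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx$,  $a > 0$, $b > 0$.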

Since the mean and variance of U are

    $E(U) = \alpha/(\alpha+\beta) = p$,  $(q = 1 - p)$,
    $\mathrm{Var}(U) = p(1-p)/(\alpha+\beta+1)$,    (3.3)

respectively, it is useful to reparametrize $\alpha$ and $\beta$ into

    $p = \alpha/(\alpha+\beta)$,
    $\theta = 1/(\alpha+\beta)$.    (3.4)

Then $\theta = 0$ implies that Var(U) = 0, which reduces a beta-binomial into an ordinary binomial distribution. With the new parametrization (3.4) it can be shown that the marginal distribution of X can be represented by

    $h(x) = \binom{n}{x} Be\left(\frac{p}{\theta} + x,\ \frac{q}{\theta} + n - x\right) \Big/ Be\left(\frac{p}{\theta},\ \frac{q}{\theta}\right)$

    $= \binom{n}{x} \prod_{t=0}^{x-1}(p + t\theta) \prod_{t=0}^{n-x-1}(q + t\theta) \Big/ \prod_{t=0}^{n-1}(1 + t\theta)$.    (3.5)

Using previous notation, (3.5) can be expressed as

    $X \sim B(n, u) \wedge_u \mathrm{Beta}(p, \theta)$.    (3.6)
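The product form in (3.5) is convenient for computation because it needs no beta or gamma functions and remains valid at $\theta = 0$. A minimal sketch follows; the function name `beta_binomial_pmf` is hypothetical.

```python
from math import comb


def beta_binomial_pmf(x, n, p, theta):
    """Beta-binomial p.m.f. in the (p, theta) parametrization of (3.5)."""
    q = 1.0 - p
    num = 1.0
    for t in range(x):          # prod_{t=0}^{x-1} (p + t*theta)
        num *= p + t * theta
    for t in range(n - x):      # prod_{t=0}^{n-x-1} (q + t*theta)
        num *= q + t * theta
    den = 1.0
    for t in range(n):          # prod_{t=0}^{n-1} (1 + t*theta)
        den *= 1.0 + t * theta
    return comb(n, x) * num / den
```

Setting `theta=0.0` returns the ordinary binomial probability $\binom{n}{x}p^x q^{n-x}$, in line with the remark that $\theta = 0$ removes the extra variation.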

The beta-binomial distribution in the form of the beta mixture

of binomials in (3.6) appears to have been first introduced by

Skellam (1948). It may be noted, however, that the probability mass

function (3.5) of the beta-binomial distribution has been recognized

since 1923 as the Polya-Eggenberger urn model with stochastic replacements,

which includes the binomial and hypergeometric as special cases.

We refer to Johnson and Kotz (1977, Ch.4, 1969) for a detailed

description of the Polya-Eggenberger distribution, and for various

other nomenclatures of the beta-binomial distribution.

Since Skellam (1948) introduced the beta-binomial distribution,

many authors have employed it in the analysis of biological data, most

noteworthy among them being: Kemp and Kemp (1956), Williams (1975),

Crowder (1978), Feder (1978) and Segreti and Munson (1981).

We may illustrate the difference between the ordinary binomial,

the beta-binomial and other proposed binomial generalizations using

the following example.

In certain toxicological experiments with animals the outcome of

interest is the occurrence of a dead fetus in a litter that receives

a certain treatment. A typical example would be the dominant lethal

test (Haseman and Soares, 1976) to determine the mutagenicity of a compound. Let

    $Y_{ijk} = 1$ if the k-th fetus in the j-th litter receiving the i-th treatment is dead,
    $Y_{ijk} = 0$ otherwise,    (3.7)

and

    $X_{ij} = \sum_{k=1}^{n_{ij}} Y_{ijk}$    (3.8)

for $k = 1, \ldots, n_{ij}$, $j = 1, \ldots, \ell$, $i = 1, \ldots, I$.

Then by assuming that the litter-specific probability of fetal death has a beta distribution we have

    $X_{ij} \sim B(n_{ij}, u_i) \wedge_{u_i} \mathrm{Beta}(p_i, \theta_i)$.    (3.9)

Now the beta-binomial model in (3.9) is partially analogous to the random effects model under normal theory

    $Y^*_{ijk} = \eta_{ij} + \epsilon_{ijk}$,    (3.10)

where $Y^*_{ijk}$ denotes the observed response, the $\eta_{ij}$'s are random effects that are independent $N(\mu, \sigma_1^2)$ and the $\epsilon_{ijk}$'s are independent $N(0, \sigma_2^2)$ errors of observation and are independent of $\eta_{ij}$.

In the random effects model (3.10) $Y^*_{ijk}$ and $Y^*_{ijk'}$ for $k \ne k'$ are conditionally independent given $\eta_{ij}$, but $Y^*_{ijk}$ and $Y^*_{ijk'}$ are unconditionally dependent with correlation $\sigma_1^2/(\sigma_1^2 + \sigma_2^2)$. Hence with the beta-binomial model one introduces intra-litter correlation, but it is always positive.

The correlated binomial model developed by Kupper and Haseman

(1978) is more flexible than the beta-binomial in the sense that it

can allow negative intra-litter correlation. Using Bahadur's technique (Bahadur, 1961) they 'corrected' the ordinary binomial to incorporate the intra-litter dependence and arrived at a probability mass function (p.m.f.)

    $h_c(x_{ij}) = \binom{n_{ij}}{x_{ij}} p_i^{x_{ij}} q_i^{n_{ij}-x_{ij}} \left\{ 1 + \frac{\tau_i}{2p_i^2 q_i^2}\left[(x_{ij} - n_{ij}p_i)^2 + x_{ij}(2p_i - 1) - n_{ij}p_i^2\right] \right\}$,    (3.11)

where $\tau_i = \mathrm{Cov}(Y_{ijk}, Y_{ijk'})$ for $k \ne k'$ and $q_i = 1 - p_i$.

The possible range of $\rho_i = \tau_i/(p_i q_i)$ is calculated in Kupper and Haseman (1978) for several choices of $p_i$ and $n_{ij}$.

In the same vein Altham (1978) obtained¹ a multiplicative

generalization of the binomial using the multiplicative definition of

interaction for count data. Even though the multiplicative generalization

has a remarkable property that it belongs to the two-parameter

exponential family, it may not be so easy to work with as the other

two-parameter generalizations.

For a goodness of fit test of a binomial against a mixture alternative we may classify the data types into three categories under $H_0$:

    (A) A random sample: $X_1, \ldots, X_k \overset{iid}{\sim} B(n, p)$.

    (B) Non-iid case, no replication: $X_1, \ldots, X_k$ such that $X_i \overset{ind}{\sim} B(n_i, p)$ for $i = 1, \ldots, k$, and all $n_i$ distinct.

    (C) One-way layout: $X_{i1}, \ldots, X_{ir_i} \overset{iid}{\sim} B(n_i, p)$ for all $i = 1, \ldots, k$.    (3.12)

¹ Altham (1978) actually obtained two generalizations of the binomial. However, the additive generalization is equivalent to the correlated binomial discussed above.

For case (A) Wisniewski (1968) showed that the test based on the classical chi-square statistic

    $S_A = \sum_{i=1}^{k} (X_i - n\hat p)^2 / (n\hat p\hat q)$,

where $\hat p = \sum_{i=1}^{k} X_i/(kn)$ and $\hat q = 1 - \hat p$, is the locally most powerful (LMP) test having Neyman structure against a wide class of mixture alternatives. It can be further shown that the test based on $S_A$ is locally most powerful unbiased (LMPU) against the same class of mixture alternatives. (See Appendix I for the proof.)
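For a concrete reading of $S_A$, the following sketch computes the statistic for a random sample of counts with a common $n$. It assumes the pooled estimate $\hat p$ is the total number of successes divided by the total number of trials $kn$; the function name `chi_square_sa` is hypothetical.

```python
def chi_square_sa(xs, n):
    """Classical chi-square statistic S_A for counts xs, each out of n trials."""
    k = len(xs)
    p_hat = sum(xs) / (k * n)   # pooled success proportion (assumed form)
    q_hat = 1.0 - p_hat
    return sum((x - n * p_hat) ** 2 for x in xs) / (n * p_hat * q_hat)
```

For example, `chi_square_sa([0, 4], 4)` has $\hat p = 0.5$, each squared deviation equal to 4, and denominator $4 \times 0.5 \times 0.5 = 1$, so the statistic equals 8; identical counts give 0.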

Potthoff and Whittinghill (1966a) derived the LMP test against the beta-binomial alternative in case (B) when $p$ is known; their test rejects the null hypothesis for large values of

    $S_B = \frac{1}{p}\sum_{i=1}^{k} X_i(X_i - 1) + \frac{1}{q}\sum_{i=1}^{k} (n_i - X_i)(n_i - X_i - 1)$.

When $p$ is unknown in case (B), Wisniewski (1968) proposed a test based on

    $S_B^* = \sum_{i=1}^{k} (X_i - n_i\hat p)^2 / (\hat p\hat q)$.    (3.13)

Recently Tarone (1979) showed that the test based on $S_B^*$ is the corresponding $C(\alpha)$ test against the class of general mixture alternatives, and hence it is asymptotically locally most powerful.

As will be shown below, the general mixture alternative suggested

by Wisniewski is broad enough to include both the beta-binomial, and

the correlated binomial when the correlation is positive. Hence fur-

ther discussion of the detection of extra-binomial variability from

mixture alternatives can be focused on the general mixture alternative

suggested by Wisniewski.

Definition 3.1: Suppose a random variable X has a mixture distribution of the following form:

(3.14)

for $0 < u < 1$, where U is a random variable having a p.d.f. $g(\cdot)$ with mean $p$ and finite variance $\sigma^2$. Then the mixture distribution (3.14) is called the Wisniewski-type general mixture.

Even though Kupper and Haseman (1978) derived the correlated

binomial in an attempt to 'correct' the ordinary binomial, it is useful to derive the p.m.f. (3.11) of the correlated binomial when $\theta_i > 0$ from the Wisniewski-type general mixture of binomials. It is obvious that the beta-binomial distribution belongs to the class of Wisniewski-type general mixtures.

Lemma 3.1: Up to the order of $\sigma^2$, the p.m.f. of the Wisniewski-type general mixture corresponding to (3.14) is equivalent to the correlated binomial distribution when the correlation is positive.

Proof. If we make a change of variable by putting $U = p(1 + cV)$, where $c = \sigma/p$, (3.14) becomes

    $X \sim \binom{n}{x} p^x q^{n-x} \int (1 + cv)^x (1 - cpv/q)^{n-x} g^*(v)\,dv$,    (3.15)

where $g^*(\cdot)$ is the p.d.f. of the standardized random variable V, and $q = 1 - p$.

Let $h(x)$ denote the marginal p.m.f. of X. Then

    $h(x) = \binom{n}{x} p^x q^{n-x} \left\{ 1 + \frac{\sigma^2}{2pq}\left[\frac{x(x-1)}{p} + \frac{(n-x)(n-x-1)}{q} - n(n-1)\right] + O(\sigma^3) \right\}$

    $= \binom{n}{x} p^x q^{n-x} \left\{ 1 + \frac{\sigma^2}{2p^2q^2}\left[(x - np)^2 + x(2p - 1) - np^2\right] + O(\sigma^3) \right\}$.    (3.16)

Thus after deleting $O(\sigma^3)$ terms, (3.16) is equivalent to the correlated binomial when the correlation is positive.  □
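Lemma 3.1 can be checked numerically for the beta mixing distribution. The sketch below compares the exact beta-binomial p.m.f. (3.5) with the second-order expression in (3.16), taking $\sigma^2 = \mathrm{Var}(U) = pq\theta/(1+\theta)$ from (3.3)-(3.4). The function names and the particular values $n = 10$, $p = 0.3$, $\theta = 0.001$ are illustrative assumptions only.

```python
from math import comb


def beta_binomial_pmf(x, n, p, theta):
    """Exact beta-binomial p.m.f., product form of (3.5)."""
    q = 1.0 - p
    num = 1.0
    for t in range(x):
        num *= p + t * theta
    for t in range(n - x):
        num *= q + t * theta
    den = 1.0
    for t in range(n):
        den *= 1.0 + t * theta
    return comb(n, x) * num / den


def second_order_pmf(x, n, p, sigma2):
    """Binomial p.m.f. times the second-order correction factor of (3.16)."""
    q = 1.0 - p
    bracket = (x - n * p) ** 2 + x * (2 * p - 1) - n * p * p
    return comb(n, x) * p**x * q ** (n - x) * (1 + sigma2 / (2 * p * p * q * q) * bracket)


n, p, theta = 10, 0.3, 0.001
sigma2 = p * (1 - p) * theta / (1 + theta)   # Var(U) under the beta mixing law

# Largest absolute error of the expansion, and of the plain binomial (sigma2 = 0).
err_expansion = max(
    abs(beta_binomial_pmf(x, n, p, theta) - second_order_pmf(x, n, p, sigma2))
    for x in range(n + 1)
)
err_binomial = max(
    abs(beta_binomial_pmf(x, n, p, theta) - second_order_pmf(x, n, p, 0.0))
    for x in range(n + 1)
)
```

The bracketed term has binomial expectation zero, so the second-order expression still sums to one over $x = 0, \ldots, n$.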

Recently, using the $C(\alpha)$ procedure, Tarone (1979) was led to the same test statistic $S_B^*$ in (3.13) against the correlated binomial, beta-binomial and the Wisniewski-type general mixture alternative. Tarone's result of having the same $C(\alpha)$ test statistic $S_B^*$ against those three alternatives is seen to be an immediate consequence of Lemma 3.1.

In detecting extra-binomial variability from mixture alternatives the locally optimal test has been derived only for the iid case. This is not, however, the situation for the closely related Poisson case. Collings and Margolin (1983) derive a LMPU test that detects negative binomial departure from the Poisson in the one-way layout case by extending Potthoff and Whittinghill's result in the iid case (Potthoff and Whittinghill, 1966b). In the following lemma we provide a necessary and sufficient condition on the mixing distribution for the existence of an LMPU test for extra-binomial variability from a mixture departure in a one-way layout.

Lemma 3.2: Suppose the mean success probabilities are unknown. Let the null hypothesis $H_0$ and the alternative hypothesis $H_a$ be represented as follows;

    $j = 1, \ldots, r_i$,  $i = 1, \ldots, k$    (3.17)

where $U_1, U_2, \ldots, U_k$ are independent random variables and $U_i$ has a p.d.f. $g_i(\cdot)$ with mean $p_i$ and finite variance $\xi\sigma_i^2$ for $i = 1, \ldots, k$. Then the LMPU test of $H_0$ against $H_a$ exists if and only if $\sigma_i^2$ is a constant multiple of $p_i^2 q_i^2$ for $i = 1, \ldots, k$.

Proof. Using transformations $U_i = p_i(1 + c_i V_i)$, where $c_i = \sqrt{\xi}\,\sigma_i/p_i$, (3.17) becomes

    $X_{ij} \sim \binom{n_i}{x_{ij}} p_i^{x_{ij}} q_i^{n_i - x_{ij}} \int (1 + c_i v_i)^{x_{ij}} (1 - c_i p_i v_i/q_i)^{n_i - x_{ij}} g_i^*(v_i)\,dv_i$,    (3.18)

where $g_i^*(\cdot)$ is a p.d.f. of $V_i$.

Under the null hypothesis $H_0: \xi = 0$, $S(X) = (X_{1+}, \ldots, X_{k+})$, where $X_{i+} = \sum_{j=1}^{r_i} X_{ij}$ for $i = 1, \ldots, k$, is complete and sufficient for the unknown parameter $p = (p_1, \ldots, p_k)$. We attempt to construct a test having Neyman structure. The conditional likelihood of $X = (X_1, \ldots, X_k)$, where $X_i = (X_{i1}, \ldots, X_{ir_i})$ for $i = 1, \ldots, k$, given the sufficient statistic $S(X)$ is

    $\prod_{i=1}^{k}\prod_{j=1}^{r_i} \binom{n_i}{x_{ij}} \Big/ \prod_{i=1}^{k} \binom{n_i r_i}{x_{i+}}$    (3.19)

under $H_0$, and

    $A \left\{ 1 + \xi \sum_{i=1}^{k} \frac{\sigma_i^2}{2p_i^2 q_i^2} \sum_{j=1}^{r_i} \left[(x_{ij} - n_i p_i)^2 + x_{ij}(2p_i - 1) - n_i p_i^2\right] + O(\xi^{3/2}) \right\} \prod_{i=1}^{k}\prod_{j=1}^{r_i} \binom{n_i}{x_{ij}}$    (3.20)

under $H_a$, where A is a quantity depending on the data only through $S(X)$.

Since the conditional likelihood ratio, i.e., the ratio of (3.20) to (3.19), depends on the unspecified parameters, no uniformly most powerful test of Neyman structure exists.

By the Neyman-Pearson fundamental lemma the locally most powerful test criterion having Neyman structure becomes the ratio of (3.20) to (3.19) as $\xi \to 0$, which is

    $S_c = \sum_{i=1}^{k} \frac{\sigma_i^2}{2p_i^2 q_i^2} \sum_{j=1}^{r_i} \left[(x_{ij} - n_i p_i)^2 + x_{ij}(2p_i - 1) - n_i p_i^2\right]$.    (3.21)

However, $S_c$ cannot be a test statistic unless the dependence on $\sigma_i^2/(p_i^2 q_i^2)$ is removed. This dependence is removed if and only if

    $\sigma_i^2 = a\,p_i^2 q_i^2$,  $i = 1, 2, \ldots, k$,    (3.22)

where $a$ is a constant.

Now under (3.22) the LMP test of Neyman structure based on $S_c$ is equivalent to a test based on

    $S_c^* = \sum_{i=1}^{k}\sum_{j=1}^{r_i} X_{ij}^2$.

The argument in Appendix I can be extended to the one-way layout data to show that the LMP test of Neyman structure based on $S_c^*$ is LMPU.  □

Remark: It is difficult to find a class of mixing distributions that satisfy (3.22); the beta does not.

3.2 Random Effects Model of One-Way Layout

In the last section, we discussed mixture models for binomial

distributions, so as to allow random variation of the success probability.

We extend the discussion to multinomial distributions and

focus on a Dirichlet mixture of multinomials alternative. This section

consists of a model specification to accommodate random effects in a

multinomial model, the development of an asymptotically optimal test

statistic for detecting these random effects by using Neyman's $C(\alpha)$ procedure (Neyman, 1959), the null and alternative distribution of the test statistic, and the large-sample comparison of the $C(\alpha)$ statistic with the classical chi-square statistic. Because of Lemma 3.2, which includes the case of beta-binomial distributions, it is difficult to find a locally optimal test for detecting a Dirichlet mixture departure from a multinomial in the non-iid case. Even in the iid case the local optimality of a test that detects Wisniewski-type general mixture departures from a binomial is not preserved in the multinomial case. This is shown in Appendix II in the case of a Dirichlet mixture of multinomials. A Monte Carlo simulation of the power comparison of the $C(\alpha)$ statistic and the chi-square statistic is presented. Finally a duality between the $C(\alpha)$ statistic and Light and Margolin's Catanova statistic is discussed (Light and Margolin, 1971; Margolin and Light, 1974).

3.2.1 Dirichlet-Multinomial Model

We consider the multinomial as a generalization of the binomial and

consider a product of multinomials likelihood. Experimentally this can

arise from a situation in which we have G unordered experimental groups

and I unordered response categories with n+j observations taken in group

j for j=l, ... ,G. Data from such sampling can be represented in the

following contingency table;

                           Group
    Response       1        2       ...     G        Response Total
    1              n_11     n_12    ...     n_1G     n_1+
    2              n_21     n_22    ...     n_2G     n_2+
    .              .        .               .        .
    .              .        .               .        .
    I              n_I1     n_I2    ...     n_IG     n_I+
    Group Total    n_+1     n_+2    ...     n_+G     n_++

Table 3.1

I x G Contingency Table

Let the j-th group response vector, given the group total, be denoted by $n_j' = (n_{1j}, \ldots, n_{I-1,j})$. One natural way of imposing random group effects on the j-th group response vector is to generalize the multinomial distribution by allowing the group probability vector itself to have a Dirichlet distribution. Thus we have

    $(n_j \mid U_j, n_{+j}) \overset{ind}{\sim} M(n_{+j}, U_j)$    (3.23)

for $j = 1, \ldots, G$ and

    $U_1, U_2, \ldots, U_G \overset{iid}{\sim} D(\beta)$.    (3.24)

From (3.24) it can be easily shown that the means, variances, and covariances of $U_j' = U' = (U_1, \ldots, U_{I-1})$ are

    $E(U_i) = \beta_i/B$,
    $\mathrm{Var}(U_i) = \beta_i(B - \beta_i)/[B^2(B+1)]$,
    $\mathrm{Cov}(U_i, U_j) = -\beta_i\beta_j/[B^2(B+1)]$,  $i \ne j$,    (3.25)

for $i, j = 1, \ldots, I-1$, where $B = \sum_{i=1}^{I} \beta_i$.

It is useful to change the parameters by putting

    $p_i = \beta_i/B$ for $i = 1, \ldots, I-1$,
    $\theta = 1/B$.    (3.26)

Then it is an easy exercise to show that the marginal distribution of $n_j$, which is a Dirichlet mixture of multinomial distributions, has a p.m.f.

    $h(n_j; p, \theta) = \binom{n_{+j}}{n_{1j}, \ldots, n_{I-1,j}} \prod_{i=1}^{I}\prod_{r=0}^{n_{ij}-1}(p_i + r\theta) \Big/ \prod_{r=0}^{n_{+j}-1}(1 + r\theta)$,    (3.27)

where $n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij}$ for $j = 1, \ldots, G$.

We refer to a Dirichlet mixture of multinomials (3.27) as a Dirichlet-multinomial distribution and denote it by

    $n_j \sim DM(n_{+j}, p, \theta)$,    (3.28)

where $p = (p_1, \ldots, p_{I-1})'$, which is symbolically described as

    $n_j \sim M(n_{+j}, U_j) \wedge_{U_j} D(p, \theta)$.

Thus as an extension of a product of multinomial distributions the joint distribution of $(n_1, \ldots, n_G)$ becomes a product of Dirichlet-multinomial distributions, i.e.,

    $n_j \overset{ind}{\sim} DM(n_{+j}, p, \theta)$,  $j = 1, \ldots, G$.    (3.29)

Since a Dirichlet-multinomial distribution is a multi-dimensional extension of a beta-binomial distribution, there are several other terminologies (Johnson and Kotz, 1969, 1977).

3.2.2 Test of the Random Effects

In the product Dirichlet-multinomial model (3.29) $\theta$ becomes the

parameter of interest for testing the existence of random group effects,

because if $\theta = 0$ the model reduces to a product of multinomials; this

is a device we and others have employed to allow a single parameter to

introduce random effects. Thus the null hypothesis $H_0$ of no random

effects and the alternative hypothesis $H_a$ of the existence of random

effects can be expressed as

$$H_0: \theta = 0 \quad \text{versus} \quad H_a: \theta > 0. \tag{3.30}$$

Based on the one-way layout contingency table in Table 3.1 the log-likelihood

function of $\theta$, apart from the additive constant, is given by

$$\ell(\theta) = \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \sum_{r=0}^{n_{ij}-1} \log(p_i + r\theta) - \sum_{r=0}^{n_{+j}-1} \log(1 + r\theta) \Big\}. \tag{3.31}$$
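The log-likelihood (3.31) is simple to evaluate directly. The following is a minimal sketch, not part of the original Fortran work; the function name `dm_loglik` and the layout of `table` (rows are response categories, columns are groups) are assumptions made for illustration.

```python
import math

def dm_loglik(theta, p, table):
    """Log-likelihood (3.31) of theta, up to the additive constant.

    table[i][j] holds n_ij; p is the full vector (p_1, ..., p_I).
    Names and data layout are illustrative assumptions.
    """
    n_rows, n_cols = len(table), len(table[0])
    total = 0.0
    for j in range(n_cols):
        n_plus_j = sum(table[i][j] for i in range(n_rows))
        for i in range(n_rows):
            # sum_{r=0}^{n_ij - 1} log(p_i + r * theta)
            total += sum(math.log(p[i] + r * theta) for r in range(table[i][j]))
        # sum_{r=0}^{n_+j - 1} log(1 + r * theta)
        total -= sum(math.log(1.0 + r * theta) for r in range(n_plus_j))
    return total

example_table = [[3, 1], [2, 4]]   # I = 2 categories, G = 2 groups
example_p = [0.4, 0.6]
loglik_at_null = dm_loglik(0.0, example_p, example_table)
```

At theta = 0 the expression reduces to the product multinomial log-likelihood, the sum of n_ij log p_i, which gives a quick sanity check.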

3.2.2.1 Case of p Known

It is easy to show that the uniformly most powerful (UMP) test for

$H_0$ versus $H_a$ does not exist in this case. However, the locally most powerful (LMP) test of

Potthoff and Whittinghill (1966a) rejects $H_0$ for large values of

$$\frac{\partial \ell(\theta)}{\partial \theta}\Big|_{\theta=0} = \frac{1}{2} \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} - n_{+j}(n_{+j}-1) \Big\} \propto \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} \equiv T_1, \tag{3.32}$$

where $n_{Ij} = n_{+j} - \sum_{i=1}^{I-1} n_{ij}$ and $p_I = 1 - \sum_{i=1}^{I-1} p_i$.

Potthoff and Whittinghill (1966a) proposed a method of moments approximation

to the null distribution of $T_1$ by finding constants e, f, and g

that satisfied

$$e\,T_1 + f \sim \chi^2(g), \tag{3.33}$$

where $\chi^2(g)$ refers to a chi-square random variable with g degrees of

freedom; however, by expressing $T_1$ in (3.32) in terms of a quadratic

form we can suggest another approximation of the null distribution of

$T_1$. To aid in the development, we introduce some useful results without

proofs. Proofs can be found in Ronning (1982).

Lemma 3.3: Under $H_0$ the covariance matrix of $\mathbf{n}_j = (n_{1j}, \ldots, n_{I-1,j})'$

is given by

$$\operatorname{Cov}(\mathbf{n}_j) = n_{+j} V, \tag{3.34}$$

where $D_{p_i} = \operatorname{diag}(p_1, \ldots, p_{I-1})$, and $V = D_{p_i} - \mathbf{p}\mathbf{p}'$.

Lemma 3.4: Let $V$ and $D_{p_i}$ be defined as in (3.34). Then

$$V^{-1} = D_{p_i}^{-1} + (1/p_I) E, \tag{3.35}$$

where $E$ is an $(I-1) \times (I-1)$ matrix consisting of ones only.

Lemma 3.5: Let $\mathbf{Z}_j$ be an $(I-1)$-dimensional vector with elements

$z_{ij} = \sqrt{n_{+j}}\,(n_{ij}/n_{+j} - p_i)$ for i = 1, ..., I-1, j = 1, ..., G; then

$$\mathbf{Z}_j' V^{-1} \mathbf{Z}_j = \sum_{i=1}^{I} (n_{ij} - n_{+j} p_i)^2 / (n_{+j} p_i)$$

is Pearson's chi-square statistic for goodness of fit in the j-th group.

Hence $\mathbf{Z}_j' V^{-1} \mathbf{Z}_j$ has an asymptotic chi-square distribution with I-1 degrees

of freedom under $H_0$.
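Lemmas 3.4 and 3.5 can be checked numerically on a small example. The sketch below is illustrative only: it assumes V = D_p - pp' built from the first I-1 probabilities, and the helper names are hypothetical.

```python
def v_matrix(p_head):
    """V = D_p - p p' built from the first I-1 probabilities."""
    k = len(p_head)
    return [[(p_head[a] if a == b else 0.0) - p_head[a] * p_head[b]
             for b in range(k)] for a in range(k)]

def v_inverse(p_head):
    """Closed form of Lemma 3.4: V^{-1} = D_p^{-1} + (1/p_I) E."""
    p_last = 1.0 - sum(p_head)
    k = len(p_head)
    return [[(1.0 / p_head[a] if a == b else 0.0) + 1.0 / p_last
             for b in range(k)] for a in range(k)]

def quadratic_form(counts, p):
    """Z_j' V^{-1} Z_j of Lemma 3.5 for one group; counts has I entries."""
    n = sum(counts)
    z = [(counts[i] - n * p[i]) / n ** 0.5 for i in range(len(p) - 1)]
    v_inv = v_inverse(p[:-1])
    return sum(z[a] * v_inv[a][b] * z[b]
               for a in range(len(z)) for b in range(len(z)))

def pearson_goodness_of_fit(counts, p):
    """Pearson's chi-square for one group with known p (all I categories)."""
    n = sum(counts)
    return sum((counts[i] - n * p[i]) ** 2 / (n * p[i]) for i in range(len(p)))

p = [0.2, 0.3, 0.5]
counts = [5, 9, 6]
lhs = quadratic_form(counts, p)
rhs = pearson_goodness_of_fit(counts, p)
```

Here `lhs` and `rhs` agree up to rounding, which is the identity stated in Lemma 3.5.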

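Simple calculation can show that the test based on $T_1$ in (3.32) is

equivalent to the test based on

$$T_1^* = \sum_{j=1}^{G} \Big[\mathbf{n}_j - n_{+j}\mathbf{p} + \tfrac{I}{2}\big(\mathbf{p} - \tfrac{1}{I}\mathbf{1}\big)\Big]' V^{-1} \Big[\mathbf{n}_j - n_{+j}\mathbf{p} + \tfrac{I}{2}\big(\mathbf{p} - \tfrac{1}{I}\mathbf{1}\big)\Big], \tag{3.36}$$

where $\mathbf{1}$ is an $(I-1) \times 1$ vector of ones only and $I \geq 2$. Thus by use of

Lemma 3.5 we can derive the following results:

(1) When the $n_{+j}$'s are all equal and $\mathbf{p} = (1/I)\mathbf{1}$, $T_1^*$ is equivalent to

Pearson's chi-square statistic.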

(2) If we assume that there exist $\alpha_j$, $0 < \alpha_j < 1$ for j = 1, ..., G such

that $\alpha_j = \lim_{n_{++} \to \infty} n_{+j}/n_{++}$, then the limiting distribution of $(1/n_{++}) T_1^*$

is of the form

$$\frac{1}{n_{++}} T_1^* \xrightarrow[H_0]{\mathcal{V}} \sum_{j=1}^{G} \alpha_j \chi_j^2(I-1, \delta), \tag{3.37}$$

where $\{\chi_j^2(I-1, \delta);\ j = 1, \ldots, G\}$ is a set of independent noncentral

chi-square random variables with I-1 degrees of freedom and noncentrality

parameter $\delta = \frac{I^2}{4} \sum_{i=1}^{I-1} (p_i - \frac{1}{I})^2$.

(3) In the special case of equal $n_{+j}$'s we have

$$\frac{G}{n_{++}} T_1^* \xrightarrow[H_0]{\mathcal{V}} \chi^2(G(I-1), G\delta),$$

where $\delta$ is defined in (3.37).

3.2.2.2 Case of p Unknown

The case of unknown $\mathbf{p}$ is far more interesting, especially in terms

of applicability to real problems. It is shown in Appendix II that a

locally optimal test for testing $H_0: \theta = 0$ versus $H_a: \theta > 0$ does not

exist. However, a $C(\alpha)$ test is readily available.

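In order to derive the $C(\alpha)$ test statistic we need the following

partial derivatives of the log-likelihood $\ell(\theta)$ of (3.31) evaluated at

$\theta = 0$:

$$\phi_1(\mathbf{p}) = \frac{\partial \ell}{\partial \theta}\Big|_{\theta=0} = \frac{1}{2} \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} - n_{+j}(n_{+j}-1) \Big\}, \tag{3.38}$$

$$\phi_{2i}(\mathbf{p}) = \frac{\partial^2 \ell}{\partial \theta\, \partial p_i}\Big|_{\theta=0} = \sum_{j=1}^{G} \Big\{ -\frac{n_{ij}(n_{ij}-1)}{2p_i^2} + \frac{\big(n_{+j} - \sum_{i=1}^{I-1} n_{ij}\big)\big(n_{+j} - \sum_{i=1}^{I-1} n_{ij} - 1\big)}{2p_I^2} \Big\} \tag{3.39}$$

for i = 1, 2, ..., I-1,

$$\phi_3(\mathbf{p}) = \frac{\partial^2 \ell}{\partial \theta^2}\Big|_{\theta=0} = -\frac{1}{6} \sum_{j=1}^{G} \Big\{ \sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)(2n_{ij}-1)}{p_i^2} - n_{+j}(n_{+j}-1)(2n_{+j}-1) \Big\}. \tag{3.40}$$

Since $E[n_{ij}(n_{ij}-1)] = n_{+j}(n_{+j}-1)p_i^2$ under the multinomial model, $E_0[\phi_{2i}(\mathbf{p})] = 0$

for i = 1, ..., I-1,

where $E_0$ implies that the expectation is taken under $\theta = 0$. Neyman

(1959) (see also Moran, 1970) has shown that when $E_0[\phi_{2i}(\mathbf{p})] = 0$, the

null hypothesis can be tested using the statistic $\phi_1(\hat{\mathbf{p}})$, where $\hat{\mathbf{p}}$ is a

root-$n_{++}$ consistent estimator of $\mathbf{p}$. An obvious choice for $\hat{\mathbf{p}}$ is the

MLE $\hat{\mathbf{p}} = n_{++}^{-1} \sum_{j=1}^{G} \mathbf{n}_j$ under $H_0$. Substituting the MLE $\hat{\mathbf{p}}$ in (3.38) we

obtain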

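$$2\phi_1(\hat{\mathbf{p}}) = \sum_{j=1}^{G} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}]' \hat{V}^{-1} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}] - (I-1)n_{++}$$

$$= n_{++} \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{+j}^2}{n_{i+}} \Big[\frac{n_{ij}}{n_{+j}} - \frac{n_{i+}}{n_{++}}\Big]^2 - (I-1)n_{++}, \tag{3.41}$$

where $\hat{V} = D_{\hat{p}_i} - \hat{\mathbf{p}}\hat{\mathbf{p}}'$. Hence we see that the $C(\alpha)$ test is based on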

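$$T_k = \frac{1}{n_{++}} \sum_{j=1}^{G} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}]' \hat{V}^{-1} [\mathbf{n}_j - n_{+j}\hat{\mathbf{p}}] \tag{3.42}$$

$$= \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{+j}^2}{n_{i+}} \Big[\frac{n_{ij}}{n_{+j}} - \frac{n_{i+}}{n_{++}}\Big]^2. \tag{3.43}$$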

In determining the approximate null distribution of $T_k$ two

limiting results are available. One uses the central limit theorem

(CLT) on the iid multinomial random vectors as the sample size G tends

to infinity. In this limiting argument, $T_k$, properly normalized, has

an asymptotic N(0,1) distribution by the result of Neyman's $C(\alpha)$

procedure (Neyman, 1959). Since $E_0[\phi_{2i}(\mathbf{p})] = 0$ for i = 1, ..., I-1,

the variance of $\phi_1(\hat{\mathbf{p}})$ is estimated by $-E_0[\phi_3(\hat{\mathbf{p}})]$. From (3.40) it

follows that

$$-E_0[\phi_3(\mathbf{p})] = \tfrac{1}{2}(I-1) \sum_{j=1}^{G} n_{+j}(n_{+j}-1).$$

Since $T_k = n_{++}^{-1}\, 2\phi_1(\hat{\mathbf{p}}) + (I-1)$, by normalizing $T_k$, we find that under

$H_0: \theta = 0$ the statistic

$$\frac{n_{++}^2\,[T_k - (I-1)]^2}{2(I-1) \sum_{j=1}^{G} n_{+j}(n_{+j}-1)} \tag{3.44}$$

has an asymptotic chi-square distribution with 1 degree of freedom.

We may consider another limiting argument that uses the multivariate

normal approximation of the multinomial distribution when the

number of groups, G, is held fixed and the group sizes $\{n_{+j}\}_{j=1}^{G}$ tend

to infinity in such a manner that $n_{+j}/n_{++} \to \alpha_j$, $0 < \alpha_j < 1$, for

j = 1, ..., G. In the following discussion, the approximate null and

alternative distributions are based on this limiting argument, which

may better reflect practical experimental considerations where the

number of groups is fixed; we conjecture that these results will

provide a better sampling approximation for finite sample sizes.

The hypothesis test $H_0: \theta = 0$ versus $H_a: \theta > 0$ has been described

as detection of a Dirichlet-multinomial departure from the multinomial

distribution. For this purpose two other test statistics that have

been proposed for fixed effects problems are worthy of consideration:

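$$X_P^2 = \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{(n_{ij} - n_{i+}n_{+j}/n_{++})^2}{n_{i+}n_{+j}/n_{++}}, \tag{3.45}$$

$$C = (n_{++} - 1)(I - 1)\, \mathrm{BSS}/\mathrm{TSS}, \tag{3.46}$$

where $X_P^2$ is Pearson's chi-square statistic and C is the Catanova

statistic suggested by Light and Margolin (1971, also Margolin and

Light, 1974), with BSS and TSS the between-group and total sums of squares discussed in Section 3.2.6. For the relations among these three statistics, $T_k$, C and $X_P^2$, we observe that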

(i) When $n_{+j} = n$ for j = 1, ..., G, the test based on $T_k$ is identical to the test based on $X_P^2$.

(ii) When I = 2, C is equivalent to $X_P^2$ (Light and Margolin, 1971).

Hence when $n_{+j} = n$ for all j and I = 2, these three statistics are

equivalent.
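Relation (i) is easy to confirm on a small table. The sketch below is illustrative and takes T_k in the computing form that also appears in (3.95), namely the sum over i of (1/n_i+) times the sum over j of n_ij squared, minus (1/n_++) times the sum over j of n_+j squared; the function names are assumptions.

```python
def margins(table):
    """Row totals n_i+, column totals n_+j and the grand total n_++."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return rows, cols, sum(rows)

def pearson_chi_square(table):
    """Pearson's chi-square for an I x G table (rows = responses)."""
    rows, cols, n = margins(table)
    total = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            total += (table[i][j] - expected) ** 2 / expected
    return total

def c_alpha_statistic(table):
    """T_k = sum_i (1/n_i+) sum_j n_ij^2 - (1/n_++) sum_j n_+j^2."""
    rows, cols, n = margins(table)
    first = sum(x * x / r for row, r in zip(table, rows) for x in row)
    return first - sum(c * c for c in cols) / n

# Equal group sizes (each column sums to 10), so G * T_k should equal X_P^2.
balanced = [[3, 5, 2], [7, 5, 8]]
t_k = c_alpha_statistic(balanced)
x_p = pearson_chi_square(balanced)
```

With equal group sizes the two statistics differ only by the constant factor G, so the tests are identical; with unequal sizes the weights n_+j / n_++ no longer cancel.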

For the comparison of the three statistics in terms of large sample

behavior, we obtain the asymptotic relative efficiency (ARE) of $X_P^2$

relative to $T_k$. The ARE of C relative to $T_k$ does not turn out to be a

tractable form. Later we discuss duality between $T_k$ and C.

3.2.3 Approximate Null and Alternative Distributions

3.2.3.1 Approximate Null Distributions

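We define the following notation for j = 1, 2, ..., G:

$$\mathbf{Z}_j' \equiv \mathbf{Z}_j'(\mathbf{p}) = n_{+j}^{-1/2}\,(n_{1j} - n_{+j}p_1, \ldots, n_{I-1,j} - n_{+j}p_{I-1}) \tag{3.47}$$

$$\hat{\mathbf{p}}' = (\hat{p}_1, \ldots, \hat{p}_{I-1}) = \Big(\frac{n_{1+}}{n_{++}}, \ldots, \frac{n_{I-1,+}}{n_{++}}\Big) \tag{3.48}$$

$$\hat{\mathbf{Z}}_j = \mathbf{Z}_j(\hat{\mathbf{p}}) \tag{3.49}$$

$$\mathbf{Z}' = (\mathbf{Z}_1', \ldots, \mathbf{Z}_G') \tag{3.50}$$

$$\hat{\mathbf{Z}}' = (\hat{\mathbf{Z}}_1', \ldots, \hat{\mathbf{Z}}_G') \tag{3.51}$$

$$\sqrt{\mathbf{M}} = \Big[\sqrt{n_{+1}/n_{++}}, \sqrt{n_{+2}/n_{++}}, \ldots, \sqrt{n_{+G}/n_{++}}\Big]' \tag{3.52}$$

$$\mathbf{M} = \Big[\frac{n_{+1}}{n_{++}}, \frac{n_{+2}}{n_{++}}, \ldots, \frac{n_{+G}}{n_{++}}\Big]' \tag{3.53}$$

$$\alpha_j = \lim \frac{n_{+j}}{n_{++}} \quad \text{as } n_{+j} \text{ and } n_{++} \text{ tend to infinity} \tag{3.54}$$

$$\sqrt{\mathbf{A}} = (\sqrt{\alpha_1}, \ldots, \sqrt{\alpha_G})' \tag{3.55}$$

$$\mathbf{A} = (\alpha_1, \ldots, \alpha_G)' \tag{3.56}$$

We may express $\sqrt{\mathbf{A}} = \lim_{n_{++} \to \infty} \sqrt{\mathbf{M}}$ and $\mathbf{A} = \lim_{n_{++} \to \infty} \mathbf{M}$.

It is well known that as $n_{+j} \to \infty$

$$\mathbf{Z}_j \xrightarrow[H_0]{\mathcal{V}} N(\mathbf{0}, V) \tag{3.57}$$

for j = 1, ..., G, where

$$V = V(\mathbf{p}) = D_{p_i} - \mathbf{p}\mathbf{p}'.$$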

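Also we note for j = 1, ..., G

$$\hat{\mathbf{Z}}_j = \mathbf{Z}_j - \sqrt{n_{+j}}\,(\hat{\mathbf{p}} - \mathbf{p}) = \mathbf{Z}_j - \Big(\frac{n_{+j}}{n_{++}}\Big)^{1/2} \sum_{k=1}^{G} \Big(\frac{n_{+k}}{n_{++}}\Big)^{1/2} \mathbf{Z}_k. \tag{3.58}$$

By using the above we can express $\hat{\mathbf{Z}}$ in terms of $\mathbf{Z}$ as

$$\hat{\mathbf{Z}} = \big[(I_G - \sqrt{\mathbf{M}}\sqrt{\mathbf{M}}') \otimes I_{I-1}\big]\, \mathbf{Z}, \tag{3.59}$$

where $I_k$ is a $k \times k$ identity matrix and $\otimes$ stands for the Kronecker

product. The asymptotic distribution of $\mathbf{Z}$ can be obtained by using

(3.57) and the independence of $\mathbf{Z}_1, \ldots, \mathbf{Z}_G$:

$$\mathbf{Z} \xrightarrow[H_0]{\mathcal{V}} N(\mathbf{0}, I_G \otimes V). \tag{3.60}$$

Hence by using (3.59) and the idempotency of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}')$ we obtain

$$\hat{\mathbf{Z}} \xrightarrow[H_0]{\mathcal{V}} N(\mathbf{0}, (I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes V). \tag{3.61}$$

Now, Pearson's chi-square statistic $X_P^2$ can be expressed as

$$X_P^2 = \sum_{j=1}^{G} \hat{\mathbf{Z}}_j' \hat{V}^{-1} \hat{\mathbf{Z}}_j = \hat{\mathbf{Z}}'(I_G \otimes \hat{V}^{-1})\hat{\mathbf{Z}}. \tag{3.62}$$

For further discussion, the following lemma is useful.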

Lemma 3.6: Under $H_0$, $\hat{V} = V + o_p(1)$.

Proof. Using the maximum absolute column sum $\|\cdot\|_1$ for the matrix norm

we haveA 1-1 A A 1-1 A AII V-VIl l = max L--l IP'PI - P.PII = I·-1IP.Pc P.PIIl::=;;i::=;;I-l 1- 1 1 1- 1 1

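where $p_I = 1 - \sum_{i=1}^{I-1} p_i$ and $\hat{p}_I$ is accordingly defined. Since

$n_{i+} \sim B(n_{++}, p_i)$ under $H_0$, $\hat{p}_i = p_i + o_p(1)$ as $n_{++} \to \infty$ for i = 1, ..., I-1. Thus by

continuity the result follows. □

Thus by Lemma 3.6, (3.62) can be written as

$$X_P^2 = \hat{\mathbf{Z}}'(I_G \otimes V^{-1})\hat{\mathbf{Z}}\,(1 + o_p(1)). \tag{3.63}$$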

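By invoking a theorem on quadratic forms it can be seen that $X_P^2$ is

asymptotically distributed as

$$X_P^2 \xrightarrow[H_0]{\mathcal{V}} \sum_{i=1}^{(I-1)G} \lambda_i^* \chi_i^2(1), \tag{3.64}$$

where $\{\lambda_i^*;\ i = 1, \ldots, (I-1)G\}$ is the set of eigenvalues of

$$(I_G \otimes V^{-1})\big[(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes V\big] = (I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes I_{I-1} \tag{3.65}$$

and $\{\chi_i^2(1);\ i = 1, \ldots, (I-1)G\}$ are iid chi-square random variables with 1

degree of freedom. The eigenvalues of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}') \otimes I_{I-1}$ are cross

products of eigenvalues of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}')$ and those of $I_{I-1}$. Since $I_{I-1}$

has an eigenvalue 1 with multiplicity I-1, (3.64) is equivalent to

$$X_P^2 \xrightarrow[H_0]{\mathcal{V}} \sum_{i=1}^{G} \rho_i \chi_i^2(I-1), \tag{3.66}$$

where the $\rho_i$'s are eigenvalues of $(I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}')$.

Since $I_G - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}'$ is idempotent of rank G-1 we have (G-1) ones and one

zero for its eigenvalues. Thus (3.66) becomes

$$X_P^2 \xrightarrow[H_0]{\mathcal{V}} \chi^2((I-1)(G-1)), \tag{3.67}$$

a well known result.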

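We now consider the null distribution of the $C(\alpha)$ statistic $T_k$,

which can be expressed as

$$T_k = \sum_{j=1}^{G} \frac{n_{+j}}{n_{++}}\, \hat{\mathbf{Z}}_j' \hat{V}^{-1} \hat{\mathbf{Z}}_j = \sum_{j=1}^{G} \Big[\Big(\frac{n_{+j}}{n_{++}}\Big)^{1/2} \hat{\mathbf{Z}}_j\Big]' \hat{V}^{-1} \Big[\Big(\frac{n_{+j}}{n_{++}}\Big)^{1/2} \hat{\mathbf{Z}}_j\Big]. \tag{3.68}$$

For notational convenience we define

$$\hat{\mathbf{Z}}_j^* = \Big(\frac{n_{+j}}{n_{++}}\Big)^{1/2} \hat{\mathbf{Z}}_j, \tag{3.69}$$

$$\hat{\mathbf{Z}}^{*\prime} = (\hat{\mathbf{Z}}_1^{*\prime}, \ldots, \hat{\mathbf{Z}}_G^{*\prime}). \tag{3.70}$$

By the same arguments for obtaining the distribution of $\hat{\mathbf{Z}}$ in (3.59)

we obtain:

$$\hat{\mathbf{Z}}^* \xrightarrow[H_0]{\mathcal{V}} N(\mathbf{0}, (D_{\alpha_i} - \mathbf{A}\mathbf{A}') \otimes V), \tag{3.71}$$

$$D_{\alpha_i} = \operatorname{diag}(\alpha_1, \ldots, \alpha_G). \tag{3.72}$$

Now using $\hat{\mathbf{Z}}^*$ and Lemma 3.6, we can express $T_k$ as

$$T_k = \hat{\mathbf{Z}}^{*\prime}(I_G \otimes V^{-1})\hat{\mathbf{Z}}^*\,(1 + o_p(1)). \tag{3.73}$$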

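Thus, using the same arguments employed in (3.64)-(3.66) the asymptotic

distribution of $T_k$ under $H_0$ is obtained as

$$T_k \xrightarrow[H_0]{\mathcal{V}} \sum_{j=1}^{G} \lambda_j \chi_j^2(I-1), \tag{3.74}$$

where $\{\lambda_j;\ j = 1, \ldots, G\}$ is the set of eigenvalues of $(D_{\alpha_i} - \mathbf{A}\mathbf{A}')$. We may note here that $n(D_{\alpha_i} - \mathbf{A}\mathbf{A}')$ is the singular covariance matrix of a multinomial distribution $M(n, \mathbf{A})$.

Even though some computer subroutines can readily provide the

eigenvalues of $(D_{\alpha_i} - \mathbf{A}\mathbf{A}')$, the determination of the eigenvalues appears

to be an algebraically unsolved problem except that one of the eigenvalues

is known to be zero (Roy et al., 1960; Light and Margolin, 1971;

and Ronning, 1982). Since the $\alpha_i$'s are known, however, we may approximate

the distribution of $T_k$ by $g\chi^2(h)$, where the constants g and h are

chosen so that $g\chi^2(h)$ has the same first two moments as those of $T_k$. In

doing this we use the following results on $D_{\alpha_i} - \mathbf{A}\mathbf{A}'$:

$$\operatorname{trace}(D_{\alpha_i} - \mathbf{A}\mathbf{A}') = \lambda_1 + \cdots + \lambda_{G-1} = 1 - \sum_{j=1}^{G} \alpha_j^2,$$

$$\operatorname{trace}(D_{\alpha_i} - \mathbf{A}\mathbf{A}')^2 = \lambda_1^2 + \cdots + \lambda_{G-1}^2 = \sum_{j=1}^{G} \alpha_j^2 - 2\sum_{j=1}^{G} \alpha_j^3 + \Big(\sum_{j=1}^{G} \alpha_j^2\Big)^2.$$

Thus the asymptotic distribution of $T_k$ can be approximated as

$$g^{-1} T_k \overset{H_0}{\approx} \chi^2(h), \tag{3.75}$$

where, on equating the means $(I-1)\sum_j \lambda_j = gh$ and the variances $2(I-1)\sum_j \lambda_j^2 = 2g^2h$,

$$g = \frac{\sum_{j=1}^{G} \alpha_j^2 - 2\sum_{j=1}^{G} \alpha_j^3 + (\sum_{j=1}^{G} \alpha_j^2)^2}{1 - \sum_{j=1}^{G} \alpha_j^2}, \tag{3.76}$$

$$h = \frac{(I-1)\,(1 - \sum_{j=1}^{G} \alpha_j^2)^2}{\sum_{j=1}^{G} \alpha_j^2 - 2\sum_{j=1}^{G} \alpha_j^3 + (\sum_{j=1}^{G} \alpha_j^2)^2}. \tag{3.77}$$

3.2.3.2 Approximate Alternative Distributions

We next derive the asymptotic distribution of $X_P^2$ and $T_k$ under $H_a$.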

Here we use the remarkable resemblance of the mean and covariance

matrix of the Dirichlet-multinomial to those of the multinomial distribution

(Mosimann, 1962):

$$E_\theta(\mathbf{n}_j) = n_{+j}\mathbf{p}, \quad j = 1, \ldots, G,$$

$$\operatorname{Cov}_\theta(\mathbf{n}_j) = \Big[\frac{n_{+j}\theta + 1}{\theta + 1}\Big] n_{+j} V, \quad j = 1, \ldots, G,$$

where the subscript $\theta$ indicates that the underlying distribution is the

Dirichlet-multinomial.

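It has been observed that there are four different asymptotic forms

of the Dirichlet-multinomial distribution (Paul and Plackett, 1978).

Among them, one is of particular relevance to our development.

Theorem 3.1: (Paul and Plackett, 1978). Let

$$\mathbf{n}_j \sim M(n_{+j}, \mathbf{U}) \underset{\mathbf{U}}{\wedge} D(\boldsymbol{\beta}) = M(n_{+j}, \mathbf{U}) \underset{\mathbf{U}}{\wedge} D(\mathbf{p}, \theta), \tag{3.78}$$

and the $p_i$'s and $\theta$ are defined in (3.26).

Write $\beta_i = n_{++}\phi_i$ for all i, where the $\phi_i$'s are fixed quantities, and let

$n_{++} \to \infty$. Then

$$n_{+j}^{-1/2}(\mathbf{n}_j - n_{+j}\mathbf{p}) \xrightarrow[H_a]{\mathcal{V}} N(\mathbf{0}, \gamma_j(\theta)V), \tag{3.79}$$

where $\gamma_j(\theta) = \lim (n_{+j}\theta + 1)$ as $n_{++} \to \infty$ and $n_{+j} \to \infty$.

We may note that by the construction of $\beta_i = n_{++}\phi_i$ for all i we assume

that $\theta = (\sum_{i=1}^{I} \beta_i)^{-1} = \theta_{n_{++}} = O(1/n_{++})$.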

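Hence using this result of Paul and Plackett, it is easy to see that

$$\mathbf{Z}_j \xrightarrow[H_a]{\mathcal{V}} N(\mathbf{0}, \gamma_j(\theta)V), \tag{3.80}$$

$$\mathbf{Z} \xrightarrow[H_a]{\mathcal{V}} N(\mathbf{0}, D_{\gamma_j} \otimes V), \tag{3.81}$$

where $D_{\gamma_j} = \operatorname{diag}(\gamma_1(\theta), \ldots, \gamma_G(\theta))$.

Thus by using (3.59) and (3.81) we obtain

$$\hat{\mathbf{Z}} \xrightarrow[H_a]{\mathcal{V}} N(\mathbf{0}, Q \otimes V), \tag{3.82}$$

where

$$Q = D_{\gamma_j} - \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}' D_{\gamma_j} - D_{\gamma_j}\sqrt{\mathbf{A}}\sqrt{\mathbf{A}}' + \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}' D_{\gamma_j} \sqrt{\mathbf{A}}\sqrt{\mathbf{A}}',$$

i.e., the (i,j) element of Q is

$$q_{ij} = \delta_{ij}\gamma_i - \sqrt{\alpha_i\alpha_j}\,\Big(\gamma_i + \gamma_j - \sum_{k=1}^{G} \alpha_k\gamma_k\Big),$$

with $\delta_{ij}$ the Kronecker delta.

Now it becomes straightforward to show that

$$X_P^2 \xrightarrow[H_a]{\mathcal{V}} \sum_{i=1}^{G} \delta_i \chi_i^2(I-1), \tag{3.83}$$

where $\{\delta_i;\ i = 1, \ldots, G\}$ is a set of eigenvalues of Q, and

$$T_k \xrightarrow[H_a]{\mathcal{V}} \sum_{i=1}^{G} \delta_i^* \chi_i^2(I-1), \tag{3.84}$$

where $\{\delta_i^*;\ i = 1, \ldots, G\}$ is a set of eigenvalues of $D_{\sqrt{\alpha_i}}\, Q\, D_{\sqrt{\alpha_i}}$.

3.2.4 ARE of $X_P^2$ Relative to $T_k$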

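To summarize the relevant distribution results, we have derived

the following:

(a) $X_P^2 \xrightarrow[H_0]{\mathcal{V}} \chi^2[(I-1)(G-1)]$

(b) $T_k \xrightarrow[H_0]{\mathcal{V}} \sum_{i=1}^{G} \lambda_i \chi_i^2(I-1)$,

where the $\lambda_i$'s are eigenvalues of $D_{\alpha_i} - \mathbf{A}\mathbf{A}'$.

(c) $X_P^2 \xrightarrow[H_a]{\mathcal{V}} \sum_{i=1}^{G} \delta_i \chi_i^2(I-1)$,

where the $\delta_i$'s are eigenvalues of Q.

(d) $T_k \xrightarrow[H_a]{\mathcal{V}} \sum_{i=1}^{G} \delta_i^* \chi_i^2(I-1)$,

where the $\delta_i^*$'s are eigenvalues of $D_{\sqrt{\alpha_i}}\, Q\, D_{\sqrt{\alpha_i}}$.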

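Thus it can be shown that

$$\operatorname{Var}(X_P^2 \mid H_0) \to 2(I-1)(G-1), \tag{3.85}$$

$$\operatorname{Var}(T_k \mid H_0) \to 2(I-1)\sum_{i=1}^{G} \lambda_i^2 = 2(I-1)\operatorname{trace}(D_{\alpha_i} - \mathbf{A}\mathbf{A}')^2 = 2(I-1)\Big[\sum_{j=1}^{G} \alpha_j^2 - 2\sum_{j=1}^{G} \alpha_j^3 + \Big(\sum_{j=1}^{G} \alpha_j^2\Big)^2\Big], \tag{3.86}$$

$$\frac{d}{d\theta} E_\theta[X_P^2 \mid H_a]\Big|_{\theta=0} \to (I-1)\Big[1 - \sum_{j=1}^{G} \alpha_j^2\Big] = (I-1)\operatorname{trace}(D_{\alpha_i} - \mathbf{A}\mathbf{A}'), \tag{3.87}$$

$$\frac{d}{d\theta} E_\theta[T_k \mid H_a]\Big|_{\theta=0} \to (I-1)\Big[\sum_{j=1}^{G} \alpha_j^2 - 2\sum_{j=1}^{G} \alpha_j^3 + \Big(\sum_{j=1}^{G} \alpha_j^2\Big)^2\Big] = (I-1)\operatorname{trace}(D_{\alpha_i} - \mathbf{A}\mathbf{A}')^2. \tag{3.88}$$

Hence the asymptotic relative efficiency (ARE) of the chi-square

statistic $X_P^2$ relative to the $C(\alpha)$ statistic $T_k$ is given by

$$e_{P|C} = \frac{\big(1 - \sum_{j=1}^{G} \alpha_j^2\big)^2}{(G-1)\big[\sum_{j=1}^{G} \alpha_j^2 - 2\sum_{j=1}^{G} \alpha_j^3 + (\sum_{j=1}^{G} \alpha_j^2)^2\big]}, \tag{3.89}$$

where under $H_a$: $\theta = \theta_{n_{++}} = O(1/n_{++})$.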

Interestingly, Collings and Margolin (1983) obtained the same

expression of an ARE as (3.89) when they compared a $C(\alpha)$ test with

another test for detecting a negative binomial departure from a Poisson

in the regression through the origin case. They proved the following:

Theorem 3.1: (Collings and Margolin, 1983)

$$\frac{1}{G-1} \leq e_{P|C} \leq 1,$$

where the left equality holds if and only if G = 2 and the right equality holds if and only if the group sizes $\{n_{+j}\}_{j=1}^{G}$ are asymptotically

balanced.

Using the expression of the ARE $e_{P|C}$ in (3.89) we can prove

Lemma 3.7: The $C(\alpha)$ test is asymptotically equivalent to Pearson's chi-square test if and only if G = 2 or all the group sizes $\{n_{+j}\}_{j=1}^{G}$ are

asymptotically balanced.

Proof. We may express $e_{P|C}$ as

$$e_{P|C} = \bar{\lambda}^2 / (s_\lambda^2 + \bar{\lambda}^2),$$

where $\bar{\lambda} = (G-1)^{-1}\sum_{i=1}^{G-1} \lambda_i$ and $s_\lambda^2 = (G-1)^{-1}\sum_{i=1}^{G-1} (\lambda_i - \bar{\lambda})^2$.

Thus $e_{P|C} = 1$ if and only if $s_\lambda^2 = 0$. But $s_\lambda^2 = 0$ if and only if G = 2 or

$\lambda_1 = \cdots = \lambda_{G-1}$, which is equivalent to $\alpha_1 = \cdots = \alpha_G$ (Light and Margolin,

1971, and Ronning, 1982). □
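The moment-matching constants of the $g\chi^2(h)$ approximation and the ARE depend on the group sizes only through two traces, so they can be computed without an eigenvalue routine. The sketch below is illustrative: it plugs the observed proportions n_+j / n_++ in for the limits (an assumption), and the function names are hypothetical.

```python
def trace_terms(group_sizes):
    """trace(D - AA') and trace((D - AA')^2), with n_+j / n_++ used for alpha_j."""
    n = sum(group_sizes)
    alpha = [m / n for m in group_sizes]
    s2 = sum(a ** 2 for a in alpha)
    s3 = sum(a ** 3 for a in alpha)
    return 1.0 - s2, s2 - 2.0 * s3 + s2 ** 2

def moment_match(group_sizes, n_categories):
    """Constants g and h such that g * chi-square(h) matches two moments of T_k."""
    tr1, tr2 = trace_terms(group_sizes)
    g = tr2 / tr1
    h = (n_categories - 1) * tr1 ** 2 / tr2
    return g, h

def are_pearson_vs_c_alpha(group_sizes):
    """ARE of Pearson's chi-square relative to T_k, in the trace form."""
    tr1, tr2 = trace_terms(group_sizes)
    return tr1 ** 2 / ((len(group_sizes) - 1) * tr2)

sizes = [20, 20, 20, 200, 400]
g, h = moment_match(sizes, n_categories=4)
efficiency = are_pearson_vs_c_alpha(sizes)
```

For the group sizes 20, 20, 20, 200, 400 used in the simulation of Section 3.2.5, `efficiency` comes out at about 0.416, in line with the value 0.415 quoted there; balanced sizes, or any two groups, give exactly 1.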

3.2.5 Monte Carlo Simulation: Power Comparison

As shown above, the test based on $T_k$ is superior to Pearson's

chi-square test based on considerations of asymptotic relative efficiency;

however, the large sample properties do not necessarily hold for

small samples, nor are the local properties of the asymptotic relative

efficiency readily transferable to practical situations. Therefore, a

Monte Carlo simulation was conducted to compare the performance of the

two tests in terms of their sizes and powers.

The data for the Monte Carlo simulation were generated on the VAX

780 computer system at the National Institute of Environmental Health

Sciences. The program was written in Fortran and used two IMSL subroutines:

GGAMR and GGMTN.

The following lemma is useful to generate random observations from

a Dirichlet distribution, say $D(\mathbf{p}, \theta)$.

Lemma 3.8 (Wilks, 1962): Let $X_1, X_2, \ldots, X_{k+1}$ be independent variables

having gamma distributions $G(1, \beta_1), G(1, \beta_2), \ldots, G(1, \beta_{k+1})$. Define

$$Y_i = X_i \Big/ \sum_{j=1}^{k+1} X_j$$

for i = 1, ..., k.

Then $\mathbf{Y} = (Y_1, \ldots, Y_k)$ has a Dirichlet distribution $D(\boldsymbol{\beta})$, where

$\boldsymbol{\beta} = (\beta_1, \ldots, \beta_{k+1})$, and $D(\boldsymbol{\beta})$ is defined in (1.16).

The Dirichlet distribution $D(\boldsymbol{\beta})$ can be reparametrized as $D(\mathbf{p}, \theta)$ by

(3.26). The Fortran program of the Monte Carlo simulation is outlined

as follows:

(i) Set $\mathbf{p} = \mathbf{p}_0$ and $\theta = \theta_0$, and initialize $\{n_{+j}\}_{j=1}^{G}$ and the

upper bound (upbound) of $\theta$.

(ii) Generate a set of G independent probability vectors

$\mathbf{U}_1, \ldots, \mathbf{U}_G$ from a Dirichlet distribution $D(\mathbf{p}, \theta)$ using IMSL subroutine

GGAMR and Lemma 3.8.

(iii) Generate a contingency table from a product of multinomial

distributions $\prod_{j=1}^{G} M(n_{+j}, \mathbf{U}_j)$ using IMSL subroutine GGMTN.

(iv) Calculate $T_k$ and $X_P^2$.

(v) Count the number of $T_k$ and $X_P^2$ values exceeding their cut-off

values corresponding to $\alpha = 0.05$.

(vi) Go to step (ii) and repeat 2,000 times.

(vii) Set $\theta = \theta_0 + \Delta$ and go to step (ii) until $\theta$ exceeds upbound.

For the calculation of sizes of the $T_k$ test and the $X_P^2$ test a subset consisting

of (iii)-(vi) of the above program was employed, because putting $\theta = 0$

in step (i) involved division by zero in step (ii).
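The steps above can be sketched in a few lines of Python in place of the original Fortran and IMSL routines. This is a hedged illustration, not the program that produced the tables below: the cut-off for T_k uses the $g\chi^2(h)$ approximation of Section 3.2.3.1, chi-square quantiles use the Wilson-Hilferty approximation, and all function names are assumptions.

```python
import random

Z_95 = 1.6449  # upper 5% point of the standard normal distribution

def chi2_upper5(df):
    """Wilson-Hilferty approximation to the upper 5% point of chi-square(df)."""
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + Z_95 * c ** 0.5) ** 3

def draw_dirichlet(p, theta, rng):
    """Lemma 3.8: normalize independent gammas with shapes beta_i = p_i / theta."""
    x = [rng.gammavariate(p_i / theta, 1.0) for p_i in p]
    total = sum(x)
    return [x_i / total for x_i in x]

def draw_table(p, theta, sizes, rng):
    """One I x G table; theta = 0 skips step (ii) and uses p directly."""
    columns = []
    for n in sizes:
        u = p if theta == 0 else draw_dirichlet(p, theta, rng)
        draws = rng.choices(range(len(p)), weights=u, k=n)
        columns.append([draws.count(i) for i in range(len(p))])
    return [[col[i] for col in columns] for i in range(len(p))]

def statistics(table):
    """Return (T_k, X_P^2) for a table whose rows are response categories."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    t_k = x_p = 0.0
    for i, r in enumerate(rows):
        if r == 0:
            continue  # an empty response category contributes nothing
        for j, c in enumerate(cols):
            d2 = (table[i][j] - r * c / n) ** 2
            t_k += d2 / r
            x_p += d2 * n / (r * c)
    return t_k, x_p

def rejection_rates(p, theta, sizes, reps, seed=0):
    """Fractions of tables rejected at the nominal 5% level by T_k and X_P^2."""
    rng = random.Random(seed)
    n = sum(sizes)
    alpha = [m / n for m in sizes]
    s2, s3 = sum(a ** 2 for a in alpha), sum(a ** 3 for a in alpha)
    tr1, tr2 = 1.0 - s2, s2 - 2.0 * s3 + s2 ** 2
    cut_t = (tr2 / tr1) * chi2_upper5((len(p) - 1) * tr1 ** 2 / tr2)
    cut_x = chi2_upper5((len(p) - 1) * (len(sizes) - 1))
    hits_t = hits_x = 0
    for _ in range(reps):
        t_k, x_p = statistics(draw_table(p, theta, sizes, rng))
        hits_t += t_k > cut_t
        hits_x += x_p > cut_x
    return hits_t / reps, hits_x / reps

p0 = [0.05, 0.1, 0.4, 0.45]
group_sizes = [20, 20, 20, 200, 400]
size_estimates = rejection_rates(p0, 0.0, group_sizes, reps=300)
power_estimates = rejection_rates(p0, 0.02, group_sizes, reps=300, seed=1)
```

The run uses far fewer replications than the 2,000 of the original study, so the estimates are noisy; they are meant only to show the mechanics of steps (i) through (vii).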

The actual program was run for two sets of $\mathbf{p}_0$ and $\theta$ ranges with the

same group sizes, which are listed in Table 3.2.

                 First Set                   Second Set

p_0              (0.05, 0.1, 0.4, 0.45)      (0.15, 0.2, 0.3, 0.35)

θ_0              0.001                       0.001

Δ                0.002                       0.003

Upbound          0.031                       0.025

Group Sizes      20, 20, 20, 200, 400        Same

Table 3.2

Two Sets of Input Values of the Program.

The asymptotic relative efficiency of $X_P^2$ to $T_k$ is 0.415 for these group

sizes. Tables 3.3 and 3.4, respectively, display approximate power

functions of the $T_k$ and $X_P^2$ tests for a 0.05 level based on the first and

the second sets of input values. Over the ranges of $\theta$ values considered

the difference in powers can be as large as 0.086 for the first set of

input values and 0.115 for the second set. The ratio of the power of

the $X_P^2$ test to that of the $T_k$ test falls as low as 0.76 in both cases

considered. Clearly, the $T_k$ test can perform better than the $X_P^2$ test.

Table 3.3 Approximate Powers of the $T_k$ Test and the $X_P^2$ Test for

0.05 Size, $\mathbf{p}_0 = (.05, .1, .40, .45)'$ and

$\{n_{+j}\}_{j=1}^{5} = \{20, 20, 20, 200, 400\}$

            Approximate Power

θ        T_k       X_P^2     difference

0.000    0.0525    0.0505    0.0020

0.001    0.1060    0.0885    0.0175

0.003    0.2445    0.1700    0.0745

0.005    0.3545    0.2685    0.0860

0.007    0.4535    0.3680    0.0855

0.009    0.5255    0.4470    0.0785

0.011    0.5860    0.5175    0.0685

0.013    0.6385    0.5855    0.0530

0.015    0.7030    0.6480    0.0550

0.017    0.7165    0.6645    0.0520

0.019    0.7555    0.7270    0.0285

0.021    0.7805    0.7600    0.0205

0.023    0.7950    0.7925    0.0025

0.025    0.8110    0.8100    0.0010

0.027    0.8250    0.8175    0.0075

0.029    0.8465    0.8410    0.0055

0.031    0.8640    0.8610    0.0030

Table 3.4 Approximate Powers of the $T_k$ Test and the $X_P^2$ Test for

0.05 Size, $\mathbf{p}_0 = (.15, .2, .3, .35)'$ and

$\{n_{+j}\}_{j=1}^{5} = \{20, 20, 20, 200, 400\}$

            Approximate Power

θ        T_k       X_P^2     difference

0.000    0.0535    0.0480    0.0055

0.001    0.1050    0.0780    0.0270

0.004    0.2900    0.2095    0.0805

0.007    0.4790    0.3640    0.1150

0.010    0.5825    0.4885    0.0940

0.013    0.6460    0.5840    0.0620

0.016    0.7265    0.6865    0.0400

0.019    0.7640    0.7390    0.0250

0.022    0.7870    0.7815    0.0055

0.025    0.8440    0.8335    0.0105

3.2.6 Duality Between C and $T_k$

Light and Margolin (1971) developed a categorical analysis of

variance (Catanova) procedure for data in an I×G contingency table.

They demonstrated that the measure of variation due to Gini could be

used to develop a measure $R^2$ of explained variation, which in turn

could be viewed as a qualitative analog to the coefficient of

determination for continuous data.

Following Gini's definition of the total variation, Light and

Margolin defined the total sum of squares (TSS), the within-group sum

of squares (WSS) and the between-group sum of squares (BSS) for the

replicated one-way classification under study as follows:

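$$\mathrm{TSS}_{i \circ j} = \frac{n_{++}}{2} - \frac{1}{2n_{++}} \sum_{i=1}^{I} n_{i+}^2, \tag{3.90}$$

$$\mathrm{WSS}_{i \circ j} = \frac{n_{++}}{2} - \frac{1}{2} \sum_{j=1}^{G} \frac{1}{n_{+j}} \sum_{i=1}^{I} n_{ij}^2, \tag{3.91}$$

$$\mathrm{BSS}_{i \circ j} = \mathrm{TSS}_{i \circ j} - \mathrm{WSS}_{i \circ j} = \frac{1}{2} \Big[\sum_{j=1}^{G} \frac{1}{n_{+j}} \sum_{i=1}^{I} n_{ij}^2 - \frac{1}{n_{++}} \sum_{i=1}^{I} n_{i+}^2\Big], \tag{3.92}$$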

where the index $i \circ j$ indicates that the row variables are random and are

being predicted from the fixed column variable. Based on these components

a measure $R_{i \circ j}^2$ of the proportion of variation in the row variable

attributable to the column variable was proposed:

$$R_{i \circ j}^2 = \mathrm{BSS}_{i \circ j} / \mathrm{TSS}_{i \circ j}. \tag{3.93}$$

Later Margolin and Light (1974) observed that the $R_{i \circ j}^2$ measure of

association and $t_a$, the sample version of Goodman-Kruskal's $\tau_a$, were

computationally identical. This observation led them to provide a

means of testing in the product multinomial model the hypothesis that

$\tau_a$ was equal to zero, a test for which Goodman-Kruskal's asymptotic

distribution result (Goodman and Kruskal, 1959) was not applicable.

The test statistic was

$$C_{i \circ j} = (n_{++} - 1)(I - 1) R_{i \circ j}^2 \overset{H_0}{\approx} \chi^2((I-1)(G-1)), \tag{3.94}$$

where $C_{i \circ j}$ is Light and Margolin's Catanova statistic and $\approx$ stands for

'is approximately distributed as'.

The $C(\alpha)$ statistic $T_k$ obtained in (3.42) can be rewritten as

$$T_k = \sum_{j=1}^{G} \sum_{i=1}^{I} \frac{n_{+j}^2}{n_{i+}} \Big[\frac{n_{ij}}{n_{+j}} - \frac{n_{i+}}{n_{++}}\Big]^2 = \sum_{i=1}^{I} \frac{1}{n_{i+}} \sum_{j=1}^{G} n_{ij}^2 - \frac{1}{n_{++}} \sum_{j=1}^{G} n_{+j}^2. \tag{3.95}$$

From (3.92) and (3.95) we observe that

$$T_k = 2\,\mathrm{BSS}_{j \circ i}, \tag{3.96}$$

where $\mathrm{BSS}_{j \circ i}$ is obtained from $\mathrm{BSS}_{i \circ j}$ by systematic interchange of

columns and rows. As a corollary to the above relationship (3.96) we have

Lemma 3.9: When we have only two grouping variables, i.e., G = 2, the

$C(\alpha)$ test based on $T_k$ is equivalent to the chi-square test based on $X_P^2$.

Proof. The $X_P^2$ and $T_k$ statistics can be rewritten, respectively, as

(3.97)

(3.98)

If i and j are interchanged, the argument provided by Light and

Margolin (1971) yields the result that

$$X_P^2 = \Big[\frac{n_{++}^2}{2\,n_{+1}n_{+2}}\Big] T_k, \tag{3.99}$$

where G = 2. □

Remark: This is stronger than one part of Lemma 3.7, i.e., the

asymptotic equivalence is now proven for all ratios of sample sizes.

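Since $\mathrm{TSS}_{j \circ i}$ depends only on the fixed group sizes and

is nonrandom, a test based on $T_k$ is equivalent to a test based on

$C_{j \circ i}$, i.e.,

$$T_k \propto \mathrm{BSS}_{j \circ i} / \mathrm{TSS}_{j \circ i} \propto C_{j \circ i}. \tag{3.100}$$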
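The duality (3.96) and the exact G = 2 relation of Lemma 3.9 can be verified numerically. The sketch below assumes the Light and Margolin between-group sum of squares in the form one half of [the sum over groups of (1/n_+j) times the sum of n_ij squared, minus (1/n_++) times the sum of n_i+ squared]; the function names are illustrative.

```python
def transpose(table):
    """Interchange rows and columns of a contingency table."""
    return [list(col) for col in zip(*table)]

def bss(table):
    """Between-group sum of squares; rows are responses, columns are groups."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    within_groups = sum(sum(x * x for x in col) / m
                        for col, m in zip(zip(*table), cols))
    return 0.5 * (within_groups - sum(r * r for r in rows) / n)

def c_alpha_statistic(table):
    """T_k in the computing form of (3.95)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return (sum(x * x / r for row, r in zip(table, rows) for x in row)
            - sum(c * c for c in cols) / n)

def pearson_chi_square(table):
    """Pearson's chi-square for the same table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(rows) for j, c in enumerate(cols))

three_groups = [[3, 5, 2], [7, 1, 8], [4, 4, 6]]
duality_gap = c_alpha_statistic(three_groups) - 2.0 * bss(transpose(three_groups))

two_groups = [[3, 5], [7, 1], [4, 9]]
n1, n2 = (sum(col) for col in zip(*two_groups))
scaled_t_k = (n1 + n2) ** 2 / (2.0 * n1 * n2) * c_alpha_statistic(two_groups)
```

`duality_gap` is zero up to rounding, and for the two-group table `scaled_t_k` reproduces Pearson's chi-square even though the group sizes are unequal.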

It can be shown that $R_{j \circ i}^2 = \mathrm{BSS}_{j \circ i}/\mathrm{TSS}_{j \circ i}$ is computationally equivalent

to $t_b$, an estimate of Goodman-Kruskal's $\tau_b$ (Goodman and Kruskal, 1954).

However, $R_{j \circ i}^2$ has a different operational interpretation from

$t_b$ (or $\tau_b$). $R_{j \circ i}^2$, or equivalently $T_k$, is based on the column-wise

product multinomial model.

A possible case in which $R_{j \circ i}^2$ can be interpreted as a measure of

association is discussed in Chapter 5.

Appendix I: Wisniewski-type Alternatives

The proof follows the arguments in Lehmann (1959) and Fraser (1957).

Let ¢(~) denote any test for detecting mixture departures from the

binomial distribution.

1. The power function 8¢(a) of any test is given by

under the Wisniewski-type general mixtures alternative (3.16). For a

fixed P 8¢(a) is continuous a = O. Hence any unbiased size-a test is

similar of size a.

2. L~=lXi is the complete sufficient statistic under HO. Thus any

similar test of size a has a Neyman structure with respect to I~=lXi.

3. As Potthoff and Whittinghill (1966,a) noted, a most powerful test

of Neyman structure is necessarily unbiased. Hence the locally most

powerful test of Neyman structure is necessarily LMPU.

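Appendix II: The Dirichlet-Multinomial Alternatives

Under $H_0: \theta = 0$, $(n_{1+}, \ldots, n_{I-1,+})$ is the complete sufficient statistic

for the unknown probability vector $\mathbf{p}$. By conditioning on the

sufficient statistic we attempt to find the locally optimal test of

Neyman structure as $\theta \to 0$. Under $H_0$ the conditional likelihood of the

data $\mathbf{n} = (\mathbf{n}_1, \ldots, \mathbf{n}_G)$ given the sufficient statistic $(n_{1+}, \ldots, n_{I-1,+})$ has

a 'generalized multivariate hypergeometric'¹ distribution, which is

given by

$$P_{H_0}\{\mathbf{n} \mid (n_{1+}, \ldots, n_{I+})\} = \prod_{j=1}^{G} \binom{n_{+j}}{n_{1j}, \ldots, n_{Ij}} \Big/ \binom{n_{++}}{n_{1+}, \ldots, n_{I+}}. \tag{A1}$$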

For the development of the conditional likelihood under $H_a$, the following

lemma is useful.

Lemma A.1: Under $H_a$

$$P_{H_a}\{(n_{1+}, \ldots, n_{I-1,+})\} = \binom{n_{++}}{n_{1+}, \ldots, n_{I+}} \prod_{i=1}^{I} \prod_{r=0}^{n_{i+}-1} \Big(p_i + r\frac{\theta}{G}\Big) \Big/ \prod_{r=0}^{n_{++}-1} \Big(1 + r\frac{\theta}{G}\Big).$$

Proof. Using the multi-urn (G urns) extension of the urn models with

stochastic replacements that generates a multivariate Polya-Eggenberger

distribution and noting the equivalence of the multivariate

Polya-Eggenberger distribution to the Dirichlet-multinomial distribution,

the result readily follows.

Hence under $H_a$, using Lemma A.1, we obtain the conditional

likelihood of $\mathbf{n}$ given $(n_{1+}, \ldots, n_{I+})$ as

$$P_{H_a}\{\mathbf{n} \mid (n_{1+}, \ldots, n_{I+})\} = A \Big[\prod_{j=1}^{G} \binom{n_{+j}}{n_{1j}, \ldots, n_{Ij}} \prod_{i=1}^{I} \prod_{r=0}^{n_{ij}-1} (p_i + r\theta) \Big/ \prod_{r=0}^{n_{+j}-1} (1 + r\theta)\Big], \tag{A3}$$

where A is a quantity depending on the data only through the

sufficient statistic.

¹This is a generalization of the multivariate hypergeometric distribution discussed in the context of urn models in Johnson and Kotz (1977).

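Now, with some algebra (A3) can be rewritten as

$$A \prod_{j=1}^{G} \binom{n_{+j}}{n_{1j}, \ldots, n_{Ij}} \prod_{i=1}^{I} p_i^{n_{ij}} \Big\{1 + \frac{\theta}{2}\Big[\sum_{i=1}^{I} \frac{n_{ij}(n_{ij}-1)}{p_i} - n_{+j}(n_{+j}-1)\Big] + O(\theta^2)\Big\}$$

$$= A \Big[\prod_{j=1}^{G} \binom{n_{+j}}{n_{1j}, \ldots, n_{Ij}} \prod_{i=1}^{I} p_i^{n_{ij}}\Big] \Big\{1 + \frac{\theta}{2}\Big[\sum_{i=1}^{I} \sum_{j=1}^{G} \frac{n_{ij}(n_{ij}-1)}{p_i} - \sum_{j=1}^{G} n_{+j}(n_{+j}-1)\Big] + O(\theta^2)\Big\}. \tag{A4}$$

Thus the LMP test of Neyman structure, if it exists, has critical

region based on the large values of the ratio (A4) to (A1), which is

$$A\Big\{1 + \frac{\theta}{2} \sum_{i=1}^{I} \sum_{j=1}^{G} \frac{n_{ij}(n_{ij}-1)}{p_i} + o(\theta^2)\Big\} + \text{constant} = A\Big\{1 + \frac{\theta}{2} T_1 + o(\theta^2)\Big\} + \text{constant},$$

where $T_1$ is given by (3.32).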

The test criterion $T_1$, which is equivalent to $T_1^*$ in (3.36), involves

unspecified parameters $V^{-1}$. □

Remark: Even for the iid case, because of the multiparameter

$\mathbf{p} = (p_1, \ldots, p_{I-1})'$, the dependence on $V^{-1}$ in (A4) cannot be removed.

Thus the result of Wisniewski that there is an LMP test of Neyman

structure for Wisniewski-type general mixture alternatives does not

extend to the multi-dimensional generalization.

CHAPTER IV

BALANCED NESTED MIXED EFFECTS MODEL

4.1 Introduction

The one-way layout random effects model of Chapter III can be extended

within the framework of the Dirichlet-multinomial distribution to a balanced

nested mixed effects model in which the row variable has fixed effects

and the replications within each level of the row variable have random

effects. An example of a balanced nested mixed effects model for discrete

data may be obtained by modifying an example concerning anneals and

tinplates in Scheffe (1959, p. 178). While the tinplates are regarded as a

random sample from a large population, the anneals are not, the interest

being in individual performance of anneal treatments on a common number of

tinplates in terms of various levels of corrosion resistance. Now, however,

we consider a qualitative response with I levels instead of a quantitative

one.

Let $n_{ijk}$ be the number of observations that were classified into the

i-th level of response for the k-th replication within the j-th level of

the row variable for i = 1, ..., I, j = 1, ..., R and k = 1, ..., C. Because

the random effect is nested within the fixed effects, the $\{n_{ijk}\}$ do not

constitute a true three-dimensional contingency table. Nevertheless, the

data would probably be reported in the form of such a table and might be

analyzed via a Pearson chi-square test by an unthinking statistician. The

data might also be viewed as a three-dimensional table if there were an

attempt at blocking of experimental units, but in fact, the experimental

units were actually a source of random effects. We loosely refer to this

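data as a three-dimensional contingency table. Denote the probability

vectors corresponding to the R levels of the row variable by $\boldsymbol{\pi}_1, \boldsymbol{\pi}_2, \ldots, \boldsymbol{\pi}_R$,

where $\boldsymbol{\pi}_j = (\pi_{1j}, \pi_{2j}, \ldots, \pi_{I-1,j})' \in S_{\boldsymbol{\pi}_j}$ for j = 1, ..., R and $S_{\boldsymbol{\pi}_j}$ is defined

in (1.13). Then by assuming that given $\boldsymbol{\pi}_j$ the k-th replication of

the j-th layer is determined by a Dirichlet-multinomial distribution

$DM(n_{+jk}, \boldsymbol{\pi}_j, \theta)$, the joint distribution of $\{n_{ijk}\}$ is given by

$$\mathbf{n}_{jk} \overset{\text{ind}}{\sim} DM(n_{+jk}, \boldsymbol{\pi}_j, \theta), \quad j = 1, \ldots, R,\ k = 1, \ldots, C, \tag{4.1}$$

where $\mathbf{n}_{jk} = (n_{1jk}, \ldots, n_{I-1,jk})'$ and $n_{+jk} = \sum_{i=1}^{I} n_{ijk}$ for j = 1, ..., R and k = 1, ..., C.

The full model (4.1) is specified by parameters $(\boldsymbol{\pi}_1, \ldots, \boldsymbol{\pi}_R, \theta)$.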

Using this parametrization we consider the following hypotheses of interest:

(i) No nested random effects

$H_r: \theta = 0$

(ii) No fixed row effects

$H_f: \boldsymbol{\pi}_1 = \boldsymbol{\pi}_2 = \cdots = \boldsymbol{\pi}_R$.

Discussions in the following sections consist of finding a suitable test

statistic and its null distribution in each of these hypothesis testing

problems.

4.2 Test of the Nested Random Effects

We test the existence of the nested random effects in the presence

of the fixed row effects, which are represented by distinct $\boldsymbol{\pi}_j$'s. However,

if there were no fixed effects, the arguments in Section 3.2 could

be employed for this problem.

4.2.1 C(a) Test

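We define the following notation for j = 1, ..., R, k = 1, ..., C:

$$D_j = \operatorname{diag}(\pi_{1j}, \pi_{2j}, \ldots, \pi_{I-1,j})$$

$$V_j = V_j(\boldsymbol{\pi}_j) = D_j - \boldsymbol{\pi}_j\boldsymbol{\pi}_j'$$

$$\hat{V}_j = V_j(\hat{\boldsymbol{\pi}}_j)$$

$$\mathbf{Z}_{jk} = \mathbf{Z}_{jk}(\boldsymbol{\pi}_j) = (n_{+jk})^{-1/2}(\mathbf{n}_{jk} - n_{+jk}\boldsymbol{\pi}_j)$$

$$\hat{\boldsymbol{\pi}}_j = (n_{+j+})^{-1} \sum_{k=1}^{C} \mathbf{n}_{jk}$$

$$\hat{\mathbf{Z}}_{jk} = \mathbf{Z}_{jk}(\hat{\boldsymbol{\pi}}_j)$$

$$\sqrt{\mathbf{M}_j}' = \Big(\sqrt{n_{+j1}/n_{+j+}}, \sqrt{n_{+j2}/n_{+j+}}, \ldots, \sqrt{n_{+jC}/n_{+j+}}\Big)$$

$$\alpha_{jk} = \lim n_{+jk}/n_{+j+} \quad \text{as } n_{+j+} \text{ tends to } \infty \tag{4.10}$$

$$\sqrt{\mathbf{A}_j}' = (\sqrt{\alpha_{j1}}, \sqrt{\alpha_{j2}}, \ldots, \sqrt{\alpha_{jC}}) \tag{4.11}$$

$$\mathbf{A}_j = (\alpha_{j1}, \alpha_{j2}, \ldots, \alpha_{jC})' \tag{4.12}$$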

(4.13)

(4.14)

where nT = n+++.

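Under the full model the joint probability of $\{n_{ijk}\}$ is given by

$$\Pr(\{n_{ijk}\}) = \prod_{j=1}^{R} \prod_{k=1}^{C} \binom{n_{+jk}}{n_{1jk}, \ldots, n_{Ijk}} \frac{\prod_{i=1}^{I} \pi_{ij}(\pi_{ij} + \theta) \cdots [\pi_{ij} + (n_{ijk}-1)\theta]}{(1+\theta)(1+2\theta) \cdots [1 + (n_{+jk}-1)\theta]}. \tag{4.15}$$

Define $\ell(\theta) = \log \Pr(\{n_{ijk}\})$. In order to obtain a $C(\alpha)$ test statistic

we need the following derivatives evaluated at $\theta = 0$:

$$\Psi^{(1)}(\{\boldsymbol{\pi}_j\}) = \frac{d\ell}{d\theta}\Big|_{\theta=0} = \sum_{j=1}^{R} \sum_{k=1}^{C} \Big\{\sum_{i=1}^{I} \frac{n_{ijk}(n_{ijk}-1)}{2\pi_{ij}} - \frac{n_{+jk}(n_{+jk}-1)}{2}\Big\}, \tag{4.16}$$

$$\Psi_{ij}^{(2)}(\{\boldsymbol{\pi}_j\}) = \frac{\partial}{\partial \pi_{ij}} \frac{d\ell(\theta)}{d\theta}\Big|_{\theta=0} = \sum_{k=1}^{C} \Big\{-\frac{n_{ijk}(n_{ijk}-1)}{2\pi_{ij}^2} + \frac{n_{Ijk}(n_{Ijk}-1)}{2\pi_{Ij}^2}\Big\} \tag{4.17}$$

for i = 1, ..., I-1, j = 1, ..., R, where $n_{Ijk} = n_{+jk} - \sum_{i=1}^{I-1} n_{ijk}$. Since

$E[\Psi_{ij}^{(2)}] = 0$ for all (i,j) under $H_r: \theta = 0$, following Neyman (1959) the

hypothesis $H_r: \theta = 0$ can be tested based on the statistic $\Psi^{(1)}(\{\hat{\boldsymbol{\pi}}_j\})$,

where $\hat{\boldsymbol{\pi}}_j$ is a root-$n_{+j+}$ consistent estimator of $\boldsymbol{\pi}_j$. Substituting the

MLE $\hat{\boldsymbol{\pi}}_j$ we obtain a $C(\alpha)$ test statistic

$$T^{(3)} = n_T^{-1} \Big[\sum_{j=1}^{R} \sum_{k=1}^{C} (\mathbf{n}_{jk} - n_{+jk}\hat{\boldsymbol{\pi}}_j)' \hat{V}_j^{-1} (\mathbf{n}_{jk} - n_{+jk}\hat{\boldsymbol{\pi}}_j)\Big]. \tag{4.18}$$

Here we note the following representation of $T^{(3)}$:

$$T^{(3)} = \sum_{j=1}^{R} \frac{n_{+j+}}{n_T}\, T_j^{(2)}, \tag{4.19}$$

where

$$T_j^{(2)} = n_{+j+}^{-1} \sum_{k=1}^{C} (\mathbf{n}_{jk} - n_{+jk}\hat{\boldsymbol{\pi}}_j)' \hat{V}_j^{-1} (\mathbf{n}_{jk} - n_{+jk}\hat{\boldsymbol{\pi}}_j) \tag{4.20}$$

is a $C(\alpha)$ statistic based on the j-th I×C contingency table for testing

$H_r: \theta = 0$.

Denote Pearson's chi-square statistic based on the three-dimensional

I×R×C contingency table by $X^{(3)}$. Then by the additivity of chi-square

random variables

$$X^{(3)} = \sum_{j=1}^{R} X_j^{(2)}, \tag{4.21}$$

where $X_j^{(2)}$ is a Pearson's chi-square statistic based on the j-th I×C

contingency table.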

We may note that the $C(\alpha)$ statistic $T^{(3)}$ based on a three-dimensional

contingency table is a weighted sum of corresponding $C(\alpha)$ statistics

based on its two-dimensional contingency sub-tables, whereas the chi-square

statistic is a simple sum of chi-square statistics based on lower dimensional

contingency tables. The representation (4.19) of the $C(\alpha)$ statistic

$T^{(3)}$ will be used to obtain the asymptotic relative efficiency of

$X^{(3)}$ to $T^{(3)}$ later in this section.
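The weighted-sum structure can be illustrated numerically. The sketch below is an assumption-laden illustration: it takes the weights to be n_+j+ / n_T and each layer statistic to be the one-way T_k of Chapter III applied to the j-th I×C sub-table; the data layout `layers[j][i][k]` and the function names are hypothetical.

```python
def c_alpha_statistic(table):
    """One-way C(alpha) statistic T_k for an I x C sub-table (rows = responses)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return (sum(x * x / r for row, r in zip(table, rows) for x in row)
            - sum(c * c for c in cols) / n)

def t3_direct(layers):
    """T^(3) = (1/n_T) sum_j sum_k sum_i (n_ijk - n_+jk * pi_hat_ij)^2 / pi_hat_ij."""
    n_total = sum(x for layer in layers for row in layer for x in row)
    total = 0.0
    for layer in layers:
        rows = [sum(r) for r in layer]
        cols = [sum(c) for c in zip(*layer)]
        n_j = sum(rows)
        for i, r in enumerate(rows):
            pi_hat = r / n_j  # MLE of pi_ij pooled over replications
            for k, c in enumerate(cols):
                total += (layer[i][k] - c * pi_hat) ** 2 / pi_hat
    return total / n_total

def t3_weighted(layers):
    """Weighted sum of the layer statistics with weights n_+j+ / n_T."""
    sizes = [sum(x for row in layer for x in row) for layer in layers]
    n_total = sum(sizes)
    return sum(m / n_total * c_alpha_statistic(layer)
               for m, layer in zip(sizes, layers))

# Two row levels (R = 2), three responses (I = 3), two replications (C = 2).
layers = [
    [[4, 6], [5, 2], [1, 7]],
    [[10, 3], [8, 9], [2, 8]],
]
direct_value = t3_direct(layers)
weighted_value = t3_weighted(layers)
```

The two values coincide, so the contribution of each row level to the overall statistic is proportional to its share of the total sample size.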

For notational convenience we define for j = 1, ..., R, k = 1, ..., C

(4.22)

(4.23)

(4.24)

(4.25)

(4.26)

(4.27)

W= W(V1

, ... ,VR)

(4.28)

(4.29)

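Then after some algebra it can be shown that the $C(\alpha)$ statistic $T^{(3)}$ can

be expressed as

$$T^{(3)} = \sum_{j=1}^{R} \sum_{k=1}^{C} \hat{\mathbf{Z}}_{jk}^{*\prime} \hat{V}_j^{-1} \hat{\mathbf{Z}}_{jk}^{*} = \hat{\mathbf{Z}}_{(3)}^{*\prime}\, \hat{W}^{-1}\, \hat{\mathbf{Z}}_{(3)}^{*}. \tag{4.30}$$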

A>; (Dra: - D;a:: IA IA~) H I I-I iRk Rk ~

(4.31)

Then by (4.9) and (4.13) and using $\hat{\boldsymbol{\pi}}_j = \boldsymbol{\pi}_j + o_p(1)$ it can be verified that

(4.32)

Since a multinomial random vector converges in distribution to a multivariate

normal distribution when $\boldsymbol{\pi}_j$ is constant and $n_{+jk}$ tends to infinity

for k = 1, ..., C; j = 1, ..., R, $\mathbf{Z}_{(3)}$ has a limiting distribution given by

(4.33)

as $\{n_{+jk}\}$ tends to infinity. Thus by (4.32) and (4.33) we obtain the limiting distribution of $\hat{\mathbf{Z}}_{(3)}^{*}$

(4.34)

Let $\Sigma = U'WU$. Then using properties of Kronecker products and $\sqrt{\mathbf{A}_j}'\sqrt{\mathbf{A}_j}$

$= \sum_{k=1}^{C} \alpha_{jk} = 1$, we can simplify $\Sigma$ to

(4.35)

Hence from (4.30), (4.34) and (4.35) we can derive that

$$T^{(3)} \xrightarrow[H_r]{\mathcal{V}} \sum_{i=1}^{(I-1)RC} \lambda_i \chi_i^2(1), \tag{4.36}$$

where $\{\lambda_i;\ i = 1, \ldots, (I-1)RC\}$ are eigenvalues of $W^{-1}\Sigma$.

(4.37)

Since $W^{-1}\Sigma = G \otimes I_{I-1}$, (4.36) is reduced to

$$T^{(3)} \xrightarrow[H_r]{\mathcal{V}} \sum_{i=1}^{RC} \lambda_i^* \chi_i^2(I-1), \tag{4.38}$$

where $\{\lambda_i^*;\ i = 1, \ldots, RC\}$ are eigenvalues of G in (4.37).

Proceeding as before, we may approximate the asymptotic distribution of $T^{(3)}$ by equating the first two moments of $g^{*-1} T^{(3)}$ with those of a chi-square random variable with $h^{*}$ degrees of freedom. First we note that

Thus the null distribution of $T^{(3)}$ can be approximated as

$$g^{*-1}\, T^{(3)} \overset{H_r}{\sim} \chi^2(h^{*}).$$
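The displays giving $g^{*}$ and $h^{*}$ are not legible in this copy, so the following is only a minimal sketch of the usual two-moment matching for a weighted sum of independent chi-square variables, which is what the paragraph above describes. The function name and arguments are hypothetical; `eigenvalues` plays the role of the $\lambda_i^{*}$ in (4.38) and `df_each` the role of $I-1$.

```python
def match_scaled_chi_square(eigenvalues, df_each):
    """Return (g, h) such that g * chi2(h) has the same mean and variance
    as sum_i eigenvalues[i] * chi2_i(df_each) with independent terms.

    Mean of the weighted sum:     df_each * sum(lam)
    Variance of the weighted sum: 2 * df_each * sum(lam**2)
    Mean of g * chi2(h):          g * h
    Variance of g * chi2(h):      2 * g**2 * h
    """
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    g = s2 / s1                      # scale factor
    h = df_each * s1 * s1 / s2       # matched degrees of freedom
    return g, h


if __name__ == "__main__":
    # Hypothetical eigenvalues, for illustration only.
    g, h = match_scaled_chi_square([2.0, 1.0], df_each=1)
    print(g, h)
```

When all eigenvalues are equal to 1 the matching returns `g = 1` and `h` equal to the total degrees of freedom, so the approximation is exact in that case.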

4.2.2 ARE of X(3) relative to T(3)

(4.40)

(4.41)

Employing the arguments of section 3.2 and noting the representations (4.19) and (4.21), the alternative distributions of $X^{(3)}$ and $T^{(3)}$ can be easily obtained. Thus, omitting the algebraic details, we can derive the following:

(a) Var (x(3)\Hr ) ------~ 2(I-1)(C-l)R

(4.42)

where G is defined in (4.37)

(c) JL E [x(3)\K JI ) (I-l)(l-L L b.a. 2k)

de e r e=O j k J J

(d) :e Ee[T(3)IKr J\ ~ (I-l)trace (G2) = (I-I) trace (G) •8=0

Thus the asymptotic relative efficiency $e^{(3)}_{X|T}$ of $X^{(3)}$ relative to $T^{(3)}$ is given by

(3) _ (1-f ~~b_j_aj_~_)2 ---:-

epic - (C-l)R{L b. 2 [L a.~ - 2L a.~ + (Lk

aJ.~)2}j J k J k J

(I(C-l)R ;\*)2R,=1 R,=----=------

(C-l)R(Li~il)R ;\~2

under $K_r: \theta = \theta_{n_T} = O(1/n_T)$, where $\{\lambda_\ell^{*};\ \ell = 1, \ldots, (C-1)R\}$ is the set of eigenvalues of $G$ in (4.37).

Now, as a straightforward extension of Theorem 3.1, we can obtain the following theorem, given here without proof.

Theorem 4.1

where the left equality holds if and only if $C = 2$ and $R = 1$, and the right equality holds if and only if the group sizes $\{n_{+jk}\}$, $j = 1, \ldots, R$, $k = 1, \ldots, C$, are asymptotically balanced.

4.3 Test of Equality of the Fixed Row Effects

In testing the equality of the fixed row effects, $H_f: \boldsymbol{\pi}_1 = \cdots = \boldsymbol{\pi}_R$, in the pseudo $I \times R \times C$ contingency table, two statistics deserve consideration: the generalized Wald statistic, say $W$, and Pearson's chi-square statistic $X^2$. The generalized Wald statistic has a simple reference distribution by construction, whereas Pearson's chi-square statistic has a complicated reference distribution, as is shown in (3.83) for the case of a one-way layout contingency table; this is because of the underlying Dirichlet-multinomial distribution.

A comparison of these two statistics, $W$ and $X^2$, in small samples is not practicable. Even in a large-sample comparison, as Puri and Sen (1971) indicate, a unique answer regarding the relative efficiency of $W$ relative to $X^2$ may not be possible, since the alternative distributions of the two statistics depend on more than one parameter. Hence we consider the simpler problem of testing $H_f^{*}: \pi_1 = \pi_2$ in a $2 \times 2 \times C$ contingency table and calculate the asymptotic relative efficiency of $W$ relative to $X^2$ in an attempt to gain some insight into the original problem of testing $H_f: \boldsymbol{\pi}_1 = \boldsymbol{\pi}_2 = \cdots = \boldsymbol{\pi}_R$.

Our product Dirichlet-multinomial model reduces to the product multinomial model with the same probability vector when $\theta = 0$. Hence, when $\theta = 0$, by aggregating the data along the random dimension we obtain a two-dimensional contingency sub-table of sufficient statistics, and a test of $H_f: \boldsymbol{\pi}_1 = \cdots = \boldsymbol{\pi}_R$ should be based on this collapsed table. In a product Dirichlet-multinomial model, collapsing a three-dimensional contingency table along the random dimension does not yield sufficient statistics for $(\boldsymbol{\pi}_1, \ldots, \boldsymbol{\pi}_R)$. Thus a statistician may want to base his test on the full three-dimensional contingency table under two possible cases:

(i) He decides that collapsing the pseudo three-dimensional contingency table along the random dimension may incur a loss of information on the random effects in the two-dimensional table, because collapsing does not yield sufficient statistics when $\theta > 0$.

(ii) He mistakenly treats the balanced nested mixed effects model as a crossed mixed effects model and bases his test on the pseudo three-dimensional contingency table.

The effect of employing a test procedure based on the full three-dimensional contingency table in a product Dirichlet-multinomial model can be investigated by comparing test procedures based on the collapsed and uncollapsed tables. In doing this we may suggest conditions under which the test based on the collapsed table is asymptotically more efficient than the test based on the uncollapsed table. In the remaining discussion we refer to the tests based on the collapsed two-dimensional table and the full uncollapsed three-dimensional table as test C and test F, respectively.

4.3.1 Wald Statistic and Chi-Square Statistic

and let $\boldsymbol{\pi}_n' = (\pi_{1n}, \pi_{2n})$ be a sequence of points in Euclidean space $R^2$ of the form $\boldsymbol{\pi}_n = \boldsymbol{\pi}_0 + \boldsymbol{\delta}_n/\sqrt{n}$, where $\lim_{n\to\infty} \boldsymbol{\delta}_n = \boldsymbol{\delta}$, and $\boldsymbol{\pi}_0$ and $\boldsymbol{\delta}$ are fixed points. In order to have only one extra parameter under the alternative hypothesis we set $\boldsymbol{\delta}' = (\xi, 0)$. Define $U(\boldsymbol{\pi}) = \pi_1 - \pi_2$, where $\boldsymbol{\pi}' = (\pi_1, \pi_2)$ is a point in $R^2$. Now under the above formulation the null hypothesis

is understood as

as $n \to \infty$,

and the alternative hypothesis, say $K_f^{*}$, is formulated as

$$K_f^{*}: \sqrt{n}\, U(\boldsymbol{\pi}_n) = \sqrt{n}\,(\pi_{1n} - \pi_{2n}) \to \xi,$$

where $n = n_T$. (See Stroud, 1971, and Shuster and Downing, 1976, for recent developments of the generalized Wald statistic and its applications.)

We use the following notation throughout this section.

$$\gamma_{jk} = \lim_{n_{+j+}\to\infty} n_{+jk}/n_{+j+}$$

$$A_j = A_j(\theta) = \sum_{k=1}^{C} \gamma_{jk}\, \eta_{jk}(\theta)$$

for $j = 1, 2$ and $k = 1, \ldots, C$, and

$$\beta = \lim_{n_T\to\infty} \frac{n_{+1+}}{n_T}, \qquad (1-\beta) = \lim_{n_T\to\infty} \frac{n_{+2+}}{n_T}$$

$$\boldsymbol{\beta}' = (\beta,\ 1-\beta)$$

(4.43)

(4.44)

(4.45)

(4.46)

(4.47)

(4.48)

(4.49)

For simplicity we sometimes write $\beta$ and $\gamma_{jk}$ for $n_{+1+}/n_T$ and $n_{+jk}/n_{+j+}$, respectively, where there is no confusion.

Let $W_c$ and $X_c^2$ be the generalized Wald statistic and Pearson's chi-square statistic, respectively, based on the table collapsed along the random dimension. We consider the asymptotic relative efficiency (ARE) of $W_c$ relative to $X_c^2$. Denote

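$$\hat{\boldsymbol{\pi}}_n' = \left(\frac{n_{11+}}{n_{+1+}},\ \frac{n_{12+}}{n_{+2+}}\right) = (\hat{\pi}_{1n},\ \hat{\pi}_{2n}) \tag{4.50}$$

$$\hat{\boldsymbol{\pi}}_0' = \left(\frac{n_{1++}}{n_T},\ \frac{n_{2++}}{n_T}\right) \tag{4.51}$$

Then, based on the asymptotic normality of the beta-binomial random variable as indicated in Paul and Plackett (1978), we have under the null hypothesis $H_f^{*}$

$$\sqrt{n_T}\,(\hat{\boldsymbol{\pi}}_n - \boldsymbol{\pi}_0) \xrightarrow[H_f^{*}]{V} N\!\left(\mathbf{0},\ \pi_0(1-\pi_0) \begin{bmatrix} \beta^{-1}A_1 & 0 \\ 0 & (1-\beta)^{-1}A_2 \end{bmatrix}\right) \tag{4.52}$$

$$\sqrt{n_T}\, U(\hat{\boldsymbol{\pi}}_n) \xrightarrow[H_f^{*}]{V} N\!\left(0,\ [\beta^{-1}A_1 + (1-\beta)^{-1}A_2]\, \pi_0(1-\pi_0)\right) \tag{4.53}$$

and

$$\hat{\pi}_0 = \frac{n_{1++}}{n_T} = \pi_0 + O_p(n^{-1/2}), \tag{4.54}$$

where $n = n_T$. Hence

$$W_c = \frac{n_T\,(\hat{\pi}_{1n} - \hat{\pi}_{2n})^2}{[\beta^{-1}A_1 + (1-\beta)^{-1}A_2]\, \pi_0(1-\pi_0)} \tag{4.55}$$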

Similarly, under the alternative hypothesis

(4.56)

because

$$\lim_{n\to\infty} \pi_n(1-\pi_n) = \pi_0(1-\pi_0).$$

Thus it follows that

(4.57)

$$W_c \xrightarrow[K_f^{*}]{V} \chi^2\!\left(1,\ \frac{\xi^2}{[\beta^{-1}A_1 + (1-\beta)^{-1}A_2]\, \pi_0(1-\pi_0)}\right) \tag{4.58}$$

where $\chi^2(\nu, \delta)$ is a noncentral chi-square random variable with $\nu$ degrees of freedom and noncentrality parameter $\delta$.

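Now, by the previous argument it can be shown that under the null hypothesis $H_f^{*}$

$$X_c^2 \xrightarrow[H_f^{*}]{V} \phi_1\, \chi_1^2(1,0) + \phi_2\, \chi_2^2(1,0), \tag{4.59}$$

where $\phi_1$ and $\phi_2$ are the eigenvalues of the matrix $\Sigma_2$,

$$\Sigma_2 = (I_2 - \sqrt{\boldsymbol{\beta}}\sqrt{\boldsymbol{\beta}}^{\,\prime})\, A\, (I_2 - \sqrt{\boldsymbol{\beta}}\sqrt{\boldsymbol{\beta}}^{\,\prime}), \tag{4.60}$$

and $\chi_1^2(1,0)$ and $\chi_2^2(1,0)$ are independent.

Since the matrix $\Sigma_2$ in (4.60) is singular, it can be readily found that one eigenvalue is equal to zero and the other is equal to $A_1(1-\beta) + A_2\beta$. Thus from (4.59) it follows that

$$X_c^2 \xrightarrow[H_f^{*}]{V} [A_1(1-\beta) + A_2\beta]\, \chi^2(1,0). \tag{4.61}$$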

To find the asymptotic distribution of $X_c^2$ under the alternative $K_f^{*}$ we proceed as follows. We can derive

(4.62)

1-B -/B(l-B) ~I1+

(4.63)A A

-/B(l-B)

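Now, because of the singular transformation in (4.63) it can be seen that

$$Z_{2+} = -\sqrt{\beta/(1-\beta)}\; Z_{1+}. \tag{4.64}$$

Hence Pearson's chi-square statistic becomes

(4.65)

$$Z_{1+} \xrightarrow[K_f^{*}]{V} N\!\left((1-\beta)\sqrt{\beta}\,\xi,\ \pi_0(1-\pi_0)\,[A_1(1-\beta)^2 + A_2\beta(1-\beta)]\right) \tag{4.66}$$

$$X_c^2 \xrightarrow[K_f^{*}]{V} [A_1(1-\beta) + A_2\beta]\; \chi^2\!\left(1,\ \frac{\beta(1-\beta)\,\xi^2}{\pi_0(1-\pi_0)\,[A_1(1-\beta) + A_2\beta]}\right) \tag{4.67}$$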

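Based on the above results (4.55), (4.58), (4.61) and (4.67) it can be seen that

$$\operatorname{Var}(W_c) \xrightarrow[K_f^{*}]{} 2 \tag{4.68}$$

$$\frac{d}{d\xi^2} E[W_c]\Big|_{\xi=0} \xrightarrow[K_f^{*}]{} \frac{\beta(1-\beta)}{\pi_0(1-\pi_0)\,[A_1(1-\beta) + A_2\beta]} \tag{4.69}$$

$$\operatorname{Var}(X_c^2) \xrightarrow[K_f^{*}]{} 2\,[A_1(1-\beta) + A_2\beta]^2 \tag{4.70}$$

$$\frac{d}{d\xi^2} E[X_c^2]\Big|_{\xi=0} \xrightarrow[K_f^{*}]{} \frac{\beta(1-\beta)}{\pi_0(1-\pi_0)} \tag{4.71}$$

Hence the asymptotic relative efficiency $e_{X_c^2 | W_c}$ of $X_c^2$ relative to $W_c$ is equal to 1. This may be considered an extension of the equivalence of these two test procedures in a product multinomial model to a product Dirichlet-multinomial model.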
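The claim that the ARE equals 1 can be checked numerically. The sketch below assumes the limiting forms as they are read here from (4.58) and (4.67): $W_c$ behaves like $\chi^2(1, \kappa\xi^2)$ and $X_c^2$ like $s\,\chi^2(1, \kappa\xi^2)$, with $s = A_1(1-\beta) + A_2\beta$ and $\kappa = \beta(1-\beta)/\{\pi_0(1-\pi_0)s\}$. It compares the efficacies (squared slope of the mean in $\xi^2$ divided by the null variance). All function names and the numerical inputs are hypothetical.

```python
def efficacy(scale, noncentrality_slope, df=1):
    """Efficacy of a statistic distributed as scale * chi2(df, slope * xi2).

    E[statistic]   = scale * (df + slope * xi2), so d/d(xi2) = scale * slope
    Var at xi2 = 0 = 2 * df * scale**2
    """
    mean_slope = scale * noncentrality_slope
    null_variance = 2.0 * df * scale ** 2
    return mean_slope ** 2 / null_variance


def are_pearson_vs_wald(beta, a1, a2, pi0):
    """ARE of the collapsed-table chi-square statistic relative to the Wald statistic."""
    s = a1 * (1.0 - beta) + a2 * beta                  # scale factor of X_c^2
    kappa = beta * (1.0 - beta) / (pi0 * (1.0 - pi0) * s)
    wald = efficacy(1.0, kappa)                        # W_c   ~ chi2(1, kappa * xi2)
    pearson = efficacy(s, kappa)                       # X_c^2 ~ s * chi2(1, kappa * xi2)
    return pearson / wald


if __name__ == "__main__":
    # Hypothetical values of beta, A_1, A_2 and pi_0.
    print(are_pearson_vs_wald(beta=0.4, a1=1.8, a2=2.5, pi0=0.3))
```

The scale factor $s$ cancels between the squared slope and the variance, which is why the ratio does not depend on $A_1$ or $A_2$.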

4.3.2 ARE of a test F relative to a test C

Since the generalized Wald statistic is asymptotically equivalent to the Pearson chi-square statistic for testing $H_f^{*}: \pi_1 = \pi_2$, the Wald statistic, owing to its simpler reference distribution, may be chosen for the discussion of the ARE of test F relative to test C. We compare the large-sample behavior of the generalized Wald statistic based on the collapsed table and on the full uncollapsed table. Let $W_F$ be the generalized Wald statistic based on the full uncollapsed table. Then $W_F$ becomes the sum of the generalized Wald statistics based on each of the $2 \times 2$ tables along the random dimension.

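We define for $j = 1, 2$ and $k = 1, \ldots, C$

$$\hat{\pi}_{jk} = n_{1jk}/n_{+jk}$$

$$\hat{\boldsymbol{\pi}}_k' = (\hat{\pi}_{1k},\ \hat{\pi}_{2k})$$

$$\beta_k = \lim_{n_T\to\infty} n_{++k}/n_T$$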

(4.72)

(4.73)

(4.74)

(4.75)

*Then under the alternative hypothesis Kf we can derive

Hence the generalized Wald statistic Wk based on the k-th 2x2 table is

(4.77)

(4.78)

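Now, by the standard method, the asymptotic relative efficiency of $W_F$ relative to $W_c$ can be calculated as

$$e_{W_F | W_c}(\theta) = \frac{1}{C} \left[\, \sum_{k=1}^{C} \frac{\beta_k \gamma_k (1-\gamma_k)}{\gamma_k \eta_{2k}(\theta) + (1-\gamma_k)\eta_{1k}(\theta)} \Bigg/ \frac{\beta(1-\beta)}{A_1(1-\beta) + A_2\beta} \right]^2, \tag{4.80}$$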

and we can prove

Theorem 4.3: When 8=0, i.e., under the product multinomial model

eW !W = limF c nrm

(4.81)

where equality holds if and only if {n+1k} is proportional to {n+2k } .

Proof. This can be readily proved by noting a theorem in Hardy, Littlewood and Polya (1964, p. 61, Theorem 67), which states that

$$\sum_i \frac{a_i b_i}{a_i + b_i} < \frac{\left(\sum_i a_i\right)\left(\sum_i b_i\right)}{\sum_i a_i + \sum_i b_i}$$

unless $\{a_i\}$ and $\{b_i\}$ are proportional. □

When $\theta > 0$, we may reparametrize by noting that $\theta = \theta_n = O(1/n)$ in the passage to the limit (Paul and Plackett, 1978). Thus we define

$$\phi_n = n\,\theta_n \tag{4.82}$$

$$\lim_{n\to\infty} n\,\theta_n = \lim_{n\to\infty} \phi_n = \phi, \tag{4.83}$$

where $\phi$ is some positive number. Then by using (4.82) and (4.83) the

(4.84)

Investigation of formula (4.84) shows that the ARE depends on $\phi$ and the group size ratios unless $\{n_{+1k}\}$ is proportional to $\{n_{+2k}\}$. When $\{n_{+1k}\}$ is proportional to $\{n_{+2k}\}$ it can be readily seen that the ARE of $W_F$ relative to $W_c$ is equal to t.

The formula (4.84) for the ARE as a function of $\phi$ is based on the $O(1/n)$ assumption $\theta = \theta_n$. In practice, however, $\theta$ is determined by nature, and hence is fixed. Thus, in order to provide some suggestions to the practical statistician for the choice between $W_F$ and $W_c$ in terms of the ARE, we may calculate the ARE of $W_F$ relative to $W_c$ as a function of $\theta$ for different group sizes with the same group size ratios.

Indication of likely practical values of $\theta$ may be obtained from past empirical studies on the beta-binomial distribution done by Skellam (1948), Kemp and Kemp (1956), Chatfield and Goodhardt (1970), Williams (1975), Feder (1978), and Segreti and Munson (1981), among others. Their estimated values of $\theta$ and the total number of observations are shown in Table 4.1.

Table 4.1 Approximate Range of $\hat{\theta}$, $n_T\hat{\theta}$ and the Type of Data

Author                    Total number of   $\hat{\theta}$   $n_T\hat{\theta}$   Type of Data
                          observations
Skellam                   337               0.095            27.37               Number of associations in chromosomes
Kemp and Kemp             200               0.171            34.25               Number of contacts with pins in 200 frames
                          200               0.126            25.35               (in the analysis of point quadrat data)
                          200               0.129            25.77
                          200               0.058            11.62
                          200               0.482            96.48
Chatfield and Goodhardt   50                0.320            16.02               Number of r weeks on which purchases of certain
                          474               1.279            606.14              item are made out of n weeks (r ≤ n)
Williams                  145               0.465            67.43               Number of pups survived in pregnant female rat
Feder                     524               0.073            38.25               Number of fetal deaths among total fetuses per litter
Segreti and Munson        40                0.681            27.24               Number of fetal deaths among total fetuses per litter

We consider the simple case $C = 3$ for the calculation of the ARE's of (4.84) as a function of $\theta$. Three sets of hypothetical group sizes, say $D_1$, $D_2$ and $D_3$, are considered, where

$$D_1 = \begin{bmatrix} n_{+11} & n_{+12} & n_{+13} \\ n_{+21} & n_{+22} & n_{+23} \end{bmatrix} = \begin{bmatrix} 10 & 20 & 75 \\ 80 & 5 & 40 \end{bmatrix}$$

is intended to refer to seriously unbalanced group sizes,

$$D_3 = \begin{bmatrix} 30 & 30 & 40 \\ 25 & 35 & 45 \end{bmatrix}$$

represents reasonable proximity to balanced group sizes, and

$$D_2 = \begin{bmatrix} 20 & 15 & 50 \\ 35 & 60 & 20 \end{bmatrix}$$

is considered to represent an imbalance somewhere between $D_1$ and $D_3$. Group sizes are varied, but group size ratios are maintained, by multiplying $D_1$, $D_2$ and $D_3$ by a constant $k \ge 1$. ARE's based on $D_1$, $D_2$ and $D_3$ are presented in Figures 4.1, 4.2, and 4.3, respectively.
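As a rough numerical companion to these three designs, the sketch below evaluates the two sides of the Hardy, Littlewood and Polya inequality quoted in the proof of Theorem 4.3, taking $a_k = n_{+1k}$ and $b_k = n_{+2k}$. The ratio of the left side to the right side is at most 1, equals 1 only for proportional group sizes, and is unchanged when every group size is multiplied by the same $k$. It is offered only as an illustrative index of balance; it does not evaluate (4.84), and the name `balance_index` is hypothetical.

```python
def balance_index(group_sizes):
    """Ratio of sum_k a_k*b_k/(a_k+b_k) to (sum a)(sum b)/(sum a + sum b).

    group_sizes is a pair of rows (n_{+1k}) and (n_{+2k}); the ratio is
    at most 1, with equality when the two rows are proportional.
    """
    row1, row2 = group_sizes
    lhs = sum(a * b / (a + b) for a, b in zip(row1, row2))
    s1, s2 = sum(row1), sum(row2)
    rhs = s1 * s2 / (s1 + s2)
    return lhs / rhs


D1 = [[10, 20, 75], [80, 5, 40]]
D2 = [[20, 15, 50], [35, 60, 20]]
D3 = [[30, 30, 40], [25, 35, 45]]

if __name__ == "__main__":
    for name, design in (("D1", D1), ("D2", D2), ("D3", D3)):
        scaled = [[5 * n for n in row] for row in design]   # k = 5
        print(name, balance_index(design), balance_index(scaled))
```

Under this index the three designs are ordered $D_1 < D_2 < D_3$, in line with their description as seriously unbalanced, intermediate, and nearly balanced.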

From the calculation of the ARE's we may note that when the group sizes do not exhibit 'serious' unbalance, the total sample size barely affects the ARE, which is substantially below 1 (see Figures 4.2 and 4.3). If, however, the group sizes show 'serious' unbalance, the total group sizes can affect the ARE (see Figure 4.1). It may be concluded that for group sizes that do not exhibit 'serious' unbalance the ARE of $W_F$ relative to $W_c$ is less than 1 for a practical range of $\theta$ values. Based on this conclusion, we may point out that loss of efficiency is the effect of using the test F procedure, based on the full three-dimensional contingency table, in a product Dirichlet-multinomial model with practical $\theta$ values. However, this must be used with caution in practice. The effect of the unknown $\theta$ on the size of the test needs further study. In the remaining section, we derive the form of $W_c$ in the general pseudo $I \times R \times C$ table.

Figure 4.1 ARE of $W_F$ to $W_c$ based on $kD_1$ for $k = 1, 5, 10$ (horizontal axis: $\theta$)

Figure 4.2 ARE of $W_F$ to $W_c$ based on $kD_2$ for $k = 1, 5, 10$ (horizontal axis: $\theta$)

Figure 4.3 ARE of $W_F$ to $W_c$ based on $kD_3$ for $k = 1, 5, 10$ (horizontal axis: $\theta$)

4.3.3 Wald Statistic for testing the Equality of Fixed Effects.

From the results of the previous discussion it may be concluded that, for most practical cases of the group size ratios and the practical range of $\theta$ values, the generalized Wald statistic $W_c$ based on the collapsed table appears asymptotically more efficient than the generalized Wald statistic $W_F$ based on the full uncollapsed table. Hence in what follows we employ $W_c$ to test the equality of the fixed row effects in the general pseudo $I \times R \times C$ contingency table. We construct a test statistic together with its asymptotic null distribution.

We use the following notation for $j = 1, \ldots, R$, $k = 1, \ldots, C$.

(4.85)

Tf-" =-0

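(4.86)

$$\hat{\boldsymbol{\pi}}_j' = \frac{1}{n_{+j+}}\,(n_{1j+},\ n_{2j+},\ \ldots,\ n_{I-1,j+})$$

$$\boldsymbol{\pi}_G' = (\boldsymbol{\pi}_1',\ \boldsymbol{\pi}_2',\ \ldots,\ \boldsymbol{\pi}_R')$$

$$\hat{\boldsymbol{\pi}}_G' = (\hat{\boldsymbol{\pi}}_1',\ \hat{\boldsymbol{\pi}}_2',\ \ldots,\ \hat{\boldsymbol{\pi}}_R')$$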

(4.87)

(4.88)

(4.89)

(4.90)

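$$H = \begin{bmatrix} 1 & -1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & -2 & 0 & \cdots & 0 \\ 1 & 1 & 1 & -3 & \cdots & 0 \\ \vdots & & & & \ddots & \vdots \\ 1 & 1 & 1 & 1 & \cdots & -(R-1) \end{bmatrix}_{(R-1) \times R} \tag{4.91}$$

$$U^{*} = H \otimes I_{I-1}$$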

(4.92)

(4.93)

(4.94)

(4.95)

(4.96)

(4.97)
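To make the role of the contrast matrix concrete, the following sketch builds the $(R-1) \times R$ matrix whose rows are $(1, -1, 0, \ldots)$, $(1, 1, -2, 0, \ldots)$, and so on, forms its Kronecker product with an identity matrix of order $I-1$, and checks that the product annihilates a stacked vector in which every row category has the same probability vector. The helper names (`contrast_matrix`, `kron`, `identity`) and the numerical probabilities are hypothetical, chosen only for illustration.

```python
def contrast_matrix(R):
    """(R-1) x R matrix with r-th row (1, ..., 1, -r, 0, ..., 0)."""
    return [[1] * r + [-r] + [0] * (R - r - 1) for r in range(1, R)]


def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


if __name__ == "__main__":
    R, I = 4, 3
    U_star = kron(contrast_matrix(R), identity(I - 1))
    pi0 = [0.5, 0.3]                 # first I-1 components of a common vector
    pi_G = pi0 * R                   # stacked vector with equal row effects
    print(matvec(U_star, pi_G))      # all entries are zero up to rounding
```

Each row of the contrast matrix sums to zero, which is what makes the product vanish exactly when the row vectors are all equal.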

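The null hypothesis $H_f: \boldsymbol{\pi}_1 = \boldsymbol{\pi}_2 = \cdots = \boldsymbol{\pi}_R\ (= \boldsymbol{\pi}_0,\ \text{say})$ can be restated as

$$H_f: U^{*} \boldsymbol{\pi}_G = \mathbf{0}. \tag{4.98}$$

Now, based on the asymptotic normality of $Z_{jk} = (n_{+jk})^{-1/2}(\mathbf{n}_{jk} - n_{+jk}\boldsymbol{\pi}_j)$ we can derive

$$\sqrt{n_T}\,(\hat{\boldsymbol{\pi}}_G - \boldsymbol{\pi}_G) \xrightarrow[H_f]{V} N\!\left(\mathbf{0},\ D_{\beta_j^{-1}A_j}(\theta) \otimes V_0\right) \tag{4.99}$$

$$\sqrt{n_T}\, U^{*} \hat{\boldsymbol{\pi}}_G \xrightarrow[H_f]{V} N\!\left(\mathbf{0},\ U^{*}\,[D_{\beta_j^{-1}A_j}(\theta) \otimes V_0]\, U^{*\prime}\right) \tag{4.100}$$

Using (4.92), the covariance matrix of $\sqrt{n_T}\, U^{*}\hat{\boldsymbol{\pi}}_G$ in (4.100) can be simplified as

$$U^{*}\,[D_{\beta_j^{-1}A_j}(\theta) \otimes V_0]\, U^{*\prime} = [H\, D_{\beta_j^{-1}A_j}(\theta)\, H'] \otimes V_0 \tag{4.101}$$

by the property of the Kronecker product.

Since $U^{*} = H \otimes I_{I-1}$ is of rank $(R-1)(I-1)$, by the theorem in Shuster and Downing (1976) and noting $\hat{V}_0 = V_0 + o_p(1)$, where $\hat{V}_0 = V_0(\hat{\boldsymbol{\pi}}_0)$, we obtain

$$W_c \xrightarrow[H_f]{V} \chi^2((I-1)(R-1)). \tag{4.102}$$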

Chapter V

FURTHER RESEARCH

In this chapter, we list four topics that are related to the previous chapters and deserve further research consideration:

(1) The uniqueness of the MLE $\hat{\theta}$ of a finite mixture of binomial distributions.

(2) The likelihood ratio test of $H_0: c = 1$ vs. $H_a: c = 2$, where $c$ is the number of components of a finite mixture of binomial distributions.

(3) The development of the $T_K$ statistic as a measure of association.

(4) The development of a nested pure random effects model for count data.

5.1 THE FINITE MIXTURE OF BINOMIAL DISTRIBUTIONS

Aside from the earlier development of some numerical algorithms that provide an ML estimator of the mixing distribution in a finite mixture of members of the exponential family, fundamental properties such as the existence, uniqueness and consistency of the ML estimator of the mixing distribution were not discussed in the literature until quite recently. Simar (1977) presented an extensive examination of these properties of the MLE in the case of a finite mixture of Poisson distributions, and Jewell (1982) applied Simar's arguments to a finite mixture of exponential distributions. Hill et al. (1980) considered these problems in an infinite mixture of the form $h(t) = \sum_{k=1}^{\infty} p_k f_k(t)$ for known densities $f_k(t)$ that can be found in the mixture of Poisson distributions, where $f_k(t) = e^{-\lambda t}(\lambda t)^k/k!$ for $\lambda > 0$, $k = 0, 1, 2, \ldots$. Lindsay (1983) provided a convex geometric approach to the solution of these problems for finite mixtures in general when identifiability is not an issue. In fact, all the families of mixture models that have been considered in investigations of these fundamental properties of the MLE were either always identifiable or assumed to be identifiable.

In this section we prove the existence of an MLE of the mixing distribution in a finite mixture of binomial distributions and indicate that the MLE may not be unique.

Let $X_1, X_2, \ldots, X_t$ be a random sample from a finite mixture $h(x)$ of binomial distributions with mixing distribution $G$, i.e.,

$$h(x) = \int_0^1 \binom{n}{x} p^{x} (1-p)^{n-x}\, dG(p), \tag{5.1}$$

where $G \in \mathcal{G}_c^{*}$, the class of all discrete distribution functions with at most $c$ atoms.

Suppose the observation vector $(X_1, \ldots, X_t)$ has $k$ distinct points $0 \le y_1 < y_2 < \cdots < y_k \le n$. Let $n_i$ be the number of $X_j$'s equal to $y_i$, $i = 1, \ldots, k$. The mixture model (5.1) is poorly specified unless it is identifiable. Hence we assume $n \ge 2c - 1$ to make the mixture model identifiable.

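The log-likelihood function of $X_1, \ldots, X_t$ can be written as

$$L = \sum_{j=1}^{t} \log\left\{ \int_0^1 \binom{n}{x_j} p^{x_j} (1-p)^{n-x_j}\, dG(p) \right\} = \sum_{i=1}^{k} n_i \log \beta_i,$$

where

$$\beta_i = \int_0^1 \binom{n}{y_i} p^{y_i} (1-p)^{n-y_i}\, dG(p)$$

for $i = 1, 2, \ldots, k$.

The equation (5.2) defines a many-to-one map $\phi$ from $\mathcal{G}_c^{*}$ to a set $B$ of $k$-tuples $(\beta_1, \ldots, \beta_k)$ in $R^k$ unless $k \ge c$. If $k \ge c$ then, owing to the identifiability condition, one and only one $G \in \mathcal{G}_c^{*}$ is associated with a single point in $B$; hence $\phi$ becomes a one-to-one map.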
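For a discrete mixing distribution with finitely many atoms, the integral in the log-likelihood reduces to a finite sum, and the log-likelihood can be evaluated from the distinct observed values and their multiplicities. The sketch below illustrates this; the function names and the sample are hypothetical, and `atoms` and `weights` stand for the support points and masses of $G$.

```python
from math import comb, log


def mixture_pmf(x, n, atoms, weights):
    """h(x) = sum_l w_l * C(n, x) * p_l**x * (1 - p_l)**(n - x)."""
    return sum(w * comb(n, x) * p ** x * (1.0 - p) ** (n - x)
               for p, w in zip(atoms, weights))


def log_likelihood(sample, n, atoms, weights):
    """Log-likelihood computed from distinct values y_i and their counts n_i."""
    counts = {}
    for x in sample:
        counts[x] = counts.get(x, 0) + 1
    return sum(n_i * log(mixture_pmf(y_i, n, atoms, weights))
               for y_i, n_i in counts.items())


if __name__ == "__main__":
    sample = [0, 1, 1, 3]                      # hypothetical data, n = 3 trials each
    print(log_likelihood(sample, 3, atoms=[0.2, 0.7], weights=[0.4, 0.6]))
```

With $c = 2$ atoms, the choice $n = 3$ in this example satisfies the identifiability condition $n \ge 2c - 1$ stated above.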

Let $\{G_\ell\}$ be a sequence of distribution functions in $\mathcal{G}_c^{*}$. Since for each $\ell \ge 1$ $G_\ell$ has support contained in $[0,1]$, $\{G_\ell\}$ is tight. Also, by the Helly-Bray lemma there is a subsequence $\{G_{\ell_k}\}$ that converges to a distribution function $G^{*}$.

Lemma 5.1: $G^{*} \in \mathcal{G}_c^{*}$.

Proof It suffices to show that $G^{*}$ does not have $c+1$ atoms. Suppose on the contrary that $G^{*}$ has $c+1$ atoms. Let the support points of $G_{\ell_k}$ and $G^{*}$ be denoted by $x_{1(k)}, x_{2(k)}, \ldots, x_{c(k)}$ and $x_1, x_2, \ldots, x_c, x_{c+1}$, respectively. Since $G_{\ell_k}(x)$ converges to $G^{*}(x)$ as $k$ goes to $\infty$ at all continuity points of $G^{*}$, for sufficiently large $k$ we can choose $\epsilon > 0$ such that $x_j \in N_\epsilon(x_{j(k)})$ for $j = 1, \ldots, c$, and an extra support point $x_{c+1} \notin \bigcup_{j=1}^{c} N_\epsilon(x_{j(k)})$, where $N_\epsilon(y)$ denotes the $\epsilon$-neighborhood of $y$. Without loss of generality we assume $x_{1(k)} + \epsilon < x_{c+1} < x_{2(k)} - \epsilon$ for large $k$. Set $G^{*}(\{x_{c+1}\}) = \alpha$. Hence, for each

verge to $G^{*}(x_0)$ due to the jump size $\alpha$. Thus a contradiction is obtained. □

Lemma 5.2: $B$ is compact.

Proof The binomial mass function is a bounded and continuous function of $p$. Hence by the Helly-Bray lemma, Lemma 5.1 and Theorem 4.4.2 of Chung (1968), every sequence of points of $B$ contains a subsequence converging to a point of $B$. Consequently $B$ is compact. □

However, we note that $B$ is not convex due to the identifiability condition $n \ge 2c - 1$, which gives an upper bound of $c$ to $\mathcal{G}_c^{*}$.

The likelihood function (5.2) is strictly concave on the compact set $B$. Hence it has a unique maximum value, attained at some point(s) in $B$. But due to the non-convexity of $B$ the point at which the likelihood function attains its maximum may not be unique.

The investigation of sufficient conditions under which the likelihood function attains its maximum at a unique point in $B$ is proposed as further research.

Another important problem in the finite mixture of binomials is that the majority of estimation techniques assume that the number of components of the mixture, which is $c$ in our notation, is known a priori. However, no really adequate test has been suggested for testing hypotheses concerning $c$, even for the simple case of testing $H_0: c = 1$ versus $H_a: c = 2$. Everitt and Hand (1981) noted that "this may be a consequence of the problem rather than any lack of ingenuity."

We briefly describe the problems involved in the likelihood ratio test of $H_0: c = 1$ versus $H_a: c = 2$ in the case of a mixture of two binomial distributions. A mixture $h(x; \boldsymbol{\theta})$ of two binomial distributions is represented as

$$h(x; \boldsymbol{\theta}) = \pi \binom{m}{x} p^{x} (1-p)^{m-x} + (1-\pi) \binom{m}{x} q^{x} (1-q)^{m-x}, \tag{5.4}$$

where $m \ge 3$, and $\boldsymbol{\theta} = (\pi, p, q)$ is a parameter in the parameter space $\Omega$, in which we distinguish

$$\omega_0 = \{(\pi, p, q):\ 0 < \pi < 1,\ 0 < p < q < 1\}$$

$$\omega_1 = \{(\pi, p, q):\ 0 < \pi < 1,\ 0 < p = q < 1\}$$

$$\omega_2 = \{(1, p, q):\ 0 < p < q < 1\}$$

$$\omega_3 = \{(0, p, q):\ 0 < p < q < 1\}$$

The null hypothesis of no mixture, $H_0: c = 1$, is now equivalent to $H_1: \boldsymbol{\theta} \in \omega_1$, $H_2: \boldsymbol{\theta} \in \omega_2$, or $H_3: \boldsymbol{\theta} \in \omega_3$. Here we may note that two non-standard conditions exist in the parameter space $\Omega$. First, the parameter $\boldsymbol{\theta}$ under the null hypothesis falls on the boundary of $\Omega$; hence the standard chi-square distribution result for the likelihood ratio statistic $-2\log\lambda$ does not hold. Second, the null hypothesis region $\omega_1 \cup \omega_2 \cup \omega_3$ consists of a union of hyperplanes of different dimensions.
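The non-standard structure of the null region can be seen directly from (5.4): a single binomial distribution is obtained when $p = q$, when $\pi = 1$, or when $\pi = 0$, so the same null distribution corresponds to many parameter points. The sketch below checks this numerically; the function names and the parameter values are hypothetical.

```python
from math import comb


def binomial_pmf(x, m, p):
    return comb(m, x) * p ** x * (1.0 - p) ** (m - x)


def two_binomial_mixture(x, m, pi, p, q):
    """Mixture of Binomial(m, p) and Binomial(m, q) with weights pi and 1 - pi."""
    return pi * binomial_pmf(x, m, p) + (1.0 - pi) * binomial_pmf(x, m, q)


if __name__ == "__main__":
    m = 4
    for x in range(m + 1):
        same_p = two_binomial_mixture(x, m, 0.3, 0.4, 0.4)    # p = q
        weight_one = two_binomial_mixture(x, m, 1.0, 0.4, 0.9)  # pi = 1
        weight_zero = two_binomial_mixture(x, m, 0.0, 0.1, 0.4)  # pi = 0
        print(x, same_p, weight_one, weight_zero, binomial_pmf(x, m, 0.4))
```

All four columns agree for every $x$, although the three parameter points lie in different parts of the null region.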

Wilks' original result (Wilks, 1938) on the asymptotic distribution of $-2\log\lambda$ was generalized by Chernoff (1954) to the case where the parameter falls on the boundary between the null hypothesis and alternative hypothesis regions. Feder (1968) also investigated the asymptotic distribution of $-2\log\lambda$ when the parameter is near the boundary between the null hypothesis and alternative hypothesis regions. Feder (1968) related Chernoff's results and his own to obtain the null and alternative distributions of $-2\log\lambda$ for testing $H_0: \theta = 0$ vs. $H_a: \theta > 0$ in the context of the beta-binomial distribution described in (3.5), and observed that the asymptotic null distribution of $-2\log\lambda$ has a jump of 1/2 at the origin and a chi-square distribution with 1 degree of freedom when $\lambda > 0$, i.e.,

$$-2\log\lambda \xrightarrow[H_0]{V} (1/2)\, I(\lambda = 0) + (1/2)\, \chi^2(1)\, I(\lambda > 0),$$

where $I$ is an indicator function.
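A practical consequence of this limiting form is that tail probabilities are half of those given by the usual chi-square reference distribution. The sketch below assumes exactly the half-and-half mixture stated above (a point mass of 1/2 at zero plus 1/2 of a chi-square distribution with 1 degree of freedom) and uses the identity $P(\chi^2(1) > c) = \operatorname{erfc}(\sqrt{c/2})$. The function names are hypothetical.

```python
from math import erfc, sqrt


def chi2_1_upper_tail(c):
    """P(chi-square with 1 degree of freedom > c) for c >= 0."""
    return erfc(sqrt(c / 2.0))


def boundary_lrt_tail(statistic):
    """Upper tail probability under the (1/2, 1/2) mixture of a point mass
    at zero and a chi-square distribution with 1 degree of freedom."""
    if statistic <= 0.0:
        return 1.0
    return 0.5 * chi2_1_upper_tail(statistic)


if __name__ == "__main__":
    # The usual 5% chi-square(1) critical value gives 2.5% under the mixture.
    print(boundary_lrt_tail(3.841458820694124))
```

Equivalently, a test of nominal size 0.05 under the mixture uses the upper 10% point of the chi-square distribution with 1 degree of freedom as its critical value.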

Quite recently, Symons et al. (1983) provided a Monte Carlo simulation study of the distribution of $-2\log\lambda$ for testing $H_0: c = 1$ versus $H_a: c = 2$ in a mixture of two Poisson distributions, and observed that the distribution function of $-2\log\lambda$ has a jump of 0.4 at the origin.

Even though Symons et al. (1983) consider a mixture of two Poissons, they still have non-standard conditions in their parameter space analogous to those stated earlier. The fact that their simulation study supports certain aspects of Feder's results (i.e., the jump at the origin) suggests the need for further research on the asymptotic distribution of $-2\log\lambda$ under the two non-standard conditions in the parameter space mentioned earlier.

5.2 $T_K$ STATISTIC AS A MEASURE OF ASSOCIATION

We have discussed in section 3.2 that a measure of variation $R^2_{j\cdot i}$ could be constructed from the C(α) statistic $T_K$ by noting the duality of the C(α) statistic $T_K$ and Light and Margolin's Catanova statistic. Since $R^2_{j\cdot i}$ is computationally equivalent to Goodman and Kruskal's $t_b$, an estimate of their $\tau_b$, the following properties of $R^2_{j\cdot i}$, or equivalently $t_b$, are known (Goodman and Kruskal, 1954, 1963; Margolin and Light, 1974):

i) If there exists a $j$ such that $n_{+j} = n_{++}$, then $TSS_{j\cdot i}$ is equal to zero; hence $R^2_{j\cdot i}$ is undefined.

ii) If there does not exist a $j$ such that $n_{+j} = n_{++}$ and if $n_{ij} = n_{i+}n_{+j}/n_{++}$ for all pairs $(i,j)$, then $R^2_{j\cdot i} = 0$.

iii) If there does not exist a $j$ such that $n_{+j} = n_{++}$ and if for each $i$ there exists a $j$ such that $n_{ij} = n_{i+}$, then $R^2_{j\cdot i} = 1$.

iv) If none of (i), (ii), or (iii) occurs, then $0 < R^2_{j\cdot i} < 1$.

v) $R^2_{j\cdot i}$ is unchanged if all counts $\{n_{ij}\}$ are multiplied by the same positive constant.

vi) $R^2_{j\cdot i}$ is asymmetric in its treatment of the rows and columns of a contingency table.

vii) $R^2_{j\cdot i}$ is invariant under the permutation of rows or columns of a contingency table.
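Several of these properties can be checked numerically. The definition of $R^2_{j\cdot i}$ is given in section 3.2 and is not reproduced here, so the sketch below assumes the standard sample form of Goodman and Kruskal's tau for predicting the column category $j$ from the row category $i$, to which the text says $R^2_{j\cdot i}$ is computationally equivalent. The function name and the example tables are hypothetical.

```python
def goodman_kruskal_tau(table):
    """Sample Goodman-Kruskal tau for predicting column j from row i.

    table[i][j] holds the count n_ij. Raises ValueError when one column
    holds all observations (property (i): the measure is undefined).
    """
    n = sum(sum(row) for row in table)
    n_cols = len(table[0])
    col_totals = [sum(row[j] for row in table) for j in range(n_cols)]
    if any(c == n for c in col_totals):
        raise ValueError("undefined: one column contains every observation")
    marginal = sum((c / n) ** 2 for c in col_totals)
    conditional = sum(v * v / sum(row)
                      for row in table if sum(row) > 0
                      for v in row) / n
    return (conditional - marginal) / (1.0 - marginal)


if __name__ == "__main__":
    independent = [[10, 20], [30, 60]]       # n_ij = n_i+ n_+j / n_++
    perfect = [[5, 0], [0, 7]]               # each row falls in one column
    skewed = [[10, 0, 0], [0, 5, 5]]
    transposed = [list(col) for col in zip(*skewed)]
    print(goodman_kruskal_tau(independent), goodman_kruskal_tau(perfect))
    print(goodman_kruskal_tau(skewed), goodman_kruskal_tau(transposed))
```

The last line illustrates property (vi): exchanging the roles of rows and columns changes the value of the measure.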

Even though $R^2_{j\cdot i}$ is computationally equivalent to Goodman and Kruskal's $t_b$, the two are derived under different sampling models. For Goodman and Kruskal (1954) and Light and Margolin (1971), the row margins are fixed group sample sizes and the columns represent the response from a fixed effects, or product-multinomial, model. Here the column margins are fixed group sample sizes and the rows represent the response from a random effects, or product Dirichlet-multinomial, model.

A hypothetical example of a fixed group effects model in which $R^2_{j\cdot i}$ can be used as a measure of association can be envisaged in the following situation: suppose $n_A$, $n_B$, and $n_C$ represent the numbers of patients with final diagnostic records A, B, and C, respectively, who were initially classified into primary diagnostic records A', B', and C' as in the following contingency table.

                 Final
Primary      A       B       C
A'           n11     n12     n13
B'           n21     n22     n23
C'           n31     n32     n33
Total        nA      nB      nC

Table (5.1)

One of the primary interests in this situation may lie in how much the primary diagnostic records can account for the final diagnostic records. The causal relation of interest goes from the row to the column, whereas the data can be collected so that the probability model of the contingency table is based on the column-wise product multinomial model, i.e., $n_A$, $n_B$, and $n_C$ are fixed.

We feel that further research needs to be done to study $R^2_{j\cdot i}$ as a measure of association in the fixed and random group effects models.

5.3 THE NESTED RANDOM GROUP EFFECTS MODEL OF COUNT DATA

In chapter 4 we discussed a nested mixed effects model of count data in which random effects are nested within fixed effects. A natural extension of the nested mixed effects model would be the corresponding nested pure random effects model within the framework of a Dirichlet-multinomial distribution. By drawing an analogy to nested random effects models in ANOVA we may explicitly specify the nested random effects model for count data. Only the balanced case is discussed; here the data can be presented in the form of an $I \times R \times C$ contingency table, where $I$, $R$, and $C$ represent the numbers of response categories, row categories, and column categories (nested within each row category), respectively. Let $n_{ijk}$ be the number of subjects that are classified in the $(i, j, k)$ cell, and let $\mathbf{n}_{jk} = (n_{1jk}, \ldots, n_{I-1,jk})'$ denote a response vector. The '+' notation will be used to denote the sum of the $n_{ijk}$'s over the corresponding indices.

Now, we may imagine that there is a Dirichlet population of row categories, labeled by the parameter $(\boldsymbol{\pi}, \theta_1)$, from which the $R$ levels $\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_R$ of the row category are sampled. We next suppose that for each $\mathbf{p}_j$ there exists another Dirichlet population distribution of the $C$ levels $\mathbf{p}_{j1}, \mathbf{p}_{j2}, \ldots, \mathbf{p}_{jC}$ of the column category, and that $D(\mathbf{p}_j, \theta_2)$ is the population distribution of $\mathbf{p}_{j1}, \mathbf{p}_{j2}, \ldots, \mathbf{p}_{jC}$ given $\mathbf{p}_j$. Similarly, given $(\mathbf{p}_j, \mathbf{p}_{jk})$ the response vector $\mathbf{n}_{jk}$ can be conceived as a single observation from the multinomial distribution $M(n_{+jk}, \mathbf{p}_{jk})$.

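Using conditioning arguments we may express the hierarchy of the nesting as follows:

$$(\mathbf{p}_j \mid \boldsymbol{\pi}) \overset{iid}{\sim} D(\boldsymbol{\pi}, \theta_1)$$

$$(\mathbf{p}_{jk} \mid \mathbf{p}_j) \overset{iid}{\sim} D(\mathbf{p}_j, \theta_2)$$

$$(\mathbf{n}_{jk} \mid n_{+jk}, \mathbf{p}_j, \mathbf{p}_{jk}) \sim M(n_{+jk}, \mathbf{p}_{jk})$$

for $j = 1, 2, \ldots, R$ and $k = 1, 2, \ldots, C$, where $(x \mid \alpha, \beta) \sim F$ means that the conditional distribution of $x$ given $\alpha$ and $\beta$ is $F$.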
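The three-stage hierarchy can be simulated directly. The sketch below is an illustration under a stated assumption: $D(\boldsymbol{\pi}, \theta)$ is taken to be a Dirichlet distribution with mean $\boldsymbol{\pi}$ and concentration parameters $\pi_i/\theta$, which gives the covariance factor $\theta/(\theta+1)$; the document's own parameterization is defined in an earlier chapter and may differ. All function names and numerical inputs are hypothetical, and both $\theta$ values must be positive.

```python
import random


def draw_dirichlet(mean, theta, rng):
    """Draw from a Dirichlet with the given mean and concentration mean_i / theta
    (an assumed parameterization), using normalized gamma variates."""
    gammas = [rng.gammavariate(max(m / theta, 1e-12), 1.0) for m in mean]
    total = sum(gammas)
    return [g / total for g in gammas]


def draw_multinomial(n, probs, rng):
    """Draw category counts for n subjects with cell probabilities probs."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts


def simulate_nested_table(pi, theta1, theta2, R, C, n_cell, seed=0):
    """Return table[j][k] = response counts (length I) for column k within row j."""
    rng = random.Random(seed)
    table = []
    for _ in range(R):
        p_j = draw_dirichlet(pi, theta1, rng)            # row-level vector
        row = []
        for _ in range(C):
            p_jk = draw_dirichlet(p_j, theta2, rng)      # column within row
            row.append(draw_multinomial(n_cell, p_jk, rng))
        table.append(row)
    return table


if __name__ == "__main__":
    for row in simulate_nested_table([0.5, 0.3, 0.2], 0.2, 0.1, R=3, C=4, n_cell=25, seed=1):
        print(row)
```

Small values of `theta1` and `theta2` keep the drawn vectors close to their means, so the simulated table approaches a product multinomial table as both tend to zero.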

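We define the following notation for convenience.

$$\boldsymbol{\pi}^{*} = (\pi_1, \ldots, \pi_{I-1})'$$

$$\mathbf{p}_j^{*} = (p_{1j}, \ldots, p_{I-1,j})'$$

$$\mathbf{p}_{jk}^{*} = (p_{1jk}, \ldots, p_{I-1,jk})'$$

$$\mathbf{n}_{jk}^{*} = (n_{1jk}, \ldots, n_{I-1,jk})'$$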

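Using the analogy to the nested random effects model for continuous data, we may represent the mean response vector $n_{+jk}\mathbf{p}_{jk}^{*}$ in the $k$-th column category within the $j$-th row category as

$$n_{+jk}\,\mathbf{p}_{jk}^{*} = n_{+jk}\,[\boldsymbol{\pi}^{*} + \mathbf{R}_j + \mathbf{C}_{jk}], \tag{5.6}$$

where

$$\mathbf{R}_j = \mathbf{p}_j^{*} - \boldsymbol{\pi}^{*}, \qquad \mathbf{C}_{jk} = \mathbf{p}_{jk}^{*} - \mathbf{p}_j^{*}. \tag{5.7}$$

The two random vectors $\mathbf{R}_j$ and $\mathbf{C}_{jk}$ have zero means by their construction and variances given by

$$\operatorname{Var}(\mathbf{R}_j) = \frac{\theta_1}{\theta_1 + 1}\,[D_{\pi_i} - \boldsymbol{\pi}^{*}\boldsymbol{\pi}^{*\prime}], \qquad \operatorname{Var}(\mathbf{C}_{jk}) = \frac{\theta_2}{\theta_2 + 1}\,[D_{p_{ij}} - \mathbf{p}_j^{*}\mathbf{p}_j^{*\prime}], \tag{5.8}$$

where $D_{\pi_i} = \operatorname{diag}(\pi_1, \ldots, \pi_{I-1})$ and $D_{p_{ij}}$ is similarly defined. We may note further that the two random vectors $\mathbf{R}_j$ and $\mathbf{C}_{jk}$ are uncorrelated since

$$E[\mathbf{R}_j \mathbf{C}_{jk}' \mid \boldsymbol{\pi}^{*}, \mathbf{p}_j^{*}] = \mathbf{R}_j\, E[\mathbf{C}_{jk}' \mid \boldsymbol{\pi}^{*}, \mathbf{p}_j^{*}] = \mathbf{0}.$$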

Eqn. (5.6) resolves the mean response vector into parts which may be regarded as the overall mean, row effects, and column effects (within each row category). Also, by noting (5.8), the hypothesis of no row random effects, $H_0: \mathbf{R}_j = \mathbf{0}$ for all $j$, is equivalent to $H_R: \theta_1 = 0$, and similarly the hypothesis of no column random effects (within each row category) is equivalent to $H_C: \theta_2 = 0$.

Since the response vector $\mathbf{n}_{jk}$ is a random observation from $M(n_{+jk}, \mathbf{p}_{jk})$, we may specify the nested random effects model of interest as

$$\mathbf{n}_{jk}^{*} = n_{+jk}\,[\boldsymbol{\pi}^{*} + \mathbf{R}_j + \mathbf{C}_{jk}] + \boldsymbol{\epsilon}_{jk}, \tag{5.10}$$

where $\boldsymbol{\epsilon}_{jk}$ has mean vector $\mathbf{0}$ and covariance $n_{+jk}\,[D_{p_{ijk}} - \mathbf{p}_{jk}^{*}\mathbf{p}_{jk}^{*\prime}]$, and $\mathbf{R}_j$ and $\mathbf{C}_{jk}$ are independent, defined in (5.7), and have zero mean vectors and variances as in (5.8). The model (5.10) is in complete analogy to the nested random effects model for continuous data except for the normality assumptions, which are not valid here.

The joint distribution of $\{n_{ijk}\}$ in a nested random effects model is not in tractable form. Nevertheless, it is mildly encouraging that one can specify the nested random effects model of count data in the explicit form of (5.10). Even though we feel that the arguments of chapter 4 may be similarly employed for tests of the hypotheses $H_R: \theta_1 = 0$ and $H_C: \theta_2 = 0$, no results have been obtained at this time.

BIBLIOGRAPHY

Altham, Patricia M. E. (1978). Two generalizations of the binomial distribution. Applied Statistics 27, 162-7.

Ames, Bruce N., McCann, Joyce and Yamasaki, Edith (1975). Methods for detecting carcinogens and mutagens with the salmonella/mammalian-microsome mutagenicity test. Mutation Research 31, 347-64.

Bahadur, R. R. (1961). A representation of the joint distribution of responses to n dichotomous items. In Studies on Item Analysis and Prediction, H. Solomon (ed.), Stanford University, Stanford, California.

Blischke, W. R. (1962). Moment estimators for the parameters of a mixture of two binomial distributions. Annals of Mathematical Statistics 33, 444-54.

Blischke, W. R. (1964). Estimating the parameters of mixtures of binomial distributions. Journal of the American Statistical Association 59, 510-28.

Brier, Stephen S. (1980). Analysis of contingency table under cluster sampling. Biometrika 67, 591-6.

Chanda, K. C. (1954). A note on the consistency and maxima of the roots of likelihood equations. Biometrika 41, 56-61.

Chandra, S. (1977). On the mixture of probability distributions. Scandinavian Journal of Statistics 4, 105-12.

Chatfield, C. and Goodhardt, G. J. (1970). The beta-binomial model for consumer purchasing behavior. Applied Statistics 19, 240-50.

Chatterji, S. D. (1963). Some elementary characterizations of the Poisson distribution. American Mathematical Monthly 70, 958-64.

Chernoff, Herman (1954). On the distribution of the likelihood ratio. Annals of Mathematical Statistics 25, 573-8.

Choi, K. and Bulgren, W. G. (1968). An estimation procedure for mixtures of distributions. Journal of the Royal Statistical Society B 30, 444-60.

Chung, Kai Lai (1974). A Course in Probability Theory. Academic Press, New York.

Collings, Bruce J. and Margolin, Barry H. (1983). Testing of fit for the Poisson assumption when observations are not identically distributed. Submitted to Journal of the American Statistical Association.

Cramer, Harald (1946). Mathematical Methods of Statistics. Princeton University Press, Princeton.

Crowder, Martin J. (1978). Beta-binomial anova for proportions. Applied Statistics 27, 34-7.

Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B 39, 1-38.

Deely, J. J. and Kruse, R. L. (1968). Construction of sequences estimating the mixing distribution. Annals of Mathematical Statistics 39, 286-88.

Efron, B. and Hinkley, D. V. (1978). The observed versus expected information. Biometrika 65, 581-90.

Everitt, B. S. and Hand, D. J. (1981). Finite Mixture Distributions. Chapman and Hall, London.

Feder, Paul I. (1968). On the distribution of the log likelihood ratio test statistic when the true parameter is "near" the boundaries of the hypothesis regions. Annals of Mathematical Statistics 39, 2044-55.

Feder, Paul I. (1978). The beta binomial likelihood ratio test --- with application to the analysis of toxicological data. Unpublished Manuscript, National Institute of Environmental Health Sciences, Research Triangle Park.

Feller, W. (1943). On a General Class of "Contagious" Distributions. Annals of Mathematical Statistics 14, 389-400.

Feller, William (1968). An Introduction to Probability Theory and Its Application. John Wiley and Sons, New York.

Fienberg, Stephen E. (1975). Comment on "The observational study - a review" by Sonya McKinlay. Journal of the American Statistical Association 70, 521-3.

Fraser, D. A. S. (1957). Nonparametric Methods in Statistics. John Wiley and Sons, New York.

Greenwood, M. and Yule, G. U. (1920). An inquiry into the nature of frequency distributions representative of multiple happenings with particular reference to the occurrence of multiple attacks of disease or of repeated accidents. Journal of the Royal Statistical Society A 83, 255-79.

Griffiths, D. A. (1973). Maximum likelihood estimation for the beta-binomial distribution and an application to the household distribution of the total number of cases of a disease. Biometrics 29, 637-48.

Goodman, Leo A. and Kruskal, William H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association 49, 732-64.

Goodman, Leo A. and Kruskal, William H. (1959). Measures of association for cross classifications. II: Further discussion and reference. Journal of the American Statistical Association 54, 123-63.

Hardy, G. H., Littlewood, J. E. and Polya, G. (1964). Inequalities. Cambridge University Press, Cambridge.

Hasselblad, V. (1969). Estimation of finite mixtures of distributions from the exponential family. Journal of the American Statistical Association 64, 1459-71.

Haseman, J. K. and Kupper, L. L. (1979). An analysis of dichotomous response data from certain toxicological experiments. Biometrics 35, 281-93.

Haseman, J. K. and Soares, E. R. (1976). The distribution of fetal death in control mice and its implications on statistical tests for dominant lethal effects. Mutation Research 41, 277-88.

Hill, David L., Saunders, Roy and Laud, Purushottam W. (1980). Maximum likelihood estimation for mixtures. Canadian Journal of Statistics 8, 87-93.

Jewell, Nicholas P. (1982). Mixture of exponential distributions. Annals of Statistics 10, 479-84.

Johnson, Norman L. and Kotz, Samuel (1969). Discrete Distributions. John Wiley and Sons, New York.

Johnson, Norman L. and Kotz, Samuel (1977). Urn Models and Their Application. John Wiley and Sons, New York.

Kabir, A. B. M. L. (1968). Estimation of parameters of a finite mixture of distributions. Journal of the Royal Statistical Society B 30, 472-82.

Kemp, C. D. and Kemp, Adrienne W. (1956). The analysis of point quadrat data. Australian Journal of Botany 4, 167-74.

Kiefer, Nicholas M. (1978). Discrete parameter variation: Efficient estimation of a switching regression model. Econometrica 46, 427-34.

Kupper, L. L. and Haseman, J. K. (1978). The use of a correlated binomial model for the analysis of certain toxicological experiment. Biometrics 34, 69-76.

Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association 73, 805-11.

Lehmann, E. L. (1959). Testing Statistical Hypotheses. John Wiley and Sons, New York.

Light, Richard J. and Margolin, Barry H. (1971). An analysis of variance for categorical data. Journal of the American Statistical Association 66, 534-44.

Lindsay, Bruce G. (1983). The geometry of mixture likelihoods: A general theory. Annals of Statistics 11, 86-94.

Louis, Thomas A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society B 44, 226-33.

Margolin, Barry H., Kaplan, Norman and Zeiger, Errol (1981). Statistical analysis of the Ames salmonella/microsome test. Proceedings of the National Academy of Sciences 78, 3779-83.

Margolin, Barry H. and Light, Richard J. (1974). An analysis for categorical data, II: Small sample comparisons with chi square and other competitors. Journal of the American Statistical Association 69, 755-64.

Mosimann, James E. (1962). On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika 49, 65-82.

Moran, P. A. P. (1970). On asymptotically optimal test of composite hypotheses. Biometrika 57, 47-55.

Neveu, J. (1965). Mathematical Foundation of the Calculus of Probability. Holden Day, San Francisco.

Neyman, J. (1947). Outline of statistical treatment of the problem of diagnosis. Public Health Reports 62, 1449-56.

Neyman, Jerzy (1959). Optimal asymptotic tests of composite hypotheses. Probability and Statistics: The Harald Cramer Volume, ed. Ulf Grenander. John Wiley and Sons, New York.

Orchard, T. and Woodbury, M. A. (1972). A missing information principle: Theory and application. Proceedings of Sixth Berkeley Symposium on Mathematical Statistics and Probability 1, 697-715.

Paul, S. R. and Plackett, R. L. (1978). Inference sensitivity for Poisson mixtures. Biometrika 65, 591-602.

Pearson, K. (1894). Contributions to the Mathematical Theory of Evolution. Philosophical Transactions of Royal Society of London A 185, 71-110.

Pearson, K. (1915). On certain types of compound frequency distributions in which the components can be individually described by binomial series. Biometrika 11, 139-44.

Potthoff, Richard F. and Whittinghill, Maurice (1969 a). Testing for homogeneity I: The binomial and multinomial distributions. Biometrika 53, 167-82.

Potthoff, Richard F. and Whittinghill, Maurice (1969 b). Testing for homogeneity II: The Poisson distributions. Biometrika 53, 183-90.

Puri, Madan Lal and Sen, Pranab Kumar (1971). Nonparametric Methods in Multivariate Analysis. John Wiley and Sons, New York.

Rider, Paul R. (1961 a). The method of moments applied to a mixture of two exponential distributions. Annals of Mathematical Statistics 32, 143-7.

Rider, Paul R. (1961 b). Estimating the parameters of mixed Poisson, binomial and Weibull distributions by the method of moments. Bulletin of the International Statistical Institute 39 Part 2, 225-32.

Ronning, Gerd. (1982). Characteristic values and triangular factorization of the covariance matrix for multinomial, Dirichlet and multivariate hypergeometric distributions and some related results. Statistische Hefte 23, 152-76.

Roy, S. N., Greenberg, B. G. and Sarhan, A. E. (1960). Evaluation of determinants, characteristic equations, and their roots for a class of patterned matrices. Journal of the Royal Statistical Society B 22, 348-59.

Segreti, Anthony C. and Munson, Albert E. (1981). Estimation of the median lethal dose when responses within a litter are correlated. Biometrics 37, 153-6.

Scheffe, Henry (1959). The Analysis of Variance. John Wiley and Sons, New York.

Shuster, J. J. and Downing, D. J. (1976). Two-way contingency tables for complex sampling schemes. Biometrika 63, 271-6.

Simar, Leopold (1976). Maximum likelihood estimation of a compound Poisson process. Annals of Statistics 4, 1200-9.

Skellam, J. G. (1948). A probability distribution derived from the binomial distribution by regarding the probability of success as variable between the sets of trials. Journal of the Royal Statistical Society B 10, 257-61.

Stroud, T. W. F. (1971). On obtaining large-sample tests from asymptotically normal estimators. Annals of Mathematical Statistics 42, 1412-24.

'Student' (1907). On the error of counting with a hemacytometer. Biometrika 5, 351-60.

Sundberg, R. (1974). Maximum likelihood theory for incomplete data from an exponential family. Scandinavian Journal of Statistics 1, 49-58.

Symons, M. J., Grimson, R. C. and Yuan, Y. C. (1983). Clustering of rare events. Biometrics 39, 193-205.

Tarone, R. E. (1979). Testing the goodness of fit of the binomial distribution. Biometrika 66, 585-90.

Tarone, Robert E. and Gruenhage, Gary (1975). A note on the uniqueness of roots of the likelihood equations for vector-valued parameters. Journal of the American Statistical Association 70, 903-4.

Teicher, H. (1963). Identifiability of finite mixtures. Annals of Mathematical Statistics 34, 1265-9.

Wilks, S. S. (1938). The large sample distribution of the likelihood ratio for testing composite hypotheses. Annals of Mathematical Statistics 9, 60-2.

Wilks, Samuel S. (1962). Mathematical Statistics. John Wiley and Sons, New York.

Williams, D. A. (1975). The analysis of binary responses from toxicological experiments involving reproduction and teratogenicity. Biometrics 31, 949-52.

Wisniewski, T. K. M. (1968). Testing for homogeneity of a binomial series. Biometrika 55, 426-8.

Wolf, J. H. (1970). Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research 5, 329-50.

Wu, C. F. Jeff (1983). On the convergence property of the EM algorithm. Annals of Statistics 11, 95-103.
