Predicting Random Effects from a Finite Population of Unequal Size Clusters based
on Two-Stage Sampling
Edward J. Stanek III
Department of Public Health
401 Arnold House
University of Massachusetts
715 North Pleasant Street
Amherst, MA 01003-9304 USA
Julio M. Singer
Departamento de Estatística
Universidade de São Paulo
São Paulo, Brazil
C07ed13.doc 11/29/2007 12:37 PM i
ABSTRACT
Prediction of random effects is an important problem with expanding applications.
In the simplest context, the problem corresponds to prediction of the latent value (the
mean) of a realized cluster selected via two-stage sampling. Best linear unbiased
predictors developed from mixed models are widely used, but their development requires
distributional assumptions or an infinite population framework. When the number and
size of clusters are finite, superpopulation models have been used to predict the
contribution of the unobserved units to a realized cluster mean. Recently, predictors
developed from a two-stage sampling model have been shown to out-perform the model-
based predictors. However, the random permutation model underlying these predictors is
limited to settings where cluster sizes and sample sizes per cluster are equal.
We present a new two-stage sampling model that can be used when cluster sizes
and sample sizes per cluster differ. The model expands the set of random variables from
the two-stage sampling model to accommodate the different size clusters. The resulting
best linear unbiased predictor outperforms any of the competing predictors, even when
cluster sizes and sample sizes per cluster are equal. The reduction in expected mean squared
error relative to predictors developed under mixed model or superpopulation model
assumptions is likely due to the specificity of the expanded random permutation model to
the problem. This suggests faithfully capturing the stochastic aspects of a problem is
more important than simplifying assumptions in developing optimal solutions. Many
other problems may be amenable to improved solutions based on extension of this
approach.
Contact Address:
Edward J. Stanek III
Department of Public Health
401 Arnold House
University of Massachusetts at Amherst
Amherst, Ma. 01002
Phone: 413-545-3812
Fax: 413-545-1645
Email: [email protected]

KEYWORDS: superpopulation, best linear unbiased predictor, random permutation, optimal estimation, design-based inference, mixed models.

ACKNOWLEDGEMENT. This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA.
1. INTRODUCTION
There is often interest in the mean unit response of a cluster where response is
observed for only a subset of the cluster’s units. In its simplest form, this problem is
often addressed in the context of a mixed model, and corresponds to predicting the mean
of a realized cluster in a two stage simple random sample. There are many applications
where such prediction is of interest. Henderson (1984) was interested in predicting a
cow’s milk production, and studied lactation records of dairy cows. In the current U.S.
educational environment, high school standardized test scores are of interest, with
schools having high stakes in the results due to their use in resource allocation. The
strength of a family’s emotional support is of interest in studies of coping with
subsequent illness in aging (Yorgason, et al. 2006). A patient’s biomarker level may be
important in quantifying risk of disease or subsequent intervention.
Although applications vary widely, each problem can be defined using a common
clustered population framework, where the target parameter can be defined as the average
expected response of units in a realized cluster. In the examples, a cluster may
correspond to a cow, a school, a family, or a patient, while the units may correspond to
lactation periods, students, family members, or time periods, respectively. Ideally, the
number and size of clusters is known, since the practical relevance and interpretation of
the statistical inference is enhanced by a clear problem definition.
It is rarely possible to observe the expected response on all units in each cluster.
Instead, response is observed on a subset of units for a subset of clusters. An important
question is how to use such information to guess the average response of units in a
specified cluster. To answer this question, the observed data are linked to the population
via an assumed probability model, objective criteria are defined that can be used to
evaluate competing guesses, and the best guess is then determined.
Ideally, the problem defined in the population should be identical to the problem
answered via the stochastic model. The typical use of mixed models to guess the mean
response for a cluster is an example where the problem answered via the probability
model is a slightly different problem. In a mixed model, how best to estimate a
parameter corresponding to a cluster mean is translated into the question of how best to
predict a realized random effect. The cluster identified in the population is represented as
a random variable in the stochastic model. This translation changes the problem, since
the identity of the cluster is lost in the usual representation of the stochastic model. The
difference is subtle, since the subscript that identifies a cluster in the population is usually
replaced by a subscript representing the position of a selected cluster in the sample
(Cassel, Sarndal, and Wretman 1977). For example, the first cluster in a population
listing is not the same as the first sample cluster, which we refer to as the first primary
sampling unit, or PSU (as in Stanek and Singer, 2004). A similar substitution occurs for
the subscript identifying units in a cluster, and sample units for a sample cluster, which
we refer to as secondary sampling units, or SSUs. The differences between a cluster and
a PSU are often a source of confusion in interpreting predictors of realized random
effects.
Model based approaches to predict a realized random effect are characterized by a
gap between the problem stated in terms of identifiable clusters and units in the finite
population and the random effects in a mixed model that do not identify cluster or units.
Since multiple probability models can be posited, different ‘optimal’ solutions can be
claimed for the same basic problem. Each solution suffers from the limitation that the
underlying model cannot be connected back to the identifiable finite population. Thus,
while each solution is optimal for its theoretical framework, none of the frameworks
matches reality, and it is unclear which optimal solution is best.
There are other difficulties with model-based approaches that occur when cluster sizes
differ in the population. A problem occurs when trying to use a model to represent
sample SSUs in a PSU. Such a representation is important for the model to include
characteristics of units (within cluster variables) that may be related to response. The
problem occurs since vectors representing sample SSUs for a sample PSU are of different
size (corresponding to different numbers of sampled units). Interpretation of response for
an element in a sample vector then depends on the ordering of the PSUs, since vectors
representing different sample PSUs have different length. The mixed model assumption
that PSUs are selected at random conflicts with the conditioning needed to interpret
response for a SSU. In part to sidestep this difficulty, model based approaches may
assume the number of SSUs is infinitely large (as in the mixed model approaches of
Goldberger 1962; Henderson 1984, McLean, Sanders and Stroup 1991; Robinson 1991;
and McCulloch and Searle 2001), or condition on cluster size (as in the superpopulation
model based approaches of Scott and Smith 1969; Ghosh and Lahiri 1987; Bolfarine and
Zacks 1992; Valliant, Dorfman and Royall, 2000). Such assumptions widen the gap
between the problem defined in the finite population, and the solution to the problem
defined in the model. Mixed model methods are widely accepted and used in spite of
these issues, but the foundation for such methods is not completely secure.
We present a new expanded two stage random permutation (RP) model that
overcomes some of the limitations of previous probability models. A key feature of the
expanded model is simultaneously retaining the cluster identity and the PSU position in
the model, while distinguishing for a PSU the relevant contribution of sample SSUs, and
non-sampled SSUs to a target random variable such as a cluster mean. The model
extends the expanded model for simple random sampling of Stanek, Singer, and Lencina
(2004) to the two-stage RP model used by Stanek and Singer (2004), while allowing for
possible unequal size clusters. We develop the best linear unbiased predictor (BLUP) of
a PSU mean using this model, and show that the predictor has smaller mean squared error
than other predictors, including the usual mixed model predictor.
Beginning with an explicit representation of random variables underlying a two
stage RP of a finite population, we show that when predicting a PSU mean via a linear
unbiased predictor, the expanded model cannot be further reduced without loss of
information. Since the expanded model retains this information, the best linear unbiased
predictor (developed in the same manner as in Stanek and Singer (2004)) has smaller
expected MSE than any previously developed predictor (including commonly used
mixed model predictors and superpopulation model predictors). We
characterize the extent of the reduction in the theoretical expected mean squared error,
and illustrate results for empirical predictors via simulations relative to mixed model and
super population model predictors. We conclude by highlighting model features that
have consequence in extending this work and related work that offer promising
possibilities for future improvement.
2. AN EXPANDED RP MODEL FOR A FINITE CLUSTERED POPULATION
We first define notation and terminology for a clustered finite population. Let a
finite population be defined by a listing of $M_s$ units labeled $t = 1,\ldots,M_s$ in each of $N$
clusters, labeled $s = 1,\ldots,N$, where the non-stochastic response for unit $t$ in cluster $s$ is
given by $y_{st}$. We assume that the response for a unit can be observed without error, and
corresponds to the expected value for the unit. The finite population parameters
corresponding to the mean and variance of cluster $s$, $s = 1,\ldots,N$, are defined by
$$\mu_s = \frac{1}{M_s}\sum_{t=1}^{M_s} y_{st} \quad\text{and}\quad \left(\frac{M_s-1}{M_s}\right)\sigma_s^2 = \frac{1}{M_s}\sum_{t=1}^{M_s}\left(y_{st}-\mu_s\right)^2$$
(where we use the survey sampling definition of the parameter $\sigma_s^2$). Similarly, the
population mean, and the variance between cluster means are defined as
$$\mu = \frac{1}{N}\sum_{s=1}^{N}\mu_s \quad\text{and}\quad \left(\frac{N-1}{N}\right)\sigma^2 = \frac{1}{N}\sum_{s=1}^{N}\left(\mu_s-\mu\right)^2 .$$
Using these parameters, we represent the potentially observable response for unit $t$ in
cluster $s$ as
$$y_{st} = \mu + \beta_s + \varepsilon_{st}$$
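where $\beta_s = (\mu_s - \mu)$ is the deviation of the mean for cluster $s$ from the overall mean, and
$\varepsilon_{st} = (y_{st} - \mu_s)$ is the deviation of unit $t$'s response (in cluster $s$) from the mean for
cluster $s$. Defining $\mathbf{y} = (\mathbf{y}_1' \;\; \mathbf{y}_2' \;\cdots\; \mathbf{y}_N')'$ where $\mathbf{y}_s = (y_{s1} \;\; y_{s2} \;\cdots\; y_{sM_s})'$, the
model can be summarized as
$$\mathbf{y} = \mathbf{X}\mu + \mathbf{Z}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$$
where $\mathbf{X} = \mathbf{1}_M$, $\mathbf{Z} = \oplus_{s=1}^{N}\mathbf{1}_{M_s}$ is an $M \times N$ matrix, $M = \sum_{s=1}^{N} M_s$, and $\boldsymbol{\beta} = (\beta_1 \;\; \beta_2 \;\cdots\; \beta_N)'$. Here, $\mathbf{1}_a$ is an
$a \times 1$ column vector of ones, $\oplus_{s=1}^{N}\mathbf{A}_s$ denotes a block diagonal matrix with blocks given by
$\mathbf{A}_s$ (Graybill 1983), and $\boldsymbol{\varepsilon}$ is defined similarly to $\mathbf{y}$. None of the terms in the model are
random variables.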
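The notation above can be made concrete with a small numerical sketch. The population values below are hypothetical (they do not come from the paper), and the variable names are chosen here only for illustration; the code computes the cluster parameters with the survey sampling divisor and checks that the decomposition reproduces every unit's response.

```python
import numpy as np

# Hypothetical finite population: N = 3 clusters with M_s = 2, 2, 3 units.
y = [np.array([4.0, 6.0]), np.array([1.0, 3.0]), np.array([7.0, 8.0, 12.0])]

N = len(y)
M_s = np.array([len(ys) for ys in y])
mu_s = np.array([ys.mean() for ys in y])            # cluster means
sigma2_s = np.array([ys.var(ddof=1) for ys in y])   # divisor M_s - 1 (survey definition)
mu = mu_s.mean()                                    # unweighted mean of cluster means
sigma2 = mu_s.var(ddof=1)                           # variance between cluster means

beta = mu_s - mu                                    # cluster deviations
eps = np.concatenate([ys - m for ys, m in zip(y, mu_s)])  # unit deviations

# X = 1_M and Z = block diagonal matrix of the vectors 1_{M_s}.
M = int(M_s.sum())
X = np.ones((M, 1))
Z = np.zeros((M, N))
start = 0
for s, size in enumerate(M_s):
    Z[start:start + size, s] = 1.0
    start += size

y_vec = np.concatenate(y)
y_rebuilt = (X * mu).ravel() + Z @ beta + eps       # y = X mu + Z beta + eps
```

Nothing in this sketch is random: it only restates the fixed population quantities, in line with the remark that none of the terms in the model are random variables.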
2.1. Random Variables and The Two Stage Random Permutation Model
We explicitly define a vector of random variables that represents a two stage RP
of the population. Assuming that each realization of the permutation is equally likely,
with probability $1\big/\left(N!\prod_{s=1}^{N} M_s!\right)$, the random variables formally represent two-stage sampling
(Cochran 1977). We assume that the sample clusters are in the first $n$ positions in a
permutation of clusters. Similarly, we assume that the sample units in cluster $s$
correspond to the units in the first $m_s$ positions in a permutation of the cluster’s units.
Note that these definitions represent random variables as a sequence as opposed to the
more usual set notation.
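As an illustration of this sampling scheme, the following sketch draws one two-stage permutation and reads the sample off the leading positions. The function names and the use of Python's `random` module are assumptions made here for illustration, not part of the original development.

```python
import math
import random

def two_stage_permutation(cluster_sizes, rng):
    """Permute the clusters, then independently permute the units in each cluster."""
    cluster_order = list(range(len(cluster_sizes)))
    rng.shuffle(cluster_order)
    unit_orders = []
    for size in cluster_sizes:
        order = list(range(size))
        rng.shuffle(order)
        unit_orders.append(order)
    return cluster_order, unit_orders

def n_realizations(cluster_sizes):
    """Number of equally likely permutations: N! times the product of M_s!."""
    return math.factorial(len(cluster_sizes)) * math.prod(
        math.factorial(size) for size in cluster_sizes
    )

def two_stage_sample(cluster_sizes, n, m, rng):
    """Sample = first n clusters in the permutation, first m[s] units in each."""
    cluster_order, unit_orders = two_stage_permutation(cluster_sizes, rng)
    return {s: unit_orders[s][:m[s]] for s in cluster_order[:n]}
```

For cluster sizes 2, 2 and 3, `n_realizations` counts 3! x 2! x 2! x 3! equally likely permutations, so each has probability equal to the reciprocal of that count.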
When all clusters are of equal size such that $M_s = M$ for all $s = 1,\ldots,N$, Stanek
and Singer (2004) defined indicator random variables to explicitly represent a random
vector of dimension $NM \times 1$ corresponding to a two-stage permutation of the population.
We follow a similar strategy when cluster sizes differ, but note two differences. First, to
retain the identity of clusters and PSUs in the vector representing a RP, we expand the
number of random variables from $\sum_{s=1}^{N} M_s$ to $N\sum_{s=1}^{N} M_s$. Second, we include a set of fixed known
weights associated with the SSUs in a cluster. Different target parameters (such as the
cluster mean, total, or a cluster regression parameter) can be formed easily by changing
the weights.
The expanded model uses a larger set of random variables than the random
variables used in Stanek and Singer’s model, or in super-population model approaches.
These random variables are fewer than the random variables that would result from a
further expansion that would retain the identity of units and SSUs, and even fewer than
the very general representation of the model used by Godambe (1955). We attribute the
better efficiency of predictors (to be shown) under the expanded model in part to the
larger number of random variables used to represent the basic problem. Also notice that
the two-stage RP model faithfully represents a two-stage cluster sampling design, so that
the methods are design based. We define the weighted expanded random variables,
and illustrate the need for the expansion next.
Sample indicator random variables relate the response for unit $t$ in cluster $s$, $y_{st}$,
to the response for a unit in position $j$ of a cluster in position $i$ in a two stage
permutation of clusters and units. We define $U^{(s)}_{jt}$ as an indicator random variable that
takes on a value of one when SSU $j$ in cluster $s$ is unit $t$, and zero otherwise, and use it
to represent the random variable corresponding to SSU $j$ in cluster $s$ given by
$Y_{sj} = \sum_{t=1}^{M_s} U^{(s)}_{jt}\, y_{st}$. Let $w_{sj}$ denote a fixed non-stochastic weight for $s = 1,\ldots,N$, $j = 1,\ldots,M_s$,
and define the weighted response as $Y_{wsj} = w_{sj} Y_{sj}$. For example, $\sum_{j=1}^{M_s} Y_{wsj}$ corresponds to a
cluster total when $w_{sj} = 1$ for all $j = 1,\ldots,M_s$, or a cluster mean when $w_{sj} = \frac{1}{M_s}$ for
all $j = 1,\ldots,M_s$. Notice that we can define $Y_{wsj} = w_{sj}\,\mathbf{y}_s'\mathbf{U}^{(s)}_j$ where
$\mathbf{U}^{(s)}_j = \left(U^{(s)}_{j1} \;\; U^{(s)}_{j2} \;\cdots\; U^{(s)}_{jM_s}\right)'$, and $\mathbf{Y}_{ws} = \left(Y_{ws1} \;\; Y_{ws2} \;\cdots\; Y_{wsM_s}\right)'$. The vector $\mathbf{Y}_{ws}$
represents a permutation of weighted response for SSUs in cluster $s$.
A permutation of clusters is defined using the indicator random variables $U_{is}$ for
$i = 1,\ldots,N$ and $s = 1,\ldots,N$, that take on a value of one when PSU $i$ is cluster $s$, and a
value of zero otherwise. If all clusters were equal in size, we could represent a
permutation of SSUs for PSU $i$ by $\sum_{s=1}^{N} U_{is}\mathbf{Y}_{ws}$. When cluster sizes differ, elements in this
expression are defined only when $j \le \min(M_s,\; s = 1,\ldots,N)$. Mixed model and
superpopulation model representations of two stage cluster sampling do not formally
account for this range restriction in linking the random variables to the finite population.
We directly account for different size clusters, and avoid the requirement of range
constraints for subscripts by expanding the number of random variables used to represent
a permutation of clusters and units. For PSU $i$, the expanded random variables are
defined by the $M \times 1$ vector $\mathbf{Y}_{wi} = \left(\left(U_{is}\mathbf{Y}_{ws}'\right)\right)' = \left(U_{i1}\mathbf{Y}_{w1}' \;\; U_{i2}\mathbf{Y}_{w2}' \;\cdots\; U_{iN}\mathbf{Y}_{wN}'\right)'$, with a
two stage random permutation of the population represented by the $NM \times 1$ vector
$\mathbf{Y}_w = \left(\left(\mathbf{Y}_{wi}'\right)\right)' = \left(\mathbf{Y}_{w1}' \;\; \mathbf{Y}_{w2}' \;\cdots\; \mathbf{Y}_{wN}'\right)'$, where the subscripts in this last vector index PSU positions. An element of $\mathbf{Y}_w$ is given by $U_{is}Y_{wsj}$.
A simple example illustrates the notation. Suppose the population consists of
three clusters ($N = 3$), where the first two clusters have two units ($M_1 = M_2 = 2$), the
third cluster has three units ($M_3 = 3$), $M = 7$, and $w_{sj} = 1$ for all $s = 1,\ldots,N$, $j = 1,\ldots,M_s$.
We represent a permutation of units for cluster $s$ by $\mathbf{Y}_{ws}$, $s = 1,\ldots,3$. Suppose the first
permutation of clusters results in clusters $s = 1$, $s = 2$, and $s = 3$ in positions $i = 1,\ldots,3$
respectively, while a second permutation results in clusters $s = 3$, $s = 2$, and $s = 1$ in
positions $i = 1,\ldots,3$ respectively. The representation of the random variables realized by
the first permutation of PSUs is the random vector $\left(\mathbf{Y}_{w1}' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{Y}_{w3}'\right)'$, and by the second
permutation of PSUs is the random vector $\left(\mathbf{Y}_{w3}' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{Y}_{w1}'\right)'$. Although both vectors are
of dimension $7 \times 1$, the third SSU in the first permutation is in PSU $i = 2$, while the third
SSU in the second permutation is in PSU $i = 1$. The position of a SSU in the permuted
population is not sufficient to retain the identity of the PSU for the SSU. In contrast,
using the expanded random variable representation, the random variables realized by the
first permutation of PSUs are represented by
$\left(\mathbf{Y}_{w1}' \;\; \mathbf{0}_2' \;\; \mathbf{0}_3' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{0}_3' \;\; \mathbf{0}_2' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w3}'\right)'$, while those realized by the second
permutation are represented by $\left(\mathbf{0}_2' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w3}' \;\; \mathbf{0}_2' \;\; \mathbf{Y}_{w2}' \;\; \mathbf{0}_3' \;\; \mathbf{Y}_{w1}' \;\; \mathbf{0}_2' \;\; \mathbf{0}_3'\right)'$, where $\mathbf{0}_a$
is an $a \times 1$ vector with all elements equal to zero. This notation preserves the identity of
the PSU for each SSU, avoiding the ambiguity that arises in mixed models and
superpopulation models.
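The three-cluster example can be reproduced with a short sketch. The helper name `expanded_vector` and the placeholder response values are assumptions introduced here; the values are chosen only so that each entry's cluster of origin is visible.

```python
import numpy as np

def expanded_vector(cluster_order, Y_w):
    """Stack one length-M block per PSU position. The block for position i holds
    Y_w[s] in the slot reserved for the cluster s realized there, zeros elsewhere."""
    sizes = [len(v) for v in Y_w]
    offsets = np.concatenate([[0], np.cumsum(sizes)])
    M = int(offsets[-1])
    blocks = []
    for s in cluster_order:              # s = cluster realized at this PSU position
        block = np.zeros(M)
        block[offsets[s]:offsets[s + 1]] = Y_w[s]
        blocks.append(block)
    return np.concatenate(blocks)

# Placeholder permuted responses for clusters of size 2, 2 and 3 (w_sj = 1).
Y_w = [np.array([11.0, 12.0]), np.array([21.0, 22.0]), np.array([31.0, 32.0, 33.0])]

first = expanded_vector([0, 1, 2], Y_w)   # clusters 1, 2, 3 in positions 1, 2, 3
second = expanded_vector([2, 1, 0], Y_w)  # clusters 3, 2, 1 in positions 1, 2, 3
```

Both vectors have 21 elements. In `second`, the responses of cluster 3 occupy the last three slots of the first block, so the position of every nonzero entry identifies both the PSU and the cluster.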
2.2 A Mixed Effect Model for the Expanded Random Variables
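We represent a mixed model for the expanded RP model, indexing
expectation with respect to permutations of clusters with the subscript $\xi_1$ and expectation
with respect to permutations of units in a cluster with the subscript $\xi_2$. For PSU $i$, we
express
$$\mathbf{Y}_{wi} = E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right) + \left[E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right) - E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right)\right] + \mathbf{E}_{wi}$$
where $E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right) = \frac{1}{N}\left(\oplus_{s=1}^{N}\mathbf{w}_s\right)\boldsymbol{\mu}$, $E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right) = \left(\oplus_{s=1}^{N}\mathbf{w}_s\mu_s\right)\mathbf{U}_i$, $\mathbf{E}_{wi} = \mathbf{Y}_{wi} - E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right)$,
$\mathbf{w}_s = \left(\left(w_{sj}\right)\right) = \left(w_{s1} \;\; w_{s2} \;\cdots\; w_{sM_s}\right)'$, $\boldsymbol{\mu} = \left(\left(\mu_s\right)\right) = \left(\mu_1 \;\; \mu_2 \;\cdots\; \mu_N\right)'$ and
$\mathbf{U}_i = \left(\left(U_{is}\right)\right) = \left(U_{i1} \;\; U_{i2} \;\cdots\; U_{iN}\right)'$. The fixed effects are given by $\boldsymbol{\mu}$, the vector of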
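cluster means. The random effects, $E_{\xi_2|\xi_1}\left(\mathbf{Y}_{wi}\right) - E_{\xi_1\xi_2}\left(\mathbf{Y}_{wi}\right)$, are defined as the deviation
from the fixed effects of the expected response conditional on a realized PSU. In the RP
model of Stanek and Singer (2004), the random effect for the mean of PSU $i$ was defined
explicitly as $\sum_{s=1}^{N} U_{is}\beta_s$, with the random variables $U_{is}$ explicitly linking the clusters to
PSU $i$. In the expanded RP model, random effects are defined for SSU $j$ in PSU $i$ as
$w_{sj}\mu_s\left(U_{is} - E_{\xi_1}\left(U_{is}\right)\right)$. For both models, the expected value of the random effect (with
respect to $\xi_1$) is zero, but this result arises from quite different circumstances. The
deviation of response from the expected response within a PSU is represented by $\mathbf{E}_{wi}$.
We combine these expressions arriving at the expanded RP mixed model given by
$$\mathbf{Y}_w = \left[\mathbf{1}_N \otimes \frac{1}{N}\left(\oplus_{s=1}^{N}\mathbf{w}_s\right)\right]\boldsymbol{\mu} + \left[\mathbf{I}_N \otimes \left(\oplus_{s=1}^{N}\mathbf{w}_s\mu_s\right)\right]\left(\operatorname{vec}(\mathbf{U}) - E_{\xi_1}\left(\operatorname{vec}(\mathbf{U})\right)\right) + \mathbf{E} \qquad (1)$$
where $\mathbf{U} = \left(\mathbf{U}_1 \;\; \mathbf{U}_2 \;\cdots\; \mathbf{U}_N\right)$ and $\mathbf{E} = \left(\mathbf{E}_{w1}' \;\; \mathbf{E}_{w2}' \;\cdots\; \mathbf{E}_{wN}'\right)'$. The variance of random effects is given by
$$\operatorname{var}_{\xi_1\xi_2}\left(\left[\mathbf{I}_N \otimes \left(\oplus_{s=1}^{N}\mathbf{w}_s\mu_s\right)\right]\left(\operatorname{vec}(\mathbf{U}) - E_{\xi_1}\left(\operatorname{vec}(\mathbf{U})\right)\right)\right) = \frac{1}{N-1}\,\mathbf{P}_N \otimes \left[\left(\oplus_{s=1}^{N}\mathbf{w}_s\mu_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N}\mathbf{w}_s\mu_s\right)'\right]$$
while
$$\operatorname{var}_{\xi_1\xi_2}\left(\mathbf{E}\right) = \mathbf{I}_N \otimes \left[\oplus_{s=1}^{N}\frac{\sigma_s^2}{N}\left(\oplus_{j=1}^{M_s} w_{sj}\right)\mathbf{P}_{M_s}\left(\oplus_{j=1}^{M_s} w_{sj}\right)\right],$$
where $\mathbf{P}_a = \mathbf{I}_a - \frac{1}{a}\mathbf{J}_a$ and $\mathbf{J}_a$ denotes an $a \times a$ matrix with all elements equal to one.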
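The variance of the random effects rests on the covariance structure of the cluster indicators. As a hedged numerical check (not part of the original derivation), the sketch below enumerates all $N!$ equally likely cluster permutations and compares the exact variance of $\operatorname{vec}(\mathbf{U})$ with $\frac{1}{N-1}\mathbf{P}_N \otimes \mathbf{P}_N$, the form implied by the expression above. The function name is an assumption.

```python
import itertools
import numpy as np

def var_vec_U(N):
    """Exact covariance matrix of vec(U) over all N! equally likely permutations."""
    vecs = []
    for perm in itertools.permutations(range(N)):
        U = np.zeros((N, N))
        for i, s in enumerate(perm):
            U[s, i] = 1.0                 # column i is U_i: a one in the row of cluster s
        vecs.append(U.flatten(order="F"))  # vec() stacks the columns U_1, ..., U_N
    vecs = np.array(vecs)
    centered = vecs - vecs.mean(axis=0)
    return centered.T @ centered / len(vecs)

def implied_variance(N):
    P = np.eye(N) - np.ones((N, N)) / N   # P_N = I_N - J_N / N
    return np.kron(P, P) / (N - 1)
```

Pre- and post-multiplying this matrix by $\mathbf{I}_N \otimes \left(\oplus_{s=1}^{N}\mathbf{w}_s\mu_s\right)$ and its transpose then gives the stated variance of the random effects by the mixed-product rule for Kronecker products.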
2.3. Defining Random Variables of Interest
Model (1) is an expanded version of a mixed model that retains the identity of
clusters, while accounting for a two stage RP. Our interest is in predicting a linear
combination of these random variables defined by $T = \mathbf{g}'\mathbf{Y}_w$, where $\mathbf{g}$ is non-stochastic.
Although many linear combinations are possible, we limit the discussion to linear
combinations given by $T = \mathbf{g}'\mathbf{Y}_w$ where $\mathbf{g}' = \mathbf{c}' \otimes \mathbf{1}_M'$ and $\mathbf{c}$ is an $N \times 1$ vector of constants.
In particular, we focus on the setting where $\mathbf{c} = \mathbf{e}_i$ is an $N \times 1$ vector with all elements
equal to zero, except for element $i$ which has the value of one. Of principal interest is
the setting where $i \le n$, such that the cluster of interest is realized in the sample. When
$w_{sj} = \frac{1}{M_s}$ for all $s = 1,\ldots,N$, $j = 1,\ldots,M_s$, the target random variable,
$T = \sum_{s=1}^{N} U_{is}\left(\sum_{j=1}^{M_s} w_{sj}Y_{sj}\right)$, is the mean of PSU $i$; when $w_{sj} = 1$ for all $s = 1,\ldots,N$,
$j = 1,\ldots,M_s$, the target random variable, $T = \sum_{s=1}^{N} U_{is}\left(\sum_{j=1}^{M_s} w_{sj}Y_{sj}\right)$, is the total of PSU $i$.
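The following sketch evaluates the target for one realized permutation. It assumes $\mathbf{g}' = \mathbf{c}' \otimes \mathbf{1}_M'$ with $M = \sum_s M_s$ and weights that are constant within a cluster; the population values, the realized permutation, and the function names are all hypothetical.

```python
import numpy as np

# Hypothetical population with cluster sizes 2, 2 and 3.
y = [np.array([4.0, 6.0]), np.array([1.0, 3.0]), np.array([7.0, 8.0, 12.0])]
sizes = [len(v) for v in y]
N, M = len(y), sum(sizes)
offsets = np.concatenate([[0], np.cumsum(sizes)])

def realized_Y_w(cluster_order, unit_orders, weights):
    """Realized expanded vector Y_w: weighted, permuted responses in the slot
    for (PSU position i, cluster s); zeros in all other slots."""
    out = np.zeros(N * M)
    for i, s in enumerate(cluster_order):
        permuted = y[s][unit_orders[s]] * weights[s]
        out[i * M + offsets[s]: i * M + offsets[s + 1]] = permuted
    return out

def target(c, Y_w):
    """T = g'Y_w with g' = c' (x) 1_M'."""
    g = np.kron(c, np.ones(M))
    return g @ Y_w

cluster_order = [2, 0, 1]                  # cluster 3 is realized as PSU 1
unit_orders = [[1, 0], [0, 1], [2, 0, 1]]  # one permutation of units per cluster
e_1 = np.array([1.0, 0.0, 0.0])            # c = e_i with i = 1

T_mean = target(e_1, realized_Y_w(cluster_order, unit_orders, [1 / m for m in sizes]))
T_total = target(e_1, realized_Y_w(cluster_order, unit_orders, [1.0, 1.0, 1.0]))
```

With weights $1/M_s$ the target is the mean of whichever cluster is realized as PSU 1, and with unit weights it is that cluster's total, regardless of how the units within the cluster are permuted.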
3. PREDICTING A PSU MEAN USING AN EXPANDED RP MODEL
The basic strategy for developing a predictor under a model-based approach is
given in many places (Scott and Smith 1969; Royall 1976; Bolfarine and Zacks 1992;
Valliant et al. 2000), and applied in a design-based framework to balanced two stage
cluster sampling by Stanek and Singer (2004). Stanek and Singer showed that the
predictor of a realized PSU mean from the RP model outperforms the mixed model and
superpopulation model predictors. We develop similar results using the expanded RP
model, and illustrate that the resulting predictor of a realized PSU mean has even lower
expected MSE relative to the predictor given by Stanek and Singer (2004).
We assume that the elements in the sample portion of $\mathbf{Y}_w$ will be observed, and
express $T$ as the sum of two parts, one which is a function of the sample, and the other
which is a function of the remaining random variables. Then, requiring the predictor to
be a linear function of the sample random variables and to be unbiased, coefficients are
evaluated that minimize the expected value of the MSE given by $\operatorname{var}_{\xi_1\xi_2}\left(\hat{T} - T\right)$. While
in theory an optimal predictor can be expressed following this prediction recipe, in
practice the high dimensionality of the vectors from the expansion of random variables
may result in singularities that prevent a unique solution (as in Stanek, Singer, and Lencina
(2004)). In part for this reason, we project the random variables into a lower dimensional
space that retains the necessary information for an optimal solution, simplifying the
problem. The projection differs from the projection used when all clusters have equal
size.
3.1. Partial Collapsing of the Expanded RP Random Variables
Rao and Bellhouse (1978) (see Theorem 1.1) provide a way of determining
whether the optimal linear unbiased predictor of a target random variable, $T = \mathbf{g}'\mathbf{Y}_w$, can
be obtained as the optimal linear unbiased predictor of $T = \mathbf{g}_p'\mathbf{Y}_{wp}$ based on $\mathbf{Y}_{wp} = \mathbf{C}'\mathbf{Y}_w$,
a vector of random variables that spans a lower dimensional space. We apply this
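theorem when
$$\mathbf{C}' = \begin{pmatrix} \oplus_{i=1}^{N}\oplus_{s=1}^{N}\left(\mathbf{1}_{m_s}' \;\; \mathbf{0}_{M_s-m_s}'\right) \\ \oplus_{i=1}^{N}\oplus_{s=1}^{N}\left(\mathbf{0}_{m_s}' \;\; \mathbf{1}_{M_s-m_s}'\right) \end{pmatrix} \quad\text{and}\quad \mathbf{g}_p' = \mathbf{g}'\left[\mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\right],$$
where $m_s$ represents the number of sample units in cluster $s$. The additional subscript $p$ denotes the partial
collapsing of the expanded random variables. The collapsing sums the SSUs for the
sample and remainder in each cluster for each PSU, reducing the number of random
variables from $NM$ to $2N^2$. Since $\mathbf{Y}_w = \left[\mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\right]\mathbf{Y}_{wp} + \mathbf{P}_{\mathbf{C}}\mathbf{Y}_w$, where
$\mathbf{P}_{\mathbf{C}} = \mathbf{I}_{NM} - \mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\mathbf{C}'$, we can express $\mathbf{g}'\mathbf{Y}_w = \mathbf{g}'\left[\mathbf{C}\left(\mathbf{C}'\mathbf{C}\right)^{-1}\right]\mathbf{Y}_{wp} + \mathbf{g}'\mathbf{P}_{\mathbf{C}}\mathbf{Y}_w$. Notice
that $\mathbf{g}_p' = \left[\mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'\right]$, $\mathbf{g}'\mathbf{P}_{\mathbf{C}}\mathbf{Y}_w = 0$ and $T = \mathbf{g}_p'\mathbf{Y}_{wp}$. Defining $\hat{T}$ as the optimal linear
unbiased predictor of $T$ based on $\mathbf{Y}_{wp}$, and $\hat{B}$ as a linear unbiased predictor of
$\left(\mathbf{g}'\mathbf{P}_{\mathbf{C}}\mathbf{Y}_w\right) = 0$, $\hat{T}$ will be optimal for $T = \mathbf{g}'\mathbf{Y}_w$ if and only if $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right] = 0$. We
determine conditions under which $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right] = 0$ by using the unbiased constraints
of the predictors, and expressing $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right]$ as a function of $E_{\xi_1\xi_2}\left(\mathbf{Y}_w\mathbf{Y}_w'\right)$.
Simplifying terms, we find that when $w_{sj} = w_s$ for all $j = 1,\ldots,M_s$, $E_{\xi_1\xi_2}\left[\left(\hat{T} - T\right)\hat{B}\right] = 0$
(see c06ed54.doc for details). This means that we can obtain the optimal predictor using
the partially collapsed random variables as long as within each cluster, the weights are
equal for SSUs.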
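The partial collapsing can be checked numerically for a small design. The sketch below builds $\mathbf{C}'$ for hypothetical cluster sizes and sample sizes (with $0 < m_s < M_s$ so that $\mathbf{C}'\mathbf{C}$ is invertible), assumes $\mathbf{g}' = \mathbf{c}' \otimes \mathbf{1}_M'$, and verifies the two identities used above. The function name is an assumption.

```python
import numpy as np

def collapse_matrix(M_s, m_s):
    """Return C' (2N^2 x NM). Row (i, s) in the top half sums the sampled SSUs
    of cluster s in PSU i; the matching row in the bottom half sums the rest."""
    N, M = len(M_s), sum(M_s)
    offsets = np.concatenate([[0], np.cumsum(M_s)])
    Ct = np.zeros((2 * N * N, N * M))
    for i in range(N):
        for s in range(N):
            start = i * M + offsets[s]
            Ct[i * N + s, start:start + m_s[s]] = 1.0
            Ct[N * N + i * N + s, start + m_s[s]:start + M_s[s]] = 1.0
    return Ct

M_s, m_s = [2, 2, 3], [1, 1, 2]           # hypothetical sizes, 0 < m_s < M_s
N, M = len(M_s), sum(M_s)
Ct = collapse_matrix(M_s, m_s)            # C'
C = Ct.T
c = np.array([1.0, 0.0, 0.0])             # c = e_1
g = np.kron(c, np.ones(M))                # g' = c' (x) 1_M'

CtC_inv = np.linalg.inv(Ct @ C)
g_p = g @ C @ CtC_inv                     # g_p' = g' C (C'C)^{-1}
P_C = np.eye(N * M) - C @ CtC_inv @ Ct    # projection onto the complement of C
```

Here `g_p` equals $\mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'$ and `g @ P_C` is a zero vector, so the target loses nothing when expressed through the collapsed random variables.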
We assume that $w_{sj} = w_s$ for all $j = 1,\ldots,M_s$ in subsequent developments, and
develop the best linear unbiased predictor of $T = \mathbf{g}_p'\mathbf{Y}_{wp}$ based on $\mathbf{Y}_{wp}$. The vector $\mathbf{Y}_{wp}$
contains $2N^2$ random variables. The first $N^2$ random variables are of the form
$U_{is}w_s m_s\bar{Y}_{sI}$, while the remaining $N^2$ random variables are of the form
$U_{is}w_s\left(M_s - m_s\right)\bar{Y}_{sII}$, where $\bar{Y}_{sI} = \frac{1}{m_s}\sum_{j=1}^{m_s} Y_{sj}$ and $\bar{Y}_{sII} = \frac{1}{M_s - m_s}\sum_{j=m_s+1}^{M_s} Y_{sj}$.
It is natural to consider whether or not it is sufficient to predict $T = \mathbf{g}_p'\mathbf{Y}_{wp}$, where
$\mathbf{g}_p' = \left[\mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'\right]$, after further collapsing the $2N^2$ random variables in $\mathbf{Y}_{wp}$ to a set of $2N$
random variables, with one variable for the sample SSUs and one for the remaining SSUs
in each PSU. This collapsing is given by $\mathbf{Y}_{wp}^{*} = \mathbf{C}^{*\prime}\mathbf{Y}_{wp}$ where $\mathbf{C}^{*\prime} = \mathbf{I}_{2N} \otimes \mathbf{1}_N'$. Note that
$T = \mathbf{g}^{*\prime}\mathbf{Y}_{wp}^{*}$ where $\mathbf{g}^{*\prime} = \mathbf{g}_p'\left[\mathbf{C}^{*}\left(\mathbf{C}^{*\prime}\mathbf{C}^{*}\right)^{-1}\right]$, and that $\mathbf{C}^{*}\left(\mathbf{C}^{*\prime}\mathbf{C}^{*}\right)^{-1} = \mathbf{I}_{2N} \otimes \frac{1}{N}\mathbf{1}_N$, so that $\mathbf{g}^{*\prime} = \mathbf{1}_2' \otimes \mathbf{c}'$.
We refer to the random variables in $\mathbf{Y}_{wp}^{*}$ as the collapsed random variables. This set of random variables is similar to those used by Stanek and Singer (2004) for a population with equal size clusters and equal size samples per cluster. When there is no response error, the collapsed random variables can be used to develop the same predictor of a linear combination of PSU means as that obtained by Stanek and Singer (2004).
Our goal is to see whether or not we lose information in doing this collapsing by applying the Rao-Bellhouse theorem. We find that an unbiased predictor of $T$ using $\mathbf{Y}_{wp}^{*}$ can be obtained only if sampling of clusters is with probability proportional to size. Whether these random variables can be collapsed further remains an open question.
3.2. Predicting a Parameter for a PSU Using Collapsed Expanded RP Random
Variables
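We partition $\mathbf{Y}_{wp}$ into the first $nN$ random variables corresponding to the sample,
$\mathbf{Y}_{wI}$, and the remaining random variables, $\mathbf{Y}_{wII}$, to predict $T = \mathbf{g}_I'\mathbf{Y}_{wI} + \mathbf{g}_{II}'\mathbf{Y}_{wII}$, where
$\mathbf{g}_I' = \mathbf{c}_I' \otimes \mathbf{1}_N'$ and $\mathbf{g}_{II}' = \left(\mathbf{c}_{II}' \otimes \mathbf{1}_N' \;\;\; \mathbf{c}' \otimes \mathbf{1}_N'\right)$, and $\mathbf{c} = \left(\mathbf{c}_I' \;\; \mathbf{c}_{II}'\right)'$, where $\mathbf{c}_I$ is an $n \times 1$ vector of constants. Explicitly, the partitioned RP model is defined by
$$\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{X}_I \\ \mathbf{X}_{II}\end{pmatrix}\boldsymbol{\mu} + \left[E_{\xi_2|\xi_1}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix}\right] + \begin{pmatrix}\mathbf{E}_{wI} \\ \mathbf{E}_{wII}\end{pmatrix}$$
where $\mathbf{X}_I = \mathbf{1}_n \otimes \left[\frac{1}{N}\left(\oplus_{s=1}^{N} w_s m_s\right)\right]$ and
$$\mathbf{X}_{II} = \begin{pmatrix}\mathbf{1}_{N-n} \otimes \frac{1}{N}\left(\oplus_{s=1}^{N} w_s m_s\right) \\ \mathbf{1}_{N} \otimes \frac{1}{N}\left(\oplus_{s=1}^{N} w_s\left(M_s - m_s\right)\right)\end{pmatrix},$$
random effects are given by
$$E_{\xi_2|\xi_1}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\left(\mathbf{I}_n \otimes \left(\oplus_{s=1}^{N} f_s d_s\right)\right)\left(\operatorname{vec}(\mathbf{U}_I) - E_{\xi_1}\left(\operatorname{vec}(\mathbf{U}_I)\right)\right) \\ \left(\mathbf{I}_{N-n} \otimes \left(\oplus_{s=1}^{N} f_s d_s\right)\right)\left(\operatorname{vec}(\mathbf{U}_{II}) - E_{\xi_1}\left(\operatorname{vec}(\mathbf{U}_{II})\right)\right) \\ \left(\mathbf{I}_{N} \otimes \left(\oplus_{s=1}^{N} \left(1 - f_s\right) d_s\right)\right)\left[\operatorname{vec}(\mathbf{U}) - E_{\xi_1}\left(\operatorname{vec}(\mathbf{U})\right)\right]\end{pmatrix},$$
where $f_s = \frac{m_s}{M_s}$, $d_s = M_s w_s \mu_s$,
$\begin{pmatrix}\mathbf{E}_{wI} \\ \mathbf{E}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_2|\xi_1}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix}$, and $\mathbf{U} = \left(\mathbf{U}_I \;\; \mathbf{U}_{II}\right)$,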
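and $\mathbf{U}_I = \left(\left(\mathbf{U}_i\right)\right) = \left(\mathbf{U}_1 \;\; \mathbf{U}_2 \;\cdots\; \mathbf{U}_n\right)$, $\mathbf{U}_{II} = \left(\left(\mathbf{U}_i\right)\right) = \left(\mathbf{U}_{n+1} \;\; \mathbf{U}_{n+2} \;\cdots\; \mathbf{U}_N\right)$. We
partition the variance in a similar manner, representing it by
$$\operatorname{var}_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{I,II}' & \mathbf{V}_{II}\end{pmatrix}$$
where
$$\mathbf{V}_I = \frac{1}{N-1}\left(\mathbf{I}_n - \frac{1}{N}\mathbf{J}_n\right) \otimes \left[\left(\oplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N} f_s d_s\right)\right] + \mathbf{I}_n \otimes \left(\oplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right),$$
$$\mathbf{V}_{I,II} = \begin{pmatrix}\mathbf{V}_{I,II}^{(1)} & \mathbf{V}_{I,II}^{(2)}\end{pmatrix},$$
$$\mathbf{V}_{I,II}^{(1)} = -\frac{1}{N}\,\frac{1}{N-1}\,\mathbf{J}_{n \times (N-n)} \otimes \left[\left(\oplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N} f_s d_s\right)\right],$$
$$\mathbf{V}_{I,II}^{(2)} = \frac{1}{N-1}\left[\left(\mathbf{I}_n \;\; \mathbf{0}_{n \times (N-n)}\right) - \frac{1}{N}\mathbf{J}_{n \times N}\right] \otimes \left[\left(\oplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N} \left(1 - f_s\right) d_s\right)\right] - \left(\mathbf{I}_n \;\; \mathbf{0}_{n \times (N-n)}\right) \otimes \left(\oplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right),$$
$$\mathbf{V}_{II} = \begin{pmatrix}\mathbf{V}_{II}^{(11)} & \mathbf{V}_{II}^{(12)} \\ \mathbf{V}_{II}^{(12)\prime} & \mathbf{V}_{II}^{(22)}\end{pmatrix},$$
$$\mathbf{V}_{II}^{(11)} = \frac{1}{N-1}\left(\mathbf{I}_{N-n} - \frac{1}{N}\mathbf{J}_{N-n}\right) \otimes \left[\left(\oplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N} f_s d_s\right)\right] + \mathbf{I}_{N-n} \otimes \left(\oplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right),$$
$$\mathbf{V}_{II}^{(12)} = \frac{1}{N-1}\left[\left(\mathbf{0}_{(N-n) \times n} \;\; \mathbf{I}_{N-n}\right) - \frac{1}{N}\mathbf{J}_{(N-n) \times N}\right] \otimes \left[\left(\oplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N} \left(1 - f_s\right) d_s\right)\right] - \left(\mathbf{0}_{(N-n) \times n} \;\; \mathbf{I}_{N-n}\right) \otimes \left(\oplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right),$$
$$\mathbf{V}_{II}^{(22)} = \frac{1}{N-1}\,\mathbf{P}_N \otimes \left[\left(\oplus_{s=1}^{N} \left(1 - f_s\right) d_s\right)\mathbf{P}_N\left(\oplus_{s=1}^{N} \left(1 - f_s\right) d_s\right)\right] + \mathbf{I}_N \otimes \left(\oplus_{s=1}^{N} f_s^2 v_{se}^{*2}\right),$$
and $v_{se}^{*2} = \left(\frac{1 - f_s}{f_s}\right)\frac{M_s w_s^2 \sigma_s^2}{N}$.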
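We develop an expression for the best linear unbiased predictor of $T$ next. The
predictor is a linear function of the sample, $\hat{T} = \mathbf{L}'\mathbf{Y}_{wI}$. Since
$\hat{T} - T = \left(\mathbf{L}' - \mathbf{g}_I' \;\;\; -\mathbf{g}_{II}'\right)\begin{pmatrix}\mathbf{Y}_{wI} \\ \mathbf{Y}_{wII}\end{pmatrix}$, the unbiased constraint is given by
$\left(\mathbf{L}' - \mathbf{g}_I'\right)\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II} = \mathbf{0}$. Minimizing $\operatorname{var}_{\xi_1\xi_2}\left(\hat{T} - T\right)$ accounting for the unbiased
constraint using Lagrange multipliers results in the familiar solution,
$$\hat{\mathbf{L}} = \mathbf{g}_I + \left[\mathbf{V}_I^{-1} - \mathbf{V}_I^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I\right)^{-1}\mathbf{X}_I'\mathbf{V}_I^{-1}\right]\mathbf{V}_{I,II}\,\mathbf{g}_{II} + \mathbf{V}_I^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I\right)^{-1}\mathbf{X}_{II}'\mathbf{g}_{II} .$$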
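The constrained minimization can be sketched generically. The function below evaluates the solution for arbitrary inputs, assuming $\mathbf{V}_I$ is nonsingular; the toy matrices are random placeholders rather than the RP model's variance components, and the function names are hypothetical.

```python
import numpy as np

def blup_coefficients(V_I, V_I_II, X_I, X_II, g_I, g_II):
    """Coefficients L minimizing var(L'Y_I - g_I'Y_I - g_II'Y_II) subject to
    the unbiasedness constraint (L - g_I)'X_I - g_II'X_II = 0."""
    Vinv_X = np.linalg.solve(V_I, X_I)                # V_I^{-1} X_I
    Vinv_Vg = np.linalg.solve(V_I, V_I_II @ g_II)     # V_I^{-1} V_{I,II} g_II
    A = X_I.T @ Vinv_X                                # X_I' V_I^{-1} X_I
    correction = np.linalg.solve(A, X_II.T @ g_II - X_I.T @ Vinv_Vg)
    return g_I + Vinv_Vg + Vinv_X @ correction

def prediction_variance(L, V, g_I, g_II):
    """var(T_hat - T), where V is the joint covariance of (Y_I, Y_II)."""
    a = np.concatenate([L - g_I, -g_II])
    return a @ V @ a

# Hypothetical toy inputs: a random positive definite joint covariance.
rng = np.random.default_rng(0)
p, q, r = 5, 4, 2
B = rng.normal(size=(p + q, p + q))
V = B @ B.T + np.eye(p + q)
X_I, X_II = rng.normal(size=(p, r)), rng.normal(size=(q, r))
g_I, g_II = rng.normal(size=p), rng.normal(size=q)

L = blup_coefficients(V[:p, :p], V[:p, p:], X_I, X_II, g_I, g_II)
```

Solving linear systems rather than forming explicit inverses is a numerical convenience only. When $\mathbf{V}_I$ is singular, as the text warns can happen in the fully expanded model, this sketch does not apply.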
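This result simplifies to (see c06ed56.doc, p42 and c07ed01.doc p1)
$$\hat{\mathbf{L}} = \mathbf{P}_n\mathbf{c}_I \otimes \left[\left(\oplus_{s=1}^{N}\frac{k_s^{*}}{f_s}\right)\mathbf{1}_N\right] + \bar{c}\,\frac{N}{n}\left[\mathbf{1}_n \otimes \left(\oplus_{s=1}^{N}\frac{1}{f_s}\right)\mathbf{1}_N\right] + \bar{c}_{II}\,\frac{N-n}{n}\left[\mathbf{1}_n \otimes \mathbf{1}_N\right]$$
where $k_s^{*} = k_s - \frac{1}{d_s}\left(\frac{1 - k_s}{1 - \bar{k}}\right)\frac{1}{N}\sum_{s^{*}=1}^{N} k_{s^{*}} d_{s^{*}}$,
$k_s = \frac{d_s^2}{d_s^2 + \left(N - 1\right)v_{se}^{*2}}$, $\bar{k} = \frac{1}{N}\sum_{s=1}^{N} k_s$,
$\bar{c}_{II} = \frac{1}{N-n}\sum_{i=n+1}^{N} c_i$ and
$\bar{c} = \frac{1}{N}\sum_{i=1}^{N} c_i$. The predictor $\hat{T} = \hat{\mathbf{L}}'\mathbf{Y}_{wI}$ can be expressed as (see
page 6 of c07ed01.doc)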
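$$\hat{T} = \sum_{i=1}^{n} c_i\left(\hat{Y}_i - \bar{\hat{Y}}\right) + \bar{c}\left(\frac{N}{n}\right)\sum_{s=1}^{N}\left(I_s M_s w_s \bar{Y}_{sI}\right) + \bar{c}_{II}\left(\frac{N-n}{n}\right)\sum_{s=1}^{N}\left(I_s M_s f_s w_s \bar{Y}_{sI}\right)$$
where $\hat{Y}_i = \sum_{s=1}^{N} U_{is} M_s w_s k_s^{*}\bar{Y}_{sI}$, $\bar{\hat{Y}} = \frac{1}{n}\sum_{i=1}^{n}\hat{Y}_i$, $I_s = \sum_{i=1}^{n} U_{is}$ is an indicator ‘inclusion’ random
variable for cluster $s$ in the sample, and $\bar{Y}_{sI} = \frac{1}{m_s}\sum_{j=1}^{m_s} Y_{sj}$.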
An expression for the expected mean squared error (EMSE) of the predictor can
be developed directly using expressions for the partitioned variance, and simplifies to (see
c07ed17.doc, p15)
( ) ( ) ( )
( )
1 2
22 2 *2 *2
,1 1
2 2
1
1 1ˆvar 2
2
n N
i I kd kd d s se sei s
N
i I di
NcT T c c k v v
N n N
Ncc Nc ncn
ξ ξ σ σ
σ
= =
=
⎛ ⎞⎛ ⎞ ⎛− = − − + +⎜ ⎟⎜ ⎟ ⎜⎝ ⎠⎝ ⎠ ⎝⎡ ⎤+ + −⎢ ⎥⎣ ⎦
∑ ∑
∑
*2
1
N
s=
⎞⎟⎠
∑
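where $\bar{c}_I = \frac{1}{n}\sum_{i=1}^{n} c_i$, $\mu_d = \frac{1}{N}\sum_{s=1}^{N} d_s$, $\mu_{kd} = \frac{1}{N}\sum_{s=1}^{N} k_s^{*} d_s$, $\sigma_{kd}^2 = \frac{1}{N-1}\sum_{s=1}^{N}\left(k_s^{*} d_s - \mu_{kd}\right)^2$,
$\sigma_d^2 = \frac{1}{N-1}\sum_{s=1}^{N}\left(d_s - \mu_d\right)^2$ and $\sigma_{kd,d} = \frac{1}{N-1}\sum_{s=1}^{N}\left(k_s^{*} d_s - \mu_{kd}\right)\left(d_s - \mu_d\right)$.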
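When predicting a PSU mean, $w_s = \frac{1}{M_s}$, and the predictor simplifies to
$\hat{T} = \frac{1}{n}\sum_{s=1}^{N} I_s\bar{Y}_{sI} + \left(\hat{Y}_i - \bar{\hat{Y}}\right)$ if $i \le n$, and to $\hat{T} = \frac{1}{n}\left[\sum_{s=1}^{N} I_s\bar{Y}_{sI}\right]$ when $i > n$. The EMSE of the
sample PSU mean predictor (when $i \le n$) simplifies to (see c07ed25.doc, page 3)
$$\operatorname{var}_{\xi_1\xi_2}\left(\hat{T} - T\right) = \frac{n-1}{n}\left(\frac{1}{N-1}\right)\left[\sum_{s=1}^{N}\left(1 - k_s^{*}\right)^2\mu_s^2 - \frac{1}{N}\left(\sum_{s=1}^{N}\left(1 - k_s^{*}\right)\mu_s\right)^2\right] + \frac{1}{nN}\sum_{s=1}^{N}\left[1 + \left(n - 1\right)k_s^{*2}\right]\left(1 - f_s\right)\frac{\sigma_s^2}{m_s},$$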
while the EMSE of a PSU not in the sample is given by
$$\operatorname{var}_{\xi_1\xi_2}\left(\hat{T} - T\right) = \frac{n+1}{n}\,\sigma^2 + \frac{1}{nN}\sum_{s=1}^{N}\left(1 - f_s\right)\frac{\sigma_s^2}{m_s} .$$
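For a PSU not in the sample, the EMSE can be checked by brute force on a tiny population. The sketch below assumes the closed form is $\frac{n+1}{n}\sigma^2 + \frac{1}{nN}\sum_s (1-f_s)\sigma_s^2/m_s$ and that the predictor is the simple average of the sample cluster means; the population values, sample sizes, and function names are hypothetical. It averages the squared prediction error over every equally likely two-stage permutation.

```python
import itertools
import numpy as np

# Hypothetical population and design: N = 3 clusters, n = 2 sampled clusters.
y = [np.array([4.0, 6.0]), np.array([1.0, 3.0]), np.array([7.0, 8.0, 12.0])]
m = [1, 1, 2]     # units sampled per cluster, if the cluster is selected
n = 2

def emse_formula(y, m, n):
    """Assumed closed form: ((n+1)/n) sigma^2 + (1/(nN)) sum (1-f_s) sigma_s^2 / m_s."""
    N = len(y)
    mu_s = np.array([v.mean() for v in y])
    sigma2 = mu_s.var(ddof=1)
    within = sum((1 - m_s / len(v)) * v.var(ddof=1) / m_s for v, m_s in zip(y, m))
    return (n + 1) / n * sigma2 + within / (n * N)

def emse_enumerated(y, m, n, i):
    """Average (T_hat - T)^2 over all two-stage permutations for the PSU in
    0-based position i, where i >= n so that the PSU is not in the sample."""
    N = len(y)
    unit_perms = [list(itertools.permutations(v)) for v in y]
    sq_errors = []
    for clusters in itertools.permutations(range(N)):
        for units in itertools.product(*unit_perms):
            sample_means = [np.mean(units[s][:m[s]]) for s in clusters[:n]]
            T_hat = np.mean(sample_means)      # (1/n) sum of sample cluster means
            T = y[clusters[i]].mean()          # mean of the cluster realized as PSU i
            sq_errors.append((T_hat - T) ** 2)
    return float(np.mean(sq_errors))

formula_value = emse_formula(y, m, n)
enumerated_value = emse_enumerated(y, m, n, i=2)
```

Exhaustive enumeration is feasible only for very small populations, so this serves as a sanity check on the expression rather than as a computational method.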
4. COMPARISON OF EMSE
5. EMPIRICAL PREDICTOR SIMULATION
6. APPLICATION
7. DISCUSSION
OLD STUFF
4. APPLICATION
_________________________________________________________________________________________
Table 2. Predictors of latent values of a realized PSU mean based on different models
_________________________________________________________________________________________
Predictor          Model                Target                     Sample SSUs             Remaining SSUs
_________________________________________________________________________________________
$\bar{Y}_i$        Simple Mean          $\mu_s \mid U_{is} = 1$    $c_i^{*}\bar{Y}_i$      $+\ (1 - c_i^{*})\bar{Y}_i$
$P_{iMM}$          Mixed Model          $P_i$                      $\hat{\mu} + k_i(\bar{Y}_i - \hat{\mu})$
$P_{iSS}$          Scott & Smith Model  $P_i$                      $f_i\bar{Y}_i$          $+\ (1 - f_i)\left[\hat{\mu}^{*} + k_i^{*}(\bar{Y}_i - \hat{\mu}^{*})\right]$
$P_i^{*}$          RP (PPS)             $P_i$                      $f\bar{Y}_i$            $+\ (1 - f)\left[\bar{Y} + k^{*}(\bar{Y}_i - \bar{Y})\right]$
$P_i^{\bullet}$    RP (General)         $P_i^{\bullet}$            $c_i\bar{Y}_i$          $+\ (1 - c_i)\left[\bar{Y} + k^{\bullet}(\bar{Y}_i - \bar{Y})\right]$
_________________________________________________________________________________________
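The predictors in Table 2 share one composite form: a weight on the observed cluster mean plus the complementary weight on a shrinkage predictor for the remaining SSUs. The Python sketch below encodes that shared form as an illustration. The function name and the example numbers are assumptions made for the sketch, and the weights and shrinkage constants are supplied as inputs rather than derived from any model.

```python
def composite_predictor(weight, cluster_mean, center, shrinkage):
    """Composite predictor of a realized PSU mean in the form used in Table 2.

    weight       : share given to the sampled SSUs (f_i, f or c_i in the table;
                   use 0 for the mixed model row, which has no such split)
    cluster_mean : sample mean of the realized PSU
    center       : value shrunk toward (an estimated mean or the overall sample mean)
    shrinkage    : shrinkage constant (k_i, k*_i, k* or the general RP constant)
    """
    # Predictor for the remaining SSUs: center + shrinkage * (cluster mean - center).
    remaining = center + shrinkage * (cluster_mean - center)
    # Weighted combination of the observed part and the predicted part.
    return weight * cluster_mean + (1.0 - weight) * remaining


# Made-up values: cluster mean 10, center 6, shrinkage 0.5.
mixed_model_form = composite_predictor(0.0, 10.0, 6.0, 0.5)     # no sample/remainder split
scott_smith_form = composite_predictor(0.25, 10.0, 6.0, 0.5)    # weight plays the role of f_i
simple_mean_form = composite_predictor(0.25, 10.0, 6.0, 1.0)    # shrinkage 1 returns the cluster mean
```

Setting the shrinkage constant to 1 returns the cluster sample mean for any weight, which corresponds to the Simple Mean row. The other rows differ only in which weight, center and shrinkage constant are plugged in.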
REFERENCES
Bolfarine, H., and Zacks, S. (1992), Prediction Theory for Finite Populations, New York:
Springer-Verlag.
Cassel, C. M., Särndal, C.-E., and Wretman, J. H. (1977), Foundations of Inference in
Survey Sampling, New York: Wiley.
Cochran, W. (1977), Sampling Techniques, New York: Wiley.
Deville, J. C., and Särndal, C.-E. (1992), “Calibration Estimation in Survey Sampling,”
Journal of the American Statistical Association, 87, 376-382.
Diggle, P. L., Heagerty, P., Liang, K. Y. and Zeger, S. (2002), Analysis of Longitudinal
Data, Oxford University Press.
Ghosh, M. and Lahiri, P. (1987), “Robust Empirical Bayes Estimation of Means from
Stratified Samples,” Journal of the American Statistical Association, 82, 1153-1162.
Goldberger, A. S. (1962), “Best Linear Unbiased Prediction in the Generalized Linear
Regression Model,” Journal of the American Statistical Association, 57, 369-375.
Graybill, F. A. (1983), Matrices with Applications in Statistics, Belmont, California:
Wadsworth International.
Henderson, C.R. (1984), Applications of Linear Models in Animal Breeding, Guelph,
Canada: University of Guelph.
Henderson, C. R., Kempthorne, O., Searle, S. R., and von Krosigk, C. M. (1959), “The
Estimation of Environmental and Genetic Trends from Records Subject to Culling,”
Biometrics, 15, 192-218.
Hinkelmann, K., and Kempthorne, O. (1994), Design and Analysis of Experiments, Vol. 1,
Introduction to Experimental Designs, New York: Wiley.
Li, W. (2003), “Use of Random Permutation Model in Rate Standardization and
Calibration,” unpublished doctoral thesis, University of Massachusetts, Massachusetts.
McCulloch, C. E. and Searle, S. R. (2001), Generalized, Linear, and Mixed Models, New
York: John Wiley and Sons.
McLean, R. A., Sanders, W. L. and Stroup, W. W. (1991), “A Unified Approach to
Mixed Linear Models,” The American Statistician, 45(1), 54-64.
Ockene, I. S., Hebert, J. R., Ockene, J. K., Saperia, G. M., Nicolosi, R., Merriam, P.A.
and Hurley, T. G. (1999), “Effect of Physician-delivered Nutrition Counseling Training
and an Office-support Program on Saturated Fat Intake, Weight, and Serum Lipid
Measurements in a Hyperlipidemic Population: Worcester Area Trial for Counseling in
Hyperlipidemia (WATCH),” Archives of Internal Medicine, Apr 12, 159(7), 725-731.
Rao, J.N.K. (2003), Small Area Estimation, New York: Wiley.
Robinson, G. K. (1991), “That BLUP is a Good Thing: the Estimation of Random
Effects,” Statistical Science, 6(1), 15-51.
Royall, R. M. (1976), “The Linear Least-squares Prediction Approach to Two-stage
Sampling,” Journal of the American Statistical Association, 71, 657-664.
Särndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling,
New York: Springer-Verlag.
Scott, A., and Smith, T. M. F. (1969), “Estimation in Multi-stage Surveys,” Journal of the
American Statistical Association, 64(327), 830-840.
Searle, S. R., Casella, G. and McCulloch, C. E. (1992), Variance Components, New
York: Wiley.
Stanek, E. J. III and Singer, J. M. (2004), “Predicting Random Effects from Finite
Population Clustered Samples with Response Error,” Journal of the American Statistical
Association, 99, 119-130.
Valliant, R., Dorfman, A. H. and Royall, R. M. (2000), Finite Population Sampling and
Inference, New York: Wiley.
Verbeke, G., and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data,
New York: Springer-Verlag.