C07ed47.doc 11/29/2007 3:48 PM i
Predicting Random Effects for Different Size Clusters Using an Expanded Finite
Population Mixed Model
Edward J. Stanek III
Department of Public Health
401 Arnold House
University of Massachusetts
715 North Pleasant Street
Amherst, MA 01003-9304 USA
Julio M. Singer
Departamento de Estatística
Universidade de São Paulo
São Paulo, Brazil
ABSTRACT
Prediction of random effects is an important problem with expanding applications.
In the simplest context, the problem corresponds to prediction of the latent value (the
mean) of a realized cluster selected via two-stage sampling. Best linear unbiased
predictors developed from mixed models are widely used, but in equal cluster size
settings, have been recently shown to be out-performed by finite population mixed model
predictors (Stanek and Singer 2004). When clusters differ in size, super-population
models have been used to predict the contribution of the unobserved subjects to a realized
cluster mean, providing an extension of the mixed model to a finite population. We
develop an expanded finite population mixed model for use in predicting linear
combinations of realized cluster means. The predictor is a linear function of the sample,
unbiased, and has minimal MSE. Comparison with mixed model, super-population
model, and finite population mixed model predictors demonstrates substantial reduction
in the MSE, even in settings when cluster sizes are equal.
The general approach faithfully captures the stochastic aspects of sampling in the
problem. The expanded random variables span a higher dimensional space than those
typically applied to such problems. This fact enables predictors with reduced MSE to
result. The general approach, which describes the expansion of the random variables and
subsequent reductions to a sufficient set, is illustrated in the two stage setting, and has
the potential for further application.
Contact Address:
Edward J. Stanek III
Department of Public Health
401 Arnold House
University of Massachusetts at Amherst
Amherst, Ma. 01002
Phone: 413-545-3812
Fax: 413-545-1645
Email: [email protected]

KEYWORDS: superpopulation, best linear unbiased predictor, random permutation, optimal estimation, design-based inference, mixed models.

ACKNOWLEDGEMENT. This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil and the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA.
1. INTRODUCTION
How to best guess the average response for subjects in a cluster based on data that
includes only some subjects in some clusters is a common problem. For example,
medical costs and quality of care are important factors that impact health care economics,
and influence patient choice of care. Web sites are currently available that rate hospitals
in regions according to these characteristics (see, for example,
http://www.healthgrades.com). Estimating such rates and average costs for hospitals that
typically vary in size is an important practical problem.
The best linear unbiased predictor (BLUP) developed in a mixed model is often
offered as a solution to the problem of predicting the average cluster response. This
solution accounts for unequal numbers of subjects in sample clusters, but does not use
information that is often available about the size of the cluster. An alternative approach
uses the super-population model of Scott and Smith (1969) to account for different
cluster sizes in a finite super-population. Both approaches specify stochastic models that
plausibly represent the problem of interest, but are not formally linked to the finite
population. A third approach, limited to settings where clusters are of equal size, is the
finite population mixed model of Stanek and Singer (2004). This approach uses sampling
indicator random variables to link the sample to the population, resulting in predictors
with smaller mean squared error (MSE) than the other approaches, even when empirical
predictors are used (San Martino, Singer, and Stanek (2007)). We revisit this problem
when clusters differ in size, illustrating that each previous approach does not adequately
represent the problem, and develop an expanded finite population mixed model that
overcomes these limitations, resulting in a predictor that outperforms previous predictors
of the latent value of a realized random effect in all settings.
2. BACKGROUND AND AN EXAMPLE
We consider a simple example using hypothetical data on the cost of
appendectomies (assumed to be known without error) for patients in two hospitals to
motivate and provide background for the problem (Table 1).
Table 1. Data on Hospital Costs for Appendectomy for Patients in Two Hospitals

               Data                          Hospital
Hospital   Patient        Expense    # Patients   Latent Value
Central    Sam Evans      $1400      3            \(\mu_{\text{Central}}\)
           Jane Blake     $2100
           Hong Yao       $2500
Mercy      Juan Marcus    $1900      2            \(\mu_{\text{Mercy}}\)
           Mary Slokum    $1700

Our interest is in the average cost of appendectomies (the latent value) for each hospital
in the past year. If the available data (in Table 1) includes the cost of all appendectomies
in this period for a hospital, the latent value is the simple average cost. When such data
are not available for all appendectomy patients, the average cost for the available patients
in each hospital (i.e., $2000 for Central, and $1800 for Mercy) can be used to estimate
the latent value for each hospital. This estimator is the best linear unbiased estimator if
the available data resulted from a stratified simple random sample of appendectomy
patients, with hospitals as strata.
A different stochastic model assumes that the data in Table 1 are the result of a
two-stage cluster sample, where a simple random sample of appendectomy patients is
selected from each of a simple random sample of hospitals. We refer to a sample hospital
as a primary sampling unit (PSU) to distinguish it from a specific hospital, and to a
sample patient as a secondary sampling unit (SSU) to distinguish it from a specific
patient. Using the notation in Table 2,
Table 2. Notation for Common Mixed Model Representation of Data

                     Data                                   PSU
Sample Hospital   Sample Patient   Expense       Sample Size   Latent Value
(PSU i)           (SSU j)          \(Y_{ij}\)    \(m_i\)       \(\mu + B_i\)
i = 1             j = 1            \(Y_{11}\)    3             \(\mu + B_1\)
                  j = 2            \(Y_{12}\)
                  j = 3            \(Y_{13}\)
i = 2             j = 1            \(Y_{21}\)    2             \(\mu + B_2\)
                  j = 2            \(Y_{22}\)

a model for \(Y_{ij}\), the appendectomy cost for SSU \(j\) in PSU \(i\), is given by the mixed model
\[ Y_{ij} = \mu + B_i + E_{ij} \qquad (1) \]
where \(\mu\) is the overall mean, \(B_i\) is the random effect for PSU \(i\), and \(E_{ij}\) is a random
variable corresponding to the deviation of SSU \(j\) from the latent value, \(T_i = \mu + B_i\), of
PSU \(i\). Model (1) is an example of the general linear mixed model
\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{Z}\mathbf{B} + \mathbf{E} \qquad (2) \]
where \(\mathbf{Y}\) is an \(r \times 1\) response vector, \(\mathbf{X}\) and \(\mathbf{Z}\) are \(r \times p\) and \(r \times q\) known design
matrices, respectively, \(\boldsymbol{\alpha}\) is a \(p \times 1\) vector of fixed effects, \(\mathbf{B}\) is a \(q \times 1\) vector of random
effects with null means and covariance matrix \(\boldsymbol{\Gamma}\), \(\mathbf{E}\) is an \(r \times 1\) vector of random errors
with null means and covariance matrix \(\boldsymbol{\Sigma}\), and \(\mathbf{E}\) is independent of \(\mathbf{B}\), such that
\(\operatorname{var}(\mathbf{Y}) = \boldsymbol{\Omega} = \mathbf{Z}\boldsymbol{\Gamma}\mathbf{Z}' + \boldsymbol{\Sigma}\). This model has a long history (see, for example, Harville 1978,
Laird and Ware 1982) and is the main topic in several recent texts (for example,
Brown and Prescott 1999, Bryk and Raudenbush ( ), Demidenko ( ), Diggle et al.
2003, Littell et al. 2006, McCulloch and Searle 2001, Singer and Willett 2003, Verbeke
and Molenberghs 2000, or Vonesh and Chinchilli 1997). For the data in Table 2 using
(1), where \(i = 1, \ldots, n = 2\), \(j = 1, \ldots, m_i\) and defining \(r = \sum_{i=1}^{n} m_i\), the terms in (2) are given
by \(\mathbf{X} = \mathbf{1}_r\), \(\mathbf{Z} = \oplus_{i=1}^{n} \mathbf{1}_{m_i}\), \(\boldsymbol{\alpha} = \mu\), and \(\mathbf{B} = (B_1, \ldots, B_n)'\) with \(\boldsymbol{\Gamma} = \sigma^2 \mathbf{I}_n\) and
\(\boldsymbol{\Sigma} = \oplus_{i=1}^{n} \sigma_i^2 \mathbf{I}_{m_i}\),
where \(\mathbf{1}_a\) is an \(a \times 1\) vector with all elements equal to one, \(\mathbf{I}_a\) is an \(a \times a\) identity matrix,
and \(\oplus_{i=1}^{n} \mathbf{A}_i\) denotes a block diagonal matrix with blocks given by \(\mathbf{A}_i\) (Graybill 1983). In
these expressions, \(\operatorname{var}(B_i) = \sigma^2\), and \(\operatorname{var}(E_{ij}) = \sigma_i^2\) for \(i = 1, \ldots, n\). The mixed model
predictor of the latent value for PSU \(i\)
\[ \hat{P}_i = \hat{\mu} + k_i (\bar{Y}_i - \hat{\mu}) \qquad (3) \]
is a linear function of \(\mathbf{Y}\) (i.e., \(\hat{P}_i = \mathbf{L}'\mathbf{Y}\)), unbiased (i.e., \(E(\hat{P}_i - T_i) = 0\)), and has
minimum mean squared error (MSE), where
\[ \hat{\mu} = \frac{\sum_{i=1}^{n} w_i \bar{Y}_i}{\sum_{i=1}^{n} w_i} \]
is a weighted sample mean with
\[ w_i = \frac{1}{\sigma^2 + \sigma_i^2 / m_i}, \qquad \bar{Y}_i = \frac{1}{m_i} \sum_{j=1}^{m_i} Y_{ij}, \qquad k_i = \frac{\sigma^2}{\sigma^2 + \sigma_i^2 / m_i} \]
(Goldberger 1962; Henderson 1984; McLean, Sanders and Stroup 1991; Robinson 1991). If we assume that
the data in Table 1 are the realization of the random variables represented in Table 2, and
in addition that \(\sigma = 100\), \(\sigma_1 = 300\) and \(\sigma_2 = 50\), then \(\hat{\mu}\) = $1844, \(k_1 = 0.25\), \(k_2 = 0.89\),
and the predictor of the latent value for the realized hospital corresponding to \(i = 1\) (i.e.,
Central) is \(\hat{P}_1\) = $1883, while the predictor of the latent value for \(i = 2\) (i.e., Mercy) is
\(\hat{P}_2\) = $1805.
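The arithmetic behind these values can be checked with a short script. The following is a minimal sketch in Python that applies predictor (3) to the Table 1 data under the variance components assumed in the text; the function and variable names are illustrative choices and do not come from the source.

```python
# Table 1 data (observed appendectomy costs per hospital).
costs = {"Central": [1400, 2100, 2500], "Mercy": [1900, 1700]}
# Variance components assumed in the text: sigma = 100, sigma_1 = 300, sigma_2 = 50.
sigma = 100.0
sigma_i = {"Central": 300.0, "Mercy": 50.0}


def mixed_model_predictors(costs, sigma, sigma_i):
    """Return (mu_hat, k, P) for predictor (3): P_i = mu_hat + k_i * (ybar_i - mu_hat)."""
    ybar, w, k = {}, {}, {}
    for hospital, y in costs.items():
        m = len(y)
        ybar[hospital] = sum(y) / m
        total_var = sigma ** 2 + sigma_i[hospital] ** 2 / m
        w[hospital] = 1.0 / total_var              # weight w_i
        k[hospital] = sigma ** 2 / total_var       # shrinkage constant k_i
    mu_hat = sum(w[h] * ybar[h] for h in costs) / sum(w.values())
    predictors = {h: mu_hat + k[h] * (ybar[h] - mu_hat) for h in costs}
    return mu_hat, k, predictors


mu_hat, k, predictors = mixed_model_predictors(costs, sigma, sigma_i)
print(round(mu_hat), {h: round(v, 2) for h, v in k.items()})
print({h: round(p) for h, p in predictors.items()})
```

After rounding, the script gives 1844 for the weighted mean, 0.25 and 0.89 for the shrinkage constants, and 1883 and 1805 for the two predictors, in agreement with the values quoted above.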
When the two-stage sampling model can be assumed (either because such
sampling was conducted, or because it is considered to be a plausible model), there is
general agreement that (3), rather than the hospital sample mean \(\bar{Y}_i\), should be used to
predict the latent value for the realized PSU since the predictor has smaller MSE, a fact
related to the mixed model being unconditional (on hospitals), in contrast to the stratified
sample model that estimates a cluster mean by \(\bar{Y}_i\). As noted by Robinson (1991)
following Goldberger (1962), expressing \(T_i = X_i \mu + \mathbf{Z}_i' \mathbf{B}\), where \(X_i = 1\) and
\(\mathbf{Z}_i' = (0 \ \cdots \ 1 \ \cdots \ 0)\) is a \(1 \times n\) vector with a value of one in column \(i\), (3) can be
motivated as the best linear unbiased predictor (BLUP) of \(T_i\) from the joint distribution
of \((\mathbf{Y}' \ \ T_i)'\).
The stratified model and mixed model estimate a realized hospital’s latent value
without using additional population detail, such as the number of hospitals in the
population, or the number of appendectomy patients in each hospital, even though such
additional information may be available (as, for example, in Table 3).
Table 3. Population of Hospitals and Appendectomy Patients in the Past Year

                     Population                                      Hospital
Hospital (s)      Patient (t)            Expense       # Pts     Mean        Variance
                                         \(y_{st}\)    \(M_s\)   \(\mu_s\)   \(\sigma_s^2\)
s = 1 (County)    t = 1                  \(y_{11}\)    2         \(\mu_1\)   \(\sigma_1^2\)
                  t = 2                  \(y_{12}\)
s = 2 (Central)   t = 1 (Jane Blake)     \(y_{21}\)    4         \(\mu_2\)   \(\sigma_2^2\)
                  t = 2                  \(y_{22}\)
                  t = 3 (Sam Evans)      \(y_{23}\)
                  t = 4 (Hong Yao)       \(y_{24}\)
s = 3 (Mercy)     t = 1 (Mary Slokum)    \(y_{31}\)    3         \(\mu_3\)   \(\sigma_3^2\)
                  t = 2 (Juan Marcus)    \(y_{32}\)
                  t = 3                  \(y_{33}\)

Recognizing the limitation of (3), Scott and Smith (1969) developed a predictor of \(T_i\) by
augmenting the random variables in Table 2 by a remaining set of random variables in
Table 4, motivated by the finite population in Table 3.
Table 4. Remaining Random Variables Not Realized via Sampling

              Remaining PSUs/SSUs                            PSU
PSU (i)            SSU (j)    Expense       # Pts     Mean             Variance
                              \(Y_{ij}\)    \(M_i\)   \(\mu + B_i\)    \(\sigma_i^2\)
i = 1 (Central)    j = 4      \(Y_{14}\)    4         \(\mu + B_1\)    \(\sigma_1^2\)
i = 2 (Mercy)      j = 3      \(Y_{23}\)    3         \(\mu + B_2\)    \(\sigma_2^2\)
i = 3 (County)     j = 1      \(Y_{31}\)    2         \(\mu + B_3\)    \(\sigma_3^2\)
                   j = 2      \(Y_{32}\)

Random variables in the super-population model consist of the joint set of random
variables in Tables 2 and 4. As described by Scott and Smith, the super-population is
constructed by first selecting a finite population (presumably from some larger
population in time or space), and then selecting a two-stage sample from the realized
population. The latent value of a hospital in the super-population is
\[ T_i = f_i \bar{Y}_i + (1 - f_i) \frac{1}{M_i - m_i} \sum_{j = m_i + 1}^{M_i} Y_{ij} \]
where \(f_i = m_i / M_i\). Using similar assumptions as for the
mixed model, but applying them to all super-population model random variables, Scott
and Smith's predictor is given by
\[ \hat{P}_i = f_i \bar{Y}_i + (1 - f_i) \left[ \hat{\mu} + k_i (\bar{Y}_i - \hat{\mu}) \right] \qquad (4) \]
and can be motivated as the best linear unbiased predictor of \(T_i\) from the joint distribution
of \(\left( \mathbf{Y}' \ \ \frac{1}{M_i - m_i} \sum_{j = m_i + 1}^{M_i} Y_{ij} \right)'\). This joint distribution more clearly identifies the portion of
the PSU that is not observed relative to the joint distribution used for the mixed model.
This added specificity results in a smaller MSE for (4) than (3) as indicated by Stanek
and Singer (2004). Using (4) for the data in Table 1, the predictor of the latent value for
the realized hospital corresponding to \(i = 1\) (i.e., Central) is \(\hat{P}_1\) = $1971, while the
predictor of the latent value for \(i = 2\) (i.e., Mercy) is \(\hat{P}_2\) = $1802.
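As a check on these values, the sketch below evaluates predictor (4) for the Table 1 data, taking the cluster sizes \(M_i\) from Tables 3 and 4 and reusing the variance components assumed earlier. It is an illustrative Python script; the names `cluster_size` and `scott_smith_predictors` are assumptions made for the example.

```python
costs = {"Central": [1400, 2100, 2500], "Mercy": [1900, 1700]}
cluster_size = {"Central": 4, "Mercy": 3}      # M_i, as listed in Tables 3 and 4
sigma = 100.0                                  # assumed in the text
sigma_i = {"Central": 300.0, "Mercy": 50.0}    # assumed in the text


def scott_smith_predictors(costs, cluster_size, sigma, sigma_i):
    """Predictor (4): P_i = f_i * ybar_i + (1 - f_i) * [mu_hat + k_i * (ybar_i - mu_hat)]."""
    ybar = {h: sum(y) / len(y) for h, y in costs.items()}
    total_var = {h: sigma ** 2 + sigma_i[h] ** 2 / len(costs[h]) for h in costs}
    k = {h: sigma ** 2 / total_var[h] for h in costs}
    w = {h: 1.0 / total_var[h] for h in costs}
    mu_hat = sum(w[h] * ybar[h] for h in costs) / sum(w.values())
    predictors = {}
    for h in costs:
        f = len(costs[h]) / cluster_size[h]                   # sampling fraction f_i = m_i / M_i
        unobserved = mu_hat + k[h] * (ybar[h] - mu_hat)       # predictor (3) for the unobserved SSUs
        predictors[h] = f * ybar[h] + (1 - f) * unobserved
    return predictors


ss_predictors = scott_smith_predictors(costs, cluster_size, sigma, sigma_i)
print({h: round(p) for h, p in ss_predictors.items()})
```

The rounded results are 1971 for Central and 1802 for Mercy. Because three of Central's four patients are observed (\(f_1 = 0.75\)), the predictor for Central stays much closer to the sample mean of $2000 than predictor (3) does.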
The super-population model represented in Tables 2 and 4 includes some aspects
of the finite population, but does not clearly separate realized clusters from random
variables representing the sampling of clusters. For example, responses for SSUs in the
same PSU are correlated in the super-population model, even though the cluster
associated with each PSU (as in Table 4) is assumed to be known. This limitation was
addressed by Stanek and Singer (2004) when clusters are of equal size (i.e., \(M\)) and
within cluster sample sizes are equal (i.e., \(m\)) by formally representing the two-stage
sample process using sampling indicator random variables. The sampling model clearly
separated clusters from PSUs, and resulted in a slightly different variance structure than
that of Scott and Smith: first, the variance of SSUs for a PSU, \(\sigma_i^2\), in the two-stage model
was replaced by the average SSU variance, \(\sigma_e^2 = \frac{1}{N} \sum_{s=1}^{N} \sigma_s^2\); and second, the correlation
between response for SSUs in a PSU included some small corrections proportional to
\(1/M\) or \(1/N\) due to the finite number of clusters and finite cluster size. The predictor given
by
\[ \hat{P}_i = f \bar{Y}_i + (1 - f) \left[ \bar{Y} + k (\bar{Y}_i - \bar{Y}) \right] \qquad (5) \]
differs from (4) in that PSU weights are equal, resulting in \(\hat{\mu} = \bar{Y} = \frac{1}{n} \sum_{i=1}^{n} \bar{Y}_i\), \(f = m/M\), and
\[ k = \frac{\sigma^{*2}}{\sigma^{*2} + \sigma_e^2 / m}, \qquad \text{where } \sigma^{*2} = \sigma^2 - \frac{\sigma_e^2}{M}. \]
Theoretically, when clusters have equal size and
sample sizes per PSU are equal, as shown by Stanek and Singer (2004), the MSE of (5) is
less than the MSE for (4) or (3). The empirical version of (5) formed by replacing
variance components with their sample estimates also out-performs the empirical
versions of the other predictors (San Martino, Singer, and Stanek 2007). However, (5)
cannot be used for data in Tables 2 and 4 since cluster sizes differ.
If clusters are of equal size, Stanek and Singer’s approach can be used to
represent the remaining random variables (similar to Table 4) without the need to identify
the realized clusters for sample PSUs. When clusters differ in size, this strategy for
representing remaining random variables is problematic. For example, when the realized
hospital for PSU \(i = 1\) is not known, we do not know how many SSUs are remaining in
Table 4 (Is it one, as would be appropriate if the realized PSU is Central Hospital, or zero,
as would be appropriate if the realized PSU is Mercy Hospital?). Other problems include
the inability of County Hospital to correspond to the PSU \(i = 1\) in Table 2, even though
the first stage sampling is assumed to be simple random sampling, or the apparent
random nature of the second stage sample size, PSU size, and SSU variance due to the
first stage sampling.
We extend the expanded model used by Stanek, Singer, and Lencina (2004) for
simple random sampling to two stage sampling to overcome these problems. The
expanded model simultaneously retains the cluster identity and the PSU position, and
distinguishes for a PSU the relevant contribution of sample SSUs and non-sampled SSUs
to a target random variable such as a PSU mean. The expansion replaces sums of random
variables by individual random variables to represent response, resulting in a set of
random variables that spans a higher dimensional space than the random variables used in
Stanek and Singer (2004). Since there are two stages of sampling, there are two levels of
expansion of the random variables. We first define this expanded set of random variables,
and subsequently investigate whether a lower dimensional set of random variables can
adequately represent the problem without loss of information, using a theorem of Rao and
Bellhouse (1978). We arrive at such a collapsed set of random variables, and show that
when predicting a weighted linear combination of PSU means (or totals) via a linear
unbiased predictor, the expanded model can not be further reduced without loss of
information. The best linear unbiased predictor is developed (similar to Stanek and
Singer 2004) and the theoretical expected mean squared error is characterized and
compared (via simulation) to mixed model and super population model predictors. We
conclude by highlighting model features that have consequence in extending this work
and related work that offer promising possibilities for future improvement.
2. AN EXPANDED RP MODEL FOR A FINITE CLUSTERED POPULATION
Let a finite population be defined (as in Table 3) by a listing of \(t = 1, \ldots, M_s\)
subjects in each of \(s = 1, \ldots, N\) clusters, where the non-stochastic response for subject \(t\) in
cluster \(s\) is given by \(y_{st}\). The finite population parameters corresponding to the mean
and variance of cluster \(s\) are defined by
\[ \mu_s = \frac{1}{M_s} \sum_{t=1}^{M_s} y_{st} \qquad \text{and} \qquad \left( \frac{M_s - 1}{M_s} \right) \sigma_s^2 = \frac{1}{M_s} \sum_{t=1}^{M_s} (y_{st} - \mu_s)^2, \]
respectively, where we use the survey sampling
definition of the parameter \(\sigma_s^2\). Similarly, the population mean, and the variance
between cluster means are defined as
\[ \mu = \frac{1}{N} \sum_{s=1}^{N} \mu_s \qquad \text{and} \qquad \left( \frac{N - 1}{N} \right) \sigma^2 = \frac{1}{N} \sum_{s=1}^{N} (\mu_s - \mu)^2, \]
respectively. Using these parameters, we represent the potentially observable response
for subject \(t\) in cluster \(s\) as \(y_{st} = \mu + \beta_s + \varepsilon_{st}\) where \(\beta_s = (\mu_s - \mu)\) is the deviation of the
mean for cluster \(s\) from the overall mean, and \(\varepsilon_{st} = (y_{st} - \mu_s)\) is the deviation of subject
\(t\)'s response (in cluster \(s\)) from the mean for cluster \(s\). Defining
\(\mathbf{y} = (\mathbf{y}_1' \ \mathbf{y}_2' \ \cdots \ \mathbf{y}_N')'\) where \(\mathbf{y}_s = (y_{s1} \ y_{s2} \ \cdots \ y_{sM_s})'\), the model can be
summarized as
\[ \mathbf{y} = \mathbf{X} \mu + \mathbf{Z} \boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
where \(\mathbf{X} = \mathbf{1}\) is a vector of ones and \(\mathbf{Z} = \oplus_{s=1}^{N} \mathbf{1}_{M_s}\), both with \(\sum_{s=1}^{N} M_s\) rows,
\(\boldsymbol{\beta} = (\beta_1 \ \beta_2 \ \cdots \ \beta_N)'\), and \(\boldsymbol{\varepsilon}\) is defined
similarly to \(\mathbf{y}\). None of the terms in the model are random variables.
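To make these definitions concrete, the following Python sketch computes the finite population parameters and the decomposition \(y_{st} = \mu + \beta_s + \varepsilon_{st}\) for a population shaped like Table 3. Costs for patients that do not appear in Table 1 are not given in the source, so the values marked hypothetical below are invented purely for illustration, as are the function and variable names.

```python
# Costs for a population shaped like Table 3. Values marked "hypothetical" are NOT
# in the source; they stand in for patients whose costs are not reported in Table 1.
population = {
    "County":  [1500, 1700],                 # both hypothetical
    "Central": [2100, 2300, 1400, 2500],     # 2300 is hypothetical
    "Mercy":   [1700, 1900, 2000],           # 2000 is hypothetical
}


def finite_population_parameters(population):
    """Cluster means/variances, overall mean, between-cluster variance, beta_s and eps_st."""
    N = len(population)
    mu_s = {s: sum(y) / len(y) for s, y in population.items()}
    # Survey sampling definition: divisor M_s - 1 within clusters and N - 1 between clusters.
    var_s = {s: sum((v - mu_s[s]) ** 2 for v in y) / (len(y) - 1)
             for s, y in population.items()}
    mu = sum(mu_s.values()) / N
    var_between = sum((v - mu) ** 2 for v in mu_s.values()) / (N - 1)
    beta = {s: mu_s[s] - mu for s in population}
    eps = {s: [v - mu_s[s] for v in y] for s, y in population.items()}
    return mu, var_between, mu_s, var_s, beta, eps


mu, var_between, mu_s, var_s, beta, eps = finite_population_parameters(population)
for s in population:
    print(s, round(mu_s[s], 1), round(var_s[s], 1), round(beta[s], 1))
```

By construction the \(\beta_s\) sum to zero over clusters and the \(\varepsilon_{st}\) sum to zero within each cluster, which is why none of the terms needs to be treated as random at this stage.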
2.1. Random Variables and The Two Stage Random Permutation (RP) Model
We explicitly define a vector of random variables that represents a two stage RP
of the population. Assuming that each realization of the permutation is equally likely
with probability \(1 \big/ \left( N! \prod_{s=1}^{N} M_s! \right)\), the random variables formally represent two-stage sampling
(Cochran 1977). We assume that the sample clusters are in the first \(n\) positions in a
permutation of clusters and that the sample subjects in cluster \(s\) correspond to the
subjects in the first \(m_s\) positions in a permutation of the cluster's subjects. These
definitions represent random variables as a sequence as opposed to the more usual set
notation.
We use indicator random variables to relate the response for subject \(t\) in cluster \(s\),
\(y_{st}\), to the response for SSU \(j\) in PSU \(i\) in a two stage permutation of clusters and
subjects. We define \(U_{jt}^{(s)}\) as an indicator random variable that takes on a value of one
when SSU \(j\) in cluster \(s\) is subject \(t\), and zero otherwise, and use it to represent response
for SSU \(j\) in cluster \(s\) by \(Y_{sj} = \sum_{t=1}^{M_s} U_{jt}^{(s)} y_{st}\). We include a fixed non-stochastic weight for
SSU \(j\) in cluster \(s\), \(w_{sj}\), and define the weighted response as \(Y_{wsj} = w_{sj} Y_{sj}\) so that the sum,
\(\sum_{j=1}^{M_s} Y_{wsj}\), will correspond to a cluster total when \(w_{sj} = 1\) for all \(j = 1, \ldots, M_s\), or a cluster
mean when \(w_{sj} = 1/M_s\) for all \(j = 1, \ldots, M_s\). Defining \(\mathbf{U}_j^{(s)} = \left( U_{j1}^{(s)} \ U_{j2}^{(s)} \ \cdots \ U_{jM_s}^{(s)} \right)'\),
\(Y_{wsj} = w_{sj} \mathbf{y}_s' \mathbf{U}_j^{(s)}\). The vector \(\mathbf{Y}_{ws} = (Y_{ws1} \ Y_{ws2} \ \cdots \ Y_{wsM_s})'\) represents a permutation of
weighted response for SSUs in cluster \(s\).
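We define \(U_{is}\) as an indicator random variable that takes on a value of one when
PSU \(i\) is cluster \(s\), and a value of zero otherwise. If all clusters were equal in size, we
could represent a permutation of SSUs for PSU \(i\) by \(\sum_{s=1}^{N} U_{is} \mathbf{Y}_{ws}\). When cluster sizes differ,
this sum is not defined since the dimension of the vectors in the sum differs. We solve
this problem by expanding the random variables for PSU \(i\) into the \(\left( \sum_{s=1}^{N} M_s \right) \times 1\) vector
\(\mathbf{Y}_{wi} = \left( \left( U_{is} \mathbf{Y}_{ws}' \right) \right)' = \left( U_{i1} \mathbf{Y}_{w1}' \ U_{i2} \mathbf{Y}_{w2}' \ \cdots \ U_{iN} \mathbf{Y}_{wN}' \right)'\). A two stage random permutation of
the population is represented by the \(\left( N \sum_{s=1}^{N} M_s \right) \times 1\) vector,
\(\mathbf{Y}_w = \left( \left( \mathbf{Y}_{wi}' \right) \right)' = \left( \mathbf{Y}_{w1}' \ \mathbf{Y}_{w2}' \ \cdots \ \mathbf{Y}_{wN}' \right)'\), where an element of \(\mathbf{Y}_w\) is given by \(U_{is} Y_{wsj}\).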
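The construction of the expanded vector can be illustrated with a small simulation. The sketch below, written in Python, draws one two stage permutation of a population shaped like Table 3 and stacks the elements \(U_{is} w_{sj} Y_{sj}\) in the order PSU, cluster, SSU. The cost values marked hypothetical and all function names are assumptions made for the example; they are not taken from the source.

```python
import random

# Population shaped like Table 3; values marked hypothetical are not in the source.
population = {
    "County":  [1500, 1700],                 # both hypothetical
    "Central": [2100, 2300, 1400, 2500],     # 2300 is hypothetical
    "Mercy":   [1700, 1900, 2000],           # 2000 is hypothetical
}
clusters = list(population)                  # fixed cluster order s = 1, ..., N


def two_stage_permutation(population, rng):
    """Draw a cluster permutation and an independent subject permutation per cluster."""
    psu_order = clusters[:]
    rng.shuffle(psu_order)                   # psu_order[i] is the cluster realized as PSU i
    ssu_order = {s: rng.sample(y, len(y)) for s, y in population.items()}
    return psu_order, ssu_order


def expanded_vector(psu_order, ssu_order, weights):
    """Stack U_is * w_sj * Y_sj over PSU positions i, clusters s, and SSU positions j."""
    Y_w = []
    for realized in psu_order:               # PSU position i
        for s in clusters:                   # every cluster keeps its own block
            U_is = 1 if s == realized else 0
            Y_w.extend(U_is * w * y for w, y in zip(weights[s], ssu_order[s]))
    return Y_w


rng = random.Random(0)
weights = {s: [1 / len(y)] * len(y) for s, y in population.items()}   # w_sj = 1/M_s
psu_order, ssu_order = two_stage_permutation(population, rng)
Y_w = expanded_vector(psu_order, ssu_order, weights)

total = sum(len(y) for y in population.values())    # number of subjects in the population
for i, realized in enumerate(psu_order):
    block = Y_w[i * total:(i + 1) * total]          # expanded vector for PSU i
    print(i + 1, realized, round(sum(block), 2))
```

With \(w_{sj} = 1/M_s\), the elements of each PSU block sum to the mean of whichever cluster is realized in that position, while the blocks of the clusters that are not realized there contribute zeros. This is how the expansion keeps both the cluster identity and the PSU position.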
2.2 A Mixed Effect Model for the Expanded Random Variables
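We represent a mixed model for the expanded RP model, indexing
expectation with respect to permutations of clusters with the subscript \(\xi_1\) and expectation
with respect to permutations of subjects in a cluster with the subscript \(\xi_2\). For PSU \(i\), we
express
\[ \mathbf{Y}_{wi} = E_{\xi_1 \xi_2}(\mathbf{Y}_{wi}) + \left[ E_{\xi_2 | \xi_1}(\mathbf{Y}_{wi}) - E_{\xi_1 \xi_2}(\mathbf{Y}_{wi}) \right] + \mathbf{E}_{wi} \]
where \(E_{\xi_1 \xi_2}(\mathbf{Y}_{wi}) = \left( \frac{1}{N} \oplus_{s=1}^{N} \mathbf{w}_s \right) \boldsymbol{\mu}\), \(E_{\xi_2 | \xi_1}(\mathbf{Y}_{wi}) = \left( \oplus_{s=1}^{N} \mathbf{w}_s \mu_s \right) \mathbf{U}_i\), \(\mathbf{E}_{wi} = \mathbf{Y}_{wi} - E_{\xi_2 | \xi_1}(\mathbf{Y}_{wi})\),
\(\mathbf{w}_s = ((w_{sj})) = (w_{s1} \ w_{s2} \ \cdots \ w_{sM_s})'\), \(\boldsymbol{\mu} = ((\mu_s)) = (\mu_1 \ \mu_2 \ \cdots \ \mu_N)'\),
\(\mathbf{U}_i = ((U_{is})) = (U_{i1} \ U_{i2} \ \cdots \ U_{iN})'\) and the deviation of response from the expected
response within a PSU is \(\mathbf{E}_{wi}\). The fixed effects are given by \(\boldsymbol{\mu}\), the vector of cluster
means. The random effects, \(E_{\xi_2 | \xi_1}(\mathbf{Y}_{wi}) - E_{\xi_1 \xi_2}(\mathbf{Y}_{wi})\), are defined as the deviation from
the fixed effects of the expected response conditional on a realized PSU. In the RP
model of Stanek and Singer (2004), the random effect for the mean of PSU \(i\) was defined
explicitly as
\[ \sum_{s=1}^{N} U_{is} \beta_s = \sum_{s=1}^{N} U_{is} \left( \mu_s - \frac{1}{N} \sum_{s^*=1}^{N} \mu_{s^*} \right), \]
with the random variables \(U_{is}\)
explicitly linking the clusters to PSU \(i\). In the expanded RP model, random effects are
defined for SSU \(j\) in PSU \(i\) as \(w_{sj} \mu_s \left( U_{is} - E_{\xi_1}(U_{is}) \right)\). For example, when \(w_{sj} = 1/M_s\) for
all \(j = 1, \ldots, M_s\) (corresponding to the PSU mean), summing over \(j\),
\[ \sum_{j=1}^{M_s} w_{sj} \mu_s \left( U_{is} - E_{\xi_1}(U_{is}) \right) = U_{is} \mu_s - \frac{1}{N} \mu_s. \]
For both models, the expected value of the
random effect (with respect to \(\xi_1\)) is zero, but this result arises from quite different
circumstances. We combine these expressions arriving at the expanded RP mixed model
given by
\[ \mathbf{Y}_w = \left[ \mathbf{1}_N \otimes \left( \frac{1}{N} \oplus_{s=1}^{N} \mathbf{w}_s \right) \right] \boldsymbol{\mu} + \left[ \mathbf{I}_N \otimes \left( \oplus_{s=1}^{N} \mathbf{w}_s \mu_s \right) \right] \operatorname{vec}\left( \mathbf{U} - E_{\xi_1}(\mathbf{U}) \right) + \mathbf{E}_w \qquad (6) \]
where \(\mathbf{U} = (\mathbf{U}_1 \ \mathbf{U}_2 \ \cdots \ \mathbf{U}_N)\). The variance of random effects is given by
\[ \operatorname{var}_{\xi_1 \xi_2}\left( \left[ \mathbf{I}_N \otimes \left( \oplus_{s=1}^{N} \mathbf{w}_s \mu_s \right) \right] \operatorname{vec}\left( \mathbf{U} - E_{\xi_1}(\mathbf{U}) \right) \right) = \frac{1}{N - 1} \, \mathbf{P}_N \otimes \left[ \left( \oplus_{s=1}^{N} \mathbf{w}_s \mu_s \right) \mathbf{P}_N \left( \oplus_{s=1}^{N} \mathbf{w}_s \mu_s \right)' \right] \]
while
\[ \operatorname{var}_{\xi_1 \xi_2}(\mathbf{E}_w) = \mathbf{I}_N \otimes \left[ \oplus_{s=1}^{N} \frac{\sigma_s^2}{N} \left( \oplus_{j=1}^{M_s} w_{sj} \right) \mathbf{P}_{M_s} \left( \oplus_{j=1}^{M_s} w_{sj} \right) \right], \]
where \(\mathbf{P}_a = \mathbf{I}_a - \frac{1}{a} \mathbf{J}_a\) and \(\mathbf{J}_a\)
denotes an \(a \times a\) matrix with all elements equal to one.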
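The factor \(\frac{1}{N-1} \mathbf{P}_N \otimes \mathbf{P}_N\) that underlies the variance of the random effects is the covariance matrix of \(\operatorname{vec}(\mathbf{U})\) under an equally likely permutation of clusters. The Python sketch below checks this identity by enumerating all permutations for \(N = 3\), using exact rational arithmetic. It is offered only as an illustrative check of that one building block, not of the full variance expression, and the names used are assumptions for the example.

```python
from fractions import Fraction
from itertools import permutations

N = 3
perms = list(permutations(range(N)))       # perm[i] = cluster realized as PSU i


def vec_U(perm):
    """vec(U) with U = (U_1 ... U_N): element i*N + s is the indicator U_is."""
    return [1 if perm[i] == s else 0 for i in range(N) for s in range(N)]


vectors = [vec_U(p) for p in perms]
size = N * N
mean = [Fraction(sum(v[a] for v in vectors), len(vectors)) for a in range(size)]
cov = [[Fraction(sum(v[a] * v[b] for v in vectors), len(vectors)) - mean[a] * mean[b]
        for b in range(size)] for a in range(size)]

# P_N = I_N - J_N / N, and the Kronecker product P_N (x) P_N scaled by 1 / (N - 1).
P = [[Fraction(int(r == c)) - Fraction(1, N) for c in range(N)] for r in range(N)]
target = [[P[a // N][b // N] * P[a % N][b % N] / (N - 1) for b in range(size)]
          for a in range(size)]

print(cov == target)
```

The comparison prints `True`: for example, \(\operatorname{var}(U_{is}) = (N-1)/N^2\) and, for \(i \ne i'\) and \(s \ne s'\), \(\operatorname{cov}(U_{is}, U_{i's'}) = 1/\left(N^2 (N-1)\right)\).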
2.3. Defining Random Variables of Interest
Model (6) is an expanded version of a mixed model that retains the identity of
clusters, while accounting for a two stage RP. Our interest is in predicting a linear
combination of these random variables defined by \(T = \mathbf{g}' \mathbf{Y}_w\), where \(\mathbf{g}\) is non-stochastic.
Although many linear combinations are possible, we limit the discussion to linear
combinations given by \(T = \mathbf{g}' \mathbf{Y}_w\) where
\[ \mathbf{g}' = \mathbf{c}' \otimes \mathbf{1}' \qquad (7) \]
\(\mathbf{c}\) is an \(N \times 1\) vector of constants, and \(\mathbf{1}\) is a vector of ones with \(\sum_{s=1}^{N} M_s\) elements.
In particular, we focus on the setting where \(\mathbf{c} = \mathbf{e}_i\)
is an \(N \times 1\) vector with all elements equal to zero, except for element \(i\) which has the
value of one. Of principal interest is the setting where \(i \le n\), such that the cluster of
interest is realized in the sample. When \(w_{sj} = 1/M_s\) for all \(s = 1, \ldots, N\), \(j = 1, \ldots, M_s\), the
target random variable, \(T = \sum_{s=1}^{N} U_{is} \left( \sum_{j=1}^{M_s} w_{sj} Y_{sj} \right)\), is the mean of PSU \(i\); when \(w_{sj} = 1\) for all
\(s = 1, \ldots, N\), \(j = 1, \ldots, M_s\), the target random variable, \(T = \sum_{s=1}^{N} U_{is} \left( \sum_{j=1}^{M_s} w_{sj} Y_{sj} \right)\), is the total of
PSU \(i\).
3. PREDICTING A PSU MEAN USING AN EXPANDED RP MODEL
We apply the basic strategy for developing a predictor given by Scott and Smith
(1969), Royall (1976), Bolfarine and Zacks (1992), Valliant et al. (2000), and Stanek and
Singer (2004) to the expanded RP model. We assume that the elements in the sample
portion of \(\mathbf{Y}_w\) will be observed, and express \(T\) as the sum of two parts, one which is a
function of the sample, and the other which is a function of the remaining random
variables. Then, requiring the predictor to be a linear function of the sample random
variables and to be unbiased, coefficients are evaluated that minimize the expected value
of the MSE given by \(\operatorname{var}_{\xi_1 \xi_2}(\hat{T} - T)\). While in theory, an optimal predictor can be
expressed following this prediction recipe, in practice, the high dimensionality of the
vectors from the expansion of random variables may result in singularities that prevent a
unique solution (as in Stanek, Singer, and Lencina (2004)). In part for this reason, we
first explore projections of the expanded random variables into a lower dimensional space
that retains the necessary information for an optimal solution, thereby simplifying the
problem.
3.1. Partial Collapsing of the Expanded RP Random Variables
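Rao and Bellhouse (1978) (see Theorem 1.1) provide a way of determining
whether the optimal linear unbiased predictor of a target random variable, \(T = \mathbf{g}' \mathbf{Y}_w\), can
be obtained as the optimal linear unbiased predictor of \(T = \mathbf{g}_p' \mathbf{Y}_{wp}\) based on \(\mathbf{Y}_{wp} = \mathbf{C}' \mathbf{Y}_w\),
a vector of random variables that spans a lower dimensional space. We apply this
theorem when
\[ \mathbf{C}' = \begin{pmatrix} \oplus_{i=1}^{N} \oplus_{s=1}^{N} \left( \mathbf{1}_{m_s}' \ \ \mathbf{0}_{M_s - m_s}' \right) \\ \oplus_{i=1}^{N} \oplus_{s=1}^{N} \left( \mathbf{0}_{m_s}' \ \ \mathbf{1}_{M_s - m_s}' \right) \end{pmatrix} \]
and \(\mathbf{g}_p' = \mathbf{g}' \left[ \mathbf{C} (\mathbf{C}' \mathbf{C})^{-1} \right]\). The additional
subscript \(p\) denotes the partial collapsing of the expanded random variables. The
collapsing sums the SSUs for the sample and remainder in each cluster for each PSU,
reducing the number of random variables from \(N \sum_{s=1}^{N} M_s\) to \(2N^2\). Since
\(\mathbf{Y}_w = \left[ \mathbf{C} (\mathbf{C}' \mathbf{C})^{-1} \right] \mathbf{Y}_{wp} + \mathbf{P}_{\mathbf{C}} \mathbf{Y}_w\), where \(\mathbf{P}_{\mathbf{C}} = \mathbf{I} - \mathbf{C} (\mathbf{C}' \mathbf{C})^{-1} \mathbf{C}'\), we can express
\(\mathbf{g}' \mathbf{Y}_w = \mathbf{g}' \left[ \mathbf{C} (\mathbf{C}' \mathbf{C})^{-1} \right] \mathbf{Y}_{wp} + \mathbf{g}' \mathbf{P}_{\mathbf{C}} \mathbf{Y}_w\). Using (7), \(\mathbf{g}_p' = \mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'\), \(\mathbf{g}' \mathbf{P}_{\mathbf{C}} \mathbf{Y}_w = 0\) and
\(T = \mathbf{g}_p' \mathbf{Y}_{wp}\). Let \(\hat{\mathbf{L}}_p\) represent an \(nN \times 1\) constant vector, and \(\mathbf{Y}_{wI}\) be the first \(nN\)
random variables (corresponding to the sample) in \(\mathbf{Y}_{wp}\). Then defining \(\hat{T}_p = \hat{\mathbf{L}}_p' \mathbf{Y}_{wI}\) as
the optimal linear unbiased predictor of \(T\) based on \(\mathbf{Y}_{wp}\), and \(\hat{B}_p\) as a linear unbiased
predictor of \(\mathbf{g}' \mathbf{P}_{\mathbf{C}} \mathbf{Y}_w = 0\), Theorem 1.1 (Rao and Bellhouse 1978) states that \(\hat{T}_p\) will be
optimal for \(T = \mathbf{g}' \mathbf{Y}_w\) if and only if \(E_{\xi_1 \xi_2}\left[ (\hat{T}_p - T) \hat{B}_p \right] = 0\). Expressing
\(E_{\xi_1 \xi_2}\left[ (\hat{T}_p - T) \hat{B}_p \right]\) as a function of \(E_{\xi_1 \xi_2}(\mathbf{Y}_w \mathbf{Y}_w')\), and simplifying terms, we find that when
\(w_{sj} = w_s\) for all \(j = 1, \ldots, M_s\), \(E_{\xi_1 \xi_2}\left[ (\hat{T}_p - T) \hat{B}_p \right] = 0\) (see c06ed54.doc for details). This
implies that we can obtain the optimal predictor using the partially collapsed random
variables as long as within each cluster, the weights are equal for all SSUs.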
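The effect of the partial collapsing on the target can be illustrated numerically. The Python sketch below forms the sample and remainder sums for every (PSU, cluster) pair, computes \(\mathbf{g}_p' = \mathbf{g}' \mathbf{C} (\mathbf{C}' \mathbf{C})^{-1}\) as a group average (since \(\mathbf{C}' \mathbf{C}\) is diagonal), and confirms that the target is unchanged. The second-stage sample sizes, the integer stand-ins for the elements of \(\mathbf{Y}_w\), and all names are assumptions made for the example.

```python
import random
from fractions import Fraction

M = [2, 4, 3]        # cluster sizes M_s (Table 3)
m = [1, 3, 2]        # second-stage sample sizes m_s; the value for County is hypothetical
N = len(M)
c = [1, 0, 0]        # c = e_1, so the target refers to PSU i = 1

rng = random.Random(1)
# Arbitrary integers standing in for the elements of Y_w, indexed as Y_w[i][s][j].
Y_w = [[[rng.randint(1, 100) for _ in range(M[s])] for s in range(N)] for i in range(N)]

# Partial collapsing Y_wp = C'Y_w: sample sums first, then remainder sums, over (i, s).
sample_part = [sum(Y_w[i][s][:m[s]]) for i in range(N) for s in range(N)]
remainder_part = [sum(Y_w[i][s][m[s]:]) for i in range(N) for s in range(N)]
Y_wp = sample_part + remainder_part


def collapsed_coefficient(i, group_size):
    """One element of g'C(C'C)^{-1}: the sum of g = c_i over a group divided by its size."""
    return Fraction(c[i] * group_size, group_size)


g_p = ([collapsed_coefficient(i, m[s]) for i in range(N) for s in range(N)]
       + [collapsed_coefficient(i, M[s] - m[s]) for i in range(N) for s in range(N)])

T_expanded = sum(c[i] * y for i in range(N) for s in range(N) for y in Y_w[i][s])
T_collapsed = sum(g * y for g, y in zip(g_p, Y_wp))
print(len(Y_wp), T_expanded == T_collapsed)
```

The collapsed vector has \(2N^2 = 18\) elements and the two versions of the target agree exactly, because \(\mathbf{g}\) is constant within every group that is summed. The coefficient vector `g_p` equals \(\mathbf{1}_2' \otimes \mathbf{c}' \otimes \mathbf{1}_N'\).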
We assume that \(w_{sj} = w_s\) for all \(j = 1, \ldots, M_s\) in subsequent developments, and
develop the BLUP of \(T = \mathbf{g}_p' \mathbf{Y}_{wp}\) based on \(\mathbf{Y}_{wp}\). The vector \(\mathbf{Y}_{wp}\) contains \(2N^2\) random
variables. The first \(N^2\) random variables are of the form \(U_{is} w_s m_s \bar{Y}_{sI}\), while the
remaining \(N^2\) random variables are of the form \(U_{is} w_s (M_s - m_s) \bar{Y}_{sII}\), where
\[ \bar{Y}_{sI} = \frac{1}{m_s} \sum_{j=1}^{m_s} Y_{sj} \qquad \text{and} \qquad \bar{Y}_{sII} = \frac{1}{M_s - m_s} \sum_{j = m_s + 1}^{M_s} Y_{sj}. \]
Before developing the predictor, we
consider whether some additional collapsing of the random variables is possible without
loss of information.
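It is natural to consider whether it is sufficient to predict \(T = \mathbf{g}_p' \mathbf{Y}_{wp}\) using the \(2N\)
collapsed random variables defined by \(\mathbf{Y}_w^{*} = \mathbf{C}^{*\prime} \mathbf{Y}_{wp}\) where \(\mathbf{C}^{*\prime} = \mathbf{I}_{2N} \otimes \mathbf{1}_N'\). Note that
\(T = \mathbf{g}^{*\prime} \mathbf{Y}_w^{*}\) and \(\mathbf{g}^{*\prime} = \mathbf{g}_p' \left[ \mathbf{C}^{*} (\mathbf{C}^{*\prime} \mathbf{C}^{*})^{-1} \right]\), which simplifies to \(\mathbf{g}^{*\prime} = \mathbf{1}_2' \otimes \mathbf{c}'\). This set of
random variables is similar to those used by Stanek and Singer (2004) for a population
with equal size clusters and equal size samples per cluster with no response error.
We apply the Rao-Bellhouse theorem to see if an optimal predictor of \(T\) can be
based on \(\mathbf{Y}_w^{*}\). Let us define such a predictor as \(\hat{T} = \hat{\mathbf{L}}' \mathbf{Y}_{wI}^{*}\) where \(\hat{\mathbf{L}}\) represents an \(n \times 1\)
constant vector, and \(\mathbf{Y}_{wI}^{*}\) represents the first \(n\) random variables (corresponding to the
sample) in \(\mathbf{Y}_w^{*}\). For \(\hat{T}\) to be unbiased, we require
\[ E_{\xi_1 \xi_2}\left( \hat{\mathbf{L}}' \mathbf{Y}_{wI}^{*} - T \right) = (\hat{\mathbf{L}}' \mathbf{1}_n) m_s - (\mathbf{c}' \mathbf{1}_N)(M_s - m_s) \]
to be zero. This will be zero only if
sampling of clusters is conducted with probability proportional to size (PPS) (see page 8,
c07ed29.doc).
Suppose that sampling is PPS sampling, and notice that
\(T = \mathbf{g}^{*\prime} \mathbf{Y}_w^{*} + \mathbf{g}_p' \left( \mathbf{P}_{\mathbf{C}^{*}} \mathbf{Y}_{wp} \right)\) with \(\mathbf{g}_p' \left( \mathbf{P}_{\mathbf{C}^{*}} \mathbf{Y}_{wp} \right) = 0\). Defining \(\hat{B} = \hat{\mathbf{b}}' (\mathbf{I}_n \otimes \mathbf{P}_N) \mathbf{Y}_{wI}\) as a
linear unbiased predictor of \(\mathbf{g}_p' \left( \mathbf{P}_{\mathbf{C}^{*}} \mathbf{Y}_{wp} \right)\) based on the sample part of \(\mathbf{P}_{\mathbf{C}^{*}} \mathbf{Y}_{wp}\) given by
\((\mathbf{I}_n \otimes \mathbf{P}_N) \mathbf{Y}_{wI}\), where \(\mathbf{Y}_{wI}\) denotes the first \(nN\) random variables in \(\mathbf{Y}_{wp}\), the predictor \(\hat{T}\) will be optimal if and only if \(E_{\xi_1 \xi_2}\left[ (\hat{T} - T) \hat{B} \right] = 0\).
Simplifying this expectation, we find that (see page 18 of c07ed29.doc)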
( ) ( )( ) ( )
1 2 1 1
2 2
1
1ˆ ˆ ˆ ˆ1
1 ˆ ˆ
N N
I N n s N s Ns s
N
I N s s s Ns
E T T B f f d dN
f fM w
N
ξ ξ
σ
= =
=
⎡ ⎤⎛ ⎞⎛ ⎞ ⎛ ⎞⎡ ⎤ ⎡ ⎤′ ′ ′− = − ⊗ ⊗ ⊕ ⊕⎜ ⎟ ⎜ ⎟⎢ ⎥⎜ ⎟⎣ ⎦ ⎣ ⎦ − ⎝ ⎠ ⎝ ⎠⎝ ⎠⎣ ⎦−⎡ ⎤⎡ ⎤⎛ ⎞′ ′ ′+ + ⊗ ⊕⎢ ⎥⎜ ⎟⎢ ⎥⎝ ⎠⎣ ⎦⎣ ⎦
L c 1 I P P b
L c 1 P b
where \(d_s = M_s w_s \mu_s\), \(\mathbf{c} = (\mathbf{c}_I' \ \ \mathbf{c}_{II}')'\), \(\mathbf{c}_I\) is an \(n \times 1\) vector and where \(f\) denotes the
common sampling fraction. This expression is not equal to zero, even when the
population consists of equal size clusters with homogeneous variances, and equal size
samples are taken from sample clusters. The result implies that some efficiency is lost in
prediction when collapsing \(\mathbf{Y}_{wp}\) to the \(2N\) PSU-level random variables \(\mathbf{Y}_w^{*} = \mathbf{C}^{*\prime} \mathbf{Y}_{wp}\). The predictor based on \(\mathbf{Y}_{wp}\) will have smaller
MSE than the predictor based on \(\mathbf{Y}_w^{*}\), even in the settings considered by Stanek and
Singer (2004) with no response error when clusters are of equal size, and equal size
samples are selected.
3.2. Predicting Linear Combinations of PSU Parameters Using Collapsing Expanded
RP Random Variables
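We partition \(\mathbf{Y}_{wp}\) into the first \(nN\) random variables corresponding to the sample,
\(\mathbf{Y}_{wI}\), and the remaining random variables, \(\mathbf{Y}_{wII}\), to predict \(T = \mathbf{g}_I' \mathbf{Y}_{wI} + \mathbf{g}_{II}' \mathbf{Y}_{wII}\), where
\(\mathbf{g}_I' = \mathbf{c}_I' \otimes \mathbf{1}_N'\) and \(\mathbf{g}_{II}' = \left( \mathbf{c}_{II}' \otimes \mathbf{1}_N' \ \ \ \mathbf{c}' \otimes \mathbf{1}_N' \right)\). Explicitly, the partitioned RP model is defined
by
\[ \begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix} \boldsymbol{\mu} + \left[ E_{\xi_2 | \xi_1} \begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix} - E_{\xi_1 \xi_2} \begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix} \right] + \begin{pmatrix} \mathbf{E}_{wI} \\ \mathbf{E}_{wII} \end{pmatrix} \qquad (8) \]
Requiring the predictor of \(T\) to be a linear function of \(\mathbf{Y}_{wI}\), to be unbiased, and to have
minimum MSE, the BLUP of \(T\) in (8) is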
( ) ( ) ( )1 1 1
ˆ ˆ ˆn N N
p i i s s s sI II s s s s sIi s s
N N nT c Y Y c I M w Y c I M f w Yn n= = =
−⎛ ⎞ ⎛ ⎞= − + +⎜ ⎟ ⎜ ⎟
⎝ ⎠ ⎝ ⎠∑ ∑ ∑ (9)
where
\[ \hat{Y}_i = \sum_{s=1}^{N} U_{is} M_s w_s k_s^{*} \bar{Y}_{sI}, \qquad \bar{\hat{Y}} = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i, \qquad k_s^{*} = k_s - \frac{1}{d_s} \left( \frac{1 - k_s}{1 - \bar{k}} \right) \frac{1}{N} \sum_{s^*=1}^{N} k_{s^*} d_{s^*}, \]
\[ k_s = \frac{f_s^2 d_s^2}{f_s^2 d_s^2 + (N - 1) v_{se}^2} \quad \text{(from page 41, c07ed15.doc)}, \qquad \bar{k} = \frac{1}{N} \sum_{s=1}^{N} k_s, \]
\[ \bar{c}_{II} = \frac{1}{N - n} \sum_{i = n + 1}^{N} c_i, \qquad \bar{c} = \frac{1}{N} \sum_{i=1}^{N} c_i, \]
and \(I_s = \sum_{i=1}^{n} U_{is}\) is an indicator 'inclusion' random variable for cluster \(s\) in
the sample (see Appendix A). An expression for the mean squared error (MSE) of the
predictor can be developed directly using expressions for the variance, and simplifies to
(see c07ed17.doc, p15)
( ) ( ) ( )
( )
1 2
2*2 2 22 2
, 2 21 1 1
2 2
1
1 1ˆvar 2
2
n N Ns se se
p i I kd kd di s ss s
N
i I di
Nck v vT T c c
N n Nf f
Ncc Nc ncn
ξ ξ σ σ
σ
= = =
=
⎛ ⎞ ⎛ ⎞⎛ ⎞− = − − + +⎜ ⎟ ⎜ ⎟⎜ ⎟
⎝ ⎠⎝ ⎠ ⎝ ⎠⎡ ⎤+ + −⎢ ⎥⎣ ⎦
∑ ∑ ∑
∑
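where \(\bar{c}_I = \frac{1}{n} \sum_{i=1}^{n} c_i\), \(\mu_d = \frac{1}{N} \sum_{s=1}^{N} d_s\), \(\mu_{kd} = \frac{1}{N} \sum_{s=1}^{N} k_s^{*} d_s\),
\[ \sigma_{kd}^2 = \frac{1}{N - 1} \sum_{s=1}^{N} \left( k_s^{*} d_s - \mu_{kd} \right)^2, \qquad \sigma_d^2 = \frac{1}{N - 1} \sum_{s=1}^{N} (d_s - \mu_d)^2 \]
and
\[ \sigma_{kd,d} = \frac{1}{N - 1} \sum_{s=1}^{N} \left( k_s^{*} d_s - \mu_{kd} \right) (d_s - \mu_d). \]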
When predicting a PSU mean, 1s
s
wM
= , and the predictor simplifies to
( )ˆ ˆ ˆp iT Y Y Y= + − if i n≤ , and to pT Y= when i n> , where
1
N
i is sIs
Y U Y=
= ∑ . The MSE of
the sample PSU mean predictor (when i n≤ ) simplifies to (see c07ed25.doc, page 3)
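\[
\operatorname{var}_{\xi_1 \xi_2}\left( \hat{T}_p - T \right)
= \frac{n-1}{n} \left[ \frac{1}{N-1} \sum_{s=1}^{N} \left( \left( 1 - k_s^* \right) \mu_s - \frac{1}{N} \sum_{s'=1}^{N} \left( 1 - k_{s'}^* \right) \mu_{s'} \right)^2 \right]
+ \frac{1}{nN} \sum_{s=1}^{N} \left[ 1 + (n-1) k_s^{*2} \right] \frac{\sigma_s^2}{m_s} \left( 1 - f_s \right),
\]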
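while the MSE of a PSU not in the sample is given by
\[
\operatorname{var}_{\xi_1 \xi_2}\left( \hat{T}_p - T \right)
= \left( \frac{n+1}{n} \right) \sigma^2 + \frac{1}{nN} \sum_{s=1}^{N} \frac{\sigma_s^2}{m_s} \left( 1 - f_s \right).
\]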
4. COMPARISON OF PREDICTORS
The predictor $\hat{T}_p$ is the best linear unbiased predictor, and hence has smaller MSE than other predictors of $T$ in its class. We compare the MSE of (9) with the MSE of the simple mean and of predictors (3) and (4).
When clusters are of equal size, have equal variance, and sample sizes are equal,
the MSE for each predictor can be explicitly calculated. In this setting, we also compare
the results with the MSE for predictor (5). For the sample mean, the MSE is equal to
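\[
\frac{1}{N} \sum_{s=1}^{N} \frac{\sigma_s^2}{m_s} \left( 1 - f_s \right);
\]
for (5), the MSE is
\[
MSE\left( \hat{T}_{RP} \right) = (1 - f) \left[ \frac{\sigma_e^2}{nm} + \left( \frac{n-1}{n} \right) (1 - k) \sigma^2 \right].
\]
The MSE for predictors (3) and (4) are given by
\[
MSE\left( \hat{T}_{RP} \right) + \left( \frac{n-1}{nk} \right) \left( \sigma^2 - \frac{\sigma_e^2}{M} \right) \left( c - \left[ f + (1 - f) k \right] \right)^2
\]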
where $c = \frac{m \sigma^2}{m \sigma^2 + \sigma_e^2}$ for (3) and $c = f + (1 - f) \frac{m \sigma^2}{m \sigma^2 + \sigma_e^2}$ for (4) (see Stanek and Singer 2004). Although we have explicit expressions for the MSE of these predictors, the difference is a complicated function of the population parameters. Since shrinkage constants for the expanded predictor depend on the cluster means, we compare the MSE relative to the expanded finite population mixed model predictor in Figure 1 in four settings, defining the subject intra-class correlation as $\rho_s = \frac{\sigma^2}{\sigma^2 + \sigma_s^2}$. In each setting, the cluster means are equal to equally spaced quantiles from the respective distributions. The results, expressed as percent increase in MSE relative to the MSE of (9), illustrate that in all settings considered, there is a substantial reduction in MSE (over 40% when $f < 0.2$) using (9). This is true even for (5), illustrating that the results of Stanek and Singer (2004) are not optimal. There was little difference in the MSE comparisons with different distributions of cluster means. The results illustrate that predictor (3) has higher MSE relative to the other predictors when $f > 0.5$. The MSE for predictors (4) and (5) are similar, and differ more from the MSE of (9) when $f$ is small.
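The constants used in this comparison are simple to compute. The sketch below is illustrative only: it assumes the shrinkage constant for (3) has the form $m\sigma^2/(m\sigma^2+\sigma_e^2)$, that the constant for (4) adds the sampling fraction as $f + (1-f)$ times that quantity, and that the intra-class correlation is $\sigma^2/(\sigma^2+\sigma_s^2)$. The function names, the normal distribution, and the plotting positions $(j - 0.5)/N$ used for the "equally spaced quantiles" are assumptions made for the example, not details taken from the paper.

```python
from statistics import NormalDist

def shrinkage_predictor_3(m, sigma2, sigma2_e):
    """Assumed shrinkage constant for predictor (3): m*sigma2 / (m*sigma2 + sigma2_e)."""
    return m * sigma2 / (m * sigma2 + sigma2_e)

def shrinkage_predictor_4(m, M, sigma2, sigma2_e):
    """Assumed shrinkage constant for predictor (4): f + (1 - f) * constant of (3), with f = m / M."""
    f = m / M
    return f + (1 - f) * shrinkage_predictor_3(m, sigma2, sigma2_e)

def intraclass_correlation(sigma2, sigma2_s):
    """Subject intra-class correlation: between-cluster variance over total variance."""
    return sigma2 / (sigma2 + sigma2_s)

def equally_spaced_normal_quantiles(N, mean=0.0, sd=1.0):
    """Cluster means set to normal quantiles at probabilities (j - 0.5) / N (assumed spacing)."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((j - 0.5) / N) for j in range(1, N + 1)]

c3 = shrinkage_predictor_3(m=2, sigma2=1.0, sigma2_e=2.0)
c4 = shrinkage_predictor_4(m=2, M=4, sigma2=1.0, sigma2_e=2.0)
rho = intraclass_correlation(sigma2=1.0, sigma2_s=3.0)
cluster_means = equally_spaced_normal_quantiles(3)
```

With these forms the constant for (4) approaches 1 as the sampling fraction $f$ approaches 1, while the constant for (3) does not depend on $f$; this may be one way to read the larger MSE of (3) reported for $f > 0.5$.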
Figure 2 summarizes increases in MSE for different intra-class correlations (using
quantiles of a uniform distribution to determine cluster and subject means). The results
illustrate that for low intra-class correlations, the relative increase in MSE can be
dramatic. Once again, for low sampling fractions, similar patterns in MSE are evident for
(3), (4) and (5).
Results in Figure 3 compare the sample mean and predictors (4) and (5) in two settings where cluster sizes differ. Predictor (3) is not applicable in such settings. These results are based on simulation studies that repeat a two-stage sampling process from a finite population, with 5000 trials for each simulation. The MSE is estimated by the average squared difference between the predictor and the latent PSU value in each case. In the first column, cluster size differs by 10-fold, with sample sizes for clusters proportional to the cluster size. The results illustrate the performance of the predictors for different sampling fractions. The right column in Figure 3 compares the MSE of predictors when the sample size per cluster is constant for all clusters.
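The simulation design described above can be sketched as follows. This is a minimal illustration for the sample mean predictor only, not the code used for Figure 3: the toy population, the cluster sizes, the 20% within-cluster sampling fraction, and all function names are assumptions. Each trial selects clusters by simple random sampling without replacement, selects subjects without replacement within the cluster realized in the first sample position, and records the squared difference between the cluster sample mean and the latent cluster mean.

```python
import numpy as np

def simulate_sample_mean_mse(clusters, n, m, trials=5000, seed=0):
    """Estimate the MSE of the cluster sample mean as a predictor of the latent PSU mean."""
    rng = np.random.default_rng(seed)
    N = len(clusters)
    latent = np.array([y.mean() for y in clusters])
    sq_err = np.empty(trials)
    for t in range(trials):
        sampled = rng.choice(N, size=n, replace=False)   # first stage: clusters, random order
        s = sampled[0]                                   # PSU realized in sample position 1
        y = rng.choice(clusters[s], size=m[s], replace=False)  # second stage: subjects
        sq_err[t] = (y.mean() - latent[s]) ** 2
    return sq_err.mean()

def theoretical_sample_mean_mse(clusters, m):
    """(1/N) * sum over clusters of (sigma_s^2 / m_s) * (1 - f_s), with divisor M_s - 1 in sigma_s^2."""
    terms = [y.var(ddof=1) / m_s * (1 - m_s / len(y)) for y, m_s in zip(clusters, m)]
    return float(np.mean(terms))

# Hypothetical finite population: cluster sizes differ 10-fold, sample sizes proportional to size.
pop_rng = np.random.default_rng(1)
sizes = [10, 20, 50, 100]
clusters = [pop_rng.normal(loc=mu, scale=2.0, size=M)
            for mu, M in zip([8.0, 10.0, 12.0, 14.0], sizes)]
m = [M // 5 for M in sizes]

sim = simulate_sample_mean_mse(clusters, n=2, m=m)
theo = theoretical_sample_mean_mse(clusters, m)
```

For this predictor the simulated value should be close to the closed-form expression, which gives a simple check on the sampling code before it is extended to predictors that require shrinkage constants.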
5. DISCUSSION
The expanded finite population mixed model uses a larger set of random variables
than the random variables typically used in super-population models or used in the
finite population mixed model of Stanek and Singer (2004). These random variables are fewer
than the 2 random variables that would result from a further expansion that would
retain the identity of subjects and SSUs, and even fewer than the very general
representation of the model used by Godambe (1955). The results illustrate an
intermediate set of random variables that enable a clear representation of a two stage
sample, while accounting for details on different cluster and sample sizes. Other
approaches are conceptually flawed, and appear not to be able to connect the potentially
observable data to the random variables in the stochastic model. Since there is more than
one finite population mixed model that can be used, we have shown how different models
can be compared by considering the models in a hierarchy, and identifying whether the
additional set of orthogonal random variables adds to the information about the target
random variable. It is valuable to note that these results depend on selection of a target
random variable. For example, if there is interest in the relationship between two
variables among subjects (in a cluster), the collapsed expanded set of random variables is
most likely no longer sufficient.
The new model and predictor offer substantial gains over previous predictors.
These gains are likely mitigated by the need to estimate shrinkage constants for use in
empirical predictors. Simulation studies comparing the performance of the empirical
predictors (3), (4), and (5) in the equal cluster size/sample size settings indicate some
loss in efficiency, but a similar ordering of MSE (San Martino, Singer, and Stanek 2007).
Limited simulation studies have been conducted using the expanded model predictor,
which have indicated that there is a greater loss in efficiency for (9) relative to the other
predictors. Iterative estimation procedures may be possible, and are currently being
investigated. This area requires more study.
REFERENCES
Bolfarine, H., and Zacks, S. (1992), Prediction Theory for Finite Populations, New York:
Springer-Verlag.
Bryk, A.S. and Raudenbush, S.W. (2003). (2nd edition) Hierarchical linear models. Sage
Publishing, New York.
Cassel, C.M., Sarndal, C.E., and Wretman, J.H. (1977), Foundations of Inference in
Survey Sampling, New York: Wiley.
Cochran, W. (1977), Sampling Techniques, New York: Wiley.
Demidenko, E. (2004). Mixed models: Theory and Application, John Wiley, New York.
Deville, J.C., and Sarndal, C.E. (1992), “Calibration Estimation in Survey Sampling,”
Journal of the American Statistical Association, 87, 376-382.
Diggle, P. L., Heagerty, P., Liang, K. Y. and Zeger, S. (2002), Analysis of Longitudinal
Data, Oxford University Press.
Ghosh, M. and Lahiri, P. (1987), “Robust Empirical Bayes Estimation of Means from
Stratified Samples,” Journal of the American Statistical Association, 82,1153-1162.
Goldberger, A. S. (1962), “Best Linear Unbiased Prediction in the Generalized Linear
Regression Model,” Journal of the American Statistical Association, 57, 369-375.
Graybill, F. A. (1983), Matrices with applications in statistics, Belmont, California:
Wadsworth International.
Henderson, C.R. (1984), Applications of Linear Models in Animal Breeding, Guelph,
Canada: University of Guelph.
Henderson, C. R., Kempthorne, O., Searle, S. R. and von Krosigk , C. M., (1959), “The
Estimation of Environmental and Genetic Trends from Records Subject to Culling,”
Biometrics, 15, 192-218.
Hinkelmann, K., and Kempthorne, O. (1994), Design and Analysis of Experiments, Vol. 1,
Introduction to Experimental Designs, New York: Wiley.
Li, W. (2003), “Use of random Permutation Model in rate Standardization and
Calibration,” unpublished doctoral thesis, University of Massachusetts, Massachusetts.
McCulloch, C. E. and Searle, S. R. (2001), Generalized, Linear, and Mixed Models, New
York: John Wiley and Sons.
McLean, R. A., Sanders, W. L. and Stroup, W. W. (1991), “A Unified Approach to
Mixed Linear Models,” The American Statistician, 45(1), 54-64.
Ockene, I. S., Hebert, J. R., Ockene, J. K., Saperia, G. M., Nicolosi, R., Merriam, P.A.
and Hurley, T. G. (1999), “Effect of Physician-delivered Nutrition Counseling Training
and an Office-support Program on Saturated Fat Intake, Weight, and Serum Lipid
Measurements in a Hyperlipidemic Population: Worcester Area Trial for Counseling in
Hyperlipidemia (WATCH),” Archives of Internal Medicine, Apr 12, 159(7), 725-731.
Rao, J.N.K. (2003), Small Area Estimation, New York: Wiley.
Robinson, G. K. (1991). “That BLUP is a Good Thing: the Estimation of Random
Effects,” Statistical Science, 6(1), 15-51.
Royall, R. M. (1976), “The Linear Least-squares Prediction Approach to Two-stage
Sampling,” Journal of the American Statistical Association , 71, 657-664.
Sarndal, C-E, Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling,
New York: Springer-Verlag.
Scott, A. and T. M. F. Smith (1969), “Estimation in Multi-stage Surveys,” Journal of the
American Statistical Association, 64(327), 830-840.
Searle, S. R., Casella, G. and McCulloch, C. E. (1992), Variance Components. New
York: Wiley.
Stanek, E. J. III and Singer, J. M. (2004), “Predicting Random Effects from Finite
Population Clustered Samples with Response Error,” Journal of the American Statistical
Association, 99, 119-130.
Valliant, R., Dorfman, A. H. and Royall, R. M. (2000), Finite Population Sampling and
Inference, New York: Wiley.
Verbeke, G., and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data,
New York: Springer-Verlag.
APPENDIX A.
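We partition $\mathbf{Y}_{wp}$ into the first $nN$ random variables corresponding to the sample, $\mathbf{Y}_{wI}$, and the remaining random variables, $\mathbf{Y}_{wII}$, to predict $T = \mathbf{g}_I'\mathbf{Y}_{wI} + \mathbf{g}_{II}'\mathbf{Y}_{wII}$, where $\mathbf{g}_I' = \mathbf{c}_I' \otimes \mathbf{1}_N'$ and $\mathbf{g}_{II}' = \left( \mathbf{c}_{II}' \otimes \mathbf{1}_N' \;\; \mathbf{c}' \otimes \mathbf{1}_N' \right)$. Explicitly, the partitioned RP model is defined by
\[
\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix}
= \begin{pmatrix} \mathbf{X}_{I} \\ \mathbf{X}_{II} \end{pmatrix} \boldsymbol{\mu}
+ \left[ E_{\xi_2}\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix}
- E_{\xi_1 \xi_2}\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix} \right]
+ \begin{pmatrix} \mathbf{E}_{wI} \\ \mathbf{E}_{wII} \end{pmatrix}
\]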
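where $\mathbf{X}_I = \mathbf{1}_n \otimes \left[ \frac{1}{N} \left( \bigoplus_{s=1}^{N} w_s m_s \right) \right]$ and
\[
\mathbf{X}_{II} = \begin{pmatrix}
\mathbf{1}_{N-n} \otimes \left( \frac{1}{N} \bigoplus_{s=1}^{N} w_s m_s \right) \\
\mathbf{1}_{N} \otimes \left( \frac{1}{N} \bigoplus_{s=1}^{N} w_s \left( M_s - m_s \right) \right)
\end{pmatrix},
\]
random effects are given by
\[
E_{\xi_2}\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix}
- E_{\xi_1 \xi_2}\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix}
= \begin{pmatrix}
\left( \mathbf{I}_n \otimes \left( \bigoplus_{s=1}^{N} f_s d_s \right) \right) vec\left( \mathbf{U}_I - E_{\xi_1} \mathbf{U}_I \right) \\
\left( \mathbf{I}_{N-n} \otimes \left( \bigoplus_{s=1}^{N} f_s d_s \right) \right) vec\left( \mathbf{U}_{II} - E_{\xi_1} \mathbf{U}_{II} \right) \\
\left( \mathbf{I}_N \otimes \left( \bigoplus_{s=1}^{N} \left( 1 - f_s \right) d_s \right) \right) vec\left[ \mathbf{U} - E_{\xi_1} \mathbf{U} \right]
\end{pmatrix},
\]
where $\begin{pmatrix} \mathbf{E}_{wI} \\ \mathbf{E}_{wII} \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix} - E_{\xi_2}\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix}$, and $\mathbf{U} = \left( \mathbf{U}_I \;\; \mathbf{U}_{II} \right)$, $\mathbf{U}_I = \left( \left( \mathbf{U}_i \right) \right) = \left( \mathbf{U}_1 \;\; \mathbf{U}_2 \;\; \cdots \;\; \mathbf{U}_n \right)$ and $\mathbf{U}_{II} = \left( \left( \mathbf{U}_i \right) \right) = \left( \mathbf{U}_{n+1} \;\; \mathbf{U}_{n+2} \;\; \cdots \;\; \mathbf{U}_N \right)$. The variance is given by (see page 5 of c07ed27.doc)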
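\[
\operatorname{var}\left( \mathbf{Y}_{wp} \right) = \frac{1}{N-1}
\begin{pmatrix}
\mathbf{P}_N \otimes \left[ \left( \bigoplus_{s=1}^{N} f_s d_s \right) \mathbf{P}_N \left( \bigoplus_{s=1}^{N} f_s d_s \right) \right] &
\mathbf{P}_N \otimes \left[ \left( \bigoplus_{s=1}^{N} f_s d_s \right) \mathbf{P}_N \left( \bigoplus_{s=1}^{N} \left( 1 - f_s \right) d_s \right) \right] \\
\mathbf{P}_N \otimes \left[ \left( \bigoplus_{s=1}^{N} \left( 1 - f_s \right) d_s \right) \mathbf{P}_N \left( \bigoplus_{s=1}^{N} f_s d_s \right) \right] &
\mathbf{P}_N \otimes \left[ \left( \bigoplus_{s=1}^{N} \left( 1 - f_s \right) d_s \right) \mathbf{P}_N \left( \bigoplus_{s=1}^{N} \left( 1 - f_s \right) d_s \right) \right]
\end{pmatrix}
+ \begin{pmatrix} \mathbf{I}_N & -\mathbf{I}_N \\ -\mathbf{I}_N & \mathbf{I}_N \end{pmatrix} \otimes \left( \bigoplus_{s=1}^{N} v_{se}^2 \right)
\]
which we partition such that $\operatorname{var}_{\xi_1 \xi_2}\begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix} = \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{I,II}' & \mathbf{V}_{II} \end{pmatrix}$, where $v_{se}^2 = f_s \left( 1 - f_s \right) \frac{M_s w_s^2}{N} \sigma_s^2$ (from page 41, c07ed15.doc).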
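We develop an expression for the best linear unbiased predictor of $T$ that is a linear function of the sample, $\mathbf{L}'\mathbf{Y}_{wI}$. Since $\mathbf{L}'\mathbf{Y}_{wI} - T = \left( \mathbf{L}' - \mathbf{g}_I' \;\; -\mathbf{g}_{II}' \right) \begin{pmatrix} \mathbf{Y}_{wI} \\ \mathbf{Y}_{wII} \end{pmatrix}$, the unbiased constraint is given by $\left( \mathbf{L}' - \mathbf{g}_I' \right) \mathbf{X}_I - \mathbf{g}_{II}' \mathbf{X}_{II} = \mathbf{0}$. Minimizing $\operatorname{var}_{\xi_1 \xi_2}\left( \mathbf{L}'\mathbf{Y}_{wI} - T \right)$ while accounting for the unbiased constraint using Lagrange multipliers results in the familiar solution,
\[
\hat{\mathbf{L}} = \mathbf{g}_I + \mathbf{V}_I^{-1} \left[ \mathbf{V}_{I,II} - \mathbf{X}_I \left( \mathbf{X}_I' \mathbf{V}_I^{-1} \mathbf{X}_I \right)^{-1} \mathbf{X}_I' \mathbf{V}_I^{-1} \mathbf{V}_{I,II} \right] \mathbf{g}_{II}
+ \mathbf{V}_I^{-1} \mathbf{X}_I \left( \mathbf{X}_I' \mathbf{V}_I^{-1} \mathbf{X}_I \right)^{-1} \mathbf{X}_{II}' \mathbf{g}_{II}.
\]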
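The constrained minimization can be checked numerically on a small example. The sketch below assumes the standard form of the solution, $\hat{\mathbf{L}} = \mathbf{g}_I + \mathbf{V}_I^{-1}[\mathbf{V}_{I,II} + \mathbf{X}_I(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1}(\mathbf{X}_{II}' - \mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{V}_{I,II})]\mathbf{g}_{II}$; the random covariance matrix, design matrices, dimensions, and function names are hypothetical and do not reproduce the expanded model itself.

```python
import numpy as np

def blup_weights(V_I, V_I_II, X_I, X_II, g_I, g_II):
    """Weights L minimizing var(L'Y_I - T) subject to L'X_I = g_I'X_I + g_II'X_II."""
    Vinv_X = np.linalg.solve(V_I, X_I)         # V_I^{-1} X_I
    Vinv_C = np.linalg.solve(V_I, V_I_II)      # V_I^{-1} V_{I,II}
    A = X_I.T @ Vinv_X                         # X_I' V_I^{-1} X_I
    adjust = np.linalg.solve(A, X_II.T - X_I.T @ Vinv_C)
    return g_I + (Vinv_C + Vinv_X @ adjust) @ g_II

def prediction_variance(L, V_I, V_I_II, V_II, g_I, g_II):
    """var(L'Y_I - g_I'Y_I - g_II'Y_II) for a given weight vector L."""
    a = L - g_I
    return a @ V_I @ a - 2 * a @ V_I_II @ g_II + g_II @ V_II @ g_II

# Hypothetical toy problem: 5 observed and 4 unobserved variables, 2 fixed effects.
rng = np.random.default_rng(0)
a_dim, b_dim, p = 5, 4, 2
B = rng.normal(size=(a_dim + b_dim, a_dim + b_dim))
V = B @ B.T + np.eye(a_dim + b_dim)            # positive definite covariance
X = rng.normal(size=(a_dim + b_dim, p))
V_I, V_I_II, V_II = V[:a_dim, :a_dim], V[:a_dim, a_dim:], V[a_dim:, a_dim:]
X_I, X_II = X[:a_dim], X[a_dim:]
g_I, g_II = rng.normal(size=a_dim), rng.normal(size=b_dim)

L_hat = blup_weights(V_I, V_I_II, X_I, X_II, g_I, g_II)
mse_hat = prediction_variance(L_hat, V_I, V_I_II, V_II, g_I, g_II)
```

Any other weight vector that satisfies the same constraint differs from `L_hat` by a vector orthogonal to the columns of `X_I`, and its prediction variance exceeds `mse_hat` by a positive quadratic form in `V_I`.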
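This result simplifies to (see c06ed56.doc, p42 and c07ed01.doc p1)
\[
\hat{\mathbf{L}} = \left( \mathbf{P}_n \mathbf{c}_I \right) \otimes \left[ \left( \bigoplus_{s=1}^{N} \frac{k_s^*}{f_s} \right) \mathbf{1}_N \right]
+ \bar{c} \left( \frac{N}{n} \right) \mathbf{1}_n \otimes \left[ \left( \bigoplus_{s=1}^{N} \frac{1}{f_s} \right) \mathbf{1}_N \right]
+ \bar{c}_{II} \left( \frac{N-n}{n} \right) \mathbf{1}_n \otimes \mathbf{1}_N
\]
where
\[
k_s^* = k_s - \frac{k_s}{d_s} \, \frac{1}{N} \sum_{s^*=1}^{N} \left( \frac{1 - k_{s^*}}{1 - \bar{k}} \right) d_{s^*},
\qquad
k_s = \frac{f_s^2 d_s^2}{f_s^2 d_s^2 + (N-1) v_{se}^2}
\]
(from page 41, c07ed15.doc), $\bar{k} = \frac{1}{N} \sum_{s=1}^{N} k_s$, $\bar{c}_{II} = \frac{1}{N-n} \sum_{i=n+1}^{N} c_i$ and $\bar{c} = \frac{1}{N} \sum_{i=1}^{N} c_i$. The predictor $\hat{T}_p = \hat{\mathbf{L}}'\mathbf{Y}_{wI}$ can be expressed as (see page 6 of c07ed01.doc)
\[
\hat{T}_p = \sum_{i=1}^{n} c_i \left( \hat{Y}_i - \bar{\hat{Y}} \right)
+ \bar{c} \left( \frac{N}{n} \right) \sum_{s=1}^{N} I_s M_s w_s \bar{Y}_{sI}
+ \bar{c}_{II} \left( \frac{N-n}{n} \right) \sum_{s=1}^{N} I_s M_s f_s w_s \bar{Y}_{sI}
\]
where $\hat{Y}_i = \sum_{s=1}^{N} U_{is} M_s w_s k_s^* \bar{Y}_{sI}$, $\bar{\hat{Y}} = \frac{1}{n} \sum_{i=1}^{n} \hat{Y}_i$, and $I_s = \sum_{i=1}^{n} U_{is}$ is an indicator 'inclusion' random variable for cluster $s$ in the sample.
An expression for the mean squared error (MSE) of the predictor can be developed directly using expressions for the variance, and simplifies to (see c07ed17.doc, p15)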
( ) ( ) ( )
( )
1 2
2*2 2 22 2
, 2 21 1 1
2 2
1
1 1ˆvar 2
2
n N Ns se se
p i I kd kd di s ss s
N
i I di
Nck v vT T c c
N n Nf f
Ncc Nc ncn
ξ ξ σ σ
σ
= = =
=
⎛ ⎞ ⎛ ⎞⎛ ⎞− = − − + +⎜ ⎟ ⎜ ⎟⎜ ⎟
⎝ ⎠⎝ ⎠ ⎝ ⎠⎡ ⎤+ + −⎢ ⎥⎣ ⎦
∑ ∑ ∑
∑
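where $\bar{c}_I = \frac{1}{n} \sum_{i=1}^{n} c_i$, $\mu_d = \frac{1}{N} \sum_{s=1}^{N} d_s$, $\mu_{kd} = \frac{1}{N} \sum_{s=1}^{N} k_s^* d_s$, $\sigma_{kd}^2 = \frac{1}{N-1} \sum_{s=1}^{N} \left( k_s^* d_s - \mu_{kd} \right)^2$, $\sigma_d^2 = \frac{1}{N-1} \sum_{s=1}^{N} \left( d_s - \mu_d \right)^2$ and $\sigma_{kd,d} = \frac{1}{N-1} \sum_{s=1}^{N} \left( k_s^* d_s - \mu_{kd} \right) \left( d_s - \mu_d \right)$.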