
C07ed47.doc 11/29/2007 3:48 PM i

Predicting Random Effects for Different Size Clusters Using an Expanded Finite Population Mixed Model

Edward J. Stanek III

Department of Public Health

401 Arnold House

University of Massachusetts

715 North Pleasant Street

Amherst, MA 01003-9304 USA

[email protected]

Julio M. Singer

Departamento de Estatística

Universidade de São Paulo

São Paulo, Brazil

[email protected]


ABSTRACT

Prediction of random effects is an important problem with expanding applications. In the simplest context, the problem corresponds to prediction of the latent value (the mean) of a realized cluster selected via two-stage sampling. Best linear unbiased predictors developed from mixed models are widely used, but in settings with equal cluster sizes they have recently been shown to be outperformed by finite population mixed model predictors (Stanek and Singer 2004). When clusters differ in size, super-population models have been used to predict the contribution of the unobserved subjects to a realized cluster mean, providing an extension of the mixed model to a finite population. We develop an expanded finite population mixed model for use in predicting linear combinations of realized cluster means. The predictor is a linear function of the sample, unbiased, and has minimum MSE. Comparison with mixed model, super-population model, and finite population mixed model predictors demonstrates a substantial reduction in the MSE, even in settings where cluster sizes are equal.

The general approach faithfully captures the stochastic aspects of sampling in the problem. The expanded random variables span a higher dimensional space than those typically applied to such problems, and this enables predictors with reduced MSE. The general approach, which expands the random variables and subsequently reduces them to a sufficient set, is illustrated in the two-stage setting and has potential for further application.


Contact Address: Edward J. Stanek III, Department of Public Health, 401 Arnold House, University of Massachusetts at Amherst, Amherst, Ma. 01002. Phone: 413-545-3812. Fax: 413-545-1645. Email: [email protected]

KEYWORDS: superpopulation, best linear unbiased predictor, random permutation, optimal estimation, design-based inference, mixed models.

ACKNOWLEDGEMENT. This work was developed with the support of the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) and the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), Brazil, and the National Institutes of Health (NIH-PHS-R01-HD36848, R01-HL071828-02), USA.


1. INTRODUCTION

How best to guess the average response for subjects in a cluster, based on data that include only some subjects in some clusters, is a common problem. For example, medical costs and quality of care are important factors that affect health care economics and influence patient choice of care. Web sites are currently available that rate hospitals in a region according to these characteristics (see, for example, http://www.healthgrades.com). Estimating such rates and average costs for hospitals, which typically vary in size, is an important practical problem.

The best linear unbiased predictor (BLUP) developed in a mixed model is often offered as a solution to the problem of predicting the average cluster response. This solution accounts for unequal numbers of subjects in sample clusters, but does not use information that is often available about the size of each cluster. An alternative approach uses the super-population model of Scott and Smith (1969) to account for different cluster sizes in a finite super-population. Both approaches specify stochastic models that plausibly represent the problem of interest, but neither is formally linked to the finite population. A third approach, limited to settings where clusters are of equal size, is the finite population mixed model of Stanek and Singer (2004). This approach uses sampling indicator random variables to link the sample to the population, resulting in predictors with smaller mean squared error (MSE) than the other approaches, even when empirical predictors are used (San Martino, Singer, and Stanek 2007). We revisit this problem when clusters differ in size, illustrating that none of the previous approaches adequately represents the problem, and develop an expanded finite population mixed model that overcomes these limitations, resulting in a predictor that outperforms previous predictors of the latent value of a realized random effect in all settings.

2. BACKGROUND AND AN EXAMPLE

We consider a simple example using hypothetical data on the cost of

appendectomies (assumed to be known without error) for patients in two hospitals to

motivate and provide background for the problem (Table 1).

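Table 1. Data on Hospital Costs for Appendectomy for Patients in Two Hospitals

| Hospital | Patient | Expense (dollars) | # Patients (hospital) | Latent Value (hospital) |
|---|---|---|---|---|
| Central | Sam Evans | 1400 | 3 | $\mu_{\text{Central}}$ |
| Central | Jane Blake | 2100 | | |
| Central | Hong Yao | 2500 | | |
| Mercy | Juan Marcus | 1900 | 2 | $\mu_{\text{Mercy}}$ |
| Mercy | Mary Slokum | 1700 | | |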

Our interest is in the average cost of appendectomies (the latent value) for each hospital in the past year. If the available data in Table 1 include the cost of all appendectomies in this period for a hospital, the latent value is the simple average cost. When such data are not available for all appendectomy patients, the average cost for the available patients in each hospital (i.e., \$2000 for Central and \$1800 for Mercy) can be used to estimate the latent value for each hospital. This estimator is the best linear unbiased estimator if the available data resulted from a stratified simple random sample of appendectomy patients, with hospitals as strata.


A different stochastic model assumes that the data in Table 1 are the result of a

two-stage cluster sample, where a simple random sample of appendectomy patients is

selected from each of a simple random sample of hospitals. We refer to a sample hospital

as a primary sampling unit (PSU) to distinguish it from a specific hospital, and to a

sample patient as a secondary sampling unit (SSU) to distinguish it from a specific

patient. Using the notation in Table 2,

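Table 2. Notation for Common Mixed Model Representation of Data

| Sample Hospital (PSU $i$) | Sample Patient (SSU $j$) | Expense $Y_{ij}$ | Sample Size $m_i$ | Latent Value |
|---|---|---|---|---|
| $i = 1$ | $j = 1$ | $Y_{11}$ | 3 | $\mu + B_1$ |
| $i = 1$ | $j = 2$ | $Y_{12}$ | | |
| $i = 1$ | $j = 3$ | $Y_{13}$ | | |
| $i = 2$ | $j = 1$ | $Y_{21}$ | 2 | $\mu + B_2$ |
| $i = 2$ | $j = 2$ | $Y_{22}$ | | |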

a model for $Y_{ij}$, the appendectomy cost for SSU $j$ in PSU $i$, is given by the mixed model

$$Y_{ij} = \mu + B_i + E_{ij} \qquad (1)$$

where $\mu$ is the overall mean, $B_i$ is the random effect for PSU $i$, and $E_{ij}$ is a random variable corresponding to the deviation of SSU $j$ from the latent value, $T_i = \mu + B_i$, of PSU $i$. Model (1) is an example of the general linear mixed model

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{Z}\mathbf{B} + \mathbf{E} \qquad (2)$$


where $\mathbf Y$ is an $r\times 1$ response vector, $\mathbf X$ and $\mathbf Z$ are $r\times p$ and $r\times q$ known design matrices, respectively, $\boldsymbol\alpha$ is a $p\times 1$ vector of fixed effects, $\mathbf B$ is a $q\times 1$ vector of random effects with null means and covariance matrix $\boldsymbol\Gamma$, and $\mathbf E$ is an $r\times 1$ vector of random errors with null means and covariance matrix $\boldsymbol\Sigma$, independent of $\mathbf B$, such that $\mathrm{var}(\mathbf Y) = \boldsymbol\Omega = \mathbf Z\boldsymbol\Gamma\mathbf Z' + \boldsymbol\Sigma$. This model has a long history (see, for example, Harville 1978; Laird and Ware 1982) and is the main topic of several recent texts (for example, Brown and Prescott 1999; Bryk and Raudenbush ( ); Demidenko ( ); Diggle et al. 2003; Littell et al. 2006; McCulloch and Searle 2001; Singer and Willett 2003; Verbeke and Molenberghs 2000; Vonesh and Chinchilli 1997). For the data in Table 2, using (1) with $i = 1,\ldots,n = 2$ and $j = 1,\ldots,m_i$, and defining $r = \sum_{i=1}^n m_i$, the terms in (2) are given by $\mathbf X = \mathbf 1_r$, $\mathbf Z = \bigoplus_{i=1}^n \mathbf 1_{m_i}$, $\boldsymbol\alpha = \mu$, and $\mathbf B = (B_1,\ldots,B_n)'$, with $\boldsymbol\Gamma = \sigma^2\mathbf I_n$ and $\boldsymbol\Sigma = \bigoplus_{i=1}^n \sigma_i^2\mathbf I_{m_i}$, where $\mathbf 1_a$ is an $a\times 1$ vector with all elements equal to one, $\mathbf I_a$ is an $a\times a$ identity matrix, and $\bigoplus_{i=1}^n \mathbf A_i$ denotes a block diagonal matrix with blocks given by $\mathbf A_i$ (Graybill 1983). In these expressions, $\mathrm{var}(B_i) = \sigma^2$ and $\mathrm{var}(E_{ij}) = \sigma_i^2$ for $i = 1,\ldots,n$. The mixed model predictor of the latent value for PSU $i$,

$$\hat P_i = \hat\mu + k_i\left(\bar Y_i - \hat\mu\right) \qquad (3)$$


is a linear function of $\mathbf Y$ (i.e., $\hat P_i = \mathbf L'\mathbf Y$), unbiased (i.e., $E(\hat P_i - T_i) = 0$), and has minimum mean squared error (MSE), where

$$\hat\mu = \frac{\sum_{i=1}^n w_i\bar Y_i}{\sum_{i=1}^n w_i}$$

is a weighted sample mean with $w_i = 1/(\sigma^2 + \sigma_i^2/m_i)$, $\bar Y_i = \frac{1}{m_i}\sum_{j=1}^{m_i} Y_{ij}$, and $k_i = \sigma^2/(\sigma^2 + \sigma_i^2/m_i)$ (Goldberger 1962; Henderson 1984; McLean, Sanders and Stroup 1991; Robinson 1991). If we assume that the data in Table 1 are a realization of the random variables represented in Table 2, and in addition that $\sigma = 100$, $\sigma_1 = 300$, and $\sigma_2 = 50$, then $\hat\mu = \$1844$, $k_1 = 0.25$, $k_2 = 0.89$, and the predictor of the latent value for the realized hospital corresponding to $i = 1$ (i.e., Central) is $\hat P_1 = \$1883$, while the predictor of the latent value for $i = 2$ (i.e., Mercy) is $\hat P_2 = \$1805$.
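The arithmetic behind these numbers can be reproduced with a short script. The following is a minimal Python sketch, assuming the variance components stated above ($\sigma = 100$, $\sigma_1 = 300$, $\sigma_2 = 50$); the function name `mixed_model_predictor` is illustrative only.

```python
# Mixed model predictor (3) for the Table 1 data, assuming the variance
# components used in the text: sigma = 100, sigma_1 = 300, sigma_2 = 50.
costs = {
    "Central": [1400.0, 2100.0, 2500.0],   # PSU i = 1
    "Mercy": [1900.0, 1700.0],             # PSU i = 2
}
sigma = 100.0                                 # between-PSU SD
sigma_i = {"Central": 300.0, "Mercy": 50.0}   # within-PSU SDs


def mixed_model_predictor(costs, sigma, sigma_i):
    """Return (mu_hat, k, P) with P[h] = mu_hat + k[h] * (ybar[h] - mu_hat)."""
    ybar = {h: sum(v) / len(v) for h, v in costs.items()}
    # variance of a PSU sample mean: sigma^2 + sigma_i^2 / m_i
    var_mean = {h: sigma**2 + sigma_i[h] ** 2 / len(costs[h]) for h in costs}
    w = {h: 1.0 / var_mean[h] for h in costs}          # weights w_i
    mu_hat = sum(w[h] * ybar[h] for h in costs) / sum(w.values())
    k = {h: sigma**2 / var_mean[h] for h in costs}     # shrinkage factors k_i
    P = {h: mu_hat + k[h] * (ybar[h] - mu_hat) for h in costs}
    return mu_hat, k, P


mu_hat, k, P = mixed_model_predictor(costs, sigma, sigma_i)
print(round(mu_hat), {h: round(v) for h, v in P.items()})
# prints: 1844 {'Central': 1883, 'Mercy': 1805}
```

Because Central has the larger within-hospital variance, its shrinkage factor is small and its predictor is pulled strongly toward $\hat\mu$.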

When the two-stage sampling model can be assumed (either because such sampling was conducted, or because it is considered a plausible model), there is general agreement that (3), rather than the hospital sample mean $\bar Y_i$, should be used to predict the latent value for the realized PSU, since the predictor has smaller MSE. This fact is related to the mixed model being unconditional (on hospitals), in contrast to the stratified sample model that estimates a cluster mean by $\bar Y_i$. As noted by Robinson (1991), following Goldberger (1962), by expressing $T_i = X_i\mu + \mathbf Z_i'\mathbf B$, where $X_i = 1$ and $\mathbf Z_i' = (0 \cdots 1 \cdots 0)$ is a $1\times n$ vector with a value of one in column $i$, (3) can be motivated as the best linear unbiased predictor (BLUP) of $T_i$ from the joint distribution of $(\mathbf Y'\;\; T_i)'$.

The stratified model and mixed model estimate a realized hospital’s latent value

without using additional population detail, such as the number of hospitals in the

population, or the number of appendectomy patients in each hospital, even though such

additional information may be available (as, for example, in Table 3).

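Table 3. Population of Hospitals and Appendectomy Patients in the Past Year

| Hospital ($s$) | Patient ($t$) | Expense $y_{st}$ | # Patients $M_s$ | Mean $\mu_s$ | Variance $\sigma_s^2$ |
|---|---|---|---|---|---|
| 1 (County) | 1 | $y_{11}$ | 2 | $\mu_1$ | $\sigma_1^2$ |
| 1 (County) | 2 | $y_{12}$ | | | |
| 2 (Central) | 1 (Jane Blake) | $y_{21}$ | 4 | $\mu_2$ | $\sigma_2^2$ |
| 2 (Central) | 2 | $y_{22}$ | | | |
| 2 (Central) | 3 (Sam Evans) | $y_{23}$ | | | |
| 2 (Central) | 4 (Hong Yao) | $y_{24}$ | | | |
| 3 (Mercy) | 1 (Mary Slokum) | $y_{31}$ | 3 | $\mu_3$ | $\sigma_3^2$ |
| 3 (Mercy) | 2 (Juan Marcus) | $y_{32}$ | | | |
| 3 (Mercy) | 3 | $y_{33}$ | | | |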

Recognizing the limitation of (3), Scott and Smith (1969) developed a predictor of $T_i$ by augmenting the random variables in Table 2 with the remaining set of random variables in Table 4, motivated by the finite population in Table 3.


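Table 4. Remaining Random Variables Not Realized via Sampling

| PSU ($i$) | SSU ($j$) | Expense $Y_{ij}$ | # Patients $M_i$ | Mean | Variance |
|---|---|---|---|---|---|
| 1 (Central) | 4 | $Y_{14}$ | 4 | $\mu + B_1$ | $\sigma_1^2$ |
| 2 (Mercy) | 3 | $Y_{23}$ | 3 | $\mu + B_2$ | $\sigma_2^2$ |
| 3 (County) | 1 | $Y_{31}$ | 2 | $\mu + B_3$ | $\sigma_3^2$ |
| 3 (County) | 2 | $Y_{32}$ | | | |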

Random variables in the super-population model consist of the joint set of random variables in Tables 2 and 4. As described by Scott and Smith, the super-population is constructed by first selecting a finite population (presumably from some larger population in time or space), and then selecting a two-stage sample from the realized population. The latent value of a hospital in the super-population is

$$T_i = f_i\bar Y_i + (1-f_i)\frac{1}{M_i - m_i}\sum_{j=m_i+1}^{M_i} Y_{ij},$$

where $f_i = m_i/M_i$. Using assumptions similar to those for the mixed model, but applying them to all super-population model random variables, Scott and Smith's predictor is given by

$$\hat P_i = f_i\bar Y_i + (1-f_i)\left[\hat\mu + k_i\left(\bar Y_i - \hat\mu\right)\right] \qquad (4)$$

and can be motivated as the best linear unbiased predictor of $T_i$ from the joint distribution of $\left(\mathbf Y'\;\; \frac{1}{M_i - m_i}\sum_{j=m_i+1}^{M_i} Y_{ij}\right)'$. This joint distribution more clearly identifies the portion of


the PSU that is not observed, relative to the joint distribution used for the mixed model. This added specificity results in a smaller MSE for (4) than for (3), as indicated by Stanek and Singer (2004). Using (4) for the data in Table 1, the predictor of the latent value for the realized hospital corresponding to $i = 1$ (i.e., Central) is $\hat P_1 = \$1971$, while the predictor of the latent value for $i = 2$ (i.e., Mercy) is $\hat P_2 = \$1802$.
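The same check can be made for (4). This is a minimal Python sketch, assuming the variance components used for (3) and the hospital sizes $M_1 = 4$ (Central) and $M_2 = 3$ (Mercy) from Table 4; `scott_smith_predictor` is a hypothetical helper name.

```python
# Scott and Smith predictor (4) for the Table 1 data. Assumes the same variance
# components as for (3) (sigma = 100, sigma_1 = 300, sigma_2 = 50) and the
# hospital sizes M_i listed in Table 4 (Central 4, Mercy 3).
costs = {"Central": [1400.0, 2100.0, 2500.0], "Mercy": [1900.0, 1700.0]}
size = {"Central": 4, "Mercy": 3}             # M_i
sigma = 100.0
sigma_i = {"Central": 300.0, "Mercy": 50.0}


def scott_smith_predictor(costs, size, sigma, sigma_i):
    m = {h: len(v) for h, v in costs.items()}
    ybar = {h: sum(v) / m[h] for h, v in costs.items()}
    var_mean = {h: sigma**2 + sigma_i[h] ** 2 / m[h] for h in costs}
    w = {h: 1.0 / var_mean[h] for h in costs}
    mu_hat = sum(w[h] * ybar[h] for h in costs) / sum(w.values())
    k = {h: sigma**2 / var_mean[h] for h in costs}
    f = {h: m[h] / size[h] for h in costs}    # f_i = m_i / M_i
    # observed fraction uses ybar_i; unobserved fraction uses the mixed model predictor
    return {h: f[h] * ybar[h] + (1 - f[h]) * (mu_hat + k[h] * (ybar[h] - mu_hat))
            for h in costs}


P = scott_smith_predictor(costs, size, sigma, sigma_i)
print({h: round(v) for h, v in P.items()})
# prints: {'Central': 1971, 'Mercy': 1802}
```

Since three of Central's four patients are observed, most of its predictor comes directly from the sample mean, which is why it moves much closer to \$2000 than under (3).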

The super-population model represented in Tables 2 and 4 includes some aspects of the finite population, but does not clearly separate realized clusters from random variables representing the sampling of clusters. For example, responses for SSUs in the same PSU are correlated in the super-population model, even though the cluster associated with each PSU (as in Table 4) is assumed to be known. This limitation was addressed by Stanek and Singer (2004), when clusters are of equal size ($M$) and within-cluster sample sizes are equal ($m$), by formally representing the two-stage sampling process using sampling indicator random variables. The sampling model clearly separated clusters from PSUs, and resulted in a slightly different variance structure from that of Scott and Smith. First, the variance of SSUs for a PSU, $\sigma_i^2$, in the two-stage model was replaced by the average SSU variance, $\sigma_e^2 = \frac{1}{N}\sum_{s=1}^N \sigma_s^2$. Second, the correlation between responses for SSUs in a PSU included some small corrections proportional to $1/M$ or $1/N$, due to the finite number of clusters and finite cluster size. The predictor, given by

$$\hat P_i = f\bar Y_i + (1-f)\left[\bar Y + k\left(\bar Y_i - \bar Y\right)\right] \qquad (5)$$


differs from (4) in that PSU weights are equal, resulting in $\hat\mu = \bar Y = \frac{1}{n}\sum_{i=1}^n \bar Y_i$, $f = m/M$, and $k = \sigma^{*2}/(\sigma^{*2} + \sigma_e^2/m)$, where $\sigma^{*2} = \sigma^2 - \sigma_e^2/M$. Theoretically, when clusters have equal size and sample sizes per PSU are equal, the MSE of (5) is less than the MSE of (4) or (3), as shown by Stanek and Singer (2004). The empirical version of (5), formed by replacing variance components with their sample estimates, also outperforms the empirical versions of the other predictors (San Martino, Singer, and Stanek 2007). However, (5) cannot be used for the data in Tables 2 and 4, since cluster sizes differ.
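To make the roles of $f$, $\sigma^{*2}$ and $k$ in (5) concrete, here is a minimal Python sketch for an equal-size setting. All inputs (two sample PSUs with means \$2000 and \$1800, $m = 2$, $M = 4$, $\sigma^2 = 10000$, $\sigma_e^2 = 2500$) are hypothetical, and `rp_predictor` is an illustrative name.

```python
# Predictor (5) in a hypothetical equal-size setting: n = 2 sample PSUs,
# m = 2 SSUs sampled from clusters of size M = 4, sigma^2 = 10000 and
# sigma_e^2 = 2500. All inputs are illustrative assumptions.
def rp_predictor(ybar, m, M, sigma2, sigma2_e):
    """Return [P_1, ..., P_n] from (5) for a list of PSU sample means."""
    n = len(ybar)
    ybar_all = sum(ybar) / n                  # equal PSU weights
    f = m / M
    sigma2_star = sigma2 - sigma2_e / M       # sigma*^2 = sigma^2 - sigma_e^2 / M
    k = sigma2_star / (sigma2_star + sigma2_e / m)
    return [f * yi + (1 - f) * (ybar_all + k * (yi - ybar_all)) for yi in ybar]


P = rp_predictor([2000.0, 1800.0], m=2, M=4, sigma2=10000.0, sigma2_e=2500.0)
print([round(p, 2) for p in P])
# prints: [1994.12, 1805.88]
```

Setting $m = M$ gives $f = 1$, and the predictor reduces to the PSU sample means, as it should when each realized cluster is observed completely.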

If clusters are of equal size, Stanek and Singer's approach can be used to represent the remaining random variables (similar to Table 4) without the need to identify the realized clusters for sample PSUs. When clusters differ in size, this strategy for representing the remaining random variables is problematic. For example, when the realized hospital for PSU $i = 1$ is not known, we do not know how many SSUs remain in Table 4 (is it one, as would be appropriate if the realized PSU is Central Hospital, or zero, as would be appropriate if the realized PSU is Mercy Hospital?). Other problems include the inability of County Hospital to correspond to PSU $i = 1$ in Table 2 (County has only two patients, fewer than the three sample SSUs of PSU $i = 1$), even though the first-stage sampling is assumed to be simple random sampling, and the apparent random nature of the second-stage sample size, PSU size, and SSU variance due to the first-stage sampling.

We extend the expanded model used by Stanek, Singer, and Lencina (2004) for simple random sampling to two-stage sampling to overcome these problems. The expanded model simultaneously retains the cluster identity and the PSU position, and distinguishes, for a PSU, the contributions of sample SSUs and non-sampled SSUs to a target random variable such as a PSU mean. The expansion replaces sums of random variables by individual random variables to represent response, resulting in a set of random variables that spans a higher dimensional space than the random variables used in Stanek and Singer (2004). Since there are two stages of sampling, there are two levels of expansion of the random variables. We first define this expanded set of random variables, and subsequently investigate, using a theorem of Rao and Bellhouse (1978), whether a lower dimensional set of random variables can adequately represent the problem without loss of information. We arrive at such a collapsed set of random variables, and show that when predicting a weighted linear combination of PSU means (or totals) via a linear unbiased predictor, the expanded model cannot be further reduced without loss of information. The best linear unbiased predictor is developed (similar to Stanek and Singer 2004), and its theoretical expected mean squared error is characterized and compared (via simulation) with mixed model and super-population model predictors. We conclude by highlighting model features that have consequences for extending this work, and related work that offers promising possibilities for future improvement.

2. AN EXPANDED RP MODEL FOR A FINITE CLUSTERED POPULATION

Let a finite population be defined (as in Table 3) by a listing of $t = 1,\ldots,M_s$ subjects in each of $s = 1,\ldots,N$ clusters, where the non-stochastic response for subject $t$ in cluster $s$ is given by $y_{st}$. The finite population parameters corresponding to the mean and variance of cluster $s$ are defined by

$$\mu_s = \frac{1}{M_s}\sum_{t=1}^{M_s} y_{st} \quad\text{and}\quad \sigma_s^2 = \frac{1}{M_s - 1}\sum_{t=1}^{M_s}\left(y_{st} - \mu_s\right)^2,$$

respectively, where we use the survey sampling definition of the parameter $\sigma_s^2$. Similarly, the population mean and the variance between cluster means are defined as

$$\mu = \frac{1}{N}\sum_{s=1}^N \mu_s \quad\text{and}\quad \sigma^2 = \frac{1}{N-1}\sum_{s=1}^N\left(\mu_s - \mu\right)^2,$$

respectively. Using these parameters, we represent the potentially observable response for subject $t$ in cluster $s$ as $y_{st} = \mu + \beta_s + \varepsilon_{st}$, where $\beta_s = \mu_s - \mu$ is the deviation of the mean for cluster $s$ from the overall mean, and $\varepsilon_{st} = y_{st} - \mu_s$ is the deviation of subject $t$'s response (in cluster $s$) from the mean for cluster $s$. Defining $\mathbf y = (\mathbf y_1'\;\mathbf y_2'\cdots\mathbf y_N')'$, where $\mathbf y_s = (y_{s1}\;y_{s2}\cdots y_{sM_s})'$, the model can be summarized as

$$\mathbf y = \mathbf X\mu + \mathbf Z\boldsymbol\beta + \boldsymbol\varepsilon,$$

where $\mathbf X = \mathbf 1_M$, $\mathbf Z_{M\times N} = \bigoplus_{s=1}^N \mathbf 1_{M_s}$, $M = \sum_{s=1}^N M_s$, $\boldsymbol\beta = (\beta_1\;\beta_2\cdots\beta_N)'$, and $\boldsymbol\varepsilon$ is defined similarly to $\mathbf y$. None of the terms in the model are random variables.
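The definitions above can be illustrated numerically. The sketch below assumes a hypothetical population with the cluster sizes of Table 3 (the expense values are invented for illustration) and uses Python's `statistics.variance`, whose $n-1$ divisor matches the survey sampling definitions of $\sigma_s^2$ and $\sigma^2$.

```python
from statistics import mean, variance   # variance() uses the n - 1 divisor

# Hypothetical expenses; cluster sizes follow Table 3 (M_s = 2, 4, 3).
population = {
    "County": [1600.0, 2000.0],
    "Central": [2100.0, 1700.0, 1400.0, 2500.0],
    "Mercy": [1700.0, 1900.0, 2200.0],
}

mu_s = {s: mean(ys) for s, ys in population.items()}          # cluster means
sigma2_s = {s: variance(ys) for s, ys in population.items()}  # sigma_s^2, divisor M_s - 1
mu = mean(list(mu_s.values()))                                # unweighted mean of cluster means
sigma2 = variance(list(mu_s.values()))                        # sigma^2, divisor N - 1
beta = {s: mu_s[s] - mu for s in population}                  # beta_s = mu_s - mu
eps = {s: [y - mu_s[s] for y in ys] for s, ys in population.items()}  # eps_st = y_st - mu_s

print(round(mu, 2), {s: round(b, 2) for s, b in beta.items()})
```

By construction the $\beta_s$ sum to zero over clusters and the $\varepsilon_{st}$ sum to zero within each cluster, so $y_{st} = \mu + \beta_s + \varepsilon_{st}$ holds exactly; nothing here is random yet.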

2.1. Random Variables and The Two Stage Random Permutation (RP) Model

We explicitly define a vector of random variables that represents a two-stage RP of the population. Assuming that each realization of the permutation is equally likely, with probability $\left(N!\prod_{s=1}^N M_s!\right)^{-1}$, the random variables formally represent two-stage sampling (Cochran 1977). We assume that the sample clusters are in the first $n$ positions in a permutation of clusters, and that the sample subjects in cluster $s$ correspond to the subjects in the first $m_s$ positions in a permutation of the cluster's subjects. These definitions represent random variables as a sequence, as opposed to the more usual set notation.
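Because every realization is equally likely, the inclusion probabilities of two-stage sampling can be recovered by brute force for a small population. This minimal sketch assumes the cluster sizes of Table 3, $n = 2$ sample clusters, and hypothetical within-cluster sample sizes $m_s = (1, 2, 2)$. It checks that a cluster falls in the first $n$ positions with probability $n/N$, and that a subject falls in the first $m_s$ positions of its cluster with probability $m_s/M_s$.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial, prod

M = [2, 4, 3]      # cluster sizes M_s (Table 3)
m = [1, 2, 2]      # hypothetical within-cluster sample sizes m_s
N, n = len(M), 2   # sample clusters occupy the first n positions

# One realization = an order of the clusters plus an order of subjects in each cluster.
realizations = list(product(permutations(range(N)),
                            *(permutations(range(size)) for size in M)))
count = len(realizations)
p = Fraction(1, count)           # every realization is equally likely

# First stage: P(cluster s is in one of the first n positions)
incl_cluster = [sum(p for r in realizations if s in r[0][:n]) for s in range(N)]
# Second stage: P(subject t of cluster s is in one of the first m_s positions)
incl_subject = [[sum(p for r in realizations if t in r[1 + s][:m[s]])
                 for t in range(M[s])] for s in range(N)]

print(count, count == factorial(N) * prod(factorial(size) for size in M))
# prints: 1728 True
```

Exact `Fraction` arithmetic is used so that the recovered probabilities can be compared with $n/N$ and $m_s/M_s$ without rounding error.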

We use indicator random variables to relate the response for subject $t$ in cluster $s$, $y_{st}$, to the response for SSU $j$ in PSU $i$ in a two-stage permutation of clusters and subjects. We define $U_{jt}^{(s)}$ as an indicator random variable that takes the value one when SSU $j$ in cluster $s$ is subject $t$, and zero otherwise, and use it to represent the response for SSU $j$ in cluster $s$ by $Y_{sj} = \sum_{t=1}^{M_s} U_{jt}^{(s)} y_{st}$. We include a fixed non-stochastic weight for SSU $j$ in cluster $s$, $w_{sj}$, and define the weighted response as $Y_{wsj} = w_{sj}Y_{sj}$, so that the sum $\sum_{j=1}^{M_s} Y_{wsj}$ corresponds to a cluster total when $w_{sj} = 1$ for all $j = 1,\ldots,M_s$, or to a cluster mean when $w_{sj} = 1/M_s$ for all $j = 1,\ldots,M_s$. Defining $\mathbf U_j^{(s)} = \left(U_{j1}^{(s)}\;U_{j2}^{(s)}\cdots U_{jM_s}^{(s)}\right)'$, we have $Y_{wsj} = w_{sj}\,\mathbf y_s'\mathbf U_j^{(s)}$. The vector $\mathbf Y_{ws} = \left(Y_{ws1}\;Y_{ws2}\cdots Y_{wsM_s}\right)'$ represents a permutation of the weighted responses for SSUs in cluster $s$.
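A minimal NumPy sketch of the indicators $U_{jt}^{(s)}$ for a single cluster, assuming hypothetical response values: the indicators form a permutation matrix, and the weighted responses sum to the cluster mean (or total), whichever permutation is realized.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
y_s = np.array([2100.0, 1700.0, 1400.0, 2500.0])   # hypothetical cluster s, M_s = 4
M_s = y_s.size

order = rng.permutation(M_s)          # order[j] = subject t occupying SSU position j
U = np.zeros((M_s, M_s))
U[np.arange(M_s), order] = 1.0        # U[j, t] = U_jt^(s) = 1 if SSU j is subject t

Y_s = U @ y_s                         # Y_sj = sum_t U_jt^(s) y_st
w_mean = np.full(M_s, 1.0 / M_s)      # w_sj = 1/M_s gives the cluster mean
w_total = np.ones(M_s)                # w_sj = 1 gives the cluster total
Y_ws_mean = w_mean * Y_s              # Y_wsj = w_sj * Y_sj
Y_ws_total = w_total * Y_s
```

Changing the seed changes the order of `Y_s` but not the weighted sums, which is why cluster means and totals are the natural targets for the weights.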


We define $U_{is}$ as an indicator random variable that takes the value one when PSU $i$ is cluster $s$, and zero otherwise. If all clusters were equal in size, we could represent a permutation of SSUs for PSU $i$ by $\sum_{s=1}^N U_{is}\mathbf Y_{ws}$. When cluster sizes differ, this sum is not defined, since the dimensions of the vectors in the sum differ. We solve this problem by expanding the random variables for PSU $i$ into the $M\times 1$ vector $\mathbf Y_{wi} = \left(U_{is}\mathbf Y_{ws}'\right)' = \left(U_{i1}\mathbf Y_{w1}'\;U_{i2}\mathbf Y_{w2}'\cdots U_{iN}\mathbf Y_{wN}'\right)'$. A two-stage random permutation of the population is represented by the $NM\times 1$ vector $\mathbf Y_w = \left(\mathbf Y_{wi}'\right)' = \left(\mathbf Y_{w1}'\;\mathbf Y_{w2}'\cdots\mathbf Y_{wN}'\right)'$, where an element of $\mathbf Y_w$ is given by $U_{is}Y_{wsj}$.
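The expansion can be sketched for one realized two-stage permutation. This minimal NumPy sketch assumes the cluster sizes of Table 3, hypothetical responses, and weights $w_{sj} = 1/M_s$. Each $\mathbf Y_{wi}$ has length $M$ with a single non-zero block, and $\mathbf Y_w$ has length $NM$.

```python
import numpy as np

rng = np.random.default_rng(seed=7)
clusters = [np.array([1600.0, 2000.0]),
            np.array([2100.0, 1700.0, 1400.0, 2500.0]),
            np.array([1700.0, 1900.0, 2200.0])]
N = len(clusters)
M = sum(c.size for c in clusters)

# Second stage: permute subjects within each cluster and apply w_sj = 1/M_s (Y_ws).
Y_ws = [rng.permutation(c) / c.size for c in clusters]

# First stage: cluster_order[i] is the cluster s realized as PSU i; U[i, s] = U_is.
cluster_order = rng.permutation(N)
U = np.zeros((N, N))
U[np.arange(N), cluster_order] = 1.0

# Y_wi = (U_i1 Y_w1', ..., U_iN Y_wN')' has length M; Y_w stacks Y_w1, ..., Y_wN.
Y_wi = [np.concatenate([U[i, s] * Y_ws[s] for s in range(N)]) for i in range(N)]
Y_w = np.concatenate(Y_wi)
```

The zero blocks are what keep the dimension fixed at $M$ regardless of which cluster a PSU turns out to be, which is exactly the difficulty the expansion is designed to remove.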

2.2. A Mixed Effect Model for the Expanded Random Variables

We represent a mixed model for the expanded RP model, denoting expectation with respect to permutations of clusters with the subscript $\xi_1$, and expectation with respect to permutations of subjects within a cluster with the subscript $\xi_2$. For PSU $i$, we express

$$\mathbf Y_{wi} = E_{\xi_1}E_{\xi_2}\left(\mathbf Y_{wi}\right) + \left[E_{\xi_2}\left(\mathbf Y_{wi}\mid\xi_1\right) - E_{\xi_1}E_{\xi_2}\left(\mathbf Y_{wi}\right)\right] + \mathbf E_{wi},$$

where $E_{\xi_1}E_{\xi_2}(\mathbf Y_{wi}) = \frac{1}{N}\left(\bigoplus_{s=1}^N \mathbf w_s\right)\boldsymbol\mu$, $E_{\xi_2}(\mathbf Y_{wi}\mid\xi_1) = \left(\bigoplus_{s=1}^N \mathbf w_s\mu_s\right)\mathbf U_i$, $\mathbf E_{wi} = \mathbf Y_{wi} - E_{\xi_2}(\mathbf Y_{wi}\mid\xi_1)$, $\mathbf w_s = (w_{sj}) = (w_{s1}\;w_{s2}\cdots w_{sM_s})'$, $\boldsymbol\mu = (\mu_s) = (\mu_1\;\mu_2\cdots\mu_N)'$, and $\mathbf U_i = (U_{is}) = (U_{i1}\;U_{i2}\cdots U_{iN})'$. The deviation of response from the expected response within a PSU is $\mathbf E_{wi}$. The fixed effects are given by $\boldsymbol\mu$, the vector of cluster means. The random effects, $E_{\xi_2}(\mathbf Y_{wi}\mid\xi_1) - E_{\xi_1}E_{\xi_2}(\mathbf Y_{wi})$, are defined as the deviation from the fixed effects of the expected response conditional on a realized PSU. In the RP model of Stanek and Singer (2004), the random effect for the mean of PSU $i$ was defined explicitly as $\beta_i^* = \sum_{s=1}^N U_{is}\beta_s = \sum_{s=1}^N U_{is}\left(\mu_s - \frac{1}{N}\sum_{s^*=1}^N\mu_{s^*}\right)$, with the random variables $U_{is}$ explicitly linking the clusters to PSU $i$. In the expanded RP model, random effects are defined for SSU $j$ in PSU $i$ as $w_{sj}\mu_s\left(U_{is} - E_{\xi_1}(U_{is})\right)$. For example, when $w_{sj} = 1/M_s$ for all $j = 1,\ldots,M_s$ (corresponding to the PSU mean), summing over $j$ gives $\sum_{j=1}^{M_s} w_{sj}\mu_s\left(U_{is} - E_{\xi_1}(U_{is})\right) = U_{is}\mu_s - \frac{1}{N}\mu_s$. For both models, the expected value of the random effect (with respect to $\xi_1$) is zero, but this result arises from quite different circumstances. Combining these expressions, we arrive at the expanded RP mixed model

$$\mathbf Y_w = \frac{1}{N}\left[\mathbf 1_N\otimes\left(\bigoplus_{s=1}^N\mathbf w_s\right)\right]\boldsymbol\mu + \left[\mathbf I_N\otimes\left(\bigoplus_{s=1}^N\mathbf w_s\mu_s\right)\right]\mathrm{vec}\left(\mathbf U - E_{\xi_1}(\mathbf U)\right) + \mathbf E_w \qquad (6)$$

where $\mathbf U = (\mathbf U_1\;\mathbf U_2\cdots\mathbf U_N)$.


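The variance of the random effects is given by

$$\mathrm{var}_{\xi_1\xi_2}\left\{\left[\mathbf I_N\otimes\left(\bigoplus_{s=1}^N\mathbf w_s\mu_s\right)\right]\mathrm{vec}\left(\mathbf U - E_{\xi_1}(\mathbf U)\right)\right\} = \frac{1}{N-1}\,\mathbf P_N\otimes\left[\left(\bigoplus_{s=1}^N\mathbf w_s\mu_s\right)\mathbf P_N\left(\bigoplus_{s=1}^N\mathbf w_s\mu_s\right)'\right],$$

while

$$\mathrm{var}_{\xi_1\xi_2}(\mathbf E_w) = \mathbf I_N\otimes\left[\bigoplus_{s=1}^N\frac{\sigma_s^2}{N}\left(\bigoplus_{j=1}^{M_s}w_{sj}\right)\mathbf P_{M_s}\left(\bigoplus_{j=1}^{M_s}w_{sj}\right)\right],$$

where $\mathbf P_a = \mathbf I_a - \frac{1}{a}\mathbf J_a$ and $\mathbf J_a$ denotes an $a\times a$ matrix with all elements equal to one.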
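The two covariance structures behind these variance expressions can be checked by exhaustive enumeration in small cases. This sketch is a numerical illustration under assumed sizes ($N = 4$, one cluster with $M_s = 4$ and hypothetical responses), not a proof. It verifies $\mathrm{var}(\mathrm{vec}\,\mathbf U) = \frac{1}{N-1}\mathbf P_N\otimes\mathbf P_N$, which yields the random-effect variance through the mixed-product rule for Kronecker products, and $\mathrm{var}(\mathbf Y_s) = \sigma_s^2\mathbf P_{M_s}$, which underlies the within-cluster blocks of $\mathrm{var}(\mathbf E_w)$.

```python
from itertools import permutations

import numpy as np


def centering(a):
    """P_a = I_a - J_a / a."""
    return np.eye(a) - np.ones((a, a)) / a


def enum_cov(outcomes):
    """Covariance over equally likely outcomes (divisor = number of outcomes)."""
    X = np.array(outcomes, dtype=float)
    D = X - X.mean(axis=0)
    return D.T @ D / len(X)


# First stage: U[i, s] = 1 if PSU i is cluster s; row i holds U_i = (U_i1, ..., U_iN)'.
N = 4
vec_U = []
for perm in permutations(range(N)):
    U = np.zeros((N, N))
    U[np.arange(N), perm] = 1.0
    vec_U.append(U.flatten())          # row-major order stacks U_1, ..., U_N
cov_U = enum_cov(vec_U)
target_U = np.kron(centering(N), centering(N)) / (N - 1)

# Second stage: random permutation of one cluster's (hypothetical) responses.
y_s = np.array([2100.0, 1700.0, 1400.0, 2500.0])
cov_Y = enum_cov([y_s[list(p)] for p in permutations(range(y_s.size))])
target_Y = y_s.var(ddof=1) * centering(y_s.size)   # sigma_s^2 with divisor M_s - 1

print(np.allclose(cov_U, target_U), np.allclose(cov_Y, target_Y))
# prints: True True
```

The survey sampling divisor $M_s - 1$ in $\sigma_s^2$ is what makes the within-cluster covariance take the compact form $\sigma_s^2\mathbf P_{M_s}$.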

2.3. Defining Random Variables of Interest

Model (6) is an expanded version of a mixed model that retains the identity of clusters while accounting for a two-stage RP. Our interest is in predicting a linear combination of these random variables defined by $T = \mathbf g'\mathbf Y_w$, where $\mathbf g$ is non-stochastic. Although many linear combinations are possible, we limit the discussion to linear combinations $T = \mathbf g'\mathbf Y_w$ where

$$\mathbf g' = \mathbf c'\otimes\mathbf 1_M' \qquad (7)$$

and $\mathbf c$ is an $N\times 1$ vector of constants. In particular, we focus on the setting where $\mathbf c = \mathbf e_i$, an $N\times 1$ vector with all elements equal to zero except for element $i$, which has the value one. Of principal interest is the setting where $i \le n$, such that the cluster of interest is realized in the sample. When $w_{sj} = 1/M_s$ for all $s = 1,\ldots,N$ and $j = 1,\ldots,M_s$, the target random variable, $T = \sum_{s=1}^N U_{is}\left(\sum_{j=1}^{M_s} w_{sj}Y_{sj}\right)$, is the mean of PSU $i$; when $w_{sj} = 1$ for all $s = 1,\ldots,N$ and $j = 1,\ldots,M_s$, the target random variable, $T = \sum_{s=1}^N U_{is}\left(\sum_{j=1}^{M_s} w_{sj}Y_{sj}\right)$, is the total of PSU $i$.
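A minimal NumPy sketch of the target $T = \mathbf g'\mathbf Y_w$ with $\mathbf g' = \mathbf c'\otimes\mathbf 1_M'$ and $\mathbf c = \mathbf e_i$. It assumes the cluster sizes of Table 3, hypothetical responses, and a hypothetical realized first stage in which PSU 1 is Central. The within-cluster order is held fixed because it does not affect $T$.

```python
import numpy as np

clusters = [np.array([1600.0, 2000.0]),                   # County
            np.array([2100.0, 1700.0, 1400.0, 2500.0]),   # Central
            np.array([1700.0, 1900.0, 2200.0])]           # Mercy
N = len(clusters)
M = sum(c.size for c in clusters)
cluster_order = [1, 2, 0]    # hypothetical first stage: PSU i is cluster cluster_order[i]


def expanded_Y_w(weights):
    """Stack Y_wi = (U_i1 Y_w1', ..., U_iN Y_wN')' for i = 1, ..., N (length N*M)."""
    blocks = []
    for i in range(N):
        for s in range(N):
            U_is = 1.0 if cluster_order[i] == s else 0.0
            blocks.append(U_is * weights[s] * clusters[s])
    return np.concatenate(blocks)


def target(c, weights):
    g = np.kron(c, np.ones(M))      # g' = c' kron 1_M'
    return g @ expanded_Y_w(weights)


w_mean = [np.full(c.size, 1.0 / c.size) for c in clusters]   # w_sj = 1/M_s
w_total = [np.ones(c.size) for c in clusters]                # w_sj = 1
e_1 = np.eye(N)[0]
T_mean = target(e_1, w_mean)      # mean of PSU 1 (Central): 1925.0
T_total = target(e_1, w_total)    # total of PSU 1 (Central): 7700.0
```

Only the weights change between the two targets; $\mathbf g$ itself is the same, which is why (7) covers both PSU means and PSU totals.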

3. PREDICTING A PSU MEAN USING AN EXPANDED RP MODEL

We apply the basic strategy for developing a predictor given by Scott and Smith (1969), Royall (1976), Bolfarine and Zacks (1992), Valliant et al. (2000), and Stanek and Singer (2004) to the expanded RP model. We assume that the elements in the sample portion of $\mathbf Y_w$ will be observed, and express $T$ as the sum of two parts, one that is a function of the sample and another that is a function of the remaining random variables. Then, requiring the predictor to be a linear function of the sample random variables and to be unbiased, we evaluate the coefficients that minimize the expected value of the MSE, given by $\mathrm{var}_{\xi_1\xi_2}(\hat T - T)$. While in theory an optimal predictor can be expressed following this prediction recipe, in practice the high dimensionality of the vectors resulting from the expansion of random variables may result in singularities that prevent a unique solution (as in Stanek, Singer, and Lencina 2004). In part for this reason, we first explore projections of the expanded random variables into a lower dimensional space that retains the information necessary for an optimal solution, thereby simplifying the problem.

3.1. Partial Collapsing of the Expanded RP Random Variables


Rao and Bellhouse (1978) (see their Theorem 1.1) provide a way of determining whether the optimal linear unbiased predictor of a target random variable, $T = \mathbf g'\mathbf Y_w$, can be obtained as the optimal linear unbiased predictor of $T_p = \mathbf g_p'\mathbf Y_{wp}$ based on $\mathbf Y_{wp} = \mathbf C'\mathbf Y_w$, a vector of random variables that spans a lower dimensional space. We apply this theorem when

$$\mathbf C' = \begin{pmatrix}\bigoplus_{i=1}^N\bigoplus_{s=1}^N\left(\mathbf 1_{m_s}'\;\;\mathbf 0_{M_s-m_s}'\right)\\[4pt] \bigoplus_{i=1}^N\bigoplus_{s=1}^N\left(\mathbf 0_{m_s}'\;\;\mathbf 1_{M_s-m_s}'\right)\end{pmatrix}$$

and $\mathbf g_p' = \mathbf g'\mathbf C\left(\mathbf C'\mathbf C\right)^{-1}$. The additional subscript $p$ denotes the partial collapsing of the expanded random variables. The collapsing sums the SSUs for the sample and for the remainder in each cluster for each PSU, reducing the number of random variables from $NM$ to $2N^2$. Since $\mathbf Y_w = \mathbf C\left(\mathbf C'\mathbf C\right)^{-1}\mathbf Y_{wp} + \mathbf P_C\mathbf Y_w$, where $\mathbf P_C = \mathbf I_{NM} - \mathbf C\left(\mathbf C'\mathbf C\right)^{-1}\mathbf C'$, we can express $\mathbf g'\mathbf Y_w = \mathbf g'\mathbf C\left(\mathbf C'\mathbf C\right)^{-1}\mathbf Y_{wp} + \mathbf g'\mathbf P_C\mathbf Y_w$. Using (7), $\mathbf g_p' = \mathbf 1_2'\otimes\mathbf c'\otimes\mathbf 1_N'$, $\mathbf g'\mathbf P_C\mathbf Y_w = 0$, and $T = \mathbf g_p'\mathbf Y_{wp}$. Let $\hat{\mathbf L}_p$ represent an $nN\times 1$ constant vector, and let $\mathbf Y_{wI}$ be the first $nN$ random variables (corresponding to the sample) in $\mathbf Y_{wp}$. Then, defining $\hat T_p = \hat{\mathbf L}_p'\mathbf Y_{wI}$ as the optimal linear unbiased predictor of $T$ based on $\mathbf Y_{wp}$, and $\hat B_p$ as a linear unbiased predictor of $\mathbf g'\mathbf P_C\mathbf Y_w = 0$, Theorem 1.1 (Rao and Bellhouse 1978) states that $\hat T_p$ will be optimal for $T = \mathbf g'\mathbf Y_w$ if and only if $E_{\xi_1}E_{\xi_2}\left[(\hat T_p - T)\hat B_p\right] = 0$. Expressing $E_{\xi_1}E_{\xi_2}\left[(\hat T_p - T)\hat B_p\right]$ as a function of $E_{\xi_1}E_{\xi_2}(\mathbf Y_w\mathbf Y_w')$ and simplifying terms, we find that $E_{\xi_1}E_{\xi_2}\left[(\hat T_p - T)\hat B_p\right] = 0$ when $w_{sj} = w_s$ for all $j = 1,\ldots,M_s$ (see c06ed54.doc for details). This implies that we can obtain the optimal predictor using the partially collapsed random variables as long as, within each cluster, the weights are equal for all SSUs.
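The projection algebra can be checked numerically. This NumPy sketch assumes the cluster sizes of Table 3 and hypothetical sample sizes $m_s = (1, 2, 2)$ (each $m_s < M_s$, so $\mathbf C'\mathbf C$ is invertible). It builds $\mathbf C'$ block by block, using `numpy.kron` with an identity matrix for the outer direct sum over PSUs, and confirms that $\mathbf g_p' = \mathbf 1_2'\otimes\mathbf c'\otimes\mathbf 1_N'$ and $\mathbf g'\mathbf P_C = \mathbf 0'$.

```python
import numpy as np

M_s = [2, 4, 3]              # cluster sizes, as in Table 3
m_s = [1, 2, 2]              # hypothetical sample sizes (m_s < M_s)
N, M = len(M_s), sum(M_s)


def psu_block(sample_part):
    """N x M matrix: bigoplus_s (1'_{m_s} 0'_{M_s - m_s}) if sample_part,
    otherwise bigoplus_s (0'_{m_s} 1'_{M_s - m_s})."""
    out = np.zeros((N, M))
    start = 0
    for s, (m, size) in enumerate(zip(m_s, M_s)):
        if sample_part:
            out[s, start:start + m] = 1.0
        else:
            out[s, start + m:start + size] = 1.0
        start += size
    return out


# bigoplus_i of identical blocks equals I_N kron block; sample rows sit above remainder rows.
Ct = np.vstack([np.kron(np.eye(N), psu_block(True)),
                np.kron(np.eye(N), psu_block(False))])   # C' is 2N^2 x NM
C = Ct.T
CtC_inv = np.linalg.inv(Ct @ C)

c = np.array([0.0, 1.0, 0.0])          # c = e_2, for example
g = np.kron(c, np.ones(M))             # g' = c' kron 1_M'
g_p = g @ C @ CtC_inv                  # g_p' = g'C(C'C)^{-1}
P_C = np.eye(N * M) - C @ CtC_inv @ Ct

print(np.allclose(g @ P_C, 0.0))
# prints: True
```

Because every column of $\mathbf C'$ contains a single one, $\mathbf g_p'\mathbf C'$ reproduces $\mathbf g'$ exactly, which is the reason $\mathbf g'\mathbf P_C$ vanishes for any $\mathbf c$.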

We assume that $w_{sj} = w_s$ for all $j = 1,\ldots,M_s$ in subsequent developments, and develop the BLUP of $T_p = \mathbf g_p'\mathbf Y_{wp}$ based on $\mathbf Y_{wp}$. The vector $\mathbf Y_{wp}$ contains $2N^2$ random variables. The first $N^2$ random variables are of the form $U_{is}w_sm_s\bar Y_{sI}$, while the remaining $N^2$ random variables are of the form $U_{is}w_s(M_s - m_s)\bar Y_{sII}$, where $\bar Y_{sI} = \frac{1}{m_s}\sum_{j=1}^{m_s}Y_{sj}$ and $\bar Y_{sII} = \frac{1}{M_s - m_s}\sum_{j=m_s+1}^{M_s}Y_{sj}$. Before developing the predictor, we consider whether some additional collapsing of the random variables is possible without loss of information.

It is natural to consider whether it is sufficient to predict $T_p = \mathbf g_p'\mathbf Y_{wp}$ using the $2N$ collapsed random variables defined by $\mathbf Y_w^* = \mathbf C^{*\prime}\mathbf Y_{wp}$, where $\mathbf C^{*\prime} = \mathbf I_{2N}\otimes\mathbf 1_N'$. Note that $T = \mathbf g^{*\prime}\mathbf Y_w^*$ and $\mathbf g^{*\prime} = \mathbf g_p'\mathbf C^*\left(\mathbf C^{*\prime}\mathbf C^*\right)^{-1}$, which simplifies to $\mathbf g^{*\prime} = \mathbf 1_2'\otimes\mathbf c'$. This set of random variables is similar to those used by Stanek and Singer (2004) for a population with equal size clusters and equal size samples per cluster, with no response error.
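A minimal NumPy sketch of this second collapse follows. It assumes hypothetical responses (listed in realized SSU order), sample sizes $m_s = (1, 2, 2)$, weights $w_s = 1/M_s$, and a hypothetical realized first stage. It forms $\mathbf Y_{wp}$ directly from the sample and remainder sums defined above and checks that $\mathbf g^{*\prime} = \mathbf 1_2'\otimes\mathbf c'$ and that $\mathbf g^{*\prime}\mathbf Y_w^* = \mathbf g_p'\mathbf Y_{wp}$.

```python
import numpy as np

clusters = [np.array([1600.0, 2000.0]),
            np.array([2100.0, 1700.0, 1400.0, 2500.0]),
            np.array([1700.0, 1900.0, 2200.0])]   # already in realized SSU order
m_s = [1, 2, 2]                                   # hypothetical sample sizes
N = len(clusters)
cluster_order = [1, 2, 0]                         # PSU i is cluster cluster_order[i]
w_s = [1.0 / c.size for c in clusters]            # w_s = 1/M_s (PSU means)

U = np.zeros((N, N))
U[np.arange(N), cluster_order] = 1.0              # U[i, s] = U_is

# U_is w_s m_s Ybar_sI and U_is w_s (M_s - m_s) Ybar_sII, ordered by (i, s)
sample = [[U[i, s] * w_s[s] * clusters[s][:m_s[s]].sum() for s in range(N)] for i in range(N)]
remain = [[U[i, s] * w_s[s] * clusters[s][m_s[s]:].sum() for s in range(N)] for i in range(N)]
Y_wp = np.concatenate([np.ravel(sample), np.ravel(remain)])   # length 2N^2

C_star = np.kron(np.eye(2 * N), np.ones((N, 1)))   # C* = I_{2N} kron 1_N, so C*' sums over s
Y_w_star = C_star.T @ Y_wp                          # length 2N

c = np.eye(N)[0]                                    # c = e_1
g_p = np.kron(np.ones(2), np.kron(c, np.ones(N)))   # g_p' = 1_2' kron c' kron 1_N'
g_star = g_p @ C_star @ np.linalg.inv(C_star.T @ C_star)
```

Summing over $s$ loses nothing about $T$ itself, since each PSU has only one non-zero cluster term; the loss discussed next concerns the information available for prediction.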

We apply the Rao-Bellhouse theorem to see whether an optimal predictor of $T$ can be based on $\mathbf Y_w^*$. Let us define such a predictor as $\hat T = \hat{\mathbf L}'\mathbf Y_{wI}$, where $\hat{\mathbf L}$ represents an $n\times 1$ constant vector, and $\mathbf Y_{wI}$ represents the first $n$ random variables (corresponding to the sample) in $\mathbf Y_w^*$. For $\hat T$ to be unbiased, we require

$$E_{\xi_1}E_{\xi_2}(\hat T - T) = \frac{1}{N}\sum_{s=1}^N w_s\mu_s\left[\left(\hat{\mathbf L}'\mathbf 1_n - \mathbf c'\mathbf 1_N\right)m_s - \mathbf c'\mathbf 1_N\left(M_s - m_s\right)\right]$$

to be zero. This will be zero only if sampling of clusters is conducted with probability proportional to size (PPS) (see page 8, c07ed29.doc).

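Suppose that sampling is PPS sampling, and notice that $T = \mathbf g^{*\prime}\mathbf Y_w^* + \mathbf g_p'\mathbf P_{C^*}\mathbf Y_{wp}$, since $\mathbf g_p'\mathbf P_{C^*}\mathbf Y_{wp} = 0$. Defining $\hat B = \hat{\mathbf b}'\left(\mathbf I_n\otimes\mathbf P_N\right)\mathbf Y_{wI}$ as a linear unbiased predictor of $\mathbf g_p'\mathbf P_{C^*}\mathbf Y_{wp}$ based on the sample part of $\mathbf P_{C^*}\mathbf Y_{wp}$, given by $\left(\mathbf I_n\otimes\mathbf P_N\right)\mathbf Y_{wI}$, the predictor $\hat T$ will be optimal if and only if $E_{\xi_1}E_{\xi_2}\left[(\hat T - T)\hat B\right] = 0$. Simplifying this expectation, we find that (see page 18 of c07ed29.doc)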

( ) ( )( ) ( )

1 2 1 1

2 2

1

1ˆ ˆ ˆ ˆ1

1 ˆ ˆ

N N

I N n s N s Ns s

N

I N s s s Ns

E T T B f f d dN

f fM w

N

ξ ξ

σ

= =

=

⎡ ⎤⎛ ⎞⎛ ⎞ ⎛ ⎞⎡ ⎤ ⎡ ⎤′ ′ ′− = − ⊗ ⊗ ⊕ ⊕⎜ ⎟ ⎜ ⎟⎢ ⎥⎜ ⎟⎣ ⎦ ⎣ ⎦ − ⎝ ⎠ ⎝ ⎠⎝ ⎠⎣ ⎦−⎡ ⎤⎡ ⎤⎛ ⎞′ ′ ′+ + ⊗ ⊕⎢ ⎥⎜ ⎟⎢ ⎥⎝ ⎠⎣ ⎦⎣ ⎦

L c 1 I P P b

L c 1 P b

where $d_s = M_sw_s\mu_s$, $\mathbf c = (\mathbf c_I'\;\;\mathbf c_{II}')'$, $\mathbf c_I$ is an $n\times 1$ vector, and $f$ denotes the common sampling fraction. This expression is not equal to zero, even when the population consists of equal size clusters with homogeneous variances and equal size samples are taken from sample clusters. The result implies that some efficiency is lost in prediction when $\mathbf Y_{wp}$ is collapsed to $\mathbf Y_w^*$: the predictor based on $\mathbf Y_{wp}$ will have smaller MSE than the predictor based on $\mathbf Y_w^*$, even in the settings considered by Stanek and Singer (2004) with no response error, where clusters are of equal size and equal size samples are selected.

3.2. Predicting Linear Combinations of PSU Parameters Using Collapsed Expanded RP Random Variables

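We partition $\mathbf Y_{wp}$ into the first $nN$ random variables, corresponding to the sample, $\mathbf Y_{wI}$, and the remaining random variables, $\mathbf Y_{wII}$, to predict $T = \mathbf g_I'\mathbf Y_{wI} + \mathbf g_{II}'\mathbf Y_{wII}$, where $\mathbf g_I' = \mathbf c_I'\otimes\mathbf 1_N'$ and $\mathbf g_{II}' = \left(\mathbf c_{II}'\otimes\mathbf 1_N'\;\;\mathbf c'\otimes\mathbf 1_N'\right)$. Explicitly, the partitioned RP model is defined by

$$\begin{pmatrix}\mathbf Y_{wI}\\ \mathbf Y_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf X_I\\ \mathbf X_{II}\end{pmatrix}\boldsymbol\mu + \left[E_{\xi_2}\left(\begin{pmatrix}\mathbf Y_{wI}\\ \mathbf Y_{wII}\end{pmatrix}\middle|\,\xi_1\right) - E_{\xi_1}E_{\xi_2}\begin{pmatrix}\mathbf Y_{wI}\\ \mathbf Y_{wII}\end{pmatrix}\right] + \begin{pmatrix}\mathbf E_{wI}\\ \mathbf E_{wII}\end{pmatrix} \qquad (8)$$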

Requiring the predictor of $T$ to be a linear function of $\mathbf Y_{wI}$, to be unbiased, and to have minimum MSE, the BLUP of $T$ in (8) is

( ) ( ) ( )1 1 1

ˆ ˆ ˆn N N

p i i s s s sI II s s s s sIi s s

N N nT c Y Y c I M w Y c I M f w Yn n= = =

−⎛ ⎞ ⎛ ⎞= − + +⎜ ⎟ ⎜ ⎟

⎝ ⎠ ⎝ ⎠∑ ∑ ∑ (9)

where *

1

ˆN

i is s s s sIs

Y U M w k Y=

= ∑ , 1

1ˆ ˆn

ii

Y Yn =

= ∑ , * **

* 1

111

Ns s

s s sss

k kk k d

d N k=

−⎛ ⎞= − ⎜ ⎟−⎝ ⎠∑ ,

( )2 2

2 2 21s s

ss s se

f dk

f d N v=

+ − (from page 41, c07ed15.doc),

$\bar k = \frac{1}{N}\sum_{s=1}^N k_s$, $\bar c_{II} = \frac{1}{N-n}\sum_{i=n+1}^N c_i$, $\bar c = \frac{1}{N}\sum_{i=1}^N c_i$, and $I_s = \sum_{i=1}^n U_{is}$ is an indicator 'inclusion' random variable for cluster $s$ in the sample (see Appendix A). An expression for the mean squared error (MSE) of the predictor can be developed directly using expressions for the variance, and simplifies to (see c07ed17.doc, p15)

( ) ( ) ( )

( )

1 2

2*2 2 22 2

, 2 21 1 1

2 2

1

1 1ˆvar 2

2

n N Ns se se

p i I kd kd di s ss s

N

i I di

Nck v vT T c c

N n Nf f

Ncc Nc ncn

ξ ξ σ σ

σ

= = =

=

⎛ ⎞ ⎛ ⎞⎛ ⎞− = − − + +⎜ ⎟ ⎜ ⎟⎜ ⎟

⎝ ⎠⎝ ⎠ ⎝ ⎠⎡ ⎤+ + −⎢ ⎥⎣ ⎦

∑ ∑ ∑

∑

where 1

1 n

I ii

c cn =

= ∑ , 1

1 N

d ss

dN

μ=

= ∑ , *

1

1 N

kd s ss

k dN

μ=

= ∑ , ( )22 *

1

11

N

kd s s kds

k dN

σ μ=

= −− ∑ ,

( )22

1

11

N

d s ds

dN

σ μ=

= −− ∑ and ( )( )*

,1

11

N

kd d s s kd s ds

k d dN

σ μ μ=

= − −− ∑ .
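The quantities that enter this MSE expression are simple functions of $k_s^*$ and $d_s$. The sketch below transcribes $k_s$, $k_s^*$ and the summaries $\mu_d$, $\mu_{kd}$, $\sigma_{kd}^2$, $\sigma_d^2$ and $\sigma_{kd,d}$ as written in the text, treating $f_s$, $d_s$ and $v_{se}^2$ as known, as in a BLUP with known parameters. The function names and numerical inputs are hypothetical.

```python
import numpy as np


def shrinkage_constants(f, d, v_se2):
    """k_s and k_s^* as written in the text (f_s, d_s, v_se^2 treated as known)."""
    N = len(d)
    k = f**2 * d**2 / (f**2 * d**2 + (N - 1) * v_se2)
    k_star = k - (1 - k) / (1 - k.mean()) * (k * d).sum() / (N * d)
    return k, k_star


def mse_summaries(k_star, d):
    """mu_d, mu_kd, sigma_kd^2, sigma_d^2 and sigma_{kd,d}, all with divisor N - 1."""
    N = len(d)
    kd = k_star * d
    mu_d, mu_kd = d.mean(), kd.mean()
    sigma_kd2 = ((kd - mu_kd) ** 2).sum() / (N - 1)
    sigma_d2 = ((d - mu_d) ** 2).sum() / (N - 1)
    sigma_kd_d = ((kd - mu_kd) * (d - mu_d)).sum() / (N - 1)
    return mu_d, mu_kd, sigma_kd2, sigma_d2, sigma_kd_d


f = np.array([0.2, 0.2, 0.2, 0.125, 0.2])
d = np.array([3.0, 5.0, 4.0, 6.0, 7.0])           # hypothetical d_s
v_se2 = np.array([0.20, 0.10, 0.05, 0.08, 0.02])  # hypothetical v_se^2
k, k_star = shrinkage_constants(f, d, v_se2)
print(k, k_star, mse_summaries(k_star, d))
```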

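When predicting a PSU mean, $w_s = \frac{1}{M_s}$, and the predictor simplifies to $\hat T_p = \bar Y + \left(\hat Y_i - \bar{\hat Y}\right)$ if $i \le n$, and to $\hat T_p = \bar Y$ when $i > n$, where $\bar Y_i = \sum_{s=1}^{N} U_{is}\bar Y_{sI}$ and $\bar Y = \frac{1}{n}\sum_{i=1}^{n}\bar Y_i$. The MSE of the sample PSU mean predictor (when $i \le n$) simplifies to (see c07ed25.doc, page 3)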

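$$\mathrm{var}_{\xi_1\xi_2}\left(\hat T_p - T\right) = \left(\frac{n-1}{n}\right)\frac{1}{N-1}\sum_{s=1}^{N}\left[\left(1-k_s^*\right)\mu_s - \left(\mu - \frac{1}{N}\sum_{s^*=1}^{N} k_{s^*}^*\mu_{s^*}\right)\right]^2 + \frac{1}{nN}\sum_{s=1}^{N}\left[1 + (n-1)k_s^{*2}\right]\left(1-f_s\right)\frac{\sigma_s^2}{m_s},$$

while the MSE of a PSU not in the sample is given by

$$\mathrm{var}_{\xi_1\xi_2}\left(\hat T_p - T\right) = \left(\frac{n+1}{n}\right)\sigma^2 + \frac{1}{nN}\sum_{s=1}^{N}\left(1-f_s\right)\frac{\sigma_s^2}{m_s}.$$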

4. COMPARISON OF PREDICTORS


The predictor $\hat T_p$ is the best linear unbiased predictor, and hence its MSE is no larger
than that of any other linear unbiased predictor of $T$. We compare the MSE of (9) with those of
the simple mean and of predictors (3) and (4).

When clusters are of equal size and have equal variances, and sample sizes are equal, the MSE of each predictor can be calculated explicitly. In this setting, we also compare the results with the MSE of predictor (5). For the sample mean, the MSE is equal to $\frac{1}{N}\sum_{s=1}^{N}\left(1-f_s\right)\frac{\sigma_s^2}{m_s}$; for (5), the MSE is

$$MSE\left(\hat T_{RP}\right) = (1-f)\frac{\sigma_e^2}{m}\left[\frac{1}{n} + \left(\frac{n-1}{n}\right)k\right].$$

The MSEs of predictors (3) and (4) are given by

$$MSE\left(\hat T_{RP}\right) + \left(\frac{n-1}{n}\right)(c-k)^2\left[\sigma^2 + (1-f)\frac{\sigma_e^2}{m}\right],$$

where $c = \frac{\sigma^2}{\sigma^2 + \sigma_e^2/m}$ for (3) and $c = f + (1-f)\frac{\sigma^2}{\sigma^2 + \sigma_e^2/m}$ for (4) (see Stanek and Singer 2004). Although we have explicit expressions for the MSE of these predictors, the differences among them are complicated functions of the population parameters. Since the shrinkage constants for the expanded predictor depend on the cluster means, we compare the MSE of each predictor relative to that of the expanded finite population mixed model predictor in Figure 1, in four settings, defining the subject intra-class correlation as $\rho = \frac{\sigma^2}{\sigma^2 + \sigma_s^2}$. In each setting, the cluster means are equal to equally spaced quantiles of the respective distributions. The results, expressed as the percent increase in MSE relative to the MSE of (9), illustrate that in all settings considered there is a substantial reduction in MSE (over 40% when $f$ is below 0.2) using (9). This is true even


for (5), illustrating that the results of Stanek and Singer (2004) are not optimal. There
was little difference in the MSE comparisons across different distributions of cluster means.
The results illustrate that predictor (3) has higher MSE than the other predictors
when $f > 0.5$. The MSEs of predictors (4) and (5) are similar, and differ more from the
MSE of (9) when $f$ is small.
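In the equal-size setting, the closed forms above can be evaluated directly. The sketch assumes that predictors (3), (4) and (5) all have the form $\bar Y + c(\bar Y_i - \bar Y)$ for a sampled PSU and that the constant of (5) is $k = \sigma^2/[\sigma^2 + (1-f)\sigma_e^2/m]$; both are our assumptions, since $k$ is defined outside this section. Under them the MSE is a quadratic in $c$ that is minimized at $c = k$, which is where the decomposition "MSE of (5) plus a $(c-k)^2$ term" comes from. Function names and parameter values are hypothetical.

```python
def mse_direct(c, sigma2, v, n):
    """MSE of Ybar + c (Ybar_i - Ybar) for a sampled PSU; v = (1 - f) sigma_e^2 / m."""
    return (1 - c) ** 2 * (n - 1) / n * sigma2 + v / n * (1 + (n - 1) * c ** 2)


def mse_equal_size(sigma2, sigma_e2, m, f, n):
    v = (1 - f) * sigma_e2 / m
    k = sigma2 / (sigma2 + v)                # assumed shrinkage constant of (5)
    mse_rp = v * (1 / n + (n - 1) / n * k)   # MSE of (5)
    c3 = sigma2 / (sigma2 + sigma_e2 / m)    # shrinkage constant of (3)
    c4 = f + (1 - f) * c3                    # shrinkage constant of (4)

    def via_decomposition(c):
        return mse_rp + (n - 1) / n * (c - k) ** 2 * (sigma2 + v)

    results = {"sample mean": v, "(3)": via_decomposition(c3),
               "(4)": via_decomposition(c4), "(5)": mse_rp}
    return results, (k, c3, c4, v)


results, (k, c3, c4, v) = mse_equal_size(sigma2=1.0, sigma_e2=4.0, m=5, f=0.1, n=10)
for name, value in results.items():
    print(f"{name:12s} {value:.4f} ({100 * (value / results['(5)'] - 1):.1f}% above (5))")
```

Setting $c = 1$ in `mse_direct` recovers the sample-mean MSE $(1-f)\sigma_e^2/m$, a quick consistency check on the decomposition.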

Figure 2 summarizes increases in MSE for different intra-class correlations (using

quantiles of a uniform distribution to determine cluster and subject means). The results

illustrate that for low intra-class correlations, the relative increase in MSE can be

dramatic. Once again, for low sampling fractions, similar patterns in MSE are evident for

(3), (4) and (5).

Results in Figure 3 compare the sample mean and predictors (4) and (5) in two settings
where cluster sizes differ. Predictor (3) is not applicable in such settings. These results
are based on simulation studies that repeat a two-stage sampling process from a finite
population, with 5000 trials for each simulation. The MSE is estimated by the average
squared difference between the predictor and the latent PSU value in each case. In the
first column, cluster sizes differ 10-fold, with sample sizes proportional to the cluster
size. The results illustrate the performance of the predictors for different sampling
fractions. The right column in Figure 3 compares the MSE of the predictors when the
sample size per cluster is constant for all clusters.
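The simulation procedure described for Figure 3 can be sketched as a small Monte Carlo harness: draw a random permutation of the clusters, draw a simple random sample without replacement within each sampled cluster, and average the squared difference between a predictor and the latent mean of the target PSU. The population, sample sizes, seed and predictor functions below are hypothetical; with the within-PSU sample mean as predictor, the estimate can be compared with its closed form $\frac{1}{N}\sum_s(1-f_s)\sigma_s^2/m_s$.

```python
import random
from statistics import mean, variance


def simulate_mse(pop, m, n, predictor, trials=5000, seed=1):
    """Monte Carlo MSE for the latent mean of the PSU at position 1 (a sampled PSU)."""
    rng = random.Random(seed)
    N = len(pop)
    total = 0.0
    for _ in range(trials):
        perm = rng.sample(range(N), N)                          # stage 1: random permutation
        samples = [rng.sample(pop[s], m[s]) for s in perm[:n]]  # stage 2: SRSWOR within PSUs
        total += (predictor(samples) - mean(pop[perm[0]])) ** 2
    return total / trials


def own_sample_mean(samples):
    return mean(samples[0])


def mean_of_sample_means(samples):
    return mean(mean(s) for s in samples)


sizes = [5, 10, 20, 50]                     # cluster sizes differing 10-fold
pop = [[float((7 * j + 3 * s) % 11) for j in range(M)] for s, M in enumerate(sizes)]
m = [M // 5 for M in sizes]                 # sample sizes proportional to cluster size
mc = simulate_mse(pop, m, n=3, predictor=own_sample_mean, trials=20000)
exact = sum((1 - m_s / len(p)) * variance(p) / m_s for p, m_s in zip(pop, m)) / len(pop)
print(mc, exact)
print(simulate_mse(pop, m, n=3, predictor=mean_of_sample_means))
```

Passing the predictor as a function keeps the sampling code fixed while different predictors are compared on the same simulated draws (same seed).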

5. DISCUSSION


The expanded finite population mixed model uses a larger set of random variables
than those typically used in super-population models or in the
finite population mixed model of Stanek and Singer (2004). These random variables are fewer
than the 2 random variables that would result from a further expansion that would
retain the identity of subjects and SSUs, and even fewer than in the very general
representation of the model used by Godambe (1955). The results illustrate an
intermediate set of random variables that enables a clear representation of a two-stage
sample, while accounting for different cluster and sample sizes. Other
approaches are conceptually flawed, and appear unable to connect the potentially
observable data to the random variables in the stochastic model. Since more than
one finite population mixed model can be used, we have shown how different models
can be compared by considering them in a hierarchy and identifying whether the
additional set of orthogonal random variables adds to the information about the target
random variable. Note that these results depend on the selection of a target
random variable. For example, if there is interest in the relationship between two
variables among subjects (in a cluster), the collapsed expanded set of random variables is
likely no longer sufficient.

The new model and predictor offer substantial gains over previous predictors.
These gains are likely to be mitigated by the need to estimate shrinkage constants for use
in empirical predictors. Simulation studies comparing the performance of the empirical
predictors (3), (4), and (5) in the equal cluster size/sample size settings indicate some
loss in efficiency, but a similar ordering of MSE (San Martino, Singer, and Stanek 2007).


Limited simulation studies have been conducted using the expanded model predictor,
and these indicate a greater loss of efficiency for (9) than for the other
predictors. Iterative estimation procedures may be possible and are currently being
investigated. This area requires more study.


REFERENCES

Bolfarine, H., and Zacks, S. (1992), Prediction Theory for Finite Populations, New York: Springer-Verlag.

Bryk, A. S., and Raudenbush, S. W. (2003), Hierarchical Linear Models (2nd ed.), New York: Sage Publishing.

Cassel, C. M., Sarndal, C. E., and Wretman, J. H. (1977), Foundations of Inference in Survey Sampling, New York: Wiley.

Cochran, W. (1977), Survey Sampling, New York: Wiley.

Demidenko, E. (2004), Mixed Models: Theory and Applications, New York: Wiley.

Deville, J. C., and Sarndal, C. E. (1992), “Calibration Estimation in Survey Sampling,” Journal of the American Statistical Association, 87, 376-382.

Diggle, P. J., Heagerty, P., Liang, K. Y., and Zeger, S. (2002), Analysis of Longitudinal Data, Oxford: Oxford University Press.


Ghosh, M., and Lahiri, P. (1987), “Robust Empirical Bayes Estimation of Means from Stratified Samples,” Journal of the American Statistical Association, 82, 1153-1162.

Goldberger, A. S. (1962), “Best Linear Unbiased Prediction in the Generalized Linear Regression Model,” Journal of the American Statistical Association, 57, 369-375.

Graybill, F. A. (1983), Matrices with Applications in Statistics, Belmont, California: Wadsworth International.

Henderson, C. R. (1984), Applications of Linear Models in Animal Breeding, Guelph, Canada: University of Guelph.

Henderson, C. R., Kempthorne, O., Searle, S. R., and von Krosigk, C. M. (1959), “The Estimation of Environmental and Genetic Trends from Records Subject to Culling,” Biometrics, 15, 192-218.

Hinkelmann, K., and Kempthorne, O. (1994), Design and Analysis of Experiments, Vol. 1, Introduction to Experimental Designs, New York: Wiley.

Li, W. (2003), “Use of Random Permutation Model in Rate Standardization and Calibration,” unpublished doctoral thesis, University of Massachusetts, Massachusetts.


McCulloch, C. E., and Searle, S. R. (2001), Generalized, Linear, and Mixed Models, New York: John Wiley and Sons.

McLean, R. A., Sanders, W. L., and Stroup, W. W. (1991), “A Unified Approach to Mixed Linear Models,” The American Statistician, 45(1), 54-64.

Ockene, I. S., Hebert, J. R., Ockene, J. K., Saperia, G. M., Nicolosi, R., Merriam, P. A., and Hurley, T. G. (1999), “Effect of Physician-delivered Nutrition Counseling Training and an Office-support Program on Saturated Fat Intake, Weight, and Serum Lipid Measurements in a Hyperlipidemic Population: Worcester Area Trial for Counseling in Hyperlipidemia (WATCH),” Archives of Internal Medicine, 159(7) (April 12), 725-731.

Rao, J. N. K. (2003), Small Area Estimation, New York: Wiley.

Robinson, G. K. (1991), “That BLUP is a Good Thing: The Estimation of Random Effects,” Statistical Science, 6(1), 15-51.

Royall, R. M. (1976), “The Linear Least-squares Prediction Approach to Two-stage Sampling,” Journal of the American Statistical Association, 71, 657-664.


Sarndal, C.-E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, New York: Springer-Verlag.

Scott, A., and Smith, T. M. F. (1969), “Estimation in Multi-stage Surveys,” Journal of the American Statistical Association, 64(327), 830-840.

Searle, S. R., Casella, G., and McCulloch, C. E. (1992), Variance Components, New York: Wiley.

Stanek, E. J. III, and Singer, J. M. (2004), “Predicting Random Effects from Finite Population Clustered Samples with Response Error,” Journal of the American Statistical Association, 99, 119-130.

Valliant, R., Dorfman, A. H., and Royall, R. M. (2000), Finite Population Sampling and Inference, New York: Wiley.

Verbeke, G., and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data, New York: Springer-Verlag.


APPENDIX A.

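We partition $\mathbf{Y}_{wp}$ into the first $nN$ random variables, corresponding to the sample, $\mathbf{Y}_{wI}$, and the remaining random variables, $\mathbf{Y}_{wII}$, to predict $T = \mathbf{g}_I'\mathbf{Y}_{wI} + \mathbf{g}_{II}'\mathbf{Y}_{wII}$, where $\mathbf{g}_I' = \mathbf{c}_I' \otimes \mathbf{1}_N'$ and $\mathbf{g}_{II}' = \left(\mathbf{c}_{II}' \otimes \mathbf{1}_N' \;\; \mathbf{c}' \otimes \mathbf{1}_N'\right)$. Explicitly, the partitioned RP model is defined by

$$\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{X}_{I}\\ \mathbf{X}_{II}\end{pmatrix}\boldsymbol{\mu} + \left[E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_1}E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix}\right] + \begin{pmatrix}\mathbf{E}_{wI}\\ \mathbf{E}_{wII}\end{pmatrix}$$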

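where $\mathbf{X}_I = \mathbf{1}_n \otimes \left[\frac{1}{N}\left(\bigoplus_{s=1}^{N} w_s m_s\right)\right]$ and

$$\mathbf{X}_{II} = \begin{pmatrix}\mathbf{1}_{N-n}\otimes\left[\frac{1}{N}\left(\bigoplus_{s=1}^{N} w_s m_s\right)\right]\\ \mathbf{1}_{N}\otimes\left[\frac{1}{N}\left(\bigoplus_{s=1}^{N} w_s\left(M_s - m_s\right)\right)\right]\end{pmatrix},$$

random effects are given by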

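$$E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_1}E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\left(\mathbf{I}_n\otimes\bigoplus_{s=1}^{N} f_s d_s\right)\mathrm{vec}\left(\mathbf{U}_I - E_{\xi_1}\mathbf{U}_I\right)\\ \left(\mathbf{I}_{N-n}\otimes\bigoplus_{s=1}^{N} f_s d_s\right)\mathrm{vec}\left(\mathbf{U}_{II} - E_{\xi_1}\mathbf{U}_{II}\right)\\ \left(\mathbf{I}_N\otimes\bigoplus_{s=1}^{N}\left(1-f_s\right) d_s\right)\mathrm{vec}\left(\mathbf{U} - E_{\xi_1}\mathbf{U}\right)\end{pmatrix},$$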

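where $\begin{pmatrix}\mathbf{E}_{wI}\\ \mathbf{E}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix} - E_{\xi_2}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix}$, $\mathbf{U} = \begin{pmatrix}\mathbf{U}_I & \mathbf{U}_{II}\end{pmatrix}$, $\mathbf{U}_I = \left(\mathbf{U}_{(i)}\right)_{i=1,\dots,n} = \begin{pmatrix}\mathbf{U}_1 & \mathbf{U}_2 & \cdots & \mathbf{U}_n\end{pmatrix}$ and $\mathbf{U}_{II} = \left(\mathbf{U}_{(i)}\right)_{i=n+1,\dots,N} = \begin{pmatrix}\mathbf{U}_{n+1} & \mathbf{U}_{n+2} & \cdots & \mathbf{U}_N\end{pmatrix}$. The variance is given by (see page 5 of c07ed27.doc)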

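$$\mathrm{var}\left(\mathbf{Y}_{wp}\right) = \frac{1}{N-1}\begin{pmatrix}
\mathbf{P}_N\otimes\left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N} f_s d_s\right)\right] & \mathbf{P}_N\otimes\left[\left(\bigoplus_{s=1}^{N} f_s d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N}\left(1-f_s\right) d_s\right)\right]\\
\mathbf{P}_N\otimes\left[\left(\bigoplus_{s=1}^{N}\left(1-f_s\right) d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N} f_s d_s\right)\right] & \mathbf{P}_N\otimes\left[\left(\bigoplus_{s=1}^{N}\left(1-f_s\right) d_s\right)\mathbf{P}_N\left(\bigoplus_{s=1}^{N}\left(1-f_s\right) d_s\right)\right]
\end{pmatrix} + \begin{pmatrix}\mathbf{I}_N & -\mathbf{I}_N\\ -\mathbf{I}_N & \mathbf{I}_N\end{pmatrix}\otimes\left(\bigoplus_{s=1}^{N} v_{se}^2\right),$$

which we partition such that $\mathrm{var}_{\xi_1\xi_2}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix} = \begin{pmatrix}\mathbf{V}_I & \mathbf{V}_{I,II}\\ \mathbf{V}_{I,II}' & \mathbf{V}_{II}\end{pmatrix}$, where $v_{se}^2 = \frac{M_s w_s^2}{N} f_s\left(1-f_s\right)\sigma_s^2$ (from page 41, c07ed15.doc).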
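The first term of $\mathrm{var}(\mathbf{Y}_{wp})$ rests on the permutation covariance $\mathrm{var}[\mathrm{vec}(\mathbf{U})] = \frac{1}{N-1}\mathbf{P}_N\otimes\mathbf{P}_N$, reading $\mathbf{P}_N$ as the centring matrix $\mathbf{I}_N - \frac{1}{N}\mathbf{J}_N$ (our assumption about the notation), together with the mixed-product rule $(\mathbf{I}_N\otimes\mathbf{D})(\mathbf{P}_N\otimes\mathbf{P}_N)(\mathbf{I}_N\otimes\mathbf{D}) = \mathbf{P}_N\otimes\mathbf{D}\mathbf{P}_N\mathbf{D}$. A short enumeration over all $N!$ equally likely permutations confirms both numerically; the diagonal matrix `D` stands in for $\bigoplus_s f_s d_s$ with hypothetical values.

```python
import itertools

import numpy as np


def cov_vec_u(N):
    """Population covariance of vec(U) over all N! equally likely permutations."""
    rows = []
    for perm in itertools.permutations(range(N)):
        U = np.zeros((N, N))          # rows: clusters s, columns: positions i
        for i, s in enumerate(perm):
            U[s, i] = 1.0
        rows.append(U.flatten(order="F"))  # vec() stacks columns: position outer, cluster inner
    return np.cov(np.array(rows), rowvar=False, bias=True)


N = 4
P = np.eye(N) - np.ones((N, N)) / N
V_U = cov_vec_u(N)
D = np.diag([0.5, 1.0, 2.0, 3.0])     # hypothetical f_s d_s values
V_D = np.kron(np.eye(N), D) @ V_U @ np.kron(np.eye(N), D)
print(np.abs(V_U - np.kron(P, P) / (N - 1)).max())
print(np.abs(V_D - np.kron(P, D @ P @ D) / (N - 1)).max())
```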


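We develop an expression for the best linear unbiased predictor of $T$ that is a linear function of the sample, $\mathbf{L}'\mathbf{Y}_{wI}$. Since $\mathbf{L}'\mathbf{Y}_{wI} - T = \begin{pmatrix}\mathbf{L}' - \mathbf{g}_I' & -\mathbf{g}_{II}'\end{pmatrix}\begin{pmatrix}\mathbf{Y}_{wI}\\ \mathbf{Y}_{wII}\end{pmatrix}$, the unbiasedness constraint is $\left(\mathbf{L}' - \mathbf{g}_I'\right)\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II} = \mathbf{0}$. Minimizing $\mathrm{var}_{\xi_1\xi_2}\left(\mathbf{L}'\mathbf{Y}_{wI} - T\right)$ subject to the unbiasedness constraint, using Lagrange multipliers, results in the familiar solution

$$\hat{\mathbf{L}} = \mathbf{g}_I + \left[\mathbf{V}_I^{-1} - \mathbf{V}_I^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I\right)^{-1}\mathbf{X}_I'\mathbf{V}_I^{-1}\right]\mathbf{V}_{I,II}\mathbf{g}_{II} + \mathbf{V}_I^{-1}\mathbf{X}_I\left(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I\right)^{-1}\mathbf{X}_{II}'\mathbf{g}_{II}.$$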
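A direct numerical version of this solution is a convenient check on the simplification that follows. The sketch uses generic random inputs (hypothetical, not the RP structure) and verifies two properties: the unbiasedness constraint holds, and adding any direction $\mathbf{z}$ with $\mathbf{z}'\mathbf{X}_I = \mathbf{0}$, which keeps the predictor unbiased, cannot reduce the prediction-error variance.

```python
import numpy as np


def blup_weights(g_I, g_II, V_I, V_I_II, X_I, X_II):
    """L-hat from the Lagrange-multiplier solution (generic inputs)."""
    Vi = np.linalg.inv(V_I)
    A_inv = np.linalg.inv(X_I.T @ Vi @ X_I)
    Q = Vi - Vi @ X_I @ A_inv @ X_I.T @ Vi
    return g_I + Q @ V_I_II @ g_II + Vi @ X_I @ A_inv @ X_II.T @ g_II


def error_variance(L, g_I, g_II, V):
    """var(L'Y_I - T) with T = g_I'Y_I + g_II'Y_II and V = var((Y_I, Y_II))."""
    a = np.concatenate([L - g_I, -g_II])
    return a @ V @ a


rng = np.random.default_rng(0)
p1, p2, q = 6, 4, 2
B = rng.normal(size=(p1 + p2, p1 + p2))
V = B @ B.T + (p1 + p2) * np.eye(p1 + p2)  # symmetric positive definite
V_I, V_I_II = V[:p1, :p1], V[:p1, p1:]
X_I, X_II = rng.normal(size=(p1, q)), rng.normal(size=(p2, q))
g_I, g_II = rng.normal(size=p1), rng.normal(size=p2)

L = blup_weights(g_I, g_II, V_I, V_I_II, X_I, X_II)
print((L - g_I) @ X_I - g_II @ X_II)       # unbiasedness constraint, close to zero
```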

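This result simplifies to (see c06ed56.doc, p42 and c07ed01.doc p1)

$$\hat{\mathbf{L}} = \left(\mathbf{P}_n\mathbf{c}_I\right)\otimes\left[\left(\bigoplus_{s=1}^{N}\frac{k_s^*}{f_s}\right)\mathbf{1}_N\right] + \bar c\left(\frac{N}{n}\right)\left[\mathbf{1}_n\otimes\left(\bigoplus_{s=1}^{N}\frac{1}{f_s}\right)\mathbf{1}_N\right] + \bar c_{II}\left(\frac{N-n}{n}\right)\left(\mathbf{1}_n\otimes\mathbf{1}_N\right)$$

where

$$k_s^* = k_s - \left(\frac{1-k_s}{1-\bar k}\right)\frac{1}{N d_s}\sum_{s^*=1}^{N} k_{s^*} d_{s^*},$$

$k_s = \frac{f_s^2 d_s^2}{f_s^2 d_s^2 + (N-1) v_{se}^2}$ (from page 41, c07ed15.doc), $\bar k = \frac{1}{N}\sum_{s=1}^{N} k_s$, $\bar c_{II} = \frac{1}{N-n}\sum_{i=n+1}^{N} c_i$ and $\bar c = \frac{1}{N}\sum_{i=1}^{N} c_i$. The predictor $\hat T_p = \hat{\mathbf{L}}'\mathbf{Y}_{wI}$ can be expressed as (see page 6 of c07ed01.doc)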

$$\hat T_p = \sum_{i=1}^{n} c_i\left(\hat Y_i - \bar{\hat Y}\right) + \bar c\sum_{s=1}^{N}\left(\frac{N}{n}\right) I_s M_s w_s \bar Y_{sI} + \bar c_{II}\sum_{s=1}^{N}\left(\frac{N-n}{n}\right) I_s M_s f_s w_s \bar Y_{sI}$$

where $\hat Y_i = \sum_{s=1}^{N} U_{is} M_s w_s k_s^* \bar Y_{sI}$, $\bar{\hat Y} = \frac{1}{n}\sum_{i=1}^{n}\hat Y_i$, and $I_s = \sum_{i=1}^{n} U_{is}$ is an indicator ('inclusion') random variable for cluster $s$ in the sample.
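The vector form of $\hat{\mathbf{L}}$ and the summation form of the predictor can be checked against each other numerically. The sketch assumes, consistent with $\mathbf{X}_I$, that element $(i,s)$ of $\mathbf{Y}_{wI}$ is $U_{is}M_sf_sw_s\bar Y_{sI}$ (position $i$ outer, cluster $s$ inner) and that $\mathbf{P}_n = \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n$; the weights and $k_s^*$ are arbitrary placeholders rather than values from the paper. The last lines also check the PSU-mean special case ($w_s = 1/M_s$, target at a sampled position), where the predictor reduces to $\bar Y + (\hat Y_i - \bar{\hat Y})$.

```python
import numpy as np


def l_hat(c, k_star, f, n):
    """Kronecker form of L-hat (P_n taken to be the n x n centring matrix)."""
    N = len(c)
    P_n = np.eye(n) - np.ones((n, n)) / n
    c_bar_II = c[n:].mean() if n < N else 0.0
    return (np.kron(P_n @ c[:n], k_star / f)
            + c.mean() * (N / n) * np.kron(np.ones(n), 1.0 / f)
            + c_bar_II * ((N - n) / n) * np.ones(n * N))


def t_hat_sum(c, k_star, f, M, w, ybar_I, perm, n):
    """Summation form of the predictor; perm[i] is the cluster at position i."""
    N = len(c)
    sampled = perm[:n]
    y_hat = np.array([M[s] * w[s] * k_star[s] * ybar_I[s] for s in sampled])
    c_bar_II = c[n:].mean() if n < N else 0.0
    return (np.sum(c[:n] * (y_hat - y_hat.mean()))
            + c.mean() * (N / n) * sum(M[s] * w[s] * ybar_I[s] for s in sampled)
            + c_bar_II * ((N - n) / n) * sum(M[s] * f[s] * w[s] * ybar_I[s] for s in sampled))


N, n = 5, 3
perm = [3, 0, 4, 1, 2]
M = np.array([10.0, 20.0, 40.0, 80.0, 100.0])
f = np.array([0.2, 0.2, 0.2, 0.125, 0.2])
w = np.array([0.5, 1.0, 0.25, 2.0, 1.5])       # arbitrary weights
k_star = np.array([0.4, -0.1, 0.6, 0.3, 0.8])  # arbitrary constants
ybar_I = np.array([3.2, 4.6, 4.1, 6.3, 6.8])
c = np.array([1.0, -0.5, 2.0, 0.3, 0.7])

U = np.zeros((N, N))
for i, s in enumerate(perm):
    U[s, i] = 1.0
Y_wI = (U.T[:n] * (M * f * w * ybar_I)).ravel()  # element (i, s) = U_is M_s f_s w_s Ybar_sI
print(l_hat(c, k_star, f, n) @ Y_wI, t_hat_sum(c, k_star, f, M, w, ybar_I, perm, n))

# PSU-mean special case (w_s = 1/M_s), target at sampled position 1:
e1 = np.eye(N)[0]
s_idx = np.array(perm[:n])
special = (ybar_I[s_idx].mean() + k_star[perm[0]] * ybar_I[perm[0]]
           - (k_star[s_idx] * ybar_I[s_idx]).mean())
print(t_hat_sum(e1, k_star, f, M, 1.0 / M, ybar_I, perm, n), special)
```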


An expression for the mean squared error (MSE) of the predictor can be developed directly from the variance expressions, and simplifies to (see c07ed17.doc, p15)

$$\mathrm{var}_{\xi_1\xi_2}\left(\hat T_p - T\right) = \sum_{i=1}^{n}\left(c_i - \bar c_I\right)^2\left(\sigma_{kd}^2 - 2\sigma_{kd,d} + \sum_{s=1}^{N}\frac{k_s^{*2} v_{se}^2}{f_s^2}\right) + \frac{(N\bar c)^2}{n}\sum_{s=1}^{N}\frac{v_{se}^2}{f_s^2} + \left[\sum_{i=1}^{N} c_i^2 + \frac{(N\bar c)^2}{n} - 2N\bar c\,\bar c_I\right]\sigma_d^2$$

where $\bar c_I = \frac{1}{n}\sum_{i=1}^{n} c_i$, $\mu_d = \frac{1}{N}\sum_{s=1}^{N} d_s$, $\mu_{kd} = \frac{1}{N}\sum_{s=1}^{N} k_s^* d_s$, $\sigma_{kd}^2 = \frac{1}{N-1}\sum_{s=1}^{N}\left(k_s^* d_s - \mu_{kd}\right)^2$, $\sigma_d^2 = \frac{1}{N-1}\sum_{s=1}^{N}\left(d_s - \mu_d\right)^2$ and $\sigma_{kd,d} = \frac{1}{N-1}\sum_{s=1}^{N}\left(k_s^* d_s - \mu_{kd}\right)\left(d_s - \mu_d\right)$.
