
Prediction Theory

1 Introduction

Best Linear Unbiased Prediction (BLUP) was developed for animal breeding by Dr. Charles Roy Henderson around 1949. The methods were first applied to genetic evaluation of dairy sires in the northeastern United States in 1970. BLUP is used in nearly all countries and all livestock species for genetic evaluation of individual animals.

DEFINITION: Prediction is the estimation of the realized value of a random variable (from data) that has been sampled from a population with a known variance-covariance structure.

2 General Linear Mixed Model

2.1 Equation of the Model

y = Xb + Zu + e

where

y is an N × 1 vector of observations,

b is a p× 1 vector of unknown constants,

u is a q × 1 vector of unknown effects of random variables,

e is an N × 1 vector of unknown residual effects,

X, Z are known matrices of order N × p and N × q respectively, that relate elements of b and u to elements of y.

The elements of b are considered to be fixed effects while the elements of u are the random effects from populations of random effects with known variance-covariance structures. Both b and u may be partitioned into one or more factors depending on the situation.

2.2 Expectations and Covariance Structure

The expectations of the random variables are

E(u) = 0
E(e) = 0
E(y) = E(Xb + Zu + e)
     = E(Xb) + E(Zu) + E(e)
     = XE(b) + ZE(u) + E(e)
     = Xb + Z0 + 0
     = Xb

and the variance-covariance structure is typically represented as

\[ \mathrm{Var}\begin{pmatrix} u \\ e \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & R \end{pmatrix}, \]

where G and R are known, positive definite matrices. Consequently,

Var(y) = Var(Xb + Zu + e)
       = Var(Zu + e)
       = ZVar(u)Z′ + Var(e) + ZCov(u, e) + Cov(e, u)Z′
       = ZGZ′ + R, and

Cov(y, u) = ZG
Cov(y, e) = R

If u is partitioned into s factors as

\[ u' = \begin{pmatrix} u'_1 & u'_2 & \cdots & u'_s \end{pmatrix}, \]

then

\[ \mathrm{Var}(u) = \mathrm{Var}\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_s \end{pmatrix} = \begin{pmatrix} G_{11} & G_{12} & \cdots & G_{1s} \\ G'_{12} & G_{22} & \cdots & G_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ G'_{1s} & G'_{2s} & \cdots & G_{ss} \end{pmatrix}. \]

Each Gij is assumed to be known.
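As a quick numerical check of these moments, the R sketch below (toy X, Z, G, R and variances chosen only for illustration; none of these values come from the notes) simulates repeated samples of y = Xb + Zu + e and compares the empirical mean and covariance matrix of y with Xb and ZGZ′ + R.

library(MASS)   # mvrnorm() for multivariate normal draws
set.seed(1)

# Toy model: N = 4 records, one fixed effect (a mean), q = 2 random effects
X = matrix(1, 4, 1)
Z = matrix(c(1,0, 1,0, 0,1, 0,1), 4, 2, byrow = TRUE)
b = 10
G = diag(2) * 4      # Var(u), assumed known
R = diag(4) * 9      # Var(e), assumed known

# Simulate many realizations of y = Xb + Zu + e
nrep = 20000
ysim = replicate(nrep, {
  u = mvrnorm(1, rep(0, 2), G)
  e = mvrnorm(1, rep(0, 4), R)
  X %*% b + Z %*% u + e
})
ysim = matrix(ysim, nrow = 4)

rowMeans(ysim)              # close to E(y) = Xb = (10, 10, 10, 10)
cov(t(ysim))                # close to Var(y) = ZGZ' + R
Z %*% G %*% t(Z) + R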

3 Predictors

The problem is to predict the function

K′b + M′u,

provided that K′b is an estimable function.


3.1 Best Predictor

The best predictor, for any type of model, requires knowledge of the distribution of the random variables as well as the moments of that distribution. Then, the best predictor is the conditional mean of the function to be predicted, given the data vector, i.e.

E(K′b + M′u|y)

which is unbiased and has the smallest mean squared error of all predictors (Cochran 1951). The computational form of the predictor depends on the distribution of y, and could be linear or nonlinear. The word best means that the predictor has the smallest mean squared error of all predictors of K′b + M′u.

3.2 Best Linear Predictor

The best predictor may be linear OR nonlinear. Nonlinear predictors are often difficult to manipulate, and a feasible solution may be hard to derive. The predictor could instead be restricted to the class of linear functions of y. Then the distributional form of y does not need to be known, and only the first (mean) and second (variance) moments of y must be known. If the first moment is Xb and the second moment is Var(y) = V, then the best linear predictor of K′b + M′u is

K′b + C′V−1(y − Xb)

where

C′ = Cov(K′b + M′u, y).

When y has a multivariate normal distribution, the best linear predictor (BLP) is the same as the best predictor (BP). The BLP has the smallest mean squared error of all linear predictors of K′b + M′u.

3.3 Best Linear Unbiased Predictor

In general, the first moment of y, namely Xb, is not known, but V, the second moment, is commonly assumed to be known. Then predictors can be restricted further to those that are linear and also unbiased. The best linear unbiased predictor is

K′b̂ + C′V−1(y − Xb̂)

where

b̂ = (X′V−1X)−X′V−1y,

and C and V are as before.

This predictor is the same as the BLP except that b̂ has replaced b in the formula. Note that b̂ is the GLS estimate of b. Of all linear, unbiased predictors, BLUP has the smallest mean squared error. However, if y is not normally distributed, then nonlinear predictors of K′b + M′u could potentially exist that have smaller mean squared error than BLUP.
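A minimal R sketch of these two formulas (GLS of b and, for K′ = 0 and M′ = I, the BLUP of u), using a small made-up data set; the matrices, variances, and data values are assumptions for illustration only, not values from the notes.

library(MASS)   # ginv() for a generalized inverse

# Illustrative toy system
X = matrix(1, 6, 1)                         # overall mean only
Z = kronecker(diag(3), matrix(1, 2, 1))     # 3 random effects, 2 records each
G = diag(3) * 0.5
R = diag(6) * 2.0
y = c(5.1, 4.6, 6.0, 6.4, 4.2, 4.8)

V  = Z %*% G %*% t(Z) + R
Vi = solve(V)

# GLS estimate of b
bhat = ginv(t(X) %*% Vi %*% X) %*% t(X) %*% Vi %*% y

# BLUP of u (K' = 0, M' = I):  G Z' V^{-1} (y - X bhat)
uhat = G %*% t(Z) %*% Vi %*% (y - X %*% bhat)

bhat
uhat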


4 Derivation of BLUP

4.1 Predictand and Predictor

DEFINITION: The predictand is the function to be predicted, in this case

K′b + M′u.

DEFINITION: The predictor is the function to predict the predictand, a linear function of y, i.e. L′y, for some L.

4.2 Requiring Unbiasedness

Equate the expectations of the predictor and the predictand to determine what needs to be true in order for unbiasedness to hold. That is,

E(L′y) = L′Xb

E(K′b + M′u) = K′b

then to be unbiased for all possible vectors b,

L′X = K′

or L′X − K′ = 0.

4.3 Variance of Prediction Error

The prediction error is the difference between the predictor and the predictand. The covariance matrix of the prediction errors is

Var(K′b + M′u − L′y) = Var(M′u − L′y)
  = M′Var(u)M + L′Var(y)L − M′Cov(u, y)L − L′Cov(y, u)M
  = M′GM + L′VL − M′GZ′L − L′ZGM
  = Var(PE)


4.4 Function to be Minimized

Because the predictor is required to be unbiased, the mean squared error is equivalent to the variance of prediction error. Combine the variance of prediction error with a Lagrange multiplier to force unbiasedness, to obtain the matrix F, where

F = Var(PE) + (L′X − K′)Φ.

Minimization of the diagonals of F is achieved by differentiating F with respect to the unknowns, L and Φ, and equating the partial derivatives to null matrices.

∂F/∂L = 2VL − 2ZGM + XΦ = 0

∂F/∂Φ = X′L − K = 0

Let θ = .5Φ, then the first derivative can be written as

VL = ZGM−Xθ

then solve for L as

V−1VL = L = V−1ZGM − V−1Xθ.

Substituting the above for L into the second derivative, then we can solve for θ as

X′L−K = 0

X′(V−1ZGM−V−1Xθ)−K = 0

X′V−1Xθ = X′V−1ZGM−K

θ = (X′V−1X)−(X′V−1ZGM−K)

Substituting this solution for θ into the equation for L gives

L′ = M′GZ′V−1 + K′(X′V−1X)−X′V−1 − M′GZ′V−1X(X′V−1X)−X′V−1.

Let

b̂ = (X′V−1X)−X′V−1y,

then the predictor becomes

L′y = K′b̂ + M′GZ′V−1(y − Xb̂)

which is the BLUP of K′b + M′u, and b̂ is a GLS solution for b. A special case of this predictor would be to let K′ = 0 and M′ = I; then the predictand is K′b + M′u = u, and

L′y = û = GZ′V−1(y − Xb̂).

Hence the predictor of K′b + M′u is (K′ M′) times the predictor of (b′ u′)′, which is (b̂′ û′)′.


5 Variances of Predictors

Let

P = (X′V−1X)−
b̂ = PX′V−1y

then

û = GZ′V−1(y − XPX′V−1y) = GZ′V−1Wy

for W = (I − XPX′V−1). From the results on generalized inverses of X,

XPX′V−1X = X,

and therefore,

WX = (I − XPX′V−1)X
   = X − XPX′V−1X
   = X − X = 0.

The variance of the predictor is,

Var(û) = GZ′V−1W(Var(y))W′V−1ZG
       = GZ′V−1WVW′V−1ZG
       = GZ′V−1ZG − GZ′V−1XPX′V−1ZG.

The covariance between b̂ and û is

Cov(b̂, û) = PX′V−1Var(y)W′V−1ZG
          = PX′W′V−1ZG
          = 0, because X′W′ = (WX)′ = 0.

Therefore, the total variance of the predictor is

Var(K′b̂ + M′û) = K′PK + M′GZ′V−1ZGM − M′GZ′V−1XPX′V−1ZGM.

6 Variance of Prediction Error

The main results are

Var(b̂ − b) = Var(b̂) + Var(b) − Cov(b̂, b) − Cov(b, b̂)
           = Var(b̂)
           = P.

Var(û − u) = Var(û) + Var(u) − Cov(û, u) − Cov(u, û),

where

Cov(û, u) = GZ′V−1WCov(y, u)
          = GZ′V−1WZG
          = GZ′(V−1 − V−1XPX′V−1)ZG
          = Var(û)

so that

Var(û − u) = Var(û) + G − 2Var(û)
           = G − Var(û).

Also,

Cov(b̂, û − u) = Cov(b̂, û) − Cov(b̂, u)
              = 0 − PX′V−1ZG.

7 Mixed Model Equations

The covariance matrix of y is V, which is of order N. N is usually too large to allow V to be inverted. The BLUP predictor has the inverse of V in its formula, and therefore would not be practical when N is large. Henderson (1949) developed the mixed model equations for computing BLUP of u and the GLS of b. However, Henderson did not publish a proof of these properties until 1963, with the help of S. R. Searle, which was one year after Goldberger (1962).

Take the first and second partial derivatives of F,

\[ \begin{pmatrix} V & X \\ X' & 0 \end{pmatrix} \begin{pmatrix} L \\ \theta \end{pmatrix} = \begin{pmatrix} ZGM \\ K \end{pmatrix} \]

Recall that V = Var(y) = ZGZ′ + R, and let

S = G(Z′L − M)

which when re-arranged gives

M = Z′L − G−1S,

then the previous equations can be re-written as

\[ \begin{pmatrix} R & X & Z \\ X' & 0 & 0 \\ Z' & 0 & -G^{-1} \end{pmatrix} \begin{pmatrix} L \\ \theta \\ S \end{pmatrix} = \begin{pmatrix} 0 \\ K \\ M \end{pmatrix}. \]

Take the first row of these equations and solve for L, then substitute the solution for L into the other two equations.

L = −R−1Xθ −R−1ZS


and

\[ -\begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} \begin{pmatrix} \theta \\ S \end{pmatrix} = \begin{pmatrix} K \\ M \end{pmatrix}. \]

Let a solution to these equations be obtained by computing a generalized inverse of

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} \]

denoted as

\[ \begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix}, \]

then the solutions are

\[ \begin{pmatrix} \theta \\ S \end{pmatrix} = -\begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix} \begin{pmatrix} K \\ M \end{pmatrix}. \]

Therefore, the predictor is

\[ L'y = \begin{pmatrix} K' & M' \end{pmatrix} \begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix} \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix} = \begin{pmatrix} K' & M' \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix}, \]

where b̂ and û are solutions to

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}. \]

The equations are known as Henderson's Mixed Model Equations or MME. The equations are of order equal to the number of elements in b and u, which is usually much less than the number of elements in y, and therefore are more practical to solve. Also, these equations require the inverse of R rather than V; both are of the same order, but R is usually diagonal or has a simpler structure than V. Also, the inverse of G is needed, which is of order equal to the number of elements in u. The ability to compute the inverse of G depends on the model and the definition of u.

The MME are a useful computing algorithm for obtaining BLUP of K′b + M′u. Please keep in mind that BLUP is a statistical procedure such that, if the conditions for BLUP are met, the predictor has the smallest mean squared error of all linear, unbiased predictors. The conditions are that the model is the true model and the variance-covariance matrices of the random variables are known without error.

In the strictest sense, all models approximate an unknown true model, and the variance-covariance parameters are usually guessed, so that there is never a true BLUP analysis of data, except possibly in simulation studies.
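As a numerical illustration of why the MME are used, the R sketch below (same style of toy matrices as earlier; all values are assumptions, not from the notes) builds Henderson's MME and confirms that the solutions equal the V-based GLS and BLUP formulas, without ever inverting V in the MME route.

library(MASS)   # ginv()

# Toy data: 6 records, 1 fixed effect, 3 random effects (values assumed)
X = matrix(1, 6, 1)
Z = kronecker(diag(3), matrix(1, 2, 1))
G = diag(3) * 0.5;  GI = solve(G)
R = diag(6) * 2.0;  RI = solve(R)
y = c(5.1, 4.6, 6.0, 6.4, 4.2, 4.8)

# Henderson's mixed model equations
LHS = rbind(cbind(t(X) %*% RI %*% X, t(X) %*% RI %*% Z),
            cbind(t(Z) %*% RI %*% X, t(Z) %*% RI %*% Z + GI))
RHS = rbind(t(X) %*% RI %*% y, t(Z) %*% RI %*% y)
sol = ginv(LHS) %*% RHS

# Direct formulas using V = ZGZ' + R
V  = Z %*% G %*% t(Z) + R
Vi = solve(V)
bhat = ginv(t(X) %*% Vi %*% X) %*% t(X) %*% Vi %*% y
uhat = G %*% t(Z) %*% Vi %*% (y - X %*% bhat)

cbind(sol, rbind(bhat, uhat))   # the two columns agree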


8 Equivalence Proofs

The equivalence of the BLUP predictor to the solution from the MME was published by Henderson in 1963. In 1961 Henderson was in New Zealand (on sabbatical leave) visiting Shayle Searle, learning matrix algebra and trying to derive the proofs in this section. Henderson needed to prove that

V−1 = R−1 −R−1ZTZ′R−1

where

T = (Z′R−1Z + G−1)−1

and

V = ZGZ′ + R.

Henderson says he took his coffee break one day and left the problem on Searle's desk, and when he returned from his coffee break the proof was on his desk.

VV−1 = (ZGZ′ + R)(R−1 − R−1ZTZ′R−1)
     = ZGZ′R−1 + I − ZGZ′R−1ZTZ′R−1 − ZTZ′R−1
     = I + (ZGT−1 − ZGZ′R−1Z − Z)TZ′R−1
     = I + (ZG(Z′R−1Z + G−1) − ZGZ′R−1Z − Z)TZ′R−1
     = I + (ZGZ′R−1Z + Z − ZGZ′R−1Z − Z)TZ′R−1
     = I + (0)TZ′R−1
     = I.
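The identity is also easy to check numerically. The R sketch below uses small assumed matrices (not from the notes) to verify V−1 = R−1 − R−1ZTZ′R−1, as well as the closely related result GZ′V−1 = TZ′R−1 derived next.

# Toy matrices (assumed for illustration)
Z = kronecker(diag(3), matrix(1, 2, 1))
G = diag(3) * 0.5
R = diag(6) * 2.0

V    = Z %*% G %*% t(Z) + R
Tmat = solve(t(Z) %*% solve(R) %*% Z + solve(G))   # T = (Z'R^{-1}Z + G^{-1})^{-1}

# V^{-1} = R^{-1} - R^{-1} Z T Z' R^{-1}
max(abs(solve(V) - (solve(R) - solve(R) %*% Z %*% Tmat %*% t(Z) %*% solve(R))))

# G Z' V^{-1} = T Z' R^{-1}
max(abs(G %*% t(Z) %*% solve(V) - Tmat %*% t(Z) %*% solve(R)))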

Now take the equation for û from the MME,

Z′R−1Xb̂ + (Z′R−1Z + G−1)û = Z′R−1y,

which can be re-arranged as

(Z′R−1Z + G−1)û = Z′R−1(y − Xb̂)

or

û = TZ′R−1(y − Xb̂).

The BLUP formula was

û = GZ′V−1(y − Xb̂).


Then

GZ′V−1 = GZ′(R−1 − R−1ZTZ′R−1)
       = GZ′R−1 − GZ′R−1ZTZ′R−1
       = (GT−1 − GZ′R−1Z)TZ′R−1
       = (G(Z′R−1Z + G−1) − GZ′R−1Z)TZ′R−1
       = TZ′R−1.

Similarly, taking the MME solution for û and substituting it into the first equation of the MME gives

X′R−1Xb̂ + X′R−1Z(TZ′R−1(y − Xb̂)) = X′R−1y.

Combine the terms in b̂ and y to give

X′(R−1 − R−1ZTZ′R−1)Xb̂ = X′(R−1 − R−1ZTZ′R−1)y,

which are the same as the GLS equations,

X′V−1Xb̂ = X′V−1y.

Goldberger (1962) published these results before Henderson (1963), but Henderson knew of these equivalences back in 1949 through numerical examples. After he discovered Goldberger's paper (sometime after his retirement), Henderson insisted on citing it along with his work. Most people in animal breeding, however, refer to Henderson as the originator of this work and its primary proponent.

9 Variances of Predictors and Prediction Errors From MME

The covariance matrices of the predictors and prediction errors can be expressed in terms of the generalized inverse of the coefficient matrix of the MME, C. Recall that

\[ \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix} \begin{pmatrix} X'R^{-1} \\ Z'R^{-1} \end{pmatrix} y, \]

or as

b̂ = C′_b y,

and

û = C′_u y.

If the coefficient matrix of the MME is full rank (or a full rank subset, to simplify the presentation of results), then

\[ \begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix} \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & I \end{pmatrix}, \]


which gives the result that

\[ \begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix} \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z \end{pmatrix} = \begin{pmatrix} I & -C_{xz}G^{-1} \\ 0 & I - C_{zz}G^{-1} \end{pmatrix}. \]

This last result is used over and over in deriving the remaining results.

Now,

\[ \mathrm{Var}(\hat{b}) = \mathrm{Var}(C'_b y) = C'_b \mathrm{Var}(y) C_b = C'_b(ZGZ' + R)C_b \]
\[ = \begin{pmatrix} C_{xx} & C_{xz} \end{pmatrix} \begin{pmatrix} X'R^{-1} \\ Z'R^{-1} \end{pmatrix} (ZGZ' + R) C_b \]
\[ = \begin{pmatrix} C_{xx} & C_{xz} \end{pmatrix} \begin{pmatrix} X'R^{-1}Z \\ Z'R^{-1}Z \end{pmatrix} GZ'C_b + \begin{pmatrix} C_{xx} & C_{xz} \end{pmatrix} \begin{pmatrix} X' \\ Z' \end{pmatrix} C_b \]
\[ = -C_{xz}G^{-1}GZ'C_b + \begin{pmatrix} C_{xx} & C_{xz} \end{pmatrix} \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z \end{pmatrix} \begin{pmatrix} C_{xx} \\ C_{zx} \end{pmatrix} \]
\[ = C_{xz}G^{-1}C_{zx} + \begin{pmatrix} I & -C_{xz}G^{-1} \end{pmatrix} \begin{pmatrix} C_{xx} \\ C_{zx} \end{pmatrix} \]
\[ = C_{xz}G^{-1}C_{zx} + C_{xx} - C_{xz}G^{-1}C_{zx} = C_{xx}. \]

The remaining results are derived in a similar manner. These give

Var(û) = C′_u Var(y) C_u = G − Czz

Cov(b̂, û) = 0

Var(û − u) = Var(û) + Var(u) − Cov(û, u) − Cov(u, û)
           = Var(u) − Var(û)
           = G − (G − Czz)
           = Czz

Cov(b̂, û − u) = −Cov(b̂, u) = Cxz

In matrix form, the variance-covariance matrix of the predictors is

\[ \mathrm{Var}\begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} C_{xx} & 0 \\ 0 & G - C_{zz} \end{pmatrix}, \]

and the variance-covariance matrix of prediction errors is

\[ \mathrm{Var}\begin{pmatrix} \hat{b} \\ \hat{u} - u \end{pmatrix} = \begin{pmatrix} C_{xx} & C_{xz} \\ C_{zx} & C_{zz} \end{pmatrix}. \]

As the number of observations in the analysis increases, two things can be noted from these results:

1. Var(û) increases in magnitude towards a maximum of G, and

2. Var(û − u) decreases in magnitude towards a minimum of 0.
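These relationships can also be checked numerically. The R sketch below (toy matrices, all values assumed) inverts the MME coefficient matrix, extracts Czz, and compares G − Czz with Var(û) computed from the V-based expression of Section 5.

library(MASS)   # ginv()

X = matrix(1, 6, 1)
Z = kronecker(diag(3), matrix(1, 2, 1))
G = diag(3) * 0.5;  GI = solve(G)
R = diag(6) * 2.0;  RI = solve(R)

LHS = rbind(cbind(t(X) %*% RI %*% X, t(X) %*% RI %*% Z),
            cbind(t(Z) %*% RI %*% X, t(Z) %*% RI %*% Z + GI))
C   = ginv(LHS)
Czz = C[2:4, 2:4]               # block for the 3 random effects

# Var(u-hat) from Section 5: GZ'V^{-1}ZG - GZ'V^{-1}XPX'V^{-1}ZG
V  = Z %*% G %*% t(Z) + R
Vi = solve(V)
P  = ginv(t(X) %*% Vi %*% X)
VarU = G %*% t(Z) %*% Vi %*% Z %*% G -
       G %*% t(Z) %*% Vi %*% X %*% P %*% t(X) %*% Vi %*% Z %*% G

max(abs(VarU - (G - Czz)))      # ~0, so Var(u-hat) = G - Czz
Czz                             # variance of prediction error, Var(u-hat - u)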

10 Hypothesis Testing

When G and R are assumed known, as in BLUP, then the solutions for b from the MME are BLUE and tests of hypotheses that use these solutions are best. Tests involving u are unnecessary because, when G and R have been assumed to be known, the variation due to the random factors has already been assumed to be different from zero. The general linear hypothesis procedures are employed as in the fixed effects model. The null hypothesis is

\[ \begin{pmatrix} H'_o & 0 \end{pmatrix} \begin{pmatrix} b \\ u \end{pmatrix} = c \]

or

H′ob = c,

where H′ob must be an estimable function of b and H′o must have full row rank. H′ob is estimable if

\[ H'_o \begin{pmatrix} C_{xx} & C_{xz} \end{pmatrix} \begin{pmatrix} X'R^{-1}X \\ Z'R^{-1}X \end{pmatrix} = H'_o. \]

The test statistic is

s = (H′ob̂ − c)′(H′oCxxHo)−1(H′ob̂ − c)

with r(H′o) degrees of freedom, and the test is

F = (s/r(H′o))/σ̂²e,

where

σ̂²e = (y′R−1y − b̂′X′R−1y − û′Z′R−1y)/(N − r(X)).


The degrees of freedom for F are r(H′o) and (N − r(X)). Note that

y′R−1y − b̂′X′R−1y − û′Z′R−1y = y′V−1y − b̂′X′V−1y.
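A sketch of these test computations in R (toy data, and an assumed hypothesis H′o = (1) with c = 0, i.e. testing whether the single fixed effect is zero; every number here is for illustration only):

library(MASS)   # ginv()

X  = matrix(1, 6, 1)
Z  = kronecker(diag(3), matrix(1, 2, 1))
GI = solve(diag(3) * 0.5)
RI = diag(6)                    # R taken as I for this sketch
y  = c(5.1, 4.6, 6.0, 6.4, 4.2, 4.8)
N  = length(y)

LHS = rbind(cbind(t(X) %*% RI %*% X, t(X) %*% RI %*% Z),
            cbind(t(Z) %*% RI %*% X, t(Z) %*% RI %*% Z + GI))
RHS = rbind(t(X) %*% RI %*% y, t(Z) %*% RI %*% y)
C    = ginv(LHS)
sol  = C %*% RHS
bhat = sol[1, , drop = FALSE]
uhat = sol[2:4, , drop = FALSE]
Cxx  = C[1, 1, drop = FALSE]

# sigma_e^2 estimate: (y'R^{-1}y - bhat'X'R^{-1}y - uhat'Z'R^{-1}y)/(N - r(X))
se2 = (t(y) %*% RI %*% y - t(bhat) %*% t(X) %*% RI %*% y -
       t(uhat) %*% t(Z) %*% RI %*% y) / (N - qr(X)$rank)

# Test H'_o b = c with H'_o = (1), c = 0
Ho = matrix(1, 1, 1);  cc = 0
s  = t(t(Ho) %*% bhat - cc) %*% solve(t(Ho) %*% Cxx %*% Ho) %*% (t(Ho) %*% bhat - cc)
Fstat = (s / nrow(t(Ho))) / se2   # F on r(H'_o) and N - r(X) degrees of freedom
Fstat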

If G and R are not known, then there is no best test because BLUE of b is not possible. Valid tests exist only under certain circumstances. If estimates of G and R are used to construct the MME, then the solution for b is not BLUE and the resulting tests are only approximate.

If the estimate of G is considered to be inappropriate, then a test of H′ob = c can be constructed by treating u as a fixed factor, assuming that H′ob is estimable in the model with u as fixed. That is,

\[ \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z \end{pmatrix}^{-} \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix} = \begin{pmatrix} P_{xx} & P_{xz} \\ P_{zx} & P_{zz} \end{pmatrix} \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}, \]

and

σ̂²e = (y′R−1y − b̂′X′R−1y − û′Z′R−1y)/(N − r([X Z])),

s = (H′ob̂ − c)′(H′oPxxHo)−1(H′ob̂ − c),

F = (s/r(H′o))/σ̂²e.

11 Restrictions on Fixed Effects

There may be functions of b that are known, and this knowledge should be incorporated into the estimation process. For example, in beef cattle, male calves of a particular breed are known to weigh 25 kg more than female calves of the same breed at 200 days of age. By incorporating a difference of 25 kg between the sexes in an analysis, all other estimates of fixed and random effects, and their variances, would be changed accordingly.

Let B′b = d be the restriction to be placed on b; then the appropriate equations would be

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z & B \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} & 0 \\ B' & 0 & 0 \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \\ \theta \end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \\ d \end{pmatrix}. \]

Because B′b = d is any general function, there are three possible effects of this function on the estimability of K′b in the model. The conditions on B′ are that it

1. must have full row rank, and

2. must not have more than r(X) rows.


11.1 B′b is an estimable function

If B′b represents a set of estimable functions of b in the original model, then

1. the estimability of b is unchanged, and

2. the modified equations above do not have an inverse.

11.2 B′b is not an estimable function

If B′b represents a set of non-estimable functions of b with (p − r(X)) rows, where p is the number of columns of X, then

1. b is estimable as if X was full column rank, and

2. the modified equations above have a unique inverse.

11.3 B′b is not an estimable function

If B′b represents a set of non-estimable functions of b with fewer than (p − r(X)) rows, and if we let

\[ \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} = \begin{pmatrix} X'V^{-1}X & B \\ B' & 0 \end{pmatrix}^{-}, \]

then K′b is estimable if

\[ \begin{pmatrix} K' & 0 \end{pmatrix} \begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix} \begin{pmatrix} X'V^{-1}X & B \\ B' & 0 \end{pmatrix} = \begin{pmatrix} K' & 0 \end{pmatrix}. \]

The modified MME do not have a unique inverse in this situation.

12 Restricted BLUP

BLUP is commonly applied to models to evaluate the genetic merit of livestock in order to make decisions on culling and breeding of animals. In these cases, an objective of selection might be to improve the performance of animals for one trait while leaving another trait unchanged. In matrix notation, we might have two functions,

K′1b + M′1u and K′2b + M′2u,


representing the vectors of elements upon which selection decisions are to be made. One technique of achieving the objective is to force the covariance between the predictor of one function and the predictand of the other function to be zero. A zero covariance would result in no correlated response in K′2b + M′2u as a consequence of selecting on L′1y, provided y has a multivariate normal distribution. The covariance matrix of concern is

Cov(L′1y, K′2b + M′2u) = L′1ZGM2.

Therefore, in deriving L′1 we must add another Lagrange multiplier to F to give

F = Var(L′1y − K′1b − M′1u) + (L′1X − K′1)Φ + L′1ZGM2ϕ.

Minimize the diagonals of F with respect to L1, Φ, and ϕ, and equate the partial derivatives to null matrices. The resulting modified MME would be

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z & X'R^{-1}ZGM_2 \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} & Z'R^{-1}ZGM_2 \\ M_2'GZ'R^{-1}X & M_2'GZ'R^{-1}Z & M_2'GZ'R^{-1}ZGM_2 \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \\ t \end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \\ M_2'GZ'R^{-1}y \end{pmatrix}. \]

Let a generalized inverse of the coefficient matrix be

\[ \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix}, \]

then the following results may be derived:

Var(b̂) = C11,

Var(û − u) = C22,

Var(û) = G − C22,

Cov(b̂, û) = 0,

Cov(û, M′2u) = 0,

Cov(b̂, M′2u) = 0,

Cov(b̂, û − u) = C12,

M′2û = 0.

Another technique of obtaining the same result is to compute b̂ and û in the usual manner from the MME, then derive the appropriate weights to apply, in K1 and M1, such that

Cov(K′1b̂ + M′1û, M′2u) = 0,

for a given M2.


13 Singular G

By definition, variance-covariance matrices should always be nonsingular. In particular, G and R should be nonsingular because the MME utilize the inverses of these matrices to obtain BLUP. The matrix V must always be nonsingular, but there may be cases when either G or R is singular.

Consider the case where G is singular, and therefore G does not have an inverse. The BLUP of u is unaffected since the inverse of G is not needed, but in the MME there is a problem. Harville (1976) and Henderson (1973) suggest pre-multiplying the last equation of the MME by G to give

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ GZ'R^{-1}X & GZ'R^{-1}Z + I \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ GZ'R^{-1}y \end{pmatrix}. \]

A disadvantage of these equations is that the coefficient matrix is no longer symmetric, and solving the equations by Gauss-Seidel iteration may be slow to achieve convergence, if the solutions converge at all. Also, the variance-covariance matrix of prediction errors has to be obtained as follows:

\[ \mathrm{Var}\begin{pmatrix} \hat{b} \\ \hat{u} - u \end{pmatrix} = \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ GZ'R^{-1}X & GZ'R^{-1}Z + I \end{pmatrix}^{-} \begin{pmatrix} I & 0 \\ 0 & G \end{pmatrix}. \]

The equations could be made symmetric as follows:

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}ZG \\ GZ'R^{-1}X & GZ'R^{-1}ZG + G \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{\alpha} \end{pmatrix} = \begin{pmatrix} X'R^{-1}y \\ GZ'R^{-1}y \end{pmatrix}, \]

where

û = Gα̂,

and the variance-covariance matrix of prediction errors is calculated as

\[ \mathrm{Var}\begin{pmatrix} \hat{b} \\ \hat{u} - u \end{pmatrix} = \begin{pmatrix} I & 0 \\ 0 & G \end{pmatrix} C \begin{pmatrix} I & 0 \\ 0 & G \end{pmatrix}, \]

where C represents a generalized inverse of the coefficient matrix in the symmetric set of equations.

14 Singular R

When R is singular, the MME cannot be used to compute BLUP. However, the calculation of L′ can still be used, and the results given earlier on variances of predictors and prediction errors still hold. The disadvantage is that the inverse of V is needed, and V may be too large to invert.


Another alternative might be to partition R and y into a full rank subset and analyze that part, ignoring the linearly dependent subset. However, the solutions for b and u may depend on the subsets that are chosen, unless X and Z can be partitioned in the same manner as R.

Singular R matrices do not occur frequently with continuously distributed observations, but do occur with categorical data, where the probabilities of observations belonging to each category must sum to one.

15 When u and e are correlated

Nearly all applications of BLUP have been conducted assuming that Cov(u, e) = 0, but suppose that Cov(u, e) = T so that

Var(y) = ZGZ′ + R + ZT′ + TZ′.

A solution to this problem is to use an equivalent model where

y = Xb + Wu + ε

for W = Z + TG−1

and

\[ \mathrm{Var}\begin{pmatrix} u \\ \varepsilon \end{pmatrix} = \begin{pmatrix} G & 0 \\ 0 & B \end{pmatrix} \]

where B = R − TG−1T′, and consequently,

Var(y) = WGW′ + B
       = (Z + TG−1)G(Z′ + G−1T′) + (R − TG−1T′)
       = ZGZ′ + ZT′ + TZ′ + R.

The appropriate MME for the equivalent model are

\[ \begin{pmatrix} X'B^{-1}X & X'B^{-1}W \\ W'B^{-1}X & W'B^{-1}W + G^{-1} \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} X'B^{-1}y \\ W'B^{-1}y \end{pmatrix}. \]

The inverse of B can be written as

B−1 = R−1 + R−1T(G − T′R−1T)−1T′R−1,

but this form may not be readily computable.

The biggest difficulty with this type of problem is to define T = Cov(u, e), and then to estimate the values that should go into T. A model with a non-zero variance-covariance matrix between u and e can be re-parameterized into an equivalent model containing u and ε, which are uncorrelated.
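A small numerical check of this re-parameterization in R (Z, G, R, and T assumed purely for illustration; T is written here as an N × q matrix so that W = Z + TG−1 conforms, and is kept small so that B stays positive definite):

Z = kronecker(diag(3), matrix(1, 2, 1))
G = diag(3) * 0.5
R = diag(6) * 2.0

# Assumed covariance between the residuals and the random effects (N x q)
Tm = matrix(0, 6, 3)
Tm[1, 1] = 0.2
Tm[3, 2] = 0.1

W = Z + Tm %*% solve(G)
B = R - Tm %*% solve(G) %*% t(Tm)

V1 = W %*% G %*% t(W) + B                                  # equivalent model
V2 = Z %*% G %*% t(Z) + R + Z %*% t(Tm) + Tm %*% t(Z)      # original model
max(abs(V1 - V2))    # ~0, the two variance structures match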


16 G and R Unknown

For BLUP an assumption is that G and R are known without error. In practice this assumption almost never holds. Usually the proportional relationships among parameters in these matrices (such as heritabilities and genetic correlations) are known. In some cases, however, both G and R may be unknown; then linear unbiased estimators of b and u may exist, but these may not necessarily be best.

Unbiased estimators of b exist even if G and R are unknown. Let H be any nonsingular, positive definite matrix; then

K′bo = K′(X′H−1X)−X′H−1y = K′CX′H−1y, for C = (X′H−1X)−,

represents an unbiased estimator of K′b, if estimable, and

Var(K′bo) = K′CX′H−1VH−1XCK.

This estimator is best when H = V. Some possible matrices for H are I, the diagonals of V, the diagonals of R, or R itself.

The u part of the model has been ignored in the above. Unbiased estimators of K′b can also be obtained from

\[ \begin{pmatrix} X'H^{-1}X & X'H^{-1}Z \\ Z'H^{-1}X & Z'H^{-1}Z \end{pmatrix} \begin{pmatrix} b^o \\ u^o \end{pmatrix} = \begin{pmatrix} X'H^{-1}y \\ Z'H^{-1}y \end{pmatrix}, \]

provided that K′b is estimable in a model with u assumed to be fixed. Often the inclusion of u as fixed changes the estimability of b.

If G and R are replaced by estimates obtained by one of the usual variance component estimation methods, then use of those estimates in the MME yields unbiased estimators of b and unbiased predictors of u, provided that y is normally distributed (Kackar and Harville, 1981). Today, Bayesian methods are applied using Gibbs sampling to simultaneously estimate G and R, and to estimate b and u.

17 Example 1

Below are data on progeny of three sires distributed in two contemporary groups. The first number is the number of progeny, and the second number, in parentheses, is the sum of the progeny observations.

Sire    Contemporary Group
           1        2
 A       3(11)    6(19)
 B       4(16)    3(18)
 C       5(14)


17.1 Operational Models

Let

yijk = µ + Ci + Sj + eijk,

where yijk are the observations on the trait of interest of individual progeny, assumed to be one record per progeny only, µ is an overall mean, Ci is a random contemporary group effect, Sj is a random sire effect, and eijk is a random residual error term associated with each observation.

E(yijk) = µ,
Var(eijk) = σ²e,
Var(Ci) = σ²c = σ²e/6.0,
Var(Sj) = σ²s = σ²e/11.5.

The ratio of four times the sire variance to the total phenotypic variance (i.e. σ²c + σ²s + σ²e) is known as the heritability of the trait, and in this case is 0.2775. The ratio of the contemporary group variance to the total phenotypic variance is 0.1329. The important ratios are

σ²e/σ²c = 6.0
σ²e/σ²s = 11.5

There are a total of 21 observations, but only five filled subclasses. The individual observations are not available, only the totals for each subclass. Therefore, an equivalent model is the “means” model,

yij = µ + Ci + Sj + eij ,

where yij is the mean of the progeny of the jth sire in the ith contemporary group, and eij is the mean of the residuals for the (ij)th subclass.

The model assumes that

• Sires were mated randomly to dams within each contemporary group.

• Each dam had only one progeny.

• Sires were not related to each other.

• Progeny were all observed at the same age (or observations are perfectly adjusted for age effects).

• The contemporary groups were independent from each other.


17.2 Mixed Model Equations

The process is to define y, X, and Z, and also G and R. After that, the calculations are straightforward. The “means” model will be used for this example.

17.2.1 Observations

The observation vector for the “means” model is

\[ y = \begin{pmatrix} 11/3 \\ 16/4 \\ 14/5 \\ 19/6 \\ 18/3 \end{pmatrix}. \]

17.2.2 Xb and Zu

\[ Xb = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{pmatrix} \mu. \]

The overall mean is the only column in X for this model.

There are two random factors and each one has its own design matrix.

\[ Zu = \begin{pmatrix} Z_c & Z_s \end{pmatrix} \begin{pmatrix} c \\ s \end{pmatrix}, \]

where

\[ Z_c c = \begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} C_1 \\ C_2 \end{pmatrix}, \qquad Z_s s = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} S_A \\ S_B \\ S_C \end{pmatrix}, \]

so that, together,

\[ Zu = \begin{pmatrix} 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} C_1 \\ C_2 \\ S_A \\ S_B \\ S_C \end{pmatrix}. \]


17.2.3 G and R

The covariance matrix of the means of residuals is R. The variance of a mean of random variables is the variance of the individual variables divided by the number of variables in the mean. Let nij equal the number of progeny in a sire by contemporary group subclass; then the variance of the subclass mean is σ²e/nij. Thus,

\[ R = \begin{pmatrix} \sigma^2_e/3 & 0 & 0 & 0 & 0 \\ 0 & \sigma^2_e/4 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2_e/5 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2_e/6 & 0 \\ 0 & 0 & 0 & 0 & \sigma^2_e/3 \end{pmatrix}. \]

The matrix G is similarly partitioned into two submatrices, one for contemporary groups and one for sires,

\[ G = \begin{pmatrix} G_c & 0 \\ 0 & G_s \end{pmatrix}, \]

where

\[ G_c = \begin{pmatrix} \sigma^2_c & 0 \\ 0 & \sigma^2_c \end{pmatrix} = I\sigma^2_c = I\,\frac{\sigma^2_e}{6.0}, \]

and

\[ G_s = \begin{pmatrix} \sigma^2_s & 0 & 0 \\ 0 & \sigma^2_s & 0 \\ 0 & 0 & \sigma^2_s \end{pmatrix} = I\sigma^2_s = I\,\frac{\sigma^2_e}{11.5}. \]

The inverses of G and R are needed for the MME.

\[ R^{-1} = \begin{pmatrix} 3 & 0 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 & 0 \\ 0 & 0 & 5 & 0 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & 0 & 0 & 0 & 3 \end{pmatrix} \frac{1}{\sigma^2_e}, \]

and

\[ G^{-1} = \begin{pmatrix} 6 & 0 & 0 & 0 & 0 \\ 0 & 6 & 0 & 0 & 0 \\ 0 & 0 & 11.5 & 0 & 0 \\ 0 & 0 & 0 & 11.5 & 0 \\ 0 & 0 & 0 & 0 & 11.5 \end{pmatrix} \frac{1}{\sigma^2_e}. \]

Because both are expressed in terms of the inverse of σ²e, that constant can be ignored. The relative values between G and R are sufficient to get solutions to the MME.


17.2.4 MME and Inverse Coefficient Matrix

The left hand side of the MME (LHS) is

\[ \begin{pmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \end{pmatrix} \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix}, \]

and the right hand side of the MME (RHS) is

\[ \begin{pmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{pmatrix}. \]

Numerically,

\[ \mathrm{LHS} = \begin{pmatrix} 21 & 12 & 9 & 9 & 7 & 5 \\ 12 & 18 & 0 & 3 & 4 & 5 \\ 9 & 0 & 15 & 6 & 3 & 0 \\ 9 & 3 & 6 & 20.5 & 0 & 0 \\ 7 & 4 & 3 & 0 & 18.5 & 0 \\ 5 & 5 & 0 & 0 & 0 & 16.5 \end{pmatrix} \begin{pmatrix} \mu \\ C_1 \\ C_2 \\ S_A \\ S_B \\ S_C \end{pmatrix}, \]

and

\[ \mathrm{RHS} = \begin{pmatrix} 78 \\ 41 \\ 37 \\ 30 \\ 34 \\ 14 \end{pmatrix}. \]

The inverse of the LHS coefficient matrix is

\[ C = \begin{pmatrix} .1621 & -.0895 & -.0772 & -.0355 & -.0295 & -.0220 \\ -.0895 & .1161 & .0506 & .0075 & .0006 & -.0081 \\ -.0772 & .0506 & .1161 & -.0075 & -.0006 & .0081 \\ -.0355 & .0075 & -.0075 & .0655 & .0130 & .0085 \\ -.0295 & .0006 & -.0006 & .0130 & .0652 & .0088 \\ -.0220 & -.0081 & .0081 & .0085 & .0088 & .0697 \end{pmatrix}. \]

C has some interesting properties.

• Adding elements (1,2) and (1,3) gives −.1667, which is the negative of the ratio σ²c/σ²e.

• Adding elements (1,4), (1,5), and (1,6) gives −.08696, which is the negative of the ratio σ²s/σ²e.

• Adding elements (2,2) and (2,3), or (3,2) plus (3,3), gives .1667, the ratio of contemporary group variance to residual variance.

• Adding elements (4,4), (4,5), and (4,6) gives .08696, the ratio of sire variance to residual variance. Also, (5,4)+(5,5)+(5,6) = (6,4)+(6,5)+(6,6).

• The sum (4,2)+(5,2)+(6,2) = (4,3)+(5,3)+(6,3) = 0.


17.2.5 Solutions and Variance of Prediction Error

Let SOL represent the vector of solutions to the MME, then

\[ \mathrm{SOL} = C \times \mathrm{RHS} = \begin{pmatrix} \hat{b} \\ \hat{u} \end{pmatrix} = \begin{pmatrix} \hat{\mu} \\ \hat{C}_1 \\ \hat{C}_2 \\ \hat{S}_A \\ \hat{S}_B \\ \hat{S}_C \end{pmatrix} = \begin{pmatrix} 3.7448 \\ -.2183 \\ .2183 \\ -.2126 \\ .4327 \\ -.2201 \end{pmatrix}. \]

The two contemporary group solutions add to zero, and the three sire solutions add to zero.

The variances of prediction error are derived from the diagonals of C corresponding to the random effect solutions, multiplied by the residual variance. Hence, the variance of prediction error for contemporary group 1 is .1161σ²e. An estimate of the residual variance is needed, and is given by

σ̂²e = (SST − SSR)/(N − r(X)).

SST was not available from these data because individual observations were not available. Suppose SST = 322; then

σ̂²e = (322 − 296.4704)/(21 − 1) = 1.2765.

SSR is computed by multiplying the solution vector by the RHS of the MME. That is,

SSR = 3.7448(78) − .2183(41) + .2183(37) − .2126(30) + .4327(34) − .2201(14) = 296.4704.

The variance of prediction error for contemporary group 1 is

Var(PE) = .1161(1.2765) = .1482.

The standard error of prediction, or SEP, is the square root of the variance of prediction error, giving .3850. Thus, the solution for contemporary group 1 is −.2183 plus or minus .3850.

Variances of prediction error are calculated in the same way for all solutions of random effects.

Effect   Solution   SEP
C1       −.2183     .3850
C2        .2183     .3850
SA       −.2126     .2892
SB        .4327     .2885
SC       −.2201     .2983

Sire A has 9 progeny while sire B has 7 progeny, but sire B has a slightly smaller SEP. The reason is the distribution of the progeny of each sire across the two contemporary groups. Sire C, of course, has the largest SEP because it has only 5 progeny and all of these are in contemporary group 1. The differences in SEP in this small example are not large.


17.2.6 Repeatability or Reliability

Variances of prediction error are often expressed as a number going from 0 to 100%, known as repeatability or reliability (REL), depending on the species. The general formula is

REL = (Var(True Values) − Var(PE)) / Var(True Values),

times 100. Thus, for contemporary group 1, the reliability would be

REL = 100(.1667− .1482)/(.1667) = 11.10.

For Sire A, the REL would be

REL = 100(.08696− (.0655 ∗ 1.2765))/(.08696) = 3.85.

Thus, sires have smaller reliabilities than contemporary groups, but the SEP for sires is smaller than for contemporary groups. This is because contemporary groups have more progeny in them than sires have, and because the variance of contemporary groups is larger than the variance of sire transmitting abilities.

17.3 R Methods for MME

Given the matrices X, Z, G−1, R−1, and y, an R function can be written to set up the MME, solve them, and compute SSR. This function works for small examples, as given in these notes. For large problems, other methods can be used to solve the equations by iterating on the data. The function for small examples is given here.


# ginv() is in the MASS package
library(MASS)

MME = function(X, Z, GI, RI, y) {
  XX = t(X) %*% RI %*% X
  XZ = t(X) %*% RI %*% Z
  ZZ = (t(Z) %*% RI %*% Z) + GI
  Xy = t(X) %*% RI %*% y
  Zy = t(Z) %*% RI %*% y
  # Combine the pieces into LHS and RHS
  piece1 = cbind(XX, XZ)
  piece2 = cbind(t(XZ), ZZ)
  LHS = rbind(piece1, piece2)
  RHS = rbind(Xy, Zy)
  # Invert LHS and solve
  C = ginv(LHS)
  SOL = C %*% RHS
  SSR = t(SOL) %*% RHS
  SOLNS = cbind(SOL, sqrt(diag(C)))
  return(list(LHS = LHS, RHS = RHS, C = C, SSR = SSR, SOLNS = SOLNS))
}

To use the function,

Exampl = MME(X1, Z1, GI, RI, y)
str(Exampl)

# To view the results
Exampl$LHS
Exampl$RHS
Exampl$C
Exampl$SOLNS
Exampl$SSR
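For completeness, a sketch of the inputs for Example 1, built from the numbers in the text above so that the call shown can actually be run (the names X1, Z1, GI, RI, and y match the call; σ²e is set to 1 since only the variance ratios matter):

# "Means" model of Example 1
y  = c(11/3, 16/4, 14/5, 19/6, 18/3)
X1 = matrix(1, 5, 1)
Z1 = matrix(c(1,0, 1,0,0,
              1,0, 0,1,0,
              1,0, 0,0,1,
              0,1, 1,0,0,
              0,1, 0,1,0), nrow = 5, byrow = TRUE)   # columns: C1, C2, SA, SB, SC
RI = diag(c(3, 4, 5, 6, 3))                  # R^{-1} (times sigma_e^2)
GI = diag(c(6, 6, 11.5, 11.5, 11.5))         # G^{-1} (times sigma_e^2)

Exampl = MME(X1, Z1, GI, RI, y)
round(Exampl$SOLNS, 4)   # column 1: solutions; column 2: sqrt of diag(C)
Exampl$SSR               # about 296.47, as in the text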


18 EXERCISES

1. Below are data on progeny of 6 rams used in 5 sheep flocks (for some trait). The rams were unrelated to each other and to any of the ewes to which they were mated. The first number is the number of progeny in the flock, and the second (within parentheses) is the sum of the observations.

Ram                    Flocks
ID        1         2         3         4         5
1       6(638)    8(611)    6(546)    5(472)    0(0)
2       5(497)    5(405)    5(510)    0(0)      4(378)
3      15(1641)   6(598)    5(614)    6(639)    5(443)
4       6(871)   11(1355)   0(0)      3(412)    3(367)
5       2(235)    4(414)    8(874)    4(454)    6(830)
6       0(0)      0(0)      4(460)   12(1312)   5(558)

Let the model equation be

yijk = µ + Fi + Rj + eijk

where Fi is a flock effect, Rj is a ram effect, and eijk is a residual effect. There are a total of 149 observations and the total sum of squares was equal to 1,793,791. Assume that σ²e = 7σ²f = 1.5σ²r when doing the problems below.

(a) Set up the mixed model equations and solve. Calculate the SEPs and reliabilities of the ram solutions.

(b) Repeat the above analysis, but assume that flocks are a fixed factor (i.e. do not add any variance ratio to the diagonals of the flock equations). How do the evaluations, SEP, and reliabilities change from the previous model?

(c) Assume that rams are a fixed factor, and flocks are random. Do the rams rank similarly to the previous two models?

2. When a model has one random factor and its covariance matrix is an identity matrix times a scalar constant, prove that the solutions for that factor from the MME will sum to zero. Try to make the proof as general as possible.
