REML and residual likelihood

REML and residual likelihood. Peter McCullagh, Department of Statistics, University of Chicago. Nelder Lecture, Imperial College, March 8 2012.

Transcript of REML and residual likelihood

Page 1: REML and residual likelihood

Maximum likelihood | Applications and examples

REML and residual likelihood

Peter McCullagh

Department of Statistics, University of Chicago

Nelder Lecture, Imperial College, March 8 2012

Peter McCullagh REML

Page 2: REML and residual likelihood

JAN: Some personal remarks...

IC 1974–1977:
    The MS/PhD program in Statistics
    Computing strategies: GLIM, ...
    Ordinal data and log-linear models...

Chicago 1977–79: consulting work

IC 1979–1984:
    Plans for the GLM book I: London 1980–81
    Writing the GLM book II: Vancouver 1982
    Writing the GLM book III: London/Rothamsted 1982/83
    Toronto ASA Mtg 1984

Chicago 1985–1987:
    The second edition...
    Random effects models: the salamander data

Page 3: REML and residual likelihood

Outline

1 Maximum likelihood
    REML and residual likelihood
    Likelihood ratios

2 Applications and examples
    Example I: fumigants for eelworm control
    Example II: kernel smoothing
    Box-Cox and REML

Page 4: REML and residual likelihood

Symmetric functions

Estimation of moments/cumulants:
Thiele 1891; Fisher 1929; Dressel 1940; Tukey 1950
$Y_1, \ldots, Y_n$ iid with mean $\kappa_1$, variance $\kappa_2$, ...

Polynomial symmetric functions, with $n^{\downarrow r} = n(n-1)\cdots(n-r+1)$:
$k_1 = (Y_1 + \cdots + Y_n)/n$ for $\kappa_1$
$k_2 = \sum_i (Y_i - k_1)^2/(n-1)$ for $\kappa_2$
$k_{11} = \sum_{i \ne j} Y_i Y_j / n^{\downarrow 2} = k_1^2 - k_2/n$ for $\kappa_1^2$
$k_3 = \sum_i (Y_i - k_1)^3 \, n^2/n^{\downarrow 3}$ for $\kappa_3$
$k_{21} = \ldots$
$k_{111} = \sum_{i,j,k \text{ distinct}} Y_i Y_j Y_k / n^{\downarrow 3}$
$k_4 = \bigl((n+1) \sum_i (Y_i - k_1)^4/n - 3(n-1)^3 k_2^2/n^2\bigr)\, n^3/n^{\downarrow 4}$
$k_{31}, k_{22}, k_{211}, k_{1111}$
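
The low-order k-statistics are easy to check numerically. The following is a minimal Python sketch, not part of the lecture: the function names `k_statistics` and `k11` and the sample values are made up for illustration. It computes $k_1, k_2, k_3$ and confirms that the average of $Y_i Y_j$ over distinct pairs equals $k_1^2 - k_2/n$.

```python
from itertools import permutations
from statistics import variance


def k_statistics(y):
    """Return (k1, k2, k3), the unbiased estimates of the first three cumulants."""
    n = len(y)
    k1 = sum(y) / n
    s2 = sum((v - k1) ** 2 for v in y)
    s3 = sum((v - k1) ** 3 for v in y)
    k2 = s2 / (n - 1)
    # n^2 / (n (n-1) (n-2)) simplifies to n / ((n-1) (n-2))
    k3 = s3 * n / ((n - 1) * (n - 2))
    return k1, k2, k3


def k11(y):
    """Average of Y_i * Y_j over ordered pairs of distinct indices."""
    n = len(y)
    total = sum(a * b for a, b in permutations(y, 2))
    return total / (n * (n - 1))


y = [2.0, 3.0, 5.0, 7.0, 11.0]   # illustrative sample
k1, k2, k3 = k_statistics(y)
print(k1, k2, k3)
print(k11(y), k1 ** 2 - k2 / len(y))   # the two expressions for k11 agree
```

Here `k2` coincides with the usual sample variance, which is why `statistics.variance` can serve as an independent check.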

Page 5: REML and residual likelihood

Maximum likelihood estimation

Design with n units/plots/subjects $i = 1, \ldots, n$
Covariate $x(i) \equiv x_i$ in $\mathbb{R}^p$ given
Response $Y(i) = Y_i$ a real number

Observation space $Y \in S = \mathbb{R}^n$
Covariate space $\mathcal{X} = \mathrm{span}(X) \subset S$

Linear model: for some $\beta \in \mathbb{R}^p$ and $\sigma^2 > 0$,
$Y \sim N(X\beta, \sigma^2 I_n)$

Log likelihood function:
$l(\beta, \sigma; y) = -\tfrac{1}{2}\|y - X\beta\|^2/\sigma^2 - n \log\sigma$

$\hat\beta = (X'X)^{-1}X'y$; $\hat\mu = X\hat\beta$
$\hat\sigma^2 = \|y - \hat\mu\|^2/n$

$E(\hat\sigma^2) = (n - p)\sigma^2/n$: too small!
Conventional estimate $s^2 = \|y - \hat\mu\|^2/(n - p)$
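
The bias factor $(n - p)/n$ comes from $E\|QY\|^2 = \sigma^2\,\mathrm{tr}(Q) = \sigma^2(n - p)$, where $Q = I - X(X'X)^{-1}X'$. The sketch below is an illustration only, assuming simple linear regression with an intercept ($p = 2$) and made-up covariate values. It uses the closed-form leverages $h_{ii} = 1/n + (x_i - \bar x)^2/S_{xx}$ to confirm that the trace of the hat matrix is $p$.

```python
def leverages(x):
    """Diagonal of the hat matrix P = X (X'X)^{-1} X' for X = [1, x]."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    return [1.0 / n + (v - xbar) ** 2 / sxx for v in x]


def expected_mle_variance(x, sigma_sq):
    """E(sigma-hat^2) = sigma^2 * tr(Q) / n, where Q = I - P."""
    n = len(x)
    trace_q = n - sum(leverages(x))   # tr(Q) = n - tr(P) = n - p
    return sigma_sq * trace_q / n


x = [1.0, 2.0, 4.0, 7.0, 8.0, 10.0]    # hypothetical covariate values
print(sum(leverages(x)))               # trace of P: equals p = 2 up to rounding
print(expected_mle_variance(x, 1.0))   # (n - p)/n = 4/6 when sigma^2 = 1
```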

Page 6: REML and residual likelihood

Residuals

One definition: $R = Y - X(X'X)^{-1}X'Y = QY$
Another definition: $R' = AY$ where $\ker(A) = \mathcal{X}$
But $R' = AR$ ... so all definitions are equivalent
... for likelihood computations

Distributions:
$R \sim N(0, \sigma^2 Q)$, $R' \sim N(0, \sigma^2 AA')$
Likelihoods? No density function for $R$

Page 7: REML and residual likelihood

Variance-components estimation

Design with n units/plots/subjects $i = 1, \ldots, n$
Block factor relationship: $B(i, j) = 1$ if $i \sim_B j$ (given)
Covariate $x(i) \equiv x_i$ in $\mathbb{R}^p$ (given treatment level)
Response $Y(i) = Y_i$ a real number

Linear model: for some $\beta \in \mathbb{R}^p$ and $\sigma_0^2, \sigma_1^2 > 0$,
$Y \sim N(X\beta, \sigma_0^2 I_n + \sigma_1^2 B)$
mean $\mu = X\beta$; variance $\Sigma = \sigma_0^2 I_n + \sigma_1^2 B$; $W = \Sigma^{-1}$

Log likelihood function:
$l(\beta, \sigma; y) = -\tfrac{1}{2}\|y - \mu\|^2 - \tfrac{1}{2}\log|\Sigma|$, where $\|v\|^2 = v'Wv$

Sufficient statistics (balance and $\mu = 0$):
$E(Y'Y) = \mathrm{tr}(\Sigma) = n\sigma_0^2 + n\sigma_1^2$
$E(Y'BY) = \mathrm{tr}(\Sigma B) = n\sigma_0^2 + \sigma_1^2 \sum_j n_j^2$

typically too small
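
The two trace formulas rest on $\mathrm{tr}(B) = n$ and $\mathrm{tr}(B^2) = \sum_j n_j^2$, where $n_j$ is the size of block $j$. A minimal sketch that checks both for a small block structure follows; the block labels are made up, and the helper names are illustrative. The identities do not need equal block sizes, so unequal sizes are used here.

```python
def block_matrix(blocks):
    """B[i][j] = 1 if units i and j are in the same block, else 0."""
    return [[1 if bi == bj else 0 for bj in blocks] for bi in blocks]


def trace(m):
    return sum(m[i][i] for i in range(len(m)))


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


blocks = [0, 0, 0, 1, 1, 2, 2, 2, 2]      # made-up labels: block sizes 3, 2, 4
B = block_matrix(blocks)
sizes = [blocks.count(b) for b in sorted(set(blocks))]

print(trace(B))                           # n = 9
print(trace(matmul(B, B)))                # tr(B^2)
print(sum(s * s for s in sizes))          # sum of n_j^2 = 9 + 4 + 16 = 29
```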

Page 8: REML and residual likelihood

Gaussian likelihoods

Density of the Gaussian $N(\mu, \Sigma)$ distn at $y \in \mathbb{R}^n$:
$|W|^{1/2} \exp(-\tfrac{1}{2}\|y - \mu\|^2)\, dy$
$(\mathbb{R}^n, W)$ regarded as an inner product space: $W = \Sigma^{-1}$, $\langle x, y\rangle = x'Wy$

$\mathcal{K} \subset \mathbb{R}^n$ a subspace of dimension $k$ spanned by the columns of $K$
Orthogonal projections: $P = K(K'WK)^{-1}K'W$, $Q = I - P$
$A$: a linear transformation with kernel $\mathcal{K}$

Marginal likelihood based on $AY \sim N(A\mu, A\Sigma A')$ is
$|A\Sigma A'|^{-1/2} \exp(-\tfrac{1}{2}(y - \mu)'A'(A\Sigma A')^{-1}A(y - \mu))$

Page 9: REML and residual likelihood

Gaussian likelihoods contd.

Marginal likelihood based on $AY \sim N(A\mu, A\Sigma A')$ is
$|A\Sigma A'|^{-1/2} \exp(-\tfrac{1}{2}(y - \mu)'A'(A\Sigma A')^{-1}A(y - \mu))$

Equivalent expressions:
$\dfrac{|W|^{1/2}}{|K'WK|^{1/2}} \exp(-\tfrac{1}{2}(y - \mu)'WQ(y - \mu))$
$\dfrac{|W|^{1/2}\,|K'K|^{1/2}}{|K'WK|^{1/2}} \exp(-\tfrac{1}{2}(y - \mu)'WQ(y - \mu))$
$\mathrm{Det}^{1/2}(WQ) \exp(-\tfrac{1}{2}(y - \mu)'WQ(y - \mu))$

$\mathrm{Det}(WQ)$ is the product of the $n - k$ non-zero eigenvalues
$\mathbb{R}^n/\mathcal{K}$ regarded as an inner product space: $\langle x, y\rangle = x'WQy$
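
The equivalence of these expressions rests on the identity $A'(A\Sigma A')^{-1}A = WQ$ whenever $\ker(A) = \mathcal{K}$. The sketch below checks it in exact rational arithmetic for one small made-up case ($n = 3$, $\Sigma = \mathrm{diag}(1, 2, 4)$, $\mathcal{K}$ spanned by the ones vector, $A$ a pair of successive differences). The matrices and helper names are assumptions chosen for illustration, not taken from the lecture.

```python
from fractions import Fraction as F


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inv2(m):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Sigma = diag(1, 2, 4), so W = diag(1, 1/2, 1/4); K = span of the ones vector.
sigma = [[F(1), F(0), F(0)], [F(0), F(2), F(0)], [F(0), F(0), F(4)]]
w = [[F(1), F(0), F(0)], [F(0), F(1, 2), F(0)], [F(0), F(0), F(1, 4)]]
k = [[F(1)], [F(1)], [F(1)]]
a = [[F(1), F(-1), F(0)], [F(0), F(1), F(-1)]]   # contrasts, so A k = 0

# Left side: A' (A Sigma A')^{-1} A
asa = matmul(matmul(a, sigma), transpose(a))
lhs = matmul(matmul(transpose(a), inv2(asa)), a)

# Right side: WQ = W - W K (K'WK)^{-1} K'W  (W is symmetric, so K'W = (WK)')
kwk = matmul(matmul(transpose(k), w), k)[0][0]
wk = matmul(w, k)
rhs = [[w[i][j] - wk[i][0] * wk[j][0] / kwk for j in range(3)] for i in range(3)]

print(lhs == rhs)   # True: the two matrices agree exactly
```

Using `Fraction` avoids any tolerance choice: the comparison is an exact equality rather than a floating-point approximation.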

Page 10: REML and residual likelihood

REML and residual likelihood

Family of distributions: $N(X\beta, \Sigma(\theta))$: $\beta \in \mathbb{R}^p$, $\theta \in \Theta$
Full log likelihood:
$l(\beta, \Sigma; y) = -\tfrac{1}{2}\log\det(\Sigma) - \tfrac{1}{2}(y - X\beta)'W(y - X\beta)$

Profile log likelihood: $\hat\beta_\theta = (X'WX)^{-1}X'Wy$, $W = \Sigma_\theta^{-1}$
$l(\hat\beta, \Sigma; y) = -\tfrac{1}{2}\log\det(\Sigma) - \tfrac{1}{2}y'WQy$

Residual: $Y \mapsto AY$ where $\ker(A) = \mathcal{X}$
Residual log likelihood:
$l(\Sigma; Qy) = -\tfrac{1}{2}\log\det(\Sigma) - \tfrac{1}{2}\log\det(X'WX) - \tfrac{1}{2}y'WQy$
$\phantom{l(\Sigma; Qy)} = \tfrac{1}{2}\log\mathrm{Det}(WQ) - \tfrac{1}{2}y'WQy + \mathrm{const}(X)$
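
As a worked special case, take $\Sigma = \sigma^2 I_n$. The residual log likelihood then reduces to a function of $\sigma$ alone, and its maximizer is the conventional estimate $s^2$ rather than the maximum likelihood estimate with divisor $n$. The derivation below is a sketch under that assumption only.

```latex
\begin{align*}
-\tfrac12 \log\det(\Sigma) &= -n\log\sigma, \\
-\tfrac12 \log\det(X'WX) &= -\tfrac12 \log\det(\sigma^{-2}X'X)
   = p\log\sigma + \mathrm{const}(X), \\
y'WQy &= \|y-\hat\mu\|^2/\sigma^2, \\
l(\sigma; Qy) &= -(n-p)\log\sigma - \frac{\|y-\hat\mu\|^2}{2\sigma^2} + \mathrm{const}(X), \\
\frac{\partial l}{\partial\sigma} = 0
   \;&\Longrightarrow\; \hat\sigma^2_{\mathrm{REML}} = \frac{\|y-\hat\mu\|^2}{n-p} = s^2 .
\end{align*}
```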

Page 11: REML and residual likelihood

Summary: Marginal likelihood: $\mathcal{K} \ne \mathcal{X}$

Model subspace $\mathcal{X} = \{\mu(\beta) : \beta \in \mathbb{R}^p\}$, $\mathcal{X} = \mathrm{span}(X)$
Kernel subspace $\mathcal{K} = \mathrm{span}(K)$
Covariance matrix $\Sigma_\theta$: $W = \Sigma^{-1}$

Log likelihood based on observation $y + \mathcal{K}$
Log likelihood based on $Ay$ where $\ker(A) = \mathcal{K}$

$l(\beta, \theta; y + \mathcal{K}) = \tfrac{1}{2}\log\mathrm{Det}(WQ) - \tfrac{1}{2}(y - \mu)'WQ(y - \mu)$

where $WQ = W(I - K(K'WK)^{-1}K'W)$ is the inner product in $\mathbb{R}^n/\mathcal{K}$

Special cases:
$\mathcal{K} = 0$: ordinary likelihood
$\mathcal{K} = \mathbf{1}_n = \mathrm{span}(e_1 + \cdots + e_n)$: likelihood based on contrasts
$\mathcal{K} = \mathcal{X}$: standard REML
$\mathcal{K} = \mathrm{span}(e_n)$: likelihood with $y_n$ unobserved

Page 12: REML and residual likelihood

Likelihood ratio tests

Simple likelihood ratio: $P_\theta(\text{event}) / P_{\theta'}(\text{event})$

Maximized likelihood ratio:
$\sup_{\theta \in H_A} P_\theta(\text{event}) \,/\, \sup_{\theta \in H_0} P_\theta(\text{event})$

Event in numerator = event in denominator, usually $dy$
For marginal likelihood, event = $dy + \mathcal{K}$

Marginal likelihood ratio statistic:
$\sup_{\Theta} P_\theta(dy + \mathcal{K}) \,/\, \sup_{\Theta'} P_\theta(dy + \mathcal{K})$

Same $\mathcal{K}$ in numerator and denominator

Page 13: REML and residual likelihood

Example: Eelworm control using fumigants

[Figure: field plan showing the four blocks I, II, III, IV]

Actual field layout of 48 plots in four blocks. Experiment using fumigants to control eelworms in oat field. (Bailey, 2008, p. 73).

Data (eelworm counts) from Cochran and Cox (1950, Table 3.1)

Blk 1 (I)                Blk 2 (IV)
269  283  252  212        95  127   80  134
138  100  197  263       107   89   41   74
282  230  216  145        88   25   42   62

Blk 2 (II)               Blk 4 (III)
124  211  194  222       193  209  109  153
102  193  128   42        29    9   17   19
162  191  107   67        23   19   44   48

Page 14: REML and residual likelihood

Variance models: taking off from JAN (1965)

Block-structured effects: $\eta \overset{\text{iid}}{\sim} N(0, \sigma_1^2)$, const on each block
$Y_i = \text{trt effects} + \eta_{b(i)} + \varepsilon_i$
$\mathrm{cov}(Y_i, Y_j) = \sigma_1^2 B(i, j) + \sigma_0^2 \delta_{ij}$

Stationary isotropic spatial effect:
$\eta \sim GP(0, \sigma_1^2 K)$, $\mathrm{cov}(\eta(x), \eta(x')) = \sigma_1^2 K(|x - x'|)$
$Y(A) = \text{trt effects} + \int_A \eta(x)\, dx + \varepsilon_i$
$\mathrm{cov}(Y_i, Y_j) = \sigma_1^2 \bar K(x_i, x_j) + \sigma_0^2 \delta_{ij}$

$K(x, x') = \exp(-|x - x'|/\rho)$ with range $\rho > 0$ for illustration
In practice, $\hat\rho = \infty$: $K(x, x') = -|x - x'|$
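
One way to read the limit $\hat\rho = \infty$ is through a first-order expansion of the exponential kernel. The sketch below assumes that the kernel subspace contains the constant vector, so that only contrasts matter, and that $\sigma_1^2/\rho$ is held fixed as $\rho \to \infty$. The constant term of the expansion is then annihilated by contrasts, leaving a multiple of $-|x - x'|$.

```latex
\begin{align*}
K_\rho(x,x') &= \exp(-|x-x'|/\rho) = 1 - |x-x'|/\rho + O(\rho^{-2}), \\
c'K_\rho c &= -\rho^{-1}\sum_{i,j} c_i c_j\,|x_i-x_j| + O(\rho^{-2})
   \qquad\text{for contrasts } c \text{ with } \textstyle\sum_i c_i = 0 .
\end{align*}
```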

Page 15: REML and residual likelihood

Comparison of variance models for eelworm expt

$Y(i)$ response for plot $i$: (log ratio of eelworm counts)
Block relation: $B(i, j) = 1$ if $i \sim j$ in same block
Distance relation: $D_{ij} = d(i, j)$; $V_{ij} = \exp(-D_{ij}/\rho)$
Take $\mathcal{K}$ = fumigant ∗ dose as kernel

Maximal model: $\mathrm{cov}(Y(i), Y(j)) = \sigma_0^2 \delta_{ij} + \sigma_1^2 B_{ij} + \sigma_2^2 V_{ij}$

$H_0$: $\sigma_1^2 = \sigma_2^2 = 0$
$H_1$: $\sigma_2^2 = 0$ (no spatial effect beyond blocks)
$H_2$: $\sigma_1^2 = 0$ (no block effect)

Log likelihood values: 6.47, 12.28, 20.53, 20.53 (both)

(Max 'always' occurs at $\rho \to \infty$; $V = -D$ is pos def on contrasts: $\mathcal{K} \supset \mathbf{1}$)

R syntax: regress(y~1, ~blk+V, kernel=K)
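
The remark that $V = -D$ is positive definite on contrasts can be probed numerically. The following is a sketch only: it uses a hypothetical 3 by 4 grid of plot centres (not the actual eelworm layout), Euclidean distances, and randomly generated contrasts. It illustrates the claim for one configuration and is not a proof.

```python
import math
import random


def neg_distance_form(points, c):
    """Quadratic form c' V c with V = -D, D_ij = Euclidean distance."""
    return -sum(ci * cj * math.dist(p, q)
                for ci, p in zip(c, points)
                for cj, q in zip(c, points))


def random_contrast(n, rng):
    """Random coefficient vector centred so that it sums to zero."""
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(raw) / n
    return [v - mean for v in raw]


rng = random.Random(1)
# Hypothetical 3 x 4 grid of plot centres (an assumption for illustration).
points = [(float(i), float(j)) for i in range(3) for j in range(4)]
forms = [neg_distance_form(points, random_contrast(len(points), rng))
         for _ in range(200)]

print(min(forms) > 0)                                        # positive on every sampled contrast
print(neg_distance_form(points, [1.0] * len(points)) < 0)    # but negative off the contrast space
```

The second line shows why the kernel must contain the constant vector: $-D$ is not positive definite on all of $\mathbb{R}^n$, only on the contrast subspace.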

Page 16: REML and residual likelihood

Treatment comparisons via likelihood

Fix covariance model at $\mathrm{cov}(Y) = \sigma_0^2 I_n + \sigma_2^2 V$ $(V = -D)$

Treatments: four fumigants and three dose levels including zero
Nullnull model: nothing has any effect ($X = \mathbf{1}$), dim 1
Null model: all fumigants equally effective: 1 + dose, dim 3
Alternative: fumigant*dose, dim 9

regress(y~1, ~V, kernel=~1)                   llik=14.4
regress(y~dose, ~V, kernel=~1)                llik=17.3
regress(y~dose, ~V, kernel=~dose)             llik=16.3
regress(y~fumigant:dose, ~V, kernel=~dose)    llik=26.7

Comparisons involving models having the same kernel
Default kernel is $\mathcal{K} = \mathcal{X}$: (REML)
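
The four log likelihood values pair up by kernel, giving two likelihood ratio statistics. The sketch below computes them from the quoted values. Referring each statistic to a chi-square distribution, with degrees of freedom equal to the difference in model dimension, is the standard asymptotic approximation and is an assumption here, not something stated on the slide. The helper `chi2_sf_even` is illustrative and uses the closed-form tail that is valid for even degrees of freedom.

```python
import math


def chi2_sf_even(x, df):
    """Upper tail of chi-square with even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df % 2 != 0:
        raise ValueError("the closed form used here needs even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


# (null llik, alternative llik, difference in model dimension), paired by kernel.
pairs = {
    "dose effect (kernel ~1)": (14.4, 17.3, 3 - 1),
    "fumigant differences (kernel ~dose)": (16.3, 26.7, 9 - 3),
}
for label, (l0, l1, df) in pairs.items():
    stat = 2.0 * (l1 - l0)
    print(label, round(stat, 1), df, chi2_sf_even(stat, df))
```

The values 17.3 and 16.3 belong to the same mean model under different kernels, which is why they are not compared with each other.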

Page 17: REML and residual likelihood

Marginal likelihood and kernel smoothing

$\begin{pmatrix} Y_0 \\ Y_1 \end{pmatrix} \sim N\!\left(0, \begin{pmatrix} \Sigma_{00} & \Sigma_{01} \\ \Sigma_{10} & \Sigma_{11} \end{pmatrix}\right)$
$Y_0 \mid Y_1 = y_1 \sim N(\Sigma_{01}\Sigma_{11}^{-1} y_1,\; \Sigma_{00} - \Sigma_{01}\Sigma_{11}^{-1}\Sigma_{10})$

Implications: observe $Y_1 = y_1$ only ($n$-component vector)
Predictive distn: mean $= \Sigma_{01}\Sigma_{11}^{-1} y_1$; cov $= W_{00}^{-1}$

Typical application: observe $(y_1, \ldots, y_n)$ at $(x_1, \ldots, x_n)$
$\Sigma_{ij} = \sigma_0^2 \delta_{ij} + \sigma_1^2 K(x_i, x_j)$, $K(x, x') = e^{-|x - x'|}$ or ...

Predictions: $E(Y(x^*) \mid \text{data}) = \sigma_1^2 \sum_{ij} K(x^*, x_i)\,(\Sigma^{-1})_{ij}\, y_j$
'smooth' fn of $x^*$ called kernel spline
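
The prediction formula is a linear solve followed by a weighted sum. The following is a minimal sketch of it, assuming a zero-mean model, the exponential kernel with a range parameter `rho`, and made-up data. The names `solve`, `kernel` and `predict` are illustrative and do not come from the `regress` package.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def kernel(u, v, rho=1.0):
    return math.exp(-abs(u - v) / rho)


def predict(x_star, xs, ys, sigma0_sq, sigma1_sq):
    """E(Y(x*) | data) = Sigma_01 Sigma_11^{-1} y for a zero-mean model."""
    n = len(xs)
    cov = [[sigma1_sq * kernel(xs[i], xs[j]) + (sigma0_sq if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    alpha = solve(cov, ys)                      # Sigma_11^{-1} y
    return sum(sigma1_sq * kernel(x_star, xs[i]) * alpha[i] for i in range(n))


xs = [0.0, 1.0, 2.5, 4.0]     # made-up design points
ys = [0.3, 0.5, 0.9, 0.7]     # made-up responses
print([predict(x, xs, ys, 0.0, 1.0) for x in xs])   # no noise: reproduces ys
print(predict(1.75, xs, ys, 0.1, 1.0))              # smoothed value between design points
```

With `sigma0_sq = 0` the predictor interpolates the data exactly. A positive `sigma0_sq` trades fidelity for smoothness.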

Page 18: REML and residual likelihood

[Figure: four panels plotting y[w] against x[w] (x axis from 2 to 10, y axis from 0.3 to 0.9), titled "C_1 spline: const mean model", "C_1 spline: linear mean model", "C_2 spline: linear mean model" and "C_2 spline: quadratic mean model".]

Page 19: REML and residual likelihood

R code:
library(regress)   # provides regress()
d <- abs(outer(x, x, "-")); rho <- 100   # pairwise distances; fixed range
K <- (1 + d/rho)*exp(-d/rho)
fit1 <- regress(y~1, ~K, kernel=1)
# best linear predictor at the observed x values
blp <- fit1$fitted + fit1$sigma[2] * K %*% fit1$W %*% (y-fit1$fitted)
plot(x, y, cex=0.5); lines(x, blp)

Example of an improper covariance function:
K3 <- d^3
fit3 <- regress(y~1+x+xsq, ~K3, kernel=~1+x)   # xsq is presumably x^2, defined beforehand
blp <- fit3$fitted + fit3$sigma[2] * K3 %*% fit3$W %*% (y-fit3$fitted)
plot(x, y, cex=0.5); lines(x, blp)

Page 20: REML and residual likelihood

The Box-Cox technique for transformation

Family of transformations $y \mapsto g_\lambda(y) = (y^\lambda - 1)/\lambda$
indexed by $\lambda$ and applied component-wise

Model: for some $\lambda$, $g_\lambda(Y) \sim N(X\beta, \Sigma)$
Density at $y \in \mathbb{R}^n$ is
$\det(W)^{1/2} \exp(-\tfrac{1}{2}\|g_\lambda(y) - X\beta\|_W^2) \times J_\lambda(y)\, dy$
$W = \Sigma^{-1}$, $J_\lambda(y) = \prod_i |g'_\lambda(y_i)|$

Log likelihood is
$\tfrac{1}{2}\log\det(W) - \tfrac{1}{2}\|g_\lambda(y) - X\beta\|_W^2 + \sum_i \log|g'_\lambda(y_i)|$

Profile log likelihood for $\lambda$ is
$\tfrac{1}{2}\log\det(W) - \tfrac{1}{2}\|g_\lambda(y)\|_{WQ}^2 + \sum_i \log|g'_\lambda(y_i)|$
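
For a concrete feel, the profile log likelihood can be evaluated on a grid of $\lambda$ values. The sketch below makes two simplifying assumptions that go beyond the slide: $X = \mathbf{1}$ (constant mean) and $\Sigma = \sigma^2 I_n$ with $\sigma^2$ also profiled out, which gives $-\tfrac{n}{2}\log(\mathrm{RSS}_\lambda/n) + (\lambda - 1)\sum_i \log y_i$ up to an additive constant. The data and the function names are made up.

```python
import math


def box_cox(y, lam):
    """g_lambda(y) = (y^lambda - 1)/lambda, with the log limit at lambda = 0."""
    if lam == 0.0:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]


def profile_loglik(y, lam):
    """Profile log likelihood for lambda, assuming X = 1 and Sigma = sigma^2 I,
    maximised over both the mean and sigma^2 (additive constants dropped)."""
    n = len(y)
    g = box_cox(y, lam)
    gbar = sum(g) / n
    rss = sum((v - gbar) ** 2 for v in g)
    log_jacobian = (lam - 1.0) * sum(math.log(v) for v in y)
    return -0.5 * n * math.log(rss / n) + log_jacobian


# Made-up positive data whose logarithms are symmetric about zero.
y = [math.exp(z) for z in (-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5)]
for lam in (-0.5, 0.0, 0.5, 1.0):
    print(lam, profile_loglik(y, lam))
```

The Jacobian term is what makes values at different $\lambda$ comparable: without it, the residual sum of squares alone would simply favour whichever transformation shrinks the data most.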

Page 21: REML and residual likelihood

Box-Cox and REML

Profile log likelihood for $\lambda$:
$\tfrac{1}{2}\log\det(W) - \tfrac{1}{2}\|g_\lambda(y)\|_{WQ}^2 + \sum_i \log|g'_\lambda(y_i)|$
$g_\lambda(y) = (y^\lambda - 1)/\lambda$, $g'_\lambda(y) = y^{\lambda - 1}$

REML likelihood: (Shi-Tsai, JRSSB, 2002)
$l(\lambda, W; y, X) = \tfrac{1}{2}\log\mathrm{Det}(WQ) - \tfrac{1}{2}\|g_\lambda(y)\|_{WQ}^2 + \sum_i \log|g'_\lambda(y_i)|$
...by adopting the results of Verbyla (1990) or Diggle (1994)...

Is this right/reasonable/OK?
(i) seems reasonable by analogy with REML to adjust for d.f.
(ii) but not a function of the residuals $Qy$
(iii) Put $X = I_n$: resid = 0 but $l(\lambda \ldots) = (\lambda - 1)\sum_i \log(y_i)$
... so it cannot be right!

Page 22: REML and residual likelihood

Box-Cox and REML, contd

Is there a right way to combine Box-Cox with REML?
No!
Why not?

Ans I:
Because the transformation $y \mapsto y^\lambda$, $\mathbb{R}^n \to \mathbb{R}^n$,
is not measurable with respect to $\mathcal{B}(\mathbb{R}^n/\mathcal{K})$
The transformation does not preserve cosets

Ans II:
Model says $Y^\lambda \sim N(\mu \in \mathcal{X}, \Sigma)$ or $Y \sim N(\mu, \Sigma, \lambda)$
Then $E(Y) \notin \mathcal{X}$ implies distn of $QY$ depends on $\mu$
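
The statement that the transformation does not preserve cosets can be seen in a tiny example. The sketch below assumes $\mathcal{K} = \mathrm{span}(\mathbf{1})$ and made-up values: two vectors that differ by a constant lie in the same coset, but their component-wise squares do not. The function names are illustrative.

```python
def power_transform(y, lam):
    """Component-wise power transformation y -> y^lambda."""
    return [v ** lam for v in y]


def same_coset_of_ones(u, v, tol=1e-12):
    """True if u - v is a constant vector, i.e. u and v lie in the same coset of span(1)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs) <= tol


y = [1.0, 2.0, 3.0]
y_shifted = [v + 1.0 for v in y]          # same coset as y when K = span(1)

print(same_coset_of_ones(y, y_shifted))                          # True
print(same_coset_of_ones(power_transform(y, 2),
                         power_transform(y_shifted, 2)))         # False
```

Since the coset of the transformed vector depends on which representative was transformed, no function of $y + \mathcal{K}$ alone can determine $y^\lambda + \mathcal{K}$ when $\lambda \ne 1$.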

Page 23: REML and residual likelihood

References

Bailey, R. (2008) Design of Comparative Experiments. Cambridge.
Box, G.E.P. and Cox, D.R. (1964) Analysis of transformations. JRSSB 211–252.
Harville, D.A. (1974) Bayesian variance components. Bka 61, 383–385.
Harville, D.A. (1977) Variance component estimation. JASA 72, 320–340.
Nelder, J.A. (1965) Orthogonal block structure. Proc Roy Soc A 283.
Patterson, H. and Thompson, R. (1971) Biometrika 58, 545–554.
Shi, P. and Tsai, C-L. (2002) Regression model selection: a residual likelihood approach. JRSSB.