REML and residual likelihood

Peter McCullagh
Department of Statistics, University of Chicago

Nelder Lecture, Imperial College, March 8, 2012
JAN: Some personal remarks...

IC 1974–1977: The MS/PhD program in Statistics
  Computing strategies: GLIM, ...
  Ordinal data and log-linear models ...
Chicago 1977–79: consulting work
IC 1979–1984:
  Plans for the GLM book I: London 1980–81
  Writing the GLM book II: Vancouver 1982
  Writing the GLM book III: London/Rothamsted 1982/83
  Toronto ASA Mtg 1984
Chicago 1985–1987:
  The second edition...
  Random effects models: the salamander data
Outline
1 Maximum likelihood
  REML and residual likelihood
  Likelihood ratios
2 Applications and examples
  Example I: fumigants for eelworm control
  Example II: kernel smoothing
  Box-Cox and REML
Symmetric functions
Estimation of moments/cumulants:
Thiele 1891; Fisher 1929; Dressel 1940; Tukey 1950
Y₁, ..., Yₙ iid, mean κ₁, variance κ₂, ...

Polynomial symmetric functions (n↓r = n(n−1)···(n−r+1)):
  k₁ = (Y₁ + ··· + Yₙ)/n for κ₁
  k₂ = Σ(Yᵢ − k₁)²/(n − 1) for κ₂
  k₁₁ = Σ_{i≠j} YᵢYⱼ/n↓2 = k₁² − k₂/n for ??
  k₃ = Σ(Yᵢ − k₁)³ · n²/n↓3 for κ₃
  k₂₁ = ...  k₁₁₁ = Σ_{i≠j≠k} YᵢYⱼYₖ/n↓3
  k₄ = ((n+1)Σ(Yᵢ − k₁)⁴/n − 3(n−1)(Σ(Yᵢ − k₁)²/n)²) · n³/n↓4
  k₃₁, k₂₂, k₂₁₁, k₁₁₁₁
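A small numerical sketch of the k-statistic formulas above, with made-up data: it checks k₂ against the stdlib sample variance, and verifies the identity k₁₁ = k₁² − k₂/n by brute-force enumeration of distinct pairs.

```python
from itertools import permutations

# made-up illustrative data
y = [2.0, 3.5, 1.0, 4.0, 2.5]
n = len(y)

k1 = sum(y) / n                                      # unbiased for kappa_1
k2 = sum((yi - k1) ** 2 for yi in y) / (n - 1)       # unbiased for kappa_2
# k3 uses the descending factorial n↓3 = n(n-1)(n-2)
k3 = sum((yi - k1) ** 3 for yi in y) * n ** 2 / (n * (n - 1) * (n - 2))

# k11 = sum over ordered distinct pairs YiYj / n↓2, equal to k1^2 - k2/n
k11_direct = sum(y[i] * y[j] for i, j in permutations(range(n), 2)) / (n * (n - 1))
k11_identity = k1 ** 2 - k2 / n

print(k1, k2, k3, k11_direct, k11_identity)
```
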
Maximum likelihood estimation
Design with n units/plots/subjects i = 1, ..., n
Covariate x(i) ≡ xᵢ in Rᵖ given
Response Y(i) = Yᵢ, a real number
Observation space Y ∈ S = Rⁿ
Covariate space X = span(X) ⊂ S
Linear model: for some β ∈ Rᵖ and σ² > 0,
  Y ∼ N(Xβ, σ²Iₙ)
Log likelihood function: l(β, σ; y) = −‖y − Xβ‖²/(2σ²) − n log σ
  β̂ = (X′X)⁻¹X′y;  µ̂ = Xβ̂;  σ̂² = ‖y − µ̂‖²/n
E(σ̂²) = (n − p)σ²/n: too small!
Conventional estimate s² = ‖y − µ̂‖²/(n − p)
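The bias factor above can be illustrated in the simplest design X = 1ₙ (intercept only, p = 1), where the ML estimate divides the residual sum of squares by n and the conventional estimate by n − p. A sketch with made-up data:

```python
# made-up illustrative data
y = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]
n, p = len(y), 1

mu_hat = sum(y) / n                       # beta-hat for X = 1_n
rss = sum((yi - mu_hat) ** 2 for yi in y)
sigma2_ml = rss / n                       # ML estimate: too small
s2 = rss / (n - p)                        # conventional unbiased estimate

# the two estimates differ by exactly the factor (n - p)/n
print(sigma2_ml / s2)
```
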
Residuals
One definition: R = Y − X(X′X)⁻¹X′Y = QY
Another definition: R′ = AY where ker(A) = X
But R′ = AR ..., so all definitions are equivalent
... for likelihood computations
Distributions: R ∼ N(0, σ²Q),  R′ ∼ N(0, σ²AA′)
Likelihoods? No density function for R
Variance-components estimation
Design with n units/plots/subjects i = 1, ..., n
Block factor relationship: B(i, j) = 1 if i ∼_B j (given)
Covariate x(i) ≡ xᵢ in Rᵖ (given treatment level)
Response Y(i) = Yᵢ, a real number
Linear model: for some β ∈ Rᵖ and σ₀², σ₁² > 0,
  Y ∼ N(Xβ, σ₀²Iₙ + σ₁²B)
mean µ = Xβ;  variance Σ = σ₀²Iₙ + σ₁²B;  W = Σ⁻¹
Log likelihood function: l(β, σ; y) = −½‖y − µ‖² − ½ log |Σ|
Sufficient statistics (balance and µ = 0):
  E(Y′Y) = tr(Σ) = nσ₀² + nσ₁²
  E(Y′BY) = tr(ΣB) = nσ₀² + σ₁² Σⱼ nⱼ²
(ML estimates: typically too small)
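In the balanced case the two moment equations above can be solved in closed form; a sketch, assuming b blocks of equal size m (so n = bm and Σⱼ nⱼ² = nm):

```latex
% Equating observed quadratic forms to their expectations,
%   E(Y'Y)  = n\sigma_0^2 + n\sigma_1^2,
%   E(Y'BY) = n\sigma_0^2 + nm\,\sigma_1^2,
% and solving the 2x2 linear system gives
\hat\sigma_1^2 = \frac{Y'BY - Y'Y}{n(m-1)}, \qquad
\hat\sigma_0^2 = \frac{Y'Y}{n} - \hat\sigma_1^2 .
```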
Gaussian likelihoods
Density of the Gaussian N(µ, Σ) distn at y ∈ Rⁿ:
  |W|^{1/2} exp(−½‖y − µ‖²) dy
(Rⁿ, W) regarded as an inner product space: W = Σ⁻¹, ⟨x, y⟩ = x′Wy
K ⊂ Rⁿ a subspace of dimension k spanned by cols of K
Orthogonal projections: P = K(K′WK)⁻¹K′W,  Q = I − P
A: a linear transformation with kernel K
Marginal likelihood based on AY ∼ N(Aµ, AΣA′) is
  |AΣA′|^{−1/2} exp(−½(y − µ)′A′(AΣA′)⁻¹A(y − µ))
Gaussian likelihoods contd.
Marginal likelihood based on AY ∼ N(Aµ, AΣA′) is
  |AΣA′|^{−1/2} exp(−½(y − µ)′A′(AΣA′)⁻¹A(y − µ))
Equivalent expressions:
  |W|^{1/2} / |K′WK|^{1/2} · exp(−½(y − µ)′WQ(y − µ))
  |W|^{1/2} |K′K|^{1/2} / |K′WK|^{1/2} · exp(−½(y − µ)′WQ(y − µ))
  Det^{1/2}(WQ) · exp(−½(y − µ)′WQ(y − µ))
Det(WQ) is the product of the n − k non-zero eigenvalues
Rⁿ/K regarded as an inner product space: ⟨x, y⟩ = x′WQy
REML and residual likelihood
Family of distributions: N(Xβ, Σ(θ)): β ∈ Rᵖ, θ ∈ Θ
Full log likelihood:
  l(β, Σ; y) = −½ log det(Σ) − ½(y − Xβ)′W(y − Xβ)
Profile log likelihood: β̂_θ = (X′WX)⁻¹X′Wy, W = Σ_θ⁻¹:
  l(β̂, Σ; y) = −½ log det(Σ) − ½ y′WQy
Residual: Y ↦ AY where ker(A) = X
Residual log likelihood:
  l(Σ; Qy) = −½ log det(Σ) − ½ log det(X′WX) − ½ y′WQy
           = ½ log Det(WQ) − ½ y′WQy + const(X)
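As a sanity check (not on the slide): specializing the residual log likelihood to the simplest case Σ = σ²Iₙ recovers the conventional unbiased estimator s².

```latex
% Sigma = sigma^2 I_n gives W = sigma^{-2} I_n and WQ = sigma^{-2} Q,
% with Q idempotent of rank n - p, so Det(WQ) = sigma^{-2(n-p)}:
l(\sigma^2; Qy) = \tfrac12\log\mathrm{Det}(WQ) - \tfrac12\,y'WQy
   = -\tfrac{n-p}{2}\log\sigma^2 - \frac{\|Qy\|^2}{2\sigma^2} + \text{const},
\qquad
\frac{\partial l}{\partial\sigma^2} = 0 \;\Rightarrow\;
\hat\sigma^2_{\mathrm{REML}} = \frac{\|Qy\|^2}{n-p} = s^2 .
```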
Summary: Marginal likelihood: K ≠ X
Model subspace X = {µ(β): β ∈ Rᵖ},  X = span(X)
Kernel subspace K = span(K)
Covariance matrix Σ_θ;  W = Σ⁻¹
Log likelihood based on observation y + K
 = log likelihood based on Ay where ker(A) = K:
  l(β, θ; y + K) = ½ log Det(WQ) − ½(y − µ)′WQ(y − µ)
where WQ = W(I − K(K′WK)⁻¹K′W) is the inner product on Rⁿ/K
Special cases:
  K = 0: ordinary likelihood
  K = 1ₙ = span(e₁ + ··· + eₙ): likelihood based on contrasts
  K = X: standard REML
  K = span(eₙ): likelihood with yₙ unobserved
Likelihood ratio tests
Simple likelihood ratio: P_θ(event)/P_θ′(event)
Maximized likelihood ratio:
  sup_{θ∈H_A} P_θ(event) / sup_{θ∈H₀} P_θ(event)
Event in numerator = event in denominator, usually dy
For marginal likelihood, event = dy + K
Marginal likelihood ratio statistic:
  sup_Θ P_θ(dy + K) / sup_{Θ′} P_θ(dy + K)
Same K in numerator and denominator
Example: Eelworm control using fumigants
[Field layout diagram: 48 plots in four blocks I–IV]

Actual field layout of 48 plots in four blocks. Experiment using
fumigants to control eelworms in oat field. (Bailey, 2008, p. 73).

Data (eelworm counts) from Cochran and Cox (1950, Table 3.1)

Blk 1 (I)           Blk 2 (IV)
269 283 252 212      95 127  80 134
138 100 197 263     107  89  41  74
282 230 216 145      88  25  42  62

Blk 3 (II)          Blk 4 (III)
124 211 194 222     193 209 109 153
102 193 128  42      29   9  17  19
162 191 107  67      23  19  44  48
Variance models: taking off from JAN (1965)
Block-structured effects: η ∼iid N(0, σ₁²), constant on each block
  Yᵢ = trt effects + η_b(i) + εᵢ
  cov(Yᵢ, Yⱼ) = σ₁²B(i, j) + σ₀²δᵢⱼ
Stationary isotropic spatial effect:
  η ∼ GP(0, σ₁²K),  cov(η(x), η(x′)) = σ₁²K(|x − x′|)
  Y(A) = trt effects + ∫_A η(x) dx + εᵢ
  cov(Yᵢ, Yⱼ) = σ₁²K̄(xᵢ, xⱼ) + σ₀²δᵢⱼ
K(x, x′) = exp(−|x − x′|/ρ) with range ρ > 0 for illustration
In practice, ρ̂ = ∞:  K(x, x′) = −|x − x′|
Comparison of variance models for eelworm expt
Y(i): response for plot i (log ratio of eelworm counts)
Block relation: B(i, j) = 1 if i ∼ j in same block
Distance relation: Dᵢⱼ = d(i, j);  Vᵢⱼ = exp(−Dᵢⱼ/ρ)
Take K = fumigant ∗ dose as kernel
Maximal model: cov(Y(i), Y(j)) = σ₀²δᵢⱼ + σ₁²Bᵢⱼ + σ₂²Vᵢⱼ
  H₀: σ₁² = σ₂² = 0
  H₁: σ₂² = 0 (no spatial effect beyond blocks)
  H₂: σ₁² = 0 (no block effect)
Log likelihood values: 6.47, 12.28, 20.53, 20.53 (both)
(Max ‘always’ occurs at ρ → ∞; V = −D is pos def on contrasts: K ⊃ 1)
R syntax: regress(y~1, ~blk+V, kernel=K)
Treatment comparisons via likelihood
Fix covariance model at cov(Y) = σ₀²Iₙ + σ₂²V  (V = −D)
Treatments: four fumigants and three dose levels including zero
  Null-null model: nothing has any effect (X = 1), dim 1
  Null model: all fumigants equally effective: 1 + dose, dim 3
  Alternative: fumigant*dose, dim 9

regress(y~1, ~V, kernel=~1)                 llik = 14.4
regress(y~dose, ~V, kernel=~1)              llik = 17.3
regress(y~dose, ~V, kernel=~dose)           llik = 16.3
regress(y~fumigant:dose, ~V, kernel=~dose)  llik = 26.7

Comparisons involve only models having the same kernel
Default kernel is K = X (REML)
Marginal likelihood and kernel smoothing
(Y₀, Y₁) ∼ N(0, Σ) with Σ partitioned as (Σ₀₀ Σ₀₁; Σ₁₀ Σ₁₁)
  Y₀ | Y₁ = y₁ ∼ N(Σ₀₁Σ₁₁⁻¹y₁, Σ₀₀ − Σ₀₁Σ₁₁⁻¹Σ₁₀)
Implications: observe Y₁ = y₁ only (n-component vector)
Predictive distn: mean = Σ₀₁Σ₁₁⁻¹y₁;  cov = W₀₀⁻¹
Typical application: observe (y₁, ..., yₙ) at (x₁, ..., xₙ)
  Σᵢⱼ = σ₀²δᵢⱼ + σ₁²K(xᵢ, xⱼ),  K(x, x′) = e^{−|x−x′|} or ...
Predictions: E(Y(x∗) | data) = Σᵢⱼ K(x∗, xᵢ)(Σ⁻¹)ᵢⱼ yⱼ
  a ‘smooth’ fn of x∗, called a kernel spline
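A minimal numerical sketch of the predictive mean above, assuming a zero-mean process, the exponential kernel, and made-up data; `solve` and `kernel_spline` are hypothetical helper names, the linear solve is a tiny Gaussian elimination to keep the example dependency-free, and the variances σ₀², σ₁² are simply assumed rather than estimated.

```python
import math

def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def kernel_spline(xs, ys, xstar, s0=0.1, s1=1.0):
    """Predictive mean k(x*)' Sigma11^{-1} y with
    Sigma11 = s0 I + s1 K and kernel K(x, x') = exp(-|x - x'|)."""
    n = len(xs)
    S11 = [[s1 * math.exp(-abs(xs[i] - xs[j])) + (s0 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    w = solve(S11, ys)                       # Sigma11^{-1} y
    return sum(s1 * math.exp(-abs(xstar - xs[j])) * w[j] for j in range(n))

# made-up illustrative data
xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.3, 0.5, 0.4, 0.8]
print(kernel_spline(xs, ys, 2.5))
```

As σ₀² → 0 the predictor interpolates the data exactly; σ₀² > 0 shrinks the fit toward a smooth curve.
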
[Figure: four kernel-spline fits to the same scatterplot (x from 2 to 10, y from 0.3 to 0.9):
 C_1 spline, constant mean model; C_1 spline, linear mean model;
 C_2 spline, linear mean model; C_2 spline, quadratic mean model]
R code:
d <- abs(outer(x, x, "-")); rho <- 100
K <- (1 + d/rho) * exp(-d/rho)    # C_1 kernel
fit1 <- regress(y~1, ~K, kernel=1)
blp <- fit1$fitted + fit1$sigma[2] * K %*% fit1$W %*% (y - fit1$fitted)
plot(x, y, cex=0.5); lines(x, blp)

Example of an improper covariance function:
K3 <- d^3                         # C_2 kernel
xsq <- x^2                        # quadratic term for the mean model
fit3 <- regress(y~1+x+xsq, ~K3, kernel=~1+x)
blp <- fit3$fitted + fit3$sigma[2] * K3 %*% fit3$W %*% (y - fit3$fitted)
plot(x, y, cex=0.5); lines(x, blp)
The Box-Cox technique for transformation
Family of transformations y ↦ g_λ(y) = (y^λ − 1)/λ,
indexed by λ and applied component-wise
Model: for some λ, g_λ(Y) ∼ N(Xβ, Σ)
Density at y ∈ Rⁿ is
  det(W)^{1/2} exp(−½‖g_λ(y) − Xβ‖²_W) × J_λ(y) dy
  W = Σ⁻¹,  J_λ(y) = Π |g′_λ(yᵢ)|
Log likelihood is
  ½ log det(W) − ½‖g_λ(y) − Xβ‖²_W + Σ log |g′_λ(yᵢ)|
Profile log likelihood for λ is
  ½ log det(W) − ½‖g_λ(y)‖²_WQ + Σ log |g′_λ(yᵢ)|
Box-Cox and REML
Profile log likelihood for λ:
  ½ log det(W) − ½‖g_λ(y)‖²_WQ + Σ log |g′_λ(yᵢ)|
  g_λ(y) = (y^λ − 1)/λ,  g′_λ(y) = y^{λ−1}
REML likelihood (Shi and Tsai, JRSSB, 2002):
  l(λ, W; y, X) = ½ log Det(WQ) − ½‖g_λ(y)‖²_WQ + Σ log |g′_λ(yᵢ)|
...by adopting the results of Verbyla (1990) or Diggle (1994)...
Is this right/reasonable/OK?
(i) seems reasonable by analogy with REML to adjust for d.f.
(ii) but not a function of the residuals Qy
(iii) put X = Iₙ: resid = 0 but l(λ, ...) = (λ − 1) Σ log(yᵢ)
... so it cannot be right!
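Point (iii) can be written out in one line (with Det taken over the non-zero eigenvalues, of which there are none when Q = 0):

```latex
% With X = I_n the projection is the identity:
%   P = X(X'WX)^{-1}X'W = I_n,  Q = 0,
% so log Det(WQ) = 0 and \|g_\lambda(y)\|^2_{WQ} = 0, leaving
l(\lambda, W;\, y, I_n) = \sum_i \log|g'_\lambda(y_i)|
                        = (\lambda - 1)\sum_i \log y_i ,
% which is monotone and unbounded in \lambda whenever
% \sum_i \log y_i \ne 0, even though the residual Qy = 0
% carries no information about \lambda.
```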
Box-Cox and REML, contd
Is there a right way to combine Box-Cox with REML? No!
Why not?
Ans I: because the transformation y ↦ y^λ: Rⁿ → Rⁿ
  is not measurable with respect to B(Rⁿ/K);
  the transformation does not preserve cosets
Ans II: model says Y^λ ∼ N(µ ∈ X, Σ), or Y ∼ N(µ, Σ, λ)
  Then E(Y) ∉ X implies the distn of QY depends on µ
References
Bailey, R.A. (2008) Design of Comparative Experiments. Cambridge University Press.
Box, G.E.P. and Cox, D.R. (1964) An analysis of transformations. JRSS B 26, 211–252.
Harville, D.A. (1974) Bayesian inference for variance components using only error contrasts. Biometrika 61, 383–385.
Harville, D.A. (1977) Maximum likelihood approaches to variance component estimation. JASA 72, 320–340.
Nelder, J.A. (1965) The analysis of randomized experiments with orthogonal block structure. Proc. Roy. Soc. A 283.
Patterson, H.D. and Thompson, R. (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554.
Shi, P. and Tsai, C.-L. (2002) Regression model selection: a residual likelihood approach. JRSS B 64.