Generalized linear models II: Exponential families
Peter McCullagh
Department of Statistics, University of Chicago
Polokwane, South Africa, November 2013
Outline
Components of a GLM
Exponential families
Real exponential families
Maximum likelihood fitting
Parameter estimation
Components of a generalized linear model
- Observation: $Y \in \mathbb{R}^n$ with independent components (a very strong simplifying assumption)
- Distribution: exponential family, $Y_i \sim EF(\theta_i)$, with mean-value parameter $\mu_i = E(Y_i)$; includes Poisson, binomial, exponential, hypergeometric, ...
- Linear part: $\eta = X\beta$; $\eta \in \mathcal{X} \subset \mathbb{R}^n$, e.g. a factorial model
- Link function: $\eta_i = g(\mu_i)$ (component-wise)
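Construction of an exponential family
(i) Observation space: $y \in \mathcal{S}$ ($\mathcal{S} = \mathbb{R}$)
(ii) Baseline distribution with density $f_0(y)$ on $\mathcal{S}$
(iii) Real-valued statistic $S(y)$
(iv) Moment generating function of the statistic $S(\cdot)$:
$$M_0(\theta) = \int_{\mathcal{S}} e^{\theta S(y)} \times f_0(y)\, dy$$
(v) $\Theta = \{\theta : M_0(\theta) < \infty\}$ (parameter space)
(vi) $K_0(\theta) = \log M_0(\theta)$ is the cumulant generating function
(vii) Weighted distribution, for $\theta \in \Theta$:
$$f_\theta(y) = \frac{e^{\theta S(y)} f_0(y)}{M_0(\theta)} = e^{\theta S(y) - K_0(\theta)} \cdot f_0(y)$$
(viii) Support of $f_\theta$ = support of $f_0$
Simplest type (natural exponential families): $\mathcal{S} = \mathbb{R}$ and $S(y) = y$.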
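A minimal sketch of steps (iv) to (vii) for a baseline on a finite set, where the integral becomes a finite sum. The function name `tilt` and the value of `theta` are illustrative assumptions, not part of the slides.

```python
import math

def tilt(baseline, theta, stat=lambda y: y):
    """Exponentially tilt a discrete baseline {y: f0(y)} by theta.

    Returns (f_theta, K0), where f_theta is the weighted distribution
    and K0 = log M0(theta) is the cumulant generating function value.
    """
    weights = {y: math.exp(theta * stat(y)) * p for y, p in baseline.items()}
    m0 = sum(weights.values())            # M0(theta); always finite for a finite support
    f_theta = {y: w / m0 for y, w in weights.items()}
    return f_theta, math.log(m0)

# Baseline Ber(1/2): tilting should give Ber(pi) with pi = e^theta / (1 + e^theta)
theta = 0.7                               # hypothetical parameter value
f_theta, k0 = tilt({0: 0.5, 1: 0.5}, theta)
pi = math.exp(theta) / (1 + math.exp(theta))
```

The tilted distribution keeps the support of the baseline, as in (viii), because every weight is strictly positive.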
Some properties of the family
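Moment generating function of $S(\cdot)$ under $f_\theta$:
$$M_\theta(t) = \int_{\mathcal{S}} e^{tS(y)} f_\theta(y)\, dy = \int_{\mathcal{S}} e^{tS(y)}\, \frac{e^{\theta S(y)} f_0(y)}{M_0(\theta)}\, dy = \frac{M_0(t + \theta)}{M_0(\theta)}$$
$$K_\theta(t) = K_0(\theta + t) - K_0(\theta)$$
The $r$th cumulant of $S$ under $f_\theta$ is $K_\theta^{(r)}(0) = K_0^{(r)}(\theta)$
Mean: $E_\theta(S) = K_0'(\theta)$
Variance: $\mathrm{var}_\theta(S) = K_0''(\theta) \ge 0$
$K_0(\cdot)$ is a convex function on $\Theta$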
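The mean and variance identities can be checked numerically. The sketch below assumes a hypothetical uniform baseline on {0, 1, 2, 3} with S(y) = y, and compares finite-difference derivatives of K0 with the moments of the tilted distribution.

```python
import math

baseline = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}   # hypothetical baseline f0

def K0(theta):
    """Cumulant generating function log M0(theta) of the baseline."""
    return math.log(sum(math.exp(theta * y) * p for y, p in baseline.items()))

def moments(theta):
    """Mean and variance of S(y) = y under the tilted distribution f_theta."""
    f = {y: math.exp(theta * y - K0(theta)) * p for y, p in baseline.items()}
    mean = sum(y * q for y, q in f.items())
    var = sum((y - mean) ** 2 * q for y, q in f.items())
    return mean, var

theta, h = 0.4, 1e-4
k1 = (K0(theta + h) - K0(theta - h)) / (2 * h)                 # approximates K0'(theta)
k2 = (K0(theta + h) - 2 * K0(theta) + K0(theta - h)) / h ** 2  # approximates K0''(theta)
mean, var = moments(theta)
```

Up to finite-difference error, `k1` agrees with `mean` and `k2` with `var`; the positivity of `k2` is the convexity of K0.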
Examples of real exponential families
Real: observation space is S = R
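Gaussian family:
Baseline: $f_0(y) = \exp(-y^2/2)/\sqrt{2\pi}$
MGF: $M_0(\theta) = \exp(\theta^2/2)$
CGF: $K_0(\theta) = \theta^2/2$
Exponentially weighted distribution:
$$f_\theta(y) = e^{\theta y - \theta^2/2}\, e^{-y^2/2}/\sqrt{2\pi} = e^{-(y-\theta)^2/2}/\sqrt{2\pi}$$
Initial distribution $N(0,1)$:
Exponential family $\{N(\theta, 1) : \theta \in \mathbb{R}\}$, all with unit variance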
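The Poisson family
Baseline distribution: Po(1)
$$f_0(y) = \frac{\exp(-1)}{y!}, \qquad y = 0, 1, \ldots$$
Generating functions:
$$M_0(\theta) = \sum_{y=0}^{\infty} \frac{e^{\theta y} e^{-1}}{y!} = \exp(e^\theta - 1)$$
$$K_0(\theta) = e^\theta - 1$$
All cumulants are equal to one; the $r$th moment is $B_r$ (Bell number)
$\Theta = \{\theta : M_0(\theta) < \infty\} = \mathbb{R}$
Exponential family:
$$f_\theta(y) = \frac{\exp(y\theta - e^\theta + 1)\, e^{-1}}{y!} = \frac{e^{y\theta - e^\theta}}{y!} = \frac{\mu^y e^{-\mu}}{y!}$$
Initial baseline distribution Po(1) on the non-negative integers:
Exponential family $\{\mathrm{Po}(\mu) : \mu > 0\}$, $\mu = e^\theta$
The $r$th cumulant of $\mathrm{Po}(\mu)$ is $K_0^{(r)}(\theta) = \mu$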
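The claim that the rth moment of Po(1) is the Bell number B_r can be checked with a short sketch. The series truncation at 60 terms and the helper names are assumptions made for illustration.

```python
import math

def poisson1_moment(r, terms=60):
    """r-th moment of Po(1), truncating the series sum_y y^r e^{-1} / y!."""
    return sum(y ** r * math.exp(-1) / math.factorial(y) for y in range(terms))

def bell_numbers(count):
    """First `count` Bell numbers B_0, B_1, ... computed with the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(count - 1):
        new_row = [row[-1]]              # each row starts with the last entry of the previous row
        for x in row:
            new_row.append(new_row[-1] + x)
        row = new_row
        bells.append(row[0])
    return bells

moments = [poisson1_moment(r) for r in range(7)]
bells = bell_numbers(7)
```

Here `bells` starts at B_0, so its entries are 1, 1, 2, 5, 15, 52, 203, and `moments` matches them to floating-point accuracy.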
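The Bernoulli family
Baseline distribution: Bernoulli coin-toss
$$f_0(y) = 1/2 \quad \text{for } y = 0, 1$$
Generating functions:
$$M_0(\theta) = (e^{\theta \cdot 0} + e^{\theta \cdot 1})/2 = (1 + e^\theta)/2$$
$$K_0(\theta) = \log(1 + e^\theta) - \log 2$$
$\Theta = \{\theta : M_0(\theta) < \infty\} = \mathbb{R}$
Exponential family:
$$f_\theta(y) = \begin{cases} 1/(1 + e^\theta) & y = 0 \\ e^\theta/(1 + e^\theta) & y = 1 \end{cases}$$
$\pi = e^\theta/(1 + e^\theta)$ is the mean of $f_\theta$
Initial baseline distribution Ber(1/2) on $\{0, 1\}$
Exponential family: $\{\mathrm{Ber}(\pi) : 0 < \pi < 1\}$ on $\{0, 1\}$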
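The binomial family
Baseline distribution: Binomial($m$, 1/2):
$$f_0(y; m) = \binom{m}{y} 2^{-m} \qquad (0 \le y \le m)$$
Generating functions:
$$M_0(\theta) = 2^{-m} \sum_{y=0}^{m} \binom{m}{y} e^{\theta y} = (1 + e^\theta)^m / 2^m$$
$$K_0(\theta) = m \log(1 + e^\theta) - m \log 2$$
Exponential family:
$$f_\theta(y; m) = \binom{m}{y} \frac{e^{\theta y}}{(1 + e^\theta)^m} = \binom{m}{y} \pi^y (1 - \pi)^{m-y}$$
$\Theta = \mathbb{R}$, $\pi = e^\theta/(1 + e^\theta)$, $0 < \pi < 1$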
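The Ewens family
Distribution on permutations $[n] \to [n]$: $\mathcal{S} = S_n$
$[n] = \{1, \ldots, n\}$, $\#S_n = n!$
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 1 & 2 & 3 & 7 & 6 & 5 \end{pmatrix} = (1,4,3,2)(5,7)(6)$$
$\#\sigma = 3$ (number of cycles)
Baseline distribution: $f_0(\sigma) = 1/n!$
Generating function: $\sum_\sigma \alpha^{\#\sigma} = \alpha^{\uparrow n}$ (Euler)
$$\alpha^{\uparrow n} = \alpha(\alpha + 1) \cdots (\alpha + n - 1)$$
Generating functions:
$$M_0(\theta) = \sum_\sigma e^{\theta \#\sigma} f_0(\sigma) = \alpha^{\uparrow n}/n! \qquad (\alpha = e^\theta)$$
$$K_0(\theta) = \log(\alpha^{\uparrow n}) - \log(n!)$$
Weighted distribution on permutations:
$$f_\theta(\sigma) = \frac{\alpha^{\#\sigma}}{\alpha^{\uparrow n}}$$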
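A brute-force sketch of the generating-function identity: enumerate all permutations of a small [n], count cycles, and compare the sum of α^#σ with the ascending factorial. The values of `n` and `alpha` are hypothetical, and labels are shifted to start at 0.

```python
from itertools import permutations
from math import isclose

def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple with i -> perm[i] (0-based)."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:        # walk around the cycle containing `start`
                seen.add(j)
                j = perm[j]
    return cycles

def rising_factorial(alpha, n):
    """alpha (alpha + 1) ... (alpha + n - 1)."""
    result = 1.0
    for k in range(n):
        result *= alpha + k
    return result

n, alpha = 5, 1.8                        # hypothetical values
total = sum(alpha ** cycle_count(p) for p in permutations(range(n)))
probs_sum = total / rising_factorial(alpha, n)

# The example permutation from the slide, with labels shifted to 0-based
example = tuple(v - 1 for v in (4, 1, 2, 3, 7, 6, 5))
example_cycles = cycle_count(example)
```

Enumeration costs n! steps, so this is only a check for small n, not a way to compute with the family.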
The Ewens family on set partitions
Partition of [7] into blocks: 1|23|4567 or 1234|57|6 or ...
Partitions of $[n]$ (bracketed numbers count the partitions of each shape):
n = 2: 12, 1|2
n = 3: 123, 12|3, 13|2, 23|1, 1|2|3
n = 4: 1234, 123|4 [4], 12|34 [3], 12|3|4 [6], 1|2|3|4
$\#\mathcal{P}_n$: 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, ...
Make each permutation cycle into a block
Induced marginal distribution on set partitions:
$$f_n(\sigma) = \frac{\alpha^{\#\sigma}}{\alpha^{\uparrow n}} \prod_{b \in \sigma} (\#b - 1)!$$
Also an exponential family:
Canonical statistic $\#\sigma$ (number of blocks)
Can talk of the Ewens distribution of $\#\sigma$ on $\{1, \ldots, n\}$
Cumulant function:
$$K(\theta) = \log\big(e^\theta (e^\theta + 1) \cdots (e^\theta + n - 1)\big)$$
versus $n \log(1 + e^\theta)$ for binomial
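A small sketch that enumerates set partitions recursively, reproduces the first few partition counts, and checks that the induced Ewens probabilities sum to one. The helper names and the values of `n` and `alpha` are assumptions for illustration.

```python
from math import factorial, isclose

def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # place `first` in each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or in a new singleton block
        yield [[first]] + part

def ewens_partition_prob(part, alpha, n):
    """alpha^(#blocks) * prod (block size - 1)! divided by the ascending factorial."""
    rising = 1.0
    for k in range(n):
        rising *= alpha + k
    weight = alpha ** len(part)
    for block in part:
        weight *= factorial(len(block) - 1)
    return weight / rising

n, alpha = 4, 2.5                        # hypothetical values
parts = list(set_partitions(list(range(1, n + 1))))
total_prob = sum(ewens_partition_prob(p, alpha, n) for p in parts)
counts = [sum(1 for _ in set_partitions(list(range(m)))) for m in range(1, 7)]
```

The factor (#b - 1)! is the number of cyclic orderings of a block, which is why summing over partitions recovers the sum over permutations.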
Regression models
Sample: $n$ individuals or subjects $i = 1, \ldots, n$
Covariate $x_i$ for individual $i$
Response $Y_i$ for individual $i$
Distributional assumptions (exponential family):
density $f_i(y) = \exp(y\theta_i - K(\theta_i)) \times f_0(y)$
independent for $i \ne j$
$\mu_i = E(Y_i) = K'(\theta_i)$; $\mathrm{var}(Y_i) = K''(\theta_i) = V(\mu_i)$
Model for vector $\mu$ as a function of $X$
$\mu =$
Convolution and dispersion parameter
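Suppose $Y_1, \ldots, Y_m$ are iid with density $f_\theta(y) = e^{\theta y - K_0(\theta)} f_0(y)$
What is the distribution of the sample mean $\bar{Y}$?
Answer first for $\theta = 0$: $f_0^{(m)}(y)$
In general: $e^{m(\theta y - K_0(\theta))} \times f_0^{(m)}(y)$
Suggests introducing a dispersion parameter $\sigma^2 = 1/\nu$:
$$f_\theta(y) = e^{\nu(\theta y - K_0(\theta))} \times f_0(y; \nu)$$
$\nu$ is the effective sample size
Mean is $E(Y) = K_0'(\theta)$, independent of $\nu$
$\mathrm{var}(Y) = K_0''(\theta)/\nu = \sigma^2 K_0''(\theta)$
$\sigma^2 = 1/\nu$ is the relative variance
$\theta, \nu$ are orthogonal parameters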
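Example: the gamma family
Standard gamma distribution:
$$f(y; \nu) = y^{\nu-1} e^{-y}/\Gamma(\nu); \qquad y > 0;\ \nu > 0$$
CGF: $K(t) = -\nu \log(1 - t) = \nu(t + t^2/2 + t^3/3 + t^4/4 + \cdots)$
Sum of exponentials: $E(Y) = \nu$; $\mathrm{var}(Y) = \nu$
Standard 2-parameter gamma distribution:
$$\lambda^\nu y^{\nu-1} e^{-\lambda y}/\Gamma(\nu)$$
$K(t) = -\nu \log(1 - t/\lambda) = \nu(t/\lambda + t^2/(2\lambda^2) + \cdots)$
Mean $\mu = \nu/\lambda$; variance $\nu/\lambda^2$
Parameterization for GLMs:
$$\frac{\nu^\nu y^{\nu-1} e^{-\nu y/\mu}}{\mu^\nu\, \Gamma(\nu)}$$
$E(Y) = \mu$; $\mathrm{var}(Y) = \mu^2/\nu$; $1/\nu = \mathrm{var}(Y)/\mu^2$
GLM assumption: $\nu$ constant (constant c.v.); $\mu_i$ depends on $x_i$
Exponential family parameterization: $\theta_i = -1/\mu_i$
Alternative non-GLM models:
$\nu_i = \nu$ constant and $g(\lambda_i) = x_i'\beta$
$\lambda_i = \lambda$ constant and $\log \nu_i = x_i'\beta$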
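A worked check that the GLM parameterization of the gamma density has the dispersion form $e^{\nu(\theta y - K(\theta))} f_0(y; \nu)$ with $\theta = -1/\mu$. The cumulant function $K(\theta) = -\log(-\theta)$ and the factor $f_0(y; \nu)$ below are derived here rather than stated on the slides, and $f_0(y; \nu)$ is a baseline factor, not a normalized density, because $\theta = 0$ lies outside the parameter space.

```latex
\begin{align*}
\frac{\nu^\nu y^{\nu-1} e^{-\nu y/\mu}}{\mu^\nu\,\Gamma(\nu)}
  &= \exp\{\nu(-y/\mu - \log\mu)\} \times \frac{\nu^\nu y^{\nu-1}}{\Gamma(\nu)} \\
  &= \exp\{\nu(\theta y - K(\theta))\} \times f_0(y;\nu),
  \qquad \theta = -1/\mu,\quad K(\theta) = -\log(-\theta), \\
K'(\theta) &= -1/\theta = \mu, \\
K''(\theta) &= 1/\theta^2 = \mu^2 = V(\mu),
  \qquad \operatorname{var}(Y) = K''(\theta)/\nu = \mu^2/\nu .
\end{align*}
```

The parameter space is $\theta < 0$, matching $\mu > 0$.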
Generalized linear models: Key ideas
$Y_1, \ldots, Y_n$ are assumed independent
Density function of $Y_i$ at $y$:
$$e^{\nu_i(\theta_i y - K_0(\theta_i))} \times f_0(y; \nu_i)$$
Two-parameter family $(\theta_i, \nu_i)$ such that
$\mu_i = E(Y_i) = K_0'(\theta_i)$; $\mathrm{var}(Y_i) = K_0''(\theta_i)/\nu_i = V(\mu_i)/\nu_i$
The variance function $V(\mu)$ is a characteristic of the family
$V(\mu) = \mu$ for Poisson; $V(\mu) = \mu^2$ for gamma
GLM assumptions:
$\nu_i = \nu$ (constant relative variance)
$g(\mu_i) = \eta_i = x_i'\beta$; $g(\mu) = X\beta$, where $X$ is the design matrix
Link function $g$ is part of the specification
Parameters to be estimated (learned): $(\beta, \nu)$
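Estimation of $\beta$ by maximum likelihood
Log likelihood derivatives w.r.t. $\beta$ (vector/matrix form):
$$\frac{\partial l}{\partial \beta} = \nu X'W\left(\frac{Y - \mu}{d\mu/d\eta}\right)$$
$$\frac{\partial^2 l}{\partial \beta^2} = -\nu X'WX + \text{terms of zero mean}$$
$$W = \mathrm{diag}\{(d\mu_i/d\eta_i)^2/V_i\}$$
Fisher modification of Newton-Raphson scheme gives
$$(\hat\beta - \beta_0) = (X'W_0X)^{-1} X'W_0\left(\frac{Y - \mu_0}{d\mu/d\eta}\right)$$
$$(X'W_0X)\hat\beta = X'W_0X\beta_0 + X'W_0\left(\frac{Y - \mu_0}{d\mu/d\eta}\right)$$
Entire Fisher-Newton-Raphson (F-N-R) sequence is independent of $\nu \equiv 1/\sigma^2$
Asymptotic moments of $\hat\beta$:
$$E(\hat\beta) = \beta + O(n^{-1})$$
$$\mathrm{cov}(\hat\beta) = (X'WX)^{-1}/\nu = \sigma^2 (X'WX)^{-1}$$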
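A minimal sketch of the Fisher scoring iteration for one special case: a Poisson model with log link and a single covariate plus intercept. For this case dμ/dη = μ and V(μ) = μ, so W = diag(μ). The toy data, the function names, the fixed iteration count and the 2x2 solver are all assumptions made to keep the example free of external libraries.

```python
import math

def solve_2x2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def fit_poisson_log(x, y, iterations=25):
    """Fisher scoring for the Poisson log-linear model eta_i = b0 + b1 * x_i."""
    beta = [math.log(sum(y) / len(y)), 0.0]    # start from the intercept-only fit
    for _ in range(iterations):
        xtwx = [[0.0, 0.0], [0.0, 0.0]]
        xtwz = [0.0, 0.0]
        for xi, yi in zip(x, y):
            row = (1.0, xi)
            eta = beta[0] + beta[1] * xi
            mu = math.exp(eta)
            w = mu                       # (dmu/deta)^2 / V(mu) = mu^2 / mu
            z = eta + (yi - mu) / mu     # x_i' beta0 + (y - mu0) / (dmu/deta)
            for r in range(2):
                xtwz[r] += row[r] * w * z
                for c in range(2):
                    xtwx[r][c] += row[r] * w * row[c]
        beta = solve_2x2(xtwx, xtwz)     # (X'W0X) beta = X'W0 z
    return beta

# Hypothetical toy data
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 2, 6, 9, 14]
beta = fit_poisson_log(x, y)
mu = [math.exp(beta[0] + beta[1] * xi) for xi in x]
# Score X'(y - mu) for the canonical link; it should vanish at the fitted beta
score = [sum(yi - mi for yi, mi in zip(y, mu)),
         sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))]
```

The dispersion ν never enters the update, which is the sense in which the sequence is independent of ν. A production fit would test convergence rather than run a fixed number of iterations.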
Dispersion estimation
In certain applications $\nu = 1$ is 'known': ignore this frame
Otherwise, the F-N-R sequence produces $\hat\beta$, $\hat\eta = X\hat\beta$, $\hat\mu = g^{-1}(\hat\eta)$
Dispersion = relative variance: $\sigma^2 = \mathrm{var}(Y_i)/V(\mu_i)$
Natural moment estimate:
$$\tilde\sigma^2 = \frac{1}{n - p} \sum_i \frac{(Y_i - \hat\mu_i)^2}{V(\hat\mu_i)} = \frac{X^2}{n - p}$$
$p = \mathrm{rank}(X)$; $X^2$ is the generalized Pearson statistic
$\tilde\sigma^2$ is consistent, but not the mle
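The moment estimate is a one-line computation once the fitted means are available. In this sketch the responses, the fitted means and p = 2 are hypothetical numbers, and the variance function is passed in as an argument.

```python
def pearson_dispersion(y, mu, variance, p):
    """Return (X^2 / (n - p), X^2), with X^2 = sum (y_i - mu_i)^2 / V(mu_i)."""
    x2 = sum((yi - mi) ** 2 / variance(mi) for yi, mi in zip(y, mu))
    return x2 / (len(y) - p), x2

# Hypothetical responses and fitted means from some model with p = 2 parameters
y = [2.0, 3.0, 6.0, 7.0, 12.0]
mu = [2.5, 3.5, 5.0, 7.5, 11.5]
sigma2_poisson, x2_poisson = pearson_dispersion(y, mu, lambda m: m, p=2)      # V(mu) = mu
sigma2_gamma, x2_gamma = pearson_dispersion(y, mu, lambda m: m * m, p=2)      # V(mu) = mu^2
```

The same residuals give different estimates under different variance functions, since V(μ) sets the scale against which each squared residual is measured.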
R defaults
R defaults in summary(fit) and vcov(fit):
$\sigma^2 = 1$ for Poisson and binomial
$\sigma^2 = X^2/(n - p)$ otherwise (normal, gamma, ...)
Arguably these are the right defaults, ...
Sometimes, but not always, appropriate
Over-riding the defaults:
summary(glm(y~..., family=poisson()), dispersion=4.7)
summary(glm(y~..., family=Gamma(link=log)), dispersion=1)
The deviance function