Generalized linear models II Exponential families Peter McCullagh Department of Statistics University of Chicago Polokwane, South Africa November 2013




Outline

Components of a GLM

Exponential families

Real exponential families

Maximum likelihood fitting

Parameter estimation


Components of a generalized linear model

• Observation $Y \in \mathbb{R}^n$ with independent components (a very strong simplifying assumption)
• Distribution: exponential family, $Y_i \sim \mathrm{EF}(\theta_i)$, with mean-value parameter $\mu_i = E(Y_i)$; includes Poisson, binomial, exponential, hypergeometric, ...
• Linear part: $\eta = X\beta$; $\eta \in \mathcal{X} \subset \mathbb{R}^n$; e.g. factorial model, ...
• Link function: $\eta_i = g(\mu_i)$ (component-wise)
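The linear part and the link function can be illustrated with a minimal Python sketch. The design matrix `X`, the coefficients `beta`, and the choice of a log link are made-up values for illustration only; none of them come from the slides.

```python
import math

# Hypothetical design matrix (intercept + one covariate) and coefficients.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
beta = [0.5, -0.25]

def linear_predictor(X, beta):
    """Linear part: eta = X beta, computed row by row."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]

def log_link(mu):
    """Link g(mu) = log(mu), applied component-wise."""
    return [math.log(m) for m in mu]

def inverse_log_link(eta):
    """Inverse link mu = exp(eta), applied component-wise."""
    return [math.exp(e) for e in eta]

eta = linear_predictor(X, beta)   # linear predictor
mu = inverse_log_link(eta)        # mean-value parameters mu_i = g^{-1}(eta_i)
```

Any other monotone link could be swapped in for the log link; the distributional component only enters once a likelihood is written down.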


Construction of an exponential family
(i) Observation space $y \in \mathcal{S}$ ($\mathcal{S} = \mathbb{R}$)
(ii) Baseline distribution with density $f_0(y)$ on $\mathcal{S}$
(iii) Real-valued statistic $S(y)$
(iv) Moment generating function of the statistic $S(\cdot)$:
$$M_0(\theta) = \int_{\mathcal{S}} e^{\theta S(y)} f_0(y)\, dy$$
(v) $\Theta = \{\theta : M_0(\theta) < \infty\}$ (parameter space)
(vi) $K_0(\theta) = \log M_0(\theta)$ is the cumulant generating function
(vii) Weighted distribution
$$f_\theta(y) = \frac{e^{\theta S(y)} f_0(y)}{M_0(\theta)} = e^{\theta S(y) - K_0(\theta)} \cdot f_0(y)$$
for $\theta \in \Theta$.
(viii) Support of $f_\theta$ = support of $f_0$

Simplest types (natural exponential families): $\mathcal{S} = \mathbb{R}$ and $S(y) = y$.
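The construction steps (iv) to (vii) can be sketched for a discrete baseline, where the integral becomes a finite sum. This is a minimal illustration; the helper `tilt` and the uniform baseline on {0, 1, 2, 3} are assumptions made for the example, not part of the slides.

```python
import math

def tilt(f0, stat, theta):
    """Exponentially tilt a baseline pmf f0 (dict: point -> probability).

    Returns M0(theta), K0(theta) and the weighted pmf f_theta.
    """
    m0 = sum(math.exp(theta * stat(y)) * p for y, p in f0.items())  # M0(theta)
    k0 = math.log(m0)                                               # K0(theta)
    f_theta = {y: math.exp(theta * stat(y) - k0) * p for y, p in f0.items()}
    return m0, k0, f_theta

# Hypothetical baseline: uniform on {0, 1, 2, 3}; statistic S(y) = y.
f0 = {y: 0.25 for y in range(4)}
m0, k0, f_theta = tilt(f0, lambda y: y, theta=0.7)
```

With a positive `theta` the weighted distribution shifts mass toward larger values of the statistic, while the support stays the same as that of the baseline.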


Some properties of the family

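Moment generating function of $S(\cdot)$ under $f_\theta$:
$$M_\theta(t) = \int_{\mathcal{S}} e^{tS(y)} f_\theta(y)\, dy = \int_{\mathcal{S}} e^{tS(y)}\, \frac{e^{\theta S(y)} f_0(y)}{M_0(\theta)}\, dy = \frac{M_0(t + \theta)}{M_0(\theta)}$$
$$K_\theta(t) = K_0(\theta + t) - K_0(\theta)$$
The $r$th cumulant of $S$ under $f_\theta$ is $K_\theta^{(r)}(0) = K_0^{(r)}(\theta)$
Mean: $E_\theta(S) = K_0'(\theta)$
Variance: $\mathrm{var}_\theta(S) = K_0''(\theta) \ge 0$
$K_0(\cdot)$ is a convex function on $\Theta$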
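The mean and variance identities can be checked numerically on a small discrete family. The sketch below compares finite-difference derivatives of $K_0$ with the moments of the weighted distribution; the three-point baseline, the value of `theta`, and the step `h` are arbitrary illustrative choices.

```python
import math

f0 = {0: 0.2, 1: 0.5, 2: 0.3}  # hypothetical baseline pmf, statistic S(y) = y

def K0(theta):
    """Cumulant generating function of the baseline."""
    return math.log(sum(math.exp(theta * y) * p for y, p in f0.items()))

def moments(theta):
    """Mean and variance of Y under the weighted pmf f_theta."""
    w = {y: math.exp(theta * y - K0(theta)) * p for y, p in f0.items()}
    mean = sum(y * q for y, q in w.items())
    var = sum((y - mean) ** 2 * q for y, q in w.items())
    return mean, var

theta, h = 0.4, 1e-4
dK = (K0(theta + h) - K0(theta - h)) / (2 * h)                   # approximates K0'(theta)
d2K = (K0(theta + h) - 2 * K0(theta) + K0(theta - h)) / h ** 2   # approximates K0''(theta)
mean, var = moments(theta)
```

Up to finite-difference error, `dK` agrees with `mean` and `d2K` with `var`; non-negativity of the variance is what makes $K_0$ convex.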


Examples of real exponential families

Real: observation space is S = R

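Gaussian family:
Baseline: $f_0(y) = \exp(-y^2/2)/\sqrt{2\pi}$
MGF: $M_0(\theta) = \exp(\theta^2/2)$
CGF: $K_0(\theta) = \theta^2/2$

Exponentially weighted distribution:
$$f_\theta(y) = e^{\theta y - \theta^2/2}\, e^{-y^2/2}/\sqrt{2\pi} = e^{-(y-\theta)^2/2}/\sqrt{2\pi}$$

Initial distribution $N(0,1)$:
Exp family $\{N(\theta, 1) : \theta \in \mathbb{R}\}$, all with unit variance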
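The step from the weighted density to the $N(\theta, 1)$ density is a completion of the square in the exponent, written out here for reference:

```latex
\begin{aligned}
\theta y - \tfrac{1}{2}\theta^2 - \tfrac{1}{2}y^2
  &= -\tfrac{1}{2}\left(y^2 - 2\theta y + \theta^2\right) \\
  &= -\tfrac{1}{2}(y - \theta)^2 .
\end{aligned}
```

The same identity gives the MGF: $e^{\theta y} e^{-y^2/2} = e^{\theta^2/2} e^{-(y-\theta)^2/2}$, and the second factor integrates to $\sqrt{2\pi}$, so $M_0(\theta) = e^{\theta^2/2}$.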


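The Poisson family
Baseline distribution: Po(1)
$$f_0(y) = \frac{e^{-1}}{y!}, \qquad y = 0, 1, \ldots$$
Generating functions:
$$M_0(\theta) = \sum_{y=0}^{\infty} \frac{e^{\theta y} e^{-1}}{y!} = \exp(e^\theta - 1)$$
$$K_0(\theta) = e^\theta - 1$$
All cumulants of the baseline are equal to one; the $r$th moment is $B_r$ (Bell number)
$\Theta = \{\theta : M_0(\theta) < \infty\} = \mathbb{R}$
Exponential family:
$$f_\theta(y) = \frac{\exp(y\theta - e^\theta + 1)\, e^{-1}}{y!} = \frac{e^{y\theta - e^\theta}}{y!} = \frac{\mu^y e^{-\mu}}{y!}$$
Initial baseline distribution Po(1) on the non-negative integers:
Exp family $\{\mathrm{Po}(\mu) : \mu > 0\}$, $\mu = e^\theta$
The $r$th cumulant of $\mathrm{Po}(\mu)$ is $K^{(r)}(\theta) = \mu$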
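Both claims on this slide are easy to check numerically: weighting Po(1) by $e^{y\theta}$ with $\theta = \log\mu$ reproduces the Po($\mu$) probabilities, and the low-order moments of Po(1) are Bell numbers. The function names and the value $\mu = 3.5$ below are illustrative assumptions, and the infinite sums are truncated.

```python
import math

def poisson_pmf(y, mu):
    """Po(mu) probability mu^y e^{-mu} / y!."""
    return mu ** y * math.exp(-mu) / math.factorial(y)

def tilted_po1(y, theta):
    """f_theta(y) = exp(y*theta - K0(theta)) * f0(y), with baseline f0 = Po(1)."""
    k0 = math.exp(theta) - 1.0               # K0(theta) = e^theta - 1
    f0 = math.exp(-1.0) / math.factorial(y)  # Po(1) baseline
    return math.exp(y * theta - k0) * f0

def po1_moment(r, terms=60):
    """r-th moment of Po(1), with the series truncated after `terms` terms."""
    return sum(y ** r * math.exp(-1.0) / math.factorial(y) for y in range(terms))

mu = 3.5
theta = math.log(mu)
max_gap = max(abs(tilted_po1(y, theta) - poisson_pmf(y, mu)) for y in range(30))
first_moments = [po1_moment(r) for r in range(1, 5)]   # close to 1, 2, 5, 15
```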


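The Bernoulli family
Baseline distribution: Bernoulli coin-toss
$$f_0(y) = 1/2 \quad \text{for } y = 0, 1$$
Generating functions:
$$M_0(\theta) = (e^{\theta \cdot 0} + e^{\theta \cdot 1})/2 = (1 + e^\theta)/2$$
$$K_0(\theta) = \log(1 + e^\theta) - \log 2$$
$\Theta = \{\theta : M_0(\theta) < \infty\} = \mathbb{R}$

Exponential family:
$$f_\theta(y) = \begin{cases} 1/(1 + e^\theta) & y = 0 \\ e^\theta/(1 + e^\theta) & y = 1 \end{cases}$$
$\pi = e^\theta/(1 + e^\theta)$ is the mean of $f_\theta$

Initial baseline distribution Ber(1/2) on $\{0, 1\}$
Exponential family: $\{\mathrm{Ber}(\pi) : 0 < \pi < 1\}$ on $\{0, 1\}$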


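The binomial family
Baseline distribution: Binomial($m$, 1/2):
$$f_0(y; m) = \binom{m}{y} 2^{-m} \qquad (0 \le y \le m)$$
Generating functions:
$$M_0(\theta) = 2^{-m} \sum_{y=0}^{m} \binom{m}{y} e^{\theta y} = (1 + e^\theta)^m / 2^m$$
$$K_0(\theta) = m \log(1 + e^\theta) - m \log 2$$
Exponential family:
$$f_\theta(y; m) = \binom{m}{y} \frac{e^{\theta y}}{(1 + e^\theta)^m} = \binom{m}{y} \pi^y (1 - \pi)^{m-y}$$
$\Theta = \mathbb{R}$, $\pi = e^\theta/(1 + e^\theta)$, $0 < \pi < 1$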
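As a quick numerical check of the last identity, the sketch below compares the weighted Binomial($m$, 1/2) probabilities with the usual Binomial($m$, $\pi$) form. The values $m = 8$ and $\theta = -0.6$ are arbitrary illustrative choices, and the case $m = 1$ recovers the Bernoulli family.

```python
import math

def tilted_binomial(y, m, theta):
    """Weighted Binomial(m, 1/2) baseline: C(m, y) e^{theta y} / (1 + e^theta)^m."""
    return math.comb(m, y) * math.exp(theta * y) / (1.0 + math.exp(theta)) ** m

def binomial_pmf(y, m, pi):
    """Usual Binomial(m, pi) probability."""
    return math.comb(m, y) * pi ** y * (1.0 - pi) ** (m - y)

m, theta = 8, -0.6
pi = math.exp(theta) / (1.0 + math.exp(theta))   # pi = e^theta / (1 + e^theta)
gap = max(abs(tilted_binomial(y, m, theta) - binomial_pmf(y, m, pi))
          for y in range(m + 1))
mean = sum(y * tilted_binomial(y, m, theta) for y in range(m + 1))  # equals m * pi
```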


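The Ewens family
Distribution on permutations $[n] \to [n]$: observation space $\mathcal{S} = S_n$, with $[n] = \{1, \ldots, n\}$ and $\#S_n = n!$
Example in two-line and cycle notation:
$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 4 & 1 & 2 & 3 & 7 & 6 & 5 \end{pmatrix} = (1,4,3,2)(5,7)(6), \qquad \#\sigma = 3$$
Here $\#\sigma$ is the number of cycles.
Baseline distribution: $f_0(\sigma) = 1/n!$
Generating function (Euler):
$$\sum_{\sigma} \alpha^{\#\sigma} = \alpha^{\uparrow n}, \qquad \alpha^{\uparrow n} = \alpha(\alpha + 1) \cdots (\alpha + n - 1)$$
Generating functions:
$$M_0(\theta) = \sum_{\sigma} e^{\theta \#\sigma} f_0(\sigma) = \alpha^{\uparrow n}/n! \qquad (\alpha = e^\theta)$$
$$K_0(\theta) = \log(\alpha^{\uparrow n}) - \log(n!)$$
Weighted distribution on permutations:
$$f_\theta(\sigma) = \frac{\alpha^{\#\sigma}}{\alpha^{\uparrow n}}$$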
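Euler's identity for the cycle-count generating function can be verified by brute force for small $n$. The sketch below enumerates all permutations with `itertools.permutations`; the helper names, $n = 5$ and $\alpha = 1.7$ are illustrative assumptions. Permutations are stored 0-based, so the slide's example is shifted down by one.

```python
import math
from itertools import permutations

def cycle_count(perm):
    """Number of cycles of a permutation given as a tuple, perm[i] = image of i (0-based)."""
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            j = start
            while j not in seen:   # walk around the cycle containing `start`
                seen.add(j)
                j = perm[j]
    return cycles

def rising_factorial(alpha, n):
    """alpha (alpha + 1) ... (alpha + n - 1)."""
    return math.prod(alpha + k for k in range(n))

n, alpha = 5, 1.7
total = sum(alpha ** cycle_count(p) for p in permutations(range(n)))

# Example from the slide, shifted to 0-based: 1->4, 2->1, 3->2, 4->3, 5->7, 6->6, 7->5.
sigma = (3, 0, 1, 2, 6, 5, 4)
```

Dividing each term $\alpha^{\#\sigma}$ by `rising_factorial(alpha, n)` gives the weighted probabilities, which therefore sum to one.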


The Ewens family on set partitions
Partition of [7] into blocks: 1|23|4567 or 1234|57|6 or ...
Partitions of $[n]$ (a number in square brackets counts the partitions of that shape):
n = 2: 12, 1|2
n = 3: 123, 12|3, 13|2, 23|1, 1|2|3
n = 4: 1234, 123|4 [4], 12|34 [3], 12|3|4 [6], 1|2|3|4

$\#\mathcal{P}_n$: 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, ...

Make each permutation cycle into a block
Induced marginal distribution on set partitions:
$$f_n(\sigma) = \frac{\alpha^{\#\sigma}}{\alpha^{\uparrow n}} \prod_{b \in \sigma} (\#b - 1)!$$
Also an exponential family: canonical statistic $\#\sigma$ (number of blocks)
Can talk of the Ewens distribution of $\#\sigma$ on $\{1, \ldots, n\}$
Cumulant function:
$$K(\theta) = \log\big(e^\theta (e^\theta + 1) \cdots (e^\theta + n - 1)\big)$$
versus $n \log(1 + e^\theta)$ for the binomial
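The partition counts and the normalization of the Ewens distribution on set partitions can both be checked by enumeration. The recursive generator `set_partitions`, the helper `ewens_prob`, and the values $n = 6$, $\alpha = 0.8$ are assumptions made for this sketch.

```python
import math

def set_partitions(items):
    """Yield every partition of a list as a list of blocks (each block a list)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + part

def ewens_prob(part, alpha, n):
    """alpha^{#blocks} * prod (#b - 1)! / (alpha rising n)."""
    rising = math.prod(alpha + k for k in range(n))
    weight = math.prod(math.factorial(len(b) - 1) for b in part)
    return alpha ** len(part) * weight / rising

n, alpha = 6, 0.8
parts = list(set_partitions(list(range(1, n + 1))))
total = sum(ewens_prob(p, alpha, n) for p in parts)   # should be 1 up to rounding
counts = [len(list(set_partitions(list(range(k))))) for k in range(1, 8)]
```

The factor $(\#b - 1)!$ is the number of cyclic arrangements of a block, which is why summing the permutation probabilities over all permutations with the same cycle structure gives this formula.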


Regression models

Sample: $n$ individuals or subjects, $i = 1, \ldots, n$
Covariate $x_i$ for individual $i$
Response $Y_i$ for individual $i$

Distributional assumptions (exp family):
density $f_i(y) = \exp(y\theta_i - K(\theta_i)) \times f_0(y)$
$Y_i$, $Y_j$ independent for $i \neq j$
$\mu_i = E(Y_i) = K'(\theta_i)$; $\mathrm{var}(Y_i) = K''(\theta_i) = V(\mu_i)$

Model for the vector $\mu$ as a function of $X$: $\mu = \ldots$


Convolution and dispersion parameter

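Suppose $Y_1, \ldots, Y_m$ are iid with density $f_\theta(y) = e^{\theta y - K_0(\theta)} f_0(y)$
What is the distribution of the sample mean $\bar{Y}$?

Answer first for $\theta = 0$: $f_0^{(m)}(y)$
In general: $e^{m(\theta y - K_0(\theta))} \times f_0^{(m)}(y)$

Suggests introducing a dispersion parameter $\sigma^2 = 1/\nu$:
$$f_\theta(y) = e^{\nu(\theta y - K_0(\theta))} \times f_0(y; \nu)$$
$\nu$ is the effective sample size
Mean is $E(Y) = K_0'(\theta)$, independent of $\nu$
$\mathrm{var}(Y) = K_0''(\theta)/\nu = \sigma^2 K_0''(\theta)$
$\sigma^2 = 1/\nu$ is the relative variance
$\theta$, $\nu$ are orthogonal parameters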
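The general formula follows from a change of variables, reading the question as one about the sample mean $\bar{Y}$ (an interpretation: the exponent $m(\theta y - K_0(\theta))$ matches the mean rather than the sum). The symbols $T$ for the sum and $f_0^{*m}$ for the $m$-fold convolution of $f_0$ are introduced here for the derivation only.

```latex
\begin{aligned}
T = Y_1 + \cdots + Y_m &\sim e^{\theta t - m K_0(\theta)}\, f_0^{*m}(t), \\
\bar{Y} = T/m &\sim m\, e^{\theta m \bar{y} - m K_0(\theta)}\, f_0^{*m}(m \bar{y})
   = e^{m(\theta \bar{y} - K_0(\theta))}\, f_0^{(m)}(\bar{y}),
\end{aligned}
\qquad \text{where } f_0^{(m)}(\bar{y}) = m\, f_0^{*m}(m \bar{y}).
```

So $f_0^{(m)}$ is the density of $\bar{Y}$ at $\theta = 0$, and replacing the integer $m$ by a real $\nu > 0$ gives the dispersion form.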


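Example: the gamma family
Standard gamma distribution:
$f(y; \nu) = y^{\nu-1} e^{-y}/\Gamma(\nu)$; $y > 0$; $\nu > 0$
CGF: $K(t) = -\nu \log(1 - t) = \nu(t + t^2/2 + t^3/3 + t^4/4 + \cdots)$
sum of exponentials: $E(Y) = \nu$; $\mathrm{var}(Y) = \nu$

Standard 2-parameter gamma distribution (rate $\lambda$):
$\lambda^\nu y^{\nu-1} e^{-\lambda y}/\Gamma(\nu)$
$K(t) = -\nu \log(1 - t/\lambda) = \nu(t/\lambda + t^2/(2\lambda^2) + \cdots)$
Mean $\mu = \nu/\lambda$; variance $\nu/\lambda^2$

Parameterization for GLMs:
$$\frac{\nu^\nu y^{\nu-1} e^{-\nu y/\mu}}{\mu^\nu\, \Gamma(\nu)}$$
$E(Y) = \mu$; $\mathrm{var}(Y) = \mu^2/\nu$; $1/\nu = \mathrm{var}(Y)/\mu^2$

GLM assumption: $\nu$ constant (constant c.v.); $\mu_i$ depends on $x_i$
Exp family parameterization: $\theta_i = -1/\mu_i$

Alternative non-GLM models:
$\nu_i = \nu$ constant and $g(\lambda_i) = x_i'\beta$
$\lambda_i = \lambda$ constant and $\log \nu_i = x_i'\beta$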
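The moments of the GLM parameterization, $E(Y) = \mu$ and $\mathrm{var}(Y) = \mu^2/\nu$, can be checked by crude numerical integration. This sketch uses a midpoint rule on a truncated range; the values $\mu = 2$, $\nu = 3$, the upper limit, and the step count are arbitrary choices for illustration.

```python
import math

def gamma_glm_density(y, mu, nu):
    """nu^nu y^(nu-1) exp(-nu y / mu) / (mu^nu Gamma(nu)), evaluated via logs."""
    log_f = (nu * math.log(nu) + (nu - 1.0) * math.log(y) - nu * y / mu
             - nu * math.log(mu) - math.lgamma(nu))
    return math.exp(log_f)

def midpoint_moment(power, mu, nu, upper=80.0, steps=80_000):
    """Midpoint-rule approximation of the integral of y^power f(y) over (0, upper)."""
    h = upper / steps
    return h * sum(((i + 0.5) * h) ** power * gamma_glm_density((i + 0.5) * h, mu, nu)
                   for i in range(steps))

mu, nu = 2.0, 3.0
total = midpoint_moment(0, mu, nu)               # close to 1
mean = midpoint_moment(1, mu, nu)                # close to mu
var = midpoint_moment(2, mu, nu) - mean ** 2     # close to mu^2 / nu
```

The ratio `var / mean**2` is then close to $1/\nu$, the squared coefficient of variation, which is what the GLM assumption holds constant across observations.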


Generalized linear models: Key ideas
$Y_1, \ldots, Y_n$ are assumed independent
density function of $Y_i$ at $y$:
$$e^{\nu_i(\theta_i y - K_0(\theta_i))} \times f_0(y; \nu_i)$$
Two-parameter family $(\theta_i, \nu_i)$ such that
$\mu_i = E(Y_i) = K_0'(\theta_i)$;
$\mathrm{var}(Y_i) = K_0''(\theta_i)/\nu_i = V(\mu_i)/\nu_i$
variance function $V(\mu)$ is a characteristic of the family
$V(\mu) = \mu$ for Poisson; $V(\mu) = \mu^2$ for Gamma

GLM assumptions:
$\nu_i = \nu$ (constant relative variance)
$g(\mu_i) = \eta_i = x_i'\beta$;
$g(\mu) = X\beta$, where $X$ is the design matrix
Link function $g$ is part of the specification

Parameters to be estimated (learned): $(\beta, \nu)$


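Estimation of $\beta$ by maximum likelihood
Log likelihood derivatives w.r.t. $\beta$ (vector/matrix form):
$$\frac{\partial l}{\partial \beta} = \nu X' W \left(\frac{Y - \mu}{d\mu/d\eta}\right)$$
$$\frac{\partial^2 l}{\partial \beta^2} = -\nu X' W X + \text{terms of zero mean}$$
$$W = \mathrm{diag}\{(d\mu_i/d\eta_i)^2 / V_i\}$$
Fisher modification of Newton-Raphson scheme gives
$$(\hat\beta - \beta_0) = (X' W_0 X)^{-1} X' W_0 \left(\frac{Y - \mu_0}{d\mu/d\eta}\right)$$
$$(X' W_0 X)\hat\beta = X' W_0 X \beta_0 + X' W_0 \left(\frac{Y - \mu_0}{d\mu/d\eta}\right)$$
Entire F-N-R sequence is independent of $\nu \equiv 1/\sigma^2$

Asymptotic moments of $\hat\beta$:
$$E(\hat\beta) = \beta + O(n^{-1})$$
$$\mathrm{cov}(\hat\beta) = (X' W X)^{-1}/\nu = \sigma^2 (X' W X)^{-1}$$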
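The Fisher scoring update can be sketched in pure Python for a Poisson model with log link, where $d\mu/d\eta = \mu$ and $V(\mu) = \mu$, so $W = \mathrm{diag}(\mu)$. This is an illustrative sketch, not the lecture's code: the data are made up, the small linear solver is a helper written for the example, and the starting value $\mu_0 = y + 0.1$ is an assumed (commonly used) choice. The update is written in the equivalent working-response form $(X'W_0X)\hat\beta = X'W_0\{\eta_0 + (Y - \mu_0)/(d\mu/d\eta)\}$.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fisher_scoring_poisson(X, y, iterations=25):
    """Fisher scoring for a Poisson GLM with log link."""
    p = len(X[0])
    mu = [yi + 0.1 for yi in y]           # assumed starting values
    eta = [math.log(m) for m in mu]
    beta = [0.0] * p
    for _ in range(iterations):
        w = mu                            # (dmu/deta)^2 / V = mu^2 / mu
        z = [e + (yi - mi) / mi for e, yi, mi in zip(eta, y, mu)]  # working response
        XtWX = [[sum(wi * row[a] * row[b] for wi, row in zip(w, X)) for b in range(p)]
                for a in range(p)]
        XtWz = [sum(wi * row[a] * zi for wi, row, zi in zip(w, X, z)) for a in range(p)]
        beta = solve(XtWX, XtWz)
        eta = [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]
        mu = [math.exp(e) for e in eta]
    return beta

# Made-up illustrative data: counts y against a single covariate x.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 3, 2, 6, 9, 14]
X = [[1.0, xi] for xi in x]
beta_hat = fisher_scoring_poisson(X, y)
mu_hat = [math.exp(beta_hat[0] + beta_hat[1] * xi) for xi in x]
# Score X'(y - mu) for the log link; it should vanish at the maximum.
score = [sum(row[a] * (yi - mi) for row, yi, mi in zip(X, y, mu_hat)) for a in range(2)]
```

Note that $\nu$ never appears in the iteration, in line with the remark that the whole sequence is independent of the dispersion.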


Dispersion estimation

In certain applications $\nu = 1$ is 'known': ignore this frame

Otherwise, ... the F-N-R sequence produces $\hat\beta$; $\hat\eta = X\hat\beta$; $\hat\mu = g^{-1}(\hat\eta)$

Dispersion = relative variance: $\sigma^2 = \mathrm{var}(Y_i)/V(\mu_i)$

Natural moment estimate:
$$\hat\sigma^2 = \frac{1}{n - p} \sum_i \frac{(Y_i - \hat\mu_i)^2}{V(\hat\mu_i)} = \frac{X^2}{n - p}$$
$p = \mathrm{rank}(X)$; $X^2$ is the generalized Pearson statistic

$\hat\sigma^2$ is consistent, but not the mle
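The moment estimate is a one-line computation once fitted means are available. In the sketch below the observations, the fitted means, and $p = 2$ are hypothetical numbers chosen so the arithmetic is easy to follow; the function name is also an assumption.

```python
def pearson_dispersion(y, mu_hat, variance_function, p):
    """Return (sigma2_hat, X2), where X2 = sum (y_i - mu_i)^2 / V(mu_i)
    and sigma2_hat = X2 / (n - p)."""
    n = len(y)
    x2 = sum((yi - mi) ** 2 / variance_function(mi) for yi, mi in zip(y, mu_hat))
    return x2 / (n - p), x2

# Hypothetical observations and fitted means from a model with p = 2 parameters.
y = [2.0, 5.0, 4.0, 9.0]
mu_hat = [2.0, 4.0, 5.0, 8.0]

# Poisson-type variance function V(mu) = mu: X2 = 0 + 1/4 + 1/5 + 1/8 = 0.575
sigma2_poisson, x2_poisson = pearson_dispersion(y, mu_hat, lambda mu: mu, p=2)
# Gamma-type variance function V(mu) = mu^2: X2 = 0 + 1/16 + 1/25 + 1/64
sigma2_gamma, x2_gamma = pearson_dispersion(y, mu_hat, lambda mu: mu ** 2, p=2)
```

The same residuals give different dispersion estimates under different variance functions, which is why $V(\mu)$ must be fixed by the family before $\sigma^2$ is estimated.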


R defaults

R defaults in summary(fit) and vcov(fit):
$\sigma^2 = 1$ for Poisson and binomial
$\hat\sigma^2 = X^2/(n - p)$ otherwise (normal, gamma, ...)

Arguably these are the right defaults, ...
Sometimes, but not always, appropriate

Over-riding the defaults:
    summary(glm(y ~ ..., family=poisson()), dispersion=4.7)
    summary(glm(y ~ ..., family=Gamma(link=log)), dispersion=1)


The deviance function