Waterloo GLMM talk

Post on 24-Jan-2015

263 views 2 download

description

Generalized linear mixed models for Waterloo

Transcript of Waterloo GLMM talk

De�nitions Estimation Inference Challenges & open questions References

Generalized linear mixed models

Ben Bolker

McMaster University, Mathematics & Statistics and Biology

26 September 2014

De�nitions Estimation Inference Challenges & open questions References

Acknowledgments

lme4: Doug Bates, MartinMächler, Steve Walker

Data: Josh Banta, Adrian Stier,Sea McKeon, David Julian,Jada-Simone White

NSERC (Discovery)

SHARCnet

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

De�nitions Estimation Inference Challenges & open questions References

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,and interactions

(some) non-Normal responses(e.g. binomial, Poisson, and extensions)

(some) nonlinearity(e.g. logistic, exponential, hyperbolic)

non-independent (grouped) data

De�nitions Estimation Inference Challenges & open questions References

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,and interactions

(some) non-Normal responses(e.g. binomial, Poisson, and extensions)

(some) nonlinearity(e.g. logistic, exponential, hyperbolic)

non-independent (grouped) data

De�nitions Estimation Inference Challenges & open questions References

(Generalized) linear mixed models

(G)LMMs: a statistical modeling framework incorporating:

combinations of categorical and continuous predictors,and interactions

(some) non-Normal responses(e.g. binomial, Poisson, and extensions)

(some) nonlinearity(e.g. logistic, exponential, hyperbolic)

non-independent (grouped) data

De�nitions Estimation Inference Challenges & open questions References

least−squaresnonlinear

generalizedlinear models

} correlation

smooth

nonlinearityscaled

variance

effectsrandom

} (non−normal errors)

nonlineartime seriesmodels

thresholds;mixtures;compound distributions;etc. etc. etc. etc.

nonlinearity

effects

randomnonlinearity

correlation

(nonlinearity)

generallinear models

logistic regressionbinomial regressionlog−linear models

linear regressionANOVAanalysis of covariancemultiple linear regression

(non−normal errors)(nonlinearity)

GLMM

mixed models

repeated−measurestime series (ARIMA)models;

generalizedadditivemodels

quasilikelihood negativebinomial models

De�nitions Estimation Inference Challenges & open questions References

Coral protection from seastars (Culcita)by symbionts (McKeon et al., 2012)

none shrimp crabs both

Number of predation events

Symbionts

Num

ber

of b

lock

s

0

2

4

6

8

10

1

2

0

1

2

0

2

0

1

2

De�nitions Estimation Inference Challenges & open questions References

Environmental stress: Glycera cell survival(D. Julian unpubl.)

H2S

Cop

per

0

33.3

66.6

133.3

0 0.03 0.1 0.32

Osm=12.8Normoxia

Osm=22.4Normoxia

0 0.03 0.1 0.32

Osm=32Normoxia

Osm=41.6Normoxia

0 0.03 0.1 0.32

Osm=51.2Normoxia

Osm=12.8Anoxia

0 0.03 0.1 0.32

Osm=22.4Anoxia

Osm=32Anoxia

0 0.03 0.1 0.32

Osm=41.6Anoxia

0

33.3

66.6

133.3

Osm=51.2Anoxia

0.0

0.2

0.4

0.6

0.8

1.0

De�nitions Estimation Inference Challenges & open questions References

Arabidopsis response to fertilization & herbivory(Banta et al., 2010)

Log(

1+fr

uit s

et)

0

1

2

3

4

5

unclipped clipped

●●●●●●

●●

●●

●●

●●

●●●

●●● ●

● ●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●

●●

●●

●●

●●

● ●

●●

●●

● ●

●●● ●● ●

●●

●●

●●

●● ●

●●

● ●●● ●●● ●●● ●●● ●● ●● ●● ●

● ●● ●●● ●●●●

●●

● ●●

●●●●●● ●●

●● ●

● ●

: nutrient 1

unclipped clipped

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

● ●●

● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

● ●

●●

●●●● ●● ●●● ●●●● ●

●●

●●

●●●●●●● ●●● ●●

●●

●●●● ●●●●●●●

●●

: nutrient 8

De�nitions Estimation Inference Challenges & open questions References

Coral demography(J.-S. White unpubl.)

Before Experimental

●● ●

●●●

●●

●● ●

●●●

● ●●

●●●

●● ●● ●

●●

●● ●●

● ●●● ●

●● ●● ●●●●● ●

●●

● ●● ●

●● ●

● ●

●●●

●●●●

● ●●

●●

● ●

● ●● ●●

●●●

● ●●

●●

●●●

●●●

●●● ●●● ●●

●●●●

● ● ●●●

●●

● ●●●● ●●

●●●

●●● ●

●● ●●

●●●

●●

●● ●●

●●● ●●●●

●●● ●●

●● ●

● ●

●●0.00

0.25

0.50

0.75

1.00

0 10 20 30 40 50 0 10 20 30 40 50Previous size (cm)

Mor

talit

y pr

obab

ility

Treatment

Present

Removed

De�nitions Estimation Inference Challenges & open questions References

Technical de�nition

Yi︸︷︷︸response

conditionaldistribution︷︸︸︷Distr (g−1(ηi )︸ ︷︷ ︸

inverselink

function

, φ︸︷︷︸scale

parameter

)

De�nitions Estimation Inference Challenges & open questions References

Technical de�nition

Yi︸︷︷︸response

conditionaldistribution︷︸︸︷Distr (g−1(ηi )︸ ︷︷ ︸

inverselink

function

, φ︸︷︷︸scale

parameter

)

η︸︷︷︸linear

predictor

= Xβ︸︷︷︸�xede�ects

+ Zb︸︷︷︸randome�ects

De�nitions Estimation Inference Challenges & open questions References

Technical de�nition

Yi︸︷︷︸response

conditionaldistribution︷︸︸︷Distr (g−1(ηi )︸ ︷︷ ︸

inverselink

function

, φ︸︷︷︸scale

parameter

)

η︸︷︷︸linear

predictor

= Xβ︸︷︷︸�xede�ects

+ Zb︸︷︷︸randome�ects

b︸︷︷︸conditionalmodes

∼ MVN(0, Σ(θ)︸ ︷︷ ︸variance-covariancematrix

)

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

De�nitions Estimation Inference Challenges & open questions References

What are random e�ects?

A method for . . .

accounting for among-individual, within-block correlation

compromising between complete pooling (no among-blockvariance)and �xed e�ects (in�nite among-block variance)

handling levels selected at random from a larger population

sharing information among levels (shrinkage estimation)

estimating variability among levels

allowing predictions for unmeasured levels

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

De�nitions Estimation Inference Challenges & open questions References

Maximum likelihood estimation

Best �t is a compromise between two components(consistency of data with �xed e�ects and conditional modes;consistency of random e�ect with RE distribution)

Goodness-of-�t integrates over conditional modes

●●

−2

−1

0

1

2

1 2 3 4 5f

y

De�nitions Estimation Inference Challenges & open questions References

Shrinkage: Arabidopsis conditional modes

● ●

●●

● ● ●●

● ● ● ●● ● ● ●

● ● ●●

Genotype

Mea

n fr

uit s

et

0 5 10 15 20 25

0

0.1

1

1020

● group meanshrinkage est.

De�nitions Estimation Inference Challenges & open questions References

Estimation methods

deterministic : various approximate integrals (Breslow, 2004)

Penalized quasi-likelihood, Laplace,Gauss-Hermite quadrature, . . .�exibility and speed vs. accuracy

. . .

stochastic (Monte Carlo): frequentist and Bayesian (Booth andHobert, 1999; Ponciano et al., 2009; Sung, 2007)

usually slower but �exible and accurate

De�nitions Estimation Inference Challenges & open questions References

Estimation: Culcita (McKeon et al., 2012)

Log−odds of predation−6 −4 −2 0 2

Symbiont

Crab vs. Shrimp

Added symbiont

GLM (fixed)GLM (pooled)PQLLaplaceAGQ

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

De�nitions Estimation Inference Challenges & open questions References

Wald tests

typical results of summary

exact for ANOVA, regression:approximation for GLM(M)s

fast

approximation is sometimesawful (Hauck-Donner e�ect) parameter

log−

likel

ihoo

d

De�nitions Estimation Inference Challenges & open questions References

2D pro�les for Culcita data

Scatter Plot Matrix

.sig01

2 4 6 8 101214

−3−2−1

0

(Intercept)

0

5

10

1510 15

0 1 2 3

tttcrabs

−10−8−6−4−20

−4 −2 0

0 1 2 3

tttshrimp

−10−8−6−4−2 −6 −4 −2

0 1 2 3

tttboth

−12−10−8−6−4−2

0 1 2 3

De�nitions Estimation Inference Challenges & open questions References

Likelihood ratio tests

better than Wald, but still have to two problems:

�denominator degrees of freedom� (when estimating scale)for GLMMs, distributions are approximate anyway (Bartlettcorrections)Kenward-Roger correction? (Stroup, 2014)

Pro�le con�dence intervals: expensive/fragile

De�nitions Estimation Inference Challenges & open questions References

Parametric bootstrapping

�t null model to data

simulate �data� from null model

�t null and working model, compute likelihood di�erence

repeat to estimate null distribution

should be OK but ??? not well tested(assumes estimated parameters are �su�ciently� good)

De�nitions Estimation Inference Challenges & open questions References

Parametric bootstrap results

True p value

Infe

rred

p v

alue

0.020.040.060.08

0.02 0.06

Osm Cu

H2S

0.02 0.06

0.020.040.060.08

Anoxia

normalt(14)t(7)

De�nitions Estimation Inference Challenges & open questions References

Bayesian inference

If we have a good sample from the posterior distribution(Markov chains have converged etc. etc.) we get most of theinferences we want for free by summarizing the marginalposteriors

post hoc Bayesian can work, but mode at zero causes problems

De�nitions Estimation Inference Challenges & open questions References

Culcita con�dence intervals

●●●

●●●

●●●

●●●

●●●

block

(Intercept)

tttboth

tttcrabs

tttshrimp

−15 −10 −5 0 5 10 15Effect (log−odds of predation)

CI

Wald

profile

boot

MCMC

fun

● glmer

glmmADMB

MCMCglmm

De�nitions Estimation Inference Challenges & open questions References

Outline

1 Examples and de�nitions

2 EstimationOverviewMethods

3 Inference

4 Challenges & open questions

De�nitions Estimation Inference Challenges & open questions References

On beyond R

Julia: MixedModels package

SAS: PROC MIXED, NLMIXED

AS-REML

Stata (GLLAMM, xtmelogit)

AD Model Builder

HLM, MLWiN

De�nitions Estimation Inference Challenges & open questions References

Challenges

Small/medium data: inference, singular �ts (blme, MCMCglmm)

Big data: speed!

Worst case: large n, small N (e.g. telemetry/genomics)

Model diagnosis

Con�dence intervals accounting for uncertainty in variances

See also: http://rpubs.com/bbolker/glmmchapter, https://groups.nceas.ucsb.edu/non-linear-modeling/projects

De�nitions Estimation Inference Challenges & open questions References

Spatial and temporal correlations

Sometimes blocking takes care of non-independence ...

but sometimes there is temporal or spatial correlation within

blocks

. . . also phylogenetic . . . (Ives and Zhu, 2006)

�G-side� vs. �R-side� e�ects

tricky to implement for GLMMs,but new possibilities on the horizon (Rousset and Ferdy, 2014;Rue et al., 2009)

De�nitions Estimation Inference Challenges & open questions References

Next steps

Complex random e�ects:regularization, model selection, penalized methods(lasso/fence)

Flexible correlation and variance structures

Flexible/nonparametric random e�ects distributions

hybrid & improved MCMC methods

Reliable assessment of out-of-sample performance

De�nitions Estimation Inference Challenges & open questions References

References

Banta, J.A., Stevens, M.H.H., and Pigliucci, M., 2010. Oikos, 119(2):359�369. ISSN 1600-0706.doi:10.1111/j.1600-0706.2009.17726.x.

Booth, J.G. and Hobert, J.P., 1999. Journal of the Royal Statistical Society. Series B, 61(1):265�285.doi:10.1111/1467-9868.00176.

Breslow, N.E., 2004. In D.Y. Lin and P.J. Heagerty, editors, Proceedings of the second Seattlesymposium in biostatistics: Analysis of correlated data, pages 1�22. Springer. ISBN 0387208623.

Ives, A.R. and Zhu, J., 2006. Ecological Applications, 16(1):20�32.

McKeon, C.S., Stier, A., et al., 2012. Oecologia, 169(4):1095�1103. ISSN 0029-8549.doi:10.1007/s00442-012-2275-2.

Ponciano, J.M., Taper, M.L., et al., 2009. Ecology, 90(2):356�362. ISSN 0012-9658.

Rousset, F. and Ferdy, J.B., 2014. Ecography, page no�no. ISSN 1600-0587. doi:10.1111/ecog.00566.

Rue, H., Martino, S., and Chopin, N., 2009. Journal of the Royal Statistical Society, Series B,71(2):319�392.

Stroup, W.W., 2014. Agronomy Journal, 106:1�17. doi:10.2134/agronj2013.0342.

Sung, Y.J., 2007. The Annals of Statistics, 35(3):990�1011. ISSN 0090-5364.doi:10.1214/009053606000001389.