Lecture 12: Generalized Linear Models (GLM)


Page 1: Lecture 12: Generalized Linear Models (GLM)

2001

Bio 4118 Applied Biostatistics, L12.1

Université d’Ottawa / University of Ottawa

Lecture 12: Generalized Linear Models (GLM)

What are they?
When do we use them?
The full model
The ANCOVA model
The common regression model
The extra sum of squares principle
Assumptions

Page 2: Lecture 12: Generalized Linear Models (GLM)

What are General(ized) Linear Models?

GLMs are models of the form:

$Y = bX + e$

with Y, a vector of dependent variables; b, a vector of estimated coefficients; X, a vector of independent variables; and e, a vector of error terms.

Multivariate models
Simple linear regression
Multiple regression
Analysis of variance (ANOVA)
Analysis of covariance (ANCOVA)

Page 3: Lecture 12: Generalized Linear Models (GLM)

Some GLM procedures

Procedure                        Dependent variable   Independent variable(s)
Simple regression                1 continuous         1 continuous
Single-classification ANOVA      1 continuous         1 categorical*
Multiple-classification ANOVA    1 continuous         2 or more categorical*
ANCOVA                           1 continuous         At least 1 categorical*, at least 1 continuous
Multiple regression              1 continuous         2 or more continuous

*either categorical or treated as a categorical variable
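The table can be read as a small lookup on the number of continuous and categorical independent variables. The sketch below only illustrates that reading; the function name `glm_procedure` and its arguments are hypothetical and not part of the lecture.

```python
def glm_procedure(n_continuous: int, n_categorical: int) -> str:
    """Name the procedure for ONE continuous dependent variable, given the
    number of continuous and categorical independent variables.
    (Hypothetical helper mirroring the table above.)"""
    if n_continuous < 0 or n_categorical < 0 or n_continuous + n_categorical == 0:
        raise ValueError("need at least one independent variable")
    if n_categorical == 0:
        # Only continuous predictors: regression.
        return "simple regression" if n_continuous == 1 else "multiple regression"
    if n_continuous == 0:
        # Only categorical predictors: ANOVA.
        if n_categorical == 1:
            return "single-classification ANOVA"
        return "multiple-classification ANOVA"
    # At least one of each: ANCOVA.
    return "ANCOVA"


print(glm_procedure(1, 1))
```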

Page 4: Lecture 12: Generalized Linear Models (GLM)

When do we use ANCOVA?

To compare the relationship between a dependent (Y) and independent (X1) variable for different levels of one or more categorical variables (X2)

e.g. relationship between body mass (Y) and body size (X1) for different taxonomic groups (birds & mammals, X2)

[Figure: body mass against body size.]

Page 5: Lecture 12: Generalized Linear Models (GLM)

When do we use ANCOVA?

In doing comparisons, we assume that the qualitative form of the model is the same for all levels of the categorical variables...

…otherwise, one is comparing apples and oranges!

[Figure: two panels of Y against X1 for level 1 and level 2 of X2, one showing qualitatively similar models, the other qualitatively different models.]

Page 6: Lecture 12: Generalized Linear Models (GLM)

When do we use ANCOVA?

ANCOVA is used to compare linear models…

…although ANCOVA-like extensions have been developed for nonlinear models.

[Figure: two panels of Y against X1 for level 1 and level 2 of X2, one showing linear models, the other non-linear models.]

Page 7: Lecture 12: Generalized Linear Models (GLM)

The simple regression model

The regression model is:

$Y_i = \alpha + \beta X_i + \varepsilon_i$

So, all simple regression models are described by 2 parameters, the intercept ($\alpha$) and slope ($\beta$).

[Figure: Y against X, showing the observed and expected $Y_i$ at $X_i$, the residual $\varepsilon_i$, the intercept $\alpha$ and the slope $\beta = \Delta Y / \Delta X$.]
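As a rough illustration of the two parameters, the sketch below estimates the intercept and slope by ordinary least squares in plain Python. The function name and the four data points are made up for the example; they are not from the lecture.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates of intercept (alpha) and slope (beta)
    for Y_i = alpha + beta * X_i + e_i, plus the residuals e_i."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # intercept
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    return alpha, beta, residuals


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]
alpha, beta, residuals = fit_simple_regression(x, y)
print(round(alpha, 2), round(beta, 2))
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on any fit.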

Page 8: Lecture 12: Generalized Linear Models (GLM)

Simple GLMs

Two linear models may differ as follows:

differences in both intercepts ($\alpha$) and slopes ($\beta$)

different intercepts but the same slopes (ANCOVA model)

[Figure: two panels of Y against X1, one with different $\alpha$ and $\beta$, the other with different $\alpha$ and the same $\beta$.]

Page 9: Lecture 12: Generalized Linear Models (GLM)

Simple GLMs

Two linear models may also differ as follows:

different slopes ($\beta$) but the same intercepts ($\alpha$)

same slopes and intercepts (common regression model)

[Figure: two panels of Y against X1, one with the same $\alpha$ and different $\beta$, the other with the same $\alpha$ and the same $\beta$.]

Page 10: Lecture 12: Generalized Linear Models (GLM)

Fitting GLMs

Proceeds in hierarchical fashion, fitting the most complex model first.

Evaluate significance of a term by fitting two models: one with the term in, the other with it removed.

Test for change in model fit ($\Delta$MF) associated with removal of the term in question.

[Diagram: Model A (term in) and Model B (term out) are compared by $\Delta$MF; delete the term if $\Delta$MF is small, retain it if $\Delta$MF is large.]

Page 11: Lecture 12: Generalized Linear Models (GLM)

Model fitting: evaluating the significance of model terms

Fit higher order model (hom) including all possible terms; retain $SS_{residual}$ and $MS_{residual}$.

Fit reduced model (rm), retain $SS_{residual}$.

Test for significance of removed term by computing:

$F = \frac{(SS_{residual}^{rm} - SS_{residual}^{hom}) / k}{MS_{residual}^{hom}}$

(k is taken here as the degrees of freedom of the removed term.)

[Diagram: the higher order model and the reduced model are compared by F; delete or retain the term according to the resulting p.]
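The extra sum of squares comparison can be sketched as a one-line calculation. The function name, the argument names and the numbers are hypothetical, and `k` is assumed to be the degrees of freedom of the removed term.

```python
def extra_ss_f(ss_res_rm, ss_res_hom, df_res_hom, k=1):
    """F for a removed term: the increase in residual SS when the term is
    dropped (per degree of freedom k), over the residual mean square of
    the higher order model."""
    if ss_res_rm < ss_res_hom:
        raise ValueError("the reduced model cannot fit better than the higher order model")
    if df_res_hom <= 0 or k <= 0:
        raise ValueError("degrees of freedom must be positive")
    ms_res_hom = ss_res_hom / df_res_hom
    return ((ss_res_rm - ss_res_hom) / k) / ms_res_hom


# Made-up sums of squares, for illustration only.
f_stat = extra_ss_f(ss_res_rm=12.0, ss_res_hom=10.0, df_res_hom=20, k=1)
print(f_stat)
```

The resulting F would be compared with the F distribution on k and `df_res_hom` degrees of freedom.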

Page 12: Lecture 12: Generalized Linear Models (GLM)

The full model with 2 independent variables

The full model is:

$Y_{ij} = \mu + \alpha_i + \beta_i (X_{ij} - \bar{X}_i) + \varepsilon_{ij}$

$\beta_i$ is the slope of the regression of Y on X1 (the covariate) estimated for level i of the categorical variable X2.

$\alpha_i$ is the difference between the mean of each level i of the categorical variable X2 and the overall mean.

[Figure: regression lines of Y on X1 for level 1 and level 2 of variable X2, marking $\bar{Y}_1$, $\bar{Y}_2$, $\bar{X}_1$, $\bar{X}_2$, $\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$, the deviation $X_{1j} - \bar{X}_1$ and the residual $\varepsilon_{1j}$.]

Page 13: Lecture 12: Generalized Linear Models (GLM)

The full model: null hypotheses

For the full model with 2 independent variables, there are 3 null hypotheses:

$H_{01}: \alpha_i = 0$, for all $i$
$H_{02}: \beta_i = \text{constant}$
$H_{03}: \beta_i = 0$

[Figure: regression lines of Y on X1 for level 1 and level 2 of variable X2, as for the full model.]

Page 14: Lecture 12: Generalized Linear Models (GLM)

[Figure: three panels of Y illustrating the null hypotheses $H_{01}$, $H_{02}$ and $H_{03}$ of the full model.]

Page 15: Lecture 12: Generalized Linear Models (GLM)

Assumptions for full model hypothesis testing

Residuals are independent and normally distributed.

Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).

No error in independent variables.

Relationship between Y and covariate is linear.

Page 16: Lecture 12: Generalized Linear Models (GLM)

Procedure

Fit full model, test for differences among slopes ($H_{02}: \beta_i = \text{constant}$).

If H02 rejected, run separate regressions for each level of categorical variable(s).

If H02 accepted, proceed to fit ANCOVA model.

[Diagram: Y against X1 for level 1 and level 2 of variable X2; H02 accepted leads to ANCOVA, H02 rejected leads to separate regressions.]
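The test for differences among slopes can be sketched as an extra sum of squares comparison between the full model (a slope per level) and the ANCOVA model (one pooled slope). This is a minimal plain-Python sketch under that assumption; the function names and the data are invented for illustration.

```python
def within_level_sums(x, y):
    """Corrected sums of squares and cross-products within one level."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxx, sxy, syy


def slope_homogeneity_f(groups):
    """groups maps each level of X2 to (x values, y values).
    Returns residual SS of the full model, residual SS of the ANCOVA
    model, and F for the equal-slopes hypothesis."""
    sums = [within_level_sums(x, y) for x, y in groups.values()]
    n_total = sum(len(x) for x, _ in groups.values())
    n_levels = len(groups)
    # Full model: separate intercept and slope for every level.
    ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums)
    # ANCOVA model: separate intercepts, one slope pooled within levels.
    pooled_sxx = sum(s[0] for s in sums)
    pooled_sxy = sum(s[1] for s in sums)
    pooled_syy = sum(s[2] for s in sums)
    ss_ancova = pooled_syy - pooled_sxy ** 2 / pooled_sxx
    k = n_levels - 1                  # slopes removed by pooling
    df_full = n_total - 2 * n_levels  # residual df of the full model
    f_stat = ((ss_ancova - ss_full) / k) / (ss_full / df_full)
    return ss_full, ss_ancova, f_stat


# Made-up data: the second level has a visibly steeper slope.
groups = {
    "level 1": ([1.0, 2.0, 3.0, 4.0], [2.1, 2.9, 4.2, 4.8]),
    "level 2": ([1.0, 2.0, 3.0, 4.0], [1.0, 3.1, 4.9, 7.0]),
}
ss_full, ss_ancova, f_stat = slope_homogeneity_f(groups)
print(round(ss_full, 3), round(ss_ancova, 3), round(f_stat, 2))
```

A large F (small p) here would correspond to rejecting equal slopes and fitting separate regressions.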

Page 17: Lecture 12: Generalized Linear Models (GLM)

The ANCOVA model with 2 independent variables

The ANCOVA model is:

$Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}_i) + \varepsilon_{ij}$

$\beta$ is the slope of the regression of Y on X1 (the covariate) pooled over levels of the categorical variable X2.

$\alpha_i$ is the difference between the mean of each level i of the categorical variable X2 and the overall mean.

[Figure: parallel regression lines of Y on X1 for level 1 and level 2 of variable X2, marking $\bar{Y}_1$, $\bar{Y}_2$, $\bar{X}_1$, $\bar{X}_2$, $\alpha_1$, $\alpha_2$, the deviation $X_{1j} - \bar{X}_1$ and the residual $\varepsilon_{1j}$.]

Page 18: Lecture 12: Generalized Linear Models (GLM)

The ANCOVA model: null hypotheses

For the ANCOVA model with 2 independent variables, there are 2 null hypotheses:

$H_{01}: \alpha_i = 0$, for all $i$
$H_{02}: \beta = 0$

[Figure: regression lines of Y on X1 for level 1 and level 2 of variable X2, as for the ANCOVA model.]

Page 19: Lecture 12: Generalized Linear Models (GLM)

[Figure: three panels of Y illustrating the null hypotheses $H_{01}$ and $H_{02}$ of the ANCOVA model.]

Page 20: Lecture 12: Generalized Linear Models (GLM)

Assumptions for hypothesis testing in ANCOVA model

Residuals are independent and normally distributed.

Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).

No error in independent variables.

Relationship between Y and covariate is linear.

The slope of the regression of Y on X1 (the covariate) is the same for all levels of the categorical variable X2 (not an assumption for full model!).

Page 21: Lecture 12: Generalized Linear Models (GLM)

Procedure

Fit ANCOVA model; test for differences among intercepts ($H_{01}: \alpha_i = \text{constant}$).

If H01 rejected, do multiple comparisons to see which intercepts differ (if there are more than 2 levels for X2).

If H01 accepted, proceed to fit common regression model.

[Diagram: Y against X1 for level 1 and level 2 of variable X2; H01 accepted leads to common regression, H01 rejected leads to multiple comparisons.]
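Taken together with the earlier test for slopes in the full model, the hierarchical procedure reduces to a short decision rule. The sketch below is one way to write it down; the function name, the default `alpha` of 0.05 and the p-values in the example are assumptions for illustration.

```python
def choose_model(p_slopes, p_intercepts, alpha=0.05):
    """Pick the final model from two p-values.

    p_slopes:     p for the equal-slopes test in the full model.
    p_intercepts: p for the equal-intercepts test in the ANCOVA model
                  (only meaningful when equal slopes are not rejected).
    """
    if p_slopes < alpha:
        # Slopes differ: a pooled slope makes no sense.
        return "separate regressions"
    if p_intercepts < alpha:
        # Same slope, different intercepts.
        return "ANCOVA model"
    # Same slope and same intercept.
    return "common regression"


# Hypothetical p-values, for illustration only.
print(choose_model(p_slopes=0.40, p_intercepts=0.30))
print(choose_model(p_slopes=0.40, p_intercepts=0.01))
print(choose_model(p_slopes=0.01, p_intercepts=0.30))
```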

Page 22: Lecture 12: Generalized Linear Models (GLM)

The common regression model with 2 independent variables

The model is:

$Y_{ij} = \alpha + \beta (X_{ij} - \bar{X}) + \varepsilon_{ij}$

$\beta$ is the slope of the regression of Y on X1 pooled over levels of the categorical variable X2.

$\alpha$ is the pooled intercept.

$\bar{X}$ is the pooled average of X1.

[Figure: a single regression line through the data for level 1 and level 2 of variable X2, marking $\bar{X}$, the deviation $X_{1j} - \bar{X}$ and the residual $\varepsilon_{1j}$.]

Page 23: Lecture 12: Generalized Linear Models (GLM)

The common regression model: null hypotheses

For the common regression model, there are 2 null hypotheses:

$H_{01}: \alpha = 0$
$H_{02}: \beta = 0$

[Figure: a single regression line through the data for level 1 and level 2 of variable X2, as for the common regression model.]

Page 24: Lecture 12: Generalized Linear Models (GLM)

Assumptions for hypothesis testing in common regression model

Residuals are independent and normally distributed.

Residual variance is equal for all values of X.

No error in independent variable.

Relationship between Y and X is linear.

Page 25: Lecture 12: Generalized Linear Models (GLM)

Example 1: effects of sex and age on sturgeon size at The Pas

[Figure: LFKL against LAGE in two panels, males and females.]

Page 26: Lecture 12: Generalized Linear Models (GLM)

Analysis

Log(forklength) (LFKL) is the dependent variable; log(age) (LAGE) is the covariate, and sex (SEX$) is the categorical variable (2 levels).

Q1: is the slope of the regression of LFKL on LAGE the same for both sexes?

[Figure: LFKL against LAGE in two panels, females and males.]

Page 27: Lecture 12: Generalized Linear Models (GLM)

Effects of sex and age on size of sturgeon at The Pas

Page 28: Lecture 12: Generalized Linear Models (GLM)

Analysis

Conclusion 1: slope of regression of LFKL on LAGE is the same for both sexes (accept H02, equal slopes) since p(SEX$*LAGE) > .05.

Q2: is the intercept the same for both males and females?

[Figure: LFKL against LAGE in two panels, females and males.]

Page 29: Lecture 12: Generalized Linear Models (GLM)

Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

Page 30: Lecture 12: Generalized Linear Models (GLM)

Analysis

Conclusion 2: intercept is the same for both males and females. H01 (equal intercepts) is accepted since p(SEX$) > 0.05, implying that…

…best model is common regression model.

Note that the reduction in fit ($R^2$) from full model to ANCOVA model is negligible (.697 to .696), indicating that deleting the SEX$*LAGE term has a negligible impact on model fit.

[Figure: LFKL against LAGE in two panels, females and males.]

Page 31: Lecture 12: Generalized Linear Models (GLM)

Effects of sex and age on size of sturgeon at The Pas (common regression)

Page 32: Lecture 12: Generalized Linear Models (GLM)

Example 2: Effect of location and age on sturgeon size

[Figure: LFKL against LAGE in two panels, Lake of the Woods (LofW) and Nelson River.]

Page 33: Lecture 12: Generalized Linear Models (GLM)

Analysis

Log(forklength) (LFKL) is the dependent variable; log(age) (LAGE) is the covariate, and location (LOCATION$) is the categorical variable (2 levels).

Q: is the slope of the regression of LFKL on LAGE the same at both locations?

[Figure: LFKL against LAGE in two panels, Nelson River and Lake of the Woods (LofW).]

Page 34: Lecture 12: Generalized Linear Models (GLM)

Effect of location and age on sturgeon size

Page 35: Lecture 12: Generalized Linear Models (GLM)

Analysis

Conclusion: slope of regression of LFKL on LAGE is different at the two locations (reject H02, equal slopes) since p(LOCATION$*LAGE) < .05.

So, should fit individual regressions for each location.

[Figure: LFKL against LAGE in two panels, Nelson River and Lake of the Woods (LofW).]

Page 36: Lecture 12: Generalized Linear Models (GLM)

What do you do if?

More than 2 levels of categorical variable?

Follow above procedure but if H02 (same slope) rejected, do pairwise contrasts of individual slopes.

If H02 accepted but H01 (same intercepts) rejected, do pairwise comparisons of intercepts.

Always control for experiment-wise Type I error rate.

[Figure: Y against X.]

Page 37: Lecture 12: Generalized Linear Models (GLM)

What do you do if?

Biological hypothesis implies one-tailed null(s)?

Follow above procedure but if H02 (same slope) rejected, do one-tailed pairwise contrasts of individual slopes.

If H02 accepted but H01 (same intercepts) rejected, do one-tailed pairwise comparisons of intercepts.

[Figure: Y against X.]

Page 38: Lecture 12: Generalized Linear Models (GLM)

Power analysis in GLM

In any GLM, hypotheses are tested by means of an F-test.

Remember: the appropriate $SS_{error}$ and $df_{error}$ depend on the type of analysis and the hypothesis under investigation.

Knowing F, we can compute $R^2$, the proportion of the total variance in Y explained by the factor (source) under consideration.

$F = \frac{MS_{factor}}{MS_{error}} = \frac{SS_{factor}/df_{factor}}{SS_{error}/df_{error}} = \frac{SS_{factor}}{SS_{error}} \cdot \frac{df_{error}}{df_{factor}}$

$R^2 = \frac{SS_{factor}}{SS_{factor} + SS_{error}} = \frac{F \cdot df_{factor}}{F \cdot df_{factor} + df_{error}}$
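Because F equals (SS_factor / SS_error) multiplied by (df_error / df_factor), the ratio of sums of squares can be recovered from F and the two degrees of freedom. The sketch below does that, and also converts the ratio to an R² under the assumption that factor and error sums of squares together make up the total being partitioned. The function name and the numbers are hypothetical.

```python
def f_to_effect(f_stat, df_factor, df_error):
    """Recover SS_factor/SS_error and the corresponding R^2 from F.

    Assumes R^2 = SS_factor / (SS_factor + SS_error), i.e. factor and
    error are the only components of the variance being partitioned.
    """
    if f_stat < 0 or df_factor <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df positive")
    ss_ratio = f_stat * df_factor / df_error   # SS_factor / SS_error
    r2 = ss_ratio / (1.0 + ss_ratio)
    return ss_ratio, r2


# Made-up F and degrees of freedom, for illustration only.
ss_ratio, r2 = f_to_effect(f_stat=4.0, df_factor=1, df_error=20)
print(round(ss_ratio, 3), round(r2, 3))
```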

Page 39: Lecture 12: Generalized Linear Models (GLM)

Partial and total $R^2$

The total $R^2$ ($R^2_{Y \cdot B}$) is the proportion of variance in Y accounted for (explained) by a set of independent variables B.

The partial $R^2$ ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) is the proportion of variance in Y accounted for by B when the variance accounted for by another set A is removed.

[Diagram: proportion of variance accounted for by both A and B ($R^2_{Y \cdot A,B}$); proportion accounted for by A only ($R^2_{Y \cdot A}$, total $R^2$); proportion accounted for by B independent of A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$, partial $R^2$).]

Page 40: Lecture 12: Generalized Linear Models (GLM)

Partial and total $R^2$

The total $R^2$ ($R^2_{Y \cdot B}$) for set B equals the partial $R^2$ ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$) for set B if either (1) the total $R^2$ for A ($R^2_{Y \cdot A}$) is zero; or (2) A and B are independent (in which case $R^2_{Y \cdot A,B} = R^2_{Y \cdot A} + R^2_{Y \cdot B}$).

[Diagram: the proportion of variance accounted for by B ($R^2_{Y \cdot B}$, total $R^2$) and the proportion independent of A ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$, partial $R^2$), which are equal iff one of the two conditions holds.]

Page 41: Lecture 12: Generalized Linear Models (GLM)

Partial and total $R^2$

In simple linear regression and single-factor ANOVA, there is only one independent variable X (either continuous or categorical).

In these cases, set B includes only one variable X, so total $R^2$ ($R^2_{Y \cdot B}$) = total $R^2$ ($R^2_{Y \cdot X}$), and the partial and total $R^2$ are the same.

[Figure: growth rate (cm/day) against water temperature (°C).]

Page 42: Lecture 12: Generalized Linear Models (GLM)

Partial and total $R^2$

In ANCOVA and multiple-factor ANOVA, there are several independent variables X1, X2, ... (either continuous or categorical), so set B includes several variables.

In this case, the total and partial $R^2$ may be very different.

[Figure: growth rate (cm/day) against water temperature (°C) at pH = 6.5 and pH = 4.5.]

Page 43: Lecture 12: Generalized Linear Models (GLM)

Example: Partial and total $R^2$ in ANCOVA

Two independent variables: X1 (continuous) and X2 (categorical)

$A = \{X_1\}, \quad B = \{X_2\}$

$R^2_{Y \cdot A,B} = R^2_{Y \cdot X_1,X_2}$

$R^2_{Y \cdot A} = R^2_{Y \cdot X_1}$

$R^2_{Y \cdot B} = R^2_{Y \cdot X_2}$

$R^2_{Y \cdot A,B} - R^2_{Y \cdot A} = R^2_{Y \cdot X_1,X_2} - R^2_{Y \cdot X_1}$

[Figure: Y against X1 for X2 = L1 and X2 = L2.]

Page 44: Lecture 12: Generalized Linear Models (GLM)

Defining effect size in GLM

The effect size, denoted $f^2$, is given by the ratio of the factor (source) $R^2_{factor}$ to 1 minus the appropriate error $R^2_{error}$:

$f^2 = \frac{R^2_{factor}}{1 - R^2_{error}}$

Note: both $R^2_{factor}$ and $R^2_{error}$ depend on the null hypothesis under investigation.

Page 45: Lecture 12: Generalized Linear Models (GLM)

Effects of sex and age on size of sturgeon at The Pas (common regression)

Page 46: Lecture 12: Generalized Linear Models (GLM)

Defining effect size in GLM: case 1

Case 1: a set B is related to Y, and the total $R^2$ ($R^2_{Y \cdot B}$) is determined. The error variance proportion is then $1 - R^2_{Y \cdot B}$.

$H_0: R^2_{Y \cdot B} = 0$

Example: effect of age on sturgeon size at The Pas

B = {LAGE}

$f^2 = \frac{R^2_{factor}}{1 - R^2_{error}} = \frac{R^2_{LAGE}}{1 - R^2_{LAGE}} = \frac{.690}{1 - .690} = 2.23$
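The case 1 arithmetic can be checked in a couple of lines. The function name is hypothetical; the value .690 is the one quoted on the slide.

```python
def effect_size_case1(r2_b):
    """f^2 when a single set B is related to Y: R^2 / (1 - R^2)."""
    if not 0.0 <= r2_b < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return r2_b / (1.0 - r2_b)


# R^2 for LAGE from the sturgeon example at The Pas.
f2_case1 = effect_size_case1(0.690)
print(round(f2_case1, 2))
```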

Page 47: Lecture 12: Generalized Linear Models (GLM)

Effects of sex and age on size of sturgeon at The Pas

Page 48: Lecture 12: Generalized Linear Models (GLM)

Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

Page 49: Lecture 12: Generalized Linear Models (GLM)

Defining effect size in GLM: case 2

Case 2: the proportion of variance of Y due to B over and above that due to A is determined ($R^2_{Y \cdot A,B} - R^2_{Y \cdot A}$).

The error variance proportion is then $1 - R^2_{Y \cdot A,B}$.

$H_0: R^2_{Y \cdot A,B} - R^2_{Y \cdot A} = 0$

Example: effect of SEX$*LAGE on sturgeon size at The Pas

B = {SEX$*LAGE}; A,B = {SEX$, LAGE, SEX$*LAGE}

$f^2 = \frac{R^2_{\{SEX\$, LAGE, SEX\$*LAGE\}} - R^2_{\{SEX\$, LAGE\}}}{1 - R^2_{\{SEX\$, LAGE, SEX\$*LAGE\}}} = \frac{.697 - .696}{1 - .697} = .003$
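The case 2 arithmetic follows the same pattern, with the partial R² in the numerator and the full-model R² in the denominator. The function name is hypothetical; the two R² values are those quoted for the sturgeon example.

```python
def effect_size_case2(r2_ab, r2_a):
    """f^2 for set B over and above set A:
    (R^2_{A,B} - R^2_A) / (1 - R^2_{A,B})."""
    if not 0.0 <= r2_a <= r2_ab < 1.0:
        raise ValueError("need 0 <= R^2_A <= R^2_{A,B} < 1")
    return (r2_ab - r2_a) / (1.0 - r2_ab)


# Full model (.697) versus ANCOVA model (.696) for sturgeon at The Pas.
f2_case2 = effect_size_case2(r2_ab=0.697, r2_a=0.696)
print(round(f2_case2, 3))
```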

Page 50: Lecture 12: Generalized Linear Models (GLM)

Determining power

Once $f^2$ has been determined, either a priori (as an alternate hypothesis) or a posteriori (the observed effect size), calculate the non-central F parameter $\lambda$:

$\lambda = f^2 (\nu_1 + \nu_2 + 1)$

Knowing $\lambda$ and the factor (source) ($\nu_1$) and error ($\nu_2$) degrees of freedom, we can determine power from appropriate tables for given $\alpha$.

[Figure: power ($1 - \beta$) chart for $\nu_1 = 2$, with curves for decreasing $\nu_2$ at $\alpha = .05$ and $\alpha = .01$.]

Page 51: Lecture 12: Generalized Linear Models (GLM)

Example: effect of pH and nutrient levels on growth rate of bass

Sample of 35 lakes.

3 pH levels: acid, circumneutral, basic.

For each lake, an estimate of growth rate is obtained (e.g. from size-age regression).

What is the probability of detecting a true effect size as large as the sample effect size for pH*N once effects of N and pH have been controlled for, given $\alpha$ = .05?

Variable                  df    p
pH                        2     0.15
Nutrient (N)              1     <.01
pH*N                      2     0.20
Error                     29

$R^2$ {pH, N, pH*N}       0.44
$R^2$ {pH, N}             0.36
$R^2$ {N}                 0.27

Page 52: Lecture 12: Generalized Linear Models (GLM)

Example: effect of pH and nutrient levels on growth rate of bass

Sample effect size $f^2$ for pH*N once effects of pH and N have been controlled for = 0.14

Source (pH*N) df = $\nu_1$ = 2; error df = $\nu_2$ = 35 - 2 - 2 - 1 - 1 = 29

Use tables of $\lambda$ based on $R^2$ to get power (NOT the same tables as for ANOVA).

$f^2 = \frac{R^2_{\{pH, N, pH*N\}} - R^2_{\{pH, N\}}}{1 - R^2_{\{pH, N, pH*N\}}} = \frac{.44 - .36}{1 - .44} = .14$

$\lambda = f^2 (\nu_1 + \nu_2 + 1) = .14 (2 + 29 + 1) = 4.48$

$1 - \beta = .44$ from tables, given $\alpha$, $\nu_1$, $\nu_2$
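The numbers in the bass example can be reproduced step by step: the effect size for the pH*N interaction, the error degrees of freedom, and the non-centrality parameter. This sketch stops at the non-centrality parameter, since the power itself is read from tables in the lecture. The function names are hypothetical; the inputs are the values given in the example.

```python
def effect_size_f2(r2_full, r2_reduced):
    """f^2 for the terms present in the full model but not the reduced one."""
    return (r2_full - r2_reduced) / (1.0 - r2_full)


def noncentrality(f2, df_factor, df_error):
    """Non-central F parameter: f^2 * (df_factor + df_error + 1)."""
    return f2 * (df_factor + df_error + 1)


# Values from the bass example.
n_lakes = 35
df_ph, df_n, df_interaction = 2, 1, 2
df_error = n_lakes - df_ph - df_interaction - df_n - 1   # 29

f2 = effect_size_f2(r2_full=0.44, r2_reduced=0.36)
# The slide rounds f^2 to two decimals before multiplying.
lam = noncentrality(round(f2, 2), df_interaction, df_error)
print(df_error, round(f2, 2), round(lam, 2))
```

Using the unrounded effect size instead gives a slightly larger non-centrality parameter, so the choice of rounding matters a little when reading power from tables.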