Introduction and Overview.pdf

LONGITUDINAL ANALYSIS

Longitudinal Analysis refers to study designs that include measurements for the same response variable taken at several occasions for each subject, thus resulting in a response profile for each subject. The aim is to model and compare the mean response profiles for different groups or strata, where the groups are defined by the main exposure variable such as treatment. Particular features of the data are

1. The variables are the SAME response measured at different times.

2. Measurements are taken at selected occasions on an underlying continuous time scale.

3. The sequential nature of the observations leads to particular kinds of covariance structures. The inherent dependence among the variables introduces extra complications into the analysis.

References

1. Dobson AJ & Barnett AG, An Introduction to Generalized Linear Models, third edition, CRC Press, 2008.

2. Everitt BS & Hothorn T, A Handbook of Statistical Analyses Using R, CRC Press, 2005.

3. Hand D & Crowder M, Practical Longitudinal Data Analysis, Chapman & Hall, 1996.

4. Pinheiro JC & Bates DM, Mixed-Effects Models in S and S-PLUS, Springer, 2000.

Examples

From Crowder & Hand (1996):

1. Body weights of rats on 3 different diets measured on 11 occasions.

2. Blood glucose levels of 6 volunteers after a 6 am meal.

From Dobson & Barnett: a study investigating 3 different stroke rehabilitation programs,

A = a new occupational therapy,

B = existing stroke rehabilitation program conducted at the same hospital as A,

C = existing stroke rehabilitation program conducted at a different hospital,

with Response = Barthel index, a measure of functional ability = a score out of 100, where higher scores correspond to better outcomes.

Each patient was assessed weekly for an 8-week period.

> stroke <- ...

> attach(stroke)

> stroke

subject group week1 week2 week3 week4 week5 week6 week7 week8

1 1 1 45 45 45 45 80 80 80 90

2 2 1 20 25 25 25 30 35 30 50

3 3 1 50 50 55 70 70 75 90 90

4 4 1 25 25 35 40 60 60 70 80

5 5 1 100 100 100 100 100 100 100 100

6 6 1 20 20 30 50 50 60 85 95

7 7 1 30 35 35 40 50 60 75 85

8 8 1 30 35 45 50 55 65 65 70

9 9 2 40 55 60 70 80 85 90 90

10 10 2 65 65 70 70 80 80 80 80

11 11 2 30 30 40 45 65 85 85 85

12 12 2 25 35 35 35 40 45 45 45

13 13 2 45 45 80 80 80 80 80 80

14 14 2 15 15 10 10 10 20 20 20

15 15 2 35 35 35 45 45 45 50 50

16 16 2 40 40 40 55 55 55 60 65

17 17 3 20 20 30 30 30 30 30 30

18 18 3 35 35 35 40 40 40 40 40

19 19 3 35 35 35 40 40 40 45 45

20 20 3 45 65 65 65 80 85 95 100

21 21 3 45 65 70 90 90 95 95 100

22 22 3 25 30 30 35 40 40 40 40

23 23 3 25 25 30 30 30 30 35 40

24 24 3 15 35 35 35 40 50 65 65

[Figure: mean of week versus time (1 to 8), one line per group (1, 2, 3)]
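The mean response profiles summarised in the figure can be computed with base R. The following is a minimal sketch, not the code used for these notes: it uses only six of the 24 subjects from the table above (two per group), and the names `stroke_sub`, `stroke_long`, `score` and `mean_profile` are hypothetical.

```r
# Subset of the stroke data shown above (two subjects per group), wide format.
stroke_sub <- data.frame(
  subject = c(1, 2, 9, 10, 17, 18),
  group   = c(1, 1, 2, 2, 3, 3),
  week1 = c(45, 20, 40, 65, 20, 35),
  week2 = c(45, 25, 55, 65, 20, 35),
  week3 = c(45, 25, 60, 70, 30, 35),
  week4 = c(45, 25, 70, 70, 30, 40),
  week5 = c(80, 30, 80, 80, 30, 40),
  week6 = c(80, 35, 85, 80, 30, 40),
  week7 = c(80, 30, 90, 80, 30, 40),
  week8 = c(90, 50, 90, 80, 30, 40)
)

# Reshape to long format: one row per subject and occasion.
stroke_long <- reshape(
  stroke_sub,
  direction = "long",
  varying   = paste0("week", 1:8),
  v.names   = "score",
  timevar   = "time",
  times     = 1:8,
  idvar     = "subject"
)

# Mean response profile: rows are groups, columns are weeks.
mean_profile <- tapply(
  stroke_long$score,
  list(group = stroke_long$group, time = stroke_long$time),
  mean
)
print(mean_profile)
```

Each row of `mean_profile` is one group's mean profile; plotting the rows against week (for example with `matplot`) gives a profile plot of the kind shown in the figure.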

There are usually three fundamental questions:

1. Are the profiles of the means of the groups at the same level; testing for a GROUP effect.

2. Are the profiles flat; testing for a TIME effect.

3. Are the profiles parallel; testing for a GROUP BY TIME INTERACTION.

Methodology

The inherent DEPENDENCE between the repeated measurements for each subject implies that the data do NOT have the properties arising from IID variables.

There are 2 methods that sidestep this problem so that independence-based methods can be used:

1. COMPARISONS AT EACH TIME/OCCASION, which is equivalent to analysing each time separately and then comparing the analyses. However, this allows statements about the change of averages and not about the average of changes. It is thus INVALID.

2. RESPONSE FEATURE ANALYSIS, where the vector of measurements on each subject is reduced to a single summarising score which can then be subjected to a simple univariate analysis. For example

(a) the peak value achieved after administration of some treatment,

(b) the time taken to return to some baseline value,

(c) the difference between average post- and pre-treatment scores,

(d) the area under some curve,

(e) the slope, i.e. the rate of change over time.

DISADVANTAGE = LACK OF STATISTICAL POWER
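As an illustration of response feature analysis, the sketch below reduces each subject's profile to its least-squares slope (feature (e)) and then compares the slopes between groups with a one-way ANOVA. It is a sketch under stated assumptions: only six subjects from the stroke table are used, and the object names (`scores`, `slope`, `mean_slope`) are hypothetical.

```r
# Weekly scores for six subjects from the stroke data (rows named by subject).
weeks <- 1:8
scores <- rbind(
  "1"  = c(45, 45, 45, 45, 80, 80, 80, 90),
  "5"  = c(100, 100, 100, 100, 100, 100, 100, 100),
  "9"  = c(40, 55, 60, 70, 80, 85, 90, 90),
  "10" = c(65, 65, 70, 70, 80, 80, 80, 80),
  "17" = c(20, 20, 30, 30, 30, 30, 30, 30),
  "18" = c(35, 35, 35, 40, 40, 40, 40, 40)
)
group <- factor(c(1, 1, 2, 2, 3, 3))

# Response feature: least-squares slope of score on week, one value per subject.
slope <- apply(scores, 1, function(y) coef(lm(y ~ weeks))[["weeks"]])

# There is now a single number per subject, so the values are independent
# across subjects and a simple univariate analysis applies.
mean_slope <- tapply(slope, group, mean)
print(mean_slope)

fit <- aov(slope ~ group)
print(summary(fit))
```

With so few subjects the ANOVA has almost no power, which is the disadvantage noted above: the eight measurements per subject are collapsed into one.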

There are several statistical approaches to this problem. We are going to focus on two regression type modeling approaches:

1. Generalized Estimating Equations (GEE): Regression approach modeling the correlation structure explicitly.

2. Mixed Effect Models: Regression approach using models with fixed and random effects.

We illustrate these for a continuous response.

GENERALIZED ESTIMATING EQUATIONS

Generalized estimating equations are extensions of generalized linear models that take into account the within-subject association by specifying a structure for the within-subject correlations, called the working correlation matrix. It has been shown that, even if the structure of this matrix is wrongly specified, the estimates of the effects of interest remain consistent provided the mean model is correct, and inference is valid provided robust sandwich estimators of the standard errors are used.

Model Formulation

Assume we have $N$ subjects, where subject $i$ has $n_i$ measurements. Let $y_i = (y_{i1}, y_{i2}, \ldots, y_{in_i})^T$ be the response vector for subject $i$, and let $Y = (y_1^T, y_2^T, \ldots, y_N^T)^T$ be the vector of responses for all subjects.

If $Y$ is continuous with an underlying NORMAL distribution, we can model $Y$ as a function of the risk factors $X_1, X_2, \ldots, X_p$ using

$$Y = X\beta + e = \mu + e,$$

where $Y \sim N(\mu, V)$ and $V$ is the variance-covariance matrix. This is a block diagonal matrix, made up of subject-specific variance-covariance matrices on the diagonal (assuming independence between subjects):

$$V = \begin{pmatrix} V_1 & 0 & \cdots & 0 \\ 0 & V_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & V_N \end{pmatrix},$$

where

$$V_i = \begin{pmatrix} \sigma_{i11} & \sigma_{i12} & \cdots & \sigma_{i1n_i} \\ \sigma_{i21} & \sigma_{i22} & \cdots & \sigma_{i2n_i} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{in_i1} & \sigma_{in_i2} & \cdots & \sigma_{in_in_i} \end{pmatrix}.$$

We estimate the regression coefficients using the following expression:

$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} Y,$$

which shows that the estimated regression coefficients are a function of the data ($X$ and $Y$) and of the variance-covariance matrix $V$.
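The estimator above can be evaluated directly with matrix algebra. The following is a minimal sketch for a known $V$; the function name `gls_beta` and the toy data are assumptions made for illustration, and in practice GEE software estimates $V$ iteratively rather than taking it as given.

```r
# Generalised least squares estimate (X' V^-1 X)^-1 X' V^-1 Y for a known V.
gls_beta <- function(X, V, Y) {
  Vinv_X <- solve(V, X)   # V^{-1} X
  Vinv_Y <- solve(V, Y)   # V^{-1} Y
  solve(t(X) %*% Vinv_X, t(X) %*% Vinv_Y)
}

# Toy data (illustrative values only): four occasions on one cluster.
x <- c(0, 1, 2, 3)
Y <- c(45, 45, 50, 60)
X <- cbind(intercept = 1, time = x)

# With V equal to the identity the estimator reduces to ordinary least squares.
beta_ols <- gls_beta(X, diag(4), Y)

# With an AR(1)-type V (correlation 0.9^|lag|) the observations are weighted
# differently, so the estimate generally changes.
V_ar1 <- 0.9^abs(outer(1:4, 1:4, "-"))
beta_gls <- gls_beta(X, V_ar1, Y)

print(cbind(ols = as.vector(beta_ols), gls = as.vector(beta_gls)))
```

Comparing the two columns shows how the assumed covariance structure enters the point estimates, not only the standard errors.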

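There are several commonly used forms for the within-subject covariance matrices, $V_i$:

1. EXCHANGEABLE, assuming within-subject measurements to be correlated but all correlations are the same,

$$V_i = \sigma^2 \begin{pmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{pmatrix}$$

2. AUTOREGRESSIVE, where correlations depend on the distance between observations, for example, the first-order autoregressive model,

$$V_i = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}$$

3. UNSTRUCTURED, where all correlation terms are different,

$$V_i = \sigma^2 \begin{pmatrix} 1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\ \rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho_{n1} & \rho_{n2} & \rho_{n3} & \cdots & 1 \end{pmatrix}$$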
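The exchangeable and first-order autoregressive structures listed above can be constructed in a few lines of base R. This is an illustrative sketch; the helper names `exchangeable` and `ar1` are hypothetical and are not functions from any package.

```r
# Exchangeable structure: sigma^2 on the diagonal, sigma^2 * rho elsewhere.
exchangeable <- function(n, rho, sigma2 = 1) {
  R <- matrix(rho, nrow = n, ncol = n)
  diag(R) <- 1
  sigma2 * R
}

# First-order autoregressive structure: entry (j, k) is sigma^2 * rho^|j - k|.
ar1 <- function(n, rho, sigma2 = 1) {
  lag <- abs(outer(seq_len(n), seq_len(n), "-"))
  sigma2 * rho^lag
}

V_exch <- exchangeable(4, rho = 0.8)
V_ar1  <- ar1(4, rho = 0.8)

print(V_exch)
print(round(V_ar1, 3))
```

Under the exchangeable structure the first and last occasions are as strongly correlated as adjacent ones, whereas under AR(1) the correlation decays geometrically with the lag.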

Example

The observed pairwise correlations between the measurements taken at the 8 different occasions are:

> week <- ...

> cor(week)

week.1 week.2 week.3 week.4 week.5 week.6 week.7 week.8

week.1 1.0000000 0.9280364 0.8820153 0.8306544 0.7936646 0.7125630 0.6163514 0.5544246

week.2 0.9280364 1.0000000 0.9225559 0.8774061 0.8466833 0.7895900 0.7041487 0.6425992

week.3 0.8820153 0.9225559 1.0000000 0.9530914 0.9092148 0.8542616 0.7667288 0.7007907

week.4 0.8306544 0.8774061 0.9530914 1.0000000 0.9215159 0.8786341 0.8313352 0.7716004

week.5 0.7936646 0.8466833 0.9092148 0.9215159 1.0000000 0.9734304 0.9149511 0.8819552

week.6 0.7125630 0.7895900 0.8542616 0.8786341 0.9734304 1.0000000 0.9569344 0.9266933

week.7 0.6163514 0.7041487 0.7667288 0.8313352 0.9149511 0.9569344 1.0000000 0.9776126

week.8 0.5544246 0.6425992 0.7007907 0.7716004 0.8819552 0.9266933 0.9776126 1.0000000

This shows decreasing strength of association between measurements as the distance between the occasions increases.

We fit four GEE models, specifying different correlation structures:

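> model1 <- geeglm(week ~ group * time0, family = gaussian, data = strokelong, weights = subject, id = subject, waves = time, corstr = "independence")

> summary(model1)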

Call:

geeglm(formula = week ~ group * time0, family = gaussian, data = strokelong,

weights = subject, id = subject, waves = time, corstr = "independence")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 35.5440 10.3262 11.848 0.000577

group2 0.8394 11.6467 0.005 0.942548

group3 -2.3758 11.2491 0.045 0.832732

time0 6.5807 1.2645 27.083 1.95e-07

group2:time0 -2.4920 1.5373 2.628 0.105016

group3:time0 -2.8125 1.5750 3.189 0.074148

Estimated Scale Parameters:

Estimate Std.err

(Intercept) 413.1 103.8

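Correlation: Structure = independence

Number of clusters: 24 Maximum cluster size: 8

> model2 <- geeglm(week ~ group + time0 + group * time0, family = gaussian, data = strokelong, weights = subject, id = subject, waves = time, corstr = "exchangeable")

> summary(model2)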


Call:

geeglm(formula = week ~ group + time0 + group * time0, family = gaussian,

data = strokelong, weights = subject, id = subject, waves = time,

corstr = "exchangeable")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 35.544 10.326 11.85 0.00058

group2 0.839 11.647 0.01 0.94255

group3 -2.376 11.249 0.04 0.83273

time0 6.581 1.265 27.08 1.9e-07

group2:time0 -2.492 1.537 2.63 0.10502

group3:time0 -2.813 1.575 3.19 0.07415

Estimated Scale Parameters:

Estimate Std.err

(Intercept) 413 104

Correlation: Structure = exchangeable Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.831 0.0868

Number of clusters: 24 Maximum cluster size: 8

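> model3 <- geeglm(week ~ group * time0, family = gaussian, data = strokelong, weights = subject, id = subject, waves = time, corstr = "ar1")

> summary(model3)

Call:

geeglm(formula = week ~ group * time0, family = gaussian, data = strokelong,

weights = subject, id = subject, waves = time, corstr = "ar1")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 38.42 9.77 15.46 8.4e-05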

group2 -2.07 10.85 0.04 0.849

group3 -7.66 10.52 0.53 0.467

time0 6.35 1.21 27.66 1.4e-07

group2:time0 -2.64 1.44 3.38 0.066

group3:time0 -2.37 1.60 2.20 0.138


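Estimated Scale Parameters:

Estimate Std.err

(Intercept) 416 105

Correlation: Structure = ar1 Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.934 0.0294

Number of clusters: 24 Maximum cluster size: 8

> fitted3 <- ...

> res3 <- ...

> plot(fitted3,res3)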

> hist(res3)

Model validation

There are not many tools available for model checking after the geeglm command. We can generate residuals and fitted values, and hence plot residuals versus fitted values and look at a histogram of the residuals. The scatterplot of residuals versus fitted values shows possibly decreasing variance. The histogram of residuals shows some departure from normality.

[Figure: scatterplot of res3 versus fitted3]

[Figure: histogram of res3]

Inference

The geepack package (which provides geeglm) does not give us the option to assess model fit or to compare models with the same variables but different correlation structures. The best way to decide on a correlation structure is to compare the estimated working correlation matrix with the observed correlation matrix.
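One informal way to make this comparison is to set the correlations implied by each fitted working structure against the observed ones. The sketch below uses only numbers reported earlier in these notes: the first row of the observed correlation matrix (week 1 versus weeks 2 to 8, rounded to three decimals) and the estimated `alpha` from the exchangeable and AR(1) fits. The mean absolute difference is an ad hoc summary chosen for illustration, not a formal test.

```r
# Observed correlations between week 1 and weeks 2..8 (from cor(week) above).
observed <- c(0.928, 0.882, 0.831, 0.794, 0.713, 0.616, 0.554)
lag <- 1:7

# Estimated working-correlation parameters from the GEE output above.
alpha_exch <- 0.831
alpha_ar1  <- 0.934

# Correlations implied by each structure at lags 1..7.
implied_exch <- rep(alpha_exch, length(lag))
implied_ar1  <- alpha_ar1^lag

# Ad hoc summary of the discrepancy: mean absolute difference.
mad_exch <- mean(abs(observed - implied_exch))
mad_ar1  <- mean(abs(observed - implied_ar1))

print(round(cbind(lag, observed, implied_exch, implied_ar1), 3))
print(round(c(exchangeable = mad_exch, ar1 = mad_ar1), 3))
```

For this row the AR(1) structure tracks the observed decay with lag more closely than the constant exchangeable correlation does.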

We can use the anova command to compare GEE models that have a different number of variables. For example, we can fit a model without the interaction terms and compare this to a model with the interaction terms:

> model5 <- geeglm(week ~ group + time0, family = gaussian, data = strokelong, weights = subject, id = subject, waves = time, corstr = "ar1")

> summary(model5)

Call:

geeglm(formula = week ~ group + time0, family = gaussian, data = strokelong,

weights = subject, id = subject, waves = time, corstr = "ar1")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 46.007 7.091 42.10 8.7e-11

group2 -11.265 8.793 1.64 0.200

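group3 -15.910 8.804 3.27 0.071

time0 4.179 0.657 40.45 2.0e-10

Estimated Scale Parameters:

Estimate Std.err

(Intercept) 420 103

Correlation: Structure = ar1 Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.931 0.0304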


> anova(model3,model5)

Analysis of Wald statistic Table

Model 1 week ~ group * time0

Model 2 week ~ group + time0

Df X2 P(>|Chi|)

1 2 3.54 0.17

The inclusion of the interaction terms does not significantly improve model fit.

Fitting a model with an unstructured within-subject correlation matrix:


Call:


weights = subject, id = subject, corstr = "unstructured")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 35.369 10.372 11.63 0.00065

group2 0.557 11.756 0.00 0.96222

group3 -2.133 11.298 0.04 0.85024

time0 6.882 1.263 29.68 5.1e-08

group2:time0 -2.538 1.750 2.10 0.14687

group3:time0 -3.528 1.577 5.00 0.02531

Estimated Scale Parameters:

Estimate Std.err

(Intercept) 415 105

Correlation: Structure = unstructured Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha.1:2 0.371 0.0509

alpha.1:3 0.410 0.0453

alpha.1:4 0.474 0.0414

alpha.1:5 0.494 0.0392

alpha.1:6 0.450 0.0515

alpha.1:7 0.401 0.1079

alpha.1:8 0.430 0.0968

alpha.2:3 0.599 0.0539

alpha.2:4 0.695 0.1014

alpha.2:5 0.764 0.0881

alpha.2:6 0.771 0.1071

alpha.2:7 0.788 0.1071

alpha.2:8 0.819 0.1236

alpha.3:4 0.813 0.1217

alpha.3:5 0.871 0.0888

alpha.3:6 0.867 0.0950

alpha.3:7 0.860 0.0903

alpha.3:8 0.892 0.1006

alpha.4:5 1.034 0.1818

alpha.4:6 1.024 0.2001

alpha.4:7 0.999 0.1734

alpha.4:8 1.039 0.1840

alpha.5:6 1.155 0.1488

alpha.5:7 1.150 0.1412

alpha.5:8 1.194 0.1556

alpha.6:7 1.243 0.1451

alpha.6:8 1.286 0.1631

alpha.7:8 1.382 0.1595


Interpretation of Model 3:

Refit model 3 using different groups as reference category to aid interpretation:

(a) Using group 1 as reference category

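> model3a <- geeglm(week ~ group2 + group3 + time0 + group2 * time0 + group3 * time0, family = gaussian, data = strokelong, weights = subject, id = subject, corstr = "ar1")

> summary(model3a)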

Call:

geeglm(formula = week ~ group2 + group3 + time0 + group2 * time0 +

group3 * time0, family = gaussian, data = strokelong, weights = subject,

id = subject, corstr = "ar1")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 38.42 9.77 15.46 8.4e-05

group2TRUE -2.07 10.85 0.04 0.849

group3TRUE -7.66 10.52 0.53 0.467

time0 6.35 1.21 27.66 1.4e-07

group2TRUE:time0 -2.64 1.44 3.38 0.066

group3TRUE:time0 -2.37 1.60 2.20 0.138


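Estimated Scale Parameters:

Estimate Std.err

(Intercept) 416 105

Correlation: Structure = ar1 Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.934 0.0294

(b) Using group 2 as reference category

> model3b <- geeglm(week ~ group1 * time0 + group3 * time0, family = gaussian, data = strokelong, weights = subject, id = subject, corstr = "ar1")

> summary(model3b)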

Call:

geeglm(formula = week ~ group1 * time0 + group3 * time0, family = gaussian,

data = strokelong, weights = subject, id = subject, corstr = "ar1")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 36.351 4.725 59.19 1.4e-14

group1TRUE 2.066 10.852 0.04 0.849

time0 3.716 0.775 22.96 1.6e-06

group3TRUE -5.589 6.136 0.83 0.362

group1TRUE:time0 2.639 1.436 3.38 0.066

time0:group3TRUE 0.269 1.301 0.04 0.836


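Estimated Scale Parameters:

Estimate Std.err

(Intercept) 416 105

Correlation: Structure = ar1 Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.934 0.0294

(c) Using group 3 as reference category

> model3c <- geeglm(week ~ group1 * time0 + group2 * time0, family = gaussian, data = strokelong, weights = subject, id = subject, corstr = "ar1")

> summary(model3c)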

Call:

geeglm(formula = week ~ group1 * time0 + group2 * time0, family = gaussian,

data = strokelong, weights = subject, id = subject, corstr = "ar1")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) 30.762 3.914 61.76 3.9e-15

group1TRUE 7.655 10.525 0.53 0.46700

time0 3.985 1.045 14.54 0.00014

group2TRUE 5.589 6.136 0.83 0.36231

group1TRUE:time0 2.370 1.597 2.20 0.13790

time0:group2TRUE -0.269 1.301 0.04 0.83631


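Estimated Scale Parameters:

Estimate Std.err

(Intercept) 416 105

Correlation: Structure = ar1 Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.934 0.0294

Returning to model 3 to calculate confidence intervals for the effects:

> coeff <- ...

> coeff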

(Intercept) group2 group3 time0 group2:time0 group3:time0

38.42 -2.07 -7.66 6.35 -2.64 -2.37

> se <- ...

> se

Std.err

(Intercept) 9.77

group2 10.85

group3 10.52

time0 1.21

group2:time0 1.44

group3:time0 1.60

> CIgroup2 <- ...

> CIgroup3 <- ...

> CItime1 <- ...

> CIgroup2time <- ...

> CIgroup3time <- ...

> coeff3b <- ...

> se3b <- ...

> CItime2 <- ...

> coeff3c <- ...

> se3c <- ...

> CItime3 <- ...

> CI <- ...

> coeff

(Intercept) group2 group3 time0 group2:time0 group3:time0

38.42 -2.07 -7.66 6.35 -2.64 -2.37

> CI

[[1]]

[1] -23.3 19.2

[[2]]

[1] -28.3 13.0

[[3]]

[1] 3.99 8.72

[[4]]

[1] -5.452 0.175

[[5]]

[1] -5.501 0.761

[[6]]

[1] 2.20 5.24

[[7]]

[1] 1.94 6.03
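Intervals of this kind can be obtained as Wald intervals, estimate ± z × standard error. The sketch below is an assumption about how such intervals might be computed, not the code used for these notes (that code did not survive in this transcript). It uses the rounded model 3 estimates and standard errors printed above, so the last digit can differ slightly from the listed output.

```r
# Rounded estimates and robust standard errors from the model 3 summary.
est <- c("(Intercept)" = 38.42, group2 = -2.07, group3 = -7.66,
         time0 = 6.35, "group2:time0" = -2.64, "group3:time0" = -2.37)
se <- c(9.77, 10.85, 10.52, 1.21, 1.44, 1.60)

# 95% Wald confidence limits.
z  <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)

# Wald chi-square statistic (1 df) and p-value for each coefficient.
wald <- (est / se)^2
p    <- pchisq(wald, df = 1, lower.tail = FALSE)

print(round(cbind(est, se, ci, wald, p), 3))
```

The rows for `group2` and `time0` agree with the first and third intervals in the list above up to rounding.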

Because the model includes a time*group interaction, the main effects cannot be interpreted on their own. The coefficients for the main effects are estimates of those effects when the other variable involved in the interaction is zero.

So, the coefficient for time is the effect of time when both group2 and group3 are zero, i.e., the effect of time for group 1.

Thus with respect to the effect of time:

For group 1: a one week increase is associated with a 6.35 unit increase in the Barthel index score, and this increase is statistically significant (p < 0.0001).

Effect          Estimate  Std.err  p-value    CI
(Intercept)     38.42     9.77     < 0.0001
group2 vs 1     -2.07     10.85    0.849      (-23.3;19.2)
group3 vs 1     -7.66     10.52    0.467      (-28.3;13.0)
week(group 1)   6.35      1.21     < 0.0001   (3.99;8.72)
group2*week     -2.64     1.44     0.066      (-5.45;0.18)
group3*week     -2.37     1.60     0.138      (-5.50;0.76)
week(group 2)   3.72      0.76

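and the fixed effects are modelled as before using

$$\mu = \beta_0 + \beta_1(\text{group}) + \beta_2(\text{week}) + \beta_3(\text{group} \times \text{week}).$$

This results in the following means, variances and covariances:

$$E(y_{ij}) = \mu,$$

$$\mathrm{cov}(y_{ij}, y_{ik}) = \sigma_a^2 \quad (j \neq k),$$

$$\mathrm{cov}(y_{ij}, y_{mj}) = 0 \quad (i \neq m),$$

and

$$\mathrm{var}(y_{ij}) = \sigma_a^2 + \sigma_e^2.$$

This results in a within-subject covariance matrix that has an exchangeable correlation structure:

$$V_i = \begin{pmatrix} \sigma_a^2 + \sigma_e^2 & \sigma_a^2 & \cdots & \sigma_a^2 \\ \sigma_a^2 & \sigma_a^2 + \sigma_e^2 & \cdots & \sigma_a^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_a^2 & \sigma_a^2 & \cdots & \sigma_a^2 + \sigma_e^2 \end{pmatrix}$$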

A general mixed effect model formulation with one X-variable would be

$$y_{ij} = \beta_1 + \beta_2 x_{2ij} + a_i + e_{ij}.$$

Regrouping gives

$$y_{ij} = (\beta_1 + a_i) + \beta_2 x_{2ij} + e_{ij},$$

i.e., a regression model with a patient-specific intercept, also known as a random intercept model.

This model can be extended by adding a random component to the slopes:

$$y_{ij} = (\beta_1 + a_{1i}) + (\beta_2 + a_{2i}) x_{2ij} + e_{ij},$$

where $a_{1i} \sim N(0, \sigma_{a1}^2)$ and $a_{2i} \sim N(0, \sigma_{a2}^2)$, and the covariance between $a_{1i}$ and $a_{2i}$ is $\sigma_{a1a2}$.

Estimation

In R there are two procedures for fitting linear mixed effect models, the lme procedure (in the package nlme) and the more recent lmer procedure (in the package lme4). They both fit linear mixed effect models using either maximum likelihood (ML) or restricted maximum likelihood (REML). Maximum likelihood estimation tends to underestimate the variance components because it ignores the degrees of freedom used in estimating the fixed effects, while REML accounts for them and gives less biased estimates of the variance components.

Inference

Models can be compared using likelihood ratio chi-square statistics and Akaike's information criterion (AIC). When you wish to compare models with a different number of fixed effects, these models should have been estimated using ML rather than REML.

As for GLMs, the model coefficients approximately follow normal distributions and their statistical significance can be assessed using the Wald statistic.

Validation

For continuous responses, we now have several random terms that are assumed to be normally distributed with constant variance, and these assumptions can be checked for each of the random terms using a graphical analysis of residuals. The subject-specific random effects are not estimated as part of the model but can be determined using empirical Bayesian estimators.

Example

We return to the data for the stroke patients. In R we can build the design of repeated measures within subject-specific groups into the specification of a data object, which makes summarizing and graphing the longitudinal profiles much easier. This is done by adding a formula that specifies the roles of some of the variables, notably the response, the grouping factor and the primary covariate,

responseY ~ primaryX | groupingX.

In addition, we can specify between-group covariates as outer to the grouping factor,

responseY ~ primaryX | groupingX, outer = ~ Z.

Thus for our data,

> strokedata <- groupedData(week ~ time | subject, outer = ~ group, data = ...)

> strokedata

Grouped Data: week ~ time | subject

subject group time week

1.1.1 1 1 1 45

2.1.1 2 1 1 20

3.1.1 3 1 1 50

4.1.1 4 1 1 25

5.1.1 5 1 1 100

....

> plot(strokedata)

> plot(strokedata,outer=T)

[Figure: week versus time (weeks 1 to 8), one panel per subject]

[Figure: week versus time (weeks 1 to 8), one panel per group (1, 2, 3) with the subject profiles overlaid]

Model 1 with subject-specific random effect on intercept:

> memodel1 <- lme(week ~ group * time, random = ~1 | subject, data = strokedata)

> summary(memodel1)

Linear mixed-effects model fit by REML

Data: strokedata

AIC BIC logLik

1452.715 1478.521 -718.3573

Random effects:

Formula: ~1 | subject

(Intercept) Residual

StdDev: 20.12839 8.56443

Fixed effects: week ~ group * time

Value Std.Error DF t-value p-value

(Intercept) 29.821429 7.497379 165 3.977581 0.0001

group2 3.348214 10.602895 21 0.315783 0.7553

group3 -0.022321 10.602895 21 -0.002105 0.9983

time 6.324405 0.467228 165 13.536016 0.0000

group2:time -1.994048 0.660760 165 -3.017809 0.0030

group3:time -2.686012 0.660760 165 -4.065034 0.0001

Correlation:

(Intr) group2 group3 time grp2:t

group2 -0.707

group3 -0.707 0.500

time -0.280 0.198 0.198

group2:time 0.198 -0.280 -0.140 -0.707

group3:time 0.198 -0.140 -0.280 -0.707 0.500

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-2.655094020 -0.533345500 -0.003758181 0.535021362 2.692371849

Number of Observations: 192

Number of Groups: 24
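The two standard deviations in this output determine the within-subject covariance matrix implied by the random intercept model. The following sketch builds that matrix from the printed estimates and checks that every off-diagonal correlation equals the intraclass correlation $\sigma_a^2 / (\sigma_a^2 + \sigma_e^2)$. The object names are hypothetical.

```r
# Estimated standard deviations from the memodel1 summary above.
sd_int <- 20.12839   # random intercept (between-subject) SD
sd_res <- 8.56443    # residual (within-subject) SD
n_occ  <- 8          # occasions per subject

# Implied covariance matrix: sigma_a^2 everywhere plus sigma_e^2 on the diagonal.
V_i <- sd_int^2 * matrix(1, n_occ, n_occ) + sd_res^2 * diag(n_occ)

# Convert to a correlation matrix; all off-diagonal entries are identical.
R_i <- cov2cor(V_i)

# Intraclass correlation.
icc <- sd_int^2 / (sd_int^2 + sd_res^2)

print(round(R_i[1:3, 1:3], 3))
print(round(icc, 3))
```

The implied correlation of about 0.85 is constant across lags, which is the exchangeable structure; it is close to the exchangeable working correlation of 0.831 estimated in the GEE fit.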

Model with subject specific random intercept and random slope:

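> memodel4 <- lme(week ~ group * time, random = ~time | subject, data = strokedata)

> summary(memodel4)

Linear mixed-effects model fit by REML

Data: strokedata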

AIC BIC logLik

1345.732 1377.989 -662.8659

Random effects:

Formula: ~time | subject

Structure: General positive-definite, Log-Cholesky parametrization

StdDev Corr

(Intercept) 21.033143 (Intr)

time 2.949524 -0.372

Residual 5.181279

Fixed effects: week ~ group * time

Value Std.Error DF t-value p-value

(Intercept) 29.821429 7.572089 165 3.938336 0.0001

group2 3.348214 10.708551 21 0.312667 0.7576

group3 -0.022321 10.708551 21 -0.002084 0.9984

time 6.324405 1.080444 165 5.853524 0.0000

group2:time -1.994048 1.527979 165 -1.305023 0.1937

group3:time -2.686012 1.527979 165 -1.757886 0.0806

Correlation:

(Intr) group2 group3 time grp2:t

group2 -0.707

group3 -0.707 0.500

time -0.396 0.280 0.280

group2:time 0.280 -0.396 -0.198 -0.707

group3:time 0.280 -0.198 -0.396 -0.707 0.500

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-2.896086744 -0.454243718 0.004056635 0.461422937 3.166613171

Number of Observations: 192

Number of Groups: 24
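With a random slope the implied within-subject covariance is no longer exchangeable: the variance of $y_{ij}$ becomes $\sigma_{a1}^2 + 2 t\,\sigma_{a1a2} + t^2 \sigma_{a2}^2 + \sigma_e^2$ and so changes with time $t$. The sketch below evaluates this from the estimates printed for this model, using the standard identity $V_i = Z G Z^T + \sigma_e^2 I$. The object names are hypothetical.

```r
# Estimates from the random intercept and slope model output above.
sd_int   <- 21.033143   # SD of random intercepts
sd_slope <- 2.949524    # SD of random slopes
corr     <- -0.372      # correlation between intercepts and slopes
sd_res   <- 5.181279    # residual SD

# Covariance matrix G of the random effects (intercept, slope).
cov_is <- corr * sd_int * sd_slope
G <- matrix(c(sd_int^2, cov_is,
              cov_is,   sd_slope^2), nrow = 2, byrow = TRUE)

# Random-effects design matrix for weeks 1..8 and implied covariance matrix.
time <- 1:8
Z    <- cbind(1, time)
V_i  <- Z %*% G %*% t(Z) + sd_res^2 * diag(length(time))

# Implied variance at each week and implied correlation with week 1.
var_by_week    <- diag(V_i)
cor_with_week1 <- cov2cor(V_i)[1, ]

print(round(var_by_week, 1))
print(round(cor_with_week1, 3))
```

The implied variance rises from about 432 at week 1 to about 657 at week 8, and the implied correlation with week 1 falls as the lag increases, which an exchangeable structure cannot reproduce.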



Comparing the two models

> anova(memodel4,memodel1)

Model df AIC BIC logLik Test L.Ratio p-value

memodel4 1 10 1345.732 1377.989 -662.8659

memodel1 2 8 1452.715 1478.521 -718.3573 1 vs 2 110.9827 <.0001

> plot(memodel1)

> qqnorm(memodel1,~resid(.))

> plot(memodel4)


Model comparison using the LR test shows that the larger model 4 is significantly better than model 1. Model 4 also has a lower AIC. The plots of residuals versus fitted values and the qqnorm plots of residuals show that the assumptions of constant variance and normality of residuals are better satisfied for model 4 than for model 1.
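The likelihood ratio statistic and the AIC values in the anova table can be reproduced by hand from the reported log-likelihoods and parameter counts. This is a small sketch using only those printed numbers. Both fits are REML fits with the same fixed effects, which is why their REML log-likelihoods are comparable.

```r
# Log-likelihoods and numbers of parameters from the anova output above.
loglik_small <- -718.3573; df_small <- 8    # memodel1: random intercept
loglik_big   <- -662.8659; df_big   <- 10   # memodel4: random intercept and slope

# Likelihood ratio statistic and its chi-square p-value.
lr      <- 2 * (loglik_big - loglik_small)
p_value <- pchisq(lr, df = df_big - df_small, lower.tail = FALSE)

# AIC = -2 * logLik + 2 * (number of parameters).
aic <- function(loglik, k) -2 * loglik + 2 * k
aic_small <- aic(loglik_small, df_small)
aic_big   <- aic(loglik_big, df_big)

print(c(L.Ratio = lr, p.value = p_value))
print(c(AIC_memodel1 = aic_small, AIC_memodel4 = aic_big))
```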

[Figure: standardized residuals versus fitted values, memodel1]

[Figure: quantiles of standard normal versus residuals, memodel1]

[Figure: standardized residuals versus fitted values, memodel4]

[Figure: quantiles of standard normal versus residuals, memodel4]

Since our repeated measures within subjects are actually time series data, the exchangeable correlation structure that is induced by the addition of a single subject-specific random effect (as in model 1) may not be appropriate. We can specify different correlation structures as we did in the GEE models. We can first examine the assumption of independence between the residuals using the following plot, which shows a correlation between residuals at lag 1.

[Figure: autocorrelation of the residuals versus lag (0 to 6)]

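There are several serial autocorrelation structures we can fit. We consider just one of these, the AR1 process described in the discussion of GEE models above.

> memodel2 <- lme(week ~ group * time, random = ~1 | subject, correlation = corAR1(), data = strokedata)

> summary(memodel2)

Linear mixed-effects model fit by REML

Data: strokedata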

AIC BIC logLik

1322.321 1351.353 -652.1607

Random effects:

Formula: ~1 | subject

(Intercept) Residual

StdDev: 0.005875371 21.42593

Correlation Structure: AR(1)


Parameter estimate(s):

Phi

0.9495747

Fixed effects: week ~ group * time

Value Std.Error DF t-value p-value

(Intercept) 33.39312 7.937133 165 4.207201 0.0000

group2 -0.11517 11.224802 21 -0.010261 0.9919

group3 -6.22566 11.224802 21 -0.554635 0.5850

time 6.07484 0.843599 165 7.201092 0.0000

group2:time -2.14085 1.193030 165 -1.794467 0.0746

group3:time -2.23826 1.193030 165 -1.876112 0.0624

Correlation:

(Intr) group2 group3 time grp2:t

group2 -0.707

group3 -0.707 0.500

time -0.478 0.338 0.338

group2:time 0.338 -0.478 -0.239 -0.707

group3:time 0.338 -0.239 -0.478 -0.707 0.500

Standardized Within-Group Residuals:

Min Q1 Med Q3 Max

-2.1430561 -0.5861328 -0.2259585 0.6532256 2.8251769

Number of Observations: 192

Number of Groups: 24



> anova(memodel1,memodel2)

Model df AIC BIC logLik Test L.Ratio p-value

memodel1 1 8 1452.715 1478.521 -718.3573

memodel2 2 9 1322.321 1351.353 -652.1607 1 vs 2 132.3932 <.0001

> plot(memodel2)

[Figure: standardized residuals versus fitted values, memodel2]

[Figure: quantiles of standard normal versus residuals, memodel2]

Finally we can consider dropping the two interaction terms from the model. We now wish to compare the simplified model with just the main effects to the model with the main and interaction effects. To be able to compare models based on a different selection of fixed effects, we have to estimate the models using maximum likelihood. This shows us that the model with the interaction terms is the better model.

> memodel1ML <- lme(week ~ group * time, random = ~1 | subject, data = strokedata, method = "ML")

> memodel5ML <- lme(week ~ group + time, random = ~1 | subject, data = strokedata, method = "ML")

> anova(memodel1ML,memodel5ML)

Model df AIC BIC logLik Test L.Ratio p-value

memodel1ML 1 8 1470.786 1496.846 -727.3930

memodel5ML 2 6 1484.014 1503.560 -736.0072 1 vs 2 17.22849 2e-04

The previous analysis could have been done using the lmer procedure, rather than the lme procedure.

> library(lme4)

> memodel1a <- lmer(week ~ group * time + (1 | subject), data = strokedata)

> memodel1a

Linear mixed model fit by REML

Formula: week ~ group * time + (1 | subject)

Data: strokedata

AIC BIC logLik deviance REMLdev

1453 1479 -718.4 1455 1437

Random effects:

Groups Name Variance Std.Dev.

subject (Intercept) 405.152 20.1284

Residual 73.349 8.5644

Number of obs: 192, groups: subject, 24

Fixed effects:

Estimate Std. Error t value

(Intercept) 29.82143 7.49676 3.978

group2 3.34821 10.60201 0.316

group3 -0.02232 10.60201 -0.002

time 6.32440 0.46723 13.536

group2:time -1.99405 0.66076 -3.018

group3:time -2.68601 0.66076 -4.065

Correlation of Fixed Effects:

(Intr) group2 group3 time grp2:t

group2 -0.707

group3 -0.707 0.500

time -0.280 0.198 0.198

group2:time 0.198 -0.280 -0.140 -0.707

group3:time 0.198 -0.140 -0.280 -0.707 0.500

> qint <- ...

> qres <- ...

> layout(matrix(c(1,2),nrow=1,ncol=2))

> qqnorm(qint,main="Random intercepts")

> qqline(qint)

> qqnorm(qres,main="Residuals")

> qqline(qres)

> memodel4a <- lmer(week ~ group * time + (time | subject), data = strokedata)

> memodel4a

Linear mixed model fit by REML

Formula: week ~ group * time + (time | subject)

Data: strokedata

AIC BIC logLik deviance REMLdev

1346 1378 -662.9 1349 1326

Random effects:

Groups Name Variance Std.Dev. Corr

subject (Intercept) 442.3932 21.0331


time 8.6997 2.9495 -0.372

Residual 26.8457 5.1813

Number of obs: 192, groups: subject, 24

Fixed effects:

Estimate Std. Error t value

(Intercept) 29.82143 7.57209 3.938

group2 3.34821 10.70855 0.313

group3 -0.02232 10.70855 -0.002

time 6.32440 1.08049 5.853

group2:time -1.99405 1.52804 -1.305

group3:time -2.68601 1.52804 -1.758

Correlation of Fixed Effects:

(Intr) group2 group3 time grp2:t

group2 -0.707

group3 -0.707 0.500

time -0.396 0.280 0.280

group2:time 0.280 -0.396 -0.198 -0.707

group3:time 0.280 -0.198 -0.396 -0.707 0.500

> anova(memodel1a,memodel4a)

Data: strokedata

Models:

memodel1a: week ~ group * time + (1 | subject)

memodel4a: week ~ group * time + (time | subject)

Df AIC BIC logLik Chisq Chi Df Pr(>Chisq)

memodel1a 8 1470.9 1497.0 -727.46

memodel4a 10 1368.8 1401.3 -674.38 106.16 2 < 2.2e-16

> qint <- ...

> qtime <- ...

> qres <- ...

> layout(matrix(c(1,2,3,4),nrow=2,ncol=2))

> qqnorm(qint,main="Random Intercepts")

> qqline(qint)

> qqnorm(qtime,main="Random Slopes")

> qqline(qtime)

> qqnorm(qres,main="Residuals")

> qqline(qres)

> hist(qres)

To fit the model using ML rather than REML, use the following syntax:

> memodel1a <- ...

and autoregressive within-subject correlation structure.

GENERALIZED LINEAR MIXED EFFECT MODELS

For non-normally distributed data, we deal with a different distribution and a link function $g$, so that $E(y_{ij}) = \mu$ and $g(\mu) = \beta_0 + \beta_1 x_1 + \ldots + \beta_p x_p$. Random effects are as before and are assumed to follow normal distributions.

For a binary response we fit a mixed effect logistic regression model of the general form

$$\mathrm{logit}(p_{ij}) = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \beta_3 X_{3ij} + \ldots + \beta_p X_{pj} + u_j + e_{ij}.$$

The estimation procedure is more complicated but the general ideas hold. An approximate estimation method known as Penalized Quasi-Likelihood (PQL) is used. Because PQL maximizes a quasi-likelihood rather than a true likelihood, it is currently not possible to calculate AIC or generalized likelihood ratio test (GLRT) statistics.

Example

Data from a clinical trial to evaluate the effectiveness of the drug ribavirin on patients with AIDS (Wei et al. 1989). Three groups of 12 patients were assigned to placebo, low dose and high dose of ribavirin to evaluate its anti-retroviral effectiveness. The data present days to viral positivity (p24 antigen level > 100 pg/ml) in AIDS blood samples taken at 1, 2, and 3 months. Censoring occurred if a p24 level of 100 was not reached within a 4-week period or if the sample became contaminated before positivity was detected.

We will focus on the viral positivity indicator and use a mixed effect model to compare the three treatment groups with respect to the relative odds of achieving viral positivity. We will also look at the effect of month on this comparison.

> library(nlme)

> library(MASS)

> aids <- ...

> attach(aids)

> aids[1:10,]

patient month treatment stime viralpos

1 1 1 0 9 1

2 1 2 0 6 1

3 1 3 0 7 1

4 2 1 0 4 1

5 2 2 0 5 1

6 2 3 0 10 1

7 3 1 0 6 1

8 3 2 0 7 1

9 3 3 0 6 1

10 4 1 0 10 1

> aids$month <- ...

> aids$treat <- ...

> aidsdata <- ...

> aidsdata[1:10,]

Grouped Data: viralpos ~ month | patient

patient month treatment stime viralpos treat

1 1 1 0 9 1 0

2 1 2 0 6 1 0

3 1 3 0 7 1 0

4 2 1 0 4 1 0

5 2 2 0 5 1 0

6 2 3 0 10 1 0

7 3 1 0 6 1 0

8 3 2 0 7 1 0

9 3 3 0 6 1 0

10 4 1 0 10 1 0

> tabpct(aids$treatment[aids$month==1],aids$viralpos[aids$month==1],percent="row")

Row percent

aids$viralpos[aids$month == 1]

aids$treatment[aids$month == 1] 0 1 Total

0 0 12 12

(0) (100) (100)

1 3 9 12

(25) (75) (100)

2 2 10 12

(16.7) (83.3) (100)


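> tabpct(aids$treatment[aids$month==2],aids$viralpos[aids$month==2],percent="row")

Row percent

aids$viralpos[aids$month == 2]

aids$treatment[aids$month == 2] 0 1 Total

0 1 8 9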

(11.1) (88.9) (100)

1 4 8 12

(33.3) (66.7) (100)

2 1 10 11

(9.1) (90.9) (100)


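> tabpct(aids$treatment[aids$month==3],aids$viralpos[aids$month==3],percent="row")

Row percent

aids$viralpos[aids$month == 3]

aids$treatment[aids$month == 3] 0 1 Total

0 3 8 11

(27.3) (72.7) (100)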

1 5 6 11

(45.5) (54.5) (100)

2 5 7 12

(41.7) (58.3) (100)

> barplot(tapply(viralpos,list(treatment,month),mean),beside=T)

[Figure: bar plot of the proportion with viral positivity by treatment group within each month (1, 2, 3)]

> memodel1a <- ...

> CI <- ...

> names(CI)

[1] "fixed" "reStruct" "sigma"

> ORlcl <- ...

> OR <- ...

> ORucl <- ...

> results <- ...

> results

ORlcl OR ORucl

(Intercept) 6.96796542 37.1670643 198.2487831

treat1 0.01983000 0.1384685 0.9668946

treat2 0.05227792 0.3878785 2.8778834

month2 0.20838917 0.6439082 1.9896320

month3 0.04411934 0.1274121 0.3679532
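To help read this table, the odds ratios can be converted into predicted probabilities. The sketch below assumes the usual reference-level coding suggested by the row names, so that the intercept row gives the odds of viral positivity for placebo (treat 0) at month 1 for a subject whose random effect is zero. That interpretation and the helper `odds_to_prob` are assumptions made for illustration, not statements from the original analysis.

```r
# Point estimates from the OR column of the results table above.
odds_baseline <- 37.1670643
or <- c(treat1 = 0.1384685, treat2 = 0.3878785,
        month2 = 0.6439082, month3 = 0.1274121)

# Convert odds to a probability.
odds_to_prob <- function(odds) odds / (1 + odds)

# Baseline probability and the probability for low dose (treat1) at month 1.
p_baseline <- odds_to_prob(odds_baseline)
p_treat1   <- odds_to_prob(odds_baseline * or[["treat1"]])

# Percentage change in the odds relative to the reference level.
pct_change <- 100 * (or - 1)

print(round(c(placebo_month1 = p_baseline, lowdose_month1 = p_treat1), 3))
print(round(pct_change, 1))
```

On this reading, the odds of viral positivity at month 3 are about 87% lower than at month 1, and the interval for month3 in the table excludes 1.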

> fitted <- ...

> residuals <- ...

> plot(fitted,residuals)

[Figure: residuals versus fitted values]

[Figure: predp by factor(viralpos)]

Class Exercise

A randomized, double-blind, parallel group, multicentre study for the comparison of two oral treatments (A and B) for toenail dermatophyte onychomycosis (TDO). Subjects were followed up for 12 months and assessments made at baseline and months 1, 2, 3, 6, 9 and 12. There were two responses: (i) the unaffected nail length, and (ii) the severity of the infection, coded 0 for not severe and 1 for severe. Data come from De Backer et al., Journal of the European Academy of Dermatology and Venereology, 5 (Suppl. 1), 1995. A subset of the data for patients followed up at each visit is in toenail.csv. Compare the two treatment groups with respect to (i) unaffected nail length and (ii) severity of infection over the 12 month follow-up period.
