Using Repeated Measures Analysis of Variance (RM-ANOVA) to ...
Repeated Measures ANOVA (RM ANOVA) and Mixed Effects … · RM ANOVA: Greenhouse-Geisser /...
Transcript of Repeated Measures ANOVA (RM ANOVA) and Mixed Effects … · RM ANOVA: Greenhouse-Geisser /...
Lukas Meier (most material based on lecture notes and slides from H.R. Roth)
Repeated Measures ANOVA (RM ANOVA)
and Mixed Effects Models
Repeated Measures ANOVA (RM ANOVA)
Now we want to model everything in “one go”.
We use (multiple) ANOVA approaches. Split-plot model Mixed effects models
1
RM ANOVA: Growth Curves
We have three factors: sex (2 levels) age (4 levels) person (27 levels)
We treat age as a categorical variable. This gives us maximal flexibility as we
do not have to care about the functional form of the age effect.
We set up a model of the form
𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛿𝑗(𝑖) + 𝛽𝑘 + 𝛼𝛽 𝑖𝑘 + 휀𝑖𝑗𝑘
𝑖: sex, 𝑗: person, 𝑘: time-point
2
effect of sexdistance effect of time-
point 𝑘interaction
time × sex
deviation (or
error) of
person 𝑗
boy girl
16
20
24
28
32
8 10 12 14 8 10 12 14
age
dis
tance
person
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
RM ANOVA: Growth Curves
Person is a block factor.
We cannot have both sex and person as fixed effects in the model because
this model would not be identifiable anymore.
Here it seems quite natural to treat person as a so called random effect (better:
think of a random error per person). That is, we assume 𝛿𝑗(𝑖) ∼ 𝑁(0, 𝜎𝛿2).
This is nothing else than the split-plot model that some have seen in the
ANOVA class. It is not a split-plot design, because age was not randomized!
We have
whole plot = person split plot = age
3
12 12 12
8
10
8
10
8
10
8
10 …
14 141414
12 12 12
8
10
8
10
8
10
8
10 …
14 141414
12
Boy 1 Girl 1Boy 2 Girl 2
12not randomized
here
RM ANOVA: Growth Curves
We therefore have a so called mixed effects model (containing random and
fixed effects).
We can fit this in R with the lmer function in package lmerTest.
Note that the denominator degrees of freedom for sex are only 25 as we only
have 27 observations on the whole-plot level (patients!).
You can think of doing a two-sample 𝑡-test with two groups having 16 and 11
observations, respectively: 25 = 16 + 11 – 2.
4
RM ANOVA: Growth Curves
Going back to the original questions: Are the profiles of girls and boys parallel? check interaction
Does dental distance change over time? check effect of age
Is the average level of boys and girls the same or not? check effect of sex
5
RM ANOVA: Induced Correlation Structure
Let us have a closer look at the model
𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛿𝑗(𝑖) + 𝛽𝑘 + 𝛼𝛽 𝑖𝑘 + 휀𝑖𝑗𝑘
with 𝛿𝑗(𝑖) ∼ 𝑁(0, 𝜎𝛿2) and 휀𝑖𝑗𝑘 ∼ 𝑁(0, 𝜎2).
We get Var 𝑌𝑖𝑗𝑘 = 𝜎𝛿
2 + 𝜎2
Cor 𝑌𝑖𝑗𝑘 , 𝑌𝑖𝑗𝑘∗ = 𝜎𝛿2/(𝜎𝛿
2 + 𝜎2) if 𝑘 ≠ 𝑘∗ (same person but different time-points)
Cor 𝑌𝑖𝑗𝑘 , 𝑌𝑖∗𝑗∗𝑘∗ = 0 if 𝑖 ≠ 𝑖∗ or 𝑗 ≠ 𝑗∗ (different persons)
Observations of the same person are correlated. The correlation is constant,
i.e., does not depend on how far away the time-points lie (this is not always a
meaningful assumption for growth curves!).
Observations from different persons are independent.
6
RM ANOVA: Induced Correlation Structure
Hence, we have the following correlation matrix per person
1 𝜌 𝜌 𝜌𝜌 1 𝜌 𝜌𝜌 𝜌 1 𝜌𝜌 𝜌 𝜌 1
where 𝜌 = 𝜎𝛿2/(𝜎𝛿
2 + 𝜎2) is the so-called intra-class correlation.
This correlation structure is called compound symmetry.
Here, the empirical correlation matrix is
7
RM ANOVA: Greenhouse-Geisser / Huynh-Feldt Epsilon
It is not uncommon that repeated measures data violate the compound
symmetry assumption.
There are measures which describe the deviation from the compound
symmetry model.
Greenhouse-Geisser Epsilon: 휀GG (rather conservative) Huynh-Feldt Epsilon: 휀HF
We have 휀GG ≤ 휀HF ≤ 1, where “= 1” means no deviation.
Correction is being performed by multiplying both the numerator and the
denominator degrees of freedom of the 𝐹-distribution with 휀GG (or 휀HF).
Program “by hand” or use function Anova in package car.
This only affects within-subjects factors!
Alternative: Use model nesting approach? (structured vs. unstructured
approach?)8
Example: Pulse (Spector, 1987)
3 groups of 8 patients each Drug 1 Drug 2 Control (placebo)
Measure pulse at 5, 10, 15 and 20 minutes after taking medication.
9
control drug1 drug2
60
70
80
90
5 10 15 20 5 10 15 20 5 10 15 20
period
y
subject
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Example: Pulse
We treat time as a categorical predictor to be flexible enough.
We use the model
𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛽𝑘 + 𝛼𝛽 𝑖𝑘 + 휀𝑖𝑗𝑘
We use this example to illustrate how one can use other correlation structures.
The random effects (errors) at first sight disappeared!
They are now fully integrated into the error term 휀𝑖𝑗 ∼ 𝑁(0, Σ) ∈ ℝ4
When dealing with longitudinal data it is quite common to use an
autoregressive correlation structure.
10
effect of drugpulseeffect of
time-point 𝑘interaction
drug × time
error term
(correlated)
Example: Pulse
An 𝑨𝑹(𝟏) structure would be (exponential decay)
Σ = 𝜎2
1 𝜙 𝜙2 𝜙3
𝜙 1 𝜙 𝜙2
𝜙2 𝜙 1 𝜙
𝜙3 𝜙2 𝜙 1
The compound symmetry model would be (slightly other notation)
Σ = 𝜎2
1 𝜌 𝜌 𝜌𝜌 1 𝜌 𝜌𝜌 𝜌 1 𝜌𝜌 𝜌 𝜌 1
Many more choices possible.
11
Example: Pulse
For such models, the older package nlme is more user friendly than lme4 or
lmerTest.
We use the function gls (generalized least sq.) or lme from package nlme.
12
Random Intercept / Random Slope Model: Growth Curves
Due to the “nice” profiles we can also try to model the growth curves with a
linear regression model approach (see also summary statistic approach).
We us a random intercept / random slope model:
𝑌𝑖𝑗𝑘 = 𝜇 + 𝛼𝑖 + 𝛿𝑗(𝑖) + (𝛽𝑖 + 𝛽𝑗 𝑖 )𝑥𝑖𝑗𝑘 + 휀𝑖𝑗𝑘
Here: 𝑥𝑖𝑗𝑘 ∈ {8, 10, 12 , 14} is time (as a continuous predictor variable).
It is natural to center time.
That means we use −3,−1, 1 , 3 instead of 8, 10, 12 , 14 , otherwise intercept
and slope estimates will always be correlated.
13
effect of sexdistanceslope in
group 𝑖
person specific deviation from
population slope: 𝑁(0, 𝜎𝛽2).
person specific
deviation from
population
intercept : 𝑁(0, 𝜎𝛿2).
boy girl
16
20
24
28
32
8 10 12 14 8 10 12 14
age
dis
tance
person
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
Random Intercept / Random Slope Model: Growth Curves
We can model the two random effects 𝛿𝑗(𝑖) and 𝛽𝑗(𝑖) as independent random
variables or we can allow any correlation structure.
Most general model is multivariate normal with unspecified covariance
matrix:
(𝛿𝑗 𝑖 , 𝛽𝑗(𝑖)) ∼ 𝑁(0, Σ) ∈ ℝ2,
where Σ is a 2 × 2 covariance matrix.
14
Random Intercept / Random Slope Model: Growth Curves
15
Model with arbitrary correlation structure:
ො𝜎𝛽
ො𝜎
ො𝜎𝛿
Check if model was
interpreted correctly
Corr(𝛿𝑗 𝑖 , 𝛽𝑗 𝑖 )
Random Intercept / Random Slope Model: Growth Curves
16
Model with independence assumption:
ො𝜎𝛿
ො𝜎
ො𝜎𝛽 Check if model was
interpreted correctly
Random Intercept / Random Slope Model: Growth Curves
We can compare the nested models with the anova command.
Smaller model (independent intercept and slope) seems to be complex enough.
However, results are very close anyway.
Compare results with summary statistic approach!
Actually, we can further simplify this model (see R-Code).
17
Random Intercept / Random Slope Model: Additional Insights …
For time points 𝑡𝑘 the expressions for the variances and covariances of the
observations 𝑌𝑖𝑗𝑘 are
Var 𝑌𝑖𝑗𝑘 = 𝜎𝛿2 + 2𝑡𝑘Cov 𝛿𝑗 𝑖 , 𝛽𝑗 𝑖 + 𝑡𝑘
2𝜎𝛽2 + 𝜎2
Cov 𝑌𝑖𝑗𝑘 , 𝑌𝑖𝑗𝑘∗ = 𝜎𝛿2 + 𝑡𝑘 + 𝑡𝑘∗ Cov 𝛿𝑗 𝑖 , 𝛽𝑗 𝑖 + 𝑡𝑘𝑡𝑘∗ 𝜎𝛽
2
This means that for time points
𝑡𝑘 > −Cov 𝛿𝑗 𝑖 ,𝛽𝑗 𝑖
𝜎𝛽2
the variance Var 𝑌𝑖𝑗𝑘 increases (and decreases before that time point).
Of course the data does not always follow this assumption.
See also R-File.
18
Example: Rats (Box, 1950)
19
Weight gain of rats under three different experimental conditions.
27 rats were randomly assigned to three treatment groups:(1) Control(2) Drinking water with thyroxin(3) Drinking water with thiourocil
Each animal was kept separately (important!).
Weight was recorded every week, the first time at the beginning of the
experiment:
Week 0 Week 1 Week 2 Week 3 Week 4
Example: Rats
20
Example: Rats
See R-File.
21