Stat 502 — Design and Analysis of Experiments
Factorial Design
Fritz Scholz
Department of Statistics, University of Washington
December 2, 2013
1
Factorial Design
We have looked at 1-sample, 2-sample, and t-sample problems.
We dealt with a treatment at t levels or with t treatments.
A treatment with t levels could also be viewed as a factor.
Now we address experiments where several factors come into play.
First we will do this for two such factors.
How should we go about this?
2
Insecticide Data
> poison = read.csv("poison.csv", header=T)
> poison
      y type delivery
1   3.1    I        A
2   4.5    I        A
3   4.6    I        A
4   4.3    I        A
5   3.6   II        A
6   2.9   II        A
7   4.0   II        A
8   2.3   II        A
9   2.2  III        A
.....
44  3.8   II        D
45  3.0  III        D
46  3.6  III        D
47  3.1  III        D
48  3.3  III        D
3
Insecticide Example
We have 3 types of insecticides (I, II, and III) and 4 methods (A, B, C, D) of delivering the insecticide.
(I, II, and III) and (A, B, C, D) are the levels of the respective factors: type of insecticide and insecticide delivery method.
The response Y is the time to death in minutes.
Find the best insecticide and the best delivery method.
We have 48 experimental insects to experiment with.
Randomly divide the 48 insects into 12 = 3 × 4 groups of 4 insects each, assigning the respective groups to the 12 treatment combinations (I,A), (I,B), (I,C), (I,D), . . . , (III,C), (III,D).
Randomize the order of all 48 runs to eliminate order biases.
This is a factorial design, specifically a 3 × 4 factorial design with 4 replications.
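The randomization described above could be sketched in R as follows (the seed and the insect numbering 1–48 are made up for illustration):

```r
set.seed(502)  # arbitrary seed for reproducibility

# the 12 treatment combinations of the 3 x 4 factorial
combos <- expand.grid(type = c("I", "II", "III"),
                      delivery = c("A", "B", "C", "D"))

# 4 replications per combination -> 48 runs
design <- combos[rep(1:12, each = 4), ]
design$insect <- sample(48)                # random assignment of insects to runs
design <- design[sample(nrow(design)), ]   # randomize the run order

# each combination occurs exactly 4 times
stopifnot(all(table(design$type, design$delivery) == 4))
```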
4
Response Table
Visualize responses in relation to the factor levels as follows:
Insecticide
Type
Delivery Method
A B C D
I YI,A YI,B YI,C YI,D
II YII,A YII,B YII,C YII,D
III YIII,A YIII,B YIII,C YIII,D
YI,A is short for (YI,A,1, . . . , YI,A,4) (the replication depth), etc.
More generically we denote the kth response under level i of factor 1 and level j of factor 2 by Yijk.
Triplet notation (i, j, k) is more useful than ℓ = 1, 2, . . . , 48.
It is useful in Σ summation notation and also for identifying the factor levels, i.e., the factor level/replication coordinates.
5
First Look at Insecticide Boxplots
[Figure: side-by-side boxplots of time to death (minutes) by type of poison (I, II, III) and by delivery method (A, B, C, D).]
6
First Impressions and Questions
Insecticide type III and delivery method A seem to give the best combination.
Is combination the right word here?
Are the effects of delivery consistent across types, i.e., is the delivery effectiveness order (in terms of faster response time) the same from one insecticide type to another?
It could be that delivery type A is not the fastest acting among all four when applied to insecticide type III.
Method C could actually be better in combination with III.
7
Insecticide Responses by Delivery Method
[Figure: four panels, one per delivery method A–D, showing time to death (minutes) for insecticide types I, II, III.]
8
Full Comparison of Insecticide Responses
[Figure: time to death (minutes) for all 12 type.delivery combinations I.A, II.A, III.A, . . . , III.D.]
9
Comments
Insecticide type III seems to have the lowest response with all 4 delivery methods.
The mean levels for each (III, delivery method) combination are ≈ consistent.
The scatter within each (III, delivery method) combination is quite tight.
Delivery appears to have an effect on the response under types I and II, both in absolute terms and relative to each other.
It appears: scatter ↗ as mean ↗ across all combinations.
=⇒ variance stabilizing transformation. Deal with that first.
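The mean/SD relationship driving this choice can be checked directly; a sketch assuming the poison data frame from slide 3 (with response column y) is loaded:

```r
# cell means and standard deviations over the 12 type:delivery cells
cell <- interaction(poison$type, poison$delivery)
mu.i <- tapply(poison$y, cell, mean)
s.i  <- tapply(poison$y, cell, sd)

# fit log(s_i) = a * log(mu_i) + b; a slope a near 2 suggests
# the power transform with lambda = 1 - a = -1, i.e., 1/Y
coef(lm(log(s.i) ~ log(mu.i)))
```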
10
Linear Fit log(si) = a × log(µi) + b
[Figure: cell standard deviations si plotted against cell means µi (left), and log(si) against log(µi) with fitted line (right).]
intercept = −3.2
slope = 1.977
11
Reciprocal Transform
According to the guideline: α = 2, so λ = 1 − α = −1, i.e., replace Yijk by 1/Yijk, a reciprocal transform of our response.
A rationalization attempt: Suppose the absorption rate R = d/t (of dose d over time t) under any given combination is the most variable process aspect from insect to insect.
Assume that this absorption variability (ingestion variability from insect to insect) is constant across all (insecticide type, delivery method) combinations.
Assume the lethal dose D is ≈ constant for each type, ∀ insects. Then the time to reach the lethal dose is T = D/R.
The transformed 1/T = R/D has ≈ constant variability.
A 1-term Taylor expansion around µR with D ≈ constant:
T = D/R ≈ D/µR − (R − µR) · D/µR²  ⇒  µT ≈ D/µR ,  var(T) ≈ D²σR²/µR⁴  ⇒  σT ∝ µT²
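A small simulation sketch of this delta-method argument (the values of D, µR, σR are made up):

```r
set.seed(1)
D <- 10                               # hypothetical constant lethal dose
R <- rnorm(1e5, mean = 2, sd = 0.2)   # absorption rates: mu_R = 2, sigma_R = 0.2
T <- D / R                            # times to reach the lethal dose

sd(T)      # roughly D * sigma_R / mu_R^2 = 0.5, grows like mu_T^2
sd(1 / T)  # roughly sigma_R / D = 0.02, free of mu_R
```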
12
Reciprocal Time Boxplots
[Figure: boxplots of reciprocal time to death (1/minute) by type of poison (I, II, III) and by delivery method (A, B, C, D).]
The variability is much more stable now.
13
Reciprocal Time by Delivery Method
[Figure: four panels, one per delivery method A–D, showing reciprocal time to death (1/minutes) for insecticide types I, II, III.]
14
Full Comparison of Reciprocal Times
[Figure: reciprocal time to death (1/minute) for all 12 type.delivery combinations I.A, II.A, III.A, . . . , III.D.]
Note the consistent variability across all 12 treatment combinations!
15
log(si) = a × log(µi) + b for Reciprocal Times
[Figure: cell standard deviations si against cell means µi (left) and log(si) against log(µi) (right) for the reciprocal times.]
No strong linearity remains!
16
ANOVA of Reciprocal Times vs Type
Assume recip.time, type, delivery are in the workspace; type and delivery are factors (or write as.factor(type) in place of type!).

> anova(lm(recip.time ~ type))
or
> anova(lm(recip.time ~ type, data=poison))

Analysis of Variance Table

Response: recip.time
          Df  Sum Sq Mean Sq F value    Pr(>F)
type       2 0.34877 0.17439  25.621 3.728e-08 ***
Residuals 45 0.30628 0.00681
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
17
ANOVA of Reciprocal Times vs Delivery
> anova(lm(recip.time ~ delivery))
or
> anova(lm(recip.time ~ delivery, data=poison))

Analysis of Variance Table
Response: recip.time
          Df  Sum Sq Mean Sq F value    Pr(>F)
delivery   3 0.20414 0.06805  6.6401 0.0008496 ***
Residuals 44 0.45091 0.01025
18
ANOVA for Reciprocal Times vs All Type:Delivery Combinations

> anova(lm(recip.time ~ type:delivery))
or
> anova(lm(recip.time ~ type:delivery, data=poison))

Analysis of Variance Table
Response: recip.time
              Df  Sum Sq Mean Sq F value    Pr(>F)
type:delivery 11 0.56862 0.05169  21.531 1.289e-12 ***
Residuals     36 0.08643 0.00240
This is like a one-way ANOVA with 12 treatments.
Legitimate but not enlightening.
19
All Three ANOVAs for Reciprocal Times
$ANOVA.type
Analysis of Variance Table
Response: recip.time
          Df  Sum Sq Mean Sq F value    Pr(>F)
type       2 0.34877 0.17439  25.621 3.728e-08 ***
Residuals 45 0.30628 0.00681

$ANOVA.delivery
Analysis of Variance Table
Response: recip.time
          Df  Sum Sq Mean Sq F value    Pr(>F)
delivery   3 0.20414 0.06805  6.6401 0.0008496 ***
Residuals 44 0.45091 0.01025

$ANOVA.type.delivery
Analysis of Variance Table
Response: recip.time
              Df  Sum Sq Mean Sq F value    Pr(>F)
type:delivery 11 0.56862 0.05169  21.531 1.289e-12 ***
Residuals     36 0.08643 0.00240
20
Are these Analyses Appropriate?
What does MSE represent in the first two ANOVAs?
Compare these values to the MSE in the third ANOVA.
The MSE's in the first two ANOVAs are inflated because mean variation in the ignored factor is part of the error variation.
On slide 14 note how the variation within each of the 4 delivery groups also reflects the response variation due to poison type (color).
The variation within each of the three colors (poison type) also reflects the variation due to delivery method.
Note that the p-value in the third ANOVA is very much smaller than in the other two ANOVAs.
This is due to the appropriately smaller MSE here.
21
The Third ANOVA
The third ANOVA is technically correct in stating that the means change across factor level combinations.
View the 3 · 4 combinations as t = 12 treatments/samples.
Does the third ANOVA give any insight on the separate contributions of the type factor and the delivery factor?
No! It is insufficient in that regard.
It is easy to conceive of situations where the 1st and 2nd ANOVAs produce insignificant F-values but the 3rd ANOVA produces a highly significant F-value.
This can result when the MSE in the first two ANOVAs is unduly inflated, i.e., ≫ MSE in the third ANOVA.
The next 5 slides illustrate this with doctored data.
22
View by Type
[Figure: reciprocal time to death (1/minute) grouped by insecticide type I, II, III, with delivery methods distinguished by color.]
The delivery variation is a good part of the "error" variation within type.
23
View by Delivery
[Figure: reciprocal time to death (1/minute) grouped by delivery method A–D, with poison types distinguished by color.]
The type variation is a good part of the "error" variation within delivery.
24
View by Type
[Figure: modified reciprocal times grouped by insecticide type I, II, III.]
(III,A) lowered by .4, (I,D) raised by .15
The 3 type data sets seem well meshed when ignoring the 4 delivery methods.
25
View by Delivery
[Figure: modified reciprocal times grouped by delivery method A–D.]
(III,A) lowered by .4, (I,D) raised by .15
The 4 delivery methods seem well meshed when ignoring the 3 poison types.
26
Three ANOVAs for Modi�ed Data
$ANOVA.type
Analysis of Variance Table
Response: recip.time
          Df  Sum Sq Mean Sq F value Pr(>F)
type       2 0.03592 0.01796  1.5582 0.2217
Residuals 45 0.51868 0.01153

$ANOVA.delivery
Analysis of Variance Table
Response: recip.time
          Df  Sum Sq Mean Sq F value  Pr(>F)
delivery   3 0.08425 0.02808  2.6271 0.06208 .
Residuals 44 0.47035 0.01069

$ANOVA.type.delivery
Analysis of Variance Table
Response: recip.time
              Df  Sum Sq Mean Sq F value    Pr(>F)
type:delivery 11 0.46817 0.04256  17.727 2.294e-11 ***
Residuals     36 0.08643 0.00240
27
Three ANOVAs for Modi�ed Data
The first two ANOVAs confirm that neither factor alone (type or delivery) shows a significant effect (p-value ≤ α = .05).
This confirms the meshing comments made below the last two plots.
By the data changes we aligned the means of all compared data sets more closely, but the variation from the ignored factor still inflates the MSE, leading to non-significant results.
The third ANOVA shows a highly significant effect of the type:delivery combination. This is effected by the much reduced MSE here (0.00240 as compared to 0.01153 or 0.01069).
28
Additive Effects Model
For now we will deal only with the balanced model:
the same number n of replications per factor level combination.
Unbalanced cases can get quite messy.¹
If time permits, we may touch back on that later.
Additive Effects Model:
Yijk = µ + ai + bj + εijk ,  i = 1, . . . , t1 ,  j = 1, . . . , t2 ,  k = 1, . . . , n
with independent error terms εijk ∼ N(0, σ²)
¹For an extensive treatment of the unbalanced case see S.R. Searle (1987), Linear Models for Unbalanced Data, John Wiley & Sons, New York.
29
Dealing with Identifiability Issues
The parameters µ, a1, . . . , at1, b1, . . . , bt2 are unidentifiable.
Adding a constant c to µ and subtracting it from the ai (or bj) gives the same means µij = E(Yijk) for all (i, j):
µij = µ + ai + bj = µ + c + ai − c + bj = µ′ + a′i + bj
As in the 1-way ANOVA we only need t1 − 1 parameters to distinguish between t1 row levels, and similarly t2 − 1 parameters to distinguish between t2 column levels.
There are two customary ways of imposing side conditions that deal with this, i.e., that render all parameters identifiable:
Σi ai = 0 and Σj bj = 0   (sum-to-zero side conditions)
a1 = 0 and b1 = 0   (set-to-zero side conditions)
30
Sum-to-zero Side Conditions
Assume µij = µ + ai + bj with Σi ai = Σj bj = 0, or ā. = b̄. = 0.
Here we identify µ with the average mean over all level combinations, because µ̄.. = µ + ā. + b̄. = µ.
Since µ̄i. = µ + ai + b̄. = µ + ai, we can interpret ai = µ̄i. − µ = µ̄i. − µ̄.. as the average change from µ = µ̄.. due to level i of factor one when averaged over all levels of factor two.
Similarly, bj = µ̄.j − µ = µ̄.j − µ̄...
The parameters µ, ai, bj with ā. = b̄. = 0 define the means µij with the above additive structure.
They are uniquely identified via the µij, as shown above.
Changes in both levels are additive: µij = µ + ai + bj.
31
Set-to-zero Side Conditions
Assume µij = µ⋆ + a⋆i + b⋆j with a⋆1 = b⋆1 = 0.
Here we identify the parameter µ⋆ with the mean under the factor level combination (1, 1): µ11 = µ⋆ + a⋆1 + b⋆1 = µ⋆.
We express each change from µ⋆ due to other levels (≠ 1) in factor one via a⋆i, i.e., µi1 = µ⋆ + a⋆i + b⋆1 = µ⋆ + a⋆i, and thus a⋆i = µi1 − µ⋆ = µi1 − µ11.
Each change from µ⋆ due to other levels (≠ 1) in factor two is expressed via b⋆j, i.e., µ1j = µ⋆ + a⋆1 + b⋆j = µ⋆ + b⋆j, and thus b⋆j = µ1j − µ⋆ = µ1j − µ11.
The parameters µ⋆, a⋆i, b⋆j with a⋆1 = b⋆1 = 0 define the means µij with the above additive structure.
They are uniquely identified via the µij, as shown above.
Changes in both levels are additive: µij = µ⋆ + a⋆i + b⋆j.
32
How to Create Such Mean Structures
Suppose you want to simulate such data.
How would you create such mean structures?
Pick any t1 − 1 + t2 − 1 + 1 = t1 + t2 − 1 numbers a2, . . . , at1, b2, . . . , bt2, and µ.
In the sum-to-zero case take
a1 = −Σi=2..t1 ai and b1 = −Σj=2..t2 bj  =⇒  Σi=1..t1 ai = Σj=1..t2 bj = 0
In the set-to-zero case take a1 = 0 and b1 = 0.
In either case define
µij = µ + ai + bj
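A sketch of this construction in R for a 3 × 4 design (the chosen numbers are arbitrary):

```r
t1 <- 3; t2 <- 4; mu <- 5
a <- c(NA, 1.0, -2.0)          # pick a2, ..., a_t1 freely
b <- c(NA, 0.5, 1.5, -1.0)     # pick b2, ..., b_t2 freely

# sum-to-zero case: a1 and b1 absorb the negative sums
a[1] <- -sum(a[-1])
b[1] <- -sum(b[-1])

mu.ij <- mu + outer(a, b, "+")  # t1 x t2 matrix of additive cell means
stopifnot(sum(a) == 0, sum(b) == 0, dim(mu.ij) == c(t1, t2))
```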
33
Additive Model as Reduced Model
The additive model is a reduced model, since in the full model each factor level combination (i, j) has its own mean µij.
There are t1 · t2 such means µij, which can vary freely.
In the additive model with identifiability restrictions we only have 1 + (t1 − 1) + (t2 − 1) = t1 + t2 − 1 free parameters.
Note that t1 · t2 − [t1 + t2 − 1] = (t1 − 1) · (t2 − 1) can be substantially greater than zero.
We get zero only when one of the factors has just one level.
In that case we are back in the 1-way (1-factor) ANOVA.
The second factor then only has one level, i.e., does not change.
34
A Tabular View of the Additive Model
t1 · t2 = 3 · 4 Factorial Design
Factor 2
b1 b2 b3 b4
a3 µ+ a3 + b1 µ+ a3 + b2 µ+ a3 + b3 µ+ a3 + b4
Factor 1 a2 µ+ a2 + b1 µ+ a2 + b2 µ+ a2 + b3 µ+ a2 + b4
a1 µ+ a1 + b1 µ+ a1 + b2 µ+ a1 + b3 µ+ a1 + b4
Rows i and i′ differ by ai − ai′ = µij − µi′j = a⋆i − a⋆i′, the same amount across all columns j = 1, . . . , t2.
Columns j and j′ differ by bj − bj′ = µij − µij′ = b⋆j − b⋆j′, the same amount across all rows i = 1, . . . , t1.
Such differences are meaningful regardless of the additive model parametrization, i.e., regardless of constraints (sum-to-zero or set-to-zero).
35
Additivity in Factor 1 ⇐⇒ Additivity in Factor 2
If we take 5 numbers, say 5, 7, 9, 2, 3, in a row, and create three new rows by adding 2, or 4, or 5 we get the following tableau:
a⋆1 = 0    5   7   9   2   3
a⋆2 = 2    7   9  11   4   5
a⋆3 = 4    9  11  13   6   7
a⋆4 = 5   10  12  14   7   8
b⋆j        0   2   4  −3  −2
The columns automatically differ by constant amounts b⋆j
=⇒ additivity in factor 2.
The column differences are set in the first row.
They are not affected by translating that first row to various levels via the a⋆i (additivity in factor 1).
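The tableau can be reproduced with outer(); the constant column differences then come for free:

```r
row1   <- c(5, 7, 9, 2, 3)        # first row sets the column differences
a.star <- c(0, 2, 4, 5)           # row shifts a*_i (set-to-zero effects)
tab    <- outer(a.star, row1, "+")

# any two columns differ by the same amount in every row:
# additivity in factor 1 implies additivity in factor 2
b.star <- row1 - row1[1]          # 0 2 4 -3 -2, the last tableau row
stopifnot(all(tab[, 2] - tab[, 1] == b.star[2]))
```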
36
Additive Model Decomposition
Yijk = Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...) + (Yijk − Ȳ... − [Ȳi.. − Ȳ...] − [Ȳ.j. − Ȳ...])
     = Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...) + (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)
     = µ̂ + âi + b̂j + ε̂ijk

Σi âi = t1 · (1/t1) Σi (Ȳi.. − Ȳ...) = t1 (Ȳ... − Ȳ...) = 0
Σj b̂j = t2 · (1/t2) Σj (Ȳ.j. − Ȳ...) = t2 (Ȳ... − Ȳ...) = 0
Σik ε̂ijk = n t1 · (1/(n t1)) Σik (Yijk − Ȳi.. − Ȳ.j. + Ȳ...) = 0
Σjk ε̂ijk = n t2 · (1/(n t2)) Σjk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...) = 0
Σijk ε̂ijk = 0
37
Least Squares Estimates
µ̂ = Ȳ..., âi = Ȳi.. − Ȳ..., and b̂j = Ȳ.j. − Ȳ...
are the least squares estimates of µ, ai, and bj subject to the conditions Σi ai = 0 and Σj bj = 0.
This follows without calculus from
Q(µ, a, b) = Q(µ, a1, . . . , at1, b1, . . . , bt2) = Σijk (Yijk − µ − ai − bj)²
= Σijk {Ȳ... − µ + [(Ȳi.. − Ȳ...) − ai] + [(Ȳ.j. − Ȳ...) − bj] + (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)}²
= Σijk {(Ȳ... − µ)² + [(Ȳi.. − Ȳ...) − ai]² + [(Ȳ.j. − Ȳ...) − bj]² + (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)²}.
All cross product terms disappear (see next slide).
Note that Σi âi = Σj b̂j = 0, so the solutions satisfy the constraints.
38
Cross Product Terms = 0
Σijk (Ȳ... − µ)[(Ȳi.. − Ȳ...) − ai] = (Ȳ... − µ) t2 n {Σi (Ȳi.. − Ȳ...) − Σi ai} = 0
Σijk (Ȳ... − µ)[(Ȳ.j. − Ȳ...) − bj] = (Ȳ... − µ) t1 n {Σj (Ȳ.j. − Ȳ...) − Σj bj} = 0
With ε̂ijk = Yijk − Ȳi.. − Ȳ.j. + Ȳ... :
Σijk (Ȳ... − µ) ε̂ijk = (Ȳ... − µ) Σijk ε̂ijk = 0
Σijk [(Ȳi.. − Ȳ...) − ai][(Ȳ.j. − Ȳ...) − bj] = n Σi [âi − ai] Σj [b̂j − bj] = 0
Σijk [(Ȳi.. − Ȳ...) − ai] ε̂ijk = Σi [(Ȳi.. − Ȳ...) − ai] Σjk ε̂ijk = 0
Σijk [(Ȳ.j. − Ȳ...) − bj] ε̂ijk = Σj [(Ȳ.j. − Ȳ...) − bj] Σik ε̂ijk = 0
39
Estimates: Sum-to-Zero −→ Set-to-Zero
As sum-to-zero estimates we have:
µ̂ = Ȳ..., âi = Ȳi.. − Ȳ..., and b̂j = Ȳ.j. − Ȳ...,  =⇒  µ̂ij = µ̂ + âi + b̂j.
Sum-to-zero estimates −→ set-to-zero estimates via
µ̂⋆ = µ̂11 = µ̂ + â1 + b̂1 ,  â⋆i = µ̂i1 − µ̂11 = âi − â1 ,  b̂⋆j = µ̂1j − µ̂11 = b̂j − b̂1
Note that â⋆1 = b̂⋆1 = 0.
Furthermore, µ̂ij = µ̂⋆ + â⋆i + b̂⋆j = µ̂ + âi + b̂j with
µ̂⋆ = Ȳ1.. + Ȳ.1. − Ȳ...
â⋆i = Ȳi.. − Ȳ1..
b̂⋆j = Ȳ.j. − Ȳ.1.
40
Estimates: Set-to-Zero −→ Sum-to-Zero
The set-to-zero estimates µ̂⋆, â⋆i, b̂⋆j with â⋆1 = 0, b̂⋆1 = 0
define µ̂ij = µ̂⋆ + â⋆i + b̂⋆j.
With these we get the sum-to-zero equivalent representation
µ̂ = µ̂.. = µ̂⋆ + â⋆. + b̂⋆.
âi = µ̂i. − µ̂.. = â⋆i − â⋆.
b̂j = µ̂.j − µ̂.. = b̂⋆j − b̂⋆.
(a dot subscript denotes the average over that index)
=⇒ µ̂ij = µ̂⋆ + â⋆i + b̂⋆j = µ̂ + âi + b̂j
=⇒ µ̂ = Ȳ..., âi = Ȳi.. − Ȳ..., and b̂j = Ȳ.j. − Ȳ...
Set-to-zero is what lm in R gives as coefficients.
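In R the two parametrizations correspond to the contrasts contr.treatment (the default, set-to-zero) and contr.sum (sum-to-zero); a sketch on simulated balanced two-factor data (the factor names A, B and the data are made up):

```r
set.seed(42)
d <- expand.grid(A = factor(1:3), B = factor(1:4))
d <- d[rep(1:12, each = 2), ]                 # 2 replications per cell
d$y <- rnorm(nrow(d), mean = 5)

fit.star <- lm(y ~ A + B, data = d)           # set-to-zero (contr.treatment)
fit.sum  <- lm(y ~ A + B, data = d,
               contrasts = list(A = "contr.sum", B = "contr.sum"))

# different coefficients, identical fitted values
stopifnot(all.equal(fitted(fit.star), fitted(fit.sum)))
```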
41
Fitted Models
The fitted means per treatment combination under either parametrization (sum-to-zero or set-to-zero) are the same, i.e.,
µ̂ij = µ̂ + âi + b̂j = µ̂⋆ + â⋆i + b̂⋆j
Components of the fitted values have different interpretations.
This is completely analogous to the previous parameter version
µij = µ + ai + bj = µ⋆ + a⋆i + b⋆j.
Explicitly, in terms of the data,
µ̂ij = Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...) = Ȳi.. + Ȳ.j. − Ȳ... .
Note
ε̂ijk = Yijk − Ȳi.. − Ȳ.j. + Ȳ... = Yijk − µ̂ij .
42
Orthogonal Decomposition of the Data Vector
The data vector Y = (Y111, . . . , Y11n, . . . , Y1t21, . . . , Y1t2n, . . . , Yt111, . . . , Yt11n, . . . , Yt1t21, . . . , Yt1t2n)′ decomposes into four mutually orthogonal component vectors:
Y = µ̂1 + â + b̂ + ε̂
where µ̂1 has the value µ̂ in every coordinate, â has the value âi in every coordinate with first index i, b̂ has the value b̂j in every coordinate with second index j, and ε̂ collects the residuals ε̂ijk.
43
Orthogonalities
Σijk µ̂ âi = µ̂ t2 n Σi âi = 0
Σijk µ̂ b̂j = µ̂ t1 n Σj b̂j = 0
Σijk µ̂ ε̂ijk = µ̂ Σijk ε̂ijk = 0
Σijk âi b̂j = n Σi âi Σj b̂j = 0
Σijk âi ε̂ijk = Σi (âi Σjk ε̂ijk) = 0
Σijk b̂j ε̂ijk = Σj (b̂j Σik ε̂ijk) = 0
44
Sum of Squares (SS) Decomposition
Orthogonality =⇒ the following SS decomposition (Pythagoras again):
Σijk Y²ijk = Σijk Ȳ²... + Σijk (Ȳi.. − Ȳ...)² + Σijk (Ȳ.j. − Ȳ...)² + Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)²
= Σijk µ̂² + Σijk â²i + Σijk b̂²j + Σijk ε̂²ijk
=⇒ Σijk (Yijk − Ȳ...)² = Σijk Y²ijk − Σijk Ȳ²... = Σijk Y²ijk − Σijk µ̂²
= Σijk â²i + Σijk b̂²j + Σijk ε̂²ijk
= Σijk (Ȳi.. − Ȳ...)² + Σijk (Ȳ.j. − Ȳ...)² + Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)²
SST = SSA + SSB + SSE
45
Interpretation of SS Decomposition
SST = SSA + SSB + SSE
SST = Σijk (Yijk − Ȳ...)²: total variation of the data around the grand or overall mean.
SSA = Σijk (Ȳi.. − Ȳ...)²: variation of means around the grand mean (by factor 1 level, averaged over the levels of factor 2).
SSB = Σijk (Ȳ.j. − Ȳ...)²: variation of means around the grand mean (by factor 2 level, averaged over the levels of factor 1).
SSE = Σijk (Yijk − µ̂ij)²: variation of the data around the fitted additive model values.
SSE = Σijk (Yijk − [Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...)])² = Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)² = Σijk ε̂²ijk
46
Degrees of Freedom
In Y = µ̂1 + â + b̂ + ε̂ the component vectors are orthogonal to each other.
There is 1 degree of freedom in µ̂1, and thus there are N − 1 = t1t2n − 1 degrees of freedom (df) in Y − µ̂1 ⊥ µ̂1 and thus in SST.
Although the vector â contains t1 distinct values, only t1 − 1 can vary freely, due to the sum-to-zero or set-to-zero constraints.
There are t1 − 1 df in that vector and thus in SSA.
Similarly, there are t2 − 1 df in b̂ and thus in SSB.
By orthogonal complement there are
t1t2n − 1 − (t1 − 1) − (t2 − 1) = (t1 − 1)(t2 − 1) + t1t2(n − 1) = dfE
df in the residual error vector ε̂ and thus in SSE.
47
ANOVA Table for the Additive Model
Source SS df MS F
A SSA t1 − 1 MSA = SSA/(t1 − 1) MSA/MSE
B SSB t2 − 1 MSB = SSB/(t2 − 1) MSB/MSE
Error SSE dfE MSE = SSE/dfE
Total SST t1t2n − 1
dfE = t1t2n−1− (t1−1)− (t2−1) = (t1−1)(t2−1) + t1t2(n−1)
View (t1 − 1)(t2 − 1) = t1t2 − [1 + (t1 − 1) + (t2 − 1)] as the number of means µij left unexplained by the additive model.
View t1t2(n − 1) as the degrees of freedom of within-cell variation (n − 1 per cell) totaled over all t1t2 cells.
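The SS decomposition behind this table can be verified numerically; a sketch on simulated balanced data (t1 = 3, t2 = 4, n = 4 as in the insecticide layout, data made up):

```r
set.seed(3)
t1 <- 3; t2 <- 4; n <- 4
A <- gl(t1, t2 * n)                    # factor 1, balanced
B <- gl(t2, n, length = t1 * t2 * n)   # factor 2, balanced
y <- 5 + rnorm(t1 * t2 * n)

ybar <- mean(y)
SSA  <- sum((ave(y, A) - ybar)^2)      # sum over ijk of (Ybar_i.. - Ybar_...)^2
SSB  <- sum((ave(y, B) - ybar)^2)
SSE  <- sum((y - ybar)^2) - SSA - SSB  # SST - SSA - SSB

tab <- anova(lm(y ~ A + B))
stopifnot(all.equal(SSA, tab["A", "Sum Sq"]),
          all.equal(SSB, tab["B", "Sum Sq"]),
          all.equal(SSE, tab["Residuals", "Sum Sq"]))
```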
48
lm on Reciprocal Time to Death
out <- lm(recip.time ~ type + delivery)
out$coef

Coefficients:
(Intercept)   typeII  typeIII  deliveryB  deliveryC  deliveryD
    0.26977  0.04686  0.19964   -0.16574   -0.05721   -0.13583
        µ̂⋆      â⋆2      â⋆3       b̂⋆2       b̂⋆3       b̂⋆4

Note the implicit set-to-zero form of the parameter estimates in out$coef! intercept = µ̂⋆ with â⋆1 = b̂⋆1 = 0.
µ̂⋆ represents the mean under the treatment combination (typeI, deliveryA).
â⋆i, b̂⋆j represent additive mean deviation effects from this baseline µ̂⋆.
49
Sum-to-Zero Estimates
Below are the sum-to-zero estimates corresponding to the previous slide, using the conversion formulas from slide 41.

$mu.hat
0.2622376          µ̂

$a.hat
      typeI      typeII     typeIII
-0.08216887 -0.03530475  0.11747362
         â1          â2          â3

$b.hat
  deliveryA   deliveryB   deliveryC   deliveryD
 0.08969690 -0.07604334  0.03248336 -0.04613693
         b̂1          b̂2          b̂3          b̂4
50
anova(lm(recip.time ~ type + delivery))

Analysis of Variance Table

Response: recip.time
          Df  Sum Sq Mean Sq F value    Pr(>F)
type       2 0.34877 0.17439  71.708 2.865e-14 ***
delivery   3 0.20414 0.06805  27.982 4.192e-10 ***
Residuals 42 0.10214 0.00243
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
51
Distributional Facts
The ANOVA table states p-values based on null distributions derived from the model and εijk i.i.d. ∼ N(0, σ²).
Under the additive model:
⇒ SSE ∼ σ²χ²dfE and E(MSE) = E(SSE/dfE) = σ²
SSA ∼ σ²χ²t1−1,λ1 with ncp
λ1 = Σijk (µ̄i. − µ̄..)²/σ² = Σijk a²i/σ² = n t2 Σi a²i/σ²
E(MSA) = E(SSA/(t1 − 1)) = σ² + σ²λ1/(t1 − 1) = σ² + (n t2/(t1 − 1)) Σi a²i
SSB ∼ σ²χ²t2−1,λ2 with ncp
λ2 = Σijk (µ̄.j − µ̄..)²/σ² = Σijk b²j/σ² = n t1 Σj b²j/σ²
E(MSB) = E(SSB/(t2 − 1)) = σ² + σ²λ2/(t2 − 1) = σ² + (n t1/(t2 − 1)) Σj b²j
n t2 and n t1 act as multipliers in the noncentrality parameters!
Looking at both factors jointly, we benefit from the common σ² assumption.
52
Distributional Facts (continued)
SSA, SSB and SSE are independent due to the orthogonality of the component vectors.
Under HA: a1 = . . . = at1 = 0 we have λ1 = 0 and thus FA = MSA/MSE ∼ Ft1−1,dfE.
Under HB: b1 = . . . = bt2 = 0 we have λ2 = 0 and thus FB = MSB/MSE ∼ Ft2−1,dfE.
These F-distributions are the basis for the p-values in the previous ANOVA table.
These p-values correspond to testing HA and HB, respectively.
53
How Well Does the Additive Model Fit?
In the additive model we have µ̂ij = µ̂ + âi + b̂j.
We could compare the full model estimate of µij, namely the average Ȳij. over the n observations in cell (i, j), with the additive model fitted value for that same cell, i.e., with
µ̂ij = µ̂ + âi + b̂j = Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...) = Ȳi.. + Ȳ.j. − Ȳ... .
Ȳij. depends only on data from cell (i, j), averaging n values.
µ̂ij depends on data from cells other than cell (i, j) and is more strongly averaged.
54
Full Model Representation
The full model Yijk = µij + εijk can be written in the following equivalent form:
Yijk = µ + ai + bj + (ab)ij + εijk = µ + ai + bj + cij + εijk
Here it is assumed that εijk i.i.d. ∼ N(0, σ²).
The equivalent full model form decomposes the mean structure µij into two components, namely the previously considered additive model µ + ai + bj and the extent cij = (ab)ij to which this additive model does not explain µij, i.e., cij = µij − (µ + ai + bj).
These parameters cij = (ab)ij are called interaction terms.
The notational device (ab) is just a mnemonic to indicate the inseparable or joint action of the factors A and B, i.e., their interaction.
55
Identi�ability Issues
While there are t1t2 mean parameters µij, the alternate parametrization has 1 + t1 + t2 + t1t2 parameters
µ, a1, . . . , at1, b1, . . . , bt2, c11, . . . , ct1t2
To make these latter parameters identifiable we again need to impose certain side conditions.
There are two customary ways, which parallel the previous identifiability resolution in the case of the additive model:
1) a1 = b1 = c1j = ci1 = 0 ∀ i, j   set-to-zero side conditions (lm output in R)
2) Σi ai = Σj bj = Σi cij = Σj cij = 0 ∀ i, j   sum-to-zero side conditions
56
µij =⇒ Sum-to-Zero Parametrization
Define
µ = µ̄.. = (1/(t1t2)) Σi Σj µij ,  µ̄i. = (1/t2) Σj µij ,  µ̄.j = (1/t1) Σi µij
and then all parameters µ, ai, bj, cij are determined from the µij via
ai = µ̄i. − µ̄.. ,  bj = µ̄.j − µ̄..
cij = µij − µ̄i. − µ̄.j + µ̄.. = µij − (µ̄i. − µ̄..) − (µ̄.j − µ̄..) − µ̄.. = µij − ai − bj − µ
=⇒ µij = µ + ai + bj + cij with Σi ai = Σj bj = Σi cij = Σj cij = 0,
satisfying the sum-to-zero side conditions.
57
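These definitions translate directly into R; a sketch for an arbitrary (made-up) 3 × 4 matrix of cell means:

```r
mu.ij <- rbind(c(1, 2, 4, 3),
               c(2, 3, 5, 4),
               c(5, 4, 8, 9))              # non-additive cell means

mu  <- mean(mu.ij)
a   <- rowMeans(mu.ij) - mu
b   <- colMeans(mu.ij) - mu
cij <- mu.ij - mu - outer(a, b, "+")       # interaction terms

# sum-to-zero side conditions hold by construction
stopifnot(all.equal(sum(a), 0), all.equal(sum(b), 0),
          all.equal(max(abs(rowSums(cij))), 0),
          all.equal(max(abs(colSums(cij))), 0))
```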
µij =⇒ Set-to-Zero Parametrization
Define all parameters µ⋆, a⋆i, b⋆j, c⋆ij from the µij via
µ⋆ = µ11 ,  a⋆i = µi1 − µ11 = µi1 − µ⋆ ,  b⋆j = µ1j − µ11 = µ1j − µ⋆ ,
c⋆ij = µij − µi1 − µ1j + µ11 = µij − (µi1 − µ11) − (µ1j − µ11) − µ11 = µij − a⋆i − b⋆j − µ⋆ .
=⇒ µij = µ⋆ + a⋆i + b⋆j + c⋆ij with a⋆1 = 0, b⋆1 = 0, c⋆i1 = c⋆1j = 0 ∀ i, j,
satisfying the set-to-zero side conditions.
Whatever the parametrization, we can easily go from one to the other via the previous definitions in terms of the µij.
58
Decomposition and Least Squares Estimation
Extend the decomposition as follows (with orthogonal components):
Yijk = Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...) + (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...) + (Yijk − Ȳij.)
     = µ̂ + âi + b̂j + ĉij + ε̂ijk
Σijk (Yijk − µ − ai − bj − cij)²
= Σijk [(Ȳ... − µ) + (Ȳi.. − Ȳ... − ai) + (Ȳ.j. − Ȳ... − bj) + (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ... − cij) + (Yijk − Ȳij.)]²
= Σijk [(Ȳ... − µ)² + (Ȳi.. − Ȳ... − ai)² + (Ȳ.j. − Ȳ... − bj)² + (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ... − cij)² + (Yijk − Ȳij.)²]
59
Full Model Least Squares Estimates (LSEs)
Cross product terms = 0 in the previous quadratic expansion, because of the component orthogonality in the decomposition.
The least squares estimates (LSEs, in sum-to-zero form) are
µ̂ = Ȳ... ,  âi = Ȳi.. − Ȳ... ,  b̂j = Ȳ.j. − Ȳ... ,  ĉij = Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...
with E(µ̂) = µ, E(âi) = ai, E(b̂j) = bj and E(ĉij) = cij.
The LSEs are unbiased.
The fitted values for the µij are
µ̂ij = µ̂ + âi + b̂j + ĉij  (the additive model fit plus ĉij)
= Ȳ... + (Ȳi.. − Ȳ...) + (Ȳ.j. − Ȳ...) + (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)
= Ȳij. ,  i.e., the cell means,
with residuals ε̂ijk = Yijk − Ȳij. .
60
Sum of Squares Decomposition
Using µ = ai = bj = cij = 0 in the previous decomposition:
Σijk Y²ijk = Σijk [Ȳ²... + (Ȳi.. − Ȳ...)² + (Ȳ.j. − Ȳ...)² + (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)² + (Yijk − Ȳij.)²]
Σijk (Yijk − Ȳ...)² = Σijk [(Ȳi.. − Ȳ...)² + (Ȳ.j. − Ȳ...)² + (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)² + (Yijk − Ȳij.)²]
SST = SSA + SSB + SSAB + SSE
SST = Σijk (Yijk − Ȳ...)²
SSA = Σijk (Ȳi.. − Ȳ...)² = Σijk â²i ,  SSB = Σijk (Ȳ.j. − Ȳ...)² = Σijk b̂²j
SSAB = Σijk (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)² = Σijk ĉ²ij ,  SSE = Σijk (Yijk − Ȳij.)² = Σijk ε̂²ijk
61
ANOVA Table for the Full Model
Source SS df MS F
A SSA t1 − 1 MSA = SSA/(t1 − 1) MSA/MSE
B SSB t2 − 1 MSB = SSB/(t2 − 1) MSB/MSE
AB SSAB (t1 − 1)(t2 − 1) MSAB = SSAB/[(t1 − 1)(t2 − 1)] MSAB/MSE
Error SSE t1t2(n − 1) MSE = SSE/[t1t2(n − 1)]
Total SST t1t2n − 1
62
Distributional Facts for the Full Model
SSA/σ² ∼ χ²t1−1, λA ,  λA = Σijk a²i / σ²
SSB/σ² ∼ χ²t2−1, λB ,  λB = Σijk b²j / σ²
SSAB/σ² ∼ χ²(t1−1)(t2−1), λAB ,  λAB = Σijk c²ij / σ²
SSE/σ² ∼ χ²t1t2(n−1)
SSA, SSB, SSAB, SSE are statistically independent (orthogonality).
FA = MSA/MSE ∼ Ft1−1, t1t2(n−1), λA
FB = MSB/MSE ∼ Ft2−1, t1t2(n−1), λB
FAB = MSAB/MSE ∼ F(t1−1)(t2−1), t1t2(n−1), λAB
63
Expected MS for the Full Model
E(MSA) = E(SSA/(t1 − 1)) = σ² + Σijk a²i/(t1 − 1) = σ²(1 + λA/(t1 − 1))
E(MSB) = E(SSB/(t2 − 1)) = σ² + Σijk b²j/(t2 − 1) = σ²(1 + λB/(t2 − 1))
E(MSAB) = E(SSAB/[(t1 − 1)(t2 − 1)]) = σ² + Σijk c²ij/[(t1 − 1)(t2 − 1)] = σ²(1 + λAB/[(t1 − 1)(t2 − 1)])
E(MSE) = σ².
64
F -Tests for Full Model
Reject H0A: a1 = . . . = at1 = 0 whenever FA is too large.
For a level α test reject H0A whenever FA ≥ qf(1 − α, t1 − 1, t1t2(n − 1)).
Reject H0B: b1 = . . . = bt2 = 0 whenever FB is too large.
For a level α test reject H0B whenever FB ≥ qf(1 − α, t2 − 1, t1t2(n − 1)).
Reject H0AB: cij = 0 ∀ i, j whenever FAB is too large.
For a level α test reject H0AB whenever FAB ≥ qf(1 − α, (t1 − 1)(t2 − 1), t1t2(n − 1)).
H0AB: cij = 0 ∀ i, j means that the additive model µij = µ + ai + bj is sufficient to explain the mean structure.
Rejecting H0AB =⇒ the additive model will not provide a sufficient explanation.
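The noncentral F distributions from the previous slides also give power; a sketch computing the power of the level-α test of H0A under hypothesized effects ai and error SD σ (all numbers made up):

```r
t1 <- 3; t2 <- 4; n <- 4; alpha <- 0.05
a.eff <- c(-0.05, 0, 0.05)                   # hypothesized a_i (sum to zero)
sigma <- 0.05                                # hypothesized error SD

lambda.A <- n * t2 * sum(a.eff^2) / sigma^2  # noncentrality parameter
df1 <- t1 - 1; df2 <- t1 * t2 * (n - 1)

crit  <- qf(1 - alpha, df1, df2)             # rejection cutoff for F_A
power <- 1 - pf(crit, df1, df2, ncp = lambda.A)
power
```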
65
Comments on the Full Model ANOVA Table
SSE(additive model) = Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)²  ≠  SSE(full model) = Σijk (Yijk − Ȳij.)²
Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)² = Σijk (Yijk − Ȳij.)² + Σijk (Ȳij. − Ȳi.. − Ȳ.j. + Ȳ...)²
MSE(additive model) = Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)² / [(t1 − 1)(t2 − 1) + t1t2(n − 1)]
≠ MSE(full model) = Σijk (Yijk − Ȳij.)² / [t1t2(n − 1)]
FA(additive model) = MSA/MSE(additive model) ≠ FA(full model) = MSA/MSE(full model)
FB(additive model) = MSB/MSE(additive model) ≠ FB(full model) = MSB/MSE(full model)
66
Reciprocal Time to Death (Insecticide Data)
> round(recip.time, 5)
 [1] 0.32258 0.22222 0.21739 0.23256 0.27778 0.34483 0.25000
 [8] 0.43478 0.45455 0.47619 0.55556 0.43478 0.12195 0.09091
[15] 0.11364 0.13889 0.10870 0.16393 0.20408 0.08065 0.33333
[22] 0.27027 0.26316 0.34483 0.23256 0.22222 0.15873 0.13158
[29] 0.22727 0.28571 0.32258 0.25000 0.43478 0.40000 0.41667
[36] 0.45455 0.22222 0.14085 0.15152 0.16129 0.17857 0.09804
[43] 0.14085 0.26316 0.33333 0.27778 0.32258 0.30303
67
Factors of Insecticide Data
> type
 [1] I   I   I   I   II  II  II  II  III III III III
[13] I   I   I   I   II  II  II  II  III III III III
[25] I   I   I   I   II  II  II  II  III III III III
[37] I   I   I   I   II  II  II  II  III III III III
Levels: I II III

> delivery
 [1] A A A A A A A A A A A A B B B B B B B B B B B B
[25] C C C C C C C C C C C C D D D D D D D D D D D D
Levels: A B C D
Both type and delivery are in factor form.
We don't need to invoke as.factor(type) and as.factor(delivery) in the call of lm.
68
out.lmFULL <- lm(recip.time ∼ type*delivery)
recip.time ~ type*delivery   compare with   recip.time ~ type+delivery

> out.lmFULL
Call:
lm(formula = recip.time ~ type * delivery)

Coefficients:
      (Intercept)             typeII            typeIII
         0.248688           0.078159           0.231580
        deliveryB          deliveryC          deliveryD
        -0.132342          -0.062416          -0.079720
 typeII:deliveryB  typeIII:deliveryB   typeII:deliveryC
        -0.055166          -0.045030           0.006961
typeIII:deliveryC   typeII:deliveryD  typeIII:deliveryD
         0.008646          -0.076974          -0.091368
Note the set-to-zero form of parameter estimates.
69
What else is in out.lmFULL?
> names(out.lmFULL)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "contrasts"     "xlevels"       "call"          "terms"
[13] "model"
70
Full ANOVA: anova(out.lmFULL)
Analysis of Variance Table

Response: recip.time
              Df  Sum Sq Mean Sq F value    Pr(>F)
type           2 0.34877 0.17439 72.6347 2.310e-13 ***
delivery       3 0.20414 0.06805 28.3431 1.376e-09 ***
type:delivery  6 0.01571 0.00262  1.0904    0.3867
Residuals     36 0.08643 0.00240
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Thus it appears that the additive model is quite acceptable.
Both factors play strongly in the additive model.
71
Graphical View of No Interaction Effect
If one view looks parallel so will the other.
[Figure: two interaction plots of mean reciprocal time to death (scale 0.0–0.6); left panel vs. delivery (A, B, C, D) with one profile per type (I, II, III), right panel vs. type (I, II, III) with one profile per delivery (A, B, C, D). The profiles look parallel in both views.]
72
If One View Looks Parallel so Will the Other.
µ11        µ12        µ13        µ14
µ11 + d2   µ12 + d2   µ13 + d2   µ14 + d2
µ11 + d3   µ12 + d3   µ13 + d3   µ14 + d3
Row differences are constant!
⇒ Column differences are constant as well, i.e.,
∆2 = µ12 − µ11 is the difference between column 2 and column 1,
∆3 = µ13 − µ11 is the difference between column 3 and column 1,
∆4 = µ14 − µ11 is the difference between column 4 and column 1.
µ11 µ11 + ∆2 µ11 + ∆3 µ11 + ∆4
µ21 µ21 + ∆2 µ21 + ∆3 µ21 + ∆4
µ31 µ31 + ∆2 µ31 + ∆3 µ31 + ∆4
Used µ11 = µ11, µ21 = µ11 + d2, and µ31 = µ11 + d3.
Similarly one argues going the other direction.
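The argument above can be checked numerically. A minimal Python sketch with a made-up 3 × 4 cell-mean table (all values hypothetical) whose rows are the first row shifted by constants d_i:

```python
# Hypothetical cell means with constant row differences, as on the slide.
mu1 = [2.0, 3.5, 1.0, 4.0]             # mu_11 .. mu_14 (made up)
d = [0.0, 1.5, -0.5]                   # d_1 = 0, d_2, d_3 (made up)
mu = [[m + di for m in mu1] for di in d]

# Column differences Delta_j = mu_ij - mu_i1 are then the same in every row ...
col_diffs = [[row[j] - row[0] for j in range(4)] for row in mu]
assert all(cd == col_diffs[0] for cd in col_diffs)

# ... and every interaction term c_ij = mu_ij - mu_i. - mu_.j + mu_.. vanishes,
# i.e., the table is exactly additive.
grand = sum(map(sum, mu)) / 12
row_m = [sum(r) / 4 for r in mu]
col_m = [sum(mu[i][j] for i in range(3)) / 3 for j in range(4)]
assert all(abs(mu[i][j] - row_m[i] - col_m[j] + grand) < 1e-12
           for i in range(3) for j in range(4))
```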
73
Examining Factor Level Differences
Strong evidence of factor level di�erences.
Examine them individually to see which di�erences matter.
A naive approach: Perform a 2-sample t-test.
Or look at the corresponding confidence intervals.
Comparing type I with type II means we get
t.test(recip.time[type == "I"], recip.time[type == "II"], var.equal=T)
Two Sample t-test
data:  recip.time[type == "I"] and recip.time[type == "II"]
t = -1.6383, df = 30, p-value = 0.1118
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.10528550  0.01155725
sample estimates:
mean of x mean of y
0.1800688 0.2269329
74
Comparing Types I and II of Insecticide
[Figure: dot plot of reciprocal time to death (1/minute, scale 0.0–0.5) for poison types I and II; points are jittered vertically and horizontally.]
75
Getting MSE in Types I and II Comparison
> anova(lm(recip.time[type=="I" | type=="II"] ~
+          type[type=="I" | type=="II"]))
Analysis of Variance Table

Response: recip.time[type == "I" | type == "II"]
                                 Df   Sum Sq  Mean Sq F value Pr(>F)
type[type == "I" | type == "II"]  1 0.017570 0.017570  2.6839 0.1118
Residuals                        30 0.196394 0.006546
Note the same p-value 0.1118 as in previous t-test.
We are doing the same test, since tf² = F1,f.
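As a quick arithmetic check of this identity, a Python sketch using the rounded numbers from the two outputs above:

```python
# The two-sample t-test (slide 74) and the one-way ANOVA F-test (slide 76)
# are the same test: t_f^2 = F_(1,f).
t_stat = -1.6383       # t from t.test, df = 30
F_stat = 2.6839        # F value from anova(), df = (1, 30)
assert abs(t_stat**2 - F_stat) < 1e-3   # equal up to the printed rounding
```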
From this table we get s = √MSE = √0.006546 = 0.08091.
This could also have been backed out from the previous t-based confidence interval.
In type == "I" | type == "II", the | is the logical OR.
76
What is Wrong?
In the previous 2-sample t-test/interval we treated the observations as i.i.d. from two populations.
This ignored the known variation due to the delivery method.
The pooled sample standard deviation from these 2 "samples" confounds the variation between delivery-method means with the variation (σ) within each (delivery, type) combination.
Our "reference distribution" will thus be too dispersed.
Our test will be less discriminating.
Our confidence interval will be too wide.
77
Closer Look in Comparing Types I and II
[Figure: dot plot of reciprocal time to death (1/minute) for poison types I and II, with points colored by delivery method (A, B, C, D).]
Note the reduced variability within same color clusters.
78
Correct Approach
According to our (accepted) additive model we have
Y1jk = µ + a1 + bj + ε1jk and
Y2jk = µ + a2 + bj + ε2jk , with εijk i.i.d. ∼ N(0, σ²).
The difference due to type I and type II is captured by a1 − a2.
This can be interpreted as the difference between the mean response under type I and the mean response under type II:
µ1. − µ2. = Σj µ1j / 4 − Σj µ2j / 4
          = [Σj (µ + a1 + bj) − Σj (µ + a2 + bj)] / 4
          = a1 − a2
The effect of type = I vs. type = II can be interpreted as a contrast in cell means.
Think of it as contrasting the effects of interest while canceling/averaging out other effects.
79
Contrast in Full Model
Even in the full model with interactions the previous contrast in means stays the same, since
µ1. − µ2. = Σj µ1j / 4 − Σj µ2j / 4
          = [Σj (µ + a1 + bj + c1j) − Σj (µ + a2 + bj + c2j)] / 4
          = µ + a1 + Σj (bj + c1j)/4 − µ − a2 − Σj (bj + c2j)/4
          = a1 − a2

µ, the bj, c1j, and c2j all cancel out.
80
Estimated Contrast
The natural estimate of µ1. − µ2. = a1 − a2 is

µ̂1. − µ̂2. = â1 − â2 = (Ȳ1.. − Ȳ...) − (Ȳ2.. − Ȳ...) = Ȳ1.. − Ȳ2..

Same as in the previous (naive) 2-sample t-test/interval.
It can also be viewed as the contrast of estimated cell averages

â1 − â2 = Ȳ1.. − Ȳ2..
        = (Ȳ11. + Ȳ12. + Ȳ13. + Ȳ14.)/4 − (Ȳ21. + Ȳ22. + Ȳ23. + Ȳ24.)/4
        = (µ̂11 + µ̂12 + µ̂13 + µ̂14)/4 − (µ̂21 + µ̂22 + µ̂23 + µ̂24)/4

which estimates

(µ11 + µ12 + µ13 + µ14)/4 − (µ21 + µ22 + µ23 + µ24)/4

because ci1 + ci2 + ci3 + ci4 = 0 in Σj µij = Σj (µ + ai + bj + cij).
81
Correct Approach (continued)
However, var(Ȳ1.. − Ȳ2..) = var(Ȳ1..) + var(Ȳ2..) = σ²/(n t2) + σ²/(n t2) = 2σ²/(n t2), so that

[(Ȳ1.. − Ȳ2.. − (a1 − a2)) / (σ √(2/(n t2)))] / (s/σ)
   = (Ȳ1.. − Ȳ2.. − (a1 − a2)) / (s √(2/(n t2)))
   = (â1 − â2 − (a1 − a2)) / (s √(2/(n t2))) ∼ tf .

We have two options in choosing s and the corresponding f:

s² = s²× = MS_E(full model) = Σijk (Yijk − Ȳij.)² / [t1 t2 (n − 1)]
and thus f = t1 t2 (n − 1),

or

s² = s²+ = MS_E(additive model) = Σijk (Yijk − Ȳi.. − Ȳ.j. + Ȳ...)² / [t1 t2 (n − 1) + (t1 − 1)(t2 − 1)]
and thus f = t1 t2 (n − 1) + (t1 − 1)(t2 − 1) = t1 t2 n − t1 − t2 + 1.
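The df bookkeeping for the two choices can be sanity-checked in a small Python sketch (values for this experiment: t1 = 3 types, t2 = 4 deliveries, n = 4 replicates):

```python
# Error degrees of freedom for the insecticide data.
t1, t2, n = 3, 4, 4
f_full = t1 * t2 * (n - 1)              # full-model error df
f_add = f_full + (t1 - 1) * (t2 - 1)    # additive model gains the interaction df
assert f_full == 36
assert f_add == 42 == t1 * t2 * n - t1 - t2 + 1
```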
82
t-Test for H0 : a1 = a2
Reject H0 : a1 = a2 when

| (â1 − â2) / √(2 MSE/(n t2)) | > t1−α/2,f

or |â1 − â2| > √(2 MSE/(n t2)) · t1−α/2,f

or |â1 − â2| > SE(â1 − â2) · t1−α/2,f = LSDA ,

LSDA = the least significant difference with which any estimated difference âi − âi′ (i ≠ i′) in factor A levels can be compared.

This parallels our previous use of the LSD in the ANOVA situation.
83
t-Test for H0 : b1 = b2
Reject H0 : b1 = b2 when

| (b̂1 − b̂2) / √(2 MSE/(n t1)) | > t1−α/2,f

or |b̂1 − b̂2| > √(2 MSE/(n t1)) · t1−α/2,f

or |b̂1 − b̂2| > SE(b̂1 − b̂2) · t1−α/2,f = LSDB ,

LSDB = the least significant difference with which any estimated difference b̂j − b̂j′ (j ≠ j′) in factor B levels can be compared.

Note the change (t2 ←→ t1) between LSDA and LSDB.
84
Factor Level Means
> mean(recip.time[type=="I"])
[1] 0.1800688
> mean(recip.time[type=="II"])
[1] 0.2269329
> mean(recip.time[type=="III"])
[1] 0.3797112

> mean(recip.time[delivery=="A"])
[1] 0.3519345
> mean(recip.time[delivery=="B"])
[1] 0.1861943
> mean(recip.time[delivery=="C"])
[1] 0.294721
> mean(recip.time[delivery=="D"])
[1] 0.2161007
85
LSDA and LSDB
For the Type Factor
LSDA = t.975,f SE(â1 − â2) = 2.028094 √(0.00240 · 2/(4·4)) = 0.03513
for the full model with f = 36.

LSDA = t.975,f SE(â1 − â2) = 2.018082 √(0.00243 · 2/(4·4)) = 0.03517
for the additive model with f = 42.

For the Delivery Factor

LSDB = t.975,f SE(b̂1 − b̂2) = 2.028094 √(0.00240 · 2/(3·4)) = 0.04056
for the full model with f = 36.

LSDB = t.975,f SE(b̂1 − b̂2) = 2.018082 √(0.00243 · 2/(3·4)) = 0.04061
for the additive model with f = 42.
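These four values can be reproduced directly; a Python check, with the t quantiles qt(.975, 36) = 2.028094 and qt(.975, 42) = 2.018082 taken from the slide:

```python
from math import sqrt

# Reproducing the LSD values; MSEs are from the two ANOVA fits.
t36, t42 = 2.028094, 2.018082
MSE_full, MSE_add = 0.00240, 0.00243
n, t1, t2 = 4, 3, 4

LSD_A_full = t36 * sqrt(MSE_full * 2 / (n * t2))   # type comparisons, f = 36
LSD_A_add  = t42 * sqrt(MSE_add  * 2 / (n * t2))   # f = 42
LSD_B_full = t36 * sqrt(MSE_full * 2 / (n * t1))   # delivery comparisons
LSD_B_add  = t42 * sqrt(MSE_add  * 2 / (n * t1))

assert abs(LSD_A_full - 0.03513) < 5e-5
assert abs(LSD_A_add  - 0.03517) < 5e-5
assert abs(LSD_B_full - 0.04056) < 5e-5
assert abs(LSD_B_add  - 0.04061) < 5e-5
```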
86
LSD Groupings
Type (LSDA = .0352):

Poison Type   Mean µ̂i.   LSD Grouping
I             0.180       1
II            0.227       2
III           0.380       3

Note the difference to the naive 2-sample t-test.

Delivery (LSDB = .0406):

Delivery      Mean µ̂.j   LSD Grouping
B             0.186       1
D             0.216       1
C             0.295       2
A             0.352       3
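The grouping logic behind these tables can be sketched in Python. This is a simplified version that only compares consecutive sorted means against the LSD (the full procedure compares all pairs), which suffices for these data; the function name and inputs are illustrative:

```python
def lsd_groups(means, lsd):
    """means: dict level -> mean; returns dict level -> group number.
    Consecutive sorted means within one LSD share a group."""
    items = sorted(means.items(), key=lambda kv: kv[1])
    groups, g, prev = {}, 1, items[0][1]
    for level, m in items:
        if m - prev > lsd:      # significant jump -> start a new group
            g += 1
        groups[level] = g
        prev = m
    return groups

type_means = {"I": 0.180, "II": 0.227, "III": 0.380}
deliv_means = {"B": 0.186, "D": 0.216, "C": 0.295, "A": 0.352}
print(lsd_groups(type_means, 0.0352))    # {'I': 1, 'II': 2, 'III': 3}
print(lsd_groups(deliv_means, 0.0406))   # {'B': 1, 'D': 1, 'C': 2, 'A': 3}
```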
87
Looking Back
[Figure: interaction plots of mean recip.time (scale 0.2–0.4); top panel vs. type (I, II, III) with one line per delivery method (A, C, D, B), bottom panel vs. delivery (A, B, C, D) with one line per type (I, II, III).]
88
interaction.plot
The plots on the previous slide were produced by:
> par(mfrow=c(2,1), mar=c(4,4,1,1)+.1)
> interaction.plot(type, delivery, recip.time,
+                  col=c("green","red","blue","cyan"))
> interaction.plot(delivery, type, recip.time,
+                  col=c("green","red","blue"))

Here mar=c(4,4,1,1)+.1 inside par sets the plot margins.
mfrow=c(2,1) sets up plotting for two plots per page, one above the other.
89
The Full Model Revisited
The full model is Yijk = µij + εijk .
We can reparametrize it as
Yijk = µ+ai+bj+cij+εijk with replications k = 1, . . . , n, where
1) µ =∑
ij µij/(t1t2) = µ.. is the grand mean.2) ai =
∑j(µij − µ..)/t2 = µi. − µ.. with
∑i ai = 0.
3) bj =∑
i (µij − µ..)/t2 = µ.j − µ.. with∑
j bj = 0.
ai and bj are also referred to as main e�ects.4) cij = µij − µi. − µ.j + µ.. = µij − (µ+ ai + bj) with∑
i cij =∑
j cij = 0.
In the additive model the only change is 4) cij = 0 ∀ i , j .
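A Python sketch checking this reparametrization numerically: for an arbitrary cell-mean table µij, the quantities defined in 1)–4) satisfy the sum-to-zero constraints and rebuild µij exactly (the table values here are arbitrary random numbers):

```python
import random

random.seed(0)
t1, t2 = 3, 4
mu_ij = [[random.gauss(0, 1) for _ in range(t2)] for _ in range(t1)]

mu = sum(map(sum, mu_ij)) / (t1 * t2)                 # grand mean mu_..
a = [sum(mu_ij[i]) / t2 - mu for i in range(t1)]      # a_i = mu_i. - mu_..
b = [sum(mu_ij[i][j] for i in range(t1)) / t1 - mu for j in range(t2)]
c = [[mu_ij[i][j] - (mu + a[i] + b[j]) for j in range(t2)] for i in range(t1)]

assert abs(sum(a)) < 1e-12 and abs(sum(b)) < 1e-12
assert all(abs(sum(c[i])) < 1e-12 for i in range(t1))                    # row sums
assert all(abs(sum(c[i][j] for i in range(t1))) < 1e-12 for j in range(t2))  # col sums
assert all(abs(mu + a[i] + b[j] + c[i][j] - mu_ij[i][j]) < 1e-12
           for i in range(t1) for j in range(t2))                        # rebuild
```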
90
The Interpretation of ai − ai ′
In the additive or main effects model we have

ai − ai′ = µij − µi′j = µ + ai + bj − (µ + ai′ + bj) ∀ j = 1, . . . , t2 ,

i.e., ai − ai′ = the difference in mean response between levels i and i′ of factor A, and it is the same for each level j of factor B.

In the full or interaction model (since b. = ci. = ci′. = 0)

ai − ai′ = µi. − µi′. = µ + ai + b. + ci. − (µ + ai′ + b. + ci′.) ,

i.e., ai − ai′ = the difference in mean response between levels i and i′ of factor A, when averaged over all levels j of factor B.
Corresponding interpretations hold for bj − bj ′ .
91
Understanding Interactions
Main effects ai and bj are called that way because their additive effects on the mean µij are easily understood.
Interactions can be more complicated and may need more scrutiny in order to develop some understanding.
We will give just a few examples that illustrate some distinct and very different situations.
There are certainly many other possibilities.
For simplicity we consider a 2 × 4 two-factor experiment.
92
Interaction Pattern 1
[Figure: responses for the eight treatment combinations A1.B1, A2.B1, A1.B2, A2.B2, A1.B3, A2.B3, A1.B4, A2.B4 (vertical scale 2–8).]
93
Interaction Pattern 2
[Figure: responses for the eight treatment combinations A1.B1, A2.B1, . . . , A1.B4, A2.B4 (vertical scale 2–8).]
94
Interaction Pattern 3
[Figure: responses for the eight treatment combinations A1.B1, A2.B1, . . . , A1.B4, A2.B4 (vertical scale 2–8).]
95
Comments on Interaction Patterns
Pattern 1 shows a linear interaction trend along the levels of factor B.
Factor B seems to have no additive or main effect.
Pattern 2 seems to have no additive effect from factor B and almost no additive effect from factor A.
Only for level B4 of factor B is there a clear change in factor A.
Hence B4 acts as an interaction switch.
Pattern 3 shows no additive or main effect due to factor B.
If the colors of the last two boxes were switched, we would have a clear additive or main effect due to factor A (±2.5). This would give a far simpler explanation of the data, and it suggests the possibility of a labeling error.
96
Randomized Complete Block Designs
When we did our ANOVA to examine the effects of a factor A of interest, we saw that the power of the F-test is an increasing function of the noncentrality parameter λ = Σi ni (µi − µ)²/σ².
We cannot influence the size of the treatment effects |µi − µ|, but we can influence the sample sizes ni and possibly σ.
Note: σ → σ/2 ⇐⇒ ni → 4 · ni !!
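That equivalence follows directly from the form of λ; a Python sketch verifying that halving σ and quadrupling every ni multiply λ by the same factor of 4 (the group means, sizes, and σ below are made-up illustration values):

```python
def lam(n, mu_i, sigma):
    """Noncentrality lambda = sum_i n_i (mu_i - mu)^2 / sigma^2."""
    mu = sum(ni * m for ni, m in zip(n, mu_i)) / sum(n)   # weighted grand mean
    return sum(ni * (m - mu) ** 2 for ni, m in zip(n, mu_i)) / sigma ** 2

n, mu_i, sigma = [5, 5, 5], [1.0, 2.0, 4.0], 2.0          # hypothetical values
base = lam(n, mu_i, sigma)
assert abs(lam(n, mu_i, sigma / 2) - 4 * base) < 1e-12    # sigma -> sigma/2
assert abs(lam([4 * ni for ni in n], mu_i, sigma) - 4 * base) < 1e-12  # n_i -> 4 n_i
```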
How can we influence σ itself?
We need to understand what may affect σ.
Often σ is due to variability of hidden or ignored factors.
Consider the delivery factor in our insecticide experiment.
It had a definite effect on the measured response times.
Had we ignored it, or left it to happenstance which delivery was used for each experimental unit (insect), we would have confounded the variability due to delivery with the remaining variability within (delivery, type).
97
Ignoring a Factor
Assuming an additive model (similar reasoning applies under the full model):
Source SS df MS F
A SSA t1 − 1 SSA/(t1 − 1) MSA/MSE
B SSB t2 − 1 SSB/(t2 − 1) MSB/MSE
Error SSE dfE SSE/dfE
Total SST t1t2n − 1
If we ignored factor B, we would treat SS′E = SSB + SSE as our error sum of squares, with df′E = dfE + t2 − 1.
SS′E/df′E = MS′E would be a legitimate estimate of σ² if b1 = . . . = bt2 = 0.
If not, MS′E would be inflated; it would estimate σ′² > σ² ⇒ loss of power.
98
Blocking
Blocking consists of stratifying experimental units into groups that will have more homogeneous responses within groups.
Possibly quite inhomogeneous responses between groups.
Such grouping/blocking can be accomplished by an appropriately chosen factor, where the levels of that factor define the different groups.
Blocking will be beneficial if the factor used for blocking causes variation in the response as the levels of that factor change.
Beneficial in the sense that this variation can be eliminated.
Thus experimental units within a level of that factor (i.e., within a block) will not experience that change and will thus be more homogeneous in their response.
Treatment comparisons within blocks, block by block.
99
Typical Blocking Criteria
Location: Experiments are conducted over varying locations and location is judged to have an effect on the response.
Time: If time of day, month, or year is likely to affect the response. Results of the study are to stand regardless of time.
Litters: If animals in the same litter are likely to produce more homogeneous responses. Useful in medical experiments.
Batches of Material: Variations in the process for creating experimental material are likely to show up in the responses.
Any variation-inducing aspect of an experiment (education, income, . . .) that is not considered a treatment of interest.
100
Sir Ronald Aylmer Fisher
The statistician Sir Ronald Aylmer Fisher (1890-1962) developed experimental design in an agricultural setting at Rothamsted Experimental Station.
See http://www.bookrags.com/Ronald_Fisher on his great influence as one of the founding fathers of statistics.
To some he is more famous for his work in genetics.
"He bred poultry, mice, snails, and other creatures and published his findings in several papers that contributed to scientists' overall understanding of genetic dominance."
Apparently he had much to do with the fact that the significance level of .05 is so entrenched until today; see:
http://www.jerrydallal.com/LHSP/p05.htm
Thus it is only fitting to consider an agricultural experiment to illustrate blocking.
101
Nitrogen Fertilizer Timing
A nitrogen fertilizer can be administered according to 6 different timing schedules (treatments).
The response is the nitrogen uptake (ppm × 10⁻²).
The experimental material: one irrigated field.
Subdividing the field into different experimental units for use with different treatments could be affected by soil moisture variation, caused by a sloping field gradient.
This moisture gradient is mainly across the width of the field.
This suggests rows along the field length as blocks.
102
Field Moisture Gradient
[Figure: soil wetness gradient across the field width, ranging from wet to dry.]
103
The Data
Field subdivided into 4 rows (or blocks) with 6 plots each.
The 6 treatments were randomly assigned within each row.
Remaining moisture variability within rows affects MSE.
Responses, one per plot, with treatments in randomized order within each row:

row 1: 40.895 37.994 37.181 34.986 34.893 42.07
row 2: 41.223 49.424 45.856 50.155 41.992 46.69
row 3: 44.573 52.685 37.611 36.942 46.654 40.23
row 4: 41.904 39.206 43.295 40.453 42.911 39.97
Also available as fertilizerdata.csv on web site.
104
Randomized Complete Block (RCB) Design
Experimental units are blocked into presumably more homogeneous groups.
The blocks are complete, i.e., each treatment in each block.
The blocks are balanced:
t1 = 4 observations for each treatment level.
t2 = 6 treatments for each block level (row).
n = 1 observation per (block, treatment) combination.
105
Nitrogen Fertilizer Box Plots
[Figure: box plots of nitrogen uptake (ppm × 10⁻², scale 35–50); left panel by treatment (1–6), right panel by row (1–4).]
106
ANOVA Table
Source SS df MS F
Block SSBlock t1 − 1 SSBlock/(t1 − 1) MSBlock/MSE
Treatment SSTreat t2 − 1 SSTreat/(t2 − 1) MSTreat/MSE
Error SSE dfE SSE/dfE
Total SST t1t2n − 1
Note that here n = 1, thus t1t2n − 1 = t1t2 − 1, and dfE = (t1 − 1)(t2 − 1) + t1t2(n − 1) = (t1 − 1)(t2 − 1).
The table is the same as in the additive 2-factor ANOVA.
Just relabel factors A and B to Block and Treatment.
107
Nitrogen Fertilizer Results
> anova(lm(response ~ as.factor(row) + as.factor(treatment),
+          data=fertilizerdata))
Analysis of Variance Table

Response: response
                     Df  Sum Sq Mean Sq F value   Pr(>F)
as.factor(row)        3 197.004  65.668  9.1198 0.001116 **
as.factor(treatment)  5 201.316  40.263  5.5917 0.004191 **
Residuals            15 108.008   7.201
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Both treatment and blocking factor are significant at .005.
108
Nitrogen Fertilizer Results (Ignoring the Blocking)
> anova(lm(response ~ as.factor(treatment), data=fertilizerdata))
Analysis of Variance Table

Response: response
                     Df  Sum Sq Mean Sq F value  Pr(>F)
as.factor(treatment)  5 201.316  40.263  2.3761 0.08024 .
Residuals            18 305.012  16.945
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Not significant at .05.
Affected by the extra variation induced by the wetness gradient.
Less discrimination power to see the fertilizer timing effect.
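The two ANOVA tables above fit together exactly as predicted on the "Ignoring a Factor" slide: dropping the blocking factor folds its sum of squares and degrees of freedom into the error term. A Python check with the printed values:

```python
# Values copied from the two fertilizer ANOVA tables.
SS_block, df_block = 197.004, 3        # as.factor(row)
SS_err,   df_err   = 108.008, 15       # residuals with blocking
SS_err2,  df_err2  = 305.012, 18       # residuals without blocking

assert abs((SS_block + SS_err) - SS_err2) < 1e-9   # SS'_E = SS_Block + SS_E
assert df_block + df_err == df_err2                # df'_E = df_E + t1 - 1
assert SS_err2 / df_err2 > SS_err / df_err         # inflated MSE: 16.945 > 7.201
```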
109
F -Statistics in the Last Two ANOVAs
Compare the treatment F -statistics for the last two ANOVAs.
F(1)Treat = [Σij (Ȳ.j − Ȳ..)² / 5] / [Σij (Yij − Ȳ.j − [Ȳi. − Ȳ..])² / 15]

and

F(2)Treat = [Σij (Ȳ.j − Ȳ..)² / 5] / [Σij (Yij − Ȳ.j)² / 18]

In F(1)Treat the variation of the Yij around Ȳ.j is corrected for the variation due to Ȳi. − Ȳ.., i.e., the row variation.

In F(2)Treat such a correction is omitted, i.e., the row variation is ignored and absorbed as part of the MSE:

Σij (Yij − Ȳ.j)² = Σij (Yij − Ȳ.j − [Ȳi. − Ȳ..])² + Σij (Ȳi. − Ȳ..)²
110
The Last Identity
The previous decomposition could use some elaboration:
Σij (Yij − Ȳ.j − [Ȳi. − Ȳ..])²
  = Σij (Yij − Ȳ.j)² + Σij (Ȳi. − Ȳ..)² − 2 Σij (Yij − Ȳ.j)(Ȳi. − Ȳ..)
  = Σij (Yij − Ȳ.j)² + Σij (Ȳi. − Ȳ..)² − 2 Σij (Ȳi. − Ȳ..)²
  = Σij (Yij − Ȳ.j)² − Σij (Ȳi. − Ȳ..)²

⇒ Σij (Yij − Ȳ.j)² = Σij (Yij − Ȳ.j − [Ȳi. − Ȳ..])² + Σij (Ȳi. − Ȳ..)²
111
A Look at the Block Adjustment
Model: Yij = µ + bi + τj + εij
Estimated block effect: b̂i = Ȳi. − Ȳ..
Consider the block-adjusted observations Zij = Yij − b̂i = Yij − (Ȳi. − Ȳ..).
These Zij, when plotted on separate levels for each block identifier i, will look more aligned, more like "replicates."
Thus it makes sense to look at the average Z̄.j − Z̄.. over those blocks as estimate for the jth treatment effect:
Z̄.j − Z̄.. = Ȳ.j − 0 − (Ȳ.. − 0) = Ȳ.j − Ȳ..
Zij − Z̄.j = Yij − (Ȳi. − Ȳ..) − (Ȳ.j − 0) = Yij − Ȳi. − Ȳ.j + Ȳ..
Compare the treatment effect dispersion
Σij (Z̄.j − Z̄..)² = Σij (Ȳ.j − Ȳ..)²
against the within-"replication" dispersion
Σij (Zij − Z̄.j)² = Σij (Yij − Ȳi. − Ȳ.j + Ȳ..)²
which leads back to our F(1)Treat.
112
Visual Block Adjustment
[Figure: dot plots on separate levels for blocks 1–4; top, raw responses (scale 8–12) showing clear block shifts; bottom, responses adjusted for block effect (scale 9.6–10.4), closely aligned across blocks.]
113
Comments on the Visual Block Adjustments
Clear block effect.
Compare variability between and within blocks.
After the blocks are shifted to remove the individual block effects, the data sets appear aligned on top of each other.
The pattern relationship within each block is undisturbed.
We are just using a magnified scale.
Easy to see differences between symbol groups (treatments) compared to variation within symbol groups.
Not easy to see without prior alignment.
114
Randomization Tests Revisited
Assign the 6 treatments randomly within each row. The number of possible assignments is

(6!)⁴ = 720⁴ = 268,738,560,000 ≈ 2.7 · 10¹¹

This is roughly 1/10,000 the number of patterns assigning 6 treatments in groups of 4 without the row-blocking restriction:

(24 choose 4)(20 choose 4)(16 choose 4)(12 choose 4)(8 choose 4)(4 choose 4) = 3.246671 · 10¹⁵
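Both counts are easy to reproduce; a Python check:

```python
from math import comb, factorial

within_rows = factorial(6) ** 4                    # (6!)^4 = 720^4
assert within_rows == 268_738_560_000

unrestricted = 1
for k in range(24, 0, -4):                         # C(24,4) C(20,4) ... C(4,4)
    unrestricted *= comb(k, 4)
assert unrestricted == factorial(24) // factorial(4) ** 6   # multinomial form
assert unrestricted == 3_246_670_537_110_000               # = 3.246671e15 rounded

# "roughly 1/10,000": the exact ratio is about 1/12,082
assert 10_000 < unrestricted / within_rows < 13_000
```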
If the treatment effects were identical, then the treatment assignment would not have affected the responses.
Any other of the 2.7 · 10¹¹ treatment assignments would have given us the same results Yij.
Test H0 : τ1 = . . . = τ6 (= 0) using as test statistic FTreat and its resulting randomization reference distribution:

FTreat = [Σij (Ȳ.j − Ȳ..)² / (6 − 1)] / [Σij (Yij − Ȳi. − Ȳ.j + Ȳ..)² / ((6 − 1) × (4 − 1))]
115
A Large Value of FTreat
According to the premise of no treatment effect, when calculating FTreat for all these 2.7 · 10¹¹ treatment assignments, we get 2.7 · 10¹¹ equally likely values of FTreat, i.e., the reference distribution of FTreat.
This distribution is by theory ≈ the F5,15-distribution.
F_Treat^obs would then be like one of these equally likely values.
If F_Treat^obs is in the far upper tail of the reference distribution, judge the result significant evidence against H0.
Strength of conviction ⇔ p-value = P(FTreat ≥ F_Treat^obs)
116
Randomization Test for Treatment E�ect
[Figure: histogram of the randomization distribution of the F-statistic for the treatment effect (Nsim = 1e+05), with the F5,15 density superimposed; p-value = 0.00127.]
117
Comparing Results
Randomization reference distribution ⇒ p-value = .00127 = 127/100000.
The normal theory test for the same hypothesis, using the same test statistic, came up with a p-value = .004191 from F5,15.
Question: Is this explainable by statistical variation alone?
Such statistical variation would come from the fact that we approximated the true randomization reference distribution by simulation from Nsim = 100,000 treatment allocations.
A 95% upper confidence bound for the p-value resulting from the full true reference distribution can be obtained from R via
qgamma(.95, 127+1)/100000 = 0.001471603 or via
qbeta(.95, 127+1, 100000-127) = 0.001471455.
That still leaves a wide gap to .004191.
The F5,15 approximation is not good enough out in the far tail.
The superimposed F-density looks mostly like a good fit.
118
Unbalanced Designs
Deliberately undo the balanced design in poison/delivery data.
> recip.time0 <- recip.time[c(-1,-5,-15)]
> delivery0 <- delivery[c(-1,-5,-15)]
> type0 <- type[c(-1,-5,-15)]

> table(type,delivery)
     delivery
type  A B C D
  I   4 4 4 4
  II  4 4 4 4
  III 4 4 4 4

> table(type0,delivery0)
      delivery0
type0 A B C D
  I   3 3 4 4
  II  3 4 4 4
  III 4 4 4 4
119
ANOVA in Unbalanced Designs
> delivery0type0 <- lm(recip.time0 ~ delivery0*type0)
> coef(delivery0type0)
        (Intercept)          delivery0B          delivery0C          delivery0D
         0.22405722         -0.10680749         -0.03778486         -0.05508903
            type0II            type0III  delivery0B:type0II  delivery0C:type0II
         0.11914618          0.25621130         -0.09705669         -0.03402663
 delivery0D:type0II delivery0B:type0III delivery0C:type0III delivery0D:type0III
        -0.11796097         -0.07056376         -0.01598499         -0.11599898

> anova(delivery0type0)
Analysis of Variance Table

Response: recip.time0
                Df  Sum Sq  Mean Sq F value    Pr(>F)
delivery0        3 0.19124 0.063745 27.7042 3.873e-09 ***
type0            2 0.34039 0.170193 73.9673 6.403e-13 ***
delivery0:type0  6 0.02142 0.003570  1.5514    0.1926
Residuals       33 0.07593 0.002301
120
ANOVA in Unbalanced Design (order reversed)
> type0delivery0 <- lm(recip.time0 ~ type0*delivery0)
> coef(type0delivery0)
        (Intercept)             type0II            type0III          delivery0B
         0.22405722          0.11914618          0.25621130         -0.10680749
         delivery0C          delivery0D  type0II:delivery0B type0III:delivery0B
        -0.03778486         -0.05508903         -0.09705669         -0.07056376
 type0II:delivery0C type0III:delivery0C  type0II:delivery0D type0III:delivery0D
        -0.03402663         -0.01598499         -0.11796097         -0.11599898

> anova(type0delivery0)
Analysis of Variance Table

Response: recip.time0
                Df  Sum Sq  Mean Sq F value    Pr(>F)
type0            2 0.35058 0.175291 76.1830 4.295e-13 ***
delivery0        3 0.18104 0.060347 26.2271 7.307e-09 ***
type0:delivery0  6 0.02142 0.003570  1.5514    0.1926
Residuals       33 0.07593 0.002301
121
Comments on ANOVA in Unbalanced Designs
The type and delivery data subspaces are no longer orthogonal.
Each x ∈ delivery subspace can be decomposed into x = y + z, with y ∈ type subspace and z ⊥ type subspace.
The projection onto the type subspace will reflect a component within the delivery subspace.
The subsequent projection onto the delivery subspace will only account for the part not yet explained by the previous projection.
Reversing type and delivery has a corresponding effect.
The order of projection matters for the distances (or SS's).
The parameter estimates are the same (same model).
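The loss of orthogonality can be illustrated directly. A Python sketch (not from the lecture): centered indicators of a type level and a delivery level are orthogonal in the balanced 3 × 4 × 4 layout, but not after dropping observations 1, 5, and 15 as above.

```python
# Layout matching the insecticide data: deliveries A-D in blocks of 12,
# types I-III in sub-blocks of 4 within each delivery block.
type_lv = [t for _ in range(4) for t in ["I"]*4 + ["II"]*4 + ["III"]*4]
deliv_lv = [d for d in "ABCD" for _ in range(12)]

def inner(keep):
    """Inner product of centered indicators of type I and delivery A."""
    x = [1.0 if type_lv[i] == "I" else 0.0 for i in keep]
    z = [1.0 if deliv_lv[i] == "A" else 0.0 for i in keep]
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    return sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))

balanced = list(range(48))
unbalanced = [i for i in balanced if i not in (0, 4, 14)]  # drop obs 1, 5, 15

assert abs(inner(balanced)) < 1e-12    # orthogonal: sequential SS order-free
assert abs(inner(unbalanced)) > 1e-6   # oblique: sequential SS depend on order
```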
122