Factorial Experiments
Analysis of Variance
Experimental Design
• Dependent variable Y
• k Categorical independent variables A, B, C, … (the Factors)
• Let
 – a = the number of categories of A
 – b = the number of categories of B
 – c = the number of categories of C
 – etc.
The Completely Randomized Design
• We form the set of all treatment combinations – the set of all combinations of the k factors
• Total number of treatment combinations
 – t = abc…
• In the completely randomized design n experimental units (test animals, test plots, etc.) are randomly assigned to each treatment combination.
 – Total number of experimental units N = nt = nabc…
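The construction above can be sketched in Python. This is only an illustration, assuming the two factors of the diet example; the names `factors`, `assign_units` and the seed are hypothetical and not part of the slides.

```python
import itertools
import random

# Factor levels for the diet example (a = 2 levels of protein, b = 3 sources).
factors = {
    "Level": ["High", "Low"],
    "Source": ["Beef", "Cereal", "Pork"],
}

# All treatment combinations: t = a * b = 6.
treatments = list(itertools.product(*factors.values()))


def assign_units(units, treatments, n, seed=0):
    """Randomly assign n experimental units to each treatment combination."""
    units = list(units)
    if len(units) != n * len(treatments):
        raise ValueError("need exactly N = n * t experimental units")
    random.Random(seed).shuffle(units)
    # Consecutive blocks of n shuffled units go to successive treatments.
    return {
        trt: units[i * n:(i + 1) * n]
        for i, trt in enumerate(treatments)
    }


# N = n * t = 10 * 6 = 60 units, labelled 1..60 for illustration.
design = assign_units(range(1, 61), treatments, n=10)
```

Each key of `design` is one treatment combination, and its value is the list of units assigned to it.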
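The treatment combinations can be thought of as arranged in a k-dimensional rectangular block
[Figure: rectangular block of treatment combinations with axes A (levels 1, 2, …, a), B (levels 1, 2, …, b) and C]
Another way of representing the treatment combinations in a factorial experiment
[Figure: diagram of the treatment combinations for factors A, B, C, D, …]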
Example
In this example we are examining the effect of the level of protein A (High or Low) and the source of protein B (Beef, Cereal, or Pork) on weight gains Y (grams) in rats.
We have n = 10 test animals randomly assigned to each of k = 6 diets.
The k = 6 diets are the 6 = 3×2 Level-Source combinations
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork
Table: Gains in weight (grams) for rats under six diets differing in level of protein (High or Low) and source of protein (Beef, Cereal, or Pork)
Level of Protein        High Protein             Low Protein
Source of Protein    Beef   Cereal  Pork     Beef   Cereal  Pork
Diet                 1      2       3        4      5       6
                     73     98      94       90     107     49
                     102    74      79       76     95      82
                     118    56      96       90     97      73
                     104    111     98       64     80      86
                     81     95      102      86     98      81
                     107    88      102      51     74      97
                     100    82      108      72     74      106
                     87     77      91       90     67      70
                     117    86      120      95     89      61
                     111    92      105      78     58      82
Mean                 100.0  85.9    99.5     79.2   83.9    78.7
Std. Dev.            15.14  15.02   10.92    13.89  15.71   16.55
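The summary rows of the table can be reproduced with a short Python sketch. The data are transcribed from the table; the variable names are illustrative only.

```python
from statistics import mean, stdev

# Weight gains (grams), one list per diet, keyed by (level, source).
gains = {
    ("High", "Beef"):   [73, 102, 118, 104, 81, 107, 100, 87, 117, 111],
    ("High", "Cereal"): [98, 74, 56, 111, 95, 88, 82, 77, 86, 92],
    ("High", "Pork"):   [94, 79, 96, 98, 102, 102, 108, 91, 120, 105],
    ("Low", "Beef"):    [90, 76, 90, 64, 86, 51, 72, 90, 95, 78],
    ("Low", "Cereal"):  [107, 95, 97, 80, 98, 74, 74, 67, 89, 58],
    ("Low", "Pork"):    [49, 82, 73, 86, 81, 97, 106, 70, 61, 82],
}

# Cell means and sample standard deviations (n - 1 divisor).
cell_means = {diet: mean(y) for diet, y in gains.items()}
cell_sds = {diet: stdev(y) for diet, y in gains.items()}
```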
Example – Four factor experiment
Four factors are studied for their effect on Y (luster of paint film). The four factors are:
1) Film Thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (20, 30, 40 or 60 Minutes), and
4) Temperature of wash (92 ˚C or 100 ˚C)
Two observations of film luster (Y) are taken for each treatment combination.
The data are tabulated below (two observations per cell):
                 Regular Dry                  Special Dry
Minutes      92 ˚C       100 ˚C           92 ˚C       100 ˚C
1-mil Thickness
20           3.4  3.4    19.6  14.5       2.1  3.8    17.2  13.4
30           4.1  4.1    17.5  17.0       4.0  4.6    13.5  14.3
40           4.9  4.2    17.6  15.2       5.1  3.3    16.0  17.8
60           5.0  4.9    20.9  17.1       8.3  4.3    17.5  13.9
2-mil Thickness
20           5.5  3.7    26.6  29.5       4.5  4.5    25.6  22.5
30           5.7  6.1    31.6  30.2       5.9  5.9    29.2  29.8
40           5.5  5.6    30.5  30.2       5.5  5.8    32.6  27.4
60           7.2  6.0    31.4  29.6       8.0  9.9    33.5  29.5
Notation
Let the single observations be denoted by a single letter and a number of subscripts
$$y_{ijk \ldots l}$$
The number of subscripts is equal to: (the number of factors) + 1
• 1st subscript = level of first factor
• 2nd subscript = level of 2nd factor
• …
• Last subscript denotes different observations on the same treatment combination
Notation for Means
When averaging over one or several subscripts we put a “bar” above the letter and replace the subscripts by •
Example:
$$\bar{y}_{241\bullet\bullet}$$
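As a worked instance of the bar-and-dot notation, assume a four-factor experiment in which the fourth factor has d levels and each treatment combination has n observations, so that every observation carries five subscripts (d and the summation indices l, m are assumptions made for this illustration). The mean in the example then averages over the last two subscripts:

```latex
\bar{y}_{241\bullet\bullet} = \frac{1}{dn} \sum_{l=1}^{d} \sum_{m=1}^{n} y_{241lm}
```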
Profile of a Factor
Plot of the observation means against the levels of the factor.
The levels of the other factors may be held constant or we may average over the other levels
Definition:
A factor is said to not affect the response if the profile of the factor is horizontal for all combinations of levels of the other factors:
No change in the response when you change the levels of the factor (true for all combinations of levels of the other factors)
Otherwise the factor is said to affect the response:
Definition:
• Two (or more) factors are said to interact if changes in the response when you change the level of one factor depend on the level(s) of the other factor(s).
 – Profiles of the factor for different levels of the other factor(s) are not parallel.
• Otherwise the factors are said to be additive.
 – Profiles of the factor for different levels of the other factor(s) are parallel.
• If two (or more) factors interact, each factor affects the response.
• If two (or more) factors are additive, it still remains to be determined whether the factors affect the response.
• In factorial experiments we are interested in determining
 – which factors affect the response and
 – which groups of factors interact.
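The "parallel profiles" criterion can be illustrated with a small Python sketch. It works on a table of true (error-free) means; with sample means the profiles are never exactly parallel and the question is settled by the F tests described later. The function name and both tables are hypothetical, made up for illustration.

```python
def profiles_parallel(means, tol=1e-9):
    """Return True if the row profiles of a two-way table of means are parallel.

    means[i][j] is the mean response at level i of one factor and level j of
    the other. Profiles are parallel when each row differs from the first row
    by the same amount in every column, that is, when the factors are additive.
    """
    first = means[0]
    for row in means[1:]:
        gaps = [y - y0 for y, y0 in zip(row, first)]
        if max(gaps) - min(gaps) > tol:
            return False
    return True


# Hypothetical tables of means (rows = levels of B, columns = levels of A).
additive = [[10, 20, 30], [25, 35, 45]]      # constant gap of 15: additive
interacting = [[10, 20, 30], [25, 20, 15]]   # gap changes with the column
```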
[Figure: three profile plots of the response against the levels of A, with one profile for each level of B. Panel titles: "Factor A has no effect", "Additive Factors", "Interacting Factors"]
The testing in factorial experiments
1. Test first the higher order interactions.
2. If an interaction is present there is no need to test lower order interactions or main effects involving those factors. All factors in the interaction affect the response and they interact.
3. The testing continues with lower order interactions and main effects for factors which have not yet been determined to affect the response.
Example: Diet Example
Summary Table of Cell means
                        Source of Protein
Level of Protein    Beef     Cereal   Pork     Overall
High                100.00   85.90    99.50    95.13
Low                 79.20    83.90    78.70    80.60
Overall             89.60    84.90    89.10    87.87
[Figure: Profiles of Weight Gain for Source and Level of Protein. Weight gain against source of protein (Beef, Cereal, Pork), with profiles for High Protein, Low Protein and Overall]
[Figure: Profiles of Weight Gain for Source and Level of Protein. Weight gain against level of protein (High Protein, Low Protein), with profiles for Beef, Cereal, Pork and Overall]
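Models for factorial Experiments
Single Factor: A – a levels
$$y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n$$
where
• $\mu$ = overall mean
• $\alpha_i$ = effect on y of factor A when A = i, with $\sum_{i=1}^{a} \alpha_i = 0$
• $\varepsilon_{ij}$ = random error – Normal, mean 0, std-dev. $\sigma$
• $\mu_i = \mu + \alpha_i$ = the mean of y when A = i
[Figure: observations $y_{i1}, y_{i2}, y_{i3}, \ldots, y_{in}$ at levels 1, 2, 3, …, a of A, each set normally distributed about its mean $\mu + \alpha_1, \mu + \alpha_2, \mu + \alpha_3, \ldots, \mu + \alpha_a$]
Definitions
$$\mu = \frac{1}{a} \sum_{i=1}^{a} \mu_i \quad \text{(overall mean)}$$
$$\alpha_i = \mu_i - \mu = \mu_i - \frac{1}{a} \sum_{i'=1}^{a} \mu_{i'} \quad \text{(effect when } A = i \text{)}$$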
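Two Factor: A (a levels), B (b levels)
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
i = 1,2, ... ,a ; j = 1,2, ... ,b ; k = 1,2, ... ,n
$$\sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0, \quad \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0$$
$$\mu_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} = \text{the mean of } y \text{ when } A = i \text{ and } B = j$$
• $\mu$ = Overall mean
• $\alpha_i$ = Main effect of A
• $\beta_j$ = Main effect of B
• $(\alpha\beta)_{ij}$ = Interaction effect of A and B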
Table of Means
Table of Effects – Overall mean, Main effects, Interaction Effects
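Three Factor: A (a levels), B (b levels), C (c levels)
$$y_{ijkl} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijkl}$$
$$= \mu + \underbrace{\alpha_i + \beta_j + \gamma_k}_{\text{Main effects}} + \underbrace{(\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}}_{\text{Two factor Interactions}} + \underbrace{(\alpha\beta\gamma)_{ijk}}_{\text{Three factor Interaction}} + \underbrace{\varepsilon_{ijkl}}_{\text{Random error}}$$
i = 1,2, ... ,a ; j = 1,2, ... ,b ; k = 1,2, ... ,c; l = 1,2, ... ,n
$$\sum_{i=1}^{a} \alpha_i = 0, \quad \sum_{j=1}^{b} \beta_j = 0, \quad \sum_{k=1}^{c} \gamma_k = 0, \quad \sum_{i=1}^{a} (\alpha\beta)_{ij} = 0, \quad \ldots, \quad \sum_{k=1}^{c} (\alpha\beta\gamma)_{ijk} = 0$$
$\mu_{ijk}$ = the mean of y when A = i, B = j, C = k
$$\mu_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk}$$
that is, overall mean + main effects + two factor interactions + three factor interaction, with the same ranges of i, j, k and the same sum-to-zero constraints.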
[Figure: profile plots against the levels of A, with one profile per level of B and one panel per level of C. Panel titles: "No interaction", "A, B interact, No interaction with C", "A, B, C interact"]
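Four Factor:
$$y_{ijklm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \gamma_k + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\beta\gamma)_{ijk} + \delta_l + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\alpha\beta\delta)_{ijl} + (\gamma\delta)_{kl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl} + (\alpha\beta\gamma\delta)_{ijkl} + \varepsilon_{ijklm}$$
Grouped by order:
$$y_{ijklm} = \mu \quad \text{(Overall mean)}$$
$$+ \; \alpha_i + \beta_j + \gamma_k + \delta_l \quad \text{(Main effects)}$$
$$+ \; (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk} + (\alpha\delta)_{il} + (\beta\delta)_{jl} + (\gamma\delta)_{kl} \quad \text{(Two factor Interactions)}$$
$$+ \; (\alpha\beta\gamma)_{ijk} + (\alpha\beta\delta)_{ijl} + (\alpha\gamma\delta)_{ikl} + (\beta\gamma\delta)_{jkl} \quad \text{(Three factor Interactions)}$$
$$+ \; (\alpha\beta\gamma\delta)_{ijkl} + \varepsilon_{ijklm} \quad \text{(Four factor Interaction, Random error)}$$
i = 1,2, ... ,a ; j = 1,2, ... ,b ; k = 1,2, ... ,c; l = 1,2, ... ,d; m = 1,2, ... ,n
where
$$0 = \sum \alpha_i = \sum \beta_j = \sum (\alpha\beta)_{ij} = \sum \gamma_k = \sum (\alpha\gamma)_{ik} = \sum (\beta\gamma)_{jk} = \sum (\alpha\beta\gamma)_{ijk} = \sum \delta_l = \sum (\alpha\delta)_{il} = \sum (\beta\delta)_{jl} = \sum (\alpha\beta\delta)_{ijl} = \sum (\gamma\delta)_{kl} = \sum (\alpha\gamma\delta)_{ikl} = \sum (\beta\gamma\delta)_{jkl} = \sum (\alpha\beta\gamma\delta)_{ijkl}$$
and $\sum$ denotes the summation over any of the subscripts.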
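Estimation of Main Effects and Interactions
• Estimator of the main effect of a factor
 = Mean at level i of the factor − Overall mean
• Estimator of a k-factor interaction effect at a combination of levels of the k factors
 = Mean at the combination of levels of the k factors − sum of all means at combinations of levels of k − 1 of the factors + sum of all means at combinations of levels of k − 2 of the factors − etc.
Example:
• The main effect of factor B at level j in a four factor (A, B, C and D) experiment is estimated by:
$$\hat{\beta}_j = \bar{y}_{\bullet j \bullet\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$
• The two-factor interaction effect between factors B and C when B is at level j and C is at level k is estimated by:
$$\widehat{(\beta\gamma)}_{jk} = \bar{y}_{\bullet jk \bullet\bullet} - \bar{y}_{\bullet j \bullet\bullet\bullet} - \bar{y}_{\bullet\bullet k \bullet\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$
• The three-factor interaction effect between factors B, C and D when B is at level j, C is at level k and D is at level l is estimated by:
$$\widehat{(\beta\gamma\delta)}_{jkl} = \bar{y}_{\bullet jkl \bullet} - \bar{y}_{\bullet jk \bullet\bullet} - \bar{y}_{\bullet j \bullet l \bullet} - \bar{y}_{\bullet\bullet kl \bullet} + \bar{y}_{\bullet j \bullet\bullet\bullet} + \bar{y}_{\bullet\bullet k \bullet\bullet} + \bar{y}_{\bullet\bullet\bullet l \bullet} - \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$
• Finally the four-factor interaction effect between factors A, B, C and D when A is at level i, B is at level j, C is at level k and D is at level l is estimated by:
$$\widehat{(\alpha\beta\gamma\delta)}_{ijkl} = \bar{y}_{ijkl \bullet} - \bar{y}_{ijk \bullet\bullet} - \bar{y}_{ij \bullet l \bullet} - \bar{y}_{i \bullet kl \bullet} - \bar{y}_{\bullet jkl \bullet} + \bar{y}_{ij \bullet\bullet\bullet} + \bar{y}_{i \bullet k \bullet\bullet} + \bar{y}_{i \bullet\bullet l \bullet} + \bar{y}_{\bullet jk \bullet\bullet} + \bar{y}_{\bullet j \bullet l \bullet} + \bar{y}_{\bullet\bullet kl \bullet} - \bar{y}_{i \bullet\bullet\bullet\bullet} - \bar{y}_{\bullet j \bullet\bullet\bullet} - \bar{y}_{\bullet\bullet k \bullet\bullet} - \bar{y}_{\bullet\bullet\bullet l \bullet} + \bar{y}_{\bullet\bullet\bullet\bullet\bullet}$$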
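The alternating-sign rule for the estimators can be sketched in Python as follows. This is a minimal illustration for balanced designs only, applied to the cell means of the diet example; the function names and the way levels are addressed by position are assumptions of this sketch.

```python
from itertools import combinations

# Cell means from the diet example, keyed by (level, source).
cell_means = {
    ("High", "Beef"): 100.0, ("High", "Cereal"): 85.9, ("High", "Pork"): 99.5,
    ("Low", "Beef"): 79.2, ("Low", "Cereal"): 83.9, ("Low", "Pork"): 78.7,
}


def marginal_mean(cell_means, fixed):
    """Average the cell means over every factor not listed in `fixed`.

    `fixed` maps factor position -> level. Averaging cell means is only
    valid when every cell has the same number of observations.
    """
    picked = [m for levels, m in cell_means.items()
              if all(levels[pos] == lv for pos, lv in fixed.items())]
    return sum(picked) / len(picked)


def effect(cell_means, fixed):
    """Estimate the main effect or interaction effect at the levels in `fixed`.

    Mean at the k levels, minus means at k - 1 of them, plus means at k - 2
    of them, and so on down to the overall mean.
    """
    k = len(fixed)
    total = 0.0
    for r in range(k + 1):
        sign = (-1) ** (k - r)
        for subset in combinations(fixed, r):
            total += sign * marginal_mean(cell_means, {p: fixed[p] for p in subset})
    return total


high_effect = effect(cell_means, {0: "High"})                      # main effect of level
cereal_effect = effect(cell_means, {1: "Cereal"})                  # main effect of source
high_cereal = effect(cell_means, {0: "High", 1: "Cereal"})         # interaction effect
```

With an empty `fixed` the function returns the overall mean, so the same rule covers every term of the model.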
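Anova Table entries
• Sum of squares for the interaction (or main) effects being tested = (product of sample size and numbers of levels of the factors not included in the interaction) × (sum of squares of the estimated effects being tested)
• Degrees of freedom = df = product of (number of levels − 1) over the factors included in the interaction.
Analysis of Variance (ANOVA) Table Entries (Two factors – A and B)
$$SS_A = nb \sum_{i=1}^{a} \hat{\alpha}_i^2$$
$$SS_B = na \sum_{j=1}^{b} \hat{\beta}_j^2$$
$$SS_{AB} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij}^2$$
$$SS_{Error} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \left( y_{ijk} - \bar{y}_{ij\bullet} \right)^2$$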
The ANOVA Table
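Analysis of Variance (ANOVA) Table Entries (Three factors – A, B and C)
$$SS_A = nbc \sum_{i=1}^{a} \hat{\alpha}_i^2, \qquad SS_B = nac \sum_{j=1}^{b} \hat{\beta}_j^2, \qquad SS_C = nab \sum_{k=1}^{c} \hat{\gamma}_k^2$$
$$SS_{AB} = nc \sum_{i=1}^{a} \sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij}^2, \qquad SS_{AC} = nb \sum_{i=1}^{a} \sum_{k=1}^{c} \widehat{(\alpha\gamma)}_{ik}^2, \qquad SS_{BC} = na \sum_{j=1}^{b} \sum_{k=1}^{c} \widehat{(\beta\gamma)}_{jk}^2$$
$$SS_{ABC} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \widehat{(\alpha\beta\gamma)}_{ijk}^2$$
$$SS_{Error} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} \left( y_{ijkl} - \bar{y}_{ijk\bullet} \right)^2$$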
The ANOVA Table
Source SS df
A SSA a-1
B SSB b-1
C SSC c-1
AB SSAB (a-1)(b-1)
AC SSAC (a-1)(c-1)
BC SSBC (b-1)(c-1)
ABC SSABC (a-1)(b-1)(c-1)
Error SSError abc(n-1)
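The degrees-of-freedom rule generalises to any number of crossed factors. The following Python sketch builds the df column for a balanced design; the function name and the example sizes (a = 2, b = 3, c = 4, n = 2) are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import prod


def anova_df(levels, n):
    """Degrees of freedom for every term of a balanced crossed factorial.

    `levels` maps factor name -> number of levels; `n` is the number of
    observations per treatment combination.
    """
    names = list(levels)
    df = {}
    for order in range(1, len(names) + 1):
        for term in combinations(names, order):
            # df = product of (number of levels - 1) over factors in the term
            df["".join(term)] = prod(levels[f] - 1 for f in term)
    # Error df = (number of treatment combinations) * (n - 1)
    df["Error"] = prod(levels.values()) * (n - 1)
    return df


df = anova_df({"A": 2, "B": 3, "C": 4}, n=2)
```

A useful check is that all entries add up to N − 1, the total degrees of freedom.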
• The Completely Randomized Design is called balanced if the number of observations per treatment combination is the same (n) for every treatment combination.
• If the number of observations per treatment combination is unequal the design is called unbalanced (resulting in a mathematically more complex analysis and computations).
• If for some of the treatment combinations there are no observations the design is called incomplete (some of the parameters, main effects and interactions, cannot be estimated).
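Example: Diet example
Mean
$$\bar{y}_{\bullet\bullet\bullet} = 87.867$$
Main Effects for Factor A (Source of Protein): $\hat{\alpha}_i = \bar{y}_{i\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet}$
Beef     Cereal   Pork
1.733    -2.967   1.233
Main Effects for Factor B (Level of Protein): $\hat{\beta}_j = \bar{y}_{\bullet j\bullet} - \bar{y}_{\bullet\bullet\bullet}$
High     Low
7.267    -7.267
AB Interaction Effects: $\widehat{(\alpha\beta)}_{ij} = \bar{y}_{ij\bullet} - \bar{y}_{i\bullet\bullet} - \bar{y}_{\bullet j\bullet} + \bar{y}_{\bullet\bullet\bullet}$
                       Source of Protein
Level of Protein    Beef     Cereal   Pork
High                3.133    -6.267   3.133
Low                 -3.133   6.267    -3.133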
Example 2
Paint Luster Experiment
Table: Means and Cell Frequencies
Means and Frequencies for the AB Interaction (Temp - Drying)
[Figure: Profiles showing Temp-Dry Interaction. Luster against temperature (92, 100), with profiles for Regular Dry, Special Dry and Overall]
Means and Frequencies for the AD Interaction (Temp- Thickness)
[Figure: Profiles showing Temp-Thickness Interaction. Luster against temperature (92, 100), with profiles for 1-mil, 2-mil and Overall]
The Main Effect of C (Length)
[Figure: Profile of Effect of Length on Luster. Luster against length of wash]
Objectives
•Determine which factors have some effect on the response
•Which groups of factors interact
Anova table for the 3 factor Experiment
Source SS df MS F p -value
A SSA a - 1 MSA MSA/MSError
B SSB b - 1 MSB MSB/MSError
C SSC c - 1 MSC MSC/MSError
AB SSAB (a - 1)(b - 1) MSAB MSAB/MSError
AC SSAC (a - 1)(c - 1) MSAC MSAC/MSError
BC SSBC (b - 1)(c - 1) MSBC MSBC/MSError
ABC SSABC (a - 1)(b - 1)(c - 1) MSABC MSABC/MSError
Error SSError abc(n - 1) MSError
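Sum of squares entries
$$SS_A = nbc \sum_{i=1}^{a} \hat{\alpha}_i^2 = nbc \sum_{i=1}^{a} \left( \bar{y}_{i\bullet\bullet\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet} \right)^2$$
Similar expressions for SSB and SSC.
$$SS_{AB} = nc \sum_{i=1}^{a} \sum_{j=1}^{b} \widehat{(\alpha\beta)}_{ij}^2 = nc \sum_{i=1}^{a} \sum_{j=1}^{b} \left( \bar{y}_{ij\bullet\bullet} - \bar{y}_{i\bullet\bullet\bullet} - \bar{y}_{\bullet j\bullet\bullet} + \bar{y}_{\bullet\bullet\bullet\bullet} \right)^2$$
Similar expressions for SSBC and SSAC.
Finally
$$SS_{ABC} = n \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \widehat{(\alpha\beta\gamma)}_{ijk}^2 = n \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \left( \bar{y}_{ijk\bullet} - \bar{y}_{ij\bullet\bullet} - \bar{y}_{i\bullet k\bullet} - \bar{y}_{\bullet jk\bullet} + \bar{y}_{i\bullet\bullet\bullet} + \bar{y}_{\bullet j\bullet\bullet} + \bar{y}_{\bullet\bullet k\bullet} - \bar{y}_{\bullet\bullet\bullet\bullet} \right)^2$$
$$SS_{Error} = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} \sum_{l=1}^{n} \left( y_{ijkl} - \bar{y}_{ijk\bullet} \right)^2$$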
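The statistical model for the 3 factor Experiment
$$y_{ijkl} = \underbrace{\mu}_{\text{mean effect}} + \underbrace{\alpha_i + \beta_j + \gamma_k}_{\text{main effects}} + \underbrace{(\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}}_{\text{2 factor interactions}} + \underbrace{(\alpha\beta\gamma)_{ijk}}_{\text{3 factor interaction}} + \underbrace{\varepsilon_{ijkl}}_{\text{random error}}$$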
Examples
Using SPSS
Example
In this example we are examining the effect of
• the level of protein A (High or Low) and
• the source of protein B (Beef, Cereal, or Pork)
on weight gains (grams) in rats.
We have n = 10 test animals randomly assigned to each of k = 6 diets.
The k = 6 diets are the 6 = 3×2 Level-Source combinations
1. High - Beef
2. High - Cereal
3. High - Pork
4. Low - Beef
5. Low - Cereal
6. Low - Pork
The data as it appears in SPSS
To perform ANOVA select Analyze->General Linear Model-> Univariate
The following dialog box appears
Select the dependent variable and the fixed factors
Press OK to perform the Analysis
The Output
Tests of Between-Subjects Effects
Dependent Variable: WTGN
Source             Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model    4612.933(a)               5    922.587       4.300      .002
Intercept          463233.1                  1    463233.1      2159.036   .000
SOURCE             266.533                   2    133.267       .621       .541
LEVEL              3168.267                  1    3168.267      14.767     .000
SOURCE * LEVEL     1178.133                  2    589.067       2.746      .073
Error              11586.000                 54   214.556
Total              479432.0                  60
Corrected Total    16198.933                 59
a. R Squared = .285 (Adjusted R Squared = .219)
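The remaining columns of the output follow from the sums of squares and degrees of freedom. The Python sketch below recomputes the mean squares, F ratios and R-squared values from the printed numbers; it illustrates the arithmetic only (p-values are not recomputed) and the variable names are not SPSS identifiers.

```python
# (sum of squares, df) for each line of the output above.
ss_df = {
    "SOURCE": (266.533, 2),
    "LEVEL": (3168.267, 1),
    "SOURCE * LEVEL": (1178.133, 2),
    "Error": (11586.000, 54),
}
ss_model, ss_corrected_total, n_obs = 4612.933, 16198.933, 60

# Mean square = SS / df; F = MS(term) / MS(Error) for fixed effects.
ms = {term: ss / df for term, (ss, df) in ss_df.items()}
f_ratio = {term: ms[term] / ms["Error"] for term in ms if term != "Error"}

# R squared and adjusted R squared as reported in footnote a.
r_squared = ss_model / ss_corrected_total
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / ss_df["Error"][1]
```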
Example – Four factor experiment
Four factors are studied for their effect on Y (luster of paint film). The four factors are:
1) Film Thickness (1 or 2 mils)
2) Drying conditions (Regular or Special)
3) Length of wash (20, 30, 40 or 60 Minutes), and
4) Temperature of wash (92 ˚C or 100 ˚C)
Two observations of film luster (Y) are taken for each treatment combination.
The Data as it appears in SPSS
The dialog box for performing ANOVA
The output
Tests of Between-Subjects Effects
Dependent Variable: LUSTRE
Source                          Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model                 6548.020(a)               31   211.226       76.814     .000
Intercept                       12586.035                 1    12586.035     4577.000   .000
TEMP                            5039.225                  1    5039.225      1832.550   .000
COND                            5.700                     1    5.700         2.073      .160
LENGTH                          70.285                    3    23.428        8.520      .000
THICK                           844.629                   1    844.629       307.155    .000
TEMP * COND                     15.504                    1    15.504        5.638      .024
TEMP * LENGTH                   3.155                     3    1.052         .383       .766
COND * LENGTH                   9.890                     3    3.297         1.199      .326
TEMP * COND * LENGTH            6.422                     3    2.141         .778       .515
TEMP * THICK                    511.325                   1    511.325       185.947    .000
COND * THICK                    1.410                     1    1.410         .513       .479
TEMP * COND * THICK             .150                      1    .150          .055       .817
LENGTH * THICK                  15.642                    3    5.214         1.896      .150
TEMP * LENGTH * THICK           11.520                    3    3.840         1.396      .262
COND * LENGTH * THICK           7.320                     3    2.440         .887       .458
TEMP * COND * LENGTH * THICK    5.840                     3    1.947         .708       .554
Error                           87.995                    32   2.750
Total                           19222.050                 64
Corrected Total                 6636.015                  63
a. R Squared = .987 (Adjusted R Squared = .974)
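The testing procedure for factorial experiments (highest order interactions first, skipping terms contained in an interaction already found) can be applied to this output with a short Python sketch. The significance level of 0.05 and the function name are assumptions of this sketch; the slides do not fix a level.

```python
# p-values ("Sig.") from the output above, keyed by the factors in each term.
p_values = {
    ("TEMP",): .000, ("COND",): .160, ("LENGTH",): .000, ("THICK",): .000,
    ("TEMP", "COND"): .024, ("TEMP", "LENGTH"): .766,
    ("COND", "LENGTH"): .326, ("TEMP", "THICK"): .000,
    ("COND", "THICK"): .479, ("LENGTH", "THICK"): .150,
    ("TEMP", "COND", "LENGTH"): .515, ("TEMP", "COND", "THICK"): .817,
    ("TEMP", "LENGTH", "THICK"): .262, ("COND", "LENGTH", "THICK"): .458,
    ("TEMP", "COND", "LENGTH", "THICK"): .554,
}


def hierarchical_tests(p_values, alpha=0.05):
    """Test highest-order terms first; skip terms inside a significant one."""
    significant = []
    for term in sorted(p_values, key=len, reverse=True):
        # A term whose factors all lie in a significant interaction
        # does not need to be tested.
        covered = any(set(term) <= set(found) for found in significant)
        if not covered and p_values[term] < alpha:
            significant.append(term)
    return significant


found = hierarchical_tests(p_values)
```

Under these assumptions the procedure stops at the Temp-Drying and Temp-Thickness interactions plus the main effect of Length, which agrees with the profile plots shown for this example.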
Random Effects and Fixed Effects Factors
• So far the factors that we have considered are fixed effects factors
• This is the case if the levels of the factor are a fixed set of levels and the conclusions of any analysis relate only to these levels.
• If the levels have been selected at random from a population of levels the factor is called a random effects factor
• The conclusions of the analysis will be directed at the population of levels and not only the levels selected for the experiment
Example - Fixed Effects
Source of Protein, Level of Protein, Weight Gain
Dependent
 – Weight Gain
Independent
 – Source of Protein
  • Beef
  • Cereal
  • Pork
 – Level of Protein
  • High
  • Low
Example - Random Effects
In this example a taxi company is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by driver. The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, during which mileage is measured.
Dependent
 – Mileage
Independent
 – Tire brand (A, B, C)
  • Fixed Effect Factor
 – Driver (1, 2, 3, 4)
  • Random Effects factor
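The Model for the fixed effects experiment
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$, $\beta_1$, $\beta_2$, $(\alpha\beta)_{11}$, $(\alpha\beta)_{21}$, $(\alpha\beta)_{31}$, $(\alpha\beta)_{12}$, $(\alpha\beta)_{22}$, $(\alpha\beta)_{32}$ are fixed unknown constants
and $\varepsilon_{ijk}$ is random, normally distributed with mean 0 and variance $\sigma^2$.
Note:
$$\sum_{i=1}^{a} \alpha_i = \sum_{j=1}^{b} \beta_j = \sum_{i=1}^{a} (\alpha\beta)_{ij} = \sum_{j=1}^{b} (\alpha\beta)_{ij} = 0$$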
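The Model for the case when factor B is a random effects factor
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
where $\mu$, $\alpha_1$, $\alpha_2$, $\alpha_3$ are fixed unknown constants,
$\varepsilon_{ijk}$ is random, normally distributed with mean 0 and variance $\sigma^2$,
$\beta_j$ is normal with mean 0 and variance $\sigma_B^2$, and
$(\alpha\beta)_{ij}$ is normal with mean 0 and variance $\sigma_{AB}^2$.
Note:
$$\sum_{i=1}^{a} \alpha_i = 0$$
This model is called a variance components model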
The Anova table for the two factor model
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Source   SS        df               MS
A        SSA       a – 1            SSA/(a – 1)
B        SSB       b – 1            SSB/(b – 1)
AB       SSAB      (a – 1)(b – 1)   SSAB/[(a – 1)(b – 1)]
Error    SSError   ab(n – 1)        SSError/[ab(n – 1)]
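The Anova table for the two factor model (A, B – fixed)
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Source   SS        df               MS        EMS                                                                        F
A        SSA       a – 1            MSA       $\sigma^2 + \frac{nb}{a-1} \sum_{i=1}^{a} \alpha_i^2$                          MSA/MSError
B        SSB       b – 1            MSB       $\sigma^2 + \frac{na}{b-1} \sum_{j=1}^{b} \beta_j^2$                           MSB/MSError
AB       SSAB      (a – 1)(b – 1)   MSAB      $\sigma^2 + \frac{n}{(a-1)(b-1)} \sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha\beta)_{ij}^2$   MSAB/MSError
Error    SSError   ab(n – 1)        MSError   $\sigma^2$
EMS = Expected Mean Square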
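The Anova table for the two factor model (A – fixed, B – random)
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Source   SS        df               MS        EMS                                                                   F
A        SSA       a – 1            MSA       $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1} \sum_{i=1}^{a} \alpha_i^2$   MSA/MSAB
B        SSB       b – 1            MSB       $\sigma^2 + na\sigma_B^2$                                             MSB/MSError
AB       SSAB      (a – 1)(b – 1)   MSAB      $\sigma^2 + n\sigma_{AB}^2$                                           MSAB/MSError
Error    SSError   ab(n – 1)        MSError   $\sigma^2$
Note: The divisor for testing the main effects of A is no longer MSError but MSAB.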
Rules for determining Expected Mean Squares (EMS) in an Anova Table
Both fixed and random effects
Formulated by Schultz[1]
1. The EMS for Error is $\sigma^2$.
2. The EMS for each ANOVA term contains two or more terms, the first of which is $\sigma^2$.
3. All other terms in each EMS contain both coefficients and subscripts (the total number of letters being one more than the number of factors; if the number of factors is k = 3, then the number of letters is 4).
4. The subscript of $\sigma^2$ in the last term of each EMS is the same as the treatment designation.
5. The subscripts of all $\sigma^2$ other than the first contain the treatment designation. These are written with the combination involving the most letters written first and ending with the treatment designation.
6. When a capital letter is omitted from a subscript, the corresponding small letter appears in the coefficient.
7. For each EMS in the table ignore the letter or letters that designate the effect. If any of the remaining letters designate a fixed effect, delete that term from the EMS.
8. Replace $\sigma^2$ whose subscripts are composed entirely of fixed effects by the appropriate sum, for example:
$$\sigma_A^2 \;\text{ by }\; \frac{1}{a-1} \sum_{i=1}^{a} \alpha_i^2$$
$$\sigma_{AB}^2 \;\text{ by }\; \frac{1}{(a-1)(b-1)} \sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha\beta)_{ij}^2$$
[1] Schultz, E. F., Jr. “Rules of Thumb for Determining Expectations of Mean Squares in Analysis of Variance,” Biometrics, Vol. 11, 1955, 123-48.
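Rules 5 and 7 lend themselves to a small Python sketch that lists which variance components appear in each EMS of a balanced crossed design, and from that which mean square serves as the F denominator. This is an illustration of the rules as stated above, not a general-purpose EMS calculator; the function names are hypothetical and coefficients (rule 6) are not produced.

```python
from itertools import combinations


def ems_components(factors, random_factors):
    """Variance components in each EMS for a balanced crossed factorial.

    The EMS of term T contains the component of every term S that includes T
    (rule 5), unless some factor in S but not in T is fixed (rule 7).
    Returns {term: [components, highest order first]}. The error variance
    sigma^2 is part of every EMS and is not listed.
    """
    terms = [frozenset(c) for r in range(len(factors), 0, -1)
             for c in combinations(factors, r)]

    def label(term):
        return "".join(f for f in factors if f in term)

    return {
        label(t): [label(s) for s in terms
                   if t <= s and all(f in random_factors for f in s - t)]
        for t in terms
    }


def f_denominator(ems, term):
    """Term whose EMS equals EMS(term) without term's own component."""
    target = [c for c in ems[term] if c != term]
    if not target:
        return "Error"
    for other, comps in ems.items():
        if comps == target:
            return other
    return None  # no single mean square gives an exact F test


# Three factors with A fixed, B and C random.
ems = ems_components("ABC", random_factors={"B", "C"})
```

For example, `ems["B"]` lists the components BC and B, so B is tested against the BC mean square, while `f_denominator(ems, "A")` returns `None` because no single mean square matches.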
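Example: 3 factors A, B, C – all are random effects
Source   EMS                                                                                   F
A        $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + nbc\sigma_A^2$      (no exact F test)
B        $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + na\sigma_{BC}^2 + nac\sigma_B^2$      (no exact F test)
C        $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2 + nb\sigma_{AC}^2 + nab\sigma_C^2$      (no exact F test)
AB       $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$                                        $MS_{AB}/MS_{ABC}$
AC       $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$                                        $MS_{AC}/MS_{ABC}$
BC       $\sigma^2 + n\sigma_{ABC}^2 + na\sigma_{BC}^2$                                        $MS_{BC}/MS_{ABC}$
ABC      $\sigma^2 + n\sigma_{ABC}^2$                                                          $MS_{ABC}/MS_{Error}$
Error    $\sigma^2$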
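Example: 3 factors A fixed, B, C random
Source   EMS                                                                                                                F
A        $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2 + nb\sigma_{AC}^2 + \frac{nbc}{a-1} \sum_{i=1}^{a} \alpha_i^2$      (no exact F test)
B        $\sigma^2 + na\sigma_{BC}^2 + nac\sigma_B^2$                                                                       $MS_B/MS_{BC}$
C        $\sigma^2 + na\sigma_{BC}^2 + nab\sigma_C^2$                                                                       $MS_C/MS_{BC}$
AB       $\sigma^2 + n\sigma_{ABC}^2 + nc\sigma_{AB}^2$                                                                     $MS_{AB}/MS_{ABC}$
AC       $\sigma^2 + n\sigma_{ABC}^2 + nb\sigma_{AC}^2$                                                                     $MS_{AC}/MS_{ABC}$
BC       $\sigma^2 + na\sigma_{BC}^2$                                                                                       $MS_{BC}/MS_{Error}$
ABC      $\sigma^2 + n\sigma_{ABC}^2$                                                                                       $MS_{ABC}/MS_{Error}$
Error    $\sigma^2$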
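Example: 3 factors A, B fixed, C random
Source   EMS                                                                                                         F
A        $\sigma^2 + nb\sigma_{AC}^2 + \frac{nbc}{a-1} \sum_{i=1}^{a} \alpha_i^2$                                    $MS_A/MS_{AC}$
B        $\sigma^2 + na\sigma_{BC}^2 + \frac{nac}{b-1} \sum_{j=1}^{b} \beta_j^2$                                     $MS_B/MS_{BC}$
C        $\sigma^2 + nab\sigma_C^2$                                                                                  $MS_C/MS_{Error}$
AB       $\sigma^2 + n\sigma_{ABC}^2 + \frac{nc}{(a-1)(b-1)} \sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha\beta)_{ij}^2$     $MS_{AB}/MS_{ABC}$
AC       $\sigma^2 + nb\sigma_{AC}^2$                                                                                $MS_{AC}/MS_{Error}$
BC       $\sigma^2 + na\sigma_{BC}^2$                                                                                $MS_{BC}/MS_{Error}$
ABC      $\sigma^2 + n\sigma_{ABC}^2$                                                                                $MS_{ABC}/MS_{Error}$
Error    $\sigma^2$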
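Example: 3 factors A, B and C fixed
Source   EMS                                                                                                                       F
A        $\sigma^2 + \frac{nbc}{a-1} \sum_{i=1}^{a} \alpha_i^2$                                                                    $MS_A/MS_{Error}$
B        $\sigma^2 + \frac{nac}{b-1} \sum_{j=1}^{b} \beta_j^2$                                                                     $MS_B/MS_{Error}$
C        $\sigma^2 + \frac{nab}{c-1} \sum_{k=1}^{c} \gamma_k^2$                                                                    $MS_C/MS_{Error}$
AB       $\sigma^2 + \frac{nc}{(a-1)(b-1)} \sum_{i=1}^{a} \sum_{j=1}^{b} (\alpha\beta)_{ij}^2$                                     $MS_{AB}/MS_{Error}$
AC       $\sigma^2 + \frac{nb}{(a-1)(c-1)} \sum_{i=1}^{a} \sum_{k=1}^{c} (\alpha\gamma)_{ik}^2$                                    $MS_{AC}/MS_{Error}$
BC       $\sigma^2 + \frac{na}{(b-1)(c-1)} \sum_{j=1}^{b} \sum_{k=1}^{c} (\beta\gamma)_{jk}^2$                                     $MS_{BC}/MS_{Error}$
ABC      $\sigma^2 + \frac{n}{(a-1)(b-1)(c-1)} \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{c} (\alpha\beta\gamma)_{ijk}^2$           $MS_{ABC}/MS_{Error}$
Error    $\sigma^2$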
Example - Random Effects
In this example a taxi company is interested in comparing the effects of three brands of tires (A, B and C) on mileage (mpg). Mileage will also be affected by driver. The company selects b = 4 drivers at random from its collection of drivers. Each driver has n = 3 opportunities to use each brand of tire, during which mileage is measured.
Dependent
 – Mileage
Independent
 – Tire brand (A, B, C)
  • Fixed Effect Factor
 – Driver (1, 2, 3, 4)
  • Random Effects factor
The Data
Driver  Tire  Mileage     Driver  Tire  Mileage
1       A     39.6        3       A     33.9
1       A     38.6        3       A     43.2
1       A     41.9        3       A     41.3
1       B     18.1        3       B     17.8
1       B     20.4        3       B     21.3
1       B     19          3       B     22.3
1       C     31.1        3       C     31.3
1       C     29.8        3       C     28.7
1       C     26.6        3       C     29.7
2       A     38.1        4       A     36.9
2       A     35.4        4       A     30.3
2       A     38.8        4       A     35
2       B     18.2        4       B     17.8
2       B     14          4       B     21.2
2       B     15.6        4       B     24.3
2       C     30.2        4       C     27.4
2       C     27.9        4       C     26.6
2       C     27.2        4       C     21
Asking SPSS to perform Univariate ANOVA
Select the dependent variable, fixed factors, random factors
The Output
Tests of Between-Subjects Effects
Dependent Variable: MILEAGE
Source                       Type III Sum of Squares   df   Mean Square   F          Sig.
Intercept       Hypothesis   28928.340                 1    28928.340     1270.836   .000
                Error        68.290                    3    22.763(a)
TIRE            Hypothesis   2072.931                  2    1036.465      71.374     .000
                Error        87.129                    6    14.522(b)
DRIVER          Hypothesis   68.290                    3    22.763        1.568      .292
                Error        87.129                    6    14.522(b)
TIRE * DRIVER   Hypothesis   87.129                    6    14.522        2.039      .099
                Error        170.940                   24   7.123(c)
a. MS(DRIVER)
b. MS(TIRE * DRIVER)
c. MS(Error)
The divisor for both the fixed and the random main effect is MSAB
This is contrary to the advice of some texts
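The sums of squares and F ratios in this output can be reproduced from the taxi data with plain Python. The sketch below uses the balanced two-factor formulas with a = 3 tire brands, b = 4 drivers and n = 3, and forms the F ratios with the divisors shown in the SPSS output; variable names are illustrative only.

```python
# Mileage by (driver, tire), transcribed from the data table.
mileage = {
    (1, "A"): [39.6, 38.6, 41.9], (1, "B"): [18.1, 20.4, 19.0], (1, "C"): [31.1, 29.8, 26.6],
    (2, "A"): [38.1, 35.4, 38.8], (2, "B"): [18.2, 14.0, 15.6], (2, "C"): [30.2, 27.9, 27.2],
    (3, "A"): [33.9, 43.2, 41.3], (3, "B"): [17.8, 21.3, 22.3], (3, "C"): [31.3, 28.7, 29.7],
    (4, "A"): [36.9, 30.3, 35.0], (4, "B"): [17.8, 21.2, 24.3], (4, "C"): [27.4, 26.6, 21.0],
}
drivers, tires, n = [1, 2, 3, 4], ["A", "B", "C"], 3
a, b = len(tires), len(drivers)


def avg(values):
    values = list(values)
    return sum(values) / len(values)


cell = {key: avg(ys) for key, ys in mileage.items()}
grand = avg(cell.values())
tire_mean = {t: avg(cell[d, t] for d in drivers) for t in tires}
driver_mean = {d: avg(cell[d, t] for t in tires) for d in drivers}

# Sums of squares: coefficient times the sum of squared estimated effects.
ss_tire = n * b * sum((tire_mean[t] - grand) ** 2 for t in tires)
ss_driver = n * a * sum((driver_mean[d] - grand) ** 2 for d in drivers)
ss_inter = n * sum((cell[d, t] - tire_mean[t] - driver_mean[d] + grand) ** 2
                   for d in drivers for t in tires)
ss_error = sum((y - cell[key]) ** 2 for key, ys in mileage.items() for y in ys)

ms_tire = ss_tire / (a - 1)
ms_driver = ss_driver / (b - 1)
ms_inter = ss_inter / ((a - 1) * (b - 1))
ms_error = ss_error / (a * b * (n - 1))

f_tire = ms_tire / ms_inter      # fixed factor tested against MS(TIRE * DRIVER)
f_driver = ms_driver / ms_inter  # the SPSS output also divides by MS(TIRE * DRIVER)
f_inter = ms_inter / ms_error    # interaction tested against MS(Error)
```

Replacing `ms_inter` by `ms_error` in `f_driver` gives the alternative test for the random factor that some texts recommend.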
The Anova table for the two factor model (A – fixed, B – random)
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Source   SS        df               MS        EMS                                                                   F
A        SSA       a – 1            MSA       $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1} \sum_{i=1}^{a} \alpha_i^2$   MSA/MSAB
B        SSB       b – 1            MSB       $\sigma^2 + na\sigma_B^2$                                             MSB/MSError
AB       SSAB      (a – 1)(b – 1)   MSAB      $\sigma^2 + n\sigma_{AB}^2$                                           MSAB/MSError
Error    SSError   ab(n – 1)        MSError   $\sigma^2$
Note: The divisor for testing the main effects of A is no longer MSError but MSAB.
References: Guenther, W. C. “Analysis of Variance” Prentice Hall, 1964
The Anova table for the two factor model (A – fixed, B – random)
$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$$
Source   SS        df               MS        EMS                                                                   F
A        SSA       a – 1            MSA       $\sigma^2 + n\sigma_{AB}^2 + \frac{nb}{a-1} \sum_{i=1}^{a} \alpha_i^2$   MSA/MSAB
B        SSB       b – 1            MSB       $\sigma^2 + n\sigma_{AB}^2 + na\sigma_B^2$                            MSB/MSAB
AB       SSAB      (a – 1)(b – 1)   MSAB      $\sigma^2 + n\sigma_{AB}^2$                                           MSAB/MSError
Error    SSError   ab(n – 1)        MSError   $\sigma^2$
Note: In this case the divisor for testing the main effects of A and of B is MSAB. This is the approach used by SPSS.
References: Searle “Linear Models” John Wiley, 1964
Crossed and Nested Factors
The factors A, B are called crossed if every level of A appears with every level of B in the treatment combinations.
[Figure: levels of B against levels of A for crossed factors]
Factor B is said to be nested within factor A if the levels of B differ for each level of A.
[Figure: levels of B within levels of A for nested factors]
Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within plant.
[Figure: machines grouped within plants]
Machines (B) are nested within plants (A)
The model for a two factor experiment with B nested within A.
$$y_{ijk} = \underbrace{\mu}_{\text{overall mean}} + \underbrace{\alpha_i}_{\text{effect of factor } A} + \underbrace{\beta_{j(i)}}_{\text{effect of } B \text{ within } A} + \underbrace{\varepsilon_{ijk}}_{\text{random error}}$$
The ANOVA table
Source SS df MS F p - value
A SSA a - 1 MSA MSA/MSError
B(A) SSB(A) a(b – 1) MSB(A) MSB(A) /MSError
Error SSError ab(n – 1) MSError
Note: SSB(A ) = SSB + SSAB and a(b – 1) = (b – 1) + (a - 1)(b – 1)
Example: A company has a = 4 plants for producing paper. Each plant has 6 machines for producing the paper. The company is interested in how paper strength (Y) differs from plant to plant and from machine to machine within plant.
Also we have n = 3 measurements of paper strength for each of the 24 machines.
The Data
Plant      1                                           2
machine    1      2     3     4      5     6           7     8     9     10    11    12
           98.7   59.2  84.1  72.3   83.5  60.6        33.6  44.8  58.9  63.9  63.7  48.1
           93.1   87.8  86.3  110.3  89.3  84.8        48.2  57.3  51.6  62.3  54.6  50.6
           100.0  84.1  83.4  81.6   86.1  83.6        68.9  66.5  45.2  61.1  55.3  39.9
Plant      3                                           4
machine    13     14    15    16     17    18          19    20    21    22    23    24
           83.6   76.1  64.2  69.2   77.4  61.0        64.2  35.5  46.9  37.0  43.8  30.0
           84.6   55.4  58.4  86.7   63.3  81.3        50.3  30.8  43.1  47.8  62.4  43.0
           90.6   92.3  75.4  60.8   76.6  73.8        32.1  36.3  40.8  41.0  60.8  56.9
Anova Table Treating Factors (Plant, Machine) as crossed
Tests of Between-Subjects Effects
Dependent Variable: STRENGTH
Source             Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model    21031.065(a)              23   914.394       7.972      .000
Intercept          298531.4                  1    298531.4      2602.776   .000
PLANT              18174.761                 3    6058.254      52.820     .000
MACHINE            1238.379                  5    247.676       2.159      .074
PLANT * MACHINE    1617.925                  15   107.862       .940       .528
Error              5505.469                  48   114.697
Total              325067.9                  72
Corrected Total    26536.534                 71
a. R Squared = .793 (Adjusted R Squared = .693)
Anova Table: Two factor experiment B (machine) nested in A (plant)
Source           Sum of Squares   df   Mean Square   F           p-value
Plant            18174.76119      3    6058.253731   52.819506   0.00000
Machine(Plant)   2856.303672      20   142.8151836   1.2451488   0.26171
Error            5505.469467      48   114.6972806
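The nested line of this table can be obtained from the crossed analysis by pooling, as the note SSB(A) = SSB + SSAB indicates. A minimal Python sketch of that arithmetic, using the values printed in the crossed SPSS table (variable names are illustrative):

```python
# Entries from the crossed analysis: (sum of squares, degrees of freedom).
ss_machine, df_machine = 1238.379, 5
ss_plant_machine, df_plant_machine = 1617.925, 15
ss_error, df_error = 5505.469, 48

# Nested term: SS_B(A) = SS_B + SS_AB, with a(b - 1) = (b - 1) + (a - 1)(b - 1) df.
ss_nested = ss_machine + ss_plant_machine
df_nested = df_machine + df_plant_machine

ms_nested = ss_nested / df_nested
f_nested = ms_nested / (ss_error / df_error)
```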
The Analysis of Covariance
ANACOVA
Multiple Regression
1. Dependent variable Y (continuous)
2. Continuous independent variables X1, X2, …, Xp
The continuous independent variables X1, X2, …, Xp are quite often measured and observed (not set at specific values or levels)
Analysis of Variance
1. Dependent variable Y (continuous)
2. Categorical independent variables (Factors) A, B, C,…
The categorical independent variables A, B, C,… are set at specific values or levels.
Analysis of Covariance
1. Dependent variable Y (continuous)
2. Categorical independent variables (Factors) A, B, C,…
3. Continuous independent variables (covariates) X1, X2, …, Xp
Example
1. Dependent variable Y – weight gain
2. Categorical independent variables (Factors) i. A = level of protein in the diet (High, Low)
ii. B = source of protein (Beef, Cereal, Pork)
3. Continuous independent variables (covariates)
i. X1= initial wt. of animal.
                         Independent variables
Statistical Technique    continuous    categorical
Multiple Regression      ×
ANOVA                                  ×
ANACOVA                  ×             ×
In all three techniques the dependent variable is continuous.
It is possible to treat categorical independent variables in Multiple Regression using Dummy variables.
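A minimal Python sketch of dummy (indicator) coding follows. The choice of reference level and the function name are assumptions of this sketch; statistical packages offer several coding schemes.

```python
def dummy_code(values, reference=None):
    """Code a categorical variable as 0/1 indicator columns.

    One level (the reference, by default the first in sorted order) gets no
    column, so a factor with c levels yields c - 1 dummy variables.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = [lv for lv in levels if lv != reference]
    rows = [[1 if v == lv else 0 for lv in columns] for v in values]
    return columns, rows


# Source of protein for four animals; Beef is the reference level here.
columns, rows = dummy_code(["Beef", "Cereal", "Pork", "Beef"])
```

The resulting columns can be used as ordinary continuous predictors in a multiple regression.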
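The Multiple Regression Model
$$Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \varepsilon$$
The ANOVA Model
$$Y = \mu + \underbrace{\alpha_i + \beta_j + \cdots}_{\text{Main Effects}} + \underbrace{(\alpha\beta)_{ij} + \cdots}_{\text{Interactions}} + \varepsilon$$
The ANACOVA Model
$$Y = \mu + \underbrace{\alpha_i + \beta_j + \cdots}_{\text{Main Effects}} + \underbrace{(\alpha\beta)_{ij} + \cdots}_{\text{Interactions}} + \underbrace{\gamma_1 X_1 + \cdots + \gamma_p X_p}_{\text{Covariate Effects}} + \varepsilon$$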
ANOVA Tables
The Multiple Regression Model
Source        S.S.       d.f.
Regression    SSReg      p
Error         SSError    n – p – 1
Total         SSTotal    n – 1
The ANOVA Model
Source          S.S.       d.f.
Main Effects
 A              SSA        a – 1
 B              SSB        b – 1
Interactions
 AB             SSAB       (a – 1)(b – 1)
 ⁞
Error           SSError    n – 1 – (sum of the d.f. above)
Total           SSTotal    n – 1
The ANACOVA Model
Source          S.S.            d.f.
Covariates      SSCovariates    p
Main Effects
 A              SSA             a – 1
 B              SSB             b – 1
Interactions
 AB             SSAB            (a – 1)(b – 1)
 ⁞
Error           SSError         n – 1 – (sum of the d.f. above)
Total           SSTotal         n – 1
Here n is the total number of observations and p the number of covariates (or regressors).
The data
wtgn  initial wt  Level  Source      wtgn  initial wt  Level  Source
112   1031        High   Beef        56    1044        Low    Beef
126   1087        High   Beef        86    1025        Low    Beef
88    890         High   Beef        78    878         Low    Beef
97    1089        High   Beef        69    1193        Low    Beef
91    894         High   Beef        76    1024        Low    Beef
78    917         High   Beef        65    1078        Low    Beef
86    972         High   Beef        60    965         Low    Beef
83    899         High   Beef        80    958         Low    Beef
108   821         High   Beef        78    1135        Low    Beef
104   846         High   Beef        41    847         Low    Beef
42    1041        High   Cereal      68    986         Low    Cereal
93    1108        High   Cereal      67    1003        Low    Cereal
102   1132        High   Cereal      71    968         Low    Cereal
77    1023        High   Cereal      76    1035        Low    Cereal
85    1090        High   Cereal      85    1018        Low    Cereal
88    921         High   Cereal      37    882         Low    Cereal
82    909         High   Cereal      119   1053        Low    Cereal
41    1091        High   Cereal      91    978         Low    Cereal
63    838         High   Cereal      51    1057        Low    Cereal
88    935         High   Cereal      57    1035        Low    Cereal
104   1098        High   Pork        96    965         Low    Pork
114   888         High   Pork        67    1025        Low    Pork
78    1000        High   Pork        85    970         Low    Pork
111   993         High   Pork        17    836         Low    Pork
109   1043        High   Pork        67    961         Low    Pork
115   992         High   Pork        54    931         Low    Pork
47    834         High   Pork        105   1017        Low    Pork
124   1005        High   Pork        64    845         Low    Pork
80    905         High   Pork        92    1092        Low    Pork
97    1059        High   Pork        62    932         Low    Pork
The ANOVA Table
Source                 Sum of Squares   df   Mean Square   F        Sig.
Initial (Covariate)    3357.8165        1    3357.82       9.075    0.00397
LEVEL                  6523.4815        1    6523.48       17.631   0.0001
SOURCE                 2013.6469        2    1006.82       2.721    0.07499
LEVEL * SOURCE         2528.0163        2    1264.01       3.416    0.04022
Error                  19609.4835       53   369.99
Total                  31966.8500       59
Using SPSS to perform ANACOVA
The data file
Select Analyze->General Linear Model -> Univariate
Choose the Dependent Variable, the Fixed Factor(s) and the Covariates
The following ANOVA table appears
Tests of Between-Subjects Effects
Dependent Variable: WTGN
Source             Type III Sum of Squares   df   Mean Square   F        Sig.
Corrected Model    12357.366a                6    2059.561      5.567    .000
Intercept          24.883                    1    24.883        .067     .796
INITIAL            3357.816                  1    3357.816      9.075    .004
LEVEL              6523.482                  1    6523.482      17.631   .000
SOURCE             2013.647                  2    1006.823      2.721    .075
LEVEL * SOURCE     2528.016                  2    1264.008      3.416    .040
Error              19609.484                 53   369.990
Total              421265.0                  60
Corrected Total    31966.850                 59
a. R Squared = .387 (Adjusted R Squared = .317)
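As an arithmetic check on the SPSS output above, the following sketch recomputes the F ratios and the R Squared values from the reported sums of squares and degrees of freedom. The numbers are copied from the table; the variable names are illustrative.

```python
# Type III sums of squares and degrees of freedom copied from the SPSS table.
ss = {"INITIAL": 3357.816, "LEVEL": 6523.482, "SOURCE": 2013.647, "LEVEL * SOURCE": 2528.016}
df = {"INITIAL": 1, "LEVEL": 1, "SOURCE": 2, "LEVEL * SOURCE": 2}
ss_model, ss_error, df_error = 12357.366, 19609.484, 53
ss_total, df_total = 31966.850, 59  # corrected total

# Each F ratio is the effect mean square divided by the error mean square.
ms_error = ss_error / df_error
f_ratio = {name: (ss[name] / df[name]) / ms_error for name in ss}

# R Squared is the corrected-model SS as a fraction of the corrected total SS;
# the adjusted version compares mean squares instead of sums of squares.
r_squared = ss_model / ss_total
adj_r_squared = 1 - ms_error / (ss_total / df_total)

for name, f in f_ratio.items():
    print(f"{name:15s} F = {f:.3f}")
print(f"R squared = {r_squared:.3f}, adjusted = {adj_r_squared:.3f}")
```

Note that the four effect sums of squares do not add up to the Corrected Model sum of squares; with a covariate in the model the effects are not orthogonal.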
The Process of Analysis of Covariance
[Figure: Dependent variable (vertical axis, 40 to 140) plotted against the Covariate (horizontal axis, 700 to 1400)]
The Process of Analysis of Covariance
[Figure: Adjusted Dependent variable (vertical axis, 40 to 140) plotted against the Covariate (horizontal axis, 700 to 1400)]
• The dependent variable (Y) is adjusted for each case to the value it would have if the covariate took on its average value.
• The effects of the factors (A, B, etc.) are determined using the adjusted values of the dependent variable.
• ANOVA and ANACOVA can both be handled by a Multiple Regression package, using Dummy variables to represent the categorical independent variables.
• The results are the same as those from an ANOVA or ANACOVA procedure.
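The adjustment described above can be sketched as a regression with a dummy variable, assuming NumPy, one two-level factor and one covariate. The data are made up for illustration, and the helper name `fit` is hypothetical.

```python
import numpy as np

# Hypothetical toy data (not from the slides): one two-level factor, one covariate.
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x = np.array([1.0, 2.0, 3.0, 4.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 3.9, 5.2, 6.0, 5.8, 7.1, 8.0, 9.2])

def fit(X, y):
    """Least-squares fit; returns the coefficients and the residual sum of squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

ones = np.ones(len(y))
dummy = (group == 1).astype(float)

# ANACOVA as a regression: intercept + factor dummy + covariate.
beta_full, rss_full = fit(np.column_stack([ones, dummy, x]), y)
slope = beta_full[2]

# Adjust each response to the value it would have at the average covariate.
y_adj = y - slope * (x - x.mean())

# A one-way fit on the adjusted response leaves the same residuals as the full model.
_, rss_adj = fit(np.column_stack([ones, dummy]), y_adj)

# Factor effect adjusted for the covariate: RSS without the factor minus RSS with it.
_, rss_no_factor = fit(np.column_stack([ones, x]), y)
ss_factor = rss_no_factor - rss_full
print(rss_full, rss_adj, ss_factor)
```

The error degrees of freedom must still account for the estimated slope, so the one-way fit on the adjusted values is only a way of picturing the adjustment, not a replacement for the full model.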
Analysis of unbalanced Factorial Designs
Type I, Type II, Type III
Sum of Squares
Sum of squares for testing an effect
model_Complete ≡ model with the effect in.
model_Reduced ≡ model with the effect out.
$SS_{\text{Effect}} = RSS_{\text{model}_{\text{Reduced}}} - RSS_{\text{model}_{\text{Complete}}}$
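A minimal sketch of this difference of residual sums of squares, assuming NumPy and a made-up data set with a single two-level factor. The function names `rss` and `effect_ss` are illustrative.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from the least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def effect_ss(X_complete, X_reduced, y):
    """SS for an effect: RSS of the reduced model minus RSS of the complete model."""
    return rss(X_reduced, y) - rss(X_complete, y)

# Hypothetical toy data: a two-level factor A coded as a 0/1 dummy variable.
a = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
y = np.array([1.0, 2.0, 3.0, 5.0, 6.0, 7.0])
ones = np.ones(len(y))

X_complete = np.column_stack([ones, a])  # model with the effect in
X_reduced = np.column_stack([ones])      # model with the effect out
ss_a = effect_ss(X_complete, X_reduced, y)
print(ss_a)
```

For a single factor this difference is the usual between-groups sum of squares.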
Type I SS
• Type I sums of squares are obtained when the effects are removed from the model sequentially, in a specified order. The sum of squares for an effect is the increase in RSS when that effect is dropped from the model that precedes it in the sequence.
Example
• Consider the three factor factorial experiment with factors A, B and C.
The Complete model
• Y = μ + A + B + C + AB + AC + BC + ABC
A sequence of increasingly simpler models
1. Y = μ + A + B + C + AB + AC + BC + ABC
2. Y = μ + A + B + C + AB + AC + BC
3. Y = μ + A + B + C + AB + AC
4. Y = μ + A + B + C + AB
5. Y = μ + A + B + C
6. Y = μ + A + B
7. Y = μ + A
8. Y = μ
Type I S.S.
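$SS^{I}_{ABC} = RSS_{\text{model}_2} - RSS_{\text{model}_1}$
$SS^{I}_{BC} = RSS_{\text{model}_3} - RSS_{\text{model}_2}$
$SS^{I}_{AC} = RSS_{\text{model}_4} - RSS_{\text{model}_3}$
$SS^{I}_{AB} = RSS_{\text{model}_5} - RSS_{\text{model}_4}$
$SS^{I}_{C} = RSS_{\text{model}_6} - RSS_{\text{model}_5}$
$SS^{I}_{B} = RSS_{\text{model}_7} - RSS_{\text{model}_6}$
$SS^{I}_{A} = RSS_{\text{model}_8} - RSS_{\text{model}_7}$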
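The sequential scheme can be sketched for a smaller case. The example below assumes only two factors (A and B, two levels each) rather than three, uses made-up unbalanced data, and relies on NumPy; the helper `rss` is hypothetical.

```python
import numpy as np

def rss(columns, y):
    """RSS from regressing y on an intercept plus the given list of columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 3); not from the slides.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0, 10.0])

# Sum-to-zero coding: +1 for the first level, -1 for the second.
A = np.where(a == 0, 1.0, -1.0)
B = np.where(b == 0, 1.0, -1.0)
AB = A * B

# Sequence of increasingly simpler models, from the complete model to the mean only.
rss_a_b_ab = rss([A, B, AB], y)
rss_a_b = rss([A, B], y)
rss_a = rss([A], y)
rss_mean = rss([], y)

type1 = {
    "A * B": rss_a_b - rss_a_b_ab,
    "B": rss_a - rss_a_b,
    "A": rss_mean - rss_a,
}

# Entering B before A gives a different value for A when the design is unbalanced.
ss_a_after_b = rss([B], y) - rss_a_b
print(type1, ss_a_after_b)
```

Because the differences telescope, the Type I sums of squares always add up to the RSS of the mean-only model minus the RSS of the complete model.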
Type II SS
• Type II sums of squares are calculated for an effect assuming that the Complete model contains every effect of equal or lesser order. The Reduced model is this Complete model with the effect removed.
The Complete models
1. Y = μ + A + B + C + AB + AC + BC + ABC (the three factor model)
2. Y = μ + A + B + C + AB + AC + BC (the all two factor model)
3. Y = μ + A + B + C (the all main effects model)
The Reduced models
For a k-factor effect the reduced model is the all k-factor model with the effect removed
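$SS^{II}_{ABC} = RSS_{\text{model}_2} - RSS_{\text{model}_1}$
$SS^{II}_{AB} = RSS(Y = \mu + A + B + C + AC + BC) - RSS_{\text{model}_2}$
$SS^{II}_{AC} = RSS(Y = \mu + A + B + C + AB + BC) - RSS_{\text{model}_2}$
$SS^{II}_{BC} = RSS(Y = \mu + A + B + C + AB + AC) - RSS_{\text{model}_2}$
$SS^{II}_{A} = RSS(Y = \mu + B + C) - RSS_{\text{model}_3}$
$SS^{II}_{B} = RSS(Y = \mu + A + C) - RSS_{\text{model}_3}$
$SS^{II}_{C} = RSS(Y = \mu + A + B) - RSS_{\text{model}_3}$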
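A sketch of the Type II calculation for a two-factor design (a simplification of the three-factor case above), assuming NumPy and made-up unbalanced data. Here the complete model for a main effect is the all main effects model, and the complete model for the interaction is the two-factor model.

```python
import numpy as np

def rss(columns, y):
    """RSS from regressing y on an intercept plus the given list of columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 3); not from the slides.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0, 10.0])

# Sum-to-zero coding: +1 for the first level, -1 for the second.
A = np.where(a == 0, 1.0, -1.0)
B = np.where(b == 0, 1.0, -1.0)
AB = A * B

rss_two_factor = rss([A, B, AB], y)   # complete model for the interaction
rss_main_effects = rss([A, B], y)     # complete model for each main effect

type2 = {
    "A * B": rss_main_effects - rss_two_factor,  # drop AB from the two-factor model
    "A": rss([B], y) - rss_main_effects,         # drop A from the main effects model
    "B": rss([A], y) - rss_main_effects,         # drop B from the main effects model
}
print(type2)
```

Unlike the Type I values, these do not depend on the order in which A and B are listed.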
Type III SS
• The Type III sum of squares is calculated by comparing the full model to the full model without the effect.
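A sketch of the Type III comparison for a two-factor design, assuming NumPy and made-up unbalanced data. One assumption matters here: the factors are coded with sum-to-zero (effect) coding, so that dropping a main-effect column while keeping the interaction column still defines a meaningful reduced model. With 0/1 dummy coding the same code would test a different hypothesis.

```python
import numpy as np

def rss(columns, y):
    """RSS from regressing y on an intercept plus the given list of columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical unbalanced 2 x 2 data (cell sizes 3, 1, 2, 3); not from the slides.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
b = np.array([0, 0, 0, 1, 0, 0, 1, 1, 1])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0, 10.0])

# Sum-to-zero (effect) coding: +1 for the first level, -1 for the second.
A = np.where(a == 0, 1.0, -1.0)
B = np.where(b == 0, 1.0, -1.0)
AB = A * B

rss_full = rss([A, B, AB], y)

# Each effect is dropped from the full model while every other term stays in.
type3 = {
    "A": rss([B, AB], y) - rss_full,
    "B": rss([A, AB], y) - rss_full,
    "A * B": rss([A, B], y) - rss_full,
}
print(type3)
```

In this 2 x 2 case the Type III test for A compares the unweighted averages of the cell means at the two levels of A.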
Comments
• With Type I sums of squares the effects are tested in a specified sequence, resulting in increasingly simpler models. Each test is valid only if the null hypothesis (H0) has been accepted in the previous tests.
• With Type II sums of squares the test for a k-factor effect is valid only if the all k-factor model can be assumed.
• With Type III sums of squares the tests require neither of these assumptions.
An additional Comment
• When the completely randomized design is balanced (equal numbers of observations per treatment combination), the Type I, Type II and Type III sums of squares are equal.
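This equality can be checked numerically. The sketch below assumes NumPy, a made-up balanced 2 x 2 data set with two observations per cell, and sum-to-zero coding of the factors; it computes the three types of sum of squares for each main effect.

```python
import numpy as np

def rss(columns, y):
    """RSS from regressing y on an intercept plus the given list of columns."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical balanced 2 x 2 data, two observations per cell; not from the slides.
a = np.array([0, 0, 0, 0, 1, 1, 1, 1])
b = np.array([0, 0, 1, 1, 0, 0, 1, 1])
y = np.array([1.0, 3.0, 4.0, 6.0, 5.0, 8.0, 9.0, 13.0])

# Sum-to-zero coding: +1 for the first level, -1 for the second.
A = np.where(a == 0, 1.0, -1.0)
B = np.where(b == 0, 1.0, -1.0)
AB = A * B

rss_full = rss([A, B, AB], y)
rss_main = rss([A, B], y)

ss_a = {
    "Type I": rss([], y) - rss([A], y),      # A entered first
    "Type II": rss([B], y) - rss_main,       # A given B
    "Type III": rss([B, AB], y) - rss_full,  # A given B and AB
}
ss_b = {
    "Type I": rss([A], y) - rss_main,        # B entered after A
    "Type II": rss([A], y) - rss_main,       # B given A
    "Type III": rss([A, AB], y) - rss_full,  # B given A and AB
}
print(ss_a)
print(ss_b)
```

With equal cell sizes the coded columns for A, B and AB are mutually orthogonal, which is why the order and choice of the other terms in the model no longer matter.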
Example
• A two factor (A and B) experiment, response variable y.
• The SPSS data file
Using ANOVA SPSS package
Select the type of SS using the Model dialog
ANOVA table – type I S.S
Tests of Between-Subjects Effects
Dependent Variable: Y
Source             Type I Sum of Squares   df   Mean Square   F          Sig.
Corrected Model    11545.858a              8    1443.232      45.554     .000
Intercept          61603.201               1    61603.201     1944.440   .000
A                  3666.552                2    1833.276      57.865     .000
B                  809.019                 2    404.509       12.768     .000
A * B              7070.287                4    1767.572      55.792     .000
Error              760.361                 24   31.682
Total              73909.420               33
Corrected Total    12306.219               32
a. R Squared = .938 (Adjusted R Squared = .918)
ANOVA table – type II S.S
Tests of Between-Subjects Effects
Dependent Variable: Y
Source             Type II Sum of Squares   df   Mean Square   F          Sig.
Corrected Model    11545.858a               8    1443.232      45.554     .000
Intercept          61603.201                1    61603.201     1944.440   .000
A                  3358.643                 2    1679.321      53.006     .000
B                  809.019                  2    404.509       12.768     .000
A * B              7070.287                 4    1767.572      55.792     .000
Error              760.361                  24   31.682
Total              73909.420                33
Corrected Total    12306.219                32
a. R Squared = .938 (Adjusted R Squared = .918)
ANOVA table – type III S.S
Tests of Between-Subjects Effects
Dependent Variable: Y
Source             Type III Sum of Squares   df   Mean Square   F          Sig.
Corrected Model    11545.858a                8    1443.232      45.554     .000
Intercept          52327.002                 1    52327.002     1651.647   .000
A                  2812.027                  2    1406.013      44.379     .000
B                  1010.809                  2    505.405       15.953     .000
A * B              7070.287                  4    1767.572      55.792     .000
Error              760.361                   24   31.682
Total              73909.420                 33
Corrected Total    12306.219                 32
a. R Squared = .938 (Adjusted R Squared = .918)
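A short arithmetic comparison of the three SPSS tables for this example, using only the values printed in them (the variable names are illustrative). The Type I sums of squares are sequential, so they add up to the Corrected Model sum of squares; in this unbalanced example the Type II and Type III values do not.

```python
# Effect sums of squares copied from the three SPSS tables for this example.
tables = {
    "Type I": {"A": 3666.552, "B": 809.019, "A * B": 7070.287},
    "Type II": {"A": 3358.643, "B": 809.019, "A * B": 7070.287},
    "Type III": {"A": 2812.027, "B": 1010.809, "A * B": 7070.287},
}
corrected_model_ss = 11545.858

# Add up the effect sums of squares within each table.
totals = {name: sum(ss.values()) for name, ss in tables.items()}

for name, total in totals.items():
    print(f"{name:8s} effects sum to {total:.3f}; Corrected Model SS is {corrected_model_ss:.3f}")
```

The interaction line is identical in all three tables because A * B is the highest-order effect, so its reduced model is the same under each definition.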
Next Topic Other Experimental Designs