Post on 18-Jan-2016
Topic 24: Two-Way ANOVA
Outline
• Two-way ANOVA
–Data
–Cell means model
–Parameter estimates
–Factor effects model
Two-Way ANOVA
• The response variable Y is continuous
• There are two categorical explanatory variables or factors
Data for two-way ANOVA
• Y is the response variable
• Factor A with levels i = 1 to a
• Factor B with levels j = 1 to b
• Yijk is the kth observation in cell (i,j)
• In Chapter 19, we assume equal
sample size in each cell (nij=n)
KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3 levels: bottom, middle, top
• B is the width of the shelf display, b=2 levels: regular, wide
• n=2 stores for each of the 3x2=6 treatment combinations (nT=12)
Read the data
data a1;
  infile '../data/ch19ta07.txt';
  input sales height width;
run;
proc print data=a1;
run;
The data
Obs  sales  height  width
  1     47       1      1
  2     43       1      1
  3     46       1      2
  4     40       1      2
  5     62       2      1
  6     68       2      1
  7     67       2      2
  8     71       2      2
  9     41       3      1
 10     39       3      1
 11     42       3      2
 12     46       3      2
Notation
• For Yijk we use
– i to denote the level of the factor A
– j to denote the level of the factor B
–k to denote the kth observation in cell (i,j)
• i = 1, . . . , a levels of factor A
• j = 1, . . . , b levels of factor B
• k = 1, . . . , n observations in cell (i,j)
Model
• We assume that the response variable observations are
–Normally distributed
•With a mean that may depend on the levels of the factors A and B
•With a constant variance
– Independent
Cell Means Model
• Yijk = μij + εijk
– where μij is the theoretical mean or expected value of all observations in cell (i,j)
– the εijk are iid N(0, σ2)
• This means Yijk ~ N(μij, σ2), independent
• The parameters of the model are
– μij, for i = 1 to a and j = 1 to b
– σ2
Estimates
• Estimate μij by the mean of the observations in cell (i,j):
\[ \bar{Y}_{ij\cdot} = \Big(\sum_{k} Y_{ijk}\Big)/n \]
• For each (i,j) combination, we can get an estimate of the variance:
\[ s_{ij}^2 = \sum_{k} (Y_{ijk} - \bar{Y}_{ij\cdot})^2/(n-1) \]
• We need to combine these to get an estimate of σ2
Pooled estimate of σ2
• In general we pool the sij2, using
weights proportional to the df, nij -1
• The pooled estimate is
s2 = (Σ (nij-1)sij2) / (Σ(nij-1))
• Here, nij = n, so s2 = (Σsij2) / (ab),
which is the average sample variance
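The pooling step can be sketched in SAS for the bread data. This is only an illustration, assuming the data set a1 read in earlier; the names cellvar, s2ij, nij and pooled_s2 are made up for this sketch.

```sas
/* Sketch: pool the cell variances of sales in a1.                   */
/* cellvar, s2ij, nij and pooled_s2 are illustrative names only.     */
proc means data=a1 noprint nway;
  class height width;                 /* one output row per cell (i,j) */
  var sales;
  output out=cellvar var=s2ij n=nij;  /* cell variance and cell size   */
run;

proc sql;
  /* weights are the cell df, nij - 1 */
  select sum((nij - 1) * s2ij) / sum(nij - 1) as pooled_s2
  from cellvar;
quit;
```

By hand, the six cell variances are 8, 18, 18, 8, 2 and 8, so with n = 2 in every cell the pooled estimate is 62/6 ≈ 10.33.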
Run proc glm
proc glm data=a1;
  class height width;
  model sales = height width height*width;
  means height width height*width;
run;
Output
Class Level Information
Class   Levels  Values
height       3  1 2 3
width        2  1 2

Number of Observations Read  12
Number of Observations Used  12
Means statement height
Level of        sales
height    N     Mean         Std Dev
1         4     44.0000000   3.16227766
2         4     67.0000000   3.74165739
3         4     42.0000000   2.94392029
Means statement width
Level of        sales
width     N     Mean         Std Dev
1         6     50.0000000   12.0664825
2         6     52.0000000   13.4313067
Means statement ht*w
Level of  Level of        sales
height    width     N     Mean         Std Dev
1         1         2     45.0000000   2.82842712
1         2         2     43.0000000   4.24264069
2         1         2     65.0000000   4.24264069
2         2         2     69.0000000   2.82842712
3         1         2     40.0000000   1.41421356
3         2         2     44.0000000   2.82842712
Code the factor levels
data a1; set a1;
  if height eq 1 and width eq 1 then hw='1_BR';
  if height eq 1 and width eq 2 then hw='2_BW';
  if height eq 2 and width eq 1 then hw='3_MR';
  if height eq 2 and width eq 2 then hw='4_MW';
  if height eq 3 and width eq 1 then hw='5_TR';
  if height eq 3 and width eq 2 then hw='6_TW';
run;
Plot the data
symbol1 v=circle i=none;
proc gplot data=a1;
  plot sales*hw/frame;
run;
The plot
Put the means in a2
proc means data=a1;
  var sales;
  by height width;
  output out=a2 mean=avsales;
run;
proc print data=a2;
run;
Output Data Set
Obs  height  width  _TYPE_  _FREQ_  avsales
  1       1      1       0       2       45
  2       1      2       0       2       43
  3       2      1       0       2       65
  4       2      2       0       2       69
  5       3      1       0       2       40
  6       3      2       0       2       44
Plot the means
symbol1 v=square i=join c=black;
symbol2 v=diamond i=join c=black;
proc gplot data=a2;
  plot avsales*height=width/frame;
run;
The interaction plot
Questions to consider
• Does the height of the display affect
sales? If yes, compare top with middle,
top with bottom, and middle with bottom
• Does the width of the display affect
sales? If yes, compare regular and wide
But wait!!! Are these factor level comparisons meaningful?
• Does the effect of height on sales depend on the width?
• Does the effect of width on sales depend on the height?
• If yes, we have an interaction and we need to do some additional analysis
Factor effects model
• For the one-way ANOVA model, we wrote μi = μ + αi
• Here we use μij = μ + αi + βj + (αβ)ij
• Under “common” formulation
– μ (μ.. in KNNL) is the “overall mean”
– αi is the main effect of A
– βj is the main effect of B
– (αβ)ij is the interaction between A and B
Factor effects model
• μ = (Σij μij)/(ab)
• μi. = (Σj μij)/b and μ.j = (Σi μij)/a
• αi = μi. – μ and βj = μ.j - μ
• (αβ)ij is difference between μij and μ + αi + βj
• (αβ)ij = μij - (μ + (μi. - μ) + (μ.j - μ))
= μij – μi. – μ.j + μ
Interpretation
• μij = μ + αi + βj + (αβ)ij
• μ is the “overall” mean
• αi is an adjustment for level i of A
• βj is an adjustment for level j of B
• (αβ)ij is an additional adjustment that takes into account both i and j that cannot be explained by the previous adjustments
Constraints for this framework
• α. = Σi αi= 0
• β. = Σjβj = 0
• (αβ).j = Σi (αβ)ij = 0 for all j
• (αβ)i. = Σj (αβ)ij = 0 for all i
Estimates for factor effects model
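\[ \hat{\mu} = \bar{Y}_{\cdot\cdot\cdot} = \Big(\sum_{i,j,k} Y_{ijk}\Big)/(abn) \]
\[ \hat{\mu}_{i\cdot} = \bar{Y}_{i\cdot\cdot} \quad\text{and}\quad \hat{\mu}_{\cdot j} = \bar{Y}_{\cdot j\cdot} \]
\[ \hat{\alpha}_i = \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot} \quad\text{and}\quad \hat{\beta}_j = \bar{Y}_{\cdot j\cdot} - \bar{Y}_{\cdot\cdot\cdot} \]
\[ \widehat{(\alpha\beta)}_{ij} = \bar{Y}_{ij\cdot} - \bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot j\cdot} + \bar{Y}_{\cdot\cdot\cdot} \]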
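As a worked check, the factor effect estimates for the bread data can be computed by hand from the means that proc glm printed (height means 44, 67, 42; width means 50, 52; cell means 45, 43, 65, 69, 40, 44). The overall mean of 51 is taken here as the average of the six cell means, which is valid because every cell has n = 2.

```latex
\begin{align*}
\hat{\mu} &= \bar{Y}_{\cdot\cdot\cdot} = (45 + 43 + 65 + 69 + 40 + 44)/6 = 51 \\
\hat{\alpha}_1 &= 44 - 51 = -7, \quad
\hat{\alpha}_2 = 67 - 51 = 16, \quad
\hat{\alpha}_3 = 42 - 51 = -9 \\
\hat{\beta}_1 &= 50 - 51 = -1, \quad
\hat{\beta}_2 = 52 - 51 = 1 \\
\widehat{(\alpha\beta)}_{11} &= 45 - 44 - 50 + 51 = 2, \quad
\widehat{(\alpha\beta)}_{12} = 43 - 44 - 52 + 51 = -2 \\
\widehat{(\alpha\beta)}_{21} &= 65 - 67 - 50 + 51 = -1, \quad
\widehat{(\alpha\beta)}_{22} = 69 - 67 - 52 + 51 = 1 \\
\widehat{(\alpha\beta)}_{31} &= 40 - 42 - 50 + 51 = -1, \quad
\widehat{(\alpha\beta)}_{32} = 44 - 42 - 52 + 51 = 1
\end{align*}
```

The estimates satisfy the sum-to-zero constraints: the height effects sum to 0, the width effects sum to 0, and the interaction estimates sum to 0 across each row and each column.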
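SS for ANOVA Table
\[ SSA = \sum_{i,j,k} \hat{\alpha}_i^2 = \sum_{i,j,k} (\bar{Y}_{i\cdot\cdot} - \bar{Y}_{\cdot\cdot\cdot})^2 \]
\[ SSB = \sum_{i,j,k} \hat{\beta}_j^2 \]
\[ SSAB = \sum_{i,j,k} \widehat{(\alpha\beta)}_{ij}^2 \]
\[ SSE = \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{ij\cdot})^2 \]
\[ SSTO = \sum_{i,j,k} (Y_{ijk} - \bar{Y}_{\cdot\cdot\cdot})^2 \]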
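A worked check for the bread data (a = 3, b = 2, n = 2) may help here. It uses effect estimates computed from the printed means: height effects −7, 16, −9 (height means minus the overall mean 51), width effects −1, 1, and interaction estimates 2, −2, −1, 1, −1, 1. Because each term in a sum over i, j, k repeats for every observation sharing that level, the sums collapse to multiples of bn, an and n. SSE is written as the sum of the cell variances, since each cell has n − 1 = 1 df.

```latex
\begin{align*}
SSA  &= bn \sum_i \hat{\alpha}_i^2 = 4\,(49 + 256 + 81) = 1544 \\
SSB  &= an \sum_j \hat{\beta}_j^2 = 6\,(1 + 1) = 12 \\
SSAB &= n \sum_{i,j} \widehat{(\alpha\beta)}_{ij}^2 = 2\,(4 + 4 + 1 + 1 + 1 + 1) = 24 \\
SSE  &= \sum_{i,j} (n-1)\, s_{ij}^2 = 8 + 18 + 18 + 8 + 2 + 8 = 62 \\
SSTO &= SSA + SSB + SSAB + SSE = 1544 + 12 + 24 + 62 = 1642
\end{align*}
```

These values agree with the sums of squares that proc glm reports for this example.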
df for ANOVA Table
• dfA = a-1
• dfB = b-1
• dfAB = (a-1)(b-1)
• dfE = ab(n-1)
• dfT = abn-1 = nT-1
MS for ANOVA Table
• MSA = SSA/dfA
• MSB = SSB/dfB
• MSAB = SSAB/dfAB
• MSE = SSE/dfE
• MST = SSTO/dfT
Hypotheses for two-way ANOVA
• H0A: αi = 0 for all i
• H1A: αi ≠ 0 for at least one i
• H0B: βj = 0 for all j
• H1B: βj ≠ 0 for at least one j
• H0AB: (αβ)ij = 0 for all (i,j)
• H1AB: (αβ)ij ≠ 0 for at least one (i,j)
F statistics
• H0A is tested by FA = MSA/MSE; df=dfA, dfE
• H0B is tested by FB = MSB/MSE; df=dfB,
dfE
• H0AB is tested by FAB = MSAB/MSE;
df=dfAB, dfE
ANOVA Table
Source  df          SS    MS    F
A       a-1         SSA   MSA   MSA/MSE
B       b-1         SSB   MSB   MSB/MSE
AB      (a-1)(b-1)  SSAB  MSAB  MSAB/MSE
Error   ab(n-1)     SSE   MSE
Total   abn-1       SSTO  MST
P-values
• P-values are calculated using the F(dfNumerator, dfDenominator) distributions
• If P ≤ 0.05 we conclude that the effect being tested is statistically significant
KNNL Example
• KNNL p 833
• Y is the number of cases of bread sold
• A is the height of the shelf display, a=3 levels: bottom, middle, top
• B is the width of the shelf display, b=2 levels: regular, wide
• n=2 stores for each of the 3x2=6 treatment combinations
PROC GLM
proc glm data=a1;
  class height width;
  model sales = height width height*width;
run;
Output
Note that there are 6 cells in this design, so the model has 6-1 = 5 df
Source           DF  Sum of Squares  Mean Square  F Value  Pr > F
Model             5       1580.0000   316.000000    30.58  0.0003
Error             6       62.000000    10.333333
Corrected Total  11       1642.0000
Output ANOVA
Note: Type I and Type III analyses are the same because nij is constant
Source        DF  Type III SS  Mean Square  F Value  Pr > F
height         2   1544.00000   772.000000    74.71  <.0001
width          1    12.000000    12.000000     1.16  0.3226
height*width   2    24.000000    12.000000     1.16  0.3747
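The mean squares and F statistics in this table can be checked by hand from the df and SS columns, with MSE = 62/6 taken from the Error line of the overall ANOVA output. The arithmetic is rounded to the precision SAS prints.

```latex
\begin{align*}
MSA  &= 1544/2 = 772, \quad MSB = 12/1 = 12, \quad MSAB = 24/2 = 12 \\
MSE  &= 62/6 \approx 10.333 \\
F_A  &= 772/10.333 \approx 74.71 \\
F_B  &= 12/10.333 \approx 1.16 \\
F_{AB} &= 12/10.333 \approx 1.16
\end{align*}
```

The three sums of squares also add up to the Model line: 1544 + 12 + 24 = 1580 on 2 + 1 + 2 = 5 df.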
Other output
R-Square  Coeff Var  Root MSE  sales Mean
0.962241  6.303040   3.214550  51.00000
We commonly do not consider R-square when performing ANOVA: we are more interested in differences between factor levels than in the model's predictive ability
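For reference, the three summary statistics follow directly from the ANOVA output (Model SS 1580, Corrected Total SS 1642, MSE 10.333333, mean of sales 51). This is a hand check using the standard definitions, not additional SAS output.

```latex
\begin{align*}
R^2 &= SS_{Model}/SSTO = 1580/1642 \approx 0.9622 \\
\text{Root MSE} &= \sqrt{10.333333} \approx 3.2146 \\
\text{Coeff Var} &= 100 \times 3.2146/51 \approx 6.303
\end{align*}
```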
Results
• The main effect of height is statistically significant (F=74.71; df=2,6; P<0.0001)
• The main effect of width is not statistically significant (F=1.16; df=1,6; P=0.32)
• The interaction between height and width is not statistically significant (F=1.16; df=2,6; P=0.37)
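The P-values quoted above can be reproduced from the F distribution with the SAS function probf, which returns the cumulative probability. This is a minimal sketch: the data set name pvals and the variable names are made up for illustration, and the mean squares are copied from the proc glm output.

```sas
/* Sketch: upper-tail F probabilities for the three tests.          */
/* pvals, mse, p_height, p_width and p_hw are illustrative names.   */
data pvals;
  mse      = 62 / 6;                       /* Error MS on 6 df      */
  p_height = 1 - probf(772 / mse, 2, 6);   /* FA,  df = 2, 6        */
  p_width  = 1 - probf(12  / mse, 1, 6);   /* FB,  df = 1, 6        */
  p_hw     = 1 - probf(12  / mse, 2, 6);   /* FAB, df = 2, 6        */
run;

proc print data=pvals;
run;
```

The printed values should match the Pr > F column of the Type III table, up to rounding.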
Interpretation
• The height of the display affects sales of bread
• The width of the display has no apparent effect
• The effect of the height of the display is similar for both the regular and the wide widths
Plot of the means
Additional analyses
• We will need to do additional analyses to explain the height effect (factor A)
• There were three levels: bottom, middle and top
• We could reanalyze the data with a one-way ANOVA and use the methods we learned in the previous chapters
• Use means statement with lines
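The one-way option could look like the sketch below, assuming the same data set a1. It is shown only for comparison and is not the analysis used in the rest of this topic.

```sas
/* Sketch: one-way ANOVA on height only, ignoring width. */
proc glm data=a1;
  class height;
  model sales = height;
  means height / tukey lines;
run;
```

Dropping width and the interaction from the model moves their sums of squares into the error term, so the error SS becomes 62 + 12 + 24 = 98 on 9 df instead of 62 on 6 df. The Tukey comparisons from this run therefore use a different error mean square than the two-way analysis.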
Run Proc GLM
proc glm data=a1;
  class height width;
  model sales = height width height*width;
  means height / tukey lines;
  lsmeans height / adjust=tukey;
run;
MEANS Output
Alpha                                0.05
Error Degrees of Freedom             6
Error Mean Square                    10.33333
Critical Value of Studentized Range  4.33920
Minimum Significant Difference       6.9743

Means with the same letter are not significantly different.

Tukey Grouping  Mean    N  height
A               67.000  4  2
B               44.000  4  1
B
B               42.000  4  3
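The minimum significant difference in this output can be verified by hand. For equal group sizes, the Tukey value is the studentized range critical value times the standard error of a single mean, where each height mean is based on 4 observations.

```latex
\[
MSD = q \sqrt{MSE/4} = 4.33920 \times \sqrt{10.33333/4} \approx 4.33920 \times 1.6073 \approx 6.974
\]
```

The differences 67 − 44 = 23 and 67 − 42 = 25 exceed 6.974, while 44 − 42 = 2 does not, which is why heights 1 and 3 share the letter B and height 2 stands alone.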
LSMEANS Output
height  sales LSMEAN  LSMEAN Number
1       44.0000000    1
2       67.0000000    2
3       42.0000000    3

Least Squares Means for effect height
Pr > |t| for H0: LSMean(i)=LSMean(j)
Dependent Variable: sales
i/j  1       2       3
1            0.0001  0.6714
2    0.0001          <.0001
3    0.6714  <.0001
Last slide
• We went over Chapter 19
• We used program topic24.sas to generate the output for today.