anova
-
Upload
anshulgoel9054 -
Category
Documents
-
view
534 -
download
0
Transcript of anova
Created by: Prabhat Mittal 1E-mail ID: [email protected]
Statistics for Business Analysis
Day 10
Session-I
ANOVA: Analysis of Variance
Learning Objectives
� To compare more than two population means
using one way analysis of variance
� To use the F distribution to test hypotheses about
two population variances
� To learn about randomized block design
� To learn the technique of two-way analysis of
variance and the concept of interaction
Created by: Prabhat Mittal 2E-mail ID: [email protected]
Chapter Overview
Analysis of Variance (ANOVA)
F-test
Tukey-Kramer
test
One-Way ANOVA
Two-Way ANOVA
Interaction
Effects
Randomized Block Design
Multiple
Comparisons
General ANOVA Setting
� Investigator controls one or more independent variables
� Called factors (or treatment variables)
� Each factor contains two or more levels (or groups or
categories/classifications)
� Observe effects on the dependent variable
� Response to levels of independent variable
� Experimental design: the plan used to collect the data
Created by: Prabhat Mittal 3E-mail ID: [email protected]
Completely Randomized Design
� Experimental units (subjects) are assigned randomly to treatments
� Subjects are assumed homogeneous
� Only one factor or independent variable
� With two or more treatment levels
� Analyzed by one-way analysis of variance (ANOVA)
One-Way Analysis of Variance
� Evaluate the difference among the means of three or more groups
Examples: Accident rates for 1st, 2nd, and 3rd shift
Expected mileage for five brands of tires
� Assumptions
� Populations are normally distributed
� Populations have equal variances
� Samples are randomly and independently drawn
Created by: Prabhat Mittal 4E-mail ID: [email protected]
Hypotheses of One-Way ANOVA
�
� All population means are equal
� i.e., no treatment effect (no variation in means among
groups)
�
� At least one population mean is different
� i.e., there is a treatment effect
� Does not mean that all population means are different
(some pairs may be the same)
c3210 µµµµ:H ==== L
same the are means population the of all Not:H1
One-Factor ANOVA
All Means are the same:
The Null Hypothesis is True
(No Treatment Effect)
c3210 µµµµ:H ==== L
same the are µ all Not:H j1
321 µµµ ==
Created by: Prabhat Mittal 5E-mail ID: [email protected]
One-Factor ANOVA
At least one mean is different:
The Null Hypothesis is NOT true
(Treatment Effect is present)
c3210 µµµµ:H ==== L
same the are µ all Not:H j1
321 µµµ ≠= 321 µµµ ≠≠
or
(continued)
Partitioning the Variation
� Total variation can be split into two parts:
SST = Total Sum of Squares
(Total variation)SSA = Sum of Squares Among Groups
(Among-group variation)
SSW = Sum of Squares Within Groups
(Within-group variation)
SST = SSA + SSW
Created by: Prabhat Mittal 6E-mail ID: [email protected]
Partitioning the Variation
Total Variation = the aggregate dispersion of the individual
data values across the various factor levels (SST)
Within-Group Variation = dispersion that exists among
the data values within a particular factor level (SSW)
Among-Group Variation = dispersion between the factor
sample means (SSA)
SST = SSA + SSW
(continued)
Partition of Total Variation
Variation Due to
Factor (SSA)Variation Due to Random
Sampling (SSW)
Total Variation (SST)
Commonly referred to as:
� Sum of Squares Within
� Sum of Squares Error
� Sum of Squares Unexplained
� Within-Group Variation
Commonly referred to as:
� Sum of Squares Between
� Sum of Squares Among
� Sum of Squares Explained
� Among Groups Variation
= +
d.f. = n – 1
d.f. = c – 1 d.f. = n – c
Created by: Prabhat Mittal 7E-mail ID: [email protected]
Total Sum of Squares
∑∑= =
−=c
1j
n
1i
2ij
j
)XX(SST
Where:
SST = Total sum of squares
c = number of groups (levels or treatments)
nj = number of observations in group j
Xij = ith observation from group j
X = grand mean (mean of all data values)
SST = SSA + SSW
Total Variation
Group 1 Group 2 Group 3
Response, X
X
2
cn
2
12
2
11 )XX(...)XX()XX(SSTc
−++−+−=
(continued)
Created by: Prabhat Mittal 8E-mail ID: [email protected]
Among-Group Variation
Where:
SSA = Sum of squares among groups
c = number of groups
nj = sample size from group j
Xj = sample mean from group j
X = grand mean (mean of all data values)
2j
c
1j
j )XX(nSSA −=∑=
SST = SSA + SSW
Among-Group Variation
Variation Due to
Differences Among Groups
iµ jµ
2j
c
1j
j )XX(nSSA −=∑=
1c
SSAMSA
−=
Mean Square Among =
SSA/degrees of freedom
(continued)
Created by: Prabhat Mittal 9E-mail ID: [email protected]
Among-Group Variation
Group 1 Group 2 Group 3
Response, X
X1
X 2X
3X
22
22
2
11 )xx(n...)xx(n)xx(nSSA cc −++−+−=
(continued)
Within-Group Variation
Where:
SSW = Sum of squares within groups
c = number of groups
nj = sample size from group j
Xj = sample mean from group j
Xij = ith observation in group j
2jij
n
1i
c
1j
)XX(SSWj
−= ∑∑==
SST = SSA + SSW
Created by: Prabhat Mittal 10E-mail ID: [email protected]
Within-Group Variation
Summing the variation
within each group and then
adding over all groups cn
SSWMSW
−=
Mean Square Within =
SSW/degrees of freedom
2jij
n
1i
c
1j
)XX(SSWj
−= ∑∑==
(continued)
jµ
Within-Group Variation
Group 1 Group 2 Group 3
Response, X
1X 2
X3
X
2ccn
2212
2111 )XX(...)XX()Xx(SSW
c−++−+−=
(continued)
Created by: Prabhat Mittal 11E-mail ID: [email protected]
Obtaining the Mean Squares
cn
SSWMSW
−=
1c
SSAMSA
−=
1n
SSTMST
−=
One-Way ANOVA Table
Source of Variation
dfSS MS(Variance)
Among Groups
SSA MSA =
Within Groups
n - cSSW MSW =
Total n - 1SST =
SSA+SSW
c - 1MSA
MSW
F ratio
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
SSA
c - 1
SSW
n - c
F =
Created by: Prabhat Mittal 12E-mail ID: [email protected]
One-Way ANOVAF Test Statistic
� Test statistic
MSA is mean squares among groups
MSW is mean squares within groups
� Degrees of freedom
� df1 = c – 1 (c = number of groups)
� df2 = n – c (n = sum of sample sizes from all populations)
MSW
MSAF =
H0: µ1= µ2 = … = µc
H1: At least two population means are different
Interpreting One-Way ANOVA F Statistic
� The F statistic is the ratio of the amongestimate of variance and the within estimate of variance� The ratio must always be positive
� df1 = c -1 will typically be small
� df2 = n - c will typically be large
Decision Rule:
� Reject H0 if F > FU, otherwise do not reject H0
0
α = .05
Reject H0Do not reject H0
FU
Created by: Prabhat Mittal 13E-mail ID: [email protected]
One-Way ANOVA F Test Example
You want to see if three
different golf clubs yield
different distances. You
randomly select five
measurements from trials on an automated driving
machine for each club. At the
0.05 significance level, is
there a difference in mean
distance?
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206251 216 204
••
••
•
One-Way ANOVA Example: Scatter Diagram
270
260
250
240
230
220
210
200
190
•
•
••
•
••
•
••
Distance
1X
2X
3X
X
227.0 x
205.8 x 226.0x 249.2x 321
=
===
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206251 216 204
Club1 2 3
Created by: Prabhat Mittal 14E-mail ID: [email protected]
One-Way ANOVA Example Computations
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206251 216 204
X1 = 249.2
X2 = 226.0
X3 = 205.8
X = 227.0
n1 = 5
n2 = 5
n3 = 5
n = 15
c = 3
SSA = 5 (249.2 – 227)2 + 5 (226 – 227)2 + 5 (205.8 – 227)2 = 4716.4
SSW = (254 – 249.2)2 + (263 – 249.2)2 +…+ (204 – 205.8)2 = 1119.6
MSA = 4716.4 / (3-1) = 2358.2
MSW = 1119.6 / (15-3) = 93.325.275
93.3
2358.2F ==
F = 25.275
One-Way ANOVA Example Solution
H0: µ1 = µ2 = µ3
H1: µj not all equal
α = 0.05
df1= 2 df2 = 12
Test Statistic:
Decision:
Conclusion:
Reject H0 at αααα = 0.05
There is evidence that at least one µj differs from the rest
0
α = .05
FU = 3.89
Reject H0Do not reject H0
25.27593.3
2358.2
MSW
MSAF ===
Critical
Value:
FU = 3.89
Created by: Prabhat Mittal 15E-mail ID: [email protected]
145836.0Total
93.3121119.6Within Groups
3.894.99E-0525.2752358.224716.4Between Groups
F critP-valueFMSdfSSSource of Variation
ANOVA
94.2205.810295Club 3
77.522611305Club 2
108.2249.212465Club 1
VarianceAverageSumCountGroups
SUMMARY
One-Way ANOVA Excel Output
EXCEL: tools | data analysis | ANOVA: single factor
Numerical Problems
Ref: 11-28. Page no. 604 The following data show the
number of claims processed per day for a group of four insurance company employees observed for a
number of days. Test the hypothesis that the
employees’ mean claims per day are all the same.
Use the 0.05 level of significance.
Employee 1 15 17 14 12
Employee 2 12 10 13 17
Employee 3 11 14 13 15 12
Employee 4 13 12 12 14 10 9
Solution: H0: µ1 = µ2 = µ3 = µ4 & H1: µj not all equal
Created by: Prabhat Mittal 16E-mail ID: [email protected]
One-Way ANOVA Excel OutputEXCEL: tools | data analysis | ANOVA: single factor
P-value F CritFMsDfSS Sources of variation
0.26 3.281.466.48319.45Between groups
Do not reject Ho. The employees' productivities are not significantly different
1885.78Total
4.421566.33Within Groups
3.46666711.67706Employee 4
2.513655Employee 3
8.66666713524Employee 2
4.33333314.5584Employee 1
VarianceAverageSumCountGroups
SUMMARY
Statistics for Business Analysis
Day 10
Session-II
ANOVA: Analysis of Variance
Created by: Prabhat Mittal 17E-mail ID: [email protected]
The Tukey-Kramer Procedure
� Tells which population means are significantly different� e.g.: µ1 = µ2 ≠ µ3
� Done after rejection of equal means in ANOVA
� Allows pair-wise comparisons
� Compare absolute mean differences with critical range
xµ1 = µ 2
µ3
Tukey-Kramer Critical Range
where:
QU = Value from Studentized Range Distribution
with c and n - c degrees of freedom for
the desired level of α (see appendix E.9 table)
MSW = Mean Square Within
nj and nj’ = Sample sizes from groups j and j’
+=
j'j
Un
1
n
1
2
MSWQRange Critical
Created by: Prabhat Mittal 18E-mail ID: [email protected]
The Tukey-Kramer Procedure: Example
1. Compute absolute mean differences:Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206251 216 204 20.2205.8226.0xx
43.4205.8249.2xx
23.2226.0249.2xx
32
31
21
=−=−
=−=−
=−=−
2. Find the QU value from the table in appendix E.10 with c = 3 and (n – c) = (15 – 3) = 12 degrees of freedom
for the desired level of α (α = 0.05 used here):
3.77QU =
The Tukey-Kramer Procedure: Example
5. All of the absolute mean differences are greater than critical range. Therefore there is a significant difference between each pair of means at 5% level of significance. Thus, with 95% confidence we can conclude that the mean distance for club 1 is greater than club 2 and 3, and club 2 is greater than club 3.
16.2855
1
5
1
2
93.33.77
n
1
n
1
2
MSWQRange Critical
j'j
U =
+=
+=
3. Compute Critical Range:
20.2xx
43.4xx
23.2xx
32
31
21
=−
=−
=−
4. Compare:
(continued)
Created by: Prabhat Mittal 19E-mail ID: [email protected]
The Randomized Block Design
� Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...
� ...but we want to control for possible variation from a second factor (with two or more levels)
� Levels of the secondary factor are called blocks
Partitioning the Variation
� Total variation can now be split into three parts:
SST = Total variation
SSA = Among-Group variationSSBL = Among-Block variation
SSE = Random variation
SST = SSA + SSBL + SSE
Created by: Prabhat Mittal 20E-mail ID: [email protected]
Sum of Squares for Blocking
Where:
c = number of groups
r = number of blocks
Xi. = mean of all values in block i
X = grand mean (mean of all data values)
∑=
−=r
1i
2i. )XX(cSSBL
SST = SSA + SSBL + SSE
Partitioning the Variation
� Total variation can now be split into three parts:
SST and SSA are
computed as they were
in One-Way ANOVA
SST = SSA + SSBL + SSE
SSE = SST – (SSA + SSBL)
Created by: Prabhat Mittal 21E-mail ID: [email protected]
Mean Squares
1c
SSAgroups among square MeanMSA
−==
1r
SSBLblocking square MeanMSBL
−==
)1)(1( −−==
cr
SSEMSE error square Mean
Randomized Block ANOVA Table
Source of Variation
dfSS MS
Among
BlocksSSBL MSBL
Error (r–1)(c-1)SSE MSE
Total rc - 1SST
r - 1 MSBL
MSE
F ratio
c = number of populations rc = sum of the sample sizes from all populations
r = number of blocks df = degrees of freedom
Among
Treatments SSA c - 1 MSA MSA
MSE
Created by: Prabhat Mittal 22E-mail ID: [email protected]
Blocking Test
� Blocking test: df1 = r – 1
df2 = (r – 1)(c – 1)
MSBL
MSE
...µµµ:H 3.2.1.0 ===
equal are means block all Not:H1
F =
Reject H0 if F > FU
� Main Factor test: df1 = c – 1
df2 = (r – 1)(c – 1)
MSA
MSE
c..3.2.10 µ...µµµ:H ====
equal are means population all Not:H1
F =
Reject H0 if F > FU
Main Factor Test
Created by: Prabhat Mittal 23E-mail ID: [email protected]
The Tukey Procedure
� To test which population means are significantly different� e.g.: µ1 = µ2 ≠ µ3
� Done after rejection of equal means in randomized block ANOVA design
� Allows pair-wise comparisons� Compare absolute mean differences with critical
range
xµµµµ = µµµµ µµµµ1 2 3
etc...
xx
xx
xx
.3.2
.3.1
.2.1
−
−
−
The Tukey Procedure(continued)
r
MSERange Critical uQ=
If the absolute mean difference is greater than the critical range then there is a significant difference between that pair of means at the chosen level of significance.
Compare:
?Range CriticalxxIs .j'.j >−
Created by: Prabhat Mittal 24E-mail ID: [email protected]
Factorial Design:
Two-Way ANOVA
� Examines the effect of
� Two factors of interest on the dependent
variable
� e.g., Percent carbonation and line speed on soft drink
bottling process
� Interaction between the different levels of these
two factors
� e.g., Does the effect of one particular carbonation
level depend on which level the line speed is set?
Two-Way ANOVA
� Assumptions
� Populations are normally distributed
� Populations have equal variances
� Independent random samples are drawn
(continued)
Created by: Prabhat Mittal 25E-mail ID: [email protected]
Two-Way ANOVA Sources of Variation
Two Factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n’ = number of replications for each cell
n = total number of observations in all cells
(n = rcn’)
Xijk = value of the kth observation of level i of
factor A and level j of factor B
Two-Way ANOVA Sources of Variation
SST
Total Variation
SSAFactor A Variation
SSBFactor B Variation
SSABVariation due to interaction
between A and B
SSERandom variation (Error)
Degrees of
Freedom:
r – 1
c – 1
(r – 1)(c – 1)
rc(n’ – 1)
n - 1
SST = SSA + SSB + SSAB + SSE
(continued)
Created by: Prabhat Mittal 26E-mail ID: [email protected]
Two Factor ANOVA Equations
∑∑∑= =
′
=
−=r
1i
c
1j
n
1k
2
ijk )XX(SST
2r
1i
..i )XX(ncSSA −′= ∑=
2c
1j
.j. )XX(nrSSB −′= ∑=
Total Variation:
Factor A Variation:
Factor B Variation:
Two Factor ANOVA Equations
2r
1i
c
1j
.j.i..ij. )XXXX(nSSAB +−−′= ∑∑= =
∑∑∑= =
′
=
−=r
1i
c
1j
n
1k
2.ijijk )XX(SSE
Interaction Variation:
Sum of Squares Error:
(continued)
Created by: Prabhat Mittal 27E-mail ID: [email protected]
Two Factor ANOVA Equations
where:Mean Grand
nrc
X
X
r
1i
c
1j
n
1k
ijk
=′
=
∑∑∑= =
′
=
r) ..., 2, 1, (i A factor of level i of Meannc
X
X th
c
1j
n
1k
ijk
..i ==′
=
∑∑=
′
=
c) ..., 2, 1, (j B factor of level j of Meannr
X
X th
r
1i
n
1k
ijk
.j. ==′
=∑∑
=
′
=
ij cell of Meann
XX
n
1k
ijk.ij =
′=∑
′
=
r = number of levels of factor A
c = number of levels of factor B
n’ = number of replications in each cell
(continued)
Mean Square Calculations
1r
SSA Afactor square MeanMSA
−==
1c
SSBB factor square MeanMSB
−==
)1c)(1r(
SSABninteractio square MeanMSAB
−−==
)1'n(rc
SSEerror square MeanMSE
−==
Created by: Prabhat Mittal 28
E-mail ID: [email protected]
Two-Way ANOVA:The F Test Statistic
F Test for Factor B Effect
F Test for Interaction Effect
H0: µ1.. = µ2.. = µ3.. = • • •
H1: Not all µi.. are equal
H0: the interaction of A and B is
equal to zero
H1: interaction of A and B is not
zero
F Test for Factor A Effect
H0: µ.1. = µ.2. = µ.3. = • • •
H1: Not all µ.j. are equal
Reject H0
if F > FUMSE
MSAF =
MSE
MSBF =
MSE
MSABF =
Reject H0
if F > FU
Reject H0
if F > FU
Two-Way ANOVASummary Table
SST
SSE
SSAB
SSB
SSA
Sum of
Squares
n – 1Total
MSE =
SSE/rc(n’ – 1)rc(n’ – 1)Error
MSAB
MSE
MSB
MSE
MSA
MSE
F
Statistic
MSAB= SSAB / (r – 1)(c – 1)
(r – 1)(c – 1)AB
(Interaction)
MSB
= SSB /(c – 1)c – 1Factor B
MSA
= SSA /(r – 1)r – 1Factor A
Mean
Squares
Degrees of
Freedom
Source of
Variation
Created by: Prabhat Mittal 29
E-mail ID: [email protected]
Features of Two-Way ANOVA F Test
� Degrees of freedom always add up
� n-1 = rc(n’-1) + (r-1) + (c-1) + (r-1)(c-1)
� Total = error + factor A + factor B + interaction
� The denominator of the F Test is always the
same but the numerator is different
� The sums of squares always add up
� SST = SSE + SSA + SSB + SSAB
� Total = error + factor A + factor B + interaction
Examples:Interaction vs. No Interaction
� No interaction:
Factor B Level 1
Factor B Level 3
Factor B Level 2
Factor A Levels
Factor B Level 1
Factor B Level 3
Factor B Level 2
Factor A Levels
Mean R
esponse
Mean R
esponse
� Interaction is present:
Created by: Prabhat Mittal 30
E-mail ID: [email protected]
Multiple Comparisons:
The Tukey Procedure
� Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure
� Consider all absolute mean differences and compare to the calculated critical range
� Example: Absolute differences
for factor A, assuming three factors:
3..2..
3..1..
2..1..
XX
XX
XX
−
−
−
Multiple Comparisons:
The Tukey Procedure
� Critical Range for Factor A:
(where Qu is from Table E.10 with r and rc(n’–1) d.f.)
� Critical Range for Factor B:
(where Qu is from Table E.10 with c and rc(n’–1) d.f.)
n'c
MSERange Critical UQ=
n'r
MSERange Critical UQ=
Created by: Prabhat Mittal 31
E-mail ID: [email protected]
Summary
� Described one-way analysis of variance
� The logic of ANOVA
� ANOVA assumptions
� F test for difference in c means
� The Tukey-Kramer procedure for multiple comparisons
� Considered the Randomized Block Design
� Treatment and Block Effects
� Multiple Comparisons: Tukey Procedure
� Described two-way analysis of variance
� Examined effects of multiple factors
� Examined interaction between factors