(General) Linear Statistical Models (lm) Regression ANOVA ...
Anova General
-
Upload
shizuka-toma -
Category
Documents
-
view
225 -
download
0
Transcript of Anova General
-
8/12/2019 Anova General
1/61
Analysis of Variance
Desci2 Lecture
-
8/12/2019 Anova General
2/61
Learning Objectives
In this chapter, you learn:
The basic concepts of experimental design
How to use one-way analysis of variance to testfor differences among the means of several
populations (also referred to as groups in this
chapter)
When to use a randomized block design
How to use two-way analysis of variance and the
concept of interaction
-
8/12/2019 Anova General
3/61
Chapter Overview
Analysis of Variance (ANOVA)
F-test
Tukey-
Kramer
test
One-WayANOVA Two-WayANOVA
Interaction
Effects
RandomizedBlock Design
Multiple
Comparisons
-
8/12/2019 Anova General
4/61
example
A financial analyst studied the annual returns of
three different categories of unit trust.
She wanted to know whether the averageannual returns per category varied across unit
trust categories.
A random sample of unit trusts from each of
three categories (labelled A, B and C) wereselected and their annual returns were
recorded.
The data is shown in Table below.
-
8/12/2019 Anova General
5/61
-
8/12/2019 Anova General
6/61
Management Question
Can the financial analyst conclude that the
average annual returns from the threecategories of unit trusts are the same?
-
8/12/2019 Anova General
7/61
General ANOVA Setting
Investigator controls one or more independent
variables
Called factors(or treatment variables)
Each factor contains two or more levels(or groups orcategories/classifications)
Observe effects on the dependent variable
Response to levels of independent variable
Experimental design: the plan used to collect
the data
-
8/12/2019 Anova General
8/61
Completely Randomized Design
Experimental units (subjects) are assigned
randomly to treatments
Subjects are assumed homogeneous Only one factor or independent variable
With two or more treatment levels
Analyzed by one-way analysis of variance
(ANOVA)
-
8/12/2019 Anova General
9/61
One-Way Analysis of Variance
Evaluate the difference among the means of three
or more groups
Examples: Accident rates for 1st, 2nd, and 3rdshift
Expected mileage for five brands of tires
Assumptions
Populations are normally distributed Populations have equal variances
Samples are randomly and independently drawn
-
8/12/2019 Anova General
10/61
Hypotheses of One-Way ANOVA
All population means are equal
i.e., no treatment effect (no variation in means among
groups)
At least one population mean is different
i.e., there is a treatment effect
Does not mean that all population means are different
(some pairs may be the same)
c3210 :H
samethearemeanspopulationtheofallNot:H1
-
8/12/2019 Anova General
11/61
One-Factor ANOVA
All Means are the same:
The Null Hypothesis is True
(No Treatment Effect)
c3210 :H
sametheareallNot:H j1
321
Example:
3 Groups
-
8/12/2019 Anova General
12/61
One-Factor ANOVA
At least one mean is different:
The Null Hypothesis is NOT true
(Treatment Effect is present)
c3210 :H sametheareallNot:H j1
321 321
or
(continued)
-
8/12/2019 Anova General
13/61
Partitioning the Variation
Total variation can be split into two parts:
SST = Total Sum of Squares
(Total variation)
SSA = Sum of Squares Among Groups
(Among-group variation)
SSW = Sum of Squares Within Groups
(Within-group variation)
SST = SSA + SSW
-
8/12/2019 Anova General
14/61
Partitioning the Variation
Total Variation= the aggregate dispersion of the individualdata values across the various factor levels (SST)
Within-Group Variation= dispersion that exists among
the data values within a particular factor level (SSW)
Among-Group Variation= dispersion between the factor
sample means (SSA)
SST = SSA + SSW
(continued)
-
8/12/2019 Anova General
15/61
Partition of Total Variation
Variation Due toFactor (SSA)
Variation Due to RandomSampling (SSW)
Total Variation (SST)
Commonly referred to as: Sum of Squares Within
Sum of Squares Error
Sum of Squares Unexplained
Within-Group Variation
Commonly referred to as: Sum of Squares Between
Sum of Squares Among
Sum of Squares Explained
Among Groups Variation
= +
d.f. = n1
d.f. = c1 d.f. = nc
-
8/12/2019 Anova General
16/61
Total Sum of Squares
c
1j
n
1i
2
ij
j
)XX(SST
Where:
SST = Total sum of squares
c = number of groups (levels or treatments)
nj= number of observations in group j
Xij= ithobservation from group j
X = grand mean (mean of all data values)
SST = SSA + SSW
-
8/12/2019 Anova General
17/61
Total Variation
Group 1 Group 2 Group 3
Response, X
X
2
cn
2
12
2
11 )XX(...)XX()XX(SST c
(continued)
-
8/12/2019 Anova General
18/61
Among-Group Variation
Where:
SSA = Sum of squares among groups
c = number of groups
nj= sample size from group j
Xj= sample mean from group j
X = grand mean (mean of all data values)
2j
c
1j
j )XX(nSSA
SST = SSA + SSW
-
8/12/2019 Anova General
19/61
Among-Group Variation
Variation Due toDifferences Among Groups
i j
2j
c
1j
j )XX(nSSA
1cSSAMSA
Mean Square Among =
SSA/degrees of freedom
(continued)
-
8/12/2019 Anova General
20/61
Among-Group Variation
Group 1 Group 2 Group 3
Response, X
X1
X 2X
3X
22
22
2
11 )xx(n...)xx(n)xx(nSSA cc
(continued)
-
8/12/2019 Anova General
21/61
Within-Group Variation
Where:
SSW = Sum of squares within groups
c = number of groupsnj= sample size from group j
Xj= sample mean from group j
Xij
= ithobservation in group j
2jij
n
1i
c
1j
)XX(SSWj
SST = SSA + SSW
-
8/12/2019 Anova General
22/61
Within-Group Variation
Summing the variation
within each group and then
adding over all groups cnSSWMSW
Mean Square Within =
SSW/degrees of freedom
2jij
n
1i
c
1j
)XX(SSWj
(continued)
j
-
8/12/2019 Anova General
23/61
Within-Group Variation
Group 1 Group 2 Group 3
Response, X
1X
2X
3X
2ccn
2212
2111 )XX(...)XX()Xx(SSW c
(continued)
-
8/12/2019 Anova General
24/61
Obtaining the Mean Squares
cn
SSWMSW
1c
SSAMSA
1n
SSTMST
-
8/12/2019 Anova General
25/61
One-Way ANOVA Table
Source of
VariationdfSS MS
(Variance)
Among
Groups SSA MSA =
Within
Groupsn - cSSW MSW =
Total n - 1SST =SSA+SSW
c - 1MSA
MSW
F ratio
c = number of groups
n = sum of the sample sizes from all groups
df = degrees of freedom
SSA
c - 1
SSW
n - c
F =
-
8/12/2019 Anova General
26/61
One-Way ANOVAF Test Statistic
Test statistic
MSAis mean squares amonggroups
MSWis mean squares withingroups
Degrees of freedom
df1= c1 (c = number of groups)
df2= nc (n = sum of sample sizes from all populations)
MSWMSAF
H0: 1= 2= = c
H1: At least two population means are different
-
8/12/2019 Anova General
27/61
Interpreting One-Way ANOVAF Statistic
The F statistic is the ratio of the amongestimate of variance and the withinestimateof variance
The ratio must always be positive df1= c-1 will typically be small
df2= n- c will typically be large
Decision Rule:
Reject H0if F > FU,otherwise do notreject H0
0
= .05
Reject H0Do notreject H0
FU
-
8/12/2019 Anova General
28/61
One-Way ANOVAF Test Example
You want to see if three
different golf clubs yield
different distances. You
randomly select fivemeasurements from trials on
an automated driving
machine for each club. At the
0.05 significance level, isthere a difference in mean
distance?
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197237 227 206
251 216 204
-
8/12/2019 Anova General
29/61
One-Way ANOVA Example:Scatter Diagram
270
260
250
240
230
220
210200
190
Distance
1X
2X
3X
X
227.0x
205.8x226.0x249.2x 321
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197237 227 206
251 216 204
Club1 2 3
-
8/12/2019 Anova General
30/61
One-Way ANOVA ExampleComputations
Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204
X1= 249.2
X2= 226.0
X3= 205.8
X = 227.0
n1= 5
n2= 5
n3= 5
n = 15
c = 3
SSA = 5 (249.2227)2+ 5 (226227)2+ 5 (205.8227)2 = 4716.4
SSW = (254249.2)2+ (263249.2)2++ (204 205.8)2= 1119.6
MSA = 4716.4 / (3-1) = 2358.2
MSW = 1119.6 / (15-3) = 93.325.275
93.3
2358.2F
-
8/12/2019 Anova General
31/61
F= 25.275
One-Way ANOVA ExampleSolution
H0: 1= 2= 3
H1: jnot all equal
= 0.05
df1= 2 df2= 12
Test Statistic:
Decision:
Conclusion:
Reject H0at = 0.05
There is evidence that
at least one j differs
from the rest
0
= .05
FU
= 3.89
Reject H0Do notreject H0
25.275
93.3
2358.2
MSW
MSAF
Critical
Value:
FU
= 3.89
-
8/12/2019 Anova General
32/61
SUMMARY
Groups Count Sum Average Variance
Club 1 5 1246 249.2 108.2
Club 2 5 1130 226 77.5Club 3 5 1029 205.8 94.2
ANOVA
Source of
VariationSS df MS F P-value F crit
BetweenGroups
4716.4 2 2358.2 25.275 4.99E-05 3.89
Within
Groups1119.6 12 93.3
Total 5836.0 14
One-Way ANOVAExcel Output
EXCEL: tools | data analysis | ANOVA: single factor
-
8/12/2019 Anova General
33/61
The Tukey-Kramer Procedure
Tells whichpopulation means are significantlydifferent e.g.: 1= 23
Done after rejection of equal means in ANOVA
Allows pair-wise comparisons
Compare absolute mean differences with criticalrange
x1 = 2 3
-
8/12/2019 Anova General
34/61
Tukey-Kramer Critical Range
where:
QU = Value from Studentized Range Distribution
with c and n - c degrees of freedom for
the desired level of (see appendix E.9 table)
MSW = Mean Square Within
njand nj= Sample sizes from groups j and j
j'j
U
n
1
n
1
2
MSWQRangeCritical
-
8/12/2019 Anova General
35/61
The Tukey-Kramer Procedure:Example
1. Compute absolute meandifferences:Club 1 Club 2 Club 3
254 234 200
263 218 222
241 235 197
237 227 206
251 216 204 20.2205.8226.0xx
43.4205.8249.2xx
23.2226.0249.2xx
32
31
21
2. Find the QUvalue from the table in appendix E.10 with
c = 3 and (nc) = (153) = 12 degrees of freedomfor the desired level of (= 0.05 used here):
3.77QU
-
8/12/2019 Anova General
36/61
The Tukey-Kramer Procedure:Example
5. All of the absolute mean differencesare greater than critical range.Therefore there is a significant
difference between each pair ofmeans at 5% level of significance.Thus, with 95% confidence we can concludethat the mean distance for club 1 is greaterthan club 2 and 3, and club 2 is greater than
club 3.
16.2855
1
5
1
2
93.33.77
n
1
n
1
2
MSWQRangeCritical
j'j
U
3. Compute Critical Range:
20.2xx
43.4xx
23.2xx
32
31
21
4. Compare:
(continued)
-
8/12/2019 Anova General
37/61
The Randomized Block Design
Like One-Way ANOVA, we test for equal
population means (for different factor levels, for
example)...
...but we want to control for possible variation
from a second factor (with two or more levels)
Levels of the secondary factor are called blocks
-
8/12/2019 Anova General
38/61
Partitioning the Variation
Total variation can now be split into three parts:
SST = Total variation
SSA = Among-Group variation
SSBL = Among-Block variation
SSE = Random variation
SST = SSA + SSBL + SSE
-
8/12/2019 Anova General
39/61
Sum of Squares for Blocking
Where:
c = number of groups
r = number of blocks
Xi.= mean of all values in block i
X = grand mean (mean of all data values)
r
1i
2
i. )XX(cSSBL
SST = SSA + SSBL + SSE
-
8/12/2019 Anova General
40/61
Partitioning the Variation
Total variation can now be split into three parts:
SST and SSA are
computed as they werein One-Way ANOVA
SST = SSA + SSBL + SSE
SSE = SST(SSA + SSBL)
-
8/12/2019 Anova General
41/61
Mean Squares
1c
SSAgroupsamongsquareMeanMSA
1r
SSBLblockingsquareMeanMSBL
)1)(1(
cr
SSEMSE errorsquareMean
-
8/12/2019 Anova General
42/61
Randomized Block ANOVA Table
Source of
VariationdfSS MS
Among
BlocksSSBL MSBL
Error
(r1)(c-1)SSE MSE
Total rc - 1SST
r - 1 MSBL
MSE
F ratio
c = number of populations rc = sum of the sample sizes from all populations
r = number of blocks df = degrees of freedom
Among
Treatments SSA c - 1 MSAMSA
MSE
-
8/12/2019 Anova General
43/61
Blocking Test
Blocking test: df1= r1
df2= (r1)(c1)
MSBL
MSE
...:H 3.2.1.0
equalaremeansblockallNot:H1
F =
Reject H0 if F > FU
-
8/12/2019 Anova General
44/61
Main Factor test: df1= c1
df2= (r1)(c1)
MSA
MSE
c..3.2.10 ...:H
equalaremeanspopulationallNot:H1
F =
Reject H0 if F > FU
Main Factor Test
-
8/12/2019 Anova General
45/61
The Tukey Procedure
To test whichpopulation means are significantlydifferent e.g.: 1= 23 Done after rejection of equal means in randomized
block ANOVA design
Allows pair-wise comparisons Compare absolute mean differences with critical
range
x= 1 2 3
-
8/12/2019 Anova General
46/61
etc...
xx
xx
xx
.3.2
.3.1
.2.1
The Tukey Procedure(continued)
r
MSERangeCritical uQ
If the absolute mean differenceis greater than the critical rangethen there is a significantdifference between that pair ofmeans at the chosen level ofsignificance.
Compare:?RangeCriticalxxIs .j'.j
F t i l D i
-
8/12/2019 Anova General
47/61
Factorial Design:
Two-Way ANOVA
Examines the effect of
Two factors of intereston the dependent
variable
e.g., Percent carbonation and line speed on soft drinkbottling process
Interaction between the different levelsof these
two factors
e.g., Does the effect of one particular carbonation
level depend on which level the line speed is set?
-
8/12/2019 Anova General
48/61
Two-Way ANOVA
Assumptions
Populations are normally distributed
Populations have equal variances
Independent random samples are
drawn
(continued)
T W ANOVA
-
8/12/2019 Anova General
49/61
Two-Way ANOVASources of Variation
Two Factors of interest: A and B
r = number of levels of factor A
c = number of levels of factor B
n= number of replications for each cell
n = total number of observations in all cells
(n = rcn)
Xijk= value of the kthobservation of level i of
factor A and level j of factor B
T W ANOVA
-
8/12/2019 Anova General
50/61
Two-Way ANOVASources of Variation
SST
Total Variation
SSAFactor A Variation
SSBFactor B Variation
SSAB
Variation due to interactionbetween A and B
SSERandom variation (Error)
Degrees of
Freedom:
r1
c1
(r1)(c1)
rc(n 1)
n - 1
SST = SSA + SSB + SSAB + SSE
(continued)
-
8/12/2019 Anova General
51/61
Two Factor ANOVA Equations
r
1i
c
1j
n
1k
2
ijk )XX(SST
2r
1i
..i )XX(ncSSA
2c
1j
.j. )XX(nrSSB
Total Variation:
Factor A Variation:
Factor B Variation:
-
8/12/2019 Anova General
52/61
Two Factor ANOVA Equations
2
r
1i
c
1j
.j.i..ij . )XXXX(nSSAB
r
1i
c
1j
n
1k
2.ijijk )XX(SSE
Interaction Variation:
Sum of Squares Error:
(continued)
-
8/12/2019 Anova General
53/61
Two Factor ANOVA Equations
where:MeanGrand
nrc
X
X
r
1i
c
1j
n
1k
ijk
r)...,2,1,(iAfactorofleveliofMeannc
X
X th
c
1j
n
1k
ijk
..i
c)...,2,1,(jBfactorofleveljofMeannr
X
X
th
r
1i
n
1k
ijk
.j.
ijcellofMeann
XX
n
1k
ijk.ij
r = number of levels of factor A
c = number of levels of factor B
n= number of replications in each cell
(continued)
-
8/12/2019 Anova General
54/61
Mean Square Calculations
1r
SSAAfactorsquareMeanMSA
1c
SSBBfactorsquareMeanMSB
)1c)(1r(
SSABninteractiosquareMeanMSAB
)1'n(rc
SSEerrorsquareMeanMSE
T W ANOVA
-
8/12/2019 Anova General
55/61
Two-Way ANOVA:The F Test Statistic
F Test for Factor B Effect
F Test for Interaction Effect
H0: 1.. = 2.. = 3..=
H1: Not all i..are equal
H0: the interaction of A and B is
equal to zero
H1: interaction of A and B is not
zero
F Test for Factor A Effect
H0: .1. = .2. = .3.=
H1: Not all .j.are equal
Reject H0
if F > FUMSE
MSAF
MSE
MSBF
MSE
MSABF
Reject H0
if F > FU
Reject H0
if F > FU
T o Wa ANOVA
-
8/12/2019 Anova General
56/61
Two-Way ANOVASummary Table
Source of
Variation
Sum of
Squares
Degrees of
Freedom
Mean
Squares
F
Statistic
Factor A SSA r1MSA
= SSA/(r1)
MSA
MSE
Factor B SSB c1MSB
= SSB/(c1)
MSB
MSE
AB
(Interaction)
SSAB (r1)(c1)MSAB
= SSAB/ (r1)(c1)
MSAB
MSE
Error SSE rc(n1)MSE =
SSE/rc(n1)
Total SST n1
Features of Two Way ANOVA
-
8/12/2019 Anova General
57/61
Features of Two-Way ANOVAFTest
Degrees of freedom always add up
n-1 = rc(n-1) + (r-1) + (c-1) + (r-1)(c-1)
Total = error + factor A + factor B + interaction
The denominator of the FTest is always the
same but the numerator is different
The sums of squares always add up
SST = SSE + SSA + SSB + SSAB
Total = error + factor A + factor B + interaction
Examples:
-
8/12/2019 Anova General
58/61
Examples:Interaction vs. No Interaction
No interaction:
Factor B Level 1
Factor B Level 3
Factor B Level 2
Factor A Levels
Factor B Level 1
Factor B Level 3
Factor B Level 2
Factor A Levels
MeanResponse
Me
anResponse
Interaction is
present:
Multiple Comparisons:
-
8/12/2019 Anova General
59/61
Multiple Comparisons:
The Tukey Procedure
Unless there is a significant interaction, you
can determine the levels that are significantly
different using the Tukey procedure
Consider all absolute mean differences and
compare to the calculated critical range
Example: Absolute differences
for factor A, assuming three factors:
3..2..
3..1..
2..1..
XX
XX
XX
Multiple Comparisons:
-
8/12/2019 Anova General
60/61
Multiple Comparisons:
The Tukey Procedure
Critical Range for Factor A:
(where Quis from Table E.10 with r and rc(n1) d.f.)
Critical Range for Factor B:
(where Quis from Table E.10 with c and rc(n1) d.f.)
n'c
MSERangeCritical UQ
n'r
MSERangeCritical UQ
-
8/12/2019 Anova General
61/61
Chapter Summary
Described one-way analysis of variance
The logic of ANOVA
ANOVA assumptions
F test for difference in c means
The Tukey-Kramer procedure for multiple comparisons
Considered the Randomized Block Design
Treatment and Block Effects
Multiple Comparisons: Tukey Procedure
Described two-way analysis of variance
Examined effects of multiplefactors
Examined interaction between factors