2.16 Analysis of Variance (ANOVA) Rev DD 20100604
-
8/6/2019 2.16 Analysis of Variance (ANOVA) Rev DD 20100604
1/71
2.16 Analysis of Variance (ANOVA)
Six Sigma Black Belt and Green Belt
Week 2
Revised 4th
June 2010
-
2010-06-04 SKF Group Slide 1 SKF (Group Six Sigma) 2.16 Analysis of Variance (ANOVA)
Objectives
To introduce ANOVA hypothesis testing
Graphical method for analysing differences between means obtained
from two or more samples
Analysis of Variance (ANOVA) methods for analysing the differences between means
To understand the relationship of
"within" subgroup estimates of variation and
"between" subgroup estimates of variation
To understand how effect size is measured
To practice examples
To introduce the Post Hoc test
-
SKF Six Sigma roadmap
Six Sigma methodology and roadmap for common tool usage
-
By knowing and controlling the Xs, we reduce the variability in Y.
We validate Xs and Ys with hypothesis testing.
Variable
(Continuous)
Variables with categories
(Attribute)
Validating key process inputs and outputs with ANOVA
$Y = f(X_1, X_2, X_3, \ldots, X_n)$
-
Method
Uses sums of squared differences, just like a standard deviation, to
evaluate the total variability of the system
Calculates "standard deviations" for each source and subtracts their
variability from the total
-
ANOVA graphical
Between subgroup variation (signal)
Within subgroup variation (error)
-
Degree of freedom introduction
Degrees of freedom (df) is the number of independent comparisons
available to estimate a specific statistic.
In ANOVA, the degrees of freedom are based on the total number of
responses and the number of levels at which factors are tested.
What is the minimum number of comparisons it would take to determine which person is the shortest?
-
The degree of freedom concept?
Example:
Consider a sample of n = 3 scores with a mean of X-bar = 5. The first
score in the sample can be selected without any restrictions; all
scores are independent of each other and they can have any value.
For this demonstration, assume X = 2 is obtained for the first score and X = 9 for the second.
At this point, however, the third score is fully determined.
-
The degree of freedom concept?
In this case the third score must be X = 4.
The reason is that the entire sample of n = 3 scores has a mean of X-bar = 5, which means that the sum
of the scores must be ΣX = 15. The first two scores add up to 11
(= 9 + 2), so the third score must be X = 4.
In this case the first two out of three scores were free to have any
value, but the final score was dependent on the values chosen for the first two. With a sample of n scores, the first n-1 scores are free to
vary, but the final score is determined by the others.
As a result, the sample is said to have n-1 degrees of freedom (df). The degrees of freedom give the number of scores in the sample which are independent and free to vary.
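The argument above can be checked with a few lines of Python. This is only an illustrative sketch using the numbers from the example (n = 3, X-bar = 5, first two scores 2 and 9); the variable names are made up for the illustration.

```python
# Illustration of n - 1 degrees of freedom, using the numbers from the example.
n = 3                 # sample size
mean = 5              # required sample mean (X-bar)
free_scores = [2, 9]  # the first n - 1 scores, chosen freely

# The mean fixes the total, so the last score has no freedom left.
required_total = n * mean
last_score = required_total - sum(free_scores)
df = n - 1

print(last_score)  # 4
print(df)          # 2
```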
-
Thus in other words ...
Degrees of freedom are "statistical cash" ...
We "earn" a degree of freedom for every data point we collect
We "spend" a degree of freedom for every parameter we estimate
Degrees of freedom (within groups):
Earn a degree of freedom for each observation within each group
Spend one degree of freedom to calculate the average for each group
$df_W = n - 1$ for each group, where n = sample size per treatment
Degrees of freedom (between groups):
Earn a degree of freedom for each group
Spend one degree of freedom to calculate the overall average
$df_B = k - 1$, where k = number of group averages (number of treatments)
-
Degree of freedom and ANOVA
The n-1 degrees of freedom for a sample is the same n-1 that is used in the formulas for sample variance and sample standard deviation.
Remember, variance is defined as the mean squared deviation. A
mean is computed by finding the sum and dividing by the number of scores:
Mean = Sum / Number of scores
To calculate sample variance (mean squared deviation), we find the sum of the squared deviations (SS) and divide by the number of scores that are free to vary. This number is n-1 = df.
$s^2 = \frac{SS}{df} = \frac{\text{Sum of squared deviations}}{\text{Number of scores free to vary}}$
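As a small worked check of the formula, the sketch below computes SS, df and s² for the three scores from the degrees-of-freedom example (2, 9 and 4). The comparison with Python's standard `statistics.variance` is added here only as a cross-check.

```python
import statistics

scores = [2, 9, 4]      # the three scores from the degrees-of-freedom example
n = len(scores)
mean = sum(scores) / n  # X-bar = 5.0

ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations (SS)
df = n - 1                                 # scores that are free to vary
s2 = ss / df                               # sample variance

print(ss, df, s2)                   # 26.0 2 13.0
print(statistics.variance(scores))  # same value from the standard library
```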
-
Calculate the F statistic: example of seal life by shift (1)
Data collection:
        Shift 1   Shift 2   Shift 3
        25.40     23.40     20.00
        26.31     21.80     22.20
        24.10     23.50     19.75
        23.74     22.75     20.60
        25.10     21.60     20.40
Mean    24.93     22.61     20.59
df      (5-1)     (5-1)     (5-1)
Overall average = 22.71
df_within = (4) + (4) + (4) = 12
-
Calculate the F statistic: example of seal life by shift (2)
[Figure: seal life by shift, showing the mean of shift 1 (24.93), shift 2 (22.61) and shift 3 (20.59) against the overall average of 22.71]
-
Calculate the F statistic: example of seal life by shift (3)
$s^2_{between} = \frac{SS_{between}}{df_{between}} = \frac{47.164}{3 - 1} = 23.582$ (3 = number of shifts)
$s^2_{within} = \frac{SS_{within}}{df_{within}} = \frac{11.0532}{15 - 3} = 0.9211$ (15 = total data available)
$F_{df_1, df_2} = \frac{s^2_{between}}{s^2_{within}} = \frac{23.582}{0.9211} = 25.60$
The F-distribution depends on two sets of degrees of freedom,
the df from each variance: df1 for the between variance and df2 for the within variance.
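The seal-life numbers can be reproduced from the raw data with plain Python. The sketch below is one possible way to organise the calculation (the dictionary layout and variable names are choices made for this illustration); it recomputes the between and within sums of squares and the F-ratio.

```python
# Seal life by shift, as listed in the data collection table.
shifts = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

all_values = [x for values in shifts.values() for x in values]
grand_mean = sum(all_values) / len(all_values)  # overall average

ss_between = 0.0  # variation of the shift means around the overall average
ss_within = 0.0   # variation of the observations around their own shift mean
for values in shifts.values():
    group_mean = sum(values) / len(values)
    ss_between += len(values) * (group_mean - grand_mean) ** 2
    ss_within += sum((x - group_mean) ** 2 for x in values)

df_between = len(shifts) - 1               # 3 - 1 = 2
df_within = len(all_values) - len(shifts)  # 15 - 3 = 12
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"SS between = {ss_between:.3f}, s2 between = {ms_between:.3f}")
print(f"SS within  = {ss_within:.4f}, s2 within  = {ms_within:.4f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
```

The printed values should agree with the hand calculation: 47.164 and 23.582 for the between part, 11.0532 and 0.9211 for the within part, and an F-ratio of 25.60.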
-
What is the distribution of the F-ratio?
This is the distribution of F-ratios that would occur if there was no difference in group means.
For example, say I'm willing to take a 5% chance of being wrong by
saying there is more between than within variation.
[Figure: F-distribution curve with Fcritical at 5%; 5% of the total area lies to the right of this F value, Fcrit]
The curve changes as a function of the numerator df and denominator df.
The 5% area represents the amount of risk I'm willing to take of being wrong when I say that I've found this factor to be a significant effect.
A calculated F-ratio > Fcrit means that, if there were no difference in group means, there would be less than a 5% chance of seeing a between variation this large by chance alone.
Remember: you choose the amount of risk to take, then find the corresponding Fcritical.
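For the seal-life example (df1 = 2, df2 = 12), Fcritical and the tail area can be computed without tables. The sketch below relies on a closed form that holds only when the numerator df equals 2, where P(F > x) = (1 + 2x/df2)^(-df2/2); the function names are made up for this illustration, and for other numerator df a statistics package or the F table is needed.

```python
def f_tail_area_num_df2(f_value, df_denominator):
    """Area to the right of f_value for an F distribution with numerator df = 2 only."""
    return (1.0 + 2.0 * f_value / df_denominator) ** (-df_denominator / 2.0)


def f_critical_num_df2(alpha, df_denominator):
    """F value with an area of alpha to its right (numerator df = 2 only)."""
    return (df_denominator / 2.0) * (alpha ** (-2.0 / df_denominator) - 1.0)


alpha = 0.05                 # the risk we choose to take
f_calculated = 25.60         # F-ratio from the seal-life example
df_within = 12               # denominator df from the seal-life example

f_crit = f_critical_num_df2(alpha, df_within)
tail_area = f_tail_area_num_df2(f_calculated, df_within)

print(round(f_crit, 2))      # 3.89
print(f_calculated > f_crit) # True: more between than within variation
print(f"{tail_area:.1e}")    # area to the right of the calculated F-ratio
```

Since the calculated F-ratio of 25.60 is far to the right of Fcrit, the area beyond it is much smaller than the 5% risk chosen.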
-
F-distribution table: probability points of the F-distribution
For each denominator df, the first row is for α = 0.05 and the second row is for α = 0.01.
Denom.                      Degrees of freedom for numerator (df)
df    α         1      2      3      4      5      6      7      8      9     10     15     20
 1   0.05   161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  245.9  248.0
     0.01    4052   5000   5403   5625   5764   5859   5928   5981   6022   6056   6157   6209
 2   0.05   18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40  19.43  19.45
     0.01   98.50  99.00  99.17  99.25  99.30  99.33  99.36  99.37  99.39  99.40  99.43  99.45
 3   0.05   10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79   8.70   8.66
     0.01   34.12  30.82  29.46  28.71  28.24  27.91  27.67  27.49  27.35  27.23  26.87  26.69
 4   0.05    7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96   5.86   5.80
     0.01   21.20  18.00  16.69  15.98  15.52  15.21  14.98  14.80  14.66  14.55  14.20  14.02
 5   0.05    6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74   4.62   4.56
     0.01   16.26  13.27  12.06  11.39  10.97  10.67  10.46  10.29  10.16  10.05   9.72   9.55
 6   0.05    5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06   3.94   3.87
     0.01   13.75  10.92   9.78   9.15   8.75   8.47   8.26   8.10   7.98   7.87   7.56   7.40
 7   0.05    5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64   3.51   3.44
     0.01   12.25   9.55   8.45   7.85   7.46   7.19   6.99   6.84   6.72   6.62   6.31   6.16
 8   0.05    5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35   3.22   3.15
     0.01   11.26   8.65   7.59   7.01   6.63   6.37   6.18   6.03   5.91   5.81   5.52   5.36
 9   0.05    5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14   3.01   2.94
     0.01   10.56   8.02   6.99   6.42   6.06   5.80   5.61   5.47   5.35   5.26   4.96   4.81
10   0.05    4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98   2.85   2.77
     0.01   10.04   7.56   6.55   5.99   5.64   5.39   5.20   5.06   4.94   4.85   4.56   4.41
15   0.05    4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54   2.40   2.33
     0.01    8.68   6.36   5.42   4.89   4.56   4.32   4.14   4.00   3.89   3.80   3.52   3.37
20   0.05    4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35   2.20   2.12
     0.01    8.10   5.85   4.94   4.43   4.10   3.87   3.70   3.56   3.46   3.37   3.09   2.94
-
Mean sum of squares (MS)
In ANOVA, we use the term Mean Square, or simply MS, instead of the term variance.
Remember that variance is defined as the mean of the squared deviations. In the same way that we use SS to stand for the sum of the squared deviations, we now will use MS to stand for the mean of the squared deviations. For the final F-ratio we will need an MS_between treatments for the numerator and an MS_within treatments for the denominator.
$F\text{-ratio} = \frac{MS_{between}}{MS_{within}}$
$MS_{between} = SS_{between} / df_{between}$
$MS_{within} = SS_{within} / df_{within}$
-
Partition of variance and F-ratio: overview
Total variability is split into:
Between treatments variance (signal), which measures differences due to:
Treatment effects and
Chance
Within treatments variance (error), which measures differences due to:
Chance
$\text{Variance } (MS_{between}) = SS_{between} / df_{between}$
$\text{Variance } (MS_{within}) = SS_{within} / df_{within}$
$F\text{-ratio} = \frac{MS_{between}}{MS_{within}}$
-
The p-value and ANOVA
Assumptions ...
H0: There are no differences between subgroup means
HA: There are differences between subgroup means
Low p-values suggest that there ARE differences between subgroup means.
Tip: P-value is low, H0 must go!
-
The R-squared value and ANOVA
To provide an indication of how large the effect actually is, we check not only the
p-value but also the R² value when deciding whether the result is robust or not.
For Analysis of Variance, the simplest and most direct way to measure effect size is to compute R², the percentage of variance accounted for.
In simpler terms, R² measures how much of the difference between scores is accounted for by the differences between treatments.
SS_between measures the variability accounted for by the treatment differences, and SS_total measures the total variability.
$R^2 = \frac{SS_{between}}{SS_{total}}$
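As a quick illustration, R² can be computed for the seal-life example from the two sums of squares quoted there (SS between = 47.164, SS within = 11.0532). The sketch assumes only that the total is the sum of the between and within parts.

```python
# Sums of squares quoted in the seal-life example.
ss_between = 47.164
ss_within = 11.0532

ss_total = ss_between + ss_within  # total variability
r_squared = ss_between / ss_total  # share accounted for by the shifts

print(f"R^2 = {r_squared:.1%}")    # R^2 = 81.0%
```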
-
The R-squared value and ANOVA - examples
[Figure: two bars, labelled A and B, each showing the sum of variance split into the variance explained by the factor (treatment) and the error, i.e. the part of the variance not explained by the factor (Xs). One model has R² = 90%, the other R² = 50%.]
Which model is more robust? A or B?
-
ANOVA assumptions
1. Normality
2. Homogeneity of variance (equal variances)
3. Independence of error
-
Independence of error
Errors should be independent for each value and over time
If not, then do not assume the test is valid
Identify why the error is not independent and correct it
We use control charts to check stability and detect special causes
-
Normality
The values in each group are Normally distributed
While the ANOVA method, like the t-test, is robust against departures
from normality, especially with large sample sizes, non-normal distributions where normality
would be expected may indicate an area of investigation
A Master Black Belt may be consulted when non-normal
data is being analysed (non-parametric tests)
-
Homogeneity of variance
The variance within each group is equal
However, if the sample sizes are equal between groups,
the F-test is robust enough for unequal variances
Always try to have equal sample sizes
If both normality and equal variances are violated, a Master Black Belt may be consulted
-
The p-value
For a classical hypothesis test, use the p-value to evaluate the probability of obtaining an F-ratio (or test statistic) at least as large as the calculated one if only within subgroup noise were present.
Low p-values suggest that there ARE differences between subgroup means:
H0: There are no differences between subgroup means
HA: There are differences between subgroup means
-
Examples to practice !
-
One-way ANOVA
Stat > ANOVA > One-Way
Data must be in one column and the subscripts in another
Can be used with balanced and unbalanced designs
A one-way analysis of variance (ANOVA) tests the hypothesis that
the means of several populations are equal
The method is an extension of the two-sample t-test, specifically
for the case where the population variances are assumed to be
equal. A one-way analysis of variance requires the following:
A response, or measurement taken from the units sampled
A factor, or discrete variable which is altered systematically
-
One-way ANOVA example: Tire brand test
Four cars:
1 2 3 4
Four brands of tires:
A B C D
Objective: To determine tread wear of tires after 30,000 km of driving.
Problem: How do we assign 16 tires to the 4 cars?
Assign each of the 16 tires at random to a wheel. (Large variability
within brands.)
Ref.: "Fundamental Concepts in the Design of Experiments" by Hicks and Turner
Data of tread wear of tires (difference in tread thickness in mm):
Car 1     Car 2     Car 3     Car 4
C (12)    A (14)    C (10)    A (13)
A (17)    A (13)    D (11)    D (9)
D (13)    B (14)    B (14)    B (8)
D (11)    C (12)    B (13)    C (9)
Model:
Tread wear = Overall mean + Brand effect + error
-
Data of tread wear of tires: each of the 16 tires assigned at random to a wheel
Car     Brand   Tread
One     C       12
One     A       17
One     D       13
One     D       11
Two     A       14
Two     A       13
Two     B       14
Two     C       12
Three   D       10
Three   C       11
Three   B       14
Three   B       13
Four    A       13
Four    D       9
Four    B       8
Four    C       9
Open the file ANOVA - Tire Brand.MTW and check the different assumptions:
Stability
Normality
Homogeneity of variance (equal variances)
-
One-way ANOVA: Stat > ANOVA > One-way
You wish to compare the mean tread wear for the different brands of tires.
H0 is that the mean tread wear is the same for all brands; any variation is caused by the random variation found in each brand.
HA is that different brands have different tread wear.
-
Normality and stability: One-way ANOVA residual plots
[Figure: Residual Plots for Tread, four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
Normal?
Stable?
-
Variance
[Figure: Test for Equal Variances for Tread, showing 95% Bonferroni Confidence Intervals for StDevs by Brand (A, B, C, D)]
Bartlett's Test: Test Statistic 1.52, P-Value 0.677
Levene's Test: Test Statistic 0.15, P-Value 0.926
Variances are equal?
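The Bartlett figures reported for this test can be reproduced by hand. The sketch below applies the textbook Bartlett statistic to the one-way tire data (grouped by brand from the Car/Brand/Tread listing) and converts it to a p-value with a closed form for the chi-square distribution that is valid only for 3 degrees of freedom (k - 1 with k = 4 brands). It is an illustration of the calculation, not a description of how MINITAB computes it internally; Levene's test is not covered here.

```python
import math

# Tread wear grouped by brand, from the one-way data listing.
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 11, 9],
    "D": [13, 11, 10, 9],
}


def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


k = len(tread_by_brand)
n_total = sum(len(v) for v in tread_by_brand.values())
variances = {brand: sample_variance(v) for brand, v in tread_by_brand.items()}

# Pooled variance: each group variance weighted by its degrees of freedom.
pooled = sum((len(v) - 1) * variances[b] for b, v in tread_by_brand.items()) / (n_total - k)

numerator = (n_total - k) * math.log(pooled) - sum(
    (len(v) - 1) * math.log(variances[b]) for b, v in tread_by_brand.items()
)
correction = 1 + (
    sum(1 / (len(v) - 1) for v in tread_by_brand.values()) - 1 / (n_total - k)
) / (3 * (k - 1))
bartlett_stat = numerator / correction


def chi2_tail_area_3df(x):
    """Area to the right of x for a chi-square distribution with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_tail_area_3df(bartlett_stat)

print(f"Bartlett test statistic = {bartlett_stat:.2f}")  # 1.52
print(f"P-value = {p_value:.3f}")                        # 0.677
```

A p-value this high gives no reason to reject the assumption of equal variances.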
-
One-way ANOVA example
1. Open the file ANOVA - Tire Brand.MTW
2. Select Stat > ANOVA > One-Way
3. Select Tread for the Response and Brand for the Factor
4. Click on OK
-
Interpreting the One-way ANOVA: output from the session window (MINITAB)
One-way ANOVA: Tread versus Brand
Source  DF     SS     MS     F      P
Brand    3  30.69  10.23  2.44  0.115
Error   12  50.25   4.19
Total   15  80.94
S = 2.046   R-Sq = 37.92%   R-Sq(adj) = 22.39%
Individual 95% CIs For Mean Based on Pooled StDev
Level  N    Mean  StDev
A      4  14.250  1.893
B      4  12.250  2.872
C      4  11.000  1.414
D      4  10.750  1.708
[Interval plot of the individual 95% CIs, axis from 10.0 to 16.0]
Pooled StDev = 2.046
The 1st row "Brand" gives the stats for the variation between the means of the factor levels.
The 2nd row "Error" gives the stats for the variation due to random error.
The 3rd row "Total" gives the stats for the overall variability in the data.
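To see where the three rows come from, the table can be rebuilt from the one-way tire data with plain Python. The sketch below is only an illustration (the names are chosen here, and the P column is left out because it needs the F distribution); it recomputes DF, SS, MS, F, S and the two R-Sq values.

```python
import math

# Tread wear grouped by brand, from the one-way data listing.
tread_by_brand = {
    "A": [17, 14, 13, 13],
    "B": [14, 14, 13, 8],
    "C": [12, 12, 11, 9],
    "D": [13, 11, 10, 9],
}

values = [x for group in tread_by_brand.values() for x in group]
n_total = len(values)
k = len(tread_by_brand)
grand_mean = sum(values) / n_total

# "Total" row: overall variability in the data.
ss_total = sum((x - grand_mean) ** 2 for x in values)
# "Brand" row: variation between the means of the factor levels.
ss_brand = sum(
    len(group) * (sum(group) / len(group) - grand_mean) ** 2
    for group in tread_by_brand.values()
)
# "Error" row: what is left, the variation within brands.
ss_error = ss_total - ss_brand

df_brand, df_error, df_total = k - 1, n_total - k, n_total - 1
ms_brand = ss_brand / df_brand
ms_error = ss_error / df_error
f_ratio = ms_brand / ms_error

s = math.sqrt(ms_error)                           # pooled standard deviation
r_sq = ss_brand / ss_total                        # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)   # R-Sq(adj)

print(f"Brand  {df_brand:2d}  {ss_brand:6.2f}  {ms_brand:6.2f}  {f_ratio:5.2f}")
print(f"Error  {df_error:2d}  {ss_error:6.2f}  {ms_error:6.2f}")
print(f"Total  {df_total:2d}  {ss_total:6.2f}")
print(f"S = {s:.3f}  R-Sq = {r_sq:.2%}  R-Sq(adj) = {r_sq_adj:.2%}")
```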
-
Interpreting the One-way ANOVA
Using the session window output above (One-way ANOVA: Tread versus Brand):
1. What is your decision?
2. Is the result robust or not, and why?
3. Which Brand is best?
-
Two-way ANOVA: using a 2nd variable to block Car variation
Assign each tire at random but under the condition that each brand occurs exactly once on each car.
Reduces unexplained variability.
Data of tread wear of tires (difference in tread thickness in mm):
Car 1     Car 2     Car 3     Car 4
B (14)    D (11)    A (13)    C (9)
C (12)    C (12)    B (13)    D (9)
A (17)    B (14)    D (11)    B (8)
D (13)    A (14)    C (10)    A (13)
Model:
Tread wear = Overall mean + Brand effect + Car effect + error
-
Data of tread wear of tires: each brand occurs exactly once on each car
Car     Brand   Tread
One     B       14
One     C       12
One     A       17
One     D       13
Two     D       11
Two     C       12
Two     B       14
Two     A       14
Three   A       13
Three   B       13
Three   D       11
Three   C       10
Four    C       9
Four    D       9
Four    B       8
Four    A       13
Open the file ANOVA - Tire Brand Car.MTW and check the assumptions:
Stability
Normality
Homogeneity of variance (equal variances)
-
Two-way ANOVA example: Tire brand test
1. Open the file ANOVA - Tire Brand Car.MTW
2. Select Stat > ANOVA > Two-Way
3. Select Tread for the Response, Brand for the Row factor and Car for the Column factor. Check Display means.
4. Click on OK
-
Interpreting the Two-way ANOVA: output from the session window (MINITAB)
Two-way ANOVA: Tread versus Brand, Car
Source  DF       SS       MS      F      P
Brand    3  30.6875  10.2292   7.96  0.007
Car      3  38.6875  12.8958  10.04  0.003
Error    9  11.5625   1.2847
Total   15  80.9375
S = 1.133   R-Sq = 85.71%   R-Sq(adj) = 76.19%
p-values are low for Car and Brand, therefore: Brands are not the same, and Tread loss for Cars is not the same.
Let's look at the residuals plots ...
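The effect of blocking on Car can be seen by rebuilding the sums of squares from the two-way data listing. The sketch below assumes the additive model (Brand effect + Car effect + error, no interaction) and a balanced layout with one tire per brand on each car; the helper name `factor_ss` is made up for the illustration, and the P column is not computed.

```python
observations = [
    # (car, brand, tread wear) from the two-way data listing
    ("One", "B", 14), ("One", "C", 12), ("One", "A", 17), ("One", "D", 13),
    ("Two", "D", 11), ("Two", "C", 12), ("Two", "B", 14), ("Two", "A", 14),
    ("Three", "A", 13), ("Three", "B", 13), ("Three", "D", 11), ("Three", "C", 10),
    ("Four", "C", 9), ("Four", "D", 9), ("Four", "B", 8), ("Four", "A", 13),
]


def factor_ss(rows, index, grand_mean):
    """Sum of squares between the level means of the factor in column `index`."""
    levels = {}
    for row in rows:
        levels.setdefault(row[index], []).append(row[-1])
    return sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in levels.values())


treads = [row[-1] for row in observations]
grand_mean = sum(treads) / len(treads)

ss_total = sum((y - grand_mean) ** 2 for y in treads)
ss_car = factor_ss(observations, 0, grand_mean)
ss_brand = factor_ss(observations, 1, grand_mean)
ss_error = ss_total - ss_car - ss_brand  # what neither Brand nor Car explains

df_car = df_brand = 3                    # 4 levels - 1 for each factor
df_error = (len(treads) - 1) - df_car - df_brand
ms_error = ss_error / df_error
f_brand = (ss_brand / df_brand) / ms_error
f_car = (ss_car / df_car) / ms_error
r_sq = 1 - ss_error / ss_total

print(f"Brand: SS = {ss_brand:.4f}, F = {f_brand:.2f}")
print(f"Car:   SS = {ss_car:.4f}, F = {f_car:.2f}")
print(f"Error: SS = {ss_error:.4f}, df = {df_error}, MS = {ms_error:.4f}")
print(f"R-Sq = {r_sq:.2%}")
```

The Brand sum of squares is the same as in the one-way analysis (30.69), but the error sum of squares drops from 50.25 to 11.5625 because the Car variation is no longer counted as error, which is why the F-ratio for Brand rises from 2.44 to 7.96.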
-
Two-way ANOVA Residual plots
[Figure: Residual Plots for Tread, four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
The residual plots show no unusual observations.
The histogram is not bell shaped (only 16 observations), so it is hard to interpret.
-
Interpreting the Two-way ANOVA: output from the session window (MINITAB)
Two-way ANOVA: Tread versus Brand, Car
Individual 95% CIs For Mean Based on Pooled StDev
Brand   Mean
A       14.25
B       12.25
C       10.75
D       11.00
[Interval plot of the individual 95% CIs by Brand, axis from 9.6 to 14.4]
Individual 95% CIs For Mean Based on Pooled StDev
Car     Mean
Four     9.75
One     14.00
Three   11.75
Two     12.75
[Interval plot of the individual 95% CIs by Car, axis from 10.0 to 16.0]
The confidence intervals show: Brands are not the same, and Tread loss for Cars is not the same.
1. What is your decision?
2. Is the result robust or not, and why?
3. Which factor is significant?
-
Let's look at the data graphically
Graph > Chart > Values from a table
-
Let's look at the data graphically
Graph > Chart > Values from a table > Data View
-
Displaying the Two-way ANOVA design
[Figure: Chart of Tread (axis 0 to 18), bars grouped by Car (One, Two, Three, Four) with Brand (B, C, A, D) within each car]
All 4 Brands performed better in Car One. This is an assignable difference due to Car.
Also, it appears that Brand A performs better at each Car than the other Brands.
-
Displaying the Two-way ANOVA design
[Figure: Chart of Tread (axis 0 to 18), bars grouped by Brand (B, C, A, D) with Car (One, Two, Three, Four) within each brand]
Here we are trying to discover which Brand of Tires had the best Tread Wear characteristics.
We included a blocking variable to explain some of the variability. Based on a comparison of the bar chart and the ANOVA table, which Brand should be selected?
-
Three-way ANOVA: using a Latin Square design
Each brand appears once in each position and only once on each car (2 restrictions on randomisation).
Minimises variability.
Model:
Tread wear = Overall mean + Brand effect + Car effect + Position effect + error
Data of tread wear of tires (difference in tread thickness in mm):
Position  Car 1     Car 2     Car 3     Car 4
I         C (12)    D (11)    A (13)    B (8)
II        B (14)    C (12)    D (11)    A (13)
III       A (17)    B (14)    C (10)    D (9)
IV        D (13)    A (14)    B (13)    C (9)
-
Each brand appears once in each position and each car
Car     Position      Brand   Tread
One     Left Front    B       14
One     Right Front   C       12
One     Left Back     A       17
One     Right Back    D       13
Two     Left Front    D       11
Two     Right Front   C       12
Two     Left Back     B       14
Two     Right Back    A       14
Three   Left Front    A       13
Three   Right Front   B       13
Three   Left Back     D       11
Three   Right Back    C       10
Four    Left Front    C       9
Four    Right Front   D       9
Four    Left Back     B       8
Four    Right Back    A       13
Open the file ANOVA - Tire Brand Car Position.MTW and check the assumptions:
Stability
Normality
Homogeneity of variance (equal variances)
-
Three-way ANOVA: Stat > ANOVA > General Linear Model
Fill out the dialog box as shown.
Click OK.
-
Interpreting the General Linear Model: output from the session window (MINITAB)
General Linear Model: Tread versus Car, Position, Brand
Factor    Type   Levels  Values
Car       fixed       4  Four, One, Three, Two
Position  fixed       4  Left Back, Left Front, Right Back, Right Front
Brand     fixed       4  A, B, C, D
Analysis of Variance for Tread, using Adjusted SS for Tests
Source    DF   Seq SS   Adj SS   Adj MS      F      P
Car        3  38.6875  38.6875  12.8958  14.40  0.004
Position   3   6.1875   6.1875   2.0625   2.30  0.177
Brand      3  30.6875  30.6875  10.2292  11.42  0.007
Error      6   5.3750   5.3750   0.8958
Total     15  80.9375
S = 0.946485   R-Sq = 93.36%   R-Sq(adj) = 83.40%
The 1st half of the table lists the values for each level of each factor.
The 2nd half is the ANOVA table.
Two factors are statistically significant at the α = 0.05 level: Car, Brand. Factor Position doesn't appear to be a significant effect.
The residual plots will confirm whether the basic assumptions about the error have been met.
Let's look at the residuals plots ...
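The sums of squares and F-ratios in this table can be reproduced from the Latin Square layout (positions I to IV by cars 1 to 4). The sketch below is an illustration under the additive model with no interactions; the data structure and names are choices made here, and the P column is not computed. It also checks the Latin Square property itself, i.e. that every brand occurs once per position and once per car.

```python
# Rows are positions I-IV, columns are cars 1-4; entries are (brand, tread wear).
latin_square = [
    [("C", 12), ("D", 11), ("A", 13), ("B", 8)],
    [("B", 14), ("C", 12), ("D", 11), ("A", 13)],
    [("A", 17), ("B", 14), ("C", 10), ("D", 9)],
    [("D", 13), ("A", 14), ("B", 13), ("C", 9)],
]

rows = [
    {"Position": p, "Car": c, "Brand": brand, "Tread": tread}
    for p, line in enumerate(latin_square, start=1)
    for c, (brand, tread) in enumerate(line, start=1)
]

# Latin Square check: each (brand, position) and (brand, car) pair occurs once.
is_latin = (
    len({(r["Brand"], r["Position"]) for r in rows}) == 16
    and len({(r["Brand"], r["Car"]) for r in rows}) == 16
)

grand_mean = sum(r["Tread"] for r in rows) / len(rows)
ss_total = sum((r["Tread"] - grand_mean) ** 2 for r in rows)


def factor_ss(factor):
    """Sum of squares between the level means of one factor."""
    groups = {}
    for r in rows:
        groups.setdefault(r[factor], []).append(r["Tread"])
    return sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())


ss = {factor: factor_ss(factor) for factor in ("Car", "Position", "Brand")}
ss_error = ss_total - sum(ss.values())

df_factor = 3                                  # 4 levels - 1 for every factor
df_error = (len(rows) - 1) - 3 * df_factor     # 15 - 9 = 6
ms_error = ss_error / df_error
f_ratios = {factor: (ss[factor] / df_factor) / ms_error for factor in ss}

print("Latin square:", is_latin)
for factor in ss:
    print(f"{factor:9s} SS = {ss[factor]:7.4f}  F = {f_ratios[factor]:5.2f}")
print(f"Error     SS = {ss_error:7.4f}  MS = {ms_error:.4f}")
print(f"R-Sq = {1 - ss_error / ss_total:.2%}")
```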
-
General Linear Model Residual plots
[Figure: Residual Plots for Tread, four panels: Normal Probability Plot (Percent vs. Residual), Versus Fits (Residual vs. Fitted Value), Histogram (Frequency vs. Residual), Versus Order (Residual vs. Observation Order)]
Review the residual plots and state the conclusions about the assumptions regarding error, i.e. that the errors for each treatment level are independent and normally distributed with a mean = 0 and a constant variance.
-
ANOVA example with GLM
How to include the interaction in the model?
1. Select Stat > ANOVA > General Linear Model
2. Select Tread for Response and Car and Brand for Model. For the interaction we create Car*Brand.
3. Click on OK
-
GLM with an unbalanced and nested design
Four chemical companies produce insecticides that can be used to kill mosquitoes, but the composition of the insecticides differs from company to company.
An experiment is conducted to test the efficacy of the insecticides by placing 400 mosquitoes inside a glass container treated with a single insecticide and counting the live mosquitoes 4 hours later.
Three replications are performed for each product.
The goal is to compare the product effectiveness of the different companies.
The factors are fixed because you are interested in comparing the particular brands.
The factors are nested because each insecticide for each company is unique.
You use GLM to analyse your data because the design is unbalanced:
Company A: 3 types of products
Company B: 2 types of products
Company C: 2 types of products
Company D: 4 types of products
-
GLM with an unbalanced and nested design
1. Select Stat > ANOVA > General Linear Model
2. Select NMosquito for Response and Company and Product for Model. For the nested design add (Company).
3. Click on OK
-
GLM with an unbalanced and nested design
ANOVA table in session window
1. What is your decision?
2. Which parameter is significant?
-
Multi-way ANOVA
Two-way, Balanced, General Linear Model.
Two-way ANOVA may also be used to analyse a design where there are two controllable factors, both of which are of interest.
More than two factors can be analysed using Balanced ANOVA or General Linear Model.
There may be more than one factor that has an effect on the response variable.
This commonly occurs in manufacturing processes. It is often wise to include more than one factor in the analysis.
Valuable resources can be used more efficiently by investigating several factors at one time.
More of the error can be explained by including additional factors in the model.
By including more factors, interactions can be studied.
What about the other ANOVA options? When are they appropriate?
One-way ANOVA
Studies the effect of one factor at various levels on a response variable.
Two-way ANOVA
Studies the effect of two factors and their interaction at various levels on a response variable.
Balanced ANOVA
Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The levels of factors are structured such that there are an equal number of levels and observations within each level for each factor.
General Linear Model
Studies the impact of 2 or more factors and their interactions at various levels on a response variable. The number of levels and observations may vary. The factors may be a mixture of nested and crossed relationships. The user must specify the factors, interactions and nested/crossed relationships of interest.
Fully Nested ANOVA
Studies the impact of 2 or more factors. The factors are arranged in a hierarchical structure such that each factor is nested within (or unique to) the factor above it. No interactions are obtained.
Partitioning of sums of squares
Total SS = SS Between Brands + SS Within Brands
SS Within Brands = SS Between Cars + SS Within Cars
SS Within Cars = SS Between Positions + SS Within (Error)
Summary ANOVA
One-way ANOVA
To analyse the difference between means from 2 or more samples
Balanced ANOVA
To compare the means of populations that are classified in two or more ways (two or more factors)
General Linear Model
Similar to Balanced ANOVA, but the number of levels and observations may vary and the factors may be nested or crossed
Last words
We reviewed:
Graphical methods for analysing differences between means obtained from 2 or more samples.
Analysis of Variance (ANOVA) methods for analysing the differences between means.
Methods for determining whether or not significant differences in variance exist between two or more samples.
Appendix
Post Hoc tests
Definition: Post hoc tests are additional hypothesis tests that are done after an ANOVA to determine exactly which mean differences are significant and which are not. These tests are done when:
You reject H0 and there are three or more treatments.
Rejecting H0 indicates that at least one difference exists among the treatments.
With k = 3 or more, the problem is to find where the differences are.
Note that when you have two treatments, rejecting H0 indicates that the two means are not equal. In this case there is no question about which means are different, and there is no need to do post hoc tests.
The first test we consider is Tukey's HSD test. Tukey's test allows you to compute a single value that determines the minimum difference between treatment means that is necessary for significance.
This value, called the Honestly Significant Difference (HSD), is then used to compare any two treatments (Xs). If the mean difference exceeds Tukey's HSD, you conclude that there is a significant difference between the treatments. The formula is:
$HSD = q \sqrt{MS_{within} / n}$
n: number of data points for each treatment
The value of q is found in the table (next slide). To locate the appropriate value of q, you must know the number of treatments in the overall experiment (k) and the degrees of freedom for the Error, and select the Alpha-risk (0.05). The q value used in this test is called a Studentised range statistic.
Tukey's test requires that the sample size be the same for all treatments.
Tukey's HSD test example
Example of seal life by shift
       Shift 1   Shift 2   Shift 3
       25.40     23.40     20.00
       26.31     21.80     22.20
       24.10     23.50     19.75
       23.74     22.75     20.60
       25.10     21.60     20.40
Mean   24.93     22.61     20.59
ANOVA result:
P-value is low, the difference between the shifts is significant.
Now the question is:
Which mean differences are significant and which are not?
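The slide reports only that the P-value is low. As a hedged illustration, the one-way ANOVA quantities behind that statement can be recomputed from the table with the Python standard library. The dictionary and variable names below are illustrative and not part of the course material; the P-value itself is not computed here because the standard library has no F distribution.

```python
from statistics import mean

# Seal life data by shift, copied from the table on the slide.
seal_life = {
    "Shift 1": [25.40, 26.31, 24.10, 23.74, 25.10],
    "Shift 2": [23.40, 21.80, 23.50, 22.75, 21.60],
    "Shift 3": [20.00, 22.20, 19.75, 20.60, 20.40],
}

k = len(seal_life)                                   # number of treatments
n_total = sum(len(v) for v in seal_life.values())    # total observations
grand_mean = mean(x for v in seal_life.values() for x in v)

# Partition the variation into between-shift and within-shift parts.
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in seal_life.values())
ss_within = sum((x - mean(v)) ** 2 for v in seal_life.values() for x in v)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

print(f"MS within = {ms_within:.3f} (df = {df_within})")
print(f"F = {f_ratio:.2f}")
```

The within-shift mean square and its 12 degrees of freedom are the two ANOVA outputs that Tukey's HSD test needs.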
Tukey's HSD calculation step 1:
Determine the q value. In this example k = 3 and df for Error = 12. Checking the value in the table, we get q = 3.77 with Alpha-risk = 0.05.
Tukey's HSD calculation step 2:
Determine the HSD value:
$HSD = 3.77 \sqrt{0.921 / 5} = 1.618$
Tukey's HSD calculation step 3:
The mean difference between any two samples must be at least 1.618 to be significant. Using this value, we can make the following conclusions:
Shift 1 is significantly different from Shift 2 (Mean S1 - Mean S2 = 2.32)
Shift 1 is significantly different from Shift 3 (Mean S1 - Mean S3 = 4.34)
Shift 2 is significantly different from Shift 3 (Mean S2 - Mean S3 = 2.02)
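The three calculation steps can be sketched in a few lines of Python. This is an illustrative check rather than a general Tukey implementation: q, MS within, n and the shift means are taken directly from the slides, and the variable names are assumptions made for this example.

```python
from itertools import combinations
from math import sqrt

# Values from the slides: q from the Studentised range table
# (k = 3, error df = 12, Alpha-risk = 0.05), MS within from the ANOVA,
# and n = 5 observations per shift.
q = 3.77
ms_within = 0.921
n = 5

shift_means = {"Shift 1": 24.93, "Shift 2": 22.61, "Shift 3": 20.59}

# Step 2: the minimum mean difference required for significance.
hsd = q * sqrt(ms_within / n)
print(f"HSD = {hsd:.3f}")

# Step 3: compare every pair of shift means against the HSD.
results = {}
for (name_a, mean_a), (name_b, mean_b) in combinations(shift_means.items(), 2):
    diff = abs(mean_a - mean_b)
    results[(name_a, name_b)] = diff > hsd
    verdict = "significant" if diff > hsd else "not significant"
    print(f"{name_a} vs {name_b}: difference = {diff:.2f} ({verdict})")
```

Because a single HSD value is compared against every pair, this approach relies on equal sample sizes in all treatments, as noted earlier.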
Summary
ANOVA is used as a hypothesis test, and we also use it for components of variation studies
The X is attribute and the Y is variable: a very common type of data set
ANOVA introduced us to 3 preliminary tests to run before deciding whether to reject the null hypothesis:
Stability
Normality
Homogeneity of variance
All hypothesis tests require these or similar tests of assumptions
Choose the appropriate design before calculating the ANOVA
Use Tukey's HSD test to refine the conclusion if needed