Anova by Hazilah Mohd Amin
Analysis of Variance (ANOVA)
Hazilah Mohd Amin
Goals
After completing, you should be able to:
• Recognize situations in which to use analysis of variance (ANOVA)
• Perform a single-factor hypothesis test for comparing more than two means and interpret the results
The F-Distribution
• Analysis-of-variance procedures rely on the F-distribution.
• There are infinitely many F-distributions, and we identify an F-distribution (and its F-curve) by its numbers of degrees of freedom.
• An F-distribution has two numbers of degrees of freedom: one for the numerator and one for the denominator.
Key Fact: F-distribution curve [figure not reproduced]
Find Critical Value: Example
• Find the F value for 8 df for the numerator, 14 df for the denominator, and 0.05 area in the right tail of the F-distribution curve.
Critical value: $F_{\alpha,\,\text{df numerator},\,\text{df denominator}} = F_{0.05,\,8,\,14} = ?$
Table 12.1 (p. 534): Critical value $F_{0.05,\,8,\,14} = 2.70$
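The table lookup above can be cross-checked numerically. The following is a minimal sketch, not the method used in the slides: it assumes the numerator df is even, in which case the right-tail area of an F-distribution reduces to a finite sum, and it finds the critical value by bisection. The function names `f_upper_tail` and `f_critical` are illustrative only.

```python
def f_upper_tail(f, df1, df2):
    """Right-tail area P(F > f) for F(df1, df2); finite sum valid only for even df1."""
    if df1 % 2 != 0:
        raise ValueError("this closed form needs an even numerator df")
    a = df1 // 2
    b = df2 / 2
    y = df2 / (df2 + df1 * f)
    total = 0.0
    coeff = 1.0  # generalized binomial coefficient C(b + j - 1, j), starting at j = 0
    for j in range(a):
        total += coeff * (1 - y) ** j
        coeff *= (b + j) / (j + 1)
    return y ** b * total


def f_critical(alpha, df1, df2, lo=0.0, hi=1000.0, tol=1e-9):
    """Bisection on the right-tail area, which decreases as f grows."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


crit = f_critical(0.05, 8, 14)
print(round(crit, 2))  # should agree with the table value 2.70
```

In practice a statistics package would normally supply this value; the sketch only shows where the table entry comes from.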
Hypotheses of One-Way ANOVA
• H0: μ1 = μ2 = μ3 = … = μk
– All population means are equal
– i.e., no treatment effect (no variation in means among groups)
• HA: Not all of the population means are the same
– At least one population mean is different
– i.e., there is a treatment effect
– Does not mean that all population means are different (some pairs may be the same)
The analysis of variance is a procedure that tests whether differences exist between two or more population means.
One-Factor ANOVA
All means are the same: the null hypothesis is true (no treatment effect)
H0: μ1 = μ2 = μ3 = … = μk
HA: Not all μi are the same
[Figure: μ1 = μ2 = μ3]
One-Factor ANOVA
At least one mean is different: the null hypothesis is NOT true (treatment effect is present)
H0: μ1 = μ2 = μ3 = … = μk
HA: Not all μi are the same
[Figure: μ1 = μ2 ≠ μ3 or μ1 ≠ μ2 ≠ μ3]
One-Way Analysis of Variance
One-Factor ANOVA F Test: Example 1
You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?
Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204
Solution of Example 1
– The data are interval.
– The problem objective is to compare the mean distances for three types of golf club.
– We hypothesize that the three population means are equal.
Defining the Hypotheses
H0: μ1 = μ2 = μ3
H1: At least two means differ
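Notation
Independent samples are drawn from k populations (treatments).
Sample:        1              2              ...   k
Observations:  $x_{11}$       $x_{12}$       ...   $x_{1k}$
               $x_{21}$       $x_{22}$       ...   $x_{2k}$
               ...            ...                  ...
               $x_{n_1,1}$    $x_{n_2,2}$    ...   $x_{n_k,k}$
Sample size:   $n_1$          $n_2$          ...   $n_k$
Sample mean:   $\bar{x}_1$    $\bar{x}_2$    ...   $\bar{x}_k$
X is the “response variable”. The variable’s values are called “responses”.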
Terminology
• In the context of this problem:
Response variable – distance.
Experimental unit – the golf club for which we record the distance figures.
Factor or treatment – the criterion by which we classify the populations (the treatments). In this problem the factor is the type of golf club.
The rationale of the name Analysis of Variance (ANOVA)
• We are testing the difference between means, so why is it called ANOVA?
• Two types of variability are employed when testing for the equality of the population means: within samples and between samples.
Graphical demonstration: employing two types of variability, within samples and between samples
[Figure: two dot plots of Treatment 1, Treatment 2 and Treatment 3, both with sample means $\bar{x}_1 = 10$, $\bar{x}_2 = 15$, $\bar{x}_3 = 20$]
A small variability within the samples makes it easier to draw a conclusion about the population means.
The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
One-Factor ANOVA Example: Scatter Diagram
[Figure: scatter diagram of distance (axis from 190 to 270) for Club 1, Club 2 and Club 3, with sample means $\bar{x}_1 = 249.2$, $\bar{x}_2 = 226.0$, $\bar{x}_3 = 205.8$ and grand mean $\bar{\bar{x}} = 227.0$]
From the scatter diagram, we can clearly see that the sample means differ, because the within-sample variability is small.
Test Statistic (F), Critical Value & Rejection Criterion
The hypothesis test:
H0: μ1 = μ2 = … = μk
HA: At least two population means are different
• Test statistic: $F = \dfrac{MSB}{MSW}$
where MSB is the mean square between samples and MSW is the mean square within samples
• Rejection region: $F > F_{\alpha,\,k-1,\,n-k}$
• Degrees of freedom
– df1 = k – 1 (k = number of levels or treatments)
– df2 = n – k (n = sum of sample sizes from all populations)
One-Factor ANOVA Example Computations
$\bar{x}_1 = 249.2$, $n_1 = 5$
$\bar{x}_2 = 226.0$, $n_2 = 5$
$\bar{x}_3 = 205.8$, $n_3 = 5$
$\bar{\bar{x}} = 227.0$, n = 15, k = 3
With $T_i$ the total of sample i:
$SSB = \dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3} - \dfrac{(\sum x)^2}{n} = 4716.4$
$SSW = \sum x^2 - \left(\dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3}\right) = 1119.6$
$MSB = \dfrac{SSB}{k-1} = \dfrac{4716.4}{3-1} = 2358.2$
$MSW = \dfrac{SSW}{n-k} = \dfrac{1119.6}{15-3} = 93.3$
$F = \dfrac{MSB}{MSW} = \dfrac{2358.2}{93.3} = 25.275$
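The computations for the golf club example can be reproduced with a few lines of plain Python. This is a sketch under one stated assumption: it uses the definitional (deviation-from-mean) form of SSB and SSW rather than the shortcut formulas with totals, which gives the same numbers. The helper name `one_way_anova` is illustrative.

```python
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}


def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples."""
    n = sum(len(g) for g in groups)            # total number of observations
    k = len(groups)                            # number of treatments
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-samples variation: sample means around the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-samples variation: observations around their own sample mean
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return ssb, ssw, msb, msw, msb / msw


ssb, ssw, msb, msw, f_stat = one_way_anova(list(clubs.values()))
print(round(ssb, 1), round(ssw, 1), round(msb, 1), round(msw, 1), round(f_stat, 3))
```

The printed values should agree with SSB = 4716.4, SSW = 1119.6, MSB = 2358.2, MSW = 93.3 and F = 25.275.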
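One-Factor ANOVA Example Solution
H0: μ1 = μ2 = μ3
HA: μi not all equal
α = 0.05
df1 = k - 1 = 3 - 1 = 2, df2 = n - k = 15 - 3 = 12
Critical value: $F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,2,\,12} = 3.885$
Test statistic: $F = \dfrac{MSB}{MSW} = \dfrac{2358.2}{93.3} = 25.275$
Decision: The test statistic F = 25.275 is greater than the critical value 3.885.
Conclusion: Reject H0 at α = 0.05. There is evidence that at least one μi differs from the rest.
[Figure: F curve with a rejection region of area α = 0.05 to the right of $F_{0.05} = 3.885$; "Do not reject H0" to the left, "Reject H0" to the right]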
ANOVA Single Factor: Excel Output
EXCEL: Tools | Data Analysis | ANOVA: Single Factor
SUMMARY
Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2
ANOVA
Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.885
Within Groups         1119.6   12   93.3
Total                 5836.0   14
$F = \dfrac{MSB}{MSW} = \dfrac{2358.2}{93.3} = 25.275$; $F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,2,\,12} = 3.885$
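The P-value and F crit columns of the Excel output can be checked by hand in this particular example. A minimal sketch, assuming only that the numerator df equals 2 (as it does here with k = 3): in that special case the right-tail area of the F-distribution has the closed form $(1 + 2F/df_2)^{-df_2/2}$, which can also be inverted for the critical value. The function names are illustrative, and the formulas do not apply to other numerator df.

```python
def f_tail_df1_2(f, df2):
    """P(F > f) for an F(2, df2) variable; closed form valid only when df1 = 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


def f_crit_df1_2(alpha, df2):
    """Critical value with right-tail area alpha for F(2, df2); only for df1 = 2."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


p_value = f_tail_df1_2(25.275, 12)   # golf club example: F = 25.275, df2 = 12
f_crit = f_crit_df1_2(0.05, 12)
print(f"{p_value:.2e}", round(f_crit, 3))  # compare with 4.99E-05 and 3.885
```

Because the P-value is far below 0.05, the P-value approach and the critical-value approach lead to the same decision.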
Rationale I: Variability Between Samples
• If H0: μ1 = μ2 = … = μk is true, we would expect all the sample means to be close to one another.
• If the alternative hypothesis is true, at least some of the sample means would differ.
• Thus, we measure the variability between sample means (and hence MSB or MSTr).
Rationale II: Variability Within Samples
• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.
• Therefore, even though sample means may markedly differ from one another, we have to consider the “within samples variability” (and hence MSW or MSE).
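The effect of within-sample variability can be illustrated with the golf club data. The sketch below is an artificial experiment, not part of the original example: it stretches every observation's distance from its own sample mean by an arbitrary factor of 3, which leaves the three sample means unchanged but makes the samples much noisier. The helper names and the factor are assumptions chosen for illustration.

```python
def f_ratio(groups):
    """One-way ANOVA F ratio, MSB / MSW."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))


def spread_out(group, factor):
    """Stretch each value's deviation from the sample mean; the mean stays the same."""
    m = sum(group) / len(group)
    return [m + factor * (x - m) for x in group]


clubs = [
    [254, 263, 241, 237, 251],
    [234, 218, 235, 227, 216],
    [200, 222, 197, 206, 204],
]
noisy = [spread_out(g, 3) for g in clubs]  # artificial data, same sample means

f_original = f_ratio(clubs)
f_noisy = f_ratio(noisy)
print(round(f_original, 3), round(f_noisy, 3))
```

SSB is identical in both cases, while SSW grows by a factor of 9, so the F ratio shrinks by the same factor and falls below the 0.05 critical value of 3.885 quoted for this example. The same differences between sample means are no longer convincing once the within-sample variability is large.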
Interpreting One-Factor ANOVA F Statistic
• The F statistic is the ratio of the between estimate of variance to the within estimate of variance: $F = \dfrac{MSB}{MSW}$
– The ratio can never be negative
– df1 = k - 1 will typically be small
– df2 = n - k will typically be large
• The F ratio should be close to 1 (SSB is small because the sample means differ little) if H0: μ1 = μ2 = … = μk is true.
• The ratio will be larger than 1 (SSB is large because the sample means differ a lot) if H0: μ1 = μ2 = … = μk is false.
Example 2
A study was conducted to determine whether the drying time for a certain paint is affected by the type of applicator used. The data in the Sample Results table represent the drying time (in minutes) for 3 different applicators when the paint was applied to standard wallboard.
Is there any evidence to suggest the type of applicator has a significant effect on the paint drying time at the 0.05 level?
Notation: The type of applicator is the factor (treatment), and each applicator is a level; hence k = 3.
Notation Used in ANOVA
                Factor Levels
Replication     Sample from Level 1   Sample from Level 2   Sample from Level 3   ...   Sample from Level k
n = 1           $x_{1,1}$             $x_{2,1}$             $x_{3,1}$             ...   $x_{k,1}$
n = 2           $x_{1,2}$             $x_{2,2}$             $x_{3,2}$             ...   $x_{k,2}$
n = 3           $x_{1,3}$             $x_{2,3}$             $x_{3,3}$             ...   $x_{k,3}$
...             ...                   ...                   ...                         ...
Column totals   $T_1$                 $T_2$                 $T_3$                 ...   $T_k$
T = grand total = sum of all x's = $\sum x = \sum T_i$
Sample Results
Applicator (Level)
                     Brush (i = 1)   Roller (i = 2)   Pad (i = 3)
                     39.1            31.6             32.7
                     39.4            33.4             33.2
                     31.1            30.2             28.7
                     33.7            41.8             29.2
                     30.5            33.9             25.8
                     34.6                             31.4
                                                      26.7
                                                      29.5
Sum ($T_i$)          208.4           170.9            237.2
Mean ($\bar{x}_i$)   34.73           34.18            29.65
Solution
• Assumptions:
– The data (samples) were randomly collected and all observations are independent.
– The populations are (approximately) normally distributed.
– The populations have equal variances.
• The null and the alternative hypotheses:
H0: μ1 = μ2 = μ3 (the mean drying time is the same for each applicator)
Ha: At least one mean is different (not all drying time means are equal)
Partition of Total Variation
Sum of Squares Total (SST) = Variation Due to Factor/Treatment (SSB) + Variation Due to Random Sampling (SSW)
SSB is commonly referred to as: Sum of Squares Between (SSB), Sum of Squares Treatment (SSTr), Sum of Squares Factor, Sum of Squares Among, Sum of Squares Explained, Among Groups Variation
SSW is commonly referred to as: Sum of Squares Within (SSW), Sum of Squares Error (SSE), Sum of Squares Unexplained, Within Groups Variation
Total variation SST can be split into two parts: SST = SSB + SSW
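$SSB = \dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3} - \dfrac{(\sum x)^2}{n}$
$SSW = \sum x^2 - \left(\dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3}\right)$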
$\sum x$ and $\sum x^2$ by calculator: enter the $x_i$ data, then retrieve $\sum x$ and $\sum x^2$
• Enter Statistics (SD) mode: Mode Mode 1
• Clear old data: Shift Clr 1 =
• Enter the $x_i$ data: 39.1 DT 39.4 DT 31.1 DT 33.7 DT 30.5 DT 34.6 DT … 29.5 DT
• Find $\sum x$: Shift S-SUM 2 = 616.5
• Find $\sum x^2$: Shift S-SUM 1 = 20,316.69
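Variation Sums of Squares
$SST = \sum x^2 - \dfrac{(\sum x)^2}{n} = 20316.69 - \dfrac{(616.5)^2}{19} = 20316.69 - 20003.80 = 312.89$
$SSB = \dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3} - \dfrac{(\sum x)^2}{n} = \dfrac{208.4^2}{6} + \dfrac{170.9^2}{5} + \dfrac{237.2^2}{8} - \dfrac{(616.5)^2}{19} = 20112.77 - 20003.80 = 108.97$
$SSW = SST - SSB = 312.89 - 108.97 = 203.92$
Equivalently, $SSW = \sum x^2 - \left(\dfrac{T_1^2}{n_1} + \dfrac{T_2^2}{n_2} + \dfrac{T_3^2}{n_3}\right) = 20316.69 - 20112.77 = 203.92$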
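The shortcut formulas can be checked against the paint drying data directly. This is a minimal sketch using only the values from the Sample Results table; the variable names are illustrative.

```python
brush = [39.1, 39.4, 31.1, 33.7, 30.5, 34.6]
roller = [31.6, 33.4, 30.2, 41.8, 33.9]
pad = [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5]
samples = [brush, roller, pad]

n = sum(len(s) for s in samples)                  # 19 observations
sum_x = sum(sum(s) for s in samples)              # grand total T
sum_x2 = sum(x * x for s in samples for x in s)   # sum of squared observations

correction = sum_x ** 2 / n                               # (sum x)^2 / n
between_term = sum(sum(s) ** 2 / len(s) for s in samples) # sum of T_i^2 / n_i

sst = sum_x2 - correction
ssb = between_term - correction
ssw = sum_x2 - between_term

print(round(sum_x, 1), round(sum_x2, 2))
print(round(sst, 2), round(ssb, 2), round(ssw, 2))
```

The totals should match the calculator results (616.5 and 20,316.69), and the three sums of squares should satisfy SST = SSB + SSW up to rounding.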
Mean Square
The mean square for the factor being tested and for the error is obtained by dividing the sum-of-squares value by the corresponding number of degrees of freedom.
Calculations:
Numerator degrees of freedom = df(factor) = k - 1 = 3 - 1 = 2
Denominator degrees of freedom = df(error) = n - k = 19 - 3 = 16
df(total) = n - 1 = 19 - 1 = 18
$MS(\text{factor}) = \dfrac{SS(\text{factor})}{df(\text{factor})} = \dfrac{108.97}{2} = 54.49$
$MS(\text{error}) = \dfrac{SS(\text{error})}{df(\text{error})} = \dfrac{203.92}{16} = 12.75$
One-Way ANOVA Table
An ANOVA table is often used to record the sums of squares and to organize the rest of the calculations. Format for the ANOVA table:
Source of Variation   df      SS                MS                  F ratio
Between Samples       k - 1   SSB               MSB = SSB/(k - 1)   F = MSB/MSW
Within Samples        n - k   SSW               MSW = SSW/(n - k)
Total                 n - 1   SST = SSB + SSW
The sums of squares and the degrees of freedom must check:
SS(factor) + SS(error) = SS(total), or SSB + SSW = SST
df(factor) + df(error) = df(total), or df(between) + df(within) = df(total)
The Completed ANOVA Table
Source   df   SS       MS
Factor   2    108.97   54.49
Error    16   203.92   12.75
Total    18   312.89
The Test Statistic: $F^* = \dfrac{MS(\text{factor})}{MS(\text{error})} = \dfrac{54.49}{12.75} = 4.27$
Solution Continued
The Results
Critical value: $F_{\alpha,\,k-1,\,n-k} = F_{0.05,\,2,\,16} = 3.63$
The test statistic F = 4.27 is in the rejection region.
a. Decision: Reject H0 at α = 0.05.
b. Conclusion: There is evidence to suggest the three population means are not all the same. The type of applicator has a significant effect on the paint drying time at the 0.05 level of significance.
[Figure: F curve with a rejection region of area α = 0.05 to the right of $F_{0.05} = 3.63$; "Do not reject H0" to the left, "Reject H0" to the right]
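The whole decision for the paint example can be traced from the sums of squares. The sketch below takes SS(factor) = 108.97 and SS(error) = 203.92 from the ANOVA table as given. Because the numerator df is 2 here, it uses the closed forms that hold only in that special case for the critical value and the P-value; the P-value is not reported in the slides and is shown only as a cross-check.

```python
ss_factor, ss_error = 108.97, 203.92   # from the completed ANOVA table
k, n = 3, 19                           # 3 applicators, 19 observations
alpha = 0.05

df_factor = k - 1
df_error = n - k
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_star = ms_factor / ms_error

# Closed forms valid only because df_factor == 2
f_crit = (df_error / 2) * (alpha ** (-2 / df_error) - 1)
p_value = (1 + 2 * f_star / df_error) ** (-df_error / 2)

reject_h0 = f_star > f_crit
print(round(ms_factor, 2), round(ms_error, 2), round(f_star, 2), round(f_crit, 2))
print("Reject H0" if reject_h0 else "Do not reject H0")
```

The test statistic exceeds the critical value, so the sketch reaches the same decision as the slide.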
One-Way ANOVA F-Test: Exercise 1
• You’re a trainer for Microsoft Corp. Is there any evidence to suggest the type of training method has a significant effect on the learning time at the 0.05 level?
• The data in the table represent the learning times (in hours) of 12 people using 4 different training methods.
M1   M2   M3   M4
10   11   13   18
9    16   8    23
5    9    9    25
Answer: Critical value = 4.07. Test statistic = 11.6.
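The stated answer can be verified with a short computation. This is a sketch that reuses the definitional form of the sums of squares; the critical value 4.07 is taken from the answer above rather than computed, and the names are illustrative.

```python
methods = {
    "M1": [10, 9, 5],
    "M2": [11, 16, 9],
    "M3": [13, 8, 9],
    "M4": [18, 23, 25],
}

groups = list(methods.values())
n = sum(len(g) for g in groups)   # 12 people
k = len(groups)                   # 4 training methods
grand_mean = sum(sum(g) for g in groups) / n

# Between-methods and within-methods sums of squares
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

msb = ssb / (k - 1)
msw = ssw / (n - k)
f_stat = msb / msw

critical_value = 4.07             # F(0.05; 3, 8), as given in the answer
print(ssb, ssw, msb, msw, round(f_stat, 1))
print("Reject H0" if f_stat > critical_value else "Do not reject H0")
```

The intermediate values correspond to the rows of the summary table in the exercise solution: SSB = 348, SSW = 80, MSB = 116, MSW = 10.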
Hey! Let’s get our hands dirty… using SPSS
One Way Analysis of Variance Using SPSS
• Suppose we want to know whether students who have to work many hours outside school to support themselves find their grades suffering.
• We examine this question by comparing the GPAs of students who work various hours outside school.
• Let’s examine this question using the data in the Student file: File > Open > Student
• First examine the average GPA for each of the three work categories (0 hrs, 1-19 hrs, >20 hrs), stored in the variable WorkCat.
• Graph > Boxplot, then choose Simple and click Define. Select GPA as the variable and WorkCat for the Category Axis. Click Options.
• After clicking Options…, click off "Display groups defined by missing values", then click Continue and OK.
• You’ll get this: [box plot not reproduced]
What is the Box-plot telling us?
• There is some variation across the groups.
• The median GPAs (the dark line in the middle of each box) differ slightly between groups.
• So, should we attribute the observed difference to sampling error, or do the groups genuinely differ?
• Neither the box plot nor the medians offer decisive evidence. Hence we need ANOVA.
• We are testing:
H0: μ1 = μ2 = μ3
H1: At least two means differ
• Before attempting ANOVA, we need to review the ANOVA assumptions: (i) independent samples, (ii) normality, (iii) equality of variances. We can test both (ii) and (iii).
• Analyze > Descriptive Statistics > Explore
• In the Explore dialog box, select GPA as the Dependent List variable, WorkCat as the Factor List variable and Plots as the Display. Next, click Plots…
• We are interested in a normality test: select the option marked "select this" in the dialog and deselect only the option marked "deselect this". Click Continue and OK.
The output has several parts; let’s focus on the tests of normality.
• The Kolmogorov-Smirnov test assesses whether there is a significant departure from normality in the population distribution of each of the 3 groups. H0: Distributions are normal.
• Look at the p-values: all are > 0.05. Do not reject H0. Hence there is no evidence of non-normality.
• We still need to validate the homogeneity of variance assumption. We do this within ANOVA.
• Analyze > Compare Means > One-Way ANOVA
• The Dependent List variable is GPA and the Factor variable is WorkCat. Click Options; under Statistics, select Descriptive and Homogeneity of variance test. Click Continue and OK.
• H0: Variances are equal. The One-Way ANOVA output consists of many parts. Look at the p-value of the homogeneity test: it is > 0.05.
• Hence do not reject H0.
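The idea behind a homogeneity-of-variance test can be sketched in plain Python. The Student file is not reproduced here, so the sketch is applied to the golf club data from Example 1 instead. It assumes the mean-centred version of Levene's test, which is simply a one-way ANOVA F ratio computed on the absolute deviations of each observation from its own sample mean; whether this matches the exact variant SPSS reports is an assumption. The P-value uses the closed form that holds only when the numerator df is 2.

```python
def levene_statistic(groups):
    """Levene's W: one-way ANOVA F ratio on |x - sample mean| (mean-centred variant)."""
    deviations = []
    for g in groups:
        m = sum(g) / len(g)
        deviations.append([abs(x - m) for x in g])
    n = sum(len(z) for z in deviations)
    k = len(deviations)
    grand = sum(sum(z) for z in deviations) / n
    between = sum(len(z) * (sum(z) / len(z) - grand) ** 2 for z in deviations)
    within = sum((v - sum(z) / len(z)) ** 2 for z in deviations for v in z)
    return (between / (k - 1)) / (within / (n - k))


clubs = [
    [254, 263, 241, 237, 251],
    [234, 218, 235, 227, 216],
    [200, 222, 197, 206, 204],
]

w = levene_statistic(clubs)
df2 = 15 - 3
p_value = (1 + 2 * w / df2) ** (-df2 / 2)   # valid only because numerator df = 2
print(round(w, 3), round(p_value, 3))
```

For the golf data the P-value is well above 0.05, so the hypothesis of equal variances would not be rejected, which is the same pattern the SPSS output is read for above.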
The normality and homogeneity of variances assumptions are met, hence:
• Let’s find out whether students who work various hours outside school differ in their GPAs.
• The P-value of .000 is very small, hence we reject H0 and conclude that the mean GPAs are not all the same. Where are the differences? Hence a post-hoc test…
End of ANOVA
See U Later…
One-Way ANOVA F-Test: Exercise 1 Solution
Summary Table Solution
Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Treatment (Methods)   4 - 1 = 3            348              116                      11.6
Error                 12 - 4 = 8           80               10
Total                 12 - 1 = 11          428
One-Way ANOVA F-Test Solution
• H0: μ1 = μ2 = μ3 = μ4
• Ha: Not all equal
• α = 0.05
• ν1 = 3, ν2 = 8 (degrees of freedom)
• Critical value: $F_{0.05,\,3,\,8} = 4.07$
Test statistic: $F = \dfrac{MSB}{MSE} = \dfrac{116}{10} = 11.6$
Decision: Reject H0 at α = 0.05
Conclusion: There is evidence that the population means are different
[Figure: F curve starting at 0 with a rejection region of area α = 0.05 to the right of the critical value 4.07]