Anova by Hazilah Mohd Amin

Analysis of Variance (ANOVA) Hazilah Mohd Amin

Description: Lecture notes on sampling and data analysis

Transcript of Anova by Hazilah Mohd Amin

Page 1: Anova by Hazilah Mohd Amin

Analysis of Variance (ANOVA)

Hazilah Mohd Amin

Page 2: Anova by Hazilah Mohd Amin

Goals

After completing, you should be able to:

• Recognize situations in which to use analysis of variance (ANOVA)

• Perform a single-factor hypothesis test for Comparing More Than Two Means and interpret results

Page 3: Anova by Hazilah Mohd Amin

The F - Distribution

• Analysis-of-variance procedures rely on the F-distribution.

• There are infinitely many F-distributions, and we identify an F-distribution (and F-curve) by its numbers of degrees of freedom.

• An F-distribution has two numbers of degrees of freedom: one for the numerator and one for the denominator.

Page 4: Anova by Hazilah Mohd Amin

Key Fact: F distribution curve

Page 5: Anova by Hazilah Mohd Amin

Find Critical Value: Example

• Find the F value for 8 df for the numerator, 14 df for the denominator, and 0.05 area in the right tail of the F distribution curve.

Critical value: F(α; df numerator, df denominator) = F(0.05; 8, 14) = ?

Page 6: Anova by Hazilah Mohd Amin

Table 12.1 (p. 534) Critical value: F(0.05; 8, 14) = 2.70
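The table lookup can be cross-checked numerically. The sketch below is not part of the original slides: it assumes a numerator df of at least 2, integrates the F density with Simpson's rule, and finds the critical value by bisection using only the Python standard library. The helper names (f_pdf, f_cdf, f_critical) are illustrative.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with d1 and d2 degrees of freedom (assumes d1 >= 2)."""
    if x <= 0:
        # At x = 0 the density equals 1 when d1 == 2 and 0 when d1 > 2.
        return 1.0 if (x == 0 and d1 == 2) else 0.0
    log_beta = math.lgamma(d1 / 2) + math.lgamma(d2 / 2) - math.lgamma((d1 + d2) / 2)
    log_pdf = ((d1 / 2) * math.log(d1 / d2)
               + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2)
               - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=2000):
    """P(F <= x), approximated with composite Simpson's rule on [0, x]."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

def f_critical(alpha, d1, d2):
    """Value with right-tail area alpha, found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while f_cdf(hi, d1, d2) < 1 - alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if f_cdf(mid, d1, d2) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

crit_8_14 = f_critical(0.05, 8, 14)
print(round(crit_8_14, 2))  # compare with the table value 2.70
```

A statistics package would normally be used for this; the hand-rolled integration is only meant to show where the tabulated number comes from.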

Page 7: Anova by Hazilah Mohd Amin

Hypotheses of One-Way ANOVA

• H0: μ1 = μ2 = μ3 = … = μk
– All population means are equal
– i.e., no treatment effect (no variation in means among groups)

• HA: Not all of the population means are the same
– At least one population mean is different
– i.e., there is a treatment effect
– Does not mean that all population means are different (some pairs may be the same)

The analysis of variance is a procedure that tests whether differences exist between two or more population means.

Page 8: Anova by Hazilah Mohd Amin

One-Factor ANOVA

All Means are the same: The Null Hypothesis is True

(No Treatment Effect)

H0: μ1 = μ2 = μ3 = … = μk

HA: Not all μi are the same

μ1 = μ2 = μ3

Page 9: Anova by Hazilah Mohd Amin

One-Factor ANOVA

At least one mean is different: The Null Hypothesis is NOT true

(Treatment Effect is present)

H0: μ1 = μ2 = μ3 = … = μk

HA: Not all μi are the same

μ1 = μ2 ≠ μ3 or μ1 ≠ μ2 ≠ μ3

Page 10: Anova by Hazilah Mohd Amin

One-Way Analysis of Variance

Page 11: Anova by Hazilah Mohd Amin
Page 12: Anova by Hazilah Mohd Amin
Page 13: Anova by Hazilah Mohd Amin

One-Factor ANOVA F Test: Example 1

You want to see if three different golf clubs yield different distances. You randomly select five measurements from trials on an automated driving machine for each club. At the .05 significance level, is there a difference in mean distance?

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

Page 14: Anova by Hazilah Mohd Amin

One Way Analysis of Variance

Solution of Example 1

– The data are interval.

– The problem objective is to compare mean distances for three types of golf club.

– We hypothesize that the three population means are equal.

Page 15: Anova by Hazilah Mohd Amin

Defining the Hypotheses

• Solution

H0: μ1 = μ2 = μ3

H1: At least two means differ

Page 16: Anova by Hazilah Mohd Amin

Notation

Independent samples are drawn from k populations (treatments).

Sample   Observations            Sample size   Sample mean
1        x11, x21, …, x(n1,1)    n1            x̄1
2        x12, x22, …, x(n2,2)    n2            x̄2
…
k        x1k, x2k, …, x(nk,k)    nk            x̄k

X is the “response variable”. The variable’s values are called “responses”.

Page 17: Anova by Hazilah Mohd Amin

Terminology

• In the context of this problem…

Response variable – distance

Experimental unit – the golf club trial for which we record a distance figure.

Factor or treatment – the criterion by which we classify the populations (the treatments). In this problem the factor is the type of golf club.

Page 18: Anova by Hazilah Mohd Amin

The rationale of the name of Analysis of Variance (ANOVA)

• We are testing the difference between means, so why is it called ANOVA?

• Two types of variability are employed when testing for the equality of the population means: Within Samples and Between Samples

Page 19: Anova by Hazilah Mohd Amin

Graphical demonstration: Employing two types of variability: Within Samples and Between Samples

One Way Analysis of Variance

Page 20: Anova by Hazilah Mohd Amin

[Figure: two dot plots of Treatment 1, Treatment 2 and Treatment 3, both with sample means x̄1 = 10, x̄2 = 15, x̄3 = 20.]

A small variability within the samples makes it easier to draw a conclusion about the population means.

The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
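The effect of within-sample variability can be illustrated numerically. The two data sets below are hypothetical (they are not the values plotted on the slide); both have sample means 10, 15 and 20, and only the spread inside each sample differs. The helper name f_ratio is likewise illustrative.

```python
def f_ratio(groups):
    """One-way ANOVA F statistic: between-samples mean square over within-samples mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Hypothetical data: identical sample means (10, 15, 20), different spreads.
tight = [[9, 10, 11], [14, 15, 16], [19, 20, 21]]   # small within-sample variability
wide = [[1, 10, 19], [6, 15, 24], [11, 20, 29]]     # large within-sample variability

f_tight = f_ratio(tight)
f_wide = f_ratio(wide)
print(f_tight, f_wide)
```

The between-samples variation is the same in both cases, so the much smaller F for the second data set comes entirely from the larger within-sample variation.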

Page 21: Anova by Hazilah Mohd Amin

One-Factor ANOVA Example: Scatter Diagram

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

[Figure: scatter diagram of Distance (190 to 270) against Club (1, 2, 3), marking the sample means x̄1 = 249.2, x̄2 = 226.0, x̄3 = 205.8 and the grand mean x̄ = 227.0.]

From the scatter diagram, we can clearly see the differences between the sample means because of the small within-sample variability.

Page 22: Anova by Hazilah Mohd Amin

Test Statistics (F), Critical Value & Rejection Criterion

The hypothesis test:

H0: μ1 = μ2 = … = μk

HA: At least two population means are different

• Test statistic: F = MSB / MSW

where MSB is the mean square between samples and MSW is the mean square within samples

• Rejection Region: F > F(α; k - 1, n - k)

• Degrees of freedom
– df1 = k - 1 (k = number of levels or treatments)
– df2 = n - k (n = sum of sample sizes from all populations)

Page 23: Anova by Hazilah Mohd Amin

One-Factor ANOVA Example Computations

Club 1   Club 2   Club 3
254      234      200
263      218      222
241      235      197
237      227      206
251      216      204

x̄1 = 249.2   n1 = 5
x̄2 = 226.0   n2 = 5
x̄3 = 205.8   n3 = 5
x̄ = 227.0    n = 15
k = 3

With Ti the total of sample i:

SSB = T1²/n1 + T2²/n2 + T3²/n3 - (Σx)²/n = 4716.4

SSW = Σx² - (T1²/n1 + T2²/n2 + T3²/n3) = 1119.6

MSB = SSB / (k - 1) = 4716.4 / (3 - 1) = 2358.2

MSW = SSW / (n - k) = 1119.6 / (15 - 3) = 93.3

F = MSB / MSW = 2358.2 / 93.3 = 25.275
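The computations on this slide can be reproduced with a short script. This is a minimal sketch using the shortcut formulas (sample totals Ti, Σx and Σx²) on the golf club data; the variable names are illustrative and not taken from the slides.

```python
clubs = {
    "Club 1": [254, 263, 241, 237, 251],
    "Club 2": [234, 218, 235, 227, 216],
    "Club 3": [200, 222, 197, 206, 204],
}

k = len(clubs)                                   # number of treatments
n = sum(len(v) for v in clubs.values())          # total sample size
sum_x = sum(sum(v) for v in clubs.values())      # Σx
sum_x2 = sum(x * x for v in clubs.values() for x in v)             # Σx²
sum_t2_over_n = sum(sum(v) ** 2 / len(v) for v in clubs.values())  # ΣTi²/ni

ssb = sum_t2_over_n - sum_x ** 2 / n   # variation between samples
ssw = sum_x2 - sum_t2_over_n           # variation within samples
msb = ssb / (k - 1)
msw = ssw / (n - k)
f_stat = msb / msw

print(round(ssb, 1), round(ssw, 1), round(msb, 1), round(msw, 1), round(f_stat, 3))
```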

Page 24: Anova by Hazilah Mohd Amin

One-Factor ANOVA Example Solution

H0: μ1 = μ2 = μ3

HA: μi not all equal

α = .05

df1 = k - 1 = 3 - 1 = 2, df2 = n - k = 15 - 3 = 12

Critical Value: F(α; k - 1, n - k) = F(.05; 2, 12) = 3.885

Test Statistic: F = MSB / MSW = 2358.2 / 93.3 = 25.275

[Figure: F curve with right-tail area α = .05; do not reject H0 to the left of F.05 = 3.885, reject H0 to the right.]

Decision: Test statistic F = 25.275 is greater than the critical value 3.885

Conclusion: Reject H0 at α = 0.05

There is evidence that at least one μi differs from the rest
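When there are three treatments the numerator df is 2, and the right-tail area of the F distribution then has a simple closed form, P(F > f) = (1 + 2f/df2)^(-df2/2). That identity is a property of the F distribution and is not stated on the slides; the sketch below uses it to recompute the critical value and the p-value for this example. The function names are illustrative.

```python
def f_tail_area_df1_2(f, df2):
    """P(F > f) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f / df2) ** (-df2 / 2)

def f_critical_df1_2(alpha, df2):
    """Critical value with right-tail area alpha (inverse of the tail-area formula)."""
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)

alpha = 0.05
df2 = 15 - 3                      # n - k
f_stat = 2358.2 / 93.3            # MSB / MSW from the slide

f_crit = f_critical_df1_2(alpha, df2)
p_value = f_tail_area_df1_2(f_stat, df2)
reject_h0 = f_stat > f_crit

print(round(f_crit, 3), reject_h0)
```

The closed form only applies when df1 = 2; for other numerator df a table or statistical software is needed.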

Page 25: Anova by Hazilah Mohd Amin

ANOVA Single Factor: Excel Output

EXCEL: tools | data analysis | ANOVA: single factor

SUMMARY

Groups   Count   Sum    Average   Variance
Club 1   5       1246   249.2     108.2
Club 2   5       1130   226       77.5
Club 3   5       1029   205.8     94.2

ANOVA

Source of Variation   SS       df   MS       F        P-value    F crit
Between Groups        4716.4   2    2358.2   25.275   4.99E-05   3.885
Within Groups         1119.6   12   93.3
Total                 5836.0   14

F = MSB / MSW = 2358.2 / 93.3 = 25.275

F(α; k - 1, n - k) = F(.05; 2, 12) = 3.885

Page 26: Anova by Hazilah Mohd Amin

Rationale I: Variability Between Samples

• If H0: μ1= μ2 = … = μk is true, we would expect all the sample means to be close to one another.

• If the alternative hypothesis is true, at least some of the sample means would differ.

• Thus, we measure variability between sample means (and hence MSB or MSTr).

Page 27: Anova by Hazilah Mohd Amin

Rationale II: Variability Within Samples

• Large variability within the samples weakens the “ability” of the sample means to represent their corresponding population means.

• Therefore, even though sample means may markedly differ from one another, we have to consider the “within samples variability” (and hence MSW or MSE).

Page 28: Anova by Hazilah Mohd Amin

Interpreting One-Factor ANOVA F Statistic

• The F statistic is the ratio of the between estimate of variance and the within estimate of variance: F = MSB / MSW
– The ratio must always be positive
– df1 = k - 1 will typically be small
– df2 = n - k will typically be large

The test statistic F ratio should be close to 1 (SSB small due to small differences between the sample means) if H0: μ1 = μ2 = … = μk is true

The ratio will be larger than 1 (SSB large due to large differences between the sample means) if H0: μ1 = μ2 = … = μk is false
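A small simulation can make this behaviour concrete. The sketch below is an illustration only: it assumes three normal populations with standard deviation 1, samples of size 5, and a fixed random seed, none of which come from the slides. It compares the average F ratio when all population means are equal with the average when they differ.

```python
import random

def f_ratio(groups):
    """One-way ANOVA F statistic, MSB / MSW."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    msb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n - k)
    return msb / msw

def average_f(true_means, trials=2000, size=5, sigma=1.0, seed=0):
    """Average F ratio over repeated samples drawn from normal populations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        groups = [[rng.gauss(mu, sigma) for _ in range(size)] for mu in true_means]
        total += f_ratio(groups)
    return total / trials

avg_f_h0_true = average_f([0.0, 0.0, 0.0])    # all population means equal
avg_f_h0_false = average_f([0.0, 1.0, 2.0])   # population means differ

print(round(avg_f_h0_true, 2), round(avg_f_h0_false, 2))
```

With equal population means the average ratio stays near 1, while unequal means push it well above 1, which is why large F values count as evidence against H0.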

Page 29: Anova by Hazilah Mohd Amin

Example 2

A study was conducted to determine if the drying time for a certain paint is affected by the type of applicator used. The data in the table on the next screen represent the drying time (in minutes) for 3 different applicators when the paint was applied to standard wallboard.

Is there any evidence to suggest the type of applicator has a significant effect on the paint drying time at the 0.05 level?

Notation: The type of applicator is the factor (treatment), and each applicator is a level; hence k = 3.

Page 30: Anova by Hazilah Mohd Amin

Notation Used in ANOVA

                Factor Levels
                Sample from   Sample from   Sample from           Sample from
Replication     Level 1       Level 2       Level 3       . . .   Level k
n = 1           x1,1          x2,1          x3,1          . . .   xk,1
n = 2           x1,2          x2,2          x3,2          . . .   xk,2
n = 3           x1,3          x2,3          x3,3          . . .   xk,3
. . .
Column Totals   T1            T2            T3            . . .   Tk      T

T = grand total = sum of all x's = Σx = ΣTi

Page 31: Anova by Hazilah Mohd Amin

Sample Results

           Applicator (Level)
           Brush     Roller    Pad
           (i = 1)   (i = 2)   (i = 3)
           39.1      31.6      32.7
           39.4      33.4      33.2
           31.1      30.2      28.7
           33.7      41.8      29.2
           30.5      33.9      25.8
           34.6                31.4
                               26.7
                               29.5
Sum Ti     208.4     170.9     237.2
Mean x̄i    34.73     34.18     29.65

Page 32: Anova by Hazilah Mohd Amin

Solution

• Assumptions:

– The data (samples) were randomly collected and all observations are independent.

– The populations are (approximately) normally distributed.

– Populations have equal variances.

• The null and the alternative hypothesis:

Ho: μ1 = μ2 = μ3
The mean drying time is the same for each applicator

Ha: At least one mean is different
Not all drying time means are equal

Page 33: Anova by Hazilah Mohd Amin

Partition of Total Variation

Total variation SST can be split into two parts: SST = SSB + SSW

Sum of Squares Total (SST) = Variation Due to Factor/Treatment (SSB) + Variation Due to Random Sampling (SSW)

SSB is commonly referred to as: Sum of Squares Between (SSB), Sum of Squares Treatment (SSTr), Sum of Squares Factor, Sum of Squares Among, Sum of Squares Explained, Among Groups Variation

SSW is commonly referred to as: Sum of Squares Within (SSW), Sum of Squares Error (SSE), Sum of Squares Unexplained, Within Groups Variation
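The partition can be verified directly from the definitions. The sketch below uses the deviation form of each sum of squares (squared distances from the grand mean or from the sample means), which is an equivalent formulation assumed here rather than one written out on the slides, and applies it to the golf club data from Example 1.

```python
from statistics import mean

clubs = [
    [254, 263, 241, 237, 251],   # Club 1
    [234, 218, 235, 227, 216],   # Club 2
    [200, 222, 197, 206, 204],   # Club 3
]

all_x = [x for g in clubs for x in g]
grand_mean = mean(all_x)

# Total variation: every observation against the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_x)
# Between-samples variation: each sample mean against the grand mean, weighted by sample size.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in clubs)
# Within-samples variation: every observation against its own sample mean.
ssw = sum((x - mean(g)) ** 2 for g in clubs for x in g)

print(round(sst, 1), round(ssb, 1), round(ssw, 1))
```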

Page 34: Anova by Hazilah Mohd Amin

SSB = T1²/n1 + T2²/n2 + T3²/n3 - (Σx)²/n

SSW = Σx² - (T1²/n1 + T2²/n2 + T3²/n3)

Page 35: Anova by Hazilah Mohd Amin

Σx and Σx² Calculator: Enter xi data, retrieve Σx and Σx²

• Enter Statistics SD mode: Mode Mode 1

• Clear old data: Shift Clr 1 =

• Enter xi data: 39.1 DT 39.4 DT 31.1 DT 33.7 DT 30.5 DT 34.6 DT …29.5 DT

• Find (Σx): Shift S-SUM 2 = 616.5

• Find (Σx²): Shift S-SUM 1 = 20,316.69
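The same two sums can be obtained without a calculator. A minimal sketch, assuming the 19 drying times from the Sample Results slide are entered by applicator:

```python
brush = [39.1, 39.4, 31.1, 33.7, 30.5, 34.6]
roller = [31.6, 33.4, 30.2, 41.8, 33.9]
pad = [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5]

data = brush + roller + pad
n = len(data)
sum_x = sum(data)                    # Σx
sum_x2 = sum(x * x for x in data)    # Σx²

# The slide reports Σx = 616.5 and Σx² = 20,316.69.
print(n, round(sum_x, 1), round(sum_x2, 2))
```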

Page 36: Anova by Hazilah Mohd Amin

Variation Sums of Squares

SST = Σx² - (Σx)²/n = 20316.69 - (616.5)²/19 = 20316.69 - 20003.80 = 312.89

SSB = T1²/n1 + T2²/n2 + T3²/n3 - (Σx)²/n = (208.4)²/6 + (170.9)²/5 + (237.2)²/8 - (616.5)²/19 = 20112.77 - 20003.80 = 108.97

SSW = SST - SSB = 312.89 - 108.97 = 203.92

Equivalently, SSW = Σx² - (T1²/n1 + T2²/n2 + T3²/n3) = 20316.69 - 20112.77 = 203.92
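These sums of squares can be recomputed from the raw drying times. The sketch below applies the shortcut formulas to the applicator data; the variable names (correction, between_term) are illustrative.

```python
samples = {
    "Brush": [39.1, 39.4, 31.1, 33.7, 30.5, 34.6],
    "Roller": [31.6, 33.4, 30.2, 41.8, 33.9],
    "Pad": [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5],
}

n = sum(len(v) for v in samples.values())
sum_x = sum(sum(v) for v in samples.values())
sum_x2 = sum(x * x for v in samples.values() for x in v)

correction = sum_x ** 2 / n                                        # (Σx)²/n
between_term = sum(sum(v) ** 2 / len(v) for v in samples.values()) # ΣTi²/ni

sst = sum_x2 - correction
ssb = between_term - correction
ssw = sum_x2 - between_term

print(round(sst, 2), round(ssb, 2), round(ssw, 2))
```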

Page 37: Anova by Hazilah Mohd Amin

Mean Square

The mean square for the factor being tested and for the error is obtained by dividing the sum-of-squares value by the corresponding number of degrees of freedom.

Numerator degrees of freedom = df(factor) = k - 1 = 3 - 1 = 2

Denominator degrees of freedom = df(error) = n - k = 19 - 3 = 16

df(total) = n - 1 = 19 - 1 = 18

Calculations:

MS(factor) = SS(factor) / df(factor) = 108.97 / 2 = 54.49

MS(error) = SS(error) / df(error) = 203.92 / 16 = 12.75

Page 38: Anova by Hazilah Mohd Amin

One-Way ANOVA Table

An ANOVA table is often used to record the sums of squares and to organize the rest of the calculations. Format for the ANOVA Table:

Source of Variation   df      SS                MS                    F ratio
Between Samples       k - 1   SSB               MSB = SSB / (k - 1)   F = MSB / MSW
Within Samples        n - k   SSW               MSW = SSW / (n - k)
Total                 n - 1   SST = SSB + SSW

The sums of squares and the degrees of freedom must check:
SS(factor) + SS(error) = SS(total), or SSB + SSW = SST
df(factor) + df(error) = df(total), or df(between) + df(within) = df(total)

Page 39: Anova by Hazilah Mohd Amin

The Completed ANOVA Table:

Source   df   SS       MS
Factor   2    108.97   54.49
Error    16   203.92   12.75
Total    18   312.89

The Test Statistic: F* = MS(factor) / MS(error) = 54.49 / 12.75 = 4.27

Page 40: Anova by Hazilah Mohd Amin

Solution Continued

The Results

Critical Value: F(α; k - 1, n - k) = F(.05; 2, 16) = 3.63

The Test Statistic F = 4.27 is in the rejection region.

[Figure: F curve with right-tail area α = .05; do not reject H0 to the left of F.05 = 3.63, reject H0 to the right.]

a. Decision: Reject Ho at α = 0.05

b. Conclusion: There is evidence to suggest the three population means are not all the same. The type of applicator has a significant effect on the paint drying time at the 0.05 level of significance.
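The whole test for Example 2 can be run end to end in a few lines. This is a sketch, not the slides' procedure: it computes the sums of squares from deviations, and it obtains the critical value from the closed form that holds only when the numerator df is 2 (an assumption that is satisfied here because k = 3).

```python
samples = [
    [39.1, 39.4, 31.1, 33.7, 30.5, 34.6],              # Brush
    [31.6, 33.4, 30.2, 41.8, 33.9],                    # Roller
    [32.7, 33.2, 28.7, 29.2, 25.8, 31.4, 26.7, 29.5],  # Pad
]
alpha = 0.05

k = len(samples)
n = sum(len(g) for g in samples)
means = [sum(g) / len(g) for g in samples]
grand = sum(sum(g) for g in samples) / n

ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(samples, means))
ssw = sum((x - m) ** 2 for g, m in zip(samples, means) for x in g)
df1, df2 = k - 1, n - k
f_stat = (ssb / df1) / (ssw / df2)

# Closed-form critical value, valid only because df1 == 2.
f_crit = (df2 / 2) * (alpha ** (-2 / df2) - 1)
reject_h0 = f_stat > f_crit

print(round(f_stat, 2), round(f_crit, 2), reject_h0)
```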

Page 41: Anova by Hazilah Mohd Amin

One-Way ANOVA F-Test: Exercise 1

• You’re a trainer for Microsoft Corp. Is there any evidence to suggest the type of training method has a significant effect on the learning time at the 0.05 level?

• The data in the table represent the learning times (in hours) of 12 people using 4 different training methods.

M1   M2   M3   M4
10   11   13   18
9    16   8    23
5    9    9    25


Answer: Critical Value = 4.07. Test statistic = 11.6

Page 42: Anova by Hazilah Mohd Amin

Hey! Let's get our hands dirty… Using SPSS…

Page 43: Anova by Hazilah Mohd Amin

One Way Analysis of Variance Using SPSS

• Suppose we want to know whether students who have to work many hours outside school to support themselves find their grades suffering.

• We examine this question by comparing the GPAs of students who work various hours outside school.

• Let’s examine this question using the data in the Student file. File>Open>Student

Page 44: Anova by Hazilah Mohd Amin

One Way Analysis of Variance Using SPSS

• First examine the average GPA for each of the three work categories (0 hrs, 1-19hrs, >20hrs) - WorkCat

• Graph>Boxplot then choose Simple and click Define. Select GPA as the variable and WorkCat for the Category Axis. Click Options…

Page 45: Anova by Hazilah Mohd Amin

After clicking Options…, deselect "Display groups defined by missing values", then click Continue and OK.

• You’ll get this

Page 46: Anova by Hazilah Mohd Amin

What is the Box-plot telling us?

• There is some variation across the groups.

• The median GPAs (dark line in the middle of each box) differ slightly between groups.

• So, should we attribute the observed difference to sampling error, or do the groups genuinely differ?

• Neither the box-plot nor the medians offer decisive evidence. Hence we need ANOVA.

Page 47: Anova by Hazilah Mohd Amin

One Way Analysis of Variance Using SPSS

• We are testing: H0: μ1 = μ2 = μ3

H1: At least two means differ

• Before attempting ANOVA, we need to review the ANOVA assumptions: (i) independent samples, (ii) normality, (iii) equality of variances. We can test both (ii) & (iii).

• Analyze>Descriptive Statistics>Explore

Page 48: Anova by Hazilah Mohd Amin

Analyze>Descriptive Statistics>Explore

• In the Explore dialog box, select GPA as the Dependent List variable, WorkCat as the Factor List variable and Plots as the Display. Next, click Plots…

• We are interested in a normality test: select and deselect the options marked on the slide image, then click Continue and OK. See next slide…

Page 49: Anova by Hazilah Mohd Amin

The output has several parts; let's focus on the tests of normality

• The Kolmogorov-Smirnov test assesses whether there is significant departure from normality in the population distribution of the 3 groups. H0: Distributions are normal.

• Look at the p-values, all are > 0.05. Do not reject H0. Hence no evidence of non-normality.

Page 50: Anova by Hazilah Mohd Amin

One Way Analysis of Variance Using SPSS

• We still need to validate the homogeneity of variance assumption. We do this within ANOVA.

• Analyze>Compare Means>One-Way ANOVA

• Dependent List variable is GPA and Factor variable is WorkCat. Click Options…

Page 51: Anova by Hazilah Mohd Amin

One Way Analysis of Variance Using SPSS

• Under Statistics, select Descriptive and Homogeneity of variance test. Click Continue & OK.

• H0: Variances are equal. The One-Way ANOVA output consists of many parts. Look at the p-value of the homogeneity test: it is > 0.05.

• Hence do not reject H0.
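For readers without SPSS, the idea behind a homogeneity-of-variance test can be sketched in Python. SPSS reports Levene's statistic for this option; the code below implements the mean-centred form of that test, which is a one-way ANOVA on the absolute deviations from each group mean. The GPA values are hypothetical (the Student file is not reproduced in these notes), and the p-value formula is valid only because three groups give a numerator df of 2.

```python
def f_ratio(groups):
    """Return (F, df1, df2) for a one-way ANOVA on the given groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (n - k)), k - 1, n - k

def levene_mean_based(groups):
    """Levene-type statistic: ANOVA on absolute deviations from each group mean."""
    z = [[abs(x - sum(g) / len(g)) for x in g] for g in groups]
    return f_ratio(z)

# Hypothetical GPAs for the three work categories (0 hrs, 1-19 hrs, >20 hrs).
gpa = [
    [3.4, 3.1, 3.6, 2.9, 3.3, 3.5],
    [3.2, 2.8, 3.0, 3.4, 2.7, 3.1],
    [2.6, 3.1, 2.3, 2.9, 2.5, 2.8],
]

w, df1, df2 = levene_mean_based(gpa)
p_value = (1 + 2 * w / df2) ** (-df2 / 2)   # right-tail area, valid only for df1 == 2
equal_variances_plausible = p_value > 0.05

print(round(w, 3), round(p_value, 3), equal_variances_plausible)
```

A p-value above 0.05 is read the same way as on the slide: do not reject the hypothesis of equal variances.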

Page 52: Anova by Hazilah Mohd Amin

Normality & Homogeneity of variances assumptions met … hence

• Let's find out whether students who work various hours outside school differ in their GPAs.

• The P-value of .000 is very small, hence we reject Ho and conclude that the mean GPAs are not all the same. Where are the differences? Hence Post-Hoc test…

Page 53: Anova by Hazilah Mohd Amin

End of ANOVA

See U Later…

Page 54: Anova by Hazilah Mohd Amin

One-Way ANOVA F-Test: Exercise 1 Solution

• You’re a trainer for Microsoft Corp. Is there any evidence to suggest the type of training method has a significant effect on the learning time at the 0.05 level?

• The data in the table represent the learning times (in hours) of 12 people using 4 different training methods.

M1   M2   M3   M4
10   11   13   18
9    16   8    23
5    9    9    25


Page 55: Anova by Hazilah Mohd Amin

Summary Table Solution*

Source of Variation   Degrees of Freedom   Sum of Squares   Mean Square (Variance)   F
Treatment (Methods)   4 - 1 = 3            348              116                      11.6
Error                 12 - 4 = 8           80               10
Total                 12 - 1 = 11          428
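The summary table for Exercise 1 can be checked from the raw learning times. A minimal sketch with illustrative variable names; the critical value 4.07 is taken from the exercise answer rather than computed.

```python
methods = {
    "M1": [10, 9, 5],
    "M2": [11, 16, 9],
    "M3": [13, 8, 9],
    "M4": [18, 23, 25],
}

k = len(methods)
n = sum(len(v) for v in methods.values())
grand = sum(sum(v) for v in methods.values()) / n

ss_treatment = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in methods.values())
ss_error = sum((x - sum(v) / len(v)) ** 2 for v in methods.values() for x in v)
ss_total = ss_treatment + ss_error

ms_treatment = ss_treatment / (k - 1)
ms_error = ss_error / (n - k)
f_stat = ms_treatment / ms_error

f_crit_from_slide = 4.07          # F(.05; 3, 8) as given in the exercise answer
reject_h0 = f_stat > f_crit_from_slide

print(ss_treatment, ss_error, ss_total, ms_treatment, ms_error, f_stat, reject_h0)
```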

Page 56: Anova by Hazilah Mohd Amin

One-Way ANOVA F-Test Solution*

• H0: μ1 = μ2 = μ3 = μ4

• Ha: Not All Equal

• α = .05

• ν1 = 3, ν2 = 8 (degrees of freedom)

• Critical Value(s): F = 4.07

[Figure: F curve starting at 0 with right-tail area α = .05 beyond the critical value 4.07.]

Test Statistic: F = MSB / MSE = 116 / 10 = 11.6

Decision: Reject at α = .05

Conclusion: There Is Evidence Pop. Means Are Different