anova

31
Created by: Prabhat Mittal 1 E-mail ID: [email protected] Statistics for Business Analysis Day 10 Session-I ANOVA: Analysis of Variance Learning Objectives To compare more than two population means using one way analysis of variance To use the F distribution to test hypotheses about two population variances To learn about randomized block design To learn the technique of two-way analysis of variance and the concept of interaction

Transcript of anova

Page 1: anova

Created by: Prabhat Mittal 1E-mail ID: [email protected]

Statistics for Business Analysis

Day 10

Session-I

ANOVA: Analysis of Variance

Learning Objectives

� To compare more than two population means

using one way analysis of variance

� To use the F distribution to test hypotheses about

two population variances

� To learn about randomized block design

� To learn the technique of two-way analysis of

variance and the concept of interaction

Page 2: anova

Created by: Prabhat Mittal 2E-mail ID: [email protected]

Chapter Overview

Analysis of Variance (ANOVA)

F-test

Tukey-Kramer

test

One-Way ANOVA

Two-Way ANOVA

Interaction

Effects

Randomized Block Design

Multiple

Comparisons

General ANOVA Setting

� Investigator controls one or more independent variables

� Called factors (or treatment variables)

� Each factor contains two or more levels (or groups or

categories/classifications)

� Observe effects on the dependent variable

� Response to levels of independent variable

� Experimental design: the plan used to collect the data

Page 3: anova

Created by: Prabhat Mittal 3E-mail ID: [email protected]

Completely Randomized Design

� Experimental units (subjects) are assigned randomly to treatments

� Subjects are assumed homogeneous

� Only one factor or independent variable

� With two or more treatment levels

� Analyzed by one-way analysis of variance (ANOVA)

One-Way Analysis of Variance

� Evaluate the difference among the means of three or more groups

Examples: Accident rates for 1st, 2nd, and 3rd shift

Expected mileage for five brands of tires

� Assumptions

� Populations are normally distributed

� Populations have equal variances

� Samples are randomly and independently drawn

Page 4: anova

Created by: Prabhat Mittal 4E-mail ID: [email protected]

Hypotheses of One-Way ANOVA

� All population means are equal

� i.e., no treatment effect (no variation in means among

groups)

� At least one population mean is different

� i.e., there is a treatment effect

� Does not mean that all population means are different

(some pairs may be the same)

c3210 µµµµ:H ==== L

same the are means population the of all Not:H1

One-Factor ANOVA

All Means are the same:

The Null Hypothesis is True

(No Treatment Effect)

c3210 µµµµ:H ==== L

same the are µ all Not:H j1

321 µµµ ==

Page 5: anova

Created by: Prabhat Mittal 5E-mail ID: [email protected]

One-Factor ANOVA

At least one mean is different:

The Null Hypothesis is NOT true

(Treatment Effect is present)

c3210 µµµµ:H ==== L

same the are µ all Not:H j1

321 µµµ ≠= 321 µµµ ≠≠

or

(continued)

Partitioning the Variation

� Total variation can be split into two parts:

SST = Total Sum of Squares

(Total variation)SSA = Sum of Squares Among Groups

(Among-group variation)

SSW = Sum of Squares Within Groups

(Within-group variation)

SST = SSA + SSW

Page 6: anova

Created by: Prabhat Mittal 6E-mail ID: [email protected]

Partitioning the Variation

Total Variation = the aggregate dispersion of the individual

data values across the various factor levels (SST)

Within-Group Variation = dispersion that exists among

the data values within a particular factor level (SSW)

Among-Group Variation = dispersion between the factor

sample means (SSA)

SST = SSA + SSW

(continued)

Partition of Total Variation

Variation Due to

Factor (SSA)Variation Due to Random

Sampling (SSW)

Total Variation (SST)

Commonly referred to as:

� Sum of Squares Within

� Sum of Squares Error

� Sum of Squares Unexplained

� Within-Group Variation

Commonly referred to as:

� Sum of Squares Between

� Sum of Squares Among

� Sum of Squares Explained

� Among Groups Variation

= +

d.f. = n – 1

d.f. = c – 1 d.f. = n – c

Page 7: anova

Created by: Prabhat Mittal 7E-mail ID: [email protected]

Total Sum of Squares

∑∑= =

−=c

1j

n

1i

2ij

j

)XX(SST

Where:

SST = Total sum of squares

c = number of groups (levels or treatments)

nj = number of observations in group j

Xij = ith observation from group j

X = grand mean (mean of all data values)

SST = SSA + SSW

Total Variation

Group 1 Group 2 Group 3

Response, X

X

2

cn

2

12

2

11 )XX(...)XX()XX(SSTc

−++−+−=

(continued)

Page 8: anova

Created by: Prabhat Mittal 8E-mail ID: [email protected]

Among-Group Variation

Where:

SSA = Sum of squares among groups

c = number of groups

nj = sample size from group j

Xj = sample mean from group j

X = grand mean (mean of all data values)

2j

c

1j

j )XX(nSSA −=∑=

SST = SSA + SSW

Among-Group Variation

Variation Due to

Differences Among Groups

iµ jµ

2j

c

1j

j )XX(nSSA −=∑=

1c

SSAMSA

−=

Mean Square Among =

SSA/degrees of freedom

(continued)

Page 9: anova

Created by: Prabhat Mittal 9E-mail ID: [email protected]

Among-Group Variation

Group 1 Group 2 Group 3

Response, X

X1

X 2X

3X

22

22

2

11 )xx(n...)xx(n)xx(nSSA cc −++−+−=

(continued)

Within-Group Variation

Where:

SSW = Sum of squares within groups

c = number of groups

nj = sample size from group j

Xj = sample mean from group j

Xij = ith observation in group j

2jij

n

1i

c

1j

)XX(SSWj

−= ∑∑==

SST = SSA + SSW

Page 10: anova

Created by: Prabhat Mittal 10E-mail ID: [email protected]

Within-Group Variation

Summing the variation

within each group and then

adding over all groups cn

SSWMSW

−=

Mean Square Within =

SSW/degrees of freedom

2jij

n

1i

c

1j

)XX(SSWj

−= ∑∑==

(continued)

Within-Group Variation

Group 1 Group 2 Group 3

Response, X

1X 2

X3

X

2ccn

2212

2111 )XX(...)XX()Xx(SSW

c−++−+−=

(continued)

Page 11: anova

Created by: Prabhat Mittal 11E-mail ID: [email protected]

Obtaining the Mean Squares

cn

SSWMSW

−=

1c

SSAMSA

−=

1n

SSTMST

−=

One-Way ANOVA Table

Source of Variation

dfSS MS(Variance)

Among Groups

SSA MSA =

Within Groups

n - cSSW MSW =

Total n - 1SST =

SSA+SSW

c - 1MSA

MSW

F ratio

c = number of groups

n = sum of the sample sizes from all groups

df = degrees of freedom

SSA

c - 1

SSW

n - c

F =

Page 12: anova

Created by: Prabhat Mittal 12E-mail ID: [email protected]

One-Way ANOVAF Test Statistic

� Test statistic

MSA is mean squares among groups

MSW is mean squares within groups

� Degrees of freedom

� df1 = c – 1 (c = number of groups)

� df2 = n – c (n = sum of sample sizes from all populations)

MSW

MSAF =

H0: µ1= µ2 = … = µc

H1: At least two population means are different

Interpreting One-Way ANOVA F Statistic

� The F statistic is the ratio of the amongestimate of variance and the within estimate of variance� The ratio must always be positive

� df1 = c -1 will typically be small

� df2 = n - c will typically be large

Decision Rule:

� Reject H0 if F > FU, otherwise do not reject H0

0

α = .05

Reject H0Do not reject H0

FU

Page 13: anova

Created by: Prabhat Mittal 13E-mail ID: [email protected]

One-Way ANOVA F Test Example

You want to see if three

different golf clubs yield

different distances. You

randomly select five

measurements from trials on an automated driving

machine for each club. At the

0.05 significance level, is

there a difference in mean

distance?

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206251 216 204

••

••

One-Way ANOVA Example: Scatter Diagram

270

260

250

240

230

220

210

200

190

••

••

••

Distance

1X

2X

3X

X

227.0 x

205.8 x 226.0x 249.2x 321

=

===

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206251 216 204

Club1 2 3

Page 14: anova

Created by: Prabhat Mittal 14E-mail ID: [email protected]

One-Way ANOVA Example Computations

Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206251 216 204

X1 = 249.2

X2 = 226.0

X3 = 205.8

X = 227.0

n1 = 5

n2 = 5

n3 = 5

n = 15

c = 3

SSA = 5 (249.2 – 227)2 + 5 (226 – 227)2 + 5 (205.8 – 227)2 = 4716.4

SSW = (254 – 249.2)2 + (263 – 249.2)2 +…+ (204 – 205.8)2 = 1119.6

MSA = 4716.4 / (3-1) = 2358.2

MSW = 1119.6 / (15-3) = 93.325.275

93.3

2358.2F ==

F = 25.275

One-Way ANOVA Example Solution

H0: µ1 = µ2 = µ3

H1: µj not all equal

α = 0.05

df1= 2 df2 = 12

Test Statistic:

Decision:

Conclusion:

Reject H0 at αααα = 0.05

There is evidence that at least one µj differs from the rest

0

α = .05

FU = 3.89

Reject H0Do not reject H0

25.27593.3

2358.2

MSW

MSAF ===

Critical

Value:

FU = 3.89

Page 15: anova

Created by: Prabhat Mittal 15E-mail ID: [email protected]

145836.0Total

93.3121119.6Within Groups

3.894.99E-0525.2752358.224716.4Between Groups

F critP-valueFMSdfSSSource of Variation

ANOVA

94.2205.810295Club 3

77.522611305Club 2

108.2249.212465Club 1

VarianceAverageSumCountGroups

SUMMARY

One-Way ANOVA Excel Output

EXCEL: tools | data analysis | ANOVA: single factor

Numerical Problems

Ref: 11-28. Page no. 604 The following data show the

number of claims processed per day for a group of four insurance company employees observed for a

number of days. Test the hypothesis that the

employees’ mean claims per day are all the same.

Use the 0.05 level of significance.

Employee 1 15 17 14 12

Employee 2 12 10 13 17

Employee 3 11 14 13 15 12

Employee 4 13 12 12 14 10 9

Solution: H0: µ1 = µ2 = µ3 = µ4 & H1: µj not all equal

Page 16: anova

Created by: Prabhat Mittal 16E-mail ID: [email protected]

One-Way ANOVA Excel OutputEXCEL: tools | data analysis | ANOVA: single factor

P-value F CritFMsDfSS Sources of variation

0.26 3.281.466.48319.45Between groups

Do not reject Ho. The employees' productivities are not significantly different

1885.78Total

4.421566.33Within Groups

3.46666711.67706Employee 4

2.513655Employee 3

8.66666713524Employee 2

4.33333314.5584Employee 1

VarianceAverageSumCountGroups

SUMMARY

Statistics for Business Analysis

Day 10

Session-II

ANOVA: Analysis of Variance

Page 17: anova

Created by: Prabhat Mittal 17E-mail ID: [email protected]

The Tukey-Kramer Procedure

� Tells which population means are significantly different� e.g.: µ1 = µ2 ≠ µ3

� Done after rejection of equal means in ANOVA

� Allows pair-wise comparisons

� Compare absolute mean differences with critical range

xµ1 = µ 2

µ3

Tukey-Kramer Critical Range

where:

QU = Value from Studentized Range Distribution

with c and n - c degrees of freedom for

the desired level of α (see appendix E.9 table)

MSW = Mean Square Within

nj and nj’ = Sample sizes from groups j and j’

+=

j'j

Un

1

n

1

2

MSWQRange Critical

Page 18: anova

Created by: Prabhat Mittal 18E-mail ID: [email protected]

The Tukey-Kramer Procedure: Example

1. Compute absolute mean differences:Club 1 Club 2 Club 3

254 234 200

263 218 222

241 235 197

237 227 206251 216 204 20.2205.8226.0xx

43.4205.8249.2xx

23.2226.0249.2xx

32

31

21

=−=−

=−=−

=−=−

2. Find the QU value from the table in appendix E.10 with c = 3 and (n – c) = (15 – 3) = 12 degrees of freedom

for the desired level of α (α = 0.05 used here):

3.77QU =

The Tukey-Kramer Procedure: Example

5. All of the absolute mean differences are greater than critical range. Therefore there is a significant difference between each pair of means at 5% level of significance. Thus, with 95% confidence we can conclude that the mean distance for club 1 is greater than club 2 and 3, and club 2 is greater than club 3.

16.2855

1

5

1

2

93.33.77

n

1

n

1

2

MSWQRange Critical

j'j

U =

+=

+=

3. Compute Critical Range:

20.2xx

43.4xx

23.2xx

32

31

21

=−

=−

=−

4. Compare:

(continued)

Page 19: anova

Created by: Prabhat Mittal 19E-mail ID: [email protected]

The Randomized Block Design

� Like One-Way ANOVA, we test for equal population means (for different factor levels, for example)...

� ...but we want to control for possible variation from a second factor (with two or more levels)

� Levels of the secondary factor are called blocks

Partitioning the Variation

� Total variation can now be split into three parts:

SST = Total variation

SSA = Among-Group variationSSBL = Among-Block variation

SSE = Random variation

SST = SSA + SSBL + SSE

Page 20: anova

Created by: Prabhat Mittal 20E-mail ID: [email protected]

Sum of Squares for Blocking

Where:

c = number of groups

r = number of blocks

Xi. = mean of all values in block i

X = grand mean (mean of all data values)

∑=

−=r

1i

2i. )XX(cSSBL

SST = SSA + SSBL + SSE

Partitioning the Variation

� Total variation can now be split into three parts:

SST and SSA are

computed as they were

in One-Way ANOVA

SST = SSA + SSBL + SSE

SSE = SST – (SSA + SSBL)

Page 21: anova

Created by: Prabhat Mittal 21E-mail ID: [email protected]

Mean Squares

1c

SSAgroups among square MeanMSA

−==

1r

SSBLblocking square MeanMSBL

−==

)1)(1( −−==

cr

SSEMSE error square Mean

Randomized Block ANOVA Table

Source of Variation

dfSS MS

Among

BlocksSSBL MSBL

Error (r–1)(c-1)SSE MSE

Total rc - 1SST

r - 1 MSBL

MSE

F ratio

c = number of populations rc = sum of the sample sizes from all populations

r = number of blocks df = degrees of freedom

Among

Treatments SSA c - 1 MSA MSA

MSE

Page 22: anova

Created by: Prabhat Mittal 22E-mail ID: [email protected]

Blocking Test

� Blocking test: df1 = r – 1

df2 = (r – 1)(c – 1)

MSBL

MSE

...µµµ:H 3.2.1.0 ===

equal are means block all Not:H1

F =

Reject H0 if F > FU

� Main Factor test: df1 = c – 1

df2 = (r – 1)(c – 1)

MSA

MSE

c..3.2.10 µ...µµµ:H ====

equal are means population all Not:H1

F =

Reject H0 if F > FU

Main Factor Test

Page 23: anova

Created by: Prabhat Mittal 23E-mail ID: [email protected]

The Tukey Procedure

� To test which population means are significantly different� e.g.: µ1 = µ2 ≠ µ3

� Done after rejection of equal means in randomized block ANOVA design

� Allows pair-wise comparisons� Compare absolute mean differences with critical

range

xµµµµ = µµµµ µµµµ1 2 3

etc...

xx

xx

xx

.3.2

.3.1

.2.1

The Tukey Procedure(continued)

r

MSERange Critical uQ=

If the absolute mean difference is greater than the critical range then there is a significant difference between that pair of means at the chosen level of significance.

Compare:

?Range CriticalxxIs .j'.j >−

Page 24: anova

Created by: Prabhat Mittal 24E-mail ID: [email protected]

Factorial Design:

Two-Way ANOVA

� Examines the effect of

� Two factors of interest on the dependent

variable

� e.g., Percent carbonation and line speed on soft drink

bottling process

� Interaction between the different levels of these

two factors

� e.g., Does the effect of one particular carbonation

level depend on which level the line speed is set?

Two-Way ANOVA

� Assumptions

� Populations are normally distributed

� Populations have equal variances

� Independent random samples are drawn

(continued)

Page 25: anova

Created by: Prabhat Mittal 25E-mail ID: [email protected]

Two-Way ANOVA Sources of Variation

Two Factors of interest: A and B

r = number of levels of factor A

c = number of levels of factor B

n’ = number of replications for each cell

n = total number of observations in all cells

(n = rcn’)

Xijk = value of the kth observation of level i of

factor A and level j of factor B

Two-Way ANOVA Sources of Variation

SST

Total Variation

SSAFactor A Variation

SSBFactor B Variation

SSABVariation due to interaction

between A and B

SSERandom variation (Error)

Degrees of

Freedom:

r – 1

c – 1

(r – 1)(c – 1)

rc(n’ – 1)

n - 1

SST = SSA + SSB + SSAB + SSE

(continued)

Page 26: anova

Created by: Prabhat Mittal 26E-mail ID: [email protected]

Two Factor ANOVA Equations

∑∑∑= =

=

−=r

1i

c

1j

n

1k

2

ijk )XX(SST

2r

1i

..i )XX(ncSSA −′= ∑=

2c

1j

.j. )XX(nrSSB −′= ∑=

Total Variation:

Factor A Variation:

Factor B Variation:

Two Factor ANOVA Equations

2r

1i

c

1j

.j.i..ij. )XXXX(nSSAB +−−′= ∑∑= =

∑∑∑= =

=

−=r

1i

c

1j

n

1k

2.ijijk )XX(SSE

Interaction Variation:

Sum of Squares Error:

(continued)

Page 27: anova

Created by: Prabhat Mittal 27E-mail ID: [email protected]

Two Factor ANOVA Equations

where:Mean Grand

nrc

X

X

r

1i

c

1j

n

1k

ijk

=′

=

∑∑∑= =

=

r) ..., 2, 1, (i A factor of level i of Meannc

X

X th

c

1j

n

1k

ijk

..i ==′

=

∑∑=

=

c) ..., 2, 1, (j B factor of level j of Meannr

X

X th

r

1i

n

1k

ijk

.j. ==′

=∑∑

=

=

ij cell of Meann

XX

n

1k

ijk.ij =

′=∑

=

r = number of levels of factor A

c = number of levels of factor B

n’ = number of replications in each cell

(continued)

Mean Square Calculations

1r

SSA Afactor square MeanMSA

−==

1c

SSBB factor square MeanMSB

−==

)1c)(1r(

SSABninteractio square MeanMSAB

−−==

)1'n(rc

SSEerror square MeanMSE

−==

Page 28: anova

Created by: Prabhat Mittal 28

E-mail ID: [email protected]

Two-Way ANOVA:The F Test Statistic

F Test for Factor B Effect

F Test for Interaction Effect

H0: µ1.. = µ2.. = µ3.. = • • •

H1: Not all µi.. are equal

H0: the interaction of A and B is

equal to zero

H1: interaction of A and B is not

zero

F Test for Factor A Effect

H0: µ.1. = µ.2. = µ.3. = • • •

H1: Not all µ.j. are equal

Reject H0

if F > FUMSE

MSAF =

MSE

MSBF =

MSE

MSABF =

Reject H0

if F > FU

Reject H0

if F > FU

Two-Way ANOVASummary Table

SST

SSE

SSAB

SSB

SSA

Sum of

Squares

n – 1Total

MSE =

SSE/rc(n’ – 1)rc(n’ – 1)Error

MSAB

MSE

MSB

MSE

MSA

MSE

F

Statistic

MSAB= SSAB / (r – 1)(c – 1)

(r – 1)(c – 1)AB

(Interaction)

MSB

= SSB /(c – 1)c – 1Factor B

MSA

= SSA /(r – 1)r – 1Factor A

Mean

Squares

Degrees of

Freedom

Source of

Variation

Page 29: anova

Created by: Prabhat Mittal 29

E-mail ID: [email protected]

Features of Two-Way ANOVA F Test

� Degrees of freedom always add up

� n-1 = rc(n’-1) + (r-1) + (c-1) + (r-1)(c-1)

� Total = error + factor A + factor B + interaction

� The denominator of the F Test is always the

same but the numerator is different

� The sums of squares always add up

� SST = SSE + SSA + SSB + SSAB

� Total = error + factor A + factor B + interaction

Examples:Interaction vs. No Interaction

� No interaction:

Factor B Level 1

Factor B Level 3

Factor B Level 2

Factor A Levels

Factor B Level 1

Factor B Level 3

Factor B Level 2

Factor A Levels

Mean R

esponse

Mean R

esponse

� Interaction is present:

Page 30: anova

Created by: Prabhat Mittal 30

E-mail ID: [email protected]

Multiple Comparisons:

The Tukey Procedure

� Unless there is a significant interaction, you can determine the levels that are significantly different using the Tukey procedure

� Consider all absolute mean differences and compare to the calculated critical range

� Example: Absolute differences

for factor A, assuming three factors:

3..2..

3..1..

2..1..

XX

XX

XX

Multiple Comparisons:

The Tukey Procedure

� Critical Range for Factor A:

(where Qu is from Table E.10 with r and rc(n’–1) d.f.)

� Critical Range for Factor B:

(where Qu is from Table E.10 with c and rc(n’–1) d.f.)

n'c

MSERange Critical UQ=

n'r

MSERange Critical UQ=

Page 31: anova

Created by: Prabhat Mittal 31

E-mail ID: [email protected]

Summary

� Described one-way analysis of variance

� The logic of ANOVA

� ANOVA assumptions

� F test for difference in c means

� The Tukey-Kramer procedure for multiple comparisons

� Considered the Randomized Block Design

� Treatment and Block Effects

� Multiple Comparisons: Tukey Procedure

� Described two-way analysis of variance

� Examined effects of multiple factors

� Examined interaction between factors