
Chapter 15

The Analysis of Variance

Copyright (c) 2001 Brooks/Cole, a division of Thomson Learning, Inc.

A Problem

A study was done on the survival time of patients with advanced cancer of the stomach, bronchus, colon, ovary, or breast when treated with ascorbate [1]. The authors wanted to determine whether the survival times differ based on the affected organ.

[1] Cameron, E. and Pauling, L. (1978). Supplemental ascorbate in the supportive treatment of cancer: re-evaluation of prolongation of survival time in terminal human cancer. Proceedings of the National Academy of Sciences, USA, 75, 4538-4542.

A comparative dotplot of the survival times is shown below.

[Figure: Dotplot for Survival Time. Horizontal axis: Survival Time (in days), 0 to 3000. One row per Cancer Type: Breast, Bronchus, Colon, Ovary, Stomach.]

The hypotheses used to answer the question of interest are

$H_0: \mu_{\text{stomach}} = \mu_{\text{bronchus}} = \mu_{\text{colon}} = \mu_{\text{ovary}} = \mu_{\text{breast}}$

$H_a$: At least two of the $\mu$'s are different

The question is similar to ones encountered in Chapter 11, where we looked at tests for the difference between the means of two populations or treatments. In this case we are interested in comparing more than two.

Single-Factor Analysis of Variance (ANOVA)

A single-factor analysis of variance (ANOVA) problem involves a comparison of k population or treatment means $\mu_1, \mu_2, \dots, \mu_k$. The objective is to test the hypotheses:

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

$H_a$: At least two of the $\mu$'s are different


The analysis is based on k independently selected samples, one from each population or for each treatment.

In the case of populations, a random sample from each population is selected independently of that from any other population.

When comparing treatments, the experimental units (subjects or objects) that receive any particular treatment are chosen at random from those available for the experiment.

A comparison of treatments based on independently selected experimental units is often referred to as a completely randomized design.
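The random allocation behind a completely randomized design can be sketched in a few lines of Python. This is only an illustration: the function name `completely_randomized_assignment`, the twelve numbered units, and the treatment labels are assumptions made for the example, not part of the slides.

```python
import random


def completely_randomized_assignment(units, treatments, seed=None):
    """Shuffle the units and deal them out to the treatments in turn.

    Every unit is equally likely to land in any treatment group, and the
    group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {treatment: [] for treatment in treatments}
    for index, unit in enumerate(shuffled):
        groups[treatments[index % len(treatments)]].append(unit)
    return groups


# Hypothetical experiment: 12 available units, 3 treatments.
units = list(range(1, 13))
assignment = completely_randomized_assignment(
    units, ["Type 1", "Type 2", "Type 3"], seed=1
)
for treatment, members in assignment.items():
    print(treatment, sorted(members))
```

Fixing `seed` makes the allocation reproducible; leaving it as `None` gives a fresh random allocation on each run.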


[Figure: two comparative dotplots, with group means indicated by lines. Left: Dotplots of Yield by Fertilizer (Type 1, Type 2, Type 3; Yield axis from about 40 to 70). Right: Dotplots of Price by Subject (Statistics, Psychology, Economics, Business; Price axis from about 65 to 85).]

Notice that in the comparative dotplot on the left, the differences among the treatment means are large relative to the variability within the samples, while in the comparative dotplot on the right, the differences among the sample means relative to the sample variability are not so clear-cut. ANOVA techniques will allow us to determine whether those differences are significant.

ANOVA Notation

k = number of populations or treatments being compared

Population or treatment: 1, 2, …, k

Population or treatment mean: $\mu_1, \mu_2, \dots, \mu_k$

Population or treatment variance: $\sigma_1^2, \sigma_2^2, \dots, \sigma_k^2$

Sample size: $n_1, n_2, \dots, n_k$

Sample mean: $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$

Sample variance: $s_1^2, s_2^2, \dots, s_k^2$

$N = n_1 + n_2 + \dots + n_k$ (total number of observations in the data set)

$T$ = grand total = sum of all N observations = $n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k$

$\bar{\bar{x}}$ = grand mean = $\frac{T}{N}$

Assumptions for ANOVA

1. For each of the k populations or treatments, the response distribution is normal.

2. $\sigma_1 = \sigma_2 = \dots = \sigma_k$ (the k normal distributions have identical standard deviations).

3. The observations in the sample from any particular one of the k populations or treatments are independent of one another.

4. When comparing population means, k random samples are selected independently of one another. When comparing treatment means, treatments are assigned at random to subjects or objects.

Definitions

A measure of disparity among the sample means is the treatment sum of squares, denoted by SSTr and given by

$$SSTr = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + \dots + n_k(\bar{x}_k - \bar{\bar{x}})^2$$

A measure of variation within the k samples, called the error sum of squares and denoted by SSE, is given by

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \dots + (n_k - 1)s_k^2$$

The error df comes from adding the df's associated with each of the sample variances:

$$(n_1 - 1) + (n_2 - 1) + \dots + (n_k - 1) = n_1 + n_2 + \dots + n_k - 1 - 1 - \dots - 1 = N - k$$

A mean square is a sum of squares divided by its df. In particular,

mean square for treatments = $MSTr = \frac{SSTr}{k - 1}$

mean square for error = $MSE = \frac{SSE}{N - k}$
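The definitions of SSTr, SSE, MSTr, and MSE translate directly into code. The sketch below uses only Python's standard library; the function name `one_way_anova_summary` and the tiny data set `toy_samples` are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
from statistics import mean, variance


def one_way_anova_summary(samples):
    """Compute single-factor ANOVA sums of squares and mean squares.

    `samples` is a list of k lists, one list of observations per
    population or treatment.
    """
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)
    sample_means = [mean(s) for s in samples]
    sample_variances = [variance(s) for s in samples]  # divisor n_i - 1
    grand_mean = sum(sum(s) for s in samples) / n_total

    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, sample_means))
    sse = sum((n - 1) * v for n, v in zip(sizes, sample_variances))
    return {
        "SSTr": sstr,
        "SSE": sse,
        "treatment_df": k - 1,
        "error_df": n_total - k,
        "MSTr": sstr / (k - 1),
        "MSE": sse / (n_total - k),
    }


# Hypothetical toy data: three samples of three observations each.
toy_samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
result = one_way_anova_summary(toy_samples)
print(result)
```

For the toy data the sample means are 2, 3, and 7 and the grand mean is 4, so SSTr = 3(4 + 1 + 9) = 42 and, since each sample variance is 1, SSE = 2 + 2 + 2 = 6.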

Example

Three filling machines are used by a bottler to fill 12 oz cans of soda. In an attempt to determine whether the three machines are filling the cans to the same (mean) level, independent samples of cans filled by each were selected and the amounts of soda in the cans measured. The samples are given below.

Machine 1: 12.033 11.985 12.009 12.009 12.033 12.025 12.054 12.050

Machine 2: 12.031 11.985 11.998 11.992 11.985 12.027 11.987

Machine 3: 12.034 12.021 12.038 12.058 12.001 12.020 12.029 12.011 12.021

Example

$n_1 = 8,\ \bar{x}_1 = 12.0248,\ s_1 = 0.02301$

$n_2 = 7,\ \bar{x}_2 = 12.0007,\ s_2 = 0.01989$

$n_3 = 9,\ \bar{x}_3 = 12.0259,\ s_3 = 0.01650$

$\bar{\bar{x}} = 12.018167$

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2$$

$$= 7(0.0230078)^2 + 6(0.0198890)^2 + 8(0.01649579)^2$$

$$= 0.0037055 + 0.0023734 + 0.0021769 = 0.00825582$$

$$SSTr = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2$$

$$= 8(0.0065833)^2 + 7(-0.0174524)^2 + 9(0.0077222)^2$$

$$= 0.00034672 + 0.00213210 + 0.00053669 = 0.00301552$$

Example

With $SSTr = 0.00301552$, $SSE = 0.00825582$, $k = 3$, and $N = 24$:

mean square for treatments:

$$MSTr = \frac{SSTr}{k - 1} = \frac{0.00301552}{3 - 1} = 0.0015078$$

mean square for error:

$$MSE = \frac{SSE}{N - k} = \frac{0.00825582}{24 - 3} = 0.00039313$$

Comments

Both MSTr and MSE are quantities that are calculated from sample data. As such, both MSTr and MSE are statistics and have sampling distributions.

More specifically, when $H_0$ is true ($\mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$), the two sampling distributions have the same mean: $\mu_{MSTr} = \mu_{MSE}$.

However, when $H_0$ is false, $\mu_{MSTr} > \mu_{MSE}$, and the greater the differences among the $\mu$'s, the larger MSTr will tend to be relative to MSE.
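A small simulation can make this behaviour of the two mean squares concrete. The sketch below is illustrative only: the population means, the common standard deviation of 1, the sample size of 8, and the number of repetitions are all assumptions chosen for the demonstration, and the helper names are hypothetical.

```python
import random
from statistics import mean, variance


def mean_squares(samples):
    """Return (MSTr, MSE) for a list of samples."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    sstr = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    sse = sum((len(s) - 1) * variance(s) for s in samples)
    return sstr / (k - 1), sse / (n_total - k)


def average_mean_squares(true_means, sigma=1.0, n=8, reps=2000, seed=0):
    """Average MSTr and MSE over many simulated normal data sets."""
    rng = random.Random(seed)
    mstr_values, mse_values = [], []
    for _ in range(reps):
        samples = [[rng.gauss(mu, sigma) for _ in range(n)] for mu in true_means]
        mstr, mse = mean_squares(samples)
        mstr_values.append(mstr)
        mse_values.append(mse)
    return mean(mstr_values), mean(mse_values)


# H0 true: all population means equal (assumed values).
h0_true = average_mean_squares([10.0, 10.0, 10.0])
# H0 false: population means differ (assumed values).
h0_false = average_mean_squares([10.0, 10.5, 11.5])

print("H0 true : average MSTr, average MSE =", h0_true)
print("H0 false: average MSTr, average MSE =", h0_false)
```

When the population means are equal, the two averages should both sit near the common variance; when they differ, the average MSTr should be clearly larger while the average MSE stays near the common variance.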

The Single-Factor ANOVA F Test

Null hypothesis: $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

Alternative hypothesis: $H_a$: At least two of the $\mu$'s are different

Test statistic: $F = \frac{MSTr}{MSE}$

When $H_0$ is true and the ANOVA assumptions are reasonable, F has an F distribution with df1 = k - 1 and df2 = N - k.

Values of F more contradictory to $H_0$ than what was calculated are values even farther out in the upper tail, so the P-value is the area captured in the upper tail of the corresponding F curve.

Example

Consider the earlier example involving the three filling machines.

Machine 1: 12.033 11.985 12.009 12.009 12.033 12.025 12.054 12.050

Machine 2: 12.031 11.985 11.998 11.992 11.985 12.027 11.987

Machine 3: 12.034 12.021 12.038 12.058 12.001 12.020 12.029 12.011 12.021

$n_1 = 8,\ \bar{x}_1 = 12.0248,\ s_1 = 0.02301$

$n_2 = 7,\ \bar{x}_2 = 12.0007,\ s_2 = 0.01989$

$n_3 = 9,\ \bar{x}_3 = 12.0259,\ s_3 = 0.01650$

$\bar{\bar{x}} = 12.018167$

$SSTr = 0.00301552,\quad SSE = 0.00825582$

$MSTr = 0.0015078,\quad MSE = 0.00039313$

Example

1. Let $\mu_1$, $\mu_2$, and $\mu_3$ denote the true mean amount of soda in the cans filled by machines 1, 2, and 3, respectively.

2. $H_0: \mu_1 = \mu_2 = \mu_3$

3. $H_a$: At least two among $\mu_1$, $\mu_2$, and $\mu_3$ are different

4. Significance level: $\alpha = 0.01$

5. Test statistic: $F = \frac{MSTr}{MSE}$

6. Looking at the comparative dotplot, it seems reasonable to assume that the distributions have the same $\sigma$'s. We shall look at the normality assumption on the next slide. *

[Figure: Dotplot for Fill. Horizontal axis: Fill, 11.99 to 12.06. One row per Machine: Machine 1, Machine 2, Machine 3.]

*When the sample sizes are large, we can make judgments about both the equality of the standard deviations and the normality of the underlying populations with a comparative boxplot.

6. (continued)

Looking at normal plots for the samples, it certainly appears reasonable to assume that the samples from Machines 1 and 3 are samples from normal distributions. Unfortunately, the normal plot for the sample from Machine 2 does not appear to be a sample from a normal population. So as to have a computational example, we shall continue and finish the test, treating the result with a "grain of salt."

[Figure: Normal Probability Plot, Machine 1. Average: 12.0248, StDev: 0.0230078, N: 8. Anderson-Darling Normality Test: A-Squared: 0.235, P-Value: 0.692.]

[Figure: Normal Probability Plot, Machine 2. Average: 12.0007, StDev: 0.0198890, N: 7. Anderson-Darling Normality Test: A-Squared: 0.729, P-Value: 0.031.]

[Figure: Normal Probability Plot, Machine 3. Average: 12.0259, StDev: 0.0164958, N: 9. Anderson-Darling Normality Test: A-Squared: 0.237, P-Value: 0.702.]

7. Computation:

$N = n_1 + n_2 + n_3 = 8 + 7 + 9 = 24,\quad k = 3$

$SSTr = 0.00301552,\quad SSE = 0.00825582$

$MSTr = 0.0015078,\quad MSE = 0.00039313$

$df_1$ = treatment df = k - 1 = 3 - 1 = 2

$df_2$ = error df = N - k = 24 - 3 = 21

$$F = \frac{MSTr}{MSE} = \frac{0.0015078}{0.00039313} = 3.835$$

8. P-value:

Recall that $F = 3.835$ with $df_1 = 2$ and $df_2 = 21$.

From the F table with numerator df1 = 2 and denominator df2 = 21 we can see that

0.025 < P-value < 0.05

(Minitab reports this value to be 0.038.)

F table entries for numerator df = 2 and denominator df = 21:

Tail area   F critical value
0.100       2.57
0.050       3.47
0.025       4.42
0.010       5.78
0.001       9.77

The computed value 3.835 falls between 3.47 and 4.42.
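When the numerator df is exactly 2, as it is here, the upper-tail area of the F distribution has a simple closed form, $P(F > f) = (1 + 2f/df_2)^{-df_2/2}$, so the table lookup can be checked without statistical software. The sketch below relies on that special case only; it is not a general F-distribution routine, and the function name is a hypothetical one chosen for illustration.

```python
def f_upper_tail_df1_2(f_value, df2):
    """Upper-tail area P(F > f_value) for an F distribution with
    numerator df = 2 and denominator df = df2.

    Valid only for numerator df = 2, where the tail area reduces to
    (1 + 2 f / df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Filling-machine example: F = 3.835 with df1 = 2 and df2 = 21.
p_value = f_upper_tail_df1_2(3.835, 21)
print(round(p_value, 3))

# The tabled critical value 3.47 should give a tail area close to 0.05.
check_05 = f_upper_tail_df1_2(3.47, 21)
print(round(check_05, 3))
```

For other numerator df values a statistical package or an F table is still needed; the shortcut is only a convenient check for three-group comparisons like this one.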

9. Conclusion:

Since P-value > $\alpha$ = 0.01, we fail to reject $H_0$. We are unable to show that the mean fills differ: the differences among the mean fills of the machines are not statistically significant.

Total Sum of Squares

Total sum of squares, denoted by SSTo, is given by

$$SSTo = \sum_{\text{all } N \text{ obs.}} (x - \bar{\bar{x}})^2$$

with associated df = N - 1.

The relationship between the three sums of squares is

SSTo = SSTr + SSE

which is often called the fundamental identity for single-factor ANOVA.

Informally this relation is expressed as

Total variation = Explained variation + Unexplained variation
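The fundamental identity can be checked numerically on the filling-machine data from the earlier example. The sketch below computes each sum of squares directly from its definition; the variable names are illustrative.

```python
# Fill amounts from the filling-machine example.
machines = {
    "Machine 1": [12.033, 11.985, 12.009, 12.009, 12.033, 12.025, 12.054, 12.050],
    "Machine 2": [12.031, 11.985, 11.998, 11.992, 11.985, 12.027, 11.987],
    "Machine 3": [12.034, 12.021, 12.038, 12.058, 12.001, 12.020, 12.029,
                  12.011, 12.021],
}

all_observations = [x for sample in machines.values() for x in sample]
grand_mean = sum(all_observations) / len(all_observations)

# Total variation: every observation compared with the grand mean.
ssto = sum((x - grand_mean) ** 2 for x in all_observations)

# Explained variation: each sample mean compared with the grand mean.
sstr = sum(
    len(sample) * (sum(sample) / len(sample) - grand_mean) ** 2
    for sample in machines.values()
)

# Unexplained variation: each observation compared with its own sample mean.
sse = sum(
    sum((x - sum(sample) / len(sample)) ** 2 for x in sample)
    for sample in machines.values()
)

print(f"SSTo = {ssto:.6f}")
print(f"SSTr = {sstr:.6f}")
print(f"SSE  = {sse:.6f}")
print(f"SSTr + SSE = {sstr + sse:.6f}")
```

Up to floating-point rounding, the last two printed totals agree, which is the identity SSTo = SSTr + SSE.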

Single-factor ANOVA Table

The following is a fairly standard way of presenting the important calculations from a single-factor ANOVA. The output from most statistical packages will contain an additional column giving the P-value.

Source of variation   df      Sum of squares   Mean square           F
Treatments            k - 1   SSTr             MSTr = SSTr/(k - 1)   F = MSTr/MSE
Error                 N - k   SSE              MSE = SSE/(N - k)
Total                 N - 1   SSTo

The ANOVA table supplied by Minitab

One-way ANOVA: Fills versus Machine

Analysis of Variance for Fills
Source    DF         SS         MS      F      P
Machine    2   0.003016   0.001508   3.84  0.038
Error     21   0.008256   0.000393
Total     23   0.011271

Another Example

A food company that sells iced tea produces 4 different flavored sweetened iced teas (lemon, raspberry, peach, green tea). A dietician working for the company needed to determine whether the current formulations gave the same mean sodium levels for the four flavors. To do so, 15 bottles of each flavor were randomly (and independently) obtained and the sodium content in milligrams (mg) per 12 ounce serving was measured. The sample data are given on the next slide. Use the data to perform an appropriate hypothesis test at the 0.05 level of significance.

Another Example

Flavor 1: 35.0 35.6 34.1 39.6 35.6 32.3 36.6 34.5 35.2 33.8 36.7 37.2 34.0 33.8 35.8

Flavor 2: 37.3 37.4 38.3 34.9 39.0 36.5 36.9 37.6 34.9 40.4 37.5 33.5 38.2 34.6 34.5

Flavor 3: 35.2 33.4 34.5 38.1 36.2 35.4 38.5 31.5 36.7 35.6 36.7 39.3 36.8 31.5 33.2

Flavor 4: 35.4 35.7 31.4 34.5 34.1 31.2 37.5 37.3 31.7 33.2 33.8 35.8

1. Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ denote the true mean sodium content per 12 ounce serving of iced tea for each of the 4 flavors (lemon, raspberry, peach, green tea), in that order.

2. $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

3. $H_a$: At least two among $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ are different

4. Significance level: $\alpha = 0.05$

5. Test statistic: $F = \frac{MSTr}{MSE}$

6. Looking at the following comparative boxplot, it seems reasonable to assume that the distributions have the same $\sigma$'s and that the samples come from normal distributions (i.e., it is reasonable to assume that the distributions of sodium content per 12 ounce serving are normal for each of the four flavors).

[Figure: Boxplots of Sodium by Flavor (means are indicated by solid circles). Horizontal axis: Flavor (Flavor 1 to Flavor 4). Vertical axis: Sodium content in mg per 12 ounce serving, from about 30 to 40.]

7. Computation:

Flavor     $n_i$   $\bar{x}_i$   $s_i$
Flavor 1   15      35.320        1.764
Flavor 2   15      36.767        1.929
Flavor 3   15      35.507        2.361
Flavor 4   15      33.893        2.421

$\bar{\bar{x}} = 35.372$

$$SSTr = n_1(\bar{x}_1 - \bar{\bar{x}})^2 + n_2(\bar{x}_2 - \bar{\bar{x}})^2 + n_3(\bar{x}_3 - \bar{\bar{x}})^2 + n_4(\bar{x}_4 - \bar{\bar{x}})^2$$

$$= 15(35.320 - 35.372)^2 + 15(36.767 - 35.372)^2 + 15(35.507 - 35.372)^2 + 15(33.893 - 35.372)^2$$

$$= 62.32$$

(Minitab, working from the unrounded data, reports 62.29.)

Treatment df = k - 1 = 4 - 1 = 3

7. Computation (continued):

$$SSE = (n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2 + (n_4 - 1)s_4^2$$

$$= 14(1.764)^2 + 14(1.929)^2 + 14(2.361)^2 + 14(2.421)^2 = 255.74$$

Error df = N - k = 60 - 4 = 56

Using SSTr = 62.32, the value obtained from the four sample means and the grand mean (Minitab reports 62.29):

$$F = \frac{MSTr}{MSE} = \frac{SSTr/df_1}{SSE/df_2} = \frac{62.32/3}{255.74/56} = \frac{20.77}{4.567} = 4.55$$
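Because the four sample sizes, means, and standard deviations are tabulated, the whole computation can be reproduced from summary statistics alone. The sketch below does that in plain Python; the variable names are illustrative, and because the inputs are rounded to three decimals the results can differ slightly from Minitab's, which works from the raw data.

```python
# Summary statistics for the four iced tea flavors (from the slide).
sizes = [15, 15, 15, 15]
sample_means = [35.320, 36.767, 35.507, 33.893]
sample_sds = [1.764, 1.929, 2.361, 2.421]

k = len(sizes)
n_total = sum(sizes)

# Grand mean as the size-weighted average of the sample means.
grand_mean = sum(n * m for n, m in zip(sizes, sample_means)) / n_total

# Treatment and error sums of squares from their definitions.
sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, sample_means))
sse = sum((n - 1) * s ** 2 for n, s in zip(sizes, sample_sds))

treatment_df = k - 1
error_df = n_total - k
mstr = sstr / treatment_df
mse = sse / error_df
f_statistic = mstr / mse

print(f"grand mean = {grand_mean:.3f}")
print(f"SSTr = {sstr:.2f} on {treatment_df} df, MSTr = {mstr:.2f}")
print(f"SSE  = {sse:.2f} on {error_df} df, MSE  = {mse:.3f}")
print(f"F = {f_statistic:.2f}")
```

With these rounded inputs the F statistic agrees, to two decimals, with the value of 4.55 in the Minitab output shown later in the chapter.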

8. P-value:

F = 4.55 with $df_{\text{numerator}} = 3$ and $df_{\text{denominator}} = 56$

Using denominator df = 60 (the closest entry to 56 in the table), the F table entries for numerator df = 3 are:

Tail area   F critical value
0.100       2.18
0.050       2.76
0.025       3.34
0.010       4.13
0.001       6.17

Since 4.13 < 4.55 < 6.17, we find

0.001 < P-value < 0.01
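Bracketing a P-value from one row of the F table is a mechanical lookup, sketched below. The function name `bracket_p_value` is hypothetical; the two table rows are the ones quoted in this chapter, and F = 4.55 is the value reported in the Minitab output for the iced tea data.

```python
def bracket_p_value(f_value, table_row):
    """Return (lower, upper) bounds on the P-value from one F-table row.

    `table_row` is a list of (tail_area, critical_value) pairs for a
    fixed pair of numerator and denominator df.
    """
    # Largest tail area first, so critical values increase as we go.
    rows = sorted(table_row, reverse=True)
    lower, upper = 0.0, 1.0
    for tail_area, critical_value in rows:
        if f_value >= critical_value:
            upper = tail_area  # P-value is at most this tail area
        else:
            lower = tail_area  # P-value exceeds this tail area
            break
    return lower, upper


# Table rows quoted in the chapter.
row_df_2_21 = [(0.100, 2.57), (0.050, 3.47), (0.025, 4.42), (0.010, 5.78), (0.001, 9.77)]
row_df_3_60 = [(0.100, 2.18), (0.050, 2.76), (0.025, 3.34), (0.010, 4.13), (0.001, 6.17)]

machines_bracket = bracket_p_value(3.835, row_df_2_21)
iced_tea_bracket = bracket_p_value(4.55, row_df_3_60)
print(machines_bracket)
print(iced_tea_bracket)
```

A lower bound of 0.0 means the statistic exceeds every tabled critical value, and an upper bound of 1.0 means it is below the smallest one; in both cases the table only bounds the P-value on one side.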

9. Conclusion:

Since P-value < $\alpha$ = 0.05, we reject $H_0$. We can conclude that the mean sodium content is different for at least two of the flavors.

We need to learn how to interpret the results and will spend some time on developing techniques to describe the differences among the $\mu$'s.

Multiple Comparisons

A multiple comparison procedure is a method for identifying differences among the $\mu$'s once the hypothesis of overall equality ($H_0$) has been rejected.

The technique we will present is based on computing confidence intervals for the differences of means for all pairs.

Specifically, if k populations or treatments are studied, we would create k(k-1)/2 differences (e.g., with 3 treatments one would generate confidence intervals for $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, and $\mu_2 - \mu_3$). Notice that it is only necessary to look at a confidence interval for $\mu_1 - \mu_2$ to see if $\mu_1$ and $\mu_2$ differ.

The Tukey-Kramer Multiple Comparison Procedure

When there are k populations or treatments being compared, k(k-1)/2 confidence intervals must be computed. If we denote the relevant Studentized range critical value by q, the intervals are as follows:

For $\mu_i - \mu_j$:

$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{MSE}{2}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}$$

Two means are judged to differ significantly if the corresponding interval does not include zero.

When all of the sample sizes are the same, we denote the common sample size by $n = n_1 = n_2 = n_3 = \dots = n_k$, and the confidence intervals (for $\mu_i - \mu_j$) simplify to

$$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\frac{MSE}{n}}$$

Example (continued)

Continuing with the example dealing with the sodium content for the four flavors of iced tea, we shall compute the 95% Tukey-Kramer confidence intervals for $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$, $\mu_2 - \mu_3$, $\mu_2 - \mu_4$, and $\mu_3 - \mu_4$.

$$MSE = \frac{255.74}{56} = 4.567, \qquad n = n_1 = n_2 = n_3 = n_4 = 15$$

q = 3.74 (approximation with df = 60 rather than 56)

$$q\sqrt{\frac{MSE}{n}} = 3.74\sqrt{\frac{4.567}{15}} = 2.06$$

Example (continued)

Difference         Estimate                        95% Confidence Limits   95% Confidence Interval
$\mu_1 - \mu_2$    $\bar{x}_1 - \bar{x}_2$ = -1.45   -1.45 ± 2.06            (-3.51, 0.61)
$\mu_1 - \mu_3$    $\bar{x}_1 - \bar{x}_3$ = -0.19   -0.19 ± 2.06            (-2.25, 1.87)
$\mu_1 - \mu_4$    $\bar{x}_1 - \bar{x}_4$ = 1.43    1.43 ± 2.06             (-0.63, 3.49)
$\mu_2 - \mu_3$    $\bar{x}_2 - \bar{x}_3$ = 1.26    1.26 ± 2.06             (-0.80, 3.32)
$\mu_2 - \mu_4$    $\bar{x}_2 - \bar{x}_4$ = 2.87    2.87 ± 2.06             (0.81, 4.93)
$\mu_3 - \mu_4$    $\bar{x}_3 - \bar{x}_4$ = 1.61    1.61 ± 2.06             (-0.45, 3.67)

Notice that the confidence interval for $\mu_2 - \mu_4$ does not contain 0, so we can infer that the mean sodium contents for flavors 2 and 4 differ.
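The six intervals can be reproduced with a short function that implements the general Tukey-Kramer formula (unequal sample sizes allowed). This is a sketch under stated assumptions: the Studentized range critical value q = 3.74 is taken from the slides rather than computed, and the function name `tukey_kramer_intervals` is hypothetical.

```python
from itertools import combinations
from math import sqrt


def tukey_kramer_intervals(sample_means, sizes, mse, q):
    """Tukey-Kramer intervals for every pairwise difference mu_i - mu_j.

    `q` is the Studentized range critical value, which must be looked
    up in a table (or obtained from statistical software) beforehand.
    Keys of the returned dict are 1-based (i, j) pairs with i < j.
    """
    intervals = {}
    for i, j in combinations(range(len(sample_means)), 2):
        half_width = q * sqrt((mse / 2.0) * (1.0 / sizes[i] + 1.0 / sizes[j]))
        difference = sample_means[i] - sample_means[j]
        intervals[(i + 1, j + 1)] = (difference - half_width, difference + half_width)
    return intervals


# Iced tea example: summary statistics and critical value from the slides.
sample_means = [35.320, 36.767, 35.507, 33.893]
sizes = [15, 15, 15, 15]
intervals = tukey_kramer_intervals(sample_means, sizes, mse=255.74 / 56, q=3.74)

# A pair differs significantly when its interval excludes zero.
significant_pairs = [
    pair for pair, (low, high) in intervals.items() if low > 0 or high < 0
]

for pair, (low, high) in intervals.items():
    print(f"mu_{pair[0]} - mu_{pair[1]}: ({low:.2f}, {high:.2f})")
print("Significant pairs:", significant_pairs)
```

With equal sample sizes the half-width inside the loop reduces to $q\sqrt{MSE/n}$, matching the simplified formula used in the example.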

We also illustrate the differences with the following listing of the sample means in increasing order, with lines underneath those blocks of means that are indistinguishable.

Flavor 4   Flavor 1   Flavor 3   Flavor 2
33.893     35.320     35.507     36.767
------------------------------
           ------------------------------

Minitab Output for Example

One-way ANOVA: Sodium versus Flavor

Analysis of Variance for Sodium
Source   DF       SS      MS     F      P
Flavor    3    62.29   20.76  4.55  0.006
Error    56   255.74    4.57
Total    59   318.02

Individual 95% CIs For Mean Based on Pooled StDev
Level      N     Mean   StDev
Flavor 1  15   35.320   1.764
Flavor 2  15   36.767   1.929
Flavor 3  15   35.507   2.361
Flavor 4  15   33.893   2.421

Pooled StDev = 2.137

[Character plot: individual 95% confidence intervals for each flavor mean, drawn on an axis running from 33.0 to 37.5.]

Minitab Output for Example

Tukey's pairwise comparisons

Family error rate = 0.0500
Individual error rate = 0.0106

Critical value = 3.74

Intervals for (column level mean) - (row level mean)

            Flavor 1   Flavor 2   Flavor 3
Flavor 2      -3.510
               0.617
Flavor 3      -2.250     -0.804
               1.877      3.324
Flavor 4      -0.637      0.810     -0.450
               3.490      4.937      3.677

Simultaneous Confidence Level

The Tukey-Kramer intervals are created in a manner that controls the simultaneous confidence level.

For example, at the 95% level, if the procedure is used repeatedly on many different data sets, in the long run only about 5% of the time would at least one of the intervals fail to include the value it is estimating.

We then say that the family error rate is 5%: this is the maximum probability that one or more of the confidence intervals for the differences of means fails to contain the true difference of means.


Randomized Block Experiment

Suppose that experimental units (individuals or objects to which the treatments are applied) are first separated into groups consisting of k units in such a way that the units within each group are as similar as possible. Within any particular group, the treatments are then randomly allocated so that each unit in a group receives a different treatment. The groups are often called blocks and the experimental design is referred to as a randomized block design.


Example

When choosing a variety of melon to plant, one thing that a farmer might be interested in is the length of time (in days) for the variety to bear harvestable fruit. Since the growing conditions (soil, temperature, humidity) also affect this, a farmer might experiment with three hybrid melons (denoted hybrid A, hybrid B and hybrid C) by taking each of the four fields that he wants to use for growing melons and subdividing each field into 3 subplots (1, 2 and 3) and then planting each hybrid in one subplot of each field. The blocks are the fields and the treatments are the hybrid that is planted. The question of interest would be “Are the mean times to bring harvestable fruit the same for all three hybrids?”
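The within-block randomization for the melon experiment can be sketched as follows. The field names, the seed, and the function name `randomized_block_layout` are assumptions made for illustration; the only thing taken from the example is the structure of four fields (blocks), three subplots per field, and three hybrids (treatments).

```python
import random


def randomized_block_layout(blocks, treatments, seed=None):
    """Randomly order the treatments separately within each block.

    Returns a dict mapping each block to a dict {subplot number: treatment},
    so every treatment appears exactly once in every block.
    """
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # independent shuffle for each block
        layout[block] = {
            subplot: treatment for subplot, treatment in enumerate(order, start=1)
        }
    return layout


fields = ["Field 1", "Field 2", "Field 3", "Field 4"]
hybrids = ["A", "B", "C"]
layout = randomized_block_layout(fields, hybrids, seed=7)

for field, subplots in layout.items():
    print(field, subplots)
```

Shuffling separately inside each field is what distinguishes this design from a completely randomized one, where the twelve subplots would be shuffled as a single pool.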

Assumptions and Hypotheses

The single observation made on any particular treatment in a given block is assumed to be selected from a normal distribution. The variance of this distribution is $\sigma^2$, the same for each block-treatment combination. However, the mean value may depend separately both on the treatment applied and on the block. The hypotheses of interest are as follows:

H0: The mean value does not depend on which treatment is applied

Ha: The mean value does depend on which treatment is applied


Summary of the Randomized Block F Test

Notation: Let

k = number of treatments

l = number of blocks

$\bar{x}_i$ = average of all observations for treatment i

$\bar{b}_i$ = average of all observations in block i

$\bar{\bar{x}}$ = average of all kl observations in the experiment (the grand mean)


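Summary of the Randomized Block F Test

Sums of squares and associated df's are as follows.

Sum of Squares   Symbol   df               Formula
Treatments       SSTr     k - 1            $l\left[(\bar{x}_1 - \bar{\bar{x}})^2 + (\bar{x}_2 - \bar{\bar{x}})^2 + \cdots + (\bar{x}_k - \bar{\bar{x}})^2\right]$
Blocks           SSBl     l - 1            $k\left[(\bar{b}_1 - \bar{\bar{x}})^2 + (\bar{b}_2 - \bar{\bar{x}})^2 + \cdots + (\bar{b}_l - \bar{\bar{x}})^2\right]$
Error            SSE      (k - 1)(l - 1)   by subtraction
Total            SSTo     kl - 1           $\sum_{\text{all } kl \text{ obs.}} (x - \bar{\bar{x}})^2$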


Summary of the Randomized Block F Test

SSE is obtained by subtraction through the use of the fundamental identity

SSTo = SSTr + SSBl + SSE

Test statistic: $F = \dfrac{\mathrm{MSTr}}{\mathrm{MSE}}$

where

$\mathrm{MSTr} = \dfrac{\mathrm{SSTr}}{k - 1}$ and $\mathrm{MSE} = \dfrac{\mathrm{SSE}}{(k - 1)(l - 1)}$

The test is based on df1 = k - 1 and df2 = (k - 1)(l - 1).


The ANOVA Table for a Randomized Block Experiment

Source of Variation   df               Sum of Squares   Mean Square                                         F
Treatments            k - 1            SSTr             $\mathrm{MSTr} = \mathrm{SSTr}/(k - 1)$             $F = \mathrm{MSTr}/\mathrm{MSE}$
Blocks                l - 1            SSBl             $\mathrm{MSBl} = \mathrm{SSBl}/(l - 1)$
Error                 (k - 1)(l - 1)   SSE              $\mathrm{MSE} = \mathrm{SSE}/[(k - 1)(l - 1)]$
Total                 kl - 1           SSTo


Example (Food Prices)

In an attempt to determine which of 3 grocery chains has the best overall prices, it was felt that prices would vary a great deal if items were randomly selected from each of the chains, so a randomized block experiment was devised to answer the question. A list of standard items was developed (typically a fairly large list would be used, but in the interest of providing a small example, 15 items were chosen), and the price was supposed to be recorded for each of these items in each of the stores. There were problems with the collection of data, so only 7 items appeared in all the stores. The data are given on the next slide.


Example (Food Prices)

                                                  Store
Product                                   Tops    Wegmans   Walmart
Tide (100 oz liquid detergent)            $6.39   $5.59     $5.24
1 lb Land O'Lakes Butter                  $3.99   $3.49     $2.98
1 dozen Large Grade AA eggs               $1.49   $1.49     $0.72
Tropicana (no pulp, non-conc) OJ (64 oz)  $3.99   $2.99     $2.50
2 Liter Diet Coke                         $1.39   $1.50     $1.04
1 loaf Wonderbread                        $2.09   $2.09     $1.43
18 oz jar Skippy Peanut Butter            $2.49   $2.49     $1.77


ANOVA Table

Source of Variation   df   Sum of Squares   Mean Square   F
Treatments (Store)    2    2.7762           1.3881        22.85
Blocks                6    45.1303          7.5217
Error                 12   0.7289           0.0607
Total                 20   48.6355

The resulting P-value for the treatments (stores) is 0.000 to three decimal places. Our interpretation is that the stores have different mean costs for the selected items. We need to do some multiple comparisons to determine the actual differences.
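The entries of the food prices ANOVA table can be reproduced from the seven rows of price data with the sum-of-squares formulas from the summary slides. The following is a minimal sketch in plain Python; the function name `randomized_block_anova` and the dictionary keys are choices made for this illustration, and no P-value is computed because that would require an F distribution routine outside the standard library.

```python
def randomized_block_anova(data):
    """data[b][t] is the single observation for treatment t in block b."""
    l = len(data)      # number of blocks
    k = len(data[0])   # number of treatments
    grand = sum(sum(row) for row in data) / (k * l)
    trt_means = [sum(row[t] for row in data) / l for t in range(k)]
    blk_means = [sum(row) / k for row in data]

    ss_tr = l * sum((m - grand) ** 2 for m in trt_means)
    ss_bl = k * sum((m - grand) ** 2 for m in blk_means)
    ss_to = sum((x - grand) ** 2 for row in data for x in row)
    ss_e = ss_to - ss_tr - ss_bl  # fundamental identity, by subtraction

    ms_tr = ss_tr / (k - 1)
    ms_e = ss_e / ((k - 1) * (l - 1))
    return {
        "SSTr": ss_tr, "SSBl": ss_bl, "SSE": ss_e, "SSTo": ss_to,
        "MSTr": ms_tr, "MSE": ms_e, "F": ms_tr / ms_e,
        "df1": k - 1, "df2": (k - 1) * (l - 1),
    }


# Rows are the blocks (products); columns are Tops, Wegmans, Walmart.
prices = [
    [6.39, 5.59, 5.24],  # Tide
    [3.99, 3.49, 2.98],  # butter
    [1.49, 1.49, 0.72],  # eggs
    [3.99, 2.99, 2.50],  # orange juice
    [1.39, 1.50, 1.04],  # Diet Coke
    [2.09, 2.09, 1.43],  # bread
    [2.49, 2.49, 1.77],  # peanut butter
]

table = randomized_block_anova(prices)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The computed F is then compared with the F distribution on df1 = 2 and df2 = 12 degrees of freedom.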


Multiple Comparisons

As in single-factor ANOVA, once H0 has been rejected, declare that treatments i and j differ significantly if the interval

$(\bar{x}_i - \bar{x}_j) \pm q\sqrt{\dfrac{\mathrm{MSE}}{l}}$

does not include zero, where q is based on a comparison of k treatments and error df = (k - 1)(l - 1).


Example

Store     Mean Price
Walmart   $2.240
Wegmans   $2.806
Tops      $3.119

With 3 populations and error df = 12, the q value from the Table of Critical Values for the Studentized range distribution for a 95% Tukey confidence interval is 3.77, so

$q\sqrt{\dfrac{\mathrm{MSE}}{l}} = 3.77\sqrt{\dfrac{0.0607}{7}} = 0.351$

and the intervals are

Difference                                      Interval
$\bar{x}_{Weg} - \bar{x}_{Wal}$ = $0.566        (0.215, 0.917)
$\bar{x}_{Tops} - \bar{x}_{Wal}$ = $0.879       (0.528, 1.230)
$\bar{x}_{Tops} - \bar{x}_{Weg}$ = $0.313       (-0.038, 0.664)
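The three intervals follow directly from the store means, MSE = 0.0607, l = 7 blocks and the tabled value q = 3.77. A short Python sketch of the arithmetic is given below; the function name `tukey_intervals` is an assumption of this illustration, and q is passed in from the table rather than computed.

```python
from itertools import combinations
from math import sqrt


def tukey_intervals(means, mse, l, q):
    """Return (higher, lower, difference, lower limit, upper limit, differs) per pair."""
    half_width = q * sqrt(mse / l)
    ordered = sorted(means.items(), key=lambda item: item[1])  # ascending means
    rows = []
    for (low_name, low_mean), (high_name, high_mean) in combinations(ordered, 2):
        diff = high_mean - low_mean
        lower, upper = diff - half_width, diff + half_width
        differs = not (lower <= 0.0 <= upper)  # significant if zero is excluded
        rows.append((high_name, low_name, diff, lower, upper, differs))
    return rows


store_means = {"Walmart": 2.240, "Wegmans": 2.806, "Tops": 3.119}
intervals = tukey_intervals(store_means, mse=0.0607, l=7, q=3.77)
for high, low, diff, lower, upper, differs in intervals:
    verdict = "differ" if differs else "no significant difference"
    print(f"{high} - {low}: {diff:.3f} ({lower:.3f}, {upper:.3f}) {verdict}")
```

Only the Tops and Wegmans interval contains zero, so that is the one pair not declared different.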


Example

The resulting schematic display is

Walmart   Wegmans   Tops
$2.240    $2.806    $3.119
          ----------------

Wegmans and Tops are underscored together because the interval for their difference, (-0.038, 0.664), includes zero.

With the P-value = 0.000 and the Tukey intervals, we have established that the mean cost of the selected items is lower at Walmart than at Wegmans or Tops, but we have not shown a significant difference between Wegmans and Tops.


Two-Factor ANOVA

Notation:

k = number of levels of factor A

l = number of levels of factor B

kl = number of treatments (each one a combination of a factor A level and a factor B level)

m = number of observations on each treatment


Two-Factor ANOVA Example

A grocery store has two stocking supervisors, Fred and Wilma. The store is open 24 hours a day and would like to schedule these two individuals in the most effective manner. To help determine how to schedule them, a sample of their work was obtained by scheduling each of them 5 times on each of the three shifts and tracking the number of cases of groceries that were emptied and stacked during the shift. The data follow on the next slide.


Two-Factor ANOVA Example

                                      Shift
Supervisor   Day                   Swing                 Night
Fred         495 547 481 607 517   457 500 578 515 428   504 496 485 518 497
Wilma        481 520 498 533 507   508 471 560 518 578   572 550 583 625 598


Interactions

There is said to be an interaction between the factors if the change in true average response when the level of one factor changes depends on the level of the other factor.

One can look at the possible interaction between two factors by drawing an interaction plot, which graphs the mean response at each level of one factor, with a separate line for each level of the other factor.


Two-Factor ANOVA Example

A table of the sample means for the 30 observations.

                                      Shift
Supervisor                   Day      Swing    Night    Mean Output for Each Supervisor
Fred                         529.40   495.60   500.00   508.33
Wilma                        507.80   527.00   585.60   540.13
Mean Output for Each Shift   518.60   511.30   542.80   524.23
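A quick numerical look at these means shows why an interaction is suspected. The sketch below, with variable names chosen for this illustration, recomputes the marginal means from the six cell means (valid here because every cell has the same number of observations) and then takes the Wilma minus Fred difference on each shift. With no interaction, those three differences would be roughly equal.

```python
shifts = ["Day", "Swing", "Night"]

# Cell means copied from the table of sample means.
cell_means = {
    "Fred": {"Day": 529.40, "Swing": 495.60, "Night": 500.00},
    "Wilma": {"Day": 507.80, "Swing": 527.00, "Night": 585.60},
}

# Marginal means: average across a row (supervisor) or a column (shift).
supervisor_means = {
    name: sum(row.values()) / len(row) for name, row in cell_means.items()
}
shift_means = {
    shift: sum(row[shift] for row in cell_means.values()) / len(cell_means)
    for shift in shifts
}

# Wilma minus Fred on each shift; parallel lines would give equal gaps.
gap = {shift: cell_means["Wilma"][shift] - cell_means["Fred"][shift] for shift in shifts}

print({name: round(m, 2) for name, m in supervisor_means.items()})
print({shift: round(m, 2) for shift, m in shift_means.items()})
print({shift: round(g, 1) for shift, g in gap.items()})
```

The gap changes sign between the day shift and the other two shifts, which is the pattern an interaction plot displays as crossing lines.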


Two-Factor ANOVA Example

Typically, only one of these interaction plots will be constructed. As you can see from these diagrams, there is a suggestion that Fred does better during the day and Wilma is better at night or during the swing shift. The question to ask is “Are these differences significant?” Specifically, is there an interaction between the supervisor and the shift?

[Figure: Interaction Plot - Data Means for Cases. Two panels, each with the mean number of cases (500 to 590) on the vertical axis: one plots the mean against Supervisor (Fred, Wilma) with a line for each Shift (Day, Night, Swing); the other plots the mean against Shift with a line for each Supervisor.]


Interactions

If the graphs of true average responses are connected line segments that are parallel, there is no interaction between the factors. In this case, the change in true average response when the level of one factor is changed is the same for each level of the other factor. Special cases of no interaction are as follows:

1. The true average response is the same for each level of factor A (no factor A main effects).

2. The true average response is the same for each level of factor B (no factor B main effects).


Basic Assumptions for Two-Factor ANOVA

The observations on any particular treatment are independently selected from a normal distribution with variance $\sigma^2$ (the same variance for each treatment), and samples from different treatments are independent of one another.


Two-Factor ANOVA Table

The following is a fairly standard way of presenting the important calculations for a two-factor ANOVA.

The fundamental identity is

SSTo = SSA + SSB + SSAB + SSE
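In the notation introduced earlier (k levels of factor A, l levels of factor B, m observations per treatment), the usual layout of such a table can be written as follows. This is the conventional fixed-effects arrangement, given here as an assumption about the intended table rather than a copy of it; the symbols MSA, MSB and MSAB are introduced for this sketch.

```latex
\begin{array}{llll}
\text{Source} & \text{df} & \text{Mean square} & F \\
\hline
\text{Factor A} & k - 1 & \mathrm{MSA} = \mathrm{SSA}/(k - 1) & \mathrm{MSA}/\mathrm{MSE} \\
\text{Factor B} & l - 1 & \mathrm{MSB} = \mathrm{SSB}/(l - 1) & \mathrm{MSB}/\mathrm{MSE} \\
\text{AB interaction} & (k - 1)(l - 1) & \mathrm{MSAB} = \mathrm{SSAB}/[(k - 1)(l - 1)] & \mathrm{MSAB}/\mathrm{MSE} \\
\text{Error} & kl(m - 1) & \mathrm{MSE} = \mathrm{SSE}/[kl(m - 1)] & \\
\text{Total} & klm - 1 & &
\end{array}
```

The degrees of freedom add up in the same way as the sums of squares: (k - 1) + (l - 1) + (k - 1)(l - 1) + kl(m - 1) = klm - 1.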


Two-Factor ANOVA Example

Minitab output for the Two-Factor ANOVA

Two-way ANOVA: Cases versus Shift, Supervisor

Analysis of Variance for Cases
Source        DF      SS     MS      F       P
Shift          2    5437   2719   1.82   0.184
Supervis       1    7584   7584   5.07   0.034
Interaction    2   14365   7183   4.80   0.018
Error         24   35878   1495
Total         29   63265

1. Test of H0: no interaction between supervisor and shift

With a P-value of 0.018, there is strong evidence of an interaction.

We go no further and conclude that there is an interaction. When an interaction is present, the main-effect tests for shift and supervisor are not interpreted on their own.
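The sums of squares in the Minitab output can be checked by hand from the 30 observations. The following plain-Python sketch assumes a balanced design (the same number of observations in every cell) and assumes that the raw values group into cells as five per supervisor and shift, a grouping that reproduces the published cell means. The function name `two_factor_anova` and the result keys are choices made for this illustration; P-values are not computed.

```python
def two_factor_anova(cells):
    """cells[(a, b)] lists the m observations for level a of A and level b of B."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    k, l = len(a_levels), len(b_levels)
    m = len(next(iter(cells.values())))  # balanced design assumed

    all_obs = [x for obs in cells.values() for x in obs]
    grand = sum(all_obs) / len(all_obs)
    cell = {key: sum(obs) / m for key, obs in cells.items()}
    a_mean = {a: sum(cell[(a, b)] for b in b_levels) / l for a in a_levels}
    b_mean = {b: sum(cell[(a, b)] for a in a_levels) / k for b in b_levels}

    ss_a = l * m * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = k * m * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = m * sum(
        (cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    ss_e = sum((x - cell[key]) ** 2 for key, obs in cells.items() for x in obs)
    ms_e = ss_e / (k * l * (m - 1))
    return {
        "SSA": ss_a, "SSB": ss_b, "SSAB": ss_ab, "SSE": ss_e,
        "SSTo": sum((x - grand) ** 2 for x in all_obs),
        "F_A": ss_a / (k - 1) / ms_e,
        "F_B": ss_b / (l - 1) / ms_e,
        "F_AB": ss_ab / ((k - 1) * (l - 1)) / ms_e,
    }


# Factor A = supervisor, factor B = shift.
cases = {
    ("Fred", "Day"): [495, 547, 481, 607, 517],
    ("Fred", "Swing"): [457, 500, 578, 515, 428],
    ("Fred", "Night"): [504, 496, 485, 518, 497],
    ("Wilma", "Day"): [481, 520, 498, 533, 507],
    ("Wilma", "Swing"): [508, 471, 560, 518, 578],
    ("Wilma", "Night"): [572, 550, 583, 625, 598],
}

result = two_factor_anova(cases)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Here SSA corresponds to the Supervis row of the output and SSB to the Shift row.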


Two-Factor ANOVA Example

Looking at either of the interaction plots, it becomes clear that from a practical standpoint, Fred should be scheduled for days, and if someone is to be scheduled for the night or swing shifts it should be Wilma, although it appears that Wilma would do best at night.

[Figure: Interaction Plot - Data Means for Cases. Mean number of cases (500 to 590) plotted against Supervisor (Fred, Wilma), with a line for each Shift (Day, Night, Swing).]