Bowerman, Experimental Design, Chapter 1


  • Experimental Design

    Unified Concepts, Practical Applications, and Computer Implementation

    Bruce L. Bowerman, Richard T. O'Connell, and Emily S. Murphree

  • Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation

    Copyright © Business Expert Press, LLC, 2015. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations, not to exceed 400 words, without the prior permission of the publisher.

    First published in 2015 by
    Business Expert Press, LLC
    222 East 46th Street, New York, NY 10017
    www.businessexpertpress.com

    ISBN-13: 978-1-60649-958-0 (paperback)
    ISBN-13: 978-1-60649-959-7 (e-book)

    Business Expert Press Quantitative Approaches to Decision Making Collection

    Collection ISSN: 2163-9515 (print)
    Collection ISSN: 2163-9582 (electronic)

    Cover and interior design by Exeter Premedia Services Private Ltd., Chennai, India

    First edition: 2015

    10 9 8 7 6 5 4 3 2 1

    Printed in the United States of America.

  • Abstract

    Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation is a concise and innovative book that gives a complete presentation of the design and analysis of experiments in approximately one half the space of competing books. With only the modest prerequisite of a basic (noncalculus) statistics course, this text is appropriate for the widest possible audience.

    Keywords

    experimental design, fractional factorials, Latin square designs, nested designs, one factor analysis, one-way ANOVA, randomized block design, response surfaces, split plot design, two factor analysis, two level factorial designs, two-way ANOVA

  • Contents

    Preface  ix

    Chapter 1  An Introduction to Experimental Design: One Factor Analysis  1

    Chapter 2  Two Factor Analysis  45

    Chapter 3  More Advanced Experimental Designs  125

    Chapter 4  Two Level Factorials, Fractional Factorials, Block Confounding, and Response Surfaces  179

    Appendix A  Statistical Tables  249

    References  257

    Index  259

  • Preface

    Experimental Design: Unified Concepts, Practical Applications, and Computer Implementation is a concise and innovative book that gives a complete presentation of the design and analysis of experiments in approximately one half the space of competing books. With only the modest prerequisite of a basic (noncalculus) statistics course, this text is appropriate for the widest possible audience: college juniors, seniors, and first year graduate students in business, the social sciences, the sciences, and statistics, as well as professionals in business and industry. Using a unique and integrative approach, this text organizes and presents the two procedures for analyzing experimental design data, analysis of variance (ANOVA) and regression analysis, in such a way that the reader or instructor can move through the material more quickly and efficiently than when using competing books and so that the true advantages of both ANOVA and regression analysis are made clearer.

    Because ANOVA is more intuitive, this book devotes most of its first three chapters to showing how to use ANOVA to analyze the type of experimental design data that it can be validly used to analyze: balanced (equal sample size) data or unbalanced (unequal sample size) data from one factor studies, balanced data from two factor studies (two-way factorials and randomized block designs), and balanced data from three or more factor studies. Chapter 3 includes a general ANOVA procedure for analyzing balanced data experiments.

    Regression analysis can be used to analyze almost any balanced or unbalanced data experiment but is less intuitive than ANOVA. Therefore, this book waits to discuss regression analysis until it is needed to analyze data that cannot be analyzed by ANOVA. This is in Section 2.4, where the analysis of unbalanced data resulting from two-way factorials is discussed. Waiting until Section 2.4 gives more space to explain regression analysis from first principles to readers who have little or no background in this subject and also allows concise discussion of the regression analyses of one factor studies and incomplete block designs. Section 2.4 also introduces using regression to analyze experimental designs employing a covariate (the analysis of covariance), which is discussed in detail in the companion book to this book: Regression Analysis: Unified Concepts, Practical Applications, and Computer Implementation. Readers who wish to study all of the ANOVA procedures in Chapters 1, 2, and 3 before studying regression analysis may skip Section 2.4 without loss of continuity. Such readers would study Section 2.4 after completing Chapter 3 and then proceed to Chapter 4. Chapter 4 gives (in our opinion) the clearest and most informative discussion in any book of using regression analysis to both understand and analyze data resulting from fractional factorial and block confounding experiments. Chapter 4 also gives a short discussion of response surface methodology. In addition, all chapters feature motivating examples and conclude with a section showing how to use SAS and with a set of exercises. Excel, MINITAB, and SAS outputs are used throughout the text, and the book's website contains more exercises for each chapter.

    Author Bruce Bowerman would like to thank Professor David Nickerson of the University of Central Florida for motivating the writing of this book. Author Bowerman would also like to thank Professor John Skillings of Miami University for many helpful discussions concerning experimental design. In this book we have used some examples and ideas from an excellent, more theoretical experimental design book written by Professor Skillings and Professor Don Weber (see the references). All three authors would like to thank editor Scott Isenberg, production manager Destiny Hadley, and permission editor Marcy Schnidewind, as well as the fine people at Exeter, for their hard work. Most of all, we are indebted to our families for their love and encouragement over the years.

    Bruce L. Bowerman
    Richard T. O'Connell

    Emily S. Murphree

  • CHAPTER 1

    An Introduction to Experimental Design: One Factor Analysis

    1.1 Basic Concepts of Experimental Design

    In many statistical studies a variable of interest, called the response variable (or dependent variable), is identified. Then data are collected that tell us about how one or more factors (or independent variables) influence the variable of interest. If we cannot control the factor(s) being studied, we say that the data obtained are observational. For example, suppose that in order to study how the size of a home relates to the sales price of the home, a real estate agent randomly selects 50 recently sold homes and records the square footages and sales prices of these homes. Because the real estate agent cannot control the sizes of the randomly selected homes, we say that the data are observational.

    If we can control the factors being studied, we say that the data are experimental. Furthermore, in this case the values, or levels, of the factor (or combination of factors) are called treatments. The purpose of most experiments is to compare and estimate the effects of the different treatments on the response variable. For example, suppose that an oil company wishes to study how three different gasoline types (A, B, and C) affect the mileage obtained by a popular compact automobile model. Here the response variable is gasoline mileage, and the company will study a single factor: gasoline type. Because the oil company can control which gasoline type is used in the compact automobile, the data that the oil company will collect are experimental. Furthermore, the treatments (the levels of the factor gasoline type) are gasoline types A, B, and C.

    In order to collect data in an experiment, the different treatments are assigned to objects (people, cars, animals, or the like) that are called experimental units. For example, in the gasoline mileage situation, gasoline types A, B, and C will be compared by conducting mileage tests using a compact automobile. The automobiles used in the tests are the experimental units. In general, when a treatment is applied to more than one experimental unit, it is said to be replicated. Furthermore, when the analyst controls the treatments employed and how they are applied to the experimental units, a designed experiment is being carried out. A commonly used, simple experimental design is called the completely randomized experimental design.

    In a completely randomized experimental design, independent random samples of experimental units are assigned to the treatments. As illustrated in the following example, we can sometimes assign independent random samples of experimental units to the treatments by assigning different random samples of experimental units to different treatments.
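    This assignment scheme can be sketched in a few lines of Python. The sketch is illustrative, not part of the text; the function name and unit labels are hypothetical, and the pool size, treatment labels, and sample size are chosen to match the gasoline example that follows.

```python
import random

def completely_randomized_design(units, treatments, size, seed=None):
    """Assign an independent random sample of experimental units to each
    treatment, sampling without replacement from the remaining pool."""
    rng = random.Random(seed)
    pool = list(units)
    design = {}
    for treatment in treatments:
        chosen = rng.sample(pool, size)              # random sample for this treatment
        design[treatment] = chosen
        pool = [u for u in pool if u not in chosen]  # assigned units leave the pool
    return design

# 1,000 cars, three gasoline types, five cars per type
design = completely_randomized_design(range(1, 1001), ["A", "B", "C"], 5, seed=42)
```

    Because each call to rng.sample draws without replacement from the shrinking pool, the three samples are disjoint, mirroring the 1,000 → 995 → 990 selection described in the example below.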

    Example 1.1

    North American Oil Company is attempting to develop a reasonably priced gasoline that will deliver improved gasoline mileages. As part of its development process, the company would like to compare the effects of three types of gasoline (A, B, and C) on gasoline mileage. For testing purposes, North American Oil will compare the effects of gasoline types A, B, and C on the gasoline mileage obtained by a popular compact model called the Lance. Suppose the company has access to 1,000 Lances that are representative of the population of all Lances, and suppose the company will utilize a completely randomized experimental design that employs samples of size five. In order to accomplish this, five Lances will be randomly selected from the 1,000 available Lances. These autos will be assigned to gasoline type A. Next, five different Lances will be randomly selected from the remaining 995 available Lances. These autos will be assigned to gasoline type B. Finally, five different Lances will be randomly selected from the remaining 990 available Lances. These autos will be assigned to gasoline type C.

    Each randomly selected Lance is test driven using the appropriate gasoline type (treatment) under normal conditions for a specified distance, and the gasoline mileage for each test drive is measured. We let y_ij denote the jth mileage obtained when using gasoline type i. The mileage data obtained are given in Table 1.1. Here we assume that the set of gasoline mileage observations obtained by using a particular gasoline type is a sample randomly selected from the infinite population of all Lance mileages that could be obtained using that gasoline type. Examining the box plots shown below the mileage data, we see some evidence that gasoline type B yields the highest gasoline mileages.

    1.2 One-Way Analysis of Variance

    Suppose we wish to study the effects of p treatments (treatments 1, 2, …, p) on a response variable. For any particular treatment, say treatment i, we define μ_i and σ_i to be the mean and standard deviation of the population of all possible values of the response variable that could potentially be observed when using treatment i. Here we refer

    Table 1.1 The Gasoline Mileage Data

    Gasoline type A: y_A1 = 34.0, y_A2 = 35.0, y_A3 = 34.3, y_A4 = 35.5, y_A5 = 35.8
    Gasoline type B: y_B1 = 35.3, y_B2 = 36.5, y_B3 = 36.4, y_B4 = 37.0, y_B5 = 37.6
    Gasoline type C: y_C1 = 33.3, y_C2 = 34.0, y_C3 = 34.7, y_C4 = 33.0, y_C5 = 34.9

    [Box plots of mileage by gas type (A, B, C), with the mileage axis running from 33 to 38, appear below the data.]


    to μ_i as treatment mean i. The goal of one-way analysis of variance (often called one-way ANOVA) is to estimate and compare the effects of the different treatments on the response variable. We do this by estimating and comparing the treatment means μ_1, μ_2, …, μ_p. Here we assume that a sample has been randomly selected for each of the p treatments by employing a completely randomized experimental design. We let n_i denote the size of the sample that has been randomly selected for treatment i, and we let y_ij denote the jth value of the response variable that is observed when using treatment i. The one factor model describing y_ij says that

    y_ij = μ_i + ε_ij = μ + τ_i + ε_ij

    Here, μ_i = μ + τ_i is treatment mean i, μ is a parameter common to all treatments called the overall mean, and τ_i is a parameter unique to the ith treatment called the ith treatment effect. Furthermore, ε_ij is an error term that tells us by how much y_ij deviates from μ_i. This error term describes the effects of all factors other than treatment i on y_ij. To give precise definitions of μ and τ_i, we impose the side condition that says that

    Σ_{i=1}^{p} τ_i = 0

    This implies that μ., which we define to be the mean of the treatment means, is

    μ. = (Σ_{i=1}^{p} μ_i)/p = (Σ_{i=1}^{p} (μ + τ_i))/p = μ + (Σ_{i=1}^{p} τ_i)/p = μ

    That is, the previously considered overall mean μ is equal to μ., the mean of the treatment means. Moreover, because μ_i = μ + τ_i, the treatment effect τ_i is equal to μ_i − μ = μ_i − μ., the difference between the ith treatment mean and the mean of the treatment means.
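    These identities are easy to check numerically. The following sketch uses the gasoline-mileage sample means from this chapter as stand-ins for the treatment means; any set of means would do.

```python
# Treatment means (illustrative values: the sample means from Table 1.1)
mu = [34.92, 36.56, 33.98]

mu_dot = sum(mu) / len(mu)      # mean of the treatment means, mu-dot
tau = [m - mu_dot for m in mu]  # treatment effects: tau_i = mu_i - mu-dot

# The side condition holds (the tau_i sum to zero, up to rounding),
# and each treatment mean is recovered as mu-dot + tau_i
recovered = [mu_dot + t for t in tau]
```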


    The point estimate of the ith treatment mean μ_i is

    ȳ_i = (Σ_{j=1}^{n_i} y_ij) / n_i

    the mean of the sample of n_i values of the response variable observed when using treatment i. Moreover, the point estimate of σ_i, the standard deviation of the population of all possible values of the response variable that could potentially be observed when using treatment i, is

    s_i = sqrt[ Σ_{j=1}^{n_i} (y_ij − ȳ_i)² / (n_i − 1) ]

    the standard deviation of the sample of n_i values of the response variable observed when using treatment i.

    For example, consider the gasoline mileage situation. We let μ_A, μ_B, and μ_C denote the means and σ_A, σ_B, and σ_C denote the standard deviations of the populations of all possible gasoline mileages using gasoline types A, B, and C. To estimate these means and standard deviations, North American Oil has employed a completely randomized experimental design and has obtained the samples of mileages in Table 1.1. The means of these samples, ȳ_A = 34.92, ȳ_B = 36.56, and ȳ_C = 33.98, are the point estimates of μ_A, μ_B, and μ_C. The standard deviations of these samples, s_A = .7662, s_B = .8503, and s_C = .8349, are the point estimates of σ_A, σ_B, and σ_C. Using these point estimates, we will (later in this section) test to see whether there are any statistically significant differences between the treatment means μ_A, μ_B, and μ_C. If such differences exist, we will estimate the magnitudes of these differences. This will allow North American Oil to judge whether these differences have practical importance.
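    These point estimates can be reproduced with Python's standard statistics module (a sketch: statistics.stdev uses the same n_i − 1 divisor as the formula for s_i above).

```python
from statistics import mean, stdev

# The gasoline mileage samples of Table 1.1
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}

ybar = {t: mean(y) for t, y in mileage.items()}  # point estimates of the treatment means
s = {t: stdev(y) for t, y in mileage.items()}    # point estimates of the sigma_i
```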

    The one-way ANOVA formulas allow us to test for significant differences between treatment means and allow us to estimate differences


    between treatment means. The validity of these formulas requires that the following three ANOVA assumptions hold:

    1. Constant variance: the p populations of values of the response variable associated with the treatments have equal variances. We denote the constant variance as σ².

    2. Normality: the p populations of values of the response variable associated with the treatments all have normal distributions.

    3. Independence: the different y_ij response variable values are statistically independent of each other.

    Because the previously described process of randomly assigning experimental units to the treatments implies that each y_ij can be assumed to be a randomly selected response variable value, the ANOVA assumptions say that each y_ij is assumed to have been randomly and independently selected from a population of response variable values that is normally distributed with mean μ_i and variance σ². Stated in terms of the error term of the one factor model y_ij = μ_i + ε_ij, the ANOVA assumptions say that each ε_ij is assumed to have been randomly and independently selected from a population of error term values that is normally distributed with mean zero and variance σ².

    The one-way ANOVA results are not very sensitive to violations of the equal variances assumption. Studies have shown that this is particularly true when the sample sizes employed are equal (or nearly equal). Therefore, a good way to make sure that unequal variances will not be a problem is to take samples that are the same size. In addition, it is useful to compare the sample standard deviations s_1, s_2, …, s_p to see if they are reasonably equal. As a general rule, the one-way ANOVA results will be approximately correct if the largest sample standard deviation is no more than twice the smallest sample standard deviation. The variations of the samples can also be compared by constructing a box plot for each sample (as we have done for the gasoline mileage data in Table 1.1). Several statistical tests also employ the sample variances to test the equality of the population variances. See Section 1.3.
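    The twice-the-smallest rule of thumb is easy to check directly; a sketch using the Table 1.1 samples:

```python
from statistics import stdev

samples = [
    [34.0, 35.0, 34.3, 35.5, 35.8],  # gasoline type A
    [35.3, 36.5, 36.4, 37.0, 37.6],  # gasoline type B
    [33.3, 34.0, 34.7, 33.0, 34.9],  # gasoline type C
]
sds = [stdev(sample) for sample in samples]

# Rule of thumb: the ANOVA results are approximately correct
# if this ratio is no more than 2
ratio = max(sds) / min(sds)
```

    For these data the ratio is about 1.11, well under 2, consistent with the text's conclusion that the constant variance assumption is reasonable.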

    The normality assumption says that each of the p populations is normally distributed. This assumption is not crucial. It has been shown that


    the one-way ANOVA results are approximately valid for mound-shaped distributions. It is useful to construct a box plot or a stem-and-leaf display for each sample. If the distributions are reasonably symmetric, and if there are no outliers, the ANOVA results can be trusted for sample sizes as small as 4 or 5. As an example, consider the gasoline mileage study of Example 1.1. The box plots of Table 1.1 suggest that the variability of the mileages in each of the three samples is roughly the same. Furthermore, the sample standard deviations s_A = .7662, s_B = .8503, and s_C = .8349 are reasonably equal (the largest is not even close to twice the smallest). Therefore, it is reasonable to believe that the constant variance assumption is satisfied. Moreover, because the sample sizes are the same, unequal variances would probably not be a serious problem anyway. Many small, independent factors influence gasoline mileage, so the distributions of mileages for gasoline types A, B, and C are probably mound-shaped. In addition, the box plots of Table 1.1 indicate that each distribution is roughly symmetric with no outliers. Thus, the normality assumption probably approximately holds. Finally, because North American Oil employed a completely randomized design, with each of the fifteen different response variable values (gasoline mileages) being obtained by using a different experimental unit (car), the independence assumption probably holds.

    1.2.1 Testing for Significant Differences Between Treatment Means

    As a preliminary step in one-way ANOVA, we wish to determine whether there are any statistically significant differences between the treatment means μ_1, μ_2, …, μ_p. To do this, we test the null hypothesis

    H0: μ_1 = μ_2 = … = μ_p

    which says that the p treatment means are equal. Moreover, because we have seen that the ith treatment effect τ_i is equal to μ_i − μ, the null hypothesis H0: μ_1 = μ_2 = … = μ_p is equivalent to the null hypothesis

    H0: τ_1 = τ_2 = … = τ_p = 0


    That is, the null hypothesis H0 of equal treatment means is equivalent to the null hypothesis that all treatment effects are zero, which says that all the treatments have the same effect on the mean response. We test H0 versus the alternative hypothesis

    Ha: At least two of μ_1, μ_2, …, μ_p differ

    or, equivalently,

    Ha: At least two of τ_1, τ_2, …, τ_p do not equal zero

    This alternative says that at least two treatments have different effects on the mean response.

    To carry out such a test, we compare what we call the between-treatment variability to the within-treatment variability. We can understand and numerically measure these two types of variability by defining several sums of squares and mean squares. To begin to do this we define n to be the total number of experimental units employed in the one-way ANOVA, and we define ȳ to be the overall mean of all observed values of the response variable. Then we define the following:

    The treatment sum of squares is

    SST = Σ_{i=1}^{p} n_i (ȳ_i − ȳ)²

    In order to compute SST, we calculate the difference between each sample treatment mean ȳ_i and the overall mean ȳ, we square each of these differences, we multiply each squared difference by the number of observations for that treatment, and we sum over all treatments. The SST measures the variability of the sample treatment means. For instance, if all the sample treatment means (ȳ_i values) were equal, then the treatment sum of squares would be equal to 0. The more the ȳ_i values vary, the larger will


    be SST. In other words, the treatment sum of squares measures the amount of between-treatment variability.

    As an example, consider the gasoline mileage data in Table 1.1. In this experiment we employ a total of

    n = n_A + n_B + n_C = 5 + 5 + 5 = 15

    experimental units. Furthermore, the overall mean of the 15 observed gasoline mileages is

    ȳ = (34.0 + 35.0 + … + 34.9)/15 = 527.3/15 = 35.153

    Then

    SST = Σ_{i=1}^{p} n_i (ȳ_i − ȳ)²
        = n_A(ȳ_A − ȳ)² + n_B(ȳ_B − ȳ)² + n_C(ȳ_C − ȳ)²
        = 5(34.92 − 35.153)² + 5(36.56 − 35.153)² + 5(33.98 − 35.153)²
        = 17.0493

    In order to measure the within-treatment variability, we define the following quantity:

    The error sum of squares is

    SSE = Σ_{j=1}^{n_1} (y_1j − ȳ_1)² + Σ_{j=1}^{n_2} (y_2j − ȳ_2)² + … + Σ_{j=1}^{n_p} (y_pj − ȳ_p)²

    Here y_1j is the jth observed value of the response in the first sample, y_2j is the jth observed value of the response in the second sample, and so forth. The previous formula says that we compute SSE by calculating the squared difference between each observed value of the response and its


    corresponding sample treatment mean and by summing these squared differences over all the observations in the experiment.

    The SSE measures the variability of the observed values of the response variable around their respective sample treatment means. For example, if there were no variability within each sample, the error sum of squares would be equal to 0. The more the values within the samples vary, the larger will be SSE.

    As an example, in the gasoline mileage study, the sample treatment means are ȳ_A = 34.92, ȳ_B = 36.56, and ȳ_C = 33.98. It follows that

    SSE = Σ_{j=1}^{n_A} (y_Aj − ȳ_A)² + Σ_{j=1}^{n_B} (y_Bj − ȳ_B)² + Σ_{j=1}^{n_C} (y_Cj − ȳ_C)²
        = [(34.0 − 34.92)² + (35.0 − 34.92)² + (34.3 − 34.92)² + (35.5 − 34.92)² + (35.8 − 34.92)²]
          + [(35.3 − 36.56)² + (36.5 − 36.56)² + (36.4 − 36.56)² + (37.0 − 36.56)² + (37.6 − 36.56)²]
          + [(33.3 − 33.98)² + (34.0 − 33.98)² + (34.7 − 33.98)² + (33.0 − 33.98)² + (34.9 − 33.98)²]
        = 8.028

    Finally, we define a total sum of squares, denoted SSTO, to be

    SSTO = Σ_{i=1}^{p} Σ_{j=1}^{n_i} (y_ij − ȳ)²

    It can be shown that SSTO is the sum of SST and SSE. That is:

    SSTO = SST + SSE

    This says that the total variability in the observed values of the response must come from one of two sources: the between-treatment variability or the within-treatment variability. Therefore, the SST and SSE are said to partition the total sum of squares. For the gasoline mileage study

    SSTO = SST + SSE = 17.0493 + 8.028 = 25.0773
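    The partition SSTO = SST + SSE can be confirmed numerically for these data (a sketch; the direct SSTO computation and the sum of the two pieces agree to floating-point precision):

```python
from statistics import mean

# The gasoline mileage samples of Table 1.1
mileage = {
    "A": [34.0, 35.0, 34.3, 35.5, 35.8],
    "B": [35.3, 36.5, 36.4, 37.0, 37.6],
    "C": [33.3, 34.0, 34.7, 33.0, 34.9],
}
all_y = [y for sample in mileage.values() for y in sample]
grand_mean = mean(all_y)

ssto = sum((y - grand_mean) ** 2 for y in all_y)                           # total
sst = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in mileage.values())  # between
sse = sum((y - mean(s)) ** 2 for s in mileage.values() for y in s)         # within
```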


    Using the treatment and error sums of squares, we next define two mean squares. The treatment mean square is

    MST = SST / (p − 1)

    The error mean square is

    MSE = SSE / (n − p)

    In order to test whether there are any statistically significant differences between the treatment means, we compare the amount of between-treatment variability (MST) to the amount of within-treatment variability (MSE) by using the test statistic

    F = MST / MSE

    It can be shown that if the null hypothesis H0: μ_1 = μ_2 = … = μ_p is true, then the population of all possible values of F is described by an F distribution having p − 1 numerator and n − p denominator degrees of freedom. It can also be shown that E(MST) and E(MSE), the expected values of the mean squares MST and MSE, are given by the formulas

    E(MST) = σ² + Σ_{i=1}^{p} n_i (μ_i − μ*)² / (p − 1)   and   E(MSE) = σ²

    where μ* = Σ_{i=1}^{p} n_i μ_i / n.

    If H0: μ_1 = μ_2 = … = μ_p is true, the part of E(MST) after the plus sign equals zero and thus E(MST) = σ². This implies that E(MST)/E(MSE) = 1. On the other hand, if H0 is not true, the part of E(MST) after the plus sign is greater than 0 and thus E(MST) > σ². This implies that E(MST)/E(MSE) > 1. We conclude that values of F = MST/MSE that are large (substantially greater than 1)


    would lead us to reject H0. To decide exactly how large F has to be to reject H0, we consider the probability of a Type I error for the hypothesis test. A Type I error is committed if we reject H0: μ_1 = μ_2 = … = μ_p when H0 is true. This means that we would conclude that the treatment means differ when they do not differ. To perform the hypothesis test, we set the probability of a Type I error (also called the level of significance) for the test equal to a specified value α. The smaller the value of α at which we can reject H0, the smaller is the probability that we have concluded that the treatment means differ when they do not differ. Therefore, the stronger is the evidence that we have made the correct decision in concluding that the treatment means differ. In practice, we usually choose α to be between .10 and .01, with .05 being the most common choice of α. If we can reject H0 at level of significance .05, we regard this as strong evidence that the treatment means differ. Note that we rarely set α lower than .01 because doing so would mean that the probability of a Type II error (failing to conclude that the treatment means differ when they really do differ) would be unacceptably large.

    An F Test for Differences Between Treatment Means

    Suppose that we wish to compare p treatment means μ_1, μ_2, …, μ_p and consider testing

    H0: μ_1 = μ_2 = … = μ_p (all treatment means are equal)

    versus

    Ha: At least two of μ_1, μ_2, …, μ_p differ (at least two treatment means differ)

    Define the F statistic

    F = MST / MSE = [SST / (p − 1)] / [SSE / (n − p)]


    Also define the p-value related to F to be the area under the curve of the F distribution having p − 1 numerator and n − p denominator degrees of freedom to the right of F.

    Then, we can reject H0 in favor of Ha at level of significance α if either of the following equivalent conditions hold:

    1. F > F_α
    2. p-value < α

    Here, F_α is the point on the horizontal axis under the curve of the F distribution having p − 1 numerator and n − p denominator degrees of freedom that gives a right hand tail area equal to α.

    Figure 1.1 illustrates the rejection point F_α and the p-value for the hypothesis test. A large value of F results when MST, which measures the between-treatment variability, is large in comparison to MSE, which measures the within-treatment variability. If F is large enough, this implies that the null hypothesis H0 should be rejected. The rejection point F_α tells us when F is large enough to reject H0 at level of significance α. When F is large, the associated p-value is small. If this p-value is less than α, we can reject H0 at level of significance α.

    Example 1.2

    Consider the North American Oil Company data in Table 1.1. The company wishes to determine whether any of gasoline types A, B, and C have different effects on mean Lance gasoline mileage. That is, we wish to see whether there are any statistically significant differences between μ_A, μ_B, and μ_C. To do this, we test the null hypothesis H0: μ_A = μ_B = μ_C, which says that gasoline types A, B, and C have the same effects on mean gasoline mileage. We test H0 versus the alternative Ha: At least two of μ_A, μ_B, and μ_C differ, which says that at least two of gasoline types A, B, and C have different effects on mean gasoline mileage.

    Because we have previously computed SST to be 17.0493 and SSE to be 8.028, and because we are comparing p = 3 treatment means, we have


    MST = SST / (p − 1) = 17.0493 / (3 − 1) = 8.525

    and

    MSE = SSE / (n − p) = 8.028 / (15 − 3) = 0.669

    It follows that

    F = MST / MSE = 8.525 / 0.669 = 12.74

    In order to test H0 at the .05 level of significance, we use F_.05 with p − 1 = 3 − 1 = 2 numerator and n − p = 15 − 3 = 12 denominator degrees

    Figure 1.1 An F test for testing for differences between treatment means

    [Both panels show the curve of the F distribution having p − 1 and n − p degrees of freedom. Panel (a): the rejection point F_α is set so that the right hand tail area beyond it equals α, the probability of a Type I error; if F ≤ F_α, do not reject H0 in favor of Ha, and if F > F_α, reject H0 in favor of Ha. Panel (b): if the p-value (the tail area to the right of F) is smaller than α, then F > F_α and we reject H0.]


    of freedom. Table A1 in Appendix A tells us that this F point equals 3.89, so we have

    F = 12.74 > F_.05 = 3.89

    Therefore, we reject H0 at the .05 level of significance. This says we have strong evidence that at least two of the treatment means μ_A, μ_B, and μ_C differ. In other words, we conclude that at least two of gasoline types A, B, and C have different effects on mean gasoline mileage.

    The results of an analysis of variance are often summarized in what is called an analysis of variance table. This table gives the sums of squares (SST, SSE, and SSTO), the mean squares (MST and MSE), and the F

    statistic and its related p-value for the ANOVA. The table also gives the degrees of freedom associated with each source of variation: treatments, error, and total. Table 1.2 gives the ANOVA table for the gasoline mileage problem. Notice that in the column labeled Sums of squares, the values of SST and SSE sum to SSTO.

    Figure 1.2 gives the MINITAB and Excel output of an analysis of variance of the gasoline mileage data. Note that the upper portion of the MINITAB output and the lower portion of the Excel output give the ANOVA table of Table 1.2. Also, note that each output gives the value F = 12.74 and the related p-value, which equals .001 (rounded). Because this p-value is less than .05, we reject H0 at the .05 level of significance. Figure 1.3 gives the SAS output of an analysis of variance of the gasoline mileage data.

    1.2.3 Statistical Inference for Pairwise Differences Between and Linear Combinations of Treatment Means

If the one-way ANOVA F test says that at least two treatment means differ, then we investigate which treatment means differ and we estimate how large the differences are. We do this by making what we call pairwise comparisons (that is, we compare treatment means two at a time). One way to make these comparisons is to compute point estimates of and confidence intervals for pairwise differences. For example, in the gasoline mileage case we might estimate the pairwise differences


μB − μA, μA − μC, and μB − μC. Here, for instance, the pairwise difference μB − μA can be interpreted as the change in mean mileage achieved by changing from using gasoline type A to using gasoline type B.

There are two approaches to calculating confidence intervals for pairwise differences. The first involves computing the usual, or individual, confidence interval for each pairwise difference. Here, if we are computing 100(1 − α) percent confidence intervals, we are 100(1 − α) percent confident that each individual pairwise difference is contained in its respective interval. That is, the confidence level associated with each (individual) comparison is 100(1 − α) percent, and we refer to α as the comparisonwise error rate. However, we are less than 100(1 − α) percent confident that all of the pairwise differences are simultaneously contained in their respective intervals. A more conservative approach is to compute simultaneous confidence intervals. Such intervals make us 100(1 − α) percent confident that all of the pairwise differences are simultaneously contained in their respective intervals. That is, when we compute simultaneous intervals, the overall confidence level associated with all the comparisons being made in the experiment is 100(1 − α) percent, and we refer to α as the experimentwise error rate.
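To illustrate the distinction numerically, the following Python simulation sketch (not from the text; it assumes normally distributed responses and uses t.025 = 2.179 for 12 degrees of freedom from Table A2) generates many three-treatment experiments in which all true means are equal and records how often at least one of the three individual 95 percent intervals excludes zero:

```python
# Simulation sketch: with p = 3 treatments of n_i = 5 observations and
# all true treatment means equal, each individual 95% interval misses
# its pairwise difference about 5% of the time, but the chance that AT
# LEAST ONE of the three intervals excludes zero (the experimentwise
# error rate) is noticeably larger than 5%.
import random

random.seed(1)
T_025_12DF = 2.179              # t point from Table A2 (12 degrees of freedom)
reps, at_least_one_error = 5000, 0

for _ in range(reps):
    groups = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(3)]
    means = [sum(g) / 5 for g in groups]
    # Pooled MSE with n - p = 15 - 3 = 12 degrees of freedom
    mse = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means)) / 12
    half = T_025_12DF * (mse * (1 / 5 + 1 / 5)) ** 0.5
    diffs = [means[0] - means[1], means[0] - means[2], means[1] - means[2]]
    if any(abs(d) > half for d in diffs):
        at_least_one_error += 1

# Prints the estimated experimentwise error rate (noticeably above .05)
print(at_least_one_error / reps)
```

The comparisonwise error rate of each interval is .05, but the simulated experimentwise error rate comes out well above .05, which is exactly the penalty the simultaneous intervals below are designed to remove.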

Table 1.2 Analysis of variance (ANOVA) table for testing H0: μA = μB = μC

Source       Degrees of freedom    Sums of squares    Mean squares                   F statistic             p-value
Treatments   p − 1 = 3 − 1 = 2     SST = 17.0493      MST = SST/(p − 1)              F = MST/MSE             0.001
                                                      = 17.0493/(3 − 1) = 8.525      = 8.525/0.669 = 12.74
Error        n − p = 15 − 3 = 12   SSE = 8.028        MSE = SSE/(n − p)
                                                      = 8.028/(15 − 3) = 0.669
Total        n − 1 = 15 − 1 = 14   SSTO = 25.0773


[Figure 1.2 is not reproduced legibly here. Its MINITAB panel reports, for gasoline types A, B, and C, N = 5, sample standard deviations 0.766, 0.850, and 0.835, the pooled standard deviation 0.8183, the ANOVA table of Table 1.2, individual 95% confidence intervals for each treatment mean based on the pooled standard deviation, and Tukey 95% simultaneous confidence intervals for the pairwise differences (for example, Type A subtracted from Type B: lower .2610, center 1.6400, upper 3.0190). Its Excel panel reports the group counts, sums, averages (34.92, 36.56, 33.98), and variances (.587, .723, .697), together with the same ANOVA table, the p-value .001, and the critical value F crit = 3.8853.]

Figure 1.2 MINITAB and Excel output of an analysis of variance of the gasoline mileage data in Table 1.1


[Figure 1.3 is not reproduced legibly here. For the dependent variable MILEAGE, the SAS output reports the ANOVA table (Model: 2 df, SST = 17.0493, MST = 8.5247; Error: 12 df, SSE = 8.028, MSE = 0.669; Corrected Total: 14 df, SSTO = 25.0773; F = 12.74, p-value = 0.0011), along with R-Square = 0.6799, Root MSE = s = 0.8179, and the MILEAGE mean 35.1533. For the parameters MUB − MUA, MUA − MUC, MUB − MUC, and MUB − (MUC + MUA)/2, it gives the estimates 1.64, 0.94, 2.58, and 2.11; the t statistics 3.17, 1.82, 4.99, and 4.71; the p-values .0081, .0942, .0003, and .0007; and the standard errors .5173, .5173, .5173, and .4480. For observation 6 (observed value 35.30, predicted value 36.56), it gives the 95% confidence limits for the mean, 35.763 and 37.357, and the 95% prediction limits for an individual value, 34.608 and 38.512.]

Figure 1.3 SAS output of an analysis of variance of the gasoline mileage data in Table 1.1


Several kinds of simultaneous confidence intervals can be computed. We first present what is called the Tukey formula for simultaneous intervals. If we are interested in studying all pairwise differences between treatment means, the Tukey formula yields the most precise (shortest) simultaneous confidence intervals. In general, a Tukey simultaneous 100(1 − α) percent confidence interval is longer than the corresponding individual 100(1 − α) percent confidence interval. Thus, intuitively, we are paying a penalty for simultaneous confidence by obtaining longer intervals. One pragmatic approach to comparing treatment means is to first determine whether we can use the more conservative Tukey intervals to make meaningful pairwise comparisons. If we cannot, then we might see what the individual intervals tell us. In the following box we present both individual and Tukey simultaneous confidence intervals for pairwise differences. We also present the formula for a confidence interval for a single treatment mean and the formula for a prediction interval for an individual response, which we might use after we have used pairwise comparisons to determine the best treatment.

    Estimation and Prediction in One-Way ANOVA

1. Consider the pairwise difference μi − μh, which can be interpreted as the change in the mean value of the response variable associated with changing from using treatment h to using treatment i. Then a point estimate of the difference μi − μh is ȳi − ȳh, where ȳi and ȳh are the sample treatment means associated with treatments i and h.

2. An individual 100(1 − α) percent confidence interval for μi − μh is

(ȳi − ȳh) ± tα/2 √(MSE((1/ni) + (1/nh)))

Here, tα/2 is the point on the horizontal axis under the curve of the t distribution having n − p degrees of freedom that gives a right-hand tail area equal to α/2. Table A2 in Appendix A is a table of t points.

3. A Tukey simultaneous 100(1 − α) percent confidence interval for μi − μh in the set of all possible pairwise differences between treatment means is

(ȳi − ȳh) ± qα √(MSE/m)

Here the value qα is obtained from Table A3, which is a table of percentage points of the studentized range. In this table qα is listed corresponding to values of p and n − p. Furthermore, we assume that the sample sizes ni and nh are equal to the same value, which we denote as m. If ni and nh are not equal, we replace qα √(MSE/m) by (qα/√2) √(MSE((1/ni) + (1/nh))). In this case, the confidence interval is only approximately correct.

4. A point estimate of the treatment mean μi is ȳi, and an individual 100(1 − α) percent confidence interval for μi is

ȳi ± tα/2 √(MSE/ni)

5. A point prediction of yi0 = μi + εi0, a randomly selected individual value of the response variable when using treatment i, is ȳi, and a 100(1 − α) percent prediction interval for yi0 is

ȳi ± tα/2 √(MSE(1 + (1/ni)))

Note that, because the ANOVA assumptions imply that the error term εi0 is assumed to be randomly selected from a normally distributed population of error term values having mean zero, εi0 has a fifty percent chance of being positive and a fifty percent chance of being negative. Therefore, we predict εi0 to be zero, and this implies that the point estimate ȳi of μi is also the point prediction of yi0 = μi + εi0. However, because the error term εi0 will probably not be zero, ȳi is likely to be less accurate as a point prediction of yi0 than as a point estimate of μi. For this reason, the 100(1 − α) percent prediction interval for yi0 has an extra 1 under the radical and thus is longer than the 100(1 − α) percent confidence interval for μi.

Example 1.3

In the gasoline mileage study, we are comparing p = 3 treatment means (μA, μB, and μC). Furthermore, each sample is of size m = 5, there are a total of n = 15 observed gas mileages, and the MSE found in Table 1.2 is .669. Because q.05 = 3.77 is the entry found in Table A3 corresponding to p = 3 and n − p = 12, a Tukey simultaneous 95 percent confidence interval for μB − μA is

(ȳB − ȳA) ± q.05 √(MSE/m) = (36.56 − 34.92) ± 3.77 √(.669/5)
= 1.64 ± 1.379
= [.261, 3.019]

Similarly, Tukey simultaneous 95 percent confidence intervals for μA − μC and μB − μC are, respectively,

(ȳA − ȳC) ± 1.379 = (34.92 − 33.98) ± 1.379 = .94 ± 1.379 = [−.439, 2.319]

and

(ȳB − ȳC) ± 1.379 = (36.56 − 33.98) ± 1.379 = 2.58 ± 1.379 = [1.201, 3.959]


These intervals make us simultaneously 95 percent confident that (1) changing from gasoline type A to gasoline type B increases mean mileage by between .261 and 3.019 mpg, (2) changing from gasoline type C to gasoline type A might decrease mean mileage by as much as .439 mpg or might increase mean mileage by as much as 2.319 mpg, and (3) changing from gasoline type C to gasoline type B increases mean mileage by between 1.201 and 3.959 mpg. The first and third of these intervals make us 95 percent confident that μB is at least .261 mpg greater than μA and at least 1.201 mpg greater than μC. Therefore, we have strong evidence that gasoline type B yields the highest mean mileage of the gasoline types tested. Furthermore, noting that t.025 based on n − p = 12 degrees of freedom is 2.179 (see Table A2), it follows that an individual 95 percent confidence interval for μB is

ȳB ± t.025 √(MSE/nB) = 36.56 ± 2.179 √(.669/5) = [35.763, 37.357]

This interval says we can be 95 percent confident that the mean mileage obtained by all Lances using gasoline type B is between 35.763 and 37.357 mpg. Also, a 95 percent prediction interval for yB0 = μB + εB0, the mileage obtained by a randomly selected individual Lance when driven using gasoline type B, is

ȳB ± t.025 √(MSE(1 + (1/nB))) = 36.56 ± 2.179 √(.669(1 + (1/5))) = [34.608, 38.512]

Notice that the 95 percent confidence interval for μB is graphed on the MINITAB output of Figure 1.2, and both the 95 percent confidence interval for μB and the 95 percent prediction interval for an individual Lance mileage using gasoline type B are given on the SAS output in Figure 1.3. The MINITAB output also shows the 95 percent confidence intervals for μA and μC, and a typical SAS output would also give these intervals, but to save space we have omitted them. Also, the MINITAB output gives Tukey simultaneous 95 percent intervals. For example, consider finding the Tukey interval for μB − μA on the MINITAB output. To do this, we look in the table corresponding to "Type A subtracted from" and find the row in this table labeled "Type B." This row gives the interval for Type A subtracted from Type B, that is, the interval for μB − μA. This interval is [.261, 3.019], as previously calculated. Finally, note that the half-length of the individual 95 percent confidence interval for a pairwise comparison is (because nA = nB = nC = 5)

t.025 √(MSE((1/ni) + (1/nh))) = 2.179 √(.669((1/5) + (1/5))) = 1.127

This half-length implies that the individual intervals are shorter than the previously constructed Tukey intervals, which have a half-length of 1.379. Recall, however, that the Tukey intervals are short enough to allow us to conclude with 95 percent confidence that μB is greater than μA and μC.
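The interval computations of Example 1.3 can be reproduced numerically. A minimal Python sketch, using the table values q.05(3, 12) = 3.77 and t.025 = 2.179 quoted in the text (the variable names are our own):

```python
# Interval computations from Example 1.3 (a sketch; table values
# q_.05 = 3.77 and t_.025 = 2.179 are taken from Tables A3 and A2).
mse, m = 0.669, 5                    # MSE and common sample size
q_05, t_025 = 3.77, 2.179
ybar = {"A": 34.92, "B": 36.56, "C": 33.98}

tukey_half = q_05 * (mse / m) ** 0.5          # half-length of Tukey intervals
diff_BA = ybar["B"] - ybar["A"]
tukey_BA = (diff_BA - tukey_half, diff_BA + tukey_half)

# Individual 95% confidence interval for mu_B and 95% prediction
# interval for a single mileage using gasoline type B
ci_B = (ybar["B"] - t_025 * (mse / m) ** 0.5,
        ybar["B"] + t_025 * (mse / m) ** 0.5)
pi_B = (ybar["B"] - t_025 * (mse * (1 + 1 / m)) ** 0.5,
        ybar["B"] + t_025 * (mse * (1 + 1 / m)) ** 0.5)

print(round(tukey_half, 3))                   # 1.379
print([round(v, 3) for v in tukey_BA])        # [0.261, 3.019]
print([round(v, 3) for v in ci_B])            # [35.763, 37.357]
print([round(v, 3) for v in pi_B])            # [34.608, 38.512]
```

Note how the prediction interval is wider than the confidence interval only because of the extra 1 under the radical.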

We next suppose in the gasoline mileage situation that gasoline type B contains a chemical, Chemical XX, that is not contained in gasoline types A or C. To assess the effect of Chemical XX on gasoline mileage, we consider

μB − (μC + μA)/2

This is the difference between the mean mileage obtained by using gasoline type B and the average of the mean mileages obtained by using gasoline types C and A. Note that

μB − (μC + μA)/2 = (−1/2)μA + (1)μB + (−1/2)μC
= aAμA + aBμB + aCμC
= Σ aλμλ (λ = A, B, C)


where aA = −1/2, aB = 1, and aC = −1/2. In general, if a1, a2, ..., ap are arbitrary constants, we say that

Σ aiμi (i = 1, ..., p) = a1μ1 + a2μ2 + ... + apμp

is a linear combination of the treatment means μ1, μ2, ..., μp. As with a pairwise difference μi − μh, we can find a point estimate of, and an individual 100(1 − α) percent confidence interval for, any linear combination of the treatment means. However, since Tukey simultaneous 100(1 − α) percent confidence intervals do not exist for linear combinations that are not simple pairwise differences, we need other kinds of simultaneous 100(1 − α) percent confidence intervals. Two types of such intervals that apply to general linear combinations are Scheffé simultaneous 100(1 − α) percent confidence intervals and Bonferroni simultaneous 100(1 − α) percent confidence intervals. We now summarize estimating a general linear combination of the treatment means. Because the formulas for the Scheffé intervals involve Fα points based on numerator and denominator degrees of freedom that vary depending on what is being estimated, we will be notationally concise and write the appropriate degrees of freedom in parentheses after the F points. For example, Fα(p − 1, n − p) denotes an Fα point based on p − 1 numerator and n − p denominator degrees of freedom. We will use such notation at various times in this book.

    Estimating a Linear Combination of the Treatment Means in One-Way ANOVA

1. A point estimate of the linear combination Σ aiμi (i = 1, ..., p) is Σ aiȳi. Letting s = √MSE, a 100(1 − α) percent confidence interval for Σ aiμi is

Σ aiȳi ± tα/2 s √(Σ ai²/ni)


2. We define a contrast to be any linear combination Σ aiμi such that Σ ai = 0. Suppose that we wish to find a Scheffé simultaneous 100(1 − α) percent confidence interval for a contrast in the set of all possible contrasts. Then:

a. The Scheffé interval for the difference μi − μh (which is a contrast) is

(ȳi − ȳh) ± √((p − 1)Fα(p − 1, n − p)) s √((1/ni) + (1/nh))

b. The Scheffé interval for the contrast Σ aiμi is

Σ aiȳi ± √((p − 1)Fα(p − 1, n − p)) s √(Σ ai²/ni)

3. Suppose that we wish to find a Scheffé simultaneous 100(1 − α) percent confidence interval for a linear combination in the set of all possible linear combinations (some of which are not contrasts). Then:

a. The Scheffé interval for the difference μi − μh is

(ȳi − ȳh) ± √(p Fα(p, n − p)) s √((1/ni) + (1/nh))

b. The Scheffé interval for the linear combination Σ aiμi is

Σ aiȳi ± √(p Fα(p, n − p)) s √(Σ ai²/ni)

4. A Bonferroni simultaneous 100(1 − α) percent confidence interval for μi − μh in a prespecified set of g linear combinations is

(ȳi − ȳh) ± tα/2g s √((1/ni) + (1/nh))


5. A Bonferroni simultaneous 100(1 − α) percent confidence interval for Σ aiμi in a prespecified set of g linear combinations is

Σ aiȳi ± tα/2g s √(Σ ai²/ni)

The choice of which Scheffé formula to use requires that we make a decision before we observe the samples. We must decide whether we are interested in:

1. finding simultaneous confidence intervals for linear combinations, all of which are contrasts, in which case we use formulas 2a and 2b;

2. finding simultaneous confidence intervals for linear combinations, some of which are not contrasts, in which case we use formulas 3a and 3b.

Of course, we will not literally calculate Scheffé simultaneous confidence intervals for all possible contrasts (or more general linear combinations). However, the Scheffé simultaneous confidence interval formula applies to all possible contrasts (or more general linear combinations). This allows us to data snoop. Data snooping means that we let the data suggest which contrasts or linear combinations we will investigate further. Remember, however, that we must decide whether we will study contrasts or more general linear combinations before we observe the data. Also, note that because the Tukey formula for simultaneous 100(1 − α) percent confidence intervals applies to all possible pairwise differences of the treatment means, this formula also allows us to data snoop, in the sense that it allows us to let the data suggest which pairwise differences to investigate further. On the other hand, the Bonferroni formula for simultaneous 100(1 − α) percent confidence intervals requires that we prespecify, before we observe the data, a set of linear combinations. Thus this formula does not allow us to data snoop.


    Example 1.4

Consider the North American Oil Company problem. Suppose that we had decided, before we observed the gasoline mileage data in Table 1.1, that we wished to find Scheffé simultaneous 95 percent confidence intervals for all contrasts in the following set of contrasts:

Set I:

μB − μA, μA − μC, μB − μC, and μB − (μC + μA)/2

Suppose that we also wish to find such intervals for other contrasts that the data might suggest. That is, we are considering all possible contrasts. To verify, for example, that μB − (μC + μA)/2 is a contrast, note that

μB − (μC + μA)/2 = (−1/2)μA + (1)μB + (−1/2)μC = aAμA + aBμB + aCμC

Here, aA = −1/2, aB = 1, and aC = −1/2, which implies that

Σ ai (i = A, B, C) = aA + aB + aC = −1/2 + 1 − 1/2 = 0

    Moreover,

Σ ai²/ni (i = A, B, C) = aA²/nA + aB²/nB + aC²/nC = (−1/2)²/5 + (1)²/5 + (−1/2)²/5 = .3

Since s = √MSE = √.669 = .8179, it follows that a Scheffé simultaneous 95 percent confidence interval for μB − (μC + μA)/2 is (using formula 2b)


[ȳB − (ȳC + ȳA)/2] ± √((p − 1)F.05(p − 1, n − p)) s √(Σ ai²/ni)
= [36.56 − (33.98 + 34.92)/2] ± √((3 − 1)F.05(3 − 1, 15 − 3)) (.8179) √(.3)
= 2.11 ± √(2(3.89)) (.8179)(.5477)
= 2.11 ± 1.25
= [.86, 3.36]

This interval says that we are 95 percent confident that μB is between .86 mpg and 3.36 mpg greater than (μC + μA)/2. Note here that Chemical XX might be a major factor causing μB to be greater than (μC + μA)/2. However, this is not at all certain. The chemists at North American Oil must use the previous comparison, along with their knowledge of the chemical compositions of gasoline types A, B, and C, to assess the effect of Chemical XX on gasoline mileage. The Scheffé simultaneous 95 percent confidence intervals for μB − μA, μA − μC, and μB − μC (the other contrasts in Set I) can be calculated by using formula 2a.
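The formula 2b calculation above can be checked in a few lines of Python; the F point 3.89 and the summary statistics are taken from the text, and the variable names are our own:

```python
# Scheffe interval (formula 2b) for the contrast mu_B - (mu_C + mu_A)/2,
# a sketch using F_.05(2, 12) = 3.89 from Table A1.
mse = 0.669
s = mse ** 0.5                               # 0.8179
p = 3                                        # number of treatments
f_05 = 3.89                                  # F point with p-1 and n-p df
a = {"A": -0.5, "B": 1.0, "C": -0.5}         # contrast coefficients
ybar = {"A": 34.92, "B": 36.56, "C": 33.98}
ni = {"A": 5, "B": 5, "C": 5}

estimate = sum(a[t] * ybar[t] for t in a)                 # 2.11
sum_a2_over_n = sum(a[t] ** 2 / ni[t] for t in a)         # 0.3
half = ((p - 1) * f_05) ** 0.5 * s * sum_a2_over_n ** 0.5
interval = (estimate - half, estimate + half)

print(round(estimate, 2))                    # 2.11
print([round(v, 2) for v in interval])       # [0.86, 3.36]
```

Because Σ ai = 0, the coefficients form a contrast, so formula 2b (rather than 3b) applies.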

Next, suppose that we had decided, before we observed the gasoline mileage data in Table 1.1, that we wished to calculate Scheffé simultaneous 95 percent confidence intervals for all the linear combinations in Set II:

Set II:

μA, μB, μC, μB − μA, μA − μC, μB − μC, and μB − (μC + μA)/2

In addition, suppose that we wish to find such intervals for other linear combinations that the data might suggest. Note that μA, μB, and μC are not contrasts. That is, these means cannot be written as Σ aiμi (i = A, B, C), where

Σ ai (i = A, B, C) = 0

    For example,

μB = (0)μA + (1)μB + (0)μC


    which implies that

Σ ai (i = A, B, C) = 0 + 1 + 0 = 1 ≠ 0

Therefore, we must use formulas 3a and 3b to calculate Scheffé intervals. Whereas formulas 2a and 2b use

√((p − 1)Fα(p − 1, n − p)) = √((3 − 1)F.05(3 − 1, 15 − 3)) = √(2(3.89)) = 2.7893

to calculate Scheffé simultaneous 95 percent confidence intervals for all possible contrasts, formulas 3a and 3b use the larger

√(p Fα(p, n − p)) = √(3 F.05(3, 15 − 3)) = √(3(3.49)) = 3.2357

to calculate Scheffé simultaneous 95 percent confidence intervals for all possible linear combinations. Because formulas 3a and 3b differ from the respective formulas 2a and 2b only by these comparative values, we are paying for desiring Scheffé simultaneous 95 percent confidence intervals for all possible linear combinations (some of which are not contrasts) by having longer (and thus less precise) Scheffé simultaneous 95 percent confidence intervals for the contrasts.

Next, consider finding Bonferroni simultaneous 95 percent confidence intervals for the prespecified linear combinations μB − μA, μA − μC, μB − μC, and μB − (μC + μA)/2. Since there are g = 4 linear combinations here, we need to find tα/2g = t.05/(2(4)) = t.00625. Using Excel to find t.00625 based on n − p = 15 − 3 = 12 degrees of freedom, we find that t.00625 = 2.934459. This t point is larger than the previously found Scheffé interval point √((3 − 1)F.05(3 − 1, 15 − 3)) = 2.789305 for all possible contrasts, so the Bonferroni simultaneous 95 percent confidence intervals would be longer than the corresponding Scheffé intervals. On the other hand, consider finding Bonferroni simultaneous 95 percent confidence intervals for the prespecified two linear combinations μB − μC and μB − (μC + μA)/2. Since g = 2, we need to find tα/2g = t.05/(2(2)) = t.0125. Using Excel to find t.0125 based on n − p = 15 − 3 = 12 degrees of freedom, we find that t.0125 = 2.560033. This t point is smaller than the Scheffé interval point √((3 − 1)F.05(3 − 1, 15 − 3)) = 2.789305 for all possible contrasts, so the Bonferroni simultaneous 95 percent confidence intervals would be shorter than the corresponding Scheffé intervals.

We now summarize the use of Tukey, Scheffé, and Bonferroni simultaneous confidence intervals.

1. If we are interested in all pairwise comparisons of treatment means, the Tukey formula will give shorter intervals than will the Scheffé or Bonferroni formulas. If a small number of prespecified pairwise comparisons are of interest, the Bonferroni formula might give shorter intervals in some situations.

2. If we are interested in all contrasts (or more general linear combinations) of treatment means, the Scheffé formula should be used. If a small number of prespecified contrasts (or more general linear combinations) are of interest, the Bonferroni formula might give shorter intervals. This is particularly true if the number of prespecified contrasts (or more general linear combinations) is less than or equal to the number of treatments.

3. Whereas the Tukey and Scheffé formulas can be used for data snooping, the Bonferroni formula cannot.

4. It is reasonable in any given problem to use all of the formulas (Tukey, Scheffé, and Bonferroni) that apply. Then we can choose the formula that provides the shortest intervals.
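Summary point 1 can be checked numerically for the gasoline study. The following Python sketch compares the half-lengths of the Tukey, Scheffé (formula 2a), and Bonferroni (g = 4) simultaneous 95 percent intervals for a single pairwise difference, using the table values quoted in this section:

```python
# Half-lengths of simultaneous 95% intervals for a pairwise difference
# mu_i - mu_h in the gasoline study (n_i = n_h = 5, MSE = .669), using
# the table values quoted in this section.
mse, m = 0.669, 5
s = mse ** 0.5
q_05 = 3.77                 # studentized range point, p = 3 and n - p = 12
f_05 = 3.89                 # F point with p - 1 = 2 and n - p = 12 df
t_00625 = 2.934459          # t point for the Bonferroni method with g = 4

tukey = q_05 * (mse / m) ** 0.5
scheffe = (2 * f_05) ** 0.5 * s * (1 / m + 1 / m) ** 0.5
bonferroni = t_00625 * s * (1 / m + 1 / m) ** 0.5

# The Tukey half-length is shortest, as summary point 1 states.
print(round(tukey, 3), round(scheffe, 3), round(bonferroni, 3))
```

For this prespecified set of g = 4 comparisons, the ordering is Tukey < Scheffé < Bonferroni, matching the discussion above.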

Instead of (or in addition to) using confidence intervals to make pairwise comparisons of treatment means, we can also make such comparisons by using hypothesis tests. Suppose, for example, that we wish to make several pairwise comparisons. We might perform several individual t tests, each with the probability of a Type I error set equal to α. The following summary box shows how to perform an individual t test.


An Individual t Test for Pairwise Comparisons of Treatment Means

Define

t = (ȳi − ȳh) / (s √((1/ni) + (1/nh)))

Also, define the p-value to be twice the area under the curve of the t distribution having n − p degrees of freedom to the right of |t|. Then we can reject H0: μi − μh = 0 in favor of Ha: μi − μh ≠ 0 at level of significance α if either of the following equivalent conditions holds:

1. |t| > tα/2 or, equivalently, |ȳi − ȳh| > tα/2 s √((1/ni) + (1/nh))
2. p-value < α

For example, in the gasoline mileage situation consider testing H0: μB − μA = 0 versus Ha: μB − μA ≠ 0. Since ȳB − ȳA = 36.56 − 34.92 = 1.64 and √(MSE((1/nB) + (1/nA))) = √(.669((1/5) + (1/5))) = .5173, the test statistic t equals 1.64/.5173 = 3.17. The p-value for the hypothesis test is twice the area under the curve of the t distribution having n − p = 15 − 3 = 12 degrees of freedom to the right of |t| = 3.17. The SAS output in Figure 1.3 gives the point estimate ȳB − ȳA = 1.64, the standard error of this estimate, .5173, and the test statistic t = 3.17. The SAS output also tells us that the p-value for the hypothesis test is .0081. Because this p-value is less than an α of .05, we can reject H0: μB − μA = 0 at the .05 level of significance. In fact, because this p-value is less than an α of .01, we can also reject H0: μB − μA = 0 at the .01 level of significance. This would be regarded as very strong evidence that μB and μA differ. Further examining the SAS output, we see that this output gives the point estimates, standard errors of the estimates, test statistics, and p-values for testing H0: μA − μC = 0, H0: μB − μC = 0, and H0: μB − (μC + μA)/2 = 0. For example, the SAS output tells us that the p-value for testing H0: μB − μC = 0 is .0003. Because this p-value is less than an α of .001, we have extremely strong evidence that μB and μC differ. Also, note that the test statistic t for testing H0: μB − (μC + μA)/2 = 0 is the point estimate ȳB − (ȳC + ȳA)/2 = 36.56 − (33.98 + 34.92)/2 = 2.11 divided by the standard error of this point estimate, which we have calculated in Example 1.4 and is s √(Σ ai²/ni) = .8179 √(.3) = .4480 (within rounding).

When we perform several individual t tests, each with the probability

of a Type I error set equal to α, we say that we are setting the comparisonwise error rate equal to α. However, in such a case the probability of making at least one Type I error (that is, the probability of deciding that at least one difference between treatment means differs from zero when the difference does not differ from zero) is greater than α. This probability of making at least one Type I error is called the experimentwise error rate. To control the experimentwise error rate, we can carry out hypothesis tests based on the Bonferroni, Scheffé, or Tukey methods. The rejection rules for these tests are simple modifications of the simultaneous confidence interval formulas. For example, suppose that we wish to test H0: μi − μh = 0 versus Ha: μi − μh ≠ 0. Using the Bonferroni method, we would declare the difference between μi and μh to be statistically significant if

|ȳi − ȳh| > tα/2g s √((1/ni) + (1/nh))

Here, g is the number of pairwise comparisons in a prespecified set, and we are controlling the experimentwise error rate over the g pairwise comparisons in the prespecified set. Using the Scheffé method, we would declare the difference between μi and μh to be statistically significant if

|ȳi − ȳh| > √((p − 1)Fα(p − 1, n − p)) s √((1/ni) + (1/nh))

In this case we are controlling the experimentwise error rate over all null hypotheses that set a contrast Σ aiμi equal to zero. The Tukey method declares the difference between μi and μh to be statistically significant if

|ȳi − ȳh| > qα √(MSE/m)


Here, we are controlling the experimentwise error rate over all possible pairwise comparisons of treatment means. Recall from our discussion of Tukey simultaneous 95 percent confidence intervals that the sample size for each treatment is assumed to be the same value m, and qα is a studentized range value obtained from Table A3 corresponding to the values p and n − p.
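The individual t statistics and the Tukey and Scheffé rejection rules above can be applied to the gasoline data in a short Python sketch (table values q.05(3, 12) = 3.77 and F.05(2, 12) = 3.89 are taken from the text; the variable names are our own):

```python
# Individual t statistics and experimentwise rejection rules for the
# gasoline data, a sketch using table values quoted in this section.
mse, m, p = 0.669, 5, 3
s = mse ** 0.5
ybar = {"A": 34.92, "B": 36.56, "C": 33.98}

# Individual t statistic for H0: mu_B - mu_A = 0
se_pair = s * (1 / m + 1 / m) ** 0.5                      # .5173
t_BA = (ybar["B"] - ybar["A"]) / se_pair                  # 3.17

# t statistic for H0: mu_B - (mu_C + mu_A)/2 = 0
se_contrast = s * (1.0 / m + 0.25 / m + 0.25 / m) ** 0.5  # .4480
t_contrast = (ybar["B"] - (ybar["C"] + ybar["A"]) / 2) / se_contrast

# Tukey and Scheffe critical half-lengths for a pairwise difference
q_05, f_05 = 3.77, 3.89
tukey_cut = q_05 * (mse / m) ** 0.5                       # 1.379
scheffe_cut = ((p - 1) * f_05) ** 0.5 * se_pair           # 1.443

for i, h in [("B", "A"), ("A", "C"), ("B", "C")]:
    d = abs(ybar[i] - ybar[h])
    print(i, h, d > tukey_cut, d > scheffe_cut)
```

Both experimentwise rules declare μB − μA and μB − μC significant but not μA − μC, agreeing with the Tukey intervals of Example 1.3.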

A modification of the Tukey procedure is the Student-Newman-Keuls (SNK) procedure, which has us first arrange the sample treatment means from smallest to largest. Denoting these ordered sample means as ȳ_(1), ȳ_(2), ..., ȳ_(p), the SNK procedure declares the difference between the ordered population means μ_(i) and μ_(h) (where i is greater than h) to be statistically significant if

    | ȳ_(i) − ȳ_(h) | > q_α(i − h + 1, n − p) √(MSE/m)

Here, we denote the fact that the studentized range value obtained from Table A3 depends upon i − h + 1, the number of steps between ȳ_(i) and ȳ_(h), by denoting this studentized range value as q_α(i − h + 1, n − p). For example, in the gasoline mileage example the three sample means ȳ_A = 34.92, ȳ_B = 36.56, and ȳ_C = 33.98 arranged in increasing order are ȳ_(1) = 33.98, ȳ_(2) = 34.92, and ȳ_(3) = 36.56. To compare μ_(3) with μ_(1) (that is, μ_B with μ_C) at significance level .05, we look up q_.05(3 − 1 + 1, 15 − 3) = q_.05(3, 12) in Table A3 to be 3.77. Because | ȳ_(3) − ȳ_(1) | = | 36.56 − 33.98 | = 2.58 is greater than q_.05(3, 12)√(MSE/m) = 3.77√(.669/5) = 1.379, we conclude that μ_B and μ_C differ. To compare μ_(3) with μ_(2) (that is, μ_B with μ_A) at significance level .05, we look up q_.05(3 − 2 + 1, 15 − 3) = q_.05(2, 12) in Table A3 to be 3.08. Because | ȳ_(3) − ȳ_(2) | = | 36.56 − 34.92 | = 1.64 is greater than q_.05(2, 12)√(MSE/m) = 3.08√(.669/5) = 1.127, we conclude that μ_B and μ_A differ. To compare μ_(2) with μ_(1) (that is, μ_A with μ_C) at significance level .05, we look up q_.05(2 − 1 + 1, 15 − 3) = q_.05(2, 12) in Table A3 to be 3.08. Because | ȳ_(2) − ȳ_(1) | = | 34.92 − 33.98 | = .94 is not greater than q_.05(2, 12)√(MSE/m) = 1.127, we cannot conclude that μ_A and μ_C differ. In general, the SNK procedure has neither a comparisonwise nor an experimentwise error rate. Rather, the SNK procedure controls the error rate at α for all comparisons of means that are the same number of ordered steps apart. For example, in the gasoline mileage example, the error


rate is α for comparing μ_(3) with μ_(2) (that is, μ_B with μ_A) and μ_(2) with μ_(1) (that is, μ_A with μ_C) because in both cases i − h + 1 = 2.

In general, because q_α(i − h + 1, n − p)√(MSE/m) decreases as the number of steps apart i − h + 1 decreases, the SNK procedure is more liberal than the Tukey procedure in declaring significant differences between treatment means. However, the SNK procedure is more conservative than performing individual t tests in declaring such significant differences. It is important to note that since many users find individual t tests each at significance level α [or individual 100(1 − α) percent confidence intervals] easy to calculate and, therefore, use them for making multiple treatment mean comparisons (including those suggested by the data), statisticians recommend doing this only if the F test of H0: μ_1 = μ_2 = ... = μ_p rejects H0 at significance level α. Such a use of the F test as a preliminary test of significance, followed by making multiple, individual t test comparisons, is called Fisher's least significant difference (LSD) procedure. Simulation studies suggest that Fisher's LSD procedure controls the experimentwise error rate for the multiple comparisons at approximately α.
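The SNK comparisons for the gasoline mileage example can be sketched in Python as follows (the studentized range values q_.05(2, 12) = 3.08 and q_.05(3, 12) = 3.77 are the Table A3 values cited in the text):

```python
from math import sqrt

# SNK comparisons for the gasoline mileage example: MSE = .669, m = 5 cars
# per treatment, n - p = 12 error df.  The studentized range values
# q_.05(2, 12) = 3.08 and q_.05(3, 12) = 3.77 are the Table A3 values.
MSE, m = 0.669, 5
q_05 = {2: 3.08, 3: 3.77}            # keyed by number of ordered steps

ordered = sorted([("A", 34.92), ("B", 36.56), ("C", 33.98)], key=lambda t: t[1])
results = {}
for i in range(len(ordered)):
    for h in range(i):
        steps = i - h + 1            # i - h + 1 ordered steps apart
        cut = q_05[steps] * sqrt(MSE / m)
        diff = abs(ordered[i][1] - ordered[h][1])
        results[(ordered[i][0], ordered[h][0])] = diff > cut

print(results)   # B vs C and B vs A significant; A vs C not
```

The output reproduces the conclusions above: μ_B differs from μ_C and from μ_A, while μ_A and μ_C cannot be declared different.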

Lastly, in some situations it is important to use a control treatment and compare various treatment means with the control treatment mean. This is true, for example, in medical research where various new medicines would be compared with a placebo. The placebo might, for example, be a pill with no active ingredients, and measuring the placebo effect is important because sometimes patients react favorably simply because they are taking a pill. Dunnett's procedure declares treatment mean μ_i to be different from the control mean μ_control at significance level α if

    | ȳ_i − ȳ_control | > d_α(p − 1, n − p) √(2MSE/m)

Here, p − 1 is the number of noncontrol treatments, m is the common sample size for all treatments (including the control), and d_α(p − 1, n − p) is obtained from Table A4. For example, in the gasoline mileage situation, suppose that the oil company's current gasoline is gasoline type A and the company wishes to compare gasoline types B and C with gasoline type A, which is regarded as the control treatment. Letting α equal .05,


we find from Table A4 that d_.05(p − 1, n − p) = d_.05(3 − 1, 15 − 3) = d_.05(2, 12) is 2.50. Therefore, d_.05(2, 12)√(2MSE/m) = 2.50√(2(.669)/5) = 1.293. Because | ȳ_B − ȳ_control | = | ȳ_B − ȳ_A | = | 36.56 − 34.92 | = 1.64 is greater than 1.293, we conclude that μ_B differs from μ_A. Because | ȳ_C − ȳ_control | = | ȳ_C − ȳ_A | = | 33.98 − 34.92 | = .94 is not greater than 1.293, we cannot conclude that μ_C differs from μ_A.
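A minimal sketch of Dunnett's comparisons with control gasoline type A, using the values cited in the text (d_.05(2, 12) = 2.50 from Table A4, MSE = .669, m = 5):

```python
from math import sqrt

# Dunnett's comparisons with control gasoline type A, using the values
# cited in the text: d_.05(2, 12) = 2.50 from Table A4, MSE = .669, m = 5.
MSE, m = 0.669, 5
d_05 = 2.50
cut = d_05 * sqrt(2 * MSE / m)       # 2.50 * sqrt(2(.669)/5) = 1.293

y_bar = {"A": 34.92, "B": 36.56, "C": 33.98}
significant = {t: abs(y_bar[t] - y_bar["A"]) > cut
               for t in y_bar if t != "A"}
print(round(cut, 3), significant)    # 1.293 {'B': True, 'C': False}
```

As in the text, only gasoline type B can be declared different from the control.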

    1.3 Fixed and Random Models

    The methods of Section 1.2 describe the situation in which the treatments (that is, factor levels) are the only treatments of interest. This is called the fixed model case. However, in some situations the treatments have been randomly selected from a population of treatments. In such a case we are interested in making statistical inferences about the population of treatments.

    Example 1.5

Suppose that a pharmaceutical company wishes to examine the potency of a liquid medication mixed in large vats. To do this, the company randomly selects a sample of four vats from a month's production and randomly selects four separate samples from each vat. The data in Table 1.3 represent the recorded potencies. In this case we are not interested in the potencies in only the four randomly selected vats. Rather, we are interested in the potencies in all possible vats.

Table 1.3 Liquid medication potencies from four randomly selected vats

    Vat 1        Vat 2        Vat 3       Vat 4
    6.1          7.1          5.6         6.5
    6.6          7.3          5.8         6.8
    6.4          7.3          5.7         6.2
    6.3          7.7          5.3         6.3
    ȳ_1 = 6.35   ȳ_2 = 7.35   ȳ_3 = 5.6   ȳ_4 = 6.45


Let y_ij denote the potency of the jth sample in the ith randomly selected vat. Then the random model says that

    y_ij = μ_i + ε_ij

Here, μ_i is the mean potency of all possible samples of liquid medication that could be randomly selected from the ith randomly selected vat. That is, μ_i is the mean potency of all of the liquid medication in the ith randomly selected vat. Moreover, since the four vats were randomly selected, μ_i is assumed to have been randomly selected from the population of all possible vat means. This population is assumed to be normally distributed with mean μ and variance σ_μ². Here, μ is the mean potency of all possible samples of liquid medication that could be randomly selected from all possible vats. That is, μ is the mean potency of all possible liquid medication. In addition, σ_μ² is the variance between all possible vat means.

We further assume that each error term ε_ij has been randomly selected from a normally distributed population of error term values having mean zero and variance σ², and that different error terms ε_ij are independent of each other and of the randomly selected means μ_i. Under these assumptions we can test the null hypothesis H0: σ_μ² = 0. This hypothesis says that all possible vat means are equal. We test H0 versus the alternative hypothesis Ha: σ_μ² > 0, which says that there is some variation between the vat means. Specifically, we can reject H0 in favor of Ha at significance level α = .05 if the F statistic of Section 1.2, F = MST/MSE, is greater than F_.05 = 3.49, which is based on p − 1 = 4 − 1 = 3 numerator and n − p = 16 − 4 = 12 denominator degrees of freedom.

Table 1.4 tells us that since F = 45.5111 is greater than F_.05 = 3.49, we can reject H0: σ_μ² = 0 with α = .05. Therefore, we conclude that there is variation in the population of all vat means. That is, we conclude that some of the vat means differ. Furthermore, as illustrated in Table 1.4, we can calculate point estimates of the variance components σ² and σ_μ². These estimates are .0542 and .6031, respectively. Note here that the variance component σ² measures the within-vat variability, while σ_μ² measures the between-vat variability. In this case the between-vat variability is substantially higher than the within-vat variability. We can also calculate a 95 percent confidence interval for μ, the mean potency


Table 1.4 ANOVA table for fixed and random models

Source   df           Sum of squares   Mean square    F statistic
Model    p − 1 = 3    SST = 7.4        MST = 2.4667   F = 45.5111
Error    n − p = 12   SSE = .65        MSE = .0542

E(mean square), fixed model: for Model, σ² + Σ_{i=1}^p n_i(μ_i − μ̄)²/(p − 1); for Error, σ².
E(mean square), random model: for Model, σ² + n̄σ_μ²; for Error, σ².
Hypothesis tested: fixed model, H0: μ_1 = μ_2 = ... = μ_p; random model, H0: σ_μ² = 0.

Notes:
1. n̄ = [Σ_{i=1}^p n_i − (Σ_{i=1}^p n_i²)/(Σ_{i=1}^p n_i)]/(p − 1) (= m for equal sample sizes)
2. Since F = 45.5111 > F_.05 = 3.49, we can reject H0: σ_μ² = 0 with α = .05.
3. Since E(MSE) = σ², a point estimate of σ² is MSE = .0542.
4. Since E(MST) = σ² + n̄σ_μ², a point estimate of σ² + n̄σ_μ² is MST. Thus a point estimate of σ_μ² is (MST − MSE)/n̄ = (2.4667 − .0542)/4 = .6031.
5. For equal sample sizes (n_i = m), a point estimate of μ is ȳ = Σ_{i=1}^p ȳ_i/p = (6.35 + 7.35 + 5.6 + 6.45)/4 = 6.4375. Furthermore, a 100(1 − α)% = 95% confidence interval for μ is ȳ ± t_{α/2}√(MST/(pm)) = 6.4375 ± 3.182√(2.4667/16) = 6.4375 ± 1.2494 = [5.1881, 7.6869]. Here, t_{α/2} is based on p − 1 = 4 − 1 = 3 degrees of freedom.


of all possible liquid medication. As shown in Table 1.4, this 95 percent interval is [5.1881, 7.6869]. To narrow this interval, we could randomly select more vats and more samples from each vat.
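The variance-component estimates and the confidence interval for μ can be reproduced from the reported mean squares. Below is a Python sketch using the values of Table 1.4 (MST = 2.4667, MSE = .0542, p = 4 vats, m = 4 samples per vat, and the t table value t_.025 = 3.182 for 3 degrees of freedom):

```python
from math import sqrt

# Example 1.5: point estimates and the 95% CI for mu, from the mean squares
# reported in Table 1.4 (p = 4 vats, m = 4 samples per vat, t_.025 = 3.182
# with p - 1 = 3 degrees of freedom).
p, m = 4, 4
MST, MSE = 2.4667, 0.0542
F = MST / MSE                        # 45.5111; exceeds F_.05 = 3.49
sigma2_hat = MSE                     # within-vat variance component
sigma_mu2_hat = (MST - MSE) / m      # between-vat variance component
y_bar = (6.35 + 7.35 + 5.6 + 6.45) / p   # grand mean of the vat means
half = 3.182 * sqrt(MST / (p * m))
ci = (round(y_bar - half, 4), round(y_bar + half, 4))
print(round(F, 4), round(sigma_mu2_hat, 4), ci)
```

The printed values match those reported in the text: F = 45.5111, a between-vat variance estimate of .6031, and the interval [5.1881, 7.6869].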

The above example illustrates that the procedure for testing H0: μ_1 = μ_2 = ... = μ_p, which is appropriate when the p treatments are the only treatments of interest, is the same as the procedure for testing H0: σ_μ² = 0, which is appropriate when the p treatments are randomly selected from a large population of treatments. Furthermore, each procedure is justified by the expected mean squares given in Table 1.4. Specifically, in the fixed model case, E(MST)/E(MSE) = 1 when H0: μ_1 = μ_2 = ... = μ_p is true, and in the random model case, E(MST)/E(MSE) = 1 when H0: σ_μ² = 0 is true. Moreover, in both cases E(MST)/E(MSE) > 1 when H0 is not true. Thus, in both cases we reject H0 when F = MST/MSE is large.

    1.4 Testing the Equality of Population Variances

Consider testing H0: σ_1² = σ_2² = ... = σ_p² versus Ha: at least two of σ_1², σ_2², ..., σ_p² differ. We can test H0 versus Ha by using the sample variances s_1², s_2², ..., s_p². Here, these sample variances are assumed to be calculated from p independent samples of sizes n_1, n_2, ..., n_p that have been randomly selected from p normally distributed populations having variances σ_1², σ_2², ..., σ_p². Then, Hartley's test says that if all samples have the same size m, we can reject H0: σ_1² = σ_2² = ... = σ_p² in favor of Ha at level of significance α if

    F = max(s_1², s_2², ..., s_p²) / min(s_1², s_2², ..., s_p²)

is greater than F_α, which is based on p numerator and m − 1 denominator degrees of freedom. If the sample sizes are unequal, but do not differ substantially, we can set m equal to the maximum sample size.
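The F_max statistic itself is a one-line calculation. In the Python sketch below, the sample variances are illustrative, not from the text, and the resulting ratio would still have to be compared with the tabled critical value:

```python
# Hartley's F_max statistic: with equal sample sizes m, reject H0 when the
# ratio of the largest to the smallest sample variance exceeds F_alpha on
# p and m - 1 degrees of freedom.  The variances below are illustrative.
def hartley_fmax(variances):
    return max(variances) / min(variances)

s2 = [0.55, 0.61, 0.48, 0.72]        # hypothetical sample variances, p = 4
print(round(hartley_fmax(s2), 3))    # 1.5
```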

Unfortunately, Hartley's test is very sensitive to departures from the normality assumption. If the populations being sampled are described by probability distributions that are somewhat nonnormal, but the population variances are equal, Hartley's test is likely to incorrectly reject the


null hypothesis that the population variances are equal. An alternative test that does not require the populations to have normal distributions is the Brown-Forsythe-Levene (BFL) test. To carry out this test, which involves considerable calculation, we let z_ij = | y_ij − med_i |, where med_i denotes the median of the observations in the ith sample. We then calculate

    z̄_i· = Σ_{j=1}^{n_i} z_ij / n_i   and   z̄·· = Σ_{i=1}^{p} Σ_{j=1}^{n_i} z_ij / n

where n = n_1 + n_2 + ... + n_p is the total sample size. The BFL test then says that we can reject H0: σ_1² = σ_2² = ... = σ_p² at level of significance α if

    L = [ Σ_{i=1}^{p} n_i (z̄_i· − z̄··)² / (p − 1) ] / [ Σ_{i=1}^{p} Σ_{j=1}^{n_i} (z_ij − z̄_i·)² / (n − p) ]

is greater than F_α, which is based on p − 1 numerator and n − p denominator degrees of freedom.
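The BFL statistic can be computed directly from its definition. A Python sketch, with two small illustrative samples (not from the text):

```python
from statistics import median

# Brown-Forsythe-Levene statistic computed exactly as defined above:
# absolute deviations from each sample's median, then an F-type ratio of
# between-sample to within-sample variation in those deviations.
def bfl_statistic(samples):
    p = len(samples)
    z = [[abs(y - median(s)) for y in s] for s in samples]
    n = sum(len(zi) for zi in z)
    zbar_i = [sum(zi) / len(zi) for zi in z]
    zbar = sum(sum(zi) for zi in z) / n
    num = sum(len(zi) * (zb - zbar) ** 2 for zi, zb in zip(z, zbar_i)) / (p - 1)
    den = sum((v - zb) ** 2 for zi, zb in zip(z, zbar_i) for v in zi) / (n - p)
    return num / den                 # compare with F_alpha on p-1, n-p df

# Two small illustrative samples (not from the text):
L = bfl_statistic([[31.2, 32.6, 30.8, 31.5], [27.6, 28.1, 27.4, 28.5]])
print(round(L, 3))                   # 0.204
```

Here p = 2 and n = 8, so L would be compared with F_α based on 1 and 6 degrees of freedom.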

    1.5 Using SAS

    In Figure 1.4 we present the SAS program that yields the analysis of the North American Oil Company data that is presented in Figure 1.3. Note that in this program we employ a class variable to define the one factor model.

    1.6 Exercises

1.1 An oil company wishes to study the effects of four different gasoline additives on mean gasoline mileage. The company randomly selects four groups of six automobiles each and assigns a group of six automobiles to each additive type (W, X, Y, and Z). Here, all 24 automobiles employed in the experiment are the same make and model. Each of the six automobiles assigned to a gasoline additive is test driven using the appropriate additive, and the gasoline mileage


DATA GASOLINE;
INPUT GASTYPE $ MILEAGE @@;                     /* defines factor GASTYPE and response variable MILEAGE */
DATALINES;
A 34.0 B 35.3 C 33.3
A 35.0 B 36.5 C 34.0
A 34.3 B 36.4 C 34.7
A 35.5 B 37.0 C 33.0
A 35.8 B 37.6 C 34.9
;
PROC GLM;                                       /* specifies the General Linear Models procedure */
CLASS GASTYPE;                                  /* defines class variable GASTYPE */
MODEL MILEAGE = GASTYPE / P CLM;                /* specifies the model; CLM requests confidence intervals */
ESTIMATE 'MUB-MUA' GASTYPE -1 1;                /* estimates muB - muA */
ESTIMATE 'MUA-MUC' GASTYPE 1 0 -1;              /* estimates muA - muC */
ESTIMATE 'MUB-MUC' GASTYPE 0 1 -1;              /* estimates muB - muC */
ESTIMATE 'MUB-(MUC+MUA)/2' GASTYPE -.5 1 -.5;   /* estimates muB - (muC + muA)/2 */
PROC GLM;
CLASS GASTYPE;
MODEL MILEAGE = GASTYPE / P CLI;                /* CLI requests prediction intervals */

Figure 1.4 SAS program to analyze the North American Oil Company data (data from Table 1.1)

Notes: 1. The coefficients in the above ESTIMATE statements are obtained by writing the quantity to be estimated as a linear combination of the factor level means μ_A, μ_B, and μ_C, with the factor levels considered in alphabetical order. For example, if we consider MUB-MUA (that is, μ_B − μ_A), we write this difference as

    μ_B − μ_A = (−1)μ_A + (1)μ_B + (0)μ_C

Here, the trailing zero coefficient corresponding to μ_C may be dropped to obtain

    ESTIMATE 'MUB-MUA' GASTYPE -1 1;

As another example, the coefficients in the ESTIMATE statement for MUB-(MUC+MUA)/2 (that is, μ_B − (μ_C + μ_A)/2) are obtained by writing this expression as

    μ_B − (μ_C + μ_A)/2 = (−1/2)μ_A + (1)μ_B + (−1/2)μ_C
                        = (−.5)μ_A + (1)μ_B + (−.5)μ_C

Thus we obtain

    ESTIMATE 'MUB-(MUC+MUA)/2' GASTYPE -.5 1 -.5;

2. Expressions inside single quotes (for example, 'MUB-MUA') are labels that may be up to 16 characters in length.

    3. Confidence intervals (CLM) and prediction intervals (CLI) may not be requested in the same MODEL statement when using PROC GLM.
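The coefficient logic can be verified by hand: a contrast's point estimate is Σ c_i ȳ_i, and (from standard one-way ANOVA theory, not shown in this figure) its standard error is s√(Σ c_i²/n_i). A Python sketch using the North American Oil sample means and MSE = .669:

```python
from math import sqrt

# Checking the ESTIMATE coefficients by hand for the North American Oil
# example: point estimate = sum(c_i * ybar_i); standard error =
# s * sqrt(sum(c_i**2 / n_i)), a standard one-way ANOVA result, with
# s = sqrt(MSE), MSE = .669, and m = 5 cars per gasoline type.
coeffs = {"A": -0.5, "B": 1.0, "C": -0.5}        # MUB-(MUC+MUA)/2
y_bar = {"A": 34.92, "B": 36.56, "C": 33.98}
m, MSE = 5, 0.669
estimate = sum(coeffs[t] * y_bar[t] for t in coeffs)
std_error = sqrt(MSE) * sqrt(sum(c * c / m for c in coeffs.values()))
print(round(estimate, 2), round(std_error, 3))   # 2.11 0.448
```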


for the test drive is recorded. The results of the experiment are given in Table 1.5. A one-way ANOVA of this data is carried out by using SAS. The PROC GLM output is given in Figure 1.5. Note that the treatment means μ_W, μ_X, μ_Y, and μ_Z are denoted as MUW, MUX, MUY, and MUZ on the output.
(a) Identify and report the values of SSTO, SST, MST, SSE, and MSE.
(b) Identify, report, and interpret F and its associated p-value.
(c) Identify, report, and interpret the appropriate individual t statistics and associated p-values for making all pairwise comparisons of μ_W, μ_X, μ_Y, and μ_Z.

(d) Identify, report, and interpret the appropriate individual t statistic and associated p-value for testing the significance of [(μ_Y + μ_Z)/2] − [(μ_X + μ_W)/2].

(e) Identify, report, and interpret a point estimate of and a 95 percent confidence interval for μ_Z (see observation 24).

(f) Identify, report, and interpret a point prediction of and a 95 percent prediction interval for y_{Z0} = μ_Z + ε_{Z0}.

1.2 Consider the one-way ANOVA of the gasoline additive data in Table 1.5 and the SAS output of Figure 1.5.
(a) Compute individual 95 percent confidence intervals for all possible pairwise differences between treatment means.
(b) Compute Tukey simultaneous 95 percent confidence intervals for all possible pairwise differences between treatment means.
(c) Compute Scheffé simultaneous 95 percent confidence intervals for all possible pairwise differences between treatment means.
(d) Compute Bonferroni simultaneous 95 percent confidence intervals for the (prespecified) set of all possible pairwise differences between treatment means.
(e) Which of the above intervals are the most precise?

1.3 Consider the one-way ANOVA of the gasoline additive data in Table 1.5 and the SAS output of Figure 1.5. Also consider the prespecified set of linear combinations (contrasts):

    μ_Z − μ_W    μ_Y − μ_W    μ_Z − μ_X    μ_Y − μ_X    μ_Z − μ_Y    μ_X − μ_W    (μ_Y + μ_Z)/2 − (μ_X + μ_W)/2


(a) Compute an individual 95 percent confidence interval for (μ_Y + μ_Z)/2 − (μ_X + μ_W)/2. If gasoline additives Y and Z have an ingredient not possessed by gasoline additives X and W, what do you conclude?

(b) Compute Scheffé simultaneous 95 percent confidence intervals for the linear combinations in the aforementioned set.

(c) Compute Bonferroni simultaneous 95 percent confidence intervals for the linear combinations in the aforementioned set.

1.4 In order to compare the durability of four different brands of golf balls (Alpha, Best, Century, and Divot), the National Golf Association randomly selects five balls of each brand and places each ball into a machine that exerts the force produced by a 250-yard drive. The number of simulated drives needed to crack or chip each ball is recorded. The results are given in Table 1.6. The Excel output of a one-way ANOVA of these data is shown in Figure 1.6. Test for statistically significant differences between the treatment means μ_Alpha, μ_Best, μ_Century, and μ_Divot. Set α = .05. Use pairwise comparisons to find the most durable brands of golf balls.

Table 1.5 Gasoline additive test results

          Gasoline additive
          W         X         Y         Z
          31.2      27.6      35.7      34.5
          32.6      28.1      34.0      36.2
          30.8      27.4      35.1      35.2
          31.5      28.5      33.9      35.8
          32.0      27.5      36.1      34.9
          30.1      28.7      34.8      35.3
    Mean  31.3667   27.9667   34.9333   35.3167


SAS
GENERAL LINEAR MODELS PROCEDURE

DEPENDENT VARIABLE: MILEAGE

SOURCE            DF   SUM OF SQUARES   MEAN SQUARE   F VALUE   PR > F
MODEL              3     213.88125000   71.29375000    127.22   0.0001
ERROR             20      11.20833333    0.56041667
CORRECTED TOTAL   23     225.08958333

R-SQUARE   C.V.     ROOT MSE     MILEAGE MEAN
0.950205   2.3108   0.74860982   32.39583333

SOURCE    DF   TYPE I SS      F VALUE   PR > F   DF   TYPE III SS    F VALUE   PR > F
ADDTYPE    3   213.88125000    127.22   0.0001    3   213.88125000    127.22   0.0001

PARAMETER         T FOR H0: PARAMETER=0   PR > |T|   STD ERROR OF ESTIMATE
MUZ-MUW                   9.14             0.0001        0.43221008
MUZ-MUX                  17.01             0.0001        0.43221008
MUZ-MUY                   0.89             0.3857        0.43221008
MUY-MUW                   8.25             0.0001        0.43221008
MUY-MUX                  16.12             0.0001        0.43221008
MUX-MUW                  -7.87             0.0001        0.43221008
(Y+Z)/2-(X+W)/2          17.86             0.0001        0.30561868

OBSERVATION   OBSERVED VALUE   PREDICTED VALUE   RESIDUAL      LOWER 95% CL FOR MEAN
24            35.30000000      35.31666667       -0.01666667   34.67916207