23-1 Analysis of Covariance (Chapter 16) A procedure for comparing treatment means that incorporates...

23-1

Analysis of Covariance (Chapter 16)

• A procedure for comparing treatment means that incorporates information on a quantitative explanatory variable, X, sometimes called a covariate.

• The procedure, ANCOVA, is a combination of ANOVA with regression.


23-2

Example: Calf Weight Gain

• An animal scientist wishes to examine the impact of a pair of new dietary supplements on calf weight gain (response).

• Three treatments are defined: standard diet, standard diet + supplement Q, and standard diet + supplement R.

• All new calves from a large herd are available for use as study units. She selects 30 calves for study. Calves are assigned to the three diets at random (completely randomized design).

• Initial weights are recorded, then calves are placed on the diets. At the end of four weeks the final weight is taken and weight gain is computed.

• Simple analysis of variance and associated multiple comparisons procedures indicate no significant differences in weight gain between the two supplementary diets, but big differences between the supplemental diets and the standard diet.

• Is this the end of the story? …


23-3

ANOVA Results

[Figure: dot plot of average weight gain (response, g/day) by treatment group: Standard Diet, + Supplement Q, + Supplement R.]

A simple one-way ANOVA would suggest no difference between Supplements Q and R, but that both differ from the standard diet.


23-4

Initial Weights

[Figure: dot plot of initial weight by treatment group: Standard Diet, + Supplement Q, + Supplement R.]

Plotting the initial weights by group shows that the groups did not start with equal initial weights.


23-5

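Weight Gain to Initial Weight

[Figure: growth curve of weight (kg) against age under the standard diet. Two entry ages are marked, with initial weights wi1 and wi2, final weights wF1 and wF2, and the corresponding gains wgain1 and wgain2.]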

If animals come into the study at different ages, they have different initial weights and are at different points on the growth curve. Expected weight gains will be different depending on age at entry into study.


23-6

Regression of Weight Gain on Initial Weight

[Figure: weight gain (Y, g/day) plotted against initial weight (x), showing the points (wi1, wgain1) and (wi2, wgain2).]

If we disregard the age of the animal but instead focus on the initial weight, we see that there is a linear relationship between initial weight and the weight gain expected.


23-7

Covariates

Initial weight in the previous example is a covariable or covariate.

A covariate is a disturbing variable (confounder), that is, it is known to have an effect on the response. Usually, the covariate can be measured but often we may not be able to control its effect through blocking.

In the example, had the animal scientist known that the calves were very variable in initial weight (or age), she could have:

• Created blocks of 3 or 6 equal-weight animals, and randomized treatments to calves within these blocks.

• This would have entailed some cost in terms of time spent sorting the calves and then keeping track of block membership over the life of the study.

• It was much easier to simply record the calf initial weight and then use analysis of covariance for the final analysis.

• In many cases, due to the continuous nature of the covariate, blocking is just not feasible.


23-8

Expectations under H0

[Figure: expected weight gain (Y, g/day) against initial weight (x): a single regression line shared by all treatments, with the average-weight animal marked.]

Under H0: no treatment effects.

If all animals had come in with the same initial weight, all three treatments would produce the same weight gain.


23-9

Expectations under HA

[Figure: expected weight gain (Y, g/day) against initial weight (x): separate regression lines for Standard Diet (c), + Supplement Q (q), and + Supplement R (r). At the average-weight animal the lines give expected gains WGs, WGQ, and WGR.]

Under HA: significant treatment effects.

Different treatments produce different weight gains for animals of the same initial weight.


23-10

Different Initial Weights

[Figure: expected weight gain (Y, g/day) against initial weight (x): one common regression line, with the c, q, and r animals clustered at different initial weights, giving different group mean gains WGs, WGQ, and WGR.]

Under H0: no treatment effects.

If the average initial weights in the treatment groups differ, the observed weight gains will be different, even if treatments have no effect.
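The point above can be illustrated with a small numerical sketch. All numbers here are hypothetical (the intercept, slope, and initial weights are made up, not taken from the calf study): every group follows the same line, yet the group mean gains differ simply because the groups start at different weights.

```python
# Hypothetical illustration: one common regression line (no treatment effect),
# but diet groups that differ in average initial weight.
INTERCEPT = 100.0  # assumed expected gain (g/day) at initial weight 0
SLOPE = 2.0        # assumed change in expected gain per kg of initial weight

initial_weights = {  # made-up initial weights (kg) for the three diet groups
    "standard": [40, 42, 44, 46],
    "supp_Q": [50, 52, 54, 56],
    "supp_R": [60, 62, 64, 66],
}


def expected_gain(x):
    """Expected gain under H0: the same line for every treatment."""
    return INTERCEPT + SLOPE * x


group_mean_gain = {
    diet: sum(expected_gain(x) for x in xs) / len(xs)
    for diet, xs in initial_weights.items()
}

for diet, gain in group_mean_gain.items():
    print(f"{diet}: mean expected gain = {gain:.1f} g/day")
```

A one-way ANOVA on such data would see three different group means even though no treatment effect was built in.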


23-11

Observed Responses under HA

[Figure: weight gain (Y, g/day) against initial weight (x): observed c, q, and r points scattered about three parallel lines for Standard Diet, + Supplement Q, and + Supplement R, with group mean gains WGs, WGQ, and WGR.]

Under HA: significant treatment effects.

Suppose now that different supplements actually do increase weight gain. This translates to animals in different treatment groups following different, but parallel, regression lines with initial weight.

What difference in weight gain is due to initial weight and what is due to treatment?


23-12

Observed Group Means

[Figure: scatter of weight gain (Y, g/day) against initial weight (x) for Standard Diet (c), + Supplement Q (q), and + Supplement R (r), with the unadjusted treatment means $\bar{y}_c$, $\bar{y}_q$, and $\bar{y}_r$ marked on the weight-gain axis.]

Simple one-way classification ANOVA (without accounting for initial weight) gives us the wrong answer!


23-13

Predicted Average Responses

[Figure: scatter of weight gain (Y, g/day) against initial weight (x) for Standard Diet (c), + Supplement Q (q), and + Supplement R (r). Each treatment's regression line is evaluated at the average initial weight $\bar{x}$ to give the adjusted treatment means $\hat{y}_c$, $\hat{y}_q$, and $\hat{y}_r$ at $X = \bar{x}$.]

Expected weight gain is computed for each treatment at the average initial weight, and comparisons are then made on these adjusted treatment means.
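As a worked formula, and assuming parallel lines with a common estimated slope $\hat{\beta}$ (the notation $\bar{y}_{i\cdot}$, $\bar{x}_{i\cdot}$, and $\bar{x}_{\cdot\cdot}$ for the group and overall means is introduced here for illustration), the adjusted mean for treatment $i$ moves the observed group mean along the fitted line from the group's own average covariate value to the overall average:

```latex
\hat{y}_{i \mid X = \bar{x}_{\cdot\cdot}}
  = \bar{y}_{i\cdot} - \hat{\beta}\,\bigl(\bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}\bigr),
  \qquad i = 1, \dots, t .
```

A group whose animals started heavier than average ($\bar{x}_{i\cdot} > \bar{x}_{\cdot\cdot}$) has its mean corrected in the direction opposite to the sign of the slope, and a group that started lighter is corrected the other way.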


23-14

ANCOVA: Objectives

The objective of an analysis of covariance is to compare the treatment means after adjusting for differences among the treatments that are due to differences in the covariate levels of the treatment groups.

The analysis proceeds by combining a regression model with an analysis of variance model.


23-15

Model

$$E(y_{ij}) = \mu + \alpha_i + \beta x_{ij}$$

The $\alpha_i$, $i=1,\dots,t$, describe how each of the t treatments modifies the overall mean response. (The index $j=1,\dots,n$ runs over the n replicates for each treatment.)

The slope coefficient, $\beta$, is a measure of how the average response changes as the value of the covariate changes.

The analysis proceeds by fitting a linear regression model with dummy variables to code for the different treatment levels.
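To make the dummy-variable coding concrete, here is a minimal sketch in plain Python. The function name `design_row`, the choice of the last level as the reference, and the example observations are all assumptions made for illustration; statistical packages build this matrix internally and may use a different coding.

```python
def design_row(treatment, x, levels):
    """Return [1, d_1, ..., d_(t-1), x], using the last level as the reference."""
    dummies = [1.0 if treatment == level else 0.0 for level in levels[:-1]]
    return [1.0] + dummies + [float(x)]


levels = [1, 2, 3, 4]  # t = 4 hypothetical treatment labels
observations = [(1, 10.0), (2, 11.5), (3, 9.0), (4, 12.0)]  # (treatment, covariate)

X = [design_row(trt, x, levels) for trt, x in observations]
for row in X:
    print(row)
```

Each row has an intercept column, t-1 indicator columns for the treatments, and one column for the covariate, so ordinary least squares on this matrix estimates the common slope and the treatment shifts together.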


23-16

A Priori Assumptions

• The covariate is related to the response, and can account for variation in the response.
  Check with a scatterplot of Y vs. X.

• The covariate is NOT related to the treatments.
  If X is related to the treatments, then the variance of the treatment differences is increased relative to that obtained from an ANOVA model without X, which results in a loss of precision.

• The treatments' regression equations are linear in the covariate.
  Check with a scatterplot of Y vs. X for each treatment. Non-linearity can be accommodated (e.g. polynomial terms, transforms), but the analysis may be more complex.

• The regression lines for the different treatments are parallel.
  This means there is only one slope in the Y vs. X plots. Non-parallel lines can be accommodated, but this complicates the analysis since differences between treatments will then depend on the value of X.


23-17

Example

Four different formulations of an industrial glue are being tested. The tensile strength (response) of the glue is known to be related to the thickness as applied. Five observations on strength (Y) in pounds, and thickness (X) in 0.01 inches are made for each formulation.

Here:

• There are t=4 treatments (formulations of glue).

• Covariate X is thickness of applied glue.

• Each treatment is replicated n=5 times at different values of X.

Formulation Strength Thickness

1 46.5 13

1 45.9 14

1 49.8 12

1 46.1 12

1 44.3 14

2 48.7 12

2 49.0 10

2 50.1 11

2 48.5 12

2 45.2 14

3 46.3 15

3 47.1 14

3 48.9 11

3 48.2 11

3 50.3 10

4 44.7 16

4 43.0 15

4 51.0 10

4 48.1 12

4 46.8 11
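The parallel-lines assumption can be checked numerically on these data as well as graphically. A minimal sketch in plain Python (not part of the original SAS, Minitab, or R analysis; the helper name `centered_sums` is made up for illustration) fits a separate least-squares slope per formulation and compares the residual sum of squares with that of the common-slope fit:

```python
# Glue data from the table above: (formulation, strength, thickness)
GLUE = [
    (1, 46.5, 13), (1, 45.9, 14), (1, 49.8, 12), (1, 46.1, 12), (1, 44.3, 14),
    (2, 48.7, 12), (2, 49.0, 10), (2, 50.1, 11), (2, 48.5, 12), (2, 45.2, 14),
    (3, 46.3, 15), (3, 47.1, 14), (3, 48.9, 11), (3, 48.2, 11), (3, 50.3, 10),
    (4, 44.7, 16), (4, 43.0, 15), (4, 51.0, 10), (4, 48.1, 12), (4, 46.8, 11),
]


def centered_sums(pairs):
    """Return (Sxx, Sxy, Syy) about the means of the given (y, x) pairs."""
    n = len(pairs)
    ybar = sum(y for y, _ in pairs) / n
    xbar = sum(x for _, x in pairs) / n
    sxx = sum((x - xbar) ** 2 for _, x in pairs)
    sxy = sum((x - xbar) * (y - ybar) for y, x in pairs)
    syy = sum((y - ybar) ** 2 for y, _ in pairs)
    return sxx, sxy, syy


groups = {}
for form, strength, thickness in GLUE:
    groups.setdefault(form, []).append((strength, thickness))

sums = {form: centered_sums(pairs) for form, pairs in groups.items()}

# Separate least-squares slope for each formulation
slopes = {form: sxy / sxx for form, (sxx, sxy, syy) in sums.items()}

# Residual SS with a separate slope per formulation (non-parallel lines)
sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in sums.values())

# Residual SS with one pooled within-formulation slope (parallel lines)
pooled_sxx = sum(s[0] for s in sums.values())
pooled_sxy = sum(s[1] for s in sums.values())
pooled_syy = sum(s[2] for s in sums.values())
sse_parallel = pooled_syy - pooled_sxy ** 2 / pooled_sxx

t, n_total = len(groups), len(GLUE)
df_num = t - 1            # extra slopes in the non-parallel model
df_den = n_total - 2 * t  # residual df with t intercepts and t slopes
f_parallel = ((sse_parallel - sse_separate) / df_num) / (sse_separate / df_den)

for form in sorted(slopes):
    print(f"Formulation {form}: slope = {slopes[form]:.4f}")
print(f"F for equal slopes = {f_parallel:.3f} on ({df_num}, {df_den}) df")
```

The four slopes range from about -1.43 to -0.68, and the F statistic for equal slopes is about 0.48 on (3, 12) degrees of freedom, well below the 5% critical value (about 3.49), so this sketch gives no evidence against parallel lines.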


23-18

Formulation Profiles

[Figure: strength (Y, axis from 40.0 to 52.0) plotted against thickness (X), one profile per formulation: Form_1, Form_2, Form_3, Form_4.]


23-19

SAS Program

data glue;
  input Formulation Strength Thickness;
  datalines;
1 46.5 13
1 45.9 14
1 49.8 12
1 46.1 12
1 44.3 14
2 48.7 12
2 49.0 10
2 50.1 11
2 48.5 12
2 45.2 14
3 46.3 15
3 47.1 14
3 48.9 11
3 48.2 11
3 50.3 10
4 44.7 16
4 43.0 15
4 51.0 10
4 48.1 12
4 46.8 11
;
run;

/* ANCOVA: covariate (thickness) plus class effect (formulation) */
proc glm;
  class formulation;
  model strength = thickness formulation / solution;
  lsmeans formulation / stderr pdiff;
run;

The basic model is a combination of regression and one-way classification.


23-20

Output: Use Type III SS to test significance of each variable

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               4       66.31065753    16.57766438      10.17    0.0003
Error              15       24.44684247     1.62978950
Corrected Total    19       90.75750000

R-Square    Coeff Var    Root MSE    Strength Mean
0.730636     2.691897    1.276632         47.42500

Source         DF      Type I SS    Mean Square    F Value    Pr > F
Thickness       1    63.50120135    63.50120135      38.96    <.0001
Formulation     3     2.80945618     0.93648539       0.57    0.6405

Source         DF    Type III SS    Mean Square    F Value    Pr > F
Thickness       1    53.20115753    53.20115753      32.64    <.0001
Formulation     3     2.80945618     0.93648539       0.57    0.6405

Parameter             Estimate    Standard Error    t Value    Pr > |t|
Intercept        58.93698630 B        2.21321008      26.63      <.0001
Thickness        -0.95445205          0.16705494      -5.71      <.0001
Formulation 1    -0.00910959 B        0.80810401      -0.01      0.9912
Formulation 2     0.62554795 B        0.82451389       0.76      0.4598
Formulation 3     0.86732877 B        0.81361075       1.07      0.3033
Formulation 4     0.00000000 B         .                .         .

Regression on thickness is significant. No significant formulation differences.

Each F value is the mean square divided by the MSE (the Error mean square, 1.62978950).
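The Type III sums of squares and F values can be reproduced by hand from within-formulation and overall sums of squares and cross-products. The following is a plain-Python sketch of that arithmetic, not what PROC GLM does internally; the helper name `centered_sums` is an illustrative assumption.

```python
# Glue data: (formulation, strength, thickness)
GLUE = [
    (1, 46.5, 13), (1, 45.9, 14), (1, 49.8, 12), (1, 46.1, 12), (1, 44.3, 14),
    (2, 48.7, 12), (2, 49.0, 10), (2, 50.1, 11), (2, 48.5, 12), (2, 45.2, 14),
    (3, 46.3, 15), (3, 47.1, 14), (3, 48.9, 11), (3, 48.2, 11), (3, 50.3, 10),
    (4, 44.7, 16), (4, 43.0, 15), (4, 51.0, 10), (4, 48.1, 12), (4, 46.8, 11),
]


def centered_sums(pairs):
    """Return (Sxx, Sxy, Syy) about the means of the given (y, x) pairs."""
    n = len(pairs)
    ybar = sum(y for y, _ in pairs) / n
    xbar = sum(x for _, x in pairs) / n
    sxx = sum((x - xbar) ** 2 for _, x in pairs)
    sxy = sum((x - xbar) * (y - ybar) for y, x in pairs)
    syy = sum((y - ybar) ** 2 for y, _ in pairs)
    return sxx, sxy, syy


groups = {}
for form, strength, thickness in GLUE:
    groups.setdefault(form, []).append((strength, thickness))

# Within-formulation (pooled) and overall sums of squares and cross-products
within = [centered_sums(pairs) for pairs in groups.values()]
sxx_w = sum(s[0] for s in within)
sxy_w = sum(s[1] for s in within)
syy_w = sum(s[2] for s in within)
sxx_t, sxy_t, syy_t = centered_sums([(y, x) for _, y, x in GLUE])

sse_full = syy_w - sxy_w ** 2 / sxx_w            # error SS, full ANCOVA model
ss_thickness = sxy_w ** 2 / sxx_w                # thickness adjusted for formulation
sse_thickness_only = syy_t - sxy_t ** 2 / sxx_t  # error SS, thickness-only model
ss_formulation = sse_thickness_only - sse_full   # formulation adjusted for thickness

t, n_total = len(groups), len(GLUE)
df_error = n_total - t - 1
mse = sse_full / df_error
f_thickness = (ss_thickness / 1) / mse
f_formulation = (ss_formulation / (t - 1)) / mse

print(f"MSE = {mse:.8f} on {df_error} df")
print(f"Thickness:   SS = {ss_thickness:.8f}, F = {f_thickness:.2f}")
print(f"Formulation: SS = {ss_formulation:.8f}, F = {f_formulation:.2f}")
```

The printed sums of squares and F values should agree with the Type III table above to the digits shown there.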


23-21

Least Squares Means (adjusted Formulation means computed at the average value of Thickness [=12.45])

The GLM Procedure
Least Squares Means

Formulation    Strength LSMEAN    Standard Error    Pr > |t|    LSMEAN Number
1                   47.0449486         0.5782732      <.0001                1
2                   47.6796062         0.5811616      <.0001                2
3                   47.9213870         0.5724527      <.0001                3
4                   47.0540582         0.5739134      <.0001                4

Least Squares Means for effect Formulation
Pr > |t| for H0: LSMean(i)=LSMean(j)

Dependent Variable: Strength

i/j         1         2         3         4
1                0.4574    0.3011    0.9912
2      0.4574              0.7695    0.4598
3      0.3011    0.7695              0.3033
4      0.9912    0.4598    0.3033
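The least squares means can be recovered directly from the group means and the common slope: each formulation's mean strength is shifted along the fitted line to the overall average thickness. A minimal plain-Python sketch of that calculation (an illustration only, not the GLM procedure's implementation):

```python
# Glue data: (formulation, strength, thickness)
GLUE = [
    (1, 46.5, 13), (1, 45.9, 14), (1, 49.8, 12), (1, 46.1, 12), (1, 44.3, 14),
    (2, 48.7, 12), (2, 49.0, 10), (2, 50.1, 11), (2, 48.5, 12), (2, 45.2, 14),
    (3, 46.3, 15), (3, 47.1, 14), (3, 48.9, 11), (3, 48.2, 11), (3, 50.3, 10),
    (4, 44.7, 16), (4, 43.0, 15), (4, 51.0, 10), (4, 48.1, 12), (4, 46.8, 11),
]


def mean(values):
    return sum(values) / len(values)


groups = {}
for form, strength, thickness in GLUE:
    groups.setdefault(form, []).append((strength, thickness))

grand_xbar = mean([x for _, _, x in GLUE])  # average thickness over all 20 runs

# Group means and pooled within-formulation sums for the common slope
group_ybar, group_xbar = {}, {}
sxx_w = sxy_w = 0.0
for form, pairs in groups.items():
    group_ybar[form] = mean([y for y, _ in pairs])
    group_xbar[form] = mean([x for _, x in pairs])
    for y, x in pairs:
        sxx_w += (x - group_xbar[form]) ** 2
        sxy_w += (x - group_xbar[form]) * (y - group_ybar[form])

slope = sxy_w / sxx_w  # common (pooled) slope on thickness

# Adjusted mean: move each group mean along the line to the average thickness
lsmeans = {
    form: group_ybar[form] - slope * (group_xbar[form] - grand_xbar)
    for form in groups
}

print(f"average thickness = {grand_xbar:.2f}, common slope = {slope:.8f}")
for form in sorted(lsmeans):
    print(f"Formulation {form}: unadjusted = {group_ybar[form]:.2f}, "
          f"adjusted = {lsmeans[form]:.7f}")
```

Formulation 2, for example, has the highest unadjusted mean but was also tested at thinner applications than average, so its adjusted mean is pulled down toward the others.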


23-22

ANCOVA in Minitab

Worksheet columns: Formulation, Strength, Thickness (the same 20 glue observations listed in the Example).

Stat > ANOVA > General Linear Model …

> Responses: Strength

> Model: Formulation

> Covariates: Thickness

> Options: Adjusted (Type III) Sums of Squares

General Linear Model: Strength versus Formulation

Factor      Type     Levels    Values
Formulat    fixed         4    1 2 3 4

Source      DF    Seq SS    Adj SS    Adj MS        F        P
Thicknes     1    63.501    53.201    53.201    32.64    0.000
Formulat     3     2.809     2.809     0.936     0.57    0.640
Error       15    24.447    24.447     1.630
Total       19    90.758

Term           Coef    SE Coef        T        P
Constant     59.308      2.099    28.25    0.000
Thicknes    -0.9545     0.1671    -5.71    0.000
Formulat
  1         -0.3801     0.5029    -0.76    0.462
  2          0.2546     0.5062     0.50    0.622
  3          0.4964     0.4962     1.00    0.333
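The Minitab coefficients differ from the SAS estimates even though the ANOVA tables agree. One explanation consistent with the numbers, offered here as an assumption rather than something the slides state, is that SAS sets the last formulation to zero (reference coding) while Minitab constrains the formulation effects to sum to zero. Under that assumption the conversion is a simple re-centering, sketched in plain Python:

```python
# Estimates copied from the SAS "solution" output (Formulation 4 is the reference)
sas_intercept = 58.93698630
sas_effects = {1: -0.00910959, 2: 0.62554795, 3: 0.86732877, 4: 0.0}

# Re-centre so the effects sum to zero (assumed to match Minitab's coding)
mean_effect = sum(sas_effects.values()) / len(sas_effects)
constant = sas_intercept + mean_effect
centered = {level: eff - mean_effect for level, eff in sas_effects.items()}

print(f"Constant = {constant:.3f}")
for level in sorted(centered):
    print(f"Formulation {level}: {centered[level]:.4f}")
```

The re-centered values for formulations 1 to 3 match the Minitab coefficients; the value for formulation 4 is the negative of the sum of the other three and is not printed by Minitab.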


23-23

Factor Plots… > Main Effects Plot > Formulation

[Figure: Main Effects Plot - LS Means for Strength. Horizontal axis: Formulation (1 to 4); vertical axis: Strength, from 47.0 to 47.9.]


23-24

ANCOVA in R

> glue <- read.table("glue.txt",header=TRUE)
> glue$Formulation <- as.factor(glue$Formulation)
> # fit linear models: full, thickness only, formulation only
> full.lm <- lm(Strength ~ Formulation + Thickness, data=glue)
> thick.lm <- lm(Strength ~ Thickness, data=glue)
> formu.lm <- lm(Strength ~ Formulation, data=glue)
>
> anova(thick.lm,full.lm)
Analysis of Variance Table

Model 1: Strength ~ Thickness
Model 2: Strength ~ Formulation + Thickness
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     18 27.2563
2     15 24.4468  3    2.8095 0.5746 0.6405

Test for Formulation differences

> anova(formu.lm,full.lm)
Analysis of Variance Table

Model 1: Strength ~ Formulation
Model 2: Strength ~ Formulation + Thickness
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)
1     16 77.648
2     15 24.447  1    53.201 32.643 4.105e-05 ***

Test for significance of Thickness
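Both comparisons are extra-sum-of-squares F tests: the drop in residual sum of squares from the reduced to the full model, per added parameter, is divided by the full model's residual mean square. A small plain-Python sketch of that arithmetic, using only the residual sums of squares and degrees of freedom printed above (the function name `extra_ss_f` is made up for illustration):

```python
def extra_ss_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for comparing a reduced model with a full (nested) model."""
    df_diff = df_reduced - df_full
    return ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)


# Thickness-only model vs full model: test for Formulation differences
f_formulation = extra_ss_f(27.2563, 18, 24.4468, 15)

# Formulation-only model vs full model: test for significance of Thickness
f_thickness = extra_ss_f(77.648, 16, 24.447, 15)

print(f"F (Formulation) = {f_formulation:.4f}")
print(f"F (Thickness)   = {f_thickness:.3f}")
```

These are the same F values as the Type III (adjusted) tests in the SAS and Minitab output, since each term is tested after the other is already in the model.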


23-25

R

Full model (can be refined by omitting formulation)

> summary(full.lm)
Call:
lm(formula = Strength ~ Formulation + Thickness, data = glue)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6380 -1.0398  0.1873  0.6966  2.3255

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.92788    2.24551  26.243 5.97e-14 ***
Formulation2  0.63466    0.83193   0.763    0.457
Formulation3  0.87644    0.81840   1.071    0.301
Formulation4  0.00911    0.80810   0.011    0.991
Thickness    -0.95445    0.16706  -5.713 4.11e-05 ***

Reduced model (formulation omitted)

> summary(thick.lm)
Call:
lm(formula = Strength ~ Thickness, data = glue)

Residuals:
    Min      1Q  Median      3Q     Max
-2.0813 -0.7324  0.1274  0.9090  1.9230

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  59.9294     1.9504  30.726  < 2e-16 ***
Thickness    -1.0044     0.1551  -6.476 4.32e-06 ***

Residual standard error: 1.231 on 18 degrees of freedom
Multiple R-Squared: 0.6997, Adjusted R-squared: 0.683
F-statistic: 41.94 on 1 and 18 DF, p-value: 4.317e-06


23-26

R

Plot lines for the full model; these can all be replaced by a single line for the reduced model (blue).


23-27

R

Check fit of reduced model (with just thickness).
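As a numerical companion to that check, the reduced model can be refit and its residuals inspected without any plotting. This is a plain-Python sketch of a simple linear regression of strength on thickness, offered as an illustration only; the slide itself presumably showed R diagnostic plots, which are not reproduced here.

```python
import math

# Glue data: (formulation, strength, thickness)
GLUE = [
    (1, 46.5, 13), (1, 45.9, 14), (1, 49.8, 12), (1, 46.1, 12), (1, 44.3, 14),
    (2, 48.7, 12), (2, 49.0, 10), (2, 50.1, 11), (2, 48.5, 12), (2, 45.2, 14),
    (3, 46.3, 15), (3, 47.1, 14), (3, 48.9, 11), (3, 48.2, 11), (3, 50.3, 10),
    (4, 44.7, 16), (4, 43.0, 15), (4, 51.0, 10), (4, 48.1, 12), (4, 46.8, 11),
]

n = len(GLUE)
xbar = sum(x for _, _, x in GLUE) / n
ybar = sum(y for _, y, _ in GLUE) / n
sxx = sum((x - xbar) ** 2 for _, _, x in GLUE)
sxy = sum((x - xbar) * (y - ybar) for _, y, x in GLUE)

# Least-squares fit of Strength ~ Thickness (formulation ignored)
slope = sxy / sxx
intercept = ybar - slope * xbar

residuals = [y - (intercept + slope * x) for _, y, x in GLUE]
rss = sum(r * r for r in residuals)
rse = math.sqrt(rss / (n - 2))  # residual standard error on n - 2 df

print(f"intercept = {intercept:.4f}, slope = {slope:.4f}")
print(f"residual range = [{min(residuals):.4f}, {max(residuals):.4f}]")
print(f"residual standard error = {rse:.3f} on {n - 2} df")
```

The coefficients, residual range, and residual standard error should agree with the `summary(thick.lm)` output; large or patterned residuals would suggest the single-line model is inadequate.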