Analysis of Covariance

Dr. J. Kyle Roberts

Southern Methodist University
Simmons School of Education and Human Development

Department of Teaching and Learning


ANCOVA Theory

• The idea behind the analysis of covariance (ANCOVA) is that a researcher may want to “control” for certain variables before conducting an ANOVA.

• This type of analysis is used frequently in the psychological literature. In education, it might arise when we want to test for differences in student mathematics ability among ethnic groups while controlling for student reading level.

• ANCOVA “combines” regression and ANOVA in that we do an ANOVA on the residualized (sort of) results from the linear model.

• This is like doing regression and then ANOVA.
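
The "regression, then ANOVA" idea can be sketched in R as follows. This is only an illustration on simulated data: the data frame `dat`, its columns, and all parameter values are assumptions made up for the example and do not come from the slides. It also hints at why the slide says "sort of": the two-step version uses the slope from the total sample and keeps N − 2 error degrees of freedom, whereas the ANCOVA uses the within-groups slope and has N − 3, so the two F values generally differ.

```r
# Simulated (hypothetical) data: two groups, one covariate.
set.seed(1)
n <- 20
group <- factor(rep(c("a", "b"), each = n))
x <- rnorm(2 * n, mean = ifelse(group == "a", 48, 52), sd = 8)
y <- 10 + 0.8 * x + ifelse(group == "b", 4, 0) + rnorm(2 * n, sd = 5)
dat <- data.frame(y, x, group)

# Step 1: regress the outcome on the covariate, ignoring group.
dat$resid_y <- resid(lm(y ~ x, data = dat))

# Step 2: ANOVA on the residualized scores.
two_step <- anova(aov(resid_y ~ group, data = dat))

# ANCOVA proper: covariate first, then the factor.
ancova <- anova(lm(y ~ x + group, data = dat))

two_step_F <- two_step["group", "F value"]
ancova_F <- ancova["group", "F value"]
print(c(two_step = two_step_F, ancova = ancova_F))
```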


ANCOVA Theory, cont.

• Glass and Stanley (1970) said that ANCOVA can be thought of as “an analysis of variance performed on the (Y − Ŷ) scores where the Ŷs are predicted in the usual b1X + b0 way” (p. 499).

• Thompson (2006) gives three very strong cautions:

1. Homogeneity of regression assumption must be met.
2. Covariate data must be extremely reliable.
3. Residualized dependent variable scores must be interpretable.


The ANCOVA Model

• The one-factor ANCOVA fixed-effects model can be written as:

$$Y_{ij} = \mu_Y + \alpha_j + \beta_w (X_{ij} - \mu_X) + \varepsilon_{ij}$$

where
• $Y_{ij}$ is the dependent variable score for individual $i$ in group $j$
• $\mu_Y$ is the grand mean of the dv
• $\alpha_j$ is the group effect for group $j$
• $\beta_w$ is the within-groups regression slope
• $X_{ij}$ is the observed score on the covariate
• $\mu_X$ is the grand mean of the covariate
• $\varepsilon_{ij}$ is the random residual error
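
Under this model, the expected score for group j at the grand mean of the covariate is µY + αj, which is the adjusted mean for that group. A minimal sketch of estimating adjusted means in R follows. The data are simulated, and every name and parameter value (`dat`, `g1` to `g3`, the slopes and means) is an assumption for illustration only. The sketch computes the adjusted means twice: as model predictions at the grand mean of the covariate, and from the usual formula Ȳj − bw(X̄j − X̄).

```r
# Simulated (hypothetical) data; none of these values come from the slides.
set.seed(2)
group <- factor(rep(c("g1", "g2", "g3"), each = 10))
x <- rnorm(30, mean = c(45, 50, 55)[as.integer(group)], sd = 6)
y <- 5 + 0.9 * x + c(0, 3, 6)[as.integer(group)] + rnorm(30, sd = 4)
dat <- data.frame(y, x, group)

fit <- lm(y ~ x + group, data = dat)   # one common within-groups slope
b_w <- unname(coef(fit)["x"])

# Adjusted means: predicted score for each group at the grand mean of x.
grid <- data.frame(
  x = mean(dat$x),
  group = factor(levels(dat$group), levels = levels(dat$group))
)
adj_pred <- unname(predict(fit, newdata = grid))

# Same quantities from the formula: Ybar_j - b_w * (Xbar_j - Xbar).
ybar_j <- tapply(dat$y, dat$group, mean)
xbar_j <- tapply(dat$x, dat$group, mean)
adj_formula <- ybar_j - b_w * (xbar_j - mean(dat$x))

print(round(cbind(raw = ybar_j, adjusted = adj_formula), 2))
```

Groups whose covariate mean lies above the grand mean are adjusted downward when the slope is positive, and groups below it are adjusted upward.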


Null Hypothesis and Assumptions

• For the ANCOVA, the null hypothesis is stated as:

$$H_0 : \mu'_{.1} = \mu'_{.2} = \cdots = \mu'_{.J}$$

where $\mu'_{.j}$ is the adjusted mean for the dv for group $j$ in the presence of the covariates.

• Assumptions

1. Random and independent errors
2. Homogeneity of variance
3. Homogeneity of regression

$$F = \frac{(SS_{with(adj)} - SS_{res})/(J - 1)}{SS_{res}/(N - 2J)}$$

where $SS_{res}$ is the sum of squared residuals

$$SS_{res} = \sum_{j=1}^{J} SS_j (1 - r_j^2)$$
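
The F ratio above, with its J − 1 and N − 2J degrees of freedom, appears to be the test of the homogeneity of regression assumption. The sketch below computes it by hand on simulated data and compares it with a model comparison in R. The interpretation of the symbols is an assumption on my part: SSj is taken to be the within-group sum of squares of the dv, rj the within-group correlation between covariate and dv, and SSwith(adj) the residual sum of squares of the common-slope ANCOVA model. The data and all names are hypothetical.

```r
# Simulated (hypothetical) data: J groups with a common true slope.
set.seed(3)
J <- 3
n_j <- 12
group <- factor(rep(paste0("g", 1:J), each = n_j))
x <- rnorm(J * n_j, 50, 8)
y <- 10 + 0.7 * x + c(0, 2, 4)[as.integer(group)] + rnorm(J * n_j, sd = 5)
dat <- data.frame(y, x, group)
N <- nrow(dat)

# SS_res: sum over groups of SS_j * (1 - r_j^2).
ss_res <- sum(sapply(split(dat, dat$group), function(d) {
  sum((d$y - mean(d$y))^2) * (1 - cor(d$x, d$y)^2)
}))

# SS_with(adj): residual sum of squares from the common-slope model.
ss_with_adj <- sum(resid(lm(y ~ x + group, data = dat))^2)

F_formula <- ((ss_with_adj - ss_res) / (J - 1)) / (ss_res / (N - 2 * J))

# Same test as a comparison of common-slope and separate-slopes models.
cmp <- anova(lm(y ~ x + group, data = dat), lm(y ~ x * group, data = dat))
F_anova <- cmp[2, "F"]
print(c(formula = F_formula, model_comparison = F_anova))
```

Under these assumptions the two numbers agree, which is why fitting the covariate-by-factor interaction is a convenient way to check the assumption in practice.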


Henson (1998) - ANCOVA with Intact Groups

We are working with the Henson (1998) data. The data are at http://faculty.smu.edu/kyler/courses/7312/henson.txt

> henson <- read.table("henson.txt", header = TRUE)
> str(henson)
'data.frame':   12 obs. of  3 variables:
 $ read   : int  30 30 40 40 45 45 50 50 55 55 ...
 $ achieve: int  34 36 46 46 49 50 60 68 67 70 ...
 $ edu    : Factor w/ 2 levels "regular ed","special ed": 2 2 2 2 2 2 1 1 1 1 ...
> head(henson)
  read achieve        edu
1   30      34 special ed
2   30      36 special ed
3   40      46 special ed
4   40      46 special ed
5   45      49 special ed
6   45      50 special ed


Graphing Data

[Figure: two-panel scatterplot of achieve (y-axis) against read (x-axis)]

Graphing Data (again)

[Figure: scatterplot of achieve (y-axis) against read (x-axis)]


First running the ANOVA

> m0 <- aov(achieve ~ edu, henson)
> anova(m0)
Analysis of Variance Table

Response: achieve
          Df  Sum Sq Mean Sq F value    Pr(>F)
edu        1 2380.08 2380.08  36.458 0.0001256
Residuals 10  652.83   65.28


Running the ANCOVA

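In the ANCOVA case, it is imperative that you list the covariate(s) first in lm and the factor(s) at the end. anova() reports sequential (Type I) sums of squares, so each term is adjusted only for the terms listed before it.

> m1 <- lm(achieve ~ read + edu, henson)
> anova(m1)
Analysis of Variance Table

Response: achieve
          Df  Sum Sq Mean Sq  F value    Pr(>F)
read       1 2912.04 2912.04 397.3828 9.346e-09
edu        1   54.93   54.93   7.4953   0.02293
Residuals  9   65.95    7.33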
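
To see why the order of terms matters, the sketch below fits the same model with both orderings on simulated data. The data frame `dat` and every value in it are assumptions for illustration, not the Henson data. With the covariate listed first, the sum of squares for the factor is adjusted for the covariate; with the factor listed first, it is the unadjusted one-way ANOVA sum of squares. drop1() is shown as an order-independent alternative, since it tests each term after all the others.

```r
# Simulated (hypothetical) data in which the groups differ on the covariate.
set.seed(4)
group <- factor(rep(c("regular", "special"), each = 15))
x <- rnorm(30, mean = ifelse(group == "regular", 55, 45), sd = 6)
y <- 5 + 1.0 * x + ifelse(group == "regular", 3, 0) + rnorm(30, sd = 3)
dat <- data.frame(y, x, group)

cov_first <- anova(lm(y ~ x + group, data = dat))
fac_first <- anova(lm(y ~ group + x, data = dat))

# Sum of squares credited to the factor under each ordering.
ss_adjusted <- cov_first["group", "Sum Sq"]    # group after x
ss_unadjusted <- fac_first["group", "Sum Sq"]  # group ignoring x

# drop1() tests each term after all the others, whatever the order.
d1 <- drop1(lm(y ~ group + x, data = dat), test = "F")
print(c(adjusted = ss_adjusted, unadjusted = ss_unadjusted,
        drop1 = d1["group", "Sum of Sq"]))
```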


Testing the Homogeneity of Regression Assumption

> m2 <- lm(achieve ~ read * edu, henson)
> summary(m2)

Call:
lm(formula = achieve ~ read * edu, data = henson)

Residuals:
    Min      1Q  Median      3Q     Max
-3.2857 -1.5000  0.1786  0.8571  4.7143

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.4286     9.9931   0.043    0.967
read                 1.2571     0.1753   7.172  9.5e-05
eduspecial ed        5.2857    12.0917   0.437    0.674
read:eduspecial ed  -0.2714     0.2479  -1.095    0.305

Residual standard error: 2.678 on 8 degrees of freedom
Multiple R-squared: 0.9811, Adjusted R-squared: 0.974
F-statistic: 138.3 on 3 and 8 DF, p-value: 3.124e-07

The read:eduspecial ed interaction is not statistically significant (p = 0.305), so these data give no evidence against the homogeneity of regression assumption.


Lomax (2001) - Chapter 16

The data are at http://faculty.smu.edu/kyler/courses/7312/lomax.txt

> lomax <- read.table("lomax.txt", header = TRUE)
> str(lomax)
'data.frame':   12 obs. of  3 variables:
 $ quiz    : int  1 2 3 4 5 6 1 2 4 5 ...
 $ aptitude: int  4 3 5 6 7 9 1 3 2 4 ...
 $ group   : Factor w/ 2 levels "group 1","group 2": 1 1 1 1 1 1 2 2 2 2 ...
> head(lomax)
  quiz aptitude   group
1    1        4 group 1
2    2        3 group 1
3    3        5 group 1
4    4        6 group 1
5    5        7 group 1
6    6        9 group 1


Graphical Exploration

> library(lattice)
> print(xyplot(quiz ~ aptitude, groups = group,
+     data = lomax, type = c("p", "r"), pch = 16))

[Figure: scatterplot of quiz (y-axis) against aptitude (x-axis), with points and a fitted regression line for each group]


Running the ANOVA

> new0 <- aov(quiz ~ group, lomax)
> anova(new0)
Analysis of Variance Table

Response: quiz
          Df Sum Sq Mean Sq F value Pr(>F)
group      1   0.75    0.75  0.1899 0.6723
Residuals 10  39.50    3.95


Running the ANCOVA

> new1 <- lm(quiz ~ aptitude + group, lomax)
> anova(new1)
Analysis of Variance Table

Response: quiz
          Df  Sum Sq Mean Sq F value   Pr(>F)
aptitude   1 20.8807 20.8807  21.961 0.001142
group      1 10.8122 10.8122  11.372 0.008228
Residuals  9  8.5571  0.9508


Covariance Adjustment Model for Gain Scores

• Suppose that we want to do an analysis of gain scores and we want to see if there are some variables that help us explain differences in gain.

• For example, we might want to see if there were differing levels of gain for two different groups.

• This is typical in a pre- and post-test design, whereby we might look at the differences in gain between a treatment and control group.

• This type of design does have some important assumptions, though.

  • Equivalence at pretest
  • Homogeneity of variance at pretest
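
One possible way to look at these two pretest assumptions in R is sketched below. It assumes a Welch t-test for the group means and an F test (var.test) for the ratio of variances; other checks would be equally defensible. The data are simulated with the same pretest distributions as the Gain Score Data slide, but with a seed, so the values are hypothetical.

```r
# Simulated (hypothetical) pretest scores for a control and a treatment group.
set.seed(5)
pre <- c(rnorm(15, 20, 8), rnorm(12, 20, 7))
treat <- factor(rep(c("control", "treatment"), c(15, 12)))
dat <- data.frame(pre, treat)

# Equivalence at pretest: compare group means on the pretest.
means_test <- t.test(pre ~ treat, data = dat)

# Homogeneity of variance at pretest: F test of the ratio of the variances.
vars_test <- var.test(pre ~ treat, data = dat)

print(c(means_p = means_test$p.value, variance_p = vars_test$p.value))
```

A non-significant result here only fails to show a difference; it does not demonstrate that the groups are equivalent, especially with samples this small.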


Gain Score Data

Consider the following dataset. Because the scores are drawn with rnorm() and no seed is set, re-running the code will give values different from those shown.

> gains1 <- data.frame(pre1 = c(rnorm(15, 20, 8),
+     rnorm(12, 20, 7)), post1 = c(rnorm(15, 22,
+     7), rnorm(12, 35, 4)), treat = rep(c("control",
+     "treatment"), c(15, 12)))
> head(gains1)
       pre1    post1   treat
1  4.457924 14.77789 control
2 31.403569 19.65162 control
3 12.772580 19.23911 control
4 14.929620 29.75043 control
5 33.888733 21.20869 control
6  3.592474 11.58902 control


> t.test(pre1 ~ treat, gains1)

        Welch Two Sample t-test

data:  pre1 by treat
t = 0.3426, df = 22.626, p-value = 0.735
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.433108  7.587736
sample estimates:
  mean in group control mean in group treatment
               17.39818                16.32086

> t.test(post1 ~ treat, gains1)

        Welch Two Sample t-test

data:  post1 by treat
t = -8.3247, df = 21.546, p-value = 3.555e-08
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -17.32007 -10.40477
sample estimates:
  mean in group control mean in group treatment
               21.31432                35.17674


Paired Samples Test

> with(gains1[gains1$treat == "treatment", ], t.test(pre1,
+     post1, paired = TRUE))

        Paired t-test

data:  pre1 and post1
t = -11.7569, df = 11, p-value = 1.435e-07
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -22.38586 -15.32590
sample estimates:
mean of the differences
              -18.85588


Covariance Adjustment Model

> gains1$gainscore <- gains1$post1 - gains1$pre1
> summary(m.gain <- lm(gainscore ~ pre1 + treat,
+     gains1))

Call:
lm(formula = gainscore ~ pre1 + treat, data = gains1)

Residuals:
   Min     1Q Median     3Q    Max
-7.996 -2.626 -1.551  2.162 12.097

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     19.1347     2.1878   8.746 6.29e-09
pre1            -0.8747     0.1059  -8.259 1.79e-08
treattreatment  13.9974     1.7731   7.894 3.99e-08
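
A useful property of the covariance adjustment model is that regressing the gain score on the pretest gives the same treatment estimate as regressing the posttest on the pretest; only the pretest slope changes, by exactly 1, because gain = post − pre. The sketch below illustrates this on simulated data. All names and values are hypothetical and are not the gains1 data above.

```r
# Simulated (hypothetical) pre/post data with a treatment effect.
set.seed(6)
pre <- rnorm(27, 20, 8)
treat <- factor(rep(c("control", "treatment"), c(15, 12)))
post <- 5 + 0.6 * pre + ifelse(treat == "treatment", 10, 0) +
  rnorm(27, sd = 4)
dat <- data.frame(pre, post, treat, gain = post - pre)

fit_gain <- lm(gain ~ pre + treat, data = dat)  # covariance adjustment on gain
fit_post <- lm(post ~ pre + treat, data = dat)  # ANCOVA on the posttest

# Intercept and treatment effect match; the pre slopes differ by 1.
print(rbind(gain = coef(fit_gain), post = coef(fit_post)))
```

Read this way, the pre1 slope of −0.8747 in the output above corresponds to a posttest-on-pretest slope of about 0.125, which is consistent with pre1 and post1 having been generated independently of each other.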





Covariance Adjustment Model

> summary(m.gain2 <- lm(gainscore ~ pre1 * treat,
+     gains1))

Call:
lm(formula = gainscore ~ pre1 * treat, data = gains1)

Residuals:
   Min     1Q Median     3Q    Max
-8.097 -2.729 -1.530  2.298 12.069

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         19.26184    2.41921   7.962 4.65e-08
pre1                -0.88203    0.12059  -7.314 1.93e-07
treattreatment      13.38055    4.85528   2.756   0.0113
pre1:treattreatment  0.03731    0.27251   0.137   0.8923





Homework for ANCOVA and Covariance Adjustment Model

1. Look at current journal articles in which the study used ANCOVA. You will probably have the best luck in the psychology literature. Try to find an article that DOES report checking for homogeneity of regression. Bring enough copies of just that page for everyone in the class.

2. Look back at your homework assignment from http://faculty.smu.edu/kyler/courses/7311/paired_hw.pdf. Re-run this analysis as a covariance adjustment model (just controlling for pretest differences on the gain scores). Does your output change your interpretation of your findings?
