Use and Abuse of Ancova

Focus on Qualitative Methods

Uses and Abuses of the Analysis of Covariance

Steven V. Owen,1* Robin D. Froman2†

1 Bureau of Educational Research, Box U-4, University of Connecticut, Storrs, CT 06269
2 Center for Nursing Research, University of Connecticut, Storrs, CT

Received 30 March 1998; accepted 11 August 1998

Abstract: The analysis of covariance (ANCOVA) is a powerful analytic tool, but there continue to be abuses of the method. We review assumptions and illustrate legitimate uses of ANCOVA, and summarize statistical packages’ approach to the method. Finally, we consider how ANCOVA is used in contemporary nursing research. © 1998 John Wiley & Sons, Inc. Res Nurs Health 21:557–562, 1998

Keywords: analysis of covariance, ANCOVA

Research in Nursing & Health, 1998, 21, 557–562

© 1998 John Wiley & Sons, Inc. CCC 0160-6891/98/060557-06

As many statistics books point out, the analysis of covariance (ANCOVA) has two primary purposes: (a) to improve the power of a statistical analysis by reducing error variance, and (b) to statistically “equate” comparison groups. The first purpose operates well when participants are randomly assigned to their groups. But using ANCOVA with intact or pre-existing groups can have the opposite effect, a reduction in statistical power. The second purpose usually accompanies nonrandom group comparisons, and analysts apply ANCOVA to make the group comparisons more “fair.”

In this article, we review the merits and demerits of these claims for ANCOVA. More specifically, we explore various ANCOVA pitfalls that can deliver misleading results for the unwary analyst, and review appropriate uses of ANCOVA. We also show how statistical packages (BMDP, SPSS, SAS, and SYSTAT) differ in their approach to ANCOVA. Though our focus is on the conventional ANOVA formulation, for researchers who subscribe to Cohen’s (1968) idea that regression analysis can do (just about) anything, our remarks apply to regression models as well. In fact, regression models may be more vulnerable to ANCOVA problems because independent variables often serve as covariates whether or not the researcher intended them to take that role.

When Sir Ronald Fisher invented the ANCOVA model in the 1930s, he took random assignment and experimental control for granted. Fisher had been studying agricultural methods, and random assignment was easy to arrange. The point of his invention was to enhance the precision of the statistical analysis. Today, ANCOVA is used routinely with quasi-experimental data where treatments cannot (because of expense, ethical concerns, or general disruptiveness) be randomly assigned to participants. The inability to assign participants to treatments is particularly evident in health care research. For example, in comparing lung vital capacity in smokers and nonsmokers, participants self-select into the two comparison groups. If the researcher thinks that age might be a confounding variable, age might be assigned to a covariate role. Whether that decision is a good or bad one depends largely on two ANCOVA assumptions.

The first statistical assumption is that the covariate(s) is (are) uncorrelated with other independent variables. In the smoking example, is age correlated with the independent variable, groups? If the correlation is nonzero, then removing the variance associated with age will also remove some of the variance associated with the grouping variable. This in effect leaves less of the dependent variable’s (lung vital capacity) variance to be accounted for by the independent variable (smoking). Figure 1 illustrates the situation. Notice that the covariate, age, overlaps with smoking status (arrowed portion), absorbing some of smoking’s relationship with lung vital capacity.

Correspondence to Steven V. Owen.
*Associate Director.
†Director.

In the frequent case where ANCOVA is arranged specifically to “equate” groups that differ on some pretest measure, the analyst has automatically violated the assumption. Does that make any difference? Wu and Slakter (1989), discussing ANCOVA in nursing research, showed no hesitation in recommending the technique to adjust for pre-existing group differences; whereas Pedhazur and Schmelkin (1991, p. 283) remarked that the approach “is fraught with serious biases and threats to validity.” Our position is more aligned with Pedhazur and Schmelkin’s.

The second statistical assumption for ANCOVA is that the covariate(s) is (are) correlated with the dependent variable. When a covariate is a pretest and the dependent variable is the posttest, there should be a substantial correlation between the two (but see Pedhazur and Schmelkin [1991, pp. 283–284] for other potential problems with such a design).

In the smoking example, age of course is not a pretest; rather, it is considered a proxy variable, that is, a convenient substitute for other underlying constructs. In a proxy role, age might represent longer opportunity for smoking and increased vulnerability to an earlier cultural acceptance of smoking. What is the correlation of age with the dependent variable, lung vital capacity? If the correlation is small, then the covariate adjustment is similarly small, and there will not be a noticeable improvement in power. There might even be a reduction in power, because the covariate uses a degree of freedom that had been assigned to the error term. That results in an increase in the mean square error (reducing power). When sample size is small to begin with, the covariate must be strong enough to compensate for the loss of the error degree of freedom. But even if the covariate-dependent variable correlation is large, there may or may not be an improvement in power; it depends on meeting the first assumption.
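The trade-off between a smaller error sum of squares and a lost error degree of freedom can be written out for the simplest case. The sketch below assumes a one-way design with k groups, N participants, and a single covariate; the symbols (SS_w for the within-group sum of squares of the dependent variable, r_w for the pooled within-group correlation between covariate and dependent variable) are our notation for illustration, not the article's.

$$
MS_{\text{error}}^{\text{ANOVA}} = \frac{SS_w}{N-k},
\qquad
MS_{\text{error}}^{\text{ANCOVA}} = \frac{SS_w\,(1-r_w^2)}{N-k-1}
$$

$$
MS_{\text{error}}^{\text{ANCOVA}} < MS_{\text{error}}^{\text{ANOVA}}
\iff (1-r_w^2)(N-k) < N-k-1
\iff r_w^2 > \frac{1}{N-k}
$$

Under these assumptions, a study with N = 12 and k = 2 needs r_w^2 > .10 (|r_w| above about .32) before the mean square error shrinks at all, whereas with N = 40 and k = 4 the threshold is r_w^2 > 1/36 (|r_w| above about .17). This is only the break-even point for the error term; it says nothing about what happens to the effect itself when the covariate is correlated with the grouping variable.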

EXAMPLE 1: REDUCING STATISTICAL POWER

The data here are from an example in the BMDP Manual, Volume 1 (Dixon, 1992), in which 40 participants run a mile, and then their pulse rates are measured. A 2 (Sex) by 2 (Smoking) ANOVA is arranged. Both main effects are significant: Sex F(1,36) = 37.33, p < .001, effect size¹ (partial η²) = .51; Smoking F(1,36) = 6.56, p < .01, effect size = .15. The interaction term is nonsignificant.

Things change when the data are rerun, using baseline pulse as a covariate. Once again, the main effects are significant: Sex F(1,35) = 21.81, p < .001, effect size (partial η²) = .38; Smoking F(1,35) = 5.71, p < .05, effect size = .14. Notice that the effect size for Sex has dropped substantially. That is because baseline pulse was correlated with Sex, violating assumption 1.
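The effect sizes reported for this example can be checked directly from the F-ratios. Because a sum of squares equals a mean square times its degrees of freedom, partial η² can be recovered as F × df_effect / (F × df_effect + df_error). The short sketch below applies that identity to the values reported above; the function name is ours.

```python
def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F-ratio and its degrees of freedom."""
    # SS_effect / (SS_effect + SS_error), using SS = MS * df and F = MS_effect / MS_error
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


# F-ratios and degrees of freedom as reported in Example 1
reported = {
    "Sex, ANOVA": (37.33, 1, 36),
    "Smoking, ANOVA": (6.56, 1, 36),
    "Sex, ANCOVA": (21.81, 1, 35),
    "Smoking, ANCOVA": (5.71, 1, 35),
}

for label, (f_ratio, df1, df2) in reported.items():
    value = partial_eta_squared(f_ratio, df1, df2)
    print(f"{label}: partial eta squared = {value:.2f}")
```

The four results round to .51, .15, .38, and .14, matching the effect sizes given in the text.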

What about the substantive problem of interpretation? This involves the well known problem of variance partitioning. Darlington (1968) and other early analysts described the problem clearly, and admitted no answers. Even today, Pedhazur (1997) has devoted an entire chapter to the issue, because “variance partitioning is widely used, mostly abused, in the social sciences for determining the relative importance of independent variables. . . . [In the last 15 years,] abuses of variance partitioning have not abated but rather increased” (p. 243). When variables covary, there is no satisfactory way to assign unique explanatory power to them individually. One can make odd sounding statements that reveal how confusing the situation is. For example, “Adjusting for initial pulse rate, sex is associated with post-exercise pulse rate.” What does it mean to hold pulse rate constant, as though everyone had the same initial pulse rate? What value is the hypothetical constant pulse rate? Could one choose a different pulse rate to hold constant? Back to Pedhazur: “Unfortunately, applications of ANCOVA in quasi-experimental and nonexperimental research are by and large not valid” (1997, p. 654).

FIGURE 1. A covariate erodes the effect of another independent variable.

¹In ANCOVA, partial η² is defined as:

$$
\text{partial } \eta^2 = \frac{\text{adjusted } SS_{\text{effect}}}{\text{adjusted } SS_{\text{effect}} + \text{adjusted } SS_{\text{error}}}
$$

(Tabachnick & Fidell, 1996, p. 349).

EXAMPLE 2: IMPROVING STATISTICAL POWER

These data are from an intervention study of 32 veterans diagnosed with post-traumatic stress disorder. Two treatments are randomly assigned: a 1-week Outward Bound experience, or regular counseling sessions at a Veterans Affairs hospital. Pre-intervention and 1-week follow-up measures are taken with the Beck Hopelessness Scale (Beck & Steer, 1988).

A one-way ANOVA is run, and the main effect for treatment is significant: F(1,30) = 10.67, p < .01, effect size (partial η²) = .26. As in the first example, the data are rerun, this time using baseline Hopelessness scores as a covariate. Once again, the treatment effect is significant: F(1,29) = 17.24, p < .001, effect size = .37. In this case, all the signs of increased statistical power are present: larger F-ratio, smaller p value, and larger effect size. Here ANCOVA did its job because both assumptions were in place: Random assignment of treatments to participants creates an expected correlation of zero between the pretest and the grouping variable; and pretest scores are theoretically and statistically related to the outcome measure.

Figure 2 depicts this case. Because of random assignment of treatment groups, the grouping variable is not related to the covariate, Hopelessness pretest scores. But the covariate is related to the dependent variable, and boosts the independent variable’s power by removing some of what otherwise would be error variance. Once the dependent variable’s variance associated with the covariate is removed, the portion of the remaining variance in the dependent variable that is shared with the independent variable (treatment) becomes larger.
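The mechanism can be sketched with a small one-way ANCOVA computed from sums of squares and cross-products. The scores below are hypothetical, not the veterans' data; they are constructed so that both groups have identical pretest values, which makes the covariate exactly uncorrelated with the grouping variable (the situation random assignment produces on average). The function names are ours, and the code handles a single covariate only.

```python
from statistics import fmean


def pooled_within(groups):
    """Pooled within-group Sxx, Sxy, Syy for a list of (covariate, outcome) groups."""
    sxx = sxy = syy = 0.0
    for xs, ys in groups:
        mx, my = fmean(xs), fmean(ys)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        syy += sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def anova_and_ancova_f(groups):
    """Return (F for one-way ANOVA, F for one-way ANCOVA with one covariate)."""
    k = len(groups)
    n_total = sum(len(ys) for _, ys in groups)
    sxx_w, sxy_w, syy_w = pooled_within(groups)

    # Treating all cases as a single group gives the total sums of squares.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    sxx_t, sxy_t, syy_t = pooled_within([(all_x, all_y)])

    # ANOVA: between-group SS tested against the unadjusted within-group error.
    ss_between = syy_t - syy_w
    f_anova = (ss_between / (k - 1)) / (syy_w / (n_total - k))

    # ANCOVA: error SS after removing the pooled within-group regression,
    # at the cost of one error degree of freedom.
    sse_adjusted = syy_w - sxy_w ** 2 / sxx_w
    # Adjusted effect SS: what the groups explain beyond a covariate-only model.
    sse_covariate_only = syy_t - sxy_t ** 2 / sxx_t
    ss_effect_adjusted = sse_covariate_only - sse_adjusted
    f_ancova = (ss_effect_adjusted / (k - 1)) / (sse_adjusted / (n_total - k - 1))
    return f_anova, f_ancova


# Hypothetical pretest (x) and posttest (y) scores, NOT the study's data.
pretest = [10, 12, 14, 16, 18, 20]
control = (pretest, [11, 12, 15, 15, 19, 20])
treatment = (pretest, [8, 11, 11, 14, 15, 18])

f_anova, f_ancova = anova_and_ancova_f([control, treatment])
print(f"ANOVA  F(1, 10) = {f_anova:.2f}")
print(f"ANCOVA F(1, 9)  = {f_ancova:.2f}")
```

With these made-up scores the treatment sum of squares is the same in both analyses, because the covariate has nothing in common with the grouping variable; only the error term shrinks, so the F-ratio rises sharply. If the groups had differed on the pretest, the adjusted effect sum of squares would have changed as well.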

OTHER ANCOVA ASSUMPTIONS

The usual ANOVA assumptions (homogeneity of variance, normality, and independence of scores) hold for ANCOVA as well. And, as usual, the F-ratio can withstand some disruption in homogeneity of variance and normality (especially with equal cell sizes), but it is highly vulnerable to correlated scores, which inflate the Type I error rate.

There is another ANOVA/ANCOVA assumption often unmentioned in statistics books: No measurement error in the covariate(s). In the case of ANCOVA with random assignment, covariate measurement error does not bias the adjusted means, but it does produce less statistical power, which in turn increases the probability of a Type II error. With a quasi-experimental design lacking random assignment, covariate measurement error creates bias in adjusted means. The bias is usually negative (underadjustment), but under some conditions can be positive (Bryk & Weisberg, 1977).
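A rough way to see why random assignment protects the adjusted means is the classical measurement error model. The sketch below assumes two groups, an observed covariate X = T + e with error e uncorrelated with the true score T and with the outcome, a within-group reliability ρ_xx, a true within-group slope β of the outcome on T, and a true treatment effect τ. All of these symbols are our illustrative assumptions, and the results are large-sample approximations.

$$
E[b_w] \approx \rho_{xx}\,\beta
$$

$$
E\big[\bar{Y}_{1,\text{adj}} - \bar{Y}_{2,\text{adj}}\big]
\approx \tau + \beta\,(\mu_{T_1} - \mu_{T_2}) - \rho_{xx}\,\beta\,(\mu_{T_1} - \mu_{T_2})
= \tau + \beta\,(1-\rho_{xx})\,(\mu_{T_1} - \mu_{T_2})
$$

With random assignment the expected covariate means are equal, so the second term vanishes and only precision is lost. With intact groups that differ on the covariate, a fraction (1 − ρ_xx) of the pre-existing difference is left in the adjusted means, which is the underadjustment described above.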

Although measurement error in the dependent variable is not an ANOVA/ANCOVA assumption, it can disrupt statistical power. With ANOVA, measurement error in the dependent variable reduces statistical power, but with ANCOVA, the outcome is less predictable: Even Type I errors may result if the covariate is correlated with other independent variables, because measurement error in the covariate may now ripple through the entire model by way of its correlations with other variables.

Homogeneity of regression slopes is an additional assumption for the ANCOVA model. This means that each comparison group should show a similar regression slope when the dependent variable is regressed on the covariate(s). The reason for the assumption is that all groups’ dependent variable scores are adjusted based on a pooled regression slope; if the groups’ individual slopes differ sharply, then the pooling becomes a muddy average.
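The pooled slope and its role in the adjustment can be made concrete. In the usual one-covariate formulation, the pooled slope is the sum of the within-group cross-products divided by the sum of the within-group covariate sums of squares, and each adjusted mean is the group mean minus the pooled slope times the group's distance from the grand covariate mean. The sketch below uses hypothetical scores and function names of our own choosing.

```python
from statistics import fmean


def slope_parts(xs, ys):
    """Return (Sxx, Sxy) for one group: the building blocks of its slope."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy


def adjusted_means(groups):
    """Pooled within-group slope, per-group slopes, and covariate-adjusted means."""
    parts = [slope_parts(xs, ys) for xs, ys in groups]
    group_slopes = [sxy / sxx for sxx, sxy in parts]
    pooled_slope = sum(sxy for _, sxy in parts) / sum(sxx for sxx, _ in parts)
    grand_x = fmean([x for xs, _ in groups for x in xs])
    adjusted = [
        fmean(ys) - pooled_slope * (fmean(xs) - grand_x) for xs, ys in groups
    ]
    return pooled_slope, group_slopes, adjusted


# Hypothetical scores: group B sits higher on the covariate and has a flatter slope.
group_a = ([1, 2, 3], [2, 4, 6])
group_b = ([3, 4, 5], [5, 6, 7])

pooled, slopes, adjusted = adjusted_means([group_a, group_b])
print("group slopes:  ", slopes)
print("pooled slope:  ", pooled)
print("adjusted means:", adjusted)
```

In this made-up case the group slopes are 2.0 and 1.0, so the pooled slope of 1.5 describes neither group. The raw means (4 and 6) favor group B, while the adjusted means (5.5 and 4.5) favor group A, which shows how much rides on a slope that is only an average.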

FIGURE 2. A covariate improves the effect of another independent variable.

Interestingly, when cell sizes are equal, the ANCOVA F-ratios are generally robust except for the most gross violations of homogeneity of regression (Hamilton, 1977; Wu, 1984). That does not mean that equal cell sizes allow the analyst to ignore the homogeneity of slope assumption. A robust F-ratio is a statistical summary that delivers no particular insight about how groups are different. It can be far more informative, following a violation of homogeneous slopes, to calculate Johnson-Neyman regions of significance. This technique helps to map out where groups do and do not differ along various values of the covariate. Dorsey and Soeken (1996) produced an introduction to the Johnson-Neyman method applied to nursing research.
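For readers who want the idea in symbols, here is a sketch of the two-group, one-covariate case under the usual textbook formulation; Dorsey and Soeken (1996) should be consulted for the full procedure. We assume separate fitted lines with intercepts a_1, a_2 and slopes b_1, b_2, group sizes n_1 and n_2 (N in total), covariate means and sums of squares within each group, and a pooled residual mean square s²_res with N − 4 degrees of freedom. The notation is ours.

$$
D(x) = (a_1 - a_2) + (b_1 - b_2)\,x
$$

$$
\widehat{\operatorname{Var}}[D(x)] = s^2_{\text{res}}
\left[\frac{1}{n_1} + \frac{1}{n_2}
+ \frac{(x-\bar{X}_1)^2}{S_{xx,1}}
+ \frac{(x-\bar{X}_2)^2}{S_{xx,2}}\right]
$$

$$
D(x)^2 = F_{\alpha;\,1,\,N-4}\;\widehat{\operatorname{Var}}[D(x)]
$$

The last line is a quadratic in x. Its real roots, when they exist, mark the boundaries between covariate values where the predicted group difference is statistically significant and values where it is not.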

The final ANCOVA assumption is that the relationship between the covariate(s) and the dependent variable is linear. Because the regression is based only on the linear portion, any systematic but nonlinear relationship will cause a reduction in statistical power. The simplest solution for nonlinearity is to apply a power transformation (e.g., quadratic, cubic) to the covariate before the ANCOVA analysis.

WHAT DO NURSE RESEARCHERS DO WITH ANCOVA?

We searched four important nursing journals over a 5-year period (1993–1997; reference list available from first author) for examples of ANCOVA. Image had none, in keeping with its recent drift toward qualitative research (Henry, 1998). Western Journal of Nursing Research had only 2, Research in Nursing & Health published 5, and Nursing Research showed 9, for a total of 16. Of those articles, in 9 (56%) the investigator used random assignment of participants to treatments, so the covariate(s) were expected to be uncorrelated with the independent variables (assumption 1).

Only 1 of the 16 articles contained a thorough assessment of ANCOVA assumptions. In fairness, though, in the 9 using random assignment the investigator should not have needed to check the correlation between the covariate(s) and independent variable(s). Also, because random assignment can produce (approximately) equal cell sizes, the analysis is inoculated against violations of all assumptions except independence of scores. It is surprising that so few of the articles contained information about assumption 2, the relationship of the covariate and the dependent variable. Only 1 indicated F-ratios for covariates, and in 1 other study the investigator gave the simple correlations between covariates and dependent variables.

STATISTICAL PACKAGES AND ANCOVA

In 1982, Searle and Hudson compared ANCOVA procedures from 10 computer programs, and discovered different output among all 10. Although contemporary statistical programs are easier to use, output and labeling have not improved much since then. Three of the four packages we reviewed are owned by SPSS (BMDP version 7, SPSS version 8.0, and SYSTAT version 7.01). Interestingly, the flagship program, SPSS, differs from the other two in its approach to ANCOVA. Its default setting is what SPSS terms the “experimental” approach, in which main effects and interactions are adjusted for the covariate. The default for BMDP and SYSTAT, in SPSS language, is called the “regression” approach, in which each term, even the covariate, is adjusted for each other term. When the covariate is uncorrelated with other independent variables, then both approaches give the same result (that is, there is nothing to adjust in the covariate). But in the nonexperimental situation, where the covariate may be related to a grouping variable, the two approaches can deliver markedly different results. In this case, the regression approach gives more conservative results, with less statistical power. With each package, the thoughtful analyst can easily override the defaults to produce the alternate approach. SYSTAT does not label the covariate(s) as such on the printout. BMDP’s programs 1V and 4V label the covariate(s) clearly, but 2V does not.
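The difference between the two approaches can be illustrated for the covariate term in a one-way design. As we read the description above, the covariate is left unadjusted under the “experimental” approach and is adjusted for the grouping variable under the “regression” approach. The sketch below computes both sums of squares for a single covariate; the data and function names are hypothetical, and it is not a reproduction of any package's output.

```python
from statistics import fmean


def cross_products(xs, ys):
    """Return (Sxx, Sxy) around the means of xs and ys."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, sxy


def covariate_ss(groups):
    """Covariate SS, unadjusted versus adjusted for the grouping variable."""
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    # Unadjusted: regression of the outcome on the covariate, ignoring groups.
    sxx_t, sxy_t = cross_products(all_x, all_y)
    unadjusted = sxy_t ** 2 / sxx_t
    # Adjusted for groups: regression based on pooled within-group deviations.
    parts = [cross_products(xs, ys) for xs, ys in groups]
    sxx_w = sum(sxx for sxx, _ in parts)
    sxy_w = sum(sxy for _, sxy in parts)
    adjusted = sxy_w ** 2 / sxx_w
    return unadjusted, adjusted


# Hypothetical data. Balanced: both groups share the same covariate values.
balanced = [([1, 2, 3], [2, 4, 6]), ([1, 2, 3], [4, 5, 9])]
# Confounded: the second group is higher on the covariate.
confounded = [([1, 2, 3], [2, 4, 6]), ([3, 4, 5], [5, 6, 7])]

print("balanced:  ", covariate_ss(balanced))
print("confounded:", covariate_ss(confounded))
```

When the covariate means are identical across groups the two sums of squares coincide, as the text states. When the groups differ on the covariate, the unadjusted sum of squares also absorbs between-group variation, so the two values part ways.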

SAS (version 6.12) does not treat the approaches as alternate. It delivers both the regression and experimental results in a single table, so the user can decide which to use (or not decide, and report both). SAS does not identify the covariate(s) on the printout.

SPSS’s ANCOVA is the most unconventional of the four packages. The only way to assign covariate status to a variable is through SPSS’s General Linear Model procedure. The resulting printout does not distinguish covariates from other independent variables.

STATISTICAL PACKAGES’ TREATMENT OF ANCOVA ASSUMPTIONS

As a rule, statistical packages encourage users to ignore assumptions and leap right to the main analysis. Inside ANOVA programs, packages offer the Levene test for homogeneity of variance, but any other tests of assumptions must be arranged by the user.

For ANCOVA, the situation is no better. In BMDP, only one of its three ANCOVA programs (1V) automatically delivers a homogeneity of regression test. Unfortunately, this program handles only a one-way model, so if the analyst has a factorial model, she must convert the cell structure to a one-way model just to get the assumption tested. SYSTAT and SAS offer no homogeneity test. In SPSS’s GLM procedure, one must construct an interaction term representing the assumption test. Without a clear guide (SPSS, 1997, pp. 118–119), this would be hard to discover.

Any analyst facile with regression analysis could readily test for slope homogeneity inside a regression model. Caution should be used, though, in arranging the model. With a hierarchical analysis (the preferred approach), the homogeneity term (interaction between the covariate and the independent variable) is entered last, and the test is a version of SPSS’s “experimental” approach, where each successive term is adjusted for previous terms. If a direct or simultaneous regression is used, the homogeneity term is tested with SPSS’s “regression” approach.
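For a one-way design with one covariate, entering the interaction last amounts to comparing two nested models: one with separate intercepts and a pooled slope, and one in which each group also has its own slope. The sketch below computes the resulting F-ratio from the two error sums of squares. The data and function names are hypothetical, and no p value is computed; the F-ratio would be referred to an F table with the returned degrees of freedom.

```python
from statistics import fmean


def sums(xs, ys):
    """Return (Sxx, Sxy, Syy) around the group means."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxx, sxy, syy


def slope_homogeneity_f(groups):
    """F-ratio and degrees of freedom for the covariate-by-group interaction."""
    k = len(groups)
    n_total = sum(len(ys) for _, ys in groups)
    per_group = [sums(xs, ys) for xs, ys in groups]

    # Model without the interaction: separate intercepts, one pooled slope.
    sxx_w = sum(s[0] for s in per_group)
    sxy_w = sum(s[1] for s in per_group)
    syy_w = sum(s[2] for s in per_group)
    sse_pooled = syy_w - sxy_w ** 2 / sxx_w

    # Model with the interaction entered last: each group has its own slope.
    sse_separate = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in per_group)

    df_effect = k - 1
    df_error = n_total - 2 * k
    f_ratio = ((sse_pooled - sse_separate) / df_effect) / (sse_separate / df_error)
    return f_ratio, df_effect, df_error


# Hypothetical scores with visibly different slopes in the two groups.
group_a = ([1, 2, 3, 4], [2, 3, 5, 6])
group_b = ([1, 2, 3, 4], [1, 4, 6, 9])

f_ratio, df_effect, df_error = slope_homogeneity_f([group_a, group_b])
print(f"slope homogeneity F({df_effect}, {df_error}) = {f_ratio:.2f}")
```

A large F-ratio here signals that the pooled slope is a poor summary and that the ordinary ANCOVA adjustment should not be trusted without further work, such as the Johnson-Neyman approach.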

CONCLUSIONS

In 1969, Janet Elashoff called the analysis of covariance (ANCOVA) “a delicate instrument.” It still is. Carefully handled, though, it is an excellent device for the analyst’s toolkit. To improve the quality of future ANCOVA studies, we recommend that the method be limited primarily to randomized designs. When the analyst wants to use ANCOVA with an intact group or other nonrandom assignment, the correlation between the covariate(s) and the independent variable(s) should be reported. As the correlations are increasingly nonzero, conclusions drawn about the independent variables are increasingly suspect. ANCOVA is an interesting and useful tool, but it is not a fix-all to be applied indiscriminately to equate groups. As mentioned above, the Johnson-Neyman method can be used as an option (or as a complement) to ANCOVA. Myers and Well (1995) offer a brief comparison of ANCOVA with other approaches (blocking, analysis of gain scores) to improving statistical power in nonrandom groups. Kirk (1995, Chapter 15) gives a short but excellent review of ANCOVA applications, and Huitema’s (1980) text remains the definitive work on ANCOVA.

We also recommend that researchers report tests of ANCOVA assumptions. That statistical packages make assumption tests challenging is not a good reason to avoid them entirely. And it is easy, not challenging, to report the simple correlations between covariates and dependent variables. In the case where the correlations are tiny, there is no gain whatsoever to using ANCOVA.

REFERENCES

Beck, A.T., & Steer, R.A. (1988). Beck Hopelessness Scale manual. San Antonio: Psychological Corporation.

Bryk, A.S., & Weisberg, H.I. (1977). Use of the nonequivalent control group design when subjects are growing. Psychological Bulletin, 84, 950–962.

Cohen, J. (1968). Multiple regression as a general data analytic system. Psychological Bulletin, 70, 426–443.

Darlington, R.B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161–182.

Dixon, W.J. (1992). BMDP statistical software manual, Vol. 1. Berkeley, CA: University of California Press.

Dorsey, S.G., & Soeken, K.L. (1996). Use of the Johnson-Neyman technique as an alternative to analysis of covariance. Nursing Research, 45, 363–366.

Elashoff, J.D. (1969). Analysis of covariance: A delicate instrument. American Educational Research Journal, 6, 383–401.

Hamilton, B.L. (1977). An empirical investigation of the effects of heterogeneous regression slopes in analysis of covariance. Educational and Psychological Measurement, 37, 701–702.

Henry, B. (1998). To Journal readers, report and requests, 1998. Image: Journal of Nursing Scholarship, 30, 2.

Huitema, B. (1980). The analysis of covariance and alternatives. New York: Wiley.

Kirk, R.E. (1995). Experimental design: Procedures for the behavioral sciences (3rd Ed.). Pacific Grove, CA: Brooks/Cole.

Myers, J.L., & Well, A.D. (1995). Research design & statistical analysis. Hillsdale, NJ: Lawrence Erlbaum.

Pedhazur, E.J. (1997). Multiple regression in behavioral research (3rd Ed.). New York: Harcourt Brace.

Pedhazur, E.J., & Schmelkin, L.P. (1991). Measurement, design, and analysis: An integrated approach. Hillsdale, NJ: Lawrence Erlbaum.

Searle, S.R., & Hudson, G.F.S. (1982). Some distinctive features of output from statistical computing packages for analysis of covariance. Biometrics, 38, 737–745.

SPSS. (1997). SPSS advanced statistics 7.5. Chicago: Author.

Tabachnick, B.G., & Fidell, L.S. (1996). Using multivariate statistics (3rd Ed.). New York: Harper Collins.

Wu, Y-W.B. (1984). The effects of heterogeneous regression slopes on the robustness of two test statistics in the analysis of covariance. Educational and Psychological Measurement, 44, 647–663.

Wu, Y-W.B., & Slakter, M.J. (1989). Analysis of covariance in nursing research. Nursing Research, 38, 306–308.
