Running head: ETHICS AND STATISTICS

The Ethical Use of Statistical Analyses

in Psychological Research

James M. Graham

Texas A&M University

________________________________________________________________

Graham, J. M. (2001, March). The ethical use of statistical analyses in psychological research. Paper presented at the annual meeting of Division 17 (Counseling Psychology) of the American Psychological Association, Houston, TX.

Abstract

A large body of literature in psychological research and

methodology points out common misuses of statistical procedures

in published works. Statistical procedures, when incorrectly

applied, result in false and misleading results which, in turn,

lessen the ethical justification for a research project. The

present paper attempts to provide researchers in the field of

psychology ethical guidelines for the proper use of statistical

procedures. First, the problems with the misuse of statistical

analyses in current literature are outlined. Next, the ethical

principles which apply are discussed. Finally, previously

proposed solutions and guidelines for researchers are presented.

The Ethical Use of Statistical Analyses

in Psychological Research

Rosenthal and Rosnow (1984) have identified two major

ethical concerns for those conducting research in the field of

psychology: the protection of the well-being of human

participants, and the protection of the integrity of our work.

While a large amount of writing has been done on the former of

these obligations, surprisingly little attention has been given

to the latter.

As researchers in the field of psychology we are bound by

the ethical principles set forth by the American Psychological

Association (APA) (APA, 1992). Part of our ethical obligation

requires us to take steps to ensure that human subjects are not

harmed or mistreated as a result of our research. The present

paper focuses on the additional ethical requirement that

psychologists produce research of high integrity and quality,

which is neither false nor misleading. In particular, this

concept is applied to the misuses of statistical procedures

which seem to pervade the current literature.

Scientific Quality and Ethics

Research of poor quality leads to a number of problems.

Each year, millions of dollars of time and resources are

invested in research in the field of psychology, and many

important decisions are made based on the findings of that

research. Improperly designed or conducted research does not

fulfill its primary function: to advance the knowledge of the

discipline (Parkinson, 1994). A poorly implemented research

project does not answer the questions which it sets out to

answer and, at worst, harms the field of psychology by

representing false information as scientific truth. This can be

exceptionally damaging to the field of psychology, as future

research projects are based on prior research, compounding the

problem. The “file-drawer” problem, which states that

replications of studies which fail to show statistical

significance are less likely to be published than those that do,

makes errors in research unlikely to be corrected in the

literature (Rosenthal, 1979). Poor quality research, once

published, is likely to go uncorrected, much to the detriment of

the field.

Incompetently designed research also wastes time, money,

and resources which would be better spent on higher quality

research (Parkinson, 1994). While it is certainly not true that

research which does not show the expected results is of no use

to the scientific community, research which is improperly

designed or implemented is of no value, regardless of the

results.

Colombo refers to the elimination of poor research design

as “the most uncomplicated definition of scientific merit”

(1995, p. 318). As scientists in the field of psychology, we

are bound by a code of ethics, set forth by the American

Psychological Association (APA), to ensure that we produce

research of the highest possible merit. While this code of

ethics appears most applicable to those in applied settings,

researchers are expected to follow it to the same extent as

their practitioner cousins. As stated by Rosenthal (1994),

“everything else being equal, research that is of higher

scientific quality is likely to be more ethically defensible”

(p. 127).

The adage which states that a chain is only as strong as

its weakest link holds true with regard to research quality in

the field of psychology. The weak link in research methodology

is often the improper use of statistics (Thompson, 1998).

Errors in data analysis invariably lead to inaccurate results,

which serve to greatly lessen the ethical justification for

conducting a research project (Rosenthal, 1994). It is

therefore of the utmost importance that, as ethical researchers,

we take heed of the wide body of literature addressing common

misuses of statistical procedures and errors in statistical

analysis.

Misuse of Statistical Procedures

The preponderance of published articles which address

common errors in the application of statistical methods speaks

volumes to the need for such education. The fact remains that

many of the statistical procedures in common use are not only

outdated by better, more modern procedures, but also produce

results which are erroneous and misleading. Even statistical

procedures which can produce very accurate results are often

used in manners for which they are not intended, producing the

same erroneous and misleading results. While it is not the

purpose of this paper to examine these common errors in detail,

several will be briefly outlined to underscore the need for this

paper.

Null Hypothesis Significance Testing

Perhaps the most frequently addressed relevant topic is the

psychological research community’s over-reliance on null

hypothesis statistical significance testing. In statistical

significance testing, p values are often incorrectly interpreted

as the probability that results will replicate, the importance

of the results, or as an indicator of effect magnitude

(Cohen, 1994; Thompson, 1989). In actuality, statistical

significance testing tells us none of these things.
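
The distinction between a p value and an effect magnitude can be made concrete with a small sketch. The effect size (d = 0.2) and the sample sizes below are hypothetical, and the p value uses a large-sample normal approximation rather than an exact t distribution; the only point is that the same standardized difference yields very different p values as the sample size changes.

```python
from math import sqrt
from statistics import NormalDist


def approx_two_sided_p(d: float, n_per_group: int) -> float:
    """Approximate two-sided p value for a standardized mean difference d
    between two independent groups of equal size (normal approximation)."""
    z = d * sqrt(n_per_group / 2)  # the test statistic grows with sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# The same hypothetical effect size evaluated at two sample sizes.
d = 0.2
p_small = approx_two_sided_p(d, n_per_group=50)
p_large = approx_two_sided_p(d, n_per_group=500)
print(f"d = {d}: p = {p_small:.3f} (n = 50), p = {p_large:.4f} (n = 500)")
```

Under these assumptions the effect is identical in both cases, yet only the larger sample falls below the conventional .05 threshold, which is why the p value alone cannot be read as an index of effect magnitude.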

In the case of statistical significance testing, real

change in common practice has already begun to occur. While the

APA’s (1994) initial suggestion that effect sizes be reported

whenever statistical significance is cited has had little effect

(Kirk, 1996; Lance & Vacha-Haase, 1998; Thompson & Snyder,

1997), the report of the APA Task Force on Statistical

Inference (Wilkinson & Task Force on Statistical Inference,

1999) and changes in several journals’ editorial policies

(Murphy, 1997; Vacha-Haase, 1998) bode well for the future of

proper use of F-ratios, Wilks’ lambda, and their ilk. While the

use of significance testing has been a problem acknowledged for

at least 40 years (Rozeboom, 1960), only recently have steps

been taken to correct it (Wilkinson & Task Force on

Statistical Inference, 1999).

Ignoring Assumptions

Wilcox (1998) points out that though hundreds of articles

have stated that classical general linear model (GLM) least-

squares procedures have very low power and can be misleading

under even small departures from normality, they continue to be

misused. Wilcox further states that, “in applied work, many

non-significant results would have been significant if a more

modern method, developed after the year 1960, had been used” (p.

300).

Nearly all classical GLM procedures require that certain

assumptions be met (normality, homogeneity of variance, etc.),

yet these procedures are routinely applied even when the basic

assumptions have not been met (Wilcox, 1998). While appropriate

modern methods of dealing with the failure to meet these

assumptions are well documented (Hoaglin, Mosteller, & Tukey,

1985; Huber, 1981; Wilcox, 1996), the classical procedures continue

to be misused in the current literature.
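
As one illustration of the kind of robust alternative discussed in this literature, the sketch below compares an ordinary mean with a 20% trimmed mean. The trimmed mean is chosen here only as a familiar example of a robust estimator; the data and the trimming proportion are hypothetical.

```python
from statistics import mean


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the given proportion of observations from each tail."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # observations trimmed per tail
    return mean(ordered[k:len(ordered) - k])


scores = [2, 3, 4, 5, 6, 7, 8, 9, 10, 100]  # hypothetical data, one extreme score
ordinary = mean(scores)
robust = trimmed_mean(scores)
print(ordinary, robust)
```

With these data the single extreme score pulls the ordinary mean to 15.4, while the trimmed mean stays at 6.5, close to the bulk of the observations.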

Stepwise Procedures

Another common error in statistical analysis occurs when

stepwise procedures are used in discriminant analysis and

multiple regression. A number of articles and textbooks address

the fact that stepwise procedures produce results which can be

inaccurate and are best forsaken for all-possible-subsets

procedures (Huberty, 1989; Snyder, 1991; Thompson, 1989, 1995).

Stepwise procedures are used to select a “best” subset of

variables out of a larger group, but are grossly affected by

sampling error (McCabe, 1975; Thompson, 1995), and do not even

always identify the best subset of a given size (Huberty, 1989;

Thompson, 1995). The fact remains, however, that stepwise

procedures are still in common use (Huberty, 1994), even though

more viable alternatives have been available for nearly 25 years

(McCabe, 1975; McHenry, 1978).
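
The claim that stepwise selection need not find the best subset of a given size can be checked with a small constructed case. The correlations below are hypothetical and were chosen only to make the point; the forward selection shown is a simplified version that enters variables by R-squared alone, without the F-to-enter tests used in statistical packages.

```python
from itertools import combinations

# Hypothetical correlations, constructed for illustration only.
r_y = {"x1": 0.5, "x2": 0.4, "x3": 0.4}  # predictor-criterion correlations
r_xx = {
    frozenset(("x1", "x2")): 0.3,
    frozenset(("x1", "x3")): 0.3,
    frozenset(("x2", "x3")): -0.5,
}  # predictor-predictor correlations


def r_squared_pair(a, b):
    """Squared multiple correlation for a two-predictor regression."""
    rab = r_xx[frozenset((a, b))]
    return (r_y[a] ** 2 + r_y[b] ** 2 - 2 * r_y[a] * r_y[b] * rab) / (1 - rab ** 2)


# Forward selection: enter the best single predictor, then its best partner.
first = max(r_y, key=lambda name: r_y[name] ** 2)
second = max((name for name in r_y if name != first),
             key=lambda name: r_squared_pair(first, name))
stepwise_pair = (first, second)
stepwise_r2 = r_squared_pair(first, second)

# All possible subsets of size two.
best_pair = max(combinations(r_y, 2), key=lambda pair: r_squared_pair(*pair))
best_r2 = r_squared_pair(*best_pair)

print(stepwise_pair, round(stepwise_r2, 3))
print(best_pair, round(best_r2, 3))
```

In this constructed case forward selection commits to x1 first and ends with an R-squared of about .32, whereas the exhaustive search finds that x2 and x3 together reach .64.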

ANOVA Interaction Effects

Interaction effects in classical analysis of variance

(ANOVA) procedures have been referred to by Rosnow and Rosenthal

(1989b) as “probably the universally most misinterpreted

empirical results in psychology” (p. 1282). In fact, in a

review of 191 research articles using ANOVA designs in prominent

journals, Rosnow and Rosenthal (1989a) found that only 1% of

articles correctly interpreted interaction effects. Upon

finding a statistically significant omnibus interaction effect,

many researchers make the mistake of using post hoc tests to

compare pairs of cell means to determine the source of the

statistically significant interaction effect. Numerous authors

have shown that doing so interprets not the interaction effects

alone, but the interaction effects confounded with the main

effects (Levin & Marascuilo, 1972; Rosnow & Rosenthal,

1991). Alternate procedures have existed since the development

of the ANOVA and have been expounded by a multitude of authors

since (Harwell, 1998; Levin & Marascuilo, 1972; Marascuilo &

Levin, 1970; Rosnow & Rosenthal, 1989a).
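
The difference between cell means and interaction effects can be seen by removing the grand mean and the row and column effects from a table of cell means. The sketch below does this for a hypothetical 2 x 2 design with equal cell sizes; the function name and the data are illustrative assumptions, not taken from the works cited.

```python
from statistics import mean


def interaction_residuals(cell_means):
    """Remove the grand mean and the row and column effects from a table of cell means."""
    grand = mean(mean(row) for row in cell_means)
    row_effects = [mean(row) - grand for row in cell_means]
    col_effects = [mean(col) - grand for col in zip(*cell_means)]
    return [[cell - grand - row_effects[i] - col_effects[j]
             for j, cell in enumerate(row)]
            for i, row in enumerate(cell_means)]


cell_means = [[1, 1],
              [1, 5]]  # hypothetical 2 x 2 design with equal cell sizes
residuals = interaction_residuals(cell_means)
print(residuals)
```

The cell means suggest that a single cell differs from the other three, but the residuals form a crossover pattern (1, -1, -1, 1). Comparing pairs of cell means would therefore describe the main effects and the interaction together rather than the interaction itself.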

While the above examples are by no means an exhaustive list

of common errors in data analysis, they serve to highlight the

errors in statistical analysis commonplace in psychological

research. As the majority of the articles addressing these

errors are published in journals targeted towards statisticians,

those attempting to educate their colleagues about such matters

are unfortunately often put in the position of preaching to the

choir. However, as ethical researchers, it is important for us

to seek out such knowledge or, when necessary, consult with our

more statistically savvy peers.

Applicable Ethical Codes

In order to be applied to diverse areas of psychology, the

ethical principles set forth by the APA often lack specific

directives, and can therefore be interpreted as ambiguous

(Goodyear, Crego, & Johnston, 1992). While in the current code

of ethics only one section specifically addresses research,

“most of the Ethical Standards are written broadly, in order to

apply to psychologists in varied roles” (APA, 1992, p. 1598).

These roles include not only clinical practice, supervision,

and teaching, but research as well. Several of the ethical

principles set forth by the APA can be directly applied to the

problem of the improper use of statistical procedures.

Integrity

The Code of Ethics states that, “In describing or reporting

their … research … [psychologists] do not make statements that

are false, misleading, or deceptive” (APA, 1992, p. 1599). As

previously stated, the misuse of statistical procedures can

result in inaccurate and misleading results. While the

inaccuracy of interpretations resulting from the misuse of

statistical procedures is almost certainly not intentional, the

fact remains that the assertions made may simply be false.

Improper usage of statistical procedures can turn even the most

competently designed experiment into a poor-quality research

study, undermining the ethical justification for spending time

and money on the research. In turn, this serves to damage the

integrity of the field of psychology as a whole.

Furthermore, the Code of Ethics states that, “If

psychologists discover significant errors in their published

data, they take reasonable steps to correct such errors in a

correction, retraction, erratum, or other appropriate

publication means” (APA, 1992, p. 1609). As psychologists, we

are therefore bound, upon discovering errors in our previously

published work, to attempt to correct them. As applied to the

present problem, data affected by errors in statistical analysis

should be re-analyzed appropriately, with the subsequent results

made available to the research community.

Competence

The Code of Ethics clearly states that psychologists must

maintain high standards of competence, recognize the limits of

their competence, maintain scientific knowledge, and make use of

current scientific resources (APA, 1992). While this principle

is most often recognized as applying to clinical practice, it is

necessary for researchers to follow this guideline as well. For

example, a researcher who has no experience with marital therapy

and who is not familiar with current research on marital therapy

would be unlikely to conduct meaningful research in that area.

In the same vein, a researcher with no knowledge of the

limitations of a specific statistical procedure would be hard-

pressed to appropriately conduct an accurate analysis using that

procedure.

Principle 1.04b further states that, “psychologists …

conduct research … involving new techniques only after first

undertaking appropriate study, training, supervision, and/or

consultation from persons who are competent in those …

techniques” (APA, 1992, p. 1600). As all researchers are not

equally competent in all analytic techniques, those using a

statistical procedure with which they are unfamiliar are

expected to consult with more knowledgeable colleagues.

Social Responsibility

Perhaps the most pertinent applicable ethical principle is

that of social responsibility: “When undertaking research

[psychologists] strive to advance human welfare and the science

of psychology” (APA, 1992, p. 1600). At best, research based

on improper data analysis does nothing to advance the field of

psychology and wastes precious time and resources. At worst,

poor quality data analysis harms the science of psychology by

propagating false or misleading claims. In both cases, nothing

is done to benefit human welfare.

Possible Solutions

In 1972, Steiner half-jokingly suggested that researchers

in the field of psychology should be given a quota of studies

during which every researcher must demonstrate competence. If

the researcher fails to produce meaningful work within the

allotted number of studies, that researcher will no longer be

allowed to conduct research. Thankfully, such drastic methods

are not yet necessary. Other suggestions have been made, and

are presented here.

Institutional Review Boards

Rosenthal and Rosnow (1984) have suggested that a cost-

utility analysis be used to determine whether or not to

conduct research. A planned project that is inexpensive in

terms of money and time and is very useful would therefore be

deemed of high quality. Conversely, an expensive and time-

consuming study with little utility would be deemed unethical to

conduct. As applied to the present problem, a study with a poor

quality data analysis will produce inaccurate and misleading

results, and therefore be of no utility.

Rosenthal (1994) further suggests that institutional review

boards (IRBs) use this cost-utility approach to consider not

only the treatment of human participants, but the scientific

competence of the investigators as well. If the IRB is not able

to determine methodological competence, Rosenthal (1995)

suggests that they consult with colleagues who can.

As discussed by Colombo (1995), the expectation that IRBs

evaluate studies solely on the basis of a cost-utility analysis

is unlikely to be met. Certainly, it is unrealistic to assume

that IRBs will have the time and resources to fully investigate

the applicability of all types of statistical analyses across

all applications, nor are IRB members necessarily trained to do

so.

Peer Review

A more feasible alternative may be to conduct a formal peer

review of research prior to its conduct (Colombo, 1995). It

would certainly be useful for any researcher to consult with

several colleagues prior to conducting a research study,

particularly in the realm of statistics. Colombo points out

that this is what is currently done with students conducting

research in psychology. However, many researchers may find the

lengthy process of a formal peer review overly time-consuming

and restrictive.

Editorial Policies

Another solution to the problem of the misuse of

statistical procedures in psychology may be in the hands of

journal editors. The acceptance of articles which misuse

statistical procedures not only encourages their continued

production but also enables them to damage the field of

psychology. Journal editors would therefore do well to evaluate

the proper use of statistics in submitted articles by using

competent reviewers. Recently, the editorial policies of many

journals have begun to reflect this, specifically by requiring

that reports of statistical significance be supplemented with

effect sizes (Murphy, 1997; Vacha-Haase, 1998).

Guidelines for Researchers

As stated by a participant in a study on research ethics

conducted by Goodyear et al. (1992), “Ultimately, the authors of

articles must assume sole responsibility for all aspects of

their scholarly product and their interpretations” (p. 205).

While it may be unrealistic to expect all editorial policies to

change or for IRBs to begin assessing the statistical competence

of researchers, there are steps that the individual researcher

can take to maintain adherence to the Code of Ethics as it

applies to statistical analyses. It is in light of the present

discussion that the following guidelines are suggested.

1) Maintain an awareness of current issues in statistical

methodology through relevant journals and continuing education.

There are a variety of journals devoted to statistical

methodology, and workshops on statistical methods are offered at

nearly every major psychological conference.

2) Recognize the limits of your knowledge of current issues in

statistics and use only those procedures with which you are

competent. Use other techniques only after consulting with an

informed colleague.

3) Always consult with several colleagues about appropriate

analytic techniques while designing a study. It is not expected

that every researcher in the field of psychology be an expert

statistician. However, competent and ethical researchers should

be willing to consult with colleagues who are aware of current

issues in statistics.

4) Expect to find conflicting opinions in the literature

regarding accepted statistical practice—our knowledge of

statistics is being constantly updated. Consult with others to

determine current standards in statistics.

5) Do not use a more complicated statistical method when a

simpler one will suffice. Do not choose an analytic technique

merely because it is popular in current literature. Choose your

analytic procedure because it is appropriate given your design

and data.

6) When writing up your work, explain your reasons for

selecting your chosen statistical procedures. Reference several

current published statistical articles that support your choice.

7) Be aware of the assumptions required by your chosen analytic

technique, and be certain that they have been met. Use multiple

techniques, including graphical representations, to ensure that

your chosen analytic technique is appropriate given your data.

8) When using tests of statistical significance, always report

effect sizes.
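
Guidelines 7 and 8 can be sketched with two small computations: a simple check on the homogeneity of variance assumption and an effect size to accompany any significance test. The variance ratio and Cohen's d are used here only as familiar examples, the scores are hypothetical, and a real analysis would add graphical checks as guideline 7 recommends.

```python
from math import sqrt
from statistics import mean, variance


def variance_ratio(a, b):
    """Larger sample variance divided by the smaller; values far above 1 suggest heterogeneity."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled)


treatment = [12, 14, 15, 15, 17, 17]  # hypothetical scores
control = [10, 11, 13, 13, 15, 16]    # hypothetical scores
ratio = variance_ratio(treatment, control)
d = cohens_d(treatment, control)
print(f"variance ratio = {ratio:.2f}, Cohen's d = {d:.2f}")
```

For these scores the variance ratio is about 1.44 and d is about 0.95; reporting both alongside a test statistic lets readers judge the plausibility of the assumptions and the magnitude of the effect for themselves.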

Conclusion

The problem of erroneous and misleading results due to the

improper use of statistical procedures in the field of

psychology is not a new one. As we continue to discover new

analytic techniques and modify old ones, accepted practice in

statistical analysis will continue to change. As ethical and

responsible psychologists, it is our duty to ensure the continued

quality of our research results.

The suggested guidelines should come as no surprise to

those familiar with the ethical standards of psychology.

Consultation, documentation, and practicing within the

boundaries of individual competence are guidelines which can be

applied across all of the psychologist’s varied roles. They

hold true for clinical practice, teaching, supervision, and

research, and are just as applicable to conducting a statistical

analysis.

That the misuse of statistical procedures has continued for

so long does not excuse its existence. As noted by Gergen

(1973), we are responsible for correcting our own mistakes. It

is our responsibility to do what we can to ensure that the

results of our research are neither erroneous nor misleading as

a result of poor statistical practice. Perhaps the Code of

Ethics illustrates the heart of this matter best by stating that

“it is the individual responsibility of each psychologist to

aspire to the highest possible standards of conduct” (APA, 1992,

p. 1599).

References

American Psychological Association. (1992). Ethical principles

of psychologists and code of conduct. American

Psychologist, 47, 1597-1611.

American Psychological Association. (1994). Publication manual

of the American Psychological Association. Washington, DC:

American Psychological Association.

Cohen, J. (1994). The earth is round (p < .05). American

Psychologist, 49, 997-1003.

Colombo, J. (1995). Cost, utility, and judgments of

institutional review boards. Psychological Science, 6, 318-

319.

Gergen, K. J. (1973). The codification of research ethics: Views

of a doubting Thomas. American Psychologist, 28, 907-912.

Goodyear, R. K., Crego, C. A., & Johnston, M. W. (1992). Ethical

issues in the supervision of student research: A study of

critical incidents. Professional Psychology: Research and

Practice, 23, 203-210.

Harwell, M. (1998). Misinterpreting interaction effects in

analysis of variance. Measurement and Evaluation in

Counseling and Development, 31, 125-136.

Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1985). Exploring

data tables, trends, and shapes. New York: Wiley.

Huber, P. (1981). Robust statistics. New York: Wiley.

Huberty, C. J. (1989). Problems with stepwise methods—better

alternatives. In B. Thompson (Ed.), Advances in social

science methodology (Vol. 1). Stamford, CT: JAI Press.

Kirk, R. (1996). Practical significance: A concept whose time

has come. Educational and Psychological Measurement, 56,

746-759.

Lance, T., & Vacha-Haase, T. (1998, August). The Counseling

Psychologist: Trends and usages of statistical significance

testing. Paper presented at the annual meeting of the

American Psychological Association, San Francisco.

Levin, J. R., & Marascuilo, L. A. (1972). Type IV errors and

interactions. Psychological Bulletin, 78, 368-374.

Marascuilo, L. A., & Levin, J. R. (1970). Appropriate post hoc

comparisons for interaction and nested hypotheses in

analysis of variance designs: The elimination of Type IV

errors. American Educational Research Journal, 7, 397-421.

McCabe, G. P. (1975). Computations for variable selection in

discriminant analysis. Technometrics, 17, 103-109.

McHenry, C. E. (1978). Computation of a best subset in

multivariate analysis. Applied Statistics, 27, 291-296.

Murphy, K. R. (1997). Editorial. Journal of Applied Psychology,

82, 3-5.

Parkinson, S. (1994). Scientific or ethical quality?

Psychological Science, 5, 137-138.

Rosenthal, R. (1979). The “file-drawer problem” and tolerance

for null results. Psychological Bulletin, 86, 638-641.

Rosenthal, R. (1994). Science and ethics in conducting,

analyzing, and reporting research. Psychological Science,

5, 127-134.

Rosenthal, R. (1995). Ethical issues in psychological science:

Risk, consent, and scientific quality. Psychological

Science, 6, 322-323.

Rosenthal, R., & Rosnow, R. L. (1984). Applying Hamlet’s question

to the ethical conduct of research. American Psychologist,

39, 561-563.

Rosnow, R. L., & Rosenthal, R. (1989a). Definition and

interpretation of interaction effects. Psychological

Bulletin, 105, 142-146.

Rosnow, R. L., & Rosenthal, R. (1989b). Statistical procedures

and the justification of knowledge in psychological

science. American Psychologist, 44, 1276-1284.

Rosnow, R. L., & Rosenthal, R. (1991). If you’re looking at the

cell means, you’re not looking at only the interaction

(unless all main effects are zero). Psychological Bulletin,

110, 574-576.

Rozeboom, W. W. (1960). The fallacy of the null hypothesis

significance test. Psychological Bulletin, 57, 416-428.

Steiner, I. D. (1972). The evils of research: Or what my mother

didn’t tell me about the sins of academia. American

Psychologist, 27, 766-768.

Thompson, B. (1989). Why won’t stepwise methods die?

Measurement and Evaluation in Counseling and Development,

21(4), 146-148.

Thompson, B. (1995). Stepwise regression and stepwise

discriminant analysis need not apply here: A guidelines

editorial. Educational and Psychological Measurement, 55,

525-534.

Thompson, B. (1998, April). Five methodology errors in

educational research: The pantheon of statistical

significance and other faux pas. Invited address presented

at the annual meeting of the American Educational Research

Association, San Diego.

Thompson, B., & Snyder, P. A. (1997). Statistical significance

testing practices in the Journal of Experimental Education.

Journal of Experimental Education, 66, 75-83.

Vacha-Haase, T. (1998, August). A review of APA journals’

editorial policies regarding statistical significance

testing and effect size. Paper presented at the annual

meeting of the American Psychological Association, San

Francisco.

Wilcox, R. R. (1996). Statistics for the social sciences. San

Diego, CA: Academic Press.

Wilcox, R. R. (1998). How many discoveries have been lost by

ignoring modern statistical methods? American Psychologist,

53, 300-314.

Wilkinson, L., & Task Force on Statistical Inference. (1999).

Statistical methods in psychology journals: Guidelines and

explanations. American Psychologist, 54, 594-604.