Running head: ETHICS AND STATISTICS
The Ethical Use of Statistical Analyses
in Psychological Research
James M. Graham
Texas A&M University
________________________________________________________________
Graham, J. M. (2001, March). The ethical use of statistical analyses in psychological research. Paper presented at the annual meeting of Division 17 (Counseling Psychology) of the American Psychological Association, Houston, TX.
The Ethical Use of Statistical 2
Abstract
A large body of literature in psychological research and
methodology points out common misuses of statistical procedures
in published works. Statistical procedures, when incorrectly
applied, result in false and misleading results which, in turn,
lessen the ethical justification for a research project. The
present paper attempts to provide researchers in the field of
psychology ethical guidelines for the proper use of statistical
procedures. First, the problems with the misuse of statistical
analyses in current literature are outlined. Next, the ethical
principles which apply are discussed. Finally, previously
proposed solutions and guidelines for researchers are presented.
The Ethical Use of Statistical Analyses
in Psychological Research
Rosenthal and Rosnow (1984) have identified two major
ethical concerns for those conducting research in the field of
psychology: the protection of the well-being of human
participants, and the protection of the integrity of our work.
While a large amount of writing has been done on the former of
these obligations, surprisingly little attention has been given
to the latter.
As researchers in the field of psychology we are bound by
the ethical principles set forth by the American Psychological
Association (APA) (APA, 1992). Part of our ethical obligation
requires us to take steps to ensure that human subjects are not
harmed or mistreated as a result of our research. The present
paper focuses on the additional ethical requirement that
psychologists produce research of high integrity and quality,
which is neither false nor misleading. In particular, this
concept is applied to the misuses of statistical procedures
which seem to pervade the current literature.
Scientific Quality and Ethics
Research of poor quality leads to a number of problems.
Each year, millions of dollars of time and resources are
invested in research in the field of psychology, and many
important decisions are made based on the findings of that
research. Improperly designed or conducted research does not
fulfill its primary function: to advance the knowledge of the
discipline (Parkinson, 1994). A poorly implemented research
project does not answer the questions which it sets out to
answer and, at worst, harms the field of psychology by
representing false information as scientific truth. This can be
exceptionally damaging to the field of psychology, as future
research projects are based on prior research, compounding the
problem. The “file-drawer” problem, in which replications of
studies that fail to show statistical significance are less
likely to be published than those that do, makes errors in
research unlikely to be corrected in the
literature (Rosenthal, 1979). Poor quality research, once
published, is likely to go uncorrected, much to the detriment of
the field.
Incompetently designed research also wastes time, money,
and resources which would be better spent on higher quality
research (Parkinson, 1994). While it is certainly not true that
research which does not show the expected results is of no use
to the scientific community, research which is improperly
designed or implemented is of no value, regardless of the
results.
Colombo refers to the elimination of poor research design
as “the most uncomplicated definition of scientific merit”
(1995, p. 318). As scientists in the field of psychology, we
are bound by a code of ethics, set forth by the American
Psychological Association (APA), to ensure that we produce
research of the highest possible merit. While this code of
ethics appears most applicable to those in applied settings,
researchers are expected to follow it to the same extent as
their practitioner cousins. As stated by Rosenthal (1994),
“everything else being equal, research that is of higher
scientific quality is likely to be more ethically defensible”
(p. 127).
The adage which states that a chain is only as strong as
its weakest link holds true with regard to research quality in
the field of psychology. The weak link in research methodology
is often the improper use of statistics (Thompson, 1998).
Errors in data analysis invariably lead to inaccurate results,
which serve to greatly lessen the ethical justification for
conducting a research project (Rosenthal, 1994). It is
therefore of the utmost importance that, as ethical researchers,
we take heed of the wide body of literature addressing common
misuses of statistical procedures and errors in statistical
analysis.
Misuse of Statistical Procedures
The sheer number of published articles addressing
common errors in the application of statistical methods speaks
volumes about the need for such education. The fact remains that
many of the statistical procedures in common use are not only
outdated by better, more modern procedures, but also produce
results which are erroneous and misleading. Even statistical
procedures which can produce very accurate results are often
used in manners for which they are not intended, producing the
same erroneous and misleading results. While it is not the
purpose of this paper to examine these common errors in detail,
several will be briefly outlined to underscore the need for this
paper.
Null Hypothesis Significance Testing
Perhaps the most frequently addressed relevant topic is the
psychological research community’s over-reliance on null
hypothesis statistical significance testing. In statistical
significance testing, p values are often incorrectly interpreted
as the probability that results will replicate, as a measure of
the importance of the results, or as an indicator of effect
magnitude (Cohen, 1994; Thompson, 1989). In actuality, statistical
significance testing tells us none of these things.
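The distinction between a p value and an effect magnitude can be illustrated with a minimal sketch. It is not drawn from the cited sources: the standardized difference of 0.2 and the sample sizes are hypothetical, and for simplicity a two-group z test with known unit variance stands in for the t test a researcher would normally use.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_for_two_groups(d, n_per_group):
    """z statistic for a mean difference of d standard deviations,
    assuming two equal-sized groups with known unit variance."""
    return d * math.sqrt(n_per_group / 2)


d = 0.2  # hypothetical standardized mean difference, held constant
results = {}
for n in (20, 200, 2000):  # hypothetical per-group sample sizes
    results[n] = two_sided_p_from_z(z_for_two_groups(d, n))
    print(f"n per group = {n:5d}   d = {d:.1f}   p = {results[n]:.4g}")
```

The effect size is identical in all three rows; only the sample size changes, yet the p value moves from clearly non-significant to far below .05. This is why a p value alone cannot be read as an index of magnitude or importance.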
In the case of statistical significance testing, real
change in common practice has already begun to occur. While the
APA’s (1994) initial suggestion that effect sizes be reported
whenever statistical significance is cited has had little effect
(Kirk, 1996; Lance & Vacha-Haase, 1998; Thompson & Snyder,
1997), the report of the APA Task Force on Statistical
Inference (Wilkinson & Task Force on Statistical Inference,
1999) and changes in several journals’ editorial policies
(Murphy, 1997; Vacha-Haase, 1998) bode well for the future
proper use of F ratios, Wilks’ lambda, and their ilk. While the
overuse of significance testing has been acknowledged as a
problem for at least 40 years (Rozeboom, 1960), only recently
have actions been taken to correct it (Wilkinson & Task Force on
Statistical Inference, 1999).
Ignoring Assumptions
Wilcox (1998) points out that though hundreds of articles
have stated that classical general linear model (GLM) least-
squares procedures have very low power and can be misleading
under even small departures from normality, they continue to be
misused. Wilcox further states that, “in applied work, many
non-significant results would have been significant if a more
modern method, developed after the year 1960, had been used” (p.
300).
Nearly all classical GLM procedures require that certain
assumptions be met (normality, homogeneity of variance, etc.),
yet these procedures are routinely applied even when the basic
assumptions have not been met (Wilcox, 1998). While appropriate
modern methods of dealing with the failure to meet these
assumptions are well documented (Hoaglin, Mosteller, & Tukey,
1985; Huber, 1981; Wilcox, 1996), the classical procedures
continue to be misused in the current literature.
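As one illustration of why departures from classical assumptions matter, the sketch below compares the ordinary mean with a 20% trimmed mean, one of the simpler robust estimators. The scores are hypothetical, and the trimmed mean is offered only as an example of a robust technique, not as the specific method any cited author prescribes.

```python
import statistics


def trimmed_mean(values, proportion=0.2):
    """Mean after dropping the lowest and highest `proportion` of values."""
    ordered = sorted(values)
    k = int(proportion * len(ordered))  # number trimmed from each tail
    kept = ordered[k:len(ordered) - k]
    return statistics.mean(kept)


# Hypothetical scores; the last value is a single extreme outlier.
scores = [10, 11, 12, 12, 13, 13, 14, 14, 15, 60]

print("mean:       ", statistics.mean(scores))
print("median:     ", statistics.median(scores))
print("20% trimmed:", trimmed_mean(scores))
```

A single outlying score pulls the ordinary mean well above every other observation, while the trimmed mean stays with the bulk of the data. Least-squares procedures built on means and variances inherit the same sensitivity.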
Stepwise Procedures
Another common error in statistical analysis occurs when
stepwise procedures are used in discriminant analysis and
multiple regression. A number of articles and textbooks address
the fact that stepwise procedures produce results which can be
inaccurate and are best forsaken for all-possible-subsets
procedures (Huberty, 1989; Snyder, 1991; Thompson, 1989, 1995).
Stepwise procedures are used to select a “best” subset of
variables out of a larger group, but are grossly affected by
sampling error (McCabe, 1975; Thompson, 1995), and do not even
always identify the best subset of a given size (Huberty, 1989;
Thompson, 1995). The fact remains, however, that stepwise
procedures are still in common use (Huberty, 1994), even though
more viable alternatives have been available for nearly 25 years
(McCabe, 1975; McHenry, 1978).
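The claim that stepwise procedures need not find the best subset of a given size can be checked on a small constructed data set. The sketch below is a simplification under stated assumptions: the data are hypothetical, forward selection is implemented greedily on R squared rather than with the F-to-enter tests used by statistical packages, and the least-squares fit uses a small hand-written solver so that the example needs no external libraries.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)]
           for j in range(p)]
    xty = [sum(design[i][j] * y[i] for i in range(n)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def subset_r2(predictors, names, y):
    return r_squared([predictors[name] for name in names], y)


def forward_selection(predictors, y, size):
    """Greedy forward entry: add whichever predictor raises R^2 the most."""
    chosen = []
    while len(chosen) < size:
        remaining = [name for name in predictors if name not in chosen]
        chosen.append(max(
            remaining, key=lambda name: subset_r2(predictors, chosen + [name], y)))
    return chosen


def best_subset(predictors, y, size):
    """All-possible-subsets search for the highest R^2 at a given size."""
    return max(combinations(predictors, size),
               key=lambda names: subset_r2(predictors, names, y))


# Hypothetical data: y is exactly x1 + x2, while x3 is a noisy copy of y.
predictors = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [-1, -1, -3, -5, -4, -7, -6, -8],
    "x3": [0, 1, 0, -1, 1, -1, 1, 1],
}
y = [0, 1, 0, -1, 1, -1, 1, 0]

stepwise = forward_selection(predictors, y, 2)
exhaustive = best_subset(predictors, y, 2)
print("forward selection:", stepwise, round(subset_r2(predictors, stepwise, y), 3))
print("best subset:      ", list(exhaustive), round(subset_r2(predictors, exhaustive, y), 3))
```

Because x3 is the strongest single predictor, forward selection enters it first and can never recover the pair x1 and x2, which together reproduce y exactly. The exhaustive search examines every pair and finds it.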
ANOVA Interaction Effects
Interaction effects in classical analysis of variance
(ANOVA) procedures have been referred to by Rosnow and Rosenthal
(1989b) as “probably the universally most misinterpreted
empirical results in psychology” (p. 1282). In fact, in a
review of 191 research articles using ANOVA designs in prominent
journals, Rosnow and Rosenthal (1989a) found that only 1% of
articles correctly interpreted interaction effects. Upon
finding a statistically significant omnibus interaction effect,
many researchers make the mistake of using post hoc tests to
compare pairs of cell means to determine the source of the
statistically significant interaction effect. Numerous writings
have shown that doing so interprets not the interaction effects
alone, but the interaction effects confounded with the
main effects (Levin & Marascuilo, 1972; Rosnow & Rosenthal,
1991). Alternate procedures have existed since the development
of the ANOVA and have been expounded by a multitude of authors
since (Harwell, 1998; Levin & Marascuilo, 1972; Marascuilo &
Levin, 1970; Rosnow & Rosenthal, 1989a).
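The difference between cell means and interaction effects can be made concrete with a small worked example. The sketch below removes the grand mean and the row and column effects from a table of cell means, leaving the interaction residuals. The 2 x 2 cell means are hypothetical, and equal cell sizes are assumed so that unweighted means can be used.

```python
def interaction_residuals(cell_means):
    """Cell means with the grand mean, row effects, and column effects removed.

    Assumes a two-way design with equal cell sizes (unweighted means).
    """
    rows = len(cell_means)
    cols = len(cell_means[0])
    grand = sum(sum(row) for row in cell_means) / (rows * cols)
    row_effects = [sum(row) / cols - grand for row in cell_means]
    col_effects = [
        sum(cell_means[i][j] for i in range(rows)) / rows - grand
        for j in range(cols)
    ]
    return [
        [cell_means[i][j] - grand - row_effects[i] - col_effects[j]
         for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2 x 2 design: only one cell appears to differ from the rest.
cell_means = [[10.0, 12.0],
              [10.0, 18.0]]

residuals = interaction_residuals(cell_means)
for row in residuals:
    print(row)
```

Read from the cell means, the effect seems confined to a single cell. Once the main effects are removed, the residuals form a symmetric crossover pattern of equal magnitude in every cell, which is the interaction itself. Comparing pairs of cell means therefore mixes the interaction with both main effects.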
While the above examples are by no means an exhaustive list
of common errors in data analysis, they serve to highlight the
errors in statistical analysis commonplace in psychological
research. As the majority of the articles addressing these
errors are published in journals targeted towards statisticians,
those attempting to educate their colleagues about such matters
are unfortunately often put in the position of preaching to the
choir. However, as ethical researchers, it is important for us
to seek out such knowledge or, when necessary, consult with our
more statistically savvy peers.
Applicable Ethics Codes
In order to be applied to diverse areas of psychology, the
ethical principles set forth by the APA often lack specific
directives, and can therefore be interpreted as ambiguous
(Goodyear, Crego, & Johnston, 1992). While in the current code
of ethics only one section specifically addresses research,
“most of the Ethical Standards are written broadly, in order to
apply to psychologists in varied roles” (APA, 1992, p. 1598).
These roles include not only clinical practice, supervision,
and teaching, but research as well. Several of the ethical
principles set forth by the APA can be directly applied to the
problem of the improper use of statistical procedures.
Integrity
The Code of Ethics states that, “In describing or reporting
their … research … [psychologists] do not make statements that
are false, misleading, or deceptive” (APA, 1992, p. 1599). As
previously stated, the misuse of statistical procedures can
result in inaccurate and misleading results. While the
inaccuracy of interpretations resulting from the misuse of
statistical procedures is almost certainly not intentional, the
fact remains that the assertions made may simply be false.
Improper usage of statistical procedures can turn even the most
competently designed experiment into a poor-quality research
study, undermining the ethical justification for spending time
and money on the research. In turn, this serves to damage the
integrity of the field of psychology as a whole.
Furthermore, the Code of Ethics states that, “If
psychologists discover significant errors in their published
data, they take reasonable steps to correct such errors in a
correction, retraction, erratum, or other appropriate
publication means” (APA, 1992, p. 1609). As psychologists, we
are therefore bound to, upon discovering errors in our
previously published work, make attempts to correct the errors.
As applied to the present problem, data affected by discovered
errors in statistical analysis should be re-analyzed appropriately,
with the subsequent results made available to the research community.
Competence
The Code of Ethics clearly states that psychologists must
maintain high standards of competence, recognize the limits of
their competence, maintain scientific knowledge, and make use of
current scientific resources (APA, 1992). While this principle
is most often recognized as applying to clinical practice, it is
necessary for researchers to follow this guideline as well. For
example, a researcher who has no experience with marital therapy
and who is not familiar with current research on marital therapy
would be unlikely to conduct meaningful research in that area.
In the same vein, a researcher with no knowledge of the
limitations of a specific statistical procedure would be hard-
pressed to appropriately conduct an accurate analysis using that
procedure.
Principle 1.04b further states that, “psychologists …
conduct research … involving new techniques only after first
undertaking appropriate study, training, supervision, and/or
consultation from persons who are competent in those …
techniques” (APA, 1992, p. 1600). As not all researchers are
equally competent in all analytic techniques, those using a
statistical procedure with which they are unfamiliar are
expected to consult with more knowledgeable colleagues.
Social Responsibility
Perhaps the most pertinent ethical principle is
that of social responsibility: “When undertaking research
[psychologists] strive to advance human welfare and the science
of psychology” (APA, 1992, p. 1600). At best, research based
on improper data analysis does nothing to advance the field of
psychology and wastes precious time and resources. At worst,
poor quality data analysis harms the science of psychology by
propagating false or misleading claims. In both cases, nothing
is done to benefit human welfare.
Possible Solutions
In 1972, Steiner half-jokingly suggested that every researcher
in the field of psychology be given a quota of studies
within which to demonstrate competence. If
the researcher fails to produce meaningful work within the
allotted number of studies, that researcher will no longer be
allowed to conduct research. Thankfully, such drastic methods
are not yet necessary. Other suggestions have been made, and
are presented here.
Institutional Review Boards
Rosenthal and Rosnow (1984) have suggested that a cost-
utility analysis be used to determine whether to conduct or not
conduct research. A planned project that is inexpensive in
terms of money and time and is very useful would therefore be
deemed of high quality. Conversely, an expensive and time
consuming study with little utility would be deemed unethical to
conduct. As applied to the present problem, a study with a poor
quality data analysis will produce inaccurate and misleading
results, and therefore be of no utility.
Rosenthal (1994) further suggests that institutional review
boards (IRBs) use this cost-utility approach to consider not
only the treatment of human participants, but the scientific
competence of the investigators as well. If an IRB is not able
to determine methodological competence, Rosenthal (1995)
suggests that it consult with colleagues who can.
As discussed by Colombo (1995), the expectation that IRBs
evaluate studies solely on the basis of a cost-utility analysis
is unlikely to be met. Certainly, it is unrealistic to assume
that IRBs will have the time and resources to fully investigate
the applicability of all types of statistical analyses across
all applications, nor are IRB members necessarily trained to do
so.
Peer Review
A more feasible alternative may be to conduct a formal peer
review of research prior to its conduct (Colombo, 1995). It
would certainly be useful for any researcher to consult with
several colleagues prior to conducting a research study,
particularly in the realm of statistics. Colombo points out
that this is what is currently done with students conducting
research in psychology. However, many researchers may find the
lengthy process of a formal peer review overly time-consuming
and restrictive.
Editorial Policies
Another solution to the problem of the misuse of
statistical procedures in psychology may be in the hands of
journal editors. The acceptance of articles which misuse
statistical procedures not only encourages their continued
production but also enables them to damage the field of
psychology. Journal editors would therefore do well to evaluate
the proper use of statistics in submitted articles by using
competent reviewers. Recently, the editorial policies of many
journals have begun to reflect this, specifically by requiring
that reports of statistical significance be supplemented with
effect sizes (Murphy, 1997; Vacha-Haase, 1998).
Guidelines for Researchers
As stated by a participant in a study on research ethics
conducted by Goodyear et al. (1992), “Ultimately, the authors of
articles must assume sole responsibility for all aspects of
their scholarly product and their interpretations” (p. 205).
While it may be unrealistic to expect all editorial policies to
change or for IRBs to begin assessing the statistical competence
of researchers, there are steps that the individual researcher
can take to maintain adherence to the Code of Ethics as it
applies to statistical analyses. It is in light of the present
discussion that the following guidelines are suggested.
1) Maintain an awareness of current issues in statistical
methodology through relevant journals and continuing education.
There are a variety of journals devoted to statistical
methodology, and workshops on statistical methods are offered at
nearly every major psychological conference.
2) Recognize the limits of your knowledge of current issues in
statistics and use only those procedures with which you are
competent. Use other techniques only after consulting with an
informed colleague.
3) Always consult with several colleagues about appropriate
analytic techniques while designing a study. It is not expected
that every researcher in the field of psychology be an expert
statistician. However, competent and ethical researchers should
be willing to consult with colleagues who are aware of current
issues in statistics.
4) Expect to find conflicting opinions in the literature
regarding accepted statistical practice—our knowledge of
statistics is being constantly updated. Consult with others to
determine current standards in statistics.
5) Do not use a more complicated statistical method when a
simpler one will suffice. Do not choose an analytic technique
merely because it is popular in current literature. Choose your
analytic procedure because it is appropriate given your design
and data.
6) When writing up your work, explain your reasons for
selecting your chosen statistical procedures. Reference several
current published statistical articles that support your choice.
7) Be aware of the assumptions required by your chosen analytic
technique, and be certain that they have been met. Use multiple
techniques, including graphical representations, to ensure that
your chosen analytic technique is appropriate given your data.
8) When using tests of statistical significance, always report
effect sizes.
Conclusion
The problem of erroneous and misleading results due to the
improper use of statistical procedures in the field of
psychology is not a new one. As we continue to discover new
analytic techniques and modify old ones, accepted practice in
statistical analysis will continue to change. As ethical and
responsible psychologists, it is our duty to ensure the continued
quality of our research results.
The suggested guidelines should come as no surprise to
those familiar with the ethical standards of psychology.
Consultation, documentation, and practicing within the
boundaries of individual competence are guidelines which can be
applied across all of the psychologist’s varied roles. They
stand true for clinical practice, teaching, supervision, and
research, and are just as applicable to conducting a statistical
analysis.
That the misuse of statistical procedures has continued for
so long does not excuse its existence. As noted by Gergen
(1973), we are responsible for correcting our own mistakes. It
is our responsibility to do what we can to ensure that the
results of our research are neither erroneous nor misleading as
a result of poor statistical practice. Perhaps the Code of
Ethics illustrates the heart of this matter best by stating that
“it is the individual responsibility of each psychologist to
aspire to the highest possible standards of conduct” (APA, 1992,
p. 1599).
References
American Psychological Association. (1992). Ethical principles
of psychologists and code of conduct. American
Psychologist, 47, 1597-1611.
American Psychological Association. (1994). Publication manual
of the American Psychological Association. Washington, DC:
American Psychological Association.
Cohen, J. (1994). The earth is round (p < .05). American
Psychologist, 49, 997-1003.
Colombo, J. (1995). Cost, utility, and judgments of
institutional review boards. Psychological Science, 6, 318-
319.
Gergen, K. J. (1973). The codification of research ethics: Views
of a doubting Thomas. American Psychologist, 28, 907-912.
Goodyear, R. K., Crego, C. A., & Johnston, M. W. (1992). Ethical
issues in the supervision of student research: A study of
critical incidents. Professional Psychology: Research and
Practice, 23, 203-210.
Harwell, M. (1998). Misinterpreting interaction effects in
analysis of variance. Measurement and Evaluation in
Counseling and Development, 31, 125-136.
Hoaglin, D. C., Mosteller, F., & Tukey, J. W. (1985). Exploring
data tables, trends, and shapes. New York: Wiley.
Huber, P. (1981). Robust statistics. New York: Wiley.
Huberty, C. J. (1989). Problems with stepwise methods—better
alternatives. In B. Thompson (Ed.), Advances in social
science methodology (Vol. 1). Stamford, CT: JAI Press.
Kirk, R. (1996). Practical significance: A concept whose time
has come. Educational and Psychological Measurement, 56,
746-759.
Lance, T., & Vacha-Haase, T. (1998, August). The Counseling
Psychologist: Trends and usages of statistical significance
testing. Paper presented at the annual meeting of the
American Psychological Association, San Francisco.
Levin, J. R., & Marascuilo, L. A. (1972). Type IV errors and
interactions. Psychological Bulletin, 78, 368-374.
Marascuilo, L. A., & Levin, J. R. (1970). Appropriate post hoc
comparisons for interaction and nested hypotheses in
analysis of variance designs: The elimination of Type IV
errors. American Educational Research Journal, 7, 397-421.
McCabe, G. P. (1975). Computations for variable selection in
discriminant analysis. Technometrics, 17, 103-109.
McHenry, C. E. (1978). Computation of a best subset in
multivariate analysis. Applied Statistics, 27, 291-296.
Murphy, K. R. (1997). Editorial. Journal of Applied Psychology,
82, 3-5.
Parkinson, S. (1994). Scientific or ethical quality?
Psychological Science, 5, 137-138.
Rosenthal, R. (1979). The “file-drawer problem” and tolerance
for null results. Psychological Bulletin, 86, 638-641.
Rosenthal, R. (1994). Science and ethics in conducting,
analyzing, and reporting research. Psychological Science,
5, 127-134.
Rosenthal, R. (1995). Ethical issues in psychological science:
Risk, consent, and scientific quality. Psychological
Science, 6, 322-323.
Rosenthal, R., & Rosnow, R. L. (1984). Applying Hamlet’s question
to the ethical conduct of research. American Psychologist,
39, 561-563.
Rosnow, R. L., & Rosenthal, R. (1989a). Definition and
interpretation of interaction effects. Psychological
Bulletin, 105, 142-146.
Rosnow, R. L., & Rosenthal, R. (1989b). Statistical procedures
and the justification of knowledge in psychological
science. American Psychologist, 44, 1276-1284.
Rosnow, R. L., & Rosenthal, R. (1991). If you’re looking at the
cell means, you’re not looking at only the interaction
(unless all main effects are zero). Psychological Bulletin,
110, 574-576.
Rozeboom, W. W. (1960). The fallacy of the null hypothesis
significance test. Psychological Bulletin, 57, 416-428.
Steiner, I. D. (1972). The evils of research: Or what my mother
didn’t tell me about the sins of academia. American
Psychologist, 27, 766-768.
Thompson, B. (1989). Why won’t stepwise methods die?
Measurement and Evaluation in Counseling and Development,
21(4), 146-148.
Thompson, B. (1995). Stepwise regression and stepwise
discriminant analysis need not apply here: A guidelines
editorial. Educational and Psychological Measurement, 55,
525-534.
Thompson, B. (1998, April). Five methodology errors in
educational research: The pantheon of statistical
significance and other faux pas. Invited address presented
at the annual meeting of the American Educational Research
Association, San Diego.
Thompson, B., & Snyder, P. A. (1997). Statistical significance
testing practices in the Journal of Experimental Education.
Journal of Experimental Education, 66, 75-83.
Vacha-Haase, T. (1998, August). A review of APA journals’
editorial policies regarding statistical significance
testing and effect size. Paper presented at the annual
meeting of the American Psychological Association, San
Francisco.
Wilcox, R. R. (1996). Statistics for the social sciences. San
Diego, CA: Academic Press.
Wilcox, R. R. (1998). How many discoveries have been lost by
ignoring modern statistical methods? American Psychologist,
53, 300-314.
Wilkinson, L., & Task Force on Statistical Inference. (1999).
Statistical methods in psychology journals: Guidelines and
explanations. American Psychologist, 54, 594-604.