Replication in psychology: necessity or overblown crisis?
Psychology Department, Faculty of Philosophy, University of Belgrade
Iris Žeželj
Replication: an engine of scientific advancement?
Reproducibility: the degree of consistency in results when scientific studies are repeated
"Demarcation criterion between science and non-science" (Braude, 1979)
Replication workshop, October 2015
How it should be…
Important scientific findings are independently replicated, and evidence of their robustness and universality accumulates.
If a finding is theoretically grounded and comes from a soundly designed study with enough statistical power, it will see the light of day, regardless of whether it is positive or negative.
Science is self-correcting: only replicable findings pass the test, and their epistemological status becomes sounder.
Replication?
Replication not only confirms scientific findings, but also:
• Specifies the conditions under which the effect is registered.
• Helps estimate the strength of the effect more accurately.
(Brandt et al., 2013)
However…
Analysis of ALL articles in the top 10 psychology journals since 1900:
• 1.6% use the term "replication"
Analysis of 500 randomly chosen articles from that 1.6%:
• 68% of the articles using the term are actually designed to replicate
How big of a problem?
In an attempt to develop treatments for different types of tumors, 53 landmark studies published in biomedical journals were replicated over the course of 10 years:
Only 6 (11%) were successfully replicated.
"Some non-reproducible preclinical papers had spawned an entire field, with hundreds of secondary publications that expanded on elements of the original observation but did not actually seek to confirm or falsify its fundamental basis." (Begley & Ellis, 2012, p. 532)
Reactions from the pharmaceutical and biotech industry:
"The situation is intolerable. Why aren't we progressing faster in discovering effective treatments? One option is that the academic community is not supplying accurate findings." (CNBC, 2012)
Collaborative replication effort: what is a successful replication?
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Collaborative replication effort: predictors of success
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Why is it so?
The structure of incentives in science
"Questionable research practices (QRP)"
Incentives
Funding and publishing principles reward innovative findings, but not tests of the robustness of existing findings.
Daryl Bem: article on ESP (a series of underpowered experiments), published in JPSP
An independent replication was unsuccessful: "We NEVER publish direct replications" (JPSP editorial board)
Blog documenting the troubles with publishing the non-replication:
http://chronicle.com/blogs/percolator/wait-maybe-you-cant-feel-the-future/27984
The replication was finally published in a journal with a different editorial policy
Is this a shared belief?
"Negative results are as fundamental for scientific progress as attractive, counterintuitive positive results – regardless of a person's Popperian passion for falsification." (Kahneman, 2012)
Kahneman D. (2012). Open Letter linked to Nature
Fascination with statistical significance
Analysis of 165 articles in 4 major APA journals:
• 94% report statistical significance
• Of those, 96% reject the null hypothesis
The pattern is present in other sciences, but appears less pronounced. In medical journals:
• 70% report statistical significance
• Of those, 84% reject the null hypothesis
Where do these patterns originate from?
Three types of bias:
Editorial: of 79 editors of high-impact journals, 94% claim they do not encourage replications (Madden, 1995)
Reviewer: 60% of reviewers favour novel findings over replications, calling replications a "waste of journal space" (Neuliep & Crandall, 1993)
Author: the probability of submitting a positive finding is 8 times higher than that of submitting a negative one (Greenwald, 1975)
How to interpret null findings ("unsuccessful replications")
a. Type 2 error: a genuine effect fails to replicate by chance
b. There is no original effect
c. The real strength of the effect is lower than claimed in the original study
d. The design or analysis of either the original study or the replication is methodologically flawed
Questionable research practices (QRP)
Anonymous survey of 6,000 APA members:
• 74% do not report all DVs, only the ones that produce significant effects
• 71% stop collecting data once statistical significance is reached
• 54% report unexpected results as if they were expected (so-called HARKing: Hypothesizing After the Results are Known)
• 50% dismiss negative findings as pilot studies or declare them methodologically flawed, while positive findings are accepted without scrutiny
• 1.7% admit to fabricating data
Questionable research practices (QRP) revisited
Anonymous survey of 1,138 members of the German Psychological Association (Schwartz & Fiedler, in press):
QRPs not recognized as such
Standard practices in psychological research not recognized as QRPs (Ioannidis, 2005):
• A series of "small" experiments low in statistical power → an illusion of the robustness of the effect
• Continuing data collection beyond the planned sample size until statistical significance is reached → rationalized as increasing power
All QRPs are more common in experimental than in correlational studies.
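The "continue collecting until significant" practice listed above can be illustrated with a short simulation (an editor's sketch, not part of the original slides; the function name and parameters are ours): a z-test with a true null hypothesis is re-run after every added observation, and the "study" stops as soon as it reaches significance.

```python
import random
from math import sqrt

def optional_stopping_fpr(n_sims=1000, n_min=10, n_max=100, z_crit=1.96, seed=1):
    """False-positive rate under a TRUE null when a z-test (known sigma = 1)
    is re-run after every added observation and the study stops as soon as
    |z| exceeds the nominal critical value. Illustrative sketch only."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        running = 0.0
        for n in range(1, n_max + 1):
            running += rng.gauss(0.0, 1.0)      # null is true: mean 0, sigma 1
            # z = sample_mean / (sigma / sqrt(n)) = running_sum / sqrt(n)
            if n >= n_min and abs(running / sqrt(n)) > z_crit:
                hits += 1                        # "significant" by luck
                break
    return hits / n_sims
```

A fixed-n test at the same critical value holds its Type 1 error near the nominal 5%; with repeated looks between n = 10 and n = 100, the simulated false-positive rate climbs to roughly three to four times that, which is exactly why optional stopping is a QRP rather than a harmless way of "increasing power".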
Consequences
"The prevalence of QRPs raises questions about the credibility of research findings and research integrity by producing unrealistically elegant results that may be difficult to match without engaging in such practices oneself. This could lead to a race to the bottom, with questionable research begetting even more questionable research – like performance enhancers in science!" (John, Loewenstein & Prelec, 2012, p. 8)
What is good for the scientist is not good for science?
Replications: counterarguments
The fear of Type 1 error (false positives) is unfounded if we stick to a conservative level for rejecting the null hypothesis (.05 or .01).
But: "A simple count of the percentage of significant results in journals would suggest that psychological studies have over 90% statistical power to reject the null hypothesis. However, power estimates based on sample sizes and effect sizes suggest that power is at best 60%." (Gigerenzer & Sedlmeier, 1995)
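The "power estimated from sample sizes and effect sizes" in the quote above can be sketched with the standard normal-approximation power formula for a two-sided two-sample test (a generic textbook calculation, not code from the talk; the helper name is ours):

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample t-test for a
    standardized effect size d, using the normal approximation.
    Illustrative helper, not part of the original slides."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n_per_group / 2)   # noncentrality of the test statistic
    # Probability the statistic exceeds the upper critical value;
    # the opposite tail is negligible for positive d and is ignored.
    return 1 - nd.cdf(z_crit - ncp)
```

With d = 0.5 and 64 participants per group, this gives roughly .80, the textbook benchmark; with the smaller samples and effects typical of the surveyed literature, the same formula lands well below the 90%+ implied by journals' success rates, which is the discrepancy the quote points at.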
Replications: counterarguments
(Pashler & Harris, 2012, p. 532)
Collaborative replication effort: journal ranking by R-index
R-index: the discrepancy between the power implied by reported significant results and the power calculated from sample sizes and effect sizes
https://replicationindex.wordpress.com/2015/08/13/replicability-ranking-of-26-psychology-journals/
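A minimal sketch of that discrepancy calculation, following the formula described on the replicationindex blog (median estimated power minus the inflation of the reported success rate over it); the function name and inputs are illustrative, not the site's actual code:

```python
from statistics import median

def r_index(observed_powers, significant_flags):
    """R-index sketch: median per-study power estimate (from sample
    sizes and effect sizes) minus the inflation of the reported
    success rate over that power. Illustrative helper only."""
    med_power = median(observed_powers)
    success_rate = sum(significant_flags) / len(significant_flags)
    inflation = success_rate - med_power
    return med_power - inflation   # = 2 * med_power - success_rate
```

When every study in a journal is reported as significant but the studies only have 60% estimated power, the index drops to 0.2, flagging an implausibly perfect track record; when the success rate matches the estimated power, the index simply equals that power.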
Although direct replications are rare, conceptual replications are often conducted and published, testing not only the validity but also the universality of the effect.
But:
What does an unsuccessful conceptual replication tell us? → Little or nothing about the original effect.
What does a successful conceptual replication tell us? → The original effect is reproducible and generalizes to different, though similar, conditions.
Replications: counterarguments
Null results can be a consequence of an error in designing or conducting the research and therefore have no meaningful value.
But: errors are possible in both directions (failing to discover an existing phenomenon and "discovering" a non-existing one), so there is no reason to presume asymmetry.
Replications: counterarguments
You can never claim you conducted the research identically to the original researcher (the so-called "quality of the chef" argument).
Is this a valid argument? How detailed should our method sections be?
Replications: counterarguments
Science rests on an asymmetry between positive and negative claims: negative findings do not deserve the same treatment as positive ones. ONE positive finding (a black swan) carries more weight than any number of negative ones.
But: are scientific claims of this type? They are usually probabilistic claims, which means the asymmetry is reversed: one should show not that the rule applies in one context for one respondent, but that it applies across a number of contexts and respondents.
Replications: counterarguments
Publishing negative findings discourages scientists from researching non-robust, subtle effects, which often constitute fundamental scientific knowledge.
One could agree or disagree.
Replications: counterarguments
Publishing negative results damages the reputation of the original authors.
How much weight should this argument be given?
Replications: counterarguments
Replication etiquette (Kahneman, 2014)
• Contact the original authors
• Ask for materials
• Ask for unpublished details about the procedure
• Share the findings with the original authors
• With their permission, publish
How to proceed: a change of incentives needed?
Change in design
• Always directly replicate the original effect in an independent experiment
• Larger samples, more statistical power
• Design preregistration
How to proceed: a change of incentives needed?
Change in reporting
• Abandon the binary approach: report effect sizes and confidence intervals, not only p levels
• Bayesian statistics
• More meta-analysis (even small-scale meta-analysis)
• Banning inferential statistics altogether (BASP; Trafimow & Marks, 2015)
• Journals more supportive of replication: Psych Science, Perspectives on Psych Science, Social Psychology, JJDM, JRP
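The first recommendation above, reporting effect sizes with confidence intervals instead of bare p values, can look like this in practice: Cohen's d for two independent groups with an approximate normal-theory CI. This is an illustrative sketch (the helper name is ours, and the standard error is the common large-sample approximation), not code from the talk:

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def cohens_d_with_ci(group1, group2, conf=0.95):
    """Cohen's d for two independent groups with an approximate
    normal-theory confidence interval. Illustrative sketch only."""
    n1, n2 = len(group1), len(group2)
    # pooled standard deviation across the two groups
    s_pooled = sqrt(((n1 - 1) * stdev(group1) ** 2 +
                     (n2 - 1) * stdev(group2) ** 2) / (n1 + n2 - 2))
    d = (mean(group1) - mean(group2)) / s_pooled
    # common large-sample approximation to the standard error of d
    se = sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return d, (d - z * se, d + z * se)
```

A wide interval around a "significant" d makes the uncertainty visible in a way a lone p < .05 never does, which is precisely the point of abandoning the binary approach.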
Sharing databases, accumulating data
Open Science Framework: https://osf.io/
www.figshare.com
http://psychfiledrawer.org/
How to proceed: a change of incentives needed?
No novelty requirement?
• PLOS
Post-publication peer review?
• Trial period in medical science: PubMed
How to proceed: a change of incentives needed?
Our experience: preregistered replications
Call for a special issue of Social Psychology: Replications
Introduction and Method sections sent out for review
Method
Participants: sampling plan (sample size calculated from the strength of the effect in the initial study; recommended statistical power for the replication: .95); sample characteristics, including known differences from the original study
Materials: in the ideal scenario, materials are shared by the author of the original study. If not, explain why not and how equivalence was ensured.
Our experience: preregistered replications
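The sampling plan above (sample size derived from the original effect size at a recommended power of .95) can be turned into a concrete number with the usual normal-approximation sample-size formula for a two-sided two-sample test. A sketch with illustrative names, not the journal's actual procedure:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_group_for_replication(d_original, power=0.95, alpha=0.05):
    """Per-group sample size for a two-sided two-sample test to reach
    the target power at the original study's effect size, using the
    normal approximation. Illustrative helper only."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = nd.inv_cdf(power)           # quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / d_original) ** 2)
```

For an original d of 0.5 and the recommended power of .95, this asks for roughly 104 participants per group, which is why preregistered replications generally need considerably larger samples than the studies they replicate.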
Method
Procedure: a very detailed description of the setting, the experimenters (presence; blind to the hypothesis or not), the original instructions, etc.
Data analysis plan: specification of the analysis that will test the original effect. If integration of databases is planned, specification of the meta-analysis procedure.
Our experience: preregistered replications
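Where the plan calls for specifying a meta-analysis procedure over integrated databases, the simplest option is inverse-variance fixed-effect pooling. A minimal illustrative sketch (the function name and inputs are ours; real plans would typically also consider random-effects models):

```python
from math import sqrt

def fixed_effect_meta(effects, variances):
    """Inverse-variance fixed-effect meta-analysis: pools per-study
    effect sizes into one weighted estimate with its standard error.
    Each study is weighted by the inverse of its sampling variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    return pooled, se
```

Because the pooled standard error shrinks as studies accumulate, even a small-scale meta-analysis over a replication and its original gives a more stable effect estimate than either study alone, which is the rationale behind the "integrate the databases" step.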
Further steps
• 2 reviews
• Revising the design
• Conducting the replication
• Final manuscript reviewed by two editors
• Revising the manuscript
Our experience: preregistered replications
Collaborating with the original authors
• Final manuscript sent to the original authors
• The original authors write a commentary
• The commentary is reviewed
• The replication team responds to the comments
• The rejoinder is reviewed
• The journal publishes the replication, commentary, and rejoinder
Our experience: preregistered replications
... And different experiences
Take home messages
• Direct replication = the first step in cross-cultural collaboration
• Sample size planned based on the effect size
• Design preregistered, if possible
• Results integrated into joint databases
• Meta-analysis (even small-scale) on the data
Motivated reasoning?
Scientists = human beings
Link to a folder with the material:
https://www.dropbox.com/sh/mzk0tvsaxnkp8c7/AADuLq6vsHtHspjuNdmsMSg7a?dl=0
Thank you for your attention!
[email protected]