A UNESCO Child and Family Research Centre Working Paper
Evaluation Study Design – A Pluralist Approach to Evidence
Dr Allyn Fives
Dr John Canavan
Professor Pat Dolan
UNESCO Child and Family Research Centre, School of Political Science and Sociology,
National University of Ireland, Galway.
Paper presented to the Board of Eurochild, May 20th, 2014, Brussels.
How to cite this paper:
Fives, A., Canavan, J., & Dolan, P. (2014). Evaluation Study Design – A Pluralist Approach to Evidence.
A UNESCO Child and Family Research Centre Working Paper. Galway: National University of Ireland,
Galway.
Evaluation Study Design – A Pluralist Approach to Evidence
ABSTRACT
There is significant controversy over what counts as evidence in the evaluation of social interventions. It is
increasingly common to use methodological criteria to rank evidence types in a hierarchy, with Randomized
Controlled Trials (RCTs) at or near the highest level. Because of numerous challenges to a hierarchical
approach, the current paper offers a Matrix (or typology) of evaluation evidence, which is justified by the
following two lines of argument. First, starting from the principle of methodological aptness, it is argued that
different types of research question are best answered by different types of study. The current paper adopts a
pluralist interpretation of methodological aptness that stands opposed to the polarized debate between those
for whom RCTs are the “gold standard” and those who, in contrast, contend that RCTs are often inappropriate
in social settings. The two opposing paradigms are based on a series of ethical and methodological arguments
concerning RCTs that this paper attempts to refute. The second line of argument is that evaluations often
require both experimental and non-experimental research in tandem. In a pluralist approach, non-causal
evidence is seen as a vital component in order to evaluate interventions in mixed methods studies (as part of
evidence-based practice) and also is important for good practice itself (as part of practice-based evidence). The
paper concludes by providing a detailed description of what an Evaluation Evidence Matrix can and cannot do.
1. Introduction
Unlike the situation twenty or even ten years ago, policy makers and practitioners
now have access to a range of interventions that are underpinned by strong research
evidence; evidence adequate to support a decision to implement them. Most commonly
identified as “evidence-based practice” (EBP) the existence of these interventions has led to
the hope that the policy making process, and specifically decision-making on policy options,
will be transformed into one that is justified by empirical evidence. Supporting the
development of EBPs, organizations such as the Social Care Institute for Excellence (SCIE)
and the Society for Prevention Research (SPR) seek to disseminate information about “what
works” in areas such as health, education, welfare, social care, etc. (SPR, 2004; Nutley,
Powell, & Davies, 2013). However, the development of evidence-based interventions has
elicited a critical response that is epistemologically and methodologically based –
concerning our understandings of the nature of knowledge and how it is established; and
what type of evidence is required to underpin policy and practice and how this can be
obtained.
One approach common in social and health studies is to categorize evidence types in
a hierarchy, with the highest level of confidence in the value of an intervention usually
provided by randomized controlled trials (RCTs) (GRADE Working Group, 2004; NICE, 2004;
Veerman & Van Yperen, 2007; IES, 2008; Puttick & Ludlow, 2012). However, others believe
that experimental studies are in many if not all cases inappropriate when evaluating social
interventions and that, for ethical and methodological reasons, non-experimental studies
should be considered the most appropriate source of evidence (Morrison, 2001;
Hammersley, 2008; Stewart-Brown, Anthony, Wilson, Winstanley, Stallard, Snooks, &
Simkiss, 2011). In the context of this polarized debate on evidence, the current paper arose
in response to a request from Eurochild to assist them in developing a position on evidence
for evaluation. An Evaluation Evidence Matrix was offered as an orientating guide for those
like Eurochild responsible for decision making on evaluation approaches and / or program
adoption. The Matrix represents a pluralistic approach to evidence and builds on the efforts
of others to dispense with a hierarchy and instead to categorize evidence types in a typology
(Muir Gray, 1996; Petticrew & Robert, 2003).
The position put forward in this paper is based on two lines of argument. First,
starting from the principle of methodological aptness, it is argued that different types of
research question are best answered by different types of study. The current paper adopts a
pluralist interpretation of methodological aptness that stands opposed to both positions in
the polarized debate on RCTs. This paper will argue that both positions are based on a series
of arguments that we will attempt to dispel. The second line of argument is that often both
experimental and non-experimental research is required in tandem. This paper illustrates
the value to evaluation work of a pluralist approach to evidence, by drawing attention in
particular to the questions of “why” and “how” an intervention works as well as highlighting
the benefits of practice-based evidence (PBE) to practice itself.
2. Evidence: From Hierarchies to a Matrix
2.1. Evidence Hierarchies
Finding a standardized and manageable approach to the role of evidence in policy
making has been seen as a key challenge for the policy and research community. One
method is to sort evidence types into a hierarchical classification. In such a hierarchy,
typically, the best evidence is provided by an RCT. In an RCT, because treatments are
randomly allocated to participants, confounding explanatory variables can be ruled out, and
for that reason an RCT evaluation can provide causal evidence for the effectiveness of an
intervention.
Level of evidence Type of evidence
1++ High quality meta-analyses, systematic reviews of RCTs, or RCTs with a very low risk of bias
1+ Well-conducted meta-analyses, systematic reviews of RCTs, or RCTs with a low risk of bias
1- Meta-analyses, systematic reviews of RCTs, or RCTs with a high risk of bias*
2++ High-quality systematic reviews of case-control or cohort studies; high-quality case-control or cohort studies with a very low risk of confounding, bias, or chance and a high probability that the relationship is causal
2+ Well-conducted case-control or cohort studies with a low risk of confounding, bias, or chance and a moderate probability that the relationship is causal
2- Case-control or cohort studies with a high risk of confounding, bias, or chance and a significant risk that the relationship is not causal*
3 Non-analytic studies (e.g. case reports, case series)
4 Expert opinion, formal consensus
Figure 1 – NICE Levels of Evidence. Source: Adapted from NICE (2005). The overall assessment of each study is graded using a code ‘++’, ‘+’ or ‘–’, based on the extent to which the potential biases have been minimized. *Studies with a level of evidence ‘–’ should not be used as a basis for making a recommendation.
According to the National Institute for Clinical Excellence (NICE) (2004) (Figure 1),
and also the Grades of Recommendation Assessment, Development and Evaluation (GRADE)
system (GRADE Working Group, 2004), the highest form of evidence is provided by an RCT
with a low risk of bias, or high quality meta-analytic studies or systematic reviews of RCTs.
According to the What Works Clearinghouse guide to evidence (IES, 2008), only RCTs with
low levels of attrition provide “strong evidence” (or “meets evidence standards”). Both RCTs
with high levels of attrition and quasi-experimental studies (QESs) where there is
“equivalence” at baseline can provide only “weaker evidence.” At the bottom of the
hierarchy, “insufficient evidence” is provided by a QES where there is no “equivalence” at
baseline. Unlike an RCT, in a QES treatments are not randomly allocated to participants,
which has implications for any attempt to rule out confounding explanatory variables.1
The quality of study design implementation also matters. The NICE hierarchy makes
clear that an RCT study may have a high risk of bias (coded ‘1-’ in Figure 1) and when this is
the case the findings should not be used as a basis for making a recommendation. Bias may
result in mistakenly attributing causal efficacy to an intervention, and may arise (inter alia)
when participants move from one arm of the study to another (i.e., they choose the
treatment they were not assigned to) or due to study attrition affecting one arm of the
study more than the other (i.e., participants leaving the study).
Other evidence hierarchies make more explicit their application to policy and
practice. According to the National Endowment for Science Technology and the Arts’
(NESTA) Standards of Evidence for Impact Investing, while an RCT evaluation can
demonstrate causality, and independent replications can confirm causality, a higher level of
confidence in the impact of an intervention comes with the development of “manuals,
systems, and procedures to ensure consistent replication and [a] positive impact” (Puttick &
Ludlow, 2012, p. 2). In addition, the evidence hierarchy provided by Veerman and Van
Yperen (2007) associates different levels of evidence with different stages in the
development of an intervention (see also Urban, Hargreaves, & Trochim, 2014). While non-
experimental studies may be appropriate when an intervention is at an early stage in its
development, the highest type of evidence is provided by an experiment and it can be
sought only for interventions at their most developed. Veerman and Van Yperen (2007) also
note that causal evidence can be provided by either an RCT or a repeated case study. The
latter is a single subject study design where the participant is exposed to different
treatments or different doses of a treatment over time and data are collected at multiple
time points.
1 A within-groups quasi-experimental study collects data from one group, usually before and after exposure to
the program being evaluated. A between-groups quasi-experimental study collects data from two groups, only one of which received the program, but the treatments are assigned to participants by a non-random process.
2.2. Evidence Matrix
An alternative approach is to categorize evidence in a matrix. The Evaluation
Evidence Matrix (see Figure 2) builds on the work of Muir Gray (1996) and Petticrew and
Roberts (2003). A matrix draws attention to the wide variety of research questions that can
be addressed in evaluation but also to the varied study designs appropriate to answering
them. Figure 2 includes research questions posed in exploratory, implementation, and
outcomes studies, and also a variety of experimental and non-experimental study designs.
As we saw, experimental studies can be used to analyze the causal effects of an
intervention. However, as the Matrix shows, many of the same questions asked about the
effectiveness of an intervention can be addressed by a range of study designs. For instance,
a cohort study, which is a non-experimental longitudinal study involving the observation of
participants over a number of time points, can be used to answer the question “Does X do
more good than harm?” Also, a QES may be more appropriate than an RCT in the first stage
of an evaluation to answer the question “Does X work?”; and if the QES evaluation shows
significant gains by participants then there may be a strong rationale to proceed to a RCT
evaluation of efficacy (i.e., under optimal conditions) and then of effectiveness (i.e., under
real-world conditions) (SPR, 2004).
In addition, RCTs are not the only study design capable of generating understanding
of causality (AEA, 2003). An RCT with a very low risk of bias is thought to provide strong
evidence for causality because randomization rules out confounding explanations:
“alternative explanations … are randomly distributed over the experimental conditions”
(Shadish, Cook, & Campbell, 2002, p. 105). Because of the lack of randomization in QESs,
they are thought to provide indicative rather than causal evidence (Veerman & Van Yperen,
2007). Nonetheless, there are various ways to show that alternative explanations are
implausible in a QES, and therefore for a QES to provide evidence of causality, including
observations at multiple pretest time points and multiple follow-up time points (Shadish et
al., 2002).
Research Questions | Study Design (Evidence Type), columns left to right: Meta-analyses of RCTs; RCTs; Repeated case studies (n = 1); QESs; Case-control studies; Cohort studies; Theory of change studies; Norm referenced studies; Benchmark studies; Documentary studies; Qualitative studies; Surveys; Expert knowledge studies; Reflective practice
Exploratory Studies
Are needs being met by existing services? X X X X X X X X X X
Are practitioners satisfied with services? X X X
Are users satisfied with services? X X
Is the current service cost-effective? X X X
What alternative would be appropriate? X X X
Does the workforce have necessary skills to implement alternative service? X X X X X
Are financial & other resources available to implement alternative service? X X X X X
Will users take up alternative service? X X X X X
How does proposed alternative fit with policy / service priorities? X X X X X
Implementation and Outcomes Studies
Does X work more than Y? X X X X
Does X do more good than harm? X X X X X X X X
Does a proven program work here? X X X X X X
How does X work? X X X X X X X X X X
Is it meeting its target group? X X X X X X X
Are users satisfied & will they take up? X X
Are practitioners satisfied? X X X
Was it implemented with fidelity? X X X X
Is it cost-effective? X X
Is it socially valuable? X X X
Figure 2. An Evaluation Evidence Matrix: A pluralist guide to appropriate study designs. Note: RCT = Randomized Controlled Trial; QES = Quasi Experimental Study
Other non-causal quantitative study designs are available as well, and they can be
matched to various research questions. A theory of change study, a norm referenced study,
or a benchmark study may be appropriate to answer many of the questions also addressed
by experimental and quasi-experimental studies, as the Matrix shows. In addition, because
these study designs do not deny a treatment to participants, they may be more appropriate
than an RCT to answer the question “Does a proven program work here?” We return to this
issue below in more detail.
Finally, the range of possible research questions is broader than those concerned
with effectiveness and outcomes. Some implementation questions (e.g. “Was it
implemented with fidelity?”) and exploratory questions (e.g. “Is the current service cost
effective?”) are better answered through documentary analysis, qualitative studies, and
survey-based approaches. The Matrix also indicates that an RCT can be employed to answer
the “How does X work?” question. Various approaches to statistical analysis (including
structural equation modeling) can be used to test the causal model that underlies the
intervention, for example, that the impact of a reading program on children’s reading will be
mediated through its impact on children’s perceived self-efficacy as readers and/or
frequency of reading (Fives et al., 2015). Therefore, even when an RCT returns findings
showing that the intervention caused the observed outcomes, the underlying causal model
can and should be tested in addition.
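The logic of testing a mediated causal model can be sketched in code. The following is a minimal illustration in Python with simulated data; the sample size, coefficients, and variable names are invented for this example and are not drawn from any actual study, and a simple product-of-coefficients estimate stands in for a full structural equation model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated data: treatment T raises the mediator M (e.g. self-efficacy),
# which in turn raises the outcome Y (e.g. reading score). The coefficients
# 0.5, 0.7, and 0.2 are invented purely for illustration.
T = rng.integers(0, 2, n).astype(float)        # random assignment
M = 0.5 * T + rng.normal(0, 1, n)              # mediator
Y = 0.7 * M + 0.2 * T + rng.normal(0, 1, n)    # outcome

def ols(X, y):
    """Least-squares coefficients, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(T, M)[1]                                 # path T -> M
_, b, c_prime = ols(np.column_stack([M, T]), Y)  # paths M -> Y and T -> Y (direct)
indirect = a * b                                 # mediated (indirect) effect
total = ols(T, Y)[1]                             # total effect of T on Y
print(f"indirect = {indirect:.2f}, direct = {c_prime:.2f}, total = {total:.2f}")
```

For linear models fit by least squares the decomposition is exact: the total effect equals the direct effect plus the product of the two mediating paths, so the estimated paths can be compared against the pattern the program theory predicts.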
2.3. Arguments for an Evaluation Evidence Matrix
Although we have drawn attention to the wide variety of research questions that can
be addressed in evaluation and the varied study designs appropriate to answering those
questions, others may still be unwavering in their support of an evidence hierarchy and
continue to insist that only a small minority of study designs provide “strong evidence.” The current paper
adopts two lines of argument in justifying an evidence matrix as an alternative to an
evidence hierarchy.
The first is that often both experimental and non-experimental research is required
in tandem, and this will be addressed below in section 4.
The second line of argument is that a hierarchy of evidence pays insufficient
attention to methodological aptness. If “different types of research questions are best
answered by different types of study” then in certain circumstances the “hierarchy may
even be inverted,” for example, if a qualitative study is more appropriate to answer the
research question being posed than an RCT (Petticrew & Roberts, 2003, p. 528). Although
this paper will support the principle of methodological aptness, it is still a matter of great
importance how that principle is to be interpreted. This is the case as the principle can be
interpreted in such a way as to limit the plurality of evidence types, by supporting either
side in the highly polarized debate on RCTs.
On one side of the debate are those who believe RCTs are the gold standard in
evaluation (Axford, Morpeth, Little, & Berry, 2008; Boruch et al., 2009; Oakley et al., 2003;
Ritter & Holley, 2008; Shadish et al., 2002). Although they can accept that different study
designs are appropriate for answering different research questions, they can go on to argue
that, for ethical and methodological reasons, in fact the RCT is the gold standard in
evaluation. On the other side is a grouping of those who for different reasons conclude that,
in most if not all cases, RCTs are not appropriate in evaluating social interventions (Bonell,
Hargreaves, Cousens, Ross, Hayes, Petticrew, & Kirkwood, 2011; Cheetham, Fuller, McIvor,
& Petch, 1992; Ghate, 2001; Green & Kreuter, 1991; Morrison, 2001; Pawson & Tilley,
1997; Speller, Learmonth, & Harrison, 1997; Stewart-Brown et al., 2011). Although they
acknowledge that RCTs can generate evidence of causality in appropriate contexts, they also
believe that, unlike the controlled environment of bio-medical studies, very often the fluid
and dynamic environment of social interventions is not appropriate for RCTs.
As the principle of methodological aptness can be understood in conflicting ways, it
is necessary to address directly this polarized debate on RCTs. In the next section we argue
that both positions are based on arguments concerning the ethical and methodological
qualities of RCTs that should be rejected. This will not be an exhaustive review of all issues
pertinent to decisions on appropriate study designs and evidence types, but rather will
focus on a number of contentious and influential arguments. The debate will be illustrated
with reflections on the RCT evaluation of the Wizards of Words (WoW) reading program
(Fives et al., 2013; Fives et al., 2014).
3. Against the Two Extremes in a Polarized Debate on RCTs
In an RCT, treatments are assigned randomly and treatments are withheld from
participants (Altman, 1991; Boruch, Weisburd, Turner, Karpyn, & Littell, 2009; Jadad, 1998).
Participants do not all receive the same treatments and treatments are not assigned on the
basis of criteria used in everyday professional practice, in particular the professional’s
judgment of the recipient’s needs. The seminal text on clinical trials states that an RCT is
justified only if (among other things) (1) it is a “crucial experiment” to determine which
intervention is superior and shows “scientific promise” of achieving this result; and (2) there
must be a state of “equipoise,” as the members of the relevant expert community must be
“equally uncertain about, and equally comfortable with, the known advantages and
disadvantages of the treatments” (Beauchamp & Childress, 2009, p. 323, p. 320). However,
the opposing arguments in the current debate on RCTs show that there is deep
disagreement on whether these requirements can, or should, be met.
3.1. The position that RCTs are the ‘gold standard’ in social evaluation
3.1.1. Argument 1: When making ethical judgments of RCTs therapeutic duties of care are
irrelevant
The first argument concerns the fact that in RCTs treatments are withheld. In the
field of clinical trials it is argued by some that, if physicians are to act from a duty of care, it
is never permissible to withhold what is believed to be a beneficial treatment from their
patients. Therefore, it is only ever permissible to recruit subjects in a clinical trial, where
treatments will be withheld, when the physician-scientists are confident that no patients
will be denied a beneficial treatment (Freedman, 1987; Miller & Weijer, 2006). There must
be what is called a state of “equipoise.” This explains why it is ethically objectionable to run
an RCT to “confirm” the effectiveness of a program. When a program has already been
shown to be effective through an RCT, all things being equal, it is inappropriate to run an
RCT again for this program (Beauchamp & Childress, 2009), although it may be necessary to
reconsider this question if and insofar as the program is being implemented among
significantly different population groups.
However, an opposing line of argument has developed which states that therapeutic
duties of care are irrelevant to the ethical justification of RCTs and for that reason equipoise
is unnecessary for an ethically justified RCT. Some proponents of this view have claimed that
RCTs in clinical studies should be evaluated “entirely … on a scientific orientation … without
making any appeals to the therapeutic norms governing the patient-physician relationship”
(Joffe & Miller, 2008, p. 32). Because the overriding objective in an RCT is scientific rather
than therapeutic, the ethical justification of RCTs requires only non-exploitation of
participants and that the evaluation is a crucial experiment and shows scientific promise
(Miller & Brody, 2007; Veatch, 2007).
If we accept this position, the source of possible moral objections to any one RCT
study is reduced considerably. However, even though the RCT design departs from
therapeutic criteria in the way that it allocates treatments, and even though study subjects
should be made fully aware that participation does not guarantee receipt of any one
treatment (to avoid the therapeutic misconception), nonetheless scientists and non-
scientist collaborators in an RCT continue to have a duty of care to participants, both qua
study subjects and also in many instances qua clients of their services. Because they are in a
position to exercise “discretionary” powers, whether as scientists or as professionals, they
have a duty to demonstrate “reasonable care, diligence and skill” (Miller & Weijer, 2006, p.
437).
Therefore, there are strong grounds to argue that the duty of care is relevant to the
ethical evaluation of RCTs. It also follows that our duty of care may be a source of ethical
concern about the permissibility of a RCT evaluation.
3.1.2. Argument 2: Study conditions must be equal at baseline
When RCT findings are reported, authors often begin by stating that no
statistically significant differences were observed between the two conditions at baseline (i.e.,
baseline equality), that the random allocation process was therefore successful, and that the
study will allow causal inferences to be drawn about the effectiveness of the treatment
evaluated (Tierney et al., 1995; Oakley et al., 2003; Savage et al., 2003; Ritter & Holley,
2008); many textbooks and guides to RCTs repeat this contention (Rossi et al., 2004;
SPR, 2004; GSR, 2007; IES, 2008). However, random allocation does not, and need not,
guarantee baseline equality. Although across all possible RCTs the two groups will be equal
at baseline, in any one study, precisely because allocation is random, it may lead to two
unequal groups at baseline (Altman, 1985; Senn, 1994; Fives et al., 2013). If there are
inequalities between groups at baseline, this can be controlled for in the analysis of
outcomes but only if the variables in question predict outcomes (Assmann, Pocock, Enos, &
Kasten, 2000). To make such a judgement requires “prior knowledge of the prognostic
importance of the variables,” “clinical knowledge,” and “common sense” along with
knowledge of the relevant research (Altman, 1985, p. 130 – 132).
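This point, that randomization guarantees baseline equality only in expectation and not in any single study, can be illustrated with a short simulation. In the sketch below (Python; the arm size, covariate distribution, and threshold are invented for illustration) one baseline covariate is repeatedly randomized into two small arms and the standardized baseline difference is recorded each time:

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_arm, n_trials = 30, 10_000   # a small two-arm trial, re-randomized many times

def one_trial():
    # One baseline covariate (e.g. a pretest score) drawn from the same
    # population for everyone, then randomly allocated to two arms.
    scores = rng.normal(100, 15, 2 * n_per_arm)
    rng.shuffle(scores)                         # random allocation
    a, b = scores[:n_per_arm], scores[n_per_arm:]
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd    # standardized baseline difference

ds = np.array([one_trial() for _ in range(n_trials)])
share = (np.abs(ds) >= 0.2).mean()
print(f"mean standardized difference across trials: {ds.mean():+.3f}")  # near zero
print(f"share of single trials with |d| >= 0.2: {share:.2f}")           # far from zero
```

Across all randomizations the two arms are equal on average, yet a substantial share of individual small trials show a baseline difference at least as large as d = 0.20, which is why such a difference is not by itself evidence that randomization has failed.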
The actual role of random allocation in an RCT is to protect against one threat to
internal validity, “selection bias.” An RCT is “internally valid” when the differences between
the two study conditions observed at the completion of the treatment (i.e. at posttest) can
be ascribed to the different treatments (along with random error) and not to other
confounding variables (Juni et al., 2001). Random allocation protects against confounding
variables because it removes selection bias, i.e. there is “no systematic relationship”
between membership of the intervention or the control groups “and the observed and / or
unobserved characteristics of the units of study” (GSR, 2007, p. 4). The Consolidated
Standards of Reporting Trials (CONSORT) statement notes that the success of the
random allocation process depends on both the generation of the random allocation
sequence and the steps taken to ensure its concealment (Altman et al., 2001). For example,
in the WoW evaluation, as the process of allocation was random and as it was concealed
from both the researchers and the program providers, we can be confident random
allocation was successful, irrespective of a “small” and non-significant (Cohen’s d = 0.20, p =
.10) difference between study conditions at baseline on the measure of single word reading.
Therefore, it is the case that random allocation of participants can protect against
selection bias and ensure that causal inferences can be drawn about the effectiveness of the
intervention evaluated. Nonetheless, it is not the case that the random allocation process in
an RCT will guarantee baseline equality or that this explains the robustness of findings from
a social experiment.
3.2. The position that RCTs often are not appropriate in evaluating social interventions
3.2.1. Argument 2: Study conditions must be equal at baseline (continued)
As we saw, many who believe RCTs are the gold standard hold to the argument that
random allocation must ensure baseline equality between study conditions. Many of those
opposed to RCTs in social settings share this belief. They assume that, if study conditions are
not equal at baseline, this is evidence that the randomization process has failed and the
internal validity of the study has been deeply compromised. They also claim that, because
social settings are more fluid than the laboratory or the hospital for instance, it is not
possible to guarantee that the two groups of participants will be equal at baseline, as there
may be differences between groups on both measured and unmeasured variables, and for
this reason RCTs in social settings cannot be scientifically promising studies (Morrison,
2001).
As we have already said, although the role of random allocation is to safeguard
against selection bias, this does not require the creation of baseline equality. Therefore,
even if the two study conditions are not “the same” in all respects at baseline, this by itself
does not undermine the “scientific promise” of the RCT. An RCT study can allow causal
inferences to be drawn because random allocation safeguards against selection bias and this
is the case even in supposedly more fluid social settings such as children’s education.
Therefore, skepticism concerning RCT evaluations of social interventions is not justified by
this particular belief.
3.2.2. Argument 3: RCTs in social settings are not ethically permissible as equipoise cannot
be attained
The third argument again concerns the fact that in RCTs treatments are withheld. As
we have seen, according to some, it is only ever permissible to recruit subjects in a clinical
trial, where treatments will be withheld, when the physician-scientists are confident that no
patients will be denied a beneficial treatment. We have already argued that the duty of care
is relevant to the ethical judgment of a trial and we have criticized the argument that claims
to show such duties are irrelevant. Now we turn to the question of whether it follows that
equipoise also is necessary for the ethical permissibility of a trial. It has been claimed (this is
the third argument) both that equipoise is necessary for an RCT to be justified, and also that
equipoise is difficult or impossible to attain in social settings. Given the variability in
professional judgments concerning which treatments are most effective for specific clients,
for example, each teacher’s judgments of what each pupil needs to improve their literacy
acquisition, equipoise is unlikely to be achieved and impossible to retain for very long
(Hammersley, 2008). Therefore, in social settings, in most instances an RCT will be
unjustified because it will deny what is thought to be a beneficial treatment to some or all
participants in a study.
However, a community of experts can be out of equipoise even when for scientific
reasons there are strong grounds for an RCT evaluation. For example, in the WoW study,
even though the community of experts had a decided preference for programs similar to
WoW, a volunteer-delivered reading support program for children at risk of reading failure (Brooks, 2002; NESF,
2005), still there was a lack of evidence for its effectiveness. Also, this study helps illustrate
that it may be possible to ethically justify an RCT even in the absence of equipoise. This is
the case because even when there is a decided community-based preference for it, an
intervention may turn out to be ineffective or actually harmful (Oakley et al., 2003). It
suggests there may be a moral dilemma to be addressed: a conflict between the duty of
care to those over whom we exercise discretionary power (e.g. child participants in an RCT)
(Miller & Weijer, 2006) and the duty to go ahead with a crucial experiment that shows
scientific promise. And in some cases, as in the WoW study, considerations about the need
for causal evidence about the effectiveness of a program that could be harmful or
ineffective may be judged important enough to out-weigh opposing considerations about
the need to ensure no one is denied a preferred treatment (Fives et al., 2014).
3.2.3. Argument 4: RCTs cannot be scientifically promising in social settings
As we have seen, to be ethically permissible, one requirement is that the proposed
RCT shows scientific promise and is a crucial experiment. The final argument to address is
that, in most if not all cases, an RCT cannot be scientifically promising in social settings. We
look at just two of the numerous methodological criticisms of RCTs here. First, it is argued
that in social settings participants often will be aware of the treatment type (control or
intervention) they have received, and therefore RCTs cannot be “blind,” and this lack of
blindness will undermine the validity of the results (Morrison, 2001; Stewart-Brown et al.,
2011). For instance, a positive self-concept may be more likely among some subjects
because they know they have received the intervention. Second, in a social setting it is not
possible for a trial to be truly controlled, due to the large number of possible confounding
variables (Morrison, 2001; Hammersley, 2008). For that reason, it is not possible to infer
that the intervention evaluated was the cause of the observed outcomes.
However, these are threats to the internal validity of an RCT that are (along with
many others) routinely addressed in clinical trials. And if these threats to validity can be
addressed in clinical trials then there is no prima facie reason why they cannot be addressed
in social trials. For instance, even though students in the WoW evaluation were aware of
their study condition, this was not likely to undermine study validity because external
(rather than self-report) measures were used to collect data on outcomes: i.e. standardised
tests of reading achievement (Shadish et al., 2002; Webb, Campbell, Schwartz, & Sechrest,
1981). Also, continual monitoring reduced the likelihood of control participants receiving
the WoW program (treatment diffusion) or supplementary reading supports (compensatory
equalization). In that way, the “controlled” nature of the
study was preserved and confounding variables ruled out.
Although there are grounds to reject this argument against RCTs in social settings, it
does not follow that RCTs are always preferable on methodological grounds. Even when the
aim is to analyse whether an intervention works, it does not follow that the RCT is the “gold
standard.” First, RCTs require a large sample size to guarantee statistical power, and
recruitment of a large sample is not always feasible for programmatic reasons (Shadish
et al., 2002, p. 46). Therefore, in such a scenario, on methodological grounds a quasi-
experiment or one of the various non-experimental study designs may be preferable.
Second, in highly individualized programs, such as speech and language therapy (and others
in areas such as psychotherapy, family support, parenting), where interventions are
matched to the needs of the individual, other study designs may be more suitable. In
particular, single case study designs, which can provide evidence of causality, may be more
appropriate (Vance & Clegg, 2012).
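The sample-size point can be made concrete with a back-of-the-envelope power calculation. The sketch below is ours, not taken from the paper; it uses the standard normal approximation for a two-arm comparison of means and shows why a small expected effect can demand a recruitment target that is infeasible for many community-based programs:

```python
import math
from statistics import NormalDist

def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-arm trial comparing
    means, using the standard normal approximation (two-sided test)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z(power)          # quantile delivering the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)

# A small standardized effect (Cohen's d = 0.2) demands a far larger
# sample than a large one (d = 0.8):
print(required_n_per_group(0.2))  # 393 per group
print(required_n_per_group(0.8))  # 25 per group
```

Where a few hundred participants per arm cannot be recruited, this arithmetic is one methodological ground for preferring a quasi-experimental or single case design.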
As we have argued, the claim that RCTs are the “gold standard” in evaluation is
based on two commonly made but highly questionable arguments: that random allocation
should create two groups equal at baseline and that therapeutic duties of care are irrelevant
to the ethical justification of RCTs. The opposing argument that RCTs are inappropriate in
social settings is based on the following three commonly made but highly questionable
arguments: that random allocation should create two groups equal at baseline (but cannot
do so in social settings), that RCTs cannot be ethically permissible as equipoise cannot be
attained, and that RCTs cannot be scientifically promising. We have concluded that RCTs
may be both scientifically promising and ethically justified, but also that ethical objections to
RCTs may arise from our duty of care, and that alternative study designs may be chosen for
methodologically sound reasons.
4. A Plurality of Evidence Types: Combining Practice-Based Evidence and Evidence-Based
Practice
Starting from the principle of methodological aptness, we have questioned the two
extremes in a polarized debate on RCTs, and argued that ethical and methodological
considerations will determine if and in what circumstances an RCT is appropriate. In so
doing, we have questioned the tendency to place value in only a small minority of study
designs, and provided grounds for greater pluralism in evaluation evidence. We now turn to
the second line of argument for an evidence matrix, namely that often both experimental
and non-experimental research is required in tandem. We will address what role non-causal
evidence types can play in mixed methods studies, and also in practice-based evidence (PBE)
and practice wisdom.
4.1. Adaptation of Evidence-Based Practices
There is a close connection between experimental evidence on the one hand and
manualized programs on the other. As we have seen, according to the NESTA hierarchy of
evidence (Puttick & Ludlow, 2013), while multiple RCTs can confirm that an intervention is
effective, manualization provides confidence that the intervention will be effective
consistently. While an “efficacy” experimental study tests a hypothesis about the causal
impact of a single intervention, implemented in a standard way and targeting a single
disorder or problem (Barkham & Mellor-Clark, 2003, p. 320), an “effectiveness” RCT
provides evidence regarding the program when implemented by practitioners in real world
settings (Green, 2001; Wandersman, 2003). However, in real world settings children and
parents may have multiple and complex problems and disorders (Mitchell, 2011, p. 208;
Jack, 2011), any one family may be receiving services from multiple sectors, and not all
social interventions are discrete, manualized programs (Mitchell, 2011, p. 208). Therefore,
while RCTs are ideally suited to analyze the causal impact of standardized programs under
ideal conditions, if social interventions and social contexts do not all fit this model, this
indicates why other forms of evidence, whether as an adjunct to or in place of
experimentally-derived causal evidence, are of value when decisions are made about
implementation.
The PBE paradigm examines interventions as they are in routine practice. This can be
pursued through “practice research,” research often carried out by (a network of)
practitioners themselves so as to inform and improve practice (Barkham & Mellor-Clark,
2003, p. 321), which is also referred to as “empowerment evaluation” (Green, 2001;
Wandersman, 2003). Efforts have been made to “integrate” the two paradigms, and in
particular to show that evidence-based programs are and must be adapted when they are
implemented because “client characteristics, needs and contexts” will not be the same as
they were among participants in the original evaluation (Mitchell, 2011, p. 209; cf.
Vandenbroeck, Roets, & Roose, 2012).
As an example of an integrative approach, the American Psychological Association
(APA) Presidential Taskforce defined EBP as “the integration of the best available research
with clinical expertise in the context of patient characteristics, culture, and preferences”
(APA, 2006, p. 273). An integrative approach is also proposed by the Institute of Medicine
(2001; Sackett, Rosenberg, Gray, Haynes, Richardson, 1996). An integrative approach places
considerable importance on the idea of clinical expertise, or practice wisdom (Rubin &
Babbie, 2014). Practice wisdom is defined as “practice-based knowledge that has emerged
and evolved primarily on the basis of practical experience rather than from empirical
research” (Mitchell, 2011, p. 208). However, if an integrative approach is to be pursued, the
following question must be addressed: how is practice-based knowledge to be integrated
with for instance causal evidence in a way that is rigorous and rationally justifiable?
4.2. Mixed methods studies
The current paper, as part of a pluralist approach, has rejected a hierarchy of
evidence that places experimental studies at the apex. While it is the case that some
evidence types make a greater contribution to knowledge of efficacy or effectiveness
because they better rule out threats to internal validity, it does not follow that causal
evidence is by definition superior to other evidence types (APA, 2006, p. 275). Even when an
experiment provides evidence of efficacy or effectiveness, “other types of research tools
and designs are often needed to yield important information about when, why, and how
interventions work, and for whom” (SPR, 2004, p. 1; emphasis added). Some have gone on
to argue that because RCTs pay insufficient attention to the circumstances in which an
intervention works and for whom it works, RCTs are “not always the best for determining
causality” as they “examine a limited number of isolated factors that are neither limited nor
isolated in natural settings” (AEA, 2003; see Bagshaw & Bellomo, 2008). It should be noted
that experimental studies can address questions of when, why, how, and for whom, in
particular through sub-group analyses. However, the point being made here is that these
questions can be addressed as well by various non-causal study designs such as qualitative
interviews, observations, and documentary analyses, but also by the various ways evidence
may emerge from practice (i.e. practice-based evidence).
Experimental studies may provide causal evidence to support a particular policy, for
example, mentoring with at-risk youths to reduce risky behavior, and home visiting in the
first 24 months to reduce child maltreatment in vulnerable families. However, even when
such evidence has been accrued, other types of evidence become necessary for questions of
how and why interventions work with specific groups, questions that need to be answered
so as to make informed decisions on the adoption of programs and/or the reform of existing
interventions (Pawson & Tilley, 1997; Chen, 2004). Qualitative methods, which may identify
how and why interventions work, have been in use in social science research particularly in
social work contexts since the 1960s. An important instance is the use of reflective practice
in the UK and Australia via “process recording,” whereby professionals use discrete methods
(Dolan, 2006) to establish, with colleagues and those they work with, the ways in which
their service provision is deemed useful, ineffective, or a hindrance. This has
value in breaking down into identifiable components why an intervention works for the end
user. For instance, Devaney, in her research with veterans of Family Support, both in
practice and academia, established that the person may be as important as the program, in
that the quality of the caseworker-client relationship was deemed crucial (Devaney, 2011).
5. An Evaluation Evidence Matrix: A Pluralist Guide to Appropriate Study Designs
If there is a plurality of evidence types, how can informed choices be made about
what type of evidence is appropriate in evaluation and program adoption? The Evaluation
Evidence Matrix (Figure 2) is truly informative in making such choices only if it is understood
in light of a number of qualifying statements arising from the foregoing discussion of a
pluralist approach to evidence. Below we list both what the Matrix can do and what it
cannot do.
5.1. What the Evaluation Evidence Matrix can do
Provide guidance on exemplary questions. We do not claim that the list of research
questions in the Matrix is exhaustive or comprehensive. Rather, our aim was to include
exemplary questions. The value of the matrix is in providing an orientation to any decision
about the appropriate study design / evidence type for specific research questions.
Rule out an evidence type as inappropriate for a specific research question. For
instance, the type of evidence that can be produced from a documentary study or a cohort
study is appropriate for answering some questions, but not for answering others, for
example, the question “Does X work more than Y?”
Indicate what evidence type may be appropriate for a specific research question. If
we wish to answer the question “Does X work more than Y?” the type of evidence that an
RCT study provides may be the most appropriate. However, as we saw, many considerations
must be addressed before an RCT can be judged the most appropriate study design.
Show if a study design can be used to answer more than one type of question. While
RCTs are associated with the “what works” questions, they can also be used to answer the
question “How does X work?” if specific data analysis methods are used. For instance,
structural equation modeling can be used to analyze the variables that mediated impact in
an RCT study, and in this way help determine how an impact came about.
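The mediation logic mentioned here can be illustrated with a toy product-of-coefficients calculation. In a full structural equation model the paths would be estimated from trial data; the values below are invented for illustration and are not estimates from the WoW study or any SEM fit:

```python
# Hypothetical standardized path coefficients, invented for illustration:
a = 0.5  # treatment -> mediator path (e.g. a change in reading frequency)
b = 0.6  # mediator -> outcome path (e.g. effect on reading achievement)
c = 0.1  # direct treatment -> outcome path, bypassing the mediator

indirect_effect = a * b             # effect transmitted via the mediator
total_effect = indirect_effect + c  # total treatment effect on the outcome
proportion_mediated = indirect_effect / total_effect

print(round(indirect_effect, 3),
      round(total_effect, 3),
      round(proportion_mediated, 3))  # 0.3 0.4 0.75
```

Decomposing a total impact this way is what allows an experimental design to speak to the “how” question rather than only the “whether” question.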
Show if more than one study design can be used to help answer the same question.
No one study design can be said to offer the only appropriate approach to answer any one
of the research questions listed, for instance the question “How does X work?” In addition,
any one piece of research may combine different study designs or elements from different
study designs to answer one question (or a number of questions). Therefore, what are
referred to in the Matrix as different study designs in the real world of research may be
20
combined together in a single evaluation. The Matrix therefore acknowledges the reality of
mixed-methods studies.
Help make methodological decisions that are ethically informed. The construction of
the Matrix has been informed by ethical considerations. For instance, it is assumed to be
unethical to run an RCT to “confirm” that a proven program works, as in such a study some
participants would be denied a known benefit. Therefore, although on methodological
grounds there is no reason why an RCT could not answer the question “Does a proven
program work here?” on ethical grounds it would be inappropriate.
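The ruling-in and ruling-out logic just described can be thought of as a lookup from research questions to candidate study designs. The sketch below is purely illustrative: the questions and design lists are hypothetical stand-ins, not a reproduction of the paper's Figure 2:

```python
# Illustrative sketch of an evidence matrix as a lookup table.
# The entries are hypothetical, not the paper's actual Matrix.
MATRIX = {
    "Does X work more than Y?": ["RCT", "quasi-experiment"],
    "How does X work?": ["RCT with mediation analysis",
                         "qualitative interviews", "observation"],
    # No RCT listed here: denying controls a known benefit is unethical.
    "Does a proven program work here?": ["cohort study", "documentary study"],
}

def appropriate_designs(question):
    """Return candidate study designs; an empty list means the
    (illustrative) matrix offers no guidance for that question."""
    return MATRIX.get(question, [])

print(appropriate_designs("Does X work more than Y?"))
```

Note how the structure captures the points above: one question can admit several designs, one design can answer several questions, and a design absent from a question's list is ruled out for it.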
5.2. What the Evaluation Evidence Matrix cannot do
Distinguish between strong and weak evidence types. In the literature on evaluation
evidence considerable attention rightly has been given to distinguishing between strong and
weak evidence types based on considerations of validity, and also between study designs
that have been well implemented and those that have not (NICE, 2005; see sections 3.1.2.,
3.2.1., and 3.2.3.). However, these distinctions are not included in the Matrix. For example,
when the Matrix identifies the RCT design as appropriate for answering the question “Does
X work more than Y?” it has to be assumed that the RCT in question is well implemented
and has adequately addressed threats to validity. Therefore, while the Matrix can identify
appropriate evidence types for research questions, this must be supplemented with further
information on study design implementation.
Show how to combine different study design elements. In a mixed methods study
different sources of evidence are combined (see section 4.2.). However, a decision is
required on how to combine different types of evidence and in particular whether one type
of evidence will be used to support (e.g. expand on or clarify) evidence of another sort
(Creswell and Clark, 2007; Wight and Obasi, 2003). The Matrix cannot decide for us which of
the two sources of evidence should play a supportive role and once again the Matrix must
be supplemented, in this instance with information on mixed methods.
Resolve moral dilemmas. Although ethical considerations have informed the
construction of the Matrix (see sections 3.1.1. and 3.2.2.), in some instances ethical reasons
can be raised to question the guidance given in the Matrix. For example, an RCT is
appropriate to help answer the question “Does X work more than Y?” However, even when
there is no evidence from other studies that X is effective, the community of practitioners
may have a decided preference for it (i.e. they are out of equipoise) and therefore may
believe it is unethical to run an RCT where some will be denied this program. Just such a
clash of principles arose in the WoW study (as we saw in section 2) and those involved had
to work through this dilemma and make an ethical judgment. What the Matrix cannot do is
make that ethical judgment for us.
Make a normative judgment of programs and policies. Although practice and policy
should be increasingly evidence-based and evidence-informed, normative (i.e. moral or
ideological) judgments are also made about what problems are most pressing or significant,
and such judgments are not and cannot be based solely on empirical evidence. This is most
clearly the case when a decision needs to be made between two programs but the empirical
evidence does not show conclusively that one is better. Again the Matrix cannot decide for
us what normative judgments should be made concerning priorities in practice and policy.
6. Conclusion: Towards a New and Long Overdue Perspective
The current paper has proposed a pluralist approach to evidence, represented in the
Evaluation Evidence Matrix. Although critical of hierarchies of evidence for their too narrow
focus on experimental study design and the “what works” question, the underlying
argument is not one of skepticism, and indeed this paper has critically addressed those who
contend that RCTs are in most cases inappropriate in social contexts. The pluralist approach
to evidence arises from an awareness of a need for new thinking on the relationship
between experimental and non-experimental evidence.
First, in many contexts, it is appropriate for both methodological and ethical reasons
to conduct an RCT evaluation to generate causal evidence for an intervention’s impact. RCTs
are ideally suited to answer the “what works” question, but the same question can be
addressed by other study designs (e.g. repeated case studies, QESs). In addition, in an RCT
study the causal model underlying an intervention should be tested (for example, through
structural equation modeling) to help answer the question “how” or “why” an intervention
worked.
Second, in many if not most evaluations, mixed methods research is required. In
particular, RCTs should be combined with analyses of the processes by which outcomes
were achieved and the contexts in which the intervention was implemented. Such a mixed
methods evaluation can help answer questions about “how” an intervention worked, “for
whom,” and “in what contexts?”
Third, it may not always be appropriate to repeat RCTs with the same program when
it is implemented in different countries, in particular for the ethical reason that participants
should not be denied a beneficial treatment. What may be required is a deeper
investigation using other quantitative methods (including time series analysis, longitudinal
studies) to identify the key components of successful programs over time.
Fourth, a shift away from overly strict adherence to manuals towards greater respect
for practice wisdom as a form of evidence may be worthwhile. The danger of such a shift is
that it raises problems concerning how to ensure program fidelity. For
that reason, limits must be placed on practitioner innovation, and in particular such
innovation must not entail a departure from what are known to be the key ingredients of a
successful intervention. In this way professional wisdom can be mined as one of a number
of sources to establish practice-based evidence, but its function should be seen as
complementary to experimental evidence on “what” works.
References
Altman, D. G. (1985). Comparability of randomized groups. The Statistician, 34, 125-136.
Altman, D.G. (1991). Randomization: Essential for reducing bias. British Medical Journal, 302(6791), 1481-
1482.
Altman, D.G., Moher, D., Schulz, K.F., Egger, M., Davidoff, F., Elbourne, D., Gotsche, P. & Lang, T. (2001). The
CONSORT statement: Revised recommendations for improving the quality of reports of parallel group
randomized trials. Annals of Internal Medicine, 134(8), 657-662.
American Evaluation Association (AEA). (2003). American Evaluation Association Response to U. S. Department
of Education. http://www.eval.org/p/cm/ld/fid=95 [June 2014]
APA Presidential Taskforce on Evidence-Based Practice. (2006). Evidence-based practice in psychology.
American Psychologist, 61(4), 271-285.
Assmann, S. F., Pocock, S. J., Enos, L. E., & Kasten, L. E. (2000). Subgroup analysis and other (mis)uses of
baseline data in clinical trials. Lancet, 355(9209), 1064-1069.
Axford, N., Morpeth, L., Little, M., & Berry, V. (2008). Linking prevention science and community engagement:
a case study of the Ireland Disadvantaged Children and Youth Programme. Journal of Children’s Services, 3(2),
40-54.
Barkham, M., & Mellor-Clark, J. (2003). Bridging Evidence-Based Practice and Practice-Based Evidence: Developing a
Rigorous and Relevant Knowledge for the Psychological Therapies. Clinical Psychology and Psychotherapy,
10(6), 319–327.
Beauchamp, T.L., & Childress, J.F. (2009). Principles of Biomedical Ethics, sixth edition. Oxford: Oxford
University Press.
Bonell, C.P., Hargreaves, J., Cousens, S., Ross, D., Hayes, R., Petticrew, M., & Kirkwood, B.R. (2011).
Alternatives to randomisation in the evaluation of public health interventions: design challenges and solutions.
Journal of Epidemiology and Community Health, 65(7), 582-587.
Boruch, R., Weisburd, D., Turner III, H. M., Karpyn, A., & Littell, J. (2009). Randomized Controlled Trials for
Evaluation and Planning. In L. Bickman & D. J. Rog (Eds.), The Sage Handbook of Applied Social Research.
Second edition (pp. 147-181). London: Sage.
Brooks, G. (2002). What Works for Children with Literacy Difficulties? The Effectiveness of Intervention
Schemes. Department for Education and Skills. Available at:
http://www.dcsf.gov.uk/research/data/uploadfiles/RR380.pdf (accessed March 2010).
Canavan, J., Coen, L., Dolan, P. & White, L. (2009). Privileging Practice: Facing the Challenges of Integrated Working for Outcomes for Children. Children & Society, 23(5), 377-388.
Cheetham, J., Fuller, R., McIvor, G., & Petch, A. (1992). Evaluating Social Work Effectiveness. Buckingham:
Open University Press.
Chen, H.T. (2004). The Roots of Theory-Driven Evaluation, Current Views and Origins. In Alkin, M. (Ed.)
Evaluation Roots, Tracing Theorists' Views and Influences. Thousand Oaks: SAGE.
Creswell, J.W. and Clark, V.L.P. (2007) Designing and Conducting Mixed Methods Research. London: Sage.
Devaney, C. (2011). Family Support as an Approach to Working with Children and Families. An Explorative
Study of Past and Present Perspectives among Pioneers and Practitioners. Germany: LAP LAMBERT Academic
Publishing.
Dolan, P. (2006). Assessment, Intervention and Self Appraisal Tools for Family Support. In Dolan, P., Pinkerton,
J. and Canavan, J. (eds.) Family Support as Reflective Practice, London and Philadelphia: Jessica Kingsley
Publishers (pp. 196-213).
Fives, A., Russell, D., Kearns, N., Lyons, R., Eaton, P., Canavan, J., Devaney, C., & O’Brien, A. (2013). The role of
random allocation in Randomized Controlled Trials: Distinguishing selection bias from baseline
imbalance. Journal of MultiDisciplinary Evaluation, 9(20), 33-42.
Fives, A., Russell, D., Kearns, N., Lyons, R., Eaton, P., Canavan, J., & Devaney, C. (2014). The ethics of
Randomized Controlled Trials in social settings: Can social trials be scientifically promising and must there be
equipoise? International Journal of Research & Method in Education. Available at:
http://www.tandfonline.com/doi/abs/10.1080/1743727X.2014.908338#.U2n1m_ldWWE
Fives, A. (2015). Modeling the interaction of academic self-beliefs, frequency of reading at home, emotional
support, and reading achievement: An RCT study of at-risk early readers in 1st grade and 2nd grade. In
preparation.
Freedman, B. (1987). Equipoise and the ethics of clinical research. New England Journal of Medicine, 317(3),
141-145.
Ghate, D. (2001). Community-based evaluations in the UK: Scientific concerns and practical constraints.
Children and Society, 15(1), 23-32.
Government Social Research Unit (2007). Magenta Book Background Papers. Paper 7: why do social
experiments? London: HM Treasury. Available at
http://www.civilservice.gov.uk/Assets/chap_6_magenta_tcm6-8609.pdf (accessed January 2011)
Grading of Recommendations Assessment, Development and Evaluation (GRADE) Working Group (2004).
Grading quality of evidence and strength of recommendations. British Medical Journal, 328(7454), 1490-1494.
Green, L. (2001). From research to “best practices” in other settings and populations. American Journal of Health Behavior, 25(3), 165–178.
Greene, L.W., & Kreuter, M.W. (1991). Health Promotion Planning: An educational and environmental
approach. Toronto: Mayfield Publishing Company.
Hammersley, M. (2008). Paradigm war revived? On the diagnosis of resistance to randomized controlled trials
and systematic review in education. International Journal of Research and Method in Education, 31(1), 3-10.
Institute of Education Sciences [IES] (2008). What Works Clearinghouse. Procedures and Standards Handbook.
Version 2.1. Available at: http://ies.ed.gov/ncee/wwc/documentsum.aspx?sid=19
Institute of Medicine. (2001). Crossing the quality chasm: A new health system for the 21st century. Washington
DC: National Academies Press.
Jack, G. (2011). Early intervention - smart practice. Every Child Journal, 2 (3), 50-53.
Jack, G. (2013). 'I may not know who I am, but I know where I am from': The meaning of place in social work
with children and families. Child and Family Social Work. ISSN 1365-2206 (In Press)
Jadad, A.R. (1998). Randomised Controlled Trials: A User’s Guide. London: BMJ Books.
Joffe, S. & Miller, F.G. (2008). Bench to bedside: Mapping the moral terrain of clinical research. Hastings Center
Report, 38(2), 30-42.
Juni, P., Altman, D. G., & Egger, M. (2001). Assessing the quality of controlled clinical trials. British Medical
Journal, 323(7303), 42-46.
Mitchell, P. F. (2011). Evidence-based practice in real-world services for young people with complex needs: New opportunities suggested by recent implementation science. Children and Youth Services Review, 33(2), 207–21.
Miller, F.G., & Brody, H. (2007). Clinical equipoise and the incoherence of research ethics, Journal of Medicine
and Philosophy, 32(2), 151-165.
Miller, P.B., & Weijer, C. (2006). Fiduciary obligation in clinical research. Journal of Law, Medicine & Ethics,
34(2), 424-440.
Morrison, K. (2001). Randomised controlled trials for evidence-based education: Some problems in judging
“What Works.” Evaluation and Research in Education, 15(2), 69-83.
Muir Gray, J.A. (1996). Evidence-based healthcare. London: Churchill Livingstone.
National Economic and Social Forum (NESF) (2005). Early Childhood Care and Education, Report 31, Dublin:
NESF.
National Institute for Clinical Excellence (NICE) (2005) Guideline Development Methods: Information for National Collaborating Centres and Guideline Developers. London: NICE.
Nutley, S., Powell, A. and Davies, H. (2013). What Counts as Good Evidence: Provocation Paper for the Alliance
for Useful Evidence, London: Alliance for Useful Evidence.
Oakley, A., Strange, V., Toroyan, T., Wiggins, M., Roberts, I., & Stephenson, J. (2003). Using random allocation
to evaluate social interventions: Three recent U.K. examples. The ANNALS of the American Academy of Political
and Social Science, 589(1), 170-189.
Pawson, R., & Tilley, N. (1997). Realistic Evaluation. London: Sage.
Petticrew, M. & Roberts, H. (2003). Evidence, hierarchies, and typologies: horses for courses, Journal of
Epidemiology and Community Health, 57(7), 527-529.
Puttick, R. and Ludlow, J. (2013). Standards of Evidence: An Approach that Balances the Need for Evidence with
Innovation. London: NESTA.
Rubin, A. and Babbie, E.R. (2014). Research Methods for Social Work, Brooks Cole Empowerment Series, 8th
Edition, Library of Congress Publication USA.
Ritter, G., & Holley, M. (2008). Lessons for conducting random assignment in schools. Journal of Children’s
Services, 3(2), 28-39.
Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A Systematic Approach. Seventh edition.
London: Sage.
Sackett, D.L., Rosenberg, W.M., Gray, J.A., Haynes, R.B., & Richardson, W.S. (1996). Evidence-based medicine:
What it is and what it isn’t. British Medical Journal, 312(7023), 71-72.
Savage, R., Carless, S., Stuart, M. (2003). The effects of rime- and phoneme-based teaching delivered by
Learning Support assistants. Journal of Research in Reading, 26(3), 211-233.
Senn, S. J. (1994). Testing for baseline balance in clinical trials. Statistics in Medicine, 13(17), 1715-1726.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for
Generalized Causal Inference. Boston, U.S.A.: Houghton Mifflin Company.
Society for Prevention Research [SPR] (2004). Standards of Evidence: Criteria for efficacy, effectiveness and
dissemination. Available at:
http://www.preventionresearch.org/StandardsofEvidencebook.pdf
Speller, V., Learmonth, A., & Harrison, D. (1997). The search for evidence of effective health promotion. British
Medical Journal, 315(7104), 611-613.
Stewart-Brown, S., Anthony, R., Wilson, L., Winstanley, S., Stallard, N., Snooks, H., & Simkiss, D. (2011). Should
randomised controlled trials be the “gold standard” for research on preventive interventions for children?
Journal of Children’s Services, 6(4), 228-235.
Tierney, J. P., Grossman, J. B., & Resch, N. L. (1995). Making a Difference: An Impact Study of Big Brothers Big
Sisters. Public/Private Ventures. Available at:
http://www.ppv.org/ppv/publication.asp?search_id=7&publication_id=111&section_id=0
Urban, J.B., Hargreaves, M., & Trochim, W.M. (2014). Evolutionary Evaluation: Implications for evaluators,
researchers, practitioners, funders and the evidence-based program mandate. Evaluation and Program
Planning, 45, 127-139.
Vance, M. & Clegg, J. (2012). Use of single case study research in child speech, language and communication
interventions. Child Language Teaching and Therapy, 28(3), 255-258.
Vandenbroeck, M., Roets, G., & Roose, R. (2012). Why the evidence-based paradigm in early childhood
education and care is anything but evident. European Early Childhood Education Research Journal, 20(4), 537-
552.
Veatch, R.M. (2007). The irrelevance of equipoise. Journal of Medicine and Philosophy, 32(2), 167-83.
Veerman, J.W. & van Yperen, T.A. (2007). Degrees of freedom and degrees of certainty: A developmental
model for the establishment of evidence-based youth care, Evaluation and Programme Planning, 30, 212-221.
Wandersman, A. (2003). Community Science: Bridging the Gap Between Science and Practice With Community-Centered Models. American Journal of Community Psychology, 31(3/4), 227-242.
Webb, E.J., Campbell, D.T., Schwartz, R.D., Sechrest, L., & Grove, J.B. (1981). Nonreactive Measures in the
Social Sciences. Second edition. Boston, MA: Houghton Mifflin.
Wight, D. and Obasi, A. (2003). Unpacking the “black box”: the importance of process data to explain
outcomes. In Stephenson, J.M., Imrie, J., & Bonell, C. (Eds.), Effective Sexual Health Interventions: Issues in
Experimental Evaluation (pp. 151-169). New York: Open University Press.