Undertaking a Quantitative Synthesis
Undertaking a Quantitative Synthesis
Steve Higgins, Durham University
Robert Coe, Durham University
Mark Newman, EPPI Centre, IoE, London University
James Thomas, EPPI Centre, IoE, London University
Carole Torgerson, IEE, York University
Acknowledgements
• This presentation is an outcome of the work of the ESRC-funded Researcher Development Initiative: “Training in the Quantitative synthesis of Intervention Research Findings in Education and Social Sciences” which ran from 2008-2011.
• The training was designed by Steve Higgins and Rob Coe (Durham University), Carole Torgerson (Birmingham University) and Mark Newman and James Thomas, Institute of Education, London University.
• The team acknowledges the support of Mark Lipsey, David Wilson and Herb Marsh in preparation of some of the materials, particularly Lipsey and Wilson’s (2001) “Practical Meta-analysis” and David Wilson’s slides at: http://mason.gmu.edu/~dwilsonb/ma.html (accessed 9/3/11).
• The materials are offered to the wider academic and educational community under a Creative Commons licence: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
• You should only use the materials for educational, not-for-profit use and you should acknowledge the source in any use.
Background
• Training funded by the ESRC’s “Researcher Development Initiative”
• Collaboration between the Universities of Durham, York and the Institute of Education, University of London
• National initiative
• Levels 1 and 2
– Round 1: Durham, Edinburgh, London (2008-9)
– Round 2: Belfast, York, Cardiff (2009-10)
• Level 3
– Mark Lipsey, Edinburgh, 16th March 2010
– Larry Hedges, London, 7th June 2010
– Workshop at RCT Conference, York, September
• Doctoral support
– British Educational Research Association (BERA) student conferences
• Website and resource materials
Overall aims
• To support understanding of meta-analysis of intervention research findings in education and social sciences more broadly;
• To develop understanding of reviewing quantitative research literature;
• To describe the techniques and principles involved in meta-analysis, to support understanding of its benefits and limitations;
• To provide references and examples to support further work.
Learning outcomes for Level 1
• To understand the role of research synthesis in identifying messages about ‘what works’ from intervention research findings
• To understand the concept of effect size as a metric for comparing intervention research findings
• To be able to read and understand a forest plot of the results
• To be able to read a meta-analysis of intervention research findings, interpret the results, and draw conclusions.
Learning outcomes for Level 2
• To be able to identify and select relevant quantitative data from a published report which can be used to calculate effect sizes for meta-analysis;
• To be able to calculate an effect size from commonly found continuous (and clustered) data;
• To recognise when it is appropriate to combine individual effect sizes;
• To identify possible solutions to cope with heterogeneity;
• To be able to display and interpret the results of a meta-analysis.
Overview of the day
10.00 Arrival / registration / coffee
10.15 Introduction and overview
      Identifying data for synthesis
      Calculating effect sizes
12.30 Lunch
1.30 Combining effect sizes
      Assessing and coping with heterogeneity
3.00 Break
3.30 Overview of software for meta-analysis
      Summary, discussion and evaluation
4.00 Finish
Introductions
• Introduce yourself to those next to you
– What is your interest in meta-analysis?
– What experience do you have in this area?
Meta-analysis as synthesis
• Quantitative data from
– experimental research studies
– correlational research studies
– based on a systematic review
• Methodological assumptions from quantitative approaches (both epistemological and mathematical)
• Hypothesis testing & exploration
Key issues about reviews and evidence
• Applicability of the evidence to the question
– Breadth
– Scope
– Scale
• Robustness of the evidence– Research quality
Session key assumption
• We have found a group of studies that meet our inclusion criteria; that is, they evaluate the effectiveness of a similar intervention and measure similar outcome(s)
• How do we combine the results?
Stages of synthesis
1. What is the question? Theories and assumptions in the review question.
2. What data are available? Addressing the review question according to the conceptual framework.
3. What are the patterns in the data? Including study, intervention, outcome and participant characteristics. Can the conceptual framework be developed?
4. How does integrating the data answer the question? To address the question (including theory testing or development).
5. How robust is the synthesis? For quality, sensitivity, coherence and relevance.
6. What is the result? What does the result mean? (conclusions)
7. What new research questions emerge?
Cooper, H.M. (1982) Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291. See also: Popay et al. (2006) Guidance on the Conduct of Narrative Synthesis in Systematic Reviews. Lancaster: Institute for Health Research, Lancaster University. http://www.lancs.ac.uk/fass/projects/nssr/research.htm
Calculating effect sizes
• The difference between the two means, expressed as a proportion of the standard deviation:
ES = (Me − Mc) / SD
• Cohen's d
• Glass's Δ
• Hedges' g
Practical 1
Calculating effect sizes based on standardised mean differences
1. Basic calculation
2. Extracting data and using a web-based tool
a. Investigating the effect size
3. Identifying data from a paper
4. Converting other data
1a) Calculating an effect size
The intervention group’s average score was 28.5, the control group’s was 26.5, the pooled standard deviation was 4.
What was the effect size?
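A worked answer: ES = (28.5 − 26.5) / 4 = 0.5. The same calculation as a minimal Python sketch (the function name is illustrative, not from the training materials):

```python
def standardised_mean_difference(mean_intervention, mean_control, pooled_sd):
    """Standardised mean difference: ES = (Me - Mc) / SD."""
    return (mean_intervention - mean_control) / pooled_sd

# Values from the exercise: (28.5 - 26.5) / 4
print(standardised_mean_difference(28.5, 26.5, 4))  # 0.5
```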
1b) ‘Early Steps’
Log in to: http://eppi.ioe.ac.uk/EPPIReviewer4/EppiReviewer4TestPage.html
• Username: meta
• Password: analysis
1. To be able to identify and select relevant quantitative data from a published report that can be used to calculate effect sizes for meta-analysis
• Which effect?
• Which one is appropriate for your meta-analysis?
Greaney, K., Tunmer, W., & Chapman, J. (1997). The effects of rime-based orthographic analogy training on the word recognition skills of children with reading disability. Journal of Educational Psychology 89, 645-651.
2. To be able to calculate an effect size from commonly found continuous (and clustered) data
1c) Skim through the Greaney et al. (1997) paper. Imagine you are conducting a systematic review of the impact of interventions on reading. Work out the effect size which you think best shows whether the rime-based training is effective.
Calculating Effect Sizes
– Direct calculation based on means and standard deviations
– Algebraically equivalent formulas (t-test, SE)
– Exact probability value for a t-test
– Approximations based on continuous data (correlation coefficient)
– Estimates of the mean difference (adjusted means, regression B weight, gain score means)
– Estimates of the pooled standard deviation (gain score standard deviation, one-way ANOVA with 3 or more groups, ANCOVA)
– Approximations based on dichotomous data
Using other data
• Converting Standard Error to Standard Deviation
SD = SE × √n
So if the sample size (n) is 64 and the SE is 0.2, what is the SD?
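A worked answer: √64 = 8, so SD = 0.2 × 8 = 1.6. As a minimal Python sketch (the function name is illustrative):

```python
import math

def sd_from_se(se, n):
    """Convert a standard error to a standard deviation: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

print(sd_from_se(0.2, 64))  # 1.6
```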
Conversion
• Key issue: is the source data comparable?
– Lipsey and Wilson (2001) formulae
– Meta-analysis software
– Spreadsheet on data stick
• Open the spreadsheet ES_converter.xls
Lunch
Recap of outcomes
1. To be able to identify and select relevant quantitative data to calculate effect sizes for meta-analysis;
2. To be able to calculate an effect size from commonly found continuous (and clustered) data;
3. To recognise when it is appropriate to combine individual effect sizes;
4. To identify possible solutions to cope with heterogeneity;
5. To be able to display and interpret the results of a meta-analysis.
Running and exploring a meta-analysis
• Practical 2a: Running a meta-analysis
Identifying and exploring heterogeneity
• Key issues
– Statistical
– Educational
– Role of quality
– Lumpers and splitters
Assessing between study heterogeneity
• When effect sizes differ by no more than would be expected from chance (sampling) error, the effect size estimate is considered to be homogeneous (a unique ‘true’ effect).
• When the variability in effect sizes is greater than expected by chance, the effects are considered to be heterogeneous.
• The presence of heterogeneity affects the process of the meta-analysis
• What does this mean?
Review
Heterogeneity
Heterogeneity chi-squared = 41.74 (df = 11), p < 0.0001; Q statistic = 46.3, p < 0.001; I² = 76.24%
Sub-divided by learner characteristics
[Forest plot of standardised mean differences with 95% CIs; axis from −3.77 to 3.77; effects to the left favour the control, to the right favour phonics]

Study          SMD (95% CI)            % Weight
Ability == 0
  Greaney       0.30 (−0.36, 0.95)       6.8
  Lovett89      0.22 (−0.14, 0.57)      23.1
  Lovett90     −0.20 (−0.85, 0.46)       6.9
  Martinussen   0.46 (−0.30, 1.21)       5.2
  O'Connor      0.57 (−0.59, 1.73)       2.2
  Torgesen99    0.07 (−0.34, 0.48)      17.3
  Torgesen01   −0.31 (−0.87, 0.24)       9.5
  Umbach        2.77 (1.77, 3.77)        2.9
  Subtotal      0.21 (0.01, 0.41)       73.9
Ability == 1
  Haskell       0.07 (−0.73, 0.87)       4.6
  Johnston      0.97 (0.43, 1.51)       10.1
  Leach         0.84 (−0.08, 1.75)       3.5
  Skailand     −0.17 (−0.78, 0.44)       8.0
  Subtotal      0.45 (0.11, 0.78)       26.1
Overall         0.27 (0.10, 0.45)      100.0
Sub-divided by intention to teach
Statistical methods to identify heterogeneity
• Visual inspection
• Presence
– Q statistic (Cooper & Hedges, 1994)
– Significance level (p-value)
• Extent
– I² (Higgins & Thompson, 2002)
– If I² exceeds 50%, it may be advisable not to combine the studies
• All have low power with a small number of studies (Huedo-Medina et al., 2006)
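As a concrete illustration, a minimal Python sketch of Q and I² under the definitions above (not the training pack's own code; the data values are illustrative):

```python
def q_statistic(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    return sum(w * (es - pooled) ** 2 for w, es in zip(weights, effects))

def i_squared(q, k):
    """I2 = 100 * (Q - df) / Q, truncated at zero (Higgins & Thompson, 2002)."""
    return max(0.0, 100 * (q - (k - 1)) / q)

effects = [0.30, 0.22, -0.20, 0.97]    # illustrative effect sizes
variances = [0.11, 0.03, 0.11, 0.08]   # illustrative within-study variances
q = q_statistic(effects, variances)
print(round(q, 2), round(i_squared(q, len(effects)), 1))
```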
Review
To recognise when it is appropriate to combine individual effect sizes (and solutions to heterogeneity: can a homogeneous set of effect sizes be created?)
• Educational heterogeneity: what educational features might explain variation?
– Pupil age, sex, attainment
– Teacher
– Intervention
– Interpretation
Exploring heterogeneity
• Practical task 2b: Heterogeneity
‘Pooling’ the results
• In a meta-analysis, the effects found across studies are combined or ‘pooled’ to produce a weighted average effect of all the studies - the summary effect.
• Each study is weighted according to some measure of its importance.
• In most meta-analyses, this is achieved by giving a weight to each study in inverse proportion to the variance of its effect.
• ‘Fixed effect’ and ‘random effects’ models are based on different statistical assumptions.
• The choice of model is determined by how much heterogeneity there is:
– fixed effect, if the studies are relatively homogeneous;
– random effects, if there is significant heterogeneity between study results.
(A sketch of the weighted pooling itself follows below.)
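To make the inverse-variance pooling concrete, a minimal Python sketch of a fixed-effect pooled estimate (a sketch under the assumptions above; names and data are illustrative):

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted average: each study weighted by 1/variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    # 95% confidence interval for the pooled effect
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

pooled, ci = fixed_effect_pool([0.30, 0.22, -0.20], [0.11, 0.03, 0.11])
print(round(pooled, 2), tuple(round(x, 2) for x in ci))
```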
Which model?
Fixed effect model
• The difference between the studies is due to chance
– Observed study effect = fixed effect + error
Key assumption: each study is from a distribution of studies which all estimate the same overall effect, but differ due to random error
Inverse Variance Weighting
• Problem
– Sample sizes in studies vary
– Larger studies are assumed to provide a better estimate of effect, so they should be more important in the synthesis and carry more “weight” than smaller studies
• Solution
– Simple approach: weight each ES by its sample size
– Better approach: weight by the inverse variance
Inverse variance weight: how is it calculated?
• The standard error (SE) is directly related to the precision of the ES
• SE is used to create confidence intervals
• The smaller the SE, the more precise the ES
• Hedges showed that the optimal weight for meta-analysis is:

w = 1 / SE²
Inverse Variance Weight formula
For Standardized Mean Differences:
w = 1 / se²

where, for the standardised mean difference,

se_sm = √( (n₁ + n₂) / (n₁n₂) + ES² / (2(n₁ + n₂)) )
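A minimal Python sketch of this weight calculation (following the formula above; the function name is illustrative):

```python
def smd_weight(es, n1, n2):
    """Inverse-variance weight for a standardised mean difference."""
    se_squared = (n1 + n2) / (n1 * n2) + es ** 2 / (2 * (n1 + n2))
    return 1 / se_squared

# e.g. an effect size of 0.5 from groups of 30 and 32 participants
print(round(smd_weight(0.5, 30, 32), 1))
```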
Random effects model
Assumes there are two components of variation:
1. Due to differences between the studies (e.g. different designs, different populations, variations in the intervention, different implementation, etc.)
2. Due to sampling error
Random effects model
• Each study is seen as representing the mean of a distribution of studies
• There is still a resultant overall effect size
Key assumption: each study estimates its own ‘true’ effect, drawn from a distribution of true effects; studies therefore differ by more than random error alone
Fixed and random effects models
Fixed effects model - weights each study by the inverse of the sampling variance.
Random effects model - weights each study by the inverse of the sampling variance plus the variability across the population effects.
Fixed effect: wᵢ = 1 / seᵢ²

Random effects: wᵢ = 1 / (seᵢ² + v̂)

where v̂ is the random effects variance component.
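A minimal Python sketch of random-effects pooling using the DerSimonian-Laird estimate of the variance component (the slides do not prescribe an estimator, so treat DL as an assumption; data values are illustrative):

```python
def dersimonian_laird_pool(effects, variances):
    """Random-effects pooling: add the estimated between-study variance
    (DerSimonian-Laird) to each study's sampling variance before weighting."""
    w = [1 / v for v in variances]
    fixed = sum(wi * es for wi, es in zip(w, effects)) / sum(w)
    q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    v_hat = max(0.0, (q - (len(effects) - 1)) / c)    # between-study variance
    w_star = [1 / (v + v_hat) for v in variances]     # random-effects weights
    return sum(wi * es for wi, es in zip(w_star, effects)) / sum(w_star)

print(round(dersimonian_laird_pool([0.30, 0.22, -0.20, 0.97],
                                   [0.11, 0.03, 0.11, 0.08]), 2))
```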
Combining effect sizes: running a meta-analysis
• Practical task 2c: ‘Models’
• Random and fixed effects models - focus on consequences – interpretation
• Sensitivity analysis – subgroup analysis – as solutions to educational heterogeneity
Impact of using fixed effect or random effects models on a meta-analysis
• Impact on significance levels and confidence intervals:
– confidence intervals will be wider with the random effects model
– a significant pooled ES under a fixed effect model may not be significant with the random effects model
• Random effects models are therefore considered more conservative
What is publication bias?
• Publication bias occurs when there are systematic differences between the conclusions of unpublished studies and those of published studies.
• Usually unpublished data are more likely to be ‘negative’ about an intervention than studies that are published.
How can we detect publication bias?
• One simple approach is through the use of a ‘funnel plot’.
• This is a graph where all the effect sizes are plotted on an x-axis whilst the size of the study (N) or the standard error (SE) is on the y-axis.
• If there is NO publication bias the plots will form an ‘inverted funnel’.
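A minimal Python sketch of drawing a funnel plot with matplotlib, simulating studies around a single true effect with no publication bias (all values and names are illustrative):

```python
import math
import random

import matplotlib.pyplot as plt

random.seed(1)
true_effect = 0.3
sizes = [random.randint(20, 600) for _ in range(60)]
# Smaller studies scatter more widely: for an SMD, SE is roughly sqrt(4/n)
effects = [random.gauss(true_effect, math.sqrt(4 / n)) for n in sizes]

plt.scatter(effects, sizes, s=12)
plt.axvline(true_effect, linestyle="--")
plt.xlabel("Effect size")
plt.ylabel("Sample size")
plt.title("Simulated funnel plot (no publication bias)")
plt.show()
```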
Hypothetical funnel plot showing little publication bias
[Funnel plot: effect size from −2 to 2 on the x-axis; sample size from 0 to 1,200 on the y-axis]
Review of Adult Literacy Teaching
Funnel Plot of Effect Size against Sample Size
[Funnel plot: effect size from −2 to 2 on the x-axis; sample size from 0 to 1,200 on the y-axis]
Torgerson, Porthouse & Brooks. JRR, 2003
Review of Phonics Instruction
[Funnel plot: effect size from −2.5 to 2.5 on the x-axis; sample size from 0 to 125 on the y-axis]
Funnel plots
Assumptions:
– larger studies are more likely to be accurate
– smaller studies will be more widely scattered
– publication bias will lead to asymmetry
Practical meta-analysis worksheet
• Practical 2d: Publication bias
Funnel plots: limitations
• Heterogeneity can lead to asymmetry
• The intervention may have a different effect in small studies compared with large ones
• Poorer quality (smaller) studies can produce asymmetry
• Funnel plots may tell us more about effects in smaller studies than about publication bias in particular
• Most meta-analyses contain too few studies to produce a recognisable ‘funnel’
Statistical tests for publication bias
• Begg’s test– Derives from funnel plot
• Egger’s test– More powerful than Begg’s
What to do about publication bias
• Trim and fill
– estimates missing studies and their effect sizes
• Sensitivity analysis
• Do not combine studies statistically
Break
Caution needed in meta-analysis when:
– studies are diverse (e.g. different interventions with different populations and/or comparison groups)
– outcomes are diverse
– the quality of included studies is poor
– there are significant publication and/or reporting biases
• All of these involve reviewer judgement
Stages of synthesis
1. What is the question? Theories and assumptions in the review question.
2. What data are available? Addressing the review question according to the conceptual framework.
3. What are the patterns in the data? Including study, intervention, outcome and participant characteristics. Can the conceptual framework be developed?
4. How does integrating the data answer the question? To address the question (including theory testing or development).
5. How robust is the synthesis? For quality, sensitivity, coherence and relevance.
6. What is the result? What does the result mean? (conclusions)
7. What new research questions emerge?
Cooper, H.M. (1982) Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291. See also: Popay et al. (2006) Guidance on the Conduct of Narrative Synthesis in Systematic Reviews. Lancaster: Institute for Health Research, Lancaster University. http://www.lancs.ac.uk/fass/projects/nssr/research.htm
Drawing conclusions
• An effect size is just a number
• It needs interpreting
• Interpretation needs to be done systematically and transparently
Interpretation of systematic review findings
• Reviews often produce generalised findings that require interpretation
– for different cultural, social and economic settings
– in conjunction with other types of knowledge
Applicability
‘A leap of faith is always required when applying any study findings to the population at large’ or to a specific person. ‘In making that jump, one must always strike a balance between making justifiable broad generalizations and being too conservative in one’s conclusions.’ (Friedman 1985)
– Review authors should not assume that the circumstances in which findings might be applied are similar to their own
– How much evidence, and of what quality, is enough to say ‘what works’ or ‘is effective’?
Researcher role: Interpreting review findings
• Reviews need to describe fully the intervention content and the context of studies (where possible)
• Reviews should consider and report evidence of:
– feasibility and acceptability
– results (balance between positive and negative effects)
– quality of studies
– number of study sites/participants, etc.
– who the recommendation is for and what it means
– cost/benefit issues
– values and preferences
– alternatives (instead of what?)
– need
Deriving practical meanings
• Interpreting findings (BEE, WWC)
• Transforming effect sizes
• Understanding the extent of the (average) difference and the importance of this
Interpreting the standardised mean difference (1)
• The ‘subjective’ approach
• Cohen (1988) proposed a general guideline that an effect size (d) of:
– 0.2 is a small effect
– 0.5 is a moderate effect
– 0.8 is a large effect
• This does not take into account either the underlying incidence of the outcome event or the quality/validity of the outcome measure
Interpreting effect sizes
• When might a pooled average effect size be misleading?
• When might a ‘small’ effect size be valuable?
• When might a ‘large’ effect size be unimportant?
Interpreting the standardised mean difference (2)
• The ‘objective’ approach
• Interprets the SMD in terms of the proportion of the control group who would be below the average person in the experimental group (e.g. Coe 2002)
• NB If the effect size is 0 then 50% of the control group will be below the average person in the experimental group
• AND 50% of the experimental group will also be below the average person in the experimental group.
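A minimal Python sketch of this interpretation, assuming normally distributed outcomes: Φ(d), the standard normal CDF at the effect size, gives the proportion of the control group below the experimental group's average.

```python
from statistics import NormalDist

def proportion_below(effect_size):
    """Proportion of the control group below the experimental group's mean,
    assuming normally distributed outcomes with equal SDs."""
    return NormalDist().cdf(effect_size)

print(round(proportion_below(0.0), 2))   # 0.5, as stated on the slide
print(round(proportion_below(0.65), 2))  # roughly 0.74
```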
Interpreting the standardised mean difference (3)
• The standardised mean difference represents the amount of a standard deviation that the two groups differ by
• This can therefore be converted back to a more ‘user-friendly’ number. For example:
– fruit and vegetable consumption was found to have increased by a standardised mean difference of 0.65
– if, at baseline, fruit and vegetable consumption was measured as 2.4 portions per day with a standard deviation of 0.9, we can say that the intervention increased consumption by 0.65 × 0.9 = 0.585 portions, or from 2.4 to nearly 3 portions per day
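The same back-conversion as a minimal Python sketch (values from the slide; the function name is illustrative):

```python
def smd_to_raw(effect_size, baseline_sd):
    """Convert a standardised mean difference back to raw outcome units."""
    return effect_size * baseline_sd

# 0.65 SDs of 0.9 portions/day: 2.4 -> roughly 3.0 portions per day
print(round(smd_to_raw(0.65, 0.9), 3))  # 0.585
```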
Interpreting the standardised mean difference (4)
[Bar chart of standardised mean differences, scale −0.5 to 2.5, for three outcomes: fruit and vegetables; fruit only; vegetables only]
Using the same scale, it is possible to compare the results of studies
Practical 3: Interpreting effect sizes
• Work with a partner to translate the effect sizes on the sheet
Software for meta-analysis
Commercial
• e.g. Comprehensive Meta-Analysis (CMA): http://www.MetaAnalysis.com
• MetaWin: http://www.metawinsoft.com
Free/shareware
• e.g. Review Manager for Cochrane reviews (RevMan): http://www.cc-ims.net/revman
• MetaAnalyst: http://tuftscaes.org/meta_analyst/
• MIX: Meta-analysis made easy: http://www.mix-for-meta-analysis.info/
See http://www.um.es/metaanalysis/software.php
Questions and evaluation
Acknowledgements
• This presentation is an outcome of the work of the ESRC-funded Researcher Development Initiative: “Training in the Quantitative synthesis of Intervention Research Findings in Education and Social Sciences” which ran from 2008-2011.
• The training was designed by Steve Higgins and Rob Coe (Durham University), Carole Torgerson (Birmingham University) and Mark Newman and James Thomas, Institute of Education, London University.
• The team acknowledges the support of Mark Lipsey, David Wilson and Herb Marsh in preparation of some of the materials, particularly Lipsey and Wilson’s (2001) “Practical Meta-analysis” and David Wilson’s slides at: http://mason.gmu.edu/~dwilsonb/ma.html (accessed 9/3/11).
• The materials are offered to the wider academic and educational community under a Creative Commons licence: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
• You should only use the materials for educational, not-for-profit use and you should acknowledge the source in any use.