Undertaking a Quantitative Synthesis
Undertaking a Quantitative Synthesis
Steve Higgins, Durham University
Robert Coe, Durham University
Mark Newman, EPPI Centre, IoE, London University
James Thomas, EPPI Centre, IoE, London University
Carole Torgerson, IEE, York University
Acknowledgements
• This presentation is an outcome of the work of the ESRC-funded Researcher Development Initiative: “Training in the Quantitative synthesis of Intervention Research Findings in Education and Social Sciences” which ran from 2008-2011.
• The training was designed by Steve Higgins and Rob Coe (Durham University), Carole Torgerson (Birmingham University) and Mark Newman and James Thomas, Institute of Education, London University.
• The team acknowledges the support of Mark Lipsey, David Wilson and Herb Marsh in preparation of some of the materials, particularly Lipsey and Wilson’s (2001) “Practical Meta-analysis” and David Wilson’s slides at: http://mason.gmu.edu/~dwilsonb/ma.html (accessed 9/3/11).
• The materials are offered to the wider academic and educational community under a Creative Commons licence: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
• You should only use the materials for educational, not-for-profit use and you should acknowledge the source in any use.
Background
• Training funded by the ESRC’s “Researcher Development Initiative”
• Collaboration between the Universities of Durham, York and the Institute of Education, University of London
• National initiative
• Levels 1 and 2
– Round 1: Durham, Edinburgh, London (2008-9)
– Round 2: Belfast, York, Cardiff (2009-10)
• Level 3
– Mark Lipsey, Edinburgh, 16th March 2010
– Larry Hedges, London, 7th June 2010
– Workshop at RCT Conference, York, September
• Doctoral support
– British Educational Research Association (BERA) student conferences
• Website and resource materials
Overall aims
• To support understanding of meta-analysis of intervention research findings in education and social sciences more broadly;
• To develop understanding of reviewing quantitative research literature;
• To describe the techniques and principles involved in meta-analysis, to support understanding of its benefits and limitations;
• To provide references and examples to support further work.
Learning outcomes for Level 1
• To understand the role of research synthesis in identifying messages about ‘what works’ from intervention research findings
• To understand the concept of effect size as a metric for comparing intervention research findings
• To be able to read and understand a forest plot of the results
• To be able to read a meta-analysis of intervention research findings, interpret the results, and draw conclusions.
Learning outcomes for Level 2
• To be able to identify and select relevant quantitative data from a published report which can be used to calculate effect sizes for meta-analysis;
• To be able to calculate an effect size from commonly found continuous (and clustered) data;
• To recognise when it is appropriate to combine individual effect sizes;
• To identify possible solutions to cope with heterogeneity;
• To be able to display and interpret the results of a meta-analysis.
Overview of the day
10.00 Arrival / registration / coffee
10.15 Introduction and overview
      Identifying data for synthesis
      Calculating effect sizes
12.30 Lunch
1.30 Combining effect sizes
      Assessing and coping with heterogeneity
3.00 Break
3.30 Overview of software for meta-analysis
      Summary, discussion and evaluation
4.00 Finish
Introductions
• Introduce yourself to those next to you
– What is your interest in meta-analysis?
– What experience do you have in this area?
Meta-analysis as synthesis
• Quantitative data from
– experimental research studies
– correlational research studies
– based on a systematic review
• Methodological assumptions from quantitative approaches (both epistemological and mathematical)
• Hypothesis testing & exploration
Key issues about reviews and evidence
• Applicability of the evidence to the question
– Breadth
– Scope
– Scale
• Robustness of the evidence– Research quality
Session key assumption
• We have found a group of studies that meet our inclusion criteria; that is, they evaluate the effectiveness of a similar intervention and measure similar outcome(s)
• How do we combine the results?
Stages of synthesis
1. What is the question? Theories and assumptions in the review question.
2. What data are available? Addressing the review question according to the conceptual framework.
3. What are the patterns in the data? Including study, intervention, outcome and participant characteristics. Can the conceptual framework be developed?
4. How does integrating the data answer the question? To address the question (including theory testing or development).
5. How robust is the synthesis? For quality, sensitivity, coherence and relevance.
6. What is the result? What does the result mean? (conclusions)
7. What new research questions emerge?
Cooper, H.M. (1982) Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291. See also: Popay et al. (2006) Guidance on the Conduct of Narrative Synthesis in Systematic Reviews. Lancaster: Institute for Health Research, Lancaster University. http://www.lancs.ac.uk/fass/projects/nssr/research.htm
Calculating effect sizes
• The difference between the two means, expressed as a proportion of the standard deviation:
ES = (Me − Mc) / SD
• Cohen's d
• Glass's Δ
• Hedges' g
Practical 1
Calculating effect sizes based on standardised mean differences
1. Basic calculation
2. Extracting data and using a web-based tool
a. Investigating the effect size
3. Identifying data from a paper
4. Converting other data
1a) Calculating an effect size
The intervention group’s average score was 28.5, the control group’s was 26.5, the pooled standard deviation was 4.
What was the effect size?
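A worked answer: ES = (28.5 − 26.5) / 4 = 0.5. The same calculation as a minimal Python sketch (the function name is illustrative, not from the training materials):

```python
def standardised_mean_difference(mean_intervention, mean_control, pooled_sd):
    """Standardised mean difference: ES = (Me - Mc) / SD."""
    return (mean_intervention - mean_control) / pooled_sd

# Values from the exercise: (28.5 - 26.5) / 4
print(standardised_mean_difference(28.5, 26.5, 4))  # 0.5
```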
1b) ‘Early Steps’
Log in to: http://eppi.ioe.ac.uk/EPPIReviewer4/EppiReviewer4TestPage.html
• Username: meta
• Password: analysis
1. To be able to identify and select relevant quantitative data from a published report that can be used to calculate effect sizes for meta-analysis
• Which effect?
• Which one is appropriate for your meta-analysis?
Greaney, K., Tunmer, W., & Chapman, J. (1997). The effects of rime-based orthographic analogy training on the word recognition skills of children with reading disability. Journal of Educational Psychology 89, 645-651.
2. To be able to calculate an effect size from commonly found continuous (and clustered) data
1c) Skim through the Greaney et al. (1997) paper. Imagine you are conducting a systematic review of the impact of interventions on reading. Work out the effect size which you think best shows whether the rime-based training is effective.
Calculating Effect Sizes
– Direct calculation based on means and standard deviations
– Algebraically equivalent formulas (t-test, SE)
– Exact probability value for a t-test
– Approximations based on continuous data (correlation coefficient)
– Estimates of the mean difference (adjusted means, regression B weight, gain score means)
– Estimates of the pooled standard deviation (gain score standard deviation, one-way ANOVA with 3 or more groups, ANCOVA)
– Approximations based on dichotomous data
Using other data
• Converting Standard Error to Standard Deviation
SD = SE × √n
So if the sample size (n) is 64 and the SE is 0.2, what is the SD?
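A worked answer: √64 = 8, so SD = 0.2 × 8 = 1.6. As a minimal Python sketch (the function name is illustrative):

```python
import math

def sd_from_se(se, n):
    """Convert a standard error to a standard deviation: SD = SE * sqrt(n)."""
    return se * math.sqrt(n)

print(sd_from_se(0.2, 64))  # 1.6
```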
Conversion
• Key issue: is the source data comparable?
– Lipsey and Wilson (2001) formulae
– Meta-analysis software
– Spreadsheet on data stick
• Open the spreadsheet ES_converter.xls
Lunch
Recap of outcomes
1. To be able to identify and select relevant quantitative data to calculate effect sizes for meta-analysis;
2. To be able to calculate an effect size from commonly found continuous (and clustered) data;
3. To recognise when it is appropriate to combine individual effect sizes;
4. To identify possible solutions to cope with heterogeneity;
5. To be able to display and interpret the results of a meta-analysis.
Running and exploring a meta-analysis
• Practical 2a: Running a meta-analysis
Identifying and exploring heterogeneity
• Key issues
– Statistical
– Educational
– Role of quality
– Lumpers and splitters
Assessing between study heterogeneity
• When effect sizes differ by no more than would be expected from chance (sampling) error, the effect size estimate is considered to be homogeneous (a unique ‘true’ effect).
• When the variability in effect sizes is greater than expected by chance, the effects are considered to be heterogeneous.
• The presence of heterogeneity affects the process of the meta-analysis
• What does this mean?
Review
Heterogeneity
Heterogeneity chi-squared = 41.74 (df = 11), p < 0.0001; Q statistic = 46.3, p < 0.001; I² = 76.24%
Sub-divided by learner characteristics
[Forest plot of standardised mean differences with 95% CIs; axis from −3.77 to 3.77; effects to the left favour the control, to the right favour phonics]

Study          SMD (95% CI)            % Weight
Ability == 0
  Greaney       0.30 (−0.36, 0.95)       6.8
  Lovett89      0.22 (−0.14, 0.57)      23.1
  Lovett90     −0.20 (−0.85, 0.46)       6.9
  Martinussen   0.46 (−0.30, 1.21)       5.2
  O'Connor      0.57 (−0.59, 1.73)       2.2
  Torgesen99    0.07 (−0.34, 0.48)      17.3
  Torgesen01   −0.31 (−0.87, 0.24)       9.5
  Umbach        2.77 (1.77, 3.77)        2.9
  Subtotal      0.21 (0.01, 0.41)       73.9
Ability == 1
  Haskell       0.07 (−0.73, 0.87)       4.6
  Johnston      0.97 (0.43, 1.51)       10.1
  Leach         0.84 (−0.08, 1.75)       3.5
  Skailand     −0.17 (−0.78, 0.44)       8.0
  Subtotal      0.45 (0.11, 0.78)       26.1
Overall         0.27 (0.10, 0.45)      100.0
Sub-divided by intention to teach
Statistical methods to identify heterogeneity
• Visual inspection
• Presence
– Q statistic (Cooper & Hedges, 1994)
– Significance level (p-value)
• Extent
– I² (Higgins & Thompson, 2002)
– If I² exceeds 50%, it may be advisable not to combine the studies
• All have low power with a small number of studies (Huedo-Medina et al., 2006)
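As a concrete illustration, a minimal Python sketch of Q and I² under the definitions above (not the training pack's own code; the data values are illustrative):

```python
def q_statistic(effects, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    return sum(w * (es - pooled) ** 2 for w, es in zip(weights, effects))

def i_squared(q, k):
    """I2 = 100 * (Q - df) / Q, truncated at zero (Higgins & Thompson, 2002)."""
    return max(0.0, 100 * (q - (k - 1)) / q)

effects = [0.30, 0.22, -0.20, 0.97]    # illustrative effect sizes
variances = [0.11, 0.03, 0.11, 0.08]   # illustrative within-study variances
q = q_statistic(effects, variances)
print(round(q, 2), round(i_squared(q, len(effects)), 1))
```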
Review
To recognise when it is appropriate to combine individual effect sizes (and solutions to heterogeneity: can a homogeneous set of effect sizes be created?)
• Educational heterogeneity: what educational features might explain variation?
– Pupil age, sex, attainment
– Teacher
– Intervention
– Interpretation
Exploring heterogeneity
• Practical task 2b: Heterogeneity
‘Pooling’ the results
• In a meta-analysis, the effects found across studies are combined or ‘pooled’ to produce a weighted average effect of all the studies - the summary effect.
• Each study is weighted according to some measure of its importance.
• In most meta-analyses, this is achieved by giving a weight to each study in inverse proportion to the variance of its effect.
• ‘Fixed effect’ and ‘random effects’ models are based on different statistical assumptions.
• The choice of model is determined by how much heterogeneity there is:
– fixed effect, if the studies are relatively homogeneous;
– random effects, if there is significant heterogeneity between study results.
(A sketch of the weighted pooling itself follows below.)
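To make the inverse-variance pooling concrete, a minimal Python sketch of a fixed-effect pooled estimate (a sketch under the assumptions above; names and data are illustrative):

```python
import math

def fixed_effect_pool(effects, variances):
    """Inverse-variance weighted average: each study weighted by 1/variance."""
    weights = [1 / v for v in variances]
    pooled = sum(w * es for w, es in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    # 95% confidence interval for the pooled effect
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)

pooled, ci = fixed_effect_pool([0.30, 0.22, -0.20], [0.11, 0.03, 0.11])
print(round(pooled, 2), tuple(round(x, 2) for x in ci))
```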
Which model?
Fixed effect model
• The difference between the studies is due to chance
– Observed study effect = fixed effect + error
Key assumption: each study is from a distribution of studies which all estimate the same overall effect, but differ due to random error
Inverse Variance Weighting
• Problem
– Sample sizes in studies vary
– Larger studies are assumed to provide a better estimate of effect, so they should be more important in the synthesis and carry more “weight” than smaller studies
• Solution
– Simple approach: weight each ES by its sample size
– Better approach: weight by the inverse variance
Inverse variance weight: how is it calculated?
• The standard error (SE) is directly related to the precision of the ES
• SE is used to create confidence intervals
• The smaller the SE, the more precise the ES
• Hedges showed that the optimal weight for meta-analysis is:

w = 1 / SE²
Inverse Variance Weight formula
For Standardized Mean Differences:
w = 1 / se²

where, for the standardised mean difference,

se_sm = √( (n₁ + n₂) / (n₁n₂) + ES² / (2(n₁ + n₂)) )
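A minimal Python sketch of this weight calculation (following the formula above; the function name is illustrative):

```python
def smd_weight(es, n1, n2):
    """Inverse-variance weight for a standardised mean difference."""
    se_squared = (n1 + n2) / (n1 * n2) + es ** 2 / (2 * (n1 + n2))
    return 1 / se_squared

# e.g. an effect size of 0.5 from groups of 30 and 32 participants
print(round(smd_weight(0.5, 30, 32), 1))
```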
Random effects model
Assumes there are two components of variation:
1. Due to differences between the studies (e.g. different designs, different populations, variations in the intervention, different implementation, etc.)
2. Due to sampling error
Random effects model
• Each study is seen as representing the mean of a distribution of studies
• There is still a resultant overall effect size
Key assumption: each study estimates its own ‘true’ effect, drawn from a distribution of true effects; studies therefore differ by more than random error alone
Fixed and random effects models
Fixed effects model - weights each study by the inverse of the sampling variance.
Random effects model - weights each study by the inverse of the sampling variance plus the variability across the population effects.
Fixed effect: wᵢ = 1 / seᵢ²

Random effects: wᵢ = 1 / (seᵢ² + v̂)

where v̂ is the random effects variance component.
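A minimal Python sketch of random-effects pooling using the DerSimonian-Laird estimate of the variance component (the slides do not prescribe an estimator, so treat DL as an assumption; data values are illustrative):

```python
def dersimonian_laird_pool(effects, variances):
    """Random-effects pooling: add the estimated between-study variance
    (DerSimonian-Laird) to each study's sampling variance before weighting."""
    w = [1 / v for v in variances]
    fixed = sum(wi * es for wi, es in zip(w, effects)) / sum(w)
    q = sum(wi * (es - fixed) ** 2 for wi, es in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    v_hat = max(0.0, (q - (len(effects) - 1)) / c)    # between-study variance
    w_star = [1 / (v + v_hat) for v in variances]     # random-effects weights
    return sum(wi * es for wi, es in zip(w_star, effects)) / sum(w_star)

print(round(dersimonian_laird_pool([0.30, 0.22, -0.20, 0.97],
                                   [0.11, 0.03, 0.11, 0.08]), 2))
```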
Combining effect sizes: running a meta-analysis
• Practical task 2c: ‘Models’
• Random and fixed effects models - focus on consequences – interpretation
• Sensitivity analysis – subgroup analysis – as solutions to educational heterogeneity
Impact of using fixed effect or random effects models on a meta-analysis
• Impact on significance levels and confidence intervals:
– confidence intervals will be wider with the random effects model
– a significant pooled ES under a fixed effect model may not be significant with the random effects model
• Random effects models are therefore considered more conservative
What is publication bias?
• Publication bias occurs when there are systematic differences between the conclusions of unpublished studies and those of published studies.
• Usually unpublished data are more likely to be ‘negative’ about an intervention than studies that are published.
How can we detect publication bias?
• One simple approach is through the use of a ‘funnel plot’.
• This is a graph where all the effect sizes are plotted on an x-axis whilst the size of the study (N) or the standard error (SE) is on the y-axis.
• If there is NO publication bias the plots will form an ‘inverted funnel’.
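A minimal Python sketch of drawing a funnel plot with matplotlib, simulating studies around a single true effect with no publication bias (all values and names are illustrative):

```python
import math
import random

import matplotlib.pyplot as plt

random.seed(1)
true_effect = 0.3
sizes = [random.randint(20, 600) for _ in range(60)]
# Smaller studies scatter more widely: for an SMD, SE is roughly sqrt(4/n)
effects = [random.gauss(true_effect, math.sqrt(4 / n)) for n in sizes]

plt.scatter(effects, sizes, s=12)
plt.axvline(true_effect, linestyle="--")
plt.xlabel("Effect size")
plt.ylabel("Sample size")
plt.title("Simulated funnel plot (no publication bias)")
plt.show()
```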
Hypothetical funnel plot showing little publication bias
[Funnel plot: effect size from −2 to 2 on the x-axis; sample size from 0 to 1,200 on the y-axis]
Review of Adult Literacy Teaching
Funnel Plot of Effect Size against Sample Size
[Funnel plot: effect size from −2 to 2 on the x-axis; sample size from 0 to 1,200 on the y-axis]
Torgerson, Porthouse & Brooks. JRR, 2003
Review of Phonics Instruction
[Funnel plot: effect size from −2.5 to 2.5 on the x-axis; sample size from 0 to 125 on the y-axis]
Funnel plots
Assumptions:
– larger studies are more likely to be accurate
– smaller studies will be more widely scattered
– publication bias will lead to asymmetry
Practical meta-analysis worksheet
• Practical 2d: Publication bias
Funnel plots: limitations
• Heterogeneity can lead to asymmetry
• The intervention may have a different effect in small studies compared with large ones
• Poorer quality (smaller) studies can produce asymmetry
• Funnel plots may tell us more about effects in smaller studies than about publication bias in particular
• Most meta-analyses contain too few studies to produce a recognisable ‘funnel’
Statistical tests for publication bias
• Begg’s test– Derives from funnel plot
• Egger’s test– More powerful than Begg’s
What to do about publication bias
• Trim and fill
– estimates missing studies and their effect sizes
• Sensitivity analysis
• Do not combine studies statistically
Break
Caution needed in meta-analysis when:
– studies are diverse (e.g. different interventions with different populations and/or comparison groups)
– outcomes are diverse
– the quality of included studies is poor
– there are significant publication and/or reporting biases
• All of these involve reviewer judgement
Stages of synthesis
1. What is the question? Theories and assumptions in the review question.
2. What data are available? Addressing the review question according to the conceptual framework.
3. What are the patterns in the data? Including study, intervention, outcome and participant characteristics. Can the conceptual framework be developed?
4. How does integrating the data answer the question? To address the question (including theory testing or development).
5. How robust is the synthesis? For quality, sensitivity, coherence and relevance.
6. What is the result? What does the result mean? (conclusions)
7. What new research questions emerge?
Cooper, H.M. (1982) Scientific guidelines for conducting integrative research reviews. Review of Educational Research, 52, 291. See also: Popay et al. (2006) Guidance on the Conduct of Narrative Synthesis in Systematic Reviews. Lancaster: Institute for Health Research, Lancaster University. http://www.lancs.ac.uk/fass/projects/nssr/research.htm
Drawing conclusions
• An effect size is just a number
• It needs interpreting
• Interpretation needs to be done systematically and transparently
Interpretation of systematic review findings
• Reviews often produce generalised findings that require interpretation
– for different cultural, social and economic settings
– in conjunction with other types of knowledge
Applicability
‘A leap of faith is always required when applying any study findings to the population at large’ or to a specific person. ‘In making that jump, one must always strike a balance between making justifiable broad generalizations and being too conservative in one’s conclusions.’ (Friedman 1985)
– Review authors should not assume that the circumstances in which findings might be applied are similar to their own
– How much evidence, and of what quality, is enough to say ‘what works’ or ‘is effective’?
Researcher role: Interpreting review findings
• Reviews need to describe fully the intervention content and the context of studies (where possible)
• Reviews should consider and report evidence of:
– feasibility and acceptability
– results (balance between positive and negative effects)
– quality of studies
– number of study sites/participants, etc.
– who the recommendation is for and what it means
– cost/benefit issues
– values and preferences
– alternatives (instead of what?)
– need
Deriving practical meanings
• Interpreting findings (BEE, WWC)
• Transforming effect sizes
• Understanding the extent of the (average) difference and the importance of this
Interpreting the standardised mean difference (1)
• The ‘subjective’ approach
• Cohen (1988) proposed a general guideline that an effect size (d) of:
– 0.2 is a small effect
– 0.5 is a moderate effect
– 0.8 is a large effect
• This does not take into account either the underlying incidence of the outcome event or the quality/validity of the outcome measure
Interpreting effect sizes
• When might a pooled average effect size be misleading?
• When might a ‘small’ effect size be valuable?
• When might a ‘large’ effect size be unimportant?
Interpreting the standardised mean difference (2)
• The ‘objective’ approach
• Interprets the SMD in terms of the proportion of the control group who would be below the average person in the experimental group (e.g. Coe 2002)
• NB If the effect size is 0 then 50% of the control group will be below the average person in the experimental group
• AND 50% of the experimental group will also be below the average person in the experimental group.
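A minimal Python sketch of this interpretation, assuming normally distributed outcomes: Φ(d), the standard normal CDF at the effect size, gives the proportion of the control group below the experimental group's average.

```python
from statistics import NormalDist

def proportion_below(effect_size):
    """Proportion of the control group below the experimental group's mean,
    assuming normally distributed outcomes with equal SDs."""
    return NormalDist().cdf(effect_size)

print(round(proportion_below(0.0), 2))   # 0.5, as stated on the slide
print(round(proportion_below(0.65), 2))  # roughly 0.74
```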
Interpreting the standardised mean difference (3)
• The standardised mean difference represents the amount of a standard deviation that the two groups differ by
• This can therefore be converted back to a more ‘user-friendly’ number. For example:
– fruit and vegetable consumption was found to have increased by a standardised mean difference of 0.65
– if, at baseline, fruit and vegetable consumption was measured as 2.4 portions per day with a standard deviation of 0.9, we can say that the intervention increased consumption by 0.65 × 0.9 = 0.585 portions, or from 2.4 to nearly 3 portions per day
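The same back-conversion as a minimal Python sketch (values from the slide; the function name is illustrative):

```python
def smd_to_raw(effect_size, baseline_sd):
    """Convert a standardised mean difference back to raw outcome units."""
    return effect_size * baseline_sd

# 0.65 SDs of 0.9 portions/day: 2.4 -> roughly 3.0 portions per day
print(round(smd_to_raw(0.65, 0.9), 3))  # 0.585
```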
Interpreting the standardised mean difference (4)
[Bar chart of standardised mean differences, scale −0.5 to 2.5, for three outcomes: fruit and vegetables; fruit only; vegetables only]
Using the same scale, it is possible to compare the results of studies
Practical 3: Interpreting effect sizes
• Work with a partner to translate the effect sizes on the sheet
Software for meta-analysis
Commercial
• e.g. Comprehensive Meta-Analysis (CMA): http://www.MetaAnalysis.com
• MetaWin: http://www.metawinsoft.com
Free/shareware
• e.g. Review Manager for Cochrane reviews (RevMan): http://www.cc-ims.net/revman
• MetaAnalyst: http://tuftscaes.org/meta_analyst/
• MIX: Meta-analysis made easy: http://www.mix-for-meta-analysis.info/
See http://www.um.es/metaanalysis/software.php
Questions and evaluation
Acknowledgements
• This presentation is an outcome of the work of the ESRC-funded Researcher Development Initiative: “Training in the Quantitative synthesis of Intervention Research Findings in Education and Social Sciences” which ran from 2008-2011.
• The training was designed by Steve Higgins and Rob Coe (Durham University), Carole Torgerson (Birmingham University) and Mark Newman and James Thomas, Institute of Education, London University.
• The team acknowledges the support of Mark Lipsey, David Wilson and Herb Marsh in preparation of some of the materials, particularly Lipsey and Wilson’s (2001) “Practical Meta-analysis” and David Wilson’s slides at: http://mason.gmu.edu/~dwilsonb/ma.html (accessed 9/3/11).
• The materials are offered to the wider academic and educational community under a Creative Commons licence: Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
• You should only use the materials for educational, not-for-profit use and you should acknowledge the source in any use.