
Meta-Analysis: A Gentle Introduction to Research Synthesis

Gianna Rendina-Gobioff
Jeff Kromrey

Research Methods in a Nutshell
College of Education Presentation

December 8, 2006

Discussion Outline

Overview
Types of research questions
Literature search and retrieval
Coding and dependability
Effect sizes
Describing results
Testing hypotheses
Threats to validity
Reporting meta-analyses
References worth pursuing

Overview

Summarization of empirical studies using quantitative methods

Results:
Estimated weighted mean effect size
Confidence interval around mean effect size (or test of null hypothesis about mean effect size)
Homogeneity of effect sizes
Tests of moderators

Overview: Why Meta-Analyze?

Strength in numbers
Several ‘non-significant’ differences may be significant when combined

Strength in diversity
Generalizability across a variety of participants, settings, and instruments
Identification of moderating variables

A good way to look at the forest rather than the trees
What do we think we know about a phenomenon? How well do we know it? What remains to be investigated?

It’s fun!

Overview: Stages of Meta-Analysis

Formulate problem
Draw sample / collect observations
Measure observations
Analyze data
Interpret data
Disseminate

Types of Research Questions: Treatments

Is the treatment (in general) effective? How effective?

Does treatment effectiveness vary by:
Participant characteristics?
Treatment characteristics?
Research method characteristics?

Is the treatment ineffective in some conditions?

Types of Research Questions: Relationships

What is the relationship (in general)? Direction? Strength?

Does direction or strength of relationship vary by:
Participant characteristics?
Treatment characteristics?
Research method characteristics?

Is the relationship not evident in some conditions?

Literature Search and Retrieval

Decisions to make before searching the literature

Inclusion/exclusion criteria for sources:
Types of publication
Language and country of publication
Dissemination: journal, presentation, unpublished

Study characteristics:
Participant characteristics
Information reported
Timeframe
Type of design
Measures

Literature Search and Retrieval

Decisions to make before searching the literature

Search strategies:
Keywords
Databases: ERIC, PsycINFO, Google Scholar, Web of Science
Other sources: key researchers, listservs, websites, reference sections of articles

Coding of Studies

Record:
Study inclusion/exclusion characteristics
Effect size(s): multiple measures? subsamples? different times?
Other relevant variables:
Research design (sampling, controls, treatment, duration)
Participant attributes (age, sex, race/ethnicity, inclusion/exclusion)
Settings (geography, classrooms, laboratory)
Dissemination characteristics (journal, conference, dissertation, year, Dr. B)

Coding of Studies (Cont’d)

Written codebook and coding forms
Goldilocks principle: not too coarse, not too fine
Training and calibration of coders: beware of drift
Estimating reliability of coders (see the kappa sketch below)
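Inter-coder reliability is often summarized with an agreement statistic such as Cohen's kappa, which corrects percent agreement for chance. A minimal Python sketch, using hypothetical codings of research design for ten studies:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning categories to the same studies."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codings of each study's research design:
coder_a = ["RCT", "quasi", "RCT", "RCT", "quasi", "RCT", "quasi", "RCT", "RCT", "quasi"]
coder_b = ["RCT", "quasi", "RCT", "quasi", "quasi", "RCT", "quasi", "RCT", "RCT", "RCT"]
print(f"kappa = {cohens_kappa(coder_a, coder_b):.2f}")
```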

Study Coding Form

Meta-Analysis Coding Part I: Increased levels of stress will reduce the likelihood of ART treatment success.

STUDY TITLE:

I. Qualifying the study: Answer the following questions as either “yes” or “no”.

Does the study involve women participating in an ART treatment program?

Does the study focus on the relationship between stress and ART treatment outcomes?

Was the study conducted between January 1985 and December 2003?

Does the study employ a prospective design?

Does the study report outcome measures of stress or anxiety as well as ART treatment outcomes?

If the answer to each of the above questions is yes, the study qualifies for inclusion in the meta-analysis.

II. Coding the study:

A. Publication Characteristics

1. Title of the study:

2. Year of Publication:

3. Authors:

B. Ecological Characteristics

1. Age of Female Participants: Mean: Range:

2. Country:

3. Race:

White N: %:

Black N: %:

Hispanic N: %:

Asian / Pacific Islander N: %:

American Indian N: %:

Other N: %:

Duration of Psychoeducational Intervention (Please choose):
a. Daily for duration of ART treatment
b. 1 – 3 sessions during ART treatment
c. 6 weeks during ART treatment
d. 8 weeks during ART treatment
e. 10 weeks during ART treatment
f. Other:

Length of Psychoeducational Intervention (Please choose):
a. 1 hour
b. 1.5 hours
c. 2 hours
d. Other:

Frequency of Psychoeducational Intervention (Please choose):
a. Daily
b. Weekly
c. Bi-weekly
d. Other:

Effect Size

How false is the null hypothesis? How effective is the treatment? How strong is the relationship?

Independent of sample size (more or less)
Useful in primary studies and in meta-analysis
Links to power
Descriptive statistic (big enough to care?)

Effect Size (Cont’d)

Jacob Cohen, Statistical Power Analysis for the Behavioral Sciences:
Anytime a statistical hypothesis is tested, an effect size is lurking in there somewhere
Small, medium, large effects
A medium effect size is big enough to be seen by the naked eye of the careful but naïve observer

Effect Size: Standardized Mean Difference

Population effect size: $\delta = \dfrac{\mu_1 - \mu_2}{\sigma}$

Sample effect size: $d = \dfrac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\text{pooled}}}$

Small = .20, Medium = .50, Large = .80
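To make the formula concrete, a minimal Python sketch of d computed from two raw samples; the data below are invented:

```python
import math

def cohens_d(x1, x2):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)   # sum of squared deviations, group 1
    ss2 = sum((x - m2) ** 2 for x in x2)
    sd_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

treatment = [12, 14, 11, 15, 13, 12]   # hypothetical scores
control = [10, 11, 9, 12, 10, 11]
print(f"d = {cohens_d(treatment, control):.2f}")
```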

Effect Size: Chi-square Tests

Population effect size: $w = \sqrt{\sum_j \dfrac{(P_{aj} - P_{oj})^2}{P_{oj}}}$, where the $P_{oj}$ are the null-hypothesis proportions and the $P_{aj}$ the alternative proportions

Sample effect size: $\hat{w} = \sqrt{\dfrac{\chi^2}{N}}$

Small = .10, Medium = .30, Large = .50

Effect Size: ANOVA and Regression

ANOVA: $\hat{f} = \sqrt{\dfrac{F(k - 1)}{N}}$

Regression (test of $R^2$): $f^2 = \dfrac{R^2}{1 - R^2}$ (signal over noise)

Regression (test of $R^2$ change): $f^2 = \dfrac{R^2_{\text{change}}}{1 - R^2_L}$ (signal over remaining noise, where $R^2_L$ is the squared multiple correlation for the larger model)

Effect Size: Correlation

The Pearson product-moment correlation is itself an effect size

Commonly transformed to Fisher's z for aggregation and analyses: $z_r = 0.5 \log_e \left( \dfrac{1 + r_{xy}}{1 - r_{xy}} \right)$

Small = .10, Medium = .30, Large = .50
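A minimal sketch of the transformation and its back-transformation (Fisher's z is the inverse hyperbolic tangent of r); the three correlations are hypothetical:

```python
import math

def fisher_z(r):
    """Transform a correlation r to Fisher's z."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transform a mean z to the correlation metric."""
    return math.tanh(z)

# Hypothetical correlations from three studies, averaged in the z metric:
rs = [0.30, 0.45, 0.25]
z_mean = sum(fisher_z(r) for r in rs) / len(rs)
print(f"mean r (via z) = {inverse_fisher_z(z_mean):.3f}")
```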

Effect Size: Computing from Reported Statistics

Article information: $\bar{X}_1 = 12.58,\ \hat{\sigma}_1 = 3.22,\ n_1 = 20$ and $\bar{X}_2 = 10.37,\ \hat{\sigma}_2 = 2.92,\ n_2 = 24$

Knowing $\hat{\sigma}_{\text{pooled}} = \sqrt{\dfrac{SS_1 + SS_2}{n_1 + n_2 - 2}}$ and $d = \dfrac{\bar{X}_1 - \bar{X}_2}{\hat{\sigma}_{\text{pooled}}}$, and that $\hat{\sigma}_i^2 = \dfrac{SS_i}{n_i - 1}$, so $SS_i = (n_i - 1)\hat{\sigma}_i^2$:

$SS_1 = 19(3.22^2) = 196.999, \quad SS_2 = 23(2.92^2) = 196.107$

$\hat{\sigma}_{\text{pooled}} = \sqrt{\dfrac{196.999 + 196.107}{20 + 24 - 2}} = 3.06$

$d = \dfrac{12.58 - 10.37}{3.06} = 0.72$
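The same computation as a short Python sketch, reproducing the numbers above:

```python
import math

def d_from_summary(m1, sd1, n1, m2, sd2, n2):
    """d from reported means, standard deviations, and sample sizes."""
    ss1 = (n1 - 1) * sd1 ** 2   # recover SS_i from the reported sigma-hat_i
    ss2 = (n2 - 1) * sd2 ** 2
    sd_pooled = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    return (m1 - m2) / sd_pooled

print(round(d_from_summary(12.58, 3.22, 20, 10.37, 2.92, 24), 2))  # 0.72
```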

Effect Size: Computing from Reported Statistics

Article information: t(54) = 4.52, p < .05

$d = \dfrac{2t}{\sqrt{df}} = \dfrac{2(4.52)}{\sqrt{54}} = 1.23$
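And a one-function sketch of the same conversion from a reported t:

```python
import math

def d_from_t(t, df):
    """Approximate d from an independent-samples t statistic: d = 2t / sqrt(df)."""
    return 2 * t / math.sqrt(df)

print(round(d_from_t(4.52, 54), 2))  # 1.23
```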


Describing Results: Graphical Displays

[Funnel plot display for example data (k = 10): effect size (−0.60 to 0.60) plotted against sample size (50 to 200)]
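A minimal matplotlib sketch of such a display; the ten (sample size, effect size) pairs are invented for illustration:

```python
import matplotlib.pyplot as plt

# Hypothetical effect sizes and sample sizes for k = 10 studies:
sample_sizes = [40, 55, 60, 75, 90, 110, 130, 150, 170, 200]
effect_sizes = [0.55, -0.20, 0.48, 0.05, 0.35, 0.18, 0.25, 0.22, 0.20, 0.21]

plt.scatter(sample_sizes, effect_sizes)
plt.axhline(0.0, linestyle="--", linewidth=1)   # reference line at zero effect
plt.xlabel("Sample Size")
plt.ylabel("Effect Size")
plt.title("Funnel Plot Display for Example Data (k=10)")
plt.show()
```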

Describing Results: Graphical Displays

Stem-and-leaf plot of the example effect sizes:

 0 | 44
 0 | 333
 0 | 2
 0 |
 0 |
-0 | 111
-0 | 2

Describing Results: Graphical Displays

[Diagram: all observed effect sizes, d1 through d6, drawn from a single population]

Testing Hypotheses

[Diagram: observed effect sizes d1 through d6 drawn from two populations, males and females]

Testing Hypotheses

[Histograms: frequency distributions of the population effect size under each scenario]

Testing Hypotheses: Fixed Effects vs. Random Effects

Fixed Effects
Assumes one population effect size
Effect size variance = sampling error (subjects)
Weights represent study variance due to sampling error associated with the subjects (sample size)

Testing Hypotheses: Fixed Effects vs. Random Effects

Random Effects
Assumes the population effect size is a normal distribution of values (i.e., not one effect size)
Effect size variance = sampling error (subjects) + random effects (study)
Weights represent study variance due to sampling error associated with the subjects (sample size) and the sampling of studies (random effects variance component)

Testing Hypotheses: Fixed Effects vs. Random Effects

Which model to use? Aspects to consider:

Statistics: decide based on the outcome of the homogeneity of effect sizes statistic (conditionally random-effects)

Desired inferences: decide based on the inferences the researcher would like to make
Conditional inferences (fixed effects model): the researcher can generalize only to the studies included in the meta-analysis
Unconditional inferences (random effects model): the researcher can generalize beyond the studies included in the meta-analysis

Number of studies: when the number of studies is small, fixed effects may be more appropriate

Testing Hypotheses: Estimation of Weights

Fixed effects weight: $v_i = \dfrac{1}{\sigma_i^2}$

Random effects weight: $w_i = \dfrac{1}{\sigma_i^2 + \tau^2}$

For the standardized mean difference: $\sigma_i^2 = \dfrac{n_1 + n_2}{n_1 n_2} + \dfrac{d_i^2}{2(n_1 + n_2)}$
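A small Python sketch of these weights for the standardized mean difference; the tau-squared passed to the random effects weight is treated as known here (in practice it is estimated, as shown on later slides):

```python
def smd_variance(d, n1, n2):
    """Sampling variance of a standardized mean difference d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

def fixed_weight(d, n1, n2):
    return 1.0 / smd_variance(d, n1, n2)

def random_weight(d, n1, n2, tau2):
    return 1.0 / (smd_variance(d, n1, n2) + tau2)

# The d = 0.72, n1 = 20, n2 = 24 example from earlier; tau2 is hypothetical:
print(fixed_weight(0.72, 20, 24))
print(random_weight(0.72, 20, 24, tau2=0.05))
```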

Testing Hypotheses: Weighted Mean Effect Size

Fixed effects: $\bar{d}_{..} = \dfrac{\sum v_i d_i}{\sum v_i}$, with $\operatorname{var}(\bar{d}_{..}) = \dfrac{1}{\sum v_i}$

Random effects: $\bar{d}_{..} = \dfrac{\sum w_i d_i}{\sum w_i}$, with $\operatorname{var}(\bar{d}_{..}) = \dfrac{1}{\sum w_i}$
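Both models reduce to the same weighted-average computation, so a single sketch serves for either, given the appropriate weights:

```python
def weighted_mean_es(effects, weights):
    """Weighted mean effect size and its variance (pass v_i or w_i)."""
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    variance = 1.0 / sum(weights)
    return mean, variance

# Hypothetical effects and fixed effects weights for three studies:
mean_d, var_d = weighted_mean_es([0.72, 0.40, 0.15], [10.2, 8.5, 12.0])
print(mean_d, var_d)
```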

Testing Hypotheses: Estimates of Effect Size Variance

The between-studies variance, symbolized $\tau^2$, is also called the Random Effects Variance Component (REVC)
Used to calculate random effects weights
Three methods of estimation: observed variance, Q-based, maximum likelihood

Testing Hypotheses: Estimates of Effect Size Variance (Cont’d)

Observed variance: $\hat{\tau}^2 = s_d^2 - \dfrac{\sum \hat{\sigma}_i^2}{k}$

Q-based: $\hat{\tau}^2 = \dfrac{Q - (k - 1)}{c}$, where $c = \sum v_i - \dfrac{\sum v_i^2}{\sum v_i}$

Maximum likelihood: choose $\hat{\delta}$ and $\hat{\tau}^2$ to maximize

$L(\delta, \tau^2; y_1, \ldots, y_k) = \prod_{i=1}^{k} \left[ 2\pi \left( \tau^2 + \sigma_i^2 \right) \right]^{-1/2} \exp\left[ -\dfrac{(y_i - \delta)^2}{2\left( \tau^2 + \sigma_i^2 \right)} \right]$
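A sketch of the Q-based estimator (commonly attributed to DerSimonian and Laird), truncating negative estimates at zero, a common convention:

```python
def q_based_tau2(effects, v):
    """Q-based estimate of tau^2 from effects d_i and fixed effects weights v_i."""
    k = len(effects)
    d_bar = sum(vi * di for vi, di in zip(v, effects)) / sum(v)
    q = sum(vi * (di - d_bar) ** 2 for vi, di in zip(v, effects))
    c = sum(v) - sum(vi ** 2 for vi in v) / sum(v)
    return max(0.0, (q - (k - 1)) / c)   # truncate negative estimates at zero
```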

Testing Hypotheses: Significance Testing and Confidence Intervals (CI)

Significance testing: $Z = \dfrac{\bar{d}_{..} - 0}{\sqrt{\operatorname{var}(\bar{d}_{..})}}$

Confidence interval (95% CI): $\bar{d}_{..} \pm 1.96 \sqrt{\operatorname{var}(\bar{d}_{..})}$
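A sketch combining the test statistic and interval, given the weighted mean and its variance from the earlier slide:

```python
import math

def z_and_ci(mean_es, var_mean_es):
    """Z test of H0: mean effect size = 0, and the 95% confidence interval."""
    se = math.sqrt(var_mean_es)
    z = mean_es / se
    ci = (mean_es - 1.96 * se, mean_es + 1.96 * se)
    return z, ci

print(z_and_ci(0.42, 0.0326))   # hypothetical mean and variance
```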

Testing Hypotheses: Mean and Individual Effect Size Differences

Focused test of between-group differences: $Q_{BET} = \sum_j \dfrac{(\bar{d}_j - \bar{d}_{..})^2}{\operatorname{var}(\bar{d}_j)}$

General test of homogeneity of effect sizes: $Q = \sum_i \dfrac{(d_i - \bar{d}_{..})^2}{\operatorname{var}(d_i)}$
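A sketch of the general homogeneity statistic; since var(d_i) = 1/v_i, each squared deviation is simply weighted by v_i. Under homogeneity, Q is approximately chi-square with k − 1 degrees of freedom:

```python
def q_statistic(effects, v):
    """General homogeneity test statistic Q (compare to chi-square, df = k - 1)."""
    d_bar = sum(vi * di for vi, di in zip(v, effects)) / sum(v)
    return sum(vi * (di - d_bar) ** 2 for vi, di in zip(v, effects))
```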

Testing Hypotheses: Meta-Analytic Regression Model

A generalization of the Q test
Continuous or categorical moderators; the $X_j$ are potential moderating variables

Test for moderating effects: $d_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + e_i$
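A sketch of fitting this model by weighted least squares with numpy; the single moderator (treatment duration) and all values below are hypothetical:

```python
import numpy as np

def meta_regression(d, X, weights):
    """Weighted least squares fit of d = beta_0 + X @ beta + e."""
    X = np.column_stack([np.ones(len(d)), X])   # prepend the intercept column
    W = np.diag(weights)
    return np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(d))

# Hypothetical effect sizes, a duration moderator (weeks), and weights:
d = [0.72, 0.40, 0.15, 0.55, 0.30, 0.20]
duration = [10, 8, 2, 9, 6, 3]
v = [10.2, 8.5, 12.0, 9.1, 11.3, 7.8]
print(meta_regression(d, duration, v))   # [intercept, duration slope]
```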

Threats to Validity

Sources:
Primary studies: unreliability, restriction of range, missing effect sizes (publication bias), incompatible constructs, and poor quality
Meta-analysis processes: incomplete data collection (publication bias), inaccurate data collection, poor methodology, and inadequate power

Threats to Validity

Apples and Oranges
Dependent Effect Sizes
File Drawer / Publication Bias
Methodological Rigor
Power

Threats to Validity

Apples and Oranges: are the studies being analyzed similar regarding:
Constructs examined
Measures
Participants (sampled from the same population?)
Analyses

Dependent Effect Sizes: participants cannot contribute to the mean effect size more than once

Threats to Validity: Publication Bias

Publication bias = studies unavailable to the meta-analyst due to lack of publication acceptance or submission (termed the “file drawer problem” by Rosenthal, 1979)

Pattern in the literature:

                            Small effect size                Large effect size
Small variance (N = large)  Published (stat. sig.)           Published (stat. sig.)
Large variance (N = small)  Not published (not stat. sig.)   Published (stat. sig.)

Threats to Validity: Publication Bias

Publication Bias Detection Methods
Visual interpretation: funnel plot display
Statistical methods: Begg Rank Correlation (variance or sample size), Egger Regression, Funnel Plot Regression, Trim and Fill (a sketch of the Egger approach follows below)
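As one example from this list, a minimal sketch of the Egger regression intercept: regress each study's standardized effect (d_i / se_i) on its precision (1 / se_i); an intercept far from zero suggests funnel plot asymmetry. The effects and standard errors below are hypothetical:

```python
import numpy as np

def egger_intercept(effects, ses):
    """Intercept from regressing standardized effects on precision."""
    precision = 1.0 / np.asarray(ses)
    standardized = np.asarray(effects) * precision
    X = np.column_stack([np.ones(len(effects)), precision])
    intercept, slope = np.linalg.lstsq(X, standardized, rcond=None)[0]
    return intercept

effects = [0.55, 0.48, 0.35, 0.25, 0.22, 0.20]
ses = [0.30, 0.28, 0.20, 0.15, 0.12, 0.10]
print(round(egger_intercept(effects, ses), 2))
```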

Threats to Validity

Methodological Rigor of Primary Studies
Set criteria for inclusion
Include various levels of rigor; then code and use in meta-analytic analyses (moderators or quality weights)

Power
Have enough studies been collected to achieve significant findings?

Reporting Meta-Analyses: Pertinent Information to Include

Details regarding the search criteria and retrieval
Coding process, including rater reliability
Describe effect sizes graphically
Analyses: mean effect size (significance test and CI), fixed vs. random effects model, homogeneity of effect sizes, tests for moderators
How threats to validity were addressed

For Further Reading & Thinking

Bangert-Drowns, R. L. (1986). Review of developments in meta-analysis method. Psychological Bulletin, 99, 388-399.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.
Cooper, H. & Hedges, L. (1994). The handbook of research synthesis. New York: Russell Sage Foundation.
Fern, E. F. & Monroe, K. B. (1996). Effect size estimates: Issues and problems in interpretation. Journal of Consumer Research, 23, 89-105.
Grissom, R. J. & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. Psychological Methods, 6(2), 135-146.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. San Diego, CA: Academic Press.
Hedges, L. V. & Vevea, J. (1998). Fixed- and random-effects models in meta-analysis. Psychological Methods, 3, 486-504.
Hedges, L. V. & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6, 203-217.
Hedges, L. V. & Pigott, T. D. (2004). The power of statistical tests for moderators in meta-analysis. Psychological Methods, 9, 426-445.

For Further Reading & Thinking

Hogarty, K. Y. & Kromrey, J. D. (2000). Robust effect size estimates and meta-analytic tests of homogeneity. Proceedings of SAS Users Group International, 1139-1144.
Hogarty, K. Y. & Kromrey, J. D. (2001, April). We’ve been reporting some effect sizes: Can you guess what they mean? Paper presented at the annual meeting of the American Educational Research Association, Seattle.
Hogarty, K. Y. & Kromrey, J. D. (2003). Permutation tests for linear models in meta-analysis: Robustness and power under non-normality and variance heterogeneity. Proceedings of the American Statistical Association. Alexandria, VA: American Statistical Association.
Huberty, C. J. & Lowman, L. L. (2000). Group overlap as a basis for effect size. Educational and Psychological Measurement, 60, 543-563.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings (2nd ed.). Newbury Park, CA: Sage.
Kromrey, J. D., Ferron, J. D., Hess, M. R., Hogarty, K. Y. & Hines, C. V. (2005, April). Robust inference in meta-analysis: Comparing point and interval estimates using standardized mean differences and Cliff’s delta. Paper presented at the annual meeting of the American Educational Research Association, Montreal.
Kromrey, J. D. & Foster-Johnson, L. (1996). Determining the efficacy of intervention: The use of effect sizes for data analysis in single-subject research. Journal of Experimental Education, 65, 73-93.

For Further Reading & Thinking

Kromrey, J. D. & Hogarty, K. Y. (2002). Estimates of variance components in random effects meta-analysis: Sensitivity to violations of normality and variance homogeneity. Proceedings of the American Statistical Association. Alexandria, VA: American Statistical Association.
Kromrey, J. D., Hogarty, K. Y., Ferron, J. M., Hines, C. V. & Hess, M. R. (2005, August). Robustness in meta-analysis: An empirical comparison of point and interval estimates of standardized mean differences and Cliff’s delta. Proceedings of the American Statistical Association Joint Statistical Meetings.
Kromrey, J. D. & Foster-Johnson, L. (1999, February). Effect sizes, cause sizes and the interpretation of research results: Confounding effects of score variance on effect size estimates. Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head, South Carolina.
Kromrey, J. D. & Rendina-Gobioff, G. (2006). On knowing what we don’t know: An empirical comparison of methods to detect publication bias in meta-analysis. Educational and Psychological Measurement, 66, 357-373.
Lipsey, M. W. & Wilson, D. B. (1993). The efficacy of psychological, educational, and behavioral treatment: Confirmation from meta-analysis. American Psychologist, 48, 1181-1209.

For Further Reading & Thinking

Lipsey, M. W. & Wilson, D. B. (2001). Practical meta-analysis. Thousand Oaks, CA: Sage.
National Research Council (1992). Combining information: Statistical issues and opportunities for research. Washington, DC: National Academy of Science Press.
Rendina-Gobioff, G. (2006). Detecting publication bias in random effects meta-analysis: An empirical comparison of statistical methods. Unpublished doctoral dissertation, University of South Florida, Tampa.
Rendina-Gobioff, G., Kromrey, J. D., Dedrick, R. F., & Ferron, J. M. (2006, November). Detecting publication bias in random effects meta-analysis: An investigation of the performance of statistical methods. Paper presented at the annual meeting of the Florida Educational Research Association, Jacksonville.
Rendina-Gobioff, G. & Kromrey, J. D. (2006, October). PUB_BIAS: A SAS® macro for detecting publication bias in meta-analysis. Paper presented at the annual meeting of the Southeast SAS Users Group, Atlanta.
Rosenthal, R. (1995). Writing meta-analytic reviews. Psychological Bulletin, 118, 183-192.
Sutton, A. J., Abrams, K. R., Jones, D. R., Sheldon, T. A., & Song, F. (2000). Methods of meta-analysis in medical research. New York: Wiley.
Van den Noortgate, W. & Onghena, P. (2003). Multilevel meta-analysis: A comparison with traditional meta-analytical procedures. Educational and Psychological Measurement, 63, 765-790.

Thank You

Now, let’s just talk…