Meta-analysis and the Synthetic Approach Luke Plonsky Current Developments in Quantitative Research...

Post on 14-Dec-2015


Meta-analysis and the Synthetic Approach

Luke Plonsky

Current Developments in Quantitative Research Methods, Day 2

Traditional Literature Reviews

What do they look like?

Think of a recent one you wrote: What was your process like?

What are their strengths? Weaknesses?

(As we discuss the meta-analytic process, keep a topic or domain of yours in mind.)

Meta-analysis as “the way forward”? (Rousseau, 2008, p. 9)

Systematic, transparent, & quantitative means to

Summarize (all) previous studies (A → B; M x N)

Provide a quantitative indication of a relationship

Prevent over/under-interpreting results (Norris & Ortega, 2006; Rousseau, 2008)

Increase statistical power and generalizability across learners, contexts, L2 features, outcomes, etc. (Plonsky, 2012)

Examine relationships not visible in primary research (A on B when C vs. D)

Identify substantive and methodological trends, weaknesses, and gaps (Plonsky & Gass, 2011)

Meta-analysis is here!

(See Norris & Ortega, 2010; Oswald & Plonsky, 2010)

[Figure: number of L2 meta-analyses by period. Pre-2000: 4; 2000-2003: 6; 2004-2007: 19; 2008-in press: 48]

+visibility, +impact, +citation (Cooper & Hedges, 2009)

Understand/evaluate choices → advance theory, research, and practice

Judgment and Decision-Making

Art and Science

Oswald & McCloy (2003)

Norris & Ortega (2007)

“There doesn’t seem to be a big role in this kind of work for much intelligent statistics, opposed to much wise thought” (Wachter, 1990, p. 182).

vs.

Four major stages (parallel to primary research)

1. Defining the domain / locating primary studies

2. Developing and implementing a coding scheme

3. (Meta-)Analysis

4. Interpreting meta-analytic results

1. DEFINING THE DOMAIN / LOCATING PRIMARY STUDIES

“Best evidence synthesis” (Eysenck, 1995)

Truscott (2007) – strict criteria (e.g., only “long-term” treatments)

Vs. Inclusiveness (preferred) (Norris & Ortega, 2006; Plonsky & Oswald, 2012)

Weaknesses mitigated by volume and assessed empirically (e.g., Russell & Spada, 2006)

Reliability reported? Yes, d = 0.65; No, d = 0.42 (Plonsky, 2011)

Control for bias? Tight, d = 0.51; Loose, d = 0.38 (Adesope et al., 2010)

(Are there studies with certain methodological features that you would exclude?)

1. Defining the domain / locating primary studies: Methodological considerations

1. Defining the domain / locating primary studies: Publication status (& bias)

Exclude unpublished studies (e.g., Keck et al., 2006; Lyster & Saito, 2010; Mackey & Goo, 2007) and assess bias with:

failsafe N (Abraham, 2008; Ross, 1998), though it lacks precision (e.g., Becker, 2005)

funnel plot (Li, 2010; Norris & Ortega, 2000; Plonsky, 2011)

Include unpublished studies (e.g., Li, 2010; Masgoret & Gardner, 2003; Won, 2008)

Compare published (g = 0.43) vs. unpublished (g = 0.56) studies (Taylor et al., 2006)
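Rosenthal's fail-safe N, listed above, estimates how many unretrieved studies averaging a null result could be added before a combined one-tailed test stops being significant at .05. A minimal sketch, assuming each study is summarized by a one-tailed z score and the scores are combined with Stouffer's method; the z values are hypothetical, not taken from any study cited here:

```python
import math

def stouffer_z(z_values):
    """Combine one-tailed study z scores with Stouffer's method."""
    return sum(z_values) / math.sqrt(len(z_values))

def fail_safe_n(z_values, z_alpha=1.645):
    """Rosenthal's fail-safe N: the largest number of extra null studies (z = 0)
    that could be added while the combined Stouffer z stays >= z_alpha
    (1.645 = one-tailed .05)."""
    k = len(z_values)
    n_fs = sum(z_values) ** 2 / z_alpha ** 2 - k
    return max(0, math.floor(n_fs))

# Hypothetical one-tailed z scores from five published studies
zs = [2.1, 1.8, 2.5, 0.9, 1.7]
print(round(stouffer_z(zs), 2), fail_safe_n(zs))
```

Because fail-safe N lacks precision (Becker, 2005), a number like this is best read alongside a funnel plot rather than on its own.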

1. Defining the domain / locating primary studies: Substantive considerations

Broad: Strategy instruction (all skills; Plonsky, 2011)

Multi-word instruction (all types) (Han, in preparation)

Narrow (local): Strategy instruction (reading only; Taylor et al., 2006)

Collocation instruction + tech. (Nurmukhamedov, in preparation)

(Would you describe your domain as relatively broad or more narrow? If narrow, what broader domain does it belong to?)

Strict / convenient? quality criteria

The Effectiveness of Bilingual Education

Willig (1985): K = 23; d = .63

Rossell & Baker (1996): K = 72 (the “naysayers”; 228 unacceptable); vote count: % of studies helpful (22%), no difference (45%), harmful (33%)

Greene (1998): K = 11; g = .18 (quasi-experiments) / .26 (experiments); no Canadian studies

Slavin & Cheung (2003): K = 42; “best-evidence synthesis”; no overall d; many subgroups

Roessingh (2004): K = 12; qualitative synthesis; high school learners only; Canadian focus

Rolstad, Mahoney, & Glass (2005): K = 17 (all post-Willig, 1985); d(L2) = .23 (usually English); d(L1) = .86

Reljić (2011): K = 7; European studies only; d = ?

(See also Rossell & Kuder’s [2005] meticulous critique and re-analysis of these studies.)

[Figure: effect sizes (d, axis -0.4 to 1.4) for corrective feedback reported in 12 meta-analyses: N&O ’00, Miller ’03, R&S ’06, Keck et al. ’06, M&G ’07, Truscott ’07, P&J ’09, Li ’10, L&S ’10, Biber et al. ’11, K&W ’11, Chen & Li ’12; legend: Writt, Imp, Exp, ML, Pmpt, Re, CF]

How effective is feedback?

(Well, it depends…)

Corrective Feedback?


[Annotations on the repeated chart: “?” (effects of CF not calculated); d = -.15; d = 1.16]


1. Defining the domain / locating primary studies: Search Strategies

a. Database searches (e.g., LLBA, ERIC, PsycInfo) (see In’nami & Koizumi, 2010; Plonsky & Brown, under review)

b. Forward citations (Google/Scholar, Web of Science) (Plonsky, 2011)

c. Manual journal searches (Keck et al., 2006; Plonsky & Gass, 2011)

d. Textbooks and edited volumes

e. Conference proceedings (15 in Lee et al., in press)

f. Reference digging (‘ancestry’)

g. Dissertations/theses (10 in Li, 2010; 19 in Lee et al., in press)

h. Previous reviews (e.g., ARAL)

i. Researchers’ websites, online bibliographies, listservs

j. Contacting authors

k. Others?

l. All of the above

1. Defining the domain / locating primary studies: Search Strategies

(in Plonsky & Brown, under review)

Narrow range of search techniques

Completeness + redundancy > incompleteness
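Using several overlapping search techniques returns duplicate records, but that redundancy is cheap to remove, whereas a study that no technique finds is simply lost. A minimal sketch of merging results across search strategies, assuming each source exports (title, year) pairs; the record format, source names, and sample titles are all hypothetical:

```python
import re

def normalize_title(title):
    """Lowercase and strip punctuation/extra whitespace so near-identical titles match."""
    title = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return " ".join(title.split())

def merge_searches(*sources):
    """Merge record lists from several search strategies.

    Each source is a (source_name, records) pair, where records are (title, year).
    Keeps the first copy of each (normalized title, year) and notes every
    source that found it."""
    merged = {}
    for source_name, records in sources:
        for title, year in records:
            key = (normalize_title(title), year)
            if key not in merged:
                merged[key] = {"title": title, "year": year, "found_by": []}
            merged[key]["found_by"].append(source_name)
    return list(merged.values())

# Hypothetical records from two search strategies
databases = ("database", [("Effects of Recasts on L2 Grammar", 2010),
                          ("Strategy instruction: A meta-analysis", 2011)])
ancestry = ("reference digging", [("Effects of recasts on L2 grammar.", 2010),
                                  ("Input enhancement and noticing", 2008)])
unique = merge_searches(databases, ancestry)
print(len(unique))
```

Keeping the `found_by` list records which technique located each study, which can be reported alongside the search procedure.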

2. CODING

2. Developing and implementing a coding scheme (the data collection instrument)

Knowledge of…

Substantive issues, relevant models, variables (e.g., taxonomies of instruction, CF moderators; what constitutes a multi-word unit? a collocation? [Han, in prep.; Nurmukhamedov, in prep.]) → moderators

Research design(s) used (pre-post? control-experimental only? classroom/lab, FL/SL, correlational/experimental, length of treatment, researcher- or teacher-led, outcome measures…) → more moderators

Methodological features (for analysis of study quality)

2. Developing and implementing a coding scheme

Typically, 5 different types of data are coded:
1. Identification (year, author)
2. Sample and context (age, L1, L2, proficiency)
3. Design (pre-post/control-experimental, treatment features)
4. Outcome features (free response, constrained response)
5. Outcomes / effect sizes (r, d)

Coding scheme example: Lee, Jang, & Plonsky (in press)

Recommendations:

code variables numerically/categorically whenever possible

revise and add new variables as they emerge from coding

(What types of substantive and methodological features would you code for?)

(Which type of index would be most appropriate for your research/domain?)
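When primary studies report different indices, a common step is to convert them to one metric before aggregating. A minimal sketch using the standard conversions between r and d; the d-to-r direction assumes two groups of roughly equal size, and the input values are illustrative only:

```python
import math

def r_to_d(r):
    """Convert a correlation r to a standardized mean difference d."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2 * r / math.sqrt(1 - r ** 2)

def d_to_r(d):
    """Convert d to r, assuming two groups of roughly equal size."""
    return d / math.sqrt(d ** 2 + 4)

print(round(r_to_d(0.30), 2), round(d_to_r(0.63), 2))
```

The two functions are exact inverses of each other, so converting back and forth does not drift.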

2. Developing and implementing a coding scheme (cont’d)

Decisions about…

Interrater reliability, especially for high-inference items (e.g., L2 proficiency; task-essentialness)

Percentage agreement; Cohen’s kappa

Missing data (e.g., missing SDs are VERY common: 31% in Plonsky & Gass, 2011)

1. Ignore/exclude (most common)

2. Impute (i.e., estimate)

3. Request (5/15 and 5/16 sent data in Plonsky, 2011, and Lee et al., in press, respectively)
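Percentage agreement and Cohen’s kappa can both be computed directly from two coders’ category assignments. A minimal sketch; the proficiency codes below are hypothetical:

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Proportion of items on which two coders assigned the same category."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same items")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the coders' marginal counts, summed over categories
    categories = set(counts_a) | set(counts_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical L2 proficiency codes (a high-inference item) for ten studies
coder_a = ["low", "mid", "mid", "high", "low", "mid", "high", "mid", "low", "high"]
coder_b = ["low", "mid", "high", "high", "low", "mid", "high", "low", "low", "high"]
print(percent_agreement(coder_a, coder_b), round(cohens_kappa(coder_a, coder_b), 2))
```

Kappa (about .71 here) is lower than raw agreement (.80) because it discounts matches the coders would reach by chance alone.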

3. (META-)ANALYSIS

3. (Meta-)Analysis

Potentially very simple: Overall d = M(study1, study2, …)

Level of analysis (e.g., study? sample? within vs. between groups?): pre-post ESs are generally larger than control-experimental ones

Weighting/adjusting ESs for quality and statistical artifacts: N (Norris & Ortega, 2000; Plonsky, 2011), inverse variance (Won, 2008)

“Schmidt & Hunter” corrections (Jeon & Yamashita, under review; Masgoret & Gardner, 2003)

Quality/control (e.g., random assignment, pretesting)

Example/template for ES weighting (N; inverse variance)
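The weighting options above can be compared on the same data. A minimal sketch, assuming each hypothetical study reports experimental and control means, SDs, and ns; it computes d with a pooled SD and its approximate sampling variance, then the unweighted, N-weighted, and inverse-variance-weighted means:

```python
import math

def cohens_d(m_exp, sd_exp, n_exp, m_ctl, sd_ctl, n_ctl):
    """Standardized mean difference using the pooled SD."""
    pooled_sd = math.sqrt(((n_exp - 1) * sd_exp ** 2 + (n_ctl - 1) * sd_ctl ** 2)
                          / (n_exp + n_ctl - 2))
    return (m_exp - m_ctl) / pooled_sd

def var_d(d, n_exp, n_ctl):
    """Approximate sampling variance of d."""
    return (n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl))

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical primary studies: (M_exp, SD_exp, n_exp, M_ctl, SD_ctl, n_ctl)
studies = [(78, 10, 15, 70, 11, 15),
           (65, 12, 40, 60, 12, 42),
           (82, 9, 12, 71, 10, 10)]

ds, ns, inv_vars = [], [], []
for m1, s1, n1, m2, s2, n2 in studies:
    d = cohens_d(m1, s1, n1, m2, s2, n2)
    ds.append(d)
    ns.append(n1 + n2)
    inv_vars.append(1 / var_d(d, n1, n2))

unweighted = sum(ds) / len(ds)             # Overall d = M(study1, study2, ...)
n_weighted = weighted_mean(ds, ns)         # weighting by total N
iv_weighted = weighted_mean(ds, inv_vars)  # inverse-variance weighting
print(round(unweighted, 2), round(n_weighted, 2), round(iv_weighted, 2))
```

Many meta-analysts also apply Hedges’ small-sample correction, multiplying d by 1 - 3 / (4(n1 + n2) - 9), before weighting.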

3. (Meta-)Analysis: “adds as well as summarizes knowledge” (Hall et al., 1994, p. 24)

Moderator analyses (explain variance across studies):

- Ross, 1998: listening; reading

- Norris & Ortega, 2000: +explicitness; +constrained measures

- Mackey & Goo, 2007: vocab > grammar

- Li, 2010: labs > classrooms

- Plonsky, 2011: longer treatments; fewer strategies; R & S

- Lee et al., in press: instruction + feedback; longer treatments

Overall / mean (d, r)

(Example of moderator analyses using SPSS)

Totally essential! (and awesome)
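A moderator analysis compares mean effects across levels of a coded variable. A minimal fixed-effect subgroup sketch, assuming each effect is stored as (d, variance, moderator level); the lab/classroom values are hypothetical, chosen only to mirror the “labs > classrooms” pattern listed above:

```python
from collections import defaultdict

def subgroup_analysis(effects):
    """Fixed-effect moderator (subgroup) analysis.

    effects: list of (d, variance, moderator_level) tuples.
    Returns inverse-variance means per level, Q_between, and its df."""
    groups = defaultdict(list)
    for d, v, level in effects:
        groups[level].append((d, 1 / v))
    all_pairs = [pair for members in groups.values() for pair in members]
    grand_mean = sum(d * w for d, w in all_pairs) / sum(w for _, w in all_pairs)
    means, q_between = {}, 0.0
    for level, members in groups.items():
        w_sum = sum(w for _, w in members)
        mean = sum(d * w for d, w in members) / w_sum
        means[level] = mean
        # Weighted squared distance of each subgroup mean from the grand mean
        q_between += w_sum * (mean - grand_mean) ** 2
    return means, q_between, len(groups) - 1

effects = [(0.95, 0.08, "lab"), (1.10, 0.10, "lab"), (0.80, 0.12, "lab"),
           (0.45, 0.05, "classroom"), (0.60, 0.07, "classroom"), (0.30, 0.09, "classroom")]
means, q_between, df = subgroup_analysis(effects)
print({k: round(v, 2) for k, v in means.items()}, round(q_between, 2), df)
```

With one degree of freedom, a Q_between above 3.84 (the .05 critical value of chi-square) would suggest the levels differ; this sketch ignores heterogeneity within subgroups, which a random-effects version would model.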

3. (Meta-)Analysis: Treatment types as moderators

Plonsky, 2011

3. (Meta-)Analysis: Outcome measures as moderators

Norris & Ortega, 2000

3. (Meta-)Analysis: Multiple Moderators

Spada & Tomita, 2010

3. (Meta-)Analysis: Treatment length as a moderator

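[Figure: mean d by treatment length in pragmatics instruction (Jeon & Kaya, 2006), L2 instruction (Norris & Ortega, 2000), and classroom CF (Lyster & Saito, 2010); length categories labeled S, L, B, M, and S-M; plotted values: 0.42, 1.06, 0.72, 0.82, 1.08, 0.57, 0.79, 1.13, 0.79]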

More advanced (meta-)analytic techniques

Fixed vs. random effects modeling

Bayesian meta-analysis (see Ross, 2013)

Meta-regression

Meta-SEM

(See Borenstein et al., 2009; Cooper, Hedges, & Valentine, 2009)
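To illustrate the fixed vs. random effects distinction, here is a sketch of the DerSimonian-Laird estimator, one common (but not the only) way to estimate between-study variance (tau squared). The d values and variances are hypothetical:

```python
import math

def random_effects(ds, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    Returns the fixed-effect mean, tau^2 (between-study variance),
    the random-effects mean, and its standard error."""
    w = [1 / v for v in variances]
    fixed = sum(wi * di for wi, di in zip(w, ds)) / sum(w)
    q = sum(wi * (di - fixed) ** 2 for wi, di in zip(w, ds))  # Cochran's Q
    df = len(ds) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights add tau^2 to each study's sampling variance
    w_star = [1 / (v + tau2) for v in variances]
    re_mean = sum(wi * di for wi, di in zip(w_star, ds)) / sum(w_star)
    return fixed, tau2, re_mean, math.sqrt(1 / sum(w_star))

ds = [0.10, 0.90, 0.40, 1.30]
variances = [0.02, 0.03, 0.025, 0.04]
fixed, tau2, re_mean, re_se = random_effects(ds, variances)
print(round(fixed, 2), round(tau2, 3), round(re_mean, 2), round(re_se, 3))
```

When tau squared is greater than zero, the random-effects mean gives small studies relatively more weight and has a wider standard error than the fixed-effect mean.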

3. (Meta-)Analysis

4. INTERPRETING RESULTS

SMALL BIG

What do they mean anyway?

What implications do these effects have for future research, theory, and practice?

What does d = 0.50 (or 0.10, or 1.00…) mean?

How big is ‘big’? And how small is ‘small’?

4. Interpreting findings (Plonsky & Oswald, under review)

General and field-specific benchmarks (Cohen, 1988; Plonsky & Oswald, under review)

Previous/similar meta-analyses in AL (e.g., Abraham, 2008; Lee et al., this colloquium; Mackey & Goo, 2007)

Meta-analyses in other fields (Plonsky, 2011)

SD units (Taylor et al., 2006)

Setting (e.g., Li, 2010; Mackey & Goo, 2007)

Length/intensity, practicality (Lee & Huang, 2008; Lee et al., in press; Lyster & Saito, 2010; Norris & Ortega, 2000)

Study quality (Plonsky, 2011, 2013, in press; Plonsky & Gass, 2011)

[Figure: effect sizes (axis 0.4 to 1.0) in lab vs. classroom settings for L2 interaction and strategy instruction]

Cohen’s (1988) “t-shirt” effect sizes

ESs are best understood in relation to a particular discipline and, ideally, within a particular sub-domain of that discipline (e.g., Cohen, 1988; Valentine & Cooper, 2003)

d = 0.20 (small), d = 0.50 (medium), d = 0.80 (large)

d(linguistics) = d(economics) = d(social work) = …?

d values across 77 L2 meta-analyses (1,733 studies, N = 452,000+; Plonsky & Oswald, under review)

[Figure: distribution of d values; M = 0.63; 0.40 ≈ small(ish), 0.70 ≈ medium(ish), 1.00 ≈ large(ish)]

d values across 236 primary L2 studies

[Figure: distribution of d values; 25th percentile = 0.45 (small-ish), 50th percentile = 0.71 (medium-ish), 75th percentile = 1.07 (large-ish)]
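Field-specific benchmarks like those above are percentiles of an observed distribution of effects. A minimal sketch of deriving 25th/50th/75th-percentile benchmarks from a list of d values and placing a new d against them; the d values are hypothetical, not the 236 studies summarized above, and `statistics.quantiles` (Python 3.8+) uses its default interpolation method:

```python
import statistics

def field_benchmarks(d_values):
    """25th, 50th, and 75th percentiles of observed |d| values,
    used as 'small-ish', 'medium-ish', and 'large-ish' benchmarks."""
    q1, median, q3 = statistics.quantiles([abs(d) for d in d_values], n=4)
    return {"small-ish": q1, "medium-ish": median, "large-ish": q3}

def describe(d, benchmarks):
    """Label |d| with the highest benchmark it reaches."""
    reached = [label for label, cut in benchmarks.items() if abs(d) >= cut]
    return reached[-1] if reached else "below small-ish"

# Hypothetical d values from a sub-domain
observed = [0.12, 0.35, 0.48, 0.52, 0.60, 0.66, 0.71, 0.83, 0.95, 1.20, 1.45]
bench = field_benchmarks(observed)
print({k: round(v, 2) for k, v in bench.items()}, describe(0.50, bench))
```

Computing benchmarks within a sub-domain, rather than borrowing Cohen’s general values, is the point the slides make about interpreting ESs relative to a particular discipline.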

Additional Considerations: Theoretical Maturity

[Figure: ES (d) by year; -fine-grained analyses vs. +fine-grained analyses]

Example: d = 0.42, SD = 0.24, k = 46

Additional Considerations: Methodological Maturity

Example: d = 0.42, SD = 0.24, k = 46

[Figure: ES (d) by year; -refined methods and instruments vs. +refined methods and instruments]

Additional Considerations: Theoretical & Methodological Maturity

Example: d = 0.42, SD = 0.24, K = 92

[Figure: ES (d) by year; -/+ refined methods and instruments; -/+ fine-grained analyses]

Where is your study?

ESs Over Time: Plonsky & Gass (2011)

[Figure: Average Effect Sizes across Three Decades, effect size (d) by decade. 1980s: d = 1.62; 1990s: d = 0.82; 2000s: d = 0.52]

(Literal/Mathematical) SD Units

Example: d = 0.73; the average EG participant outscored the average CG participant by about 3/4 of an SD
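Beyond “about 3/4 of an SD”, d can be restated as overlap statistics if one assumes normally distributed scores with equal SDs in both groups (an assumption, not something the primary studies guarantee). A minimal sketch computing Cohen’s U3 and the probability of superiority:

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def cohens_u3(d):
    """Proportion of the control group scoring below the average
    experimental-group participant (assumes normal scores, equal SDs)."""
    return normal_cdf(d)

def prob_superiority(d):
    """Probability that a randomly chosen EG participant outscores a randomly
    chosen CG participant (same assumptions)."""
    return normal_cdf(d / math.sqrt(2))

d = 0.73
print(round(cohens_u3(d), 2), round(prob_superiority(d), 2))
```

For d = 0.73, U3 is roughly .77: under the normality assumption, the average EG participant scores above about 77% of the CG.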

Additional Considerations: Research Setting

Lab vs. Classroom FL vs. SL

*Setting may change over time: L2 interaction (Plonsky & Gass, 2011)

- 1980s ≈ 80% lab-based

- 1990s-2000s ≈ 50/50 lab/classroom

(Mackey & Goo, 2007; Plonsky, 2011; Li, 2010; Taylor et al., 2006)

Additional Considerations: Length/Intensity of Treatment

(Practicality?)

[Figure: effect size by treatment length, from Jeon & Kaya (2006), Norris & Ortega (2000), and Lyster & Saito (2010); length categories labeled S, L, B, M, and S-M]

Additional Considerations: Manipulation of IVs (Practicality?)

Lee & Huang (2008): The effect of input enhancement on L2 grammar learning: d = 0.22

Numerically small, but practically large/significant?

Additional Considerations: Publication Bias, Sample Sizes, & Sampling Error

Pub. bias: The tendency to publish only studies with statistically significant (or theoretically appealing) findings (Rothstein, Sutton, & Borenstein, 2005; see Plonsky, 2013; Lee, Jang, & Plonsky, in press, for evidence of publication bias in L2 research).

[Figure: funnel plot of sample size (y-axis, 0 to 200) against effect size (d) (x-axis, -2 to 3)]
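Funnel-plot asymmetry can also be checked numerically. A minimal sketch of the point estimates from Egger’s regression test, which regresses each standardized effect (d / SE) on its precision (1 / SE); the significance test on the intercept is omitted, and the data are hypothetical:

```python
def egger_intercept(ds, ses):
    """Point estimates from Egger's regression test for funnel-plot asymmetry.

    Regresses the standardized effect (d / SE) on precision (1 / SE) by
    ordinary least squares; an intercept far from zero suggests that small
    studies report systematically different effects than large ones."""
    x = [1 / se for se in ses]
    y = [d / se for d, se in zip(ds, ses)]
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

ses = [0.10, 0.20, 0.30, 0.40]
symmetric = egger_intercept([0.50, 0.50, 0.50, 0.50], ses)
small_study_effects = egger_intercept([0.30, 0.50, 0.80, 1.20], ses)
print([round(v, 2) for v in symmetric], [round(v, 2) for v in small_study_effects])
```

When every study has the same d, the intercept is zero; when the least precise studies report the largest effects, as in the second hypothetical set, the intercept moves away from zero.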

Two related statistical artifacts:
1. Smaller Ns → more sampling error → more variance/distance from the population mean
2. Low instrument reliability → smaller effects
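Both artifacts can be made concrete. A minimal sketch, assuming equal group sizes and an illustrative reliability of .70: the standard error of d shrinks as N grows, and a Schmidt & Hunter-style correction divides an observed d by the square root of the outcome measure’s reliability:

```python
import math

def se_of_d(d, n_exp, n_ctl):
    """Approximate standard error of d; it shrinks as group sizes grow."""
    return math.sqrt((n_exp + n_ctl) / (n_exp * n_ctl) + d ** 2 / (2 * (n_exp + n_ctl)))

def correct_for_unreliability(d_observed, reliability):
    """Disattenuate d for unreliability in the outcome measure by dividing
    by the square root of the reliability coefficient."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return d_observed / math.sqrt(reliability)

# Sampling error: the same d = 0.50 is estimated far less precisely with small groups
for n_per_group in (10, 40, 160):
    print(n_per_group, round(se_of_d(0.50, n_per_group, n_per_group), 3))

# Reliability: an observed d of 0.42 on a measure with hypothetical reliability .70
print(round(correct_for_unreliability(0.42, 0.70), 2))
```

Such corrections depend on reliability being reported, which is one reason unreported reliability is listed among the challenges to meta-analysis.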

[Figure: histogram of effect sizes (d, axis -0.5 to 4.5); annotated d = 0.83]

Challenges to meta-analysis

1) Domain maturity: age, breadth, and depth of research; danger of premature closure

2) Poor reporting practices (SDs, ESs): missing data (K = 19 in Nekrasova & Becker, 2009; 22 in Plonsky, 2011)

3) Instrument reliability low or unreported: reported in 6% of studies (Nekrasova & Becker, 2009)

4) Idiosyncratic/inconsistent research activity

5) Very few replications (see Polio & Gass, 1997; Porte, 2002, 2012)

What challenges might one encounter in conducting a meta-analysis in your target domain and/or generally?

Challenges to meta-analysis (cont.)

6) Disagreement over definitions and operationalizations (e.g., noticing)

Perhaps more “adversarial collaboration” is needed (see Tetlock & Mitchell, 2009)

7) Overreliance on individual studies (see Norris & Ortega, 2007)

8) Bias of primary (and secondary) researchers toward particular types of findings (e.g., in favor/against theory X; p < .05)

9) Tradition of overreliance on NHST (see Schmidt & Hunter, 2002)

Crude, uninformative, unreliable

A synthetic approach to primary research?

What might this look like generally and in terms of…

Research agendas?

Reporting practices and interpretations of findings?

Researcher training?

Journal calls and acceptance policies?

Conclusion: Judgment and decision-making play a major role in all meta-analyses

Understanding the choices

More appropriate execution and interpretation of meta-analytic findings

More precise advances in theory, more efficient L2 research, and more accurately informed practice

Further Reading

Synthesizing research on language learning and teaching (Norris & Ortega, 2006)

Research synthesis and meta-analysis: A step-by-step approach (Cooper, 2010)

Practical meta-analysis (Lipsey & Wilson, 2001)

Connections to Other Topics to be Discussed this Week

NHST, effect sizes (MONDAY)

Study Quality (WEDNESDAY)

Replication (THURSDAY)

Reporting practices (FRIDAY)

Tomorrow: Study Quality

What does this mean?

How can we operationalize study quality?

What findings exist for studies of study quality in AL?

Where and how can the findings of quality analyses be implemented?