
Doing Synthesis and Meta-Analysis in

Applied Linguistics

Lourdes Ortega
University of Hawai‘i at Mānoa

National Tsing Hua University, Taiwan, June 8, 2011

Please cite as:

Ortega, L. (2011). Doing synthesis and meta-analysis in applied linguistics. Invited workshop at Tsing Hua University, Taipei, June 8, 2011.

Copyright © Lourdes Ortega, 2011

Research synthesis (including meta-analysis)

1. What is it?
2. Why do it?
3. How do we do it?
4. An example…
5. Challenges?
6. Value?

What is research synthesis?

The reviewing continuum

Secondary Research

Narrative .............................................................. Systematic

LIT REVIEW …………… SYNTHESIS …………… META-ANALYSIS

So, what is meta-analysis, specifically?

…one specific kind of research synthesis…

Secondary analysis of quantitative analyses

Each primary study is a data point

Goal: what are the main ‘effects’ or ‘relationships’ found across many studies?

Strictly speaking, only quantitative studies apply

Why do it?

Traditional literature reviews…

…have led to unending debates:

What does the evidence “say”? According to whom? How do we know who is right?

e.g.: error correction (Ferris vs. Truscott)

e.g.: the Critical Period Hypothesis (Hyltenstam et al. vs. Birdsong)

Typical strategies of traditional reviews?

Tables summarizing many studies

e.g. from Krashen et al. (1979):

Vote-counting technique

e.g.: Error correction in L2 writing

Limitations:

No specific set of methods; up to mysterious expertise

Experts are always vested, therefore vulnerable to the charge of bias

Statistical significance has serious pitfalls

Idiosyncratic methodology

Evidentiary warrants difficult to judge

Over-reliance on statistical significance (but magnitude, not just generalizability, is of interest to social scientists!)

What does the evidence “say”? According to whom? How do we know who is right?

Methods for reviewing, from “art” into “science”:

Systematic, not arbitrary
More than the sum of the parts
Replicable

SOLUTION in the late 1970s

Secondary, yes... but empirically accountable, & discovering new truths in old data

How do we do it?

Norris & Ortega (2006a, 2006b)

Norris, J. M., & Ortega, L. (2010). Research timeline: Research synthesis. Language Teaching, 43, 461-479.

Ortega, L. (2010). Research synthesis. In B. Paltridge & A. Phakiti (Eds.), Companion to research methods in applied linguistics (pp. 111-126). London: Continuum.

Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of applied linguistics. Malden, MA: Wiley.

Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.

What are the definitional features of all syntheses (including meta-analyses)?

1. Principled selection of primary studies

2. Systematic coding of each study for main variables

3. Direct use of the evidence reported (not the authors’ interpretations) across studies

1. Principled selection of studies

Sampling is central to empirical research: what population are we trying to understand?

Random [experimental]
Purposive [qualitative]

Sampling is central to synthesis, as well:

Complete [secondary research should be based on the full universe of studies that have investigated the same thing]

Search & Retrieval of Literature

The literature search is a key step in systematic synthesis (some direction: In'nami & Koizumi, 2010): identify all studies that are relevant.

Exhaustive [electronic, hand, footnote chasing, invisible college]
Replicable [fully explained in report]

1st: electronic searches
2nd: other techniques:
Manual searches of journals
Footnote chasing
Forward searches with Web of Science
Website searches of key contributing scholars
Polite email requests to authors & experts

Inclusion & Exclusion criteria

All potentially relevant studies must then be examined to decide: Include or Exclude (“apples or oranges?”)

Inclusion criteria [all criteria satisfied]

Exclusion criteria [explain each reason for exclusion and give examples]

Full rationale [tables, appendices, philosophy of inclusivity or selectivity]

1. Principled selection of studies

Literature search + Study eligibility criteria, Inclusion/exclusion

What are the definitional features of all syntheses (including meta-analyses)?

2. Systematic coding of each study

Eliciting evidence with consistency, just as when surveying, interviewing, or testing participants

Asking research questions of the literature:

What variables are important?
How (and how well) have they been investigated?
What are the findings across studies?

Publication features: Year; Author; Published or fugitive? (journal, book, dissertation, presentation)

Substantive features: e.g., How was “explicit” instruction defined?

Methodological features: e.g., How was “learning” measured? Means, SDs, etc.? Sample size; Design; Reliability; Stats used; Etc.

Coding book to identify study features that answer questions

Multiple coders
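When multiple coders rate the same studies, intercoder reliability is typically reported. As a minimal sketch (my own illustration, not part of the workshop materials), a chance-corrected agreement index such as Cohen's kappa could be computed from two coders' category assignments as below; the category labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders on the same set of studies."""
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical "type of instruction" codes assigned independently by two coders
coder_1 = ["explicit", "implicit", "explicit", "explicit", "implicit"]
coder_2 = ["explicit", "implicit", "implicit", "explicit", "implicit"]
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.62
```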

1. Principled selection of studies

2. Systematic coding of each study for main variables

Coding book, Standardization, Intercoder reliability

What are the definitional features of all syntheses (including all meta-analyses)?

3. Trust the evidence, not the authors

Record carefully what authors report and how they report it…

But ultimately, analyze what the evidence they present tells us, not what they say it means…

Seeking an objective view across studies of the accumulated state of knowledge…

When aggregating and averaging findings is the goal, as in meta-analysis…

How do we compare, combine, and interpret findings across numerous quantitative studies of the same thing?

effect sizes & confidence intervals

Effect size: What is it?

An estimate of the magnitude or strength of a quantitative finding:

…how much difference?
…how much improvement?
…how closely related?

Effect sizes: absolute scales

Scale | Study 1 | Study 2
1. Percent | Experimental group = 30% better than control | Experimental group = 20% better than control
2. Correlation | Motivation & achievement, r = .36 | Motivation & achievement, r = .78
3. Known measure | Pre-post TOEFL score: 450 → 575 | Pre-post TOEFL score: 450 → 495

Q: What happens when studies do not report findings on comparable scales?

Effect sizes: standardized

Difference between experimental and control groups in standard deviation units (Cohen’s d):

Effect size d = the average of the experimental group minus the average of the control group, divided by the pooled standard deviation of both groups.

d is also simple to calculate and to interpret, and it incorporates variability differences between groups.
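To make the definition above concrete, here is a minimal Python sketch of the standardized mean difference. The descriptive statistics passed in at the bottom are hypothetical, for illustration only, and the sketch omits refinements such as the small-sample correction used for Hedges' g.

```python
import math

def cohens_d(mean_exp, mean_ctrl, sd_exp, sd_ctrl, n_exp, n_ctrl):
    """Cohen's d: (experimental mean - control mean) / pooled standard deviation."""
    pooled_sd = math.sqrt(
        ((n_exp - 1) * sd_exp ** 2 + (n_ctrl - 1) * sd_ctrl ** 2)
        / (n_exp + n_ctrl - 2)
    )
    return (mean_exp - mean_ctrl) / pooled_sd

# Hypothetical post-test descriptives from one primary study (illustration only)
print(round(cohens_d(78.0, 65.0, 12.0, 14.0, 20, 20), 2))  # ≈ 1.0
```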

[Figure: two pairs of overlapping experimental vs. control distributions, illustrating no sizeable effect (d = 0.10) versus a very large effect (d = 3.00)]

Effect sizes for meta-analysis

Study 1 → effect size 1
Study 2 → effect size 2
Study 3 → effect size 3
Study 4 → effect size 4
Study 5 → effect size 5
Study … → effect size …

Averaging across these study-level effect sizes = average effect size

"The terms 'small,' 'medium,' and 'large' are relative, not only to each other, but to the area of behavioral science or even more particularly

to the specific content and research method being employed in any given investigation..."

(Cohen, 1988, p. 25)

Interpreting effect sizes: What does d really tell us?d < .30

d > .30d < .80 d > .80

The average is not enough: Confidence Intervals

“The margin of error in an observation” (with 95% certainty)

The stroll from the hotel to the University is, on average, 10 minutes, plus or minus 3 minutes:

Lower bound = 7 minutes
Average = 10 minutes
Upper bound = 13 minutes

Confidence Intervals in Meta-analysis

CIs tell us about the certainty with which we can interpret an average effect size.

Effect Sizes and Confidence Intervals in Meta-analysis

Avg. effect of instructional treatment: N = 49, k = 98, Mean d = .96, SD d = .87, 95% CI lower = .78, 95% CI upper = 1.14

We can be 95% certain that the actual effect of instruction lies between .78 and 1.14.
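As a minimal sketch (my own assumption about the computation, not necessarily the exact procedure Norris & Ortega used), one common way to obtain such an interval is to treat the study effect sizes as a sample, compute their unweighted mean and standard deviation, and build a 95% CI from the standard error of that mean. Plugging in the summary values reported above (mean d = .96, SD = .87, k = 98) roughly reproduces the interval shown, up to rounding.

```python
import math

def mean_d_with_ci(d_values, z=1.96):
    """Unweighted mean effect size with a 95% CI based on the SE of the mean."""
    k = len(d_values)
    mean_d = sum(d_values) / k
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in d_values) / (k - 1))
    se = sd_d / math.sqrt(k)
    return mean_d, mean_d - z * se, mean_d + z * se

# Sanity check against the slide's summary statistics (mean d = .96, SD d = .87, k = 98)
se = 0.87 / math.sqrt(98)
print(round(0.96 - 1.96 * se, 2), round(0.96 + 1.96 * se, 2))  # ≈ 0.79 and 1.13
```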

Why does it help to focus on effect sizes?

e.g., effects of smoking research in the 1960s:

There is a statistically significant difference in mortality rates between smokers and non-smokers.

Smoking up to half a pack a day (or fewer than 10 cigarettes a day) increases the chance of mortality by 40% when compared to non-smokers.

Smoking two packs or more a day increases the risk of death by three times, to 120%, when compared to non-smokers.

(U.S. Department of Health, Education, and Welfare Report, 1967)

And what about small effects: can they be important too?

r = .034, a truly ‘tiny’ effect!
Regular aspirin consumption and decrease in heart attacks = a 3.4% decrease = at least 3 out of 100 who would not have a heart attack if they regularly took aspirin.

d = .30, a small-magnitude effect!
Effects of reading tutorials for underachieving students: the same for untrained peer tutoring and for highly trained teachers engaging in longer hours of tutoring.

Both are important! Interpreting effect sizes: complex, contextualized, not absolute.

What are the definitional features of all syntheses (including all meta-analyses)?

1. Principled selection of studies

2. Systematic coding of each study for main variables

3. Direct use of the evidence reported (not the authors’ interpretations)

Effect sizes, Confidence Intervals, Other kinds of new data based on old

How do we do it? An example of synthesis + meta-analysis

In applied linguistics, the first full-blown synthesis and meta-analysis:

Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.

Effects of instruction: recasts, garden path, input enhancement, input processing, input flood, inductive, task-based interaction, traditional grammar, consciousness-raising, dictogloss

Step 1: Problem Specification

Focus of Norris & Ortega: L2 instruction → L2 learning

RQ 1 & 2: Instruction: overall? By type?
RQ 3: Effect of outcome measures?
RQ 4: Instructional intensity?
RQ 5: Durability of effects?
RQ 6: Quality of research practices?

Step 2: Literature search

1st: electronic searches
2nd: other techniques:
Manual searches of 14 journals
Footnote chasing of 25 reviews
Footnote chasing of each study included

Step 3: Study eligibility criteria

Potentially relevant: 250 >>>> relevant for synthesis: 77 >>>> adequate for meta-analysis: 49

Step 4: Coding of study features

Type of instruction: FonF, FonFS, explicit, implicit

Type of outcome measure: metalinguistic, selected, constrained, free

Intensity of instruction: Brief (less than 1 hr), short (between 1 and 2 hrs), medium (between 3 and 6 hrs), long (more than 7 hrs)

Durability of effects: effect sizes on delayed tests

Steps 5 & 6: Analyze, display, interpret

Findings RQ 1 & 2 (effectiveness):

Findings RQ 3 (type of measure)

Findings RQ 4 (intensity):

Findings RQ 5 (durability):

RQ 1-5 (meta-analysis part): How effective is L2 instruction?

Clearly more effective than no instruction or only meaningful exposure to L2: d = 0.96, based on 49 studies

Explicit instruction is superior in the short term to implicit instruction: d = 1.13 versus d = 0.54, based on 69 and 29 contrasts, respectively

But focus on form and on formS are equally effective: d = 1.00 for form versus 0.93 for formS, based on 43 and 55 contrasts, respectively

Effects are durable: delayed post-tests from 22 studies, d = 1.02

RQ 6 (synthesis part): Research practices

Too many variables in a single design → need to simplify designs, increase N

No pre-test (18%), no true control group (83%) → need to always include both

Poor reporting standards (52% no SDs, 84% no instrument reliability, 57% no set alpha) → editors need to demand better reporting

Misuse of statistical inference (no assumptions checked or met, parametric stats on small samples, no consideration of magnitude) → the field needs better training in statistics if it insists on using such methods

Since then… accumulation of meta-analyses

In 2000, when Norris & Ortega was published, there were only 2 other published systematic syntheses in applied linguistics. As of 2010, Norris & Ortega identified 23 in their Timeline, most published since 2006.

Motivation: Masgoret & Gardner (2003)
Interaction: Keck et al. (2006), Mackey & Goo (2007)
Oral feedback: Russell & Spada (2006), Lyster & Saito (2010), Li (2010)
Use of glosses in CALL: Taylor (2006, 2009), Abraham (2008)

Some challenges for research synthesis in L2 research…

Publication bias: the “file drawer problem”

Well-known phenomenon, present in all the social sciences (Rosenthal, 1979; Rothstein et al., 2005)

Little understood in applied linguistics

• Include fugitive literature
• Check for publication bias (see the sketch below)
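One classic, if rough, check associated with the citation above is Rosenthal's (1979) fail-safe N: how many unretrieved null-result studies sitting in file drawers would be needed to pull the combined result back to non-significance. The sketch below is a minimal illustration with hypothetical z scores; modern meta-analyses usually supplement or replace it with funnel plots and trim-and-fill.

```python
def failsafe_n(z_values, z_alpha=1.645):
    """Rosenthal's (1979) fail-safe N: number of additional zero-effect studies
    needed before the combined one-tailed test drops below significance."""
    k = len(z_values)
    sum_z = sum(z_values)
    return (sum_z ** 2) / (z_alpha ** 2) - k

# Hypothetical one-tailed z scores from k = 5 retrieved studies (illustration only)
print(round(failsafe_n([2.1, 1.8, 2.5, 1.4, 2.9]), 1))
```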

Quality: “garbage in, garbage out”

The quality of a synthesis can only be as good as the quality of the primary studies that are synthesized in it...

But how do we judge quality? Publication type? Methodology ratings? Exclusions?

Ethics

Anticipate consequences of synthesis:

Would it prematurely close the area for research?

Would it be taken as a personal attack on researchers/labs?

What is the potential for findings to be (mis)appropriated by audiences (policy makers, teachers, …)?

High-tech statistication, cookie-cutter approach

“... conceptual vacuum when technical meta-analytic expertise is not coupled with deep knowledge of the theoretical and conceptual issues at stake in the research domain under review…”

(Norris & Ortega, 2006b, p. 37)

Meta-analysis only, no interest in quantitative synthesis of other kinds/scope

New-generation meta-analyses bypass synthesis:
Li (2010), Lyster & Saito (2010), Plonsky (2011), Spada & Tomita (2010)

cf. quantitative syntheses such as Thomas (1994, 2006), Ortega (2003) … ?????

Qualitative synthesis?

Yet, much contemporary research in applied linguistics is qualitative and increasingly more is mixed-methods… both worth synthesizing!

No interest either in exploring qualitative synthesis… Only Téllez & Waxman (2006) in applied linguistics

Meta-ethnography (Noblit & Hare, 1988; see Téllez & Waxman, 2006)
Qualitative Comparative Analysis (Ragin, 1999)
Critical Interpretive Synthesis (Dixon-Woods et al., 2006)

And there are options to draw from in education, health sciences, and other fields!

Value?

There is huge value in systematic synthesis (including meta-analysis):

Secondary research, yes... but:

• Empirically accountable
• Conceptually illuminating: discovering new truths in old data

Sustained progress…

• Much improvement in certain reporting practices (LL, MLJ in particular)

• Larger N in primary studies = more trustworthy analyses

• Use of increasingly sophisticated techniques in meta-analyses: study quality criteria, weighting (by N, reliability, variance), fixed/random effects models, sensitivity analysis, trim-and-fill estimations, publication bias checks, etc. (see the sketch below)

• Use of meta-analytic software, e.g.: http://www.meta-analysis.com
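To illustrate just one of the techniques listed above, here is a minimal sketch of inverse-variance weighting under a fixed-effect model: each study's effect size is weighted by the inverse of its sampling variance, so more precise studies count for more. The effect sizes and variances at the bottom are hypothetical, and a real analysis would normally be run in dedicated meta-analytic software (often with a random-effects component), not hand-rolled like this.

```python
def fixed_effect_estimate(effect_sizes, variances):
    """Inverse-variance weighted (fixed-effect) pooled effect size and its 95% CI."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * d for w, d in zip(weights, effect_sizes)) / sum(weights)
    se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se

# Hypothetical per-study d values and their sampling variances (illustration only)
print(fixed_effect_estimate([0.8, 1.1, 0.5], [0.04, 0.09, 0.06]))
```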

“we envision synthetic methodologies as advancing our ability to produce new knowledge by carefully building upon, expanding, and transforming what has been accumulated over time ... However, ... all knowledge is bound by context and purpose...”

(Norris & Ortega, 2006b, p. 37)

But only if applied linguists cultivate “the will to synthesis”

Thank You
lortega@hawaii.edu

References

Abraham, L. B. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21, 199-226.

Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R., Miller, T., Sutton, A. J., et al. (2006). How can systematic reviews incorporate qualitative research? A critical perspective. Qualitative Research, 6, 27-44.

Keck, C. M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006). Investigating the empirical link between task-based interaction and acquisition: A meta-analysis. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 91-131). Amsterdam: John Benjamins.

Krashen, S., Long, M. H., & Scarcella, R. (1979). Accounting for child-adult differences in second language rate and attainment. TESOL Quarterly, 13, 573-582.

Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60, 309-365.

Lyster, R., & Saito, K. (2010). Oral feedback in classroom SLA: A meta-analysis. Studies in Second Language Acquisition, 32(2).

Mackey, A., & Goo, J. M. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 407-452). New York: Oxford University Press.

Masgoret, A.-M., & Gardner, R. C. (2003). Attitudes, motivation, and second language learning: A meta-analysis of studies conducted by Gardner and associates. Language Learning, 53, 123-163.

Noblit, G. W., & Hare, R. D. (1988). Meta-ethnography: Synthesizing qualitative studies. Newbury Park, CA: Sage.

Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of applied linguistics. Malden, MA: Wiley.

Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.

Norris, J. M., & Ortega, L. (Eds.). (2006a). Synthesizing research on language learning and teaching. Amsterdam: John Benjamins.

Norris, J. M., & Ortega, L. (2006b). The value and practice of research synthesis for language learning and teaching. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 3-50). Amsterdam: John Benjamins.

Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.

Norris, J. M., & Ortega, L. (2010). Research timeline: Research synthesis. Language Teaching, 43, 461-479.

Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24, 492-518.

Ortega, L. (2010). Research synthesis. In B. Paltridge & A. Phakiti (Eds.), Companion to research methods in applied linguistics (pp. 111-126). London: Continuum.

Plonsky, L. (2011). The effectiveness of second language strategy instruction: A meta-analysis. Language Learning, 61(4).

Ragin, C. C. (1999). Using Qualitative Comparative Analysis to study causal complexity. Health Services Research, 34(5, Part 2), 1225-1239.

Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for the acquisition of L2 grammar: A meta-analysis of the research. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 133-164). Amsterdam: John Benjamins.

Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60, 263-308.

Taylor, A. M. (2006). The effects of CALL versus traditional L1 glosses on L2 reading comprehension. CALICO Journal, 23, 309-318.

Taylor, A. M. (2009). CALL-based versus paper-based glosses: Is there a difference in reading comprehension? CALICO Journal, 27, 147-160.

Téllez, K., & Waxman, H. C. (2006). A meta-synthesis of qualitative research on effective teaching practices for English Language Learners. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 245-277). Amsterdam: John Benjamins.

Thomas, M. (1994). Assessment of L2 proficiency in second language acquisition research. Language Learning, 44, 307-336.

Thomas, M. (2006). Research synthesis and historiography: The case of assessment of second language proficiency. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 279-298). Amsterdam: John Benjamins.