Doing Synthesis and Meta-Analysis in Applied Linguistics
Lourdes Ortega
University of Hawai‘i at Mānoa
National Tsing Hua University, Taiwan, June 8, 2011
Please cite as:
Ortega, L. (2011). Doing synthesis and meta-analysis in applied linguistics. Invited workshop at Tsing Hua University, Taipei, June 8, 2011.
Copyright © Lourdes Ortega, 2011
Research synthesis (including meta-analysis)
1. What is it?
2. Why do it?
3. How do we do it?
4. An example…
5. Challenges?
6. Value?
What is research synthesis?
The reviewing continuum
Secondary research ranges from narrative to systematic reviewing:
Narrative end: traditional literature review
Systematic end: research synthesis, including meta-analysis
So, what is meta-analysis, specifically?
…one specific kind of research synthesis…
Secondary analysis of quantitative analyses
Each primary study is a data point
Goal: what are the main ‘effects’ or ‘relationships’ found across many studies?
Strictly speaking, only quantitative studies can be meta-analyzed
Why do it?
Traditional literature reviews… have led to unending debates:
What does the evidence “say”? According to whom? How do we know who is right?
e.g., error correction (Ferris vs. Truscott)
e.g., Critical Period Hypothesis (Hyltenstam et al. vs. Birdsong)
Typical strategies of traditional reviews?
Tables summarizing many studies, e.g., from Krashen et al. (1979)
Vote-counting technique, e.g., error correction in L2 writing
Limitations:
No specific set of methods; everything is left to mysterious expertise
Experts always have a vested interest, and are therefore vulnerable to charges of bias
Statistical significance has serious pitfalls
Idiosyncratic methodology
Evidentiary warrants difficult to judge
Over-reliance on statistical significance (but magnitude, not just generalizability, is of interest to social scientists!)
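To see why vote counting is fragile, here is a minimal Python sketch. The five studies and their (d, p) pairs are hypothetical, invented for illustration, and `vote_count` is a hypothetical helper rather than a published procedure. The tally looks like a split verdict even though every study points the same way with a similar magnitude.

```python
from statistics import mean

# Hypothetical primary studies: (effect size d, p-value) for each.
# These numbers are invented for illustration only.
studies = [
    (0.45, 0.03),
    (0.50, 0.12),
    (0.40, 0.21),
    (0.55, 0.04),
    (0.48, 0.09),
]

def vote_count(results, alpha=0.05):
    """Tally studies the way a traditional vote-counting review would."""
    votes = {"significant positive": 0, "non-significant": 0, "significant negative": 0}
    for d, p in results:
        if p < alpha and d > 0:
            votes["significant positive"] += 1
        elif p < alpha and d < 0:
            votes["significant negative"] += 1
        else:
            votes["non-significant"] += 1
    return votes

votes = vote_count(studies)
print(votes)  # 2 significant vs. 3 non-significant: looks like a split verdict
average_d = mean(d for d, _ in studies)
print(round(average_d, 2))  # yet all five studies show d between .40 and .55
```

The tally discards magnitude; the average d keeps it.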
What does the evidence “say”? According to whom? How do we know who is right?
SOLUTION in the late 1970s: methods for reviewing, from “art” into “science”:
• Systematic, not arbitrary
• More than the sum of the parts
• Replicable
Secondary, yes... but empirically accountable, and discovering new truths in old data
How do we do it?
Norris & Ortega (2006a, 2006b)
Norris, J. M., & Ortega, L. (2010). Timeline: Research synthesis. Language Teaching, 43, 461-479.
Ortega, L. (2010). Research synthesis. In B. Paltridge & A. Phakiti (Eds.), Companion to research methods in applied linguistics (pp. 111-126). London: Continuum.
Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of applied linguistics. Malden, MA: Wiley.
Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.
What are the definitional features of all syntheses (including meta-analyses)?
1. Principled selection of primary studies
2. Systematic coding of each study for main variables
3. Direct use of the evidence reported (not the authors’ interpretations) across studies
1. Principled selection of studies
Sampling is central to empirical research: what population are we trying to understand?
• Random [experimental]
• Purposive [qualitative]
Sampling is central to synthesis, as well:
• Complete [secondary research should be based on the full universe of studies that have investigated the same thing]
Search & Retrieval of Literature
The literature search is a key step in systematic synthesis (for some direction, see In'nami & Koizumi, 2010): identify all studies that are relevant.
• Exhaustive [electronic, hand, footnote chasing, invisible college]
• Replicable [fully explained in report]
1st, electronic searches; 2nd, other techniques:
• Manual searches of journals
• Footnote chasing
• Forward searches with Web of Science
• Website searches of key contributing scholars
• Polite email requests to authors & experts
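Keeping the search replicable means logging what each technique retrieved and how duplicates were removed. A minimal Python sketch, assuming records arrive as hypothetical (source, title) pairs; matching on normalized titles alone is a simplification, and real projects usually also compare authors, year, and DOI.

```python
import re
from collections import Counter

# Hypothetical search results: (source, title) pairs gathered by different
# retrieval techniques. Titles and sources are invented for illustration.
records = [
    ("database", "Effects of recasts on L2 morphology"),
    ("database", "Input flood and L2 article use"),
    ("manual journal search", "Effects of Recasts on L2 Morphology."),
    ("footnote chasing", "Task-based interaction and L2 questions"),
    ("email to authors", "Input flood and L2 article use"),
]

def normalize(title):
    """Lower-case and strip punctuation so near-identical titles match."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()

def deduplicate(records):
    """Keep the first record for each normalized title; count hits per source."""
    seen = {}
    hits_per_source = Counter()
    for source, title in records:
        hits_per_source[source] += 1
        key = normalize(title)
        if key not in seen:
            seen[key] = (source, title)
    return list(seen.values()), hits_per_source

unique, hits = deduplicate(records)
print(f"{len(records)} records retrieved, {len(unique)} unique")
for source, n in hits.items():
    print(f"  {source}: {n}")
```

The per-source counts are the kind of figures a replicable report can state explicitly.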
Inclusion & Exclusion criteria
All potentially relevant studies must then be examined to decide: Include or Exclude (“apples or oranges?”)
Inclusion criteria [all criteria satisfied]
Exclusion criteria [explain each reason for exclusion and give examples]
Full rationale [tables, appendices, philosophy of inclusivity or selectivity]
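The inclusion/exclusion logic can be sketched in Python as follows. The `Study` fields and the three criteria are hypothetical stand-ins for whatever a given synthesis specifies. The design choice shown is that a study is included only if it passes every criterion, and every failed criterion is recorded as an explicit reason, which is what makes the exclusions reportable.

```python
from dataclasses import dataclass

@dataclass
class Study:
    # Hypothetical fields; a real synthesis defines these in its own protocol.
    author_year: str
    has_control_group: bool
    reports_means_and_sds: bool
    is_l2_instruction: bool

# Each criterion pairs a test with the reason recorded when a study fails it.
CRITERIA = [
    (lambda s: s.is_l2_instruction, "not a study of L2 instruction"),
    (lambda s: s.has_control_group, "no comparison or control group"),
    (lambda s: s.reports_means_and_sds, "insufficient statistics reported"),
]

def screen(studies):
    """Include a study only if it satisfies all criteria; otherwise log why not."""
    included, excluded = [], []
    for study in studies:
        reasons = [reason for test, reason in CRITERIA if not test(study)]
        if reasons:
            excluded.append((study.author_year, reasons))
        else:
            included.append(study.author_year)
    return included, excluded

pool = [
    Study("Study A (1995)", True, True, True),
    Study("Study B (1998)", False, True, True),
    Study("Study C (1999)", True, False, False),
]
included, excluded = screen(pool)
print("Included:", included)
for name, reasons in excluded:
    print("Excluded:", name, "->", "; ".join(reasons))
```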
1. Principled selection of studies: literature search + study eligibility criteria (inclusion/exclusion)
What are the definitional features of all syntheses (including meta-analyses)?
2. Systematic coding of each study
Eliciting evidence with consistency, just as when surveying, interviewing, or testing participants
Asking research questions of the literature:
What variables are important? How (and how well) have they been investigated? What are the findings across studies?
Publication features
Year
Author
Published or Fugitive?
•Journal
•Book
•Dissertation
•Presentation
Substantive features
e.g., How was “explicit” instruction defined?
e.g., How was “learning” measured?
Methodological features
e.g., Were means, SDs, etc. reported?
Sample size
Design
Reliability
Stats used
Etc.
Coding book to identify the study features that answer the questions asked of the literature
Multiple coders
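A coding book can be thought of as a fixed schema plus a list of allowed codes. This Python sketch assumes hypothetical field names and codes; only the three feature families (publication, substantive, methodological) come from the slide. Checking each record against the coding book catches illegal codes before any analysis.

```python
from dataclasses import dataclass, asdict

# A hypothetical coding book: each feature and its allowed codes.
CODING_BOOK = {
    "publication_type": {"journal", "book", "dissertation", "presentation"},
    "instruction_type": {"explicit", "implicit"},
    "design": {"pre-post with control", "post-only with control", "no control"},
}

@dataclass
class CodedStudy:
    study_id: str
    year: int                   # publication feature
    publication_type: str       # publication feature (published or fugitive?)
    instruction_type: str       # substantive feature
    design: str                 # methodological feature
    n: int                      # methodological feature: sample size
    reliability_reported: bool  # methodological feature

def check_codes(record):
    """Return the features whose values are not listed in the coding book."""
    values = asdict(record)
    return [feature for feature, allowed in CODING_BOOK.items()
            if values[feature] not in allowed]

record = CodedStudy("S01", 1996, "dissertation", "explicit", "no control", 24, False)
print(check_codes(record))  # [] means every coded value is a legal code
```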
1. Principled selection of studies
2. Systematic coding of each study for main variables: coding book, standardization, intercoder reliability
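Intercoder reliability is often reported both as simple percent agreement and as Cohen's kappa, which corrects for the agreement expected by chance. A minimal Python sketch; the two coders' codes are invented for illustration, and the function assumes chance agreement is below 1 (otherwise kappa is undefined).

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Proportion of studies to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = len(coder_a)
    p_o = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of the two coders' marginal proportions per code.
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(coder_a) | set(coder_b))
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes for "type of instruction" assigned to ten studies.
coder_a = ["explicit"] * 6 + ["implicit"] * 4
coder_b = ["explicit"] * 5 + ["implicit"] * 5
print(percent_agreement(coder_a, coder_b))       # 0.9
print(round(cohens_kappa(coder_a, coder_b), 2))  # 0.8
```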
What are the definitional features of all syntheses (including all meta-analyses)?
3. Trust the evidence, not the authors
Record carefully what authors report and how they report it,…
But ultimately, analyze what the evidence they present tells us, not what they say it means…
Seeking an objective view across studies of the accumulated state of knowledge…
When aggregating and averaging findings is the goal, as in meta-analysis…
How do we compare, combine, and interpret findings across numerous quantitative studies of the same thing?
Effect sizes & confidence intervals
Effect size: What is it?
An estimate of the magnitude or strength of a quantitative finding:
…how much difference? …how much improvement? …how closely related?
Effect sizes: absolute scales
1. Percent: Study 1, experimental group 30% better than control; Study 2, experimental group 20% better than control
2. Correlation: Study 1, motivation & achievement, r = .36; Study 2, motivation & achievement, r = .78
3. Known measure: Study 1, pre-post TOEFL score 450 to 575; Study 2, pre-post TOEFL score 450 to 495
Q: What happens when studies do not report findings on comparable scales?
Effect sizes: standardized
Difference between experimental and control groups in standard deviation units (Cohen’s d)
Effect size d = the average of the experimental group minus the average of the control group, divided by the pooled standard deviation of both groups.
d is also simple to calculate and to interpret, and it incorporates variability differences between groups.
[Figure: overlapping experimental and control score distributions, one panel showing no sizeable effect (d = 0.10), the other a very large effect (d = 3.00)]
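The definition of d above translates directly into code. This Python sketch assumes the usual pooled standard deviation, sqrt(((n_E - 1) * s_E^2 + (n_C - 1) * s_C^2) / (n_E + n_C - 2)), which the slide does not spell out; the group means and SDs are hypothetical. No small-sample correction (as in Hedges' g) is applied.

```python
from math import sqrt

def pooled_sd(sd_e, n_e, sd_c, n_c):
    """Pooled standard deviation of the experimental and control groups."""
    return sqrt(((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_e + n_c - 2))

def cohens_d(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Standardized mean difference: (M_exper - M_contr) / pooled SD."""
    return (mean_e - mean_c) / pooled_sd(sd_e, n_e, sd_c, n_c)

# Hypothetical post-test results (invented numbers).
d = cohens_d(mean_e=75.0, sd_e=10.0, n_e=20, mean_c=70.0, sd_c=10.0, n_c=20)
print(round(d, 2))  # 0.5: the groups differ by half a standard deviation
```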
Effect sizes for meta-analysis
Each study (Study 1, Study 2, Study 3, Study 4, Study 5, …) contributes an effect size (effect size 1, effect size 2, effect size 3, effect size 4, effect size 5, …); together these yield an average effect size.
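The averaging step can be sketched in a few lines of Python. The five effect sizes and sample sizes are hypothetical. Two versions are shown: a simple mean, where every study counts equally, and a sample-size-weighted mean, one common way of letting larger (more precise) studies count more. Which weighting a given meta-analysis uses is a design decision that should be reported.

```python
# Hypothetical effect sizes (d) and total sample sizes from five studies.
effect_sizes = [0.80, 1.20, 0.50, 1.10, 0.90]
sample_sizes = [20, 60, 30, 80, 40]

# Simple (unweighted) average: every study counts equally.
unweighted = sum(effect_sizes) / len(effect_sizes)

# Sample-size-weighted average: larger studies count more.
weighted = sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / sum(sample_sizes)

print(round(unweighted, 2), round(weighted, 2))  # 0.9 0.99
```

Here the large studies happen to report larger effects, so weighting pulls the average up.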
Interpreting effect sizes: What does d really tell us?
d < .30; d between .30 and .80; d > .80
"The terms 'small,' 'medium,' and 'large' are relative, not only to each other, but to the area of behavioral science or even more particularly to the specific content and research method being employed in any given investigation..."
(Cohen, 1988, p. 25)
Confidence Intervals: The average is not enough
The stroll from the hotel to the University is, on average, 10 minutes, plus or minus 3 minutes:
Upper bound = 13 minutes
Average = 10 minutes
Lower bound = 7 minutes
“The margin of error in an observation”: 95% confidence
Confidence Intervals in Meta-analysis
CIs tell us about the certainty with which we can interpret an average effect size.
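A minimal Python sketch of a 95% confidence interval around an average effect size, assuming a normal approximation (mean ± 1.96 standard errors) and treating each study's d as one observation. The ten effect sizes are hypothetical. Many meta-analyses instead weight each study by the inverse of its sampling variance, so this illustrates the idea rather than the procedure behind any reported interval.

```python
from math import sqrt
from statistics import mean, stdev

def mean_effect_ci(effect_sizes, z=1.96):
    """Mean effect size with a normal-approximation 95% confidence interval."""
    k = len(effect_sizes)
    m = mean(effect_sizes)
    se = stdev(effect_sizes) / sqrt(k)  # standard error of the mean
    return m, m - z * se, m + z * se

# Hypothetical effect sizes from ten studies.
ds = [0.4, 0.9, 1.3, 0.7, 1.1, 0.6, 1.0, 0.8, 1.2, 0.5]
m, lower, upper = mean_effect_ci(ds)
print(f"mean d = {m:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Fewer studies or more scatter among them widen the interval, which is exactly the uncertainty a bare average hides.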
Effect Sizes and Confidence Intervals in Meta-analysis
                                         N    K    Mean d   SD d   95% CI lower   95% CI upper
Avg. effect of instructional treatment   49   98   .96      .87    .78            1.14
We can be 95% confident that the actual effect of instruction lies between .78 and 1.14
Why does it help to focus on effect sizes?
e.g., smoking research in the 1960s (U.S. Department of Health, Education, and Welfare Report, 1967)
Statistical significance alone: There is a statistically significant difference in mortality rates between smokers and non-smokers.
Effect sizes:
Smoking up to half a pack a day (fewer than 10 cigarettes) increases the chance of mortality by 40% when compared to non-smokers.
Smoking two packs or more a day increases the risk of death by 120%, three times the increase at half a pack a day, when compared to non-smokers.
And what about small effects: can they be important too?
r = .034: a truly ‘tiny’ effect!
Regular aspirin consumption and decrease in heart attacks: a 3.4% decrease, that is, at least 3 out of 100 people would not have a heart attack if they regularly took aspirin.
d = .30: a small-magnitude effect!
Effects of reading tutorials for underachieving students: the same for untrained peer tutoring as for highly trained teachers engaging in longer hours of tutoring. Both are important!
Interpreting effect sizes: complex, contextualized, not absolute
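One standard way to turn a correlation into an "out of 100" statement like the aspirin example is Rosenthal and Rubin's binomial effect size display (BESD), which recasts r as two success rates per 100 people: 50 + 100r/2 and 50 - 100r/2. The slide does not name the BESD, so treat the connection as an assumption; the arithmetic does reproduce the 3.4-point difference for r = .034.

```python
def besd(r):
    """Binomial effect size display: recast r as two 'success rates' per 100 people.

    Treatment rate = 50 + 100*r/2, comparison rate = 50 - 100*r/2.
    """
    treatment = 50 + 100 * r / 2
    comparison = 50 - 100 * r / 2
    return treatment, comparison

# r = .034 from the aspirin example; "success" = no heart attack.
treated, untreated = besd(0.034)
print(f"{treated:.1f} vs {untreated:.1f} per 100: a difference of {treated - untreated:.1f}")
# prints "51.7 vs 48.3 per 100: a difference of 3.4"
```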
What are the definitional features of all syntheses (including all meta-analyses)?
1. Principled selection of studies
2. Systematic coding of each study for main variables
3. Direct use of the evidence reported (not the authors’ interpretations): effect sizes, confidence intervals, other kinds of new data based on old
How do we do it? An example of synthesis + meta-analysis
In applied linguistics, the first full-blown synthesis and meta-analysis:
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.
Effects of instruction: recasts, garden path, input enhancement, input processing, input flood, inductive, task-based interaction, traditional grammar, consciousness-raising, dictogloss
Step 1: Problem specification
Focus of Norris & Ortega: L2 instruction and L2 learning
RQ 1 & 2: Instruction: overall? by type?
RQ 3: Effect of outcome measures?
RQ 4: Instructional intensity?
RQ 5: Durability of effects?
RQ 6: Quality of research practices?
Step 2: Literature search
1st, electronic searches; 2nd, other techniques:
• Manual searches of 14 journals
• Footnote chasing of 25 reviews
• Footnote chasing of each study included
Step 3: Study eligibility criteria
Potentially relevant: 250 → relevant for synthesis: 77 → adequate for meta-analysis: 49
Step 4: Coding of study features
Type of instruction: FonF, FonFS, explicit, implicit
Type of outcome measure: metalinguistic, selected, constrained, free
Intensity of instruction: Brief (less than 1 hr), short (between 1 and 2 hrs), medium (between 3 and 6 hrs), long (more than 7 hrs)
Durability of effects: effect sizes on delayed tests
Steps 5 & 6: Analyze, display, interpret
[Figures: findings for RQ 1 & 2 (effectiveness), RQ 3 (type of measure), RQ 4 (intensity), and RQ 5 (durability)]
RQ 1-5 (meta-analysis part): How effective is L2 instruction?
Clearly more effective than no instruction or only meaningful exposure to L2: d = 0.96, based on 49 studies
Explicit instruction is superior in the short term to implicit instruction: d = 1.13 versus d = 0.54, based on 69 and 29 contrasts, respectively
But focus on form and focus on formS are equally effective: d = 1.00 (form) versus d = 0.93 (formS), based on 43 and 55 contrasts, respectively
Effects are durable: d = 1.02 on delayed post-tests from 22 studies
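As an arithmetic check on these figures, each pair of subgroup means, weighted by its number of contrasts, recombines to the overall mean: 69 + 29 = 98 and 43 + 55 = 98 contrasts, matching K = 98 in the earlier table. The Python sketch below assumes every contrast counts equally; the slides do not say how the averages were computed, so this shows consistency, not the authors' procedure.

```python
def pooled_mean(groups):
    """Average of subgroup means, weighting each by its number of contrasts."""
    total_k = sum(k for _, k in groups)
    return sum(d * k for d, k in groups) / total_k

# (mean d, number of contrasts) as reported on the slide.
explicit_vs_implicit = [(1.13, 69), (0.54, 29)]
form_vs_forms = [(1.00, 43), (0.93, 55)]

print(round(pooled_mean(explicit_vs_implicit), 2))  # 0.96
print(round(pooled_mean(form_vs_forms), 2))         # 0.96
```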
RQ 6 (synthesis part): Research practices
Too many variables in a single design: need to simplify designs and increase N
No pre-test (18%), no true control group (83%): need to always include both
Poor reporting standards (52% no SD, 84% no instrument reliability, 57% no set alpha): editors need to demand better reporting
Misuse of statistical inference (no assumptions checked or met, parametric statistics on small samples, no consideration of magnitude): the field needs better training in statistics if researchers insist on using such methods
Since then… an accumulation of meta-analyses
In 2000, when Norris & Ortega was published, there were only 2 other published systematic syntheses in applied linguistics. As of 2010, Norris & Ortega identified 23 in their Timeline, most published since 2006.
Motivation: Masgoret & Gardner (2003)
Interaction: Keck et al. (2006), Mackey & Goo (2007)
Oral feedback: Russell & Spada (2006), Lyster & Saito (2010), Li (2010)
Use of glosses in CALL: Taylor (2006 & 2009), Abraham (2008)
Some challenges for research synthesis in L2 research…
Publication bias: “file drawer problem”
Well-known phenomenon, present in all the social sciences (Rosenthal, 1979; Rothstein et al., 2005)
Little understood in applied linguistics
• Include fugitive literature
• Check for publication bias
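One classic (and much-criticized) check for the file drawer problem is Rosenthal's (1979) fail-safe N: the number of unretrieved studies averaging a null result (Z = 0) that would have to exist to pull the combined, Stouffer-style one-tailed test above p = .05. A minimal Python sketch; the four Z values are hypothetical. Funnel plots and trim-and-fill are common alternatives.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N.

    Combined Z = sum(Z) / sqrt(k + X); setting it equal to z_alpha (one-tailed
    .05) and solving gives X = sum(Z)**2 / z_alpha**2 - k. A negative result
    means the combined test is already non-significant.
    """
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha**2 - k

# Hypothetical Z values for four published studies.
z = [2.0, 1.5, 2.5, 1.0]
print(round(fail_safe_n(z), 1))  # about 14 null studies would be needed
```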
Quality: “garbage in, garbage out”
The quality of a synthesis can only be as good as the quality of the primary studies that are synthesized in it...
But how do we judge quality? Publication type? Methodology ratings? Exclusions?
Ethics
Anticipate consequences of synthesis
Would it prematurely close the area for research?
Would it be taken as a personal attack on researchers/labs?
What is the potential for findings to be (mis)appropriated by audiences (policy makers, teachers, …)?
High-tech statistication, cookie-cutter approach
“... conceptual vacuum when technical meta-analytic expertise is not coupled with deep knowledge of the theoretical and conceptual issues at stake in the research domain under review…”
(Norris & Ortega, 2006b, p. 37)
Meta-analysis only, no interest in quantitative synthesis of other kinds/scope
New-generation meta-analyses bypass synthesis: Li (2010), Lyster & Saito (2010), Plonsky (2011), Spada & Tomita (2010)
Other kinds of synthesis: Thomas (1994, 2006), Ortega (2003), ?????
Yet, much contemporary research in applied linguistics is qualitative, and increasingly more is mixed-methods… both worth synthesizing!
Qualitative synthesis?
No interest either in exploring qualitative synthesis… Only Téllez & Waxman (2006) in applied linguistics
• Meta-ethnography (Noblit & Hare, 1988; see Téllez & Waxman, 2006)
• Qualitative Comparative Analysis (Ragin, 1999)
• Critical Interpretive Synthesis (Dixon-Woods et al., 2006)
And there are options to draw from in education, health sciences, and other fields!
Value?
There is huge value in systematic synthesis (including meta-analysis): secondary research, yes... but:
• Empirically accountable
• Conceptually illuminating: discovering new truths in old data
Sustained progress…
• Much improvement in certain reporting practices (LL, MLJ in particular)
• Larger N in primary studies = more trustworthy analyses
• Use of increasingly sophisticated techniques in meta-analyses… study quality criteria, weighting (by N, reliability, variance), fixed/random effects models, sensitivity analysis, trim-and-fill estimations, publication bias analyses, etc.
• Use of meta-analytic software, e.g., http://www.meta-analysis.com
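To make "weighting by variance" and "fixed/random effects models" concrete, here is a compact Python sketch of inverse-variance pooling with the DerSimonian-Laird estimate of between-study variance (tau²). It assumes the common large-sample approximation for the variance of d, and the five studies are hypothetical; in practice dedicated meta-analytic software would normally be used.

```python
from math import sqrt

def d_variance(d, n_e, n_c):
    """Approximate sampling variance of Cohen's d."""
    return (n_e + n_c) / (n_e * n_c) + d**2 / (2 * (n_e + n_c))

def pool(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean and 95% CI (tau2=0 gives the fixed-effect model)."""
    weights = [1 / (v + tau2) for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return mean, mean - 1.96 * se, mean + 1.96 * se

def dersimonian_laird_tau2(effects, variances):
    """Between-study variance estimate (DerSimonian-Laird method of moments)."""
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    q = sum(wi * (d - fixed_mean) ** 2 for wi, d in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)

# Hypothetical studies: (d, n experimental, n control).
studies = [(0.2, 20, 20), (1.3, 15, 15), (0.6, 40, 40), (1.0, 25, 25), (0.4, 30, 30)]
effects = [d for d, _, _ in studies]
variances = [d_variance(d, ne, nc) for d, ne, nc in studies]

tau2 = dersimonian_laird_tau2(effects, variances)
fixed = pool(effects, variances)
random_ = pool(effects, variances, tau2)
print("fixed effect:   d = %.2f [%.2f, %.2f]" % fixed)
print("random effects: d = %.2f [%.2f, %.2f]" % random_, "tau^2 = %.3f" % tau2)
```

Because the random-effects weights add tau² to every study's variance, its confidence interval is never narrower than the fixed-effect one; when the studies agree closely, tau² is estimated as 0 and the two models coincide.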
“we envision synthetic methodologies as advancing our ability to produce new knowledge by carefully building upon, expanding, and transforming what has been accumulated over time ... However, ... all knowledge is bound by context and purpose...”
(Norris & Ortega, 2006b, p. 37)
But only if applied linguists cultivate “the will to synthesis”
Thank you!
References
Abraham, L. B. (2008). Computer-mediated glosses in second language reading comprehension and vocabulary learning: A meta-analysis. Computer Assisted Language Learning, 21, 199-226.
Dixon-Woods, M., Bonas, S., Booth, A., Jones, D. R., Miller, T., Sutton, A. J., et al. (2006). How can systematic reviews incorporate qualitative research? A critical perspective. Qualitative Research, 6, 27-44.
Keck, C. M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006). Investigating the empirical link between task-based interaction and acquisition: A meta-analysis. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 91-131). Amsterdam: John Benjamins.
Krashen, S., Long, M. H., & Scarcella, R. (1979). Accounting for child-adult differences in second language rate and attainment. TESOL Quarterly, 13, 573-582.
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language Learning, 60, 309-365.
Lyster, R., & Saito, K. (2010). Oral feedback in classroom SLA: A meta-analysis. Studies in Second Language Acquisition, 32(2).
Mackey, A., & Goo, J. M. (2007). Interaction research in SLA: A meta-analysis and research synthesis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collection of empirical studies (pp. 407-452). New York: Oxford University Press.
Masgoret, A.-M., & Gardner, R. C. (2003). Attitudes, motivation, and second language learning: A meta-analysis of studies conducted by Gardner and associates. Language Learning, 53, 123-163.
Noblit, G. W., & Hare, R. D. (1988). Meta-ethnography: Synthesizing qualitative studies. Newbury Park, CA: Sage.
Norris, J. M. (2012). Meta-analysis. In C. Chapelle (Ed.), Encyclopedia of applied linguistics. Malden, MA: Wiley.
Norris, J. M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quantitative meta-analysis. Language Learning, 50, 417-528.
Norris, J. M., & Ortega, L. (Eds.). (2006a). Synthesizing research on language learning and teaching. Amsterdam: John Benjamins.
Norris, J. M., & Ortega, L. (2006b). The value and practice of research synthesis for language learning and teaching. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 3-50). Amsterdam: John Benjamins.
Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41, 805-815.
Norris, J. M., & Ortega, L. (2010). Research timeline: Research synthesis. Language Teaching, 43, 461-479.
Ortega, L. (2003). Syntactic complexity measures and their relationship to L2 proficiency: A research synthesis of college-level L2 writing. Applied Linguistics, 24, 492-518.
Ortega, L. (2010). Research synthesis. In B. Paltridge & A. Phakiti (Eds.), Companion to research methods in applied linguistics (pp. 111-126). London: Continuum.
Plonsky, L. (2011). The effectiveness of second language strategy instruction: A meta-analysis. Language Learning, 61(4).
Ragin, C. C. (1999). Using Qualitative Comparative Analysis to study causal complexity. Health Services Research, 34(5, Part 2), 1225-1239.
Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for the acquisition of L2 grammar: A meta-analysis of the research. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 133-164). Amsterdam: John Benjamins.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60, 263-308.
Taylor, A. M. (2006). The effects of CALL versus traditional L1 glosses on L2 reading comprehension. CALICO Journal, 23, 309-318.
Taylor, A. M. (2009). CALL-based versus paper-based glosses: Is there a difference in reading comprehension? CALICO Journal, 27, 147-160.
Téllez, K., & Waxman, H. C. (2006). A meta-synthesis of qualitative research on effective teaching practices for English Language Learners. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 245-277). Amsterdam: John Benjamins.
Thomas, M. (1994). Assessment of L2 proficiency in second language acquisition research. Language Learning, 44, 307-336.
Thomas, M. (2006). Research synthesis and historiography: The case of assessment of second language proficiency. In J. M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 279-298). Amsterdam: John Benjamins.