Defining and Evaluating ‘Study Quality’
Luke Plonsky
Current Developments in Quantitative Research Methods
LOT Winter School, 2014
Study Quality Matters?
Building theory (or a house)
Studies = 2x4s, bricks, etc.
Self-evident?
Rarely discussed in linguistics research
But lack of attention to quality ≠ low quality
Implication: Study quality needs to be examined, not assumed
YES!
Defining ‘Study Quality’
How was SQ defined in Plonsky & Gass (2011) and Plonsky (2013)?
How was SQ operationalized?
Do you agree with this definition & operationalization?
Now consider your (sub-)domain of interest
How would you operationalize SQ?
How would you weight or prioritize different features?
Missing data
Data type      Primary research                          Secondary / meta-analytic research
SDs            - Sample variability                      - Calculate ESs (d); exclusion?
Reliability    - Small effects due to treatment or       - Adjust ESs for attenuation
                 dependent measure?
               - Inform instrument design
Effect sizes   - Interpret magnitude of effects          - Compare/combine results
               - Future power analyses                   - Power for moderator analysis
LIMITED INFLUENCE ON L2 THEORY, PRACTICE, AND FUTURE RESEARCH
INEFFICIENT
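To make the table concrete, here is a minimal sketch of why a meta-analyst needs SDs and reliability estimates from primary studies. It computes Cohen's d with a pooled SD and then applies the standard correction for attenuation (dividing by the square root of the reliability of the dependent measure). The group means, SDs, sample sizes, and the reliability value are hypothetical, chosen only for illustration.

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)

def correct_for_attenuation(d, reliability):
    """Adjust d for measurement error in the dependent measure."""
    return d / math.sqrt(reliability)

# Hypothetical treatment vs. comparison group (not data from any study cited here)
d = cohens_d(m1=75.0, sd1=10.0, n1=20, m2=70.0, sd2=10.0, n2=20)
d_adj = correct_for_attenuation(d, reliability=0.80)

print(f"d = {d:.2f}, d adjusted for attenuation = {d_adj:.2f}")
```

Without the two SDs, d cannot be computed from the means alone. Without a reliability estimate, the adjustment step is unavailable.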
Sources/Considerations for an Instrument to Measure Study Quality?
1. (Over 400) Existing measures of study quality from the meta-analysis literature (usually for weighting ESs) (e.g., sample: Valentine & Cooper, 2008, Table 2)
2. Societal guidelines (e.g., APA, APS, sample: JARS Working Group—Table 1, 2008, AERA 2006 reporting standards, LSA??, AAAL/AILA??)
3. Journal guidelines (e.g., Chapelle & Duff, 2003)
4. Methodological syntheses from other social sciences (e.g., Skidmore & Thompson, 2010)
5. Previous reviews / meta-analyses (e.g., Chaudron, 2001; Norris & Ortega, 2000; Plonsky, 2011)
6. Methods/stats textbooks (Larson-Hall, 2010; Porte, 2010)
7. Others?
(Only) Two studies in this area to address study quality empirically
Plonsky & Gass (2011); Plonsky (2013, in press)
Rationale & Motivations
Study quality needs to be measured, not assumed
Concerns expressed about research and reporting practices
“Respect for the field of SLA can come only through sound scientific progress” (Gass, Fleck, Leder, & Svetics, 1998)
No previous reviews of this nature
Plonsky & Gass (2011) & Plonsky (2013)
Two common goals:
1. Describe and evaluate quantitative research practices
2. Inform future research practices
Methods
(very meta-analytic, but focus on methods rather than substance/effects/outcomes)
Plonsky & Gass (2011)
Domain: Interactionist L2 research; quantitative only
Across 16 journals & 2 books (all published, 1980-2009)
K = 174
Coded for: designs, analyses, reporting practices
Analyses: frequencies/%s
Plonsky (2013)
Domain: all areas of L2 research; quantitative only
Two journals: LL & SSLA (all published, 1990-2010)
K = 606
Coded for: designs, analyses, reporting practices (sample scheme)
Analyses: frequencies/%s
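As a rough illustration of the "code each study, then report frequencies/%s" approach described above, the sketch below tallies a tiny coding sheet. The field names (`design`, `reports_sd`, `reports_es`) and the four records are hypothetical and are not taken from either review's coding scheme.

```python
from collections import Counter

# Hypothetical coding sheet: one record per primary study.
# Field names and values are illustrative only.
coded_studies = [
    {"id": "S1", "design": "experimental", "reports_sd": True, "reports_es": False},
    {"id": "S2", "design": "experimental", "reports_sd": False, "reports_es": False},
    {"id": "S3", "design": "observational", "reports_sd": True, "reports_es": True},
    {"id": "S4", "design": "experimental", "reports_sd": False, "reports_es": False},
]

def percent_with(studies, feature):
    """Percentage of studies for which a yes/no coded feature is True."""
    return 100 * sum(1 for s in studies if s[feature]) / len(studies)

k = len(coded_studies)
design_counts = Counter(s["design"] for s in coded_studies)
design_percents = {design: 100 * count / k for design, count in design_counts.items()}

print("K =", k)
print("Designs (%):", design_percents)
print("Report SDs (%):", percent_with(coded_studies, "reports_sd"))
print("Report effect sizes (%):", percent_with(coded_studies, "reports_es"))
```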
How would you define your domain? Where would you search for primary studies?
RESULTS
Results: Designs
Major Designs across Research Settings
Design          Plonsky (2013)      P&G (2011)
                Class     Lab       (all)
Observational   20%       80%       65%
Experimental    45%       55%       35%
Results: Designs
Samples
Study            Average n   Total N    Groups
P&G (2011)       22          7,951      365
Plonsky (2013)   19          181,255    1,732
Results: Designs
Feature            Plonsky (2013)      P&G
                   Class     Lab       All
Random assign.     23%       48%       32%
Ctrl/Comp group    90%       84%       55%
Pretest            78%       59%       39%
Delayed posttest   50%       29%       79%
Results: Analyses
Analysis          P&G (2011) %   P (2013) %
ANOVA             54             56
t test            69             43
Correlation       18             31
Chi-square        50             19
Regression        8              15
MANOVA            7              7
ANCOVA            7              5
Factor analysis   2              5
SEM               -              2
Other             -              7
Nonparametrics    -              5
Results: Analyses
Number of Unique Statistical Analyses Used in L2 Research
Analyses per study   P&G (2011) %   P (2013) %
Zero                 6              12
One                  32             28
Multiple             62             60
Tests of Statistical Significance in L2 Research, Plonsky (2013)
M = 35
SD = 64
95% CI = 30-40
Median = 18
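The summary statistics above (M = 35, SD = 64, 95% CI of 30-40) can be sanity-checked with a short calculation. The sketch assumes the interval is a normal-approximation CI around the mean and that it is based on the K = 606 studies reported for Plonsky (2013). The slides do not say how the interval was computed, so both points are assumptions.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, k, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sd / math.sqrt(k)
    return mean - half_width, mean + half_width

# M and SD from the slide; K = 606 is assumed to be the relevant sample size
low, high = mean_ci(mean=35, sd=64, k=606)
print(f"95% CI: {low:.1f} to {high:.1f}")
```

Under these assumptions the interval rounds to the 30-40 shown on the slide. An SD nearly twice the mean, together with a median of 18, suggests a strongly right-skewed distribution, so the median is probably the more informative summary here.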
Results: Descriptive Statistics
Item                  P&G (2011) %   P (2013) %
Percentage            62             68
Frequency             71             48
Correlation           -              30
Mean                  64             77
Standard deviation    52             60
Mean without SD       -              31
Effect size           18             26
Confidence interval   3              5
Plonsky (2013)
Results: Inferential Statistics
Item                            P&G (2011) %   P (2013) %
F                               26             61
t                               32             36
χ²                              -              17
p =                             44             49
p < or >                        61             80
p either = or </>               -              44
p = and p < or >                -              42
ANOVA / t test without M        -              20
ANOVA / t test without SD       -              35
ANOVA / t test without F or t   -              24
Results: Other Reporting Practices
Item                      P&G (2011) %   P (2013) %
RQs or hypotheses         -              80
Visual displays of data   -              53
Reliability               64             45
Pre-set alpha             25             22
Assumptions checked       3              17
Power analysis            2              1
Plonsky (2013)
?
Studies excluded due to missing data (usually SDs) (as % of meta-analyzed sample)
[Bar chart; x-axis 0 to 280, with a MEDIAN marker. Studies and bar values are listed in the order extracted; the study-to-value pairing is not certain.]
Studies: Li (2010); Pennock-Roman & Rivera (2011); Bowles (2010); Kieffer et al. (2009); Abraham (2008); Goldschneider & DeKeyser (2003); Norris & Ortega (2000); Plonsky (2011); Wa-Mbaleka (2006); Lin et al. (2013); Grgurović et al. (2013); Biber et al. (2011); Russell & Spada (2006); Nekrasova & Becker (2009); Dinsmore (2006); Wu (1991)
Values: 6, 21, 25, 27, 36, 42, 49, 59, 59, 60, 81, 104, 110, 119, 194, 300
Median K = 16 (Plonsky & Oswald, under review)
Data missing in meta-analyzed studies (as % of total sample)
[Bar chart; x-axis 0 to 100; series: Missing test stat, Missing SD, Missing M. The pairing of values with studies and series is not recoverable.]
Studies: Nekrasova & Becker (2009); Norris & Ortega (2000); Plonsky & Gass (2011); Plonsky (in press); Keck et al. (2006); Wang (2010); Lee & Huang (2008)
Values as extracted: 14, 18, 20, 29, 24, 12, 35, 29, 6, 17, 24, 49, 19
Reporting of reliability coefficients (as % of meta-analyzed sample)
[Bar chart; x-axis 0 to 100]
Study                       %
Nekrasova & Becker (2009)   6
Mackey & Goo (2007)         7
Norris & Ortega (2000)      16
Russell & Spada (2006)      20
Jeon & Kaya (2006)          38
Plonsky (2011)              41
Ziegler (2013)              43
Plonsky (in press)          45
Adesope et al. (2010)       46
Adesope et al. (2011)       50
Plonsky & Gass (2011)       64
Reporting of effect sizes & CIs (as % of meta-analyzed sample)
[Bar chart; x-axis 0 to 100; series: ES, CI]
Study                    ES %
Keck et al. (2006)       0
Norris & Ortega (2000)   6
Wang (2010)              11
Mackey & Goo (2007)      14
Plonsky & Gass (2011)    18
Plonsky (in press)       26
Ziegler (2013)           36
CI values as extracted (six values for seven studies; pairing not recoverable): 0, 3, 0, 3, 5, 0
Other data associated with quality/transparency and recommended or required by APA (as % of meta-analyzed sample)
[Bar chart; x-axis 0 to 100; series: Power, Assumptions, Pre-set alpha, Visuals, RQs. The pairing of values with studies and series is not recoverable.]
Studies: Plonsky & Gass (2011); Plonsky (in press); Mackey & Goo (2007); Nekrasova & Becker (2009); Norris & Ortega (2000); Ziegler (2013); Keck et al. (2006)
Values as extracted: 25, 22, 25, 20, 29, 38, 3, 17, 36, 2, 1, 7, 80, 53, 34, 56
Elsewhere in the social sciences…
[Bar chart; x-axis 0 to 100; series: Power, Assumptions, Reliability, CIs, ESs. The pairing of values with studies and series is not recoverable.]
Studies: Keselman et al. (1998-a); Keselman et al. (1998-c); Keselman et al. (1998-b); Kieffer et al. (2001-a); Keselman et al. (1998-d); Bangert & Baumberger (2005); Thompson & Snyder (1997); Kieffer et al. (2001-b); Vacha-Haase et al. (1999); Willson (1980); Sedlmeier & Gigerenzer (1989); Cashen & Geiger (2004)
Values as extracted: 7, 7, 10, 18, 24, 43, 64, 72, 61, 15, 9, 33, 3, 0, 0, 1, 0.003, 0, 7, 0, 0.003, 11, 39, 36, 37
Results: Changes over time
Meara (1995): “[When I was in graduate school], anyone who could explain the difference between a one-tailed and two-tailed test of significance was regarded as a dangerous intellectual; admitting to a knowledge of one-way analyses of variance was practically the same as admitting to witchcraft in 18th century Massachusetts” (p. 341).
Changes Over Time: Designs
Plonsky & Gass (2011) Plonsky (in press)
Changes Over Time: Designs
Plonsky (in press)
Changes Over Time: Analyses
Plonsky & Gass (2011)
Changes Over Time: Analyses
Plonsky (in press)
Changes Over Time: Reporting Practices
Plonsky & Gass (2011)
Changes Over Time: Reporting Practices
Plonsky (in press)
Relationship between quality and outcomes?
Plonsky (2011)
Plonsky & Gass (2011): larger effects for studies that include delayed posttests
Discussion (Or: So what?)
General:
Few strengths and numerous methodological weaknesses are present—common even—in quantitative L2 research
Quality (and certainly methodological features) varies across subdomains AND over time.
Possible relationship between methodological practices and the outcomes they produce.
Three common themes:
- Means-based analyses
- Missing data, NHST, and the ‘Power Problem’
- Design preferences
Discussion: Means-based analyses
ANOVAs, t tests dominate, increasingly
Not problematic as long as
- Assumptions are checked (17% of Plonsky, 2013)
- Data are reported thoroughly
- Tests are the most appropriate for the RQs (i.e., not a default)
Benefits to increased regression analyses (see Cohen, 1968)
- Less categorization of continuous variables (e.g., proficiency, working memory) to use ANOVA → loss of variance!
- More precise results (R²s more informative than an overall p or eta²)
- Fewer tests → preservation of experiment-wise power
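The point above about categorizing continuous variables can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the talk: the data are simulated, the variable names (`proficiency`, `outcome`) and the 0.5 slope are hypothetical, and a median split stands in for whatever grouping a researcher might use to fit an ANOVA.

```python
import math
import random
from statistics import mean, median

def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the simulation is reproducible
n = 2000
proficiency = [random.gauss(0, 1) for _ in range(n)]          # continuous predictor
outcome = [0.5 * x + random.gauss(0, 1) for x in proficiency]  # simulated scores

# Median split: recode the continuous predictor as low (0) vs. high (1)
cut = median(proficiency)
high_group = [1.0 if x > cut else 0.0 for x in proficiency]

r2_continuous = pearson_r(proficiency, outcome) ** 2
r2_split = pearson_r(high_group, outcome) ** 2

print(f"R^2, continuous predictor: {r2_continuous:.3f}")
print(f"R^2, median split:         {r2_split:.3f}")
```

With a normally distributed predictor, a median split is expected to retain only about two thirds of the variance explained by the continuous version, which is the loss the slide warns about.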
Discussion: Missing data, NHST, & Power
In general: lots of missing and inconsistently reported data!
BUT we’re getting better!
The “Power Problem”
- Small samples
- Heavy reliance on NHST
- Effects not generally very large
- Omission of non-statistically-significant results → inflated summary results
- Rarely check assumptions
- Rarely use multivariate statistics
- Rarely analyze power
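To show how small samples and modest effects combine into the "Power Problem", here is a minimal sketch of an a priori power calculation for a two-group comparison. It uses a normal approximation to the two-sided independent-samples t test rather than the exact noncentral t distribution, so the results are approximate. The effect size (d = 0.5) and the group size of 20 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return norm.cdf(noncentrality - z_crit) + norm.cdf(-noncentrality - z_crit)

def n_for_power(d, power=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < power:
        n += 1
    return n

# Hypothetical scenario: medium effect, 20 participants per group
power_small_sample = approx_power(d=0.5, n_per_group=20)
n_needed = n_for_power(d=0.5, power=0.80)

print(f"Approximate power with n = 20 per group: {power_small_sample:.2f}")
print(f"Approximate n per group needed for 80% power: {n_needed}")
```

Under these assumptions, a study with 20 participants per group has well under a 50% chance of detecting a medium effect, and roughly three times as many participants per group would be needed to reach 80% power.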
Discussion: Design Preferences
Signs of domain maturity?
+ classroom-based studies
+ experimental studies
+ delayed posttests
Discussion-Summary
Causes/explanations
- Inconsistencies among reviewers
- Lack of standards
- Lack of familiarity with design and appropriate data analysis and reporting
- Inadequate training (Lazaraton et al., 1987)
- Non-synthetic-mindedness
- Publication bias
Effects
• Limited interpretability
• Limited meta-analyzability
• Overestimation of effects
• Overreliance on p values
Slower progress
Study Quality in Secondary/Meta-analytic Research?
Intro
M-As = high visibility and impact on theory and practice → quality is critical
Several instruments proposed for assessing M-A quality:
- Stroup et al. (2000)
- Shea et al. (2007)
- JARS/MARS (APA, 2008)
- Plonsky (2012)
Plonsky’s (2012) Instrument for Assessing M-A Quality
Goal 1: Assess transparency and thoroughness as a means to
- Clearly delineate the domain under investigation
- Enable replication
- Evaluate the appropriateness of the methods in addressing/answering the study’s RQs
Goal 2: Set a tentative, field-specific standard
- Inform meta-analysts and reviewers/editors of M-As
Organization:
- Lit review/intro
- Methods
- Discussion
What items would you include?
Plonsky’s (2012) Instrument for Assessing M-A Quality—Section I
Combine?
Plonsky’s (2012) Instrument for Assessing M-A Quality—Section II
Plonsky’s (2012) Instrument for Assessing M-A Quality—Section III
Looking FORWARD
Recommendations for:
- Individual researchers
- Journal editors
- Meta-researchers
- Researcher trainers
- Learned societies
Dear individual researchers,
Consider power before AND after a study (but especially before)
p is overrated (meaningless?) especially when working with (a) small samples, (b) large samples, (c) small effects, (d) large effects
Report and interpret data thoroughly (EFFECT SIZES!)
Consider regression and multivariate analyses
Calculate and report instrument reliability
Team up with an experimental (or observational) researcher
Develop expertise in one or more novel (to you) methods/analyses
Love,
Luke
Dear journal editors,
Use your influence to improve rigor, transparency, and consistency
It’s not enough to require reporting (of…ES, SDs, reliability etc.) – interpretation too!
Develop field-wide and field-specific standards
Include special methodological reviews (see Magnan, 1994)
Devote (precious) journal space to methodological discussions and reports
Love,
Luke
Dear meta-researchers,
Use your voice!
Guide interpretations of effect sizes in your domains
Evaluate and make known methodological strengths, weaknesses, and gaps; encourage effective practices and expose weak ones
Don’t just summarize
Explain variability in effects, not just means (e.g., due to small samples, heterogeneous samples or treatments)
Examine substantive and methodological changes over time and as they relate to outcomes
Cast the net wide in searching for primary studies
Love,
Luke
Dear researcher trainers,
Lots of emphasis on the basics: descriptive statistics, sample size+power+effect size+p; synthetic approach, ANOVA
Encourage more specialized courses, in other departments if necessary
Love,
Luke
Dear learned societies (AILA/AAAL, LSA, etc.),
Designate a task force or committee to establish field-specific standards for research and reporting practices:
(a) at least one member of the executive committee,
(b) members from the editorial boards of relevant journals,
(c) a few quantitatively- and qualitatively-minded researchers,
(d) and one or more methodologists in other disciplines
Love,
Luke
Closure
Content objectives: conceptual and practical (but mostly conceptual)
Inform participants’ current and future research efforts
Motivate future inquiry with a methodological focus
Happy to consult or collaborate on projects related to these discussions
THANK YOU!