Basic SLA Statistics for the University Educator
Peter Neff, Harumi Kimura, Philip McNally, Matthew Apple
CUE Forum 2007
© 2007 JALT CUE SIG and individual presenters
Purpose
Not how to use statistics in a study, but rather…
To help everyone better understand and interpret common statistical methods encountered in SLA studies
To cover common errors and issues related to each procedure
Overview
Introduction
Descriptive Statistics – Harumi
T-tests – Philip
One-way ANOVA – Peter
Factor Analysis – Matthew
Q&A
Outline
Each presenter will introduce:
– The function of the procedure
– Important underlying concepts
– Its use in SLA research
– An example of the procedure in action
– Errors and issues to look out for
Descriptive Statistics
Harumi Kimura
Nanzan University
Q1: Unreasonable fear?
Why do so many language teachers draw back in terror when confronted with large doses of numbers, tables, and statistics?
It is irresponsible to ignore such research just because you do not have the relatively simple tools for understanding it.
J.D. Brown (1988)
Q2: Values of statistical studies?
Individual behavior & Group phenomena
Quantifiable data
Structured with definite procedures
Follow logical steps
Replicable
Reductive
PATTERNS
Q3: What do descriptive statistics provide?
Snapshot description of the situation observed
Numerical representations of how each group performed on the measures
Readers can draw a mental picture
Q4: How do we manage the data?
Organize and present the data for further analysis
We describe them in/as graphs and figures
Two aspects of group behavior
Mean
Central tendency
Standard Deviation
Variability from mean
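The two aspects above can be computed in a few lines. This is a minimal sketch using Python's standard `statistics` module; the scores are invented for illustration and do not come from any study mentioned in this talk.

```python
import statistics

# Hypothetical vocabulary test scores for one class (invented data).
scores = [12, 15, 15, 16, 18, 20, 23]

mean = statistics.mean(scores)   # central tendency
sd = statistics.stdev(scores)    # sample SD: variability around the mean (divides by n - 1)

print(f"M = {mean:.2f}, SD = {sd:.2f}")
```

Reporting M and SD together lets a reader picture both where the group sits and how spread out it is.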
Normal Distribution
a normal curve
Just as in the natural world …
Position of an individual within a group
or
Comparison of a group with other groups
How normal?
Flat or peaked
Not symmetrical
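One common way to put numbers on "flat or peaked" and "not symmetrical" is kurtosis and skewness. The sketch below uses the simple population-moment formulas; statistics packages often apply small-sample corrections, so their values may differ slightly. The function names are illustrative, not from any library.

```python
import statistics

def _central_moment(data, order):
    """Average of (x - mean) raised to the given power."""
    m = statistics.mean(data)
    return sum((x - m) ** order for x in data) / len(data)

def skewness(data):
    """0 for a symmetrical distribution; positive when the tail runs to the right."""
    return _central_moment(data, 3) / _central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """0 for a normal curve; negative = flatter, positive = more peaked."""
    return _central_moment(data, 4) / _central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]        # invented data
right_skewed = [1, 1, 1, 2, 10]    # invented data with one very high score

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```

Values far from zero suggest the data may not suit procedures that assume a normal distribution.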
Issues
Are the data appropriate for further statistical analyses?
Mean and SD
Participants and sampling
N size
Mean and SD
Sampling: Random or Convenience
N size
To conclude
Mean and Standard Deviation
Normal Distribution
These concepts “are central to all statistical research and sometimes forgotten by researchers.”
Brown, 1988
t-tests
Philip McNally
Osaka International University
Function: Comparing two means
A t-test will…
…tell you whether there is a statistically significant difference in the mean scores (Pallant, 2006, p.206).
a.) for two different groups, or
b.) for one group at two different times.
Types of t-test
One group (Within-subject or repeated measures design)
– Paired samples t-test
– Matched pairs t-test
– Dependent means t-test
Two groups (Between-group or Between-subjects design)
– Independent samples t-test
– Independent measures t-test
– Independent means t-test
Uses of t-tests
(T)he simplest form of experiment that can be done: only one independent variable is manipulated in only two ways and only one dependent variable is measured (Field, 2003, p.207).
Example: Paired samples t-test
One group
Time 1: no extensive reading (IV); vocab test (DV).
Time 2: after extensive reading (IV); vocab test (DV).
Example: Independent samples t-test
Two groups
Group A - implicit grammar (IV); test (DV).
Group B - explicit grammar (IV); test (DV).
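The two designs above lead to two different calculations. The following is a rough sketch, assuming the textbook formulas (a paired t on the difference scores, and an independent t with pooled variance, which assumes roughly equal variances in the two groups). Function names and scores are invented for illustration; the p-value lookup against the t distribution is left to a statistics package.

```python
import math
import statistics

def paired_t(time1, time2):
    """t and df for ONE group measured twice (paired samples)."""
    diffs = [after - before for before, after in zip(time1, time2)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return statistics.mean(diffs) / se, n - 1

def independent_t(group_a, group_b):
    """t and df for TWO separate groups (pooled-variance version)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a)
                  + (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / se, n_a + n_b - 2

# Invented vocab scores: same learners before and after extensive reading.
t_paired, df_paired = paired_t([10, 12, 9, 14, 11], [12, 15, 10, 15, 13])

# Invented test scores: implicit-grammar group vs. explicit-grammar group.
t_ind, df_ind = independent_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])

print(f"paired: t = {t_paired:.2f}, df = {df_paired}")
print(f"independent: t = {t_ind:.2f}, df = {df_ind}")
```

Note how the degrees of freedom differ: n - 1 for the paired design, and n1 + n2 - 2 for the independent design.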
Example: Macaro & Erler (2007)
A longitudinal study of 11-12 year old British learners of French. The effect of reading strategy instruction.
Treatment group: N = 62
Control group: N = 54
Measures taken of reading comprehension, reading strategy use, and attitudes to French before and after the intervention.
Interpreting the data: Macaro & Erler (2007)
Results of attitudes to French
Area               t      df    p
Reading            4.91   114   .001*
Speaking           2.28   114   .024
Writing            2.30   114   .023
Listening          4.12   114   .001*
Spelling           3.74   114   .001*
General learning   3.61   114   .001*
Homework           2.92   114   .004*
Textbook           3.01   114   .005*
*p < .006
Types of error
Type I error
You think you’ve got significance, but you haven’t. You should have adjusted your alpha value if you made multiple comparisons.
Type II error
You think the difference between the means was by chance. It wasn’t, but because you adjusted for multiple comparisons the data failed to reach significance.
Controlling for Type I error
Alpha of .05 (the 95% level) = when there is no real difference, a test still comes out significant about 5% of the time
20 comparisons = about 1 significant by chance
100 comparisons = about 5 significant by chance
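The arithmetic behind these figures can be sketched as follows. The second function, the chance of at least one false positive, additionally assumes the comparisons are independent of each other, which is an idealization. Function names are illustrative.

```python
def expected_false_positives(n_comparisons, alpha=0.05):
    """Average number of 'significant' results expected when no real differences exist."""
    return alpha * n_comparisons

def chance_of_at_least_one(n_comparisons, alpha=0.05):
    """Probability of one or more false positives, assuming independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

for k in (1, 5, 20, 100):
    print(k, expected_false_positives(k), round(chance_of_at_least_one(k), 3))
```

With 20 unadjusted comparisons, the chance of at least one spurious "significant" result is already well over half.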
Controlling for Type I error
So, we have to make a Bonferroni adjustment if we make multiple comparisons…
Alpha level / No. of comparisons = 0.05 / 5 = 0.01
…and use this new figure as your alpha level.
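As an illustration, the adjustment can be applied to the eight attitude comparisons reported for Macaro & Erler (2007). The p-values are taken from the results slide; pairing each value with an area assumes the two lists on that slide run in the same order.

```python
alpha = 0.05

# p-values as reported on the slide (area order assumed to match).
p_values = {
    "Reading": 0.001,
    "Speaking": 0.024,
    "Writing": 0.023,
    "Listening": 0.001,
    "Spelling": 0.001,
    "General learning": 0.001,
    "Homework": 0.004,
    "Textbook": 0.005,
}

# Bonferroni: divide alpha by the number of comparisons made.
adjusted_alpha = alpha / len(p_values)

significant = [area for area, p in p_values.items() if p < adjusted_alpha]

print(f"adjusted alpha = {adjusted_alpha}")
print("still significant:", significant)
```

Speaking and Writing would pass at the unadjusted .05 level but not at the adjusted level, which is consistent with the asterisks on the slide.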
Issues
• does the data meet normality assumptions?
• is the sample size large enough?
• is the data continuous?
• is Type I error controlled for?
One-way ANOVA
Peter Neff
Doshisha University
What it is
Function
– ANalysis Of VAriance - a search for mean differences between data sets
– One-way ANOVA - looking for significant differences in the mean scores of 2 or more groups
– Why “one-way?” - looking at the effect that changing one variable has on the study’s participants
ANOVA Example 1
[Figure: the means of the Control (M1), Treatment 1 (M2), and Treatment 2 (M3) groups plotted at similar heights]
similar means (M) = non-significant (p > .05)
ANOVA Example 2
[Figure: the means of the Control (M1), Treatment 1 (M2), and Treatment 2 (M3) groups, with M1 set apart from M2 and M3]
M1 significantly different from M2 & M3 (p < .05 for each), but… M2 & M3 not significantly different from each other (p > .05)
ANOVAs and T-tests
Both procedures look for significant mean differences between groups;
However, t-tests work best when limited to 2 groups.
ANOVAs can work with 3 or more groups while introducing less error.
ANOVAs in Language Research
Often used to compare:
– Assessment scores
– Survey responses
A typical situation may be to try different treatments/methods with 3 different groups and then test them to see if the results show any significant differences
Vocabulary Testing Example
3 learning groups with equivalent starting vocabulary range
– Group I learns with word cards
– Group II learns with word lists
– Group III learns with PC software
After several weeks of study, a vocab test is given
Results from the test are ANOVA analyzed to see if any groups scored significantly higher/lower
Peer Review Survey Example
3 learning groups in EFL writing courses
Students peer review each other’s written work in one of 3 ways
– Group I – Written peer review
– Group II – Oral peer review
– Group III – PC-based peer review
After the review sessions, peer review satisfaction surveys are given using Likert (1~5) scales
Results are ANOVA analyzed for significant differences in satisfaction level among the review types
Reporting One-way ANOVA Results
Three basic components:
– 1) Table of descriptive statistics (mean, standard deviation, etc)
– 2) An ANOVA table (degrees of freedom, sum of squares, F-statistic)
– 3) A report of the post-hoc results with effect size
The F-statistic
The higher the better (for the model)
A significant F-statistic (p < .05) is what researchers look for
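For readers curious where the F-statistic comes from, here is a minimal sketch of the standard one-way calculation: the variance between group means divided by the variance within groups. The function name and scores are invented for illustration; the p-value lookup against the F distribution is left to a statistics package.

```python
import statistics

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)

    # How far each group mean sits from the grand mean (the 'model').
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # How far individual scores sit from their own group mean (the 'error').
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Invented vocab test scores for three learning groups.
word_cards = [1, 2, 3]
word_lists = [2, 3, 4]
pc_software = [5, 6, 7]

f, df_b, df_w = one_way_anova([word_cards, word_lists, pc_software])
print(f"F({df_b}, {df_w}) = {f:.2f}")
```

A large F means the group means differ by much more than the scatter inside each group would lead one to expect.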
ANOVA in the Literature
Descriptive statistics
ANOVA statistics
F-statistic is significant
…i.e. our model seems to work
F-statistic cont.
Reaching significance indicates there are statistically significant differences between some of the group means
But…the F-statistic doesn’t tell us where the differences are
For that we turn to…
Post-hoc Results and Effect Size
Post-hoc results
– These are done if the F-statistic is significant
– Paired comparisons of the group means
– Tell us where the significant differences lie
– Often reported in the text (though sometimes in table form)
Effect size
– Often referred to as ‘eta-squared’ or ‘strength of association’
– Indicates the magnitude of the difference between means
– Reflects the proportion of total variance accounted for by the treatments
– .01 – small effect
– .06 – medium effect
– .14 – large effect
*According to Cohen (1988)
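Eta-squared can be sketched as the between-groups sum of squares divided by the total sum of squares. The labels below use the Cohen (1988) benchmarks listed above; the function names, the "negligible" label for values under .01, and the scores are assumptions added for illustration.

```python
import statistics

def eta_squared(groups):
    """Proportion of total variance accounted for by group membership."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    return ss_between / ss_total

def cohen_label(eta_sq):
    """Rough verbal label using the .01 / .06 / .14 benchmarks."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"

# Invented scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
effect = eta_squared(groups)
print(f"eta-squared = {effect:.3f} ({cohen_label(effect)})")
```

A result can be statistically significant and still carry a small label here, which is why effect size is reported alongside the F-statistic.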
Reporting Post-hoc Results and Effect Size
“Post-hoc comparisons using the [...] indicated that the mean score for Group 1 (M=21.36, SD=4.55) was significantly different from Group 3 (M=22.96, SD=4.49). Group 2 (M=22.10, SD=4.15) did not differ significantly from either Group 1 or 3.”
“Despite reaching statistical significance, the actual difference in group means was quite small. The effect size, calculated using eta-squared, was .02.”
Common ANOVA Problems and Issues
Starting out with non-equivalent groups
Not reporting the type of ANOVA performed
Not reporting specific post-hoc results
Not reporting effect size
Post-hoc results ANOVA table
“[This table] shows the result from running through an ANOVA by using SPSS. It can be seen that the difference among treatments is significant (p < 0.05). The scores for the Vocabulary condition were much higher than the other conditions. The Main Character condition was slightly higher than the Combined condition.”
Problems
No mention of the type of ANOVA
No mention of post-hoc results.
– Which groups were significantly different from each other?
No mention of effect size.
– What was the magnitude of the treatment effect?
Conclusion
One-way ANOVAs are useful for looking at the effect of changing one variable on 3 or more equivalent groups
Often used for testing treatment effects or comparing survey results
Involves a two-step process of analyzing the model (through the F-statistic) and performing post-hoc procedures
Effect size (eta-squared) is an important component indicating the magnitude of the treatment effect
Factor Analysis
Matthew Apple
Doshisha University
FA: What it is
Measures only one group or sample population
A “family” of FA
– PCA, FA, EFA, CFA…
FA: What it does
Tests the existence of underlying (latent) constructs within a sample population
– Identifies patterns within large numbers of participants
– “Reduces” several items into a few measurable factors
Uses of FA within SLA
Typically used with psychological variables and Likert-scale questionnaires
Often a preliminary step before more complicated statistical analyses
– Correlational Analysis
– Multiple Regression
– Structural Equation Modeling
Example questionnaire factors
1. 英語で外国人と話しがしたい。 I would like to communicate with foreigners in English.
2. 英語習得は自分の教養を高めるのに必要だ。 English is essential for personal development.
3. 日本語でも自分がうまく表現できない。 I am not good at expressing myself even in Japanese.
4. 外国の音楽と文化に興味がある。 I am interested in foreign music and culture.
5. 英語は社会で活躍するのに必要だ。 English is essential to be active in society.
6. 難しいトピックに関しても、自分の意見が言える。 I can express my own opinions even about difficult topics.
Integrative
Instrumental
Self-competence
Terminology
Factor - the latent construct
Variance - different answers to each item (variable)
More Terminology!
Factor loading
– Correlation between an item and the factor; the squared loading is the amount of variance they share
– Factor loadings above .4 are desirable, above .7 are excellent
Cronbach’s Alpha
– Measurement of item-scale reliability
– Based on inter-item correlation and the number of items (other things being equal, the more items, the greater the alpha)
– Does not “prove” cause-effect or validity of items themselves
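Cronbach's alpha can be sketched with the usual formula: alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the Likert responses below are invented for illustration.

```python
import statistics

def cronbach_alpha(responses):
    """responses: one row per participant, one column per questionnaire item."""
    k = len(responses[0])                                   # number of items
    item_vars = [statistics.variance(col) for col in zip(*responses)]
    total_var = statistics.variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented 1-5 Likert answers from five participants on three items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 3],
    [4, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

When participants answer all the items in a similar way, the total-score variance is large relative to the item variances and alpha approaches 1.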
Determining factors, then items
Researchers should determine the factors before adding items to the questionnaire
– Previous research results
– Carefully constructed model
Items should be designed to relate to a particular concept (factor)
– “Borrow” items or develop them in a pilot
– 6-8 items for a robust factor
FA in the literature
Item 43 (“The more I study English, the more enjoyable I find it”)
F1 (“Beliefs about a contemporary (communicative) orientation to learning English”)
.630 loading
.63 X .63 = 40% of shared variance with the factor
(Above .4 is acceptable according to Stevens, 1992)
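The squaring step is easy to reproduce. This small sketch converts a loading into a percentage of shared variance; the function name is illustrative, and .33 is included only as a weaker loading for comparison.

```python
def shared_variance_percent(loading):
    """Square the loading to get the share of variance an item has in common with its factor."""
    return loading ** 2 * 100

for loading in (0.63, 0.33):
    print(f"loading {loading:.2f} -> {shared_variance_percent(loading):.0f}% shared variance")
```

Because the loading is squared, a drop from .63 to .33 cuts the shared variance by far more than half.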
Assumptions of FA
Normal distribution
Items are correlated above .3
Large N-size
– “Over 300” (Tabachnick & Fidell, 2007)
– 5-10 participants for each item (Field, 2005)
Ex: 30-item questionnaire, 3-4 factors, 150-300 participants
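The participants-per-item rule of thumb is simple multiplication, sketched below. The function name and parameter names are illustrative; the 5 and 10 multipliers are the guideline attributed to Field (2005) above.

```python
def participants_needed(n_items, per_item_low=5, per_item_high=10):
    """Rough N-size range under the 5-10 participants-per-item guideline."""
    return n_items * per_item_low, n_items * per_item_high

low, high = participants_needed(30)
print(f"30 items: {low}-{high} participants")
```

This gives only a rough floor; the “over 300” guideline may call for more participants regardless of questionnaire length.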
Problems and issues with FA
“Fishing” for data (i.e., not reading the literature, then simply allowing SPSS to tell you what it finds)
Not understanding the nature of factors (i.e., using 2 or 3 items as a “factor” or keeping too many factors)
Using an arbitrary cut-off point for factor loadings (typically .3, .32, .35)
N-size far too small
Item 38 (“I am very aware that teachers/lecturers know a lot more than I do and so I agree with what they say is important rather than rely on my own judgment”)
F3, .33 loading ; F1, .29 loading
.33 X .33 = 11% of shared variance with the factor
Typical factor loading issues
Conclusions regarding FA
Typically used with questionnaires to reduce individual items to factors for purposes of correlation or prediction
Helps researchers draw conclusions from a large number of items through data reduction
Requires a large N-size and several reference books
Often written up with no regard to APA guidelines or previous research results
Horribly, horribly complicated
To sum up…
Descriptive statistics
– Mean and SD
T-tests
– Dependent, independent, paired
One-way ANOVA
– F, effect size, post-hoc
Factor Analysis
– Factor, variance, factor loading
Thank you for attending!
CUE SIG Forum 2007
JALT 2007 International Conference
Yoyogi Olympic Memorial Youth Center
Tokyo, Japan, November 25, 2007