Basic SLA Statistics for the University Educator

Peter Neff, Harumi Kimura, Philip McNally, Matthew Apple

CUE Forum 2007

© 2007 JALT CUE SIG and individual presenters

Purpose

Not how to use statistics in a study, but rather…

To help everyone better understand and interpret common statistical methods encountered in SLA studies

To cover common errors and issues related to each procedure

Overview

Introduction
Descriptive Statistics – Harumi
T-tests – Philip
One-way ANOVA – Peter
Factor Analysis – Matthew
Q&A

Outline

Each presenter will introduce:
– The function of the procedure
– Important underlying concepts
– Its use in SLA research
– An example of the procedure in action
– Errors and issues to look out for

Descriptive Statistics

Harumi Kimura, Nanzan University

Q1: Unreasonable fear?

Why do so many language teachers draw back in terror when confronted with large doses of numbers, tables, and statistics?

It is irresponsible to ignore such research just because you do not have the relatively simple tools for understanding it.

J.D. Brown (1988)

Q2: Values of statistical studies?

Individual behavior & Group phenomena

Quantifiable data
Structured with definite procedures
Follow logical steps
Replicable
Reductive
PATTERNS

Q3: What do descriptive statistics provide?

Snapshot description of the situation observed

Numerical representations of how each group performed on the measures

Readers can draw a mental picture

Q4: How do we manage the data?

Organize and present the data for further analysis

We describe them in/as:
– Graphs
– Figures

Two aspects of group behavior

Mean: central tendency

Standard Deviation: variability from the mean

Normal Distribution: a normal curve
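As a small illustration of the two measures named above, the sketch below computes a mean and a standard deviation with Python's standard library. The scores are invented for the example and do not come from any study cited in this presentation.

```python
import statistics

# Hypothetical vocabulary test scores for one class (invented for illustration).
scores = [12, 15, 15, 16, 18, 20, 23]

mean = statistics.mean(scores)   # central tendency
sd = statistics.stdev(scores)    # sample standard deviation (n - 1 in the denominator)

print(f"N = {len(scores)}, M = {mean:.2f}, SD = {sd:.2f}")
```

Reporting N alongside M and SD lets a reader judge both the typical score and how widely individual scores spread around it.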

Just as in the natural world …

Position of an individual within a group

or

Comparison of a group with other groups

How normal?

Flat or peaked

Not symmetrical
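One common way to put numbers on "flat or peaked" and "not symmetrical" is to compute kurtosis and skewness. The sketch below uses the simple population (moment-based) formulas; statistical packages often apply small-sample corrections, so their values may differ slightly. The function name and data are assumptions made for illustration.

```python
import statistics

def skew_and_excess_kurtosis(data):
    """Return (skewness, excess kurtosis) using population moment formulas."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    skewness = m3 / m2 ** 1.5          # 0 for a symmetrical distribution
    excess_kurtosis = m4 / m2 ** 2 - 3 # 0 for a normal curve; negative = flatter
    return skewness, excess_kurtosis

# Hypothetical score sets (invented for illustration).
symmetric_scores = [1, 2, 3, 4, 5]
skewed_scores = [1, 1, 1, 2, 10]

print(skew_and_excess_kurtosis(symmetric_scores))
print(skew_and_excess_kurtosis(skewed_scores))
```

Values near zero on both measures are consistent with a roughly normal shape; a long tail to the right gives positive skewness.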

Issues

Are the data appropriate for further statistical analyses?

Mean and SD

Participants and sampling: random or convenience

N size

To conclude

Mean and Standard Deviation

Normal Distribution

These concepts “are central to all statistical research and sometimes forgotten by researchers.”

Brown, 1988

t-tests

Philip McNally, Osaka International University

Function: Comparing two means

A t-test will…

…tell you whether there is a statistically significant difference in the mean scores (Pallant, 2006, p. 206)

a.) for two different groups, or
b.) for one group at two different times.

Types of t-test

One group (within-subject or repeated measures design)
– Paired samples t-test
– Matched pairs t-test
– Dependent means t-test

Two groups (between-group or between-subjects design)
– Independent samples t-test
– Independent measures t-test
– Independent means t-test

Uses of t-tests

(T)he simplest form of experiment that can be done: only one independent variable is manipulated in only two ways and only one dependent variable is measured (Field, 2003, p.207).

Example: Paired samples t-test

One group

Time 1: no extensive reading (IV); vocab test (DV).

Time 2: after extensive reading (IV); vocab test (DV).
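The one-group design above can be sketched as a paired samples t-test computed by hand. The scores are invented, and the function name is an assumption for this example. The sketch returns only the t value and degrees of freedom; the p value would normally come from statistical software or a t table.

```python
import math
import statistics

def paired_t(time1, time2):
    """Paired samples t-test: t value and degrees of freedom."""
    diffs = [after - before for before, after in zip(time1, time2)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1

# Hypothetical vocab test scores for the same five learners (invented).
before_reading = [10, 12, 9, 14, 11]
after_reading = [13, 14, 10, 17, 12]

t, df = paired_t(before_reading, after_reading)
print(f"t = {t:.2f}, df = {df}")
```

Because each learner is compared with himself or herself, the test works on the five difference scores rather than on the two sets of raw scores.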

Example: Independent samples t-test

Two groups

Group A - implicit grammar (IV); test (DV).

Group B - explicit grammar (IV); test (DV).
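A minimal sketch of the two-group design follows, assuming the classic pooled-variance (Student's) version of the independent samples t-test. The scores and function name are invented for illustration. Only t and df are computed; obtaining p requires software or a t table.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Independent samples t-test with pooled variance: t value and df."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    mean_difference = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_difference / standard_error, df

# Hypothetical grammar test scores (invented for illustration).
implicit_group = [10, 12, 14, 16, 18]
explicit_group = [14, 16, 18, 20, 22]

t, df = independent_t(implicit_group, explicit_group)
print(f"t = {t:.2f}, df = {df}")
```

The degrees of freedom are the two group sizes added together minus 2, so groups of 62 and 54 would give df = 114.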

Example: Macaro & Erler (2007)

A longitudinal study of 11-12 year old British learners of French. The effect of reading strategy instruction.

Treatment group: N = 62
Control group: N = 54

Measures taken of reading comprehension, reading strategy use, and attitudes to French before and after the intervention.

Interpreting the data: Macaro & Erler (2007)

Results of attitudes to French

Area               t      df    p
Reading            4.91   114   .001*
Speaking           2.28   114   .024
Writing            2.30   114   .023
Listening          4.12   114   .001*
Spelling           3.74   114   .001*
General learning   3.61   114   .001*
Homework           2.92   114   .004*
Textbook           3.01   114   .005*

*p < .006

Types of error

Type I error
You think you’ve got significance, but you haven’t. You should have adjusted your alpha value if you made multiple comparisons.

Type II error
You think the difference between the means was by chance. It wasn’t, but because you adjusted for multiple comparisons the data failed to reach significance.

Controlling for Type I error

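95% level of significance (alpha = .05) = a 5% risk, on each comparison, of calling a chance difference significant

20 comparisons = about 1 significant result expected by chance
100 comparisons = about 5 significant results expected by chance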
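The arithmetic behind these figures can be sketched as follows. The second function, which gives the chance of at least one false positive across a set of comparisons, assumes the comparisons are independent and that no real differences exist; both function names are invented for this example.

```python
def expected_false_positives(alpha, n_comparisons):
    """Average number of chance 'significant' results if no real effects exist."""
    return alpha * n_comparisons

def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive, assuming independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

for k in (1, 5, 20, 100):
    print(
        k,
        round(expected_false_positives(0.05, k), 2),
        round(familywise_error_rate(0.05, k), 3),
    )
```

With 20 unadjusted comparisons, the chance of at least one spurious result is well over half, which is why an adjustment is needed.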

Controlling for Type I error

So, we have to make a Bonferroni adjustment if we make multiple comparisons…

Alpha level / No. of comparisons = 0.05 / 5 = 0.01

…and use this new figure as your alpha level.
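As a rough check, the same adjustment can be applied to the eight attitude comparisons reported for Macaro & Erler (2007). The sketch assumes the p values pair with the areas in the order they are listed on the slide; the function name is invented for this example.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Bonferroni-adjusted alpha level."""
    return alpha / n_comparisons

# p values as listed on the Macaro & Erler (2007) slide
# (area-to-value pairing assumed from the listing order).
p_values = {
    "Reading": 0.001,
    "Speaking": 0.024,
    "Writing": 0.023,
    "Listening": 0.001,
    "Spelling": 0.001,
    "General learning": 0.001,
    "Homework": 0.004,
    "Textbook": 0.005,
}

adjusted = bonferroni_alpha(0.05, len(p_values))  # 0.05 / 8
significant = [area for area, p in p_values.items() if p < adjusted]

print(f"adjusted alpha = {adjusted}")
print("significant after adjustment:", significant)
```

The adjusted level of .00625 corresponds to the *p < .006 note on the slide: Speaking and Writing would pass at .05 but not after the adjustment.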

Issues

• Do the data meet normality assumptions?

• Is the sample size large enough?

• Are the data continuous?

• Is Type I error controlled for?

One-way ANOVA

Peter Neff, Doshisha University

What it is

Function
– ANalysis Of VAriance: a search for mean differences between data sets
– One-way ANOVA: looking for significant differences in the mean scores of 2 or more groups
– Why “one-way”? Looking at the effect that changing one variable has on the study’s participants

ANOVA Example 1

[Figure: group means M1, M2, and M3 plotted for three groups: Control Group, Treatment 1 Group, Treatment 2 Group]

Similar means (M) = non-significant (p > .05)

ANOVA Example 2

[Figure: group means M1, M2, and M3 plotted for three groups: Control Group, Treatment 1 Group, Treatment 2 Group]

M1 significantly different from M2 & M3 (p < .05), but… M2 & M3 not significantly different from each other (p > .05)

ANOVAs and T-tests

Both procedures look for significant mean differences between groups;

However, t-tests work best when limited to 2 groups.

ANOVAs can work with 3 or more groups while introducing less error.

ANOVAs in Language Research

Often used to compare:
– Assessment scores
– Survey responses

A typical situation may be to try different treatments/methods with 3 different groups and then test them to see if the results show any significant differences

Vocabulary Testing Example

3 learning groups with equivalent starting vocabulary range
– Group I learns with word cards
– Group II learns with word lists
– Group III learns with PC software

After several weeks of study, a vocab test is given

Results from the test are analyzed with a one-way ANOVA to see if any groups scored significantly higher/lower

Peer Review Survey Example

3 learning groups in EFL writing courses

Students peer review each other’s written work in one of 3 ways
– Group I – Written peer review
– Group II – Oral peer review
– Group III – PC-based peer review

After the review sessions, peer review satisfaction surveys are given using 5-point Likert scales (1-5)

Results are analyzed with a one-way ANOVA for significant differences in satisfaction level among the review types

Reporting One-way ANOVA Results

Three basic components:

– 1) Table of descriptive statistics (mean, standard deviation, etc)

– 2) An ANOVA table (degrees of freedom, sum of squares, F-statistic)

– 3) A report of the post-hoc results with effect size

The F-statistic

The higher the better (for the model)

A significant F-statistic (p < .05) is what researchers look for
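To show where the F-statistic comes from, here is a minimal sketch that computes it by hand for three groups. The scores are invented, loosely following the vocabulary testing example, and the function name is an assumption. Only F, the degrees of freedom, and the sums of squares are computed; the p value would normally be read from software output.

```python
import statistics

def one_way_anova(*groups):
    """Return F, df between, df within, SS between, SS within."""
    all_scores = [x for group in groups for x in group]
    grand_mean = statistics.fmean(all_scores)
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual scores around their own group mean
    ss_within = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within, ss_between, ss_within

# Hypothetical vocab test scores (invented for illustration).
word_cards = [4, 5, 6]
word_lists = [6, 7, 8]
pc_software = [8, 9, 10]

f_stat, df_b, df_w, ss_b, ss_w = one_way_anova(word_cards, word_lists, pc_software)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

F grows when the group means are far apart relative to the spread of scores inside each group, which is why a higher value favors the model.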

ANOVA in the Literature

Descriptive statistics

ANOVA statistics

F-statistic is significant

…i.e. our model seems to work

F-statistic cont.

Reaching significance indicates there are statistically important differences between some of the group means

But…the F-statistic doesn’t tell us where the differences are

For that we turn to…

Post-hoc Results and Effect Size

Post-hoc results
– These are done if the F-statistic is significant
– Paired comparisons of the group means
– Tell us where the significant differences lie
– Often reported in the text (though sometimes in table form)

Effect size
– Often referred to as ‘eta-squared’ or ‘strength of association’
– Indicates the magnitude of the difference between means
– Reflects the proportion of total variance accounted for by the treatments
  – .01 – small effect
  – .06 – medium effect
  – .14 – large effect

*According to Cohen (1988)

Reporting Post-hoc Results and Effect Size

“Post-hoc comparisons using the […] indicated that the mean score for Group 1 (M=21.36, SD=4.55) was significantly different from Group 3 (M=22.96, SD=4.49). Group 2 (M=22.10, SD=4.15) did not differ significantly from either Group 1 or 3.”

“Despite reaching statistical significance, the actual difference in group means was quite small. The effect size, calculated using eta-squared, was .02.”
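Eta-squared is usually calculated as the between-groups sum of squares divided by the total sum of squares. The sketch below applies that formula and labels the result with the Cohen (1988) benchmarks cited in this presentation. The function names, the "negligible" label for values under .01, and the sums of squares are assumptions made for illustration.

```python
def eta_squared(ss_between, ss_total):
    """Proportion of total variance accounted for by group membership."""
    return ss_between / ss_total

def effect_label(eta_sq):
    """Label an eta-squared value using Cohen's (1988) benchmarks."""
    if eta_sq >= 0.14:
        return "large"
    if eta_sq >= 0.06:
        return "medium"
    if eta_sq >= 0.01:
        return "small"
    return "negligible"  # label for values under .01 is an assumption

# Hypothetical sums of squares from an ANOVA table (invented).
example = eta_squared(ss_between=24.0, ss_total=30.0)
print(f"eta-squared = {example:.2f} ({effect_label(example)})")

# The value reported in the quoted write-up above
print(f"eta-squared = .02 ({effect_label(0.02)})")
```

A result can be statistically significant and still have a small effect size, which is the point the quoted write-up makes.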

Common ANOVA Problems and Issues

Starting out with non-equivalent groups

Not reporting the type of ANOVA performed

Not reporting specific post-hoc results

Not reporting effect size

Post-hoc results ANOVA table

“[This table] shows the result from running through an ANOVA by using SPSS. It can be seen that the difference among treatments is significant (p < 0.05). The scores for the Vocabulary condition were much higher than the other conditions. The Main Character condition was slightly higher than the Combined condition.”

Problems
– No mention of the type of ANOVA
– No mention of post-hoc results: which groups were significantly different from each other?
– No mention of effect size: what was the magnitude of the treatment effect?

Conclusion

One-way ANOVAs are useful for looking at the effect of changing one variable on 3 or more equivalent groups

Often used for testing treatment effects or comparing survey results

Involves a two-step process of analyzing the model (through the F-statistic) and performing post-hoc procedures

Effect size (eta-squared) is an important component indicating the magnitude of the treatment effect

Factor Analysis

Matthew Apple, Doshisha University

FA: What it is

Measures only one group or sample population

A “family” of FA
– PCA, FA, EFA, CFA…

FA: What it does

Tests the existence of underlying (latent) constructs within a sample population
– Identifies patterns within large numbers of participants
– “Reduces” several items into a few measurable factors

Uses of FA within SLA

Typically used with psychological variables and Likert-scale questionnaires

Often a preliminary step before more complicated statistical analyses
– Correlational Analysis
– Multiple Regression
– Structural Equation Modeling

Example questionnaire factors

1. 英語で外国人と話しがしたい。 I would like to communicate with foreigners in English.
2. 英語習得は自分の教養を高めるのに必要だ。 English is essential for personal development.
3. 日本語でも自分がうまく表現できない。 I am not good at expressing myself even in Japanese.
4. 外国の音楽と文化に興味がある。 I am interested in foreign music and culture.
5. 英語は社会で活躍するのに必要だ。 English is essential to be active in society.
6. 難しいトピックに関しても、自分の意見が言える。 I can express my own opinions even about difficult topics.

Integrative

Instrumental

Self-competence

Terminology

Factor - the latent construct

Variance - different answers to each item (variable)

More Terminology!

Factor loading
– Correlation between an item and the factor; the squared loading is the amount of variance the item shares with the factor
– Factor loadings above .4 are desirable, above .7 are excellent

Cronbach’s Alpha
– Measurement of item-scale reliability
– Based on inter-item correlation and the number of items (other things being equal, the more items, the greater the alpha)
– Does not “prove” cause-effect or validity of items themselves
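For readers curious how Cronbach's alpha is obtained, here is a minimal sketch of the standard formula: the number of items k, divided by k - 1, times one minus the ratio of summed item variances to the variance of respondents' total scores. The responses and function name are invented for illustration.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding every respondent's score."""
    k = len(item_scores)
    item_variances = [statistics.variance(item) for item in item_scores]
    # Each respondent's total across all items
    totals = [sum(responses) for responses in zip(*item_scores)]
    return k / (k - 1) * (1 - sum(item_variances) / statistics.variance(totals))

# Hypothetical responses from four students to three Likert items (invented).
items = [
    [1, 2, 3, 4],  # item 1
    [2, 1, 4, 3],  # item 2
    [1, 3, 2, 4],  # item 3
]

alpha = cronbach_alpha(items)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises when items vary together across respondents, so that the variance of the totals is large relative to the individual item variances.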

Determining factors, then items

Researchers should determine the factors before adding items to the questionnaire
– Previous research results
– Carefully constructed model

Items should be designed to relate to a particular concept (factor)
– “Borrow” items or develop them in a pilot
– 6-8 items for a robust factor

FA in the literature

Item 43 (“The more I study English, the more enjoyable I find it”)

F1 (“Beliefs about a contemporary (communicative) orientation to learning English”)

.630 loading

.63 X .63 = 40% of shared variance with the factor

(Above .4 is acceptable according to Stevens, 1992)
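The shared-variance figure is simply the loading multiplied by itself. A short sketch, using the two loadings quoted in this presentation (the function name is an assumption):

```python
def shared_variance(loading):
    """Proportion of variance an item shares with a factor (squared loading)."""
    return loading ** 2

# Loadings quoted in this presentation: Item 43 on F1, Item 38 on F3
for item, loading in (("Item 43", 0.63), ("Item 38", 0.33)):
    print(f"{item}: loading {loading:.2f} -> {shared_variance(loading):.0%} shared variance")
```

Squaring makes the drop-off steep: a loading about half the size shares roughly a quarter of the variance.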

Assumptions of FA

Normal distribution
Items are correlated above .3
Large N-size
– “Over 300” (Tabachnick & Fidell, 2007)
– 5-10 participants for each item (Field, 2005)

Ex: 30-item questionnaire, 3-4 factors, 150-300 participants
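The participants-per-item rule of thumb can be turned into a quick planning check. This is only a sketch of the 5-10 per item guideline cited above; the function name and default values are assumptions for illustration.

```python
def participants_needed(n_items, per_item_low=5, per_item_high=10):
    """Rough N-size range from a participants-per-item rule of thumb."""
    return n_items * per_item_low, n_items * per_item_high

low, high = participants_needed(30)
print(f"30-item questionnaire: {low}-{high} participants")
```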

Problems and issues with FA

“Fishing” for data (i.e., not reading the literature, then simply allowing SPSS to tell you what it finds)

Not understanding the nature of factors (i.e., using 2 or 3 items as a “factor” or keeping too many factors)

Using an arbitrary cut-off point for factor loadings (typically .3, .32, .35)

N-size far too small

Typical factor loading issues

Item 38 (“I am very aware that teachers/lecturers know a lot more than I do and so I agree with what they say is important rather than rely on my own judgment”)

F3, .33 loading; F1, .29 loading

.33 X .33 = 11% of shared variance with the factor

Conclusions regarding FA

Typically used with questionnaires to reduce individual items to factors for purposes of correlation or prediction

Helps researchers draw conclusions from a large number of items through data reduction

Requires a large N-size and several reference books

Often written up with no regard to APA guidelines or previous research results

Horribly, horribly complicated

To sum up…

Descriptive statistics
– Mean and SD

T-tests
– Dependent, independent, paired

One-way ANOVA
– F, effect size, post-hoc

Factor Analysis
– Factor, variance, factor loading

Thank you for attending!

CUE SIG Forum 2007

JALT 2007 International Conference

Yoyogi Olympic Memorial Youth Center

Tokyo, Japan, November 25, 2007
