Statistics. Why do we need statistics? To describe data: A-average B-quartile (ACT tests)...

Post on 18-Jan-2016

Statistics

Why do we need statistics?

To describe data:

A-average

B-quartile (ACT tests)

C-percentile
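The three descriptive measures above can be sketched with Python's standard library. The scores below are hypothetical (they are not from the slides), and `percentile_rank` is an illustrative helper rather than a standard function.

```python
import statistics

# Hypothetical test scores, used only for illustration.
scores = [14, 17, 18, 19, 20, 21, 21, 22, 23, 24, 25, 26, 27, 29, 31, 33]

# A: the average (arithmetic mean).
average = statistics.mean(scores)

# B: quantiles(n=4) returns the three cut points that split the data into quartiles.
q1, median, q3 = statistics.quantiles(scores, n=4)

# C: percentile rank, here defined as the share of scores at or below a value.
def percentile_rank(value, data):
    """Percentage of observations in `data` that fall at or below `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

print(f"average = {average}")
print(f"quartile cut points = {q1}, {median}, {q3}")
print(f"a score of 27 is at the {percentile_rank(27, scores):.0f}th percentile")
```

Other definitions of percentile rank exist (for example, counting only scores strictly below the value); the one used here is just one common choice.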

Why do we need statistics?

To find relationships

● Is use of 'like' related to age?

● Do people who learn a language earlier learn it better?

Why do we need statistics?

To test a hypothesis

● Is Shakespeare's vocabulary larger than the King James Bible's?

● Do men interrupt more than women?

Null Hypothesis

●Hypothesis: Women talk more than men

●Null hypothesis: There is no difference between women and men in how much they talk

●Hypothesis: Program X classifies parts of speech more accurately than program Y

●Null hypothesis: There is no difference between program X and Y

Statistical Significance

● A difference is statistically significant when the probability of it happening by chance alone is less than one in twenty (p < .05). A difference that is not significant could plausibly be the result of random chance.

● What if a test had only 4 multiple-choice questions, and only one person took it, rolling dice to choose each answer? How often could that person retake the test with dice and score 80% or better? The probability that it will happen is high (over 1/20). If 100 people took the test, the chance of their average being 80% or better by rolling dice goes way down (below 1/20). If the test has 100 questions, the chance goes way down as well.
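The effect of test length can be checked with an exact binomial calculation. This is a minimal sketch: the slides do not say how many answer options each question has, so the two-option figure below (`P_GUESS = 0.5`) is an assumption, and `p_at_least` is an illustrative helper.

```python
from math import comb

def p_at_least(n_questions, n_correct, p_guess):
    """Exact binomial probability of guessing at least n_correct answers right."""
    return sum(
        comb(n_questions, k) * p_guess**k * (1 - p_guess) ** (n_questions - k)
        for k in range(n_correct, n_questions + 1)
    )

# Assumption: two answer options per question, so each guess is right half the time.
P_GUESS = 0.5

# On a 4-question test, only 4 out of 4 reaches 80% or better.
short_test = p_at_least(4, 4, P_GUESS)

# On a 100-question test, 80% or better means at least 80 correct.
long_test = p_at_least(100, 80, P_GUESS)

print(f"4 questions:   {short_test:.4f}")   # 1/16, which is above 1/20
print(f"100 questions: {long_test:.2e}")    # far below 1/20
```

The result depends on the assumed number of options: with four options per question, even the 4-question probability drops to 1/256, which is already below 1/20.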

Statistical Significance

● Consider a commercial that claims that four out of five dentists recommend toothpaste X. If only five dentists were actually consulted would you be impressed? Would you not be more motivated to buy it if 4,000 out of 5,000 dentists recommended toothpaste X, in spite of the fact that 4/5 and 4000/5000 are both 80%? In like manner, statistical formulas take into consideration factors such as the number of subjects, responses, and test items when calculating the statistical significance.

● In other words, an 80% vs. 85% score may not be significant if there are few test takers and few items, but an 80% vs. 81% may be significant if the test is long and many people took the test. Statistics takes this into consideration.
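The dentist example can be made concrete with an exact calculation. The sketch below assumes a null hypothesis in which each dentist is equally likely to recommend the toothpaste or not (a 50% chance rate, which is not stated in the slides), and asks how likely it is to see at least 80% recommendations by chance for each sample size.

```python
from fractions import Fraction
from math import comb

ALPHA = Fraction(1, 20)  # the one-in-twenty threshold

def p_at_least(n, k):
    """Exact chance of k or more recommendations out of n, assuming a 50% rate."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2**n)

p_small = p_at_least(5, 4)        # 4 out of 5 dentists
p_large = p_at_least(5000, 4000)  # 4,000 out of 5,000 dentists

print(f"4 of 5:       p = {float(p_small):.4f}, significant: {p_small < ALPHA}")
# The second probability is too small for a float and prints as 0.0.
print(f"4000 of 5000: p = {float(p_large)}, significant: {p_large < ALPHA}")
```

Both samples show the same 80%, but only the larger one is unlikely to arise by chance under this assumed null hypothesis.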

Types of Data

Categorical

Gender: male or female

Country of origin: Korea, Canada, Brazil, France

Education: high school graduate or not

Ethnicity: Hispanic, Caucasian, Asian, Black, Polynesian

Childhood language background: monolingual, bilingual, multilingual

Prodrop: subject pronoun used with verb, subject pronoun not used with verb

Language abilities of participant: native, non-native

Teaching method: total physical response, audiolingual, grammar translation

Which word is used for “large sandwich”?: hoagie, subway, grinder, po boy

Types of Data

Ordinal

The order in which children acquire certain morphemes.

The way a test participant orders a series of five recordings of non-natives from “most fluent” to “least fluent.”

Types of Data

Continuous

●Age

●Number of years of formal schooling

●Months spent living in a foreign country

●Time required to recognize a word during an experiment

●Frequency of a formant

●Duration of consonant closure

●Hours spent sending text messages

Variables

Characteristics that change from situation to situation, object to object, or person to person.

– Biographical variables (What kind are they?)

● age

● number of children

● ethnicity

● state of residence

● birth order among siblings

Dependent and Independent Variables

●What is the effect of X on Y?

– X is independent

– Y is dependent (you measure it)

Dependent and Independent Variables

Idea: People seem to use 'myself' as the non-reflexive object of a preposition rather than as a reflexive a lot more nowadays (e.g. “as for myself”).

Quantified question: What is the effect of time (1950s, 1960s, etc.) on the use of 'myself' as the non-reflexive object of a preposition?

What are the variables?

Variables: Time is a continuous independent variable, and the number of uses of 'myself' as the object of a preposition is a continuous dependent variable.

Dependent and Independent Variables

Idea: It seems that women always outnumber men in foreign language classes.

Quantified question: What is the effect of gender on enrollment in foreign language classes?

What are the variables?

Variables: Gender is the categorical independent variable, and the number of students enrolled is the continuous dependent variable.

Dependent and Independent Variables

Idea: I wonder if daily consumption of greasy American-style fast food is likely to shorten my life?

Yes

Correlation

Question answered: What is the relationship between two variables?

Type of variables used: Both continuous.

Correlation

Examples:

1 How are second language proficiency and degree of cultural adaptation related?

2 What is the relationship between vowel backness and how big an object represented by a nonce word with back (or front) vowels is perceived to be?

3 How is word frequency related to the amount of time required to name a word?

4 How does income relate to happiness?

Correlation

Do southerners who move away from the South shift the pronunciation [aɪ] to [a] over time?

Correlation

Speaker   % [a]   Years Away
1         98      1
2         82      1
3         99      2
4         65      3
5         90      3
6         85      5
7         75      5
8         50      5
9         75      6
10        55      6
11        85      7
12        70      8
13        30      8
14        55      9
15        80      9
16        25      10
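The relationship in the speaker table can be summarized as a single correlation coefficient. This is a minimal sketch using the standard Pearson formula, with the sixteen rows copied from the table; `pearson_r` is an illustrative helper name.

```python
from math import sqrt

# Data copied from the speaker table: years away from the South and % [a].
years_away = [1, 1, 2, 3, 3, 5, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10]
percent_a = [98, 82, 99, 65, 90, 85, 75, 50, 75, 55, 85, 70, 30, 55, 80, 25]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(years_away, percent_a)
print(f"r = {r:.2f}")  # negative: more years away goes with a lower % [a]
```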

Correlation

●Line slopes down = negative correlation

Correlation

●What is the effect of education on income?

●Line slopes up = positive correlation

Correlation

●What does this correlation tell you?

●Is it positive or negative?

Correlation coefficient

●Called r

●Ranges from +1 to -1

●Shows direction of correlation (negative or positive)

●Shows strength of correlation

Correlation coefficient

●r = .79

Correlation coefficient

●What is the effect of water's volume on its weight?

●What is r?

●r = 1
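A short derivation shows why r = 1 here. It assumes that weight is exactly proportional to volume, w = ρv with a constant positive density ρ (the symbols v, w, and ρ are introduced for this sketch and do not appear in the slides).

```latex
r = \frac{\sum_i (v_i - \bar{v})(w_i - \bar{w})}
         {\sqrt{\sum_i (v_i - \bar{v})^2}\,\sqrt{\sum_i (w_i - \bar{w})^2}}
  = \frac{\rho \sum_i (v_i - \bar{v})^2}
         {\sqrt{\sum_i (v_i - \bar{v})^2}\,\sqrt{\rho^2 \sum_i (v_i - \bar{v})^2}}
  = \frac{\rho}{|\rho|}
  = 1
```

Real measurements would include some error, so an observed r would fall slightly below 1.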

Regression Line

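●The regression line is the straight line that lies closest to the data points overall. In the usual least-squares sense, it is the line that minimizes the sum of the squared vertical distances between the points and the line.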
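As a sketch of what "closest" means, the code below fits a least-squares line to the speaker data from the correlation example (years away vs. % [a]). It assumes the ordinary least-squares definition of the regression line; the function names are illustrative.

```python
# Data from the speaker table: years away from the South and % [a].
years_away = [1, 1, 2, 3, 3, 5, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10]
percent_a = [98, 82, 99, 65, 90, 85, 75, 50, 75, 55, 85, 70, 30, 55, 80, 25]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def squared_error(slope, intercept, xs, ys):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

slope, intercept = least_squares_line(years_away, percent_a)
print(f"predicted % [a] = {intercept:.1f} + ({slope:.2f} x years away)")

# Any other line, such as one with a slightly different slope, fits worse.
best = squared_error(slope, intercept, years_away, percent_a)
other = squared_error(slope + 0.5, intercept, years_away, percent_a)
print(f"squared error: best line {best:.0f}, tilted line {other:.0f}")
```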

What is p?

●The probability of getting the results by chance

– r = 1 with two data points

– r = 1 with 1000 data points

●1 in 20 chance or smaller of getting results by chance is called statistically significant

●1/20 = .05, so p ≤ .05 is significant

●Smaller p is MORE significant
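One way to see why r = 1 means little with two data points but a lot with 1000 is a permutation test: shuffle one variable many times and count how often chance alone produces a correlation at least as strong as the observed one. This is a sketch of that idea, not necessarily the method behind the slides; the data are artificial and the function names are illustrative.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def permutation_p(xs, ys, n_shuffles=200, seed=0):
    """Estimate p: the share of random shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point rounding does not hide exact ties.
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return hits / n_shuffles

# Two points always lie on a straight line, so |r| = 1 in every shuffle.
p_two = permutation_p([1, 2], [3, 5])

# 1000 perfectly correlated points: no shuffle is expected to reproduce |r| = 1.
xs = list(range(1000))
p_thousand = permutation_p(xs, xs)

print(f"two points:  r = 1, estimated p = {p_two}")
print(f"1000 points: r = 1, estimated p = {p_thousand} (below 1/200)")
```

An estimate of 0.0 only means that none of the 200 shuffles matched; more shuffles would be needed to pin down a very small p.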

Number of months in a foreign country and linguistic abilities in the country's language (positive or negative?)

●What would this mean? r = 0.56, p < .03

●What would this mean? r = 0.56, p < .07

Number of native dialectal usages and time spent living outside of native dialect area (negative or positive?)

●What would this mean? r = -.23, p < .0001

●What would this mean? r = -.67, p < .0001
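The four results above can be read mechanically: the sign of r gives the direction, the size of r gives the strength, and p is compared with .05. The helper below is a hypothetical sketch; it treats each reported bound (for example "p < .07") as the p value itself, which is an assumption.

```python
def interpret(r, p, alpha=0.05):
    """Describe direction and significance of a correlation result."""
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "no"
    verdict = "significant" if p <= alpha else "not significant"
    return f"r = {r:+.2f}, p = {p}: {direction} correlation, {verdict}"

# (r, reported p bound) pairs taken from the four questions above.
results = [(0.56, 0.03), (0.56, 0.07), (-0.23, 0.0001), (-0.67, 0.0001)]

for r, p in results:
    print(interpret(r, p))

# Both negative results are significant, but |-.67| > |-.23|,
# so the last one describes the stronger relationship.
strongest = max(results, key=lambda pair: abs(pair[0]))
print(f"strongest correlation: r = {strongest[0]}")
```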

What is the past tense of spling? What is the past tense of creeze?

Nonce verb   Past-tense form   Computer   People
spling       splung            35%        22%
creeze       croze             12%        6%

Correlation and Causation

●Does wealth cause belief in evolution?

●Does belief in God cause poverty?

Correlation and Causation

●Utah has highest use of antidepressants

●Utah has highest percentage of LDS

●Utah has highest use of thyroid medicine

●Utah has highest autism rate

●Utahns go to doctors more

●Utahns don't self medicate with alcohol (as much)

Correlation and Causation

●Number of drownings is positively correlated with ice-cream sales

●Bad oral health is correlated with Alzheimer's

– What are other reasons for this?
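The drowning and ice-cream example can be imitated with a small simulation in which a third variable drives both. Everything below is invented for illustration: the temperature range, the coefficients, and the noise levels are assumptions, and neither simulated quantity has any effect on the other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical daily temperatures; hot days raise both quantities below.
temperature = [rng.uniform(0, 35) for _ in range(200)]

# Each quantity depends only on temperature plus random noise.
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]

r = pearson_r(ice_cream_sales, drownings)
print(f"r(ice-cream sales, drownings) = {r:.2f}")
```

The correlation comes out strongly positive even though, by construction, ice-cream sales do not cause drownings; the shared cause (temperature) produces it.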