Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael...

37
Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic statistics …

Transcript of Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael...

Page 1: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Department of Cognitive Science

Michael J. Kalsher

PSYC 4310COGS 6310MGMT 6969

© 2015, Michael Kalsher

Unit 1B: Everything you wanted to know

about basic statistics …

Page 2: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 2

Learning Outcomes

After completing this section, students will be able to:

• Demonstrate knowledge concerning what a statistical model is and why we use them (e.g., the mean)

• Demonstrate knowledge of what the ‘fit’ of a model is and why it is important (e.g., the standard deviation)

• Distinguish models for samples and populations.• Describe and discuss problems with NHST and

modern approaches (e.g., reporting confidence intervals and effect sizes).

Page 3: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 3

The Research Process

Page 4: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 4

Building Statistical Models

Page 5: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 5

Populations and Samples• Population

– The collection of units (be they people, plankton, plants, cities, suicidal authors, etc.) to which we want to generalize a set of findings or a statistical model.

• Sample– A smaller (but hopefully representative)

collection of units from a population used to determine truths about that population

Page 6: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 6

The Only Equation You Will Ever Need … only partly kidding!

ii errorModelOutcome

Page 7: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 7

A Simple Statistical Model

• In statistics we fit models to our data (i.e. we use a statistical model to represent what is happening in the real world).

• The mean is a hypothetical value (i.e. it doesn’t have to be a value that actually exists in the data set).

• As such, the mean is simple statistical model.

Page 8: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 8

The Mean

The mean (or average) is the value from which the (squared) scores deviate least (it has the least error).

n

xn

ii

X 1)(Mean

Page 9: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 9

The Mean as a Model

Page 10: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 10

Measuring the ‘Fit’ of the Model

• The mean is a model of what happens in the real world: the typical score.

• It is not a perfect representation of the data.

• How can we assess how well the mean represents reality?

Page 11: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 11

A Perfect Fit

0

1

2

3

4

5

6

0 1 2 3 4 5 6

Rater

Rati

ng

(ou

t of

5)

Page 12: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 12

Calculating ‘Error’

• A simple deviation is the difference between the mean and an actual data point.

• Deviations can be calculated by taking each score and subtracting the mean from it:

Page 13: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 13

Page 14: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 14

Use the Total Error?We could just take the error between the mean and each data point and add them up.

0)( XX

Score Mean Deviation

1 2.6 -1.6

2 2.6 -0.6

3 2.6 0.4

3 2.6 0.4

4 2.6 1.4

Total = 0

But, this is what we would get!

Page 15: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 15

Sum of Squared Errors• Merely summing the deviations to find out

the total error is problematic.

• This is because deviations cancel out because some are positive and others negative.

• Therefore, we square each deviation.

• If we add these squared deviations we get the Sum of Squared Errors (SS).

Page 16: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 16

20.5)( 2XXSS

Score Mean DeviationSquared Deviation

1 2.6 -1.6 2.56

2 2.6 -0.6 0.36

3 2.6 0.4 0.16

3 2.6 0.4 0.16

4 2.6 1.4 1.96

Total 5.20

Sum of Squared Errors (SS)

Page 17: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 17

Mean Squared Error (MSe)Although the SS is a good measure of the accuracy of our model, it depends on the amount of data collected. To overcome this problem, we use the average, or mean, squared error, but to compute the average we use the degrees of freedom (df) as the denominator.

Page 18: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 18

Degrees of Freedom

10X

12

118

9

Sample

10

15

78

?

Population

The number of degrees of freedom generally refers to the number of independent observations in a sample minus the number of population parameters that must be estimated from sample data (please refer to Jane Superbrain 2.2 on p. 49 of the Field textbook).

Page 19: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 19

The Standard Error• The standard deviation (SD) tells us how well

the mean represents the sample data.• If we wish to estimate this parameter in the

population (μ is the symbol used to connote the population

mean) then we need to know how well the sample mean represents the values in the population.

• The standard error of the mean (SE or standard error) tells us the extent to which our sample mean is representative of the population parameter.

Page 20: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 20

The SD and the Shape of a Distribution

Page 21: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 21

Samples vs. Populations• Sample

– Here, the Mean and SD describe only the sample from which they were calculated.

• Population– Here, the Mean and SD are intended to describe the

entire population (very rare in practice).

• Sample to Population:– Mean and SD are obtained from a sample, but are

used to estimate the mean and SD of the population (very common in practice).

Page 22: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 22

Sampling Variation

25X 33X 30X 29X

30X

Page 23: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 23Sample Mean

6 7 8 9 10 11 12 13 14

Fre

quen

cy

0

1

2

3

4

Mean = 10SD = 1.22

= 10

M = 8M = 10

M = 9

M = 11

M = 12M = 11

M = 9

M = 10

M = 10

N

sX

The SE can be estimated by dividing the sample standard deviation by the square root of the sample size.

Page 24: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 24

Confidence Intervals • In statistical inference, we attempt to estimate population

parameters using observed sample data.• A confidence interval gives an estimated range of values

which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.

• Confidence intervals are constructed at a confidence level, such as 95 %, selected by the user.

• What does this mean? • It means that if the same population is sampled on numerous

occasions and interval estimates are made on each occasion, the resulting intervals would bracket the true population parameter in approximately 95 % of the cases.

Page 25: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 25

Confidence Intervals(see discussion starting on p. 54 of Field textbook)

• Domjan et al. (1998)– ‘Conditioned’ sperm release in Japanese Quail.

• True Mean– 15 Million sperm

• Sample Mean– 17 Million sperm

• Interval estimate– 12 to 22 million (contains true value)– 16 to 18 million (misses true value)– CIs constructed such that 95% contain the true

value.

Page 26: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 26

Page 27: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 27

Showing Confidence Intervals Visually

Page 28: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 28

Types of Hypotheses• Null hypothesis, H0

– There is no effect.– Example: Big Brother contestants and members of the

public will not differ in their scores on personality disorder questionnaires

• The alternative hypothesis, H1

– AKA the experimental hypothesis– Example: Big Brother contestants will score higher on

personality disorder questionnaires than members of the public

Page 29: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 29

Test Statistics• A Statistic for which the frequency of

particular values is known.• Observed values can be used to test

hypotheses.

Page 30: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 30

What does Null Hypothesis Significance Testing (NHST) Tell Us?

• The importance of an effect?– No, significance depends on the sample size.

• That the null hypothesis is false?– No, it is always false.

• That the null hypothesis is true?– No, it is never true.

• One problem with NHST is that it encourages all or nothing thinking.

Page 31: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 31

One- and Two-Tailed Tests

Page 32: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 32

Type I and Type II Errors

• Type I error– occurs when we believe that there is a genuine

effect in our population, when in fact there isn’t.– The probability is the α-level (usually .05)

• Type II error– occurs when we believe that there is no effect

in the population when, in reality, there is.– The probability is the β-level (often .2)

Page 33: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 33

Confidence Intervals & Statistical Significance

Page 34: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 34

Effect Sizes

An effect size is a standardized measure of the size of an effect:

– Standardized = so comparable across studies– Not (as) reliant on the sample size– Allows people to objectively evaluate the size

of observed effect.

Page 35: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 35

Effect Size Measures

• There are several effect size measures that can be used:– Cohen’s d– Pearson’s r– Glass’ Δ– Hedges’ g– Odds Ratio/Risk rates

• Pearson’s r is a good intuitive measure– Oh, apart from when group sizes are different …

Page 36: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 36

Effect Size Measures

• r = .1, d = .2 (small effect):– the effect explains 1% of the total variance.

• r = .3, d = .5 (medium effect):– the effect accounts for 9% of the total variance.

• r = .5, d = .8 (large effect):– the effect accounts for 25% of the variance.

• Beware of these ‘canned’ effect sizes though:– The size of effect should be placed within the research

context.

Page 37: Department of Cognitive Science Michael J. Kalsher PSYC 4310 COGS 6310 MGMT 6969 © 2015, Michael Kalsher Unit 1B: Everything you wanted to know about basic.

Slide 37

Important ConceptsAlternative hypothesis Standard error of the mean (SE)

Confidence interval Sum of squared errors (SS)

Degrees of freedom Test statistic

Deviation Unexplained variance

Effect Size (measures)

Explained variance

Frequency distribution

Mean

Mean Squared Error (MSe)

Model

Null hypothesis

Null Hypothesis Significance Testing (NHST)

Population

Sample

Standard deviation