Introducing Statistical Inference with Randomization Tests

Allan Rossman

Cal Poly – San Luis Obispo

[email protected]

Outline

- 2×2 tables
  - Activity/example 1: Dolphin therapy?
  - Activity/example 2: Murderous nurse?
- Quantitative response
  - Activity/example 3: Sleep deprivation?
  - Activity/example 4: Age discrimination?
  - Activity/example 5: Memory study?
- Extensions, reflections, further reading

Example 1: Dolphin therapy?

- Subjects who suffer from mild to moderate depression were flown to Honduras and randomly assigned to a treatment
- Is dolphin therapy more effective than control?
- Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

                      Dolphin therapy   Control group   Total
Subject improved             10               3           13
Subject did not               5              12           17
Total                        15              15           30
Proportion                0.667           0.200

Example 1 (cont.)

- Standard approach: could calculate test statistic, p-value from approximate sampling distribution (z, chi-square)
  - But technical conditions do not hold
  - But this would be approximate anyway
  - But how does this relate to what “significance” means?

Example 1 (cont.)

- Alternative: simulate the random assignment process many times, see how often such an extreme result occurs
  - Assume no treatment effect (null model)
  - Re-randomize the 30 subjects to two groups (using cards)
    - Assuming 13 improvers, 17 non-improvers regardless of group
  - Determine the number of improvers in the dolphin group
    - Or, equivalently, the difference in improvement proportions
  - Repeat a large number of times (turn to computer)
  - Ask whether the observed result is in the tail of the distribution
    - Indicating that we saw a surprising result under the null model
    - Providing evidence that dolphin therapy is more effective
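
The card-shuffling procedure can be sketched in a few lines of Python. This is a minimal illustration rather than the code behind the applet; the function name, the number of repetitions, and the seed are assumptions chosen for the example.

```python
import random

def simulate_dolphin_pvalue(reps=10_000, seed=1):
    """Approximate one-sided p-value for the dolphin study by re-randomization."""
    rng = random.Random(seed)
    # 30 "cards": 13 improvers (1) and 17 non-improvers (0), fixed under the null model.
    cards = [1] * 13 + [0] * 17
    observed = 10  # improvers actually observed in the dolphin group
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)
        dolphin_improvers = sum(cards[:15])  # first 15 cards form the dolphin group
        if dolphin_improvers >= observed:
            extreme += 1
    return extreme / reps

p_value = simulate_dolphin_pvalue()
print(f"approximate p-value: {p_value:.4f}")
```

The estimate varies from run to run and with the seed, but it should land near the exact value of about .013.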


Example 1 (cont.)

www.rossmanchance.com/applets (Dolphin study)


Example 1 (cont.)

- Conclusion: experimental result is statistically significant
- What does that mean; what is the logic behind that?
  - Experimental result very unlikely to occur by chance alone
  - A difference in success proportions at least as large as .467 (in favor of dolphin group) would happen in less than 2% of all possible random assignments if dolphin therapy were not effective

Example 1 (cont.)

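- Exact randomization distribution
  - Hypergeometric distribution
  - Fisher’s Exact Test

$$
\text{p-value} = \frac{\binom{13}{10}\binom{17}{5} + \binom{13}{11}\binom{17}{4} + \binom{13}{12}\binom{17}{3} + \binom{13}{13}\binom{17}{2}}{\binom{30}{15}} = .0127
$$

[Figure: distribution plot, Hypergeometric, N=30, M=13, n=15; horizontal axis X, vertical axis Probability; upper tail at X ≥ 10 marked with probability 0.0127]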
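
The exact p-value can be checked directly from the hypergeometric counts: 13 improvers and 17 non-improvers, 15 subjects in the dolphin group, and 10 or more improvers landing in that group. A short sketch (the function name is an assumption for illustration):

```python
from math import comb

def dolphin_exact_pvalue():
    """P(10 or more of the 13 improvers fall in the dolphin group of 15)."""
    total_assignments = comb(30, 15)
    # k improvers and 15 - k non-improvers in the dolphin group, for k = 10..13
    tail = sum(comb(13, k) * comb(17, 15 - k) for k in range(10, 14))
    return tail / total_assignments

print(f"exact p-value: {dolphin_exact_pvalue():.4f}")  # prints 0.0127
```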


Example 2: Murderous Nurse?

- Murder trial: U.S. vs. Kristin Gilbert
  - Accused of giving patients fatal dose of heart stimulant
  - Data presented for 18 months of 8-hour shifts

                   Gilbert on shift   Gilbert not on shift   Total
Death occurred            40                  34               74
No death                 217                1350             1567
Total                    257                1384             1641
Proportion             0.156               0.025

- Relative risk: 6.34

Example 2 (cont.)

- Structurally the same as the previous example, but with one crucial difference
  - No random assignment to groups
- Observational study
  - Allows many potential explanations other than “random chance”
  - Confounding variables
  - Perhaps she worked intensive care unit or night shift
- Is statistical significance still relevant?
  - Yes, to see if “random chance” can plausibly be ruled out as an explanation
  - Some statisticians disagree

Example 2 (cont.)

Simulation results

p-value: less than 1 in a billion
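
A p-value this small cannot be estimated well by a modest number of simulated shuffles, so the sketch below swaps in an exact hypergeometric tail calculation for the same null model (the 74 deaths distributed at random over the 1641 shifts). It also recomputes the relative risk from the table counts. The function and variable names are assumptions for illustration.

```python
from fractions import Fraction
from math import comb

deaths_on, shifts_on = 40, 257      # Gilbert on shift
deaths_off, shifts_off = 34, 1384   # Gilbert not on shift

relative_risk = (deaths_on / shifts_on) / (deaths_off / shifts_off)

def exact_upper_tail(observed, group_size, successes, population):
    """P(X >= observed) for a hypergeometric count X, using exact integer arithmetic."""
    upper = min(group_size, successes)
    tail = sum(
        comb(successes, k) * comb(population - successes, group_size - k)
        for k in range(observed, upper + 1)
    )
    return Fraction(tail, comb(population, group_size))

# 40 or more of the 74 deaths falling on her 257 shifts, out of 1641 shifts in all
p_value = exact_upper_tail(40, 257, 74, 1641)
print(f"relative risk: {relative_risk:.2f}")
print(f"exact one-sided p-value: {float(p_value):.3e}")
```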


Example 2 (cont.)

- Incredibly unlikely to observe such a difference/ratio by chance alone, if there were no difference between the groups
- But this does not prove, or perhaps even strongly suggest, guilt
  - Observational study
  - Allows many potential explanations other than “random chance”
  - Confounding variables
  - Perhaps she worked intensive care unit or night shift

Example 3: Lingering sleep deprivation?

- Does sleep deprivation have harmful effects on cognitive functioning three days later?
  - 21 subjects; random assignment
- Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

[Figure: dotplots of improvement by sleep condition (deprived, unrestricted); improvement axis runs from -16 to 40]

Example 3 (cont.)

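- Could calculate test statistic, p-value from approximate “sampling” distribution (if conditions are met)
- Group 1 = unrestricted ($n_1 = 10$), group 2 = deprived ($n_2 = 11$)

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{19.82 - 3.90}{\sqrt{\dfrac{14.73^2}{10} + \dfrac{12.17^2}{11}}} = \frac{15.92}{5.93} = 2.68
$$

$$
\text{p-value} = \Pr(T \ge 2.68) \approx .008
$$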
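
The t statistic on this slide can be reproduced from the summary statistics alone: means 19.82 and 3.90, standard deviations 14.73 and 12.17, and group sizes 10 and 11. A minimal sketch, with a helper name chosen only for illustration:

```python
from math import sqrt

def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic computed from summary statistics."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Group 1: unrestricted sleep (n = 10); group 2: sleep deprived (n = 11)
t_stat = two_sample_t(19.82, 14.73, 10, 3.90, 12.17, 11)
print(f"t = {t_stat:.2f}")  # prints t = 2.68
```

Turning the statistic into a p-value needs a t distribution, which the Python standard library does not provide, so that step is left out here.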


Example 3 (cont.)

- Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
- Start with tactile simulation using index cards
  - Write each “score” on a card
  - Shuffle the cards
  - Randomly deal out 11 for deprived group, 10 for unrestricted group
  - Calculate difference in group means
  - Repeat many times
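
The index-card procedure translates directly into code. The 21 individual scores are not listed on the slides; the values below are an assumption (scores commonly reported for this study, which reproduce the group means of 3.90 and 19.82), so treat the data as illustrative. The function name, repetition count, and seed are likewise choices made for this sketch.

```python
import random

# ASSUMPTION: individual improvement scores are not given on the slides.
deprived = [-14.7, -10.7, -10.7, 2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8]
unrestricted = [-7.0, 11.6, 12.1, 12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6]

def randomization_pvalue(group_a, group_b, reps=10_000, seed=1):
    """Approximate P(mean(A) - mean(B) >= observed difference) under re-randomization."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    cards = list(group_a) + list(group_b)  # one "card" per score
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)
        diff = sum(cards[:n_a]) / n_a - sum(cards[n_a:]) / n_b
        if diff >= observed - 1e-9:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / reps

p_value = randomization_pvalue(unrestricted, deprived)
print(f"approximate p-value: {p_value:.4f}")
```

Swapping the mean for the median inside the function is a one-line change, which is the flexibility the randomization approach offers.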


Example 3 (cont.)

Then use technology to simulate this randomization process

Applet: www.rossmanchance.com/applets/ (Randomization Tests)

[Figure: histogram of difference in group means by random assignment; vertical axis: number of randomizations]

- Approximate p-value = 13 / 1000 = .013

Example 3 (cont.)

- Conclusion: fairly strong evidence that sleep deprivation produces lower improvements, on average, even three days later
  - Justification: experimental results as extreme as those in the actual study would be quite unlikely to occur by chance alone, if there were no effect of the sleep deprivation
- Easy to analyze medians instead

Example 3 (cont.)

Exact randomization distribution:

Exact p-value 2533/352,716 = .0072


Example 4: Age discrimination?

- Martin vs. Westvaco (Statistics in Action)
- Employee ages: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
- Fired employee ages marked with *: 25, 33, 35, 38, 48, 55*, 55*, 55, 56, 64*
- Robert Martin: 55 years old
- Do the data provide evidence that the firing process was not “random”?
- How unlikely is it that a “random” firing process would produce such a large average age?

Example 4 (cont.)

- Exact permutation distribution
- Exact p-value: 6 / 120 = .05

[Figure: histogram of mean age (fired); horizontal axis from 32 to 56, vertical axis: Frequency]
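
With only 10 employees and 3 firings, every possible "random" firing can be listed. The sketch below assumes the fired ages were 55, 55, and 64 (the bold marking did not survive in this transcript, but that set is the one consistent with the stated 6 / 120). Variable names are illustrative.

```python
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
# ASSUMPTION: the three fired employees were aged 55, 55, and 64.
fired_total = 55 + 55 + 64

# Every way to choose 3 of the 10 employees (positions are distinct even when ages tie)
triples = list(combinations(ages, 3))
# Compare integer age totals rather than means to avoid floating-point ties
as_extreme = sum(1 for triple in triples if sum(triple) >= fired_total)
p_value = as_extreme / len(triples)
print(f"{as_extreme} / {len(triples)} = {p_value:.2f}")  # prints 6 / 120 = 0.05
```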


Example 5: Memorizing letters

- You will be given a string of 30 letters
- Memorize as many as you can in 20 seconds (in order)
- Design questions
  - What kind of study is this?
  - What kind of randomness was used in this study?
  - What are the variables, and what kind are they?
- Analysis questions
  - Do boxplots suggest a significant difference?
  - Simulate a randomization test, interpret the results

Extensions

- Matched pairs design
  - Randomize within pairs (e.g., by flipping coin)
- Comparing more than 2 groups
  - Alternative to chi-square, ANOVA
  - Same use of randomization
  - Somewhat harder to define test statistic
- Regression/correlation
  - Randomize/permute one of the variables
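
For the matched pairs extension, flipping a coin within each pair amounts to randomly flipping the sign of each within-pair difference. With a small number of pairs, all sign patterns can be enumerated exactly. The sketch below uses hypothetical differences that are not from the slides; the function name is also an assumption.

```python
from itertools import product

def paired_randomization_pvalue(differences):
    """Exact one-sided p-value: P(sum of sign-flipped differences >= observed sum)."""
    observed = sum(differences)
    count = 0
    total = 0
    # Each pair's difference keeps its sign (+1) or flips it (-1), like a coin flip
    for signs in product((1, -1), repeat=len(differences)):
        total += 1
        flipped_sum = sum(s * d for s, d in zip(signs, differences))
        if flipped_sum >= observed:
            count += 1
    return count / total

# Hypothetical within-pair differences (treatment minus control), for illustration only
example_differences = [1, 2, 3, 4, 5, 6]
p_value = paired_randomization_pvalue(example_differences)
print(f"exact p-value: {p_value}")
```

With six pairs there are 64 sign patterns; when every difference is positive, only the observed pattern is as extreme, giving 1/64.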


Reflections

- You can do this at the beginning of the course
  - Then repeat for new scenarios with more richness
  - Spiraling could lead to deeper conceptual understanding
- Emphasizes scope of conclusions to be drawn from randomized experiments vs. observational studies
- Makes clear that “inference” goes beyond data in hand
- Very powerful, easily generalized
  - Flexibility in choice of test statistic (e.g., medians, odds ratio)
  - Generalize to more than two groups
- Takes advantage of modern computing power
- Does not require assumptions of normality

Fisher on randomization tests

“The statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method.”

– R.A. Fisher (1936)

Ptolemaic curriculum?

“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.”

– George Cobb (TISE, 2007)


Further reading

- Ernst (2005), Statistical Science
- Scheaffer and Tabor (2008), Mathematics Teacher
- Rossman (2008), Statistics Education Research Journal
- Statistics: A Guide to the Unknown (ed. R. Peck)
- NSF-funded project: http://statweb.calpoly.edu/csi/

More information

Please feel free to contact me [email protected]

Thanks very much!