Investigations for Introducing Mathematically Inclined Students to Statistics

Allan Rossman ([email protected])
Beth Chance ([email protected])


Student Audience

Introductory statistics course for mathematically inclined students
  mathematics and statistics majors
  future secondary teachers
  perhaps strong science, engineering, computer science majors

Goals for the Course?

Brainstorm your goals for these students, particularly with attention to whether and how these goals differ from service courses (5 min)

Reporter summarize top three goals

Summary of Goals

Efforts for Math Stat/Prob Sequence

Supplement with data analysis component
  Witmer’s Data Analysis: An Introduction
Infuse data and applications
  Rice’s Mathematical Statistics and Data Analysis
Use lab activities
  Nolan and Speed’s StatLabs
  Baglivo’s Mathematica Laboratories for Mathematical Statistics

Our Project

To develop and provide a:

Data-Oriented, Active Learning, Post-Calculus Introduction to Statistical Concepts, Applications, Theory

Supported by the NSF DUE/CCLI #9950476, 0321973

Guiding Principles

Put students in the role of active investigator
Motivate with real studies, genuine data
Emphasize connections among study design, inference technique, scope of conclusions
Use simulations frequently
Use a variety of computational tools
Investigate mathematical underpinnings
Introduce probability “just in time”
Experience entire statistical process over and over
Provide a combination of immediate corrective formative and summative evaluation of key concepts

Outline

Columns: Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, Chapter 6
Rows (entries listed left to right):

Data Collection
  Observation vs. experiment, confounding, randomization
  Random sampling, bias, precision, nonsampling errors
  Paired data
  Independent random samples
  Bivariate

Descriptive Statistics
  Conditional proportions, segmented bar graphs, odds ratio
  Quantitative summaries, transformations, z-scores, resistance
  Bar graph
  Models, probability plots, trimmed mean
  Scatterplots, correlation, simple linear regression

Probability
  Counting, random variable, expected value
  Empirical rule
  Bernoulli processes, rules for variances, expected value
  Normal, Central Limit Theorem

Sampling/Randomization Distribution
  Randomization distribution for $\hat{p}_1 - \hat{p}_2$
  Randomization distribution for $\bar{x}_1 - \bar{x}_2$
  Sampling distribution for $X$, $\hat{p}$
  Large sample sampling distributions for $\bar{x}$, $\hat{p}$
  Sampling distributions of $\hat{p}_1 - \hat{p}_2$, OR, $\bar{x}_1 - \bar{x}_2$
  Chi-square statistic, F statistic, regression coefficients

Model
  Hypergeometric
  Binomial
  Normal, t
  Normal, t, log-normal
  Chi-square, F, t

Statistical Inference
  p-value, significance, Fisher’s Exact Test
  p-value, significance, effect of variability
  Binomial tests and intervals, two-sided p-values, type I/II errors
  z-procedures for proportions, t-procedures, robustness, bootstrapping
  Two-sample z- and t-procedures, bootstrap, CI for OR
  Chi-square for homogeneity, independence, ANOVA, regression

Example Investigations

Full versions available at www.rossmanchance.com/iscam/uscots/

Investigation 1: Sleep Deprivation and Visual Learning (randomization tests)

Investigation 2: Sampling Words (random samples, variability)

Investigation 3: Kissing the Right Way (confidence intervals)

Investigation 4: Sleepless Drivers (CI for Odds Ratio)

Investigation 1: Sleep Deprivation

Physiology Experiment

Stickgold, James, and Hobson (2000) studied the long-term effects of sleep deprivation on a visual discrimination task (3 days later!)

sleep condition    n     Mean    StDev   Median    IQR
deprived          11     3.90    12.17     4.50   20.7
unrestricted      10    19.82    14.73    16.55   19.53

Investigation 1: Sleep Deprivation

How often would such an extreme experimental difference occur by chance, if there was no sleep deprivation effect?

Set of 21 index cards with the improvement scores (positive and negative).

Randomly assign 11 of the cards to the sleep deprived group.

Calculate the difference in group means (deprived – unrestricted)
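The index-card procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' macro. The 21 improvement scores are not listed on the slides; the values below are an assumption recalled from the published course data, and they reproduce the group means and medians in the summary table. The function name `randomization_differences` is hypothetical.

```python
import random
from statistics import fmean

# Assumption: scores recalled from the published course data, not from the slides.
# They reproduce the slide's means (3.90, 19.82) and medians (4.50, 16.55).
DEPRIVED = [-14.7, -10.7, -10.7, 2.2, 2.4, 4.5, 7.2, 9.6, 10.0, 21.3, 21.8]
UNRESTRICTED = [-7.0, 11.6, 12.1, 12.6, 14.5, 18.6, 25.2, 30.5, 34.5, 45.6]


def randomization_differences(group_a, group_b, reps, seed=None):
    """Shuffle all scores, deal len(group_a) of them to the first group,
    and record the difference in means (first minus second), reps times."""
    rng = random.Random(seed)
    cards = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(reps):
        rng.shuffle(cards)
        diffs.append(fmean(cards[:n_a]) - fmean(cards[n_a:]))
    return diffs


observed = fmean(DEPRIVED) - fmean(UNRESTRICTED)
diffs = randomization_differences(DEPRIVED, UNRESTRICTED, reps=10_000, seed=1)

# One-sided empirical p-value: how often chance alone produces a difference
# at least as far below zero as the observed one (small tolerance for floats).
p_value = sum(d <= observed + 1e-9 for d in diffs) / len(diffs)
print(f"observed difference = {observed:.2f}, empirical p-value = {p_value:.4f}")
```

Changing `seed` or `reps` shows how much the empirical p-value itself varies from one simulation to the next.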

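Investigation 1: Sleep Deprivation

After this reminder of the randomization process, students then use a Minitab macro

sample 21 c2 c3               # shuffle the 21 group labels in c2 into c3
unstack c1 c4 c5;             # split the scores in c1 into two groups, c4 and c5,
subs c3.                      #   according to the shuffled labels in c3
let c6(k1)=mean(c4)-mean(c5)  # store this repetition's difference in group means
let k1=k1+1                   # advance the repetition counter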

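Investigation 1: Sleep Deprivation

Students investigate this question through
  Hands-on simulation (index cards)
  Computer simulation (Minitab)
  Exact distribution

[Figures: randomization distributions of the difference in group means, with the observed difference of 15.92 marked; reported p-values are .0072 and .002]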
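The exact randomization distribution mentioned above comes from enumerating every way to deal 11 of the 21 scores to the deprived group. The sketch below does that by brute force. The scores are again an assumption (recalled from the published course data, not listed on the slides) and are stored in tenths so that all comparisons use integer arithmetic. The function name is hypothetical.

```python
from itertools import combinations
from math import comb

# Assumption: scores recalled from the published course data, in tenths of a
# unit so that sums are integers and tail comparisons are exact.
DEPRIVED_TENTHS = [-147, -107, -107, 22, 24, 45, 72, 96, 100, 213, 218]
UNRESTRICTED_TENTHS = [-70, 116, 121, 126, 145, 186, 252, 305, 345, 456]


def exact_randomization_count(group_a, group_b):
    """Count assignments at least as extreme (lower tail) as the observed one.

    With the group sizes and the grand total fixed, the difference in means
    is an increasing function of the first group's sum, so it is enough to
    compare first-group sums with the observed first-group sum.
    """
    pooled = list(group_a) + list(group_b)
    observed_sum = sum(group_a)
    total = comb(len(pooled), len(group_a))
    extreme = sum(
        1
        for hand in combinations(pooled, len(group_a))
        if sum(hand) <= observed_sum
    )
    return extreme, total


extreme, total = exact_randomization_count(DEPRIVED_TENTHS, UNRESTRICTED_TENTHS)
print(f"{extreme} of {total} assignments, exact p-value = {extreme / total:.4f}")
```

Enumeration is feasible here because there are only 352,716 possible assignments; for larger studies the simulation approach is the practical choice.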

Investigation 1: Sleep Deprivation

Experience the entire statistical process again
Develop deeper understanding of key ideas (randomization, significance, p-value)
Effect of variability
Tools change, but reasoning remains same
Tools based on research study, question – not for their own sake
Simulation as a problem solving tool
Empirical vs. exact p-values

Investigation 2: Sampling Words

Four score and seven years ago, our fathers brought forth upon this continent a new nation: conceived in liberty, and dedicated to the proposition that all men are created equal.

Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met on a great battlefield of that war.

We have come to dedicate a portion of that field as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

But, in a larger sense, we cannot dedicate, we cannot consecrate, we cannot hallow this ground. The brave men, living and dead, who struggled here have consecrated it, far above our poor power to add or detract. The world will little note, nor long remember, what we say here, but it can never forget what they did here.

It is for us the living, rather, to be dedicated here to the unfinished work which they who fought here have thus far so nobly advanced. It is rather for us to be here dedicated to the great task remaining before us, that from these honored dead we take increased devotion to that cause for which they gave the last full measure of devotion, that we here highly resolve that these dead shall not have died in vain, that this nation, under God, shall have a new birth of freedom, and that government of the people, by the people, for the people, shall not perish from the earth.

Investigation 2: Sampling Words

Examine the average length of words in the sample

Example Class Results

The population mean of all 268 words is 4.295 letters

Investigation 2: Sampling Words

Students use Minitab to select sample and compare results

Example results

Investigation 2: Sampling Words

Then turn to technology (applet)

What is the long-term behavior of this (random) sampling method? Unbiased method?

What happens if we change sample size? Population size?
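The applet's questions about long-run behavior can also be explored with a short simulation. The sketch below is an illustration under stated assumptions: it uses only the opening sentence of the passage as the population (30 words, so its mean differs from the 4.295 reported for all 268 words), and the function name `sample_means` is hypothetical.

```python
import random
import re
from statistics import fmean, pstdev

# Assumption: an excerpt (opening sentence only) stands in for the population.
PASSAGE = (
    "Four score and seven years ago, our fathers brought forth upon this "
    "continent a new nation: conceived in liberty, and dedicated to the "
    "proposition that all men are created equal."
)
WORD_LENGTHS = [len(word) for word in re.findall(r"[A-Za-z]+", PASSAGE)]


def sample_means(lengths, n, reps, seed=None):
    """Mean word length in each of reps simple random samples of size n,
    drawn without replacement from the population of word lengths."""
    rng = random.Random(seed)
    return [fmean(rng.sample(lengths, n)) for _ in range(reps)]


population_mean = fmean(WORD_LENGTHS)
print(f"population: {len(WORD_LENGTHS)} words, mean length {population_mean:.3f}")
for n in (5, 20):
    means = sample_means(WORD_LENGTHS, n, reps=5_000, seed=2)
    print(f"n={n:2d}: mean of sample means {fmean(means):.3f}, "
          f"SD of sample means {pstdev(means):.3f}")
```

The mean of the sample means stays close to the population mean for both sample sizes (an unbiased method), while the spread of the sample means shrinks as the sample size grows.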

Investigation 2: Sampling Words

Using various forms of technology to support student conceptual learning
  Tailored to the context
  Dynamic, interactive, and visual
  Easy to use
Confront most common student misconceptions directly
Distinguish randomization from random sampling

Investigation 3: Kissing the Right Way

Biopsychology observational study

Güntürkün (2003) recorded the direction turned by kissing couples to see if there was also a right-sided dominance.

Investigation 3: Kissing the Right Way

Is 1/2 a plausible value for the probability a kissing couple turns right?
  Binomial Simulation applet
  Introduce idea of two-sided p-value
Is 2/3 a plausible value for the probability a kissing couple turns right?
  Discuss calculation of non-symmetric two-sided p-values
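One way to handle the non-symmetric case is to sum the probabilities of every outcome that is no more likely than the observed one. The sketch below uses that convention; it is one of several reasonable definitions of a two-sided p-value, not necessarily the one used in the course. The counts (80 of 124 couples turning right) are an assumption taken from the published study, since the slides do not list them.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(x, n, p0):
    """Sum the probabilities of all outcomes no more likely than x under p0.

    The relative tolerance guards against floating-point ties between
    outcomes that are equally likely in exact arithmetic.
    """
    observed = binomial_pmf(x, n, p0)
    return sum(
        prob
        for prob in (binomial_pmf(k, n, p0) for k in range(n + 1))
        if prob <= observed * (1 + 1e-7)
    )


# Assumption: counts from the published study, not shown on the slides.
RIGHT, COUPLES = 80, 124
for label, p0 in (("1/2", 1 / 2), ("2/3", 2 / 3)):
    p_value = two_sided_p_value(RIGHT, COUPLES, p0)
    print(f"null value {label}: two-sided p-value = {p_value:.4f}")
```

Under a null value of 1/2 the distribution is symmetric, so this matches doubling one tail; under 2/3 the two tails contribute different amounts.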

Investigation 3: Kissing the Right Way

Have students explore and develop an “interval” of plausible values for the probability that a kissing couple turns right

Later Investigations
  Use another applet to explore the meaning of confidence level
  Wald vs. adjusted Wald
  z vs. t
  Robustness of t-intervals
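The Wald vs. adjusted Wald comparison can be made concrete by computing how often each 95% interval captures the true probability. The sketch below computes that coverage exactly from the binomial distribution rather than by simulation. It assumes the "plus four" form of the adjustment (add two successes and two failures), which is tied to the 95% level; the function names and the choice of n = 20, p = 0.1 are illustrative only.

```python
from math import comb, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_interval(x, n, z=Z95):
    """Sample proportion plus or minus z standard errors."""
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def adjusted_wald_interval(x, n, z=Z95):
    """'Plus four' adjustment (an assumption about which adjustment is meant):
    add two successes and two failures, then apply the Wald formula."""
    return wald_interval(x + 2, n + 4, z)


def coverage(interval, n, p):
    """Exact probability that the interval captures p when X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        low, high = interval(x, n)
        if low <= p <= high:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


n, p = 20, 0.1
print(f"Wald coverage:          {coverage(wald_interval, n, p):.3f}")
print(f"adjusted Wald coverage: {coverage(adjusted_wald_interval, n, p):.3f}")
```

For a small sample and a probability near 0, the Wald interval captures the true value noticeably less often than its nominal 95%, while the adjusted interval stays close to the nominal level.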

Investigation 3: Kissing the Right Way

Encourage students to make predictions and test their knowledge
Use the technology to minimize computational burden so students focus on concepts
Return to key ideas often, increasing the level of complexity each time
Give them a taste for the modern flavor of statistical practice and methodology

Investigation 4: Sleepless Drivers

Sociology case-control study

Connor et al (2002) investigated whether those in recent car accidents had been more sleep deprived than a control group of drivers

                               No full night’s sleep   At least one full night’s   Sample sizes
                               in past week            sleep in past week
“case” drivers (crash)         61                      474                         535
“control” drivers (no crash)   44                      544                         588

Investigation 4: Sleepless Drivers

Sample proportion that were in a car crash
  Sleep deprived: .581
  Not sleep deprived: .466

Odds ratio: 1.59

How often would such an extreme observed odds ratio occur by chance, if there was no sleep deprivation effect?

[Segmented bar graph: percentage of crash and no-crash drivers (0% to 100%) for “No full night’s sleep in past week” and “At least one full night’s sleep in past week”]

Investigation 4: Sleepless Drivers

Students investigate this question through
  Computer simulation (Minitab)
  Empirical sampling distribution of odds-ratio
  Empirical p-value
  Approximate mathematical model

[Figure: empirical sampling distribution of the odds ratio, with the observed value 1.59 marked]
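The simulation step can be sketched as follows, using the counts from the two-way table. This is an illustration, not the course's Minitab code: it assumes the "no effect" model is represented by shuffling which drivers are labeled sleep deprived while holding the numbers of cases, controls, and sleep-deprived drivers fixed. The function names are hypothetical.

```python
import random

# Counts from the slide's two-way table.
CASES_DEPRIVED, CASES_RESTED = 61, 474
CONTROLS_DEPRIVED, CONTROLS_RESTED = 44, 544


def odds_ratio(a, b, c, d):
    """Odds of a crash for sleep-deprived drivers relative to rested drivers:
    a, b = cases (deprived, rested); c, d = controls (deprived, rested)."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def simulate_odds_ratios(reps, seed=None):
    """Odds ratios obtained when the 'sleep deprived' label is handed out at
    random, with all row and column totals held at their observed values."""
    rng = random.Random(seed)
    n_cases = CASES_DEPRIVED + CASES_RESTED
    n_controls = CONTROLS_DEPRIVED + CONTROLS_RESTED
    n_deprived = CASES_DEPRIVED + CONTROLS_DEPRIVED
    drivers = [1] * n_cases + [0] * n_controls  # 1 = crash, 0 = no crash
    ratios = []
    for _ in range(reps):
        a = sum(rng.sample(drivers, n_deprived))  # crashes among "deprived"
        c = n_deprived - a
        ratios.append(odds_ratio(a, n_cases - a, c, n_controls - c))
    return ratios


observed = odds_ratio(CASES_DEPRIVED, CASES_RESTED, CONTROLS_DEPRIVED, CONTROLS_RESTED)
ratios = simulate_odds_ratios(reps=5_000, seed=4)
p_value = sum(r >= observed for r in ratios) / len(ratios)
print(f"observed odds ratio = {observed:.2f}, empirical p-value = {p_value:.4f}")
```

A histogram of `ratios` gives the empirical sampling distribution; its right skew is one motivation for working on the log scale when building an approximate model.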

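Investigation 4: Sleepless Drivers

$\mathrm{SE}(\text{log-odds}) = \sqrt{\frac{1}{a} + \frac{1}{b} + \frac{1}{c} + \frac{1}{d}}$, where $a$, $b$, $c$, $d$ are the four cell counts of the two-way table

Confidence interval for population log odds: sample log-odds ± z* SE(log-odds)
Back-transformation

90% CI for odds ratio: 1.13 – 2.24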
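The interval calculation can be checked with a short script using the counts from the two-way table. The function name `odds_ratio_interval` is hypothetical; the steps follow the slide: standard error on the log scale, a z-interval for the log odds ratio, then back-transformation with the exponential.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_interval(a, b, c, d, confidence=0.90):
    """Confidence interval for the odds ratio (a*d)/(b*c), built on the log
    scale and back-transformed. a, b, c, d are the four cell counts."""
    log_or = log((a * d) / (b * c))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    return exp(log_or - z_star * se), exp(log_or + z_star * se)


# Cases (deprived, rested) then controls (deprived, rested), from the table.
low, high = odds_ratio_interval(61, 474, 44, 544)
print(f"90% CI for the odds ratio: {low:.2f} to {high:.2f}")
```

With these counts the script reproduces the interval on the slide, 1.13 to 2.24. Because the interval is symmetric only on the log scale, the back-transformed interval is not centered at the sample odds ratio of 1.59.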

Investigation 4: Sleepless Drivers

Students understand process through which they can investigate statistical ideas
Students piece together powerful statistical tools learned throughout the course to derive new (to them) procedures
  Concepts, applications, methods, theory

Expectations of Students (Midterm Qs)

Issues in sampling, nonsampling errors
Understand the implications of improper sampling
Analyze data numerically and graphically, communicate their results
Be able to explain how random variability affects the conclusions we should draw
Verbalize student conclusions that follow based on study design – Causation? Generalizability?
Explain the idea behind randomization/sampling distributions, think statistically
Increasing understanding of confidence, p-value

Discussion

Are these worthy goals?
  Recruiting students into statistics (2nd course…)
  Preparing future teachers
Is such a course feasible?
  Learning environment
  Course structure
  Integration of technology
What are the essential components in students’ ability and understanding to assess?

For More Information

Applets, data files, other resources: www.rossmanchance.com/iscam/

Faculty development workshop (July 18-22, 2005): www.rossmanchance.com/prep/workshop.html

Review copies of text: www.duxbury.com

Thank you

Allan Rossman ([email protected])

Beth Chance ([email protected])