Creating Randomization Distributions


Page 1: Creating  Randomization Distributions

Section 4.4

Creating Randomization Distributions

Page 2: Creating  Randomization Distributions

Randomization Distributions

How do we estimate P-values using randomization distributions?

Today we’ll discuss ways to simulate randomization samples for a variety of situations.

1. Simulate samples, assuming H0 is true

2. Calculate the statistic of interest for each sample

3. Find the p-value as the proportion of simulated statistics as extreme as the observed statistic
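Step 3 can be sketched in Python as follows. This is only an illustrative sketch: the function name `proportion_as_extreme`, the `tail` argument, and the toy list of simulated statistics are assumptions made for this example, and the two-tailed rule shown (distance from the null value) is one common convention rather than the only one.

```python
def proportion_as_extreme(sim_stats, observed, null_value=0.0, tail="two"):
    """Proportion of simulated statistics at least as extreme as the observed one."""
    if tail == "lower":
        # Ha uses "<": small values of the statistic count as extreme
        hits = sum(s <= observed for s in sim_stats)
    elif tail == "upper":
        # Ha uses ">": large values of the statistic count as extreme
        hits = sum(s >= observed for s in sim_stats)
    elif tail == "two":
        # Ha uses "≠": compare distances from the null value (assumed convention)
        distance = abs(observed - null_value)
        hits = sum(abs(s - null_value) >= distance for s in sim_stats)
    else:
        raise ValueError("tail must be 'lower', 'upper', or 'two'")
    return hits / len(sim_stats)


# Toy simulated statistics, made up purely to show the counting
toy_stats = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.4]
upper_p = proportion_as_extreme(toy_stats, 0.2, tail="upper")
lower_p = proportion_as_extreme(toy_stats, -0.1, tail="lower")
two_p = proportion_as_extreme(toy_stats, 0.2, null_value=0.0, tail="two")
print(upper_p, lower_p, two_p)
```

In practice the list of simulated statistics would hold thousands of values produced by steps 1 and 2.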

Page 3: Creating  Randomization Distributions

• In a randomized experiment on treating cocaine addiction, 48 people were randomly assigned to take either Desipramine (a new drug) or Lithium (an existing drug), and were then followed to see who relapsed.

• Question of interest: Is Desipramine better than Lithium at treating cocaine addiction?

Cocaine Addiction

Page 4: Creating  Randomization Distributions

• What are the null and alternative hypotheses?

• What are the possible conclusions?

Cocaine Addiction

Page 5: Creating  Randomization Distributions

• What are the null and alternative hypotheses?

• What are the possible conclusions?

Cocaine Addiction

Let pD, pL be the proportions of cocaine addicts who relapse after taking Desipramine or Lithium, respectively.

H0: pD = pL

Ha: pD < pL

Reject H0: Desipramine is better than Lithium

Do not reject H0: We cannot determine from these data whether Desipramine is better than Lithium

Page 6: Creating  Randomization Distributions

[Figure: the 48 participants split at random into two groups of 24, labeled Desipramine and Lithium]

1. Randomly assign units to treatment groups

Page 7: Creating  Randomization Distributions

[Figure: the Desipramine and Lithium groups after the experiment, each participant marked R or N]

R = Relapse, N = No Relapse

1. Randomly assign units to treatment groups

2. Conduct experiment

3. Observe relapse counts in each group

Desipramine: 10 relapse, 14 no relapse

Lithium: 18 relapse, 6 no relapse

$\hat{p}_D - \hat{p}_L = \frac{10}{24} - \frac{18}{24} = -0.333$

Page 8: Creating  Randomization Distributions

To see if a statistic provides evidence against H0, we need to see what kind of sample statistics we would observe, just by random chance, if H0 were true

Measuring Evidence against H0

Page 9: Creating  Randomization Distributions

• “by random chance” means by the random assignment to the two treatment groups

• “if H0 were true” means if the two drugs were equally effective at preventing relapses (equivalently: whether a person relapses or not does not depend on which drug is taken)

• Simulate what would happen just by random chance, if H0 were true…

Cocaine Addiction

Page 10: Creating  Randomization Distributions

[Figure: the observed results, each participant marked R or N]

Desipramine: 10 relapse, 14 no relapse

Lithium: 18 relapse, 6 no relapse

Page 11: Creating  Randomization Distributions

[Figure: the same 48 response values randomly reallocated to the two groups]

Simulate another randomization

Desipramine: 16 relapse, 8 no relapse

Lithium: 12 relapse, 12 no relapse

$\hat{p}_D - \hat{p}_L = \frac{16}{24} - \frac{12}{24} = 0.167$

Page 12: Creating  Randomization Distributions

[Figure: the same 48 response values randomly reallocated to the two groups again]

Simulate another randomization

Desipramine: 17 relapse, 7 no relapse

Lithium: 11 relapse, 13 no relapse

$\hat{p}_D - \hat{p}_L = \frac{17}{24} - \frac{11}{24} = 0.250$

Page 13: Creating  Randomization Distributions

Simulate Your Own Sample

In the experiment, 28 people relapsed and 20 people did not relapse. Create cards or slips of paper with 28 “R” values and 20 “N” values.

Pool these response values together, and randomly divide them into two groups (representing Desipramine and Lithium)

Calculate your difference in proportions

Plot your statistic on the class dotplot

To create an entire randomization distribution, we simulate this process many more times with technology: StatKey
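The card activity can also be mimicked in a few lines of Python. This is a minimal sketch, not a description of what StatKey does internally: the function names, the 10,000 repetitions, and the fixed seed are assumptions chosen for illustration, while the counts (28 R, 20 N, two groups of 24) and the observed relapse counts (10 and 18) come from the slides.

```python
import random


def diff_in_proportions(responses, n_first):
    """Relapse proportion in the first group minus that in the second group."""
    first, second = responses[:n_first], responses[n_first:]
    return first.count("R") / len(first) - second.count("R") / len(second)


def randomization_distribution(n_relapse=28, n_no_relapse=20, n_first=24,
                               n_sims=10_000, seed=0):
    """Shuffle the pooled responses and deal them into two groups, n_sims times."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    pool = ["R"] * n_relapse + ["N"] * n_no_relapse
    stats = []
    for _ in range(n_sims):
        rng.shuffle(pool)  # random reallocation; response values stay the same
        stats.append(diff_in_proportions(pool, n_first))
    return stats


# Observed statistic: Desipramine minus Lithium relapse proportions
observed = 10 / 24 - 18 / 24
sim_stats = randomization_distribution()

# Ha: pD < pL, so "as extreme" means as small as (or smaller than) the observed value
p_value = sum(s <= observed + 1e-12 for s in sim_stats) / len(sim_stats)
print(f"observed = {observed:.3f}, estimated p-value = {p_value:.4f}")
```

The first 24 shuffled cards play the role of the Desipramine group and the remaining 24 the Lithium group, which mirrors dividing the pooled slips of paper into two piles.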

Page 14: Creating  Randomization Distributions

www.lock5stat.com/statkey

[Figure: StatKey randomization distribution with the p-value marked]

Page 15: Creating  Randomization Distributions

Randomization Distribution Center

A randomization distribution simulates samples assuming the null hypothesis is true, so a randomization distribution is centered at the value of the parameter given in the null hypothesis.

Page 16: Creating  Randomization Distributions

Randomization Distribution

In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and sample mean = 10.2.

What do we require about the method to produce randomization samples?

a) μ = 12

b) μ < 12

c)

We need to generate randomization samples assuming the null hypothesis is true.

Page 17: Creating  Randomization Distributions

Randomization Distribution

In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and sample mean = 10.2.

Where will the randomization distribution be centered?

a) 10.2

b) 12

c) 45

d) 1.8

Randomization distributions are always centered around the null hypothesized value.

Page 18: Creating  Randomization Distributions

Randomization Distribution

In a hypothesis test for H0: μ = 12 vs Ha: μ < 12, we have a sample with n = 45 and sample mean = 10.2.

What will we look for on the randomization distribution?

a) How extreme 10.2 is

b) How extreme 12 is

c) How extreme 45 is

d) What the standard error is

e) How many randomization samples we collected

We want to see how extreme the observed statistic is.

Page 19: Creating  Randomization Distributions

Randomization Distribution

In a hypothesis test for H0: μ1 = μ2, Ha: μ1 > μ2

sample mean #1 = 26 and sample mean #2 = 21.

What do we require about the method to produce the randomization samples?

a) μ1 = μ2

b) μ1 > μ2

c) 26, 21

d)

We need to generate randomization samples assuming the null hypothesis is true.

Page 20: Creating  Randomization Distributions

Randomization Distribution

In a hypothesis test for H0: μ1 = μ2, Ha: μ1 > μ2

sample mean #1 = 26 and sample mean #2 = 21.

Where will the randomization distribution be centered?

a) 0

b) 1

c) 21

d) 26

e) 5

The randomization distribution is centered around the null hypothesized value, μ1 - μ2 = 0

Page 21: Creating  Randomization Distributions

Randomization Distribution

In a hypothesis test for H0: μ1 = μ2, Ha: μ1 > μ2

sample mean #1 = 26 and sample mean #2 = 21.

What do we look for in the randomization distribution?

a) The standard error

b) The center point

c) How extreme 26 is

d) How extreme 21 is

e) How extreme 5 is

We want to see how extreme the observed difference in means is.

Page 22: Creating  Randomization Distributions

Randomization Distribution

For a randomization distribution, each simulated sample should…

• be consistent with the null hypothesis

• use the data in the observed sample

• reflect the way the data were collected

Page 23: Creating  Randomization Distributions

In randomized experiments the “randomness” is the random allocation to treatment groups

• If the null hypothesis is true, the response values would be the same, regardless of treatment group assignment

• To simulate what would happen just by random chance, if H0 were true:

Reallocate cases to treatment groups, keeping the response values the same

Randomized Experiments

Page 24: Creating  Randomization Distributions

Observational Studies

In observational studies, the “randomness” is random sampling from the population

To simulate what would happen, just by random chance, if H0 were true:

Simulate drawing samples from a population in which H0 is true

How do we simulate sampling from a population in which H0 is true when we only have sample data?

Adjust the sample to make H0 true, then bootstrap!

Page 25: Creating  Randomization Distributions

Let μ = the average human body temperature

H0: μ = 98.6

Ha: μ ≠ 98.6

• Adjust the sample by adding 98.6 – 98.26 = 0.34 to each value. The sample mean becomes 98.6, exactly the value given by the null hypothesis.

• Bootstrapping the adjusted sample allows us to simulate drawing samples as if the null is true!

Body Temperatures

sample mean = 98.26
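The shift-then-bootstrap idea can be sketched in Python as below. The ten temperatures are hypothetical values invented for this example (picked so that their mean is 98.26, as on the slide); they are not the data behind the slide, so the resulting p-value will not match StatKey's. The function names, the 5,000 repetitions, and the seed are likewise assumptions.

```python
import random
from statistics import mean


def shifted_bootstrap_means(sample, null_mean, n_sims=5000, seed=0):
    """Shift the sample so its mean equals null_mean, then bootstrap the mean."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shift = null_mean - mean(sample)
    shifted = [x + shift for x in sample]  # H0 is now true for this sample
    n = len(shifted)
    # Each bootstrap sample draws n values with replacement from the shifted data
    return [mean(rng.choices(shifted, k=n)) for _ in range(n_sims)]


def two_tailed_p_value(sim_stats, observed, null_value):
    """Proportion of simulated means at least as far from null_value as observed."""
    distance = abs(observed - null_value)
    return sum(abs(s - null_value) >= distance for s in sim_stats) / len(sim_stats)


# Hypothetical temperatures for illustration only (NOT the data behind the slide)
temps = [97.4, 98.0, 98.2, 98.6, 98.8, 97.9, 98.3, 98.1, 98.9, 98.4]
sim_means = shifted_bootstrap_means(temps, null_mean=98.6)
p_value = two_tailed_p_value(sim_means, mean(temps), 98.6)
print(f"sample mean = {mean(temps):.2f}, estimated p-value = {p_value:.4f}")
```

Because every value is shifted by the same amount, the spread of the sample is unchanged; only its center moves to the null value.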

Page 26: Creating  Randomization Distributions

In StatKey, when we enter the null hypothesis, this shifting is automatically done for us

StatKey

Body Temperatures

p-value = 0.002

Page 27: Creating  Randomization Distributions

Exercise and Gender

1. State null and alternative hypotheses

2. Devise a way to generate a randomization sample that

• Uses the observed sample data

• Makes the null hypothesis true

• Reflects the way the data were collected

Do males exercise more hours per week than females?

sample mean difference: $\bar{x}_m - \bar{x}_f = 3$

Page 28: Creating  Randomization Distributions

Exercise and Gender

1. H0: μm = μf, Ha: μm > μf

2. Generating a randomization distribution can be done with the “shift groups” method:

• To make H0 true, set the sample means equal by adding 3 to every female value. Now bootstrap from this modified sample.

Note: There are other ways. In StatKey, the default randomization method is “Reallocate Groups”, but “Shift Groups” is also an option.
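A rough Python sketch of the “shift groups” idea is given below. It is an illustration under stated assumptions, not StatKey's implementation: the exercise hours are hypothetical values invented so that the difference in sample means is 3, each group is resampled separately with replacement, and the function name, repetition count, and seed are arbitrary choices.

```python
import random
from statistics import mean


def shift_groups_distribution(group_a, group_b, n_sims=5000, seed=0):
    """Shift group_b so both sample means agree, then bootstrap each group."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    observed = mean(group_a) - mean(group_b)
    shifted_b = [x + observed for x in group_b]  # the two means are now equal
    sims = []
    for _ in range(n_sims):
        boot_a = rng.choices(group_a, k=len(group_a))      # resample group a
        boot_b = rng.choices(shifted_b, k=len(shifted_b))  # resample shifted group b
        sims.append(mean(boot_a) - mean(boot_b))
    return observed, sims


# Hypothetical hours of exercise per week (NOT the data behind the slide)
males = [12, 5, 10, 14, 3, 8, 11, 9]
females = [6, 9, 4, 7, 10, 2, 8, 2]

observed, sim_diffs = shift_groups_distribution(males, females)

# Ha: μm > μf, so "as extreme" means as large as (or larger than) the observed difference
p_value = sum(d >= observed for d in sim_diffs) / len(sim_diffs)
print(f"observed difference = {observed}, estimated p-value = {p_value:.4f}")
```

With the “Reallocate Groups” option mentioned in the note, the group labels would be shuffled instead of shifting and resampling.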

Page 29: Creating  Randomization Distributions

Exercise and Gender

p-value = 0.095

Page 30: Creating  Randomization Distributions

Exercise and Gender

The p-value is 0.095. Using α = 0.05, we conclude…

a) Males exercise more than females, on average

b) Males do not exercise more than females, on average

c) Nothing

Do not reject the null… we can’t conclude anything.

Page 31: Creating  Randomization Distributions

Blood Pressure and Heart Rate

Is blood pressure negatively correlated with heart rate?

1. State null and alternative hypotheses

2. Devise a way to generate a randomization sample that

• Uses the observed sample data

• Makes the null hypothesis true

• Reflects the way the data were collected

sample correlation: r = -0.037

Page 32: Creating  Randomization Distributions

Blood Pressure and Heart Rate

1. H0: ρ = 0, Ha: ρ < 0

2. Generating a randomization distribution:

Two variables have correlation 0 if they are not associated (null hypothesis). We can “break the association” by randomly shuffling one of the variables.

Each time we do this, we get a sample we might observe just by random chance, if there really is no correlation
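Shuffling one variable can be sketched in Python as follows. The blood pressure and heart rate values are hypothetical numbers invented for this example (they are not the data behind the slide), and the function names, 5,000 repetitions, and seed are assumptions. The correlation is computed by hand so the sketch needs only the standard library.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def shuffled_correlations(xs, ys, n_sims=5000, seed=0):
    """Correlations obtained after randomly shuffling ys against a fixed xs."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    ys_shuffled = list(ys)     # copy so the original pairing is left untouched
    sims = []
    for _ in range(n_sims):
        rng.shuffle(ys_shuffled)  # breaks any association between the variables
        sims.append(pearson_r(xs, ys_shuffled))
    return sims


# Hypothetical measurements for illustration only (NOT the data behind the slide)
blood_pressure = [120, 135, 110, 142, 128, 150, 118, 131]
heart_rate = [72, 80, 95, 68, 88, 75, 84, 79]

observed_r = pearson_r(blood_pressure, heart_rate)
sim_rs = shuffled_correlations(blood_pressure, heart_rate)

# Ha: ρ < 0, so "as extreme" means as negative as (or more negative than) observed
p_value = sum(r <= observed_r for r in sim_rs) / len(sim_rs)
print(f"observed r = {observed_r:.3f}, estimated p-value = {p_value:.4f}")
```

Only one variable needs to be shuffled: once the pairing is random, shuffling the other as well adds nothing.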

Page 33: Creating  Randomization Distributions

Blood Pressure and Heart Rate

p-value = 0.219

Even if blood pressure and heart rate are not correlated, we would see correlations this extreme about 22% of the time, just by random chance.

Page 34: Creating  Randomization Distributions

Randomization Distributions:

Cocaine Addiction (randomized experiment)
Rerandomize cases to treatment groups, keeping response values fixed

Body Temperature (single mean)
Shift to make H0 true, then bootstrap

Exercise and Gender (observational study)
Shift to make H0 true, then bootstrap

Blood Pressure and Heart Rate (correlation)
Randomly shuffle one variable

Page 35: Creating  Randomization Distributions

• As long as the original data is used and the null hypothesis is true for the randomization samples, most methods usually give similar p-values

• StatKey generates the randomizations for us. We will not be concerned with the details of the process. It is enough to understand the general principles.

Generating Randomization Samples

Page 36: Creating  Randomization Distributions

Summary

Randomization samples should be generated

• Consistent with the null hypothesis

• Using the observed data

• Reflecting the way the data were collected

The specific method varies with the situation, but the general idea is always the same.