A Tale of Three Numbers


Page 1: A Tale of Three Numbers

A Tale of Three Numbers

Statistical Significance, Effect Size, and Sample Size

Page 2: A Tale of Three Numbers

BRIEF REVIEW

Page 3: A Tale of Three Numbers

Causation vs. Correlation

When two variables A and B are correlated, there are four possibilities:

1. A causes B
2. B causes A
3. A common cause C causes both A and B
4. The correlation is accidental

Page 4: A Tale of Three Numbers

So, discovering that countries with democratic elections get in fewer wars, we might conclude:

1. Democracy causes peace.
2. Peace causes democracy.
3. Christianity causes both democracy and peace.
4. Democracy and peace are only accidentally correlated.

Page 5: A Tale of Three Numbers

Observational Studies

Importantly, if we just observe the facts and collect data on how things are, we cannot tell which hypothesis is true.

Observational studies find correlations, not the causal structure of the world. (This is what HW4 was about.)

Page 6: A Tale of Three Numbers

The Best Evidence

So far, we’ve learned that a good experiment or clinical trial is:

• Randomized
• Double-blind
• Controlled

This is often abbreviated ‘RCT’: Randomized Controlled Trial.

Page 9: A Tale of Three Numbers

Controls

An experiment with no controls is useless.

It tells us what happens when we do X, but not what happens when we don’t do X (control).

Maybe the same results would happen from not doing X. Maybe X does nothing. Or a lot. Or a little. With no controls, it is impossible to tell.

Page 10: A Tale of Three Numbers

Randomization

An experiment or trial is randomized when each person who is participating in it has a fair and equal chance of ending up either in the control group or the experimental group.

Page 11: A Tale of Three Numbers

Benefits of Randomization

Proper randomization:

Minimizes experimenter bias– the experimenter can’t bias who goes into which group.

Minimizes allocation bias– lowers the chance that the control group and experimental group differ in important ways.

Page 12: A Tale of Three Numbers

Selection Bias

Randomization cannot get rid of all selection bias.

For example, many psychology experiments are just performed on American undergraduates by their professors.

This means both groups over-represent young Westerners. (“Sampling bias”)

Page 13: A Tale of Three Numbers

Allocation Bias

Randomization also guards against allocation bias, where the control group and experimental group are different in important ways.

For example, if you assign the first 20 people to enroll in the experiment to the control and the next 20 to the experimental group, there may be allocation bias: the first to enroll may be more eager to take part, because they are sicker.
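As a rough illustration of the alternative to enrollment-order assignment, the sketch below shuffles the participants before splitting them, so that order of enrollment cannot influence group membership. The function name `randomize`, the participant IDs, and the fixed seed are assumptions made for the example, not part of the original slides.

```python
import random

def randomize(participants, seed=None):
    """Shuffle participants and split them into control and experimental groups."""
    rng = random.Random(seed)      # a seed is used here only to make the demo repeatable
    shuffled = list(participants)  # copy, so the enrollment list itself is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical enrollment list: IDs 1..40 in order of sign-up.
enrolled = list(range(1, 41))
control, experimental = randomize(enrolled, seed=0)
print(len(control), len(experimental))  # 20 20
```

Because every ordering of the shuffled list is equally likely, each participant has the same chance of landing in either group, whether they enrolled first or last.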

Page 14: A Tale of Three Numbers

The Importance of Randomization

Previously we saw that improper randomization procedures on average exaggerated effects by 41%.

This is an average result, so improper randomization often leads to exaggerations that are even larger than 41%.

Page 15: A Tale of Three Numbers

Why RCTs?

The importance of the experimental method (as opposed to scientific observation) is that it allows us to discern the causal structure of the world.

Page 16: A Tale of Three Numbers

Causal Structure

If we find a correlation between our experimental treatment T and our desired outcome O, we can rule out:
• O caused T in the experiment.
• A common cause C caused both O and T in the experiment.

Page 17: A Tale of Three Numbers

Causal Structure

But can we determine whether the correlation between T and O is real in the first place and not accidental?

Yes!

Page 18: A Tale of Three Numbers

STATISTICAL SIGNIFICANCE

Page 19: A Tale of Three Numbers

Statistical Significance

We say that an experimental correlation is statistically significant if it’s unlikely to be accidental.

How can we tell when it’s unlikely to be accidental?

Page 20: A Tale of Three Numbers

Null Hypothesis

We give a name to the claim that there is no causal connection between the variables being studied.

It is called the null hypothesis.

Our goal is to reject the null hypothesis when it is false, and to accept it when it is true.

Page 22: A Tale of Three Numbers

Rejecting the Null Hypothesis

All experimental data is consistent with the null hypothesis. Any correlation can always be due entirely to chance.

But sometimes the null hypothesis doesn’t fit the data very well. When the null hypothesis suggests that our actual observations are very unlikely, we reject the null hypothesis.

Page 23: A Tale of Three Numbers

P-Values

One way to characterize the significance of an observed correlation is with a p-value.

The p-value is the probability that we would observe data like ours (or data even more extreme) on the assumption that the null hypothesis is true.

p = P(observations / null hypothesis = true)

Page 24: A Tale of Three Numbers

P-Values

Lower p-values are stronger evidence against the null hypothesis: the lower the p-value, the harder it is to explain your observed correlation as an accident.

In science we have an arbitrary cut-off point, 5%. We say that an experimental result with p < .05 is statistically significant.

Page 25: A Tale of Three Numbers

Statistical Significance

What does p < .05 mean?

It means that the probability that our experimental results would happen if the null hypothesis is true is less than 5%.

According to the null hypothesis, there is less than a 1 in 20 chance that we would obtain these results.

Page 26: A Tale of Three Numbers

Note

Importantly, p-values are not measures of how likely the null hypothesis is, given the data. They are measures of how likely the data is, given the null hypothesis.

p = P(data / null hypothesis = true) ≠ P(null hypothesis = true / data)

Page 27: A Tale of Three Numbers

Example

Suppose I have a coin, and I hypothesize that the coin is biased toward heads.

The null hypothesis might be “this is a fair coin, it is equally likely to land heads or tails”.

Suppose I then flip it 5 times and it lands HHHHH– heads 5 times in a row.

Page 28: A Tale of Three Numbers

Example

We know that the probability of this happening if the coin is fair is 1/2^5 = 1/32 = 0.03125, or about 3%.

P(HHHHH / the coin is fair) = P(HHHHH / null hypothesis = true) = p = 3%

Page 29: A Tale of Three Numbers

Example

So p = .03 < .05, and we can reject the null hypothesis. The bias toward heads is statistically significant.
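The coin calculation can be checked with a short sketch. It computes a one-sided p-value as the probability of seeing at least the observed number of heads under the fair-coin null hypothesis; for five heads in five flips this is just (1/2)^5. The function name `p_value_heads` is an assumption made for illustration.

```python
from math import comb

def p_value_heads(n_flips, n_heads, p_heads=0.5):
    """One-sided p-value: P(at least n_heads heads in n_flips), given P(heads) = p_heads."""
    return sum(
        comb(n_flips, k) * p_heads**k * (1 - p_heads) ** (n_flips - k)
        for k in range(n_heads, n_flips + 1)
    )

p = p_value_heads(5, 5)
print(p)         # 0.03125
print(p < 0.05)  # True
```

With only four heads in five flips the same function gives 0.1875, which would not count as statistically significant at the 5% cut-off.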

Page 30: A Tale of Three Numbers

Importance

Just because the results of an experiment (or observational study) are “statistically significant” does not mean the revealed correlations are important.

The effect size, that is, the strength of the correlation, also matters.

Page 31: A Tale of Three Numbers

EFFECT SIZES

Page 33: A Tale of Three Numbers

Effect Size

One NAEP analysis of 100,000 American students found that science test scores for men were higher than the test scores for women, and this effect was statistically significant.

These results would be unlikely if the null hypothesis, that gender plays no role in science scores, were true.

Page 34: A Tale of Three Numbers

Effect Size

However, the average difference between men and women on the test was just 4 points out of 300, or 1.3% of the total score.

Yes, there was a real (statistically significant) difference. It was just a very, very small difference.

Page 35: A Tale of Three Numbers

Effect Size

One way to put the point might be: “p-values tell you when to reject the null hypothesis. But they do not tell you when to care about the results.”

Page 36: A Tale of Three Numbers

Measures of Effect Size

There are lots of measures of effect size:

Pearson’s r, Cohen’s f, Cohen’s d, Hedges’ g, Cramér’s V,…

Here we will just talk about two measures that are commonly reported: odds ratios and relative risks.

Page 37: A Tale of Three Numbers

Odds Ratio

First, let’s introduce the idea of a binary variable. A binary variable is a variable that can have only two values.

“height” is not a binary variable, because there are more than two heights people can have.

“got an A” is a binary variable, because either you got an A or you didn’t.

Page 38: A Tale of Three Numbers

Odds

Whenever you have a binary variable, you can ask about the odds of that variable– what are the odds of getting an A?

If 10 students got A’s out of 50 students, then 10 students got an A and 40 did not. The odds of getting an A are 10:40 or 1:4 or 25%.

Page 39: A Tale of Three Numbers

Odds vs. Probabilities

Odds are not probabilities. There are 50 students and 10 of them got A’s.

The probability of getting an A: 10/50 = 20%

The odds of getting an A: 10/40 = 25%
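The difference between the two quantities is only in the denominator, which a small sketch makes explicit. The helper names `probability` and `odds` are assumptions for this example; exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction

def probability(successes, total):
    """Successes divided by everyone."""
    return Fraction(successes, total)

def odds(successes, total):
    """Successes divided by non-successes."""
    return Fraction(successes, total - successes)

print(float(probability(10, 50)))  # 0.2
print(float(odds(10, 50)))         # 0.25
```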

Page 40: A Tale of Three Numbers

Odds Ratios

Suppose I have another binary variable “studied”– students either studied for the exam or they didn’t.

I can ask about the odds that a student who studied got an A, and the odds that a student who didn’t study got an A.

Page 41: A Tale of Three Numbers

In Table Format

            | Got an A = yes | Got an A = no | Totals
Study = yes |       6        |      15       |   21
Study = no  |       4        |      25       |   29
Totals      |      10        |      40       |   50

Page 42: A Tale of Three Numbers

Odds Ratio

So the odds of getting an A among studiers are 6:15 or 40%.

And the odds of getting an A among non-studiers are 4/25 or 16%.

Page 43: A Tale of Three Numbers

Odds Ratio

The odds ratio is the ratio of these odds, or 40%:16% = 2.5

This means that (in our example) studying raises the odds that someone will get an A by 150%.

Alternatively: a student who studies has two and a half times better odds of getting an A.
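A minimal sketch of the odds ratio computed directly from the four cells of the table. The function name `odds_ratio` and its argument names are assumptions for illustration.

```python
from fractions import Fraction

def odds_ratio(exposed_yes, exposed_no, unexposed_yes, unexposed_no):
    """Odds of the outcome in the exposed group divided by odds in the unexposed group."""
    odds_exposed = Fraction(exposed_yes, exposed_no)
    odds_unexposed = Fraction(unexposed_yes, unexposed_no)
    return odds_exposed / odds_unexposed

# Studied: 6 got an A, 15 did not. Did not study: 4 got an A, 25 did not.
ratio = odds_ratio(6, 15, 4, 25)
print(float(ratio))  # 2.5
```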

Page 44: A Tale of Three Numbers

Relative Risk

While odds ratios are appropriate when we have two correlated binary variables in an observational study (as when I observe the effects of studying on getting an A), the effect sizes in RCTs are usually reported by relative risks, which are also called risk ratios.

Page 45: A Tale of Three Numbers

Relative Risk

Relative risks are just like odds ratios except they compare probabilities and not odds.

The odds that a studying student gets an A are 6:15 = 40%

The probability is 6/(6 + 15) = 6/21 ≈ 29%

Page 46: A Tale of Three Numbers

Example

The odds that a non-studying student gets an A are 4:25 = 16%.

The probability is 4/(4 + 25) = 4/29 ≈ 14%.

Page 47: A Tale of Three Numbers

Example

Whereas the odds ratio was 40:16 = 250%, we get a relative risk of:

29%:14% = 29:14 = 2.07 = 207%

These numbers are similar, but obviously not the same. The risk ratio tells you that a student who studies is twice as likely to get an A.
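The relative risk uses the same four cells as the odds ratio, but divides by each group's total rather than by its count of non-A students. The function name `relative_risk` is an assumption for this sketch.

```python
from fractions import Fraction

def relative_risk(exposed_yes, exposed_no, unexposed_yes, unexposed_no):
    """Probability of the outcome in the exposed group divided by that in the unexposed group."""
    p_exposed = Fraction(exposed_yes, exposed_yes + exposed_no)
    p_unexposed = Fraction(unexposed_yes, unexposed_yes + unexposed_no)
    return p_exposed / p_unexposed

# Studied: 6 got an A, 15 did not. Did not study: 4 got an A, 25 did not.
rr = relative_risk(6, 15, 4, 25)
print(round(float(rr), 2))  # 2.07
```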

Page 48: A Tale of Three Numbers

Relation

As the probabilities of events get smaller the odds approach the probabilities, and odds ratios and relative risks are similar.

However, as the probabilities of the events get higher, the odds and risk ratios get very different.
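The reason is that odds equal p / (1 - p): when p is small, the denominator is close to 1 and the odds are close to p; when p is large, the denominator shrinks and the odds grow much faster than p. The sketch below prints a few illustrative values; the probabilities chosen are arbitrary examples, not figures from the slides.

```python
def odds_from_probability(p):
    """Convert a probability p (0 <= p < 1) into odds p / (1 - p)."""
    return p / (1 - p)

for p in (0.9, 0.5, 0.2, 0.05, 0.01):
    print(f"probability = {p:.2f}   odds = {odds_from_probability(p):.4f}")
```

At a probability of 0.01 the odds are about 0.0101, nearly identical; at 0.9 the odds are about 9, ten times the probability.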

Page 49: A Tale of Three Numbers

Here’s our table again…

            | Got an A = yes | Got an A = no | Totals
Study = yes |       6        |      15       |   21
Study = no  |       4        |      25       |   29
Totals      |      10        |      40       |   50

Page 50: A Tale of Three Numbers

Odds Ratio for High Probability Events

The probability of not getting an A is much higher than the probability of getting an A: 40/50 >> 10/50.

The odds of study = no, A = no: 25/4 = 6.25
The odds of study = yes, A = no: 15/6 = 2.5
Odds ratio: 6.25/2.5 = 250%.

Not studying increases the odds of A = no by one and a half times (150%), to 2.5 times the odds for students who studied.

Page 51: A Tale of Three Numbers

Relative Risk for High Probability Events

What about probabilities?

P(A = no / study = no) = 86%
P(A = no / study = yes) = 71%
Relative risk = 86/71 = 121%

So not studying increases your risk of not getting an A by 21%.

Page 52: A Tale of Three Numbers

What This Means

What this means is that if you see an effect size reported in the news you must know whether it is an odds ratio or a risk ratio.

Otherwise a seemingly very big difference might actually be a very small difference.

Page 53: A Tale of Three Numbers

Real Life Case

Here’s a real headline from the NY Times:

“Doctors are only 60% as likely to order cardiac catheterization for women and blacks as for men and whites.”

This sounds like a risk ratio. Doctors refer white men n% of the time and blacks and women 60% of n% of the time. Right?

Page 54: A Tale of Three Numbers

Large Difference in Risk!

The study found that doctors referred white men to heart specialists 90.6% of the time.

If the “60%” figure is a risk ratio, then they referred blacks and women 60% x 90.6% = 54.4% of the time.

That’s a big difference!

Page 55: A Tale of Three Numbers

Actually… No

But people who write newspaper articles don’t understand odds ratios and risk ratios.

The probability of a doctor referring a black man or a woman to a heart specialist was 84.7%, not 54.4%.

The article was confusing an odds ratio with a risk ratio.

Page 56: A Tale of Three Numbers

What’s Going On?

If 90.6% of white males were referred, then 9.4% were not referred, and so a white male's odds of being referred were 90.6/9.4 ≈ 9.6.

Since 84.7% of blacks and women were referred, 15.3% were not referred, and so for them, the odds of referral were 84.7/15.3 ≈ 5.5.

Page 57: A Tale of Three Numbers

The odds ratio was therefore 5.5/9.6 ≈ 60%. The odds of a referral if you were black or a woman were about 60% of the odds of referral if you were a white man.

But the risk ratio was much higher. If you were black or a woman, the probability that you would be referred was 93% of the probability that a white man would be referred.
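Both ratios can be recomputed from the two referral rates quoted above. This is only a sketch using the rounded percentages from the slides; the variable names are assumptions. From these rounded inputs the odds ratio comes out near 0.57, which the slide rounds to 60%.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

p_white_men = 0.906     # referral rate for white men, as quoted above
p_blacks_women = 0.847  # referral rate for blacks and women, as quoted above

odds_ratio = odds(p_blacks_women) / odds(p_white_men)
risk_ratio = p_blacks_women / p_white_men

print(round(odds_ratio, 2))  # 0.57
print(round(risk_ratio, 2))  # 0.93
```

The same two referral rates therefore yield either "about 60% as likely" or "93% as likely", depending on which ratio is being reported.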

Page 58: A Tale of Three Numbers

This Happens All the Time

This is from “Childhood Asthma Gene Identified by scientists” from the Science Editor of The Independent:

“Inheriting the gene raises the risk of developing asthma by between 60 and 70 per cent– enough for researchers to believe that the discovery may eventually open the way to new treatments for the condition.”

Page 59: A Tale of Three Numbers

Complete Misrepresentation

The quote talks about “raising the risk”.

Is that what the scientists found? No. The number reported wasn’t an increased risk, it was an odds ratio.

The gene only raised the risk of asthma by 19%.

Page 60: A Tale of Three Numbers

More “Science Editors”

This quote is from the Science Editor of the London Times in “Genetic breakthrough offers MS sufferers new hopes for treatment”:

“Research has identified two genetic variants that each raises a person's risk of developing MS by about 30 per cent, shedding new light on the origins of the autoimmune disease that could ultimately lead to better therapies”

Page 61: A Tale of Three Numbers

Complete Misrepresentation

The quote talks about genes “raising the risk” of Multiple Sclerosis.

Is that what the researchers found? No!

The first gene raised the risk of MS by only 3%, the second by only 4%!

Page 62: A Tale of Three Numbers

SAMPLE SIZE

Page 63: A Tale of Three Numbers

Sample

In statistics, the people who we are studying are called the sample.

Our question is then: what sample size is needed for a result that generalizes to the population?

Page 64: A Tale of Three Numbers

Non-Random Samples

The first thing we should realize is that it’s not going to do us any good to ask a non-random group of people.

Suppose everyone who goes to ILoveMitt.com is voting for Mitt. If I ask them, it will seem like 100% of the population will vote for Mitt, even if only 3% will really vote for him.

Page 65: A Tale of Three Numbers

Internet Polls

Internet polls are not trustworthy. They are biased toward people who have the internet, people who visit the site that the poll is on, and people who care enough to vote on a useless internet poll.

Page 66: A Tale of Three Numbers

Representative Samples

The opposite of a biased sample is a representative sample.

A perfectly representative sample is one where if n% of the population is X, then n% of the sample is X, for every X.

For example, if 10% of the population smokes, 10% of the sample smokes.

Page 67: A Tale of Three Numbers

Random Sampling

One way to get a representative sample is to randomly select people from the population, so that each has a fair and equal chance of ending up in the sample.
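A small simulation can illustrate the idea. The population below is entirely made up (10,000 people, 10% of them smokers), and the sample size of 500 and the seed are likewise assumptions chosen for the demo. `random.sample` draws without replacement, giving every person the same chance of being selected.

```python
import random

rng = random.Random(42)  # fixed seed only so the demo is repeatable

# Hypothetical population: True marks a smoker (10% of 10,000 people).
population = [True] * 1_000 + [False] * 9_000

sample = rng.sample(population, 500)
share_of_smokers = sum(sample) / len(sample)
print(f"smokers in sample: {share_of_smokers:.1%}")
```

The sample share will usually land close to 10%, though not exactly on it, which is why the next slides turn to confidence intervals.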

Page 69: A Tale of Three Numbers

Confidence Interval

Suppose I poll a sample of some population and find out that 50% of the sample will vote for candidate C. I might be:
• 90% certain that 48-52% of the population will vote for C
• 95% certain that 45-55% of the population will vote for C
• 99% certain that 40-60% of the population will vote for C

Page 71: A Tale of Three Numbers

Margin of Error

The margin of error is half the width of some confidence interval (usually the 95% one).

So if I’m 95% certain that between 45 and 55% of people will vote for C, then the margin of error is ±5%.
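How the margin of error depends on sample size can be sketched with the usual normal approximation for a polled proportion, z * sqrt(p(1 - p)/n). The slides do not give a sample size, so the 400 respondents below are a hypothetical figure, and the function name `margin_of_error` is an assumption as well.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion p_hat from n people."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95% confidence
    return z * sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 50% of 400 randomly sampled people say they will vote for C.
moe = margin_of_error(0.5, 400)
print(round(moe, 3))  # 0.049
```

Under these assumptions, roughly 400 randomly sampled people are enough for a margin of error of about ±5% at 95% confidence. Quadrupling the sample to 1,600 halves the margin, and asking for 99% confidence widens it.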

Page 73: A Tale of Three Numbers

Error Bars

Often, confidence intervals and margins of error are presented graphically as error bars.
