Causation ? Tim Wiemken, PhD MPH CIC Assistant Professor Division of Infectious Diseases University...

Causation?

Tim Wiemken, PhD MPH CIC
Assistant Professor

Division of Infectious Diseases
University of Louisville, Kentucky

Overview

1. Testing for an Association

2. Other Measures of Association

3. Confidence Intervals



Overview 1. Testing for an Association

Testing for Association

1. Develop hypothesis

Null hypothesis: There is no association.

Alternative hypothesis: There is an association.


2. Choose your level of significance (α value)

What P-value will you consider statistically significant? Usually 0.05, although there are arguments for larger or smaller cutoffs.


3. Choose your test

Call your statistician.

• A bad test gives bad results.
• A good test may give bad results (bad data?).
• A good statistician may tell you if the results are bad, but cannot always tell you if the data were bad.


Chi-squared Test

• Tells you whether there is an association between two categorical variables
• Compares observed versus expected counts in the study groups
• Must have an adequate sample size (a common rule of thumb: every expected cell count should be at least 5)
• Uses a 2x2 table for categorical data:

               Outcome +    Outcome -
  Predictor +
  Predictor -
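The "observed versus expected" idea can be made concrete: under the null hypothesis of no association, each cell's expected count is (row total × column total) / grand total. A minimal Python sketch, assuming the four cells are named a, b, c, d in the order of the template above (the names and the function are illustrative, not from the slides):

```python
def expected_counts(a, b, c, d):
    """Expected cell counts for a 2x2 table under the null hypothesis.

    Layout (same as the slide template):
                   Outcome +  Outcome -
      Predictor +      a          b
      Predictor -      c          d
    Each expected count = row total * column total / grand total.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    return [[r * col / n for col in col_totals] for r in row_totals]


# Counts from the HIV / in-hospital mortality example in this section
print(expected_counts(11, 19, 16, 54))  # [[8.1, 21.9], [18.9, 51.1]]
```

Here every expected count is at least 5, so the usual sample-size rule of thumb is met.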


Example

Research question: Does HIV impact mortality in hospitalized patients with community-acquired pneumonia (CAP)?

[Figure: Hospitalized CAP patients divided into HIV+ and HIV- groups, each followed to Dead or Alive]

Does HIV have an effect on patient in-hospital mortality?

Example

• Predictor variable: ?
• Outcome variable: ?
• Significance level?
• Null hypothesis?
• What test?


Example

+ HIV, - died:
- HIV, - died:
+ HIV, + died:
- HIV, + died:

               Outcome +    Outcome -
  Predictor +
  Predictor -

Example

How many patients died in-hospital? n=27

How many patients had HIV? n=30

Example

How many patients with HIV died?

=countifs(b2:b101, 1, z2:z101, 1)

Count the number of deaths (column B, in_hosp_mort=1) among patients who had HIV (column Z, hiv=1).
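Outside Excel, COUNTIFS is just "count the rows where every condition holds." A minimal Python sketch, assuming each patient is a dict whose hypothetical keys `in_hosp_mort` and `hiv` mirror columns B and Z, coded 1/0. The example rows are rebuilt from the cell counts reported in this section, not taken from the real dataset:

```python
def countifs(records, **conditions):
    """Count records whose fields equal every given value,
    like Excel's =COUNTIFS(range1, crit1, range2, crit2)."""
    return sum(
        all(rec[field] == value for field, value in conditions.items())
        for rec in records
    )


def two_by_two(records, outcome, predictor):
    """Build [[a, b], [c, d]]: predictor rows (1, 0) by outcome columns (1, 0)."""
    return [
        [countifs(records, **{predictor: p, outcome: o}) for o in (1, 0)]
        for p in (1, 0)
    ]


# Hypothetical patient rows that reproduce the slide's cell counts (11, 19, 16, 54)
patients = (
    [{"hiv": 1, "in_hosp_mort": 1}] * 11
    + [{"hiv": 1, "in_hosp_mort": 0}] * 19
    + [{"hiv": 0, "in_hosp_mort": 1}] * 16
    + [{"hiv": 0, "in_hosp_mort": 0}] * 54
)
print(countifs(patients, in_hosp_mort=1, hiv=1))    # 11
print(two_by_two(patients, "in_hosp_mort", "hiv"))  # [[11, 19], [16, 54]]
```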


Example

               Dead +            Dead -
  HIV+         11                30 - 11 = 19
  HIV-         27 - 11 = 16      100 - (11 + 16 + 19) = 54

(n=27 died, n=30 with HIV, n=100 total)

Check this! The HIV+/Dead - cell can also be counted directly:

=countifs(b2:b101, 0, z2:z101, 1)

Plug the data into your Excel stats program.
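Filling in the rest of the table from one counted cell and the margins is simple subtraction. A sketch (the function name `fill_2x2` is hypothetical) that also flags margins that cannot be right, which is the point of "Check this!":

```python
def fill_2x2(a, outcome_total, predictor_total, n):
    """Complete a 2x2 table from one known cell and the margins.

    a               = predictor + and outcome + (e.g. HIV+ and dead)
    outcome_total   = everyone with the outcome (e.g. all deaths)
    predictor_total = everyone with the predictor (e.g. all HIV+)
    n               = everyone in the study
    """
    b = predictor_total - a   # predictor +, outcome -
    c = outcome_total - a     # predictor -, outcome +
    d = n - (a + b + c)       # predictor -, outcome -
    if min(b, c, d) < 0:
        raise ValueError("margins are inconsistent with the known cell")
    return [[a, b], [c, d]]


print(fill_2x2(11, outcome_total=27, predictor_total=30, n=100))  # [[11, 19], [16, 54]]
```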

Example

Do patients with and without HIV differ in in-hospital mortality?

No! P=0.154 (P>0.05)
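The slides report P=0.154 without saying which chi-squared variant the stats program used. A Pearson chi-squared test without continuity correction reproduces that value. A minimal pure-Python sketch; with 1 degree of freedom the P-value is erfc(√(χ²/2)), so no statistics library is needed:

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


stat, p = chi_squared_2x2(11, 19, 16, 54)
print(f"chi-squared = {stat:.2f}, P = {p:.3f}")  # chi-squared = 2.03, P = 0.154
```

The same function gives P≈0.004 for the ICU table later in this section.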

Example

Where to publish?

Maybe those without HIV are older than those with HIV, so mortality ends up the same.

How do we check this?

Example

Is age different in patients with and without HIV?

Null hypothesis: The mean ages of patients with and without HIV are NOT different.

Alternative hypothesis: The mean ages of patients with and without HIV ARE different.

Example

Back to your dataset! For each group you need:

• HIV: total cases, mean age, SD of age
• Non-HIV: total cases, mean age, SD of age

Total cases

Total cases of HIV: =countif(Z2:Z101,1)

Total cases of non-HIV: =countif(Z2:Z101,0)

Example

Average age

HIV+: =averageif(Z2:Z101,1,AN2:AN101)

HIV-: =averageif(Z2:Z101,0,AN2:AN101)

Standard deviations… not as easy. You need an array formula with a nested IF:

HIV+: =stdev(if(Z2:Z101=1,AN2:AN101))

HIV-: =stdev(if(Z2:Z101=0,AN2:AN101))

DON'T HIT ENTER! Confirm it as an array formula:

ON WINDOWS: Control+Shift+Enter

ON MAC: Command+Enter
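The spreadsheet steps (COUNTIF, AVERAGEIF, and the STDEV(IF(...)) array formula) amount to filtering one group and summarizing it. A sketch using Python's standard `statistics` module; the field names `hiv` and `age` are hypothetical stand-ins for columns Z and AN, and the rows are toy values, not study data:

```python
from statistics import mean, stdev


def age_summary(records, group_value):
    """Count, mean and sample SD of age for one HIV group.

    Mirrors the spreadsheet: COUNTIF on the hiv column, AVERAGEIF on age,
    and STDEV(IF(...)). statistics.stdev uses the n - 1 denominator,
    like Excel's STDEV.
    """
    ages = [r["age"] for r in records if r["hiv"] == group_value]
    return len(ages), mean(ages), stdev(ages)


# Hypothetical toy rows for illustration only (not the study data)
rows = [
    {"hiv": 1, "age": 40}, {"hiv": 1, "age": 50}, {"hiv": 1, "age": 60},
    {"hiv": 0, "age": 55}, {"hiv": 0, "age": 65},
]
print(age_summary(rows, 1))
print(age_summary(rows, 0))
```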

Back to your stats program!

HIV: total cases = 30, mean age = 50.3, SD age = 13.62

Non-HIV: total cases = 70, mean age = 56.5, SD age = 15.96

Example

Is age different between patients with and without HIV?

NO! P>0.05

BUT IT IS SOOOOO CLOSE!
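The slides don't name the test behind this P-value. For comparing two means a two-sample t-test is the usual choice; the sketch below assumes that and works straight from the reported summaries (n, mean, SD), computing both Welch's version and the pooled (equal-variance) version. The t-distribution tail is integrated numerically with Simpson's rule so only the standard library is needed:

```python
import math


def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule
    (steps must be even), then takes both tails beyond |t|.
    """
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    x_max = abs(t)
    h = x_max / steps
    total = pdf(0) + pdf(x_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # P(0 <= T <= |t|)
    return 1 - 2 * central


def t_test_from_summary(m1, s1, n1, m2, s2, n2, equal_var=False):
    """Two-sample t-test from means, SDs and group sizes.

    equal_var=False: Welch's test; equal_var=True: pooled Student's test.
    Returns (t, df, two-sided P).
    """
    if equal_var:
        df = n1 + n2 - 2
        pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
        se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    else:
        v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
        se = math.sqrt(v1 + v2)
        df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)


# Summary statistics reported on the slide (HIV vs non-HIV age)
for equal_var in (False, True):
    t, df, p = t_test_from_summary(50.3, 13.62, 30, 56.5, 15.96, 70, equal_var)
    print(f"equal_var={equal_var}: t = {t:.2f}, df = {df:.1f}, P = {p:.3f}")
```

With these inputs both versions give P>0.05, and Welch's only barely, which fits "P>0.05, but so close."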




Overview 2. Other Measures of Association

Measures of Association

1. Risk Ratio

Answers: How much more (less) likely is this group to get an outcome versus this other group?

• Used for cohort studies or clinical trials
• Gold standard measure for observational studies

Example

Do those admitted to the ICU die more than those not admitted to the ICU?

Use the 2x2 Totals tab.

Total with outcome: =countif(B2:B101,1), n=27

Total without outcome: 100 - 27, n=73

Total with outcome in the ICU: =countifs(B2:B101,1,I2:I101,1), n=9

Total without outcome in the ICU: =countifs(B2:B101,0,I2:I101,1), n=7

Example

Do those admitted to the ICU die more than those not in the ICU?

               Dead +          Dead -
  ICU+         9               7
  ICU-         27 - 9 = 18     73 - 7 = 66

P=0.004

Example

How much more likely are those admitted to the ICU to die?

               Dead +    Dead -
  ICU+         9         7
  ICU-         18        66

Risk of death in ICU group: 9 / (9 + 7) = 56.3%

Risk of death in non-ICU group: 18 / (18 + 66) = 21.4%

Risk ratio: 0.563 / 0.214 = 2.63
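The risk ratio arithmetic as a small Python function (the name is illustrative), using the same a, b, c, d layout as the 2x2 template:

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2x2 table laid out as
                    Outcome +  Outcome -
      Predictor +      a          b
      Predictor -      c          d
    Returns (risk in predictor +, risk in predictor -, risk ratio).
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed


r1, r0, rr = risk_ratio(9, 7, 18, 66)  # ICU example
print(f"{r1:.4f} {r0:.4f} {rr:.4f}")   # 0.5625 0.2143 2.6250
```

Unrounded, the ratio is 2.625, which is the slide's 2.63 to two decimals.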

Interpret the Risk Ratio

Example

Who wants to interpret a risk ratio of 2.63?

Patients admitted to the ICU are 2.63 times more likely to die than patients not admitted to the ICU.

Example

[Figure: CAP patients divided into Empiric Atypical Pathogen Coverage and No Empiric Atypical Pathogen Coverage groups]

Does empiric atypical pathogen coverage have an effect on patient mortality?

Example

Assuming a cohort study…

Do those patients who have empiric atypical pathogen coverage die less often than those without atypical coverage?

+ Atypical: 2220
- Atypical: 658
+ Atypical + died: 217
- Atypical + died: 110

                 Dead +    Dead -
  Atypical +     217       2220 - 217 = 2003
  Atypical -     110       658 - 110 = 548

Example

Anyone??

Risk ratio: (217 / 2220) / (110 / 658) = 0.0977 / 0.1672 = 0.58

Those with atypical coverage are 42% less likely to die than those without atypical coverage (1 - 0.58 = 0.42).

Example

What does that mean? Remember your baseline risk.

Assuming 8% of CAP patients die, what is the risk of death with empiric atypical pathogen coverage?

8% × 0.58 = 4.64%

Just multiply the baseline risk by the risk ratio!
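The atypical-coverage risk ratio, the "42% less likely" statement, and the baseline-risk step can all be checked in a few lines. A sketch; the function name is illustrative and the 8% baseline is the slide's assumption:

```python
def risk_ratio(a, b, c, d):
    """Risk in predictor + divided by risk in predictor - (2x2 template cells)."""
    return (a / (a + b)) / (c / (c + d))


rr = risk_ratio(217, 2003, 110, 548)     # atypical coverage vs none
relative_reduction = 1 - rr              # "42% less likely to die"
baseline_risk = 0.08                     # the slide's assumed 8% mortality
risk_with_coverage = baseline_risk * rr  # baseline risk times the risk ratio

print(f"RR = {rr:.2f}, relative reduction = {relative_reduction:.0%}")  # RR = 0.58, relative reduction = 42%
print(f"risk with coverage = {risk_with_coverage:.1%}")                 # risk with coverage = 4.7%
```

Using the unrounded RR gives about 4.7% rather than the 4.64% obtained from the rounded 0.58; the conclusion is the same.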

Even Better: Number Needed to Treat (NNT)

Example

NNT = 1 / Absolute Risk Reduction (ARR)

ARR = Unexposed risk - Exposed risk

ARR = Risk without atypical coverage - Risk with atypical coverage

Unexposed risk = 110 / 658 = 16.7%

Exposed risk = 217 / 2220 = 9.8%

NNT = 1 / (0.167 - 0.098) = 1 / 0.069 = 14.5, so 15 (round up!)

Need to treat 15 patients to save 1.
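A minimal sketch of the NNT calculation. The risks must be proportions (0.167), not percentages (16.7), before taking 1/ARR; the function name is illustrative:

```python
import math


def number_needed_to_treat(risk_unexposed, risk_exposed):
    """NNT = 1 / ARR, rounded up to the next whole patient."""
    arr = risk_unexposed - risk_exposed  # absolute risk reduction
    if arr <= 0:
        raise ValueError("no absolute risk reduction, so NNT is undefined here")
    return math.ceil(1 / arr)


risk_without = 110 / 658  # 16.7%, no atypical coverage (unexposed)
risk_with = 217 / 2220    # 9.8%, atypical coverage (exposed)
print(number_needed_to_treat(risk_without, risk_with))  # 15
```

With the unrounded risks 1/ARR is about 14.4, and with the rounded ones about 14.5; rounding up gives 15 either way.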

2. Odds Ratio

Answers: How much more (less) likely are those with the outcome to have been in this group versus this other group?

• Used for case-control studies
• Is an approximation of the risk ratio
• Only a good approximation when the outcome is rare
• Can be an extremely bad approximation otherwise
• Can correct with a formula:

Zhang, J., & Yu, K. F. (1998). What's the relative risk? A method of correcting the odds ratio in cohort studies of common outcomes. JAMA, 280(19), 1690-1691.
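The Zhang and Yu correction converts an odds ratio to an estimated risk ratio: RR = OR / ((1 - P0) + P0 × OR), where P0 is the incidence of the outcome in the unexposed group. Because P0 must be known, it applies to cohort data. A sketch using the ICU example, where mortality (27%) is far from rare; function names are illustrative:

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio (a * d) / (b * c) for the 2x2 template."""
    return (a * d) / (b * c)


def zhang_yu_rr(odds_ratio_value, p0):
    """Zhang & Yu (1998): RR = OR / ((1 - P0) + P0 * OR),
    where P0 is the incidence of the outcome in the unexposed group."""
    return odds_ratio_value / ((1 - p0) + p0 * odds_ratio_value)


# ICU example (27% mortality, so the outcome is common)
or_icu = odds_ratio(9, 7, 18, 66)
p0 = 18 / (18 + 66)  # risk of death outside the ICU
print(f"OR = {or_icu:.2f}, corrected RR = {zhang_yu_rr(or_icu, p0):.3f}")  # OR = 4.71, corrected RR = 2.625
```

The raw odds ratio (4.71) badly overstates the risk ratio; with P0 taken from the same cohort, the formula recovers the directly computed risk ratio (2.63) exactly.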


Example: Acinetobacter outbreak

You gather information from 100 patients with Acinetobacter and 200 patients without. You need to identify the risk factors.

Key: Select the sample based on the outcome (Acinetobacter).

Because the sample was selected based on the outcome (a subset of everyone who might get the outcome in your population), you can never know the actual incidence of the outcome in everyone who was exposed.

Example

Cohort study sample: [Figure: everyone exposed and everyone not exposed, each followed for the outcome]

Case-control study sample: [Figure: a subset with the outcome and a subset without the outcome, each checked for exposure status]

In a case-control sample you cannot know everyone exposed who gets the outcome.

Example

Analyze a number of risk factors to see whether they are associated with Acinetobacter infection.

Outbreak investigation: Was having a traumatic wound associated with Acinetobacter baumannii infection?

Assuming a case-control study…

+ Acinetobacter: 100
- Acinetobacter: 200
+ Acinetobacter + wound: 55
- Acinetobacter + wound: 10

               Acinetobacter +    Acinetobacter -
  Wound +      55                 10
  Wound -      100 - 55 = 45      200 - 10 = 190

Example

Anyone??

Odds ratio: (55 × 190) / (10 × 45) = 10450 / 450 = 23.2

Interpret the Odds Ratio

Those with Acinetobacter have 23 times higher odds of having a traumatic wound compared to those without Acinetobacter.
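The odds ratio for the wound example, as a sketch (function name illustrative). In a case-control design it is read down the columns, the odds of exposure among cases divided by the odds among controls, which is algebraically the same as the cross-product:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 template (rows = predictor, columns = outcome).

    Case-control reading: odds of exposure among cases (a / c) divided by
    odds of exposure among controls (b / d), which equals (a * d) / (b * c).
    """
    return (a * d) / (b * c)


# Traumatic wound (rows) by Acinetobacter case/control status (columns)
a, b, c, d = 55, 10, 45, 190
or_wound = odds_ratio(a, b, c, d)
print(f"OR = {or_wound:.1f}")                    # OR = 23.2
print(abs(or_wound - (a / c) / (b / d)) < 1e-9)  # True: same number either way
```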


Example

What?

Order of interpretation:

               Outcome +    Outcome -
  Predictor +
  Predictor -

Risk Versus Odds

So what's the difference? How you choose your population.

Risk: Know the incidence of the outcome.

Odds: Don't know the incidence of the outcome.

You can't identify the likelihood of someone with a predictor getting an outcome because you don't know everyone who had the outcome.

Correct the odds: with common outcomes, the odds ratio is a poor approximation of the risk ratio.

Even Chuck Norris hates odds.

3. Hazard Ratio

Answers: How much more (less) likely are those in this group to get the outcome versus this other group at any given time?

• Used for time-to-event data
• As good as the risk ratio





Overview 3. Confidence Intervals

[Figure: Patients in the Universe, sampled into Patients in the Sample, then generalized back to the Universe]

Confidence Intervals

P-values have limitations:
• Use an arbitrary cutoff (0.05)
• Don't give information on precision
• Don't help you generalize

Fix: Use a confidence interval.

Definition – 95% CI

You are 95% confident that the true risk ratio (odds ratio) for the patients in the universe lies within the interval.




“Universe” is not everyone in the world – it is everyone you can generalize back to.






If the CI includes 1, that measure of association is not statistically significant (like a P-value >0.05).

'Tighter' CI = more power, more precision, larger sample

Caveat

Since the CI gets tighter with more people in the sample, every measure of association (except exactly 1) will eventually be statistically significant with a large enough sample size.

Is the risk ratio from this bacteremia example statistically significant?

                  Dead +    Dead -
  Bacteremia +    25        100
  Bacteremia -    310       1537

Risk ratio: 1.19
95% CI: (0.83, 1.72)

No: the 95% confidence interval includes 1.
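The slides don't show how these intervals were computed. The common log-scale (Katz) method for a risk ratio reproduces both reported intervals; a minimal sketch (function name illustrative):

```python
import math


def risk_ratio_ci(a, b, c, d, z=1.96):
    """Risk ratio with a 95% CI by the log (Katz) method.

    SE of ln(RR) = sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)); the interval is
    exp(ln(RR) +/- z * SE). Cells follow the 2x2 template
    (a, b = predictor + row; c, d = predictor - row; outcome + first).
    """
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


for table in [(25, 100, 310, 1537), (200, 800, 2500, 12400)]:
    rr, lo, hi = risk_ratio_ci(*table)
    significant = not (lo <= 1 <= hi)
    print(f"RR = {rr:.2f}, 95% CI ({lo:.2f}, {hi:.2f}), significant: {significant}")
# RR = 1.19, 95% CI (0.83, 1.72), significant: False
# RR = 1.19, 95% CI (1.05, 1.36), significant: True
```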


What happens as we increase the sample size, using roughly the same proportions of predictors and outcomes?

                  Dead +    Dead -
  Bacteremia +    200       800
  Bacteremia -    2500      12400

Now is the RR from the bacteremia example statistically significant?

Risk ratio: 1.19 (same as before)
95% confidence interval: (1.05, 1.36)

Yes: the 95% CI does not include 1.

Sample Size

The confidence interval becomes tighter.

Assuming the proportion of patients in each group stays the same, the risk ratio eventually becomes statistically significant.

This is because the power you have to detect that effect size has increased.

The larger your sample, the closer you are to actually sampling the entire universe. Therefore, your confidence interval is tighter and closer to "the truth in your universe."

This makes sense: the more people in your study, the closer you are to having the universe as your sample, so your statistic should be pretty close to the 'truth in the universe'.
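To see why every non-null effect eventually becomes "significant," scale all four cells of the bacteremia table by the same factor k. The proportions, and so the risk ratio, stay fixed while the standard error of ln(RR) shrinks by exactly 1/√k. A sketch using the log-method interval; the function name and scaling factors are arbitrary illustrations:

```python
import math


def rr_ci(a, b, c, d, z=1.96):
    """Return (lower, upper) of the log-method 95% CI for the risk ratio."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# Multiply every cell by k: the RR is unchanged, the interval narrows.
base = (25, 100, 310, 1537)  # bacteremia table from the slides
for k in (1, 2, 4, 10):
    lower, upper = rr_ci(*(k * cell for cell in base))
    print(f"k = {k:>2}: 95% CI ({lower:.2f}, {upper:.2f}), includes 1: {lower <= 1 <= upper}")
```

For this table the interval still includes 1 at k = 4 and excludes it by k = 10.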

[Figure: Patients in the Universe and Patients in the Sample. Sampling (easy), Generalizing (hard).]

[Figure: Patients in the Universe and Patients in the Sample. Sampling (hard), Generalizing (easy).]

