Winter Electives Molecular and Genetic Epidemiology Decision and Cost-effectiveness Analysis Grantwriting (Workshop – not for credit hours) Medical Informatics
  • date post

    20-Dec-2015
  • Category

    Documents

Page 1

Winter Electives

• Molecular and Genetic Epidemiology
• Decision and Cost-effectiveness Analysis
• Grantwriting (Workshop – not for credit hours)
• Medical Informatics

Page 2

• Today:
  – Lecture: Confounding & Interaction III
  – Section: 3:30 to 5 (S-18, S-22, S-22)
• Next Tuesday (12/6/05) – All at China Basin
  – 8:15 to 9:45: Journal Club
  – 10:00 to 1:00 pm: Mitch Katz
    • Note chapters in his text book
    • box lunches provided
  – 1:15 to 2:45: Last Small Group Section
    • Web-based course evaluation
    • Bring laptop
  – Distribute Final Exam
    • Exam due 12/13 (in hands of Olivia by 4 pm) by email or China Basin 5700

Page 3

Confounding and Interaction: Part III

• When Evaluating Association Between an Exposure and an Outcome, the Possible Roles of a 3rd Variable are:
  – Intermediary Variable
  – Effect Modifier
  – Confounder
  – No Effect
• Forming “Adjusted” Summary Estimates to Evaluate Presence of Confounding
  – Concept of weighted average
    • Woolf’s Method
    • Mantel-Haenszel Method
  – Clinical/biological decision rather than statistical
  – Handling more than one potential confounder
• Limitations of Stratification to Adjust for Confounding
  – the motivation for multivariable regression

Page 4

When Assessing the Association Between an Exposure and a Disease, What are the Possible Effects of a Third Variable?

[Diagram: the third variable shown in relation to the exposure and the disease (D) in each of four roles]

• Intermediary Variable (I): on causal pathway
• Effect Modifier (EM), i.e. interaction: modifies the effect of the exposure
• Confounding (C): another pathway to get to the disease
• No Effect

Assumption: The third variable a priori is felt to be relevant

Page 5

What are the Possible Effects of a 3rd Variable?

• Intermediary Variable
• Effect Modifier (interaction)
• Confounder
• No Effect

[Flowchart]

• Intermediary Variable? (conceptual decision)
  – yes: Report Crude Estimate
  – no: go on to the effect modifier question
• Effect Modifier? (numerically assess both magnitude and statistical differences)
  – yes: Report stratum-specific estimates
  – no: go on to the confounder question
• Confounder? (numerically assess difference between adjusted and crude; not a statistical decision)
  – yes: Report “adjusted” summary estimate
  – no: Report Crude Estimate (3rd variable has no effect)

Page 6

Effect of a Third Variable: Statistical Interaction

Crude
                Delayed   Not Delayed
Smoking            26         133
No Smoking         64         601
RR crude = 1.7

Stratified

No Caffeine Use
                Delayed   Not Delayed
Smoking            15          61
No Smoking         47         528
RR no caffeine use = 2.4

Heavy Caffeine Use
                Delayed   Not Delayed
Smoking            11          72
No Smoking         17          73
RR caffeine use = 0.7

. cs delayed smoking, by(caffeine)

        caffeine |       RR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
     no caffeine |   2.414614     1.42165     4.10112     5.486943
  heavy caffeine |     .70163    .3493615    1.409099     8.156069
-----------------+-------------------------------------------------
           Crude |   1.699096    1.114485    2.590369
    M-H combined |   1.390557    .9246598    2.091201
-----------------+-------------------------------------------------
Test of homogeneity (M-H)   chi2(1) =    7.866  Pr>chi2 = 0.0050

Declare interaction; confounding is not relevant
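As a rough check on the figures above, the crude and stratum-specific risk ratios can be recomputed from the three 2 x 2 tables. This is only a sketch in Python rather than the Stata used in the lecture; the function name `risk_ratio` and the (a, b, c, d) cell ordering are choices made here for illustration.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2 x 2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Cell counts copied from the delayed-conception tables on this slide
# (rows: smoking / no smoking; columns: delayed / not delayed).
tables = {
    "crude": (26, 133, 64, 601),
    "no caffeine": (15, 61, 47, 528),
    "heavy caffeine": (11, 72, 17, 73),
}

rr = {name: risk_ratio(*cells) for name, cells in tables.items()}
for name, value in rr.items():
    print(f"{name}: RR = {value:.2f}")
```

The stratum-specific values fall on opposite sides of 1, which is the pattern the slide labels as interaction.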

Page 7

Statistical Tests of Interaction: Test of Homogeneity (heterogeneity)

• Null hypothesis: The individual stratum-specific estimates of the measure of association differ only by random variation
  – i.e., the strength of association is homogeneous across all strata
  – i.e., there is no interaction
• The test statistic will have a chi-square distribution with degrees of freedom of one less than the number of strata
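To make the degrees-of-freedom rule concrete, the sketch below converts a homogeneity chi-square statistic into a p-value for the two-stratum case (1 degree of freedom). It assumes the statistic 7.866 reported by Stata for the caffeine example, and it relies on the fact that a chi-square variable with 1 df is a squared standard normal; the helper name `chi2_pvalue_1df` is hypothetical and the function does not handle more than two strata.

```python
import math


def chi2_pvalue_1df(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


n_strata = 2                       # no caffeine, heavy caffeine
degrees_of_freedom = n_strata - 1  # one less than the number of strata
p_homogeneity = chi2_pvalue_1df(7.866)

print(f"df = {degrees_of_freedom}, p = {p_homogeneity:.4f}")
```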

Page 8

Report vs Ignore Interaction? Some Guidelines

Relative Risks for a Given Exposure and Disease

Potential Effect Modifier    P value for      Report or Ignore
Present    Absent            heterogeneity    Interaction
2.3        2.6               0.45             Ignore
2.3        2.6               0.001            Ignore
2.0        20.0              0.001            Report
2.0        20.0              0.20             Report
2.0        20.0              0.40             Ignore
3.0        4.5               0.30             Ignore
3.0        4.5               0.001            +/-
0.5        3.0               0.001            Report
0.5        3.0               0.20             Report
0.5        3.0               0.30             +/-

Is an art form: requires consideration of both clinical and statistical significance

Page 9

If Interaction is not Present, What Next?

• Case-control study of post-exposure AZT use in preventing HIV seroconversion after needlestick (NEJM 1997)

Crude
            HIV    No HIV
AZT           8      131
No AZT       19      189
             27      320     347

ORcrude = 0.61 (95% CI: 0.26 - 1.4)

The goal is to combine or average the results from the different strata into one summary estimate. Any thoughts on how to do this?

After we have formed our strata and gotten rid of confounding, how do we summarize what the unconfounded estimates from the two or more strata are telling us? In the examples of last week, the measures of association from the different strata were identical. This is seldom the case.

A more realistic example is described in the Rothman chapter regarding the question of whether spermicide use might cause Down’s

Page 10

Post-exposure prophylaxis with AZT after a needlestick

[Diagram: AZT Use, HIV, and Severity of Exposure]

Page 11

Evaluating for Interaction

• Potential confounder: severity of exposure

Crude
            HIV    No HIV
AZT           8      131
No AZT       19      189
             27      320     347
ORcrude = 0.61

Stratified

Minor Severity
            HIV    No HIV
AZT           0       91
No AZT        3      161
              3      252     255
OR = 0.0

Major Severity
            HIV    No HIV
AZT           8       40
No AZT       16       28
             24       68      92
OR = 0.35
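The crude and stratum-specific odds ratios quoted for the needlestick data can be reproduced with the cross-product ratio. The snippet is an illustrative Python sketch, not the lecture's Stata session; `odds_ratio` and the variable names are assumptions introduced here.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio from a 2 x 2 table.

    a, b = exposed cases / exposed controls
    c, d = unexposed cases / unexposed controls
    """
    return (a * d) / (b * c)


# Rows: AZT / No AZT; columns: HIV / No HIV.
or_crude = odds_ratio(8, 131, 19, 189)
or_minor = odds_ratio(0, 91, 3, 161)   # minor severity stratum
or_major = odds_ratio(8, 40, 16, 28)   # major severity stratum

print(f"crude {or_crude:.2f}, minor {or_minor:.2f}, major {or_major:.2f}")
```

Both stratum-specific estimates sit below the crude value, which is what prompts the question of confounding by severity.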

Page 12

. cc HIV AZTuse,by(severity)

        severity |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           minor |          0           0    2.302373     1.070588
           major |        .35    .1344565    .9144599     6.956522
-----------------+-------------------------------------------------
Test of homogeneity (B-D)   chi2(1) =     0.60  Pr>chi2 = 0.4400

To stratify the subjects into those women with maternal age less than 35 and those with maternal age >= 35, you add a “by(matage)” option. If you add a “, pool” option as I have here, the program will give you not only the default MH summary but also the Woolf estimate.

Finally, you are already familiar with this command, but for the sake of comparison let’s look at the summary estimate as obtained by logistic regression, which as you know uses the MLE approach. As you can see, the MH estimate is essentially identical to the MLE in this problem.

Is there interaction?

Is there confounding?

Page 13

Assuming Interaction is not Present, Form a Summary of the Unconfounded Stratum-Specific Estimates

• Construct a weighted average
  – Assign weights to the individual strata
  – Summary Estimate = Weighted Average of the stratum-specific estimates

$$\text{Summary estimate} = \frac{\sum_i w_i \,[\text{effect estimate in stratum } i]}{\sum_i w_i}$$

  – a simple mean is a weighted average where the weights are equal to 1

$$\text{simple mean} = \frac{1(2) + 1(4) + 1(6) + 1(8)}{(4)(1)} = 5$$

  – which weights to use depends on type of effect estimate desired (OR, RR, RD) and characteristics of the data
  – e.g.,
    • Woolf’s method
    • Mantel-Haenszel method

Right. We need to assign a weight to each stratum and then perform a weighted average.

How do we decide on a weight?

Hopefully the concept of a weighted average is understood by everyone. A simple mean is in fact a weighted average where the weights equal one. To get the average height of everyone in class, we add up everyone’s height and divide by the number of persons contributing. The weight is one.

The second approach to getting a summary estimate is actually the one used by multivariable modeling approaches and we will touch on this briefly today. It is called the maximum likelihood approach.
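The weighted-average idea can be sketched in a few lines. The Python below is illustrative only; `weighted_average` is a name chosen here, and the unequal weights in the second call are made-up numbers used purely to show how weighting pulls the summary toward the heavier stratum.

```python
def weighted_average(estimates, weights):
    """Sum of weight * estimate, divided by the sum of the weights."""
    if len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# The slide's simple mean: every weight equals 1.
simple_mean = weighted_average([2, 4, 6, 8], [1, 1, 1, 1])

# Hypothetical unequal weights: the first estimate counts three times as much.
pulled_toward_first = weighted_average([2, 8], [3, 1])

print(simple_mean, pulled_toward_first)
```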

Page 14

Forming a Summary Estimate for Stratified Data

• Goal:
  – Create a summary “adjusted” estimate for the relationship in question while adjusting for the potential confounder

Crude
            HIV    No HIV
AZT           8      131
No AZT       19      189
             27      320     347
ORcrude = 0.61

Stratified

Minor Severity
            HIV    No HIV
AZT           0       91
No AZT        3      161
              3      252     255
OR = 0.0

Major Severity
            HIV    No HIV
AZT           8       40
No AZT       16       28
             24       68      92
OR = 0.35

How would you weight these strata? According to sample size? No. of cases?

Page 15

Summary Estimators: Woolf’s Method

• aka Directly pooled or precision estimator
• Woolf’s estimate for adjusted odds ratio

$$\log \mathrm{OR}_{Woolf} = \frac{\sum_i w_i \,[\log(\mathrm{OR}_i)]}{\sum_i w_i}, \qquad \mathrm{OR}_{Woolf} = e^{\log \mathrm{OR}_{Woolf}}$$

  – where $w_i = \dfrac{1}{\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}}$
  – wi is the inverse of the variance of the stratum-specific log(odds ratio)

            Disease   No Disease
Exposed       ai         bi
Unexposed     ci         di

One of the first approaches developed for forming summary adjusted estimates was Woolf’s method:

This is the inverse of the variance of the log odds ratio. This makes sense: the more precise strata have the smallest variances, and the inverse of a small number is a large number.
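A minimal Python sketch of the Woolf weight and pooled estimate follows, assuming the (a, b, c, d) cell layout of the table above and the counts from the needlestick example. The function names are hypothetical. It also shows what happens when a stratum contains a zero cell: the weight needs 1/a, so the calculation fails before log(0) is even reached.

```python
import math


def woolf_weight(a, b, c, d):
    """Inverse of the variance of the stratum-specific log(odds ratio)."""
    return 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)


def woolf_odds_ratio(strata):
    """Woolf (directly pooled) summary odds ratio over (a, b, c, d) tables."""
    weights = [woolf_weight(*cells) for cells in strata]
    log_ors = [math.log((a * d) / (b * c)) for a, b, c, d in strata]
    pooled_log_or = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    return math.exp(pooled_log_or)


minor = (0, 91, 3, 161)   # minor severity: zero AZT-exposed cases
major = (8, 40, 16, 28)   # major severity

w_major = woolf_weight(*major)

try:
    woolf_odds_ratio([minor, major])
    pooled_defined = True
except ZeroDivisionError:
    # 1/0 in the minor-severity weight: the estimator cannot be computed.
    pooled_defined = False

print(f"major-stratum weight = {w_major:.3f}, pooled defined: {pooled_defined}")
```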

Page 16

Calculating a Summary Effect Using the Woolf Estimator

• e.g. AZT use, severity of needlestick, and HIV

Crude
            HIV    No HIV
AZT           8      131
No AZT       19      189
             27      320
ORcrude = 0.61

Stratified

Minor Severity
            HIV    No HIV
AZT           0       91
No AZT        3      161
              3      252     255
OR = 0.0

Major Severity
            HIV    No HIV
AZT           8       40
No AZT       16       28
             24       68      92
OR = 0.35

$$\log \mathrm{OR}_{Woolf} = \frac{\left[\dfrac{1}{\frac{1}{0} + \frac{1}{91} + \frac{1}{3} + \frac{1}{161}}\right]\log(0) + \left[\dfrac{1}{\frac{1}{8} + \frac{1}{40} + \frac{1}{16} + \frac{1}{28}}\right]\log(0.35)}{\dfrac{1}{\frac{1}{0} + \frac{1}{91} + \frac{1}{3} + \frac{1}{161}} + \dfrac{1}{\frac{1}{8} + \frac{1}{40} + \frac{1}{16} + \frac{1}{28}}}$$

Page 17

Summary Estimators: Woolf’s Method

• Conceptually straightforward
• Best when:
  – number of strata is small
  – sample size within each stratum is large
• Cannot be calculated when any cell in any stratum is zero because log(0) is undefined
  – 1/2 cell corrections have been suggested but are subject to bias
• Formulae for Woolf’s summary estimates for other measures (e.g., RR, RD) available in texts and software documentation
  – sensitive to small strata, cells with “0”
  – computationally messy

It seems most reasonable to weight each stratum according to how sure you are of the inference, and the variance of the estimate is the best measure we have for this.

I discuss this approach first not only because it was one of the first proposed but also because it is the most conceptually straightforward.

In the days before computers, this was considered computationally messy, such that other, easier methods were sought.

Page 18

Summary Estimators: Mantel-Haenszel

• Mantel-Haenszel estimate for odds ratios
  – $\mathrm{OR}_{MH} = \dfrac{\sum_i \frac{a_i d_i}{N_i}}{\sum_i \frac{b_i c_i}{N_i}}$
  – $w_i = \dfrac{b_i c_i}{N_i}$
  – wi is inverse of the variance of the stratum-specific odds ratio under the null hypothesis (OR =1)

$$\mathrm{OR}_{MH} = \frac{\sum_i \frac{b_i c_i}{N_i} \times \frac{a_i d_i}{b_i c_i}}{\sum_i \frac{b_i c_i}{N_i}}$$

            Disease   No Disease
Exposed       ai         bi
Unexposed     ci         di

ai + bi + ci + di = Ni

A more robust approach is the Mantel-Haenszel method.

Again, using the same cell definitions, the M-H estimate for the summary OR is the sum of a times d divided by T (the stratum total, Ni on the slide) divided by the sum of . . .

If we decompose this slightly, we can see that the weight for each stratum is actually b times c divided by T. This is actually the inverse of the . . .

And by the same logic as before, strata with the smallest variance get the most weight.
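The Mantel-Haenszel formula translates almost directly into code. The sketch below is Python for illustration (the lecture itself uses Stata), applied to the needlestick strata; the function names are assumptions. Note that a zero in cell a only contributes a zero term to the numerator, so nothing blows up.

```python
def mh_weights(strata):
    """Mantel-Haenszel weights w_i = b_i * c_i / N_i for each stratum."""
    return [b * c / (a + b + c + d) for a, b, c, d in strata]


def mantel_haenszel_or(strata):
    """Summary odds ratio: sum(a_i d_i / N_i) / sum(b_i c_i / N_i)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(mh_weights(strata))
    return numerator / denominator


# (a, b, c, d) for minor and major severity; rows AZT / No AZT.
azt_strata = [(0, 91, 3, 161), (8, 40, 16, 28)]

weights = mh_weights(azt_strata)
or_mh = mantel_haenszel_or(azt_strata)

print([round(w, 6) for w in weights], round(or_mh, 5))
```

The two weights correspond to the "M-H Weight" column that Stata prints for the stratified `cc` output.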

Page 19

Summary Estimators: Mantel-Haenszel

• Mantel-Haenszel estimate for odds ratios
  – relatively resistant to the effects of large numbers of strata with few observations
  – resistant to cells with a value of “0”
  – computationally easy
  – most commonly used

The MH is the most commonly used estimator.

It is fairly resistant (i.e., it doesn’t blow up) . . .

Although really not a factor in the computer era, the computation of the MH estimator is a breeze.

More important is that the M-H closely approximates the MLE estimate, which is generally regarded as the most accurate.

Page 20

Calculating a Summary Effect Using the Mantel-Haenszel Estimator

• e.g. AZT use, severity of needlestick, and HIV

Crude
            HIV    No HIV
AZT           8      131
No AZT       19      189
             27      320
ORcrude = 0.61

Stratified

Minor Severity
            HIV    No HIV
AZT           0       91
No AZT        3      161
              3      252     255
OR = 0.0

Major Severity
            HIV    No HIV
AZT           8       40
No AZT       16       28
             24       68      92
OR = 0.35

$$\mathrm{OR}_{MH} = \frac{\sum_i \frac{a_i d_i}{N_i}}{\sum_i \frac{b_i c_i}{N_i}} = \frac{\sum_i \frac{b_i c_i}{N_i} \times \frac{a_i d_i}{b_i c_i}}{\sum_i \frac{b_i c_i}{N_i}}$$

$$\mathrm{OR}_{MH} = \frac{\frac{0 \times 161}{255} + \frac{8 \times 28}{92}}{\frac{91 \times 3}{255} + \frac{40 \times 16}{92}} = 0.30$$

Page 21

Calculating a Summary Effect in Stata

• epitab command - Tables for epidemiologists
  – see “Survival Analysis and Epidemiological Tables Reference Manual”
• To produce crude estimates and 2 x 2 tables:
  – For cross-sectional or cohort studies:
    • cs variable_case variable_exposed
  – For case-control studies:
    • cc variable_case variable_exposed
• To stratify by a third variable:
  – cs var_case var_exposed, by(var_third_variable)
  – cc var_case var_exposed, by(var_third_variable)
• Default summary estimator is Mantel-Haenszel
  – , pool will also produce Woolf’s method

How can we make our lives a lot easier and implement all of this on the computer?

The epitab command - Tables for Epidemiologists is quite a handy little command. Has anyone used it?

Page 22

Calculating a Summary Effect Using the Mantel-Haenszel Estimator

• e.g. AZT use, severity of needlestick, and HIV

. cc HIV AZTuse,by(severity) pool

        severity |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           minor |          0           0    2.302373     1.070588
           major |        .35    .1344565    .9144599     6.956522
-----------------+-------------------------------------------------
           Crude |   .6074729    .2638181    1.401432
 Pooled (direct) |          .           .           .
    M-H combined |     .30332    .1158571    .7941072
-----------------+-------------------------------------------------
Test of homogeneity (B-D)   chi2(1) =     0.60  Pr>chi2 = 0.4400

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =      6.06
                                                Pr>chi2 =    0.0138

Crude
            HIV    No HIV
AZT           8      131
No AZT       19      189
             27      320
ORcrude = 0.61

Stratified

Minor Severity
            HIV    No HIV
AZT           0       91
No AZT        3      161
              3      252     255
OR = 0.0

Major Severity
            HIV    No HIV
AZT           8       40
No AZT       16       28
             24       68      92
OR = 0.35

Page 23

Calculating a Summary Effect Using the Mantel-Haenszel Estimator

• In addition to the odds ratio, Mantel-Haenszel estimators are also available in Stata for:
  – risk ratio
    • “cs var_case var_exposed, by(var_third_variable)”
  – rate ratio
    • “ir var_case var_exposed var_time, by(var_third_variable)”

Page 24

Assessment of Confounding: Interpretation of Summary Estimate

• Compare “adjusted” estimate to crude estimate
  – e.g. compare ORMH (= 0.30 in the example) to ORcrude (= 0.61 in the example)
• If “adjusted” measure “differs meaningfully” from crude estimate, then confounding is present
  – e.g., does ORMH = 0.30 “differ meaningfully” from ORcrude = 0.61?
• What does “differs meaningfully” mean?
  – a matter of judgement based on biologic/clinical sense rather than on a statistical test
  – no one correct answer
  – the objective is to remove bias
  – 10% change from the crude often used
  – your threshold needs to be stated a priori and included in your methods section

So, it’s in the hands of the researcher

If the summary estimate, here a M-H OR estimator of 3.8
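One way to apply the "10% change from the crude" rule of thumb is sketched below in Python. The percent change is computed here on the odds-ratio scale relative to the crude value; that choice, the function name, and the constant `THRESHOLD_PERCENT` are assumptions of this sketch, and the slide stresses that the threshold itself is a judgement to be stated a priori.

```python
def percent_change_from_crude(crude, adjusted):
    """Absolute change of the adjusted estimate, as a percent of the crude."""
    return abs(adjusted - crude) / crude * 100.0


THRESHOLD_PERCENT = 10.0  # stated a priori; 10% is a common convention

# Crude and M-H adjusted odds ratios from the needlestick example.
change = percent_change_from_crude(crude=0.6074729, adjusted=0.30332)
confounding_flagged = change > THRESHOLD_PERCENT

print(f"change from crude = {change:.1f}%, flagged: {confounding_flagged}")
```

This is a numerical comparison only; it deliberately involves no statistical test.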

Page 25

Statistical Testing for Confounding is Inappropriate

• Testing for statistically significant differences between crude and adjusted measures is inappropriate

– e.g., when examining an association for which a factor is a known confounder (say age in the association between HTN and CAD)

– if the study has a small sample size, even large differences between crude and adjusted measures will not be statistically different

• yet, we know confounding is present

• therefore, the difference between crude and adjusted measures cannot be ignored as merely chance. The difference must be reported as confounding

– the issue of confounding is one of internal validity, not of sampling error.

• we must live with whatever effects we see after adjustment for a factor for which there is an a priori belief about confounding

• we’re not concerned that sampling error is causing confounding and therefore we don’t have to worry about testing for role of chance

Page 26

Confidence Interval Estimation and Hypothesis Testing for the Mantel-Haenszel Estimator

• e.g. AZT use, severity of needlestick, and HIV

. cc HIV AZTuse,by(severity) pool

        severity |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           minor |          0           0    2.302373     1.070588
           major |        .35    .1344565    .9144599     6.956522
-----------------+-------------------------------------------------
           Crude |   .6074729    .2638181    1.401432
 Pooled (direct) |          .           .           .
    M-H combined |     .30332    .1158571    .7941072
-----------------+-------------------------------------------------
Test of homogeneity (B-D)   chi2(1) =     0.60  Pr>chi2 = 0.4400

                   Test that combined OR = 1:
                                Mantel-Haenszel chi2(1) =      6.06
                                                Pr>chi2 =    0.0138

• What does the p value = 0.0138 mean?

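Page 27

Mantel-Haenszel Confidence Interval and Hypothesis Testing

            Disease   No Disease
Exposed       ai         bi         m1i
Unexposed     ci         di         m2i
              n1i        n2i        Ni

$$\mathrm{SE}(\log \mathrm{OR}_{MH}) = \sqrt{\frac{\sum_{i=1}^{k} P_i R_i}{2\left(\sum_{i=1}^{k} R_i\right)^2} + \frac{\sum_{i=1}^{k} (P_i w_i + Q_i R_i)}{2\left(\sum_{i=1}^{k} R_i\right)\left(\sum_{i=1}^{k} w_i\right)} + \frac{\sum_{i=1}^{k} Q_i w_i}{2\left(\sum_{i=1}^{k} w_i\right)^2}}$$

where

$$P_i = \frac{a_i + d_i}{N_i}; \quad Q_i = \frac{b_i + c_i}{N_i}; \quad R_i = \frac{a_i d_i}{N_i}; \quad w_i = \frac{b_i c_i}{N_i}$$

$$95\%\ \mathrm{CI} = e^{\,\log \mathrm{OR}_{MH} \,\pm\, 1.96 \times \mathrm{SE}(\log \mathrm{OR}_{MH})}$$

$$\chi^2_{(1)} = \frac{\left(\left|\sum_{i=1}^{k} a_i - \sum_{i=1}^{k} E_i\right| - 0.5\right)^2}{\sum_{i=1}^{k} \dfrac{m_{1i}\, m_{2i}\, n_{1i}\, n_{2i}}{N_i^2 (N_i - 1)}}$$

where $E_i = \dfrac{m_{1i}\, n_{1i}}{N_i}$ is the expected value for cell a in each stratum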
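The formulas on this slide can be checked against the Stata output for the needlestick data with a short Python sketch. Everything about the code (function name, the `continuity` parameter, the use of `s` for the slide's w_i) is an assumption made for illustration. In this sketch, setting the continuity correction to 0 gives a chi-square close to the 6.06 in the Stata output shown earlier, while the slide's formula with 0.5 gives a smaller value.

```python
import math


def mh_summary(strata, continuity=0.5):
    """Mantel-Haenszel OR, 95% CI, and chi-square for (a, b, c, d) strata."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n          # s is w_i on the slide
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
        m1, m2 = a + b, c + d                # row totals (exposed, unexposed)
        n1, n2 = a + c, b + d                # column totals (disease, none)
        sum_a += a
        sum_e += m1 * n1 / n                 # expected value of cell a
        sum_var += m1 * m2 * n1 * n2 / (n * n * (n - 1))
    or_mh = sum_r / sum_s
    se = math.sqrt(sum_pr / (2 * sum_r ** 2)
                   + sum_ps_qr / (2 * sum_r * sum_s)
                   + sum_qs / (2 * sum_s ** 2))
    low = math.exp(math.log(or_mh) - 1.96 * se)
    high = math.exp(math.log(or_mh) + 1.96 * se)
    chi2 = (abs(sum_a - sum_e) - continuity) ** 2 / sum_var
    return or_mh, low, high, chi2


azt_strata = [(0, 91, 3, 161), (8, 40, 16, 28)]
or_mh, low, high, chi2_corrected = mh_summary(azt_strata)
_, _, _, chi2_uncorrected = mh_summary(azt_strata, continuity=0.0)

print(f"OR_MH = {or_mh:.3f}, 95% CI {low:.3f} to {high:.3f}")
print(f"chi2 with 0.5 correction = {chi2_corrected:.2f}, without = {chi2_uncorrected:.2f}")
```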

Page 28

Mantel-Haenszel Techniques

• Mantel-Haenszel estimators
• Mantel-Haenszel chi-square statistic
• Mantel’s test for trend (dose-response)

Page 29: Winter Electives Molecular and Genetic Epidemiology Decision and Cost-effectiveness Analysis Grantwriting (Workshop – not for credit hours) Medical Informatics.

Summary Effect in Stata: Example

• e.g. Spermicide use, maternal age and Down’s

Crude

                     Down’s   No Down’s
Spermicide use       4        109
No spermicide use    12       1145

OR = 3.5

Stratified

Age < 35 (N = 1175)

                     Down’s   No Down’s
Spermicide use       3        104
No spermicide use    9        1059

OR = 3.4

Age ≥ 35 (N = 95)

                     Down’s   No Down’s
Spermicide use       1        5
No spermicide use    3        86

OR = 5.7

With this in mind, let’s consider an example using . . .

Should we pool these?

Is there confounding present?

. cc downs spermici , by(matage) pool

          matage |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
            < 35 |   3.394231     .9800358    11.80389     .7965957
           >= 35 |   5.733333            0     50.8076     .1578947
-----------------+-------------------------------------------------
           Crude |   3.501529     1.171223    10.49699
 Pooled (direct) |   3.824166     1.196437    12.22316
    M-H combined |   3.781172      1.18734    12.04142
-----------------+-------------------------------------------------
Test for heterogeneity (direct)   chi2(1) =  0.137   Pr>chi2 = 0.7109
Test for heterogeneity (M-H)      chi2(1) =  0.138   Pr>chi2 = 0.7105

Test that combined OR = 1:
                    Mantel-Haenszel chi2(1) =  5.81
                                    Pr>chi2 = 0.0159

Which answer should you report as “final”?
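To see where the "Pooled (direct)" line and the direct test for heterogeneity come from, here is a rough Python sketch of Woolf's inverse-variance weighting applied to the same two strata. The function name `woolf_pooled` is made up for illustration, and the p-value shortcut through `math.erfc` is valid only because there is 1 degree of freedom (two strata).

```python
import math

def woolf_pooled(strata):
    """Inverse-variance (Woolf) pooling of stratum-specific log odds ratios.

    strata: list of (a, b, c, d) cell counts; every cell must be non-zero.
    Returns (pooled OR, lower 95% limit, upper 95% limit, heterogeneity chi-square).
    """
    log_ors, weights = [], []
    for a, b, c, d in strata:
        log_ors.append(math.log(a * d / (b * c)))
        # Weight = 1 / variance of the stratum log OR.
        weights.append(1.0 / (1 / a + 1 / b + 1 / c + 1 / d))
    total_w = sum(weights)
    pooled_log = sum(w * l for w, l in zip(weights, log_ors)) / total_w
    se = math.sqrt(1.0 / total_w)
    # Weighted squared deviations of the stratum estimates from the pooled one.
    het = sum(w * (l - pooled_log) ** 2 for w, l in zip(weights, log_ors))
    return (math.exp(pooled_log),
            math.exp(pooled_log - 1.96 * se),
            math.exp(pooled_log + 1.96 * se),
            het)

strata = [(3, 104, 9, 1059), (1, 5, 3, 86)]   # age < 35, age >= 35
pooled, lower, upper, het = woolf_pooled(strata)
p_het = math.erfc(math.sqrt(het / 2))          # upper tail of chi-square, 1 df only
print(f"Pooled (direct) OR = {pooled:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print(f"Heterogeneity chi2(1) = {het:.3f}, p = {p_het:.4f}")
```

The large p-value says only that these sparse strata give no evidence against a common odds ratio; it does not by itself settle which summary to report.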


No Effect of Third Variable

Crude

              Lung Ca   No Lung Ca
Smoking       900       300
No Smoking    100       700

OR crude = 21.0 (95% CI: 16.4 - 26.9)

Stratified

Matches Present

              Lung Ca   No Lung Ca
Smoking       810       270
No Smoking    10        70

OR matches = 21.0

Matches Absent

              Lung Ca   No Lung Ca
Smoking       90        30
No Smoking    90        630

OR no matches = 21.0

OR adj = 21.0 (95% CI: 14.2 - 31.1)


Whether or not to accept the “adjusted” summary estimate in favor of the crude?

• Methodologic literature is inconsistent on this

• Scientifically most rigorous approach would appear to be to create two lists of potential confounders prior to the analysis:

– A. Those factors for which you will accept the adjusted result no matter how small the difference from the crude

– B. Those factors for which you will accept the adjusted result only if it meaningfully differs from the crude (with some pre-specified difference, e.g., 10%)

• For some analyses, may have no factors on A list. For other analyses, no factors on B list.

• Always putting all factors on the A list may seem conservative, but taking the resulting penalty in statistical imprecision is not necessarily the right thing to do


Presence or Absence of Confounding by a Third Variable?

Relative Risks

Crude   Third Factor Present   Third Factor Absent   Adjusted   Adjust or Ignore?
4.1     1.9                    2.1                   2.0        Adjust
4.0     1.2                    1.0                   1.1        Adjust
0.2     0.7                    0.9                   0.8        Adjust
4.0     3.8                    4.2                   4.1        Ignore
4.0     8.2                    7.7                   7.9        Adjust
1.0     3.1                    2.7                   3.0        Adjust
1.9     1.6                    1.9                   1.8        Prob. Ignore
0.9     0.1                    0.2                   0.1        Adjust
4.0     0.4                    0.6                   0.5        Adjust
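One way to make the "adjust or ignore" call mechanical is a change-in-estimate rule. The sketch below is only an illustration: it assumes the change is measured relative to the crude estimate and uses the 10% threshold mentioned earlier as an example, and the helper name `percent_change` is hypothetical. The (crude, adjusted) pairs are the rows of the table above.

```python
def percent_change(crude, adjusted):
    """Absolute change from crude to adjusted, as a percentage of the crude."""
    return abs(adjusted - crude) / crude * 100

THRESHOLD = 10.0  # example threshold; should be pre-specified by the analyst

# (crude RR, adjusted RR) for each row of the table
rows = [(4.1, 2.0), (4.0, 1.1), (0.2, 0.8), (4.0, 4.1), (4.0, 7.9),
        (1.0, 3.0), (1.9, 1.8), (0.9, 0.1), (4.0, 0.5)]

decisions = ["Adjust" if percent_change(c, a) >= THRESHOLD else "Ignore"
             for c, a in rows]

for (c, a), decision in zip(rows, decisions):
    print(f"crude {c:>4}  adjusted {a:>4}  change {percent_change(c, a):6.1f}%  {decision}")
```

Under these assumptions the rule reproduces the table's final column, with the 1.9 versus 1.8 row (about a 5% change) landing on "Ignore", which the slide hedges as "Prob. Ignore". A different threshold or a ratio-scale measure of change could shift such borderline rows.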


Stratifying by Multiple Potential Confounders

[Figure: the crude 2x2 table (Chlamydia / No chlamydia by CAD / No CAD) is split into six stratified 2x2 tables with the same layout, one for each combination of age and smoking status: <40 smokers; <40 non-smokers; 40-60 smokers; 40-60 non-smokers; >60 smokers; >60 non-smokers.]


The Need for Evaluation of Joint Confounding

• Variables that evaluated alone show no confounding may show confounding when evaluated jointly

Stratum         Exposed                  Unexposed                OR
                (Disease / No Disease)   (Disease / No Disease)

Crude
All             12 / 4                   30 / 22                  2.2

Stratified by Factor 1 alone
F1+             6 / 2                    15 / 11                  2.2
F1-             6 / 2                    15 / 11                  2.2

Stratified by Factor 2 alone
F2+             6 / 2                    15 / 11                  2.2
F2-             6 / 2                    15 / 11                  2.2

Stratified by Factor 1 & 2
F1+F2+          1 / 1                    10 / 10                  1.0
F1-F2+          5 / 1                    5 / 1                    1.0
F1+F2-          5 / 1                    5 / 1                    1.0
F1-F2-          1 / 1                    10 / 10                  1.0

The examples I have shown thus far have just one potential confounder to worry about. What should we do when more than . . .

In this example, the crude estimate is identical to the stratum specific measures when the 2 other variables are looked at separately.
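The arithmetic behind this example can be checked with a short Python sketch. It starts from the four joint strata, collapses them over each factor, and compares the odds ratios. The helper names (`odds_ratio`, `collapse`, `mh_or`) are invented for illustration, and the adjusted estimate uses the standard Mantel-Haenszel ratio.

```python
def odds_ratio(table):
    a, b, c, d = table
    return a * d / (b * c)

def collapse(tables):
    """Add 2x2 tables cell by cell, i.e. ignore the stratifying variable."""
    return tuple(sum(cells) for cells in zip(*tables))

def mh_or(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Joint strata keyed by (F1, F2); cells are (a, b, c, d) =
# (exposed diseased, exposed non-diseased, unexposed diseased, unexposed non-diseased).
joint = {
    ("+", "+"): (1, 1, 10, 10),
    ("-", "+"): (5, 1, 5, 1),
    ("+", "-"): (5, 1, 5, 1),
    ("-", "-"): (1, 1, 10, 10),
}

crude = collapse(joint.values())
by_f1 = {lvl: collapse([t for (f1, _), t in joint.items() if f1 == lvl]) for lvl in "+-"}
by_f2 = {lvl: collapse([t for (_, f2), t in joint.items() if f2 == lvl]) for lvl in "+-"}

print("crude", crude, "OR =", round(odds_ratio(crude), 1))
for name, strata in (("F1", by_f1), ("F2", by_f2)):
    for lvl, table in strata.items():
        print(f"{name}{lvl}", table, "OR =", round(odds_ratio(table), 1))
print("joint strata ORs:", [odds_ratio(t) for t in joint.values()])
print("M-H OR adjusted for F1 and F2 jointly:", mh_or(list(joint.values())))
```

Each factor taken alone leaves the odds ratio at 2.2, yet every joint stratum has an odds ratio of 1.0, so only the joint adjustment reveals the confounding.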


Approaches for When More than One Potential Confounder is Present

• Backward versus forward confounder evaluation strategies

– relevant both for stratification and especially multivariable modeling (the heart of model selection)

• Backwards Strategy

– initially evaluate all potential confounders together (i.e., look for joint confounding)

– conceptually preferred because in nature variables are all present and act together

– Procedure:

• with all potential confounders considered, form adjusted estimate. This is the “gold standard”

• one variable can then be dropped and the adjusted estimate is re-calculated (adjusted for remaining variables)

• if dropping the first variable results in a non-meaningful (e.g., <10%) change compared to the gold standard, it can be eliminated

• procedure continues until no more variables can be dropped (i.e., all remaining variables are relevant)

– Problem:

• with many potential confounders, cells become very sparse and stratum-specific estimates very imprecise

This introduces the whole topic of . . .

I know you are learning a bit about this in biostatistics. Which is preferable, backward or forward?

In fact, you may not even be able to get off the ground because the initial stratification is just too thin.


Example: Backwards Selection

• Research question: Is prior hospitalization associated with the presence of methicillin-resistant S. aureus (MRSA)? (from Kleinbaum 2003)

• Outcome variable: MRSA (present or absent)

• Primary predictor: prior hospitalization (yes/no)

• Potential confounders: age (<55, >55), gender, prior antibiotic use (atbxuse; yes/no)

• Assume no interaction

Factors Adjusted For                   OR      (95% CI)           CI Width
none (crude)                           11.67   (5.99 to 22.77)    16.78
age, gender, atbxuse (gold standard)   4.66    (2.14 to 10.14)    8.0
gender, atbxuse                        5.04    (2.31 to 11.03)    8.72
age, atbxuse                           4.63    (2.08 to 10.29)    8.21
age, gender                            11.59   (5.91 to 22.76)    16.85
atbxuse                                5.00    (2.26 to 11.04)    8.78
age                                    11.56   (5.87 to 22.76)    16.89
gender                                 12.06   (6.15 to 23.62)    17.47

• Which OR to report?
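As a rough illustration of the backward procedure, the sketch below walks through the tabulated MRSA estimates, dropping one variable at a time as long as the odds ratio stays within 10% of the gold standard. The function name `backward_eliminate`, the data structure, and the tie-breaking choice (drop the variable whose removal changes the estimate least) are assumptions made for this example, not part of the source.

```python
# OR and 95% CI for each adjustment set, copied from the table above.
estimates = {
    frozenset(): (11.67, 5.99, 22.77),                          # crude
    frozenset({"age", "gender", "atbxuse"}): (4.66, 2.14, 10.14),  # gold standard
    frozenset({"gender", "atbxuse"}): (5.04, 2.31, 11.03),
    frozenset({"age", "atbxuse"}): (4.63, 2.08, 10.29),
    frozenset({"age", "gender"}): (11.59, 5.91, 22.76),
    frozenset({"atbxuse"}): (5.00, 2.26, 11.04),
    frozenset({"age"}): (11.56, 5.87, 22.76),
    frozenset({"gender"}): (12.06, 6.15, 23.62),
}
GOLD = frozenset({"age", "gender", "atbxuse"})

def backward_eliminate(estimates, gold, threshold=10.0):
    """Drop variables one at a time while the OR stays within `threshold`
    percent of the gold-standard OR."""
    gold_or = estimates[gold][0]
    current = gold
    while current:
        candidates = []
        for var in current:
            reduced = current - {var}
            change = abs(estimates[reduced][0] - gold_or) / gold_or * 100
            if change < threshold:
                candidates.append((change, reduced))
        if not candidates:
            break
        # Assumed tie-break: take the drop with the smallest change.
        current = min(candidates, key=lambda pair: pair[0])[1]
    return current

final_set = backward_eliminate(estimates, GOLD)
or_, lo, hi = estimates[final_set]
print(sorted(final_set), or_, f"CI width = {hi - lo:.2f}")
```

Applied mechanically, the rule ends with atbxuse alone (OR 5.00). Note, though, that its CI width (8.78) is no narrower than the gold standard's (8.0), so the reduced set buys no precision here. Which OR to report remains a judgment call.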


Approaches for When More than One Potential Confounder is Present

• Forward Strategy

– start with the variable that has the biggest “change-in-estimate” impact

– then add the variable with the second biggest impact

– keep this variable if its presence meaningfully changes the adjusted estimate

– procedure continues until no other added variable has an important impact

– Advantage:

• avoids the initial sparse cell problem of backwards approach

– Problem:

• does not evaluate joint confounding effects of many variables

In the forward selection approach, you start with . . .
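For comparison, the forward steps can be sketched against the odds ratios tabulated in the MRSA backwards-selection example. This is a simplified illustration under stated assumptions: variables are ranked once by their single-variable change from the crude OR, a 10% change is treated as "meaningful", and all names (`or_by_set`, `pct_change`, `ranked`, `kept`) are hypothetical.

```python
CRUDE_OR = 11.67
THRESHOLD = 10.0  # percent change treated as meaningful (an example value)

# Adjusted ORs from the MRSA example, keyed by the set of factors adjusted for.
or_by_set = {
    frozenset({"age"}): 11.56,
    frozenset({"gender"}): 12.06,
    frozenset({"atbxuse"}): 5.00,
    frozenset({"age", "gender"}): 11.59,
    frozenset({"age", "atbxuse"}): 4.63,
    frozenset({"gender", "atbxuse"}): 5.04,
    frozenset({"age", "gender", "atbxuse"}): 4.66,
}

def pct_change(new, old):
    return abs(new - old) / old * 100

variables = ["age", "gender", "atbxuse"]

# Rank variables by how much each one alone moves the crude estimate.
ranked = sorted(variables,
                key=lambda v: pct_change(or_by_set[frozenset({v})], CRUDE_OR),
                reverse=True)

kept = frozenset({ranked[0]})          # start with the biggest impact
for var in ranked[1:]:
    trial = kept | {var}
    # Keep the new variable only if it meaningfully changes the current estimate.
    if pct_change(or_by_set[trial], or_by_set[kept]) >= THRESHOLD:
        kept = trial

print("order considered:", ranked)
print("kept:", sorted(kept), "OR =", or_by_set[kept])
```

With these numbers the forward pass stops at atbxuse alone, the same set the backward pass reaches, but that agreement is specific to this example. Because the forward pass never looks at variables jointly, it can miss joint confounding of the kind shown on the earlier slide.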


Stratification to Reduce Confounding

• Advantages

– straightforward to implement and comprehend

– easy way to evaluate statistical interaction

• Limitations

– Looks at only one exposure-disease assoc. at a time

– Requires continuous variables to be discretized

• loses information; possibly results in “residual confounding”

– Deteriorates with multiple confounders

• e.g. suppose 4 confounders with 3 levels

– 3x3x3x3=81 strata needed

– unless huge sample, many cells have “0”’s and strata have undefined effect measures

– Solution:

• Mathematical modeling (multivariable regression)

– e.g.

» linear regression

» logistic regression

» proportional hazards regression
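The sparse-cell problem is easy to quantify. The short sketch below enumerates the strata for 4 confounders with 3 levels each and shows how thinly a sample would be spread. The sample size of 500 is a made-up figure for illustration only.

```python
from itertools import product

levels_per_confounder = [3, 3, 3, 3]        # 4 confounders, 3 levels each
strata = list(product(*(range(k) for k in levels_per_confounder)))
n_strata = len(strata)                      # 3 x 3 x 3 x 3

sample_size = 500                           # hypothetical study size
per_stratum = sample_size / n_strata        # average subjects per stratum
per_cell = per_stratum / 4                  # each stratum is a 2x2 table

print(f"{n_strata} strata, about {per_stratum:.1f} subjects per stratum, "
      f"about {per_cell:.1f} per cell")
```

Even with subjects spread perfectly evenly, each 2x2 cell would hold only one or two people. Real data are uneven, so many cells would be empty and the stratum-specific odds ratios undefined.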

Although you are all now learning about the wonderful world of multivariable modeling, I would encourage you to examine your data with stratification whenever you can, because it is the most native way to see your data and the easiest way to explain your data to others.

It does, however, have its limitations, principally that it breaks down with multiple confounders.

These approaches are the topics of Mitch Katz’s upcoming sessions and your Thursday sessions.