Bias and confounding - kupublicifsv.sund.ku.dk/~pka/epi18/MK2.pdf
Course: Epidemiological methods in medical research
Mads Kamper-Jørgensen Section of Epidemiology February 6th 2018
Bias and confounding
02/02/2018 1
The world according to an epidemiologist
• We estimate the association between an exposure and an outcome. But does the association reflect causality or is it due to error?
Today we will talk about
• Chance
• Information bias
• Selection bias
• Confounding
Exposure → Outcome
Two types of error
Type I error
• We demonstrate an association, although no such association exists
• We typically accept a risk of type I error (α-level) of 5%
Type II error
• We do not demonstrate an association, although one actually exists
• We typically accept a risk of type II error (β-level) of 20%
The error rates are traded off against each other. The only way to reduce both error rates is to increase the sample size.
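The link between the two error rates and sample size can be sketched with the usual normal-approximation formula for comparing two proportions. The risks of 20% and 30% below are hypothetical, chosen only for illustration, and the formula is a rough planning approximation rather than a full power analysis.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p1, p2, alpha=0.05, beta=0.20):
    """Approximate number of participants per group needed to detect a
    difference between two proportions with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 5%
    z_beta = NormalDist().inv_cdf(1 - beta)        # about 0.84 for beta = 20%
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical risks: 20% among unexposed, 30% among exposed
n_default = sample_size_per_group(0.20, 0.30)
# Lowering both error rates at once requires more participants
n_stricter = sample_size_per_group(0.20, 0.30, alpha=0.01, beta=0.10)

print(n_default, n_stricter)
```

With the sample size held fixed, lowering α raises β and vice versa; only a larger sample lowers both.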
Type I and type II error
                              The truth
Result of study               Association exists                          No association exists
Association demonstrated      Reject null hypothesis (correct inference)  Reject null hypothesis (type I error)
Association not demonstrated  Accept null hypothesis (type II error)      Accept null hypothesis (correct inference)
Precision and bias
Blood pressure measured once for 20 people
A) Precise, unbiased: Blood pressure meter
B) Precise, biased: Poorly calibrated blood pressure meter
C) Imprecise, unbiased: iPhone
D) Imprecise, biased: Poorly calibrated iPhone
Precision and bias
RANDOM ERROR
• Reduces the precision
• Has no direction
• Depends on sample size: Bigger is better
• Does not necessarily lead to bias
SYSTEMATIC ERROR
• Reduces the validity
• Leads to over- or underestimation
• Bigger is not better
• Leads to bias
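The difference between the two kinds of error can be illustrated with a small simulation. All numbers are hypothetical: a true blood pressure of 120 mmHg, random measurement error with a standard deviation of 10 mmHg, and a poorly calibrated meter that reads 8 mmHg too high.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible

TRUE_BP = 120.0  # hypothetical true systolic blood pressure (mmHg)
NOISE_SD = 10.0  # hypothetical random error of a single measurement
OFFSET = 8.0     # hypothetical systematic error of a poorly calibrated meter


def mean_measurement(n, offset=0.0):
    """Mean of n single measurements with random noise and an optional offset."""
    return mean(random.gauss(TRUE_BP + offset, NOISE_SD) for _ in range(n))


small_unbiased = mean_measurement(20)
large_unbiased = mean_measurement(20000)
large_biased = mean_measurement(20000, OFFSET)

print(round(small_unbiased, 1), round(large_unbiased, 1), round(large_biased, 1))
```

A larger sample pulls the unbiased mean towards 120, whereas the biased mean settles around 128 no matter how many people are measured.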
Types of bias
Information bias
• Has to do with the information about study participants
Selection bias
• Has to do with the selection of study participants
Confounding
• Has to do with mixing of effects because the compared study participants are not comparable
Why information bias?
• Because we can over- or underestimate frequencies or associations and draw the wrong inference if the information on participants is incorrect
• So far we assumed correct information: (hardly) ever the case
• Pertains to exposure, covariates and/or outcome
• Due to e.g. biologic variation, poor memory, imprecise questions, ignorance etc.
• Information bias is due to systematically incorrect information about participants
• You can’t undo information bias once data have been collected, so use precise instruments and questions, standardized procedures, blinding and training
Sensitivity and specificity
• Sensitivity: the ability of a test to classify true positives (TP) as positives. Calculation: TP/(TP+FN)
• Specificity: the ability of a test to classify true negatives (TN) as negatives. Calculation: TN/(TN+FP)
• Most often related to the quality of a biologic test, but can also describe how well a question reflects the ‘truth’
                 True status
Test result      Diseased    Non-diseased
Diseased         TP          FP
Non-diseased     FN          TN
Total            TP+FN       FP+TN
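A minimal sketch of the two calculations. The counts are hypothetical and only serve to show which cells of the table enter each formula.

```python
def sensitivity(tp, fn):
    """Proportion of truly diseased who are classified as diseased."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of truly non-diseased who are classified as non-diseased."""
    return tn / (tn + fp)


# Hypothetical counts, not taken from any real test
tp, fp, fn, tn = 90, 40, 10, 160

sens = sensitivity(tp, fn)  # uses the 'Diseased' column: 90 / (90 + 10)
spec = specificity(tn, fp)  # uses the 'Non-diseased' column: 160 / (160 + 40)

print(sens, spec)
```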
Misclassification
• Wrong classification of participants
• If misclassification is similar in the compared groups it’s called non-differential misclassification
• If misclassification is not similar in the compared groups it’s called differential misclassification
• Both non-differential and differential misclassification may cause information bias
Examples from own research
Ignorance
• Few adult Americans received transfusion
Culture
• Few adult Frenchmen drink alcohol
Poor question
• Few Danish children have age-appropriate motor skills
Quiz
• Visit www.madskamper.dk/epiphd
• Take only the HPV quiz
• Discuss with your neighbour
Misclassification
• Fictitious cohort study of the association between alcohol consumption and self-perceived health using a poor measure of alcohol consumption
• True information on alcohol: RR=1.66
            Good    Bad    Total
Abstinent    236     59      295
Consumer     846    419     1265
Total       1082    478     1560
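The RR on this slide is the risk of bad self-perceived health among consumers divided by the risk among abstinent. A short sketch of that arithmetic, using the counts from the table (the function name is only illustrative):

```python
def relative_risk(bad_exposed, total_exposed, bad_unexposed, total_unexposed):
    """Risk among exposed divided by risk among unexposed."""
    return (bad_exposed / total_exposed) / (bad_unexposed / total_unexposed)


# Consumers are the exposed group, abstinent the unexposed group
rr_true = relative_risk(419, 1265, 59, 295)

print(round(rr_true, 2))  # 1.66
```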
Non-differential misclassification
• True information on alcohol: RR=1.66
• 10% of consumers are misclassified: RR=1.38
            Good    Bad    Total
Abstinent    321    101      422
Consumer     761    377     1138
Total       1082    478     1560
Non-differential misclassification
• True information on alcohol: RR=1.66
• 10% of consumers are misclassified: RR=1.38
• 20% of consumers are misclassified: RR=1.27
• The association moves towards no difference between groups, i.e. towards 0 on an absolute scale and towards 1 on a relative scale
            Good    Bad    Total
Abstinent    405    143      548
Consumer     677    335     1012
Total       1082    478     1560
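The attenuation can be reproduced by moving the same fraction of consumers into the abstinent row in both health groups. This is a sketch that assumes misclassified counts are rounded to whole persons, which matches the tables shown; the helper names are illustrative.

```python
def relative_risk(abstinent, consumer):
    """RR of bad health, consumers vs. abstinent; each argument is (good, bad)."""
    risk_consumer = consumer[1] / sum(consumer)
    risk_abstinent = abstinent[1] / sum(abstinent)
    return risk_consumer / risk_abstinent


def misclassify(abstinent, consumer, frac_good, frac_bad):
    """Record a fraction of consumers as abstinent, separately by health group."""
    moved_good = round(consumer[0] * frac_good)
    moved_bad = round(consumer[1] * frac_bad)
    new_abstinent = (abstinent[0] + moved_good, abstinent[1] + moved_bad)
    new_consumer = (consumer[0] - moved_good, consumer[1] - moved_bad)
    return new_abstinent, new_consumer


true_abstinent, true_consumer = (236, 59), (846, 419)

# Non-differential: the same fraction is misclassified in both health groups
rr_10 = relative_risk(*misclassify(true_abstinent, true_consumer, 0.10, 0.10))
rr_20 = relative_risk(*misclassify(true_abstinent, true_consumer, 0.20, 0.20))

print(round(rr_10, 2), round(rr_20, 2))  # 1.38 1.27
```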
Differential misclassification
• True information on alcohol: RR=1.66
• 10% of consumers are misclassified, but only among those with self-perceived bad health: RR=1.03
            Good    Bad    Total
Abstinent    236    101      337
Consumer     846    377     1223
Total       1082    478     1560
Differential misclassification
• True information on alcohol: RR=1.66
• 10% of consumers are misclassified, but only among those with self-perceived bad health: RR=1.03
• 20% of consumers are misclassified, but only among those with self-perceived bad health: RR=0.75
• Can reverse the association
            Good    Bad    Total
Abstinent    236    143      379
Consumer     846    335     1181
Total       1082    478     1560
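The differential scenario can be sketched the same way, except that consumers are moved to the abstinent row only among those with bad self-perceived health. Rounding to whole persons is an assumption that matches the tables shown.

```python
def relative_risk(abstinent, consumer):
    """RR of bad health, consumers vs. abstinent; each argument is (good, bad)."""
    return (consumer[1] / sum(consumer)) / (abstinent[1] / sum(abstinent))


def misclassify_bad_only(abstinent, consumer, fraction):
    """Record a fraction of consumers with bad health as abstinent."""
    moved = round(consumer[1] * fraction)
    return (abstinent[0], abstinent[1] + moved), (consumer[0], consumer[1] - moved)


true_abstinent, true_consumer = (236, 59), (846, 419)

rr_by_fraction = {
    fraction: relative_risk(*misclassify_bad_only(true_abstinent, true_consumer, fraction))
    for fraction in (0.0, 0.10, 0.20)
}

for fraction, rr in rr_by_fraction.items():
    print(fraction, round(rr, 2))  # 1.66, then 1.03, then 0.75
```

At 20% the RR falls below 1, so the direction of the association is reversed rather than merely attenuated.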
Examples of differential misclassification
Case-control study
• Recall bias: cases remember exposures differently (often better) than controls. Not the same as poor memory!
• Interviewer bias: Interviewer asks differently (often in more detail) regarding exposures among cases compared with controls
Cohort study
• Detection bias: exposed have a different (often higher) chance of having the outcome detected compared with unexposed
• Interviewer bias: exposed are asked differently (often in more detail) about the outcome compared with unexposed
BREAK
What are the sources of information bias in your project, and are they non-differential or differential?
Why selection bias?
• Because we can over- or underestimate frequencies or associations and draw the wrong inference if the study population does not represent the target population
• So far we assumed that participants in our study are comparable to those who do not participate: Not always the case
• Selection bias is due to systematic differences between participants and those who do not participate
• Arises through selection into the cohort and through attrition
Selection bias
Target population → Source population → Study population (systematic differences between them)
An example of selection bias
Target population
• Pregnant women in Denmark
Source population
• Pregnant women at selected GPs
Study population
• Participants in the Danish National Birth Cohort (DNBC): participation depended on whether the woman wanted to participate
Selection bias?
• Is the study population different from the source population, and is the source population different from the target population?
It depends …
DNBC women are different
• They drink less, they are better educated, they eat healthier, they use less medication etc.
Scientific question
• How many use painkillers during pregnancy? Yes, selection bias is very likely
• Is folic acid associated with neural tube defects? No, not very likely
Because
• In comparative studies, selection bias requires that both the exposure and the outcome are associated with the likelihood of participating in the study
Validity
Internal validity
• Do the results apply to the target population?
• Threatened by selection bias, information bias and confounding
External validity
• Do results apply beyond the target population?
• Dependent on internal validity
• Qualitative statement of the direction and strength of an association
Are the results biased?
• We often do not know whether the frequency or association is biased by selection, because we often lack information about non-participants
• Risk of selection bias must be considered depending on the scientific question, the study design, and the applied data
• Texan study of HIV prevalence
Matthew McConaughey in ‘Dallas Buyers Club’
What to do?
Data collection
• Maximize response rate through reminders, competitions, payment etc.
• Response rates have dropped over the last 30 years
• Snowball sampling (hard-to-reach groups)
• National registers without selection
Quiz
• Visit www.madskamper.dk/epiphd
• Take only the hepatitis quiz
• Discuss with your neighbour
Examples of selection bias
Intervention and cohort studies
• Generally not a problem because selection must relate to both exposure and outcome (which happens in the future)
• Attrition bias, e.g. new antidepressant and depression. Underestimates the effect of the new antidepressant because the most depressed using the old drug drop out
Case-control studies
• Poor selection of controls: Pancreatic cancer and coffee. Overestimates the effect of coffee because controls have been advised not to drink coffee
Examples of selection bias
Cross-sectional studies
• Survival bias: Smoking and COPD. Underestimates the effect of smoking because smokers with COPD are at high risk of dying
Can selection bias explain it?
• 1000 people were invited to participate in a study of the association between sex and hair loss. Of those, 650 (65%) agreed.
• OR = (100/200) / (50/300) = 3.00 (95% CI 2.04 - 4.40)
• We suspect men losing their hair to be more interested in participating than the other groups.
         + Hair loss    - Hair loss
Man          100            200
Woman         50            300
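The OR and its confidence interval can be checked with a few lines. The slide does not say how the interval was obtained; the sketch below assumes a Woolf (log-based) interval, which reproduces the numbers shown.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR with a Woolf (log-based) 95% CI for a 2x2 table
    with rows (a, b) and (c, d)."""
    odds_ratio = (a / b) / (c / d)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Men: 100 with and 200 without hair loss; women: 50 with and 300 without
observed_or, lower, upper = odds_ratio_ci(100, 200, 50, 300)

print(round(observed_or, 2), round(lower, 2), round(upper, 2))  # 3.0 2.04 4.4
```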
Can selection bias explain it?
• All men losing their hair participate, while participation in the other groups is 61%
         + Hair loss     - Hair loss
Man      100 (100%)      200 (61%)
Woman     50 (61%)       300 (61%)
Observed OR = True OR x [part%(a) / part%(c)] / [part%(b) / part%(d)]
3 = True OR x (100 / 61) / (61 / 61)
True OR = 1.83 (95% CI 1.32 - 2.53)
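A sketch of the correction, under the assumption that a, b, c and d denote the cells men with hair loss, men without, women with and women without. The interval is computed with a Woolf (log-based) formula on the counts scaled back to all invited persons; that method is an assumption, but it reproduces the numbers on the slide.

```python
from math import exp, log, sqrt

# Observed counts and assumed participation proportions per cell
observed = {"a": 100, "b": 200, "c": 50, "d": 300}
participation = {"a": 1.00, "b": 0.61, "c": 0.61, "d": 0.61}

observed_or = (observed["a"] / observed["b"]) / (observed["c"] / observed["d"])

# Observed OR = True OR x [part(a) / part(c)] / [part(b) / part(d)]
selection_factor = (participation["a"] / participation["c"]) / (
    participation["b"] / participation["d"]
)
true_or = observed_or / selection_factor

# Scale each cell back to the number invited, then apply the Woolf formula
invited = {cell: observed[cell] / participation[cell] for cell in observed}
se_log_or = sqrt(sum(1 / count for count in invited.values()))
lower = exp(log(true_or) - 1.96 * se_log_or)
upper = exp(log(true_or) + 1.96 * se_log_or)

print(round(true_or, 2), round(lower, 2), round(upper, 2))  # 1.83 1.32 2.53
```

Selective participation of men losing their hair therefore explains part of the observed association, but not all of it.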
BREAK
Do you have reasons to fear selection in your studies, and can you justify it?
Confounding
What is it?
• To mix up, confuse, mistake
• Used in epidemiology to describe mixing up of causes of a given effect
• Leads to misinterpretation, wrong inference
An example
• Does birth order affect the risk of Down’s syndrome?
Birth order and Down’s syndrome
From: K Rothman: Epidemiology – An Introduction 2002
DK in 2005-2009: ~0.5 per 1000 births
Maternal age and Down’s syndrome
From: K Rothman: Epidemiology – An Introduction 2002
Birth order, maternal age and Down’s syndrome
From: K Rothman: Epidemiology – An Introduction 2002
Confounding
Is present when
• An observed association between exposure and outcome can be fully or partly attributed to a different distribution of risk factors for the outcome among exposed and unexposed, i.e. non-exchangeability
Criteria
• Independent risk factor for the outcome
• Associated with the exposure
• Not an intermediate step between exposure and outcome
Confounder model
Exposure → Outcome
Confounder
• Independent risk factor for the outcome
• Associated with the exposure
• Not intermediate between exposure and outcome
Quiz
• Visit www.madskamper.dk/epiphd
• Take the last quiz
• Discuss with your neighbour
Confounder identification
Methods
• Stepwise selection (forwards or backwards)
• Change-in-estimate
• Causal diagrams (DAGs)
Recommendation
• Common sense
• Do not necessarily do what others have done before
Confounder control
DESIGN
Randomization
• Not possible in observational design
Matching
• Not possible to investigate the effect of matching variable
• May remove the effect you are interested in studying
• Twin and sibling design
ANALYSIS
Standardization
• Indirect standardization (one population is standard)
• Direct standardization (external standard population)
Stratified analysis
• Only possible to stratify according to a few variables
Multivariate analysis
• Adjust simultaneously for several variables
• Estimates from such analysis are called ‘adjusted’
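Stratified analysis can be sketched by comparing a crude RR with a Mantel-Haenszel summary RR across strata of the confounder. The Mantel-Haenszel estimator is one common choice and is not named on the slides. The counts are entirely hypothetical and constructed so that the exposure has no effect within either stratum.

```python
def risk_ratio(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk among exposed divided by risk among unexposed."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio over strata of
    (cases_exp, n_exp, cases_unexp, n_unexp)."""
    numerator = denominator = 0.0
    for cases_exp, n_exp, cases_unexp, n_unexp in strata:
        total = n_exp + n_unexp
        numerator += cases_exp * n_unexp / total
        denominator += cases_unexp * n_exp / total
    return numerator / denominator


# Hypothetical data: the confounder is rare among unexposed, common among exposed
strata = [
    (10, 100, 90, 900),   # confounder absent: risk 10% in both exposure groups
    (270, 900, 30, 100),  # confounder present: risk 30% in both exposure groups
]

# Collapse the strata into one table for the crude estimate
crude_rr = risk_ratio(*(sum(column) for column in zip(*strata)))
adjusted_rr = mantel_haenszel_rr(strata)
change_in_estimate = (crude_rr - adjusted_rr) / adjusted_rr

print(round(crude_rr, 2), round(adjusted_rr, 2), round(change_in_estimate, 2))
```

The crude RR of about 2.33 disappears after stratification (adjusted RR of 1.00), so in this constructed example the whole crude association is due to the confounder. The relative change between the two estimates is the quantity used in the change-in-estimate approach.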
Unmeasured vs. residual confounding
Unmeasured
• Variables which we have no data on
Residual
• If the categorization is too crude or the information regarding the confounder is imprecise
Look out for mix-ups
Design and bias
Sir Bradford Hill’s criteria of causality
Criterion              Explanation
Strength               Strength depends on the prevalence. A strong association is unlikely to be due to confounding alone
Consistency            Several investigations point in the same direction, i.e. replicated in other designs and settings
Specificity            One cause leads to one outcome
Temporality            Cause must predate effect
Dose-response          The risk of the outcome increases with increasing exposure
Plausibility           Plausible biological explanation?
Experimental evidence  Designs with control of conditions (RCT or animal models)
Analogy                If some exposures are harmful, similar exposures are probably harmful too