EPI 5240: Introduction to Epidemiology Screening and diagnostic test evaluation November 2, 2009
11/2009 1
EPI 5240: Introduction to Epidemiology
Screening and diagnostic test evaluation
November 2, 2009
Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa
11/2009 2
Session Overview
• Review key features of tests for disease.
• Diagnostic test evaluation
  – Study designs
  – Key biases
• Screening programmes
  – Overview
  – Criteria for utility
  – Issues in evaluation and implementation
• Regression to the mean
11/2009 3
Scenario (1)
A 54-year-old female teacher visited her family physician (FP) for an ‘annual checkup’. She reported no illnesses in the previous year, felt well and had no complaints. Hot flashes related to menopause had resolved. A detailed physical examination, including breast palpation, was unremarkable. A screening mammogram was recommended as per current guidelines.
11/2009 4
Scenario (2)
The mammogram results were ‘not normal’ and a follow-up breast biopsy was recommended. The surgeon confirmed the negative clinical exam but, based on the abnormal mammogram, recommended a fine-needle aspiration (FNA) biopsy of the abnormal breast under radiological guidance. Pathological review of the biopsy revealed the presence of a malignant breast tumor. Further surgery was scheduled to pursue this abnormal finding.
11/2009 5
Why not here?
11/2009 6
FNA positive risk
• 100% vs 64%
• Depends on definition of a ‘positive’ FNA.
  – Must be clear carcinoma: 100% positive (0% false positives)
  – Abnormal cells, may not be cancer: 64% positive (36% false positives)
• Why use second approach?
  – Reduces the risk that you will miss someone who has a true cancer
• Tradeoff of sensitivity and specificity
• More later
11/2009 7
Test Properties (1)
• Most common situation (for teaching at least) assumes:
  – Dichotomous outcome (ill/not ill)
  – Dichotomous test results (positive/negative)
• Represented as a 2x2 table (yet another variant!).
• Advanced methods can consider tests with multiple outcomes
  – advanced; moderate; minimal; no disease
11/2009 8
Test Properties (2)
            Diseased               Not diseased           Total
Test +ve    90 (true positives)    5 (false positives)    95
Test -ve    10 (false negatives)   95 (true negatives)    105
Total       100                    100                    200
11/2009 9
Test Properties (4)
            Diseased   Not diseased   Total
Test +ve    90         5              95
Test -ve    10         95             105
Total       100        100            200
Sensitivity = 90/100 = 0.90
Specificity = 95/100 = 0.95
11/2009 10
Test Properties (5)
            Diseased   Not diseased   Total
Test +ve    a          b              a+b
Test -ve    c          d              c+d
Total       a+c        b+d            a+b+c+d
Sensitivity = a/(a+c)
Specificity = d/(b+d)
11/2009 11
Test Properties (6)
• Sensitivity = Pr(test positive in a person with disease)
• Specificity = Pr(test negative in a person without disease)
• Range: 0 to 1
  – > 0.9: Excellent
  – 0.8-0.9: Not bad
  – 0.7-0.8: So-so
  – < 0.7: Poor
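The two definitions above can be checked against the worked 2x2 table (90 true positives, 5 false positives, 10 false negatives, 95 true negatives). The following is a minimal sketch; the function name `sensitivity_specificity` is an illustrative choice, not something from the lecture.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # a / (a + c): test positive among the diseased
    specificity = tn / (fp + tn)  # d / (b + d): test negative among the non-diseased
    return sensitivity, specificity


# Counts from the worked table: 90 TP, 5 FP, 10 FN, 95 TN
sens, spec = sensitivity_specificity(tp=90, fp=5, fn=10, tn=95)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```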
11/2009 12
Test Properties (7)
• Generally, high sensitivity is associated with low specificity and vice versa (more later).
• Do you want a test with high sensitivity or specificity?
  – Depends on cost of ‘false positive’ and ‘false negative’ cases.
  – PKU: one false negative is a disaster.
  – Ottawa Ankle Rules
11/2009 13
Test Properties (8)
• Patients don’t ask: ‘If I’ve got the disease, how likely is it that the test will be positive?’
• They ask: ‘My test is positive. Does that mean I have the disease?’
• Predictive values answer that second question.
11/2009 14
Test Properties (9)
            Diseased   Not diseased   Total
Test +ve    90         5              95
Test -ve    10         95             105
Total       100        100            200
PPV = 90/95 = 0.95
NPV = 95/105 = 0.90
11/2009 15
Test Properties (10)
            Diseased   Not diseased   Total
Test +ve    a          b              a+b
Test -ve    c          d              c+d
Total       a+c        b+d            a+b+c+d
PPV = a/(a+b)
NPV = d/(c+d)
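Predictive values read the same 2x2 table along its rows rather than its columns. A minimal sketch using the worked counts (90, 5, 10, 95) follows; the function name `predictive_values` is illustrative only.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from the four cells of a 2x2 table."""
    ppv = tp / (tp + fp)  # a / (a + b): diseased among the test positives
    npv = tn / (fn + tn)  # d / (c + d): non-diseased among the test negatives
    return ppv, npv


# Counts from the worked table: 90 TP, 5 FP, 10 FN, 95 TN
ppv, npv = predictive_values(tp=90, fp=5, fn=10, tn=95)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```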
11/2009 16
Test Properties (11)
• PPV = Pr(subject has disease given that their test was positive)
• NPV = Pr(subject doesn’t have disease given that their test was negative)
• Range: 0 to 1
• PPV is affected by the prevalence of the disease in the target population. Sensitivity & specificity are not affected by prevalence.
• To use test in new population, you need to ‘calibrate’ the PPV/NPV.
• Example: sens = 0.85; spec = 0.9
11/2009 17
Test Properties (12)
Tertiary care: research study. Prevalence = 0.5
            Diseased   Not diseased   Total
Test +ve    425        50             475
Test -ve    75         450            525
Total       500        500            1,000
PPV = 425/475 = 0.89
11/2009 18
Test Properties (13)
Calibration by hypothetical table
Fill cells in following order:
            “Truth”
            Disease present   Disease absent   Total          PV
Test Pos    4th               7th              8th            10th
Test Neg    5th               6th              9th            11th
Total       2nd               3rd              1st (10,000)
11/2009 19
Test Properties (14)
Primary care: Prevalence = 0.01
            Diseased                 Not diseased             Total
Test +ve    85 (= 0.85 × 100)        990                      1,075
Test -ve    15                       8,910 (= 0.9 × 9,900)    8,925
Total       100 (= 0.01 × 10,000)    9,900                    10,000
PPV = 85/1,075 = 0.08
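The hypothetical-table calibration can be sketched in a few lines: start from a notional total, split it by prevalence, then fill the cells from sensitivity and specificity. The function name and the dictionary layout are illustrative assumptions; the inputs (sens = 0.85, spec = 0.9, prevalence 0.01 and 0.5) come from the two examples in the slides.

```python
def hypothetical_table(prevalence, sensitivity, specificity, total=10_000):
    """Fill a 2x2 table for a notional population and return its cells and PVs."""
    diseased = prevalence * total           # column total: disease present
    not_diseased = total - diseased         # column total: disease absent
    tp = sensitivity * diseased             # true positives
    fn = diseased - tp                      # false negatives
    tn = specificity * not_diseased         # true negatives
    fp = not_diseased - tn                  # false positives
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


primary = hypothetical_table(prevalence=0.01, sensitivity=0.85, specificity=0.9)
tertiary = hypothetical_table(prevalence=0.5, sensitivity=0.85, specificity=0.9)
print(f"primary care PPV  = {primary['ppv']:.2f}")
print(f"tertiary care PPV = {tertiary['ppv']:.2f}")
```

The same test gives a PPV near 0.89 at 50% prevalence and near 0.08 at 1% prevalence, which is why PPV/NPV must be recalibrated for each target population.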
11/2009 20
Test Properties (16)
Likelihood Ratio
            Diseased   Not diseased   Total
Test +ve    a          b              a+b
Test -ve    c          d              c+d
Total       a+c        b+d            a+b+c+d
Post-test odds = a/b
Pre-test odds = (a+c)/(b+d)
LR+ve = post-test odds / pre-test odds
11/2009 21
Test Properties (15)
Likelihood ratio
            Diseased   Not diseased   Total
Test +ve    90         5              95
Test -ve    10         95             105
Total       100        100            200
Pre-test odds = 100/100 = 1.00
Post-test odds = 90/5 = 18.0
Likelihood ratio (+ve) = LR(+) = 18.0/1.0 = 18.0
11/2009 22
Test Properties (17)
• LR(+ve) gives the amount by which the odds of disease increase if the test is positive.
  – Big values are good. Need at least 8-10 to have an acceptable test.
• LR(+ve) = [a × (b+d)] / [(a+c) × b] = sensitivity / (1 - specificity)
• LR(+ve) is not affected by disease prevalence.
  – Can be used to adjust PPV/NPV for differences in prevalence.
11/2009 23
Test Properties (18)
• Adjusting PPV/NPV using LR(+ve)
  – Compute LR(+ve) from your test sample (LR_test)
  – Convert the new disease prevalence into odds (pre-test odds):
    • pre-test odds = p/(1-p)
  – Multiply pre-test odds by LR_test to give post-test odds (odds_post)
  – Convert odds_post to PPV:
    • PPV = odds_post/(1 + odds_post)
11/2009 24
Test Properties (19)
PPV via LR(+ve)
• Previous example
  – Prevalence = 1%; sens = 85%; spec = 90%
• Pretest odds = .01/.99 = 0.0101
• LR+ = .85/.1 = 8.5 (>1, but not that great)
• Post-test odds (+ve) = .0101*8.5 = .0859
• PPV = .0859/1.0859 = 0.079 = 7.9%
• Compare to the ‘hypothetical table’ method (PPV=8%)
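The likelihood-ratio route to PPV can be written as a short function. This is a sketch under the slide's own inputs (prevalence 1%, sens 85%, spec 90%); the function name `ppv_from_lr` is an illustrative assumption.

```python
def ppv_from_lr(prevalence, sensitivity, specificity):
    """Convert prevalence to PPV via the positive likelihood ratio."""
    lr_pos = sensitivity / (1 - specificity)       # LR(+ve)
    pre_test_odds = prevalence / (1 - prevalence)  # p / (1 - p)
    post_test_odds = pre_test_odds * lr_pos
    return post_test_odds / (1 + post_test_odds)   # odds back to probability


ppv_lr = ppv_from_lr(prevalence=0.01, sensitivity=0.85, specificity=0.90)
print(f"PPV via LR(+ve) = {ppv_lr:.3f}")
```

Algebraically this is the same quantity as 85/1,075 from the hypothetical table, so the two methods differ only through rounding of the intermediate steps.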
11/2009 25
Test Properties (20)
• Most tests give continuous readings
  – Serum hemoglobin
  – PSA
  – X-rays
• How to determine ‘cut-point’ for normal vs diseased (negative vs positive)?
• ↑ sensitivity ↓ specificity
• Receiver Operating Characteristic (ROC) curves
11/2009 27
[Figure labels: Negative | Positive; False -ve, False +ve]
11/2009 28
[Figure labels: Negative | Positive; False -ve, False +ve]
11/2009 29
[Figure: ROC curve from sample data. x-axis: 1 - Specificity (0.0 to 1.0); y-axis: Sensitivity (0.0 to 1.0)]
11/2009 30
AUC = Area Under Curve
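An ROC curve is built by sweeping the cut-point across the continuous readings and recording sensitivity against 1 - specificity at each step; the AUC is the area under the resulting curve. The sketch below assumes higher readings indicate disease and uses made-up readings (they are not the lecture's sample data); the function names are illustrative.

```python
def roc_points(diseased_values, healthy_values):
    """Return (1 - specificity, sensitivity) pairs, one per candidate cut-point.

    A reading at or above the cut-point counts as test positive.
    """
    cutpoints = sorted(set(diseased_values) | set(healthy_values), reverse=True)
    points = [(0.0, 0.0)]  # cut-point above every reading: nobody tests positive
    for cut in cutpoints:
        sens = sum(v >= cut for v in diseased_values) / len(diseased_values)
        fpr = sum(v >= cut for v in healthy_values) / len(healthy_values)
        points.append((fpr, sens))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical readings, for illustration only
diseased = [6.1, 7.4, 8.0, 8.8, 9.5]
healthy = [3.2, 4.0, 5.5, 6.5, 7.0]

points = roc_points(diseased, healthy)
area = auc(points)
print(f"AUC = {area:.2f}")
```

Lowering the cut-point moves along the curve toward the top right: sensitivity rises while specificity falls.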
11/2009 32
Diagnostic test study issues (1)
• How do you select the subjects for a study to evaluate the properties of a diagnostic test?
• Most test evaluations are done in tertiary care settings → PPV/NPV issues.
• Three main methods of choosing subjects:
  – Method 1: Take ‘all comers’
  – Method 2: Select a group of people with disease and a group without disease
  – Method 3: Select a group who are test positive and a group who are test negative.
11/2009 33
Diagnostic test study issues (2)
            Diseased   Not diseased
Test +ve
Test -ve
[Figure: the 2x2 table annotated with the numbers 1, 2 and 3 for the three sampling methods]
11/2009 34
Diagnostic test study issues (3)
• Method 1:
  – Inefficient: most people won’t have disease.
• Method 2:
  – Hard to implement if test must be administered before outcome is known (e.g. a measure of reactive arterial narrowing and diagnosis of a heart attack)
• Method 3:
  – Gives biased estimates of sensitivity/specificity (Work-up Bias)
11/2009 35
Diagnostic test study issues (4)
• Spectrum Bias
  – It’s easy to diagnose a broken leg in a person with a compound fracture.
  – It’s much harder to distinguish someone with a hairline fracture from a person with a deep bruise or ligament injury.
• Study must include subjects with the relevant spectrum of disease states.
  – Spectrum needed depends on purpose of the test.
11/2009 36
Diagnostic test study issues (5)
• Work-up bias
  – The study selects patients based on the result of the diagnostic test (e.g. 100 test +ve and 100 test -ve).
    • sens/spec will be biased.
• Example:
  – Evaluate a new method to screen men with chest pain. It’s hard to get men with known CHD (can’t be done in ED alone). You might try to select men based on results of the screening test.
11/2009 37
Work-up Bias
TRUE TEST PERFORMANCE
            Diseased   Not diseased   Total
Test +ve    150        50             200
Test -ve    100        900            1000
Total       250        950            1200
Sensitivity = 150/250 = 60%
Specificity = 900/950 = 95%
NOW, suppose we only studied 100 people with a negative test but everyone with a positive test?
11/2009 38
Work-up Bias (2)
TEST PERFORMANCE FROM STUDY
            Diseased           Not diseased   Total
Test +ve    150                50             200
Test -ve    10 (= 0.1 × 100)   90             100 (= 0.1 × 1000)
Total       160                140            300
Sensitivity = 150/160 = 94% not 60%
Specificity = 90/140 = 64% not 95%
BIAS!
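The work-up bias arithmetic can be reproduced by sampling the two test-result rows at different rates. This sketch uses the slide's counts (150, 50, 100, 900) and its sampling scheme (all test positives, 10% of test negatives); the helper names are illustrative assumptions.

```python
def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (fp + tn)


def sample_by_test_result(tp, fp, fn, tn, frac_pos=1.0, frac_neg=1.0):
    """Keep a fraction of the test-positive row and of the test-negative row."""
    return tp * frac_pos, fp * frac_pos, fn * frac_neg, tn * frac_neg


full_population = (150, 50, 100, 900)  # tp, fp, fn, tn
true_sens, true_spec = sens_spec(*full_population)

# Study everyone who tested positive but only 100 of the 1000 who tested negative
study = sample_by_test_result(*full_population, frac_pos=1.0, frac_neg=0.1)
biased_sens, biased_spec = sens_spec(*study)

print(f"true:  sens = {true_sens:.0%}, spec = {true_spec:.0%}")
print(f"study: sens = {biased_sens:.0%}, spec = {biased_spec:.0%}")
```

Under-sampling test negatives removes false negatives and true negatives in the same proportion, so sensitivity is inflated and specificity is deflated.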
11/2009 39
Screening (1)
• Screening
  – The presumptive identification of an unrecognized disease or defect by the application of tests, examinations or other procedures
• Can be applied to an unselected population or to a high risk group.
• Examples
  – Pap smears (cervical cancer)
  – Mammography (breast cancer)
  – Early childhood development
  – PKU
11/2009 40
Screening (2)
• Levels of prevention:
  – Primary prevention
  – Secondary prevention
  – Tertiary prevention
11/2009 41
Screening (3)
DPCP = Detectable Pre-Clinical Phase
11/2009 42
Screening (4)
11/2009 43
Screening (5)
Criteria to determine if a screening programme should be implemented
• Disease Factors
  – Severity
  – Presence of a lengthy DPCP
  – Evidence that earlier treatment improves prognosis
11/2009 44
Screening (6)
• Test Factors
  – Valid: sensitive and specific with respect to DPCP
  – Reliable and reproducible (omitted from most lists, but shouldn't be)
  – Acceptable (cf. sigmoidoscopy)
  – Easy
  – Cheap
  – Safe
11/2009 45
Screening (7)
• Test Factors (cont)
  – Test must reach high-risk groups (cf. Pap smears)
  – Sequential vs parallel tests
    • Sequential → higher specificity
    • Parallel → higher sensitivity
• System Factors
  – Follow-up provided and available to all
  – Treatment resources adequate
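The sequential versus parallel trade-off can be illustrated numerically. The sketch below assumes the two tests err independently given true disease status (an assumption not stated in the slides), treats ‘sequential’ as positive only when both tests are positive and ‘parallel’ as positive when either is, and uses hypothetical sensitivity and specificity values.

```python
def sequential(sens1, spec1, sens2, spec2):
    """Combined (sens, spec) when a subject is positive only if BOTH tests are positive."""
    return sens1 * sens2, 1 - (1 - spec1) * (1 - spec2)


def parallel(sens1, spec1, sens2, spec2):
    """Combined (sens, spec) when a subject is positive if EITHER test is positive."""
    return 1 - (1 - sens1) * (1 - sens2), spec1 * spec2


# Hypothetical test properties, assuming independent errors
test_a = (0.80, 0.90)  # (sensitivity, specificity)
test_b = (0.90, 0.95)

seq_sens, seq_spec = sequential(*test_a, *test_b)
par_sens, par_spec = parallel(*test_a, *test_b)
print(f"sequential: sens = {seq_sens:.3f}, spec = {seq_spec:.3f}")
print(f"parallel:   sens = {par_sens:.3f}, spec = {par_spec:.3f}")
```

With these inputs the sequential strategy is more specific than either test alone and the parallel strategy is more sensitive than either test alone, at the cost of the other property in each case.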
11/2009 46
Screening (8)
• Evaluation of Screening
  – Can it work?
  – Does it work in the real world?
• Case-control vs. cohort vs. RCT
• Are we evaluating
  – Screening alone
    • Mammography and breast cancer detection
  – Screening plus therapy
    • Mammography and survival
11/2009 47
Screening (9)
• Biases in interpreting evaluations of screening programmes.
• Lead-time Bias
  – Detecting disease early gives more years of ‘illness’ but doesn’t prolong life
• Length Bias
  – Slowly progressive cases are more likely to be detected than rapidly progressive cases
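Lead-time bias can be seen with one hypothetical patient whose age at death is the same whether or not she is screened. All ages below are invented for illustration; only the idea (earlier diagnosis, unchanged death) comes from the slide.

```python
def survival_after_diagnosis(age_at_diagnosis, age_at_death):
    """Years lived after the diagnosis is made."""
    return age_at_death - age_at_diagnosis


# Hypothetical ages, for illustration only
AGE_SCREEN_DETECTED = 57     # diagnosis during the detectable pre-clinical phase
AGE_CLINICAL_DIAGNOSIS = 60  # diagnosis when symptoms would have appeared
AGE_AT_DEATH = 65            # unchanged by screening in this scenario

unscreened_survival = survival_after_diagnosis(AGE_CLINICAL_DIAGNOSIS, AGE_AT_DEATH)
screened_survival = survival_after_diagnosis(AGE_SCREEN_DETECTED, AGE_AT_DEATH)
lead_time = AGE_CLINICAL_DIAGNOSIS - AGE_SCREEN_DETECTED

print(f"survival after diagnosis: unscreened = {unscreened_survival} y, "
      f"screened = {screened_survival} y, lead time = {lead_time} y")
```

The apparent gain in survival equals the lead time, even though the patient dies at the same age.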
11/2009 48
Screening (10)
11/2009 49
Screening (11)
11/2009 50
Screening (12)
• Study proposes to evaluate a screening programme in an RCT by comparing survival (adjusted for lead time bias) in people who were screened to those who were not screened.
• Will give a biased estimate of effectiveness (screening will look ‘too good’).
11/2009 51
Screening (13)
Cases detected by screening:
  Slow: 5/5
  Fast: 2/5
Screen-detected cases show better survival than non-screened subjects even if screening is useless.
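The 5/5 versus 2/5 detection pattern can be turned into a small length-bias calculation. The detection counts come from the slide; the survival times assigned to slow and fast cases are hypothetical, and screening is assumed to have no effect on survival at all.

```python
# Hypothetical survival (years) by progression type; screening changes nothing here
SURVIVAL_YEARS = {"slow": 10, "fast": 2}

# Counts from the slide: 5 slow and 5 fast cases; screening detects 5/5 slow, 2/5 fast
all_cases = {"slow": 5, "fast": 5}
screen_detected = {"slow": 5, "fast": 2}


def mean_survival(case_counts):
    """Average survival across a mix of slow and fast cases."""
    total_cases = sum(case_counts.values())
    total_years = sum(n * SURVIVAL_YEARS[kind] for kind, n in case_counts.items())
    return total_years / total_cases


mean_all = mean_survival(all_cases)
mean_detected = mean_survival(screen_detected)
print(f"all cases: {mean_all:.1f} y; screen-detected cases: {mean_detected:.1f} y")
```

Screen-detected cases look better only because the detected group is enriched for slowly progressive disease.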
11/2009 52
Screening (14)
• Compare survival in people screened to survival rates prior to screening
  – Lead time bias
• Compare survival in screen detected cases to other cases
  – Length bias
  – (lead time bias)
• Compare survival in people offered screening to those not offered screening
  – RCT is best
  – Depends on screening compliance rates
11/2009 53
Screening (15)
• Potential outcomes include
  – Death
    • Disease-specific
    • Total mortality
  – Disease progression
  – Intermediate outcomes
  – Biomarkers
  – Quality of life
11/2009 57
Screening (16)
• Some Issues with Screening
  – Compliance
  – Impact on Quality of Life
  – Impact on ‘period’ incidence trends
11/2009 58
Screening (17)
• Some Issues with Screening (cont)
  – Repeat Screening
    • Most population-based screening requires repeated exams
      – Within same person
        » But not for genetic testing
      – Different people (to detect new cases)
    • What screening interval to use?
    • Prevalent vs. incident cases
    • Interval cases
11/2009 59
Screening (18)
• Some Issues with Screening (cont)
  – False positives
    • 90% of positive mammograms are false +ve
    • Cost to evaluate person
    • Stress, psychological effects
11/2009 60
Screening (19)
• Mammography false positive implications
• 1996 Swedish data
  – 352 false positives
    • 1112 MD visits
    • 397 FNA biopsies
    • 187 extra mammograms
    • 90 surgical biopsies
    • $600,000
  – Screening detected 128 cancers.
11/2009 61
Screening (20)
• Some Issues with Screening (cont)
  – False positives
    • 90% of positive mammograms are false +ve
    • Cost to evaluate person
    • Stress, psychological effects
  – False negatives
    • Missed disease
    • False reassurance leading to delayed clinical diagnosis
  – True negatives
    • False reassurance and failure to alter risky life style, etc.
  – Labelling
11/2009 65
Summary
• Diagnostic tests can be evaluated by considering their error rates
  – sensitivity & specificity are the key parameters used
• Screening tests have similar properties
• Screening should not be used unless early detection of disease changes natural history
• Screening tests generally need high sensitivity