EPI 5240: Introduction to Epidemiology Screening and diagnostic test evaluation November 2, 2009

11/2009 1

EPI 5240: Introduction to Epidemiology

Screening and diagnostic test evaluation
November 2, 2009

Dr. N. Birkett, Department of Epidemiology & Community Medicine, University of Ottawa

11/2009 2

Session Overview

• Review key features of tests for disease.
• Diagnostic test evaluation
  – Study designs
  – Key biases
• Screening programmes
  – Overview
  – Criteria for utility
  – Issues in evaluation and implementation
• Regression to the mean

11/2009 3

Scenario (1)

A 54-year-old female teacher visited her FP for an ‘annual checkup’. She reported no illnesses in the previous year, felt well and had no complaints. Hot flashes related to menopause had resolved. A detailed physical examination, including breast palpation, was unremarkable. A screening mammogram was recommended as per current guidelines.

11/2009 4

Scenario (2)

The mammogram results were ‘not normal’ and a follow-up breast biopsy was recommended. The surgeon confirmed the negative clinical exam but, based on the abnormal mammogram, recommended a fine-needle aspiration (FNA) biopsy of the abnormal breast under radiological guidance. Pathological review of the biopsy revealed a malignant breast tumor. Further surgery was scheduled to pursue this finding.

11/2009 5

Why not here?

11/2009 6

FNA positive risk

• 100% vs 64%
• Depends on definition of a ‘positive’ FNA.
  – Must be clear carcinoma: 100% positive (0% false positives)
  – Abnormal cells, may not be cancer: 64% positive (36% false positives)
• Why use second approach?
  – Reduces the risk that you will miss someone who has a true cancer
• Tradeoff of sensitivity and specificity
• More later

11/2009 7

Test Properties (1)

• Most common situation (for teaching at least) assumes:
  – Dichotomous outcome (ill/not ill)
  – Dichotomous test results (positive/negative)
• Represented as a 2x2 table (yet another variant!).
• Advanced methods can consider tests with multiple outcomes
  – advanced; moderate; minimal; no disease

11/2009 8

Test Properties (2)

            Diseased               Not diseased           Total
Test +ve    90 (true positives)    5 (false positives)    95
Test -ve    10 (false negatives)   95 (true negatives)    105
Total       100                    100                    200

11/2009 9

Test Properties (4)


            Diseased    Not diseased    Total
Test +ve    90          5               95
Test -ve    10          95              105
Total       100         100             200

Sensitivity = 90/100 = 0.90
Specificity = 95/100 = 0.95

11/2009 10

Test Properties (5)


            Diseased    Not diseased    Total
Test +ve    a           b               a+b
Test -ve    c           d               c+d
Total       a+c         b+d             a+b+c+d

Sensitivity = a/(a+c)
Specificity = d/(b+d)
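As a minimal sketch of the column-wise calculation, the function below reads the four cell counts in the slide's a, b, c, d layout and returns sensitivity a/(a+c) and specificity d/(b+d). The function name is illustrative only; the counts are those of the 90/5/10/95 example table.

```python
def sensitivity_specificity(a, b, c, d):
    """Return (sensitivity, specificity) from a 2x2 table.

    a = test +ve, diseased      (true positives)
    b = test +ve, not diseased  (false positives)
    c = test -ve, diseased      (false negatives)
    d = test -ve, not diseased  (true negatives)
    """
    sensitivity = a / (a + c)  # proportion of diseased people who test positive
    specificity = d / (b + d)  # proportion of non-diseased people who test negative
    return sensitivity, specificity


# Counts from the example table in the slides.
sens, spec = sensitivity_specificity(a=90, b=5, c=10, d=95)
print(f"Sensitivity = {sens:.2f}, Specificity = {spec:.2f}")
```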

11/2009 11

Test Properties (6)

• Sensitivity = Pr(test positive in a person with disease)

• Specificity = Pr(test negative in a person without disease)

• Range: 0 to 1
  – > 0.9: Excellent
  – 0.8-0.9: Not bad
  – 0.7-0.8: So-so
  – < 0.7: Poor

11/2009 12

Test Properties (7)

• Generally, high sensitivity associated with low specificity and vice-versa (more later).

• Do you want a test with high sensitivity or specificity?
  – Depends on cost of ‘false positive’ and ‘false negative’ cases.
  – PKU – one false negative is a disaster.
  – Ottawa Ankle Rules

11/2009 13

Test Properties (8)

• Patients don’t ask: ‘If I’ve got the disease, how likely is it that the test will be positive?’

• They ask: ‘My test is positive. Does that mean I have the disease?’

• Predictive values.

11/2009 14

Test Properties (9)

            Diseased    Not diseased    Total
Test +ve    90          5               95
Test -ve    10          95              105
Total       100         100             200

PPV = 90/95 = 0.95

NPV = 95/105 = 0.90

11/2009 15

Test Properties (10)


            Diseased    Not diseased    Total
Test +ve    a           b               a+b
Test -ve    c           d               c+d
Total       a+c         b+d             a+b+c+d

PPV = a/(a+b)

NPV = d/(c+d)
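A short sketch of the row-wise calculation, assuming the same a, b, c, d cell labels as the table above: PPV is a/(a+b) and NPV is d/(c+d). The function name is hypothetical; the counts are the slide's 90/5/10/95 example.

```python
def predictive_values(a, b, c, d):
    """Return (PPV, NPV) from a 2x2 table laid out as in the slides.

    Rows are test results (+ve: a, b; -ve: c, d);
    columns are true disease status (diseased: a, c; not diseased: b, d).
    """
    ppv = a / (a + b)  # proportion of test-positives who have the disease
    npv = d / (c + d)  # proportion of test-negatives who do not have the disease
    return ppv, npv


ppv, npv = predictive_values(a=90, b=5, c=10, d=95)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Unlike sensitivity and specificity, these two values depend on how many diseased and non-diseased people are in the table, which is why they change with prevalence.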

11/2009 16


• PPV = Pr(subject has disease given that their test was positive)

• NPV = Pr(subject doesn’t have disease given that their test was negative)

• Range: 0 to 1
• PPV is affected by the prevalence of the disease in the target population. Sensitivity & specificity are not affected by prevalence.

• To use test in new population, you need to ‘calibrate’ the PPV/NPV.

• Example: sens = 0.85; spec = 0.9

11/2009 17



Tertiary care: research study. Prevalence = 0.5

            Diseased    Not diseased    Total
Test +ve    425         50              475
Test -ve    75          450             525
Total       500         500             1,000

PPV = 425/475 = 0.89

11/2009 18

Test Properties (13)
Calibration by hypothetical table

Fill cells in following order:

                     “Truth”
            Disease Present    Disease Absent    Total           PV
Test Pos    4th                7th               8th             10th
Test Neg    5th                6th               9th             11th
Total       2nd                3rd               1st (10,000)

11/2009 19



Primary care: Prevalence = 0.01

            Diseased            Not diseased        Total
Test +ve    85 (0.85*100)       990                 1,075
Test -ve    15                  8,910 (0.9*9900)    8,925
Total       100 (0.01*10000)    9,900               10,000

PPV = 85/1,075 = 0.08
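The fill order of the hypothetical table can be sketched in a few lines of Python. This is an illustration, not part of the original slides: the function name and dictionary keys are assumptions, while the inputs (prevalence 0.01, sensitivity 0.85, specificity 0.9, total 10,000) are the slide's primary care example.

```python
def hypothetical_table(prevalence, sensitivity, specificity, total=10_000):
    """Build a hypothetical 2x2 table, filling cells in the order used on the slide."""
    diseased = prevalence * total            # 2nd: diseased column total
    not_diseased = total - diseased          # 3rd: non-diseased column total
    true_pos = sensitivity * diseased        # 4th
    false_neg = diseased - true_pos          # 5th
    true_neg = specificity * not_diseased    # 6th
    false_pos = not_diseased - true_neg      # 7th
    test_pos = true_pos + false_pos          # 8th: test +ve row total
    test_neg = false_neg + true_neg          # 9th: test -ve row total
    return {
        "true_pos": true_pos, "false_pos": false_pos, "test_pos": test_pos,
        "false_neg": false_neg, "true_neg": true_neg, "test_neg": test_neg,
        "ppv": true_pos / test_pos,          # 10th
        "npv": true_neg / test_neg,          # 11th
    }


table = hypothetical_table(prevalence=0.01, sensitivity=0.85, specificity=0.90)
print(f"Test +ve total = {table['test_pos']:.0f}")
print(f"PPV = {table['ppv']:.3f}, NPV = {table['npv']:.3f}")
```

Re-running the function with prevalence 0.5 gives a PPV of about 0.89, matching the tertiary care table.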

11/2009 20

Test Properties (16)
Likelihood Ratio

            Diseased    Not diseased    Total
Test +ve    a           b               a+b
Test -ve    c           d               c+d
Total       a+c         b+d             a+b+c+d

Pre-test odds = (a+c)/(b+d)

Post-test odds (test +ve) = a/b

$$\mathrm{LR}(+ve) = \frac{\text{post-test odds}}{\text{pre-test odds}}$$

11/2009 21

Test Properties (15)
Likelihood ratio

            Diseased    Not diseased    Total
Test +ve    90          5               95
Test -ve    10          95              105
Total       100         100             200

Pre-test odds = 100/100 = 1.00

Post-test odds = 90/5 = 18.0

Likelihood ratio (+ve) = LR(+) = 18.0/1.0 = 18.0

11/2009 22


• LR(+ve) gives the amount by which the odds of disease increase if the test is positive.
  – Big values are good. Need at least 8-10 to have an acceptable test.

$$\mathrm{LR}(+ve) = \frac{a\,(b+d)}{(a+c)\,b} = \frac{\text{sensitivity}}{1 - \text{specificity}}$$

• LR(+ve) is not affected by disease prevalence.
  – Can be used to adjust PPV/NPV for differences in prevalence.
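The two equivalent forms of LR(+ve) can be checked numerically. The sketch below (function names are illustrative) computes LR(+ve) once from the 2x2 cell counts as post-test odds over pre-test odds, and once as sensitivity/(1 – specificity). It then applies the table form to the slides' tertiary care (prevalence 0.5) and primary care (prevalence 0.01) tables to show that the ratio does not move with prevalence.

```python
def lr_positive_from_table(a, b, c, d):
    """LR(+ve) as post-test odds divided by pre-test odds (a, b, c, d as in the slides)."""
    pre_test_odds = (a + c) / (b + d)  # odds of disease before testing
    post_test_odds = a / b             # odds of disease among test-positives
    return post_test_odds / pre_test_odds


def lr_positive(sensitivity, specificity):
    """LR(+ve) from test characteristics."""
    return sensitivity / (1 - specificity)


lr_example = lr_positive_from_table(90, 5, 10, 95)       # sens 0.90, spec 0.95 table
lr_formula = lr_positive(0.90, 0.95)
lr_tertiary = lr_positive_from_table(425, 50, 75, 450)   # prevalence 0.5
lr_primary = lr_positive_from_table(85, 990, 15, 8910)   # prevalence 0.01

print(f"Example table: {lr_example:.1f} (table) vs {lr_formula:.1f} (formula)")
print(f"Tertiary care: {lr_tertiary:.1f}, primary care: {lr_primary:.1f}")
```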

11/2009 23


• Adjusting PPV/NPV using LR(+ve)
  – Compute LR(+ve) from your test sample (LR_test)
  – Convert the new disease prevalence into odds (pre-test odds):
      • pre-test odds = p/(1-p)
  – Multiply pre-test odds by LR_test to give post-test odds (odds_post)
  – Convert odds_post to PPV:
      • PPV = odds_post/(1 + odds_post)

11/2009 24

Test Properties (19)
PPV via LR(+ve)

• Previous example
  – Prevalence = 1%; sens = 85%; spec = 90%
• Pre-test odds = .01/.99 = 0.0101
• LR+ = .85/.1 = 8.5 (>1, but not that great)
• Post-test odds (+ve) = .0101*8.5 = .0859
• PPV = .0859/1.0859 = 0.079 = 7.9%

• Compare to the ‘hypothetical table’ method (PPV=8%)
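The odds-based adjustment can be written as a small function. This is a sketch under the slide's own steps (prevalence to odds, multiply by LR(+ve), odds back to probability); the function name is an assumption, and the inputs are the worked example's values.

```python
def ppv_from_lr(prevalence, sensitivity, specificity):
    """PPV for a given prevalence, via pre-test odds and LR(+ve)."""
    lr_pos = sensitivity / (1 - specificity)        # LR(+ve) of the test
    pre_test_odds = prevalence / (1 - prevalence)   # p / (1 - p)
    post_test_odds = pre_test_odds * lr_pos         # odds of disease given a +ve test
    return post_test_odds / (1 + post_test_odds)    # convert odds back to a probability


ppv_primary = ppv_from_lr(prevalence=0.01, sensitivity=0.85, specificity=0.90)
ppv_tertiary = ppv_from_lr(prevalence=0.50, sensitivity=0.85, specificity=0.90)
print(f"Primary care PPV  = {ppv_primary:.3f}")
print(f"Tertiary care PPV = {ppv_tertiary:.2f}")
```

Both results agree with the hypothetical-table method, since the two approaches are algebraically the same calculation.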

11/2009 25


• Most tests give continuous readings
  – Serum hemoglobin
  – PSA
  – X-rays
• How to determine ‘cut-point’ for normal vs diseased (negative vs positive)?
• ↑ sensitivity, ↓ specificity
• Receiver Operating Characteristic (ROC) curves
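To make the cut-point trade-off concrete, here is a minimal sketch using made-up readings. The two lists of test values are entirely hypothetical (they are not data from the lecture), and the convention that a reading at or above the cut-point counts as positive is also an assumption. Lowering the cut-point raises sensitivity and lowers specificity, and vice versa.

```python
# Hypothetical continuous test readings; NOT data from the slides.
DISEASED = [4.1, 5.0, 5.6, 6.2, 7.0, 7.4, 8.1, 9.0]
HEALTHY = [1.0, 2.2, 2.9, 3.5, 4.0, 4.8, 5.2, 6.0]


def sens_spec_at(cutoff, diseased=DISEASED, healthy=HEALTHY):
    """Sensitivity and specificity when 'reading >= cutoff' is called positive."""
    sensitivity = sum(x >= cutoff for x in diseased) / len(diseased)
    specificity = sum(x < cutoff for x in healthy) / len(healthy)
    return sensitivity, specificity


results = {cutoff: sens_spec_at(cutoff) for cutoff in (3.0, 5.0, 7.0)}
for cutoff, (sens, spec) in results.items():
    print(f"cut-point {cutoff}: sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```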

11/2009 26

11/2009 27

[Figure: test results split at a cut-point into Negative and Positive, with the False -ve and False +ve regions marked]

11/2009 28

[Figure: test results split at a different cut-point into Negative and Positive, with the False -ve and False +ve regions marked]

11/2009 29

ROC curve from sample data

[Figure: ROC curve. x-axis: 1 - Specificity (0.0 to 1.0); y-axis: Sensitivity (0.0 to 1.0)]

11/2009 30

AUC = Area Under Curve
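One way to obtain the AUC, sketched here on hypothetical readings (the data are invented for illustration and the trapezoid rule is one common choice, not necessarily the method used in the lecture), is to sweep the cut-point across every observed value, collect the (1 - specificity, sensitivity) pairs, and sum the area under the resulting curve.

```python
# Hypothetical continuous test readings; NOT data from the slides.
DISEASED = [4.1, 5.0, 5.6, 6.2, 7.0, 7.4, 8.1, 9.0]
HEALTHY = [1.0, 2.2, 2.9, 3.5, 4.0, 4.8, 5.2, 6.0]


def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) pairs, from the strictest cut-point to the loosest."""
    cutoffs = sorted(set(diseased) | set(healthy), reverse=True)
    points = [(0.0, 0.0)]  # cut-point above every reading: nobody tests positive
    for cutoff in cutoffs:
        sensitivity = sum(x >= cutoff for x in diseased) / len(diseased)
        false_pos_rate = sum(x >= cutoff for x in healthy) / len(healthy)
        points.append((false_pos_rate, sensitivity))
    return points  # last point is (1.0, 1.0): everybody tests positive


def area_under_curve(points):
    """Trapezoid-rule area under a list of (x, y) points ordered by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(DISEASED, HEALTHY)
auc = area_under_curve(points)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a test no better than chance, and 1.0 to a test that separates the two groups perfectly.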

11/2009 31

11/2009 32

Diagnostic test study issues (1)

• How do you select the subjects for a study to evaluate the properties of a diagnostic test?

• Most test evaluations are done in tertiary care settings, which leads to PPV/NPV issues.

• Three main methods of choosing subjects:
  – Take ‘all comers’
  – Select a group of people with disease and a group without disease
  – Select a group who are test positive and a group who are test negative.

11/2009 33



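[Figure: 2x2 table (Test +ve / Test -ve) marking where each of the three subject-selection methods (1, 2, 3) draws its sample]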

11/2009 34


• Method 1:
  – Inefficient – most people won’t have disease.
• Method 2:
  – Hard to implement if test must be administered before outcome is known (e.g. a measure of reactive arterial narrowing and diagnosis of a heart attack)
• Method 3:
  – Gives biased estimates of sensitivity/specificity (Work-up Bias)

11/2009 35


• Spectrum Bias
  – It’s easy to diagnose a broken leg in a person with a compound fracture.
  – It’s much harder to distinguish someone with a hairline fracture from a person with a deep bruise or ligament injury.
• Study must include subjects with the relevant spectrum of disease states.
  – Spectrum needed depends on purpose of the test.

11/2009 36


• Work-up bias
  – The study selects patients based on the result of the diagnostic test (e.g. 100 test +ve and 100 test –ve).
      • sens/spec will be biased.
• Example:
  – Evaluate a new method to screen men with chest pain. It’s hard to get men with known CHD (can’t be done in ED alone). You might try to select men based on results of the screening test.

11/2009 37

Work-up Bias

True test performance

            Diseased    Not diseased    Total
Test +ve    150         50              200
Test -ve    100         900             1000
Total       250         950             1200

Sensitivity = 150/250 = 60%

Specificity = 900/950 = 95%

Now, suppose we studied everyone with a positive test but only 100 people with a negative test?

11/2009 38

Work-up Bias (2)

Test performance from study

            Diseased          Not diseased    Total
Test +ve    150               50              200
Test -ve    10 (0.1 * 100)    90              100 (0.1 * 1000)
Total       160               140             300

Sensitivity = 150/160 = 94%, not 60%

Specificity = 90/140 = 64%, not 95%

BIAS!
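The size of the work-up bias in this example can be reproduced with a short calculation. The sketch below uses the slide's counts; the helper name and variable names are illustrative. It keeps every test-positive subject but only 100 of the 1000 test-negatives, and assumes those 100 are a representative 10% sample of the test-negative row.

```python
def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    return tp / (tp + fn), tn / (fp + tn)


# True test performance in the full population (counts from the slide).
tp, fp, fn, tn = 150, 50, 100, 900
true_sens, true_spec = sens_spec(tp, fp, fn, tn)

# Study design: everyone who tests positive, but only 100 of the 1000 test-negatives.
negatives_total = fn + tn      # 1000
negatives_studied = 100
fn_studied = fn * negatives_studied // negatives_total  # 10
tn_studied = tn * negatives_studied // negatives_total  # 90
study_sens, study_spec = sens_spec(tp, fp, fn_studied, tn_studied)

print(f"True:  sensitivity = {true_sens:.0%}, specificity = {true_spec:.0%}")
print(f"Study: sensitivity = {study_sens:.0%}, specificity = {study_spec:.0%}")
```

Under-sampling the test-negative row removes false negatives and true negatives in equal proportion, which inflates sensitivity and deflates specificity at the same time.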

11/2009 39

Screening (1)

• Screening
  – The presumptive identification of an unrecognized disease or defect by the application of tests, examinations or other procedures
• Can be applied to an unselected population or to a high risk group.
• Examples
  – Pap smears (cervical cancer)
  – Mammography (breast cancer)
  – Early childhood development
  – PKU

11/2009 40

Screening (2)

• Levels of prevention:
  – Primary prevention
  – Secondary prevention
  – Tertiary prevention

11/2009 41

Screening (3)

DPCP: Detectable Pre-Clinical Phase

11/2009 42

Screening (4)

11/2009 43

Screening (5)

Criteria to determine if a screening programme should be implemented

• Disease Factors
  – Severity
  – Presence of a lengthy DPCP
  – Evidence that earlier treatment improves prognosis

11/2009 44

Screening (6)

• Test Factors
  – Valid - sensitive and specific with respect to DPCP
  – Reliable and reproducible (omitted from most lists, but shouldn't be)
  – Acceptable - cf. sigmoidoscopy
  – Easy
  – Cheap
  – Safe

11/2009 45

Screening (7)

• Test Factors (cont)
  – Test must reach high-risk groups - cf. Pap smears
  – Sequential vs parallel tests
      • Sequential: higher specificity
      • Parallel: higher sensitivity
• System Factors
  – Follow-up provided and available to all
  – Treatment resources adequate
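The sequential vs parallel point can be illustrated with a small calculation. This sketch rests on two assumptions that are not stated in the slides: the two tests are conditionally independent given disease status, and 'sequential' means a person is called positive only if both tests are positive, while 'parallel' means positive if either test is positive. The test characteristics are borrowed from earlier examples in the lecture purely for illustration.

```python
def sequential(sens1, spec1, sens2, spec2):
    """Positive only if BOTH tests are positive (assumes conditional independence)."""
    sensitivity = sens1 * sens2
    specificity = 1 - (1 - spec1) * (1 - spec2)
    return sensitivity, specificity


def parallel(sens1, spec1, sens2, spec2):
    """Positive if EITHER test is positive (assumes conditional independence)."""
    sensitivity = 1 - (1 - sens1) * (1 - sens2)
    specificity = spec1 * spec2
    return sensitivity, specificity


# Illustrative inputs: test A (sens 0.85, spec 0.90), test B (sens 0.90, spec 0.95).
seq_sens, seq_spec = sequential(0.85, 0.90, 0.90, 0.95)
par_sens, par_spec = parallel(0.85, 0.90, 0.90, 0.95)

print(f"Sequential: sensitivity = {seq_sens:.3f}, specificity = {seq_spec:.3f}")
print(f"Parallel:   sensitivity = {par_sens:.3f}, specificity = {par_spec:.3f}")
```

If the two tests tend to make the same mistakes (for example, both miss small tumours), the gains shown here would be smaller than the independence assumption suggests.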

11/2009 46

Screening (8)

• Evaluation of Screening
  – Can it work?
  – Does it work in the real world?
• Case-control vs. cohort vs. RCT
• Are we evaluating
  – Screening alone
      • Mammography and breast cancer detection
  – Screening plus therapy
      • Mammography and survival

11/2009 47

Screening (9)

• Biases in interpreting evaluations of screening programmes.

• Lead-time Bias
  – Detecting disease early gives more years of ‘illness’ but doesn’t prolong life
• Length Bias
  – Slowly progressive cases are more likely to be detected than rapidly progressive cases

11/2009 48

Screening (10)

11/2009 49

Screening (11)

11/2009 50

Screening (12)

• Study proposes to evaluate a screening programme in an RCT by comparing survival (adjusted for lead time bias) in people who were screened to those who were not screened.

• Will give a biased estimate of effectiveness (screening will look ‘too good’).

11/2009 51

Screening (13)

Detected by screening:

Slow: 5/5

Fast: 2/5

Better survival than non-screened subjects even if screening is useless
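A small numerical sketch shows how length bias arises from the detection fractions on this slide (5 of 5 slow cases and 2 of 5 fast cases detected by screening). The survival times are hypothetical values chosen only for illustration, and the calculation assumes screening has no effect at all on survival.

```python
# Hypothetical survival times in years; NOT from the slides.
SLOW_SURVIVAL = 10.0
FAST_SURVIVAL = 2.0

# Detection fractions from the slide: slow 5/5, fast 2/5.
slow_total, slow_detected = 5, 5
fast_total, fast_detected = 5, 2


def mean_survival(n_slow, n_fast):
    """Average survival for a group with the given mix of slow and fast cases."""
    return (n_slow * SLOW_SURVIVAL + n_fast * FAST_SURVIVAL) / (n_slow + n_fast)


screen_detected = mean_survival(slow_detected, fast_detected)
other_cases = mean_survival(slow_total - slow_detected, fast_total - fast_detected)
all_cases = mean_survival(slow_total, fast_total)

print(f"Screen-detected cases: {screen_detected:.1f} years")
print(f"Other cases:           {other_cases:.1f} years")
print(f"All cases:             {all_cases:.1f} years")
```

The screen-detected group survives longer on average only because it is enriched for slowly progressive cases, not because screening changed anyone's outcome.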

11/2009 52

Screening (14)

• Compare survival in people screened to survival rates prior to screening
  – Lead time bias
• Compare survival in screen detected cases to other cases
  – Length bias
  – (lead time bias)
• Compare survival in people offered screening to those not offered screening
  – RCT is best
  – Depends on screening compliance rates

11/2009 53

Screening (15)

• Potential outcomes include
  – Death
      • Disease-specific
      • Total mortality
  – Disease progression
  – Intermediate outcomes
  – Biomarkers
  – Quality of life

11/2009 54

11/2009 55

11/2009 56

11/2009 57

Screening (16)

• Some Issues with Screening
  – Compliance
  – Impact on Quality of Life
  – Impact on ‘period’ incidence trends

11/2009 58

Screening (17)

• Some Issues with Screening (cont)
  – Repeat Screening
      • Most population-based screening requires repeated exams
          – Within same person
              » But not for genetic testing
          – Different people (to detect new cases)
      • What screening interval to use?
      • Prevalent vs. incident cases
      • Interval cases

11/2009 59

Screening (18)

• Some Issues with Screening (cont)
  – False positives
      • 90% of positive mammograms are false +ve
      • Cost to evaluate person
      • Stress, psychological effects

11/2009 60

Screening (19)

• Mammography false positive implications

• 1996 Swedish data
  – 352 false positives
      • 1112 MD visits
      • 397 FNA biopsies
      • 187 extra mammograms
      • 90 surgical biopsies
      • $600,000
  – Screening detected 128 cancers.
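The Swedish figures can be turned into per-case rates with simple arithmetic. This sketch assumes, as the slide layout suggests, that the visits, procedures and the $600,000 all relate to the work-up of the 352 false positives; the variable names are illustrative.

```python
# Counts from the 1996 Swedish data on the slide.
false_positives = 352
md_visits = 1112
fna_biopsies = 397
surgical_biopsies = 90
total_cost_dollars = 600_000
cancers_detected = 128

# Per-false-positive burden (assumes all listed follow-up relates to the false positives).
cost_per_false_positive = total_cost_dollars / false_positives
visits_per_false_positive = md_visits / false_positives
false_positives_per_cancer = false_positives / cancers_detected

print(f"Cost per false positive:        ${cost_per_false_positive:,.0f}")
print(f"MD visits per false positive:   {visits_per_false_positive:.1f}")
print(f"False positives per cancer found: {false_positives_per_cancer:.2f}")
```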

11/2009 61

Screening (20)

• Some Issues with Screening (cont)
  – False positives
      • 90% of positive mammograms are false +ve
      • Cost to evaluate person
      • Stress, psychological effects
  – False negatives
      • Missed disease
      • False reassurance leading to delayed clinical diagnosis
  – True negatives
      • False reassurance and failure to alter risky life style, etc.
  – Labelling

11/2009 62


11/2009 63

11/2009 64

11/2009 65

Summary

• Diagnostic tests can be evaluated by considering their error rates
  – sensitivity & specificity are the key parameters used
• Screening tests have similar properties
• Screening should not be used unless early detection of diseases changes natural history
• Screening tests generally need high sensitivity
