OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP
An empirical approach to measuring and calibrating for error in
observational analyses
Patrick Ryan on behalf of the OMOP research team
25 April 2013
Consider a typical observational database study: exploring clopidogrel and upper gastrointestinal bleeding. The estimate: RR = 2.07 (95% CI: 1.66 to 2.58), p < .001.

Error = distance from the point estimate to the true effect. How far away from truth is RR = 2.07?

Bias = expected value of the error distribution. When applying this type of analysis to this type of data for this type of outcome, how far on average is the estimate from the true value?

Coverage = probability that the true effect is contained within the confidence interval. When applying this type of analysis to this type of data for this type of outcome, do the 95% confidence intervals (1.66 to 2.58 in this case) actually contain the true relative risk 95% of the time?
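These three quantities can be computed directly once a true effect is assumed. A minimal sketch, using made-up RR estimates rather than study results:

```python
import math

# Illustrative (made-up) RR estimates with 95% CIs, plus an assumed true RR.
true_rr = 1.0
estimates = [(2.07, 1.66, 2.58), (0.95, 0.70, 1.29), (1.40, 1.10, 1.78)]

# Error: distance from each point estimate to the truth, on the log scale
errors = [math.log(rr) - math.log(true_rr) for rr, _, _ in estimates]

# Bias: expected value (here, the mean) of the error distribution
bias = sum(errors) / len(errors)

# Coverage: fraction of confidence intervals that contain the true RR
coverage = sum(lo <= true_rr <= hi for _, lo, hi in estimates) / len(estimates)

print(f"bias (log RR): {bias:.3f}, coverage: {coverage:.0%}")
```

Working on the log scale keeps the error distribution roughly symmetric, which is why bias is reported there.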
• Their recommendation: Use 3-4 negative controls, in addition to target outcome, as a means of assessing the plausibility of an observational analysis result
• Our recommendation: Use a large sample of negative (and positive) controls to empirically measure analysis operating characteristics and use them to calibrate your study finding
Learning from what's already known
• Develop a standardized implementation of the analysis strategy
– Study design: Case-control
– Nesting within indication (unstable angina)
– Case definition: First episode of upper GI hemorrhage
– 10 controls per case, matched on age, gender, and index date
– Exposure definition: Length of exposure + 30d
– Exclusion criteria: <180d of observation before case
• Systematically apply the analysis across a network of databases, consistently for a large sample of positive and negative controls
– GI Bleeding: 24 positive controls, 67 negative controls
– Criteria for negative controls:
• Event not listed anywhere in any section of active FDA structured product label
• Drug not listed as ‘causative agent’ in Tisdale et al, 2010: “Drug-Induced Diseases”
• Literature review identified no evidence of potential positive association
• Record all effect estimates (RR, CI) from all analysis-database-drug-outcome combinations and summarize performance for each analysis × database pair
– If drugs identified as negative controls truly have no effect on the outcome, then true RR = 1 can be taken as the basis for measuring error
OMOP approach to methodological research
Standard approach yields results similar to the initial study: Opatrny 2008 in GPRD: 2.07 (1.66, 2.58); OMOP 2012 in CCAE: 1.86 (1.79, 1.93)
CC: 2000314, CCAE, GI Bleed: Case-control estimates for GI bleed negative controls
• If the 95% confidence interval were properly calibrated, then 95% × 65 ≈ 62 of the estimates should cover RR = 1
• We observed that 29 of the negative control estimates covered RR = 1
• Estimated coverage probability = 29 / 65 = 45%
• Positive tendency: 74% of estimates have RR > 1
• Error distribution demonstrates positive bias (expected value > 1) and substantial variability
• Bias – expected difference between true RR and estimated RR
• Mean squared error – sum of variance and squared bias of the estimated RR
• Coverage probability - % of drugs where true RR is contained within estimated 95% confidence interval
• Real data: negative controls, assume true RR = 1
• Can’t use positive controls in real data if you don’t know true RR
• Simulated data: positive controls, inject true RR = 1, 1.25, 1.5, 2, 4, 10
• Discrimination (AUC) – probability that estimate can distinguish between no effect and positive effect
• AUC can use any rank-order statistic (RR, p-value)
• AUC only assumes true RR should be bigger for positive controls than negative controls
• Can be/has been studied in both real and simulated data
• Sensitivity/specificity – expected operating characteristics of a procedure at a defined decision threshold
• Decision threshold can be any dichotomous criterion (ex: RR > 2, p < 0.05, lower bound of the RR CI > 1.5)
• Sensitivity - % of positive controls that meet decision threshold
• Specificity - % of negative controls that do not meet decision threshold
• Can set desired sensitivity or specificity to determine decision threshold
• Can be/has been studied in both real and simulated data
Measures of accuracy used in OMOP’s evaluations
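A minimal sketch of how these measures can be computed over a labeled set of controls. The RR values and the RR > 2 threshold below are illustrative assumptions, not OMOP results:

```python
import math

# Illustrative RR estimates for negative controls (true RR assumed = 1)
# and positive controls (true RR assumed > 1).
neg_rr = [0.9, 1.3, 2.2, 1.1, 0.8]
pos_rr = [2.5, 1.8, 4.0, 3.1]

# MSE of log RR for negative controls (true log RR = 0)
mse = sum(math.log(rr) ** 2 for rr in neg_rr) / len(neg_rr)

# AUC via the Mann-Whitney construction: probability that a positive
# control ranks above a negative control (ties count half)
pairs = [(p, n) for p in pos_rr for n in neg_rr]
auc = sum((p > n) + 0.5 * (p == n) for p, n in pairs) / len(pairs)

# Sensitivity/specificity at the decision threshold RR > 2
threshold = 2.0
sensitivity = sum(rr > threshold for rr in pos_rr) / len(pos_rr)
specificity = sum(rr <= threshold for rr in neg_rr) / len(neg_rr)

print(f"MSE: {mse:.3f}, AUC: {auc:.2f}, "
      f"sens: {sensitivity:.2f}, spec: {specificity:.2f}")
```

Because the AUC depends only on rank order, any rank-order statistic (RR, p-value) could replace the raw RR here, as the slide notes.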
Comparing accuracy of cohort and self-controlled designs
Data: MarketScan Medicare Supplemental Beneficiaries (MDCR); HOI: 'GI Bleeding', broad definition
[Figure: discrimination, error, and coverage panels for three methods]
• CM: 21000214: New user cohort, propensity score stratification, with active comparator (drugs known to be negative controls for outcome)
• SCCS: 1955010: Multivariate self-controlled case series, including all events, and defining time-at-risk as all-time post-exposure
• OS: 403002: Self-controlled cohort design, including all exposures and outcomes, defining time-at-risk and control time as length of exposure + 30d
Per-method error statistics (in slide order): Bias: -0.40, MSE: 0.31, Mean SE: 0.03; Bias: -0.32, MSE: 0.22, Mean SE: 0.05; Bias: -0.21, MSE: 0.31, Mean SE: 0.10
Observation: Analyses have different error distributions, but all methods have low coverage probability
Potential solution: empirical calibration to adjust estimate/standard error for observed bias and residual error
CC: 2000314, CCAE, GI Bleed
Case-control estimates for GI bleed negative controls:
• Using the theoretical null: 55% have p < .05
• Using the empirical null: 6% have p < .05

Intuition for empirical calibration: you can use the empirical null to adjust the original estimate by 'shifting' for bias and 'stretching' for variance in the error distribution at each true effect size.

Ex: Clopidogrel-bleeding: pre-calibration CI: (1.79, 1.93); post-calibration: (0.79, 4.57)
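A simplified sketch of the shift-and-stretch idea, assuming the empirical null is summarized by the mean and standard deviation of the negative-control log RR errors. (The actual OMOP calibration fits the systematic error distribution more carefully, accounting for each control's own standard error; the negative-control RRs below are illustrative.)

```python
import math

# Illustrative negative-control RR estimates; true RR assumed = 1,
# so their log RRs are draws from the empirical error distribution.
neg_log_rr = [math.log(rr) for rr in (0.9, 1.4, 2.0, 1.2, 0.8, 1.6)]

n = len(neg_log_rr)
mu = sum(neg_log_rr) / n                                   # bias: the 'shift'
sd = math.sqrt(sum((x - mu) ** 2 for x in neg_log_rr) / (n - 1))  # spread: the 'stretch'

def calibrate(rr, lo, hi):
    """Recentre an estimate for bias and inflate its SE by the empirical spread."""
    log_rr = math.log(rr)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)   # SE implied by the 95% CI
    se_cal = math.sqrt(se ** 2 + sd ** 2)             # sampling + systematic error
    centre = log_rr - mu
    return (math.exp(centre),
            math.exp(centre - 1.96 * se_cal),
            math.exp(centre + 1.96 * se_cal))

# Clopidogrel example from the slide: the calibrated CI is far wider
rr_cal, lo_cal, hi_cal = calibrate(1.86, 1.79, 1.93)
print(f"calibrated: {rr_cal:.2f} ({lo_cal:.2f}, {hi_cal:.2f})")
```

With these toy negative controls the calibrated interval crosses RR = 1, mirroring the slide's point that a tight uncalibrated CI can vastly understate the true uncertainty.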
Original coverage probability = 75%; calibrated coverage probability = 96%
6 original estimates did not contain the true RR = 1; after calibration, only 1 estimate does not
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=1.00
Original coverage probability = 54%; calibrated coverage probability = 96%
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=1.25
Original coverage probability = 46%; calibrated coverage probability = 92%
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=1.50
Original coverage probability = 42%; calibrated coverage probability = 92%
Applying case-control design and calibrating estimates of positive controls in simulated data, RR=2.00
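The simulated-control results above can be mimicked with a toy simulation: inject a known true RR, add systematic bias and spread on top of sampling noise, and compare nominal versus calibrated coverage. The bias, spread, and SE values below are illustrative assumptions, not OMOP parameters:

```python
import math
import random

# Toy simulation: biased log-RR estimates around an injected true RR.
random.seed(42)
true_rr, bias, extra_sd, se = 2.0, 0.25, 0.30, 0.10

hits_orig = hits_cal = 0
n_sim = 2000
for _ in range(n_sim):
    # estimate = truth + systematic bias + systematic spread + sampling noise
    est = (math.log(true_rr) + bias
           + random.gauss(0, extra_sd) + random.gauss(0, se))

    # original CI uses only the sampling SE
    hits_orig += abs(est - math.log(true_rr)) <= 1.96 * se

    # calibrated CI shifts for bias and stretches for the systematic spread
    se_cal = math.sqrt(se ** 2 + extra_sd ** 2)
    hits_cal += abs((est - bias) - math.log(true_rr)) <= 1.96 * se_cal

print(f"original coverage:   {hits_orig / n_sim:.0%}")
print(f"calibrated coverage: {hits_cal / n_sim:.0%}")
```

The uncalibrated coverage falls well below the nominal 95% because the CI reflects only sampling variability, while calibration restores roughly nominal coverage, which is the pattern the preceding slides report.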
Comparing accuracy of cohort and self-controlled designs, after empirical calibration
Data: MDCR; HOI: 'GI Bleeding', broad definition
[Figure: discrimination, error, and coverage panels for three methods]
• CM: 21000214: New user cohort, propensity score stratification, with active comparator (drugs known to be negative controls for outcome)
• SCCS: 1955010: Multivariate self-controlled case series, including all events, and defining time-at-risk as all-time post-exposure
• OS: 403002: Self-controlled cohort design, including all exposures and outcomes, defining time-at-risk and control time as length of exposure + 30d
Per-method error statistics (in slide order): Bias: 0.04, MSE: 0.33, Mean SE: 0.67; Bias: 0.00, MSE: 0.11, Mean SE: 0.25; Bias: -0.02, MSE: 0.38, Mean SE: 0.36
Observation: Calibration does not influence discrimination, but tends to improve bias, MSE, and coverage
• Systematic exploration of negative and positive controls can be used to augment observational studies to measure analysis operating characteristics
• Errors in observational studies were observed to be differential by analysis design, data source, and outcome
• Magnitude and direction of bias varied, but all analyses had error distributions far from nominal
• Traditional interpretation of the 95% confidence interval, that the CI covers the true effect size 95% of the time, may be misleading in the context of observational database studies
– Coverage probability was much lower across all methods and all outcomes
– Sampling variability is a small portion of the true uncertainty in any study
• Empirical calibration is one approach to attempt to account for the residual error that should be expected within any observational analysis
Concluding thoughts