Data Quality (a.k.a. “Data Heterogeneity”)
Kent Bailey, Susan Rea Welch, Lacey Hart, Kevin Bruce, Susan Fenton
Objectives
Assess data variability within and across institutions
Assess the impact of this variability on secondary use of EMR data
Generate specifications for widgets:
– “Warning label” for suspect data categories
– Data quality audits with logs
– Batch data correction / removal
Current Research: Effects of Variation on a Diabetes Phenotyping Algorithm
Purpose: Compare data relevant to the eMERGE Type 2 DM phenotyping algorithm between Intermountain and Mayo
Methods:
1. Identify adult subjects with evidence in any semantic category of the algorithm:
– ICD-9-CM codes for diabetes mellitus
– Abnormal glucose or HbA1c
– Antihyperglycemic medications
– Capillary glucose (glucometer) procedures
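Step 1 is a simple inclusion filter. The sketch below assumes a hypothetical per-subject summary; the `Subject` fields are illustrative and are not an eMERGE or site schema. It also takes “adult” to mean age 18 or older, which the slides do not specify. The eMERGE algorithm's actual code lists and lab thresholds are not reproduced here.

```python
from dataclasses import dataclass, field

ADULT_AGE = 18  # assumption: "adult" means age >= 18 (not stated on the slide)


@dataclass
class Subject:
    """Hypothetical per-subject summary; field names are illustrative only."""
    subject_id: str
    age: int
    dm_icd9_codes: list[str] = field(default_factory=list)
    abnormal_glucose_or_hba1c: bool = False
    antihyperglycemic_meds: list[str] = field(default_factory=list)
    capillary_glucose_procedures: int = 0


def has_any_evidence(s: Subject) -> bool:
    """True if the subject has evidence in at least one semantic category."""
    return (
        bool(s.dm_icd9_codes)
        or s.abnormal_glucose_or_hba1c
        or bool(s.antihyperglycemic_meds)
        or s.capillary_glucose_procedures > 0
    )


def select_cohort(subjects: list[Subject]) -> list[Subject]:
    """Keep adults with evidence in any category of the algorithm."""
    return [s for s in subjects if s.age >= ADULT_AGE and has_any_evidence(s)]
```

Because the categories are OR-ed, a subject with only glucometer procedures qualifies just as one with a DM diagnosis code does. This is why the cohort can include subjects with no DM ICD-9 codes at all.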
Methods
2. Collect relevant data on these subjects:
– ICD-9-CM codes
– Procedure codes
– Demographic data
– Smoking status
– Body mass index
– Specialty of provider
– Geographic information
– Frequency of health care encounters
3. Describe variation between institutions
Analysis
Compare (between institutions) frequencies of data elements:
– ICD-9 codes, overall and specific codes
Compare lab values:
– Number and values
Compare medications
Control for:
– Provider specialty
– Geographic variables
– Demographic variables
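The slides do not name the statistical test used to compare frequencies between institutions. One possibility is a Pearson chi-square test on a 2 × 2 table of site by presence of a data element, sketched below. This version does not control for specialty, geography, or demographics. Those adjustments would call for a stratified method (for example Cochran-Mantel-Haenszel) or regression.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (1 df, no continuity correction) for the table

                  has element   lacks element
        site A         a              b
        site B         c              d

    Returns (statistic, p-value). Every row and column total must be > 0.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

For example, the made-up counts `chi_square_2x2(10, 20, 20, 10)` give a statistic of 20/3 (about 6.67) and a p-value just under 0.01.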
Interpretation
Assess the impact of data heterogeneity on phenotyping at different institutions
Recommendations for:
– High-throughput phenotyping
– High-throughput screening for clinical trials
Generalization to other phenotypes
Hypothesis generation
Preliminary Mayo Results
Mayo data (subjects with DM ICD-9 codes, abnormal labs, or capillary glucose; limited to Olmsted and surrounding counties):
– 13,754 subjects: 89% Caucasian, 2.5% African-American, 2.0% Asian, 6.5% Native American, Pacific Islander, other, unknown, or refused
– Mean current age 64 (range 20 to 104)
– Sex: 53% male, 47% female
Preliminary Mayo Results (N = 13,754)
Smoking (n = 11,626):
– Current 66%, past 16%, never 13%, unknown 6%
BMI (limited to values < 60; n = 6,338):
– Mean 32.6 ± 7.2
– Median 31.6, quartiles (27.5, 36.6)
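The BMI summary could be reproduced from raw values roughly as follows. This is a sketch. The slides do not say which quartile definition or standard-deviation convention was used, so the `inclusive` quantile method and the sample SD below are assumptions.

```python
import statistics

BMI_UPPER_LIMIT = 60.0  # the slide restricts BMI to values below 60


def summarize_bmi(values: list[float]) -> dict:
    """n, mean, SD, median and quartiles of BMI values below the upper limit."""
    kept = [v for v in values if v < BMI_UPPER_LIMIT]
    # Assumption: inclusive (linear-interpolation) quartiles; needs >= 2 values.
    q1, _, q3 = statistics.quantiles(kept, n=4, method="inclusive")
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),  # sample standard deviation
        "median": statistics.median(kept),
        "q1": q1,
        "q3": q3,
    }
```

Applying the cap before summarizing matters. A single implausible entry (for example a BMI of 300 from a unit error) would otherwise inflate the mean and SD far more than the median.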
Preliminary Results: ICD-9 Codes
Complications (fourth digit of 250.x):
– None: 6,743 (250.0)
– Ketoacidosis: 1 (250.1)
– Hyperosmolality: 2 (250.2)
– Renal: 398 (250.4)
– Ophthalmic: 1,385 (250.5)
– Neuro: 586 (250.6)
– Peripheral circulatory: 25 (250.7)
– “Other specified”: 312 (250.8)
– Unspecified: 336 (250.9)
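The complication categories above come from the fourth digit of the ICD-9-CM 250.x code. The lookup below is a minimal sketch that assumes codes arrive as strings, with or without the decimal point. The category labels are the slide's own. Code 250.3, which the slide does not list, maps to `None`.

```python
import re
from typing import Optional

# Fourth-digit categories as listed on the slide (250.3 is not listed there).
COMPLICATION_BY_FOURTH_DIGIT = {
    "0": "None",
    "1": "Ketoacidosis",
    "2": "Hyperosmolality",
    "4": "Renal",
    "5": "Ophthalmic",
    "6": "Neuro",
    "7": "Peripheral circulatory",
    "8": "Other specified",
    "9": "Unspecified",
}

# Matches "250.4", "250.42" or undotted "25042"; the fifth digit is optional.
ICD9_DM = re.compile(r"^250\.?(\d)(\d)?$")


def complication_category(code: str) -> Optional[str]:
    """Return the slide's complication category for a 250.x code, else None."""
    match = ICD9_DM.match(code.strip())
    if match is None:
        return None  # not a diabetes (250.x) code
    return COMPLICATION_BY_FOURTH_DIGIT.get(match.group(1))
```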
Preliminary Results: ICD-9 Codes
Fifth digit of 250.xx:
250.X0 Type 2 or unspecified, controlled or not specified as uncontrolled
250.X1 Type 1, controlled or not specified as uncontrolled
250.X2 Type 2 or unspecified, uncontrolled
250.X3 Type 1, uncontrolled
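The fifth digit encodes diabetes type and control status. The mapping below is a hedged sketch, and the `DmFifthDigit` name is hypothetical. Note that `uncontrolled=False` means only “not specified as uncontrolled”. It does not mean the diabetes is known to be controlled.

```python
from typing import NamedTuple, Optional


class DmFifthDigit(NamedTuple):
    dm_type: str        # "Type 2/U" (type 2 or unspecified) or "Type 1"
    uncontrolled: bool  # True only when the code states "uncontrolled"


FIFTH_DIGIT = {
    "0": DmFifthDigit("Type 2/U", False),
    "1": DmFifthDigit("Type 1", False),
    "2": DmFifthDigit("Type 2/U", True),
    "3": DmFifthDigit("Type 1", True),
}


def classify_fifth_digit(code: str) -> Optional[DmFifthDigit]:
    """Classify a five-digit 250.xy code by its fifth digit y; else None."""
    digits = code.strip().replace(".", "")
    if len(digits) != 5 or not digits.isdigit() or not digits.startswith("250"):
        return None
    return FIFTH_DIGIT.get(digits[4])
```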
Type 2/U vs. Type 1 DM Codes
Mayo data: n = 13,707

| Type 1 DM codes | Type 2/U DM codes: 0 | Type 2/U DM codes: 1+ |
| --- | --- | --- |
| 0 | 6,339 (46%) | 6,631 (48%) |
| 1+ | 483 (4%) | 254 (2%) |
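The table cross-classifies each subject by whether they have any Type 1 codes (rows) and any Type 2/U codes (columns). The sketch below tabulates this, assuming a hypothetical per-subject pair of code counts as input. Given synthetic subjects that match the published cell counts, it returns the same rounded percentages (46%, 48%, 4%, 2%).

```python
from collections import Counter


def cross_tab(subjects: list[tuple[int, int]]) -> dict:
    """Cross-tabulate subjects by (Type 1 code count, Type 2/U code count).

    Each subject is a hypothetical (n_type1_codes, n_type2u_codes) pair.
    Returns {(type1_bin, type2u_bin): (count, rounded percent of all subjects)},
    where each bin is "0" or "1+".
    """
    def to_bin(n: int) -> str:
        return "1+" if n >= 1 else "0"

    counts = Counter((to_bin(t1), to_bin(t2)) for t1, t2 in subjects)
    total = sum(counts.values())
    return {
        cell: (count, round(100 * count / total))
        for cell, count in sorted(counts.items())
    }
```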
Intermountain peek (sic)

| Type 1 ICD-9 codes | Type 2/U ICD-9 codes: 0 | Type 2/U ICD-9 codes: 1+ |
| --- | --- | --- |
| 0 | -- | 65,983 |
| 1+ | 2,083 | 6,629 |

Disclaimer: don't assume the data are ready to compare between sites at this point.
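Back to Mayo Summary: Sample Lab Data

| Test name | N | Min | 1% | Med. | 99% | Max |
| --- | --- | --- | --- | --- | --- | --- |
| Glucose (P), mg/dL | 40,786 | 1 | 67 | 127 | 394 | 1,300 |
| Glucose POCT, mg/dL | 211,746 | 25 | 63 | 141 | 392 | 600 |
| Hemoglobin A1c, B | 35,206 | 4.0% | 5.1% | 6.9% | 12.1% | 16.7% |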
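A per-test summary in the same shape as the table (N, min, 1st percentile, median, 99th percentile, max) could be computed as below. The percentile method is an assumption: `statistics.quantiles` with `method="inclusive"`, which needs at least two values. The slides do not say how their percentiles were computed.

```python
import statistics


def lab_summary(values: list[float]) -> dict:
    """N, min, 1st percentile, median, 99th percentile and max of one lab test."""
    # 99 cut points: cuts[0] is the 1st percentile, cuts[98] the 99th.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "N": len(values),
        "Min": min(values),
        "1%": cuts[0],
        "Med.": statistics.median(values),
        "99%": cuts[98],
        "Max": max(values),
    }
```

Reporting the 1% and 99% points next to Min and Max may make isolated extremes easy to spot, such as the plasma glucose minimum of 1. This is the kind of suspect value the “warning label” objective targets.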
Future Directions
Carry out inter-institution comparison
Study effects of geography, race, etc.
Implement chart review (on a random sample) for a “gold standard” definition of Type 2 DM
Use lab values / meds to define a continuous phenotype (“DM-ness”)
Extrapolation / generalization to other diseases / phenotypes
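For the chart review, a reproducible simple random sample of subjects could be drawn as sketched here. The function name and the seed are hypothetical. The slides do not specify the sample size or whether sampling should be stratified (for example by site).

```python
import random


def chart_review_sample(subject_ids: list[str], k: int, seed: int) -> list[str]:
    """Simple random sample of k distinct subjects, reproducible for a given seed."""
    rng = random.Random(seed)  # private generator; global random state untouched
    # Sorting first makes the draw independent of the order IDs arrive in.
    return rng.sample(sorted(subject_ids), k)
```

Recording the seed alongside the sample lets reviewers at another site regenerate exactly the same list.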
Data Quality (a.k.a. “Data Heterogeneity”)
Susan Rea Welch
Conclusions: PhD Research
Cohort amplification:
– Knowledge Discovery from Databases (KDD)
– Associative classification methods
– Classification rules for diabetes and asthma were comparably accurate, concise, and consistent with domain knowledge
– Contributed new knowledge: attributes for cohort identification and unanticipated comorbidity associations
Consistency and Novelty: Diabetes
Elevated quantitative lab glucose assays:
– Frequency 19%, likelihood 87%
– Less predictive than glucometer glucose or urine microalbumin
Abnormal HbA1c test:
– Predictive power equivalent to that of an HbA1c test order
Antihyperglycemic medications:
– Variable predictive strength: metformin, insulin, insulin release stimulators, insulin response enhancers
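The slides do not define “Frequency” and “Likelihood”. In associative classification they plausibly correspond to a rule's support and confidence. Under that assumption, the sketch below computes both measures for a rule such as {elevated glucose} → diabetes. The item and label names are hypothetical.

```python
def rule_metrics(
    transactions: list[tuple[frozenset[str], str]],
    antecedent: frozenset[str],
    label: str,
) -> tuple[float, float]:
    """Support and confidence of the class association rule antecedent -> label.

    support    = share of all transactions containing the antecedent AND the label
    confidence = share of antecedent-containing transactions that have the label
    """
    n = len(transactions)
    covered = [lbl for items, lbl in transactions if antecedent <= items]
    hits = sum(1 for lbl in covered if lbl == label)
    support = hits / n
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence
```

High confidence with low support describes a rule that is accurate when it fires but fires rarely. This trade-off is why both figures are worth reporting.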
Consistency and Novelty: Asthma
Medications were most predictive:
– High likelihood: salmeterol, leukotriene receptor antagonists
– Albuterol / glucocorticoid, combined with: pulmonary procedures (CPT hierarchy), female gender, abnormal CBC
Unexpected comorbidity associations:
– Suggest discovery of shared pathways
Associative Classification – What?
• Pattern discovery in a transaction database
• Independent of domain expertise
• Deductive, global associations in data
• Induce a general & accurate classifier
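The following is a deliberately simplified associative classifier in the spirit of CBA. It mines class association rules (item set → class) above minimum support and confidence thresholds, ranks them, and labels a new record with the first rule it matches. The slides do not say which associative classification algorithm was used. This brute-force version (rules of at most two items, no Apriori candidate pruning, no database-coverage pruning) is an illustrative assumption, and the thresholds are arbitrary.

```python
from collections import Counter
from itertools import combinations

Transaction = tuple[frozenset[str], str]          # (items, class label)
Rule = tuple[frozenset[str], str, float, float]   # (antecedent, label, support, confidence)


def mine_rules(data: list[Transaction], min_support: float, min_conf: float,
               max_len: int = 2) -> list[Rule]:
    """Enumerate item sets of up to max_len items and keep qualifying class rules."""
    n = len(data)
    all_items = sorted(set().union(*(items for items, _ in data)))
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(all_items, size):
            antecedent = frozenset(combo)
            labels = Counter(lbl for items, lbl in data if antecedent <= items)
            covered = sum(labels.values())
            for label, hits in labels.items():
                support, confidence = hits / n, hits / covered
                if support >= min_support and confidence >= min_conf:
                    rules.append((antecedent, label, support, confidence))
    # CBA-style ranking: higher confidence, then higher support, then fewer items.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0]), sorted(r[0])))
    return rules


def classify(rules: list[Rule], items: frozenset[str], default: str) -> str:
    """Predict with the highest-ranked rule whose antecedent the record contains."""
    for antecedent, label, _, _ in rules:
        if antecedent <= items:
            return label
    return default
```

Each rule reads as a human-checkable statement (“records containing albuterol are asthma, confidence 1.0”), which matches the slides' emphasis on understandable rules. On real EHR data the number of candidate item sets grows quickly. For that reason, practical implementations use Apriori-style or FP-growth mining rather than enumerating every combination.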
Associative Classification – Why?
• No domain expertise needed for attribute selection
• Not affected by missing data
• Proven accuracy
• Understandable rules
• Independent rules
Core Candidate Attributes
• Diagnosis codes
• Provider specialty
• Lab observations
• Procedure codes
• ‘Abnormal’ lab observations
• Imaging procedures
• Medication list
• Age groups
• Female gender
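Before rule mining, each patient's data must be flattened into a set of items. The sketch below encodes the candidate attributes listed above. The `PatientRecord` fields, the `attribute=value` item format, and the 10-year age bands are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical EHR extract; field names are illustrative, not a real schema."""
    diagnosis_codes: list[str] = field(default_factory=list)
    provider_specialties: list[str] = field(default_factory=list)
    lab_observations: list[str] = field(default_factory=list)
    abnormal_lab_observations: list[str] = field(default_factory=list)
    procedure_codes: list[str] = field(default_factory=list)
    imaging_procedures: list[str] = field(default_factory=list)
    medications: list[str] = field(default_factory=list)
    age: int = 0
    female: bool = False


def age_group(age: int) -> str:
    """Hypothetical 10-year age bands, e.g. 47 -> '40-49'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def to_transaction(p: PatientRecord) -> frozenset[str]:
    """Encode one patient as a set of 'attribute=value' items."""
    items: set[str] = set()
    items.update(f"dx={c}" for c in p.diagnosis_codes)
    items.update(f"specialty={s}" for s in p.provider_specialties)
    items.update(f"lab={t}" for t in p.lab_observations)
    items.update(f"abnormal_lab={t}" for t in p.abnormal_lab_observations)
    items.update(f"proc={c}" for c in p.procedure_codes)
    items.update(f"imaging={c}" for c in p.imaging_procedures)
    items.update(f"med={m}" for m in p.medications)
    items.add(f"age_group={age_group(p.age)}")
    if p.female:
        items.add("female")  # only presence is encoded, mirroring "Female gender"
    return frozenset(items)
```

An attribute with no data simply contributes no items, so absent values need no imputation in this representation.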
SHARPn Y2 Research Aims
Are associations reliable across EHRs?
Can algorithms' sensitivity / specificity be improved?
– AC attribute selection plus other classifiers
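Judging whether sensitivity and specificity improve requires comparing algorithm output with a reference standard, such as the chart-review “gold standard” mentioned under Future Directions. The generic sketch below computes both measures from paired predictions. It is not the project's evaluation code.

```python
def sensitivity_specificity(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Raises ZeroDivisionError if the reference has no positives or no negatives.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    return tp / (tp + fn), tn / (tn + fp)
```

Sensitivity and specificity usually trade off. For example, adding an attribute as an alternative inclusion criterion tends to raise sensitivity at some cost to specificity. Both should therefore be reported for each candidate algorithm.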