Automated Differential Diagnosis of Heart Disease In Emergency Department Diyang Xue Dec 07, 2018 University Of Pittsburgh


Automated Differential Diagnosis of Heart Disease In Emergency Department

Diyang Xue, Dec 07, 2018

University Of Pittsburgh


Differential Diagnosis

● In medicine, a differential diagnosis is the distinguishing of a particular disease or condition from others that present similar clinical features
● Differential diagnostic procedures are used by physicians and other trained medical professionals to diagnose the specific disease in a patient


Hypothesis statement

● Differential diagnosis can be performed systematically with the help of machine learning algorithms, reaching the accuracy and effectiveness of experienced physicians
● In other words, we want to mimic the physician's diagnostic process: find the most useful questions and reach a diagnosis in the fewest steps


Decision Tree algorithm


Data

● Emergency department visits whose primary diagnosis is heart disease, from 15 UPMC hospitals
● Years 2008-2013 as training data, year 2014 as test data
  ○ Training data: 91,036
  ○ Test data: 26,193
● Heart disease
  ○ There are 239 ICD-9-CM codes for heart diseases
  ○ The Clinical Classifications Software (CCS)
  ○ 11 categories in total
  ○ Based on a physician's suggestion: delete 7.2.5; merge 7.2.8 and 7.2.9 into 7.2.89; merge 7.2.1, 7.2.6, 7.2.7, and 7.2.10 into "other"


CCS level | Level name | Train (2008-2013) | Test (2014) | Total
7.2.1 | Heart valve disorders | 132 | 62 | 194
7.2.2 | Peri-; endo-; and myocarditis; cardiomyopathy (except that caused by TB or STD) | 364 | 117 | 481
7.2.3 | Acute myocardial infarction | 1014 | 327 | 1341
7.2.4 | Coronary atherosclerosis and other heart disease | 2597 | 788 | 3385
7.2.5 | Nonspecific chest pain | 69234 | 19317 | 88551
7.2.6 | Pulmonary heart disease | 309 | 138 | 447
7.2.7 | Other and ill-defined heart disease | 51 | 21 | 72
7.2.8 | Conduction disorders | 229 | 85 | 314
7.2.9 | Cardiac dysrhythmias | 12925 | 4078 | 17003
7.2.10 | Cardiac arrest and ventricular fibrillation | 1980 | 485 | 2465
7.2.11 | Congestive heart failure; nonhypertensive | 2201 | 775 | 2976
SUM | | 91036 | 26193 | 117229



CCS level (class label) | Level name | Train (2008-2013) | Test (2014) | Total
7.2.2 (0) | Peri-; endo-; and myocarditis; cardiomyopathy (except that caused by TB or STD) | 364 | 117 | 481
7.2.3 (1) | Acute myocardial infarction | 1014 | 327 | 1341
7.2.4 (2) | Coronary atherosclerosis and other heart disease | 2597 | 788 | 3385
7.2.89 (3) | Conduction disorders, Cardiac dysrhythmias | 13154 | 4163 | 17317
7.2.11 (4) | Congestive heart failure; nonhypertensive | 2201 | 775 | 2976
OTHER (7.2.1, 7.2.6, 7.2.7, 7.2.10) (5) | Heart valve disorders, Pulmonary heart disease, Other and ill-defined heart disease, Cardiac arrest and ventricular fibrillation | 2472 | 706 | 3178
SUM | | 21802 | 6876 | 28678
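The relabeling behind this merged table can be sketched in a few lines. This is a minimal illustration, not the author's code: the per-category counts are copied from the CCS table, the class indices 0-5 follow the merged table, and the names `CCS_COUNTS`, `MERGED_CLASS`, and `merge_counts` are hypothetical.

```python
from collections import Counter

# Per-category (train, test) visit counts copied from the CCS table.
CCS_COUNTS = {
    "7.2.1": (132, 62),
    "7.2.2": (364, 117),
    "7.2.3": (1014, 327),
    "7.2.4": (2597, 788),
    "7.2.5": (69234, 19317),
    "7.2.6": (309, 138),
    "7.2.7": (51, 21),
    "7.2.8": (229, 85),
    "7.2.9": (12925, 4078),
    "7.2.10": (1980, 485),
    "7.2.11": (2201, 775),
}

# Merge rules from the slides. 7.2.5 is absent on purpose: it is deleted.
MERGED_CLASS = {
    "7.2.2": 0,
    "7.2.3": 1,
    "7.2.4": 2,
    "7.2.8": 3, "7.2.9": 3,                             # merged into 7.2.89
    "7.2.11": 4,
    "7.2.1": 5, "7.2.6": 5, "7.2.7": 5, "7.2.10": 5,    # merged into "other"
}

def merge_counts(counts):
    """Return {class_index: (train, test)} after deleting and merging."""
    train, test = Counter(), Counter()
    for code, (n_train, n_test) in counts.items():
        label = MERGED_CLASS.get(code)  # None means the category is dropped
        if label is None:
            continue
        train[label] += n_train
        test[label] += n_test
    return {label: (train[label], test[label]) for label in sorted(train)}

merged = merge_counts(CCS_COUNTS)
for label, (n_train, n_test) in merged.items():
    print(label, n_train, n_test, n_train + n_test)
```

Running it reproduces the row totals of the merged table, which is a quick consistency check on the two tables.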


Data

● Features
  ○ Demographic data and discharge reports (parsed by MedLEE), keeping features of the semantic types "finding" and "sign and symptom"
  ○ 8468 features
    ■ 5 demographic features: Gender, Race, Age, Income, Insurance
    ■ 8463 NLP features


Decision tree algorithm

● scikit-learn package
  ○ CART algorithm
● Gini index
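To make the Gini criterion concrete, the sketch below computes the Gini impurity of a set of labels and picks the binary feature whose split reduces it the most, which is the core step CART repeats at every node. It is an illustration in plain Python rather than the scikit-learn implementation; the feature names and toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gain(feature, labels):
    """Impurity decrease from splitting on a binary (0/1) feature."""
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted

def best_question(features, labels):
    """Return the feature name whose split reduces Gini impurity the most."""
    return max(features, key=lambda name: split_gain(features[name], labels))

# Toy data (hypothetical): two binary findings, three visits per class.
labels = [1, 1, 1, 3, 3, 3]
features = {
    "chest_pain": [1, 1, 1, 0, 0, 0],    # separates the two classes perfectly
    "palpitations": [0, 1, 0, 1, 0, 1],  # only weakly informative
}
print(best_question(features, labels))
```

In scikit-learn the same criterion is selected with `DecisionTreeClassifier(criterion="gini")`, which is also the default.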


Performance (accuracy)

● 10-fold cross-validation: max_depth: 12; min_samples_leaf: 30
● Accuracy: 0.7686
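The slides do not show how max_depth and min_samples_leaf were chosen, so the following is only a sketch of one common approach: score every parameter combination with 10-fold cross-validation and keep the best mean score. The helper names are hypothetical, the folds are contiguous (no shuffling or stratification), and `dummy_fit_and_score` is a placeholder rather than a real model.

```python
from itertools import product

def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) for k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def grid_search_cv(n_samples, grid, fit_and_score, k=10):
    """Return (best_params, best_mean_score) over every grid combination."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train_idx, val_idx)
                  for train_idx, val_idx in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

# Placeholder scorer (NOT a real model): its peak is hard-coded, so the
# result says nothing about the real data. It only exercises the loop.
def dummy_fit_and_score(params, train_idx, val_idx):
    return (-abs(params["max_depth"] - 12)
            - abs(params["min_samples_leaf"] - 30) / 10)

grid = {"max_depth": [10, 12, 14], "min_samples_leaf": [10, 30]}
best_params, best_score = grid_search_cv(100, grid, dummy_fit_and_score)
print(best_params)
```

With scikit-learn, `GridSearchCV` wrapped around a `DecisionTreeClassifier` with `cv=10` performs the same search using a real model.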


Performance (F1 score)

● 10-fold cross-validation: max_depth: 14; min_samples_leaf: 10
● Accuracy: 0.7695; F1 score: 0.522
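The gap between accuracy and F1 is typical of imbalanced classes: class 3 (7.2.89) alone accounts for 13154 of the 21802 training visits. The slides do not say how the F1 score is averaged, so the sketch below assumes macro averaging (the unweighted mean of per-class F1). The toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_f1(y_true, y_pred):
    """Return {label: F1} from one-vs-rest true/false positive counts."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Toy example: a classifier that always predicts the majority class.
y_true = [3] * 8 + [0] * 2
y_pred = [3] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Here the always-majority classifier reaches 0.8 accuracy while its macro F1 is well below that, because the minority class contributes an F1 of zero. In scikit-learn the corresponding call is `f1_score(y_true, y_pred, average="macro")`.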


Why is the performance so poor?

● Patients are grouped by their primary diagnosis

● A patient may have multiple heart diseases at the same time


Pure data

Training Data 10,698

Test Data 3,195


Pure data -- imbalanced data -- accuracy

● 10-fold cross-validation: max_depth: 12; min_samples_leaf: 10
● Accuracy: 0.8948
● Total: 10698


Pure data -- imbalanced data -- F1 score

● 10-fold cross-validation: max_depth: 18; min_samples_leaf: 10
● Accuracy: 0.8910
● Total: 10698


Pure data -- balanced data -- undersampling

● 10-fold cross-validation: max_depth: 10; min_samples_leaf: 10
● Accuracy: 0.8013
● Total: 1899


Pure data -- balanced data -- oversampling

● 10-fold cross-validation: max_depth: 14; min_samples_leaf: 10
● Accuracy: 0.7775
● Total: 8270
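The slides do not describe the exact resampling scheme, so the sketch below shows only the generic idea behind the two balanced variants: random undersampling shrinks large classes without replacement, and random oversampling grows small classes by drawing duplicates with replacement. The function name `resample`, the choice of target size, and the toy data are assumptions.

```python
import random
from collections import Counter, defaultdict

def resample(rows, labels, target_size, seed=0):
    """Resample every class to exactly target_size rows.

    Classes with at least target_size rows are undersampled without
    replacement; smaller classes keep all rows and are padded with
    randomly drawn duplicates (oversampling with replacement).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    out_rows, out_labels = [], []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) >= target_size:
            chosen = rng.sample(members, target_size)
        else:
            extra = rng.choices(members, k=target_size - len(members))
            chosen = members + extra
        out_rows.extend(chosen)
        out_labels.extend([label] * target_size)
    return out_rows, out_labels

# Toy data (hypothetical): row ids 0-11 with three imbalanced classes.
rows = list(range(12))
labels = [3] * 6 + [0] * 2 + [1] * 4
counts = Counter(labels)

under = resample(rows, labels, min(counts.values()))  # shrink to smallest
over = resample(rows, labels, max(counts.values()))   # grow to largest
print(Counter(under[1]), Counter(over[1]))
```

When resampling is combined with cross-validation, it is usually applied to the training folds only, so that duplicated rows cannot appear on both sides of a split.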


Decision Tree framework

● Try algorithms other than the classic decision tree algorithm to choose the best split node


Performance -- Random Forest

● 10-fold cross-validation: max_depth: 14; min_samples_leaf: 10
● Accuracy: 0.8920


Future work

● We cannot find a good split point for certain categories: 0 and 1
● Future work:
  ○ Combine external medical knowledge with the pure machine learning algorithm
  ○ Compare the performance of our model with that of physicians


External knowledge

1. Improve classification performance
2. Make the decision tree more clinically meaningful


External Knowledge

C0000727 Abdomen, Acute

C0000731 Abdomen distended

C0000734 Abdominal mass

C0000737 Abdominal Pain
