Metis Project 3: Predicting Heart Disease Diagnosis

DIAGNOSING CORONARY ARTERY DISEASE BY GENDER JAMIE FRADKIN FEBRUARY 19, 2016

Transcript of Metis Project 3: Predicting Heart Disease Diagnosis

Page 1: Metis Project 3: Predicting Heart Disease Diagnosis

DIAGNOSING CORONARY ARTERY DISEASE BY GENDER
JAMIE FRADKIN
FEBRUARY 19, 2016

Page 2: Metis Project 3: Predicting Heart Disease Diagnosis

PROBLEM STATEMENT

Evaluate a classification algorithm that sensitively diagnoses CAD in men and women in order to determine which features of each gender are the most influential in predicting this diagnosis.

Page 3: Metis Project 3: Predicting Heart Disease Diagnosis

DATA SET*

Feature                              Type
ECG Reading during Exercise          Discrete value (corresponding to abnormality)
ECG Reading during Rest              Discrete value (corresponding to abnormality)
Chest Pain Type                      Discrete value (corresponding to severity)
Exercise-induced Chest Pain          Boolean
Hypertension                         Boolean
Age                                  Integer
Cholesterol                          Integer
Resting Heart Rate                   Integer
Peak Heart Rate during Exercise      Integer
Resting BP (Systolic + Diastolic)    Integer
Peak BP (Systolic + Diastolic)       Integer

*Source: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

Page 4: Metis Project 3: Predicting Heart Disease Diagnosis

DATA SET*

Feature                              Type
ECG Results during Exercise          Discrete value (corresponding to abnormality)
ECG Results during Rest              Discrete value (corresponding to abnormality)
Chest Pain Type                      Discrete value (corresponding to severity)
Exercise-induced Chest Pain          Boolean
Hypertension                         Boolean
Age                                  Integer
Cholesterol                          Integer
Resting Heart Rate                   Integer
Peak Heart Rate during Exercise      Integer
Resting BP (Systolic + Diastolic)    Integer
Peak BP (Systolic + Diastolic)       Integer

*Source: https://archive.ics.uci.edu/ml/datasets/Heart+Disease

                      Male   Female
Positive Diagnosis    215    33
Negative Diagnosis    109    86
Total                 324    119
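
The class counts above imply quite different positive-diagnosis rates for the two genders. A minimal sketch of that arithmetic follows; the counts are the ones reported on the slide, while the dictionary layout and the function name `positive_rate` are illustrative assumptions.

```python
# Class counts as reported on the slide (positive = CAD diagnosed).
counts = {
    "male": {"positive": 215, "negative": 109},
    "female": {"positive": 33, "negative": 86},
}


def positive_rate(group):
    """Fraction of patients in the group with a positive diagnosis."""
    total = group["positive"] + group["negative"]
    return group["positive"] / total


rates = {gender: positive_rate(group) for gender, group in counts.items()}
for gender, rate in rates.items():
    print(f"{gender}: {rate:.1%} positive")
```

Roughly 66% of the men and 28% of the women in the data have a positive diagnosis, so accuracy alone can look good for a model that rarely predicts a positive in women.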

Page 5: Metis Project 3: Predicting Heart Disease Diagnosis

CONSIDERATIONS

Important to maximize Recall (sensitivity) = TP / (TP + FN), i.e. the proportion of actual positive cases that are correctly identified

Want to diagnose genders with the same classification model so that feature contributions can be fairly compared
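
Recall (sensitivity) is the number of true positives divided by the number of actual positives, TP / (TP + FN). A small sketch of that calculation is below; the function name and the example counts are hypothetical and are not taken from the slides.

```python
def recall(true_positives, false_negatives):
    """Recall (sensitivity) = TP / (TP + FN)."""
    actual_positives = true_positives + false_negatives
    if actual_positives == 0:
        raise ValueError("recall is undefined when there are no actual positives")
    return true_positives / actual_positives


# Hypothetical counts, for illustration only (not results from the slides).
example = recall(true_positives=40, false_negatives=10)
print(example)
```

A missed diagnosis (false negative) lowers recall, while a false alarm (false positive) does not, which is why recall is the metric to favor when missing a case of CAD is the costlier error.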

Page 6: Metis Project 3: Predicting Heart Disease Diagnosis

PROCEDURE

1. Use Grid Search to optimize model parameters for best recall and accuracy

Models reviewed:
  Random Forest Classifier
  Gaussian Naïve Bayes
  K-Nearest Neighbors
  Logistic Regression
  Linear Support Vector Classification

2. Compare model performance using cross-validation to determine which algorithm offers the best predictions

3. Investigate feature significance to determine factors most relevant to an accurate diagnosis for each gender
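
Steps 1 and 2 could be sketched with scikit-learn roughly as follows. This is not the author's code: the data is a synthetic stand-in generated with `make_classification` (not the UCI Heart Disease data), and the parameter grids, fold count, and random seeds are all assumed values chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in with 11 features; NOT the UCI Heart Disease data.
X, y = make_classification(n_samples=300, n_features=11, random_state=0)

# The five model families from the slide, each with a small assumed grid.
candidates = {
    "Random Forest": (RandomForestClassifier(random_state=0), {"n_estimators": [50, 100]}),
    "Gaussian Naive Bayes": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
    "K-Nearest Neighbors": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 9]}),
    "Logistic Regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "Linear SVC": (LinearSVC(max_iter=10000), {"C": [0.1, 1.0]}),
}

results = {}
for name, (model, grid) in candidates.items():
    # Step 1: grid search, selecting parameters by cross-validated recall.
    search = GridSearchCV(model, grid, scoring="recall", cv=5)
    search.fit(X, y)
    # Step 2: cross-validated recall and accuracy of the tuned model.
    best = search.best_estimator_
    results[name] = {
        "recall": cross_val_score(best, X, y, scoring="recall", cv=5).mean(),
        "accuracy": cross_val_score(best, X, y, scoring="accuracy", cv=5).mean(),
    }

for name, scores in results.items():
    print(f"{name}: recall={scores['recall']:.2f}, accuracy={scores['accuracy']:.2f}")
```

Because the same folds are used for tuning and for scoring here, the printed numbers are somewhat optimistic; a held-out test set or nested cross-validation would give a fairer comparison.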

Page 7: Metis Project 3: Predicting Heart Disease Diagnosis

INITIAL MODEL REVIEW

Page 8: Metis Project 3: Predicting Heart Disease Diagnosis

INITIAL MODEL REVIEW

Model Results (with cross-validation):

            Male   Female
Recall      80%    81%
Accuracy    77%    70%

Page 9: Metis Project 3: Predicting Heart Disease Diagnosis

NAÏVE BAYES PERFORMANCE: ON TEST SET

[Figure: two confusion matrices, each laid out as]

True Negative    False Positive
False Negative   True Positive
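
The four cells of a confusion matrix in this layout (true negatives and false positives on the top row, false negatives and true positives on the bottom row) can be counted from paired labels as in the sketch below. The labels are hypothetical and are not the test-set results from the slide.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (1 = positive diagnosis)."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


# Hypothetical labels, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

matrix = confusion_matrix(y_true, y_pred)
(tn, fp), (fn, tp) = matrix
print(matrix)
print("recall:", tp / (tp + fn))
```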

Page 10: Metis Project 3: Predicting Heart Disease Diagnosis

DISTINGUISHING FEATURES BY GENDER: MALE
Histograms of feature values in each class

Page 11: Metis Project 3: Predicting Heart Disease Diagnosis

DISTINGUISHING FEATURES BY GENDER: MALE
Histograms of feature values in each class

Page 12: Metis Project 3: Predicting Heart Disease Diagnosis

DISTINGUISHING FEATURES BY GENDER: FEMALE
Histograms of feature values in each class

Page 13: Metis Project 3: Predicting Heart Disease Diagnosis

DISTINGUISHING FEATURES BY GENDER: FEMALE
Histograms of feature values in each class
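
The per-class histograms shown on these slides can be approximated by binning one feature separately for each diagnosis class. The sketch below does this in plain Python; the function name `class_histograms`, the bin width, and the ages and diagnoses are all hypothetical and are not values from the data set.

```python
from collections import Counter


def class_histograms(values, labels, bin_width):
    """Count feature values per bin, separately for each class label."""
    histograms = {}
    for value, label in zip(values, labels):
        bin_start = (value // bin_width) * bin_width
        histograms.setdefault(label, Counter())[bin_start] += 1
    return histograms


# Hypothetical ages and diagnoses (1 = positive), for illustration only.
ages = [42, 48, 51, 57, 58, 61, 63, 66]
diagnosis = [0, 0, 0, 1, 0, 1, 1, 1]

bin_width = 10
hist = class_histograms(ages, diagnosis, bin_width)
for label in sorted(hist):
    for bin_start in sorted(hist[label]):
        bar = "#" * hist[label][bin_start]
        print(f"class {label}, {bin_start}-{bin_start + bin_width - 1}: {bar}")
```

A feature distinguishes the classes well when the two sets of bars occupy visibly different ranges, which is what cut-offs such as "Age > 55" summarize.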

Page 14: Metis Project 3: Predicting Heart Disease Diagnosis

CONCLUSION

Gaussian Naïve Bayes proved to be the best algorithm to diagnose heart disease in men and women

Features that contribute most to diagnosis of CAD in men are:
  Age > 55
  Resting Heart Rate < 140 bpm
  Resting Blood Pressure (Sys. + Dias.) > 220 mmHg

Features that contribute most to diagnosis of CAD in women are:
  Cholesterol > 275 mg/dL
  Peak Heart Rate < 150 bpm
  Peak Blood Pressure (Sys. + Dias.) > 240 mmHg
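
As an illustration of how the thresholds above could be read against a patient record, the sketch below encodes them as simple comparisons. It is not a diagnostic rule and not the Naïve Bayes model itself: the field names, the function `flagged_indicators`, and the sample patient are hypothetical; only the cut-off values come from the slide.

```python
# Cut-offs as listed on the conclusion slide, keyed by gender.
# Each entry is (hypothetical field name, comparison, cut-off value).
INDICATORS = {
    "male": [
        ("age", ">", 55),
        ("resting_heart_rate", "<", 140),
        ("resting_bp_sum", ">", 220),
    ],
    "female": [
        ("cholesterol", ">", 275),
        ("peak_heart_rate", "<", 150),
        ("peak_bp_sum", ">", 240),
    ],
}


def flagged_indicators(gender, patient):
    """Return the names of the listed features whose cut-off the patient crosses."""
    flags = []
    for feature, comparison, cutoff in INDICATORS[gender]:
        value = patient[feature]
        if (comparison == ">" and value > cutoff) or (comparison == "<" and value < cutoff):
            flags.append(feature)
    return flags


# Hypothetical patient record, for illustration only.
patient = {"cholesterol": 290, "peak_heart_rate": 160, "peak_bp_sum": 250}
flags = flagged_indicators("female", patient)
print(flags)
```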

Page 15: Metis Project 3: Predicting Heart Disease Diagnosis

Thank you!