Metis Project 3: Predicting Heart Disease Diagnosis
Uploaded by jamie-fradkin
DIAGNOSING CORONARY ARTERY DISEASE BY GENDER
JAMIE FRADKIN
FEBRUARY 19, 2016
PROBLEM STATEMENT
Evaluate classification algorithms to find one that diagnoses coronary artery disease (CAD) with high sensitivity in both men and women, then use it to determine which features are most influential in predicting the diagnosis for each gender.
DATA SET*
Feature                              Type
ECG Reading during Exercise          Discrete value (corresponding to abnormality)
ECG Reading during Rest              Discrete value (corresponding to abnormality)
Chest Pain Type                      Discrete value (corresponding to severity)
Exercise-induced Chest Pain          Boolean
Hypertension                         Boolean
Age                                  Integer (years)
Cholesterol                          Integer (mg/dL)
Resting Heart Rate                   Integer (bpm)
Peak Heart Rate during Exercise      Integer (bpm)
Resting BP (Systolic + Diastolic)    Integer (mmHg)
Peak BP (Systolic + Diastolic)       Integer (mmHg)
*Source: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
                      Male  Female
Positive Diagnosis     215      33
Negative Diagnosis     109      86
Total                  324     119
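The two groups are very differently balanced. As a quick check on the counts above, the stdlib-only Python sketch below (written for this transcript, not taken from the project) computes each group's prevalence and the accuracy of a trivial model that always predicts the majority class. For women, always predicting "negative" is already right 86/119 ≈ 72.3% of the time, which is one reason accuracy alone is a weak yardstick here and recall is tracked separately.

```python
# Class counts copied from the diagnosis table above.
COUNTS = {
    "male": {"positive": 215, "negative": 109},
    "female": {"positive": 33, "negative": 86},
}

def class_summary(positive: int, negative: int) -> dict:
    """Return group size, prevalence of CAD, and majority-class baseline accuracy."""
    total = positive + negative
    return {
        "total": total,
        "prevalence": positive / total,
        # Accuracy of a model that always predicts the more common class.
        "majority_baseline_accuracy": max(positive, negative) / total,
    }

if __name__ == "__main__":
    for gender, c in COUNTS.items():
        s = class_summary(c["positive"], c["negative"])
        print(f"{gender}: n={s['total']}, prevalence={s['prevalence']:.1%}, "
              f"majority baseline accuracy={s['majority_baseline_accuracy']:.1%}")
```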
CONSIDERATIONS
Important to maximize recall (sensitivity) = $\frac{TP}{TP + FN}$, i.e. the proportion of actual positive (CAD) cases that the model correctly identifies
Use the same classification model for both genders so that feature contributions can be compared fairly
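Recall can be pushed to 100% simply by flagging every patient, which is why accuracy is tracked alongside it. A minimal stdlib sketch of both metrics, assuming labels are coded 1 = CAD present and 0 = absent; the ten toy labels are invented for illustration:

```python
from typing import Sequence

def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Sensitivity: TP / (TP + FN), with 1 = CAD present and 0 = absent."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of all predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    # Flagging everyone: perfect recall, poor accuracy.
    flag_all = [1] * len(y_true)
    print(recall(y_true, flag_all), accuracy(y_true, flag_all))  # 1.0 0.4
    # A more selective model that misses one CAD case and raises one false alarm.
    selective = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    print(recall(y_true, selective), accuracy(y_true, selective))  # 0.75 0.8
```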
PROCEDURE
1. Use grid search to tune each model's hyperparameters for the best recall and accuracy
Models reviewed: Random Forest Classifier, Gaussian Naïve Bayes, K-Nearest Neighbors, Logistic Regression, Linear Support Vector Classification
2. Compare model performance using cross-validation to determine which algorithm gives the best predictions
3. Investigate feature significance to determine the factors most relevant to an accurate diagnosis for each gender
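The slides do not show the tuning code. As a hedged sketch of steps 1 and 2, the stdlib-only Python below runs a grid search over the neighbor count k for a hand-rolled k-nearest-neighbors classifier, scoring each candidate by mean recall across stratified folds and breaking ties on mean accuracy. The tie-breaking rule, fold count, seed, and toy data are all assumptions for illustration; in practice a library such as scikit-learn (e.g. `GridSearchCV` with `scoring="recall"`) would usually do this work.

```python
import math
import random
from statistics import mean

def knn_predict(train_X, train_y, x, k):
    """Predict 1 (CAD) if a strict majority of the k nearest training rows are positive."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0

def stratified_folds(y, n_folds, seed=0):
    """Split row indices into folds that each keep roughly the same class mix."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % n_folds].append(i)  # deal each class round-robin
    return folds

def cross_validate(X, y, k, n_folds=5, seed=0):
    """Return (mean recall, mean accuracy) of k-NN over stratified folds."""
    recalls, accuracies = [], []
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        pairs = [(y[i], knn_predict(train_X, train_y, X[i], k)) for i in test_idx]
        tp = sum(t == 1 and p == 1 for t, p in pairs)
        fn = sum(t == 1 and p == 0 for t, p in pairs)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        accuracies.append(sum(t == p for t, p in pairs) / len(pairs))
    return mean(recalls), mean(accuracies)

def grid_search(X, y, grid, n_folds=5):
    """Pick the k with the best mean recall, breaking ties on mean accuracy."""
    results = {k: cross_validate(X, y, k, n_folds) for k in grid}
    best_k = max(results, key=lambda k: results[k])  # tuples compare recall first
    return best_k, results

if __name__ == "__main__":
    # Toy, clearly separated data (not the UCI data): 10 negatives, 10 positives.
    X = [(0.1 * i, 0.0) for i in range(10)] + [(10 + 0.1 * i, 0.0) for i in range(10)]
    y = [0] * 10 + [1] * 10
    best_k, results = grid_search(X, y, grid=[1, 3, 5, 7])
    print(best_k, results)
```

Stratifying the folds keeps positive cases in every fold, so per-fold recall is always defined; that matters more for the female group's 33 positives than for the men's 215.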
INITIAL MODEL REVIEW
Model Results (with cross-validation):
            Male  Female
Recall       80%     81%
Accuracy     77%     70%
NAÏVE BAYES PERFORMANCE ON TEST SET
[Figure: two confusion matrices, each laid out as True Negative | False Positive (top row) and False Negative | True Positive (bottom row)]
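The confusion-matrix slides use the layout [[TN, FP], [FN, TP]], with actual classes as rows and predicted classes as columns; scikit-learn's `sklearn.metrics.confusion_matrix` uses the same ordering for 0/1 labels. A small stdlib sketch that tabulates predictions in that layout and reads recall off the bottom row; the labels are made up, since the slides' cell values are not in the transcript:

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (1 = CAD present)."""
    counts = Counter(zip(y_true, y_pred))  # missing (actual, predicted) pairs count as 0
    return [[counts[(0, 0)], counts[(0, 1)]],
            [counts[(1, 0)], counts[(1, 1)]]]

def recall_from_matrix(cm):
    """Recall uses only the bottom (actual-positive) row: TP / (TP + FN)."""
    (_tn, _fp), (fn, tp) = cm
    return tp / (tp + fn) if tp + fn else 0.0

if __name__ == "__main__":
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    cm = confusion_matrix(y_true, y_pred)
    print(cm)                      # [[3, 1], [1, 3]]
    print(recall_from_matrix(cm))  # 0.75
```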
DISTINGUISHING FEATURES BY GENDER: MALE
[Figure: histograms of feature values in each class]
DISTINGUISHING FEATURES BY GENDER: FEMALE
[Figure: histograms of feature values in each class]
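The histogram plots themselves are not in the transcript. A minimal stdlib sketch of the same idea: bin one feature's values and count them separately for each diagnosis class, so that a range where one class dominates (like the thresholds in the conclusion) stands out. The bin width and the sample ages are invented for illustration, and the helper names are hypothetical.

```python
from collections import Counter

def class_histograms(values, labels, bin_width):
    """Count feature values per class (0 = no CAD, 1 = CAD) in fixed-width bins.

    Returns {label: Counter(bin_start -> count)}.
    """
    hists = {0: Counter(), 1: Counter()}
    for v, label in zip(values, labels):
        bin_start = (v // bin_width) * bin_width
        hists[label][bin_start] += 1
    return hists

def print_side_by_side(hists):
    """Print a text histogram with the two classes next to each other."""
    bins = sorted(set(hists[0]) | set(hists[1]))
    for b in bins:
        print(f"{b:>6}  no CAD {'#' * hists[0][b]:<10} CAD {'#' * hists[1][b]}")

if __name__ == "__main__":
    # Made-up ages, for illustration only.
    ages = [41, 45, 48, 52, 57, 60, 62, 66, 44, 58, 63, 49]
    cad = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0]
    print_side_by_side(class_histograms(ages, cad, bin_width=10))
```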
CONCLUSION
Gaussian Naïve Bayes proved to be the best of the algorithms reviewed for diagnosing CAD in both men and women.
Features that contribute most to a CAD diagnosis in men:
- Age > 55
- Resting Heart Rate < 140 bpm
- Resting Blood Pressure (Sys. + Dias.) > 220 mmHg
Features that contribute most to a CAD diagnosis in women:
- Cholesterol > 275 mg/dL
- Peak Heart Rate < 150 bpm
- Peak Blood Pressure (Sys. + Dias.) > 240 mmHg
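Since the conclusion singles out Gaussian Naïve Bayes, here is a from-scratch sketch of how that model scores a patient: each feature is treated as normally distributed within each class and independent of the other features given the class, and the predicted class is the one with the highest prior times likelihood. This is the textbook formulation, not the project's code; the `epsilon` variance floor and the two-feature toy data are assumptions.

```python
import math
from statistics import mean, pvariance

def fit_gaussian_nb(X, y, epsilon=1e-9):
    """Estimate each class's prior plus per-feature mean and variance.

    epsilon is an assumed small variance floor so that a constant
    feature does not cause division by zero.
    """
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        columns = list(zip(*rows))  # one tuple of values per feature
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(c) for c in columns],
            "vars": [pvariance(c) + epsilon for c in columns],
        }
    return model

def log_scores(model, x):
    """Log prior plus the sum of per-feature Gaussian log-densities, per class."""
    scores = {}
    for label, params in model.items():
        total = math.log(params["prior"])
        for value, mu, var in zip(x, params["means"], params["vars"]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mu) ** 2 / (2 * var)
        scores[label] = total
    return scores

def predict(model, x):
    """Return the class with the highest (unnormalized) log posterior."""
    scores = log_scores(model, x)
    return max(scores, key=scores.get)

if __name__ == "__main__":
    # Invented (age, cholesterol) rows; 1 = CAD present.
    X = [(45, 200), (50, 210), (48, 190), (62, 280), (66, 300), (60, 290)]
    y = [0, 0, 0, 1, 1, 1]
    model = fit_gaussian_nb(X, y)
    print(predict(model, (64, 295)), predict(model, (47, 205)))
```

Working with log-densities instead of multiplying raw densities avoids numerical underflow once many features are combined.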
Thank you!