Post on 20-Mar-2017
Predicting Hospital Readmissions
MANGT 665 Business Analytics & Data Mining
Prof. Bongsug Chae
Final Project
Derek Christensen
Dec 8, 2016
Telling a Story with Data* (Communicating effectively with analytics)
• Summary
• Recommendations
• Implications of Results
• Outline of Research Process
*Deloitte Review by Thomas H. Davenport
Problem Description & Introduction
• Background: Hospitals are penalized for patients who are re-admitted less than 30 days after they are released.
• Business Objectives: To reduce, or eliminate, re-admissions of patients less than 30 days after they are released.
• Success Criteria: Identification of factors that increase the likelihood of a patient returning within 30 days.
• Business Value: The average cost in 2011 for a hospital stay was $10,000.*
*http://www.beckershospitalreview.com/finance/11-statistics-on-average-hospital-costs-per-stay.html
Key Findings & Insights
• Random Forest = 99.23%
Final Analysis & Recommendation
Next Steps
- Analyze patients close to the 30-day threshold, i.e. 31 to 45-60 days
- Weight the data
- Cross-reference the 3 diagnoses
- Analyze the order of the 3 diagnoses
- Add more diagnoses
- Use more granular diagnosis categories
- ?
Dataset
• Description: The dataset contains over 56,000 HIPAA-compliant, de-identified records of hospital admissions.
• Source: Hack K-State 2016: Data Science For Social Good - https://zslie.github.io/
• Details: There are 50 columns: the Visit ID and Patient ID, along with 48 factors.
• Factors: The factors have a varying number of attributes, ranging from 1 to 715, so there are ~5.27x10^41 possible combinations.
• Factors: Descriptions below.
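The ~5.27x10^41 figure above is presumably the product of the number of distinct values in each factor column. A minimal sketch of that calculation, using a tiny made-up table (the column names, values, and helper functions here are hypothetical, not the project's data or code):

```python
import math

# Hypothetical toy records; the real dataset has 48 factor columns.
records = [
    {"gender": "F", "age_band": "[50-60)", "diag_cat": 4},
    {"gender": "M", "age_band": "[60-70)", "diag_cat": 7},
    {"gender": "F", "age_band": "[60-70)", "diag_cat": 4},
    {"gender": "M", "age_band": "[70-80)", "diag_cat": 12},
]

def distinct_counts(rows):
    """Count the distinct values observed in each column."""
    columns = rows[0].keys()
    return {col: len({row[col] for row in rows}) for col in columns}

def combination_count(rows):
    """Multiply the per-column distinct counts to size the full grid."""
    return math.prod(distinct_counts(rows).values())

counts = distinct_counts(records)
total = combination_count(records)
print(counts)
print(f"{total:.3g} possible combinations")
```

Because the counts multiply, a single high-cardinality factor (such as one with 715 values) inflates the total far more than several low-cardinality ones.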
ETL
• Performed some data manipulation directly in Excel, including:
  • Changed 'medical_specialy' to 'MED_SPEC_NUM'
  • Changed the 3 'diag_x's to 'DIAG_CAT_X's & converted 858 unique diagnoses into 33 Diagnosis Categories
• Notes are in the Challenge_1_Training_Data_Conversion.xlsx file on the "Storage" page
Key Business Data Question Summary
• Of 56,000 hospital visits in this database:
  • 6,285 were re-admitted < 30 days - these are the instances that need to be solved for
  • 19,477 were also re-admitted, but after the 30-day threshold
  • 30,238 were not re-admitted - there could also be some insight gleaned from why they DIDN'T have to be re-admitted
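The three visit counts above can be turned into class shares with a few lines of arithmetic. The sketch below also works out a majority-class baseline, under the assumption (not stated in the slides) that the prediction target is binary: re-admitted in < 30 days versus everything else.

```python
# Visit counts as stated in the summary above.
counts = {
    "readmitted < 30 days": 6285,
    "readmitted > 30 days": 19477,
    "not readmitted": 30238,
}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {share:.2%}")

# Assumption: the target is binary (< 30 days vs. everything else).
# Always predicting the majority class would then be right this often.
baseline_accuracy = 1 - shares["readmitted < 30 days"]
print(f"majority-class baseline: {baseline_accuracy:.2%}")
```

The < 30 day group is only about 11% of visits, so the classes are heavily imbalanced. If model scores are reported as accuracy on this binary target, the baseline is a useful reference point when reading them.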
Exploratory Analysis
• The preliminary candidates' correlations with readm2 have changed versus their correlations with readmitted
• number_emergency = 0.103321 ==> No longer shows significant correlation, now at 0.053
• number_inpatient = 0.233149 ==> Now the only one showing any significant correlation, at 0.162
• number_diagnoses = 0.103885 ==> No longer shows significant correlation, now at 0.045
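To illustrate how a correlation can shift when the target column changes, here is a minimal sketch that computes a Pearson coefficient by hand against two target encodings. The toy values are made up, and both encodings are assumptions: readmitted as a three-level code and readm2 as a binary flag for re-admission in < 30 days. The slides do not say how either column was actually coded or which correlation measure was used.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not taken from the project data.
number_inpatient = [0, 0, 1, 3, 0, 2, 5, 0]
# Assumed encodings: readmitted as 0 = no, 1 = > 30 days, 2 = < 30 days;
# readm2 as a binary flag for re-admission in < 30 days.
readmitted = [0, 1, 1, 2, 0, 1, 2, 0]
readm2 = [1 if r == 2 else 0 for r in readmitted]

r_readmitted = pearson(number_inpatient, readmitted)
r_readm2 = pearson(number_inpatient, readm2)
print(f"vs readmitted: {r_readmitted:.3f}")
print(f"vs readm2:     {r_readm2:.3f}")
```

Collapsing a multi-level target into a binary flag discards the distinction between "not re-admitted" and "re-admitted after 30 days", which is one plausible reason the coefficients differ between the two targets.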
Model Building
• Decision Tree – 88.97%
Interesting questions about the data – what does it mean??
Questions