Driving Healthcare Operations with Data Science
Uploaded by: sandy-ryza
Category: Data & Analytics
"literally a health insurance company"
"Operations"
Clinical Operations
● Close member "gaps in care"
  ○ Not taking their meds
  ○ Not seeing their doctors
  ○ Not getting tested
● Document conditions
Insurance Operations
● Approve / deny claims
● Approve / deny authorizations
● Catch fraud
E.g.
● Talk to them about consequences of not doing so?
● Knock on the doors of most non-adherent members?
● Ask members politely?
● Use different messages for rich and poor members?
Enter Data Science
● What should we do?
● For whom?
● Did it work?
Case Study: Whom to Call for Home Visits?
Can we predict which of our diabetic members will have complications in the
next 6 months?
[Figure: timeline split into an Observation Interval followed by a Prediction Interval. Features come from the observation interval: demographic info, lab tests, medications, other diagnoses. Labels come from the prediction interval: diagnosed with diabetes complications?]
Member   Age   Hypertension   hba1c   Diagnosed with Complication in 6-month Interval
CP001    65    Yes            6.5     Yes
CP002    77    No             8.3     No
CP002    84    Yes            7.4     Yes
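A minimal sketch of how features and labels could be cut from the two intervals. The event records, member IDs, dates, and the build_rows helper are all hypothetical; the slides do not show how the real tables were built.

```python
from datetime import date, timedelta

# Hypothetical event log: (member_id, event_date, event_type, value).
# Assumed to be sorted by date so later measurements overwrite earlier ones.
events = [
    ("M1", date(2015, 1, 10), "hba1c", 6.5),
    ("M1", date(2015, 8, 2), "complication_dx", None),
    ("M2", date(2015, 3, 5), "hba1c", 8.3),
    ("M3", date(2015, 2, 20), "hba1c", 7.4),
    ("M3", date(2016, 3, 1), "complication_dx", None),  # after the window
]

cutoff = date(2015, 7, 1)                      # end of observation interval
prediction_end = cutoff + timedelta(days=183)  # roughly 6 months later


def build_rows(events, cutoff, prediction_end):
    """Features from before the cutoff, labels from the prediction interval."""
    features, labels = {}, {}
    for member, when, kind, value in events:
        features.setdefault(member, {})
        labels.setdefault(member, False)
        if when < cutoff and kind != "complication_dx":
            features[member][kind] = value
        elif cutoff <= when < prediction_end and kind == "complication_dx":
            labels[member] = True
    return features, labels


features, labels = build_rows(events, cutoff, prediction_end)
print(features)
print(labels)
```

Keeping every feature strictly before the cutoff is what prevents information from the prediction interval from leaking into the model inputs.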
Challenge: High Class Imbalance
● Historically, only 8% of diabetic members have been diagnosed with complications over a 6-month period.
● Easy to get "high" accuracy, but hard to get a decent precision/recall tradeoff.
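To make the accuracy point concrete, here is a small sketch using a made-up population of 100 members with the 8% complication rate from the slide. The confusion_metrics helper is illustrative only.

```python
def confusion_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical population: 8 members with complications, 92 without.
y_true = [1] * 8 + [0] * 92
always_no = [0] * 100  # a "model" that never predicts a complication

acc, prec, rec = confusion_metrics(y_true, always_no)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

A model that never flags anyone scores 92% accuracy while finding none of the members who need a call.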
Approach: High Class Imbalance
● Evaluate using area under ROC curve.
● Empirically, tree ensemble models appear to handle the imbalance better than logistic regression.
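One way to read area under the ROC curve: the probability that a randomly chosen member with complications gets a higher score than a randomly chosen member without. The sketch below computes it by direct pairwise comparison. The labels and scores are invented, and this is not necessarily how the talk's evaluation was implemented.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in the data size, so this is only
    suitable for small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented labels and model scores for ten members.
y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
scores = [0.1, 0.2, 0.15, 0.3, 0.8, 0.05, 0.4, 0.35, 0.25, 0.12]

auc = auroc(y_true, scores)
uninformative = auroc(y_true, [0.5] * len(y_true))
print(auc, uninformative)
```

Because the metric depends only on how positives rank against negatives, a constant score lands at 0.5 regardless of the class balance, unlike accuracy.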
Challenge: Missing Data
● Glycated hemoglobin is clearly an important feature… but we only have measurements for ~60% of members.
● Whether we have a measurement correlates with both:
  ○ Diabetes complications.
  ○ How well a model trained without the lab measurement performs.
Approach: Missing Data
● Simply hardcode all missing values to something outside the measurement range.
  ○ In our case, 0.0.
● This way, tree models can split on "have a measurement" vs. "don't have a measurement".
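A small sketch of the sentinel idea. The raw values and the encode_hba1c helper are hypothetical; only the 0.0 sentinel comes from the slides. Any threshold between the sentinel and the smallest real measurement acts as a "has a measurement" split.

```python
MISSING_SENTINEL = 0.0  # value from the slides, outside the real measurement range


def encode_hba1c(value):
    """Replace a missing lab value (None) with the sentinel."""
    return MISSING_SENTINEL if value is None else float(value)


# Hypothetical raw lab values; None means no measurement on file.
raw = [6.5, None, 8.3, None, 7.4]
encoded = [encode_hba1c(v) for v in raw]

# A tree can place a threshold between the sentinel and the smallest
# real measurement, which separates "missing" from "measured".
smallest_real = min(v for v in encoded if v != MISSING_SENTINEL)
threshold = (MISSING_SENTINEL + smallest_real) / 2
has_measurement = [v > threshold for v in encoded]

print(encoded)
print(threshold, has_measurement)
```

This works for tree models because they only compare values against thresholds. A linear model would treat 0.0 as a very low reading, so the same trick would likely mislead it.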
Final Model: Gradient Boosting Tree Ensemble
Evaluation
AUROC: 0.8
Precision: 24%
Recall: 66%
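What these numbers could mean in practice, as a back-of-the-envelope sketch. It assumes a population of 1,000 members and that the evaluation set has the same 8% complication rate quoted earlier; neither is stated for the evaluation itself.

```python
# Assumed population size; the slides do not give one.
n_members = 1000
base_rate = 0.08   # share of members with complications (from the slides)
precision = 0.24   # from the slides
recall = 0.66      # from the slides

actual_positives = n_members * base_rate     # members with complications
true_positives = recall * actual_positives   # complications the model flags
flagged = true_positives / precision         # total members the model flags
false_positives = flagged - true_positives   # flagged members without complications
lift = precision / base_rate                 # versus picking members at random

print(f"flagged={flagged:.0f} true_positives={true_positives:.1f} "
      f"false_positives={false_positives:.1f} lift={lift:.1f}x")
```

Under those assumptions the model flags about 220 of 1,000 members and a flagged member is about three times as likely to develop complications as one picked at random.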
Most Predictive Features
Glycated Hemoglobin
Age
Hypertension
Takes Insulin
Did it work?
Do we catch more complications if we make calls using the model?
[Figure: members split into a Control Group and a Treatment Group. Control Group: call group chosen at random. Treatment Group: call group chosen by model.]
FAKE RESULTS
                  Found Complications   Didn't Find Complications
Control Group     8                     92
Treatment Group   24                    76
Chi-Squared Test
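A sketch of Pearson's chi-squared test applied to the fake 2x2 results above, using only the standard library. It applies no continuity correction, so a library routine that applies one by default will report a somewhat smaller statistic. The chi_squared_2x2 helper is illustrative and not taken from the talk.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: control, treatment. Columns: found, didn't find complications.
fake_results = [[8, 92], [24, 76]]
stat, p_value = chi_squared_2x2(fake_results)
print(f"chi2={stat:.2f} p={p_value:.4f}")
```

With both groups at 100 members, the expected count under independence is 16 found and 84 not found per group, which gives a statistic of about 9.52.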