Driving Healthcare Operations with Data Science

Post on 13-Jan-2017


"literally a health insurance company"

"Operations"

Clinical Operations

● Close member "gaps in care"
  ○ Not taking their meds
  ○ Not seeing their doctors
  ○ Not getting tested
● Document conditions

Insurance Operations

● Approve / deny claims
● Approve / deny authorizations
● Catch fraud


E.g.

Talk to them about consequences of not doing so?

Knock on the doors of the most non-adherent members?

Ask members politely?

Use different messages for rich and poor members?

Enter Data Science

What should we do?

For whom?

Did it work?

Case Study: Whom to Call for Home Visits?

Can we predict which of our diabetic members will have complications in the next 6 months?

[Figure: timeline divided into an Observation Interval followed by a Prediction Interval]

Features (from the Observation Interval): demographic info, lab tests, medications, other diagnoses

Labels (from the Prediction Interval): diagnosed with diabetes complications?
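The observation/prediction split can be sketched in a few lines. This is only an illustration: the event log, its dates, the cutoff, and the `build_example` helper are all hypothetical, and the 182-day horizon is an assumed stand-in for "6 months".

```python
from datetime import date, timedelta

# Hypothetical event log: (member_id, event_date, event_type, value).
EVENTS = [
    ("CP001", date(2016, 1, 10), "hba1c", 6.5),
    ("CP001", date(2016, 8, 2), "complication", None),
    ("CP002", date(2016, 3, 5), "hba1c", 8.3),
]

def build_example(member_id, events, cutoff, horizon_days=182):
    """Features come from before the cutoff; the label from the window after it."""
    end = cutoff + timedelta(days=horizon_days)
    mine = [e for e in events if e[0] == member_id]
    # Observation interval: everything strictly before the cutoff.
    observed = [e for e in mine if e[1] < cutoff]
    hba1c = [e for e in observed if e[2] == "hba1c"]
    latest_hba1c = max(hba1c, key=lambda e: e[1])[3] if hba1c else None
    # Prediction interval: [cutoff, cutoff + horizon).
    label = any(e[2] == "complication" and cutoff <= e[1] < end for e in mine)
    return {"member": member_id, "hba1c": latest_hba1c}, label

cutoff = date(2016, 7, 1)  # hypothetical boundary between the two intervals
example_cp001 = build_example("CP001", EVENTS, cutoff)
example_cp002 = build_example("CP002", EVENTS, cutoff)
```

Keeping the two intervals disjoint means nothing from the prediction window can leak into the features.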

Member   Age   Hypertension   HbA1c   Diagnosed with Complication in 6-Month Interval
CP001    65    Yes            6.5     Yes
CP002    77    No             8.3     No
CP002    84    Yes            7.4     Yes

Challenge: High Class Imbalance

● Historically, only 8% of diabetic members have been diagnosed with complications over a 6-month period.
● Easy to get "high" accuracy, but hard to get a decent precision/recall tradeoff.
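A small sketch of why accuracy misleads here. The labels are synthetic, built only to match the 8% rate quoted above, and the "model" is a deliberately trivial one that never predicts a complication.

```python
# Synthetic labels matching the 8% rate: 8 positives per 100 members.
labels = [1] * 8 + [0] * 92
# A trivial "model" that never predicts a complication.
predictions = [0] * len(labels)

true_pos = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
correct = sum(1 for y, p in zip(labels, predictions) if y == p)

accuracy = correct / len(labels)   # fraction of members classified correctly
recall = true_pos / sum(labels)    # fraction of real complications caught
```

The trivial model scores 92% accuracy while catching none of the complications, which is why accuracy alone says little at this class balance.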

Approach: High Class Imbalance

● Evaluate using area under ROC curve.
● Empirically, tree ensemble models appear to handle the imbalance better than logistic regression.
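One way to read area under the ROC curve is as the probability that a randomly chosen positive member is scored above a randomly chosen negative one. The sketch below computes it that way in plain Python; the `auroc` helper and the toy scores are illustrative, and a real pipeline would more likely use a library routine.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: two positives, three negatives.
labels = [1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.5, 0.1]
result = auroc(labels, scores)
```

Because the metric depends only on how positives rank against negatives, it does not reward a model for simply predicting the majority class.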

Challenge: Missing Data

● Glycated hemoglobin (HbA1c) is clearly an important feature… but we only have measurements for ~60% of members.
● Whether we have a measurement correlates with both:
  ○ Diabetes complications.
  ○ How well a model trained without the lab measurement performs.

Approach: Missing Data

● Simply hardcode all missing values to something outside the measurement range.
  ○ In our case, 0.0.
● This way, tree models can split on "have a measurement" vs. "don't have a measurement".
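A minimal sketch of the sentinel idea, assuming missing readings arrive as `None`. The `impute` and `split` helpers and the 3.0 threshold are hypothetical; `split` imitates a single decision-tree split rather than any particular library.

```python
MISSING_SENTINEL = 0.0  # outside any plausible HbA1c reading, as in the slide

def impute(values, sentinel=MISSING_SENTINEL):
    """Replace missing measurements (None) with the out-of-range sentinel."""
    return [sentinel if v is None else v for v in values]

def split(values, threshold):
    """One decision-tree style split: indices with value <= threshold vs. the rest."""
    left = [i for i, v in enumerate(values) if v <= threshold]
    right = [i for i, v in enumerate(values) if v > threshold]
    return left, right

hba1c = impute([6.5, None, 8.3, None, 7.4])
# Any threshold between the sentinel and the lowest real reading isolates the missing rows.
no_measurement, has_measurement = split(hba1c, threshold=3.0)
```

The same trick would distort a linear model, which would treat 0.0 as a very low reading, so it fits tree models specifically.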

Final Model: Gradient Boosting Tree Ensemble

Evaluation

AUROC: 0.8

Precision: 24%

Recall: 66%
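To make the precision and recall figures concrete, the arithmetic below works out what they would imply for a cohort of 1,000 members. Both the cohort size and the assumption that the evaluation set shares the historical 8% complication rate are illustrative, not taken from the slides.

```python
members = 1000          # hypothetical cohort size, for illustration only
prevalence = 0.08       # assumes the evaluation set matches the historical 8% rate
precision = 0.24
recall = 0.66

positives = members * prevalence   # members who develop complications
true_pos = positives * recall      # of those, the ones the model flags
flagged = true_pos / precision     # since precision = true_pos / flagged
false_pos = flagged - true_pos     # flagged members without complications
lift = precision / prevalence      # hit rate relative to picking at random
```

Under those assumptions the model flags about 220 members, roughly 53 of whom go on to have complications, a hit rate about three times the base rate.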

Most Predictive Features

Glycated Hemoglobin

Age

Hypertension

Takes Insulin

Did it work?

Do we catch more complications if we make calls using the model?

Control Group: Call Group (Chosen at Random)

Treatment Group: Call Group (Chosen by Model)
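One possible way to set up such an experiment is sketched below. The slides do not describe the assignment procedure, so everything here is an assumption: members are split into two arms at random, the control arm's call list is a random sample, and the treatment arm's call list is the members the model scores highest. The `assign_call_groups` helper and the scores are hypothetical.

```python
import random

def assign_call_groups(scores, calls_per_arm, seed=0):
    """Split members into two arms at random, then pick each arm's call list."""
    rng = random.Random(seed)
    members = sorted(scores)
    rng.shuffle(members)
    half = len(members) // 2
    control_arm, treatment_arm = members[:half], members[half:]
    # Control: callers work through a random sample of the arm.
    control_calls = rng.sample(control_arm, calls_per_arm)
    # Treatment: callers work through the members the model scores as riskiest.
    ranked = sorted(treatment_arm, key=lambda m: scores[m], reverse=True)
    treatment_calls = ranked[:calls_per_arm]
    return control_arm, treatment_arm, control_calls, treatment_calls

# Hypothetical model scores for 20 members.
score_rng = random.Random(42)
scores = {f"M{i:03d}": score_rng.random() for i in range(20)}
control_arm, treatment_arm, control_calls, treatment_calls = assign_call_groups(
    scores, calls_per_arm=3
)
```

Randomizing the arm assignment first keeps the two groups comparable, so a difference in complications found can be attributed to how the call list was chosen.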

                   Found Complications   Didn't Find Complications
Control Group      8                     92
Treatment Group    24                    76

FAKE RESULTS


Chi-Squared Test
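As a sketch of how the test applies to a 2x2 table like the one above, the snippet below computes Pearson's chi-squared statistic without a continuity correction. The `chi_squared_2x2` helper is illustrative, and the counts are the slide's explicitly fake results, so the output demonstrates the method only.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-squared tail probability reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Rows: control, treatment. Columns: found, didn't find. FAKE counts from the slide.
stat, p_value = chi_squared_2x2([[8, 92], [24, 76]])
```

On the fake counts this gives a statistic of about 9.52 and a p-value of roughly 0.002; a library routine that applies a continuity correction by default would report a somewhat smaller statistic.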