Using Predictive Analytics to Solve Business Problems · Using Predictive Analytics to Solve...
Transcript of Using Predictive Analytics to Solve Business Problems · Using Predictive Analytics to Solve...
SOA Predictive Analytics Seminar – Taiwan 31 Aug. 2018 | Taipei, Taiwan
Session 3
Using Predictive Analytics to Solve Business Problems
Yi-Ling Lin, FSA, MAAA
9/12/2018
2018 SOA Predictive Analytics TaiwanYI‐LING LIN, FSA, MAAA, FCA
THE TERRY GROUP
Using Predictive Analytics to Solve Business ProblemsAugust 31, 2018
What’s the Buzz is all About…
Many terms used interchangeably…confusing…exciting…and seem magical
Big Data
Machine Learning
Artificial Intelligence
Deep Learning
Wide Data
Tall Data
Predictive Analytics
Data Analytics
Big Data• Five “V”s…• Structured/unstructured data…• Traditional data……to provide value need…
Data Analytics• Spectrum by function, business value and
sophistication… uses traditional and emerging algorithms…
Statistical methods and machine learning• ML is part of AI, deep learning is part of
ML• Mathematically defined “learning”
architecture plus optimization framework…
9/12/2018
Applying Data Analytics to Business Problems
3
Predictive
Prescriptive
Spectrum of data analytics: hindsight to insight to foresight
Value towardsbusiness solutions
What happened?
Why did it happen?
What will happen?
0
1
2
3
4
5
6
7
8
A B C D
Diagnostic
Descriptive
How do I influence what will happen?
0
1
2
3
4
5
6
7
8
A B C D
0
4
8
12
16
0 4 8 12 16 20
Y1
X1
0
4
8
12
16
0 4 8 12 16 20
Y1
X1
Adapted from Gartner’s Data Analytics Maturity Model
Analytical sophistication
…or decision‐informing
Application of Data Analytics in HealthCurrent and emerging applications of data analytics in US healthcare
• Dashboards, descriptive statistics, and reporting
• Trends exploration and forecasting
• Pricing
• Claims reserving
• Risk scoring, risk stratification, and risk adjustment
• Plan design modeling
• Stress testing
• Care management and decease prediction
• Fraud detection and outliers analysis
• Assumptions setting
… and many more
2009 2010 2011 2012 2013 2014 2015 2016Year
9.2%
9.9%
9.9%
11.4%
8.8%
7.8%7.9%7.4%
6.6%
8.7%
7.2%
8.2%
Annual Trend
9/12/2018
Models and Business ProblemsData analytics toolbox: spectrum of traditional to emerging methodologies
Generalized linear regression/
logit
Clustering/ classification
Survival/ Markov models
Time Series
Deep Leaning
Machine Learning
• Trend forecasting• Stress testing
• Fraud identification• Targeted marketing• High cost group stratification• Provider referral patterns
• Disease progression• Claims reserving
• Risk Scoring and Risk Adjustment
• Plan design and choice modeling
• Product conversion
• Signal processing• Text processing
New Programming Paradigm: Machine Learning• Humans Input data & answers• And how to “learn”… and what does it mean to
be wrong…• Example: clustering algorithm or neural networks
or decision tree/Random Forest
Traditional Modeling versus Machine LearningCould computer automatically learn the rules by looking at data?
Traditional
Classical programming model• Humans input data and set of
rules/function on how to arrive at answers • Also how close they want data to fit to the
“model”…• Example: linear regression or generalized
linear regression
Data
Rules
AnswersHumans input:
Machine Learning
Data
Rules
Answers
Humans input:
9/12/2018
Model Evaluation and ValidationModel evaluation is an important part of any modeling project
• Relevance and importance of criteria
• Appropriate and consistent with purpose
• On “unseen” or “test” sample of data
• Examples of criteria/metrics
Standard statistical measures (R squared, RMSE, MAE, etc.)
Predictive Ratios on groups of interest (e.g. Diagnostic groups or age groups)
Tolerance curves
Sensitivity and specificity (confusion matrix)
ROC curves
Comparison with naïve and standard models
…and many more
Cautionary tale!Famous Anscombe’s quartet: all four datasets have the same statistical properties, including R squared=0.67, means and variance of x and y, correlation and linear regression model: y=3+0.5x
Challenges Exciting things often come with challenges and potential pitfalls
• Messy, often high‐dimensional with missing values, data and data quality issues
• Potential bias in data
• Use of proxies
• Non‐discrimination, security and confidentiality
• Transparency vs. “black box”
• Spurious correlations: correlation vs. causality
• Interpretability and replicability
• Overfitting and overreliance
• Business purpose appropriateness and applicability
… and many more
Stories from the front lines…
o Quality of data. NYC taxi story: if it’s too good to be true… it’s probably isn’t!
o Bias inherited from humans through training data (training facial analytics on male white faces!)
o Spurious correlations…(divorce rate in Maine correlates with per capita margarine consumption at 99.3%)
9/12/2018
Case Study: Risk Scoring Modeling in Health
Risk Scoring Modeling in Healthcare Application of data analytics in healthcare for risk scoring
Risk scoring = using individuals’ data to “predict”/estimate outcome of an event
Used in many areas: cost predictions, mortality hazard ratios, probability of disease
Credit scores, healthcare risk scores, mortality risk scores, life underwriting…
In healthcare risk scoring (cost-based) in US moving from traditional to emerging
– Methodologies: from linear regression models to machine learning algorithms
– Data: traditional claims and enrollment data (Rx, Dx, demographic, prior year costs, etc.) to new and emerging data, e.g. socio-economic status factors
– Types of risk scoring models: concurrent vs. prospective; off-the-shelf vs. custom
Health risk scores are used for variety
of purposes
Many sources of uncertainty
9/12/2018
Case Study: Descriptive AnalyticsDashboards, distributions, descriptive statistics
In this case study babies under age of 2 were excluded, and population shown were enrolled at least for one month in both years
This is traditional analysis to inform what actually happened and the first step in any modeling project
160237
374
501605 593
710657
704780 805
964
395 348 331
405456
348422
482
744
884
1,221
1,541
0
200
400
600
800
1,000
1,200
1,400
1,600
1,800
baby child 18‐24 25‐29 30‐34 35‐39 40‐44 45‐49 50‐54 55‐59 60‐64 65+
Average PMPM (Medical and Rx) by year and gender
F‐2016 PMPM F‐2017 PMPM M‐2016 PMPM M‐2017 PMPM
Case study for illustration purposes only
Case Study: Diagnostic AnalyticsInvestigating and identifying trends & relationship
Claim costs are lognormally distributed; prior and current somewhat linearly correlated
(0.54)
Diagnostics focused on uncovering
patterns, relationships, trends,
and potentially engineering
predictive features
Case study for illustration purposes only
Case study for illustration purposes only
9/12/2018
Case Study: Comparing Custom and Off‐the‐shelfGreat improvement on accuracy and predictive ratios
On Test Data:is 0.24, correlation 0.65,
MAE=67% for off‐the‐shelf HCC
is 0.48, correlation 0.70, MAE = 73% for residual custom‐recalibrated HCC
Case study for illustration purposes only
0
1000
2000
3000
4000
5000
6000
0 1000 2000 3000 4000 5000 6000
Residual recalibrated HCC‐based Risk Scores
0
1000
2000
3000
4000
5000
6000
0 1000 2000 3000 4000 5000 6000
Off‐the‐shelf HCC Risk Scores
0.62
0.74
0.88
0.99
1.03
1.10
1.01
1.08
1.161.12
1.17
0.93
1.10
1.011.04
0.991.03
1.05
0.971.00 0.99 0.98
1.02 1.03
0.6
0.7
0.8
0.9
1
1.1
1.2
baby child 18‐24 25‐29 30‐34 35‐39 40‐44 45‐49 50‐54 55‐59 60‐64 65+
Predictive Ratios by Age Group
Perfect fit HCC Recalibrated HCC‐based
Actual cost
Actual cost
Predicted cost Predicted
cost
Case Study: Decision‐informing AnalyticsVarious uses of risk scoring in health care: population health and care management
From low to high
Identifying best cases for care management
Acute illness, trauma, accidents
Chronic deceases, high cost and risk
HealthyRising cost?
Risk
Cost
Assess characteristics
of high risk/low cost
group: potential for
care management
Prevention and wellness programs
Case study for illustration purposes only
9/12/2018
Case Study: Health Plan Choice Modeling
Case Study: Choice ModelingAn employer group wants to change the medical plans it offers to employees
Important to “predict” who will select various plans for financial and other modeling
Current Options
Actuarial Value
Percent
HMO 0.90 10%
High Value 0.86 42%
Medium Value 0.81 45%
CDHP 0.76 2%
New Options
Actuarial Value
Percent
Plan A 0.87
Plan B 0.81
Plan C 0.75
Plan D 0.68
Modeling
Participants data
Current choices /utility
Probability of selecting new options
9/12/2018
17
Subject Matter Expertise is Critical
ID Medical Coverage Coverage Tier Zip Code
2 Medium Value Option Employee + 1 Dependent 95066
6 Medium Value Option Employee Only 98053
7 Medium Value Option Employee Only 60630
8 HMO Plan Employee Only 95121
9 Medium Value Option Employee Only 33472
14 HMO Plan Employee + 1 Dependent 94610
15 Medium Value Option Employee Only 94109
16 Medium Value Option Employee Only 60478
17 HMO Plan Employee + 1 Dependent 94607
21 Medium Value Option Employee + 1 Dependent 11377
24 Medium Value Option Employee Only 94587
25 HMO Plan Employee Only 94109
26 Medium Value Option Employee Only 94606
27 Medium Value Option Employee Only 94133
28 Medium Value Option Employee Only 60532
31 Medium Value Option Employee + 2 or More Dependents 95635
32 HMO Plan Employee + 1 Dependent 91206
33 Medium Value Option Employee Only 02458
34 Medium Value Option Employee Only 94103
36 Medium Value Option Employee Only 98121
39 Medium Value Option Employee Only 60614
45 Medium Value Option Employee + 1 Dependent 10604
48 HMO Plan Employee Only 94121
53 HMO Plan Employee Only 94122
54 HMO Plan Employee Only 95356
55 Medium Value Option Employee Only 33026
58 Medium Value Option Employee Only 94107
59 Medium Value Option Employee Only 94123
61 HMO Plan Employee Only 94501
63 HMO Plan Employee + 2 or More Dependents 94595
69 Medium Value Option Employee + 1 Dependent 32703
72 Medium Value Option Employee Only 90046
73 HMO Plan Employee + 2 or More Dependents 91367
77 Medium Value Option Employee Only 94587
81 Medium Value Option Employee Only 28173
82 HMO Plan Employee Only 95691
85 Medium Value Option Employee Only 10573
89 Medium Value Option Employee Only 94115
Feature engineering is often informed by subject matter expert
Data Visualizations and Feature Engineering
18
$0K $100K $200K $300K $400K $500K
Salary
$0
$50
$100
$150
$200
$250
$300
$350
$400
$450
$500
$550
$600
$650
$700
$750
$800
To
talM
on
thly
De
du
ctio
ns
(no
Med
)
Current Medical Plan
CDHP Opt ion
High Value Opt ion
CDHP Opt ion High Value Opt ion
$0K $50K $100K $150K $200K $250K
Salary
$0K $50K $100K $150K $200K $250K
Salary
$0
$50
$100
$150
$200
$250
$300
$350
$400
$450
$500
$550
$600
$650
$700
$750
$800
To
talM
on
thly
Ded
uct
ion
s (n
o M
ed)
Data visualization helps with feature selection, model evaluation and much more…
9/12/2018
Process of Predicting and Evaluating Choice
What are predicted costs for individuals and plan?
Logit Model
Age / Stage in Life
Individual Plan
Elections
Risk Tolerance
Premium / Contributions
Expected Claims
Plan Design
Individual Annual Cost
Total Employee
Cost Sharing for the
individual
Plan Cost
Case Study: Modeling ApproachEstimating Parameters
Heterogeneous Logit Model
• i – individuals
• j ‐ plan options
• k ‐ # of attributes with weights ik
• ij – utility of plan option j to
person i
ij = j i + ij ik = k + k ik
• Monte Carlo simulation and
maximize log likelihood function
Logit Model
Age / Stage in Life
Individual Plan
Elections
Risk Tolerance
Premium / Contributions
Expected Claims
Plan Design
9/12/2018
Case Study: Modeling New ChoicesProbability of participants’ elections of options A, B, C, or D
New choices
Know preferences ( and )
and now changing the
attributes ( )
Logit Model
Age / Stage in Life
Individual Plan
Elections
Risk Tolerance
Premium / Contributions
Expected Claims
Plan Design
Case Study: Choice Modeling Results
22
Current Options
Actuarial Value
Percent
HMO 0.90 10%
High Value 0.86 42%
Medium Value 0.81 45%
CDHP 0.76 2%
New Options
Actuarial Value
Percent
Plan A 0.87 57%
Plan B 0.81 22%
Plan C 0.75 17%
Plan D 0.68 5%
Use results to inform business decisions and identify areas for further study
In this case study, people are more likely to choose a higher value plan option regardless of their current plan
Plan A (87% AV) Plan B (81% AV) Plan C (75% AV) Plan D (68% AV)
0.6
0.8
1.0
1.2
1.4
1.6
1.8
2.0
2.2
Loss
Ra
tio
9/12/2018
Questions? Thoughts… Comments?