L.A. Miner, R Nisbet, J.H. Hilbe, P.S. Bolding, M ... › wp-content › ... · Predictive...
Transcript of L.A. Miner, R Nisbet, J.H. Hilbe, P.S. Bolding, M ... › wp-content › ... · Predictive...
Keep on the lookout for the forthcoming seminal book:
L.A. Miner, R Nisbet, J.H. Hilbe, P.S. Bolding, M. Goldstein, N Walton, and G.D.
Miner; Practical Predictive Analytics and Decisioning Systems for Medicine:
Accuracy and Cost-Effectiveness for Healthcare Administration and Delivery
Including Medical Research, (2013 / 2014) Waltham, MA: Elsevier/Academic
Press
0
BOOK: Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die
www.thepredictionbook.com CONFERENCE:
Predictive Analytics World
San Francisco, Chicago, Boston, Washington DC, Toronto, Berlin, and London
www.predictiveanalyticsworld.com
ONLINE PORTAL AND NEWS SITE:
Predictive Analytics Times
www.predictiveanalyticstimes.com
CONFERENCE:
Text Analytics World
San Francisco and Boston
www.textanalyticsworld.com
Online training:
“Predictive Analytics Applied" - View it on-demand www.businessprediction.com
1
2
3
4
5
MISC. BROAD PUBLICATIONS ON HEALTHCARE ANALYTICS:
Nice survey report wrt priority level of healthcare applications of predictive
analytics: http://www.ehcca.com/presentations/predmodel6/riddle_pc2.pdf
For more healthcare applications, see: "9 Ways to Apply Predictive Analytics to
Healthcare" - http://www.icosystem.com/9-ways-to-apply-predictive-analytics-
to-healthcare/
... and also see: "EHR oracles: Merging big data streams could create mission
control for healthcare" - http://medcitynews.com/2013/05/ehr-oracles-merging-
big-data-streams-could-create-mission-control-for-healthcare/
Analytics Magazine's Jan/Feb 2012 edition focuses on healthcare analytics:
http://viewer.zmags.com/publication/644753a2#/644753a2/1
GOOD ARTICLE: http://www.analytics-magazine.org/januaryfebruary-
2012/503-analytics-a-the-future-of-healthcare
http://www.forbes.com/sites/tomgroenfeldt/2012/01/20/big-data-delivers-deep-
views-of-patients-for-better-care/
I am an individual patient, and an individual insurance policyholder. Risk effects
all parties involved.
ACL replacement surgery choice of graft source influences the risk of long term
knee pain when “knee walking”.
7
Insured “office workers”
8
For the story on exactly how Obama’s campaign used predictive analytics in
2012:
http://bigthink.com/experts-corner/team-obama-mastered-the-science-of-mass-
persuasion-and-won
http://www.predictiveanalyticsworld.com/patimes/obama/
9
10
11
12
13
14
15
17
18
19
Predictive modeling learns from data in order to generate a predictive model. For
details on how this work see Chapter 4 of the book "Predictive Analytics: The
Power to Predict Who Will Click, Buy, Lie, or Die"
(http://www.thepredictionbook.com).
20
A predictive model generates a predictive score for an individual. For details on
how this work see Chapters 1 and 4 of the book "Predictive Analytics: The Power
to Predict Who Will Click, Buy, Lie, or Die"
(http://www.thepredictionbook.com).
21
Marketing targets an individual predicted as likely to buy. For details on how this
work see the Introduction and Chapter 1 of the book "Predictive Analytics: The
Power to Predict Who Will Click, Buy, Lie, or Die"
(http://www.thepredictionbook.com).
22
23
24
Is prediction an audacious goal? Isn't prediction impossible? For details on how
why predictive analytics predicts well enough, see the Introduction and Chapter
1 of the book "Predictive Analytics: The Power to Predict Who Will Click, Buy,
Lie, or Die" (http://www.thepredictionbook.com).
25
A crummy predictive model delivers big value. It’s like a skunk with bling.
Simple arithmetic shows the bottom line profit of direct mail, both in general and
then improved by predictively targeting (and only contacting 25% of the list).
The less simple part is how the predictive scores are generated for each
individual in order to determine exactly who belongs in that 25%. For details on
how this work see Chapter 1 of the book "Predictive Analytics: The Power to
Predict Who Will Click, Buy, Lie, or Die" (http://www.thepredictionbook.com).
Put another way, predicting better than guessing is often sufficient to generate
great value by rendering operations more efficient and effective. For details on
how this works, see the Introduction and Chapter 1 of the book "Predictive
Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die"
(http://www.thepredictionbook.com).
26
27
28
Each row of training data corresponds to one individual – first the individual’s
facts and figures are listed (predictor variables, aka independent variables), and
then the target variable (aka dependent variable) – ie, the thing you’re trying to
predict – is listed.
A table of such rows composes the training data, on which predictive modeling
operates.
29
Urban myth to some, but based on reported results:
http://www.dssresources.com/newsletters/66.php
30
Online conversion to paying membership, by email domain. Customers who sign up
with "Hotmail" and "Yahoo" email accounts are far less likely (20 - 25% as likely) to
convert to a paid subscription than users with email addresses that may be more
"permanent," such as ".net" or "EarthLink" email addresses. This insight speaks directly
to business strategy, such as employing incentives for customers to provide permanent
email addresses, or partnering with certain email service providers.
Candian Tire examples, from "What Does Your Credit-Card Company Know
About You?" New York Times, May 12, 2009.
http://www.nytimes.com/2009/05/17/magazine/17credit-t.html
32
Correlation does not entail causation. For more information, see Chapter 3 of the
book "Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or
Die" (http://www.thepredictionbook.com).
33
34
"Prospective Study of Breakfast Eating and Incident Coronary Heart Disease in a
Cohort of Male US Health Professionals," by Cahill et al.
http://circ.ahajournals.org/content/128/4/337.full.pdf
35
36
The hip bone’s connected to the leg bone, and the leg bone’s…
Your purchases relate to your shopping history, online behavior, and preferred
payment method, and to the actions of your social contacts. Data reveals how to
predict consumer behavior from these elements.
Your health relates to your life choices and environment, and therefore data
captures connections predictive of health based on type of neighborhood and
household characteristics.
Your job satisfaction relates to your salary, evaluations, and promotions, and data
mirrors this reality.
Financial behavior and human emotions are connected, so data may reflect that
relationship as well.
37
38
39
...and many more, such as Cox Communications, FedEx, Sprint, etc. - see the
book "Predictive Analytics" (www.thepredictionbook) for many case studies,
including a central compendium of 147 mini-case studies, of which 37 are
examples in marketing applications of predictive analytics.
PREMIER Bankcard also lowered delinquency to increase net by over $10
million
More information about First Tennessee Bank and other case studies are available
at http://tinyurl.com/PAExamples
Dan Marks, First Tennessee Bank, "First Tennessee Bank: Analytics Drives
Higher ROI from Marketing Programs," IBM.com, March 9, 2011.
www.ibm.com/smarterplanet/us/en/leadership/firsttenbank/assets/pdf/IBM-
firstTennBank.pdf
40
41
Both marketing and clinical nomenclature:
1. treatment
2. risk
3. outcome
4. effectiveness
5. impact
42
Other examples not included in this presentation:
Pulminary embellism diagnosis:
http://www.sigkdd.org/kddcup/index.php?section=2006&method=info
http://ccls.columbia.edu/research/epilepsy-early-warning
"A New Approach to Predicting Epileptic Seizures" -
http://spectrum.ieee.org/biomedical/diagnostics/a-new-approach-to-predicting-
epileptic-seizures
43
Also see:
Andrew H. Beck, Ankur R. Sangoi, Samuel Leung, Robert J. Marinelli, Torsten
O. Nielsen, Marc J. van de Vijver, Robert B. West, Matt van de Rijn, and Daphne
Koller, "Systematic Analysis of Breast Cancer Morphology Uncovers Stromal
Features Associated with Survival," Science Magazine Online; Science
Translational Medicine 3, issue 108 (November 9, 2011): 108ra113.
doi:10.1126/scitranslmed.3002564.
http://stm.sciencemag.org/content/3/108/108ra113.
Andrew Myers, "Stanford Team Trains Computer to Evaluate Breast Cancer,"
Stanford School of Medicine, November 9, 2011.
http://med.stanford.edu/ism/2011/november/computer.html.
44
See also, wrt diabetes: "Leveraging Disparate Data Sources to Drive Business
Decisions," Swati Abbott, CEO, Blue Health Intelligence -
http://www.predictiveanalyticsworld.com/chicago/2013/agenda.php#day1_350a
45
This type of 2-by-2 error table has the very helpful name confusion matrix.
46
Kaggle, "Predict HIV Progression," Competition, April 27, 2010.
https://www.kaggle.com/c/hivprogression/details/Background
47
Health Prize, "Improve Healthcare, Win $3,000,000," Competition, April 4, 2011.
www.heritagehealthprize.com/c/hhp
Another admissions prediction case study: Scott Zasadil, UPMC Health Plan, "A
Predictive Model for Hospital Readmissions," Predictive Analytics World
Washington, DC, Conference, October 19, 2010, Washington, DC.
www.predictiveanalyticsworld.com/dc/2010/agenda.php#day1-17a. For video
access to this session:
http://www.rmportal.performedia.com/rm/paw10/gallery_01#1358
OTHER READMISSION CASE STUDIES:
http://www.ehcca.com/presentations/predmodel6/kalman_1.pdf
48
Riskprediction.org.uk
See also:
Mortality Prediction in Cardiac Surgery Patients. Comparative Performance of
Parsonnet and General Severity Systems. J. Martinez-Alario, MD; I. D. Tuesta,
MD; E. Plasencia, MD; M. Santana, MD; M. L. Mora, MD, PhD.
http://circ.ahajournals.org/content/99/18/2378.long
Predicting Mortality Risk in Patients Undergoing Bariatric Surgery.MARK H.
EBELL, MD, MS, Athens, Georgia. Am Fam Physician. 2008 Jan 15;77(2):220-
221. http://www.aafp.org/afp/2008/0115/p220.html
Predicting mortality in damage control surgery for major abdominal trauma
JOEP TIMMERMANS, M.D. et al.
http://www.ajol.info/index.php/sajs/article/viewFile/53677/42233
EuroSCORE Predicts Long-Term Mortality After Heart Valve Surgery. Ioannis K.
Toumpoulis, MD, Constantine E. Anagnostopoulos, MD, Stavros K. Toumpoulis,
MD, Joseph J. DeRose, Jr, MD, and Daniel G. Swistel, MD.
http://www.columbia.edu/~cea8/Toumpoulis/_pdfs/2005_06_ats_079_1902-
1908.pdf
49
http://stream.wsj.com/story/latest-headlines/SS-2-63399/SS-2-267868/
http://ajp.psychiatryonline.org/article.aspx?articleid=177762
http://docserver.ingentaconnect.com/deliver/connect/asma/00956562/v74n7/s12.p
df?expires=1360465524&id=72798881&titleid=8218&accname=Guest+User&c
hecksum=01AF11DF90C899E18B406BAE23D17737
http://www.sciencedirect.com/science/article/pii/S0091743502910546
http://www.sciencedirect.com/science/article/pii/S0749379799001774
50
More on mortality prediction:
Solo rockers die younger than those in bands. Although all rock stars face higher
risk, solo rock stars suffer twice the risk of early death as rock band members.
This may be due to the fact that band members benefit from peer support and
solo artists exhibit even riskier behavior (factoid courtesy of public health offices
in the UK).
Men on the Titanic faced much greater risk than women. A woman on the Titanic
was almost four times as likely to survive as a man. Most men died and most
women lived. This may be due to the fact that priority for access to life boats was
given to women.
51
52
53
54
Arizona's Petrified Forest National Park
Psychology professor Robert Cialdini
55
56
57
58
An example interstitial promotion. If the user accepts the offer, he/she is
allowing the host to pass profile information directly to the sponsor (in addition
to the fields shown).
59
A few additional percentage points can be tough to get, in the face of fairly adept
existing systems, but can make a big difference. Consider the insurance business,
where predictive analytics aims to reduce the loss ratio by 2 to 5 points beyond
that attained by standard actuarial methodology, or the engineering of jet engines,
where a 1% increase in efficiency would be a huge bite out of annual fuel
consumption.
The revenue results above are for interstitial ads only; many more ads are
embedded within functional product web pages, and could also be targeted with
only a slight alteration to the analytical system and deployment integration
developed for this project.
The large 25% increase in acceptance rates means formerly less "popular" ads are
now being given a better chance, leading to success; these sponsors are likely to
appreciate the increase in customer leads now coming from advertising with the
client.
Likewise, user satisfaction is likely higher, since users are seeing more ads in
which they are provably more interested.
60
IBM’s Watson computer predicts, for an individual Jeopardy! question and
candidate answer, whether it is the correct answer. For more information, see
Chapter 6 of Predictive Analytics (www.thepredictionbook.com)
61
Once the tests are complete and results are added to the existing
information that has already been gathered, the Watson Advisor has
identified treatment options for the oncologist to consider along with the
corresponding confidence levels. Watson has also indicated how each of
the treatment options align with the patient’s preference for limiting hair
loss, if possible.
Relevant IBM Case studies: http://www-
01.ibm.com/software/success/cssdb.nsf/industryL2VW?OpenView&Start=
1&Count=10&RestrictToCategory=spss_Healthcare&cty=en_us
63
65
66
67
Does contacting the customer make them more likely to respond?
MEDICAL:
Will the patient survive if treated?
"My headache went away!“ Proof of causality by example.
Driving medical decisions with personalized medicine: selecting treatment, e.g.,
treating heart failure with betablockers
Personalized medicine. Naturally, healthcare is where the term treatment
originates. While one medical treatment may deliver better results on average
than another, personalized medicine aims to decide which treatment is best suited
for each patient, since a treatment that helps one patient could hurt another. For
example, to drive beta-blocker treatment decisions for heart failure, researchers
"use two independent data sets to construct a systematic, subject-specific
treatment selection procedure." (Claggett et al 2011) Certain HIV treatment is
shown more effective for younger children. (McKinney et al 1998) Cancer
treatments are targeted by the patient's genes. (Winslow 2011)
Brian Claggett, Lihui Zhao, Lu Tian, Davide Castagno, and L. J. Wei.
"Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment
with Competing Risk Event-Time Data." Harvard University Biostatistics
69
Personalized medicine. While one medical treatment may deliver better results
on average than another, this one-size-fits-all approach commonly implemented
by clinical studies means treatment decisions that help many may in fact hurt
some. In this way, healthcare decisions backfire on occasion, exerting influence
opposite to that intended: they hurt or kill - although they kill fewer than
following no clinical studies at all. Personalized medicine aims to predict which
treatment is best suited for each patient, employing analytical methods to predict
treatment impact (i.e., medical influence) similar to the uplift modeling
techniques used for marketing treatment decisions. For example, to drive beta-
blocker treatment decisions for heart failure, researchers "use two independent
data sets to construct a systematic, subject-specific treatment selection
procedure." A certain HIV treatment is shown to be more effective for younger
children. Treatments for various cancers are targeted by genetic markers - a trend
so significant the Food and Drug Administration is increasingly requiring for new
pharmaceuticals, as the New York Times puts it, "a companion test that could
reliably detect the [genetic] mutation so that the drug could be given to the
patients it is intended to help," those "most likely to benefit.“
Andrew Pollack, "A Push to Tie New Drugs to Testing," New York Times,
December 26, 2011. www.nytimes.com/2011/12/27/health/pressure-to-link-
drugs-and50-companion-diagnostics.html.
70
Personalized medicine – evaluation and development of uplift modeling for healthcare applications:
To drive beta-blocker treatment decisions for heart failure, researchers "use two independent data sets to construct a systematic, subject-specific treatment selection procedure."
A certain HIV treatment is shown to be more effective for younger children. Treatments for various cancers are targeted by genetic markers -- a trend so significant the Food and Drug Administration is increasingly requiring for new pharmaceuticals, as the New York Times puts it, "a companion test that could reliably detect the [genetic] mutation so that the drug could be given to the patients it is intended to help, those "most likely to benefit."
Brian Claggett, Lihui Zhao, Lu Tian, Davide Castagno, and L. J. Wei, "Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment with Competing Risk Event-Time Data," Harvard University Biostatistics Working Paper Series, Working Paper 125, March 2011. biostats.bepress.com/harvardbiostat/paper125.
Ross E. McKinney Jr., MD, George M. Johnson, MD, Kenneth Stanley, MD, Florence H. Yong, MS, Amy Keller, Karen J. O'Donnell, PhD, Pim Brouwers, PhD, Wendy G. Mitchell, MD, Ram Yogev, MD, Diane W. Wara, MD, Andrew Wiznia, MD, Lynne Mofenson, MD, James McNamara, MD, and Stephen A. Spector, MD, the Pediatric AIDS Clinical Trials Group Protocol 300 Study Team, "A Randomized Study of Combined Zidovudine-Lamivudine versus Didanosine Monotherapy in Children with Symptomatic Therapy-Naive HIV-1 Infection," Journal of Pediatrics 133, issue 4 (October 1998): 500-508. www.sciencedirect.com/science/article/pii/S0022347698700575.
Ron Winslow, "Major Shift in War on Cancer: Drug Studies Focus on Genes of Individual Patients; Testing Obstacles Loom," Wall Street Journal, June 5, 2011. http://online.wsj.com/article/SB10001424052702304432304576367802580935000.html.
Ilya Lipkovich, Alex Dmitrienko, Jonathan Denne, and Gregory Enas, "Subgroup Identification Based on Differential Effect Search - A Recursive Partitioning Method for Establishing Response to Treatment in Patient Subpopulations," Statistics in Medicine 30, issue 21 (September 20, 2011): 2601-2621. onlinelibrary.wiley.com/doi/10.1002/sim.4289/abstract.
J. C. Foster, J. M. Taylor, and S. J. Ruberg, "Subgroup Identification from Randomized Clinical Trial Data," U.S. National Library of Medicine National Institute of Health, Statistics in Medicine 30, no. 24 (October 30, 2011): 2867-2880. doi:10.1002/sim.4322. Epub, August 4, 2011. www.ncbi.nlm.nih.gov/pubmed/21815180.
Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, and Bogong Li, "Subgroup Analysis via Recursive Partitioning," Journal of Machine Learning Research 10 (2009): 141-158. http://jmlr.csail.mit.edu/papers/volume10/su09a/su09a.pdf.
71
72
(This paper in turn references all the core technical papers on this topic.)
Free white paper: www.predictiveanalyticsworld.com/signup-uplift-
whitepaper.php
73
74
75
77