Big data chicago v2 5 14 14

Post on 27-Jan-2015


Big Data in Health Care What Marketers Need to Know

Tim Gilchrist, May 2014, @timgilchrist

Session Goals

• Cover
– Big Data
– Artificial Intelligence / Machine Learning
– How to be an Informed Consumer
– Applications for Marketers
– Questions


Big Data

The term for a collection of data sets so large and complex that they become difficult to process

• Many data sources with different formats

• Data with missing values

• Text / Social Media

• Things that don’t fit in Excel


Artificial Intelligence


“Pay no attention to the man behind the curtain”

Machine Learning

The construction and study of systems that can learn from data


Bayes

Thomas Bayes (1701 – 7 April 1761) was an English mathematician and Presbyterian minister, known for formulating the theorem that bears his name: Bayes' theorem


Bayes' theorem combines prior probabilities with new observations to calculate the probability that a hypothesis is true

Bayes' theorem is a natural fit for health care, which has both hypotheses (diagnoses) and events (tests / observations)
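As a reference point, the general form of the theorem can be written as follows. The symbols are assumptions chosen for this note, not taken from the slides: H stands for the hypothesis (a diagnosis) and E for the event (a test result or observation).

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}
```

Here P(H) is the prior probability, P(E | H) is the chance of seeing the event when the hypothesis is true, and P(E | ¬H) is the chance of seeing it anyway when the hypothesis is false (a false positive).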

Example

You are all doctors who have administered a critical test to 50 patients

You know the test is:
– 75% accurate
– 10% false positives

– 10 of your patients tested positive
– How many are actually sick?

Bayes’ Theorem – Mammogram Example

We can present the data as a decision tree representing the probabilities confronting doctors and patients


If we took population-level data down to the individual level, much more accurate probabilities would be possible

[Decision tree figure, probability by observation. Observations: #1 Mammogram, #2 Biopsy, #3 Time. Probabilities shown: Test + 8%, Test - 92%; Cancer 10%, No Cancer 90%; Cancer .03%, No Cancer 99%]

What Does this Mean To Marketers?

• Big data is about discovering relationships


“Can’t beat a man with some insurance. I need that health plan baaaby!”

• Then using data-driven insights to inform strategy

Big Data Health Landscape

[Landscape figure. Labels shown:]

Care Path: Member, PCP, Specialist (Sees PCP, Gets X-Ray, Sees Specialist); Ambulatory, Outpatient

Data Types: HIE, EMR, Admission, Discharge, Prescription, Claims, Plan Data, Portal, Direct Connection, Wearable, Telemetry, Social, Purchased, Location, Cell

Analysis / Transformation

Outputs: What is Happening, What Will Happen

Example

Text Mining for Sales


Listening & Collecting


Noise Signal

Training / Processing


1. Tweets extracted from the Twitter Fire Hose with key words “Health” and “Plan” (1MM per day)
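The keyword extraction described above could look roughly like the sketch below. It assumes that a tweet is kept only when both key words appear as whole words, case-insensitively; the slides do not say how matching was done, and the function names are hypothetical. The second sample tweet is invented noise for illustration.

```python
import re

KEYWORDS = ("health", "plan")  # the two key words named on the slide


def matches_keywords(text, keywords=KEYWORDS):
    """Return True when every keyword appears as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return all(keyword in words for keyword in keywords)


def filter_tweets(tweets, keywords=KEYWORDS):
    """Keep only the tweets that contain all of the keywords."""
    return [tweet for tweet in tweets if matches_keywords(tweet, keywords)]


sample = [
    "What's the best plan thru affordable health care. Blue cross? "
    "Blue shield? Health net? #confused #healthcare",
    "Traffic on the way to work is terrible today",  # hypothetical noise tweet
]
kept = filter_tweets(sample)
print(len(kept))  # 1
```

Whole-word matching is a deliberate choice here: it avoids keeping tweets that only contain look-alikes such as "healthy" or "planet".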

@CapoeiraBatuque

“What's the best plan thru affordable health care. Blue cross? Blue shield? Health net? #confused #healthcare”

@bluecalgal

“Obama says don't listen to Fox, why? Obama lied about keep your dr, health plan, cheaper than cell phone (bs) keep your dr”

2. Create training databases for any classification desired. + and – outcomes are used here

3. Weka turns Tweets into numerical vectors that a computer can analyze (its “String to Word Vector” filter), then classifies them with a Naïve Bayes classifier

[Figure: the @CapoeiraBatuque tweet broken into features. Text analysis (stemming / tokenization): ACA, Access, Health, HealthPlan, Now, Pressure, Confused, Dumb. Demographic analysis: @ handle, # followers, # tweets, profile, retweets, location, date/time. Example values shown: .225, .357, .155, .999]
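The word-vector and Naïve Bayes step above can be illustrated in plain Python. This is a minimal sketch, not the Weka workflow from the talk: it uses simple word counts without stemming, a multinomial Naïve Bayes with add-one smoothing, and a tiny training set that is entirely invented for illustration. The reading of "+" as a possible sales lead and "-" as noise is also an assumption.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (no stemming here)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(labeled_tweets):
    """Count words per class; labeled_tweets is a list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training tweets
    vocab = set()
    for text, label in labeled_tweets:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # log prior
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training set, invented for illustration only.
# Assumption: "+" marks a possible sales lead, "-" marks noise.
training = [
    ("which health plan should i pick i am confused", "+"),
    ("looking for the best health plan for my family", "+"),
    ("need help choosing a health plan", "+"),
    ("politicians lied about the health plan", "-"),
    ("the health plan debate is all politics and lies", "-"),
    ("sick of hearing politicians argue about this plan", "-"),
]
model = train(training)
label = classify("confused about which plan is best for my family", model)
print(label)  # +
```

A real training set would need far more labeled tweets than this, and the accuracy of any such classifier should be checked on tweets it has not seen.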

4. Once classifications are established, rules can be applied to new Tweets with high accuracy (~90+%)

Result


Other Uses


• Model who will be your most valuable:
– Customer
– Facebook follower

• (Really) Determining Sentiment

• Marketing Mix Simulations

• Consumer Facing Predictive Technology

• Product development (HIX)

Questions?


Appendix


Discovering Relationships Between Data

We can use machine learning to form relationships between sets of data that are seemingly unrelated (Causal Relationships):
• Making your bed in the morning and job satisfaction
• Artificial Christmas trees and family “brag letters”
• What you buy and why


A Causal Diagram Based on Established Relationships for Estimating the Incidence of Coronary Heart Disease (CHD).

(Source: Comparative quantification of health risks: Conceptual framework and methodological issues)

© Tim Gilchrist 2013

Bayes’ Theorem – Mammogram Example

Problem: Estimates of breast cancer overdiagnosis range from 25% to 52%. Physicians often misinterpret their own lab results.


Prior Probability: chance that a woman will develop breast cancer in her 40s, x = 1.4%

New Event (Mammography): ability of mammogram to detect cancer when present, y = 75%

False positives: z = 10%

Posterior Probability: revised probability, given the new event (positive mammogram)

$$P(\text{cancer} \mid \text{positive}) = \frac{xy}{xy + z(1 - x)} \approx 9.6\%$$

A positive mammogram still leaves a 90.4% chance that the test showed something other than cancer. When biopsies are performed on this age group, 75% are negative. Physicians rarely consider the other 90.4%.
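The 9.6% figure can be checked with a few lines of Python. This is a small sketch using the three numbers given on the slide; the function and variable names are chosen here for illustration.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Probability of the condition given a positive test, via Bayes' theorem."""
    true_positives = prior * sensitivity
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)


x = 0.014  # prior: chance of developing breast cancer in her 40s
y = 0.75   # chance the mammogram detects cancer when present
z = 0.10   # false positive rate

p = posterior(x, y, z)
print(f"{p:.1%}")      # 9.6%
print(f"{1 - p:.1%}")  # 90.4%
```

The result is low because the prior is small: among all positive mammograms, the false positives from the 98.6% of women without cancer far outnumber the true positives.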

What Does this Mean To Marketers?


The State of Big Data
