2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical...

23
The Unreasonable Effectiveness of Data Peter Szolovits MIT Computer Science and Artificial Intelligence Lab Professor of Computer Science and Engineering Professor of Health Sciences and Technology, IMES Critical Data: Secondary Use of Big Data from Critical Care January 7, 2014 1 Clinical

Transcript of 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical...

Page 1: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

The UnreasonableEffectiveness of Data

Peter SzolovitsMIT Computer Science and Artificial Intelligence LabProfessor of Computer Science and EngineeringProfessor of Health Sciences and Technology, IMES

Critical Data: Secondary Use of Big Datafrom Critical CareJanuary 7, 2014

1

Clinical

Page 2: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Peter NorvigIEEE Intelligent Systems,2009

... a largetraining set ofthe input-output behaviorthat we seek toautomate isavailable to usin the wild.

Page 3: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Google’s Lessons

Much of human knowledge is not like physics!“... invariably, simple models and a lot of data trump more elaborate models based

on less data”“... simple n-gram models or linear classifiers based on millions of specific features

perform better than elaborate models that try to discover general rules”“... all the experimental evidence from the last decade suggests that throwing away

rare events is almost always a bad idea, because much Web data consists ofindividually rare but collectively frequent events”

••

Page 4: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

More Data vs. Better Algorithms(Word Sense Disambiguation Task)

http://www.stanford.edu/group/mmds/slides2010/Norvig.pdf

Page 5: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Large Language Models in Machine Translation

Text

Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large language models in machine translation.In Proceedings of the2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 858–867,Prague, Czech Republic, 2007.

Page 6: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Vioxx and Heart Attacks

“Trends in inpatient stay due to MI weretightly coupled to the rise and fall ofprescriptions of COX-2 inhibitors, with an18.5% increase in inpatient stays for MIwhen both rofecoxib and celecoxib wereon the market (P<0.001). For everymillion prescriptions of rofecoxib andcelecoxib, there was a 0.5% increase inMI (95%CI 0.1 to 0.9) explaining 50.3%of the deviance in yearly variation of MI-related hospitalizations.”

Brownstein JS, Sordo M, Kohane IS, Mandl KD (2007)The Tell-Tale Heart: Population-Based SurveillanceReveals an Association of Rofecoxib and Celecoxib withMyocardial Infarction. PLoS ONE 2(9): e840.

Page 7: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Shift from Knowledge to Data

Model Prediction = f(Inputs)

Page 8: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

The Learning Health Care System

Every clinical encounter is a “natural experiment”Comprehensive data to support analysis of observational data and validate

hypothesesPrinciples

Use “found data”Clinical records, not experimental data collection protocolsEverything: labs, meds, notes, reports, discharge summaries, billing codes,

vitals, monitoring data, gene sequence & expression, environment, geography,social media, metagenomics, epigenomics, proteomics, ...Crimson—use discarded samples

“Anecdotes are not data” —Folk wisdom“A million anecdotes are data!” —Zak Kohane

Focus on predictive modeling

••

••

••

•••

Page 9: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Using MIMIC Data to Build Predictive Models

MortalityComparison to SAPS IIDaily Acuity ScoresReal-time Acuity Scores (real-time risk assessment)

Other clinical eventspressor weaningintra-aortic balloon pump weaningonset of septic shockacute kidney injury

Data set (MIMIC 2, earlier snapshot, today ~3-4x)10,066 patients: 7,048 development, 3,018 validationselected cases with adequate dataexcluded neurological and trauma cases

••••

•••••

••••

http://dspace.mit.edu/handle/1721.1/46690

Page 10: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational
Page 11: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

What Kinds of Models to Build?

Patient state depends on pathophysiologyGenetic complement, environmental exposures, pathogens, auto-regulatory

mechanisms, treatments, ...Possible formalism:

POMDP’s, but intractableGraphical models (Bayes Nets, Influence Diagrams, etc.), but require many

independence assumptionsSimple models: Cox proportional hazard, naïve Bayes, linear/logistic regression

Derived variables can summarize essential contributions of dynamic variationintegrals, slopes, ranges, frequencies, etc.Transformed variables: inverse, abs, square, square root, log-abs, abs deviation

from mean, log abs deviation, ...

••

•••

••

••

Page 12: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Summary of Mortality Models

Page 13: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Models for Therapeutic Opportunities and Risks

Predictions are not as accurate as mortality models, but still impressiveUsing acuity score instead of such specific models is worse

E.g., Vasopressor weaning — 0.679 vs. 0.809

••

Prediction AUC

Weaning from vasopressors within next 12 hours,remain off for 4 hours

0.809

Pressor weaning + Survival 0.825

Weaning from Intra-Aortic Balloon Pump 0.816

Onset of Septic Shock 0.843

Acute kidney injury 0.742

Page 14: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

But where to go from here?

Include data from narrative records: discharge summaries,radiology/pathology/... reports, nursing/doctor notes, ...

Requires natural language processing — in progressImproved intermediate abstractions of data

Use knowledgeUnsupervised machine learning to find clusters of similar data

patterns

••

••

Page 15: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Clustering Snapshotshttp://groups.csail.mit.edu/medg/ftp/kshetri/Kshetri_MEng.pdf

~10,000 patients x ~12,000 possible features/patient ≈ 1M rows (sparse), 30 minsnapshots ⇒ 11 clusters

# in cluster Survival

GCS Heart Rate Events

Page 16: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Prediction using multi-layer abstraction

Plane of Observations

Plane of Pathophysiological

Clusters(Foci)

Plane of Disease Clusters

Na KGlu

pH CO2

CrUrine

BUN

RRVasopressors

Orientation

SPO2 Vent

MAPJaundiceAIDS

Kidney States

Breathing

InjuryAcute/Chronic aggregate states

Electrolytes Imbalances CVS

Coagulation

INR

WBC

Organ Failure

Rhythms

Slide from Rohit Joshi

Page 17: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Foci%and%FeaturesSelected%~10,000%pa5ents%in%MIMIC$II$database$(excluded%pediatrics,%trauma)Over%12%domain%foci;%many%clinical%variables%per%focusOver%1%million%chart%events%(aBer%binning%into%1%hour%windows)

•••

Focus Features%in%each%Focus

Kidney Crea5nine,%BUN,%BUN2Cr,%UrineOut/Hr/Kg,%…

Liver AST,%Alt,%TBili,%Dbili,%Albumin,%tProtein

Cardio MAP,%HR,%CVP,%BPSys,%BPDias,%Cardiac%Index,%…

Respira5on RR,%SpO2,%FiO2Set,%PEEPSet,%TidVolSet,%SaO2,%PIP,%MinVent,%…

Hematology Hematocrit,%Hgb,%Platelets,%INR,%WBC,RBC,%PT

Electrolytes Na,%Mg,%K,%Ca,%Glucose,%…

AcidYbase Art%CO2,%Art%PaCO2,%Art%pH,%Art%BE

General GCS,%Age,%temp

Medica5on%type Diure5c,%An5arrhythmic,%An5platelet,%%Sympathomime5c,…

Chronic AIDS,%HematMalig,%Metacarcinoma

Electrocardio%(EKG) PVC,%Rhythm%types,%Ectopic%frequency,%… Slide from Rohit Joshi

Page 18: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Layer&2:&Disease&Severity&using&RDF

Learned'clusters'are'closely'related''to'mortality'!! Slide from Rohit Joshi

Page 19: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Mortality Prediction on the Test Data

Our model: AUC of 0.9SAPS-II: AUC of 0.81;

Our method outperforms customized SAPS-II score

High Severity PatientsAll Patients

Our model: AUC of 0.91SAPS-II: AUC of 0.77;

Slide from Rohit Joshi

Page 20: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Pa#ent'State)Transi#on:Effects)of)Past)Transi#ons)of)Other)Foci))

Lungs ElectrolytesKidney

KidneyLungsGeneral

HematAcidbaseElectrolytes

Liver

CardioVascular

Past;Transi=on

ln(organ_transi,on).~..current_state(organ).+.current_state(otherorgans).+...........................................latest_transi,on._trend(otherorgans).+..........................................dura,on_in_current_state(organ).+.........................................dura,on_since_last_change(otherorgans)..Slide from Rohit Joshi

Page 21: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Forecas(ng+the+transi(ons+of+organ+systems

Cardiovascular Respiratory

Confusion3Matrix3(Plot3of3True3states3vs.3Predicted3States)Slide from Rohit Joshi

Page 22: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Future&Needs/Opportuni0es

More&comprehensive&data&collec0onProspec0ve&analysisDecision&supportImproved&visualiza0on&of&pa0ent&stateOp0miza0on&of&interven0ons

•••––

22

Page 23: 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical encounter is a “natural experiment” Comprehensive data to support analysis of observational

Thanks

Funding fromNIH-National Library of Medicine (i2b2)NIH-National Institute of Biomedical Imaging and Bioengineering (MIMIC)ONC SHARP 4 project (Secondary Use of Clinical Data)Simons Center for the Social BrainSiemens Corp.MIMIC Team: Roger Mark, George Moody, Leo Celi, etc.Modeling collaborators: Rohit Joshi, Bill Long, Caleb Hug, Kanak Kshetrii2b2 collaborators: Tianxi Cai, Susanne Churchill, Vivian Gainer, Sergey Goryachev,

Elizabeth Karlson, Isaac Kohane, Fina Kurreema, Katherine Liao, Shawn Murphy,Robert Plenge, Soumya Raychaudhuri, Qing Zeng-Treitler, etc.NLP collaborators: Özlem Uzuner, Bill Long, Anna Rumshisky, Guergana Savova,

Marzyeh Ghassemi, Yuan Luo, Andreea Bodnari, Tawanda Sibanda

My research group: http://medg.csail.mit.edu

•••••••••