2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical...
Transcript of 2014-01-07 Unreasonable Effectiveness of Data...The Learning Health Care System Every clinical...
The UnreasonableEffectiveness of Data
Peter SzolovitsMIT Computer Science and Artificial Intelligence LabProfessor of Computer Science and EngineeringProfessor of Health Sciences and Technology, IMES
Critical Data: Secondary Use of Big Datafrom Critical CareJanuary 7, 2014
1
Clinical
Peter NorvigIEEE Intelligent Systems,2009
... a largetraining set ofthe input-output behaviorthat we seek toautomate isavailable to usin the wild.
Google’s Lessons
Much of human knowledge is not like physics!“... invariably, simple models and a lot of data trump more elaborate models based
on less data”“... simple n-gram models or linear classifiers based on millions of specific features
perform better than elaborate models that try to discover general rules”“... all the experimental evidence from the last decade suggests that throwing away
rare events is almost always a bad idea, because much Web data consists ofindividually rare but collectively frequent events”
••
•
•
More Data vs. Better Algorithms(Word Sense Disambiguation Task)
http://www.stanford.edu/group/mmds/slides2010/Norvig.pdf
Large Language Models in Machine Translation
Text
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. Large language models in machine translation.In Proceedings of the2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 858–867,Prague, Czech Republic, 2007.
Vioxx and Heart Attacks
“Trends in inpatient stay due to MI weretightly coupled to the rise and fall ofprescriptions of COX-2 inhibitors, with an18.5% increase in inpatient stays for MIwhen both rofecoxib and celecoxib wereon the market (P<0.001). For everymillion prescriptions of rofecoxib andcelecoxib, there was a 0.5% increase inMI (95%CI 0.1 to 0.9) explaining 50.3%of the deviance in yearly variation of MI-related hospitalizations.”
Brownstein JS, Sordo M, Kohane IS, Mandl KD (2007)The Tell-Tale Heart: Population-Based SurveillanceReveals an Association of Rofecoxib and Celecoxib withMyocardial Infarction. PLoS ONE 2(9): e840.
Shift from Knowledge to Data
Model Prediction = f(Inputs)
The Learning Health Care System
Every clinical encounter is a “natural experiment”Comprehensive data to support analysis of observational data and validate
hypothesesPrinciples
Use “found data”Clinical records, not experimental data collection protocolsEverything: labs, meds, notes, reports, discharge summaries, billing codes,
vitals, monitoring data, gene sequence & expression, environment, geography,social media, metagenomics, epigenomics, proteomics, ...Crimson—use discarded samples
“Anecdotes are not data” —Folk wisdom“A million anecdotes are data!” —Zak Kohane
Focus on predictive modeling
••
••
••
•••
•
Using MIMIC Data to Build Predictive Models
MortalityComparison to SAPS IIDaily Acuity ScoresReal-time Acuity Scores (real-time risk assessment)
Other clinical eventspressor weaningintra-aortic balloon pump weaningonset of septic shockacute kidney injury
Data set (MIMIC 2, earlier snapshot, today ~3-4x)10,066 patients: 7,048 development, 3,018 validationselected cases with adequate dataexcluded neurological and trauma cases
••••
•••••
••••
http://dspace.mit.edu/handle/1721.1/46690
What Kinds of Models to Build?
Patient state depends on pathophysiologyGenetic complement, environmental exposures, pathogens, auto-regulatory
mechanisms, treatments, ...Possible formalism:
POMDP’s, but intractableGraphical models (Bayes Nets, Influence Diagrams, etc.), but require many
independence assumptionsSimple models: Cox proportional hazard, naïve Bayes, linear/logistic regression
Derived variables can summarize essential contributions of dynamic variationintegrals, slopes, ranges, frequencies, etc.Transformed variables: inverse, abs, square, square root, log-abs, abs deviation
from mean, log abs deviation, ...
••
•••
••
••
Summary of Mortality Models
Models for Therapeutic Opportunities and Risks
Predictions are not as accurate as mortality models, but still impressiveUsing acuity score instead of such specific models is worse
E.g., Vasopressor weaning — 0.679 vs. 0.809
••
•
Prediction AUC
Weaning from vasopressors within next 12 hours,remain off for 4 hours
0.809
Pressor weaning + Survival 0.825
Weaning from Intra-Aortic Balloon Pump 0.816
Onset of Septic Shock 0.843
Acute kidney injury 0.742
But where to go from here?
Include data from narrative records: discharge summaries,radiology/pathology/... reports, nursing/doctor notes, ...
Requires natural language processing — in progressImproved intermediate abstractions of data
Use knowledgeUnsupervised machine learning to find clusters of similar data
patterns
•
••
••
Clustering Snapshotshttp://groups.csail.mit.edu/medg/ftp/kshetri/Kshetri_MEng.pdf
~10,000 patients x ~12,000 possible features/patient ≈ 1M rows (sparse), 30 minsnapshots ⇒ 11 clusters
•
# in cluster Survival
GCS Heart Rate Events
Prediction using multi-layer abstraction
Plane of Observations
Plane of Pathophysiological
Clusters(Foci)
Plane of Disease Clusters
Na KGlu
pH CO2
CrUrine
BUN
RRVasopressors
Orientation
SPO2 Vent
MAPJaundiceAIDS
Kidney States
Breathing
InjuryAcute/Chronic aggregate states
Electrolytes Imbalances CVS
Coagulation
INR
WBC
Organ Failure
Rhythms
Slide from Rohit Joshi
Foci%and%FeaturesSelected%~10,000%pa5ents%in%MIMIC$II$database$(excluded%pediatrics,%trauma)Over%12%domain%foci;%many%clinical%variables%per%focusOver%1%million%chart%events%(aBer%binning%into%1%hour%windows)
•••
Focus Features%in%each%Focus
Kidney Crea5nine,%BUN,%BUN2Cr,%UrineOut/Hr/Kg,%…
Liver AST,%Alt,%TBili,%Dbili,%Albumin,%tProtein
Cardio MAP,%HR,%CVP,%BPSys,%BPDias,%Cardiac%Index,%…
Respira5on RR,%SpO2,%FiO2Set,%PEEPSet,%TidVolSet,%SaO2,%PIP,%MinVent,%…
Hematology Hematocrit,%Hgb,%Platelets,%INR,%WBC,RBC,%PT
Electrolytes Na,%Mg,%K,%Ca,%Glucose,%…
AcidYbase Art%CO2,%Art%PaCO2,%Art%pH,%Art%BE
General GCS,%Age,%temp
Medica5on%type Diure5c,%An5arrhythmic,%An5platelet,%%Sympathomime5c,…
Chronic AIDS,%HematMalig,%Metacarcinoma
Electrocardio%(EKG) PVC,%Rhythm%types,%Ectopic%frequency,%… Slide from Rohit Joshi
Layer&2:&Disease&Severity&using&RDF
Learned'clusters'are'closely'related''to'mortality'!! Slide from Rohit Joshi
Mortality Prediction on the Test Data
Our model: AUC of 0.9SAPS-II: AUC of 0.81;
Our method outperforms customized SAPS-II score
High Severity PatientsAll Patients
Our model: AUC of 0.91SAPS-II: AUC of 0.77;
Slide from Rohit Joshi
Pa#ent'State)Transi#on:Effects)of)Past)Transi#ons)of)Other)Foci))
Lungs ElectrolytesKidney
KidneyLungsGeneral
HematAcidbaseElectrolytes
Liver
CardioVascular
Past;Transi=on
ln(organ_transi,on).~..current_state(organ).+.current_state(otherorgans).+...........................................latest_transi,on._trend(otherorgans).+..........................................dura,on_in_current_state(organ).+.........................................dura,on_since_last_change(otherorgans)..Slide from Rohit Joshi
Forecas(ng+the+transi(ons+of+organ+systems
Cardiovascular Respiratory
Confusion3Matrix3(Plot3of3True3states3vs.3Predicted3States)Slide from Rohit Joshi
Future&Needs/Opportuni0es
More&comprehensive&data&collec0onProspec0ve&analysisDecision&supportImproved&visualiza0on&of&pa0ent&stateOp0miza0on&of&interven0ons
•••––
22
Thanks
Funding fromNIH-National Library of Medicine (i2b2)NIH-National Institute of Biomedical Imaging and Bioengineering (MIMIC)ONC SHARP 4 project (Secondary Use of Clinical Data)Simons Center for the Social BrainSiemens Corp.MIMIC Team: Roger Mark, George Moody, Leo Celi, etc.Modeling collaborators: Rohit Joshi, Bill Long, Caleb Hug, Kanak Kshetrii2b2 collaborators: Tianxi Cai, Susanne Churchill, Vivian Gainer, Sergey Goryachev,
Elizabeth Karlson, Isaac Kohane, Fina Kurreema, Katherine Liao, Shawn Murphy,Robert Plenge, Soumya Raychaudhuri, Qing Zeng-Treitler, etc.NLP collaborators: Özlem Uzuner, Bill Long, Anna Rumshisky, Guergana Savova,
Marzyeh Ghassemi, Yuan Luo, Andreea Bodnari, Tawanda Sibanda
My research group: http://medg.csail.mit.edu
•••••••••
•
•