TRANSFORMING BIG DATA INTO CLINICALLY ACTIONABLE … · With Big Data comes Big Opportunity • We...
TRANSFORMING BIG DATA INTO CLINICALLY ACTIONABLE STRATEGIES
John Quackenbush
Dana-Farber Cancer Institute
Harvard School of Public Health
Background and Disclosures
• Professor of Biostatistics and Computational Biology, Dana-Farber Cancer Institute
• Professor of Computational Biology and Bioinformatics, Harvard School of Public Health
• Many other academic titles
• Numerous advisory boards
• Co-Founder of Genospace, a Precision Genomic Medicine Software Company (now owned by HCA).
Every revolution in science—from
the Copernican heliocentric model to the
rise of statistical and quantum
mechanics, from Darwin’s theory of
evolution and natural selection to the
theory of the gene—has been driven
by one and only one thing: access to
data.
–John Quackenbush
@johnquackenbush-Every revolution in
the history science has been driven by
one and only one thing: access to data.
Twitter version, 115 characters with spaces
@notrealdonaldtrump-We have the best
data. Fantastic Bigly! The other
scientists. losers So sad #fakedata
#failedscientists covfefe
Twitter version, 131 characters with spaces
Costs of Generating Data Have Plummeted
New Sources of Health and Medical Data
Drug Research Social Media Patient Records Genomics
Test Results Claims Data Home Monitoring Mobile Apps
http://ihealthtran.com/wordpress/2013/03/infographic-friday-the-body-as-a-source-of-big-data/
The Body as a Source of Big Data
• It is estimated that by 2015,
the average hospital will
generate 665TB of data
annually.
• Medical imaging archives are
increasing by 20-40% annually.
• Today, 80% of data is
unstructured, such as images,
video, and notes.
Graphic from NetApp
Building Data-Driven
Precision Medicine
CHALLENGES AND KEY TRENDS
EXPLOSION OF GENOMIC DATA
[Figure: cost per genome, 2001 to 2013, on a logarithmic axis from $100,000,000 down to $1,000, with the range of clinical relevance marked]
COMPUTING RESOURCES
• Data Transfer
• Security/Permissioning
• Storage
• Compute Resources
• Semantic Databases
VARIOUS DATA SOURCES IN SILOS
• EMR (clinical, outcomes)
• DNA (genomic)
• Other Lab (diagnostic)
VARIOUS USE CASES AND STAKEHOLDERS
• Patient
• Clinic
• Lab
• R&D
POPULATION ANALYTICS: COMPLEX ANALYSIS MADE EASY
©Genospace, 2012-2015
COMMUNITY PORTAL: SELF-REPORTED DATA
Springfield Diagnostic Labs
PRECISION MEDICINE DEMANDS SIMPLICITY
Curation, curation, curation….
The Need for
Data-Driven
Precision Medicine
Biomarkers, Disease Subtyping and Targeted Therapy
• Her-2+ (Herceptin, Perjeta)
• EML4-ALK (Xalkori)
• K-ras (Erbitux, Vectibix)
• BRAF-V600 (Zelboraf)
• CFTR-G551 (Kalydeco)
Companion Diagnostics: the Right Rx for the Right Disease (Subtype)
Non-Responders to Oncology Therapeutics
Are Highly Prevalent and Very Costly
BRAF Inhibitor Shrinks Metastatic Melanoma
But ONLY in patients whose tumors have the BRAF mutation.
Only a subset of patients have a durable response.
BRAF Inhibitor Prolongs Survival in Patients with Metastatic Melanoma
McDermott U et al. N Engl J Med 2011;364:340-350.
Immunotherapy: No Silver Bullet
But despite the clinical success of antibodies against the immune
regulators CTLA4 and PD-L1/PD-1, only a subset of people exhibit
durable responses, suggesting that a broader view of cancer
immunity is required.
“Even within NCCN, certainly the majority of decision
nodes that are enshrined in NCCN [guidelines] are not
supported by high level evidence.”
Dr. Clifford Hudis
President, ASCO
Interview in Cancer Letter 22 Nov. 2013, 39
Evidence: We Still Need Research
NCCN = National Comprehensive Cancer Network
The Solution Must Be Driven
by Data
We Need to Invest
in Big Data Research
National Research Council on Big Data
National Research Council’s 2013 “Frontiers of Massive
Data Analysis” report concluded:
• The challenges associated with massive data go
far beyond the technical aspects of data.
• The key element in meeting Big Data’s challenges
is the development of rigorous quantitative and
statistical methods.
• It is quite possible to turn data into something
resembling knowledge when actually it is not.
• Overlooking this foundation may yield results that
are, at best, not useful, or harmful at worst.
http://www.nap.edu/catalog.php?record_id=18374
Key Challenges in Big Data
• Preprocessing (Normalization) and Hot Spot Detection
  • Need methods to compare measurements across sources and to rapidly identify salient features
• Data Integration
  • Need methods that can combine data from various sources where there are hidden correlations in the data
• Reproducible Research
  • Need to leverage the volume and velocity of the data to provide opportunities for validation of findings
• Network Methods
  • Need to move beyond correlations in studying relationships in data
• Data Access and Utility
  • Need to assure that different constituencies have access to data in a form that meets their unique needs
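The first challenge above, comparing measurements across sources, is commonly handled with a normalization step. The sketch below illustrates one standard option, quantile normalization. The talk does not name this method. It assumes every source measures the same features in the same order and does not handle ties; the function name `quantile_normalize` and the toy values are hypothetical.

```python
from statistics import mean

def quantile_normalize(sources):
    """Map every source onto one shared distribution (equal lengths, ties not handled)."""
    n = len(sources[0])
    if any(len(s) != n for s in sources):
        raise ValueError("all sources must measure the same number of features")
    # Reference distribution: mean of the k-th smallest value across sources.
    sorted_sources = [sorted(s) for s in sources]
    reference = [mean(s[k] for s in sorted_sources) for k in range(n)]
    normalized = []
    for s in sources:
        # Feature indices ordered from smallest to largest value in this source.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by the reference value of its rank
        normalized.append(out)
    return normalized

# Hypothetical toy data: the same three features measured on two platforms
lab_a = [5.0, 2.0, 3.0]
lab_b = [40.0, 10.0, 30.0]
norm_a, norm_b = quantile_normalize([lab_a, lab_b])
```

After this step both sources take values from the same reference distribution, so a feature's rank within its source, not the platform's scale, determines its normalized value.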
Network Methods
Integrative Network Inference:
PANDA
Kimberly Glass, GC Yuan
[Figure: expression data (genes × conditions) for the angiogenic and non-angiogenic subtypes are used to build a network for each subtype (Network for Angiogenic Subtype, Network for Non-angiogenic Subtype); the two networks are then compared to identify differences]
PANDA: Integrative Network Models
Kimberly Glass, GC Yuan
Regulatory Patterns suggest Therapies
[Figure labels: good prognosis, poor prognosis]
• Single-sample networks for TCGA
glioblastoma patients
• 3-year survival used to define good and poor
prognosis
• LIMMA analysis using network
“edge Z-scores”
• Analysis points to important roles
for FOS-JUN and NFkB
• Gene degree differences identify
mitosis and immune-related genes
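The gene degree comparison in the bullets above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline. Each single-sample network is represented as a mapping from a (transcription factor, gene) pair to an edge weight. A gene's degree is assumed to be the sum of its incoming edge weights. A plain Welch t-statistic stands in for the LIMMA moderated statistic used in the analysis. The gene names `geneA` and `geneB` and all edge weights are hypothetical toy values.

```python
from math import sqrt
from statistics import mean, variance

def gene_in_degree(network):
    """Sum incoming edge weights per gene; `network` maps (tf, gene) -> edge weight."""
    degree = {}
    for (_tf, gene), weight in network.items():
        degree[gene] = degree.get(gene, 0.0) + weight
    return degree

def welch_t(a, b):
    """Welch's t-statistic for two groups (a simple stand-in for LIMMA's moderated t)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def differential_targeting(good_networks, poor_networks):
    """Per-gene t-statistic of in-degree, good prognosis minus poor prognosis."""
    good = [gene_in_degree(n) for n in good_networks]
    poor = [gene_in_degree(n) for n in poor_networks]
    return {g: welch_t([d[g] for d in good], [d[g] for d in poor]) for g in good[0]}

def toy_network(fos_a, jun_a, fos_b, jun_b):
    """Build a hypothetical two-TF, two-gene network from four edge weights."""
    return {("FOS", "geneA"): fos_a, ("JUN", "geneA"): jun_a,
            ("FOS", "geneB"): fos_b, ("JUN", "geneB"): jun_b}

# Hypothetical single-sample networks, three patients per prognosis group
good_networks = [toy_network(2.0, 1.0, 0.5, 0.0),
                 toy_network(2.0, 1.5, 0.5, 0.5),
                 toy_network(1.5, 1.0, 0.0, 0.0)]
poor_networks = [toy_network(0.5, 0.5, 0.5, 0.0),
                 toy_network(0.5, 0.0, 0.0, 0.0),
                 toy_network(1.0, 0.5, 0.5, 0.5)]

scores = differential_targeting(good_networks, poor_networks)
```

In this toy example `geneA` is targeted more strongly in the good-prognosis group, while `geneB` shows no difference. A real analysis would rank all genes by such a statistic and pass the ranking to an enrichment method such as GSEA.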
[Figure: Glioblastoma network signatures. Significant differential targeting (gene degree differences), GSEA, FDR < 0.01; panels show higher targeting in good prognosis and higher targeting in poor prognosis]
With Big Data comes Big Opportunity
• We are at a point in time where data generation is a
commodity.
• Data management, data provenance, data cleaning, and data
wrangling are ongoing challenges.
• Solving these problems provides opportunities to drive
new discoveries important in health and biomedical
research.
• In pharma, there are tremendous opportunities for drug
repurposing, drug discovery, and drug rescue if you
develop integrated data strategies and intelligent
analyses.
• For pharma, part of the strategy for success is
recognizing that you are a data company.
Before I came here I was confused
about this subject.
After listening to your lecture,
I am still confused but at a higher level.
- Enrico Fermi (1901-1954)
Acknowledgments
DFCI GTEx Team
Joseph Barry
Joey (Cho-Yi) Chen
Shelia Gaynor
Marieke Kuijjer
Camila Lopes-Ramos
Megha Padi
Joseph Paulson
John Platig
John Quackenbush
Daniel Schlauch
Administrative Support
Nicole Trotman
http://compbio.dfci.harvard.edu
CDNM, Brigham and Women’s Hospital
Dawn DeMeo
Kimberly Glass
Abhijeet Sonawane