NICTA Copyright 2013 From imagination to impact
Bioinformatics and data analytics for next-generation cancer care
Karin Verspoor, PhD
Principal Researcher
Scientific Director, Health and Life Sciences
NICTA
Challenge
• Enhancing and supporting biomedical data analysis and interpretation will facilitate
– Automated surveillance of patients
– Performance outcomes analysis
– Improved efficiency in treatment
– Clinical Decision Support
– Predictive modeling of disease risk
– Reduction of human effort in disease research
– Improved diagnostics for disease
– Accelerated drug target and lead identification
– Personalised/precision medicine
Data, Data, Everywhere
• Electronic health records
• Radiology images: X-ray, MRI and PET scans
• Radiology and pathology reports
• Data from sensors
• Registry data
• Medicare claim data
• Published biomedical articles
• DNA (genetic material) from biopsy samples
Making Sense of Biomedical Data
Computation for biomedical data

The need for automation
• Yann LeCun, Director of the Center for Data Science at New York University¹:
– “much of the knowledge in the world will soon need to be extracted by machines, because there will not be enough brain power to do it.”
• Russ Altman, Stanford University²:
– “Our entire understanding of biology and medicine is really contained in the published literature. And since people write in natural language, if you can’t get computers to turn that information into databases and computable information, you’re falling behind.”
¹ http://www.forbes.com/sites/sap/2013/11/14/the-white-house-honors-sap-stanford-and-nct/
² http://biomedicalcomputationreview.org/content/ncbcs-take-stock-and-look-forward-fruitful-centers-face-sunset
“Convergence”
Bringing together clinicians, biologists, engineers, computer scientists, mathematicians, statisticians and physicists
Biomedical Informatics: Application of knowledge representation and computational infrastructure for biomedical data storage, retrieval, manipulation, and analysis.
Bioinformatics: Process, analyse and interpret protein and genomics data.
Computational methods and algorithms
Robust, scalable computation
Data mining
Predictive analytics
Machines to Data to Machines to Knowledge to Action
Uncovering Hidden Information
• About 80% of information is buried in textual form
– Clinical notes
– Radiology reports
– GP and specialist letters
– Medical articles
• Text Mining Applications
– Extracting data from clinical notes
– Connecting with proteomic and genomic data
– Linking with biomedical literature
Practice-based Evidence
• EHRs capture health-related data
• Turning that data into actionable information requires analysis and modeling
– Data-driven methods
– Integration of multiple sources of data
– e.g. combining clinical and genetic indicators in prediction of cancer prognosis
• Models produced via data mining and predictive analysis profile inherited risks and environmental/behavioral factors associated with patient disorders
• These models can be utilised to generate predictions about treatment outcomes
Pharmacovigilance
• Mining of clinical records to identify adverse drug events
– Estimated >90% of adverse events do not appear in coded data
– Transform patient records into patient-feature matrix encoded using clinical terminologies
• Detect statistical associations between drugs and adverse events
LePendu et al. (2013) “Pharmacovigilance Using Clinical Notes” Clinical Pharmacology & Therapeutics 93(6), 547–555; doi: 10.1038/clpt.2013.47
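The step from a patient-feature matrix to a drug/event association can be sketched as below. This is a minimal illustration, not the method of LePendu et al.: the feature codes, the toy patients, and the choice of an odds ratio with a Haldane-Anscombe correction are all assumptions made for the example.

```python
from math import exp, log, sqrt

# Hypothetical patient-feature matrix: one set of coded features per patient.
# The codes ("drug:A", "event:rash", ...) are invented for illustration.
PATIENTS = [
    {"drug:A", "event:rash"},
    {"drug:A", "event:rash"},
    {"drug:A"},
    {"drug:B", "event:rash"},
    {"drug:B"},
    {"drug:B"},
    {"drug:B"},
    {"event:nausea"},
]


def contingency(patients, drug, event):
    """Return the 2x2 counts (exposed+event, exposed only, event only, neither)."""
    a = b = c = d = 0
    for features in patients:
        exposed = drug in features
        affected = event in features
        if exposed and affected:
            a += 1
        elif exposed:
            b += 1
        elif affected:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d, correction=0.5):
    """Odds ratio with a 95% Wald interval; `correction` avoids division by zero."""
    a, b, c, d = (x + correction for x in (a, b, c, d))
    estimate = (a * d) / (b * c)
    std_err = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - 1.96 * std_err)
    upper = exp(log(estimate) + 1.96 * std_err)
    return estimate, lower, upper


table = contingency(PATIENTS, "drug:A", "event:rash")
estimate, lower, upper = odds_ratio(*table)
print(table, round(estimate, 2), round(lower, 2), round(upper, 2))
```

With so few patients the interval is very wide; on real records the same counts would be taken over the full matrix and corrected for multiple testing.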
Text Mining for in-hospital infection
• Hospital-acquired infection is a major health burden
– $4.5 billion cost, 98,000 deaths in US annually [1]
– >$100 million, 1000 deaths in Australia annually for 2 common infections [2]
• Surveillance as the foundation of prevention and control
– shown to lower infection rates, improve detection, identify overuse of expensive drugs [3]
– pervasive surveillance not feasible without automated support
• Our approach: mining radiology reports and images
– automate surveillance, leverage hospital information flow
– side benefit: early detection
– Joint project: Alfred Health, Melbourne Health, Peter Mac Cancer Institute
Text mining and beyond
• Current text mining performance
– 94% recall, 90% precision at scan level
– 98% recall, 88% precision at patient level
– Effective for surveillance; improvement needed for real-time detection
• Directly classifying CT images for IFI
– Matching images being provided by hospital partners
– Set up as multi-task learning problem: Detect <Image,Report> pair as indicative of IFI
• Mining patient records for risk indicators
– Mining historical patient data to learn impacting factors
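The difference between scan-level and patient-level figures can be illustrated with a small sketch. The aggregation rule used here (a patient is positive if any of their scans is positive) and the toy labels are assumptions for the example; the slides do not state how patient-level results were derived.

```python
def precision_recall(gold, predicted):
    """Precision and recall for two parallel lists of boolean labels."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def to_patient_level(scans):
    """Collapse (patient_id, label) pairs: positive if any scan is positive."""
    by_patient = {}
    for patient_id, label in scans:
        by_patient[patient_id] = by_patient.get(patient_id, False) or label
    return [by_patient[pid] for pid in sorted(by_patient)]


# Hypothetical gold-standard and predicted labels, one entry per scan.
GOLD = [("p1", True), ("p1", True), ("p2", False),
        ("p2", False), ("p3", True), ("p4", False)]
PRED = [("p1", True), ("p1", False), ("p2", True),
        ("p2", False), ("p3", True), ("p4", False)]

scan_scores = precision_recall([g for _, g in GOLD], [p for _, p in PRED])
patient_scores = precision_recall(to_patient_level(GOLD), to_patient_level(PRED))
print("scan level:", scan_scores)
print("patient level:", patient_scores)
```

In this toy data a missed scan for patient p1 lowers scan-level recall but not patient-level recall, because another scan for the same patient was caught.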
Searching for Disease-related Genes
• Large amounts of individual genetic variation
– SNPs, insertions, deletions
– Copy number variation, genomic duplications, inversions, translocations
– DNA methylation, chromatin state, histone modification, RNA binding affinity, etc.
• Identifying variation is becoming easier, interpreting it remains difficult
Image credit: Jane Ades, NHGRI, http://www.sciencedaily.com/releases/2008/01/080122101914.htm
Single Nucleotide Polymorphisms
• Haplotypes: Associated SNP alleles
• Chromosome regions where two groups differ in haplotype frequencies might contain genes affecting the disease
• Analysis enabled by large-scale genomic data collection, data storage, and statistical frameworks scaled to large data sets
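As a minimal sketch of how a frequency difference between two groups can be tested, the example below applies a chi-squared test to allele counts at a single SNP. The counts are invented, and a single-SNP allelic test is a simplification assumed here; haplotype-based analysis compares multi-SNP combinations in the same spirit.

```python
from math import erfc, sqrt


def allele_chi_squared(case_counts, control_counts):
    """Chi-squared test (1 degree of freedom) on a 2x2 table of allele counts.

    Each argument is a pair (count of allele 1, count of allele 2).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical allele counts in cases and controls.
chi2, p_value = allele_chi_squared((60, 40), (40, 60))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

In a genome-wide setting this test is repeated per SNP, so the significance threshold has to be adjusted for the number of tests.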
Size of Epistasis Search Space
[Figure: grid of 1M SNPs; labels: "1 mm", "1 mm x 10,000 samples"]
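The size of the search space follows from counting unordered SNP combinations. The sketch below assumes that one test is run per combination, which is a simplification; the SNP counts are the dataset sizes quoted in this deck.

```python
from math import comb


def interaction_tests(n_snps, order):
    """Number of unordered SNP combinations of the given order."""
    return comb(n_snps, order)


for n_snps in (300_000, 1_000_000, 5_000_000):
    pairs = interaction_tests(n_snps, 2)
    triples = interaction_tests(n_snps, 3)
    print(f"{n_snps:>9,} SNPs: {pairs:.3e} pairs, {triples:.3e} triples")
```

For 1M SNPs this gives roughly 5 x 10^11 pairs and 1.7 x 10^17 triples, which is why exhaustive 3rd-order search needs far more computing power than 2nd-order search.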
Genome Wide Interaction Search (GWIS)
Adam Kowalczyk
GWIS – Genome Wide Interaction Search system
Our strength: integration of mathematical, computational, signal processing and bioinformatics expertise, resulting in a unique, novel solution, Genome Wide Interaction Search (GWIS):
● Run time improved by up to 3 orders of magnitude
● Improved detection rate
Timing at a glance
2nd Order GWIS with Bigger Datasets: The future of GWAS studies implies bigger datasets giving more precision, but longer computing times! We are ready for these future datasets.
3rd Order GWIS: We are developing even faster techniques, to make 3rd Order GWAS feasible (all combinations of 3 SNPs).

Order | SNPs x Samples | Standard algorithm (IG*) | GWIS-CPU (4 Cores Intel 3.0 GHz) | GWIS-GPU (1 GTX 470), Chi-squared test | GWIS-GPU on MASSIVE GPU Cluster (~ 200 Tesla C2050) | GWIS-GPU on Titan (18,688 Tesla K20)
2nd | 300K x 3K | 108 days | 39 minutes | 3 minutes | ~ 0 | ~ 0
2nd | 1M x 10K | 11 years | 25 hours | 1.85 hours | ~ 0.5 minutes | ~ 0
2nd | 5M x 10K | 275 years | 26 days | 1.91 days | ~ 12.24 minutes | ~ 0
3rd | 300K x 3K | ~ 30K years | ~ 30 years | ~ 2.3 years | ~ 5 days | ~ 38 minutes
3rd | 1M x 10K | ~ 3.6M years | ~ 3.7K years | ~ 282 years | ~ 612 days | ~ 3.2 days
3rd | 5M x 10K | ~ 458.3M years | ~ 453K years | ~ 34.9K years | ~ 208 years | ~ 1.1 years

* Fastest according to the benchmark paper: Li Chen, Guoqiang Yu, David J. Miller, Lei Song, Carl Langefeld, David Herrington, Yongmei Liu, and Yue Wang, A Ground Truth Based Comparative Study on Detecting Epistatic SNPs, Proceedings (IEEE Int Conf Bioinformatics Biomed). 2009 November 1; 1-4(Nov 2009):
Application context: Integrated genomics for lethal prostate cancer
A/Prof Chris Hovens at Royal Melbourne Hospital has:
• Acquired a unique bio-bank of over 1500 prostate cancer samples
• Extensive clinical information
• A need for computational resources and expertise to address complex genomic analysis problems
Integrated genomics for lethal prostate cancer
Sample acquisition
Unique metastatic samples are harvested by clinical and surgical researchers during the progression of the disease
Integrated genomics for lethal prostate cancer
Molecular analysis
Samples are profiled using multiple high-resolution, high-throughput platforms generating large amounts of molecular level data
Heterogeneous:
• DNA sequencing (whole genome)
• RNA sequencing
• Methylation profiling
• Copy-number variation profiling
= 40TB data (doubling every 3 months)
Algorithms for variant interpretation
Harness the power of the literature
• Extract information about genes and genetic variants from biomedical research publications
• Start with the simple hypothesis that any mention of a genetic variant is meaningful
• Prioritize variants with literature support
• Provide pointers to the evidence for human interpretation
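The "any mention is meaningful" hypothesis can be sketched with a simple pattern matcher that ranks variants by the number of documents mentioning them and keeps pointers to those documents. This is an illustration only, not the NICTA system: the regular expression covers just a few HGVS-style substitution forms and dbSNP-style identifiers, and the document texts and identifiers are invented.

```python
import re
from collections import defaultdict

# Assumed, deliberately narrow patterns for variant mentions.
VARIANT_RE = re.compile(
    r"\b(?:c\.\d+[ACGT]>[ACGT]"             # coding DNA substitution, e.g. c.35G>A
    r"|p\.[A-Z][a-z]{2}\d+[A-Z][a-z]{2}"    # protein change, e.g. p.Gly12Asp
    r"|rs\d+)\b"                            # dbSNP-style identifier
)

# Hypothetical abstracts keyed by a made-up document id.
DOCS = {
    "doc1": "The KRAS c.35G>A (p.Gly12Asp) mutation was observed in tumours.",
    "doc2": "We report c.35G>A in two unrelated families.",
    "doc3": "No association was found for rs12345.",
}


def literature_support(docs):
    """Map each variant mention to the sorted ids of documents that contain it."""
    evidence = defaultdict(set)
    for doc_id, text in docs.items():
        for mention in VARIANT_RE.findall(text):
            evidence[mention].add(doc_id)
    return {variant: sorted(ids) for variant, ids in evidence.items()}


def prioritise(evidence):
    """Rank variants by number of supporting documents, most supported first."""
    return sorted(evidence.items(), key=lambda item: (-len(item[1]), item[0]))


ranked = prioritise(literature_support(DOCS))
for variant, doc_ids in ranked:
    print(variant, doc_ids)
```

The document ids returned with each variant play the role of the "pointers to the evidence": a curator can follow them to read the supporting text rather than trust the count alone.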
Curation of Genetic Variant Information from the biomedical literature
http://opennicta.com/home/health/variome
• Partnership with InSiGHT database (Human Variome Project)
– Collect and catalogue mutations in specific genes implicated in gastrointestinal hereditary tumours
– Collected both by direct deposit of genetic variants, and from curation of the published literature
• We have developed a text annotation schema and annotated a corpus of relevant literature
– Variant Annotation Schema
– covers genes, mutations, diseases, patients, body parts, ethnic group, age, gender, characteristics; also relationships among these
• In progress: building entity and relation extraction tools to support curation of this information
A “Phenotypic code” for complex disease
• Simple and complex diseases appear to share a genetic architecture
• Mining of co-morbidities of complex diseases and Mendelian diseases with known genetic cause identifies a ‘code’ for each complex disease in terms of Mendelian genetic loci.
• Evidence of epistasis among the Mendelian variants (superlinear complex disease risk)
Blair et al. Cell (2013); 155 (1); 70-80. http://dx.doi.org/10.1016/j.cell.2013.08.030
BiomRKRS: Biomarker Retrieval and Knowledge Reasoning System
• Knowledge management for biomarker data
• Using ontologies/controlled vocabularies as backbone for integration and retrieval
• Integrating information from a range of sources, including the literature
• Support querying according to various characteristics
Searching for information via complex queries
Predictive Modeling
• EHRs capture health-related data
• Turning that data into actionable information requires analysis and modeling
– Data-driven methods
– Integration of multiple sources of data
– e.g. combining clinical and genetic indicators in prediction of cancer prognosis
• Models produced via data mining and predictive analysis profile inherited risks and environmental/behavioral factors associated with patient disorders, which can be utilized to generate predictions about treatment outcomes
Biomedical informatics @ NICTA
We Do Good STUFF