Genomics to Phenomics: The Complex Journey in Big Data Biomedicine
Asoke K Talukder, PhD
InterpretOmics, Bangalore
Indian Society of Human Genetics, 41st Annual Meeting,
Sankara Nethralaya, Chennai, 3-5 March, 2016
Acknowledgement
• Organizing Committee, ISHG2016
• Authors and agencies making their research articles and data available in the open domain and on the Internet
• Authors of Open source Software
• NCBI, NIH, Wikipedia, Google & other Internet sites that believe in Bhikshu Economy by making their contents open in the Cloud
2March 3-5, 2016
Hunting the “Dwarfing” Gene?
Palm Oil – Activate the Dwarfing Gene (Genomics)
Teak – Repress the Dwarfing Gene (Genomics)
The Human Genome – Decoding the Book of Life
A Milestone for Humanity – the Human genome
Human Genome Completed, 26 June, 2000
J Craig Venter, Bill Clinton, Francis Collins
Trillion-Dollar Science to Trillion-Dollar Industry
The relationship between the number of stem cell divisions in the lifetime of a given tissue and the lifetime risk of cancer in that tissue
Reference: Cristian Tomasetti and Bert Vogelstein, Science, 2 Jan 2015; 347:78-81
Reference: Norbert Stefan, et al, Divergent associations of height with cardiometabolic disease and cancer: epidemiology, pathophysiology, and global implications. The Lancet Diabetes & Endocrinology, 2016; DOI:
Reduction Vs Integration
Genomics (System)
(Genetics)
Talukder AK, Genomics 3.0, Big Data Analytics, Springer, 2015
Evidence Based Science (Biology & Medicine)
Genetics | Genomics
Confirmatory | Exploratory
Hypothesis Driven | Hypothesis Creating
Component | Holistic
Biology | Statistical Data Mining
Big Data in Biomedicine
The 7 Vs of Genomic Big Data
• Volume is defined in terms of the physical volume of the data that need to be online, like giga-byte (10^9), tera-byte (10^12), peta-byte (10^15) or exa-byte (10^18) or even beyond.
• Velocity is about the data-retrieval time or the time taken to service a request. Velocity is also measured through the rate of change of the data volume.
• Variety relates to heterogeneous types of data like text, structured, unstructured, video, audio etcetera.
• Veracity is another dimension to measure data reliability - the ability of an organization to trust the data and be able to confidently use it to make crucial decisions.
• Vexing covers the effectiveness of the algorithm. The algorithm should be designed so that processing time stays close to linear in the data volume and the results carry no bias, so that the data can be processed in reasonable time irrespective of its volume.
• Variability is the scale of data. Data in biology is multi-scale, ranging from sub-atomic ions at picometers, through macro-molecules, cells and tissues, to a population [9] spread over thousands of kilometers.
• Value is the final actionable insight or the functional knowledge. The same mutation in a gene may have a different effect depending on the population or the environmental factors.
Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015
21st Century Biomedicine is a Multi-Scale Challenge
[Figure: biological levels with their length and time scales]
Levels: Genome, Transcriptome, Proteome, Cellular Structure and Function, Tissue Structure and Function, Organ Structure and Function, Patient
Length scales (from the molecular scale upward): nm, nm~μm, μm~mm, mm~cm, 1 m
Time scales: ns, ns-μs, μs-s, s~hour, hour~day, years
Processes: molecular events (e.g. ion-channel gating), diffusion and cell signaling, motility, mitosis, protein turnover, human lifetime
Genomics
Phenomics
‘OMICS’ (High-throughput) Big Data Domains
Domains shown: GWAS, Population Genetics, Microarray, Systems Biology, Phenomics, ChIP-Seq, DNA-Seq, RNA-Seq, Exome-Seq, Repli-Seq, Small RNA-Seq, Metabolic Networks, Proteomics, Metagenomics
Multi Omics Big Data
Hypothesis Creating Multi-Omics Big Data Analytics
Genomic Big Data
Statistics (Exploratory Data Analysis)
Phenomic & Environmental Knowledgebase
Systems Biology
Lung Cancer: A Multi-Omics Multi-Scale Big Data Case Study using iOMICS Pipelines
We took a lung squamous cell carcinoma study and reanalysed its data using iOMICS pipelines to uncover new knowledge
The reanalysis covers an 18-year longitudinal clinical study of lung squamous cell carcinoma (SCC), published as PMID: 25189482.
The data consist of omics data for 93 tumor patients and 16 healthy individuals:
• DNA level genotype data: 64 tumor samples, 373,398 DNA sites
• RNA level gene expression data: 109 samples, 20,117 genes
• Clinical data: general and clinical information (where applicable) for all 109 individuals in the study. Survival information was also available in the form of overall and disease-recurrence-free survival
Multi-Omics Based Multi-Scale Analytics Framework
Data from the patient is integrated with existing knowledgebases using a 3-step analysis framework:
• Top-down Exploratory Data Analysis: analysis of experimental data for molecular information such as DNA mutations and gene expression
• Multi-scale Integrative Analysis: integration of molecular-scale data, such as DNA and RNA level results, for mechanistic modeling
• Bottom-up Integrative and Network Analysis: integration of experimental data analysis results with existing knowledgebases for generalizability and better-quality results
Results from the framework can be used to power clinical decision support systems for treatment strategies and drug design
Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics,
Springer, LNCS9498, 2015
Patient Stratification
Integration of gene expression with recurrence free survival of patients
Recurrence free survival known for 87 samples
Used Cox regression to model survival time as a function of gene expression
Stratified patients into 3 response groups: Good, Average and Poor prognosis
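A minimal sketch of how such a three-way stratification could work. The slides do not say how the groups were cut, so this assumes that each patient gets a Cox-style risk score (the sum of coefficient × expression over the marker genes) and that patients are split into equal thirds by that score. The gene names, coefficients and expression values are hypothetical placeholders, not results from the study.

```python
# Hypothetical Cox coefficients (log hazard ratios) for two placeholder genes.
coefficients = {"GENE_A": 0.8, "GENE_B": -0.5}

# Hypothetical expression values per patient (not study data).
expression = {
    "P1": {"GENE_A": 1.0, "GENE_B": 2.0},
    "P2": {"GENE_A": 2.0, "GENE_B": 0.5},
    "P3": {"GENE_A": 0.5, "GENE_B": 3.0},
    "P4": {"GENE_A": 3.0, "GENE_B": 1.0},
    "P5": {"GENE_A": 1.5, "GENE_B": 1.5},
    "P6": {"GENE_A": 2.5, "GENE_B": 2.0},
}


def risk_score(values, coefs):
    """Cox linear predictor: sum of coefficient * expression over the genes."""
    return sum(coefs[gene] * values[gene] for gene in coefs)


def stratify(scores):
    """Split patients into equal thirds by ascending risk score."""
    ordered = sorted(scores, key=scores.get)
    n = len(ordered)
    groups = {}
    for rank, patient in enumerate(ordered):
        if 3 * rank < n:
            groups[patient] = "Good"      # lowest-risk third
        elif 3 * rank < 2 * n:
            groups[patient] = "Average"   # middle third
        else:
            groups[patient] = "Poor"      # highest-risk third
    return groups


scores = {p: risk_score(v, coefficients) for p, v in expression.items()}
groups = stratify(scores)
for patient in sorted(groups):
    print(patient, round(scores[patient], 2), groups[patient])
```

In practice the coefficients would come from a fitted Cox model, and the cut points could equally be chosen from the survival curves rather than as fixed thirds.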
Aim: Markers of Patient Survival
Survival Probability Curves for Stratified Prognosis Groups
Top Significant Genes separating Poor and Average prognosis are: EIF5A, SCEL, ABCA11P, VAV2
Top Significant Genes separating Good and Average prognosis are: SLC7A11, G6PD, ALDH3A1, NQO1, SOST
Top Significant Genes separating Good and Poor prognosis are: SCEL, VAV2, PPP1R26, ZNF77, EIF5A
Phenotype Based Patient Stratification
Data: Lung Squamous Cell Carcinoma with Basaloid Histology Study (PubMed-ID: 25189482) Clinical Data.
Overall Survivability: The basaloid tumor samples have poor overall survival (OS) compared to the other samples (Fig 1 in the original paper)
Recurrence Free Survivability: Basaloid tumors show distinctly poor recurrence-free survival (RFS) compared to other samples
Age factor: Patients diagnosed at an age of 53 or less showed better prognosis compared to those diagnosed later
Adjuvant Radiotherapy Factor: Patients who did not receive adjuvant radiotherapy (Age ≤ 53) show better overall survival, compared to those who did
Unique Findings:
- The basaloid subtypes showed distinctly poor prognosis compared to the other samples
- Adjuvant radiotherapy is not very effective for improving patient survival in these cases
- For patients diagnosed at 53 years of age or younger, administration of adjuvant radiotherapy is associated with worse long-term overall survival
Aim: Markers of Patient Survival
Differential Gene Expression
Key differentially expressed protein-coding genes between the 2 cancer subtypes were identified
106 differentially expressed genes were identified based on the filtering criteria
Key differentially expressed genes were: KLHL23, IVL, MPZL2, KCNK6, SPRR3, ELL2, MALL, RPRD1A, ZNF124
Filtering criteria: p-value ≤ 0.0001 and absolute log fold change > 0.6
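The two thresholds above can be applied as a simple filter. This is a sketch only: the gene names, p-values and log fold changes below are made-up placeholders, and the function name is hypothetical.

```python
def filter_de_genes(results, max_p=0.0001, min_abs_lfc=0.6):
    """Keep genes with p-value <= max_p and |log fold change| > min_abs_lfc."""
    return [
        gene
        for gene, (p_value, log_fc) in results.items()
        if p_value <= max_p and abs(log_fc) > min_abs_lfc
    ]


# Hypothetical per-gene results: (p-value, log fold change).
results = {
    "GENE_A": (0.00005, 1.2),   # passes both thresholds
    "GENE_B": (0.00005, 0.3),   # fold change too small
    "GENE_C": (0.01, -2.0),     # p-value too large
    "GENE_D": (0.0001, -0.9),   # passes (down-regulated)
}

selected = filter_de_genes(results)
print(selected)
```

Taking the absolute value keeps both up- and down-regulated genes.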
Aim: Basaloid vs. SCC Molecular Comparison
Mutation Association with Cancer
Identified DNA sequence sites with different genotypes between the two lung carcinoma subtypes (basaloid and SCC)
After linkage analysis and filtering, the 373,398 sites were reduced to 735 disease type associated DNA loci
These mapped to 558 unique genes
Aim: Basaloid vs. SCC Molecular Comparison
Karyotype Plot for Mutation Locations across Chromosomes
p-value criteria ≤ 0.001 and odds ratio criteria ≥ 3
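A sketch of the odds ratio behind the second criterion, assuming each DNA site is summarised as a 2×2 table of variant carriers and non-carriers in the two subtypes. The counts and p-values are hypothetical, and the p-value is taken as already computed, because the slides do not name the statistical test used.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: basaloid carriers      b: basaloid non-carriers
    c: SCC carriers           d: SCC non-carriers
    """
    if b * c == 0:
        # Division by zero: infinite if the numerator is positive, undefined otherwise.
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)


def is_associated(a, b, c, d, p_value, max_p=0.001, min_or=3.0):
    """Apply both thresholds: p-value <= max_p and odds ratio >= min_or."""
    return p_value <= max_p and odds_ratio(a, b, c, d) >= min_or


# Hypothetical site: 12 of 16 basaloid samples carry the variant, 6 of 48 SCC samples do.
print(odds_ratio(12, 4, 6, 42))                       # (12 * 42) / (4 * 6)
print(is_associated(12, 4, 6, 42, p_value=0.0002))    # meets both thresholds
print(is_associated(5, 10, 10, 39, p_value=0.0002))   # odds ratio below 3
```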
We characterized the 558 mutated genes identified from the DNA level analysis using XomPathways
Results indicated the key pathways differentiating the tumor subtypes such as cell signaling and adhesion
Functional Characterization
Aim: Basaloid vs. SCC Molecular Comparison
p-value criteria ≤ 0.001 for pathway enrichment
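One common way to obtain a pathway enrichment p-value is a hypergeometric (over-representation) test. The slides do not say which test XomPathways uses, so the sketch below is an assumption for illustration, with hypothetical counts.

```python
from math import comb


def enrichment_p_value(universe, pathway_size, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe:     number of genes that could have been selected
    pathway_size: number of those genes annotated to the pathway
    selected:     number of genes in the list being tested
    overlap:      number of selected genes that fall in the pathway
    """
    upper = min(pathway_size, selected)
    total = comb(universe, selected)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical toy case: 20 genes, 5 in the pathway, 5 selected, 3 overlapping.
p = enrichment_p_value(universe=20, pathway_size=5, selected=5, overlap=3)
print(p, p <= 0.001)
```

With many pathways tested at once, a multiple-testing correction would normally be applied before using a cut-off such as 0.001.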
Pathway-Pathway Network
Gene-Gene Network
Genes-Pathway Bipartite Network
Multi-Scale Integrative Biology (Expression QTL)
Aim: Basaloid vs. SCC Molecular Comparison
The DNA level variants identified for the basaloid histology comparisons were compared with gene expression levels to view the effect of mutations from the DNA to RNA level
Expression levels of some of the associated genes were altered with a large fold change
Interesting genes include: CLCA2, CENPF, SHROOM3, ELL2, ATP10B, CASC15, TIAM2, PROX1, EYA1, C10orf54, HOXC9, SCEL, BCL2, FUT3, YPEL1, PATZ1, CAV2
Multi-Scale Integration
Mutations associated with expression level changes were identified
These were associated with up- or down-regulation of gene expression
Genes-Mutations Integration
Functional Characterization
Aim: Basaloid vs. SCC Molecular Comparison
Functional enrichment of the top differentially expressed genes between tumor and normal samples highlights the key pathways involved: the pathways and processes of epidermal and epithelial cell differentiation
Together, the functional analysis results show that the primary differences between the basaloid and SCC subtypes are associated with tissue structure
This is consistent with the histology-based distinction between the two subtypes
Genes-Biological Processes Bipartite Network
Metabolic and Biochemical Reactions Integration
Aim: Identification of Potential Drug Targets
Genomic level alterations translate into protein and metabolism changes, which finally affect phenotype at a cellular and tissue level
Using expression data, metabolic network models were constructed for healthy and lung cancer samples
Recon X was taken as a reference genome scale model
Genes associated with maximum metabolic alterations can serve as effective targets
Carbohydrate Metabolism Pathways
Image source: Khazaei, T., McGuigan, A., Mahadevan, R.: Ensemble modeling of cancer metabolism. Frontiers in physiology 3 (2012)
Solve Constraint-Based Differential Equations
Aim: Identification of Potential Drug Targets
Three Step process
Step I: Model initiation using constraint based modeling
Cancer state optimized for maximum growth
Healthy state optimized for maximum energy production
Step II: Identification of highly altered reactions and associated genes
Step III: Extension of gene list to include first degree PPI interactions as potential targets
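Step III can be sketched as a one-hop expansion over a protein-protein interaction (PPI) edge list. The gene identifiers and interactions below are hypothetical, and the function name is an illustration rather than part of iOMICS.

```python
def extend_with_first_degree(seed_genes, interactions):
    """Return the seed genes plus every direct PPI partner of a seed gene."""
    seeds = set(seed_genes)
    extended = set(seeds)
    for a, b in interactions:
        if a in seeds:
            extended.add(b)
        if b in seeds:
            extended.add(a)
    return extended


# Hypothetical genes from Step II and a hypothetical undirected PPI edge list.
altered_genes = ["G1", "G2"]
ppi_edges = [("G1", "G3"), ("G3", "G4"), ("G5", "G2"), ("G6", "G7")]

targets = extend_with_first_degree(altered_genes, ppi_edges)
print(sorted(targets))
```

G4 is two hops from G1 and is left out, since only first-degree interactions are included.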
Metabolic and Protein Network Integration
Aim: Identification of Potential Drug Targets
Identified Metabolic Reactions Network
Protein-protein interactions for an identified gene (EIF1B)
Potential targets were identified as genes with large association with altered reactions
A high degree in the human protein interaction network for these genes indicates that targeting them will affect more pathways and may be toxic to the cell
Identified potential drug targets include: NME2, GSR, YWHAZ, TGM2, JAM2, STAT3, TIMP2, RHOB, GIT2 and TK1
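The degree argument can be made concrete by counting interaction partners per candidate and ranking low-degree candidates first. This is a sketch with hypothetical identifiers and edges, and it assumes the edge list contains no duplicate edges or self-loops.

```python
from collections import Counter


def degrees(interactions):
    """Number of interaction partners per protein in an undirected edge list."""
    counts = Counter()
    for a, b in interactions:
        counts[a] += 1
        counts[b] += 1
    return counts


def rank_by_degree(candidates, interactions):
    """Order candidates from lowest to highest degree (ties broken by name)."""
    deg = degrees(interactions)
    return sorted(candidates, key=lambda gene: (deg[gene], gene))


# Hypothetical candidate targets (T*) and their interaction partners (P*).
ppi_edges = [
    ("T1", "P1"), ("T1", "P2"), ("T1", "P3"),
    ("T2", "P1"),
    ("T3", "P2"), ("T3", "P4"),
]
ranking = rank_by_degree(["T1", "T2", "T3"], ppi_edges)
print(ranking)
```

Candidates at the end of the ranking touch the most partners, so by the reasoning above they carry the higher risk of cell toxicity.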
Systems Biology and the Small Molecule Targets
Aim: Identification of Potential Drug Targets
Conclusions from the Cancer/iOMICS Case Study
Molecular differences between basaloid and SCC lung carcinoma subtypes:
Based on DNA and RNA level comparisons, we were able to identify genes involved in the differentiation of the two cancer subtypes.
We tracked the mutations in genes such as SHROOM3, PROX1, CLCA2 etc. to gene expression alterations.
The molecular level differences between the two subtypes were able to predict the cellular and tissue level differences seen between the subtypes
Molecular states associated with poor patient survival:
Identified genes involved in poor patient survival probabilities such as VAV2, EIF5A, SCEL etc.
Identified a hidden molecular subtype within the pure basaloid subgroup, having particularly poor prognosis
Identification of potential drug targets:
Based on the translation of gene expression to metabolic fluxes, we identified key altered metabolic pathways, reactions and associated genes which are putative drug targets
All analysis results were validated using extensive bibliomic data
Omnia Knowledgebase & Clinical Decision Support System
Patient Specific Survival for breast cancer based on the patient age, sex, grade and stage. There are 2,613 individuals with breast cancer of age group 45-49, from SEER within Omnia
• For adjuvant therapeutic intervention A+B, overall QALYs (Quality Adjusted Life Years) is around 8 years and cost per QALY is ₹2,00,000; with likely disease burden of ~₹16,00,000 for 8 years of life.
• For drug A, the overall QALYs is around 6 years and cost per QALY is ₹80,000; with likely burden of ~₹4,80,000 for 6 years of life.
• Using this prognostic information, an informed decision can be made by considering the QALYs and the total cancer burden.
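The burden figures quoted above follow from multiplying the QALYs by the cost per QALY. A short sketch of that arithmetic, using only the numbers on the slide (the function and variable names are illustrative):

```python
# Figures quoted on the slide: (QALYs in years, cost per QALY in rupees).
options = {
    "A+B": (8, 200_000),
    "A": (6, 80_000),
}


def total_burden(qalys, cost_per_qaly):
    """Total cost over the quality-adjusted life years."""
    return qalys * cost_per_qaly


burdens = {name: total_burden(*values) for name, values in options.items()}
for name, burden in burdens.items():
    print(f"{name}: {burden:,} rupees")
```

The output uses Western digit grouping, so ₹16,00,000 appears as 1,600,000 and ₹4,80,000 as 480,000.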
Drugs with detailed description report for breast cancer type chr16_g.69373414T>C (NIP7)
Omnia contains curated Multi-Omics data (Variation, Expression, GO, Pathway, Drug, and Pharmacogenomics) along with subjects’ clinical data such as Demographics, Environmental, Phenotype and other attributes like HGNC, OMIM, UMLS, ICD10, SEER, and MeSH terms. Currently, Omnia contains more than 200,000 Variations, 100 Genomic experiments and 5000 Curated papers for Genotype-Phenotype relationships.
Reference: Adhil M, Talukder AK, Gandham S, Agarwal M, CuraEx: Clinical Expert System Using Big data for Precision Medicine,
Big Data Analytics, Springer, LNCS9498, 2015
iOMICS – the MultiOmics Platform
iOMICS App Store
Enterprises Disrupting Biomedical Industries
InterpretOmics (http://www.interpretomics.co)
Revolutionizing Genomics through Big data Multi-Scale Multi-Omics Solutions
Singapore Life Sciences
Transforming Life Sciences and Precision Medicine
Applied Genetics Diagnostics (http://www.appgendx.com)
The Next Generation Healthsciences company offering Genetic Diagnostic Services
JNCASR
Some Of Our Collaborators/Customers
iOMICS Accelerate Your Biomedical Research – Making it Quicker, Reliable, and Affordable
InterpretOmics
Office: Shezan Lavelle, 5th Floor, #15 Walton Road, Bengaluru 560001
Sequencing Center: #329, 7th Main, HAL 2nd Stage, Indiranagar, Bengaluru 560008
Phone: +91(80)46623800