Genomics2 Phenomics Complete


Genomics to Phenomics: The Complex Journey in Big Data Biomedicine

Asoke K Talukder, PhD

InterpretOmics, Bangalore

Indian Society of Human Genetics, 41st Annual Meeting,

Sankara Nethralaya, Chennai, 3-5 March, 2016

Acknowledgement

• Organizing Committee, ISHG2016

• Authors and agencies making their research articles and data available in the open domain and on the Internet

• Authors of Open source Software

• NCBI, NIH, Wikipedia, Google & other Internet sites that believe in Bhikshu Economy by making their contents open in the Cloud


Hunting the “Dwarfing” Gene?


Palm Oil – Activate the Dwarfing Gene (Genomics)

Teak – Repress the Dwarfing Gene (Genomics)

The Human Genome – Decoding the Book of Life

A Milestone for Humanity – the Human genome

Human Genome Completed, 26 June, 2000

J. Craig Venter, Bill Clinton, Francis Collins


Trillion-Dollar Science to Trillion-Dollar Industry


The relationship between the number of stem cell divisions in the lifetime of a given tissue and the lifetime risk of cancer in that tissue

Reference: Cristian Tomasetti and Bert Vogelstein, Science, 2 Jan 2015; 347:78-81


Reference: Norbert Stefan, et al., Divergent associations of height with cardiometabolic disease and cancer: epidemiology, pathophysiology, and global implications. The Lancet Diabetes & Endocrinology, 2016; DOI:

Reduction Vs Integration


Genomics (System)

(Genetics)

Talukder AK, Genomics 3.0, Big Data Analytics, Springer, 2015

Evidence Based Science (Biology & Medicine)


Genetics | Genomics

Confirmatory | Exploratory

Hypothesis Driven | Hypothesis Creating

Component | Holistic

Biology | Statistical Data Mining

Big Data in Biomedicine

The 7 Vs of Genomic Big Data

• Volume is defined in terms of the physical volume of the data that need to be online, like giga-byte (10^9), tera-byte (10^12), peta-byte (10^15) or exa-byte (10^18) or even beyond.

• Velocity is about the data-retrieval time or the time taken to service a request. Velocity is also measured through the rate of change of the data volume.

• Variety relates to heterogeneous types of data like text, structured, unstructured, video, audio etcetera.

• Veracity is another dimension to measure data reliability - the ability of an organization to trust the data and be able to confidently use it to make crucial decisions.

• Vexing covers the effectiveness of the algorithm. The algorithm needs to be designed so that data processing time is close to linear and the result is free of bias; irrespective of the volume of the data, the algorithm should be able to process it in reasonable time.

• Variability is the scale of data. Data in biology are multi-scale, ranging from sub-atomic ions at picometers, through macro-molecules, cells and tissues, to a population [9] spread over thousands of kilometers.

• Value is the final actionable insight or the functional knowledge. The same mutation in a gene may have a different effect depending on the population or the environmental factors.


Reference: Talukder AK, Genomics 3.0: Big Data in Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015

21st Century Biomedicine is a Multi-Scale Challenge

Biological levels: Genome, Transcriptome, Proteome (Molecular Scale); Cellular Structure and Function; Tissue Structure and Function; Organ Structure and Function; Patient

Length scales: nm, nm~μm, μm~mm, mm~cm, 1 m

Time scales and example events:

ns: Molecular events (e.g. ion-channel gating)

ns-μs: Diffusion and cell signaling

μs-s: Motility

s~hour: Mitosis

hour~day: Protein turnover

years: Human lifetime


Genomics

Phenomics

‘OMICS’ (High-throughput) Big Data Domains

GWAS, Population Genetics, Microarray, Systems Biology, Phenomics, ChIP-Seq, DNA-Seq, RNA-Seq, Exome-Seq, Repli-Seq, Small RNA-Seq, Metabolic Networks, Proteomics, Metagenomics


Multi Omics Big Data



Hypothesis Creating Multi-Omics Big Data Analytics


Genomic Big Data

Statistics (Exploratory Data Analysis)

Phenomic & Environmental Knowledgebase

Systems Biology


Lung Cancer: A Multi-Omics Multi-Scale Big Data Case Study using iOMICS Pipelines

We took a lung squamous cell carcinoma study and reanalysed its data using iOMICS pipelines to uncover novel knowledge

The reanalysis covers an 18-year longitudinal clinical study of Lung Squamous Cell Carcinoma (SCC), published as PMID: 25189482.

The data consist of omics data for 93 tumor patients and 16 healthy individuals:

DNA level genotype data: 64 tumor samples, 373,398 DNA sites

RNA level gene expression data: 109 samples, 20,117 genes

Clinical data: General and clinical information (where applicable) for all 109 individuals in the study. Survival information was also available in the form of overall and disease-recurrence-free survival


Multi-Omics Based Multi-Scale Analytics Framework

Patient data are integrated with existing knowledgebases using a three-step analysis framework:

Top-down Exploratory Data Analysis: Analysis of experimental data for molecular information such as DNA mutations and gene expression

Multi-scale Integrative Analysis: Integration of molecular-scale data, such as DNA- and RNA-level results, for mechanistic modeling

Bottom-up Integrative and Network Analysis: Integration of experimental data analysis results with existing knowledgebases for generalizability and improved-quality results

Results from the framework can be used to power clinical decision support systems for treatment strategies and drug design


Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015

Patient Stratification

Integration of gene expression with recurrence free survival of patients

Recurrence free survival known for 87 samples

Used Cox regression to model survival time as a function of gene expression

Stratified patients into 3 response groups:

Good, Average and Poor prognosis
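The stratification and the survival curves on this slide can be illustrated with a small sketch. It assumes each patient already has a risk score (for example, the linear predictor of a fitted Cox model) and that the three groups are cut at the risk-score tertiles; the slide does not say how the groups were actually formed, so the tertile rule and both function names are illustrative assumptions. The second function is a plain Kaplan-Meier estimator of the kind typically used to draw survival probability curves.

```python
from typing import List, Sequence, Tuple


def stratify_by_risk(risk_scores: Sequence[float]) -> List[str]:
    """Label each patient Good/Average/Poor by risk-score tertile (assumed rule)."""
    ranked = sorted(risk_scores)
    n = len(ranked)
    low_cut = ranked[n // 3]         # boundary between Good and Average
    high_cut = ranked[(2 * n) // 3]  # boundary between Average and Poor
    labels = []
    for score in risk_scores:
        if score < low_cut:
            labels.append("Good")
        elif score < high_cut:
            labels.append("Average")
        else:
            labels.append("Poor")
    return labels


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) steps.

    events[i] is 1 if recurrence was observed at times[i], 0 if censored.
    """
    curve = []
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        recurred = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - recurred / at_risk
        curve.append((t, survival))
    return curve
```

Calling `kaplan_meier` once per prognosis group gives one curve per group, which is the usual way to compare stratified survival.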

Aim: Markers of Patient Survival

Survival Probability Curves for Stratified Prognosis Groups

Top Significant Genes separating Poor and Average prognosis are: EIF5A, SCEL, ABCA11P, VAV2

Top Significant Genes separating Good and Average prognosis are: SLC7A11, G6PD, ALDH3A1, NQO1, SOST

Top Significant Genes separating Good and Poor prognosis are: SCEL, VAV2, PPP1R26, ZNF77, EIF5A


Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015

Phenotype Based Patient Stratification


Data: Lung Squamous Cell Carcinoma with Basaloid Histology Study (PubMed-ID: 25189482) Clinical Data.

Overall Survivability: The basaloid tumor samples have poor overall survival (OS) compared to the other samples (Fig 1 in the original paper)

Recurrence Free Survivability: Basaloid tumors show distinctly poor recurrence-free survival (RFS) compared to other samples

Age factor: Patients diagnosed at an age of 53 or less showed better prognosis compared to those diagnosed later

Adjuvant Radiotherapy Factor: Patients who did not receive adjuvant radiotherapy (Age ≤ 53) show better overall survival, compared to those who did

Unique Findings:

- The basaloid subtypes showed distinctly poor prognosis compared to the other samples

- Adjuvant radiotherapy is not very effective for improving patient survival in these cases

- For patients diagnosed before 53 years of age, administration of adjuvant radiotherapy is associated with worse long-term overall survival

Aim: Markers of Patient Survival

Differential Gene Expression

Key differentially expressed protein-coding genes between the 2 cancer subtypes were identified

106 differentially expressed genes were identified based on the filtering criteria

Key differentially expressed genes were: KLHL23, IVL, MPZL2, KCNK6, SPRR3, ELL2, MALL, RPRD1A, ZNF124

p-value criteria ≤ 0.0001 and absolute log fold change > 0.6
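A minimal sketch of the filtering criteria stated above. The two thresholds come from the slide; the function name, the input layout, and the placeholder gene names are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

P_VALUE_MAX = 1e-4   # from the slide: p-value <= 0.0001
ABS_LOGFC_MIN = 0.6  # from the slide: absolute log fold change > 0.6


def filter_de_genes(stats: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return genes passing both cutoffs.

    stats maps a gene name to (log fold change, p-value).
    """
    return sorted(
        gene
        for gene, (logfc, p_value) in stats.items()
        if p_value <= P_VALUE_MAX and abs(logfc) > ABS_LOGFC_MIN
    )


# Placeholder values, not taken from the study.
example = {
    "GENE_A": (1.2, 1e-5),   # passes both cutoffs
    "GENE_B": (-0.9, 1e-4),  # down-regulated, passes both cutoffs
    "GENE_C": (0.5, 1e-6),   # fold change too small
    "GENE_D": (2.0, 0.01),   # p-value too large
}
selected = filter_de_genes(example)
```

Taking the absolute value keeps both up- and down-regulated genes, which matches the "absolute log fold change" wording of the criterion.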

Aim: Basaloid vs. SCC Molecular Comparison



Mutation Association with Cancer

Identified DNA sequence sites with different genotypes between the two lung carcinoma subtypes (basaloid and SCC)

After linkage analysis and filtering, the 373,398 sites were reduced to 735 disease type associated DNA loci

These mapped to 558 unique genes

Aim: Basaloid vs. SCC Molecular Comparison

Karyotype Plot for Mutation Locations across Chromosomes

p-value criteria ≤ 0.001 and odds ratio criteria ≥ 3
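The odds-ratio criterion can be sketched for a single DNA site as follows. The 2x2 layout (variant present or absent, in basaloid versus other SCC samples) and the 0.5 continuity correction for empty cells are assumptions; the slide gives only the thresholds, not how the association statistics were computed.

```python
P_VALUE_MAX = 1e-3    # from the slide: p-value <= 0.001
ODDS_RATIO_MIN = 3.0  # from the slide: odds ratio >= 3


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio of a 2x2 table.

    a: basaloid samples with the variant,  b: basaloid samples without it
    c: other SCC samples with the variant, d: other SCC samples without it
    If any cell is zero, 0.5 is added to every cell (Haldane-Anscombe
    correction, an assumed choice) to avoid division by zero.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)


def site_passes(a: int, b: int, c: int, d: int, p_value: float) -> bool:
    """Apply both slide criteria to one site, given a precomputed p-value."""
    return p_value <= P_VALUE_MAX and odds_ratio(a, b, c, d) >= ODDS_RATIO_MIN
```

The p-value is taken as an input here because the slide does not name the association test that produced it.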



We characterized the 558 mutated genes identified from the DNA level analysis using XomPathways

Results indicated the key pathways differentiating the tumor subtypes such as cell signaling and adhesion

Functional Characterization

Aim: Basaloid vs. SCC Molecular Comparison

p-value criteria ≤ 0.001 for pathway enrichment
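Pathway enrichment p-values of this kind are commonly obtained from a hypergeometric over-representation test. The sketch below shows that generic test; it is not a description of how XomPathways computes its scores, and the function and parameter names are illustrative.

```python
from math import comb


def enrichment_p_value(n_background: int, n_pathway: int,
                       n_selected: int, n_overlap: int) -> float:
    """P(X >= n_overlap) for a hypergeometric X.

    n_background: genes in the background set
    n_pathway:    background genes annotated to the pathway
    n_selected:   genes in the list being tested (e.g. the mutated genes)
    n_overlap:    genes of the tested list that fall in the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def is_enriched(p_value: float, cutoff: float = 1e-3) -> bool:
    """Apply the slide's cutoff: p-value <= 0.001."""
    return p_value <= cutoff
```

In practice the p-values would also be corrected for the number of pathways tested; the slide does not say whether such a correction was applied.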


Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015

Pathway-Pathway Network

Gene-Gene Network

Genes-Pathway Bipartite Network

Multi-Scale Integrative Biology (Expression QTL)

Aim: Basaloid vs. SCC Molecular Comparison

The DNA level variants identified for the basaloid histology comparisons were compared with gene expression levels to view the effect of mutations from the DNA to the RNA level

Expression levels of some of the associated genes were altered with a large fold change

Interesting genes include: CLCA2, CENPF, SHROOM3, ELL2, ATP10B, CASC15, TIAM2, PROX1, EYA1, C10orf54, HOXC9, SCEL, BCL2, FUT3, YPEL1, PATZ1, CAV2
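One simple way to express "the effect of mutations from the DNA to the RNA level" is the log2 fold change in a gene's expression between samples that carry a variant and samples that do not. The sketch below assumes exactly that comparison, with a pseudocount to keep the ratio defined; the actual expression QTL procedure is not described on the slide.

```python
from math import log2
from statistics import mean
from typing import Sequence


def log2_fold_change(expression: Sequence[float],
                     has_variant: Sequence[bool],
                     pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression: variant carriers vs. non-carriers.

    expression[i] and has_variant[i] describe the same sample.
    The pseudocount (an assumed choice) avoids division by zero.
    """
    carriers = [x for x, v in zip(expression, has_variant) if v]
    others = [x for x, v in zip(expression, has_variant) if not v]
    if not carriers or not others:
        raise ValueError("need at least one carrier and one non-carrier")
    return log2((mean(carriers) + pseudocount) / (mean(others) + pseudocount))
```

A positive value indicates higher expression in carriers (up-regulation), a negative value lower expression (down-regulation).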



Multi-Scale Integration

Mutations associated with expression level changes were identified

These were associated with up or down-regulation of gene expression

Genes-Mutations Integration



Functional Characterization

Aim: Basaloid vs. SCC Molecular Comparison

Functional enrichment of the top differentially expressed genes between tumor and normal samples highlights the key pathways involved: the pathways and processes of epidermal and epithelial cell differentiation

Together, the functional analysis results show that the primary differences between the basaloid and SCC subtypes are associated with tissue structure

This is consistent with the histology-based distinction between the two subtypes

Genes-Biological Processes Bipartite Network



Metabolic and Biochemical Reactions Integration

Aim: Identification of Potential Drug Targets

Genomic level alterations translate into protein and metabolism changes, which finally affect phenotype at a cellular and tissue level

Using expression data, metabolic network models were constructed for healthy and lung cancer samples

Recon X was taken as a reference genome scale model

Genes associated with maximum metabolic alterations can serve as effective targets

Carbohydrate Metabolism Pathways

Image source: Khazaei, T., McGuigan, A., Mahadevan, R.: Ensemble modeling of cancer metabolism. Frontiers in physiology 3 (2012)



Solve Constraint-Based Differential Equations

Aim: Identification of Potential Drug Targets

Three-step process

Step I: Model initiation using constraint based modeling

Cancer state optimized for maximum growth

Healthy state optimized for maximum energy production

Step II: Identification of highly altered reactions and associated genes

Step III: Extension of gene list to include first degree PPI interactions as potential targets
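Constraint-based modeling of a metabolic network is often posed as flux balance analysis, a linear program over steady-state reaction fluxes. The formulation below is that standard textbook form, given as a sketch; the slide does not state the exact equations used, and all symbols are introduced here for illustration.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v \\
\text{subject to}\quad & S\,v = 0, \\
& v_j^{\min} \le v_j \le v_j^{\max} \quad \text{for every reaction } j
\end{aligned}
```

Here $S$ is the stoichiometric matrix of the reference model, $v$ is the vector of reaction fluxes, and $v_j^{\min}, v_j^{\max}$ are flux bounds, which could be tightened using the expression data. Under this reading, the objective vector $c$ would select the growth (biomass) reaction for the cancer state and an energy-producing reaction for the healthy state, matching Step I. Step II would then compare the two optimal flux vectors reaction by reaction and keep the reactions with the largest differences.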



Reference: Agarwal M, Adhil M, Talukder AK, Multi-Omics Multi-Scale Big data Analytics for Cancer Genomics, Big Data Analytics, Springer, LNCS9498, 2015

Metabolic and Protein Network Integration

Aim: Identification of Potential Drug Targets

Identified Metabolic Reactions Network

Protein-protein interactions for an identified gene (EIF1B)



Potential targets were identified as genes with large association with altered reactions

A high degree in the human protein interaction network for these genes indicates that targeting them will affect more pathways and may be toxic to the cell

Identified potential drug targets include: NME2, GSR, YWHAZ, TGM2, JAM2, STAT3, TIMP2, RHOB, GIT2 and TK1
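The two network operations mentioned in this part of the talk, extending a gene list with first-degree interaction partners and checking each candidate's degree, can be sketched on a protein-protein interaction network stored as an adjacency map. The data structure, the function names, and the placeholder gene names are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set

PPINetwork = Dict[str, Set[str]]  # gene -> set of direct interaction partners


def extend_with_neighbors(seed_genes: Iterable[str], ppi: PPINetwork) -> Set[str]:
    """Return the seed genes plus all their first-degree interaction partners."""
    seeds = list(seed_genes)
    extended = set(seeds)
    for gene in seeds:
        extended |= ppi.get(gene, set())
    return extended


def degree(gene: str, ppi: PPINetwork) -> int:
    """Number of direct interaction partners of a gene."""
    return len(ppi.get(gene, set()))


def rank_by_degree(genes: Iterable[str], ppi: PPINetwork) -> List[str]:
    """Order candidates from lowest to highest degree (ties broken by name)."""
    return sorted(genes, key=lambda g: (degree(g, ppi), g))


# Placeholder network, not taken from the study.
toy_ppi: PPINetwork = {
    "G1": {"G2", "G3"},
    "G2": {"G1"},
    "G3": {"G1"},
    "G4": set(),
}
```

Listing low-degree candidates first reflects the concern raised above that hub genes touch more pathways; how the talk's authors weighed degree against metabolic impact is not stated.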

Systems Biology and the Small Molecule Targets

Aim: Identification of Potential Drug Targets



Conclusions from the Cancer/iOMICS Case Study

Molecular differences between basaloid and SCC lung carcinoma subtypes:

Based on DNA and RNA level comparisons, we were able to identify genes involved in the differentiation of the two cancer subtypes.

We tracked the mutations in genes such as SHROOM3, PROX1, CLCA2 etc. to gene expression alterations.

The molecular level differences between the two subtypes were able to predict the cellular and tissue level differences seen between the subtypes

Molecular states associated with poor patient survival:

Identified genes involved in poor patient survival probabilities such as VAV2, EIF5A, SCEL etc.

Identified a hidden molecular subtype within the pure basaloid subgroup, having particularly poor prognosis

Identification of potential drug targets:

Based on the translation of gene expression to metabolic fluxes, we identified key altered metabolic pathways, reactions and associated genes which are putative drug targets

All analysis results were validated using extensive bibliomic data


Omnia Knowledgebase & Clinical Decision Support System


Patient Specific Survival for breast cancer based on the patient age, sex, grade and stage. There are 2,613 individuals with breast cancer of age group 45-49, from SEER within Omnia

• For adjuvant therapeutic intervention A+B, the overall gain is around 8 QALYs (Quality Adjusted Life Years) and the cost per QALY is ₹2,00,000, with a likely disease burden of ~₹16,00,000 for 8 years of life.

• For drug A, the overall gain is around 6 QALYs and the cost per QALY is ₹80,000, with a likely burden of ~₹4,80,000 for 6 years of life.

• Using this prognostic information, an informed decision can be made by considering the QALYs and the total cancer burden.
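The burden figures above follow from multiplying the QALYs gained by the cost per QALY. A small sketch of that arithmetic, with a helper that prints amounts in the Indian digit grouping used on the slide (function names are illustrative):

```python
def total_burden(qalys: int, cost_per_qaly: int) -> int:
    """Total cost in rupees = QALYs gained x cost per QALY."""
    return qalys * cost_per_qaly


def format_inr(amount: int) -> str:
    """Format a non-negative rupee amount with Indian digit grouping."""
    digits = str(amount)
    if len(digits) <= 3:
        return "₹" + digits
    head, tail = digits[:-3], digits[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])  # peel off two digits at a time
        head = head[:-2]
    groups.insert(0, head)
    return "₹" + ",".join(groups + [tail])


burden_a_plus_b = total_burden(8, 200000)  # intervention A+B
burden_a = total_burden(6, 80000)          # drug A alone
```

With the slide's numbers, intervention A+B buys two more QALYs than drug A at a higher cost per QALY, which is the trade-off the decision support view is meant to expose.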

Drugs with detailed description report for breast cancer type chr16_g.69373414T>C (NIP7)

Omnia contains curated Multi-Omics data (Variation, Expression, GO, Pathway, Drug, and Pharmacogenomics) along with subjects’ clinical data such as Demographics, Environmental, Phenotype and other attributes like HGNC, OMIM, UMLS, ICD10, SEER, and MeSH terms. Currently, Omnia contains more than 200,000 Variations, 100 Genomic experiments and 5000 Curated papers for Genotype-Phenotype relationships.

Reference: Adhil M, Talukder AK, Gandham S, Agarwal M, CuraEx: Clinical Expert System Using Big data for Precision Medicine, Big Data Analytics, Springer, LNCS9498, 2015

iOMICS – the MultiOmics Platform


iOMICS App Store


Enterprises Disrupting Biomedical Industries

InterpretOmics (http://www.interpretomics.co): Revolutionizing Genomics through Big data Multi-Scale Multi-Omics Solutions

Singapore Life Sciences: Transforming Life Sciences and Precision Medicine

Applied Genetics Diagnostics (http://www.appgendx.com): The Next Generation Healthsciences company offering Genetic Diagnostic Services


Some Of Our Collaborators/Customers

JNCASR


iOMICS Accelerate Your Biomedical Research – Making it Quicker, Reliable, and Affordable

InterpretOmics

Office: Shezan Lavelle, 5th Floor, #15 Walton Road, Bengaluru 560001

Sequencing Center: #329, 7th Main, HAL 2nd Stage, Indiranagar, Bengaluru 560008

Phone: +91(80)46623800
