Challenges and opportunities in personal omics profiling


Suresh Kumar

The broad idea behind the topic

• The functional state of a cell can be explained by an integrated set of different OMICS data, called a molecular signature or biomarker.

• The same fact can be exploited to distinguish diseased from normal states.

• For the diagnosis of diseases in the future, personal OMICS profiling (POP) is indispensable.

• POP further confers the advantage of producing personal drugs tailored to the individual profile.

Small clarification about components of this topic

• OMICS – The term "omic" is derived from the Latin suffix "ome", meaning mass or many. Thus, OMICS involves a mass (large number) of measurements per endpoint (Jackson et al., 2006).

• Integration of OMICS data – Efficient integration of data from different OMICS can greatly facilitate the discovery of the true causes and states of disease; it is mostly done by software (Andrew et al., 2006).

• Biomarker development or molecular signatures – A set of biomolecular features (snapshots of OMICS integration) used to predict a phenotype of clinical interest (e.g. diseased) on a previously unseen patient sample (Sung et al., 2012).

• Personalized OMICS profiling – The minimal required OMICS data for every person.

• Personalized medicine – Drug formulations prepared on the basis of the POP (Chan and Ginsburg, 2011).

What is 'omics'?

• In a biological context, the suffix -omics refers to the study of large sets of biological molecules (Smith et al., 2005).

• The realization that DNA alone does not regulate complex biological processes (a result of the HGP, 2001) triggered the rapid development of several fields in molecular biology that together are described by the term OMICS.

• The OMICS field ranges from

– Genomics (focused on the genome)

– Proteomics (focused on large sets of proteins, the proteome)

– Metabolomics (focused on large sets of small molecules, the metabolome)

(Jelle et al., 2010)

Genomics

• The field of genomics has been divided into 3 major categories.

– Genotyping (focused on the genome sequence)

• The physiological function of genes and the elucidation of the role of specific genes in disease susceptibility (Syvanen, 2001)

– Transcriptomics (focused on genomic expression)

• The abundance of specific mRNA transcripts in a biological sample is a reflection of the expression levels of the corresponding genes (Manning et al., 2007)

– Epigenomics (focused on epigenetic regulation of genome expression)

• Study of epigenetic processes (changes in expression that do not involve changes in the DNA sequence) on a large (ultimately genome-wide) scale (Feinberg, 2007)

Genotyping

• Goal

– Identification of the physiological function of genes

– Role of specific genes in disease susceptibility (Syvanen et al., 2001)

• Common parameter used

– Among the different variations (insertions, deletions, SNPs, etc.), single nucleotide polymorphisms (SNPs) are the most commonly investigated (Sachidanandam et al., 2001) and can be used as markers for diseases.

– Tag SNPs (an informative subset of SNPs) and fine mapping are further used to identify the true cause of a phenotype (Patil et al., 2001).

• Application

– Identification of genes associated with disease

• Recent improvement in genotyping

– Array-based genotyping techniques allow the simultaneous assessment of up to 1 million SNPs per assay, which makes it possible to genotype the entire genome in genome-wide association studies (GWAS) (Jelle et al., 2010).
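A minimal sketch of how a single SNP might be tested for association with disease, assuming a simple allelic case-control design. The function name and the allele counts are hypothetical; a real GWAS repeats a test like this across all SNPs and corrects for multiple testing.

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows are cases and controls, columns are alternate and reference alleles.
    Returns (chi2, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: the alternate allele is more frequent in cases.
chi2, p = allelic_chi_square(60, 40, 40, 60)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

With roughly 1 million SNPs per array, a per-SNP threshold far below 0.05 is needed before an association is reported.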

Transcriptomics

• Gene expression profiling

– The identification and characterization of the mixture of mRNA that is present in a specific sample.

• Principle

– The abundance of specific mRNA transcripts in a biological sample is a reflection of the expression levels of the corresponding genes (Manning et al., 2007).

• Application

– To associate differences in mRNA mixtures originating from different groups of individuals with phenotypic differences between the groups (Nachtomy et al., 2007).

• Challenge

– The transcriptome, in contrast to the genome, is highly variable over time, between cell types, and in response to environmental changes (Celis et al., 2000).
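The comparison of mRNA abundance between groups can be sketched as a log2 fold change per gene. This is only an illustration: the function name, the pseudocount, and the expression values are assumptions, and real analyses add normalization and statistical testing.

```python
import math
from statistics import mean

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group_a over group_b.

    The pseudocount avoids division by zero for genes that are not expressed.
    """
    if not group_a or not group_b:
        raise ValueError("both groups need at least one sample")
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))

# Hypothetical expression values for one gene in diseased and normal samples.
diseased = [30.0, 34.0, 32.0]
normal = [7.0, 8.0, 6.0]
print(round(log2_fold_change(diseased, normal), 2))
```

A positive value means higher expression in the first group, a negative value means lower expression.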

Epigenomics

• Epigenetic processes

– Mechanisms other than changes in DNA sequence that affect gene transcription and gene silencing (30-32).

– There are a number of epigenetic mechanisms, but epigenomics is mainly based on two: DNA methylation and histone modification (28, 33-39).

– Recently, RNAi has attracted considerable attention (31, 40, 41).

• Goal

– The focus of epigenomics is to study epigenetic processes on a large (ultimately genome-wide) scale to assess the effect on disease (28, 29).

• Association with disease

– Hypermethylation of CpG islands located in promoter regions of genes is related to gene silencing (28, 36). Altered gene silencing plays a causal role in human disease (31, 34, 37, 38, 42).

– Histone proteins are involved in the structural packaging of DNA in the chromatin complex. Post-translational histone modifications such as acetylation and methylation are believed to regulate chromatin structure and therefore gene expression (34, 37).

http://www.ncbi.nlm.nih.gov/pubmed/17522671







Proteomics

• Proteomics provides insights into the role of proteins in biological systems.

• The proteome consists of all proteins present in a specific cell type or tissue. It is highly variable over time and between cell types, and it changes in response to its environment, which is a major challenge (Fliser et al., 2007).

• The overall function of cells can be described by the proteins (intra- and inter-cellular) and the abundance of these proteins (Sellers et al., 2003).

• Although all proteins are translated from mRNA (the transcriptome), post-translational modifications (PTMs) and environmental interactions make protein abundance difficult to predict from gene expression analysis alone (Hanash et al., 2008).

• Tools for proteomics

– Mainly two different approaches, based on detection by

• mass spectrometry (MS) and

• protein microarrays using capturing agents such as antibodies.

• Major focuses

– First, the identification of proteins and of the proteins interacting in protein complexes.

– Then, the quantification of protein abundance. The abundance of a specific protein is related to its role in cell function (Fliser et al., 2007).

Metabolomics

• The metabolome consists of small molecules (e.g. lipids or vitamins) that are also known as metabolites (Claudino et al., 2007).

• Metabolites are involved in energy transfer in cells (metabolism) by interacting with other biological molecules along metabolic pathways.

• Metabolic phenotypes are the by-products of interactions between genetic, environmental, lifestyle and other factors (Holmes et al., 2008).

• The metabolome is highly variable and time dependent, and it consists of a wide range of chemical structures.

• An important challenge of metabolomics is to acquire qualitative and quantitative information on metabolites so that perturbations caused by environmental changes can be detected (Jelle et al., 2010).

Application of different omics

(Joyce et al., 2006)

Overview of the different OMICS technologies

Technology | Molecules of interest | Definition | Temporal variance | Disease influence

Genotyping | DNA | Assessment of variability in DNA sequence in the genome | None | No

Epigenomics | Epigenetic modifications of DNA | Assessment of factors that regulate gene expression without changing DNA sequence of the genome | Low / Moderate | Probable

Gene expression profiling | RNA | Assessment of variability in composition and abundance of the transcriptome | High | Yes

Proteomics | Proteins | Assessment of variability in composition and abundance of the proteome | High | Yes

Metabolomics | Small molecules | Assessment of variability in composition and abundance of the metabolome | High | Yes

(Jelle et al., 2010)

(Jiannis, 2009)

(Carmen and Matthias, 2004)

Genomic techniques

Proteomic techniques

Biological sample

Metabolic Profiling Techniques

• There is no single technology to detect all compounds found in a biological system.

• Metabolic analytical techniques

– gas chromatography (GC),

– liquid chromatography (LC),

– capillary electrophoresis (CE)-MS, and

– NMR

(Kazuki S and Fumio M, 2010)

'OMICS' data repositories

(Joyce et al. 2006)

Why do we integrate the OMICS data?

• The functional state of a biological system can be seen as a set of snapshots from the different OMICS.

• To make better and faster decisions about therapeutic targets.

• To differentiate the diseased phenotype from the normal one.

• Thus data integration is a perennial issue in OMICS.

(Akula et al., 2009)

Integrating OMICS data

• The computational tools for integrating 'omics' data generally tackle three specific tasks

– Identifying the network scaffold by delineating the connections that exist between cellular components

– Decomposing the network scaffold into its constituent parts in an attempt to understand the overall network structure

– Developing cellular or system models to simulate and predict the network behaviour that gives rise to particular cellular phenotypes.

(Akula et al., 2009)
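The first two tasks can be sketched on a toy example: building a network scaffold from pairwise connections and decomposing it into connected modules. The component names and interactions are hypothetical, and connected components are only the simplest possible decomposition, not the method of the cited tools.

```python
from collections import defaultdict, deque

def build_scaffold(interactions):
    """Undirected network scaffold: component -> set of connected components."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def decompose(graph):
    """Split the scaffold into connected modules using breadth-first search."""
    seen = set()
    modules = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        module = []
        while queue:
            node = queue.popleft()
            module.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        modules.append(sorted(module))
    return modules

# Hypothetical connections drawn from different omics layers.
interactions = [
    ("geneA", "proteinA"),
    ("proteinA", "metaboliteX"),
    ("geneB", "proteinB"),
]
print(decompose(build_scaffold(interactions)))
```

The third task, modelling network behaviour, would start from modules like these.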

OMICS integration techniques

(Joyce et al., 2006)

Software for omics data integration

(Joyce et al., 2006)

What is omics based medicine?

• To date, the application of comprehensive molecular information to medicine has been referred to as "genomic medicine" (Guttmacher and Collins, 2002).

• Post-genomic advances, collectively called omics, are giving rise to new possibilities in medicine. A rapidly progressing branch of informatics, called "clinical bioinformatics" (Knaup et al., 2004) or, more recently, "translational informatics" (Gaughan, 2006), plays an indispensable role by deriving clinically meaningful information from the vast amount of omics data, making medicine more predictive and preventive than conventional genomic medicine.

• This new stage of molecular medicine needs a new term to distinguish itself from genomic medicine. We may call it simply “omics-based medicine” (Tanaka, 2010)

Developmental stages of omics medicine

• Data-driven analysis of omics data

– Data mining or exploratory statistics applied to gene expression profiles of diseased cells yields efficient sets of genes, called "signatures", for example to predict the recurrence of cancers (Alizadeh et al., 2002).

• Model-driven analysis of omics data

– Diseases are better understood as phenotypes caused by a "systems distortion of the molecular network" due to the interrelated malfunction of genes and proteins, termed pathway diseases (Grubb et al., 2009).

• System-based analysis of omics data

– All omics data from a biological system are analysed together for diseases as "systems pathology", in the sense that it is a proper application of systems biology to diseases (Tanaka, 2009).

Three generations of omics based medicine

• The first generation of omics-based medicine

– Base

• The inborn individual differences of the genome, using genetic polymorphism

– Analytical method

• Simple statistical parameters

• The second generation of omics-based medicine

– Base

• The vast amount of post-genomic disease omics data containing comprehensive molecular information on diseased somatic cells

– Analytical method

• Data-driven analysis

• The third generation of omics-based medicine

– Base

• Knowledge about the cellular molecular network and a system-level understanding of the disease, called systems pathology

– Analytical method

• Model-driven analysis

(Tanaka, 2009)

Some commercial signatures

What is personalized medicine?

• Personalized medicine (PM) is a

– Broad and rapidly advancing field of health care using each person's unique clinical, genetic, genomic, and environmental information.

– An integrated, coordinated, and evidence-based approach for individualizing patient care.

– PM utilizes our molecular understanding of disease to enhance preventive health care strategies.

• The overarching goal of personalized medicine is to optimize medical care and outcomes for each individual, resulting in an unprecedented customization of patient care.

• The components of personalized medicine are

– Family Health History (FHH)

– Health Risk Assessment (HRA)

– Integration of omics datasets

– Clinical Decision Support (CDS)

(Isaac and Ginsburg, 2010)

Family Health History (FHH)

• FHH is an invaluable tool for the delivery of personal health risk information, reflecting the complex combination of shared genetic, environmental, and lifestyle factors.

• The assessment and integration of FHH information have not been embraced by the health care community (79).

• The challenge of incorporating FHH into the public's health involves three essential components: (a) accessible, standard collection methods; (b) health care provider access; and (c) clinical guidance for interpretation and use (175).

Health Risk Assessment (HRA)

• A fundamental component of personalized medicine is a standard health risk assessment (HRA) to evaluate an individual's likelihood of developing the most common chronic diseases (or disease events).

E.g.,

• The Framingham coronary heart disease model, developed from the Framingham Heart Study begun in 1948 (111).

• The Gail model for breast-cancer risk assessment and its modified versions are also widely accepted tools (58).

• The use of HRAs is limited by the lack of standards for the clinical data required or the algorithms used, and by the lack of integration into health information technology systems (133).

Clinical Decision Support (CDS)

• To optimize the use of FHH and HRAs, clinical decision support (CDS) systems are used.

• Computerized CDS systems, which integrate all patient-specific information to help manage diagnosis and treatment, are increasingly being used.

• CDS systems have been shown to improve prescribing practices, enhance preventive care, and improve compliance with evidence-based standards of care (12, 195, 224).

• CDS systems need efficient algorithms and a standard input format for the different kinds of patient-specific information.

Clinical importance of omics

"-omics" approach | Generated information | Applications | Notable examples

Human genome sequence (genomics) | Whole-genome sequence, SNPs, and CNVs (10–15 million) | Disease mechanisms; disease diagnosis; pharmacogenomics | Age-related macular degeneration (120), HCV virologic response (1), AML (32), warfarin dosing (6)

Gene expression profiles (transcriptomics) | Microarrays and RNA sequencing (25,000 transcripts) | Disease mechanisms; disease diagnosis; disease prognosis; pharmacogenomics | AML (71), ALL (94), ACS (20), breast cancer (161)

Proteome (proteomics) | Protein profiles of specific protein products | Disease diagnosis | ACS (143)

Metabolome (metabolomics) | Metabolic profiles (1,000–10,000 metabolites) | Disease mechanisms; pharmacogenomics | ACS (182), drug toxicity (44), cancer profiling (76), CAD (193)

Abbreviations: ACS, acute coronary syndromes; ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; CAD, coronary artery disease; CNV, copy number variation; HCV, hepatitis C virus; SNP, single-nucleotide polymorphism. Table adapted from Reference 66.

Molecular diagnostics of disease

Opportunities

• There are two important origins of opportunities for personal omics profiling

– The opportunities arising from advances in the biologic sciences

– The opportunities arising from advances in healthcare IT

Increased level in testing

*NIH Report on Genetics and Health **BNP = B-type Natriuretic Peptide

Predictive model development

Overall opportunities of PM

Advancement in health care IT

Challenge I

• OMICS data are currently spread worldwide in a wide variety of formats.

• These formats can be unified and migrated across platforms through suitable techniques.

• Possible solution

– The use of XML techniques to store data.

– XML provides a document markup language that is easy to learn, retrieve, store and transmit. It is semantically richer than HTML.

(Akula, 2009)
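A minimal sketch of storing an omics record as XML and reading it back, using Python's standard xml.etree.ElementTree module. The element and attribute names (omicsRecord, measurement, sampleId) are hypothetical and do not follow any published omics schema.

```python
import xml.etree.ElementTree as ET

def record_to_xml(sample_id, omics_type, measurements):
    """Serialize one sample's measurements to an XML string."""
    root = ET.Element("omicsRecord", attrib={"sampleId": sample_id, "type": omics_type})
    for name, value in measurements.items():
        element = ET.SubElement(root, "measurement", attrib={"name": name})
        element.text = repr(float(value))
    return ET.tostring(root, encoding="unicode")

def xml_to_record(xml_text):
    """Parse the XML string back into (sample_id, omics_type, measurements)."""
    root = ET.fromstring(xml_text)
    measurements = {m.get("name"): float(m.text) for m in root.findall("measurement")}
    return root.get("sampleId"), root.get("type"), measurements

# Hypothetical record: expression values for two genes in one sample.
xml_text = record_to_xml("P001", "transcriptomics", {"TP53": 12.5, "BRCA1": 3.0})
print(xml_text)
print(xml_to_record(xml_text))
```

Because the record is self-describing text, it can be exchanged between platforms without a shared binary format.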

Challenge II

• Integrating fragmented knowledge from several sources of heterogeneous information into a coherent entity (Goble et al., 2008)

• It is widely recognized that successful data integration is one of the keys to improving the productivity of stored data.

• Possible solutions

– Bio warehousing (tool: SQL)

• Integrates its component databases into a common representational framework within a single database management system (Lee, 2006)

– Database federation (CORBA and J2EE)

• A federated database is a logical association of independent databases that provides a single, integrated, coherent view of all resources in the federation.

– Controlled vocabularies

• A form of data integration that enforces naming conventions for the data elements that ultimately appear in omics databases (Avraham et al., 2008)
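Two of these ideas, warehousing and controlled vocabularies, can be sketched together with Python's built-in sqlite3 module. The table layout, the synonym map, and the sample values are assumptions made for illustration, not the design of any existing warehouse.

```python
import sqlite3

# Hypothetical controlled vocabulary: source-specific names -> one official symbol.
SYNONYMS = {"p53": "TP53", "tp53": "TP53", "her2": "ERBB2", "erbb2": "ERBB2"}

def normalize(name):
    """Map a gene or protein name to its controlled-vocabulary symbol."""
    key = name.strip().lower()
    return SYNONYMS.get(key, name.strip().upper())

def build_warehouse(expression_rows, protein_rows):
    """Load two heterogeneous sources into one common table in a single database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurement (patient TEXT, gene TEXT, layer TEXT, value REAL)"
    )
    for layer, rows in (("transcript", expression_rows), ("protein", protein_rows)):
        conn.executemany(
            "INSERT INTO measurement VALUES (?, ?, ?, ?)",
            [(patient, normalize(gene), layer, value) for patient, gene, value in rows],
        )
    conn.commit()
    return conn

def layers_for_gene(conn, patient, gene):
    """Return every omics layer recorded for one gene in one patient."""
    cursor = conn.execute(
        "SELECT layer, value FROM measurement WHERE patient = ? AND gene = ? ORDER BY layer",
        (patient, normalize(gene)),
    )
    return cursor.fetchall()

# The two sources name the same gene differently.
conn = build_warehouse([("P1", "p53", 7.5)], [("P1", "TP53", 0.8)])
print(layers_for_gene(conn, "P1", "tp53"))
```

Normalizing names at load time is what lets one query see both layers; a federation would instead leave the sources in place and translate queries.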

Overall challenges

Making relevant information available

Why was it developed?

– To provide a repository of molecular information and detailed clinical information

– Relating the genome to the pathological findings may yield good future medicine.

iCOD

• Data stored (140 patient cases of hepatocellular carcinoma)

– disease information of the patients

– CGH (Comparative Genomic Hybridization)

– gene expression profiles

– comprehensive clinical information

• clinical manifestations,

• medical images (CT, X-ray, ultrasounds, etc.),

• laboratory tests,

• drug histories,

• pathological findings and

• life-style and environmental information.

• Online address

– http://omics.tmd.ac.jp/icod_pub_eng

Omics data integration tool

• Aim

– To put omics data into an exchangeable format, organize the data in an integrative way, and link it with applications for data interpretation and analysis

• Description

– DIPSBC is a data integration platform for medium-scale collaboration projects.

– Because of its modular design and the incorporation of XML data formats, it is highly flexible and easy to use.

– DIPSBC uses XML for data representation.

• URL

– http://dipsbc.molgen.mpg.de

Advanced personalized medicine

Overview of the work

• Idea behind the work

– Personalized medicine may enter a new realm by combining genomic information with regular, periodic monitoring of physiological states by multiple high-throughput methods.

• Methodology

– The authors presented an integrative personal omics profile (iPOP), an analysis that combines genomic, transcriptomic, proteomic, metabolomic, and autoantibody profiles from a single individual over a 14-month period.

• Outcomes

– The iPOP analysis revealed various medical risks, including type 2 diabetes.

– It also uncovered extensive, dynamic changes in diverse molecular components and biological pathways across healthy and diseased conditions.

– Extremely high-coverage genomic and transcriptomic data, which provide the basis of the iPOP, revealed extensive heteroallelic changes during healthy and diseased states and an unexpected RNA editing mechanism.

– The study demonstrates that longitudinal iPOP can be used to interpret healthy and diseased states by connecting genomic information with additional dynamic omics activity.
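Longitudinal monitoring of one individual can be sketched as a comparison of each new measurement with that person's own healthy baseline. This is a simplified illustration, not the analysis used in the iPOP study; the function name, the threshold, and the values are hypothetical.

```python
from statistics import mean, stdev

def flag_deviations(series, baseline_points, threshold=2.0):
    """Flag time points that deviate from the individual's own baseline.

    The first `baseline_points` values define the personal baseline. Later
    values are reported as (index, z_score) when |z| reaches the threshold.
    """
    if baseline_points < 2 or baseline_points > len(series):
        raise ValueError("baseline needs between 2 and len(series) points")
    baseline = series[:baseline_points]
    mu = mean(baseline)
    sd = stdev(baseline)
    if sd == 0:
        raise ValueError("baseline has no variation, z-scores are undefined")

    flagged = []
    for index, value in enumerate(series[baseline_points:], start=baseline_points):
        z = (value - mu) / sd
        if abs(z) >= threshold:
            flagged.append((index, round(z, 2)))
    return flagged

# Hypothetical measurements of one analyte from one person over time.
measurements = [5.0, 5.2, 4.8, 5.0, 5.1, 7.0]
print(flag_deviations(measurements, baseline_points=4))
```

Using the person's own baseline rather than a population reference range is what makes the profile personal.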

Conclusion

• Advances in molecular biology and computational informatics are powering personalized medicine

• Personalized medicine presents real opportunities and real challenges to the existing model of care provision

• Personalized medicine includes genomics, but is more than genomics

• Healthcare IT will be vital to the realization of personalized medicine

Thank you
