Approaches for Integration of Multiple ‘Omic’ Data
Dmitry Grapov, PhD
Examples
Nature Reviews Genetics 15, 107–120 (2014) doi:10.1038/nrg3643
FBA = flux-balance analysis
• Topological enrichment can give a broad overview of the impacted genes, proteins and metabolites
• Changes in biochemical domains that are corroborated by multiple Omic data sets can be used to identify robust candidates responsible for phenotypic variation between the groups being compared
• Gene-gene, protein-protein or gene-protein interaction networks can be used to deconvolute ambiguous metabolic pathways
Common Approaches
Nature Reviews Genetics 15, 107–120 (2014) doi:10.1038/nrg3643
Biochemical Domain Enrichment Analysis
• Genes/Proteins: DAVID, AmiGO, etc. → GO terms
• Genes/Proteins + Metabolites: IMPaLA, Integrated Molecular Pathway Level Analysis (http://impala.molgen.mpg.de/) → pathways
1. Classify all species (genes, proteins, metabolites) into domains (e.g. biological process, pathway, etc.)
2. Calculate the probability of observing the changed species in each domain by chance
IMPaLA: Gene + Metabolite pathway enrichment
Challenges:
• Removal of redundant information
• Preference of specific vs. generic pathways
• Visualization of gene + metabolite + pathway relationships
Determining significance of the enrichment: Hypergeometric Test
How to calculate statistics to determine enrichment?
hit.num = 51    # number of significantly changed pathway metabolites
set.num = 1455  # number of metabolites in pathway
full = 3358     # all possible metabolites in organism
q.size = 72     # number of significantly changed metabolites
phyper(hit.num - 1, set.num, full - set.num, q.size, lower.tail = FALSE)  # = 1.717553e-06
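For readers without R, the same upper-tail probability can be sketched in Python using only the standard library. This is an illustrative re-implementation, not the code behind the slide: the function name `hypergeom_upper_tail` and its parameter names are assumptions, and the counts are the ones listed above. It sums the hypergeometric probabilities of seeing at least `hits` pathway members among the changed metabolites.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(hits, set_size, universe, query_size):
    """P(X >= hits) when query_size species are drawn without replacement
    from a universe that contains set_size pathway members.

    Hypothetical helper mirroring
    phyper(hits - 1, set_size, universe - set_size, query_size, lower.tail = FALSE).
    """
    max_hits = min(set_size, query_size)
    total = comb(universe, query_size)
    # Count the draws that contain k pathway members, for every k >= hits.
    favourable = sum(
        comb(set_size, k) * comb(universe - set_size, query_size - k)
        for k in range(hits, max_hits + 1)
    )
    # Exact integer arithmetic first, converted to float only at the end.
    return float(Fraction(favourable, total))


# Counts taken from the example above.
p_value = hypergeom_upper_tail(hits=51, set_size=1455, universe=3358, query_size=72)
print(f"{p_value:.6e}")
```

The printed value should agree with the `phyper` result reported above, up to floating-point rounding. Exact integer arithmetic is used here to avoid overflow and cancellation in the large binomial coefficients; for many pathways a vectorized statistics library would be the more practical choice.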
GO Enrichment Analysis: Hierarchy of Redundancy (parents)
• GO is an ontology wherein enrichment is often shared by children and parents.
• It is difficult to co-visualize the term hierarchy and the gene-to-term mapping
Enrichment networks: Removing the Hierarchy of Redundancy
Workflow:
1. If two nodes share all genes, drop the less enriched one (higher p-value)
2. Filter terms based on enrichment
3. Display term to gene/protein relationships as edges in a network
4. Map direction of change in genes/proteins to network node attributes
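The four workflow steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the slides: the function `build_enrichment_network`, the `alpha` cutoff, the term and gene names, and all numbers are hypothetical, and "share all genes" is interpreted here as two terms having identical gene sets.

```python
def build_enrichment_network(term_genes, term_pvalues, fold_changes, alpha=0.05):
    """Return (edges, node_attributes) for a term-to-gene enrichment network."""
    # Step 1: among terms with an identical gene set, keep the most enriched
    # term (lowest p-value) and drop the others.
    best_by_gene_set = {}
    for term, genes in term_genes.items():
        key = frozenset(genes)
        kept = best_by_gene_set.get(key)
        if kept is None or term_pvalues[term] < term_pvalues[kept]:
            best_by_gene_set[key] = term

    # Step 2: filter the remaining terms on enrichment (assumed p-value cutoff).
    kept_terms = sorted(
        term for term in best_by_gene_set.values() if term_pvalues[term] <= alpha
    )

    # Step 3: represent term-to-gene relationships as network edges.
    edges = [(term, gene) for term in kept_terms for gene in sorted(term_genes[term])]

    # Step 4: map the direction of change onto gene node attributes.
    node_attributes = {}
    for _, gene in edges:
        change = fold_changes.get(gene, 0.0)  # assumed to be a log fold change
        if change > 0:
            node_attributes[gene] = "up"
        elif change < 0:
            node_attributes[gene] = "down"
        else:
            node_attributes[gene] = "unchanged"
    return edges, node_attributes


# Hypothetical toy input: parent_term and child_term share all of their genes.
toy_term_genes = {
    "parent_term": {"g1", "g2"},
    "child_term": {"g1", "g2"},
    "other_term": {"g3"},
    "weak_term": {"g4"},
}
toy_pvalues = {"parent_term": 0.01, "child_term": 0.001, "other_term": 0.02, "weak_term": 0.4}
toy_fold_changes = {"g1": 1.2, "g2": -0.8, "g3": 0.5}

edges, node_attributes = build_enrichment_network(toy_term_genes, toy_pvalues, toy_fold_changes)
print(edges)
print(node_attributes)
```

In this toy input the redundant parent is dropped in favour of its more enriched child, and the weakly enriched term is filtered out before any edges are drawn. The edge list and attribute dictionary could then be handed to any network visualization tool.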
Enrichment Network: Mapping of parents through children
GO enrichment network displays:
• Gene names associated with each overrepresented term
• Fold change in protein expression between two groups (can be extended to k > 2 groups)
• Can display enrichment p-value for each term
• Can incorporate metabolites as children of genes
Empirical Networks
• Correlation based networks (CN) (simple, tendency to hairball)
• GGM or partial correlation based networks (advanced, preference of direct over indirect relationships)
• *Increase in robustness with sample size
10.1007/978-1-4614-1689-0_17
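The difference between correlation and partial correlation edges can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a full GGM: it simulates a chain x → y → z with made-up noise levels and uses the first-order partial correlation formula, which for three variables coincides with the full-order partial correlation a GGM would use. The helper names `pearson` and `partial_corr` are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, given):
    """Correlation of a and b after removing the linear effect of `given`."""
    r_ab = pearson(a, b)
    r_ag = pearson(a, given)
    r_bg = pearson(b, given)
    return (r_ab - r_ag * r_bg) / math.sqrt((1 - r_ag ** 2) * (1 - r_bg ** 2))


# Simulated chain x -> y -> z: x and z are related only through y.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
y = [xi + rng.gauss(0, 0.5) for xi in x]
z = [yi + rng.gauss(0, 0.5) for yi in y]

r_xz = pearson(x, z)          # indirect relationship still correlates strongly
p_xz_y = partial_corr(x, z, y)  # shrinks toward zero once y is accounted for
p_xy_z = partial_corr(x, y, z)  # direct relationship remains clearly non-zero

print(f"cor(x, z)      = {r_xz:.2f}")
print(f"pcor(x, z | y) = {p_xz_y:.2f}")
print(f"pcor(x, y | z) = {p_xy_z:.2f}")
```

A correlation network would draw an x to z edge here, which is one way hairballs arise, while a partial correlation network would keep only the direct x to y and y to z edges. With fewer samples the partial correlation estimates become noisier, which is consistent with the note above about robustness and sample size.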
Topological Enrichment Networks
http://pubchem.ncbi.nlm.nih.gov//score_matrix/score_matrix.cgi
http://www.genome.jp/dbget-bin/www_bget?rn:R00975
Topological Enrichment Networks: genes + proteins + metabolites
MetaMapR: Biological network generator
https://github.com/dgrapov/MetaMapR
metabolomics.ucdavis.edu
This research was supported in part by NIH 1 U24 DK097154