Abstract



An Integrative Approach for the Study of Sequence Variation Impact on Biological Processes, Diseases and Environmental Agents’ Risk

Sivakumar Gowrisankar, Amol S Deshmukh, Anil G Jegga and Bruce J Aronow

Department of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center and University of Cincinnati

The integration of genomic sequence analyses from multiple species and strains with protein interaction data and gene expression profiles that reflect specific biological states and processes has opened many new avenues for understanding specific biological systems. Nevertheless, formidable challenges remain in improving the prediction, diagnosis, prognosis, and treatment of human diseases. Can we infer from large molecular datasets how different biological entities are organized and interact, and then predict the effect that genetic polymorphisms or sequence variations might have on interconnected biological processes? The integration of heterogeneous data and information is a key issue in functional genomics. An appropriate data model, together with consistent methods for its integrated representation, analysis, and visualization, has the potential to pave the way for discovery-driven science, enhance hypothesis generation, and provide new focus for experimental validation and refinement.

To represent the presence and impact of polymorphisms in the context of biological pathways, we have sought to unify our representation of molecular, biological, and environmental entities so that biological knowledge from experts and the biomedical literature can be assembled on a storyboard canvas. For example, the representation of a disease could consist of a biological process composed of one or more pathways, within which entities (gene products, complexes, and cellular and subcellular components) undergo one or more interactions and transitions to states associated with disease terms. We have begun developing a suite of applications built on a common database structure that can represent biological processes using a host of publicly available data sources, including gene objects and biological ontologies, which in turn represent systematic abstractions of biomedical literature and expert knowledge.

As part of this exercise, we have compiled all existing protein-protein interactions from “interactome”-rich databases (PreBIND, MINT, DIP and HPRD) and mined the biomedical literature for novel interactions not represented in these specialized databases. Our compiled interaction data comply with the standards set out by the Proteomics Standards Initiative (PSI), facilitating easy data exchange. As available annotations increase, the challenge is to integrate biological process representations in a way that increases our understanding rather than obscuring it in convoluted figures or excessive detail. A network visualizer not only provides a lucid means of summarizing existing biological knowledge about molecular behavior but also helps elucidate the potential implications of sequence variations for protein-protein interactions or the binding of specific transcription factors.
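The compilation step described above can be pictured with a minimal sketch. It assumes each source database has already been exported as a list of protein pairs. The function names, the placeholder identifiers (P1 to P4) and the toy records are hypothetical and are not taken from the poster or from any of the named databases.

```python
from collections import defaultdict


def merge_interactions(sources):
    """Merge per-database pair lists into one undirected, deduplicated map.

    `sources` maps a database name to an iterable of (protein_a, protein_b).
    Returns {frozenset({a, b}): set of database names reporting that pair}.
    """
    merged = defaultdict(set)
    for db_name, pairs in sources.items():
        for a, b in pairs:
            # Normalise case and whitespace so "p2" and "P2" are one protein.
            a, b = a.strip().upper(), b.strip().upper()
            if a == b:
                continue  # self-interactions are ignored in this sketch
            merged[frozenset((a, b))].add(db_name)
    return dict(merged)


def partners(merged, protein):
    """Return the sorted interaction partners of one protein."""
    protein = protein.strip().upper()
    found = set()
    for pair in merged:
        if protein in pair:
            found |= pair - {protein}
    return sorted(found)


# Hypothetical toy records; real exports would be parsed from each database.
sources = {
    "PreBIND": [("P1", "P2"), ("P2", "P3")],
    "HPRD": [("p2", "P1"), ("P3", "P4")],
}
merged = merge_interactions(sources)
```

Keeping the set of reporting databases per pair makes it possible to see which interactions are supported by more than one source. A call such as `partners(merged, "P2")` lists the neighbours that a sequence variation in that protein could plausibly affect.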


Figure labels (anatomy hierarchy for the colon): Intestines; Hollow viscus; Large intestinal structure; Organ with organ cavity; Large Intestine; Colon structure; Region of large intestine; Colon

A Systems Biology Integrative Approach

GKP-PathMaker

Future Directions

References & Support

1. XPrInt and PatholoGene: http://abstrainer.cchmc.org

2. UMLS Knowledge Source Server: http://umlsks.nlm.nih.gov

3. Open Biological Ontologies: http://obo.sourceforge.net

Support: NIEHS U01 ES11038 Mouse Centers Genomics Consortium

PatholoGene

PatholoGene: development of a system that links biological entities, anatomy, pathways and diseases using the UMLS Semantic Network, NCBI-OMIM, and MedLine abstract parsing with ICD10 disease terms and gene symbols. The Semantic Network, through its semantic types, provides a categorization of all UMLS Metathesaurus concepts. The links between the semantic types provide the structure for the Network and represent important relationships in the biomedical domain. The UMLS Metathesaurus contains information about biomedical concepts and terms from many controlled vocabularies and classifications used in patient records, bibliographic and full-text databases, and expert systems. As a test case, we illustrate the analysis of colon cancer as a function of anatomy, pathology, etiology and disease progression.

XPrInt: Extracting & Compiling Protein Interactions

Protein Interactions

PreBIND

GeneRIF

HPRD

OMIM

FANCG, NBS1, RB1, TP53, CDKN2A

TNF, IL5, TNFRSF14, IL12B, IL12A, IL8, IL1B, IL4R, LTB, RAG1, TNFRSF6, TNFRSF17, APOE, TNFRSF7, TNFRSF4, TNFRSF9, TNFRSF5, F3, LTA

NIEHS Candidate Genes’ Categorization Based on GO (Biological Process)

Are these functionally clustered proteins involved in a common biological network or interaction?

Co-citation in literature abstracts using gene/protein symbols and “interactome-specific” keywords

Does a SNP in one or more biological entities result in aberrations within a pathway and manifest as a disease or contribute to increased susceptibility to disease or an altered response to therapeutic agents?
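As an illustration of the co-citation idea noted above, the following sketch counts pairs of gene symbols that appear together in an abstract that also contains an interaction keyword. It is an assumption-laden toy and not the XPrInt implementation: the function name, the symbols (GENEA, GENEB, GENEC), the keyword stems and the example abstracts are all hypothetical.

```python
import re
from itertools import combinations


def cocited_pairs(abstracts, symbols, keywords):
    """Count symbol pairs co-mentioned in abstracts that contain a keyword.

    Symbols are matched as whole words (case-sensitive); keywords are matched
    as case-insensitive word stems, so "bind" also matches "binds".
    """
    symbol_res = {s: re.compile(r"\b" + re.escape(s) + r"\b") for s in symbols}
    keyword_res = [re.compile(r"\b" + re.escape(k), re.IGNORECASE)
                   for k in keywords]
    counts = {}
    for text in abstracts:
        # Skip abstracts with no interaction-related keyword at all.
        if not any(rx.search(text) for rx in keyword_res):
            continue
        found = sorted(s for s, rx in symbol_res.items() if rx.search(text))
        for pair in combinations(found, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Hypothetical abstracts and symbols, for illustration only.
abstracts = [
    "GENEA binds GENEB in vitro.",
    "GENEA and GENEC are both expressed in colon tissue.",
    "GENEB interacts with GENEC and GENEA.",
]
counts = cocited_pairs(abstracts, ["GENEA", "GENEB", "GENEC"],
                       ["bind", "interact"])
```

Co-citation of this kind only suggests a candidate interaction. A pair found this way would still need curation or experimental support before being treated as a real interaction.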

Map, Molecule, Gene, BioMaterial

Publishable GKP Object

Expert Curated Ontologies

Unified Representation of Disease States and Biological Processes using Clinical Phenotype, Molecular Signatures, and Genetic Attributes

Analysis, Diagnosis and Prediction

Disease State A Disease State B

Therapeutic Intervention

Disease Process Modeling Tool

Patient-Centered Clinical Observations

New Insights & Hypotheses

Pathways and Processes-Centered Biomedical Knowledge

Sample-Centered Genetic and Genomic Data

Biological Entities

12 Siblings (UMLS – Concepts)

→Adenomatous Polyposis Coli

→Basal Cell Nevus Syndrome

→Colorectal Neoplasms, Hereditary Nonpolyposis

→Dysplastic Nevus Syndrome

→Exostoses, Multiple Hereditary

→Hamartoma Syndrome, Multiple

→Li-Fraumeni Syndrome

→Multiple Endocrine Neoplasia

→Nephroblastoma

→Neurofibromatoses

→Peutz-Jeghers Syndrome

→Sturge-Weber Syndrome

Inborn Genetic Diseases

Neoplasms

Hereditary NonPolyposis Colon Cancer

Hereditary Neoplastic Syndromes

HNPCC (hMSH2, hMLH1, hPMS1, hPMS2)

Anatomy Ontology

Disease Ontology

Protein-Protein Interactions

Variation (SNPs)

Protein Domains & 3D Structure

MedLine

Pathway Databases

Sequence Databases

Other Databases

Ontologies

Ontology Explorer

PathBuilder

GPB Integrated Annotation

Complex Builder

Gene Summary

Taxonomy

GO Clusterer

Gene Expression

PathMaker Canvas

Genomics Knowledge Platform Biological Object Model

Network Representation

Biological Pathways

Cognitive Processing (Researcher/Scientist Reasoning)

Biological Explanation Mechanistic Explanation

Novel Treatments

Normal Cellular Function Disease Processes

Biomedical Discovery Process

Biological Entities

Genotype

Environment

Etiologies? Treatment? Mechanisms? Signatures? Prevention?

Genome

Transcriptome

Proteome

Interactome

Metabolome

Physiome

Regulome Variome

Pathome

Pharmacogenome

PatholoGene Report: Extracting relationships between disease, anatomy and genes.