Getting Started in Biological Pathway Construction and Analysis
7/30/2019 Getting Started in Biological Pathway Construction and Analysis
1/5
Message from ISCB
Getting Started in Biological Pathway Construction and Analysis
Ganesh A. Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, Stuart C. Sealfon*
Introduction
Life depends on the capacity of individual cells to respond effectively to cues about their changing internal and external environments. Cellular decision making and responses are orchestrated by complex molecular networks consisting of entities such as proteins or RNAs connected by interactions such as activation or synthesis. Information contained in primary databases and in the experimental literature relevant to these networks is so extensive and rapidly growing that it is increasingly difficult to integrate. As an aid to theoretical and experimental research, it is convenient to distill the inferences contained in the experimental literature and databases into knowledgebases that consist of annotated representations of biological pathways.
Pathway building has been performed by individual groups studying a network of interest (e.g., Kitano's group, who assembled an immune signaling pathway [1]) as well as by large bioinformatics consortia (e.g., the Reactome Project [2]) and commercial entities (e.g., Ingenuity Systems). Pathway building is the process of identifying and integrating the entities, interactions, and associated annotations, and populating the knowledgebase. Pathway construction can have either a data-driven objective (DDO) or a knowledge-driven objective (KDO). Data-driven pathway construction is used to generate relationship information of genes or proteins identified in a specific experiment such as a microarray study. Knowledge-driven pathway construction entails development of a detailed pathway knowledgebase for particular domains of interest, such as a cell type, disease, or system. To help researchers get their bearings in this field, in the subsequent sections we provide a brief, practical orientation to existing knowledgebases and to the methods of pathway construction and analysis.
Biological Pathway Construction Workflow
The curation process of a biological pathway entails identifying and structuring content, mining information manually and/or computationally, and assembling a knowledgebase using appropriate software tools. A schematic illustrating the major steps involved in the data-driven and knowledge-driven construction processes is shown in Figure 1. For either DDO or KDO pathway construction, the first step is to mine pertinent information from relevant information sources (discussed in "Public and Private Information Sources") about the entities and interactions. The information retrieved is assembled using appropriate formats, information standards, and pathway building tools (discussed in "Formats, Standards, and Pathway Building Tools") to obtain a pathway prototype. The pathway is further refined to include context-specific annotations such as species, cell/tissue type, or disease type. The pathway can then be verified by the domain experts and updated by the curators based on appropriate feedback. In the section "Illustration of the Pathway Building Process," we describe an example of the KDO approach for building a pathway.
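The workflow above can be sketched as a small in-memory data structure. This is an illustrative sketch only, not the authors' tooling: the class names (`Entity`, `Interaction`, `Pathway`), the method names, and the placeholder evidence string are assumptions introduced here, and the single TRIF to IRF3 link is a deliberately simplified example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A pathway node such as a protein or an RNA."""
    name: str
    kind: str  # e.g. "protein", "RNA"


@dataclass
class Interaction:
    """A directed relationship between two entities, with its evidence."""
    source: str
    target: str
    effect: str  # e.g. "activation", "synthesis"
    evidence: list = field(default_factory=list)     # literature references
    annotations: dict = field(default_factory=dict)  # species, cell type, ...


@dataclass
class Pathway:
    """A small in-memory knowledgebase of entities and interactions."""
    entities: dict = field(default_factory=dict)
    interactions: list = field(default_factory=list)

    def add_entity(self, name, kind):
        self.entities[name] = Entity(name, kind)

    def add_interaction(self, source, target, effect, evidence=()):
        # Assembly step: both endpoints must already be known entities.
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(f"unknown entity: {name}")
        interaction = Interaction(source, target, effect, list(evidence))
        self.interactions.append(interaction)
        return interaction

    def annotate(self, interaction, **context):
        # Refinement step: attach context-specific annotations.
        interaction.annotations.update(context)


# Hypothetical prototype with one simplified interaction.
prototype = Pathway()
prototype.add_entity("TRIF", "protein")
prototype.add_entity("IRF3", "protein")
link = prototype.add_interaction("TRIF", "IRF3", "activation",
                                 evidence=["placeholder reference"])
prototype.annotate(link, species="human", cell_type="dendritic cell")
```

Keeping evidence and context annotations on each interaction, rather than on the pathway as a whole, is one way to let domain experts review and correct individual links during the feedback step.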
Public and Private Information Sources
The extension of reductive biology begun with Aristotle's Parts of Animals to the molecular realm has defined large numbers of entities and interactions in various cells and organisms. Recent attempts to improve knowledge integration have led to refined classifications of cellular entities, such as Gene Ontology (GO), and to the assembly of structured knowledge repositories. Data repositories, which contain information regarding sequence data, metabolism, signaling, reactions, and interactions, are a major source of information for pathway building. A few useful databases are described in Table 1. A comprehensive list of resources can be found at http://www.pathguide.org.
Formats, Standards, and Pathway Building Tools
Various standard, computer-readable, object-oriented formats have been developed to facilitate the organization, storage, exchange, and parsing of pathway knowledgebases and the relevant experimental evidence information. Important pathway and pathway-related formats, which are all XML-based, include Systems Biology Markup Language (SBML), Proteomics Standards Initiative-Molecular Interactions (PSI-MI), and Biological
Editor: Olga Troyanskaya, Princeton University, United States of America
Citation: Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC (2008) Getting started in biological pathway construction and analysis. PLoS Comput Biol 4(2): e16. doi:10.1371/journal.pcbi.0040016
Copyright: © 2008 Viswanathan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Ganesh A. Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, and Stuart C. Sealfon are with the Center for Translational Systems Biology and Department of Neurology, Mount Sinai School of Medicine, New York, New York, United States of America.
* To whom correspondence should be addressed. E-mail: [email protected]
PLoS Computational Biology | www.ploscompbiol.org February 2008 | Volume 4 | Issue 2 | e160001
Pathways eXchange (BioPAX) [3]. SBML, which is used mainly for representation of pathways and mathematical models and is supported by more than 100 software systems, is currently the best-suited format for mathematical modeling and simulations. PSI-MI is designed for structured representation of experimental evidence information, such as molecular interactions data. The richest format, BioPAX, integrates PSI-MI within a pathway representation format and provides general representation mechanisms that permit storage of additional information, such as mathematical models. However, BioPAX is relatively new, and its features are rapidly evolving, making it a technical challenge to implement. Standards have also been developed for representation of different biological information such as the nomenclature of entities and interactions (e.g., HUGO, Human Genome Organization) and experimental data (e.g., MIAME, Minimum Information About a Microarray Experiment). The ability to extract information automatically and to make inferences is furthered by the use of the controlled vocabularies of established taxonomies and ontologies [4]. GO classifies genes to provide insight into their function and relationships and serves as a model for other biological ontologies. A comprehensive review of biological information standards can be found in [5].
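To give a feel for what an XML-based pathway format looks like, the following sketch writes a minimal SBML-style skeleton with Python's standard library. It is a simplified illustration under stated assumptions: the helper `build_sbml`, the model name `toy_model`, and the species `A` and `B` are hypothetical, only a handful of SBML Level 2 elements are emitted, and the output has not been validated against the SBML schema. A dedicated SBML library would be the safer choice for real models.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2"  # SBML Level 2 Version 1 namespace


def build_sbml(model_id, species, reactions):
    """Return an SBML-style XML string.

    species:   iterable of species ids
    reactions: iterable of (reaction_id, reactant_ids, product_ids)
    """
    root = ET.Element("sbml", {"xmlns": SBML_NS, "level": "2", "version": "1"})
    model = ET.SubElement(root, "model", {"id": model_id})

    # A single compartment that holds every species in this toy model.
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": "cell"})

    species_list = ET.SubElement(model, "listOfSpecies")
    for sid in species:
        ET.SubElement(species_list, "species",
                      {"id": sid, "compartment": "cell"})

    reaction_list = ET.SubElement(model, "listOfReactions")
    for rid, reactants, products in reactions:
        reaction = ET.SubElement(reaction_list, "reaction", {"id": rid})
        for tag, members in (("listOfReactants", reactants),
                             ("listOfProducts", products)):
            if members:
                group = ET.SubElement(reaction, tag)
                for sid in members:
                    ET.SubElement(group, "speciesReference", {"species": sid})
    return ET.tostring(root, encoding="unicode")


# Hypothetical one-reaction model: A is converted to B.
document = build_sbml(
    "toy_model",
    species=["A", "B"],
    reactions=[("A_to_B", ["A"], ["B"])],
)

# Parse the document back, as an importing tool would, and list the species.
parsed = ET.fromstring(document)
species_ids = [el.get("id") for el in parsed.iter("{%s}species" % SBML_NS)]
```

Because the format is plain XML, the same document can be read back by any XML parser, which is part of what makes exchange between pathway tools practical.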
Pathway building tools are required to populate, visualize, and store a pathway. Currently there are various pathway building tools [3] that provide the ability to extract information as well as to support multiple standard formats. Cytoscape, CellDesigner, and JDesigner are graphical environments for constructing pathways that can import/export SBML models for simulation. Cytoscape can also access large databases containing protein and gene interactions, with additional support for PSI-MI and BioPAX formats. Pathway Analysis Tools for Integration and Knowledgebase (PATIKA) provides a Web-based interface to public databases, such as Reactome, HPRD, and IntAct, and supports both SBML and BioPAX formats. Its visualization and layout tools facilitate pathway analysis. Reactome displays reactions as pathway diagrams and provides online tools for authoring, curation, and visualization as well as export to SBML and BioPAX formats. The Ingenuity Pathway Analysis tool, a Web-based interface to the Ingenuity Knowledgebase available by paid subscription, enables users to query molecular interactions, biological functions, and diseases to generate customized pathways and analyses.
Illustration of the Pathway Building Process
Pathway curation can be either manual or automated. Manual curation provides the most reliable information extraction from the literature. However, the pace of new discovery can make manually populated databases difficult to maintain. In the mining process, use of appropriate keywords increases the chances of identifying the relevant information. Automated text mining through Natural Language Processing reduces the personnel required for recovery of information, but has severe limitations in accuracy. Information in the scientific literature is highly specialized, semantically unpredictable, and often not textual. Agreeing on facts is difficult even for expert curators. The present generation of text mining tools is probably most useful as an aid to manual curation.
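As a rough illustration of keyword-based mining used as an aid to manual curation, the sketch below flags sentences that mention two known entities together with an interaction cue word. Everything here is an assumption made for the example: the function `candidate_pairs`, the cue words, and the three-sentence text, which is made up and not taken from any publication. It is plain keyword matching rather than Natural Language Processing, and its output is meant to be checked by a curator.

```python
import re
from itertools import combinations


def candidate_pairs(text, entity_names, cue_words):
    """Return (entity_a, entity_b, sentence) triples worth a curator's look.

    A sentence is flagged when it mentions at least two known entities
    and at least one interaction cue word.
    """
    hits = []
    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        found = [name for name in entity_names
                 if re.search(r"\b%s\b" % re.escape(name.lower()), lowered)]
        has_cue = any(cue in lowered for cue in cue_words)
        if has_cue and len(found) >= 2:
            for a, b in combinations(found, 2):
                hits.append((a, b, sentence))
    return hits


# Made-up example text, for illustration only.
text = ("TRIF activates IRF3 in dendritic cells. "
        "MyD88 was also measured. "
        "TLR3 recruits TRIF.")
pairs = candidate_pairs(text,
                        entity_names=["TLR3", "TRIF", "MyD88", "IRF3"],
                        cue_words=["activates", "recruits", "inhibits"])
```

The second sentence is skipped because it names only one entity and no cue word. A filter this simple will miss paraphrases and will flag negated statements, which is one reason the flagged sentences still need expert review.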
The efficient mining of information from the plethora of resource databases hinges on the identification of the most useful primary literature and databases for the biological area of interest. This often poses a challenge, as the choice of databases and mining strategies is biological area-specific. We find Reactome, UniHI, and Ingenuity Systems useful and appropriate for many biological areas.
We provide here an example of assembly of a human dendritic cell signaling pathway involved in responding to microbes, assembled in CellDesigner, built using a KDO-based
doi:10.1371/journal.pcbi.0040016.g001
Figure 1. Schematic Illustrating the Biological Pathway Building Process
Pathway curators initially mine information (Step 1). The mining process can be initiated by two broad pathway building objectives: (a) DDO, wherein a list of genes and/or proteins is obtained by high-throughput experiments such as microarray or mass spectrometry, or (b) KDO, wherein a broad topic of interest is chosen and then the knowledge concerning this topic is mined from resources such as the primary literature and knowledgebases. Information from the mining process is assembled (Step 2), using pathway building tools, into a pathway, which, following many iterations of feedback from domain experts (Step 3) and refinement (Step 4), leads to the desired specific annotated pathway.
information mining approach. A snapshot of the pathway is in Figure 2. We extracted information about entities such as TLRs, TRIF, MyD88, RIG-I, IRF3, and IFNβ predominantly from primary literature and comprehensive review papers obtained from databases such as PubMed. Reactome's and Ingenuity Systems' presorted, manually curated information and search tools enabled us to reliably identify and extract the pertinent entities and interactions. Identification and extraction of relevant information from appropriate primary literature is a tedious task. Although slower, use of information from the pathway resources expedited the identification step. The relevant primary literature is also populated as annotations for entities and interactions while creating the pathway (unpublished data). The efficient building and visualization of a pathway requires the use of appropriate software. We chose to assemble the pathway in CellDesigner due to its flexible graphics capabilities, which facilitate a clear presentation of high-granularity pathways.
DDO pathway building, which can follow a similar process, differs in that the starting point is typically a collection of genes or proteins identified in a global experiment whose relationships are not well understood. In this case, the pathway building process is used to elucidate the pathways and functional relationships shared by regulated entities.
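One simple way to start a DDO analysis is to pull, from an existing interaction list, the interactions that connect the experimentally identified genes. The sketch below is only an assumption about how this could be done: the function `induced_subnetwork`, the `allow_linkers` option, and the single-letter gene names are hypothetical, and real knowledgebases would be queried through their own interfaces.

```python
def induced_subnetwork(interactions, hits, allow_linkers=True):
    """Select interactions relevant to experimentally identified genes.

    interactions:  iterable of (source, target) pairs from a knowledgebase
    hits:          genes or proteins identified in the experiment
    allow_linkers: also keep an intermediate entity that directly connects
                   two different hits (hit -> linker -> hit)
    """
    hits = set(hits)
    interactions = list(interactions)
    # Interactions whose two endpoints were both identified in the experiment.
    selected = {(s, t) for s, t in interactions if s in hits and t in hits}

    if allow_linkers:
        hits_feeding = {}  # linker -> hits that act on the linker
        hits_fed = {}      # linker -> hits that the linker acts on
        for s, t in interactions:
            if s in hits and t not in hits:
                hits_feeding.setdefault(t, set()).add(s)
            if t in hits and s not in hits:
                hits_fed.setdefault(s, set()).add(t)
        for linker, sources in hits_feeding.items():
            for target in hits_fed.get(linker, ()):
                for source in sources:
                    if source != target:
                        selected.add((source, linker))
                        selected.add((linker, target))
    return sorted(selected)


# Hypothetical knowledgebase and hit list.
knowledgebase = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("F", "G")]
subnetwork = induced_subnetwork(knowledgebase, hits=["A", "C", "D"])
# subnetwork == [("A", "B"), ("B", "C"), ("C", "D")]
```

Here `B` was not measured but is retained because it links two hits, whereas `E` is dropped because it leads to no other hit. Whether to allow such linkers, and how many steps to allow, is a design choice that trades coverage against noise.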
Pathway Analysis
Pathway analysis refers to the computational approaches used to investigate network behavior as a system. Pathway analysis can be broadly classified into two types: topological/structural network analysis and dynamical analysis.
Topological analysis of a pathway identifies the global qualitative properties of the system [6]. One approach uses classical graph theory to identify various motifs in a pathway represented as a directed graph. A motif is a group of interacting entities capable of information processing that appears repeatedly. If the graph is signed (i.e., the positive or negative regulatory effects of each interaction, which may be obtained from primary literature, are specified), Boolean network analysis can be used to identify semi-quantitative features such as positive/negative feedback loops and minimal cut sets in the pathway. Feedback loops strongly affect the behavior of the system. A minimal cut set of entities is the smallest group of entities that, when disrupted, affects the particular network behavior of interest. The identification of minimal cut sets aids the assessment of the robustness of a system. Motifs, feedback loops, and minimal cut sets of a pathway connecting, for example, a receptor
Table 1. A List of Databases, Classified Based on the Type of Information Represented, Commonly Used during Biological Pathway Construction

Database            Description

Protein-Protein Interaction Databases: Organize experimental and/or in silico interactions
BIND                200,000 documented biomolecular interactions and complexes
MINT                Experimentally verified interactions
HPRD                Elegant and comprehensive presentation of the interactions, entities, and evidences
MPact               Yeast interactions. A part of MIPS
DIP                 Experimentally determined interactions
IntAct              Database and analysis system of binary and multiprotein interactions
PDZBase             PDZ domain-containing proteins
GNPV                Based on specific experiments and literature
BioGrid             Physical and genetic interactions
UniHI               Comprehensive human protein interactions
OPHID               Combines PPI from BIND, HPRD, and MINT

Metabolic Pathways Databases: Compendium of pathways describing metabolic and physical processes (Primary source for metabolic information initiated by Stanford Research Initiative)
EcoCyc              Entire genome and biochemical machinery of E. coli
MetaCyc             Pathways of more than 165 species
HumanCyc            Human metabolic pathways and the human genome
BioCyc              Collection of databases for several organisms

Signaling Pathways Databases: Pathways pertaining to signal transduction
KEGG                Comprehensive. Links to several useful databases
PANTHER             Compendium of pathways built using CellDesigner
Reactome            Hierarchical layout. Extensive links to relevant databases
Biomodels           Domain experts curated pathways and associated mathematical models
STKE                Repository of canonical pathways
Ingenuity Systems   Commercial mammalian biological knowledgebase
PID                 Compendium of several assembled signaling pathways
BioPP               Repository of biological pathways built using CellDesigner

Most databases have a graphics viewer for displaying entities and interactions. Refer to Table S1 for a more detailed description and URLs of these databases.
BIND, Biomolecular Interaction Network Database; BioPP, Biological Pathway Publisher; DIP, Database of Interacting Proteins; EcoCyc, Encyclopaedia of E. coli Genes and Metabolism; GNPV, Genome Network Platform Viewer; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; MetaCyc, a Metabolic Pathway database; MINT, Molecular INTeraction database; MIPS, Munich Information center for Protein Sequences; OPHID, Online Predicted Human Interaction Database; PANTHER, Protein Analysis through Evolutionary Relationship database; PID, The Pathway Interaction Database; STKE, Signal Transduction Knowledge Environment; UNIHI, Unified Human Interactome.
doi:10.1371/journal.pcbi.0040016.t001
and a transcription factor, such as NF-κB, that regulates many genes, illustrate the global properties of the system. Probabilistic graphical model approaches such as Bayesian network analysis are used to analyze and learn about the cellular networks from quantitative experimental data and to infer indirect relationships.
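The notions of feedback loops and minimal cut sets in a signed directed graph can be made concrete with a small brute-force sketch. The assumptions are stated plainly: the function names and the toy network (`R`, `A`, `B`, `I`, `TF`) are hypothetical, the "behavior of interest" is taken to be simple reachability of the transcription factor from the receptor, and cut sets are restricted to intermediate nodes of smallest size. Exhaustive search of this kind is practical only for small graphs and is not the algorithm of any particular analysis tool.

```python
from itertools import combinations


def feedback_loops(edges):
    """Enumerate simple cycles of a signed directed graph.

    edges: dict mapping (source, target) -> +1 (activation) or -1 (inhibition)
    Returns a list of (cycle_nodes, sign); the sign is the product of the
    edge signs, so -1 marks a negative feedback loop.
    """
    graph = {}
    for (s, t) in edges:
        graph.setdefault(s, []).append(t)
    loops = []

    def walk(start, node, path, sign):
        for nxt in graph.get(node, []):
            step = sign * edges[(node, nxt)]
            if nxt == start:
                loops.append((tuple(path), step))
            # Only visit nodes that sort after the start node, so each
            # cycle is reported once, from its smallest node.
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt], step)

    for start in sorted(graph):
        walk(start, start, [start], 1)
    return loops


def reachable(edges, source, target, removed):
    """True if target can be reached from source avoiding removed nodes."""
    if source in removed or target in removed:
        return False
    seen, stack = {source}, [source]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for (s, t) in edges:
            if s == node and t not in seen and t not in removed:
                seen.add(t)
                stack.append(t)
    return False


def minimal_cut_sets(edges, source, target):
    """Smallest sets of intermediate nodes whose removal disconnects
    source from target (exhaustive search over node subsets)."""
    nodes = {n for edge in edges for n in edge} - {source, target}
    for size in range(1, len(nodes) + 1):
        cuts = [set(c) for c in combinations(sorted(nodes), size)
                if not reachable(edges, source, target, set(c))]
        if cuts:
            return cuts
    return []


# Hypothetical network: receptor R signals to TF through A and B;
# TF induces an inhibitor I that represses A.
network = {("R", "A"): +1, ("A", "TF"): +1, ("R", "B"): +1,
           ("B", "TF"): +1, ("TF", "I"): +1, ("I", "A"): -1}
loops = feedback_loops(network)
cuts = minimal_cut_sets(network, "R", "TF")
```

In this toy network the only loop, through `A`, `TF`, and `I`, has sign -1 (a negative feedback loop), and the only smallest cut set is `{A, B}`: blocking either branch alone leaves the other route from receptor to transcription factor intact, which is a simple picture of robustness through redundancy.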
Dynamical analysis, a higher-resolution form of mathematical modeling, elucidates the detailed local and certain global quantitative behaviors of the system. Dynamical analysis requires more information on the reaction parameters and initial conditions than topological approaches [6]. Deterministic dynamical analysis uses differential equations to describe reactions. Deterministic partial least squares (PLS) models treat the network of pathways as a processor unit. Based on the appropriate quantitative experimental measurements of key entities in an a priori known network of pathways, PLS models can be used to predict the time-dependent cross-talk between pathways of the network under certain conditions. Another approach is
doi:10.1371/journal.pcbi.0040016.g002
Figure 2. Example of KDO Pathway Assembly: Signal Transduction Pathways Involved during Infection due to Pathogens Such as Viruses and Bacteria in Mammalian Dendritic Cells
Starting from a broad topic of interest (infection in mammalian dendritic cells) and using the resources in Table 1, this network of pathways was built.
stochastic modeling, which uses a probabilistic representation. Deterministic models describe average behavior. Stochastic approaches are important when the absolute number of the reactant molecules in each cell is small. In this condition, the probabilistic nature of chemical reactions may affect system behavior and deterministic models may not be valid. Many software tools are available for topological and dynamical pathway analysis [7,8].
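The contrast between deterministic and stochastic descriptions can be illustrated with the simplest possible reaction system: a molecule X that is synthesized at a constant rate and degraded in proportion to its amount. The sketch below is an illustration under assumptions, not a model from the article: the rate constants are arbitrary, the deterministic version uses plain Euler integration of dx/dt = k_syn - k_deg * x, and the stochastic version uses the Gillespie direct method.

```python
import random


def deterministic(k_syn, k_deg, x0, t_end, dt=0.001):
    """Euler integration of dx/dt = k_syn - k_deg * x."""
    x = float(x0)
    for _ in range(int(round(t_end / dt))):
        x += dt * (k_syn - k_deg * x)
    return x


def stochastic(k_syn, k_deg, x0, t_end, rng):
    """Gillespie simulation of the same synthesis and degradation
    reactions, tracking an integer number of molecules."""
    t, x = 0.0, int(x0)
    while True:
        rate_syn = k_syn          # reaction: nothing -> X
        rate_deg = k_deg * x      # reaction: X -> nothing
        total = rate_syn + rate_deg
        if total == 0:
            return x
        t += rng.expovariate(total)   # waiting time to the next reaction
        if t > t_end:
            return x
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() * total < rate_syn:
            x += 1
        else:
            x -= 1


# Arbitrary illustrative parameters; steady-state mean is k_syn / k_deg = 5.
k_syn, k_deg = 5.0, 1.0
mean_field = deterministic(k_syn, k_deg, x0=0, t_end=20.0)
rng = random.Random(0)
samples = [stochastic(k_syn, k_deg, 0, 20.0, rng) for _ in range(2000)]
sample_mean = sum(samples) / len(samples)
```

The deterministic run settles at the average value of about five molecules, while individual stochastic runs scatter around that average, some of them ending with very few molecules. With copy numbers this small the scatter is comparable to the mean itself, which is the situation in which the average behavior alone can be a poor description of a single cell.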
Supporting Information
Table S1. A List of Frequently Used Databases, Classified Based on the Type of Information Represented, during Biological Pathway Construction, Their Properties, and URLs
A comprehensive list of databases can be found in Pathguide (http://www.pathguide.org). A, automated curation; B, both manual and automated curation; BIND, Biomolecular Interaction Network Database; BioPP, Biological Pathway Publisher; DIP, Database of Interacting Proteins; EcoCyc, Encyclopaedia of E. coli Genes and Metabolism; GNPV, Genome Network Platform Viewer; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; M, manual curation; MetaCyc, a Metabolic Pathway database; MINT, Molecular Interaction Database; MIPS, Munich Information Center for Protein Sequences; N, No; OPHID, Online Predicted Human Interaction Database; PANTHER, Protein Analysis through Evolutionary Relationship Database; PID, The Pathway Interaction Database; STKE, Signal Transduction Knowledge Environment; UNIHI, Unified Human Interactome; Y, yes.
Found at doi:10.1371/journal.pcbi.0040016.st001 (61 KB DOC)
Acknowledgments
Author contributions. GAV, JS, SP, GN, and SCS wrote the paper.
Funding. Our pathway research is supported by US National Institutes of Health NIAID contract HHSN2662000500021C.
Competing interests. The authors have declared that no competing interests exist.
References
1. Oda K, Kitano H (2006) A comprehensive map of the toll-like receptor signaling network. Mol Syst Biol 2: 2006.0015.
2. Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, et al. (2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 33: D428-D432.
3. Stromback L, Jakoniene V, Tan H, Lambrix P (2006) Representing, storing and accessing molecular interaction data: a review of models and tools. Brief Bioinform 7: 331-338.
4. Baclawski K, Niu T (2006) Ontologies for bioinformatics. Cambridge (Massachusetts): The MIT Press.
5. Brazma A, Krestyaninova M, Sarkans U (2006) Standards for systems biology. Nat Rev Genet 7: 593-605.
6. Alon U (2007) An introduction to systems biology: design principles of biological circuits. Boca Raton (Florida): Chapman & Hall/CRC.
7. Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20: 1746-1758.
8. Alves R, Antunes F, Salvador A (2006) Tools for kinetic modeling of biochemical networks. Nat Biotechnol 24: 667-672.