Getting Started in Biological Pathway Construction and Analysis


Message from ISCB

Ganesh A. Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, Stuart C. Sealfon*

Introduction

Life depends on the capacity of individual cells to respond effectively to cues about their changing internal and external environments. Cellular decision making and responses are orchestrated by complex molecular networks consisting of entities such as proteins or RNAs connected by interactions such as activation or synthesis. Information contained in primary databases and in the experimental literature relevant to these networks is so extensive and rapidly growing that it is increasingly difficult to integrate. As an aid to theoretical and experimental research, it is convenient to distill the inferences contained in the experimental literature and databases into knowledgebases that consist of annotated representations of biological pathways.

Pathway building has been performed by individual groups studying a network of interest (e.g., Kitano's group, who assembled an immune signaling pathway [1]) as well as by large bioinformatics consortia (e.g., the Reactome Project [2]) and commercial entities (e.g., Ingenuity Systems). Pathway building is the process of identifying and integrating the entities, interactions, and associated annotations, and populating the knowledgebase. Pathway construction can have either a data-driven objective (DDO) or a knowledge-driven objective (KDO). Data-driven pathway construction is used to generate relationship information of genes or proteins identified in a specific experiment such as a microarray study. Knowledge-driven pathway construction entails development of a detailed pathway knowledgebase for particular domains of interest, such as a cell type, disease, or system. To help researchers get their bearings in this field, in the subsequent sections we provide a brief, practical orientation to existing knowledgebases and to the methods of pathway construction and analysis.

Biological Pathway Construction Workflow

The curation process of a biological pathway entails identifying and structuring content, mining information manually and/or computationally, and assembling a knowledgebase using appropriate software tools. A schematic illustrating the major steps involved in the data-driven and knowledge-driven construction processes is shown in Figure 1. For either DDO or KDO pathway construction, the first step is to mine pertinent information from relevant information sources (discussed in "Public and Private Information Sources") about the entities and interactions. The information retrieved is assembled using appropriate formats, information standards, and pathway building tools (discussed in "Formats, Standards, and Pathway Building Tools") to obtain a pathway prototype. The pathway is further refined to include context-specific annotations such as species, cell/tissue type, or disease type. The pathway can then be verified by the domain experts and updated by the curators based on appropriate feedback. In the section "Illustration of the Pathway Building Process," we describe an example of the KDO approach for building a pathway.
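As a rough illustration of this workflow, a pathway prototype can be thought of as a set of entities and interactions that carry their supporting evidence and context annotations. The sketch below shows one possible in-memory layout in Python. The class names, field names, and the placeholder reference label are hypothetical and do not correspond to any particular tool or standard, and the two edges are illustrative rather than curated facts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A pathway participant such as a protein or RNA (hypothetical layout)."""
    name: str
    kind: str  # e.g. "protein", "RNA"


@dataclass
class Interaction:
    """A directed relationship between two entities, e.g. activation."""
    source: Entity
    target: Entity
    kind: str
    evidence: list = field(default_factory=list)  # literature or database references
    context: dict = field(default_factory=dict)   # species, cell/tissue type, disease


@dataclass
class Pathway:
    name: str
    interactions: list = field(default_factory=list)

    def add(self, interaction):
        self.interactions.append(interaction)

    def entities(self):
        """All entities that take part in at least one interaction."""
        found = set()
        for i in self.interactions:
            found.update((i.source, i.target))
        return found

    def unsupported(self):
        """Interactions without evidence, i.e. candidates for expert review."""
        return [i for i in self.interactions if not i.evidence]


# Mined information is assembled into a prototype (illustrative edges only).
trif = Entity("TRIF", "protein")
irf3 = Entity("IRF3", "protein")
ifnb = Entity("IFN-beta", "protein")

pathway = Pathway("toy dendritic cell signaling pathway")
pathway.add(Interaction(trif, irf3, "activation", evidence=["placeholder-reference"]))
pathway.add(Interaction(irf3, ifnb, "synthesis"))  # no evidence attached yet

# The prototype is then refined with context-specific annotations.
for interaction in pathway.interactions:
    interaction.context["species"] = "human"
```

Keeping evidence and context on each interaction, rather than on the pathway as a whole, makes it straightforward to list the statements that still need verification by a domain expert, here via `pathway.unsupported()`.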

Public and Private Information Sources

The extension of reductive biology begun with Aristotle's Parts of Animals to the molecular realm has defined large numbers of entities and interactions in various cells and organisms. Recent attempts to improve knowledge integration have led to refined classifications of cellular entities, such as Gene Ontology (GO), and to the assembly of structured knowledge repositories. Data repositories, which contain information regarding sequence data, metabolism, signaling, reactions, and interactions, are a major source of information for pathway building. A few useful databases are described in Table 1. A comprehensive list of resources can be found at http://www.pathguide.org.

Formats, Standards, and Pathway Building Tools

Various standard, computer-readable, object-oriented formats have been developed to facilitate the organization, storage, exchange, and parsing of pathway knowledgebases and the relevant experimental evidence information. Important pathway and pathway-related formats, which are all XML-based, include Systems Biology Markup Language (SBML), Proteomics Standards Initiative-Molecular Interactions (PSI-MI), and Biological Pathways eXchange (BioPAX) [3]. SBML, which is used mainly for representation of pathways and mathematical models and supported by more than 100 software systems, is currently the best-suited format for mathematical modeling and simulations. PSI-MI is designed for structured representation of experimental evidence information, such as molecular interactions data. The richest format, BioPAX, integrates PSI-MI within a pathway representation format and provides general representation mechanisms that permit storage of additional information, such as mathematical models. However, BioPAX is relatively new, and its features are rapidly evolving, making it a technical challenge to implement. Standards have also been developed for representation of different biological information such as the nomenclature of entities and interactions (e.g., HUGO, Human Genome Organization) and experimental data (e.g., MIAME, Minimum Information About a Microarray Experiment). The ability to extract information automatically and to make inferences is furthered by the use of the controlled vocabularies of established taxonomies and ontologies [4]. GO classifies genes to provide insight into their function and relationships and serves as a model for other biological ontologies. A comprehensive review of biological information standards can be found in [5].

Editor: Olga Troyanskaya, Princeton University, United States of America

Citation: Viswanathan GA, Seto J, Patil S, Nudelman G, Sealfon SC (2008) Getting started in biological pathway construction and analysis. PLoS Comput Biol 4(2): e16. doi:10.1371/journal.pcbi.0040016

Copyright: 2008 Viswanathan et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Ganesh A. Viswanathan, Jeremy Seto, Sonali Patil, German Nudelman, and Stuart C. Sealfon are with the Center for Translational Systems Biology and Department of Neurology, Mount Sinai School of Medicine, New York, New York, United States of America.

* To whom correspondence should be addressed. E-mail: [email protected]

PLoS Computational Biology | www.ploscompbiol.org | February 2008 | Volume 4 | Issue 2 | e16

Pathway building tools are required to populate, visualize, and store a pathway. Currently there are various pathway building tools [3] that provide the ability to extract information as well as to support multiple standard formats. Cytoscape, CellDesigner, and JDesigner are graphical environments for constructing pathways that can import/export SBML models for simulation. Cytoscape can also access large databases containing protein and gene interactions, with additional support for PSI-MI and BioPAX formats. Pathway Analysis Tools for Integration and Knowledgebase (PATIKA) provides a Web-based interface to public databases such as Reactome, HPRD, and IntAct, and supports both SBML and BioPAX formats. Its visualization and layout tools facilitate pathway analysis. Reactome displays reactions as pathway diagrams and provides online tools for authoring, curation, and visualization as well as export to SBML and BioPAX formats. The Ingenuity pathway analysis tool, a Web-based interface to the Ingenuity Knowledgebase available by paid subscription, enables users to query molecular interactions, biological functions, and diseases to generate customized pathways and analyses.
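Because the formats described in this section are XML-based, an exported pathway can be inspected with generic XML tooling. The sketch below reads a hand-written SBML Level 2 fragment using only the Python standard library and lists its species and reactions. The model content (`toy_pathway`, `receptor`, `active_receptor`) is invented for illustration, and the function name `read_sbml` is our own. A dedicated SBML library would normally be preferable for real models, since this sketch ignores most of the format.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 2 Version 4 documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level2/version4"}

# A made-up, minimal SBML document with two species and one reaction.
SBML_TEXT = """
<sbml xmlns="http://www.sbml.org/sbml/level2/version4" level="2" version="4">
  <model id="toy_pathway">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="receptor" compartment="cell"/>
      <species id="active_receptor" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="activation">
        <listOfReactants>
          <speciesReference species="receptor"/>
        </listOfReactants>
        <listOfProducts>
          <speciesReference species="active_receptor"/>
        </listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""


def read_sbml(text):
    """Return (species ids, {reaction id: (reactant ids, product ids)})."""
    root = ET.fromstring(text.strip())
    model = root.find("sbml:model", SBML_NS)
    species = [
        s.get("id")
        for s in model.findall("sbml:listOfSpecies/sbml:species", SBML_NS)
    ]
    reactions = {}
    for r in model.findall("sbml:listOfReactions/sbml:reaction", SBML_NS):
        reactants = [
            ref.get("species")
            for ref in r.findall("sbml:listOfReactants/sbml:speciesReference", SBML_NS)
        ]
        products = [
            ref.get("species")
            for ref in r.findall("sbml:listOfProducts/sbml:speciesReference", SBML_NS)
        ]
        reactions[r.get("id")] = (reactants, products)
    return species, reactions


species, reactions = read_sbml(SBML_TEXT)
print(species)
print(reactions)
```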

Illustration of the Pathway Building Process

Pathway curation can be either manual or automated. Manual curation provides the most reliable information extraction from the literature. However, the pace of new discovery can make manually populated databases difficult to maintain. In the mining process, use of appropriate keywords increases the chances of identifying the relevant information. Automated text mining through Natural Language Processing reduces the personnel required for recovery of information, but has severe limitations in accuracy. Information in the scientific literature is highly specialized, semantically unpredictable, and often not textual. Agreeing on facts is difficult even for expert curators. The present generation of text mining tools is probably most useful as an aid to manual curation.

The efficient mining of information from the plethora of resource databases hinges on the identification of the most useful primary literature and databases for the biological area of interest. This often poses a challenge, as the choice of databases and mining strategies is specific to the biological area. We find Reactome, UniHI, and Ingenuity Systems useful and appropriate for many biological areas.

doi:10.1371/journal.pcbi.0040016.g001

Figure 1. Schematic Illustrating the Biological Pathway Building Process

Pathway curators initially mine information (Step 1). The mining process can be initiated by two broad pathway building objectives: (a) DDO, wherein a list of genes and/or proteins is obtained by high-throughput experiments such as microarray or mass spectrometry, or (b) KDO, wherein a broad topic of interest is chosen and then the knowledge concerning this topic is mined from resources such as the primary literature and knowledgebases. Information from the mining process is assembled (Step 2), using pathway building tools, into a pathway, which, following many iterations of feedback from domain experts (Step 3) and refinement (Step 4), leads to the desired specific annotated pathway.

We provide here an example of assembly of a human dendritic cell signaling pathway involved in responding to microbes, assembled in CellDesigner and built using a KDO-based information mining approach. A snapshot of the pathway is in Figure 2. We extracted information such as TLRs, TRIF, MyD88, RIG-I, IRF3, and IFNβ predominantly from primary literature and comprehensive review papers obtained from databases such as PubMed. Reactome's and Ingenuity Systems' presorted, manually curated information and search tools enabled us to reliably identify and extract the pertinent entities and interactions. Identification and extraction of relevant information from appropriate primary literature is a tedious task. Although slower, use of information from the pathway resources expedited the identification step. The relevant primary literature is also populated as annotations for entities and interactions while creating the pathway (unpublished data). The efficient building and visualization of a pathway requires the use of appropriate software. We chose to assemble the pathway in CellDesigner due to its flexible graphics capabilities that facilitate a clear presentation of high granularity pathways.

DDO pathway building, which can follow a similar process, differs in that the starting point is typically a collection of genes or proteins identified in a global experiment whose relationships are not well understood. In this case, the pathway building process is used to elucidate the pathways and functional relationships shared by regulated entities.
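A minimal sketch of this DDO starting point, assuming that a list of regulated genes and a table of known pairwise interactions are already at hand, is to keep only the interactions whose two partners both appear in the experimental list and to report the genes that remain unconnected. The gene identifiers (`G1`, `G2`, ...) and interaction pairs below are made-up placeholders, and the function name is hypothetical.

```python
def induced_subnetwork(regulated, interactions):
    """Keep interactions whose two partners are both in the regulated list.

    Returns the retained edges and the regulated genes left without any edge.
    """
    regulated = set(regulated)
    edges = [(a, b) for a, b in interactions if a in regulated and b in regulated]
    connected = {gene for edge in edges for gene in edge}
    isolated = sorted(regulated - connected)
    return edges, isolated


# Placeholder data: known interactions from a knowledgebase and an experimental hit list.
known_interactions = [("G1", "G2"), ("G2", "G3"), ("G3", "G9"), ("G4", "G5")]
regulated_genes = ["G1", "G2", "G3", "G7"]

edges, isolated = induced_subnetwork(regulated_genes, known_interactions)
print(edges)     # interactions shared by the regulated genes
print(isolated)  # regulated genes with no known partner in the list
```

Genes that remain isolated are natural candidates for further literature mining, since the available interaction table does not yet relate them to the other regulated entities.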

Pathway Analysis

Pathway analysis refers to the computational approaches used to investigate network behavior as a system. Pathway analysis can be broadly classified into two types: topological/structural network analysis and dynamical analysis.

Table 1. A List of Databases, Classified Based on the Type of Information Represented, Commonly Used during Biological Pathway Construction

Database             Description

Protein-Protein Interaction Databases: Organize experimental and/or in silico interactions
BIND                 200,000 documented biomolecular interactions and complexes
MINT                 Experimentally verified interactions
HPRD                 Elegant and comprehensive presentation of the interactions, entities, and evidences
MPact                Yeast interactions. A part of MIPS
DIP                  Experimentally determined interactions
IntAct               Database and analysis system of binary and multiprotein interactions
PDZBase              PDZ domain containing proteins
GNPV                 Based on specific experiments and literature
BioGrid              Physical and genetic interactions
UniHi                Comprehensive human protein interactions
OPHID                Combines PPI from BIND, HPRD, and MINT

Metabolic Pathways Databases: Compendium of pathways describing metabolic and physical processes (Primary source for metabolic information initiated by Stanford Research Initiative)
EcoCyc               Entire genome and biochemical machinery of E. coli
MetaCyc              Pathways of more than 165 species
HumanCyc             Human metabolic pathways and the human genome
BioCyc               Collection of databases for several organisms

Signaling Pathways Databases: Pathways pertaining to signal transduction
KEGG                 Comprehensive. Links to several useful databases
PANTHER              Compendium of pathways built using CellDesigner
Reactome             Hierarchical layout. Extensive links to relevant databases
Biomodels            Domain experts curated pathways and associated mathematical models
STKE                 Repository of canonical pathways
Ingenuity Systems    Commercial mammalian biological knowledgebase
PID                  Compendium of several assembled signaling pathways
BioPP                Repository of biological pathways built using CellDesigner

Most databases have a graphics viewer for displaying entities and interactions. Refer to Table S1 for a more detailed description and URLs of these databases.
BIND, Biomolecular Interaction Network Database; BioPP, Biological Pathway Publisher; DIP, Database of Interacting Proteins; EcoCyc, Encyclopaedia of E. coli Genes and Metabolism; GNPV, Genome Network Platform Viewer; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; MetaCyc, a Metabolic Pathway database; MINT, Molecular INTeraction database; MIPS, Munich Information center for Protein Sequences; OPHID, Online Predicted Human Interaction Database; PANTHER, Protein Analysis through Evolutionary Relationship database; PID, The Pathway Interaction Database; STKE, Signal Transduction Knowledge Environment; UNIHI, Unified Human Interactome.
doi:10.1371/journal.pcbi.0040016.t001

Topological analysis of a pathway identifies the global qualitative properties of the system [6]. One approach uses classical graph theory to identify various motifs in a pathway represented as a directed graph. A motif is a group of interacting entities capable of information processing that appears repeatedly. If the graph is signed (i.e., the positive or negative regulatory effects of each interaction that may be obtained from primary literature are specified), Boolean network analysis can be used to identify the semi-quantitative features such as positive/negative feedback loops and minimal cut sets in the pathway. Feedback loops strongly affect the behavior of the system. A minimal cut set of entities is the smallest group of entities that, when disrupted, affects the particular network behavior of interest. The identification of minimal cut sets aids the assessment of the robustness of a system. Motifs, feedback loops, and minimal cut sets of a pathway connecting, for example, a receptor and a transcription factor, such as NF-κB, that regulates many genes, illustrate the global properties of the system. Probabilistic graphical model approaches such as Bayesian network analysis are used to analyze and learn about the cellular networks from quantitative experimental data and to infer indirect relationships.
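To make the notion of signed feedback loops concrete, the sketch below enumerates the simple cycles of a small signed directed graph and classifies each one by the product of its edge signs: a loop containing an even number of inhibitory interactions is positive, and one containing an odd number is negative. This is a plain depth-first enumeration suited to toy networks, not the Boolean network analysis mentioned in the text, and the node names `A`, `B`, and `C` and the function name are placeholders.

```python
def feedback_loops(edges):
    """Enumerate simple cycles of a signed directed graph.

    edges maps (source, target) to +1 (activation) or -1 (inhibition).
    Returns a list of (nodes on the loop, "positive" or "negative").
    """
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    order = {node: i for i, node in enumerate(sorted(graph))}
    loops = []

    def walk(start, node, path):
        for nxt in graph[node]:
            if nxt == start:
                # Closed a cycle: multiply the signs of its edges.
                cycle = path + [start]
                sign = 1
                for a, b in zip(cycle, cycle[1:]):
                    sign *= edges[(a, b)]
                loops.append((path, "positive" if sign > 0 else "negative"))
            elif nxt not in path and order[nxt] > order[start]:
                # Only visit nodes ranked after the start node, so that each
                # cycle is reported once, from its lowest-ranked member.
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return loops


# Toy signed network: A activates B, B activates C and A, C inhibits A.
toy_edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): -1, ("B", "A"): 1}
loops = feedback_loops(toy_edges)
for nodes, sign in loops:
    print(" -> ".join(nodes), sign)
```

Exhaustive cycle enumeration grows quickly with network size, so for large pathways the dedicated tools cited in this section are a more realistic choice.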

doi:10.1371/journal.pcbi.0040016.g002

Figure 2. Example of KDO Pathway Assembly: Signal Transduction Pathways Involved during Infection due to Pathogens such as Viruses and Bacteria in Mammalian Dendritic Cells

Starting from a broad topic of interest (infection in mammalian dendritic cells), this network of pathways was built using the resources in Table 1.

Dynamical analysis, a higher resolution mathematical modeling, elucidates the detailed local and certain global quantitative behaviors of the system. Dynamical analysis requires more information on the reaction parameters and initial conditions than topological approaches [6]. Deterministic dynamical analysis uses differential equations to describe reactions. Deterministic partial least square (PLS) models treat the network of pathways as a processor unit. Based on the appropriate quantitative experimental measurements of key entities in an a priori known network of pathways, PLS models can be used to predict the time-dependent cross-talk between pathways of the network under certain conditions. Another approach is stochastic modeling, which uses a probabilistic representation. Deterministic models describe average behavior. Stochastic approaches are important when the absolute number of the reactant molecules in each cell is small. In this condition, the probabilistic nature of chemical reactions may affect system behavior and deterministic models may not be valid. Many software tools are available for topological and dynamical pathway analysis [7,8].
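The contrast between deterministic and stochastic descriptions can be sketched with a single species that is produced at a constant rate k and degraded at rate g per molecule. The deterministic model is the differential equation dn/dt = k - g n, integrated here with a simple Euler scheme. The stochastic model is simulated with Gillespie's direct method, a technique we substitute for illustration since the text does not name a specific algorithm. The rate constants and function names are hypothetical.

```python
import random


def deterministic(k, g, n0, t_end, dt=0.01):
    """Euler integration of dn/dt = k - g*n (average behavior)."""
    n = float(n0)
    for _ in range(int(round(t_end / dt))):
        n += (k - g * n) * dt
    return n


def gillespie(k, g, n0, t_end, rng):
    """Gillespie direct method for the same production/degradation system."""
    t, n = 0.0, n0
    while True:
        production = k
        degradation = g * n
        total = production + degradation
        t += rng.expovariate(total)  # waiting time to the next reaction event
        if t > t_end:
            return n
        if rng.random() * total < production:
            n += 1  # one molecule produced
        else:
            n -= 1  # one molecule degraded


K, G = 5.0, 1.0  # hypothetical rate constants; steady-state mean is K / G
mean_field = deterministic(K, G, 0, 20.0)

rng = random.Random(0)
samples = [gillespie(K, G, 0, 20.0, rng) for _ in range(200)]

print("deterministic copy number:", round(mean_field, 3))
print("stochastic mean over cells:", sum(samples) / len(samples))
print("stochastic range over cells:", min(samples), "to", max(samples))
```

With only a handful of molecules per cell, individual stochastic runs scatter widely around the deterministic value, which is the situation in which the text notes that deterministic models may not be valid.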

Supporting Information

Table S1. A List of Frequently Used Databases, Classified Based on the Type of Information Represented, during a Biological Pathway Construction, Their Properties, and URLs

A comprehensive list of databases can be found in Pathguide (http://www.pathguide.org). A, automated curation; B, both manual and automated curation; BIND, Biomolecular Interaction Network Database; BioPP, Biological Pathway Publisher; DIP, Database of Interacting Proteins; EcoCyc, Encyclopaedia of E. coli Genes and Metabolism; GNPV, Genome Network Platform Viewer; HPRD, Human Protein Reference Database; KEGG, Kyoto Encyclopedia of Genes and Genomes; M, manual curation; MetaCyc, a Metabolic Pathway database; MINT, Molecular Interaction Database; MIPS, Munich Information Center for Protein Sequences; N, no; OPHID, Online Predicted Human Interaction Database; PANTHER, Protein Analysis through Evolutionary Relationship Database; PID, The Pathway Interaction Database; STKE, Signal Transduction Knowledge Environment; UNIHI, Unified Human Interactome; Y, yes.

Found at doi:10.1371/journal.pcbi.0040016.st001 (61 KB DOC)

Acknowledgments

Author contributions. GAV, JS, SP, GN, and SCS wrote the paper.

Funding. Our pathway research is supported by US National Institutes of Health NIAID contract HHSN2662000500021C.

Competing interests. The authors have declared that no competing interests exist.

References

1. Oda K, Kitano H (2006) A comprehensive map of the toll-like receptor signaling network. Mol Syst Biol 2: 2006.0015.

2. Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, et al. (2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res 33: D428-D432.

3. Stromback L, Jakoniene V, Tan H, Lambrix P (2006) Representing, storing and accessing molecular interaction data: a review of models and tools. Brief Bioinform 7: 331-338.

4. Baclawski K, Niu T (2006) Ontologies for bioinformatics. Cambridge (Massachusetts): The MIT Press.

5. Brazma A, Krestyaninova M, Sarkans U (2006) Standards for systems biology. Nat Rev Genet 7: 593-605.

6. Alon U (2007) An introduction to systems biology: design principles of biological circuits. Boca Raton (Florida): Chapman & Hall/CRC.

7. Kashtan N, Itzkovitz S, Milo R, Alon U (2004) Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs. Bioinformatics 20: 1746-1758.

8. Alves R, Antunes F, Salvador A (2006) Tools for kinetic modeling of biochemical networks. Nat Biotechnol 24: 667-672.

