Methods and resources for pathway analysis
PABIO590B Week 2
Pathways overview
• Introduction to pathways and networks
• Examples of pathways and networks
• Review of pathway databases and tools
• Representing pathways and networks
• Methods of inferring pathways and networks
• Pathway and cellular simulations
Pathways vs. networks
Gene networks
• Clusters of genes (or gene products) with evidence of co-expression
• Connections usually represent degrees of co-expression
• In-depth knowledge of the underlying process is not necessary
• Networks are largely non-predictive
Biochemical pathways
• Series of chained chemical reactions
• Connections represent describable (and quantifiable) relations between molecules, proteins, lipids, etc.
• The enzymatic process is elucidated
• Downstream effects of a perturbation are predictable
Pathways vs. networks
                   Gene networks                                   Biochemical pathways
Curation           Relatively easy: automated and manual           Difficult: mostly manual
Nodes              Genes or gene products                          Any general molecule
Edges              Levels of co-expression/influence, or a         Representation of possibly quantifiable
                   qualitative relation                            mechanisms between compounds
Fidelity           Low: usually very little detail                 High: specific processes
Predictive power   Relatively low                                  Relatively high
Pathway and network granularity
[Figure: level of detail vs. effort to curate, placing general interaction networks, probabilistic networks, qualitative networks, curated reaction pathways, and mathematical simulation models]
Yeast gene interaction network
Tong, et al., Science 303, 808 (2004)
Characteristics of the yeast gene network
• Some genes (e.g. regulatory factors) act as 'hubs' in a network and have many interactions
  – Degree of connectivity follows a power law
  – Hubs may make interesting anti-cancer targets
• Clusters of genes with known function suggest functions for hypothetical genes in the same cluster
• Network characteristics can be used to predict protein-protein interactions
• Path between two genes tends to be short (average ~3.3 hops)
Tong, et al., Science 303, 808 (2004)
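The hub and short-path properties above can be checked on any interaction graph. The sketch below uses a small, made-up edge list (the gene names `HUB1` and `A` to `F` are hypothetical, not data from Tong et al.), computes node degrees to spot hubs, and averages breadth-first-search hop counts over all connected pairs, which is the kind of quantity behind the ~3.3-hop figure. It is a minimal illustration, not the analysis used in the paper.

```python
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy interaction network (undirected edge list); not data from Tong et al.
EDGES = [
    ("HUB1", "A"), ("HUB1", "B"), ("HUB1", "C"), ("HUB1", "D"),
    ("D", "E"), ("E", "F"), ("B", "C"),
]

def build_adjacency(edges):
    """Map each gene to the set of genes it interacts with."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def degree_counts(adj):
    """Degree (number of interaction partners) of each node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def shortest_path_length(adj, source, target):
    """Breadth-first search: hops from source to target, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nbr in adj[node]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None

def average_path_length(adj):
    """Mean hop count over all connected pairs of nodes."""
    lengths = [shortest_path_length(adj, u, v) for u, v in combinations(sorted(adj), 2)]
    lengths = [d for d in lengths if d is not None]
    return sum(lengths) / len(lengths)

adj = build_adjacency(EDGES)
deg = degree_counts(adj)
hubs = [n for n, d in deg.items() if d == max(deg.values())]  # most-connected node(s)
print("hubs:", hubs)
print("degree distribution (degree, count):", sorted(Counter(deg.values()).items()))
print("average path length: %.2f" % average_path_length(adj))
```

On a real genome-scale network the degree distribution, rather than this toy's handful of counts, is what would be checked against a power law.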
E. coli metabolic pathway
Karp, et al., Science 293, 2040 (2001)
[Figure: E. coli metabolic map, with glycolysis labeled]
Pathways: E. coli metabolic map
• Encompasses >791 chemical compounds participating in >744 documented biochemical reactions
• The map was compiled via literature information extraction and extensive manual curation
  – The system allows users to indicate evidence for pathway annotations
  – Curation is done collaboratively with numerous experts outside of EcoCyc
Karp, et al., Science 293, 2040 (2001)
Pathways in bioinformatics
• Most pathway resources focus on metabolic pathways (signaling and regulatory pathways are gaining prominence)
• Pathways are a very specific subtype of network
  – Like networks, they can be represented in computable (symbolic) form
  – The specificity of chemical reactions makes pathways more predictive
  – Pathways can chain together, forming larger pathways
Karp, et al., Science 293, 2040 (2001)
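Chaining pathways can be sketched as a graph search over compounds: each reaction links its substrates to its products, and merging two pathways lets a search cross from one into the other. The reaction set below is a simplified upper glycolysis (cofactors such as ATP/ADP omitted) plus a hypothetical single "lumped" step standing in for lower glycolysis; it is illustrative, not a curated EcoCyc representation.

```python
from collections import deque

# Simplified reactions: name -> (substrates, products). Cofactors omitted; illustrative only.
UPPER_GLYCOLYSIS = {
    "hexokinase": ({"glucose"}, {"glucose-6-phosphate"}),
    "phosphoglucose isomerase": ({"glucose-6-phosphate"}, {"fructose-6-phosphate"}),
    "phosphofructokinase": ({"fructose-6-phosphate"}, {"fructose-1,6-bisphosphate"}),
    "aldolase": ({"fructose-1,6-bisphosphate"},
                 {"dihydroxyacetone phosphate", "glyceraldehyde-3-phosphate"}),
    "triose phosphate isomerase": ({"dihydroxyacetone phosphate"},
                                   {"glyceraldehyde-3-phosphate"}),
}
# Hypothetical lumped step standing in for the reactions of lower glycolysis.
LOWER_GLYCOLYSIS = {
    "lower glycolysis (lumped)": ({"glyceraldehyde-3-phosphate"}, {"pyruvate"}),
}

def find_reaction_chain(reactions, start, target):
    """BFS over compounds: shortest list of reaction names leading from start to target.

    A reaction is followed from any one of its substrates; a fuller treatment would
    also require its co-substrates to be available.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, chain = queue.popleft()
        if compound == target:
            return chain
        for name, (substrates, products) in reactions.items():
            if compound in substrates:
                for product in products:
                    if product not in seen:
                        seen.add(product)
                        queue.append((product, chain + [name]))
    return None

# Neither pathway alone links glucose to pyruvate; the merged (chained) pathway does.
merged = {**UPPER_GLYCOLYSIS, **LOWER_GLYCOLYSIS}
print(find_reaction_chain(UPPER_GLYCOLYSIS, "glucose", "pyruvate"))  # None
print(find_reaction_chain(merged, "glucose", "pyruvate"))
```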
Pathway repositories
• BioCyc/MetaCyc
• Kyoto Encyclopedia of Genes and Genomes (KEGG) PATHWAY DB
• BioCarta
• BioModels database
BioCyc database http://www.biocyc.org
• Pathway/genome database (PGDB) for organisms with completely sequenced genomes
• 409 full genomes and pathways deposited
• Species-specific pathways are inferred from MetaCyc
• Query, navigation and pathway creation are supported through the Pathway Tools software suite
MetaCyc database http://www.metacyc.org
• Non-redundant reference database for metabolic pathways, reactions, enzymes and compounds
• Curation through experimental verification and manual literature review
• >1200 pathways from 1600+ species (mostly plants and microorganisms)
Glycolysis pathway in MetaCyc
KEGG PATHWAY database http://www.kegg.com
• Consolidated set of databases that cover genomics (GENE), chemical compounds (LIGAND) and reaction networks (PATHWAY)
• Broad focus on metabolism, signal transduction, disease, etc.
• Species-specific views available (but networks are static across all organisms)
Glycolysis pathway in KEGG
Global Pathway Map
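Because KEGG draws one reference map per pathway and reuses it across organisms, species-specific pathway IDs share the reference map's five-digit number with an organism-code prefix (for example `map00010` for the glycolysis/gluconeogenesis reference map, `eco00010` for E. coli K-12 MG1655, `hsa00010` for human). The sketch below builds such IDs and the URL for KEGG's REST `get` operation. The base URL reflects the current KEGG REST service and may change, the helper names are hypothetical, and the code makes no network requests.

```python
import re

# Assumed base URL of the KEGG REST API; check current KEGG documentation before use.
KEGG_REST_BASE = "https://rest.kegg.jp"

def species_pathway_id(reference_id, org_code):
    """Hypothetical helper: turn a reference map ID ('map00010') into a species ID ('hsa00010')."""
    match = re.fullmatch(r"(?:map|ko)?(\d{5})", reference_id)
    if match is None:
        raise ValueError(f"not a KEGG map ID: {reference_id!r}")
    return org_code + match.group(1)

def kegg_get_url(entry_id):
    """Hypothetical helper: URL of the REST 'get' operation, which returns the entry's flat file."""
    return f"{KEGG_REST_BASE}/get/{entry_id}"

for org in ("eco", "hsa"):  # E. coli K-12 MG1655 and human
    pid = species_pathway_id("map00010", org)  # 00010: glycolysis / gluconeogenesis
    print(pid, kegg_get_url(pid))
```

Comparing the flat files for `eco00010` and `hsa00010` is one way to look at the same pathway in two organisms.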
BioCarta database http://www.biocarta.com
• Corporate-owned, publicly curated pathway database
• Series of interactive, “cartoon” pathway maps
• Predominantly human and mouse pathways
• Contains 120,000 gene entries and 355 pathways
Glycolysis pathway in BioCarta
BioModels database http://www.biomodels.net
• Database for published, quantitative models of biochemical processes
• All models/pathways are curated manually and are MIRIAM-compliant
• Models can be output in SBML format for quantitative modeling
• 86 curated models, 40 models pending curation
Glycolysis pathways in BioModels
Comparison of pathway databases
                    MetaCyc/BioCyc                KEGG PATHWAY                     BioCarta                    BioModels
Curation            Manual and automated          Automated                        Manual                      Manual
Size                ~621+ pathways                ~289 reference pathways          ~355 pathways               ~126 models
Nomenclature        EC, GO                        EC, KO                           None                        GO
Organism coverage   ~500 species                  Various                          Primarily human and mouse   ~475 species
Visuals             Species-specific custom       Reference and species-specific   Animated, cartoonish        Non-standardized
Primary usage       PGDB, computational biology   PGDB, pathway comparisons        Human pathways, disease     Simulations, modeling
Pathway formats
• Extensible Markup Language (XML)
• Systems Biology Markup Language (SBML)
• BioPAX
Extensible Markup Language (XML)
• Standard for representing information in a machine-readable way
• Similar to HTML; tags can enclose data or carry it in attributes
<myXMLData>
  <someTag>Some data here</someTag>
  <anotherTag>More stuff here</anotherTag>
  <attributeTag data="embedded in tag" />
</myXMLData>
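Reading the XML example above with Python's standard `xml.etree.ElementTree` shows the two ways it holds data: text enclosed between tags and values stored in attributes. The document string is the example with straight quotes, which XML requires for attribute values.

```python
import xml.etree.ElementTree as ET

# The example document, with straight quotes so it is well-formed XML.
DOC = """<myXMLData>
  <someTag>Some data here</someTag>
  <anotherTag>More stuff here</anotherTag>
  <attributeTag data="embedded in tag" />
</myXMLData>"""

root = ET.fromstring(DOC)
print(root.tag)                               # name of the root element
print(root.find("someTag").text)              # data enclosed between tags
print(root.find("attributeTag").get("data"))  # data stored in an attribute
for child in root:                            # direct children, in document order
    print(child.tag, child.attrib, child.text)
```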
Systems Biology Markup Language
• XML-based language for representing biochemical reactions
• Oriented towards software data-sharing
• Tiered, upward-compatible architecture (two levels released, a third planned)
• Primary intended use is for quantitative model simulations
SBML
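SBML's core structure (compartments, species, and reactions that reference species as reactants and products) can be seen in a minimal hand-written Level 2 document. The sketch below reads the reactions back out with Python's standard `xml.etree.ElementTree`. The model is a toy example with no kinetic laws, so it is not simulation-ready; a real workflow would more likely use a dedicated SBML library such as libSBML, whose API is not shown here.

```python
import xml.etree.ElementTree as ET

# Toy SBML Level 2 Version 1 model: one irreversible reaction, glucose -> glucose-6-phosphate.
SBML = """<?xml version="1.0" encoding="UTF-8"?>
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_hexokinase">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="glc" name="glucose" compartment="cell"/>
      <species id="g6p" name="glucose-6-phosphate" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase" reversible="false">
        <listOfReactants><speciesReference species="glc"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""

NS = {"sbml": "http://www.sbml.org/sbml/level2"}  # SBML Level 2 Version 1 namespace

def reactions(sbml_text):
    """Yield (reaction id, reactant species ids, product species ids)."""
    root = ET.fromstring(sbml_text.encode("utf-8"))
    for rxn in root.iterfind(".//sbml:reaction", NS):
        reactants = [s.get("species") for s in
                     rxn.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [s.get("species") for s in
                    rxn.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        yield rxn.get("id"), reactants, products

for rxn_id, lhs, rhs in reactions(SBML):
    print(f"{rxn_id}: {' + '.join(lhs)} -> {' + '.join(rhs)}")
```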
BioPAX
• Like SBML, an XML-based pathway representation
• Tiered structure
  – Level 1: Metabolic pathway information
  – Level 2: Level 1 + molecular interactions, post-translational modification
• Intended to be a lingua franca for pathway databases
BioPAX XML representation
Inferring pathways and networks
• Experimental methods
  – Microarray co-expression
  – Quantitative trait locus (QTL) mapping
  – Isotope-coded affinity tagging (ICAT)
  – Yeast two-hybrid assay
  – Green fluorescent protein (GFP) tagging
• Computational methods
  – Database-driven protein-protein interactions
  – Expression clustering techniques
  – Literature mining for specified interactions
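Microarray co-expression and expression clustering both start from pairwise similarity between expression profiles. The sketch below builds a small co-expression network by linking gene pairs whose Pearson correlation magnitude reaches a threshold. The profiles and the 0.9 cutoff are made-up assumptions, and counting strong anti-correlation as an edge is one design choice among several.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression profiles (one value per microarray condition); not real data.
EXPRESSION = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relationship
}

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=0.9):
    """Link two genes when |r| >= threshold; the edge weight r is the degree of co-expression."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges

for edge in coexpression_edges(EXPRESSION):
    print(edge)
```

Clustering methods (hierarchical, k-means) would then group genes using the same kind of similarity measure rather than a single hard threshold.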
Cellular simulations
• Study the effect of perturbations on a pathway (and thus on the organism)
• Generally require extensive detail on the pathway or reactions of interest (flux equations, metabolite concentration, etc.)
• Cellular pathway simulations must manage both temporal and spatial complexity
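A minimal flavor of such a simulation, ignoring the spatial dimension: a hypothetical two-step pathway S → I → P with Michaelis-Menten rate laws, integrated with forward Euler. Lowering the second enzyme's Vmax is the perturbation; the downstream effect is that the intermediate accumulates and less product forms. All parameter values are arbitrary, and real cellular simulators use measured kinetics and more robust integrators.

```python
def michaelis_menten(vmax, km, s):
    """Reaction rate v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)

def simulate(vmax2=1.0, t_end=20.0, dt=0.001):
    """Forward-Euler integration of a hypothetical pathway S -> I -> P.

    Parameters are arbitrary illustrative units, not measured constants.
    Returns final concentrations (S, I, P).
    """
    s, i, p = 10.0, 0.0, 0.0          # initial concentrations
    vmax1, km1, km2 = 1.0, 2.0, 2.0   # kinetics of enzyme 1 and Km of enzyme 2
    for _ in range(int(round(t_end / dt))):
        v1 = michaelis_menten(vmax1, km1, s)  # flux S -> I
        v2 = michaelis_menten(vmax2, km2, i)  # flux I -> P
        s += -v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
    return s, i, p

baseline = simulate()
inhibited = simulate(vmax2=0.2)  # perturbation: second enzyme at 20% activity
print("baseline  S=%.2f I=%.2f P=%.2f" % baseline)
print("inhibited S=%.2f I=%.2f P=%.2f" % inhibited)
```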
[Figure: spatial dimension (0.1 nm to 1 m) vs. temporal interval (picoseconds to years), spanning quantum mechanics, molecular dynamics, cellular processes, systems physiology, and organs and organisms]
Adapted from Kelly, H., http://www.fas.org/resource/05242004121456.pdf, via Neal, Yngve 2006 VHS, UW MEBI 591
Simulation methods and techniques
Biological process   Phenomena                Computation scheme
Metabolism           Enzymatic reaction       Differential-algebraic equations, flux-based analysis
Signal transduction  Binding                  Differential-algebraic equations, stochastic algorithms,
                                              diffusion-reaction
Gene expression      Binding, polymerization, Object-oriented modeling, differential-algebraic equations,
                     degradation              stochastic algorithms, Boolean networks
DNA replication      Binding, polymerization  Object-oriented modeling, differential-algebraic equations
Membrane transport   Osmotic pressure,        Differential-algebraic equations, electrophysiology
                     membrane potential
Adapted from Tomita 2001
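Of the schemes in the table, a Boolean network is the simplest to show in a few lines: each gene is ON or OFF, and update rules decide the next state. The sketch below uses a hypothetical three-gene repression loop with synchronous updates and iterates until a state repeats, which identifies the attractor (steady state or cycle) the network settles into. The rules are illustrative and not taken from Tomita 2001.

```python
# Hypothetical three-gene Boolean network (a repression loop):
# A is ON when C is OFF, B is ON when A is OFF, C is ON when B is OFF.
RULES = {
    "A": lambda s: not s["C"],
    "B": lambda s: not s["A"],
    "C": lambda s: not s["B"],
}

def step(state):
    """Synchronous update: every gene's next value is computed from the current state."""
    return {gene: rule(state) for gene, rule in RULES.items()}

def find_attractor(state, max_steps=100):
    """Iterate until a state repeats; return the repeating cycle as tuples ordered A, B, C."""
    order = sorted(state)
    seen = []
    for _ in range(max_steps):
        key = tuple(state[g] for g in order)
        if key in seen:
            return seen[seen.index(key):]  # states from first repeat onward form the cycle
        seen.append(key)
        state = step(state)
    return None

for state in find_attractor({"A": True, "B": False, "C": False}):
    print(dict(zip("ABC", state)))
```

Different starting states can fall into different attractors; starting from all genes ON, this loop flips between all-ON and all-OFF instead of the six-state cycle.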
Research in simulation and modeling
• Virtual Cell (National Resource for Cell Analysis and Modeling)
• MCell (the Salk Institute)
• Gepasi (Virginia Tech)
• E-CELL (Institute for Advanced Biosciences, Keio University)
• Karyote/CellX (Indiana University)
Exercise
Your task is to:
• Identify the functions of proteins X, Y & Z
• Identify the pathway(s) in which they are involved
• Look for differences in pathways between databases
• Examine the same pathway(s) in humans