Biocuration2012 Eugeni Belda
From bacterial genome annotation to metabolic pathway curation
Eugenio Belda
Laboratory of Bioinformatic Analysis in Genomic and Metabolism (LABGeM team)
CEA/DSV/IG/Genoscope & CNRS UMR8030
Introduction
Advances in sequencing technologies have allowed an exponential accumulation of complete genome sequences in public databases in recent years.
However, a wide gap exists between rapid advances in genome sequencing and slow progress in the characterization of new protein functions.
Genoscope (French National Sequencing Center) has as one fundamental research objective the extension of in silico sequence annotations with experimental characterization of new enzymatic functions (Metabolic Genomics).
Lab. of Genomics & Biochemistry of Metabolism (LGBM)
Lab. of Organic Chemistry and Biocatalysis (LCOB)
Lab. for enzymatic cloning and screening (LCAB)
Lab. of Bioinformatic Analysis in Genomic and Metabolism (LABGeM)
Figure: 26% of unknown functions; 4712 enzymatic activities (EC number); 25% of orphan reactions; 12273 protein families (Pfam)
Three MicroScope components
Data Management: PkGDB (primary databanks, internal genomic objects, computational results) and MicroCyc (Pathway/Genome DataBases)
Process Management: JBPM Database (DB release, functional / relational analyses, primary databank update)
Visualization: MaGe Web Interface
MaGe Web Interface tools: Login, Genome browser and Synteny maps, Tutorial, Artemis, Data Export, CGView, LinePlot, Genome overview, Keyword search, Blast and Pattern, Phylogenetic Profile, Fusion / Fission, Tandem duplications, Minimal Gene Set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, Metabolic Profile, Pathway / Synteny, Synton display, Gene editor, Job History, Syntactic Annotations, Gene card
Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006
Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009
> 25 methods, integrated in a workflow management system
=> full automation of:
• genome annotation
• primary data updates
EC / reaction correspondence
Pathway Tools (P. Karp, SRI, USA)
A metabolic database is built for each annotated microbial genome
PGDB = Pathway/Genome Database (orgname_Cyc)
• Experimentally elucidated metabolic pathways
• 1800 pathways from 2216 organisms
Today: 1233 organisms (of which 676 public genomes)
PkGDB
http://www.genoscope.cns.fr/agc/microcyc
Database Management
Mapping on the KEGG metabolic maps (http://www.kegg.jp/)
Relational DataBase PkGDB (Prokaryotic Genome DataBase)
www.genoscope.cns.fr/agc/microscope
MicroScope Web site
«guest» access
More than 30 tools are made available to the community
Since 2005, more than 50,000 expert annotations per year
> 1,000 users, 300 active
Curation of metabolic data in Microscope
CanOE (Candidate genes for Orphan Enzymes): a method for the automatic integration of genomic and metabolic contexts that assists expert functional annotation, especially in the case of orphan enzymes. It is based on the concept of the metabolon (“close” genes in the genome sequence associated with “close” metabolic reactions).
Figure: a metabolon linking genes on the genome to reactions and compounds in the metabolic network, with gene gaps, reaction gaps and orphan functional annotations marked
The method provides candidate genes for global/local orphan enzymatic activities that are located in the “gaps” of metabolons
https://www.genoscope.cns.fr/agc/microscope/metabolism/canoe.php
Boyer et al.; Bioinformatics 2005 Dec 1;21(23):4209-15.
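The gap idea behind the metabolon concept can be illustrated with a toy example. The sketch below is not the published CanOE algorithm; it only assumes that unannotated genes lying between two annotated neighbours are proposed as candidates for an orphan reaction that links the neighbours' reactions in the network. The function name, the `max_gap` parameter and all gene and reaction identifiers are hypothetical.

```python
def candidate_genes(gene_order, gene_to_reaction, reaction_neighbors, max_gap=2):
    """Map each orphan reaction to unannotated genes found in a matching gene gap.

    gene_order: gene identifiers in chromosomal order.
    gene_to_reaction: annotated genes mapped to their reaction.
    reaction_neighbors: reaction mapped to the set of reactions sharing a compound.
    """
    annotated_reactions = set(gene_to_reaction.values())
    annotated_positions = [
        i for i, gene in enumerate(gene_order) if gene in gene_to_reaction
    ]
    candidates = {}
    # Look at each pair of consecutive annotated genes and the gap between them.
    for left, right in zip(annotated_positions, annotated_positions[1:]):
        gap_genes = gene_order[left + 1:right]
        if not gap_genes or len(gap_genes) > max_gap:
            continue
        r_left = gene_to_reaction[gene_order[left]]
        r_right = gene_to_reaction[gene_order[right]]
        # Reactions adjacent to both flanking reactions bridge them in the network.
        bridging = reaction_neighbors.get(r_left, set()) & reaction_neighbors.get(r_right, set())
        # Keep only bridging reactions that no gene is annotated with (orphans).
        for reaction in sorted(bridging - annotated_reactions):
            candidates.setdefault(reaction, []).extend(gap_genes)
    return candidates


# Hypothetical data: geneB has no annotation, R2 has no gene.
example_order = ["geneA", "geneB", "geneC", "geneD"]
example_annotations = {"geneA": "R1", "geneC": "R3", "geneD": "R4"}
example_network = {
    "R1": {"R2"},
    "R2": {"R1", "R3"},
    "R3": {"R2", "R4"},
    "R4": {"R3"},
}
example_candidates = candidate_genes(example_order, example_annotations, example_network)
```

In this toy case the unannotated gene sits between the genes for R1 and R3, and the orphan reaction R2 sits between R1 and R3 in the network, so the gene is proposed as a candidate for R2.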
Curation of metabolic data in Microscope
CanOE (Candidate genes for Orphan Enzymes)
Example: Allantoin degradation metabolon in E. coli K12
EC:2.1.3.5 is a global orphan reaction (not associated with any gene in any organism)
Three candidate genes for EC:2.1.3.5 reaction
None shares any significant similarity with known carbamoyltransferases
Protein expression and biochemical assays under way
Smith AAT, Belda E., Viari A., Médigue C., and Vallenet D. “The CanOE strategy: integrating genomic and metabolic contexts across multiple prokaryote genomes to find candidate genes for orphan enzymes” (PLoS Computational Biology, in revision)
GPR curation interface: In the context of network reconstruction, it is essential to define Gene-Protein-Reaction (GPR) associations (the genes encoding the enzymes, complexes or isozymes that catalyze a particular metabolic reaction):
Thiele & Palsson; Nat Protoc. 2010;5(1):93-121
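A GPR association is commonly written as a Boolean rule: subunits of a complex are joined by "and", alternative isozymes by "or". The sketch below shows one possible encoding, assuming a rule is stored as a list of gene sets (one set per complex or isozyme); this representation and the gene names are illustrative and are not taken from MicroScope.

```python
def reaction_supported(gpr, genes_present):
    """True if at least one complex/isozyme has all of its genes present."""
    return any(complex_genes <= genes_present for complex_genes in gpr)


def gpr_to_rule(gpr):
    """Render the rule as text, e.g. '(geneA and geneB) or (geneC)'."""
    return " or ".join(
        "(" + " and ".join(sorted(complex_genes)) + ")" for complex_genes in gpr
    )


# Hypothetical reaction: a two-subunit complex, or a single-gene isozyme.
example_gpr = [frozenset({"geneA", "geneB"}), frozenset({"geneC"})]

rule_text = gpr_to_rule(example_gpr)
with_isozyme = reaction_supported(example_gpr, {"geneC"})
with_half_complex = reaction_supported(example_gpr, {"geneA"})
```

Here the isozyme alone is enough to support the reaction, while a single subunit of the complex is not.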
Curation of metabolic data in Microscope
GPR curation interface: The gene curation interface of Microscope allows the validation of Gene-Reaction associations based on curated gene annotations. Two reference reaction resources are available, MetaCyc (functional) and RHEA (under development):
Automatic retrieval of MetaCyc/Rhea reactions based on EC number (4.1.3.27, 2.4.2.18)
Keyword search
Curation of metabolic data in Microscope
Pathway validation interface: Validation/curation of automatically projected MetaCyc pathways based on Gene-Reaction associations:
Curation of metabolic data in Microscope
Microme project: www.microme.eu
A Knowledge-Based Bioinformatics Framework for Microbial Pathway Genomics
Purpose: develop bioinformatics infrastructures, together with a projection and curation process, in order to generate:
- complete metabolic pathways from genome annotations
- whole-cell metabolic models from pathway assemblies
Experimental validation of metabolic models using growth phenotype data (i.e., BIOLOG experiments) generated within the project for a subset of selected species.
Analytical tools are integrated for comparative and phylogenetic analysis based on projected pathways and metabolic models
AMAbiotics
CEA-Genoscope
Center for research and Technology Hellas
ISTHMUS
Molecular Networks
Swiss Institute of Bioinformatics
Wellcome Trust Sanger Institute
Wageningen University
Université Libre de Bruxelles
Tel-Aviv University
Spanish National Cancer Centre
German Collection of Microorganisms and Cell Cultures
European Bioinformatics Institute
Centro Nacional de Biotecnología
Microme WP2: Objectives
Unification of existing metabolic resources:
Pivot resources: ChEBI (chemical compounds) and Rhea (chemical reactions)
Cross-references
External resources (compounds, reactions, pathways): KEGG, MetaCyc, Metabolic models
Alcantara R., Axelsen K.B., Morgat A., Belda E., Coudert E., Bridge A., Cao H., de Matos P., Ennis M., Turner S., Owen G., Bougueleret L., Xenarios I., and Steinbeck C. (2012) Rhea - a manually curated resource of biochemical reactions. Nucleic Acids Research. 40, D754-D760, Database issue.
Provide EU with a curated microbial metabolic resource
Implement a unique cyclic and collaborative curation process for metabolic data
MicroScope and Microme
Use MicroScope as the reference resource of curated GPR (Gene Protein Reaction) associations for microbial genomes included in the Microme project
Development of novel interfaces for GPR curation in Microscope environment. Retrieval of METACYC and RHEA reactions for a particular gene object from EC number annotations
Figure: Curation tool; PkGDB; MicroCyc reconstruction (each night); Web-services
MicroScope and Microme
Development of web services to provide Microme partners with curated Gene-Reaction associations from the Microscope platform
Test-case: Bacillus subtilis 168 re-annotation
Second most intensively studied bacterium after Escherichia coli, being a model organism for Gram-positive bacteria
Genome sequenced in 1997: 4.214 megabases, 4000 CDSs (Nature 1997 Nov 20;390(6657):249-56)
Re-sequencing and first re-annotation of the genome in 2009 (Microbiology (2009), 155, 1758-1775)
Re-annotation of the genome in the context of the Microme project, with special focus on the curation of Gene-Reaction associations using Microscope metabolic tools and curation interface. Collaborative work: LABGeM (CEA), SIB, AMAbiotics (Antoine Danchin)
Test-case: Bacillus subtilis 168 re-annotation
Starting data for curation of Gene-Reaction associations
531 CDSs: Predicted MetaCyc reaction; BBH relationship with E. coli CDSs
378 CDSs: Predicted MetaCyc reaction; No BBH relationship with E. coli CDSs
508 CDSs: "Putative enzymes" in Product type annotation; No predicted MetaCyc reaction
310 CDSs: "Enzymes" in Product type annotation; No predicted MetaCyc reaction
Test-case: Bacillus subtilis 168 re-annotation
From the 909 CDS with predicted reaction:
531 with BBH in E. coli:
• 416 with same GPR in B. subtilis and E. coli (EcoCyc)
• 115 with different GPR in B. subtilis and E. coli (EcoCyc)
378 without BBH in E. coli:
• 254 with GPR predicted from the curated EC number
• 124 with GPR predicted from “product” annotation
310 CDS with “enzyme” annotation and without predicted reaction
508 CDS with “putative enzyme” annotation and without predicted reaction: filter by Catalytic activity field in SwissProt annotations (41 CDSs)
Automatic validation of Gene-Reaction associations
Manual curation of Gene-Reaction associations in Microscope environment:
• Sequence similarity profiles
• Genomic context conservation
• Integration of genomic and metabolic context (CanOE strategy)
• Co-evolution patterns of functionally related genes
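The partition of CDSs into starting categories can be sketched as a simple triage function. The example assumes each CDS record carries three fields (whether a MetaCyc reaction was predicted, whether it has a BBH in E. coli, and its product type); the field names, category labels and sample records are hypothetical and do not reflect the PkGDB schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CDS:
    label: str
    predicted_reaction: bool  # a MetaCyc reaction was projected for this CDS
    ecoli_bbh: bool           # BBH relationship with an E. coli CDS
    product_type: str         # e.g. "enzyme", "putative enzyme", "other"


def starting_category(cds):
    """Assign a CDS to one of the four curation starting sets."""
    if cds.predicted_reaction:
        if cds.ecoli_bbh:
            return "predicted reaction, BBH in E. coli"
        return "predicted reaction, no BBH in E. coli"
    if cds.product_type in ("enzyme", "putative enzyme"):
        return cds.product_type + ", no predicted reaction"
    return "not a curation target"


# Hypothetical records, one per branch.
example_cds = [
    CDS("cds_1", True, True, "enzyme"),
    CDS("cds_2", True, False, "enzyme"),
    CDS("cds_3", False, False, "putative enzyme"),
    CDS("cds_4", False, True, "enzyme"),
    CDS("cds_5", False, False, "other"),
]
category_counts = Counter(starting_category(cds) for cds in example_cds)
```

CDSs in the first category can be compared directly with the E. coli GPR, whereas the others need an EC number, a product annotation or manual evidence.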
Test-case: Bacillus subtilis 168 re-annotation
Problems associated with automatic prediction of Gene-Reaction associations. Example: a generic EC number definition associated with multiple specific reaction instances in MetaCyc
No experimental evidence of activity; generic product annotation
17 predicted reactions based on the EC:1.2.1.3 annotation, which is a problem for modelling purposes
Without experimental evidence of specific substrates, only the generic reaction has been validated
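One way to catch this kind of over-projection is to flag every gene whose EC number maps to more than one reaction, so that a curator reviews it before it reaches a model. The sketch below assumes a plain EC-to-reaction lookup table; the reaction identifiers are placeholders, not real MetaCyc frame IDs, and only the count of 17 instances for EC:1.2.1.3 comes from the slide.

```python
def review_queue(gene_to_ec, ec_to_reactions, max_instances=1):
    """Return genes whose EC number projects onto more reactions than allowed."""
    queue = {}
    for gene, ec_number in gene_to_ec.items():
        reactions = ec_to_reactions.get(ec_number, [])
        if len(reactions) > max_instances:
            queue[gene] = reactions
    return queue


# Placeholder lookup table: 17 instances for the generic EC, one for a specific EC.
example_ec_table = {
    "1.2.1.3": ["placeholder_instance_%02d" % i for i in range(1, 18)],
    "2.6.1.62": ["placeholder_single_reaction"],
}
example_gene_ec = {"cds_generic": "1.2.1.3", "cds_specific": "2.6.1.62"}

flagged = review_queue(example_gene_ec, example_ec_table)
```

Only the gene annotated with the generic EC number ends up in the queue; the gene with a one-to-one mapping passes through unflagged.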
Test-case: Bacillus subtilis 168 re-annotation
Stats of curation Gene-Reaction associations in Microscope
Figure: bar chart comparing initial Gene-Reaction predictions (Pathway Tools) with current Gene-Reaction associations (manually curated); values in parentheses as shown on the slide
Nº Gene-Reaction associations: 1549 initial, 1406 (715) current
Nº CDS: 901 initial, 1006 (517) current
Nº reactions: 1022 initial, 985 (388) current
105 CDS without automatically predicted reaction in initial projections
147 new reactions added (not originally predicted)
184 originally predicted reactions removed
Test-case: Bacillus subtilis 168 re-annotation
13 possible new metabolic pathways/pathway variants not present in MetaCyc
New pathway variants:
• Biotin biosynthesis pathway variant
• Lipoate biosynthesis pathway variant
• Myoinositol catabolism pathway variant
• Rhamnogalacturonan type I degradation pathway variant
• Acetoin dehydrogenase pathway variant
• Methionine salvage pathway variant
• Bacillaene biosynthesis pathway
• Aerobic respiration pathway variants
New metabolic pathways:
• Aromatic polyketide biosynthesis pathway
• 2-methylthio-N6-threocarbamoyladenosine biosynthesis
• Bacilysocin biosynthesis
• Archaeal-type ether lipid biosynthesis
• Bacillaene biosynthesis pathway
• Methionine-Cysteine interconversion
17 possible updates of SwissProt annotations
6 possible new EC numbers
Reported to SwissProt/IUBMB curators
Test-case: Bacillus subtilis 168 re-annotation
Biotin biosynthesis pathway variant: Update of DAP aminotransferase pathway variant (EC:2.6.1.62)
KEGG pathway (map00780), MetaCyc pathway (PWY-5005)
S-Adenosyl-L-methionine as amino group donor
L-lysine instead of S-adenosyl-L-methionine as amino group donor in the Bacillus subtilis BioA enzyme
Test-case: Bacillus subtilis 168 re-annotation
Biotin biosynthesis pathway variant: Link with fatty acid metabolism. Improvement of genome-scale metabolic models
iBsu1103: Most up-to-date B. subtilis 168 metabolic model (SEED methodology; 1437 reactions, 1103 genes). Henry CS, Zinner JF, Cohoon MP, Stevens RL. Genome Biol. 2009;10(6):R69
Dead-end metabolite
Not included in Biomass equation
Figure: FBA simulations, iBsu1103 model. Biomass production rate: iBsu1103, 122.97; iBsu1103 with biotin in biomass, 0.00; iBsu1103 with external influx of pimelate (EX_pimelate), 122.97; iBsu1103 with external influx of biotin (EX_biotin), 122.97
Auxotrophic for Biotin biosynthesis
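The qualitative pattern in these simulations (no growth once biotin is required, rescued by an external supply of pimelate or biotin) can be reproduced on a toy network. The sketch below uses network expansion (reachability from a set of nutrients) rather than FBA, so it gives yes/no answers and no flux values. The four lumped reactions and the nutrient "glucose" are illustrative assumptions and are not taken from iBsu1103.

```python
def producible(seeds, reactions):
    """Expand the set of available metabolites until no reaction adds anything."""
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for _name, substrates, products in reactions:
            if substrates <= available and not products <= available:
                available |= products
                changed = True
    return available


def never_produced(seeds, reactions):
    """Metabolites that are consumed but neither produced nor supplied."""
    consumed = set().union(*(substrates for _n, substrates, _p in reactions))
    produced = set().union(*(products for _n, _s, products in reactions))
    return consumed - produced - set(seeds)


# Toy model before curation: pimelate feeds biotin synthesis but has no source.
toy_model = [
    ("fatty_acid_synthesis", {"glucose"}, {"acyl-ACP"}),
    ("pimelate_to_biotin", {"pimelate"}, {"biotin"}),
]
# Toy model after curation: BioI links fatty acid metabolism to biotin synthesis.
toy_model_curated = toy_model + [
    ("BioI", {"acyl-ACP"}, {"pimeloyl-ACP"}),
    ("pimeloyl_ACP_to_biotin", {"pimeloyl-ACP"}, {"biotin"}),
]

grows_before = "biotin" in producible({"glucose"}, toy_model)
grows_with_pimelate = "biotin" in producible({"glucose", "pimelate"}, toy_model)
grows_after = "biotin" in producible({"glucose"}, toy_model_curated)
gaps_before = never_produced({"glucose"}, toy_model)
```

In the toy network, biotin is unreachable from glucose alone until either pimelate is supplied or the BioI route from acyl-ACP is added.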
Test-case: Bacillus subtilis 168 re-annotation
BioI enzyme of B. subtilis 168: cytochrome P450 protein that catalyzes the oxidative cleavage of acyl-ACP/free fatty acid molecules generated in the context of fatty acid biosynthesis, yielding pimeloyl-ACP as primary product.
Figure: an acyl-ACP or a fatty acid from fatty acid metabolism is converted by BioI (BSU30190) into pimeloyl-ACP; BioF (BSU30220) then uses pimeloyl-ACP with L-alanine + H+, releasing CO2 + holo-ACP
Future work
Extension of the reference set of Microme species to: Acinetobacter sp. ADP1, Pseudomonas putida KT2440, Bacillus subtilis 168
Second version of the Gene-Reaction curation interface in the Microscope environment:
• Curation of protein complexes / isozyme sets
• Management of Rhea reactions in addition to MetaCyc reactions
Definition of strategies for vertical annotation and propagation of curated GPR across multiple microbial genomes
Use UniPathway as the reference resource of metabolic pathways in Microscope; species-specific pathway representations based on the combination of pathway modules (http://www.unipathway.org)
Contributions
Claudine Médigue (Group Leader)
David Vallenet (Researcher)
Damien Monrico (Engineer)
François Lefèvre (Engineer)
Alexander T. Smith (PhD)
Eugeni Belda (Post doc)
IT team: Claude Scarpelli, Ludovic Fleury
External partners: Anne Morgat, Antoine Danchin
Funding
EU Framework Programme 7 Collaborative Project. Grant Agreement Number 222886-2