First Microme Jamboree – June, Monday 27 and Tuesday 28
LABGeM team
Laboratory of Bioinformatic Analysis in Genomics and Metabolism
CEA/DSV/IG/Genoscope & CNRS UMR8030
MicroScope functionalities to support pathway curation
The MicroScope platform
http://www.genoscope.cns.fr/agc/microscope
October 2002: Beginning of the Acinetobacter baylyi ADP1 genome annotation
Computational platform for the annotation and comparative analysis of bacterial genomes:
 - equipment (servers/disk storage/backups)
 - software and data
 - human resources (development/training/support)
=> It offers the community of microbiologists high-technology resources for the automatic and expert analysis of genomic data.
Labelled in 2006 (RIO) and in 2009
859 personal accounts:
 - 493 in France
 - 175 in Europe
 - 81 in USA
 - 110 in other countries
About 980 bacterial genomes: 345 genomes annotated in the system (mostly sequenced at Genoscope and in USA...) and 635 from public databanks
Since 2004: 33 ‘genome’ papers (4 announcements)
Specific genomic analysis: 22 other publications
Usage of the platform
Expert annotations: 370 000 expert annotations; 5000 expert annotations a month (2010)
[Figure: MicroScope architecture diagram.
Data Management: PkGDB (Primary Databanks, Internal Genomic Objects, Computational results); Pathway Genome DataBases (MicroCyc).
Process Management: JBPM Workflows, JBPM Database, DB Release, Functional / relational Analyses, Primary Databank Update.
Visualization: MaGe Web Interface (Login, Genome browser and Synteny maps, Tutorial, Artemis, Data Export, CGView, LinePlot, Genome overview, Keyword search, Blast and Pattern, Phylogenetic Profile, Fusion / Fission, Tandem duplications, Minimal Gene Set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, Metabolic Profile, Pathway / Synteny, Synton display, Gene editor, Job History, Syntactic Annotations, Gene cart).]
Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006
Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009
Three MicroScope components
> 25 methods, integrated in a workflow management system
=> full automation:
• genome annotation
• primary data kept up to date
Tools for the syntactic & functional annotation

Syntactic annotation
 - Public tools: RepSeek (repeats), Oriloc (oriC/terC position), tRNAscan-SE (tRNA genes), Blast on Rfam (snRNA genes).
 - “Homemade” tools: findrRNA (rRNA genes), AMIMat (gene models according to codon usage), AMIGene (based on GeneMark), MICheck (re-annotation of public bacterial genomes).

Functional annotation
 - Public tools: BLAST (searches in specialized databases and UniProt), InterProScan (domains and functional sites), COGnitor (COG protein families), PRIAM (enzymatic functions), Pathway Tools (metabolic pathways reconstruction), SignalP & TMHMM & PSORT (protein localisation).
 - “Homemade” tools: Syntonizer (gene context analysis) and, at the end, AutoFAssign, an automatic functional annotation procedure:
   Blast on ‘reference genome annotations’ & syntenies > HAMAP results > TIGRfam/Pfam results & Blast on UniProt
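The ">" chain in the AutoFAssign description can be read as an order of precedence between evidence sources. The sketch below assumes that reading; the source keys, the `auto_assign` function and the toy product names are hypothetical and are not taken from the MicroScope code base.

```python
from typing import Dict, Optional, Tuple

# Evidence sources in decreasing priority, following the slide's ">" chain.
# The key names are illustrative assumptions.
PRIORITY = (
    "reference_genome_blast_and_synteny",
    "hamap",
    "tigrfam_pfam_and_uniprot_blast",
)


def auto_assign(evidence: Dict[str, Optional[str]]) -> Optional[Tuple[str, str]]:
    """Return (source, product) from the highest-priority source that has a result."""
    for source in PRIORITY:
        product = evidence.get(source)
        if product:
            return source, product
    return None  # no source produced an annotation


# Toy evidence for one gene: no reference-genome result, so HAMAP wins.
example = {
    "reference_genome_blast_and_synteny": None,
    "hamap": "tryptophan synthase alpha chain",
    "tigrfam_pfam_and_uniprot_blast": "putative lyase",
}
print(auto_assign(example))
```

A first-match cascade like this keeps the rule easy to audit: the returned source name records which kind of evidence the automatic annotation came from.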
Classification of protein genes
Functional classifications from annotation tools
 - Gene Ontology (GO classification) <- InterProScan results
 - COG classification <- COGnitor results
Functional classifications (Gene Editor)
 - MultiFun (E. coli; M. Riley)
 - TIGR main roles
Other kind of classification
 - Inspired by the ‘protein name confidence’ defined in PseudoCAP = Pseudomonas aeruginosa community annotation project (www.pseudomonas.com)
Results available to correct/complete annotation
Annotations from reference genomes
MicroScope curated annotations
Synteny results on available complete bacterial genomes
TrEMBL Blast similarities: example
TrEMBL contains functional annotations which often come from automatic procedures only: ‘IPMed?’ (= Interesting PubMed?) is used for proteins that may have an experimentally validated function.
The MicroScope platform: data management -1-
Relational DataBase PkGDB (Prokaryotic Genome DataBase)
Data organisation and persistence:
 - Public/primary data
 - Data generated during the annotation process (analysis results and expert annotations)
One instance of PkGDB for all MicroScope projects
Collaborative annotation:
 - Annotator accounts and rights on sequences
 - Annotation history
The MicroScope platform: data management -2-
Bacterial genome -> Enzymatic activities prediction (PRIAM) -> EC numbers correspondence -> Pathway Tools (P. Karp, SRI, USA)
A metabolic database is built for each annotated microbial genome: PGDB = Pathway/Genome Database (orgname_Cyc)
• Experimentally elucidated metabolic pathways
• 1600 pathways from 2000 organisms
http://www.genoscope.cns.fr/agc/microcyc
Today: 977 organisms, 20 GB
«Metabolic profiles» functionality
 - Select organisms to compare
 - Select pathway classes
 - Ratio: number of reactions for pathway x in a given organism / total number of reactions in pathway x
PkGDB
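One way to read the two quantities named on the «Metabolic profiles» slide is as a ratio: the number of reactions of pathway x found in a given organism, divided by the total number of reactions in pathway x. The sketch below assumes that reading; the function name and the reaction identifiers are hypothetical.

```python
from typing import Set


def pathway_completion(pathway_reactions: Set[str], organism_reactions: Set[str]) -> float:
    """Fraction of the pathway's reactions that are present in the organism."""
    total = len(pathway_reactions)
    if total == 0:
        raise ValueError("pathway has no reactions")
    found = len(pathway_reactions & organism_reactions)
    return found / total


# Toy data: a 4-reaction pathway, 2 of which are predicted in the organism.
pathway_x = {"RXN-1", "RXN-2", "RXN-3", "RXN-4"}
organism = {"RXN-1", "RXN-3", "RXN-9"}
print(pathway_completion(pathway_x, organism))
```

Computing this value for each selected organism and pathway gives one cell per organism/pathway pair, which is the shape of table a metabolic profile comparison needs.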
Metabolic phyloprofile : example of results
Using the “Keywords Search” functionality
Available datasets to be explored?
 - Automatically annotated genes + validated genes
 - Only all/personal validated genes
 - Only annotations from databank files or from our annotation pipeline
 - Gene/Protein features: G+C%, MW, pI
 - Specific fields of the gene editor: Comments/Note
 - BlastP/Synteny results against:
   - The set of genomes of the MicroScope project
   - Escherichia coli (updated annotation) or Bacillus subtilis (SubtiList database) annotations
   - The set of E. coli, B. subtilis, or P. aeruginosa essential genes
 - Genes involved in synteny groups and annotated as Protein of Unknown Function or Putative enzyme
 - The set of similarities obtained with different sources:
   - HAMAP (High-quality Automated/Manual Annotation)
   - SwissProt or TrEMBL databank, limited or not to blast hits having a possible interesting PubMedID
   - PRIAM enzymatic profiles (Enzyme Commission)
   - COG databank
   - InterPro databank
 - Genes encoding enzymes involved in KEGG and BioCyc metabolic pathways
 - The results obtained with SignalP, TMHMM, PsortB and Coiled Coil
Query on P. putida annotation
Step 1: genes annotated as « unknown function » => 2093 results (35%)
Step 2: which ones have blast similarities (<> unknown functions) with UniProt entries linked to a PubMedID?
Results of the query...
Result: 216 genes (123 in SP and 93 in TrEMBL)
« Get gene » => 114 genes (can be re-annotated)
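The two-step query on the P. putida annotation amounts to two successive filters. The sketch below illustrates that logic on toy records only; the `Gene` structure, the function names and the gene labels are hypothetical, and it does not reproduce how the Keywords Search is implemented in MicroScope.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

UNKNOWN = "unknown function"


@dataclass
class Gene:
    label: str
    product: str
    # Each hit: (product of the UniProt entry, whether it is linked to a PubMed ID)
    uniprot_hits: List[Tuple[str, bool]] = field(default_factory=list)


def step1_unknown(genes: List[Gene]) -> List[Gene]:
    """Step 1: keep genes annotated as 'unknown function'."""
    return [g for g in genes if UNKNOWN in g.product.lower()]


def step2_informative_hits(genes: List[Gene]) -> List[Gene]:
    """Step 2: keep genes with a PubMed-linked hit whose product is not 'unknown function'."""
    return [
        g for g in genes
        if any(linked and UNKNOWN not in prod.lower() for prod, linked in g.uniprot_hits)
    ]


# Toy records (not real P. putida data).
genes = [
    Gene("gene_a", "protein of unknown function", [("amine dehydrogenase", True)]),
    Gene("gene_b", "protein of unknown function", [("protein of unknown function", True)]),
    Gene("gene_c", "protein of unknown function", [("putative transporter", False)]),
    Gene("gene_d", "tryptophan synthase", [("tryptophan synthase", True)]),
]
candidates = step2_informative_hits(step1_unknown(genes))
print([g.label for g in candidates])
```

Only `gene_a` survives both filters here: it is annotated as unknown, yet has a hit with a named function that is backed by a publication, which makes it a candidate for re-annotation.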
Syntactic re-annotation of P. putida
[Figure: genome region showing gene labels PP3459, PP3460, PP3461, PP3462, PP3463, PP3464, PP3465, PP3466 and PSEPK3868, PSEPK3872, PSEPK3873; annotation label: Quinohemoprotein amine dehydrogenase]
Bacterial synteny: parameters
• Correspondence relationship = Sequence similarity (BlastP):
  Bidirectional Best Hit
  OR
  at least 30% identity on 80% of the shortest sequence
• Co-localization: Gap = 5
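The two synteny parameters can be sketched as a pair of small predicates. This is a simplified illustration, not the Syntonizer implementation: it assumes that "80% of the shortest sequence" means alignment length divided by the shorter protein length, that "Gap = 5" means at most five intervening genes between two members of a group, and it checks co-localization on one genome only. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    identity_pct: float  # percent identity of the BlastP alignment
    align_len: int       # alignment length, in residues
    query_len: int       # length of the query protein
    subject_len: int     # length of the subject protein
    is_bbh: bool         # True if the pair is a bidirectional best hit


def corresponds(hit: Hit, min_identity: float = 30.0, min_coverage: float = 0.80) -> bool:
    """BBH, OR at least 30% identity over 80% of the shortest sequence."""
    shortest = min(hit.query_len, hit.subject_len)
    coverage = hit.align_len / shortest
    return hit.is_bbh or (hit.identity_pct >= min_identity and coverage >= min_coverage)


def colocalized_groups(gene_ranks: Iterable[int], max_gap: int = 5) -> List[List[int]]:
    """Group gene ranks so that neighbours are separated by at most max_gap genes."""
    groups: List[List[int]] = []
    for rank in sorted(gene_ranks):
        if groups and rank - groups[-1][-1] - 1 <= max_gap:
            groups[-1].append(rank)
        else:
            groups.append([rank])
    return groups


print(corresponds(Hit(35.0, 90, 100, 200, False)))
print(colocalized_groups([1, 2, 8, 20]))
```

In the example, ranks 2 and 8 stay in the same group because exactly five genes lie between them, while rank 20 starts a new group.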
How to read the synteny maps?
ACIAD2440
A putative ortholog to ACIAD2440 on the E. coli genome
ACIAD2450
A putative paralog to ACIAD2450 with two other co-localized ADP1 genes (in yellow)
Another putative paralog to ACIAD2450, elsewhere on the ADP1 chromosome
This P. putida « ortholog » (PP0114) is in synteny with two other genes (coloured in blue-purple).
These two P. putida genes (PP0220 and PP4425) are similar to ACIAD2450 (putative paralogs of PP0114?)
How are genes organized in a synteny group ? -2-
« Syntonome » results in the gene annotation editor
PkGDB proteomes
NCBI + WGS proteomes
MicroScope web interfaces: MaGe
[Figure: MaGe interface map with the following entries: MicroScope project, Authentication, Help, Export, Options, Genome Overview, Synteny map, CGView, Artemis, LinePlot, Metabolic pathways, Synton visualization, Annotation editor (EXPERT CURATION), Exploration (KeyWords, Blast / Motifs, Phylogenetic profiles, Fusions / Fissions, Genomic islands, Metabolic profiles).]
MicroScope tutorial
Annotation data in the ‘Gene Validation’ section of the editor
 - This automatic information does not need to be changed
 - This information must be completed or corrected by the annotator, with the help of the Analysis Results section
 - This information is optional
New: Adding gene-protein-reaction association (MetaCyc reactions)
PP0082 = trpA gene
List of the predicted reactions linked to the gene
Click on EC to search for all MetaCyc reactions corresponding to the annotated EC number
Adding gene-protein-reaction association (MetaCyc reactions)
PP0082 = trpA gene
PP0083 = trpB gene
Added for PP
David Vallenet
Demo: please go to http://www.genoscope.cns.fr/agc/microscope/