First Microme Jamboree – June, Monday 27 and Tuesday 28 LABGeM team Laboratory of Bioinformatic...

Page 1: First Microme Jamboree – June, Monday 27 and Tuesday 28 LABGeM team Laboratory of Bioinformatic Analysis in Genomic and Metabolism CEA/DSV/IG/Genoscope.
Page 2:

First Microme Jamboree – June, Monday 27 and Tuesday 28

LABGeM team

Laboratory of Bioinformatic Analysis in Genomic and Metabolism

CEA/DSV/IG/Genoscope & CNRS UMR8030

MicroScope functionalities to support pathways curation

Page 3:

The MicroScope platform

http://www.genoscope.cns.fr/agc/microscope

October 2002: Beginning of the Acinetobacter baylyi ADP1 genome annotation

Computational platform for the annotation and comparative analysis of bacterial genomes.
- equipment (servers/disk storage/backups)
- software and data
- human resources (development/training/support)

=> it offers the community of microbiologists high-technology resources for the automatic and expert analysis of genomic data.

Labelled in 2006 (RIO) and in 2009

Page 4:

Usage of the platform

859 personal accounts: 493 in France, 175 in Europe, 81 in USA + 110 in other countries

About 980 bacterial genomes: 345 genomes annotated in the system (mostly sequenced at Genoscope and in USA...) and 635 from public databanks

Since 2004, 33 ‘genome’ papers (4 announcements)

Specific genomic analysis: 22 other publications

Expert annotations: 370 000 expert annotations; 5000 expert annotations a month (2010)

Page 5:

Three MicroScope components

[Architecture diagram; recoverable labels follow.]

Data Management: PkGDB, Primary Databanks, Internal Genomic Objects, Computational results, Pathway Genome DataBases

Process Management: JBPM Workflows, JBPM Database, DB Release, Functional / relational Analyses, Primary Databank Update

> 25 methods => full automatisation:
• genome annotation
• primary data up-to-date

Integrated in a workflow management system

Visualization: MaGe Web Interface, MicroCyc

MaGe Web Interface entries: Login, Genome browser and Synteny maps, Tutorial, Artemis, Data Export, CGView, LinePlot, Genome overview, Keyword search, Blast and Pattern, Phylogenetic Profile, Fusion / Fission, Tandem duplications, Minimal Gene Set, RGPfinder, SNPs / InDels, KEGG, MicroCyc, Metabolic Profile, Pathway / Synteny, Synton display, Gene editor, Job History, Syntactic Annotations, Gene cart

Vallenet D. et al. «MaGe - a microbial genome annotation system supported by synteny results» Nucleic Acids Research 2006

Vallenet D. et al. «MicroScope - a platform for microbial genome annotation and comparative genomics» Database 2009

Page 6:

Tools for the syntactic & functional annotation

Syntactic annotation

Public tools: RepSeek (repeats), Oriloc (oriC/terC position), tRNAscan-SE (tRNA genes), Blast on Rfam (snRNA genes).

“Homemade” tools: findrRNA (rRNA genes), AMIMat (gene models according to codon usage), AMIGene (based on GeneMark), MICheck (re-annotation of public bacterial genomes).

Functional annotation

Public tools: BLAST (searches in specialized databases and UniProt), InterProScan (domains and functional sites), COGnitor (COG protein families), PRIAM (enzymatic functions), Pathway Tools (metabolic pathways reconstruction), SignalP & TMHMM & PSORT (protein localisation).

“Homemade” tools: Syntonizer (gene context analysis) and, at the end, AutoFAssign, an automatic functional annotation procedure: Blast on ‘reference genome annotations’ & syntenies > HAMAP results > TIGRfam/Pfam results & Blast on UniProt
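
The ">" signs in the AutoFAssign line read as a priority order between evidence sources. A minimal sketch of such a cascade is given below, assuming each source yields at most one candidate product name per gene. The function name, the dictionary keys and the fallback label are hypothetical and are not taken from MicroScope; the actual procedure may well differ.

```python
from typing import Optional

# Evidence tiers in decreasing priority, as read from the slide.
# The key names are hypothetical labels, not MicroScope identifiers.
PRIORITY = [
    "reference_genome_synteny",  # Blast on reference genome annotations & syntenies
    "hamap",                     # HAMAP results
    "tigrfam_pfam_uniprot",      # TIGRfam/Pfam results & Blast on UniProt
]


def assign_function(evidence: dict[str, Optional[str]]) -> tuple[str, Optional[str]]:
    """Return (product, source) from the highest-priority tier that has a hit."""
    for source in PRIORITY:
        product = evidence.get(source)
        if product:
            return product, source
    # No tier produced a candidate: fall back to a generic label (assumption).
    return "protein of unknown function", None
```

With this layout, a gene that has both a HAMAP hit and a UniProt hit takes its name from HAMAP, because that tier is tested first.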

Page 7:

Classification of protein genes

Functional classifications from annotation tools
- Gene Ontology (GO classification) <- InterProScan results
- COG classification <- COGnitor results

Functional classifications (Gene Editor)
- MultiFun (E. coli; M. Riley)
- TIGR main roles

Other kind of classification
- Inspired by the ‘protein name confidence’ defined in PseudoCAP = Pseudomonas aeruginosa community annotation project (www.pseudomonas.com)

Page 8:

Results available to correct/complete annotation

Annotations from reference genomes

MicroScope curated annotations

Synteny results on available complete bacterial genomes

TrEMBL contains functional annotations which often come from automatic procedures only: ‘IPMed?’ is used for proteins that may have an experimentally validated function.

Page 9:

TrEMBL Blast similarities: example

IPMed = Interesting PubMed?

Page 10:

The MicroScope platform: data management -1-

Data organisation and persistence: Relational DataBase PkGDB (Prokaryotic Genome DataBase)

- Public/primary data
- Data generated during the annotation process (analysis results and expert annotations)

One instance of PkGDB for all MicroScope projects

Collaborative annotation
- Annotator accounts and rights on sequences
- Annotation history

Page 11:

The MicroScope platform: data management -2-

Pathway Tools (P. Karp, SRI, USA)

A metabolic database is built for each annotated microbial genome: PGDB = Pathway/Genome Database (orgname_Cyc)

• Experimentally elucidated metabolic pathways
• 1600 pathways from 2000 organisms

[Diagram labels: Bacterial Genome; Enzymatic activities prediction (PRIAM); EC numbers correspondence]

http://www.genoscope.cns.fr/agc/microcyc

Today: 977 organisms, 20 GB

Page 12:

«Metabolic profiles» functionality

[Screenshot; recoverable callouts follow.]

Select organisms to compare

Select pathway classes

Number of reactions for pathway x in a given organism

Total number of reactions in pathway x

PkGDB
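
The two counts called out on this slide (reactions found for pathway x in a given organism, and total reactions in pathway x) lend themselves to a per-pathway ratio. The sketch below assumes that this ratio is what the profile compares across organisms; the name `completion`, the function names and the data layout are illustrative assumptions, not MicroScope's own code.

```python
def completion(reactions_in_organism: int, total_reactions: int) -> float:
    """Fraction of a pathway's reactions that are found in one organism."""
    if total_reactions <= 0:
        raise ValueError("a pathway must contain at least one reaction")
    return reactions_in_organism / total_reactions


def metabolic_profile(
    counts: dict[str, dict[str, int]],  # organism -> pathway -> reactions found
    totals: dict[str, int],             # pathway -> total reactions in the pathway
) -> dict[str, dict[str, float]]:
    """Build an organism x pathway table of completion values."""
    return {
        organism: {
            pathway: completion(found, totals[pathway])
            for pathway, found in pathways.items()
        }
        for organism, pathways in counts.items()
    }
```

For instance, an organism with 6 of the 8 reactions of a pathway would get a value of 0.75 for that pathway (made-up numbers).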

Page 13:

Metabolic phyloprofile : example of results

Page 14:

Using the “Keywords Search” functionality

Page 15:

Available datasets to be explored?

- Automatically annotated genes + validated genes
- Only all/personal validated genes
- Only annotations from databank files or from our annotation pipeline
- Gene/Protein features: G+C%, MW, pI
- Specific fields of the gene editor: Comments/Note
- BlastP/Synteny results against:
  - The set of genomes of the MicroScope project
  - Escherichia coli (updated annotation) or Bacillus subtilis (SubtiList database) annotations
  - The set of E. coli, B. subtilis, or P. aeruginosa essential genes
- Genes involved in synteny groups and annotated as Protein of Unknown Function or Putative enzyme
- The set of similarities obtained with different sources:
  - HAMAP High-quality Automated/Manual Annotation
  - SwissProt or TrEMBL databank, limited or not to blast hits having a possible interesting PubMedID
  - PRIAM enzymatic profiles (Enzyme Commission)
  - COG databank
  - InterPro databank
- Genes encoding enzymes involved in KEGG and BioCyc metabolic pathways
- The results obtained with SignalP, TMHMM, PsortB and Coiled Coil

Page 16:

Query on P. putida annotation

Step 1: genes annotated as « unknown function » => 2093 results (35%)

Step 2: which ones have blast similarities (<> unknown functions) with UniProt entries linked to a PubMedID?
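
The two steps of this query can be pictured as successive filters over gene records. The sketch below is only an illustration of that logic: the `Gene` and `BlastHit` classes, their fields and the "unknown function" text match are assumptions made for the example, not the MaGe data model or its query engine.

```python
from dataclasses import dataclass, field


@dataclass
class BlastHit:
    description: str                  # product name of the UniProt hit
    databank: str                     # "SP" (SwissProt) or "TrEMBL"
    pubmed_ids: list[str] = field(default_factory=list)


@dataclass
class Gene:
    label: str
    product: str                      # current annotation of the gene
    hits: list[BlastHit] = field(default_factory=list)


def is_unknown(text: str) -> bool:
    """Crude text test for an 'unknown function' annotation (assumption)."""
    return "unknown function" in text.lower()


def step1(genes: list[Gene]) -> list[Gene]:
    """Step 1: genes annotated as 'unknown function'."""
    return [g for g in genes if is_unknown(g.product)]


def step2(genes: list[Gene]) -> list[Gene]:
    """Step 2: keep genes with an informative hit that is linked to a PubMed ID."""
    return [
        g for g in genes
        if any(h.pubmed_ids and not is_unknown(h.description) for h in g.hits)
    ]
```

Applying `step2(step1(genes))` keeps only the genes whose annotation could be improved from a literature-backed hit, which is the subset the slide is after.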

Page 17:

Results of the query...

Result: 216 genes (123 in SP and 93 in TrEMBL)

« Get gene » => 114 genes (can be re-annotated)

Page 18:

Syntactic re-annotation of P. putida

[Genome browser screenshot; recoverable gene labels follow.]

PSEPK3868 Quinohemoprotein amine dehydrogenase

PP3459, PP3460, PP3461, PP3462, PP3463, PP3464, PP3465, PP3466

PSEPK3872, PSEPK3873

Page 19:

Bacterial synteny: parameters

• Correspondence relationship = Sequence similarity: BlastP
  Bidirectional Best Hit
  OR
  at least 30% identity on 80% of the shortest sequence

• Co-localization
  Gap = 5
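
The two parameters on this slide can be sketched as a pairwise test followed by a grouping step. The code below is a simplified illustration and not the Syntonizer algorithm: it assumes that "80% of the shortest sequence" means alignment coverage of the shorter protein, and that "Gap = 5" means at most five intervening genes between neighbouring members of a group on each genome. All function and parameter names are hypothetical.

```python
def corresponds(is_bbh: bool, identity_pct: float, aligned_len: int,
                len_a: int, len_b: int,
                min_identity: float = 30.0, min_coverage: float = 0.80) -> bool:
    """Bidirectional best hit, OR >= 30% identity over >= 80% of the shorter protein."""
    coverage = aligned_len / min(len_a, len_b)
    return is_bbh or (identity_pct >= min_identity and coverage >= min_coverage)


def synteny_groups(pairs: list[tuple[int, int]], gap: int = 5) -> list[list[tuple[int, int]]]:
    """Group corresponding gene pairs (rank on genome A, rank on genome B).

    Two pairs are linked when their ranks differ by at most gap + 1 on both
    genomes; connected sets of at least two pairs are reported as groups.
    """
    max_dist = gap + 1
    unvisited = set(pairs)
    groups = []
    while unvisited:
        seed = unvisited.pop()
        group, stack = [seed], [seed]
        while stack:
            a, b = stack.pop()
            near = [p for p in unvisited
                    if abs(p[0] - a) <= max_dist and abs(p[1] - b) <= max_dist]
            for p in near:
                unvisited.remove(p)
                group.append(p)
                stack.append(p)
        if len(group) >= 2:
            groups.append(sorted(group))
    return sorted(groups)
```

Because links are tested on rank distance only, gene order inside a group may differ between the two genomes, which keeps small local rearrangements in the same group.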

Page 20:

How to read the synteny maps?

ACIAD2440

ACIAD2450

A putative ortholog to ACIAD2440 on the E. coli genome

A putative paralog to ACIAD2450 with two other co-localized ADP1 genes (in yellow)

Another putative paralog to ACIAD2450, elsewhere on the ADP1 chromosome

This P. putida « ortholog » (PP0114) is in synteny with two other genes (coloured in blue-purple).

These two P. putida genes (PP0220 and PP4425) are similar to ACIAD2450 (putative paralogs of PP0114?)

Page 21:

How are genes organized in a synteny group ? -2-

Page 22:

« Syntonome » results in the gene annotation editor

PkGDB proteomes

NCBI + WGS proteomes

Page 23:

MicroScope web interfaces: MaGe

[Interface map; recoverable labels follow.]

MicroScope project, Authentication, Genome Overview, Options, Help, Export

Exploration: KeyWords, Blast / Motifs, Phylogenetic profiles, Fusions / Fissions, Genomic islands, Metabolic profiles

Synteny map, CGView, Artemis, LinePlot, Metabolic pathways, Synton visualization

Annotation editor: EXPERT CURATION

Page 24:

MicroScope tutorial

Page 25:
Page 26:

Annotation data in the ‘Gene Validation’ section of the editor

This automatic information does not need to be changed

This information must be completed or corrected by the annotator, with the help of the Analysis Results section

This information is optional

Page 27:

New

Page 28:

Adding gene-protein-reaction association (MetaCyc reactions)

PP0082 = trpA gene

List of the predicted reactions linked to the gene

Click on EC to search for all MetaCyc reactions corresponding to the annotated EC number

Page 29:

Adding gene-protein-reaction association (MetaCyc reactions)

PP0082 = trpA gene

PP0083 = trpB gene

Added for PP
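
The curation step shown on these two slides (look up the reactions matching a gene's annotated EC number, then attach the chosen ones to the gene) can be sketched as a small lookup table. Everything below is hypothetical: the class, the method names and the reaction identifiers are placeholders for illustration, not MetaCyc frame IDs or the MaGe editor's API. The EC number 4.2.1.20 (tryptophan synthase) is supplied from general knowledge and does not appear on the slides.

```python
from collections import defaultdict

# Hypothetical EC-number -> reaction lookup. The reaction identifiers are
# placeholders for illustration, not real MetaCyc frame IDs.
EC_TO_REACTIONS = {
    "4.2.1.20": ["EXAMPLE-RXN-1", "EXAMPLE-RXN-2"],
}


class GeneReactionTable:
    """Stores curated gene -> reaction associations."""

    def __init__(self) -> None:
        self._reactions = defaultdict(set)

    @staticmethod
    def candidates(ec_number: str) -> list[str]:
        """Reactions proposed for an annotated EC number (the 'click on EC' step)."""
        return list(EC_TO_REACTIONS.get(ec_number, []))

    def add(self, gene: str, reaction: str) -> None:
        """Record that the curator linked this reaction to this gene."""
        self._reactions[gene].add(reaction)

    def reactions(self, gene: str) -> list[str]:
        """Sorted list of the reactions currently linked to a gene."""
        return sorted(self._reactions.get(gene, set()))


table = GeneReactionTable()
for rxn in GeneReactionTable.candidates("4.2.1.20"):
    table.add("PP0082", rxn)           # trpA: accept every proposed reaction
table.add("PP0083", "EXAMPLE-RXN-1")   # trpB: the curator picks a single one
```

Keeping the associations per gene, rather than per EC number, lets two subunits of the same enzyme carry different reaction sets after curation.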

Page 30:

David Vallenet

Demo: please go to

http://www.genoscope.cns.fr/agc/microscope/