A consortium to accelerate the applications of high-throughput genome analysis,functional genomics...

Date posted: 15-Jan-2016

Page 1

A consortium to accelerate the applications of high-throughput genome analysis, functional genomics in Immunology, Developmental Biology, Microbiology & Human Pathology

Marseille-Nice Genopole

Denis THIEFFRY & Richard CHRISTEN

Page 2

Scientific Actions

1. Bioinformatics
2. Transcriptome studies
3. Functional exploration in vertebrates
4. Functional exploration in non-vertebrates
5. Genome variations
6. Cancer genomics
7. Microbial sequencing
8. Structural genomics
9. Proteomics
10. Teaching of bioinformatics

Page 3

Bioinformatics

Strategy and instruments:
- Bi-monthly interdisciplinary seminars + summer schools and workshops
- Development of a teaching platform dedicated to computational biology (Marseille + Nice)

Aim: Developing interdisciplinary research in relation to genomics, transcriptomics & proteomics

Page 4

Bioinformatics Teaching Platform

Dedicated teaching rooms, each with 20 terminals (Marseille + Nice)

Powerful dedicated servers with the main molecular databases and bioinformatics suites (EMBOSS, SRS, BLAST...)

Development of computational biology curricula: Licence, DESS, DEA, Engineering degrees, Research and Professional Masters, Doctorate

Page 5

Bioinformatics: Main Research Areas

1. Transcriptome: integration and analysis of expression data; DNA array design
2. Pattern discovery and search in nucleic acid sequences
3. Integration and modelling of functional macromolecular system data; microbial genome annotation; databases
4. Computational analysis of genetic regulatory networks
5. Phylogeny

Page 6

Bioinformatics Research: Main laboratories

Laboratories: Permanent staff involved

Architecture et Fonction des Macromolécules Biologiques (AFMB): B. Henrissat, P. Coutinho, B. Canard, M. Tegoni
Centre d'Immunologie de Marseille-Luminy (CIML): P. Ferrier, J. Ewbank
Centre de Physique Théorique (CPT): P. Chiappetta, A. Lambert, R. Lima
Information Génomique et Structurale (IGS): J-M. Claverie, C. Notredame, Ph. Derreumaux, H. Ogata, S. Audic, C. Abergel, K. Suhre
Institut de Mathématiques de Luminy (IML): A. Guénoche, B. Mosse, E. Remy, B. Ghattas
Institut de Pharmacologie Moléculaire et Cellulaire (IMPC): P. Barbry, H. Prieto
Laboratoire d'Analyse, Topologie et Probabilités (LATP): B. Torresani, M-C. Roubaud, E. Pardoux
Laboratoire de Biologie Virtuelle (LBV): R. Christen, C. Pasquier
Laboratoire de Chimie Bactérienne (LCB): G. Fichant, Y. Quentin
Laboratoire de Génétique et Physiologie du Développement (LGPD): D. Thieffry, C. Chaouiya, B. Jacq, M. Piovant, L. Röder, P. Lemaire, T. Lecuit
Laboratoire d'Informatique Fondamentale (LIF): C. Sabatier, Y. Vaxes, C. Capponi, H. Garretta, V. Chepoi, M. van Caneghem
Laboratoire de Phylogénomique (LPG): P. Pontarotti, A. Gilles, C. Brochier, N. Pech
Unité des Rickettsies (UR): D. Raoult, M. Drancourt, J-P. Fournier, P. Renesto
Réactions des Organismes aux Stress de l'Environnement (ROSE): R. Feyereisen, E. Deleury
Techniques Avancées pour le Génome et la Clinique (TAGC): D. Gautheret, P. Hingamp, C. N'Guyen, R. Houlgatte
Bioénergétique et Ingénierie des Protéines (BIP): J-P. Belaich, H-P. Fierobe

Legend: Bio, Math/Phys, Info

Page 7

Bioinformatics: Stimulation of Interdisciplinary Research

Interdisciplinary research projects: Laboratories

Integration and analysis of DNA chip data: TAGC, CPT, CIML, IML, LATP
Software development for DNA-chip analysis: IMPC, LBV, ROSE, TAGC
Design of DNA chips for microbiology: LBV, IML
Pattern discovery/search in nucleic sequences: TAGC, CPT, LGPD
Bioinformatics applied to microbiology: IGS, UR
Structural bioinformatics for glycobiology and virology: AFMB, BIP
Modelling of functional macromolecular systems: LCB, LIF, IML
Computational analysis of regulatory networks: LGPD, IML, LIF, CPT
Phylogenomics: comparative analysis of chordate genomes: LPG, LATP

Legend: Bio, Math/Phys, Info

Page 8

CDD in Bioinformatics

Allocation of bioinformatics fixed-term contracts (CDD) in support of the transcriptome and proteome platforms

• Jean Fred FONTAINE (1/4/2002-31/08/2002) - Marseille (5 months): Development of a Java environment to process transcriptome data (normalisation, statistical analysis, classification...): interface; evolution toward a distributed scheme (CORBA).

• Pierre Fabrice LOPEZ (1/11/2002-30/4/2003) - Marseille (6 months): Development of Java software for the automatic quantification of microarray data (Bzscan). Development of statistical tools for the normalisation and analysis of DNA array data (Genesys), using the statistical functions of R and the CORBA distributed architecture. Data in XML format, fully compatible with the MIAME recommendations.

• David BOURGAIS (01/05/03-31/10/03) - Marseille (6 months): Integration of data analysis software (e.g. SEQUEST) for the proteomics platform of Marseille.

• Kevin Le Brigand (01/05/2003-26/05/2004) - Nice (12 months): Development of the MEDIANTE application at Sophia Antipolis.
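The normalisation step mentioned for the transcriptome tools above can be illustrated with a minimal sketch. This is not the code of Bzscan or Genesys (which the slides describe as Java and R based); it only shows one common and simple scheme, per-array median centring of log2 intensities. The function name `median_normalise` and the toy intensities are assumptions made for illustration.

```python
import math
from statistics import median


def median_normalise(arrays):
    """Log2-transform each array and subtract that array's own median.

    `arrays` maps an array name to a list of positive raw intensities.
    After centring, every array has a median log2 intensity of 0, which
    removes global brightness differences between hybridisations.
    """
    normalised = {}
    for name, intensities in arrays.items():
        logs = [math.log2(x) for x in intensities]
        centre = median(logs)
        normalised[name] = [value - centre for value in logs]
    return normalised


# Hypothetical toy data: array_B is ten times brighter than array_A overall.
raw = {
    "array_A": [100.0, 200.0, 400.0],
    "array_B": [1000.0, 2000.0, 4000.0],
}
norm = median_normalise(raw)
for name, values in norm.items():
    print(name, [round(v, 3) for v in values])
```

After centring, the two toy arrays become directly comparable even though their raw intensities differ by a constant factor. Real pipelines usually add background correction and intensity-dependent corrections, which are left out here.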

Page 9

Bioinformatics Research: Microbial genome annotation

Complete genome sequence & annotation of a nanobacterium, Ramlibacter tataouinensis TTB310

• Sandy-soil bacterium (Tataouine, Tunisia)
• Presents two morphotypes with non-equivalent properties (200 nm) (800 nm)

[Figure: micrographs of the two morphotypes, scale bars 200 nm. Labels: rod-shaped cell (peripheral differentiation on agar); motile; cellular division; desiccation resistance; cyst-like cell; transitions between cyst and motile rod, differentiation signal?]

Birth: Jan. 1999

LEMiR / UMR 163 CNRS-CEA, Cadarache

LBV Nice

Heulin et al. (2003). Int J Syst Evol Microbiol 53: 589-594.

Page 10

Bioinformatics Research: Microbial genome annotation

Complete genome sequence & annotation of Ramlibacter tataouinensis

Genome project FXOA Genoscope (grants from CNRS, MESR, CEA)

[Timeline figure. Date marks as they appear on the slide: Dec., Sept. 2002, Dec. 2002, Apr. 2003, June 2003, Dec. 2003, Jan. 2004]

Steps shown on the timeline:
• Preparation: genome size estimation by pulsed-field gel electrophoresis (4 Mb); search for financial support
• Cloning: purified DNA (phenol/chloroform) sent to the GENOSCOPE; construction of four plasmid libraries
• "Pairwise" sequencing: around 35,000 clones sequenced
• Sequence assembly: from 52,000 to 70,000 sequences assembled
• Finishing: design of primers and sequencing of the gaps, final assembly
• Dec.-Jan. 2004: beginning of the annotation

Page 11

Bioinformatics Research: Transcriptome

Using a knowledge database for cluster annotation

Page 12

Bioinformatics Research: Transcriptome

Automatic annotation of internal nodes

Page 13

Bioinformatics Research: Transcriptome

Zoom in and get all known evidence

Page 14

Bioinformatics Research: Transcriptome

An example using the FlyBase ontology

Thorax vs head
Thorax vs whole body
Head vs whole body

Page 15

Bioinformatics Research: Transcriptome

Co-localisation and co-expression?

Page 16

Bioinformatics Research: Transcriptome

Co-localisation and co-expression?

Ch I
Ch IV

Page 17

Bioinformatics Research: Molecular Networks

ProDistIn: functional classification and prediction using graph-based distances

600 yeast proteins (over 2100 proteins and 4500 interactions), 29 cellular roles (according to YPD annotations)

[Figure labels: Cellular Polarity, Cytokinesis, DNA Synthesis, RNA Maturation, Nucleo-cytoplasmic Transport]

Legend: Bio, Math/Phys, Info

Brun et al. (2003). J Struct Funct Genomics 3: 213-24.
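To make the idea of a graph-based distance between proteins concrete, here is a minimal sketch using the Czekanowski-Dice distance on interaction neighbourhoods. The slide only says "graph-based distances"; treating this particular formula as the one used is an assumption, and the toy interaction graph and protein names (P1 to P6) are hypothetical. Proteins that share most of their interaction partners come out close to each other, which is what allows functional grouping.

```python
def neighbourhood(graph, protein):
    """Interaction partners of `protein`, plus the protein itself."""
    return set(graph.get(protein, ())) | {protein}


def czekanowski_dice(graph, a, b):
    """Distance in [0, 1]: 0 for identical neighbourhoods, 1 for disjoint ones."""
    na, nb = neighbourhood(graph, a), neighbourhood(graph, b)
    return len(na ^ nb) / (len(na | nb) + len(na & nb))


# Hypothetical undirected interaction graph as adjacency sets.
toy_graph = {
    "P1": {"P3", "P4"},
    "P2": {"P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2"},
    "P5": {"P6"},
    "P6": {"P5"},
}

d_close = czekanowski_dice(toy_graph, "P1", "P2")  # share both partners
d_far = czekanowski_dice(toy_graph, "P1", "P5")    # share no partner
print(round(d_close, 3), d_far)
```

A full classification would compute this distance for every pair of proteins and then build a tree or clustering from the resulting matrix. That step is omitted here.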

Page 18

Bioinformatics Research: Molecular Networks

GIN-sim: Qualitative dynamical modelling, analysis and simulation of genetic regulatory networks (1)

[Diagram: Regulatory graph -> Dynamical graph -> Model refinements. Components: SIMULATION MODULE, GRAPH ANALYSER, USER INTERFACE (Java classes)]

Chaouiya et al. (2003). Lect Notes Control Info Sci 294: 119-126.

Remy et al. (2003). Bioinformatics 19 Suppl 2: ii172-8.

Sánchez & Thieffry (2003). J theor Biol 224: 517-37.

Thieffry & Sánchez (2002). Ann N Y Acad Sci. 981: 135-53.
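The step from a regulatory graph to a dynamical graph can be sketched on a toy example. The code below is not GIN-sim (which the slide describes as Java classes); it is a simplified Boolean illustration that assumes asynchronous updating, where each transition changes one gene at a time. The two-gene mutual-inhibition circuit and the names `rules` and `dynamical_graph` are assumptions made for illustration.

```python
from itertools import product

# Hypothetical regulatory graph: A and B repress each other.
# Each rule returns the value a gene tends towards in a given state.
rules = {
    "A": lambda state: not state["B"],
    "B": lambda state: not state["A"],
}


def dynamical_graph(rules):
    """Map every Boolean state to its asynchronous successor states."""
    genes = sorted(rules)
    graph = {}
    for values in product((0, 1), repeat=len(genes)):
        state = dict(zip(genes, values))
        successors = []
        for index, gene in enumerate(genes):
            target = int(rules[gene](state))
            if target != values[index]:
                # Asynchronous update: only this one gene changes.
                next_values = list(values)
                next_values[index] = target
                successors.append(tuple(next_values))
        graph[values] = successors
    return graph


graph = dynamical_graph(rules)
# States without successors are stable states of the model.
stable_states = sorted(s for s, succ in graph.items() if not succ)
for state, successors in sorted(graph.items()):
    print(state, "->", successors)
print("stable:", stable_states)
```

In this toy circuit the two stable states correspond to "A on, B off" and "A off, B on", the kind of alternative outcome that a positive feedback circuit is expected to produce.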

Page 19

Bioinformatics Research: Molecular Networks

GIN-sim: Qualitative dynamical modelling, analysis and simulation of genetic regulatory networks (2)

Page 20

Bioinformatics Research: Molecular Networks

GIN-sim: Qualitative dynamical modelling, analysis and simulation of genetic regulatory networks (3)

Page 21

Bioinformatics Research: Molecular Networks

GIN-sim: Qualitative dynamical modelling, analysis and simulation of genetic regulatory networks (4)

Page 22

Bioinformatics Research: Molecular Networks

GINML: Gene-Interaction Network Modelling Language

http://www.esil.univ-mrs.fr/~chaouiya/Recherche/GINML/

Page 23

Bioinformatics Research: Nucleic Patterns

Annotation of RNA genes and RNA motifs

Based on the Erpin program, a central resource for non-coding RNA annotation: tagc.univ-mrs.fr/erpin

Gautheret & Lambert (2001). J Mol Biol 313(5): 1003-11.

Lambert et al. (2002). Biochimie 84(9): 953-9

Legendre & Gautheret (2003). BMC Genomics 4(1): 7.

Page 24

Bioinformatics Research: Phylogenomics

Evidence of en bloc duplication in vertebrate genomes

Abi-Rached et al. (2002). Nat Genet 31(1): 100-5.

Phylogenix start-up: Development of a genome annotation platform using phylogenetic information

Page 25

Bioinformatics Research: Phylogenomics

Evidence of en bloc duplication in primate genomes

Courseaux et al. (2003). Genome Res 13: 369-81.

Courseaux & Nahon (2001). Science 291: 1293-7.

Page 26

Analysis Software

ABC
Available November 2003 (moving to new site)
Object: Data mining of DNA chip data using knowledge databases (ontologies)
Link w/ Genopole: transcriptome analysis

BZSCAN
http://tagc.univ-mrs.fr
Object: Automatic quantification of DNA microarray images
Link w/ Genopole: transcriptome analysis and cancerology

Erpin
http://tagc.univ-mrs.fr/pub/erpin/
Object: Homology-based non-coding RNA identification
Link w/ Genopole: developed through collaboration between TAGC & CPT

ESTA
http://ir2lcb.cnrs-mrs.fr
Object: Search for protein-coding regions in organisms poorly represented in databases
Link w/ Genopole: developed by LCB

ESTparser
http://tagc.univ-mrs.fr/bioinfo/ESTparser/
Object: Annotation/analysis of polyadenylation sites in human genes
Link w/ Genopole: transcriptome analysis

FSED
http://ir2lcb.cnrs-mrs.fr
Object: Search for sequencing errors due to changes in reading frames
Link w/ Genopole: developed by LCB

GelPrint
http://igs-server.cnrs-mrs.fr/
Object: Display of proteomic data
Link w/ Genopole: developed at the IGS

GeneANOVA
Available on request to [email protected]
Object: ANOVA-based software devoted to the analysis of gene expression data
Link w/ Genopole: collaboration with Evry Genopole

GIN-sim
Soon available at http://gin.univ-mrs.fr
Object: Qualitative dynamical simulation of molecular and genetic regulatory networks
Link w/ Genopole: developed at the LGPD in collaboration with the IML and the LIF

Qualipart
Available on request to [email protected]
Object: Evaluation of the quality of a partition
Link w/ Genopole: developed through collaboration between IML and LIF

Qualitree
Available on request to [email protected]
Object: Evaluation of the quality of a tree
Link w/ Genopole: developed through collaboration between IML and LIF

RECTSA
http://ir2lcb.cnrs-mrs.fr/
Object: Search for coding and non-coding regions in large genome regions
Link w/ Genopole: developed by LCB

PhydBac
http://igs-server.cnrs-mrs.fr/
Object: Functional prediction
Link w/ Genopole: developed at the IGS

RNAmotif
http://www.scripps.edu/case/casegr-sh-2.5.html
Object: Descriptor-based non-coding RNA identification
Link w/ Genopole: none. Collaboration with US groups.

SATP
http://igs-server.cnrs-mrs.fr/
Object: Statistical analysis of transcription profiles
Link w/ Genopole: developed at the IGS

SamBa
http://igs-server.cnrs-mrs.fr/
Object: Optimal design of large experiments using incomplete factorial analysis
Link w/ Genopole: developed at the IGS

SelfID
http://igs-server.cnrs-mrs.fr/
Object: Automated bacterial gene finder
Link w/ Genopole: developed at the IGS

Sequam
Available November 2003 (moving to new site)
Object: Design of primers and probes
Link w/ Genopole: bioinformatics for microbiology

Tcoffee
http://igs-server.cnrs-mrs.fr/Tcoffee/
Object: A tool for multiple sequence alignments
Link w/ Genopole: developed at the IGS

Page 27

Databases

ABCdb
http://ir2lcb.cnrs-mrs.fr/ABCdb/
Object: A database for the identification and reconstruction of ABC transporters in completely sequenced bacterial genomes
Link w/ Genopole: collaboration between LCB and ARC

ALLONTO.db
Available November 2003 (moving to new site)
Object: Ontologies for transcriptome analysis
Link w/ Genopole: transcriptome analysis

BIGS
http://igs-server.cnrs-mrs.fr/
Object: Database of the targets of IGS structural genomics, node of the world-wide TargetDB network
Link w/ Genopole: developed at the IGS

CAZy
http://afmb.cnrs-mrs.fr/CAZY/
Object: Description of the families of structurally-related carbohydrate-binding modules of enzymes that degrade, modify, or create glycosidic bonds
Link w/ Genopole: developed at the AFMB

FusionDB
http://igs-server.cnrs-mrs.fr/FusionDB/
Object: Database of bacterial and archaeal gene fusion events, also known as Rosetta stones
Link w/ Genopole: developed at the IGS

GIN-db
Soon available at http://gin.univ-mrs.fr
Object: Integration of molecular and genetic interaction and regulatory data
Link w/ Genopole: developed at the LGPD in collaboration with the LIF

Pa2Cdb
http://ir2lcb.cnrs-mrs.fr/ABCdb/
Object: A database for the identification and reconstruction of two-component systems in Pseudomonas aeruginosa
Link w/ Genopole: developed by LCB

RICBASE
http://igs-server.cnrs-mrs.fr/
Object: Rickettsia comparative genomics database
Link w/ Genopole: developed at the IGS

rRNA.db
Available on request. On the web early 2004
Object: 100,000 rRNA and ITS sequences, aligned and analysed by phylogeny, for phylogeny and DNA chip design
Link w/ Genopole: bioinformatics for microbiology

Tropheryma whipplei genome database
http://igs-server.cnrs-mrs.fr/mgdb/Tropheryma/
Link w/ Genopole: developed at the IGS

Page 28

Selected Publications

1. Beaudoing E, Gautheret D (2001). Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res 11(9): 1520-6.
2. Brazma A, Hingamp P et al. (2001). Minimum Information About a Microarray Experiment (MIAME): towards standards for microarray data. Nat Genet 29(4): 365-71.
3. Brun C, Guenoche A, Jacq B (2003). Approach of the functional evolution of duplicated genes in Saccharomyces cerevisiae using a new classification method based on protein-protein interaction data. J Struct Funct Genomics 3: 213-24.
4. Remy E, Mosse B, Chaouiya C, Thieffry D (2003). Discrete dynamics of regulatory feedback circuits. Bioinformatics 19 (Suppl 2): ii172-8.
5. Claverie JM, Raoult D (2001). Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293: 2093-8.
6. Claverie JM, Ogata H (2003). The insertion of palindromic repeats in the evolution of proteins. Trends Biochem Sci 28: 75-80.
7. Daborn PJ, Yen L, Bogwitz M, LeGoff G, Feil E, Jeffers S, Tijet N, Perry T, Heckel D, Batterham P, Feyereisen R, Wilson T, ffrench-Constant RH (2002). A single P450 allele associated with insecticide resistance in Drosophila. Science 297: 2253-6.
8. Gautheret D, Lambert A (2001). Direct RNA definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol 313: 1003-11.
9. Henrissat B, Coutinho PM (2001). Classification of glycoside hydrolases and glycosyltransferases from hyperthermophiles. Methods Enzymol 330: 183-201.
10. Henrissat B, Coutinho PM, Davies GJ (2001). A census of carbohydrate-active enzymes in the genome of Arabidopsis thaliana. Plant Mol Biol 47: 55-72.
11. Henrissat B, Deleury E, Coutinho PM (2002). Glycogen metabolism loss: a common marker of parasitic behaviour in bacteria? Trends Genet 18: 437-40.
12. Legendre M, Gautheret D (2003). Sequence determinants in human polyadenylation site selection. BMC Genomics 4: 7.
13. Lescure A, Gautheret D, Krol A (2002). Novel selenoproteins identified from genomic sequence data. Methods Enzymol 347: 57-70.
14. Megy K, Audic S, Claverie JM (2003). Positional clustering of differentially expressed genes on human chromosomes 20, 21 and 22. Genome Biol 4(2): P1.
15. Ogata H, Audic S, Abergel C, Fournier PE, Claverie JM (2002). Protein coding palindromes are a unique but recurrent feature in Rickettsia. Genome Res 12: 808-16.
16. Quentin Y, Chabalier J, Fichant G (2002). Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes. Comput Chem 26(5): 447-57.
17. Renesto P, Crapoulet N, Ogata H, La Scola B, Vestris G, Claverie JM, Raoult D (2003). Genome-based design of a cell-free culture medium for Tropheryma whipplei. Lancet 362: 447-9.
18. Solano PJ, Mugat B, Martin D, Girard F, Huibant JM, Ferraz C, Jacq B, Demaille J, Maschat F (2003). Genome-wide identification of in vivo Drosophila Engrailed-binding DNA fragments and related target genes. Development 130: 1243-54.
19. Thieffry D, Sánchez L (2002). Alternative epigenetic states understood in terms of specific regulatory structures. Ann N Y Acad Sci 981: 135-53.
20. Thieffry D, Sánchez L (2003). Dynamical modelling of pattern formation during embryonic development. Curr Opin Genet Dev 13: 326-30.

Page 29

Valorisation (technology transfer)

Ipsogen (founded in 1999)
Design of DNA arrays
ELOGE: Software environment for functional genomics (DNA chip data processing, identification of transcriptional signatures for diagnosis)

IGS laboratory (J-M. Claverie)
Aventis-CNRS joint venture in structural genomics

Phylogenics (founded in 2002)
Software platform for the annotation of genomes based on genomic comparisons and phylogeny

Page 30

Bioinformatics at the Marseille-Nice Genopole

Conclusions
• The genopole has played a crucial role in the development of interdisciplinary research
• Despite limited specific and direct financing, substantial results in terms of software produced, international publications and (inter)national funding

Prospects
• Expansion of the teaching platform to facilitate the use of computational biology tools by all interested research teams (various kinds of access, professional courses) and to gain more global visibility
