A consortium to accelerate the applications of high-throughput genome analysis, functional genomics...
Date posted: 15-Jan-2016
A consortium to accelerate the applications of high-throughput genome analysis, functional genomics in Immunology, Developmental Biology, Microbiology & Human Pathology
Marseille-Nice Genopole
Denis THIEFFRY & Richard CHRISTEN
1. Bioinformatics
2. Transcriptome studies
3. Functional exploration in vertebrates
4. Functional exploration in non-vertebrates
5. Genome variations
6. Cancer genomics
7. Microbial sequencing
8. Structural genomics
9. Proteomics
10. Teaching of bioinformatics
Scientific Actions
Bioinformatics
Aim: Developing interdisciplinary research in relation with genomics, transcriptomics & proteomics
Strategy and instruments:
- Bi-monthly interdisciplinary seminars + summer schools and workshops
- Development of a teaching platform dedicated to computational biology (Marseille + Nice)
Bioinformatics Teaching Platform
Dedicated teaching rooms each with 20 terminals (Marseille + Nice)
Powerful dedicated servers with main molecular databases and
bioinformatic suites (EMBOSS, SRS, BLAST...)
Development of computational biology curricula: Licence, DESS, DEA, Engineering degrees, Research and Professional Masters, Doctorate
Bioinformatics: Main Research Areas
1. Transcriptome: integration and analysis of expression data; DNA array design
2. Pattern discovery and search in nucleic acid sequences
3. Integration and modelling of functional macromolecular system data; microbial genome annotation; databases
4. Computational analysis of genetic regulatory networks
5. Phylogeny
Bioinformatics Research: Main laboratories
Laboratories and permanent staff involved:
Architecture et Fonction des Macromolécules Biologiques (AFMB): B. Henrissat, P. Coutinho, B. Canard, M. Tegoni
Centre d'Immunologie de Marseille-Luminy (CIML): P. Ferrier, J. Ewbank
Centre de Physique Théorique (CPT): P. Chiappetta, A. Lambert, R. Lima
Information Génomique et Structurale (IGS): J-M. Claverie, C. Notredame, Ph. Derreumaux, H. Ogata, S. Audic, C. Abergel, K. Suhre
Institut de Mathématiques de Luminy (IML): A. Guénoche, B. Mosse, E. Remy, B. Ghattas
Institut de Pharmacologie Moléculaire et Cellulaire (IPMC): P. Barbry, H. Prieto
Laboratoire d'Analyse, Topologie et Probabilités (LATP): B. Torresani, M-C. Roubaud, E. Pardoux
Laboratoire de Biologie Virtuelle (LBV): R. Christen, C. Pasquier
Laboratoire de Chimie Bactérienne (LCB): G. Fichant, Y. Quentin
Laboratoire de Génétique et Physiologie du Développement (LGPD): D. Thieffry, C. Chaouiya, B. Jacq, M. Piovant, L. Röder, P. Lemaire, T. Lecuit
Laboratoire d'Informatique Fondamentale (LIF): C. Sabatier, Y. Vaxes, C. Capponi, H. Garretta, V. Chepoi, M. van Caneghem
Laboratoire de Phylogénomique (LPG): P. Pontarotti, A. Gilles, C. Brochier, N. Pech
Unité des Rickettsies (UR): D. Raoult, M. Drancourt, J-P. Fournier, P. Renesto
Réactions des Organismes aux Stress de l'Environnement (ROSE): R. Feyereisen, E. Deleury
Techniques Avancées pour le Génome et la Clinique (TAGC): D. Gautheret, P. Hingamp, C. N'Guyen, R. Houlgatte
Bioénergétique et Ingénierie des Protéines (BIP): J-P. Belaich, H-P. Fierobe
Bio Math/Phys Info
Bioinformatics: Stimulation of Interdisciplinary Research
Interdisciplinary research projects and laboratories involved:
Integration and analysis of DNA chip data: TAGC, CPT, CIML, IML, LATP
Software development for DNA-chip analysis: IPMC, LBV, ROSE, TAGC
Design of DNA chips for microbiology: LBV, IML
Pattern discovery/search in nucleic sequences: TAGC, CPT, LGPD
Bioinformatics applied to microbiology: IGS, UR
Structural bioinformatics for glycobiology and virology: AFMB, BIP
Modelling of functional macromolecular systems: LCB, LIF, IML
Computational analysis of regulatory networks: LGPD, IML, LIF, CPT
Phylogenomics: comparative analysis of chordate genomes: LPG, LATP
CDD (fixed-term contracts) in Bioinformatics
Bioinformatics CDD positions dispatched in support of the transcriptome and proteome platforms:
• Jean Fred FONTAINE (1/4/2002-31/08/2002), Marseille (5 months): development of a Java environment to process transcriptome data (normalisation, statistical analysis, classification...); interface; evolution toward a distributed scheme (CORBA).
• Pierre Fabrice LOPEZ (1/11/2002-30/4/2002), Marseille (6 months): development of Java software for the automatic quantification of microarray data (Bzscan). Development of statistical tools for the normalisation and analysis of DNA array data (Genesys), using the statistical functions of R and the distributed architecture CORBA. Data in XML format, fully compatible with the MIAME recommendations.
• David BOURGAIS (01/05/03-31/10/03), Marseille (6 months): integration of data analysis software (e.g. SEAQUEST) for the proteomics platform of Marseille.
• Kevin Le Brigand (01/05/2003-26/05/2004), Nice (12 months): development of the MEDIANTE application at Sophia Antipolis.
Bioinformatics Research: Microbial genome annotation
Complete genome sequence & annotation of a nannobacterium, Ramlibacter tataouinensis TTB310
• Sandy-soil bacterium (Tataouine, Tunisia)
• Presents two morphotypes with non-equivalent properties (200 nm and 800 nm)
[Figure: morphotypes of Ramlibacter tataouinensis TTB310. Labels: rod-shaped cell (peripheral differentiation on agar); motile; cellular division; desiccation resistance; cyst-like cell; motile rod; cyst; differentiation signal? Scale bars: 200 nm.]
LEMiR / UMR 163 CNRS-CEA, Cadarache; LBV Nice
Heulin et al. (2003). Int J Syst Evol Microbiol 53: 589-594.
Genome project FXOA, Genoscope (grants from CNRS, MESR, CEA)
[Timeline figure. Date labels, in order: birth Jan. 1999; Dec.-Sept. 2002; Sept.-Dec. 2002; Dec.-Apr. 2003; Apr.-June 2003; June-Dec. 2003; Dec.-Jan. 2004. Steps:]
• Preparation: genome size estimation by Pulsed Field Gel Electrophoresis (4 Mb); search for financial support
• Cloning: purified DNA (phenol/chloroform) sent to the Genoscope; construction of four plasmid banks
• "Pairwise" sequencing: around 35,000 clones sequenced
• Sequence assembly: from 52,000 to 70,000 sequences assembled
• Finishing: design of primers and sequencing of the gaps, final assembly
• Beginning of the annotation
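The sequencing figures on this slide (a genome of about 4 Mb, around 35,000 clones sequenced from both ends) can be turned into a rough depth-of-coverage estimate. The sketch below uses the standard relation c = N·L/G and the Lander-Waterman (Poisson) approximation for the fraction of bases left unsequenced. The mean read length is not given in the slides, so the value used here is a hypothetical placeholder, and the function names are illustrative only.

```python
import math


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Average sequencing depth c = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson (Lander-Waterman) estimate of the fraction of bases never sequenced."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 4_000_000  # about 4 Mb, from the PFGE estimate on the slide
    n_reads = 2 * 35_000     # both ends of about 35,000 clones
    read_length = 500        # ASSUMPTION: hypothetical mean read length, not from the slides
    c = expected_coverage(n_reads, read_length, genome_size)
    gap = expected_uncovered_fraction(c)
    print(f"coverage: {c:.2f}x, expected uncovered fraction: {gap:.2e}")
```

Under this model, gaps remain even at high depth, which is consistent with the finishing step (primer design and gap sequencing) listed on the slide.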
Bioinformatics Research: Transcriptome
Using a knowledge database for cluster annotation
• Automatic annotation of internal nodes
• Zoom in and get all known evidence
An example using the FlyBase ontology: thorax vs head; thorax vs whole body; head vs whole body
Co-localisation and co-expression? [Figure panels: Ch I, Ch IV]
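One common way to annotate an expression cluster from a knowledge database is to test each ontology term for over-representation among the genes of the cluster. The slides do not say which statistic the tool uses, so the following is only a minimal sketch based on the hypergeometric test. Gene identifiers, term names and function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population: int, annotated: int, cluster: int, hits: int) -> float:
    """P(X >= hits) when `cluster` genes are drawn without replacement from
    `population` genes, of which `annotated` carry the ontology term."""
    max_hits = min(annotated, cluster)
    total = comb(population, cluster)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, cluster - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def annotate_cluster(cluster_genes, term_to_genes, all_genes):
    """Return (term, hits, p-value) for every term found in the cluster,
    most significant first. `cluster_genes` must be a subset of `all_genes`."""
    results = []
    for term, genes in term_to_genes.items():
        annotated = len(genes & all_genes)
        hits = len(genes & cluster_genes)
        if hits:
            p = hypergeom_enrichment_pvalue(len(all_genes), annotated, len(cluster_genes), hits)
            results.append((term, hits, p))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Toy data: ten hypothetical genes and two hypothetical anatomy terms.
    all_genes = {f"g{i}" for i in range(1, 11)}
    terms = {"thorax": {"g1", "g2", "g3"}, "head": {"g3", "g9"}}
    for term, hits, p in annotate_cluster({"g1", "g2", "g3"}, terms, all_genes):
        print(term, hits, round(p, 4))
```

In a hierarchical clustering, the same test can be repeated at every internal node of the tree, using the genes below that node as the cluster.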
Bioinformatics Research: Molecular Networks
ProDistIn: functional classification and prediction using graph-based distances
600 yeast proteins (over 2100 proteins and 4500 interactions), 29 cellular roles (according to YPD annotations)
[Figure labels: Cellular Polarity; Cytokinesis; DNA Synthesis; RNA Maturation; Nucleo-cytoplasmic Transport]
Brun et al. (2003). J Struct Funct Genomics 3: 213-24.
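As an illustration of what a graph-based distance between proteins can look like, the sketch below computes the Czekanowski-Dice distance between the interaction neighbourhoods of two proteins: proteins that share most of their interactors get a small distance and can then be clustered together. The slide only says "graph-based distances", so treating this particular distance as the one used is an assumption. The protein names and function names are hypothetical.

```python
from collections import defaultdict


def build_neighbourhoods(interactions):
    """Map each protein to the set containing itself and its direct interactors."""
    neigh = defaultdict(set)
    for a, b in interactions:
        neigh[a].update((a, b))
        neigh[b].update((a, b))
    return dict(neigh)


def czekanowski_dice(neigh, p, q):
    """|N(p) symmetric-difference N(q)| / (|N(p) union N(q)| + |N(p) intersection N(q)|)."""
    n_p, n_q = neigh[p], neigh[q]
    return len(n_p ^ n_q) / (len(n_p | n_q) + len(n_p & n_q))


if __name__ == "__main__":
    # Toy interaction list with hypothetical protein names.
    toy = [("A", "X"), ("A", "Y"), ("B", "X"), ("B", "Y"), ("C", "Z")]
    neigh = build_neighbourhoods(toy)
    print(czekanowski_dice(neigh, "A", "B"))  # A and B share both interactors
    print(czekanowski_dice(neigh, "A", "C"))  # A and C share none
```

The resulting distance matrix can be passed to any standard hierarchical clustering routine to obtain a classification tree.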
Bioinformatics Research: Molecular Networks
GIN-sim: Qualitative dynamical modelling, analysis and simulation of genetic regulatory networks (1)
[Figure labels: regulatory graph; dynamical graph; model refinements; simulation module; graph analyser; user interface; Java classes]
Chaouiya et al. (2003). Lect Notes Control Info Sci 294: 119-126.
Remy et al. (2003). Bioinformatics 19 Suppl 2: ii172-8.
Sánchez & Thieffry (2003). J theor Biol 224: 517-37.
Thieffry & Sánchez (2002). Ann N Y Acad Sci. 981: 135-53.
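The step from a regulatory graph to a dynamical graph can be illustrated with a small Boolean example: each gene has a logical rule giving its target value, and the dynamical graph links every state to the states reached by updating one gene at a time (asynchronous updating). This is a simplified sketch, not GIN-sim code or its API. The two-gene circuit and all function names are hypothetical.

```python
from itertools import product


def asynchronous_successors(state, rules):
    """States reachable by updating exactly one gene whose target value differs."""
    successors = []
    for i, rule in enumerate(rules):
        target = rule(state)
        if target != state[i]:
            successors.append(state[:i] + (target,) + state[i + 1:])
    return successors


def dynamical_graph(rules):
    """Map every Boolean state to its list of asynchronous successors."""
    n = len(rules)
    return {s: asynchronous_successors(s, rules) for s in product((0, 1), repeat=n)}


def stable_states(graph):
    """States with no successor, i.e. fixed points of the dynamics."""
    return sorted(s for s, succ in graph.items() if not succ)


# Hypothetical two-gene cross-inhibition circuit (a positive feedback circuit).
toy_rules = (
    lambda s: int(not s[1]),  # gene 0 is on unless gene 1 is on
    lambda s: int(not s[0]),  # gene 1 is on unless gene 0 is on
)

if __name__ == "__main__":
    graph = dynamical_graph(toy_rules)
    for state, succ in graph.items():
        print(state, "->", succ)
    print("stable states:", stable_states(graph))
```

In this toy circuit the two stable states are the ones where exactly one gene is on, a simple case of the multistationarity associated with positive feedback circuits.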
Bioinformatics Research: Molecular Networks
GINML: Gene-Interaction Network Modelling Language
http://www.esil.univ-mrs.fr/~chaouiya/Recherche/GINML/
Bioinformatics Research: Nucleic Patterns
Annotation of RNA genes and RNA motifs
Based on the Erpin program, a central resource for non-coding RNA annotation: tagc.univ-mrs.fr/erpin
Gautheret & Lambert (2001). J Mol Biol 313(5): 1003-11.
Lambert et al. (2002). Biochimie 84(9): 953-9.
Legendre & Gautheret (2003). BMC Genomics 4(1): 7.
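Profile-based pattern search can be sketched in its simplest form as follows: a log-odds profile is built from an ungapped alignment of known motif instances, then slid along a target sequence to score every window. This is a deliberately reduced illustration that ignores RNA secondary structure, and it is not Erpin's algorithm. The sequences, the uniform background and the function names are assumptions made for the example.

```python
import math

ALPHABET = "ACGU"


def build_log_odds_profile(aligned, pseudocount=1.0, background=0.25):
    """One dict per alignment column: base -> log2(observed frequency / background)."""
    profile = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in ALPHABET
        })
    return profile


def scan(sequence, profile):
    """Score every window of the sequence; returns (start, score) pairs."""
    width = len(profile)
    return [
        (start, sum(profile[i][sequence[start + i]] for i in range(width)))
        for start in range(len(sequence) - width + 1)
    ]


def best_hit(sequence, profile):
    """Highest-scoring window as a (start, score) pair."""
    return max(scan(sequence, profile), key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical ungapped training alignment and target sequence (ACGU only).
    profile = build_log_odds_profile(["GGAC", "GGAC", "GGAU"])
    print(best_hit("AAGGACAA", profile))
```

The sketch handles neither gaps nor ambiguous bases; both would need explicit treatment before use on real alignments.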
Bioinformatics Research: Phylogenomics
Evidence of en bloc duplication in vertebrate genomes
Abi-Rached et al. (2002). Nat Genet 31(1): 100-5.
Phylogenix start-up: development of a genome annotation platform using phylogenetic information
Evidence of en bloc duplication in primate genomes
Courseaux et al. (2003). Genome Res 13: 369-81.
Courseaux & Nahon (2001). Science 291: 1293-7.
Analysis Software
• ABC: available November 2003 (moving to new site)
  Object: Data mining of DNA chip data using knowledge databases (ontologies)
  Link w/ Genopole: transcriptome analysis
• BZSCAN: http://tagc.univ-mrs.fr
  Object: Automatic quantification of DNA microarray images
  Link w/ Genopole: transcriptome analysis and cancerology
• Erpin: http://tagc.univ-mrs.fr/pub/erpin/
  Object: homology-based non-coding RNA identification
  Link w/ Genopole: developed through collaboration between TAGC & CPT
• ESTA: http://ir2lcb.cnrs-mrs.fr
  Object: Search for protein-coding regions in organisms poorly represented in databases
  Link w/ Genopole: developed by LCB
• ESTparser: http://tagc.univ-mrs.fr/bioinfo/ESTparser/
  Object: Annotation/analysis of polyadenylation sites in human genes
  Link w/ Genopole: transcriptome analysis
• FSED: http://ir2lcb.cnrs-mrs.fr
  Object: Search for sequencing errors due to changes in reading frames
  Link w/ Genopole: developed by LCB
• GelPrint: http://igs-server.cnrs-mrs.fr/
  Object: Display of proteomic data
  Link w/ Genopole: developed at the IGS
• GeneANOVA: available on request
  Object: ANOVA-based software devoted to the analysis of gene expression data
  Link w/ Genopole: collaboration with Evry Genopole
• GIN-sim: soon available at http://gin.univ-mrs.fr
  Object: Qualitative dynamical simulation of molecular and genetic regulatory networks
  Link w/ Genopole: developed at the LGPD in collaboration with the IML and the LIF
• Qualipart: available on request
  Object: Evaluation of the quality of a partition
  Link w/ Genopole: developed through collaboration between IML and LIF
• Qualitree: available on request
  Object: Evaluation of the quality of a tree
  Link w/ Genopole: developed through collaboration between IML and LIF
• RECTSA: http://ir2lcb.cnrs-mrs.fr/
  Object: Search for coding and non-coding regions in large genome regions
  Link w/ Genopole: developed by LCB
• PhydBac: http://igs-server.cnrs-mrs.fr/
  Object: Functional prediction
  Link w/ Genopole: developed at the IGS
• RNAmotif: http://www.scripps.edu/case/casegr-sh-2.5.html
  Object: descriptor-based non-coding RNA identification
  Link w/ Genopole: none. Collaboration with US groups.
• SATP: http://igs-server.cnrs-mrs.fr/
  Object: Statistical analysis of transcription profiles
  Link w/ Genopole: developed at the IGS
• SamBa: http://igs-server.cnrs-mrs.fr/
  Object: Optimal design of large experiments using incomplete factorial analysis
  Link w/ Genopole: developed at the IGS
• SelfID: http://igs-server.cnrs-mrs.fr/
  Object: Automated bacterial gene finder
  Link w/ Genopole: developed at the IGS
• Sequam: available November 2003 (moving to new site)
  Object: Design of primers and probes
  Link w/ Genopole: bioinformatics for microbiology
• Tcoffee: http://igs-server.cnrs-mrs.fr/Tcoffee/
  Object: A tool for multiple sequence alignments
  Link w/ Genopole: developed at the IGS
Databases
• ABCdb: http://ir2lcb.cnrs-mrs.fr/ABCdb/
  Object: a database for the identification and reconstruction of ABC transporters in completely sequenced bacterial genomes
  Link w/ Genopole: collaboration between LCB and ARC
• ALLONTO.db: available November 2003 (moving to new site)
  Object: Ontologies for transcriptome analysis
  Link w/ Genopole: transcriptome analysis
• BIGS: http://igs-server.cnrs-mrs.fr/
  Object: Database of the targets of IGS structural genomics, node of the world-wide TargetDB network
  Link w/ Genopole: developed at the IGS
• CAZy: http://afmb.cnrs-mrs.fr/CAZY/
  Object: Description of the families of structurally-related carbohydrate-binding modules of enzymes that degrade, modify, or create glycosidic bonds
  Link w/ Genopole: developed at the AFMB
• FusionDB: http://igs-server.cnrs-mrs.fr/FusionDB/
  Object: database of bacterial and archaeal gene fusion events, also known as Rosetta stones
  Link w/ Genopole: developed at the IGS
• GIN-db: soon available at http://gin.univ-mrs.fr
  Object: Integration of molecular and genetic interaction and regulatory data
  Link w/ Genopole: developed at the LGPD in collaboration with the LIF
• Pa2Cdb: http://ir2lcb.cnrs-mrs.fr/ABCdb/
  Object: a database for the identification and reconstruction of two-component systems in Pseudomonas aeruginosa
  Link w/ Genopole: developed by LCB
• RICBASE: http://igs-server.cnrs-mrs.fr/
  Object: Rickettsia comparative genomics database
  Link w/ Genopole: developed at the IGS
• rRNA.db: available on request; on the web early 2004
  Object: 100 000 rRNA and ITS sequences, aligned and analyzed by phylogeny, for phylogeny and DNA chip design
  Link w/ Genopole: bioinformatics for microbiology
• Tropheryma whipplei genome database: http://igs-server.cnrs-mrs.fr/mgdb/Tropheryma/
  Link w/ Genopole: developed at the IGS
Selected Publications
1. Beaudoing E, Gautheret D (2001). Identification of alternate polyadenylation sites and analysis of their tissue distribution using EST data. Genome Res 9: 1520-26.
2. Brazma A, Hingamp P et al. (2001). Minimum Information About a Microarray Experiment - MIAME - towards Standards for Microarray Data. Nature Genet 4: 365-71.
3. Brun C, Guenoche A, Jacq B (2003). Approach of the functional evolution of duplicated genes in Saccharomyces cerevisiae using a new classification method based on protein-protein interaction data. J Struct Funct Genomics 3: 213-24.
4. Remy E, Mosse B, Chaouiya C, Thieffry D (2003). Discrete dynamics of regulatory feedback circuits. Bioinformatics 19 (supp 2): ii172-8.
5. Claverie JM, Raoult D (2001). Mechanisms of evolution in Rickettsia conorii and R. prowazekii. Science 293: 2093-8.
6. Claverie JM, Ogata H (2003). The insertion of palindromic repeats in the evolution of proteins. Trends Biochem Sci 28: 75-80.
7. Daborn PJ, Yen L, Bogwitz M, LeGoff G, Feil E, Jeffers S, Tijet N, Perry T, Heckel D, Batterham P, Feyereisen R, Wilson T, ffrench-Constant RH (2002). A single P450 allele associated with insecticide resistance in Drosophila. Science 297: 2253-6.
8. Gautheret D, Lambert A (2001). Direct RNA definition and identification from multiple sequence alignments using secondary structure profiles. J Mol Biol 313: 1003-11.
9. Henrissat B, Coutinho PM (2001). Classification of glycoside hydrolases and glycosyltransferases from hyperthermophiles. Methods Enzymol 330: 183-201.
10. Henrissat B, Coutinho PM, Davies GJ (2001). A census of carbohydrate-active enzymes in the genome of Arabidopsis thaliana. Plant Mol Biol 47: 55-72.
11. Henrissat B, Deleury E, Coutinho PM (2002). Glycogen metabolism loss: a common marker of parasitic behaviour in bacteria? Trends Genet 18: 437-40.
12. Legendre M, Gautheret D (2003). Sequence determinants in human polyadenylation site selection. BMC Genomics 4: 7.
13. Lescure A, Gautheret D, Krol A (2002). Novel selenoproteins identified from genomic sequence data. Methods Enzymol 347: 57-70.
14. Megy K, Audic S, Claverie JM (2003). Positional clustering of differentially expressed genes on human chromosomes 20, 21 and 22. Genome Biol 4 (2): P1.
15. Ogata H, Audic S, Abergel C, Fournier PE, Claverie JM (2002). Protein coding palindromes are a unique but recurrent feature in Rickettsia. Genome Res 12: 808-16.
16. Quentin Y, Chabalier J, Fichant G (2002). Strategies for the identification, the assembly and the classification of integrated biological systems in completely sequenced genomes. Comput Chem 26(5): 447-57.
17. Renesto P, Crapoulet N, Ogata H, La Scola B, Vestris G, Claverie JM, Raoult D (2003). Genome-based design of a cell-free culture medium for Tropheryma whipplei. Lancet 362: 447-9.
18. Solano PJ, Mugat B, Martin D, Girard F, Huibant JM, Ferraz C, Jacq B, Demaille J, Maschat F (2003). Genome-wide identification of in vivo Drosophila Engrailed-binding DNA fragments and related target genes. Development 130: 1243-54.
19. Thieffry D, Sánchez L (2002). Alternative epigenetic states understood in terms of specific regulatory structures. Ann N Y Acad Sci 981: 135-53.
20. Thieffry D, Sánchez L (2003). Dynamical modelling of pattern formation during embryonic development. Curr Opin Genet Dev 13: 326-30.
Valorisation
• Ipsogen (founded in 1999): design of DNA arrays; ELOGE, a software environment for functional genomics (DNA chip data processing, identification of transcriptional signatures for diagnosis)
• IGS laboratory (J-M. Claverie): Aventis-CNRS joint venture in structural genomics
• Phylogenics (founded in 2002): software platform for the annotation of genomes based on genomic comparisons and phylogeny
Bioinformatics at the Marseille-Nice Genopole
Conclusions
• The Genopole has played a crucial role in the development of interdisciplinary research
• Despite limited specific and direct financing, substantial results in terms of software produced, international publications and (inter)national funding
Prospects
• Expansion of the teaching platform to facilitate the use of computational biology tools by all interested research teams (various kinds of access, professional courses) and to gain more global visibility