Network Services for Biologists in the Genome Era
The Work of the European Bioinformatics Institute
Our Genome
tcaattctga tcgaataaac gaatttacat atttggtaag ttttggccaa tttcgtagca 60
atatgatgaa attgcgctct tttttaggaa tatcaaattg gaatataaca aaaaaaaaac 120
tgaaactaac caactgaatc taatgtgcat tttaaataat aaaaatggat cattttatac 180
atcatattaa aattaaaaaa atttcataaa aataatacgt agtaaaaaat aaaaattttt 240
aacataaata aannnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 300
MTERENNVYK AKLAEQAERY DEMVEAMKKV ASMDVELTVE ERNLLSVAYK NVIGARRASW RIITSIEQKE ENKGAEEKLE MIKTYRGQVE KELRDICSDI LNVLEKHLIP CATSGESKVF YYKMKGDYHR YLAEFATGSD RKDAAENSLI AYKAASDIAM NDLPPTHPIR LGLALNFSVF YYEILNSPDR ACRLAKAAFD DAIAELDTLS EESYKDSTLI MQLLRDNLTL WTSDMQAEDP NAGDGEPKEQ IQDVEDQDVS
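The nucleotide lines above follow the EMBL flat-file sequence layout: groups of ten bases separated by spaces, with the cumulative base count at the end of each line, and 'n' marking an undetermined base. A minimal sketch of reading such lines back into one sequence while checking each trailing count is shown here. The function names (`parse_sequence_lines`, `composition`) are hypothetical, chosen only for illustration.

```python
import re

# One EMBL-style sequence line: letters in blocks separated by spaces,
# then the running base count at the end of the line.
LINE_RE = re.compile(r"^\s*([A-Za-z ]+?)\s+(\d+)\s*$")

def parse_sequence_lines(lines, start=0):
    """Join EMBL-style sequence lines into a single lowercase string.

    `start` is the number of bases preceding the first line (0 for the
    beginning of an entry). Raises ValueError if a trailing count does
    not match the running length.
    """
    parts = []
    total = start
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(f"not a sequence line: {line!r}")
        bases = m.group(1).replace(" ", "").lower()
        total += len(bases)
        if total != int(m.group(2)):
            raise ValueError(f"count {m.group(2)} != running length {total}")
        parts.append(bases)
    return "".join(parts)

def composition(seq):
    """Count a, c, g, t and 'n' (undetermined base) in a sequence."""
    return {base: seq.count(base) for base in "acgtn"}

first_line = ("tcaattctga tcgaataaac gaatttacat "
              "atttggtaag ttttggccaa tttcgtagca 60")
seq = parse_sequence_lines([first_line])
print(len(seq), composition(seq))
```

Checking the trailing count on every line is a cheap way to catch truncated or merged lines when sequence text has been copied between systems.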
Chr.2234566830 basepairs
Estimated 50,000-100,000 genes (3286 Mbases), 2/3 of which are completed and in the public domain.
...others...
From: Genome MOT at the EBI (April 2000)
Data growth (EMBL DB)
Activity Areas at EBI
• EMBL – Archiving, development and distribution of DNA sequence data.
• Swiss-Prot – Archiving, production, development and distribution of protein sequence data.
• MSD – Archiving and distribution of macromolecular structural data and structure prediction applications.
• DALI – Archiving and distribution of 2D/3D prediction databases and tools for using them.
• ENSEMBL – Archiving, automatic analysis and distribution of human genome data.
• CGG – Genome annotation, data mining, metabolic pathway research.
• CORBA – Design and implementation of CORBA-based tools for database querying.
• SRS – Development and maintenance of SRS in collaboration with LION Bioscience.
• Industry – Links to industry and customised R&D (e.g. gene expression).
• External Services – Development and deployment of on-line interactive and non-interactive tools for sequence analysis.
EBI’s Network Services (http://www.ebi.ac.uk/Tools/)
Type | Interactive | Non-interactive
Search and retrieve | SRS (over 100 databases), CORBA Tools (various frontends) | [email protected]
Comparison: nucleotide and protein sequence searches | Fasta3, WU-Blast2, NCBI Blast2; Blitz (S&W): Bic_sw, MPsrch, Scanps | Fasta3, Blitz, Blast
On-line analysis | Ensembl, InterPro and advanced motif and fingerprint searches; structure and gene prediction; sequence aligning (clustalw_mp); CORBA Tools | [email protected]
Archive distribution | ftp | CD
Data submission | WEBIN | [email protected]
Support | | [email protected]
Our common user interface
srs.ebi.ac.uk: Sequence Retrieval System
Core text search and retrieval engine for most services offered by the EBI.
Updates and links together more than 100 databanks.
Biggest SRS server in the world (over 130 databases).
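SRS is described above as a text search and retrieval engine that links databanks together. The toy sketch here illustrates that general idea with an inverted word index per databank plus cross-reference links between entries. It assumes two hypothetical miniature databanks (`embl`, `swissprot`) with made-up entry IDs and descriptions; it is not SRS's actual implementation, index format or query language.

```python
from collections import defaultdict

# Hypothetical miniature databanks: entry ID -> (description, cross-references).
# Each cross-reference is a (databank, entry ID) pair.
DATABANKS = {
    "embl": {
        "EMB1": ("human kinase mRNA", [("swissprot", "SP1")]),
        "EMB2": ("yeast ribosomal RNA", []),
    },
    "swissprot": {
        "SP1": ("protein kinase", [("embl", "EMB1")]),
    },
}

def build_index(databanks):
    """Build an inverted index: (databank, word) -> set of entry IDs."""
    index = defaultdict(set)
    for bank, entries in databanks.items():
        for entry_id, (description, _xrefs) in entries.items():
            for word in description.lower().split():
                index[(bank, word)].add(entry_id)
    return index

def query(index, bank, word):
    """Return the IDs of entries in `bank` whose description contains `word`."""
    return set(index.get((bank, word.lower()), set()))

def link(databanks, bank, ids, target):
    """Follow cross-references from entries in `bank` to entries in `target`."""
    linked = set()
    for entry_id in ids:
        _description, xrefs = databanks[bank][entry_id]
        linked.update(ref_id for ref_bank, ref_id in xrefs if ref_bank == target)
    return linked

index = build_index(DATABANKS)
hits = query(index, "embl", "kinase")
print(hits, link(DATABANKS, "embl", hits, "swissprot"))
```

Keeping the word index and the cross-reference links separate means a text search in one databank can be turned into a result set in any other databank it references.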
Genome & Proteomes
Currently more than 30 complete genomes and proteomes are available interactively to the user community, and demand for data from the Human genome is being met by providing access to all the material available in the EBI databases.
GPCRDB
A recent initiative: Ensembl
Bringing discovery to the scientific community ...
The Community
• EMBnet - European Molecular Biology network.
• Formed (officially) in 1988 to disseminate up-to-date molecular biology databases within member states.
• The initiative for the creation of EMBnet was started by EMBO council members in collaboration with EMBL staff in 1986.
Dissemination of EBI data resources to the world through the EMBnet
An EMBnet node (1/2)
• Hosted by a national academic centre.
• Has national coverage over the Internet.
• Provides services to academics as well as industry (ca. 2000 users per node).
• Maintains local copies of the major biological databases and sequence analysis packages.
An EMBnet node (2/2)
• Provides training and education in the national language.
• Each node typically employs 2-3 staff.
• Each node has at least one major interactive login server and a WWW and ftp server (ca. 300 hosts today).
EMBnet organisation and main tasks in 2000
Organisation chart: Executive Board; Technical Manager; Education & Training; Research & Development; Public relations; EMBnet news
EMBnet membership
EMBnet Milestones (1/2)
• Development of network-based tools for database updates:
– 1987 - First exchange of sequence data between EMBL in Heidelberg and InfoBiogen in Paris.
– 1991 - First implementation of the HASSLE protocol between Norway, Switzerland and Italy. Asynchronous sequence database updates were then possible.
– 1993 - First implementation of xNDT between Norway and Sweden, another solution for asynchronous sequence database updates.
– 1994 - First implementation of SynChron, which is run from the EBI today at several industrial sites.
EMBnet Milestones (2/2)
• Financial
– 1990 - The European Union grants support for EMBnet for the first time.
• Organisation
– 1994 - The Stichting EMBnet is formed, granting EMBnet independence from any major member (e.g. EMBL).
• Software
– 1996 - SRS development was partly financed by EMBnet.
– 1998 - EMBOSS (under the GPL) is developed by EMBnet members.
Latest News
• New MPsrch - Fastest Smith & Waterman searches in the world (1.6 billion cell updates/sec) …available soon.
• Ensembl - fast delivery of newly predicted human genes and gene products into the public domain and access to similarity and homology searches on up-to-date data sets.
• Pre-calculated proteomic comparisons of genomes through InterPro.
• EST clustering, clean-up and redundancy reduction via the EuroGene Index.
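The MPsrch figure above is quoted in "cell updates" per second: a Smith & Waterman search fills one dynamic-programming cell for every pair of query and database residues. A minimal sketch of the recurrence with a linear gap penalty, counting the cells it fills, is shown here. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are illustrative assumptions, not MPsrch's parameters; real protein searches typically use a substitution matrix such as BLOSUM62 and affine gap costs.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (linear gap penalty).

    Returns (score, cells), where cells is the number of dynamic
    programming cells filled, i.e. len(a) * len(b) cell updates.
    """
    prev = [0] * (len(b) + 1)  # previous row of the DP matrix
    best = 0
    cells = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                   # local alignment: never drop below zero
                prev[j - 1] + s,     # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            )
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells

print(smith_waterman("GATTACA", "TAC"))
```

Because the work is one cell per residue pair, search time scales with query length times database size. For example, a 300-residue query against a hypothetical 100-million-residue database needs 3×10^10 cell updates, which at 1.6×10^9 cell updates/sec takes about 19 seconds.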
Some facts and figures…(1/2)
• EMBL-EBI is the main provider of biology-related sequence databases in Europe:
– Sequence databases (EMBL, TrEMBL, Ensembl (the Human Genome), etc.)
– Cartographic databases (RHdb)
– Mutation databases (HGBASE)
– 3D/2D structure databases (PDB, DSSP, etc.)
Some facts and figures…(2/2)
• EMBL-EBI produces more than 50 biological databases.
• EMBL-EBI handles ca. 100K requests/day on www.ebi.ac.uk and 170K requests/day on srs.ebi.ac.uk (about 8M requests/month), increasing at a rate of 15%/month.
• EMBL-EBI is moving more than 200Gb of data across the European networks each month.
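The request figures above can be checked with a little arithmetic. This sketch assumes a 30-day month and reads "15%/month" as compound growth; both are assumptions, not statements from the slides.

```python
import math

WWW_PER_DAY = 100_000  # www.ebi.ac.uk requests/day (from the slide)
SRS_PER_DAY = 170_000  # srs.ebi.ac.uk requests/day (from the slide)
DAYS_PER_MONTH = 30    # assumption: 30-day month
GROWTH = 0.15          # 15% per month, assumed to compound

# Combined monthly volume; close to the ~8M requests/month quoted.
monthly = (WWW_PER_DAY + SRS_PER_DAY) * DAYS_PER_MONTH

# Months for the volume to double under compound 15% monthly growth.
doubling_months = math.log(2) / math.log(1 + GROWTH)

def projected(months):
    """Monthly request volume after `months` of compound growth."""
    return monthly * (1 + GROWTH) ** months

print(f"{monthly:,} requests/month")
print(f"doubling time ~ {doubling_months:.1f} months")
print(f"after 12 months ~ {projected(12):,.0f} requests/month")
```

At 15% per month the load roughly doubles every five months, so a year of sustained growth would mean more than five times the current monthly volume.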
Main usage is...
• Sequence querying and retrieval.
• Sequence comparison and searching.
• File distribution through ftp.ebi.ac.uk.
• Replication of data at many international sites (e.g. EMBnet nodes).
• Systematic use of e-mail-based services.
Contacts
• EMBnet: www.embnet.org
• EBI: www.ebi.ac.uk, corba.ebi.ac.uk, msd.ebi.ac.uk, fly.ebi.ac.uk, industry.ebi.ac.uk, interpro.ebi.ac.uk, etc.
• Ensembl: www.ensembl.org
• EMBL: www.embl-heidelberg.de
• [email protected]@ebi.ac.uk