Generic Database
What should a genome database do?
Search, browse, collect
Download results in multiple formats
Genome browser
Information: genomic, proteomic, literature
Interact with other databases
Generic
Usable by everyone
GeneDB – An Overview
Aim – To provide a database to house the data from the many sequencing projects that the Sanger Institute has been involved in. The database had to be:
Generic, flexible enough to handle sequence from diverse organisms
Curatable, capable of being manually edited by annotators and curators
Intuitive and user friendly
Capable of housing new data types, easily expandable
Searchable, allow users complete flexibility in searching, selecting and downloading whatever information they want
Interactive, community feedback
GeneDB November 2004 - Datasets (www.genedb.org)
Total number of organisms – 26
Number of protozoa – 12
Kinetoplastids
Species                    Genome size   Status                 Curated
Leishmania major           33600         In Finishing           Yes
Leishmania infantum        33600         280k reads 5 X         Yes
Trypanosoma b. brucei      35000         In Finishing           Yes
Trypanosoma vivax          30000         300k reads ~6 X        Yes
Trypanosoma cruzi          ~41000        In Finishing 19 X      No?
Leishmania braziliensis    ~33600        361k reads 5 X         Yes
Trypanosoma congolense     ~30000        262k reads ~5 X        Yes
Trypanosoma b. gambiense   ~30000        188k reads ~5 X        Yes
www.genedb.org
a) Basic information – on the selected gene
b) Location – The chromosome number, coordinates, gene length and a graphical map
c) Curated and/or automatic annotation
d) Predicted peptide properties – statistics on the predicted protein, known or predicted domains and motifs
e) Gene Ontology – Annotation using the GO controlled vocabulary
f) Database cross references – linked to other public databases
g) Curated orthologs – database links to manually selected orthologous genes
h) Similarity information and the respective database links
i) Swiss-Prot annotations – for this protein and keywords
j) Contact – feedback forms for curators and technical queries
Orthologs and Paralogs in GeneDB
Tri-tryp orthologs: predicted by clustering and reciprocal BLAST
Paralogs or families: predicted using BLASTP and TribeMCL, 4 BLAST E-value cutoffs
TribeMCL: Enright A.J., Van Dongen S., Ouzounis C.A. Nucleic Acids Res. 30(7):1575-1584 (2002)
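The reciprocal BLAST step can be illustrated with a minimal sketch of reciprocal best hits. It is not GeneDB's actual pipeline: the hit tuples, the gene IDs, and the use of bit score as the ranking criterion are all assumptions made for illustration, and the clustering step is not shown.

```python
from typing import Dict, Iterable, List, Tuple

# A hit is (query_id, subject_id, bit_score); this layout is an assumption,
# e.g. columns pulled from tabular BLAST output.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Iterable[Hit],
                         b_vs_a: Iterable[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical gene IDs, for illustration only.
a_vs_b = [("lm1", "tb1", 200.0), ("lm1", "tb2", 150.0), ("lm2", "tb2", 90.0)]
b_vs_a = [("tb1", "lm1", 198.0), ("tb2", "lm1", 140.0)]
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here only lm1 and tb1 pick each other as best hit, so they form the one candidate ortholog pair; lm2 points at tb2, but tb2 prefers lm1, so that pair is rejected.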
Help
(http://godatabase.org/cgi-bin/go.cgi?query=GO%3A0006166)
Sequence viewer and annotation tool
How to access data:
• keyword searching
• sequence searching/ motif search
• complex querying
• browsable catalogues, product, domain
• browsable contig/chromosome maps
• GO (gene ontology) - AmiGO
• across species
Searching GeneDB
Simple Query
Sequence search / analysis
Browse Catalogues
Chromosome/contig maps
OMNIBLAST
Searches multiple datasets over multiple organisms; uses more than one BLAST algorithm if appropriate
Produces an intermediate results page listing a summary of the top 5 hits of all searches
If a protein sequence is used, also displays predicted Pfam protein families found
Full BLAST search results can be accessed from the intermediate page
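The intermediate results page can be pictured as a per-search summary that keeps only the top 5 hits. The sketch below assumes hits arrive as (search label, subject ID, score) tuples; that layout, the labels, and the function name are hypothetical and do not describe OMNIBLAST's internals.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (search_label, subject_id, score); the layout is assumed for illustration.
SearchHit = Tuple[str, str, float]


def summarise_top_hits(hits: Iterable[SearchHit],
                       n: int = 5) -> Dict[str, List[Tuple[str, float]]]:
    """Group hits by search and keep the n highest-scoring hits of each."""
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for search, subject, score in hits:
        grouped[search].append((subject, score))
    return {
        search: heapq.nlargest(n, subject_hits, key=lambda hit: hit[1])
        for search, subject_hits in grouped.items()
    }


# Hypothetical searches and subject IDs.
hits = [("blastp:organismA", f"geneA{i}", float(i * 10)) for i in range(1, 8)]
hits += [("blastp:organismB", "geneB1", 55.0), ("blastp:organismB", "geneB2", 75.0)]
summary = summarise_top_hits(hits)
for search, top in summary.items():
    print(search, top)
```

Each search keeps its own list, so a dataset with few hits is not crowded out by one with many.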
Complex querying
Complex querying with Boolean search tool
Cross-species search for nucleoside transporter
By name or ID
By product
By protein domain
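Boolean querying of this kind amounts to combining result lists with set operations. The sketch below is an assumption about how such a tool could work, not GeneDB's code: the gene IDs, the result sets, and the `combine` helper are hypothetical.

```python
from typing import Set


def combine(left: Set[str], op: str, right: Set[str]) -> Set[str]:
    """Combine two result sets with a Boolean operator (AND, OR, NOT)."""
    if op == "AND":
        return left & right   # genes found by both queries
    if op == "OR":
        return left | right   # genes found by either query
    if op == "NOT":
        return left - right   # genes in the first result but not the second
    raise ValueError(f"unknown operator: {op}")


# Hypothetical results of three simple queries across species.
by_product = {"gene1", "gene2", "gene3"}   # product matches "nucleoside transporter"
by_domain = {"gene2", "gene3", "gene4"}    # carries a chosen protein domain
by_name = {"gene3"}                        # matched by name or ID

both = combine(by_product, "AND", by_domain)
refined = combine(both, "NOT", by_name)
print(sorted(both), sorted(refined))
```

Keeping each intermediate result as a named set also gives a natural query history that later steps can reuse.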
AmiGO – local Gene Ontology (GO) browser
Proteomics Tool
Select the dataset
Select restriction enzyme
Enter peptide mass data
Protein motif search
Data downloads
Any search result that gives a list
History of any Boolean queries
Contiguous sequence
Generate download list by adding to gene basket
Leishmania major stats
Trypanosoma brucei stats
Gene Naming
• GeneDB reference guide
• Papers: Trends in Parasitology, 2002, 18(10):465-67; January 2004 issue of Nucleic Acids Research
• Feedback forms for technical and biological queries
More information
• http://www.genedb.org/