GUS
We have created the Genomic Unified Schema (GUS), a relational database that warehouses and integrates biological sequence, sequence annotation, and gene expression data from a large number of heterogeneous sources. User-friendly web interfaces present slices of the GUS database and allow researchers to execute structured queries for information concerning gene structure, function, and expression. Please visit poster #146A for details of the Genomic Unified Schema (GUS).
GUS Supports Multiple Projects
AllGenes
PlasmoDB
EPConDB
AllGenes is based on a comprehensive mouse and human gene index. The genes are approximated by transcripts predicted from EST and mRNA clustering.
PlasmoDB is the official database of the Plasmodium falciparum genome project. It provides an integrated view of genome sequence data, including expression data from EST, SAGE, and microarray projects.
EPConDB is an index of genes expressed in endocrine pancreas. Expression is defined either through microarray experiments or sequence annotation.
"Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have been localized to mouse chromosome 5?"
http://www.allgenes.org/
This query illustrates several aspects of the GUS database, including:
Data Integration
• RH map
• GO function assignments
• Sequence
Data Analysis Tools
• Boolean function
• History function
• BLAST
allgenes.org query
Select the allgenes.org boolean query page
Click on the "AND" button
Choose the RH map and GO function queries
Select mouse chromosome 5 and "transcription factor"
There are 22 mouse RNAs (assemblies) that meet these criteria:
This query result set now appears on the query "history" page:
Now use the BLAST page to identify RNAs similar to my cDNA
The results of the BLAST search appear in the query history
Intersect ("AND") the BLAST search with the previous query:
And we have our answer (the third row on the query history page):
[Figure: report page for a resulting RNA, with labels for:]
• Predicted GO function(s) (some manually reviewed)
• Predicted protein
• CAP4 assembly
• EST expression profile
• UCSC BLAT
• Other transcripts from the same gene
• External links
• Mapping information
• Protein/motif hits
• Gene trap insertions, etc.
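The allgenes.org walkthrough above combines saved result sets on the history page. The following is only a minimal sketch of that idea, not the site's implementation: the `QueryHistory` class, its methods, and the assembly identifiers are all hypothetical, and each query result is modeled as a plain set of IDs.

```python
from dataclasses import dataclass, field


@dataclass
class QueryHistory:
    """Hypothetical stand-in for the query history page: an ordered list of result sets."""
    rows: list = field(default_factory=list)

    def add(self, label, ids):
        # Store the result set and return its 1-based row number.
        self.rows.append((label, frozenset(ids)))
        return len(self.rows)

    def intersect(self, row_a, row_b):
        # "AND" two earlier rows; the combined result becomes a new row.
        label_a, ids_a = self.rows[row_a - 1]
        label_b, ids_b = self.rows[row_b - 1]
        return self.add(f"({label_a}) AND ({label_b})", ids_a & ids_b)


history = QueryHistory()
# Made-up assembly identifiers, for illustration only.
tf_on_chr5 = history.add("chr5 AND transcription factor", {"A1", "A2", "A3"})
blast_hits = history.add("BLAST: similar to my cDNA", {"A2", "A9"})
answer = history.intersect(tf_on_chr5, blast_hits)

label, ids = history.rows[answer - 1]
print(answer, label, sorted(ids))
```

Because every combined query is appended as a new row, the answer lands in the third row, mirroring the walkthrough, and it can itself be reused in later intersections.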
"List all genes whose proteins are predicted to contain a signal peptide and for which there is
evidence that they are expressed in Plasmodium falciparum's late schizont stage."
http://plasmodb.org/
This query illustrates several aspects of the GUS database, including:
Data Integration
• Predicted genome translation
• Microarray expression
Data Analysis Tools
• Spot intensity
• History function
PlasmoDB query:
Select Text Search from the PlasmoDB homepage
Choose signal peptide
Choose a chromosome and Gene/prediction type, then submit
There are 1952 genes with predicted signal peptides
Choose gene expression-microarray from the homepage
Then choose an experiment, chromosome, and Gene/prediction type, and submit
There are 12170 gene predictions that satisfy this query
Go to the history page and choose which simple queries to combine. Select intersect.
We have an answer. There are 949 predicted genes that satisfy our complex query
Click on a gene to get a full report
There is a variety of information available from the report page including:
Gene models predicted using a variety of approaches
and mRNA and protein predictions
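Since GUS is a relational database, the PlasmoDB "intersect" step above could in principle be expressed as a SQL `INTERSECT` between two simple queries. The sketch below uses Python's built-in `sqlite3` module with two hypothetical tables and made-up gene IDs; it does not reflect the actual GUS schema or how PlasmoDB runs its queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables for illustration; the real GUS schema differs.
    CREATE TABLE signal_peptide (gene_id TEXT PRIMARY KEY);
    CREATE TABLE stage_expression (gene_id TEXT, stage TEXT);
""")
conn.executemany(
    "INSERT INTO signal_peptide VALUES (?)",
    [("g1",), ("g2",), ("g3",)],
)
conn.executemany(
    "INSERT INTO stage_expression VALUES (?, ?)",
    [("g2", "late schizont"), ("g3", "ring"), ("g4", "late schizont")],
)

# Genes with a predicted signal peptide AND evidence of expression in the chosen stage.
rows = conn.execute(
    """
    SELECT gene_id FROM signal_peptide
    INTERSECT
    SELECT gene_id FROM stage_expression WHERE stage = ?
    ORDER BY gene_id
    """,
    ("late schizont",),
).fetchall()
genes = [row[0] for row in rows]
print(genes)
```

The intersection can never be larger than the smaller input, which is consistent with the counts reported above (949 combined results from sets of 1952 and 12170).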
"Which DOTS assemblies (RNA) represented on the Endocrine Pancreas Consortium’s chip 2.0 are constituents of the insulin initiated signal transduction pathway ?"
EPConDB query:
http://www.cbil.upenn.edu/EPConDB
Data Integration
• Sequence
• Microarray experiment
• Transduction pathway
Data Analysis Tools
• BLAST
• History function
Go to the gene information query page and click on “DOTS assemblies involved in a pathway”
Choose the insulin pathway, a p-value, pancreas, the species, and whether an assembly must include an mRNA, then submit
There are 59 DOTS assemblies that are constituents of the insulin pathway
Return to the gene information query page and select clone sets. Choose chip 2.0, then submit
There are 3242 assemblies represented on chip 2.0
Go to the history page, select the queries to combine, select intersect, and view the results
There are 8 assemblies that satisfy the complex query. Clicking on an RNA retrieves an allgenes report.
Acknowledgements and References
The Plasmodium Genome Consortium
• Sanger http://www.sanger.ac.uk/Projects/P_falciparum
• TIGR/NMRC http://www.tigr.org/tdb/edb2/pfal/htmls
• Stanford http://sequence-www.stanford.edu/group/malaria/
The many researchers who have contributed data and software to the database
Funding Agencies: National Institutes of Health, Wellcome Trust, US Dep’t of Defense, Burroughs Wellcome Fund, World Health Organization, etc.
The research community who has supported these large-scale ventures for the benefit of all
References
1. K2/Kleisli and GUS: Experiments in integrated access to genomic data sources (2001) Davidson, S.B., J. Crabtree, B.P. Brunk, J. Schug, V. Tannen, G.C. Overton and C.J. Stoeckert, Jr. IBM Systems Journal 40(2):512-531
2. A relational schema for both array-based and SAGE gene expression experiments (2001) Stoeckert, C., A. Pizarro, E. Manduchi, M. Gibson, B. Brunk, J. Crabtree, J. Schug, S. Shen-Orr and G.C. Overton. Bioinformatics 17(4):300-308
3. The GUS schema is available at http://www.allgenes.org/cgi-bin/schemaBrowser.pl
4. The RAD schema is available at http://www.cbil.upenn.edu/cgi-bin/RAD2/schemaBrowserRAD.pl
Funding Acknowledgements
EPConDB is part of the NIDDK-sponsored consortium on "Functional Genomics of the Developing Endocrine Pancreas". We gratefully acknowledge support through NIDDK 56947 and 56954 with cosponsorship from the JDFI.
Funding for allgenes.org is provided by NIH grant RO1-HG-01539-03 and DOE grant DE-FG02-00ER62893.
References
Bahl, A., Brunk, B., Coppel, R.L., Crabtree, J., Diskin, S.J., Fraunholz, M.J., Grant, G.R., Gupta, D., Huestis, R.L., Kissinger, J.C., Labo, P., Li, L., McWeeney, S.K., Milgram, A.J., Roos, D.S., Schug, J., Stoeckert, C.J. (2002) PlasmoDB: The Plasmodium Genome Resource. An integrated database providing tools for accessing and analyzing mapping, expression and sequence data (both finished and unfinished). Nucleic Acids Res. 30: 87-90.
Davidson, S.B., Crabtree, J., Brunk, B.P., Schug, J., Tannen, V., Overton, G.C., Stoeckert, C.J. Jr. (2001) K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources. IBM Systems Journal 40(2): 512-531.
Scearce, L.M., Brestelli, J.E., McWeeney, S.K., Lee, C.S., Mazzarelli, J., Pinney, D.F., Pizarro, A., Stoeckert, C.J. Jr., Clifton, S., Permutt, M.A., Brown, J., Melton, D.A., Kaestner, K.H. (2002) Functional Genomics of the Endocrine Pancreas: The Pancreas Clone Set and PancChip, New Resources for Diabetes Research. Diabetes 51: 1997-2004.
The Plasmodium Genome Database Collaborative (2001) PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data. Nucleic Acids Res. 29(1): 66-69.