GUS

We have created the Genomics Unified Schema (GUS), a relational database that warehouses and integrates biological sequence, sequence annotation, and gene expression data from a large number of heterogeneous sources. User-friendly web interfaces present slices of the GUS database and allow researchers to execute structured queries for information concerning gene structure, function, and expression. Please visit poster #146A for details of the Genomics Unified Schema (GUS).


Page 1: GUS


Page 2: GUS

GUS Supports Multiple Projects

AllGenes is based on a comprehensive mouse and human gene index. The genes are approximated by transcripts predicted from EST and mRNA clustering.

PlasmoDB is the official database of the Plasmodium falciparum genome project. It provides an integrated view of genome sequence data, including expression data from EST, SAGE, and microarray projects.

EPConDB is an index of genes expressed in the endocrine pancreas. Expression is defined either through microarray experiments or sequence annotation.

Page 3: GUS

allgenes.org query:

"Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have been localized to mouse chromosome 5?"

http://www.allgenes.org/

This query illustrates several aspects of the GUS database, including:

Data Integration
• RHMap
• GOFunction
• Sequence
• GOFunction assignments

Data Analysis Tools
• Boolean function
• History function
• BLAST

Page 4: GUS

Select the allgenes.org boolean query page

Click on the "AND" button

Page 5: GUS

Choose the RH map and GO function queries

Select mouse chromosome 5 and "transcription factor"

Page 6: GUS

There are 22 mouse RNAs (assemblies) that meet these criteria:

This query result set now appears on the query "history" page:

Page 7: GUS

Now use the BLAST page to identify RNAs similar to my cDNA

The results of the BLAST search appear in the query history

Page 8: GUS

Intersect ("AND") the BLAST search with the previous query:

And we have our answer (the third row on the query history page):
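The history-based workflow above can be sketched in a few lines of Python. This is only an illustration of the idea, assuming each history row is a named set of assembly identifiers and that "AND" is a set intersection; the `QueryHistory` class, its method names, and the identifiers are hypothetical and are not taken from the allgenes.org implementation.

```python
from dataclasses import dataclass, field


@dataclass
class QueryHistory:
    """Toy query history: each row is a named set of assembly IDs (hypothetical)."""
    rows: list = field(default_factory=list)

    def add(self, name, ids):
        """Store a result set and return its 1-based row number."""
        self.rows.append((name, frozenset(ids)))
        return len(self.rows)

    def intersect(self, row_a, row_b):
        """AND two earlier rows and record the result as a new row."""
        name_a, ids_a = self.rows[row_a - 1]
        name_b, ids_b = self.rows[row_b - 1]
        return self.add(f"({name_a}) AND ({name_b})", ids_a & ids_b)


history = QueryHistory()
# Made-up identifiers for illustration, not real allgenes.org results.
boolean_row = history.add("chr5 AND transcription factor", {"A1", "A2", "A3"})
blast_row = history.add("BLAST hits for my cDNA", {"A2", "A9"})
answer_row = history.intersect(boolean_row, blast_row)
answer_name, answer_ids = history.rows[answer_row - 1]
print(answer_row, sorted(answer_ids))
```

Keeping every result as its own row means any earlier query can be reused in a later combination without rerunning it, which mirrors how the answer shows up as a third row after the two simple queries.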

Page 9: GUS

Predicted GO function(s) (some manually reviewed)

Predicted protein

CAP4 assembly

EST expression profile

UCSC BLAT

Other transcripts from the same gene

External links

Mapping information

Protein/motif hits

Gene trap insertions, etc.

Page 10: GUS

PlasmoDB query:

"List all genes whose proteins are predicted to contain a signal peptide and for which there is evidence that they are expressed in Plasmodium falciparum's late schizont stage."

http://plasmodb.org/

This query illustrates several aspects of the GUS database, including:

Data Integration
• Predicted genome translation
• Microarray expression

Data Analysis Tools
• Spot intensity
• History function

Page 11: GUS

Select Text Search from the PlasmoDB homepage

Choose signal peptide

Page 12: GUS

Choose a chromosome and gene/prediction type, then submit

There are 1952 genes with predicted signal peptides

Page 13: GUS

Choose gene expression-microarray from the homepage

Then choose an experiment, chromosome, and Gene/prediction type - submit

There are 12170 gene predictions that satisfy this query

Page 14: GUS

Go to the history page and choose which simple queries to combine. Select intersect.

We have an answer. There are 949 predicted genes that satisfy our complex query

Click on a gene to get a full report
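Because GUS is a relational database, the "intersect" step on the history page can also be pictured as a SQL `INTERSECT` over two result sets. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table names, column names, and gene identifiers are assumptions made for illustration and do not come from the GUS schema or from PlasmoDB.

```python
import sqlite3

# In-memory database with two hypothetical result tables (not the GUS schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signal_peptide_genes (gene_id TEXT PRIMARY KEY);
    CREATE TABLE late_schizont_expressed (gene_id TEXT PRIMARY KEY);
    INSERT INTO signal_peptide_genes VALUES ('g1'), ('g2'), ('g3');
    INSERT INTO late_schizont_expressed VALUES ('g2'), ('g3'), ('g4');
""")

# Genes present in both simple query results.
rows = conn.execute("""
    SELECT gene_id FROM signal_peptide_genes
    INTERSECT
    SELECT gene_id FROM late_schizont_expressed
    ORDER BY gene_id
""").fetchall()
matching = [gene_id for (gene_id,) in rows]
print(matching)
conn.close()
```

With this toy data the query returns the two identifiers shared by both tables; the same pattern would apply to any pair of simple queries chosen from the history page.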

Page 15: GUS

There is a variety of information available from the report page including:

Gene models predicted using a variety of approaches

and mRNA and protein predictions

Page 16: GUS

EPConDB query:

"Which DOTS assemblies (RNA) represented on the Endocrine Pancreas Consortium’s chip 2.0 are constituents of the insulin initiated signal transduction pathway?"

http://www.cbil.upenn.edu/EPConDB

Data Integration
• Sequence
• Microarray experiment
• Transduction pathway

Data Analysis Tools
• BLAST
• History function

Page 17: GUS
Page 18: GUS

Go to the gene information query page and click on “DOTS assemblies involved in a pathway”

Page 19: GUS

Choose the insulin pathway, a p-value, pancreas, the species, and whether an assembly must include an mRNA - submit

There are 59 DOTS assemblies that are constituents of the insulin pathway

Page 20: GUS

Return to the gene information query page and select clone sets. Choose chip 2.0 - submit

There are 3242 assemblies represented on chip 2.0

Page 21: GUS

Go to the history page, select the queries to combine and select intersect – view the results

There are 8 assemblies that satisfy the complex query. Clicking on an RNA retrieves an allgenes report.

Page 22: GUS
Page 23: GUS

Acknowledgements and References

The Plasmodium Genome Consortium
• Sanger: http://www.sanger.ac.uk/Projects/P_falciparum
• TIGR/NMRC: http://www.tigr.org/tdb/edb2/pfal/htmls
• Stanford: http://sequence-www.stanford.edu/group/malaria/

The many researchers who have contributed data and software to the database

Funding Agencies: National Institutes of Health, Wellcome Trust, US Dep’t of Defense, Burroughs Wellcome Fund, World Health Organization, etc.

The research community, which has supported these large-scale ventures for the benefit of all

References

1. K2/Kleisli and GUS: Experiments in integrated access to genomic data sources (2001) Davidson, S.B., J. Crabtree, B.P. Brunk, J. Schug, V. Tannen, G.C. Overton and C.J. Stoeckert, Jr. IBM Systems Journal 40(2):512-531

2. A relational schema for both array-based and SAGE gene expression experiments (2001) Stoeckert, C., A. Pizarro, E. Manduchi, M. Gibson, B. Brunk, J. Crabtree, J. Schug, S. Shen-Orr and G.C. Overton. Bioinformatics 17(4):300-308

3. The GUS schema is available at http://www.allgenes.org/cgi-bin/schemaBrowser.pl

4. The RAD schema is available at http://www.cbil.upenn.edu/cgi-bin/RAD2/schemaBrowserRAD.pl

Page 24: GUS
Page 25: GUS

Acknowledgements

Funding: National Institutes of Health, Wellcome Trust, US Dep’t of Defense, Burroughs Wellcome Fund, World Health Organization, etc.

EPConDB is part of the NIDDK-sponsored consortium on "Functional Genomics of the Developing Endocrine Pancreas". We gratefully acknowledge support through NIDDK 56947 and 56954 with cosponsorship from the JDFI.

Funding for allgenes.org is provided by NIH grant R01-HG-01539-03 and DOE grant DE-FG02-00ER62893.


Page 26: GUS

References

Bahl, A., Brunk, B., Coppel, R.L., Crabtree, J., Diskin, S.J., Fraunholz, M.J., Grant, G.R., Gupta, D., Huestis, R.L., Kissinger, J.C., Labo, P., Li, L., McWeeney, S.K., Milgram, A.J., Roos, D.S., Schug, J., Stoeckert, C.J. (2002) PlasmoDB: The Plasmodium Genome Resource. An integrated database providing tools for accessing and analyzing mapping, expression and sequence data (both finished and unfinished). Nucleic Acids Res. 30: 87-90.

Davidson, S.B., Crabtree, J., Brunk, B.P., Schug, J., Tannen, V., Overton, G.C., Stoeckert, C.J. Jr. (2001) K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources. IBM Systems Journal 40(2): 512-531.

Scearce, L.M., Brestelli, J.E., McWeeney, S.K., Lee, C.S., Mazzarelli, J., Pinney, D.F., Pizarro, A., Stoeckert, C.J. Jr., Clifton, S., Permutt, M.A., Brown, J., Melton, D.A., Kaestner, K.H. (2002) Functional Genomics of the Endocrine Pancreas: The Pancreas Clone Set and PancChip, New Resources for Diabetes Research. Diabetes 51: 1997-2004.

The Plasmodium Genome Database Collaborative (2001) PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data. Nucleic Acids Res. 29(1): 66-69.

Page 27: GUS