The Saccharomyces cerevisiae genome on the world wide web



The Saccharomyces cerevisiae genome sequencing project provides an excellent illustration of how the World Wide Web (WWW) can be used to share current information among laboratories on a global scale. Users and Web-page authors can benefit from this arrangement. We found that WWW-publishing details of sequencing projects (Ref. 1) deferred many potentially time-consuming enquiries. In this article we provide an overview of the most prominent sources of information that relate to the genome of S. cerevisiae.

From sequencing to databases

The nucleotide sequence databases, EMBL (Ref. 2), GenBank (Ref. 3) and DDBJ (Ref. 4), remain the major entry point for annotated sequence data into the public domain. Yeast genome project data were mostly submitted per cosmid as they were finished and later assembled into contiguous chromosomal sequences.
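The step from per-cosmid submissions to a contiguous chromosomal sequence can be pictured with a toy example. The sketch below is only an illustration under strong assumptions: the three short sequences are invented, the function names are hypothetical, and real assembly must cope with sequencing errors, repeats and clone orientation, none of which is handled here.

```python
def merge_overlapping(left, right, min_overlap=4):
    """Join two sequences where a suffix of `left` equals a prefix of `right`.

    The longest exact overlap is used; a ValueError is raised if no overlap
    of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left[-size:] == right[:size]:
            return left + right[size:]
    raise ValueError("no overlap of at least %d bases" % min_overlap)


def assemble(clones, min_overlap=4):
    """Merge an ordered list of overlapping clone sequences into one contig."""
    contig = clones[0]
    for clone in clones[1:]:
        contig = merge_overlapping(contig, clone, min_overlap)
    return contig


# Invented example sequences standing in for three overlapping cosmids.
cosmid_a = "ATGGCGTACGTTAGC"
cosmid_b = "GTTAGCCTAGGATCA"
cosmid_c = "GGATCATTTGACCA"

contig = assemble([cosmid_a, cosmid_b, cosmid_c])
print(contig)
```

The clones are assumed to be supplied in chromosomal order and on the same strand, which is what makes the simple left-to-right merge sufficient.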

The European consortium of approximately 100 laboratories submitted their data for analysis to the Martinsried Institute for Protein Sequences (MIPS) (Ref. 5), the informatics co-ordinator for European Commission Genome Projects. Sequences sent to MIPS were password protected for a period of time to allow commercial exploitation by the Yeast Industry Platform, before submission to the major sequence databases. All sequence data have now been released and the complete sequences are available by FTP (Refs 6-8). Data can be accessed through MIPS by chromosome in graphical (xChromo) or text mode. Each open reading frame (ORF) is annotated and linked to a FASTA alignment and a Yeast Protein Database (YPD) entry. Other entry points to the MIPS database are through queries of Protein Information Resource (PIR) or YPD protein databases. Aligned ORFs and translated sequences from the human EST (expressed sequence tag) database can also be viewed. Genome-Browser allows a user-configurable graphical view of genome redundancy.

The next goal will be to make a non-redundant compilation of the protein coding sequences annotated and classified into functional categories. This could take some time and it might be that the specialized S. cerevisiae databases will have the annotation more quickly than the all-encompassing protein databases, such as SWISS-PROT and the PIR-International database.

Amos Bairoch's SWISS-PROT protein database provides an excellently annotated nonredundant collection of S. cerevisiae protein sequences. The fungal division of SWISS-PROT species-specific documents (Ref. 9) contains indices to SWISS-PROT records by yeast chromosome and systematic ORF names. There is also an index by gene name (Ref. 10) with cross-references to the Saccharomyces Genome Database (SGD) and LISTA (see below) (Ref. 11). The PIR-International protein database is a joint venture between the National Biomedical Research Foundation (USA), the International Protein Information Database in Japan and MIPS. PIR is also a nonredundant annotated protein database. A subset of PIR S. cerevisiae records can be searched on ten fields and between ranges of molecular weight or number of residues (Ref. 12).

Box 1. Other resources

• The Sanger Centre Yeast Genome BLAST Server (Ref. 1).

• A list of yeast researchers' email addresses is available by FTP (Ref. 23) or can be searched, amended, downloaded or browsed by gopher (Ref. 24).

• Vectordb (Ref. 25) contains a subset of vectors for yeast research.

• The ATCC collection of yeast strains and clones (Ref. 26).

• Two-dimensional gel protein databases of S. cerevisiae and the Quest II software system for construction and analysis of two-dimensional gel protein databases (Ref. 27).

• NIH Campus Yeast Interest Group and their scientific interests (Ref. 28).

• Methods and references to clone genes on the basis of protein-protein interactions, from Roger Brent's laboratory at Massachusetts General Hospital (Ref. 29).

• Dan Gottschling's laboratory at the University of Chicago (Ref. 30) has a page devoted to yeast protocols.

• The Gietz Laboratory (Ref. 31) produces the intrepidly titled 'Definitive Yeast Transformation Home Page'.

• Details of yeast books published by Cold Spring Harbor Laboratory Press (Ref. 32).

• The yeast molecular biology newsgroup (Ref. 33) is a forum for announcements and discussion. Postings can vary from sublime to ridiculous. Archives of newsgroup discussions can also be searched (Refs 34, 35).

TIG July 1996 Vol. 12 No. 7. Copyright © 1996 Elsevier Science Ltd. All rights reserved.

LISTA is a compilation of nucleotide sequences encoding proteins from the yeasts S. cerevisiae, S. carlsbergensis and S. uvarum. LISTAHON and LISTAHOP are the nucleotide or protein homology sections of the LISTA database, respectively. LISTA is available by FTP (Ref. 11) and is searchable with Sequence Retrieval System (Ref. 13).

The Yeast Protein Database provides a non-redundant listing of yeast proteins with emphasis on their physical and functional properties. In addition to tabular data, YPD contains key phrases describing functions, mutant phenotypes and participation with other proteins in complexes. YPD is available through the WWW (Ref. 14) or by FTP (Ref. 15) as ASCII, formatted and Excel spreadsheet versions.

GeneQuiz (Ref. 16) is an automated analysis system that predicts biochemical function from protein sequences. An analysis of 6613 S. cerevisiae protein sequences took only 72 hours to perform on a powerful Silicon Graphics computer and was named GeneCrunch (Ref. 17). A functional assignment together with a measure of confidence in each prediction is presented on the WWW in tables generated by user queries. Alignments to database sequences can be viewed and database records retrieved.

The Saccharomyces Genome Database (Ref. 18) project is located in the Department of Genetics at the School of Medicine, Stanford University, and is led by David Botstein and J. Michael Cherry. SGD provides a wide range of facilities and is a comprehensive resource for yeast researchers. SGD achieves a seamless interface to SacchDB (Ref. 19: an ACeDB-type database of S. cerevisiae information), HTML documents, Gopher, FTP and a BLAST server. SacchDB contains gene and protein product annotations, results of the completed chromosomal sequencing projects, nucleotide sequences contained within GenBank, literature references and abstracts, physical maps and genetic maps, including the underlying tetrad data. Much of the functionality of the SacchDB database system is retained. Searching is possible within a specified data class (e.g. Author or Locus), through all database text entries, with tools for complex queries or by simple browsing. Browsing of physical and genetic maps is supported by clickable images.


GENETWORK

SGD is responsible for assigning gene symbols for S. cerevisiae. The Gene Registry is available in text or tab-delimited formats for loading into a spreadsheet. A form is available to register gene names; the actual choice of symbol is guided by 'an amalgam of consensus, literature usage, clarity relative to function and priority in the literature'.
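A tab-delimited file of this kind can also be read by a short script rather than a spreadsheet. The following is a minimal sketch, assuming a hypothetical three-column layout with a header row; the column names and the two sample rows are illustrative and are not taken from the actual Gene Registry file.

```python
import csv
import io

# Hypothetical layout: the real Gene Registry columns may differ.
SAMPLE_REGISTRY = (
    "symbol\tsystematic_name\tdescription\n"
    "ACT1\tYFL039C\tactin\n"
    "CDC28\tYBR160W\tcyclin-dependent kinase\n"
)


def load_registry(handle):
    """Read a tab-delimited registry into a dict keyed by gene symbol."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["symbol"]: row for row in reader}


def is_registered(registry, symbol):
    """Return True if the symbol is already present (case-insensitive)."""
    return symbol.upper() in registry


registry = load_registry(io.StringIO(SAMPLE_REGISTRY))
print(registry["ACT1"]["systematic_name"])
print(is_registered(registry, "cdc28"))
```

For a downloaded file, `io.StringIO(SAMPLE_REGISTRY)` would be replaced by an open file handle; a lookup such as `is_registered` is one way to check a proposed symbol before filling in the registration form.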

SGD allows S. cerevisiae GenBank and SWISS-PROT sequences to be searched on text fields. BLAST or FASTA searches of DNA or protein can be performed by pasting sequence into a WWW form. Databases of S. cerevisiae nucleotide sequences, completed chromosomes (genoSc) or a nonredundant protein database (NRSC) are available to search against. Other features of SGD include information on the Olson-Riles physical map, aligned genetic and physical maps, conference announcements and news, codon usage tables, biology and molecular biology links, and a yeast Virtual Library. The SGD Virtual Library (Ref. 20) contains links to many of the information resources listed here.
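As an illustration of what a codon usage table records, the sketch below counts codons in a single in-frame coding sequence and expresses them per thousand codons. It is a simplified example under stated assumptions: the short sequence is invented, the function names are hypothetical, and the tables offered by SGD are compiled from many genes rather than one.

```python
from collections import Counter


def codon_usage(orf):
    """Count codons in an in-frame coding sequence (length divisible by 3)."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return Counter(orf[i:i + 3] for i in range(0, len(orf), 3))


def per_thousand(counts):
    """Convert raw codon counts to frequencies per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Invented six-codon ORF: ATG GCT GCT AAA GCT TAA
usage = codon_usage("ATGGCTGCTAAAGCTTAA")
frequencies = per_thousand(usage)
print(usage.most_common(1))
print(frequencies["GCT"])
```

Summing the counters from many ORFs (Counter objects support `+`) before calling `per_thousand` would give a genome-wide table of the same shape.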

XREFdb (Ref. 21) is an initiative to cross-reference the proteins of model organisms with those of mammals. The project compares the predicted proteins determined in genome projects with conceptual translations of cDNA sequences from the Expressed Sequence Tag (EST) database using the TBLASTX program. In addition, those ESTs that show high similarity to S. cerevisiae proteins are mapped on to human and mouse. This generates links between putative homologues that can then be tested for functional equivalence. In this way, experimental models of human genetic diseases can be established. Examples of pages devoted to groups, techniques, reagents and other miscellaneous topics are briefly summarized in Box 1.
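The 'conceptual translations' used in the XREFdb comparison can be illustrated by translating a nucleotide sequence in all six reading frames, which is the step that lets a cDNA of unknown frame and strand be compared at the protein level. The sketch below shows only that translation step with the standard genetic code; it is not the TBLASTX program, performs no similarity search, and the input sequence and function names are invented for the example.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


def six_frames(dna):
    """Translate both strands in three frames each, keyed '+1'..'-3'."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames["%s%d" % (strand, offset + 1)] = translate(seq[offset:])
    return frames


frames = six_frames("ATGGCCAAATAG")
for name in sorted(frames):
    print(name, frames[name])
```

Stop codons are shown as `*`. A search program would then score each of these translations against the protein set of interest.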

Sean Walsh

Bart Barrell

The Sanger Centre, Hinxton Hall, Hinxton, Cambridge, UK CB10 1RQ.

References and URLs

1 http://www.sanger.ac.uk/yeast/home.html
2 http://www.embl-ebi.ac.uk/
3 http://www.ncbi.nlm.nih.gov/
4 http://www.nig.ac.jp/
5 http://www.mips.biochem.mpg.de/
6 ftp://ftp.ebi.ac.uk/pub/databases/yeast/
7 ftp://mips.embnet.org/yeast/
8 ftp://genome-ftp.stanford.edu/yeast/genome_seq
9 http://expasy.hcuge.ch/sprot/sp-docu.html
10 http://expasy.hcuge.ch/cgi-bin/lists?yeast+txt
11 ftp://ftp.ebi.ac.uk/pub/databases/lista/
12 http://www.gdb.org/Dan/fields/pir/species/saccharomycescerevisiae.html
13 http://www.sanger.ac.uk/srs/srsc
14 http://quest7.proteome.com/YPDhome.html
15 ftp://isis.cshl.org/pub/yeast/YPD/
16 http://genecrunch.sgi.ch/GQdescription.html
17 http://genecrunch.sgi.ch/
18 http://genome-www.stanford.edu/Saccharomyces/
19 ftp://genome-ftp.stanford.edu/pub/yeast/SacchDB/
20 http://genome-www.stanford.edu/VL-yeast.html
21 Reeves, R., Goebl, M. and Hieter, P. (1995) Trends Genet. 11, 372-373
22 http://www.ncbi.nlm.nih.gov/XREFdb/
23 ftp://ncbi.nlm.nih.gov/repository/yeast/yeaster.lst
24 gopher://gopher.gdb.org:70/11/biol-search/yeast-email
25 http://biology.queensu.ca/~misenera/vector.html
26 http://www.atcc.org/
27 http://siva.cshl.org/#2dgel
28 http://www.nih.gov/sigs/yeast/index.html
29 http://xanadu.mgh.harvard.edu/brentlabweb/interactiontrap.html
30 http://http.bsd.uchicago.edu/~gottschling/homepage.html
31 http://www.umanitoba.ca/faculties/medicine/human_genetics/gietz/Trafo.html
32 http://www.cshl.org/books/mcbory_saccharomyces.html
33 news:bionet.molbio.yeast (assuming nntp server is set)
34 http://genome-www.stanford.edu/cgi-bin/biosci_yeast/
35 http://www.bio.net/

MEETING REPORTS

Nuclear receptor superfamily reunion

KEYSTONE SYMPOSIUM: NUCLEAR RECEPTOR SUPERFAMILY, LAKE TAHOE, CALIFORNIA, USA, 17-23 MARCH 1996.

The meeting, organized by Kathryn Horwitz, John Cidlowski and Ron Evans, dealt with the structure, function and physiological role of this family of ligand-dependent transcription factors, which now number some 150 different members. Many significant advances made since the last Keystone Symposium on this topic in 1994 were presented at the meeting. These included the identification and characterization of potential coactivators and co-repressors, the determination of the crystal structure of the ligand-binding domains of four receptors, new insights into the role of peroxisome-proliferator-activated receptors and the effects of knocking out the genes for several types of receptors in mice.

Perhaps the most remarkable report was the announcement made by George Kuipers and Jan-Åke Gustafsson (Karolinska Inst., Sweden) that there is a second oestrogen receptor that they call ERβ, in addition to the original (which must now be called ERα). Preliminary analysis indicates that their ligand-binding and target-gene specificity are similar, but their relative tissue distribution is different. The importance of ERα for fertility in males and females has already been established by Ken Korach and co-workers, and so the result of knocking out ERβ is eagerly awaited.

The topic discussed most frequently, however, was undoubtedly that of receptor-interacting proteins and their role in modulating transcriptional activity of nuclear receptors. Before ligand binding, the transcriptional activity of many receptors is suppressed by proteins that bind to their ligand-binding domains. Heat shock proteins serve this function in the case of steroid hormone receptors, while a distinct group of proteins appear to repress the activity of the retinoic acid and thyroid hormone receptors. Several such co-repressors that are related to one another were described, namely SMRT (Ron Evans, Salk Inst., USA), TRACs (Steven Sande and Martin Privalsky, Univ. of California at Davis, USA) and NCoR (Riki Kurokawa and Andreas Hörlein, Univ. of California at San Diego, USA). One of these co-repressors, NCoR, was also found to bind to the progesterone receptor in the presence of the hormone antagonist RU486, suggesting that the inhibition of transcription was not merely a failure to adopt the appropriate active conformation (Katherine Horwitz, Univ. of Colorado School of Medicine, USA).

Following hormone binding, a surprising number of proteins have been found to

PII: S0168-9525(96)30051-6