World wide web resources for the biologist


The World Wide Web is currently the major networking resource for biologists. It has passed Gopher and simple electronic mail (email) servers in popularity. In the 1990s, the advent of client-server software will be the main driving force in bioinformatics. During the past few years, biologists have used the Internet increasingly to distribute data, and the methods of doing this have become more and more sophisticated as the speed with which network links can be made has increased.

It appears that the commercial world has just discovered the Internet, and soon everyone who has a computer on the Internet will want to become a publisher. Andy Warhol stated that in the future everyone would be famous for 15 minutes. With the advent of the World Wide Web (WWW), that prophecy is about to become true. Individuals will create esoteric home pages purely for entertainment, while commercial interests will concentrate on selling information. Very soon, academic research workers who have previously enjoyed a free range on the network will be competing with commercial or individual interests for bandwidth.

Rather than providing a reference list for this article, I have included email addresses and Uniform Resource Locators (URLs) that will allow the reader to go to the sites to which I refer below and see at first hand what the Internet and the WWW have to offer. The sites I have chosen reflect my own interests and are totally prejudiced; however, I believe they do provide a springboard for further exploration (for further reading, see Table 1). A glossary of acronyms is provided in Box 1.

Email servers

Understanding and appreciating our position in the present often requires a review of the past. In the beginning, network services were based on simple email, which is still the lowest common denominator for such services. At the European Molecular Biology Laboratory (EMBL), DNA sequences or software programs related to biology could be ordered by contacting Netserv in Heidelberg. This service still exists, but is currently maintained by the European Bioinformatics Institute (EBI) near Cambridge in the UK. By sending an email message to [email protected] and including the word ‘help’ in the main body of the message you will receive complete instructions on how the service operates. In the USA, the National Center for Biotechnology Information (NCBI) also offers a similar email service for retrieving sequences ([email protected]), while retrieval and analysis of sequences can be obtained in Japan ([email protected]).
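As a rough illustration of how a request to an email server of this kind might be composed by a program rather than typed by hand, the following Python sketch builds a message whose body is the single word 'help'. The recipient address netserv@example.org, the sender address and the function name build_help_request are placeholders invented for this sketch; they are not the real service addresses. The message is only constructed and printed, not sent.

```python
from email.message import EmailMessage

# Hypothetical placeholder: not the address of any real email server.
SERVER_ADDRESS = "netserv@example.org"


def build_help_request(sender: str, server: str = SERVER_ADDRESS) -> EmailMessage:
    """Build a message whose main body is just the word 'help'."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = server
    # The command goes in the body of the message, as the text describes.
    msg.set_content("help")
    return msg


request = build_help_request("biologist@example.org")
print(request.as_string())
```

Actually delivering the message would need a mail relay (for example through the standard smtplib module), which is deliberately left out here because it depends on the local site.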

TABLE 1. Online documentation concerning the World Wide Web (WWW)

Document
    Address

Entering the World-Wide Web: A Guide to Cyberspace
    http://www.eit.com/web/www.guide

Internet Documentation and FAQs (Frequently asked questions)
    http://groucho.gsfc.nasa.gov/Code_520/locator/surf.html#internetdocs

Hitchhikers’ Guide to the Internet
    gopher://riceinfo.rice.edu/00/Computer/AroundTheNet/Networks/Hitch

World Wide Web and Mosaic: User’s Guide
    http://elib.cme.nist.gov/fasd/pubs/schlenoff94.html

Handbook on Running a World Wide Server
    http://info.mcc.ac.uk/CGU/SIMA/handbook/handbook.html

Essential Mosaic and HTML
    http://kawika.hcc.hawaii.edu/htmlstart.html

Internet Information and Navigators
    http://library.tufts.edu/www/internet.html

Structuring the Information Superhighway in the UK
    http://tin.ssc.plym.ac.uk/up.html

The Network Observer
    http://communication.ucsd.edu/pagre/tno.html~op

Best of the Web 94
    http://wings.buffalo.edu/contest/

Welcome to the Big Dummy’s Guide to the Internet
    http://cs.dal.ca/bdgtti-entry.html

Zen and the Art of the Internet
    gopher://gopher.tamu.edu/11/.dir/zen.dir

The use of WWW in Biological Research
    ftp://bioftp.unibas.ch/archive_data/www94paper_explode/paper.html

Biocomputing Survival Guide
    http://www.ch.embnet.org/jam/jam.html

WWW94 Proceedings at Elsevier
    http://www.elsevier.nl/cgi-bin/ID-94

Electronic Proceedings of the Second World Wide Web Conference 94: Mosaic and the Web
    http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/

Third International World Wide Web Conference
    http://www.igd.fhg.de/www/www95/www95.html

© 1995 Elsevier Science Ltd 0168-9525/95/$09.50

TIG JUNE 1995 VOL. 11 No. 6


Box 1. Glossary of acronyms

ASCII    American Standard Code for Information Interchange
CERN     European Organization for Nuclear Research
EBI      European Bioinformatics Institute
EMBL     European Molecular Biology Laboratory
EMBnet   European Molecular Biology Network
ExPASy   Expert Protein Analysis System
FTP      File Transfer Protocol
GDB      Genome Database
GVU      Graphics Visualization Unit
HTML     Hypertext Markup Language
HTTP     Hypertext Transfer Protocol
NCBI     National Center for Biotechnology Information
NCGR     National Center for Genome Resources
NCSA     National Center for Supercomputing Applications
NSFnet   National Science Foundation Network
PDB      Brookhaven Protein Database
SRS      Sequence Retrieval System
URL      Uniform Resource Locator
WAIS     Wide Area Information Server
WWW      World Wide Web

FTP

The next step was to make data available through anonymous file transfer protocol (FTP). In Europe, the complete DNA data library and the protein data library SWISS-PROT were made available for downloading by EMBL. This service has also recently moved to the EBI in the UK, and very many databases and free software packages for molecular biology are available from the FTP site ftp://ftp.ebi.ac.uk (contact Peter Stoehr at [email protected]). Two FTP sites in the USA that have also played an important role in the distribution of data and programs are the NCBI site ftp://ncbi.nlm.nih.gov (contact Scott Federhen at [email protected]) and the University of Indiana site ftp://ftp.bio.indiana.edu (contact Don Gilbert at [email protected]).

FIGURE 1. Histogram showing the volume of Internet traffic in recent years. The data were produced by James E. Pitkow ([email protected]). Z39.50 is traffic on the Wide Area Information Server (WAIS).


WAIS and Gopher

In the early 1990s, the software products Gopher and Wide Area Information Server (WAIS) made their appearance. The first Gopher site related to biology was set up in the USA at Indiana University (gopher://fly.bio.indiana.edu) by Don Gilbert, and was so popular and useful that it quickly spawned many copies. What made Gilbert’s Gopher so innovative was the fact that he indexed the whole of the GenBank database using WAIS, and then used the Gopher interface to allow users to access the database for sequence entries and subsequently to download them directly online or even email them to email addresses out on the Internet.

In Europe, all the countries that belonged to the European Molecular Biology Network (EMBnet) soon had their own Gopher servers. The most sophisticated was run by Reinhard Doelz from the BioCentrum in Basel, Switzerland (gopher://bioftp.unibas.ch), who provided Gopher access so that the EMBL database could be queried online. Gopher was a huge success because it is simple to set up and maintain. It is also very easily operated by using the cursor keys to move through a series of menus; therefore, it was ideally suited to the research worker who had limited computing power and who was quite satisfied working with a simple VT100 terminal interface.

FTP is still responsible for most of the current network traffic on the NSFnet (Fig. 1). However, Gopher, which was the star performer up until 1993, has suddenly been overtaken by a newcomer called the World Wide Web.

In the rest of this article I concentrate on why the WWW, and in particular the software developed by the National Center for Supercomputing Applications (NCSA) called Mosaic, have become so popular.

Hypertext and multimedia

Although multimedia and hypertext for the display of documentation are of great importance today, Ted Nelson first coined the word ‘hypertext’ 30 years ago in 1965 and, as early as 1967, Andy van Dam and others began to build an early hypertext editing system. In 1989, Tim Berners-Lee proposed the WWW project as a way of handling the delivery of documents at CERN (http://www.cern.ch/) in Geneva. In this new concept, the research environment is no longer confined to a single laboratory with its own limited resources; rather, research workers can be linked together through a worldwide network. Computers may be located in different parts of the world, but, with modern networking, they can be seamlessly linked together. Researchers work locally, but compute globally.

Client-server model

The WWW is based on a client-server model. The two most popular types of WWW server come from CERN and from NCSA. Servers are maintained by those who wish to distribute data or documentation. A complete list of servers for different hardware platforms is available (http://info.cern.ch/hypertext/WWW/Daemon/Overview.html). To access a server, the user needs client software and a connection to the Internet. Clients are often called ‘browsers’ since they allow you to browse through information that exists on the server. The best place to look for a complete listing of clients is at http://info.cern.ch/hypertext/WWW/Clients.html where terminal-based browsers, clients for PCs that run Windows, and clients for the Macintosh and the X Window System for a variety of Unix operating systems can be found.

Every server has a URL that explicitly identifies it; for example, the URL that identifies the WWW server at EBI has the address http://www.ebi.ac.uk (Fig. 2). The acronym ‘http’ in the URL refers to the hypertext transfer protocol, which is used to transfer files on the WWW. Most of the documents on the WWW are written in Hypertext Markup Language (HTML). Therefore, it is possible to write a document incorporating links to documents and services that exist on a computer on the other side of the world, simply by specifying in HTML the machine name, the directory where the information is stored and the filename for the documentation. Because of this common language, users are very often unaware that they are moving from one computer to another. The interface is so transparent that it is not at all obvious to the novice on which computer they are performing their searches or queries.
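The parts of a URL named above (the protocol, the machine name, the directory and the filename) can be pulled apart with a few lines of Python. This is only a minimal sketch using the standard library; the function names describe_url and make_link are invented for the illustration, and the anchor it produces is simply one way of writing an HTML hyperlink.

```python
import html
import posixpath
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into protocol, machine name, directory and filename."""
    parts = urlsplit(url)
    directory, filename = posixpath.split(parts.path)
    return {
        "protocol": parts.scheme,   # e.g. 'http'
        "machine": parts.hostname,  # e.g. 'info.cern.ch'
        "directory": directory,     # e.g. '/hypertext/WWW'
        "filename": filename,       # e.g. 'Clients.html'
    }


def make_link(url: str, text: str) -> str:
    """Write an HTML anchor that points at a document on another machine."""
    return f'<A HREF="{html.escape(url, quote=True)}">{html.escape(text)}</A>'


info = describe_url("http://info.cern.ch/hypertext/WWW/Clients.html")
link = make_link("http://www.ebi.ac.uk", "EBI server")
print(info)
print(link)
```

Because the machine name is carried inside every link, a document on one server can point at a file held anywhere else without the reader having to know where it is stored.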

If you do get lost in cyberspace then you can always find out where you are by looking at the document URL at the top of the Mosaic browser. It is also good practice to check where you might be heading next: if the pointer of the mouse is placed on a hyperlink, the URL of the site you are about to visit will be displayed at the bottom of the Mosaic screen. In addition, most clients offer the opportunity of saving the URL of an interesting site on a hotlist (Mosaic) or as a bookmark in a recently released browser called Netscape (http://home.mcom.com/home/welcome.html). You can then call up your hotlist and jump to a specific site directly, rather than navigating some tortuous route to find it.

The first bioservers on the WWW

The first browsers for the WWW were very rudimentary and the hypertext links in documents were usually visible as numbers in square brackets. This particular interface had no real advantage over Gopher, meaning that there was little initial interest in the WWW. However, in June 1993, NCSA released Mosaic 1.0 for the X Window System, which had such a good graphical user interface that immediately the combination of the WWW server and the Mosaic client began to interest biologists.

In the USA, the first WWW biology-related server was established by Keith Robison at Harvard Biolabs (http://golgi.harvard.edu/). In Europe, the most impressive server for biology was the ExPASy (http://expasy.hcuge.ch/) server in Geneva, maintained by Amos Bairoch. The authors of these two sites were pioneers in the field and have remained so.

FIGURE 2. The home page of the World Wide Web server of the European Bioinformatics Institute.

The Harvard site is also the main depository for new bioservers that come online. Harvard maintains a biosciences index called Biopages that lists new resources on the network related to the field of biology (http://golgi.harvard.edu/htbin/biopages). Another useful source of information is an archive of announcements that is posted to the Usenet newsgroup (bionet.software.www). This newsgroup allows scientists who maintain a WWW server to announce the location of their server and to give a brief summary of the services it offers. The archive is maintained by Reinhard Doelz in Switzerland (http://www.ch.embnet.org/bio-www/info.html).

Gopher versus the WWW

In March 1994, the WWW byte traffic surpassed the Gopher traffic on the NSFnet (Fig. 1), and, in less than a year, the total number of WWW servers that concentrate on some aspect of biology has exploded to approximately 871 sites (calculated from the Yahoo site, http://akebono.stanford.edu/yahoo/Science/).

Recently, a discussion in the Bionet newsgroups took place regarding the merits of Gopher compared with the WWW and HTML. One significant point raised was that, although an entry can be retrieved using a Gopher search, it is displayed as a simple ASCII text only. The entry may contain a reference to the Medline database, but if you want to read the abstract it is probably necessary to quit Gopher and start up some other application such as Nentrez (see this month’s Genetwork, p. 247) to get at the Medline reference. However, with the WWW, it is possible to parse an entry as it is delivered to the screen so that certain parts of the document become hyperlinks; if you are then interested in a Medline reference, you need only click on the link and it appears, which saves a great deal of time.
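The idea of parsing an entry on its way to the screen can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used by any particular server: the sample entry is invented, the cross-reference line is assumed to follow the 'RX   MEDLINE; number.' layout, and the address http://example.org/medline/ is a placeholder for whatever service would actually return the abstract.

```python
import html
import re

# Hypothetical placeholder for a service that returns a Medline abstract.
MEDLINE_BASE = "http://example.org/medline/"

# Assumed layout of a cross-reference line: 'RX   MEDLINE; 12345678.'
RX_LINE = re.compile(r"^(RX\s+MEDLINE;\s*)(\d+)(\.)\s*$")


def hyperlink_entry(entry_text: str, base: str = MEDLINE_BASE) -> str:
    """Return the entry as HTML, with Medline identifiers made clickable."""
    out = []
    for line in entry_text.splitlines():
        match = RX_LINE.match(line)
        if match:
            prefix, uid, dot = match.groups()
            out.append(f'{html.escape(prefix)}<A HREF="{base}{uid}">{uid}</A>{dot}')
        else:
            # Everything else is passed through as plain, escaped text.
            out.append(html.escape(line))
    return "<PRE>\n" + "\n".join(out) + "\n</PRE>"


# Invented sample entry, used only to exercise the function.
sample_entry = (
    "ID   EXAMPLE_HUMAN\n"
    "RX   MEDLINE; 12345678.\n"
    "DE   Example protein <fragment>."
)
page = hyperlink_entry(sample_entry)
print(page)
```

The rest of the entry is left untouched, so the reader sees the same text that a Gopher search would show, with the difference that the reference can now be followed with a single click.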

The WWW protocol is very versatile since it is able to handle FTP for transferring files and email for posting documents to other users on the Internet. WAIS indexes can also be searched from within the WWW and, with the release of Netscape, there is now a convenient way of posting to the Bionet newsgroups on Usenet.

The Sequence Retrieval System (SRS) (http://www.embl-heidelberg.de/srs/srsc), developed by Thure Etzold at EMBL, is a perfect example of many databases joined together through hyperlinks. Entries in the SWISS-PROT database may often have cross-references to other databases, such as Medline, PROSITE or EMBL, and it is a matter of great convenience to be able to jump directly to a cross-reference from within a hyperlink in a document.

For a dramatic insight into networking trends, monthly statistics are available from the Graphics Visualization Unit (GVU) Center for NSFnet Statistics (http://www.cc.gatech.edu/gvu/stats/NSF/merit.html).

Images on the network

One of the best features of the WWW is that it allows the display of images over the network. If you are interested in the three-dimensional structure of proteins, then the server at http://www.nih.gov/htbin/pdb allows you to view molecules online (Fig. 3); alternatively, if you are interested in genetic maps, then a good site is the Genome Database (GDB) browser at Johns Hopkins University (http://gdbwww.gdb.org/gdbdoc.topq.html).

However, the display of graphics requires that you have high-speed network connections; otherwise downloading the files is a tedious process. Waiting for graphics to appear on a slow line has been said to be like ‘watering a lawn with an eye-dropper’! Netscape overcomes some of the problems associated with loading large images since it uses interlaced Graphics Interchange Format (GIF) images. Netscape (which is commercial software) has been developed by one of the original authors of Mosaic, Marc Andreessen.

An excellent server at the MRC Laboratory of Molecular Biology (Cambridge, UK) (http://scop.mrc-lmb.cam.ac.uk/scop/) lists a structural classification of proteins (scop), and also delivers GIF images of proteins. This is a perfect example of how hypertext links can be used to obtain images from a Gopher (gopher://pdb.pdb.bnl.gov:70/19/PDB/Entries/) in the USA, and gives an indication of how powerful hypertext can be.

Although some servers rely almost entirely on text on their home pages, many make good use of graphics or icons. The advantage of having icons is that they often act as visual signposts that help people to remember where they have been as they navigate through cyberspace. The disadvantage of icons is that, if they are too large, then it takes ages for the home pages to load on the screen. For this reason, many clients offer the choice of switching icons on or off. On the other hand, if a home page consists of dense text only, it becomes very tedious to keep re-reading the same text to find the desired hyperlink. In my opinion, the best home pages are those that use both graphics and text.

Real and serious servers

It is a fact that many home pages are ‘virtual’ rather than ‘real’; i.e. rather than supplying any unique or substantial data themselves, they are composed entirely of links to other people’s work. There is nothing wrong with this approach, because the gathering together of information into a compact and discrete package is a service in itself. For instance, Pedro’s biomolecular research tools is a very good collection of sites and services that is mirrored on both sides of the Atlantic: at http://www.fmi.ch/biology/research_tools.html in Switzerland and at http://www.public.iastate.edu/~pedro/research_tools.html in the USA. However, the primary provider of data is often the most informative since the author has a responsibility to keep the service and database up to date and accessible. For example, in the USA, the NCBI (http://www.ncbi.nlm.nih.gov/) on the east coast and the National Center for Genome Resources (NCGR) (http://www.ncgr.org/) on the west coast provide access to DNA sequence databases, while in Europe, the EBI (http://www.ebi.ac.uk/) provides primary access to both DNA databases and SWISS-PROT. In Asia, the GenomeNet WWW server (http://www.genome.ad.jp/) provides a very useful interface to many databases, as does the DNA Database of Japan (DDBJ) (http://www.nig.ac.jp/home.html).

Traffic jams on the network superhighway

One factor that is often overlooked is the speed of the network lines to different parts of the world; some countries have better networks than others. It may well be that a WWW server that works well for a local community of researchers does not perform as well when it is being accessed from the other side of the world.

The prudent research worker should be aware of international time differences and exploit them. Since the USA is 5-7 h behind Europe in time, it is often very convenient for Europeans to make use of WWW servers in the USA in the morning. Regular users of the networks become aware of times during the day when the response times are so slow due to network traffic that it is almost impossible to do useful work on a server. At ‘rush hour’, we learn to avoid certain roads. The same is true on the Web; you learn to avoid certain sites at certain times of the day.

With more and more people using the network, it is possible that, eventually, nobody will be able to do any real work because of the overload. Such a situation is comparable with all lanes on a motorway being so blocked that no traffic can move. Although the academic community has long enjoyed a monopoly on the Internet, it is becoming increasingly obvious that commercialization of the Internet is about to take place; science may have to take a back seat to entertainment or commerce. The netsurfers are with us.

Finding information on the Internet

In his book The Songlines, Bruce Chatwin made the statement ‘everything is solved by walking’. When the Australian aborigines went on ‘walkabout’ they followed invisible lines on the desert floor. Today, the biologist will often go on ‘walkabout’ to discover new Internet resources; many problems can be solved by walking the Web. Maps of the Internet resources of every European country are available on the WWW. In the UK, for example, a map with all the Web servers associated with universities in the UK is available (http://scitsc.wlv.ac.uk/ukinfo/uk.map.html). There is also a map (http://scitsc.wlv.ac.uk/ukinfo/uk.map.res.html) for all the research councils in the UK, and, if you want to explore the world at large, then the best clickable map is to be found in the USA (http://wings.buffalo.edu/world/).

Clicking on maps to discover resources might be interesting but it is also time consuming. To overcome this problem, programs called Spiders or Robots have been written, which move about the Web collecting information about URLs. Some Robots collect titles or headers of home pages, others search the documents themselves and extract keywords to make up a short abstract, while still others simply provide indexes. A Robot called Archie-Like Indexing for the Web (ALIWeb) (http://www.cs.indiana.edu/aliweb/search) was one of the first constructed to store information about the Web. An example of its use is given in Box 2: a research worker interested in genetic mapping would key in the keywords ‘genetic maps’ and the Robot would return a list of sites that dealt with that subject. The URL would appear as highlighted text and just clicking on it would take you to the desired site. Databases of URLs have since been developed using Robots such as WWW Worm (http://www.cs.colorado.edu/home/mcbryan/WWWW.html), which was awarded ‘Best of the Web 94’ at the first international conference on the WWW held in Geneva last year. A particularly good tool for searching the WWW is called WebCrawler (http://webcrawler.cs.washington.edu/WebCrawler/WebQuery.html), which specializes in biological sites. The Lycos Robot has a catalogue of 1.19 million unique URLs and also provides a short abstract about each site to help you to decide if it is worth a visit or not. An excellent resource for descriptions of the most popular Robots is available (http://home.mcom.com/home/internet-search.html).

Box 2

Keywords: crop plant genetics; molecular genetics; developmental genetics; plant pest resistance; elicitors; gene expression, gene regulation, gene targeting; MADS-box genes; RFLP; signal transduction; T-DNA tagging; transformation; transposition; transposon tagging; maize; barley; wheat; potato; Antirrhinum, Arabidopsis

skLnrskcc.org
Keywords: Sloan-Kettering Institute; Research; Cancer; Molecular; Biology; Genetics; Biochemistry; Immunology; Genetics

ofours
Keywords: genetics; diabetes; obesity; sickle-cell; genetic engineering; human genome

NeuroForecast
Keywords: neural networks; genetic algorithm; financial; forecasting

Keywords: Genome Database (GDB); online Mendelian inheritance in man (OMIM); genome; disorders; database; biology; genetics

GDB Browser
Keywords: GDB; genome; database; biology; genetics

Keywords: GDB; OMIM; disorders; database; biology; genetics
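What a Robot of the simplest kind does with a single page (collect the title and note the URLs of the hyperlinks it could visit next) can be sketched with Python's standard HTML parser. This is a toy illustration, not the code of ALIWeb, WWW Worm, WebCrawler or Lycos: the class name PageSummary, the function summarise, the sample page and the example.org addresses are all invented here, and the page is supplied as a string so that nothing is fetched from the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageSummary(HTMLParser):
    """Collect the title of a page and the absolute URLs of its hyperlinks."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # The parser reports tag and attribute names in lower case.
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarise(page_html: str, base_url: str):
    parser = PageSummary(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.title.strip(), parser.links


# Invented sample page; a real Robot would retrieve this over the network.
sample_page = """<HTML><HEAD><TITLE>Genetic maps</TITLE></HEAD>
<BODY><A HREF="maps/index.html">Maps</A>
<A HREF="http://example.org/other/">Elsewhere</A></BODY></HTML>"""

title, links = summarise(sample_page, "http://example.org/genetics/")
print(title)
print(links)
```

A working Robot would repeat this for each collected link and store the titles in a searchable index, which is the part that distinguishes one service from another.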

The future is expensive

The academic community and the ‘old guard’ on the Internet have always wanted everything to be free. The ethos of the academic community has been to promote free access to data and free distribution of data; however, this philosophy does not sit well with commercial companies who would like to use the network for profit. Companies like CompuServe, Prodigy and America Online are now giving their paying customers access to the Internet and it has been estimated that the number of users worldwide on the Internet is increasing at a rate of 10-15% per month.

If the network infrastructure remains unchanged and the number of users increases, then the network bandwidth will not be sufficient to cope with the increased traffic and the information superhighway could well become jammed. In Europe, some attempts have been made to gain an insight into networking conditions within EMBnet (http://www.caos.kun.nl/Ping/), and the results show a great disparity in telecommunication systems between different countries.

A dramatic increase in the use of the Internet has occurred recently and biologists have been quick to provide many innovative services. The enthusiasm that has been witnessed in biology has also been paralleled by many other disciplines, notably chemistry and physics. However, the involvement of more and more commercial sites will dramatically transform the nature of the Internet. It would be a sad occurrence if the spirit of cooperation and the sense of adventure that have so typified the academic side of the Internet were to give way to a less generous world view where the only aim is one of profit.

The maize genome: functional

CHRISTIANE FAURON, MARK CASPER, YAN GAO AND BARRY MOORE

The organization of the mitochondrial genome of higher plants is complex. It has two striking features: a large size that can vary among plant species; and the ability to undergo homologous recombination that results in variation within species. From cosmid clone mapping studies, the total genetic information of the plant mitochondrial genome can be arranged into a single circular molecule that is referred to as the master chromosome. This circular DNA molecule contains repeated sequences that can generate, via intramolecular recombination, either isomeric forms of the master chromosome or smaller subgenomic circular DNA molecules. The maize mitochondrial genome is the most complex and largest mitochondrial genome for which a physical map is presently available. Its organization varies considerably among the different maize cytotypes. In an attempt to understand the numerous different mitochondrial DNA rearrangements encountered among those cytotypes, we have proposed a general model of genome evolution that can explain a multitude of genomic rearrangements, not only for the maize mitochondrial DNA but also for other higher plant mitochondrial genomes as well.

Higher plant mitochondrial genomes (mtDNAs) are larger than mammalian or fungal mitochondrial genomes. The smallest one (200 kb) is more than ten times the size of the mtDNA in animals, which is 16-20 kb, and several times the size of the 80 kb mtDNA of Saccharomyces cerevisiae. The largest plant mitochondrial genome (2500 kb) is about half the size of the Escherichia coli chromosome. The higher plant mitochondrial genomes are also more variable in their organization and have a larger coding capacity than mitochondrial genomes in mammals and fungi.

Four types of maize mitochondrial genomes have been recognized in the past [1,2] and a fifth one was reported recently [3]. Their designations are NA and NB for the normal male fertile phenotypes, and T, S and C for the three different cytoplasmic male sterile (cms) phenotypes. These five mtDNA types have distinguishable restriction enzyme digestion profiles [3-6] and, so far, physical maps for three of the maize cytotypes have been completed.

The three physical maps

The physical maps, constructed independently from overlapping cosmid clones by the use of three restriction enzymes each (BamHI, XhoI and SmaI), are available for the NA, NB and cms-T mitochondrial genomes [?-9] (Fig. 1). In each case, the entire genetic information can be assembled as a single circular DNA molecule called the master chromosome: 700 kb for NA; 570 kb for NB; and 540 kb for cms-T. Each master chromosome contains unique sets of repeated sequences, the homologous recombination of which generates either isomeric forms of the master chromosome or subgenomic circular DNA molecules. The size of the subgenomes is dependent upon the location of the repeated sequences within the master chromosome (see below). Detailed sequence comparison shows striking organizational differences between the three master chromosomes [?-10]. Many sequences of various lengths have been permuted or rearranged to various degrees between the three genomes, and each genome contains unique sequences.

Characteristics of the master chromosomes

Gene content

Higher plant mitochondrial genomes are much larger and contain more genes than those of other organisms. Forty-six genes have been mapped onto each maize mtDNA master chromosome so far (Table 1). That additional functional genes will be identified is strongly suggested by the analysis of a bryophyte (Marchantia polymorpha) mitochondrial genome, which is 186 kb long and has been sequenced by Oda et al. [11] Although its organization is different from the mtDNA of higher plants (e.g. recombination does not occur), M. polymorpha mtDNA contains 94 open reading frames and has been a good source of genetic probes. Hybridization studies using oligonucleotide probes derived from the published sequence of M. polymorpha show that a substantial number of its genes, whose functions are still unknown, are also present within the maize mitochondrial genome (C.M-R. Fauron, unpublished).