
Gene-Protein Database of Escherichia coli K-12, Edition 6

RUTH A. VANBOGELEN, KELLY Z. ABSHIRE, ALEXANDER PERTSEMLIDIS, ROBERT L. CLARK, AND FREDERICK C. NEIDHARDT

115

INTRODUCTION

The gene-protein database is unique among other databases being constructed for Escherichia coli because it is configured on a global approach that allows the cell’s total complement of polypeptides to be examined at one time (27). Two-dimensional (2-D) polyacrylamide gel electrophoresis (PAGE) permits this global approach by separating complex mixtures of polypeptides into individual polypeptide species (called spots on the 2-D gel) by two independent separation steps, isoelectric focusing and sodium dodecyl sulfate (SDS)-PAGE (38). However, the reason for creating this database is not to build the “master” 2-D gel for E. coli, because the small number of investigators routinely using 2-D gels would not justify this enormous venture. The purpose is to provide other investigators with physiological and regulatory data on the entire set of E. coli proteins.

The ultimate goal of this database is to catalog when, why, and to what level each protein-encoding gene is expressed. Two projects that tackle this problem are under way. The first project, called the Genome Expression Map, is designed to link each of the protein-encoding genes to a spot on the 2-D gel. The second project, called the Response/Regulation Map, is focused on cataloging the conditions under which each of these genes is expressed and on determining what molecules regulate their expression. This database includes two types of information. First, for identified proteins, we provide the following information: gene name, protein name, EC number, SWISS-PROT accession number, GenBank code, metabolic class, position and orientation of the gene on the chromosome, molecular weight (MW), and pI (calculated from DNA sequence information). This provides sufficient information to allow a user to do a literature search or to access more information in other databases. Second, for identified as well as unidentified proteins, information obtained from 2-D gels is included: MW and pI of the protein (estimated from its migration on the gels), abundance of individual proteins grown under different conditions, and memberships of proteins in particular regulons and/or stimulons. Some of the other databases (e.g., SWISS-PROT [5]) also provide linkage to this database by including the 2-D spot name (called an alpha-numeric, or A-N, name) in their information list. The entire gene-protein database (including the 2-D gel images) is available electronically and can be obtained through anonymous ftp at the database repository at the National Center for Biotechnology Information (see the section on Information Exchange).

HISTORY OF THE DATABASE

The gene-protein database was begun immediately after the introduction of the 2-D gel method (38). The first set of data (which is still included in the database) was a catalog of 140 individual proteins (21 were identified) that reported variations in the levels of each protein in cultures grown under different growth conditions (43). The first important step in establishing the structure of the database, the alpha-numeric naming system used to uniquely identify each 2-D gel spot, was described in that catalog.

In 1980, the first set of reference 2-D gels was published along with the identities of 81 more proteins (6). Five years into the development of the database, it became apparent that in order to track each 2-D gel spot through numerous gels, a standard cumulative map of each type of 2-D gel had to be established. Each reference 2-D gel was overlaid with a grid to give each spot a unique x and y coordinate. The alpha-numeric naming system was maintained to match proteins among the reference gels.

In 1983, the information in the database was linked to the chromosome in the first gene-protein index (33). In that update and review, the identities of 157 proteins were listed, and in addition, many unidentified proteins were mapped to a small region on the chromosome (33). Throughout the 1980s, many published reports on the responses of proteins observed on 2-D gels used (and added to) the protein identifications found in the index. Most of these reports gave physiological and regulatory information about protein spots on 2-D gels (both those identified and those not identified).

In 1990, all of this information from the previous gene-protein indexes and from many separate reports was gathered together, put into an electronic database, and published as the gene-protein database (58). A year later, edition 4 was published. That edition introduced a new standard 2-D gel that was generated by using a standardized 2-D gel method (61). The switch to this new standardized method was important: it allowed other investigators to reproduce the protein pattern so that they could access and contribute to the information in the database independently of the database laboratory. Edition 5 (63) included the first set of identifications made using the T7 expression system on Kohara clones and also announced that an electronic version of the database had been released to the database repository at the National Center for Biotechnology Information so that it would be available in a more usable form and could be updated more frequently.

This sixth edition of the database introduces several changes necessary to accommodate the input of data from many sources. A new naming system was started, not to replace the alpha-numeric naming system but to prevent redundancy in this system. The alpha-numeric names will now be reserved for proteins that have been identified as the product of a particular gene. The new naming system is being used for the Response/Regulation Map and will be used for the Genome Expression Map to name proteins that have been observed but await identification. In the Response/Regulation Map project, as many proteins as possible are matched to proteins already in the database and are assigned that alpha-numeric name, but others are matched only within the Response/Regulation Map project. These will be given a Response/Regulation Map name, which is an R followed by a four-digit number (e.g., R1698). In the Genome Expression Map project, the proteins matched to a single open reading frame (ORF) will be assigned alpha-numeric names and will be added to Table 2 (see p. 2094) under the appropriate gene name and also in the SWISS-PROT (5) and E. coli (24) databases (as a reference between these databases). The proteins that cannot be matched to a single ORF will be given only a Genome Expression Map name, which is an X followed by a four-digit number (e.g., X2404). These proteins appear in Table 2 with reference only to their chromosomal map positions until further analysis allows a match to an ORF. Table 1 (see p. 2076) will continue to serve as a list of all proteins found on 2-D gels, which are listed in order of alpha-numeric name, Genome Expression Map name, and Response/Regulation Map name.
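The three naming schemes described above can be told apart mechanically. The following is a minimal Python sketch, not part of the database software. The R and X patterns follow the text (a letter followed by a four-digit number). The alpha-numeric pattern is an assumption inferred from examples such as C013.5 and from the A-to-H zone letters on the reference gels, so real names may take other forms. The function name `classify_spot_name` is hypothetical.

```python
import re

# R and X forms are stated in the text: a letter plus a four-digit number.
RESPONSE_REGULATION = re.compile(r"^R\d{4}$")
GENOME_EXPRESSION = re.compile(r"^X\d{4}$")
# ASSUMPTION: alpha-numeric names look like a zone letter (A-H) followed by
# a number with an optional decimal part, as in "C013.5".
ALPHA_NUMERIC = re.compile(r"^[A-H]\d{1,3}(\.\d+)?$")


def classify_spot_name(name: str) -> str:
    """Return which naming scheme a 2-D gel spot name appears to follow."""
    name = name.strip()
    if RESPONSE_REGULATION.match(name):
        return "response/regulation-map"
    if GENOME_EXPRESSION.match(name):
        return "genome-expression-map"
    if ALPHA_NUMERIC.match(name):
        return "alpha-numeric"
    return "unknown"
```

Under these assumptions, a spot carrying only an R or X name has been observed but not yet assigned to a gene, whereas an alpha-numeric name marks an identified gene product.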


DEVELOPMENT OF THE DATABASE

It is predicted that within the next few years, the entire DNA sequence of E. coli will be determined. The next steps in the analysis of E. coli will be (i) to confirm that the proposed ORFs encode proteins, (ii) to determine how these genes are regulated, and (iii) to elucidate the function of these proteins. The plans for this database are designed to assist in this analysis. Two complementary projects to develop this database are under way. Each of these projects will provide a separate data set, and a third data set will be provided by the DNA sequencing projects. Eventually, the three data sets will converge, because each contains information on the same set of 3,500 to 4,000 E. coli proteins.

The initial concept of the Genome Expression Map was published in 1980 (34). All of those early protein identifications were made one at a time, primarily using purified proteins as markers to identify the spots. The supply of purified E. coli proteins was quickly exhausted, and so other methods to identify proteins were tried (44). The Genome Expression Map was intended to provide a method for identifying all of the proteins on the E. coli chromosome without relying on biochemists to purify the proteins or geneticists to construct mutants in each protein-encoding gene. Expressing genes carried on plasmids seemed like the ideal approach. At that time, a recombinant plasmid library, constructed by Clarke and Carbon (11), was available. One method for expressing proteins from recombinant plasmids had been described (49), and two more expression methods were developed (34, 47). Although many identifications have been made by using these three expression methods, all of the methods failed to consistently express all of the proteins encoded by the plasmids. The primary reason for the failure was that each of these methods relied on the E. coli transcription system, and gene expression was thus controlled by the cell’s own regulatory mechanisms, which do not allow equal transcription of all genes.

The new approach currently used for the Genome Expression Map project focuses on simultaneously identifying many gene products. This will be accomplished by using the sets of ordered clones produced and sequenced by other laboratories (13, 23), by expressing the genes on these clones with a selective expression system, and by matching the proteins produced by each clone to ORFs found on the clone. By using clones that have been mapped to a position on the chromosome and completely sequenced, the cloning is easier (all restriction sites are known), and a list of potential protein products is already generated. The expression system uses phage transcription systems (56), which offer two advantages over the E. coli transcription system. First, because phage RNA polymerases appear to ignore the transcription signals (encoded within the DNA sequence) used by the E. coli RNA polymerase to start and stop transcription, every ORF on a plasmid should be transcribed within a single transcription unit. Second, by taking advantage of the sensitivity of the E. coli RNA polymerase and the resistance of the phage RNA polymerases to the antibiotic rifampin, the plasmid-encoded genes can be expressed exclusively. The sizes of the ordered chromosomal fragments allow 10 to 20 proteins to be identified simultaneously (based on the assumption that the average gene is 1 kb long), and yet this number of proteins is still small enough to allow unambiguous matching of most proteins to ORFs because of the variation in charges and masses of proteins (migration on the 2-D gel) and also because all 20 genes will rarely be expressed from a single strand. The experimental methods used for this project have been described in detail elsewhere (50) and are presented here only briefly.

To express the genes from ordered sets of clones, the E. coli DNA from these clones is moved into a special plasmid vector, and then the recombinant plasmid is transformed into a special E. coli strain. The special vector possesses several important features, including (i) a low-copy replicon to minimize the effects of certain genes that are lethal to E. coli when present in high copy, (ii) the lacZ gene within the multiple cloning site to allow simple screening for plasmids containing inserts, and (iii) two different phage promoters flanking the multiple cloning site (oriented opposite to each other) to provide a means of independently expressing the protein-encoding genes on each DNA strand. Each of the special strains used to express the genes on these plasmids carries one of the phage RNA polymerase genes under the control of an inducible promoter to prevent the expression of the plasmid-encoded genes until the inducer is added, again minimizing the effects of lethal genes. The strains are also recA mutants; thus, recombination between the E. coli DNA on the plasmid and the chromosome is prevented.

To tag the proteins produced from the plasmid-encoded genes, a mixture of 3H-amino acids is added to a culture in which the phage RNA polymerase has been induced and the E. coli RNA polymerase has been inhibited by rifampin. These 3H-labeled proteins are separated on 2-D gels. Because there is virtually no contamination from chromosomally encoded proteins to serve as landmark spots on the 2-D gels, the 3H-labeled extracts are also comigrated on a 2-D gel with a whole-cell extract made from a culture (strain W3110) labeled with [14C]glucose in order to map each plasmid-encoded protein to a precise location on the reference 2-D images (50).

FIGURE 1 2-D polyacrylamide gel of extracts of E. coli K-12 W3110 grown aerobically in glucose minimal MOPS (29) medium plus thiamine at 37°C and labeled with [35S]methionine (63). The 2-D gels were run on the Investigator 2-D gel system (Millipore), using 4-8 ampholines in the first dimension (as described in the Investigator manual) and 11.5% acrylamide (as described in the Investigator manual except that pH 8.8 Trizma from Sigma Chemical Co. was used as the 1.5 M Tris for making up the slab gel). A grid overlay provides coordinates for individual spots. Letters from A to H across the top follow the alpha-numeric nomenclature system described in reference 43. MW estimates given on the right were made by using protein spots with known MWs (deduced from the sequence).

FIGURE 2 2-D polyacrylamide gels of extracts of E. coli K-12 W3110 grown aerobically in glucose minimal MOPS (29) medium at 37°C and labeled with 35SO4 (63). A grid overlay provides coordinates for individual spots. Letters from A to H define zones of identified proteins versus their migration distances. (A) First dimension, isoelectric focusing to equilibrium (8,200 V-h), 1.6% pH 5 to 7 and 0.4% pH 3.5 to 10 carrier ampholyte mixture; second dimension, 11.5% acrylamide. (B) First dimension, nonequilibrium (NE) pH gradient electrophoresis (39) (1,250 V-h), 2% pH 3.5 to 10 carrier ampholyte mixture; second dimension, 11.5% acrylamide.

To match the ORFs found in the DNA sequence to spots on the 2-D gels, standard curves (shown in Fig. 5) were prepared by using the large set of proteins that have been identified on 2-D gels and whose genes have been sequenced. From the sequence of the gene, the amino acid composition is deduced, and from the amino acid composition, the pI and MW of the protein are calculated. Plots of pI versus migration in the first dimension and MW versus migration yield the equations that give an estimate of where the products of other genes should migrate. By themselves, these estimates are not sufficient to make a spot identification. However, when the number of candidate spots is reduced to 10 or so through the use of the selective expression system described, matches between ORFs and their protein products can be found. In some cases, no unambiguous assignment of a protein to an ORF can be made. In these cases, the protein will be assigned a Genome Expression Map name until further analysis clarifies which ORF matches the protein. This system of many-at-a-time protein identifications should rapidly increase the information compiled in the Genome Expression Map section of the database.
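The standard-curve matching described above can be illustrated with a short Python sketch. It is an illustration only, not the authors' procedure: the text says only that plots of pI and MW against migration yield predictive equations, so the straight-line fits used here, and in particular fitting the second dimension against log10(MW), are assumptions. The function names and the data layout (gel coordinates as x, y pairs) are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_position(pi, mw, pi_curve, mw_curve):
    """Estimate gel coordinates for an ORF product from its calculated pI and MW.

    pi_curve maps pI to the first-dimension coordinate; mw_curve maps
    log10(MW) to the second-dimension coordinate (ASSUMED log-linear).
    """
    x = pi_curve[0] * pi + pi_curve[1]
    y = mw_curve[0] * math.log10(mw) + mw_curve[1]
    return x, y


def rank_candidates(predicted, spots):
    """Order candidate spot names by distance from the predicted position.

    spots is a dict of spot name -> (x, y) for the proteins expressed
    from one clone.
    """
    px, py = predicted
    return sorted(
        spots,
        key=lambda name: math.hypot(spots[name][0] - px, spots[name][1] - py),
    )
```

In use, the two curves would be fitted once from the identified, sequenced proteins, and `rank_candidates` would then be applied to the 10 or so spots produced by a single clone. As the text notes, the predicted position alone does not identify a spot; the ranking is useful only because the selective expression system has already narrowed the candidates.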

The Genome Expression Map project specifically addresses the question of whether each ORF identified within the DNA sequence actually expresses a protein. By themselves, the results of such work would be a significant contribution to the study of E. coli. However, because the expression of the ORFs will be determined through analysis on 2-D gels, this project also provides the Response/Regulation Map project with the necessary linkage to the chromosome.

The Response/Regulation Map really began with the first publication describing the 2-D gel method (38). O’Farrell used protein extracts from E. coli and revealed the proteins synthesized under given growth conditions. Many other global studies that used 2-D gels have since been published. Two factors have limited the growth of this part of the database. First, lack of a standardized 2-D gel method (prior to 1991) hindered other investigators from contributing global studies to the database. Only one independent investigator has ever contributed to this part of the database (15). Second, although 2-D gels can resolve about 1,200 protein spots, the methods used to quantify the spots restrict global quantitation to the most abundant 200 to 600 proteins. Manual methods of quantitation are restricted by the low specific activities of radiolabeled amino acids and the time required to punch out individual spots for counting in scintillation counters. Computer-aided image analysis systems were introduced in the 1980s, but quantitation by this method is limited by the slow processing time of the computers, the immaturity of the software, and the narrow linear optical density range of the X-ray film used to capture the gel image.


With the development of faster computers, better image analysis software, and a new method to measure radioactivity in the spots on the gel (phosphorimagers [41]), it is now possible to quickly quantify 1,000 to 1,200 spots per gel and to match the spots among multiple images. When these methods of detection and analysis are used, proteins with steady-state levels of more than 50 molecules per cell or those with synthesis rates accounting for 0.04% of the total protein synthesized during a pulse-label can be detected and included in the analysis of each 2-D gel. The results from the two different analysis methods are expressed differently, as discussed in the footnotes to Table 3 (Table 3 on p. 2101; footnotes on p. 2111). The results of four comprehensive analyses were added to this edition of the database, and eventually, all of the partial analyses will be redone and entered into the database along with data from additional experiments. For each of these, only the data are given; the interpretations and conclusions made from these experiments are given elsewhere (see the footnotes to Tables 1 through 4).

FIGURE 3 2-D polyacrylamide gels of extracts of E. coli B/r NC3 grown aerobically in glucose minimal MOPS (29) medium at 37°C and labeled with 35SO4 (63). A grid overlay provides coordinates for individual spots. Letters from A to H define zones of identified proteins versus their migration distances. (A) Gel conditions are the same as in Fig. 2A; (B) gel conditions are the same as in Fig. 2B.

The Response/Regulation Map project is cataloging, through 2-D gel analysis, when (the response) and how (the regulation) each individual protein is expressed. Although this catalog will seldom define the exact function for any individual protein, it is expected to provide many of the clues that will direct the study of each protein’s function and to provide the physiological data needed to help define regulatory elements contained in the DNA sequence by revealing which proteins belong to a particular regulon. Because the information in the Response/Regulation Map has accumulated over many years and from quantitative and qualitative analyses of the 2-D gels, the data are presented in two tables (Tables 3 [p. 2101] and 4 [p. 2112]). Quantitative data for all proteins measured in each experiment are listed in Table 3 and are expressed as a ratio for each test condition. The level of proteins (during the labeling period) is expressed either as an a′ value or as parts per million. Table 4 gives results of experiments that were qualitatively analyzed (induction of proteins determined visually) or in which only small numbers of proteins were quantified. In these tables, investigators can look up the responses of individual proteins to different conditions. When the mechanism of induction or repression of sets of proteins occurs via a common regulatory molecule, then the term regulon is used to define these coregulated proteins (27). Four regulons are listed in Table 4: the HTP regulon, controlled by σ32 (31); the OXY regulon, controlled by the OxyR protein (9); the SOS regulon, controlled by the LexA protein (64); and the LRP regulon, controlled by the leucine response regulator (15). For identified proteins, membership in a particular regulon has usually been determined by genetic analysis; in other cases, 2-D gel analysis of mutants in the regulatory molecule revealed that proteins belonged to a regulon. Eventually, many more regulons will be analyzed and added to the database. For example, the proteins belonging to the stimulons induced both by phosphate starvation and by growth in phosphonate should include the members of the regulon controlled by the PhoB transcriptional regulator (65). To verify that these proteins belong to this regulon, phoB mutants will be analyzed.
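The reasoning in the PhoB example, that proteins induced under two related conditions are candidates for a shared regulon, amounts to intersecting two stimulons. The Python sketch below illustrates this under stated assumptions: the ratio is taken as test level divided by control level, and the induction threshold of 2.0 is an arbitrary illustrative cutoff, not a value from the database. All function names are hypothetical.

```python
def response_ratio(test_level, control_level):
    """Ratio of a protein's level under a test condition to its control level."""
    if control_level <= 0:
        raise ValueError("control level must be positive")
    return test_level / control_level


def stimulon(ratios, threshold=2.0):
    """Names of proteins whose ratio meets the induction threshold.

    ratios is a dict of spot name -> ratio for one test condition.
    ASSUMPTION: threshold=2.0 is an illustrative cutoff only.
    """
    return {name for name, ratio in ratios.items() if ratio >= threshold}


def candidate_regulon(*stimulons):
    """Proteins induced under every condition given (set intersection)."""
    if not stimulons:
        return set()
    return set.intersection(*(set(s) for s in stimulons))
```

A set produced this way is only a list of candidates. As the text states, membership in the regulon would still have to be verified, for example by analyzing mutants in the regulatory gene.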


Part of ECO2DBASE, the electronic version of the database at the National Center for Biotechnology Information (see the section on Information Exchange), is a file not presented as a table here. This file, called Just Genes, lists the genes that encode E. coli proteins that are defined by genetic or biochemical criteria or that are proposed to exist on the basis of analysis of DNA sequence but that have not been identified on 2-D gels. This file was added to the database to serve as a reference for both projects and to assist other investigators interested in a single protein that has not been identified on 2-D gels. For the Genome Expression Map project, this file provides the list of ORFs, which are then matched to proteins produced by the clones. For the Response/Regulation Map, this file provides an estimate of where proteins already known to be induced by different conditions should migrate. In some cases, an identification can be made.

FIGURE 4 2-D polyacrylamide gel image made from an extract of E. coli K-12 W3110 grown aerobically in glucose minimal MOPS (29) medium plus thiamine at 37°C and labeled with [35S]methionine as described elsewhere (VanBogelen, manuscript in preparation). The 2-D gel was run on the Investigator 2-D gel system (Millipore), using 4-8 ampholines in the first dimension (as described in the Investigator manual) and 11.5% Duracryl (42) (as described in the Investigator manual except that pH 8.8 Trizma from Sigma was used as the 1.5 M Tris for making up the slab gel). The gel image was captured by exposing the dried gel to a phosphorimager plate (Molecular Dynamics) and scanning the plate on the phosphorimager (Molecular Dynamics). The image file was then transferred to the BioImage image analysis system (Millipore), where spot analysis and matching were done. The coordinates listed in Table 1 were generated by the software, as were the Response/Regulation Map names (match numbers between different gels). The image was transferred to a Macintosh computer, a grid was placed over the image to match the coordinates assigned to each spot, and the image was printed on a Tektronix Phaser II printer.

APPLICATIONS FOR THE DATABASE

Although the information in the database is accessible independently from the 2-D gels, it is often seen merely as a master 2-D gel database for E. coli. While it does serve this purpose, there are many other applications for this database. The global approach of this database offers a special set of data to E. coli investigators because it links a genome analysis (Genome Expression Map) with physiological and regulatory analyses (Response/Regulation Map). These types of cellular protein databases are also being constructed for Drosophila melanogaster, mice, rats, and humans (reviewed in reference 62). Each eukaryotic database focuses on a specific type of cell, tissue, or body fluid. The aim of the Drosophila database is to study the variations in individual proteins in different developmental processes (51). The mammalian databases are trying to find proteins altered by disease states of cells and are also examining the effects of drug therapies. Many of the human databases are also linked to the human genome project (8).

The database primarily serves two types of applications: (i) for individual proteins, the database lists how the level and/or synthesis rate varies under different conditions and in different mutant strains; and (ii) for diagnosing the physiological state of a culture, the database identifies sets of proteins that are known by 2-D gel analysis to respond to a particular condition. Table 3 lists the responses of individual proteins to several conditions, making it relatively easy to identify the groups of proteins that respond similarly. With the recent developments in image analysis, more proteins can be analyzed per experiment. In all cases, when the protein spot has been identified as the product of a gene (through the Expression Map), subsequent analysis can go much further.

Perhaps one of the best examples of the use of the database to study an individual gene is the universal stress protein. This protein, C013.5, is a fairly abundant protein under the standard growth conditions used for the database (aerobic growth in glucose minimal medium at 37°C) but was observed to be induced by almost all of the stress conditions tested (Table 4). By reverse genetics (using protein purified on 2-D gels), the gene was identified and cloned, the DNA sequence was determined, and mutants were made (36). None of the known regulatory proteins for the stress responses appear to control this gene. Several phenotypes for the null mutant have been observed, which suggests that the protein is involved in regulating the utilization of glucose and the intermediates of glucose metabolism and also in regulating the steps involved in the differentiation of cells into an easily recoverable postexponential state (37). No other studies of either of these processes had ever identified this protein.

The physiological state of a culture is very difficult to diagnose. Many techniques for measuring or examining a single molecule or enzymatic activity have been developed. A more global look at the physiological states of cells can be taken by means of 2-D gel analysis. This approach allows investigators to alternate between 2-D gel analysis of physiological states of cells and genetic and biochemical analyses of individual genes. The best example of this application is the study of the heat shock response. One of the first global studies done by 2-D gels was of the response to a temperature shift (26). Early studies of the responses to changes in temperature had indicated that protein synthesis was unaffected (for shifts from 37 to 42°C in which the growth rate is unchanged). However, examination of pulse-labeled proteins on 2-D gels revealed that the synthesis rates of almost all proteins change transiently (26). The rate of synthesis of a small set of proteins was found to increase dramatically after a temperature shift-up. Later, 2-D gel analysis of a temperature-sensitive mutant revealed that this set of proteins was part of a regulon (30). Many genetic and biochemical studies that characterized the regulatory gene and its protein followed (31). Many of the members of this regulon had previously been characterized through genetic and biochemical analyses and were subsequently identified as heat shock proteins by means of 2-D gel analysis (e.g., see references 55 and 57). Even the signal transduction pathway for this regulon has been partially studied by 2-D gel analysis (60). Many of the stress conditions listed in Table 4 were used as part of the study of inducers of heat shock proteins. This type of global analysis is beginning to play an important part in expanding our information on other regulons as well (e.g., the LRP regulon [15]), which had previously been studied extensively through the biochemical and genetic analyses of one (or a small set) of the regulon members.

The database contains information on the levels of certain proteins at various growth rates that can prove useful in yet another way. For example, this information was used as the basis for experiments that used a novel approach to estimating the growth rate of Salmonella typhimurium (official designation, Salmonella enterica serovar Typhimurium) while these bacteria resided within macrophage host cells (1). Within a certain range, the levels of various translation factors and ribosomal proteins vary directly with growth rate (43). The level of ribosomal protein L7/L12 seen on 2-D gels produced from intracellular S. typhimurium suggested that the intracellular bacteria were growing rapidly. Prior to these experiments, the growth rates of intracellular bacteria had been estimated solely by counting viable bacteria following lysis of the host cells. The viable-count approach had indicated that intracellular S. typhimurium cells were growing quite slowly. These contrasting results led to further experiments in which it was determined that the intracellular bacteria consisted of at least two populations, one not dividing but viable and the other rapidly dividing (1).
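The idea of reading a growth rate off a protein level can be sketched in a few lines of Python. This is only an illustration, not the procedure of reference 1: the reference values, the units, and the function name estimate_growth_rate are all hypothetical. The sketch interpolates linearly between calibration cultures and refuses to extrapolate, since the relationship is stated to hold only within a certain range.

```python
from bisect import bisect_right

# Hypothetical calibration cultures, sorted by level:
# (relative L7/L12 level, growth rate in doublings per hour).
REFERENCE = [(0.5, 0.4), (1.0, 1.0), (1.5, 1.6), (2.0, 2.2)]

def estimate_growth_rate(level, reference=REFERENCE):
    """Interpolate a growth rate from a measured protein level.

    Raises ValueError outside the calibrated range, because the direct
    relationship between level and growth rate is assumed to hold only there.
    """
    levels = [point[0] for point in reference]
    rates = [point[1] for point in reference]
    if not levels[0] <= level <= levels[-1]:
        raise ValueError("level lies outside the calibrated range")
    # Index of the upper calibration point of the enclosing interval.
    upper = min(bisect_right(levels, level), len(levels) - 1)
    x0, x1 = levels[upper - 1], levels[upper]
    y0, y1 = rates[upper - 1], rates[upper]
    return y0 + (y1 - y0) * (level - x0) / (x1 - x0)
```

An estimate of this kind describes the population that dominates the protein signal, which is one way a rapidly dividing subpopulation could be visible on a gel while viable counts suggest slow net growth.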

A third type of query of the database is used to identify cellular trends for proteins. For example, Savageau used the database to look at the distribution of MW of proteins (52). Similar types of distributions for pI, amino acid usage, abundances of different classes of proteins, and even consensus sequences within the promoter regions for sets of coregulated genes could be determined by using the information in the database, especially as the number of identified proteins (in the Genome Expression Map project) and the number of conditions (in the Response/Regulation Map project) increase to represent a larger fraction of the total number of E. coli proteins. Once 2-D gel databases for other bacterial species are initiated, interesting comparative studies will be possible.
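A trend query of this kind amounts to binning one column of the database. The Python sketch below shows the idea for MW. The MW values, the 10-kDa bin width, and the function name mw_histogram are illustrative assumptions and are not taken from the database or from reference 52.

```python
from collections import Counter

def mw_histogram(mws_kda, bin_width_kda=10.0):
    """Count proteins per MW bin; keys are (lower, upper) bin edges in kDa."""
    counts = Counter(int(mw // bin_width_kda) for mw in mws_kda)
    return {
        (b * bin_width_kda, (b + 1) * bin_width_kda): counts[b]
        for b in sorted(counts)
    }

# Hypothetical MWs (kDa) standing in for one column of the database.
example_mws = [13.5, 15.4, 21.5, 25.3, 56.0, 62.5, 84.1]
histogram = mw_histogram(example_mws)
```

The same function could be applied to pI values by passing a narrower bin width, since nothing in it is specific to MW.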

REFERENCE 2-D GELS

Five figures are included in the database: the three reference 2-D gels published in a previous edition of the database (63) (Fig. 1–3), one new reference 2-D gel that represents the Response/Regulation Map (Fig. 4), and a figure that gives the distributions of MW and pI for the proteins identified on these reference gels (Fig. 5). The reference gels are overlaid with grids, and the exact coordinates for each protein in the database are listed in Table 1 under the spot name. The coordinates for Fig. 4 are assigned by the computer program. A coarse grid was placed on the figure to locate the spots. The equations listed in Fig. 5 were used to estimate the MWs and pIs of the proteins listed in Table 1.
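To illustrate how calibration equations of this kind can be derived and applied, the following Python sketch fits pI against migration in the first dimension and log10(MW) against migration in the second dimension, and then estimates values for an arbitrary spot position. The functional forms (a straight line for pI, a log-linear curve for MW), the function names, and every coordinate value are assumptions made for illustration; they are not the equations or data of Fig. 5.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical identified spots:
# (x migration, y migration, calculated pI, calculated MW).
CALIBRATION = [
    (20.0, 0.0, 4.8, 100000.0),
    (40.0, 50.0, 5.6, 31622.7766),
    (60.0, 100.0, 6.4, 10000.0),
    (80.0, 75.0, 7.2, 17782.7941),
]

# Assumed forms: pI is linear in x; log10(MW) is linear in y.
PI_SLOPE, PI_INTERCEPT = fit_line(
    [spot[0] for spot in CALIBRATION], [spot[2] for spot in CALIBRATION]
)
MW_SLOPE, MW_INTERCEPT = fit_line(
    [spot[1] for spot in CALIBRATION],
    [math.log10(spot[3]) for spot in CALIBRATION],
)

def estimate_pi(x):
    """Estimated pI for a spot at first-dimension position x."""
    return PI_SLOPE * x + PI_INTERCEPT

def estimate_mw(y):
    """Estimated MW for a spot at second-dimension position y."""
    return 10 ** (MW_SLOPE * y + MW_INTERCEPT)
```

Fitting the reverse relationship (position as a function of pI or MW) uses the same fit_line helper with the argument order swapped.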

DATABASE TABLES

The volume of data found in this database is difficult to present as tables, especially considering the numerous starting points for posing questions of the database. Users are encouraged to obtain the electronic version of the database (see section on Information Exchange).

Table 1

Table 1 (p. 2076) gives the positions of protein spots on 2-D gels and the MW and pI for each protein. This table is sorted in order by the spot name, first by alphanumeric names and then by the Response/Regulation Map names. All of the protein spots listed in other tables of the database are listed in Table 1. All of the spots observed in Fig. 1 and 4 have been assigned names, but some have no data entered and thus have not been included in this tabular version of the database. They are listed in the electronic version, ECO2DBASE (see the section on Information Exchange). Table 1 lists the coordinate positions (on Fig. 1 through 4) for the spots, the calculated MW and pI of each identified protein, and an estimated MW and pI for every protein in the table.

Table 2

Table 2 (p. 2094) lists all of the proteins that have been identified as products of particular genes (or ORFs found within the DNA sequence) or are known proteins. The table is sorted by gene name, and it references all of the information in the Expression Map. The following types of information for each protein spot are included: gene name, protein name (if one has been assigned), alphanumeric name, category of function (48), EC number, SWISS-PROT number, GenBank codes, direction of the gene on the chromosome, genetic map location, physical map location (using the Kohara miniset to approximate the location), basis of the identification, and donor of the material used in the identification. Table 2 lists some proteins expressed from a specific Kohara clone but not linked to a gene contained on that clone.

TABLE 4 Proteins belonging to stimulons and regulons

Table 3

Table 3 (p. 2101) lists all proteins included in a global study in which the level or synthesis rates of proteins were measured. Columns 3 to 14 represent steady-state growth conditions; the next 5 columns list growth transition conditions. The table presents the data, and the footnotes give a brief description of the experiment and/or the paper that originally presented the data. Included in this table are the gene names associated with identified proteins.

Table 4

Table 4 (p. 2112) lists the protein spots induced by one or more of the conditions not listed in Table 3. Y indicates that the proteins appeared to be induced, according to visual analysis of the 2-D gels, and Y followed by a number indicates the induction ratio of that protein. This table also lists proteins belonging to one or more regulons (only the HTP, SOS, OXY, and LRP regulons have been included so far).
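For users working with an electronic copy, the Y notation can be decoded mechanically. The Python sketch below is one possible reading of it and rests on assumptions not stated in the text: that an empty cell means no induction was reported, and that the ratio, when present, follows the Y with or without a space. The names Induction and parse_entry are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Induction(NamedTuple):
    induced: bool           # True if the cell carries a Y
    ratio: Optional[float]  # induction ratio, if one was given

# "Y" alone, or "Y" followed by a number such as "Y 3.5" or "Y12".
_ENTRY = re.compile(r"^Y\s*(\d+(?:\.\d+)?)?$")

def parse_entry(cell: str) -> Induction:
    """Decode one Table 4 style cell into an Induction record."""
    text = cell.strip()
    if not text:
        # Assumption: an empty cell means induction was not reported.
        return Induction(False, None)
    match = _ENTRY.match(text)
    if match is None:
        raise ValueError(f"unrecognized entry: {cell!r}")
    ratio = float(match.group(1)) if match.group(1) else None
    return Induction(True, ratio)
```

Keeping the visual calls (Y alone) distinct from the measured ratios avoids treating a qualitative observation as a quantitative one in later analysis.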

INFORMATION EXCHANGE

Information exchange is a priority issue for the database. By 1990, information from numerous publications, laboratory notebooks, and the gene-protein index had all been entered into an electronic version of the database. In 1992, the electronic version was deposited at the database repository at the National Center for Biotechnology Information, and updates were submitted to make all of the information accessible to investigators. Large-volume information databases are best used in electronic form, and users are encouraged to obtain the database through anonymous ftp from the repository. The Internet address is ncbi.nlm.gov (or 130.14.20.1), and the database is in the directory /ncbi/repository/ECO2DBASE. The reference 2-D gels are in the GELS directory, and the database and information files are in the edition6 directory. For those users who do not have access to the Internet, a copy of the database can be obtained from the authors (please specify a disk format).

FIGURE 5 Plots of the pIs of identified proteins (Table 2) versus migration in the first dimension and of the MWs of identified proteins versus migration in the second dimension for each of the five reference images in Fig. 1 through 4. Equations for each line or curve (generated by the graphical software program KaleidaGraph) are given along with the number of proteins on each plot and the r value for each line or curve. These equations were used to estimate the pIs and MWs given in Table 1, and the equations generated for the reverse plot for Fig. 1 and 3B were used to estimate the locations on 2-D gels (Fig. 1 or 2B) of unidentified proteins (based on MWs and pIs calculated from their amino acid compositions).

The alphanumeric names of proteins that have been identified have been incorporated into the other databases, including the SWISS-PROT protein database (5) and the ECD database (24), so that users can easily and accurately move among the different databases. A new database for E. coli (based on the Caenorhabditis elegans database) is being developed. It will serve as an encyclopedia of all the information known about E. coli (Staffan Bergh, personal communication). All of the independent databases are being included in this encyclopedia. The gene-protein database, including the 2-D reference gels, has already been entered.

Other investigators can contribute information to the database. For the Genome Expression Map project, samples of purified proteins can be sent to assist in the identification project. For the Response/Regulation Map, investigators are encouraged to submit physiological and regulatory information from their own 2-D gel analyses (as was done by B. Ernsting and R. Matthews [15]), although this requires that the 2-D gel pattern closely match that of the reference gels.

ACKNOWLEDGMENTS

The Genome Expression Map project is supported by grant DMB-8903787 from the National Science Foundation and grant GM17892 from the National Institutes of Health (NIH). Current work on the Response/Regulation Map is supported through Parke-Davis Pharmaceutical Research. A. Pertsemlidis was supported by NIH grant GM08352–784525–31002.

We thank the many investigators (listed in Table 2) who have contributed biological material for protein identifications. We thank Amos Bairoch for assistance with the gene names and SWISS-PROT accession numbers and Manfried Kroger and Kenn Rudd for their assistance with map positions of genes. We also acknowledge all of the scientists who have worked on the database in the past: David Appleby, Philip L. Bloch, Jacqueline A. Bogan, Madhumita Ghosh, Sherrie Herendeen, M. Elizabeth Hutton, Douglas Irvine, Peggy LeMaux, Steen Pedersen, Teresa A. Phillips, Sankar P. Reddy, Solvejg Reeh, and Vicki Vaughn.

LITERATURE CITED

1. Abshire, K. Z., and F. C. Neidhardt. 1993. Growth rate paradox of Salmonella typhimurium within host macrophages. J. Bacteriol. 175:3744–3748.

2. Ames, G. F.-L., and K. Nikaido. 1976. Two-dimensional gel electrophoresis of membrane proteins. Biochemistry 15:616–622.

2a. Allen, S. P., J. O. Polazzi, J. K. Gierse, and A. M. Easton. 1992. Two novel heat shock genes encoding proteins produced in response to heterologous protein expression in Escherichia coli. J. Bacteriol. 174:6938–6947.

3. Ang, D., G. N. Chandrasekhar, M. Zylicz, and C. Georgopoulos. 1986. Escherichia coli grpE gene codes for heat shock protein B25.3, essential for both lambda DNA replication at all temperatures and host growth at high temperature. J. Bacteriol. 167:25–29.

4. Bachmann, B. J. 1990. Linkage map of Escherichia coli K-12, edition 8. Microbiol. Rev. 54:130–197.

5. Bairoch, A., and B. Boeckmann. 1993. The SWISS-PROT protein sequence data bank, recent developments. Nucleic Acids Res. 21:3093–3096.

6. Bloch, P. L., T. A. Phillips, and F. C. Neidhardt. 1980. Protein identifications of O’Farrell two-dimensional gels: locations of 81 Escherichia coli proteins. J. Bacteriol. 141:1409–1420.

7. Blumenthal, R. M., P. G. Lemaux, F. C. Neidhardt, and P. P. Dennis. 1976. The effects of the relA gene on the synthesis of aminoacyl-tRNA synthetases and other transcription and translation proteins in Escherichia coli A. Mol. Gen. Genet. 149:291–296.

8. Celis, J. E., H. H. Rasmussen, E. Olsen, P. Madsen, H. Leffers, B. Honore, K. Dejgaard, P. Gromov, H. J. Hoffmann, and M. Nielsen. 1993. The human keratinocyte two-dimensional gel protein database: update 1993. Electrophoresis 14:1091–1198.

9. Christman, M. F., R. W. Morgan, F. S. Jacobson, and B. N. Ames. 1985. Positive control of a regulon for defenses against oxidative stress and some heat-shock proteins in Salmonella typhimurium. Cell 41:753–762.

10. Chuang, S.-E., and F. R. Blattner. 1993. Characterization of twenty-six new heat shock genes of Escherichia coli. J. Bacteriol. 175:5242–5252.

11. Clarke, L., and J. Carbon. 1976. A colony bank containing synthetic ColE1 hybrid plasmids representative of the entire E. coli genome. Cell 9:91–99.

12. Copeland, B. R., R. J. Richter, and C. E. Furlong. 1982. Renaturation and identification of periplasmic proteins in two-dimensional gels of Escherichia coli. J. Biol. Chem. 257:15065–15071.

13. Daniels, D. L., G. Plunkett, V. Burland, and F. Blattner. 1992. DNA sequence of E. coli. I. The region from 84.5 to 86.5 minutes. Science 257:771–778.

14. Engstrom, P., and G. L. Hazelbauer. 1980. Multiple methylation of methyl-accepting chemotaxis proteins during adaptation of E. coli to chemical stimuli. Cell 20:165–171.

15. Ernsting, B. R., M. R. Atkinson, A. J. Ninfa, and R. G. Matthews. 1992. Characterization of the regulon controlled by the leucine-responsive regulatory protein in Escherichia coli. J. Bacteriol. 174:1109–1118.

16. Gage, D. J., and F. C. Neidhardt. 1993. Adaptation of Escherichia coli to the uncoupler of oxidative phosphorylation 2,4-dinitrophenol. J. Bacteriol. 175:7105–7108.

17. Goldstein, J., N. S. Pollitt, and M. Inouye. 1990. Major cold shock protein of Escherichia coli. Proc. Natl. Acad. Sci. USA 87:283–287.

18. Goodlove, P. E., P. R. Cunningham, J. Parker, and D. P. Clark. 1989. Cloning and sequence analysis of the fermentative alcohol-dehydrogenase-encoding gene of Escherichia coli. Gene 85:209–214.

19. Gudas, L. J., and D. W. Mount. 1977. Identification of the recF (tif) gene product of Escherichia coli. Proc. Natl. Acad. Sci. USA 74:5280–5284.

20. Herendeen, S. H., R. A. VanBogelen, and F. C. Neidhardt. 1979. Levels of major proteins of Escherichia coli during growth at different temperatures. J. Bacteriol. 139:185–194.

21. Jones, P. G., R. A. VanBogelen, and F. C. Neidhardt. 1987. Induction of proteins in response to low temperature in Escherichia coli. J. Bacteriol. 169:2092–2095.

22. Kaltschmidt, E., and H. G. Wittmann. 1970. Ribosomal proteins. VII. Two-dimensional polyacrylamide gel electrophoresis for fingerprinting of ribosomal proteins. Anal. Biochem. 36:401–412.

23. Kohara, Y., K. Akiyama, and K. Isono. 1987. The physical map of the whole E. coli chromosome: application of a new strategy for rapid analysis and sorting of a large genomic library. Cell 50:495–508.

24. Kroger, M., R. Wahl, and P. Rice. 1993. Compilation of DNA sequences of Escherichia coli (update 1993). Nucleic Acids Res. 21:2973–3000.

25. Kroh, H. E., and L. D. Simon. 1990. The ClpP component of Clp protease is the sigma-32 dependent heat shock protein F21.5. J. Bacteriol. 172:6026–6034.

26. Lemaux, P. G., S. L. Herendeen, P. L. Bloch, and F. C. Neidhardt. 1978. Transient rates of synthesis of individual polypeptides in E. coli following temperature shifts. Cell 13:427–434.

27. Neidhardt, F. C. 1987. Multigene systems and regulons, p. 1313–1317. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.

28. Neidhardt, F. C., P. L. Bloch, S. Pedersen, and S. Reeh. 1977. Chemical measurement of steady-state levels of ten aminoacyl-transfer ribonucleic acid synthetases in Escherichia coli. J. Bacteriol. 129:378–387.

29. Neidhardt, F. C., P. L. Bloch, and D. F. Smith. 1974. Culture media for enterobacteria. J. Bacteriol. 119:736–747.

30. Neidhardt, F. C., and R. A. VanBogelen. 1981. Positive regulatory gene for temperature-controlled proteins in Escherichia coli. Biochem. Biophys. Res. Commun. 100:894–900.

31. Neidhardt, F. C., and R. A. VanBogelen. 1987. Heat shock response, p. 1334–1345. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.

32. Neidhardt, F. C., R. A. VanBogelen, and V. Vaughn. 1984. The genetics and regulation of heat-shock proteins. Annu. Rev. Genet. 18:295–329.

33. Neidhardt, F. C., V. Vaughn, T. A. Phillips, and P. L. Bloch. 1983. Gene-protein index of Escherichia coli K-12. Microbiol. Rev. 47:231–284.

34. Neidhardt, F. C., R. Wirth, M. W. Smith, and R. VanBogelen. 1980. Selective synthesis of plasmid-coded proteins by Escherichia coli during recovery from chloramphenicol treatment. J. Bacteriol. 143:535–537.

35. Nomenclature Committee of the International Union of Biochemistry. 1984. Enzyme Nomenclature. Academic Press, Inc., New York.

36. Nystrom, T., and F. C. Neidhardt. 1992. Cloning, mapping, and nucleotide sequence of a gene encoding a universal stress protein in Escherichia coli. Mol. Microbiol. 6:3187–3198.

37. Nystrom, T., and F. C. Neidhardt. 1993. Isolation and properties of a mutant of Escherichia coli with an insertional inactivation of the uspA gene, which encodes a universal stress protein. J. Bacteriol. 175:3949–3956.

38. O’Farrell, P. H. 1975. High resolution two-dimensional electrophoresis of proteins. J. Biol. Chem. 250:4007–4021.

39. O’Farrell, P. Z., H. M. Goodman, and P. H. O’Farrell. 1977. High resolution two-dimensional electrophoresis of basic as well as acidic proteins. Cell 12:1133–1142.

40. Parker, J. 1984. Identification of the purC gene product of Escherichia coli. J. Bacteriol. 157:712–717.

41. Patterson, S. D., and G. I. Latter. 1993. Evaluation of storage phospho imaging for quantitative analysis of 2-D gels using the Quest II system. BioComputing 15:1076–1083.

42. Patton, W. F., M. F. Lopez, P. Barry, and W. M. Skea. 1992. A mechanically strong matrix for protein electrophoresis with enhanced silver staining properties. BioTechniques 12:580–585.

43. Pedersen, S., P. L. Bloch, S. Reeh, and F. C. Neidhardt. 1978. Patterns of protein synthesis in E. coli: a catalog of the amount of 140 individual proteins at different growth rates. Cell 14:179–190.

44. Phillips, T. A., P. L. Bloch, and F. C. Neidhardt. 1980. Protein identifications on O’Farrell two-dimensional gels: locations of 55 additional Escherichia coli proteins. J. Bacteriol. 144:1024–1033.

45. Phillips, T. A., V. Vaughn, P. L. Bloch, and F. C. Neidhardt. 1987. Gene-protein index of Escherichia coli K-12, edition 2, p. 919–966. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.

46. Reeh, S., and S. Pedersen. 1979. Post-translational modification of Escherichia coli ribosomal protein S6. Mol. Gen. Genet. 173:183–187.

47. Reeve, J. 1979. The use of minicells for bacteriophage directed polypeptide synthesis. Methods Enzymol. 68:493–503.

48. Riley, M. 1993. Functions of the gene products of Escherichia coli. Microbiol. Rev. 57:862–952.

49. Sancar, A., A. M. Hack, and W. D. Rupp. 1979. Simple method for identification of plasmid-coded proteins. J. Bacteriol. 137:692–693.

50. Sankar, P., M. E. Hutton, R. A. VanBogelen, R. L. Clark, and F. C. Neidhardt. 1993. Expression analysis of cloned chromosomal segments of Escherichia coli. J. Bacteriol. 175:5145–5152.

51. Santaren, J. F. 1990. Towards establishing a protein database of Drosophila. Electrophoresis 11:254–267.

52. Savageau, M. A. 1986. Proteins of Escherichia coli come in sizes that are multiples of 14 kDa: domain concepts and evolutionary implications. Proc. Natl. Acad. Sci. USA 83:1198–1202.

52a. Sood, P., C. G. Lerner, T. Shimamoto, Q. Lu, and M. Inouye. 1994. Characterization of Era, essential Escherichia coli GTPase. Mol. Microbiol. 12:201–208.

53. Smith, M. W., and F. C. Neidhardt. 1983. Proteins induced by anaerobiosis in Escherichia coli. J. Bacteriol. 154:336–343.

54. Smith, M. W., and F. C. Neidhardt. 1983. Proteins induced by aerobiosis in Escherichia coli. J. Bacteriol. 154:344–350.

55. Squires, C. L., S. Petersen, B. M. Ross, and C. Squires. 1991. ClpB is the Escherichia coli heat shock protein F84.1. J. Bacteriol. 173:4254–4262.

56. Studier, F. W., and B. A. Moffatt. 1986. Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J. Mol. Biol. 189:113–130.

57. Tilly, K., R. A. VanBogelen, C. Georgopoulos, and F. C. Neidhardt. 1983. Identification of the heat-inducible protein C15.4 as the groES gene product in Escherichia coli. J. Bacteriol. 154:1505–1507.

58. VanBogelen, R. A., M. E. Hutton, and F. C. Neidhardt. 1990. Gene protein database of Escherichia coli K-12: edition 3. Electrophoresis 11:1131–1166.

59. VanBogelen, R. A., P. M. Kelley, and F. C. Neidhardt. 1987. Differential induction of heat shock, SOS, and oxidation stress regulons and accumulation of nucleotides in Escherichia coli. J. Bacteriol. 169:26–32.

60. VanBogelen, R. A., and F. C. Neidhardt. 1990. Ribosomes as sensors of heat and cold shock in Escherichia coli. Proc. Natl. Acad. Sci. USA 87:5589–5593.

61. VanBogelen, R. A., and F. C. Neidhardt. 1991. The gene-protein database of Escherichia coli K-12: edition 4. Electrophoresis 12:955–994.

62. VanBogelen, R. A., and E. R. Olson. Application of 2-D protein gels in biotechnology. Biotech. Annu. Rev., in press.

63. VanBogelen, R. A., P. Sankar, R. L. Clark, J. A. Bogan, and F. C. Neidhardt. 1992. The gene-protein database of Escherichia coli K-12: edition 5. Electrophoresis 13:1014–1054.

64. Walker, G. C. 1987. The SOS response of Escherichia coli, p. 1346–1357. In F. C. Neidhardt, J. L. Ingraham, K. B. Low, B. Magasanik, M. Schaechter, and H. E. Umbarger (ed.), Escherichia coli and Salmonella typhimurium: Cellular and Molecular Biology, vol. 2. American Society for Microbiology, Washington, D.C.

65. Wanner, B. L. 1992. Is cross regulation by phosphorylation of two-component response regulator proteins important in bacteria? J. Bacteriol. 174:2053–2058.