Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella...

15
INFECTION AND IMMUNITY, 0019-9567/01/$04.0010 DOI: 10.1128/IAI.69.5.3271–3285.2001 May 2001, p. 3271–3285 Vol. 69, No. 5 Copyright © 2001, American Society for Microbiology. All Rights Reserved. Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella flexneri MALABI M. VENKATESAN, 1 * MARCIA B. GOLDBERG, 2 * DEBRA J. ROSE, 3 ERIK J. GROTBECK, 3 VALERIE BURLAND, 3 AND FREDERICK R. BLATTNER 3 Department of Enteric Infections, Division of Communicable Diseases and Immunology, Walter Reed Army Institute of Research, Silver Spring, Maryland 20910 1 ; Infectious Disease Division, Massachusetts General Hospital, Boston, Massachuetts 02114 2 ; and Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 53706 3 Received 19 September 2000/Returned for modification 22 November 2000/Accepted 31 January 2001 The complete sequence analysis of the 210-kb Shigella flexneri 5a virulence plasmid was determined. Shigella spp. cause dysentery and diarrhea by invasion and spread through the colonic mucosa. Most of the known Shigella virulence determinants are encoded on a large plasmid that is unique to virulent strains of Shigella and enteroinvasive Escherichia coli; these known genes account for approximately 30 to 35% of the virulence plasmid. In the complete sequence of the virulence plasmid, 286 open reading frames (ORFs) were identified. An astonishing 153 (53%) of these were related to known and putative insertion sequence (IS) elements; no known bacterial plasmid has previously been described with such a high proportion of IS elements. Four new IS elements were identified. Fifty putative proteins show no significant homology to proteins of known function; of these, 18 have a G1C content of less than 40%, typical of known virulence genes on the plasmid. These 18 constitute potentially unknown virulence genes. Two alleles of shet2 and five alleles of ipaH were also identified on the plasmid. Thus, the plasmid sequence suggests a remarkable history of IS-mediated acquisition of DNA across bacterial species. The complete sequence will permit targeted characterization of potential new Shigella virulence determinants. Shigella spp. continue to be a major health problem world- wide, causing an estimated 1 million deaths and 163 million cases of dysentery annually (29), predominantly in children younger than 5 years of age in developing countries. Shigella spp. cause bacillary dysentery in humans by invading and rep- licating in epithelial cells of the colon, causing an intense inflammatory reaction, characterized by abscess formation and ulceration, which damages the colonic epithelium. Shigella entry into susceptible host cells requires the coordi- nated expression of numerous genes that are activated in re- sponse to environmental cues (29, 68). Upon contact with host cells, the bacteria secrete factors (IpaB, IpaC, and IpaA) that induce large membrane ruffles on the host cell surface associ- ated with cytoskeletal rearrangements that lead to internaliza- tion of bacteria into the host cell. Once internalized, Shigella spp. release factors that cause lysis of the phagocytic vacuole, thereby releasing the bacteria into the cell cytoplasm, where the bacterial surface protein VirG (IcsA) assembles actin tails that propel the bacteria through the cell cytoplasm and into adjacent cells. Additional bacterial factors lead to release of proinflammatory cytokines, osmotic leak of the mucosal epi- thelium (ShET1 and ShET2), and, in macrophages, induction of cell death (IpaB) (20, 38, 75). Most work on the molecular pathogenesis of Shigella has been carried out in S. flexneri serotypes 2a and 5a. The entire complement of genes critical for invasion of epithelial cells is contained on a large 220-kb plasmid, termed the virulence plasmid or the invasion plasmid, which is present in all patho- genic strains (68). Known to be located on the virulence plas- mid is a locus of genes (ipa-mxi-spa) that encode proteins involved in invasion of mammalian cells and which has ho- mologs in Salmonella, Yersinia, enteropathogenic Escherichia coli, the plant pathogens Ralstonia salanacearum and Xan- thomonas campestris, and the flagellar assembly loci of Salmo- nella enterica serovar Typhimurium (21). In addition to those mentioned above, several other virulence plasmid proteins have been previously characterized (68). Together, the known genes account for approximately 30 to 35% of the virulence plasmid. Sequence analysis of the entire virulence plasmid was undertaken to permit identification of all plasmid-based proteins, including potential new determi- nants of virulence, to facilitate the understanding of evolution- ary assembly of the plasmid and mechanisms that generate spontaneous deletions which lead to strain avirulence, and to permit comparisons of virulence plasmids of different Shigella serotypes and strains and of different bacteria that vary in virulence properties. Herein, initial analysis of the virulence plasmid from S. flexneri 5a is presented. MATERIALS AND METHODS Bacterial strains and plasmids. The virulence plasmid pWR501, which was sequenced in this study, had been previously derived from the native virulence plasmid pWR100 (39). In brief, S. flexneri wild-type serotype 5a strain M90T, harboring pWR100, was crossed with E. coli MC1061 carrying pMT999, a Tn501- labeled, self-transmissible, temperature-sensitive plasmid. The cross resulted in M90T carrying a cointegrate of pWR100 and pMT999, which was subsequently mobilized into E. coli 395-1 by conjugation. Following passage of this strain at 42°C with appropriate antibiotic selection, a pWR100 derivative from which * Corresponding author. Mailing address for Malabi M. Venkate- san: Department of Enteric Infections, Division of Communicable Diseases and Immunology, Walter Reed Army Institute of Research, 503 Robert Forney Dr., Silver Spring, MD 20910. Phone: (301) 319- 9764. Fax: (301) 319-9801. E-mail: [email protected] .ARMY.MIL. Mailing address for Marcia B. Goldberg: Infectious Disease Division, Massachusetts General Hospital, GRJ504, 55 Fruit St., Boston, MA 02114. Phone: (617) 432-3238. Fax: (617) 432-3259. E-mail: [email protected]. 3271 on August 26, 2020 by guest http://iai.asm.org/ Downloaded from

Transcript of Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella...

Page 1: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

INFECTION AND IMMUNITY,0019-9567/01/$04.0010 DOI: 10.1128/IAI.69.5.3271–3285.2001

May 2001, p. 3271–3285 Vol. 69, No. 5

Copyright © 2001, American Society for Microbiology. All Rights Reserved.

Complete DNA Sequence and Analysis of the Large VirulencePlasmid of Shigella flexneri

MALABI M. VENKATESAN,1* MARCIA B. GOLDBERG,2* DEBRA J. ROSE,3 ERIK J. GROTBECK,3

VALERIE BURLAND,3 AND FREDERICK R. BLATTNER3

Department of Enteric Infections, Division of Communicable Diseases and Immunology, Walter Reed Army Institute ofResearch, Silver Spring, Maryland 209101; Infectious Disease Division, Massachusetts General Hospital, Boston,

Massachuetts 021142; and Laboratory of Genetics, University of Wisconsin, Madison, Wisconsin 537063

Received 19 September 2000/Returned for modification 22 November 2000/Accepted 31 January 2001

The complete sequence analysis of the 210-kb Shigella flexneri 5a virulence plasmid was determined. Shigellaspp. cause dysentery and diarrhea by invasion and spread through the colonic mucosa. Most of the knownShigella virulence determinants are encoded on a large plasmid that is unique to virulent strains of Shigella andenteroinvasive Escherichia coli; these known genes account for approximately 30 to 35% of the virulenceplasmid. In the complete sequence of the virulence plasmid, 286 open reading frames (ORFs) were identified.An astonishing 153 (53%) of these were related to known and putative insertion sequence (IS) elements; noknown bacterial plasmid has previously been described with such a high proportion of IS elements. Four newIS elements were identified. Fifty putative proteins show no significant homology to proteins of known function;of these, 18 have a G1C content of less than 40%, typical of known virulence genes on the plasmid. These 18constitute potentially unknown virulence genes. Two alleles of shet2 and five alleles of ipaH were also identifiedon the plasmid. Thus, the plasmid sequence suggests a remarkable history of IS-mediated acquisition of DNAacross bacterial species. The complete sequence will permit targeted characterization of potential new Shigellavirulence determinants.

Shigella spp. continue to be a major health problem world-wide, causing an estimated 1 million deaths and 163 millioncases of dysentery annually (29), predominantly in childrenyounger than 5 years of age in developing countries. Shigellaspp. cause bacillary dysentery in humans by invading and rep-licating in epithelial cells of the colon, causing an intenseinflammatory reaction, characterized by abscess formation andulceration, which damages the colonic epithelium.

Shigella entry into susceptible host cells requires the coordi-nated expression of numerous genes that are activated in re-sponse to environmental cues (29, 68). Upon contact with hostcells, the bacteria secrete factors (IpaB, IpaC, and IpaA) thatinduce large membrane ruffles on the host cell surface associ-ated with cytoskeletal rearrangements that lead to internaliza-tion of bacteria into the host cell. Once internalized, Shigellaspp. release factors that cause lysis of the phagocytic vacuole,thereby releasing the bacteria into the cell cytoplasm, wherethe bacterial surface protein VirG (IcsA) assembles actin tailsthat propel the bacteria through the cell cytoplasm and intoadjacent cells. Additional bacterial factors lead to release ofproinflammatory cytokines, osmotic leak of the mucosal epi-thelium (ShET1 and ShET2), and, in macrophages, inductionof cell death (IpaB) (20, 38, 75).

Most work on the molecular pathogenesis of Shigella has

been carried out in S. flexneri serotypes 2a and 5a. The entirecomplement of genes critical for invasion of epithelial cells iscontained on a large 220-kb plasmid, termed the virulenceplasmid or the invasion plasmid, which is present in all patho-genic strains (68). Known to be located on the virulence plas-mid is a locus of genes (ipa-mxi-spa) that encode proteinsinvolved in invasion of mammalian cells and which has ho-mologs in Salmonella, Yersinia, enteropathogenic Escherichiacoli, the plant pathogens Ralstonia salanacearum and Xan-thomonas campestris, and the flagellar assembly loci of Salmo-nella enterica serovar Typhimurium (21). In addition to thosementioned above, several other virulence plasmid proteinshave been previously characterized (68).

Together, the known genes account for approximately 30 to35% of the virulence plasmid. Sequence analysis of the entirevirulence plasmid was undertaken to permit identification ofall plasmid-based proteins, including potential new determi-nants of virulence, to facilitate the understanding of evolution-ary assembly of the plasmid and mechanisms that generatespontaneous deletions which lead to strain avirulence, and topermit comparisons of virulence plasmids of different Shigellaserotypes and strains and of different bacteria that vary invirulence properties. Herein, initial analysis of the virulenceplasmid from S. flexneri 5a is presented.

MATERIALS AND METHODS

Bacterial strains and plasmids. The virulence plasmid pWR501, which wassequenced in this study, had been previously derived from the native virulenceplasmid pWR100 (39). In brief, S. flexneri wild-type serotype 5a strain M90T,harboring pWR100, was crossed with E. coli MC1061 carrying pMT999, a Tn501-labeled, self-transmissible, temperature-sensitive plasmid. The cross resulted inM90T carrying a cointegrate of pWR100 and pMT999, which was subsequentlymobilized into E. coli 395-1 by conjugation. Following passage of this strain at42°C with appropriate antibiotic selection, a pWR100 derivative from which

* Corresponding author. Mailing address for Malabi M. Venkate-san: Department of Enteric Infections, Division of CommunicableDiseases and Immunology, Walter Reed Army Institute of Research,503 Robert Forney Dr., Silver Spring, MD 20910. Phone: (301) 319-9764. Fax: (301) 319-9801. E-mail: [email protected]. Mailing address for Marcia B. Goldberg: InfectiousDisease Division, Massachusetts General Hospital, GRJ504, 55 FruitSt., Boston, MA 02114. Phone: (617) 432-3238. Fax: (617) 432-3259.E-mail: [email protected].

3271

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 2: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

pMT999 had been lost and in which the Tn501 integration was retained wasisolated and designated pWR501 (39). This E. coli strain, which lacks the twosmaller Shigella plasmids, was the source for the isolation of virulence plasmidDNA. Plasmid DNA was isolated using previously published methods (70) andwas subjected to two cycles of CsCl banding and purification.

Library construction and sequencing. Plasmid DNA was sheared by nebuli-zation and size fractionated by agarose gel electrophoresis (31) to obtain DNAfragments in the range from 0.7 to 2.0 kb. Purified, end-repaired fragments weresubcloned into M13Janus (7), forming a random library. Clones were sequencedusing Prism dye terminator or BigDye terminator chemistry (P-E Applied Bio-systems, Foster City, Calif.). Data were collected on ABI377 sequencers. Se-quence reads were assembled by Seqman II (DNASTAR, Madison, Wis.). Fin-ishing methods included sequencing of opposite ends of linking clones, PCR-based techniques, and primer walking.

Sequence analysis. The sequence was searched for potential open readingframes (ORFs) by Genemark (5). Each ORF was initially searched against theGenPept protein database using the DeCypher II system, to determine identityor match with known proteins. After subsequent inspection and further analysisof many of the ORFs, annotations were created. Each gene was assigned aunique identifier (ID; S number).

Annotation. Sequence comparisons were performed using the amino acidsequences predicted for the identified ORFs unless otherwise indicated. Signif-icant homology was defined as greater than 30% identity over greater than 60%of both the query sequence and the target sequence, as has been used in othergenome analysis projects (3). An exception was applied in the analysis of inser-tion sequence (IS) elements, where, because we wanted to not miss fragments ofIS elements, we chose to apply the criteria of greater than 50% identity overgreater than 60% of the query sequence only. Based on sequence analysis andthe extent of similarity to hypothetical proteins, potential ORFs were categorizedas detailed below. In a few specific circumstances, we designated protein simi-larity on lower levels of amino acid identity than defined by these criteria; thesecases and all other exceptions to the criteria are noted in the text where appro-priate. The clusters of orthologous genes (COG) algorithm (COGnitor [65]) wasused to identify families to which predicted proteins are related, and ProfileScanand Prosite Scan algorithms were used for the identification of potential func-tional domains.

Nucleotide sequence accession number. The sequence described here has beenassigned GenBank Accession number AF 348706.

RESULTS AND DISCUSSION

Overview of annotation. The fully assembled circular se-quence of pWR501, including the Tn501 selectable marker,consisted of 221,851 bp, 213,491 bp of which was specific to theShigella virulence plasmid. Initial analysis of the sequenceidentified 293 potential ORFs (Fig. 1), 286 of which wereShigella plasmid derived and 7 of which belonged to Tn501.The 8,360-bp Tn501 insert, which had been previously intro-duced into pWR501 (see Materials and Methods), was locatedbetween sequence coordinates 198713 and 207065, with a 7-bpCCTAAAG target site duplication near the pWR501 replicon.

pWR501 is a mosaic of potential pathogenesis-associatedgenes, IS elements, maintenance genes, and unknown ORFs.Of the 286 Shigella-derived potential ORFs, 54 (19%) encodeknown Shigella proteins (category 1, indicated by red bandingin Fig. 1). Thirty-seven of these are located within a 32-kbcluster of uninterrupted ORFs, previously described, constitut-ing the ipa-mxi-spa loci or pathogenicity island (68). The re-maining 17 are distributed throughout the plasmid and includefive alleles of ipaH and one allele each of icsA (virG), virA, icsP(sopA), virF, virK, msbB sepA, ipgH, shet2, phoN-Sf, trcA, andan apyrase gene.

Four previously unidentified ORFs (1.4% of pWR501ORFs) encode proteins that have significant sequence similar-ity to known virulence-associated proteins (category 2, indi-cated by pink banding in Fig. 1); these are homologs of S.flexneri ShET2, the Salmonella serovar Typhimurium intracel-

lular growth and virulence determinant MkaD, the E. colilipopolysaccharide biosynthesis-related protein RfbU, and aUDP-sugar hydrolase (16, 18, 38, 62).

Fifty-two percent of pWR501 ORFs (153 ORFs) are knownand new IS elements (reviewed in reference 32) as well un-known IS elements (categories 3 and 4, indicated by blue andlight blue in Fig. 1). Among these, 20 complete known and atleast 4 new putative IS elements were identified. The majorityof the IS element ORFs are partial copies and, together withthe complete IS elements, are arranged in blocks separatedfrom non-IS related ORFs. In some regions, the arrangementof the IS elements is complex, with fragments of one IS ele-ment ORF interdigitated and often fused with another IS el-ement or fragment thereof, forming a mosaic pattern (Fig. 2).

In pWR501, most virulence-associated genes, including theipa-mxi-spa operons, the virG gene, and the ShET2 toxin gene,are flanked by one or more such mosaics of IS element ORFs(Fig. 2). In addition, many of the unknown ORFs (discussedbelow) are flanked by IS element ORFs and have G1C con-tent of less than 40%. Based on this genetic organization, therecombination events that led to the acquisition of many ormost genetic loci and the assembly of the large virulence plas-mid almost certainly involved IS-mediated events. To date, noknown plasmid in any bacterial species has been described withthis degree of IS element content.

Coincident with the extremely high content of IS element-related ORFs on pWR501 is an extremely low density of ORFsthat encode proteins predicted or known to be involved inother (i.e., non-IS element-related) functions in comparisonwith other plasmids. Thus, while pWR501 contains 0.62 non-ISelement-related genes per kb, the Y. pestis LCD plasmid con-tains 0.87 per kb (61 ORFs over 70,509 nucleotides) (45) andthe E. coli O157:H7 plasmid contains 0.91 per kb (84 ORFsover 92,077 nucleotides) (8). From a different perspective, 48%of pWR501 ORFs are non-IS element related, whereas 82% ofY. pestis LCD plasmid (45) and 87% of E. coli O157:H7 plas-mid (8) ORFs are non-IS element related. Furthermore, bac-terial chromosomes contain relatively few IS elements. Forexample, only 2% of the ORFs on the E. coli chromosome arethought to encode transposons, phage, or plasmids, and theoverall density of genes outside these three categories is 0.906per kb (4,201 ORFs over 4,639,221 nucleotides) (3). Thus,pWR501 appears to be and has been capable of extraordinaryIS element-related recombination events, which undoubtedlycontributed to the unique evolution of Shigella.

Twenty-five ORFs (9% of pWR501 ORFs) showed signifi-cant homology to proteins involved in plasmid replication andDNA metabolic functions (category 5, indicated by yellowbanding in Fig. 1). The replicon region, along with the origin ofreplication (ori) site, is almost identical to the R100 replicon(40, 43). Other proteins in this category include two sets oftoxin-antitoxin genes or protein addiction modules, includingthe ccdAB module of plasmid F and several homologs of full-sized or partial copies of a reverse transcriptase associated withgroup II introns (1, 33).

Fifty putative ORFs (17% of pWR501 ORFs) had insuffi-cient homology to known proteins to be assigned a putativefunction. These constitute the unknown ORFs, of which 33 hadno significant similarity to any protein or ORF in the database(category 6, indicated by brown in Fig. 1), and 17 had signifi-

3272 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 3: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

cant similarity to conserved hypothetical ORFs, i.e., proteins ofunknown function (category 7, indicated by orange in Fig. 1).Many of those in the latter group have been identified in otherbacterial sequencing projects.

A SalI restriction map of the virulence plasmid of a serotype2a strain (pMYSH6000) had been previously constructed bySasakawa et al. (53, 54). The map of pMYSH6000 assembled

from that analysis differs grossly from that of pWR501 in thatthe ipa-mxi-spa loci are in inverse orientation. The SalI restric-tion map predicted from the sequence of pWR501 matchesthat seen upon restriction analysis of pWR501 DNA (data notshown). However, the pWR501 SalI restriction map agreesonly in part with the SalI restriction map of pMYSH6000.Approximately 60% of the individual SalI fragments of

FIG. 1. Circular map of the plasmid. Outer ring depicts ORFs and their orientations, color coded according to functional category: 1, identicalor essentially identical to known virulence-associated proteins (red); 2, homologous to known pathogenesis-associated proteins (pink); 3, highlyhomologous to IS elements or transposases (blue); 4, weakly homologous to IS elements or transposases (light blue); 5, homologous to proteinsinvolved in replication, plasmid maintenance, or other DNA metabolic functions (yellow); 6, no significant similarity to any protein or ORF in thedatabase (brown); 7, homologous or identical to conserved hypothetical ORFs, i.e., proteins of unknown function (orange); and 8, Tn501insertion-associated genes (green). The second ring shows complete IS elements. The third ring graphs G1C content, calculated for each ORFand plotted around the mean value for all ORFs, with each value color coded for the corresponding ORF. Scale is in base pairs. The figure wasgenerated by Genescene (DNASTAR).

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3273

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 4: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

pWR501 are of sizes determined for individual fragments ofpMYSH6000 (e.g., pMYSH6000 fragments C, F, G, I, and J),but many fragments are of different sizes, and the total numberof fragments differs (23 in pMYSH6000, versus 22 inpWR501). These differences suggest that the virulence plas-mids from these two serotypes have diverged in overall orga-nization as well as in nucleotide sequence.

IS elements. Of the 153 ORFs related to IS element ORFs,114 (39%) have homology to known IS element ORFs (32;www-IS.biotoul.fr/is/is_family.html); of these, 71 belong tomembers of the IS3 family, which includes many IS elements inaddition to IS3. Among the complete IS elements, 20 havebeen previously identified in Shigella or other organisms; theseinclude five copies of IS629, four copies of IS1294, two copies

FIG. 2. Schematic representation of IS elements flanking virulence-associated genes on pWR501. (A) The 32-kb ipa-mxi-spa region; (B) thevirA-virG region; (C) the virF region. Sequence coordinates are indicated above each map. frag, fragment; put tnp, putative transposase; seq,sequence; URF, unknown ORF; inc, incomplete. (D) The region downstream of virF (left side, as shown in panel C) shown at the base pair levelto demonstrate that the nucleotide marking the end of one IS element or IS fragment is the start of another. The relevant bases are indicated inbold underline.

3274 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 5: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

of IS2, three copies of IS600, three copies of iso-IS1, one copyof IS4, one copy of IS630, and one copy of IS911 (Table 1). AnIS element was considered partial or incomplete when thehomology of the element in pWR501 did not extend to andinclude the inverted repeat of the target sequence, when itshould be present. Most of the IS element-related ORFs onpWR501 are partial copies of IS elements that are truncated atthe ends, contain internal deletions, or contain insertions ofother IS elements within their sequence, reflecting extensivegenomic rearrangement following acquisition, a pattern thathas also been seen in the enteropathogenic E. coli plasmidpB171 (67) and the Y. pestis plasmid pCD1 (45). Among all ISelement ORFs on pWR501, those that have not been previ-ously identified on Shigella virulence plasmids include IS10,IS21, IS100, IS110, IS150, IS1294, IS1328, and IS1353. De-tailed characterization of the IS element ORFs can be found athttp://www.genome.wisc.edu/.

Four new types of putative IS elements were identified inpWR501 and were designated ISSfl1, ISSfl2, ISSfl3, and ISSfl4.Their identification was based on the following criteria: thesizes of the ORFs were comparable to ORF sizes of known ISelements, the protein sequence similarity was less than 55% tothese known elements, and inverted and/or direct repeats werepresent at either end of the putative element. At present, thereare no data as to whether these putative IS elements aretransposable. Both complete and incomplete copies of thesenew putative IS elements are present in pWR501, comprising

six new complete elements and 26 ORFs in all. In addition, 13ORFs showed less than significant similarity to known IS ele-ments or putative transposases (Table 1, unknown ORFs).None of the putative transposases showed the characteristicfeatures of new IS elements; however, a more detailed inves-tigation of their sequence remains to be done.

The first new putative IS element, ISSfl1, consists of ORFsS0204 and S0203, is 929 bp in length, has 20-bp inverted re-peats, and generates a 3-bp TTC direct repeat at the point ofinsertion. S0204 and S0203 show similarity to OrfA and OrfBof IS1650 at the amino acid level but not at the nucleotidelevel. Elsewhere on pWR501, S0078-S0079 and S0101-S0102constitute incomplete copies of ISSfl1 consisting of bp 6 to 869and 4 to 827 respectively, of ISSfl1. The second putative new ISelement, ISSfl2, is present in two almost identical copies, S0055and S0128. They encode proteins with 58 to 59% identity to theStreptomyces coelicolor IS110 transposase, but as with ISSfl1,they have no significant homology at the nucleotide level, ex-cept for approximately 55 nucleotides in the center of theelement, which are similar to bp 1071 to 1127 of IS110. Adirect repeat sequence CCCCTATTA is seen at either end ofS0055 and S0128, identifying and duplicating the target site forinsertion. The third putative new IS element, ISSfl3, consists ofS0034, which has 33% identity at the amino acid level to E. coliIS10 transposase, without homology at the nucleotide level.ISSfl3 is inserted within a Y. pestis IS100 sequence, with dupli-cation of sequences at the insertion site. Sequences 98% iden-

TABLE 1. IS element-related ORFs on pWR501a

Type ofIS element

Length(bp)

No. of ORFs/completeelement

No. of ORFsin pWR501

No. of completeelements in pWR501

Known familiesb

IS1 808 2 15 3IS2 1,327 2 10 2IS3 1,258 2 7 0IS4 1,426 1 2 1IS10 1,329 1 1 0IS21 2,131 2 1 0IS91 1,829 2 4 0IS100 1,954 2 6 0IS150 1,443 2 3 0IS600 1,264 2 21 3IS629 1,310 3 27 5IS630 1,158–1,200 1 3 1IS911 1,250 2 2 1IS1294 1,688 1 9 4IS1328 1,359 1 1 0IS1353 1,613 2 2 0

Putative new IS elementsc

ISSfl1 929 1 6 1ISSfl2 1,373–1,376 1 2 2ISSfl3 1,301 1 1 1ISSfl4 2,728 3 17 2

Subtotal 140 26

Unknown ORFs 13

Total 153

a A detailed characterization of the IS elements on pWR501 has been posted at http://www.genome.wisc.edu/.b pWR501 ORFs with $50% amino acid identity over $60% of the length of the query sequence only (significant homology).c pWR501 ORFs with some homology (less than significant) to known or putative transposases.

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3275

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 6: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

tical at the amino acid level to S0034 have also been identifiedin the enteropathogenic E. coli EAF plasmid pB171 (Orf5 andOrf28 [67]).

The fourth putative new IS element, ISSfl4, is homologousto the recently identified ISEc8, located on the chromosome ofenterohemorrhagic E. coli O157:H7, adjacent to the locus ofenterocyte effacement pathogenicity island (44, 57). These el-ements are members of the IS66 family, which has also beenidentified in Rhizobium sp., Agrobacterium sp., and Sinorhizo-bium meliloti (57). Homologous ORFs are also present on theenteropathogenic E. coli plasmid pB171 (orf49 to orf51 [67]).ISEc8 (and ISShf4) consists of three ORFs transcribed in thesame direction and flanked by 11- to 22-bp imperfect invertedrepeats and an 8- or 9-bp target duplication site. On pWR501,three loci consisting of ORFs homologous to all three ISEc8ORFs are present (S0023 to S0025, S0116 to S0119, and S0216to S0219). In addition, six partial copies of the third ORF aloneare distributed throughout the plasmid (S0008, S0038, S0072,S0081, S0090, and S0230). Within the loci containing all threeORFs, the third ORF appears to have undergone an internaldeletion in one case (S0116-S0117) and a frameshift in another(S0216-S0217). Two of the ISSfl4 sequences (S0116 to S0119and S0216 to S0219) are flanked by an exact 11-bp invertedrepeat GTAAGCGCCCC, which is present 71 bp upstream ofthe ATG start site of the first ORF of the element (S0119 andS0219) and 21 bp downstream of the last ORF (S0116 andS0216). ISSfl4 is the largest IS element on pWR501, 2,730 bp,which is comparable in size to ISEc8 (2,443 bp). The ORFsbelonging to ISSfl4 show variably 36 to 62% identity over longstretches to the ISEc8 ORFs.

Several IS elements of one type show direct repeats at eachend of the insertion, while others belonging to the same groupdo not. Those that show target site duplication may have beenacquired more recently than those that do not. For example,only two of the five copies of IS629 and two of three iso-IS1elements have direct repeats. IS1294 elements have 4-bp directrepeats of the sequence CTTG, although the duplication oc-curs not as the result of target site duplication but as a result ofinsertion site specificity (CTTG), with the duplicate copy ofCTTG being the end of IS1294 itself (66). Recent work withthe IS1294 found on the ColD-like resistance plasmidpUB2380 indicates that the IS1294 element mediates not onlyits own transposition but also that of sequences adjacent to itin a transposition mechanism resembling rolling circle replica-tion with single-stranded DNA intermediates (66). pWR501contains four complete copies of IS1294.

A substantial portion of horizontally transferred genes, char-acterized by atypical nucleotide composition or patterns ofcodon usage, are associated with plasmid, phage, or transpo-son-related sequences, including IS elements (42). The regionsadjacent to these horizontally transferred genes often containremnants of these mobile elements (42). The map of pWR501indicates a unique arrangement of known virulence-relatedgenes and unknown ORFs separated by blocks of IS element-related ORFs. The blocks of IS elements presumably reflectthe history of IS element-mediated acquisition of the virulencegenes and other ORFs that have contributed to the assemblyand evolution of the virulence plasmid.

For example, sequences homologous to the ipa-mxi-spa genecluster (ipaJ to spa40) are seen in other bacteria, including the

yscM-yopD gene cluster on the Yersinia low-Ca21-response(LCR) plasmid pCD1 (45). It is noteworthy that a partial copyof the Yersinia IS100 (bp 1 to 1051 of the IS element’s 1,954 bp)borders one end of the Shigella ipa-mxi-spa loci and defines oneend of the region of low G1C content characteristic of thisgene cluster. At the other end of the ipa-mxi-spa loci, approx-imately 50 bp downstream of the stop codon for spa ORF11,which is adjacent to spa40, is another sequence of 139 bp(sequence coordinates 129144 to 129282) identical to the por-tion of the Yersinia LCR plasmid that borders the insertion ofan IS285 into the LCR plasmid. The 139-bp sequence includesthe 28-bp left inverted repeat of IS285 and 100 bases into orf2of the IS element. The end of the 139-bp sequence in pWR501also signals the end of the low-G1C ipa-mxi-spa region. The139-bp sequence is therefore a remnant of a mobile geneticelement that was possibly involved in the horizontal transfer ofthe ipa-mxi-spa genes into pWR501. It is interesting to notethat an IS100 has recently been described adjacent to yscM onpCD1, and two partial IS285 elements flank the yscM-yopDgene cluster in Yersinia (45). Homologs of the IS100 elementfrom Yersinia associated with a pathogenicity island that con-tains homologs to the ipa-mxi-spa genes of Shigella and theyscM-yopD genes of Yersinia on a 154-kb native plasmidpAV511 in the bean pathogen Pseudomonas syringae pathovarphaseolicola have also been identified (27). The lack of ISelements within the ipa-mxi-spa pathogenicity island suggeststhat the entire locus was acquired in a single recombinationevent. Furthermore, like the yscM-yopD, the ipa-mxi-spa clus-ter lacks a characteristic of most pathogenicity islands, thepresence of flanking tRNA genes (49). Absence of tRNAgenes may have resulted from genomic rearrangements follow-ing gene transfer; alternatively, acquisition of the locus maynot have involved insertion into tRNA genes.

A predictable consequence of the presence of such a highdensity of IS element sequences on the Shigella virulence plas-mid is the observed predisposition to frequent genomic alter-ations. For example, the construction of S. flexneri 2a vaccineSC602 by targeted deletion of virG (12) was accompanied by anunexpected recombination between two IS629 sequences thatflanked the region, resulting in a larger than expected deletion,which includes virA as well as virG (M. M. Venkatesan, un-published data). A spontaneous avirulent isolate of S. flexneri 5(M90T-A3) contains a deletion of approximately 70 kb thatincludes both the ipa-mxi-spa locus and the virG-virA locus (9),a segment flanked by IS629 elements (Fig. 1). Similar deletionshave been described for the T-32 ISTRATI vaccine strain (71)and the chromosomal she locus in a serotype 2a strain (48).Another type of IS element-mediated alteration that has beendescribed for the Shigella virulence plasmid is the spontaneousinsertion of an IS1 element within the virF gene, which con-verted a virulent strain to an avirulent one (36).

G1C composition. The G1C content of genes and geneflanking regions has been used as a marker for phylogeneticorigin of genes or gene clusters, with those genes or geneclusters that have anomalous base composition being thoughtto have been acquired more recently by horizontal gene trans-fer. The average G1C content of pWR501 is 47.6%, similar tothe estimated G1C content of the Shigella chromosome,50.02% (G. Plunkett III et al., personal communication). Incontrast, essentially all of the characterized virulence-associ-

3276 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 7: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

ated genes encoded on the virulence plasmid have markedlylower G1C composition, in the range of 30 to 35% (Fig. 3).

The overall G1C content of the ipa-mxi-spa region is 35%.While the G1C content of the Yersinia yscM-yopD region(44.8%) is similar to that of both the entire LCR plasmid andthe Y. pestis chromosome (45), the G1C content of the se-quenced region of pAV511 is 54%, significantly lower than theoverall figures of 59 to 61% reported for pathovars of P. syrin-gae (27). The homology of the virulence genes and flankingsequences of the pathogenicity islands among Shigella, Yer-sinia, and P. syringae plasmids indicate a common ancestry ofthese pathogenicity islands, with subsequent evolutionarychanges in gene composition and arrangements. Based onG1C composition alone, it is tempting to speculate that theShigella ipa, mxi, and spa genes were acquired much later inevolution than the Yersinia genes. Alternatively, Yersinia mayhave acquired the region from an organism with a similar G1Ccontent.

Altogether, 66 ORFs (22.5% of pWR501 ORFs) have aG1C content of less than 40%; of these, 38 are found as ablock within the ipa-mxi-spa loci, and 28 are distributedthroughout the remainder of the plasmid (Fig. 3). Sixteen ofthese 28 are found in clusters of two or three genes of lowG1C content, suggesting that they might have been acquiredas a block of genes from a single donor organism. Four of these28, virF (S0051), shet2 (S0097), virA (S0191), and sopA (S0292)are known virulence factors (68). Most of the remainder en-code hypothetical proteins that lack significant similarity to anyknown protein, many of which are of moderate size (200 to 500amino acids [aa]). This raises the possibility that like the knownvirulence genes on pWR501, many of these hypothetical ORFsencode virulence factors and were acquired by lateral transfer.These therefore potentially represent as yet uncharacterizedrecently acquired virulence determinants, some of which may

have been acquired after the emergence of the evolutionarygroup from which pWR501 was isolated.

Three pWR501 loci have G1C contents of 60% or greater(Fig. 3). The first of these is S0214, which lies within a block ofrelatively high G1C content genes that is homologous to an E.coli plasmid ColIb-P9 locus. The second high-G1C2 contentlocus is S0264, which lies immediately adjacent to a group ofgenes predicted to be involved in plasmid transfer and stabilityfunctions, most similar to those of plasmid R100 (see “Thereplicon,” below). The third locus is S0269 to S0274, withinTn501.

Unknown ORFs. ORFs that either showed no significantsimilarity to any protein or ORF in the database or werehomologous or identical to conserved hypothetical ORFs (pro-teins of unknown function) were jointly defined as unknownORFs and are listed in Table 2. A discussion of several inter-esting features of these ORFs follows.

Unknown ORFs with no significant similarity to hypotheti-cal proteins. The longest unknown ORF is S0249, which pu-tatively encodes a 970-aa protein that has less than significanthomology to the Mycobacterium tuberculosis probable DNAhelicase II homolog UvrD2 (700 aa [11]), largely betweenresidues 335 and 422. While M. tuberculosis UvrD2 has notbeen extensively characterized, E. coli uvrD encodes a 720-aaprotein with 39-59 DNA helicase activity (6) that is nonessentialfor viability but is required for methyl-directed mismatch re-pair and nucleotide excision repair and is believed to partici-pate in recombination and DNA replication. S0249 belongs tothe superfamily of DNA/RNA helicases (COG0210), as deter-mined by COGnitor for identification of families to whichpredicted proteins are related (65). S0249 has an ATP/GTPbinding site motif A (P loop) at residues 156 to 163 (GSAGSGKT), similar to the ATP binding motif in UvrD proteins(ProfileScan algorithm). The ATP binding motif is a glycine-

FIG. 3. Plot of G1C content (y axis) of each ORF on pWR501 (x axis). Unknown ORFs and selected known ORFs with G1C content of lessthan 40% are labeled.

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3277

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 8: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

TABLE 2. Unknown ORFsa

ORF Begincds

Endcds

Firstcodon

G1Ccontent

(%)

No.ofaa

Organism of protein towhich most similar Homolog protein Extent of similarity

S0002 485 949 ATG 51.0 154 No significant homologyS0003 1176 2042 ATG 32.8 288 No significant homologyS0005 3272 3526 ATG 42.3 84 No significant homologyS0006 4441 3470 TTG 31.2 323 No significant homologyS0007 4742 4344 GTG 32.3 132 No significant homologyS0011 6793 7110 ATG 49.4 105 No significant homologyS0016 11662 11084 ATG 31.4 192 No significant homologyS0030 18654 19331 ATG 35.5 225 S. flexneri ShET2 (565 aa)b 22% identical over 129 aaS0031 20037 19813 ATG 51.6 74 No significant homologyS0062 44909 44427 ATG 45.8 160 Mycobacterium tuberculosis Rv0919 (hypothetical protein)

(166 aa)47% identical over 148 aa

S0063 45202 44900 ATG 44.9 100 M. tuberculosis Rv0918 (hypothetical protein)(158 aa)c

33% identical over 77 aa

S0066 48838 47884 ATG 32.6 484 No significant homologyS0088 67022 67312 GTG 41.9 96 No significant homologyS0098 76039 74627 ATG 31.8 470 No significant homologyS0099 76793 76494 ATG 53.0 99 No significant homologyS0100 77192 76803 ATG 48.7 129 No significant homologyS0103 78806 78333 ATG 35.0 157 S. typhimurium HilCb (295 aa) 31% identical over 128 aa

E. coli PerAb (274 aa) 34% identical over 119 aaS0104 79222 78794 ATG 32.9 142 No significant homologyS0105 80088 79450 GTG 50.2 212 E. coli K-12 Hypothetical protein 55% identical over 212 aaS0111 83759 83274 ATG 43.1 161 M. tuberculosis Rv0919 (hypothetical protein)

(166 aa)50% identical over 157 aa

S0112 84082 83747 TTG 39.9 111 M. tuberculosis Rv0918 (hypothetical protein)(158 aa)c

38% identical over 81 aa

S0120 88198 87914 ATG 47.7 94 No significant homologyS0121 88260 88439 ATG 37.8 59 No significant homologyS0122 88540 89994 ATG 31.2 484 No significant homologyS0141 109413 109057 ATG 32.8 118 No significant homologyS0176 134729 134313 GTG 48.9 138 No significant homologyS0177 136045 135269 ATG 30.9 258 M. bovis TfpB (261 aa)b 28% identical over 237 aaS0179 137371 137201 ATG 51.6 56 No significant homologyS0193 147529 147786 ATG 39.1 85 No significant homologyS0194 147755 147952 GTG 47.5 65 No significant homologyS0205 155688 155287 GTG 41.8 133 No significant homologyS0208 157575 158342 GTG 53.0 255 E. coli plasmid ColIb-P9 YccB (308 aa)d 92% identical over 231 aaS0209 158288 158545 GTG 56.6 85 E. coli plasmid ColIb-P9 YccB (308 aa)d 92% identical over 84 aaS0210 158917 159600 ATG 56.0 227 E. coli plasmid ColIb-P9 YcdB (227 aa) 97% identical over 227 aaS0211 159601 159822 ATG 53.1 73 E. coli plasmid ColIb-P9 YceA (73 aa) 100% identical over 73 aaS0212 159834 160019 ATG 57.0 61 E. coli plasmid ColIb-P9 YceB (144 aa)d 90% identical over 61 aaS0213 160053 160268 ATG 57.4 71 E. coli plasmid ColIb-P9 YceB (144 aa)d 81% identical over 70 aaS0214 160313 161083 ATG 60.7 256 E. coli plasmid ColIb-P9 YcfA (256 aa) 93% identical over 256 aaS0225 168238 167600 ATG 36.6 212 No significant homologyS0235 173855 174845 GTG 50.0 96 E. coli plasmid ColIb-P9 YacA (89 aa) 92% identical over 89 aa

Vibrio cholerae RelBb (122 aa) 26% identity over 74 aaS0236 174219 174368 ATG 52.0 49 E. coli plasmid ColIb-P9 YacBc (93 aa) 97% identical over 32 aa

at amino terminusS0237 174397 174987 ATG 35.7 196 No significant homologyS0238 175234 175055 GTG 47.2 59 No significant homologyS0249 182574 179662 TTG 50.9 970 M. tuberculosis UvrD2b 23% identical over 310 aa

near amino terminusS0250 183057 183899 ATG 40.0 280 S. flexneri YSH6000

virulence plasmidShf (280 aa) .99% identical over 280

aaEnteroaggregative E. coli

plasmid pAAShf (280 aa) 95% identical over 280 aa

Enterohemorrhagic E. coliO157:H7 plasmid

L7026 (hypothetical protein)(273 aa)

70% identical over 273 aa

S0264 196924 197159 ATG 61.5 70 Plasmid R100 YigA (70 aa) 73% identical over 70 aaS0265 197451 197645 TTG 58.2 64 Plasmid R100 YigB (153 aa)d 95% identical over 64 aaS0276 207312 207479 ATG 51.2 55 Plasmid R100 YigB (153 aa)d 96% identical over 55 aaS0282 212528 212340 ATG 56.6 62 No significant homologyS0290 218962 218378 TTG 33.8 194 No significant homology

a Defined as those that either showed no significant similarity to any protein or ORF in the database or were homologous or identical to conserved hypothetical ORFs(proteins of unknown function). “Significant similarity” was defined as greater than 30% identity over at least 60% of the query sequence and 60% of the targetsequence. cds, sequence coordinates.

b Although the extent of similarity did not meet the criteria for “significant,” the homolog alignment was included because of the potential relevance of the particulargene to Shigella pathogenesis.

c Although the extent of similarity did not meet the criteria for “significant,” the homolog alignment was included because of its link to the adjacent ORF.d Based on alignments, S0208 appears to be frame shifted from S0209, and S0212 appears to be frame shifted from S0213; in review of the sequence, no sequencing

errors could be detected that would explain these apparent frame shifts. In addition, S0265 and S0276 appear to have evolved from a single ORF that was disruptedby inserted sequences and had an internal deletion.

3278 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 9: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

rich region, which typically forms a flexible loop between a betastrand and an alpha helix and interacts with one of the phos-phate groups of the nucleotide. S0249 also has an ankyrinrepeat 2 motif at residues 721 to 753. Ankyrin repeats aretandemly repeated modules of 33 aa seen primarily in eukary-otic proteins, where they play a role in protein-protein inter-actions (4). The few known examples from prokaryotes andviruses are thought to be the result of horizontal gene transfer(4). S0249 could represent a helicase with a C-terminal endcapable of interaction with other accessory proteins involved inDNA replication, excision, and/or repair mechanisms.

ORF S0141, located between icsB and ipgD within the ipa-mxi-spa loci, has not been previously described in the litera-ture. The predicted protein has no significant homology to anyhypothetical protein. The ATG start site and the first nineamino acids precede the start codon of ipgD, which is tran-scribed in the opposite orientation. It remains to be deter-mined whether S0141 is expressed in vivo.

S0103 has less than significant homology to HilC (SprA) ofSalmonella serovar Typhimurium and PerA of E. coli, bothmembers of the AraC/XylS family of transcriptional regulators,known to bind DNA via a helix-turn-helix (HTH) motif (23,56). In enteropathogenic E. coli, PerA activates the eaeA gene,encoding intimin. HilC regulates hilA expression, which in turnactivates the expression of invasion-associated genes. A se-quence that is 52% similar to the HTH motif characteristic ofthis family of proteins was detected in S0103 at residues 54 to155 (Prosite Scan ID PS01124). HTH DNA binding motifswere originally identified in bacterial proteins. They have sincealso been found in eukaryotic DNA-binding proteins. It is notknown what genes might be regulated by S0103.

Unknown ORFs with significant similarity to previously de-scribed proteins of unknown function. Adjacent to and in theopposite orientation of S0249 is an ORF (S0250) that encodesa 280-aa protein with 95% identity to Shigella shf (48). In thepreviously published sequence of shf, this locus was thought tocontain two ORFs, shf1 (predicted to encode a 133-aa protein)and shf2 (predicted to encode a 145-aa protein). The pWR501sequence lacks a single T nucleotide that is present after base770 of the published sequence. The absence of this nucleotidein pWR501 generates a single ORF encoding a 280-aa protein.COG analysis places S0250 in the family of xylanases/chitindeacetylases (COG0726), which function in carbohydratetransport and metabolism.

S0250 forms an operon with S0251, S0252, and S0253, whichencode rfbU (capU), virK, and msbB, respectively. RfbU is aUDP-sugar hydrolase and has been described in Vibrio cholerae asan accessory protein required for O-antigen biosynthesis (18).MsbB is an acyltransferase involved in fatty acyl modification ofthe O antigen (60). msbB genes are also present on the Shigellachromosome and in the E. coli genome. Mutations in msbB genesresult in lipopolysaccharide that has reduced toxicity (60). InShigella, VirK mutants have less virG mRNA than wild type,suggesting the involvement of virK in posttranscriptional regula-tion of virG expression (37). It is interesting to consider that VirK,being encoded within the shf-capU-msbB operon, may have a rolein O-antigen biosynthesis. A locus of three genes with significanthomology and similar in organization to pWR501 genes shf, rf-bU(capU), and virK is also seen in the enteroaggregative E. coliplasmid pAA2 (J. R. Czeczulin, T. R. Whittam, I. R. Henderson,

and J. P. Nataro, submitted for publication) and E. coli O157:H7plasmid pO157 (8).

Two loci containing seven (S0208 to S0214) and two (S0235and S0236) ORFs are similar to unknown ORFs on plasmidColIb-P9 (G. Sampei and K. Mizobuchi, unpublished data). Ofnote, ColIb-P9 was originally described in Shigella sonnei. ORFS0210 shows 41% identity to 135 residues of a 304-aa adeninemethyltransferase (EcoVIII modification methylase), whichcontains an N6-adenine-specific methylase signature motif be-tween residues 24 and 30 (ILTDPPY) (Prosite Scan IDPS00092). DNA methyltransferases modify DNA within a rec-ognition sequence and in most cases are associated with arestriction endonuclease. The combination of these two activ-ities protects native bacterial DNA from cleavage and facili-tates cleavage of foreign DNA. In bacteria, DNA methylationmay also be involved in transposon movement and DNA mis-match repair and may have important roles in virulence (26).Salmonella serovar Typhimurium strains with mutations inDNA adenine methylase show abnormalities in protein secre-tion, host cell invasion, and M-cell cytotoxicity (22). It is notknown whether the DNA methylase activity of S0210 is asso-ciated with a restriction endonuclease.

The identification of several unknown ORFs, many of whichhave interesting homologies and many of which may be in-volved in virulence, permits targeted investigation of genes thatmay have subtle albeit important roles in Shigella pathogenesis.Most of the virulence genes that have been previously charac-terized were identified using in vitro assays designed to evalu-ate the ability of bacteria to invade and multiply within epi-thelial cells. Many of the unknown ORFs in pWR501 may havebeen missed in these screens by virtue of not being essential toeither invasion or intracellular multiplication. They may nev-ertheless modulate the efficiency of invasion and intercellulardissemination or be otherwise important to pathogenesis. Fur-ther genetic and phenotypic analysis of each of these genes willpermit characterization of each one’s role in pathogenesis.

Known ORFs. S0042 (405 aa) is 47% identical and 62%similar over 387 aa to a reverse transcriptase/maturase (RT)from Sinorhizobium meliloti (419 aa) that is associated with agroup II intron (33). S0113 and S0114 are homologous to shortregions of the RT, apparently fragments (50 and 47 aa, respec-tively), and S0199 and S0200 represent a second copy of theRT element that is frameshifted. Group II introns are self-splicing RNAs linked to mobile genetic elements, related tothe nuclear introns of pre-mRNA. They are commonly foundin organellar genes of lower eukaryotes and plants but haverecently been described as associated with IS elements in manyprokaryotes, including E. coli (13). In addition to their ri-bozyme core, some group II introns encode proteins with re-verse transcriptase activity associated with endonuclease activ-ity. In pWR501, the RT encoded by S0042 appears to containthe seven domains characteristic of active RTs (33). The S.meliloti group II intron with the encoded RT is inserted withinan IS element. The association of group II introns with ISelements ensures the spread and maintenance of the introns.S0042 does not appear to be within an IS element, althoughthere is an IS629 element 700 bp upstream of the codingsequence and a putative transposase downstream. Further-more, the presence of multiple copies of RT in pWR501 indi-cates that they might originally have been acquired within IS

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3279

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 10: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

elements. The IS629 sequence is immediately adjacent to a170-bp sequence that is homologous to sequence on the en-teropathogenic E. coli plasmid pB171, suggesting DNA ex-change between pWR501 and pB171. Splicing efficiency of theS. meliloti group II intron requires expression of the RT pro-tein, which suggests that it has a maturase activity (33). S0042contains the carboxy-terminal domain that specifies maturaseactivity but lacks the zinc finger domain that specifies endonu-clease activity.

A virulence plasmid-encoded operon that contributes to en-hanced survival and mutation frequencies, impCAB, has pre-viously been characterized in S. flexneri strain SA100 (50). InpWR501, the impCAB operon is missing; only the first 176 bpare present, beginning at sequence coordinate 157595. imp-CAB is similar to the chromosomal umuDC operons of E. coliand Salmenella serovar Typhimurium and the plasmid mucABoperon; all function in DNA repair following radiation- andchemical-induced mutagenesis.

The invasion locus of Shigella is a pathogenicity island-likecluster that consists of 38 ORFs (S0130 to S0167) of the ipa-mxi-spa operons within a stretch of 32 kb of the virulenceplasmid (beginning with the start codon of ipaJ at sequencecoordinate 97092 [S0130] and ending with the stop codon ofspa-orf11 [S0167] at sequence coordinate 129091). Geneswithin this locus are critical for Shigella invasion of mammaliancells, although certain genes outside this region are requiredfor optimal invasion of tissue culture cells. This region hasbeen previously mapped and sequenced, and genes within thisregion have been extensively characterized (46). Notably, onpWR501, the orientation of the entire gene cluster is inverse ofwhat had been previously published (17). Potential explana-tions for this inversion include (i) strain differences due to atrue inversion during evolution that may have been enhancedby the presence of flanking IS elements or (ii) an artifact of thecloning approach that was used in the prior sequencingprojects. Strain-to-strain differences in the arrangement of vir-ulence genes have been previously described (2).

Two, and perhaps three, ShET2-like toxin genes are presenton pWR501. The ShET2 enterotoxin (S0097, 566 aa) is presenton the virulence plasmids of all species of Shigella (38). S0012(533 aa) represents a second allele of the ShET2 gene, with40% identity to ShET2 over more than 90% of its length (Fig.4). In addition, S0230 (266 aa) is 22% identical to ShET2 over55% of its length (Table 2 and Fig. 4).

There are five complete alleles of ipaH on pWR501, desig-nated ipaH1.4, ipaH2.5, ipaH4.5, ipaH7.8, and ipaH9.8 (69). Asingle T residue at pWR501 sequence coordinate 213802,within the coding sequence of ipaH1.4, a single T residue atpWR501 sequence coordinate 41551, within the coding se-quence of ipaH2.5, and a single G residue at pWR501 se-quence coordinate 61847, within the coding sequence ofipaH7.8, were missed previously, leading to truncation ofipaH1.4 and ipaH2.5 at their 39 ends and ipaH7.8 at its 59 end.Thus, our sequence predicts that ipaH1.4, ipaH2.5, ipaH4.5,ipaH7.8, and ipaH9.8 encode proteins of 575, 563, 574, 565, and545 aa, respectively. The amino-terminal halves are variable,while the carboxy-terminal halves are conserved (Fig. 5), asdescribed previously (69). The variable amino-terminal halvescontains a leucine-rich repeat (delimited by asterisks in Fig. 5)(28). The carboxy-terminal 13 residues of IpaH7.8 and IpaH9.8

are absent in IpaH2.5 and IpaH4.5 and altered in IpaH1.4. TheG1C content of these five alleles is remarkable in that theamino-terminal halves are lower in G1C content than theconserved carboxy-terminal halves of all five alleles, whichsuggests a modular unit whose two halves might have beenacquired separately during evolution and subsequently fused.ipaH1.4 and ipaH2.5 are more similar to each other than to theother three alleles. A schematic representation of the amino-terminal alignment and the degree of sequence identity amongthe five ipaH alleles is shown in Fig. 5. Although the functionof each ipaH allele is unknown, transcription of four of them,but not that of the ipaBCDA and mxi operons, was markedlyincreased during growth in the presence of Congo red and inan ipaD mutant, two conditions under which secretion throughthe Mxi-Spa machinery is enhanced (15). Transcription of theipaH genes was also transiently activated upon entry into epi-thelial cells. Recently, IpaH7.8 was shown to facilitate theescape of Shigella from phagocytic vacuoles of mouse macro-phages and human monocytes (19).

The replicon. pWR501 has a replicon region highly homol-ogous to that of plasmid R100, which is within the RepFIIAfamily of replicons. There are six known incompatability (Inc)groups within this family; plasmid R100 belongs to the IncFIIgroup. In pWR501, the replicon extends from sequence coor-dinates 208523 to 210861 and contains the essential replicationelements present in R100, including an ori and a G site (single-strand initiation site), which directs the priming of single-stranded DNA templates for leading and/or lagging strandsynthesis (Fig. 6) (43, 64). Of note, the R100 plasmid wasinitially isolated from an S. flexneri 2a strain.

The frequency of replication is regulated by control of thesynthesis of the plasmid-specific replication initiation proteinRepA1, which binds to the plasmid ori and assembles a replica-tion complex (43). The principal copy control elements are (i) anantisense RNA, inc RNA (or CopA in plasmid R1), which isconstitutively expressed and rapidly turned over and which exertsnegative control on the repA1 transcript (40); (ii) RepA2, a trans-acting repressor of the repA1 promoter; and (iii) repA6, whoseexpression disrupts the binding of inc RNA to the repA1 transcript(73). All of these elements are present on pWR501. To ourknowledge, no function has been ascribed to repA4.

The pWR501 replicon is more than 75% identical to theR100 replicon at both nucleotide and amino acid levels. Clas-sification of replicons is largely based on the extent of similarityof the sequence of RepA1 proteins. pWR501 RepA1 (251 aa inlength) is 76% identical to residues 35 to 285 of R100 RepA1(285 aa in length). Phylogenetic analysis of the RepA1 proteinsfrom several different replicons indicated that pWR501 RepA1is most closely related to RepA1 from R100 (data not shown).

repA2 of pWR501 is divergent from repA2 of R100 but isidentical to that of R1 (an IncFII plasmid), as has been shownfor another S. flexneri serotype 5a virulence plasmid, pWR110(59); interestingly, the three sequences are identical at theprotein level. The RepA2 target site is in a region that lackshomology, indicating that the RepA2 proteins and their targetsare plasmid specific and not group specific (41).

Of note, pWR110 was shown to be compatible with IncFIIplasmids and only weakly incompatible with IncFI plasmids(59). The pWR501 incompatibility determinant, located withinthe inc gene, is identical to that of pWR110 (59); therefore, the

3280 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 11: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

incompatibility profiles of the two plasmids would be predictedto be the same. The inc genes of pWR110 (59) and pWR501(this study) differ slightly from the inc genes of the IncFIIplasmids R1 and R100 and the IncFI plasmids P307 andColV2-K94. Thus, while the replicon of pWR501 is most sim-ilar to that of IncFII plasmids, it does not fall within the FIIincompatibility group, suggesting that the few differences innucleotide sequence within the inc gene may give rise to sig-nificant differences in incompatibility.

The ori region of R100 is located within a 167-bp sequencedownstream from repA1 that contains approximately 75 bp ofhigh AT content; a similar high-AT-content sequence is ob-served in the putative ori of pWR501, which lies within a362-bp segment between the stop codon of repA1 and the startcodon of repA4 (sequence coordinates 210012 to 210373) (Fig6) (63). Base pairs 93 to 258 of this 362-bp sequence are 95%

identical to the 167-bp ori sequence of R100. A dnaA box(TTATCCACA) is located within the ori sequence, and a Gsite, analogous to the single-strand initiation site of phages, islocated toward the 39 end of repA4, as in R100. Within the Gsite are three blocks of conserved nucleotides, one of whichprovides the starting point for primer RNA synthesis (63); allthree are conserved in pWR501. pWR501 homology to theR100 replicon ends 80 bp upstream of tir in R100 (Sampei andMizobuchi, submitted for publication).

Apart from the essential replicon, pWR501 has multiple locihomologous to sequences known to be involved in plasmidsegregation and stable maintenance: parA and parB of the P1bacteriophage partitioning system (S0039 and S0040), stbB andstbA of the R100 plasmid (S0206 and S0207), ccdA and ccdB(also known as proteins H and G) of the F plasmid (S0232 andS0233), and mvpA and mvpT (S0259 and S0258). P1 ParA and

FIG. 4. Alignment of ShET2 alleles on pWR501. The ShET2 toxin (S0097) is compared with the second copy of ShET2 (S0012) and a smallerORF (S0030) with which it bears similarity. The similarities are indicated as for BLAST searches.

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3281

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 12: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

ParB act on the cis-acting element parS to promote faithfulsegregation of the plasmid during the bacterial cell cycle (14).The ParA proteins of pWR501 and P1 are 75% identical andthe ParB proteins are 59% identical over most of their lengths,and an AT-rich parS site with an inverted imperfect 20-bprepeat structure is present immediately downstream of theparB stop codon. Plasmid R100 StbA and StbB, along with anessential upstream cis-acting element, mediate stable plasmidinheritance (61). StbA and StbB of pWR501 are 43 and 29%identical to R100 StbA and StbB over more than 80% of thetarget sequences, although pWR501 StbA is predicted to betwice as long as R100 StbA. An AT-rich sequence immediatelyupstream of stbA may constitute a cis-acting site. The redun-dancy of plasmid segregation and stable maintenance systemsand their distribution around the plasmid may enhance thestable inheritance of a single-copy plasmid, such as pWR501,over many generations. In addition, the redundancy may sug-gest not only that maintenance of the plasmid is important toShigella virulence but that maintenance of plasmid derivatives

that carry large deletions is important. The Shigella virulenceplasmid is known to spontaneously delete large segments (34),and the observed distribution of segregation and maintenanceloci might permit maintenance of derivatives that had deletedsegments containing one or more of these loci, which maypermit maintenance of virulence despite loss of some se-quence. Toxin-antitoxin systems, which encode a stable cyto-toxin and an unstable antitoxin, mediate postsegregational kill-ing of daughter cells that lack plasmid. pWR501 contains onerecently described toxin-antitoxin system (the mvp locus,S0258-S0259 [55]), one that has been characterized on F plas-mid (S0232-S0233), and remnants of a third (S0205). S0232and S0233 encode homologs of F plasmid CcdB (toxin) andCcdA (antitoxin) (1). Immediately downstream of stbB, S0205encodes a protein homologous to E. coli RelB, the toxin en-coded by the relBE toxin-antitoxin locus (25).

ColE1 plasmids resolve plasmid multimers into monomersby site-specific recombination involving the cer site (58), inconjunction with chromosomally encoded argR, pepC, and

FIG. 5. The five IpaH alleles of pWR501. (A) Alignment of the amino-terminal halves of the five IpaH alleles, using the Clustal method withPAM250 residue weight table (MEGALIGN algorithm; DNASTAR software). Beginning with the sequence LADAV (residues 307 to 311 ofIpaH1.4), the sequences are conserved among all five alleles; the carboxy termini are not shown. Asterisks, approximate extent of the leucine-richrepeat sequence; arrow, beginning of region conserved in all five alleles. (B) Sequence identity and divergence among the five IpaH alleles. Percentdivergence is calculated by comparing sequence pairs in relation to the phylogeny reconstructed by MEGALIGN, whereas percent identity isdetermined for individual pairs without regard to phylogenetic relationship. (C) G1C content distribution of IpaH7.8. Scale is in base pairs.

3282 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 13: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

xerC. pWR501 contains a 129-bp sequence (sequence coordi-nates 175196 to 175067) that is similar to bp 111 to 240 of the284-bp ColE1 cer site, and therefore contains the ArgR bindingsite, but lacks the right arm of the recombination site. PlasmidR1 has a significantly truncated cer site (only 44 bp long) thatis thought to function by recombination with a related site inthe terminus of the E. coli chromosome (10), which suggeststhat the pWR501 site may also be functional. Of note, a ho-molog of ColE1 mob9 protein is encoded within the pWR501cer site (S0238).

Transfer genes. A block of genes located between sequencecoordinates 189271 and 196816 are similar to known transfergenes. These include ORFs with significant homology to atruncation of traD and the complete sequences of traI, traX,and finO. In pWR501, this block has the same genetic organi-zation as the corresponding genes in plasmid R100 (74), in-cluding two downstream ORFs of unknown function, yigA andyigB, followed by the replicon, raising the possibility that theentire region was acquired by horizontal transfer from R100.The remainder of the large block of transfer genes that arelocated upstream in R100 and other conjugative plasmids areabsent on pWR501, and the traI gene has an internal frame-shift. Whether pWR501 once had the full complement oftransfer genes and subsequently lost most of them or whetheronly a portion of the transfer genes were acquired cannot bedetermined. Experimental data have shown that the virulenceplasmid is not capable of self-transfer by conjugation but canbe conjugated in the presence of conjugative plasmids (52).

Evolution of the Shigella virulence plasmid. Recent geneticanalyses suggests that shigellae do not constitute a distinctgenus as traditionally believed but rather are within the genus

of E. coli, much like the pathogenic E. coli (47). These analysesindicate that Shigella emerged from E. coli seven or eightindependent times during evolution, leading to three clustersof Shigella, each of which contains serotypes from multipletraditional species, and four or five additional forms, each ofwhich contains one traditional serotype (47). The three mainShigella clusters are estimated to have evolved 35,000 to270,000 years ago, which predates the development of agricul-ture and makes shigellosis one of the early infectious diseasesof humans (47).

The defining event each time Shigella arose was almost cer-tainly the acquisition of an historical precursor of the current-day virulence plasmid. The data also suggest that the loss ofspecific catabolic pathways (inability to utilize lactose and mu-cate and to decarboxylate lysine), loss of motility, and expan-sion of O-antigen diversity that are characteristic of Shigellastrains occurred more recently than the acquisition of the plas-mid (47). In addition, curli loci have been insertionally inacti-vated in Shigella (51). Since the plasmid was acquired at dis-tinct times, one would predict that differences reflecting theevolution of the plasmid could be obtained by genetic compar-ison of virulence plasmids of the seven different Shigella evo-lutionary groups. Subsequent to the acquisition of the viru-lence plasmid, divergence of Shigella clones from E. coli wouldinvolve clonal divergence (accumulation of mutations by basesubstitution), horizontal transfer of genetic material fromother species, and loss of gene sequences that interfere withpathogenicity (35, 42).

Certain horizontal gene transfer events have been key to theevolution of Shigella. A quintessential feature of Shigella is itsability to invade mammalian cells and access the cell cyto-

FIG. 6. Replicon of pWR501. (A) Replicon region and flanking DNA sequences (sequence coordinates 187081 to 214607), including theinsertion site of Tn501. (B) Expanded view of the replicon shown in panel A (sequence coordinates 207312 to 214912), indicating the position ofinc RNA, ori, and the G site. Distances are indicated in base pairs below each map.

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3283

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 14: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

plasm, defining a niche unique among enteric gram-negativebacteria, with the exception of enteroinvasive E. coli. Thus, theacquisition and evolution of the ipa-mxi-spa pathogenicity is-land, which encodes all of the genes required for cell invasionand phagolysosomal lysis, permitted a major alteration inpathogenesis (24). Likewise, the acquisition of virG (icsA),which mediates actin assembly on Shigella, and virF and virB,the regulators of the virG and ipa-mxi-spa loci, were key to theemergence of Shigella. Since all Shigella serotypes containthese loci, they were probably all present on the prototypicvirulence plasmid.

It is not known which ORFs were acquired subsequent tothe acquisition of the plasmid by the first Shigella strain. Onemight predict that access to an intracellular niche wouldpresent the organism with selective pressure to acquire certainfactors that would not have provided selective advantage pre-viously. These might include factors that permit resistance tothe cellular immune response of the host or utilization ofnutrients present in the host cytoplasm. Of note, the cellularimmune response is ineffective against Shigella in animal mod-els (72), and Shigella-specific cytotoxic T lymphocytes have notbeen isolated from convalescent individuals. In addition, fac-tors that permit the bacterium to optimize its lifestyle in thehuman colon may also have been acquired after acquisition ofthe prototypic virulence plasmid. An example of this is theacquisition by horizontal transfer of O-antigen genes, such asthose present on the virulence plasmid of S. sonnei, and sub-sequent inactivation of native O-antigen genes (30). Serotypicdiversity due to the variations in O antigen is seen amongShigella strains. Such diversity likely facilitates evasion of thehost humoral immune response. Further analysis of the func-tion of many ORFs of unknown function on pWR501 as well ascomparative analysis of virulence plasmids from other Shigellaserotypes will allow a more complete characterization of theevolution of the plasmid.

ACKNOWLEDGMENTS

We thank Guy Plunkett III for assistance with DNA analysis andhelpful discussions, Vessela Ivanova for assistance in the isolation ofDNA and analysis of the replicon, Sara Klink, George F. Mayhew, GuyPlunkett III, and Helen J. Wing for help with confirming the sequenceassembly, Nicole Perna for critical reading of the manuscript, LutherLindler for many helpful discussions, and the sequencing teams of theGenome Center of Wisconsin for excellent technical work.

The sequencing work was supported by NIH/NIAID, and the se-quence annotation was supported by NIH grant AI43562 (M.B.G.).

ADDENDUM IN PROOF

Buchrieser et al. (C. Buchrieser, P. Glaser, C. Rusniok, H.Nedjeri, H. d’Hauteville, F. Kunst, P. Sansonetti, and C. Par-sot, Mol. Microbiol. 38:760–771, 2000) have also recently se-quenced and analyzed the Shigella flexneri virulence plasmid.

REFERENCES

1. Bahassi, E. M., M. H. O’Dea, N. Allali, J. Messens, M. Gellert, and M.Couturier. 1999. Interactions of CcdB with DNA gyrase. Inactivation ofGyra, poisoning of the gyrase-DNA complex, and the antidote action ofCcdA. J. Biol. Chem. 274:10936–10944.

2. Benjelloun-Touimi, Z., M. S. Tahar, C. Montecucco, P. J. Sansonetti, and C.Parsot. 1998. SepA, the 110 kDa protein secreted by Shigella flexneri: twodomain structure and proteolytic activity. Microbiology 1444:1815–1822.

3. Blattner, F. R., G. Plunkett, C. A. Bloch, N. T. Perna, V. Burland, M. Riley,J. Collado-Vides, J. D. Glasner, C. K. Rode, G. F. Mayhew, J. Gregor, N. W.

Davis, H. A. Kirkpatrick, M. A. Goeden, D. J. Rose, B. Mau, and Y. Shao.1997. The complete genome sequence of Escherichia coli K-12. Science277:1453–1462.

4. Bork, P. 1993. Hundreds of ankyrin-like repeats in functionally diverse pro-teins: mobile modules that cross phyla horizontally? Proteins 17:363–374.

5. Borodovsky, M., and J. McIninch. 1993. GeneMark: parallel gene recogni-tion for both DNA strands. Comput. Chem. 17:123–133.

6. Bruand, C., and S. D. Ehrlich. 2000. UvrD-dependent replication of rolling-circle plasmids in Escherichia coli. Mol. Microbiol. 35:204–210.

7. Burland, V., D. L. Daniels, G. Plunkett, and F. R. Blattner. 1993. Genomesequencing on both strands: the Janus strategy. Nucleic Acids Res. 21:3385–3390.

8. Burland, V., Y. Shao, N. T. Perna, G. Plunkett, H. J. Sofia, and F. R. Blattner.1998. The complete DNA sequence and analysis of the large virulence plasmidof Escherichia coli O157:H7. Nucleic Acids Res. 26:4196–4204.

9. Buysse, J. M., M. Venkatesan, J. A. Mills, and E. V. Oaks. 1990. Molecularcharacterization of a trans-acting, positive effector (ipaR) of invasion plasmidantigen synthesis in Shigella flexneri serotype 5. Microb. Pathog. 8:197–211.

10. Clerget, M. 1991. Site-specific recombination promoted by a short DNAsegment of plasmid R1 and by a homologous segment in the terminus regionof the Escherichia coli chromosome. New Biol. 3:780–788.

11. Cole, S. T., R. Brosch, J. Parkhill, T. Garnier, C. Churcher, D. Harris, S. V.Gordon, K. Eiglmeier, S. Gas, C. E. Barry, F. Tekaia, K. Badcock, D.Basham, D. Brown, T. Chillingworth, R. Connor, R. Davies, K. Devlin, T.Feltwell, S. Gentles, N. Hamlin, S. Holroyd, T. Hornsby, K. Jagels, B. G.Barrell, et al. 1998. Deciphering the biology of Mycobacterium tuberculosisfrom the complete genome sequence. Nature 393:537–544.

12. Coster, T., C. W. Hoge, L. VandeVerg, A. B. Hartman, E. V. Oaks, M. M.Venkatesan, D. Cohen, G. Robin, A. Fontaine-Thompson, P. J. Sansonetti,and T. L. Hale. 1998. Vaccination against shigellosis with attenuated Shigellaflexneri 2a strain SC602. Infect. Immun. 67:3437–3443.

13. Cousineau, B., S. Lawrence, D. Smith, and M. Belfort. 2000. Retrotranspo-sition of a bacterial group II intron. Nature 404:1018–1021.

14. Davis, M. A., L. Radnedge, K. A. Martin, F. Hayes, B. Youngren, and S. J.Austin. 1996. The P1 ParA protein and its ATPase activity play a direct rolein the segregation of plasmid copies to daughter cells. Mol. Microbiol.21:1029–1036.

15. Demers, B., P. J. Sansonetti, and C. Parsot. 1998. Induction of type IIIsecretion in Shigella flexneri is associated with differential control of tran-scription of genes encoding secreted proteins. EMBO J. 17:2894–2903.

16. Edwards, C. J., D. J. Innes, D. M. Burns, and I. R. Beacham. 1993. UDP-sugar hydrolase isoenzymes in Salmonella enterica and Escherichia coli: silentalleles of ushA in related strains of group I Salmonella isolates, and of ushBin wild-type and K12 strains of E. coli, indicate recent and early silencingevents, respectively. FEMS Microbiol. Lett. 114:293–298.

17. Egile, C., H. d’Hauteville, C. Parsot, and P. J. Sansonetti. 1997. SopA, theouter membrane protease responsible for polar localization of IcsA in Shi-gella flexneri. Mol. Microbiol. 23:1063–1073.

18. Fallarino, A., C. Mavarangelos, U. H. Stroeher, and P. A. Manning. 1997.Identification of additional genes required for O-antigen biosynthesis inVibrio cholerae O1. J. Bacteriol. 179:2147–2153.

19. Fernandez-Prada, C. M., D. L. Hoover, B. D. Tall, A. B. Hartman, J. Kopel-owitz, and M. M. Venkatesan. 2000. Shigella flexneri IpaH7.8 facilitates escapeof virulent bacteria from the endocytic vacuoles of mouse and human mac-rophages. Infect. Immun. 68:3608–3619.

20. Fernandez-Prada, C. M., D. L. Hoover, B. D. Tall, and M. M. Venkatesan.1997. Human monocyte-derived macrophages infected with virulent Shigellaflexneri in vitro undergo a rapid cytolytic event similar to oncosis but notapoptosis. Infect. Immun. 65:1486–1496.

21. Galan, J. E., and A. Collmer. 1999. Type III secretion machines: bacterialdevices for protein delivery into host cells. Science 284:1322–1328.

22. Garcia-DelPortillo, F., M. G. Pucciarelli, and J. Casadesus. 1999. DNAadenine methylase mutants of Salmonella typhimurium show defects in pro-tein secretion, cell invasion, and M cell cytotoxicity. Proc. Natl. Acad. Sci.USA 96:11578–11583.

23. Gomez-Duarte, O. G., and J. B. Kaper. 1995. A plasmid-encoded regulatoryregion activates chromosomal eaeA expression in enteropathogenic Esche-richia coli. Infect. Immun. 63:1767–1776.

24. Groisman, E. A., and H. Ochman. 1996. Pathogenicity islands: bacterialevolution in quantum leaps. Cell 87:791–794.

25. Gronlund, H., and K. Gerdes. 1999. Toxin-antitoxin systems homologouswith relBE of Escherichia coli plasmid P307 are ubiquitous in prokaryotes. J.Mol. Biol. 285:1401–1415.

26. Heithoff, D. M., R. L. Sinsheimer, D. A. Low, and M. J. Mahan. 1999. Anessential role for DNA adenine methylation in bacterial virulence. Science284:967–970.

27. Jackson, R. W., E. Athanassopoulos, G. Tsiamis, J. W. Mansfield, A. Sesma,D. L. Arnold, M. J. Gibbon, J. Murillo, J. D. Taylor, and A. Vivian. 1999.Identification of a pathogenicity island, which contains genes for virulenceand avirulence, on a large native plasmid in the bean pathogen Pseudomonas

3284 VENKATESAN ET AL. INFECT. IMMUN.

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from

Page 15: Complete DNA Sequence and Analysis of the Large Virulence Plasmid of Shigella … · Shigella virulence plasmid. Initial analysis of the sequence identified 293 potential ORFs (Fig.

syringae pathovar phaseolicola. Proc. Natl. Acad. Sci. USA 96:10875–10880.28. Kajava, A. V. 1998. Structural diversity of leucine-rich repeat proteins. J.

Mol. Biol. 277:519–527.29. Kotloff, K. L., J. P. Winickoff, B. Ivanoff, J. D. Clemens, D. L. Swerdlow, P. J.

Sansonetti, G. K. Adak, and M. M. Levine. 1999. Global burden of Shigellainfections: implications for vaccine development and implementation of con-trol strategies. Bull W. H. O. 77:651–666.

30. Lai, V., L. Wang, and P. R. Reeves. 1998. Escherichia coli clone Sonnei(Shigella sonnei) had a chromosomal O-antigen gene cluster prior to gainingits current plasmid-borne O-antigen genes. J. Bacteriol. 180:2983–2986.

31. Mahillon, J., H. A. Kirkpatrick, H. L. Kijenski, C. A. Bloch, C. K. Rode, G. F.Mayhew, D. J. Rose, G. Plunkett, V. Burland, and F. R. Blattner. 1998.Subdivision of Escherichia coli K-12 genome for sequencing: manipulationand DNA sequence of transposable elements introducing unique restrictionsites. Gene 223:47–54.

32. Mahillon, J., C. Leonard, and M. Chandler. 1999. IS elements as constitu-ents of bacterial genomes. Res. Microbiol. 150:675–687.

33. Martinez-Abarca, F., S. Zekri, and N. Toro. 1998. Characterization andsplicing in vivo of a Sinorhizobium meliloti group II intron associated withparticular insertion sequences of the IS630-Tc1/IS3 retrotransposon super-family. Mol. Microbiol. 28:1295–1306.

34. Maurelli, A. T., B. Blackmon, and R. Curtiss. 1984. Loss of pigmentation inShigella flexneri 2a is correlated with loss of virulence and virulence-associ-ated plasmid. Infect. Immun. 43:397–401.

35. Maurelli, A. T., R. E. Fernandez, C. A. Bloch, C. K. Rode, and A. Fasano.1998. “Black holes” and bacterial pathogenicity: a large genomic deletionthat enhances the virulence of Shigella spp. and enteroinvasive Escherichiacoli. Proc. Natl. Acad. Sci. USA 95:3943–3948.

36. Mills, J. A., M. Venkatesan, L. S. Baron, and J. M. Buysse. 1992. Sponta-neous insertion of an IS1-like element into the virF gene is responsible foravirulence in opaque colonial variants of Shigella flexneri 2a. Infect. Immun.60:175–182.

37. Nakata, N., T. Tobe, I. Fukuda, T. Suzuki, K. Komatsu, M. Yoshikawa, andC. Sasakawa. 1993. The absence of a surface protease, OmpT, determinesthe intercellular spreading ability of Shigella: the relationship between theompT and kcpA loci. Mol. Microbiol. 9:459–468.

38. Nataro, J. P., J. Seriwatana, A. Fasano, D. R. Maneval, L. D. Guers, F.Noriega, F. Dubovsky, M. M. Levine, and J. G. Morris. 1995. Identificationand cloning of a novel plasmid-encoded enterotoxin of enteroinvasive Esch-erichia coli and Shigella strains. Infect. Immun. 63:4721–4728.

39. Newland, J. W., T. L. Hale, and S. B. Formal. 1992. Genotypic and pheno-typic characterization of an aroD deletion-attenuated Escherichia coli K12-Shigella flexneri hybrid vaccine expressing S. flexneri 2a somatic antigen.Vaccine 10:766–776.

40. Nordstrom, K., and E. G. Wagner. 1994. Kinetic aspects of control of plas-mid replication by antisense RNA. Trends Biochem. Sci. 19:294–300.

41. Nordstrom, M., and K. Nordstrom. 1985. Control of replication of FIIplasmids: comparison of the basic replicons and of the copB systems ofplasmids R100 and R1. Plasmid 13:81–87.

42. Ochman, H., J. G. Lawrence, and E. A. Groisman. 2000. Lateral genetransfer and the nature of bacterial innovation. Nature 405:299–304.

43. Ohtsubo, H., T. B. Ryder, Y. Maeda, K. Armstrong, and E. Ohtsubo. 1986.DNA replication of the resistance plasmid R100 and its control. Adv. Bio-phys. 21:115–132.

44. Perna, N. T., G. F. Mayhew, G. Posfai, S. Elliott, M. S. Donnenberg, J. B. Kaper,and F. R. Blattner. 1998. Molecular evolution of a pathogenicity island fromenterohemorrhagic Escherichia coli O157:H7. Infect. Immun. 66:3810–3817.

45. Perry, R. D., S. C. Straley, J. D. Fetherston, D. J. Rose, J. Gregor, and F. R.Blattner. 1998. DNA sequence and analysis of the low-Ca21-response plas-mid pCD1 of Yersinia pestis KIM5. Infect. Immun. 66:4611–4623.

46. Philpott, D. J., J. D. Edgeworth, and P. J. Sansonetti. 2000. The pathogenesisof Shigella flexneri infection: lessons from in vitro and in vivo studies. Philos.Trans. R. Soc. Lond. B 355:575–586.

47. Pupo, G. M., R. Lan, and P. R. Reeves. 2000. Multiple independent originsof Shigella clones of Escherichia coli and convergent evolution of many oftheir characteristics. Proc. Natl. Acad. Sci. USA 97:10567–10572.

48. Rajakumar, K., F. Luo, C. Sasakawa, and B. Adler. 1996. Evolutionary perspec-tive on a composite Shigella flexneri 2a virulence plasmid-borne locus comprisingthree distinct genetic elements. FEMS Microbiol. Lett. 144:13–20.

49. Rakin, A., C. Noelting, S. Schubert, and J. Hessemann. 1999. Common andspecific characteristics of the high-pathogenicity island of Yersinia enteroco-litica. Infect. Immun. 67:5265–5274.

50. Runyen-Janecky, L. J., M. Hong, and S. M. Payne. 1999. The virulence plasmid-encoded impCAB operon enhances survival and induced mutagenesis in Shigellaflexneri after exposure to UV radiation. Infect. Immun. 67:1415–1423.

51. Sakellaris, H., N. K. Hannink, K. Rajakumar, D. Bulach, M. Hunt, C.Sasakawa, and B. Adler. 2000. Curli loci of Shigella spp. Infect. Immun.68:3780–3783.

52. Sansonetti, P. J., D. J. Kopecko, and S. B. Formal. 1982. Involvement of aplasmid in the invasive ability of Shigella flexneri. Infect. Immun. 35:852–860.

53. Sasakawa, C., K. Kamata, T. Sakai, S. Y. Murayama, S. Makino, and M.Yoshikawa. 1986. Molecular alteration of the 140-megadalton plasmid asso-ciated with loss of virulence and Congo red binding activity in Shigellaflexneri. Infect. Immun. 51:470–475.

54. Sasakawa, C., S. Makino, K. Kamata, and M. Yoshikawa. 1986. Isolation,characterization, and mapping of Tn5 insertions into the 140-megadaltoninvasion plasmid defective in the mouse Sereny test in Shigella flexneri 2a.Infect. Immun. 54:32–36.

55. Sayeed, S., L. Reaves, and L. Radnedge. 2000. The stability region of thelarge virulence plasmid of Shigella flexneri encodes an efficient postsegrega-tional killing system. J. Bacteriol. 182:2416–2421.

56. Schechter, L. M., S. D. Damrauer, and C. A. Lee. 1999. Two AraC/XylSfamily members can independently counteract the effect of repressing se-quences upstream of the hilA promoter. Mol. Microbiol. 32:629–642.

57. Schneiker, S., B. Kosier, A. Puhler, and W. Selbitschka. 1999. The Sinorhi-zobium meliloti insertion sequence (IS) element ISRm14 is related to apreviously unrecognized IS element located adjacent to the Escherichia colilocus of enterocyte effacement (LEE) pathogenicity island. Curr. Microbiol.39:274–281.

58. Sharpe, M. E., H. M. Chatwin, C. Macpherson, H. L. Withers, and D. K.Summers. 1999. Analysis of the ColE1 stability determinant Rcd. Microbi-ology 145:2145–2144.

59. Silva, R. M., S. Saadi, and W. K. Maas. 1988. A basic replicon of virulence-associated plasmids of Shigella spp. and enteroinvasive Escherichia coli ishomologous with a basic replicon in plasmids of IncF groups. Infect. Immun.56:836–842.

60. Somerville, J. E., L. Cassiano, B. Bainbridge, M. D. Cunningham, and P. P.Darveau. 1996. A novel Escherichia coli lipid A mutant that produces anti-inflammatory lipopolysaccharide. J. Clin. Investig. 97:359–365.

61. Tabuchi, A., Y. N. Min, D. D. Womble, and R. H. Rownd. 1992. Autoregulationof the stability operon of IncFII plasmid NR1. J. Bacteriol. 174:7629–7634.

62. Taira, S., P. Riikonen, H. Saarilahti, S. Sukupolvi, and M. Rhen. 1991. ThemkaC virulence gene of the Salmonella serovar typhimurium 96 kb plasmidencodes a transcriptional activator. Mol. Gen. Genet. 228:381–384.

63. Tanaka, K., T. Rogi, H. Hiasa, D. M. Miao, Y. Honda, N. Nomura, H. Sakai,and T. Komano. 1994. Comparative analysis of functional and structuralfeatures in the primase dependent priming signals, G sites, from phages andplasmids. J. Bacteriol. 176:3606–3613.

64. Tanaka, K., H. Sakai, Y. Honda, T. Nakamura, A. Higashi, and T. Komano.1992. Structural and functional features of cis-acting sequences in the basicreplicon of plasmid ColIb-P9. Nucleic Acids Res. 20:2705–2710.

65. Tatusov, R. L., M. Y. Galperin, D. A. Natale, and E. V. Koonin. 2000. TheCOG database: a tool for genome-scale analysis of protein functions andevolution. Nucleic Acids Res. 28:33–36.

66. Tavakoli, N., A. Comanducci, H. M. Dodd, M. C. Lett, B. Albiger, and P.Bennett. 2000. IS1294, a DNA element that transposases by RC transposi-tion. Plasmid 44:66–84.

67. Tobe, T., T. Hayashi, C. G. Han, G. K. Schoolnik, E. Ohtsubo, and C.Sasakawa. 1999. Complete DNA sequence and structural analysis of theenteropathogenic Escherichia coli adherence factor plasmid. Infect. Immun.67:5455–5462.

68. Tran van Nhieu, G., and P. J. Sansonetti. 1999. Mechanism of Shigella entryinto epithelial cells. Curr. Opin. Microbiol. 2:51–55.

69. Venkatesan, M. M., J. M. Buysse, and A. Hartman. 1991. Sequence variationin two ipaH genes of Shigella flexneri and homology to the LRG-like familyof proteins. Mol. Microbiol. 5:2435–2445.

70. Venkatesan, M. M., J. M. Buysse, and D. J. Kopecko. 1988. Characterizationof invasion plasmid antigen genes (ipaBCD) from Shigella flexneri. Proc. Natl.Acad. Sci. USA 85:9317–9321.

71. Venkatesan, M. M., C. Fernandez-Prada, J. M. Buysse, S. B. Formal, andT. L. Hale. 1991. Virulence phenotype and genetic characteristics of the T-32Istrati Shigella flexneri vaccine strain. Vaccine 9:358–363.

72. Way, S. S., A. C. Borczuk, and M. B. Goldberg. 1999. Thymic independenceof adaptive immunity to the intracellular pathogen Shigella flexneri serotype2a. Infect. Immun. 67:3970–3979.

73. Wu, R., X. Wang, D. D. Wonble, and R. H. Rownd. 1992. Expression of therepA1 gene of IncFII plasmid NR1 is translationally coupled to expression ofan overlapping leader peptide. J. Bacteriol. 174:7620–7628.

74. Yoshioka, Y., Y. Fujita, and E. Ohtsubo. 1990. Nucleotide sequence of thepromoter-distal region of the tra operon of plasmid R100. J. Mol. Biol.214:39–53.

75. Zychlinsky, A., B. Kenny, R. Menard, M.-C. Prevost, I. B. Holland, and P. J.Sansonetti. 1994. IpaB mediates macrophage apoptosis induced by Shigellaflexneri. Mol. Microbiol. 11:619–627.

Editor: V. J. DiRita

VOL. 69, 2001 SEQUENCE ANALYSIS OF THE SHIGELLA VIRULENCE PLASMID 3285

on August 26, 2020 by guest

http://iai.asm.org/

Dow

nloaded from