
CLINICAL MICROBIOLOGY REVIEWS, Oct. 2004, p. 840–862 Vol. 17, No. 4
0893-8512/04/$08.00+0 DOI: 10.1128/CMR.17.4.840–862.2004

Impact of 16S rRNA Gene Sequence Analysis for Identification of Bacteria on Clinical Microbiology and Infectious Diseases

Jill E. Clarridge III*

Department of Laboratory Medicine, University of Washington, and Pathology and Laboratory Medicine Service, Veterans Affairs Medical Center, Seattle, Washington

INTRODUCTION
MECHANICS OF THE PROCESS
    Choice of the 16S rRNA Gene as the Gene To Sequence
    Basics of Sequencing
ASSIGNING AN IDENTIFICATION WITH 16S rRNA GENE SEQUENCES
    Overview of Bacterial Identification and Taxonomic Placement Using 16S rRNA Gene Sequences
    Problems in Generating a Sequence
    Generating Dendrograms and Comparing Sequences
    Sequence Databases
THE CLINICAL MICROBIOLOGIST’S DILEMMA IN ASSIGNING A GENUS AND SPECIES NAME
    Common Definitions of Genus or Species Derived by 16S rRNA Gene Sequence Analysis
    Problems with the Present Nomenclature
    Microheterogeneity in the 16S rRNA Gene Sequence Is Common
MAJOR IMPROVEMENTS IN CLINICAL MICROBIOLOGY PRACTICE BY IDENTIFYING BACTERIA BY SEQUENCE INSTEAD OF PHENOTYPE
    16S rRNA Gene Sequences Can Better Identify Poorly Described, Rarely Isolated, or Phenotypically Aberrant Strains
    16S rRNA Gene Sequences Can Be Routinely Used for Identification of Mycobacteria
    16S rRNA Gene Sequence Analysis Can Lead to the Discovery and Description of Novel Pathogens
    16S rRNA Gene Sequence Analysis Can Identify Noncultured Bacteria
COSTS IN A ROUTINE CLINICAL MICROBIOLOGY LABORATORY
STANDARDS FOR EDITORS, REVIEWERS, AND LABORATORIANS
CONCLUSIONS
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

One area within the practice of clinical microbiology is the craft of putting scientific names to microbial isolates. This is usually done with the intent of giving insight into the etiological agent causing an infectious disease, including pathological associations and possible effective antimicrobial therapy. The historical method for performing this task is dependent on the comparison of an accurate morphologic and phenotypic description of type strains or typical strains with the accurate morphologic and phenotypic description of the isolate to be identified. Microbiologists authoring standard references such as Bergey’s Manual of Systematic Bacteriology or the Manual of Clinical Microbiology or compiling results from well-characterized strains such as those found at the Centers for Disease Control and Prevention or the American Type Culture Collection (ATCC) would publish tables summarizing the characteristics of each species of bacteria (35, 54, 60). Clinical microbiologists would try to match the results for their unknown clinical strain with a group in these tables. Not infrequently, there would be no perfect match and a judgment would have to be made about the most probable identification. Although various schemes and computer programs were devised to help in these judgments, identification could vary among laboratories (96).

In the 1980s, a new standard for identifying bacteria began to be developed. In the laboratories of Woese and others, it was shown that phylogenetic relationships of bacteria, and, indeed, all life-forms, could be determined by comparing a stable part of the genetic code (111, 113). Candidates for this genetic area in bacteria included the genes that code for the 5S, the 16S (also called the small subunit), and the 23S rRNA and the spaces between these genes. The part of the DNA now most commonly used for taxonomic purposes for bacteria is the 16S rRNA gene (7, 36, 44, 52, 64, 101). The 16S rRNA gene is also designated 16S rDNA, and the terms have been used interchangeably: current ASM policy is that “16S rRNA gene” be used. The 16S rRNA gene can be compared not only among all bacteria but also with the 16S rRNA gene of archeobacteria and the 18S rRNA gene of eucaryotes. Figure 1 shows the relationship of major branches of life, the Archaea, Bacteria (procaryotes), and Eucarya, as well as the major branches within the procaryotes based on these gene sequences (62, 64, 111, 113).

The goal of this review is to describe not only the mechanism and limits of bacterial 16S rRNA gene sequence analysis but also the impact and potential contribution that the 16S rRNA gene sequence analysis can make to the understanding of clinical microbiology and infectious diseases. It is hoped that this will promote recognition that the correct identification or taxonomic name assignment can make a difference in our understanding of the pathogenic process and in clinical outcome. A further goal is to help the clinical microbiologist winnow the enormous amount of taxonomic information now being generated in order to promote meaningful and scientifically accurate communications with clinical colleagues.

* Mailing address: Pathology and Laboratory Medicine Services (113), VA Medical Center, 1660 S. Columbian Way, Seattle, WA 98108. Phone: (206) 277-4514. Fax: (206) 764-2001. E-mail: [email protected].

MECHANICS OF THE PROCESS

Choice of the 16S rRNA Gene as the Gene To Sequence

In the 1960s, Dubnau et al. (28) noted conservation in the 16S rRNA gene sequence relationships in Bacillus spp. Widespread use of this gene sequence for bacterial identification and taxonomy followed a body of pioneering work by Woese, who defined important properties. Foremost is the fact that it seems to behave as a molecular chronometer, as pointed out in an excellent review article by Woese (113). The degree of conservation is assumed to result from the importance of the 16S rRNA as a critical component of cell function. This is in contrast to the genes needed to make enzymes. Mutations in these genes can usually be tolerated more frequently since they may affect structures not as unique and essential as rRNA (if a bacterium does not have the gene to make the enzymes needed to utilize lactose, it can use an alternative sugar or protein as an energy source). Thus, few other genes are as highly conserved as the 16S rRNA gene. Although the absolute rate of change in the 16S rRNA gene sequence is not known, it does mark evolutionary distance and relatedness of organisms (44, 49, 62, 100). Problems in assigning a numerical value to this rate of change include the possibility that the rate of change of the 16S rRNA gene may not be identical for all organisms (different taxonomic groups could have different rates of change), the rates could vary at times during evolution, and the rates could be different at different sites throughout the 16S rRNA gene. There are so-called “hot spots” which show larger numbers of mutations (101, 104); these areas are not the same for all species. 16S rRNA is also the target for several antimicrobial agents. As such, mutations in the 16S rRNA gene can affect the susceptibility of the organism to these agents and the 16S rRNA gene sequence can distinguish phenotypic resistance to antimicrobial agents (69, 70). However, these characteristics do not obviate or affect the use of 16S rRNA gene sequence for bacterial identification or assignment of close relationships at the genus and species level, as used in clinical microbiology. They can have a greater impact on the assignment of relationships of the deeper (more distantly related) branches (36).

FIG. 1. Universal phylogenetic tree based on the 16S rRNA gene sequence comparisons. Reprinted from reference 62 with permission of the publisher.

The 16S rRNA gene sequence is about 1,550 bp long and is composed of both variable and conserved regions. The gene is large enough, with sufficient interspecific polymorphisms, to provide distinguishing and statistically valid measurements. Universal primers are usually chosen as complementary to the conserved regions at the beginning of the gene and at either the 540-bp region or at the end of the whole sequence (about the 1,550-bp region), and the sequence of the variable region in between is used for the comparative taxonomy (11, 75). Although 500 and 1,500 bp are common lengths to sequence and compare, sequences in databases can be of various lengths.

The 16S rRNA gene sequence has been determined for a large number of strains. GenBank, the largest databank of nucleotide sequences, has over 20 million deposited sequences, of which over 90,000 are of the 16S rRNA gene. This means that there are many previously deposited sequences against which to compare the sequence of an unknown strain.

Lastly, the 16S rRNA gene is universal in bacteria, and so relationships can be measured among all bacteria (111, 113) (Fig. 2). In general, the comparison of the 16S rRNA gene sequences allows differentiation between organisms at the genus level across all major phyla of bacteria, in addition to classifying strains at multiple levels, including what we now call the species and subspecies level. The occasional exceptions to the usefulness of 16S rRNA gene sequencing usually relate to more than one well-known species having the same or very similar sequences.

It is also important to consider whether it is necessary to sequence the whole 1,500-bp length or whether the commonly reported shorter sequences can provide comparable information. Sometimes sequencing the entire 1,500-bp region is necessary to distinguish between particular taxa or strains (84, 85). Sequencing of the entire 1,500-bp sequence is also desirable and usually required when describing a new species. However, for most clinical bacterial isolates the initial 500-bp sequence provides adequate differentiation for identification and in fact can provide a bigger percent difference between strains because the region shows slightly more diversity per kilobase sequenced. Kattar et al. (48) found that 66% of the variability in the 16S rRNA gene sequence among Bordetella species was in the first 500 bp. Evaluations published in the literature, made using the MicroSeq database (Applied Biosystems Inc. [ABI], Foster City, Calif.), are usually based on the 500-bp sequence (42, 66, 97, 98). Other researchers have made identifications using sequences of about 400 bp (6) or even less than 200 bp (109). Data in the figures and calculations in this review refer to the 500-bp length unless otherwise noted. From the MicroSeq databases of over 1,400 organisms for both the 500- and 1,500-bp lengths, we compared the 500- and 1,500-bp sequences for 100 organisms by using each length to generate dendrograms and found the relationships of species to be basically the same with either length. For example, Fig. 3 shows that dendrograms generated using either the 1,500-bp 16S rRNA gene sequence (left side of figure) or the 500-bp 16S rRNA gene sequence (right side of figure) of a group of clinical and type strains of brevibacteria are similar but not identical.
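The "percent difference between strains" discussed above can be illustrated with a minimal sketch. It assumes the two sequences have already been aligned to the same length, and it skips positions containing a gap or an ambiguous base, which is only one of several possible conventions. The function name and the toy sequences are hypothetical; this is not the calculation used by MicroSeq or any other package.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of compared positions at which two aligned sequences differ.

    Assumption: the inputs are pre-aligned and of equal length. Positions
    where either sequence has a gap ('-') or an ambiguous base ('N') are
    skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous calls
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * mismatches / compared


# Toy 10-base example (not real 16S data): one mismatch in ten positions.
print(percent_difference("ACGTACGTAC", "ACGTTCGTAC"))  # 10.0
```

Because the result is normalized by the number of positions compared, the same handful of differences yields a larger percentage over a short, variable stretch than over the full gene, which is the effect the paragraph describes for the first 500 bp.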

On a practical note, generating the 500-bp sequence is less expensive and easier since it takes more sequencing reactions to generate the 1,500-bp sequence.

Many other genomic regions have also been used to examine the phylogenetic relationships among bacteria. Whole-genome analysis has been tried, but this is quite difficult because the genomes are of such different sizes and because gene duplication, gene transfer, gene deletion, gene fusion, and gene splitting are common; at present there are fewer than 100 whole genomes to compare (3, 112). However, it has been observed that the trees based on whole-genomic analysis and the 16S rRNA gene trees are similar (3). Other areas of the rRNA gene have also been used for studying phylogenetic relationships among bacteria. Roth et al. used the 16S-23S rRNA gene internal transcribed spacer sequences to distinguish among Mycobacterium spp., finding it particularly useful for species that were indistinguishable by 16S rRNA gene sequences (82). Others have found the use of 23S rRNA sequences helpful in distinguishing among Streptococcus spp. (72). Although some researchers find that an overall robustness of the method is suggested because the major branching points of the phylogenetic tree were conserved when either the 16S rRNA or 16S-23S rRNA gene sequences were used (82), others find the 16S rRNA gene sequence much more useful for phylogenetic analysis than the 16S-23S rRNA gene region (89). In any case, the method is not widely used, and there are few comparative sequences. For mycobacteria, the gene encoding the 65-kDa heat shock protein is highly conserved and also has been used to define taxonomic relationships (63, 79, 101). Although the sequences of the gene encoding the 65-kDa heat shock protein are phylogenetically useful, far fewer of them are available in databases. Phylogenetic trees obtained using protein-encoding gene sequence comparisons do not seem to reveal deep-rooted taxonomic and evolutionary relationships as reliably as those obtained using the 16S rRNA gene (36, 40, 55).

In general, if one wants to compare strains for epidemiological purposes or to detect a strain having a particular virulence factor, 16S rRNA gene analysis is not usually adequate: there is not enough variation, and, obviously, the region does not encode virulence factors. An exception to this is the microheterogeneity in the 16S rRNA gene sequence found by Sacchi et al. (85), which could be used to track Neisseria meningitidis strains. In addition, if one is interested only in differentiating species within a particular genus, a better gene than the 16S rRNA gene might be found to identify species. For example, the citrate synthetase gene in the genera Bartonella and Rickettsia (32, 73) seems to be unique for each species and thus is an excellent tool in experiments to differentiate them. However, no gene has shown as broad applicability over all the taxonomic groups as the 16S rRNA gene. Thus, if the goal is to identify an unknown organism on the basis of no a priori knowledge, the 16S rRNA gene sequence is an excellent and extensively used choice. The new edition of Bergey’s Manual of Systematic Bacteriology, the most widely used and authoritative reference on bacterial taxonomy, is organized using 16S rRNA gene sequence analysis as the backbone. Two chapters in the second edition of Bergey’s Manual of Systematic Bacteriology are particularly recommended and give an excellent overview (36, 55).

FIG. 2. Dendrogram showing the genetic relationships of many of the major groups of clinically important organisms based on the 500-bp 16S rRNA gene sequence. Most sequences are of type strains from the MicroSeq database. The Leptotrichia buccalis sequence was downloaded from GenBank, and the sequence for clinical strain Unknownsp01M1398 was generated in our laboratory.

Basics of Sequencing

Nucleic acid sequencing methods have undergone tremendous advances over the past decade. These rapid advances have made it possible for even a small laboratory to determine the sequence of millions of base pairs of DNA per year. The quality of sequence data has improved with the speed and technology available. Thus, one sees that in public databases such as GenBank, the 16S rRNA gene sequences deposited in the early 1990s have many more incorrect and indeterminate bases than do recently generated sequences. We briefly outline the steps required below and in Table 1. Table 1 also shows the approximate time required for each step of the ABI dye terminator method. Other publications give these steps in more detail, along with diagrams (52, 65, 98). Generally, it is possible to generate a sequence in less than 1.5 working days by using less than a colony or directly from a specimen (9, 22, 98).

Historically, there have been several methods to determine DNA sequence. These methods are detailed in reference 87 (chapters 13 and 14). Bacterial genomic DNA is extracted from whole cells by using a standard method (87) or a commercial system (e.g., PrepMan DNA extraction reagent; ABI). The DNA is used as the template for PCR to amplify a segment of about 500 or 1,500 bp of the 16S rRNA gene sequence. Broad-based or universal primers complementary to conserved regions are used so that the region can be amplified from any bacterium. The PCR products are purified to remove excess primers and nucleotides; several good commercial kits are available (e.g., QIAquick PCR purification kit [Qiagen] and Microcon-100 Microconcentrator columns [Millipore]).

The next step is a process called cycle sequencing. It is similar to PCR in that it uses DNA (purified products of the first PCR cycle) as the template. Both the forward and reverse sequences are used as the template in separate reactions in which only the forward or reverse primer is used. Cycle sequencing also differs from PCR in that no new template is formed (the same template is reused for as many cycles as programmed, usually 25 cycles) and the product is a mixture of DNA of various lengths. This is achieved by adding specially labeled bases called dye terminators (along with unlabeled bases), which, when they are randomly incorporated in this second cycle, terminate the sequence. Thus, fragments of every size are generated. As each of the four added labeled terminator bases has a different fluorescent dye, each of which absorbs at a different wavelength, the terminal base of each fragment can be determined by a fluorometer.

The products are purified to remove unincorporated dye terminators, and the length of each is determined using capillary electrophoresis (e.g., ABI PRISM 3100 genetic analyzer with 16 capillaries or ABI PRISM 310 genetic analyzer with 1 capillary) or gel electrophoresis (e.g., the Visible Genetics system). Since we then know the length and terminal base of each fragment, the sequence of the bases can be determined. The two strands of the DNA are sequenced separately, generating both forward and reverse (complementary) sequences. An electropherogram, a tracing of the detection of the separated fragments as they elute from the column (or are separated in the gel) in which each base is represented by a different color, can be manually or automatically edited. It is possible to have the fragments of various lengths so well separated that every base of a 500-bp sequence can be determined. When ambiguities occur, most of them can be resolved by visual reediting of the electropherogram.

FIG. 3. A comparison of dendrograms generated using either the 1,500-bp 16S rRNA gene sequence (left) or the 500-bp 16S rRNA gene sequence (right) of a group of clinical and type strains of Brevibacterium.
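The statement that knowing the length and terminal base of each fragment is enough to determine the sequence can be shown with a deliberately simplified sketch. It treats each terminated fragment as a hypothetical (length, terminal base) pair and reads the bases off in order of length; positions with no fragment are reported as N. Real base calling works on electropherogram peak data, so this illustrates only the principle, not any instrument's software.

```python
def read_sequence(fragments):
    """Rebuild a sequence from (length, terminal_base) pairs.

    Assumption: 'fragments' is an idealized list in which a fragment of
    length k ends in the base found at position k of the template strand.
    Missing lengths become 'N'; conflicting calls raise an error.
    """
    base_at = {}
    for length, base in fragments:
        if length in base_at and base_at[length] != base:
            raise ValueError(f"conflicting calls at position {length}")
        base_at[length] = base
    longest = max(base_at)
    # Sorting by fragment length is what electrophoresis does physically.
    return "".join(base_at.get(pos, "N") for pos in range(1, longest + 1))


# Fragments arrive in no particular order; length 4 was never observed.
toy_fragments = [(3, "G"), (1, "A"), (2, "C"), (5, "T")]
print(read_sequence(toy_fragments))  # ACGNT
```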

We have evaluated two different DNA sequencers for clinical microbiology identifications. The model 3100 ABI sequencer was superior to the model 310 sequencer in the length of reliable sequence (520 and 460 bp, respectively), in addition to the published superiority of ease of use and time of run. In the time and cost analysis, our times and thus our costs are based on data using the ABI PRISM 3100 genetic analyzer.

The generated DNA sequences are usually assembled by aligning the forward and reverse sequences. This consensus sequence is then compared with a database library by using analysis software. Some systems allow comparisons of the single forward or reverse sequences. Well-known databases of 16S rRNA gene sequences that can be consulted via the World Wide Web are GenBank (http://www.ncbi.nlm.nih.gov/), the Ribosomal Database Project (RDP-II) (http://rdp.cme.msu.edu/html/), the Ribosomal Database Project European Molecular Biology Laboratory (http://www.ebi.ac.uk/embl/), Smart Gene IDNS (http://www.smartgene.ch), and Ribosomal Differentiation of Medical Microorganisms (RIDOM) (http://www.ridom.com/). The proprietary MicroSeq 500 bacterial database (version 1.4.2) contains sequences for 1,434 species or subspecies within 235 genera.
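As a rough illustration of assembling forward and reverse reads into a consensus, the sketch below reverse-complements the reverse read and merges it with the forward read position by position. It assumes both reads cover exactly the same region and are already the same length, so no alignment step is needed; real assembly software aligns the reads and weighs base quality. The function names and the rule of writing N where the strands disagree are illustrative assumptions.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def consensus(forward: str, reverse: str) -> str:
    """Merge a forward read with a reverse read of the same region.

    Assumption: the reads are the same length and fully overlapping.
    Where one strand has an ambiguous call (N), the other strand's base
    is used; where the strands disagree, N is written.
    """
    flipped = reverse_complement(reverse)
    if len(flipped) != len(forward):
        raise ValueError("reads must cover the same region")
    merged = []
    for f, r in zip(forward.upper(), flipped):
        if f == r:
            merged.append(f)
        elif f == "N":
            merged.append(r)
        elif r == "N":
            merged.append(f)
        else:
            merged.append("N")  # unresolved disagreement
    return "".join(merged)


# The forward read has one ambiguous base that the reverse read resolves.
print(consensus("ACGTNA", "TAACGT"))  # ACGTTA
```

Positions left as N in such a merge correspond to the ambiguities that, as noted above, are usually resolved by visually re-editing the electropherogram.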

ASSIGNING AN IDENTIFICATION WITH 16S rRNA GENE SEQUENCES

Overview of Bacterial Identification and Taxonomic Placement Using 16S rRNA Gene Sequences

In Fig. 2, the major branches of important bacteria encountered in clinical practice are shown in dendrogram form, whereas in Fig. 1, the major divisions of bacteria are shown with all other life-forms in a star form. For reference, note and compare where Bacillus, Clostridium, and Escherichia are in each figure.

Figure 2 was generated by choosing 62 genetically widely disparate strains representing major taxonomic groups or clades of medical interest from the MicroSeq database. Chlamydia trachomatis is the outgroup. The horizontal line at the top (in this case, 16.105%) is used to provide a rough measure of genetic distance. To find the approximate genetic difference between two clades in Fig. 2, the two horizontal distances between the species to be assessed are added (the vertical distance does not count) and the total is compared to the top horizontal line. For example, Nocardia asteroides and Corynebacterium diphtheriae differ by approximately 10% whereas Treponema medium and Mycoplasma hominis differ by approximately 28%. The actual numbers generated in this manner are not very accurate, especially for genera that are not closely related. The values also vary somewhat with the computer program used to generate the dendrogram. However, they are useful for estimating relative relatedness. The scale of the dendrogram and the number of organisms that are missing can be better appreciated if one notes that the organism Escherichia coli represents the entire Enterobacteriaceae clade of over 50 species and Vibrio cholerae is the most closely related organism represented in Fig. 2. In addition to dendrograms, percent similarity, percent dissimilarity, whole-gene alignment, and concise alignment can be used to compare and evaluate sequences.
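The rule of adding the horizontal distances from each species back to their shared branch point can be sketched as a sum of branch lengths in a small tree. The tree below is entirely hypothetical (the node names and lengths are not taken from Fig. 2), and the data layout, a dictionary mapping each node to its parent and branch length, is just one convenient assumption.

```python
def path_to_root(tree, node):
    """List (ancestor, cumulative horizontal distance) from node to root."""
    path = [(node, 0.0)]
    distance = 0.0
    while tree[node] is not None:
        parent, branch_length = tree[node]
        distance += branch_length
        path.append((parent, distance))
        node = parent
    return path


def pairwise_distance(tree, leaf_a, leaf_b):
    """Sum the branch lengths from each leaf to their nearest shared node."""
    distance_from_a = dict(path_to_root(tree, leaf_a))
    for ancestor, distance_from_b in path_to_root(tree, leaf_b):
        if ancestor in distance_from_a:
            return distance_from_a[ancestor] + distance_from_b
    raise ValueError("leaves do not share an ancestor")


# Hypothetical tree: node -> (parent, branch length in percent), root -> None.
toy_tree = {
    "root": None,
    "node1": ("root", 2.0),
    "species_A": ("node1", 3.0),
    "species_B": ("node1", 1.5),
    "species_C": ("root", 6.0),
}
print(pairwise_distance(toy_tree, "species_A", "species_B"))  # 4.5
print(pairwise_distance(toy_tree, "species_A", "species_C"))  # 11.0
```

As the text cautions, distances read from a dendrogram in this way are only rough estimates of relatedness, and they vary with the program used to build the tree.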

Problems in Generating a Sequence

How often are sequences wrong? Although the technology is such that the sequence reading is getting ever more accurate, we estimate that there may be an operator mistake in editing in 1 in 5,000 to 1 in 10,000 bp. Errors at this level do not make a difference in the species name (44). There is also a question whether the DNA sequence must have overlap to be accurate or whether an adequate sequence could be obtained by using just the forward or reverse sequence. Several large and influential laboratories use only the forward sequence (6). In one study of 50 isolates, either forward or reverse sequence could be used to assign a correct species identification, with less than 1% difference between sequences (J. E. Clarridge, Q. Zhang, and S. Heward, Abstr. 101st Gen. Meet. Am. Soc. Microbiol. 2001, abstr. C-44, 2001). Other laboratories generate sequences using multiple overlaps, particularly if microheterogeneity is important (84, 85). Since high-fidelity polymerases with proofreading capability became available, PCR-generated errors have been very rare.

TABLE 1. Procedures and time to perform 16S rRNA gene sequence analysis for bacterial identification in a routine clinical microbiology laboratory^a

| Step | Procedure | Time (hands-on) | Waiting time (machine time)^b |
|---|---|---|---|
| 1 | Harvest. Organism can be harvested from any plate or broth if it is a pure culture, e.g., antibiotic susceptibility plate. The age of the culture is not important. One or two 0.01 loops full is sufficient. Save for processing as a batch. | 3–5 min each | |
| 2 | Extraction of DNA. | 0.5 h | 10 min and 3 min |
| 3 | PCR amplification. | 0.5 h | 2.0 h |
| 4 | Analysis of the PCR product. Loading, running, and examining gel. | 20 min | 1 h |
| 5 | Purification of PCR products. | 1 h | |
| 6 | Cycle sequencing. | 30 min | 3.0 h |
| 7 | Purification of PCR products. | 1 h | |
| 8 | Sequencing of the 16S rRNA gene. Load capillary tray; allow to run during time away, e.g., overnight or while doing something else. | 1 h | 2.5 h |
| 9 | Analysis time. It takes 5 min or less to edit the sequence if the operator, software, and runs are good. | 5–15 min/sample | |
| 10 | Assignment of a name. If the organism is in the database, it takes 1 min; if it is a novel organism and several databases must be searched and sequences compared in detail, 15–30 min. At this point, correlation with phenotypic characteristics and clinical presentation is also done. | Not counted for this analysis | |
| 11 | Reporting of results. | 30 min | |
| | Total labor time, based on integrating and completing three runs of 20 samples per wk. | 60 samples/40 h | 1 sample/40 min |

^a Modified from Clarridge et al., Abstr. 101st Gen. Meet. Am. Soc. Microbiol. 2001, with subsequent contributions by Kristina Hulten.
^b Based on an ABI 3100 instrument.

In contrast to the accuracy achievable nowadays with the excellent equipment and reagents available, some of the sequences deposited in public databases such as GenBank, particularly those derived over 10 years ago, are not very accurate (18). This is usually because some sequences were not clearly generated, often because of poor separation of the fragments in the gel electrophoresis steps. When evaluating a sequence generated in another laboratory or taken from another database for which the electropherogram is not available, a clue to a poorly read or poorly generated sequence is that it has many ambiguous bases, which are noted as N, R, Y, W, M, S, or K (meaning that the base is unknown, A or G, C or T, A or T, A or C, G or C, or G or T, respectively). One can also see the letters B, D, H, and V, representing 3-base ambiguities. Sequences deposited more recently are much more accurate.

It is worthwhile to examine the quality of sequences before using them. During a study of Actinomyces strains (17), we compared sequences from a commercial database (MicroSeq), a public database (GenBank-EMBL), and our own internally generated database. The type strain of Actinomyces turicensis in the GenBank database and the 12 clinical strains that we sequenced showed minimal (none to two) differences and no ambiguous bases, indicating that all were good-quality sequences. The sequence of the type strain of A. meyeri from MicroSeq was also of high quality. However, the type strain of A. graevenitzii in the GenBank database was of poorer quality, with 25 N's in 500 bp sequenced (5%). Thus, it initially appeared that the phenotypically similar A. graevenitzii strains were genetically more heterogeneous than the A. turicensis strains, although this may not be true.
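A quick screen for ambiguous bases of the kind described above can be sketched in Python. The mapping follows the standard IUPAC nucleotide codes listed in the text; the function name and the synthetic test sequence are illustrative assumptions, and no quality cutoff is implied.

```python
# IUPAC ambiguity codes and the bases each one may stand for.
IUPAC_AMBIGUITY = {
    "R": "AG", "Y": "CT", "W": "AT", "M": "AC", "S": "GC", "K": "GT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguity_report(seq):
    """Count ambiguous bases and return (counts per code, percent ambiguous)."""
    seq = seq.upper()
    counts = {}
    for base in seq:
        if base in IUPAC_AMBIGUITY:
            counts[base] = counts.get(base, 0) + 1
    total = sum(counts.values())
    percent = 100.0 * total / len(seq) if seq else 0.0
    return counts, percent


if __name__ == "__main__":
    # Synthetic 500-base sequence containing 25 N's (not a real deposit).
    toy = "ACGTN" * 25 + "ACG" * 125
    counts, percent = ambiguity_report(toy)
    print(len(toy), counts, percent)
```

With 25 N's in 500 bases the function reports 5% ambiguous positions, the same proportion noted above for the poorer-quality type strain sequence.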

Although most ambiguities can be resolved by reediting the original electropherogram if it is available (electropherograms are not available from GenBank and do not come with manuscripts for review), there can also be situations in which it is not possible to assign a unique base to a particular position. Usually this is because of some technical problem, e.g., the original specimen was not a pure culture, the yield of labeled product was too low, or the column was malfunctioning. In this case, there are usually many unreadable bases and the whole sequencing procedure must be repeated.

It is also possible that intracellular polymorphisms might cause difficulties in obtaining an easily interpretable sequence; i.e., since there are multiple copies of the 16S rRNA gene within a single-cell genome, there could be several different sequences and thus there could be two different base pairs at a given location. The existence of variant 16S rRNA gene alleles in a single genome has been convincingly demonstrated in several reports (38, 61, 74, 104, 110). Most of the documented intracellular heterogeneities are only one or two polymorphisms per sequence and would not lead to different species identification. Among these reports, the highest rate of polymorphisms found (104) was with stock cultures of Streptomyces spp.; 2.5% of 475 strains showed diversity of more than 1 in 1,500 bp. The greatest diversity of 14 bp, shown by one strain, is still only 1% of the total number of base pairs sequenced. In Streptomyces spp., most of the intracellular polymorphisms are in the hypervariable �-helix region between bp 173 and 195, the same region of the 16S rRNA gene in which the individual Streptomyces species differ the most (86). Not all genera have the same hypervariable region. There is also a report comparing whole-genome sequences that finds widely disparate 16S rRNA gene sequences in a single clone (3), suggesting large-fragment gene transfer. The overall occurrence of polymorphisms may be reevaluated in the future by techniques that could detect a variant allele expressed in minor amounts, but at this point it does not seem to be a problem for accurate identifications in clinical microbiology. During a review of hundreds of sequence analyses from our laboratory, we found that intracellular polymorphism has not been a problem and that only a few Y's or R's have had to be assigned. Even though the intracellular polymorphisms may not be sufficiently numerous to generate a different species name based on each of the sequences, they are to be considered and can be useful when using the 16S rRNA gene sequence for epidemiological strain tracking (38, 85).

One must be certain to distinguish the existence of two (or more) variant 16S rRNA genes in a single genome from a mixture of closely related but distinct strains in the presumed pure culture. In our studies of the “Streptococcus milleri” group, we reported this second possibility, in which there is a mixed culture of two or more separate but highly related strains in the same specimen. For example, two strains of S. anginosus (strains VAMC 417-1 and 417-2, as noted in reference 13) were originally frozen as a single isolate, but with precise work in sorting out the slightly different phenotypes, each substrain could be isolated in pure culture and was shown to be unique in genotype and phenotype. The sequence difference in the isolates was only 4 bp in the 500-bp length, and there were no ambiguous bases. If we had sequenced the mixture, these four positions might have appeared as three R's and a Y, which some could have interpreted as intracellular polymorphism. The situation in which the original specimen is not a pure culture but the contaminating strain is not closely related is easier to resolve since there would typically be over 50 ambiguous bases; under these circumstances, the organism(s) should be reisolated and resequenced.
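To show how a mixture of two closely related strains could produce R's and Y's in a single read, the following Python sketch merges two equal-length sequences into the string of IUPAC codes that an idealized sequencer might report. The two toy sequences are invented, the function name is hypothetical, and the sketch assumes that both templates contribute equally and contain only A, C, G, and T.

```python
# Two-base IUPAC codes keyed by the unordered pair of bases they represent.
_TWO_BASE_CODE = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AT"): "W",
    frozenset("AC"): "M", frozenset("CG"): "S", frozenset("GT"): "K",
}


def mixed_read(seq_a, seq_b):
    """Return the read expected from an equal mixture of two templates.

    Identical positions keep their base; positions that differ are
    replaced by the matching two-base ambiguity code.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length in this sketch")
    out = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        out.append(a if a == b else _TWO_BASE_CODE[frozenset((a, b))])
    return "".join(out)


if __name__ == "__main__":
    strain_1 = "ACGTACGTAA"  # invented toy sequence
    strain_2 = "GCGTATGTGG"  # differs from strain_1 at four positions
    print(mixed_read(strain_1, strain_2))
```

In this toy case the four differing positions appear as three R's and one Y, which by itself cannot be told apart from intracellular polymorphism; only reisolation of the substrains settles the question.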

Generating Dendrograms and Comparing Sequences

Several sequence-comparing software packages are available. We generally use the proprietary software that comes with the MicroSeq method. Other common software packages are PAUP (107), BLAST (1), and Phylip (30, 31; Phylogeny Inference Package, University of Washington). BIBI, a bioinformatics bacterial identification tool, has recently been developed to simplify and automate bacterial identifications using DNA sequence analysis (24); it is available at http://pbil.univ-lyon1.fr/bibi/. An excellent website that compares 194 of the phylogeny software packages and 16 free servers that are used in generating dendrograms, and that also addresses other concepts covered in this review, is http://evolution.genetics.washington.edu/phylip/software.html. Comparisons are commonly shown as dendrograms and linear alignments (for concise linear alignments, all the identical base pairs are omitted and only the differences are shown).

Methods commonly used for generating dendrograms are the NJ (neighbor-joining) method, the UPGMA (unweighted pair group method with arithmetic averages), and the WPGMA (weighted pair group method with arithmetic averages) (86, 107). The methods are comparable, and the major groupings are preserved if the isolates are closely related. However, when the taxa being compared are less closely related, the dendrogram relationships are more strongly affected by the program used. Figure 4 shows the dendrograms generated by the NJ method for some Streptococcus spp. Figures 4A and B are dendrograms with different outgroups, i.e., the primary sequence against which a sequence is compared. The best choice for the outgroup is a closely related strain just outside the group being studied. When an inappropriate outgroup is chosen, as in Fig. 4A (Chlamydia trachomatis is too distantly related), the relationships of the clades can be obscured. S. anginosus may be too closely related but is useful in that the differences are clear (Fig. 4B). Another method of comparison is the concise alignment, in which only the differences among the compared sequences are shown; Fig. 4C gives a concise alignment of the same isolates. The numbers at the top of the figure are the base positions in the sequence and are to be read vertically; for example, strain VAMC5210 differs from strain S7745 at positions 137, 274, and 487.
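Of the clustering methods named above, UPGMA is the simplest to write down. The following Python sketch is a minimal, unoptimized version that repeatedly joins the two closest clusters and averages distances weighted by cluster size; it returns only the nested grouping, not branch lengths. The distance values are invented, and the function is an illustration rather than the routine used by any of the packages cited.

```python
from itertools import combinations


def upgma(dist):
    """Cluster leaves by UPGMA.

    dist: dict mapping (leaf_a, leaf_b) -> pairwise distance, one entry
    per unordered pair. Returns the tree as nested tuples.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}
    size = {}  # current clusters -> number of leaves they contain
    for a, b in dist:
        size[a] = 1
        size[b] = 1
    while len(size) > 1:
        # Join the two clusters separated by the smallest distance.
        a, b = min(combinations(size, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in size:
            if c != a and c != b:
                # UPGMA: average weighted by the sizes of the joined clusters
                # (WPGMA would use the simple mean of the two distances).
                d[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))]
                    + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
    return next(iter(size))


if __name__ == "__main__":
    toy = {
        ("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0,
        ("A", "D"): 10.0, ("B", "D"): 10.0, ("C", "D"): 10.0,
    }
    print(upgma(toy))
```

In the toy data, D is the most distant leaf and therefore joins last, playing a role similar to that of an outgroup.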

The length of sequence analyzed and the alignment tool used can also affect the comparison of sequences. In Fig. 3, even though the relationships shown between the brevibacteria are similar, the dendrogram reflects the fact that the three strains in the Brevibacterium mcbrellneri group show no difference in the first 500-bp sequence but there are five or six differences in the last 1,000-bp sequence. This is a less common occurrence than the case where there is more heterogeneity in the initial segment of the gene (48). However, if the clinical strain VAMC3643 and B. epidermidis are compared using two alignment tools (BLAST and the Needleman-Wunsch algorithm), four nonidentical sets of dissimilarity percentages are generated. For both sets of comparisons, the 500-bp sequences show more dissimilarity. More important is that the two programs give somewhat different results; BLAST is faster but less accurate than the Needleman-Wunsch algorithm. For example, the percent dissimilarity in the whole 1,500-bp sequence between clinical strain VAMC3643 and B. epidermidis was 0.8% using the Needleman-Wunsch algorithm and 1.5% using the BLAST comparison. The same calculations based on the first 500 bp are 1.8 and 2.8%, respectively. The order of relatedness was about the same for all four sets of calculations for the closely related strains. However, this is not true for the more distantly related species shown in Fig. 3. More caution must be exercised in using sequence data to assign exact taxonomic relationships between the higher taxa (36, 55).

FIG. 4. Importance of an appropriate outgroup and a concise comparison of strains. (A and B) Chlamydia trachomatis (A) is too distantly related to show the differences that are easily seen using S. anginosus as the outgroup (B). (C) Exact base pair differences between strains seen in panel B in concise alignment form. The numbers at the top of the figure are the base positions in the sequence and are to be read vertically; for example, strain VAMC5210 differs from S7745 at positions 137, 274, and 487.
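The Needleman-Wunsch comparison mentioned above is a global alignment computed by dynamic programming. The Python sketch below is a minimal version with a simple linear gap penalty; the scoring values (match 1, mismatch -1, gap -2) and the decision to count gapped columns as differences are assumptions made for illustration, and differing choices of this kind are one reason that programs report somewhat different percentages.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return the two gapped strings."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_dissimilarity(aln_a, aln_b):
    """Percent of aligned columns that differ (gaps count as differences)."""
    diffs = sum(1 for x, y in zip(aln_a, aln_b) if x != y)
    return 100.0 * diffs / len(aln_a)


if __name__ == "__main__":
    top, bottom = needleman_wunsch("ACGT", "AGT")
    print(top, bottom, percent_dissimilarity(top, bottom))
```

Running the same pair of sequences over only the first 500 bp or over the full length simply changes the strings passed in, which is why the length analyzed also changes the reported percentage.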

The dendrogram can also be used to assess an unknown sequence. To demonstrate how one can quickly assess an unknown sequence from any database, I imported a sequence, Leptotrichia buccalis L37788, which is not in the MicroSeq database, from the GenBank database and incorporated it into the dendrogram generated with type strain sequences from the MicroSeq database (Fig. 2). It is appropriately placed in the fusobacteria-streptobacilli lineage. In addition, a strain sequenced in our laboratory, Unknownsp01M1398, was incorporated into the dendrogram to demonstrate the relatedness of the organism to other major groups, even though there is not a closely related sequence with which to compare it in the database. It represents a novel, deeply divergent genetic line that is about 20% divergent from any bacterium in the MicroSeq database and matches only an “uncultureable” strain in the GenBank database (these databases are discussed below). Both strains are seen about halfway down the dendrogram.

There are certain aspects of relatedness that are better shown by dendrogram or by concise alignment comparison than by percent dissimilarity. In Fig. 4B the dendrogram shows that strains S5366, S7377, and VAMC6703 are not in the same genogroup and the concise alignment shows the exact base differences, yet the percent dissimilarity from strain S5366 of both S7377 and VAMC6703 is 1.1%. Thus, for example, one cannot assume that because two strains are 1.1% dissimilar to a particular strain, they have a closer relationship to each other (in this case they are 1.2% dissimilar to each other).
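Both points, the concise alignment and the fact that equal dissimilarity to one strain says little about the distance between two others, can be illustrated with a short Python sketch. The three sequences are invented 10-base toys that are assumed to be already aligned, and the function names are hypothetical.

```python
def concise_alignment(aligned):
    """Keep only the columns at which the aligned sequences differ.

    aligned: dict mapping strain name -> aligned sequence (equal lengths).
    Returns (1-based positions of variable columns, name -> bases there).
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    variable = [
        i for i in range(length)
        if len({aligned[name][i] for name in names}) > 1
    ]
    positions = [i + 1 for i in variable]
    rows = {name: "".join(aligned[name][i] for i in variable) for name in names}
    return positions, rows


def differences(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    toy = {
        "ref": "AAAAAAAAAA",
        "s1": "CAAAAAAAAA",  # differs from ref at position 1
        "s2": "AAAAAAAAAG",  # differs from ref at position 10
    }
    print(concise_alignment(toy))
    print(differences(toy["ref"], toy["s1"]),
          differences(toy["ref"], toy["s2"]),
          differences(toy["s1"], toy["s2"]))
```

Here s1 and s2 are each one base from the reference but two bases from each other, because their differences lie at different positions; the concise alignment makes this visible where a single percentage does not.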

Sequence Databases

For phenotypic identifications of microorganisms, we are dependent on a database with an accurate morphologic and biochemical description of type strains or typical strains and standard methods to determine these characteristics for the isolate to be identified. Similarly, for accurate 16S rRNA gene sequence identification of organisms, we are dependent on accurate sequences in databases, appropriate names associated with those sequences, and an accurate sequence for the isolate to be identified.

There are several reasons why sequence databases can vary and may not accurately link a name with a sequence and, further, with a correct relative placement of the sequence among other bacterial sequences. It could be that the type strain or strains in certified collections such as ATCC were incorrectly named or classified by biochemical means or that the descriptions are just wrong. A familiar example of an organism being placed in the wrong genus is “Corynebacterium aquaticum.” Although “C. aquaticum” has, superficially, the same morphology as the corynebacteria, it is genetically distant from the genus Corynebacterium and is more closely related to organisms in the genus Microbacterium. Another example is that when examined by genotypic methods, the ATCC strains previously deposited as Corynebacterium xerosis were found to be diverse and incorrectly identified (23). There are numerous other straightforward errors in taxonomy that are addressed in the new edition of Bergey's Manual of Systematic Bacteriology (36).

A second reason is the misplacement of species that are presumed to belong to a single genus but are actually found in several taxonomic groups. Species of the genus Enterobacter are found associated with five different genera. Enterobacter is a good example of what is called a polyphyletic genus, as is Citrobacter. In an attempt to correct this, some of the historic Enterobacter spp. have been given the new genus name Pantoea (for example, P. agglomerans and P. aerogenes). Proteus, in contrast, is a monophyletic genus with all species in a single clade.

A third reason for error applies primarily to unverified databases such as GenBank, which accept any linked name and sequence that is sent to them. In a 1995 investigation of GenBank sequences, a comparison of sequences for the presumed same species showed that almost 20% had more than 2% variability, indicating that many genetically different strains were being deposited under the same species name (18). On some occasions, the name associated with a given sequence may not be correct due to poor phenotypic descriptions or faulty testing. For example, the deposited strains of the genera Alcaligenes and Achromobacter are intermixed, with the descriptions and sequences both overlapping. On other occasions, the sequences deposited in GenBank are not as complete or as accurate as they could be. For example, we compared the sequence of the type strain of Mycobacterium szulgai generated in our laboratory to the sequences in the GenBank database and the MicroSeq database (115). The sequence from the MicroSeq database and our sequence agreed, while the GenBank sequence had an unacceptable number of bases that were listed as N or undetermined.

Wilck et al. also have noted that databases such as GenBank, which are very broad (i.e., contain both pathogenic and nonpathogenic strains of human, animal, and environmental origin) and not peer reviewed, tend to contain errors. They suggest that the use of peer-reviewed databases for defining the 16S rRNA gene sequences of bacteria found in humans would improve the validity of this method of organism identification in the clinical microbiology laboratory (109). It was this sort of consideration that propelled the development of the proprietary MicroSeq database of verified type strains and the RIDOM database.

There are few papers on the evaluation of the GenBank, RDP-II (57), and RIDOM databases and Internet-based programs. To demonstrate the quality and accuracy of results provided from available databases, Turenne et al. submitted 79 mycobacterial type strain sequences determined in their laboratory for analysis using the GenBank database, the RDP-II, and, most recently, the RIDOM database (103). They found that all of the type strain sequences they generated had an identical matching sequence when analyzed by RIDOM whereas only 23% of species had a perfect match with sequences from GenBank databases as determined by BLAST and 25% of species had a perfect match as determined by RDP-II. This means that the quality and/or the number of the mycobacterial sequences in RIDOM is higher. However, although the RIDOM database has a particularly good collection of mycobacterial sequences, there is a narrow range of other organisms: in 2002 the database had only 237 sequences, of which 158 were mycobacterial. Turenne et al. found that there were species not present in other public databases that were present in the RIDOM database (103). The RIDOM database is being broadened and expanded with quality sequences.

The value of comparing unknown sequences to several different databases is addressed in Table 2. The sequence of a recent blood isolate that we thought was a Streptococcus sp. (but with sequencing was shown to be Gemella bergeri) was identified using MicroSeq, GenBank, our own database, and the RDP-II and RIDOM databases (Table 2). In the three major databases, the degree of relatedness is expressed differently. GenBank's main measure is percent identity, MicroSeq's measure is percent dissimilarity, and RDP-II's measure is a relatedness value somewhat close to (but lower than) the GenBank percent identity. It is enough to note, without comparing statistical merit, that all are valid. A dendrogram gives additional information by showing that Gemella haemolysans and G. morbillorum are easily distinguishable from each other although both are equidistant from the unknown isolate (Gemella bergeri).

The problem of intraspecies variability, i.e., that all strains within a species do not have identical 16S rRNA gene sequences, requires the deposition of more than one sequence for each species (18). By examining multiple sequences available through GenBank for the same species and not counting the 20% with variation of over 2% that were misclassified, Clayton found there was still considerable variation in sequence within a species (18). As more clinical specimens are sequenced, interspecies genetic variability may become more obvious. On the other hand, strains with minor variability (less than 1%) are sometimes given separate species designation. The clinical significance of minor variation is becoming clearer and is addressed below (see “‘Microheterogeneity’ in the 16S rRNA gene sequence is common”).

THE CLINICAL MICROBIOLOGIST’S DILEMMA IN ASSIGNING A GENUS AND SPECIES NAME

Common Definitions of Genus or Species Derived by 16S rRNA Gene Sequence Analysis

At times, and particularly when compared with all the advances in molecular biology, taxonomy has been thought to be a relatively musty and obscure branch of microbiology. Thus, the taxon assignment for an isolate might be considered trivial: “a rose by any other name. . .” However, within the practice of clinical microbiology, the laboratory's decision on a name often precipitates decisions about clinical treatment and therapy. In addition, it is difficult to correctly elucidate the pathology of organisms or the disease process if a complex of organisms is referred to by a single name or if many names are used for a single species.

TABLE 2. Comparison of databases for the identification of an abscess isolate that seemed to be a Streptococcus sp. by routine phenotypic methods but was identified by 16S rRNA gene sequence homology as Gemella bergeri

| Factors to compare between databases | GenBank | MicroSeq | RDP-II | RIDOM | Our internal database |
|---|---|---|---|---|---|
| Total no. of sequences in database^a | About 20,000,000, with about 90,000 16S rRNA gene | About 1,400 | About 6,000 | More than 300 | About 1,500 clinical strains |
| Was a similar sequence in the database? | Yes, several; closest were G. bergeri (Y13365) and an unnamed oral strain | No, but Gemella haemolysans was identified as the closest relative at 6.5%; thus, one must compare to other databases also | Yes | No | Yes; 0% difference from isolate from finger abscess |
| Can sequences be imported from another database? | | Yes; import best selected sequences from GenBank | | | |
| Software comparison of our isolate with G. bergeri (Y13365) | 0% difference, 100% homology to G. bergeri (Y13365) | 0% difference from imported G. bergeri (Y13365) | 0.963 related to G. bergeri (Y13365) in database | NA | |
| Comparison with a second tier of closely related strains | There are two sequences deposited as G. haemolysans ATCC 10379; one with GenBank no. L14326 is good quality, one with GenBank no. M58799 has too many N's; L14326 is 93% similar | G. morbillorum and G. haemolysans are equally related at 6.52% dissimilarity | G. haemolysans ATCC 10379 is related at 0.677; these sequences are generally imported from GenBank; the poor-quality sequence, M58799, looks as if it is more closely related, with a 0.739 relative relatedness | NA | |

^a For all the public databases, total numbers of sequences are increasing rapidly.

At this time, there exists no clear-cut consensus definition of bacterial genus or species by 16S rRNA gene sequence comparisons. This is a critical taxonomic concern, and a thorough discussion is beyond the scope of this review (26, 33, 55, 56, 94, 106, 108).

However, I will outline some of the practices and problems as they affect clinical microbiology. Although a proposal for defining species and genus using DNA-DNA hybridization as the “gold standard” has been published (108), DNA-DNA hybridization is a difficult technique, performed in only a few laboratories, and does not always correlate with other definitions of species (32). The 16S rRNA gene sequence is much easier to determine and thus has become the new gold standard (32, 44). Even though there is a consensus neither on the exact degree of genetic difference that defines a species nor on the mathematical algorithm used to generate the data (Fig. 3), in practice a range of about a 0.5 to 1% difference (99 to 99.5% similarity) is often used (91). Bosshard et al. (6) used ≥99% similarity to define a species and ≥95% to <99% to define a genus. Fox et al. proposed that there be a difference of at least 5 to 15 bp in the whole 16S rRNA gene sequence to define a species (33). Turenne et al. (103) designated the reportable range for a species as �0.8 to 2.0% and suggested that a sequence could obviously be called unique, i.e., representing an organism whose sequence has not yet been deposited and thus might be a novel species, if there were at least 20 to 38 bp difference in sequence. This gives a corresponding score of 0.961 to 0.916 in RDP-II (19, 103). Hall et al. adopted a distance score of 0.00% to less than 1.00% as the criterion for species identity (42). However, Tortoli found that Mycobacterium species could be validly named with differences of 4 bp or fewer (101). Tang et al. (97, 98) suggested a 0.5% difference as the limit for species designation. Often, a strain with a small genotypic difference (less than 0.5%) has been considered a subspecies (10). When there is a clear phenotypic uniqueness, genogroups with less than 1.0% differences in sequence have in fact been named as new species (48, 83, 101). For rickettsial isolates, Fournier et al. propose that members of the same species and genus have ≥99.8% and ≥98.1% 16S rRNA gene homology, respectively (32).
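Because the proposals above mix base-pair counts with percentages, a small Python sketch may help in moving between the two and in seeing how one set of cutoffs would be applied. The default cutoffs below take the values attributed above to Bosshard et al., read here as at least 99% similarity for species and at least 95% but less than 99% for genus; that reading is an assumption, the function names are hypothetical, and, as the text stresses, no single pair of cutoffs is accepted for all genera.

```python
def percent_difference(bp_differences, length_bp):
    """Convert a count of differing base pairs into a percent difference."""
    return 100.0 * bp_differences / length_bp


def assign_level(percent_similarity, species_cutoff=99.0, genus_cutoff=95.0):
    """Return the lowest taxonomic level supported by a similarity value.

    The cutoffs are illustrative defaults, not a consensus definition.
    """
    if percent_similarity >= species_cutoff:
        return "species"
    if percent_similarity >= genus_cutoff:
        return "genus"
    return "unassigned"


if __name__ == "__main__":
    # 15 bp in a full-length (about 1,500-bp) sequence
    diff = percent_difference(15, 1500)
    print(diff, assign_level(100.0 - diff))
    # 4 bp in a 500-bp partial sequence
    diff = percent_difference(4, 500)
    print(diff, assign_level(100.0 - diff))
```

The same number of base differences gives a larger percentage over 500 bp than over 1,500 bp, which is one reason a single numerical definition is difficult to apply.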

In addition, the total amount of intraspecies variability to be allowed is not clear. For example, using the guideline that an unknown strain should be less than 1% different from the type strain, it might mean that hypothetical strain 1 and strain 2 of the same species are 2% different from each other. In general, there is agreement that all sequences of strains within the same species should be close (no more than 1 to 1.5% difference in base pair sequence) (33, 91). At the next higher level, one should think of establishing a new genus if genogroups are more than 5 to 7% divergent. The next section discusses how well these guidelines pertain to our present nomenclature. Major challenges will be in the nomenclature for organisms named before their correct taxonomy was revealed by 16S rRNA gene sequence comparisons and in dealing with microheterogeneity, i.e., a difference of only a few base pairs in sequence.

To summarize, it is not possible to give a definite similarity or dissimilarity value to define genus and species. This is in part because different values are generated by analyzing separate databases and using different methods (Table 2; Fig. 3). The percent difference can vary if it is calculated using only the first 500 bp or all 1,500 bp and can also vary with the program used for the calculations. It also is probable that a single value for the definition of a genus or species on the basis of the 16S rRNA gene sequence is not appropriate for all genera (32, 44, 101).

Problems with the Present Nomenclature

In addition to the fundamental problems with both the phenotypic definition of species, which has been the standard until this time, and the genotypic definition of species, which is the present gold standard, there are special problems for some organisms in resolving relationships between genotype and phenotype. This is often the case with well-known genera and species that were named based only on phenotypic criteria prior to the availability of the 16S rRNA gene sequence.

Examples of problems in assigning taxa and in finding a meaningful correspondence between genotype and phenotype are shown in Table 3. The first two categories in Table 3 (“Same genotype but different phenotypes” and “Similar genotype but different phenotypes”) show a critical problem for clinical practice and a drawback or deficiency in the 16S rRNA gene sequence identification method: that for some of the species, a sequence can be ambiguous since it does not distinguish between two closely related but distinct and sometimes clinically important species or phenotypes. Another way to say this is that there is more than one phenotype for a given genotype. Several important examples of those with the same genotype (or with only a few base pairs that are different) but different phenotypes are M. tuberculosis and M. bovis, M. avium and M. paratuberculosis, and M. kansasii and M. gastri (42, 103). Of the pathogenic bordetellae, B. pertussis, B. parapertussis, and B. bronchiseptica are not well distinguished by 16S rRNA gene sequence (48). Other such organisms are listed in the review by Fredericks and Relman (34). Roth et al. showed that on occasion, where 16S rRNA gene sequences were indistinguishable, such as for M. kansasii and M. gastri, or highly similar, such as for M. malmoense and M. szulgai, the 16S-23S rRNA gene internal transcribed spacer sequences were a helpful supplement for the differentiation of closely related species (82). Cases of unresolved identification by 16S rRNA gene sequence analysis can also be distinguished by the addition of a few carefully selected biochemical tests (42).

By some guidelines, isolates with a small genotypic difference (ca. 0.4 to 0.9%) but a definite phenotypic difference have been considered either separate species or subspecies. In some cases, the isolates are given different names (e.g., S. pneumoniae and S. mitis), and in others they are given the same species name (e.g., some members of the S. anginosus subgroups [13]). However, there is great variability in practice


TABLE 3. Insights into the relationships of genotype and phenotype^a

| Type of relationship | First strain | Second strain | No. of bp differences | Reason for this example |
|---|---|---|---|---|
| Same genotype but different phenotypes | Mycobacterium tuberculosis ATCC 27294 | M. bovis ATCC 19210 or M. africanum ATCC 25420 | 0 | Although given names that appear to be species, the three strains are now designated subspecies and show small or no genetic difference, but there is a definite phenotypic and pathological difference. See reference 43 for other examples within the mycobacteria. This rare situation is a limitation to the use of sequencing as gold standard. |
| Similar genotype but different phenotypes | E. coli ATCC 11775 | Shigella dysenteriae ATCC 13313 | 3 | Two high-level pathogens are genotypically close enough to be considered the same species but have kept clinically and historically important separate names. |
| | Streptococcus pneumoniae ATCC 33400 | S. mitis ATCC 49456 | 3 | A high-level pathogen and a commensal are genotypically very similar. |
| | Streptococcus bovis | S. equinus | 2 | Very close genogroups which are difficult to distinguish phenotypically in the clinical laboratory. However, their reported difference in lactose reaction and different niches may justify the different names. |
| Similar phenotypes but different genotype | Nocardia asteroides ATCC 19247 | N. farcinica ATCC 3318 | 13 | Separate species which are relatively difficult to distinguish phenotypically are easy to distinguish by sequence. |
| | S. bovis ATCC 33317T | S. bovis ATCC 43143 | 13 | Separate genogroups which are difficult to distinguish phenotypically are easy to distinguish by genotype (11). |
| Too distant to be the same species | Enterobacter (Pantoea) agglomerans (bg1) | E. (Pantoea) agglomerans (bg2) | 27 | Strains that were originally thought to be biogroups within the same species are genotypically distant enough to be considered separate genera. |
| Too distant to be the same genus | Clostridium tetani ATCC 19406 | Clostridium innocuum ATCC 14501 | About 104 | Although these two organisms have been given the same genus name, the large difference (20%) means that at least one has been taxonomically misplaced. Species in the same genera should not differ by more than about 5 to 6%. |

Too close to havedifferent names

Mycobacterium duvalii “M. valentiae” 0 Sometimes names get into theliterature without full justification.

Too close to be threedifferent genera

Enterobacter cloacaeATCC 13047

Leclercia (Enterobacter)adecarboxylata ATCC23216

1–2 These three organisms, which havebeen placed in different genera, aregenotypically close enough to beconsidered the same species.

E. cloacae ATCC 13047 Citrobacter werkmanii 6Subspecies Streptococcus dysgalactiae

subsp. dysgalactiaeStreptococcus dysgalactiae

subsp. equisimilis14/500 and 16/1,500 The difference between these two

“subspecies” is greater than thatbetween the genera above.

Streptococcus equi subsp.equi

Streptococcus equi subsp.zooepidemicus

1/500 and 1/1,500 The only difference between the twosubspecies was at bp 204.

Staphylococcus cohniisubsp. cohnii

Staphylococcus cohniisubsp. urealyticum

1/500 and 3/1,500 The difference of 0.33% was the samecalculated using either the 500-bpor the 1,500-bp sequence.

Other points Streptococcus mitis ATCC49456

Clinical strains identifiedas S. mitis

3–10 Lower-level pathogens tend to havemore heterogeneous genotypes for aphenotypically derived name,presumably because of poorphenotypic discrimination.

S. intermedius ATCC37335

S. constellatus ATCC27823

9 Although they are sometimeconsidered together as the S. millerigroup, S. intermedius is much moreclosely related to S. constellatus thanto S. anginosus.

Continued on following page

VOL. 17, 2004 16S rRNA GENE SEQUENCE FOR BACTERIAL IDENTIFICATION 851

on February 10, 2021 by guest

http://cmr.asm

.org/D

ownloaded from

Page 13: Impact of 16S rRNA Gene Sequence Analysis for ... · change in the 16S rRNA gene sequence is not known, it does mark evolutionary distance and relatedness of organisms (44, 49, 62,

(Table 3), since a comparison of sequences for several subspe-cies shows a differences of from 1 to 14 bp.
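The base pair counts in Table 3 can be placed on the percentage scale used in the text. The short Python sketch below is illustrative only: the function name percent_difference and the row labels are assumptions introduced here, while the counts and compared lengths are those listed in Table 3 (per 500 bp unless a 1,500-bp comparison is stated).

```python
def percent_difference(bp_differences: int, bp_compared: int) -> float:
    """Return the sequence difference as a percentage of the bases compared."""
    if bp_compared <= 0:
        raise ValueError("bp_compared must be positive")
    return 100.0 * bp_differences / bp_compared


# (label, bp differences, bp compared); values taken from Table 3.
table3_examples = [
    ("C. tetani vs. C. innocuum", 104, 500),
    ("S. dysgalactiae subspecies, 500 bp", 14, 500),
    ("S. dysgalactiae subspecies, 1,500 bp", 16, 1500),
    ("E. cloacae vs. C. werkmanii", 6, 500),
]

for label, differences, compared in table3_examples:
    print(f"{label}: {percent_difference(differences, compared):.1f}%")
```

On this scale the two S. dysgalactiae subspecies (2.8% over 500 bp) are further apart than the genera Enterobacter and Citrobacter (1.2%), and the guideline range of ca. 0.4 to 0.9% corresponds to roughly 2 to 4.5 bp per 500 bp.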

There is also the category “Similar phenotypes but different genotype.” Another way to say this is that there is more than one genotype for a given phenotype or that the phenotype is polyphyletic, a vexing but not uncommon problem that 16S rRNA gene sequencing is well positioned to solve. Recently we sequenced all our stored clinical Nocardia strains and were surprised to discover that essentially all the strains that, in times past, we had considered to be N. asteroides on the basis of phenotypic characteristics were in fact mostly N. farcinica and a few N. nova. We also found that our S. bovis collection consisted of two separate genogroups, although the clinical significance of this has not been determined (15, 45). Often, on further study, distinguishable phenotypes are found.

Table 3 also contains the categories “Too distant to be the same species or genus,” “Too close to have different names,” and “Too close to be three different genera.” It is helpful to look at the number of base pair differences and at Table 4 to understand how variable interspecies and intergenus distances are in present-day practice. The obvious reason why there is great variation in what we call a genus and species, as the rules have not been uniformly applied, is that these were named before 16S rRNA gene sequencing was available. The consideration of subspecies is problematic since the difference between some subspecies is as great as that between some genera; the distance between the subspecies Streptococcus dysgalactiae subsp. dysgalactiae and Streptococcus dysgalactiae subsp. equisimilis is greater than that between many genera. There are several additional categories. When the genotype and phenotype overlap, as with Afipia strains, it may be that strains overlap because there are so few isolates that the description may not be sufficiently accurate. Sometimes, if neither genotype nor phenotype is distinct but strains have been given different names, it could be because of historic reasons, simultaneous publication, or carelessness on the part of the authors and the editors.

Table 4 presents data showing the genetic variability within different genera. The data are modified from those presented by Montgomery et al. (S. Montgomery, S. Anderson, M. Waddington, J. Bartell, G. Num, and P. Foxall, Abstr. IXth Int. Congr. Bacteriol. Appl. Microbiol., abstr. 54, 1999). The interspecies variability for each of the genera listed in column 1 has been calculated. The number of known species is given in column 2, and the percentage of known species that are in the MicroSeq database is shown in column 3. The genera are sorted on the basis of increasing average interspecies variability, which ranges from 0.5 to 17.3%. The last two columns can be used to evaluate genera: they indicate that there may be problems if the minimum distance between species is less than about 0.4 to 0.6% and the maximum distance is above about 5%. Thus, we see that Arthrobacter, Bacillus, Lactobacillus, and Clostridium, to name some of the most egregious examples, have at least two named species that have the same 16S rRNA gene sequence and at least one species that is highly unrelated to others in the same genus. Most of the genera in the first half of the table err only, if at all, in having two names for the same species or too little separation between species.
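The screening rule described for the last two columns of Table 4 can be sketched in a few lines of Python. This is a minimal illustration, not a published method: the names GenusRange and flag_genus are hypothetical, and the cutoffs of 0.5% and 5.0% are assumptions chosen from within the approximate ranges given in the text ("about 0.4 to 0.6%" and "about 5%"). The low and high values are copied from Table 4.

```python
from typing import List, NamedTuple


class GenusRange(NamedTuple):
    genus: str
    low: float   # minimum interspecies distance (%), Table 4
    high: float  # maximum interspecies distance (%), Table 4


def flag_genus(row: GenusRange, min_ok: float = 0.5, max_ok: float = 5.0) -> List[str]:
    """Return the possible taxonomic problems suggested by the distance range."""
    flags = []
    if row.low < min_ok:   # assumed cutoff within "about 0.4 to 0.6%"
        flags.append("some species too close")
    if row.high > max_ok:  # assumed cutoff of "about 5%"
        flags.append("some species too distant")
    return flags


rows = [
    GenusRange("Edwardsiella", 0.2, 0.7),
    GenusRange("Kocuria", 3.3, 3.9),
    GenusRange("Bacillus", 0.0, 18.0),
    GenusRange("Clostridium", 0.0, 22.7),
    GenusRange("Cytophaga", 10.5, 20.1),
]

for row in rows:
    print(row.genus, flag_genus(row) or ["no flag"])
```

With these assumed cutoffs, Bacillus and Clostridium are flagged on both counts, Edwardsiella only for insufficient separation, Cytophaga only for excessive breadth, and Kocuria not at all.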

Table 4 demonstrates inequalities in broadness between taxa. A well-known example is that the genetic difference between some genera that comprise the Enterobacteriaceae is smaller than the difference between some subspecies of Streptococcus (Table 3). Further, this difference is far smaller than in the single genus Clostridium. Therefore, by the Enterobacteriaceae standards for designating genera, many species of Clostridium would be given separate genus status. On the other hand, by Clostridium standards, all the members of the Enterobacteriaceae plus all the Vibrio, Aeromonas, Haemophilus, and Pasteurella groups would be considered to belong to only one genus (Fig. 2).

In general, one sees more variability reported in the 16S rRNA gene of species that are less well described and of lower pathogenicity than in the well-known pathogenic species. This is simply because organisms of low pathogenicity are usually not as well described as pathogens. Thus, in our justifiable lack of knowledge, we may be grouping genotypically and phenotypically heterogeneous isolates within one taxon. As a consequence, commensal or environmental isolates with a given name tend to have more variation in their 16S rRNA gene sequence than do well-described, highly pathogenic strains. An example of this is that strains identified on phenotypic grounds as S. anginosus or M. flavescens would tend to have more variation in their 16S rRNA gene sequence than strains identified on phenotypic grounds as S. aureus or M. tuberculosis.

Microheterogeneity in the 16S rRNA Gene Sequence Is Common

Sequevars, intraspecies variation, and variant subspecies genotypes are terms expressing the concept of microheterogeneity within a species. Usually this denotes differences of less than 0.5% or only a few base pairs per 16S rRNA gene sequence. There is no consensus about how to name organism groups showing microheterogeneity. The significance of microheterogeneity to clinical microbiologists is that it seems to allow the possibility of distinguishing important phenotype, pathogenicity, and niche differences between strains (10, 38, 84, 85). Microheterogeneity has also been exploited for strain tracking and epidemiological studies (38, 85).

TABLE 3 (continued)

| Type of relationship | First strain | Second strain | No. of bp differences | Reason for this example |
|---|---|---|---|---|
| | S. intermedius ATCC 37335 | S. anginosus ATCC 33397 | 33 | |
| | S. anginosus ATCC 33397 | S. anginosus, urinary tract strains | 6 | There is almost as much variability among the S. anginosus clades or subspecies as between the genera Enterobacter (cloacae) and Citrobacter (werkmanii) above. |

^a Examples of base pair differences per 500 bp sequenced between the two strains listed and the significance of this result.

TABLE 4. Comparison of the genetic heterogeneity within genera^a

| Genus^b | No. of validly published species | % of species in MicroSeq database | % with interspecies distances of <3% | % with interspecies distances of <1% | Avg interspecies distance (%) | Range low (%) | Range high (%) |
|---|---|---|---|---|---|---|---|
| Edwardsiella | 4 | 75.0 | 100.0 | 100.0 | 0.5 | 0.2 | 0.7 |
| Curtobacterium | 6 | 83.3 | 100.0 | 100.0 | 0.5 | 0.1 | 0.7 |
| Bordetella | 7 | 85.7 | 100.0 | 60.0 | 0.8 | 0.0 | 1.5 |
| Shigella | 4 | 100.0 | 100.0 | 50.0 | 0.9 | 0.4 | 1.4 |
| Cedecea | 3 | 100.0 | 100.0 | 66.7 | 0.9 | 0.6 | 1.3 |
| Bartonella | 14 | 21.4 | 100.0 | 66.7 | 1.0 | 0.7 | 1.4 |
| Aeromonas | 17 | 64.7 | 100.0 | 30.9 | 1.0 | 0.0 | 2.1 |
| Kluyvera | 5 | 80.0 | 100.0 | 67.0 | 1.1 | 0.9 | 1.4 |
| Yersinia | 11 | 91.0 | 100.0 | 42.2 | 1.2 | 0.3 | 2.5 |
| Proteus | 5 | 80.0 | 100.0 | 17.0 | 1.3 | 0.7 | 1.8 |
| Providencia | 5 | 100.0 | 100.0 | 10.0 | 1.3 | 0.2 | 2.1 |
| Veillonella | 7 | 57.1 | 100.0 | 16.6 | 1.4 | 0.8 | 2.0 |
| Hypomicrobium | 9 | 33.3 | 100.0 | 0.0 | 1.5 | 1.3 | 1.6 |
| Pseudoalteromonas | 16 | 43.8 | 71.4 | 71.4 | 1.6 | 0.2 | 4.2 |
| Listeria | 6 | 100.0 | 66.7 | 60.0 | 1.6 | 0.3 | 3.5 |
| Micromonospora | 15 | 53.3 | 50.9 | 5.5 | 1.6 | 0.5 | 2.4 |
| Fluoribacter | 3 | 100.0 | 100.0 | 0.0 | 1.6 | 1.2 | 2.0 |
| Klebsiella | 5 | 100.0 | 100.0 | 10.0 | 1.7 | 0.1 | 2.4 |
| Streptosporangium | 11 | 45.5 | 100.0 | 30.0 | 1.8 | 0.0 | 2.8 |
| Hydrogenophaga | 4 | 100.0 | 100.0 | 16.7 | 1.9 | 0.4 | 2.6 |
| Burkholderia | 16 | 56.3 | 52.8 | 27.8 | 2.0 | 0.0 | 4.1 |
| Enterobacter | 13 | 92.3 | 83.3 | 6.1 | 2.1 | 0.4 | 3.7 |
| Pantoea | 7 | 57.1 | 83.3 | 0.0 | 2.2 | 1.1 | 3.1 |
| Gordona | 9 | 67.0 | 86.7 | 6.7 | 2.4 | 0.2 | 3.4 |
| Acidovorax | 5 | 60.0 | 66.7 | 0.0 | 2.4 | 1.5 | 3.5 |
| Brevibacillus | 10 | 90.0 | 61.1 | 13.9 | 2.4 | 0.3 | 4.8 |
| Staphylococcus | 35 | 88.6 | 76.3 | 4.7 | 2.4 | 0.0 | 4.1 |
| Serratia | 9 | 100.0 | 72.2 | 5.6 | 2.4 | 0.6 | 4.2 |
| Tsukamurella | 5 | 80.0 | 50.0 | 50.0 | 2.4 | 0.5 | 4.2 |
| Saccharothrix | 12 | 25.0 | 66.7 | 0.0 | 2.4 | 1.4 | 3.2 |
| Pseudomonas | 71 | 70.4 | 20.7 | 2.0 | 2.5 | 0.1 | 6.6 |
| Microbacterium | 27 | 51.9 | 64.8 | 4.4 | 2.7 | 0.0 | 5.4 |
| Acinetobacter | 7 | 100.0 | 61.9 | 0.0 | 2.9 | 1.0 | 3.8 |
| Nocardia | 18 | 50.0 | 55.6 | 0.0 | 3.0 | 1.4 | 4.4 |
| Escherichia | 5 | 100.0 | 40.0 | 10.0 | 3.0 | 0.3 | 4.7 |
| Neisseria | 19 | 73.7 | 48.4 | 4.4 | 3.0 | 0.1 | 5.1 |
| Enterococcus | 19 | 94.7 | 50.3 | 9.8 | 3.0 | 0.1 | 6.3 |
| Ralstonia | 5 | 60.0 | 33.0 | 0.0 | 3.1 | 2.1 | 3.6 |
| Methylobacterium | 10 | 90.0 | 58.5 | 19.6 | 3.1 | 0.4 | 5.0 |
| Xenorhabdus | 5 | 80.0 | 33.3 | 0.0 | 3.3 | 2.5 | 4.2 |
| Mycobacterium | 81 | 80.0 | 35.1 | 2.1 | 3.4 | 0.0 | 7.3 |
| Kocuria | 5 | 60.0 | 0.0 | 0.0 | 3.6 | 3.3 | 3.9 |
| Actinobacillus | 16 | 50.0 | 46.4 | 7.1 | 3.7 | 0.4 | 6.4 |
| Cellulomonas | 10 | 90.0 | 13.9 | 2.8 | 3.7 | 0.4 | 6.0 |
| Legionella | 36 | 97.2 | 14.3 | 1.5 | 4.0 | 0.2 | 7.4 |
| Carnobacterium | 6 | 50.0 | 33.3 | 0.0 | 4.2 | 2.7 | 5.1 |
| Fusobacterium | 15 | 40.0 | 6.7 | 6.7 | 4.4 | 0.3 | 10.8 |
| Arcanobacterium | 4 | 75.0 | 33.3 | 0.0 | 4.4 | 2.3 | 5.5 |
| Weissella | 7 | 85.7 | 6.7 | 0.0 | 4.6 | 1.2 | 6.8 |
| Helicobacter | 18 | 38.9 | 14.3 | 0.0 | 4.7 | 2.7 | 7.1 |
| Pediococcus | 7 | 71.4 | 30.0 | 10.0 | 5.0 | 0.1 | 8.4 |
| Halomonas | 19 | 36.8 | 14.3 | 0.0 | 5.1 | 2.5 | 6.9 |
| Haemophilus | 17 | 71.0 | 19.7 | 7.6 | 5.1 | 0.7 | 7.3 |
| Moraxella | 15 | 73.3 | 23.6 | 0.0 | 5.3 | 1.4 | 10.6 |
| Shewanella | 12 | 25.0 | 0.0 | 0.0 | 5.5 | 4.9 | 6.1 |
| Streptococcus | 46 | 76.0 | 5.7 | 1.0 | 5.7 | 0.1 | 9.0 |
| Campylobacter | 14 | 57.1 | 7.1 | 0.0 | 5.9 | 1.4 | 9.0 |
| Propionibacterium | 10 | 40.0 | 0.0 | 0.0 | 6.1 | 3.4 | 7.4 |
| Pasteurella | 19 | 89.5 | 6.6 | 1.5 | 6.1 | 0.6 | 10.2 |
| Sphingomonas | 23 | 39.1 | 16.7 | 0.0 | 6.2 | 1.0 | 8.5 |
| Arthrobacter | 26 | 76.9 | 15.3 | 2.1 | 6.2 | 0.5 | 22.9 |
| Corynebacterium | 49 | 59.2 | 2.0 | 0.5 | 6.3 | 0.3 | 11.9 |
| Paenibacillus | 22 | 81.8 | 2.9 | 0.0 | 6.7 | 1.5 | 9.8 |

16S rRNA gene sequence microheterogeneity has been found in many taxa. In Nocardia, substitutions of as few as 1 or 2 bp correlated with a unique phenotype (83). However, major differences in drug susceptibility patterns, important for clinical practice, were not found at this level but were found at the 1% difference range. Similar microheterogeneity is seen within the genus Mycobacterium (101, 103). Over 40 new species have been detected since 1990, most of which were grown from clinical samples and are potentially pathogenic. Many of them differ from one another by only a few base pairs, but even small changes in the sequence seem to be correlated with unique phenotypic characteristics, clinical significance, and niche (101). Sometimes they are called sequevars (101). Because new sequences are found in almost all studies of clinical mycobacterial strains, the prospect is that many more sequevars will be detected, swelling the numbers of potential subspecies clades.

For the four groups that we have studied, S. anginosus, S. constellatus, H. influenzae, and H. parainfluenzae, the subspecies clade has correlated with a combination of biochemical profile and niche. As an example, 100 clinical strains of S. anginosus that we have sequenced cluster into about 13 groups or clades. Most of the clades differ by only 2 to 4 bp per 500-bp sequence from the next nearest clade, and many of the clades differ from each other by 6 bp (Fig. 4C). We examined the phenotypic characteristics and niche for each clade. We found that the minute differences in sequence are reflected in phenotypic and niche differences. Some of this information is summarized in Table 5, where methods of reporting and the significance of the eight possible differences at positions 71, 97, 137, 186, 274, 288, 463, and 487 are addressed. Similarly, Chen et al. found that genetic subspecies groupings of Pasteurella multocida might have clinical significance (10).
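Differences of a few base pairs at specific positions, as described above, can be listed by comparing an isolate with the type strain position by position. The Python sketch below is a simplified illustration under stated assumptions: the two sequences are already aligned and of equal length (no insertions or deletions), the function name sequence_differences is hypothetical, and the 10-base strings are toy data, not real 16S rRNA gene sequences.

```python
from typing import List, Tuple


def sequence_differences(type_strain: str, isolate: str) -> List[Tuple[int, str, str]]:
    """List (1-based position, type-strain base, isolate base) for each mismatch.

    Assumes the two sequences are pre-aligned and of equal length.
    """
    if len(type_strain) != len(isolate):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, iso_base)
        for position, (ref_base, iso_base) in enumerate(zip(type_strain, isolate), start=1)
        if ref_base != iso_base
    ]


# Toy sequences for illustration only (hypothetical data).
type_strain = "ACGTACGTAC"
isolate = "ACGAACGTTC"

differences = sequence_differences(type_strain, isolate)
print(differences)  # [(4, 'T', 'A'), (9, 'A', 'T')]
print(f"{100.0 * len(differences) / len(type_strain):.1f}% difference")
```

A real comparison would first require an alignment step and a decision on how to score gaps and ambiguous bases, which this sketch deliberately omits.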

As these microheterogeneities are increasingly shown to be correlated with important clinical or phenotypic characteristics, there is a need to recognize them. However, it is not clear that assigning a new name would be the most practical or informative practice. If species names are given to recognize every microheterogeneity, we may be confronted with a plethora of new names that might add confusion, particularly for the medical practitioner. A practical interim solution might be to attach quantifying modifiers (i.e., the percent difference from the type strain) and pathologic descriptions (the most common site of isolation) to the primary species name (Table 5). A number of possible naming schemes are shown. The advantage of some of the possible nomenclatures is that they make the relationship with better-known species immediately apparent, without the need for a literature search with each patient’s clinical microbiology report.
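The naming schemes proposed in Table 5 are regular enough to be generated mechanically. The Python sketch below shows one way this might be done; it is not part of any reporting system. The function name sequevar_name is hypothetical, and the compared length of 545 bp is an assumption chosen only because it reproduces the percentages printed in Table 5 (the text does not state the exact length used).

```python
from typing import List, Optional, Tuple


def sequevar_name(
    species: str,
    differences: List[Tuple[int, str]],
    bp_compared: int,
    site: Optional[str] = None,
) -> str:
    """Build a Table 5 style name from (position, substituted base) pairs."""
    if differences:
        percent = 100.0 * len(differences) / bp_compared
        name = f"{species} ({percent:.2f}%)"
        substitutions = ", ".join(f"{position}{base}" for position, base in differences)
        name += f" sequevar ({substitutions})"
    else:
        name = f"{species} (0%)"  # identical to the type strain
    if site:
        name += f" ({site})"      # most common site of isolation
    return name


BP_COMPARED = 545  # assumed length; not stated in the source

print(sequevar_name("Streptococcus anginosus",
                    [(137, "A"), (288, "A"), (487, "G")],
                    BP_COMPARED, site="urogenital"))
print(sequevar_name("Streptococcus anginosus", [], BP_COMPARED, site="blood, pharynx"))
```

The first call yields the fullest form shown in Table 5 for specimen S7745, and the second yields the form shown for specimen 6703.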

MAJOR IMPROVEMENTS IN CLINICAL MICROBIOLOGY PRACTICE BY IDENTIFYING BACTERIA BY SEQUENCE INSTEAD OF PHENOTYPE

16S rRNA Gene Sequences Can Better Identify Poorly Described, Rarely Isolated, or Phenotypically Aberrant Strains

Because 16S rRNA gene sequence analysis can discriminate far more finely among strains of bacteria than is possible with phenotypic methods, it can allow a more precise identification of poorly described, rarely isolated, or phenotypically aberrant strains. This is an area in which 16S rRNA gene sequence identification might have an immediate impact on patient care. For example, some species of “viridans” streptococci are far more likely than others to cause endocarditis (2, 8). However, identification of these organisms by phenotypic methods is difficult and subject to error. Thus, judging the importance of isolates of “viridans” streptococci isolated from the blood has been a problem for the infectious-disease physician. By contrast, 16S rRNA gene sequence analysis provides accurate identification at the species level and can clarify their clinical importance (14, 34; S. M. Attorri, M. Waddington, and J. E. Clarridge, Abstr. 100th Gen. Meet. Am. Soc. Microbiol. 2000, abstr. C-348, 2000).

TABLE 4 (continued)

| Genus^b | No. of validly published species | % of species in MicroSeq database | % with interspecies distances of <3% | % with interspecies distances of <1% | Avg interspecies distance (%) | Range low (%) | Range high (%) |
|---|---|---|---|---|---|---|---|
| Lactococcus | 5 | 80.0 | 0.0 | 0.0 | 7.3 | 4.1 | 8.8 |
| Actinomyces | 18 | 38.9 | 4.8 | 0.0 | 7.3 | 2.3 | 11.0 |
| Capnocytophaga | 7 | 42.9 | 0.0 | 0.0 | 7.6 | 6.7 | 8.2 |
| Prosthecomicrobium | 4 | 75.0 | 0.0 | 0.0 | 7.7 | 7.2 | 8.3 |
| Deinococcus | 7 | 57.1 | 0.0 | 0.0 | 7.9 | 4.8 | 9.2 |
| Bacillus | 76 | 77.6 | 2.5 | 1.2 | 8.0 | 0.0 | 18.0 |
| Agrobacterium | 5 | 80.0 | 16.7 | 0.0 | 8.5 | 1.4 | 13.6 |
| Sphingobacterium | 6 | 66.7 | 0.0 | 0.0 | 8.7 | 4.5 | 11.0 |
| Bacteroides | 26 | 76.9 | 0.0 | 0.0 | 8.8 | 3.2 | 14.2 |
| Lactobacillus | 69 | 55.0 | 2.7 | 0.7 | 9.2 | 0.1 | 16.3 |
| Selenomonas | 9 | 33.3 | 0.0 | 0.0 | 10.1 | 9.4 | 10.5 |
| Prevotella | 24 | 41.7 | 0.0 | 0.0 | 11.8 | 3.1 | 16.6 |
| Flavobacterium | 20 | 25.0 | 10.0 | 0.0 | 12.2 | 1.9 | 20.1 |
| Brevibacterium | 10 | 80.0 | 21.4 | 3.6 | 12.5 | 0.9 | 21.0 |
| Peptostreptococcus | 15 | 46.7 | 4.8 | 0.0 | 13.3 | 2.4 | 18.6 |
| Clostridium | 132 | 27.3 | 2.7 | 1.3 | 13.6 | 0.0 | 22.7 |
| Cytophaga | 8 | 50.0 | 0.0 | 0.0 | 17.3 | 10.5 | 20.1 |

^a Modified from Montgomery et al., Abstr. IXth Int. Congr. Bacteriol. Appl. Microbiol., 1999, with permission from the publisher.
^b The table is arranged from genera with the lowest average interspecies genetic difference to genera with the highest average interspecies genetic difference. Some obvious outliers have been removed.

Mistakes in identifying poorly described, rarely isolated, or phenotypically aberrant strains are probably quite common in the routine clinical laboratory. Even in large and complex research and reference laboratories, which have more time and phenotypic tests available for identification of bacteria, unusual bacteria often cannot be identified. Drancourt et al. published a list of 177 such difficult organisms (27). It is interesting that these organisms had been referred to the bioMerieux, Inc. (Marcy l’Etoile, France), laboratory by routine laboratories and that intensive prior attempts to identify the organisms by biochemical methods had been made without success. For about 80% of the isolates, there was a close match with a described species. Another 10% represented a new species within a described genus, and about 10% of the organisms represented novel taxa (27). Tang et al. compared a variety of identification systems, including cellular fatty acid profiles, carbon source utilization, and conventional biochemical identification, with the 16S rRNA gene sequence to evaluate both unusual aerobic gram-negative bacilli and coryneform organisms isolated from clinical specimens (97, 98). They found that 16S rRNA gene sequence provided more rapid, unambiguous identification of the difficult bacterial isolates than did conventional methods and that this identification could translate to improved clinical outcomes (98). Bosshard et al. (6) found that only a minority of the clinical laboratory isolates of aerobic gram-positive rods could be correctly identified by phenotypic methods, whereas rRNA gene sequencing is an excellent method for identifying these organisms, which are difficult to identify by conventional methods. Woo et al. found that, for most clinically important bacterial strains with ambiguous biochemical profiles, the MicroSeq 500 16S rRNA gene-based bacterial identification system was limited only by the degree of completeness of the database (114).

However, it may be difficult for the primary laboratory to tell whether an organism is difficult to identify or has been incorrectly identified. For example, twice we isolated from blood an organism that was consistently identified by widely used phenotypic methods as Neisseria meningitidis or Actinobacillus actinomycetemcomitans but was unambiguously identified by sequencing as Francisella tularensis subsp. novicida (16). There was no history compatible with tularemia. Since two strains have been isolated from our hospital patients but none reported from the rest of the world, it is probable that other strains have been isolated but not correctly identified, since they were not thought to be “difficult” based on routine tests and thus were not sequenced. Similarly, we reported the isolation of Actinomyces israelii from a cervix and did not think it was a difficult organism. Later results of sequencing performed as part of a survey of our Actinomyces spp. showed it to be Bifidobacterium boum, yet it had a colony and biochemical profile similar to A. israelii and grew in air supplemented with CO2, despite Bifidobacterium boum being described as a strict anaerobe in the literature.

TABLE 5. Possible ways to name organisms to capture all the information available by 16S rRNA gene sequence analysis^a

| Specimen name | Nomenclature or possible nomenclature | Reason for example |
|---|---|---|
| 6703 | Streptococcus anginosus | Present system |
| 6703 | Streptococcus anginosus (0%) | Designates that there are no sequence differences between the specimen sequence and type strain |
| 6703 | Streptococcus anginosus (0%) (blood, pharynx) | As above, with clinical source of other genetically identical strains |
| VAMC5210 | Streptococcus anginosus | Present system |
| VAMC5210 | Streptococcus anginosus (0.37%) | Designates the percent difference between the specimen sequence and type strain |
| VAMC5210 | Streptococcus anginosus sequevar (274A, 288A) | Designates the position and base change of the two differences between the specimen sequence and type strain |
| S7745 | Streptococcus anginosus | Present system |
| S7745 | Streptococcus anginosus (0.55%) | Designates the percent difference between the specimen sequence and type strain |
| S7745 | Streptococcus anginosus (0.55%) sequevar (137A, 288A, 487G) | As above and designates the position and substitution of the three differences between the specimen sequence and type strain |
| S7745 | Streptococcus anginosus (0.55%) sequevar (137A, 288A, 487G) (urogenital) | All known information is included, including the most common site of isolation |
| VAMC5366 | Streptococcus anginosus | Present system |
| VAMC5366 | Streptococcus anginosus (1.10%) | Designates the percent difference between the specimen sequence and type strain; since there is more than 1% difference, consider naming as a new species |
| VAMC5366 | Streptococcus anginosus sequevar: differs from type strain at positions 71, 97, 186, 274, 288, and 463 | Designates the position of the six differences between the specimen sequence and type strain |
| VAMC5366 | Streptococcus anginosus (1.10%) (pharynx) | Designates the percent difference between the specimen sequence and type strain and the most common site of isolation |

^a The genotypic information makes naming organisms more complex but at times may have clinical importance. All examples are of S. anginosus subspecies or sequevars shown in Fig. 4.

Some species of Nocardia are difficult to distinguish phenotypically but are readily identified by 16S rRNA gene sequence (83, 105). Sequence analysis has led to reevaluation of the strains that had been called N. asteroides and to reassignment of clinical significance (83). In our laboratory, we found that all three of the presumed N. asteroides isolates associated with brain abscess were actually N. farcinica when analyzed by 16S rRNA gene sequencing.

16S rRNA Gene Sequences Can Be Routinely Used for Identification of Mycobacteria

Mycobacteria are in general slow-growing and/or difficult to identify. Thus, they were an important group of organisms in the early studies establishing the usefulness of 16S rRNA gene sequencing for clinical microbiology (4, 5, 7, 29, 51, 80, 81, 99). More recently, there have been several additional studies comparing the identification of mycobacteria by 16S rRNA gene sequence and phenotypic methods (19, 22, 42, 50, 66, 92, 103). In all of these studies, the accuracy of 16S rRNA gene sequencing in the identification to the species level was judged to be superior overall to phenotypic methods. In sum, by providing for the accurate identification of species in the database and the taxonomic placement, if not complete identification, of novel species, 16S rRNA gene sequence analysis of mycobacteria seems to be the most accurate method available.

Exceptions are the known instances in which 16S rRNA gene sequencing could not differentiate among a limited number of species, e.g., M. avium and M. paratuberculosis, M. chelonae and M. abscessus, and M. tuberculosis and others in the M. tuberculosis group (19, 42, 66, 80). In a large reference laboratory, Hall et al. found that 16S rRNA gene sequence could identify 243 of 328 clinical isolates with a distance score of <1% (42). In this group, the agreement with phenotypic identification was 90.1%, with the discrepant results being those in the groups which sequencing cannot distinguish, as given above. The remaining 85 isolates had distance scores above 1% but below 4% and thus were determined to be within the genus Mycobacterium: in this case, either novel species or species that exhibited significant genotypic divergence from the organism in the database with the closest match. The power of sequence analysis is demonstrated by the fact that the 85 organisms were “known to be unknown” by sequence analysis whereas the phenotypic testing identified most of these incorrectly as known species. The authors recommended integration of nucleic acid sequencing into the routine mycobacteriology laboratory after the use of genetic probes for the most common species. The use of the MicroSeq 500 microbial identification system and internal (Mayo Clinic) databases containing additional sequences significantly reduced the number of organisms that could not be identified by phenotypic methods. Cloud et al. also used the MicroSeq 500 system and found that another database (RIDOM) was needed to provide sequences for additional species (19). Most of the recent studies agree that a few known organisms such as M. lentiflavum are missing from the MicroSeq 500 database and that about 7 to 12% of the isolates were novel strains (18, 42). Patel et al. found that 12 of 113 strains had over 0.9% difference from any known strains (66). Not surprisingly, older studies found a larger percentage of unidentified strains because of the less extensive databases available at the time (92). A European consortium, using phenotypic testing, high-performance liquid chromatography, and 16S rRNA gene sequencing, also found that about 7% of mycobacterial strains represented novel species (102). A few examples of the many good papers in which novel mycobacterial species are described using both biochemical and genotypic descriptions and more than one strain are included in the reference list (90, 91, 101).
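The distance-score logic reported by Hall et al. can be summarized as a simple decision rule. The Python sketch below is an illustration under stated assumptions and not the authors' software: the function name classify_by_distance and the category labels are hypothetical, the treatment of a score of exactly 1% is an assumption, and the final category for scores of 4% or more is added only so that the function covers every input (the text describes no such isolates).

```python
def classify_by_distance(distance_pct: float) -> str:
    """Map the distance score (%) to the closest database match onto a report category."""
    if distance_pct < 0:
        raise ValueError("distance score cannot be negative")
    if distance_pct < 1.0:
        return "identified to species"
    if distance_pct < 4.0:
        # Within the genus: novel species or a divergent strain of a known species.
        return "Mycobacterium sp., known to be unknown"
    return "outside the ranges described"  # assumption, not from the source


# Counts reported in the text for Hall et al. (42).
total_isolates = 328
identified_to_species = 243
known_to_be_unknown = total_isolates - identified_to_species
print(known_to_be_unknown)  # 85

for score in (0.2, 1.8, 3.9):
    print(score, classify_by_distance(score))
```

The point of the rule is that an isolate falling in the middle band is reported as unresolved within the genus rather than forced onto the nearest known species, which is where phenotypic testing erred.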

16S rRNA Gene Sequence Analysis Can Lead to the Discovery and Description of Novel Pathogens

In clinical microbiology practice, novel organisms are generally first recognized by an aberrant phenotype or niche. If these observations are followed by 16S rRNA gene sequencing, the sequence often indicates that the organism is only an unusual phenotype of a known taxon (Table 3). However, at this time, perhaps 10 to 20% of the isolates might not match any other described organism and thus might be a novel organism, and an even higher percentage might be previously described organisms but not strains usually encountered in clinical practice (25, 76, 102). A medium to large clinical microbiology laboratory can be expected to isolate a few novel organisms per month.

Where should one look for new species? We look for novel organisms among the groups that are poorly described or difficult to grow. We have found that there is little potential for finding novel genogroups within the well-studied major pathogenic species, since these taxonomic groups have been so well and accurately described that there is little confusion or variability in their 16S rRNA gene sequence. Species with this genetically homogeneous characteristic that we have tested include M. tuberculosis, H. ducreyi, H. influenzae group b, Streptococcus dysgalactiae, S. pneumoniae (blood isolates), and Staphylococcus aureus.

If we plotted all known strains in a dendrogram such as in Fig. 2, we would see that some taxa are very crowded. For example, the family Enterobacteriaceae has poor potential for harboring novel types, since so many phenotypes have already been described. However, other genetic groups are not as well studied, and few have had as many distinct species grown, isolated, and sequenced. At this time, some other taxa such as the Leptotrichia-Streptobacillus-Fusobacterium group remain quite sparse. It is probable that many future isolates in this group may represent novel species. Other potentially fruitful taxa which are beginning to be more fully described include the Actinomyces-Actinobacillus genera and the less common catalase-negative gram-positive cocci (6, 17, 20, 21, 43, 89). One can also examine Table 4; where the interspecies difference is great, it is probable that there are undiscovered bacteria in the group. In a recent review of taxonomic groups of mycobacteria, Tortoli (101) presented the essential contribution made by 16S rRNA gene sequencing not only to distinguishing 42 new species but also to realigning classically known species of slow and rapid growers into new groupings. Large numbers of anaerobes remain to be described (90, 91). Another fruitful area is that of organisms associated with animals; a seemingly large percentage remain to be well described (21).

In the last 15 years, there have been thousands of publications utilizing 16S rRNA gene sequence as part of a species description. Some representative journals reflect both the numbers of novel organisms being described and the uses of 16S sequencing. In 2001 in Applied and Environmental Microbiology, there were 116 reports of such studies, many of which used 16S rRNA gene sequencing to assess the proportions of species in populations of bacteria in a particular environment (see, e.g., reference 88; see also reference 46). In 2003, there were 181 such reports. In contrast, in the Journal of Clinical Microbiology, many of the 94 citations in 2001 and 91 citations in 2003 concerned the correct description of a novel potential pathogen (e.g., Helicobacter winghamensis) (58). In 2003 in the International Journal of Systematic and Evolutionary Microbiology, about 300 novel species were reported with the 16S rRNA gene sequence as part of the description.

16S rRNA Gene Sequence Analysis Can Identify Noncultured Bacteria

Although the 16S rRNA gene sequence is an essential part of the description of a novel organism, for many noncultured bacteria it may be the only taxonomic description (34, 76, 78). Since all organisms are obviously able to grow under the proper conditions, the terms “noncultured” or “not easily cultured” are preferable to “nonculturable,” which is sometimes used. Indeed, we recently grew and described a strain (Fig. 2, strain 01-1398) whose sequence in GenBank was identified as “nonculturable.”

The basic method to derive a sequence for a noncultured bacterium is to use universal primers against the 16S rRNA gene region in a PCR step to increase the amount of DNA and then to sequence the amplicon (7, 11, 39, 72, 75, 77). This will work well if there is only one organism to detect. A different strategy is required if a mixed culture is likely, such as a clinical specimen from a nonsterile site or an environmental sample.
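
The primer-then-sequence idea can be illustrated in silico. The following is only a minimal sketch, not a laboratory protocol: it assumes exact matching apart from IUPAC degenerate bases in the primer, and the function names (reverse_complement, find_site, in_silico_amplicon) and the short toy primer and template strings used with it are hypothetical illustrations, not real universal primers or 16S rRNA gene sequences.

```python
# IUPAC degenerate base codes mapped to the nucleotides they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "N": "ACGT",
}
# Complement table that also handles the degenerate codes above.
COMPLEMENT = str.maketrans("ACGTRYMKSWN", "TGCAYRKMSWN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_site(template, primer):
    """Return the first index at which the primer matches the template,
    allowing degenerate bases in the primer, or -1 if there is no match."""
    for start in range(len(template) - len(primer) + 1):
        window = template[start:start + len(primer)]
        if all(base in IUPAC[p] for base, p in zip(window, primer)):
            return start
    return -1


def in_silico_amplicon(template, forward, reverse):
    """Return the region a primer pair would amplify (primers included),
    or None if either primer fails to find a binding site."""
    fwd_start = find_site(template, forward)
    # The reverse primer anneals to the opposite strand, so its reverse
    # complement is what appears on the template strand.
    rev_site = reverse_complement(reverse)
    rev_start = find_site(template, rev_site)
    if fwd_start < 0 or rev_start < fwd_start:
        return None
    return template[fwd_start:rev_start + len(rev_site)]
```

With a single template this returns one amplicon. In a mixed specimen every template carrying both primer sites would be amplified together, which is why the text calls for a different strategy in that situation.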

There are valid reasons why bacterial strains that we may encounter in clinical practice are not recovered. For example, prior antibiotic treatment may render them nonviable. There are several interesting publications in which the etiological agents of “culture-negative” endocarditis (8, 9, 37, 39, 47, 109) were identified by molecular analysis using broad-range PCR primers complementary to the 16S rRNA gene, sequencing, and database searches with different software. The organisms usually can be grown (e.g., Streptococcus salivarius [109] and Capnocytophaga canimorsus [our laboratory]), but in these cases the specimens were submitted once the patient was being given antibiotics.

A second reason is that the organisms may be genuinely hard to grow. Bartonella henselae, the causative agent of bacillary angiomatosis, was primarily identified and associated with the disease in this way (77). Many of the reports of Whipple’s disease are based solely on PCR amplification and subsequent sequencing of the 16S rRNA genes (9, 78), because the etiologic agent, Tropheryma whipplei, has demanding growth requirements beyond the capabilities of most clinical laboratories. It is interesting that Celard et al. found T. whipplei as the causative agent in two cases of endocarditis without there being previous evidence of related disease (9). Brouqui and Raoult (8), using broad-based PCR amplification of the 16S rRNA gene, found that the most common etiologic agents associated with culture-negative endocarditis were Bartonella quintana and Coxiella burnetii, both of which require special conditions to grow. We routinely freeze at −70°C a portion of the removed valves if at that time we are still uncertain about the cause of a case of endocarditis. If the pathogen is not subsequently identified by culture, we subject the blood culture bottles and valve to PCR with universal primers and perform sequence analysis on the amplified product. In most cases, the process is successful only if the organisms are present in sufficient numbers to be seen by electron or light microscopy (37, 109). In an excellent presentation of 29 cases of histologically confirmed infective endocarditis, direct amplification and sequence analysis was confirmatory in 21 cases and essential to the diagnosis in 5 cases (37). Because many Mycobacterium spp. are slow-growing and costly to identify, it has been useful to identify them by direct sequencing of amplified DNA (7, 80).

A third situation in which these methods are useful is when there are mats of adherent, diverse, and unknown organisms. Periodontal disease and biofilms have been studied in this way (25). The diversity of organisms present in the subgingival pockets of patients with periodontitis and acute necrotizing ulcerative gingivitis was examined by amplification of the 16S rRNA gene using PCR with a universal forward primer and a spirochete-selective reverse primer. The amplified DNA was cloned into Escherichia coli. The DNA segment was then sequenced, and the sequences were compared as described above. Novel genotypes are commonly found, such as those representing Atopobium species and a new genus, Olsenella gen. nov. (25).

Similar studies of the environment are also yielding a wealth of novel organisms. The medically important order Chlamydiales has long been considered to contain a few closely related bacteria which occur exclusively in animals and humans. However, employing techniques similar to those described above, Horn and Wagner (46) found at least four novel evolutionary lineages of Chlamydiales in environmental sludge. These findings suggest that some wastewater treatment plants represent reservoirs for a diverse assemblage of environmental chlamydiae and suggest that the environment may be a source of novel organisms that might have public health consequences.

Automatic instruments used to detect growth in blood culture bottles sometimes flag positive in the absence of apparent growth. A unique use of direct 16S rRNA gene sequence analysis with universal primers was to search for noncultured bacteria as a source of the presumably false-positive blood cultures (71). The investigators did not find any previously undetected bacterium as the cause of the false-positive blood cultures.

COSTS IN A ROUTINE CLINICAL MICROBIOLOGY LABORATORY

At this time, for routine identifications, 16S rRNA gene sequence analysis is more expensive than most traditional identification methods. However, for difficult organisms, multiple identification methods often must be used, which increases the cost. Thus, Hall et al. (42) found that for the identification of unusual mycobacteria, sequence analysis was less than $2 more expensive than standard methods. Patel et al. estimated the costs to be $144 for a single isolate but only about half of that when multiple specimens were analyzed at the same time (65, 66). Wilck et al. published a similar figure, finding that the material costs, excluding labor, amounted to $59 per specimen (this included the deparaffinization and DNA extraction from the surgical specimen, materials for PCR, PCR cleanup, sequencing, and the processing of a control) (109). Cook et al. (22) found even lower costs for identification of nontuberculous mycobacteria: without estimating capital equipment charges, the cost for labor and materials was $48 Canadian (about $32 U.S.).

We examined the costs for microbial identification based on 16S rRNA gene sequences with a sequence analyzer based in the clinical microbiology laboratory. We used the MicroSeq software and two different DNA sequencers. ABI model 3100 was superior to ABI model 310 in the length of reliable sequence (520 and 460 bp, respectively), in addition to the published superiority of ease of use and time of run. This decreased the editing time and even allowed a correct identification with only the forward sequence (Clarridge et al., Abstr. 101st Gen. Meet. Am. Soc. Microbiol. 2001). Bottger’s group routinely uses just one sequence. We found, not including instrument purchase, that the cost per isolate is about $44 (material costs of $20 and labor costs of $24; the labor costs were up from the 2001 estimate in the abstract mentioned above). Our costs assume some batch preparation of specimens. Our calculations were made on the basis of using the reagents and disposables for the ABI PRISM 3100 genetic analyzer (with 16 capillaries), but they are comparable to the cost for the ABI PRISM 3100-Avant genetic analyzer (with 4 capillaries). We do not process extra specimens as controls since the success of the process can be judged by the final product. If the instrument is on a reagent-purchase agreement for other tests such as human immunodeficiency virus genotyping, instrument costs can be lower. Cook et al. (22) calculated that for the identification of nontuberculous mycobacteria, one had to perform 100 analyses per month to have the cost of identification, including the instrument, be lower than the cost of using conventional identification methods. In addition, especially if a sequence analyzer is available, being shared, for example, with human immunodeficiency virus genotyping tests or research, the analysis can be more cost-effective and timely.
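
The way monthly volume affects the per-isolate cost once the instrument is included can be sketched with a simple amortization model. This is an illustrative assumption, not the calculation used in any of the cited studies: the instrument price is spread evenly over a fixed number of months, and the function names, the instrument price, the amortization period, and the conventional-method cost in the usage note are all hypothetical. Only the $20 material and $24 labor figures come from the text.

```python
import math


def cost_per_isolate(materials, labor, instrument_price=0.0,
                     isolates_per_month=1, amortization_months=1):
    """Per-isolate cost in dollars: materials plus labor, plus the
    instrument price spread evenly over every isolate processed during
    the amortization period (a simplifying assumption)."""
    if isolates_per_month <= 0 or amortization_months <= 0:
        raise ValueError("volume and amortization period must be positive")
    instrument_share = instrument_price / (isolates_per_month * amortization_months)
    return materials + labor + instrument_share


def break_even_volume(materials, labor, instrument_price,
                      amortization_months, conventional_cost):
    """Smallest monthly volume at which sequencing, instrument included,
    costs no more per isolate than the conventional method. Returns None
    if sequencing cannot break even at any volume."""
    margin = conventional_cost - (materials + labor)
    if margin <= 0:
        return None
    return math.ceil(instrument_price / (amortization_months * margin))
```

For example, cost_per_isolate(20, 24) reproduces the $44 figure with the instrument excluded. With a hypothetical $90,000 instrument amortized over 60 months and a hypothetical conventional cost of $74 per isolate, break_even_volume(20, 24, 90000, 60, 74) gives 50 isolates per month.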

If one does not have a sequence analyzer in-house, sequence analysis is available at many university core laboratories. The costs can be about $30 to $50 for 500 bp if the isolate is sent and $12 to $18 if the DNA is provided in a condition ready to be put on the analyzer. These costs include both the forward and reverse sequences. The quality of sequences from such institutions can vary considerably. To assemble the sequence and make a comparison with databases with an evaluation, there is usually an additional $50 to $80 charge. Laboratories offer the complete service of identification of bacteria by sequence analysis for about $100 to $150.
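
The outsourcing figures quoted above can be combined to compare piecemeal purchase with a complete service. The sketch below only adds the quoted ranges; the constant and function names are hypothetical, and it assumes the sequencing and analysis charges are independent and simply additive.

```python
# Dollar ranges (low, high) quoted in the text for outsourced work.
SEQUENCING_FROM_ISOLATE = (30, 50)    # isolate sent, about 500 bp, both strands
SEQUENCING_FROM_READY_DNA = (12, 18)  # DNA ready to be put on the analyzer
ASSEMBLY_AND_COMPARISON = (50, 80)    # assembly, database comparison, evaluation
COMPLETE_SERVICE = (100, 150)         # full identification service


def add_ranges(*ranges):
    """Add (low, high) dollar ranges component-wise."""
    return (sum(r[0] for r in ranges), sum(r[1] for r in ranges))


isolate_route = add_ranges(SEQUENCING_FROM_ISOLATE, ASSEMBLY_AND_COMPARISON)
ready_dna_route = add_ranges(SEQUENCING_FROM_READY_DNA, ASSEMBLY_AND_COMPARISON)
```

Here isolate_route is (80, 130) and ready_dna_route is (62, 98), so buying the steps separately overlaps the lower part of the $100 to $150 range quoted for the complete service.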

All reports cited above agree that these costs for the definitive identification provided by 16S sequence analysis are reasonable when the costs of multiple investigations done in the attempt to establish an identification are considered. In one study, the turnaround time for a mycobacterial identification was shortened to 24 h and the results were reported much earlier (42). The additional costs of possible patient-related sequelae due to improper or lack of identification of an isolate, such as broad-spectrum intravenous antibiotics or incorrect diagnosis, were not estimated.

STANDARDS FOR EDITORS, REVIEWERS, AND LABORATORIANS

This section summarizes some standards that reviewers and editors of clinical microbiology and infectious-disease journals and clinical laboratory personnel who may not be actively engaged in sequence analysis themselves need to be aware of to evaluate sequence-based data for laboratory use or publication.

(i) Novel organisms. The 16S rRNA gene sequence is an essential part of the description of a novel organism. The authors should be required to include 16S rRNA gene sequence data in every publication describing a new strain, preferably the complete gene sequence (94). It is helpful to reviewers if this information is accessible in electronic form. A summary of the differences from type strains could be published, but the whole sequence need not be published since it wastes space (68). Unless an explanation is given, the sequence should contain fewer than 1% undetermined bases.

(ii) Poorly described organisms. The 16S rRNA gene sequence identification should be performed (even if not reported) on isolates for which unusual disease associations are being asserted if the isolate is not extremely well known and easy to identify. For example, it is permissible to refer to M. tuberculosis or to S. pyogenes without sequence data but not to refer to the rapidly growing environmental mycobacteria or to the viridans streptococci as a whole or most of the species that comprise the viridans streptococci or Actinomyces spp., because in these groups the phenotype is often polyphyletic and descriptions may be ambiguous. Examples of other ambiguous identifications are given in Table 3. The large changes in the names and descriptions of gram-positive rods and anaerobes make sequence identification necessary for any publication in which an association is claimed between an unusual microbe and a disease.

(iii) Number of strains. It is preferable that there be three or more similar isolates for any publication of a new species, strain, or subspecies; some recommend that there be five strains (12, 18). However, single-strain reports of rare isolates continue to be published. Because the phenotype continues to be extremely important to clinical microbiologists, phenotypic tests should be associated with the sequence. The same phenotypic tests for the new isolate should be performed in the same laboratory on several known strains; i.e., it is not right to compare tests performed in the investigator’s laboratory only on the strain being described with literature values for the type or known strains with which the strains will be compared: the literature values can be wrong, or methods may not be identically performed.

(iv) At least 0.5% and possibly 1% difference is needed to warrant a new species name. As a rule of thumb until firmer rules are promulgated by taxonomists, unless there are overwhelming and well-documented phenotypic reasons, there should be a difference of at least 1 bp per 100 bases sequenced to warrant assigning a new species name. If there are clear phenotypic differences, as little as 0.5% may warrant a new species name. Recall that calculations using mathematically different search algorithms such as BLAST and Needleman-Wunsch may not give the same results. When differences between isolates are larger than 5%, consideration should be given to generating a new genus name.

(v) Sequence trumps phenotype. For an unknown strain, when relatedness by sequence and biochemical or morphological characteristics do not agree, in general the sequence trumps the phenotype. This does not mean that the genotype cannot still be ambiguous, as detailed in the first examples of closely related taxa in Table 3. If the genotype-phenotype discrepancy is an unexpected result, it is worth rechecking that the sequence and phenotypic studies have been performed on the exact same strain.

(vi) What the sequencing laboratory needs to tell the client. The report from the laboratory performing the sequence analysis should include the name of the bacterium and the percent difference from the nearest type strain in the database used. If this is a novel isolate or one that does not match closely with a type strain, it would be helpful if information on other similar isolates and an alignment of the sequence with the nearest known strain, showing just the differences, and the sequence in a text file were also included. Thus, for example, the client might receive the following report: Capnocytophaga canimorsus (2.46%), together with a text file of the actual sequence. The client can then examine the quality of the sequence. If not satisfied (using criteria outlined above), the client could ask for a resequencing. If the quality is good but the client considers the 2.46% difference too large, the sequence in the text file can be compared to all sequences in GenBank or another database by using the free program BLAST to see if there is a closer match. We would perform a BLAST search using a public database on a sequence with greater than 1% difference from a known type strain. If there was no closer match to a type strain, the organism could be reported as C. canimorsus (most closely resembling) or C. canimorsus, novel genotype. Table 5 shows different ways of reporting.
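
The rule of thumb in item (iv) and the report format in item (vi) can be sketched as follows. This is a hedged illustration rather than a validated procedure: it assumes the two sequences have already been aligned by some external tool, counts a gap opposite a base as a difference, and skips undetermined bases (N), none of which is specified in the text. The function names and the wording of the returned suggestions are hypothetical.

```python
def percent_difference(aligned_query, aligned_reference):
    """Percent of compared positions that differ. Both strings must come
    from the same pairwise alignment (equal length, '-' for gaps).
    Positions with an undetermined base ('N') in either sequence are
    skipped; a gap opposite a base counts as a difference (assumptions)."""
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for q, r in zip(aligned_query.upper(), aligned_reference.upper()):
        if q == "N" or r == "N":
            continue
        compared += 1
        if q != r:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differences / compared


def naming_suggestion(pct_diff, clear_phenotypic_difference=False):
    """Apply the rule of thumb: more than 5% suggests considering a new
    genus; at least 1% (or 0.5% with clear phenotypic differences) may
    warrant a new species name."""
    if pct_diff > 5.0:
        return "consider a new genus name"
    species_cutoff = 0.5 if clear_phenotypic_difference else 1.0
    if pct_diff >= species_cutoff:
        return "difference may warrant a new species name"
    return "difference too small for a new species name"


def report_line(nearest_type_strain, pct_diff):
    """Format the name and percent difference as in the example report."""
    return f"{nearest_type_strain} ({pct_diff:.2f}%)"
```

For instance, report_line("Capnocytophaga canimorsus", 2.46) yields the string Capnocytophaga canimorsus (2.46%). Because the percent difference depends on how the alignment was produced, values from different algorithms should not be compared against these cutoffs interchangeably.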

CONCLUSIONS

A colleague once remarked that he thought it was astounding that the simple stain developed by Gram would have profound structural and taxonomic significance. I find it remarkable and equally serendipitous that 16S rRNA gene sequence would have almost the exact amount of variability to define a species or at least provide a clinically useful distinction among bacterial strains.

We have seen that identifying bacteria isolated in the clinical laboratory by sequence instead of phenotype can improve clinical microbiology by better identifying poorly described, rarely isolated, or biochemically aberrant strains. 16S rRNA gene sequences allow bacterial identification that is more robust, reproducible, and accurate than that obtained by phenotypic testing. The test results are less subjective. 16S rRNA gene sequence analysis can lead to the discovery of novel pathogens. 16S rRNA gene sequence analysis can identify noncultured bacteria, allowing independence from growth conditions.

The correct designation of organisms is important. For example, when we refer to a complex of organisms by a single name and these organisms have different pathogenic potential, the disease process is obscured. As it is recognized that the correct taxonomy or name assignment can make a difference in clinical outcome, there should be a demand for more widespread use of the accurate identifications that 16S rRNA gene sequence analysis can provide.

A downside to the better discrimination provided by 16S rRNA gene sequence analysis is that it introduces a communication difficulty, since there are many more distinct sequences than names or phenotypic descriptions. Without a one-to-one correspondence, there can be a problem in assigning names to sequences in a meaningful way. The additional data may also be difficult to fully communicate to clinical colleagues. An attempt to address this is presented in Table 5. Further, the information that 16S rRNA gene sequencing has made available so far (and even more so in the future) confronts the clinical microbiologist with having to change some familiar concepts of species identification.

16S rRNA gene sequencing traditionally played a limited role in the identification of microorganisms in clinical microbiology laboratories, mainly due to high costs, requirements for great technical skill, and the lack of user-friendly comparative sequencing analysis software and validated databases. However, the availability of improved DNA sequencing techniques, vastly increased databases, and more readily available kits and software makes this technology a competitive alternative to routine microbial identification techniques for some groups of organisms, such as mycobacteria. The costs can also be comparable to traditional identification methods for other slow-growing and difficult-to-identify organisms, particularly if a sequencer is available for multitasking in other sections of the laboratory.

An additional important function for 16S rRNA gene sequencing is to provide accurately grouped organisms for further study. Despite its accuracy, 16S rRNA gene sequence analysis lacks widespread use beyond the large and reference laboratories because of technical and cost considerations. Thus, a future challenge for the large clinical, reference, and research laboratories is to translate information from 16S rRNA gene sequencing into convenient biochemical testing schemes, making the accuracy of the genotypic identification available to the smaller and routine clinical microbiology laboratories.

ACKNOWLEDGMENTS

I am grateful to the following people for their knowledge, helpfulness, and willingness to share: Silvia Attorri, John Bartell, Sharon Heward, Kristina Hulten, S. Montgomery, Liane Tsai, R. Visanas, Michael Waddington, Qing Zhang, and the people in the Microbiology Laboratories at the VA Medical Center, Houston and Seattle.

REFERENCES

1. Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. Basic Local Alignment Search Tool. J. Mol. Biol. 215:729–731.

2. Banks, J., S. Poole, S. P. Nair, J. Lewthwaite, P. Tabona, R. McNab, M. Wilson, A. Paul, and B. Henderson. 2002. Streptococcus sanguis secretes CD14-binding proteins that stimulate cytokine synthesis: a clue to the pathogenesis of infective (bacterial) endocarditis. Microb. Pathog. 32:105–116.

3. Bansal, A. K., and T. E. Meyer. 2002. Evolutionary analysis by whole-genome comparisons. J. Bacteriol. 184:2260–2272.

4. Boddinghaus, B., J. Wolters, W. Heikens, and E. C. Bottger. 1990. Phylogenetic analysis and identification of different serovars of Mycobacterium intracellulare at the molecular level. FEMS Microbiol. Lett. 58:197–203.

5. Boddinghaus, B., T. Rogall, T. Flohr, H. Blocker, and E. C. Bottger. 1990. Detection and identification of mycobacteria by amplification of rRNA. J. Clin. Microbiol. 28:1751–1759.

6. Bosshard, P. P., S. Abels, R. Zbinden, E. C. Bottger, and M. Altwegg. 2003. Ribosomal DNA sequencing for identification of aerobic gram-positive rods in the clinical laboratory (an 18-month evaluation). J. Clin. Microbiol. 41:4134–4140.

7. Bottger, E. C. 1989. Rapid determination of bacterial ribosomal RNA sequences by direct sequencing of enzymatically amplified DNA. FEMS Microbiol. Lett. 65:171–176.

8. Brouqui, P., and D. Raoult. 2001. Endocarditis due to rare and fastidious bacteria. Clin. Microbiol. Rev. 14:177–207.

9. Celard, M., G. Gevigney, S. Mosnier, P. Buttard, Y. Benito, J. Etienne, and F. Vandenesch. 1999. Polymerase chain reaction analysis for diagnosis of Tropheryma whippelii infective endocarditis in two patients with no previous evidence of Whipple’s disease. Clin. Infect. Dis. 29:1348–1349.

10. Chen, H., K. Hulten, and J. E. Clarridge III. 2002. Taxonomic subgroups of Pasteurella multocida correlate with clinical presentation. J. Clin. Microbiol. 40:3438–3441.

11. Chen, K., H. Neimark, P. Rumore, and C. R. Steinman. 1989. Broad-range DNA probes for detecting and amplifying eubacterial nucleic acids. FEMS Microbiol. Lett. 57:19–24.

12. Christensen, H., M. Bisgaard, W. Frederiksen, R. Mutters, P. Kuhnert, and J. E. Olsen. 2001. Is characterization of a single isolate sufficient for valid publication of a new genus or species? Proposal to modify recommendation 30b of the Bacteriological Code (1990 revision). Int. J. Syst. Bacteriol. 51:2221–2222.

13. Clarridge, J. E., III, C. Osting, M. Jalali, J. Osborne, and M. Waddington. 1999. Genotypic and phenotypic characterization of “Streptococcus milleri” group isolates from a Veterans Administration hospital population. J. Clin. Microbiol. 37:3681–3687.

14. Clarridge, J. E., III, S. Attorri, D. M. Musher, J. Hebert, and S. Dunbar. 2001. Streptococcus intermedius, Streptococcus constellatus, and Streptococcus anginosus (“Streptococcus milleri group”) are of different clinical importance and are not equally associated with abscess. Clin. Infect. Dis. 32:1511–1515.

15. Clarridge, J. E., III, S. M. Attorri, Q. Zhang, and J. Bartell. 2001. 16S ribosomal DNA sequence analysis distinguishes biotypes of Streptococcus bovis: Streptococcus bovis biotype II/2 is a separate genospecies and the predominant clinical isolate in adult males. J. Clin. Microbiol. 39:1549–1552.

16. Clarridge, J. E., III, T. J. Raich, A. Sjosted, G. Sandstrom, R. O. Darouiche, R. M. Shawar, P. R. Georghiou, C. Osting, and L. Vo. 1996. Characterization of two unusual clinically significant Francisella strains. J. Clin. Microbiol. 34:1995–2000.

17. Clarridge, J. E., III, and Q. Zhang. 2002. Genotypic diversity of clinical Actinomyces species: phenotype, source, and disease correlation among genospecies. J. Clin. Microbiol. 40:3442–3448.

18. Clayton, R. A., G. Sutton, P. S. Hinkle, Jr., C. Bult, and C. Fields. 1995. Intraspecific variation in small-subunit rRNA sequences in GenBank: why single sequences may not adequately represent prokaryotic taxa. Int. J. Syst. Bacteriol. 45:595–599.

19. Cloud, J. L., H. Neal, R. Rosenberry, C. Y. Turenne, M. Jama, D. R. Hillyard, and K. C. Carroll. 2002. Identification of Mycobacterium spp. by using a commercial 16S ribosomal DNA sequencing kit and additional sequencing libraries. J. Clin. Microbiol. 40:400–406.

20. Collins, M. D., R. A. Hutson, N. Nikolaitchouk, A. Nyberg, and E. Falsen. 2003. Luteococcus sanguinis sp. nov., isolated from human blood. Int. J. Syst. Evol. Microbiol. 53:1889–1891.

21. Collins, M. D., R. Higgins, S. Messier, M. Fortin, R. A. Hutson, P. A. Lawson, and E. Falsen. 2003. Allofustis seminis gen. nov., sp. nov., a novel gram-positive, catalase-negative, rod-shaped bacterium from pig semen. Int. J. Syst. Evol. Microbiol. 53:811–814.

22. Cook, V. J., C. Y. Turenne, J. Wolfe, R. Pauls, and A. Kabani. 2003. Conventional methods versus 16S ribosomal DNA sequencing for identification of nontuberculous mycobacteria: cost analysis. J. Clin. Microbiol. 41:1010–1015.

23. Coyle, M. B., R. B. Leonard, D. J. Nowowiejski, A. Malekniazi, and D. J. Finn. 1993. Evidence of multiple taxa within commercially available reference strains of Corynebacterium xerosis. J. Clin. Microbiol. 31:1788–1793.

24. Devulder, G., G. Perriere, F. Baty, and J. P. Flandrois. 2003. BIBI, a bioinformatics bacterial identification tool. J. Clin. Microbiol. 41:1785–1787.

25. Dewhirst, F. E., B. J. Paster, N. Tzellas, B. Coleman, J. Downes, D. A. Spratt, and W. G. Wade. 2001. Characterization of novel human oral isolates and cloned 16S rDNA sequences that fall in the family Coriobacteriaceae: description of Olsenella gen. nov., reclassification of Lactobacillus uli as Olsenella uli comb. nov. and description of Olsenella profusa sp. nov. Int. J. Syst. Bacteriol. 51:1797–1804.

26. Dixon, B. 2001. What’s in a name? Lancet Infect. Dis. 1:212.

27. Drancourt, M., C. Bollet, A. Carlioz, R. Martelin, J. P. Gayral, and D. Raoult. 2000. 16S ribosomal DNA sequence analysis of a large collection of environmental and clinical unidentifiable bacterial isolates. J. Clin. Microbiol. 38:3623–3630.

28. Dubnau, D., I. Smith, P. Morell, and J. Marmur. 1965. Gene conservation in Bacillus species. I. Conserved genetic and nucleic acid base sequence homologies. Proc. Natl. Acad. Sci. USA 54:491–498.

29. Edwards, U., T. Rogall, H. Blocker, M. Emde, and E. C. Bottger. 1989. Isolation and direct complete nucleotide determination of entire genes. Characterization of a gene coding for 16S ribosomal RNA. Nucleic Acids Res. 17:7843–7853.

30. Felsenstein, J. 1981. Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol. 17:368–376.

31. Felsenstein, J. 1989. PHYLIP—phylogeny inference package (version 3.2). Cladistics 5:164–166.

32. Fournier, P. E., J. S. Dumler, G. Greub, J. Zhang, Y. Wu, and D. Raoult. 2003. Gene sequence-based criteria for identification of new Rickettsia isolates and description of Rickettsia heilongjiangensis sp. nov. J. Clin. Microbiol. 41:5456–5465.

33. Fox, G. E., J. D. Wisotzkey, and P. Jurtshuk, Jr. 1992. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity. Int. J. Syst. Bacteriol. 42:166–170.

34. Fredricks, D. N., and D. A. Relman. 1996. Sequence-based identification of microbial pathogens: a reconsideration of Koch’s postulates. Clin. Microbiol. Rev. 9:18–33.

35. Funke, G., A. V. Graevenitz, J. E. Clarridge III, and K. Bernard. 1997. Clinical microbiology of coryneform organisms. Clin. Microbiol. Rev. 10:125–159.

36. Garrity, G. M., and J. G. Holt. 2001. The road map to the manual, p. 119–166. In G. M. Garrity (ed.), Bergey’s manual of systematic bacteriology. Springer-Verlag, New York, N.Y.

37. Gauduchon, V., L. Chalabreysse, J. Etienne, M. Celard, Y. Benito, H. T. Lepidi, F. Thivolet-Bejui, and F. Vandenesch. 2003. Molecular diagnosis of infective endocarditis by PCR amplification and direct sequencing of DNA from valve tissue. J. Clin. Microbiol. 41:763–766.

38. Gee, J. E., C. T. Sacchi, M. B. Glass, B. K. De, R. S. Weyant, P. N. Levett, A. M. Whitney, A. R. Hoffmaster, and T. Popovic. 2003. Use of 16S rRNA gene sequencing for rapid identification and differentiation of Burkholderia pseudomallei and B. mallei. J. Clin. Microbiol. 41:4647–4654.

39. Goldenberger, D., A. Kunzli, P. Vogt, R. Zbinden, and M. Altwegg. 1997. Molecular diagnosis of bacterial endocarditis by broad-range PCR amplification and direct sequencing. J. Clin. Microbiol. 35:2733–2739.

40. Gupta, S., and M. C. Maiden. 2001. Exploring the evolution of diversity in pathogen populations. Trends Microbiol. 9:181–185.

41. Haas, W. H., W. R. Butler, P. Kirschner, B. B. Plikaytis, M. B. Coyle, B. Amthor, A. G. Steigerwalt, D. J. Brenner, M. Salfinger, J. T. Crawford, E. C. Bottger, and H. J. Bremer. 1997. A new agent of mycobacterial lymphadenitis in children. Mycobacterium heidelbergense sp. nov. J. Clin. Microbiol. 35:3203–3209.

42. Hall, L., K. A. Doerr, L. S. Wohlfiel, and G. D. Roberts. 2003. Evaluation of the MicroSeq system for identification of mycobacteria by 16S ribosomal DNA sequencing and its integration into a routine clinical mycobacteriology laboratory. J. Clin. Microbiol. 41:1447–1453.

43. Hall, V., M. D. Collins, R. A. Hutson, E. Falsen, E. Inganas, and B. I. Duerden. 2003. Actinobaculum urinale sp. nov., from human urine. Int. J. Syst. Evol. Microbiol. 53:679–682.

44. Harmsen, D., and H. Karch. 2004. 16S rDNA for diagnosing pathogens: a living tree. ASM News 70:19–24.

45. Herrero, I. A., M. S. Rouse, K. E. Piper, S. A. Alyaseen, J. M. Steckelberg, and R. Patel. 2002. Reevaluation of Streptococcus bovis endocarditis cases from 1975 to 1985 by 16S ribosomal DNA sequence analysis. J. Clin. Microbiol. 40:3848–3850.

46. Horn, M., and M. Wagner. 2001. Evidence for additional genus-level diversity of Chlamydiales in the environment. FEMS Microbiol. Lett. 204:71–74.

47. Jalava, J., P. Kotilainen, S. Nikkari, M. Skurnik, E. Vanttinen, O. P. Lehtonen, E. Eerola, and P. Toivanen. 1995. Use of the polymerase chain reaction and DNA sequencing for detection of Bartonella quintana in the aortic valve of a patient with culture-negative infective endocarditis. Clin. Infect. Dis. 21:891–896.

48. Kattar, M. M., J. F. Chavez, A. P. Limaye, S. L. Rassoulian-Barrett, S. L. Yarfitz, L. C. Carlson, Y. Houze, S. Swanzy, B. L. Wood, and B. T. Cookson. 2001. Application of 16S rRNA gene sequencing to identify Bordetella hinzii as the causative agent of fatal septicemia. J. Clin. Microbiol. 38:789–794.

49. Kimura, M. 1980. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol. 16:111–120.

50. Kirschner, P., and E. C. Bottger. 1998. Species identification of mycobacteria using rDNA sequencing. Methods Mol. Biol. 101:349–361.

51. Kirschner, P., B. Springer, U. Vogel, A. Meier, A. Wrede, M. Kiekenbeck,F. C. Bange, and E. C. Bottger. 1993. Genotypic identification of mycobac-teria by nucleic acid sequence determination: report of a 2-year experiencein a clinical laboratory. J. Clin. Microbiol. 31:2882–2889.

52. Kolbert, C. P. and D. H. Persing. 1999. Ribosomal DNA sequencing as atool for identification of bacterial pathogens. Curr. Opin. Microbiol. 2:299–305.

53. Koukila-Kahkola, P., B. Springer, E. C. Bottger, L. Paulin, E. Jantzen, andM. L. Katila. 1995. Mycobacterium branderi sp. nov., a new potential humanpathogen. Int. J. Syst. Bacteriol. 45:549–553.

54. Krieg, N. R. and J. G. Holt (ed.). 1984. Bergey’s manual of systematicbacteriology. The Williams & Wilkins Co., Baltimore, Md.

55. Krieg, N. R., and G. M. Garrity. 2001. On using the manual, p. 15–19. InG. M. Garrity (ed.), Bergey’s manual of systematic bacteriology, 2nd ed.Bergey’s Manual Trust. Springer-Verlag, New York, N.Y.

56. Lan, R., and P. R. Reeves. 2001. When does a clone deserve a name? Aperspective on bacterial species based on population genetics. Trends Mi-crobiol. 9:419–423.

57. Maidak, B. L., J. R. Cole, C. T. Parker, Jr., G. M. Garrity, N. Larsen, B. Li,T. G. Lilburn, M. J. McCaughey, G. J. Olsen, R. Overbeek, S. Pramanik, T.M. Schmidt, J. M. Tiedje, and C. R. Woese. 1999. A new version of the RDP(Ribosomal Database Project). Nucleic Acids Res. 27:171–173.

58. Melito, P. L., C. Munro, P. R. Chipman, D. L. Woodward, T. F. Booth, and F. G. Rodgers. 2001. Helicobacter winghamensis sp. nov., a novel Helicobacter sp. isolated from patients with gastroenteritis. J. Clin. Microbiol. 39:2412–2417.

59. Reference deleted.

60. Murray, P. R., E. J. Baron, M. A. Pfaller, F. C. Tenover, and R. H. Yolken (ed.). 1999. Manual of clinical microbiology, 7th ed. ASM Press, Washington, D.C.

61. Ninet, B., M. Monod, S. Emler, J. Pawlowski, C. Metral, P. Rohner, R. Auckenthaler, and B. Hirschel. 1996. Two different 16S rRNA genes in a mycobacterial strain. J. Clin. Microbiol. 34:2531–2536.

62. Pace, N. 1997. A molecular view of microbial diversity and the biosphere. Science 276:734–740.

63. Pai, S., N. Esen, X. Pan, and J. M. Musser. 1997. Routine rapid Mycobacterium species assignment based on species-specific allelic variation in the 65-kilodalton heat shock protein gene (hsp65). Arch. Pathol. Lab. Med. 121:859–864.

64. Palys, T., L. K. Nakamura, and F. M. Cohan. 1997. Discovery and classification of ecological diversity in the bacterial world: the role of DNA sequence data. Int. J. Syst. Bacteriol. 47:1145–1156.

65. Patel, J. B. 2001. 16S rRNA gene sequencing for bacterial pathogen identification in the clinical laboratory. Mol. Diagn. 6:313–321.

66. Patel, J. B., D. G. Leonard, X. Pan, J. M. Musser, R. F. Berman, and I. Nachamkin. 2000. Sequence-based identification of Mycobacterium species using the Microseq 500 16S rDNA bacterial identification system. J. Clin. Microbiol. 38:246–251.

67. Patel, J. B., J. E. Clarridge III, M. S. Schuster, J. Osborne, M. Waddington, and I. Nachamkin. 1999. Leptotrichia species bacteremia in a neutropenic patient. J. Clin. Microbiol. 37:2064–2067.

68. Patel, R., K. E. Piper, M. S. Rouse, J. M. Steckelberg, J. R. Uhl, P. Kohner, M. K. Hopkins, F. R. Cockerill III, and B. C. Kline. 1998. Determination of 16S rRNA sequences of enterococci and application to species identification of nonmotile Enterococcus gallinarum isolates. J. Clin. Microbiol. 36:3399–3407.

69. Pfister, P., M. Risch, D. E. Brodersen, and E. C. Bottger. 2003. Role of 16S rRNA helix 44 in ribosomal resistance to hygromycin B. Antimicrob. Agents Chemother. 47:1496–1502.

70. Pfister, P., S. Hobbie, Q. Vicens, E. C. Bottger, and E. Westhof. 2003. The molecular basis for A-site mutations conferring aminoglycoside resistance: relationship between ribosomal susceptibility and X-ray crystal structures. Chembiochem 4:1078–1088.

71. Qian, Q., Y. Tang, C. P. Kolbert, C. A. Torgerson, J. G. Hughes, E. A. Vetter, W. Harmsen, S. O. Montgomery, F. R. Cockerill III, and D. H. Persing. 2001. Direct identification of bacteria from positive blood cultures by amplification and sequencing of the 16S rRNA gene: evaluation of BACTEC 9240 instrument true-positive and false-positive results. J. Clin. Microbiol. 39:3578–3582.

72. Rantakokko-Jalava, K., S. Nikkari, J. Jalava, E. Eerola, M. Skurnik, O. Meurman, O. Ruuskanen, A. Alanen, E. Kotilainen, P. Toivanen, and P. Kotilainen. 2000. Direct amplification of rRNA genes in diagnosis of bacterial infections. J. Clin. Microbiol. 38:32–39.

73. Regnery, R. L., B. E. Anderson, J. E. Clarridge III, M. C. Rodriguez-Barradas, D. C. Jones, and J. H. Carr. 1992. Characterization of a novel Rochalimaea species, R. henselae sp. nov., isolated from blood of a febrile, human immunodeficiency virus-positive patient. J. Clin. Microbiol. 30:265–274.

74. Reischl, U., K. Feldmann, L. Naumann, B. J. Gaugler, B. Ninet, B. Hirschel, and S. Emler. 1998. 16S rRNA sequence diversity in Mycobacterium celatum strains caused by presence of two different copies of 16S rRNA gene. J. Clin. Microbiol. 36:1761–1764.

75. Relman, D. A. 1999. The search for unrecognized pathogens. Science 284:1308–1310.

76. Relman, D. A. 1993. Universal bacterial 16S rDNA amplification and sequencing, p. 489–495. In D. H. Persing, T. F. Smith, F. C. Tenover, and T. J. White (ed.), Diagnostic molecular microbiology: principles and applications. ASM Press, Washington, D.C.

77. Relman, D. A., J. S. Loutit, T. M. Schmidt, S. Falkow, and L. S. Tompkins. 1990. The agent of bacillary angiomatosis. An approach to the identification of uncultured pathogens. N. Engl. J. Med. 323:1573–1580.

78. Relman, D. A., T. M. Schmidt, R. P. MacDermott, and S. Falkow. 1992. Identification of the uncultured bacillus of Whipple’s disease. N. Engl. J. Med. 327:293–301.

79. Ringuet, H., C. Akoua-Koffi, S. Honore, A. Varnerot, V. Vincent, P. Berche, J. L. Gaillard, and C. Pierre-Audigier. 1999. hsp65 sequencing for identification of rapidly growing mycobacteria. J. Clin. Microbiol. 37:852–857.

80. Rogall, T., T. Flohr, and E. C. Bottger. 1990. Differentiation of Mycobacterium species by direct sequencing of amplified DNA. J. Gen. Microbiol. 136:1915–1920.

81. Rogall, T., J. Wolters, T. Flohr, and E. C. Bottger. 1990. Towards a phylogeny and definition of species at the molecular level within the genus Mycobacterium. Int. J. Syst. Bacteriol. 40:323–330.

82. Roth, A., M. Fischer, M. E. Hamid, S. Michalke, W. Ludwig, and H. Mauch. 1998. Differentiation of phylogenetically related slowly growing mycobacteria based on 16S–23S rRNA gene internal transcribed spacer sequences. J. Clin. Microbiol. 36:139–147.

83. Roth, A., S. Andrees, R. M. Kroppenstedt, D. Harmsen, and H. Mauch. 2003. Phylogeny of the genus Nocardia based on reassessed 16S rRNA gene sequences reveals underspeciation and division of strains classified as Nocardia asteroides into three established species and two unnamed taxons. J. Clin. Microbiol. 41:851–856.

84. Sacchi, C. T., A. M. Whitney, L. W. Mayer, R. Morey, A. Steigerwalt, A. Boras, R. S. Weyant, and T. Popovic. 2002. Sequencing of 16S rRNA gene: a rapid tool for identification of Bacillus anthracis. Emerg. Infect. Dis. 8:1117–1123.

85. Sacchi, C. T., A. M. Whitney, M. W. Reeves, L. W. Mayer, and T. Popovic. 2002. Sequence diversity of Neisseria meningitidis 16S rRNA genes and use of 16S rRNA gene sequencing as a molecular subtyping tool. J. Clin. Microbiol. 40:4520–4527.

86. Saitou, N., and M. Nei. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406–425.

87. Sambrook, J., E. F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

88. Schmalenberger, A., F. Schwieger, and C. C. Tebbe. 2001. Effect of primers hybridizing to different evolutionarily conserved regions of the small-subunit rRNA gene in PCR-based microbial community analyses and genetic profiling. Appl. Environ. Microbiol. 67:3557–3563.

89. Song, J., S. C. Lee, J. W. Kang, H. J. Baek, and J. W. Suh. 2004. Phylogenetic analysis of Streptomyces spp. isolated from potato scab lesions in Korea on the basis of 16S rRNA gene and 16S-23S rDNA internally transcribed spacer sequences. Int. J. Syst. Evol. Microbiol. 54:203–209.

90. Song, Y., C. Liu, M. McTeague, and S. M. Finegold. 2003. Ribosomal DNA sequence-based analysis of clinically significant gram-positive anaerobic cocci. J. Clin. Microbiol. 41:1363–1369.

91. Song, Y., C. Liu, D. R. Molitoris, T. J. Tomzynski, P. A. Lawson, M. D. Collins, and S. M. Finegold. 2003. Clostridium bolteae sp. nov., isolated from human sources. Syst. Appl. Microbiol. 26:84–89.

92. Springer, B., E. C. Bottger, P. Kirschner, and R. J. Wallace, Jr. 1995. Phylogeny of the Mycobacterium chelonae-like organism based on partial sequencing of the 16S rRNA gene and proposal of Mycobacterium mucogenicum sp. nov. Int. J. Syst. Bacteriol. 45:262–267.

93. Springer, B., L. Stockman, K. Teschner, G. D. Roberts, and E. C. Bottger. 1996. Two-laboratory collaborative study on identification of mycobacteria: molecular versus phenotypic methods. J. Clin. Microbiol. 34:296–303.

94. Stackebrandt, E., W. Frederiksen, G. M. Garrity, P. A. Grimont, P. Kampfer, M. C. Maiden, X. Nesme, R. Rossello-Mora, J. Swings, H. G. Truper, L. Vauterin, A. C. Ward, and W. B. Whitman. 2002. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int. J. Syst. Evol. Microbiol. 52:1043–1047.

95. Sultana, F., Y. Kawamura, X. G. Hou, S. E. Shu, and T. Ezaki. 1998. Determination of 23S rRNA sequences from members of the genus Streptococcus and characterization of genetically distinct organisms previously identified as members of the Streptococcus anginosus group. FEMS Microbiol. Lett. 158:223–230.

96. Sussman, J. I., E. J. Baron, M. J. Tenenbaum, M. H. Kaplan, J. Greenspan, R. R. Facklam, M. B. Tyburski, M. A. Goldman, B. F. Kanzer, and R. A. Pizzarello. 1986. Viridans streptococcal endocarditis: clinical, microbiological, and echocardiographic correlations. J. Infect. Dis. 154:597–603.

97. Tang, Y. W., A. Von Graevenitz, M. G. Waddington, M. K. Hopkins, D. H. Smith, H. Li, C. P. Kolbert, S. O. Montgomery, and D. H. Persing. 2000. Identification of coryneform bacterial isolates by ribosomal DNA sequence analysis. J. Clin. Microbiol. 38:1676–1678.

98. Tang, Y. W., N. M. Ellis, M. K. Hopkins, D. H. Smith, D. E. Dodge, and D. H. Persing. 1998. Comparison of phenotypic and genotypic technique for identification of unusual aerobic pathogenic gram-negative bacilli. J. Clin. Microbiol. 36:3674–3679.

99. Teske, A., J. Wolters, and E. C. Bottger. 1991. The 16S rRNA nucleotide sequence of Mycobacterium leprae: phylogenetic position and development of DNA probes. FEMS Microbiol. Lett. 64:231–237.

100. Thorne, J. L., H. Kishino, and I. S. Painter. 1998. Estimating the rate of evolution of the rate of molecular evolution. Mol. Biol. Evol. 15:1647–1657.

101. Tortoli, E. 2003. Impact of genotypic studies on mycobacterial taxonomy: the new mycobacteria of the 1990s. Clin. Microbiol. Rev. 16:319–354.

102. Tortoli, E., A. Bartoloni, E. C. Bottger, S. Emler, C. Garzelli, E. Magliano, A. Mantella, N. Rastogi, L. Rindi, C. Scarparo, and P. Urbano. 2001. Burden of unidentifiable mycobacteria in a reference laboratory. J. Clin. Microbiol. 39:4058–4065.

103. Turenne, C. Y., L. Tschetter, J. Wolfe, and A. Kabani. 2001. Necessity of quality-controlled 16S rRNA gene sequence databases: identifying nontuberculous Mycobacterium species. J. Clin. Microbiol. 39:3637–3648.

104. Ueda, K., T. Seki, T. Kudo, T. Yoshida, and M. Kataoka. 1999. Two distinct mechanisms cause heterogeneity of 16S rRNA. J. Bacteriol. 181:78–82.

105. Wallace, R. J., Jr., M. Tsukamura, B. A. Brown, J. Brown, V. A. Steingrube, Y. Zhang, and D. R. Nash. 1990. Cefotaxime-resistant Nocardia asteroides strains are isolates of the controversial species Nocardia farcinica. J. Clin. Microbiol. 28:2726–2732.

106. Ward, B. B. 2002. How many species of prokaryotes are there? Proc. Natl. Acad. Sci. USA 99:10234–10236.

107. Waterman, M. S. 1995. Introduction to computational biology, p. 201–202. Chapman & Hall, New York, N.Y.

108. Wayne, L. G., R. C. Good, E. C. Bottger, R. Butler, M. Dorsch, T. Ezaki, W. Gross, V. Jonas, J. Kilburn, P. Kirschner, M. I. Krichevsky, M. Ridell, T. M. Shinnick, B. Springer, E. Stackebrandt, I. Tarnok, Z. Tarnok, H. Tasaka, V. Vincent, N. G. Warren, C. A. Knott, and R. Johnson. 1996. Semantide- and chemotaxonomy-based analyses of some problematic phenotypic clusters of slowly growing mycobacteria, a cooperative study of the International Working Group on Mycobacterial Taxonomy. Int. J. Syst. Bacteriol. 46:280–297.

109. Wilck, M. B., Y. Wu, J. G. Howe, J. Y. Crouch, and S. C. Edberg. 2001. Endocarditis caused by culture-negative organisms visible by Brown and Brenn staining: utility of PCR and DNA sequencing for diagnosis. J. Clin. Microbiol. 39:2025–2027.

110. Wilson, R. W., V. A. Steingrube, E. C. Bottger, B. Springer, B. A. Brown-Elliott, V. Vincent, K. C. Jost, Jr., Y. Zhang, M. J. Garcia, S. H. Chiu, G. O. Onyi, H. Rossmoore, D. R. Nash, and R. J. Wallace, Jr. 2001. Mycobacterium immunogenum sp. nov., a novel species related to Mycobacterium abscessus and associated with clinical disease, pseudo-outbreaks and contaminated metalworking fluids: an international cooperative study on mycobacterial taxonomy. Int. J. Syst. Evol. Microbiol. 51:1751–1764.

111. Woese, C. R., E. Stackebrandt, T. J. Macke, and G. E. Fox. 1985. A phylogenetic definition of the major eubacterial taxa. Syst. Appl. Microbiol. 6:143–151.

112. Woese, C. R., G. J. Olsen, M. Ibba, and D. Soll. 2000. Comparisons of complete genome sequences allow the most objective and comprehensive descriptions possible of a lineage’s evolution. Microbiol. Mol. Biol. Rev. 64:202–236.

113. Woese, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221–271.

114. Woo, P. C., K. H. Ng, S. K. Lau, K. T. Yip, A. M. Fung, K. W. Leung, D. M. Tam, T. L. Que, and K. Y. Yuen. 2003. Usefulness of the MicroSeq 500 16S ribosomal DNA-based bacterial identification system for identification of clinically significant bacterial isolates with ambiguous biochemical profiles. J. Clin. Microbiol. 41:1996–2001.

115. Zhang, Q., R. Kennon, M. A. Koza, K. Hulten, and J. E. Clarridge III. 2002. Pseudoepidemic due to a unique strain of Mycobacterium szulgai: genotypic, phenotypic, and epidemiological analysis. J. Clin. Microbiol. 40:1134–1139.
