
Copyright © 1992 by the Genetics Society of America

Sequence Heterogeneity Between the Two Genes Encoding 16S rRNA From the Halophilic Archaebacterium Haloarcula marismortui

Shanthini Mylvaganam and Patrick P. Dennis

Program in Evolutionary Biology, Canadian Institute for Advanced Research, and Department of Biochemistry, University of British Columbia, Vancouver, British Columbia, Canada V6T 1Z3

Manuscript received July 8, 1991. Accepted for publication November 14, 1991

ABSTRACT The halophilic archaebacterium, Haloarcula marismortui, contains two nonadjacent ribosomal RNA operons, designated rrnA and rrnB, in its genome. The 16S rRNA genes within these operons are 1472 nucleotides in length and differ by nucleotide substitutions at 74 positions. The substitutions are not uniformly distributed but rather are localized within three domains of 16S rRNA; more than two-thirds of the differences occur within the domain bounded by nucleotides 508 and 823. This domain is known to be important for P site binding of aminoacylated tRNA and for 30S-50S subunit association. Using S1 nuclease protection, it has been shown that the 16S rRNAs transcribed from both operons are equally represented in the functional 70S ribosome population. Comparison of these two H. marismortui sequences to the 16S gene sequences from related halophilic genera suggests that (i) in diverging genera, mutational differences in 16S gene sequences are not clustered but rather are more generally distributed throughout the length of the 16S sequence, and (ii) the rrnB sequence, particularly within the 508-823 domain, is more different from the out group sequences than is the rrnA sequence. Several possible explanations for the evolutionary origin and maintenance of this sequence heterogeneity within 16S rRNA of H. marismortui are discussed.

In virtually all species, the sequences of multicopy rRNA genes are identical or nearly identical. The molecular mechanism responsible for maintaining this sequence homogeneity has not yet been elaborated, although the necessity for such a mechanism seems obvious (DOVER 1982, 1987). The primary nucleotide sequence of rRNA is essential for maintaining secondary and higher order structural features and for highly specific interactions with a multiplicity of other components of the translation apparatus, including (i) ribosomal proteins; (ii) translation initiation, elongation and termination-release factors; (iii) aminoacylated, peptidyl and deacylated tRNAs; and (iv) different mRNAs. (For reviews, see NOLLER et al. 1990; BRIMACOMBE et al. 1990.) Changes in rRNA gene sequences that are fixed at all loci are usually considered to be either nearly neutral or balanced by compensatory changes. Compensatory changes can be intramolecular, altering a second site within the same molecule, or intermolecular, altering a nucleotide or amino acid site within an interacting component of the translation apparatus. Intramolecular compensatory changes within rRNAs have been documented in detail (WOESE et al. 1983). Fixation of mutational changes at all rRNA loci allows optimization of the translation apparatus according to a unique set of species-specific criteria and accounts at least in part for the divergence in rRNA sequences between even closely related species (DOVER 1982, 1987).

One of us (P.P.D.) would like to dedicate this paper to MEL GLEITER on the occasion of his retirement.

Genetics 130 399-410 (March, 1992)

There are a number of examples where more than one small or large subunit rRNA encoding gene from the genome of an organism has been analyzed. In most cases, the sequences are as anticipated: either identical, or exhibiting a low level of heterogeneity at about 0.1% of the nucleotide positions (DRYDEN and KAPLAN 1991; HEINONIN, SCHNARE and GRAY 1990; MADEN et al. 1987). A notable exception to this rule occurs in the eucaryotic parasite Plasmodium berghei. The genome of this organism contains two types of small subunit rRNA genes that differ by nucleotide substitution or deletion-insertion at 107 of 2059 positions. The two types of genes are preferentially expressed at different stages in the life cycle of the parasite (GUNDERSON et al. 1987).

The halophilic archaebacterium Haloarcula marismortui contains two nonadjacent ribosomal RNA operons in its genome (MEVARECH et al. 1989). These two operons, designated rrnA and rrnB, were previously cloned on separate genomic restriction fragments. Preliminary characterization by restriction enzyme analysis indicated that the two copies of the 16S and 23S rRNA genes differed at a number of positions. Furthermore, analysis of the 5' flanking region indicated that the rrnB operon was unusual compared to other halophilic rRNA operons in that it contained only a single recognizable promoter and lacked a precursor processing site within the primary transcript. The more typical rrnA operon was preceded by four tandem transcription start sites and a well conserved precursor processing site. Using S1 nuclease protection, it was possible to demonstrate that primary transcripts were produced from each operon. In this manuscript, the complete 16S gene sequences from both the rrnA and rrnB operons are presented and compared to the 16S rRNA sequences from related halophilic genera. These two sequences differ from each other by nucleotide substitution at 74 of 1472 positions. In addition, we show that the 16S rRNA sequences derived from both operons during exponential growth are assembled into 30S ribosome subunits which are unrestricted in their ability to form 70S ribosomal particles.

MATERIALS AND METHODS

Strains and growth conditions: The two halophilic archaebacteria Halobacterium cutirubrum and Haloarcula marismortui were used in this study. Hb. cutirubrum and Hb. halobium are now considered independent isolates of a single species, Hb. salinarium; Ha. marismortui was formerly called Halobacterium marismortui (LARSEN 1984). Hb. cutirubrum was cultured in the basal salts medium of BAYLEY (1971) supplemented with glycerol (0.4%) and casamino acids (2%). Ha. marismortui was cultured in the enriched medium of OREN, LAU and FOX (1988). Cultures were incubated at 37-39° and cell density was monitored as A600 nm; for RNA or ribosome preparation, cells were harvested in mid logarithmic phase (A600 nm ≤ 0.5).

Ribosomal RNA operons: The two clones, pHC8 (laboratory strain number PD1021) and pHH10 (PD1022), are pBR322 derivatives containing, respectively, an 8.0-kb HindIII-ClaI fragment (derived from a 20-kbp HindIII fragment) and a 10-kb HindIII fragment from the genome of Ha. marismortui (MEVARECH et al. 1989). It has been shown by Southern hybridization that each fragment contains a complete and distinct rRNA operon. The plasmid pHC8 contains the rrnA operon and the plasmid pHH10 contains the rrnB operon. The clone p4W (PD655) contains a 7.5-kb KpnI-BglII fragment from the genome of Hb. cutirubrum inserted between the EcoRI and BamHI sites of pBR322; this fragment contains the Hb. cutirubrum single copy rRNA operon (HUI and DENNIS 1985). The rrnA and rrnB 16S rRNA genes were sequenced on both strands using the dideoxy chain termination method (SANGER, NICKLEN and COULSON 1977). The accession numbers for the rrnA and rrnB 16S genes are M27042 and M27043, respectively.

Nuclease S1 protection analysis: Total RNA from Hb. cutirubrum and Ha. marismortui was isolated as described previously by cell lysis with buffer containing sodium dodecyl sulfate, phenol extraction, ethanol precipitation, and CsCl centrifugation (CHANT and DENNIS 1986). To obtain RNA from 70S ribosomal particles, bacteria were concentrated 20-50-fold by centrifugation, resuspended in ribosome buffer (3.4 M KCl, 100 mM MgCl2, 6 mM 2-mercaptoethanol and 10 mM Tris-HCl, pH 7.6; SHEVACK et al. 1985), and disrupted by passage through a French pressure cell. The lysates were centrifuged at low speed (10,000 × g, 10 min) to remove debris, and 70S particles were isolated by sedimentation through a 5-30% sucrose density gradient (SW27, 27K, 6 hr, 10°) in ribosome buffer. The RNA was obtained from the pooled 70S fraction by phenol extraction and was purified by ethanol precipitation and CsCl centrifugation.

Nuclease S1 protection experiments were carried out as described previously (CHANT and DENNIS 1986). Briefly, the homologous but nonidentical 272-nucleotide SacII-SmaI fragments (nucleotide positions 463-734 in the 16S sequence) were isolated from the Ha. marismortui pHC8 and pHH10 clones and from the Hb. cutirubrum p4W clone. The fragments were 5'-end labeled at the SmaI site on the minus (-) DNA strand (nucleotide position 736) using polynucleotide kinase and [γ-32P]ATP. Approximately 20-30 ng of the respective fragments (DNA excess; between 10⁵ and 10⁶ dpm per assay) were hybridized to 50-200 ng of total RNA or of RNA isolated from 70S ribosomes in S1 hybridization buffer for three hours at temperatures between 50° and 60°. Hybrids were digested with 200-500 units/ml of nuclease S1 at 32-35° for 30 min, and protected products were analyzed for length by electrophoresis in 8% polyacrylamide-urea sequencing gels. As a negative control, total yeast tRNA from Saccharomyces cerevisiae was used in place of Ha. marismortui or Hb. cutirubrum RNA to protect the end-labeled DNA fragments. In these experiments, the distribution and intensity of partial protection products were sensitive to (i) the hybridization temperature, (ii) the S1 concentration, and (iii) to a lesser extent, the S1 digestion temperature. The conditions used in Figure 6, although not necessarily optimal for each combination of RNA and DNA, are illustrative of typical results.

Analysis of 16S rRNA sequences: Nine separate sites of single nucleotide insertion or deletion were required to achieve end-to-end alignment of the five 16S gene sequences from the four related halophilic species: the rrnA and rrnB 16S rRNA sequences from Ha. marismortui and the single 16S rRNA sequences from Hb. cutirubrum, Haloferax volcanii and Halococcus morrhuae (HUI and DENNIS 1985; GUPTA, LANTER and WOESE 1983; LEFFERS and GARRETT 1984). The precise position of each nucleotide insertion or deletion was chosen to maximize nucleotide identities between the five sequences in the surrounding region. Evolutionary divergence between any two sequences of the set was estimated as P, the proportion of observed nucleotide differences at aligned positions. No attempt was made to correct these P values for multiple substitution events at the same site. Of the nine gap positions, six are phylogenetically uninformative. The two at positions 970 and 1405 group Hb. cutirubrum with Hf. volcanii and Ha. marismortui with Hc. morrhuae. The last, at position 1233, distinguishes the rrnA and rrnB Ha. marismortui sequences from the three out group sequences. For the comparison between any two sequences, sites of single nucleotide deletion or insertion required to maintain the alignment were omitted from the P value calculation.
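
The P value described above is an uncorrected proportion of differences. A minimal Python sketch, assuming two sequences already aligned to equal length in which '.' marks a single-nucleotide gap (the convention of the Figure 1 legend); the function name `p_distance` and the toy sequences are hypothetical, not taken from the paper:

```python
# Sketch of the uncorrected proportion of differences (P) between two
# aligned sequences. Assumption: '.' marks a single-nucleotide gap.

GAP = "."


def p_distance(seq1: str, seq2: str) -> float:
    """Return the proportion of differing sites between two aligned sequences.

    Columns with a gap in either sequence are omitted, and no correction
    for multiple substitutions at the same site is applied.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == GAP or b == GAP:
            continue  # insertion/deletion site: excluded from the calculation
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable (gap-free) sites")
    return differences / compared


if __name__ == "__main__":
    # Hypothetical toy fragments: 9 gap-free columns, 2 of which differ.
    print(p_distance("ACGU.ACGUA", "ACGAUACGUC"))
```

Skipping gap columns pairwise mirrors the statement that insertion and deletion sites were omitted for each two-sequence comparison.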

The divergence of the Ha. marismortui rrnA and rrnB 16S gene sequences (or portions thereof) was analyzed by comparison to the 16S gene sequences from the three out group species Hf. volcanii, Hb. cutirubrum and Hc. morrhuae (GUPTA, LANTER and WOESE 1983; HUI and DENNIS 1985; LEFFERS and GARRETT 1984). Using the normal approximation and a simple upper estimate for the variance of $p_{BC} - p_{AC}$, it can be shown that a $1 - \alpha$ confidence interval for $p_{BC} - p_{AC}$ (the theoretical, as opposed to the observed, difference between the proportion of sites which are not identical between B and C and the proportion of sites which are not identical between A and C) is

$$p_{BC} - p_{AC} \pm z_{\alpha/2}\,\frac{\left(p_{BC} + p_{AC} - 2\,p_{BC \cap AC}\right)^{1/2}}{L^{1/2}}$$

or

$$p_{BC} - p_{AC} \pm z_{\alpha/2}\,\frac{\sigma}{L^{1/2}}$$

where

$p_{BC}$ is the proportion of sites that differ between sequences B and C;

$p_{AC}$ is the proportion of sites that differ between sequences A and C;

$z_{\alpha/2}$ is the $\alpha/2$ cut-off value of the standard normal distribution (i.e., for a $1 - \alpha = 99\%$ confidence interval, $z_{0.005} = 2.58$);

$p_{BC \cap AC}$ is the proportion of sites that differ both between B and C and between A and C;

$\sigma^2$, the variance, is estimated as $p_{BC} + p_{AC} - 2\,p_{BC \cap AC}$;

$L$ is the number of sites in the compared sequences.

In the analysis, A denotes the rrnA sequence; B denotes the rrnB sequence; and C is one of the three out group sequences. When the confidence interval for $p_{BC} - p_{AC}$ does not contain the value 0, the difference $p_{BC} - p_{AC}$ is significantly different from 0 at confidence level $1 - \alpha$.
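
The confidence interval and significance criterion defined above can be computed directly from three aligned sequences. A minimal Python sketch; the names `difference_ci` and `is_significant` are hypothetical, and skipping any column with a gap ('.') in A, B or C is an assumption made here so that all three proportions share one value of L:

```python
# Sketch of the normal-approximation confidence interval for p_BC - p_AC,
# using the upper variance estimate sigma^2 = p_BC + p_AC - 2 * p_BC∩AC.
# Assumption: columns with a gap ('.') in A, B or C are skipped, and L is
# the number of remaining columns.

import math

GAP = "."


def difference_ci(seq_a: str, seq_b: str, seq_c: str, z: float = 2.58):
    """Return (p_BC - p_AC, (lower, upper)) for three aligned sequences."""
    if not (len(seq_a) == len(seq_b) == len(seq_c)):
        raise ValueError("sequences must be aligned to the same length")
    n_sites = n_bc = n_ac = n_both = 0
    for a, b, c in zip(seq_a, seq_b, seq_c):
        if GAP in (a, b, c):
            continue
        n_sites += 1
        differs_bc = b != c
        differs_ac = a != c
        n_bc += int(differs_bc)
        n_ac += int(differs_ac)
        n_both += int(differs_bc and differs_ac)
    if n_sites == 0:
        raise ValueError("no comparable (gap-free) sites")
    p_bc, p_ac, p_both = n_bc / n_sites, n_ac / n_sites, n_both / n_sites
    diff = p_bc - p_ac
    # Clamp at 0 to guard against floating-point rounding.
    sigma = math.sqrt(max(0.0, p_bc + p_ac - 2 * p_both))
    half_width = z * sigma / math.sqrt(n_sites)
    return diff, (diff - half_width, diff + half_width)


def is_significant(interval) -> bool:
    """True when the interval excludes 0, as in the criterion above."""
    lower, upper = interval
    return not (lower <= 0.0 <= upper)


if __name__ == "__main__":
    # Hypothetical toy sequences: B differs from C at 2 of 10 sites, A at none.
    diff, interval = difference_ci("AAAAAAAAAA", "CCAAAAAAAA", "AAAAAAAAAA")
    print(diff, interval, is_significant(interval))
```

With only ten sites the toy interval straddles 0, which illustrates why long alignments (L near 1472, or a whole domain) are needed before a difference of this size becomes significant.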

If the common ancestral 16S sequence of rrnA and rrnB is not shared by an out group species, the relative rate equations of SARICH and WILSON (1973) can be applied. These equations provide an estimate of the number of substitutions occurring in the rrnA and rrnB 16S sequences (or portions thereof) since their divergence from the common ancestral sequence. The equations are

$$p_{AO} = (p_{AC} + p_{AB} - p_{BC})/2$$

$$p_{BO} = (p_{BC} + p_{AB} - p_{AC})/2$$

$$p_{CO} = (p_{AC} + p_{BC} - p_{AB})/2$$

where $p_{AB}$, $p_{BC}$ and $p_{AC}$ are the proportions of nucleotide differences between sequences A and B, B and C, and A and C, respectively. A graphic representation of these equations is presented in Figure 5, I. If the basic assumption (that the out group does not share the rrnA-rrnB common ancestor) is correct, the solution of these equations estimates the evolutionary distance between each of the sequences A, B and C and sequence O, with the root of the tree lying in the C-O branch. For the present case, A is the rrnA sequence, B is the rrnB sequence and C is the sequence from one of the out group species, Hb. cutirubrum, Hf. volcanii or Hc. morrhuae.
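
The three relative rate equations translate directly into code. A minimal Python sketch; the function name and the input proportions in the usage block are hypothetical, chosen only to show that the estimated branch lengths add back up to the pairwise distances:

```python
# Sketch of the Sarich-Wilson relative rate equations given above.
# Inputs are pairwise proportions of nucleotide differences; outputs are
# the estimated branch lengths from A, B and C to the common node O.


def relative_rate_branches(p_ab: float, p_ac: float, p_bc: float):
    """Return (p_AO, p_BO, p_CO) for a three-sequence tree rooted on the C-O branch."""
    p_ao = (p_ac + p_ab - p_bc) / 2
    p_bo = (p_bc + p_ab - p_ac) / 2
    p_co = (p_ac + p_bc - p_ab) / 2
    return p_ao, p_bo, p_co


if __name__ == "__main__":
    # Hypothetical inputs (not values reported in the paper).
    p_ao, p_bo, p_co = relative_rate_branches(p_ab=0.05, p_ac=0.10, p_bc=0.13)
    # Each pairwise distance equals the sum of its two branches,
    # e.g. p_AO + p_BO == p_AB.
    print(round(p_ao, 4), round(p_bo, 4), round(p_co, 4))  # prints: 0.01 0.04 0.09
```

In this toy case p_BO exceeds p_AO, which is the pattern the analysis tests for when asking whether rrnB has diverged further than rrnA from their common ancestor.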

RESULTS

It has been previously demonstrated by Southern hybridization that the genomic DNA from a single colony isolate of Ha. marismortui contains two rRNA operons (MEVARECH et al. 1989; our unpublished results). One operon, present on an 8.0-kb HindIII-ClaI genomic restriction fragment (derived from a larger 20-kbp HindIII fragment), is designated rrnA, and the other operon, present on a 10-kb HindIII genomic fragment, is designated rrnB. Primary transcripts from both operons have been detected by S1 nuclease protection of DNA fragments that overlap the mature 16S and 23S rRNA coding regions of the two operons.

The complete nucleotide sequences of the rrnA and rrnB 16S rRNA genes have been determined (Figure 1). Both sequences are 1472 nucleotides in length. There are no nucleotide gaps in the alignment of the two sequences; however, the two 16S sequences differ by nucleotide substitution at 74 positions. These differences are not randomly distributed but rather are confined, with a single exception, to three domain regions within the universal secondary structure model for small subunit rRNA (WOESE et al. 1983). In the Ha. marismortui sequences, these domains are bounded by nucleotide base pairs at positions 56-321, 508-823 and 986-1158, and contain 10, 51 and 12 nucleotide differences, respectively. The corresponding Escherichia coli domains are bounded by nucleotide base pairs at positions 59-353, 570-880 and 1046-1211. The single difference between the rrnA and rrnB sequences that occurs outside of these domains is located within an apical tetraloop at position 1216 (E. coli position 1258).
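
The tally of differences by domain can be reproduced mechanically from two gap-free sequences of equal length. A minimal Python sketch, assuming 1-based inclusive positions and the Ha. marismortui domain boundaries given above (56-321, 508-823, 986-1158); the function and dictionary names are hypothetical:

```python
# Sketch: count substitutions between two gap-free, equal-length sequences,
# binned by domain. Positions are 1-based and boundaries inclusive.

DOMAINS = {
    "56-321": (56, 321),
    "508-823": (508, 823),
    "986-1158": (986, 1158),
}


def differences_by_domain(seq_a: str, seq_b: str, domains=DOMAINS) -> dict:
    """Return a count of differing positions per domain, plus 'outside'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    counts = {name: 0 for name in domains}
    counts["outside"] = 0
    for pos, (x, y) in enumerate(zip(seq_a, seq_b), start=1):
        if x == y:
            continue
        for name, (start, end) in domains.items():
            if start <= pos <= end:
                counts[name] += 1
                break
        else:  # no domain contains this position
            counts["outside"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 1472-nt sequences differing at five chosen positions.
    ref = "A" * 1472
    changed = {100, 600, 601, 1000, 1216}
    alt = "".join("G" if i in changed else "A" for i in range(1, 1473))
    print(differences_by_domain(ref, alt))
```

Applied to the actual rrnA and rrnB sequences, the counts reported in the text are 10, 51 and 12 within the three domains and one outside them.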

The sequences and universal secondary structure for the more variable 508-823 domain are illustrated in Figure 2. Most of the nucleotide differences between the rrnA and rrnB sequences are located within certain helical regions of RNA secondary structure and are compensatory, affecting both components of a nucleotide base pair. Other helical regions (for example, those bounded by nucleotides 612-671 and 708-749) and most single-stranded loop and bulge regions are nearly free of substitutions.

Expression of rrnA and rrnB 16S rRNAs: Protection of end-labeled DNA fragments against nuclease S1 digestion was used to demonstrate that both rrnA and rrnB 16S rRNA sequences are present in active 30S ribosome subunits. For the analysis, RNA was obtained from 70S ribosome particles isolated by sucrose density gradient sedimentation. In E. coli, and presumably also in halobacteria, only 30S subunits that are correctly assembled and active in translation are capable of associating with 50S subunits to form 70S particles (TAPPRICH et al. 1990). The DNA probes were two homologous but nonidentical 272-bp SacII-SmaI fragments that overlap a portion of the 508-823 variable domain (nucleotides 462-734 in Figure 1) and were isolated from the pHC8 and pHH10 clones. As a control, the homologous fragment from the clone p4W, containing the single copy 16S rRNA gene of Hb. cutirubrum, was also used (HUI and DENNIS 1986). The fragments were 5'-end labeled in the (-) strand (position 736) with polynucleotide kinase and [γ-32P]ATP. The labeled DNA fragments were hybridized to RNA, and the resulting RNA-DNA hybrids were treated with S1 nuclease. The protected DNA fragments were separated by acrylamide gel electrophoresis and visualized by autoradiography (Figure 3).
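
The S1 read-out can be reasoned about with a simplified model: a probe fully matched to the RNA is protected over its whole length, whereas a heterologous hybrid may be cleaved at mismatches, shortening the labeled product to the stretch between the labeled end and the cleavage point. A minimal Python sketch of that model, assuming the label sits at the high-numbered (SmaI) end of the probe and that each mismatch may be cut independently and incompletely, in line with the partial protection products noted in the Methods; the function name and mismatch positions are hypothetical, and real cleavage sites can be offset by a nucleotide or so:

```python
# Simplified model of S1 protection products for a 5'-end-labeled probe.
# Assumptions: the label is at the high-numbered end of the fragment, and
# S1 may cut (incompletely) at any mismatch, so each mismatch can give a
# labeled product running from the label back to that mismatch.


def expected_labeled_products(start: int, labeled_end: int, mismatches) -> list:
    """Return possible labeled product lengths (nt), longest first.

    start, labeled_end: 1-based inclusive coordinates of the probe.
    mismatches: positions where the RNA differs from the probe.
    """
    lengths = {labeled_end - start + 1}  # uncut, fully protected probe
    for m in mismatches:
        if start <= m < labeled_end:
            # Product covers positions m+1 .. labeled_end.
            lengths.add(labeled_end - m)
    return sorted(lengths, reverse=True)


if __name__ == "__main__":
    # SacII-SmaI probe spanning positions 463-734 (272 nt); the mismatch
    # positions are hypothetical, chosen only for illustration.
    print(expected_labeled_products(463, 734, [600, 700, 900]))
```

Under this model, a homologous RNA yields only the full-length product, while a heterologous RNA adds shorter bands whose sizes depend on where the two sequences differ.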


[Figure 1: aligned 16S rRNA gene sequences (rows Hma rrnA, Hma rrnB, Hvo, Hcu, Hmo); sequence data not recoverable from the extracted text]

FIGURE 1 .-Nucleotide sequence and align- ment of the 16s rRNA encoding genes. The com- plete nucleotide sequence of the Ha. marismortui 16s rRNA encoding gene from the r m A operon is depicted (Hma rrnA). Below is the rrnB 16s rRNA encoding sequence (Hma rrnB); only nucle- otides that differ from the rrnA sequence are in- dicated. Substitutions in the rrnB sequence that are compensatory, affecting both components of a base pair in a region of RNA secondary structure, are underlined. For comparison, the entire 16s sequences from Hf: volcanii (Hvo), Hb. cutirutrum (Hcu) and Hc. morrhuac (Hmo) also are included; again, only nucleotides that differ from the corre- sponding nucleotides in the rrnA sequence are indicated. Dashes (- - - - -) indicate nucleotides identical to the rrnA sequence; dots (*) indicate single nucleotide gaps in the sequence(s) required to maintain alignment and are the consequence of deletion or insertion events during evolutionary diversification. The Sac11 and Smal restriction sites bounding the fragment used for S1 nuclease pro- tection experiments are located at positions 463 and 734 respectively.


FIGURE 2.-Universal secondary structure for the 16S 508-823 domain. The Ha. marismortui 16S rRNA folded into the universal secondary structure is depicted in the upper right. The 58-321, 508-823 and 986-1158 domains are shadowed. The secondary structure for the region bounded by nucleotide positions 508 and 823 is illustrated for the rrnA sequence. Mutational differences between rrnA and rrnB, normal Watson-Crick base pairs (-), G-U base pairs and A-G base pairs are indicated by distinct symbols. The boxed nucleotides correspond to the positions in E. coli 16S rRNA protected from chemical modification by tRNA binding to the P site. Also indicated are the general regions that in E. coli 16S rRNA interact with the small subunit ribosomal proteins S6, S8, S11, S15, S18 and S21.

In principle, a full-length protection product is observed only if the DNA probe fragment is hybridized to an RNA with perfect sequence complementarity. Partial protection products arise from S1 cleavage of the labeled DNA probe within regions of the RNA-DNA duplex that contain one or more mismatched base pairs. Furthermore, the thermostability of RNA-DNA duplexes with perfect sequence complementarity will be greater than the stability of duplexes containing a substantial proportion of mismatches.

In practice, both pHC8- and pHH10-derived DNA fragments exhibited intense 272-nucleotide full-length products resulting predominantly from protection by Ha. marismortui RNA with perfect sequence complementarity. These full-length protection products were visible after hybridization at both 53° and 59° (Figure 3, lanes 1, 2, 7 and 8). Several clusters of partial-length protection products were also visible following hybridization at 53° but were diminished in intensity following hybridization at 59° (compare lane 1 with 2, and 7 with 8). These partial-length products, 54-63, 149-153 and 205-213 nucleotides in length, are produced by S1 cleavage of the probe in the vicinity of clustered nucleotide differences that exist between the rrnA and rrnB DNA sequences at positions 674-683, 584-588 and 524-532, respectively.
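The correspondence between product sizes and mismatch positions can be checked with simple arithmetic. The sketch below assumes, as our own reading of the Figure 3 legend rather than a stated rule, that the label sits at position 736 and that a partial product runs from the S1 cut site p to the label, giving a length of 736 - p + 1 nucleotides; the function names are illustrative only.

```python
LABEL_POSITION = 736  # assumed labeled end of the probe (minus strand)


def product_length(cut_position: int, label_position: int = LABEL_POSITION) -> int:
    """Length of the labeled fragment remaining after S1 cuts at cut_position."""
    return label_position - cut_position + 1


def product_range(first: int, last: int) -> tuple[int, int]:
    """Shortest and longest labeled products for a mismatch cluster first..last."""
    lengths = (product_length(first), product_length(last))
    return min(lengths), max(lengths)


# rrnA/rrnB mismatch clusters named in the text
for first, last in [(674, 683), (584, 588), (524, 532)]:
    print(f"{first}-{last}: {product_range(first, last)}")
```

Under this assumption the three rrnA/rrnB clusters give 54-63, 149-153 and 205-213 nucleotides. The Hb. cutirubrum control clusters (657-672 and 555-566) give 65-80 and 171-182, which agree only approximately with the observed 65-78 and 171-187, so the rule is best read as a rough guide to where S1 cuts.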

As a control, heterologous RNA from Hb. cutirubrum was hybridized to the Ha. marismortui pHC8 and pHH10 DNA probes at both 53° and 59° (lanes 3, 4, 4a, 9 and 10). At the higher temperature, very little


[Figure 3 autoradiograph: partial protection products labeled 205-213, 174-184, 149-153, 65-78 and 54-63 nucleotides; size scale from 272 to 75 nucleotides]

FIGURE 3.-Ribosomal RNA protection of end-labeled DNA fragments. The 272-nucleotide SacII-SmaI (nucleotide positions 463-734) fragments were isolated from the H. marismortui pHC8 (rrnA) and pHH10 (rrnB) clones and from the clone p4W containing the single-copy Hb. cutirubrum operon. The fragments were 5'-end labeled in the minus strand (position 736) and hybridized to Ha. marismortui RNA isolated from 70S ribosomes or to total Hb. cutirubrum RNA. Above each lane, 1-18, the source of RNA, the hybridization temperature and the source of the DNA are indicated. Abbreviations are as in Figure 1. Details of the S1 protection assay are described in MATERIALS AND METHODS. With the exception of lane 4a, the illustrated results are from a single experiment in which 500 units of S1 were used for digestion at 35°. For lane 4a, 300 units of enzyme at 34° were employed.

partial or full-length protection of the DNA probes was evident. At the lower temperature, however, partial protection products 65-78 and 171-187 nucleotides in length were clearly visible. Again, these products are produced by S1 cleavage of the DNA probes in the vicinity of clustered nucleotide differences that exist between the Hb. cutirubrum sequence and both the rrnA and rrnB DNA sequences at nucleotide positions 657-672 and 555-566.

As a second control, the DNA probe from Hb. cutirubrum was hybridized to either homologous Hb. cutirubrum or heterologous Ha. marismortui RNA at 53° and 59°. With homologous RNA, predominantly full-length product was visible at both temperatures, whereas with heterologous RNA, little or no full-length protection product was visible (lanes 13, 14, 15 and 16); however, at 53° heterologous RNA resulted in partial-length protection products 65-78 and 171-187 nucleotides in length. These correspond to S1 cleavage of the Hb. cutirubrum DNA probe in the vicinity of clustered nucleotide differences that exist between the Hb. cutirubrum sequence and both the rrnA and rrnB DNA sequences at nucleotide positions 657-672 and 555-566. Finally, for all three DNA probes, yeast tRNA failed to produce either full-length or partial-length protection products.

These results clearly demonstrate that 16S rRNA sequences derived from both the rrnA and rrnB 16S rRNA genes of Ha. marismortui are present in 70S ribosome particles. Particles 70S in size have initiated


protein synthesis and are presumed to be active in elongation (TAPPRICH et al. 1990). At high stringency, both DNA probes exhibit substantial full-length protection. At lower stringency, both DNA probes exhibit many of the partial-length protection products expected from S1 cleavage within regions of RNA-DNA sequence noncomplementarity. The relative intensities of the autoradiographic bands suggest that the rrnA and rrnB RNA sequences are equally represented in 70S ribosomes.

We have attempted to verify this important result by using RNA as template in a primer extension-reverse transcription sequencing reaction, but have been unable to generate an accurate and reliable sequence, primarily because of RNA secondary structure and the high G + C content of the template RNA. OREN, LAU and FOX (1988) used reverse transcription to determine a portion of the 16S rRNA sequence of Ha. marismortui; however, their sequence contains a large number of gaps and unidentified and misidentified nucleotides. At the 50 positions of rrnA and rrnB DNA sequence heterogeneity where they were able to specify a base, 24 correspond to the rrnA sequence and 26 correspond to the rrnB sequence. This example clearly illustrates a potential error that can be encountered when using reverse transcriptase sequencing of small subunit rRNA in phylogenetic and systematic studies.
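The reasoning behind the 24-versus-26 split can be made concrete: if a template is a mixture of rrnA and rrnB RNA, a base called at each heterogeneous position will match one allele or the other roughly at random. The toy sketch below tallies such calls; the positions and bases are hypothetical and are not taken from Figure 1 or from OREN, LAU and FOX (1988).

```python
def tally_alleles(calls: dict, rrnA: dict, rrnB: dict) -> dict:
    """Count which allele each called base matches at rrnA/rrnB heterogeneous sites.

    All three arguments map alignment position -> base. A missing position
    or an 'N' call is counted as unread.
    """
    counts = {"rrnA": 0, "rrnB": 0, "other": 0, "unread": 0}
    for pos, base_a in rrnA.items():
        base = calls.get(pos, "N")
        if base == "N":
            counts["unread"] += 1
        elif base == base_a:
            counts["rrnA"] += 1
        elif base == rrnB[pos]:
            counts["rrnB"] += 1
        else:
            counts["other"] += 1  # neither allele: a misidentified base
    return counts


# Hypothetical heterogeneous positions (not real data)
rrnA_alleles = {10: "A", 20: "G", 30: "C", 40: "U"}
rrnB_alleles = {10: "G", 20: "A", 30: "U", 40: "C"}
rt_calls = {10: "A", 20: "A", 30: "N"}
print(tally_alleles(rt_calls, rrnA_alleles, rrnB_alleles))
```

A read from a pure template would fall almost entirely into one column; a near-even split between the rrnA and rrnB columns is what a mixed template would be expected to give.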

Evolution and divergence of 16S rRNA sequences: To gain insight into the functional and evolutionary significance of these nucleotide differences, the two Ha. marismortui 16S rRNA sequences were compared to the 16S rRNA sequences from three closely related genera of halophilic archaebacteria: Haloferax volcanii, Halobacterium cutirubrum and Halococcus morrhuae (GUPTA, LANTER and WOESE 1983; HUI and DENNIS 1985; LEFFERS and GARRETT 1984). Although the Hf. volcanii genome also contains two rRNA operons, the sequences of its two 16S genes are, as expected, identical or nearly identical. The other two species contain only single rRNA operons. In Figure 1, these 16S rRNA gene sequences from Hc. morrhuae, Hb. cutirubrum and Hf. volcanii are aligned to the Ha. marismortui rrnA and rrnB sequences. Only nucleotides that differ from the rrnA sequence are indicated.

Analysis of the pairwise alignments of the five sequences is summarized in Table 1, and the distribution of differences along the length of the 16S primary sequence for some of the comparisons is illustrated in Figure 4. The two most similar 16S sequences, exhibiting 5.0 substitutions per 100 nucleotides, are the rrnA and rrnB sequences of Ha. marismortui. All other pairwise comparisons show substantially less similarity, exhibiting between 9.8 and 13.2 substitutions per 100 nucleotides. In both intra- and interspecies sequence comparisons, differences caused by transitions outnumber transversions by about two to one, and about 60% of the substitutions are compensatory, affecting both components of a nucleotide base pair in a region of RNA helical structure. In contrast to the intraspecies rrnA-rrnB case, the nucleotide differences in all interspecies pairwise alignments are not concentrated in specific domains but rather are more generally distributed throughout the 16S sequence (see Figures 1 and 4). For the three out group species, the 508-823 domain exhibits essentially the same frequency of nucleotide substitution as the entire 16S molecule (11.7-12.0 vs. 11.4-14.2 substitutions per 100 nucleotides; Table 1, A and B).
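The quantities summarized in Table 1 and Figure 4 (differences per 100 nucleotides, the transition/transversion split, and the number of differences in each 10-nucleotide interval) can all be obtained from one pass over a pairwise alignment. The sketch below is a minimal version under our own assumptions, not the authors' program: the sequences are already aligned, '.' or '-' marks a gap and such columns are excluded from L, and T is treated as U.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}
GAPS = {".", "-"}


def substitution_type(x: str, y: str):
    """Return 'Ts' for a transition, 'Tv' for a transversion, None if identical."""
    if x == y:
        return None
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "Ts"
    return "Tv"


def compare_aligned(seq1: str, seq2: str, window: int = 10) -> dict:
    """Summarize differences between two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    seq1 = seq1.upper().replace("T", "U")
    seq2 = seq2.upper().replace("T", "U")
    ts = tv = compared = 0
    bins = [0] * ((len(seq1) + window - 1) // window)
    for i, (x, y) in enumerate(zip(seq1, seq2)):
        if x in GAPS or y in GAPS:
            continue  # gap columns are not counted in L
        compared += 1
        kind = substitution_type(x, y)
        if kind is None:
            continue
        bins[i // window] += 1  # Figure 4-style count per window
        if kind == "Ts":
            ts += 1
        else:
            tv += 1
    total = ts + tv
    return {"L": compared, "Ts": ts, "Tv": tv, "Total": total,
            "per_100": 100 * total / compared if compared else 0.0,
            "bins": bins}


# Toy alignment (not real 16S data)
summary = compare_aligned("ACGUACGUACGGGGGCCCCC", "GCGUACGUAUGGGGGCCCCA")
print(summary)
```

Applied to the counts in Table 1A, the same definition gives 74/1472, or about 5.0 differences per 100 nucleotides, for rrnA/rrnB.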

Taken together, these comparisons suggest the following picture: the 16S rRNA genes present in the four related halophilic archaebacterial species, Ha. marismortui, Hf. volcanii, Hb. cutirubrum and Hc. morrhuae, would appear to have diverged from a common ancestral 16S sequence within a relatively short period of evolutionary time. It is not clear whether the ancestral halophilic 16S sequence was present in one copy per genome in the ancestral organism and subsequently duplicated in the Ha. marismortui and Hf. volcanii lineages or, vice versa, was present in two copies per genome in the ancestor and subsequently reduced to one copy by deletion in the Hb. cutirubrum and Hc. morrhuae lineages.

The intraspecies divergence of the rrnA and rrnB 16S sequences is more difficult to understand and explain. This is because intracellular and intraspecies processes such as selection, recombination, conversion, etc., have impeded divergence and contributed to homogenization over all or part of the 16S gene sequences since their initial duplication; however, based on nucleotide sequence similarity alone, it would appear that the divergence of the rrnA and rrnB 16S sequences from each other is more recent than their divergence from the other three sequences.

When the 508-823 domains of rrnA and rrnB are compared to the corresponding domains in the three out group species, several features emerge. First, the number of substitutions between rrnA and rrnB in this domain is greater than the number of substitutions in the corresponding domain comparisons between the other three species (i.e., 16.1 vs. 11.4-14.2 substitutions per 100 nucleotides, Table 1B). Second, pairwise comparisons of the 508-823 domain of rrnA and rrnB with the corresponding domain of the three out group species indicate that the rrnB domain is more different than is the rrnA domain (15.8-18.7 vs. 9.8-13.3 substitutions per 100 nucleotides, Table 1B). For the comparisons to the Hb. cutirubrum and Hc. morrhuae sequences, these differences are significant at greater than the 99% level for both the 508-823 domain and the entire 16S molecule. (For an


TABLE 1

Comparison of the nucleotide sequences of the 16S rRNA and the 508-823 16S domain from halophilic archaebacterial species

                               Nucleotide differences
Comparison      L       Ts     Tv     Total    P        P_AC∩BC    σ²       P_BC - P_AC ± C.I.

A. Complete 16S sequence
rrnA/rrnB       1472    54     20     74       0.050
rrnA/Hvo        1470    118    54     172      0.117
rrnB/Hvo        1470    124    60     184      0.125    0.098      0.048    0.008 ± 0.015
rrnA/Hcu        1471    94     50     144      0.098
rrnB/Hcu        1471    120    62     182      0.124    0.088      0.046    0.026 ± 0.014*
rrnA/Hmo        1471    106    56     162      0.110
rrnB/Hmo        1471    128    66     194      0.132    0.099      0.034    0.022 ± 0.012*
Hvo/Hcu         1471    121    56     177      0.120
Hvo/Hmo         1470    107    66     173      0.118
Hcu/Hmo         1470    115    51     172      0.117

B. 508-823 domain
rrnA/rrnB       316     34     17     51       0.161
rrnA/Hvo        316     29     13     42       0.133
rrnB/Hvo        316     34     16     50       0.158    0.073      0.145    0.025 ± 0.055
rrnA/Hcu        316     24     8      32       0.101
rrnB/Hcu        316     40     19     59       0.187    0.066      0.156    0.086 ± 0.057*
rrnA/Hmo        316     16     15     31       0.098
rrnB/Hmo        316     31     26     57       0.180    0.070      0.138    0.082 ± 0.054*
Hvo/Hcu         316     40     5      45       0.142
Hvo/Hmo         316     29     11     40       0.127
Hcu/Hmo         316     29     7      36       0.114

The abbreviations used are as follows: L is the length of the nucleotide sequences compared; Ts is the number of transition substitutions; Tv is the number of transversion substitutions; Total is the total number of substitutions; P is the proportion of nucleotide differences between the two compared sequences; $P_{AC\cap BC}$ is the proportion of sites that differ both between sequences B and C and between A and C. Sequence A is rrnA, B is rrnB and C is the out-group sequence. $\sigma^2 = P_{BC} + P_{AC} - 2P_{AC\cap BC}$; $P_{BC} - P_{AC} \pm \mathrm{C.I.}$ is the difference in the proportion of nucleotide substitutions between sequences B and C and between sequences A and C. The confidence interval (C.I.), calculated as $\pm Z_{\alpha/2}\,\sigma/\sqrt{L}$, is for $\alpha = 1\%$ (i.e., $Z_{\alpha/2} = 2.58$). The asterisk (*) denotes confidence intervals that do not contain the value 0.

[Figure 4 plot: number of differences per 10-nucleotide interval; x-axis, nucleotide position within 16S rRNA]

FIGURE 4.-Distribution of nucleotide sequence differences within 16S rRNA. The number of nucleotide sequence differences over 10-nucleotide increments was determined for five different pairwise 16S sequence comparisons. The two sequences in each pair are indicated on the right; abbreviations are as in the legend to Figure 1.

explanation of the statistical test, see MATERIALS AND METHODS.)
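The test summarized in the Table 1 footnote can be written out directly. The sketch below uses only the footnote's definitions, $\sigma^2 = P_{BC} + P_{AC} - 2P_{AC\cap BC}$ and a half-width of $Z_{\alpha/2}\,\sigma/\sqrt{L}$ with $Z_{\alpha/2} = 2.58$; because it is fed the rounded proportions printed in Table 1, its half-widths can differ from the table in the last digit. The function name is our own.

```python
import math


def relative_difference_ci(p_ac: float, p_bc: float, p_ac_bc: float,
                           length: int, z: float = 2.58):
    """Return P_BC - P_AC, its confidence half-width, and whether 0 is excluded."""
    sigma_sq = p_bc + p_ac - 2 * p_ac_bc
    diff = p_bc - p_ac
    half_width = z * math.sqrt(sigma_sq / length)
    significant = abs(diff) > half_width  # interval does not contain 0
    return diff, half_width, significant


# 508-823 domain, C = Hcu (rounded values from Table 1B)
print(relative_difference_ci(0.101, 0.187, 0.066, 316))
# Complete 16S sequence, C = Hvo (rounded values from Table 1A)
print(relative_difference_ci(0.117, 0.125, 0.098, 1470))
```

The first call reproduces 0.086 ± 0.057, an interval that excludes 0. The second gives a half-width of about 0.014 rather than the printed 0.015, a rounding effect, and the interval still includes 0.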

If the common ancestor of the rrnA and rrnB sequences is more recent than the common ancestor they share with any of the out group species, then the observations above imply that within the 508-823 domain, the rrnB sequence is diverging more rapidly from the

three out group sequences than is the rrnA sequence. Using the relative rate equations of SARICH and WILSON (1973), it is possible to estimate the rate at which the rrnA and rrnB sequences are diverging from a common ancestral sequence relative to a third sequence. The results of these calculations are summarized in Figure 5; the point of trifurcation represents the common ancestral sequence of A (rrnA) and B (rrnB). If the basic assumption holds (i.e., that rrnA and rrnB share a common ancestor sequence not shared with any of the out group sequences), it is clear from these estimates that the 508-823 domain of the rrnB 16S sequence is accumulating nucleotide substitutions about 1.5-fold more rapidly than the rrnA domain when compared to Hf. volcanii (tree II), and approximately threefold more rapidly when compared to the Hb. cutirubrum or Hc. morrhuae domains (trees III and IV). Alternatively, it is possible that the divergence of rrnA and rrnB occurred early and that the 16S rRNA sequences present in the three species Hf. volcanii, Hb. cutirubrum and Hc. morrhuae were derived from the rrnA lineage.
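One standard way to solve the three-sequence relative rate problem is to treat each pairwise distance as the sum of two branch lengths meeting at the ancestral node O, so that d_AB = OA + OB, d_AC = OA + OC and d_BC = OB + OC. The sketch below solves those simultaneous equations with the 508-823 domain values from Table 1B (substitutions per 100 nucleotides). It is our reconstruction; the exact formulation used for Figure 5 is given in MATERIALS AND METHODS, so the numbers need not match the figure exactly.

```python
def three_taxon_branches(d_ab: float, d_ac: float, d_bc: float):
    """Solve d_ab = a + b, d_ac = a + c, d_bc = b + c for branch lengths a, b, c."""
    a = (d_ab + d_ac - d_bc) / 2  # O to A (rrnA)
    b = (d_ab + d_bc - d_ac) / 2  # O to B (rrnB)
    c = (d_ac + d_bc - d_ab) / 2  # O to C (out group)
    return a, b, c


D_AB = 16.1  # rrnA vs rrnB, 508-823 domain
# out group -> (distance to rrnA, distance to rrnB), per 100 nucleotides
OUTGROUPS = {"Hvo": (13.3, 15.8), "Hcu": (10.1, 18.7), "Hmo": (9.8, 18.0)}

for name, (d_ac, d_bc) in OUTGROUPS.items():
    a, b, c = three_taxon_branches(D_AB, d_ac, d_bc)
    print(f"{name}: OA={a:.2f} OB={b:.2f} OC={c:.2f} OB/OA={b / a:.1f}")
```

The ratios OB/OA come out near 1.4 for Hf. volcanii and 3.1-3.3 for Hb. cutirubrum and Hc. morrhuae, in line with the roughly 1.5-fold and threefold rates quoted above.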

DISCUSSION

It is generally observed that multiple rRNA encoding operons in the genome of most organisms exhibit


FIGURE 5.-Relative rates of nucleotide sequence divergence within the 508-823 16S rRNA domain. Relative rates of sequence divergence, measured as the proportion of nucleotide sequence difference within the 508-823 domain, were calculated using the simultaneous equations of SARICH and WILSON (1973) as described in MATERIALS AND METHODS. For the topological representations in II, III and IV, it is assumed that the intersection point, O, represents the common ancestral sequence of rrnA and rrnB 16S rRNA. The branch lengths (and corresponding numbers) represent the calculated number of substitutions (per 100 nucleotides) between O and the three contemporary sequences. To be valid, the roots of these trees must lie in the OC branch (see text). The species abbreviations are as in the legend to Figure 1.

perfect or nearly perfect sequence identity. The mechanisms for maintaining this sequence homogeneity are not understood, but presumably they involve selection, recombination or gene conversion-like events. These events maintain integrity and uniformity within the translation apparatus and contribute within a species to the evolutionary process referred to as molecular drive (DOVER 1982, 1987).

Sequence heterogeneity of rRNA: Several examples of low-level rRNA sequence heterogeneity in a variety of species spanning the entire evolutionary spectrum have been reported. In E. coli, which has seven rRNA operons in its genome, eight sites of sequence heterogeneity have been observed in the analysis of bulk 23S rRNA (BRANLANT et al. 1981). This corresponds to about one site per operon. Three rRNA operons from Rhodobacter sphaeroides have been sequenced. The three 16S genes were identical, whereas the 23S genes exhibited a single nucleotide substitution in one gene and three single nucleotide deletions in a second gene. The 5S genes exhibited slightly more heterogeneity; one 5S gene differed at four positions from the other two genes (DRYDEN and KAPLAN 1990). The mitochondrial genome of Tetrahymena contains two genes encoding large subunit rRNA; the two genes differ at 5 of 2595 positions, and both genes are expressed (HEINONIN, SCHNARE and GRAY 1990). Similarly, the analysis of a number of recombinant clones carrying human rDNA repeats has revealed a single site of heterogeneity in small subunit genes and several sites of length variation within simple sequence tracts in large subunit genes (MADEN et al. 1987). It has been suggested that these


length variants are the result of slippage during DNA replication. The regions in which they occur are apparently not critical for rRNA structure or function and are undergoing rapid evolutionary divergence within the mammalian group. None of the heterogeneities in these examples is believed to have functional significance.

A more interesting example has been described in the blood parasite Plasmodium berghei (GUNDERSON et al. 1987). This organism contains two distinctly different types of small subunit rRNA genes that differ by substitution at 72 of 2059 (3.5%) aligned nucleotide positions. In addition, there are 35 gap positions caused by deletion or insertion events in the alignment of the two sequences. The two gene types are differentially regulated; one is expressed in sporozoites found in the insect host, whereas the other is expressed in the asexual stage found in the bloodstream of mammals. It has been suggested that this differential expression may influence the types of protein synthesized in the different developmental stages of the complex life cycle of Plasmodium. More recently a second species, P. falciparum, has been analyzed (MCCUTCHAN et al. 1988). It also has two types of 18S rRNA genes, which differ at about 17% of their nucleotide positions; only one of the two genes is expressed in the asexual bloodstream parasite. The type of rRNA expressed in the insect-stage sporozoite has not yet been determined.

As with the two Ha. marismortui 16S sequences, the distribution of the differences between the two small subunit sequences of P. berghei is not random. Virtually all the substitution and gap differences fall


within two domains of the universal secondary structure model for small subunit RNA (GUNDERSON et al. 1987; WOESE et al. 1983). Interestingly, these two domains are the homologs of the 56-321 and 508-823 domains in Ha. marismortui 16S rRNA that also exhibit sequence heterogeneity.

When the rrnA and rrnB sequences are compared to the 16S gene sequences from the related halophilic species Hf. volcanii, Hb. cutirubrum and Hc. morrhuae, several unusual features become apparent. In the three out group interspecies 16S comparisons, differences are distributed throughout the sequence; the 508-823 domain is no more sensitive to substitution than is the entire 16S gene (see Figure 4). When the rrnA 16S sequence is compared to the three out group 16S sequences, it appears to be diverging in a normal fashion, and the 508-823 domain is in no way unusual. In contrast, when the rrnB sequence is compared to the out group 16S sequences, the 508-823 domain is clearly diverging more rapidly than the rest of the sequence. Application of the relative rate test (SARICH and WILSON 1973) indicates that substitutions are accumulating in the 508-823 domain about 1.5- to 3-fold more rapidly in rrnB than in rrnA. As an alternative explanation, the rrnA and rrnB sequences may have diverged early, with the other three 16S rRNA sequences being derived from the rrnA lineage.

Structure-function of 16S rRNA: Several structural models which predict higher order RNA-RNA and RNA-protein interactions within the 30S subunit of the E. coli ribosome have been proposed (WOESE et al. 1983; STERN, WEISER and NOLLER 1988; NOLLER et al. 1990; BRIMACOMBE et al. 1990; OAKES et al. 1990). Because of evolutionary conservation, it seems likely that Ha. marismortui 30S subunits containing either the rrnA or rrnB 16S sequences will retain the important structural and functional features necessary for protein synthesis. None of the 74 nucleotide differences between the rrnA and rrnB 16S rRNA sequences occur at positions that have been identified as functionally important for interaction with tRNA, mRNA, or factors during the protein synthesis cycle. Nor do the substitutions significantly alter predicted secondary structures or affect nucleotide sites proposed to be important for tertiary interactions. With the exception of the tendency of substitutions to cluster, particularly within the 508-823 domain, the nucleotide differences are characteristic of those seen between 16S rRNA sequences of closely related genera; none of the substitutions occur at nucleotide positions that tend to be highly conserved during evolution.

In Ha. marismortui rrnA and rrnB 16S rRNA, the two interrupted helical regions bounded by nucleotides 612-672 and 708-749 are invariant except for

a pair of compensatory substitutions occurring at positions 710/747 (Figure 2). The apical loops of these helices correspond to the loops that in E. coli 16S rRNA contain nucleotides (boxed in Figure 2) protected from chemical modification by tRNA binding to the P site of the ribosome (MOAZED and NOLLER 1989). In the 30S subunit, these two loops are in close proximity and form a ring-like structure that defines the platform on the top of the 30S subunit (OAKES et al. 1990). In E. coli, the sequence UUAGAU in the apical loop of the second helix, adjacent to the two tRNA P site nucleotides, has been proposed to interact with a complementary sequence in 23S rRNA during 70S particle formation (TAPPRICH et al. 1990). The homologous and identical sequence, UUAGAU, is conserved in both the rrnA and rrnB 16S rRNAs (positions 727-732, Figure 2), and the complementary hexanucleotide is also conserved in Ha. marismortui 23S rRNA at positions 2788-2783 (BROMBACH et al. 1989). Ribosomal proteins S6, S11, S15 and S21 bind to sites within one or both of these helical regions, as indicated in Figure 2. [For more detail, see reviews by NOLLER et al. (1990), OAKES et al. (1990) and BRIMACOMBE et al. (1990).]

In contrast, the two helical regions bounded by nucleotides 526-588 and 768-800 exhibit many differences between rrnA and rrnB; however, almost all of the differences are compensatory, affecting both components of nucleotide base pairs within the helices (Figure 2). The base of the 526-588 helix corresponds to the region in E. coli 16S rRNA where protein S8 binds, and the apical portion of the 768-800 helix corresponds to the region where protein S18 binds. These E. coli proteins are capable of forming specific complexes with Hb. cutirubrum and several other archaebacterial 16S rRNAs (THURLOW and ZIMMERMANN 1982). The protein-binding sites in the archaebacterial RNAs contain short regions of perfect or nearly perfect sequence identity to E. coli 16S rRNA and can be folded into a common secondary structure despite base changes at approximately half of the congruent positions. This implies that the S8 and S15 binding sites are highly conserved between eubacteria and archaebacteria but that at least some base substitutions can be tolerated without loss of protein-binding activity (WOESE et al. 1983). From two-dimensional gel, protein and DNA analyses, there is no evidence that either the S8 or S15 genes are duplicated in Ha. marismortui (WITTMANN-LIEBOLD et al. 1990; SCHOLZEN and ARNDT 1991; ARNDT et al. 1991). Thus, single S8 and probably also S15 proteins appear able to bind to both the rrnA and rrnB 16S sequences.
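The distinction drawn here between compensatory and single substitutions can be expressed as a small predicate. In this sketch, which encodes an assumption of ours rather than the authors' criterion, a change at a helical position counts as compensatory when both partners of the pair differ between rrnA and rrnB and both versions are Watson-Crick or G-U pairs; ALLOWED_PAIRS and the example bases are hypothetical.

```python
# Watson-Crick and G-U wobble pairs, written 5' partner first
ALLOWED_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
                 ("C", "G"), ("G", "U"), ("U", "G")}


def forms_pair(x: str, y: str) -> bool:
    """True if bases x and y can form a Watson-Crick or G-U pair."""
    return (x, y) in ALLOWED_PAIRS


def is_compensatory(pair_a: tuple[str, str], pair_b: tuple[str, str]) -> bool:
    """True if both bases of a helical pair changed and both versions still pair."""
    both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
    return both_changed and forms_pair(*pair_a) and forms_pair(*pair_b)


print(is_compensatory(("G", "C"), ("A", "U")))  # both partners changed, pairing kept
print(is_compensatory(("G", "C"), ("G", "U")))  # only one partner changed
print(is_compensatory(("G", "C"), ("A", "G")))  # both changed, but pairing lost
```

Because the Figure 2 legend also marks A-G pairs, a looser or stricter pairing table could be substituted for ALLOWED_PAIRS depending on how such pairs are to be treated.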

Evolutionary origin and significance of 16S sequence heterogeneity: The extent and pattern of nucleotide sequence differences between the duplicate 16S rRNA encoding genes of Ha. marismortui


raise several intriguing questions: How did the sequence heterogeneity originate, and what are the selective forces that have allowed it to be maintained and propagated? One simple explanation is that Ha. marismortui has lost the ability to homogenize its duplicated 16S rRNA genes through recombination or gene conversion-like mechanisms and that the two sequences are evolving separately but are still subject to certain functional constraints; that is, mutations that occur at functionally unimportant positions are sometimes fixed by random processes, whereas mutations at important positions are usually eliminated by negative selection. This scenario explains neither the obvious clustering of mutational differences between the two sequences nor the apparent accelerated rate of mutation within the rrnB sequence.

Halobacteria are known to engage in cell contact-dependent genetic exchange (ROSENSHINE, TCHELET and MEVARECH 1989). As a more interesting and complex possibility, one could imagine that Ha. marismortui represents a genetic chimera, produced by an exchange event between related organisms, that has survived with a duality in its protein synthesis machinery. In the most extreme case (the likelihood of which seems remote), ribosomes would play an active role in the efficiency, accuracy or regulation of protein synthesis through subtle interactions with mRNAs. Maintaining this duality would be essential for full expression of the chimeric genome; that is, the two types of ribosome would differentially translate the separate proteins encoded by the two halves of the chimeric genome. In this case, the rrnA and rrnB sequence differences would identify regions uniquely involved in these subtle processes. Important nucleotide differences in these regions would be maintained and possibly even expanded by positive selection. Other regions of the 16S molecule not involved might be subjected to sequence homogenization. This could explain not only the clustering of mutational differences but also the apparent accelerated rate of change evident in the rrnB 16S gene sequence. This situation would be similar in many respects to that proposed for the bloodstream parasite P. berghei; however, instead of being developmentally regulated, in Ha. marismortui both types of 30S ribosome subunits are present simultaneously in the archaebacterial cytoplasm.

The fact that virtually all of the differences between the rrnA and rrnB sequences occur at phylogenetically variable positions strongly argues against any structural or functional importance of these positions in protein synthesis. If one accepts that the 16S sequence heterogeneity originated through interspecies genetic exchange, one could explain the current localization of these differences by assuming that conversion events have obliterated nucleotide differences in important regions of the molecule and that the differences that are retained are inconsequential relative to ribosome function.

This work was supported by grant MT 6340 from the Medical Research Council of Canada. P.P.D. is a fellow in the Evolutionary Biology Program of the Canadian Institute for Advanced Research. We thank ED PERKINS for suggestions concerning the statistical analysis, and members of our laboratory group for their suggestions and criticisms. We especially thank JANET CHOW and DEIDRE DE JONG WONG for their contributions to DNA sequence determinations and subclone constructions.

LITERATURE CITED

ARNDT, E., T. SCHOLZEN, W. KROMER, T. HATAKEYAMA and M. KIMURA, 1991 Primary structures of ribosomal proteins from the archaebacterium Halobacterium marismortui and the eubacterium Bacillus stearothermophilus. Biochimie 73: 657-668.

BAYLEY, S. T., 1971 Protein synthesis systems from halophilic bacteria. Methods Mol. Biol. 1: 89-100.

BRANLANT, C., A. KROL, M. A. MACHATT, J. POUYET, J.-P. EBEL, K. EDWARDS and H. KOSSEL, 1981 E. coli 23S rRNA heterogeneity. Nucleic Acids Res. 9: 4303-4321.

BRIMACOMBE, R., B. GREUER, P. MITCHELL, M. OSSWALD, J. RINKE-APPEL, D. SCHULER and K. STADE, 1990 Three-dimensional structure and function of Escherichia coli 16S and 23S rRNA as studied by cross-linking techniques, pp. 93-106 in The Ribosome: Structure, Function and Evolution, edited by W. E. HILL, A. DAHLBERG, R. A. GARRETT, P. B. MOORE, D. SCHLESSINGER and J. R. WARNER. ASM Publications, Washington, D.C.

BROMBACH, M., T. SPECHT, V. A. ERDMANN and N. ULBRICH, 1989 Complete nucleotide sequence of a 23S ribosomal RNA gene from Halobacterium marismortui. Nucleic Acids Res. 17: 3293.

CHANT, J., and P. P. DENNIS, 1986 Archaebacteria: transcription and processing of ribosomal RNA sequences in Halobacterium cutirubrum. EMBO J. 5: 1091-1097.

DOVER, G. A., 1982 Molecular drive: a cohesive mode of species evolution. Nature 299: 111-117.

DOVER, G. A., 1987 DNA turnover and the molecular clock. J. Mol. Evol. 26: 47-58.

DRYDEN, S., and S. KAPLAN, 1990 Localization and structural analysis of the ribosomal RNA operons of Rhodobacter sphaeroides. Nucleic Acids Res. 18: 7267-7277.

GUNDERSON, J. H., M. L. SOGIN, G. WOLLETT, M. HOLLINGDALE, V. F. DE LA CRUZ, A. P. WATERS and T. F. MCCUTCHAN, 1987 Structurally distinct, stage-specific ribosomes occur in Plasmodium. Science 238: 933-937.

GUPTA, R., J. M. LANTER and C. R. WOESE, 1983 Sequence of the 16S ribosomal RNA from Halobacterium volcanii, an archaebacterium. Science 221: 656-659.

HEINONEN, T. Y. K., M. N. SCHNARE and M. W. GRAY, 1990 Sequence heterogeneity in the duplicated large subunit ribosomal RNA genes of Tetrahymena pyriformis mitochondrial DNA. J. Biol. Chem. 265: 22336-22341.

HUI, I., and P. P. DENNIS, 1985 Characterization of the ribosomal RNA gene cluster in Halobacterium cutirubrum. J. Biol. Chem. 260: 899-906.

LARSEN, H., 1984 Family V Halobacteriaceae, pp. 261-267 in Bergey's Manual of Systematic Bacteriology, Vol. I, edited by N. R. KRIEG and J. G. HOLT. Williams & Wilkins, Baltimore.

LEFFERS, H., and R. A. GARRETT, 1984 The nucleotide sequence of the 16S rRNA gene of the archaebacterium Hc. morrhuae. EMBO J. 3: 1613-1619.

MADEN, B. E., C. L. DENT, T. E. FARRELL, J. GARDE, F. MCCALLUM and J. A. WAKEMAN, 1987 Human rRNA heterogeneity. Biochem. J. 246: 519-527.


MCCUTCHAN, T. F., V. F. DE LA CRUZ, A. A. LAL, J. H. GUNDERSON, H. J. ELWOOD and M. L. SOGIN, 1988 Primary sequence of two small subunit ribosomal RNA genes from Plasmodium falciparum. Mol. Biochem. Parasitol. 28: 63-68.

MEVARECH, M., S. HIRSCH-TWIZER, S. GOLDMAN, S. YAKOBSON, H. EISENBERG and P. P. DENNIS, 1989 Isolation and characterization of the rRNA gene clusters of Halobacterium marismortui. J. Bacteriol. 171: 3479-3485.

MOAZED, D., and H. F. NOLLER, 1989 Intermediate states in the movement of transfer RNA in the ribosome. Nature 342: 142-148.

NOLLER, H. F., D. MOAZED, S. STERN, T. POWERS, P. ALLEN, J. ROBERTSON, B. WEISER and K. TRIMAN, 1990 Structure of rRNA and its functional interactions during translation, pp. 73-92 in The Ribosome: Structure, Function and Evolution, edited by W. E. HILL, A. DAHLBERG, R. A. GARRETT, P. B. MOORE, D. SCHLESSINGER and J. R. WARNER. ASM Publications, Washington, D.C.

OAKES, M. I., A. SCHEINMAN, T. ACHA, G. SHANKWEILER and J. LAKE, 1990 Ribosome structure: three-dimensional locations of rRNA and proteins, pp. 180-193 in The Ribosome: Structure, Function and Evolution, edited by W. E. HILL, A. DAHLBERG, R. A. GARRETT, P. B. MOORE, D. SCHLESSINGER and J. R. WARNER. ASM Publications, Washington, D.C.

OREN, A., P. P. LAU and G. E. FOX, 1988 The taxonomic status of Halobacterium marismortui from the Dead Sea: a comparison with Halobacterium vallismortis. Syst. Appl. Microbiol. 10: 251-258.

ROSENSHINE, I., R. TCHELET and M. MEVARECH, 1989 The mechanism of DNA transfer in the mating system of an archaebacterium. Science 245: 1387-1389.

SANGER, F., S. NICKLEN and A. R. COULSON, 1977 DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74: 5463-5467.

SARICH, V. M., and A. C. WILSON, 1973 Generation time and genomic evolution in primates. Science 179: 1144-1147.

SCHOLZEN, T., and E. ARNDT, 1991 Organization and nucleotide sequence of ten ribosomal protein genes from the region equivalent to the spectinomycin operon in the archaebacterium Halobacterium marismortui. Mol. Gen. Genet. 228: 70-80.

SHEVACK, A., H. S. GEWITZ, B. HENNEMANN, A. YONATH and H. G. WITTMANN, 1985 Characterization and crystallization of ribosomal particles from Halobacterium marismortui. FEBS Lett. 184: 68-71.

STERN, S., B. WEISER and H. F. NOLLER, 1988 Model for the three-dimensional folding of 16S ribosomal RNA. J. Mol. Biol. 204: 447-481.

TAPPRICH, W., H. U. GORINGER, E. DE STASIO, C. PRESCOTT and A. E. DAHLBERG, 1990 Studies of ribosome function by mutagenesis of Escherichia coli rRNA, pp. 236 in The Ribosome: Structure, Function and Evolution, edited by W. E. HILL, A. DAHLBERG, R. A. GARRETT, P. B. MOORE, D. SCHLESSINGER and J. R. WARNER. ASM Publications, Washington, D.C.

THURLOW, D. L., and R. A. ZIMMERMANN, 1982 Evolution of protein binding regions of archaebacteria, eubacteria and eucaryotic ribosomal RNAs, p. 347 in Archaebacteria, edited by O. KANDLER. Gustav Fischer Verlag, Stuttgart.

WITTMANN-LIEBOLD, B., A. KOPKE, E. ARNDT, W. KROMER, T. HATAKEYAMA and H. G. WITTMANN, 1990 Sequence comparison and evolution of ribosomal proteins and their genes, pp. 598-616 in The Ribosome: Structure, Function and Evolution, edited by W. E. HILL, A. DAHLBERG, R. A. GARRETT, P. B. MOORE, D. SCHLESSINGER and J. R. WARNER. ASM Publications, Washington, D.C.

WOESE, C. R., R. GUTELL, R. GUPTA and H. F. NOLLER, 1983 Detailed analysis of the higher-order structure of 16S-like ribosomal ribonucleic acids. Microbiol. Rev. 47: 621-669.

Communicating editor: W.-H. LI