The Complete Nucleotide Sequence of the Xenopus laevis ...


THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1985 by The American Society of Biological Chemists, Inc.

Vol. 260, No. 17, Issue of August 15, pp. 9759-9774, 1985 Printed in U.S.A.

The Complete Nucleotide Sequence of the Xenopus laevis Mitochondrial Genome*

(Received for publication, December 13, 1984)

Bruce A. Roe, Din-Pow Ma‡, Richard K. Wilson, and James F.-H. Wong§
From the Chemistry Department, University of Oklahoma, Norman, Oklahoma 73019

The complete sequence of the 17,553-nucleotide Xenopus laevis mitochondrial genome has been determined. A comparison of this amphibian mitochondrial genomic sequence with those of the mammalian mitochondrial genomes reveals a similar gene order and compact genomic organization. The encoded genes for 22 tRNAs, two ribosomal RNAs, and 13 proteins (COI, COII, COIII, ATPase 6, cytochrome b, and eight additional unidentified reading frames) in the amphibian mitochondria are highly homologous to their mammalian counterparts. Although the amphibian mitochondrial genome contains a significantly larger displacement loop region than the mammalian mitochondrial genomes, there are several regions of sequence homology near the putative sites for heavy and light strand transcription initiation and heavy strand replication. The unique mitochondrial genetic code observed in the mammalian mitochondrial systems appears to be shared by the X. laevis mitochondrial genome, as indicated by the presence of only 22 encoded tRNAs and the high degree of homology between the predicted protein sequences. However, the amphibian system exclusively utilizes AUG as the start codon in all 13 open reading frames and shows a preference for codons ending in U rather than ending in C. In addition, the X. laevis mitochondrial genome employs the encoded AGA stop codon once and the UAA stop codon three times and requires polyadenylation to provide the nine other UAA stop codons. These observations suggest that the mechanisms of replication, transcription, processing, and translation in mitochondria are highly conserved throughout higher vertebrates.

All eucaryotes examined to date contain a double-stranded, mitochondrial-specific DNA which encodes several mitochondrial components. In higher vertebrates, the mitochondrial DNA ranges in size from 14.5 to 19.5 kilobases (1) and codes for 22 tRNAs and two ribosomal RNAs which are required for the mitochondrial protein-synthesizing system. These genomes also encode at least five known polypeptides (three subunits of cytochrome c oxidase, ATPase subunit 6, and cytochrome b) and possibly eight additional polypeptides in other available unidentified open reading frames (URFs¹) (2-4).

* This work was supported in part by Grants GM-30399 and GM-30400 from the National Institutes of Health and a Biomedical Research Support grant from the University of Oklahoma. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

‡ Present address: Department of Biology, Texas A&M University, College Station, TX 77804.

§ Present address: Shell Development Corp., Modesto, CA 95368.

The complete nucleotide sequences of the human, bovine, and mouse mitochondrial genomes (2-4) and the restriction maps and partial nucleotide sequences of several other higher eucaryote mitochondrial genomes have been reported (5-12). These studies and additional hybridization experiments (13) reveal that higher vertebrate mitochondrial DNAs have common elements of overall gene organization but differ slightly in their nucleotide sequences. In contrast, the Drosophila yakuba mitochondrial ribosomal RNA genes and the origin of replication occupy similar relative positions (14-16), but the overall gene order differs from that observed for the mammalian mitochondrial genome (17).

The organization of higher eucaryote mitochondrial genomes is extremely economical. Genes for the 22 tRNAs are interspersed between the ribosomal RNA and protein coding regions, possibly serving as punctuation points for processing of the primary transcripts (18). With the exception of the displacement loop (D-loop) region, there are very few noncoding nucleotides and no apparent introns (2-4). Replication of the mitochondrial heavy (H) strand begins in the D-loop region, while that of the light (L) strand begins at a hairpin structure located in a cluster of five tRNA genes between the coding regions of COI and URF2. Transcription of both heavy and light strands is thought to be initiated at sites in the D-loop region, resulting in two large polycistronic precursor RNAs which are subsequently processed to mature tRNAs, rRNAs, and polyadenylated mRNAs (1, 20). The genetic code of the mitochondrial system differs dramatically from that of the nuclear, cytoplasmic, and procaryotic systems (21). This altered genetic code is due, in part, to the presence of only 22 mitochondrially encoded tRNAs (21-23) and was confirmed by protein sequence analysis of several mitochondrially encoded polypeptides (see Ref. 2). In addition, because of the highly conserved genomic organization, mitochondria have an unusual mechanism for introducing translation stop signals in mature mRNAs. Following cleavage from the primary transcript, pre-mRNAs ending in U or UA are polyadenylated to generate a putative UAA translation stop codon (2). In addition, pre-tRNAs undergo post-transcriptional addition of CCA to their 3' terminus to form mature, functional species.
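The altered code and the polyadenylation-completed stop codon can be illustrated with a minimal Python sketch. It assumes the vertebrate mitochondrial codon assignments as they are commonly tabulated (UGA read as tryptophan, AUA as methionine, AGA and AGG as stops); these assignments, the function names, and the toy transcripts are illustrative assumptions and are not taken from the X. laevis sequence.

```python
from itertools import product

BASES = "UCAG"  # conventional codon-table order; the first base varies slowest
# Amino acids for the 64 codons in UCAG order; '*' marks a stop codon.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Assumed vertebrate mitochondrial assignments (not derived in this paper).
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def build_table(amino_acids):
    """Map each RNA codon to its one-letter amino acid (or '*')."""
    codons = ("".join(c) for c in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))


STD_TABLE = build_table(STANDARD)
MITO_TABLE = build_table(VERT_MITO)


def translate(mrna, table):
    """Translate complete in-frame codons; a trailing partial codon is ignored."""
    return "".join(table[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


def polyadenylate(pre_mrna, tail_length=20):
    """Append a poly(A) tail to a cleaved pre-mRNA (hypothetical helper)."""
    return pre_mrna + "A" * tail_length


# Codons whose meaning differs between the two tables.
differing = sorted(c for c in STD_TABLE if STD_TABLE[c] != MITO_TABLE[c])

# Toy transcripts ending in U or UA: the stop codon exists only after tailing.
ends_in_u = polyadenylate("AUGGCUU", 5)    # AUG GCU UAA AAA
ends_in_ua = polyadenylate("AUGGCUUA", 4)  # AUG GCU UAA AAA
print(differing)
print(translate(ends_in_u, MITO_TABLE), translate(ends_in_ua, MITO_TABLE))
```

In this toy case neither pre-mRNA contains an in-frame stop codon until the first one or two A residues of the tail complete UAA. The abstract's tally of one AGA, three encoded UAA, and nine polyadenylation-completed UAA codons accounts for all 13 reading frames.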

Because we were interested in studying the evolutionary significance of the unique mitochondrial properties and developing a system for elucidating the events of mitochondrial RNA processing, we investigated an amphibian mitochondrial genome. We earlier presented the nucleotide sequence of the D-loop and the region containing the origin of L-strand synthesis (24), and now present the complete sequence of the 17,553-nucleotide Xenopus laevis mitochondrial genome. This amphibian mitochondrial genome shows a high homology in both overall gene order and nucleotide sequence with the mitochondrial genomes of other vertebrates. In addition, our results indicate that the unique mitochondrial genetic code and the overall mechanism of mitochondrial gene expression are conserved from mammals to amphibians.

¹ The abbreviations used are: URF, unidentified open reading frame; L-strand, light strand; H-strand, heavy strand.

MATERIALS AND METHODS

A recombinant plasmid (pXlm-31) containing the entire X. laevis mitochondrial genome inserted into the BamHI site of pBR322 (25) was a gift from Dr. I. Dawid (National Institutes of Health, Bethesda, MD). All further subcloning and DNA sequencing experiments were performed using this cloned mitochondrial genome.

The complete mitochondrial genomic insert was excised by restriction endonuclease digestion with BamHI, purified by preparative agarose gel electrophoresis, eluted by a modified freeze-thaw method, and concentrated by ethanol precipitation. After further digestion with selected restriction endonucleases, fragments of the Xenopus mitochondrial genome were ligated into either M13mp8, mp9, mp10, or mp11 (26). In several instances, large fragments, which lacked appropriate restriction sites, were gel-purified and treated with nuclease Bal31 for various time intervals prior to ligation into M13mp9 to obtain overlapping segments. After transfection into Escherichia coli strain JM101, the single-stranded recombinant phage DNAs containing fragments of the Xenopus mitochondrial genome were isolated by phenol extraction of the polyethylene glycol-concentrated phage (27).
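As an illustration of how candidate subcloning sites might be located in a sequence, the following minimal Python sketch scans for restriction enzyme recognition sequences. The recognition sequences listed are the widely documented ones for these enzymes and are not stated in this paper; the function name and the toy sequence are hypothetical.

```python
# Recognition sequences as commonly documented (assumed, not from this paper).
RECOGNITION_SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
}


def find_sites(sequence, site):
    """Return the 0-based start positions of every occurrence of `site`."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)  # step by 1 to allow overlaps
    return positions


# Toy linear sequence with two BamHI sites flanking one EcoRI site.
toy = "AAGGATCCTTGAATTCGGGGATCCAA"
site_map = {name: find_sites(toy, seq) for name, seq in RECOGNITION_SITES.items()}
print(site_map)
```

The sketch reports recognition-site positions only; it does not model the cut position within each site or the circular topology of the mitochondrial genome.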

All DNA sequence data were obtained by the dideoxynucleotide chain termination method of Sanger et al. (27, 28). The M13 subclones were selected first at random and later by hybridization to overlapping fragments cloned in M13 in the opposite orientation. All inserts were sequenced using a synthetic oligonucleotide primer. The complete sequence of contiguous regions was assembled from individual, overlapping subclones containing fragments in either orientation, using the programs described by Staden (29-31) but modified to run on an IBM-3081 computer. All regions of the X. laevis mitochondrial genome were subcloned into M13 vectors and sequenced in both orientations at least twice to remove discrepancies. The sequencing strategy and partial restriction map are shown in Fig. 1. Optimal homology among nucleotide sequences was determined by the NUCALN program of Wilbur and Lipman (32). All recombinant DNA experiments were performed in accordance with National Institutes of Health guidelines.
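The idea of assembling a contiguous sequence from overlapping subclones in either orientation can be sketched in a few lines of Python. This is a simplified illustration using exact suffix-prefix matching, not the Staden programs used in this work; the function names, the minimum overlap, and the toy reads are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement, for a read from the opposite orientation."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_overlap(left, right, min_overlap=4):
    """Join two reads on their longest exact suffix/prefix overlap.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases exists. Sequencing errors are not tolerated.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


read_a = "ACGTTGCA"
read_b_opposite = "GACCTGCA"                  # read from the opposite strand
read_b = reverse_complement(read_b_opposite)  # re-oriented to match read_a
contig = merge_overlap(read_a, read_b)
print(read_b, contig)
```

Real assembly must also tolerate base-calling discrepancies, which is why each region was sequenced in both orientations at least twice.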

RESULTS AND DISCUSSION

Genomic Organization-The overall gene order of the X. laevis mitochondrial genome, shown in Fig. 2, is virtually identical to the human, bovine, and mouse mitochondrial DNAs. Nearly all of the difference in size between the X. laevis and other vertebrate mitochondrial genomes is contained in the D-loop region (24). The complete sequence of the X. laevis mitochondrial genome is shown in Fig. 3 with nucleotide 1 representing the 5' end of the D-loop region. The H-strand (the complement of the L-strand shown in Fig. 3) encodes all but one of 13 potential open reading frames, both the 12 S and 16 S ribosomal RNAs, and 14 of 22 tRNA genes. With the exception of the D-loop region, the 36-nucleotide origin of L-strand replication, and 28 nucleotides separating the genes for threonine and proline tRNAs, the X. laevis mitochondrial DNA consists totally of putative coding regions.
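The compactness of the genome can be made concrete with a short worked calculation in Python. It uses only lengths stated in the text (the 17,553-nucleotide genome, the 2134-nucleotide D-loop, the 36-nucleotide L-strand origin, and the 28-nucleotide spacer) and, as a simplifying assumption, ignores the small intergenic spacers and gene overlaps indicated in Fig. 2. The variable names are illustrative.

```python
GENOME_LENGTH = 17_553  # nucleotides, from the text

# Noncoding regions named in the text, in nucleotides.
NONCODING_REGIONS = {
    "D-loop": 2134,
    "origin of L-strand replication": 36,
    "tRNA-Thr / tRNA-Pro spacer": 28,
}

noncoding_total = sum(NONCODING_REGIONS.values())
coding_total = GENOME_LENGTH - noncoding_total
noncoding_percent = 100 * noncoding_total / GENOME_LENGTH

print(f"noncoding: {noncoding_total} nt ({noncoding_percent:.1f}%)")
print(f"putative coding: {coding_total} nt")
```

Under these assumptions roughly one-eighth of the genome is noncoding, and nearly all of that lies in the D-loop.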

Displacement Loop and Origin of Heavy Strand Replication-The D-loop of the X. laevis mitochondrial genome is a 2134-nucleotide region flanked by the genes for tRNAPro and tRNAPhe. This region contains the site for the origin of H-strand replication as well as the putative major sites of transcriptional initiation for both strands of the genome (33). The mechanism of mitochondrial DNA replication requires the displacement of the H-strand by synthesis of a complementary DNA fragment, resulting in a temporary triple-stranded structure (19, 20, 34). H-strand replication proceeds unidirectionally from this DNA fragment without concomitant L-strand replication until approximately two-thirds of the H-strand has been synthesized. In the amphibian mitochondrial D-loop region, the triple-stranded structure contains a hybridized 14 S single-stranded DNA initiation segment, while in mammalian mitochondria, the D-loop contains a 7 S single-stranded DNA initiation segment. The amphibian 14 S DNA consists of at least two species, 1350 and 1510 nucleotides in length (35). The mammalian 7 S DNAs are smaller than their amphibian counterparts and are in the size range of 500-630 nucleotides (35, 36). This is consistent with the observation of a larger D-loop region in the X. laevis mitochondrial genome. Further comparisons of sequence and features of the X. laevis mitochondrial D-loop region with those of the mammalian mitochondria have been previously described (24).

FIG. 1. Sequencing strategy and partial restriction enzyme map of the X. laevis mitochondrial genome. The arrows show the region and direction of sequence determined for individual M13 subclones by the dideoxynucleotide chain termination method (27, 28). The numbers represent the genomic length in kilobases. Restriction endonucleases used in subcloning: A, AccI; B, AluI; C, AvaI; D, AvaII; E, BamHI; F, BglII; G, ClaI; H, EcoRI; I, HindPI; J, HincII; K, HindIII; L, HpaII; M, PstI; N, PvuII; O, RsaI; P, SalI; Q, Sau3A; R, TaqI; S, XbaI; T, XhoI.

FIG. 2. Overall genomic organization of the X. laevis mitochondrial genome. All reading frames read left-to-right (H-strand) unless otherwise noted. The numbers above the right end of each line represent the genomic length in base pairs, while the numbers below the coding regions indicate the intragenic spacing. The encoded initiation and termination codons are shown in upper case letters, while the polyadenylation generated termination codons are indicated by lower case letters. Transfer RNA genes are identified by their one-letter amino acid abbreviations.

[Map labels legible in the source: D-loop region, H-strand origin, 12 S RNA, 16 S RNA, URF1, URF2, COI, L-strand origin, COII, ATPase 6, COIII, URF4L, URF4, URF5, Cyt b; line-end coordinates 3000, 6000, 9000, 12000, 15000, and 17553.]

Control regions for the initiation and termination of transcription are also contained in the D-loop region. Investigation of human mitochondrial DNA transcriptional mechanisms has suggested that both strands of the mitochondrial DNA are transcribed into full-length polycistronic RNAs which are subsequently processed into individual mRNAs, rRNAs, and tRNAs (33). Recent studies have located the major sites for H-strand and L-strand transcription initiation by examining the 5' termini of in vitro synthesized mitochondrial RNA (37, 38). In the human mitochondrial genome, H-strand transcription begins from an A- and C-rich region located in the D-loop about 15 nucleotides upstream from the tRNAPhe gene. A similar A- and C-rich region is present at the site of L-strand transcription initiation, about 100 nucleotides upstream from the H-strand promoter region (37). The consensus sequence of these two promoter regions is observed only once in the X. laevis mitochondrial genome at the site of conserved sequence block 3 (CSB-3 in Fig. 3). In addition, the 3' region of the D-loop contains an A- and T-rich sequence characteristic of many promoters. Experiments aimed at determining the site of H-strand transcription initiation in X. laevis mitochondrial DNA are presently in progress.

Origin of Light Strand Replication-The site for the origin of light strand replication is located within a cluster of five tRNA genes about two-thirds of the genome away from the origin of heavy strand replication. The first three and last two tRNA genes in the cluster are separated by a 36-nucleotide stretch which forms a symmetrical hairpin loop structure. The stem of the hairpin may be either nine base pairs with continuous base pairings or 15 base pairs with two mismatched base pairings. The former structure would contain a 19-nucleotide A- and T-rich loop, while the latter structure would contain a smaller T-rich loop. As this hairpin is located at the site of L-strand replication initiation and is functional only when the region is single-stranded (39), it is generally believed that this structure is functionally important in the replication mechanism. The high sequence homology of this hairpin structure with its mammalian counterparts indicates its probable role as the initiation site for light strand synthesis.
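How a stem of continuous base pairings might be measured in a candidate hairpin can be sketched in Python as follows. The sketch counts Watson-Crick pairs inward from the two ends of a single-stranded DNA segment and reports the remaining bases as the loop; it does not model mismatches, so it corresponds only to the "continuous base pairings" case. The function name and the 16-nucleotide toy sequence are hypothetical and are not the X. laevis origin sequence.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def hairpin_stem(segment):
    """Return (stem_length, loop_length) for perfect pairing from the ends inward.

    Pairing stops at the first non-complementary pair; mismatches inside
    the stem are not tolerated in this simplified model.
    """
    i, j = 0, len(segment) - 1
    stem = 0
    while i < j and (segment[i], segment[j]) in WATSON_CRICK:
        stem += 1
        i += 1
        j -= 1
    return stem, len(segment) - 2 * stem


# Toy segment: a 6-base stem (GCGGAC / GTCCGC) around a 4-base T loop.
toy_hairpin = "GCGGAC" + "TTTT" + "GTCCGC"
stem_length, loop_length = hairpin_stem(toy_hairpin)
print(stem_length, loop_length)
```

Evaluating the alternative 15-base pair stem with two mismatches would require a scoring scheme that permits a bounded number of unpaired positions, which this sketch deliberately omits.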

Protein Coding Regions-The amphibian mitochondrial genome contains 13 long open reading frames. The genes for COI, COII, COIII, cytochrome b, and ATPase 6 have been identified by their homology to similar regions in other mi-

9762 X . laevis Mitochondrial Genome Q*pe.ted s*quencc Repearad Sequence ... t*.*******~**~......**~*.**........ tt.....*.4*..4*..4.**.~~...**..

~ ~ A I ~ ~ C W M C * C T A C ~ A A I I " I C M C C ~ I A I M ~ M C ~ A ~ ~ A I A ~ ~ A I M I M C C A R A I M C A C A ~ ~ A C A T M T M ~ A ~ 490 500 510 520 530 540 550 560 570 580 590 600

~ C I " C M C A T A T I A I A C C R A R M C R A I I l A ~ ~ T A R M C A I A C A I A T M I A A ~ C I " I A R M I M I C ~ I A C ~ ~ M I M T I l A C A ~ A ~ A I M 970 980 990 1000 1010 1020 1030 1040 1050 1060 1070 1080

~ M A C A I A I m A C C ~ M ~ ~ ~ ~ ~ C ~ A R A R A ~ A C ~ M - M I ~ ~ - A ~ 1690 1700 1710 1720 1730 1740 1750 1760 1770 1780 1790 1800

CSB-2 csn-1 *. 4*.*.4H... ....**4.*...... ." .

~ - C a ? * C C C C C C A ~ ~ ~ R A - ~ C - C I " C C ~ - ~ A C A R ~ C C M ~ A ~ M C C ~ ~ ~ M C 1810 1820 1430 1840 1450 1960 1870 1480 1990 1900 1910 1920

~ C C T M ~ ~ A I A ~ A ~ ~ ~ A D C C ~ C A T A T ~ ~ A ~ A ~ I A ~ A R A C C C ~ C C C A ~ A T A C A ~ A I A I A I A ~ R A I 1930 1940 1950 1960 1970 1980 1990 2000 2010 2020 2010 2040

D C * C T D U G * ~ A ~ A C ~ ~ C R ~ A I ~ ~ A ~ ~ M R C A ~ ~ ~ I I l A C A R ~ A ~ ~ A C A C ~ ~ ~ ~ A C 2170 2180 2190 2200 2210 2220 2230 2240 2250 2260 2270 22M)

2290 2300 2310 2320 2330 2340 23% ~ 2j60- ~ - 2 3 7 0 ~ 2 3 6 ~ 2390 2400

~ C M ~ A ~ M ~ A C C C A T A R A C - M C T C ~ C C ~ ~ I l A C ~ A ~ A ~ I A C ~ C ~ A ~ ~ ~ I A 2410 2420 2430 2440 2450 2460 2470 2480 2490 2500 2510 2520

~ C T A C R ~ U I M * n C C C C A ~ A C T A C ~ ~ C T - ~ C - f f i ~ R ~ ~ C ~ C C C ~ C T A C ~ ~ ~ M ~ ~ I ~ C C C T C ~ 2530 2540 2550 2560 2570 2580 2590 2600 2610 2620 2630 2640

* M C C T C U T A C R C I I D C C ~ C T A T A I A C C A C C R C ~ ~ ~ C C A C C T C ~ ~ A - A R A - M T ~ - A ~ M C A ~ C A ~ ~ ~ A I A ~ 2650 2660 2670 2680 2690 2700 2710 2720 2730 2740 2750 2760

~ ~ A C A I m C T A I A ~ ~ ~ I I l A ~ A ~ C I A I G I l A C C ~ ~ ~ G ~ ~ A ~ A ~ ~ ~ ~ * M C ~ ~ ~ ~ 2770 2180 2790 2800 2810 2820 2430 2R4O 2850 2860 2870 2880

............................................................

c o c C c c t c c * c w u T I c ~ ~ I M * n C C C ~ ~ C ~ ~ A C ~ ~ M C C M ~ C T A T ~ ~ A C M ~ ~ A C ~ ~ G A ~ C ~ C R ~ C C A ~ ~ ~ A ~ A C 2890 2900 2910 2920 2930 2940 2950 2960 2970 2900 2990 3 0 0 0

V.1

W C C C A I ~ M C C ~ ~ A C ~ ~ M C M T A - ~ C ~ R A - A I C C - C C T A D C A ~ C M ~ A ~ M I 16 3 1RMA

3010 3020 3030 3040 3050 3060 3070 1080 1090 3100 3110 3120

U I C M l M C e f C * l A ~ I I l A ~ M R I l A C C A ~ I " ~ A R A I ~ T A ~ M ~ A T A A T A ~ A T A ~ A C ~ ~ ~ ~ I ~ ~ 3130 3140 3150 3160 3170 31% 3190 3200 3210 3220 3230 3240

FIG. 3. Sequence of the X. laevie mitochondrial genome. Nucleotide 1 is the first nucleotide of the D-loop region. All coding regions read left-to-right (H-strand) except for URF6 and the eight tRNA genes encoded by the L-strand. The amino acid translation for the 13 mitochondrial proteins is indicated by one-letter amino acid abbreviations written above the nucleotide sequence. The direct repeat region at positions 29-73 and 90-134 and the conserved sequence blocks (CSB) at positions 1707-1712,180&1820, and 1855-1877 are indicated by consecutive asterisks above the nucleotide sequence. The regions encoding tRNA5 and rRNAs are indicated by dashed lines above the nucleotide sequences which are delimited by asterisks to indicate the putative 5' and 3' encoded nucleotide.

X . &vis Mitochondrial Genome 9763

A F C ~ A * G C ~ ~ G I O F C A R ~ ~ T A * U C C A ~ C A C C * C 1610 3620 1630 3640 7650 3660 1610 3680 3690 1100 3110 3120

H * M c T * M C M C I C C C M C T Z C M ~ M C C C A T M f f i * M T M C M I I ~ C M C C A ~ c r C A C f f * M - f f i H ~ A C C A C f f i A ~ T A ~ F C C F C ~ ~ 4210 4220 4230 4240 4250 4260 4210 4280 4 2 9 0 4100 4110 6120

T A T A C C C ~ T A Q ~ A O C C A T A ~ ~ ~ C C C A C T ~ C ~ T A T A C - C A C T A ~ A G A C ~ T ~ C ~ ~ A ~ ~ ~ M F C C F C ~ A F C C ~ T ~ H A F C - A ~ T A T A T A ~ ~ A T W A L A L A ~ S I U A P L P ' ( P P ~ L A ~ L ~ I . G I L ? I L A L C S L A V Y T I

5050 5060 5010 5080 5090 5100 5110 5120 5130 5140 5150 5160

A T A C C C A C C A ~ C W T A T G A F C M ~ A T A C A C ~ ~ A ~ A T G A A ~ ~ C T C C C M F C A C A ~ ~ A ~ C A T U C A H A ~ A C A T A ~ A ~ A C C M ~ A ~ - A C C A T C Y P ~ F ~ Y ~ ~ L ~ ~ L V U U ~ ~ L P I T L A H T L ~ H I C L P ~ C ~ L C L P S

5650 5660 5611 5680 5690 5 l W l 5710 5720 5110 5160 5150 5160

q T (."- """"""""""""""""""""""""""""","""""""""""""""""""" 5170 5180 5190 rson 5810 5820 r w o 5860 5850 5060 5810 5.880

11e

A C * U C c T A f f i W I + ; I G C C C G C A ~ A F C A ~ C A T A C A f f G * U T A r A ~ * U C C C C A T C A F C F C C r T * W C A C ~ ~ A ~ M C ~ A C ~ f f i A C A ~ ~

"""""""""- GI" '(et tRNA "."_ tRNA """"""_""""""""""""""""""""" * V Y P I T I S V

1R?-2

C C C F C C C l A C r C C G A C T A T A C C * C T I C C T A R ~ C ~ T A - ~ ~ C C A T A C C C C * M C A ~ ~ C C C ~ C m A c T M m M C C C M T C A ~ - f f i 5R90 5900 5910 5920 5910 5000 5050 5060 5010 5980 5990 6OOO

FIG. 3"continued.

9764 X . laevis Mitochondrial Genome

A ~ M H A T C A C C T C C I C U T A H C ~ R A C T - ~ A T ~ A T C C A C ~ ~ H C ~ A ~ C T A C ~ R G A T ~ T ~ C C C C A T C C A ~ T A C ~ - A - M ~ L I U T S T * ? L V L K T I F C T Y I 9 9 L A T 9 U S ~ T P S T T A L C L L T L

6610 6620 6630 6640 6h50 6660 6670 66110 6690 6700 6710 6720

C O C M T A A * T M T A T M ~ A C H ~ C C C C C A T C A ~ A H A C T A ~ A ~ A ~ ~ H G M ~ ~ A c C C G G C A C A G G ~ M ~ A C C C c C ~ A c c T f f i ~ I U N N U S F U L L P P 9 P L L L b A F 9 G V ? A G A G T G U T V Y V P L A G N

1690 7700 7710 7720 7730 7740 7750 7760 7770 7760 7790 7800

A R M T A ~ ~ ~ T ~ C ~ ~ M ~ M C C H I ~ T I C C C ~ M C A ~ T A ~ ~ M ~ C M T A C ~ ~ C G A T A C ~ A C T A C C C A G A C ~ T T A T A C A H A T G ~ T A C C R C ~ A T C ~ M P A C ~ N L T F F P ~ H F L C L ~ A ~ ~ ~ ~ ~ ~ ~ Y P ~ A Y T L ~ N T ~ ~ ~

8650 8660 8670 8680 8690 8700 8710 8720 8730 8740 8750 5760

FIG. 3"continuecl.

X . laevis Mitochondrial Genome 9765

R A C C C T C C H ~ C A A M C * U T G C U T C C C A ~ ~ ~ A C G A C C T T C A T ~ ~ C A T C ~ ~ A ~ ~ T A C T C R C C ~ ~ A ~ A C ~ ~ ~ A C M T ~ T C A G A M ~ ~ A ~ * M C C A C V P S L C V Y T O A I P C R L U O T S I l ~ T q P C V F Y G Q C S E I C G ~ N H

9610 9620 9630 9640 9650 9660 9670 9680 9690 9700 9710 9720

s p M p 1 v " e A " p L 0 p e N v 5 s s n L 8 A I."- CRNA L9S

* G ~ ~ A T A C C M H C T A ~ G M ~ ~ A C C G C T A A C C G A ~ G A C T G A T C ~ ~ T C A A T A C T A ~ A ~ A ~ A C T M G A A ~ T ~ T A ~ C A H A ~ G A C A ~ C ~ M ~ 9730 9740 9750 9760 9770 9780 9790 9800 9810 9820 9830 9840

M P 9 L N P G P W P L 1 L 1 P 9 V L V L L T F I P P Y V L lRP-A6L

T A G A ~ ~ C G ~ A C T C C C M C C A C C C ~ ~ M T G A T A T ~ C A C A R T ~ C C C ~ C C C A T G A ~ C T M T C ~ M T C ~ C C ~ A C ~ ~ R C C ~ M ~ A ~ A T C ~ ~ A C ~ ~ M 9850 9860 9870 9880 9890 9900 9910 q920 9930 9940 9950 9960

ATPase 6

K H K A P Y E P T T 9 T T E K S Y P Y P V N W P U T I H Y L F P P ~ ~ P M F P V I L G I

M C A C A U C C A ~ M ~ A A C C M C T A C A C ~ C C A C A G A A T C T A U C C T M C C ~ ~ C ~ A C C A T G M C C T M ~ ~ ~ C A C C M ~ A T G ~ C ~ M ~ A ~ A T 9970 9980 9990 10000 10010 10020 10030 10040 10050 10060 10070 10080

M - * C C M ~ M C C T T C A C C A C A T A A R G A ~ C C T A ~ A H G A C A T C A ~ M T A C ~ ? A H M T A ~ ~ M C ~ A ~ ~ A ~ ~ A ~ ~ A C ~ A C A C ~ A C A C C M ~ A C I F ~ Q L T F P G H K U A L L L T 9 L W L L L W S L Y L L C L L P Y T P T P T T

10210 10220 10230 10240 10250 10260 10270 10280 10290 10300 10310 10320

~ M C I A r C C r T * M C A T A G C T A ~ A R C C C A r r A T G A r r f f i C M C A ~ ~ T C A T f f i C ~ G A M C C M C C M C T A ~ A C T A f f i A C A T C T A C r r C C ~ M ~ M C A C C M C A C C A ~ 9 L S L N W G L A V P L W L A T V l U A 9 Y P T I L

10330 10340 10350 10360 10770 10380 10390 10400 10410 10420 10430 10440

M ~ C I C T T C ~ A ~ A ~ A T C G A A r r A G C C I A ~ A ~ G A C C A ~ A ~ C C ~ f f i A m C G A C ~ A ~ M ~ M C A ~ f f i A C A ~ A ~ M ~ M ~ M ~ C A C C ~ I P V L I I l E T I S L P 1 P P L A L C V R L T A N L T A G ~ L L I ~ L l ~ T A

FIG. 3-continued. [Nucleotide sequence with deduced amino acid sequences, approximately positions 10450-12360. Legible annotations: end of ATPase 6/start of COIII, tRNA Gly, URF-3, URF-4.]

FIG. 3-continued. [Nucleotide sequence with deduced amino acid sequences, approximately positions 12370-15240. Legible annotations: tRNA His, tRNA Ser (AGY), tRNA Leu (CUN), URF-5.]

tochondrial genomes and to protein sequence data from bovine and yeast mitochondrial components (2). R-loop mapping of polyadenylated transcripts from X. laevis mitochondria (25) and sequence homology with the other mitochondrial genes support the concept that the remaining eight URFs are viable coding regions, although their functions remain to be identified.

Most of the protein coding regions of the X. laevis mitochondrial genome show significant nucleotide homology to their mammalian counterparts, as shown in Fig. 4. As in the other mitochondrial genomes, deviations from the universal genetic code are observed. With only 22 available tRNAs, the AUA codon most likely is read as methionine instead of isoleucine, and the UGA codon is read as tryptophan instead of functioning as a terminator. However, in contrast to the mammalian mitochondrial genomes, all 13 open reading frames of the amphibian mitochondrial genome begin with AUG, and the AUA codon is observed only internally. Although three of the 13 reading frames encode the stop codon TAG and four encode TAA, all of the TAG codons and one TAA codon (ATPase 6) are converted to UAA after transcription, processing, and polyadenylation. Five other reading frames encode TNN, where NN is the 5' terminus of the adjacent gene which is lost by the reading frame after cleavage from the primary transcript and subsequently replaced by polyadenylation. The AGA codon, which is observed once as

[Figure 4 alignment panels. Legible panel title: ATPase 6. Row labels: Xe, Hu, Bo, Mo.]

FIG. 4. Comparison of open reading frames of the X. laevis and other higher vertebrate mitochondrial genomes. Species compared: Xe, X. laevis; Hu, human; Bo, bovine; Mo, mouse. Amino acids are indicated by their one-letter abbreviations. Asterisks indicate an identical amino acid present in all four mitochondrial proteins, the dashes represent spaces introduced to optimize the sequence homology, and termination codons are indicated by a period.

a terminator in the human and bovine genomes for COI and cytochrome b, respectively, also occurs once in the amphibian genome as the terminator of URF6. The AGG codon, which is used once in the human genome as a terminator (of URF6), is not observed in the X. laevis mitochondrial genome. As in other vertebrate mitochondrial genomes, there are three instances of reading frame overlap in X. laevis mitochondrial DNA (see Figs. 2 and 3). The 3' end of URF4L


overlaps the start of the URF4 gene by seven nucleotides, as in the mouse, bovine, and human mtDNAs, and terminates at UAA. The 3' end of URF5 overlaps the 3' end of the L-strand encoded URF6 gene by five nucleotides, while this overlap is 14 nucleotides in the mouse and bovine mitochondrial genomes and zero in the human mitochondrial genome. The translation of URF6 terminates at an AGA codon, as also observed in the human URF6 gene. The overlap of URFA6L and the gene for ATPase 6 is only 10 nucleotides in the amphibian mitochondrial genome as compared to 43, 40, and 46 for the mouse, bovine, and human mitochondrial genomes, respectively. This decreased overlap is the consequence of a single base change in the amphibian gene, resulting in a shortened reading frame.
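The completion of truncated terminators by polyadenylation, described above for reading frames that end in T or TA once the adjacent gene is cleaved away, can be illustrated with a short Python sketch. This is an illustration only and not part of the original analysis: the function name `terminal_codon_after_polyadenylation`, the toy transcripts, and the poly(A) tail length are all assumptions.

```python
def terminal_codon_after_polyadenylation(coding_rna: str, poly_a_len: int = 10) -> str:
    """Return the final in-frame codon once a poly(A) tail is added.

    `coding_rna` is a processed transcript read from its first codon; it is
    assumed to end with an incomplete codon (a lone U, or UA) left behind
    when the adjacent gene was cleaved from the primary transcript.
    """
    rna = coding_rna.upper().replace("T", "U")
    remainder = len(rna) % 3  # 1 -> one residue left over, 2 -> two residues
    if remainder == 0:
        raise ValueError("reading frame already ends on a codon boundary")
    if poly_a_len < 3 - remainder:
        raise ValueError("poly(A) tail too short to complete the codon")
    tailed = rna + "A" * poly_a_len          # hypothetical tail length
    start = len(rna) - remainder             # first base of the incomplete codon
    return tailed[start:start + 3]


# Toy transcripts (hypothetical): one ends in U, the other in UA.
print(terminal_codon_after_polyadenylation("AUGGCCU"))   # UAA
print(terminal_codon_after_polyadenylation("ATGGCCTA"))  # UAA
```

In both toy cases the first adenylates of the tail supply the missing residues of the UAA terminator, which is the behavior the text attributes to nine of the 13 reading frames.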

The overall deduced codon usage for the X. laevis and human mitochondrial genomes is shown in Fig. 5. As with other vertebrate mitochondria, only 22 tRNAs are encoded by the amphibian mitochondrial genome. With the absence of an arginine tRNA to read AGPur codons and the occurrence of AGA in the X. laevis URF6 gene, it is highly likely that AGA serves as a terminator codon in the amphibian mitochondrial genome. In addition, both TGG and TGA most probably code for tryptophan, while both ATG and ATA code for methionine. These observations of unique codon usage have also been reported for the other vertebrate mitochondrial genomes (2-4). Therefore, it is highly likely that the unusual mitochondrial genetic code is conserved throughout higher eucaryotes.
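The codon reassignments discussed above (ATA as methionine, TGA as tryptophan, AGA and AGG as terminators) can be made concrete with a small translation sketch in Python. The sketch is illustrative and rests on stated assumptions: the table `VERT_MITO` is built by overriding the universal code with only those four reassignments, and the 12-nucleotide test sequence is hypothetical, not taken from the X. laevis genome.

```python
from itertools import product

BASES = "TCAG"
# Universal genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}

# Vertebrate mitochondrial deviations named in the text.
VERT_MITO = dict(STANDARD)
VERT_MITO.update({"ATA": "M", "TGA": "W", "AGA": "*", "AGG": "*"})


def translate(dna: str, table: dict) -> str:
    """Translate in frame from the first base, stopping at the first terminator."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


toy = "ATGATATGAAGA"  # hypothetical sequence: ATG ATA TGA AGA
print(translate(toy, STANDARD))   # MI  (TGA read as a terminator)
print(translate(toy, VERT_MITO))  # MMW (AGA read as a terminator)
```

The same four codons give different products under the two tables, which is why the reading frames in Fig. 3 can only be interpreted correctly with the mitochondrial assignments.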

The comparison of overall deduced codon usage in the amphibian and human mitochondrial genomes, also shown in Fig. 5, reveals a unique preference for certain codons. With a choice of two pyrimidines in the two-codon boxes, the amphibian mitochondria show a preference for codons ending in U, while the human mitochondria have a preference for codons ending in C. After analysis of the tRNA gene sequences shown below in Fig. 6, it can be concluded that the amphibian mitochondrial system prefers the G-U wobble to the normal G-C reading of the third codon position. In four-codon boxes, each read by a single tRNA via U-N wobble (40) or "two-out-of-three" (41), there also is a slight preference for codons ending in U rather than C. Since the overall nucleotide composition of the major coding strand of the amphibian mitochondrial DNA is 30% T, 23.5% C, 33% A, and 13.5% G, compared to 24.7% T, 31.2% C, 30.9% A, and 13.2% G for the

[Figure 5 table: codon totals over all genes for X. laevis (Xe) and human (Hu), listed by amino acid and codon.]

FIG. 5. Comparison of total codon usage in the X. laevis and human mitochondrial genomes. The codon usage is given over all reading frames, and the numbers in parentheses indicate the number of UAA termination codons which are generated through post-transcriptional polyadenylation.

human mitochondrial genome, the observation of codon preference may be significant or only reflect the overall increased presence of T and the decreased occurrence of C in the major coding strand.
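The caveat about base composition can be made quantitative with a short calculation. The Python sketch below uses only the strand compositions quoted in the text and asks what share of pyrimidine-ending codons would end in U if third positions simply mirrored the overall T and C content. That null model, and the names `COMPOSITION` and `expected_u_fraction`, are assumptions introduced for illustration.

```python
# Major coding strand composition (percent), as quoted in the text.
COMPOSITION = {
    "X. laevis": {"T": 30.0, "C": 23.5, "A": 33.0, "G": 13.5},
    "human": {"T": 24.7, "C": 31.2, "A": 30.9, "G": 13.2},
}


def expected_u_fraction(comp: dict) -> float:
    """Share of U among pyrimidine-ending codons if third positions
    followed the overall strand composition (an assumed null model)."""
    return comp["T"] / (comp["T"] + comp["C"])


for species, comp in COMPOSITION.items():
    print(species, round(expected_u_fraction(comp), 2))
# X. laevis 0.56
# human 0.44
```

Under this assumption, composition alone would already tilt X. laevis toward U-ending codons and human toward C-ending codons, so an observed preference would need to exceed these baseline shares to count as evidence of selection on codon choice.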

Transfer RNA Genes-The X. laevis mitochondrial genome contains 22 tRNA genes, identified on the basis of map

[Figure 6: cloverleaf drawings of the 22 tRNA genes.]

FIG. 6. Sequence of X. laevis mitochondrial tRNA genes represented in the cloverleaf form. A Watson-Crick base pair in a stem region is indicated by a dash or colon and a plus indicates a G + U base pair.

location and homology to known mitochondrial tRNA and DNA sequence data (2-4, 9, 11, 22, 23, 42-45). These tRNA genes are shown in cloverleaf structure in Fig. 6. The tRNA genes are interspersed between protein and ribosomal RNA coding regions in the X. laevis mitochondrial genome. Hence, the mechanism of RNA processing from a full-length transcript with tRNA structures serving as punctuation, as originally described for the human mitochondrial system (18), probably occurs in the amphibian system as well.

The 22 X. laevis mitochondrial tRNA genes are compared to their mammalian counterparts in Fig. 7. As has been observed with other vertebrate mitochondrial tRNAs, the amphibian mitochondrial tRNAs lack many features usually associated with cytoplasmic tRNAs. For example, 16 of the 22 amphibian mitochondrial tRNAs (as compared to 19 for mouse, 18 for human, and 17 for bovine) lack the sequence rT(U)-Ψ-C-Pur-A in loop IV which occurs in other non-organelle elongator tRNAs. As with other mitochondrial tRNAs, the X. laevis mitochondrial tRNAs contain a high proportion of A and U and have several instances of mismatched base pairings in their stem regions. The most prevalent mismatches are the G + U pairings (G + T in the tDNA sequences in Fig. 6) which do not cause significant distortion in helical regions (46) and are allowed under the wobble hypothesis (40). Other mismatched pairings occur in the stem of loop I in asparagine, tyrosine, leucine (UUR), leucine (CUN), isoleucine, serine (UCN), and aspartic acid tRNAs. These loosely paired regions may be stabilized by base stacking and/or tertiary structural interactions rather than by classical base pairings, as is the case for the truncated mitochondrial serine (AGY) tRNA.
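The distinction drawn above between Watson-Crick pairs, G + U pairs, and other mismatches in tRNA stems can be sketched as a small classifier in Python. The function name `classify_stem`, the labels it returns, and the example stem sequences are hypothetical and serve only to illustrate the bookkeeping; they are not taken from Fig. 6.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def classify_stem(strand_5p: str, strand_3p: str) -> list:
    """Classify each pair of a stem.

    Both strands are written 5' to 3', so the first base of `strand_5p`
    pairs with the last base of `strand_3p`. T in a tDNA sequence is
    treated as U.
    """
    s5 = strand_5p.upper().replace("T", "U")
    s3 = strand_3p.upper().replace("T", "U")
    if len(s5) != len(s3):
        raise ValueError("stem strands must have equal length")
    labels = []
    for pair in zip(s5, reversed(s3)):
        if pair in WATSON_CRICK:
            labels.append("WC")
        elif pair in WOBBLE:
            labels.append("GU")
        else:
            labels.append("mismatch")
    return labels


# Hypothetical 4-bp stem containing one G + T (G + U) pair.
print(classify_stem("GGTA", "TGCC"))  # ['WC', 'WC', 'GU', 'WC']
```

Counting the labels per stem would give a simple tally of how often each tRNA relies on G + U pairs versus true mismatches.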

As also can be seen in Fig. 7, the primary sequences of the Xenopus mitochondrial tRNA genes are quite similar to those of the mammalian mitochondria. The greatest homology occurs in the anticodon loop and stem and in the amino acid acceptor stem regions. In contrast, the least conservation of sequence homology occurs in loops I and IV and their corresponding stem regions. It might be that the lack of conserved elements in loops I and IV is due partially to the removal of any evolutionary pressure for maintaining the internal promoters required for eucaryote tRNA gene transcription (47), since mitochondrial tRNAs are transcribed as part of a single polycistronic species.

Ribosomal RNA Genes-The X. laevis mitochondrial genes for 12 S and 16 S ribosomal RNAs have been identified on the basis of their sequence homology to known mitochondrial rRNA coding regions and similar genomic location (2). A comparison of the primary structures of these rRNA genes with those of the mammalian mitochondrial systems reveals several similarities. In the 12 S rRNA gene, there are five regions with more than 20 consecutive homologous nucleotides dispersed throughout the gene. A similar analysis of the X. laevis 16 S rRNA gene reveals nine regions with more than 20 consecutive nucleotides homologous to the other sequenced higher vertebrate mitochondrial 16 S rRNA genes, including a large segment within the region 100-200 nucleotides from the 3' terminus. Further analysis of these genes through secondary structure modeling (48, 49) reveals that the X. laevis 12 S and 16 S rRNAs can be folded by base pairing to form structures which are similar to those reported for other organelle and cytoplasmic rRNAs. Fig. 8 shows the proposed secondary structures of the X. laevis and human mitochondrial 12 S rRNAs, and Fig. 9 shows a putative secondary structure of six domains of the X. laevis 16 S rRNA. The 3' regions of the X. laevis and human mitochondrial 12 S rRNAs (Fig. 8B, upper and lower panels, respectively) contain similar

[Figure 8 panels: Xenopus laevis mitochondrial 12 S rRNA (domains A and B); human mitochondrial 12 S rRNA (domains A and B).]

FIG. 8. Comparison of the putative secondary structure for the X. laevis (upper panel) and the human (lower panel) mitochondrial 12 S ribosomal RNAs. These structures have been drawn in two domains (A and B) according to Brimacombe (49). Spaces have been introduced in some locations to eliminate crowding and optimize homology. A Watson-Crick base pair is indicated by a solid line between the strands, while a G + U base pair is indicated by a broken line between the strands. The arrows represent the connection between the two domains. The 5' and 3' termini are so indicated.

[Figure 9 panels: domains A-F of the X. laevis mitochondrial 16 S rRNA.]

FIG. 9. A putative secondary structure for the X. laevis mitochondrial 16 S ribosomal RNA. The structure has been drawn in six domains (A-F) for ease of comparison with the proposed model (49). A Watson-Crick base pair is indicated by a solid line between the strands, while a G + U base pair is indicated by a broken line between the strands. The arrows represent the connection between the two domains. The 5' and 3' termini are so indicated.

overall secondary structural domains. A comparison of the 5' regions of these RNAs reveals that the X. laevis 12 S rRNA lacks a large loop-stem structure which is present in the human 12 S rRNA near its 5' end, although the other structural features are conserved. Interestingly, both the amphibian and human 12 S rRNAs can form a putative stem-loop structure at their 5' end. Finally, a region in both 12 S rRNAs quite distal to the 5' end also can be postulated to fold back and base pair to two regions near the 5' end. These additional base pairings may be required to stabilize the overall secondary structure of the 12 S rRNAs as it, too, is conserved in amphibians and mammals. A comparison of the mitochondrial 16 S rRNA putative secondary structure with that of the human 16 S rRNA (49) also reveals a high degree of structural homology. These comparisons lead to the conclusion that the overall structural domains of both the 12 S and 16 S rRNAs are highly conserved in mammals and amphibians, although in many instances they differ dramatically in their primary sequences.
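The primary-sequence comparison of the rRNA genes, which counted regions of more than 20 consecutive homologous nucleotides, can be approximated with a simple scan over a pairwise alignment. The Python sketch below is an assumption about how such a count could be made, not the procedure actually used in this work: the function name `conserved_runs`, the gap character `-`, and the toy alignment are hypothetical.

```python
def conserved_runs(seq_a: str, seq_b: str, min_len: int = 21) -> list:
    """Return half-open (start, end) alignment coordinates of runs of
    identical, ungapped positions at least `min_len` long.

    The default of 21 corresponds to "more than 20 consecutive" matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    a, b = seq_a.upper(), seq_b.upper()
    runs, start = [], None
    for i, (x, y) in enumerate(zip(a, b)):
        match = x == y and x != "-"
        if match and start is None:
            start = i                      # a new run begins here
        elif not match and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(a) - start >= min_len:
        runs.append((start, len(a)))       # run extends to the alignment end
    return runs


# Toy alignment (hypothetical): 25 matches, one mismatch, 5 matches.
xenopus = "A" * 25 + "C" + "G" * 5
human = "A" * 25 + "T" + "G" * 5
print(conserved_runs(xenopus, human))             # [(0, 25)]
print(conserved_runs(xenopus, human, min_len=5))  # [(0, 25), (26, 31)]
```

Applied to real alignments of the 12 S and 16 S genes, the length of the returned list would correspond to the number of conserved regions reported in the text, subject to how the alignment itself was constructed.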

Acknowledgments-We thank Dr. I. Dawid for providing the plasmid pXlm-31 containing the entire X. laevis mitochondrial genome and M. A. Ahmed, M. Stevens, and J. Moran for assistance in the early stages of this project.

REFERENCES

1. Altman, P. L., and Katz, D. D. (1976) Biol. Handb. I. Cell Biol. 217-219
2. Anderson, S., Bankier, A. T., Barrell, B. G., deBruijn, M. H. L., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R., and Young, I. G. (1981) Nature 290, 457-465
3. Anderson, S., deBruijn, M. H. L., Coulson, A. R., Eperon, I. C., Sanger, F., and Young, I. G. (1982) J. Mol. Biol. 156, 683-717
4. Bibb, M. J., Van Etten, R. A., Wright, C. T., Walberg, M. W., and Clayton, D. A. (1981) Cell 26, 167-180
5. Cordonnier, A. M., Vannier, P. A., and Brun, G. M. (1982) Eur. J. Biochem. 126, 119-127
6. Kroon, A. M., Pepe, G., Bakker, H., Holtrop, M., Bollen, J. E., Van Bruggen, E. F. J., Cantatore, P., Terpstra, P., and Saccone, C. (1977) Biochim. Biophys. Acta 478, 128-145
7. Kobayashi, M., Seki, T., Yaginuma, K., and Koike, K. (1981) Gene (Amst.) 16, 297-307
8. Koike, K., Kobayashi, M., Yaginuma, K., Taira, M., Yoshida, E., and Imai, M. (1982) Gene (Amst.) 20, 177-185
9. Sekiya, T., Kobayashi, M., Seki, T., and Koike, K. (1980) Gene (Amst.) 11, 53-62
10. Saccone, C., Cantatore, P., Gadeleta, G., Gallerani, R., Lanave, C., Pepe, G., and Kroon, A. M. (1981) Nucleic Acids Res. 9, 4139-4148
11. Wolstenholme, D. R., Fauron, C. M.-R., and Goddard, J. M. (1982) Gene (Amst.) 20, 63-69
12. Brown, G. C., and Simpson, M. V. (1982) Proc. Natl. Acad. Sci. U. S. A. 79, 3246-3250
13. Champagne, A. M., Dennebouy, N., Julien, J. F., Lehegarat, J. C., and Mounolou, J. C. (1984) Biochem. Biophys. Res. Commun. 122, 918-924
14. Klukas, C. K., and Dawid, I. B. (1976) Cell 9, 615-625
15. Goddard, J. M., and Wolstenholme, D. R. (1978) Proc. Natl. Acad. Sci. U. S. A. 76, 3886-3890
16. Goddard, J. M., and Wolstenholme, D. R. (1980) Nucleic Acids Res. 8, 741-757
17. Clary, D. O., Goddard, J. M., Martin, S. C., Fauron, C. M. R., and Wolstenholme, D. R. (1982) Nucleic Acids Res. 10, 6619-6637
18. Ojala, D., Merkel, C., Gelfand, R., and Attardi, G. (1980) Cell 22, 393-403
19. Clayton, D. A. (1982) Cell 28, 693-705
20. Bogenhagen, D., Gillum, A. M., Martens, P. A., and Clayton, D. A. (1978) Cold Spring Harbor Symp. Quant. Biol. 43, 253-262
21. Barrell, B. G., Anderson, S., Bankier, A. T., deBruijn, M. H. L., Chen, E., Coulson, A. R., Drouin, J., Eperon, I. C., Nierlich, D. P., Roe, B. A., Sanger, F., Schreier, P. H., Smith, A. J. H., Staden, R., and Young, I. G. (1980) Proc. Natl. Acad. Sci. U. S. A. 77, 3164-3166
22. Roe, B. A., Wong, J. F. H., Chen, E. Y., and Armstrong, P. W. (1981) in Recombinant DNA: Proceedings of the Third Cleveland Symposium on Macromolecules (Walton, A. G., ed) pp. 167-176, Elsevier Publishing Co., Amsterdam
23. Roe, B. A., Wong, J. F. H., Chen, E. Y., Armstrong, P. W., Stankiewicz, A., Ma, D. P., and McDonough, J. (1982) in Mitochondrial Genes (Slonimski, P., Borst, P., and Attardi, G., eds) pp. 45-49, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
24. Wong, J. F. H., Ma, D. P., Wilson, R. K., and Roe, B. A. (1983) Nucleic Acids Res. 11, 4977-4995
25. Rastl, E., and Dawid, I. B. (1979) Cell 18, 501-510
26. Messing, J., and Vieira, J. (1982) Gene (Amst.) 19, 269-276
27. Sanger, F., Coulson, A. R., Barrell, B. G., Smith, A. J. H., and Roe, B. A. (1980) J. Mol. Biol. 143, 161-178
28. Sanger, F., Nicklen, S., and Coulson, A. R. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 5463-5467
29. Staden, R. (1979) Nucleic Acids Res. 6, 2601-2610
30. Staden, R. (1978) Nucleic Acids Res. 5, 1013-1015
31. Staden, R. (1977) Nucleic Acids Res. 4, 4037-4051
32. Wilber, W. J., and Lipman, D. L. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 726-730
33. Clayton, D. A. (1984) Annu. Rev. Biochem. 53, 573-594
34. Bogenhagen, D., Lowell, C., and Clayton, D. A. (1981) J. Mol. Biol. 148, 77-93
35. Gillum, A. M., and Clayton, D. A. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 677-681
36. Walberg, M. W., and Clayton, D. A. (1981) Nucleic Acids Res. 9, 5411-5421
37. Chang, D. D., and Clayton, D. A. (1984) Cell 36, 635-643
38. Bogenhagen, D. F., Applegate, E. F., and Yoza, B. K. (1984) Cell 36, 1105-1113
39. Martens, P. A., and Clayton, D. A. (1979) J. Mol. Biol. 135, 327-351
40. Crick, F. H. C. (1966) J. Mol. Biol. 19, 548-555
41. Lagerkvist, U. (1978) Proc. Natl. Acad. Sci. U. S. A. 75, 1759-1762
42. Randerath, E., Agrawal, H. P., and Randerath, E. (1981) Biochem. Biophys. Res. Commun. 103, 739-744
43. Randerath, K., Agrawal, H. P., and Randerath, E. (1981) Biochem. Biophys. Res. Commun. 100, 732-737
44. Agrawal, H. P., Gupta, R. C., Randerath, K., and Randerath, E. (1981) FEBS Lett. 130, 287-290
45. DeBruijn, M. H. L., Schreier, P. H., Eperon, I. C., Barrell, B. G., Chen, E. Y., Armstrong, P. W., Wong, J. F. H., and Roe, B. A. (1980) Nucleic Acids Res. 8, 5213-5222
46. Rich, A., and RajBhandary, U. L. (1976) Annu. Rev. Biochem. 45, 805-860
47. Hall, B. D., Clarkson, S. G., and Tocchini-Valentini, G. (1982) Cell 29, 3-5
48. Noller, H. F. (1984) Annu. Rev. Biochem. 53, 119-162
49. Brimacombe, R. (1984) Trends Biochem. Sci. 9, 273-277