Molecular Phylogeny of the Kingdoms Animalia, Plantae, and Fungi


Manolo Gouy*,† and Wen-Hsiung Li*
*Center for Demographic and Population Genetics, University of Texas, Houston; and †Laboratoire de Biométrie, Université Lyon I

The branching order of the kingdoms Animalia, Plantae, and Fungi has been a controversial issue. Using the transformed distance method and the maximum parsimony method, we investigated this problem by comparing the sequences of several kinds of macromolecules in organisms spanning all three kingdoms. The analysis was based on the large-subunit and small-subunit ribosomal RNAs, 10 isoacceptor transfer RNA families, and six highly conserved proteins. All three sets of sequences support the same phylogenetic tree: plants and animals are sibling kingdoms that have diverged more recently than the fungi. The ribosomal RNA and protein data sets are large enough so that in both cases the inferred phylogeny is statistically significant. The present report appears to be the first to provide statistically conclusive molecular evidence for the phylogeny of the three kingdoms. The determination of this phylogeny will help us to understand the evolution of various molecular, cellular, and developmental characters shared by any two of the three kingdoms. Noting that the large-subunit rRNA sequences have evolved at similar rates in the three kingdoms, we estimated the ratio of the time since the animal-plant split to the time since the fungal divergence to be 0.90.

Introduction

Although the fungi have long been classified as a separate kingdom, their evolutionary position relative to the kingdoms Animalia and Plantae remains uncertain. Traditionally, fungi are considered to be more closely related to plants than to animals, but this view is not supported by solid evidence. Indeed, the fossil record of such an ancient divergence is nonexistent. Also, morphological characters cannot be employed to study this question because they are often of little use for comparisons at the interphylum level, and are even less useful at the interkingdom level. With the determination of the primary structure of homologous macromolecules in numerous organisms spanning several kingdoms, molecular phylogeny techniques, which have proved to be useful for resolving phylogenetic questions at the interphylum level or beyond (Baroin et al. 1988; Field et al. 1988), may provide a resolution of this long-standing issue.

The problem to be solved is to identify which of the three possible trees shown in figure 1 represents the true interkingdom divergence. All three trees have been proposed. The classical view favors tree C. Cavalier-Smith (1987a and references therein) has proposed tree B, on the basis of cellular characters, such as the absence of chloroplasts, shared by animals and fungi. Phylogenetic analyses of cytochrome c amino acid sequences supported either tree A (Fitch 1976) or tree B (Dayhoff et al. 1975). Several recent analyses of small-subunit ribosomal (SSU) RNA sequences have pointed to tree A (Gunderson et al. 1987; Vossbrinck et al. 1987), tree B (Hunt et al. 1985), or tree C (Nairn and Ferl 1988), though without focusing on the evolutionary position of fungi.

FIG. 1. Three possible alternative phylogenetic trees, labeled A, B, and C, for the three kingdoms Animalia (A), Plantae (P), and Fungi (F). [Tree drawings not reproduced.]

1. Key words: molecular phylogeny, branching dates, ribosomal RNAs, transfer RNAs, protein sequences. Abbreviations: SSU = small subunit; LSU = large subunit; GAPDH = glyceraldehyde-3-phosphate dehydrogenase; SOD = superoxide dismutase; TPI = triosephosphate isomerase.

Address for correspondence and reprints: Wen-Hsiung Li, Center for Demographic and Population Genetics, University of Texas, P.O. Box 20334, Houston, Texas 77225.

Mol. Biol. Evol. 6(2):109-122. 1989. © 1989 by The University of Chicago. All rights reserved. 0737-4038/89/0602-0001$02.00

Our analysis was made by comparing the sequence of as many suitable (i.e., well conserved through evolution) macromolecules as possible in one or more species from each of the three kingdoms. To determine their branching order without assuming rate constancy, an outgroup is necessary. Many eukaryotic protists, which are thought

Data and Methods

Macromolecules were chosen according to two criteria. First, each molecule has to be sequenced in at least one species in each of the three kingdoms and in an outgroup species. Second, these molecules have to be sufficiently conserved in primary structure across the evolutionary spans considered here so that extant sequences can be reliably aligned. The following molecules satisfy these criteria: the SSU and large-subunit (LSU) ribosomal RNAs (in agreement with its evolutionary origin and ribosomal location, 5.8S rRNA has been considered here to be the 5'-end of LSU rRNA and therefore is included in the analysis; 5S rRNA was tested but not retained in the sample because of its short length and high divergence when interkingdom comparisons are made); 10 isoacceptor tRNA molecules; and six proteins [cytochrome c, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), triosephosphate isomerase (TPI), Cu/Zn superoxide dismutase (SOD), ATPase alpha-subunit, and 70-kD heat-shock protein (hsp70)]. The taxa from which sequences were used differ for each molecule; they are listed in tables 1-3.


Table 1
Proportions of Nucleotide Differences between SSU and LSU rRNAs from Various Groups of Organisms

                  Plants                  Fungi                   Dictyostelium           Crithidia
Animals:
  SSU(a)          0.101 (0.109 ± 0.011)   0.110 (0.120 ± 0.011)   0.163 (0.186 ± 0.014)   0.236 (0.286 ± 0.019)
  LSU(b)          0.116 (0.127 ± 0.009)   0.135 (0.151 ± 0.010)   0.206 (0.245 ± 0.014)   0.221 (0.264 ± 0.014)
  Average(c)      0.110 (0.120 ± 0.007)   0.126 (0.139 ± 0.007)   0.190 (0.222 ± 0.010)   0.227 (0.272 ± 0.011)
Plants:
  SSU(d)                                  0.077 (0.081 ± 0.009)   0.135 (0.150 ± 0.013)   0.222 (0.267 ± 0.018)
  LSU(e)                                  0.120 (0.132 ± 0.009)   0.199 (0.236 ± 0.013)   0.225 (0.270 ± 0.014)
  Average(c)                              0.104 (0.112 ± 0.007)   0.175 (0.203 ± 0.009)   0.224 (0.269 ± 0.011)
Fungi:
  SSU(f)                                                          0.146 (0.164 ± 0.013)   0.220 (0.264 ± 0.018)
  LSU(g)                                                          0.196 (0.230 ± 0.013)   0.221 (0.264 ± 0.014)
  Average(c)                                                      0.177 (0.204 ± 0.009)   0.221 (0.264 ± 0.011)
Dictyostelium:
  SSU(h)                                                                                  0.232 (0.282 ± 0.019)
  LSU                                                                                     0.266 (0.332 ± 0.016)
  Average(c)                                                                              0.253 (0.313 ± 0.012)

NOTE. Data in parentheses are intergroup distances, corrected for multiple hits, along with their SEs.
(a) Data set: human, mouse (Mus musculus), rabbit (Oryctolagus cuniculus), Xenopus laevis, and Artemia salina (an arthropod).
(b) Data set: human, mouse (M. musculus), and X. laevis.
(c) Average of intergroup distances, weighted by the number of sites used in SSU (1,110) and LSU (1,861) rRNAs.
(d) Data set: rice (Oryza sativa), maize (Zea mays), soybean (Glycine max), and Zamia pumila (a cycad).
(e) Data set: rice [O. sativa] and lemon [Citrus limon; the 5.8S rRNA portion was from broad bean (Vicia faba), another dicot].
(f) Data set: Saccharomyces cerevisiae, Neurospora crassa, and Pneumocystis carinii.
(g) Data set: S. cerevisiae.
(h) Outgroups: Dictyostelium discoideum and Crithidia fasciculata (a flagellate).

Most nucleotide and protein sequences were extracted from the nucleic acid and protein sequence data bases: GenBank (release 57), EMBL (release 14), and NBRF-PIR (release 16). The tRNA and rRNA samples were also supplemented by recent compilations (Sprinzl et al. 1987 for tRNAs; Dams et al. 1988 for SSU rRNAs; Erdmann and Wolters 1986 for 5.8S rRNAs; and Gutell and Fox 1988 for LSU rRNAs). Zamia pumila and Pneumocystis carinii SSU rRNA sequences are from Edman et al. (1988) and Nairn and Ferl (1988). The corrected rabbit SSU rRNA sequence (Rairkar et al. 1988) was used. Sources of protein sequences other than the data bases are given in table 3.

Table 2
Proportions of Nucleotide Differences between Composite tRNA Sequences from Various Groups of Organisms

                  Angiosperms             Yeast                   Eubacteria              Archaebacteria
Mammals(a)        0.208 (0.244 ± 0.021)   0.263 (0.326 ± 0.025)   0.380 (0.535 ± 0.037)   0.354 (0.483 ± 0.034)
Angiosperms(b)                            0.281 (0.354 ± 0.027)   0.361 (0.497 ± 0.035)   0.353 (0.479 ± 0.033)
Yeast                                                             0.391 (0.558 ± 0.039)   0.407 (0.598 ± 0.042)
Eubacteria(c)                                                                             0.381 (0.534 ± 0.037)

NOTE. A total of 745 homologous sites were used, by appending end to end 10 isoacceptor tRNAs. Data in parentheses are intergroup distances, corrected for multiple hits, along with their SEs.
(a) Mammals: composite sequence of human, murine (Mus musculus), and bovine (Bos taurus) tRNAs.
(b) Angiosperms: composite sequence of wheat (Triticum aestivum), soybean (Glycine max), Petunia sp., and bean (Phaseolus vulgaris) tRNAs.
(c) Eubacteria: two complete 10-tRNA sets from Escherichia coli and Bacillus subtilis, respectively. Archaebacteria: Halobacterium volcanii.

Table 3
Proportions of Amino Acid Differences between Proteins from Various Groups of Organisms

                  Plants        Fungi         Bacteria
Animals:
  GAPDH           0.309 (319)   0.343         0.451
  TPI             0.375 (232)   0.459         0.563
  SOD             0.425 (132)   0.441         0.683
  ATPase          0.207 (410)   0.237         0.281
  hsp70           0.196 (451)   0.218         0.444
  Cytochrome c    0.361 (90)    0.334         0.531
  Average(a)      0.274         0.306         0.446
Plants:
  GAPDH                         0.320         0.463
  TPI                           0.451         0.559
  SOD                           0.453         0.697
  ATPase                        0.242         0.265
  hsp70                         0.220         0.465
  Cytochrome c                  0.398         0.549
  Average(a)                    0.306         0.451
Fungi:
  GAPDH                                       0.434
  TPI                                         0.568
  SOD                                         0.697
  ATPase                                      0.302
  hsp70                                       0.463
  Cytochrome c                                0.551
  Average(a)                                  0.446

NOTE. Protein data sets: GAPDH from Homo sapiens, Rattus norvegicus, Gallus gallus, Caenorhabditis elegans, Drosophila melanogaster, Homarus americanus, Sinapis alba, Nicotiana tabacum, Zea mays (Brinkmann et al. 1987), Saccharomyces cerevisiae, Zygosaccharomyces rouxii, Escherichia coli, Bacillus stearothermophilus, Thermus aquaticus, and Zymomonas mobilis (Conway et al. 1987); TPI from Homo sapiens, Oryctolagus cuniculus, G. gallus, Latimeria chalumnae, Z. mays, S. cerevisiae, Schizosaccharomyces pombe, Aspergillus nidulans, E. coli, and B. stearothermophilus; SOD from H. sapiens, R. norvegicus, Bos taurus, Sus scrofa, Equus caballus, Xiphias gladius (swordfish), D. melanogaster, Spinacia oleracea, Z. mays (Cannon et al. 1987), Brassica oleracea (Steffens et al. 1986), S. cerevisiae, Neurospora crassa, and Photobacterium leiognathi; ATPase from Xenopus laevis, Z. mays, Oenothera biennis, S. cerevisiae, E. coli, Rhodopseudomonas blastica, and Rhodospirillum rubrum; hsp70 from H. sapiens, R. norvegicus, G. gallus, X. laevis, D. melanogaster (loci 87C1 and 87A7), Z. mays, S. cerevisiae (gene YG100; Ingolia et al. 1982), E. coli, and B. megatherium (residues 343-392 and the C-terminus of gene YG100 are unsequenced; that part of the protein has been omitted from computations); cytochrome c from G. gallus, D. melanogaster, Helix aspersa, Asterias rubens, Eisenia foetida (annelid), Acer negundo, Ginkgo biloba, Phaseolus aureus, Oryza sativa, Triticum aestivum, S. cerevisiae (iso-1), S. pombe, Humicola lanuginosa, N. crassa, Ustilago sphaerogena, Nitrobacter winogradskyi, Paracoccus denitrificans, Rhodopseudomonas sphaeroides, Rhodopseudomonas viridis, Agrobacterium tumefaciens, R. rubrum, and Rhodomicrobium vannielii. Numbers in parentheses are number of sites of each protein used in the distance computations.
(a) Average of intergroup distances for each protein, weighted by the number of sites of that protein.

Protein sequences were aligned by eye with the help of the multiple alignment editing program of the UWGCG package (Devereux et al. 1984) and of alignments published in the source articles for the sequences. tRNAs and SSU rRNAs were aligned as published (Sprinzl et al. 1987; Dams et al. 1988). LSU rRNAs were aligned so that they fit the consensus secondary-structure folding model (Leffers et al. 1987). Only those regions of rRNAs that have been well conserved and can be unambiguously aligned have been considered.

Phylogenetic trees were built using two methods. The first is the transformed distance (TD) method (Farris 1977; Klotz et al. 1979; Li 1981), which uses one or more outgroups as references to correct unequal rates of evolution among the ingroup taxa. The second is the maximum parsimony method, using Felsenstein's PHYLIP package (Felsenstein 1981). For both methods, all sites involving an insertion-deletion event in any sequence were discarded. After multiple alignment and delimitation of the regions conserved enough to be retained, the number of different sites between each pair of sequences was computed. Interkingdom distances were then computed by averaging the number of differences between individual species. These distances were treated by the TD algorithm to obtain one of three possible trees (fig. 1). When using the TD method, distances were not corrected for multiple hits, because such correction has been shown not to improve the probability of finding the true phylogenetic tree (Li et al. 1987; Saitou and Nei 1987; Sourdis and Nei 1988).
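The grouping step of the TD method can be illustrated with a minimal sketch. It assumes the commonly cited form of the transformation, d'(i, j) = [d(i, j) - d(i, O) - d(j, O)]/2 + c, where O is the outgroup and c is the mean ingroup-to-outgroup distance; the function name and the choice to average the two outgroups of table 1 are illustrative assumptions, not necessarily the authors' exact procedure. The input values are the uncorrected weighted averages printed in table 1.

```python
# Sketch of a transformed-distance (TD) grouping step for three ingroup taxa.
# Assumed formula: d'(i, j) = (d(i, j) - d(i, O) - d(j, O)) / 2 + mean outgroup distance.

def transformed_distances(ingroup, to_outgroup):
    """Return transformed distances for every ingroup pair."""
    mean_out = sum(to_outgroup.values()) / len(to_outgroup)
    return {
        (i, j): (d - to_outgroup[i] - to_outgroup[j]) / 2 + mean_out
        for (i, j), d in ingroup.items()
    }

# Uncorrected average proportions of differences from table 1
# (A = animals, P = plants, F = fungi).
ingroup = {("A", "P"): 0.110, ("A", "F"): 0.126, ("P", "F"): 0.104}

# Distances to each outgroup; averaging the two outgroups is an assumption.
dictyostelium = {"A": 0.190, "P": 0.175, "F": 0.177}
crithidia = {"A": 0.227, "P": 0.224, "F": 0.221}
to_outgroup = {k: (dictyostelium[k] + crithidia[k]) / 2 for k in dictyostelium}

td = transformed_distances(ingroup, to_outgroup)
closest = min(td, key=td.get)  # pair joined first
print(closest)
```

In this sketch the smallest raw distance is the plant-fungus one (0.104), yet after the outgroup correction the animal-plant pair is the closest, which corresponds to tree A.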

After the tree-building procedure, branch lengths were calculated by the least-squares method from pairwise evolutionary distances corrected for multiple hits according to either Kimura's two-parameter model for nucleotide sequences or the Poisson model for protein sequences (see, e.g., Nei 1987, pp. 41, 67). This procedure allows us to compute the SE of the estimated internal branch length (Li 1989). The internal branch length is considered to be greater than zero at the 5% significance level if it is larger than twice its SE.
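The two corrections and the significance rule can be sketched as follows. The formulas are the standard textbook forms of Kimura's two-parameter distance (with P and Q the proportions of transitional and transversional differences) and of the Poisson correction; the function names are hypothetical, and the least-squares estimation of branch lengths and their SEs is not reproduced here.

```python
import math

def k2p_distance(p_transitions, q_transversions):
    """Kimura two-parameter distance from transition (P) and transversion (Q) proportions."""
    p, q = p_transitions, q_transversions
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

def poisson_distance(p):
    """Poisson-corrected distance from the proportion p of differing amino acid sites."""
    return -math.log(1 - p)

def is_significant(branch_length, se):
    """Rule stated in the text: the branch is > 0 at the 5% level if it exceeds twice its SE."""
    return branch_length > 2 * se

# Animal-plant average proportion of amino acid differences from table 3.
print(round(poisson_distance(0.274), 3))
# An internal branch estimate printed in figure 2 (1.8 with SE 0.4).
print(is_significant(1.8, 0.4))
```

The Kimura function needs transitions and transversions counted separately, so the corrected values of table 1 cannot be recomputed from the printed overall proportions alone.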

Results

SSU and LSU Ribosomal RNA Data

The SSU rRNA has been sequenced in a wide range of eukaryotic organisms (recently compiled in Dams et al. 1988). The LSU rRNA has been sequenced in comparatively few organisms: only three protist sequences are known, one of them, Dictyostelium, being incomplete at its 3'-end. We retained as outgroups the slime mold Dictyostelium and the flagellate Crithidia, for which both the SSU and LSU rRNA have been completely sequenced (or nearly so for Dictyostelium LSU). Table 1 shows the observed proportions of nucleotide differences, in the rRNA sequences, between the three kingdoms and the two outgroup organisms. In agreement with a previous report (Sogin et al. 1986), it appears that the SSU rRNA has evolved faster in the animal lineage than in the other two kingdoms: for this molecule, the distance from the animal group to an outgroup is consistently greater than that between plants or fungi and the same outgroup. Such a phenomenon does not happen with the LSU rRNA. Using the numbers of sites analyzed in SSU and LSU rRNAs as weights, we computed the average sequence differences for the five taxa analyzed (table 1). Figure 2 shows that the evolutionary tree inferred by the TD method from the rRNA data set supports topology A (fig. 1): animals and plants are grouped together while fungi have diverged earlier. The SE of the estimated length of the branch connecting the animal-plant node to the point of fungi divergence is also shown. The estimated branch length is greater than zero at the 5% significance level.

FIG. 2. Unrooted phylogenetic tree inferred from rRNA sequences. A total of 2,971 sites were analyzed. Here and in figs. 3 and 4, branch lengths are in percent substitutions, corrected for multiple hits. The length estimates of the two internal branches are followed by their SE. A = animals; P = plants; F = fungi. See table 1 for a description of the data set. [Tree drawing not reproduced. Legible labels: Dictyostelium, Crithidia; internal branch length 1.8 ± 0.4.]

A parsimony analysis was made on combined SSU and LSU rRNAs of the following organisms: the human and Xenopus rRNAs represent the animal kingdom; the plant kingdom is represented by a monocot (rice) and a composite dicot sequence (soybean for SSU, broad bean for 5.8S, and lemon for LSU); the only available fungal sequence is that of yeast; and Dictyostelium and Crithidia are the outgroups. The most parsimonious tree supports topology A (and also clusters rice with the dicot, and human with Xenopus) and requires 1,540 base changes. In contrast, trees B and C require, respectively, 1,549 and 1,547 changes.
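How a parsimony comparison of the three topologies works can be illustrated on a toy four-taxon case. The sketch below counts changes per site with Fitch's small-parsimony rule; it is not the PHYLIP program used in the paper, and the sequences are invented for illustration. Tree A groups animals with plants, as stated in the text; reading tree B as animals plus fungi and tree C as plants plus fungi is an assumption drawn from the Introduction.

```python
def fitch_site(a, b, c, d):
    """Minimum number of changes at one site on the unrooted tree ((a, b), (c, d))."""
    changes = 0

    def join(s, t):
        nonlocal changes
        common = s & t
        if common:
            return common
        changes += 1          # no shared state: one change is required
        return s | t

    join(join({a}, {b}), join({c}, {d}))
    return changes

def tree_length(w, x, y, z):
    """Total changes over all aligned sites for the tree ((w, x), (y, z))."""
    return sum(fitch_site(*column) for column in zip(w, x, y, z))

# Hypothetical aligned sequences, gap-free, for illustration only.
animal, plant, fungus, outgroup = "ACGTAC", "ACGTAT", "ATGTGT", "ATGAGT"

lengths = {
    "A": tree_length(animal, plant, fungus, outgroup),   # (animal, plant) grouped
    "B": tree_length(animal, fungus, plant, outgroup),   # (animal, fungus) grouped
    "C": tree_length(plant, fungus, animal, outgroup),   # (plant, fungus) grouped
}
best = min(lengths, key=lengths.get)
print(lengths, best)
```

Only sites where two taxa share one state and the other two share another discriminate between the topologies, which is why the reported tree lengths (1,540 versus 1,549 and 1,547) differ by so few changes.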

Transfer RNA Data

Looking for homologous tRNAs in the three kingdoms and an outgroup, we found 10 isoacceptor species: Asn-tRNA-GUU, Asp-tRNA-GUC, Arg-tRNA-ACG, Gly-tRNA-GCC, elongator Met-tRNA-CAU, initiator Met-tRNA-CAU, Phe-tRNA-GAA, Pro-tRNA-UGG, Trp-tRNA-CCA, and Tyr-tRNA-GUA, each of the 10 being shown with its anticodon. Isoacceptors from various organisms were considered homologous if they had the same anticodon. For each of the above tRNA families, a sequence is known in two eubacteria (Escherichia coli and Bacillus subtilis) and also in the archaebacterium Halobacterium volcanii. We also considered protist tRNAs, but too few of them were available for this analysis. The 10 tRNA sequences from each taxon were appended end to end to make a single string of sequences. We thus obtained both a mammalian compound sequence made with human, murine, and bovine tRNAs and an angiosperm compound sequence made out of wheat, soybean, petunia, and bean tRNAs.
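The concatenation and distance steps can be sketched as below. The sequences are short invented fragments, not real tRNAs, and the helper names are hypothetical; the sketch only shows aligned blocks being appended end to end, columns with a gap in any sequence being dropped (as described in Data and Methods), and the proportion of differing sites being computed.

```python
def concatenate(blocks):
    """Append aligned blocks end to end into one composite sequence."""
    return "".join(blocks)

def drop_gap_columns(alignment, gap="-"):
    """Remove every column in which any sequence has a gap."""
    names = list(alignment)
    columns = [col for col in zip(*(alignment[n] for n in names)) if gap not in col]
    return {n: "".join(col[i] for col in columns) for i, n in enumerate(names)}

def p_distance(x, y):
    """Proportion of sites that differ between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(x, y)) / len(x)

# Two hypothetical aligned blocks per taxon (illustration only).
alignment = {
    "mammals": concatenate(["GCAT", "TTG-A"]),
    "angiosperms": concatenate(["GCAA", "TTGCA"]),
    "yeast": concatenate(["GGAA", "TAGCA"]),
}
clean = drop_gap_columns(alignment)
print(p_distance(clean["mammals"], clean["angiosperms"]))
```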

FIG. 3. Unrooted phylogenetic trees inferred from concatenated tRNA sequences. Branch lengths are in percent substitutions. The SEs of the internal branch length estimates are given. A total of 745 sites were analyzed. [Tree drawings not reproduced. Legible labels: animals, angiosperms, yeast, eubacteria (first tree); animals, angiosperms, yeast, halobacterium (second tree).]

Table 2 shows the overall divergence of compound tRNA sequences. The resulting evolutionary trees are given in figure 3. When the eubacterial sequences are used as an outgroup, tree A is predicted with a statistical significance level of 8%. The Halobacterium outgroup leads to topology B but with an internal branch length not significantly greater than zero. The parsimony analysis confirms these results. When the two eubacterial sequences are used, tree A is the most parsimonious tree and requires 686 base changes, against 692 and 702 base changes, respectively, for trees B and C. With the archaebacterial outgroup, the most parsimonious tree is B, with 558 changes; tree A requires only one more change, while C requires 565 changes. The tRNA analysis thus disfavors topology C, while not allowing separation of the other two trees.

Protein Sequences

We found six sets of homologous polypeptides to be adequate for the present analysis. Two, GAPDH and TPI, are involved in glycolysis. The nuclei of plant cells harbor genes for two types of GAPDH: (1) the cytosolic, glycolytic isoenzymes and (2) the chloroplastic, photosynthetic isoenzymes (Brinkmann et al. 1987). The same phenomenon occurs with TPI plant genes (Marchionni and Gilbert 1986). For both GAPDH and TPI, the two types of plant genes display a sequence divergence corresponding to the prokaryote/eukaryote separation, and therefore the gene for the chloroplastic enzyme is thought to have been transferred, during the course of evolution, from the genome of the endosymbiotic chloroplast into the nuclear genome. From plants, we thus retained only the cytosolic GAPDH and TPI isoenzymes, for the others would be paralogous to the animal enzymes. The Cu/Zn SOD protects aerobic cells against oxygen toxicity. Eukaryotic cells contain a cytoplasmic Cu/Zn SOD and a mitochondrial Mn SOD. Prokaryotes usually contain an Mn SOD or an Fe SOD, which are homologous to each other but structurally unrelated to the Cu/Zn isoenzyme (Steffens et al. 1986). However, a Cu/Zn SOD has been discovered sporadically in bacteria, and its gene has been sequenced in Photobacterium leiognathi (Steinman 1987), thus giving us an outgroup sequence. The alpha subunit of ATPase is the largest subunit of the ATP-synthesizing enzyme complex in bacteria, mitochondria, and chloroplasts. We found three bacterial sequences for comparison with eukaryotic mitochondrial isoenzymes. Chloroplastic proteins, which have no homologue in animals or fungi, were not included in the analysis. Cytochrome c, encoded in the nucleus, is located in the mitochondria of all aerobic cells, and it functions in oxidative phosphorylation. Cytochrome c has been shown to be closely related to bacterial cytochromes c2 and c550, which we took as outgroups (Dayhoff 1978). Mitochondrial polypeptides, such as ATPase and cytochrome c, may be included in the present analysis as long as the acquisition of the symbiotic prokaryote from which mitochondria derive antedates the animal-plant-fungal divergence. Mitochondria have sometimes been suggested to be polyphyletic, but no evidence for this has been found in analyses of mitochondrial SSU rRNAs (Hunt et al. 1985). The mitochondrial genome is now thought to have originated once from a purple nonsulfur bacterium (Cavalier-Smith 1987b). Heat-shock protein hsp70 is one of a series of proteins whose expression is triggered in response to an increase in temperature of the cell environment. All of the above polypeptides have bacterial homologues and are present in most, if not all, eukaryotic cells; they evolved from a common ancestor established before the divergence of eukaryotes and eubacteria.

FIG. 4. Unrooted phylogenetic tree inferred from the pooled protein data set. A total of 1,634 sites were analyzed. Branch lengths are in percent substitutions. The SE of the internal branch length estimate is shown. See table 3 for a description of the data set. [Tree drawing not reproduced. Legible labels: animals, fungi, plants, bacteria.]

After alignment, these protein sequences were grouped by kingdom, and interkingdom distances were computed. Each of these six proteins has bacteria as the outgroup taxon, thus allowing us to pool them in a composite protein sequence data set (table 3). The phylogenetic tree inferred from these data is shown in figure 4. Here again, tree A is obtained and the internal branch length is greater than zero at the 5% significance level.

Relative Dates of Divergence of the Three Kingdoms

We noted above that LSU rRNA has evolved at similar rates in the three kingdoms (table 1). The LSU rRNA data may therefore be used to estimate the ratio of $T_1$ (time since the animal-plant divergence) to $T_2$ (time since the fungal divergence): $T_1/T_2 = 2d(A, P)/[d(A, F) + d(P, F)]$, where $d(X, Y)$ is the corrected distance between kingdoms X and Y. Since $d(A, P) = 0.127$, $d(A, F) = 0.151$, and $d(P, F) = 0.132$, we obtain $T_1/T_2 = 0.90$. The divergence between animals and plants has been estimated from the geological record to be 1.0-0.7 billion years ago (Gya) (Schopf et al. 1983). Therefore, $T_2$ is estimated to be 1.1-0.8 Gya. The upper bound of $T_2$ is close to the estimate (1.2 Gya) obtained by both Dickerson (1971) and Kimura and Ohta (1973) from cytochrome c sequence data.
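The arithmetic behind this estimate can be checked with a short sketch. Under equal rates, d(A, P) is proportional to twice the time since the animal-plant split, while d(A, F) and d(P, F) are each proportional to twice the time since the fungal divergence, so the ratio of d(A, P) to the mean of the two fungal distances estimates the ratio of the two times. The function name is hypothetical; the inputs are the corrected LSU distances quoted in the text.

```python
def divergence_ratio(d_ap, d_af, d_pf):
    """Ratio T1/T2 = 2 d(A,P) / [d(A,F) + d(P,F)], assuming equal rates."""
    return 2 * d_ap / (d_af + d_pf)

# Corrected LSU rRNA distances from table 1.
ratio = divergence_ratio(d_ap=0.127, d_af=0.151, d_pf=0.132)

# Geological estimate of the animal-plant divergence, in Gya (upper, lower).
t1_bounds = (1.0, 0.7)
t2_bounds = tuple(t1 / ratio for t1 in t1_bounds)

print(round(ratio, 2))
print([round(t2, 1) for t2 in t2_bounds])
```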

Discussion

We have computed the molecular distances by which homologous macromolecules differ in the kingdoms Animalia, Fungi, and Plantae. The presently available corpus of nucleic acid and protein sequence data allowed us to base this phylogenetic analysis on the two largest rRNAs, 10 isoacceptor tRNA families, and six highly conserved polypeptide sequences. When pooled, these three series of data yield, respectively, 2,971 sites for rRNAs, 745 sites for tRNAs, and 1,634 sites for proteins, for interkingdom comparisons relative to various outgroup sequences. While the tRNA data set is too small to yield statistically significant results, ribosomal RNA and protein sequences indicate the same phylogenetic relationship for the three kingdoms: plants and animals are sibling kingdoms that diverged from a common ancestor after the fungal lineage became separated (figs. 2-4).

A large amount of data has proved necessary to untangle this molecular phylogeny problem. The SSU rRNA alone yields confusing indications, favoring different phylogenies depending on the outgroup used (data not shown). This arises partly from the approximately twofold difference in evolutionary rate between the animal and plant lineages, a difference that greatly reduces the resolving power of phylogenetic reconstruction methods. LSU rRNA evolved at similar rates in all three kingdoms. SSU and LSU rRNAs can be pooled with either Dictyostelium or the flagellate Crithidia as an outgroup. The resulting internodal distance between the branching node for fungi and the node for the plant/animal split is statistically significant (fig. 2). The tRNA data provide additional support for the same phylogenetic tree when the eubacterial outgroup is used, though the results with the archaebacterial outgroup are not consistent. Also, individual protein sequences do not favor a unique topology (data not shown). This situation is understandable when one considers that the length of the branch between the two bifurcations is short (about one-tenth) relative to the branches leading to the three kingdoms (fig. 2).

It is essential that an analysis involve only orthologous sequence comparisons. Although rRNA genes are present in multiple copies in eukaryotic genomes, these copies are known to evolve in a concerted manner, with virtually no differences among the repeats of one species. When selecting the tRNA sequence sample, we considered all tRNAs with identical anticodons as homologous. It is possible that this procedure does not detect a few paralogous tRNAs, especially when eukaryotic, eubacterial, and archaebacterial sequences are simultaneously compared. Protein sequences, which are often encoded by gene families in higher eukaryotes, have to be checked for possible paralogy. GAPDH is encoded by only one functional gene and several pseudogenes in mammals (Hanauer and Mandel 1984; Allen et al. 1987). Two GAPDH genes are found in Drosophila, encoding products that differ by eight residues, much less than the ~75 amino acid differences between Drosophila and human (Tso et al. 1985). No isoenzymes are known among cytosolic plant GAPDH (Shih et al. 1986). Finally, the yeast genome harbors three GAPDH genes whose products differ from each other by <12% (Holland et al. 1983). Thus the divergence of nonallelic genes within each taxon is much smaller than interkingdom differences, suggesting that no gene duplication is more ancient than the kingdom splits. The same is true of TPI genes: the gene is unique in the fungi Schizosaccharomyces (Russell 1985) and Aspergillus (McKnight et al. 1986) and in chicken (Straus and Gilbert 1985); there is only one functional gene in mammals (Maquat et al. 1985); and at least nine genes or pseudogenes have been detected in the maize genome, but no isoenzymes are known (Marchionni and Gilbert 1986). No paralogy is involved in the Cu/Zn SOD comparisons: no isoenzymes are known in animals; a unique gene is present in human (Levanon et al. 1985), rat (Delabar et al. 1987), and Drosophila (Seto et al. 1987); and only one isoform has been detected in cabbage (Steffens et al. 1986). Two nuclear genes, however, encode two cytosolic SOD isoenzymes in maize, but only one has been sequenced (Cannon et al. 1987). The two mitochondrial polypeptides studied, ATPase alpha subunit and cytochrome c, are unique in all species. We cannot exclude the possibility that some hsp70 sequences analyzed are paralogous, because this polypeptide is encoded by several expressed genes in all three kingdoms (reviewed by Peterson et al. 1988 and Rochester et al. 1986 for plant genes). Only in Drosophila have two genes been completely sequenced; they have been found to diverge by only 3% at the protein level (Karch et al. 1981).
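
The reasoning used above for GAPDH, that paralogs within a taxon differ far less than sequences compared across taxa, can be written as a simple check. The following is only a minimal sketch: the function name and the comparison rule are illustrative assumptions rather than a procedure stated in the text, and only the two Drosophila figures (8 residues between the two genes, about 75 between Drosophila and human) are taken from the paragraph above.

```python
def duplication_may_predate_split(within_taxon_diff, between_taxon_diff):
    """Hypothetical screen: flag a gene family when paralogs inside one taxon
    differ at least as much as sequences compared between taxa, which would
    hint that the duplication is older than the split being studied."""
    return within_taxon_diff >= between_taxon_diff


# Figures quoted in the text for GAPDH (Tso et al. 1985).
drosophila_paralog_diff = 8   # residues differing between the two Drosophila genes
drosophila_human_diff = 75    # approximate residues differing, Drosophila vs. human

flag = duplication_may_predate_split(drosophila_paralog_diff, drosophila_human_diff)
print(flag)
```

A result of `False` is consistent with, but does not prove, orthology: it only says that the observed within-taxon divergence gives no sign of an ancient duplication.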

Besides the above six proteins, we also considered other proteins: alpha- and beta-tubulins, actin, calmodulin, and histones H2A, H3, and H4. However, when analyzed individually, each had an apparent rate of evolution differing by at least a factor of three between the three kingdoms. The same conclusion has been drawn by Little (1985) on the basis of tubulin sequence analysis. Therefore, these proteins were not retained, because tree reconstruction is difficult when branch lengths differ greatly (Felsenstein 1978; Li et al. 1987). Moreover, most of these eukaryote-specific proteins are encoded by gene families, which makes it uncertain whether the genes compared between species are orthologous.

We have pooled the six protein sets in order to increase the resolving power. These proteins have different evolutionary rates (table 3). Since we weighted the computed distances for each protein by the number of sites used, our procedure is equivalent to having used a composite sequence made of six proteins. This procedure is as legitimate as the analysis of the LSU rRNA, which contains regions evolving at various rates.
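
The equivalence between site-weighted pooling and a composite sequence can be illustrated with a short sketch. It assumes the simplest distance, the uncorrected proportion of differing sites, for which the equivalence is exact; whether it holds exactly for a corrected distance depends on the correction, which is not shown in this section. The function names and the toy alignments are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def pooled_distance(pairs):
    """Mean of per-protein distances, each weighted by its number of sites.

    pairs: list of (seq_a, seq_b) aligned sequence pairs, one per protein.
    """
    total_sites = sum(len(a) for a, _ in pairs)
    weighted_sum = sum(len(a) * p_distance(a, b) for a, b in pairs)
    return weighted_sum / total_sites


# Toy alignments for two species (hypothetical data, three "proteins").
pairs = [("MKVLA", "MKILA"), ("GDTHEQWS", "GDSHEQWT"), ("ACD", "ACD")]

pooled = pooled_distance(pairs)

# Composite sequence: concatenate the proteins and compute a single distance.
composite_a = "".join(a for a, _ in pairs)
composite_b = "".join(b for _, b in pairs)
composite = p_distance(composite_a, composite_b)

print(pooled, composite)
```

Both values agree because weighting each proportion by its number of sites simply recovers the total count of differences divided by the total number of sites.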

Many molecular phylogenetic analyses, all based on rRNA sequences, have been published recently, but most of them did not specifically deal with the phylogeny of the three higher eukaryotic kingdoms. 5S rRNA is convenient for intrakingdom phylogeny but cannot resolve the question of the animal-plant-fungal divergence (Hori and Osawa 1987). Recent reports based on SSU rRNA have focused on eukaryotic/archaebacterial relationships (Lake 1988) and on the phylogenies of bacteria (Woese 1987), protists (Gunderson et al. 1987; Vossbrinck et al. 1987), and the animal kingdom (Field et al. 1988). LSU rRNA has been used to determine the phylogeny of archaebacteria (Leffers et al. 1987), and that of protists by using the 400 5’-terminal nucleotides of the molecule (Baroin et al. 1988). Only one report specifically studied the divergence of the three kingdoms by using LSU rRNA (Hasegawa et al. 1985). These authors found that animals and fungi form a clade sharing a common ancestor after plants diverged, but they acknowledged that the resolution of their method was insufficient for statistical significance. There are two possible reasons for the difference between their results and ours: in their study (1) only transversion substitutions were used, greatly reducing the sample size; and (2) the Crithidia LSU rRNA sequence was not known then, thereby necessitating that Physarum, which is even more distantly related to higher eukaryotes, be used instead. The present report is, we think, the first to provide statistically conclusive molecular evidence for the phylogeny of the kingdoms Animalia, Fungi, and Plantae.

“Sequence information is innately more informative of evolutionary relationships than phenotypic information is” (Woese 1987). Thus, the determination of the phylogenetic relationships of the three kingdoms can help us to understand the evolution of various molecular, cellular, or developmental characters shared by two of the three kingdoms. The loss of chloroplasts appears to be a derived character which occurred in both the animal and fungal lineages. The use of the UGA triplet to encode tryptophan in animal and fungal but not in plant mitochondria has been taken as support for tree B (Cavalier-Smith 1987a). However, the Paramecium mitochondrion also uses UGA as a sense codon. Thus, this usage of UGA is not uniquely shared by animals and fungi.

On the other hand, animals and plants display similarities in their sexual reproduction in conformance with their relatedness. When sexually reproducing, these organisms form gametes of distinct morphology according to sex that fuse and develop into a diploid organism. The sexual reproduction of fungi is quite different, for it is based on conjugation that produces heterokaryotic cells whose nuclei ultimately fuse into a diploid zygote which immediately undergoes meiosis.

Acknowledgments

This study was supported by NIH grant GM30998. We thank D. Graur and N. Takahata for comments.

LITERATURE CITED

ALLEN, R. W., K. A. TRACH, and J. A. HOCH. 1987. Identification of the 37 KDa protein displaying a variable interaction with the erythroid membrane as glyceraldehyde-3-phosphate dehydrogenase. J. Biol. Chem. 262:649-653.

BAROIN, A., R. PERASSO, L.-H. QU, G. BRUGEROLLE, J.-P. BACHELLERIE, and A. ADOUTTE. 1988. Partial phylogeny of the unicellular eukaryotes based on rapid sequencing of a portion of 28S ribosomal RNA. Proc. Natl. Acad. Sci. USA 85:3474-3478.

BRINKMANN, H., P. MARTINEZ, F. QUIGLEY, W. MARTIN, and R. CERFF. 1987. Endosymbiotic origin and codon bias of the nuclear gene for chloroplast glyceraldehyde-3-phosphate dehydrogenase from maize. J. Mol. Evol. 26:320-328.

CANNON, R. E., J. A. WHITE, and J. G. SCANDALIOS. 1987. Cloning of cDNA for maize superoxide dismutase 2 (SOD2). Proc. Natl. Acad. Sci. USA 84:179-183.

CAVALIER-SMITH, T. 1987a. The origin of eukaryote and archaebacterial cells. Ann. NY Acad. Sci. 503:17-54.

-. 1987b. The simultaneous symbiotic origin of mitochondria, chloroplasts, and microbodies. Ann. NY Acad. Sci. 503:55-71.

CONWAY, T., G. W. SEWELL, and L. O. INGRAM. 1987. Glyceraldehyde-3-phosphate dehydrogenase gene from Zymomonas mobilis: cloning, sequencing, and identification of promoter region. J. Bacteriol. 169:5653-5662.

DAMS, E., L. HENDRIKS, Y. VAN DE PEER, J.-M. NEEFS, G. SMITS, I. VANDENBEMPT, and R. DE WACHTER. 1988. Compilation of small ribosomal subunit RNA sequences. Nucleic Acids Res. 16:r87-r173.

DAYHOFF, M. O. 1978. Atlas of protein sequence and structure. Vol. 5, Suppl. 3. National Biomedical Research Foundation, Washington, D.C.

DAYHOFF, M. O., P. J. MCLAUGHLIN, W. C. BARKER, and L. T. HUNT. 1975. Evolution of sequences within protein superfamilies. Naturwissenschaften 62:154-161.

DELABAR, J.-M., A. NICOLE, L. D’AURIOL, Y. JACOB, M. MEUNIER-ROTIVAL, F. GALIBERT, P.-M. SINET, and H. JÉRÔME. 1987. Cloning and sequencing of a rat superoxide dismutase cDNA. Eur. J. Biochem. 166:181-187.

DEVEREUX, J., P. HAEBERLI, and O. SMITHIES. 1984. A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.

DICKERSON, R. E. 1971. The structure of cytochrome c and the rates of molecular evolution. J. Mol. Evol. 1:26-45.

EDMAN, J. C., J. A. KOVACS, H. MASUR, D. V. SANTI, H. J. ELWOOD, and M. L. SOGIN. 1988. Ribosomal RNA sequence shows Pneumocystis carinii to be a member of the Fungi. Nature 334:519-522.

ERDMANN, V. A., and J. WOLTERS. 1986. Collection of published 5S, 5.8S and 4.5S ribosomal RNA sequences. Nucleic Acids Res. 14:r1-r59.

FARRIS, J. S. 1977. On the phenetic approach to vertebrate classification. Pp. 823-850 in M. K. HECHT, P. C. GOODY, and B. M. HECHT, eds. Major patterns in vertebrate evolution. Plenum, New York.

FELSENSTEIN, J. 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27:401-410.

-. 1981. Numerical methods for inferring evolutionary trees. Q. Rev. Biol. 57:379-404.

FIELD, K. G., G. J. OLSEN, D. J. LANE, S. J. GIOVANNONI, M. T. GHISELIN, E. C. RAFF, N. R. PACE, and R. A. RAFF. 1988. Molecular phylogeny of the animal kingdom. Science 239:748-753.

FITCH, W. M. 1976. The molecular evolution of cytochrome c in eukaryotes. J. Mol. Evol. 8:13-40.

GUNDERSON, J. H., H. ELWOOD, A. INGOLD, K. KINDLE, and M. L. SOGIN. 1987. Phylogenetic relationships between chlorophytes, chrysophytes, and oomycetes. Proc. Natl. Acad. Sci. USA 84:5823-5827.

GUTELL, R. R., and G. E. FOX. 1988. A compilation of large subunit RNA sequences presented in a structural format. Nucleic Acids Res. 16:r175-r269.

HANAUER, A., and J. L. MANDEL. 1984. The glyceraldehyde-3-phosphate dehydrogenase gene family: structure of a human cDNA and of an X chromosome linked pseudogene: amazing complexity of the gene family in mouse. EMBO J. 3:2627-2633.

HASEGAWA, M., Y. IIDA, T. YANO, F. TAKAIWA, and M. IWABUCHI. 1985. Phylogenetic relationships among eukaryotic kingdoms inferred from ribosomal RNA sequences. J. Mol. Evol. 22:32-38.

HOLLAND, J. P., L. LABIENIEC, C. SWIMMER, and M. J. HOLLAND. 1983. Homologous nucleotide sequences at the 5’ termini of messenger RNAs synthesized from the yeast enolase and glyceraldehyde-3-phosphate dehydrogenase gene families: the primary structure of a third yeast glyceraldehyde-3-phosphate dehydrogenase gene. J. Biol. Chem. 258:5291-5299.

HORI, H., and S. OSAWA. 1987. Origin and evolution of organisms as deduced from 5S ribosomal RNA sequences. Mol. Biol. Evol. 4:445-472.

HUNT, L. T., D. G. GEORGE, and W. C. BARKER. 1985. The prokaryote-eukaryote interface. BioSystems 18:223-240.

INGOLIA, T. D., M. R. SLATER, and E. A. CRAIG. 1982. Saccharomyces cerevisiae contains a complex multigene family related to the major heat shock-inducible gene of Drosophila. Mol. Cell. Biol. 2:1388-1398.

KARCH, F., I. TOEROEK, and A. TISSIERES. 1981. Extensive regions of homology in front of the two hsp70 heat shock variant genes in Drosophila melanogaster. J. Mol. Biol. 148:219-230.

KIMURA, M., and T. OHTA. 1973. Eukaryotes-prokaryotes divergence estimated by 5S ribosomal RNA sequences. Nature New Biol. 243:199-200.

KLOTZ, L. C., N. KOMAR, R. L. BLANKEN, and R. M. MITCHELL. 1979. Calculation of evolutionary trees from sequence data. Proc. Natl. Acad. Sci. USA 76:4516-4520.

LAKE, J. A. 1988. Origin of the eukaryotic nucleus determined by rate-invariant analysis of rRNA sequences. Nature 331:184-186.

LEFFERS, H., J. KJEMS, L. ØSTERGAARD, N. LARSEN, and R. A. GARRETT. 1987. Evolutionary relationships amongst archaebacteria: a comparative study of 23S ribosomal RNAs of a sulphur-dependent extreme thermophile, an extreme halophile and a thermophilic methanogen. J. Mol. Biol. 195:43-61.

LEVANON, D., J. LIEMAN-HURWITZ, N. DAFNI, M. WIGDERSON, L. SHERMAN, Y. BERNSTEIN, Z. LAVER-RUDICH, E. DANCIGER, O. STEIN, and Y. GRONER. 1985. Architecture and anatomy of the chromosomal locus in human chromosome 21 encoding the Cu/Zn superoxide dismutase. EMBO J. 4:77-84.

LI, W. H. 1981. Simple method for constructing phylogenetic trees from distance matrices. Proc. Natl. Acad. Sci. USA 78:1085-1089.

-. 1989. A statistical test of phylogenies estimated from sequence data. Mol. Biol. Evol. (accepted)

LI, W. H., K. H. WOLFE, J. SOURDIS, and P. M. SHARP. 1987. Reconstruction of phylogenetic trees and estimation of divergence times under nonconstant rates of evolution. Cold Spring Harbor Symp. Quant. Biol. 52:847-856.

LITTLE, M. 1985. An evaluation of tubulin as a molecular clock. BioSystems 18:241-247.

MCKNIGHT, G. L., P. J. O’HARA, and M. L. PARKER. 1986. Nucleotide sequence of the triosephosphate isomerase gene from Aspergillus nidulans: implications for a differential loss of introns. Cell 46:143-147.

MAQUAT, L. E., R. CHILCOTE, and P. M. RYAN. 1985. Human triosephosphate isomerase cDNA and protein structure: studies of triosephosphate isomerase deficiency in man. J. Biol. Chem. 260:3748-3753.

MARCHIONNI, M., and W. GILBERT. 1986. The triosephosphate isomerase gene from maize: introns antedate the plant-animal divergence. Cell 46:133-141.

NAIRN, C. J., and R. J. FERL. 1988. The complete nucleotide sequence of the small-subunit ribosomal RNA coding region for the cycad Zamia pumila: phylogenetic implications. J. Mol. Evol. 27:133-141.

NEI, M. 1987. Molecular evolutionary genetics. Columbia University Press, New York.

PETERSON, M. G., P. E. CREWTHER, J. K. THOMPSON, L. M. CORCORAN, R. L. COPPEL, G. V. BROWN, R. F. ANDERS, and D. J. KEMP. 1988. A second antigenic heat shock protein of Plasmodium falciparum. DNA 7:71-78.

RAIRKAR, A., H. M. RUBINO, and R. E. LOCKARD. 1988. Revised primary structure of rabbit 18S ribosomal RNA. Nucleic Acids Res. 16:3113.

ROCHESTER, D. E., J. A. WINER, and D. M. SHAH. 1986. The structure and expression of maize genes encoding the major heat shock protein, hsp70. EMBO J. 5:451-458.

RUSSELL, P. R. 1985. Transcription of the triose-phosphate-isomerase gene of Schizosaccharomyces pombe initiates from a start point different from that in Saccharomyces cerevisiae. Gene 40:125-130.

SAITOU, N., and M. NEI. 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4:406-425.

SCHOPF, J. W., J. M. HAYES, and M. R. WALTER. 1983. Evolution of earth’s earliest ecosystems: recent progress and unsolved problems. Pp. 361-384 in J. W. SCHOPF, ed. Earth’s earliest biosphere: its origin and evolution. Princeton University Press, Princeton, N.J.

SETO, N. O. L., S. HAYASHI, and G. M. TENER. 1987. The sequence of the Cu-Zn superoxide dismutase gene of Drosophila. Nucleic Acids Res. 15:10601.

SHIH, M.-C., G. LAZAR, and H. M. GOODMAN. 1986. Evidence in favor of the symbiotic origin of chloroplasts: primary structure and evolution of tobacco glyceraldehyde-3-phosphate dehydrogenases. Cell 47:73-80.

SOGIN, M. L., H. J. ELWOOD, and J. H. GUNDERSON. 1986. Evolutionary diversity of eukaryotic small-subunit rRNA genes. Proc. Natl. Acad. Sci. USA 83:1383-1387.

SOURDIS, J., and M. NEI. 1988. Relative efficiencies of the maximum parsimony and distance-matrix methods in obtaining the correct phylogenetic tree. Mol. Biol. Evol. 5:298-311.

SPRINZL, M., T. HARTMANN, F. MEISSNER, J. MOLL, and T. VORDERWÜLBECKE. 1987. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 15:r53-r188.

STEFFENS, G. J., A. M. MICHELSON, F. ÖTTING, K. PUGET, W. STRASSBURGER, and L. FLOHÉ. 1986. Primary structure of Cu-Zn superoxide dismutase of Brassica oleracea proves homology with corresponding enzymes of animals, fungi and prokaryotes. Biol. Chem. Hoppe Seyler 367:1007-1016.

STEINMAN, H. M. 1987. Bacteriocuprein superoxide dismutase of Photobacterium leiognathi: isolation and sequence of the gene and evidence for a precursor form. J. Biol. Chem. 262:1882-1887.

STRAUS, D., and W. GILBERT. 1985. Genetic engineering in the Precambrian: structure of the chicken triosephosphate isomerase gene. Mol. Cell. Biol. 5:3497-3506.

TSO, J. Y., X.-H. SUN, and R. WU. 1985. Structure of two unlinked Drosophila melanogaster glyceraldehyde-3-phosphate dehydrogenase genes. J. Biol. Chem. 260:8220-8228.

VOSSBRINCK, C. R., J. V. MADDOX, S. FRIEDMAN, B. A. DEBRUNNER-VOSSBRINCK, and C. R. WOESE. 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature 326:411-414.

WOESE, C. R. 1987. Bacterial evolution. Microbiol. Rev. 51:221-271.

WALTER M. FITCH, reviewing editor

Received November 10, 1988