CsRn1, a Novel Active Retrotransposon in a Parasitic ... Mol. Biol. Evol. 18(8):1474–1483. 2001 q...

10
1474 Mol. Biol. Evol. 18(8):1474–1483. 2001 q 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038 CsRn1, a Novel Active Retrotransposon in a Parasitic Trematode, Clonorchis sinensis, Discloses a New Phylogenetic Clade of Ty3/gypsy-like LTR Retrotransposons Young-An Bae,* Seo-Yun Moon,* Yoon Kong,² Seung-Yull Cho,² and Mun-Gan Rhyu* *Department of Microbiology, College of Medicine, Catholic University of Korea, Seoul, Korea; and ²Department of Molecular Parasitology, Sungkyunkwan University School of Medicine, Suwon, Korea We screened the genome of a trematode, Clonorchis sinensis, in order to identify novel retrotransposons and thereby provide additional information on retrotransposons for comprehensive phylogenetic study. Considering the vast potential of retrotransposons to generate genetically variable regions among individual genomes, randomly amplified polymorphic DNAs (RAPDs) detected by arbitrarily primed polymerase chain reactions were selected as candidates for retrotransposon-related sequences. From RAPD analysis, we isolated and characterized a novel retrotransposon in C. sinensis as the first member of uncorrupted long-terminal-repeat (LTR) retrotransposons in phylum Platyhel- minthes. The retrotransposon, which was named Clonorchis sinensis Retrotransposon 1 (CsRn1), showed a geno- mewide distribution and had a copy number of more than 100 per haploid genome. CsRn1 encoded an uninterrupted open reading frame (ORF) of 1,304 amino acids, and the deduced ORF exhibited similarities to the pol proteins of Ty3/gypsy-like LTR retrotransposons. The mobile activity of master copies was predicted by sequence analysis and confirmed by the presence of mRNA transcripts. Phylogenetic analysis of Ty3/gypsy-like LTR retrotransposons detected a new clade comprising CsRn1, Kabuki of Bombyx mori, and an uncharacterized element of Drosophila melanogaster. With its high repetitiveness and preserved mobile activity, it is proposed that CsRn1 may play a significant role in the genomic evolution of C. sinensis. Introduction A remarkable wealth of data is now available on the diversity and distribution of retrotransposons in nearly all eukaryotes (see Boeke and Stoye 1997 and references therein). As a class of transposable elements, retrotransposons move and integrate into new sites with- in genomes via reverse transcription of an RNA inter- mediate (Boeke et al. 1985) and can therefore provide genetic variations ranging from simple sequence poly- morphism within the elements to dramatic alterations in chromosomal structure (Kidwell and Lisch 1997; Fe- doroff 2000). Based on such vast potential to generate genetic variations, retrotransposons have been known as major agents in evolution that give rise to phenotypic variants (Long et al. 2000) and, in the long term, drive speciation (Mcdonald 1990). With the cumulative data on retrotransposons, var- ious studies have discussed the probable evolutionary course of retrotransposons, including studies of long- terminal-repeat (LTR) family, non-LTR family, and ex- ogenous retroviruses (Xiong and Eickbush 1990; Malik, Burke, and Eickbush 1999; Malik and Eickbush 1999). However, the phylogenetic relationship of diverse retro- transposons remains ambiguous, mainly with regard to factors concerning the branching patterns in phyloge- netic trees, such as the formation of polytomy (Malik and Eickbush 1999) and polyphyletic distribution (Malik Abbreviations: IN, integrase; PBS, primer-binding site; PPT, po- lypurine tract; PR, protease; RT, reverse transcriptase; TSD, target site duplication; RH, RNase H. Key words: Clonorchis sinensis, trematode, LTR retrotransposon, AP-PCR, RAPD, master copy. Address for correspondence and reprints: Mun-Gan Rhyu, De- partment of Microbiology, College of Medicine, Catholic University of Korea, Seoul 137-701, Korea. E-mail: [email protected]. and Eickbush 1999; Marı ´n and Llore ´ns 2000). Thus, the number of ancient classes responsible for diverse clades of retrotransposons and the extent of possible horizontal transfer between different species are difficult to define. Such ambiguities are likely to arise from limitations in sequence information on retrotransposons and unbal- anced sampling from each taxon. There have been a number of reports on the LTR retrotransposons of animals such as nematodes (Felder et al. 1994; Bowen and McDonald 1999), insects (Lin- dsley and Zimm 1992; Biessmann et al. 1999; Abe et al. 2000), echinoderms (Britten et al. 1995), and fish (Poulter and Butler 1998). In Platyhelminthes, however, Gulliver of Schistosoma japonicum (Laha et al. 2001) is the only full-length LTR retrotransposon so far de- scribed in the phylum, although Arkhipova and Mesel- son (2000) have recently reported a segmental sequence of an LTR retrotransposon in Dugesia. Moreover, the sequence of Gulliver is corrupted, even though its ex- pression has been demonstrated at the level of transcrip- tion by reverse transcription-PCR (RT-PCR) (Laha et al. 2001). Thus, currently there are no reports on uncor- rupted LTR retrotransposons, which may play a signif- icant role in the formation of genomes in Platyhelmin- thes, including trematodes. Retrotransposons introduce variations through their heterogeneous integration and subsequent sequence di- vergence, and these polymorphic regions can be iden- tified as randomly amplified polymorphic DNAs (RAPDs) by arbitrarily primed PCR (AP-PCR) (Abe et al. 1998). In the present study, we attempted to identify retrotransposons from the genome of Clonorchis sinen- sis, an important human liver fluke in East Asia, based on the analysis of RAPD sequences. By screening var- iable genetic regions from individual C. sinensis using the AP-PCR method, a retrotransposon, named Clonor- chis sinensis Retrotransposon 1 (CsRn1), was isolated

Transcript of CsRn1, a Novel Active Retrotransposon in a Parasitic ... Mol. Biol. Evol. 18(8):1474–1483. 2001 q...

1474

Mol. Biol. Evol. 18(8):1474–1483. 2001q 2001 by the Society for Molecular Biology and Evolution. ISSN: 0737-4038

CsRn1, a Novel Active Retrotransposon in a Parasitic Trematode,Clonorchis sinensis, Discloses a New Phylogenetic Cladeof Ty3/gypsy-like LTR Retrotransposons

Young-An Bae,* Seo-Yun Moon,* Yoon Kong,† Seung-Yull Cho,† and Mun-Gan Rhyu**Department of Microbiology, College of Medicine, Catholic University of Korea, Seoul, Korea; and †Department ofMolecular Parasitology, Sungkyunkwan University School of Medicine, Suwon, Korea

We screened the genome of a trematode, Clonorchis sinensis, in order to identify novel retrotransposons and therebyprovide additional information on retrotransposons for comprehensive phylogenetic study. Considering the vastpotential of retrotransposons to generate genetically variable regions among individual genomes, randomly amplifiedpolymorphic DNAs (RAPDs) detected by arbitrarily primed polymerase chain reactions were selected as candidatesfor retrotransposon-related sequences. From RAPD analysis, we isolated and characterized a novel retrotransposonin C. sinensis as the first member of uncorrupted long-terminal-repeat (LTR) retrotransposons in phylum Platyhel-minthes. The retrotransposon, which was named Clonorchis sinensis Retrotransposon 1 (CsRn1), showed a geno-mewide distribution and had a copy number of more than 100 per haploid genome. CsRn1 encoded an uninterruptedopen reading frame (ORF) of 1,304 amino acids, and the deduced ORF exhibited similarities to the pol proteins ofTy3/gypsy-like LTR retrotransposons. The mobile activity of master copies was predicted by sequence analysis andconfirmed by the presence of mRNA transcripts. Phylogenetic analysis of Ty3/gypsy-like LTR retrotransposonsdetected a new clade comprising CsRn1, Kabuki of Bombyx mori, and an uncharacterized element of Drosophilamelanogaster. With its high repetitiveness and preserved mobile activity, it is proposed that CsRn1 may play asignificant role in the genomic evolution of C. sinensis.

Introduction

A remarkable wealth of data is now available onthe diversity and distribution of retrotransposons innearly all eukaryotes (see Boeke and Stoye 1997 andreferences therein). As a class of transposable elements,retrotransposons move and integrate into new sites with-in genomes via reverse transcription of an RNA inter-mediate (Boeke et al. 1985) and can therefore providegenetic variations ranging from simple sequence poly-morphism within the elements to dramatic alterations inchromosomal structure (Kidwell and Lisch 1997; Fe-doroff 2000). Based on such vast potential to generategenetic variations, retrotransposons have been known asmajor agents in evolution that give rise to phenotypicvariants (Long et al. 2000) and, in the long term, drivespeciation (Mcdonald 1990).

With the cumulative data on retrotransposons, var-ious studies have discussed the probable evolutionarycourse of retrotransposons, including studies of long-terminal-repeat (LTR) family, non-LTR family, and ex-ogenous retroviruses (Xiong and Eickbush 1990; Malik,Burke, and Eickbush 1999; Malik and Eickbush 1999).However, the phylogenetic relationship of diverse retro-transposons remains ambiguous, mainly with regard tofactors concerning the branching patterns in phyloge-netic trees, such as the formation of polytomy (Malikand Eickbush 1999) and polyphyletic distribution (Malik

Abbreviations: IN, integrase; PBS, primer-binding site; PPT, po-lypurine tract; PR, protease; RT, reverse transcriptase; TSD, target siteduplication; RH, RNase H.

Key words: Clonorchis sinensis, trematode, LTR retrotransposon,AP-PCR, RAPD, master copy.

Address for correspondence and reprints: Mun-Gan Rhyu, De-partment of Microbiology, College of Medicine, Catholic Universityof Korea, Seoul 137-701, Korea. E-mail: [email protected].

and Eickbush 1999; Marın and Llorens 2000). Thus, thenumber of ancient classes responsible for diverse cladesof retrotransposons and the extent of possible horizontaltransfer between different species are difficult to define.Such ambiguities are likely to arise from limitations insequence information on retrotransposons and unbal-anced sampling from each taxon.

There have been a number of reports on the LTRretrotransposons of animals such as nematodes (Felderet al. 1994; Bowen and McDonald 1999), insects (Lin-dsley and Zimm 1992; Biessmann et al. 1999; Abe etal. 2000), echinoderms (Britten et al. 1995), and fish(Poulter and Butler 1998). In Platyhelminthes, however,Gulliver of Schistosoma japonicum (Laha et al. 2001) isthe only full-length LTR retrotransposon so far de-scribed in the phylum, although Arkhipova and Mesel-son (2000) have recently reported a segmental sequenceof an LTR retrotransposon in Dugesia. Moreover, thesequence of Gulliver is corrupted, even though its ex-pression has been demonstrated at the level of transcrip-tion by reverse transcription-PCR (RT-PCR) (Laha et al.2001). Thus, currently there are no reports on uncor-rupted LTR retrotransposons, which may play a signif-icant role in the formation of genomes in Platyhelmin-thes, including trematodes.

Retrotransposons introduce variations through theirheterogeneous integration and subsequent sequence di-vergence, and these polymorphic regions can be iden-tified as randomly amplified polymorphic DNAs(RAPDs) by arbitrarily primed PCR (AP-PCR) (Abe etal. 1998). In the present study, we attempted to identifyretrotransposons from the genome of Clonorchis sinen-sis, an important human liver fluke in East Asia, basedon the analysis of RAPD sequences. By screening var-iable genetic regions from individual C. sinensis usingthe AP-PCR method, a retrotransposon, named Clonor-chis sinensis Retrotransposon 1 (CsRn1), was isolated

CsRn1 Retrotransposon in a Trematode 1475

as the second complete but the first uncorrupted LTRretrotransposon identified in the phylum Platyhelmin-thes. Structural and genomic analyses of CsRn1 and itsphylogenetic relationship with other Ty3/gypsy-like LTRretrotransposons are presented.

Materials and MethodsParasite and DNA Extraction

Adult C. sinensis were collected from the livers ofexperimental rabbits which were challenged orally withmetacercariae obtained from naturally infected fish inKimhae, Korea, 3 months prior to dissection. Theworms were washed with physiological saline five timesat 48C. Fresh worms were used immediately for DNAextraction with the Wizard DNA Purification Kit (Pro-mega, Madison, Wis.) according to the manufacturer’sinstructions.

AP-PCR and Cloning of Individual Worm-SpecificProducts

Genomic DNAs extracted from individual wormswere used for the amplification of worm-specific RAPDregions by PCR under low-stringency conditions. Thefollowing arbitrarily designed primers were used in thePCR reactions: AP-1 (59-GATCCGTTCA-39), AP-3 (59-ACCCATACCC-39), B4 (59-GGACTGGAGT-39), B5(59-TGCGCCCTTC-39), B8 (59-GTCCACACGG-39),B12 (59-CCTTGACGCA-39), B13 (59-TTCCCCCGCT-39), F6 (59-GGGAATTCGG-39), and F15 (59-CCAG-TACTCC-39). The reaction mixtures included 40 ng ofgenomic DNA, 1.25 mM of primers, 0.2 mM each ofdATP, dGTP, dCTP, and dTTP, and 1.25 U of Taq poly-merase (Takara, Shiga, Japan) in a total reaction volumeof 20 ml. PCR conditions were as follows: 32 cycles of1 min at 948C, 1 min at 378C, and 2 min at 728C, anda final extension of 10 min at 728C. The reproducibilityof the results was tested by repeating the reactions underidentical conditions three times. The PCR products werefractionated by electrophoresis on agarose gels andstained with ethidium bromide. Individual-specificbands were recovered from agarose gels, reamplifiedwith the corresponding primers, and then cloned intopGEM-T Easy Vector (Promega) for nucleotidesequencing.

Southern Blot Hybridization

Five micrograms of genomic DNAs isolated fromC. sinensis adult worms were digested with restrictionenzymes. After being fractionated through 0.8% agarosegels, the DNAs were blotted onto nylon membrane (Hy-bond-N1; Amersham Pharmacia Biotech, Uppsala, Swe-den) by capillary action in 10 3 standard saline citrate(SSC). The blots were hybridized with probes enzymat-ically labeled with the ECL Direct Labeling Kit (Amer-sham Pharmacia Biotech). The labeling and hybridizingconditions were according to the manufacturer’s instruc-tions. The membranes were washed twice in 6 M urea,0.4% sodium dodecyl sulfate, and 0.1 3 SSC at 428C

for 20 min and twice in 2 3 SSC at room temperaturefor 5 min.

Dot-Blot Analysis

Ten micrograms of genomic DNA were blottedonto nylon membrane according to the standard proce-dure of dot-blot analysis (Sambrook, Fritsch, and Man-iatis 1989). Two identical membranes were prepared, ofwhich one was hybridized with probe for repetitive se-quences and the other was hybridized with that for cys-teine protease as a single copy control (GenBank acces-sion number AF271091). The probe for protease wasamplified from C. sinensis genomic DNA with theCsCP3-S1 (59-GCTGGACTCCGACTACCCATATG-39)and CsCP3-R3 (59-GGTTTAAACGATTGTGCATCGC-39) primers. Probe labeling and hybridizing conditionswere same as those for Southern blot hybridization. Theintensities of signals were measured using the LAS-1000plus system (FUJIFILM, Tokyo, Japan).

Construction and Screening of Genomic DNA Library

DNAs from adult worms were partially digestedwith Sau3AI (Takara). DNA fragments of 9;23 kb wererecovered by ultracentrifugation onto sucrose gradients,purified, and then cloned into lambda FIX II vector pre-digested with XhoI (Stratagene, La Jolla, Calif.). Theconstructs were packaged into lambda particles usingGigapack III Gold-11 packaging extract. The unampli-fied libraries were screened with DNA probe labeledwith the ECL labeling kit. The conditions for plaque-lift hybridization were identical to those for Southernblot hybridization. The inserts of lambda clones wereamplified by long PCR (Chen, Fockler, and Higuchi1994) using primers designed from vector regions (59-CTAATACGACTCACTATAGGGCGTCG-39 and 59-CCCTCACTAAAGGGAGTCGACTCG-39) and LATaq polymerase according to the standard cycle condi-tions (Takara). The amplified products were digestedwith restriction enzymes and were then cloned intopBluescript II SK(2) phagemid (Stratagene) for nucle-otide sequencing.

Screening of cDNA Library by PCR Methods

A cDNA library of adult C. sinensis was construct-ed in lZAP II vector using a cDNA synthesis kit (Stra-tagene) according to the manufacturer’s instructions.The library was amplified and was then used in standardPCR reactions for the detection of mRNA transcripts.The PCR products were cloned into pGEM-T Easy Vec-tor for sequencing.

Sequence Analysis

The nucleotide sequences were automatically de-termined with an ABI PRISM 377 DNA sequencer (Ap-plied Biosystems, Foster City, Calif.) and a BigDye Ter-minator Cycle Sequencing Reaction Kit (Perkin ElmerCorporation, Foster City, Calif.). To ensure the accuracyof sequencing reactions, sequences of single strands

1476 Bae et al.

FIG. 1.—Arbitrarily primed-PCR(AP-PCR) analysis of individual worms for isolating randomly amplified polymorphic DNAs (RAPD). A,Electrophoresis of AP-PCR products on agarose gel visualized by ethidium bromide staining. Separately extracted genomic DNAs from individualClonorchis sinensis worms were amplified using AP primers. Band shifts corresponding to individual-specific RAPD markers are indicated byarrows. Numbers at the top refer to individual worms, and primers used in each reaction are indicated above the numbers. The positions ofDNA size standards (in bp) are shown on the left. B, Southern blot hybridization of P-B13 to genomic DNAs of C. sinensis. Restrictionendonucleases used for the digestion of DNAs are indicated at the top. The positions of DNA size standards (in kb) are shown on the right.

from five clones of vector-ligated DNA fragments weredetermined. For PCR products obtained during cDNAlibrary screening, nucleotide sequences from bothstrands were determined. After sequencing, homologysearches were performed against the nonredundant da-tabase at the National Center for Biotechnology Infor-mation (NCBI; http://www.ncbi.nlm.nhm.gov/) usingBLASTN and BLASTX (Altschul et al. 1997). The RE-PEAT program in the GCG package, version 8. (Uni-versity of Wisconsin), was used to determine the directrepeat sequences. The putative open reading frames(ORFs) were predicted by GeneScan (Burge and Karlin1997), GeneMark (Borodovsky and Lukashin, unpub-lished; http://genemark.biology.gatech.edu/GeneMark/),and ORF Finder (NCBI). A search for the functionalprotein domains was performed using ProDom (Gouzy,Corpet, and Kahn 1999) and ProfileScan (Gribskov,McLachlan, and Eisenberg 1987).

Phylogenetic Analysis

The nucleotide sequences were aligned withClustalX (Thompson et al. 1997). After optimizing thesequence alignments using the PHYDIT program (Chun1995), divergence values were calculated and a dendro-gram was drawn using the programs DNADIST andNEIGHBOR, respectively, of PHYLIP (Felsenstein1993). Based on the previous reports (Xiong and Eick-bush 1990; Malik and Eickbush 1999), reverse transcrip-tase (RT), RNase H (RH), and integrase (IN) domains

in pol proteins were defined from alignments usingCLUSTAL W (Thompson, Higgins, and Gibson 1994),and amino acid sequences of the three domains werecombined for the phylogenetic analysis of LTR retro-transposons. After aligning the combined sequenceswith ClustalX and optimizing the alignment withGeneDoc (Nicholas and Nicholas 1997), a phylogeneticanalysis was performed using PROTDIST and NEIGH-BOR of PHYLIP. The trees were displayed by TreeView(Page 1996), and the statistical significance of branchingpoints was evaluated with 1,000 random samplings ofthe input sequence alignments using SEQBOOT.

ResultsIsolation of RAPDs by AP-PCR

With nine 10mer arbitrary primers, singly or inpairs, AP-PCRs were performed to find band shifts ingenomic DNAs separately extracted from individual C.sinensis worms. In AP-PCRs with DNAs from threeworms, three individual-specific band shifts were dis-covered (fig. 1A), and their sequences were used tosearch the GenBank database using the BLAST algo-rithm. The sequence of a band shift with the B13 primer,named P-B13 (GenBank accession number AZ551682),showed significant identity at the amino acid (aa) se-quence level to pol proteins in Kabuki of Bombyx mori(AB032718; identity value of 34%) and in bovine syn-cytial virus (U94514, 23%). Significant identities to anyknown genes were not found in the homology searches

CsRn1 Retrotransposon in a Trematode 1477

FIG. 2.—Overall structure of the CsRn1 long-terminal-repeat (LTR) retrotransposon. Boxes containing black arrows represent flanking LTRs.Striped boxes within the open reading frame (ORF) show the conserved functional domains of Gag, protease (PR), reverse transcriptase (RT),RNase H (RH), and three subdomains of integrase (IN). Duplicated target sites of 4 bp are represented as target site duplications (TSDs).Positions of P-B13 RAPD and LTR probe for southern hybridization (P-LTR) are indicated. A restriction map of the full-length CsRn1 elementis presented above the structure.

FIG. 3.—Sequences of the putative primer-binding site (PBS) and polypurine tract (PPT) of CsRn1. The sequences of 59 long-terminal-repeat (LTR) termini, PBS, PPT, and 39 LTR termini are aligned and presented. The 39 end of bovine tRNATrp is also presented in its corre-sponding region. Dots represent gaps introduced into sequences to increase their similarity. Sequences of Kabuki of Bombyx mori (AB032718)and an undescribed LTR retrotransposon of Drosophila melanogaster on AE003787 are used for the alignment.

using the sequences of the other two bands. To deter-mine the genomic distribution of P-B13, the P-B13DNA fragment was used to probe a Southern blot ofgenomic DNA from C. sinensis. Multiple bands of hy-bridization to the genomic DNA digested with a seriesof restriction enzymes indicated that P-B13 represents ahighly repetitive region (fig. 1B), which is one of theprominent features of retrotransposons. Thus, it was pro-posed that P-B13 is a portion of retrotransposon presentwithin the C. sinensis genome.

The Structure of a Novel LTR Retrotransposon, CsRn1

The overall structure of a novel retrotransposon en-compassing the P-B13 marker was determined fromthree randomly selected lambda clones, labeled lCs-1,lCs-2, and lCs-4, obtained by screening the genomicDNA library with the P-B13 probe (fig. 2). In each ofthese three clones, the full lengths of the CsRn1 retro-transposon differed slightly due to several insertions anddeletions (indels), and a consensus sequence determinedfrom the three clones had a size of 5,026 bp. The wholeunits of CsRn1 were bounded by direct repeats of 4 bpknown as target site duplications (TSDs) introduced dur-ing the process of integration.

The complete CsRn1 elements were flanked byLTRs with an average size of 471 bp (fig. 2). As in mostLTR retrotransposons (Boeke and Stoye 1997), the LTRscontained at both ends short inverted repeats of 3 bpthat initiated as TG and terminated as CA (59-TGT . . .ACA-39). A sequence motif (AATACA) similar to a typ-ical poly A signal sequence (AATAAA) was found with-in the LTR sequence, but a promoter signal sequence(TATA box) could not be defined. The sequence of 12bp adjacent to the 59-LTR was complementary to thenucleotides at the 39 ends of bovine and chicken t-

RNATrp. Thus, this region seems to be a likely primer-binding site (PBS) for the synthesis of the first cDNAstrand. An additional priming site (59-GGGGGAGTAG-39) for the synthesis of the second cDNA strand (poly-purine tract [PPT]) was found in the direct upstreamregion of 39-LTR (fig. 3).

An internal region of the CsRn1 copy from lambdaclone 4 (lCs-4) contained one large, uninterrupted ORF(1,304 aa). The copies from the other two lambda clones(lCs-1 and lCs-2) showed similar results after the se-quences were corrected for corruptions. The deducedORF included well-conserved functional domains in theorder protease (PR), RT, RH, and IN (fig. 2). Instead ofa conventional Gag motif (CCHC), the ORF had an ap-parent nucleic-acid-binding site (CHCC) at the predicted39 ends of Gag just prior to the DTG aspartic proteaseactive site. Although the CHCC motif is unusual, it isalso repetitively found in Gag proteins of Kabuki andAE003787 (fig. 4). The amino acid conservation (fig. 4)and domain order (PR-RT-RH-IN) in the ORF indicatedthat CsRn1 belongs to a family of Ty3/gypsy-like LTRretrotransposons (Pringle 1999).

Identification of CsRn1 Master Copies

CsRn1 elements were interspersed throughout thegenome of C. sinensis, rather than being tandemly ar-rayed or clustered at limited loci, and the copy numberwas estimated to be approximately 100 per haploid ge-nome under high-stringency conditions (0.1 3 SSC)(fig. 5). In addition to the 3 initially identified copies,11 copies of full-length CsRn1 elements were furtherobtained from the C. sinensis genomic library for com-parison of their overall sequences and structures. The 14CsRn1 copies (GenBank accession numbers AY013558–AY013571) shared an overall sequence identity of

1478 Bae et al.

FIG. 4.—Multiple-sequence alignments of functional protein domains from various long-terminal-repeat (LTR) retrotransposons. The aminoacid sequences of putative protein domains, Gag, protease, reverse transcriptase, RNase H, and integrase, are aligned. Seven conserved subdo-mains of reverse transcriptase are underlined (Xiong and Eickbush 1990). The amino acid sequences of CsRn1 domains are compared withthose of other LTR retrotransposons: Kabuki of Bombyx mori (AB032718), undescribed LTR retrotransposon of Drosophila melanogaster onAE003787, Sushi of Fugu rubripes (AF030881), and Osvaldo of Drosophila buzzatii (Z46728). For the CHCC Gag domain, only those ofKabuki and AE003787 are used for the alignment, since the other elements have a CCHC Gag motif instead of CHCC. Sequence identitieswith CsRn1 are highlighted in black, while similar residues are highlighted by gray boxes.

98.7%, which slightly differed in LTRs (98.3%) and ininternal coding regions (98.9%) (table 1). In three copies(CsRn1-4, CsRn1-7, and CsRn1-52), the reading frameswere retained, while in others they were corrupted byindels and/or stop codons introduced by base substitu-tions. All copies were bounded by TSDs of 4 bp andadjacent unique single-copy regions, supporting the het-erogeneous genomewide distribution of CsRn1.

A sequence alignment of 14 full-length CsRn1 cop-ies showed base substitutions shared by more than twocopies at numerous positions (90 of 5,023 bp), whichshowed clear subdivisions within the CsRn1 elements.For example, two nucleotide pairs at positions 2666 and2692 divided CsRn1 copies into two groups, designatedgroup G, with diagnostic bases of G·T, and group C,with C·C. The two groups were further distinguished bynucleotide pairs observed at eight other positions, al-though the bases at these positions were less distinctive(data not shown). This, together with the finding of well-conserved sequences at the remaining positions, dem-onstrated that the members of group G formed a clonallineage originating from a common progenitor copy.The three diagnostic bases at positions 66, 4240, and4619 suggested further divergence of this group into twosubgroups (fig. 6).

Short branch lengths relating the members of groupG in a phylogenetic tree (average length 5 0.001; fig.6) (Medstrand and Mager 1998) and nearly identicalLTR sequences of the copies (table 1) (Dangel et al.1995) suggested that the CsRn1 copies of this group

have recently been replicated. Together with the fact thatthe major fraction of the CsRn1 elements belong togroup G (7 of 14 sequenced copies), these results dem-onstrate a role of the elements belonging to group G asactive master copies for the multiplication of CsRn1 el-ements. The copies of group C also had numerous di-agnostic bases, but they were heterogeneous and showedhigher levels of sequence divergence when comparedwith group G (fig. 6). Thus, these divergent copies werethought to be inactive variant forms of CsRn1 that ex-panded prior to the expansion of group G.

The Mobile Activity of the CsRn1 Master Copy

The genomic distribution of the element was foundto be heterogeneous among individuals of C. sinensis,which supports the recent expansion of CsRn1 (fig. 7A).Based on this finding, we attempted to examine the pres-ence of CsRn1 transcripts in the total RNA moleculesextracted from the adult worms by Northern blot anal-ysis but failed to detect any signals when the blots wereprobed with P-B13 probe (data not shown). We thenperformed PCRs with higher sensitivity using a cDNAlibrary as template and four primer sets covering thefull-length of mRNA transcripts (see fig. 7B and its leg-end). As shown in figure 7B, 39-end and P-B13 regionswere well amplified, whereas 59-end and RT regionswere not amplified or weakly amplified, possibly due tothe incomplete extension of cDNAs from 39 ends duringthe construction of the cDNA library.

CsRn1 Retrotransposon in a Trematode 1479

FIG. 5.—Intragenomic distribution and copy number of CsRn1. A, Genomic Southern blots showing that CsRn1 is highly reiterated through-out the Clonorchis sinensis genome. Restriction endonucleases used for the digestion of DNAs are indicated at the top. Arrows in lane 1 (AccIdigests) indicate two prominent bands of 950 bp and 1,059 bp corresponding to the expected sizes of AccI digests spanning the LTRs andinternal sequences at the 59 and 39 regions, respectively (see fig. 2). The positions of DNA size standards (in kb) are shown on the right. B,Dot-blot analysis for determining the copy number of CsRn1. Each of two membranes was dotted in duplicate with varying amounts (mg) ofgenomic DNAs as shown on the right. The probes used for hybridization are indicated at the top: P-B13 for CsRn1 (see fig. 2) and Cys-Profor cysteine protease as a single-copy control. The signal intensities between the two blots were compared to estimate the copy number ofCsRn1.

Table 1Sequence Similarities Among CsRn1 Copies

CsRn1 COPY

LTR SEQUENCE

LTRsa ConsensusbCODING

SEQUENCEc

CsRn1-1 . . . . . . . .CsRn1-2 . . . . . . . .CsRn1-4 . . . . . . . .CsRn1-7 . . . . . . . .CsRn1-15. . . . . . .CsRn1-16. . . . . . .CsRn1-26. . . . . . .CsRn1-39. . . . . . .CsRn1-52. . . . . . .CsRn1-54. . . . . . .CsRn1-74. . . . . . .CsRn1-82. . . . . . .CsRn1-86. . . . . . .CsRn1-89. . . . . . .

99.1 (0.9)99.2 (0.9)99.6 (0.4)99.8 (0.2)

100 (0.0)99.8 (0.2)

100 (0.0)100 (0.0)100 (0.0)

99.2 (0.8)99.6 (0.4)

100 (0.0)98.9 (1.1)99.6 (0.4)

97.4 (2.7)99.0 (1.0)99.2 (0.9)99.5 (0.5)99.8 (0.2)99.3 (0.8)98.4 (1.6)99.4 (0.6)99.6 (0.4)98.9 (1.1)99.4 (0.6)99.8 (0.2)98.8 (1.2)96.8 (3.2)

98.5 (1.5)99.2 (0.8)99.4 (0.6)99.4 (0.6)99.6 (0.4)99.5 (0.5)98.3 (1.7)99.5 (0.5)99.5 (0.5)99.3 (0.7)99.6 (0.4)99.8 (0.2)99.3 (0.7)98.8 (1.2)

Note.—For pairwise comparison, the positions with gaps are excluded. Per-centages of similarity calculated as (different nucleotides/total nucleotides) 3100. Numbers in parentheses are numbers of different bases per 100 bp.

a Comparison between 59 and 39 long terminal repeats (LTRs) of each ele-ment.

b Average similarity between each flanking LTR and a consensus sequence.c Comparison between internal coding sequence and a consensus sequence.

The PCR products of 39-end regions were clonedand sequenced, and 12 randomly chosen clones showedsix different sequences. The 39 ends of CsRn1 transcriptslay within 141–146 bp downstream of the 59 end of the39 LTR and about 15 bp downstream of the presumed

AATACA poly A signal sequence. When compared withgenomic copies of CsRn1 elements, the transcriptsshared the same bases with the copies of group G atdiagnostic substitution positions (data not shown). A ho-mology search of CsRn1 sequence against dbEST ofGenBank revealed that a yet-unidentified CsRn1-like el-ement is also expressed in Schistosoma mansoni, one ofthe well-studied trematodes (fig. 7C). Since the expan-sion of retrotransposons is largely restricted to germcells and/or early embryonic stages, the frequency ofCsRn1 amplification could not be determined by North-ern blot analysis, in which the total RNA moleculesfrom the whole organism were used. However, the poly-morphic distribution of CsRn1 among individuals andthe presence of its mRNA transcripts suggest constant,uninterrupted expansion of the element responsible forthe high copy number.

The Evolutionary Relationship of CsRn1 with OtherRetrotransposons

The amino acid sequence of RT encoded in CsRn1showed strong homology with that of RT in many Ty3/gypsy-like LTR retrotransposons, particularly in Kabuki(identity value of 63%) and AE003787 (58%), and asimilar pattern of homology was also found with thesequences of IN (fig. 4). Moreover, the nucleotide se-quence of CsRn1 exhibited homology with the DNAsequences of the two elements over a relatively long

1480 Bae et al.

FIG. 6.—Phylogenetic analysis of the full-length CsRn1 elements. The analysis was conducted using PHYLIP based on an alignment offull-length nucleotide sequences. The tree was constructed using the neighbor-joining algorithm and was unrooted. The number at a particularnode indicates its percentage of appearances in 1,000 bootstrap replicates, and only values of .60% are indicated. Groupings supported by thebootstrap analysis, as well as by the diagnostic bases, are marked as thicker branches (see text).

FIG. 7.—Mobile activity of CsRn1. A, Southern blot analysis of CsRn1 with genomic DNAs separately extracted from individual worms(presented as numbers on the top) and P-LTR probe (see fig. 2). HindII restriction endonuclease was used for the digestion of the DNAs. Severalpolymorphic bands among individuals introduced by recent expansion of CsRn1 can be seen between 0.56 and 1.4 kb. B, Amplification of 59-end, reverse transcriptase (RT), P-B13 marker, and 39-end regions of CsRn1 from a cDNA library of Clonorchis sinensis by PCR methods.Primers used in each PCR reaction are as follows: T3 and 5LTR-R (59-CGACTAAATCCGCTGAATC-39) for the 59-end region; RT-F (59-GACGAAAGGTCCACCTGTC-39) and RT-R (59-GGGCAATGGTGAAATACCTG-39) for RT; Pro-F (59-TCTGGTTGAGCGTTTCCATC-39)and Pro-R (59-CACTAGAGCGTTGACCGTG-39) for P-B13; and T7 and 3LTR-F (59-GAAACTTGAAGTGAGCAAC-39) for the 39-end region.M 5 100-bp size marker. C, Homology search of CsRn1 genomic sequence against the dbEST of GenBank databases. The full-length CsRn1and three expressed sequence tags (ESTs) of Schistosoma mansoni with the highest match are presented. The hatched boxes indicate the matchedregions, and their positions in the CsRn1 sequence are shown, together with the homology values. The ESTs of S. mansoni are presented withaccession numbers in the databases and were obtained from cDNA libraries of male (AI977543) and female (AI975406 and AI976475) adultworms of the trematode.

range of nucleotide positions (Kabuki, 53.6% identity in2,972 nt; AE003787, 53.5% in 3,583 nt). In view ofthese observations, we performed a phylogenetic anal-ysis using amino acid sequences of pol proteins in orderto gain further understanding of the relationship ofCsRn1 with other Ty3/gypsy-like LTR retrotransposons.In addition to RT, the amino acid sequences of RH andIN were selected for this analysis to increase the reso-lution (Malik and Eickbush 1999).

In a previous report, Malik and Eickbush (1999)divided Ty3/gypsy-like LTR retrotransposons into eight

distinct clades. As shown in figure 8, the members ofthe eight clades were well separated in a tree constructedby the UPGMA method. Interestingly, however, CsRn1formed a previously undetected, tightly conserved cladewith Kabuki and AE003787. A similar clustering patternwas observed in a tree constructed with the neighbor-joining algorithm (data not shown), and the statisticalsignificance of the branching points was well supportedby bootstrap analysis. As members of a new clade (des-ignated the CsRn1 clade), the three elements shared anumber of common features, such as a CHCC Gag mo-

CsRn1 Retrotransposon in a Trematode 1481

FIG. 8.—Phylogenetic relationship of CsRn1 with other Ty3/gyp-sy-like retrotransposons. The analysis was based on an alignment ofamino acid sequences from reverse transcriptase to central subdomainof integrase (DDE motif), and the tree was constructed by the UPGMAalgorithm using PHYLIP and was unrooted. Branching nodes separat-ing each clade are represented as thicker bars, and bootstrap values ofthese nodes are shown as numbers at the nodes. The elements markedwith asterisks indicate retrotransposons belonging to the genus Erran-tivirus. The elements used in the analysis are as follows: 412, Dro-sophila melanogaster (X04132); AE003787, undescribed retrotranspo-son of D. melanogaster; AF262041, undescribed retrotransposon ofArabidopsis thaliana; AF026205 and U88169, undescribed retrotran-sposons of Caenorhabditis elegans; Athila, A. thaliana (AB005248);Blastopia, D. melanogaster (Z27119); Cer1, C. elegans (U15406);Cft1, Cladosporium fulvum (AF051915); Deal, Ananas comosus(Y12432); Cyclops, Vicia faba (AB007466); Gulliver, Schistosoma ja-ponicum (AF243513); Gypsy, D. melanogaster (M12927); Kabuki,Bombyx mori (AB032718); Mag, B. mori (S08405); MarY1, Tricho-loma matsukake (AB028236); Mdg1, D. melanogaster (X59545);Mdg3, D. melanogaster (X95908); Osvaldo, D. buzzatti (AJ133521);RIRE7, Oryza sativa (AB033235); Sushi, Fugu rubripes (AF030881);Ted, Trichoplusia ni (M32662); Tf1, Schizosaccharomyces pombe(M38526); Tom, D. ananassae (Z24451); Tv1, D. virilis (AF056940);Ty3-2, Saccharomyces cerevisiae (S53577); Ulysses, D. virilis(X56645); Woot, Tribolium castaneum (U09586); Yoyo, Ceratitis cap-itata (U60529); ZAM, D. melanogaster (AJ000387).

tif, a TSD of 4 bases (in the case of Kabuki, TSD cannotbe determined; see Abe et al. 2000), and sequence con-servations in the PBS and PPT (fig. 3), as well as func-tional protein domains (fig. 4).

Discussion

In the present study, we characterized CsRn1 fromthe genetically polymorphic regions among individualC. sinensis worms as the first member of uncorruptedLTR retrotransposons found in the phylum Platyhelmin-thes. The full-length CsRn1 encodes a single uninter-rupted ORF which resembles pol proteins of Ty3/gypsy-like elements. A phylogenetic analysis showed thatCsRn1 is a member of a distinct, previously undetectedclade of LTR retrotransposons which exhibit definitivecharacteristics such as highly conserved PBS (tRNATrp)and PPT, a highly conserved and unusual CHCC Gag

motif, a TSD of 4 bases, and strong similarity of func-tional protein domains.

For expansion, retrotransposons are transcribed byhost RNA polymerase and then reverse-transcribed bytheir own reverse transcriptase. Because these two en-zymes have no proofreading capacity (Varmus andBrown 1989), cDNAs produced during the process oftransposition tend to acquire random base substitutions.These base substitutions frequently inactivate the prog-eny copies, and the resulting ‘‘dead-on-arrival’’ (DOA)copies with no mobile activities are likely to be sub-jected to neutral evolution, through which the accumu-lation of sequence variations within the DOA copies isaccelerated (Petrov, Lozovskaya, and Hartl 1996). Thus,together with the low fidelity of RT, the neutral evolu-tion of the inactive DOA copies may have an additionaleffect on sequence divergence among individual ge-nomes of a species. In this study, a variable genomicregion among individuals was successfully identified byAP-PCR-based RAPD analysis as CsRn1 LTR retrotran-sposon-related sequence.

During an evolutionary time, a particular subset ofretrotransposons expands differentially, rather than si-multaneously, from other variant subsets with selectiveadvantage for expansion (Clough et al. 1996). Thus, arecently expanded master copy can be distinguished asthe subset with the largest population (Boissinot, Chev-ret, and Furano 2000) and with low levels of sequencedivergence among its members (Medstrand and Mager1998). In addition to these characteristics, high sequenceidentity between flanking LTRs of individual copies canbe accepted as a hallmark of their recent integration incases of LTR retrotransposons, since a pair of LTRs usethe same sequences as templates for their replications(Dangel et al. 1995). CsRn1 copies of group G satisfiedall of these criteria (table 1 and fig. 6), suggesting therole of group G as an active master copy, and the pre-served mobile activity was confirmed by the uncorrupt-ed ORF, heterogeneous distribution among individualgenomes (fig. 7A), and the presence of mRNA tran-scripts of CsRn1 (fig. 7B).

The coding capacity of CsRn1 suggests that the el-ement is a member of metaviruses, which have no envgene (Pringle 1999). Although no significant homologyto other LTR retrotransposons at the nucleotide sequencelevel was found throughout the whole unit, sequencemotifs for the synthesis of double-stranded cDNA (PBSand PPT) showed strong identity with those of Kabuki(B. mori) and AE003787 (D. melanogaster) (fig. 3).Moreover, the amino acid sequences in the unusualCHCC Gag motif, RT, and IN were well conserved inthe three elements (fig. 4). With these shared properties,a phylogenetic analysis suggested that CsRn1 formed anancient, previously undetected clade of Ty3/gypsy-likeLTR retrotransposons found in insects and trematodes(CsRn1 clade; fig. 8). The nucleotide sequences of Ka-buki and AE003787 in their putative ORF regions arecorrupted, which inactivates the elements and gives riseto low copy numbers (the number of Kabuki was esti-mated as eight in the haploid genome; Abe et al. 2000).However, several copies of CsRn1 were uncorrupted and

1482 Bae et al.

showed maintained mobile capacity (fig. 7), suggestingthat the element might have acquired high copy numbersin C. sinensis genomes (more than 100 per haploid ge-nome; fig. 5) through its continuous expansion.

Although they had a phylogenetically close rela-tionship, differences in the numbers of ORFs and nosignificant homology in nucleotide sequences were ob-served among the members of the new clade, whichreduces the probability of horizontal transfer betweeninsects and trematodes in the recent past. Thus, it islikely that these elements evolved from a common an-cestor that was present in the common progenitor ofinsects and trematodes or was transferred from insectsto trematodes, or vice versa, during the early stage ofthe divergence of the two taxa. However, the possibilityof horizontal transfer between insects and trematodes isuncertain because few cases, only within similar species,have been reported (Gonzalez and Lessios 1999; Jordan,Matyunina, and McDonald 1999). The isolation of fur-ther elements belonging to the CsRn1 clade in insectsand trematodes or the finding of an errantivirus(s) thatis phylogenetically related to the clade (Malik and Eick-bush 1999) will be helpful in understanding the detailedevolutionary course among the members of the clade.

Only a few data on LTR retrotransposons are avail-able for the Platyhelminthes, despite the large contentof repetitive elements in their genomes (see Regev,Lamb, and Jablonka 1998 and references therein). Mod-el organisms such as D. melanogaster, A. thaliana, andS. cerevisiae, commonly used in previous studies, havegenome structures with relatively low complexity andrepetitive elements with low copy numbers, whichmakes it difficult to estimate the actual significance ofretrotransposons in the complex genome of the animal.Thus, the present study using a trematode provides ad-vantages for the study of retrotransposons with small butcomplex genomes. We confirmed the presence of partialsequences similar to those of CsRn1 in other trematodes,S. mansoni (fig. 7C) and Paragonimus westermani (un-published data). These results reflect the presence of aunique LTR retrotransposon family belonging to theCsRn1 clade in the lower animal taxa which may playsignificant roles in the evolution of genomes. Our resultsconcerning an active LTR retrotransposon and its relatedelements will broaden the current knowledge on LTRretrotransposons and provide a clue for further studieson the evolutionary origin of diverse reverse-transcrib-ing elements.

Supplementary Material

All of the nucleotide sequences described in this articlewere deposited in GenBank of NCBI, and their accessionnumbers are as follows: P-B13 RAPD marker, AZ551682;CsRn1-1, AY013558; CsRn1-2, AY013570; CsRn1-4,AY013569; CsRn1-7, AY013563; CsRn1-15, AY013566;CsRn1-16, AY013562; CsRn1-26, AY013571; CsRn1-39,AY013564; CsRn1-52, AY013565; CsRn1-5lb4,AY013568; CsRn1-74, AY013560; CsRn1-82, AY013561;CsRn1-86, AY013567; CsRn1-89, AY013559.

A sequence alignment of multiple CsRn1 copieswas deposited in GenBank in linked form to each nu-cleotide set, of which accession numbers are presentedabove, and the alignment of pol proteins used for phy-logenetic analysis in this work was deposited in linkedform to the nucleotide sequence of CsRn1-4(AY013569).

Acknowledgments

This work was supported by a grant from the KoreaScience and Engineering Foundation (KOSEF) (number1999 2-208-004 5).

LITERATURE CITED

ABE, H., M. KANEHARA, T. TERADA, F. OHBAYASHI, T. SHI-MADA, S. KAWAI, M. SUZUKI, T. SUGASAKI, and T. OSHIKI.1998. Identification of novel random amplified polymorphicDNAs (RAPDs) on the W chromosome of the domesticatedsilkworm, Bombyx mori, and the wild silkworm, B. man-darina, and their retrotransposable element-related nucleo-tide sequences. Genes Genet. Syst. 73:243–254.

ABE, H., F. OHBAYASHI, T. SHIMADA, T. SUGASAKI, S. KAWAI,K. MITA, and T. OSHIKI. 2000. Molecular structure of anovel gypsy-Ty3-like retrotransposon (Kabuki) and nestedretrotransposable elements on the W chromosome of thesilkworm Bombyx mori. Mol. Gen. Genet. 263:916–924.

ALTSCHUL, S. F., T. L. MADDEN, A. A. SCHAFFER, J. ZHANG,Z. ZHANG, W. MILLER, and D. J. LIPMAN. 1997. GappedBLAST and PSI-BLAST: a new generation of protein da-tabase search programs. Nucleic Acids Res. 25:3389–3402.

ARKHIPOVA, I., and M. MESELSON. 2000. Transposable ele-ments in sexual and ancient asexual taxa. Proc. Natl. Acad.Sci. USA 97:14473–14477.

BIESSMANN, H., M. F. WALTER, D. LE, S. CHUAN, and J. G.YAO. 1999. Moose, a new family of LTR-retrotransposonsin the mosquito Anopheles gambiae. Insect Mol. Biol. 8:201–212.

BOEKE, J. D., D. J. GARFINKEL, C. A. STYLES, and G. R. FINK.1985. Ty elements transpose through an RNA intermediate.Cell 40:491–500.

BOEKE, J. D., and J. P. STOYE. 1997. Retrotransposons, endog-enous retroviruses, and the evolution of retroelements. Pp.343–435 in J. M. COFFIN, S. H. HUGHES, and H. E. VAR-MUS, eds. Retroviruses. Cold Spring Harbor LaboratoryPress, New York.

BOISSINOT, S., P. CHEVRET, and A. V. FURANO. 2000.L1(LINE-1) retrotransposon evolution and amplification inrecent human history. Mol. Biol. Evol. 17:915–928.

BOWEN, N. J., and J. F. MCDONALD. 1999. Genomic analysisof Caenorhabditis elegans reveals ancient families of ret-roviral-like elements. Genome Res. 9:924–935.

BRITTEN, R. J., T. J. MCCORMACK, T. L. MEARS, and E. H.DAVIDSON. 1995. Gypsy/Ty3-class retrotransposons inte-grated in the DNA of herring, tunicate, and echinoderms. J.Mol. Evol. 40:13–24.

BURGE, C., and S. KARLIN. 1997. Prediction of complete genestructures in human genomic DNA. J. Mol. Biol. 268:78–94.

CHEN, S., C. FOCKLER, and R. HIGUCHI. 1994. Efficient am-plification of long targets from human genomic DNA andcloned inserts. Proc. Natl. Acad. Sci. USA 91:5695–5699.

CHUN, J. 1995. PHYDIT. Distributed by the author (http://kctc2.kribb.re.kr/jchun/phydit/right.html).

CsRn1 Retrotransposon in a Trematode 1483

CLOUGH, J. E., J. A. FOSTER, M. BARNETT, and H. A. WICH-MAN. 1996. Computer simulation of transposable elementevolution: random template and strict master models. J.Mol. Evol. 42:52–58.

DANGEL, A. W., B. J. BAKER, A. R. MENDOZA, and C. Y. YU.1995. Complement component of C4 gene intron 9 as aphylogenetic marker for primates: long terminal repeats ofthe endogenous retrovirus ERV-K (C4) are a molecularclock of evolution. Immunogenetics 42:41–52.

FEDOROFF, N. 2000. Transposons and genome evolution inplants. Proc. Natl. Acad. Sci. USA 97:7002–7007.

FELDER, H., A. HERZCEG, Y. DE CHASTONAY, P. AEBY, H. TOB-LER, and F. MULLER. 1994. Tas, a retrotransposon from theparasitic nematode Ascaris lumbricoides. Gene 149:219–225.

FELSENSTEIN, J. 1993. PHYLIP (phylogeny inference package).Version 3.5c. Distributed by the author (http://evolution.genetics.washington.edu/phylip.html), Department of Ge-netics, University of Washington, Seattle.

GONZALEZ, P., and H. A. LESSIOS. 1999. Evolution of sea ur-chin retroviral-like (SURL) elements: evidence from 40Echinoid species. Mol. Biol. Evol. 16:938–952.

GOUZY, J., F. CORPET, and D. KAHN. 1999. Whole genomeprotein domain analysis using a new method for domainclustering. Comput. Chem. 23:330–340.

GRIBSKOV, M., A. D. MCLACHLAN, and D. EISENBERG. 1987.Profile analysis: detection of distantly related proteins. Proc.Natl. Acad. Sci. USA 84:4355–4358.

JORDAN, I. K., L. V. MATYUNINA, and J. F. MCDONALD. 1999.Evidence for the recent horizontal transfer of long terminalrepeat retrotransposon. Proc. Natl. Acad. Sci. USA 96:12621–12625.

KIDWELL, M. G., and D. LISCH. 1997. Transposable elementsas sources of variation in animals and plants. Proc. Natl.Acad. Sci. USA 94:7704–7711.

LAHA, T., A. LOUKAS, C. K. VERITY, D. P. MCMANUS, and P.J. BRINDLEY. 2001. Gulliver, a long terminal repeat retro-transposon from the genome of the oriental blood flukeSchistosoma japonicum. Gene 264:59–68.

LINDSLEY, D. L., and G. G. ZIMM. 1992. The genome of Dro-sophila melanogaster. Academic Press, New York.

LONG, A. D., R. F. LYMAN, A. H. MORGAN, C. H. LANGLEY,and T. F. C. MACKAY. 2000. Both naturally occurring in-sertions of transposable elements and intermediate frequen-cy polymorphisms at the achaete-scute complex are asso-ciated with variation in bristle number in Drosophila me-lanogaster. Genetics 154:1255–1269.

MCDONALD, J. F. 1990. Macroevolution and retroviral ele-ments. Bioscience 40:183–191.

MALIK, H. S., W. D. BURKE, and T. H. EICKBUSH. 1999. Theage and evolution of non-LTR retrotransposable elements.Mol. Biol. Evol. 16:793–805.

MALIK, H., and T. H. EICKBUSH. 1999. Modular evolution ofthe integrase domain in the Ty3/Gypsy class of LTR retro-transposons. J. Virol. 73:5186–5190.

MARIN, I., and C. LLORENS. 2000. Ty3/Gypsy retrotransposons:description of new Arabidopsis thaliana elements and evo-lutionary perspectives derived from comparative genomicdata. Mol. Biol. Evol. 17:1040–1049.

MEDSTRAND, P., and D. L. MAGER. 1998. Human-specific in-tegrations of the HERV-K endogenous retrovirus family. J.Virol. 72:9782–9787.

NICHOLAS, K. B., and H. B. NICHOLAS JR. 1997. GeneDoc: atool for editing and annotation multiple sequence align-ments. Distributed by the authors (www.cris.com/;ketchup/genedoc.shtmi).

PAGE, R. D. 1996. Tree View: an application to display phy-logenetic trees on personal computers. Comput. Appl. Bio-sci. 12:357–358.

PETROV, D. A., E. R. LOZOVSKAYA, and D. L. HARTL. 1996.High intrinsic rate of DNA loss in Drosophila. Nature 384:346–349.

POULTER, R., and M. BUTLER. 1998. A retrotransposon familyfrom the pufferfish (fugu) Fugu rubripes. Gene 215:241–249.

PRINGLE, C. R. 1999. Virus taxonomy—1999. The universalsystem of virus taxonomy, updated to include the new pro-posals ratified by the International Committee on Taxonomyof Viruses during 1998. Arch. Virol. 144:421–429.

REGEV, A., M. J. LAMB, and E. JABLONKA. 1998. The role ofDNA methylation in invertebrates: developmental regula-tion or genome defense? Mol. Biol. Evol. 15:880–891.

SAMBROOK, J., E. F. FRITSCH, and T. MANIATIS. 1989. Molec-ular cloning: a laboratory manual. 2nd edition. Cold SpringHarbor Press, New York.

THOMPSON, J. D., T. J. GIBSON, F. PLEWNIAK, F. JEANMOUGIN,and D. G. HIGGINS. 1997. The ClustalX windows interface:flexible strategies for multiple sequence alignment aided byquality analysis tools. Nucleic Acids Res. 25:4876–4882.

THOMPSON, J. D., D. G. HIGGINS, and T. J. GIBSON. 1994.CLUSTAL W: improving the sensitivity of progressive mul-tiple sequence alignment through sequence weighting, po-sition-specific gap penalties and weight matrix choice. Nu-cleic Acids Res. 22:4673–4680.

VARMUS, H., and P. BROWN. 1989. Retroviruses. Pp. 53–108in D. E. Berg and M. H. Howe, eds. Mobile DNA. Amer-ican Society of Microbiology, Washington, D.C.

XIONG, Y., and T. H. EICKBUSH. 1990. Origin and evolution ofretroelements based upon their reverse transcriptase se-quences. EMBO J. 9:3353–3362.

KENNETH WOLFE, reviewing editor

Accepted April 4, 2001