A New Member of a Family of Site-Specific Retrotransposons Is ...


Transcript of A New Member of a Family of Site-Specific Retrotransposons Is ...

Page 1: A New Member of a Family of Site-Specific Retrotransposons Is ...

Vol. 11, No. 12

A New Member of a Family of Site-Specific Retrotransposons Is Present in the Spliced Leader RNA Genes of Trypanosoma cruzi

MERCEDITAS S. VILLANUEVA,* SUZANNE P. WILLIAMS, CHARLES B. BEARD, FRANK F. RICHARDS, AND SERAP AKSOY

Yale-MacArthur Center for Molecular Parasitology and Department of Internal Medicine, Yale University School of Medicine, P.O. Box 3333, New Haven, Connecticut 06510

Received 15 May 1991/Accepted 29 August 1991

A new member of a family of site-specific retrotransposons is described in the New World trypanosome Trypanosoma cruzi. This element, CZAR (cruzi-associated retrotransposon), resembles two previously described retrotransposons found in the African trypanosome T. brucei gambiense and the mosquito trypanosomatid Crithidia fasciculata in specifically inserting between nucleotides 11 and 12 of the highly conserved 39-mer of the spliced leader RNA (SL-RNA) gene. CZAR is similar in overall organization to the other two SL-RNA-associated elements. It possesses two potential long open reading frames which resemble the gag and pol genes of retroviruses. In the pol open reading frame, all three elements contain similarly arranged endonuclease domains and share extensive amino acid homology in the reverse transcriptase region. All are associated with the SL-RNA gene locus and are present in low copy numbers. They do not appear to have 5' truncated versions. All three retrotransposons are otherwise quite distinct from one another, with no significant overall amino acid homology. The presence of such retroelements inserted into the identical site within SL-RNA gene sequences in at least three evolutionarily distant trypanosomatid species argues for a functional role. Because these elements appear to have a precise target site requirement for integration, we refer to them as SL siteposons.

The formation of mRNA in members of the family Trypanosomatidae involves trans splicing of the 5'-end 39 nucleotides (nt) of a small nonpolyadenylated transcript, the spliced leader RNA (SL-RNA), to all pre-mRNAs posttranscriptionally (reviewed in references 1 and 9). The SL-RNA is composed of two parts: the trans-spliced 39-mer which is highly conserved between species and the 3' nonspliced portion which varies in both size and sequence among trypanosomatids (15). The mechanisms and reasons underlying discontinuous transcription of protein coding genes remain unclear (9). In all trypanosomatid species examined, multiple copies of SL-RNA genes exist in tandem arrays in discrete genomic loci (2, 16, 22, 39, 40). It has previously been reported that in the African trypanosome Trypanosoma brucei gambiense, several copies of these SL-RNA genes are interrupted by a 5.5- to 7.0-kb retrotransposon, SLACS (spliced leader-associated conserved sequence) (2, 3) or MAE (miniexon donor RNA gene-associated element) (11). A similar 4.0-kb element, CRE1, has been described in the distantly related mosquito trypanosomatid Crithidia fasciculata (23). SLACS and CRE1 both interrupt SL-RNA genes between nucleotides 11 and 12 of the SL 39-mer. Such site specificity of integration is unusual among retroviruses and retrotransposons.

Trypanosoma cruzi is a New World trypanosome that causes Chagas' disease in some rural and poverty-stricken areas of Latin America. Like T. brucei gambiense, it cycles between a vertebrate and insect host. Both of these trypanosomes are thought to be evolutionarily remote from C. fasciculata, which is presumed to be monogenetic and nonpathogenic to humans (51). Furthermore, T. cruzi and T. brucei gambiense cause an entirely distinct spectrum of diseases in nonoverlapping geographic distributions. Despite these differences, all three trypanosomatids are similar with respect to the organization of their SL-RNA genes. In this report, we describe that T. cruzi also contains an SL-associated retrotransposon which we have called CZAR (cruzi-associated retrotransposon). This is the first transposable element to be described in T. cruzi. It has many structural similarities to SLACS and CRE1, including the same site specificity of integration in SL sequences. By virtue of this site specificity, the three elements comprise a distinct subset among retrotransposons. We refer to these elements as siteposons.

* Corresponding author.

MATERIALS AND METHODS

Maintenance of T. cruzi cultures. All studies were performed with a North American strain of T. cruzi isolated from a triatomid bug in Gainesville, Fla. (4). Parasites were cultured at 28°C in LIT medium as previously described (12). These cultured forms corresponded to the epimastigote stage of the T. cruzi life cycle.

Preparation of nucleic acids. DNA from the T. cruzi strain was isolated from late-log-phase 28°C cultures according to standard procedures (7). DNA from recombinant phages was extracted as described previously (55). Plasmid DNA was prepared by a small-scale alkaline lysis method (35).

Construction and screening of a T. cruzi genomic library. T. cruzi DNA was subjected to partial Sau3A digestion to generate 15- to 20-kb fragments. These fragments were ligated into BamHI-cut EMBL3 DNA that had been treated with bacterial alkaline phosphatase. The resulting library was packaged and amplified, and its titer was determined on Escherichia coli P2 392. Recombinants were screened initially with an SL-specific oligonucleotide, SL-1, corresponding to nt 17 to 45 of the T. brucei SL-RNA repeat unit (5'-GTT TCG CAT ACC AAT AAA GTA CAG AAA CTG-3'). Subsequent screening and phage mapping were


MOLECULAR AND CELLULAR BIOLOGY, Dec. 1991, p. 6139-6148
0270-7306/91/126139-10$02.00/0
Copyright © 1991, American Society for Microbiology

Downloaded from http://mcb.asm.org/ on February 17, 2018 by guest

Page 2: A New Member of a Family of Site-Specific Retrotransposons Is ...

6140 VILLANUEVA ET AL.

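[FIG. 1 graphic. Panel A: the 609-nt SL-RNA repeat unit (positions 1, 39, and 105 marked) drawn above the CZAR element, which is shown 5' to 3' with ORF1, ORF2, and (A)42 and is flanked by 22-nt duplications. Panel B: restriction map (sites B, H, S, P) with subclones pBS2, pB0.9, pBS3, pBS4, and pBS1; scale bar, 1 kb.]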

FIG. 1. Organization and restriction map of CZAR inserted within SL-RNA gene sequences. (A) Schematic representation of the CZAR element showing its overall organization and orientation of transcription. Above is a diagram of the 609-nt T. cruzi SL-RNA repeat unit. It has a 105-nt SL-RNA transcript which has the conserved 39-mer at its 5' end. The 39-mer is represented by three boxes as follows: nt 1 to 11 (black), nt 12 to 32 (diagonals), and nt 33 to 39 (hatched). CZAR inserts between nt 11 and 12 of the SL-RNA gene and is bounded by 22-nt target site duplications consisting of nt 11 to 32 of the 39-mer. There are two ORFs, which are followed by a 42-nt poly(A) stretch at the 3' end. At the 5' end, the open circles represent the 185-nt tandem repeats. (B) Restriction map of CZAR and sequencing strategy as determined on recombinant λTCBG 4. Below are six subclones constructed in the plasmid vector pUC13 by using the BamHI or BamHI-SalI fragments that span CZAR sequences. The direction of DNA sequence obtained from smaller fragments subcloned into M13 sequencing vectors is indicated by arrows. Where sequence overlaps were not obtained, specific oligonucleotides were synthesized; these are indicated by arrows with open circles. Abbreviations for restriction sites: S, SalI; P, PstI; B, BamHI; H, HindIII.
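The flanking arrangement described in the legend (SL nt 11 to 32 present on both sides of the element) can be modeled with a few lines of string slicing. The sketch below is illustrative only: the 39-mer and the element body are placeholders invented for the example, not the real T. cruzi sequences, and the function name `insert_with_tsd` is hypothetical.

```python
def insert_with_tsd(target: str, element: str, dup_start: int, dup_end: int) -> str:
    """Insert `element` into `target`, duplicating target[dup_start..dup_end].

    Coordinates are 1-based and inclusive, as in the figure legend.
    The result reads: target[1..dup_end] + element + target[dup_start..end],
    so the duplicated segment appears once on each side of the element.
    """
    if not (1 <= dup_start <= dup_end <= len(target)):
        raise ValueError("duplication coordinates fall outside the target")
    upstream = target[:dup_end]          # nt 1..dup_end
    downstream = target[dup_start - 1:]  # nt dup_start..end
    return upstream + element + downstream


# Placeholder 39-mer: NOT the real SL sequence, only a stand-in of the right length.
PLACEHOLDER_39MER = "AACGTTAGCCATGGTACTGACGTCAATGCTAGGCTTACG"
# Placeholder element body (lowercase n) followed by a 42-nt poly(A) stretch.
PLACEHOLDER_ELEMENT = "n" * 20 + "A" * 42

interrupted = insert_with_tsd(PLACEHOLDER_39MER, PLACEHOLDER_ELEMENT, 11, 32)

tsd = PLACEHOLDER_39MER[10:32]                                  # nt 11..32
left_flank = interrupted[:32][-22:]                             # last 22 nt before the element
right_flank = interrupted[32 + len(PLACEHOLDER_ELEMENT):][:22]  # first 22 nt after the element
```

Here `left_flank` and `right_flank` are both equal to `tsd`, which is the pattern of identical 22-nt direct repeats that the legend describes.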

done with pTC-SL, a pUC13 clone containing a 0.6-kb HindIII fragment with the SL-RNA gene repeat unit; this was subcloned from TC-18, a lambda recombinant isolated from the library that contained multiple copies of the SL-RNA repeat unit.

Southern blotting and labelling of probes. Genomic or recombinant phage DNA was digested, separated on 1% agarose gels, and transferred to nitrocellulose filters as previously described (49). Filters were hybridized to probes prepared either by 5' end labelling of synthetic oligonucleotides with [γ-32P]ATP and T4 kinase or by random priming of double-stranded plasmid DNA with [α-32P]dATP, using the Boehringer Mannheim kit (catalog no. 1004760). For single-stranded probes, hybridization was performed at 42°C in a mixture containing 5× SSC (1× SSC is 0.15 M NaCl plus 0.015 M sodium citrate), 5× Denhardt's solution (50× Denhardt's solution contains 5 g of Ficoll, 5 g of polyvinylpyrrolidone, and 5 g of bovine serum albumin per 500 ml of H2O), 0.2% sodium dodecyl sulfate (SDS), and 20 µg of denatured salmon sperm DNA per ml. For double-stranded probes, hybridization was carried out in 50% formamide-5× SSC-5× Denhardt's solution-0.2% SDS-20 µg of denatured salmon sperm DNA per ml. Filters were washed for 4 h in 2× SSC-0.5% SDS at 42°C for single-stranded probes and in 0.2× SSC-0.5% SDS at 55°C for double-stranded probes.
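The "×" strengths quoted above are multiples of the stated 1× SSC and 50× Denhardt's definitions, so the working concentrations follow by simple scaling. The short sketch below only restates that arithmetic; the function names are hypothetical and nothing beyond the definitions given in the text is assumed.

```python
# 1x SSC as defined in the text, in mol/L.
SSC_1X_MOLAR = {"NaCl": 0.15, "sodium citrate": 0.015}

# 50x Denhardt's as defined in the text: 5 g of each component per 500 ml,
# i.e. 10 g/L of each of Ficoll, polyvinylpyrrolidone, and bovine serum albumin.
DENHARDT_50X_G_PER_L = 5 / 0.5


def ssc_molarity(strength: float) -> dict:
    """Molar concentration of each SSC component at a given x-fold strength."""
    return {name: molar * strength for name, molar in SSC_1X_MOLAR.items()}


def denhardt_g_per_l(strength: float) -> float:
    """Grams per litre of each Denhardt's component at a given x-fold strength."""
    return DENHARDT_50X_G_PER_L * strength / 50


hybridization_ssc = ssc_molarity(5)     # 5x SSC used for hybridization
stringent_wash_ssc = ssc_molarity(0.2)  # 0.2x SSC wash for double-stranded probes
hybridization_denhardt = denhardt_g_per_l(5)
```

Under these definitions, 5× SSC corresponds to about 0.75 M NaCl, the 0.2× wash to about 0.03 M NaCl, and 5× Denhardt's to about 1 g/L of each component.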

Subcloning and sequencing strategy. DNA sequence was determined from recombinant phage λTCBG 4 as shown in Fig. 1B. After detailed restriction mapping, specific restriction fragments were subcloned into the pUC13 vector as BamHI, BamHI-SalI, or SalI-HindIII fragments. These, in turn, were further subcloned as PstI, BamHI-PstI, SalI-PstI fragments or recloned as SalI-HindIII fragments into M13mp18 and M13mp19 sequencing vectors. These clones and the direction of sequence determination from these sites are shown. In regions where sequence overlap was not achieved, specific oligonucleotide primers were synthesized as indicated in Fig. 1B. In addition, a Sau3A library of the largest 3.0-kb BamHI fragment was constructed in pUC13 to enable determination of sequence overlaps. Sequencing was performed by a modification of the dideoxy method of Sanger et al. (45), utilizing [35S]dATP and Sequenase as instructed by the manufacturer (U.S. Biochemical).

DNA sequences were analyzed by using the University of Wisconsin computer programs sponsored by the Yale Microcomputer Center (17).

Pulsed-field gel electrophoresis (PFGE) analysis. Parasites grown to late log phase were concentrated to 5 × 10^8 cells per ml and then embedded in 0.5% low-melting-point agarose and processed as previously described (2, 6). DNA in blocks was digested for 4 to 6 h, using 30 to 60 U of restriction enzyme per block, and then separated on 1% agarose gels, using the model 2015 Pulsaphor LKB apparatus with a hexagonal electrode array or the Bio-Rad CHEF DR-II apparatus with Pulsewave 760 switcher. For separation of DNA fragments in the 200- to 500-kb range, a pulse time of 200 s was used at 170 V on the LKB apparatus. To separate chromosomes larger than 1,000 kb, a 50- to 200-s ramped pulse time was used at 140 V on the CHEF. Gels were run at 14°C for 30 to 68 h.

Nucleotide sequence accession number. The nucleotide


MOL. CELL. BIOL.


Page 3: A New Member of a Family of Site-Specific Retrotransposons Is ...

T. CRUZI RETROTRANSPOSON CZAR 6141

[FIG. 2 graphic: Southern blot, lanes 1 to 4; size markers (kb): 23.1, 9.4, 6.6, 4.4, 2.3, 2.0, 0.56.]

FIG. 2. Southern blot of T. cruzi genomic DNA probed with SL-RNA gene and conserved SL-associated sequences. In each lane, 2 µg of DNA was digested with HindIII, separated on a 1% agarose gel, and transferred to nitrocellulose paper. The blot was hybridized to probes which consisted of a 39-mer-specific oligonucleotide, SL-2 (lane 1), SL-RNA repeat unit-containing pTC-SL (lane 2), SL-associated but non-SL-containing pB0.9 (lane 3), and pBS4 (lane 4). For origins of these probes, see Materials and Methods and Fig. 1B. Lambda DNA cleaved with HindIII was used as the size marker. Because of the different binding affinities of nitrocellulose for small versus large DNA fragments, the relative intensities of the 609-nt SL-RNA gene repeat unit and the larger restriction fragments do not reflect their respective copy numbers. An alternative means of determining copy number of these fragments is described in the text.

sequence reported here has been entered in the EMBL nucleotide sequence data library under accession number M62862.

RESULTS

Identification of an SL-associated retrotransposon in T. cruzi. From previous DNA sequence analysis, it is known that certain restriction enzymes cleave once within the SL-RNA gene repeat unit in T. cruzi (15). When such an enzyme, HindIII, is used in genomic digests, a 609-nt band corresponding to the SL-RNA repeat unit hybridizes to the oligonucleotide probe SL-2, which contains the SL-RNA 39-mer. In addition to the repeat unit-containing fragment, a 6.0-kb band and a triplet of fragments of 2.5 kb are observed (Fig. 2, lane 1). When an analogous blot is probed with pTC-SL, which contains the entire 609-nt SL-RNA gene repeat, the triplet of 2.5-kb bands is once again seen in addition to the SL-RNA repeat unit itself (lane 2). We have noted that a second 0.8-kb HindIII band also hybridizes to SL probes but have not further characterized it. This hybridization pattern is reminiscent of that found in T. brucei gambiense and C. fasciculata, in which, in addition to the major SL-RNA repeat unit, larger SL-hybridizing fragments are observed.

To investigate the nature of the larger SL-hybridizing fragments, a genomic T. cruzi library was constructed and screened with SL-1, another SL-containing oligonucleotide. A detailed restriction enzyme map of one of the recombinant clones, λTCBG 4, is shown in Fig. 1B. It contained a HindIII fragment that hybridized to SL-2 that was identical in size to the 6.0-kb HindIII fragment seen in genomic DNA (data not shown) (Fig. 2, lane 1). Furthermore, when an internal non-SL-containing 1.3-kb BamHI-SalI fragment of λTCBG 4 (pBS4 in Fig. 1B) was used to probe HindIII-digested genomic DNA, a 6.0-kb fragment was again found to hybridize (Fig. 2, lane 4). λTCBG 4 did not contain a copy of one of the 2.5-kb HindIII fragments that hybridized to SL-2 and pTC-SL as a triplet in genomic blots. However, when pB0.9, the 0.9-kb non-SL-containing BamHI fragment located immediately upstream of the 6.0-kb HindIII fragment on λTCBG 4, was used to probe the same genomic blots, the triplet of 2.5-kb HindIII bands was observed (Fig. 2, lane 3). This finding suggests that these three fragments are similar to one another but have regions of heterogeneity that account for their size differences. Furthermore, this heterogeneity does not appear to be random, since there are a discrete number of fragments. From this analysis, we conclude that λTCBG 4 faithfully contains the entire 6.0-kb fragment and a portion of one of the 2.5-kb HindIII fragments present as SL-hybridizing bands in the genome. In addition, our map of λTCBG 4 indicates that these two HindIII fragments are physically linked in the genome.

The location of SL-RNA gene sequences on λTCBG 4 was clarified by further Southern blot analysis. Specifically, as shown in Fig. 1B, SL-RNA gene homology was localized to fragments pBS1 and pBS2, which respectively encompassed the 3' end of the 6.0-kb HindIII fragment and sequences 5' to pB0.9. In other partially characterized phages, one to two full copies of the SL-RNA gene repeat unit were found to flank the 6.0- and 2.5-kb HindIII sequences, consistent with their interrupting the tandem array of SL-RNA genes (data not shown).

The nucleotide sequence of λTCBG 4 reveals that it contains a non-LTR retrotransposon. The complete nucleotide sequence of the region of λTCBG 4 shown in Fig. 1B was determined. The sequence (Fig. 3) contains 7,291 nt beginning at nt 1 of the SL-RNA gene. At the 5' end, SL sequences are interrupted at nt 32 of the SL-RNA gene. Immediately adjacent to this insertion site at nt 36 to 52 is a nearly perfect 17-bp duplication corresponding to nt 14 to 30 of the SL 39-mer (underlined in Fig. 3). There are two potential open reading frames (ORFs) which encompass 70% of the element and whose transcription is in the same orientation as that of the SL-RNA genes. The first ORF (ORF 1) has a coding capacity of 386 amino acids, with the presumed initiation Met at nt 1505 and termination at nt 2663. Although there are three potential start codons in ORF 1, none fits the optimum eukaryotic initiation sequence for translation (29). Thus, the most distal of the three was arbitrarily chosen. The second ORF (ORF 2) is separated from ORF 1 by 78 nt. It can code for 1,317 amino acids, with its presumptive start codon located at nt 2741 and stop codon at nt 6692. Following this is a 534-nt stretch preceding 42 A residues. Immediately adjacent to this poly(A) tract, SL sequences resume, beginning at nt 11 of the SL 39-mer. Thus, SL-RNA gene sequences from nt 11 to 32 are present as identical 22-nt direct repeats at the 5' and 3' ends. These features, schematically summarized in Fig. 1A, are reminiscent of non-LTR (long terminal repeat) retrotransposons found in other organisms. We call this element CZAR, for cruzi-associated retrotransposon.

The first 1,300 nt of CZAR do not contain extensive ORFs. This 5' untranslated region is notable for a short (185-nt) repeat segment present in two full copies and one partial copy (underlined in Fig. 3). The repeats are recognizable as

VOL. 11, 1991


Page 4: A New Member of a Family of Site-Specific Retrotransposons Is ...


[FIG. 3 graphic (this page): nucleotide sequence of CZAR, nt 1 to 4000, numbered at 100-nt intervals, with the deduced amino acid sequences of ORF 1 and ORF 2 printed beneath the nucleotide sequence.]

FIG. 3. Nucleotide sequence of CZAR. The complete nucleotide sequence of the element was determined from lambda clone λTCBG 4. The deduced protein coding sequence of the two nonoverlapping ORFs is shown below. Numbering begins with nt 1 of the SL-RNA gene. Boxed regions at the 5' and 3' ends correspond to target site duplications. CZAR sequences interrupt the SL-RNA gene at nt 32. Adjacent to the 5' target site duplication is a 17-nt sequence corresponding to a repeat of nt 14 to 30 of the 39-mer (underlined). Starting at nt 784 is the first of two full copies of a 185-nt tandem repeat (underlined). ORF 1 starts at nt 1505 and terminates at nt 2663 and is 386 amino acids long. ORF 2 is separated from ORF 1 by 78 nt and is in the same reading frame as ORF 1. It is 1,317 amino acids long and terminates at nt 6692. The 3' end of CZAR contains an untranslated region followed by 42 A residues. The underlined residues in ORF 1 and ORF 2 represent amino acids that correspond to Cys motifs. The Cys motif in the conserved retroviral endonuclease domain is underlined, and conserved residues are boxed. Also boxed is the conserved YXDD box of the reverse transcriptase domain. *, termination codon.
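The ORF coordinates in the legend can be cross-checked with a little arithmetic. The sketch below assumes that each reported termination coordinate is the first nucleotide of the stop codon, an interpretation not stated in the text but one under which the quoted lengths come out exactly. The 7,291-nt total is the length of the sequenced region given in the Results, used here as an approximate denominator for the "70% of the element" figure. All names are illustrative.

```python
SEQUENCED_NT = 7291          # length of the sequenced region reported in the text
ORF1 = (1505, 2663)          # (first nt of start codon, reported termination nt)
ORF2 = (2741, 6692)


def codon_count(start_nt: int, stop_nt: int) -> int:
    """Number of sense codons, assuming stop_nt is the first nt of the stop codon."""
    span = stop_nt - start_nt
    if span <= 0 or span % 3 != 0:
        raise ValueError("coordinates do not describe a whole number of codons")
    return span // 3


orf1_aa = codon_count(*ORF1)
orf2_aa = codon_count(*ORF2)

# Distance from the ORF 1 termination coordinate to the ORF 2 start codon.
spacer_nt = ORF2[0] - ORF1[1]

# Two ORFs share a reading frame when their start positions differ by a multiple of 3.
same_frame = (ORF2[0] - ORF1[0]) % 3 == 0

# Fraction of the sequenced region occupied by sense codons of the two ORFs.
coding_fraction = 3 * (orf1_aa + orf2_aa) / SEQUENCED_NT
```

Under that assumption the numbers agree with the legend: 386 and 1,317 codons, a 78-nt separation, a shared reading frame, and a coding fraction of roughly 0.70.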

such, although they display degeneracy. Their A+T content is 63%, which contrasts with the relatively G+C-rich tandem repeats found in other non-LTR retrotransposons (10, 34). They are localized to pB0.9, the 0.9-kb BamHI fragment on λTCBG 4 which lies adjacent to the site where CZAR interrupts SL sequences (Fig. 1B). pB0.9 hybridizes to the SL-hybridizing triplet of 2.5-kb HindIII fragments seen in genomic digests (Fig. 2, lanes 2 and 3), which suggests that a variable number of these imperfect repeats underlies the 5' genomic heterogeneity in CZAR and that λTCBG 4 contains a version of one of the variants. Since there is no further 5' variability on Southern blotting experiments, these elements do not appear to have 5' truncated versions.

To determine whether the target site duplications flanking different copies of CZAR were identical, two additional insertion sites from independently isolated phages were sequenced (data not shown). In both cases, at the 5' end, the terminal duplications are 22 nt long and are identical to those in the CZAR element whose sequence is shown in Fig. 3. This finding suggests that these are recent insertion events or that there are functional constraints being exerted to maintain the duplications. This contrasts with other non-LTR


Page 5: A New Member of a Family of Site-Specific Retrotransposons Is ...


FIG. 3-Continued. [Sequence listing not reproduced: CZAR nucleotide sequence with the deduced ORF 2 amino acid translation, continuing through the termination codon (*) and the 3' untranslated region.]

retrotransposons in which the extent and composition of the duplications vary between different copies of the element (20, 24, 52). Comparison of the DNA sequence between the three phages in the adjacent untranslated region shows a number of base pair changes, indicating that they indeed represent different copies of CZAR.

ORF 1 of CZAR contains a nucleic acid binding motif that differs from retroviral gag genes. Previous analysis of the gag gene of retroviruses, which codes for virion core polypeptides, has shown characteristic conserved Cys motifs usually located at the carboxy terminus (14, 52). In retroviruses as well as LTR- and non-LTR-containing retrotransposons, there may be one to three copies of this motif with the consensus pattern Cys-X2-Cys-X4-His-X4-Cys, where X represents any amino acid (52). Such zinc finger motifs are characteristic of proteins with nucleic acid binding properties (5, 30). CZAR also contains such a motif in the middle of ORF 1 (underlined in Fig. 3). It possesses the sequence Cys-X2-Cys-X12-His-X4-His and thus differs from the motifs previously described in gag domains. Interestingly, this motif is more similar to that found in the Xenopus transcription factor TFIIIA (Cys-X4-Cys-X12-His-X3-4-His) (36). A comparison of several gag region Cys motifs among retroviruses and various retrotransposons is shown in Fig. 4A. Both CZAR and SLACS contain the same motif in ORF 1. CRE1 does not possess a separate ORF 1 region with gag homology. R2Bm, a retrotransposon found in the 28S rRNA genes of the silkmoth, contains a single ORF which has the same type of Cys motif at its N terminus (10). Ingi/TRS-1, another, more ubiquitous retrotransposon in the T. brucei gambiense complex, also has this motif but at the 3' end of its single ORF (29, 38, 42).

ORF 2 contains a DNA binding domain resembling endonuclease. An analysis of the 5' end of ORF 2 in CZAR starting at amino acid 356 also shows Cys motifs which can be grouped in the following order: His-X4-His-X>20-Cys-X2-Cys. This arrangement has been noted in various retroviruses (e.g., human T-cell leukemia virus type 1 [HTLV-1] and Moloney murine leukemia virus [MoMLV]) and a few LTR-containing retrotransposons (e.g., 1731 and copia), and it has been suggested to be characteristic of endonuclease/integrase domains (27). A comparison of CZAR in this region with several retroviruses and LTR-containing retrotransposons (Fig. 4B) reveals that it conforms to this motif.
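The Cys-His spacing patterns discussed above can be written as regular expressions, which gives a simple way to scan a deduced protein sequence for candidate motifs. The following is only a minimal sketch: the function name `find_motifs`, the motif labels, and the test sequences are hypothetical and not taken from the paper, and the 20-to-30-residue spacer used for the endonuclease-type pattern is an assumption based on the form of that motif given later in the text.

```python
import re

# Each "Xn" in the text means "any n amino acids", written here as [A-Z]{n}.
# The motif labels are hypothetical names chosen for this sketch.
MOTIFS = {
    # ORF 1 motif reported for CZAR: Cys-X2-Cys-X12-His-X4-His
    "orf1_CCHH": r"C[A-Z]{2}C[A-Z]{12}H[A-Z]{4}H",
    # Consensus gag motif: Cys-X2-Cys-X4-His-X4-Cys
    "gag_CCHC": r"C[A-Z]{2}C[A-Z]{4}H[A-Z]{4}C",
    # Endonuclease-type arrangement: His-X4-His-X(20-30)-Cys-X2-Cys
    # (the 20-30 spacer range is an assumption, see the lead-in)
    "endonuclease_HHCC": r"H[A-Z]{4}H[A-Z]{20,30}C[A-Z]{2}C",
}


def find_motifs(protein):
    """Return {motif label: [(0-based start, matched text), ...]}.

    A lookahead is used so that overlapping candidates are all reported.
    """
    protein = protein.upper()
    hits = {}
    for label, pattern in MOTIFS.items():
        regex = re.compile(f"(?=({pattern}))")
        hits[label] = [(m.start(), m.group(1)) for m in regex.finditer(protein)]
    return hits


if __name__ == "__main__":
    # Synthetic sequence built to contain one ORF 1-type motif (not real CZAR data).
    toy = "MAA" + "CPLC" + "S" * 12 + "H" + "RRRR" + "H" + "KK"
    for label, found in find_motifs(toy).items():
        print(label, found)
```

A pattern match of this kind only flags candidate residues; whether they actually coordinate zinc or bind nucleic acid cannot be decided from spacing alone.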


VOL. 11, 1991


[FIG. 4 alignment not reproduced. (A) ORF 1 Cys motifs of CZAR, SLACS, R2Bm, and Ingi under the consensus C-X2-C-X12-H-X4-H, followed by the gag Cys motifs of HIV, copia, and R1Bm under the consensus C-X2-C-X4-H-X4-C. (B) Cys and His residues in the 5' region of ORF 2 of CZAR, SLACS, and CRE1, followed by the corresponding endonuclease domain residues of copia, 1731, MoMLV, and HTLV-1.]

FIG. 4. Comparison of the Cys motifs found in CZAR with those of other retrotransposons and retroviruses. (A) The Cys motif in ORF 1 of CZAR is compared with that found in SLACS (3), R2Bm (10), and Ingi (29). A consensus motif is denoted above where C is Cys, H is His, and X represents any amino acid. This is compared with the motifs found in the gag domains of human immunodeficiency virus (HIV) (43), the LTR retrotransposon copia (37), and the non-LTR retrotransposon R1Bm (52). Numbers in parentheses represent intervening amino acids. (B) In the top portion, the Cys motifs in the 5' region of ORF 2 of CZAR are juxtaposed with those found in SLACS (3) and CRE1 (23). The Cys and His residues of the three SL siteposons are boxed and arranged as two zinc fingers. An alternative grouping of these residues is shown in the consensus motif below, which corresponds to the conserved Cys motif in retroviral endonuclease domains (27). Residues E and R are also part of the endonuclease domain. Also shown are examples of these conserved residues in the endonuclease domains of two LTR-containing retrotransposons, copia (37) and 1731 (21), and two retroviruses, MoMLV (47) and HTLV-1 (46).

SLACS and CRE1 also contain this Cys motif. Aside from this Zn binding motif, CZAR, SLACS, and CRE1 do not show conserved residues characteristic of retroviral endonuclease domains except for conserved Glu-Arg residues found 108 to 110 amino acids following Cys (19, 27). These residues are present in CZAR and SLACS but not in CRE1 at the same relative position. Of note, a similarly arranged endonuclease motif has not been found in other non-LTR retrotransposons (e.g., R1, R2, I, F, and L1Md) (18-20, 25, 34). There is no amino acid homology to a protease or RNase H domain (27) in ORF 2 of CZAR.

ORF 2 of CZAR bears homology to conserved reverse transcriptase domains. Retroviruses and retrotransposons all have a highly conserved region in the pol gene which potentially encodes a reverse transcriptase (8). By comparing a large number of retroviral pol genes and the ORF 2 of Drosophila retrotransposon 17.6, Toh et al. identified a conserved region with 13 amino acids that are invariant or contain acceptable substitutions (50). This constitutes a framework for comparing other elements with presumed reverse transcriptase homology. In addition, Xiong and Eickbush (52, 54) delineated eight regions of amino acid homology in this domain that were shared specifically by the group of non-LTR retrotransposons and which also encompass the invariant amino acids identified by Toh et al. In Fig. 5, the ORF 2 sequences of CZAR are compared with sequences of other non-LTR retrotransposons and aligned according to these two schemes. CZAR contains 9 of 13 of the original invariant residues of Toh et al. and 23 of 35 invariant residues according to the scheme of Xiong and Eickbush. The spacing between the eight conserved regions is also in keeping with the proposed scheme.

The similarities in potential protein coding domains between CZAR, SLACS, and CRE1 are most notable in the reverse transcriptase region of the pol-encoding ORF. In this domain, it is remarkable that the three elements show a greater similarity to one another than to other non-LTR retrotransposons (shown as shaded areas in Fig. 5). Allowing for functionally equivalent substitutions, there is 55% amino acid homology among all three elements within the domain delimited by the eight highly conserved reverse transcriptase regions. A comparison of CZAR and SLACS alone in this region shows 75% amino acid homology. Furthermore, all three elements are identical to each other when one compares specific amino acid residues which deviate from those that are uniformly conserved in the other non-LTR retrotransposons (e.g., V in place of F in box 3 and A-Y-N instead of A-F-D in box 4). In the highly conserved YXDD box (box 6), all three elements contain L or I in place of A, making them more similar to retroviruses (52, 54). The similarities between CZAR and SLACS extend to the region downstream of the reverse transcriptase domain where there is 70% amino acid homology. CRE1 diverges from the other two elements in this region.

Genomic organization of CZAR. The chromosomal distribution of SL-RNA genes and CZAR was determined by PFGE. When T. cruzi Gainesville chromosomes embedded in agarose blocks are separated by PFGE and analyzed by Southern blotting, all SL-RNA gene and CZAR sequences localize to a single chromosome of approximately 1,200 to 1,300 kb (Fig. 6A). Furthermore, upon digestion with enzymes such as EcoRI and XbaI that cut neither within CZAR nor within the SL-RNA gene, three identical bands ranging from 200 to 400 kb are found to hybridize to both pTC-SL, which contains the SL-RNA gene repeat unit (Fig. 6B, lanes 1 and 2), and pBS4, which contains CZAR sequences within the reverse transcriptase region (Fig. 6B, lanes 3 and 4). This result further corroborates the finding that CZAR sequences are associated with SL-RNA genes.
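The homology percentages quoted above for the reverse transcriptase region are computed over aligned residues while allowing functionally equivalent substitutions. As a rough illustration of that kind of calculation, the sketch below scores the columns of a gapped alignment. The substitution groups, the function names, and the toy sequences are assumptions made for this example; the grouping actually used by the authors is not stated in the text.

```python
# Hypothetical groups of "functionally equivalent" amino acids (an assumption;
# the paper does not list the substitutions it accepted). Groups are disjoint.
EQUIVALENCE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("ST"), set("NQ"), set("AG"),
]


def equivalent(a, b):
    """True if two residues are identical or fall in the same group."""
    if a == b:
        return True
    return any(a in group and b in group for group in EQUIVALENCE_GROUPS)


def percent_similarity(aligned):
    """Percent of gap-free alignment columns in which all residues are equivalent.

    `aligned` is a list of equal-length strings with '-' marking gaps.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    scored = matched = 0
    for column in zip(*aligned):
        if "-" in column:
            continue  # columns containing a gap are not scored
        scored += 1
        if all(equivalent(column[0], residue) for residue in column[1:]):
            matched += 1
    if scored == 0:
        raise ValueError("no gap-free columns to score")
    return 100.0 * matched / scored


if __name__ == "__main__":
    # Toy aligned fragments, not taken from Fig. 5.
    print(percent_similarity(["YADDLK", "YADDIE"]))
```

Because the groups are disjoint, comparing every residue in a column with the first one is enough to decide whether the whole column is conserved.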

Estimation of the copy number of SL-RNA genes and CZAR was done by comparative hybridization intensity analysis. Specifically, HindIII digests of different concentrations of genomic T. cruzi DNA were run in parallel with known concentrations of a plasmid containing a single copy of the SL-RNA gene or XTCBG 4, which has one copy of CZAR. Subsequent Southern blots were respectively hybridized to an SL or CZAR probe, and relative hybridization intensities between the genomic DNA and single-copy constructs were compared. Using a T. cruzi genome size of 250 x 10^3 kb (33) and the appropriate molecular weights of the single-copy constructs, the genomic copy number was calculated as previously described for T. brucei gambiense (16). This analysis indicated that there are approximately 300 copies of SL-RNA genes and 30 to 40 copies of CZAR per diploid genome (data not shown). Thus, it appears that approximately 10% of all SL-RNA genes in T. cruzi are interrupted by CZAR sequences.

CZAR resembles SLACS and CRE1 but is a distinct element. CZAR has many structural features in common with SLACS and CRE1, although overall it is more similar to SLACS. This reflects the finding that T. cruzi and T. brucei gambiense are evolutionarily closer to each other than to C. fasciculata, as has been deduced from mitochondrial rRNA gene sequence analysis (32). The most striking feature is that all three elements insert specifically between nt 11 and 12 of the SL 39-mer. Of note, the 39-mer of the SL-RNA genes in


[FIG. 5 alignment not reproduced: conserved boxes 1 to 8 of the reverse transcriptase domain for CZAR, SLACS, and CRE1, followed by Ingi, R1, R2, L1Md, I, and F.]

FIG. 5. Comparison of the putative ORF 2 coding sequences of CZAR with amino acids in the reverse transcriptase domain of other non-LTR retrotransposons. The SL siteposons are grouped together and separated from other representative non-LTR retrotransposons. Conserved amino acids are organized into eight boxes as described by Xiong and Eickbush (52). The amino acid residues in bold at the bottom correspond to invariant residues in retroviruses described by Toh et al. (50). The conserved residues identified by Xiong and Eickbush appear above the non-LTR retrotransposons. The amino acids from both schemes that are conserved in CZAR, SLACS, and CRE1 are marked above them. The shading represents conserved residues among the three SL siteposons allowing for functionally equivalent substitutions. The number at the start of each element denotes the number of amino acids from the beginning of the ORF. Subsequent numbers between each box represent intervening amino acids. Dashes correspond to gaps introduced to maximize alignment. References for the sequences: SLACS (3), CRE1 (23), Ingi (29), R1Bm (52), R2Bm (10), L1Md (34), I factor (20), and F factor (18).

T. cruzi, T. brucei gambiense, and C. fasciculata are identical between nt 1 and 12. All contain unusually long target site duplications (22 nt in CZAR, 49 nt in SLACS, and 28 nt in CRE1) compared with the 4- to 14-bp duplications found in other non-LTR retrotransposons. CZAR and SLACS both contain a variably repeated tandem sequence of 185 nt in the 5' untranslated region. Similar repeats have been found in the mouse long interspersed nuclear element (LINE) L1Md, and it has been suggested that they may represent internal promoters (34). Both CZAR and SLACS contain a potential gag-like ORF of 386 and 384 amino acids, respectively, that is not present in CRE1. In CZAR and SLACS, ORF 1 is separated from ORF 2 by 78 and 79 nt, respectively, and both ORFs are in the same frame.

The three SL-associated retrotransposons all contain the Cys motif (His-X4-His-X20-30-Cys-X2-Cys) which is the most rigorously conserved feature of retroviral endonuclease domains (27). In all three elements, this Cys motif is located upstream of reverse transcriptase sequences. This organization differs from that of other non-LTR retrotransposons whose putative endonuclease domains contain a Cys motif of the type Cys-X1-3-Cys-X7-8-His-X4-Cys downstream of reverse transcriptase sequences (20, 25, 34). Interestingly, as shown in Fig. 4B, all three SL-associated retrotransposons have additional Cys-His residues that can be alternatively grouped into two Zn finger motifs that resemble the ORF 1 motif in CZAR and SLACS, further emphasizing that these elements are more akin to one another than to the other non-LTR retrotransposons.

CZAR, SLACS, and CRE1 appear to be similar in chromosomal organization as well. All are restricted to discrete chromosomal loci which also contain the SL-RNA genes. All are present in low copy numbers which constitute insertions in approximately 10% of SL-RNA genes.
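The copy-number estimate behind the 10% figure can be illustrated with a short calculation. The sketch below uses a generic molar-equivalence argument (equal hybridization signal is taken to mean equal numbers of target copies on the blot); it is not necessarily the exact procedure of reference 16. The function names, the 5-kb plasmid size, and the DNA masses are hypothetical values chosen for illustration; only the genome size (250 x 10^3 kb) and the reported copy numbers (about 300 SL-RNA genes, 30 to 40 CZAR copies) come from the text.

```python
def copies_per_genome(plasmid_mass_ug, plasmid_size_kb,
                      genomic_mass_ug, genome_size_kb):
    """Estimate target copies per genome from a matched hybridization signal.

    Assumes the plasmid carries one target copy and that the given plasmid
    mass produces the same signal as the given mass of genomic DNA.
    Mass divided by length is proportional to the number of molecules.
    """
    plasmid_molecules = plasmid_mass_ug / plasmid_size_kb
    genome_equivalents = genomic_mass_ug / genome_size_kb
    return plasmid_molecules / genome_equivalents


def interrupted_fraction(element_copies, gene_copies):
    """Fraction of genes that carry an inserted element."""
    return element_copies / gene_copies


if __name__ == "__main__":
    GENOME_KB = 250e3  # T. cruzi genome size used in the text
    # Hypothetical example: 6 ng of a 5-kb single-copy plasmid matching
    # the signal from 1 ug of genomic DNA.
    print(copies_per_genome(0.006, 5.0, 1.0, GENOME_KB))
    # Reported range of CZAR copies against about 300 SL-RNA genes.
    for czar in (30, 40):
        print(czar, interrupted_fraction(czar, 300))
```

With 30 to 40 elements among roughly 300 genes, the interrupted fraction falls between 10% and about 13%, consistent with the approximate 10% stated in the text.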

DISCUSSION

We have described CZAR, an SL-associated retrotransposon in the New World trypanosome T. cruzi which is the third member of a family of site-specific retrotransposons that we have termed siteposons. This family includes the previously characterized SLACS in the African trypanosome T. brucei gambiense and CRE1 in the mosquito trypanosomatid C. fasciculata. Preliminary evidence suggests that a fourth siteposon is present in another unrelated insect trypanosomatid, Herpetomonas samuelpessoai (1a). While all of these elements can be classified as non-LTR retrotransposons, they share additional characteristics that further distinguish them from the larger group. Most strikingly, all of these elements have the identical site specificity of integration between nt 11 and 12 of the SL-RNA gene. While numerous examples among retroviruses and LTR retrotransposons have shown that integration events manifest a spectrum of sequence specificity (44), the SL-associated siteposons seem to display a more extreme degree of integration site specificity. The determinants of site-specific integration are not fully known but are probably multifactorial. One important factor may be the retrotransposon-encoded endonuclease. This view is supported by the finding that a fusion protein encoded by R2Bm, another site-specific retrotransposon, was capable of cleaving at the site of insertion within 28S rRNA genes (53) when expressed in vitro. It is interesting that according to a model proposed by Xiong and Eickbush (53) based on R2Bm, a potential consequence of the integration event might be generation of a tandem duplication at the border with SL-RNA sequences. This, in fact, is seen at the 5' border of CZAR with SL sequences.

The fact that three evolutionarily distant trypanosomatids have maintained a stable population of siteposons within their SL-RNA genes argues for a functional role. Indeed, if they were nonfunctional, one might have predicted that gene conversion events as well as negative selection pressure against deleterious insertions into SL-RNA genes would have favored their extinction. The majority of retroelements integrate promiscuously throughout eukaryotic genomes, occasionally disrupting functional genes (8, 28). By introducing variations that effectively reshuffle the genome, these randomly inserting elements are thought to influence the rate of evolutionary development. In contrast, we can postulate that the siteposons we have described have a more limited role that is inextricably linked to their ability to insert specifically within the SL-RNA genes. Perhaps these siteposons help to ensure the survival of the crucial SL-RNA genes. It is possible that, by periodically inserting into the SL-RNA gene array, they may serve to disseminate SL-RNA genes and thus decrease the likelihood of elimination of repetitive sequences via gene conversions. It has been suggested for the only other two site-specific non-LTR retrotransposons, R1 and R2, that a self-encoded endonuclease may play a direct role in promoting rRNA gene amplification (26). The SL siteposons may be serving a similar function in situations in which SL-RNA gene amplification occurs.

The conservation of amino acid homology in the reverse transcriptase domain and the persistence of the endonuclease Cys motif in these siteposons imply that these functionally crucial domains are maintained since they are necessary for active transposition. The location and arrangement of the endonuclease domain of these siteposons are more similar to those of retroviruses and LTR-containing retrotransposons which have known transposition intermediates (27, 37). While the mechanism of active transposition is not known, it presumably involves an RNA intermediate which is reverse transcribed by a self-encoded reverse transcriptase, followed by integration into a new genomic site (8). This model is indirectly supported by the structure of the retrotransposons, namely, the presence of target site duplications, the pol-encoding ORF, and the 3' poly(A) stretch. In general, it has been difficult to document active transposition among the non-LTR retrotransposons since transposition frequencies are low and most copies are defective due to 5' truncations. However, in a few cases, full-length RNA intermediates have been detected. In Drosophila melanogaster, for example, certain crosses between inducer males and reactive females are associated with a high frequency of germ line transposition of the I factor retrotransposon (20). In this case, a potential transposition intermediate has been identified (13). Full-length transcripts of the human LINE element L1Hs have been found only in a human teratocarcinoma cell line (48). These examples suggest that some copies of LINE-like elements are functionally active.

Are the SL-associated siteposons actively transposing? In serial passages of cloned lines, CRE1 reportedly has a transposition frequency of 1% per generation (23). However, no specific transcripts have been found for CRE1, CZAR, and SLACS. As in the other LINE-like elements, this may reflect transcription levels below our detection ability or transcription that is dependent on the stage in the parasite life cycle. There are precedents for stage-specific transcription in various Drosophila retrotransposons (41) and in the T. brucei gambiense Ingi/TRS-1 element (38). We are currently studying whether a similar pattern exists for CZAR.

FIG. 6. Chromosomal arrangement of SL-RNA genes and CZAR sequences. (A) PFGE of T. cruzi chromosomes run for 68 h at a ramped pulse time of 50 to 200 s at 140 V. Lanes: 1, ethidium bromide-stained T. cruzi chromosomes; M, yeast chromosomes used as the size marker; 2 and 3, Southern blots of parallel lanes from the same gel probed with pTC-SL (lane 2) and CZAR-specific pBS3 (lane 3). (B) T. cruzi chromosomes were digested with EcoRI (lanes 1 and 3) and XbaI (lanes 2 and 4) and separated by PFGE for 30 h, using a pulse time of 200 s at 170 V. Lanes 1 and 2 were probed with pTC-SL; lanes 3 and 4 were probed with CZAR-specific pBS4. Size markers were determined from a ladder of lambda concatemers. For origins of the probes, see Fig. 2B and Materials and Methods.

ACKNOWLEDGMENTS

We thank K. Matthews, P. Mason, C. Tschudi, and E. Ullu for helpful discussions and critical reading of the manuscript. We are grateful to Roy Capper for excellent technical assistance.

This work was supported by NIH grant F32-AI08415 (to M.S.V.) and Tropical Diseases Research Unit grant 1-P01 AI28778 and by the John D. and Catherine T. MacArthur Foundation. M.S.V., S.A., and F.F.R. are investigators of the Consortium on the Biology of Parasitic Diseases of the MacArthur Foundation. C.B.B. is supported by the American Heart Association.

REFERENCES

1. Agabian, N. 1990. Trans splicing of nuclear pre-mRNAs. Cell 61:1157-1160.

1a. Aksoy, S. Unpublished data.

2. Aksoy, S., T. M. Lalor, J. Martin, L. H. T. Van der Ploeg, and F. F. Richards. 1987. Multiple copies of a retroposon interrupt spliced leader RNA genes in the African trypanosome Trypanosoma gambiense. EMBO J. 6:3819-3826.

3. Aksoy, S., S. Williams, S. Chang, and F. F. Richards. 1990. SLACS retrotransposon from Trypanosoma brucei gambiense is similar to mammalian LINEs. Nucleic Acids Res. 18:785-792.

4. Beard, C. B., D. G. Young, J. F. Butler, and D. A. Evans. 1988.First isolation of Trypanosoma cruzi from a wild caught Tri-atoma sanguisuga (Le Conte) (Hemiptera: Triatominae) inFlorida, USA. J. Parasitol. 74:343-344.

5. Berg, J. M. 1986. Potential metal-binding domains in nucleicacid binding proteins. Science 232:485-487.

6. Bernards, A., J. M. Kooter, P. A. M. Michels, R. M. P. Moberts,and P. Borst. 1986. Pulsed field gradient electrophoresis of DNAdigested in agarose allows the sizing of the large duplication unitof a surface antigen gene in trypanosomes. Gene 42:313-322.

7. Bernards, A., L. H. T. Van der Ploeg, A. C. C. Frasch, P. Borst,J. C. Boothroyd, S. Coleman, and G. A. M. Cross. 1981.Activation of trypanosome surface glycoprotein genes involvesa duplication-transposition leading to an altered 3' end. Cell27:497-505.

8. Boeke, J. D., and V. G. Corces. 1989. Transcription and reversetranscription of retrotransposons. Annu. Rev. Microbiol. 43:403-434.

9. Borst, P. 1986. Discontinuous transcription and antigenic vari-ation in trypanosomes. Annu. Rev. Biochem. 55:701-732.

10. Burke, W. D., C. C. Calalang, and T. H. Eickbush. 1987. Thesite-specific ribosomal insertion element type II of Bombyx mori(R2Bm) contains the coding sequence for a reverse tran-scriptase-like enzyme. Mol. Cell. Biol. 7:2221-2230.

11. Carrington, M., I. Roditi, and R. 0. Williams. 1987. Thestructure and transcription of an element interspersed betweentandem arrays of mini-exon donor RNA genes in Trypanosomabrucei. Nucleic Acids Res. 15:10179-10198.

12. Castellani, O., L. V. Ribeiro, and J. F. Fernandes. 1967. Differ-entiation of Trypanosoma cruzi in culture. J. Protozool. 14:447-451.

13. Chaboissier, M., I. Busseau, J. Prosser, D. J. Finnegan, and A.Bucheton. 1990. Identification of a potential RNA intermediatefor transposition of the LINE-like element I factor in Droso-phila melanogaster. EMBO J. 9:3557-3563.

14. Covey, S. 1986. Amino acid sequence homology in gag region ofreverse transcribing elements and the coat protein gene ofcauliflower mosaic virus. Nucleic Acids Res. 14:623-633.

15. DeLange, T., T. M. Berkvens, H. J. G. Veerman, A. C. C.Frasch, J. D. Barry, and P. Borst. 1984. Comparison of thegenes coding for the common 5' terminal sequence of messengerRNA's in three trypanosome species. Nucleic Acids Res. 12:4431-443.

16. DeLange, T., A. Y. C. Liu, L. H. T. Van der Ploeg, P. Borst,M. C. Tromp, and J. H. Van Boom. 1983. Tandem repetition ofthe 5' mini-exon of variant surface glycoprotein genes: a multi-ple promotor for VSG gene transcription? Cell 34:891-900.

17. Devereux, J., P. Haeberli, and 0. Smithies. 1984. A comprehen-sive set of sequence analysis programs for the VAX. NucleicAcids Res. 12:387-395.

18. DiNocera, P. P., and G. Casari. 1987. Related polypeptides areencoded by Drosophila F elements, I factors and mammalian Lisequences. Proc. Natl. Acad. Sci. USA 84:5843-5847.

19. Doolittle, R. F., D. F. Feng, M. S. Johnson, and M. A. McClure.1989. Origins and evolutionary relationships of retroviruses. Q.Rev. Biol. 64:1-30.

20. Fawcett, D. H., C. K. Lister, E. Kellett, and D. J. Finnegan.1986. Transposable elements controlling I-R hybrid dysgenesisin D. melanogaster are similar to mammalian LINEs. Cell47:1007-1015.

21. Fourcade-Peronnet, F., L. d' Auriol, J. Becker, F. Galibert, andM. Best-Belpomme. 1988. Primary structure and functionalorganization of Drosophila 1731 retrotransposon. Nucleic AcidsRes. 16:6113-6125.

22. Gabriel, A., S. S. Sisodia, and D. W. Cleveland. 1987. Evidenceof discontinuous transcription in the trypanosomatid Crithidiafasciculata. J. Biol. Chem. 262:16192-16199.

23. Gabriel, A., T. J. Yen, D. C. Schwartz, C. L. Smith, J. D. Boeke,B. Sollner-Webb, and D. W. Cleveland. 1990. A rapidly rear-

ranging retrotransposon within the mini-exon gene locus ofCrithidiafasciculata. Mol. Cell. Biol. 10:615-624.

24. Hutchison, C. A., S. C. Hardies, D. D. Loeb, W. R. Shehee, andM. H. Edgeil. 1989. LINEs and related retrosposons: longinterspersed repeated sequences in the eukaryotic genome, p.593-617. In D. E. Berg and M. M. Howe (ed.), Mobile DNA.American Society for Microbiology, Washington, D.C.

25. Jakubczak, J. L., W. D. Burke, and T. H. Eickbush. 1991.Retrotransposable elements Ri and R2 interrupt the rRNAgenes of most insects. Proc. Natl. Acad. Sci. USA 88:3295-3299.

26. Jakubczak, J. L., Y. Xiong, and T. H. Eickbush. 1990. Type I(Ri) and type II (R2) ribosomal DNA insertions of Drosophilamelanogaster are retrotransposable elements closely related tothose of Bombyx mori. J. Mol. Biol. 212:37-52.

27. Johnson, M. S., M. A. McClure, D. F. Feng, J. Gray, and R. F.Doolittle. 1986. Computer analysis of retroviral pol genes:assignment of enzymatic functions to specific sequences andhomologies with nonviral enzymes. Proc. Natl. Acad. Sci. USA83:7648-7652.

28. Kazazian, H. H., C. Wong, H. Youssoufian, A. F. Scott, D. G.Phillips, and S. E. Antonarakis. 1988. Haemophilia A resultingfrom de novo insertion of Li sequences represents a novelmechanism for mutation in man. Nature (London) 332:164-166.

29. Kimmel, B. E., 0. K. Ole-Moiyoi, and J. R. Young. 1987. Ingi,a 5.2-kb dispersed sequence element from Trypanosoma bruceithat carries half of a smaller mobile element at either end andhas homology with mammalian LINEs. Mol. Cell. Biol. 7:1465-1475.

30. Klug, A., and D. Rhodes. 1987. "Zinc fingers:" a novel protein motif for nucleic acid recognition. Trends Biochem. Sci. 12:464-469.

31. Kozak, M. 1986. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44:283-292.

32. Lake, J. A., V. F. de la Cruz, P. C. G. Ferreira, C. Morel, and L. Simpson. 1988. Evolution of parasitism: kinetoplastid protozoan history reconstructed from mitochondrial rRNA gene sequences. Proc. Natl. Acad. Sci. USA 85:4779-4783.

33. Lanar, D. E., L. S. Levy, and J. E. Manning. 1981. Complexity and content of the DNA and RNA in Trypanosoma cruzi. Mol. Biochem. Parasitol. 3:327-341.

34. Loeb, D. D., R. W. Padgett, S. C. Hardies, W. R. Shehee, M. B. Comer, M. H. Edgell, and C. A. Hutchison. 1986. The sequence of a large L1Md element reveals a tandemly repeated 5' end and several features found in retrotransposons. Mol. Cell. Biol. 6:168-182.

35. Maniatis, T., E. F. Fritsch, and J. Sambrook. 1989. Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

36. Miller, J., A. D. McLachlan, and A. Klug. 1985. Repetitive zinc-binding domains in the protein transcription factor IIIA from Xenopus oocytes. EMBO J. 4:1609-1614.

37. Mount, S. M., and G. M. Rubin. 1985. Complete nucleotide sequence of the Drosophila transposable element copia: homology between copia and retroviral proteins. Mol. Cell. Biol. 5:1630-1638.

38. Murphy, N. B., A. Pays, P. Tebabi, H. Coquelet, M. Guyaux, M. Steinert, and E. Pays. 1987. Trypanosoma brucei repeated element with unusual structural and transcriptional properties. J. Mol. Biol. 195:855-871.

39. Nelson, R. G., M. Parsons, P. J. Barr, K. Stuart, M. Selkirk, and N. Agabian. 1983. Sequences homologous to the variant antigen mRNA spliced leader are located in tandem repeats and variable orphons in Trypanosoma brucei. Cell 34:901-909.

40. Nelson, R. G., M. Parsons, M. Selkirk, G. Newport, P. J. Barr, and N. Agabian. 1984. Sequences homologous to variant antigen mRNA spliced leader in Trypanosomatidae which do not undergo antigenic variation. Nature (London) 308:665-667.

41. Parkhurst, S. M., and V. G. Corces. 1986. Developmental expression of Drosophila melanogaster retrovirus-like transposable elements. EMBO J. 6:419-424.

42. Pays, E., and N. B. Murphy. 1987. DNA-binding fingers encoded by a trypanosome retroposon. J. Mol. Biol. 197:147-148.

43. Ratner, L., W. Haseltine, R. Patarca, K. J. Livak, B. Starcich, S. F. Josephs, E. R. Doran, J. A. Rafalski, E. A. Whitehorn, K. Baumeister, L. Ivanoff, S. R. Petteway, Jr., M. L. Pearson, J. A. Lautenberger, T. S. Papas, J. Ghrayeb, N. T. Chang, R. C. Gallo, and F. Wong-Staal. 1985. Complete nucleotide sequence of the AIDS virus, HTLV-III. Nature (London) 313:277-284.

44. Sandmeyer, S. B., L. J. Hansen, and D. L. Chalker. 1990. Integration specificity of retrotransposons and retroviruses. Annu. Rev. Genet. 24:491-518.

45. Sanger, F., S. Nicklen, and A. R. Coulson. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74:5463-5467.

46. Seiki, M., S. Hattori, Y. Hirayama, and M. Yoshida. 1983. Human adult T-cell leukemia virus: complete nucleotide sequence of the provirus genome integrated in leukemia cell DNA. Proc. Natl. Acad. Sci. USA 80:3618-3622.

47. Shinnick, T. M., R. A. Lerner, and J. G. Sutcliffe. 1981. Nucleotide sequence of Moloney murine leukaemia virus. Nature (London) 293:543-548.

48. Skowronski, J., and M. F. Singer. 1985. Expression of a cytoplasmic LINE-1 transcript is regulated in a human teratocarcinoma cell line. Proc. Natl. Acad. Sci. USA 82:6050-6054.

49. Southern, E. M. 1975. Detection of specific sequences among DNA fragments separated by gel electrophoresis. J. Mol. Biol. 98:503-517.

50. Toh, H., R. Kikuno, H. Hayashida, T. Miyata, W. Kugimiya, S. Inouye, S. Yuki, and K. Saigo. 1985. Close structural resemblance between putative polymerase of a Drosophila transposable genetic element 17.6 and pol gene product of Moloney murine leukemia virus. EMBO J. 4:1267-1272.

51. Vickerman, K. 1976. The diversity of the kinetoplastid flagellates, p. 1-34. In W. H. R. Lumsden and D. A. Evans (ed.), Biology of the Kinetoplastida, vol. 1. Academic Press, London.

52. Xiong, Y., and T. H. Eickbush. 1988. The site-specific ribosomal DNA insertion element R1Bm belongs to a class of non-long-terminal repeat retrotransposons. Mol. Cell. Biol. 8:114-123.

53. Xiong, Y., and T. H. Eickbush. 1988. Functional expression of a sequence specific endonuclease encoded by the retrotransposon R2Bm. Cell 55:235-246.

54. Xiong, Y., and T. H. Eickbush. 1990. Origin and evolution of retroelements based upon their reverse transcriptase sequences. EMBO J. 9:3353-3362.

55. Yamamoto, K. R., B. M. Alberts, R. Benzinger, L. Lawhorne, and G. Treiber. 1970. Rapid bacteriophage sedimentation in the presence of polyethylene glycol and its application to large-scale virus purification. Virology 40:734-744.
