Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated...

12
Human endogenous retrovirus family HERV-K(HML-5): Status, evolution, and reconstruction of an ancient betaretrovirus in the human genome Lavie, Laurence; Medstrand, Patrik; Schempp, Werner; Meese, Eckart; Mayer, Jens Published in: Journal of Virology DOI: 10.1128/JVI.78.16.8788-8798.2004 Published: 2004-01-01 Link to publication Citation for published version (APA): Lavie, L., Medstrand, P., Schempp, W., Meese, E., & Mayer, J. (2004). Human endogenous retrovirus family HERV-K(HML-5): Status, evolution, and reconstruction of an ancient betaretrovirus in the human genome. Journal of Virology, 78(16), 8788-8798. DOI: 10.1128/JVI.78.16.8788-8798.2004 General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. • Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal

Transcript of Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated...

Page 1: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

LUND UNIVERSITY

PO Box 117221 00 Lund+46 46-222 00 00

Human endogenous retrovirus family HERV-K(HML-5): Status, evolution, andreconstruction of an ancient betaretrovirus in the human genome

Lavie, Laurence; Medstrand, Patrik; Schempp, Werner; Meese, Eckart; Mayer, Jens

Published in:Journal of Virology

DOI:10.1128/JVI.78.16.8788-8798.2004

Published: 2004-01-01

Link to publication

Citation for published version (APA):Lavie, L., Medstrand, P., Schempp, W., Meese, E., & Mayer, J. (2004). Human endogenous retrovirus familyHERV-K(HML-5): Status, evolution, and reconstruction of an ancient betaretrovirus in the human genome.Journal of Virology, 78(16), 8788-8798. DOI: 10.1128/JVI.78.16.8788-8798.2004

General rightsCopyright and moral rights for the publications made accessible in the public portal are retained by the authorsand/or other copyright owners and it is a condition of accessing publications that users recognise and abide by thelegal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of privatestudy or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal

Page 2: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

JOURNAL OF VIROLOGY, Aug. 2004, p. 8788–8798 Vol. 78, No. 160022-538X/04/$08.00�0 DOI: 10.1128/JVI.78.16.8788–8798.2004Copyright © 2004, American Society for Microbiology. All Rights Reserved.

Human Endogenous Retrovirus Family HERV-K(HML-5): Status,Evolution, and Reconstruction of an Ancient Betaretrovirus

in the Human Genome†Laurence Lavie,1 Patrik Medstrand,2 Werner Schempp,3 Eckart Meese,1 and Jens Mayer1*Department of Human Genetics, University of Saarland, 66421 Homburg,1 and Institute of Human Genetics and

Anthropology, University of Freiburg, 79106 Freiburg,3 Germany, and Department of Cell andMolecular Biology, Lund University, 22184 Lund, Sweden2

Received 27 December 2003/Accepted 12 April 2004

The human genome harbors numerous distinct families of so-called human endogenous retroviruses(HERV) which are remnants of exogenous retroviruses that entered the germ line millions of years ago. Wedescribe here the hitherto little-characterized betaretrovirus HERV-K(HML-5) family (named HERVK22 inRepbase) in greater detail. Out of 139 proviruses, only a few loci represent full-length proviruses, and manylack gag protease and/or env gene regions. We generated a consensus sequence from multiple alignment of 62HML-5 loci that displays open reading frames for the four major retroviral proteins. Four HML-5 longterminal repeat (LTR) subfamilies were identified that are associated with monophyletic proviral bodies,implying different evolution of HML-5 LTRs and genes. Sequence analysis indicated that the proviruses formedapproximately 55 million years ago. Accordingly, HML-5 proviral sequences were detected in Old World andNew World primates but not in prosimians. No recent activity is associated with this HERV family. We alsoconclude that the HML-5 consensus sequence primer binding site is identical to methionine tRNA. Therefore,the family should be designated HERV-M. Our study provides important insights into the structure andevolution of the oldest betaretrovirus in the primate genome known to date.

Approximately 8% of the human genome is derived fromretrovirus-like elements termed endogenous retroviruses(ERV) (14). Most of them are likely remnants of exogenousretrovirus infection of the germ line which became fixed in thepopulation millions of years ago. Intracellular retrotransposi-tion events increased proviral copy number during evolution,resulting in the presence of a few to several thousand provi-ruses belonging to various ERV families. A variety of distincthuman ERV (HERV) families have been identified, suggestingthat the germ line was invaded by various exogenous retrovi-ruses (23, 25, 41). The human genome was recently estimatedto contain 30 to 50 distinct HERV families (11). About 100different HERV sequences are defined in Repbase, a widelyemployed reference sequence database for repetitive elements(15). Unfortunately, there is no established nomenclature forHERV. We follow here a nomenclature that uses the single-letter amino acid code of the tRNA complementary to theprimer binding site formerly used to prime reverse transcrip-tion. Accordingly, HERV-K denotes a number of HERV fam-ilies having a lysine tRNA-like primer binding site. Phyloge-netic analysis of HERV reverse transcriptase sequences haveidentified 10 HERV-K families in the human genome whichwere termed human MMTV-like (HML-1 to HML-10) be-cause of homologies to the betaretrovirus mouse mammary

tumor virus (MMTV) (1, 32). Repbase Update also lists 10HERV-K families.

Structurally intact ERV contain gag, protease (prt), polymer-ase (pol), and envelope (env) genes and are flanked by longterminal repeats (LTRs). Two HERV families, HERV-K(HML-2) and HERV-W, contain intact open reading frames(ORFs) and encode functional proteins (2, 3, 18, 27, 29, 31,36). Further proviral sequences with Env coding capacity wereidentified recently (9). Most HERV sequences are coding de-ficient because they accumulated a variety of mutations anddeletions since provirus formation. The vast majority of HERVare present in the genome only as solitary LTRs due to ho-mologous recombination between 5� and 3� LTRs.

Many HERV families were fixed in Old World primatesafter their evolutionary split from New World primates about35 million years ago (41). For example, the HERV-K familiesHML-2, HML-3, and HML-6 were all found in Old Worldmonkeys but missing in New World primates. The HERV-K(HML-2) family seems to be relatively long-lived in the ge-nome, displaying transpositional activity since its appearancein the primate lineage up till after the human-chimpanzee split(7, 28, 33, 38). In contrast, other ERV families are short-lived.In the case of the HERV-K(HML-3) family, no activity interms of provirus formation was indicated in the human evo-lutionary lineage over the last 30 million years (26). To date,only a few HERV families have been characterized in greaterdetail. Here we characterize the HERV-K(HML-5) family byanalyzing proviruses in the human genome. We find that theHML-5 proviruses have an older origin than other betaretro-viruses in the human genome, and we reconstruct a putativefull-length coding-competent ancient betaretrovirus which tar-geted the primate germ line more than 50 million years ago.

* Corresponding author. Mailing address: Department of HumanGenetics, Building 60, University of Saarland, Medical Faculty, 66421Homburg, Germany. Phone: 49 6841 1626627. Fax: 49 6841 1626186.E-mail: [email protected].

† Supplemental material for this article may be found at http://jvi.asm.org/.

8788

Page 3: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

MATERIALS AND METHODS

HERV-K(HML-5) sequence retrieval. HERV-K(HML-5) proviruses(LTR22A/HERVK22/LTR22A; Repbase version 8.2.0 consensus sequences)were identified by downloading the human sequence specified as HERVK22 inthe RepeatMasker annotation of the June 2002 (hg12) UCSC genome browser(http://genome.ucsc.edu/). We performed dot matrix comparisons between eachproviral sequence using MacVector (Accelrys Inc.) with default settings. Weidentified boundaries between HERV-K(HML-5) proviral loci and flanking cel-lular sequences and the location of nonretroviral repetitive elements within theproviral portions by using RepeatMasker (provided by A. F. A. Smit and P.Green; RepeatMasker at http://www.repeatmasker.org). We subsequently re-moved LTRs and apparent non-HERV-K(HML-5) sequences.

Multiple alignment of HERV-K(HML-5) sequences and consensus sequencegeneration. We produced multiple alignments of HERV-K(HML-5) proviralsequences displaying a minimal size of 3 kb (after deletion of other repetitiveelements). Multiple alignments were generated employing DIALIGN2 (35) pro-vided by the Institut Pasteur web server (http://bioweb.pasteur.fr) or employingClustalW (42) using default settings. We further optimized alignments by hand-operating the Se-Al program (provided by Andrew Rambaut; http://evolve.zoo.ox.ac.uk/). Consensus sequences, generated by the Boxshade program at theInstitut Pasteur, were further evaluated and corrected manually. The resultingproviral sequence was analyzed for encoded retroviral proteins and conserveddomains employing BlastP and CD-search at the National Center for Biotech-nology Information.

Phylogenetic analysis. We employed PAUP* (Sinauer Associates) and thePHYLIP package (10), the latter provided by the Institut Pasteur web server, forphylogenetic analysis. The above programs generated multiple alignment of LTRsequences, and several regions from the multiple alignment of proviral bodyregions were subjected to neighbor-joining analysis. Nucleotide distances werecorrected according to the Kimura-2-parameter model (17). Gaps and ambigu-ous positions were excluded from analysis. Sequences obtained from variousprimate species were included in the respective multiple alignment portions ofhuman sequences and were analyzed similarly.

Evolutionary age of proviruses. Approximate integration times of HERV-K(HML-5) proviruses displaying almost full-length LTRs on both sides wereestimated by determination of Kimura-2-parameter corrected nucleotide dis-tances between 5� and 3� LTRs of particular proviruses by employing PAUP*.Proviral age (T) was estimated according to the following formula: T � D/(2 �

0.13), where D is the nucleotide divergence between the 2 LTR sequences and0.13 is the estimated average substitution rate per nucleotide and million years(8, 19). The factor 2 considers that both LTRs diverged independently from eachother.

Test for HERV-K(HML-5) homologous sequences in primate species. PCRprimers corresponding to the 5� portion of the gag gene (P1, 5�-AACAGTATATAAAAGTATTGAAACA-3�; and P2, 5�-TGCTTTTTCTTATCTCTTTATAAG-3�) and the env gene (P3, 5�-CCATTCACTCCAGATAATTTGT-3�; and P4,5�-TTCTTTTCCAATCAAGGAGTGA-3�) within the HERV-K(HML-5) con-sensus sequence were used to amplify HERV-K(HML-5) proviruses from humanand various other primate species genomic DNAs, together comprising fourprimate clades: Pan troglodytes, Pongo pygmaeus, Hylobates lar, Colobus guereza,Mandrillus sphinx, Macaca mulatta, Saimiri sciureus, Callithrix jacchus, Alouattaseniculus, and Nycticebus coucang. Genomic DNA (100 ng each) was subjected toPCR with 1 �M primers and 2.5 U of Taq polymerase (Invitrogen Inc.) in a finalvolume of 25 �l. PCR cycling conditions for gag and env amplification were asfollows: 5 min at 94°C; 35 cycles of 45 s at 94°C, 45 s at 54°C, and 2 min at 72°C;5 min at 72°C. PCR products for the gag region obtained from P. troglodytes, H.lar, M. mulatta, and A. seniculus were cloned into the pGEM-T vector (Promega).Sequences for both DNA strands were obtained using a SequiTherm Excel IIDNA sequencing kit-LC (Biozym) and an automated DNA sequencer (Licor4000-L; MWG). Raw sequence data were analyzed using the Sequencher soft-ware (Gene Codes Corp.).

Nucleotide sequence accession numbers. We deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase.

RESULTS

Identification and structure of HERV-K(HML-5) provi-ruses. By using the annotation within the UCSC genomebrowser (http://genome.ucsc.edu/), we identified 139 HML-5proviruses in the human genome. Twenty-three out of 139(16.5%) loci were located on the Y chromosome, whereas

other loci seemed randomly distributed along autosomes. Wenext performed dot matrix comparisons between proviral lociand the Repbase consensus sequences (LTR22A/HERVK22/LTR22A) to examine proviral structures in more detail. Wegenerated schematic structures of 100 out of 139 HML-5 se-quences (Fig. 1). We did not include 39 of the provirusesbecause those proviral sequences were heavily mutated due tointegration of other repetitive elements, deletions, duplica-tions, and/or inversions. The majority (123 proviruses) dis-played larger deletions of retroviral genes: only 11 provirusesdisplayed a complete gag gene, 12 a complete prt gene, 45 acomplete pol gene, and 15 a complete env gene. Among those,multiple sequences commonly displayed large deletions ofabout 1,410 and 1,320 bp within either the gag-prt junction orthe env region, respectively, or in both. The first deletion re-moved a large portion of the 5� part of the gag gene and thecomplete prt gene, and the second deletion removed a largecentral part of the env gene (Fig. 1). Sixteen proviruses wereintact regarding the presence of retroviral genes, and nine ofthese resembled full-length proviral structures having LTRs onboth sides flanking the putative coding regions.

Reconstruction of a coding-competent HML-5 provirus. Be-fore multiple sequence alignment of HML-5 sequences, weemployed RepeatMasker to identify and to subsequently de-lete non-HML-5 sequences from proviral sequence entries. Agreater number of proviruses contained one or several otherrepetitive elements which likely integrated into the provirusesat different stages during primate evolution. L1 elements fromthe M1, P, and PA2-6 subfamilies and Alu elements from theY, Yc, Ya5, Yb9, Sp, Sc, and Sg subfamilies were recognized(4, 16, 40). Furthermore, numerous LTR elements annotatedin Repbase as LTR5B, -6A, -7, -7B, -10F, -12, -12B, -12C, -15,-17, -19, -30, and MER11C (15) were found (Table 1). Wechose 62 proviral sequence entries with the least deletions formultiple alignment. By using Boxshade and subsequent visualcorrection, we generated a HML-5 consensus sequence fromthe alignment. The sequence displayed a total length of 6,824bp and was 4% divergent in sequence compared to theHERVK22 consensus sequence present in Repbase. Severalnucleotide differences adjusted reading frames with regard tothe Repbase sequence. Two 1-bp and an 11-bp insertion wereintroduced into the 5� intergenic region. A 3-bp sequence wasintroduced, and 2 and 1 bp were removed in the gag, prt, andpol gene regions, respectively. In addition, a stop codon withinthe prt gene and three stop codons within pol could be cor-rected.

Translation of our consensus sequence gave rise to ORFs forfour major retroviral proteins that were not found as such inthe Repbase sequence. ORFs for putative retroviral gag, prt,pol, and env genes could be defined (Fig. 1; see also Fig. 1A inthe supplemental material, which is also available from usupon request). The putative 517-, 265-, 911-, and 694-amino-acid Gag, Prt, Pol, and Env proteins, respectively, displayedextended similarities to the corresponding retroviral proteins,as revealed by CD-search and BlastP analysis. As describedrecently, HML-5 proviruses once harbored a potentially activedUTPase enzyme within the prt ORF and also a domain en-coding a retroviral aspartyl protease (30). As expected, the polreading frame encoded retroviral reverse transcriptase, RNaseH, and integrase domains. CD-search furthermore identified a

VOL. 78, 2004 CHARACTERIZATION OF HERV-K(HML-5) 8789

Page 4: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

FIG. 1. Structures of 100 HERV-K(HML-5) provirus sequences. An ORF map of the HERV-K(HML-5) consensus sequence generated in thisstudy, including retroviral gag, prt, pol, and env reading frames and protein domains, is shown at the top. Structures of proviral sequences withregard to the consensus sequence are depicted below, with horizontal lines indicating the presence of respective proviral portions. Numbers to theright refer to proviral locus numbers given in Table 1. Proviruses 117, 133, and 136 probably stem from genomic duplication events and are

8790 LAVIE ET AL. J. VIROL.

Page 5: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

HERV-K/MMTV-type gp36 domain within the Env proteinC-terminal portion (Fig. 1). BlastP analysis of encoded retro-viral proteins against the GenBank protein division revealedhigh similarities to retroviral proteins from the HERV-K(HML-2) family. Moreover, significant similarities to the be-taretroviruses ovine pulmonary adenocarcinoma virus,MMTV, Mason-Pfizer monkey virus, and simian retroviruses 1and 2 were detected, with identities to respective retroviralproteins ranging from 26 to 49% and similarities ranging from41 to 64% (as determined by BLAST). Thus, our method wasable to reconstruct all putative coding sequences of an ancientbetaretrovirus.

The HML-5 sequence was initially assigned to the HERV-Ksuperfamily due to the similarity in pol with other HERV-Kfamilies. Based on the analysis of an earlier draft version of thehuman genome sequence, Tristem suggested that the HERV-K(HML-5) primer binding site region is more similar to iso-leucine than to lysine (43). We analyzed the HERV-K(HML-5) consensus primer binding site region in regard tothe similarity to reported human tRNA sequences (22) (http://rna.wustl.edu/GtRDB/) and found that the HML-5 primerbinding site is identical in sequence to the methionine tRNA 3�end along at least 18 nucleotides (nt). Isoleucine tRNA dis-played differences in three nucleotides, and lysine tRNAs dis-played at least six different nucleotides (Fig. 2). Therefore,following the current HERV nomenclature, the HERV-K(HML-5) family should be designated HERV-M.

Phylogeny of the HERV-K(HML-5) family. Dot matrix com-parisons between the different proviruses and the HERVK22sequence flanked by LTR22A, as included in Repbase, showedthat only a few entries presented more extended similaritieswith the LTR22A sequence. This finding suggested the exis-tence of one or more LTR variants associated with HML-5sequences and corroborated the definition of different LTR22subfamilies in Repbase (LTR22, LTR22A, and LTR22B). Foreach proviral entry, we utilized RepeatMasker annotations todetermine the presence of complete or deleted 5� and 3� LTRs(Table 2). We found that 70 proviral sequences were associ-ated with full-length LTRs on both sides, 59 entries harboredat least one complete or deleted LTR on one side, and 10entries lacked both LTRs. We generated a ClustalW multiplealignment with default alignment parameters for a total of 134fairly complete LTR sequences.

Neighbor-joining analysis of the sequences subject to multi-ple alignment suggested the existence of several LTR subfam-ilies (Fig. 3). A major branch, displaying 100% bootstrap sup-port (1,000 replicates), consisted of sequences similar to theLTR22A subfamily. Among those sequences, two subgroupsdisplaying 63 and 91% bootstrap support could be distin-guished. We named these two subgroups LTR22A andLTR22A2. Another group (99% support) consisted of se-quences with similarity to the LTR22 subfamily. The majority

of the remaining sequences were more similar to the LTR22Bsubfamily. Therefore, analysis of the entire HML-5 data setrevealed three major LTR subfamilies (LTR22, LTR22A, andLTR22B) associated with the proviral sequences and is inaccord with previous findings. In addition, LTR22A comprisesa clearly distinguishable subgroup, named LTR22A2.

Based on the larger data set available in our study (40), wegenerated three new consensus sequences for the LTR22 fam-ilies listed in Repbase and for LTR22A. The classification ofLTR sequences in Table 2 is based on their phylogenetic re-lationship to those new LTR22 consensus sequences. The newconsensus sequences for LTR22B and LTR22A displayedlengths of 497 and 457 bp, respectively, and were modestlydifferent from the Repbase sequences. The 471-nt-longLTR22A2 consensus sequence presented 74% identity and 4%gaps compared to LTR22A. As revealed in this study, thepreviously established Repbase LTR22 consensus sequencewas probably derived from three loci located on the Y chro-mosome (proviruses 117, 121, 131; Fig. 3) that are less repre-sentative for LTR22. The LTR22 consensus sequence gener-ated in this study is 492 bp in length compared to the 580-ntRepbase sequence that furthermore included 47 ambiguouspositions (International Union of Pure and Applied Chemistrycodes).

We furthermore analyzed the phylogenetic relationships be-tween HERV-K(HML-5) proviruses for five different proviralregions (representing all major retroviral genes) by neighbor-joining and bootstrap analysis. Phylogenetic trees displayedsimilar branch lengths with respect to the consensus sequence,suggesting a similar nucleotide divergence and age of proviralsequences (data not shown). Neighbor-joining analysis impliedtwo HML-5 subfamilies, but they were not supported in thebootstrap analysis. Only a few chromosome Y sequences weredistinguished from the remaining sequences with higher boot-strap support (Fig. 4). An exception was obtained for theregion spanning the end of reverse transcriptase until the startof RNase H. In this case, a subfamily of 28 proviral sequenceswas distinguished with 86% bootstrap support that was mostlyassociated with LTR22A and LTR22A2 sequences. Eight moreLTR22B sequences were separated with 90% support. Theremaining 14 sequences (with low bootstrap support) wereeither LTR22 or LTR22B (data not shown). Taken together,analysis of four out of five internal regions supports the obser-vation that HML-5 proviral bodies do not form distinct sub-groups but rather represent a monophyletic group, in contrastto phylogenetically clearly different LTR sequences.

Age of the HERV-K(HML-5) family. A number of repetitiveelements were found in HERV-K(HML-5) proviruses. In par-ticular, reasonably old Alu subfamilies, such as AluSg, AluSc,and AluSp (Table 1), with approximate evolutionary ages of 31,35, 37 million years (16), respectively, suggested an evolution-ary age of HML-5 proviruses higher than that of other

therefore summarized in one line. Non-HERV-K(HML-5) insertions, such as Alu or L1 elements, present in some proviruses were excluded fromthe figure. Note that the majority of sequences display two commonly deleted regions within the gag-prt and the env genes, indicated by verticaldotted lines. Some sequences only comprise 5� or 3� portions. For gag-prt, 5� and 3� deletion points clustered between nt 768 to 960 and nt 2061to 2319, respectively (52% being located at nt 768 and nt 2303). For env, clusters ranged from nt 5256 to 6328 and nt 6568 to 6641 (75% locatedat nt 5314 and 68% at nt 6641). The thick black lines numbered 1 through 5 indicate regions in the multiple sequence alignment for whichphylogenetic analysis was performed. RT, reverse transcriptase.

VOL. 78, 2004 CHARACTERIZATION OF HERV-K(HML-5) 8791

Page 6: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

TA

BL

E1.

HE

RV

-K(H

ML

-5)

loci

inve

stig

ated

inth

isst

udya

No.

Chr

.ban

dSi

ze(b

p)A

cces

sion

no.

Rep

eats

No.

Chr

.ban

dSi

ze(b

p)A

cces

sion

no.

Rep

eats

11p

35.2

829

AL

6629

06.4

(104

050–

1048

80)�

72*

4q32

.16,

276

AC

1070

56.5

(822

0–14

493)

�2*

1p22

.23,

721

AC

0191

87.3

(101

11–1

3833

)�73

*5p

14.3

3,53

7A

C02

6722

.4(9

6454

–999

85)�

3*1q

225,

409

AL

1359

27.1

4(71

991–

7740

1)�

Alu

Sp/A

luY

a574

5p14

.166

2A

C02

7333

.5(1

6879

0–16

9434

)�4

1q23

.112

,084

AL

3597

53.9

(444

27–5

6512

)�L

1PA

2/L

1PA

375

*5p

13.2

7,87

7A

C00

8925

.4(5

0885

–587

63)�

Alu

Y5*

1q25

.35,

167

AL

1333

83.1

0(88

786–

9395

4)�

76*

5q33

.12,

096

AC

0221

06.5

(509

79–5

3076

)�6

1q31

.365

1A

L59

2144

.5(1

4818

7–14

8839

)�77

*6p

22.3

7,74

2A

L59

1416

.4(9

7763

–105

506)

�7*

1q32

.12,

719

AL

3598

37.2

1(11

2252

–114

972)

�78

*6p

22.2

3,24

2A

L03

1777

.5(6

2600

–658

43)�

8*1q

32.1

2,71

8A

C02

2000

.1(5

7476

–601

95)�

79*

6p21

.32

7,28

7A

L64

5931

.7(5

7633

–644

54)�

9*1q

42.1

27,

611

AL

1627

38,1

2(44

11–1

2023

)�L

TR

12/L

TR

30/L

TR

5880

6p21

.32

136

AL

6459

31.7

(644

51–6

4588

)�10

*1q

435,

604

AL

5916

86.9

(118

293–

1238

98)�

Alu

Y81

*6p

21.3

11,

922

AL

1387

21.1

6(57

057–

5897

2)�

Alu

Sg(i

n5�

LT

R)

1110

q11.

215,

689

AC

0687

07.6

(109

183–

1148

73)�

Alu

Sp82

*6p

12.3

4,53

1A

L35

9458

.17(

6657

5–71

107)

�12

11q1

2.1

5,13

4A

P001

652.

4(15

6224

–161

359)

�83

*6p

12.1

4,88

9A

L45

0489

.12(

4528

4–50

174)

�13

*11

q14.

17,

752

AP0

0339

8.2(

1538

58–1

5682

9)�

84*

6q12

5,25

1A

L07

8597

.11(

4651

3–51

765)

�A

luY

(in

LT

R)

14*

11q2

16,

698

AP0

0087

0.4(

6549

2–72

191)

�A

luY

85*

6q14

.17,

151

AL

3918

43.1

3(67

3–78

25)�

15*

11q2

2.1

4,87

8A

P003

403.

2(16

4665

–169

544)

�86

*6q

14.3

4,49

4A

L35

7272

.10(

1482

9–19

324)

�16

11q2

2.1

771

AP0

0079

8.4(

4473

3–45

495)

�87

*6q

16.3

7,83

0A

L16

1720

.15(

1203

7–19

868)

�A

luY

1711

q22.

160

3A

P000

798.

4(45

495–

4608

1)�

88*

7p14

.14,

097

AC

0730

68.8

(158

047–

1621

45)�

18*

12p1

3.31

4,02

6A

C09

2746

.9(1

6238

–202

58)�

897p

14.1

640

AC

7206

1.8(

1598

4–16

579)

�19

12p1

3.31

242

AC

0927

46.9

(130

74–1

3317

)�90

*7q

21.1

11,

619

AC

0965

62.1

(411

74–4

2804

)�20

*12

q12

4,95

7A

C07

9601

.22(

7991

6–84

874)

�91

7q22

.35,

307

AC

0048

55.1

(661

47–7

1455

)�21

*12

q12

4,88

7A

C07

6972

.16(

2999

3–34

881)

�92

*7q

31.3

17,

753

AC

0045

36.1

(332

87–4

1051

)�22

12q1

54,

267

AC

0206

56.3

0(52

510–

5677

8)�

LT

R7B

(in

5�L

TR

)93

*8p

23.2

5,11

3A

C08

7369

.5(9

0669

–957

83)�

23*

12q2

1.32

4,96

6A

C08

7865

.8(1

5116

9–15

6136

)�94

*8p

23.2

5,04

3A

C08

7369

.5(3

6733

–417

75)�

2412

q22

5,22

0A

C01

8475

.27(

1655

38–1

7075

9)�

LT

R30

/LT

R12

/LT

R12

895

*8P

23.2

5,81

9A

C08

7369

.5(1

4280

–201

00)�

25*

12q2

3.1

6,09

1A

C01

0203

.13(

5308

7–59

179)

�96

*8p

23.1

6,27

2A

C05

5869

.4(2

3969

–271

26)�

Alu

Y/L

TR

7/H

ER

V-H

26*

12q2

4.31

3,93

0A

C00

5858

.1(7

141–

1107

2)�

97*

8p21

.21,

480

AC

0249

58.8

(167

718–

1691

99)�

27*

13q1

2.12

2,59

4A

L35

4798

.13(

8742

1–90

016)

�98

*8q

11.1

5,21

3A

C10

4576

.5(1

2676

8–13

1982

)�A

luY

28*

13q2

1.33

5,03

4A

L13

9001

.14(

8067

3–85

708)

�99

*8q

12.1

5,35

5A

C09

1561

.4(1

8044

1–18

5797

)�29

*13

q33.

14,

742

AL

1580

63.1

2(11

0923

–115

660)

�10

0*8q

21.3

3,72

0A

C11

0012

.3(1

3409

5–13

7806

)�30

*14

q21.

12,

851

AL

3568

00.3

(992

79–1

0213

1)�

101*

9p24

.12,

439

AL

1335

47.1

6(38

846–

4128

6)�

31*

14q2

1.2

3,68

0A

L16

1752

.4(8

5483

–891

64)�

102*

9p23

6,21

9A

L45

1129

.6(1

6932

–231

52)�

32*

14q3

1.1

8,65

4A

C01

8513

.5(1

9822

–284

77)�

L1P

A6/

LT

R12

B/L

TR

30/L

TR

1210

3*9q

31.1

5,18

7A

L35

3805

.20(

3011

6–35

304)

�A

luY

3314

q32.

335,

131

AB

0194

37.1

(130

399–

1355

31)�

104*

Xp2

1.3

7,30

4A

C02

4024

.6(1

0252

9–10

9834

)�L

1PA

534

15q1

5.1

4,34

5A

C08

7878

.7(9

2196

–965

42)�

Alu

Sg10

5*X

p21.

31,

130

AC

0052

97.1

(121

542–

1226

63)�

35*

15q2

1.2

6,20

6A

C02

5040

.7(6

4493

–652

86)�

L1P

A4

106*

Xp1

1.1

2,57

6A

L35

4756

.18(

1424

8–16

813)

�L

TR

1536

*15

q21.

34,

766

AC

0687

23.5

(705

64–7

5331

)�10

7*X

q11.

26,

619

AL

1582

03.1

2(31

906–

3852

6)�

37*

16q1

1.2

4,82

8A

C00

2519

.1(6

3018

–678

47)�

L1M

110

8*X

q11.

25,

610

AL

3537

44.1

8(11

7924

–123

535)

�A

luY

b938

*16

q11.

21,

962

AC

1165

53.1

(784

27–8

0390

)�10

9X

q12

2,64

3A

L44

5523

.11(

6858

9–71

233)

�39

16q1

1.2

670

AC

1067

85.2

(451

30–4

5801

)�11

0*X

q13.

27,

328

AL

3565

13.1

1(68

589–

7123

3)�

aluY

b940

16q1

1.2

6,77

6A

C10

6785

.2(4

7835

–546

12)�

Alu

Y/L

TR

1911

1*X

q21.

14,

840

AL

5921

71.8

(182

87–2

3126

)�41

*18

q12.

14,

077

AC

0742

37.4

(266

72–3

0750

)�11

2*X

q21.

13,

222

AL

5925

63.7

(163

40–1

9563

)�42

18q2

1.2

687

AC

0911

35.9

(622

09–6

2897

)�11

3X

q21.

328,

115

AL

3908

40.1

7(18

9393

–197

509)

�L

1PA

5/L

1P(i

n5�

LT

R)

4318

q21.

22,

626

AC

0911

35.9

(629

17–6

5544

)�A

luY

/Alu

Sp44

*19

p13.

115,

715

AC

1239

12.1

(768

24–8

2540

)�A

luY

114*

Xq2

2.1

6,47

3A

L13

3277

.12(

1570

–804

4)�

45*

19p1

22,

998

AC

0114

93.4

(814

74–8

4473

)�11

5*X

q27.

33,

446

AL

5896

69.1

1(15

7039

–160

486)

�46

19p1

24,

950

AC

0114

93.4

(613

2–62

291)

�A

luSC

/Alu

Y/L

TR

7/H

ER

V-H

116*

Xq2

85,

541

U69

569.

1(18

47–7

389)

�47

19p1

25,

495

AC

0115

03.4

(276

93–3

3189

)�A

luSq

117*

Yp1

1.2

9,51

5A

C00

7274

.3(2

4960

–344

72)�

LT

R6A

/LT

R12

C48

*19

p12

3,59

4A

C01

1503

.4(4

7638

–512

33)�

Alu

SP/M

ER

911

8*Y

p11.

24,

211

AC

0072

75.4

(152

708–

1569

20)�

LT

R17

4919

q11

5,62

5A

C00

6504

.1(3

9162

–447

88)�

Alu

Yc

119*

Yp1

1.2

3,05

1A

C01

6749

.4(1

3125

5–13

4304

)�A

luY

50*

19q1

3.2

2,58

3A

C00

5337

.1(9

746–

1232

5)�

Alu

Y/L

1PA

512

0Y

p11.

28,

661

AC

0099

52.4

(751

35–8

3798

)�A

luY

/ME

R11

C51

2p23

.32,

713

AC

0108

96.1

5(27

276–

2999

0)�

LT

R17

/Alu

Y/A

luY

121*

Yp1

1.2

7,32

1A

C00

6986

.3(1

3826

0–14

5567

)�A

luY

/ME

R11

C52

*2p

16.1

4,93

1A

C01

0738

.5(7

2193

–771

25)�

ME

R2

122*

Yq1

1.22

15,

070

AC

0063

83.2

(688

83–7

3954

)�53

*2p

13.2

7,71

8A

C00

7878

.2(1

8354

9–18

5134

)�12

3Y

q11.

222

5,12

0A

C00

9233

.3(1

7211

–223

32)�

54*

2q21

.16,

694

AC

0682

82.2

(509

64–5

7659

)�A

luSc

124*

Yq1

1.22

24,

573

AC

0092

33.3

(108

140–

1127

14)�

Alu

Y/L

TR

1255

2q24

.35,

920

AC

0925

83.2

(123

236–

1291

57)�

LT

R12

C12

5*Y

q11.

223

6,11

9A

C00

7678

.3(8

8618

–947

34)�

Alu

Sc/A

luSp

/q56

*2q

31.1

7,76

7A

C09

2641

.2(5

721–

1348

9)�

126*

Yq1

1.22

35,

261

AC

0094

94.2

(123

311–

1285

73)�

Alu

Sc(i

n3�

LT

R)

8792 LAVIE ET AL. J. VIROL.

Page 7: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

HERV-K families. The age of a provirus can be estimatedfrom sequence comparison of flanking 5� and 3� LTRs. Owingto the retroviral reverse transcription strategy, both LTR se-quences are identical in sequence at the time of provirus for-mation. Without selective pressure, both LTRs independentlyaccumulate mutations over time. Thus, sequence differencesbetween a provirus’ LTRs are an approximate measure ofprovirus age (8). We determined the degree of sequence di-vergence between 5� and 3� LTRs with larger overlappingportions for 53 HML-5 proviruses. We obtained sequence di-vergences ranging from 6 to 24% (mean, 12.9%; standarddeviation, 3.87). These numbers equal an approximate age of50 (�15) million years for the HML-5 proviruses (Table 2).We furthermore calculated the average ages of proviral se-quences from Kimura-2-parameter corrected distances to theHML-5 consensus sequence for five different proviral regions,excluding gaps and CpG dinucleotides (16). Here, an averageevolutionary age of about 60 (�27) million years was indicated.The age of roughly 55 million years from both analyses thuscorresponds to a HML-5 integration time into the genomeclearly before the evolutionary split of Old World from NewWorld monkeys that took place about 40 million years ago.This observation is in contrast to other HERV-K families de-scribed till now because those are not present in New Worldmonkeys, suggesting that HML-5 represents an ancient beta-retrovirus family in primates. To investigate the possibility thatHML-5 elements are present in New World primate genomes,we examined the species distribution of HML-5 gag and envregions by PCR with genomic DNA from hominoids, OldWorld monkeys, New World monkeys, and prosimians. Theamplified portion in the gag gene was located outside theabove-described commonly deleted region, whereas the ampli-con in env included the frequently deleted region. We obtainedgag and env PCR products of expected sizes from all speciestested, except for prosimians (Fig. 5). This result was in goodagreement with the above-mentioned age estimate from LTRand consensus sequence analysis. Full-length env genes,present in a minority of HML-5 proviruses in the human ge-nome, were amplified as a PCR product from all species. PCRproducts corresponding to the env deletion variant were alsoamplified, indicating the presence of both longer and shorterenv variants in all tested species.

We cloned PCR products from the gag region obtained fromP. troglodytes, H. lar, M. mulatta, and A. seniculus and se-

FIG. 2. Sequence comparison of the HERV-K(HML-5) consensussequence primer binding site with reported tRNA 3� ends (http://rna.wustl.edu/GtRDB). Shown here are the HERV-K(HML-5) con-sensus primer binding site regions from Repbase (Rep) and as gener-ated in this study (own) and the closest matching methionine, isoleu-cine, lysine, and arginine tRNA 3� ends. Anticodon sequences aregiven in brackets.

572q

31.1

6,67

2A

C07

8883

.6(1

3118

8–13

7864

)�12

7*Y

q11.

223

7,72

0A

C00

9489

.3(8

1570

–892

91)�

58*

2q37

.15,

575

AC

0094

07.8

(766

29–8

2201

)�L

TR

10F

/HE

RV

IP10

FH

/LT

R10

F12

8*Y

q11.

223

5,42

0A

C00

7876

.2(1

0081

3–10

6233

4)�

59*

20p1

35,

040

AL

0497

61.1

1(10

942–

1598

3)�

129

Yq1

1.22

312

,395

AC

0092

39.3

(231

04–3

5500

)�L

TR

12B

/LT

R12

C/L

TR

12/H

ER

V9

6020

p11.

2165

4A

L03

1673

.19(

3250

5–33

148)

�H

ER

VIP

10F

H/A

luY

(in

3�L

TR

)61

*3q

26.1

4,71

6A

C01

8457

.14(

4424

7–48

964)

�13

0*Y

q11.

223

8,66

1A

C02

1107

.3(7

1095

–797

56)�

Alu

Y/M

ER

11C

62*

3q26

.21,

878

AC

0080

40.7

(241

80–2

6059

)�13

1*Y

q11.

223

7,33

6A

C02

5227

.6(4

6631

–539

68)�

63*

4p13

1,44

6A

C09

6586

.3(2

1933

–233

80)�

Alu

Y13

2*Y

q11.

223

1,39

7A

C00

7320

.3(1

1024

4–11

1627

)�64

*4q

13.1

2,56

9A

C09

7648

.2(1

1974

1–12

2311

)�13

3*Y

q11.

223

9,51

5A

C01

0080

.2(2

8375

–378

88)�

LT

R6A

/LT

R12

C65

*4q

13.3

1,28

8A

C02

4722

.5(1

1136

2–11

2651

)�13

4Y

q11.

223

12,8

88A

C00

6366

.3(7

4013

–869

02)�

HE

RV

IP10

FH

/Alu

Ya5

/LT

R12

C66

*4q

13.3

4,97

6A

C02

4722

.5(1

1265

3–11

7630

)�L

TR

30/L

TR

12/L

1PA

367

*4q

21.2

33,

800

AC

0974

88.2

(137

13–1

7514

)�13

5Y

q11.

223

206

AC

0108

8.3(

205–

412)

�68

*4q

244,

831

AC

1073

81.2

(383

46–4

3178

)�13

6*Y

q11.

223

9,51

5A

C00

9947

.2(4

7848

–573

61)�

LT

R6A

/LT

R12

C69

*4q

28.3

6,62

7A

C09

6763

.2(1

9403

–260

31)�

137

Yq1

1.22

388

6A

C01

0153

.3(2

2418

–233

05)�

704q

28.3

6,63

5A

C10

8867

.3(1

0921

5–11

5851

)�L

TR

12C

/LT

R22

138

Yq1

1.23

886

AC

0167

28.4

(821

88–8

3075

)�71

*4q

32.1

4,13

7A

C00

9567

.8(4

7850

–519

88)�

139*

Yq1

1.23

9,51

5A

C00

6991

.3(1

0765

1–11

7164

)�L

TR

64/L

TR

12C

aH

ER

V-K

(HM

L-5

)lo

ciar

enu

mbe

red

cons

ecut

ivel

y.C

hrom

osom

al(C

hr.)

band

loca

tions

asw

ella

ssi

zes

ofpr

ovir

uses

orre

mna

nts

oflo

ciar

egi

ven.

Acc

essi

onnu

mbe

rsof

finis

hed

Gen

bank

hum

ange

nom

ese

quen

ceen

trie

s,in

clud

ing

nucl

eotid

epo

sitio

nw

ithin

the

entr

yan

dor

ient

atio

n(�

or�

)of

the

HE

RV

-K(H

ML

-5)

port

ion

are

give

nin

the

four

thco

lum

n.T

hela

stco

lum

nlis

tsno

n-H

ER

V-K

(HM

L-5

)re

petit

ive

elem

ents

with

inpr

ovir

alse

quen

ces.

VOL. 78, 2004 CHARACTERIZATION OF HERV-K(HML-5) 8793

Page 8: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

TA

BL

E2.

Ass

ignm

ent

ofL

TR

sequ

ence

sas

soci

ated

with

HE

RV

-K(H

ML

-5)

prov

iral

bodi

esa

No.

Ann

otat

ion

Age

No.

Ann

otat

ion

Age

No.

Ann

otat

ion

Age

Rep

base

Our

sR

epba

seO

urs

Rep

base

Our

s

5�L

TR

3�L

TR

5�L

TR

3�L

TR

5�L

TR

3�L

TR

5�L

TR

3�L

TR

5�L

TR

3�L

TR

5�L

TR

3�L

TR

122

Ano

22A

4822

A(�

)22

A22

A95

2222

B22

B2

22A

22A

22A

22A

50.8

4922

22B

22B

9622

2222

A2

22A

23

2222

(�)

5022

no97

22A

22A

(�)

22A

422

2251

22A

no22

A98

2222

2222

59.1

522

22(�

)52

2222

A(�

)22

A2

9922

no6

no22

5322

A22

A22

A22

A38

.410

0no

no7

22A

22A

22A

22A

5422

2222

2247

.910

122

B(�

)22

B22

B8

22A

22A

22A

22A

5522

2210

222

22am

b.am

b.50

.39

2222

5622

2222

2236

.110

322

22am

b.am

b.36

.710

2222

B22

B57

22B

22B

22B

22B

35.4

104

22A

no22

A11

22B

22B

(�)

22B

5822

2222

A2

105

22no

1222

2259

2222

22A

222

A2

47.7

106

no22

B22

B13

2222

22A

222

A2

40.6

60no

no10

722

B22

B22

B22

B51

.814

22no

2261

no22

22A

210

822

A22

A22

A22

A48

.615

2222

6222

A22

A22

A22

A30

.410

922

no16

22A

(�)

no63

22no

110

2222

2222

47.5

1722

A(�

)no

6422

A22

A22

A22

A52

.511

122

A(�

)22

A22

A18

nono

6522

A22

A(�

)11

2no

2222

B19

nono

6622

A22

A22

A22

A11

322

2220

22A

22A

22A

22A

38.5

6722

22(�

)11

422

22am

b.am

b.38

.721

22A

22A

22A

222

A2

38.2

6822

2222

2268

.611

522

A22

A22

A22

A34

.722

2222

(�)

6922

(�)

2222

A2

116

no22

2223

2222

22A

222

A2

45.3

7022

(�)

2211

7no

2222

B24

22B

22B

22B

22B

32.3

7122

2211

822

/22A

22/2

2A25

2222

7222

2222

2211

9no

no26

22A

22A

22A

22A

54.5

73no

22A

22A

120

22B

22B

22B

22B

86.7

2722

A22

A(�

)74

22A

no22

A12

1no

2222

B28

2222

22A

22A

3875

2222

122

22A

22A

22A

22A

2922

2222

A2

22A

42.2

7622

B22

B22

B22

B43

123

22B

22B

22B

22B

77.9

30no

no77

2222

2222

25.2

124

22(�

)no

3122

2278

22A

(�)

22A

22A

22A

125

22A

no22

A32

22B

22B

22B

22B

37.5

79no

no12

622

22am

b.33

22B

(�)

22B

22B

80no

no12

722

B22

B22

B22

B92

.734

no22

8122

Ano

128

no22

amb.

3522

A22

A(�

)22

A82

22A

22(�

)22

A12

922

22B

3622

22am

b.am

b.59

.783

2222

2222

130

22B

22B

3722

no22

A2

8422

2213

1no

2222

B38

2222

85no

2222

A2

132

nono

39no

22B

22B

8622

Ano

22A

133

no22

4022

B22

B22

B22

B54

.487

2222

2222

56.1

134

2222

4122

22(�

)22

A2

22A

256

.688

22A

22A

22A

22A

41.1

135

nono

42no

no89

22no

136

no22

4322

Ano

22A

90no

22B

137

22no

4422

Ano

22A

9122

B22

22B

138

22no

4522

A22

A22

A22

A68

9222

2222

2252

.813

9no

2246

2222

9322

A22

A22

A22

A52

.647

22A

22A

22A

9422

22A

22A

22A

70.6

aN

umbe

rsin

the

first

colu

mn

corr

espo

ndto

thos

ein

Tab

le1.

Ann

otat

ions

of5�

and

3�L

TR

sac

cord

ing

toR

epea

tMas

ker

and

toou

row

nan

alys

isar

ein

dica

ted.

Diff

eren

ces

betw

een

Rep

eatM

aske

ran

dou

row

nan

nota

tions

are

indi

cate

din

bold

.no,

the

corr

espo

ndin

gL

TR

was

mis

sing

;(�

),L

TR

sha

ving

reve

rse

com

plem

ent

orie

ntat

ions

asan

nota

ted

byR

epea

tMas

ker.

Our

own

anno

tatio

nsin

clud

eL

TR

sfo

rw

hich

subf

amily

assi

gnm

ent

was

poss

ible

.Her

e,“a

mb.

”in

dica

tes

that

ambi

guou

sL

TR

sco

uld

not

beas

sign

edto

apa

rtic

ular

fam

ily;t

hat

is,t

hey

grou

ped

phyl

ogen

etic

ally

betw

een

the

LT

R22

and

LT

R22

Bsu

bfam

ilies

(dat

ano

tsh

own)

.T

heev

olut

iona

ryag

esof

prov

irus

esin

mill

ions

ofye

ars

acco

rdin

gto

Kim

ura-

2-pa

ram

eter

corr

ecte

ddi

stan

ces

are

liste

dfo

rpr

ovir

uses

with

suffi

cien

t5�

and

3�L

TR

port

ions

.

8794 LAVIE ET AL. J. VIROL.

Page 9: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

quenced, in total, 16 clones. The sequences were included inthe neighbor-joining analysis presented in Fig. 2. The nonhu-man sequences grouped among the human sequences; that is,they did not form separate branches or groups. Also, the per-centage of identity with the human consensus sequence wasvery similar: 82 to 91% compared to 78 to 92% for the humansequences. Therefore, our sample sequence data set does notindicate different evolutionary behavior of HERV-K(HML-5)homologous sequences in the examined nonhuman primates.However, more elaborate studies will be required to charac-terize precise evolutionary behavior of HERV-K(HML-5) ho-mologues in primates after their evolutionary separation fromthe human lineage.

DISCUSSION

HERV represent former exogenous retroviruses that arefossilized in the human genome. Closer examination of HERVsequences provides information on ancient primate-targetingretroviruses and retrovirus evolution in general. The com-pleted sequence of the human genome provides an excellentsource of information for such studies. In this paper, we set outto analyze a hitherto little-characterized HERV family,HERV-K(HML-5), in more detail. We found that only 9 out ofthe approximately 139 HML-5 loci (6%) displayed full-lengthretroviral genes flanked by LTRs on both sides. A highernumber of HML-5 loci are defective regarding proviral struc-tures; gag-prt and/or env regions are missing in about 50% ofproviruses, and the start and end points of deletions obviouslycluster within defined regions. Analogous observations wererecently made for HERV-K(HML-3), displaying deletionswithin gag and pol (26). Amplification of deleted proviruses hasbeen observed also for other HERV families, for example,deleted HERV-H elements have increased to high copy num-bers during primate evolution in comparison to full-lengthHERV-H proviruses (24). Recombination on the DNA level,mutations during retroviral reverse transcription, or splicing ofretroviral transcripts before reverse transcription and provirusformation could account for such deleted proviruses. Splicedand reintegrated transcripts have been observed for HERV(12, 21). Spliced human immunodeficiency virus type 1 tran-scripts were also found as cDNA along with full-length retro-viral RNA during infection (20), further supporting provirusformation from spliced proviral transcripts in the evolutionarypast. Besides active amplification of proviral sequences, about5% of proviral loci probably arose passively from chromosomalduplications, owing to the fact that the human genome com-prises about 5% of duplicated sequences (14). In contrast, wedo not find evidence that HML-5 sequences were amplified byL1-mediated pseudogene formation, as recently described forHERV-W (6, 37).

HML-5 loci with both gag-prt and env deletions probablyemerged during reverse transcription by recombination be-tween RNA transcripts from proviruses having either deletion.In combination with HERV-K(HML-3) deletion variants (26),we note that despite the lack of one or more proviral regions,the 5� intergenic and gag portions are usually present in obvi-ously (retrovirus-like) retrotransposed proviruses. Those ret-roviral regions are known to encode the packaging signal �that interacts on the RNA level with the Gag-encoded nucleo-

capsid (NC) protein. One may therefore hypothesize that in-teraction between retroviral RNA and NC is still essentialduring intracellular (retrovirus-like) retrotransposition ofHERV. Alternatively, helper viruses could have been involvedin the formation of new proviruses, and a packaging signalcould have interacted with the helper virus’ NC protein. Alsoin combination with previous results (9, 26), deletions withinthe env gene seem to occur recurrently, rendering the Envprotein nonfunctional. Env-demolishing mutations could haveresulted in decreasing production of infectious retrovirus. Suchdecrease could have significantly added to the fading of exog-enous stages, and env deletions may therefore represent an-other cause for the extinction of exogenous stages.

Our study shows that HML-5 sequences were fixed in anancestral genome after the simian lineage had evolutionaryseparated from the prosimian lineage but before the evolution-ary separation of Old World and New World primates, indi-cated by provirus ages of approximately 55 million years andcorroborated by PCR examination of various primate species.Other HERV-K superfamily members were fixed in an ances-tral genome approximately 30 to 40 million years ago, after theevolutionary split of Old World from New World primates (26,28, 34, 39). To the best of our knowledge, HML-5 representsthe oldest betaretrovirus in the primate genome known to date.Despite a relatively high proportion of incomplete HML-5proviruses in the human genome, our study generated a con-sensus sequence displaying the four major retroviral ORFs gag,prt, pol, and env. The corresponding proteins displayed signif-icant similarities to other betaretroviruses, and the recreatedHML-5 consensus is therefore expected to be very close insequence to the former exogenous precursor to HML-5 en-dogenous variants. Thus, this study reconstructed an ancientbetaretrovirus that was targeting primates about 55 millionyears ago.

Phylogenetic analysis of several proviral regions indicatedsimilar sequence divergence among HML-5 sequences, result-ing in almost monophyletic tree structures. This finding indi-cates that probably all HML-5 sequences were generated in arelatively brief evolutionary period around 55 million yearsago, the latter as revealed by LTR-LTR divergences and di-vergence from the consensus sequence. Formation of new pro-viruses then obviously ceased. Proviruses with deletions in gag-prt and env were probably also generated in a relatively shortperiod of time. This observation is further confirmed by PCRanalysis that revealed gag and env deletions in all tested HML-5-positive primate species. Thus, different HML-5 “masterproviruses” with different genome structures were obviouslyactive and generated provirus progeny. In this manner, HML-5displays similar behavior as HERV-K(HML-3) (26). However,both families display different behavior than HERV-K(HML-2), which formed proviruses during the hominoid period aswell as in recent human evolution (5, 7, 28, 33).

Phylogenetic analysis of LTR sequences associated with pro-viral bodies revealed several apparent LTR families. However,phylogenetic analysis of proviral body sequences yieldedmonophyletic tree topologies for most examined regions. Onlya region between reverse transcriptase and RNase H displayedbranches with higher bootstrap support. Clearly, proviral bod-ies appear much more homogeneous in sequence than theassociated LTR sequences. Thus, almost homogeneous provi-

VOL. 78, 2004 CHARACTERIZATION OF HERV-K(HML-5) 8795

Page 10: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

FIG. 3. Neighbor-joining analysis of 134 reasonably intact HERV-K(HML-5) LTR sequences. Shown here is a consensus tree from 1,000bootstraps replicates. Specific bootstrap values at major nodes are indicated. Consensus sequences, as given in Repbase (Rep) and as generatedin this study (own), were also included in the analysis. LTR22 and LTR22A sequences are clearly distinguished, with the latter comprising theso-called LTR22A2 group. Other sequences belong to the LTR22B group with some ambiguous sequences remaining (Table 2). Note that the

8796

Page 11: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

ral bodies were associated with clearly different LTR variantsat the time of provirus formations. It is currently not knownwhether the different LTR variants were already present in theexogenous precursor(s) or represent derivatives from a germline-fixed LTR founder family. In both cases, LTR sequencesevolved in sequence independently from, and obviously more

rapidly than, the proviral bodies. Reasons for apparently dif-ferent evolutionary rates of LTRs and proviral bodies are cur-rently not clear.

Whether HML-5 is ancestral to other HERV-K familiescurrently seems unclear. Both the HML-5 and HML-6 familiesappear less related to the remaining HERV-K families, asevidenced by DNA sequence comparisons (32) and by phylo-genetic comparison of dUTPase domains (30), for instance. In

FIG. 4. Neighbor-joining analysis of HERV-K(HML-5) proviruses.The tree shown here was obtained from the analysis of a gag geneportion (region 1 in Fig. 1) and is a consensus tree from 100 bootstrapreplicates. Numbers of proviral sequences refer to Table 1. Phyloge-netic analysis included HERV-K(HML-5) homologous sequencesfrom the same gag region, obtained by PCR from various primatespecies (for species abbreviations, see the legend for Fig. 5).

FIG. 5. Presence of HERV-K(HML-5) homologous sequences invarious primate species. The schematic depicts localization of PCRprimer pairs within the proviral body 5� end and within the env generegion of the HERV-K(HML-5) consensus sequence, yielding PCRproducts of about 548 bp and about 1,635 or 300 bp. The latter numberconsiders the common presence of deleted env gene regions. PCRresults are presented in the first and second panel, respectively. Am-plification of full-length env genes in all species but the prosimiancould not be reproduced satisfactorily. A schematic of primate phy-logeny and germ line fixation of HERV-K(HML-5) versus otherHERV-K families is shown at the bottom. Species abbreviations are asfollows: for Hominoidea, Hsa is Homo sapiens, Ptr is Pan troglodytes,Ppy is Pongo pygmaeus, and Hla is Hylobates lar; for Old World pri-mates, Cgu is Colobus guereza, Msp is Mandrillus sphinx, and Mmu isMacaca mulatta; for New World primates, Ssc is Saimiri sciureus, Cjais Callithrix jacchus, and Ase is Alouatta seniculus; for prosimians, Nycis Nycticebus coucang.

LTR22 Repbase sequence groups with LTR22B sequences, opposed to LTR22own. Numbers of LTR sequences refer to the proviruses listed inTable 1. The 5� and 3� LTRs of a particular provirus located at immediately neighboring terminal nodes are indicated as 5/3. Note in this contextthat 5� and 3� LTRs of particular proviruses do not always group together, reminiscent of recently suggested HERV-involving genomicrearrangement events (13).

VOL. 78, 2004 CHARACTERIZATION OF HERV-K(HML-5) 8797

Page 12: Human endogenous retrovirus family HERV-K(HML-5): · PDF fileWe deposited the updated HERV-K(HML-5)/HERVK22 and LTR22 consensus sequences in Repbase. ... generated schematic structures

addition, this study revealed that the HML-5 consensus se-quence primer binding site is identical to methionine butclearly less similar to lysine tRNA 3� ends; therefore, it isactually a HERV-M family and adds to the lesser phylogeneticrelationship with HERV-K. However, HML-5 is still morerelated to HERV-K than to other (beta)retroviruses. Also inthe course of this study, when employing an updated tRNAsequence data set (22) we noted that the previously character-ized HERV-K(HML-3) family primer binding site region (26)is more related to arginine and asparagine than to lysine tRNA3� ends, therefore actually requiring designation HERV-R orHERV-N. At the present time, the precise phylogenetic rela-tionship of HML-5 to other HERV-Ks as well as endogenousand exogenous retroviruses must await further specific inves-tigations. We are currently studying the remaining HERV-Kfamilies in a similar fashion. Certainly, results for other hith-erto little-described HERV families will significantly contrib-ute to such studies.

ACKNOWLEDGMENTS

This work was supported by grants from the Deutsche Forschungs-gemeinschaft to J.M. (Ma2298/2-1) and E.M. (Me917/16-1). P.M. issupported by a fellowship from the Knut and Alice Wallenberg Foun-dation and grants from the Swedish Research Council, Ake WibergFoundation, and Magn Bergvall Foundation. W.S. is supported byDFG grant SCH214/7-3.

REFERENCES

1. Andersson, M. L., M. Lindeskog, P. Medstrand, B. Westley, F. May, and J.Blomberg. 1999. Diversity of human endogenous retrovirus class II-like se-quences. J. Gen. Virol. 80:255–260.

2. Berkhout, B., M. Jebbink, and J. Zsiros. 1999. Identification of an activereverse transcriptase enzyme encoded by a human endogenous HERV-Kretrovirus. J. Virol. 73:2365–2375.

3. Blond, J. L., D. Lavillette, V. Cheynet, O. Bouton, G. Oriol, S. Chapel-Fernandes, B. Mandrand, F. Mallet, and F. L. Cosset. 2000. An envelopeglycoprotein of the human endogenous retrovirus HERV-W is expressed inthe human placenta and fuses cells expressing the type D mammalian ret-rovirus receptor. J. Virol. 74:3321–3329.

4. Boissinot, S., and A. V. Furano. 2001. Adaptive evolution in LINE-1 retro-transposons. Mol. Biol. Evol. 18:2186–2194.

5. Buzdin, A., S. Ustyugova, K. Khodosevich, I. Mamedov, Y. Lebedev, G.Hunsmann, and E. Sverdlov. 2003. Human-specific subfamilies of HERV-K(HML-2) long terminal repeats: three master genes were active simulta-neously during branching of hominoid lineages. Genomics 81:149–156.

6. Costas, J. 2002. Characterization of the intragenomic spread of the humanendogenous retrovirus family HERV-W. Mol. Biol. Evol. 19:526–533.

7. Costas, J. 2001. Evolutionary dynamics of the human endogenous retrovirusfamily HERV-K inferred from full-length proviral genomes. J. Mol. Evol.53:237–243.

8. Dangel, A. W., B. J. Baker, A. R. Mendoza, and C. Y. Yu. 1995. Complementcomponent C4 gene intron 9 as a phylogenetic marker for primates: longterminal repeats of the endogenous retrovirus ERV-K(C4) are a molecularclock of evolution. Immunogenetics 42:41–52.

9. de Parseval, N., V. Lazar, J. F. Casella, L. Benit, and T. Heidmann. 2003.Survey of human genes of retroviral origin: identification and transcriptomeof the genes with coding capacity for complete envelope proteins. J. Virol.77:10414–10422.

10. Felsenstein, J. 1993. PHYLIP (Phylogeny Inference Package) version 3.5c.Department of Genetics SK-50, University of Washington, Seattle.

11. Gifford, R., and M. Tristem. 2003. The evolution, distribution and diversityof endogenous retroviruses. Virus Genes 26:291–315.

12. Goodchild, N. L., J. D. Freeman, and D. L. Mager. 1995. Spliced HERV-Hendogenous retroviral sequences in human genomic DNA: evidence foramplification via retrotransposition. Virology 206:164–173.

13. Hughes, J. F., and J. M. Coffin. 2001. Evidence for genomic rearrangementsmediated by human endogenous retroviruses during primate evolution. Nat.Genet. 29:487–489.

14. International Human Genome Sequencing Consortium. 2001. Initial se-quencing and analysis of the human genome. Nature 409:860–921.

15. Jurka, J. 2000. Repbase update: a database and an electronic journal ofrepetitive elements. Trends Genet. 16:418–420.

16. Kapitonov, V., and J. Jurka. 1996. The age of Alu subfamilies. J. Mol. Evol.42:59–65.

17. Kimura, M. 1980. A simple method for estimating evolutionary rates of basesubstitutions through comparative studies of nucleotide sequences. J. Mol.Evol. 16:111–120.

18. Kitamura, Y., T. Ayukawa, T. Ishikawa, T. Kanda, and K. Yoshiike. 1996.Human endogenous retrovirus K10 encodes a functional integrase. J. Virol.70:3302–3306.

19. Lebedev, Y. B., O. S. Belonovitch, N. V. Zybrova, P. P. Khil, S. G. Kurdyukov,T. V. Vinogradova, G. Hunsmann, and E. D. Sverdlov. 2000. Differences inHERV-K LTR insertions in orthologous loci of humans and great apes.Gene 247:265–277.

20. Liang, C., J. Hu, R. S. Russell, M. Kameoka, and M. A. Wainberg. 2004.Spliced human immunodeficiency virus type 1 RNA is reverse transcribedinto cDNA within infected cells. AIDS Res. Hum. Retrovir. 20:203–211.

21. Lindeskog, M., and J. Blomberg. 1997. Spliced human endogenous retroviralHERV-H env transcripts in T-cell leukaemia cell lines and normal leuko-cytes: alternative splicing pattern of HERV-H transcripts. J. Gen. Virol.78:2575–2585.

22. Lowe, T. M., and S. R. Eddy. 1997. tRNAscan-SE: a program for improveddetection of transfer RNA genes in genomic sequence. Nucleic Acids Res.25:955–964.

23. Lower, R., J. Lower, and R. Kurth. 1996. The viruses in all of us: character-istics and biological significance of human endogenous retrovirus sequences.Proc. Natl. Acad. Sci. USA 93:5177–5184.

24. Mager, D. L., and J. D. Freeman. 1995. HERV-H endogenous retroviruses:presence in the New World branch but amplification in the Old Worldprimate lineage. Virology 213:395–404.

25. Mager, D. L., and P. Medstrand. 2003. Retroviral repeat sequences. In D.Cooper (ed.), Nature encyclopedia of the human genome. Macmillan Pub-lishers Ltd., Hampshire, England.

26. Mayer, J., and E. Meese. 2002. The human endogenous retrovirus familyHERV-K(HML-3). Genomics 80:331–343.

27. Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1997. Chromosomal assign-ment of human endogenous retrovirus K (HERV-K) env open readingframes. Cytogenet. Cell Genet. 79:157–161.

28. Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1998. Human endogenousretrovirus K homologous sequences and their coding capacity in Old Worldprimates. J. Virol. 72:1870–1875.

29. Mayer, J., E. Meese, and N. Mueller-Lantzsch. 1997. Multiple human en-dogenous retrovirus (HERV-K) loci with gag open reading frames in thehuman genome. Cytogenet. Cell Genet. 78:1–5.

30. Mayer, J., and E. U. Meese. 2003. Presence of dUTPase in the varioushuman endogenous retrovirus K (HERV-K) families. J. Mol. Evol. 57:642–649.

31. Mayer, J., M. Sauter, A. Racz, D. Scherer, N. Mueller-Lantzsch, and E.Meese. 1999. An almost-intact human endogenous retrovirus K on humanchromosome 7. Nat. Genet. 21:257–258.

32. Medstrand, P., and J. Blomberg. 1993. Characterization of novel reversetranscriptase encoding human endogenous retroviral sequences similar totype A and type B retroviruses: differential transcription in normal humantissues. J. Virol. 67:6778–6787.

33. Medstrand, P., and D. L. Mager. 1998. Human-specific integrations of theHERV-K endogenous retrovirus family. J. Virol. 72:9782–9787.

34. Medstrand, P., D. L. Mager, H. Yin, U. Dietrich, and J. Blomberg. 1997.Structure and genomic organization of a novel human endogenous retrovirusfamily: HERV-K (HML-6). J. Gen. Virol. 78:1731–1744.

35. Morgenstern, B. 1999. DIALIGN 2: improvement of the segment-to-seg-ment approach to multiple sequence alignment. Bioinformatics 15:211–218.

36. Mueller-Lantzsch, N., M. Sauter, A. Weiskircher, K. Kramer, B. Best, M.Buck, and F. Grasser. 1993. Human endogenous retroviral element K10(HERV-K10) encodes a full-length gag homologous 73-kDa protein and afunctional protease. AIDS Res. Hum. Retrovir. 9:343–350.

37. Pavlicek, A., J. Paces, D. Elleder, and J. Hejnar. 2002. Processed pseudo-genes of human endogenous retroviruses generated by LINEs: their integra-tion, stability, and distribution. Genome Res. 12:391–399.

38. Reus, K., J. Mayer, M. Sauter, H. Zischler, N. Muller-Lantzsch, and E.Meese. 2001. HERV-K(OLD): ancestor sequences of the human endoge-nous retrovirus family HERV-K(HML-2). J. Virol. 75:8917–8926.

39. Seifarth, W., C. Baust, A. Murr, H. Skladny, F. Krieg-Schneider, J. Blusch,T. Werner, R. Hehlmann, and C. Leib-Mosch. 1998. Proviral structure,chromosomal location, and expression of HERV-K-T47D, a novel humanendogenous retrovirus derived from T47D particles. J. Virol. 72:8384–8391.

40. Smit, A. F. A. 1996. Structure and evolution of mammalian interspersedrepeats. Ph.D. dissertation. University of Southern California, Los Angeles.

41. Sverdlov, E. D. 2000. Retroviruses and primate evolution. Bioessays 22:161–171.

42. Thompson, J. D., D. G. Higgins, and T. J. Gibson. 1994. CLUSTAL W:improving the sensitivity of progressive multiple sequence alignment throughsequence weighting, position-specific gap penalties and weight matrix choice.Nucleic Acids Res. 22:4673–4680.

43. Tristem, M. 2000. Identification and characterization of novel human en-dogenous retrovirus families by phylogenetic screening of the human ge-nome mapping project database. J. Virol. 74:3715–3730.

8798 LAVIE ET AL. J. VIROL.