Proc. Natl. Acad. Sci. USA
Vol. 90, pp. 10539-10543, November 1993
Microbiology

RNA sequence of astrovirus: Distinctive genomic organization and a putative retrovirus-like ribosomal frameshifting signal that directs the viral replicase synthesis

BAOMING JIANG*‡, STEPHAN S. MONROE*, EUGENE V. KOONIN†, SARAH E. STINE*, AND ROGER I. GLASS*

*Viral Gastroenteritis Section, Centers for Disease Control and Prevention, Atlanta, GA 30333; and †National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894

Communicated by Bernard Fields, July 27, 1993

ABSTRACT The genomic RNA of human astrovirus was sequenced and found to contain 6797 nt organized into three open reading frames (1a, 1b, and 2). A potential ribosomal frameshift site identified in the overlap region of open reading frames 1a and 1b consists of a "shifty" heptanucleotide and an RNA stem-loop structure that closely resemble those at the gag-pro junction of some retroviruses. This translation frameshift may result in the suppression of in-frame amber termination at the end of open reading frame 1a and the synthesis of a nonstructural, fusion polyprotein that contains the putative protease and RNA-dependent RNA polymerase. Comparative sequence analysis indicated that the protease and polymerase of astrovirus are only distantly related to the respective enzymes of other positive-strand RNA viruses. The astrovirus polyprotein lacks the RNA helicase domain typical of other positive-strand RNA viruses of similar genome size. The genomic organization and expression strategy of astrovirus, with the protease and the polymerase brought together by predicted frameshift, most closely resembled those of plant luteoviruses. Specific features of the sequence and genomic organization support the classification of astroviruses as an additional family of positive-strand RNA viruses, designated Astroviridae.

Astroviruses were originally identified from the feces of infants with gastroenteritis on the basis of distinctive ultrastructural features: five- or six-pointed surface stars are characteristic of this agent (1). These nonenveloped agents were subsequently determined to be positive-strand RNA viruses (2, 3). Five serotypes have been defined to date based on their distinct antigenicity (4). Astroviruses cause acute gastroenteritis in children and adults worldwide (5), but the disease burden has been difficult to determine because of the lack of sensitive diagnostic assays. Recent studies have shown that astroviruses are more frequently found in children with diarrhea than was previously thought (6). In addition, astroviruses have been detected in the diarrheal feces of various animals (5).

Studies of the biochemical properties of purified particles have provided divergent results on the number and size of proteins in astroviruses; two to six polypeptides have been reported, ranging in size from 5.5 kDa to 42 kDa (5). Although the fastidious growth of astroviruses in vitro has hindered characterization of the genome, several investigators (3, 7) have reported partial sequence information from both internal regions and the 3' end of human astrovirus serotype 1 (H-Ast1), and we have recently sequenced and characterized the subgenomic RNA of serotype 2 (H-Ast2; ref. 8). However, the complete sequence and the genomic organization of astroviruses remained unknown, and their classification was tentative.

In the present study, we have sequenced and analyzed the entire genomic RNA of H-Ast2§ and compared the sequence and genomic structure with those of other positive-strand RNA viruses. The results highlight the specific genomic organization of astroviruses and support their classification in a separate virus family.

MATERIALS AND METHODS

Cells and Virus. LLCMK2 cells (ATCC CCL 7.1) were propagated in Earle's minimal essential medium (MEM) supplemented with antibiotics and 10% fetal bovine serum. H-Ast2 was obtained from John Kurtz (Oxford, England) and used to infect LLCMK2 cells in MEM/trypsin at 5 µg/ml as described (2). Virions were partially purified from infected cell lysates by centrifuging through a 30% (wt/vol) sucrose cushion, suspension in TNE buffer [0.05 M Tris (pH 7.5)/0.1 M NaCl/5 mM EDTA]/1% SDS, and extraction with phenol/chloroform. Virion RNA was precipitated with 2 M LiCl and used for both sequencing and PCR assays.

cDNA Synthesis and Sequencing. Single-stranded cDNA was synthesized from virion RNA with Super reverse transcriptase (Molecular Genetics Resources, Tampa, FL) by using primers derived originally from cDNA sequence (8) and subsequently from sequences determined by directly sequencing virion RNA, using a "primer walking" technique. DNA fragments of various length were amplified by the PCR assay with Taq polymerase (Perkin-Elmer) and virus-specific primers. Sequences were determined from three sources: virion RNA, PCR DNA, and cDNA clones (8). Virion RNA was directly sequenced by using an RNA sequencing kit (Boehringer Mannheim). Both the PCR DNA and the cloned cDNA were sequenced by using the Sequenase version 2.0 DNA sequencing kit (United States Biochemical). Sequences on both strands of DNA were determined with each base sequenced at least four times. Sequences were assembled and aligned by using the Genetics Computer Group sequence-analysis package (9), and a consensus sequence was derived.

Sequences of the 5' and 3' ends of the genomic RNA were determined by following the procedure of Lambden et al. (10). Briefly, a synthetic primer 1 was ligated to the 3' ends of virion RNA or cDNA corresponding to the 5' end of virion RNA with T4 RNA ligase (GIBCO/BRL). cDNA fragments (400-600 bp) spanning either the 5' or the 3' ends were produced by PCR amplification using a primer 2 complemen-

Abbreviations: H-Ast2, human astrovirus serotype 2; RdRp, RNA-dependent RNA polymerase; NLS, nuclear localization signal; ORF, open reading frame; RHDV, rabbit hemorrhagic disease virus.
‡To whom reprint requests should be addressed at: Mailstop G04, Centers for Disease Control and Prevention, 1600 Clifton Road, Atlanta, GA 30333.
§The sequence reported in this paper has been deposited in the GenBank data base (accession no. L13745).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.



[Figure 1 graphic. Labels recovered from the extracted text: genome, 5' to 3' poly(A), 6797 nt; subgenomic RNA, nt 4314-6797; ORF 1b, nt 2773-4329 (Pol); position 2842; features MB, Pro, MB, RFS, and NLS; x axis: Nucleotides (ticks at 2,000 to 6,000).]

FIG. 1. Genomic organization of human astrovirus. The locations of three ORFs, predicted transmembrane helices (MB), protease (Pro), nuclear localization signal (NLS), ribosomal frameshift structure (RFS), and RNA-dependent RNA polymerase (Pol) are indicated. ORFs 1a and 1b encode a putative nonstructural polyprotein, and ORF 2 codes for a capsid-protein precursor.
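The coordinates shown in Fig. 1 can be cross-checked against the lengths and overlaps quoted in the Results. The following is a minimal sketch, not the authors' procedure. It assumes that the figure labels (2842, 2773-4329, 4325-6712) are 1-based positions of the first and last coding nucleotides with stop codons excluded, that ORF 1a starts at nt 83 (it is preceded by 82 untranslated nucleotides), and that the last codon read in the ORF 1a frame before the shift ends at nt 2794. The function names and the frame-numbering convention are hypothetical choices made for illustration.

```python
GENOME_LENGTH = 6797

# 1-based inclusive coordinates, stop codon excluded (assumed convention).
ORFS = {"1a": (83, 2842), "1b": (2773, 4329), "2": (4325, 6712)}


def length_nt(orf):
    """Number of coding nucleotides in an ORF."""
    start, end = ORFS[orf]
    return end - start + 1


def overlap_nt(first, second):
    """Number of nucleotides shared by two ORFs (0 if they do not overlap)."""
    (s1, e1), (s2, e2) = ORFS[first], ORFS[second]
    return max(0, min(e1, e2) - max(s1, s2) + 1)


def frame_number(orf):
    """Plus-strand frame (1, 2, or 3) counted from genome position 1."""
    return (ORFS[orf][0] - 1) % 3 + 1


def fusion_length_aa(last_zero_frame_nt=2794):
    """Length of the 1a/1b fusion if nt 2794 is re-read after a -1 shift."""
    upstream = (last_zero_frame_nt - ORFS["1a"][0] + 1) // 3
    downstream = (ORFS["1b"][1] - last_zero_frame_nt + 1) // 3
    return upstream + downstream


def trailing_nt_after_orf2_stop():
    """Nucleotides between the ORF 2 stop codon and the poly(A) tail."""
    return GENOME_LENGTH - (ORFS["2"][1] + 3)


print(length_nt("1a") // 3, length_nt("2") // 3)           # codons in ORFs 1a and 2
print(overlap_nt("1a", "1b"), overlap_nt("1b", "2"))       # overlaps in nt
print(fusion_length_aa(), trailing_nt_after_orf2_stop())
```

Under these assumptions the sketch gives 920 and 796 codons for ORFs 1a and 2, overlaps of 70 and 5 nt, a 1416-aa fusion product, and 82 nt after the ORF 2 stop codon, which agree with the values stated in the text; the agreement supports, but does not prove, the assumed reading of the figure labels.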

tary to the primer 1 and virus-specific primers and were sequenced by using internal primers.

Comparative Sequence Analysis. Both nucleotide and deduced amino acid sequences were compared by using the BLAST program (11) and the BLOSUM62 matrix (12). Multiple alignments were done by using the OPTAL or MACAW programs (13, 14). A phylogenetic tree was constructed by using clustering unweighted pairwise group maximum averages (UPGMA), neighbor-joining, least-square (Fitch-Margoliash), and protein-parsimony algorithms as implemented in the PHYLIP package (15).
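Of the tree-building methods named above, UPGMA is the simplest to illustrate. The sketch below is a plain-Python toy implementation of average-linkage clustering, not the PHYLIP code used in the study; the function name `upgma`, the taxon labels, and the distance matrix are hypothetical and do not come from the paper's data.

```python
from itertools import combinations


def upgma(names, matrix):
    """Return a nested-tuple tree built by UPGMA (average linkage).

    names  -- list of taxon labels
    matrix -- symmetric list of lists of pairwise distances
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances keyed by (smaller id, larger id).
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two clusters separated by the smallest distance.
        i, j = min(dist, key=dist.get)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        # UPGMA update: the distance from the merged cluster to each other
        # cluster is the size-weighted mean of the two old distances.
        for k in clusters:
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del dist[(i, j)]
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    (tree, _), = clusters.values()
    return tree


# Toy distances (hypothetical): A and B are closest, D is the outlier.
toy_names = ["A", "B", "C", "D"]
toy_matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(toy_names, toy_matrix))
```

UPGMA assumes a constant evolutionary rate across lineages, which is one reason a study would compare its result with neighbor-joining, Fitch-Margoliash, and parsimony trees, as done here.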

RESULTS AND DISCUSSION

The genomic RNA of H-Ast2 is 6797 nt in length, excluding 31 adenines [poly(A) tail] at the 3' end. The genome possesses three overlapping open reading frames (ORFs 1a, 1b, and 2; Fig. 1). The sequences surrounding the first AUG codons of ORFs 1a and 2 are predicted to be optimal for initiating translation (16). ORF 1a is preceded by 82 untranslated nucleotides and encodes a polypeptide of 920 aa. Interestingly, ORF 1b, which overlaps ORF 1a by 70 nt, is in reading frame +1, and its first AUG codon, which is predicted to be weak, is located 380 nt downstream of the ORF 1a termination codon. ORF 2, present also in the subgenomic RNA, overlaps ORF 1b by 5 nt, begins with an initiation codon at nt 4325, and ends with a stop codon 82 bases from the 3' end. ORF 2 codes for a capsid-protein precursor of 796 aa with a predicted molecular mass of 88 kDa (8).

The existence of two separate ORFs (1a and 1b) located in two different reading frames prompted us to examine the 70-nt overlap region in more detail. A potential ribosomal frameshift signal was identified, consisting of the "shifty" heptanucleotide (AAAAAAC) from position 2791 to 2797, followed by a stem-loop structure that may form a pseudoknot with a downstream sequence (Fig. 2A). The putative frameshift signal of the astrovirus showed a striking resemblance to those at the gag-pro junction of some retroviruses, such as mouse mammary tumor virus (Fig. 2B), and fit perfectly the simultaneous tRNA slippage model of -1 frameshifting described for the synthesis of the gag-related polyproteins (18). Ribosomal frameshifting recently has been shown to be a normal expression mechanism in several groups of positive-strand RNA viruses, namely, animal coronaviruses and arteriviruses, and plant luteoviruses and dianthoviruses (19-22). However, the putative frameshifting signal of astrovirus was much less similar to the frameshift regions of these viruses than to those of some retroviruses (data not shown). The ribosomal frameshifting during translation of astrovirus RNA probably directs the synthesis of an ORF 1a/1b fusion nonstructural polyprotein of 1416 aa with a predicted molecular mass of 161 kDa.

The nucleotide sequence of the astrovirus genomic RNA and the deduced amino acid sequences of the nonstructural polyprotein and the capsid protein of H-Ast2 were compared with partial sequences available for H-Ast1. Between serotypes, the ribosomal frameshifting region was completely conserved (data not shown), and the amino acid sequence was highly conserved (>90% identical) in a portion of the nonstructural polyproteins¶ (3) but was less conserved (51-56% identical) in the predicted capsid-protein regions (3, 7). Of interest, a region in the C-terminal portion of the nonstructural polyprotein was significantly similar to the putative RNA-dependent RNA polymerases (RdRps) of plant bymoviruses (P = 0.015) and potyviruses (P = 0.095). This region contained the eight conserved motifs typical of the positive-strand RNA virus RdRps, indicating that it belongs to the so-called supergroup I, which includes the polymerases of picornaviruses, caliciviruses, potyviruses, and several other groups of plant viruses (Fig. 3A; refs. 29 and 30).

Comparison of the protein sequence of astrovirus with a data base of sequences of other positive-strand RNA viruses identified a region of similarity with RHDV that included the putative catalytic cysteine of the RHDV protease. Using the previously published alignments of chymotrypsin-related proteases of positive-strand RNA viruses, we identified, in the putative protease domain of astrovirus, the conserved segments surrounding the three catalytic amino acid residues and a fourth distal segment implicated in substrate binding (Fig. 3B; ref. 31). In addition, sequences of the putative proteases of H-Ast2, RHDV, and feline calicivirus were

¶Willcocks, M. M. & Carter, M. J., Third International Symposium on Positive Strand RNA Viruses, Sept. 19-24, 1992, Clearwater, FL, pp. 2-47 (abstr.).

[Figure 1 graphic, continued. Labels recovered from the extracted text: ORF 1a; ORF 2, nt 4325-6712; x-axis ticks at 0 and 1,000.]


[Figure 2 graphic. (A) Astrovirus, nt 2790-2860: GCCCCAAAAAACUACAAA followed by a stem-loop (the structure is not recoverable from the extracted text). Deduced amino acids: ORF 1a, A P K N Y K; ORF 1b, P K K L Q; 1a-1b, A P K K L Q. (B) MMTV: AAUUCAAAAAACUUG followed by a stem-loop. Deduced amino acids: gag, N S K N L *; pro, F K K L L; gag-pro, N S K K L L.]

aligned. For a 118-aa residue overlap, an adjusted alignment score of 5.4 SDs above random expectation with an evolutionary distance of 214 was observed, values indicating a genuine evolutionary and functional relationship, given that additional evidence (e.g., conservation of specific functional motifs) is available (32).

An important feature of the putative protease of H-Ast2 is the substitution of serine for the catalytic cysteine found in most positive-strand RNA-virus proteases of superfamily I. Previously, an analogous substitution was found in the putative proteases of sobemoviruses, luteoviruses, and arteriviruses (Fig. 3B; refs. 20, 31, 33, 34). However, the putative protease of H-Ast2 showed less similarity to these viral proteases than to the cysteine proteases of caliciviruses.

An extensive search of the astrovirus nonstructural polyprotein sequence for motifs defining other conserved domains of positive-strand RNA viruses failed to identify candidate regions for an RNA helicase, methyltransferase, or papain-like protease (35-37). Absence of the helicase domain is remarkable because this domain has always been identified in positive-strand RNA viruses with genomes >6000 nt (38). Absence of the methyltransferase domain suggested that astrovirus may encode VPg, a viral protein covalently linked to the 5' end of the viral genome (39, 40), a conjecture compatible with the affinity of the putative H-Ast2 polymerase with supergroup I RdRps, which mostly belong to VPg-containing viruses (29, 30).

Additional features detected by computer analysis of the nonstructural polyprotein of H-Ast2 included four transmembrane α-helices and a NLS (Fig. 1). The transmembrane helices were located in the region upstream of the protease and may be involved in anchoring the viral RNA replication complex in the membrane, as described for the 3A or 3AB proteins of poliovirus (41). In all positive-strand RNA viruses for which the VPg domain has been localized, it is found within a short region between a (putative) transmembrane segment and the protease (E.V.K., unpublished data) and is linked to the 5' end of the viral RNA by a tyrosine or a serine

FIG. 2. The putative ribosomal frameshifting signal in the astrovirus genome. (A) Nucleotide sequence and predicted RNA secondary structure in the overlap region of astrovirus ORFs 1a and 1b. The putative frameshift site ("shifty" heptanucleotide sequence) is underlined, and the termination codon for ORF 1a is boxed. A potential pseudoknot structure was predicted by searching the region downstream of the stem-loop structure for sequences complementary to the loop sequence. Three base pairs may be sufficient for the pseudoknot formation (17), but the formation of a larger "secondary" stem with a noncanonical GA pair (shown by a dotted line) and two additional canonical base pairs is also possible. The deduced amino acid sequences of ORFs 1a, 1b, and 1a-1b surrounding the frameshift site are shown. (B) Nucleotide sequence and predicted RNA secondary structure in the gag-pro overlap region of mouse mammary tumor virus (MMTV) (18) are shown for comparison. The frameshift site, the termination codon, and the RNA pseudoknot are indicated or described as in A.
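The three reading-frame translations shown in Fig. 2A can be reproduced from the 18 nt around the heptanucleotide. The sketch below is illustrative only. It assumes that the stretch GCCCCAAAAAACUACAAA starts at nt 2786 (so that AAAAAAC occupies nt 2791-2797), and that the shift is modeled by re-reading the last nucleotide of the first AAA codon; that choice reproduces the 1a-1b sequence printed in the figure but is a simplification of the simultaneous-slippage model. The codon table covers only the codons present in this stretch, and the function names are hypothetical.

```python
# Only the codons that occur in the 18-nt example; not a full genetic code.
CODONS = {
    "GCC": "A", "CCA": "P", "CCC": "P", "AAA": "K",
    "AAC": "N", "UAC": "Y", "CUA": "L", "CAA": "Q",
}


def translate(rna, start=0):
    """Translate the complete codons of rna from 0-based index start."""
    return "".join(CODONS[rna[i:i + 3]] for i in range(start, len(rna) - 2, 3))


def minus_one_fusion(rna, slippery="AAAAAAC"):
    """Translate with a single -1 shift inside the slippery heptanucleotide.

    The heptanucleotide is read as X XXY YYZ: the 0-frame codon XXY is the
    last one decoded before the shift, and its final nucleotide is then
    re-read as the first nucleotide of the next codon.
    """
    site = rna.find(slippery)
    if site < 0:
        raise ValueError("slippery sequence not found")
    last = site + 4  # index just past the XXY codon
    upstream = translate(rna[:last], start=last % 3)
    downstream = translate(rna, start=last - 1)
    return upstream + downstream


segment = "GCCCCAAAAAACUACAAA"  # assumed to correspond to nt 2786-2803
print(translate(segment, 0))        # ORF 1a frame
print(translate(segment, 2))        # ORF 1b frame
print(minus_one_fusion(segment))    # 1a-1b fusion across the shift
```

With this input the three lines correspond to the ORF 1a, ORF 1b, and 1a-1b rows of the figure (APKNYK, PKKLQ, and APKKLQ).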

residue (39, 40). This region of the H-Ast2 polyprotein has no appropriately located tyrosines and has only one serine (Ser-420), suggesting that this serine may be the RNA-linking amino acid of VPg. The NLS, spanning aa 666-682, is identical to that of H-Ast1. This signal may be involved in transport of astrovirus proteins to the nucleus, as substantiated by the fact that astrovirus products were detected by immunofluorescence in the nucleus of bovine astrovirus-infected cells (42). The astrovirus NLS perfectly fits the consensus for the bipartite-signal motif comprising two clusters of basic amino acid residues separated by a 10-aa spacer region (43). In a curious analogy, both the protease and the RdRp of potyviruses contain similar NLSs and are accumulated in the nuclei of infected plant cells (44, 45).

Although the data base screening failed to detect other sequences significantly similar to the capsid protein of H-Ast2, direct comparison of this capsid sequence with the sequences of other positive-strand RNA virus-capsid proteins identified a conserved domain (from position 107 to 286) with hepatitis E virus (from position 159 to 337), an agent phylogenetically remote from astrovirus and other supergroup I viruses in terms of the comparison of RdRps and the other principal nonstructural domains (data not shown; ref. 46). Because both astrovirus and hepatitis E virus replicate in the human gut, this conserved domain might have resulted from a recombinational event during coinfection. Of interest, astrovirus-like particles have been reported (47) in association with fatal hepatitis in ducklings, suggesting a possible hepatic tropism for this virus.

To gain further insight into the evolutionary relationship of astroviruses, we generated a tentative phylogenetic tree (15) for the supergroup I RdRps, including the H-Ast2 sequence. The result showed that astroviruses constitute a distinct evolutionary lineage not closely associated with any other group of viruses (Fig. 4). We are inclined to interpret the relatively high similarity with the RdRps of bymoviruses and potyviruses (see above) as conservation of ancestral features rather than direct evidence of common origin.

Our data show that astrovirus has no close relatives among other viruses, as demonstrated by comparative sequence


[Figure 3 graphic. Multiple alignment blocks for (A) putative RNA-dependent RNA polymerase motifs I-III and V-VII and (B) putative chymotrypsin-like protease segments, with rows for H-Ast, BaYMV, TEV, FCV, RHDV, SRSV, PV, EMCV, FMDV, ECHO22, HAV, PLRV, and SBMV. The column layout of the aligned residues and of the inter-motif distances did not survive text extraction.]

FIG. 3. Amino acid sequence alignment of the predicted functional domains of astrovirus with related domains of other positive-strand RNA viruses. (A) Putative RNA-dependent RNA polymerases. The designation of the motifs has been described (23). The consensus shows amino acid residues that are conserved in at least 80% of the polymerases of supergroup I (23-25). U, bulky aliphatic residue (I, L, M, V); @, aromatic residue (F, Y, W); &, bulky hydrophobic residue (aliphatic or aromatic); and *, any residue. Residues conserved in the (putative) polymerases of all positive-strand RNA viruses of eukaryotes are highlighted by boldface type. Stars denote identical residues, and colons denote similar residues in the sequences of the (putative) polymerases of H-Ast2 and barley yellow mosaic virus (BaYMV). The alignment was generated by the MACAW program (14) using the available information on conserved motifs in viral polymerases (26). Distances between the aligned conserved motifs and from the protein (or the polyprotein for astroviruses and caliciviruses) termini are indicated. The sequences were from GenBank (26-28). TEV, tobacco etch potyvirus; RHDV, rabbit hemorrhagic disease virus; FCV, feline calicivirus; SRSV, small round structured virus; FMDV, foot-and-mouth-disease virus (type O1K); EMCV, encephalomyocarditis virus; PV, poliovirus (type 1); HAV, hepatitis A virus; ECHO22, echovirus (type 22); SBMV, southern bean mosaic sobemovirus; and PLRV, potato leafroll luteovirus. (B) Putative chymotrypsin-like proteases. The same set of sequences was used as in A, but the sequences were regrouped to show the closer similarity between the putative proteases of the astrovirus and the caliciviruses. *, Putative catalytic residues; !, residues implicated in substrate binding (13); 3C1, 3C-like protease (after 3C protease of picornaviruses); NI, nuclear inclusion protein. Other designations, abbreviations, the procedure for alignment generation, and the sources of the sequences are as in A.
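The residue classes defined in the legend (U, @, &) can be turned into a simple pattern matcher for scanning a protein for a consensus motif. The sketch below is a hypothetical illustration, not the MACAW procedure used for the figure. It assumes that "." in a consensus string stands for any residue, and both the consensus string "&GDD" and the test peptide are illustrative inputs rather than verified rows of the alignment; GDD is the well-known core of the polymerase active-site motif.

```python
import re

# Residue classes as defined in the Fig. 3 legend.
CLASSES = {
    "U": "ILMV",     # bulky aliphatic
    "@": "FYW",      # aromatic
    "&": "ILMVFYW",  # bulky hydrophobic (aliphatic or aromatic)
}


def consensus_to_regex(consensus):
    """Compile a consensus string into a regular expression.

    Class symbols become character sets, '.' (assumed to mean any residue)
    becomes a wildcard, and every other character must match literally.
    """
    parts = []
    for symbol in consensus:
        if symbol in CLASSES:
            parts.append("[" + CLASSES[symbol] + "]")
        elif symbol == ".":
            parts.append(".")
        else:
            parts.append(re.escape(symbol))
    return re.compile("".join(parts))


def find_motif(consensus, protein):
    """Return (0-based start, matched text) of the first hit, or None."""
    match = consensus_to_regex(consensus).search(protein)
    return None if match is None else (match.start(), match.group())


peptide = "STVVYGDDRLS"  # illustrative peptide containing a GDD core
print(find_motif("&GDD", peptide))  # hydrophobic residue followed by GDD
print(find_motif("UGDD", peptide))  # aliphatic residue required before GDD
```

Here the first call finds YGDD because tyrosine belongs to the bulky hydrophobic class, whereas the second finds nothing because tyrosine is not aliphatic; this is the kind of distinction the class symbols encode.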

analysis, and that its genomic organization is distinctive among animal viruses. Astrovirus can be distinguished from other positive-strand, nonenveloped RNA viruses (Picornaviridae, Caliciviridae, and hepatitis E virus) by the presence of the ribosomal frameshift and the lack of a helicase domain. Otherwise, the closest similarity in genomic organization is with caliciviruses, which differ from astrovirus in that ORF 2 encoding the capsid protein is separated from the 3' end by a small ORF 3; picornaviruses differ by the 5' terminal localization of the capsid-protein genes and by the absence of subgenomic RNA; and hepatitis E virus has a distinct array of nonstructural genes, even though the gene encoding the structural protein is similarly localized at the 3' end (46, 48-50). It is remarkable, however, that astroviruses combine features typical of several very different groups of positive-strand RNA viruses and even retroviruses (the frameshift signal). Of special interest is the similarity of the genomic organization and expression strategy of astrovirus and plant luteoviruses (51). Both groups of viruses lack the helicase domain, whereas the protease and the polymerase domains are apparently fused via ribosome frameshifting. Moreover, both share the substitution of serine for the catalytic cysteine in the viral proteases.

The present findings strongly support the classification of astroviruses in another family, Astroviridae. The availability of sequence information will be useful in the development of sensitive diagnostic assays to further our understanding of

[Figure 3 graphic, continued. Row labels (cons, H-Ast, BaYMV, TEV, FCV, RHDV, SRSV, PV, EMCV, FMDV, ECHO22, HAV, PLRV, SBMV) and alignment blocks for polymerase motifs IV and VIII; the aligned residues and distance columns did not survive text extraction.]


FIG. 4. Tentative phylogenetic relationship for RNA-dependent RNA polymerases of supergroup I positive-strand RNA viruses. The same set of sequences was analyzed as for Fig. 3 A and B. The dendrogram was derived by majority-rule consensus of 100 bootstrapped trees generated using the KITSCH program of the PHYLIP package, which implements the distance-matrix algorithm of Fitch and Margoliash under the evolutionary-clock assumption. The other algorithms applied (15) produced very similar tree topologies, except for the protein-parsimony algorithm, which suggested grouping of the astrovirus polymerase with the bymovirus and potyvirus polymerases. For abbreviations, see Fig. 3A legend.

the importance of this group of viruses as a cause of disease in humans and animals.

We thank Daniel Bradley, Jon Gentsch, and John O'Connor for critical reading of the manuscript.

1. Madeley, C. R. & Cosgrove, B. P. (1975) Lancet ii, 124.

2. Monroe, S. S., Stine, S. E., Gorelkin, L., Herrmann, J. E., Blacklow, N. R. & Glass, R. I. (1991) J. Virol. 65, 641-648.

3. Matsui, S. M., Kim, J. P., Greenberg, H. B., Young, L. V. M., Smith, L. S., Lewis, T. L., Herrmann, J. E., Blacklow, N. R., Dupis, K. & Reyes, G. R. (1993) J. Virol. 67, 1712-1715.

4. Kurtz, J. B. & Lee, T. W. (1984) Lancet i, 1405.

5. Greenberg, H. B. & Matsui, S. M. (1992) Infect. Agents Dis. 1, 71-91.

6. Herrmann, J. E., Taylor, D. N., Echeverria, P. & Blacklow, N. R. (1991) N. Engl. J. Med. 324, 1757-1760.

7. Willcocks, M. M. & Carter, M. J. (1992) Arch. Virol. 124, 279-289.

8. Monroe, S. S., Jiang, B., Stine, S. E., Koopmans, M. & Glass, R. I. (1993) J. Virol. 67, 3611-3614.

9. Devereux, J., Haeberli, P. & Smithies, O. (1984) Nucleic Acids Res. 12, 387-395.

10. Lambden, P. R., Cooke, S. J., Caul, E. O. & Clarke, I. N. (1992) J. Virol. 66, 1817-1822.

11. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410.

12. Henikoff, S. & Henikoff, J. G. (1992) Proc. Natl. Acad. Sci. USA 89, 10915-10919.

13. Gorbalenya, A. E., Blinov, V. M., Donchenko, A. P. & Koonin, E. V. (1989) J. Mol. Evol. 28, 256-268.

14. Schuler, G. D., Altschul, S. F. & Lipman, D. J. (1991) Proteins 9, 180-190.

15. Felsenstein, J. (1989) Cladistics 5, 164-166.

16. Kozak, M. (1991) J. Biol. Chem. 266, 19867-19870.

17. Pleij, C. W. (1990) Trends Biochem. Sci. 15, 143-147.

18. Chamorro, M., Parkin, N. & Varmus, H. E. (1992) Proc. Natl. Acad. Sci. USA 89, 713-717.

19. Brierley, I., Digard, P. & Inglis, S. C. (1989) Cell 57, 537-547.

20. den Boon, J. A., Snijder, E. J., Chirnside, E. D., de Vries, A. A., Horzinek, M. C. & Spaan, W. J. (1991) J. Virol. 65, 2910-2920.

21. Xiong, Z., Kim, K. H., Kendall, T. L. & Lommel, S. A. (1993) Virology 193, 213-221.

22. Prufer, D., Tacke, E., Schmitz, J., Kull, B., Kaufmann, A. & Rohde, W. (1992) EMBO J. 11, 1111-1117.

23. Eggen, R. & van Kammen, A. (1988) in RNA Genetics, eds. Ahlquist, P., Holland, J. & Domingo, E. (CRC, Boca Raton, FL), Vol. 1, pp. 49-69.

24. Hellen, C. U., Krausslich, H. G. & Wimmer, E. (1989) Biochemistry 28, 9881-9890.

25. Palmenberg, A. C. (1990) Annu. Rev. Microbiol. 44, 603-623.

26. Koonin, E. V. (1991) J. Gen. Virol. 72, 2197-2206.

27. Hyypia, T., Horsnell, C., Maaronen, M., Khan, M., Kalkkinen, N., Auvinen, P., Kinnunen, L. & Stanway, G. (1992) Proc. Natl. Acad. Sci. USA 89, 8847-8851.

28. Meyers, G., Wirblich, C. & Thiel, H. J. (1991) Virology 184, 664-676.

29. Dolja, V. V. & Carrington, J. C. (1992) Semin. Virol. 3, 315-326.

30. Koonin, E. V. & Dolja, V. V. (1993) Crit. Rev. Biochem. Mol. Biol. 28, 375-430.

31. Gorbalenya, A. E., Donchenko, A. P., Blinov, V. M. & Koonin, E. V. (1989) FEBS Lett. 243, 103-114.

32. Doolittle, R. F. (1986) Of URFs and ORFs (University Science Books, Mill Valley, CA).

33. Gorbalenya, A. E., Koonin, E. V., Blinov, V. M. & Donchenko, A. P. (1988) FEBS Lett. 236, 287-290.

34. Bazan, J. F. & Fletterick, R. J. (1989) FEBS Lett. 249, 5-7.

35. Gorbalenya, A. E., Koonin, E. V., Donchenko, A. P. & Blinov, V. M. (1989) Nucleic Acids Res. 17, 4713-4730.

36. Gorbalenya, A. E., Koonin, E. V. & Lai, M. M. (1991) FEBS Lett. 288, 201-205.

37. Rozanov, M. N., Koonin, E. V. & Gorbalenya, A. E. (1992) J. Gen. Virol. 73 (8), 2129-2134.

38. Gorbalenya, A. E. & Koonin, E. V. (1989) Nucleic Acids Res. 17, 8413-8440.

39. Vartapetian, A. B. & Bogdanov, A. A. (1987) Prog. Nucleic Acid Res. Mol. Biol. 34, 209-251.

40. Wimmer, E. (1982) Cell 28, 199-201.

41. Giachetti, C., Hwang, S. S. & Semler, B. L. (1992) J. Virol. 66, 6045-6057.

42. Aroonprasert, D., Fagerland, J. A., Kelso, N. E., Zheng, S. & Woode, G. N. (1989) Vet. Microbiol. 19, 113-125.

43. Dingwall, C. & Laskey, R. A. (1991) Trends Biochem. Sci. 16, 478-481.

44. Carrington, J. C., Freed, D. D. & Leinicke, A. J. (1991) Plant Cell 3, 953-962.

45. Li, X. H. & Carrington, J. C. (1993) Virology 193, 951-958.

46. Koonin, E. V., Gorbalenya, A. E., Purdy, M. A., Rozanov, M. N., Reyes, G. R. & Bradley, D. W. (1992) Proc. Natl. Acad. Sci. USA 89, 8259-8263.

47. Gough, R. E., Collins, M. S., Borland, E. & Keymer, L. F. (1984) Vet. Rec. 114, 279.

48. Lambden, P. R., Caul, E. O., Ashley, C. R. & Clarke, I. N. (1993) Science 259, 516-519.

49. Kitamura, N., Semler, B. L., Rothberg, P. G., Larsen, G. R., Adler, C. J., Dorner, A. J., Emini, E. A., Hanecak, R., Lee, J. J., van der Werf, S., Anderson, C. W. & Wimmer, E. (1981) Nature (London) 291, 547-553.

50. Tam, A. W., Smith, M. M., Guerra, M. E., Huang, C.-C., Bradley, D. W., Fry, K. E. & Reyes, G. R. (1991) Virology 185, 120-131.

51. Bazan, J. F. & Fletterick, R. J. (1990) Semin. Virol. 1, 311-322.
