Journal of Virological Methods 74 (1998) 67–76

The GPRIME package: computer programs for identifying the best regions of aligned genes to target in nucleic acid hybridisation-based diagnostic tests, and their use with plant viruses

Adrian Gibbs *, John Armstrong, Anne M. Mackenzie, Georg F. Weiller

Research School of Biological Sciences, Australian National University, Canberra, PO Box 475, ACT 2601, Australia

Received 11 February 1998; received in revised form 20 April 1998; accepted 20 April 1998

Abstract

The GPRIME (Group PRIMEr design) programs examine aligned sets of gene sequences to discover homologous regions to be targeted in diagnostic tests. The core program moves a ‘window’ over the aligned sequences and calculates, at each window position, a ‘redundancy value’, namely the number of sequences that would represent all permutations of the variable sequence positions within that window. Regions with minimal redundancy values may then be targeted in diagnostic tests based on oligonucleotide hybridisation. The likely specificity of tests targeting such regions can be assessed by searching the international databases with those regions using FASTA. The GPRIME programs, which include programs for designing primers to distinguish between two sub-sets of a group of aligned sequences, can be obtained from http://life.anu.edu.au/software.html. We have used GPRIME to design redundant primers for RT-PCR tests to detect all potexviruses and tobamoviruses, and then used these, together with a previously reported pair of primers for the Potyviridae, to screen some Australian orchid collections. Two orchid viruses previously reported from Australia were found; cymbidium mosaic potexvirus was common, but odontoglossum ringspot tobamovirus was not. In addition the recently described ceratobium mosaic potyvirus was found to be common, and three other novel potyviruses were also found. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: PCR primer design; Redundant and specific primers; Potexviruses; Potyviruses; Tobamoviruses; Virus identification; IT-based gene diagnostics

1. Introduction

Modern societies rely on the correct and scientific identification of organisms for a whole range of crucial purposes, including the understanding and control of industrial processes, diseases, quarantine surveillance, patenting, etc. Molecular gene probing and sequencing techniques are unparalleled in their ability to identify and characterise particular organisms or groups of genetically related organisms, unequivocally and with great sensitivity. Where comparisons have been made of different methods for the diagnosis of, for example, plant viruses, RT-PCR (Reverse Transcription-Polymerase Chain Reaction) has been found to be the most sensitive of the DNA hybridisation-based techniques, and to be much more sensitive, specific and robust than the best of the serological techniques (Figueira et al., 1997; Li et al., 1997; Stevens et al., 1997).

* Corresponding author. Tel.: +61 2 62494211; fax: +61 2 62494437; e-mail: [email protected]

0166-0934/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved.

PII S0166-0934(98)00070-6

Until recently, gene sequences or probes for DNA diagnostic work were obtained empirically by isolating gene fragments that were specific to the organism(s) to be identified. However, an alternative that has recently become available is to design probes using data from the international gene sequence databases, which can be accessed via the Internet. For many years these databases have been increasing exponentially, 10-fold every 5 years and, for example, in January 1998 the EMBL nucleotide database contained nearly 1.5 million sequences involving a total of over 1.2 billion nucleotides. These sequences come from genes of most groups of organisms including viruses, and most of the genera of plant viruses are well represented, some by several dozen sequences, including complete genomes (Brunt et al., 1996). Thus the international nucleotide and amino acid sequence databases have now become a most important source of data for diagnostic work.

The GPRIME package of simple computer tools that we describe in this paper helps one examine aligned sets of gene sequences to discover homologous sequences that may then be targeted with PCR primers. We have used the package to design primers to detect potexviruses and tobamoviruses, and used these, together with a pair of primers designed to detect all species of the Potyviridae (Gibbs and Mackenzie, 1997; Mackenzie et al., 1998), to examine plants from the large collection of orchids, mostly Australasian, at the Australian National Botanic Gardens, Canberra, and several small private collections.

Orchids are the largest family of flowering plants, and are grown commercially in many countries as ornamental potted plants or for cut flowers. More than a dozen viruses, mostly uncharacterised potyviruses, have been recorded in cultivated orchids. However, reports indicate that cymbidium mosaic potexvirus and odontoglossum ringspot tobamovirus are perhaps the most widespread orchid viruses; both are contagious, and spread through wounds and during pruning. Potexviruses, potyviruses and tobamoviruses comprise about a third of the more than 1000 recorded plant virus species, and include many that are agronomically important.

2. Materials and methods

2.1. Sequences: their alignment and analysis

The complete genomic sequences of eleven potexviruses were obtained via the Australian National Genomic Information Service from GenBank: bamboo mosaic virus (Accession code D26017; 6366 nts), clover yellow mosaic virus (D29630, D01191; 7015 nts), cymbidium mosaic virus, Singapore isolate (CymMV) (U62963; 6227 nts), foxtail mosaic virus (M62730; 7015 nts), narcissus mosaic virus (D13747, D00405; 6955 nts), papaya mosaic virus (D13957, D00580; 6656 nts), Plantago asiatica mosaic virus (Z21647; 6128 nts), potato aucuba mosaic virus (S73580; 7059 nts), potato virus X (D00344; 6435 nts), strawberry mild yellow edge-associated virus (D12517, D01227; 5966 nts) and white clover mosaic virus (X16636; 5846 nts). Similarly the genomic sequences of nine tobamoviruses were obtained: Chinese rape mosaic virus (U30944; 6303 nts), crucifer tobacco mosaic virus (Z29370; 6312 nts), cucumber green mottle mosaic virus (D12505, D01188; 6421 nts), odontoglossum ringspot virus (ORSV) (X82130; 6618 nts), pepper mild mottle virus (M81413; 6357 nts), sunnhemp mosaic virus (U47034, J02413; 4683 and 1800 nts), tobacco mild green mosaic virus (M34077, M22483; 6355 nts), tobacco mosaic virus (J02415; 6395 nts) and turnip vein clearing virus (U03387, L22518; 6311 nts).

To align each set, the chosen genomic sequences were first split into their individual genes (including any overlapping portions in both genes), translated into their encoded amino acid sequences, and the homologous amino acid sequences aligned using CLUSTAL V (Higgins et al., 1991) with default parameters. Then the ADDGAPS program (G.F. Weiller, unpublished) was used to read files containing the gapped amino acid sequences together with the corresponding ungapped gene sequences, and produce gapped gene sequences with triplets of gaps added at the appropriate places. The terminal untranslated portions of each genome were aligned directly using CLUSTAL V. The gapped genes and the terminal untranslated regions of each genome were then recompiled to give a set of aligned genomes, all now the same length and collated into the PIR format files required by the GPRIME package.
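The gap-threading step performed by ADDGAPS can be illustrated with a short sketch. ADDGAPS itself is unpublished, so the Python below is only a minimal illustration of the idea, assuming one codon per aligned residue and '-' as the gap character; the function name `add_gaps` and the toy sequences are hypothetical.

```python
def add_gaps(aligned_protein: str, gene: str, gap: str = "-") -> str:
    """Thread an ungapped gene onto its gapped protein alignment.

    Each residue consumes one codon of the gene; each gap in the protein
    alignment becomes a triplet of gaps in the nucleotide alignment.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(gene) < 3 * residues:
        raise ValueError("gene is too short for the aligned protein")

    gapped = []
    pos = 0  # index of the next unused nucleotide in the gene
    for aa in aligned_protein:
        if aa == gap:
            gapped.append(gap * 3)
        else:
            gapped.append(gene[pos:pos + 3])
            pos += 3
    return "".join(gapped)


# Hypothetical toy data: protein "MKV" aligned as "M-KV".
print(add_gaps("M-KV", "ATGAAAGTT"))
```

Any nucleotides left over after the last aligned residue (for example a stop codon) are simply ignored in this sketch.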

The likely specificities of chosen primers were tested by using the redundant sequence and its complement to search the EMBL database with FASTA (Pearson and Lipman, 1988; http://www2.ebi.ac.uk/fasta3/).
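Deriving the complementary strand of a redundant sequence requires complementing the ambiguity codes as well as the four bases. The following Python sketch shows one way to do this; it is not part of GPRIME. The table includes K and N from the standard IUB set in addition to the codes used in this paper, X is treated as 'any base' as in this paper, and the function name is hypothetical.

```python
# Complement of each IUB ambiguity code: the code for the set of
# complementary bases (e.g. R = A/G pairs with Y = T/C).
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "M": "K", "K": "M", "R": "Y", "Y": "R",
    "W": "W", "S": "S",
    "V": "B", "B": "V", "H": "D", "D": "H",
    "N": "N", "X": "X",
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a sequence written in IUB codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# A short made-up redundant sequence.
print(reverse_complement("AYGR"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check on the table.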

2.2. The GPRIME package

The GPRIME programs are written in the Lahey ELF90 version of FORTRAN 90, and will operate with a DOS operating system version 3.3 et seq.; they do not require WINDOWS or a numerical co-processor. They may be obtained from http://life.anu.edu.au/software.html or ftp://life.anu.edu.au/pub/software/GPRIME/. GPRIME.EXE is a self-extracting archive file.

2.3. Virus tests

The potex- and tobamovirus primers designed in the work reported in this paper, together with potyvirid-specific primers (Gibbs and Mackenzie, 1997; Mackenzie et al., 1998), were tested for their ability to prime separate RT-PCR reactions with RNA extracted from orchid leaves showing virus symptoms, and from comparable healthy leaves. The leaves were collected from plants of several orchid genera in the collection of the Australian National Botanic Gardens and from several private collections in eastern Australia. When DNA fragments of the expected sizes were obtained from the RT-PCR tests, they were further characterised by sequencing.

2.4. Oligonucleotides

Primers were synthesised in an Applied Biosystems DNA synthesiser.

2.5. Nucleic acid extraction and RT-PCR

Samples of leaf or flower tissue, each of about 100 mg, were ground in liquid N2 and nucleic acids extracted using the NaCl wash–CTAB extraction method previously described (Mackenzie et al., 1998), and dissolved in 50–100 µl of sterile water. RT-PCR was done using each primer pair described below and the Titan One Tube RT-PCR System (Boehringer-Mannheim); 1 µl of extracted nucleic acids was used with 25 pmoles each of potex 1/SP6 and potex 2/T7 primers, or 15 pmoles potyvirid 1/SP6 and 30 pmoles potyvirid 2/T7, or 25 pmoles each of tobamo 1/M13F and tobamo 2 primers, in a 25 µl reaction mix containing 0.2 mM dNTPs, 2.5 mM MgCl2, 5 mM DTT, 10 units RNasin and 0.3 µl of enzyme mixture with the System’s buffer. The thermocycle regime used was 45°C for 30 min, 94°C for 2 min, followed by 35 cycles of 94°C for 30 s, 56°C (potyvirid and potexvirus) or 50°C (tobamovirus) for 45 s and 68°C for 2 min, and finally 68°C for 5 min. A 5 µl aliquot of each product mix was fractionated by agarose gel electrophoresis, stained with ethidium bromide, and examined and photographed under UV light.

Specific amplified fragments were purified and concentrated using the PROMEGA WIZARD PCR Preps DNA Purification Kit and their concentration estimated after further agarose gel electrophoresis. They were sequenced using the ABI Dye Terminator Cycle Sequencing Ready Reaction Kit. Between 100 and 200 ng of the purified PCR fragment was sequenced from either end using the SP6, T7 or M13F primers. For each sample either the PCR product was sequenced twice with the appropriate primers, or both products from two separate PCR reactions with the sample were sequenced once.

3. Results

3.1. Use of the programs

The GPRIME package, obtained as described above as the self-extracting file GPRIME.EXE (bytes), is best put into a new sub-directory \GPRIME\ and expanded by ‘>C:\GPRIME\ <Enter/Return>’. The files INTRO.DOC (Word 6.0) and INTRO.TXT describe the use of GPRIME, and there are files of test data. The package menu is started from within the GPRIME subdirectory by ‘>GP <Enter/Return>’.

GPRIME programs of the first menu group aid the search for regions where minimally redundant PCR primers would amplify all sequences in the set. Program 1, which is the core program of the package, slides a ‘window’ of chosen size over the aligned sequences, one nucleotide at a time. At each window position it calculates from their consensus a ‘redundancy value’, namely the number of sequences that would represent all permutations of the variable sequence positions within that window; note that the consensus used here is the ‘ambiguity consensus’, namely the maximum known variation at each position in the aligned sequences (recorded using the ‘IUB ambiguity codes’ from http://morgan.angis.su.oz.au/Angis/Tables.html), and is not the ‘majority consensus’, namely the nucleotides that occur most frequently at each position. The program also calculates for each window position the maximum and minimum Tm values for the dsDNAs produced when the primers in that mixture hybridise with their target sequences. A file is produced that lists, vertically, the ‘consensus’ of the sequences and the ‘redundancy values’ for each window position along the sequence, where the redundancy value is less than 250. When any sequence within the ‘window’ contains a gap the region will be unsuitable as the target for a primer and so a gap is recorded in the consensus, a warning is posted in the ‘gap’ column, and the program records a redundancy value of 250.
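The redundancy calculation can be sketched in a few lines of Python. This is an illustration of the principle rather than the FORTRAN code of Program 1: the redundancy of a window is taken as the product, over its positions, of the number of different nucleotides observed at each position, and any window containing a gap is given the value 250. Capping large products at 250 as well is our assumption for this sketch, and the function names and toy alignment are hypothetical.

```python
from math import prod

CAP = 250  # value recorded for windows that contain a gap


def column_sets(aligned):
    """Per alignment column, the set of nucleotides seen (None if any gap)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must all have the same aligned length")
    columns = []
    for column in zip(*aligned):
        bases = {base.upper() for base in column}
        columns.append(None if "-" in bases else bases)
    return columns


def redundancy_values(aligned, window=16):
    """Redundancy value for every window position, sliding one base at a time."""
    columns = column_sets(aligned)
    values = []
    for start in range(len(columns) - window + 1):
        win = columns[start:start + window]
        if any(col is None for col in win):
            values.append(CAP)  # gapped window: unsuitable as a target
        else:
            values.append(min(prod(len(col) for col in win), CAP))
    return values


# Hypothetical toy alignment; a window of 4 is used only to keep it short.
toy = ["ACGTAC", "ACGTTC", "ATGTAC"]
print(redundancy_values(toy, window=4))
```

In the toy alignment the second and fifth columns each contain two different nucleotides, so a window covering both has a redundancy value of 4 and a window covering only one of them has a value of 2.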

The ‘redundancy file’, REDUND1.OUT, is examined using a text editor to find possible target sites among those that have redundancy values less than about 100 and no gaps. Sites are chosen that will yield fragments of the desired size and that have similar Tm values, each covering a narrow range, and their positions are noted so that they can be examined, one at a time, using the second program. A second version of this file, REDUND1.XLS, is also produced and has the format required for display by a spreadsheet program, such as EXCEL.
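The range of Tm values for a redundant site can be approximated as in the sketch below. The formula used by GPRIME is not stated here, so the sketch assumes the simple rule of 2°C for each A or T and 4°C for each G or C, and takes the lowest and highest contribution possible at each ambiguous position; the function name is hypothetical, and K and N are included from the standard IUB set.

```python
# Bases represented by each IUB code (X = any base, as used in this paper).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def tm_range(consensus: str):
    """Return (minimum, maximum) Tm in degrees C under the 2/4 rule (assumed)."""
    low = high = 0
    for code in consensus.upper():
        weights = [4 if base in "GC" else 2 for base in IUB[code]]
        low += min(weights)
        high += max(weights)
    return low, high


# Made-up three-base example: A contributes 2, Y contributes 2 or 4, G contributes 4.
print(tm_range("AYG"))
```

A site whose minimum and maximum differ by only a few degrees covers the 'narrow range' sought when pairing primers.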

Program 2 is used to examine in turn each of the chosen likely target sites. It produces a file listing the sequence and its complement, and also a dot diagram comparing the consensus sequence with its complement, to reveal whether it contains self-complementary regions and might anneal to itself.
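A dot diagram of this kind can be sketched as follows. This is a simplified illustration, not the output format of Program 2: it handles only the four unambiguous bases, marks exact matches, and compares the sequence with its reverse complement, so that a diagonal run of dots marks a stretch able to pair with another part of the same sequence. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a sequence of unambiguous bases."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def dot_diagram(seq_a, seq_b):
    """One text row per base of seq_a; '*' where it equals a base of seq_b."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


def longest_diagonal_run(seq_a, seq_b):
    """Length of the longest unbroken diagonal of matches in the diagram."""
    best = 0
    previous = [0] * (len(seq_b) + 1)
    for a in seq_a:
        current = [0] * (len(seq_b) + 1)
        for j, b in enumerate(seq_b, start=1):
            if a == b:
                current[j] = previous[j - 1] + 1
                best = max(best, current[j])
        previous = current
    return best


# GAATTC is its own reverse complement, so the main diagonal is complete.
site = "GAATTC"
for row in dot_diagram(site, reverse_complement(site)):
    print(row)
print(longest_diagonal_run(site, reverse_complement(site)))
```

A primer candidate with a long diagonal run against its own reverse complement is likely to anneal to itself and is best avoided.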

Programs of the second menu group aid the design of primers that distinguish between two chosen sub-sets of a group of aligned sequences. Program 3 is similar to Program 1, but seeks regions where PCR primers can distinguish between two chosen subsets of the sequences (e.g. two phylogenetically distinct lineages). After the two subsets are defined, their consensus sequences are derived, and again scanned using a sliding ‘window’ of chosen length. At each position the ‘redundancy value’ of each consensus is calculated, and also a ‘difference value’, which is the number of nucleotide positions within the window at which the two subsets have no shared nucleotides, and which thus can range from zero to the length of the window. These data are recorded in two files, REDUND2.OUT and REDUND2.XLS, which are examined using a text editor or EXCEL. Using these, one can identify sites that have the maximum ‘difference value’, but minimum ‘redundancy values’ for the two subset consensus sequences. Regions of this sort are noted and examined using Program 4, which, like Program 2, produces dot diagrams comparing the two consensus sequences, and also comparing each with its complement. The first of these dot diagrams will confirm how distinct the two subsets of sequences are in the chosen region, and the others will reveal regions of self-complementarity in the two consensus sequences, as described above.
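The ‘difference value’ can be sketched in the same style as the redundancy value. The Python below is an illustration only, not the code of Program 3: a position counts towards the difference value when the sets of nucleotides seen in the two subsets do not overlap. Gap handling is omitted, and the function names and toy subsets are hypothetical.

```python
def column_sets(aligned):
    """Set of nucleotides observed at each column of an alignment."""
    return [set(column) for column in zip(*aligned)]


def difference_values(subset_a, subset_b, window):
    """Per window position, the number of columns with no shared nucleotide."""
    cols_a = column_sets(subset_a)
    cols_b = column_sets(subset_b)
    if len(cols_a) != len(cols_b):
        raise ValueError("both subsets must come from the same alignment")
    distinct = [1 if a.isdisjoint(b) else 0 for a, b in zip(cols_a, cols_b)]
    return [sum(distinct[i:i + window])
            for i in range(len(distinct) - window + 1)]


# Hypothetical toy subsets standing in for two lineages.
lineage_1 = ["ACGTAC", "ACGTTC"]
lineage_2 = ["ATGAAC", "ATGACC"]
print(difference_values(lineage_1, lineage_2, window=4))
```

Here the second and fourth columns separate the lineages completely, whereas the fifth does not, because both subsets contain A at that position.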

Programs in the third menu group help one decide which of the potential primers identified by the redundancy value calculations are best for particular purposes. The first of these, Program 5, produces a dot plot comparing two sequences, each less than 100 nucleotides long. It can be used to compare two potential primer sequences to check for complementarity; the output is an ASCII file, SEQDOT.OUT. Program 6 selects sequences, and/or parts of those sequences, from a file of aligned sequences in PIR format, and puts them into a new file. Thus one can obtain a separate PIR file of the sequence fragments that would be selected by a particular pair of PCR primers. Program 7 enables one to check a proposed redundant primer against any given sequence in PIR format.
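Checking a redundant primer against a sequence, as Program 7 does, amounts to asking at each position whether the base in the sequence is one of those allowed by the primer's ambiguity code. The sketch below illustrates this; it is not the GPRIME implementation, it reads plain strings rather than PIR files, and the function names and test sequence are hypothetical.

```python
# Bases represented by each IUB code (X = any base, as used in this paper).
IUB = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT",
    "X": "ACGT", "N": "ACGT",
}


def mismatches(primer, target):
    """Count positions where the target base is not allowed by the primer."""
    return sum(1 for code, base in zip(primer, target) if base not in IUB[code])


def best_site(primer, sequence):
    """Return (start, mismatches) of the best-matching site, or None."""
    best = None
    for start in range(len(sequence) - len(primer) + 1):
        count = mismatches(primer, sequence[start:start + len(primer)])
        if best is None or count < best[1]:
            best = (start, count)
    return best


# Made-up sequence containing a perfect site for a short redundant primer.
print(best_site("CAYCARCA", "GGCATCAACATT"))
```

The start position is zero-based; where several sites tie, the first is reported.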

Finally, the program package also contains some ancillary programs required by the Lahey ELF90 version of FORTRAN 90, together with the INTRO.DOC and INTRO.TXT files describing how to use the programs, and a large selection of test data files in the correct format for practising with the programs.

3.2. Design and testing of PCR primers for potex- and tobamoviruses

Primers for detecting, hopefully, all species of potexviruses and tobamoviruses were designed using GPRIME. The complete genomic sequences of 11 potexviruses and nine tobamoviruses were aligned as two sets, gene by gene, and then recompiled as described above; these aligned sequences are included in the GPRIME package. The sequence sets were examined by GPRIME Program 1, and the resulting redundancy values obtained using a window 16 nucleotides in length are shown in Figs. 1 and 2. It can be seen that there are relatively few gap-free regions with redundancy values less than 250 in either set. These regions were tested for their likely specificity by searching the EMBL database (http://www2.ebi.ac.uk/fasta3/) with them using FASTA (Pearson and Lipman, 1988; Pearson, 1990). Those finally chosen for the virus-specific regions of the primers, and their genomic positions in the gapped and aligned sequences, were:

potex 1: 5′-CAYCARCARGCXAARGAYSA-3′ (nts 4086–4105)

potex 2: 5′-TCDGTRTTDGCRTCRAADGT-3′ (reverse complement of nts 4812–4831)

tobamo 1: 5′-TGATHAARMGDAAYWTBAAYDCDCC-3′ (nts 3971–3995)

tobamo 2: 5′-TTBGCYTCRAARTTCCA-3′ (reverse complement of nts 4831–4847)

where the redundant nucleotide M represents A or C; R, A or G; W, A or T; S, C or G; Y, C or T; V, A or C or G; H, A or C or T; D, A or G or T; B, C or G or T; X, A or G or C or T. Sequences corresponding to the SP6 or T7 promoter sequences or the pUC/M13 forward primer sequence were added to the 5′ ends of the virus-specific redundant primer sequences shown above to give the primers used. In this way PCR fragments were produced that could be sequenced directly (Mackenzie et al., 1998). Deoxyinosine was used in positions in the primers that had 3- or 4-fold degeneracy (V, H, D, B, X) as Langeveld et al. (1991) had shown this to give primers that were more specific than when a mixture of nucleotides was used at such degenerate sites. The gene locations of the chosen primer sites are also indicated on the genome maps in Figs. 1 and 2.

Fig. 1. Diagram indicating the positions of the selected potex 1 and potex 2 primers in potexvirus genomes. The upper cartoon shows a potexvirus genome map: MT-hel, methyl-transferase–helicase; pol, polymerase; TGB, triple gene block; VP, virion protein; poly(A), 3′-terminal sequence. Lower box shows a graph of the redundancy values (y-axis; only values <250 plotted) from 11 aligned potexvirus genomes (x-axis; each division 1000 nucleotides) obtained using a window size of 16. The x-axes of the cartoon and graph are the same length, and the positions of gaps in any of the 11 sequences are shown by the graph line at the 50 redundancy value level.

Fig. 2. Diagram indicating the positions of the selected tobamo 1 and tobamo 2 primers in tobamovirus genomes. The upper cartoon shows a tobamovirus genome map: MT-hel, methyl-transferase–helicase; pol, polymerase; MP, movement protein; VP, virion protein; ‘t-RNA’, 3′-terminal t-RNA-like sequence. Lower box shows a graph of the redundancy values (y-axis; only values <250 plotted) from nine aligned tobamovirus genomes (x-axis; each division 1000 nucleotides) obtained using a window size of 16. The x-axes of the cartoon and graph are the same length, and the positions of gaps in any of the nine sequences are shown by the graph line at the 50 redundancy value level.

The FASTA search of databases showed that, with very few exceptions, these sequences matched with sequences from viruses of the same genus (target sequences) before other sequences (NB: the reverse complements of the primer 2 sequences given above were, of course, used for searching). The potex 2 primer was found to be more specific than the potex 1 primer as it matched all 14 target sequences in the top 17 of 70 sequences with E values of <0.0001, whereas potex 2 matched eight target sequences in the top 16 of 70 sequences with E values of only 0.01–0.1; an E value (statistical expectation) corresponds to the number of sequences with the observed similarity that one would expect by chance alone in the current database, which has c. 1.2 × 10^9 nucleotides and 1.7 × 10^6 sequences (an E value <0.05 is considered equivalent to P=0.95). The tobamo primers were also very specific, but in different ways; tobamo 1 found all 13 target sequences in the top 19 of 70 sequences with E values of 0.001–0.01, whereas tobamo 2 matched all target sequences better than non-target, but with E values of only 0.1–2.0. The few non-target sequences that matched the primers were mostly Caenorhabditis ESTs, and these, unlike the viral sequences, showed no homology in regions flanking the matched sequences. The complements of the primer 1 sequences, and the primer 2 sequences themselves, matched no target sequences in the top 70.

These database searches indicate that the probability that the chosen primer pairs will, in combination, select and amplify in RT-PCR tests fragments of non-target sequences of the expected size is vanishingly small.

The subgenomic sequences bracketed by the chosen PCR primers were selected by GPRIME Program 7. Estimates of the pairwise differences between these fragment sequences and the full genomic sequences were compared in scatter diagrams (Fig. 3) obtained using the DIPLOMO program (Weiller and Gibbs, 1995). In this way it was shown that the relationships between the chosen fragments were representative of those of the genomes from which they came.

Fig. 3. Scatter diagrams comparing (on the left) the pairwise nucleotide differences of the 11 aligned potexvirus genomes (x-axis) and of the corresponding sequences between the chosen primers (nts 4086–4832) (y-axis), and (on the right) the pairwise differences of the nine aligned tobamovirus genomes (x-axis) and of the corresponding selected sequences (nts 3971–4848) (y-axis). The potexvirus data gave a linear regression equation of y = 4.28 + 0.6631 × g with a correlation coefficient of 0.688, and the tobamovirus data y = 0.45 + 0.8034 × g with a correlation coefficient of 0.974.

3.3. RT-PCR tests of the potex-, poty- and tobamovirus primers

Nucleic acid preparations from over 400 orchid plants were tested using the potex- and tobamovirus primers, and also using the primers designed for detecting potyvirids (Gibbs and Mackenzie, 1997; Mackenzie et al., 1998). Fragments of the expected size were reproducibly obtained from almost half of the plants (Fig. 4). A total of 76 yielded fragments of c. 760 bp when primed with the potexvirus primers. Of these, 15 were sequenced; 12 were found to have >95% identity to the same region of the genome of the Singaporean and Korean isolates of CymMV (Wong et al., 1997; GenBank code AF016914, unpublished data), whereas three had only 87% identity, thus confirming the report of Srifah et al. (1996) that there is more than one CymMV population. A total of five plants yielded fragments of c. 880 bp with tobamovirus primers and when these fragments were sequenced they were found to have more than 97% identity to the same region of the genome of odontoglossum ringspot tobamovirus (Ryu and Park, 1994). A quarter of the plants yielded fragments of c. 1.7 kbp with the potyvirid primers, and sequencing has shown them to be of at least four species of potyvirus; commonest in species of the Epidendroideae, including Dendrobium spp., was a novel species we have called ceratobium mosaic virus (CerMV; Mackenzie et al., 1998). A total of three other species were detected in ground orchids; another novel potyvirus with 80% sequence identity to ornithogalum mosaic potyvirus (Burger et al., 1990; Brunt et al., 1996) was found in Diuris and Pterostylis spp., a strain of bean yellow mosaic potyvirus (Boye et al., 1992; Brunt et al., 1996) in Diuris spp., and another distinct potyvirus distantly related to tobacco etch potyvirus (Allison et al., 1985) was isolated from plants of Pleione spp. from an overseas collection.
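The check made in Section 3.2 that the amplified fragments are representative of the whole genomes (Fig. 3) rests on comparing two sets of pairwise differences. The sketch below shows the kind of calculation involved. It is not the DIPLOMO program: it assumes that the difference is the simple percentage of differing positions, skips positions where either sequence has a gap, and uses hypothetical function names and made-up sequences.

```python
from itertools import combinations
from math import sqrt


def percent_difference(seq_a, seq_b):
    """Percentage of compared positions that differ (gapped positions skipped)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * sum(a != b for a, b in pairs) / len(pairs)


def pairwise_differences(aligned, start=0, end=None):
    """Differences for every pair of sequences over columns start..end."""
    return [percent_difference(a[start:end], b[start:end])
            for a, b in combinations(aligned, 2)]


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up aligned 'genomes'; columns 4-11 stand in for the amplified fragment.
genomes = ["ACGTACGTACGT", "ACGTTCGAACGT", "TCGTTCCAAGGT", "ACGAACGTACCT"]
whole = pairwise_differences(genomes)
fragment = pairwise_differences(genomes, start=4, end=12)
print(correlation(whole, fragment))
```

A correlation close to 1 indicates that the fragment ranks the sequence pairs in much the same way as the complete genomes do.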

4. Discussion

Several computer programs have been described that aid the search for sequences to target with oligonucleotide primers, many of which merely follow rules to find such sequences automatically. Most aim to find pairs of primers targeting single sequences by comparing each part of the sequence with all others, while optimising criteria based on their composition, annealing temperature, and self-complementarity (Lucas et al., 1991; Rychlik, 1995). A few programs seek shared regions in sets of sequences, and most of these also follow the ‘all combinations’ strategy (Proutski and Holmes, 1996), and some require a ‘prototype’ or ‘template’ sequence to be defined (Lucas et al., 1991; Dopazo and Sobrino, 1993; Antoniw, 1995). The only programs known to us that are specifically designed to seek primer sequences in fully aligned sets of sequences are those described in this paper and the PROFILES program described by Rodríguez et al. (1992), who used it to design primers for distinguishing the serotypes of foot-and-mouth disease aphthovirus, and for distinguishing isolates of that species from other picornaviruses. PROFILES calculates ‘homology profiles’ by comparing all the aligned sequences with their ‘majority consensus’, and uses the profiles to identify shared or differentiating sequences. Both the GPRIME and PROFILES programs present the results graphically for their users to interpret; however, we believe that the lists and graphs of ‘redundancy value’ produced by the GPRIME programs are simpler to interpret, more directly related to the selection of primer sites, and immediately show the relatively few potential sites in variable sequences like those we have examined. Neither GPRIME nor PROFILES recognises and avoids repetitive regions in the sequences, although these would be found in the subsequent FASTA ‘specificity searches’. By contrast, the DOTPRIME program (Antoniw, 1995) recognises repeated sequences as it compares a single chosen reference sequence with the other sequences in the set, one at a time, using a version of the dot diagram method (Gibbs and McIntyre, 1970). It stores these sequences and their positions, so that those shared by all the chosen sequences can be identified. Thus the DOTPRIME program avoids the time-consuming work involved in aligning the sequences, and directly identifies repeated sequences; however, the generality of the target sequences found by DOTPRIME depends crucially on the choice of the reference sequence.

The target sequences identified by the GPRIME programs may be unnecessarily redundant as they contain all permutations of the nucleotides recorded at each position in the sequences. In reality it is unlikely that all combinations will occur in nature, so it may be useful, when a primer is synthesised, to decrease its redundancy at some positions. One way to accomplish this is to classify the aligned sequences, then design primer mixtures for each of the major lineages (subsets) using GPRIME Program 3, before mixing those primers.
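The saving obtained by designing primers lineage by lineage can be seen in a small worked example. The Python below is only an illustration with made-up sequences and a hypothetical function name: it compares the redundancy of a single primer mixture covering every sequence with the total number of oligonucleotides needed when each lineage has its own mixture.

```python
from math import prod


def redundancy(aligned):
    """Number of oligonucleotides needed to cover all observed combinations."""
    return prod(len(set(column)) for column in zip(*aligned))


# Two made-up lineages that differ from each other at three of four positions.
lineage_1 = ["ACGA", "ACGA"]
lineage_2 = ["TTGC", "TTGC"]

combined = redundancy(lineage_1 + lineage_2)
separate = redundancy(lineage_1) + redundancy(lineage_2)
print(combined, separate)
```

The single mixture contains every permutation of the three variable positions, most of which match neither lineage, whereas the two lineage-specific primers together cover the same sequences with far fewer oligonucleotides.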

Fig. 4. Gel stained with ethidium bromide showing the DNA fragments obtained by RT-PCR from extracts of 17 orchid plants using the potex, potyvirid and tobamo primer pairs; each gel has the RT-PCR products from each extract in the same position.

The sequences added to the 5′ ends of the virus-specific parts of the primers, together with the use of deoxyinosine at sites of 3-fold or greater redundancy, resulted in few non-specific PCR products resulting from mispriming, and they also enabled the PCR product to be sequenced directly (Mackenzie et al., 1998).

Our tests of orchids with the potex-, poty-, and tobamovirus RT-PCR primers found the two orchid viruses previously reported from Australia; cymbidium mosaic potexvirus was common, but odontoglossum ringspot tobamovirus was not. In addition the recently described ceratobium mosaic potyvirus was found to be common, and three other novel potyviruses were also found. However, we did not find any of the three potyviruses previously reported from orchids, and for which some genome sequence information is available, namely dendrobium mosaic detected in cultivated plants of Dendrobium × superbum Rchb. from two Hawaiian islands (Hu et al., 1995), vanilla necrosis isolated from Vanilla planifolia Jackson (syn. Vanilla fragrans Ames) from Tonga (Wang et al., 1993), and calanthe mild mosaic isolated from Calanthe spp. in Japan (Gara et al., 1998). No gene sequence data, nor distinctive host range data, have been reported for other orchid potyviruses, namely habenaria mosaic and pecteilis mosaic potyviruses in Japan, and vanilla mosaic potyvirus in French Polynesia (Brunt et al., 1996), and so it is unclear how closely they are related to the four novel orchid potyviruses we have found.

The international databases are now large and representative enough of some viral and cellular organisms to allow the data they contain to be used, in silico, both to design diagnostic primers and also to test their specificity, as we have reported here. We call this two-stage process ‘IT-based gene diagnostics’ (Littlejohn and Gibbs, 1997). As the databases increase still more in size, so too will the number of organisms that can be treated in this way, and the certainty with which the specificity of primers can be tested will increase.

References

Allison, R.F., Dougherty, W.G., Parks, T.D., Willis, L., Johnston, R.E., Kelly, M., Armstrong, F.B., 1985. Biochemical analysis of the capsid protein gene and capsid protein of tobacco etch virus: N-terminal amino acids are located on the virion’s surface. Virology 147, 309–316.

Antoniw, J., 1995. A new method for designing PCR primers specific for groups of sequences and its application to plant viruses. Mol. Biotech. 4, 111–119.

Boye, K., Stummann, B.M., Henningsen, K.W., 1992. cDNA cloning and sequencing of the bean yellow mosaic virus nuclear inclusion protein genes. Plant Mol. Biol. 18, 1203–1205.

Brunt, A.A., Crabtree, K., Dallwitz, M., Gibbs, A., Watson, L. (Eds.), 1996. Viruses of Plants: Descriptions and Lists from the VIDE Database. CAB International, UK, p. 1484.

Burger, J.T., Brand, R.J., Rybicki, E.P., 1990. The molecular cloning and nucleotide sequencing of the 3′-terminal region of Ornithogalum mosaic virus. J. Gen. Virol. 71, 2527–2534.

Dopazo, J., Sobrino, F., 1993. A computer program for the design of PCR primers for diagnosis of highly variable genomes. J. Virol. Methods 41, 157–166.

Figueira, A.R., Domier, L.L., D’Arcy, C.J., 1997. Comparison of techniques for detection of barley yellow dwarf virus-PAV-IL. Plant Disease 81, 1236–1240.

Gara, I., Kondo, H., Maeda, T., Inouye, N., Tamada, T., 1998. Calanthe mild mosaic virus, a new potyvirus causing mild mosaic disease of the Calanthe orchid in Japan. GenBank Accession Code AB011404, but otherwise unpublished.

Gibbs, A., Mackenzie, A., 1997. A primer pair for amplifying part of the genome of all potyvirids by RT-PCR. J. Virol. Methods 63, 9–16.

Gibbs, A.J., McIntyre, G.A., 1970. The diagram, a method for comparing sequences, its use with amino acid and nucleotide sequences. Eur. J. Biochem. 16, 1–11.

Higgins, D.G., Bleasly, A., Fuchs, R., 1991. CLUSTAL V: improved software for multiple sequence alignment. CABIOS 8, 151–153.

Hu, J.S., Ferreria, S., Wang, M., Borth, W.B., Mink, G., Jordan, R., 1995. Purification, host range, serology and partial sequencing of dendrobium mosaic potyvirus, a new member of the bean common mosaic virus subgroup. Phytopathology 85, 542–546.

Langeveld, S.A., Dore, J.-M., Memelink, J., Derks, A.F.L.M., van der Vlugt, C.I.M., Asjes, C.J., Bol, J.F., 1991. Identification of potyviruses using the polymerase chain reaction with degenerate primers. J. Gen. Virol. 72, 1531–1541.

Li, R.H., Wisler, G.C., Liu, H.-Y., Duffus, J.E., 1997. Comparison of diagnostic techniques for detecting tomato infectious chlorosis virus. Plant Disease 82, 84–88.

Littlejohn, T.G., Gibbs, A.J., 1997. IT-based diagnostics. Australas. Biotech. 7, 233–238.

Lucas, K., Busch, M., Mossinger, S., Thompson, J.A., 1991. An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. CABIOS 7, 525–529.

Mackenzie, A.M., Nolan, M., Wei, K.-J., Clements, M.A., Gowanlock, D., Wallace, B.J., Gibbs, A.J., 1998. Ceratobium mosaic potyvirus; another virus from orchids. Arch. Virol. 143, 1–12.

Pearson, W.R., 1990. Rapid and sensitive sequence comparison with FASTP and FASTA. Meth. Enzymol. 183, 63–98.

Pearson, W.R., Lipman, D.J., 1988. Improved tools for biological sequence analysis. Proc. Natl. Acad. Sci. USA 85, 2444–2448.

Proutski, V., Holmes, E.C., 1996. Primer Master: a new program for the design and analysis of PCR primers. CABIOS 12, 253–255.

Rodríguez, A., Martínez-Salas, E., Dopazo, J., Davila, M., Saiz, J.C., Sobrino, F., 1992. Primer design for specific diagnosis by PCR of highly variable RNA viruses: typing of foot-and-mouth disease virus. Virology 189, 363–367.

Rychlik, W., 1995. Selection of primers for polymerase chain reaction. Mol. Biotech. 3, 129–134.

Ryu, K.H., Park, W.M., 1994. Nucleotide sequence analysis of a cDNA clone encoding the 34K movement protein gene of odontoglossum ringspot virus ORSV-Cy, the Korean isolate. Plant Mol. Biol. 26, 995–999.

Srifah, P., Loprasert, S., Rungroj, N., 1996. Use of reverse transcription-polymerase chain reaction for cloning of coat protein-encoding genes of cymbidium mosaic virus. Gene 179, 105–107.

Stevens, M., Hull, R., Smith, H.G., 1997. Comparison of ELISA and RT-PCR for the detection of beet yellows closterovirus in plants and aphids. J. Virol. Methods 68, 9–16.

Wang, Y.Y., Beck, D.L., Gardner, R.C., Pearson, M.N., 1993. Nucleotide sequence, serology and symptomatology suggest that vanilla necrosis potyvirus is a strain of watermelon mosaic virus II. Arch. Virol. 129, 93–103.

Weiller, G.F., Gibbs, A., 1995. DIPLOMO: the tool for a new type of evolutionary analysis. CABIOS 11, 535–540.

Wong, S.M., Mahtani, P.H., Lee, K.C., Yu, H.H., Tan, Y., Neo, K.K., Chan, Y., Wu, M., Chng, C.G., 1997. Cymbidium mosaic potexvirus RNA: complete nucleotide sequence and phylogenetic analysis. Arch. Virol. 142, 383–391.
