44RR


  • 8/3/2019 44RR


    Factors influencing synonymous codon and amino acid usage biases in
    Aeromonas salmonicida phage 44RR

    A.P.Ghosh, A. Deb, and K. Sau*

    Department of Biotechnology, Haldia Institute of Technology, PO-HIT, Dist-Purba-Medinipur, Haldia, W. B.

    721657.

    To understand the structure-function relationships of the genes of A. salmonicida phage 44RR, synonymous
    codon and amino acid usage biases in this phage were investigated in detail. As expected for an AT-rich
    phage, the third codon position of the synonymous codons of 44RR carries mostly A and/or T. Analysis of
    the RSCU values of 44RR genes reveals that synonymous codon usage bias in 44RR is strongly dictated by
    mutational pressure. Further analysis reveals that 44RR-specific tRNAs preferentially influence the codon
    usage of the putatively lowly expressed genes of 44RR, whereas the codon usage of its putatively highly
    expressed genes is influenced mainly by the abundant cellular tRNAs. The data suggest that translational
    selection also plays a role in shaping the codon usage variation in 44RR. Additional analysis reveals that
    three factors, namely hydropathy, aromaticity and cysteine content, are mostly responsible for the
    variation of amino acid usage in 44RR proteins.

    Key words: Relative synonymous codon usage (RSCU); correspondence analysis; amino acid usage; phage 44RR.

    Introduction

    Synonymous codon and amino acid usage in living organisms has been shown to vary both between and
    within genomes. Codon usage in various organisms is influenced by mutational pressure (Levine et al.,
    2000; Jenkins et al., 2001; Jenkins and Holmes, 2003), translational selection (Grantham et al., 1981;
    Ikemura, 1985; Sharp and Cowe, 1991; Lesnik et al., 2000; Ghosh et al., 2000; Gupta and Ghosh, 2001),
    secondary structure of proteins (Oresisc and Shalloway, 1998; Xie and Ding, 1998; Chiusano et al., 2000;
    Gupta et al., 2000; D'Onofrio et al., 2002), replicational and transcriptional selection (McInerney,
    1998; Romero et al., 2000), and environmental factors (Lynn et al., 2002; Basak et al., 2004), among
    others. Amino acid usage, in contrast, was shown to be governed by hydrophobicity, aromaticity, mean
    molecular weight, cysteine content, etc. (Lobry and Gautier, 1994; Garat and Musto, 2000; Zavala et al.,
    2002; Banerjee et al., 2004; Naya et al., 2004). Apart from elucidating the structure-function
    relationships and evolution of genes and genomes, codon usage analysis in particular has a number of
    practical applications.

    *Corresponding author (K. Sau): [email protected]
    Tel: +91-3224-252900, Ext. 234. Fax: +91-3224-252800

    Synonymous codon and amino acid usage biases have been studied in detail in a few phage genomes, mainly
    those of coliphages, mycobacteriophages, S. aureus phages, and P. aeruginosa phages (Sharp et al.,
    1984-85; Gouy, 1987; Holm, 1987; Kunisawa, 1992; Kunisawa et al., 1998; Sahu et al., 2004; Sahu et al.,
    2005; Sau et al., 2005a; Sau et al., 2005b). Synonymous codon usage of the highly and lowly expressed
    genes in T4 was shown to be influenced by the abundant host tRNAs and by the phage's own tRNAs,
    respectively (Kunisawa, 1992). In phage T7, codon usage was also suggested to be influenced mainly by
    the abundant host tRNAs (Sharp et al., 1984-85). Codon usage in both T4 and T7 was described as being
    influenced mainly by mutational pressure (Kunisawa et al., 1998).


    Table 1. Overall codon usage analysis in 44RR.

                      44RR      44RR      44RR      A. salmonicida   44RR
    Amino             Overall   HEG*      LEG**     HEG              tRNA
    acid     Codon    RSCU      RSCU      RSCU      RSCU             copy
    Phe      UUU      0.88      0.77      0.94      0.46
             UUC      1.12      1.23      1.06      1.54             1
    Leu      UUA      0.58      0.66      0.72      0.03             1
             UUG      1.39      1.1       1.43      0.31
             CUU      1.32      0.62      1.39      0.17
             CUC      0.92      0.72      0.85      0.65
             CUA      0.39      0.31      0.49      0.00
             CUG      1.4       2.59      1.12      4.84
    Ile      AUU      1.19      1.1       1.12      0.54
             AUC      1.27      1.5       1.08      2.36             1
             AUA      0.54      0.4       0.8       0.11
    Met      AUG      1         1         1         1                2
    Val      GUU      2.02      2.2       1.98      0.94
             GUC      0.77      0.54      0.8       1.11
             GUA      0.75      0.72      0.78      0.62
             GUG      0.45      0.54      0.44      1.34
    Ser      UCU      1.31      1.33      1.26      1.04
             UCC      0.74      1.42      0.52      2.78
             UCA      1.49      1         1.7       0.11             1
             UCG      0.89      0.67      1.04      0.16
             AGU      0.68      0.71      0.62      0.38
             AGC      0.89      0.88      0.86      1.53             1
    Pro      CCU      1.04      1.15      1.01      0.48
             CCC      0.25      0.2       0.19      0.85
             CCA      1.43      0.85      1.59      0.24             1
             CCG      1.29      1.8       1.21      2.42
    Thr      ACU      1.25      1.46      1.22      0.62
             ACC      1.18      1.76      0.92      3.01
             ACA      1.06      0.46      1.31      0.29             1
             ACG      0.51      0.33      0.55      0.07
    Ala      GCU      1.5       1.98      1.5       1.11
             GCC      0.74      0.65      0.66      1.90
             GCA      1.09      1.02      1.05      0.36
             GCG      0.67      0.35      0.79      0.63
    Tyr      UAU      1.15      0.91      1.22      0.52
             UAC      0.85      1.09      0.78      1.48             1
    His      CAU      1.12      0.88      1.11      0.37
             CAC      0.88      1.12      0.89      1.63             1
    Gln      CAA      1.29      1.45      1.18      0.51
             CAG      0.71      0.55      0.82      1.49
    Asn      AAU      0.93      0.84      0.94      0.24
             AAC      1.07      1.16      1.06      1.76             1
    Lys      AAA      1.38      1.37      1.34      0.78             1
             AAG      0.62      0.63      0.66      1.22
    Asp      GAU      1.15      1.01      1.18      0.79
             GAC      0.85      0.99      0.82      1.21             1
    Glu      GAA      1.5       1.7       1.45      1.31
             GAG      0.5       0.3       0.55      0.69
    Cys      UGU      0.79      0.85      0.9       0.73
             UGC      1.21      1.15      1.1       1.27
    Trp      UGG      1         1         1         1                1
    Arg      CGU      1.89      1.89      1.63      3.04
             CGC      1.63      1.89      1.37      2.39
             CGA      1.16      0.94      1.31      0.07
             CGG      0.64      0.67      0.68      0.29
             AGA      0.49      0.33      0.78      0.07             1
             AGG      0.19      0.28      0.23      0.14
    Gly      GGU      1.68      2.1       1.38      1.75
             GGC      1.3       1.31      1.33      1.98
             GGA      0.74      0.44      0.93      0.05
             GGG      0.28      0.15      0.36      0.22

    Note: HEG and LEG denote highly and lowly expressed genes. The * and ** indicate the putatively highly
    and lowly expressed genes of phage 44RR, categorized respectively as the lowest 10% and highest 10% of
    the genes according to their Nc values.

    In phages lambda, N15, P2, and P4, synonymous codon usage patterns were found to be similar to that of
    the lowly expressed genes of E. coli (Holm, 1987; Gouy, 1987; Kunisawa, 1992; Kunisawa et al., 1998).
    In contrast, the codon usages of mycobacteriophages and S. aureus phages were found to be almost
    identical to those of their respective bacterial hosts (Kunisawa, 2000; Sahu et al., 2004; Sahu et al.,
    2005; Sau et al., 2005a). Mutational pressure and translational selection were shown to be the main
    influences on codon usage in mycobacteriophages and S. aureus phages (Sahu et al., 2004; Sahu et al.,
    2005; Sau et al., 2005a). While the tRNAs of mycobacteriophage Bxz1 were suggested to regulate the
    expression of both its highly and lowly expressed genes (Sahu et al., 2004), the tRNAs present in phages
    D29 and L5 were suggested to affect their amino acid usage to some extent (Kunisawa, 2000). Very
    recently, it was shown that in the AT-rich P. aeruginosa phage PhiKZ, codon usage bias is dictated
    mainly by mutational bias and, to an extent, by translational selection. Analysis also revealed that
    amino acid usage in PhiKZ proteins is mainly dictated by mean molecular weight, aromaticity and
    cysteine content (Sau et al., 2005b).


    Complete genome sequences of three Aeromonas phages, namely 44RR (synonym 44RR2.8t), phage 31, and
    Aeh1, are available in public databases at present. While the former two phages grow in A. salmonicida,
    Aeh1 grows in A. hydrophila (Tetart et al., 2001). The %GC content is nearly identical in all three
    phages. Thus far, codon and amino acid usage biases have not been studied in detail in Aeromonas phages,
    although such a study may shed light on the structure-function relationships and evolution of their
    protein-coding genes. In this communication, we have studied both synonymous codon and amino acid usage
    biases in A. salmonicida phage 44RR (Tetart et al., 2001), determined the factors responsible for these
    biases, and discussed the data in detail. Our data show that synonymous codon usage of 44RR genes is
    influenced by both mutational bias and translational selection, whereas amino acid usage of the 44RR
    proteins is mainly dictated by hydropathy, aromaticity and cysteine content.

    Fig. 1. Nc plot of phage 44RR genes. See text for details.

    Materials and Methods

    The genome sequence of bacteriophage 44RR (synonym 44RR2.8t) was downloaded from GenBank (USA), and its
    249 protein-coding sequences (each carrying 50 or more codons) were extracted from the genome with the
    coderet program (http://bioweb.Pasteur.fr/seqanal/interfaces/codret.html). The relative synonymous codon
    usage (RSCU) of all protein-coding sequences was determined to study the overall codon usage variation
    among the genes. RSCU is defined as the ratio of the observed frequency of a codon to the frequency
    expected if all synonymous codons for that amino acid were used equally (Sharp and Li, 1987). RSCU
    values greater than 1.0 indicate that the corresponding codon is used more frequently than expected,
    whereas the reverse is true for RSCU values less than 1.0. RSCU values of seven putatively highly
    expressed genes of A. salmonicida (e.g. genes encoding chaperonins, detoxification proteins, and outer
    membrane proteins) were also determined for comparison.
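The RSCU definition above can be illustrated with a short Python sketch. It is not the CodonW implementation used in this study; the function name `rscu` and the choice to skip amino acids that are absent from a sequence are assumptions made for illustration only.

```python
from collections import Counter

# Sense codons of the standard genetic code (RNA alphabet), grouped by amino acid.
SYNONYMOUS = {
    "Phe": ["UUU", "UUC"],
    "Leu": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "Ile": ["AUU", "AUC", "AUA"],
    "Met": ["AUG"],
    "Val": ["GUU", "GUC", "GUA", "GUG"],
    "Ser": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "Pro": ["CCU", "CCC", "CCA", "CCG"],
    "Thr": ["ACU", "ACC", "ACA", "ACG"],
    "Ala": ["GCU", "GCC", "GCA", "GCG"],
    "Tyr": ["UAU", "UAC"],
    "His": ["CAU", "CAC"],
    "Gln": ["CAA", "CAG"],
    "Asn": ["AAU", "AAC"],
    "Lys": ["AAA", "AAG"],
    "Asp": ["GAU", "GAC"],
    "Glu": ["GAA", "GAG"],
    "Cys": ["UGU", "UGC"],
    "Trp": ["UGG"],
    "Arg": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "Gly": ["GGU", "GGC", "GGA", "GGG"],
}


def rscu(cds):
    """Return {codon: RSCU} for one in-frame coding sequence (DNA or RNA).

    RSCU = observed codon count / (amino acid count / number of synonymous codons).
    Amino acids that do not occur in the sequence are skipped (an assumption).
    """
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[codon] for codon in family)
        if total == 0:
            continue
        expected = total / len(family)        # equal use of all synonymous codons
        for codon in family:
            values[codon] = counts[codon] / expected
    return values
```

With this definition the RSCU values of one amino acid always sum to its number of synonymous codons, which is why Met and Trp are fixed at 1 in Table 1.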

    GC3s is the frequency of (G+C), and A3s, T3s, G3s, and C3s are the frequencies of A, T, G, and C,
    respectively, at the synonymous third positions of codons. Nc, the effective number of codons used by a
    gene, is generally used to measure synonymous codon bias and is independent of amino acid composition
    and codon number (Wright, 1990). Nc values range from 20 (when only one codon is used per amino acid)
    to 61 (when all codons are used with equal probability). Nc values were calculated according to the
    method of Banerjee et al. (2005). Putatively highly and lowly expressed genes were defined as the 10%
    of genes with the lowest and the 10% with the highest Nc values, respectively. The program CodonW 1.3
    (available at www.molbio.ox.ac.uk/cu) was used to calculate most of the parameters and to perform
    correspondence analysis (CA) on the relative synonymous codon and amino acid usages.
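As a rough illustration of two of the quantities defined above, the sketch below computes GC3s for a single coding sequence and splits genes into putative expression classes by Nc. The function names, the exclusion of Met, Trp and stop codons from the third-position count, and the rounding used for the 10% cut-off are assumptions; the paper does not state how the 10% of 249 genes was rounded, and Nc itself is taken as a precomputed input.

```python
# Codons without synonymous alternatives (Met, Trp) and stop codons are excluded
# from the third-position count; this is an assumption about the GC3s definition.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons, or None."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    thirds = [
        seq[i + 2]
        for i in range(0, usable, 3)
        if seq[i:i + 3] not in NON_SYNONYMOUS
    ]
    if not thirds:
        return None
    return sum(base in "GC" for base in thirds) / len(thirds)


def split_by_nc(nc_by_gene, fraction=0.10):
    """Return (putative_high, putative_low) gene lists from {gene: Nc}.

    The lowest-Nc fraction is treated as putatively highly expressed and the
    highest-Nc fraction as putatively lowly expressed.
    """
    ranked = sorted(nc_by_gene, key=nc_by_gene.get)
    k = max(1, round(len(ranked) * fraction))  # rounding rule is an assumption
    return ranked[:k], ranked[-k:]
```

For example, `split_by_nc` applied to a dictionary of 249 Nc values would return 25 genes in each class under this rounding rule.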


    In correspondence analysis, the genes are plotted in a multidimensional space of 59 axes (one per sense
    codon, excluding Met, Trp and stop codons), and the analysis then determines the most prominent axes
    contributing to the codon usage variation among the genes. In the present study, RSCU values were used
    for CA in order to minimize the effect of amino acid composition.
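For readers unfamiliar with correspondence analysis, the following pure-Python sketch shows how the first major axis and its share of the total variation (inertia) can be obtained from a genes-by-codons table. It is not CodonW's implementation: the function name is hypothetical, power iteration is used in place of a full decomposition, and every row and column of the input is assumed to have a positive sum.

```python
import math


def correspondence_axis1(table, iterations=200):
    """Return (fraction_of_inertia, row_coordinates) for the first CA axis.

    `table` is a list of rows (genes) of non-negative numbers (e.g. RSCU values);
    every row and every column is assumed to have a positive sum.
    """
    n_rows, n_cols = len(table), len(table[0])
    grand = sum(map(sum, table))
    p = [[x / grand for x in row] for row in table]
    r = [sum(row) for row in p]                                    # row masses
    c = [sum(p[i][j] for i in range(n_rows)) for j in range(n_cols)]  # column masses
    # Standardized residuals from the independence model.
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    total_inertia = sum(x * x for row in s for x in row)

    def times_s(v):
        return [sum(s[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]

    def times_st(u):
        return [sum(s[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]

    # Power iteration on S^T S; a non-uniform start avoids the trivial null vector.
    v = [1.0 / (j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        w = times_st(times_s(v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:                       # no variation at all in the table
            return 0.0, [0.0] * n_rows
        v = [x / norm for x in w]

    sv = times_s(v)
    inertia1 = sum(x * x for x in sv)         # first principal inertia
    coords = [sv[i] / math.sqrt(r[i]) for i in range(n_rows)]  # principal coordinates
    return inertia1 / total_inertia, coords
```

The returned fraction corresponds to the kind of "percentage of total variation" reported for each axis, and the row coordinates are the gene positions that would be plotted along axis 1. Power iteration converges slowly when the first two axes explain similar amounts of variation, so a library routine based on singular value decomposition is preferable for real data.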

    Results and Discussion

    Overall codon usage analysis in A. salmonicida phage 44RR.

    The relative synonymous codon usage (RSCU) values determined for all 249 protein-coding genes of 44RR
    show that A- and/or T-ending codons are predominant in this phage (Table 1). This is expected, as the
    GC content of 44RR is 43.66%. As analysis of overall RSCU alone is not sufficient to reveal the
    heterogeneity of codon usage among 44RR genes, we also determined the effective number of codons used
    by each gene (Nc) and the (G+C) percentage at the synonymous third positions of codons (GC3s). In 44RR,
    Nc values range from 26.486 to 55.241 with a mean of 40.025 and a standard deviation (s.d.) of 6.248,
    whereas GC3s ranges from 0.234 to 0.577 with a mean of 0.414 and an s.d. of 0.056. Taken together, the
    results suggest that, apart from mutational bias, other factors may also influence the codon usage
    variation among 44RR genes.

    Effect of mutational pressure on codon usage variation in 44RR.

    It was suggested that a plot of Nc vs GC3s can be used effectively to explore the codon usage variation
    among genes (Wright, 1990). According to Wright (1990), comparing the actual distribution of genes with
    the distribution expected under no selection can indicate whether the codon usage bias of genes is
    influenced by factors other than mutational bias. If codon usage bias is completely dictated by GC3s,
    the Nc values should fall on the expected curve relating GC3s and Nc. In other words, the difference
    between observed and expected Nc values should be very small for the majority of genes. To explore the
    possible influence of natural selection and mutational bias on synonymous codon usage in the 44RR
    genome, we calculated (NcExpected - NcObserved)/NcExpected for each gene. The frequency distribution of
    (NcExpected - NcObserved)/NcExpected shown in Fig. 1 demonstrates that the majority of genes have a
    large deviation of NcObserved from NcExpected. This suggests that the majority of genes in 44RR have
    additional codon usage bias that is independent of mutational bias.

    Fig. 2. Positions of the 44RR genes along the two major axes of variation in the correspondence
    analysis on RSCU values. The genes are represented by open circles.
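A minimal sketch of the (NcExpected - NcObserved)/NcExpected calculation follows. The text does not spell out the expected curve, so the sketch assumes the approximation commonly attributed to Wright (1990), Nc = 2 + s + 29/(s^2 + (1 - s)^2) with s = GC3s; the function names are hypothetical.

```python
def expected_nc(gc3s):
    """Expected Nc under no selection for a given GC3s (assumed Wright 1990 curve)."""
    s = gc3s
    return 2.0 + s + 29.0 / (s * s + (1.0 - s) ** 2)


def relative_nc_deviation(nc_observed, gc3s):
    """(NcExpected - NcObserved) / NcExpected for one gene."""
    nc_exp = expected_nc(gc3s)
    return (nc_exp - nc_observed) / nc_exp
```

Purely as an illustration, plugging in the mean values reported above (GC3s = 0.414, Nc = 40.025) gives a relative deviation of roughly 0.32. This is not the mean of the per-gene deviations, which would require the individual gene values.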


    Table 2. Relative synonymous codon usage (RSCU) values of each codon for the two groups of genes in
    phage 44RR. An asterisk marks codons whose occurrence is significantly (p < 0.01) higher in genes at
    the extreme left of axis 1 than in genes at the extreme right of that axis. Superscript "a" denotes
    genes at the extreme left of axis 1 and "b" genes at the extreme right. Each group contains 10% of the
    sequences at either extreme of the first major axis generated by correspondence analysis. N is the
    number of codons; AA represents amino acid.

    AA   Codon  RSCUa  Na     RSCUb  Nb       AA   Codon  RSCUa  Na     RSCUb  Nb
    Phe  UUU    0.50   (46)   1.10   (67)     Ser  UCU*   1.57   (73)   0.80   (22)
         UUC*   1.50   (137)  0.90   (55)          UCC*   1.91   (89)   0.33   (9)
    Leu  UUA    0.02   (1)    1.20   (36)          UCA    0.43   (20)   1.87   (51)
         UUG    0.66   (36)   1.47   (44)          UCG    0.37   (17)   1.39   (38)
         CUU    0.66   (36)   1.17   (35)     Pro  CCU    1.36   (44)   1.00   (25)
         CUC    0.77   (42)   0.67   (20)          CCC    0.28   (9)    0.36   (9)
         CUA    0.07   (4)    0.80   (24)          CCA    0.71   (23)   1.44   (36)
         CUG*   3.82   (208)  0.70   (21)          CCG    1.64   (53)   1.20   (30)
    Ile  AUU    0.94   (92)   1.07   (69)     Thr  ACU*   1.46   (83)   0.88   (37)
         AUC*   2.02   (199)  0.82   (53)          ACC*   2.13   (121)  1.00   (42)
         AUA    0.04   (4)    1.10   (71)          ACA    0.26   (15)   1.67   (70)
    Met  AUG    1.00   (145)  1.00   (83)          ACG    0.14   (8)    0.45   (19)
    Val  GUU    2.07   (171)  1.87   (77)     Ala  GCU*   1.92   (194)  1.30   (47)
         GUC    0.78   (64)   0.70   (29)          GCC    0.85   (86)   0.72   (26)
         GUA    0.92   (76)   0.68   (28)          GCA    0.87   (88)   1.32   (48)
         GUG    0.23   (19)   0.75   (31)          GCG    0.36   (36)   0.66   (24)
    Tyr  UAU    0.73   (47)   1.20   (71)     Cys  UGU    0.40   (7)    0.69   (10)
         UAC*   1.27   (82)   0.80   (47)          UGC    1.60   (28)   1.31   (19)
                                              Trp  UGG    1.00   (48)   1.00   (54)
    His  CAU    0.88   (26)   1.18   (36)     Arg  CGU*   2.71   (90)   0.73   (16)
         CAC    1.12   (33)   0.82   (25)          CGC*   2.38   (79)   0.87   (19)
    Gln  CAA    1.41   (138)  1.38   (64)          CGA    0.54   (18)   1.37   (30)
         CAG    0.59   (58)   0.62   (29)          CGG    0.27   (9)    0.64   (14)
    Asn  AAU    0.59   (63)   1.06   (74)     Ser  AGU    0.60   (28)   0.84   (23)
         AAC*   1.41   (151)  0.94   (66)          AGC    1.12   (52)   0.77   (21)
    Lys  AAA    1.24   (210)  1.33   (110)    Arg  AGA    0.06   (2)    1.69   (37)
         AAG    0.76   (129)  0.67   (55)          AGG    0.03   (1)    0.69   (15)
    Asp  GAU    1.22   (171)  1.06   (75)     Gly  GGU*   2.21   (162)  1.43   (57)
         GAC    0.78   (110)  0.94   (66)          GGC    1.54   (113)  1.38   (55)
    Glu  GAA    1.58   (289)  1.40   (125)         GGA    0.19   (14)   0.98   (39)
         GAG    0.42   (76)   0.60   (54)          GGG    0.05   (4)    0.20   (8)


    Next, we carried out correspondence analysis (CA) on the RSCU values of the 249 protein-coding genes of
    phage 44RR in order to determine the factors influencing codon usage bias in 44RR. Figure 2 shows the
    distribution of 44RR genes along the first two major axes of the correspondence analysis. The first
    major axis accounts for 10.13% of the total variation and the second major axis accounts for 5.20%. The
    position of the genes along the first major axis is negatively correlated with A3s (r = -0.575, p


    Fig. 3. GRAVY score of each amino acid residue plotted against its axis 1 value from the correspondence
    analysis. Single-letter codes show the positions of the 20 amino acid residues. Amino acid residues are
    represented by closed circles.

    The cellular tRNA abundance is positively correlated with the over-represented codons of highly
    expressed genes in several organisms (Grantham et al., 1981; Sharp et al., 1984-85; Gouy, 1987;
    Ikemura, 1992; Zhou et al., 1999; Kanaya et al., 1999; Kanaya et al., 2001). In coliphage T4, synonymous
    codon usage of the highly expressed genes was also shown to be positively correlated with the abundant
    cellular tRNAs (Kunisawa, 1992). This may also be true for the highly expressed genes of 44RR. As tRNA
    copy numbers are not known for A. salmonicida at present, we determined the RSCU values of the
    putatively highly expressed genes of A. salmonicida and compared them with those of the highly
    expressed genes of 44RR (Table 1). Among the 19 over-represented codons found in the highly expressed
    genes of each of 44RR and A. salmonicida, 13 are common to both. This indicates that nearly 68% (13 of
    19) of the over-represented codons of the highly expressed genes of 44RR are recognized by abundant
    host tRNAs.

    Amino Acid Usage in 44RR.

    To identify the factors influencing the amino acid composition in 44RR, we also carried out CA on the
    relative amino acid usage of its 249 proteins. The first and second major axes of CA account for 18.66%
    and 10.91% of the total variation in the amino acid composition of 44RR proteins, respectively. Further
    analysis revealed that the first major axis is positively correlated with the GRAVY score (r = 0.410,
    p < 0.01) of each 44RR protein. A plot of the GRAVY score of each amino acid residue versus the first
    major axis in fact shows that charged residues such as Lys, Arg, His, Asp, and Glu are located on the
    negative side of the first axis (Fig. 3). Further analysis has shown that the first major axis is also
    significantly correlated (r = -0.404, p
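The GRAVY score referred to in this section can be sketched in a few lines of Python. The sketch assumes that GRAVY is the mean Kyte-Doolittle hydropathy of a protein's residues, which is the usual definition but is not stated explicitly in the surviving text; the function name and the decision to ignore non-standard residue symbols are likewise assumptions.

```python
# Kyte-Doolittle hydropathy index per residue (single-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy of a protein sequence; unknown symbols are ignored."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not scores:
        raise ValueError("sequence contains no standard amino acid residues")
    return sum(scores) / len(scores)
```

On this scale the charged residues Lys, Arg, His, Asp and Glu all have strongly negative values, which is consistent with their position on the negative side of an axis that correlates positively with GRAVY.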


    Naya, H., Zavala, A., et al. 2004. Biochem. Biophys. Res. Commun. 325: 1252-7.
    Oresisc, M. and Shalloway, D. 1998. J. Mol. Biol. 281: 31-48.
    Romero, H., Zavala, A., et al. 2000. Nucl. Acids Res. 28: 2084-2090.
    Sahu, K., Gupta, S. K., et al. 2004. J. Biochem. Mol. Biol. 37: 487-492.
    Sahu, K., Gupta, S. K., Sau, S., Ghosh, T. C. 2005. J. Biomol. Struct. Dyn. 23: 63-71.
    Sau, K., Gupta, S. K., et al. 2005a. Virus Res. 113: 123-31.
    Sau, K., Sau, S., et al. 2005b. Acta Biochim. Biophys. Sin. (Shanghai) 37: 625-33.
    Sharp, P. M., Rogers, M. S., et al. 1984-85. Nucl. Acids Res. 21: 150-60.
    Sharp, P. M. and Li, W. H. 1987. Nucleic Acids Res. 15: 1281-1295.
    Sharp, P. M. and Cowe, E. 1991. Yeast 7: 657-678.
    Tetart, F., Desplats, C., et al. 2001. J. Bacteriol. 183: 358-66.
    Wright, F. 1990. Gene 87: 23-29.
    Xie, T. and Ding, D. F. 1998. FEBS Lett. 434: 93-96.
    Zhou, J., Liu, W. J., et al. 1999. J. Virol. 73: 4972-4982.
    Zavala, A., Naya, H., Romero, H. and Musto, H. 2002. J. Mol. Evol. 54: 563-8.
