8/3/2019 44RR
1/30
Factors influencing synonymous codon and amino acid usage biases in
Aeromonas salmonicida phage 44RR
A.P.Ghosh, A. Deb, and K. Sau*
Department of Biotechnology, Haldia Institute of Technology, PO-HIT, Dist-Purba-Medinipur, Haldia, W. B.
721657.
To reveal the structure-function of the genes of A. salmonicida phage 44RR, synonymous codon and
amino acid usage biases in this phage have been investigated at length. As expected for an AT-rich phage,
the third codon position of the synonymous codons of 44RR carries mostly an A and/or T base. Analyses of the
RSCU values of 44RR genes reveal that synonymous codon usage bias in 44RR is strongly dictated by
mutational pressure. Further analysis reveals that 44RR-specific tRNAs preferentially influence the codon
usage of putatively lowly expressed genes of 44RR, whereas the codon usage of its putatively highly expressed
genes is influenced mainly by the abundant cellular tRNAs. The data suggest that translational selection
also plays a role in shaping the codon usage variation in 44RR. Additional analysis reveals that three
factors, namely hydropathy, aromaticity and cysteine content, are mostly responsible for the variation of
amino acid usage in 44RR proteins.
Key words: Relative synonymous codon usage (RSCU); correspondence analysis; amino acid usage; phage 44RR.
Introduction
Synonymous codon and amino acid usages in living organisms have been shown to vary
inter- as well as intra-genomically. While mutational pressure (Levine et al., 2000;
Jenkins et al., 2001; Jenkins and Holmes, 2003), translational selection (Grantham et al.,
1981; Ikemura, 1985; Sharp and Cowe, 1991; Lesnik et al., 2000; Ghosh et al., 2000; Gupta
and Ghosh, 2001), secondary structure of proteins (Oresisc and Shalloway, 1998; Xie and
Ding, 1998; Chiusano et al., 2000; Gupta et al., 2000; D'Onofrio et al., 2002),
replicational and transcriptional selection (McInerney, 1998; Romero et al., 2000),
environmental factors (Lynn et al., 2002; Basak et al., 2004), etc. influence the codon
usage in various organisms, amino acid usage was shown to be governed by hydrophobicity,
aromaticity, mean molecular weight, cysteine content, etc. (Lobry and Gautier, 1994; Garat
and Musto, 2000; Zavala et al., 2002; Banerjee et al., 2004; Naya et al., 2004). Apart
from elucidating the structure-function and evolution of genes / genomes, codon usage in
particular has a number of practical applications.
*Corresponding author:
(K. Sau): [email protected]
Tel: +91-3224-252900. Ext. 234. Fax: +91-3224-252800
Synonymous codon and amino acid usage biases have been studied in detail in a few phage
genomes, mainly belonging to different coliphages, mycobacteriophages, S. aureus phages,
and P. aeruginosa phages (Sharp et al., 1984-85; Gouy, 1987; Holm, 1987; Kunisawa, 1992;
Kunisawa et al., 1998; Sahu et al., 2004; Sahu et al., 2005; Sau et al., 2005a; Sau et
al., 2005b). Synonymous codon usage of highly and lowly expressed genes in T4 was shown
to be influenced by the abundant host tRNAs and the phage's own tRNAs, respectively
(Kunisawa, 1992). In phage T7, codon usage was also suggested to be influenced mainly by
the abundant host tRNAs (Sharp et al., 1984-85). It was described that codon usage in
both T4 and T7 is influenced mainly by mutational pressure (Kunisawa et al., 1998).
Table 1. Overall codon usage analysis in 44RR.

                 44RR      44RR     44RR     A. salmonicida   44RR
Amino            Overall   HEG*     LEG**    HEG              tRNA
acid    Codon    RSCU      RSCU     RSCU     RSCU             copy
Phe     UUU      0.88      0.77     0.94     0.46
        UUC      1.12      1.23     1.06     1.54             1
Leu     UUA      0.58      0.66     0.72     0.03             1
        UUG      1.39      1.1      1.43     0.31
        CUU      1.32      0.62     1.39     0.17
        CUC      0.92      0.72     0.85     0.65
        CUA      0.39      0.31     0.49     0.00
        CUG      1.4       2.59     1.12     4.84
Ile     AUU      1.19      1.1      1.12     0.54
        AUC      1.27      1.5      1.08     2.36             1
        AUA      0.54      0.4      0.8      0.11
Met     AUG      1         1        1        1                2
Val     GUU      2.02      2.2      1.98     0.94
        GUC      0.77      0.54     0.8      1.11
        GUA      0.75      0.72     0.78     0.62
        GUG      0.45      0.54     0.44     1.34
Ser     UCU      1.31      1.33     1.26     1.04
        UCC      0.74      1.42     0.52     2.78
        UCA      1.49      1        1.7      0.11             1
        UCG      0.89      0.67     1.04     0.16
        AGU      0.68      0.71     0.62     0.38
        AGC      0.89      0.88     0.86     1.53             1
Pro     CCU      1.04      1.15     1.01     0.48
        CCC      0.25      0.2      0.19     0.85
        CCA      1.43      0.85     1.59     0.24             1
        CCG      1.29      1.8      1.21     2.42
Thr     ACU      1.25      1.46     1.22     0.62
        ACC      1.18      1.76     0.92     3.01
        ACA      1.06      0.46     1.31     0.29             1
        ACG      0.51      0.33     0.55     0.07
Ala     GCU      1.5       1.98     1.5      1.11
        GCC      0.74      0.65     0.66     1.90
        GCA      1.09      1.02     1.05     0.36
        GCG      0.67      0.35     0.79     0.63
Tyr     UAU      1.15      0.91     1.22     0.52
        UAC      0.85      1.09     0.78     1.48             1
His     CAU      1.12      0.88     1.11     0.37
        CAC      0.88      1.12     0.89     1.63             1
Gln     CAA      1.29      1.45     1.18     0.51
        CAG      0.71      0.55     0.82     1.49
Asn     AAU      0.93      0.84     0.94     0.24
        AAC      1.07      1.16     1.06     1.76             1
Lys     AAA      1.38      1.37     1.34     0.78             1
        AAG      0.62      0.63     0.66     1.22
Asp     GAU      1.15      1.01     1.18     0.79
        GAC      0.85      0.99     0.82     1.21             1
Glu     GAA      1.5       1.7      1.45     1.31
        GAG      0.5       0.3      0.55     0.69
Cys     UGU      0.79      0.85     0.9      0.73
        UGC      1.21      1.15     1.1      1.27
Trp     UGG      1         1        1        1                1
Arg     CGU      1.89      1.89     1.63     3.04
        CGC      1.63      1.89     1.37     2.39
        CGA      1.16      0.94     1.31     0.07
        CGG      0.64      0.67     0.68     0.29
        AGA      0.49      0.33     0.78     0.07             1
        AGG      0.19      0.28     0.23     0.14
Gly     GGU      1.68      2.1      1.38     1.75
        GGC      1.3       1.31     1.33     1.98
        GGA      0.74      0.44     0.93     0.05
        GGG      0.28      0.15     0.36     0.22

Note- HEG and LEG denote highly and lowly expressed genes. The * and ** indicate putatively highly and lowly
expressed genes of phage 44RR, which have been categorized respectively on the basis of lowest 10% and highest
10% of the genes according to their Nc values.
In phages lambda, N15, P2, and P4, synonymous codon usage patterns were found to be
nearly similar to that of the lowly expressed genes of E. coli (Holm, 1987; Gouy, 1987;
Kunisawa, 1992; Kunisawa et al., 1998). In contrast, codon usages of mycobacteriophages
and S. aureus phages were found to be almost identical to those of their respective
bacterial hosts (Kunisawa, 2000; Sahu et al., 2004; Sahu et al., 2005; Sau et al.,
2005a). It was shown that mutational pressure and translational selection mainly
influence the codon usages in mycobacterial and S. aureus phages (Sahu et al., 2004;
Sahu et al., 2005; Sau et al., 2005a). While tRNAs of mycobacteriophage Bxz1 were
suggested to regulate the expression of both its highly and lowly expressed genes (Sahu
et al., 2004), tRNAs present in phages D29 and L5 were suggested to affect their amino
acid usage to some extent (Kunisawa, 2000). Very recently, it was shown that in the
AT-rich P. aeruginosa phage PhiKZ, codon usage bias is dictated mainly by mutational bias
and, to an extent, by translational selection. Analysis also revealed that amino acid
usage in PhiKZ proteins is mainly dictated by mean molecular weight, aromaticity and
cysteine content (Sau et al., 2005b).
Complete genome sequences of three Aeromonas phages, namely 44RR (synonym 44RR2.8t),
phage 31, and Aeh1, are available in public databases at present. While the former two
phages grow in A. salmonicida, Aeh1 grows in A. hydrophila (Tetart et al., 2001). The
%GC content is nearly identical in all three phages. Thus far, codon and amino acid
usage biases have not been studied at length in Aeromonas phages, though such a study
may reveal the structure-function of protein-coding genes, evolution etc.
Fig. 1. Nc plot of phage 44RR genes. See text for details.
In this communication, we have studied both synonymous codon and amino acid usage biases
in A. salmonicida phage 44RR (Tetart et al., 2001), determined the factors responsible
for codon and amino acid usage biases in 44RR, and discussed the data in detail. Our
data show that synonymous codon usage of 44RR genes is influenced by both mutational
bias and translational selection, whereas amino acid usage of the 44RR proteins is
mainly dictated by hydropathy, aromaticity and cysteine content.
Materials and Methods
The genome sequence of bacteriophage 44RR (synonym 44RR2.8t) was downloaded from GenBank
(USA) and its two hundred forty nine protein coding sequences (carrying 50 or more
codons) were extracted from the genome by the coderet
(http://bioweb.Pasteur.fr/seqanal/interfaces/codret.html) program. The relative
synonymous codon usage (RSCU) in all protein coding sequences was determined to study
the overall codon usage variation among the genes. RSCU is defined as the ratio of the
observed frequency of a codon to the frequency expected if all the synonymous codons for
that amino acid are used equally (Sharp and Li, 1987). RSCU values greater than 1.0
indicate that the corresponding codon is more frequently used than expected, whereas the
reverse is true for RSCU values less than 1.0. RSCU values in seven putatively highly
expressed genes (e.g. genes encoding chaperonins, detoxification, and outer membrane
proteins etc.) of A. salmonicida were also determined in this paper for comparison.
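The RSCU definition given above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CodonW code used in this study: the function name `rscu` is hypothetical, the standard genetic code is assumed, and codons are written with DNA letters (T in place of U). The Phe counts in the example are those of the left-hand gene group in Table 2.

```python
from collections import defaultdict

# Standard genetic code in TCAG order (assumption: no alternative code is needed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(codon_counts):
    """Return RSCU per codon: observed count / (family total / family size)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":                      # stop codons are ignored
            families[aa].append(codon)
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:                     # amino acid absent: RSCU undefined
            continue
        expected = total / len(codons)     # count under equal usage
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values

# Phe codon counts from Table 2 (genes at the extreme left of axis 1).
phe = rscu({"TTT": 46, "TTC": 137})
print({codon: round(value, 2) for codon, value in phe.items()})
```

Single-codon amino acids (Met, Trp) always give an RSCU of 1, which is why they carry no information in the later analyses.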
GC3s is the frequency of (G+C), and A3s, T3s, G3s, and C3s are the frequencies of A, T,
G and C, at the synonymous third positions of codons. Nc, the effective number of codons
used by a gene, is generally used to measure the bias of synonymous codons and is
independent of amino acid composition and codon number (Wright, 1990). The values of Nc
range from 20 (when one codon is used per amino acid) to 61 (when all the codons are
used with equal probability). Nc values were calculated according to the method of
Banerjee et al. (2005). The putatively highly and lowly expressed genes have been
categorized respectively on the basis of the lowest 10% and highest 10% of the genes
according to their Nc values. The program CodonW 1.3 (available at
www.molbio.ox.ac.uk/cu) was used for calculating most of the parameters, including
correspondence analysis (CA) on the relative synonymous codon and amino acid usages.
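As a rough illustration of two of the quantities defined above, the following Python sketch computes GC3s for one coding sequence and splits genes into putative expression groups by Nc. It is not the CodonW implementation; the function names, the gene identifiers and the Nc values in the example are hypothetical, sequences are assumed to be in frame and written with DNA letters, and Nc itself is taken as already computed.

```python
# Codons with no synonymous alternative (Met, Trp) and stop codons are skipped.
NON_SYNONYMOUS = {"ATG", "TGG", "TAA", "TAG", "TGA"}

def gc3s(cds):
    """Fraction of G or C at the third position of synonymous codons."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3                 # drop an incomplete last codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    third = [c[2] for c in codons if c not in NON_SYNONYMOUS]
    if not third:
        return 0.0
    return sum(base in "GC" for base in third) / len(third)

def split_by_nc(nc_by_gene, fraction=0.10):
    """Return (lowest-Nc genes, highest-Nc genes), each `fraction` of the set.

    Low Nc means strong codon bias, taken here as putatively highly expressed.
    """
    ranked = sorted(nc_by_gene, key=nc_by_gene.get)
    k = max(1, round(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]

# Hypothetical example: 20 genes with made-up Nc values.
example_nc = {f"gene{i}": 30.0 + i for i in range(20)}
heg, leg = split_by_nc(example_nc)
print(gc3s("ATGGCTGCCTAA"), heg, leg)
```

With the 249 genes analysed here, a 10% fraction corresponds to about 25 genes in each group.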
In correspondence analysis, the data are plotted in a multidimensional space of 59 axes
(excluding Met, Trp and stop codons), and the analysis then determines the most
prominent axes contributing to the codon usage variation among the genes. In the present
study, RSCU values have been used for CA in order to minimize the effect of amino acid
composition.
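The idea behind correspondence analysis can be sketched as follows. This is a simplified, pure-Python illustration of textbook CA (standardized residuals followed by extraction of the leading axis by power iteration), not the procedure implemented in CodonW, whose details may differ. The function name `ca_first_axis` and the small input table are hypothetical, and every row and column of the table is assumed to have a positive sum.

```python
import math

def ca_first_axis(table, iterations=500):
    """Return (row coordinates on axis 1, fraction of total inertia on axis 1).

    `table` is a genes x codons matrix of non-negative values (e.g. RSCU).
    """
    total = sum(map(sum, table))
    P = [[v / total for v in row] for row in table]
    r = [sum(row) for row in P]                      # row masses
    c = [sum(col) for col in zip(*P)]                # column masses
    n_rows, n_cols = len(r), len(c)
    # Standardized residuals from the independence model r_i * c_j.
    S = [[(P[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    inertia = sum(x * x for row in S for x in row)   # total variation
    # Power iteration on S^T S to find the leading column-side direction.
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        u = [sum(s * x for s, x in zip(row, v)) for row in S]            # S v
        w = [sum(S[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]                                     # S^T (S v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(s * x for s, x in zip(row, v)) for row in S]
    eigenvalue = sum(x * x for x in u)               # inertia carried by axis 1
    coords = [u[i] / math.sqrt(r[i]) for i in range(n_rows)]
    return coords, eigenvalue / inertia

# Hypothetical 4-gene x 3-codon table with two contrasting groups of genes.
coords, fraction = ca_first_axis([[10, 1, 5], [9, 2, 4], [1, 10, 5], [2, 9, 6]])
print(coords, fraction)
```

The returned fraction plays the same role as the percentage of total variation attributed to an axis in the text; the sign of the coordinates is arbitrary, only the relative positions of the genes matter.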
Results and Discussion
Overall codon usage analysis in A.
salmonicida phage 44RR.
The relative synonymous codon usage (RSCU) values determined in all the 249 protein
coding genes of 44RR show that A and/or T ending codons are predominant in this phage
(Table 1). This is expected as the %GC content in 44RR is 43.66. As analysis of overall
RSCU alone is not sufficient to reveal the heterogeneity of codon usage in 44RR genes,
we also determined the effective number of codons used by a gene (Nc) and the (G+C)
percentage at the synonymous third positions of codons (GC3s). It was observed that in
44RR, Nc values range from 26.486 to 55.241 with a mean of 40.025 and standard deviation
(s.d.) of 6.248, whereas GC3s ranges from 0.234 to 0.577 with a mean of 0.414 and s.d.
of 0.056. Taken together, the results suggest that apart from the mutational bias, other
factors might have some influence on the codon usage variation among 44RR genes.
Effect of mutational pressure in codon usage variation in 44RR.
It was suggested that a plot of Nc vs GC3s could effectively be used to explore the
codon usage variation among the genes (Wright, 1990). According to Wright (1990), the
comparison of the actual distribution of genes with the expected distribution under no
selection could indicate whether the codon usage bias of genes has influences other than
mutational bias. If the codon usage bias is completely dictated by GC3s, the values of
Nc should fall on the expected curve between GC3s and Nc. In other words, if codon usage
bias is completely dictated by GC3s composition, the difference between observed and
expected Nc values should be very small in the majority of genes.
Fig. 2. Positions of the 44RR genes along the two major axes of variation in the
correspondence analysis on RSCU values. The genes are represented by the open circles.
To explore the possible influence of natural selection and mutational bias on synonymous
codon usage in the 44RR genome, we calculated (NcExpected - NcObserved)/NcExpected. The
frequency distribution of (NcExpected - NcObserved)/NcExpected shown in Fig. 1
demonstrates that the majority of genes have a large deviation of NcObserved from
NcExpected. This suggests that the majority of genes in 44RR have additional codon usage
bias, which is independent of mutational bias.
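The deviation measure used above can be written out explicitly. The sketch below assumes that NcExpected is taken from the null curve of Wright (1990), Nc = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3s; the text does not spell out the formula, so this choice, like the function names, is an assumption. The example feeds in the mean Nc and mean GC3s reported above purely as an illustration, since the two means do not describe any single gene.

```python
def expected_nc(gc3s):
    """Expected Nc under GC3s-driven mutational bias alone (Wright, 1990)."""
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)

def nc_deviation(nc_observed, gc3s):
    """(NcExpected - NcObserved) / NcExpected for one gene."""
    nc_exp = expected_nc(gc3s)
    return (nc_exp - nc_observed) / nc_exp

# Illustration with the reported means (Nc = 40.025, GC3s = 0.414).
deviation = nc_deviation(40.025, 0.414)
print(round(deviation, 3))
```

A value near zero would place a gene on the expected curve; larger positive values indicate stronger codon bias than GC3s alone would predict.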
Table 2. Relative synonymous codon usage (RSCU) values for each codon for the two groups of genes in phage
44RR. The asterisk denotes the codons whose occurrences are significantly (p < 0.01) higher in the genes on
the extreme left side of axis 1 than in the genes on the extreme right of the first major axis. Superscript
"a" denotes genes at the extreme left of axis 1 and "b" genes at the extreme right. Each group contains 10%
of sequences at either extreme of the major axis generated by correspondence analysis. N is the number of
codons, AA represents amino acid.

AA   Codon  RSCUa  Na     RSCUb  Nb       AA   Codon  RSCUa  Na     RSCUb  Nb
Phe  UUU    0.50   ( 46)  1.10   ( 67)    Ser  UCU*   1.57   ( 73)  0.80   ( 22)
     UUC*   1.50   (137)  0.90   ( 55)         UCC*   1.91   ( 89)  0.33   (  9)
Leu  UUA    0.02   (  1)  1.20   ( 36)         UCA    0.43   ( 20)  1.87   ( 51)
     UUG    0.66   ( 36)  1.47   ( 44)         UCG    0.37   ( 17)  1.39   ( 38)
     CUU    0.66   ( 36)  1.17   ( 35)    Pro  CCU    1.36   ( 44)  1.00   ( 25)
     CUC    0.77   ( 42)  0.67   ( 20)         CCC    0.28   (  9)  0.36   (  9)
     CUA    0.07   (  4)  0.80   ( 24)         CCA    0.71   ( 23)  1.44   ( 36)
     CUG*   3.82   (208)  0.70   ( 21)         CCG    1.64   ( 53)  1.20   ( 30)
Ile  AUU    0.94   ( 92)  1.07   ( 69)    Thr  ACU*   1.46   ( 83)  0.88   ( 37)
     AUC*   2.02   (199)  0.82   ( 53)         ACC*   2.13   (121)  1.00   ( 42)
     AUA    0.04   (  4)  1.10   ( 71)         ACA    0.26   ( 15)  1.67   ( 70)
Met  AUG    1.00   (145)  1.00   ( 83)         ACG    0.14   (  8)  0.45   ( 19)
Val  GUU    2.07   (171)  1.87   ( 77)    Ala  GCU*   1.92   (194)  1.30   ( 47)
     GUC    0.78   ( 64)  0.70   ( 29)         GCC    0.85   ( 86)  0.72   ( 26)
     GUA    0.92   ( 76)  0.68   ( 28)         GCA    0.87   ( 88)  1.32   ( 48)
     GUG    0.23   ( 19)  0.75   ( 31)         GCG    0.36   ( 36)  0.66   ( 24)
Tyr  UAU    0.73   ( 47)  1.20   ( 71)    Cys  UGU    0.40   (  7)  0.69   ( 10)
     UAC*   1.27   ( 82)  0.80   ( 47)         UGC    1.60   ( 28)  1.31   ( 19)
                                          Trp  UGG    1.00   ( 48)  1.00   ( 54)
His  CAU    0.88   ( 26)  1.18   ( 36)    Arg  CGU*   2.71   ( 90)  0.73   ( 16)
     CAC    1.12   ( 33)  0.82   ( 25)         CGC*   2.38   ( 79)  0.87   ( 19)
Gln  CAA    1.41   (138)  1.38   ( 64)         CGA    0.54   ( 18)  1.37   ( 30)
     CAG    0.59   ( 58)  0.62   ( 29)         CGG    0.27   (  9)  0.64   ( 14)
Asn  AAU    0.59   ( 63)  1.06   ( 74)    Ser  AGU    0.60   ( 28)  0.84   ( 23)
     AAC*   1.41   (151)  0.94   ( 66)         AGC    1.12   ( 52)  0.77   ( 21)
Lys  AAA    1.24   (210)  1.33   (110)    Arg  AGA    0.06   (  2)  1.69   ( 37)
     AAG    0.76   (129)  0.67   ( 55)         AGG    0.03   (  1)  0.69   ( 15)
Asp  GAU    1.22   (171)  1.06   ( 75)    Gly  GGU*   2.21   (162)  1.43   ( 57)
     GAC    0.78   (110)  0.94   ( 66)         GGC    1.54   (113)  1.38   ( 55)
Glu  GAA    1.58   (289)  1.40   (125)         GGA    0.19   ( 14)  0.98   ( 39)
     GAG    0.42   ( 76)  0.60   ( 54)         GGG    0.05   (  4)  0.20   (  8)
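The caption of Table 2 does not state which test was used to flag codons at p < 0.01. One common choice for such comparisons is a chi-square test on a 2 x 2 table (the codon versus its synonymous alternatives, in the left versus the right gene group), and the Python sketch below illustrates that approach for the Phe codons. The function name is hypothetical, and the use of this particular test, without continuity correction, is an assumption rather than the authors' documented method.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom) for the table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Critical value of chi-square with 1 degree of freedom at p = 0.01.
CRITICAL_P01_DF1 = 6.635

# Phe counts from Table 2: UUC vs UUU in the left group (137, 46)
# and in the right group (55, 67).
statistic = chi_square_2x2(137, 46, 55, 67)
print(round(statistic, 2), statistic > CRITICAL_P01_DF1)
```

Under this assumed test, the excess of UUC in the left-hand group is well beyond the p < 0.01 threshold, in line with the asterisk on UUC in Table 2.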
Next, we carried out correspondence analysis (CA) on the RSCU values of the 249
protein-coding genes of 44RR phage in order to determine the factors influencing the
codon usage bias in 44RR. Figure 2 shows the distribution of 44RR genes on the first two
major axes of the correspondence analysis. It is found that the first major axis
accounts for 10.13% of the total variation and the second major axis accounts for 5.20%
of the total variation. The position of the genes along the first major axis is
negatively correlated with the A3s (r = -0.575, p
Fig. 3. Correlation between the GRAVY score of each amino acid residue and the axis 1
values of correspondence analysis. Single letter codes are used to show the positions of
the 20 amino acid residues. Amino acid residues are represented by closed circles.
The cellular tRNA abundance is positively correlated with over-represented codons of the
highly expressed genes in several organisms (Grantham et al., 1981; Sharp et al.,
1984-85; Gouy, 1987; Ikemura, 1992; Zhou et al., 1999; Kanaya et al., 1999; Kanaya et
al., 2001). In coliphage T4, synonymous codon usage of the highly expressed genes was
also shown to be positively correlated with the abundant cellular tRNAs (Kunisawa,
1992). This may also be true for the highly expressed genes of 44RR. As the tRNA copy
number is not known in A. salmonicida at present, we determined the RSCU values of the
putatively highly expressed genes of A. salmonicida and compared the resulting data with
those of the highly expressed genes of 44RR (Table 1). It was found that among the 19
over-represented codons present in each of the highly expressed gene sets of 44RR and
A. salmonicida, 13 are common to both species. This indicates that nearly 68% of the
over-represented codons of highly expressed genes of 44RR are recognized by abundant
host tRNAs.
Amino Acid Usage in 44RR.
To identify the factors influencing the
amino acid composition in 44RR, we also
carried out CA on the relative amino acid usage of its 249 proteins. Analysis showed
that the first and second major axes of CA account for 18.66% and 10.91% of the total
variation of the amino acid composition of 44RR proteins, respectively. Further analysis
revealed that the first major axis is positively correlated with the GRAVY score
(r = 0.410, p < 0.01) of each 44RR protein. A plot of the GRAVY score of each amino acid
residue versus the first major axis in fact shows that charged residues like Lys, Arg,
His, Asp and Glu are located on the negative side of the first axis (Fig. 3).
Further analysis has shown that the first major axis is also significantly correlated
(r = -0.404, p
Naya, H., Zavala, A., et al. 2004. Biochem. Biophys. Res. Commun. 325: 1252-7.
Oresisc, M. and Shalloway, D. 1998. J. Mol. Biol. 281: 31-48.
Romero, H., Zavala, A., et al. 2000. Nucl. Acids Res. 28: 2084-2090.
Sahu, K., Gupta, S. K., et al. 2004. J. Biochem. Mol. Biol. 37: 487-492.
Sahu, K., Gupta, S. K., Sau, S., Ghosh, T. C. 2005. J. Biomol. Struct. Dyn. 23: 63-71.
Sau, K., Gupta, S. K., et al. 2005a. Virus Res. 113: 123-31.
Sau, K., Sau, S., et al. 2005b. Acta Biochim. Biophys. Sin. (Shanghai) 37: 625-33.
Sharp, P. M., Rogers, M. S., et al. 1984-85. Nucl. Acids Res. 21: 150-60.
Sharp, P. M. and Li, W. H. 1987. Nucleic Acids Res. 15: 1281-1295.
Sharp, P. M. and Cowe, E. 1991. Yeast 7: 657-678.
Tetart, F., Desplats, C., et al. 2001. J. Bacteriol. 183: 358-66.
Wright, F. 1990. Gene 87: 23-29.
Xie, T. and Ding, D. F. 1998. FEBS Lett. 434: 93-96.
Zhou, J., Liu, W. J., et al. 1999. J. Virol. 73: 4972-4982.
Zavala, A., Naya, H., Romero, H. and Musto, H. 2002. J. Mol. Evol. 54: 563-8.