Small RNAs in Rickettsia: are they functional?
TRENDS in Genetics Vol.18 No.7 July 2002
http://tig.trends.com 0168-9525/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0168-9525(02)02685-9
Research Update
Many obligate intracellular pathogens
have small genomes with high fractions of
pseudogenes. A recent analysis of gene
expression patterns in Rickettsia conorii
shows that short open reading frames
inside deteriorating genes are occasionally
transcribed into RNA. Here, we show
that substitution frequencies at
nonsynonymous sites are similar for
expressed and unexpressed parts of the
fragmented genes. We conclude that
the observed expression is a temporary
stage in the gene degradation process,
suggesting that the expressed gene
fragments are not functional.
Microbial genes with similar functions
are often organized into co-transcribed
operons, the longest and most highly
conserved of which is the super-
ribosomal protein gene cluster [1]. As a
consequence, most genomes are densely
packed with short spacer regions
between the genes [2]. The expression of
genes and operons in free-living bacteria
is controlled by a broad spectrum of
regulatory systems, allowing fast
adaptation to changing growth
environments. Regulatory RNAs seem
to be more common than previously
thought, and several encompass small
internal open reading frames [3].
By contrast, obligate intracellular
bacteria often contain disrupted operon
structures [4], and have only a small
set of genes involved in regulatory
processes [5–8]. Furthermore, many
previously active genes have been
partially degraded by mutations and
deletions to form high fractions of what
seems to be junk DNA [9–12]. The
process of gene degradation has been
inferred from comparative sequence
analyses [9,10], providing no
information about the possible role and
regulation of the resulting junk DNA.
Also, it is not known at what stage of
the deterioration process the
function of the gene is lost or when
its expression is turned off. A recent
analysis has shown that RNA is still
produced from some of the degraded
genes in Rickettsia [6], raising
questions about a putative role for
the junk DNA.
Gene degradation in Rickettsia
Rickettsia was the first bacterium for
which gene degradation was described
in any detail [5,6,9–11]. Members of the
genus Rickettsia are obligate intracellular
pathogens that infect vertebrate hosts
with the help of bloodsucking arthropods,
such as fleas, lice and ticks. Some species
multiply exclusively in the host-cell
cytoplasm, whereas others can also grow
in the cell nucleus. A few are deadly
human pathogens, but others cause no
observable harm to their eukaryotic
hosts. Here, we compare two Rickettsia
species for which complete genome
sequence data are available, Rickettsia
prowazekii [5], the causative agent of
epidemic typhus, and Rickettsia conorii
[6], the causative agent of Mediterranean
spotted fever.
The genomes of R. prowazekii and
R. conorii are very small, only 1.1 and
1.3 Mb in size, respectively [5,6]. Before
the genome sequences were obtained, it
was estimated that the R. prowazekii
genome contains a high fraction of
noncoding DNA, as inferred from a simple
calculation of the GC content of coding and
noncoding segments of the genome [13].
This estimate turned out to hold
remarkably well: the complete
R. prowazekii genome [5] showed that
genes comprise only 76% of the nucleotide
sequence, which at the time was the
lowest gene density described in any
microbial system. The remaining spacer
regions were suggested to be degraded
remnants of ancestral genes that are no
longer functional. But if so, why had they
not been eliminated completely?
It has been suggested that high
deletion rates have been selected in
microbial genomes to prevent the
accumulation of dangerous genetic
parasites [14]. Indeed, numerous studies
have shown that genes that confer no
selectable functions are lost rapidly, most
often by recombination between repetitive
elements [15,16], resulting in compact
genomes with little DNA in between
genes. The mean spacer length in
microbial genomes is estimated to be
140 bp, a value that is independent of
genome size [2]. This suggests that most
bacterial genomes, small and large, have
very small spacers. At first sight, it might
seem paradoxical that the bacterial
genomes with the longest spacers are
those of Rickettsia [5,6] and
Mycobacterium leprae [7]; that is, obligate
intracellular parasites subjected to
reductive genome evolution.
Comparative sequence analyses show
that pseudogenes and long spacers in
Rickettsia are degraded genes in the
process of being eliminated [6,10]. The
patterns of changes in these neutrally
evolving sequences reveal that there is
a mutational bias for short deletions,
which explains the observed sequence
degradation [9,11]. Because influx of
genetic material by horizontal gene
transfer is prevented by the lack of
exposure to bacteriophages and other
bacteria in the eukaryotic cytoplasm, the
result is a net loss of DNA [4,9–11,14].
The obligate intracellular parasite
M. leprae, which has a genome size of
3.2 Mb, a coding content of only 50% and
as many as 1116 pseudogenes, provides
the best example of a microbial genome
in which massive gene disintegration
has occurred [7]. The effect of this
degenerative process is a temporary
accumulation of junk DNA.
Fragmented genes in R. conorii
The R. conorii genome contains 804 of
the 834 genes previously identified in
R. prowazekii, and another 552 genes
are present uniquely in R. conorii [6]. An
inspection of the spacer sequences in
R. prowazekii that are located at the
corresponding position to the unique
genes in R. conorii has identified short
gene remnants for 229 of these 552 genes
[6]. This suggests that more than
200 genes have been extensively degraded
and that another 200 genes have been
completely eliminated from the
R. prowazekii genome since its divergence
from R. conorii.
A smaller suite of genes appears to
have been mutated more recently, as
inferred from the identification of short,
neighbouring, open reading frames
(ORFs) in R. conorii that are similar to
full-length orthologues in other species.
These include 37 genes that are split into
105 ORFs by internal stop codons and
frameshift mutations [6]. Fourteen
of these have intact orthologues in
R. prowazekii, and the remaining 23 are
not present in that genome. Also, the
R. prowazekii genome contains 11 genes
that are split into 23 ORFs, all of which
have intact orthologues in R. conorii [6].
Ogata et al. use the term ‘split genes’
rather than ‘pseudogenes’ so as not to
make any a priori assumption about the
functional consequences of this type of
gene disintegration.
Expression patterns of split genes in
R. conorii
Studies of the transcription profiles of the
split genes in R. conorii suggest that gene
inactivation is a complex process that
occurs in a step-wise manner (Fig. 1).
The most intriguing finding is that
transcription is sometimes re-initiated
inside the fragmented genes in R. conorii
[6]. This suggests that promoters can
either be created by mutations, or
recruited from existing sequences inside
the fragmented genes. In general,
transcription might be less well regulated
in the small AT-rich genomes of obligate
intracellular bacteria, and unwanted
transcription, especially inside
degrading gene sequences, could be
difficult to prevent.
Indeed, bacterial promoters are AT-rich
and potential promoter sequences are
very frequent in the AT-rich genomes
of R. prowazekii and R. conorii [5,6].
For example, the sequence TATAAT, one of
several possible RNA polymerase binding
sites, occurs seven times inside the
expressed split genes shown in Fig. 1.
The use of new promoters inside
deteriorating genes could lead to a
temporary retention of a partial gene
function, which in principle could
compensate for the accumulation of
mutations in these small genomes.
Alternatively, transcription of these short
fragments might solely be an effect of
the exposure of internal binding sites for
RNA polymerase, with no functional
consequences at the protein level.
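The argument that promoter-like hexamers are frequent in AT-rich genomes can be illustrated with a short calculation. The sketch below counts TATAAT on one strand of a sequence and estimates how often the hexamer would be expected by chance. The AT fractions used (0.5 and 0.7) are illustrative assumptions, not values measured in this article, and the model of independently drawn bases is a deliberate simplification.

```python
def count_motif(seq, motif="TATAAT"):
    """Count (possibly overlapping) occurrences of motif on the given strand."""
    seq = seq.upper()
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def expected_count(length, at_fraction, motif="TATAAT"):
    """Expected occurrences in a random sequence of the given length.

    Assumption: bases are independent, with A and T each at at_fraction / 2
    and G and C each at (1 - at_fraction) / 2.
    """
    p_at = at_fraction / 2.0
    p_gc = (1.0 - at_fraction) / 2.0
    p_motif = 1.0
    for base in motif:
        p_motif *= p_at if base in "AT" else p_gc
    return (length - len(motif) + 1) * p_motif


# Hypothetical 1-kb gene fragment at two assumed AT fractions.
for at in (0.5, 0.7):
    print(f"AT fraction {at}: expected TATAAT count = {expected_count(1000, at):.2f}")
```

Under these assumptions the expected count grows with the sixth power of the AT fraction, so a modest shift in base composition makes chance matches to an AT-only hexamer several times more common.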
Substitution frequencies of fragmented
genes in R. conorii
To distinguish between these two alternatives, we have searched for
functional constraints on the expressed gene fragments in R. conorii by
comparing substitution frequencies (1) of split genes versus full-length
genes, (2) of split genes with different expression characteristics and
(3) of expressed versus unexpressed gene fragments.

Fig. 1. Gene inactivation in Rickettsia. The left panels (a,c,e,g,i,k) show a comparison of a selected subset of split genes in Rickettsia conorii (Rc) with their full-length orthologues in Rickettsia prowazekii (Rp). The right panels (b,d,f,h,j,l) show the inferred expression status of the split genes displayed in the corresponding left panel. We assume that most genes in the common ancestor of Rp and Rc were functional and produced full-length mRNA, translated into proteins by ribosomes (a,b). We speculate that the functional inactivation of genes in Rickettsia occurs by the following mechanism: the fixation of internal stop codons induces premature translation termination (c,d), followed by premature transcription termination (e,f) and, occasionally, initiation of transcription at internal start sites (g,h). Any of the promoter sequences can be lost (i,j) and the continued accumulation of deletion mutations results in the elimination of all or most parts of the ancestral gene (k,l). The first stage of this process (c,d) is here exemplified by three open reading frames with sequence similarity to a gene coding for alkaline phosphatase synthesis sensor protein in R. prowazekii; the second stage (e,f) by a gene coding for a protein with unknown function; the third stage (g,h) by a split gene with sequence similarity to acetate kinase in R. prowazekii; the fourth stage (i,j) by the gene for propyl-endopeptidase; and the final stage (k,l) by two open reading frames with sequence similarity to the 3′-terminal segment of a long gene in R. prowazekii putatively coding for lipopolysaccharide 1,2-glucosyltransferase. Yellow and blue boxes represent untranscribed and transcribed open reading frames (ORFs) in R. conorii, respectively. Aquamarine boxes represent full-length orthologous genes in R. prowazekii. Dotted lines indicate the borders of homology of genes in R. prowazekii and gene fragments in R. conorii. Green symbols represent transcription initiation sites and red hexamers translation termination sites. Green symbols combined with red hexamers indicate putative transcription termination sites. Red stars represent the sequence TATAAT, one of several possible RNA polymerase binding sites. Open circles above the boxes represent ribosomes translating mRNA (curved lines). It remains to be determined whether any of the transcribed ORFs in R. conorii are translated by ribosomes.
Labels in the figure: Phenylalanyl-tRNA synthetase β; Alkaline phosphate synthesis sensor protein; Unknown protein; Acetate kinase; Propylendopeptidase; LPS 1,2 glycosyltransferase; Deletion; P. R. conorii ORFs shown: Rc148, Rc149, Rc150, Rc215, Rc216, Rc217, Rc218, Rc702, Rc703, Rc704, Rc720, Rc721, Rc1042, Rc1043.
Based on an analysis of
785 orthologous, full-length genes in
R. prowazekii and R. conorii, we estimated
that the nonsynonymous substitution
frequency (Ka) is 0.07 per site and that the
synonymous substitution frequency (Ks)
is 0.40 per site (Table 1). Here, Ka is the
average number of substitutions at
sites causing amino acid replacements,
whereas Ks is the neutral exchange rate
for substitutions with no effect on the
amino acid sequences.
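To make the two quantities concrete, the following is a minimal sketch of a counting approach in the spirit of the method cited for Table 1 [21]. It is not the authors' pipeline. It assumes two aligned, gap-free coding sequences without stop codons, counts a change to a stop codon as nonsynonymous when tallying sites, skips mutation pathways that pass through a stop codon, and applies a Jukes-Cantor correction. All function names and the two toy sequences are hypothetical.

```python
import math
from itertools import permutations, product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Expected number of synonymous sites in one codon (between 0 and 3)."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1.0 / 3.0
    return total


def path_diffs(c1, c2):
    """Synonymous and nonsynonymous differences, averaged over pathways."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn_total = non_total = 0.0
    n_paths = 0
    for order in permutations(positions):
        cur, syn, non, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":  # simplification: ignore paths through stops
                valid = False
                break
            if CODE[nxt] == CODE[cur]:
                syn += 1
            else:
                non += 1
            cur = nxt
        if valid:
            syn_total += syn
            non_total += non
            n_paths += 1
    if not positions or n_paths == 0:
        return 0.0, 0.0
    return syn_total / n_paths, non_total / n_paths


def count_sites_and_diffs(seq1, seq2):
    """Return (S, N, Sd, Nd): site counts and difference counts."""
    pairs = [(seq1[i:i + 3], seq2[i:i + 3]) for i in range(0, len(seq1), 3)]
    S = sum(syn_sites(a) + syn_sites(b) for a, b in pairs) / 2.0
    N = 3 * len(pairs) - S
    Sd = Nd = 0.0
    for a, b in pairs:
        s, n = path_diffs(a, b)
        Sd += s
        Nd += n
    return S, N, Sd, Nd


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (needs p < 0.75)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ka_ks(seq1, seq2):
    S, N, Sd, Nd = count_sites_and_diffs(seq1, seq2)
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


# Toy sequences: one synonymous (GCT/GCC) and one replacement (AAA/AGA) change.
ka, ks = ka_ks("GCTGCTGCTGCTAAA", "GCCGCTGCTGCTAGA")
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}")
```

Real alignments need gap handling and variance estimates [22], which this sketch leaves out.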
The corresponding Ka for 39 gene
fragments derived from 13 split genes
in R. conorii was estimated to be
0.16 substitutions per site (Table 1).
This shows that the split genes have, on
average, accumulated about twice as many
mutations at nonsynonymous sites as
the full-length genes, suggesting that
they have fewer functional constraints on
evolution. This could be due to the simple
fact that different proteins evolve under
different functional constraints, or that
some or all of the split genes have recently
lost their function. Indeed, no more than
a twofold difference is to be expected for
genes that currently evolve under no
selective pressure in R. conorii. This is
because the observed substitution
frequencies for such genes represent those
substitutions that have accumulated
during their evolution as functional genes
in the R. prowazekii and R. conorii
lineages, plus those that have occurred
subsequent to fragmentation in the
R. conorii lineage.
To examine the difference in more
detail, we sorted the split genes in
R. conorii into five groups with different
expression features (Fig. 1). The first
group contains fragmented genes in
which all internal ORFs are expressed
(Fig. 1c); the second group includes genes
in which only the 5′ terminal ORF is
expressed (Fig. 1e); the third group
consists of fragmented genes in which
RNA is produced from two or more
fragments, but in no particular order
(Fig. 1g); the fourth group contains genes
in which only the 3′ terminal ORF is
expressed (Fig. 1i); and the fifth contains
a few short ORFs, none of which is
expressed (Fig. 1k). We observe that
genes in four of the five groups have
twofold higher fixation rates for
mutations at nonsynonymous sites
(Ka = 0.14 to 0.16) than the set of
full-length orthologues (Ka = 0.07) (Table 1).
The second group is the only group
with a lower frequency of substitutions
at nonsynonymous sites (Ka = 0.09).
However, this group consists of a single
fragmented gene, leading to a less reliable
estimate. Thus, a higher substitution
frequency appears to be a characteristic
feature of the split genes, irrespective of
the different patterns of transcription.
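The group comparison above is simple arithmetic on the Ka values in Table 1, and a few lines make the fold differences explicit. The values are copied from Table 1; the variable and function names are hypothetical.

```python
KA_FULL_LENGTH = 0.07  # Ka of the 785 full-length orthologues (Table 1)

# Ka per expression group of split genes, keyed by the Fig. 1 panels (Table 1).
KA_BY_GROUP = {"c-d": 0.14, "e-f": 0.09, "g-h": 0.16, "i-j": 0.15, "k-l": 0.15}


def fold_over_full_length(ka):
    """Ratio of a group's Ka to the Ka of the full-length genes."""
    return ka / KA_FULL_LENGTH


for group, ka in KA_BY_GROUP.items():
    print(f"group {group}: Ka = {ka:.2f}, {fold_over_full_length(ka):.1f}-fold")
```

The ratios ignore the standard deviations reported in Table 1, so they indicate only the size of the effect, not its statistical support.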
If the split genes are indeed not
functional, we expect to find no difference
in substitution frequency for expressed
and unexpressed ORFs inside the
fragmented genes. To examine
systematically whether there is a
stronger selective constraint on the gene
fragments that still produce mRNA, we
compared the substitution frequencies
for 26 expressed ORFs with those of
13 unexpressed ORFs. No difference
was found between the two sets of
genes (Table 1), suggesting that the
expressed gene fragments have not been
more functionally constrained than the
unexpressed gene fragments.
We conclude that although
transcription is maintained for some
ORFs, these are accumulating mutations
at the same high frequencies as the
unexpressed ORFs. Together, the
data suggest that the split genes are
neutrally evolving sequences in the
process of being eliminated from the
R. conorii genome.
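As a rough check on the statement that expressed and unexpressed ORFs do not differ, the Table 1 estimates can be compared against their reported standard deviations. The sketch below treats the two estimates as independent and divides their difference by the combined standard deviation. This is an assumption made here for illustration, not a test performed in the article.

```python
import math

# (estimate, standard deviation) pairs copied from Table 1.
TABLE1 = {
    "expressed": {"Ka": (0.16, 0.05), "Ks": (0.52, 0.12)},
    "unexpressed": {"Ka": (0.17, 0.06), "Ks": (0.53, 0.15)},
}


def standardized_difference(a, b):
    """Absolute difference of two estimates in units of their combined SD.

    Assumption: the two estimates are independent.
    """
    (mean_a, sd_a), (mean_b, sd_b) = a, b
    return abs(mean_a - mean_b) / math.sqrt(sd_a ** 2 + sd_b ** 2)


for measure in ("Ka", "Ks"):
    z = standardized_difference(TABLE1["expressed"][measure],
                                TABLE1["unexpressed"][measure])
    print(f"{measure}: difference = {z:.2f} combined standard deviations")
```

For both measures the difference is a small fraction of the combined standard deviation, in line with the conclusion that the two sets of ORFs evolve at indistinguishable rates.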
Translational readthrough, frameshifting
and/or ribosome hopping?
In a few split genes, all internal ORFs are
transcribed, possibly from the ancestral
promoter site. Are these, presumably
full-length, mRNAs translated by
ribosomes? Several mechanisms could,
in principle, account for the production
of a protein despite the accumulation of
internal stop codons (Fig. 1c,d). For
example, if the newly created stop codon
is leaky, translation might be able to
proceed (readthrough) or bypass a
stretch of noncoding nucleotides by
translational frameshifting or ribosome
hopping [17–20]. Alternatively, if
translation start sites are available
downstream of the internal termination
codon, translation could be re-initiated
on the same mRNA. If this process
restores the function of the gene
partially or completely, the ancestral,
full-length gene will be replaced by a
‘mini-operon’ with several, short genes.
In this context, it is interesting to note
that many bacterial operons, such as
the ribosomal protein operon, contain
stretches of short genes that encode
subunits of the same enzyme [1]. Some
of these operons could be the result of
compensatory mutations that have
been fixed in the population to preserve
gene function subsequent to the
accumulation of internal termination
codons and frameshift mutations in the
ancestral gene.
However, the irregular expression patterns of the split genes in R. conorii
and the lack of conservation of split genes in the two Rickettsia species
suggest that the accumulation of mutations does not normally result in the
creation of functional ‘mini-operons’. A more likely scenario is that the
fixation of internal termination codons in Rickettsia is most often followed
by additional mutational changes. Indeed, the lack of a difference in
substitution frequencies suggests that both expressed and unexpressed short
ORFs are regions of the ancestral, functional gene in different stages of
deterioration.

Table 1. Substitution frequencies of Rickettsia prowazekii genes and Rickettsia conorii genes and gene fragments sorted into different expression groups (a)

Set of genes                       n (ORF) (b)   Ka/Ks   Ka          Ks
Full-length genes, complete set    785           0.19    0.07±0.02   0.40±0.07
Split genes, complete set          13 (39)       0.30    0.16±0.05   0.53±0.13
Split genes, group c–d, Fig. 1     4 (10)        0.25    0.14±0.06   0.55±0.15
Split genes, group e–f, Fig. 1     1 (2)         0.26    0.09±0.03   0.35±0.10
Split genes, group g–h, Fig. 1     4 (17)        0.26    0.16±0.06   0.61±0.14
Split genes, group i–j, Fig. 1     3 (8)         0.40    0.15±0.05   0.37±0.10
Split genes, group k–l, Fig. 1     1 (2)         0.40    0.15±0.04   0.37±0.14
Expressed ORFs                     13 (26)       0.31    0.16±0.05   0.52±0.12
Unexpressed ORFs                   13 (13)       0.32    0.17±0.06   0.53±0.15

(a) The frequency of substitutions at nonsynonymous (Ka) and synonymous (Ks) codon positions. Substitution frequencies and standard deviations have been estimated as described in [21,22].
(b) n, number of genes in R. prowazekii; ORF, number of ORFs in R. conorii.
Conclusions
Our interpretation of these results is
that the split genes in R. conorii are
degraded genes in which mutations have
started to accumulate, in spite of which
transcription, and possibly translation,
can continue. The enhanced substitution
frequencies at nonsynonymous sites
suggest that the split genes are no
longer functional and that the
expression driven by some of these
fragments is most probably a temporary
phenomenon, just as the accumulation
of junk DNA is a temporary stage in the
overall genome deterioration process [9–12].
A possible scenario for the gradual
process whereby (1) the function of a gene
is lost, (2) the expression is turned off,
and (3) the sequence is eliminated,
is outlined below.
The fixation of a frameshift mutation
and/or an internal termination codon
might first induce a halt in translation
(Fig. 1c,d). It can be assumed that the
ancestral promoter will continue to be
active for a while, in particular if one or
more of the shorter ORFs are translated
and still able to maintain some
functional role. However, in the
absence of translational readthrough,
frameshifting, hopping or re-initiation
[17–20], the naked mRNA will be
degraded and transcription will
probably be terminated prematurely
(Fig. 1e,f). This will expose ‘cryptic’
promoter sequences at which RNA
polymerase can re-initiate transcription
(Fig. 1g,h). Both promoters (the
ancestral and the cryptic) might work
simultaneously for a while, but, unless
any of the short gene products are
functionally selected for, either or both
promoters will become inactivated as
the gene accumulates more and more
mutations (Fig. 1i,j). Finally, deletion
mutations will remove any remaining
regulatory signals, leaving only a few
unexpressed fragments with weak
similarities to their full-length
orthologues in other species (Fig. 1k,l).
Thus, the balance between the rate
of disruptive mutations and the rate
at which cryptic transcriptional and
translational initiation sites are exposed,
or created by mutations, will determine
the extent to which the original gene
function can be recovered. A probable
secondary effect of the highly simplified
regulatory systems of obligate
intracellular parasites is that gene
expression could be more or less
constitutive and that ‘false’ initiation of
transcription and translation at internal
gene sites might occur at higher than
normal frequencies, especially inside
degrading genes. Although it cannot be
excluded that such processes might
temporarily, or in rare cases permanently,
recover the function of a disruptive
mutation, degenerative processes are
expected to dominate the evolution of
obligate intracellular pathogens in the
long term [4].
If transcription can indeed be driven
by false initiation inside non-functional
genes, it means that positive signals in
microarray analyses of transcription
profiles do not necessarily imply the
presence of functional genes. It should
be emphasized that other experimental
data are needed to confirm a functional
or regulatory role for any of the
identified small RNAs. As we have shown
here, comparative studies of gene
conservation and substitution rates
across closely related species could yield
important clues about the functional
significance of any observed RNA
expression pattern.
Acknowledgements
The authors’ work is supported by grants
from the Swedish Science Foundation
(VR), the Foundation for Strategic
Research (SSF), the Knut and Alice
Wallenberg Foundation (KAW), the
European Union (EU), the National
Science Foundation (NSF), USA, and
the National Research Foundation,
South Africa.
References
1 Lathe, W. et al. (2000) Gene context conservation
of a higher order than operons. Trends Biochem.
Sci. 25, 474–479
2 Mira, A. et al. (2001) Deletional bias and the
evolution of bacterial genomes. Trends Genet.
17, 589–596
3 Wassarman, K.M. et al. (2001) Identification
of novel small RNAs using comparative
genomics and microarrays. Genes Dev.
15, 1637–1651
4 Andersson, S.G.E. and Kurland, C.G. (1998)
Reductive evolution of resident genomes.
Trends Microbiol. 6, 263–268
5 Andersson, S.G.E. et al. (1998) The genome
sequence of Rickettsia prowazekii and the origin of
mitochondria. Nature 396, 133–140
6 Ogata, H. et al. (2001) Mechanism of evolution in
Rickettsia conorii and R. prowazekii. Science
293, 2093–2098
7 Cole, S.T. et al. (2001) Massive gene decay in the
leprosy bacillus. Nature 409, 1007–1011
8 Shigenobu, S. et al. (2000) Genome sequence of
the endocellular bacterial symbiont of aphids
Buchnera sp. APS. Nature 407, 81–86
9 Andersson, J.O. and Andersson, S.G.E. (1999)
Genome degradation is an ongoing process in
Rickettsia. Mol. Biol. Evol. 16, 1178–1191
10 Andersson, J.O. and Andersson, S.G.E. (2001)
Pseudogenes, junk DNA, and the dynamics
of Rickettsia genomes. Mol. Biol. Evol.
18, 829–839
11 Andersson, J.O. and Andersson, S.G.E. (1999)
Insights into the evolutionary process of
genome degradation. Curr. Opin. Genet. Dev.
9, 664–671
12 Ochman, H. and Moran, N.A. (2001) Genes lost
and genes found: evolution of bacterial
pathogenesis and symbiosis. Science
292, 1096–1098
13 Andersson, S.G.E. and Sharp, P.M. (1996)
Codon usage and base composition in
Rickettsia prowazekii. J. Mol. Evol.
42, 525–536
14 Lawrence, J.G. et al. (2001) Where are the
pseudogenes in bacterial genomes?
Trends Microbiol. 9, 535–540
15 Galitski, T. and Roth, J.R. (1997) Pathways for
homologous recombination between chromosomal
direct repeats in Salmonella typhimurium.
Genetics 146, 751–767
16 Frank, C. et al. Genome deterioration: loss of
repeated sequences and accumulation of junk
DNA. Genetica (in press)
17 Buckingham, R.H. et al. (1997) Polypeptide
chain release factors. Mol. Microbiol.
24, 449–456
18 Weiss, R. and Gallant, J. (1983) Mechanism of
ribosome frameshifting during translation of the
genetic code. Nature 302, 389–393
19 Herr, A.J. et al. (2000) Coupling of open reading
frames by translational bypassing. Annu. Rev.
Biochem. 69, 343–372
20 Huang, W.M. et al. (1988) A persistent
untranslated sequence within bacteriophage
T4 DNA topoisomerase gene 60. Science
239, 1005–1012
21 Nei, M. and Gojobori, T. (1986) Simple methods
for estimating the numbers of synonymous and
nonsynonymous nucleotide substitutions.
Mol. Biol. Evol. 3, 418–426
22 Ohta, T. and Nei, M. (1994) Variances and
covariances of the number of synonymous and
nonsynonymous substitutions per site.
Mol. Biol. Evol. 11, 613–619
Wagied Davids
Haleh Amiri
Siv G.E. Andersson*
Dept of Molecular Evolution, University of Uppsala, Norbyvägen 18C, S-752 36 Uppsala, Sweden
*e-mail: [email protected]