Trans-splicing in Trypanosoma brucei: results from genome-wide experiments
Shai Carmi, Bar-Ilan University
Department of Physics and the Faculty of Life Sciences
February 2010
mRNA processing in T. brucei
Almost all genes have no promoters.
Gene expression is regulated by controlling splicing (?), mRNA stability, and translation.
[Figure, labeled Itai Dov Tkacz: a polycistronic transcript spanning Gene1 to Gene4 is processed by trans-splicing (addition of the SL) and polyadenylation (AAAA) into mature transcripts, which are then translated.]
Splicing overview
SL: Spliced Leader RNA
See also: Liang et al., Euk. Cell (2003).
cis-splicing machinery and consensus
[Figure labels: 3’ splice-site; snRNPs; yeast conserved branch site: TACTAAC; 10-12 nts; mammalian.]
Splicing regulation
[Figure labels: splicing enhancer; splicing silencer; hnRNP.]
SR proteins create "bridges" to stabilize the spliceosome.
In trypanosomes:
• U2F65 and U2F35 exist and do not interact.
• U2F65 interacts with SF1.
• Interacting SR proteins were identified.
• hnRNP proteins exist.
Open questions
3’ splice site recognition and selection.
Spatial organization of splicing factors: protein-protein and protein-RNA interactions.
Splicing efficiency and gene expression regulation.
Not in this talk: detailed molecular mechanism of trans-splicing and spliceosome assembly, structure of the 5’ splice site, SL-RNA biogenesis, and coupling to polyadenylation.
Past studies of splicing regulation
Clayton et al., Mol. Biochem. Parasit. (2005): calculated the statistical properties of the splice sites based on a couple of hundred ESTs.
Clayton et al., Mol. Cell. Biol. (1994); Ullu et al., Mol. Cell. Biol. (1998); Cross et al., Mol. Cell. Biol. (2005): used reporter gene systems with the splice sites of model genes (tubulin, actin, procyclin) to study the effect of splice site composition on splicing efficiency.
Limited applicability.
[Figure: reporter construct with promoter, intron, AG (3’ splice-site), 5’UTR, and reporter gene. Label: taken from endogenous gene and mutated.]
Major known facts
Polyadenylation is coupled to downstream trans-splicing.
A hierarchy of trans-splicing and polyA signals exists.
Specific sequences in the 5’UTR (exon) are required for splicing.
Optimal PPT should be 25 nts long, U-dominated but interspersed with Cs, and have no two consecutive purines.
Optimal PPT-AG spacer should be 20-25 nts long, have U at position -3, and never AC at [-3,-4].
[Figure: two reporter genes separated by an intergenic region; labels: 3’UTR, polyA-site, 3’ splice-site, 5’UTR.]
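The rules for an optimal PPT and PPT-AG spacer listed on this slide can be written as simple checks. The following is a minimal sketch, not the method used in the cited studies. It assumes that position -3 is the base immediately upstream of the AG, that "U-dominated" means more than half of the PPT is U, and that the PPT and spacer have already been extracted as separate strings. The function name and the rule keys are hypothetical.

```python
PURINES = set("AG")

def check_splice_site(ppt, spacer):
    """Report which of the slide's rules a PPT and PPT-AG spacer satisfy.

    ppt    : polypyrimidine tract sequence (assumed already extracted)
    spacer : bases between the end of the PPT and the AG, so that
             spacer[-1] is position -3 (assumption, see lead-in)
    """
    ppt = ppt.upper().replace("U", "T")
    spacer = spacer.upper().replace("U", "T")
    purine_pair = any(a in PURINES and b in PURINES
                      for a, b in zip(ppt, ppt[1:]))
    return {
        "ppt_length_25": len(ppt) == 25,
        # Assumed threshold: more than half of the tract is U (T in DNA).
        "ppt_u_dominated": ppt.count("T") > len(ppt) / 2,
        "ppt_no_purine_pair": not purine_pair,
        "spacer_length_20_25": 20 <= len(spacer) <= 25,
        "u_at_minus_3": spacer.endswith("T"),
        "no_ac_at_minus_4_3": not spacer.endswith("AC"),
    }

# Toy sequences for illustration only, not taken from the talk.
good = check_splice_site("TTTTC" * 5, "ACGT" * 5)
bad = check_splice_site("TTAGTT", "GGAC")
print(all(good.values()), bad)
```

Returning one flag per rule, rather than a single verdict, makes it easy to see which rule a candidate site breaks.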
Research strategy: outline
Sequence all messenger RNAs to map transcript boundaries.
Silence splicing factors and measure the effect on each transcript.
Examine the splice site regions of regulated genes to infer possible roles for splicing factors and mechanisms of splicing regulation.
Methods: deep sequencing
[Figure source: Illumina guide.]
Deep sequencing of T. brucei mRNA
Experiment performed at Ullu and Tschudi’s lab, Yale University.
Library preparation (polyA-enriched library / SL-enriched library):
1. Total RNA
2. Poly(A)+ RNA selection / Terminator exonuclease treatment
3. First strand cDNA synthesis with random hexamer or oligo(dT) primers / with random hexamer primers
4. Second strand cDNA synthesis with RNaseH-derived RNA primers / with SL primer
5. cDNA fragmentation and size selection
6. Addition of adapters and amplification
7. Illumina sequencing
15 million useful reads!
Ullu’s lab results
532 transcripts with a misannotated start codon.
805 annotated genes not producing a transcript.
442 genes with an alternative transcript in their UTRs.
1,114 new transcripts, conserved coding and non-coding.
Trans-splicing and polyadenylation of snoRNA clusters.
The experimental method can be slightly modified to discover pol-II transcription initiation sites. These sites were found at strand-switch regions, in proximity to tRNA genes, and within transcription units.
Digital gene expression.
[Figure: histogram of number of genes (y-axis, ticks 0 to 30) versus relative abundance (x-axis, log scale, 1 to 1,000,000), with bins marked 0-1, 1-10, 10-100, and >100 mRNA molecules per cell. Annotation: 75% of genes.]
Examples of reannotated features
[Figure panels: Chr VIII, Chr X, Chr VII, Chr XI, Chr VII.]
Correctly annotated gene cluster. Blue: number of reads from the SL-enriched library. Red: number of reads from the polyA-enriched library.
A novel transcript.
A misannotated start codon. Blue lines at the bottom denote SL reads.
An ORF which is part of a larger transcript.
A short transcript at the 3’ end of a gene. Red lines at the bottom denote polyA reads.
Examples were experimentally verified for all cases.
Statistics of UTR lengths
UTR length distribution is approximately log-normal.
5’UTR: median 91 nts
3’UTR: median 388 nts
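If a length distribution is approximately log-normal, the logarithms of the lengths are approximately normal, and the median is close to exp(mean of the logs). A minimal sketch of that summary follows; the toy lengths are hypothetical and are not data from the talk.

```python
import math
import statistics

def lognormal_summary(lengths):
    """Return (mu, sigma, implied_median) from the natural logs of the lengths.

    For a log-normal distribution, the median equals exp(mu).
    Non-positive lengths are skipped because their log is undefined.
    """
    logs = [math.log(x) for x in lengths if x > 0]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    return mu, sigma, math.exp(mu)

# Hypothetical toy UTR lengths (nts), for illustration only.
toy_lengths = [20, 45, 91, 180, 400]
mu, sigma, implied_median = lognormal_summary(toy_lengths)
print(implied_median, statistics.median(toy_lengths))
```

Comparing the implied median with the sample median is a quick sanity check on the log-normal description; it is not a formal goodness-of-fit test.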
Splice-site composition
PPT
No signal observed in the exon
No G allowed at the -3 position
Non-AG splice-sites are due to sequencing errors and strain differences.
Maximum at about -25; the distance from the AG varies: unique to trypanosomes.
Splice-site composition: pyrimidine content
Sites closer to the PPT are stronger.
PPT distributed along tens of nucleotides. Purines favored in the exon.
[Figure labels: exon; AG.]
Splice-site composition
AC is not preferred at positions [-3,-4] of the 3’ splice-site: splice-sites with AC are less abundant.
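A pyrimidine-content profile of the kind described on these slides can be computed by aligning equal-length sequence windows on the AG and counting pyrimidines in each column. This is a minimal sketch under that assumption; the function name and the toy windows are hypothetical.

```python
def pyrimidine_profile(windows):
    """Fraction of pyrimidines (C, T or U) at each column of aligned windows.

    windows: equal-length sequences, all aligned on the same AG position.
    """
    if not windows:
        return []
    width = len(windows[0])
    if any(len(w) != width for w in windows):
        raise ValueError("all windows must have the same length")
    profile = []
    for col in range(width):
        hits = sum(1 for w in windows if w[col].upper() in "CTU")
        profile.append(hits / len(windows))
    return profile

# Toy windows ending in the AG, for illustration only.
toy_windows = ["TTCTAG", "TCTCAG", "CTACAG"]
profile = pyrimidine_profile(toy_windows)
print(profile)
```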
Splicing heterogeneity
Not alternative splicing in the regular sense: it leads to the same protein.
Average distance (nts) of all weak splice sites from the strongest splice site.
Uncertainty of splice-site usage: $H = -\sum_i p_i \ln p_i$, where $p_i$ is the relative usage of site $i$ (figure axis: log-scale).
6967 genes: one major site
978 genes: two major sites
21 genes: three major sites
[Figure axis: uncertainty.]
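The uncertainty of splice-site usage reads naturally as the Shannon entropy of the relative usage of each site within a gene. A minimal sketch, assuming the input is the number of SL reads supporting each site of one gene (the function name and the toy counts are hypothetical):

```python
import math

def splice_site_uncertainty(read_counts):
    """Shannon entropy (natural log) of splice-site usage for one gene.

    read_counts: number of reads supporting each trans-splice site.
    A gene with a single used site has uncertainty 0.
    """
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("need at least one read")
    h = 0.0
    for count in read_counts:
        if count > 0:
            p = count / total  # relative usage of this site
            h -= p * math.log(p)
    return h

# Toy counts for illustration only.
print(splice_site_uncertainty([100]))
print(splice_site_uncertainty([50, 50]))
print(splice_site_uncertainty([90, 5, 5]))
```

Two equally used sites give ln 2, the maximum for a gene with two sites; one dominant site with a few weak ones gives a value closer to 0.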
Splicing heterogeneity illustrated
• Each row corresponds to one gene.
• Each site is denoted with a bar.
• Sites are centered around the strongest site.
• Bar color is according to relative usage.
[Figure: relative usage of trans-splice sites (y-axis, ticks 0 to 60) versus nt position relative to the START codon, ATG (x-axis, -300 to 300).]
Downstream sites are more popular.
Some sites are found in frame.
Predicting splicing heterogeneity
What determines if a gene will be differentially spliced?
Look at 100 nts up- and downstream of the strongest site.
Rank all potential splice sites: TAG-3, AAG, CAG-2, GAG-1.
Heterogeneity rank of a gene = sum of ranks of all other AG dinucleotides / rank of strongest site.
Average heterogeneity rank is about 10 for high-uncertainty genes, but only about 7 for low-uncertainty genes ($P = 10^{-20}$).
Signatures do not look meaningful, but analysis shows that longer 5’UTRs, shorter PPTs, and longer PPT-AG distance also contribute significantly to heterogeneity.
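The heterogeneity rank defined on this slide can be sketched as follows. Two points are assumptions rather than facts from the talk: the slide gives no explicit rank for AAG, so it is taken here to share rank 2 with CAG, and the window is taken as 100 nts on each side of the first base of the strongest AG. The function names are hypothetical.

```python
# Rank of an AG by the base immediately upstream of it.
# ASSUMPTION: AAG = 2 (the slide lists "TAG-3, AAG, CAG-2, GAG-1").
RANKS = {"T": 3, "A": 2, "C": 2, "G": 1}

def site_rank(seq, ag_pos):
    """Rank of the AG starting at seq[ag_pos]; 0 if it has no upstream base."""
    if ag_pos < 1:
        return 0
    return RANKS.get(seq[ag_pos - 1], 0)

def heterogeneity_rank(seq, strongest_pos, flank=100):
    """Sum of ranks of all other AGs within `flank` nts, over the strongest site's rank."""
    seq = seq.upper().replace("U", "T")
    if seq[strongest_pos:strongest_pos + 2] != "AG":
        raise ValueError("strongest_pos must point at an AG dinucleotide")
    strongest = site_rank(seq, strongest_pos)
    if strongest == 0:
        raise ValueError("strongest site has no rankable upstream base")
    start = max(0, strongest_pos - flank)
    end = min(len(seq) - 1, strongest_pos + flank + 1)
    other = 0
    for i in range(start, end):
        if i != strongest_pos and seq[i:i + 2] == "AG":
            other += site_rank(seq, i)
    return other / strongest

# Toy sequence for illustration only: AGs at 3 (TAG), 8 (CAG), 13 (GAG).
toy = "CCTAGCCCAGCCGAGCC"
print(heterogeneity_rank(toy, 3))
```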
What is heterogeneity good for?
Unclear at the moment.
Such heterogeneity is not found in other organisms.
In cis-splicing, exon boundaries must be conserved to maintain an intact coding sequence. In trans-splicing, such evolutionary pressure does not exist.
However, trans-splicing heterogeneity was not observed in C. elegans.
It may reflect another level of complexity in gene expression regulation, as the degree of heterogeneity varies significantly throughout the genome.
Explaining abundance
A-rich exons are more abundant.
Other correlations:
Genes with longer PPT and shorter 5’UTR are more abundant.
Splice-site ambiguity is anti-correlated with abundance.
A possible model for splicing factors organization?
U2F65 does not bind U2F35, so the AG can be far from the PPT.
Variable distance between AG and PPT allows regulation of the splicing efficiency by differential binding.
[Figure: intergenic region with BP, PPT, AG, and 5’UTR; labeled distances 0-80 and 10-30; optimal: 25, 25; AC-rich; competitor splice-site (AG).]
Silencing methods: RNAi
Stem-loop construct
T7-opposing construct
Inducible by tetracycline. Gene is silenced after 3 days.
Wang et al., JBC (2000).
Silencing methods: microarrays
Microarrays are chips on which thousands of DNA oligos are printed in an array.
Each oligo represents a fragment of one gene.
Expression profiles of entire genomes are obtained in a single experiment.
[Figure source: Wikipedia.]
Genome-wide observations
Hundreds of genes are upregulated: an unprecedented phenomenon.
U2F65 and SF1 physically interact and thus have similar patterns.
Vazquez et al., Mol. Biochem. Parasitol. 164, 137 (2009).
[Figure legend: red, up; green, down.]
Genome-wide correlations
Potential protein-protein interactions should be biochemically verified. Interactions may be indirect.
Spearman correlation coefficient
          Prp43     SmD1      U2F35     Prp31     U2F65     SF1       U1        PTB1      PTB2      Tsr1      Tsr1IP    hnRNP_FH  Prp19
Prp43     1         0.278349  -0.13685  0.294357  0.240051  0.342593  0.149605  0.125257  0.130586  -0.02391  0.221945  0.204737  0.10404
SmD1      0.278349  1         0.044152  0.383218  0.333834  0.315953  -0.01695  0.230517  0.163068  0.041223  0.28852   0.494197  0.068624
U2F35     -0.13685  0.044152  1         -0.3023   0.435671  0.190754  0.378621  0.010175  0.264658  0.375165  0.500294  0.255059  0.088768
Prp31     0.294357  0.383218  -0.3023   1         0.217689  0.248819  0.017184  0.179219  -0.13272  0.024078  0.106424  -0.11128  0.126101
U2F65     0.240051  0.333834  0.435671  0.217689  1         0.698639  0.428154  0.071559  0.290715  0.394992  0.742415  0.366936  0.169848
SF1       0.342593  0.315953  0.190754  0.248819  0.698639  1         0.261155  0.175059  0.276896  0.056967  0.682194  0.344552  0.199872
U1        0.149605  -0.01695  0.378621  0.017184  0.428154  0.261155  1         0.007941  0.189195  0.312908  0.38916   0.174986  0.078526
PTB1      0.125257  0.230517  0.010175  0.179219  0.071559  0.175059  0.007941  1         0.254598  -0.11872  0.169165  0.024827  0.21833
PTB2      0.130586  0.163068  0.264658  -0.13272  0.290715  0.276896  0.189195  0.254598  1         0.178874  0.345913  0.37377   0.178053
Tsr1      -0.02391  0.041223  0.375165  0.024078  0.394992  0.056967  0.312908  -0.11872  0.178874  1         0.30302   0.147911  0.07821
Tsr1IP    0.221945  0.28852   0.500294  0.106424  0.742415  0.682194  0.38916   0.169165  0.345913  0.30302   1         0.348961  0.231646
hnRNP_FH  0.204737  0.494197  0.255059  -0.11128  0.366936  0.344552  0.174986  0.024827  0.37377   0.147911  0.348961  1         0.052611
Prp19     0.10404   0.068624  0.088768  0.126101  0.169848  0.199872  0.078526  0.21833   0.178053  0.07821   0.231646  0.052611  1
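A Spearman correlation coefficient between two silencing experiments is the Pearson correlation of the ranks of their per-gene values. The sketch below is a plain-Python illustration of that definition, with ties given their average rank. It presumes the inputs are per-gene expression changes in two experiments, which the slide does not state explicitly, and the toy values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length inputs with at least 2 values")
    return pearson(average_ranks(x), average_ranks(y))

# Toy fold changes for the same five genes in two experiments (illustration only).
exp_a = [1.2, 0.8, 2.5, 0.5, 1.0]
exp_b = [1.1, 0.9, 3.0, 0.4, 1.3]
print(spearman(exp_a, exp_b))
```

Because only ranks are used, the coefficient is insensitive to the scale of the fold changes, which suits noisy microarray data.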
Processes affected by splicing defects
Upregulated: mostly ribosomal proteins and proteins involved in translation, peptidases, and chaperones. 10 candidates verified experimentally by RT-PCR.
Downregulated: mostly metabolic enzymes and transporters.
Downregulated genes
The sequence at the splice site of the genes most impacted by silencing may indicate the role of the splicing factor.
Look at PPT length and distance to 3’ splice-site.
Most results are negative (discuss reason later).
Genes with shorter PPT require SF1 (P-value = 0.001).
Genes with longer PPT-AG distance require PTB1 (P-value = 0.004).
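The talk does not say which statistical test produced these P-values. As one possible illustration only, a one-sided permutation test on the difference of medians can ask whether a group of downregulated genes has shorter PPTs than the remaining genes. The function name, the choice of medians, and the toy data are all assumptions.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_perm=10000, seed=0):
    """One-sided P-value for median(group_b) - median(group_a) being this large by chance.

    Labels are shuffled n_perm times; the +1 terms avoid reporting P = 0.
    """
    rng = random.Random(seed)
    observed = statistics.median(group_b) - statistics.median(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = statistics.median(pooled[n_a:]) - statistics.median(pooled[:n_a])
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Toy PPT lengths (nts), for illustration only.
downregulated = [10, 11, 12, 13, 14, 15, 16, 17]
other_genes = [20, 21, 22, 23, 24, 25, 26, 27]
p_small = permutation_p_value(downregulated, other_genes, n_perm=2000)
p_large = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4], n_perm=2000)
print(p_small, p_large)
```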
Sequence motifs
Using the DRIM tool of Yael Mandel-Gutfreund’s lab.
Hard to assess the significance of the motifs.
Surprisingly, no pyrimidine-rich motifs were identified.
Other tools are not suited for RNA motifs or are intended for the human genome and thus perform poorly.
Should look at which elements are conserved.
Top motif per factor and direction of regulation:
U2F65, Up: 5’UTR AGGGT; 3’UTR TTAAG
SF1, Up: 5’UTR TTGCT; 3’UTR TAAGG
U2F65, Down: 5’UTR ACTTC; 3’UTR TTTAG
SF1, Down: 5’UTR None; 3’UTR TGTCA
U2F65, Both: 5’UTR ACTCT; 3’UTR AAGGG
SF1, Both: 5’UTR None; 3’UTR GCGGG
Further motifs (column assignment lost in the transcript): TACAT, GAAAA, CAACC, AAAAC, ATAAA, AAGCG, AATTT, TAAGG, CCCCA, GGCAG, AGAGA, TCAAT, GCGGG, TAAGT, GGGGT, GGTAA, CAAAA, CTTTT, ACTCA, TTAGT, ACATA, CTACC
hnRNPF/H binding sites.
Mechanisms of regulation
RNA-level regulation can be mediated via two mechanisms:
1. mRNA stability.
The 3’UTR carries a specific sequence that causes stabilization or destabilization under given experimental conditions (silencing). Demonstrated experimentally for a few upregulated genes.
Binding can be directly to the silenced splicing factor (U2F65, SF1, …). Splicing factors have been shown to bind mature mRNA in human cells (Carmo-Fonseca et al., 2006).
Alternatively, binding can be to some other factor which is affected by the silencing (secondary effect).
Binding can induce both up- and down-regulation of different genes, depending on the context (e.g., competing with stabilizing/destabilizing proteins).
Regulation might not be due to binding but due to secondary structure.
2. Splicing defects.
The absence of a splicing factor might cause downregulation of genes for which it is required for splicing.
Such genes may have certain properties such as a weak splice site, long PPT-AG distance, short PPT, competition with other AGs, etc.
Discussion (problems)
Computational approaches are limited by low reproducibility of the microarrays, noisy fold changes, and the very small number of genes affected by more than one factor.
Genes with splicing defects are masked by many more genes which are regulated by mRNA stability. It is unclear at the moment whether a significant number of genes are regulated by splicing.
mRNA stability can be mediated by more than one factor (primary and secondary effects).
Thus, a clean set of genes which undergo the same regulation is hard to obtain.
Discussion (future plans)
Computational:
Deep sequencing of Leishmania at Ullu’s lab may provide information about conserved regulatory elements.
Secondary structure of the 3’UTR will be explored.
Experimental:
Reporter gene system with the intergenic region of a model gene.
CLIP-seq (in vivo cross-linking and immunoprecipitation followed by deep sequencing) should yield RNA binding sites.
Examine splicing defects (accumulation of SL-RNA or Y-structure) of individual genes or genome-wide (co-silencing of the exosome).
Thank you for your attention!