Scikit-ribo reveals precise codon-level translational control by dissecting ribosome pausing and codon elongation
October 26, 2016
Biological Data Science
Han Fang
The hidden treasure in genomics
DNA → RNA → Protein
DNA → RNA: transcription (RNA polymerase)
RNA → Protein: translation (ribosome)
DNA → DNA: replication
Ingolia et al. Science. (2009) Ingolia. Nat Rev Genet. (2014)
What is ribosome profiling (Riboseq)?
Calculate translational efficiency (TE)
TE = Riboseq RPKM / RNAseq RPKM
log2(TE) < 0: less efficient translation
log2(TE) = 0: normal translation efficiency
log2(TE) > 0: more efficient translation
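As a worked illustration of the TE ratio, here is a minimal Python sketch. The RPKM formula it uses (reads × 10^9 / (length × total mapped reads)) is the standard definition rather than something stated on the slide, and the function names and read counts are hypothetical.

```python
import math

def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def log2_te(ribo_rpkm, rna_rpkm):
    """log2 of TE, where TE = Riboseq RPKM / RNAseq RPKM."""
    return math.log2(ribo_rpkm / rna_rpkm)

# Hypothetical gene: 1,500 bp long, 10 million mapped reads in each library.
ribo = rpkm(read_count=600, length_bp=1500, total_mapped_reads=10_000_000)
rna = rpkm(read_count=300, length_bp=1500, total_mapped_reads=10_000_000)
te_log2 = log2_te(ribo, rna)  # positive value: more efficient translation
```

Because both libraries are normalized by length and depth, the ratio compares footprint density to transcript abundance rather than raw counts.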
Hypothesis: TE distribution could be skewed by ribosome pausing events.
[Figure panels: ribosome footprints without bias; ribosome footprints with pausing.]
Simulated S. cerevisiae data: TE distributions are negatively skewed by ribosome pausing events
Null distribution
With ribosome pausing sites: pausing sites randomly imputed to 20% of the genes
Finding ribosome pausing sites (peaks) with a negative binomial mixture model
Given a gene:
1) Likelihood ratio test: is the Bonferroni-adjusted p-value < 0.05?
   No: unimodal distribution, i.e. no pausing.
   Yes: bimodal distribution, i.e. potential pausing sites.
2) For each potential pausing site: is the posterior probability > cutoff?
   Yes: peak/pausing site.
   No: not a pausing site.
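The decision flow for a single gene can be sketched in Python as follows. This is only an illustration: the function names, the number of tests, and the posterior cutoff of 0.9 are hypothetical, since the slide does not give the cutoff value or the number of tests used for the Bonferroni adjustment.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)

def call_pausing_sites(lrt_p_value, n_tests, posteriors, alpha=0.05, cutoff=0.9):
    """Return the positions called as pausing sites for one gene.

    cutoff=0.9 is a placeholder; the slide only says "cutoff".
    """
    if bonferroni(lrt_p_value, n_tests) >= alpha:
        return []  # unimodal distribution: no pausing in this gene
    # bimodal distribution: keep positions with high posterior probability
    return [j for j, p in enumerate(posteriors) if p > cutoff]

posteriors = [0.01, 0.02, 0.97, 0.40, 0.99]  # hypothetical per-position values
sites = call_pausing_sites(1e-8, 1000, posteriors)    # gene passes the test
no_sites = call_pausing_sites(1e-3, 1000, posteriors)  # gene fails the test
```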
Yifei Huang
$$P(X_i \mid \pi_i, \mu_i, k_i, r_i) = \prod_j \left[ \pi_i \, \mathrm{NB}(X_{ij} \mid \mu_i, r_i) + (1 - \pi_i) \, \mathrm{NB}(X_{ij} \mid k_i \mu_i, r_i) \right],$$
for gene i at position j, where $k \geq 5$
$$H_0: \pi = 1$$
$$H_1: \pi \neq 1$$
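To make the mixture likelihood concrete, the sketch below evaluates it in pure Python for one gene. It assumes the negative binomial is parameterized by its mean and a size parameter r, and it plugs in fixed, hypothetical parameter values instead of fitting them. A real likelihood ratio test would maximize the likelihood under each hypothesis, and no p-value calibration is attempted here.

```python
import math

def nb_logpmf(x, mu, r):
    """Log pmf of a negative binomial with mean mu and size r."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))

def _log(p):
    return math.log(p) if p > 0 else float("-inf")

def _logaddexp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def components(x_j, pi, mu, k, r):
    """Log weights of the background and the pausing (k * mu) components."""
    return (_log(pi) + nb_logpmf(x_j, mu, r),
            _log(1 - pi) + nb_logpmf(x_j, k * mu, r))

def mixture_loglik(counts, pi, mu, k, r):
    """log P(X_i | pi, mu, k, r), summed over positions j."""
    return sum(_logaddexp(*components(x, pi, mu, k, r)) for x in counts)

def peak_posterior(x_j, pi, mu, k, r):
    """Posterior probability that position j belongs to the pausing component."""
    bg, peak = components(x_j, pi, mu, k, r)
    return math.exp(peak - _logaddexp(bg, peak))

# Hypothetical gene: 48 background positions and two strong peaks.
counts = [10] * 48 + [100, 120]
null_ll = mixture_loglik(counts, pi=1.0, mu=10, k=10, r=5)   # H0: pi = 1
alt_ll = mixture_loglik(counts, pi=0.96, mu=10, k=10, r=5)   # H1: pi != 1
lrt_stat = 2 * (alt_ll - null_ll)
posteriors = [peak_posterior(x, 0.96, 10, 10, 5) for x in counts]
```

With these toy numbers the two peak positions get a posterior close to 1 and the background positions close to 0, which is the separation the posterior cutoff relies on.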
# genes | # genes (rpkm > 100) | # genes with pausing | # ribosome pausing sites identified
6664    | 1252                 | 94                   | 180
mRNAs with stronger secondary structure tend to have ribosome pausing events
[Figure: secondary structure score for genes without vs. with pausing (No/Yes); p-value < 2 × 10^-16.]
PARS scores obtained from Kertesz et al. Nature (2010)
[Figure: PARS scores by position (0 to 400) and ribosome counts by position (0 to 400), colored by ribosome_pausing (FALSE/TRUE).]
Simplifying the generalized linear model (GLM)
[Schematic panels, each with a count axis: RNA; secondary structure; dwell time (1/speed); translation efficiency (TE).]
Joint inference of protein TE and codon dwell time using GLM, while accounting for secondary structure
GLM for joint inference of TE and codon dwell time:
$$Y_{ij} \sim \mathrm{NB}(\text{mean} = \mu_{ij}, \text{dispersion} = \alpha), \quad \text{for gene } i, \text{ position } j$$
$$g(\mu_{ij}) = \beta_0 + \underbrace{x_i^m}_{\text{mRNA}} + \underbrace{\beta_i^t}_{\text{TE}} + \underbrace{\beta_k^c}_{\text{codon}} + \underbrace{\beta^s x_{ij}}_{\text{secondary structure}}$$
where $g(\cdot)$ is a log link function, $\mu_{ij} = E(Y_{ij})$,
$x_i^m$ is mRNA abundance for gene i,
$\beta_i^t$ is translational efficiency for gene i,
$\beta_k^c$ is dwell time for codon k,
$\beta^s x_{ij}$ is secondary structure effect at position j for gene i.
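Because g is a log link, each term of the linear predictor acts multiplicatively on the expected footprint count. The sketch below only evaluates the mean structure for given coefficients and does not fit the model. It assumes the mRNA term enters on the log scale as an offset, which the slide does not state explicitly, and all names and values are hypothetical.

```python
import math

def expected_count(beta0, mrna_abundance, beta_te, beta_codon, beta_s, structure):
    """mu_ij = exp(beta0 + x_i^m + beta_i^t + beta_k^c + beta^s * x_ij)."""
    x_m = math.log(mrna_abundance)  # assumption: log-scale offset for mRNA
    eta = beta0 + x_m + beta_te + beta_codon + beta_s * structure
    return math.exp(eta)

# Hypothetical gene and codon; structure = 0 switches the structure term off.
mu = expected_count(beta0=0.0, mrna_abundance=50.0,
                    beta_te=math.log(2.0),       # TE doubles the count
                    beta_codon=math.log(1.5),    # slow codon: 1.5x dwell time
                    beta_s=0.3, structure=0.0)

# Same site with a non-zero (hypothetical) structure covariate.
mu_structured = expected_count(0.0, 50.0, math.log(2.0), math.log(1.5), 0.3, 1.0)
fold_change = mu_structured / mu  # equals exp(beta_s * structure)
```

This separation is what lets gene-level TE and codon-level dwell time be estimated jointly: each contributes its own factor to the expected count at a position.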
Scikit-ribo perfectly reproduced the relative codon dwell times from Weinberg et al.
Rarest codons: CGG, CGA, CCG
Significant correlation between tRNA abundance and codon elongation rates
GLM estimates vs. RPKM-based estimates reveal systematic bias in typical Riboseq analysis
The RPKM-based approach overestimated the TE of highly structured mRNAs, while the TE of the remaining mRNAs was slightly underestimated, as hypothesized.
Highly structured mRNA genes
Accurate TE estimation supported by proportional synthesis for heterodimeric complexes in S. cerevisiae.
Summary
Discussed:
1) Introduced scikit-ribo for joint analysis of Riboseq and RNAseq data.
2) Identified biases in Riboseq data due to ribosome pausing.
3) Corrected biases and revealed underlying biology.
4) Joint inference of codon elongation rate and protein TE.
5) Revealed precise translational control at codon level.
Workflow steps (inputs: Riboseq, RNAseq):
- Align reads (e.g. w/ STAR, HISAT)
- Filter rRNA sequences (e.g. w/ bowtie)
- Identify & trim adapters (e.g. w/ cutadapt)
- Transcript quantification (e.g. w/ cufflinks, salmon)
- Joint analysis w/ scikit-ribo
Scikit-ribo modules:
- Ribosome A-site position prediction
- A-site codon localization
- Ribosome pausing site calling
- Translation efficiency inference
- Differential translation efficiency testing
https://github.com/hanfang/
Acknowledgments
Gholson Lyon, Michael Schatz
Lyon Lab: Max Doerfel, Yiyang Wu, Jonathan Crain, Jason O’Rawe
Schatz Lab: Fritz Sedlazeck, Tyler Garvin, James Gurtowski, Maria Nattestad, Srividya Ramakrishnan
CSHL, Stony Brook University, UCSF, Rutgers, JHU
Yifei Huang, Adam Siepel, Noah Dukler
Rachel Green, Jonathan Weissman, Joshua Dunn, David Weinberg
Premal Shah, Rob Patro