RNA Folding
RNA Folding Algorithms
Intuitively: given a sequence, find the structure with the maximal number of base pairs
For nested structures, there are four possibilities for S(i,...,j):
- i,j are paired: add the pair to S(i+1,...,j-1)
- i is unpaired: added to S(i+1,...,j)
- j is unpaired: added to S(i,...,j-1)
- i,j are paired but not to each other: combine S(i,...,k) and S(k+1,...,j)
RNA Folding by DP
Fill in a matrix of S(0,...,seq_length)
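The four cases above can be sketched as a small dynamic program that fills the matrix by increasing span length. This is a minimal sketch in the style of the Nussinov algorithm, not a full folding tool: the pairing rule (Watson-Crick plus G-U wobble), the `min_loop` parameter, and the function names are assumptions added for illustration.

```python
# Assumed pairing rule: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=0):
    """Return the maximal number of nested base pairs in seq.

    min_loop is a hypothetical knob: the minimum number of unpaired
    bases required between two paired positions (0 = no constraint).
    """
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] holds the best score for the subsequence i..j (inclusive).
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Cases "i is unpaired" and "j is unpaired".
            best = max(S[i + 1][j], S[i][j - 1])
            # Case "i,j are paired".
            if j - i > min_loop and (seq[i], seq[j]) in PAIRS:
                inner = S[i + 1][j - 1] if j - i > 1 else 0
                best = max(best, inner + 1)
            # Case "i,j are paired but not to each other": split at k.
            for k in range(i + 1, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

The triple loop makes this O(n^3) in time and O(n^2) in memory. Maximizing pair count is only the intuition; energy-based folding uses the same matrix layout with a different score.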
RNA Folding Assumptions
- RNA folding algorithms typically detect only nested structures and do not recognize pseudoknots
- Some folding algorithms identify pseudoknots, but they are typically inefficient or limited (e.g., they do not support stacking-dependent pairing models)
- Current algorithms get about 50-70% of the base pairs correct, on average
MicroRNA Identification
miRNAs are genomically encoded small RNAs
- processed into single-stranded 21-23 mers
- incorporated into RNP complex (miRISC)
- miRISC binds to 3’UTRs: repression of translation, modest mRNA degradation
MicroRNAs: Introduction
[Figure: miRISC with Ago1. Source: Bartel, Cell 116, 2004]
MicroRNA Transcription
- miRNA genes can be in intergenic and intronic regions
- miRNA genes can be clustered and co-expressed
- Estimates: 60% singletons, 25% introns, 15% clusters
MicroRNA Examples
MicroRNA Gene Conservation
Some miRNAs are highly conserved (e.g. let-7)
Conservation must preserve a dsRNA hairpin from which the miRNA is processed by Dicer
MicroRNA Gene Identification
MicroRNA Cloning
- Map cloned ~22nt small RNAs to the genome
- Predict pre-miRNA secondary structures using m-fold
- Score pre-miRNAs based on known miRNA precursors
Computational Identification
- Identify conserved genomic segments
- Predict pre-miRNA secondary structures using m-fold
- Score pre-miRNAs based on known miRNA precursors
- Tools: MirScan, MirSeeker, …
MicroRNA Gene Identification
More complex methods: additional features
MiRBase
- ~4500 miRNAs in 41 eukaryotes
- Examples: 474 human, 78 fly
- Eight viruses express microRNAs
MiRBase
MicroRNAs: Open Questions
- Promoter
- Transcriptional start site
- Transcriptional termination
- Transcriptional complex
- Regulation of miRNA expression
MicroRNA Targets: Mechanism & Identification
Are All RNAs Regulated by miRNAs?
The Target Prediction Problem
Target sites show imperfect sequence complementarity:
- Strong match in 5’ region (‘seed’)
- Varying complementarity on 3’ end
Computational target predictions:
- Sensitive to exact pairing rules
- ~100 targets per miRNA within fly transcriptome
- ~25% of transcriptome under miRNA regulation
[Figure: miRNA:mRNA duplex with 5’ and 3’ ends labeled; seed marked at miRNA positions 1-8]
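The seed rule can be sketched as a plain string search: take the miRNA seed, reverse-complement it, and look for that site in the mRNA. This is a minimal sketch under stated assumptions: the seed is taken as miRNA positions 2-8 by default (adjustable), only perfect Watson-Crick matches count, and the sequences in the example are toy strings, not real miRNAs or UTRs. The function names are hypothetical.

```python
# RNA complement table (Watson-Crick only; no G-U wobble in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=2, end=8):
    """Return the mRNA site (5'->3') that pairs perfectly with the seed.

    mirna is written 5'->3'; start and end are 1-based, inclusive
    miRNA positions (the 2-8 default is an assumption of this sketch).
    """
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna, utr, start=2, end=8):
    """Return 0-based start positions of perfect seed matches in utr."""
    site = seed_site(mirna, start, end)
    hits = []
    pos = utr.find(site)
    while pos != -1:
        hits.append(pos)
        pos = utr.find(site, pos + 1)  # step by one so overlaps are found
    return hits


if __name__ == "__main__":
    toy_mirna = "ACGUACGUAC"          # toy sequence, not a real miRNA
    toy_utr = "GGACGUACGAAACGUACG"    # toy sequence, not a real UTR
    print(seed_site(toy_mirna))
    print(find_seed_matches(toy_mirna, toy_utr))
```

Because predictions are sensitive to the exact pairing rules, changing `start` and `end` (for example to 1-8) can change the hit list substantially.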
Existing algorithms
- focus on quality of the sequence match between miRNA and mRNA target
- introduce various filters, e.g. evolutionary conservation
[Figure: wt miRNA:mRNA duplex with 5’ and 3’ ends labeled; seed marked, miRNA positions 1-9 numbered. Source: Brennecke et al. 05]
miRanda
Target prediction: sequence-based rules
- miRNA-target complementarity (strong in 5’, weaker in 3’)
- Refinement with binding free energy scores
- Use conservation to increase signal to noise
PicTAR: Combinatorial Targets
[Figure: miRNA paired to mRNA sites with a perfect nucleus and with an imperfect nucleus]
Filter: binding energy over 33% of the mature miRNA binding energy to a perfect complementary site
PicTAR: Combinatorial Targets
[Figure: anchor]
PicTAR: Combinatorial Targets
[Figure: hidden Markov model generating an mRNA sequence; hidden states are background (b) and miRNAs 1…m; example probabilities shown: 0.3, 0.8, 0.2, 0.8, 0.02]
Prior (transition) probabilities: p0, p1, p2, p3, ..., pm, with $\sum_{i=0}^{m} p_i = 1$
Emission probabilities: A, C, U, G for background; binding sites such as ACUGUAC and GGCAUUAC for miRNAs
Generated mRNA: U ACUGUAC C GGCAUUAC ACUGCAC ...
- Independence of binding sites (no overlapping)
- Transition does not depend on current state (memoryless)
- Competition between background and miRNA
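The three assumptions above (non-overlapping sites, memoryless transitions, background competing with miRNAs) allow the probability of a sequence to be computed with a simple forward recursion. The following is a simplified sketch, not PicTAR's actual implementation: it assumes each miRNA state emits one fixed binding-site string with probability 1 (real sites are scored, including imperfect ones), and the function and parameter names are hypothetical.

```python
def sequence_likelihood(seq, sites, p, background):
    """Probability of seq under a memoryless background/site mixture.

    seq        : mRNA string over A, C, G, U
    sites      : list of binding-site strings, one per miRNA (assumed exact)
    p          : [p0, p1, ..., pm]; p0 is background, p[i] is miRNA i
    background : dict mapping each nucleotide to its background probability
    """
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    if len(p) != len(sites) + 1:
        raise ValueError("need one prior per site plus one for background")

    # L[n] = total probability of all ways to generate the first n bases.
    L = [0.0] * (len(seq) + 1)
    L[0] = 1.0
    for n in range(1, len(seq) + 1):
        # Background state emits the single base at position n-1.
        total = p[0] * background[seq[n - 1]] * L[n - 1]
        # A miRNA state emits a whole site ending at position n-1,
        # so sites never overlap each other.
        for p_i, site in zip(p[1:], sites):
            k = len(site)
            if n >= k and seq[n - k:n] == site:
                total += p_i * L[n - k]
        L[n] = total
    return L[len(seq)]


if __name__ == "__main__":
    uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "U": 0.25}
    print(sequence_likelihood("UACUGUACC", ["ACUGUAC"], [0.8, 0.2], uniform))
```

Because the next state never depends on the current one, a single array indexed by sequence position is enough; no per-state table is needed.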
Accessibility: The Missing Component
What about target accessibility?
[Figure: two miRISC binding scenarios compared side by side]
Experimental Method
- Drosophila tissue culture cells (S2)
- No miRNA overexpression
- establish miRNA expression profile
- use endogenous miRNA (50-500 copies per cell) (bantam, miR-2 family, miR-184)
- Dual luciferase reporter assay
- mutate target site sequence
- mutate sequence surrounding the target site to alter mRNA secondary structure
[Figure: reporter constructs labeled firefly, 3’UTR, Renilla]
- UTR engineering
- Renilla experiment, firefly as internal control
- mild overexpression of target sequence (<10-fold)
- no target degradation (20h transfection)
- sensitive, quantitative, linear assay
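[Figure: 3’UTR with poly(A) tail; the target site sits inside a ~200 bp fragment]
N: ~200 bp fragment, native structure
C: ~200 bp fragment, closed structure
[Figure: bar chart of normalized luciferase ratio (0.0 to 0.4) for 3’UTR constructs N, C, C3, C3+, C5, C5+]
[Figure: rpr (miR-2) target/miRNA duplex with the target site 5’ end marked; target 5’ CUCAUCAAAGC UUGUGAUA 3’, miRNA 3’ GAGUAGUUUCG GACACUAU 5’; unpaired bases A, GA (target side) and C, ACC (miRNA side)]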
The Role of Secondary Structure
Target Accessibility Matters
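grim (miR-2), 3’UTR constructs N and C
[Figure: target/miRNA duplex; target GCUC AUCAAAGC UUGUGAU, miRNA CGAG UAGUUUCG GACACUA; unpaired bases A, GCA, U (target side) and ACC, U (miRNA side)]
hid (bantam), 3’UTR constructs N and C
[Figure: target/miRNA duplex; target AAUUAGUUUUCA AAUGAUCUCG, miRNA UUAGUCGAAAGU UUACUAGAGU; unpaired bases C (target side) and U (miRNA side)]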
Accessibility as Important as Sequence
[Figure: bar chart of normalized luciferase ratio for target site mutations in the rpr (miR-2) site (M2, M3, M6, I5, D5, D5+3), compared with the accessibility constructs N, C, C3, C3+, C5, C5+; seed positions 8-1 marked on the duplex]
Thermodynamic miRNA::RNA Model
[Figure: miRNA bound to its target site in the UTR]
∆G = -25.3
∆G5 = -15.1, ∆G3 = -10.2
[Figure: mRNA with CDS, UTR, and Poly(A)]
∆G0 = -28.3, ∆G1 = -19.5
∆Gopen = ∆G0 - ∆G1
Folding area = target + 70 bp
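The numbers on these slides can be combined in a short worked calculation. This is a sketch under stated assumptions: ∆G is read as the duplex energy (the sum of ∆G5 and ∆G3), and ∆∆G is taken as ∆Gduplex minus ∆Gopen, a definition the slides imply through their axis labels but do not write out. Pairing the duplex value from one slide with the ∆G0 and ∆G1 values from the other is purely illustrative, and the function names are hypothetical.

```python
def delta_g_open(dg0, dg1):
    """Accessibility term from the slide: dG_open = dG0 - dG1."""
    return dg0 - dg1


def delta_delta_g(dg_duplex, dg_open):
    """Assumed definition: ddG = dG_duplex - dG_open."""
    return dg_duplex - dg_open


if __name__ == "__main__":
    # Values copied from the slides.
    dg5, dg3 = -15.1, -10.2
    dg_duplex = dg5 + dg3               # matches the slide's dG = -25.3
    dg_open = delta_g_open(-28.3, -19.5)
    ddg = delta_delta_g(dg_duplex, dg_open)
    # Round for display; floating-point sums are not exact.
    print(round(dg_duplex, 1), round(dg_open, 1), round(ddg, 1))
```

Under these assumptions a target that is costly to open (a more negative ∆Gopen) ends up with a less favorable ∆∆G, even when the duplex itself is strong.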
Thermodynamic miRNA::RNA Model
22 constructs altering accessibility of target sites in rpr, hid, grim
[Figure: normalized luciferase ratio (0.1 to 0.4) vs. ∆Gduplex (-30 to -22); points for grim, hid, rpr; r = 0.36, p < 0.11]
[Figure: normalized luciferase ratio (0.1 to 0.4) vs. ∆∆G (-30 to 20); r = 0.7, p < 4x10^-4]
[Figure: normalized luciferase ratio (0.1 to 0.4) vs. ∆∆G with flank, 17 up, 13 down (-30 to 30); r = 0.77, p < 3x10^-5]
[Figure: exploring flank size; r (0.68 to 0.76) as a function of upstream (0 to 25 bp) and downstream (0 to 25 bp) flank]
ddG Predicts Measured Repression
[Figure: measured repression differential vs. ddG differential for miR-184 targets; r = 0.87]
[Figure: 190 validated targets; miRNA:mRNA duplex with 5’ and 3’ ends labeled, seed marked, miRNA positions 1-9 numbered]
Native Target Analysis
- 12 miR-184 targets with weaker 3’ pairing, tested in different backgrounds to alter secondary structure
- Non-redundant set of 190 experimentally tested miRNA:mRNA target pairs in Drosophila
miRNA target seeds favor highly accessible regions of the genome
[Figure: overrepresentation vs. random as a function of accessibility (∆Gopen); panels for fly and human]
Genome-Wide Target Analysis
Assignment
- Download the set of human microRNAs
- Download the set of human UTRs
- Download the mFold software
- For each microRNA, identify the set of targets on each UTR, defined by a perfect match to the microRNA seed, bases 2-8
- Partition the targets of each microRNA into conserved and non-conserved targets (define a conservation cutoff)
- Compare the RNA-accessibility of conserved and non-conserved targets for each microRNA
- For each putative target, extract the 100 bases that surround it
- Use mFold to compute the free energy of these 100 bases
- Create a dot-plot with points being microRNAs, and axes being the median (plot #1) or mean (plot #2) free energy of all conserved (x-axis) or non-conserved (y-axis) targets of the microRNA
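The window-extraction and summary steps of the assignment can be sketched as follows. This is a starting point under stated assumptions, not a reference solution: the free-energy computation is passed in as a callable (`energy_fn`, hypothetical) because how mFold is invoked is left to the reader, the window is centered on the site and shifted inward at UTR ends, and the toy energy function in the example is a stand-in, not a thermodynamic model.

```python
from statistics import mean, median


def extract_window(utr, site_start, site_len=7, window=100):
    """Return up to `window` bases of utr centered on the target site.

    site_start is the 0-based start of the seed match; near either end
    of the UTR the window is shifted so it stays inside the sequence.
    """
    center = site_start + site_len // 2
    start = max(0, center - window // 2)
    end = min(len(utr), start + window)
    start = max(0, end - window)
    return utr[start:end]


def summarize_energies(windows, energy_fn):
    """Apply energy_fn to each window and return the median and mean."""
    energies = [energy_fn(w) for w in windows]
    return {"median": median(energies), "mean": mean(energies)}


def plot_point(conserved, nonconserved, energy_fn, stat="median"):
    """One dot for one microRNA: (x = conserved, y = non-conserved)."""
    x = summarize_energies(conserved, energy_fn)[stat]
    y = summarize_energies(nonconserved, energy_fn)[stat]
    return (x, y)


if __name__ == "__main__":
    # Stand-in for a real folding energy: NOT a thermodynamic model.
    def toy_energy(seq):
        return -(seq.count("G") + seq.count("C"))

    utr = "ACGU" * 75  # toy 300-base sequence
    print(len(extract_window(utr, 150)))
    print(plot_point(["GC", "GG"], ["AA", "AU"], toy_energy))
```

Repeating `plot_point` for every microRNA with `stat="median"` gives the points for plot #1, and with `stat="mean"` the points for plot #2.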