RNA Folding. RNA Folding Algorithms Intuitively: given a sequence, find the structure with the...

RNA Folding


RNA Folding Algorithms

Intuitively: given a sequence, find the structure with the maximal number of base pairs. For nested structures, there are four possibilities for S(i,...,j):

- i and j are paired with each other: add one pair to S(i+1,...,j-1)
- i is unpaired: take S(i+1,...,j)
- j is unpaired: take S(i,...,j-1)
- i and j are paired, but not to each other: take S(i,...,k) + S(k+1,...,j) for some k with i < k < j

S(i,...,j) is the maximum over these four cases.


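RNA Folding by DP

Fill in a matrix of S(i,...,j) values, from the shortest subsequences to the longest, until the value for the whole sequence, S(0,...,seq_length), is reached.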
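The recurrence above translates directly into a dynamic program. What follows is a minimal Python sketch of base-pair maximization, the scheme often called the Nussinov algorithm. It assumes Watson-Crick and G-U wobble pairs are allowed and imposes a hypothetical minimum hairpin loop of `MIN_LOOP = 3` unpaired bases; the slides specify neither. The code fills S[i][j] for increasingly long subsequences and then traces back one optimal structure in dot-bracket notation.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption; the slides do not specify).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3  # hypothetical minimum number of unpaired bases enclosed by a pair


def fold(seq: str) -> tuple[int, str]:
    """Return (maximal number of base pairs, one optimal dot-bracket structure)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    # S[i][j] = max pairs in seq[i..j]; entries with j - i <= MIN_LOOP stay 0.
    S = [[0] * n for _ in range(n)]
    for length in range(MIN_LOOP + 1, n):           # j - i, short to long
        for i in range(n - length):
            j = i + length
            best = max(S[i + 1][j], S[i][j - 1])     # i unpaired / j unpaired
            if (seq[i], seq[j]) in PAIRS:            # i pairs with j
                best = max(best, S[i + 1][j - 1] + 1)
            for k in range(i + 1, j):                # i, j paired but not to each other
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best

    # Traceback: re-derive which case produced each S[i][j].
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and S[i][j] == S[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return S[0][n - 1], "".join(structure)
```

For example, `fold("GGGAAAUCC")` returns `(3, "(((...)))")`. Maximizing the pair count is only the intuition. Energy-based tools such as mfold score stacking and loops instead, but they use the same fill-then-traceback structure.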


RNA Folding Assumptions

RNA folding algorithms typically detect only nested structures and do not recognize pseudoknots.

Some folding algorithms identify pseudoknots, but they are typically inefficient or limited (e.g., they do not support stacking-dependent pairing models).
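A nested structure has a simple definition. Two base pairs (i, j) and (k, l) with i < k are either nested or disjoint, unless i < k < j < l. In that case they cross and form a pseudoknot, which nested-only algorithms cannot output. The small Python check below is a sketch that takes base pairs as hypothetical 0-based index tuples.

```python
from itertools import combinations


def is_nested(pairs: list[tuple[int, int]]) -> bool:
    """True if no two base pairs cross, i.e. the structure has no pseudoknot."""
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k; the pairs cross exactly when i < k < j < l.
        if i < k < j < l:
            return False
    return True
```

For example, `[(0, 8), (1, 7), (2, 6)]` is nested (a simple hairpin stem), while `[(0, 5), (3, 9)]` crosses and is a pseudoknot.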

Current algorithms get about 50-70% of the base pairs correct, on average
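Accuracy figures like this come from comparing predicted base pairs with a reference structure. Below is a minimal sketch that assumes both structures are in nested dot-bracket notation. It reports two numbers: the fraction of reference pairs recovered (sensitivity) and the fraction of predicted pairs that are correct (positive predictive value). The slide does not say which of these its 50-70% figure refers to.

```python
def pairs_from_dot_bracket(structure: str) -> set[tuple[int, int]]:
    """Convert a nested dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.add((stack.pop(), pos))
    return pairs


def pair_accuracy(predicted: str, reference: str) -> tuple[float, float]:
    """Return (sensitivity, positive predictive value) of predicted vs reference pairs."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    correct = len(pred & ref)
    sensitivity = correct / len(ref) if ref else 0.0
    ppv = correct / len(pred) if pred else 0.0
    return sensitivity, ppv
```

Take a prediction that recovers the two outer pairs of a four-pair stem and nothing else: `pair_accuracy("((........))", "((((....))))")` gives `(0.5, 1.0)`.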


MicroRNA Identification


MicroRNAs: Introduction

- miRNAs are genomically encoded small RNAs
- processed into single-stranded 21-23-mers
- incorporated into an RNP complex (miRISC)
- miRISC binds to 3’UTRs, causing repression of translation and modest mRNA degradation

[Figure: miRISC with Ago1 (Bartel, Cell 116, 2004)]


MicroRNA Transcription

- miRNA genes can be in intergenic and intronic regions
- miRNA genes can be clustered and co-expressed
- Estimates: 60% singletons, 25% introns, 15% clusters


MicroRNA Examples


MicroRNA Gene Conservation

Some miRNAs are highly conserved (e.g. let-7)

Conservation must preserve a dsRNA hairpin from which the miRNA is processed by Dicer


MicroRNA Gene Identification

MicroRNA Cloning
- Map cloned ~22nt small RNAs to the genome
- Predict pre-miRNA secondary structures using mfold
- Score pre-miRNAs based on known miRNA precursors

Computational Identification
- Identify conserved genomic segments
- Predict pre-miRNA secondary structures using mfold
- Score pre-miRNAs based on known miRNA precursors


More complex methods use additional features: MirScan, MirSeeker, …


miRBase

- ~4500 miRNAs in 41 eukaryotes
- Examples: 474 human, 78 fly
- Eight viruses express microRNAs


MicroRNAs: Open Questions

- Promoter
- Transcriptional start site
- Transcriptional termination
- Transcriptional complex
- Regulation of miRNA expression


MicroRNA Targets: Mechanism & Identification


Are All RNAs Regulated by miRNAs?


The Target Prediction Problem

Target sites show imperfect sequence complementarity:

- Strong match in the miRNA 5’ region (‘seed’)
- Varying complementarity at the 3’ end

Computational target predictions:
- are sensitive to the exact pairing rules
- yield ~100 targets per miRNA within the fly transcriptome
- place ~25% of the transcriptome under miRNA regulation
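The seed rule can be illustrated with a minimal sketch: a site is called wherever the mRNA contains the reverse complement of the miRNA seed. This sketch makes two assumptions. It takes the seed as miRNA bases 2-8, the definition used in the assignment at the end of these slides, and it requires perfect Watson-Crick pairing. Real predictors vary exactly these pairing rules, which is one reason their predictions differ. The miRNA and mRNA strings in the example are hypothetical.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Sequence that pairs antiparallel with `rna` (Watson-Crick only)."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(mirna: str, mrna: str, start: int = 2, end: int = 8) -> list[int]:
    """0-based mRNA positions that perfectly pair with miRNA bases start..end (1-based, inclusive)."""
    seed = mirna[start - 1:end]
    site = reverse_complement(seed)
    hits, pos = [], mrna.find(site)
    while pos != -1:
        hits.append(pos)
        pos = mrna.find(site, pos + 1)
    return hits
```

With the hypothetical miRNA `"UAUCACAGCC"`, the seed is `AUCACAG` and the site searched for is `CUGUGAU`, so `seed_sites("UAUCACAGCC", "AAACUGUGAUAAACUGUGAUG")` returns `[3, 13]`. A single G:U wobble inside the site is enough to make this strict rule miss it.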

[Figure: miRNA (3’→5’) paired with an mRNA target (5’→3’); positions 1-8 numbered from the miRNA 5’ end, with the seed marked]

Existing algorithms:
- focus on the quality of the sequence match between miRNA and mRNA target
- introduce various filters, e.g. evolutionary conservation

[Figure: miRNA:mRNA duplex with positions 1-9 numbered from the miRNA 5’ end; wild-type (wt) seed pairing (Brennecke et al. 05)]


miRanda

Target prediction: sequence-based rules
- miRNA-target complementarity (strong in 5’, weaker in 3’)
- Refinement with binding free energy scores
- Use of conservation to increase signal to noise
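Here is a toy version of the "strong in 5’, weaker in 3’" rule. It scores an ungapped miRNA:target duplex position by position and gives pairs in the miRNA 5’ region a larger weight. This is a hypothetical scoring scheme for illustration. It is not miRanda's actual alignment algorithm or parameters. The weight of 2.0, the 8-nucleotide 5’ window and the half credit for G:U wobbles are all assumptions.

```python
WC = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def complementarity_score(mirna: str, site: str, seed_len: int = 8,
                          seed_weight: float = 2.0) -> float:
    """Score an ungapped duplex; `site` is the target written 5'->3', paired antiparallel."""
    # Reverse the site so miRNA position p (counted from its 5' end) faces facing[p].
    facing = site[::-1]
    score = 0.0
    for p, (m, t) in enumerate(zip(mirna, facing)):
        weight = seed_weight if p < seed_len else 1.0  # hypothetical 5' up-weighting
        if (m, t) in WC:
            score += weight
        elif (m, t) in WOBBLE:
            score += 0.5 * weight                      # hypothetical partial credit
    return score
```

As an example, take the 5’ end of the rpr/miR-2 duplex shown on the reporter slides (miRNA 5’ UAUCACAG, target 5’ UUGUGAUA 3’). It has seven Watson-Crick pairs and one G:U wobble, so `complementarity_score("UAUCACAG", "UUGUGAUA")` gives 7 × 2.0 + 1.0 = 15.0.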


PicTar: Combinatorial Targets

[Figure: miRNA aligned to an mRNA, showing a perfect nucleus and an imperfect nucleus]

Filter: the site's binding energy must exceed 33% of the binding energy of the mature miRNA to a perfectly complementary site


[Figure: Anchor]


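[Figure: hidden states, background (b) and miRNAs 1…m, generating an mRNA; the example generated mRNA contains the sites ACUGUAC and GGCAUUAC]

Prior (transition) probabilities: p0, p1, p2, p3, …, pm, with p0 + p1 + … + pm = 1 (p0 for the background, pi for miRNA i)

Emission probabilities: the background emits single nucleotides (A, C, U, G); the miRNA states emit binding-site sequences such as ACUGUAC and GGCAUUAC

Model assumptions:
- Independence of binding sites (no overlapping)
- Transitions do not depend on the current state (memoryless)
- Competition between background and miRNA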
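The generative model on this slide can be sketched as a dynamic program that sums over all ways of cutting the mRNA into background nucleotides and miRNA sites. This minimal Python sketch makes four assumptions:
- the background state emits one nucleotide at a time
- each miRNA state emits a whole site drawn from a dictionary of site probabilities
- each step picks state i with the memoryless prior pi
- sites never overlap

It returns the log-likelihood ratio of the mRNA under the full model versus background alone. It mirrors the structure described on the slide but is not PicTar's actual code or parameterization, and all names and numbers are hypothetical.

```python
import math


def forward_likelihood(seq, background, sites, priors):
    """P(seq), summed over all segmentations into background bases and miRNA sites.

    background: nucleotide -> emission probability of the background state
    sites:      list with one {site sequence: emission probability} dict per miRNA 1..m
    priors:     [p0, p1, ..., pm], transition probabilities summing to 1
    """
    n = len(seq)
    F = [0.0] * (n + 1)  # F[t] = probability of generating seq[:t]
    F[0] = 1.0
    for t in range(1, n + 1):
        # Background state emits the single nucleotide seq[t-1].
        F[t] = F[t - 1] * priors[0] * background.get(seq[t - 1], 0.0)
        # miRNA state i emits a whole site ending at position t (no overlaps by construction).
        for i, site_probs in enumerate(sites, start=1):
            for site, p_emit in site_probs.items():
                L = len(site)
                if t >= L and seq[t - L:t] == site:
                    F[t] += F[t - L] * priors[i] * p_emit
    return F[n]


def log_odds(seq, background, sites, priors):
    """log P(seq | background + miRNAs) - log P(seq | background only)."""
    full = forward_likelihood(seq, background, sites, priors)
    null = math.prod(background.get(b, 0.0) for b in seq)
    return math.log(full) - math.log(null)
```

A positive score means the sequence is better explained with miRNA sites than by background alone. The "competition" on the slide shows up as p0 and the pi sharing probability mass at every step. A sequence with no sites pays a factor of p0 per nucleotide and scores below zero.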


Accessibility: The Missing Component

What about target accessibility?

[Figure: miRISC at an accessible vs. an inaccessible target site]


Experimental Method

- Drosophila tissue culture cells (S2)
- No miRNA overexpression
  - establish miRNA expression profile
  - use endogenous miRNAs (50-500 copies per cell): bantam, miR-2 family, miR-184
- Dual luciferase reporter assay
  - Renilla experiment, firefly as internal control
  - mild overexpression of target sequence (<10-fold)
  - no target degradation (20h transfection)
  - sensitive, quantitative, linear assay
- UTR engineering
  - mutate target site sequence
  - mutate sequence surrounding the target site to alter mRNA secondary structure

[Figure: reporter constructs (firefly, Renilla, 3’UTR)]


The Role of Secondary Structure

[Figure: reporter 3’UTR (with poly(A) tail) containing the target site. N: ~200 bp fragment, native structure; C: ~200 bp fragment, closed structure. Bar chart of normalized luciferase ratio (0.0-0.4) for constructs 3’UTR, N, C, C3, C3+, C5 and C5+, for the rpr target site of miR-2 (bulged nucleotides omitted):
target 5’ CUCAUCAAAGC UUGUGAUA 3’
miRNA  3’ GAGUAGUUUCG GACACUAU 5’]


Target Accessibility Matters

[Figure: normalized luciferase ratio (0.0-0.4) for the 3’UTR, native (N) and closed (C) constructs of three targets: rpr (miR-2; also C3, C3+, C5, C5+), grim (miR-2) and hid (bantam), each shown with its target site paired to the miRNA]


Accessibility as Important as Sequence

[Figure: normalized luciferase ratio for the rpr (miR-2) structure constructs (3’UTR, N, C, C3, C3+, C5, C5+) alongside target-site mutations M2, M3, M6, I5, D5 and D5+3 in the seed region (positions 1-8 from the miRNA 5’ end)]


Thermodynamic miRNA::RNA Model


[Figure: miRNA bound to its target site in the UTR]

∆G = -25.3 kcal/mol (duplex), split into ∆G5 = -15.1 kcal/mol (5’ part) and ∆G3 = -10.2 kcal/mol (3’ part)


[Figure: mRNA with CDS, UTR and poly(A) tail; the folding area around the target site]

∆G0 = -28.3 kcal/mol, ∆G1 = -19.5 kcal/mol
∆Gopen = ∆G0 - ∆G1
Folding area = target + 70 bp
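The energy terms combine by simple arithmetic. This minimal sketch assumes the convention of the PITA model, which these slides appear to follow:
- ∆G0 is the free energy of the target region in its native fold
- ∆G1 is the free energy when the target site is forced to stay unpaired
- the overall site score is ∆∆G = ∆Gduplex - ∆Gopen, where a more negative ∆∆G indicates a more favourable site

The numbers are those printed on the slides. Combining the duplex value with these ∆G0 and ∆G1 values is purely illustrative, since the slides do not say they describe the same site.

```python
def delta_g_open(dg0: float, dg1: float) -> float:
    """Energy term for opening the target site: dG0 (native) - dG1 (site unpaired)."""
    return dg0 - dg1


def delta_delta_g(dg_duplex: float, dg_open: float) -> float:
    """Overall site score (assumed PITA convention); more negative means more favourable."""
    return dg_duplex - dg_open


dg5, dg3 = -15.1, -10.2                  # slide values for the 5' and 3' parts of the duplex
dg_duplex = dg5 + dg3                    # about -25.3
dg_open = delta_g_open(-28.3, -19.5)     # about -8.8
ddg = delta_delta_g(dg_duplex, dg_open)  # about -16.5
print(f"dGduplex={dg_duplex:.1f} dGopen={dg_open:.1f} ddG={ddg:.1f}")
```

Opening the site costs energy (∆G1 is less negative than ∆G0), so ∆Gopen is negative, and subtracting it weakens the raw duplex energy. A strong duplex at an inaccessible site can therefore score worse than a weaker duplex at an open one.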


ddG Predicts Measured Repression

[Figure: 22 constructs altering accessibility of target sites in rpr, hid and grim. Normalized luciferase ratio plotted against ∆Gduplex (r = 0.36, p < 0.11), against ∆∆G (r = 0.7, p < 4×10^-4), and against ∆∆G computed with flanks of 17 bp upstream and 13 bp downstream (r = 0.77, p < 3×10^-5). Inset, exploring flank size: correlation r (0.68-0.76) as a function of upstream and downstream flank length (0-25 bp).]
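The r values on this slide measure how well predicted ∆∆G tracks the measured luciferase ratio, and the inset repeats that computation for different flank sizes. Below is a minimal sketch of the flank search. It assumes Pearson correlation, since the slide does not name the coefficient, and takes a hypothetical table of precomputed ∆∆G values for each (upstream, downstream) flank pair. Computing those ∆∆G values requires an RNA folding tool and is not shown.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def best_flank(ddg_by_flank: dict[tuple[int, int], list[float]],
               luciferase_ratio: list[float]) -> tuple[tuple[int, int], float]:
    """Return the (upstream, downstream) flank whose ddG values correlate best with
    the normalized luciferase ratio (lower ratio = stronger repression)."""
    scores = {flank: pearson_r(ddgs, luciferase_ratio)
              for flank, ddgs in ddg_by_flank.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A more negative ∆∆G should give a lower luciferase ratio, so the expected correlation is positive, as on the slide. That is why the sketch takes the maximum r.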


Native Target Analysis

[Figure: ∆∆G differential vs. measured repression differential, with panels for miR-184 targets (r = 0.87) and 190 validated targets; miRNA:mRNA duplex with positions 1-9 numbered from the miRNA 5’ end and the seed marked]

- 12 miR-184 targets with weaker 3’ pairing, tested in different backgrounds to alter secondary structure
- Non-redundant set of 190 experimentally tested miRNA:mRNA target pairs in Drosophila


Genome-Wide Target Analysis

miRNA target seeds favor highly accessible regions of the genome.

[Figure: overrepresentation of miRNA target seeds vs. random as a function of accessibility (∆Gopen), in fly and human]


Assignment

- Download the set of human microRNAs
- Download the set of human UTRs
- Download the mfold software
- For each microRNA, identify the set of targets on each UTR, defined by a perfect match to the microRNA seed (bases 2-8)
- Partition the targets of each microRNA into conserved and non-conserved targets (define a conservation cutoff)
- Compare the RNA accessibility of conserved and non-conserved targets for each microRNA:
  - For each putative target, extract the 100 bases that surround it
  - Use mfold to compute the free energy of these 100 bases
  - Create a dot plot with one point per microRNA, where the axes are the median (plot #1) or mean (plot #2) free energy of all conserved (x-axis) or non-conserved (y-axis) targets of that microRNA
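The final assignment steps can be organized as follows. This minimal sketch assumes the free energies have already been computed with mfold (or another folding tool) for each 100-base window. It also assumes each putative target is described by a hypothetical `Target` record holding its microRNA name, a conservation score and that free energy. The record layout, the centring of the window and the cutoff value are assumptions, not part of the assignment text. A "perfect match to the seed" is read here as the UTR containing the reverse complement of seed bases 2-8.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Target:             # hypothetical record for one putative target site
    mirna: str
    conservation: float   # assumed conservation score; higher = more conserved
    free_energy: float    # mfold free energy of the surrounding 100 bases


def window(utr: str, site_start: int, site_end: int, size: int = 100) -> str:
    """Return up to `size` bases centred on utr[site_start:site_end], clipped at the UTR ends."""
    centre = (site_start + site_end) // 2
    lo = max(0, centre - size // 2)
    return utr[lo:lo + size]


def summarize(targets: list[Target], cutoff: float) -> dict[str, tuple[float, ...]]:
    """Per microRNA: (median conserved, median non-conserved, mean conserved, mean non-conserved)."""
    by_mirna: dict[str, tuple[list[float], list[float]]] = {}
    for t in targets:
        cons, noncons = by_mirna.setdefault(t.mirna, ([], []))
        (cons if t.conservation >= cutoff else noncons).append(t.free_energy)
    summary = {}
    for mirna, (cons, noncons) in by_mirna.items():
        if cons and noncons:  # a dot needs both an x and a y coordinate
            summary[mirna] = (statistics.median(cons), statistics.median(noncons),
                              statistics.mean(cons), statistics.mean(noncons))
    return summary
```

Plot #1 uses the first two numbers of each tuple as (x, y), and plot #2 uses the last two. A microRNA with no targets on one side of the cutoff is left out, because it cannot be placed on the plot.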