RNA Folding CMSC 423 Lecture by Darya Filippova


RNA Folding

[Figure: a single RNA strand folded back on itself, with paired bases forming stems and unpaired bases forming loops]

RNA is single stranded and folds up:
• G and C stick together
• A and U stick together


RNA Folding Rules

RNA folding rules:
1. If two bases are closer than 4 bases apart, they cannot pair (a pair (i,j) needs j - i > 4).
2. Each base is matched to at most one other base.
3. The allowable pairs are {U, A} and {C, G}.
4. Pairs cannot “cross.”

[Figure: a folded RNA strand with its base pairs drawn]


No Crossings

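If (i,j) and (k,m) are both paired, with i < k, they cannot interleave as i < k < j < m.

Pairs that overlap have to be nested: i < k < m < j. Pairs that do not overlap sit side by side: i < j < k < m.

[Figure: pair (k,m) nested inside pair (i,j)]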
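The four rules can be checked mechanically for any proposed set of pairs. The sketch below is an illustration, not part of the lecture: the names `is_valid_folding` and `min_sep` are hypothetical, positions are 0-indexed, and rule 1 is assumed to mean j - i > 4, matching the base case used later in the recurrence.

```python
def is_complement(a, b):
    """Rule 3: the allowable pairs are {U, A} and {C, G}."""
    return {a, b} in ({"A", "U"}, {"C", "G"})


def is_valid_folding(rna, pairs, min_sep=4):
    """Check 0-indexed pairs (i, j), with i < j, against the four folding rules."""
    used = set()
    for i, j in pairs:
        if not (0 <= i < j < len(rna)):
            return False
        if j - i <= min_sep:                    # rule 1 (assumed: j - i > 4)
            return False
        if i in used or j in used:              # rule 2: at most one partner
            return False
        used.update((i, j))
        if not is_complement(rna[i], rna[j]):   # rule 3: allowable pairs only
            return False
    # Rule 4: two pairs cross when they interleave as i < k < j < m.
    for i, j in pairs:
        for k, m in pairs:
            if i < k < j < m:
                return False
    return True
```

For example, on the string ACCGGUAGU the nested pairs (0, 8) and (1, 7) pass all four checks, while the single pair (6, 8) fails rule 1.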


RNA Folding

Given: a string r = b_1 b_2 b_3 ... b_n with b_i ∈ {A,C,U,G}

Find: the largest set of pairs S = {(i,j)}, where i,j ∈ {1,2,...,n}, that satisfies the RNA folding rules.

Goal: match as many bases as possible.


Subproblems

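[Figure: the sequence with positions 1 through j marked, drawn once for each of the two cases]

• j is not paired with anything: the remaining subproblem is OPT(1, j-1)

• j is paired with some t < j-4: the pair (t,j) leaves two subproblems, OPT(1, t-1) and OPT(t+1, j-1)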


Recurrence

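If j - i > 4:

$$\mathrm{OPT}(i,j) = \max \begin{cases} \mathrm{OPT}(i,j-1) \\ \max_{t}\,\{1 + \mathrm{OPT}(i,t-1) + \mathrm{OPT}(t+1,j-1)\} \end{cases}$$

In the 2nd case above, we try all possible t with which to pair j. That is, t runs from i to j-5, so that the pair (t,j) satisfies j - t > 4, and only over those t whose base can pair with the base at j.

If j - i ≤ 4:

$$\mathrm{OPT}(i,j) = 0$$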
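The recurrence can be transcribed almost literally as a memoized recursive function. This is a minimal sketch rather than the lecture's code: `max_pairs` and `opt` are hypothetical names, positions are 1-based as on the slides, and a pair (t, j) is assumed to need j - t > 4, consistent with the base case. Because it recurses, it only suits short sequences.

```python
from functools import lru_cache


def max_pairs(rna):
    """Return OPT(1, n): the largest number of pairs allowed by the folding rules."""
    allowed = ({"A", "U"}, {"C", "G"})

    @lru_cache(maxsize=None)
    def opt(i, j):
        if j - i <= 4:                  # base case, also covers empty intervals
            return 0
        best = opt(i, j - 1)            # case 1: j is not paired with anything
        for t in range(i, j - 4):       # case 2: t = i, ..., j - 5
            if {rna[t - 1], rna[j - 1]} in allowed:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, len(rna))
```

On ACCGGUAGU the only admissible pairs are (1,6), (1,9), (2,8) and (3,8), and at most two of them fit together without crossing, so `max_pairs("ACCGGUAGU")` returns 2.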


Order to solve the subproblems

• In what order should we solve the subproblems?


• Which subproblems do we need in order to solve OPT(i,j)?

OPT(i,j-1), plus OPT(i,t-1) and OPT(t+1, j-1) for every t between i and j

• In what sense are these problems “smaller?”

• They involve smaller intervals of the string:

We solve OPT(i,j) in order of increasing value of j - i.
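One way to produce this order is to loop over the interval length first and the start position second. The generator below is a sketch with hypothetical names (`fill_order`, `min_sep`). It uses 1-based positions and skips the base-case intervals with j - i ≤ 4, which are assumed to be 0 already.

```python
def fill_order(n, min_sep=4):
    """Yield 1-based intervals (i, j) with j - i > min_sep, shortest first."""
    for k in range(min_sep + 1, n):       # k = j - i, smallest difference first
        for i in range(1, n - k + 1):     # every start position that fits
            yield i, i + k
```

With n = 7 this yields (1, 6), (2, 7), then (1, 7). Every interval that OPT(i,j) depends on is shorter than (i,j) itself, so it has already been visited.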


Filling in the matrix

[Figure: the n × n matrix of OPT(i,j) values, with axes i and j running from 1 to n]

Only use half of the matrix: the entries with i < j.

[Figure: the same matrix, with entries filled in order of increasing j-i]



Case 1

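[Figure: the matrix, highlighting the cell OPT(i,j) and the cell OPT(i,j-1) that it depends on]

$$\mathrm{OPT}(i,j) = \max \begin{cases} \mathrm{OPT}(i,j-1) \\ \dots \end{cases}$$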



Case 2

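[Figure: the matrix, highlighting the cell OPT(i,j) and the cells OPT(i,t-1) and OPT(t+1,j-1) that it depends on]

$$\mathrm{OPT}(i,j) = \max \begin{cases} \dots \\ \max_{t}\,\{1 + \mathrm{OPT}(i,t-1) + \mathrm{OPT}(t+1,j-1)\} \end{cases}$$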



Code

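    # Helper not shown on the slide; assumed to build a rows x cols matrix of zeros.
    def make_matrix(rows, cols):
        return [[0] * cols for _ in range(rows)]

    # Helper not shown on the slide; assumed to test the allowable pairs {U, A} and {C, G}.
    def is_complement(a, b):
        return {a, b} in ({"A", "U"}, {"C", "G"})

    def rnafold(rna):
        n = len(rna)
        OPT = make_matrix(n, n)
        Arrows = make_matrix(n, n)
        for k in range(5, n):                # interval length j - i
            for i in range(n - k):           # interval start
                j = i + k                    # interval end
                best_t = OPT[i][j-1]         # case 1: j is not paired
                arrow = -1
                for t in range(i, j - 4):    # case 2: partners t with j - t > 4
                    if is_complement(rna[t], rna[j]):
                        val = 1 + \
                            (OPT[i][t-1] if t > i else 0) + OPT[t+1][j-1]
                        if val >= best_t:
                            best_t, arrow = val, t
                OPT[i][j] = best_t
                Arrows[i][j] = arrow         # -1, or the t that j pairs with
        return OPT, Arrows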


Backtrace code

    def rna_backtrace(Arrows):
        Pairs = []                         # holds the pairs in the optimal solution
        Stack = [(0, len(Arrows) - 1)]     # tracks cells we have to visit
        while len(Stack) > 0:
            i, j = Stack.pop()
            if j - i <= 4:
                continue                   # if cell is base case, skip it

            # Arrow = -1 means we didn't match j
            if Arrows[i][j] == -1:
                Stack.append((i, j - 1))
            else:
                t = Arrows[i][j]
                Pairs.append((t, j))       # save that j matched with t
                # add the two daughter problems
                if t > i:
                    Stack.append((i, t - 1))
                Stack.append((t + 1, j - 1))
        return Pairs
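A list of pairs is easier to read once it is drawn along the sequence. The helper below is an illustrative sketch, not from the lecture: `to_dot_bracket` is a hypothetical name, and it assumes 0-indexed pairs (t, j) like those collected by the backtrace. It writes "(" and ")" at the two ends of each pair and "." at every unpaired position.

```python
def to_dot_bracket(n, pairs):
    """Render 0-indexed pairs over a length-n sequence as a dot-bracket string."""
    marks = ["."] * n                 # "." marks an unpaired base
    for t, j in pairs:
        marks[min(t, j)] = "("        # left end of the pair
        marks[max(t, j)] = ")"        # right end of the pair
    return "".join(marks)
```

For example, the pairs (0, 8) and (1, 7) on a sequence of length 9 render as `((.....))`. Because pairs never cross, the brackets always balance like ordinary parentheses.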


Subproblems, 2

• We have a subproblem for every interval (i,j)

• How many subproblems are there?


$$\binom{n}{2} = O(n^2)$$
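Written out as a quick check: there is one interval for each way of choosing two endpoints i < j from the n positions, so

$$\binom{n}{2} = \frac{n(n-1)}{2} \le \frac{n^2}{2}.$$

For n = 10 this gives 45 intervals.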


Running Time

• O(n^2) subproblems

• Each takes O(n) time to solve (we have to search over all possible choices of t)

• Total running time is O(n^3).
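The cubic bound can be made concrete by counting how many (i, j, t) combinations the innermost loop examines. The sketch below uses hypothetical names (`count_candidates`, `closed_form`) and assumes t ranges over i ≤ t < j - 4. Under that assumption the count has a closed form as a binomial coefficient, which is a cubic polynomial in n.

```python
from math import comb


def count_candidates(n):
    """Count the (i, j, t) triples examined when t runs over i <= t < j - 4."""
    total = 0
    for k in range(5, n):               # k = j - i
        total += (n - k) * (k - 4)      # (n - k) intervals, (k - 4) choices of t
    return total


def closed_form(n):
    """The same count as (n-3 choose 3) = (n-3)(n-4)(n-5)/6, for n >= 3."""
    return comb(n - 3, 3)
```

For n = 10 both functions give 35. The closed form grows like n^3 / 6, which agrees with the O(n^3) total.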


Summary

• This is essentially “Nussinov’s algorithm,” which was proposed for finding RNA structures in 1978.

• Same dynamic programming idea: write the answer to the full problem in terms of the answer to smaller problems.

• Still have an O(n^2) matrix to fill.

• Main differences from sequence alignment:
    • We fill in the matrix in a different order: entries (i,j) in order of increasing j - i.
    • We have to try O(n) possible subproblems inside the max. This leads to an O(n^3) algorithm.


Pseudoknots

(Staple & Butcher, PLoS Biol, 2005)