RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export...
-
Upload
jevon-christian -
Category
Documents
-
view
216 -
download
2
Transcript of RNA structure prediction. RNA functions RNA functions as –mRNA –rRNA –tRNA –Nuclear export...
RNA structure prediction
RNA functions
• RNA functions as– mRNA– rRNA– tRNA– Nuclear export– Spliceosome– Regulatory molecules (RNAi)– Enzymes– Virus– Retrotransposons– Medicine
Base pairs
• C-G stronger than U-A• Non-standard G-U
CC
C
N
N C
O
C
CC
C
O
N C
O
N
cytosine
Uracyl
N
CC
C
NC
NC
N
N
O
N
CC
C
NC
NC
N
N
Adenine
Guanine
PYRIMIDINES PURINES
H donor acceptor
• Base-pairs are usually coplanar
• are almost always stacked
• stems – continuous stacks
• 3D structure of a stack is a helix
hairpin
Stacking
Predictable structures
Hard-to-predict structures
• Pseudoknots, kissing hairpins, hairpin-bulge
Secondary structure notations
Tertiary structure
RNAi
Structure representation
Main approaches to RNA secondary structure prediction
• Energy minimization – dynamic programming approach– does not require prior sequence alignment– require estimation of energy terms
contributing to secondary structure
• Comparative sequence analysis– Using sequence alignment to find conserved
residues and covariant base pairs.– most trusted
Dotplot
Think!
• Make a dotplot of an RNA molecule– Sequence : GGGAAAUCC
• What is the secondary structure?
Dynamic programming approach
• Nussinov algorithm
Dynamic programming approach
a) i,j is paired E(i,j) = E(i+1,j-1) + (ri,rj)b) i is unpaired E(i,j) = E(i+1,j) c) j is unpaired E(i,j) = E(i,j-1)d) bifurcation E(i,j) = E(i,k)+E(k+1,j)
i+1 j-1 i+1 j ji j-1i j i
ik k+1
a) b) c) d)
Let E(i,j) = minimum energy for subchain starting at i and ending at j(ri,rj) = energy of pair ri, rj (rj = base at position j)
RNA secondary structure algorithm
• Given: RNA sequence x1,x2,x3,x4,x5,x6,…,xL
• Initialization: E(i, i-1) = 0 for i = 2 to LE(i, i) = 0 for i = 1 to L
Recursion: for n = 2 to L # iteration over length
E(i,j) = min { E(i+1, j), E(i, j-1), E(i+1, j-1)+ (ri,rj) ,
min i<k<j {E(i,k)+E(k+1, j)} }
• Cost: O(n3)
ExampleLet (ri,rj) = -1 if ri,rj form a base pair and 0 otherwise Input : GGAAAUCC
G G A A A U C C
G 0
G 0 0
A 0 0
A 0 0
A 0 0
U 0 0
C 0 0
C 0 0
E(i,j) = lowestenergy conformation for subchain from i to j
i
j
Here we should have min energy for AAAUC
Example-continued
G G A A A U C C
G 0 0
G 0 0 0
A 0 0 0
A 0 0 0
A 0 0 -1
U 0 0 0
C 0 0 0
C 0 0
GGA (i=2, j=3)min { 0,
0,0+ (GA)
} = 0
AAU (i=5, j=6)min { 0,
0,0+ (AU)
} = -1
-1
0
i
j
Recovering the structure from the DP table
• Complexity O(n3)• Main difference to sequence alignment – we are
tracing back a tree-like structure not a single optimal path (bifurcation introduces branch points).
• Method 1: Leave pointers as you compute the table: for each element of the table store (at most two) pointers to the subsequences used in the solution.
• Method 2: Recover history based on numerical values in the table.– Stacking – check value along diagonal– Bifurcation - find k such that E(i,k)+E(k+1,j) =
E(i,j)
More realistic energy function
Stacking energies
Even more realistic energy function
Loops have destabilizing effect structure (d) should have lower energy that (b).
Destabilizing contribution of loops should depend on the loop length (k).
Stacking has additional stabilizing contribution .
(k) (k)
(k)
More realistic energy function requires slightly more involved
recurrenceE(i,j) = min{ E(i+1,j), E(i,j-1), min{E(i,k)+E(k+1,j), L(i,j)} where
L(i,j) = {(ri,rj) + (j-i-1) if L(i,j) is a hairpin loop;
(ri,rj) + ij-1if hairpin
mink{(ri,rj) + (k)+E(i+k+1,j-1)} if i-bulge
mink {(ri,rj) + (k)+E(i+1,j-k-1)} if j-bulge
mink1,k2{(ri,rj) + (k1+k2)+E(i+k1+1,j-k2-1)} if internal loop
}Extra “min” gives O(n4) algorithm
Covariance method
In a correct multiple alignment RNAs, conserved base pairs are often revealed by the presence of frequent correlated compensatory mutations.
Two boxed positions are covarying to maintain Watson-Crick complementary. This covariation implies a base pair which may then be extended in both directions.
GCCUUCGGGCGACUUCGGUCGGCUUCGGCC
Alignment
Quantities measure of pairwise sequence covariation
Mutual information Mij between two aligned columns i, j
Mij = i,j fxixj log2 (fxixj/fxi fxj)Where fxixj frequency of the pair (observed)
fxi frequency of nucleotide xi at position iObservations:
0 <= Mij <=2
i,j uncorrelated Mij = 0
MI: examples
A
A
C
G
U
U
G
C
fAi = .5fCi = .25fGi = .25fUj = .5fCj = .25fGj = .25
fAU = .5fCG = .25fGC = .25
Mij = xixj fxixj log2 (fxixj/fxi fxj) =
.5 log2 (.5/(.5*.5))+2*.25 log2 (.25/(.25*.25))=
.5 *1 +.5*2 = 1.5A
A
A
A
U
U
U
UMij = 1 log 1 = 0
U
A
C
G
A
U
G
C
Mij = 4*.25 log 4 = 2
i j
Other methods
• HMMs• Stochastic context free grammars
Conclusion
• RNA secondary structure prediction– Single sequence:
• Dot-plot• Nussinov dynamic programming• Energy function
– Covariance analysis• Mutual information• Hidden Markov Models • SCFGs
Finding “most probable structure”
• S – structure then, E(S) free energy of S p(S) = exp(-E(S)/kT)/Q Q = x exp(-E(x)/kT) ) partition function• Problem: computing Q• Method to compute Q – dynamic
programming (similar as presented before but scores are replaced with probabilities and min energy with sum of probabilities).
tRNA
Answer
http://ludwig-sun2.unil.ch/~bsondere/nussinov/form.html#CYK)