
Lecture 11. RNA Secondary Structure Prediction

The Chinese University of Hong Kong
CSCI3220 Algorithms for Bioinformatics


CSCI3220 Algorithms for Bioinformatics | Kevin Yip-cse-cuhk | Fall 2015 2

Lecture outline
1. From sequences to functions
2. RNA secondary structures

Last update: 21-Nov-2015

Part 1: FROM SEQUENCES TO FUNCTIONS

From sequences to functions
• One of the biggest questions in biology: can one tell the function of a molecule (DNA/RNA/protein) from its sequence alone?
  – Sometimes, but usually not
  – Easier if we also know the structure
  – Common belief: sequence → structure → function
  – Of course, function also depends on the environment

Molecular structures
• Four levels:
  – Primary structures
    • The sequence
  – Secondary structures
    • First formed
    • Local
  – Tertiary structures
    • Global
    • “Folds”, “domains”
  – Quaternary structures
    • Multiple molecules

Image credit: http://www.personal.psu.edu/jms5704/blogs/simmons/levels_of_protein_s_c_la_784.jpg

Primary structures
• Connections (strong covalent bonds vs. weak hydrogen bonds)
  – Which molecules are connected
  – Which atoms are connected
  – First-level constraints of the possible structures
    • Example: Molecules close in primary structure must also be close in secondary, tertiary and quaternary structures

Image credit: Wikibooks

Primary structures
• Orientation:
  – DNA, RNA: 5’-3’
  – Amino acids: Amino (N) terminus to carboxyl (C) terminus
• “Residue”: what remains after a water molecule is expelled

Image credit: http://bealbio.wikispaces.com/file/view/dsDNA.jpg, http://attentionmanagement.ca/userfiles/image/DNA-RNA%20directions.gif, http://www.phschool.com/science/biology_place/biocoach/images/translation/peptbond.gif, http://www.cystinuria.org/resources/education/aminoacids/peptide.gif

DNA secondary structures
• Double helix
• A-DNA (dehydrated samples)
  – Right-handed
  – 11 bp per turn
• Most common: B-DNA
  – Right-handed
  – 10.5 bp per turn
• Z-DNA (some methylated DNA)
  – Left-handed
  – 12 bp per turn

Image credit: Wikipedia

DNA secondary structures
[Figure: A-DNA, B-DNA and Z-DNA]

Image credit: Wikipedia

RNA secondary structures
• Can largely be projected onto a 2D plane

[Figure: annotated RNA secondary structure showing a stem/hairpin loop, stacking pairs, a bulge, an internal loop, a multi-loop, an exterior loop, dangling nucleotides, a less stable pair and coaxial stacking]

Image credit: http://www.clcbio.com/scienceimages/rna_prediction/RNA_structure_prediction_web.png

RNA secondary structures
• Pseudoknots: complex structures

Image credit: Wikipedia, Sperschneider and Datta, RNA 14(4):630-640, (2008)

Protein secondary structures
• Three main types:
  – α-helices
  – β-sheets
  – Coils (connectors)

Image credit: http://calcium.uhnres.utoronto.ca/cadherin/images/pub_pages/general/ribbon.jpg, http://www.mun.ca/biology/scarr/MGA2-03-25.jpg

DNA tertiary structures
• Wrapped around nucleosomes formed by histone proteins
• Condensed form at beginning of mitosis and meiosis

Image credit: http://micro.magnet.fsu.edu/cells/nucleus/images/chromatinstructurefigure1.jpg, Wikipedia

RNA tertiary structures
• Overall structure of an RNA
  – More studied for RNAs that are not translated into proteins (“non-coding” RNAs)
  – Example: tRNA

Image credit: Wikipedia

Protein tertiary structures
• Complex structures
  – Mainly caused by weak forces (hydrogen bonds and hydrophobic interactions)
  – Occasionally stronger forces (disulfide bonds between cysteines)
• The CATH hierarchy
  – Class: composition of secondary structures
  – Architecture: overall shape
  – Topology: connection of secondary structures
  – Homologous: with common ancestor

Image credit: CATH

Quaternary structures
• Types:
  – Protein subunit-protein subunit
  – Protein-protein
  – Protein-DNA
  – Protein-RNA
  – (Protein-small molecules)
  – RNA-RNA
  – ...

[Figures: protein-DNA interaction; protein-subunit interaction (hemoglobin)]

Image credit: Wikipedia, http://serrano.crg.es/images/protein_dna1.jpg

Structure and function
• Why does function depend on structure?
  1. Structure itself is the function (e.g., tubulins)
  2. Binding
    • Complementarity of interacting structures
    • Formation of special bonds

Image credit: http://www.nigms.nih.gov/NR/rdonlyres/54BEAC37-47A9-454A-BC4F-B94EA127FA1E/0/fig1a_large.jpg, http://upload.wikimedia.org/wikimedia/en-labs/7/7f/Protein_Protein_Docking.JPG

Structure and function
• Why does function depend on structure? (cont’d)
  3. Functional group (e.g., catalytic site)
  4. Determining localization (e.g., transporter membrane proteins)

Image credit: http://www.catalysis-ed.org.uk/principles/images/enzyme_substrate.gif, Spudich, Science 288(5470):1358-1359, 2000

Part 2: RNA SECONDARY STRUCTURES

Important RNA classes
• Coding:
  – Messenger RNAs (mRNAs)
    • For translating into proteins
• Non-coding:
  – Ribosomal RNAs (rRNAs)
    • Parts of the ribosome complex
  – Transfer RNAs (tRNAs)
    • Delivering free amino acids during translation
  – Micro RNAs (miRNAs)
    • Binding mRNA targets to promote RNA degradation or repress translation
  – Small nucleolar RNAs (snoRNAs)
    • Guiding chemical modifications of other RNAs
  – Small nuclear RNAs (snRNAs)
    • Involved in mRNA splicing
  – Long non-coding RNAs (lncRNAs)
    • Some involved in gene regulation
  – ...

Image source: http://legacy.hopkinsville.kctcs.edu/sitecore/instructors/Jason-Arnold/VLI/Module%201/m1DNAfunction/m1DNAfunction3.html

Importance of RNA structures
• Structure is important to many classes of RNA
• Examples: tRNA, snoRNA

Image sources: http://www.bio.miami.edu/dana/pix/tRNA.jpg, http://lowelab.ucsc.edu/images/CDBox.jpg

Representing RNA secondary structures
• Formats (see http://projects.binf.ku.dk/pgardner/bralibase/RNAformats.html):
  – Dot-bracket format
  – Stockholm format
  – ...

Dot-bracket format
• Sequence (102 nucleotides):
  GUGAAUGAUGAAUUUAAUUCUUUGGUCCGUGUUUAUGAUGGGAAGUAAGACCCCCGAUAUGAGUGACAAAAGAGAUGUGGUUGACUAUCACAGUAUCUGACG
• Structure (one symbol per nucleotide: "." for an unpaired base, matching "(" and ")" for a base pair):
  ......((((.......((((((.(((....((((((.((((..........)))).)))))).))).)))))).((((((.....)))))).)))).....

Image credit: Xihao Hu
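Reading a dot-bracket string back into base pairs only needs a stack. The sketch below is illustrative: the function name `parse_dot_bracket` is an assumption, positions are 1-based to match the slides, and the example string is rebuilt from run lengths counted off the structure above.

```python
def parse_dot_bracket(structure):
    """Return sorted 1-based (i, j) pairs from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)          # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # closes the most recent open base
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

# The lecture's structure, rebuilt as (symbol, run length) pairs.
runs = [(".", 6), ("(", 4), (".", 7), ("(", 6), (".", 1), ("(", 3), (".", 4),
        ("(", 6), (".", 1), ("(", 4), (".", 10), (")", 4), (".", 1), (")", 6),
        (".", 1), (")", 3), (".", 1), (")", 6), (".", 1), ("(", 6), (".", 5),
        (")", 6), (".", 1), (")", 4), (".", 5)]
example = "".join(symbol * count for symbol, count in runs)
pairs = parse_dot_bracket(example)
```

With this input, `pairs` contains (7, 97), (18, 74) and (81, 87), the pairs quoted later in the problem definition.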

Predicting RNA secondary structures
• A basic assumption in structure predictions:
  – Real structure has the lowest free energy
    • In a simplified view, more stable bonds lower free energy
• In the case of RNA secondary structures:
  – Good to form more pairs
    • A-U
    • C-G
    • Sometimes G-U (a “wobble base pair”)
  – Good to form more stable pairs
    • C-G > A-U > G-U
  – Good to have stable sub-structures
    • E.g., stacking pairs
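The pairing rules can be written down as a small lookup. This is a minimal sketch: the names `PAIR_RANK` and `pair_rank` and the integer ranks are hypothetical and encode only the ordering C-G > A-U > G-U from the slide, not measured energies.

```python
# Hypothetical ranks; only their ordering (C-G > A-U > G-U) comes from the slide.
PAIR_RANK = {frozenset("CG"): 3, frozenset("AU"): 2, frozenset("GU"): 1}

def pair_rank(x, y):
    """Return the stability rank of bases x and y, or 0 if they cannot pair."""
    return PAIR_RANK.get(frozenset((x, y)), 0)
```

Using an unordered key means `pair_rank("G", "U")` and `pair_rank("U", "G")` agree, which matches the symmetric nature of a base pair.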

Predicting RNA secondary structures
• We will assume there are no pseudoknots
  – With pseudoknots, currently there is no known algorithm that can find the optimal solution efficiently
• We need two things:
  1. A thermodynamic model for computing the free energy of a structure
  2. A method for finding the structure with the minimum free energy
  – Does this setting sound familiar?

[Figure: a pseudoknot]

Image credit: Wikipedia

Further assumptions
1. The free energy of a secondary structure is the sum of the free energies of the sub-structures
  – Not the sum of individual bases/base pairs, as one base pair can participate in multiple sub-structures
2. The free energies of the sub-structures are independent

Problem definition
• Given an RNA sequence, find a set of base pairs so that each base is paired at most once
• Example:
  – Input sequence: GUGAAUGAUGAAUUU...ACG
  – Output set of base pairs:
    • (7, 97)
    • (8, 96)
    • ...
    • (18, 74)
    • ...
    • (81, 87)

Image credit: Xihao Hu
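A candidate output can be checked mechanically. The sketch below uses a hypothetical helper name, `is_valid_pair_set`. It tests the "each base paired at most once" condition from the problem definition and, as an extra assumption carried over from the no-pseudoknot setting, also rejects crossing pairs.

```python
def is_valid_pair_set(pairs, n):
    """Check 1-based pairs on a length-n sequence:
    in range, each base used at most once, and no crossing pairs."""
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= n) or i in used or j in used:
            return False
        used.update((i, j))
    # Crossing pairs (i < k < j < l) would form a pseudoknot.
    return not any(i < k < j < l for i, j in pairs for k, l in pairs)
```

For example, the four pairs listed on the slide pass the check for n = 102, while a set that reuses a base, such as (1, 5) and (5, 9), does not.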

Linear view
[Figure: the example structure laid out linearly, with position numbers above the corresponding dot-bracket symbols]

Thermodynamics model
• We will consider four types of sub-structures here:
  – Stacking pairs: both (i, j) and (i+1, j-1) are in the set
  – Hairpin loop: there is a pair (i, j), where all bases from i+1 to j-1 are not paired
  – Bulge/Internal loop: there are two pairs (i, j) and (i1, j1), where i < i1 < j1 < j, and all bases from i + 1 to i1 - 1 and from j1 + 1 to j - 1 are not paired
  – Multi-loop: there are pairs (i, j), (i1, j1), ..., (ik, jk), where i < i1 < j1 < ... < ik < jk < j, and all bases from i + 1 to i1 - 1, from j1 + 1 to i2 - 1, ..., from j(k-1) + 1 to ik - 1, and from jk + 1 to j - 1 are unpaired
• One base pair can participate in multiple structures
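These definitions can be turned into a small classifier for the loop closed by a given pair. This is a sketch under assumptions: the function names are hypothetical, the input pair set is pseudoknot-free, and the bulge/internal distinction (a bulge has unpaired bases on only one side) follows the lecture's later bulge/internal loop slide.

```python
def pairs_from_dot_bracket(structure):
    """Helper: 1-based pairs from a well-formed, pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            pairs.append((stack.pop(), pos))
    return sorted(pairs)

def enclosed_pairs(pairs, i, j):
    """Pairs directly inside (i, j), i.e. not nested within another pair inside (i, j)."""
    inner = sorted(p for p in pairs if i < p[0] and p[1] < j)
    direct, reach = [], i
    for a, b in inner:
        if a > reach:          # starts after the previous direct pair has closed
            direct.append((a, b))
            reach = b
    return direct

def loop_type(pairs, i, j):
    """Classify the sub-structure closed by the pair (i, j)."""
    inner = enclosed_pairs(pairs, i, j)
    if not inner:
        return "hairpin"
    if len(inner) >= 2:
        return "multi-loop"
    i1, j1 = inner[0]
    if (i1, j1) == (i + 1, j - 1):
        return "stacking"
    if i1 == i + 1 or j1 == j - 1:
        return "bulge"         # unpaired bases on one side only
    return "internal loop"
```

A pair such as (i, j) can also be the inner pair of another sub-structure, which is the sense in which one base pair participates in multiple structures.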

Stacking pairs
• Both (i, j) and (i+1, j-1) are in the set
• E.g., i: 20, j: 72

[Figure: linear view of the example with positions i, i+1, j-1 and j marked]

Hairpin loop
• There is a pair (i, j), where all bases from i+1 to j-1 are not paired
• E.g., i: 81, j: 87

[Figure: linear view of the example with positions i and j marked]

Image source: http://img.ehowcdn.com/article-new/ds-photo/getty/article/151/226/87820768_XS.jpg

Bulge/Internal loop
• Internal loop: There are two pairs (i, j) and (i1, j1), where i < i1 < j1 < j, and all bases from i + 1 to i1 - 1 and from j1 + 1 to j - 1 are not paired
  – Called a bulge if only one side has unpaired bases
• E.g., i: 23, j: 69, i1: 25, j1: 67

[Figure: linear view of the example with positions i, i1, j1 and j marked]

Multi-loop
• Multi-loop: There are pairs (i, j), (i1, j1), ..., (ik, jk), where i < i1 < j1 < ... < ik < jk < j, and all bases from i + 1 to i1 - 1, from j1 + 1 to i2 - 1, ..., from j(k-1) + 1 to ik - 1, and from jk + 1 to j - 1 are unpaired
• E.g., k = 2, i: 10, j: 94, i1: 18, j1: 74, i2: 76, j2: 92

[Figure: linear view of the example with positions i, i1, j1, i2, j2 and j marked]

One possible thermodynamic model
• Unpaired bases have 0 free energy and all the terms below have negative free energy
• eS(i, j): for the stacking pairs (i, j) and (i+1, j-1)
• eH(i, j): for the hairpin loop closed at (i, j)
• eBI(i, j, i1, j1): for a bulge or internal loop enclosed by the pairs (i, j) and (i1, j1)
• eM(i, j, i1, j1, ..., ik, jk): for a multi-loop that consists of the pairs (i, j), (i1, j1), ..., (ik, jk) and satisfying i < i1 < j1 < ... < ik < jk < j

Finding the optimal structure
• Dynamic programming
• Let s be the RNA sequence with n nucleotides
• Tables:
  – V(j): free energy of the optimal structure for s[1..j]
    • Final answer is based on V(n)
  – VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
  – VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop
  – VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop

Update formulas
• V(j): free energy of the optimal structure for s[1..j]
• V(1) = 0 (and V(0) = 0 for the empty prefix, since the formula below refers to V(i-1) with i = 1)
• For j > 1,

$$
\mathrm{V}(j) = \min \begin{cases}
\mathrm{V}(j-1) & j \text{ is unpaired} \\
\min_{1 \le i < j} \{ \mathrm{VP}(i,j) + \mathrm{V}(i-1) \} & j \text{ pairs with } i
\end{cases}
$$

[Figure: the two cases for position j in s[1..j]: j is unpaired, or j pairs with some i, leaving s[1..i-1] to be solved separately]

Update formulas
• VP(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair
• We require that i < j

$$
\mathrm{VP}(i,j) = \min \begin{cases}
\mathrm{eS}(i,j) + \mathrm{VP}(i+1,j-1) & (i,j) \text{ and } (i+1,j-1) \text{ form stacking pairs} \\
\mathrm{eH}(i,j) & (i,j) \text{ closes a hairpin loop} \\
\mathrm{VBI}(i,j) & (i,j) \text{ closes a bulge or internal loop} \\
\mathrm{VM}(i,j) & (i,j) \text{ closes a multi-loop}
\end{cases}
$$

[Figure: cases for the pair (i, j): stacking pairs with (i+1, j-1), or a hairpin loop in which all enclosed bases are unpaired]

Update formulas
• VBI(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a bulge or internal loop (i.e., (i, j) is the outer pair and some (i1, j1) with i < i1 < j1 < j is the inner pair)

$$
\mathrm{VBI}(i,j) = \min_{i_1, j_1 :\, i < i_1 < j_1 < j} \{ \mathrm{eBI}(i,j,i_1,j_1) + \mathrm{VP}(i_1,j_1) \}
$$

[Figure: bulge or internal loop closed by (i, j) and (i1, j1), with all bases between i and i1 and between j1 and j unpaired]

Update formulas
• VM(i, j): free energy of the optimal structure for s[i..j] with i and j forming a pair that closes a multi-loop

$$
\mathrm{VM}(i,j) = \min_{i_1, j_1, \ldots, i_k, j_k :\, i < i_1 < j_1 < \cdots < i_k < j_k < j} \left\{ \mathrm{eM}(i,j,i_1,j_1,\ldots,i_k,j_k) + \sum_{h=1}^{k} \mathrm{VP}(i_h,j_h) \right\}
$$
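To see how the tables fit together, here is a reduced sketch of the recurrences, not the full model. Assumptions: the constant energies (eS = -2, eH = -1, eBI = -1) are toy values chosen only to be negative, a hairpin must enclose at least three unpaired bases, the multi-loop table VM is left out entirely, the pair (i+1, j-1) is handled by the stacking case rather than by VBI, and only the optimal energy is returned (no traceback). The name `min_free_energy` is hypothetical.

```python
import math
from functools import lru_cache

CAN_PAIR = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases inside a hairpin

def min_free_energy(s):
    """Toy minimum free energy of s using V, VP and VBI only (VM omitted)."""
    n = len(s)

    # Toy energy terms (assumptions, not measured values).
    def eS(i, j): return -2
    def eH(i, j): return -1
    def eBI(i, j, i1, j1): return -1

    @lru_cache(maxsize=None)
    def VP(i, j):
        # Positions are 1-based; (i, j) must be able to pair.
        if j - i - 1 < MIN_HAIRPIN or (s[i - 1], s[j - 1]) not in CAN_PAIR:
            return math.inf
        return min(eH(i, j),                      # hairpin loop
                   eS(i, j) + VP(i + 1, j - 1),   # stacking pairs
                   VBI(i, j))                     # bulge or internal loop

    @lru_cache(maxsize=None)
    def VBI(i, j):
        best = math.inf
        for i1 in range(i + 1, j):
            for j1 in range(i1 + 1, j):
                if (i1, j1) != (i + 1, j - 1):    # stacking is handled in VP
                    best = min(best, eBI(i, j, i1, j1) + VP(i1, j1))
        return best

    V = [0] * (n + 1)  # V[0] = V[1] = 0
    for j in range(2, n + 1):
        V[j] = min(V[j - 1],
                   min(VP(i, j) + V[i - 1] for i in range(1, j)))
    return V[n]
```

For the short hairpin `GGGAAACCC` the sketch finds three stacked G-C pairs closing an AAA loop: two stacking terms plus one hairpin term, for a total of -5 under these toy energies.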

Time and space requirements
• V: n entries, each takes O(n) time
• VP: O(n^2) entries, each takes constant time

$$
\mathrm{V}(j) = \min \begin{cases}
\mathrm{V}(j-1) & j \text{ is unpaired} \\
\min_{1 \le i < j} \{ \mathrm{VP}(i,j) + \mathrm{V}(i-1) \} & j \text{ pairs with } i
\end{cases}
$$

$$
\mathrm{VP}(i,j) = \min \begin{cases}
\mathrm{eS}(i,j) + \mathrm{VP}(i+1,j-1) & (i,j) \text{ and } (i+1,j-1) \text{ form stacking pairs} \\
\mathrm{eH}(i,j) & (i,j) \text{ closes a hairpin loop} \\
\mathrm{VBI}(i,j) & (i,j) \text{ closes a bulge or internal loop} \\
\mathrm{VM}(i,j) & (i,j) \text{ closes a multi-loop}
\end{cases}
$$

Time and space requirements
• VBI: O(n^2) entries, each takes O(n^2) time
• VM: O(n^2) entries, each takes O(n^(2k)) time

$$
\mathrm{VBI}(i,j) = \min_{i_1, j_1 :\, i < i_1 < j_1 < j} \{ \mathrm{eBI}(i,j,i_1,j_1) + \mathrm{VP}(i_1,j_1) \}
$$

$$
\mathrm{VM}(i,j) = \min_{i_1, j_1, \ldots, i_k, j_k :\, i < i_1 < j_1 < \cdots < i_k < j_k < j} \left\{ \mathrm{eM}(i,j,i_1,j_1,\ldots,i_k,j_k) + \sum_{h=1}^{k} \mathrm{VP}(i_h,j_h) \right\}
$$

Time and space requirements
• Summary:
  – V: n entries, each takes O(n) time
  – VP: O(n^2) entries, each takes constant time
  – VBI: O(n^2) entries, each takes O(n^2) time
  – VM: O(n^2) entries, each takes O(n^(2k)) time
• Total: O(n^2) space, O(n^(2k+2)) time
  – Exponential if k is unbounded
  – Some approximations could bring the time down to O(n^4), which is still huge for large n but feasible for small or medium n
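The totals follow from multiplying the number of entries by the per-entry cost for each table and keeping the dominant term. A short worked sum, assuming k is a fixed bound (k ≥ 1) on the number of inner pairs in a multi-loop:

```latex
\begin{align*}
T(n) &= \underbrace{n \cdot O(n)}_{\mathrm{V}}
      + \underbrace{O(n^2) \cdot O(1)}_{\mathrm{VP}}
      + \underbrace{O(n^2) \cdot O(n^2)}_{\mathrm{VBI}}
      + \underbrace{O(n^2) \cdot O(n^{2k})}_{\mathrm{VM}} \\
     &= O(n^2) + O(n^2) + O(n^4) + O(n^{2k+2}) \\
     &= O(n^{2k+2}), \\
S(n) &= \underbrace{O(n)}_{\mathrm{V}}
      + \underbrace{3 \cdot O(n^2)}_{\mathrm{VP},\ \mathrm{VBI},\ \mathrm{VM}}
      = O(n^2).
\end{align*}
```

The VM term dominates for every k ≥ 1, which is why approximating the multi-loop energy is where the savings come from.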

Some remarks
• If we allow general pseudoknots, there is currently no known efficient way to find the RNA secondary structure with the minimum free energy
• Other methods to predict RNA secondary structures:
  – Conservation and covariation (alignment columns numbered 1 to 5 below)
    • High conservation: columns 2 and 4
    • Strong covariation: columns 1 and 5
  – Experimental methods (e.g., RNA footprinting)

  1 2 3 4 5
  A C G G U
  A C U G U
  C C A G G
  U C C G A
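The two signals in this small alignment can be checked with a few lines of code. This is a sketch with hypothetical function names. It calls a column conserved when every sequence has the same base, and calls two columns covarying when both vary across sequences yet form a valid pair (A-U, C-G or G-U) in every sequence. Real methods use softer statistical scores.

```python
ALIGNMENT = ["ACGGU", "ACUGU", "CCAGG", "UCCGA"]  # the four aligned sequences above
COMPLEMENTARY = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C"), ("G", "U"), ("U", "G")}

def conserved_columns(aln):
    """1-based columns in which every sequence has the same base."""
    return [c + 1 for c in range(len(aln[0]))
            if len({row[c] for row in aln}) == 1]

def covarying_pairs(aln):
    """1-based column pairs that both vary yet stay complementary in every row."""
    width, found = len(aln[0]), []
    for a in range(width):
        for b in range(a + 1, width):
            col_a = [row[a] for row in aln]
            col_b = [row[b] for row in aln]
            both_vary = len(set(col_a)) > 1 and len(set(col_b)) > 1
            if both_vary and all(p in COMPLEMENTARY for p in zip(col_a, col_b)):
                found.append((a + 1, b + 1))
    return found
```

On this alignment, `conserved_columns(ALIGNMENT)` gives columns 2 and 4 and `covarying_pairs(ALIGNMENT)` gives the single pair (1, 5), matching the slide.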

Representing pseudoknots
• Without pseudoknots, RNA secondary structures can be unambiguously represented by dots (single, unpaired bases) and brackets (base pairs)
  – What if there are pseudoknots?
  – Need more types of brackets

[Figure: linear view of the pseudoknot-free example structure, with position numbers above the dot-bracket symbols]

Image source: http://ultrastudio.org/upload/RNAPseudoKnot-25005810.jpg

• Example with two bracket types:
  GAAGUACAAUAUGUAACCG
  .{.((((.....))}))..
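Parsing the extended notation needs one stack per bracket type instead of a single stack. A minimal sketch, assuming the bracket families `()`, `[]`, `{}` and `<>` (the slide itself only shows `()` and `{}`). Both function names are hypothetical.

```python
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}  # closing -> opening

def parse_extended_dot_bracket(structure):
    """Return sorted 1-based pairs, matching each bracket type independently."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in BRACKETS:
            stack = stacks[BRACKETS[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unclosed bracket")
    return sorted(pairs)

def has_pseudoknot(pairs):
    """True if some two pairs cross, i.e. i < k < j < l."""
    return any(i < k < j < l for i, j in pairs for k, l in pairs)
```

For the structure string above, the `{` at position 2 pairs with the `}` at position 15, and that pair crosses (4, 17), so `has_pseudoknot` reports a crossing.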

Epilogue: CASE STUDY, SUMMARY AND FURTHER READINGS

Case study: Drug finding/design
• Drugs are mostly chemicals with a specific structure that interacts with some biological objects
• Examples:
  – Inhibiting the activities of an important protein of bacteria
  – Blocking the interaction between virus and receptors of host cell
  – Stimulating the production of a hormone

Case study: Drug finding/design
• To identify or design a chemical that targets a particular object (e.g., a protein), we need to make sure that the two bind tightly; this is assessed through a process called docking

Image source: http://vds.cm.utexas.edu/

Case study: Drug finding/design
• Computational problem:
  – Input: a target protein and a list of chemicals
  – Goal: find a chemical that binds the target well
    • Try different locations and orientations
    • Binding depends on structure and chemistry
  – Output: One or more chemicals that bind the target well
• Difficulties:
  – Computational complexity
    • Large search space for each protein-chemical combination
    • Need to try many chemicals
  – Need to ensure specificity (not to target other proteins and cause side effects)

Case study: Drug finding/design
• There is a game called FoldIt (http://fold.it/) in which players try folding proteins
  – Score based on free energy
  – Real-time update of scores and ranks
  – Players can discuss and share solutions
  – Has produced some remarkably good folds compared to automatic predictions by computer programs

Image source: http://fold.it/portal/site_files/theme/science/competition.png

Summary
• Functions depend on structures
• Different levels of structures:
  – Primary (sequence)
  – Secondary (local)
  – Tertiary (global)
  – Quaternary (interactions)
• RNA secondary structures can be predicted by dynamic programming based on a thermodynamic model
• Important sub-structures
  – Stacking pairs
  – Hairpin loops
  – Internal loops/bulges
  – Multi-loops
  – Pseudoknots

Further readings
• Chapter 11 of Algorithms in Bioinformatics: A Practical Introduction
  – Speed-up of the algorithm
  – Algorithm for RNA structure prediction with pseudoknots
  – Free slides available
• Parts VII and VIII of Fundamental Concepts of Bioinformatics
  – Protein folding and protein structure prediction
  – Docking