RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ......

37
RNA Matrices and RNA RNA Matrices and RNA Secondary Structures Secondary Structures Institute for Mathematics and Its Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota October 29 – November 2, 2007 Asamoah Nkwanta, Morgan State University [email protected]

Transcript of RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ......

Page 1: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Matrices and RNA RNA Matrices and RNA Secondary StructuresSecondary Structures

Institute for Mathematics and Its Applications: RNA in Biology,

Bioengineering and Nanotechnology,University of Minnesota

October 29 – November 2, 2007

Asamoah Nkwanta, Morgan State [email protected]

Page 2: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Secondary RNA Secondary StructureStructure PredictionPrediction

l Given a primary sequence, we want to find the biological function of the related secondary structure. To achieve this goal we predict its’ secondary structure using a lattice walk or path approach.

l This walk approach involves enumerative combinatorics and is connected to infinite lower triangular matrices called RNA matrices.

Page 3: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Secondary StructureRNA Secondary Structure

l Primary Structure – The linear sequence of bases in an RNA molecule

l Secondary Structure – The folding or coiling of the sequence due to bonded nucleotide pairs: A-U, G-C

l Tertiary Structure – The three dimensional configuration of an RNA molecule. The three dimensional shape is important for biological function, and it is harder to predict.

Page 4: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA MoleculeRNA Molecule

Ribonucleic acid (RNA) molecule: Three main categories

l mRNA (messenger) – carries genetic information from genes to other cells

l tRNA (transfer) – carries amino acids to a ribosome (cells for making proteins)

l rRNA (ribosomal) – part of the structure of a ribosome

Page 5: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA MolculeMolcule (cont.)(cont.)

Other types (RNA) molecules:

l snRNA (small nuclear RNA) – carries genetic information from genes to other cells

l miRNA (micro RNA) – carries amino acids to a ribosome (cells for making proteins)

l iRNA (immune RNA) – part of the structure of a ribosome (Important for HIV studies)

Page 6: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Primary RNA SequencePrimary RNA Sequence

l CAGCAUCACAUCCGCGGGGUAAACGCU

l Nucleotide Length, 27 bases

Page 7: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Geometric RepresentationGeometric Representation

l Secondary structure is a graph defined on a set of n labeled points (M.S. Waterman, 1978)

l Biologicall Combinatorial/Graph Theoreticl Random WalklOther Representations

Page 8: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota
Page 9: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Structure RNA Structure

3-D structure of Haloarcula marismortui5S ribosomal RNA in large ribosomal subunit

Page 10: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA NUMBERSRNA NUMBERS

l 1,1,1,2,4,8,17,37,82,185,423,978,…

l These numbers count RNA secondary structures of length n.

Page 11: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA CombinatoricsCombinatorics

l Recurrence Relation:

( ) ( ) ( )

( ) ( ) ( ) ( ) ( )∑−

=

≥−−+=+

===1

1

2,11

1210n

j

njnsjsnsns

sss

lM. Waterman, Introduction to Computational Biology: Maps, sequences and genomes, 1995.

lM. Waterman, Secondary structure of single-stranded nucleic acids, Adv. Math. (suppl.) 1978.

Page 12: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Counting Sequence DatabaseCounting Sequence Database

l The On-line Encyclopedia of Integer Sequences: http:/www.research.att.com/njas/sequences/index.html

l N.J.A. Sloane & S. Plouffe, The Encyclopedia of Integer Sequences, Academic Press, 1995.

Page 13: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA CombinatoricsCombinatorics (cont.)(cont.)

l The number of RNA secondary structuresfor the sequence [1,n] is counted by the coefficients of S(z):

Coefficients of the power series:

l (1,1,1,2,4,8,17,37,82,185,423,978,…)

( ) 2 3 4 5 6 71 2 4 8 317 7s z z z z z z z z= + + + + + + + +L

Page 14: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

( ) 2 3 4 5 6 71 2 4 8 317 7s z z z z z z z z= + + + + + + + +L

Page 15: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA CombinatoricsCombinatorics (cont.)(cont.)

l Based on the coefficients of the generating function there are approximately 1.3 billion possible RNA structures of length n = 27.

( ) 2 3 271,392,21 2 51,012s z z z z z= + + + + + +L L

Page 16: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA CombinatoricsCombinatorics (cont.)(cont.)

l Using the recurrence relation we can find the closed form generating function associated with the RNA numbers.

( ) ( )2 2 3 4

2

1 1 2 2

2

z z z z z zs z

z

− + − − − − +=

( ) 2 3 4 5 6 27171 1,392,251,0 22 4 8 1s z z z z z z z z= + + + + + + + + +L L

Page 17: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA CombinatoricsCombinatorics (cont.)(cont.)

l Exact Formula, and Asymptotic Estimate (as n grows without bound):

( ) ( )

−−

−= ∑

≥ kkn

kkn

knns

k 11

1

( )n

nns

+

+ −

253

85715

~ 23

π

Page 18: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

l S(n,k) is the number of structures of length n with exactly k base pairs: For n,k > 0,

RNA RNA CombinatoricsCombinatorics (cont.)(cont.)

( ) 11,

1 1n k n k

s n kk kk

− − − = + −

Page 19: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

( ) ( )6,1 10 and 6,2 6s s= =

Page 20: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA RNA CombinatoricsCombinatorics (cont.)(cont.)

l RNA hairpin combinatorics.

( )

( )

( )

( )

2

1

2

10

4

20

2 1, number of hairpins of length n

H 2 1, number of hairpins with m or more bases in

the loop, n m+1

k a,b , number of stems with topmost bases bound

a,b

n

n m

a b

ak

a b

ak

h n

n

K

− −

+ −

−≥

+ −

−≥

= −

= −

=

=

∑ , number of stems with first and last bases bound

Page 21: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Random WalkRandom Walk

l A random walk is a lattice path from one point to another such that steps are allowed in a discrete number of directions and are of a certain length

Page 22: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Walk RNA Walk –– Type IType I

l NSE* Walks – Unit step walks starting at the origin (0,0) with steps up, down, and right

l No walks pass below the x-axis and there are no consecutive NS steps

Page 23: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Type I, RNA Array (n x k)Type I, RNA Array (n x k)

⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅

161524302817015101313800146640001332000012100000110000001

Page 24: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Type I, RNA Array (n x k)Type I, RNA Array (n x k)

0 0 0 0 01 0 0 0 02 1 0 03 1 0 0

6 4 0 01 3 5 1 0

2 8 3 0 2 4 1 5 6 1

00

0 03

111248

1 7

06 1

1 3 1 0

⋅ ⋅ ⋅

⋅ ⋅

⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅

Page 25: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Type I, Formation Rule (Recurrence)Type I, Formation Rule (Recurrence)

( )( ) ( )

( ) ( ) ( )

( ) 1,0,1

,1,,1

,0,1

10,0

0

0

+>=+

+−+−=+

−=+

=

nkknm

jkjnmknmknm

jjnmnm

m

j

j

Note. S(z) can be derived using this recurrence.

Page 26: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

First Moments/Weighted Row SumsFirst Moments/Weighted Row Sums

1 0 0 0 0 0 11 1 0 0 0 0 21 2 1 0 0 0 32 3 3 1 0 0 44 6 6 4 1 0 58 13 13 10

1382155

1445 1 6

⋅ ⋅ ⋅

=⋅ ⋅

⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅

Computing the average height of the walks above the x-axis is given by the alternate Fibonacci numbers

Page 27: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Walk RNA Walk –– Type IIType II

l NSE** Walks – Unit-step walks starting at the origin (0,0) with steps up, down, and right such that no walks pass below the x-axis and there are no consecutive SN steps

Page 28: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Type II , RNA Array (n x k)Type II , RNA Array (n x k)

0 0 0 0 0 01 0 0 0 0 02 1 0 0 0 04 3 1 0 0 09 7 4 1 0 0

1 7 1 1 5 1 0

4 5 4 1 2 9 1 6 6 1

2 0

11248

1 7

3 7

⋅ ⋅ ⋅

⋅ ⋅

⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅ ⋅

Page 29: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

ExamplesExamples

l Type I: ENNESNESSE

l Type II: NEEENSEEES

Note. Some Type II walks are not associated with RNA.Thus we have two class of walks to work with for RNAprediction.

Page 30: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

RNA Walk BijectionRNA Walk Bijection

l Theorem: There is a bijection between the set of NSE* walks of length n+1 ending at height k = 0 and the set of NSE** walks of length n ending at height k = 0.

l Source: Lattice paths, generating functions, and the Riordan group, Ph.D. Thesis, Howard University, Washington, DC, 1997

Page 31: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Main TheoremMain Theorem

l Theorem: There is a bijection between the set of RNA secondary structures of length n and the set of NSE* walks ending at height k = 0.

l Source: Lattice paths and RNA secondary structures, DIMAC Series in Discrete Math. & Theoretical Computer Science 34 (1997) 137-147. (CAARMS2 Proceedings)

Page 32: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Main Theorem (cont.)Main Theorem (cont.)

l Proof (sketch): Consider an RNA sequence of length n and convert it to the non-intersecting chord form. Consider the following rules:

( ) ( )SNji

Ek

,, ↔

Page 33: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

l Given primary RNA sequences and using RNA combinatorics, the goal of this project is to model components of an HIV-1 RNA secondary structure (namely SL2 and SL3 domains). The major concentration of this project is on reducing the minimum free energy to form an optimum HIV-1 RNA secondary structure.

l Source: HIV-1 sequence prediction, 2007, in progress

Application: HIVApplication: HIV--1 Prediction1 Prediction

Page 34: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

HIV-I 5' RNA Structural Elements. Illustration of a working model of the HIV-I 5'UTR showing the various stem-loop structures important for virus replication. These are the TAR element, the poly(A) hairpin, the U5-PBS complex, the stem-loops 1-4 containing the DIS, the major splice donor, the major packaging signal, and the gag start codon, respectively. Nucleotides and numbering correspond to the HIV-I HXB2 sequence. (Adapted from Clever et al. (1995) and Berkhout and van Wamel (2000))

Page 35: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Application: HIVApplication: HIV--1 Prediction (cont.) 1 Prediction (cont.)

l The following sequence was obtained from the NCBI website. The first 363 nucleotides were extracted from the entire HIV-1 RNA genomic sequence:

l GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGGAACCCACUGCUUAAGCCUCAAUAAAGCUUGCCUUGAGUGCUUCAAGUAGUGUGUGCCCGUCUGUUGUGUGACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGUGUGGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACCUGAAAGCGAAAGGGAAACCAGAGGAGCUCUCUCGACGCAGGACUCGGCUUGCUGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUAGAAGGAGAGAGAUGGGUGCGAGAGCGUCAGUAUUAAGCG

l Color key: SL2 – yellowSL3 - red

Page 36: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

Future Research: Centers ForFuture Research: Centers For

l Biological and Chemical Sensors Research

l Environmental Toxicology and Biosensors Research

l The mission is to advance the fundamental scientific and technological knowledge needed to enable the development of new biological and chemical sensors.

Page 37: RNA Matrices and RNA Secondary Structures€¦ · RNA Matrices and RNA Secondary Structures ... Applications: RNA in Biology, Bioengineering and Nanotechnology, University of Minnesota

MathMath--Bio CollaboratorsBio Collaborators

l Dwayne Hill, Biology Dept., MSU

l Alvin Kennedy and Richard Williams, Chemistry Dept., MSU

l Wilfred Ndifon, Ecology and Evolutionary Biology Dept., Princeton U.

l Boniface Eke, Mathematics Dept., MSU