RNA Structure Prediction and Comparison Session 4 … structure is often wrong ... package...

RNA Structure Prediction and Comparison, Session 4: Abstract Shape Analysis. Robert Giegerich, Faculty of Technology, Bielefeld University, [email protected]. Bielefeld, SS 2009.


RNA Structure SS 2009

Robert Giegerich

Motivation

Lost in Folding Space

Abstraction comes to rescue

The idea of abstract shapes

The general idea

Defining shape abstractions

Properties of the shape space

Simple shape analysis

The tool RNAshapes

Complete probabilistic shape analysis

Shape Probabilities

The RNAshapes package

RNA Structure Prediction and Comparison, Session 4

Abstract Shape Analysis

Robert Giegerich

Faculty of Technology, Bielefeld University

[email protected]

Bielefeld, SS 2009

Robert Giegerich RNA Structure SS 2009


1 Motivation
   Lost in Folding Space
   Abstraction comes to rescue

2 The idea of abstract shapes
   The general idea
   Defining shape abstractions
   Properties of the shape space

3 Simple shape analysis
   The tool RNAshapes

4 Complete probabilistic shape analysis
   Shape Probabilities
   The RNAshapes package


RNA Gene Prediction via Structure . . .

“Is this sequence an RNA gene?” ↔ “Does it have a known functional structure?”

When sequence conservation is low or no homologs are known:

STEP 1: MFE folding (Mfold, RNAfold, pknotsRG)

STEP 2: Structure comparison against known functional structures

It is not that easy ...


Accuracy of MFE folding . . .

adequacy of thermodynamic parameters . . . ?

interaction with other molecules . . . ?

RNA sequence processing . . . ?

folding kinetics (co-transcriptional folding) . . . ?

physical properties of the folding space . . . !


Recent Mfold Evaluation by Gutell Lab

Doshi KJ, Cannone JJ, Cobaugh CW, Gutell RR. Evaluation of the suitability of free-energy minimization using nearest-neighbor energy parameters for RNA secondary structure prediction. BMC Bioinformatics 2004 Aug 5;5:105.

Compares MFE foldings to structures derived by comparative analysis and proven by experimental techniques.

Findings:

base pair accuracy of about 20% - 71%

no improvement from recently updated thermodynamic parameters

note: did not check for good near-optimal solutions


Lost in Folding Space (1)

The folding space of a given sequence is LARGE:

number of foldings is exponential in sequence length

number of near-optimal foldings is exponential in energy window

Look at the 111 “best” structures for a tRNA (using the tool RNAmovies).
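To make the exponential growth concrete, the following minimal sketch counts secondary structures with a standard recurrence. It is an illustration only: it assumes that any two bases may pair (so it gives an upper bound for a real sequence), and the function name `count_structures` and the parameter `min_loop` (minimum number of unpaired bases in a hairpin loop) are hypothetical names chosen here, not part of any tool mentioned in the slides.

```python
def count_structures(n: int, min_loop: int = 3) -> int:
    """Count pseudoknot-free secondary structures on n bases.

    Assumption (for illustration): every pair of bases may pair, and a
    hairpin loop must enclose at least `min_loop` unpaired bases.
    """
    counts = [1] * (n + 1)  # counts[m]: number of structures on m bases
    for m in range(1, n + 1):
        total = counts[m - 1]  # case 1: the last base is unpaired
        # case 2: the last base pairs with a base that has k bases to its left;
        # the m - 2 - k enclosed bases must number at least min_loop
        for k in range(0, m - 1 - min_loop):
            total += counts[k] * counts[m - 2 - k]
        counts[m] = total
    return counts[n]


if __name__ == "__main__":
    for length in (10, 20, 40, 80):
        print(length, count_structures(length))
```

Running the loop shows how quickly the count grows with sequence length, which is why enumerating the folding space structure by structure is hopeless for all but short sequences.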


Lost in Folding Space (2)

What we observe from RNAmovies:

LARGE number of close-to-optimal foldings

FEW structural classes holding many similar foldings

Can we reduce the folding space to the representatives of these classes?


RNA structure prediction based on thermodynamics

Even with the best possible model parameters:

MFE structure is often wrong

Some near-optimal structure is always right

The number of near-optimals is exponential

Most are similar, but some quite distinct

[Figure: two example secondary-structure drawings with labeled elements: Multiple Loop, Stacking Region, Hairpin Loop, Internal Loop, Bulge Loop (left), Bulge Loop (right)]

Is there a shape LIKE this ... or NOT like this ...?


Formalizing the notion of (abstract) shape

Shape abstraction retains nesting and adjacency of stems

Shape abstraction disregards all sizes (of stems, loops, . . . )

Shape abstraction may retain or disregard presence and type of bulges and internal loops


Levels of abstraction

Level 0: Full structure

Level 1: All types of loops

Level 3: All helix interruptions

Level 4: Multi- and internal loops, no bulges

Level 5: Stem arrangement only


Shape abstraction mathematics

General:

tree-like domains of structures $F$ and shapes $P$

tree homomorphism $\pi : F \to P$

For each sequence $s$:

folding space of sequence $s$: $F(s)$

shape space of sequence $s$: $P(s) = \pi(F(s))$

shape class of $p$ in $F(s)$: $f(s, p) = \{x \mid x \in F(s),\ \pi(x) = p\}$

shape representative structure: shrep = class member of minimal free energy, formally

$\mathrm{shrep}(s, p) = \operatorname{argmin}_{x \in f(s, p)} E_x$
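The definitions of shape class and shrep can be sketched in a few lines. This is a minimal illustration, assuming the shape $\pi(x)$ of every folding has already been computed; the function names `shape_classes` and `shreps` and the toy foldings with their energies are hypothetical and do not come from any real folding run.

```python
from collections import defaultdict


def shape_classes(foldings):
    """Partition foldings into shape classes.

    `foldings` is a list of (structure, shape, energy) triples, where
    `shape` plays the role of pi(structure).
    """
    classes = defaultdict(list)
    for structure, shape, energy in foldings:
        classes[shape].append((structure, energy))
    return dict(classes)


def shreps(foldings):
    """For each shape, return the class member of minimal free energy."""
    return {
        shape: min(members, key=lambda member: member[1])
        for shape, members in shape_classes(foldings).items()
    }


# Toy data: structures and energies (kcal/mol) are made up for illustration.
toy_foldings = [
    ("((((...))))....", "[]", -3.1),
    ("(((.....)))....", "[]", -2.4),
    ("((...))((...)).", "[][]", -1.9),
    ("((...)).((...))", "[][]", -2.2),
]

if __name__ == "__main__":
    for shape, (structure, energy) in shreps(toy_foldings).items():
        print(shape, structure, energy)
```

Because every folding has exactly one shape, the classes are disjoint and their sizes add up to the size of the folding space.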


Shape Abstraction – Informal

Structure: (((((...((((...))..))(((((...)))))...(((.((..))...))).)))))

Weaker abstraction, retaining bulges: [ [ [ ] _ ] [ ] [ _ [ ] _ ] ]

Strongest abstraction, no bulges: [ [ ] [ ] [ ] ]
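The two abstractions of the example structure can be reproduced with a short sketch that works directly on dot-bracket strings. It is an illustration, not the RNAshapes implementation: the function name `shape`, the flag `keep_bulges`, and the choice to return `_` for a structure without base pairs are assumptions made here.

```python
def shape(dot_bracket: str, keep_bulges: bool = False) -> str:
    """Map a dot-bracket structure to a shape string.

    keep_bulges=False: stem arrangement only (strongest abstraction).
    keep_bulges=True:  bulges and internal loops additionally show up as
                       '_' on the side where unpaired bases interrupt a helix.
    """
    partner, stack = {}, []
    for pos, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            opening = stack.pop()
            partner[opening] = pos
        elif char != ".":
            raise ValueError(f"unexpected character {char!r}")
    if stack:
        raise ValueError("unbalanced structure")

    def children(lo, hi):
        """Outermost base pairs located in positions lo .. hi-1."""
        found, k = [], lo
        while k < hi:
            if k in partner:
                found.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        return found

    def helix(i, j):
        """Shape of the substructure closed by the base pair (i, j)."""
        while True:
            kids = children(i + 1, j)
            if len(kids) != 1:
                # hairpin loop (no child) or multiloop (several children)
                return "[" + "".join(helix(k, l) for k, l in kids) + "]"
            k, l = kids[0]
            left, right = k - i - 1, j - l - 1
            if keep_bulges and (left or right):
                # bulge or internal loop: keep the interruption visible
                return ("[" + ("_" if left else "") + helix(k, l)
                        + ("_" if right else "") + "]")
            i, j = k, l  # stacked pair, or interruption that is abstracted away

    result = "".join(helix(i, j) for i, j in children(0, len(dot_bracket)))
    return result or "_"  # assumed convention for the unpaired structure


example = "(((((...((((...))..))(((((...)))))...(((.((..))...))).)))))"

if __name__ == "__main__":
    print(shape(example, keep_bulges=True))
    print(shape(example))
```

Applied to the structure above, the sketch yields the two shape strings shown on the slide (written there with spaces between the brackets).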


Shape trees and shape strings

[Figure: tree representations of one structure at three abstraction levels; node labels include sr, ml, bl, AD and CL]

Level 0: ((((.(((...)))((...(...)))))))

Level 3: [ [ ] [ [ ] ] ]

Level 5: [ [ ] [ ] ]


Shape algorithmics

Implementation of shape analysis:

shape abstractions are tree homomorphisms

integrate well with DP algorithms

allows for a priori rather than a posteriori analysis

compute shapes in parallel with energy

perform analyses on per-shape basis
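The idea of computing shapes in parallel with the score can be sketched with a toy dynamic program. Strong assumptions apply: the sketch maximizes the number of base pairs instead of minimizing free energy, it uses only the strongest abstraction (stem arrangement), and the names `classified_fold`, `can_pair` and `is_single_helix` are hypothetical. It is not the algorithm used in RNAshapes; it only shows how each table entry can hold one optimum per shape.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def can_pair(a: str, b: str) -> bool:
    return (a, b) in PAIRS


def is_single_helix(shape: str) -> bool:
    """True if the shape is exactly one bracketed component, e.g. '[[][]]'."""
    depth = 0
    for index, char in enumerate(shape):
        depth += 1 if char == "[" else -1
        if depth == 0:
            return index == len(shape) - 1
    return False


def classified_fold(seq: str, min_loop: int = 3) -> dict:
    """Return {shape: maximal number of base pairs} for seq.

    best[i][j] covers seq[i:j]; the empty shape "" stands for 'no pairs'.
    """
    n = len(seq)
    best = [[{"": 0} for _ in range(n + 1)] for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            table = dict(best[i][j - 1])  # last base seq[j-1] unpaired
            for k in range(i, j - 1 - min_loop):  # seq[k] pairs with seq[j-1]
                if not can_pair(seq[k], seq[j - 1]):
                    continue
                for left, left_score in best[i][k].items():
                    for inner, inner_score in best[k + 1][j - 1].items():
                        # a pair around a single helix extends that helix;
                        # otherwise it closes a hairpin loop or a multiloop
                        wrapped = inner if is_single_helix(inner) else "[" + inner + "]"
                        new_shape = left + wrapped
                        score = left_score + inner_score + 1
                        if score > table.get(new_shape, -1):
                            table[new_shape] = score
            best[i][j] = table
    return best[0][n]


if __name__ == "__main__":
    for found_shape, pairs in classified_fold("GGGAAACCCGGGAAACCC").items():
        print(found_shape or "(unpaired)", pairs)
```

The design point is that the optimum is kept per shape inside every table cell, so the per-shape optima come out of a single pass and no structures have to be enumerated and classified afterwards.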


Properties of Shapes and shreps

shape classes are disjoint

shreps are interesting

shapes have sequence-independent representation

shapes are meaningful across different sequences (of different length)

shapes and shreps can be computed efficiently


Simple Shape Analysis with RNAshapes

The three top shreps of the aforementioned tRNA:

Shape     Sequence / shrep                                                          Energy
          GGGCCCAUAGCUCAGUGGUAGAGUGCCUCCUUUGCAAGGAGGAUGCCCUGGGUUCGAAUCCCAGUGGGUCCA
[]        (((((((((((((((.((((.....(((((((...))))))).))))))))))).........)))))))).  -35.9 kcal/mol
[[][]]    ((((((((.....((.((((.....(((((((...))))))).))))))(((.......))).)))))))).  -32.2 kcal/mol
[[][][]]  ((((((...((((.......)))).(((((((...))))))).....(((((.......))))).)))))).  -31.7 kcal/mol

[Figures: secondary-structure drawings of the tRNA sequence, one per slide (three slides)]

Shape Space Statistics

Is the shape space really smaller than the folding space?

See some statistics within 5% kcal/mol of MFE:

[Plot: number of structures and shapes (linear scale, 0 to 400) vs. sequence length [nt] (0 to 300)]

[Plot: number of structures and shapes (log scale, 1 to 1e+06) vs. sequence length [nt] (0 to 300)]

[Plot: ratio of shapes to structures (log scale, 0.0001 to 1) vs. sequence length [nt] (0 to 300)]

[Plot: ratio of shapes to structures (log scale, 1e-05 to 1) vs. energy range above MFE [kcal/mol] (0 to 10)]

[Plot: number of structures and shapes (log scale, 0.01 to 1e+18) vs. sequence length N [nt] (0 to 120), with fitted curves]

Structures: $0.0391 \cdot 1.3968912^N$

Shapes: $0.2064 \cdot 1.1067094^N$
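The two fitted curves can be evaluated directly to see how far apart the two spaces drift. This is a small sketch under one assumption: following the legend order, the faster-growing fit (base 1.3968912) is taken to describe structures and the slower one (base 1.1067094) shapes. The function names are hypothetical.

```python
def fitted_structures(n: int) -> float:
    """Fitted number of structures for sequence length n (assumed assignment)."""
    return 0.0391 * 1.3968912 ** n


def fitted_shapes(n: int) -> float:
    """Fitted number of shapes for sequence length n (assumed assignment)."""
    return 0.2064 * 1.1067094 ** n


def shape_to_structure_ratio(n: int) -> float:
    return fitted_shapes(n) / fitted_structures(n)


if __name__ == "__main__":
    for length in (20, 60, 100, 120):
        print(
            f"N={length:3d}  structures~{fitted_structures(length):.3g}  "
            f"shapes~{fitted_shapes(length):.3g}  "
            f"ratio~{shape_to_structure_ratio(length):.3g}"
        )
```

Both curves are exponential, but the much smaller base for shapes means the ratio of shapes to structures shrinks rapidly as sequences get longer.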


Variation within shape

How homogeneous are shape classes?


RNAshapes: Best k shreps

[Figure (Björn Voß): RNAshapes, with the shapes [] [[][]] [[][][]]]


The tool RNAshapes

classifies structures by abstract shape

computes a small number of representative structures

no heuristics involved

as fast as traditional RNA folding

Available at http://bibiserv.techfak.uni-bielefeld.de/RNAshapes/


Complete probabilistic shape analysis

“How much would you trust a structure with a probability of $0.1 \times 10^{-4}$, even when it is optimal?”

Chip Lawrence, Benasque 2003


From energy to probability

According to Boltzmann statistics, sequence $s$ has structure $x$ with probability

$\mathrm{Prob}(x) = e^{-E_x/RT} / Q$

where $T$ is the temperature, $R$ the universal gas constant, and $Q$ the “partition function”,

$Q = \sum_{x \in F(s)} e^{-E_x/RT}$

Accumulated shape probabilities

$\mathrm{Prob}(p) = \sum_{\pi(x) = p} \mathrm{Prob}(x)$ for all $p \in P(s)$
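A minimal numerical sketch of these formulas follows. It assumes the folding space is given as an explicit list of (shape, energy) pairs, which is only feasible for toy examples; the energies below are made up for illustration, and the function name `shape_probabilities` and the default temperature of 310.15 K (37 °C) are choices made here.

```python
import math
from collections import defaultdict

GAS_CONSTANT = 0.0019872  # R in kcal/(mol*K)


def shape_probabilities(foldings, temperature=310.15):
    """Accumulate Boltzmann probabilities per shape.

    `foldings` is a list of (shape, energy in kcal/mol), one entry per
    structure x in F(s), with `shape` playing the role of pi(x).
    """
    rt = GAS_CONSTANT * temperature
    e_min = min(energy for _, energy in foldings)
    # Shifting all energies by e_min avoids overflow; the common factor
    # cancels between the Boltzmann weights and the partition function Q.
    weights = [(shape, math.exp(-(energy - e_min) / rt)) for shape, energy in foldings]
    q = sum(weight for _, weight in weights)
    probabilities = defaultdict(float)
    for shape, weight in weights:
        probabilities[shape] += weight / q
    return dict(probabilities)


# Toy folding space (hypothetical energies): the MFE structure has shape "[]",
# but the shape "[][]" has four slightly less stable members.
toy_foldings = [
    ("[]", -22.9),
    ("[][]", -22.3),
    ("[][]", -22.2),
    ("[][]", -22.1),
    ("[][]", -22.0),
]

if __name__ == "__main__":
    for shape, probability in shape_probabilities(toy_foldings).items():
        print(shape, round(probability, 4))
```

In this toy folding space the shape of the MFE structure ends up with the smaller accumulated probability, because the other shape collects the weight of several near-optimal structures.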


RNAshapes package

Overtaking: Shape probabilities contradict energy ranking

Shape      E [kcal/mol]   P           Rank by probability
[ ]        -22.90         0.2370279   2nd
[ ][ ][ ]  -22.50         0.0999191   3rd
[ ][ ]     -22.30         0.5511424   1st

(Figure: Björn Voß)


A propos “complete”

probabilities give full information about folding space

we cannot compute only the k most likely shapes

only feasible up to 300 nts


Algorithmics

Complete probabilistic shape analysis

requires a non-ambiguous grammar with correct dangles at all places

applies “classified” dynamic programming

takes time $O(1.1^n \cdot n^3)$ where $n = |s|$


Results from complete probabilistic analysis

Some observations:

Sequence          Shape 1         Prob.        Shape 2            Prob.
lin-4 precursor   []              0.99999994
tRNA-ala          []              0.989744     [[]]               0.008994
typical mRNA      [][[][]]        0.432154     [[[][]][]]         0.149831
HIV-1 Leader      [][[][[][]]]]   0.6164       [][[[][[][]]][]]   0.3492


Summary on abstract shape analysis (1)

Shape representatives

are cheap to compute

give small but representative sample of potential structures


Summary on abstract shape analysis (2)

Shape probabilities

provide the same representatives

give a measure of well-definedness of folding

independent of sequence composition and sequence length (in contrast to MFE values)

exclude further alternatives when probability is 90% covered

are more expensive to compute

require an exact solution of the dangling base problem


References on abstract shape analysis

Abstract Shapes of RNA by Giegerich, Voss, Rehmsmeier. Nucleic Acids Research 2004, Vol. 32, No 15, 1 - 9.

Complete Probabilistic Analysis of RNA Abstract Shapes by Voss, Giegerich, Rehmsmeier. BMC Biology 2006, Feb 15;4(1):5.

RNAshapes: an integrated RNA analysis package based on abstract shapes by Steffen P, Voss B, Rehmsmeier M, Reeder J, Giegerich R. Bioinformatics 2006, Feb 15;22(4):500-3.

RNAsifter: Shape based indexing to speed up Rfam searches by Voss, Janssen, Reeder, Giegerich. BMC Bioinformatics, 2007.


The End

Thanks for your attention.
