Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D....

24
Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik, D. Ecker, R. Sampath, Natural Selection Inc. and Ibis Therapeutics
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    216
  • download

    0

Transcript of Discovery of RNA Structural Elements Using Evolutionary Computation Authors: G. Fogel, V. Porto, D....

Discovery of RNA Structural Elements Using Evolutionary Computation

Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik, D.

Ecker, R. Sampath,

Natural Selection Inc. and Ibis Therapeutics

Presenter: Elena Zheleva

April 2, 2004

Introduction

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Problem Statement

Computational Biology problem: given a RNA secondary structure description, search for similar secondary structures

Currently, exhaustive search techniques are used to narrow down search space

Authors focus on presentation and set of operators to search via evolution

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Background

RNA (ribonucleic acid) directs middle steps of

protein production single-stranded, certain

parts are folded RNA Secondary

Structure - accounts for diverse functional activities

Background

RNA Secondary Structure: Recurs in multiple genes within a single

organism Recurs in across the same gene in several

organism Why a computational tool for RNA

secondary structure search? Discover new structures Improve understanding of functional and

regulatory relationships amongst related RNAs

Background – RNAMotif

RNAMotif: mines nucleotide sequence databases for repeating structure motifs

RNAMotif Input: descriptor contains details about pairing information, length, sequence

Background - RNAMotif

RNAMotif Output: list of real structures

RNAMotif may return a very high number of motifs when descriptor is more flexible

Input to the EA: RNAMotif Output

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Evolutionary Computation Population Initialization

P parent bins B – bin size Bin = a contending solution Each bin contains structures

from different organisms Structures chosen at random

from RNAMotif Output file

Figure 1

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Evolutionary Computation Variation

P parent bins are copied to O offspring bins Variables: operator, number of times to apply it Variation Operator 1: structure replacement

within a specified organism Replacement taken from RNAMotif Output File Local – neighboring replacement structure Global – random replacement structure Example: P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}

Evolutionary Computation Variation

Variation Operator 2: Structure replacement from different organisms Variable: # of structures to be replaced Example: # = 2

P organisms = {H. Sapiens, S. Scrofa, E. Coli, G. Gallus}

O organisms = {H. Sapiens, C. Griseus, E. Coli, S. Scrofa}

Evolutionary Computation Variation

Variation Operator 3: random single-point bin recombination Generates a second parent from RNAMotif output and

applies single-point bin recombination Chooses randomly one of the two offsprings Example: P = {H, S, E, G} P = {D, E, O, B}

O = {H, S, E, B} O = {D, E, O, G}

Variation Operator 4: random multi-point bin recombination

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Evolutionary Computation Fitness

Fitness Function Scoring Components: Structure nucleotide sequence similarity Structure length similarity Structure thermodynamic stability similarity

These measures are applied pairwise by each structural component and summed into a final bin score

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Evolutionary Computation Selection

Selection: For every bin in population, A set of R rival bins is randomly selected Calculate score = # rivals with lower fitness

Lower bins are removed Iterations continue until number of

generations (G) or CPU time is satisfied, or until expected change of fitness/gen 0

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Results

Experiment 1: 7.6x10 possible

bins Exhaustive search:

125 days

EA examined ~10 bins before

converging < 3 minutes

8

4

Results

Results

To test the utility of this method Run on newly discovered genomes (S. Pyogenes) Compare to database which has an alignment for

this RNA secondary structure for previously discovered genomes (S. Mutans)

Found similar sequence and structure to close organisms

Outline

Problem Statement Background Evolutionary Computation

Population initialization Variation Fitness Selection

Results Conclusion

Conclusion

Evolutionary Algorithm can be applied to find RNA secondary structures over a wide range of organisms

Converges quickly and reliably Algorithm comes up with a solution which

contains information about structural elements for different organisms/genomes