Discovery of RNA Structural Elements Using Evolutionary Computation
Authors: G. Fogel, V. Porto, D. Weekes, D. Fogel, R. Griffey, J. McNeil, E. Lesnik, D. Ecker, R. Sampath
Natural Selection Inc. and Ibis Therapeutics
Presenter: Elena Zheleva
April 2, 2004
Introduction
  Problem Statement
  Background
  Evolutionary Computation
    Population initialization
    Variation
    Fitness
    Selection
  Results
  Conclusion
Problem Statement
Computational biology problem: given an RNA secondary structure description, search for similar secondary structures
Currently, exhaustive search techniques are used to narrow down the search space
The authors focus on the representation and the set of operators used to search via evolution
Background
RNA (ribonucleic acid)
  Directs the middle steps of protein production
  Single-stranded; certain parts are folded
RNA secondary structure
  Accounts for diverse functional activities
Background
RNA secondary structure:
  Recurs in multiple genes within a single organism
  Recurs across the same gene in several organisms
Why a computational tool for RNA secondary structure search?
  Discover new structures
  Improve understanding of functional and regulatory relationships among related RNAs
Background – RNAMotif
RNAMotif: mines nucleotide sequence databases for repeating structure motifs
RNAMotif input: a descriptor with details about pairing information, length, and sequence
Background - RNAMotif
RNAMotif Output: list of real structures
RNAMotif may return a very large number of motifs when the descriptor is more flexible
Input to the EA: RNAMotif Output
Evolutionary Computation: Population Initialization
P parent bins
B = bin size
Bin = a contending solution
Each bin contains structures from different organisms
Structures are chosen at random from the RNAMotif output file
Figure 1
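The initialization step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `hits` dictionary is a hypothetical stand-in for a parsed RNAMotif output file (organism name mapped to its list of candidate structures), a bin is stored as a list of (organism, index) pairs, and the function and parameter names are not from the paper.

```python
import random

def init_population(hits, num_bins, bin_size, rng):
    """Build num_bins random bins, each with bin_size structures from distinct organisms.

    hits: assumed stand-in for parsed RNAMotif output,
          a dict mapping organism name -> list of candidate structures.
    A bin is a list of (organism, index into hits[organism]) pairs.
    """
    population = []
    for _ in range(num_bins):
        # Sample without replacement so no organism appears twice in a bin.
        organisms = rng.sample(sorted(hits), bin_size)
        bin_ = [(org, rng.randrange(len(hits[org]))) for org in organisms]
        population.append(bin_)
    return population

# Hypothetical toy data; the structure labels are placeholders.
hits = {
    "H. sapiens": ["hs_0", "hs_1", "hs_2"],
    "S. scrofa": ["ss_0", "ss_1"],
    "E. coli": ["ec_0", "ec_1", "ec_2", "ec_3"],
    "G. gallus": ["gg_0"],
    "C. griseus": ["cg_0", "cg_1"],
}
population = init_population(hits, num_bins=4, bin_size=3, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing operator settings.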
Evolutionary Computation: Variation
P parent bins are copied to O offspring bins
Variables: operator, number of times to apply it
Variation Operator 1: structure replacement within a specified organism
  Replacement is taken from the RNAMotif output file
  Local: neighboring replacement structure
  Global: random replacement structure
Example: P organisms = {H. sapiens, S. scrofa, E. coli, G. gallus}
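A minimal sketch of Operator 1 follows. It assumes a bin is a list of (organism, index) pairs, that `hits` is a hypothetical parsed form of the RNAMotif output, and that "neighboring" means adjacent in the order the structures are listed for that organism. The slides do not define the neighborhood, so that reading is an assumption.

```python
import random

def replace_within_organism(bin_, slot, hits, mode, rng):
    """Return a copy of bin_ with the structure at `slot` replaced
    by another structure from the same organism.

    mode="local":  pick an adjacent entry in the organism's hit list (assumed meaning).
    mode="global": pick any other entry for that organism at random.
    """
    org, idx = bin_[slot]
    n = len(hits[org])
    if n < 2:
        return list(bin_)  # nothing to swap in
    if mode == "local":
        candidates = [i for i in (idx - 1, idx + 1) if 0 <= i < n]
    else:
        candidates = [i for i in range(n) if i != idx]
    child = list(bin_)
    child[slot] = (org, rng.choice(candidates))
    return child

# Hypothetical toy data.
hits = {
    "H. sapiens": ["hs_0", "hs_1", "hs_2", "hs_3", "hs_4"],
    "E. coli": ["ec_0", "ec_1"],
}
parent = [("H. sapiens", 2), ("E. coli", 0)]
local_child = replace_within_organism(parent, 0, hits, "local", random.Random(1))
global_child = replace_within_organism(parent, 0, hits, "global", random.Random(1))
```

The parent is copied rather than mutated, matching the slide's description of parents being copied to offspring bins before variation.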
Evolutionary Computation: Variation
Variation Operator 2: structure replacement from different organisms
  Variable: # of structures to be replaced
Example: # = 2
  P organisms = {H. sapiens, S. scrofa, E. coli, G. gallus}
  O organisms = {H. sapiens, C. griseus, E. coli, S. scrofa}
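Operator 2 might be sketched as below, again with bins as lists of (organism, index) pairs and a hypothetical `hits` dictionary standing in for the RNAMotif output. One simplification is assumed: replacements are drawn only from organisms not already in the parent bin, so the offspring keeps one structure per organism. The slide example also lets an organism move to a different slot, which this sketch does not model.

```python
import random

def replace_with_other_organisms(bin_, k, hits, rng):
    """Return a copy of bin_ where k slots hold structures from organisms
    that were not in the parent bin (simplifying assumption)."""
    in_bin = {org for org, _ in bin_}
    outside = [org for org in sorted(hits) if org not in in_bin]
    slots = rng.sample(range(len(bin_)), k)   # which positions to replace
    newcomers = rng.sample(outside, k)        # distinct replacement organisms
    child = list(bin_)
    for slot, org in zip(slots, newcomers):
        child[slot] = (org, rng.randrange(len(hits[org])))
    return child

# Hypothetical toy data.
hits = {
    "H. sapiens": ["hs_0", "hs_1"],
    "S. scrofa": ["ss_0"],
    "E. coli": ["ec_0", "ec_1", "ec_2"],
    "G. gallus": ["gg_0"],
    "C. griseus": ["cg_0", "cg_1"],
    "D. melanogaster": ["dm_0"],
}
parent = [("H. sapiens", 0), ("S. scrofa", 0), ("E. coli", 1), ("G. gallus", 0)]
child = replace_with_other_organisms(parent, k=2, hits=hits, rng=random.Random(2))
```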
Evolutionary Computation: Variation
Variation Operator 3: random single-point bin recombination
  Generates a second parent from the RNAMotif output and applies single-point bin recombination
  Randomly chooses one of the two offspring
Example:
  P1 = {H, S, E, G}    P2 = {D, E, O, B}
  O1 = {H, S, E, B}    O2 = {D, E, O, G}
Variation Operator 4: random multi-point bin recombination
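The two recombination operators can be illustrated with plain list slicing. The sketch below is an assumption-laden illustration, not the authors' code: bins are lists of single-letter organism labels as in the slide example, the second parent is passed in rather than generated from RNAMotif output, and no check is made for an organism appearing twice in an offspring.

```python
import random

def single_point(p1, p2, cut):
    """Swap the tails of two parent bins at position `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def multi_point(p1, p2, cuts):
    """Alternate the source parent at every cut position."""
    o1, o2 = [], []
    swap, prev = False, 0
    for cut in sorted(cuts) + [len(p1)]:
        a, b = (p2, p1) if swap else (p1, p2)
        o1 += a[prev:cut]
        o2 += b[prev:cut]
        swap, prev = not swap, cut
    return o1, o2

def recombine(p1, p2, rng, points=1):
    """Pick random cut points, recombine, and keep one offspring at random."""
    cuts = rng.sample(range(1, len(p1)), points)
    return rng.choice(multi_point(p1, p2, cuts))

# The example from the slide, with the cut after the third position.
p1 = ["H", "S", "E", "G"]
p2 = ["D", "E", "O", "B"]
o1, o2 = single_point(p1, p2, cut=3)
kept = recombine(p1, p2, random.Random(3), points=2)
```

With `cut=3` the sketch reproduces the two offspring shown in the example; `recombine` then mirrors the step of keeping only one of them.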
Evolutionary Computation: Fitness
Fitness function scoring components:
  Structure nucleotide sequence similarity
  Structure length similarity
  Structure thermodynamic stability similarity
These measures are applied pairwise by each structural component and summed into a final bin score
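To make the pairwise-and-sum idea concrete, here is a small sketch. The three scoring formulas are placeholders chosen for illustration (the slides do not give the actual measures): sequence similarity uses Python's `difflib.SequenceMatcher`, length similarity is a min/max ratio, and stability similarity compares free energies that are assumed to be precomputed. For brevity each structure is scored as a whole rather than per structural component.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations

@dataclass(frozen=True)
class Structure:
    seq: str    # nucleotide sequence
    dg: float   # folding free energy (kcal/mol), assumed precomputed elsewhere

def seq_sim(a, b):
    # Placeholder measure: ratio of matching characters, in [0, 1].
    return SequenceMatcher(None, a.seq, b.seq).ratio()

def len_sim(a, b):
    # Placeholder measure: 1.0 when lengths are equal, smaller otherwise.
    return min(len(a.seq), len(b.seq)) / max(len(a.seq), len(b.seq))

def dg_sim(a, b):
    # Placeholder measure: 1.0 when free energies are equal.
    return 1.0 / (1.0 + abs(a.dg - b.dg))

def bin_fitness(bin_):
    """Sum the three similarity scores over every pair of structures in the bin."""
    return sum(
        seq_sim(a, b) + len_sim(a, b) + dg_sim(a, b)
        for a, b in combinations(bin_, 2)
    )

hairpin = Structure("GGGAAACCC", -3.0)
other = Structure("AUAUAUAUAUAU", -0.5)
uniform_score = bin_fitness([hairpin, hairpin, hairpin])
mixed_score = bin_fitness([hairpin, other, hairpin])
```

A bin of identical structures gets the maximum score under these placeholder measures, and swapping in a dissimilar structure lowers it, which is the behavior the bin score is meant to reward.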
Evolutionary Computation: Selection
Selection:
  For every bin in the population, a set of R rival bins is randomly selected
  Calculate score = # rivals with lower fitness
  Lower-scoring bins are removed
Iterations continue until the limit on the number of generations (G) or CPU time is reached, or until the expected change in fitness per generation approaches 0
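The rival-based scoring described above can be sketched as follows. The function name, the parameters, and the choice to keep a fixed number of top-scoring bins are assumptions made for illustration; the slides only say that lower bins are removed.

```python
import random

def tournament_select(fitness, num_survivors, num_rivals, rng):
    """Return the indices of the bins that survive selection.

    fitness: list of bin fitness values, one per bin in the population.
    Each bin meets num_rivals randomly chosen other bins and scores
    one point for every rival with lower fitness.
    """
    n = len(fitness)
    wins = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        rivals = rng.sample(others, num_rivals)
        wins.append(sum(fitness[j] < fitness[i] for j in rivals))
    # Rank by score, highest first; ties keep their original order.
    ranked = sorted(range(n), key=lambda i: wins[i], reverse=True)
    return ranked[:num_survivors]

fitness = [5.0, 1.0, 9.0, 3.0, 7.0, 2.0]
# With every other bin as a rival, the outcome does not depend on the random draw.
survivors = tournament_select(fitness, num_survivors=3, num_rivals=5, rng=random.Random(4))
```

With fewer rivals than the population size the ranking becomes stochastic, so a weaker bin occasionally survives, which keeps some diversity in the population.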
Results
Experiment 1:
  7.6 x 10^8 possible bins
  Exhaustive search: 125 days
  EA examined ~10^4 bins before converging, in < 3 minutes
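As a quick consistency check on the Experiment 1 figures, the arithmetic below reads the two bin counts as 7.6 x 10^8 (search space) and ~10^4 (bins examined by the EA), and treats "< 3 minutes" as exactly 3 minutes, so the EA rate is a lower bound. The variable names are illustrative only.

```python
SECONDS_PER_DAY = 86_400

total_bins = 7.6e8        # size of the search space (assumed reading of the slide)
exhaustive_days = 125     # reported exhaustive-search time
ea_bins = 1e4             # bins examined by the EA (assumed reading of the slide)
ea_seconds = 3 * 60       # upper bound on EA run time

exhaustive_seconds = exhaustive_days * SECONDS_PER_DAY
exhaustive_rate = total_bins / exhaustive_seconds   # bins evaluated per second
ea_rate = ea_bins / ea_seconds                      # bins evaluated per second
fraction_examined = ea_bins / total_bins            # share of the space the EA visited
speedup = exhaustive_seconds / ea_seconds           # wall-clock ratio

print(f"exhaustive: {exhaustive_rate:.1f} bins/s, EA: {ea_rate:.1f} bins/s")
print(f"fraction examined: {fraction_examined:.2e}, speedup: {speedup:.0f}x")
```

Under this reading both searches evaluate bins at a similar rate (tens of bins per second), so the gain comes from the EA visiting only about one bin in every 76,000 rather than from faster evaluation.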
Results
To test the utility of this method:
  Run on newly discovered genomes (S. pyogenes)
  Compare to a database that has an alignment for this RNA secondary structure for previously discovered genomes (S. mutans)
Found sequence and structure similar to those of closely related organisms