Evolving L-Systems to Capture Protein Structure Native Conformations
Gabi Escuela (1), Gabriela Ochoa (2) and Natalio Krasnogor (3)
(1, 2) Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela
[email protected], [email protected]
(3) School of Computer Science and I.T., University of Nottingham
[email protected]
Content
• Proteins
• Protein Structure Prediction (PSP)
• The HP model
• EA approaches to PSP: current encoding
• L-Systems
• Why a grammatical encoding?
• Methods and Results
• Discussion and Future Work
Proteins
• Linear chains of ~30-400 units from 20 different amino acids
• Fold into a unique functional structure: native state or tertiary structure
• Show repeated substructures: alpha helices and beta sheets
Figures: 3D structure of myoglobin, showing coloured alpha helices; 1A8M 3-D structure
Protein Structure Prediction (PSP)
• Goal: determine the 3D structure of proteins from their amino acid sequences
• Strategy: find an amino acid chain's state of minimum energy
• A solution will have practical consequences in medicine, drug development and agriculture
The 2D HP Model
• The hydrophobic effect is the main force governing folding
• Two amino acid types: hydrophobic (H) and polar or hydrophilic (P)
• q ∈ {H, P}+; each letter of q has to be placed on a vertex of a given lattice L (at each point: turn 90º left or right, or continue ahead)
• Scoring function: adds -1 for each "contact" between two Hs that are adjacent in the lattice but not consecutive in q
• Objective: find the organization (embedding) of q in L of minimum score (maximum contacts)
Example (square lattice): HPHPPHHPHPPHPHHPPHPH, 9 H-H bonds, score = -9
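The scoring function can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `hp_score` and the representation of an embedding as a list of (x, y) lattice points, one per residue, are hypothetical choices and do not come from the slides.

```python
def hp_score(sequence, coords):
    """Return the HP-model score: -1 per non-consecutive H-H lattice contact.

    `sequence` is a string over {H, P}; `coords` holds one (x, y) lattice
    point per residue (an assumed representation, not the authors').
    """
    if len(sequence) != len(coords):
        raise ValueError("one lattice point is needed per residue")
    if len(set(coords)) != len(coords):
        raise ValueError("embedding is not self-avoiding")
    index_at = {point: i for i, point in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each contact is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score -= 1
    return score


# Four residues folded around a unit square: the two end Hs touch.
print(hp_score("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```

Minimising this score is the same as maximising the number of H-H contacts, which is the objective stated on the slide.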
EA approaches to PSP: Current (Direct) Encoding
• EAs and other stochastic methods: global optimization of a suitable energy function
• Encoding: Cartesian coordinates, distance geometries, internal coordinates
  • Absolute: the structure is encoded as a string of symbols. For example, in the 2D square lattice: s ∈ {U, D, L, R}+ (Up, Down, Left, Right)
  • Relative: each move is interpreted in terms of the previous one: s ∈ {F, L, R}+ (Forward, TurnLeft, TurnRight)
Example:
Protein: HPHPPHHPHPPHPHHPPHPH (L = 20)
Absolute encoding: RDDLULDLDLUURULURRD (L = 19); the first position is fixed
Relative encoding: RFRRLLRLRRFRLLRRFR (L = 18); the first and second positions are fixed
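A minimal sketch of how the two encodings could be decoded into lattice coordinates follows. The function names, the start at the origin, the convention that Up is +y, and the initial heading to the right for the relative encoding are assumptions made for illustration; the slides only state which positions are fixed.

```python
ABSOLUTE_STEPS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def decode_absolute(moves, start=(0, 0)):
    """Walk an absolute string (U, D, L, R) from a fixed first position."""
    coords = [start]
    for move in moves:
        dx, dy = ABSOLUTE_STEPS[move]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


def decode_relative(moves, start=(0, 0), heading=(1, 0)):
    """Walk a relative string (F, L, R); the first two positions are fixed."""
    x, y = start
    dx, dy = heading
    coords = [(x, y), (x + dx, y + dy)]
    for move in moves:
        if move == "L":      # rotate the heading 90 degrees counter-clockwise
            dx, dy = -dy, dx
        elif move == "R":    # rotate the heading 90 degrees clockwise
            dx, dy = dy, -dx
        elif move != "F":
            raise ValueError(f"unknown move {move!r}")
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    return coords


absolute = decode_absolute("RDDLULDLDLUURULURRD")
relative = decode_relative("RFRRLLRLRRFRLLRRFR")
print(len(absolute), len(relative), absolute == relative)  # 20 20 True
```

Under these conventions both strings from the example place the 20 residues on the same lattice points, which is why the relative string is one symbol shorter than the absolute one.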
L-Systems (Lindenmayer, 1968)
• A model of morphogenesis, based on formal grammars
• Rewriting: define complex objects by replacing parts of a simple object using a set of productions
• Symbols: F, f, +, -, [, ]
• Axiom (S): F
• Production (replacement) rules: r1: F → F+f; r2: f → F
Derivation:
start: F
1: F+f
2: F+f+F
3: F+f+F+F+f
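The rewriting process can be illustrated with a short Python sketch. The function names `rewrite` and `derive` are hypothetical; the axiom and the two rules are the ones from the slide's example.

```python
def rewrite(word, rules):
    """Apply one parallel rewriting step; symbols without a rule are copied."""
    return "".join(rules.get(symbol, symbol) for symbol in word)


def derive(axiom, rules, steps):
    """Return the axiom followed by the word obtained after each step."""
    words = [axiom]
    for _ in range(steps):
        words.append(rewrite(words[-1], rules))
    return words


rules = {"F": "F+f", "f": "F"}
for step, word in enumerate(derive("F", rules, 3)):
    print(step, word)
# 0 F
# 1 F+f
# 2 F+f+F
# 3 F+f+F+F+f
```

All symbols are replaced simultaneously in each step, which is what distinguishes L-system rewriting from the sequential rewriting of ordinary formal grammars.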
Why a Grammatical Encoding?
• Specifies how to construct the phenotype
• Can achieve greater scalability through self-similar and hierarchical structure
• Proteins exhibit a high degree of regularity, and repeated motifs
• The current encoding may not be suitable for crossover and building block transfer between individuals
Figure labels: Protein Structure; 3D L-System
Method
• Proof of principle: can a folded protein be captured (encoded) by an L-system?
• How to find that L-system: an EA is used to evolve an L-system that captures a folded protein (inverse problem)
• EA input: folded structure in relative coordinates: RFRRLLRLRRFRLLRRFR
• EA output: an L-system L that, once derived, will produce the target string RFRRLLRLRRFRLLRRFR: Axiom = 01F, Rules = {0:RFR1, 1:2L2, 2:R0L}
Proposed Grammatical Encoding
D0L-system (deterministic and context free):
• Alphabet: Σ = Σ_t ∪ Σ_nt
  • Σ_t = {F, L, R}: terminal symbols (relative coord.)
  • Σ_nt = {0, 1, 2, ..., m-1}: non-terminal symbols (rewriting rules), m = max. number of rules
• Axiom: α ∈ Σ*
• Rewriting rules: i: w_i, where i ∈ Σ_nt and w_i ∈ Σ*
Example: axiom R2; rules 0:R03F; 1:R01L; 2:F310; 3:LRL3
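One possible in-memory representation of such a genotype is sketched below. The `Genotype` class and `parse_genotype` helper are hypothetical names, and the sketch assumes single-character non-terminals (m ≤ 10), since the slides write each non-terminal as one digit.

```python
from dataclasses import dataclass

TERMINALS = "FLR"


@dataclass(frozen=True)
class Genotype:
    """A D0L-system: an axiom plus one rewriting rule per non-terminal."""
    axiom: str
    rules: tuple  # rules[i] is the right-hand side w_i of non-terminal i

    def __post_init__(self):
        # The alphabet is the terminals plus one digit per rule (assumes m <= 10).
        allowed = set(TERMINALS) | {str(i) for i in range(len(self.rules))}
        for word in (self.axiom, *self.rules):
            unknown = set(word) - allowed
            if unknown:
                raise ValueError(f"symbols outside the alphabet: {sorted(unknown)}")


def parse_genotype(axiom, rule_text):
    """Parse rules written as on the slide, e.g. '0:R03F; 1:R01L'."""
    pairs = [item.split(":") for item in rule_text.split(";") if item.strip()]
    by_index = {int(lhs): rhs.strip() for lhs, rhs in pairs}
    return Genotype(axiom, tuple(by_index[i] for i in range(len(by_index))))


example = parse_genotype("R2", "0:R03F; 1:R01L; 2:F310; 3:LRL3")
print(example.rules)  # ('R03F', 'R01L', 'F310', 'LRL3')
```

Because the system is deterministic and context free, one right-hand side per non-terminal is enough to describe it completely.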
Evolutionary Algorithm
• Generational with rank-based selection
• Randomly generated initial population
  • Preset maximum number of rules
  • Axiom and rules: randomly generated strings of preset maximum length
• Genetic operators
  • Uniform-like (homologous) recombination (rate = 1.0): complete production rules are interchanged
  • Per-symbol mutation in both axioms and rules (deletion (30%), insertion (10%), modification (60%))
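The two genetic operators might look roughly as follows in Python. This is a sketch under several assumptions that the slides do not settle: the per-symbol mutation rate `rate` is a free parameter, the axiom is inherited whole from one parent, a genotype is an (axiom, list of rules) pair, and the maximum string length is not enforced. All function names are hypothetical.

```python
import random

TERMINALS = "FLR"


def recombine(parent_a, parent_b, rng):
    """Uniform-like crossover: each complete rule comes from one parent or
    the other, position by position (axiom handling is an assumption)."""
    axiom_a, rules_a = parent_a
    axiom_b, rules_b = parent_b
    axiom = rng.choice((axiom_a, axiom_b))
    rules = [rng.choice(pair) for pair in zip(rules_a, rules_b)]
    return axiom, rules


def mutate_word(word, alphabet, rate, rng):
    """Per-symbol mutation: deletion 30%, insertion 10%, modification 60%."""
    result = []
    for symbol in word:
        if rng.random() >= rate:
            result.append(symbol)            # symbol left untouched
            continue
        kind = rng.choices(("delete", "insert", "modify"), weights=(30, 10, 60))[0]
        if kind == "insert":
            result.extend((symbol, rng.choice(alphabet)))
        elif kind == "modify":
            result.append(rng.choice(alphabet))
        # "delete": the symbol is simply not copied
    return "".join(result)


def mutate(genotype, rate, rng):
    """Mutate the axiom and every rule of an (axiom, rules) genotype."""
    axiom, rules = genotype
    alphabet = TERMINALS + "".join(str(i) for i in range(len(rules)))
    return (mutate_word(axiom, alphabet, rate, rng),
            [mutate_word(rule, alphabet, rate, rng) for rule in rules])


rng = random.Random(0)
parent_a = ("31", ["3LL2", "R0RL", "RRF", "RFR1"])
parent_b = ("R2", ["RLR", "3F32L", "1FR33", "R102"])
child = mutate(recombine(parent_a, parent_b, rng), rate=0.05, rng=rng)
print(child)
```

Exchanging whole rules keeps each right-hand side intact, so a rule that already encodes a useful motif can move between individuals as a unit.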
Derivation and Fitness Function
• Derivation: from genotype (axiom and rules) to phenotype (folded structure)
• Post-processing: pruning of non-terminal symbols
• Fitness calculation: number of matches between the target string and the solution. Min. = 0, Max. = length of the desired folding.
Example:
genotype: Axiom = 31, Rules = {0:3LL2; 1:R0RL; 2:RRF; 3:RFR1}
axiom: 31
1st step: RFR1 R0RL
2nd step: RFR R0RL R 3LL2 RL
3rd step: RFRR 3LL2 RL R RFR1 LL RRF RL
post-processing (phenotype): RFRRLLRLRRFRLLRRFR
fitness = 18
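The genotype-to-phenotype mapping and the fitness count can be sketched as below. Two points are assumptions rather than statements from the slides: the number of derivation steps is fixed at three (as in the worked example), and matches are counted position by position over the common prefix, because pruning the third-step string of the example yields one symbol more than the 18-symbol target. The function names are hypothetical.

```python
TERMINALS = set("FLR")


def derive(axiom, rules, steps):
    """Rewrite every non-terminal in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def prune(word):
    """Post-processing: drop the non-terminals that remain after derivation."""
    return "".join(symbol for symbol in word if symbol in TERMINALS)


def fitness(phenotype, target):
    """Number of positions where phenotype and target agree (0..len(target)).

    zip() stops at the shorter string, so extra trailing symbols are ignored;
    this prefix comparison is an assumption of the sketch.
    """
    return sum(a == b for a, b in zip(phenotype, target))


target = "RFRRLLRLRRFRLLRRFR"
rules = {"0": "3LL2", "1": "R0RL", "2": "RRF", "3": "RFR1"}
phenotype = prune(derive("31", rules, steps=3))
print(phenotype, fitness(phenotype, target))  # RFRRLLRLRRFRLLRRFRL 18
```

With these conventions the example genotype reaches the maximum fitness of 18, in agreement with the value shown on the slide.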
Results (1)
Instance (HP sequence / target folding) | Length | Successes | One Solution
HPHPPHHPHPPHPHHPPHPH / RFRRLLRLRRFRLLRRFR | 18 | 5/50 (4 R) | A = 31, R = {0:3LL2, 1:R0RL, 2:RRF, 3:RFR1}
HHHPPHPHPHPPHPHPHPPH / RRFRFRLFRRFLRLRFRR | 18 | 3/50 (4 R) | A = R2, R = {0:RLR, 1:3F32L, 2:1FR33, 3:R102}
HHPPHPPHPPHPPHPPHPPHPPHH / RLLFLFFRRFLLFRRLRFFRRF | 22 | 0/50 (4 R), 1/50 (5 R) | A = 1R, R = {0:4LF3, 1:RL243, 2:00F3, 3:RRFL, 4:0R14F}
PPHPPHHPPPPHHPPPPHHPPPPHH / FFRRFFFLLFFFFRRFFFFLLFF | 23 | 1/50 (5 R) | A = 32, R = {0:20R2, 1:132F, 2:FF012, 3:0FLL}
Results (2)
Evolutionary progression towards the target structure
Discussion
• The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
• We are not solving the PSP yet, but...
• We are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP
Future work
• Incorporate problem knowledge about secondary structures
• Explore longer chains and 3D lattices
Figure labels: Alpha Helix; Beta Sheet; Beta Turn