Evolving L-Systems to Capture Protein Structure Native Conformations

Gabi Escuela 1, Gabriela Ochoa 2 and Natalio Krasnogor 3
1,2 Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela; [email protected], [email protected]
3 School of Computer Science and I.T., University of Nottingham; [email protected]


Page 1: Evolving L-Systems to Capture Protein Structure Native Conformations

Evolving L-Systems to Capture Protein Structure Native Conformations

Gabi Escuela 1, Gabriela Ochoa 2 and Natalio Krasnogor 3

1,2 Department of Computer Science, Universidad Simon Bolivar, Caracas, Venezuela
[email protected], [email protected]

3 School of Computer Science and I.T., University of Nottingham
[email protected]

Page 2: Evolving L-Systems to Capture Protein Structure Native Conformations

Content

• Proteins
• Protein Structure Prediction (PSP)
• The HP model
• EA approaches to PSP: current encoding
• L-Systems
• Why a grammatical encoding?
• Methods and Results
• Discussion and Future Work

3D structure of myoglobin, showing coloured alpha helices.

Page 3: Evolving L-Systems to Capture Protein Structure Native Conformations

Proteins

• Linear chains of ~30-400 units from 20 different amino acids
• Fold into a unique functional structure: native state or tertiary structure
• Show repeated substructures: alpha helices and beta sheets

Figure: 1A8M 3-D structure

Page 4: Evolving L-Systems to Capture Protein Structure Native Conformations

Protein Structure Prediction (PSP)

• Goal: determine the 3D structure of proteins from their amino acid sequences
• Strategy: find an amino acid chain's state of minimum energy
• A solution will have practical consequences in medicine, drug development and agriculture

Page 5: Evolving L-Systems to Capture Protein Structure Native Conformations

The 2D HP Model

• 2 amino acid types: hydrophobic (H) and polar or hydrophilic (P)
• The hydrophobic effect is the main force governing folding
• q ∈ {H, P}+; each letter of q has to be placed on a vertex of a given lattice L (at each point: turn 90º left or right, or continue ahead)
• Scoring function: adds -1 for each "contact" between two Hs that are adjacent in the lattice but not consecutive in q
• Objective: find the organization (embedding) of q in L of minimum score (maximum contacts)

Figure: HPHPPHHPHPPHPHHPPHPH embedded in a square lattice; 9 H-H bonds, score = -9
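The scoring function above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `fold_relative` and `hp_score` are hypothetical, and the starting position and heading (first residue at the origin, second one step to the east) are assumptions. The score does not depend on that choice, since rotating or mirroring a fold leaves its contacts unchanged.

```python
def fold_relative(moves):
    """Return lattice coordinates for a relative-move string over {F, L, R}.

    Assumption: the first residue sits at (0, 0) and the second at (1, 0).
    """
    x, y = 1, 0      # position of the second residue
    dx, dy = 1, 0    # current heading (east)
    coords = [(0, 0), (x, y)]
    for move in moves:
        if move == "L":
            dx, dy = -dy, dx    # turn 90 degrees left
        elif move == "R":
            dx, dy = dy, -dx    # turn 90 degrees right
        elif move != "F":
            raise ValueError(f"unknown move: {move!r}")
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords


def hp_score(sequence, moves):
    """HP-model score: -1 per H-H lattice contact not consecutive in the chain."""
    coords = fold_relative(moves)
    if len(coords) != len(sequence):
        raise ValueError("a chain of n residues needs n - 2 relative moves")
    if len(set(coords)) != len(coords):
        raise ValueError("the fold is not self-avoiding")
    index_at = {pos: i for i, pos in enumerate(coords)}
    score = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only to the right and upwards so each lattice edge is counted once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                score -= 1
    return score


print(hp_score("HPHPPHHPHPPHPHHPPHPH", "RFRRLLRLRRFRLLRRFR"))
```

Applied to the 20-residue sequence and the relative fold used later in the slides, the sketch reproduces the score of -9 quoted for the figure.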

Page 6: Evolving L-Systems to Capture Protein Structure Native Conformations

EA approaches to PSP: Current (Direct) Encoding

• EAs and other stochastic methods: global optimization of a suitable energy function
• Encoding: Cartesian Coordinates, Distance Geometries, Internal Coordinates
  - Absolute: structure encoded as a string of symbols. For example, in the 2D square lattice: s ∈ {Up, Down, Left, Right}+
  - Relative: each move is interpreted in terms of the previous one: s ∈ {Forward, TurnLeft, TurnRight}+

Page 7: Evolving L-Systems to Capture Protein Structure Native Conformations

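Protein: HPHPPHHPHPPHPHHPPHPH, L = 20

Absolute encoding: RDDLULDLDLUURULURRD, L = 19 (first position is fixed)

Relative encoding: RFRRLLRLRRFRLLRRFR, L = 18 (first and second positions are fixed)

Figure: the same lattice fold drawn twice, with the first moves labelled in the absolute encoding (R, D, D, L) and in the relative encoding (R, F, R, R)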
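The relation between the two encodings can be made concrete with a small conversion sketch. The helper name `absolute_to_relative` and the coordinate convention (U is +y, R is +x) are assumptions made for illustration. Each relative symbol is obtained by comparing two consecutive absolute moves, which is why the relative string is one symbol shorter.

```python
# Unit vectors for the absolute moves (assumed convention: U = +y, R = +x).
HEADINGS = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def absolute_to_relative(absolute):
    """Convert an absolute move string over {U, D, L, R} to {F, L, R}."""
    relative = []
    for prev, curr in zip(absolute, absolute[1:]):
        (px, py), (cx, cy) = HEADINGS[prev], HEADINGS[curr]
        dot = px * cx + py * cy      # 1 when the heading is unchanged
        cross = px * cy - py * cx    # +1 for a left turn, -1 for a right turn
        if dot == 1:
            relative.append("F")
        elif cross == 1:
            relative.append("L")
        elif cross == -1:
            relative.append("R")
        else:
            raise ValueError("a reversal would revisit the previous lattice site")
    return "".join(relative)


print(absolute_to_relative("RDDLULDLDLUURULURRD"))
```

For the absolute string on this slide the sketch yields the relative string shown next to it, RFRRLLRLRRFRLLRRFR.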

Page 8: Evolving L-Systems to Capture Protein Structure Native Conformations

L-Systems (Lindenmayer, 1968)

• A model of morphogenesis, based on formal grammars
• Rewriting: define complex objects by replacing parts of a simple object using a set of productions
• Components:
  - Symbols: F, f, +, -, [, ]
  - Axiom (S)
  - Production (replacement) rules

Example:
S: F
r1: F → F+f
r2: f → F

Derivation:
start: F
1: F+f
2: F+f+F
3: F+f+F+F+f
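The rewriting process on this slide can be reproduced with a short sketch. The function name `derive` is hypothetical, and the two rules are read off the derivation shown on the slide (F is replaced by F+f, f by F). All symbols are rewritten in parallel at each step, and symbols without a rule, such as +, are copied unchanged.

```python
def derive(axiom, rules, steps):
    """Apply the production rules to every symbol in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        # Symbols that have no rule are copied as they are.
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


rules = {"F": "F+f", "f": "F"}
for step in range(4):
    print(step, derive("F", rules, step))
```

The four printed lines correspond to the start word and derivation steps 1 to 3 listed on the slide.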

Page 9: Evolving L-Systems to Capture Protein Structure Native Conformations

Why a Grammatical Encoding?

• Specifies how to construct the phenotype
• Can achieve greater scalability through self-similar and hierarchical structure
• Proteins exhibit a high degree of regularity, and repeated motifs
• Current encoding may not be suitable for crossover and building block transfer between individuals

Figure: a protein structure next to a 3D L-system

Page 10: Evolving L-Systems to Capture Protein Structure Native Conformations

Method

• Proof of principle: can a folded protein be captured (encoded) by an L-system?
• How to find that L-system: an EA is used to evolve an L-system that captures a folded protein (inverse problem)

EA input: folded structure in relative coordinates
RFRRLLRLRRFRLLRRFR

EA output: L-system L that, once derived, will produce the target string
RFRRLLRLRRFRLLRRFR

Axiom = 01F
Rules = {0:RFR1, 1:2L2, 2:R0L}

Page 11: Evolving L-Systems to Capture Protein Structure Native Conformations

Proposed Grammatical Encoding

D0L-system (deterministic and context free):

• Alphabet: Σ = Σ_t ∪ Σ_nt
  - Σ_t = {F, L, R}: terminal symbols (relative coord.)
  - Σ_nt = {0, 1, 2, ..., m-1}: non-terminal symbols (rewriting rules), m = max. number of rules
• Axiom: α ∈ Σ*
• Rewriting rules: i: w_i, where i ∈ Σ_nt and w_i ∈ Σ*

Example:
axiom R2
rules 0:R03F; 1:R01L; 2:F310; 3:LRL3

Page 12: Evolving L-Systems to Capture Protein Structure Native Conformations

Evolutionary Algorithm

• Generational with rank-based selection
• Randomly generated initial population
  - Prefixed maximum number of rules
  - Axiom and rules: randomly generated strings of prefixed maximum length
• Genetic operators
  - Uniform-like (homologous) recombination (rate = 1.0): complete production rules are interchanged
  - Per-symbol mutation in both axioms and rules (deletion (30%), insertion (10%), modification (60%))
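The two genetic operators could look roughly as follows. This is a sketch under several assumptions that the slide does not settle: a genotype is modelled as a pair (axiom, list of rules) with rule i stored at index i, the child's axiom is taken from either parent, the per-symbol mutation probability `rate` is a hypothetical parameter, and the prefixed maximum lengths are not enforced. Only the 30/10/60 split between deletion, insertion and modification comes from the slide.

```python
import random


def recombine(parent_a, parent_b, rng):
    """Uniform-like recombination: each complete rule comes from one parent."""
    axiom_a, rules_a = parent_a
    axiom_b, rules_b = parent_b
    # Assumption: the axiom is inherited whole from a randomly chosen parent.
    child_axiom = rng.choice((axiom_a, axiom_b))
    # Homologous exchange: rule i of the child is rule i of either parent.
    child_rules = [rng.choice(pair) for pair in zip(rules_a, rules_b)]
    return child_axiom, child_rules


def mutate_string(word, alphabet, rate, rng):
    """Per-symbol mutation with deletion 30%, insertion 10%, modification 60%."""
    out = []
    for symbol in word:
        if rng.random() >= rate:
            out.append(symbol)    # symbol left untouched
            continue
        kind = rng.choices(("delete", "insert", "modify"), weights=(30, 10, 60))[0]
        if kind == "delete":
            continue              # drop the symbol
        if kind == "insert":
            out.append(symbol)    # keep it and add a random symbol after it
        out.append(rng.choice(alphabet))
    return "".join(out)


rng = random.Random(0)
parent_a = ("31", ["3LL2", "R0RL", "RRF", "RFR1"])
parent_b = ("R2", ["RLR", "3F32L", "1FR33", "R102"])
child = recombine(parent_a, parent_b, rng)
alphabet = "FLR0123"
mutated = (mutate_string(child[0], alphabet, 0.1, rng),
           [mutate_string(rule, alphabet, 0.1, rng) for rule in child[1]])
print(child, mutated)
```

Exchanging whole rules rather than cutting inside them keeps each production intact, which is the building-block argument made earlier for a grammatical encoding.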

Page 13: Evolving L-Systems to Capture Protein Structure Native Conformations

Derivation and Fitness Function

• Derivation: from genotype (axiom and rules) to phenotype (folded structure)
• Post-processing: pruning of non-terminal symbols
• Fitness calculation: number of matches between the target string and the solution. Min. = 0, Max. = length of the desired folding.

Genotype:
Axiom = 31
Rules = {0:3LL2; 1:R0RL; 2:RRF; 3:RFR1}

Derivation:
axiom: 3 1
1st step: RFR1 R0RL
2nd step: RFR R0RL R 3LL2 RL
3rd step: RFRR 3LL2 RL R RFR1 LL RRF RL
post-processing: RFRRLLRLRRFRLLRRFR

Phenotype fitness = 18
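The genotype-to-phenotype mapping and the fitness count can be sketched as follows, using the axiom and rules from this slide. The function names are hypothetical. Run literally, three derivation steps followed by pruning give the target string followed by one further L, so the sketch compares position by position over the target's length only; that treatment of the length mismatch is an assumption, not something the slide specifies.

```python
TERMINALS = "FLR"


def derive(axiom, rules, steps):
    """Rewrite every non-terminal in parallel, `steps` times."""
    word = axiom
    for _ in range(steps):
        word = "".join(rules.get(symbol, symbol) for symbol in word)
    return word


def prune(word):
    """Post-processing: drop the non-terminal symbols that remain."""
    return "".join(symbol for symbol in word if symbol in TERMINALS)


def fitness(target, candidate):
    """Number of positions at which candidate and target agree."""
    # zip stops at the shorter string, so extra trailing symbols are ignored.
    return sum(t == c for t, c in zip(target, candidate))


target = "RFRRLLRLRRFRLLRRFR"
rules = {"0": "3LL2", "1": "R0RL", "2": "RRF", "3": "RFR1"}
phenotype = prune(derive("31", rules, 3))
print(phenotype, fitness(target, phenotype))
```

With this genotype the fitness equals 18, the length of the desired folding, which is the maximum stated on the slide.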

Page 14: Evolving L-Systems to Capture Protein Structure Native Conformations

Results (1)

Instance (HP sequence / target fold) | Length | Successes | One solution
HPHPPHHPHPPHPHHPPHPH / RFRRLLRLRRFRLLRRFR | 18 | 5/50 (4 R) | A = 31, R = {0:3LL2, 1:R0RL, 2:RRF, 3:RFR1}
HHHPPHPHPHPPHPHPHPPH / RRFRFRLFRRFLRLRFRR | 18 | 3/50 (4 R) | A = R2, R = {0:RLR, 1:3F32L, 2:1FR33, 3:R102}
HHPPHPPHPPHPPHPPHPPHPPHH / RLLFLFFRRFLLFRRLRFFRRF | 22 | 0/50 (4 R), 1/50 (5 R) | A = 1R, R = {0:4LF3, 1:RL243, 2:00F3, 3:RRFL, 4:0R14F}
PPHPPHHPPPPHHPPPPHHPPPPHH / FFRRFFFLLFFFFRRFFFFLLFF | 23 | 1/50 (5 R) | A = 32, R = {0:20R2, 1:132F, 2:FF012, 3:0FLL}

Page 15: Evolving L-Systems to Capture Protein Structure Native Conformations

Results (2)

Evolutionary progression towards the target structure

Page 16: Evolving L-Systems to Capture Protein Structure Native Conformations

Discussion

• The proposed EA discovered L-systems that capture a target folding under the HP model in 2D lattices
• We are not solving the PSP yet, but ...
• We are proposing a novel and potentially useful generative encoding for evolutionary approaches to PSP

Page 17: Evolving L-Systems to Capture Protein Structure Native Conformations

Future work

• Incorporate problem knowledge about secondary structures
• Explore longer chains and 3D lattices

Figure: secondary structure motifs (alpha helix, beta sheet, beta turn)