BIOINFORMATICS Vol. 00 no. 00 2005 Pages 1–8 · BIOINFORMATICS Vol. 00 no. 00 2005 Pages 1–8 A...

8
BIOINFORMATICS Vol. 00 no. 00 2005 Pages 1–8 A Path Planning Approach for Computing Large-Amplitude Motions of Flexible Molecules J. Cortés a,d , T. Siméon a , V. Ruiz de Angulo a,b , D. Guieysse c , M. Remaud-Siméon c , V. Tran d a LAAS-CNRS, 7 av. du Colonel-Roche, 31077 Toulouse, France b Institut de Robòtica i Informàtica Industrial, CSIC-UPC, Llorens Artigas 4-6, 2 planta, 08028 Barcelona, Spain c Laboratoire Biotechnologie-Bioprocédés, UMR CNRS 5504, UMR INRA 792, INSA, 135 av. de Rangueil, 31077 Toulouse, France d Unité Biotechnologie, Biocatalyse, Biorégulation (U3B), UMR CNRS 6204, Faculté des Sciences et Techniques, 2 rue de la Houssinière, 44322 Nantes, France ABSTRACT Motivation: Motion is inherent to molecular interactions. Taking into account the molecular flexibility is necessary to develop accurate computational techniques for interaction pre- diction. Energy-based methods currently used in molecular modeling (i.e. molecular dynamics, Monte Carlo algorithms) are, in practice, only able to compute local motions. However, large-amplitude motions often occur in biological processes. We investigate the application of geometric path planning algo- rithms to compute such large motions on flexible molecular models. Our purpose is to exploit the efficacy of a geometric search as a filtering stage before refining with energies. Results: Two kinds of large-amplitude motions are treated in this paper: protein loop conformational changes (involving pro- tein backbone flexibility) and ligand trajectories to the active site of a protein (involving ligand and protein side-chain fle- xibility). The first simulations performed using our two-stage approach (geometry + energy) show that, compared to classi- cal molecular modeling methods, quite similar results can be obtained with a significant performance gain, ranging from 100 to 1000. Furthermore, the geometric treatment can provide by itself highly valuable information to biologists. Availability: The algorithms presented in this paper have been implemented in the general-purpose motion planning software Move3D (Siméon et al., 2001). We are currently working on an optimized stand alone library that will be available to the scientific community. Contact: [email protected] 1 INTRODUCTION The interest of the computational analysis of molecular moti- ons is well known. Molecular flexibility remains the main challenge for accurate docking approaches (Janin et al., 2003) or for studying molecular pathways (e.g. protein folding, structural rearrangements). Classical biomolecular modeling methods (e.g., Leach, 1996) are too computationaly expensive for generating large motions while accounting for molecular flexibility. Molecular dynamics simulations are in practice applicable to compute motions in the time range of nanose- conds. Techniques based on Monte Carlo algorithms permit to compute larger motions, but they are also restrained. In prac- tice, the two limiting factors of these methods are the CPU time cost of energy computation and the high tendency to fall in local minima of the energy landscape. Motion planning algorithms (Latombe, 1991) originally developed in robotics, have already been proposed as new tools for computational biology (see Section 2). While these techniques have been applied to explore the molecular force field, our aim is to exploit the efficacy of a geometric treat- ment of molecular constraints before dealing with energies. The goal is to better handle the complexity introduced by fle- xible molecular models. We investigate a two-stage approach for computing large-amplitude motions. In a first stage, motion planning algorithms are applied to explore the portions of the conformational space that satisfy a set of geome- tric constraints corresponding to strong energetic restrictions (see Section 3.1). The interest of this geometric filtering is that high-dimensional conformational spaces can be globally explored in an efficient way. The geometric path is then used in a second stage as a filtered input for a more accurate ana- lysis using classical energy-based modeling techniques. This paper focuses on the geometric stage of the approach. Note that previous works in computational biology already rely on a geometric treatment of molecular constraints. The geometric complementarity of molecular surfaces is a widely used criterion to predict protein-ligand or protein-protein interactions (e.g., Kuntz et al., 1982; Rarey et al., 1996; Jack- son et al., 1998), especially in rigid docking approaches. Also, techniques for the conformational sampling of polypeptide segments (e.g., Moult and James, 1986; DePristo et al., 2003) © Oxford University Press 2005. 1

Transcript of BIOINFORMATICS Vol. 00 no. 00 2005 Pages 1–8 · BIOINFORMATICS Vol. 00 no. 00 2005 Pages 1–8 A...

BIOINFORMATICS Vol. 00 no. 00 2005Pages 1–8

A Path Planning Approach for ComputingLarge-Amplitude Motions of Flexible Molecules

J. Cortés a,d, T. Siméon a, V. Ruiz de Angulo a,b, D. Guieysse c,M. Remaud-Siméon c, V. Tran d

aLAAS-CNRS, 7 av. du Colonel-Roche, 31077 Toulouse, FrancebInstitut de Robòtica i Informàtica Industrial, CSIC-UPC,Llorens Artigas 4-6, 2 planta, 08028 Barcelona, Spain

cLaboratoire Biotechnologie-Bioprocédés, UMR CNRS 5504, UMR INRA 792,INSA, 135 av. de Rangueil, 31077 Toulouse, France

dUnité Biotechnologie, Biocatalyse, Biorégulation (U3B), UMR CNRS 6204,Faculté des Sciences et Techniques, 2 rue de la Houssinière, 44322 Nantes, France

ABSTRACTMotivation: Motion is inherent to molecular interactions.Taking into account the molecular flexibility is necessary todevelop accurate computational techniques for interaction pre-diction. Energy-based methods currently used in molecularmodeling (i.e. molecular dynamics, Monte Carlo algorithms)are, in practice, only able to compute local motions. However,large-amplitude motions often occur in biological processes.We investigate the application of geometric path planning algo-rithms to compute such large motions on flexible molecularmodels. Our purpose is to exploit the efficacy of a geometricsearch as a filtering stage before refining with energies.Results: Two kinds of large-amplitude motions are treated inthis paper: protein loop conformational changes (involving pro-tein backbone flexibility) and ligand trajectories to the activesite of a protein (involving ligand and protein side-chain fle-xibility). The first simulations performed using our two-stageapproach (geometry + energy) show that, compared to classi-cal molecular modeling methods, quite similar results can beobtained with a significant performance gain, ranging from 100to 1000. Furthermore, the geometric treatment can provide byitself highly valuable information to biologists.Availability: The algorithms presented in this paper have beenimplemented in the general-purpose motion planning softwareMove3D (Siméon et al., 2001). We are currently working onan optimized stand alone library that will be available to thescientific community.Contact: [email protected]

1 INTRODUCTIONThe interest of the computational analysis of molecular moti-ons is well known. Molecular flexibility remains the mainchallenge for accurate docking approaches (Janinet al., 2003)or for studying molecular pathways (e.g. protein folding,structural rearrangements). Classical biomolecular modeling

methods (e.g., Leach, 1996) are too computationaly expensivefor generating large motions while accounting for molecularflexibility. Molecular dynamics simulations are in practiceapplicable to compute motions in the time range of nanose-conds. Techniques based on Monte Carlo algorithms permit tocompute larger motions, but they are also restrained. In prac-tice, the two limiting factors of these methods are the CPUtime cost of energy computation and the high tendency to fallin local minima of the energy landscape.

Motion planning algorithms (Latombe, 1991) originallydeveloped in robotics, have already been proposed as newtools for computational biology (see Section 2). While thesetechniques have been applied to explore the molecular forcefield, our aim is to exploit the efficacy of a geometric treat-ment of molecular constraints before dealing with energies.The goal is to better handle the complexity introduced by fle-xible molecular models. We investigate a two-stage approachfor computing large-amplitude motions. In a first stage,motion planning algorithms are applied to explore the portionsof the conformational space that satisfy a set of geome-tric constraints corresponding to strong energetic restrictions(see Section 3.1). The interest of this geometric filtering isthat high-dimensional conformational spaces can be globallyexplored in an efficient way. The geometric path is then usedin a second stage as a filtered input for a more accurate ana-lysis using classical energy-based modeling techniques. Thispaper focuses on the geometric stage of the approach.

Note that previous works in computational biology alreadyrely on a geometric treatment of molecular constraints. Thegeometric complementarity of molecular surfaces is a widelyused criterion to predict protein-ligand or protein-proteininteractions (e.g., Kuntzet al., 1982; Rareyet al., 1996; Jack-sonet al., 1998), especially in rigid docking approaches. Also,techniques for the conformational sampling of polypeptidesegments (e.g., Moult and James, 1986; DePristoet al., 2003)

© Oxford University Press 2005. 1

Cortés et al

often apply geometry-based approaches to the loop closingproblem and to atom overlap detection.

The proposed approach is applied to two kinds of computa-tional biology problems. Section 4 deals with the analysisof protein loop mobility. Section 5 concerns the study ofaccessibility problems in protein-ligand interactions, consi-dering both the flexibility of the ligand and that of the proteinside-chains. Our studies show the efficacy of the approach onsuch problems involving large-amplitude motions and mole-cular flexibility which are both poorly treated by energy-basedmethods. Finally it highlights the interest of the approach forguiding the rational design of proteins.

2 ABOUT MOTION PLANNINGMotion planning is a classical problem in robotics (Latombe,1991). In the last years, motion planning techniques haveundergone considerable progress. Recent techniques havebeen successfully applied to challenging problems in diverseapplication domains and have provided promising results incomputational biology.

2.1 Sampling-Based Motion Planning AlgorithmsSampling-based motion planners have demonstrated to be effi-cient tools to explore high-dimensional spaces. Most of theseplanners are variants of theProbabilistic RoadMap(PRM)approach (Kavrakiet al., 1996). The basic principle of PRMis to compute a connectivity graph (the roadmap) encodingrepresentative feasible paths in the search space (e.g. the mole-cular conformational space). Nodes correspond to randomlysampled points that satisfy a certain feasibility requirement(e.g. collision-freeness) and connections represent feasiblesubpaths computed between neighboring samples. Once theroadmap has been constructed, it can be used subsequently toefficiently process multiple motion planning queries betweentwo given positions of the moving system or to determineensemble properties of mobility.

Variants of the PRM framework have been designed for sol-ving single-query problems without preprocessing the com-plete roadmap. For example, theRapidly-exploring RandomTrees(RRT) algorithm (LaValle and Kuffner, 2001) expandsrandom trees rooted at the query positions and advancingtoward each other through the use of a greedy heuristic. Suchvariants are adapted when solving highly constrained pro-blems for which the solution space has the shape of a long thintube. While random sampling within the tube would requirea high density of points, the random tree variant may benefitfrom the shape of the tube to naturally steer the expansion.

2.2 Applications to Computational BiologyRecently, PRM-based algorithms have been successfully app-lied for studying molecular motions involved in biologicalprocesses such as protein-ligand interactions (Singhet al.,2001; Apaydinet al., 2004), protein folding (Amatoet al.,2003; Apaydinet al., 2002) and also RNA folding (Tang

et al., 2004). The main difference of the molecular adap-tation of the PRM framework is that the binary collisiondetection, used in robotic applications, is replaced by a mole-cular force field. Sampled conformations are accepted on thebasis of their potential energy and by weighting the roadmapedges according to their energetic cost. The major strength ofthese sampling-based techniques is their capability to explorehigh-dimensional spaces and to circumvent the local minimaproblem encountered by other classic simulation techniques(like Monte-Carlo algorithms or molecular dynamics) whichwaste a lot of time trying to escape from the local minima ofthe molecular energy landscape. Although the techniques aregeneral enough to use any method to compute the potentialenergies, the proposed planners consider simple potentialsfor efficiency purpose (e.g. van der Waals and electrosta-tic terms, using thresholds to avoid performing all pairwisecomputations on the potential terms).

Singhet al. (2001) and Apaydinet al. (2004) show promi-sing results on the study of binding sites for flexible ligands,assuming, as most of the docking methods, a rigid model of theprotein to maintain a reasonably low dimension of the con-formational space. Protein flexibility, which however playsan important role in protein-ligand interactions, is partiallyconsidered for protein folding applications using simplifiedarticulated models (articulated backbone with bounding sphe-res approximating side-chains (Amatoet al., 2003), or avector-based approximation of secondary structure elements(Apaydinet al., 2002)).

Apart our previous work on long protein loop conformatio-nal studies (Cortéset al., 2004), RRT-like methods have neverbeen applied to computational biology problems.

3 THE GEOMETRIC APPROACHThis section describes our geometric path planning approach.It first discusses how geometric constraints allow us to modelthe strongest molecular constraints considered by energy-based methods. It then describes our articulated model ofproteins and the main algorithms we developed for theefficient computation of geometrically feasible paths.

3.1 Geometric View of Molecular Constraints3.1.1 Molecular degrees of freedom.Molecular mechanicsforce fields consider separately bonded and non-bonded ato-mic interactions. Bonded interactions concern the variation ofthe relative position of bonded atoms which is usually givenin internal coordinates: bond lengths (stretching), bond angles(bending) and dihedral angles (torsion). Slight variations ofbond lengths and bond angles from their ideal values producea large increase of energy. Thus, in first approximation, theseparameters are generally assumed to be constant. Under thisassumption, a molecular chain is an articulated mechanismwith revolute joints that correspond to bond torsions.

In many studies, the global molecular architecture is knownand only segments of the molecular chain (e.g. protein loops)

2

Computing Large-Amplitude Motions of Flexible Molecules

are assumed to be flexible. The first and last atoms of the fle-xible segments must remain bonded with their neighboringatoms. Hence, kinematic loop closure constraints consi-derably reduce the subset of feasible conformations of thearticulated molecular model. Such constraints also appear incyclic molecules and in the presence of disulphide bonds.

3.1.2 Main repulsive constraints.Concerning non-bondedinteractions, the strongest contributor (in short distance inter-actions) corresponds to the van der Waals repulsive term. Alarge amount of energy is required to get two non-bondedatoms significantly closer than the van der Waals equilibriumdistance. Acceptable conformations must then satisfy a geo-metric constraint: the distance between pairs of non-bondedatoms must be greater than a given contact distance.

3.1.3 Main attractive constraints.Two other importanttypes of interactions are hydrogen bonds and hydrophobicinteractions. Intrinsically, such interactions maintain the glo-bular shape of macromolecules and are also involved inmolecular docking. Both kinds of interactions are possiblewhen specific relative locations of the concerned atoms aresatisfied. Thus, they imply geometric (distance/orientation)constraints on the molecular model.

These geometric constraints are the driving forces affectingmolecular conformational changes. It is assumed that energe-tically acceptable conformations are contained in the subsetof the geometrically feasible ones.

3.2 Geometric ModelingAs in many other approaches we handle all-atom modelsof molecules, that are considered as articulated mechanismswith atoms represented by spheres. Groups of rigidly bondedatoms form the bodies and the articulations between bodiescorrespond to bond torsions. A cartesian coordinate frame isattached to each group. The relative location of consecutiveframes is defined by an homogeneous transformation matrixthat is a function of the rotation angle between them. We fol-low a similar method to that of Zhang and Kavraki (2002)to define such frames and matrices between rigid groups.Figure 1 shows the mechanical model for a particular aminoacid residue of a protein.

Our modeling can also take advantage of a known secondarystructure. In this case, the rigid secondary structure elements(alpha helices and beta sheets) are modeled as rigid groupsof backbone atoms with articulated side-chains. Since secon-dary structure elements are fixed in the model, loop closureconstraints are introduced in the in-between segment. Similarclosure constraints can also be introduced in the model of amolecule to consider nonbonded interactions, such as hydro-gen bonds, that impose the spatial proximity between someatoms of the protein.

Fig. 1. Mechanical model of a protein residue (phenylalanin). It iscomposed of five rigid bodies, classified in:- backbone rigids:Rb1 = {N}, Rb2 = {Cα,Cβ}, Rb3 = {C, 0}- side-chain rigids:Rs1 = {Cγ}, Rs2 = {Cδ1,Cδ2,Cε1,Cε2,Cζ}.The rotations between rigids areφ andψ for the backbone, andγ1

andγ2 for the side-chain.

3.3 Motion Planning ToolsSeveral geometric tools are combined within our motion plan-ner. The main algorithm fulfills the conformational spaceexploration. Two principal functions called into this algo-rithm concern the sampling of points in the search space (i.e.conformational sampling) and the avoidance of steric clas-hes (i.e. collision detection). We describe below the tailoredalgorithms we developed for molecular models.

3.3.1 Collision Detection.The overall performance ofmotion planning techniques highly relies on efficient algo-rithms for accepting or rejecting the sampled conformationsand paths. This performance requirement is especially import-ant for molecular applications because of the quadratic costof enumerating all non-bonded atom pairs in models withthousands of atoms.

For this purpose, we developed a tailored algorithm BioCD(Ruiz de Anguloet al., 2005) for efficient self-collision anddistance computations in highly articulated molecular models.BioCD uses, as Lotanet al. (2002), hierarchical data struc-tures that approximate the shape of the protein at successivelevels of details, allowing to significantly reduce the number ofinteracting pairs to be tested for collision. However, while theformer algorithm was designed to situations in which only afew randomly selected degrees of freedom (DOF) of the kine-matic chain are slightly changed at each step, BioCD is moreadapted to our sampling-based motion planning situations inwhich preselected but larger sets of DOF are modified simul-taneously and arbitrarily when sampling new conformationsduring conformational space exploration.

BioCD is inspired by the dual kd-tree traversal algorithmsinitially developed for N-point correlation problems in stati-stical learning (Mooreet al., 2001). The algorithm maintainstwo levels of bounding volume hierarchies grouped accordingto spatial proximity. The first level organizes the rigid parts

3

Cortés et al

of the articulated model according to the selected DOF, andthe second level organizes the atoms inside each rigid part ofthe first level. Such data-structure can be efficiently tested forcollision and also updated with a moderate cost. Experimen-tal tests performed with BioCD show its efficacy to processthousands of collision tests per second on articulated proteinchains with hundreds of DOF (Ruiz de Anguloet al., 2005).

3.3.2 Conformational Sampling.We also developed analgorithm to sample molecular conformations that satisfyloop closure constraints. The algorithm relies on the gene-ral technique calledRandom Loop Generator(RLG) (Cortéset al., 2002). Its application to protein loops is discussed byCortéset al. (2004). RLG is based on a decomposition ofthe closed-chain mechanism. The kinematic chain correspon-ding to the loop backbone is divided into an active subchainand a passive one. The passive subchain is a backbone por-tion involving 6 rotational bonds. The parameters (dihedralangles) of the active subchain are progressively sampled usinga simple geometric algorithm that notably increases the pro-bability of obtaining a conformation of the passive subchainsatisfying loop closure. The conformation of the passive sub-chain is computed by a general6R inverse kinematics method(Renaud, 2000). RLG performs efficiently with long proteinloops, which are a challenge for most other related techniques.

Our RLG-based sampler integrates collision detection in theprogressive sampling process. Each time that a dihedral angleis generated, overlaps between atoms in the correspondingrigid group and the rest of the protein (including the previouslygenerated loop portion) are checked. In case of collision, thedihedral angle is re-sampled a given number of times. Thecomputed backbone conformations satisfy loop-closure andavoidance of steric clashes.

The loop conformational sampler also considers distanceconstraints between different elements of the closed chain.Such constraints allow us to account for the presence of back-bone hydrogen bonds, which particularly affect the motionof some loops (e.g. hairpin loops). Loop conformations aresampled such that the position of N and O atom pairs involvedin hydrogen bonds remains within a given distance range.

3.3.3 Conformational Exploration.The main algorithm ina motion planner determines the strategy to explore the searchspace. Molecular motions are in general extremely cons-trained mainly due to steric clashes. In other applicationdomains (e.g. mechanical disassembly), incremental searchplanners have demonstrated to be better suited than roadmap-based algorithms to solve very constrained motion planningproblems. A recent improved version of the RRT approachpresented by Yershovaet al. (2005) shows impressive per-formance in such cases. The basic principle is to iterativelydevelop a random tree rooted at the initial conformation. Ineach iteration, the tree is expanded toward a randomly sampledconformation while motion constraints remain satisfied along

CS

q

q rand

nearinit

CSfeas

q

Fig. 2. Expansion of a node in a search tree using an RRT-basedalgorithm. A conformationqrand is randomly sampled. The confor-mationqnear corresponding to the nearest node in the tree is pulledtowardqrand while feasibility constraints are satisfied. The tree aimsto capture the topology of the feasible subsetCSfeas.

the path (see Figure 2). We extended this kind of explora-tion strategy to applications involving protein loops followingthe approach for closed-chain motion planning described byCortés and Siméon (2004).

The next sections discuss two applications of our planningtechniques on problems involving large molecular motions.They also demonstrate the capability of the approach toefficiently handle articulated models with many DOF.

4 PROTEIN LOOP MOBILITYLoops are irregular portions of proteins. Such “irregularity”implies a difficulty for structure prediction and available tech-niques often fail when applied to long loops (Tramontanoet al., 2001). Indeed, surface loops are, in many cases, able toundergo significant conformational changes (often correspon-ding to their functionality), and they can adopt a variety ofenergetically favorable conformations. The main interest forstudying loop conformational changes is due to their import-ance in protein interactions. For instance, they can adaptthe surface topology of antibodies for antigen recognition(Jameset al., 2003), or play key roles in catalytic mecha-nisms (Osborneet al., 2001). Despite the interest in proteinloops, very limited tools are available to analyze their mobility.Energy-based approaches are only applicable (in practice) tocompute slight conformational changes. For larger motions,computationally simpler approaches have to be conceived.

Next we discuss the results obtained with our geometricfiltering approach for the study of a particular enzyme, a xyla-nase. The goal of this study is to validate the approach bycomparing our results to those obtained by a classic molecularmodeling method and to experimental results.

4.1 Mobility of a Specific Loop of XylanaseWe study endo-β-1,4-xylanase (EC 3.2.1.8) fromThermoba-cillus xylanilyticus(XTX) aiming to optimize the conversionof cereal co-products in bio-ethanol fuel. The architecture of

4

Computing Large-Amplitude Motions of Flexible Molecules

Fig. 3. Geometrically feasible conformational change of the“thumb” hairpin loop of xylanase fromThermobacillus xylanilyticus.

XTX 1 is similar to a right hand, where the thumb is a longhairpin loop. In this protein, the corresponding amino acidsequence spans from 107 to 125. Although maintained by anetwork of hydrogen bonds, this loop is suspected to be veryflexible, like for other xylanases (e.g., Muiluet al., 1998).An open loop conformation could allow an easier access ofthe substrate (xylan) to the catalytic pocket. Once the xylaninside the main crevice, a closed conformation of this loopcould complete the full docking. Figure 3 shows the structureof XTX and a feasible loop motion computed by our approach.

4.2 Results4.2.1 Conformational Sampling.We used two methods tosample acceptable conformations for this loop. Both methodsinclude the presence of backbone hydrogen-bond networks(HBN) that maintain the hairpin structure of the loop. Severalpossible HBN (from known structures of similar anti-parallelbeta-sheet loops) were tested.

The first method is a modified simulated annealing pro-cedure which is a well described approach in molecularmodeling when the number of DOF is too high to perform asystematic conformational search. The initial dynamics simu-lations were performed at 500 K with CFF91 force fieldfrom Accelrys, maintaining constraint distances (around 3.2Å) between non-hydrogen atoms supposed to be involved inhydrogen bonds. In the second stage, sets of conformationsregularly picked along the high temperature trajectories wereprogressively and slowly cooled and then minimized (morethan 10,000 iterations). At the final minimization step, thedistance constraints were removed. Only low energy confor-mations were selected and then clustered in several (10) lowenergy regions. A complete set of calculations (dynamics +minimization) involving a given HBN needs more then 10hours on a SGI computer (with MIPS R14000 processor).

The second approach applies a version of the loop confor-mational sampling algorithm described in Section 3.3.2. The

1 Known from personal communication. Structures of other xylanases of thesame family are available (e.g., de Lemoset al., 2004).

Fig. 4. Geometrically feasible random conformation of xylanaseloop (blue/dark), and low energy conformation obtained from it bysimple minimization (pink/clear). Both conformations are very close,the backbone RMSD is 0.67.

articulated loop model involves 68 DOF (42 for the back-bone and 26 for the side-chains). Conformations generatedby this geometric algorithm satisfy loop closure and stericclash avoidance. The method is very fast: 10 significantlydifferent conformations are generated in some seconds or, atworst, some minutes, depending on the difficulty to satisfy agiven HBN (on a PC with Intel Centrino 1.8 MHz processor).Obviously, the energy of such geometrically feasible confor-mations is in general higher than the energy of those obtainedby simulated annealing. However, a simple minimization ofa geometrically feasible conformation generally yields a lowenergy one. Since the geometric conformations already satisfysome strong constraints, their energy minimization is also veryfast, requiring about 1 minute per conformation. As shown inFigure 4, this minimization only produces a slight deformationof the loop conformation (RMSD < 1 Å).

4.2.2 Incremental Search.We next performed a more accu-rate study of xylanase loop mobility, considering the confor-mational change continuity, using the path planning approachexplained in Section 3.3.3. We applied our technique tocompare the mobility of the thumb loop in native XTXand in a mutant with two deletions: Tyr-111 and Thr-121.Experimentally, this mutant presents no activity.

Tests were made with different HBN. In all the cases, themutated loop presents a much more restricted mobility in thecrevice context compared to the native one. Figure 6 illustratesthe possible loop motions computed for the hydrogen-bondnetworks represented in Figure 5. The search trees (RRT)computed by the planning algorithm contain 5000 nodes. Notethat the construction of the trees requires less than one houron a standard PC. Our tests confirm the impossibility for thismutated loop to go deeply inside the crevice in order to fix the

109 110 111 112 113 114 115

N O N O N O N O

O N O N O N O N

122123124125

107 108 109 110

116N O N O N O N O

O N O N O N O N

N

O

117118119120

112 113 114 115Mutated loop(deletion 111,121)

Native XTX loop 116N O N O N O N O N O N O N O N O N O

O N O N O N O N O N O N O N O N O N

N

O

117118119120121122123124

107

125

108

Fig. 5. Hydrogen-bond networks (HBN) for the native/mutated loopof XTX helping to maintain the hairpin-like loop structure.

5

Cortés et al

Fig. 6. Graphic representation of the loop mobility computed fornative xylanase (left) and the mutated one (right). The small frames(in black) display positions reachable by the Cα atom of the middleresidue in the loop. The native loop can move toward the crevice whilethe mutated loop only undergoes slight conformational changes.

ligand for the catalytic action. This could explain the absenceof activity experimentally known for this mutant.

5 PROTEIN-LIGAND ACCESSIBILITYThe active site of many enzymes is located at the bottom ofa narrow and deep cavity. In such case, it is reasonable toconsider that the docking of the ligand to the binding pocketis influenced by the difficulty of accessing to the active site,this latter affecting the enzyme-ligand affinity. With energy-based methods, computing motions as large as that of a ligandentering from the protein surface to a deep active site (orvice versa) is very expensive in terms of computing time.However, when only geometric constraints are considered,computing the ligand path to go out from the catalytic pocketcan be seen as a mechanical disassembly problem on articula-ted enzyme/ligand models. Our incremental planner describedin Section 3.3.3 is well adapted to this kind of problem.

5.1 Study of Lipase EnantioselectivityThe active site of theBurkholderia cepacialipase (BCL)is placed at the bottom of a narrow and 17Å deep pocket.This lipase is used for the kinetic resolution of racemic2-substituted carboxylic acid in transesterification reaction.Recently, a classical approach consisting in the modelingof the R andS tetrahedral intermediates of the reaction fai-led to explain the enzyme enantioselectivity (Guieysseet al.,2003). Thus, a molecular modeling procedure based onpseudo-molecular dynamics simulation under constraints wasemployed to model the trajectory of each enantiomer from theactive site to the protein surface. Figure 7 shows the trace of thetrajectory of one of the enantiomers. Remarkably, the energyof the enzyme/enantiomer interaction along the trajectory wasfound to be always lower for the preferred enantiomer and inagreement with the experimental results of kinetic resolution.Consequently, molecular modeling of each enantiomer tra-jectory may be very informative to understand the enzyme

Fig. 7. Trajectory of a ligand ((R)-ph(Br)Et) accessing to the activesite ofBurkholderia cepacialipase.

enantioselectivity and very useful to predict it. However, themodeling protocol designed by Guieysseet al. (2003) hasseveral drawbacks. First, it does not enable an automaticmodeling because manual corrections of several side-chainorientations are required to remove the steric conflicts bet-ween the active site and the ligand along the trajectory. It isalso very much time consuming. Several days are required togenerate one trajectory.

5.2 ResultsOur incremental search planner was used to compute geome-trically feasible paths of articulated (R,S)-enantiomers, alsoconsidering the flexibility of 17 side-chains in the catalyticpocket of BCL. The model contains a total of 68 DOF (11for the ligand and 57 for the side-chains). Paths were com-puted for several couples of (R,S)-2-halogenophenyl aceticacid ethyl ester (referred to as ph(X)Et) in BCL for whichboth experimental (in vivo) results andin silico predictionsare available. Computing time is, at worst, several minutes(see Figure 9). The first conclusion of the study is that all theligand paths computed by our geometric approach are verysimilar to those obtained, after several days of computation,

Geometric approach + minimization-40

-35

-30

-25

-20

-15

-10

4,5 5,5 6,5 7,5 8,5 9,5 10,5

distance (Å)

inte

ract

ion

ener

gy (

kcal

/mol

)

R

S

Pseudo-molecular dynamics-40

-35

-30

-25

-20

-15

-10

4,5 5,5 6,5 7,5 8,5 9,5 10,5

distance (Å)

inte

ract

ion

ener

gy (

kcal

/mol

)

RS

Fig. 8. Interaction energy along the most constrained portion ofthe ligand trajectory from BCL active site. The distance is measu-red between the barycenters of the catalytic amino acid and ligand.The curves obtained by pseudo-molecular dynamics (left) are verysimilar to those obtained by geometric motion planning followedby energy minimization (right). In both cases, the (S)-enantiomerpresents higher interaction energy.

6

Computing Large-Amplitude Motions of Flexible Molecules

time (sec.)

Preferred enantiomer R

Computing

Experimental63

57

17

26

~1

1.5enantioselectivity

ph(Br)Et ph(Cl)Et ph(F)Et0

1000

50

100

500

1840

29 18

306

18 17

Ratio S/R

_R

S

R

Fig. 9. Diagram representing the average time for computing pathsfor three couples of enantiomers, (R,S)-ph(Br,Cl,F)Et. The compu-ting time ratioS/Rcorrelates with experimental enantioselectivity(Guieysseet al., 2003).

by pseudo-molecular dynamics. After a simple (fast) energyminimization of intermediate conformations along the geo-metrically feasible paths, the energetic profiles are also verysimilar to the curves computed by pseudo-molecular dynamics(see Figure 8).

Figure 9 shows the average computing time calculatedfrom 50 tests and the experimental enantioselectivity forthree couples of enantiomers. In general, computing time ofsampling-based motion planning algorithms increases withthe difficulty of the problem. Thus, assuming that the topo-logy of the catalytic pocket is better adapted to the access ofthe preferred enantiomer, its path should be computed fasterthan the path of the slow reacting one. Remarkably, our resultsshow a good correlation between the ratio of the computingtime necessary to find the path of (R,S)-enantiomers and theexperimental enantioselectivity value.

In addition, analysis of the set of computed paths enables avery fast localization of amino acid residues constraining theaccess of the enantiomers and involved in BCL discriminationof racemic compounds. Histograms in Figure 10 display theenzyme atoms constraining the motion of the (R,S)-ph(Br,F)Etenantiomers in four different portions of the path. Note thatthe (S)-ph(Br)Et enantiomer meets a higher number of atomsrestraining its access. This is totally consistent with compu-ting time results. Therefore using this fast technique makes itvery easy to pinpoint residues possibly involved in enantio-selectivity, this providing highly valuable information for sitedirected mutagenesis.

6 DISCUSSIONThe essential idea behind our approach is that importantconstraints affecting molecular motions have a geometricalinterpretation. Sampling-based motion planning algorithmsapplied on flexible molecular models are efficient tools tocompute the subsets of their conformational space that satisfysuch constraints. Our aim is to use such geometric algorithmsas efficient conformational filters before a fine energy-basedanalysis. The geometric filtering is particularly important

for large-amplitude motions in high-dimensional spaces, forwhich the applicability of energy-based approaches is limited.Furthermore, as shown by the results above, a biological inter-pretation can be directly made from geometrically feasiblemolecular motions.

Two kinds of applications have been presented. The first oneconcerns the analysis of protein flexibility, aiming to predictpossible conformational changes of polypeptide segments.The other application concerns accessibility problems inprotein-ligand interactions. Although the study conducted forthis second application only involves protein side-chain flexi-bility, the backbone flexibility (treated in the first application)could also be considered.

Our geometric algorithms have been tested with severalmolecules and first results are very promising. However, adeeper study with a larger set of models remains to be done.We are currently looking for examples of protein-ligand inter-actions involving narrow and deep cavities with availableexperimental results for a more rigorous validation.

Our current work focused on the geometric level of theapproach and classical energy-based molecular modelingmethods are used for the second stage. We aim to developnew techniques for this stage to better exploit the geometricpath information provided in the first filtering stage.

ACKNOWLEDGMENTThis work has been partially supported by the PIR CNRSproject BioMove3D, the European project IST-37185 MOVIEand the French ADEME project 02-01051.

( )−ph(Br)EtR

Side−chainBackbone

100

50

0

( )−ph(Br)EtS

100

50

0

Contact (%) Contact (%)

100

50

0

R( )−ph(F)Et100

50

0

S( )−ph(F)EtContact (%) Contact (%)

Portion 1 Portion 2 Portion 3 Portion 45 6.5 8 10

Portion 1 Portion 2 Portion 3 Portion 45 6.5 8 10

Portion 1 Portion 2 Portion 3 Portion 45 6.5 8 10

Portion 1 Portion 2 Portion 3 Portion 45 6.5 8 1015 15

1515

Fig. 10. Histograms representing lipase atoms constraining theligand motion in different portions of the path. Contact is consideredbetween 80% and 100% of the van der Waals equilibrium distance.The ordinates represent the percentage of times (over 50 runs) that acontact is observed. The names of the atoms and the correspondingresidues have been omitted for clarity reasons.

7

Cortés et al

REFERENCESAmato N.M., Dill K.A., Song G. (2003) Using Motion Planning to

Map Protein Folding Landscapes and Analyze Folding Kinetics ofKnown Native Structures,J. Comput. Biol., 10, 149-168.

Apaydin M.S., Brutlag D.L., Guestrin C., Hsu D., Latombe J.-C.(2002) Stochastic Roadmap Simulation: An Efficient Represen-tation and Algorithm for Analyzing Molecular Motion,Proc.RECOMB, 12-21.

Apaydin M.S., Brutlag D.L., Guestrin C., Hsu D., Latombe J.-C. (2004) Stochastic Conformational Roadmaps for ComputingEnsemble Properties of Molecular Motion,Algorithmic Foun-dations of Robotics V (WAFR2002), Springer-Verlag, Berlin,131-147.

Canutescu A.A., Dunbrack Jr. R.L. (2003) Cyclic CoordinateDescent: A Robotics Algorithm for Protein Loop Closure,ProteinSci., 12, 963-972.

Cortés J., Siméon T., Laumond J.-P. (2002) A Random Loop Gene-rator for Planning the Motions of Closed Kinematic Chains usingPRM Methods,Proc. IEEE Int. Conf. Rob. & Autom., 2141-2146.

Cortés J., Siméon T., Remaud-Siméon M., Tran V. (2004) Geome-tric Algorithms for the Conformational Analysis of Long ProteinLoops,J. Comput. Chem., 25, 956-967.

Cortés J., Siméon T. (2004) Sampling-Based Motion Planning underKinematic Loop-Closure Constraints,Proc. WAFR, Springer-Verlag, Berlin, in press.

DePristo M.A., de Bakker P.I.W., Lovell S.C., Blundell T.L. (2003)Ab Initio Construction of Polypeptide Fragments: Efficient Gene-ration of Accurate, Representative Ensembles,Proteins, 51,41-55.

G o N., Scheraga H.A. (1970) Ring Closure and Local Confor-mational Deformations of Chain Molecules,Macromolecules, 3,178-187.

Guieysse D., Salagnad C., Monsan P., Remaud-Simeon M., TranV. (2003) Towards a Novel Explanation ofPseudomonas CepaciaLipase Enantioselectivity Via Molecular Modelling of the Enan-tiomer Trajectory Into the Active Site,Tetrahedron: Asymmetry,14, 1807-1817.

Jackson R.M., Gabb H.A., Stenberg M.J.E. (1998) Rapid Refinementof Protein Interfaces Incorporating Solvation: Application to theDocking Problem,J. Mol. Biol., 276, 265-285.

James L.C., Roversi P., Tawfik D.S. (2003) Antibody MultispecificityMediated by Conformational Diversity,Science, 299, 1362-1367.

Janin J., Henrick K., Moult J., Eyck L.T., Sternberg M.J.E., VajdaS., Vakser I., Wodak S.J. (2003) CAPRI: A Critical Assessment ofPRedicted Interactions,Proteins, 52, 2-9.

Kavraki L.E., Svestka P., Latombe J.-C., Overmars M.H. (1996)Probabilistic Roadmaps for Path Planning in High-DimensionalConfiguration Spaces,IEEE Trans. Rob. & Autom., 12, 566-580.

Kuntz I.D., Blaney J.M., Oatley S.J., Langridge R., Ferrin T.E.(1982) A Geometric Approach to Macromolecule-Ligand Inter-actions,J. Mol. Biol., 161, 269-288.

Latombe J.-C. (1991) Robot Motion Planning, Kluwer AcademicPublishers, Boston, MA.

LaValle S.M., Kuffner, J.J. (2001) Rapidly-Exploring Random Trees:Progress and Prospects,Algorithmic and Computational Robotics:New Directions (WAFR2000), A.K. Peters, Boston, 293-308.

Leach, A.R. (1996) Molecular Modeling: Principles and Applicati-ons, Longman, Essex.

de Lemos Esteves F., Ruelle V., Lamotte-Brasseur J., Quinting B.,Frère J.M. (2004) Acidophilic Adaptation of Family 11 Endo-β-1,4-xylanases: Modeling and Mutational Analysis,Protein Sci.,13, 1209-1218.

Lotan I., Schwarzer F., Halperin D. Latombe J.-C. (2002) EfficientMaintenance and Self-Collision Testing for Kinematic Chains,Proc. 18th ACM Symp. Comput. Geom., 43-52.

Moore A.W., Connolly A., Genovese C., Gray A., Grone L., Kani-doris II N., Nichol R., Schneider J., Szalay A., Szapudi I.,Wasseerman L. (2001) Fast Algorithms and Efficient Statistics: N-point Correlation Functions,Proc. MPA/MPE/ESO Conf. “Miningthe Sky”, Springer-Verlag, New York.

Moult J., James M.N.G. (1986) An Algorithm for Determining theConformation of Polypeptide Segments in Proteins by SystematicSearch,Proteins, 1, 146-163.

Muilu J., Torronen A., Perakyla M., Rouvinen J. (1998) Func-tional Conformational Changes of Endo-1,4-xylanase II FromTrichoderma reesei: A Molecular Dynamics Study,Proteins, 31,434-44.

Osborne M.J., Schnell J., Benkovic S.J., Dyson H.J., Wright P.E.(2001) Backbone Dynamics in Dihydrofolate Reductase Com-plexes: Role of Loop Flexibility in the Catalytic Mechanism,Biochemistry, 40, 9846-9859.

Rarey M., Kramer B., Lengauer T., Klebe G. (1996) A Fast FlexibleDocking Method Using an Incremental Construction Algorithm,J. Mol. Biol., 261, 470-489.

Renaud M. (2000) A Simplified Inverse Kinematic Model Calcula-tion Method for all6R type Manipulators,Proc. Int. Conf. Mech.Design and Production, 15-25.

Ruiz de Angulo V., Cortés J., Siméon T. (2005) BioCD: An Effi-cient Algorithm for Self-Detection and Distance Computationbetween Highly Articulated Molecular Models,Robotics: Scienceand Systems, submitted.

Siméon T., Laumond J.-P., Lamiraux F. (2001) Move3D: a GenericPlatform for Path Planning,Proc. IEEE Int. Conf. Rob. & Autom.,25-30.

Singh A.P., Latombe J.-C., Brutlag D.L. (1999) A Motion PlanningApproach to Flexible Ligand Binding,Proc. ISMB, AAAI Press,Menlo Park, CA, 252-261.

Tang X., Kirkpatrick B., Thomas S., Song G, Amato N.M. (2004)Using Motion Planning to Study RNA Folding Kinetics,Proc.RECOMB, 252-261.

Tramontano A., Leplae R., Morea V. (2001) Analysis andAssessment of Comparative Modeling Predictions in CASP4,Problems, Suppl.5, 22-38.

Wedemeyer W.J., Scheraga H.A. (1999) Exact Analytical Loop Clos-ure in Proteins Using Polynomial Equations,J. Comput. Chem.,20, 819-844.

Yershova A., Jaillet L., Siméon T., LaValle S.M. (2005) Dynamic-Domain RRTs: Efficient Exploration by Controlling the SamplingDomain,IEEE Int. Conf. Rob. & Autom., submitted.

Zhang M., Kavraki L.E. (2002) A New Method for Fast and AccurateDerivation of Molecular Conformations,J. Chem. Inf. Comput.Sci., 41, 64-70.

8