1997 ESCHER a New Docking Procedure Applied to the Reconstruction of Protein Tertiary Structure

Seediscussions,stats,andauthorprofilesforthispublicationat:http://www.researchgate.net/publication/13958109

ESCHER:anewdockingprocedureappliedtothereconstructionofproteintertiarystructure.ARTICLEinPROTEINSSTRUCTUREFUNCTIONANDBIOINFORMATICSSEPTEMBER1997ImpactFactor:2.92DOI:10.1002/(SICI)1097-0134(199708)28:43.0.CO;2-7Source:PubMed

CITATIONS60

DOWNLOADS262

VIEWS170

3AUTHORS,INCLUDING:

GianniCesareniUniversityofRomeTorVergata198PUBLICATIONS12,950CITATIONS

SEEPROFILE

Availablefrom:GianniCesareniRetrievedon:16June2015

ESCHER: ANew Docking ProcedureApplied to theReconstruction of Protein Tertiary StructureG.Ausiello, G. Cesareni, and M. Helmer-Citterich*Department of Biology, University of Rome Tor Vergata, Via della Ricerca Scientifica, 00133 Roma, Italy

ABSTRACT Evaluation of Surface Com-plementarity, Hydrogen bonding, and Electro-static interaction in molecular Recognition(ESCHER) is a new docking procedure consist-ing of three modules that work in series. Thefirst module evaluates the geometric comple-mentarity and produces a set of rough solu-tions for the docking problem. The secondmodule identifies molecular collisions withinthose solutions, and the third evaluates theirelectrostatic complementarity. We describe thealgorithm and its application to the docking ofcocrystallized protein domains and unboundcomponents of protein-protein complexes. Fur-thermore, ESCHER has been applied to thereassociation of secondary and supersecond-ary structure elements. Thepossibility of apply-ing a dockingmethod to the problem of proteinstructure prediction is discussed. Proteins 28:556567, 1997. r 1997 Wiley-Liss, Inc.

Key words: molecular recognition; automateddocking; protein domains; second-ary structure elements

INTRODUCTION

Numerous automated methods for the predictionof protein complex formation have been developed.1,2

They are generally based on the rigid body approxi-mation35: some rely on a geometric criterion and ona simplified representation of the protein surface,soft docking methods,68,10 whereas others combinea shape complementarity search with subsequentenergy refinement.11,12 Many algorithms can success-fully reconstruct a protein complex starting from theexperimentally determined conformation of the com-ponents in the bound state. In some cases, somesuccess, albeit with lower accuracy, has been ob-tained by docking elements whose structures havebeen determined separately.8,10,13

In the rigid body approach, the position and orien-tation of a protein is completely described by threetranslational and three rotational degrees of free-dom. This approach is, therefore, widely used inautomated methods, when molecular flexibility isnot expected to affect much the geometry of theinteracting structures.

On the other hand, when the docking probleminvolves the recognition between a rigid protein anda flexible ligand, the rigid body approximation is notapplicable, and more conformational degrees of free-dom have to be taken into account.14,15 The basicprinciples underlying rigid body and flexible dockingare common, but they generally require differentcomputational techniques.We explore here the possibility of applying a new

soft docking rigid body method to components ofintermediate size when compared with the classi-cal targets of the docking methodologies discussedabove. We chose to dock protein components, such asprotein domains and secondary and supersecondarystructures, and to explore the limits of the rigid bodyapproximation when applied to a series of test casesdisplaying different sizes, structural complexities,and root-mean-square (rms) deviations from thestructures of the protein in their bound conforma-tions. The ultimate goal of this approach is theassembly of a whole protein tertiary structure start-ing from its secondary structure elements.The reconstruction of a protein structure starting

from its secondary structure elements has alreadybeen tried on a very small number of proteins,generally relying on knowledge-based potentials,1618

as well as on other methods.1921 To tackle thisproblem, a novel docking programwas used as a tool.To this end, we have substantially modified thePUZZLE8 procedure to achieve a more accuratedescription of the surface of small proteins and moreeffective matching and scoring criteria. Further-more, the new program, ESCHER, takes advantageof two filtermodules that reject solutions resulting insteric (BUMPS) or electrostatic (CHARGES) clashes.The addition of these modules allows less stringentcriteria for geometric complementarity to be consid-ered and, consequently, contributes to a furthersoftening of the docking procedure.

Abbreviations: PDB, Protein Data Bank; rmsd, root-mean-square deviation

*Correspondence to: Manuela Helmer-Citterich, Departmentof Biology, University of Rome Tor Vergata, via della RicercaScientifica e Tecnologica 00133 Rome, Italy.E-mail: [email protected] 22 October 1996; Accepted 3April 1997

PROTEINS: Structure, Function, and Genetics 28:556567 (1997)

r 1997 WILEY-LISS, INC.

ESCHER was applied to a series of two-domainproteins, by separating the domains and by attempt-ing to reconstruct the native protein from the two-domain crystallographic structures, and to proteincomplexes starting from their unbound components.Furthermore, we defined a limited set of test cases inwhich the docking partners were represented bysecondary and supersecondary structures separatedfrom their protein. The docking experiments wereperformedwith crystallographic structures and struc-tural models. This was done to assess whether andhow minor deviations from the idealized secondarystructure backbone conformation or lack of knowl-edge of the exact side-chain conformation couldaffect the docking results.

METHODSOverview

ESCHER is a rigid docking program, written in Clanguage, assembled from three modules that workin series: SHAPES, BUMPS, and CHARGES. Thefirst module takes as input the solvent-accessiblesurface of the target and probe peptides and providesa list of 500 solutions, ranked on the basis ofgeometric complementarity. Each solution is identi-fied by the translations and rotations to be applied tothe probe to match the target. The second module,BUMPS, analyzes the molecular collisions withinthe solution complexes, and CHARGES evaluatestheir electrostatic complementarity. All the solutionsgenerated by SHAPES are subjected to the BUMPSfilter and grouped together whenever their deviationfrom the member of the cluster with the highestgeometric complementarity is lower than 2.5 .Finally, each group is represented by the solutionwith the best CHARGES value.

SHAPESModule

SHAPES analyzes two peptide structures, evalu-ates their complementarity in all possible orienta-tions, and proposes a number of docking solutions,each identified by the Ti translations and the Rirotations to be applied to the second element (theprobe) with respect to the first element (the target).Each structural element is cut in parallel slices, andeach slice is described as a polygon with sides ofidentical length. The polygons representing the tar-get are compared with the probe polygons, andregions of complementarity are evaluated. A com-plete search in the rotation space is done by rotatingthe probe in all possible orientations with respect tothe target.The flowchart of the geometric module is reported

in Figure 1. Each box in the flowchart (b-1 to b-7)corresponds to one paragraph in the following de-tailed SHAPES description.

(b-1) Structural elements description

The solvent-accessible surface22,23 of each proteinstructural element is generated from its atomiccoordinates by using a probe radius of 1.8 (adensity of 10 dots/2). Each surface is cut in parallelslices 1.5 thick, orthogonal to the Z axis (Fig. 2a).Each slice is transformed into a polygon (a fewhundred of vertices) by an intelligent contouringalgorithm (Fig. 2b,c,d).

(b-2) Starting position in Z translations

The probe polygons are translated along the Z axisuntil the probe top polygon is at the level of thetarget bottom polygon (Fig. 3a).

(b-3) Polygon matching

Each polygon of the target can now be comparedwith the corresponding probe polygon belonging to

Fig. 1. Flowchart of ESCHER geometric module. Each box inthe flowchart corresponds to a paragraph in Methods.

557DOCKING OF PROTEIN STRUCTURAL ELEMENTS

the same Z plane. Each one of the m probe polygonsides is superposed to each one of the n correspond-ing target polygon sides, by applying the necessarytranslations (Tx and Ty) and rotation (Rz). A mea-sure of geometric complementarity (complementar-ity score) is associated to each one of the mxndifferent superpositions and to the correspondingRz,Tx and Ty transformation (Fig. 3b); the complemen-tarity score is the number of corresponding consecu-tive polygon vertices whose distance is lower than1.6 (Fig. 3c). Whenever a small cavity is present in

the interface between two polygons, the complemen-tarity of the preceding or of the following polygonsides still can contribute to the geometric score.

(b-4) Match grouping

SHAPES solutions are generated by grouping thepolygon matches coming from different Z planes butrepresenting the same complementary region.To group polygonmatches and define a complemen-

tary region extending through different Z planes, a

Fig. 2. Schematic representation of four steps needed to transform a three-dimensionalstructure in a set of polygons. a: A section orthogonal to the Z axis is cut on the structural elementsolvent accessible surface. b: The grid squares are defined filled (in white) or empty (in gray)whether they contain surface dots. c:A first polygonal line is drawn joining the center of the externalfilled squares. d: A second polygon with even sides (shown in bold) is drawn over the first polygon.

558 G. AUSIELLO ET AL.

three-dimensional space is defined in which a poly-gon match is represented by a dot with the corre-sponding Tx, Ty, and Rz coordinates. In this space,we define a grouping grid with a cell size of 3 alongthe Tx and Ty axes and 12 along the Rz axis.Whenever a new polygon match is found, its comple-mentarity value is added into the correspondingelement of the grouping grid.At the end of the match grouping step, the indexes

of the grid element that scored the highest comple-

mentarity value represent the Tx, Ty, and Rz trans-formations to apply to the probe to correctly matchthe highest number of polygon pairs. These Tx, Ty,and Rz values, together with the Tz, Rx, and Ryexplored stepwise as explained below, define a solu-tion complex, associated with the complementarityvalue stored into its grouping matrix element.

(b-5) Z Translations

All the polygons of the probe are translated by21.5 along the Z axis, thus shifting the pairs ofpolygons that are matched in the subsequent poly-gon matching phase. This corresponds to a stepwiseexploration of the Tz degree of freedom, where DTz51.5 . This step is executed until the probe toppolygon reaches the level of the target bottom poly-gon. All the probe polygons are therefore comparedwith the target protein polygons.

(b-6) X-Y rotations

In this step, the probe is rotated around the X andY axis to analyze all the possible orientations withrespect to the target.Rx and Ry are explored stepwise: 2180 # Rx ,

180, 290 # Ry , 90, with DRx 5 DRy 5 DR. Thisis equivalent to exerting a complete search in thespace of all possible target-probe relative orienta-tions, by rotating one of the two peptides by DR stepsaround the X and Y axis. In the docking experimentsdescribed, DR 5 10. This value can be changed,when necessary, according to the probe dimensions.The complete exploration of the rotation space aroundthe Z axis (Rz) is exerted continuously within thepolygon pairs comparison.At the end of this step, theprobe is described into a new set of polygons.

(b-7) Target orientation

The cylindrical symmetry inherent to the kind ofdescription adopted is very convenient to transforma three-dimensional surface matching problem intoa simplified two-dimensional polygon comparisonbut offers a very poor description of the structuralelement poles.One of the two elements (the probe) is rotated in

all possible orientations and is therefore optimallydescribed at least once. The problem may concernthe target element, which remains in the sameorientation during the whole procedure. One mayovercome this problem describing the target elementin more than one set of parallel polygons and repeat-ing the comparison algorithm on the minimal num-ber of polygon sets which one considers necessary.Cases studied to address this problem indicate that,in general, two orthogonal sets of polygons aresufficient to detect a region of geometric complemen-tarity in at least one of the target orientations tested(data not shown).

Fig. 3. a: The target and probe structural elements aredescribed as Tn and Pn polygon sets, respectively, and aretranslated in the starting position along the Z axis. The probebottom polygon (Pn) is aligned with the top target polygon (T1). b:The three values that define one polygon match between Pn andT1 are represented: the two X and Y translations (Tx and Ty) andthe rotation around the Z axis (Rz). c: The vertices of thesuperposed polygon sides are numbered with 0, and distancesbetween the corresponding vertices on the two polygons aredepicted with dotted lines. The complementarity value is thenumber of consecutive sides whose corresponding vertices aresituated at a distance lower than 1.6 .


In all of the test cases discussed below, the targethas been described only once in an optimal orienta-tion (with the interaction site parallel to the Z axis).We also verified that the results of the procedure areessentially independent of the orientation of thetarget in the reference system (a target rotationaround the X or Y axis of up to 45 does not affect theresults).

BUMPSModule

SHAPES module evaluates complementarity onthe basis of distances, which are intrinsically posi-tive numbers. Consequently, it cannot distinguishbetween polygon perimeters that diverge or interpen-etrate following or preceding a complementary re-gion. Furthermore, a docking solution with a goodlocal match may be accompanied by a steric clashin another part of the proposed complex. The solu-tion list of the geometric module may, therefore,include sterically impossible structures, whose com-ponents share regions of shape complementarity butalso overlap somewhere else.The BUMPS module analyzes the distance be-

tween each atom of the target structural elementand each atom of the probe in the complexes pro-posed by SHAPES and evaluates a bumps value.

BUMPS 5 oa

obBab

where

Bab 5 50 if Rab . Ra 1 Rb

Ra 1 Rb 2 Rab if Rab # Ra 1 Rb

Ra and Rb are the Van der Waals radii of the atoms ofthe target and of the probe, and Rab is the distancebetween the atoms.A complex proposed by the SHAPES module is

discarded if the bumps value exceeds a certainthreshold. The threshold may vary for differentdocking tests: it should be set high when dockingmodels of proteins that have been crystallized sepa-rately, because conformational changes may takeplace on binding. On the other hand, the thresholdcould be decreased when docking models derivedfrom cocrystallized structures. The threshold waskept uniformly at 50 for all docking experiments(high stringency), except for those with the commonmodel structures (110 , low stringency). In theanalysis of the molecular collisions within models ofcocrystallized complexes, one should expect in prin-ciple a low or very low bump value. In our procedure,however, we have always taken into account theintrinsic resolution of the SHAPES module (seebelow).

CHARGESModule

CHARGES provides an estimate of the electro-static complementarity of the solution complexesproduced by SHAPES.Each atom of the two structural elements is as-

signed to one among six different atom types: posi-tively charged (Nh1/Nh2(Arg), Nz(Lys)), negativelycharged (Od1/Od2(Asp), Oe1/Oe2(Glu)), hydrogen-bond donor (backbone N, Ne(Arg), Nd2(Asn),Ne2(Gln)), hydrogen-bond acceptor (backbone O,Od1(Asn), Oe1(Gln)), hydrogen-bond donor or accep-tor (Og(Ser), Og1(Thr), Oh(Tyr), Nd1/Ne2(His)), apo-lar (everything else).The distance between each atom of the target

structural element and each atom of the probestructural element is measured. A charge value iscalculated for atom pairs whose distance is lowerthan a fixed cutoff (Table I).

CHARGES 5 oa

obCab

where

Cab 5 50 if Rab . Rcutoff

AKaKb if Rab # Rcutoff

and

Rcutoff 5 54.0 if both target and probe

atoms are apolar3.4 otherwise

Rab is the distance between atoms a and b; A is amatrix containing the values reported in Table I.Each complex proposed by the module SHAPES is

then discarded if the charge value is below a certainthreshold. As already discussed for BUMPS,CHARGES threshold can be varied for differentdocking experiments: it is set lower (2110, low

TABLE I. ChargeValues inDifferentAtomTypes Interactions

Atom type (1) (2) (Hb-d) (Hb-a) (Hb-da) (0)

Positivelycharged (1) 24 14 23 13 13 22

Negativelycharged (2) 24 13 23 13 22

H-bond donor(Hb-d) 22 12 12 21

H-bond acceptor(Hb-a) 22 12 21

H-bond donor/acceptor(Hb-da)

12 21

Apolar (0) 11


stringency) when docking common models, andhigher (250, high stringency) when docking all othertypes of structures.The distances considered and the values assigned

to the different potential electrostatic interactions donot mean to mirror real atomic interactions: thevalues shown in Table I and the cutoff consideredwere determined empirically by running theCHARGES module on a small database of proteincomplexes (trypsin/trypsin inhibitor, 3tpi; antibody/lysozyme, 2hfl; triose-phosphate isomerase dimer,7tim) and on approximately 40 three-dimensionalmodels of mutagenized Rop proteins. They have beenset to allow for the correct prediction (in 96% of thecases) of the mutant dimerization phenotype. Atomclassification assigned in Table I is similar to oth-ers.6,13

Output of the ESCHER Procedure

ESCHER execution time varies with the size of theinput structures: on a Silicon Graphics with anR8000 75MH processor, it may vary from a fewminutes for two 30 amino acid long a-helices up to afew hours for large domains (more than 300 aminoacids each).The ESCHER procedure output is a list of solu-

tions, each identified by the translations and rota-tions to be applied to the probe with respect to thetarget. In all of the docking experiments described,the number of listed solutions has been set to 500.This number has been increased to 5000 for thedocking of unbound proteins, with a modest increasein calculation time.Each solution is associated with the output scores

of the three ESCHER modules described: a geomet-ric complementarity score coming from SHAPES, amolecular collisions score from BUMPS, and anevaluation of electrostatic complementarity fromCHARGES. The solutions with acceptable BUMPSand CHARGES values are retained and rankedaccording to their SHAPES score.When SHAPES identifies an extended region of

good complementarity, it usually generates a num-ber of solutions in the same region. Therefore, all thesolutions generated are grouped together on thebasis of similarity; this is done by taking the solutionwith the highest geometric complementarity valueand defining all other solutions having an rmsdeviation , 2.5 (5.0 for low-stringency condi-tions) from this structure as belonging to one cluster.The grouping continues with the best solution fromthose remaining and forms the next cluster by usingthe same procedure. Within each group, solutionsare ranked according to their CHARGES value.The resolution of the ESCHER procedure depends

on two major factors: the size of the grouping matrixelement (which affects the Tx, Ty, and Rz coordi-nates) and the stepwise exploration of the Z transla-tion and of the X and Y rotations. The T and R

coordinates identifying the solutions proposed should,therefore, be considered more properly as T 6 DTand R 6 DR. The expected error is different accord-ing to the coordinate considered: DTz , 0.75 ; DRx 5DRy , 5; DRz varies from 0 to 6, DTx and DTy varyfrom 0 to 1.5 .In some cases, when docking secondary structure

elements, a loose distance constraint was applied todiscard solutions not compatible with the primarysequence of the analyzed protein. In the experimentwith the two Rop helices, we defined a 14.5 distance threshold between the a-carbons of resi-dues 28 and 32; this threshold corresponds to thedistance between the a-carbon of the N-terminal andC-terminal residues of a 5-residue long peptide in anextended conformation.

Construction of the Common, Similar,and Backbone Structural Models

To simulate the experimental cases in which onlyinformation on the secondary structure is known, weused three types of structural models in which theamino acid side-chain or backbone atoms positionare different from those found in the crystal struc-ture: common, similar, and backbone.Commonmodel structures (Fig. 4a) were obtained

from the crystallized element coordinates by substi-tuting the conformation of each amino acid sidechain with the most common rotamer in a databaseof amino acid rotamers24 (on average, 4.5 rotamers/amino acid). Common models can therefore be builton the backbone atom coordinates without any infor-mation on side-chain conformation. They display anaverage rms , 0.80 with respect to the originalcrystal structures.Similar model structures (Fig. 4b) have been

obtained by replacing each side chain with the mostsimilar rotamer in the rotamer database. Thesemodels can be built only when the original crystalstructure is known. The similar model structuresdisplay an average rms , 0.30 with respect to thecorresponding crystal structure.The backbonemodel structures (Fig. 4c) were built

by substituting Rop helices backbone with geometri-cally regular a-helices (f 5 265, c 5 240,v 5 180), by using the secondary structure option inthe Biopolymer module of insightII.25 Their rmsdeviation from the starting structures is,0.55 . Noenergy refinement was applied to the model struc-tures obtained.

Structural Elements Preparation

All of the target and probe structural elementsdocked by ESCHER in this work were obtained bysplitting single-protein structures into two parts.The splitting procedure consisted of three steps. Inthe first step, each amino acid was defined as belong-ing to either the target or the probe structuralelements. In the second, the protein main chain was


cut at the peptide bond between target and probeamino acids, generating a number of peptide frag-ments. In the last step, the target and probe struc-tural elements were defined as the merged ensembleof their fragments. When the protein had to be splitinto two domains, the target and probe were definedwith automated criteria,2629 when the protein had tobe split into secondary structure elements, the sepa-ration was performed according to secondary struc-ture assignment (see Table 4).The smallest structural element was designed as

the probe. The starting orientation of the target hasalways been chosen so that the interface was approxi-mately parallel to the Z axis. In each docking experi-ment, the two parts were positioned in the worstpossible relative orientation: the probe was rotated5 around the X axis and 5 around the Y axis awayfrom the crystallographically defined orientation(SHAPES explores the rotation space around the Xand Y axis by 10 steps).

Protein Structures Analyzed

Twenty-three proteins of known structure wereused for our docking experiments from the ProteinData Bank (PDB).30 For test runs with proteindomains, we used the following: porcine pancreaticelastase (3est); papain (9pap); penicillopepsin (1ppl);papain (1ppn); bovine liver rhodanese (1rhd); thermo-lysin (4tln); bovine ribonuclease A (1rnd); yeastguanylate kinase (1gky); bovine gamma a-crystallin(2gcr); a cytochrome C2 (2c2c); a bacterial holo-D-glyceraldehyde-3-phosphate dehydrogenase (1gd1); abacterial trypsin (1sgt); rat mast cell protease II(3rp2); a bacterial dihydrofolate reductase (3dfr);alpha-lactalbumin (1alc); a protozoan calmodulin(1clm); yeast phosphoglycerate kinase (3pgk); a bo-vine prothrombin fragment (2pf2). These moleculeswere selected for our tests because the definition oftheir domains were derived by means of automated

procedures and were present in the literature.2629

For test runs with unbound components, we choseone protease/inhibitor complex (cocrystallized struc-ture: 1cho, corresponding structures crystallizedseparately: 5cha/2ovo); and two antibody/antigencomplexes: 1) the antibody moiety of the 3hfm struc-ture versus the lysozyme unbound structure 6lyz; 2)the antibody moiety of the 2hfl structure versus 6lyz.In these two last cases, solutions were comparedwith the 3hfmy and 2hfly molecules. For test runswith secondary and supersecondary structure ele-ments, we used the Rop protein (PDB code: 1rop31);the light chain of Hy-Hel10 (3hfm32) and lysozyme(6lyz33).

RESULTS

In an initial series of experiments (data not shown),ESCHER has been successfully applied to the recon-struction of protein complexes starting from thecrystal structure of their separated components:trypsin/trypsin inhibitor (3tpi); antibody/lysozyme(2hfl); Rop and triose-phosphate isomerase dimersfrom their separated monomers (1rop and 7tim).

Docking of Protein Domains

We have applied ESCHER to the reconstruction oftwo-domain proteins starting from their separateddomains. The proteins chosen represent a good data-base for the study of protein domain interaction:some of them have an extended buried surface and alarge number of domain contacts, whereas othersshow only a limited number of interdomain contactsand a very small buried surface (Table II).The target and probe domains were prepared as

described in Methods, starting from the depositedPDB structure and following automatic criteria2629

for the target and probe definition.In Table II, the results obtained with protein

domains are shown: for each domain pair, the num-

Fig. 4. Rop helix 1 (a1a30) crystallographic structure is shown in bold. It is superposed to thecommon model structure (a), to the similar model structure (b), and to the backbone model (c).


ber of proposed solutions is reported together withthe ranking of the correct solution and its all-atomroot-mean-square deviation (rmsd) from the crystal-lographic structure. In 9 of 18 docking runs, thecorrect solution was found and ranked first; in twocases, the correct solution ranked among the firstfew solutions; in the remaining seven cases, ESCHERwas not able to find the correct solution within thefirst 500 solutions proposed by the SHAPES module.In all but one of these seven cases (the bacterialdihydrofolate reductase, 3dfr), the number of atomiccontacts among the two domains is generally low. Itis interesting to note that 1) the number of solutionsproposed by the SHAPES module can be increasedup to many thousands (if necessary), with a modestincrease in calculation time and 2) that, on the otherhand, the number of solutions that escape theBUMPS and CHARGES filters is generally very low:in case of a really blind test, this would allow furtherinvestigation of few solutions by more sophisticatedmethods.There is no evident correlation between ESCHER

efficiency and buried surface area in the domain-domain interface. A strong correlation could be de-tected between ESCHER efficiency and a value f,defined as the product of the ratios between theburied and the whole surface of each domain:

f 5b1 b2

tot1 tot2

where b1 and b2 are the two domain buried surfacesand tot1 and tot2 are the two domains total surfaces.This value has no physical significance but can beused to predict ESCHER ability in rebuilding acomplex of known structure.In the test cases that we analyzed, when f . 19 3

1023, ESCHER ranked the correct solution amongthe first few. For lower f values, the correct solutionwas not found among the first 500 solutions proposedby SHAPES.

Docking of Protein Complexes Starting FromUnbound Components

We chose three cases: a protease-protease inhibi-tor and two antibody-antigen complexes. The resultsof the docking procedure were evaluated by superim-posing the docked (originally unbound) moleculesonto their crystal structures in the PDB complexes.These complexes (Table III, second column) wereused as reference systems to calculate the rmsd fromthe Ca of the proposed solutions.The protease-protease inhibitor complex was run

by using the entire protein surfaces, and the proce-dure proposed only three solutions. A solution withan rmsd of 1.9 with respect to the defined referencesystem scored third.In the immunoglobulin-lysozyme docking experi-

ment, the entire antigenwas docked onto the comple-mentary determining region (CDR) of the immuno-globulin. The 3hfm experiment ended with only two

TABLE II. ProteinDomainsResults

PDBcode Target Probe f 3 1023

Buriedsurface(2)

Rank (rms*) ofthe first correct

solution

Rank (rms) ofthe bestsolution

Numberof solutionsproposed

1clm 588 89147 1.1 263 83pgk 189404 1188, 405415 1.1 639 101gky 132, 82186 3381 3.3 468 12pf2 63145 162 4.2 457 72gcr 181 89169 5.3 532 33dfr 133, 109162 34108 12.1 932 11rnd 149, 80103 5079, 104124 14.5 726 8 (3.3) 191ppl 1192, 304323 193303 19.1 1603 1 (2.4) 1 (2.4) 11rhd 1160 161293 20.4 2001 1 (1.5) 1 (1.5) 12c2c 6395 96112 20.8 861 7 (2.0) 7 (2.0) 184tln 1151 152316 21.6 1807 1 (1.6) 1 (1.6) 31alc 38104 137, 105122 21.7 972 1 (1.8) 1 (1.5) 311gd1 1148 149332 22.0 2186 1 (2.0) 2 (1.5) 119pap 116, 113208 17112, 209212 23.7 1387 2 (2.0) 2 (2.0) 61ppn 118, 111212 19110 28.0 1567 1 (2.1) 2 (1.5) 41sgt 1628, 6980,

1212342968, 81120 31.2 1583 1 (1.5) 1 (1.5) 9

3est 30121, 234245 1629, 121233 38.9 2186 1 (1.7) 1 (1.7) 43rp2 127, 123216,

22123028122, 231243 47.6 2382 1 (1.6) 3 (1.4) 5

*All atoms rms is calculated by superposing the target domain of the docking solution complex onto the target domain of the proteincrystal structure.Rank and rmsd of the first solution having an rmsd below 2.5 .Identifies solutions not presenting the best CHARGES value of their cluster (see Methods).


proposed solutions, the second one displaying an rmsof 1.0 with respect to its reference in the crystalcomplex. No docking solution was found for the 2hflcomplex. When restricting the search area on theantigen to approximately half of the lysozyme mol-ecule, including the epitope, the procedure endedwith 11 solutions: the eleventh solution shows 1.12 rms from the lysozyme crystal structure in thecomplex. This result shows that the correct solutionranks lower than 5000th in the SHAPES scoring list,when the entire lysozyme molecule is used as probeprotein; however, in this case, ESCHER proposed nosolution. When restricting the search area on theprobe protein, the correct solution was in theSHAPES scoring list and could be successfully se-lected from the BUMPS and CHARGESmodules.ESCHER results with unbound components from

protein-protein complexes can be compared withothers.8,10 In the protease-inhibitor complex, Fischeret al. obtained 8490 solutions: the best solution (alsothe first correct solution) ranked 214th with an rmsof 1.32 . In the same case, the PUZZLE procedure8

proposed only four solutions: the best solution rankedsecond with an rms of 7.3 (4.9 when consideringonly the contacting region).In the antibody-antigen complex, Fischer et al.

obtained 8957 solutions: the best solution ranked2436th with 1.28 rms from the crystallographicposition; the best ranking correct solution ranked1497th with rms 5 1.97 . Using the PUZZLEprocedure, we had only three solutions: the correct

solution scored first with an rms of 5.4 (4.2 whenconsidering only the contacting region).The described results show that generally a low

rms can be obtained in a long list of possible solu-tions and that the choice of selecting only a fewsolutions, including the correct one, is usually paidin terms of higher rmsd. Both procedures, the geom-etry-based suite10 and PUZZLE,8 generally requirefour to five times less cpu time than ESCHER.

Docking of Secondary and SupersecondaryStructural Elements

The experiments described in the previous sectionshow that 1) ESCHER can reconstruct whole pro-teins starting from their separated crystal domainswhen the interdomain contact area is sufficientlylarge with respect to the total surface area and 2)ESCHER is able to dock unbound protein compo-nents whose structures display differences with re-spect to their conformation in the crystallographiccomplex.Next, we explored the feasibility of docking second-

ary and supersecondary structure elements, wherethe f value is generally higher than 19 3 1023

(Table IV).We chose three docking cases, in which the dock-

ing partners were as follows: 1) a secondary struc-ture element separated from its structural environ-ment and the remaining part of the protein. Weselected alpha helices extracted either from Rop, afour-helix bundle protein (1a/3a), or from the alphaand beta protein lysozyme (1a/ab); 2) two supersec-ondary structures forming a protein domain: thebeta sheets components of an immunoglobulin vari-able domain (1b/1b); and 3) two secondary structureelements: the two helices forming the Rop monomer(1a/1a).We performed docking experiments starting either

from the crystal structures or from approximatemolecular models. Three different types of models(Fig. 4) were utilized as described in Methods: 1)similar models are derived from crystal structuresby substituting each amino acid rotamer with themost similar rotamer from a small database of aminoacid rotamers.24 Their rmsd from the crystal struc-tures is approximately 0.30 . 2) In backbone mod-els, backbone dihedral angles are approximated to

TABLE III. UnboundProteinsResults

Referencefor complex

pdbstructure

Rank(rms*) ofthe firstcorrectsolution

Rank(rms) ofthe bestsolution

Number ofsolutions

2ovo/5cha 1cho 3 (1.9) 3 (1.9) 32hfl/6lyz 2hfl 03hfm/6lyz 3hfm 2 (1.0) 2 (1.0) 22hfl/6lyz 2hfl 11 (1.12) 11 (1.12) 113hfm/6lyz 3hfm 1 (1.0) 1 (1.0) 1

*The rmsd of the unbound molecules with their correspondingreference molecules is obtained after superposing the Ca atomsof the unbound ligand (receptor) on those of the complexedreference ligand (receptor).Antibody CDRs versus entire lysozyme molecule. We selecteda subset of the antibody solvent accessible surface, whosedistance from the antigen is less than 6 in the cocrystallizedstructure; regions of geometric complementarity were evalu-ated if sharing at least 1 2 with the defined subset.Antibody CDRs versus the lysozyme antibody binding site. Weselected a subset of the lysozyme solvent accessible surface,whose distance from the antibody is less than 6 in thecocrystallized structure; complementarity has been evaluatedfor regions sharing at least 1 2 with this defined subset. In thiscase, approximately half of the antigen molecule has beenconsidered in the docking test.Rank and rmsd of the first solution having an rmsd below2.5 .

TABLE IV. Structural ElementsDefinition

PDB code Target Probe f 3 1023

1a/3a 1ROP a1a30,b1b56

a31a56 28

1a/ab 6LYZ 186,103129

87102 18

1a/1a 3HFM L1L27,L60L77

L28L59,L78L108

30

1a/1a 1ROP a1a30 a31a56 20


geometrically regular alpha-helical angles(rms , 0.55 from the crystal structure in thedescribed cases). 3) In common models, amino acidrotamers are substituted with the most commonrotamer from the rotamer database (rms , 0.80 from the crystal structure). Each pair of these crystalor model structures has been used in an ESCHERdocking simulation. Target and probe were preparedas described above (for target and probe definition,see Table IV). Results are reported in Tables V andVI.When the crystallographic structures are used,

the correct solution always scores first. When thesimilar model structures are used, the correct solu-tion scores first in three of the four tested cases. The1a/1a correct solution scored second in a total of 18solutions proposed.

Backbonemodels were built only for the reconstruc-tion of the Rop four helix bundle: the correct solutionscored first in both the 1a/1a and 1a/3a dockingexperiments.Common structural models have been subjected to

the BUMPS and CHARGES modules under twodifferent conditions, described as low and high strin-gency in Methods. A larger number of molecularcollisions and a worse electrostatic complementaritywere tolerated in the low-stringency condition, whereashigh-stringency conditions corresponded to the onesused in the crystal, similar, and backbone experiments.When high-stringency conditions are used (Table

V), ESCHER produces a low rms solution in the firstranking position in the 1a/ab and 1b/1b test cases.In the other two cases tested, no correct solution wasobtained. When using low-stringency conditions(Table VI), the correct solution scores among the firstones in three of the four tested cases. In low-stringency conditions, a solution is considered cor-rect when its backbone rmsd from the crystal posi-tion is lower than 5 .Analysis of the solutions obtained with low-

stringency conditions reveals that, in the 1a/1a andin the 1b/1b cases, commonmodel structures displaya substantially lower degree of electrostatic comple-mentarity compared with the corresponding crystalstructure; in the 1a/3a case, both molecular clashesand lowering of electrostatic complementarity con-tribute to the lack of low rms solutions in theproposed solutions.We report the results obtained under both condi-

tions for discussion.Finally, a control test was performed in which a 30

amino acid long Rop helix is docked onto the lyso-zyme domain 2, which normally shelters a 15 aminoacid helix. ESCHER geometric module predicted theformation of a Rop/lysozyme mosaic protein, with agood geometric complementarity. However, the elec-trostatic module gave a very poor score of the electro-static complementarity between the two proteinportions and discarded it.In all the docking runs described, the target and

the probe were treated as if they were independentmolecules; nevertheless, distance constraints couldbe applied, thus discarding physically impossiblesolutions. In the 1a/3a and 1a/1a cases, when usingsimilar or common model structures, a number ofproposed solutions was antiparallel to the correctones. They were, therefore, discarded because aminoacids close in the primary sequence were too far inthe proposed solution.

DISCUSSION

ESCHER is a new rigid body docking algorithmthat was developed after PUZZLE.8 ESCHER differsfrom PUZZLE in the simplified description of theprotein surface as well as in the matching andscoring criteria. Both procedures rely on a geometric

TABLEV. Secondary StructureResults(High Stringency*)

Modelstructure

Rank (rms)of the

first correctsolution

Rank (rms)of the bestsolution

Numberof solutions

1a/3a crystal 1 (1.4) 1 (1.3) 81a/ab crystal 1 (1.2) 1 (1.0) 181b/1b crystal 1 (1.8) 1 (1.4) 91a/1a crystal 1 (2.3) 5 (1.5) 241a/3a similar 1 (1.4) 1 (1.4) 21a/ab similar 1 (1.1) 1 (0.9) 111b/1b similar 1 (1.4) 1 (1.3) 101a/1a similar 2 (1.8) 2 (1.7) 181a/3a backbone 1 (1.7) 1 (1.3) 81a/1a backbone 1 (2.1) 1 (1.6) 231a/3a common 1 (4.1) 21a/ab common 1 (1.0) 1 (1.0) 131b/1b common 2 (2.3) 1 (1.5) 71a/1a common 2 (8.1) 10

*High-stringency conditions: highly selective BUMPS andCHARGES thresholds (50 and 250, respectively).Rank and rmsd of the first solution having an rmsd below 2.5.Solutions not presenting the best CHARGES value of theirgroup.

TABLEVI. Secondary StructureResults(LowStringency*)

Rank (rms)of the best

ranking correctsolution

Rank (rms)of the bestsolution

1a/3a common 2 (3.8) 77 (2.4)1a/ab common 1 (3.0) 4 (1.0)1b/1b common 1 (3.1) 7 (1.5)1a/1a common 84 (4.0) 84 (4.0)

*Low-stringency conditions: nonselective BUMPS andCHARGES thresholds (110 and 2110, respectively); a solutionis considered correct when its backbone rmsd from the crystalposition is lower than 5 ; solutions are grouped, as describedin Methods, whenever their difference is lower than 5 .Rank and rmsd of the first solution having an rmsd below 5 .


approach, but ESCHER has been endowed with amolecular clashes (BUMPS) and an electrostatic(CHARGES) module, which are used as a selectivefilter on the geometrically favorable solutions. TheCHARGES electrostatic module allows a lower reso-lution definition of the protein surfaces and lessstringent criteria of geometric complementarity. Typi-cally, when docking unbound components, thou-sands of solutions are proposed by the geometricmodule, but only a few, including the correct one,survive screening with the electrostatic module. Thisillustrates the importance of adding this module tothe procedure.The application of ESCHER to a database of

two-domain proteins indicates that its performancecorrelates with the ratio between the buried surfacearea upon domain-domain complex formation andthe total surface (f). When starting from the crystallo-graphic coordinates of the target and probe domainswith f greater than 193 1023 (see Table II), the correctdocking solution always scores among the first few.ESCHER has been applied to the unbound compo-

nents of a small database of protein-protein com-plexes. The low resolution description of the proteinsurface permits recognition of correct protein inter-faces even when side-chainmovements upon bindingsubstantially alter the geometric and electrostaticcomplementarity of the interface.In comparison with other published docking proce-

dures,8,9,10,34 ESCHER generally requires longer com-putational times but can propose low rms solutionsamong a very restricted number of possible choices.This allows one to rank the small number of possiblesolutions by applying more sophisticated computa-tional methods.We have also explored the feasibility of applying

ESCHER to the reassociation of secondary and super-secondary structure elements within a protein struc-ture. We simulated a helix-helix and a helix-threehelices interaction within a four-helix bundle (Rop);a helix and the remaining part of the protein in analpha-betaprotein (lysozyme)andtwobeta-sheetswithina beta protein (an immunoglobulin variable domain).Initially, these docking experiments were per-

formed with secondary structure elements extractedfrom the crystallographic structure, without alteringthe conformation of the side chains and maintaininga rigid alpha-carbon backbone. In this ideal situa-tion, in which the crystallographically determinedcomplementarity is strictly maintained, ESCHERproved to be very effective. In a realistic dockingexperiment, however, the conformation of the sidechains in the complex is not known. Thus, to explorethe versatility of ESCHER and to simulate theassembly of a three-dimensional model from thedefinition of its secondary structure, we applied theESCHER procedure to the docking of secondarystructure elements after changing the conformationeither of the amino acid side chains or of the back-

bone. The results obtained with the similar andbackbonemodels are very satisfactory. With commonmodels (Table V), the results were satisfactory inhalf of the cases tested: changing the rotamer confor-mations to the most common ones found in thedatabase considerably lowered the geometric comple-mentarity of target and probe elements. It was,therefore, necessary to lower the stringency of theBUMPS and CHARGES selective filters (Table VI) toachieve acceptable results in three of the four dock-ing experiments.The performance obtained in the common 1a/1a

case deserves more discussion: this approach doesnot seem to be applicable to elements, such as singlealpha helices or beta strands. Work is in progress tosee whether, in these difficult cases, our procedurecould be supported and complemented by differentmethods, such as the use of backbone-dependentrotamer libraries35,36 or the identification of corre-lated mutations.37

The relatively low resolution of the best solutionsin the experiments performed with common models,however, are more appropriately to be comparedwith resolutions typical of methods that predictstructure from sequence rather than with dockingmethods. This confirms that, especially when consid-ering small docking elements, the side-chain confor-mation is not irrelevant to the definition of thegeometric and electrostatic complementarity. Thesuccess of the experiments with the similar models,however, reassures that, by exploring combinatorially asmall library of side-chains conformations, the probabil-ity of predicting the correct solution is very high.Other applications for the ESCHER procedure can

be envisaged. The structure of a growing number oflarge proteins is approached by experimentally deter-mining the conformation of their separate domains.We propose ESCHER to represent a useful tool toassist in the task of assembling the complete proteinstarting from the crystallographic structures of itsdomains.It is often found that the structure of newly

discovered proteins can be reliably predicted byhomology modeling. Alignment of the two primarysequences, however, may reveal that the protein ofunknown conformation contains peptide insertionswith respect to the protein in the database. When-ever the secondary structure of the inserted peptidecan be predicted, ESCHER can help in docking theorphan peptide onto the core structure determinedby homology modeling.ESCHER has already been exploited as a tool to

predict the formation of protein-protein complexes.In one application, starting from the coordinates ofthe crystallized proteins, we have predicted theconformation of the complex between the TP7 anti-body and its cognate antigen, Thermus aquaticusDNApolymerase.38


Furthermore, ESCHER has been used as a tool todesign and evaluate Myc mutants that can ho-modimerize (manuscript in preparation). Work is inprogress to address the possibility of speeding up theprocedure on parallel (or even massively parallel)machines.

ACKNOWLEDGMENTS

The authors thank Dr. A. Tramontano for helpfulsupport, advice, and discussion and Dr. A. Lahm forcritically reading the manuscript. We acknowledgethe expert computer support of M. Pasquariello andM. Pizzelli at Centro di Calcolo e Documentazione ofthe University of Rome Tor Vergata. This work hasbeen supported by the Supercomputing Resource forMolecular Biology, European Union Human Capitaland Mobility Programme, Access to Large ScaleFacilities grant, Contract ERBCHGECT940062, andby the CNR programme Development of innovativealgorithms for dynamic and static simulation ofmolecular recognition.

REFERENCES

1. Cherfils, J., Janin, J. Protein docking algorythms: Simulat-ing molecular recognition. Curr. Opin. Struct. Biol. 3:265269, 1993.

2. Strynadka, N.C., Eisenstein, M., Katchalski-Katzir, E.,Shoichet, B.K., Kuntz, I.D., Abagyan, R., Totrov, M., Janin,J., Cherfils, J., Zimmerman, F., Olson, A., Duncan, B., Rao,M., Jackson, R., Sternberg, M., James, M.N. Moleculardocking programs successfully predict the binding of abeta-lactamase inhibitory protein to TEM-1 beta-lacta-mase. Nat. Struct. Biol. 3:233239, 1996.

3. Shoichet, B., Kuntz, I.D. Protein docking and complementa-rity. J. Mol. Biol. 221:327346, 1991.

4. Bacon, D.J., Moult, J. Docking by least-squares fitting ofmolecular surface patterns. J. Mol. Biol. 225:849858,1992.

5. Lenhof, H.P. An algorithm for the protein docking problem.In: Bioinformatics: From Nucleic Acids and Proteins toCell Metabolism. Schomburg, D., Lessel, U. (eds.). GBF.Weinheim, 1995.

6. Jiang, F., Kim, S.H. Soft docking: Matching of molecularsurface cubes. J. Mol. Biol. 219:79102, 1991.

7. Walls, P.H., Sternberg, J.E. New algorythm to modelprotein-protein recognition based on surface complementa-rity. J. Mol. Biol. 228:277297, 1992.

8. Helmer-Citterich, M., Tramontano, A. PUZZLE: A newmethod for automated protein docking based on surfaceshape complementarity. J. Mol. Biol. 235:10211031, 1994.

9. Katchalski-Katzir, E., Shariv, I., Eisenstein, M., Friesem,A.A., Aflalo, C., Vakser, I.A. Molecular surface recognition:Determination of geometric fit between proteins and theirligands by correlation techniques. Proc. Natl. Acad. Sci.USA89:21952199, 1992.

10. Fischer, D., Lin, S.L., Wolfson, H.L., Nussinov, R. A geom-etry-based suite of molecular docking processes. J. Mol.Biol. 248:459477, 1995.

11. Cherfils, J., Duquerroy, S., Janin, J. Protein-protein recog-nition analyzed by docking simulation. Proteins 11:271280, 1991.

12. Jackson, R.M., Sternberg, M.J.E. A continuum model forprotein-protein interactions: Application to the dockingproblem. J. Mol. Biol. 250:258275, 1995.

13. Sobolev, V., Wade, R.C., Vriend, G., Edelman, M. Moleculardocking using surface complementarity. Proteins 25:120129, 1996.

14. Rosenfeld, R., Vajda, S., De Lisi, C. Flexible docking anddesign. Annu. Rev. Biophys. Biomol. Struct. 24:677700,1995.

15. Rarey, M., Kramer, B., Lengauer, T., Klebe, G. A fastflexible docking method using an incremental constructionalgorithm. J. Mol. Biol. 261:470486, 1996.

16. Sippl, M.J., Hendlich, M., Lackner, P. Assembly of polypep-tide and protein backbone conformations from low energyensembles of short fragments. Development of strategiesand construction of models for myoglobin, lysozyme andthymosin b-4. Protein Sci. 1:625640, 1992.

17. Monge, A., Friesner, R.A., Honig, B. An algorithm togenerate low-resolution protein tertiary structures fromknowledge of secondary structure. Proc. Natl. Acad. Sci.USA91:50275029, 1994.

18. Monge, A., Lathrop, E.J.P., Gunn, J.R., Shenkin, P.S.,Friesner, R.A. Computer modeling of protein folding: Con-formational and energetic analysis of reduced and detailedprotein models. J. Mol. Biol. 247:9951012, 1995.

19. Cohen, F.E., Richmond, T.J., Richards, F.M. Protein fold-ing: Evaluation of some simple rules for the assembly ofhelices into tertiary structures with myoglobin as anexample. J. Mol. Biol. 132:275288, 1979.

20. Cohen, F.E., Kuntz, I.D. Prediction of the three-dimen-sional structure of human growth hormone. Proteins 1:162166, 1987.

21. Callaway, D.J.E. Solvent-induced organization: a physicalmodel of folding myoglobin. Proteins 20:124138, 1994.

22. Connolly, M.L. Solvent-accessible surface of proteins andnucleic acids. Science 221:709713, 1983.

23. Connolly, M.L. Analytical molecular surface calculation. J.Appl. Crystallogr. 16:548558, 1983.

24. Ponder, J.W., Richards, F.M. Tertiary templates for pro-teins. Use of packing criteria in the enumeration of allowedsequences for different structural classes. J. Mol. Biol.193:775791, 1987.

25. Dayringer, H.E., Tramontano, A., Sprang, S.R., Fletterick,R.J. Interactive program for visualization and modelling ofproteins, nucleic acids and small molecules. J. Mol. Graph-ics 4:8287, 1986.

26. Holm, L., Sander, C. Parser for protein folding units.Proteins 19:256268, 1994.

27. Siddiqui, A.S., Barton, G.J. Continuous and discontinuousdomains:An algorithm for the automatic generation of reliableprotein domain definitions. Protein Sci. 4:872884, 1995.

28. Sowdhamini, R., Blundell, T.L.An automaticmethod involv-ing cluster analysis of secondary structures for the identifi-cation of domains in proteins. Protein Sci. 4:506520, 1995.

29. Swindells, M.B. A procedure for detecting structural do-mains in proteins. Protein Sci. 4:103112, 1995.

30. Bernstein, F.C., Koetzle, T.F., Williams, G.J.B., Meyer, E.F.,Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T.,Tasumi, M. The Protein Data Bank: A computer-basedarchival file for macromolecular structures. J. Mol. Biol.112:535542, 1977.

31. Banner, D.W., Kokkinidis, M., Tsernoglou, D. Structure ofthe ColE1 Rop protein at 1.7 resolution. J. Mol. Biol.196:657675, 1987.

32. Padlan, E.A., Silverton, E.W., Sheriff, S., Cohen, G.H.,Smith-Gill, S.J., Davies, D.R. Structure of an antibody-antigen complex. Crystal structure of the hy/hel10 fab-lysozyme complex. Proc. Natl. Acad. Sci. USA 86:59385942,1989.

33. Diamond, R. Real-space refinement of the structure of henegg-white lysozyme. J. Mol. Biol. 82:371391, 1974.

34. Vakser, I.A. Protein docking for low-resolution structures.Protein Eng. 8:371377, 1995.

35. Dunbrack, R.L., Karplus, M. Backbone-dependent rotamerlibrary for proteins. Application to side-chain prediction. J.Mol. Biol. 230:543574, 1993.

36. Dunbrack, R.L., Karplus, M. Conformational analysis ofthe backbone-dependent rotamer preferences of proteinsidechains. Nat. Struct. Biol. 1:334340, 1994.

37. Gobel, U., Sander, C., Scheneider, R., Valencia, A. Corre-lated mutations and contact in proteins. Proteins 18:309317, 1994.

38. Murali, R., Helmer-Citterich, M., Sharkey, D.J., Scalice, E.,Daiss, J., Murthy, K. Structural Studies on a InhibitoryAntibody Against Thermus aquaticus DNA PolymeraseSuggest Mode of Inhibition (submitted).


1997 ESCHER a New Docking Procedure Applied to the Reconstruction of Protein Tertiary Structure

Documents

Transcript of 1997 ESCHER a New Docking Procedure Applied to the Reconstruction of Protein Tertiary Structure