Protein–protein docking in CAPRI using ATTRACT to account for global and local flexibility


proteins: STRUCTURE • FUNCTION • BIOINFORMATICS

Andreas May and Martin Zacharias*

School of Engineering and Science, Jacobs University Bremen, D-28759 Bremen, Germany

ABSTRACT

A reduced protein model combined with a systematic docking approach has been employed to predict protein–protein complex structures in CAPRI rounds 6–11. The docking approach, termed ATTRACT, is based on energy minimization in the translational and rotational degrees of freedom of one protein with respect to the second protein, starting from many thousands of initial protein partner placements. It also allows for approximate inclusion of global flexibility of the protein partners during systematic docking by conformational relaxation of the partner proteins in precalculated soft collective backbone degrees of freedom. We have submitted models for six targets, achieved acceptable docking solutions for two targets, and predicted >20% correct contacts for five targets. Possible improvements of the docking approach, in particular at the scoring and refinement steps, are discussed.

Proteins 2007; 69:774–780. © 2007 Wiley-Liss, Inc.

Key words: protein–protein interaction; induced fit; anisotropic network model; docking minimization; protein–protein complex formation.

The authors state no conflict of interest.

Grant sponsor: Deutsche Forschungsgemeinschaft (DFG).

*Correspondence to: Martin Zacharias, School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, D-28759 Bremen, Germany. E-mail: [email protected]

Received 4 June 2007; Revised 19 July 2007; Accepted 20 July 2007

Published online 5 September 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/prot.21735

INTRODUCTION

Knowledge of the structure of protein–protein complexes is of major importance for understanding the biological function of protein–protein interactions. Experimental structure determination of protein complexes, for example by X-ray crystallography, requires purification of large amounts of protein and the ability to crystallize the protein–protein complex, which may not be feasible for all known interacting proteins. The realistic prediction of protein–protein complex structures (protein–protein docking) is therefore of increasing importance. The CAPRI (Critical Assessment of PRedicted Interactions) challenge1–3 offers the opportunity to evaluate and compare different methods and protocols for protein–protein docking. We have developed the protein–protein docking approach ATTRACT,4,5 which is based on a reduced protein model and emphasizes efficient and explicit consideration of conformational flexibility during protein–protein docking. Most other protein–protein docking approaches employ rigid partners in a first docking phase, followed by a flexible refinement step (reviewed in Ref. 6). During docking, the protein partners are represented by several (up to three) pseudo atoms per amino acid residue. Docking calculations take into account not only surface complementarity but also the physico-chemical character of the interacting amino acids. Systematic docking is performed by energy minimization starting from thousands of start configurations. Compared to an atomic resolution representation, the reduced protein model yields a docking energy landscape with fewer local minima and allows much more rapid energy minimization. The docking approach also involves converting the docked complexes back to an atomic resolution representation, followed by fully flexible refinement and re-evaluation.

Recently, we included and tested the possibility of efficiently accounting for global flexibility during systematic docking searches.7–9 This was achieved by extracting soft global degrees of freedom from a normal mode analysis of the protein partners based on an Anisotropic Elastic Network description of the proteins.10–14 Previous studies have found that soft modes from anisotropic network models (ANM) frequently overlap quite well with observed global conformational changes in proteins.13–16 In our docking approach, a subset of the softest modes (~five modes) from an ANM can be used as additional energy minimization variables, allowing conformational relaxation of the protein partners in global flexible degrees of freedom.9 This explicit optimization of global flexible degrees of freedom during docking was achieved at a very modest additional computational cost (it slows down the docking search by approximately a factor of 2–3).9 Although the results of systematic docking studies showed improvement compared to rigid docking, it was also recognized that simultaneous explicit inclusion of global and local (side chain) flexibility might be required to achieve the best possible results.8,9

Within the same time frame as the CAPRI rounds, we gradually developed and implemented the new methods to include global flexibility during docking and applied them to several of the CAPRI targets.

We participated in CAPRI rounds 6–11, and in the following we report our predictions of the complex structures for targets 20, 21, and 24–27.

MATERIALS AND METHODS

The reduced protein model and the ATTRACT docking program have been described in detail in previous publications,4 so only a brief description of the approach and the docking protocol is given in the following.

In a first step, the protein partner coordinates are translated into a reduced protein representation consisting of up to three pseudo atoms per amino acid residue. One pseudo atom represents the protein backbone (located at the Cα position). Small amino acid side chains (Ala, Asp, Asn, Cys, Ile, Leu, Pro, Ser, Thr, Val) are represented by one pseudo atom (the geometric center of the side chain heavy atoms). Larger and more flexible side chains are represented by two pseudo atoms to better account for the shape and dual chemical character of some side chains.4 Effective interactions between pseudo atoms are described by soft, distance-dependent Lennard–Jones (LJ)-type potentials (an A/r^8 − B/r^6 potential). The repulsive and attractive LJ parameters approximately describe the size and physico-chemical character of the side chain chemical groups.
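The pair potential named above can be illustrated with a minimal sketch of a generic A/r^8 − B/r^6 term, its radial derivative (which a gradient-based minimizer needs), and its analytic minimum. Because the repulsive term rises as r^-8 rather than r^-12, the wall is softer than in a standard 12-6 Lennard–Jones potential. The parameter values and function names are hypothetical; the actual ATTRACT pair parameters and any additional terms are given in Ref. 4 and are not reproduced here.

```python
from __future__ import annotations

import math


def pair_energy(r: float, a: float, b: float) -> float:
    """Soft LJ-type pseudo-atom pair energy E(r) = A/r^8 - B/r^6."""
    return a / r**8 - b / r**6


def pair_gradient(r: float, a: float, b: float) -> float:
    """Radial derivative dE/dr = -8A/r^9 + 6B/r^7 (the force is -dE/dr)."""
    return -8.0 * a / r**9 + 6.0 * b / r**7


def energy_minimum(a: float, b: float) -> tuple[float, float]:
    """Setting dE/dr = 0 gives r* = sqrt(4A / 3B) and E(r*) = -B / (4 r*^6)."""
    r_min = math.sqrt(4.0 * a / (3.0 * b))
    return r_min, pair_energy(r_min, a, b)


if __name__ == "__main__":
    # Hypothetical parameters for one pseudo-atom pair type (illustration only).
    A, B = 1.6e5, 1.0e4
    r_star, e_star = energy_minimum(A, B)
    print(f"minimum at r* = {r_star:.2f} A with E(r*) = {e_star:.3f}")
    for r in (3.5, r_star, 6.0, 9.0):
        print(f"r = {r:5.2f}  E = {pair_energy(r, A, B):8.3f}  dE/dr = {pair_gradient(r, A, B):8.4f}")
```

Choosing A and B for each pseudo-atom pair type is what encodes size (position of the minimum) and chemical character (depth of the well).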

Recently, the possibility of accounting approximately but very efficiently for global conformational changes during docking was implemented.8,9 In this case, the protein partner structures can relax (deform) along precalculated soft collective degrees of freedom during the docking search. These soft collective degrees of freedom correspond to eigenvectors (normal modes) of the proteins calculated using an approximate normal-mode analysis method developed by Hinsen (harmonic potential model),12 which is related to Anisotropic Elastic Network models.10–14 The normal modes were calculated with respect to the protein backbone (Cα atoms), and the side chains follow the same global motion as the corresponding Cα atoms.
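To illustrate the idea of soft collective degrees of freedom, the following sketch builds an anisotropic network model (ANM) Hessian for Cα coordinates, extracts the softest non-trivial modes, and deforms the structure along them, moving each side-chain pseudo atom with the displacement of its residue's Cα as described above. Note the swap: this is the textbook cutoff-based ANM with a uniform spring constant, chosen for brevity; it is not Hinsen's distance-weighted harmonic potential model used in ATTRACT. The cutoff, spring constant, toy coordinates, and function names are assumptions.

```python
from __future__ import annotations

import numpy as np


def anm_hessian(ca: np.ndarray, cutoff: float = 15.0, gamma: float = 1.0) -> np.ndarray:
    """3N x 3N Hessian of a cutoff-based ANM (assumption: uniform spring constant gamma).

    ca: (N, 3) array of C-alpha coordinates in Angstrom."""
    n = len(ca)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = ca[j] - ca[i]
            dist2 = float(d @ d)
            if dist2 > cutoff**2:
                continue  # no spring between distant residues
            block = -gamma * np.outer(d, d) / dist2  # off-diagonal 3x3 super-element
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements are minus the sum of the off-diagonal ones.
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    return hess


def softest_modes(ca: np.ndarray, n_modes: int = 5, cutoff: float = 15.0):
    """Eigenvalues and (n_modes, N, 3) eigenvectors of the softest internal modes.

    The six lowest eigenvalues (~0) are rigid-body translations/rotations and are skipped."""
    evals, evecs = np.linalg.eigh(anm_hessian(ca, cutoff))
    sel = slice(6, 6 + n_modes)
    return evals[sel], evecs[:, sel].T.reshape(n_modes, len(ca), 3)


def deform(ca, sidechains, residue_index, modes, amplitudes):
    """Move C-alpha atoms along the modes; each side-chain pseudo atom receives the
    displacement of the C-alpha of its residue (residue_index maps side chain -> residue)."""
    disp = np.tensordot(amplitudes, modes, axes=1)  # (N, 3)
    return ca + disp, sidechains + disp[residue_index]


if __name__ == "__main__":
    # Toy 27-"residue" cubic grid (3.8 A spacing); real use would read C-alpha coordinates.
    grid = 3.8 * np.array([[x, y, z] for x in range(3) for y in range(3) for z in range(3)], float)
    vals, modes = softest_modes(grid, n_modes=5)
    print("softest internal eigenvalues:", np.round(vals, 3))
    side = grid[:3] + np.array([1.0, 0.5, 0.0])  # hypothetical side-chain pseudo atoms
    new_ca, new_side = deform(grid, side, np.array([0, 1, 2]), modes, np.array([2.0, 0.0, 0.0, 0.0, 0.0]))
```

During docking, the mode amplitudes would be treated as extra minimization variables alongside the six rigid-body degrees of freedom.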

For systematic docking studies, one of the proteins (usually the smaller one, called the ligand protein) was used as a probe and placed at various positions and orientations on the surface of the second, fixed (receptor) protein. A probe radius was chosen that was slightly larger than the maximum distance of any atom from the ligand center. At each starting position on the receptor protein, various initial ligand protein orientations were generated. The docking from each start position consisted of a series of energy minimizations in the translational and rotational degrees of freedom of the ligand protein with respect to the receptor protein. Typically, between 40,000 and 100,000 start configurations were energy-minimized. Approximately 10,000–15,000 complexes (in the case of medium-sized protein partners with <200 residues) can be energy-minimized to low residual gradients in about 1 h on a high-end Linux PC.
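The start-configuration scheme can be sketched as follows, under stated assumptions: receptor surface positions are given as a list of surface pseudo atoms, each ligand start position is obtained by pushing a surface point outward from the receptor center by the probe radius, and orientations are drawn as uniformly random rotations from unit quaternions (Shoemake's method). How ATTRACT actually selects surface points and orientations is not specified here, so the margin, the seeding, and all function names are hypothetical.

```python
from __future__ import annotations

import math
import random


def centroid(points):
    """Geometric center of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def probe_radius(ligand, margin=1.0):
    """Slightly larger than the maximum distance of any ligand atom from the ligand center
    (assumption: the 'slightly larger' part is a fixed margin in Angstrom)."""
    c = centroid(ligand)
    return max(math.dist(p, c) for p in ligand) + margin


def random_rotation(rng):
    """Uniformly distributed 3x3 rotation matrix from a random unit quaternion."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    w, x = a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2)
    y, z = b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def start_configurations(receptor_surface, receptor_center, ligand, n_orient, seed=0):
    """List of (ligand_center_position, rotation_matrix) start placements.

    Assumption: each receptor surface point is pushed outward (away from the receptor
    center) by the probe radius, and n_orient random orientations are made per position."""
    rng = random.Random(seed)
    radius = probe_radius(ligand)
    starts = []
    for p in receptor_surface:
        d = [p[k] - receptor_center[k] for k in range(3)]
        norm = math.hypot(*d)
        pos = tuple(p[k] + radius * d[k] / norm for k in range(3))
        starts.extend((pos, random_rotation(rng)) for _ in range(n_orient))
    return starts


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]  # toy ligand pseudo atoms
    surface = [(10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, -10.0)]  # toy receptor surface points
    starts = start_configurations(surface, (0.0, 0.0, 0.0), ligand, n_orient=4)
    print(len(starts), "start configurations; probe radius", round(probe_radius(ligand), 2))
    # Each start would then be energy-minimized in rigid-body (and optionally mode) coordinates.
```

At the throughput quoted above (about 10,000–15,000 minimizations per hour for partners with <200 residues), 40,000–100,000 start configurations correspond to roughly 3–10 h on one such PC.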

Experimental data and knowledge of residues possibly involved in the protein–protein interaction can be taken into account at various stages of the docking procedure. This includes the possibility of restricting the search to regions known to interact with the second protein partner, or of applying distance restraints that enforce a putative contact.
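Both ways of using experimental information can be sketched in a few lines, under stated assumptions: start positions are discarded when they lie farther than a cutoff from every pseudo atom of a suggested binding region, and a putative contact is enforced by a flat-bottom harmonic penalty added to the docking energy. The cutoff, threshold d0, force constant k, and all function names are hypothetical choices for illustration, not the ATTRACT implementation.

```python
import math


def filter_starts(positions, site_atoms, cutoff=15.0):
    """Keep start positions within cutoff of any pseudo atom of an experimentally
    suggested binding region (assumption: a simple distance criterion)."""
    return [p for p in positions if any(math.dist(p, s) <= cutoff for s in site_atoms)]


def contact_restraint(p, q, d0=6.0, k=1.0):
    """Flat-bottom harmonic penalty: 0 while |p - q| <= d0, else k * (|p - q| - d0)^2."""
    d = math.dist(p, q)
    return 0.0 if d <= d0 else k * (d - d0) ** 2


def contact_restraint_gradient(p, q, d0=6.0, k=1.0):
    """Gradient of the penalty with respect to p (the gradient with respect to q is its negative)."""
    d = math.dist(p, q)
    if d <= d0:
        return (0.0, 0.0, 0.0)
    scale = 2.0 * k * (d - d0) / d
    return tuple(scale * (p[i] - q[i]) for i in range(3))


if __name__ == "__main__":
    # Hypothetical: receptor pseudo atom of a key residue and a ligand pseudo atom 8 A away.
    print(contact_restraint((0.0, 0.0, 0.0), (8.0, 0.0, 0.0)))  # 4.0
```

The flat bottom leaves poses that already satisfy the contact unaffected, so the restraint only steers minimizations that drift away from the suggested interface.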

To obtain docked protein–protein complexes at atomic resolution, the atomic protein partner structures were superimposed onto the docking solutions obtained with the reduced representation. Amino acid side chain conformations at the protein–protein interface were adjusted using the Swiss-PdbViewer program,17 and the resulting protein complexes were finally energy-minimized using the Sander program from the Amber8 package.18 During energy minimization, a Generalized Born (GB) model, as implemented in Amber8, was employed to account implicitly for solvation effects.
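Superimposing an atomic partner structure onto a reduced-model docking solution amounts to finding the rigid-body rotation and translation that best map the partner's pseudo atoms in their starting positions onto the same pseudo atoms in the docked pose, and then applying that transform to every atom. A common way to compute this is the Kabsch algorithm, sketched below with NumPy. The text does not state which superposition method was used, so this is an illustrative choice, and the function names are hypothetical.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray):
    """Rotation R and translation t minimizing sum_i |R @ mobile_i + t - target_i|^2.

    mobile, target: paired (N, 3) coordinate arrays (same pseudo atoms, same order)."""
    mc, tc = mobile.mean(axis=0), target.mean(axis=0)
    h = (mobile - mc).T @ (target - tc)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # -1 would indicate a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, tc - r @ mc


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean square deviation between paired (N, 3) coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


def place_atomic_model(atoms, reduced_start, reduced_docked):
    """Fit the partner's reduced model onto its docked pose and apply the same
    rigid transform to all atoms of the atomic-resolution structure."""
    r, t = kabsch(reduced_start, reduced_docked)
    return atoms @ r.T + t


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reduced = rng.normal(scale=10.0, size=(40, 3))  # toy pseudo-atom coordinates
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    rot = q if np.linalg.det(q) > 0 else -q  # proper rotation
    docked = reduced @ rot.T + np.array([5.0, -3.0, 12.0])
    r, t = kabsch(reduced, docked)
    print("fit RMSD:", rmsd(reduced @ r.T + t, docked))
    atoms = reduced[:10] + 0.7  # stand-in for atomic coordinates of the same partner
    placed = place_atomic_model(atoms, reduced, docked)
```

If the partner was also deformed along soft modes during docking, a rigid fit can only approximate the docked pose; the remaining mismatch would be left to the subsequent atomic refinement.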

RESULTS AND DISCUSSION

Targets and predictions

In CAPRI rounds 6–11, we submitted predictions for targets 20, 21, and 24–27 (the docking challenge for targets 22–23 was cancelled before the CAPRI submission deadline). A summary of the predictions is given in Table I. In the following, we discuss our results and the difficulties we encountered with some of the targets.

Target 20 (HemK–eRF1)

Methylation of a glutamine side chain at a specific target sequence (GGQ motif) of polypeptide release factors (RFs) modulates the peptide chain release activity of the release factors.19 Specific methylation is catalyzed by a protein methyltransferase (PrmC). Target 20 corresponded to the complex between a bacterial (Escherichia coli) polypeptide release factor (eRF1) and a methyltransferase (Hemk).20 For Hemk, a structure of the unbound form was available (in complex with a single glutamine residue). No experimental structure of eRF1 was available, but a homology model could be generated based on the coordinates of release factor 2 (PDB entry 1GQE) using the SWISS-Modeling-Server.21 Since the active site of the Hemk enzyme and the Gly-Gly-Gln (GGQ) loop segment in eRF1, which is supposed to interact with that active site, were known, an approximate binding region could be deduced to focus the docking search on putative binding regions.

However, the GGQ-motif-containing loop region in the RF2 template structure differs dramatically from the loop structure in the complex between eRF1 and Hemk.20 The conformational change basically corresponds to a complete refolding of a peptide loop segment. This structural change goes beyond what can be tolerated in our reduced protein model representation, and it also cannot be covered by relaxation in precalculated soft normal modes. Interestingly, for the Hemk enzyme, a quite substantial overlap was observed between soft ANM modes calculated for the unbound form and the conformational difference between the unbound and bound forms. The backbone Rmsd between the best deformed unbound structure (in terms of the five softest ANM modes) and the bound structure was 1.2 Å, compared to 1.7 Å for the unbound versus bound Hemk structures (Table I, Column 3). A more recently developed approach based on a multiple conformational copy representation of loop structures might be applicable to the eRF1 partner.22 In this method, a loop segment can be represented by several sterically possible loop structures that determine a mean field during the docking search. If one of the copies is sufficiently close to the bound loop structure, it can be selected as the most favorable conformational copy during the docking minimization. However, this method was not fully implemented during CAPRI round 6 and was therefore not applied. Our best predicted complex structure for target 20 had 26% correct contacts but a large interface backbone Rmsd (I_rmsd) of 9.8 Å (Table I).
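The "best deformed unbound structure" quoted above can be computed by projecting the unbound-to-bound displacement onto the soft modes. A minimal sketch, assuming the two Cα sets are already superimposed and the modes are orthonormal 3N-vectors (for example from an ANM). The cumulative-overlap measure included here is one common way of quantifying how well a set of modes captures an observed conformational change; the text does not give its exact overlap definition, so treat it as an assumption. Function names and toy data are hypothetical.

```python
import numpy as np


def best_deformation(unbound: np.ndarray, bound: np.ndarray, modes: np.ndarray):
    """Closest approximation of `bound` reachable by deforming `unbound` along the modes.

    unbound, bound: superimposed (N, 3) C-alpha coordinates; modes: (k, N, 3), orthonormal
    as 3N-vectors. For orthonormal modes, the least-squares amplitude of each mode is
    simply its projection onto the displacement bound - unbound."""
    delta = (bound - unbound).ravel()
    basis = modes.reshape(len(modes), -1)  # (k, 3N)
    amplitudes = basis @ delta
    deformed = unbound + (amplitudes @ basis).reshape(unbound.shape)
    return deformed, amplitudes


def cumulative_overlap(unbound, bound, modes) -> float:
    """sqrt(sum_k (v_k . delta/|delta|)^2): 0 = orthogonal to all modes, 1 = fully in their span."""
    delta = (bound - unbound).ravel()
    proj = modes.reshape(len(modes), -1) @ (delta / np.linalg.norm(delta))
    return float(np.sqrt(np.sum(proj**2)))


def rmsd(a, b) -> float:
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unbound = rng.normal(scale=8.0, size=(50, 3))
    q, _ = np.linalg.qr(rng.normal(size=(150, 5)))  # 5 orthonormal stand-in "modes"
    modes = q.T.reshape(5, 50, 3)
    global_part = (np.array([4.0, -2.0, 1.0, 0.0, 0.5]) @ q.T).reshape(50, 3)
    bound = unbound + global_part + rng.normal(scale=0.3, size=(50, 3))  # plus local noise
    deformed, amps = best_deformation(unbound, bound, modes)
    print(f"unbound vs bound {rmsd(unbound, bound):.2f} A; "
          f"best deformed vs bound {rmsd(deformed, bound):.2f} A")
    print(f"cumulative overlap {cumulative_overlap(unbound, bound, modes):.2f}")
```

The residual after the best deformation is exactly the part of the change that lies outside the mode subspace, which is why purely local changes such as a refolded loop leave it large.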

Target 21 (Orc1p–Sir1)

The Silent information regulator 1 protein (Sir1) plays an important role in establishing silent chromatin by binding to the chromatin origin recognition complex subunit 1 (Orc1p).23 The docking task was to predict the Sir1–Orc1p complex structure based on the structures of the isolated partner proteins (unbound structures). Experimental data on Orc1p mutations and hybrid structures allowed the putative Sir1 binding region to be assigned to the helical (H) domain of Orc1p or to the interface of the H domain and the bromo-adjacent homology (BAH) domain.24 For the Sir1 protein, mutagenesis data gave additional hints on the interaction region.25 The experimental data were used to restrict the systematic search to putative binding regions on the two protein partners. Both proteins undergo conformational changes when bound and unbound partner structures are compared.23 The conformational changes involve simultaneous local changes in the protein backbone and side chains, but also significant global changes, especially in Orc1p (hinge motion of the H domain with respect to the BAH domain).23 Although the binding region was correctly covered in several of our docking solutions, the best predicted complex structure was still incorrect, with an interface Rmsd of 5.1 Å and 34% correct contacts (Table I, Fig. 1). In a recent study, we demonstrated that the observed conformational difference between the bound and unbound structures of the Orc1 protein indeed overlaps with soft modes calculated from an ANM of the protein (Ref. 9; see also Table I and Fig. 2). However, although this resulted in an improved ranking, it did not improve the placement of the protein (deviation from experiment) during docking, presumably because of additional local conformational changes of backbone and side chains at the Sir1–Orc1p interface (Table II).

Figure 1. Side-by-side comparison of predicted (left panels) and experimental (right panels) complex structures for three CAPRI targets (cartoon representation). (A) Target 21: Origin recognition complex subunit 1 (Orc1p, gray) in complex with the Silent information regulator 1 (Sir1, black).23 (B) Target 26: Transport protein B (TolB, gray) in complex with the peptidoglycan-associated lipoprotein 1 (Pal1, black).26 (C) Target 27: Ubiquitin-conjugating enzyme E2-25 kDa (Hip2, black) in complex with the SUMO-1-conjugating enzyme (Ubc9, gray).27 For target 27, the top-ranking prediction is shown; for targets 21 and 26, predictions 3 and 2, respectively, are shown.


We re-evaluated this target by performing systematic docking searches that approximately included global flexibility of both partner proteins (using the five softest modes from an ANM analysis12 of both unbound proteins). For the Orc1p protein, the five softest modes of an ANM-type model according to Hinsen12 show an overlap of ~30% with the backbone conformational changes observed upon complex formation [Table I, Fig. 2(A)]. In addition, side chain flexibility was approximately accounted for by using a multicopy representation of each surface side chain. During docking minimization, the ATTRACT program selects the best-fitting side chain rotamer copy at the protein–protein interface. In this way, the docking geometry and the side chain conformations at the interface can be optimized simultaneously. With simultaneous treatment of both global and side chain flexibility, the docking approach now yields docking solutions quite close to the experimental geometry (I_rmsd = 2.8 Å, Table II). However, this comes at the cost of a significantly worse score for the solution closest to experiment, compared to rigid docking or to accounting only for global protein flexibility (Table II).
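The multicopy side-chain treatment can be illustrated with two simple selection rules: a hard switch that keeps the lowest-energy rotamer copy at each interface position, and a Boltzmann-weighted mean-field average of the kind mentioned for the loop copies (Ref. 22). The copy energies, the kT value, and the function names below are hypothetical; the text states only that ATTRACT selects the best-fitting copy, and Table II describes this as a switching approach.

```python
from __future__ import annotations

import math


def best_copy(copy_energies: list[float]) -> int:
    """Hard switch: index of the rotamer copy with the lowest interaction energy."""
    return min(range(len(copy_energies)), key=copy_energies.__getitem__)


def mean_field_weights(copy_energies: list[float], kt: float = 0.6) -> list[float]:
    """Boltzmann weights w_i = exp(-E_i/kT) / sum_j exp(-E_j/kT), shifted by min(E) for stability."""
    e0 = min(copy_energies)
    boltz = [math.exp(-(e - e0) / kt) for e in copy_energies]
    z = sum(boltz)
    return [b / z for b in boltz]


def mean_field_energy(copy_energies: list[float], kt: float = 0.6) -> float:
    """Weighted (mean-field) interaction energy of one multicopy side chain."""
    w = mean_field_weights(copy_energies, kt)
    return sum(wi * e for wi, e in zip(w, copy_energies))


def interface_energy(positions: list[list[float]], kt: float | None = None) -> float:
    """Sum over side-chain positions; each entry lists the copy energies for the current pose.
    kt=None uses the hard switch (min); otherwise the mean-field average is used."""
    if kt is None:
        return sum(min(e) for e in positions)
    return sum(mean_field_energy(e, kt) for e in positions)


if __name__ == "__main__":
    # Hypothetical interaction energies of three rotamer copies at two interface positions.
    site_energies = [[0.8, -1.5, 0.2], [-0.4, -0.3, 1.1]]
    print("hard switch:", [best_copy(e) for e in site_energies], interface_energy(site_energies))
    print("mean field :", round(interface_energy(site_energies, kt=0.6), 3))
```

Either rule can only choose among the precomputed copies; if no copy resembles the bound side chain, as discussed next for Tyr489, neither selection can recover it.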

Closer inspection of the protein–protein interface and comparison of the bound and unbound forms of the Sir1 protein indicate a critical conformational change of the Tyr489 side chain upon binding. In addition, the local backbone structure at Tyr489 differs between the bound and unbound Sir1 structures [Fig. 2(B)]. Therefore, none of the side chain conformational copies employed during docking is a good match for the Tyr489 side chain conformation in the bound form [illustrated in Fig. 2(B)]. It is likely that this coupled local backbone and side chain conformational change, not yet covered by our flexible docking approach, causes the significant deviation of the docked complex from experiment and/or the unsatisfactory scoring of the complex closest to experiment.

Targets 24 and 25 (ArfBD–ARF1GTPase)

The ARF1 (ADP-ribosylation factor 1)-binding domain (ArfBD) is part of the ARHGAP21 protein and is a binding partner of the ARF1-GTPase protein.28 The ARF1-GTPase protein was given in the unbound form, whereas for the partner protein domain (ArfBD) either a homology model had to be generated (target 24) or it was given in the bound form (target 25). Homology modeling was performed using the pleckstrin-homology domain (PDB entry 1BTW) as the template structure. Because of the limited target–template sequence similarity (30%) and a C-terminal α-helical segment in ArfBD that was absent in the template, homology modeling (using the SWISS-Modeling-Server21) resulted only in a low-quality model for ArfBD. For target 24, none of our docking solutions came close to the experimental binding mode. For target 25 (ArfBD in bound form), our best predicted model had an interface Rmsd of 4.4 Å from experiment; however, because of the small fraction of correct native contacts (22%), it counts as an incorrect prediction. Inspection of the set of generated docking solutions indicated that our scoring function did not rank complexes in closer agreement with experiment as the most favorable solutions.

Target 26 (TolB–PAL)

The peptidoglycan-associated lipoprotein (Pal protein) is associated with the inner leaflet of the outer membrane of Escherichia coli and can form a complex with the periplasmic transport protein B (TolB).26 Both proteins were provided as unbound structures. Experimental information in the form of mutagenesis data was available that allowed likely binding regions on both partner proteins to be located approximately. For TolB, mutagenesis data29 indicated a binding region at the center of the six-bladed β-propeller domain of TolB (Fig. 1). In the case of Pal, experimental evidence from deletion mutations indicated that a region involving residues 94–121 participates in binding to TolB.30 This experimental information allowed a significant reduction of the docking search space. We found several acceptable solutions for this target. The best solution had a backbone interface Rmsd of 2.1 Å and 45% correct native contacts (Table I). Both protein partner structures undergo side chain and backbone conformational changes upon binding. A quite substantial overlap was observed between soft modes calculated for the unbound structure of TolB and the conformational difference between the unbound and bound protein structures (Table I, Column 3). However, this mainly reflected a slight domain rearrangement between the TolB β-propeller domain and a second segment of the protein that is not involved in Pal binding. When only the β-propeller domain binding region of TolB is compared, the backbone Rmsd between the bound and unbound forms amounts to 0.97 Å. The backbone conformational changes upon protein binding correspond mainly to local changes in loop regions of the TolB β-propeller domain; these cannot be approximated well by a few soft modes obtained from an ANM analysis, and they also do not significantly alter the recognition surface of the binding region. The fact that we obtained docking solutions of only acceptable (rather than higher) accuracy might be attributable to the reduced protein model we employ during docking searches. An improvement of the scoring function and of our atomic resolution refinement protocol may help to achieve more accurate final docking solutions.

Figure 2. (A) Comparison of unbound, bound, and best possible approximation of the bound structure by deformation of the unbound protein structures in the five softest ANM normal modes. Unbound, bound, and best possible deformed structures are shown in red, green, and gray, respectively, for the Orc1p protein and in brown, yellow, and blue, respectively, for the Sir1 protein. The backbone Rmsd of unbound versus bound structures is 1.3 Å for Orc1p and 0.9 Å for Sir1. For the best possible approximation of the bound forms by deforming the unbound structures in the five softest modes, the backbone deviation from the corresponding bound structures is 1.0 Å (Orc1p) and 0.7 Å (Sir1). (B) Comparison of the Tyr489 side chain conformation of Sir1 at the protein–protein interface in the bound structure (light gray) and the side chain rotamer copies included during the systematic docking search (blue). Side chains on Orc1p that contact Tyr489 in the complex are shown in black (part of the backbone structure is shown as a light gray cartoon).

Table II. Systematic Docking on Target 21 Including Approximately Global Backbone and Side Chain Flexibility

Docking condition                                   L_rmsd (Å)   R_rmsd (Å)   I_rmsd (Å)   Rank
No side chain rotamers (side chain conformations of the unbound structures):
  Rigid partners                                    6.3          1.3          5.3          36
  Global flexibility Orc1p                          7.7          1.1          5.0          4
  Global flexibility Orc1p and Sir1                 7.2          1.2          3.6          11
Side chain rotamers on both proteins:
  Rigid backbone                                    6.2          1.3          5.8          143
  Global flexibility Orc1p and Sir1                 4.5          1.1          2.3          224

Systematic docking minimizations were started from ~50,000 start configurations using rigid protein partners or allowing conformational relaxation of the partner proteins in the five softest normal modes obtained from a GNM analysis (Global flexibility) of the partner proteins (see Materials and Methods section and Ref. 9). Side chain optimization was performed by a switching approach that selects the best-fitting side chain rotamers at the protein–protein interface during docking minimization (see Ref. 4). L_rmsd corresponds to the deviation of the ligand (Sir1) Cα atoms from the experimental placement23 after best superposition of the complex on the partner protein (Orc1p). R_rmsd indicates the Cα Rmsd of the receptor protein (Orc1p) from its structure in the experimental complex, and I_rmsd indicates the Rmsd from experiment of all atoms within 5 Å of the protein–protein interface.

Table I. Results of CAPRI Predictions

Target  Receptor–ligand (structures used)              Receptor rmsd (Å)  Ligand rmsd (Å)  Best model  % Correct contacts  I_rmsd (Å)  Quality
20      Hemk/eRF1 (a) (unbound/homology model)         1.7/1.2            –                4           26                  9.8         –
21      Orc1p/Sir1 (b) (unbound/unbound)               1.3/1.0            0.9/0.7          3           34                  5.1         –
24      ARF1GTPase–ArfBD (c) (unbound/homology model)  0.5/0.4            –                4           2                   9.4         –
25      ARF1GTPase–ArfBD (c) (unbound/bound)           0.5/0.4            –                9           21                  4.4         –
26      TolB–Pal (d) (unbound/unbound)                 1.7/1.1            0.6/0.4          2           45                  2.1         Acceptable
27      Hip2/Ubc9 (e) (unbound/unbound)                0.9/0.6            0.8/0.6          1           39                  3.6         Acceptable

I_rmsd is the root mean square deviation between prediction and experiment of protein backbone atoms within 10 Å of the protein–protein interface. Columns 3 and 4 report the backbone (Cα) Rmsd of the unbound versus bound conformation of the first (receptor) and second (ligand) protein partner, respectively, followed (after the slash) by the Rmsd between the bound conformation and the unbound structure deformed in the five softest ANM modes to best approximate the bound structure. No Rmsd data are given for the ligand proteins of targets 20, 24, and 25 because these were either homology-modeled or given in the bound form.
(a) Complex formed by the methyltransferase Hemk and the polypeptide release factor 1 (eRF1) from Escherichia coli.20
(b) Origin recognition complex subunit 1 (Orc1p) in complex with the Silent information regulator 1 (Sir1).23
(c) ARF1-GTPase in complex with the ArfBD domain of the ARHGAP21 protein.28
(d) Transport protein B (TolB) in complex with peptidoglycan-associated lipoprotein (Pal).26
(e) Ubiquitin-conjugating enzyme E2-25 kDa (Hip2) in complex with the SUMO-1-conjugating enzyme (Ubc9).27
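The "% correct contacts" column in Table I corresponds to the fraction of native residue–residue contacts that a model reproduces. A minimal sketch, assuming the common CAPRI convention that two residues from different partners are in contact when any pair of their atoms lies within 5 Å; the input format (lists of (residue_id, (x, y, z)) atoms per partner) and the function names are our own choices for illustration.

```python
import math
from itertools import product


def residue_contacts(receptor_atoms, ligand_atoms, cutoff=5.0):
    """Set of (receptor_residue, ligand_residue) pairs with any atom pair within cutoff (Angstrom).

    Each *_atoms argument is a list of (residue_id, (x, y, z)) tuples."""
    pairs = set()
    for (res_r, xyz_r), (res_l, xyz_l) in product(receptor_atoms, ligand_atoms):
        if (res_r, res_l) not in pairs and math.dist(xyz_r, xyz_l) <= cutoff:
            pairs.add((res_r, res_l))
    return pairs


def fraction_native_contacts(native, model):
    """f_nat: share of the native residue contacts that are also present in the model."""
    return len(native & model) / len(native) if native else 0.0


if __name__ == "__main__":
    receptor = [("A1", (0.0, 0.0, 0.0)), ("A2", (10.0, 0.0, 0.0))]
    native_lig = [("B1", (3.0, 0.0, 0.0)), ("B2", (10.0, 4.0, 0.0))]
    model_lig = [("B1", (3.0, 0.0, 0.0)), ("B2", (20.0, 0.0, 0.0))]
    native = residue_contacts(receptor, native_lig)
    model = residue_contacts(receptor, model_lig)
    print(f"{100 * fraction_native_contacts(native, model):.0f}% correct contacts")  # 50% correct contacts
```

The all-pairs loop is quadratic in the number of atoms; for full protein interfaces a spatial grid or k-d tree would be the usual speed-up, without changing the result.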

Target 27 (Hip2–Ubc9)

Post-translational modification of proteins with the small ubiquitin-related modifier (SUMO) influences the function of many proteins and has emerged as a critical signaling system for protein degradation and cell stability.31 SUMOylation of target proteins requires the ubiquitin-conjugating enzyme Ubc9,27,31 and the task was to predict the structure of Ubc9 in complex with the ubiquitin-conjugating enzyme E2-25K (Hip2), which is a target substrate of Ubc9. Experimental data were available on the active site cysteine residue (Cys93) of Ubc9 and on a lysine residue (Lys14) of Hip2 that acts as the accepting residue for SUMOylation.31 This experimental information was included to restrict the docking search to the region it suggested. The analysis of the crystal structure of the complex suggested two possible binding modes, and at present the biologically relevant state is unknown.27 In the first binding mode, the protein–protein interaction geometry differs significantly from the substrate recognition geometry supported by biochemical data.31 None of our docking solutions overlapped with the first binding geometry suggested by the crystal structure of the complex. However, several of our predicted docking geometries were close to a second possible binding geometry extracted from the crystal structure analysis.27 The best prediction had a backbone interface Rmsd of 3.6 Å and 39% native interface contacts and was considered an acceptable docking prediction (Table I, Fig. 1).

A comparison of the bound and unbound partner structures showed conformational changes of the backbone and side chains upon complex formation. However, as for target 26, the backbone conformational changes were either modest, as for Ubc9 (<0.9 Å), or consisted of significant backbone changes localized to the binding region, as for Hip2. As noted in the Introduction, such local backbone conformational changes typically overlap poorly with the global soft modes obtained from an ANM analysis of the protein structure. As we concluded for target 26, an improved protocol for generating atomic-resolution structures from the reduced-model docking geometries may yield more accurate docking geometries.
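The comparison between local backbone changes and global soft modes can be made concrete with an anisotropic network model (ANM). The sketch below builds the ANM Hessian, extracts the lowest non-trivial modes, and measures how much of an observed bound-minus-unbound displacement those modes capture. It assumes Cα coordinates in Å, a uniform spring constant, and a 15 Å cutoff. These are common choices and not necessarily the settings used in our docking runs. A displacement confined to a few binding-region residues would be expected to give a low cumulative overlap with the first few modes.

```python
import numpy as np

def anm_hessian(coords, cutoff=15.0, gamma=1.0):
    """3N x 3N anisotropic network model Hessian for N C-alpha positions."""
    coords = np.asarray(coords, float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue
            # Off-diagonal super-element for a spring between residues i and j
            block = -gamma * np.outer(d, d) / r2
            hess[3*i:3*i+3, 3*j:3*j+3] = block
            hess[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal super-elements make every block row sum to zero
            hess[3*i:3*i+3, 3*i:3*i+3] -= block
            hess[3*j:3*j+3, 3*j:3*j+3] -= block
    return hess

def soft_modes(coords, n_modes=5, cutoff=15.0, gamma=1.0):
    """Lowest non-trivial ANM modes as columns of a (3N, n_modes) array."""
    evals, evecs = np.linalg.eigh(anm_hessian(coords, cutoff, gamma))
    # Skip the six zero-frequency rigid-body modes (assumes a connected network)
    return evals[6:6 + n_modes], evecs[:, 6:6 + n_modes]

def cumulative_overlap(delta, modes):
    """Norm of the projection of a unit displacement (N, 3) onto the mode set."""
    d = np.asarray(delta, float).ravel()
    d = d / np.linalg.norm(d)
    return float(np.sqrt(np.sum((d @ modes) ** 2)))
```

In use, `delta` would be the difference between superposed bound and unbound Cα coordinates. A cumulative overlap close to 1 means the soft modes span most of the observed change.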

CONCLUSIONS

Our systematic docking minimization approach, combined with a reduced protein model, gave acceptable predictions for two targets (26 and 27). For most other targets, we identified at least some docking solutions with a significant fraction of the native contacts observed in the experimental complex, but their accuracy was too low for an acceptable solution. Our recently implemented explicit treatment of global flexibility, which allows conformational relaxation during systematic docking searches, had little influence on our docking results. Some protein structures underwent global conformational changes upon complex formation, and these changes overlapped with precalculated soft modes obtained from an ANM analysis of the unbound form. However, with the exception of target 21, these changes either occurred in regions away from the protein binding sites (e.g., the TolB protein), or substantial local conformational changes of the protein partners influenced the docking result far more than global backbone changes did (e.g., the eRF1 protein in target 20). Consequently, including global soft modes during docking did not improve docking performance in these cases. For target 21, we tested the explicit inclusion of both global and local (side chain) flexibility, which at least improved the docking geometry for this case. We will investigate more systematically whether side chain and global flexibility can be treated simultaneously and efficiently in our docking approach.
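The idea of letting a partner relax along precalculated soft modes during docking minimization can be illustrated by minimizing the energy with respect to mode amplitudes rather than Cartesian coordinates. The following is a minimal sketch, not the ATTRACT implementation. It assumes plain steepest descent, orthonormal mode vectors, a hypothetical harmonic penalty on the amplitudes, and a user-supplied `energy_grad` function that returns the energy and its Cartesian gradient.

```python
import numpy as np

def relax_in_modes(x0, modes, energy_grad, penalty=0.0, step=0.1, n_steps=200):
    """Steepest-descent minimization over normal-mode amplitudes.

    x0: (N, 3) start coordinates; modes: (3N, k) orthonormal mode vectors;
    energy_grad(x) -> (E, dE/dx) for flattened coordinates x of shape (3N,).
    penalty: hypothetical force constant restraining large amplitudes.
    """
    base = np.asarray(x0, float).ravel()
    amps = np.zeros(modes.shape[1])
    for _ in range(n_steps):
        x = base + modes @ amps
        _, grad_x = energy_grad(x)
        # Chain rule: dE/da_k = v_k . dE/dx, plus the harmonic amplitude penalty
        grad_a = modes.T @ grad_x + penalty * amps
        amps -= step * grad_a
    return (base + modes @ amps).reshape(-1, 3), amps
```

Because only k amplitudes are optimized instead of 3N coordinates, the extra cost per docking minimization stays small. This is the motivation for restricting flexibility to a few soft modes.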

However, the CAPRI results clearly show that both the scoring function used during docking (within the reduced protein model) and the refinement at atomic resolution need further improvement. The current scoring function consists of pairwise interaction potentials, which cannot realistically account for desolvation effects during complex formation. As a first step, a surface area-based solvation term could be added to improve the scoring of protein complexes.
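A surface area-based solvation term of this kind is commonly written as ΔG_solv = Σ_i σ_i ΔA_i. Here ΔA_i is the change in solvent-accessible surface area of particle i upon complex formation, and σ_i is a solvation parameter. The sketch below estimates accessible areas with the Shrake–Rupley point-sampling method. The particle radii, the 1.4 Å probe radius, and the σ values are assumptions chosen for illustration. They are not parameters of our scoring function.

```python
import numpy as np

def sphere_points(n=200):
    """Roughly uniform points on the unit sphere (golden-spiral construction)."""
    i = np.arange(n) + 0.5
    phi = np.arccos(1.0 - 2.0 * i / n)
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i
    return np.column_stack([np.cos(theta) * np.sin(phi),
                            np.sin(theta) * np.sin(phi),
                            np.cos(phi)])

def sasa(centers, radii, probe=1.4, n_points=200):
    """Per-particle solvent-accessible surface area (Shrake-Rupley)."""
    centers = np.asarray(centers, float)
    expanded = np.asarray(radii, float) + probe
    unit = sphere_points(n_points)
    areas = np.empty(len(centers))
    for i in range(len(centers)):
        pts = centers[i] + expanded[i] * unit
        others = np.delete(np.arange(len(centers)), i)
        # A surface point is buried if it lies inside any other expanded sphere
        d = np.linalg.norm(pts[:, None, :] - centers[others][None, :, :], axis=2)
        exposed = np.all(d >= expanded[others][None, :], axis=1)
        areas[i] = 4.0 * np.pi * expanded[i] ** 2 * exposed.mean()
    return areas

def desolvation_energy(rec, rec_radii, lig, lig_radii, rec_sigma, lig_sigma, probe=1.4):
    """Sum over particles of sigma_i * (area in complex - area in isolated partner)."""
    free = np.concatenate([sasa(rec, rec_radii, probe), sasa(lig, lig_radii, probe)])
    bound = sasa(np.vstack([rec, lig]), np.concatenate([rec_radii, lig_radii]), probe)
    sigma = np.concatenate([rec_sigma, lig_sigma])
    return float(np.sum(sigma * (bound - free)))
```

With positive σ, burying a particle at the interface lowers this term. This captures a desolvation contribution that a sum of pairwise potentials does not represent explicitly.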

During CAPRI rounds 6–11, we used a very simple atomic-resolution refinement method (energy minimization with the Amber package). Energy minimization at atomic resolution (in Cartesian coordinates) usually changes the complex structure only marginally. Combining it with molecular dynamics simulations or with advanced sampling strategies, such as potential scaling at the protein–protein interface, might also help to improve the accuracy of the docking results.32,33
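One way to outline the potential-scaling idea is to scale down the nonbonded interactions of interface atoms by a factor λ < 1. The interface can then rearrange more easily during sampling before the full potential (λ = 1) is restored. The sketch below shows only such a scaled Lennard-Jones energy. The uniform ε and σ parameters, the interface mask, and the rule that a pair is scaled when either atom belongs to the interface are all assumptions for illustration. The published protocol may define the scaled terms differently.

```python
import numpy as np

def scaled_lj_energy(coords, interface, lam, epsilon=0.2, sigma=3.5):
    """Lennard-Jones energy with pairs involving interface atoms scaled by lam.

    coords: (N, 3) array in Å; interface: boolean mask of length N;
    epsilon, sigma: assumed uniform LJ parameters for all atom pairs.
    """
    coords = np.asarray(coords, float)
    mask = np.asarray(interface, bool)
    i, j = np.triu_indices(len(coords), k=1)
    r = np.linalg.norm(coords[i] - coords[j], axis=1)
    sr6 = (sigma / r) ** 6
    pair_e = 4.0 * epsilon * (sr6 ** 2 - sr6)
    # Scale every pair in which at least one atom belongs to the interface
    scale = np.where(mask[i] | mask[j], lam, 1.0)
    return float(np.sum(scale * pair_e))
```

In a sampling protocol, λ could be lowered in separate runs or replicas and then raised back to 1 for the final structures. With λ = 1, the function reduces to the unscaled energy.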

ACKNOWLEDGMENTS

We thank the organizers of the CAPRI challenge for this opportunity and the assessors for their hard work in evaluating the predictions. We thank all structural biologists who contributed target structures to the CAPRI experiment. We also thank A. Saladin, Dr. K. Bastard, and Dr. C. Prevost for helpful discussions.


REFERENCES

1. Janin J. Welcome to CAPRI: a critical assessment of predicted interactions. Proteins 2002;47:257.

2. Wodak SJ, Mendez R. Prediction of protein–protein interactions: the CAPRI experiment, its evaluation and implications. Curr Opin Struct Biol 2004;14:242–249.

3. Mendez R, Leplae R, Lensink MF, Wodak SJ. Assessment of CAPRI predictions in rounds 3–5 shows progress in docking procedures. Proteins 2005;60:150–169.

4. Zacharias M. Protein–protein docking with a reduced protein model accounting for side-chain flexibility. Protein Sci 2003;12:1271–1282.

5. Zacharias M. ATTRACT: protein–protein docking in CAPRI using a reduced protein model. Proteins 2005;60:252–256.

6. Smith GR, Sternberg MJE. Prediction of protein–protein interactions by docking methods. Curr Opin Struct Biol 2002;12:28–35.

7. Zacharias M. Rapid protein–ligand docking including soft degrees of freedom from molecular dynamics simulations to account for protein flexibility: FK506 binding to FKBP binding protein as an example. Proteins 2004;54:759–767.

8. May A, Zacharias M. Accounting for protein deformability during protein–protein and protein–ligand docking. Biochim Biophys Acta 2005;1754:225–231.

9. May A, Zacharias M. Energy minimization in low-frequency normal modes to efficiently allow for global flexibility during systematic protein–protein docking. Proteins, E-pub ahead of print; DOI 10.1002/prot.21579.

10. Tirion MM. Large amplitude elastic motions in proteins from a single-parameter atomic analysis. Phys Rev Lett 1996;77:1905–1908.

11. Bahar I, Atilgan AR, Erman B. Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Fold Des 1997;2:173–181.

12. Hinsen K. Analysis of domain motions by approximate normal mode calculations. Proteins 1998;33:417–429.

13. Tama F, Sanejouand YH. Conformational change of proteins arising from normal mode calculations. Protein Eng 2001;14:1–6.

14. Tobi D, Bahar I. Structural changes involved in protein binding correlate with intrinsic motions of proteins in the unbound state. Proc Natl Acad Sci USA 2005;102:18908–18913.

15. Bahar I, Rader AJ. Coarse-grained normal mode analysis in structural biology. Curr Opin Struct Biol 2005;15:586–592.

16. Lindahl E, Delarue M. Refinement of docked protein–ligand and protein–DNA structures using low-frequency normal mode amplitude optimization. Nucleic Acids Res 2005;33:4496–4506.

17. Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 1997;18:2714–2723.

18. Case D, Pearlman DA, Caldwell JW, Cheatham TE III, Ross WS, Simmerling CL, Darden TA, Merz KM, Stanton RV, Cheng AL, Vincent JJ, Crowley M, Tsui V, Radmer RJ, Duan Y, Pitera J, Massova I, Seibel GL, Singh UC, Weiner PK, Kollman PA. Amber 8. San Francisco: University of California; 2003.

19. Frolova LY, Tsivkovskii RY, Sivolobova GF, Oparina NY, Serpinsky OI, Blinov VM, Tatkov SI, Kisselev LL. Mutations in the highly conserved GGQ motif of class 1 polypeptide release factors abolish ability of human eRF1 to trigger peptidyl-tRNA hydrolysis. RNA 1999;5:1014–1020.

20. Graille M, Heurgue-Hamard V, Champ S, Mora L, Scrima N, Ulryck N, van Tilbeurgh H, Buckingham RH. Molecular basis for bacterial class I release factor methylation by PrmC. Mol Cell 2005;20:917–927.

21. Schwede T, Kopp J, Guex N, Peitsch MC. SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res 2003;31:3381–3385.

22. Bastard K, Prevost C, Zacharias M. Accounting for loop flexibility during protein–protein docking. Proteins 2006;62:956–969.

23. Hou Z, Bernstein D, Fox CA, Keck JL. Structural basis of the Sir1-origin recognition complex interaction in transcriptional silencing. Proc Natl Acad Sci USA 2005;102:8489–8494.

24. Zhang Z, Hayashi MK, Merkel O, Stillman B, Xu RM. Structure and function of the BAH-containing domain of Orc1p in epigenetic silencing. EMBO J 2002;21:4600–4611.

25. Gardner KA, Rine J, Fox CA. A region of the Sir1 protein dedicated to recognition of a silencer and required for interaction with Orc1 protein in Saccharomyces cerevisiae. Genetics 1999;151:31–44.

26. Bonsor DA, Grishkovskaya I, Dodson EJ, Kleanthous C. Molecular mimicry enables competitive recruitment by natively disordered protein. J Am Chem Soc 2007;129:4800–4807.

27. Walker JR, Avvakumov GV, Xue S, Newman EM, Mackenzie F, Weigelt J, Sundstrom M, Arrowsmith CH, Edwards AM, Bochkarev A, Dhe-Paganon S. A novel and unexpected complex between the SUMO-1-conjugating enzyme UBC9 and the ubiquitin-conjugating enzyme E2-25 kDa. In press.

28. Menetrey J, Perderiset M, Cicolari J, Dubois T, Elkhatib N, El Khadali F, Franco M, Chavrier P, Houdusse A. Structural basis for ARF1-mediated recruitment of ARHGAP21 to Golgi membranes. EMBO J 2007;26:1953–1962.

29. Ray MC, Germon P, Vianney A, Portalier R, Lazzaroni JC. Identification by genetic suppression of Escherichia coli TolB residues important for TolB–Pal interaction. J Bacteriol 2000;182:821–824.

30. Cascales E, Lloubes R. Deletion analyses of the peptidoglycan-associated lipoprotein Pal reveals three independent binding sequences including a TolA box. Mol Microbiol 2004;51:873–885.

31. Pichler A, Knipscheer P, Oberhofer E, van Dijk WJ, Korner R, Velgaard Olsen J, Jentsch S, Melchior F, Sixma TK. SUMO modification of the ubiquitin-conjugating enzyme E2-25K. Nat Struct Biol 2005;12:264–269.

32. Kannan S, Zacharias M. Enhanced sampling of peptide and protein conformations using replica exchange simulations with a peptide backbone biasing-potential. Proteins 2007;66:697–706.

33. Riemann N, Zacharias M. Refinement of protein cores and protein–peptide interfaces using a potential scaling approach. Protein Eng 2005;18:465–476.
