2 Couplings as Restraints in Ab Initio Protein Structure...

Use of Residual Dipolar Couplings as Restraints in Ab Initio Protein Structure Prediction

Turkan Haliloglu (1), Andrzej Kolinski (2,3), Jeffrey Skolnick (2)

1 Polymer Research Center and Chemical Engineering Department, Bogazici University, Bebek 80815, Istanbul, Turkey
2 Buffalo Center of Excellence in Bioinformatics, 901 Washington St., Ste. 300, Buffalo, NY 14203
3 Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw, Poland

Received 23 May 2003; accepted 24 July 2003

Abstract: NMR residual dipolar couplings (RDCs), in the form of the projection angles between the respective internuclear bond vectors, are used as structural restraints in the ab initio structure prediction of a test set of six proteins. The restraints are applied using a recently developed SICHO (SIde-CHain-Only) lattice protein model that employs a replica exchange Monte Carlo (MC) algorithm to search conformational space. Using a small number of RDC restraints, the quality of the predicted structures is improved as reflected by lower RMSD/dRMSD (root mean square deviation/distance root mean square deviation) values from the corresponding native structures and by the higher correlation of the most cooperative mode of motion of each predicted structure with that of the native structure. The latter, in particular, has possible implications for the structure-based functional analysis of predicted structures.

Keywords: ab initio structure prediction; dynamic modes; residual dipolar coupling; SICHO model

Correspondence to: Jeffrey Skolnick; email: skolnick@buffalo.edu
Contract grant sponsor: NIH; Contract grant number: GM-37408
Contract grant sponsor: BU Research; Contract grant number: 00HA502D-00HA503 (T.H.)
Contract grant sponsor: DPT Project; Contract grant number: 01K120280 (T.H.)
Contract grant sponsor: EA-TUBA-GEBIP.2001-1-1 (T.H.)

Biopolymers, Vol. 70, 548–562 (2003). © 2003 Wiley Periodicals, Inc.


INTRODUCTION

Methods that would allow for the more rapid determination of protein structure are of great importance, both from the viewpoint of traditional structural biology and for structural genomics projects. NMR residual dipolar couplings (RDCs) are of particular interest for this purpose, as they provide considerable structural information through their dependence on the orientation of an internuclear vector relative to an order frame [1] and offer the advantage that these data can be collected in a relatively short time. The introduction of the RDC methodology has increased the scope of the problems that can be addressed by NMR spectroscopy. The identification of conformational changes, the relative orientations of domains and intermolecular complexes, studies of rapid recognition of homologous protein folds, and submillisecond timescale dynamics have all shown considerable progress due to the development of this methodology [2]. It is clear that RDCs can improve the quality of the determined structures [3] by providing long-range orientational restraints; that is, a set of global structural restraints that complement those typically obtained from other data such as nuclear Overhauser enhancements (NOEs), scalar couplings, and chemical shifts [1,4,5]. RDCs are also sensitive to internal motions; yet most of the existing structural refinement protocols for analyzing dipolar data implicitly assume that internal motions are either absent or are uniform and axially symmetric in nature [6–8]. In a recent work [9], it was demonstrated that an ab initio structure method combined with an extensive set of RDC restraints can provide a general method for the structure prediction of a variety of protein folds (proteins up to 125 residues). The effect of the completeness of the data set on the algorithm performance was also investigated in that study, and small decreases in accuracy and precision were observed for the structures studied when 30% of the data were randomly removed. However, it is not obvious to what extent this method can be generalized to larger proteins with a much less complete data set.

Experimental RDCs can be used to define the orientation of a vector with respect to an alignment tensor, which is affected either by paramagnetic properties of the molecule or solvation in liquid crystal media [4,5,10–12]. In structure calculations, RDCs can be used for the optimization of the orientation of bond vectors with respect to the orientation of the external alignment or susceptibility tensor [6,7,13–17]. Although the size of the alignment tensor can be derived from the distribution of the experimental RDCs, its orientation with respect to the coordinate system of a molecule is unknown at the beginning of the structure prediction; this could cause convergence problems in the folding process due to the multiple-minima problem [18], the solution of which requires complex sampling protocols. Thus, a method [19,20] that is independent of the orientation of the alignment tensor with respect to the molecule is employed in the present study to transform the RDCs into the projection angles between the internuclear bond vectors for which the RDCs are measured. This approach could be used effectively at the beginning of the folding procedure or at any stage of the folding process. Another method for the rapid generation of protein structures from dipolar coupling data was presented recently [21]; it employs dipolar coupling constraints in the form of a simple elliptical equation. In this article, a test set of six proteins is used to investigate the effect of RDC restraints on the quality of the structures predicted by an ab initio structure prediction algorithm based on a recent low-resolution protein model, namely the SIde-CHain-Only (SICHO) model, where conformational space is sampled by a multiple copy simulated annealing MC algorithm [22,23]. A single set having a small number of RDCs is employed. As the goal is to examine the contribution of RDC restraints alone, NOE restraints are not incorporated, although this could be readily done.

MODEL AND METHOD

Theory

The residual dipolar coupling, $D_{ij}$, measured between coupled nuclei i and j, provides geometric information relative to the common alignment frame of the form [7]

$$D_{ij}(\theta,\phi) = D_{\parallel}\,(3\cos^2\theta - 1) + \tfrac{3}{2}\,D_{\perp}\,\cos 2\phi\,\sin^2\theta \qquad (1)$$

Here, $D_{\parallel}$ and $D_{\perp}$ are, respectively, the axial and perpendicular components of the alignment tensor and can be determined from the experimental RDCs (powder pattern of couplings) [6]. $\{\theta,\phi\}$ is the vector orientation relative to this tensor.

Equations that allow us to use RDCs as restraints without the need to define the orientation of the alignment tensor can be derived from Eq. (1). This is done by calculating the projection angles $\varphi_{ij}$ between all pairs of internuclear vectors i and j for which the dipolar coupling data are measured. There is a continuum of $\{\theta,\phi\}$ pairs for each set of RDC data. These lead to the assignment of a set of $\varphi_{ij}$ (considering mirror reflection symmetry along each axis as well) for each pair of vectors i and j. In the simulations, the angle $\varphi_{ij}$ is no longer allowed to take any value in the whole interval from 0 to $\pi$. Instead, two general possibilities are allowed: one range of angles, where $\varphi_{ij} \in (\alpha_1, \pi - \alpha_1)$; or two ranges of angles, where $\varphi_{ij} \in (\alpha_1, \pi/2 - \alpha_2)$ or $\varphi_{ij} \in (\pi/2 + \alpha_2, \pi - \alpha_1)$, with $\alpha_{1,2} \in (0, \pi/2)$. Thus, n RDCs provide $n(n - 1)/2$ structural restraints.
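As a minimal illustrative sketch of Eq. (1), the following Python snippet evaluates the coupling for a given vector orientation, enumerates a sample of the continuum of orientations compatible with one measured coupling, and counts the pairwise restraints obtained from n couplings. The function names, the grid size, and the numerical values of the tensor components are hypothetical choices made for illustration; they are not taken from the original work.

```python
import numpy as np

def rdc(theta, phi, d_par, d_perp):
    """Eq. (1): coupling for a vector with polar angles (theta, phi), in radians,
    relative to the alignment tensor with components d_par and d_perp."""
    return (d_par * (3.0 * np.cos(theta) ** 2 - 1.0)
            + 1.5 * d_perp * np.cos(2.0 * phi) * np.sin(theta) ** 2)

def compatible_orientations(d_obs, d_par, d_perp, n_theta=500):
    """Sample (theta, phi) pairs that reproduce d_obs under Eq. (1).
    Only one phi branch in [0, pi/2] is returned; symmetry mates are omitted."""
    theta = np.linspace(0.01, np.pi - 0.01, n_theta)  # avoid sin(theta) = 0
    cos2phi = (d_obs - d_par * (3.0 * np.cos(theta) ** 2 - 1.0)) / (
        1.5 * d_perp * np.sin(theta) ** 2)
    ok = np.abs(cos2phi) <= 1.0          # keep only physically valid solutions
    return theta[ok], 0.5 * np.arccos(cos2phi[ok])

def n_pair_restraints(n):
    """Number of projection-angle restraints from n couplings: n(n - 1)/2."""
    return n * (n - 1) // 2
```

For example, `n_pair_restraints(24)` gives the 24 × 23/2 = 276 pairwise restraints that correspond to 24 measured couplings.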

Potential and Simulations

A harmonic-type potential is implemented to keep the projection angles between a set of internuclear vectors within their allowed ranges during the simulation. This term is added to the other terms of the potential, namely the generic and residue-specific short- and long-range interactions, used in the lattice simulation algorithm based on the SICHO protein model.
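The exact functional form of the restraint term is not given in the text. One common way to impose allowed angular ranges is a flat-bottomed harmonic penalty, which is zero inside an allowed range and grows quadratically with the distance to the nearest range. The sketch below assumes that form; the function names, the force constant `k`, and the bound values are hypothetical.

```python
import math

def allowed_ranges(a1, a2=None):
    """Allowed intervals for a projection angle (radians).
    One range (a1, pi - a1), or two ranges symmetric about pi/2 when a2 is given."""
    if a2 is None:
        return [(a1, math.pi - a1)]
    return [(a1, math.pi / 2 - a2), (math.pi / 2 + a2, math.pi - a1)]

def restraint_energy(angle, ranges, k=1.0):
    """Flat-bottomed harmonic penalty (an assumed form, not the authors' exact term):
    zero inside any allowed range, k * d**2 otherwise, with d the distance
    from the angle to the nearest allowed range."""
    d = min(max(lo - angle, 0.0, angle - hi) for lo, hi in ranges)
    return k * d * d
```

With two ranges, an angle of π/2 falls in the excluded gap and is penalized, whereas the same angle is free of penalty under a single range.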

SICHO [22,23] is a lattice protein model that uses only one explicit interaction center per residue, located at the side chain center of mass. These interaction centers are restricted to an underlying simple cubic lattice. The positions of the alpha carbons are estimated from the side chain coordinates and used to map the approximate orientations of vectors associated with RDCs.

The RDC could depend on different bond vectors found in the structure (N–H, C′–N, Cα–Hα, Cα–C′). Here, the incorporated RDCs are for amide NH bond vectors. In a low-resolution model, the explicit description of an NH vector is not possible without a reverse mapping of the low-resolution structure back to its full atomistic representation, which cannot be performed efficiently during the simulation. However, it is possible to have an approximate description for each NH vector by associating it with a vector perpendicular to the backbone. This vector is the normal to the plane defined by the two successive virtual bonds adjacent to the α-carbon of the residue associated with the respective NH vector (i.e., the cross-product of the two successive virtual bonds, each of which connects two successive Cα atoms). To assess the error introduced by this description of an NH vector, the projection angles between the respective NH bond pairs are calculated for the native structure in atomic resolution from the Brookhaven Protein Data Bank (PDB) and compared with the corresponding angles calculated from the approximate description of the vectors in the low-resolution representation of the native structure. This error remains within 20–30° for the structures of our test set, which is also within the angular resolution of the SICHO lattice protein model. Thus, the allowed range for the projection angles between the respective vectors calculated using Eq. (1) is also within the resolution of the model.
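The cross-product construction described above can be sketched as follows, assuming the Cα coordinates are available as an (N, 3) array. The function names are hypothetical, and the sign convention of the normal relative to the true NH direction is an assumption of this sketch, since the text does not specify it.

```python
import numpy as np

def approx_nh_vectors(ca):
    """Unit normals to the planes of successive virtual Calpha-Calpha bonds.
    Row k corresponds to residue k + 1 (the terminal residues have no normal).
    Collinear bond pairs would give a zero normal and are not handled here."""
    ca = np.asarray(ca, dtype=float)
    bonds = ca[1:] - ca[:-1]                    # virtual bonds, shape (N-1, 3)
    normals = np.cross(bonds[:-1], bonds[1:])   # cross-product of successive bonds
    return normals / np.linalg.norm(normals, axis=1, keepdims=True)

def projection_angle(u, v):
    """Angle (radians) between two unit vectors."""
    return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
```

Comparing `projection_angle` values obtained from these normals with those from the true NH vectors of an all-atom structure would give the kind of error estimate quoted in the text.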

In the present work, a test set of six proteins (one blind) was used to see the effects of the RDC restraints on the quality of the structures predicted by ab initio simulations. These structures are: acyl carrier protein (ACP, PDB code: 1ACP), Rubredoxin (PDB code: 1brf), RNA Binding Domain (NS1, PDB code: 1ns1), Nodulation Protein F (Nodf, PDB code: 1nodf), Carbohydrate Recognition Domain (Crd, PDB code: 1a3k), and Brct domain. The RDCs of each structure were obtained from private communications [24]. The simulations were carried out for both cases, with and without restraints, and the results for each are presented and discussed under the corresponding subsections in Results and Discussion.

The quality of the predicted structures is evaluated using two measures of fold quality. The first is the decrease, relative to predictions made without RDC information, in the coordinate (RMSD) and distance (dRMSD) deviations of the predicted structure from the native structure. The implementation of the restraints during the folding process could drive the folding of the structure to the right topology by restricting the conformational space accessible to the structure (as in the case of ACP, see below). The orientational order imposed by the RDCs might also lead to improvements in the slowest dynamic mode shape of the structure, which represents the most cooperative mode of motion and characterizes the global dynamic behavior of the structure. However, as shown below, the RMSD, which is a rather global type of structural measure, does not necessarily reflect this improvement significantly (as in the case of Rubredoxin). To address this issue, the Gaussian Network Model (GNM) [25] is used for the latter analysis.

The GNM uses the topology of residue-residue contacts to model the protein as an elastic network with uniform, single-parameter harmonic potentials between the α-carbons of contacting residue pairs. Adopting a harmonic interaction potential means assuming that the residues undergo Gaussian-distributed fluctuations about their mean positions. Using the GNM, the dynamics of a biomolecular system can be decomposed into a collection of internal motions of different frequencies with a procedure similar to normal mode analysis. The slowest modes, with the lowest frequencies, refer to the most cooperative motions involving the entire structure.
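A minimal sketch of a GNM calculation is given below. It builds the contact (Kirchhoff) matrix from Cα coordinates, diagonalizes it, and takes the squared components of the first nontrivial eigenvector as the normalized slowest mode shape. The 7 Å contact cutoff is a commonly used value assumed here, not one stated in the text, and the function names are hypothetical.

```python
import numpy as np

def kirchhoff(ca, cutoff=7.0):
    """Contact (Kirchhoff) matrix: -1 for residue pairs closer than `cutoff` (in Å),
    with each diagonal entry equal to the number of contacts of that residue."""
    ca = np.asarray(ca, dtype=float)
    dist = np.linalg.norm(ca[:, None, :] - ca[None, :, :], axis=-1)
    contact = (dist < cutoff) & ~np.eye(len(ca), dtype=bool)
    gamma = -contact.astype(float)
    np.fill_diagonal(gamma, contact.sum(axis=1))
    return gamma

def slowest_mode_shape(ca, cutoff=7.0):
    """Normalized residue fluctuations in the slowest nontrivial mode.
    Assumes a connected network, so that only the first eigenvalue is zero."""
    vals, vecs = np.linalg.eigh(kirchhoff(ca, cutoff))  # eigenvalues ascending
    return vecs[:, 1] ** 2  # index 0 is the trivial zero mode
```

Minima of the returned profile mark hinge-like residues, and maxima mark the most mobile segments in that mode.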

These dominant modes of motion give information about the molecular dynamics relevant to biological function that occurs on the global scale [26–29]. Thus, the predicted structure ideally should be able to display the correct dynamic modes of motion that are inherent in its native packing density. The prediction of nativelike fluctuations could be particularly important if one wants to design a structure-based method to resolve some aspects of functional properties. In previous studies [26–29], it was suggested that the minima in the global mode shapes generally coincide with those residues acting as hinges, and the same regions are also usually observed to be correlated with (or juxtaposed to) biologically active sites, such as catalytic sites in enzymes. Maxima, on the other hand, correspond to segments distinguished by their enhanced mobilities and are often implicated in substrate recognition.

In this article, the slowest mode shape for each predicted structure with and without RDC restraints is calculated and compared to that of the native structure. The linear correlation coefficient r between the slowest mode shape of the best predicted structure and that of the native (allowing a shift of up to five residues for optimum matching between the two curves) is reported. We will also examine the corresponding RMSD/dRMSD from the native to assess the improvement obtained by the incorporation of a small number of RDCs in the absence of any other experimental data (see Table I). The best structure is taken as the centroid of the cluster that has the lowest RMSD relative to native (except for Brct, which is a blind prediction). Furthermore, the functional implication of the fluctuations in the slowest mode is examined.
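The text does not spell out how the shift of up to five residues enters the correlation. One plausible reading, sketched below, slides the predicted profile against the native one by up to five residues in either direction and keeps the highest Pearson correlation over the overlapping residues. The function name and this matching procedure are assumptions of the sketch; both profiles are assumed to have the same length.

```python
import numpy as np

def best_shifted_correlation(pred, native, max_shift=5):
    """Highest Pearson r between two equal-length mode shapes over integer
    shifts in [-max_shift, max_shift]; returns (best_r, best_shift)."""
    pred = np.asarray(pred, dtype=float)
    native = np.asarray(native, dtype=float)
    n = len(native)
    best_r, best_shift = -np.inf, 0
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = pred[s:], native[:n - s]   # predicted curve lags by s residues
        else:
            a, b = pred[:n + s], native[-s:]  # predicted curve leads by |s| residues
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_r, best_shift = r, s
    return best_r, best_shift
```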

Table I [a]

(a) Acyl carrier protein, ACP (1ACP)

Without RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å) [b]   r
1          10.35      1/2    5.38            0.17
2          10.30      2/2    5.40            0.12
3          10.29      2/2    5.40            0.15
4          10.29      1/2    5.33            0.12
Average    10.31             5.38            0.14

With RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1          7.77       3/4    4.68            0.59
2          6.59       4/4    4.16            0.82
3          8.02       3/3    4.68            0.40
4          9.34       3/3    4.88            0.67
5          7.19       2/4    4.28            0.44
Average    7.78              4.53            0.58

(b) Rubredoxin (1brf)

Without RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1          4.88       1/4    3.68            0.52
2          5.19       1/3    3.85            0.62
3          5.56       1/6    4.26            0.44
4          4.94       2/5    3.75            0.48
6          5.79       1/4    4.28            0.26
Average    5.27              3.96            0.46

With RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1          5.24       1/6    3.91            0.66
2          5.34       1/5    4.07            0.87
3          4.34       3/7    3.35            0.81
4          4.20       4/5    3.27            0.83
Average    4.78              3.65            0.79

(c) RNA binding domain, NS1 (1ns1)

Without RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1          8.38       1/3    6.46            0.91

With RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1          7.85       1/3    5.73            −0.06

(d) Nodulation Protein F, Nodf (1nodf)

Without RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1          6.53       2/2    5.21            0.34
2          6.43       1/2    5.23            0.42
Average    6.48              5.27            0.32

With RDC restraints
Run        RMSD (Å)   n*/n   dRMSD (Å)       r
1 (RDC)    6.01       2/6    4.99            0.81
2 (RDC)    5.54       4/5    4.62            0.82
Average    5.77              4.80            0.82

[a] RMSD/dRMSD (Å) of the centroid of the best cluster for five predicted structures. n* and n show the rank of the best cluster and the total number of clusters, respectively. r is the correlation coefficient between the slowest mode shape of the predicted and native structures (a window of five residues is used for optimum matching of the two curves). The results with RDCs are marked.

[b] $\mathrm{dRMSD} = \left( \frac{1}{N_{\mathrm{pair}}} \sum_{i<j} \left( r_{ij}^{a} - r_{ij}^{b} \right)^{2} \right)^{1/2}$; a and b refer to the two structures and $N_{\mathrm{pair}}$ is the number of residue pairs ij.
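The dRMSD definition in footnote b translates directly into code. The sketch below assumes that the two structures are given as (N, 3) coordinate arrays with one point per residue in the same residue order; the function name is hypothetical.

```python
import numpy as np

def drmsd(xa, xb):
    """Distance RMSD between two structures with the same residue ordering:
    square root of the mean squared difference of all i < j pair distances."""
    xa = np.asarray(xa, dtype=float)
    xb = np.asarray(xb, dtype=float)
    i, j = np.triu_indices(len(xa), k=1)        # all residue pairs with i < j
    da = np.linalg.norm(xa[i] - xa[j], axis=1)  # pair distances in structure a
    db = np.linalg.norm(xb[i] - xb[j], axis=1)  # pair distances in structure b
    return np.sqrt(np.mean((da - db) ** 2))
```

Because only internal distances enter, no superposition of the two structures is needed, unlike the coordinate RMSD.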


RESULTS AND DISCUSSION

ACP

ACP, having 77 residues, is an essential cofactor in the biosynthesis of fatty acids in many reactions that require acyl transfer steps. It has mainly three α-helices (α1: Ile3-Leu15, α2: Leu37-Asp51, α3: Val65-His75; excluding a small α-helical segment, Glu57-Phe62). The number of experimental RDCs used here is 24 (transformed to 24 × 23/2 orientational restraints). These restraints are associated with the three α-helices of the structure and thus provide information about the relative orientation of each with respect to the others.

The projection angles $\varphi_{ij}$ for a set of NH bond pairs i and j of ACP calculated from a set of measured RDCs are depicted in Figure 1(a). The RMSD/dRMSD from native values for all runs with/without RDC restraints and the corresponding correlation coefficients r of the slowest mode shapes with that of the native are summarized in Table I(a).

For the best case, the RMSD/dRMSD improves from 10.29/5.40 Å to 6.59/4.16 Å with use of the RDC restraints. Figure 1(b) depicts the ribbon diagram of the best predicted structure with the RDC restraints superimposed on the native structure (PDB code: 1ACP). For comparison, the ribbon diagram of the predicted structure without the RDC restraints (with RMSD of 10.29 Å) is displayed in Figure 1(c). Note that the alignment of the secondary structural units is improved by the incorporation of orientational restraints into the simulations, which, in turn, leads to a significant improvement in the prediction as reflected by its RMSD/dRMSD from native. In the case with RDCs, the results suggest that the incorporation of distance restraints associated with the C-terminus might easily lead to the prediction of a better structure with even lower RMSD.

The mean-square fluctuations of the α-carbons in the first slowest mode for the best predicted structures with and without the RDC restraints compared to the corresponding dynamic mode of the native structure are depicted in Figure 2(a) and (b), respectively. The correlation coefficient r of the slowest mode shape increases, on average, from 0.14 to 0.58 with RDCs (Table Ia).

FIGURE 1 (a) Projection angles $\varphi_{ij}$ of NH bond vectors i and j of ACP from RDCs. The specific ranges with symmetric counterparts allowed for the respective pair of bond vectors are given in the figure. (b) Ribbon diagram of superimposition of native (light gray) and predicted (dark gray) structures with RDC restraints (RMSD from native of 6.56 Å). (c) Ribbon diagram of the predicted structure without RDC restraints (RMSD from native of 10.29 Å).

The slowest mode shape of the native structure shows that the region of α2 in the middle of the structure appears to be able to display high-amplitude fluctuations (Figure 2a). It is interesting to note that the phosphopantetheine prosthetic group is attached to Ser36 (the site is a conserved Asp-Ser-Leu motif), which resides in the N-terminus of the latter helix. X-ray crystallographic studies [30] on Butyryl-ACP reveal flexibility of the structure around a putative acyl chain binding site. The analysis of the molecular surface of ACP pictures a plastic hydrophobic cavity in the vicinity of Ser36, which is expanded in one crystal form and contracted in another crystal form, implying that the protein has adopted this conformation after delivery of substrate into the active site of a partner enzyme. The latter region to which Ser36 is attached can be recognized by enhanced fluctuations in the slowest mode of the predicted structure with the RDCs as well; yet, the mode shape seems to shift from that of the native by two to five residues. If we exclude the misfolded C-terminus, the maximum in the predicted slowest mode shape with the RDCs coincides with the segment comprising Ser36.

FIGURE 2 Mean-square fluctuations of the Cα atoms in the slowest mode of predicted structures of ACP with RDC restraints (RMSD from native of 6.56 Å) (a) and without RDC restraints (RMSD of 10.29 Å) (b) in comparison with the corresponding mode of the native structure (dashed curve).

The dynamic model for the structure of ACP in solution based on two-dimensional NMR data suggests [31] that the helices α1 and α3 move in a concerted fashion, although both remain in similar conformations, whereas helix disruption occurs in α2 in some conformers. The fast exchange of amide protons of residues 41 and 42 and some other residues closer to the middle of α2, compared to the amide protons in the other helices, indicates instability of the middle helix. The enhanced mobility of α2 is also reflected by fluctuations of the respective region in the slowest mode of both the native structure and the predicted structure with the RDCs (excluding the misfolded C-terminus). The helix disruption in some conformers might couple with a larger conformational change of the region of Ser36 associated with the delivery of the substrate, as discussed above. In the same dynamic model, Phe28 and Phe50 were found to violate a different set of restraints in each conformer; this may promote the existence of slower (more cooperative) internal motions of the structure. These two residues correspond to the two hinge points, preceding and succeeding the loop of Ser36, in the native slowest mode. The two hinge points, one at around Asn24 (shifted by four residues from Phe28) and the other at Phe50, appear in the predicted slowest mode with the RDCs as well.

On the other hand, ACP has a hydrophobic cleft between α2 and α3 in which an acyl chain can lie [32]. The secondary structure analysis of the latter region by two-dimensional 1H-NMR spectroscopy suggests a hinge region at residues Thr63 and Thr64. Furthermore, a number of hydrophobic residues line the contact area between these helices (Phe50, Ile54, Ala59, Val65, Ala68, and Tyr71), which could provide a site for acyl chain stabilization [17]. These appear with relatively restrained fluctuations in the slowest modes of both native and predicted structures with RDCs (the acyl chain is not considered in the GNM calculations; thus, its presence with ACP should lead to more restrained fluctuations).

Rubredoxin

Rubredoxin, containing 53 residues, is an electron transfer protein whose structure has three α-helices (α1: Pro19-Asn21, α2: Phe29-Glu31, α3: Lys45-Glu47) and three small β-strands (β1: Lys2-Cys5, β2: Ile11-Asp13, β3: Phe48-Lys50). The number of RDCs employed is 15–24 [(15 × 14)/2 to (24 × 23)/2 orientational restraints]. The restraints imposed are associated not only with these helical regions, but with other regions of the structure as well.

The results from the present simulations are summarized in Table I(b). Figure 3(a) and (b) depict the ribbon diagrams of the predicted structures of a representative case of Rubredoxin with RDC (RMSD from native of 4.34 Å; the number of RDCs is 22) and without RDC (RMSD from native of 4.88 Å) restraints, respectively. The mean-square fluctuations of the α-carbon atoms along the slowest mode of the predicted structures in comparison with that of the native state structure (PDB code: 1brf) are presented in Figure 3(c) (with RDC restraints) and 3(d) (without RDC restraints), respectively. Although the RMSD values of the two structures do not differ significantly, the use of RDC restraints leads to the prediction of a structure that can display more nativelike fluctuations (the average correlation coefficient r increases from 0.46 without RDCs to 0.8 with RDCs).

As can be seen from Figure 3(c) and (d), the shape of the native slowest mode shows that there are basically three loops: the highly flexible middle loop, Glu14-Leu32, is separated from the other two by two hinges, or minimum fluctuating regions, around Tyr12 and Asp34. The active site of Rubredoxin contains an iron that is coordinated by the sulfurs of four conserved cysteine residues (residues Cys5, Cys8, Cys38, and Cys41), which reside on the latter two associated loops that are the most conserved regions of the protein [33]. Crystal structures at 1.5 Å resolution and molecular dynamics simulations of oxidized and reduced Rubredoxin from Clostridium pasteurianum suggest [34] that a gating mechanism caused by the conformational change of Leu41, a nonpolar side chain, allows transient penetration of water molecules, which increases the polarity of water molecules and also provides a source of protons. Prior to this, expansion of the Fe-S cluster and concomitant contraction of the NH...S hydrogen bonds lead to greater electrostatic stabilization of the negative charge in this region; this involves the breathing motion of Val8 and Val44. These structural rearrangements upon reduction suggest specific mechanisms by which electron transfer reactions of Rubredoxin should be facilitated. This may explain the mobility of the two associated loops in the slowest mode shape, which coordinate the Fe: all the latter residues appear around the maximum of the two loops in the respective regions in the native slowest mode, whereas all others, excluding Gly9, appear in the same positions in the predicted slowest mode shape with RDCs. For the highly flexible multiple turn region comprising residues Glu14-Leu32, reflected by the largest mean-square fluctuations between the latter two loops along the slowest mode shape of both native and predicted structures with RDCs, the amide exchange experiments [35] promote a model of solvent exposure having a subglobal cooperative conformational opening for this region. On the other hand, the significantly lower exchange rates preceding and following the latter region suggest that this segment is constrained at either end by the less flexible binding region. Indeed, this corresponds to two hinge points in the respective regions reflected by two minima in the slowest mode shapes of both the native and the predicted structures with RDCs; yet, the predicted mode shape appears slightly shifted (by two to five residues). The distribution of fluctuations in the predicted structure without RDCs does not imply any of these observations, as seen in the respective figure. The RDC restraints correct the slowest mode shape of the structure in comparison to that without the RDCs by shifting the minimum fluctuating regions closer to those points observed in the native state (Figure 3c,d). The latter results imply that the restrained/enhanced mobility of these loops has an implication for its function, and the RDCs promote the recovery of the nativelike distribution of these fluctuations.

FIGURE 3 (a,b) Ribbon diagrams of predicted structures of Rubredoxin with RDC (RMSD from native of 4.34 Å) and without RDC (RMSD from native of 4.88 Å) restraints, respectively. (c,d) Mean-square fluctuations of Cα atoms in the slowest mode in comparison with that of the native state, respectively [(i) and (ii) refer to, respectively, the native and predicted structures].

This analysis is more challenging here, as there are differences between the dynamic modes of even relatively low RMSD structures. Ideally, one expects that as the RMSD values become lower, the mode shapes should be very similar between the structures with similar RMSD values. Nevertheless, these results imply that a structure with an RMSD value of about 4–5 Å does not necessarily have the correct mode shapes. This, on the other hand, emphasizes the significance of another measure of structural quality in structure prediction calculations.

RNA Binding Domain (NS1)

NS1 is a 73 residue RNA-binding/dimerization do-main. The structure has three �-helices (�1: Thr5-Asp24, �2: Pro31-Thr49, �3: Ile54-Lys70). The RDCdata are mainly associated with the NH bonds of thethree �-helices in the structure. The total number ofRDCs employed here is 24 (24 � 23/2 orientationalrestraints).

The ribbon diagrams of the native (PDB code:1ns1) and the predicted (with RDCs, they have aRMSD/dRMSD of 7.85/5.73 Å; while without RDCs,RMSD/dRMSD is 8.4/6.18 Å) structures are depictedin Figure 4. Here, while the C-terminus of the struc-ture is not correctly folded with or without the RDCs,the RDC restraints appear to contribute to the adjust-ment of the relative orientations of the �-helices.

Figure 5 displays the first slowest mode shape of thepredicted structures with the RDC (a) and without theRDC (b) restraints. The correlation of the mode’sshape with native is higher for the case with the RDCs(r � 0.91 and �0.06, respectively). This observationcan be expected, as the structure with the RDCs has alower RMSD in comparison to the one without theRDCs; however, we have already shown with Rubre-doxin that this does not necessarily happen.

As shown in Figure 5, the distribution of the native fluctuations along the slowest mode displays strikingly extensive mobility for the segment from residues Val22 to Asn53, which comprises the middle α-helix, α2. It was suggested36 from the distributions of basic residues and conserved salt bridges of dimeric NS1 that the face containing antiparallel helices 2 and 2′ forms a novel arginine-rich nucleic acid binding motif. Arg38 is absolutely required for binding, and Lys41 makes a strong contribution to the affinity of binding; those residues in each of the latter antiparallel helices contact the phosphate backbone of the RNA target.37 Arg38 and Lys41 appear at the maximum of the native mode shape in the respective region. The predicted mode shape with the RDCs reflects this extensively moving segment from residues Gly10 to Leu50, with the maximum being shifted by about six residues.

The results show an improvement in the predicted fluctuations with RDCs, yet a shift in the slowest mode shape appears for a predicted structure with relatively high RMSD. However, the biologically active unit is a dimer, and we have simulated only the monomer; this may partially rationalize the relatively poor results that we have obtained.

Nodf

Nodf has 35 residues (the residue indexes range from 1–86 due to missing residues). It has three α-helices (α1: Leu5-Val17, α2: Asp46-Leu58, α3: Val76-Gly86). The number of RDCs employed is 15 (15 × 14/2 orientational restraints). The results from the simulations are presented in Table I(d).

The ribbon diagrams of the best predicted structures with the RDCs (RMSD/dRMSD is 5.33/4.62 Å) and without the RDCs (RMSD/dRMSD is 6.43/5.21 Å) superimposed on the native structure (PDB code: 1fh1) are displayed in Figure 6(a) and (b), respectively. Figure 6(c) displays the mean-square fluctuations of α-carbon atoms for the native state and for the predicted structures with/without RDC restraints, respectively. The calculation with the RDC restraints has both lower RMSD/dRMSD values and a slowest mode shape that is more highly correlated with that of the native state; the average r is 0.82 and 0.22 with the RDCs and without the RDCs, respectively. The C-terminus appears to be misfolded without the RDCs.
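For readers less familiar with the second quality measure, the sketch below computes a distance RMSD (dRMSD) between two conformations. It assumes the common definition, the root mean square difference between corresponding intramolecular distances over all atom pairs; the normalization used in this work is not restated here, so the details are an assumption. Unlike the coordinate RMSD, this quantity needs no superposition and is insensitive to mirror images.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Distance RMSD between two conformations with matching atom order.

    Root mean square of (d_ij in A minus d_ij in B) over all pairs i < j.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two conformations of equal length (at least 2 atoms)")
    sq_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        sq_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(sq_sum / n_pairs)


# Toy three-atom chains (illustrative coordinates in angstroms).
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
translated = [(x + 5.0, y - 3.0, z + 1.0) for x, y, z in chain]

print(drmsd(chain, translated))  # 0.0, rigid translation leaves distances unchanged
print(drmsd(chain, stretched))   # sqrt(2), about 1.414
```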

Nodf has a high level of homology with ACP, especially around the prosthetic group attachment site.38 The phosphopantetheine prosthetic group is attached to Ser45 (the site is a conserved Asp-Ser-Leu motif), which is Ser36 in ACP. Residue Leu46 begins α2. The corresponding residue index is 14 in Figure 6(a) and (b) (the indexes are readjusted because of the missing residues between the α-helices). The loops are missing in our prediction and in the PDB structure17 as well. Thus, the functional site should be in the loop preceding α2. Functional analysis of an interspecies chimera of ACPs indicates a specialized domain for protein recognition, and Nodf is a specialized ACP whose specific features are encoded in the C-terminal region of the protein.38 Both the predicted structure with RDCs and the native state structure have the minimum fluctuating region in the C-terminal region α2, and have the active site on the preceding arm of that hinge point; yet, the native state promotes higher fluctuation for the active site region. The missing loops may partially explain the latter. Nevertheless, the fluctuations of the C-terminal region of the structure from 43–93 (starting from 14 to 37 in our index), that is, the functional domain38 of Nodf, are predicted with the RDCs closer to native compared to those of the predicted structures without the RDCs. It was suggested17 that, as in the case of ACP, Nodf has a hydrophobic cleft where the acyl chain can lie between α2 and α3. There is a hinge region17,32 between the two latter α-helices where the slowest modes of both the native and the predicted structure with the RDCs promote this behavior (the region around Glu24-Asn27) with highly restrained fluctuations. This behavior for acyl chain stabilization is also observed in the other ACP structures from several sources.17

FIGURE 4 Ribbon diagrams of native (1ns1) and predicted structures (NS1) with RDC restraints (RMSD from native of 7.85 Å) and without RDC restraints (RMSD from native of 8.35 Å) of the RNA binding domain.

Crd

The structure Crd contains 137 residues (PDB code: 1a3k) and is not folded to an RMSD below 10 Å either with or without RDC restraints. It is the only all-β-sheet structure in our test set. The number of RDCs employed is 24. Incorporation of NOE restraints might help folding. Nevertheless, here we aim to see exclusively the contribution of the RDC restraints to the quality of the predicted structures; thus, NOE restraints are not employed in any of the cases.

Brct Domain

The Brct domain contains 92 residues, and these simulations constitute a blind prediction. The number of RDCs employed here is 25. A close structure in the PDB to the one predicted here is the Brct domain from the DNA-repair protein Xrcc1 from Homo sapiens (PDB code: 1cdz39; 96 residues) with 15.8% sequence identity. Figure 7(a) and (b) depicts the ribbon diagrams of the predicted structures, obtained from the centroids of the first cluster (ranked based on the energy), with and without RDC restraints, superimposed on 1cdz (RMSD of 5.14 Å with RDCs and RMSD of 7.21 Å without RDCs, respectively). In the ribbon diagrams, the relative position of the N-terminus of the structure is different for the two cases, with and without RDCs. The simulations with varying seed numbers were repeated for the case with RDCs, and the first cluster centroid of each consistently gave structures with an RMSD of 5.14–5.65 Å (with the total number of clusters varying from one to four) from 1cdz. On the other hand, if we look at the second cluster (the second lowest energy cluster centroid of the predicted structures without RDCs), its RMSD is 5.38 Å from 1cdz.

FIGURE 5 Mean-square fluctuations of the Cα atoms in the slowest mode of predicted structures of the RNA binding domain (NS1) with RDC restraints (RMSD from native of 7.85 Å) (a) and without RDC restraints (RMSD from native of 8.35 Å) (b) in comparison with the corresponding mode of native structure (dashed curve).

FIGURE 6 Superimposition of ribbon diagrams of predicted structures (light gray) of Nodf with RDC restraints (RMSD from native of 5.33 Å) (a) and without RDC restraints (RMSD from native of 6.51 Å) (b) on native structure (1nodf) (dark gray). (c) Mean-square fluctuations of Cα atoms in the slowest mode for native (1nodf) and predicted structures with/without RDC restraints of Nodf.

As Brct was a blind prediction, a relatively higher number of restraints from threading were incorporated compared to the other structures simulated here. This may partly explain why we observe that the first cluster centroid (ranked based on the energy) is the best structure in all the runs with the RDC restraints.

CONCLUSION

The results suggest that RDC restraints lead to higher quality structures in ab initio prediction, as assessed by both structural and dynamic measures; yet, unlike in the previous work,9 a small number of RDCs is employed. The predicted structures should be evaluated from the view of both structural and dynamic measures, with the intricate relationship between the two, in connection with the biological function, being of utmost importance.

Ideally, the dynamic characteristics of the predicted structure should converge to those of the native as the native state is approached. The fluctuations and quality of the structure are strongly coupled, as the fluctuations reflect the quality of the structure and are expected to improve as the quality of the structure is improved. However, it is not obvious for which range of RMSD/dRMSD values this should happen. The RMSD is a global measure of the similarities between the structures and may not reflect the differences in the local packing densities that inherently affect the dynamic behavior of the structures. The present results show that dynamic characteristics, described here in terms of the shape of the so-called global (or dominant) collective mode, between predicted structures with an RMSD from native around 4–5 Å and closer may not necessarily be similar, as for Rubredoxin. The inclusion of RDCs corrects the shape of the slowest mode to that of the native, as observed by the analysis of the centroid of the clusters (given in Table I) and the analysis of the individual conformations in the clusters. This has implications for structure prediction calculations, as the objective is to select the best structure closest to the native state with the right packing density that dictates the right dynamic modes of functional significance.

The results demonstrate that the predicted structures with the RDCs yield a distribution of motional fluctuations whose patterns can catch functionally important conformational changes as well as the approximate positions of the functionally important residues. This suggests that we can still extract functionally important information from low- to moderate-resolution structural models.

FIGURE 7 (a,b) Ribbon diagrams of the predicted structures (first cluster centroids/lowest energy) (dark gray) with RDC restraints (RMSD from native of 5.14 Å) and without RDC restraints (RMSD from native of 7.21 Å), respectively, superimposed on the structure 1cdz (light gray).

Regarding the implementation and use of RDCs, the following comments can be made:

1. From the relative RDC values of the respective bonds, one has an approximate idea about the corresponding projection angles. If the bond pairs have high RDC values of the same sign, these vectors are close to collinear. On the other hand, if the pairs have large but opposite RDC values, the vectors are normal or close to normal to each other. In a low-resolution model, such as is employed in the present work, those sets of RDC values that would give unambiguous projection angles within the resolution of the present chain model are preferred.

2. If the data are well associated with secondary units, it should be possible to use RDCs as restraints on the orientation of such secondary structural elements rather than the individual bond vectors; that is, instead of the projection angles of the vectors being associated with helices or groups of residues, the projection angles of segments could be obtained to align the secondary structural elements with respect to each other. Then, rapid characterization of quaternary structures and classification of tertiary fold could be feasible.

3. In some cases, there is a need to have more restraints to be able to fold the structure with or without RDCs (as was the case of 1a3k). It is obviously important to be able to implement the restraints from the beginning of the folding simulation; however, it would be useful as well to concentrate on the refinement of already folded structures. For the latter, it could be worthwhile to consider an all-atomistic model/potential with RDC restraints, as refinement could be done in a more accurate manner with the explicit treatment of bond vectors of the structure associated with RDC data, if the structure is not misfolded. Such refinement could be particularly significant with RDC restraints in determining the relative orientations of the domains in multidomain structures and identifying the orientation of the ligands to the substrate in the complex systems with a small effort.

4. Larger sets of RDC restraints may lead to higher quality structures with lower RMSD. Therefore, the analysis of the effect of the number of RDCs on the quality of the predicted structures is important. However, besides the number, the positions of the restraints within the structure may affect the quality of the structures. A consideration of topologically different structures and implementation of the same number of restraints, but associated with different parts of the structure for each given case, would complement the latter analysis.

5. The process of deriving sequence-specific RDCs may yield chemical shifts sensitive to the secondary structure. The incorporation of the chemical shift data together with RDCs can further improve the prediction of the fold. In a recent study by Rohl et al.,9 the incorporation of both chemical shift and RDC data was shown to allow the best results in their protocol.
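The reasoning in comment 1 can be illustrated with the textbook angular dependence of an RDC. The sketch below is a simplification and not the restraint function used in this work: it assumes an axially symmetric alignment tensor (rhombicity neglected), for which D(θ) = D_a (3cos²θ − 1) with θ the angle between the bond vector and the principal alignment axis. Under that assumption, vectors with RDCs near the positive extreme 2D_a lie close to the alignment axis and hence close to collinear with one another, a vector near the opposite extreme −D_a is close to normal to them, and intermediate values are compatible with more than one angle. The function names and the unit value of D_a are hypothetical.

```python
import math


def rdc_axial(theta_deg, d_a=1.0):
    """RDC for an axially symmetric tensor: D = D_a * (3 cos^2(theta) - 1)."""
    c = math.cos(math.radians(theta_deg))
    return d_a * (3.0 * c * c - 1.0)


def theta_candidates_deg(d, d_a=1.0, tol=1e-12):
    """Angles in [0, 180] degrees consistent with a measured D (two-fold ambiguity)."""
    cos2 = (d / d_a + 1.0) / 3.0
    if cos2 < -tol or cos2 > 1.0 + tol:
        raise ValueError("D lies outside the range allowed for this D_a")
    cos2 = max(0.0, min(1.0, cos2))  # clamp against rounding error
    theta = math.degrees(math.acos(math.sqrt(cos2)))
    return (theta, 180.0 - theta)


def projection_angle_deg(u, v):
    """Angle between two bond vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


print(rdc_axial(0.0))                  # 2.0: bond vector along the alignment axis
print(rdc_axial(90.0))                 # close to -1.0: bond vector normal to the axis
print(theta_candidates_deg(0.0))       # two candidate angles, near 54.7 and 125.3 degrees
print(projection_angle_deg((0.0, 0.0, 1.0), (1.0, 0.0, 0.0)))  # 90.0
```

Even for a fixed θ the bond vector can lie anywhere on a cone around the alignment axis, which is one reason why RDC values near the extremes are the ones that translate most cleanly into projection angles in a low-resolution model.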

We thank Drs. James Prestegard, Gaetano Montelione, and Fang Tian for providing us with the RDC data used in this work. This research was supported in part by NIH Grant No. GM-37408 of the Division of General Medical Sciences of the National Institutes of Health. Useful discussions with W. Tian are gratefully acknowledged.

REFERENCES

1. Prestegard, J.H. Nat Struct Biol 1998, 5, 517–522.
2. Tolman, J.R. Curr Opin Struc Biol 2001, 11, 532–539.
3. Prestegard, J.H.; Al-Hashimi, H.M.; Tolman, J.R. Q Rev Biophys 2000, 33, 371–424.
4. Tolman, J.R.; Flanagan, J.M.; Kennedy, M.A.; Prestegard, J.H. Proc Natl Acad Sci USA 1995, 92, 9279–9283.
5. Tjandra, N.; Bax, A. Science 1997, 278, 1111–1114.
6. Clore, G.M.; Gronenborn, M.; Bax, A. J Magn Reson 1998, 133, 216–221.
7. Clore, G.M.; Gronenborn, A.M.; Tjandra, N. J Magn Reson 1998, 131, 159–162.
8. Clore, G.M.; Garrett, D.S. J Am Chem Soc 1998, 121, 9008–9012.
9. Rohl, C.A.; Baker, D. J Am Chem Soc 2002, 124, 2723–2729.
10. Bax, A.; Tjandra, N. J Biomol NMR 1997, 10, 289–292.
11. Ramirez, B.E.; Bax, A. J Am Chem Soc 1998, 120, 9106–9107.
12. Hansen, M.R.; Mueller, L.; Pardi, A. Nat Struct Biol 1998, 5, 1065–1074.
13. Fischer, M.W.F.; Losonczi, J.A.; Weaver, J.L.; Prestegard, J.H. Biochemistry 1999, 45, 9013–9022.
14. Bolon, P.J.; Al-Hashimi, H.M.; Prestegard, J.H. J Mol Biol 1999, 293, 107–115.
15. Hus, J.-C.; Marion, D.; Blackledge, M. J Mol Biol 2000, 298, 927–936.
16. Olejniczak, E.T.; Meadows, R.P.; Wang, H.; Cai, M.; Nettesheim, D.G.; Fesik, S.W. J Am Chem Soc 1999, 121, 9249–9250.
17. Fowler, C.A.; Tian, F.; Al-Hashimi, H.M.; Prestegard, J.H. J Mol Biol 2000, 304, 447–460.
18. Chou, J.J.; Li, S.; Bax, A. J Biomol NMR 2000, 18, 217–227.
19. Meiler, J.; Blomberg, N.; Nilges, M.; Griesinger, C. J Biomol NMR 2000, 16, 245–252.
20. Skrynnikov, N.R.; Kay, L.E. J Biomol NMR 2000, 18, 239–252.
21. Wedemeyer, W.J.; Rohl, C.A.; Scheraga, H.A. J Biomol NMR 2002, 22, 137–151.
22. Kolinski, A.; Skolnick, J. Proteins 1998, 32, 475–494.
23. Kolinski, A.; Betancourt, M.R.; Kihara, D.; Rotkiewicz, P.; Skolnick, J. Proteins 2001, 44, 133–149.
24. Aramani, J.; Montelione, G. (RDCs of ACP, Nodf, NS1, Brct Domain, and Crd); Tian, F. (RDCs of Rubredoxin), private communications.
25. Bahar, I.; Atilgan, A.R.; Erman, B. Fold Des 1997, 2, 173–181.
26. Bahar, I.; Atilgan, A.R.; Demirel, M.C.; Erman, B. Phys Rev Lett 1998, 12, 2733–2736.
27. Amadei, A.; Linssen, A.B.; Berendsen, H.J.C. Proteins 1993, 17, 412–425.
28. Hinsen, K. Proteins 1998, 33, 417–429.
29. de Groot, B.L.; Hayward, S.; van Aalten, D.M.F.; Amadei, A.; Berendsen, H.J.C. Proteins 1998, 31, 116–127.
30. Roujeinikova, A.; Baldock, C.; Simon, W.J.; Gilroy, J.; Baker, P.J.; Stuitje, A.R.; Rice, D.W.; Slabas, A.R.; Rafferty, J.B. Structure 2002, 10, 825–835.
31. Kim, Y.; Prestegard, J.H. Biochemistry 1989, 28, 8792–8797.
32. Jones, P.-J.; Holak, T.A.; Prestegard, J.H. Biochemistry 1987, 26, 3493–3500.
33. Frey, M.; Sieker, L.; Payan, F.; Haser, R.; Bruschi, M.; Pepe, G.; LeGall, J. J Mol Biol 1987, 197, 525–541.
34. Min, T.; Ergenekan, C.E.; Eidsness, M.K.; Ichiye, T.; Kang, C. Protein Sci 2001, 10, 613–621.
35. Hernandez, G.; LeMaster, D.M. Biochemistry 2001, 40, 14384–14391.
36. Chien, C.Y.; Tejero, R.; Huang, Y.; Zimmerman, D.E.; Rios, C.B.; Krug, R.M.; Montelione, G.T. Nat Struct Biol 1997, 4, 891–895.
37. Wang, W.; Riedel, K.; Lynch, P.; Chien, C.Y.; Montelione, G.T.; Krug, R.M. RNA 1999, 5, 195–205.
38. Ritsema, T.; Gehring, A.M.; Stuitje, A.R.; Van der Drift, G.M.; Dandal, I.; Lambolat, R.H.; Walsh, C.T.; Thomas, Oates, J.E.; Lugtenberg, B.J.; Spaink, H.P. Mol Gen 1998, 257, 641–648.
39. Zhang, X.; Morera, S.; Bates, P.A.; Whitehead, P.C.; Coffer, A.I.; Hainbucher, K.; Nash, R.A.; Sternberg, M.J.E.; Lindahl, T.; Freemont, P.S. Embo J 1998, 17, 6404–6411.
