Conformational analysis of GpA and GpAp in aqueous solution by molecular dynamics and statistical...

Conformational Analysis of GpA and GpAp in Aqueous Solution by Molecular Dynamics and Statistical Methods

Jesús Giraldo1,2, Shoshana J. Wodak1 and Daniel van Belle1*

1Unité de Conformation des Macromolécules Biologiques, Université Libre de Bruxelles, CP160/16, Avenue F. D. Roosevelt, 50, B-1050 Bruxelles, Belgium
2Laboratorio de Medicina Computacional, Unidad de Bioestadística, Facultad de Medicina, Universidad Autónoma de Barcelona, 08193 Bellaterra, Spain

*Corresponding author. E-mail address of the corresponding author: [email protected]

Abbreviations used: MD, molecular dynamics; dBB, base-base distance; θBB, base-base orientation; ASAT, total accessible surface area; BSABB, base-base buried surface area; RMS, root mean square.

Article No. mb982139. J. Mol. Biol. (1998) 283, 863–882. 0022–2836/98/440863–20 $30.00/0

Barnase, an extracellular endoribonuclease from Bacillus amyloliquefaciens, hydrolyses single-stranded RNA. Its very low catalytic activity toward GpN dinucleotides, where N stands for any nucleoside, is markedly increased when a phosphate is added to the 3′-end, as in GpNp. Here we investigate the conformational properties of GpA and GpAp in solution, in order to determine whether differences in these properties may be related to the changes in enzymatic activity. Two independent 1.3 ns molecular dynamics trajectories are generated for each dinucleotide in the presence of explicit water molecules and counter ions. These trajectories are analysed by monitoring molecular properties, such as the solvent accessible surface area, the distance and orientation between the bases, the behaviour of torsion angles and the formation of intramolecular H-bonds. To identify relevant correlations between these parameters, statistical techniques comprising multiple regression, clustering and discriminant analysis are used. Results show that GpA has a significant propensity to form folded conformations (~50%), fostered by a small number of intramolecular H-bonds, whereas GpAp remains essentially extended. The latter behaviour seems to be due to an H-bond between the terminal phosphate and the adenosine ribose group, which restricts rotation about the adenine Aγ angle. We also find that GpA folding is induced by a concerted motion of specific torsion angles, which is closely coupled to the formation of a network of flexible hydrogen bonds. Finally, on the basis of an expression for barnase KM, which incorporates the folded/extended conformational equilibria of the dinucleotide substrates, it is argued that our findings on the differences between these equilibria can qualitatively rationalize the experimentally measured differences in enzymatic properties.

© 1998 Academic Press

Keywords: molecular dynamics; barnase; dinucleotides; conformational families; stacking

Introduction

Barnase, an extracellular endoribonuclease from Bacillus amyloliquefaciens, catalyses the cleavage of single-stranded RNA by a two-step, general acid-base mechanism. This is believed to involve residues Glu73 and His102, the former acting as the general base and the latter as the general acid (Mossakowska et al., 1989). The first step of the process is the transesterification reaction, in which the carboxylate group of Glu73 takes up a proton from the 2′-OH group of the substrate. The second step involves the hydrolysis of the phosphodiester bond and is promoted by the protonated form of the imidazole ring of His102. Barnase catalysis is base selective. The enzyme always cleaves after guanosine when the substrate is a dinucleotide and preferentially after guanosine in longer substrates (Ruishizky et al., 1963).

One of the most puzzling findings in barnase catalysis is the observed 100-fold increase in kcat
and a tenfold decrease in KM upon going from GpN to GpNp (Day et al., 1992). Analysis of the crystal structure of a barnase-d(GpC) complex led to the suggestion that non-productive binding may be at the origin of the lower efficiency for GpN catalysis (Baudet & Janin, 1991). In that structure, the bound d(GpC), adopting a folded conformation, is located near the active site, but the guanine base does not interact with the recognition loop (residues 56–62) and stacks instead against the imidazole of His102. As a result, the Glu73 and His102 residues cannot perform their catalytic role. It has been speculated that this non-productive binding mode is accessible to GpN substrates, whereas GpNp dinucleotides would preferably bind in a productive mode, because only in doing so can both phosphates make electrostatic interactions with barnase (Buckle & Fersht, 1994).

Some of the features characterizing productive binding were observed in the complex of barnase with d(CGAC) (Buckle & Fersht, 1994). In this structure the bound GpAp moiety adopts an extended conformation. The two phosphate groups make electrostatic interactions with active site residues, and the guanine base is bound to the recognition loop in a manner resembling that in complexes of other microbial RNAses. The adenine base stacks against the His102 imidazole group, as the guanine base does in the barnase-d(GpC) complex (Baudet & Janin, 1991); only here, this poises the imidazole correctly for the proton transfer reaction to the O5′ atom of the leaving group. A different situation altogether is observed with regard to Glu73. In order to initiate the transesterification reaction, its carboxylate must be close enough to the guanosine 2′-OH to take up the proton from this group. Modelling this hydroxyl group into the barnase-d(CGAC) complex reveals, however, that it is too far away from the Glu73 carboxyl (~7 Å) to permit proton transfer. Inspection of the model suggests that this is due to the anti conformation of the guanosine glycosidic torsion angle χ, which governs the relative base-sugar orientation (see Figure 1 for notation). To form the H-bond between the Glu73 carboxyl and guanosine 2′-OH, required for catalysis, this angle must adopt a syn conformation (Buckle & Fersht, 1994). It was shown that this can be achieved in the barnase-d(CGAC) complex while maintaining virtually the same interactions between the enzyme and the nucleotide moieties (bases and phosphate groups) as in the original crystal structure (Gordon-Beresford et al., 1996).

A further indication that the flexibility of the guanosine χ angle may play a role in nucleotide binding comes from structural data on mononucleotide complexes with barnase and related enzymes. The anti conformation, with a corresponding non-productive location for the 2′-OH group, is observed in the binase-3′GMP (Pavlovsky et al., 1988) and RNase Sa-3′GMP (Sevcik et al., 1991) complexes. On the other hand, the crystal structure of the barnase-3′GMP complex (Guillet et al., 1993) features the syn conformation, with the 2′-OH hydrogen-bonded to Glu73, whereas the same complex in solution displays an equilibrium between the syn and anti conformations (Meiering et al., 1993).

These data make it clear that in this system, as in many others, a quantitative description of the catalytic mechanism in terms of detailed atomic models of both enzyme and reactants must take into account the conformational flexibility of the corresponding molecules. But given the very large number of degrees of freedom involved, this cannot be undertaken without making a number of approximations. A useful strategy (Gilson et al., 1997) involves identifying the lowest energy conformations of the free and complexed molecules by exploring a limited yet relevant subset of the degrees of freedom. Since these conformations provide the predominant contributions to the configuration integrals, which yield the corresponding free energies, these integrals can be approximated by a discrete sum over these conformations.
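The discrete-sum approximation mentioned above can be illustrated with a short sketch. It assumes that each low-energy conformation i is represented by a single energy E_i, so the width of each energy well is ignored; the function names, the gas-constant units and the default temperature are illustrative assumptions, not taken from the paper.

```python
import math

# Gas constant in kcal/(mol K); the choice of units is an assumption.
K_B = 0.0019872041


def free_energy_from_conformers(energies, temperature=300.0):
    """Approximate F = -kT ln(sum_i exp(-E_i / kT)) over discrete conformers."""
    kt = K_B * temperature
    e_min = min(energies)
    # Shift by the lowest energy so the exponentials cannot overflow.
    z = sum(math.exp(-(e - e_min) / kt) for e in energies)
    return e_min - kt * math.log(z)


def populations(energies, temperature=300.0):
    """Boltzmann population of each conformer in the discrete sum."""
    kt = K_B * temperature
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]
```

With two conformers of equal energy, for example, each has a population of 0.5 and the free energy lies kT ln 2 below that common energy.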

With the ultimate goal of rationalizing in quantitative terms the difference between the catalytic parameters of barnase for GpN and GpNp, we undertake a systematic analysis of the conformational properties of the free nucleotides GpA and GpAp in aqueous solution using MD simulations. In particular, we set out to identify and compare the preferred conformations of these molecules as well as to map out the internal degrees of freedom that govern their conformational flexibility.

Nucleotide conformations are characterized by specific ranges of the torsion angles. The angles (β, γ, δ, ε) govern the sugar-phosphate backbone structure, (ζ, α) define the phosphodiester conformation, (χ) defines the relative sugar-base orientation, and the pseudorotation phase angle (P) describes the sugar pucker (Figure 1; Saenger, 1984). The sugar puckering, in particular, plays a crucial role in determining the allowed conformational states, and in defining the observed structural variability of nucleotides (Lavery & Hartmann, 1994).

Extensive surveys of nucleotide crystal structures (Berman et al., 1992) yielded valuable information on the geometric properties of these molecules (Gelbin et al., 1996; Louise-May et al., 1996; Saenger, 1984). A number of studies were also devoted to relating the nucleotide torsion angle values to conformational features, and seeking correlations between the values of different torsion angles (Sundaralingam, 1969; Arnott & Hukins, 1969; Sarma et al., 1974; Yathindra & Sundaralingam, 1974; Berman et al., 1978; Kitamura et al., 1981; Gelbin et al., 1996). The picture emerging from these analyses has been that conformational changes in nucleotides are produced by concerted variations of specific torsion angles.

Figure 1. (a) Atomic and torsion angle notation for GpA (left) and GpAp (right) dinucleotides. (b) X-ray structure of ApA showing a parallel arrangement of the bases; the ribose rings point in the same direction (O4′ upward; fragment of (5′-R(*AP*AP*A)-3′); NDB code: URC002). (c) X-ray structure of UpA showing an antiparallel arrangement of the bases; the ribose rings point in opposite directions ((5′-R(*UP*A)-3′); NDB code: URB008). For the purine bases (G, A), the reference frame is defined by the bond vectors N9→C8 and N9→C4. The normal vector to the purine base ring is defined as the normalized cross product between the two bond vectors: (N9→C8) × (N9→C4). We obtain for (b) two approximately parallel normal vectors pointing upward. For the pyrimidine bases (U, C), the reference frame is defined by the bond vectors N1→C6 and N1→C2; the normal vector to the plane of the pyrimidine bases is calculated from the normalized cross product between these two bond vectors: (N1→C6) × (N1→C2). We obtain for (c) two approximately antiparallel normal vectors: the one located on the adenine ring points downward, and the one located on the uracil ring points upward.
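The base-normal construction described in the caption of Figure 1 can be sketched in a few lines. This is an illustration only: the function names are hypothetical, coordinates are plain (x, y, z) tuples, and the base-base orientation is assumed here to be the angle between the two unit normals.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def base_normal(origin, first, second):
    """Unit normal (origin->first) x (origin->second).

    For a purine: origin = N9, first = C8, second = C4.
    For a pyrimidine: origin = N1, first = C6, second = C2.
    """
    n = _cross(_sub(first, origin), _sub(second, origin))
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)


def normal_angle(n1, n2):
    """Angle in degrees between two unit normals (0 parallel, 180 antiparallel)."""
    dot = sum(a * b for a, b in zip(n1, n2))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(dot))
```

In this convention, two stacked bases whose normals point in opposite directions give an angle close to 180°, as in panel (c) of the figure.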


From the structural data presented above, it appears that productive binding to barnase requires the bound nucleotide to adopt an extended conformation in which adjacent bases interact with protein residues, but not with each other. The free nucleotide may, on the other hand, adopt more compact conformations in which stacking between adjacent bases is observed. This may be a limiting factor for barnase catalysis, as the nucleotide would have to undergo a conformational change to bind productively.

To address these issues, our study generates two independent 1.3 ns (1 ns = 10⁻⁹ seconds) molecular dynamics trajectories of GpA and GpAp, respectively, in the presence of explicit solvent and counter ions. These trajectories are analysed in order to determine the conformational preferences of the two molecules, particularly with regard to extended versus compact conformations. In addition, we identify the set of dihedral angles, and specific hydrogen bonds, that are the major players in defining these two types of conformations and the transition between them. This is done by applying automatic clustering procedures to define families of similar conformations, based on the values of the nucleotide torsion angles and the sugar puckering modes, and by investigating the parameters that characterize the different families. The considered parameters include torsion angles, the distance and relative orientation between the bases, the solvent accessibility, and the intramolecular hydrogen bonds. To establish the degree of correlation between some of these parameters, multivariate statistical techniques such as multiple regression and discriminant analyses are used. This approach complements recent studies on the base stacking free energy profiles for all 16 pairs of natural ribonucleotide monophosphates (Norberg & Nilsson, 1995) and on the conformational analysis of ApApA (Norberg & Nilsson, 1996). It furthermore provides interesting clues on the influence of a single additional 3′ phosphate on the conformational properties of a dinucleotide.


Results

General conformational trends of the dinucleotides

To study the general trends displayed by the conformations of GpA and GpAp in our simulations we monitored several key parameters. These are the base-base distance (dBB), the relative orientation between the bases (θBB), the surface area accessible to solvent of the entire molecule (ASAT) and the surface area buried due to the interaction between the two bases in the nucleotide (BSABB).

GpA

The base-base distance (dBB) profile along the trajectory (Figure 2a) displays three distinct periods, I (t = 0–472 ps), II (t = 472–977 ps) and III (t = 977–1320 ps). I and III involve extended conformations whereas II involves folded structures.

The significant distance fluctuations in Figure 2a (~2 Å) suggest that none of these periods is characterized by a single molecular conformation, but rather by a family of conformations. The very beginning of period I (Ia, t = 0–21 ps) features semiextended structures, whereas the end of period I (Ib, t = 324–472 ps) and the beginning of period III (IIIa, t = 977–1052 ps) correspond to transition periods representing the "folding" and "unfolding" processes of the dinucleotide, respectively. The time interval for the folding process is longer than for unfolding, suggesting that the latter may involve a lower energy barrier.

The behaviour of the relative orientation of the two bases, defined by the angle between the normals to the plane of the bases (θBB), is depicted in Figure 2c. This angle also varies in roughly three stages. In particular, folded conformations (dBB ≈ 5 Å) are seen to display θBB values of ~170°. Such values correspond to an antiparallel base stacking arrangement, in which the normals to the planes of the G and A bases point in opposite directions (see Figure 1). We see moreover that transitions to and from the folded structures involve large fluctuations in θBB, whereas during the period corresponding to folded structures the variations of this angle are more limited.

Figure 2. Structural characterization of the bases relative to each other. Base-base distance (dBB): (a) GpA (plain curve calculated from the MD trajectory, dotted curve predicted from multiple regression). (b) GpAp (plain curve calculated from the MD trajectory, dotted curve predicted from multiple regression). Base-base orientation (θBB): GpA (c) and GpAp (d). A different trend to folding of GpAp relative to GpA is shown. Thus, for GpA, dBB delimits three regions: I (t = 0–472 ps, open structures), II (t = 472–977 ps, closed structures), and III (t = 977–1320 ps, open structures). In contrast to GpA, the GpAp closed structures appear in a relatively short period (t = 456–474 ps). Plot a also reveals the transition periods to folding, Ib (t = 324–472 ps), and unfolding, IIIa (t = 977–1052 ps), and a subregion of semiextended structures (Ia: t = 0–21 ps), for GpA. c illustrates that guanine and adenine bases display an antiparallel stacking for GpA along the folded range, whereas d shows that these bases fluctuate about an orthogonal orientation for GpAp.

Figure 3a displays the total accessible surface area (ASAT) of GpA along the trajectory. It features three regions, which correspond to the same conformational subsets as those described by the variations of dBB: I (unfolded), II (folded) and III (unfolded). There is moreover a good temporal overlap between the two subregions in Figures 2a and 3a, which correspond to the folding/unfolding transitions. This is reflected by the high correlation coefficient between the dBB and ASAT values along the trajectory (R = 0.96).

The relationship between ASAT and dBB is depicted in Figure 4, which displays data points from the simulation together with the predicted regression line and the lower (L95) and upper (U95) confidence limits for individual predicted values. Interestingly, this relationship is roughly linear, suggesting that the bases get independently solvated as soon as they separate. This occurs as a result of a complex movement, which combines base separation with base rotation. Furthermore, whereas only three ASAT values are lower than L95, a significant number of points with dBB in the range 6–12.5 Å lie above U95. The molecular structures corresponding to these outliers have ASAT values much larger than expected from their dBB. We find a connection between these unexpectedly large ASAT values and sharp changes in θBB.
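A regression of ASAT on dBB with limits for individual predicted values, as described above, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, and the normal quantile 1.96 is used in place of the Student t quantile, which is only a reasonable assumption when the number of trajectory frames is large.

```python
import math


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, with Pearson R."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return {"slope": slope, "intercept": intercept,
            "r": sxy / math.sqrt(sxx * syy),
            "s": math.sqrt(resid / (n - 2)),  # residual standard error
            "n": n, "mean_x": mean_x, "sxx": sxx}


def prediction_limits(fit, x0, z=1.96):
    """Approximate lower/upper 95% limits for an individual value at x0."""
    y0 = fit["intercept"] + fit["slope"] * x0
    spread = math.sqrt(1 + 1 / fit["n"] + (x0 - fit["mean_x"]) ** 2 / fit["sxx"])
    half = z * fit["s"] * spread
    return y0 - half, y0 + half


def points_above_upper_limit(fit, x, y):
    """Indices of points lying above the upper limit (candidate outliers)."""
    return [i for i, (xi, yi) in enumerate(zip(x, y))
            if yi > prediction_limits(fit, xi)[1]]
```

Points returned by `points_above_upper_limit` would correspond to structures whose accessible surface is larger than expected from their base-base distance.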

In the base-base buried surface area profile (BSABB, Figure 3c) the GpA trajectory is also divided into different regions, which are clearly related to those discussed above. We see a central region where BSABB > 70 Å², which is flanked on both sides by regions where BSABB reduces essentially to zero. Comparison of Figure 3c and a shows that base-base interactions alone account for essentially all the fluctuations in the accessible surface area of the entire molecule. We also see a correspondence between semiextended structures (t = 0–21 ps; Figure 2a) and conformations with BSABB < 20 Å² (Figure 3c).

Figure 3. Solvent interaction. Total accessible surface area (ASAT): GpA (a) and GpAp (b). Base-base buried surface area (BSABB): GpA (c) and GpAp (d). A general inverse correspondence between ASAT and BSABB is observed. GpA and GpAp show a very different solvent-interaction profile. For GpA, a central folded domain lasting ~500 ps between two extended regions is discernible (a and c). For GpAp, a peak lasting only 18 ps at about 450 ps is found (b and d). Moreover, the difference in BSABB (GpA reaches 150 Å² whereas the GpAp maximum is 100 Å²) makes evident a difference in stacking. Also, we see for GpAp a loss in ASAT at about 1061 ps with no effect on BSABB.

Figure 4. Total accessible surface area (ASAT) versus base-base distance (dBB) for GpA. Regression line and lower (L95) and upper (U95) 95% confidence limits (in bold) are displayed.

Thus, all three parameters monitored along the trajectory (ASAT, dBB, and BSABB) consistently describe folding and unfolding transitions for GpA, occurring at t = 324–472 ps and t = 977–1052 ps, respectively.

GpAp

The behaviour of the base-base distance dBB during the GpAp simulations (Figure 2b) is significantly different from that of GpA. The time period featuring folded structures in GpAp is very short (t = 456–474 ps). Its length is a mere 3.6% of the lifetime of the folded GpA conformations (Figure 2a). In contrast to the reasonably well stacked GpA bases in the folded form, the GpAp bases fluctuate about an orthogonal arrangement, with θBB values close to 108° (Figure 2d). This, together with the dBB profile, suggests that the additional 3′ phosphate hampers the formation of stable base stacking in this dinucleotide.

The GpAp ASAT profile is shown in Figure 3b. The ASAT values are obviously larger for GpAp than GpA due to the additional phosphate. As expected, the total accessible surface area of the dinucleotide decreases with the shortening of dBB (t = 456–474 ps; Figure 2b). Interestingly, we observe here too a trough in the ASAT values, which is not related to dBB variations, occurring later during the simulation (t = 1061 ps).

The GpAp base-base buried surface area (BSABB) peaks between t = 456 and 474 ps (Figure 3d), the period corresponding to the folded dinucleotide conformations. Interestingly, the corresponding BSABB value is significantly smaller (~100 Å²) than for GpA (~140 Å²), most probably due to the poorer base-base stacking in the larger dinucleotide, mentioned above. The trough in ASAT near t = 1061 ps, which was not related to a shortening in dBB, is not accompanied by a peak in BSABB. This suggests that some dinucleotide conformations manage to decrease their accessible surface area to solvent in the absence of base stacking, as will be discussed below.

Behaviour of torsion angles

GpA

Figure 5a illustrates the variation of all the GpA torsion angles (Figure 1) including the pseudorotation phase angle, P, which characterizes the sugar puckering. Most of these angles vary so extensively that it is meaningless to compute their mean values across the entire trajectory. We therefore focused on analysing their interdependence. Figure 5a displays this interdependence for some of these angles, for instance, the pseudorotation phase angle of ribose 1 (Gp) and the guanine glycosidic (Gχ) angle. Thus, Gp starts at 144.6° and fluctuates around a mean value of 140.4° for 287 ps. This dihedral angle corresponds to 2T1 (a twisted form between C2′-endo and C1′-exo), S puckering. At about this time a transition to lower values takes place and Gp varies around a mean value of 12.9° (C3′-endo, N puckering) till t = 1067 ps. Then a second transition to higher values occurs, and Gp returns to the former 2T1 puckering, with a mean value of 145.8°. Figure 5a also shows that transitions from the S to N pucker modes, or vice versa, occur via E intermediates, in agreement with previous theoretical and experimental data (Olson & Sussman, 1982).
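The N, S and E labels used above refer to regions of the pseudorotation cycle. The sketch below assigns a label from the phase angle P, assuming four 90°-wide quadrants centred on 0° (N), 90° (E), 180° (S) and 270° (W); this partition and the function names are assumptions made for illustration, and narrower windows are also in common use.

```python
def pucker_quadrant(p_deg):
    """Label a pseudorotation phase angle (degrees) as N, E, S or W."""
    p = p_deg % 360.0  # bring the angle into [0, 360)
    if p >= 315.0 or p < 45.0:
        return "N"  # around C3'-endo
    if p < 135.0:
        return "E"  # around O4'-endo
    if p < 225.0:
        return "S"  # around C2'-endo
    return "W"


def north_fraction(p_values):
    """Fraction of frames whose sugar pucker falls in the N quadrant."""
    labels = [pucker_quadrant(p) for p in p_values]
    return labels.count("N") / len(labels)
```

Under this partition the mean Gp values quoted above, 140.4° and 12.9°, are labelled S and N respectively.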

The Gχ values are linked to the pucker variation. We see indeed that the S pucker is accompanied by mean Gχ values of 269.2° and 245.5°. In contrast, for the N puckering the mean Gχ value is 183.5°. We furthermore see that the glycosidic angle displays much steeper changes in the S than in the N puckering modes. Table 1 (upper diagonal) shows the matrix of the correlation coefficients for the GpA torsion angles. The correlation between the Gp and Gχ angles is characterized by a reasonable coefficient of 0.79.

Table 1. Correlation coefficients for GpA (upper diagonal) and GpAp (lower diagonal) torsion angles

R      Gδ     Gε     Gζ     Gγ     Gχ     Gp     Aδ     Aε     Aζ     Aα     Aβ     Aγ     Aχ     Ap
Gδ   1.00   0.63  −0.39   0.24   0.75   0.94  −0.22    ---    ---  −0.50   0.03  −0.16  −0.25  −0.23
Gε   0.69   1.00  −0.42   0.32   0.47   0.64  −0.28    ---    ---  −0.12   0.05  −0.28  −0.09  −0.33
Gζ   0.19   0.33   1.00  −0.74  −0.58  −0.43   0.28    ---    ---  −0.01  −0.26   0.76   0.08   0.39
Gγ   0.19   0.28   0.21   1.00   0.42   0.27  −0.12    ---    ---   0.30   0.20  −0.84  −0.05  −0.25
Gχ   0.37   0.07  −0.04  −0.11   1.00   0.79  −0.25    ---    ---  −0.30   0.03  −0.34  −0.21  −0.31
Gp   0.95   0.69   0.22   0.17   0.41   1.00  −0.23    ---    ---  −0.51   0.03  −0.17  −0.22  −0.25
Aδ  −0.10  −0.08  −0.22  −0.11  −0.06  −0.10   1.00    ---    ---  −0.01  −0.10   0.10   0.10   0.91
Aε   0.04   0.03   0.13   0.17   0.02   0.02  −0.03   1.00    ---    ---    ---    ---    ---    ---
Aζ   0.56   0.65   0.62   0.15   0.34   0.62  −0.07   0.07   1.00    ---    ---    ---    ---    ---
Aα  −0.19  −0.08   0.09   0.13   0.07  −0.18   0.03   0.17   0.05   1.00   0.02  −0.46   0.02  −0.09
Aβ  −0.11  −0.09   0.08   0.03  −0.05  −0.10   0.03   0.12   0.10   0.05   1.00  −0.27  −0.01  −0.12
Aγ  −0.13  −0.20  −0.29  −0.03   0.02  −0.13  −0.01  −0.08  −0.15  −0.42  −0.04   1.00   0.14   0.24
Aχ  −0.07  −0.20  −0.31  −0.48   0.26  −0.02   0.23  −0.05  −0.01   0.03   0.02   0.08   1.00   0.11
Ap  −0.14  −0.10  −0.22  −0.09  −0.09  −0.13   0.67  −0.08  −0.13   0.05  −0.01   0.03   0.18   1.00

In bold, those correlation coefficients higher than 0.60.

The puckering dynamics of the adenosine and guanosine sugars are completely different. The pseudorotation angle Ap lies predominantly in the south region with a short incursion in the north region (⟨Ap⟩ = 19.9° at t = 202–268 ps). In contrast to guanosine, the correlation coefficient between Ap and the glycosidic angle Aχ is very low (R = 0.11). Gγ and Aγ are significantly anti-correlated (R = −0.84). The Gγ angle visits the +synclinal (+sc), −synclinal (−sc) and antiperiplanar (ap) regions. The latter region is not adopted by Aγ. Gζ displays a high degree of flexibility: starting at 217.9°, it spends 54 ps in the ap region, then returns to the +syn region, where it stays for 233 ps. Its value increases subsequently, reaching the −sc interval, where it remains till t = 430 ps. After that, it crosses the synperiplanar (sp) sector, moving once more to the +sc region, where it remains till the end of the simulation, with only a short visit to the +anticlinal (+ac) region.
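The region labels used in this section (sp, ±sc, ±ac, ap) are those of the Klyne-Prelog convention for torsion angles. A minimal sketch of the mapping is given below, assuming the standard 60°-wide sectors; the boundaries actually used by the authors are not stated in this section, and the function name is hypothetical.

```python
def klyne_prelog(angle_deg):
    """Map a torsion angle in degrees to sp, +sc, +ac, ap, -ac or -sc."""
    a = (angle_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    mag = abs(a)
    if mag <= 30.0:
        return "sp"  # synperiplanar, around 0 degrees
    if mag >= 150.0:
        return "ap"  # antiperiplanar, around 180 degrees
    sign = "+" if a > 0 else "-"
    # synclinal covers 30-90 degrees, anticlinal covers 90-150 degrees
    return sign + ("sc" if mag < 90.0 else "ac")
```

Angles may be given on either the 0° to 360° or the −180° to 180° scale, since the function wraps its input before classifying it.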

The other P-O torsion angle Aα also explores a significant portion of conformational space, with values in the +sc, ap, −ac, −sc, and +ac regions. Gε spreads along the ap, −ac and −sc regions, avoiding completely the first quadrant. Aβ is confined to the anti region. The highest correlation coefficients correspond to the pairs Gδ-Gp (R = 0.94) and Aδ-Ap (R = 0.91), as expected from the central position of the δ bond in the ribose ring.
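Pairwise coefficients such as those in Table 1 can be computed from the torsion-angle time series as sketched below. The sketch assumes a plain Pearson coefficient applied directly to the angle values; how the periodicity of the angles was handled is not stated in this section. The function names, and the reading of the 0.60 threshold as an absolute value, are likewise assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def strong_pairs(series, threshold=0.60):
    """Return {(name_a, name_b): R} for pairs with |R| above the threshold.

    `series` maps an angle name to its list of values along the trajectory.
    """
    result = {}
    for a, b in combinations(series, 2):
        r = pearson(series[a], series[b])
        if abs(r) > threshold:
            result[(a, b)] = r
    return result
```

A call such as `strong_pairs({"Gd": [...], "Gp": [...]})`, with one list of angle values per torsion, returns only the strongly coupled pairs.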

GpAp

The behaviour of the GpAp torsion angles is depicted in Figure 5b. The guanosine ribose pucker Gp behaves similarly to that in GpA, although the proportion of values in the N pucker mode is lower here than in GpA. For the adenosine ribose pucker, we see that, in contrast to GpA, no N Ap puckering is observed.

The distribution of Gχ is qualitatively similar to that in GpA. Once again, the highest values of Gχ (−sc region) appear when the guanosine ribose adopts the S puckering mode. Interestingly, the correlation between Gχ and the pseudorotation angle Gp is lower (R = 0.41, Table 1, lower diagonal) than for GpA (R = 0.79, Table 1, upper diagonal). On the other hand, a clear difference is found for the adenosine glycosidic angle. In GpAp, Aχ spends a long time in the +sc region whereas in GpA, Aχ never visits the first quadrant. The correlation coefficient between Aχ and Ap (R = 0.18) is similar to that in GpA (R = 0.11).

Central C4′-C5′ torsion angles (γs) are more constrained for GpAp than for GpA. In contrast to GpA, neither Gγ nor Aγ is found in the +sc sector. The correlation between Gγ and Aγ is also very different between GpAp (R = −0.03) and GpA (R = −0.84). For Gζ, the main difference between GpA and GpAp is that GpA makes a complete crossing through the synperiplanar sector, whereas this region seems to be forbidden for GpAp in the simulation.

The presence of the second phosphate adds a new phosphoester angle Aζ. This angle is mostly confined to the +sc sector, but remains quite flexible and takes nearly all values. Aα is more constrained for GpAp than for GpA, and in contrast to GpA, the −ac or −sc regions are not observed at all. On the other hand, the Aβ and Gε torsion angles behave similarly in GpAp and GpA. Interestingly, we observe a swing in Aβ, moving from ap to higher values at about the same time as the late decrease in ASAT, which is independent of base-base interaction (Figure 3b, t = 1061 ps). Visual inspection of the corresponding molecular conformations revealed the presence of a hydrogen bond between guanine N7 and the hydroxyl group of the terminal phosphate. This H-bond produces a macro-ring involving the guanine base, ribose 1, the inner phosphate, ribose 2 and the terminal phosphate moieties, which results in a decrease in the solvent accessibility of the molecule. Thus, the decrease in ASAT can be produced either by base-base or by base-phosphate interactions. Furthermore, we see (Table 1, lower diagonal) that Aε is not correlated with Ap (R = −0.08) whereas a relatively high correlation coefficient (R = 0.69) is computed for the pair (Gε, Gp). In a similar way, the correlation coefficient for the pair (Aδ, Ap) is much lower (R = 0.67) than that for the pair (Gδ, Gp) (R = 0.95). Comparison of Table 1 upper and lower diagonals reveals that, in general, the correlation coefficients between dihedral angles are lower for GpAp than for GpA. We also see that the second phosphate influences the guanosine conformations through a correlation between Aζ and {Gε (R = 0.65), Gζ (R = 0.62) and Gp (R = 0.62)}.


Figure 5. GpA (a) and GpAp (b) torsion angle evolution along the simulation. G denotes guanosine and A denotes adenosine (see Figure 1 for angle notation).


Multivariate analysis of the GpA and GpAp trajectories: conformational families

To characterize the dinucleotide conformations generated in the simulations, a cluster analysis was performed on the basis of their torsion angle values (see Methods). In addition, the level of structural similarity within and between the generated clusters was assessed using structure superpositions (McLachlan, 1979) and computing the resulting root mean square (RMS) deviations of the atomic coordinates. To assess structural similarity between clusters, we first computed the mean conformation of each family by superimposing all the structures of a given family onto one of its members, and computing the resulting average atomic

Figure 6. Distribution of GpA (a) and GpAp (b) cluster partition along the trajectory (19 and 25 clusters for GpA and GpAp, respectively, are included; see Methods).


coordinates. Then RMS deviations between the mean structures of all the clusters were evaluated.
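The superposition, averaging and RMS steps described above can be sketched as follows. This is only an illustration under stated assumptions: the least-squares fit uses the SVD-based Kabsch construction rather than the McLachlan (1979) procedure cited in the text, all atoms are weighted equally, and the function names are hypothetical.

```python
import numpy as np

def superpose(mobile, target):
    """Least-squares fit of `mobile` onto `target` (both N x 3 arrays)."""
    mob_c = mobile - mobile.mean(axis=0)
    tgt_c = target - target.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix (Kabsch).
    u, _, vt = np.linalg.svd(mob_c.T @ tgt_c)
    d = np.sign(np.linalg.det(u @ vt))           # guard against improper rotations
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rotation + target.mean(axis=0)

def rmsd(a, b):
    """Root mean square deviation between two N x 3 coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def mean_structure(conformers, reference_index=0):
    """Average coordinates after fitting every conformer onto one member."""
    reference = conformers[reference_index]
    fitted = np.array([superpose(c, reference) for c in conformers])
    return fitted.mean(axis=0)

def cluster_rmsd_matrix(clusters):
    """Pairwise RMS deviations between the mean structures of all clusters."""
    means = [mean_structure(c) for c in clusters]
    n = len(means)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            value = rmsd(superpose(means[i], means[j]), means[j])
            matrix[i, j] = matrix[j, i] = value
    return matrix
```

Here `clusters` is assumed to be a list in which each entry holds the conformers (N x 3 coordinate arrays with identical atom ordering) of one family. The resulting symmetric matrix is the kind of quantity displayed as a density plot in Figure 7.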

GpA

The clustering procedure yields a total of 19 families, or clusters, for GpA, denoted C1–C19. Table 2A shows the mean value of the torsion angles defining each family of conformations,

Figure 7. Density plot for RMS deviations between all GpA (a) and GpAp (b) cluster average structures. Darker squares arise from smaller values. For GpA, clusters {C9, . . . , C15} constitute a group of resembling structures. Clusters {C10, . . . , C15} correspond to folded structures whereas C9 contains the last part of the folding transition (see Figures 6a, 2a, 3a and 3c). For GpAp, it is apparent that cluster 11, which includes the folded interval, divides the plot into two separate regions.

Figure 6a depicts how the families are distributed along the trajectory, and Figure 7a illustrates their RMS deviations.

We will start by commenting on the conformational resemblance of the different families, and follow with a more detailed description of their salient features.

Figure 7a shows that the largest RMS value (4.9 Å) is obtained for the C4, C12 pair, whereas the smallest value (0.3 Å) is that for the C14, C15 pair. From this plot, a region of closely resembling families spanning the clusters C9–C15 clearly stands out. As seen from Figures 2a and 6a, these clusters delimit the conformational subspace that defines the folded structures. More specifically, all the clusters in this group, except C9, correspond to folded conformations; C9 contains transient conformations leading to folded structures.

To evaluate the significance of these structural differences we compared them to the structural dispersion of the conformers within each family. This dispersion was evaluated from the pairwise RMS deviations between conformers, following which Tukey's test (Spjøtvoll & Stoline, 1973) was applied (see Methods). This revealed that clusters C4, C5 and C6 and clusters C13, C14 and C15, respectively, displayed no significant structural differences. These clusters were consequently grouped into single classes, reducing the total number of distinctly different structural families to 15. The mean structures of the most populated clusters, representing respectively the folded and unfolded states, are displayed in Figure 8(a). The values of the torsion angles of the mean structures of all 15 classes are listed in Table 2A.

Let us now turn to the conformational features of the families. The C1–C3 and C4/C5/C6 families, which appear successively during the first 287 ps, are associated with the S pucker of the guanosine ribose (Gp) and −sc values for Gw. During this time interval, changes in some specific angles are observed. C1 lasts 20.5 ps and corresponds to the previously discussed region of dBB (Figures 2a and 3c, respectively). C2 contains smaller average Gw values near the ap sector together with an increase of Ge. C3 appears after a concerted turn of Gz and Gw from ap to +sc and −sc, respectively. C4 and its associated clusters C5 and C6 follow after a change in Aa from +sc to ap together with a change in Ad and consequently in the puckering of adenosine ribose. We see that Ap is quite stable in the south region with only a short incursion in the north (C4). Relative to C4, C6 is characterized by an increase in Aa, together with the return of adenosine ribose to south puckering.

C7 begins at t = 287 ps after a change in Gp and the associated Gd angle to N puckering. This variation is related to a concomitant shift in Gz and Gw towards the −sc and ap sectors, respectively. C8 arises from concerted changes in Ge, Aa and Ab. C9 results from the Gz crossing of the sp sector towards the +sc region. C10 arises from a 90° turn of Gg to the −sc sector. C11 results after a 90° turn of Ag towards the +sc sector. Ag remains at this value till the end of the simulation.

During the 545–1067 ps interval, the dinucleotide fluctuates between several conformations (Figure 6a). These fluctuations are caused by changes in Aa: C11, C13 and C14 fluctuate around Aa = 90° whereas C12 and C15 vary around 150°. C13, C14 and C15 are characterized by a common mean value of Gg around 180°. C13 differs from C14 in Aa and Gw. C13 contains the lowest values of Aa and Gw in this part of the dynamics. One of the structures in the C13 family has the lowest Gw value in the entire simulation (106.0° at t = 718 ps). Around the same time we also observe a decrease in BSABB (Figure 3c).

The C16–C19 conformations appear after the guanosine ribose pucker changes from north to south, which occurs at 1068 ps. The corresponding conformations also display higher Ge values than those encountered in the C9–C15 families. C17 is mainly determined by the Gz and Aa peaks. C18 is defined by high values of Gw. Finally, C19 is characterized by an antiperiplanar value of Gg.

It is noteworthy that we observe a relationship between the guanosine ribose puckering

Table 2. Conformational families for MD simulation

A. GpA MD simulation

 C     %     Gd     Ge     Gz     Gg     Gw
 1   1.6  133.7  214.8  186.5   55.8  278.9
          133.1  210.6  186.5   60.3  279.2
 2   2.5  129.7  268.5  178.7   55.0  192.8
          126.6  273.5  175.6   52.7  189.9
 3  11.2  130.6  238.1   74.2   54.5  277.7
          133.7  235.8   69.6   63.1  279.9
 4   5.0  131.0  258.1   79.6   56.8  284.8
 5   0.5  111.7  217.8  139.2   55.4  244.7
 6   0.9  121.1  268.0   86.9   55.4  288.3
          129.1  258.2   80.4   62.9  282.8
 7   9.5   76.6  230.0  270.0   47.9  200.1
           78.5  244.1  267.6   41.2  200.0
 8   1.3   78.6  203.4  287.0   56.4  186.4
           79.6  206.7  282.7   51.2  188.1
 9   4.8   78.3  174.4   48.2   55.5  186.0
           78.6  172.4   49.9   59.8  187.9
10   3.9   76.3  184.9   54.1  −66.2  176.0
           76.3  178.8   58.2  −65.1  175.9
11  18.2   76.3  185.8   53.1  −64.9  178.9
           75.5  179.5   57.7  −63.6  178.9
12  12.3   77.6  200.8   52.9  −65.4  185.4
           76.8  198.2   61.2  −65.0  185.5
13   5.0   78.8  185.2   54.9 −177.6  166.9
14   2.2   75.8  187.2   44.2 −177.8  182.1
15   1.8   78.3  198.2   50.9 −176.0  184.0
           76.7  178.4   55.7 −181.8  175.1
16   9.4  133.5  255.7   87.5  −69.5  221.5
          127.4  268.0   80.5  −59.8  216.8
17   0.8  127.3  263.1  139.2  −68.1  213.2
          128.2  266.2  135.9  −75.2  213.0
18   7.7  138.3  214.4   70.8  −68.9  282.3
          138.3  209.9   67.6   63.5  283.0
19   1.3  139.8  281.4   99.4 −170.0  220.2
          140.5  281.7  100.0 −177.9  219.0

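In plain, torsion angle mean values from statistical cluster analysis. In italics (the rows without a cluster number), torsion angles for average structures from the RMS procedure. Conformations C4, C5, C6, C13, C14 and C15, which are constructed as separate classes within the cluster analysis, are grouped into two sets after using the RMS procedure.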

and the folding/unfolding process. From our MD simulation, it appears that a change from S to N is required to initiate the folding process, whereas an inverse change characterizes the process of unfolding.

GpAp

The mean torsion angle values of the GpAp conformational families and the temporal sequence of appearance of these families are given in Table 2B and Figure 6b, respectively. The RMS deviations between the average conformations of these families are represented in Figure 7b. These data, together with an analysis of the time-dependent behaviour of the torsion angles and sugar pucker modes (not shown), allow us to gain insight into some important structural properties of this dinucleotide.

We see first of all that the number of structural clusters identified for GpAp (a total of 25) is larger than that for GpA, indicating that the addition of a mere 3′ phosphate increases conformational diversity. Tukey's test indeed confirms that most of these families represent statistically distinct clusters. The exceptions are the cluster pairs C9, C10;

Table 2A (remaining columns)

 C     Gp     Ad     Aa     Ab     Ag     Aw     Ap
 1  145.7  143.6   58.6  161.1  −69.2  185.6  160.1
    141.5  148.3   58.2  164.3  −68.5  186.0  161.8
 2  141.8  140.8   74.1  170.1  −64.3  203.0  157.2
    137.2  144.3   68.9  171.7  −63.1  202.0  158.3
 3  141.2  134.4   72.1  184.0  −67.0  202.9  147.8
    140.6  140.4   72.0  186.2  −68.1  199.8  150.1
 4  140.4   79.8  154.4  199.6  −61.1  188.1   19.9
 5  118.8  127.2  266.6  165.8  −68.5  215.0  137.8
 6  130.3  146.9  204.0  203.3  −60.5  219.6  166.5
    133.9   78.7  178.3  198.4  −60.3  198.5   46.3
 7   15.1  138.1  275.1  184.8  −67.2  211.0  152.6
      8.5  141.2  284.5  179.8  −72.9  204.5  150.8
 8   11.1  135.3  185.2  249.7  −59.1  194.4  146.5
      6.5  143.5  186.0  256.4  −60.9  193.7  151.2
 9    1.2  144.7  217.7  179.6  −63.2  191.5  161.8
      3.2  147.8  213.2  182.5  −64.9  190.3  160.2
10   12.6  142.0  248.8  180.9  −63.7  191.1  157.2
     15.0  144.8  246.0  181.8  −64.5  190.2  156.8
11   15.5  139.5  100.5  184.0   59.8  223.9  163.5
     16.1  144.2   94.0  186.8   60.7  222.5  166.0
12   14.4  131.3  159.5  143.1   47.6  211.5  145.5
     14.9  136.2  158.8  136.0   43.8  212.3  147.8
13    5.2  136.4   91.8  188.6   61.8  194.9  158.7
14   16.2  142.4  115.2  179.1   56.7  210.4  173.3
15   15.0  132.7  153.8  146.8   48.1  197.9  149.9
     10.2  142.4   97.5  185.2   60.5  199.6  165.1
16  143.5  131.4   76.6  171.9   54.8  195.5  149.6
    134.7  134.4   84.4  167.4   56.5  192.9  148.5
17  136.8  136.6  148.1  153.2   49.7  205.1  154.8
    133.3  139.2  149.4  152.7   46.1  203.9  153.4
18  148.4  140.7   74.2  159.3   52.6  192.9  166.3
    146.7  144.0   72.5  159.2   51.1  193.2  167.7
19  152.5  109.0   93.8  208.3   58.7  188.8  113.8
    150.3  113.3   91.7  205.7   59.0  187.4  118.2



Table 2 (continued)

B. GpAp MD simulation

 C     %     Gd     Ge     Gz     Gg     Gw     Gp     Ad     Ae     Az     Aa     Ab     Ag     Aw     Ap
 1   1.5  130.0  270.2  278.3  290.6  214.5  139.5  141.5  267.8  107.0  177.2  183.1  192.1   59.9  156.8
          130.2  273.8  278.5  295.8  212.4  136.7  144.4  266.3  124.1  170.1  186.3  196.1   58.1  152.3
 2   7.6  133.6  268.1  243.4  290.3  195.8  143.9  137.7  261.7   72.4   71.0  171.3  289.5   63.9  147.0
          132.3  279.2  244.1  284.8  190.5  141.4  140.7  264.4   64.7   54.0  185.6  295.5   59.0  144.9
 3   4.7  133.5  256.4  189.0  292.7  295.4  143.8  138.2  244.8   68.0   87.1  174.0  293.8   60.4  148.9
          128.8  272.4  179.9  294.9  294.9  133.3  142.1  243.1   74.2   71.3  180.6  294.6   60.4  144.7
 4   1.3  127.7  271.3  258.8  290.1  219.8  134.0  131.5  214.1  167.6   73.5  142.1  288.5   69.3  141.0
          131.8  281.2  257.4  189.4  220.6  135.7  129.4  207.4  171.2   63.6  150.8  287.0   63.7  134.8
 5   2.5  139.8  278.5  200.4  293.9  197.9  151.8  125.4  223.4   73.5   74.5  161.8  293.1   62.9  132.1
          134.9  286.1  210.8  298.1  193.9  146.8  130.1  209.6   80.7   54.7  181.9  301.2   62.7  138.2
 6   2.0  146.7  225.4   98.7  294.0  291.0  157.8  142.2  256.7   69.2   85.5  155.2  296.9   63.9  154.1
          152.0  212.3   96.8  294.0  294.1  159.6  150.8  254.3   66.2   84.7  162.7  300.2   63.9  158.3
 7   2.5  125.3  236.3  163.8  292.0  227.1  133.6  135.1  259.7   72.1   75.2  214.5  296.7   65.6  141.9
          127.4  252.8  157.6  297.8  225.1  132.2  135.3  265.6   70.0   68.7  209.2  293.5   62.5  137.0
 8   8.1  130.2  271.5  123.5  291.7  198.0  139.6  146.2  239.7   69.4   70.9  173.7  293.3   61.6  160.4
          129.4  278.1  115.6  289.6  194.1  136.5  150.6  233.2   61.5   72.2  168.2  290.6   61.7  160.8
 9   0.9  126.1  240.1  198.1  291.9  192.9  133.2  144.5  280.8  −31.2   74.7  170.1  291.3   61.8  156.6
10   0.8  146.2  222.5  177.2  298.3  239.9  158.3  146.5  290.5  −30.6   81.0  174.6  296.7   68.2  163.2
          131.7  254.5  176.9  291.7  212.4  136.9  147.1  287.2  −41.1   58.9  177.0  293.9   59.8  151.2
11   4.9  146.2  284.2   97.1  293.1  197.3  157.4  150.7  233.2 −119.5   54.3  174.1  284.4   59.4  169.3
          148.6  280.1  103.6  301.1  190.7  157.8  157.3  237.2 −121.8   51.0  177.5  283.7   58.1  171.7
12   7.5  144.4  250.3   87.3  290.9  201.8  155.5  138.0  246.0 −283.8   65.1  157.4  290.8   57.6  151.1
          145.5  248.2   95.2  287.9  205.5  155.7  140.3  245.9 −291.4   57.8  160.5  287.9   56.3  145.5
13   5.4  143.4  224.3   73.6  289.8  236.6  156.0  137.7  226.3 −293.5   69.5  159.6  291.1   65.0  150.1
14   1.2  115.6  209.4   78.4  290.4  222.0  120.2  137.3  262.1 −288.2   65.9  139.9  289.5   65.6  148.3
          141.9  205.7   84.3  290.1  234.2  150.4  138.8  218.1 −280.1   53.1  172.3  297.9   60.3  146.3
15  16.5   78.6  207.5   87.0  292.8  192.0   23.0  140.7  256.9 −286.9   89.3  175.1  294.2   63.3  153.1
           80.2  211.6   75.0  297.7  196.5   25.0  149.9  255.3 −291.5   99.7  171.3  291.5   65.0  159.6
16   6.4   75.8  183.9  147.6  292.6  189.8   26.2  140.8  231.1 −293.2   62.0  164.2  289.8   62.2  155.9
17   0.1   74.3  232.9  105.9  290.5  187.3   47.0  129.4  257.4 −311.1  128.4  178.5  219.7   48.5  165.3
           67.2  165.3  161.8  288.2  196.8   28.0  143.8  210.4 −286.2   66.1  169.2  291.2   60.6  154.8
18   4.5   78.9  192.5  164.5  182.7  182.2   14.5  135.8  243.1 −292.8   67.3  179.0  290.1   60.8  147.3
           75.8  191.5  174.6  181.8  184.0   13.8  139.4  244.3 −290.5   55.4  188.5  292.8   58.2  144.8
19   5.0   77.4  207.9   82.5  182.1  188.7   25.3  147.5  218.1 −288.0   74.9  164.0  293.2   63.3  164.7
           79.7  206.2   77.6  180.9  194.8   24.3  150.7  203.3 −285.0   79.1  165.1  296.5   63.3  167.0
20   4.0   78.7  211.4   82.2  292.4  195.0   26.3  156.2  257.6 −287.3  105.6  180.4  298.6  197.9  185.5
           79.0  216.5   75.9  292.9  199.8   24.7  154.1  271.1 −289.2  110.2  174.9  296.2  199.0  181.5
21   2.9   79.2  204.3  102.6  292.4  203.2   37.4  144.6  210.7 −185.5   79.8  179.8  291.9  214.8  158.5
           75.1  198.4  104.0  296.0  201.9   40.0  144.0  205.4 −182.4   77.1  183.1  295.1  216.1  158.4
22   1.5   81.9  215.5   98.4  180.7  204.9   67.3  154.8  224.8 −168.9   62.1  150.9  298.2  243.2  174.2
           81.1  212.6  101.2  183.2  205.4   57.6  155.3  208.8 −172.1   59.0  152.7  296.3  242.3  175.1
23   2.0  130.9  233.8   71.1  182.7  213.7  142.8  145.5  259.0 −140.3   73.5  170.6  287.5  283.2  154.1
          131.7  221.7   78.4  183.8  212.9  140.9  145.3  273.7 −153.5   64.0  176.0  290.2  280.7  148.5
24   2.8  128.2  215.7   71.0  183.9  252.0  137.4  147.5  235.3  −73.1   71.7  178.1  292.9  260.4  159.9
          127.9  204.0   76.1  183.6  252.5  133.9  148.8  232.6  −84.3   66.0  185.0  293.2  259.2  154.3
25   3.3  128.0  216.4   69.8  184.7  307.0  135.9  144.0  237.5  −69.4   69.0  164.4  292.0  240.5  158.6
          135.8  208.8   66.7  178.3  309.7  141.2  149.6  231.7  −68.1   66.8  170.5  292.2  239.1  158.8

In plain, torsion angle mean values from statistical cluster analysis. In italics (the rows without a cluster number), torsion angles for average structures from the RMS procedure. Conformations C9, C10, C16 and C17, which are constructed as separate classes within the cluster analysis, are grouped into two sets after using the RMS procedure.


C13, C14; and C16, C17, respectively, for which the differences in RMS are not significant, reducing the number of families to 22.

The mean structures of the most populated clusters are depicted in Figure 8(b). The values of the torsion angles of the mean structures of all 22 classes are listed in Table 2B. With regard to specific conformational features, we see that the identified families are determined by Gz and Az transitions. Next we observe that while the adenosine ribose puckering is quite stable in the S region, the guanosine ribose changes from S to N at t = 674 ps, yielding C15, and goes back to S at t = 1214 ps, yielding C23. This behaviour is also visible in the variations of Gd and Ad. Highest values of Ge are observed when the guanosine ribose adopts the S puckering mode. Gg fluctuates in the −sc sector with two excursions to the ap area (C18, C19; and C22–C25). Gw varies in the ap sector with various explorations of the −sc part (C3, C6, and C25 conformations). −sc Gw conformations occur concomitantly with the S guanosine ribose puckering, whereas ap Gw conformations occur in both the N and S puckering modes.
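The sector labels used throughout this description (sp, ±sc, ±ac, ap) can be assigned from a torsion angle value with a small helper such as the one below. The sketch assumes the standard Klyne-Prelog 60° sectors centred on 0°, ±60°, ±120° and 180°; the half-open treatment of the sector boundaries and the function name are assumptions made for illustration.

```python
def torsion_sector(angle_deg):
    """Return the Klyne-Prelog sector label of a torsion angle given in degrees."""
    a = (angle_deg + 180.0) % 360.0 - 180.0   # wrap into [-180, 180)
    if -30.0 <= a < 30.0:
        return "sp"
    if 30.0 <= a < 90.0:
        return "+sc"
    if 90.0 <= a < 150.0:
        return "+ac"
    if -90.0 <= a < -30.0:
        return "-sc"
    if -150.0 <= a < -90.0:
        return "-ac"
    return "ap"

# Mean values taken from Table 2B (plain rows).
gg_c1 = torsion_sector(290.6)    # Gg in C1
gg_c18 = torsion_sector(182.7)   # Gg in C18
gw_c3 = torsion_sector(295.4)    # Gw in C3
gw_c2 = torsion_sector(195.8)    # Gw in C2
```

With these table values the helper places Gg of C1 in −sc and Gg of C18 in ap, and Gw of C3 in −sc and Gw of C2 in ap, in line with the description above.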

Unlike GpA, in which the S guanosine ribose pucker was mainly associated with −sc Gw values, here we observe ap and −sc Gw conformations alternating in the S pucker. These differences between GpA and GpAp are reflected in the Gw-Gp correlation coefficients, which are higher in


Figure 8. Visualization of the four most populated molecular structures corresponding to GpA (a) and GpAp (b) cluster averages. For GpA, clusters C11, C12, C3 and C7 are shown (see Table 2A). For GpAp, clusters C15, C8, C2 and C12 are shown (see Table 2B).


GpA than in GpAp (Table 1). Ae fluctuates in the −anti quadrant, providing a succession of incursions into the −ap, −ac zones. Aa and Ag start in the ap sector where they stay for 20 ps, characterizing C1. Ab fluctuates in the ap sector though a peak towards higher values is detected in C7. Aw fluctuates in the +sc region till t = 1102 ps, where it changes towards the ap and −sc regions (C20–C25).

Lastly, Figure 7b indicates that the lowest RMS value (0.7) is computed for the C9, C10 pair, and the highest one (4.4) for the C11, C20 pair. This plot appears to be split into two regions near cluster C11, which occurs at 400–500 ps (Figure 6b). This is also the time when the closed conformation forms (Figure 2b). Figures 6b and 7b, together with the data in Table 2B, provide a structural rationalization of the data in Figure 2b. We see indeed that the GpAp trajectory contains two different regions of related open structures connected by a closed structure.

Multivariate analysis of the GpA and GpAp trajectories: folding/unfolding

To obtain further insight into the structural parameters which govern the conformational preferences of GpA and GpAp, multiple regression and discriminant analysis were applied to the generated trajectories.

The multiple regression approach was used to examine the dependence of the base-base distance dBB on the torsion angles in both dinucleotides. Due to the correlated character of the torsion angles, we determined the minimum set of these angles that could reproduce the behaviour of dBB as a function of time.

Discriminant analysis was used to classify the molecular structures into two discrete groups ("folded", BSABB > 0 or "unfolded", BSABB = 0) on the basis of the torsion angles. Here too we determined the minimum set of angles that could produce a good discrimination model, i.e. a model capable of describing with reasonable accuracy the appearance of folded and unfolded states as in the time series in our trajectory.

GpA

Multiple regression. The relationship between dBB (taken as the continuous dependent variable) and the collection of torsion angles (the independent variables) was studied, using a stepwise selection procedure, in which the independent variables are selected in order of decreasing contribution to the


Table 3. Multiple regression analysis

A. GpA dBB

Step  Variable entered in the  Standardized coefficients  Model R2
      model for each step      in the final model         for each step
  1   Gp                         0.45                     0.6413
  2   Gg                         0.21                     0.7565
  3   Ge                         0.16                     0.7753
  4   Ab                         0.09                     0.7815
  5   Aa                         0.10                     0.7888
  6   Gw                         0.15                     0.7947
  7   Aw                         0.07                     0.7975
  8   Gd                         0.16                     0.8013
  9   Gz                         0.09                     0.8018
 10   Ag                        −0.10                     0.8027
 11   Ad                        −0.10                     0.8032
 12   Ap                         0.09                     0.8043

The Table shows the order in which torsion angles enter the stepwise multiple regression model for GpA dBB. The puckering of guanosine ribose (Gp) is the first variable to enter the model; 64.13% of the variance is explained when only this variable is present (simple regression model). R2 increases successively in each step as a new variable enters; in the final model (step number 12), 80.43% of the variance is explained. Coefficients are standardized to quantify the relative contribution of each independent variable (see Methods). Gp bears the highest value of the standardized coefficients in the final model. The positive sign of Gp indicates that low puckering values (C3′-endo) correlate with folding and high puckering values (C2′-endo) correlate with unfolding. Inclusion of the rest of the variables considerably improves the predictive power of the model. For instance, a change of Gg from +sc (60°) to −sc (−60°) accompanies the appearance of folded structures. Furthermore, higher values (270°) of Ge are associated with unfolded structures whereas lower values (180°) are associated with folded structures.

B. GpAp dBB

Step  Variable entered in the  Standardized coefficients  Model R2
      model for each step      in the final model         for each step
  1   Ge                        −0.23                     0.3798
  2   Aa                         0.33                     0.4655
  3   Aw                         0.29                     0.5393
  4   Ad                        −0.14                     0.5946
  5   Gd                        −0.51                     0.6404
  6   Gw                         0.22                     0.6833
  7   Ag                         0.18                     0.7037
  8   Ap                        −0.15                     0.7133
  9   Ae                        −0.09                     0.7222
 10   Gp                         0.12                     0.7230
 11   Az                        −0.06                     0.7237
 12   Gz                         0.03                     0.7239

The multiple regression model for GpAp dBB is very different from that for GpA (see A). The discrepancies in the order in which the variables enter the model, and in the relative weight and the sign of some of them, are in agreement with the deep contrast in dBB profiles.


change (variance) in dBB. Table 3A displays, for each step, the selected independent variables, together with their associated standardized coefficient, and the corresponding R2 value. The standardized coefficient measures the relative contribution of the considered torsion angle or sugar pucker variable to the behaviour of dBB, and R2 gives the fraction of the total variance modelled by all the parameters considered at a given step.
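The logic of such a stepwise procedure can be sketched in a few lines of Python. The sketch makes several simplifying assumptions that are not taken from the paper: variables are added by forward selection only, the stopping rule is a minimum gain in R2 (the analysis reported here uses a significance criterion at the 0.05 level instead), and the data are synthetic. All function and variable names are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on the columns of X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)

def forward_stepwise(X, y, names, min_gain=1e-3):
    """At each step add the variable that raises R2 the most; stop on small gains."""
    selected, history, best = [], [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        r2, j = max((r_squared(X[:, selected + [k]], y), k) for k in remaining)
        if r2 - best < min_gain:
            break
        selected.append(j)
        remaining.remove(j)
        best = r2
        history.append((names[j], float(r2)))
    return selected, history

def standardized_coefficients(X, y):
    """Least-squares coefficients after z-scoring y and every column of X."""
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    yz = (y - y.mean()) / y.std()
    coef, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return coef

# Synthetic example: y depends strongly on v0, weakly on v2, not on v1 or v3.
rng = np.random.default_rng(2)
X = rng.standard_normal((400, 4))
y = 3.0 * X[:, 0] - 1.0 * X[:, 2] + 0.5 * rng.standard_normal(400)
names = ["v0", "v1", "v2", "v3"]
selected, history = forward_stepwise(X, y, names)
beta = standardized_coefficients(X[:, selected], y)
```

In this toy case the dominant variable enters first and carries the largest standardized coefficient, which mirrors the way the step order, standardized coefficients and cumulative R2 are reported in Table 3.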

This analysis indicates, first of all, that the first three selected independent variables (Gp, Gg and Ge) belong to the guanosine ribose (Table 3A). Furthermore it shows that the folding/unfolding process is induced by a concerted change of a small subset of the dinucleotide torsion angles.

We see that the guanosine ribose puckering mode (Gp) is the first variable to be selected, contributing 64% to the total variance. Using this one angle in a simple regression model allows us to reproduce, to within 64%, the time dependent dBB behaviour of GpA in our trajectory. This makes the guanosine pucker the major determinant of GpA conformation, in agreement with previous conclusions on the importance of the sugar pucker in determining allowed conformations in B-DNA (Lavery & Hartmann, 1994).

The positive sign of the standardized coefficient indicates that closed structures are associated with low values of Gp (N pucker). However, the Gp angle alone poorly describes the dBB behaviour. Semiextended molecular structures belonging to the Ib (folding transition process, t = 324–472 ps) and IIIa (unfolding transition process, t = 977–1052 ps) regions fall in the N puckering mode, and folded structures would be wrongly predicted for them by using only the Gp variable.

The inclusion of Gg in the second step improves the model, with R2 increasing from 0.64 to 0.76. Gg localizes the left edge of period II (Figure 2a) by imposing the condition that high (positive) Gg values yield long base-base distances whereas low (negative) Gg values yield short base-base distances. This makes it clear that a change of Gg from +syn to −syn accompanies the transition to folded structures (cf. cluster 10, Figure 6a and Table 2A). Addition of the rest of the angles makes the model more flexible and enhances the accuracy of the prediction.

Figure 2a displays the observed time dependent behaviour of the base-base distance in the MD simulation (plain curve) together with the predicted one derived from the regression analysis (dotted curve). We see that the predicted curve reproduces the observed one quite well, and that the closed and open regions are well differentiated. However, some subregions are not correctly reproduced. For instance, subregion Ia (0–21 ps), which was clearly identified in the cluster analysis as C1 with well-characterized Gw and Ge mean values, is not detected. This suggests that the set of torsion angles considered here may not predict the dinucleotide conformation precisely enough. Additional degrees of freedom need to be considered in order to describe more precisely the local structural variations which lead to the formation or breaking of some of the key internal hydrogen bonds.

Discriminant analysis. The different GpA conformations sampled along the molecular dynamics trajectory were classified according to their BSABB value: structures with BSABB > 0 were considered as folded, the ones with BSABB = 0 were considered as unfolded. The chosen independent variables were again the collection of torsion angles. As in the previous approach, a stepwise variable selection procedure was performed. The results of the calculation are shown in Table 4.

Table 4. Discriminant analysis of GpA BSABB

Step  Variable entered in the  Standardized coefficients  Average squared
      model for each step      in the final model         canonical correlation
  1   Gp                         2.47                     0.5913
  2   Gg                         1.17                     0.6652
  3   Ge                         1.49                     0.7093
  4   Aa                         1.29                     0.7197
  5   Ab                         0.64                     0.7240
  6   Ap                         1.15                     0.7288
  7   Gw                         0.81                     0.7338
  8   Aw                         0.36                     0.7355
  9   Gz                         0.38                     0.7367
 10   Ad                        −0.62                     0.7378
 11   Gd                         0.37                     0.7384

To investigate the GpA folding process by the discriminant analysis technique, BSABB is transformed into a binary variable. BSABB values are grouped into two classes: folded if BSABB is greater than zero, and unfolded otherwise. The variables which first enter the stepwise procedure (Gp, Gg and Ge) coincide with those of the corresponding multiple regression model of dBB (Table 3A).

In agreement with the regression analysis, the three principal variables defining the discriminant model were Gp, Gg and Ge. The MD simulation yielded 2640 molecular structures, of which 1282 (48.6%) and 1358 (51.4%) were closed and open, respectively. For the set of folded structures, the discriminant analysis classified correctly 96.3% of the structures, with only 3.7% being mispredicted. For the set of extended structures, 90.0% appeared to be correctly recognized, with 10.0% of misclassified structures.

We thus see that the discriminant model is worse for the set of extended structures, indicating that the folded state is better characterized than the unfolded one.
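The way per-class recognition rates of this kind are obtained can be illustrated with a two-class linear discriminant. The sketch below is not the stepwise procedure used here: it assumes a plain Fisher discriminant with a pooled covariance matrix and equal priors, and it runs on synthetic two-variable data. Only the class sizes (1282 folded, 1358 unfolded) are taken from the text; every name and every generated value is hypothetical.

```python
import numpy as np

def fit_two_class_lda(X, labels):
    """Fisher discriminant for labels 0/1: returns weight vector and threshold."""
    X0, X1 = X[labels == 0], X[labels == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance matrix.
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0     # midpoint rule, i.e. equal priors
    return w, threshold

def classify(X, w, threshold):
    """Assign class 1 (folded) when the discriminant score exceeds the threshold."""
    return (X @ w > threshold).astype(int)

def per_class_accuracy(labels, predicted):
    """Fraction of correctly recognized structures within each class."""
    return {k: float(np.mean(predicted[labels == k] == k)) for k in (0, 1)}

# Synthetic two-variable data; class sizes follow the counts quoted in the text.
rng = np.random.default_rng(3)
n_folded, n_unfolded = 1282, 1358
folded = rng.normal([20.0, -60.0], 15.0, size=(n_folded, 2))
unfolded = rng.normal([140.0, 60.0], 15.0, size=(n_unfolded, 2))
X = np.vstack([unfolded, folded])
labels = np.concatenate([np.zeros(n_unfolded, dtype=int),
                         np.ones(n_folded, dtype=int)])

w, threshold = fit_two_class_lda(X, labels)
accuracy = per_class_accuracy(labels, classify(X, w, threshold))
folded_share = 100.0 * n_folded / (n_folded + n_unfolded)
```

The dictionary `accuracy` holds the recognition rate of each class separately, which is how the 96.3% and 90.0% figures above should be read; `folded_share` reproduces the 48.6% fraction of closed structures from the quoted counts.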

GpAp

The low fraction of GpAp structures (4.7%) with buried surface area between the bases (BSABB > 0) precludes us from using the discriminant analysis for predictive purposes. Hence, only the multiple regression analysis was employed.

Table 3B shows the results of the multiple regression analysis applied to the behaviour of dBB in GpAp. In the stepwise selection procedure, two torsion angles, Gg and Ab, did not fulfil the criterion for being included in the model (their contributions to the model were not significant at the 0.05 level).

Clearly, the pattern of regression coefficients for GpAp is very different from that for GpA. This difference is particularly striking for the first three variables (Gp, Gg, Ge) selected in the GpA analysis. Here, Gp is selected only at the tenth step of the procedure, Gg is excluded, whereas Ge is included in the first step but with an opposite sign. A detailed analysis of the variation along the dynamics of the first two variables entering the GpAp regression model showed that Ge follows roughly the variation of dBB: small Ge values correspond to large dBB and vice versa. We also observe that Aa contributes to the minimum in dBB appearing at approximately 500 ps: a decrease in Aa occurs concomitantly with the decrease in dBB. It is noteworthy that the high value of Aa during the first 20 ps of the trajectory (cluster C1, Table 2B, Figure 6b) would yield a wrong prediction for dBB. Inclusion of more torsion angles in the regression leads to a more flexible model capable of fitting rather well the observed behaviour of dBB along the trajectory (Figure 2b, dotted curve).

Formation of intramolecular hydrogen bonds

GpA

Figure 9a displays the behaviour of the 12 intramolecular H-bonds, labelled HBO1–HBO12, which are present in at least 1% of the conformations in the trajectory of GpA. The first two H-bonds, HBO1 and HBO2, are the most populated (respectively in 41% and 32% of the conformations). All the remaining H-bonds occur in less than 10% of the conformations.

Comparison with Figures 2a,c and 3a,c indicates that the formation of four H-bonds, HBO3, HBO4, HBO10 and HBO11, overlaps significantly with that of the folded GpA structures. This is illustrated more dramatically by the joint behaviour of these H-bonds, displayed at the top of Figure 9a (HBO S). We also see that the local increase in dBB and concurrent decrease of BSABB in region II are associated with a temporary loss of these H-bonds. These results suggest that these four bonds may play a role in maintaining the folded conformations of GpA.

Figure 9. GpA (a) and GpAp (b) hydrogen-bonding distribution along the dynamics. Only hydrogen bonds present ≥ 1% are represented. (a) Notation for GpA is: HBO1: RIB2.HO2′-ADE.N3; HBO2: RIB1.HO2′-RIB2.O2P; HBO3: RIB1.HO5′-RIB1.O3′; HBO4: GUA.N7-RIB2.HO2′; HBO5: RIB1.HO5′-RIB2.O1P; HBO6: RIB1.HO5′-RIB2.O2P; HBO7: RIB2.O5′-RIB2.HO3′; HBO8: RIB1.HO5′-RIB2.O5′; HBO9: RIB1.HO2′-RIB2.O1P; HBO10: GUA.H1-ADE.N1; HBO11: GUA.H22-ADE.N6; HBO12: RIB1.HO2′-GUA.N3. {HBO3, HBO4, HBO10, HBO11}, represented conjointly as HBO S, can be considered as the hydrogen-bond network associated with the observed folding (see Figures 2a, 3a, and 3c). Link between the formation of some of the more weakly populated H-bonds of GpA and some specific conformational features: HBO5, present in 5% of the structures, occurs in the guanosine ribose N pucker. Its formation coincides essentially with the first peak of dBB, concomitant with the transition from the unfolded to the folded states (subregion Ib, Figures 2a,3a). It occurs in the conformational family C7 (Figure 6a, Table 2A) together with −sc values of Aa and Gz, and −ac values of Ge. Its loss is associated with a 90° rotation of Aa towards the ap sector (C8 cluster, Table 2A). HBO6, present at a 5% level, forms near the end of the simulation, when the S guanosine ribose pucker is adopted. It is associated with −sc values of Ge, and +sc and +ac values of Aa and Gz, respectively. It also forms during the sharp concerted movement of Aa and Gz towards the ap level (C17; Figure 6a, Table 2A). The loss of HBO6 at t = 1277 ps is associated with a short incursion of Gg towards the ap region (C19; Figure 6a, Table 2A). HBO7, present 5%, is linked to an S → N inversion of the adenosine ribose pucker (cluster C4; Figure 6a, Table 2A). HBO8, present 3%, is associated with Ab values fluctuating in the +ac sector. HBO9, present 2%, characterizes cluster C1 (Figure 6a, Table 2A), which corresponds to subregion Ia of the dBB and BSABB profiles (Figures 2a,3c, respectively). It was shown above that these semiextended conformations were not correctly predicted in the multiple regression analysis of dBB where torsion angles were used as independent variables. HBO10 and HBO11, both present 2%, arise from base-base interactions and thus appear only in the closed states (region II, Figure 2a). Formation of HBO12, present 1%, leads to the conformational family C2 after a concerted change of Ge (ap → −sc) and Gw (−sc → ap). Note that a rotation of Ge is necessary to break HBO9 and set free the guanosine ribose O2′-H group for a new hydrogen bond interaction. (b) Notation for GpAp is: HBO1: RIB2.O5′-PHOt.HO5′; HBO2: RIB2.HO2′-PHOt.O2P; HBO3: RIB1.HO2′-GUA.N3; HBO4: RIB1.HO5′-RIB2.O2P; HBO5: RIB1.HO2′-RIB2.O2P; HBO6: RIB1.HO5′-RIB2.O5′; HBO7: RIB1.HO5′-RIB2.O1P; HBO8: RIB1.HO2′-RIB2.O1P; HBO9: RIB1.HO5′-RIB1.O3′; HBO10: RIB2.HO2′-PHOt.O1P; HBO11: RIB2.HO2′-PHOt.O5′. Terminal phosphate is denoted as PHOt. Description of the hydrogen bonds in GpAp: HBO1, present in 26% of the trajectory, is related mainly to Ap and Ae. We observe a trend towards the 2T1 or 2T3 pucker of the adenosine sugar when HBO1 is present or absent, respectively. Moreover, Ae occurs more often in the −ac or in the ap sectors when HBO1 is present or absent, respectively. A short incursion of Az towards the ap sector at t = 40 ps (cluster C4; Figure 6b, Table 2B) seems to occur after the temporary loss of HBO1. HBO2, present 19%, is directly related to Ae. Contrary to HBO1, Ae occurs more often in the ap or in the −ac sectors when HBO2 is present or absent, respectively. HBO3, present 14%, is fixed primarily by Ge and Gw. Like the corresponding H-bond in GpA (HBO12), HBO3 presence is associated with −sc and ap values of Ge and Gw, respectively. HBO4, present 12%, depends, to a first approximation, on Ge. The presence and absence of HBO4 are related to −sc and ap values of Ge, respectively. HBO5, present 10%, is associated with −sc and +sc conformations of Gw and Gz, respectively. HBO5 occurs preferably in the S Gp puckering mode. HBO6, also present 10%, forms preferentially in the guanosine N puckering mode, and is associated with the +sc conformation of Gz and with higher Aa values (+sc to +ac). HBO7, present 9%, correlates with −sc rotamers of Gz and Ge. HBO8, present 7%, depends primarily on Ge and Gz. Its presence is associated with ap values for both angles. HBO9, present 6%, behaves as the equivalent H-bond in GpA (HBO3). Indeed, it occurs only in the north Gp puckering mode, and as in GpA, it is lost as Gg changes from −sc to ap (clusters C18 and C19; Figure 6b and Table 2B). The formation of HBO10, present 6%, is determined mainly by ap values of Ae and Az. Finally, HBO11, present 4%, occurs mainly for −sc values of Az.

In addition, we observe a clear correlation between dihedral angle values and the formation of specific H-bonds. In the following we discuss

Page 16: Conformational analysis of GpA and GpAp in aqueous solution by molecular dynamics and statistical methods

878 Conformational Analysis of Flexible Molecules

this correlation for the H-bonds appearing in morethan 5% of the conformations. The observationsmade for the remaining H-bonds of Figure 9a, aregiven in the legend to this Figure.

We observe that HBO1 (adenosine ribose O2'-H...N3 adenine) restricts the movements of the adenine base. In its presence, Aχ adopts a mean value of 184.4°, while in its absence it wanders towards the −sc sector, adopting a mean value of 218.9°.

HBO2 (guanosine ribose O2'-H...O2P phosphate) conditions Gε variations. As for Aχ, Gε adopts values near the −sc region, with a mean of 227.5°, when HBO2 is absent, but adopts a mean value of 189.8° when this H-bond is formed.

HBO3 (guanosine ribose O5'-H...O3' guanosine ribose), present in 9% of the conformations, correlates with Gδ variations and consequently also with the guanosine ribose pucker (Gp). It forms only when the guanosine adopts the N puckering mode, and is hence absent during the first 450 ps and the last 250 ps of the simulations, and between 600 and 750 ps. In the latter case its disappearance seems to be caused by a sharp rotation of Gγ from −sc to ap (Figure 5a).

The close link between the formation of these hydrogen bonds and the values of specific torsion angles also entails a clear link with the identified conformational families (Figure 6a, Table 2A): HBO3 appears in the conformational family C10, which results from a transition of Gγ to the −sc region, and disappears with C16, corresponding to the transition of Gp from the N to S pucker modes. HBO3 is also absent in clusters C13, C14 and C15. Comparison with Figure 2a shows that it forms in regions II (folded conformation) and IIIa (folding → unfolding transition).

HBO4 (guanine N7...H-O2' adenosine ribose), present at a level of 9%, is characterized by the N guanosine ribose pucker, and forms exclusively in folded conformations (region II; Figure 2a).
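The conditional averages quoted above (for instance, a mean torsion value of 184.4° with HBO1 present against 218.9° without it) amount to splitting the trajectory frames by H-bond presence and averaging a torsion angle within each subset. The following is a minimal sketch of that bookkeeping, not the authors' analysis code. It assumes a circular mean, which the text does not specify, and the six-frame trajectory is invented purely for illustration; the function names are hypothetical.

```python
import numpy as np

def circular_mean_deg(angles_deg):
    """Circular mean of angles in degrees, returned in [0, 360)."""
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean = np.arctan2(np.sin(a).mean(), np.cos(a).mean())
    return float(np.rad2deg(mean) % 360.0)

def mean_angle_by_hbond(angles_deg, hbond_present):
    """Return (occupancy, mean angle with bond present, mean angle with bond absent)."""
    angles = np.asarray(angles_deg, dtype=float)
    present = np.asarray(hbond_present, dtype=bool)
    occupancy = float(present.mean())  # fraction of frames in which the bond exists
    return occupancy, circular_mean_deg(angles[present]), circular_mean_deg(angles[~present])

# Hypothetical toy trajectory of six frames (not data from the paper).
torsion = [180.0, 190.0, 185.0, 215.0, 220.0, 225.0]
hbond = [True, True, True, False, False, False]
occupancy, mean_present, mean_absent = mean_angle_by_hbond(torsion, hbond)
```

A circular mean is used here so that values straddling 0°/360° are averaged sensibly; for angles confined to a narrow range, as in the toy data, it coincides with the arithmetic mean.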

GpAp

The behaviour of the 11 intramolecular H-bonds in GpAp populated more than 1% is displayed in Figure 9b. Interestingly, though individual H-bonds are populated at a lower level than in GpA (the most highly populated is HBO1, at a 26% level), 10 out of the 11 H-bonds of Figure 9b occur above the 5% level.

We see that here too, the brief appearance of folded conformations (t = 456–474 ps; Figures 2b, 3b, 3d) is linked to the presence or absence of certain hydrogen bonds. First, our results indicate that the formation of HBO1 (adenosine ribose O5'...H-O5' terminal phosphate) significantly limits the degrees of freedom of the adenosine ribose, whereas its loss increases the sugar flexibility (Aα is displaced towards +sc values). This H-bond thus seems to be very important for keeping this molecule in the extended conformation. Second, the formation of HBO3 (guanosine ribose O2'-H...N3 guanine) seems to limit the guanine fluctuations. Third, HBO10 (adenosine ribose O2'-H...O1P terminal phosphate) seems to stabilize the folded conformation by neutralizing, in part, the charge of the terminal phosphate, thereby contributing to the process of its partial desolvation.

Finally, as for GpA, there also seems to be a clear relationship between the values of certain dihedral angles and the formation of the most persistent H-bonds. These relations are described in the legend to Figure 9b.

Discussion

Our torsion angle based cluster analysis and structure superposition procedures allowed us to describe the 1.3 ns GpA and GpAp MD trajectories in terms of a small number of conformational families. This then permitted us to investigate the structural changes that accompany the transitions between these families, leading to novel insights into the parameters that govern these transitions.

Let us consider, for example, the P-O ester bonds, known to be the major determinants of polynucleotide conformation (Saenger, 1984). In nucleic acid crystal structures, three types of turns governed by values of the ζ torsion angle are observed: p1 (ζ = +sc), p2 (ζ = ap) and p3 (ζ = −sc), whose sharpness depends on the α angle. In our GpA simulation, different combinations of Gζ and Aα are observed (Table 2A). The formation of the GpA folded structures (clusters C10–C15, Figure 6a) is preceded by that of a p1 turn (cluster C9; Figure 6a, Table 2A) with Gζ and Aα centred at 49.9° and 213.2°, respectively. In the folded conformations, Gζ remains in the +sc sector, maintaining the p1 turn, whereas Aα fluctuates within the +sc and ap regions.
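The turn classification described above can be expressed as a small lookup from a torsion value to its sector and from the sector to the turn type. The sketch below assumes the conventional Klyne-Prelog sector boundaries (60° wide sectors centred on 0°, 60°, 120°, 180°, 240° and 300°); the exact boundaries used in the paper are not stated in this section, and the function names are hypothetical.

```python
def torsion_sector(angle_deg):
    """Klyne-Prelog sector name for a torsion angle given in degrees."""
    a = angle_deg % 360.0  # fold any input into [0, 360)
    if a < 30.0 or a >= 330.0:
        return "sp"
    if a < 90.0:
        return "+sc"
    if a < 150.0:
        return "+ac"
    if a < 210.0:
        return "ap"
    if a < 270.0:
        return "-ac"
    return "-sc"

# Turn types as defined in the text: p1 (+sc), p2 (ap), p3 (-sc).
TURN_BY_ZETA_SECTOR = {"+sc": "p1", "ap": "p2", "-sc": "p3"}

def turn_type(zeta_deg):
    """Turn label implied by the sector of the zeta torsion, or None otherwise."""
    return TURN_BY_ZETA_SECTOR.get(torsion_sector(zeta_deg))

# The zeta value quoted for cluster C9 (49.9 degrees) falls in the +sc sector.
c9_turn = turn_type(49.9)
```

Returning `None` for sectors other than +sc, ap and −sc keeps the function honest about conformations, such as those sampled in solution, that do not fit any of the three crystal turn types.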

We see furthermore that these conformations are characterized by an inverted stacking of the G and A bases, which adopt an antiparallel orientation, with the riboses pointing in opposite directions (Figure 8(a); see for comparison Figure 1(b) and (c)). Our multivariate analysis detected three torsion angles, Gp, Gγ and Gε, as being principally responsible for this arrangement. Gp adopts the N pucker mode, Gγ fluctuates in the −sc or ap ranges and Gε oscillates within the ap sector.

Our study revealed in addition that folding in GpA is induced by a reduced number of torsion angles belonging to the guanosine moiety and that essentially four intramolecular H-bonds played a crucial role in fostering this state. Quite remarkably, the addition of just one phosphate at the 3'-end dramatically reduces the ability of GpAp to stack. This can be attributed to the presence of a hydrogen bond between the adenosine ribose O5' and the terminal phosphate O5'. This hydrogen bond restricts the rotation about Aγ, thereby hindering the response of the adenosine nucleoside to the movements of the adjacent guanosine.


Our findings on the crucial role played by H-bonds in stabilizing local conformations are in good general agreement with early semiempirical energy calculations in vacuum for dinucleoside monophosphates (Yathindra & Sundaralingam, 1974).

To validate our results further, we also surveyed the conformations of dinucleotide crystal structures in the Nucleic Acid DataBase (NDB; Berman et al., 1992), which at the time of our analysis comprised 58 cases (46 ribodinucleotides and 12 deoxyribodinucleotides; see Table 5). This showed that p2 was the least populated arrangement in both deoxyribodinucleotides (8.3%) and ribodinucleotides (4.4%). Moreover, the preference for p3 relative to p1 was clearer in ribodinucleotides (87.0/8.7%) than in deoxyribodinucleotides (58.3/33.3%). In these crystal structures α always appeared in the +sc sector for p1 and in the −sc sector for both p2 and p3 turns. The latter rule is not strictly obeyed in our aqueous simulation, where other combinations of ζ and α are also allowed. But this is not surprising since many local minima, as well as conformations representing transitions between these minima, may be sampled in room temperature MD simulations.

A particularly interesting finding of this survey was that one of the two conformations in the asymmetric unit of the crystal structure of UpAH+ (Sussman et al., 1972; Rubin et al., 1972) closely resembled our folded GpA conformation (Figures 1(c) and 8(a)). This crystal conformation features a p1 turn. Both its P-O torsion angles (ζ, α) are in the +sc range, its riboses are C3'-endo, its bases in the ap orientation, the Gε and Aβ angles are in the ap sector and Gγ and Aγ are in the +syn range. This 3D arrangement is very similar to our C11 family and the grouped {C13, C14, C15} families, except for the Gγ and adenosine ribose conformation.

Finally, we find that some of the GpA open conformations adopted in our simulation (C16 and C18) also resemble the backbone conformation of several dinucleotide crystal structures (Seeman et al., 1975; Frederick et al., 1988; NDB codes DRB003 and UDB012, respectively). We observe one main discrepancy, however, for the Gγ torsion, which is +sc in one of the crystal structures (UDB012) and −sc in the C16 and C18 conformational families (Table 2A). This discrepancy can be attributed to the fact that in these families, Gγ belongs to a terminal hydroxyl.

Table 5. NDB codes of dinucleotides used for structural comparisons

BDBP23A  BDBP23B  UDB004A  UDB005A  UDB005B  UDB007A
UDB012A  UDB012B  UDBS38A  ZDB020A  ZDB020B
ARB002A  ARB002B  ARB003A  ARB003B  ARB003C  ARB003D
ARB004A  ARB005A  DRB002A  DRB003A  DRB005A  DRB006A
DRB006B  DRB007A  DRB007B  DRB008A  DRB018A  DRB018B
DRBB01A  DRBB09A  DRBB09B  DRBB09C  DRBB09D  DRBB11A
DRBB11B  DRBB12A  DRBB12B  DRBB13A  DRBB13B  DRBB14A
DRBB14B  DRBB15A  DRBB15B  DRBB16A  DRBB16B  DRBB17A
DRBB17B  DRBB19A  DRBB19B  DRD004A  DRD004B  URB003A
URB008A  URB008B  URB016A  URB016B

First six characters denote NDB ID whereas last symbol corresponds to strand name (Berman et al., 1992).

Influence of conformational equilibria of GpA and GpAp on barnase catalysis and binding

A major aim of our study has been to determine whether the isolated dinucleotides GpA and GpAp displayed different conformational properties, and if so, to see if these differences could help rationalize the marked increase in catalytic efficiency of barnase towards GpAp.

Our results show the existence of one major difference, namely, the presence of a relatively important proportion of folded structures for GpA (48.6%), but not for GpAp. The latter remains essentially extended, with folded structures representing only a minor species (4.7%). We suggest that this may, at least in part, be responsible for the increased catalytic efficiency and/or binding for GpAp versus GpA.

Indeed, let us consider the conformational equilibrium of the dinucleotides Sextended ⇌ Sfolded, whose corresponding constant is K = [Sfolded]/[Sextended]. Furthermore, since all the available structural data suggest that the nucleotide substrate binds to the barnase active site in an extended conformation (see for example, Buckle & Fersht, 1994), one can make the simplifying assumption that only the open nucleotide conformation binds the enzyme active site. Based on these considerations an expression for KM can be derived, which explicitly incorporates the presence of different ionic states of the catalytic residues of barnase, Glu73 and His102 (Gordon-Beresford et al., 1996), as well as the nucleotide conformational equilibria, as follows:

$$K_M = K_S\,(1 + K)\,\frac{1 + \dfrac{\mathrm{H}^{+}}{K_{a1}} + \dfrac{K_{a2}}{\mathrm{H}^{+}}}{1 + \dfrac{\mathrm{H}^{+}}{K^{b}_{a1}} + \dfrac{K^{b}_{a2}}{\mathrm{H}^{+}}} \qquad (1)$$

where $K_S$ is the dissociation constant for the barnase-substrate complex in which the catalytic residues are in their active protonation state and the free substrate is in an active extended conformation; $K_{a1}$, $K_{a2}$ are the proton dissociation constants of Glu73 and His102 in the free enzyme and $K^{b}_{a1}$, $K^{b}_{a2}$ are the corresponding constants in the enzyme-substrate complex.

Equation (1) shows that KM is expressed as a function of (1 + K), with K being the above mentioned constant for the nucleotide conformational equilibrium. Since according to our results K should be significantly smaller for GpAp than for GpA, this should lead to smaller KM values for the former than for the latter, in good qualitative agreement with the experimental measurements of Day et al. (1992).
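To make the dependence on (1 + K) concrete, the sketch below evaluates equation (1) and the ratio of the (1 + K) factors for the two substrates. It is purely illustrative: it takes the folded fractions observed in the simulations (48.6% for GpA, 4.7% for GpAp) at face value as equilibrium populations, which the short simulation times do not justify quantitatively, and it assumes that KS and all proton dissociation constants are identical for both substrates. Function and variable names are hypothetical.

```python
def conformational_K(folded_fraction):
    """K = [S_folded]/[S_extended] from the fraction of folded structures."""
    return folded_fraction / (1.0 - folded_fraction)

def km(ks, K, h, ka1, ka2, kb_a1, kb_a2):
    """Equation (1): K_M = K_S (1 + K) * (free-enzyme term) / (complex term)."""
    free = 1.0 + h / ka1 + ka2 / h        # ionization states of the free enzyme
    bound = 1.0 + h / kb_a1 + kb_a2 / h   # ionization states of the complex
    return ks * (1.0 + K) * free / bound

# Folded fractions taken from the simulations (illustrative use only).
K_gpa = conformational_K(0.486)
K_gpap = conformational_K(0.047)

# With K_S and all pKa terms assumed equal, K_M(GpA)/K_M(GpAp) reduces to this ratio.
km_ratio = (1.0 + K_gpa) / (1.0 + K_gpap)
```

Under these assumptions the ratio comes out at about 1.85, that is, a KM for GpA somewhat less than twice that for GpAp from the conformational term alone. This number should be read only as an indication of the direction of the effect.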

Our computational analysis clearly does not allow us to estimate accurately the magnitudes of K for the two dinucleotides, because the simulation times are too short to estimate the conformational equilibria quantitatively. Indeed, we observe only one folding/unfolding transition for GpA and only a fleeting approach to the folded conformation for GpAp. Quantitative estimates would require much longer simulations, in the 100 ns time range or more, which are not feasible with present day computers. Nevertheless, Figure 5, which displays the variation of torsion angles and sugar puckering in our simulations, clearly shows that most of the allowed ranges in the dinucleotide torsion angles have been extensively visited during our simulations. This suggests that, to a first approximation, the sampling of conformational space has been reasonable enough to allow the identification of the major conformational trends, and to provide useful insights into the correlations between backbone torsions that govern these trends, although here too much longer simulations are required to obtain reliable statistics.

In addition to longer simulations of the isolated dinucleotide, detailed analyses of the conformational space of barnase and barnase-nucleotide complexes will also be required to evaluate binding and kinetic constants quantitatively. Such analyses are presently in progress.

Lastly, we wish to comment on the base-sugar orientation of the guanosine moiety. Based on their crystal structure, Buckle & Fersht (1994) proposed that Gγ should adopt the −sc conformation (syn) for catalysis to take place. The modelling study of Gordon-Beresford et al. (1996) makes a similar proposal. It is intriguing that this conformation is never observed in our simulations of either of the nucleotides. This suggests that access to this conformation involves energy barriers which cannot be overcome at room temperature, at least in the considered simulation time. It is also possible that binding to the enzyme helps to overcome these barriers, in which case the enzyme will be said to induce this productive conformation. The calculations in progress should also be helpful in clarifying these issues.

Methods

MD simulations

A time-dependent trajectory of GpA and GpAp in the presence of explicit solvent molecules is computed by solving Newton's equations of motion for each particle in the system. The MD simulations were performed with the BRUGEL package (Delhaise et al., 1985) in the microcanonical ensemble (N, V, E) at room temperature (298 K), with an integration time-step of 1 fs (1 fs = 10^−15 seconds). The force-field used to calculate the potential energy of the system (including intrasolute, solute-solvent and solvent-solvent interactions) is borrowed from the CHARMM potentials (MacKerell et al., 1995), version 22. The solvent is modelled by the three-centre charge TIP3P model (Jorgensen, 1981). Electrostatic interactions are computed with the usual Coulomb potential, using a dielectric constant ε = 1 and an 8.5 Å radial site-site cutoff distance; a termination function (Brooks et al., 1985; Prévost et al., 1990) is applied to the Coulomb potential over the entire distance range. These simulation conditions were shown to yield excellent agreement with results obtained in simulations of liquid water performed using Ewald summation (Prévost et al., 1990). Geometrical constraints are applied during the MD simulations to all the bond lengths involving hydrogen atoms, and the water molecules are considered as fully constrained (six degrees of freedom per water molecule). These geometrical constraints were enforced with the SHAKE method (Ryckaert et al., 1977).

The starting configurations of the systems consisted of (i) a GpA solute molecule taken from the barnase-tetranucleotide crystal complex (Buckle & Fersht, 1994), one Na+ counter-ion and 485 TIP3P water molecules, and (ii) a GpAp solute molecule taken from the same crystal complex, two Na+ counter-ions and 481 TIP3P water molecules. The water molecules were randomly oriented around the solute and counter-ions and placed on a cubic lattice in a cubic box of 24.84 Å edge size. The systems were first submitted to a thermalization phase of 10 ps followed by an equilibration phase of 150 ps. The productive MD runs amounted to a total of 1.32 ns for each system.
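The simulation parameters given here fix the number of integration steps, and together with the 2640 observations per trajectory quoted in the cluster analysis section they suggest how often conformations were stored. The short calculation below makes this explicit. It assumes that the stored observations are evenly spaced over the productive run, which the text does not state; the variable names are hypothetical.

```python
TIME_STEP_FS = 1.0       # integration time step, from the text
PRODUCTION_NS = 1.32     # productive run length per system, from the text
N_OBSERVATIONS = 2640    # conformations per trajectory, from the cluster analysis section

# 1 ns = 1e6 fs, so the productive run corresponds to this many MD steps.
n_steps = round(PRODUCTION_NS * 1e6 / TIME_STEP_FS)

# Assumption: observations are evenly spaced along the productive run.
steps_per_observation = n_steps // N_OBSERVATIONS
observation_interval_ps = steps_per_observation * TIME_STEP_FS / 1000.0
```

Under the even-spacing assumption this corresponds to one stored conformation every 500 steps, that is, every 0.5 ps.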

Structural and statistical analyses

To analyse the MD simulations in terms of molecular structure, we determined the torsion angles, the distance and relative orientation between the bases, the accessibility to solvent and the internal hydrogen-bonding framework. To give insight into the striking interrelations among all these molecular properties, several multivariate statistical techniques, such as multiple regression, cluster and discriminant analyses (Afifi & Clark, 1984), were applied. The SAS statistical package (SAS, 1993) was used for the statistical analyses.

Cluster analysis

To characterize a rational set of molecular conformations, a K-means clustering was applied. The clustering was done using the nearest centroid sorting method on the basis of Euclidean distances computed from the nucleotide torsion angle values (Anderberg, 1973; Afifi & Clark, 1984). This procedure was chosen because it can handle large data sets of up to 100,000 observations and is therefore well suited for analysing the conformations in our computed trajectories (each containing 2640 observations). Our goal was to find the minimum number of clusters able to characterize the main conformational families sampled in our systems. Because cluster analysis is highly empirical, we took the time-dependent behaviour of the torsion angles as a reference. Several independent clustering runs were performed in which the number of clusters (k) was increased from two until the resulting classification was consistent with the torsion angle profiles.
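The clustering step described above can be illustrated with a plain K-means (Lloyd) iteration on a table of torsion angles. This is a minimal NumPy sketch, not the SAS procedure used for the paper; the two "families" are synthetic data generated for the example, and the angles are treated as ordinary numbers, so the 0°/360° wrap-around is not handled.

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd K-means with Euclidean distances; returns (labels, centroids)."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every observation to its nearest centroid.
        dist = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster became empty.
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic torsion-angle table (degrees): two hypothetical families, three angles each.
rng = np.random.default_rng(1)
family_a = rng.normal([60.0, 180.0, 300.0], 5.0, size=(50, 3))
family_b = rng.normal([180.0, 60.0, 200.0], 5.0, size=(50, 3))
labels, centroids = kmeans(np.vstack([family_a, family_b]), k=2)
```

In the spirit of the procedure described in the text, one would rerun `kmeans` with increasing `k` and compare the resulting labels, ordered in time, with the torsion angle profiles.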

Since compensating changes in two or more torsion angles can lead to very similar 3D structures, the level of structural similarity between families generated by the clustering procedure was evaluated as follows. The mean conformation of each family was computed by superimposing all the structures of that family onto one of its members (McLachlan, 1979), and computing the resulting average atomic coordinates. Next, the RMS deviations were computed for every pair of mean conformations and for the members of every family relative to their mean. Tukey's test for unequal sample sizes (Spjøtvoll & Stoline, 1973) was employed to assess whether the difference between a pair of clusters is significant (P < 0.05) or not.
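The superposition and averaging procedure can be sketched as follows. The fit below uses the SVD-based Kabsch algorithm as a stand-in for the McLachlan (1979) method cited in the text, and the coordinates are random numbers generated for the example; function names are hypothetical.

```python
import numpy as np

def superpose(mobile, reference):
    """Return `mobile` rotated and translated onto `reference` (least-squares fit)."""
    mob_c = mobile - mobile.mean(axis=0)
    ref_c = reference - reference.mean(axis=0)
    # Optimal rotation from the SVD of the 3 x 3 covariance matrix.
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    d = np.sign(np.linalg.det(u @ vt))          # avoid improper rotations (reflections)
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return mob_c @ rot + reference.mean(axis=0)

def rmsd(a, b):
    """Root mean square deviation between two already superimposed coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum() / len(a)))

def mean_conformation(structures):
    """Average coordinates after fitting every structure onto the first one."""
    reference = structures[0]
    fitted = np.array([superpose(s, reference) for s in structures])
    return fitted.mean(axis=0)

# Hypothetical 10-atom structure and a rotated, translated copy of it.
rng = np.random.default_rng(0)
coords = rng.normal(size=(10, 3))
theta = np.deg2rad(40.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = coords @ rot_z.T + np.array([1.0, -2.0, 0.5])
mean = mean_conformation([coords, moved])
```

Because the second structure is only a rigid-body copy of the first, the fitted coordinates and the mean conformation both coincide with the original to within numerical precision.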

Discriminant analysis

To elucidate the angular determinants of the folding/unfolding process, a discriminant analysis was applied. This technique was used to classify units (i.e. molecular structures) into one of two alternative groups (i.e. folded or unfolded conformations) on the basis of a set of measurements (i.e. torsion angles). The property we chose to define folding was the base-base buried surface area (BSABB). A structure was considered folded when BSABB was greater than zero and unfolded when it was equal to zero. Because torsion angles are interrelated, we first carried out a stepwise procedure to determine the smallest subset of variables (torsion angles) that produces a good discrimination model. With this subset we carried out the discriminant analysis to obtain the Fisher discriminant function (a linear combination of the variables in the model). This function is helpful in two respects. First, it allows for the classification of molecular structures into folded or unfolded groups and, second, it indicates the direction and degree to which each angle contributes to the classification. The first property additionally provides a means of validating the model if the percentage of correct classification in both groups is high. Moreover, to compare the contribution of each angle to the prediction of either folding or unfolding, the discriminant coefficients were standardized by multiplying them by the corresponding pooled standard deviations (Afifi & Clark, 1984).
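The two-group discriminant function and its standardized coefficients can be sketched with NumPy as shown below. This is an illustration under stated assumptions rather than the SAS analysis used in the paper: the stepwise variable selection is omitted, equal prior probabilities are assumed for the classification threshold, and the two groups are synthetic data in which only the first of two hypothetical angles separates folded from unfolded structures.

```python
import numpy as np

def fisher_discriminant(x_folded, x_unfolded):
    """Two-group Fisher linear discriminant.

    Returns (coefficients, threshold, standardized coefficients)."""
    x1 = np.asarray(x_folded, dtype=float)
    x0 = np.asarray(x_unfolded, dtype=float)
    m1, m0 = x1.mean(axis=0), x0.mean(axis=0)
    n1, n0 = len(x1), len(x0)
    # Pooled within-group covariance matrix.
    pooled = ((n1 - 1) * np.cov(x1, rowvar=False)
              + (n0 - 1) * np.cov(x0, rowvar=False)) / (n1 + n0 - 2)
    coef = np.linalg.solve(pooled, m1 - m0)
    threshold = coef @ (m1 + m0) / 2.0          # midpoint rule, equal priors assumed
    # Standardize by multiplying by the pooled standard deviations, as in the text.
    standardized = coef * np.sqrt(np.diag(pooled))
    return coef, threshold, standardized

# Synthetic example (degrees): only the first angle differs between the groups.
rng = np.random.default_rng(0)
folded = rng.normal([60.0, 180.0], 10.0, size=(100, 2))
unfolded = rng.normal([180.0, 180.0], 10.0, size=(100, 2))
coef, threshold, standardized = fisher_discriminant(folded, unfolded)
correct_folded = float(np.mean(folded @ coef > threshold))
correct_unfolded = float(np.mean(unfolded @ coef <= threshold))
```

The fractions of correctly classified structures in each group play the validating role described in the text, and the relative magnitudes of the standardized coefficients indicate which angle carries the discrimination.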

Multiple regression

Defining the distance between the geometric centres of the two bases (dBB) as the continuous variable that characterizes the extent of folding of the dinucleotide, we used a multiple regression procedure to analyse the dependence of dBB on the torsion angles. Analogously to the above described application of the discriminant analysis, a stepwise algorithm was employed to select a subset of angles. From this subset, the predicted value of dBB was obtained for each molecular structure. Standardized regression coefficients were calculated to assess the relative weight of the angles in the prediction of dBB (Afifi & Clark, 1984). The results from the multiple regression and discriminant analyses were compared.
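A least-squares fit with standardized coefficients, as described above, can be sketched in a few lines of NumPy. The example is illustrative only: the stepwise selection of angles is omitted, the torsion angles enter as ordinary linear predictors (their periodicity is ignored), and both the "angles" and the "dBB" values are synthetic, with the distance constructed to depend on the first angle alone.

```python
import numpy as np

def standardized_regression(x, y):
    """Ordinary least squares of y on the columns of x, with an intercept.

    Returns (intercept, coefficients, standardized coefficients, predictions)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    coef = beta[1:]
    # Standardized coefficient: raw coefficient scaled by sd(predictor) / sd(response).
    standardized = coef * x.std(axis=0, ddof=1) / y.std(ddof=1)
    return beta[0], coef, standardized, design @ beta

# Synthetic data: a hypothetical distance that depends only on the first angle.
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 360.0, size=(200, 2))
dbb = 4.0 + 0.02 * angles[:, 0] + rng.normal(0.0, 0.1, size=200)
intercept, coef, standardized, predicted = standardized_regression(angles, dbb)
```

The standardized coefficients are unit-free, so their magnitudes can be compared across angles in the same way as the standardized discriminant coefficients.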

Acknowledgements

The authors are grateful to H. Berman, C. A. Zardecki and B. Schneider for helping with the NDB database. J.G. was supported, in part, by grants from the Ministerio de Educación y Cultura, Spain (DGICYT PR95-276 and PB95-0624). In addition, we gratefully acknowledge support for this work from the Belgian programme of Interuniversity Poles of Attraction initiated by the Belgian Prime Minister's Office for Science, Technology and Culture, and the Action de Recherches Concertées de la Communauté Française de Belgique; particular thanks go to the computer center of the Université Libre de Bruxelles for extensive use of the computer facilities.

References

Afifi, A. A. & Clark, V. (1984). Multiple regression and correlation: examining linear relations between one variable and many variables. Variable selection in regression analysis. Discriminant analysis: classifying individuals into populations. Cluster analysis: empirical grouping of individuals. In Computer-Aided Multivariate Analysis, pp. 120–195, 247–286, 379–411, Lifetime Learning Publications, Belmont, California.

Anderberg, M. R. (1973). Cluster Analysis for Applications, Academic Press, Inc., New York.

Arnott, S. & Hukins, D. W. L. (1969). Conservation of conformation in mono- and polynucleotides. Nature, 224, 886–888.

Baudet, S. & Janin, J. (1991). Crystal structure of a barnase-d(GpC) complex at 1.9 Å resolution. J. Mol. Biol. 219, 123–132.

Berman, H. M., Neidle, S. & Stodola, R. K. (1978). Drug-nucleic acid interactions: conformational flexibility at the intercalation site. Proc. Natl Acad. Sci. USA, 75, 828–832.

Berman, H. M., Olson, W. K., Beveridge, D. L., Westbrook, J., Gelbin, A., Demeny, T., Hsieh, S.-H., Srinivasan, A. R. & Schneider, B. (1992). A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J. 63, 751–759.

Brooks, C. L., Pettitt, B. M. & Karplus, M. (1985). Structural and energetic effects of truncating long ranged interactions in ionic and polar fluids. J. Chem. Phys. 83, 5897–5908.

Buckle, A. M. & Fersht, A. R. (1994). Subsite binding in an RNase: structure of a barnase-tetranucleotide complex at 1.76-Å resolution. Biochemistry, 33, 1644–1653.

Day, G. D., Parsonage, D., Ebel, S., Brown, T. & Fersht, A. R. (1992). Barnase has subsites that give rise to large rate enhancements. Biochemistry, 31, 6390–6395.

Delhaise, P., van Belle, D., Bardiaux, M., Alard, P., Hamers, P., van Cutsem, E. & Wodak, S. J. (1985). Analysis of data for computer simulations on macromolecules using the ceram package. J. Mol. Graphics, 3, 116–119.

Frederick, C. A., Coll, M., Van Der Marel, G. A., Van Boom, J. H. & Wang, A. H.-J. (1988). Molecular structure of cyclic deoxydiadenylic acid at atomic resolution. Biochemistry, 27, 8350–8361.

Gelbin, A., Schneider, B., Clowney, L., Hsieh, S.-H., Olson, W. K. & Berman, H. M. (1996). Geometric parameters in nucleic acids: sugar and phosphate constituents. J. Am. Chem. Soc. 118, 519–529.


Gilson, M. K., Given, J. A. & Head, M. S. (1997). A new class of models for computing receptor-ligand binding affinities. Chem. Biol. 4, 87–92.

Gordon-Beresford, M. H., Van Belle, D., Giraldo, J. & Wodak, S. J. (1996). Effect of nucleotide substrate binding on the pKa of catalytic residues in barnase. Proteins: Struct. Funct. Genet. 25, 180–194.

Guillet, V., Lapthorn, A. & Mauguen, Y. (1993). Three-dimensional structure of a barnase-3'GMP complex at 2.2 Å resolution. FEBS Letters, 330, 137–140.

Jorgensen, W. L. (1981). Transferable intermolecular potential functions for water, alcohols, and ethers. Application to liquid water. J. Am. Chem. Soc. 103, 335–340.

Kitamura, K., Wakahara, A., Mizuno, H., Baba, Y. & Tomita, K. (1981). Conformationally "concerted" changes in nucleotide structures. A new description using circular correlation and regression analyses. J. Am. Chem. Soc. 103, 3899–3904.

Lavery, R. & Hartmann, B. (1994). Modelling DNA conformational mechanics. Biophys. Chem. 50, 33–45.

Louise-May, S., Auffinger, P. & Westhoff, E. (1996). Calculations of nucleic acid conformations. Curr. Opin. Struct. Biol. 6, 289–298.

MacKerell, A. D., Jr, Wiórkiewicz-Kuczera, J. & Karplus, M. (1995). An all-atom empirical energy function for the simulation of nucleic acids. J. Am. Chem. Soc. 117, 11946–11975.

McLachlan, A. D. (1979). Gene duplications in the structural evolution of chymotrypsin. J. Mol. Biol. 128, 49–79.

Meiering, E. M., Bycroft, M., Lubienski, M. J. & Fersht, A. R. (1993). Structure and dynamics of barnase complexed with 3'-GMP studied by NMR spectroscopy. Biochemistry, 32, 10975–10987.

Mossakowska, D. E., Nyberg, K. & Fersht, A. R. (1989). Kinetic characterization of the recombinant ribonuclease from Bacillus amyloliquefaciens (Barnase) and investigation of key residues in catalysis by site-directed mutagenesis. Biochemistry, 28, 3843–3850.

Norberg, J. & Nilsson, L. (1995). Stacking free energy profiles for all 16 natural ribodinucleoside monophosphates in aqueous solution. J. Am. Chem. Soc. 117, 10832–10840.

Norberg, J. & Nilsson, L. (1996). Conformational free energy landscape of ApApA from molecular dynamics simulations. J. Phys. Chem. 100, 2550–2554.

Olson, W. K. & Sussman, J. L. (1982). How flexible is the furanose ring? 1. A comparison of experimental and theoretical studies. J. Am. Chem. Soc. 104, 270–278.

Pavlovsky, A. G., Borisova, S. N., Strokopytov, B. V., Sanishvili, R. G., Vagin, A. A. & Chepurnova, N. K. (1988). Metabolism and Enzymology of Nucleic Acids Including Gene Manipulations, Plenum Press, New York.

Prévost, M., van Belle, D., Lippens, G. & Wodak, S. J. (1990). Computer simulations of liquid water: treatment of long-range interactions. Mol. Phys. 71, 587–603.

Rubin, J., Brennan, T. & Sundaralingam, M. (1972). Crystal and molecular structure of a naturally occurring dinucleoside monophosphate. Uridylyl-(3'-5')-adenosine hemihydrate. Conformational "rigidity" of the nucleotide unit and models for polynucleotide chain folding. Biochemistry, 11, 3112–3128.

Rushizky, G. W., Greco, A. E., Hartley, R. W. & Sober, H. A. (1963). Studies on B. subtilis ribonuclease. I. Characterization of enzymatic specificity. Biochemistry, 2, 787–793.

Ryckaert, J. P., Ciccotti, G. & Berendsen, H. J. C. (1977). Numerical integration of the cartesian equations of motion of a system with constraints: molecular dynamics of n-alkanes. J. Comp. Phys. 23, 327–341.

Saenger, W. (1984). Editor of Principles of Nucleic Acid Structure. Springer Advanced Text in Chemistry, Springer-Verlag, New York.

Sarma, R. H., Lee, C.-H., Evans, F. E., Yathindra, N. & Sundaralingam, M. (1974). Probing the interrelation between the glycosyl torsion, sugar pucker, and the backbone conformation in C(8) substituted adenine nucleotides by 1H and 1H-{31P} fast Fourier transform nuclear magnetic resonance methods and conformational energy calculations. J. Am. Chem. Soc. 96, 7337–7348.

SAS release 6.10 (1993). SAS Institute Inc., Cary, NC, USA.

Seeman, N. C., Day, R. O. & Rich, A. (1975). Nucleic acid-mutagen interactions: crystal structure of adenylyl-3',5'-uridine plus 9-aminoacridine. Nature, 253, 324–326.

Sevcik, J., Dodson, E. J. & Dodson, G. G. (1991). Determination and restrained least-squares refinement of the structures of ribonuclease Sa and its complex with 3'-guanylic acid at 1.8 Å resolution. Acta Crystallog. sect. B, 47, 240–253.

Spjøtvoll, E. & Stoline, M. R. (1973). An extension of the t-method of multiple comparisons to include the cases with unequal sample size. J. Am. Stat. Assoc. 68, 975–978.

Sundaralingam, M. (1969). Stereochemistry of nucleic acids and their constituents. IV. Allowed and preferred conformations of nucleosides, nucleoside mono-, di-, tri-, tetraphosphates, nucleic acids and polynucleotides. Biopolymers, 7, 821–860.

Sussman, J. L., Seeman, N. C., Kim, S.-H. & Berman, H. M. (1972). Crystal structure of a naturally occurring dinucleoside phosphate: uridylyl 3',5'-adenosine phosphate models for RNA chain folding. J. Mol. Biol. 66, 403–421.

Yathindra, N. & Sundaralingam, M. (1974). Backbone conformations in secondary and tertiary structural units of nucleic acids. Constraint in the phosphodiester conformation. Proc. Natl Acad. Sci. USA, 71, 3325–3328.

Edited by A. R. Fersht

(Received 24 March 1998; received in revised form 18 August 1998; accepted 20 August 1998)