
Proc. Natl. Acad. Sci. USA
Vol. 92, pp. 547-551, January 1995
Chemistry

Forced coalescence phasing: A method for ab initio determination of crystallographic phases

(x-ray crystallography/direct methods/macromolecule phasing/real space filtering/numerical seed coefficients)

WILLIAM B. DRENDEL, RAKHAL D. DAVE*, AND SANJEEV JAIN†

E. A. Doisy Department of Biochemistry and Molecular Biology, St. Louis University School of Medicine, St. Louis, MO 63104

Communicated by Lawrence F. Dahl, University of Wisconsin, Madison, WI, October 11, 1994 (received for review February 16, 1994)

ABSTRACT A method has been developed for ab initio determination of crystallographic phases. This technique, called forced coalescence phasing (FCP), is implemented on a computer and uses an automated iterative procedure that combines real space filtering with numerically seeded Fourier transforms to solve the crystallographic phase problem. This approach is fundamentally different from that of traditional direct methods of phasing, which rely on structure invariant probabilistic phase relationships. In FCP, the process begins with an appropriate set of atoms randomly distributed throughout the unit cell. In subsequent cycles of the program, these atoms undergo continual rearrangements, ultimately forming the correct molecular structure(s) consistent with the observed x-ray data. In each cycle, the molecular rearrangement is directed by an electron density (Fourier) map calculated using specially formulated numerical seed coefficients that, along with the phase angles for the map, are derived from the arrangement of atoms in the preceding cycle. The method has been tested using actual x-ray data from three organic compounds. For each data set, 100 separate phase determination trials were conducted, each trial beginning with a different set of randomly generated starting phases. Correct phase sets were successfully determined in all of the trials, with most trials requiring fewer than 50 cycles of the FCP program. In addition to its effectiveness in small molecule phase determination, FCP offers unexplored potential in the application of real-space methods to ab initio phasing of proteins and other macromolecule structures.

X-ray crystallography has become a widely used technique for studying the molecular structures of materials ranging from simple inorganic compounds to proteins and even viruses. Its sophistication is accompanied, however, by a number of challenging obstacles. Among the more demanding of these is the so-called phase problem, in which the phase angles of the Fourier transform equation must be determined before it can be used to calculate the electron density map of a molecule from the diffraction data. The phase angles cannot be experimentally measured and must, therefore, be derived by other means. For most small molecules, this can be done by computer, using ab initio methods that rely on structure invariant probabilistic phase relationships, but larger molecules, such as proteins, require much more laborious efforts, including heavy atom derivatization of crystals and extensive additional data collection. To overcome such difficulties, we have sought alternative solutions to the phase problem. We report here the development of a method for ab initio determination of crystallographic phases, which we call forced coalescence phasing (FCP). Conceptually, FCP is a fairly straightforward scheme that combines numerical methods with several commonly used crystallographic procedures to produce a surprisingly effective ab initio phasing routine. In our trial studies with structures ranging in size from 19 to 33 atoms, FCP appears to rival existing ab initio phasing methods in speed, simplicity, and reliability. Our results suggest that FCP could be effective for larger problems and may even hold promise for the ab initio phasing of protein structures.

FCP is a purely computational approach to crystallographic phase determination. The only experimental data required are the unit cell parameters of the crystal, the chemical formula of the unit cell, and the x-ray diffraction data, i.e., the observed structure factor amplitudes (Fo). The name forced coalescence phasing derives from the ability of this method to force a hypothetical random distribution of electron density to gradually coalesce into a correctly phased structure consistent with the experimentally measured x-ray diffraction data. FCP uses an iterative numerical procedure to determine the correct phases. In its current implementation, the process begins with an appropriate set of atoms randomly distributed throughout the unit cell in accordance with the atomic peaks of a randomly phased electron density map. In succeeding cycles of the FCP program, these atoms undergo continual rearrangements as they gradually coalesce to form the correct molecular structure(s). These molecular rearrangements are carried out by an automated atomic map-fitting procedure, in which the atoms are assigned to the peaks in an electron density map. The atoms are free to undergo substantial positional changes from one cycle to the next, as the peaks to which they are assigned appear or disappear in different regions of the map. The electron density maps are calculated using a Fourier transform equation (Fig. 1, Eq. a) whose terms contain special numerical seed coefficients, which are derived from both the Fo values and a set of calculated structure factor amplitudes (Fc). These Fc values and the phases used in calculating the electron density maps are derived from the arrangement of atoms in the preceding cycle and in fact are simply the Fourier transform of that arrangement (Fig. 1, Eq. b). The numerical seed coefficients are formulated to gradually pull the Fc values into closer agreement with the corresponding Fo values, which, combined with the constraints imposed by atomic map fitting, ultimately produces a correct molecular structure consistent with the observed x-ray data. Phasing and structure determination are, therefore, interdependent processes in FCP. A more detailed description of the steps involved in FCP, along with our results from tests with the actual x-ray data from several trial compounds, is presented below.

METHODS

Random Starting Phases. A flow chart of the FCP routine is shown in Fig. 2. In the first step, the Fo values are combined with a set of starting phases in the Fourier transform equation of Fig. 1, Eq. a to produce an initial electron density map.

Abbreviation: FCP, forced coalescence phasing.
*Present address: Olsen and Associates Research Institute for Applied Economics, CH-8008 Zurich, Switzerland.
†To whom reprint requests should be addressed.


The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.


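(a)

$$\rho(x, y, z) = \frac{1}{V} \sum_{h} \sum_{k} \sum_{l} |F_{hkl}| \cos 2\pi (hx + ky + lz - \alpha_{hkl})$$

(b)

$$|F_{hkl}| = \sqrt{A_{hkl}^{2} + B_{hkl}^{2}}, \qquad \alpha_{hkl} = \tan^{-1}\!\left(\frac{B_{hkl}}{A_{hkl}}\right)$$

where

$$A_{hkl} = \sum_{j} f_j \cos 2\pi (hx_j + ky_j + lz_j), \qquad B_{hkl} = \sum_{j} f_j \sin 2\pi (hx_j + ky_j + lz_j)$$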

FIG. 1. Crystallographic Fourier transform equations. The use of x-ray crystallography to determine molecular structures is based on the diffraction properties of crystals. A crystal can be viewed conceptually as a three-dimensional array of identical building blocks or repeating units called unit cells, each of which contains one or more molecules of the substance of interest. When a crystal is placed in an x-ray beam, the x-rays are scattered in all directions by the electrons of the atoms in the crystal. The crystalline lattice causes this scattering to form discrete x-ray beams, called reflections, that emerge from the crystal in many different directions. When these reflections strike an x-ray film or other imaging plane, they produce the familiar grouping of spots known as a diffraction pattern. The raw x-ray data consist of the measured intensities (Ihkl) of these reflections, where the indices (h, k, and l) uniquely identify each reflection. The Ihkl values are essentially a sampling of the x-rays scattered by the electrons in each unit cell of the crystal and, therefore, contain detailed information about the spatial arrangement of the atoms whose electrons produced the scattered x-rays. This information can be extracted using the Fourier transform in Eq. a, where ρ is the electron density at any point (x, y, z) in the unit cell and V is the unit cell volume. Each term in the summation on the right side of Eq. a represents a longitudinal sinusoidal electron density wave. The amplitudes of these waves, designated as |Fhkl| in Eq. a and referred to as Fo values in the text, come directly from the diffraction data. Each |Fhkl| is proportional to the square root of the corresponding reflection intensity, Ihkl. For each wave, the harmonic wavenumber and the orientation of the planar wave front are indicated by the hkl indices from the diffraction data. The phase angle of each wave relative to the origin of the unit cell is 2παhkl. The FCP method described in this report is designed to determine these phase angles, which are critical for calculating an electron density map of the unit cell but are missing from the diffraction data. In the FCP program, Eq. a is implemented as a fast Fourier transform and is used to generate a new electron density map in each cycle. In this case, the |Fhkl| values are seed coefficients based on a combination of the Fo and Fc values (see text). Eq. b defines the inverse Fourier transformation of Eq. a. These equations are used in each iteration to generate Fc and αc values (designated in Eq. b as |Fhkl| and αhkl, respectively) from the current arrangement of atoms. xj, yj, and zj are the coordinates for each of the j atoms in the unit cell. The scattering factors (fj) are known quantities corresponding to the x-ray scattering power of each atom.
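As a rough illustration of Eqs. a and b, the following Python sketch evaluates both by direct summation. It is not the FCP program, which uses a fast Fourier transform; the function names, the tuple layout for atoms, and the choice to store phases in cycles (so that they can be placed directly inside the 2π factor of Eq. a) are assumptions made for this example.

```python
import math

def structure_factor(atoms, h, k, l):
    """Eq. b: return (|F_hkl|, alpha_hkl) for atoms given as (f_j, x_j, y_j, z_j).

    Coordinates are fractional. The phase is returned in cycles
    (fractions of 2*pi), an assumption of this sketch.
    """
    a_sum = sum(f * math.cos(2 * math.pi * (h * x + k * y + l * z))
                for f, x, y, z in atoms)
    b_sum = sum(f * math.sin(2 * math.pi * (h * x + k * y + l * z))
                for f, x, y, z in atoms)
    amplitude = math.hypot(a_sum, b_sum)
    # atan2 resolves the quadrant that a bare arctangent of B/A would lose.
    alpha = math.atan2(b_sum, a_sum) / (2 * math.pi)
    return amplitude, alpha

def density(reflections, x, y, z, volume=1.0):
    """Eq. a by direct summation; reflections maps (h, k, l) -> (amplitude, alpha)."""
    total = 0.0
    for (h, k, l), (amp, alpha) in reflections.items():
        total += amp * math.cos(2 * math.pi * (h * x + k * y + l * z - alpha))
    return total / volume

# Hypothetical one-atom cell: the map should peak at the atom position.
atoms = [(6.0, 0.25, 0.5, 0.1)]
indices = range(-2, 3)
reflections = {(h, k, l): structure_factor(atoms, h, k, l)
               for h in indices for k in indices for l in indices}
peak = density(reflections, 0.25, 0.5, 0.1)
elsewhere = density(reflections, 0.6, 0.1, 0.7)
```

Direct summation scales poorly with the number of reflections and grid points, so it is suitable only for small illustrations like this one.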

Although the starting phases may in principle be based on whatever phase information is available, such information is by definition completely absent in ab initio phase determination. To ensure the most cogent testing of our method, therefore, we have conducted all of our trials using randomly generated starting phases.
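A minimal sketch of how a set of random starting phases could be generated is given below. The text does not say how the random phases were drawn; a uniform distribution over one full cycle, the function name, and the optional seed argument are assumptions of this example.

```python
import random

def random_start_phases(hkl_list, seed=None):
    """Assign an independent uniform random phase to each reflection.

    Phases are in cycles (0 <= alpha < 1), i.e., fractions of 2*pi.
    The seed argument is only there to make a trial reproducible.
    """
    rng = random.Random(seed)
    return {hkl: rng.random() for hkl in hkl_list}

# Hypothetical list of reflection indices for one trial.
hkls = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0)]
trial_1 = random_start_phases(hkls, seed=1)
trial_2 = random_start_phases(hkls, seed=2)
```

Using a different seed for each trial gives each trial its own starting phase set, as in the trials described in the text.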

FIG. 2. Flow chart of the steps involved in FCP. Only the general scheme is shown. The type of real space filtering used and the particular formulation for seed coefficients are open to experimentation.

Real Space Filtering. Once an electron density map is generated, the next step is to modify the electron density map in accordance with certain known topological characteristics of the molecular contents of the unit cell. This technique, referred to as real space filtering, has also been used in several other phase refinement and phase determination schemes, where it serves primarily to impose constraints on the phases. In FCP it also plays a second critical and distinctive role by supplying the Fc values needed in the calculation of numerical seed coefficients. Various forms of real space filtering have been developed, based on several different topological properties, such as solvent content (1), electron density histograms (2), and the atomic makeup of the unit cell (3-5). In the current implementation of FCP, real space filtering is based on the chemical formula of the unit cell and is accomplished by using a common crystallographic procedure known as map fitting, in which all of the atoms from the known content of the unit cell are placed into the electron density map. An automated peak-picking algorithm is used to select appropriate atomic positions in the map. Peaks in the electron density map are ranked according to their relative magnitudes and then the atoms, ranked by their atomic numbers, are assigned to these peaks in a corresponding fashion, such that atoms with the highest atomic numbers are positioned at the highest peaks. This requires high-resolution (≈1 Å) x-ray data so that each atom will have a discrete peak in the map. As peaks are selected, an optional distance constraint may be imposed to prevent atoms from being placed too close to one another. Ideally, all of the atoms should be used during map fitting; however, we have performed experiments that indicate that phases can be determined successfully using only 60-70% of the atoms. This could be important in cases where the contents of the unit cell are not completely known.
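The assignment step of map fitting can be sketched as follows. This is an illustrative reading of the ranking rule described above, not the authors' code: the peak list is assumed to have been extracted already, the cell is assumed orthogonal so that distances are easy to compute, and all names are hypothetical.

```python
import math

def fit_atoms_to_peaks(peaks, atomic_numbers, cell_edges, min_dist=1.1):
    """Assign atoms to map peaks: heaviest atom to highest peak.

    peaks          -- list of (height, (x, y, z)) with fractional coordinates
    atomic_numbers -- one entry per atom to be placed
    cell_edges     -- (a, b, c) in angstroms; an orthogonal cell is assumed
    min_dist       -- optional distance constraint in angstroms (None disables it)
    """
    def distance(p, q):
        # Minimum-image distance in an orthogonal periodic cell.
        sq = 0.0
        for u, v, edge in zip(p, q, cell_edges):
            d = (u - v) % 1.0
            d = min(d, 1.0 - d)
            sq += (d * edge) ** 2
        return math.sqrt(sq)

    atoms = sorted(atomic_numbers, reverse=True)
    placed = []
    for height, pos in sorted(peaks, key=lambda p: p[0], reverse=True):
        if len(placed) == len(atoms):
            break
        if min_dist is not None and any(distance(pos, q) < min_dist for _, q in placed):
            continue  # too close to an atom that is already placed
        placed.append((atoms[len(placed)], pos))
    return placed

# Hypothetical peaks: the second-highest peak sits 0.2 angstrom from the highest.
peaks = [(5.0, (0.1, 0.1, 0.1)), (9.0, (0.5, 0.5, 0.5)),
         (8.0, (0.52, 0.5, 0.5)), (3.0, (0.9, 0.2, 0.3))]
placed = fit_atoms_to_peaks(peaks, [6, 17, 8], (10.0, 10.0, 10.0))
```

In this example the chlorine (Z = 17) takes the highest peak, the neighboring peak is rejected by the 1.1 Å constraint, and the oxygen and carbon take the next two acceptable peaks.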

Rationale for Use of Seed Coefficients. When map fitting is complete, inverse Fourier transformation (Fig. 1, Eq. b) is applied to the resulting arrangement of atoms to generate another set of Fc values and calculated phases (αc). These Fc and αc values are used in calculating an electron density map for the next round of map fitting. The phases used in making this map are simply the αc values themselves, while the coefficients for the map are numerical seed coefficients that incorporate the Fo and the Fc values. The use of numerical seeding is intended to produce a map that, after fitting, results in an improved set of Fc values that are closer to their corresponding Fo values. After a sufficient number of iterations, the seeding process should reach convergence, where the Fc values are approximately equal to the Fo values. Our rationale for accomplishing this is based on the fact that for each coefficient the value of Fc is significantly influenced by its seed value, Fs. More specifically, when a given Fc value is less than its corresponding Fo value, the new Fs value for that coefficient should be greater than Fc. This will tend to pull the value of Fc in the direction of Fo. By the same reasoning, if Fc is greater than Fo, then the new Fs should be less than Fc. This simplistic seeding postulate, diagrammed in Fig. 3, is not rigorous and is not necessarily expected to yield an optimal seed value for every coefficient in a given cycle of the program, but over a series of iterations, it turns out to be very effective.

Formulation of Seed Coefficients. Values of Fs are only loosely constrained by this elementary approach to seeding, giving rise to an endless variety of possible seed formulations based on both linear and nonlinear functions of Fo and Fc. For the trials presented in this report, we experimented with two main types of seed coefficients: difference seeds and proportional seeds. Difference seeds are formulated according to the equation Fs = Fo + μ(Fo − Fc), for real numbers μ > 0. The value of μ may be kept constant or permitted to vary for each coefficient in each iteration. In the latter case, the constraint on μ is relaxed to μ > −1. The best results have been obtained with a constant μ, though both options are effective. Our experience suggests that seeds with μ values of 2 to 3 are optimal for rapid and reliable convergence, but the range of acceptable μ values may depend in part on the type of real space filtering that is used. Difference seeds are based on traditional crystallographic difference coefficients, which have been around for many years and are commonly used for refinement and completion of known structures that have already been phased by standard methods. In these roles, it is more customary to view difference coefficients not from the perspective of satisfying the seeding postulate, but rather for their well-defined effect on electron density maps, which is to remove density from areas where it has been incorrectly placed, while appropriately enhancing density at locations where it was missing. We have not yet studied the precise behavior of electron density maps in FCP, but the use of difference coefficients in FCP is at least operationally similar to their use in structure refinement. It is therefore rather surprising from a historical perspective that their value in ab initio phasing was not previously explored or even anticipated.

[Figure 3: panels a and b, each showing Fo and Fc as horizontal lines, an arrow labeled Fs, and dashed seed levels labeled μ = −1, 0, 1, 2, and 3.]

FIG. 3. Diagram of the rationale for generating numerical seed coefficients. The arrow labeled Fs depicts the range of potential seed values when Fo > Fc (a) and Fo < Fc (b). The actual value of Fs would depend on the particular type of seed coefficient being used. Fo and Fc are shown as solid horizontal lines. Dashed lines represent seeds generated by various positive integer values of μ in the equation Fs = Fo + μ(Fo − Fc). The basic seeding postulate (see text) is satisfied by any real number μ, where μ > −1.

Whereas difference seeds are derived from the quantity (Fo − Fc), proportional seeds make use of the ratio Fo/Fc and are generated by the expression Fs,new = Fs,prev(Fo/Fc), where Fs,new is the new seed and Fs,prev is the seed from the previous cycle. Proportional seeds are not based on any commonly used coefficients but are somewhat similar in form to the β-gen coefficient, Fo²/Fc, studied by Ramachandran (6). We find that FCP trials with proportional seeds exhibit a significantly lower success rate than those with difference seeds and generally require a greater number of iterations to achieve convergence.
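The two seed formulations can be written as one-line functions, shown in the sketch below. For difference seeds, Fs − Fc = (1 + μ)(Fo − Fc), so the seed lies on the same side of Fc as Fo whenever μ > −1, which is the seeding postulate. The function names are hypothetical, and the text does not say how negative seed values or Fc values of zero are handled, so neither case is treated here.

```python
def difference_seed(fo, fc, mu=3.0):
    """Difference seed Fs = Fo + mu*(Fo - Fc); mu = 3 gives 4Fo - 3Fc."""
    return fo + mu * (fo - fc)

def proportional_seed(fs_prev, fo, fc):
    """Proportional seed Fs,new = Fs,prev * (Fo / Fc); requires fc != 0."""
    return fs_prev * (fo / fc)

# Hypothetical amplitudes illustrating the seeding postulate.
seed_when_fc_low = difference_seed(10.0, 8.0)    # Fc < Fo, so the seed exceeds Fc
seed_when_fc_high = difference_seed(10.0, 12.0)  # Fc > Fo, so the seed is below Fc
seed_2fo_minus_fc = difference_seed(10.0, 8.0, mu=1.0)
seed_proportional = proportional_seed(10.0, 10.0, 8.0)
```

With Fo = 10 and Fc = 8, the μ = 3 seed is 16 and the μ = 1 seed (2Fo − Fc) is 12; larger μ values push the seed further past Fo.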

Monitoring Convergence. In the final step of the iterative loop of the FCP program, the Fourier transform Eq. a (Fig. 1) is used to produce an electron density map from the seed coefficients and αc values. The process then returns to the real space filtering routine to continue with additional cycles of the program. As the program proceeds, the convergence between the Fc and Fo values is monitored using the conventional crystallographic R factor, defined as:

$$R = \frac{\sum \bigl|\, |F_o| - |F_c| \,\bigr|}{\sum |F_o|}$$

For high-resolution data, R factors <0.25 are generally indicative of a correct structure and program execution is usually stopped when the R factor drops into the 0.15-0.20 range, although FCP can routinely achieve R factors as low as 0.10 with only a few additional program cycles.
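A short sketch of the R factor calculation follows. The optional scale argument is an assumption of this example; the footnote to Table 1 mentions a global scale factor and a shell scaling routine but does not describe how either is computed.

```python
def r_factor(fo, fc, scale=1.0):
    """Conventional R factor: sum(| |Fo| - scale*|Fc| |) / sum(|Fo|).

    fo, fc -- equal-length sequences of observed and calculated amplitudes
    scale  -- multiplier applied to Fc before comparison (assumed, default 1)
    """
    numerator = sum(abs(abs(o) - scale * abs(c)) for o, c in zip(fo, fc))
    denominator = sum(abs(o) for o in fo)
    return numerator / denominator

# Hypothetical amplitudes: two of three reflections are off by 2 units.
r_example = r_factor([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Here the numerator is 4 and the denominator is 60, so R is about 0.067.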

Correctness of Structures. As an added precaution in confirming the validity of FCP, we have routinely verified that the low R factors obtained by FCP convergence are indeed correlated with correct structures. Computer graphics were used to visually compare the molecular structures determined by FCP with the known structures obtained by conventional phasing and refinement methods. After accounting for translations of the unit cell origin and, when necessary, enantiomeric inversions, the FCP structures were found to match the known structures almost perfectly. All of the atom types were correctly assigned and the atomic coordinates had rms deviations of <0.1 Å. The atomic coordinates generated by FCP can be used directly for structure refinement.
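One way such a comparison could be made numerically is sketched below. This is not the procedure used in the text, where structures were compared visually. The sketch assumes an orthogonal cell, atoms already listed in matching order, and an origin shift estimated from the first atom pair and refined by the mean residual; all names are hypothetical.

```python
import math

def rms_deviation(model, reference, cell_edges, inverted=False):
    """RMS distance (angstroms) between matched atoms after an origin shift.

    model, reference -- equal-length lists of fractional (x, y, z), same order
    cell_edges       -- (a, b, c) of an orthogonal cell (assumed)
    inverted         -- if True, compare the enantiomer (-x, -y, -z) of the model
    """
    sign = -1.0 if inverted else 1.0
    moved = [tuple(sign * c for c in atom) for atom in model]

    def wrap(d):
        # Map a fractional difference into [-0.5, 0.5).
        return (d + 0.5) % 1.0 - 0.5

    total = 0.0
    for axis, edge in enumerate(cell_edges):
        # Rough shift from the first atom pair, refined by the mean residual.
        shift = reference[0][axis] - moved[0][axis]
        resid = [wrap(r[axis] - m[axis] - shift) for m, r in zip(moved, reference)]
        mean = sum(resid) / len(resid)
        total += sum(((d - mean) * edge) ** 2 for d in resid)
    return math.sqrt(total / len(model))

# Hypothetical check: the same structure with a shifted origin, then its enantiomer.
reference = [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6), (0.7, 0.1, 0.9)]
shifted = [(0.35, 0.7, 0.2), (0.65, 0.0, 0.5), (0.95, 0.6, 0.8)]
mirrored = [tuple((-c) % 1.0 for c in atom) for atom in shifted]
cell = (10.0, 10.0, 10.0)
```

Comparing both hands of the model and keeping the smaller value handles the enantiomeric ambiguity mentioned above.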

Current Implementation. During the development of FCP, we experimented with several types of seed coefficients and real space filtering techniques, by testing the effectiveness of various schemes on a series of hypothetical model compounds and their corresponding artificial x-ray data. Although we obtained encouraging results with a number of different approaches, the combination of difference seeds and atomic map fitting proved so successful that this form of FCP has become the primary focus of our current efforts and was the only method used for the trials presented in this report. The current implementation of FCP treats all problems with space group P1 symmetry and imposes no symmetry constraints on the Fc and αc values or atomic coordinates.
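The overall cycle described in this section can be outlined as a loop over interchangeable steps. The skeleton below is only a schematic reading of the flow chart in Fig. 2: each step is passed in as a callable, and the function name, argument names, and default stopping values (R < 0.25 within 200 cycles, taken from the criteria reported in Table 1) are assumptions of this sketch.

```python
def fcp_cycle_loop(fo, start_phases, make_map, fit_atoms, transform,
                   seed, r_factor, r_stop=0.25, max_cycles=200):
    """Run FCP-style cycles until R < r_stop or max_cycles is reached.

    fo           -- dict {hkl: observed amplitude}
    start_phases -- dict {hkl: starting phase}
    make_map     -- callable(coeffs, phases) -> map          (Fig. 1, Eq. a)
    fit_atoms    -- callable(map) -> atoms                   (real space filtering)
    transform    -- callable(atoms) -> (fc dict, phase dict) (Fig. 1, Eq. b)
    seed         -- callable(fo_value, fc_value) -> seed coefficient
    r_factor     -- callable(fo dict, fc dict) -> float
    """
    coeffs, phases = dict(fo), dict(start_phases)
    atoms, r = None, None
    for cycle in range(1, max_cycles + 1):
        density_map = make_map(coeffs, phases)
        atoms = fit_atoms(density_map)
        fc, phases = transform(atoms)
        r = r_factor(fo, fc)
        if r < r_stop:
            return atoms, r, cycle
        # Seed coefficients for the next map combine Fo with the new Fc.
        coeffs = {hkl: seed(fo[hkl], fc[hkl]) for hkl in fo}
    return atoms, r, max_cycles

# Toy stand-ins (not crystallography) that only exercise the control flow.
toy_fo = {(1, 0, 0): 10.0}
result = fcp_cycle_loop(
    toy_fo, {(1, 0, 0): 0.0},
    make_map=lambda coeffs, phases: coeffs,
    fit_atoms=lambda density_map: density_map,
    transform=lambda atoms: ({k: 0.6 * v for k, v in atoms.items()},
                             {k: 0.0 for k in atoms}),
    seed=lambda fo_v, fc_v: fo_v + 1.0 * (fo_v - fc_v),
    r_factor=lambda fo_d, fc_d: sum(abs(fo_d[k] - fc_d[k]) for k in fo_d)
                                / sum(fo_d.values()),
)
```

Passing the steps in as callables mirrors the statement in Fig. 2 that the type of real space filtering and the seed formulation are open to experimentation.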

RESULTS AND DISCUSSION

Testing Conditions. To provide a rigorous evaluation of the power of FCP, we tested the method by using actual x-ray data sets collected on crystals of three organic compounds whose molecular structures had been determined by conventional phasing methods. The compounds crystallized either in space group P2₁ or P2₁2₁2₁ with unit cells containing from 56 to 132 nonhydrogen atoms. For each data set, 100 phase determination trials were conducted, each trial beginning with a different set of randomly generated starting phases. All of the trials used difference seeds with a μ value of 3, i.e., Fs = 4Fo − 3Fc.

Success Rate and Statistics. The test results are shown in Table 1. Correct phase sets were determined in all of the 300 trials. The number of cycles needed for phasing ranged from 11 to 199, with most trials requiring <50 cycles of the FCP program. Program running times typically ranged from 1 to 7 min [on a 100 million instructions per sec (MIPS) computer], depending on the size of the data set and the number of atoms in the unit cell. In the best case, coordinates of all the nonhydrogen atoms were obtained in <30 sec. The program can be initiated with little or no user input or decision making and requires no intervention by the user throughout the entire phasing process. Although compound 1 was the smallest problem, with 56 atoms in the unit cell, it turned out to be the most challenging, requiring an average of 61.1 cycles to reach convergence vs. 25.6 cycles for compound 2 and 45.2 cycles for compound 3. This is probably due to the presence of Cl and S atoms in compounds 2 and 3; Cl and S have greater x-ray scattering power than the other atoms in these compounds and, therefore, produce stronger peaks in the electron density maps. This increases the contrast of the maps, which may help to constrain the phases. In compound 1, the scattering factors of all the atoms (C, N, and O) are similar to one another, thus reducing the contrast of the electron density maps.

Table 1. Results of FCP trials using actual x-ray data sets with random starting phases

| Parameter | Compound 1 | Compound 2 | Compound 3 |
|---|---|---|---|
| Space group | P2₁ | P2₁2₁2₁ | P2₁2₁2₁ |
| Chemical formula of the unit cell | C46N2O8 | C44O20Cl12 | C100B4N4O20S4 |
| No. of nonhydrogen atoms in the asymmetric unit (and unit cell) | 28 (56) | 19 (76) | 33 (132) |
| No. of reflections (in hemisphere) | 2793 | 2754 | 5129 |
| High-resolution cutoff for x-ray data, Å | 0.9 | 1.0 | 1.0 |
| % of trials requiring <200 cycles (success rate) | 100 | 100 | 100 |
| % of trials requiring <50 cycles | 53 | 97 | 68 |
| No. of cycles required, average | 61.1 | 25.6 | 45.2 |
| No. of cycles required, median | 47 | 24 | 36 |
| No. of cycles required, range | 16-199 | 11-57 | 13-111 |
| Cycle time, sec | 2.3 | 2.7 | 12 |

Results of 100 FCP phase determination trials conducted for three compounds, by using actual x-ray data and random starting phases, are shown. The compounds crystallized in the space groups indicated, but in each case the data were expanded to correspond to space group P1 for use in FCP. (4Fo − 3Fc) difference seeds were used for all trials shown in the table. In each trial, the program automatically switched to (2Fo − Fc) difference seeds after the R factor dropped to <0.35. In the table, the number of cycles refers to the number of FCP iterations required for the R factor to drop to <0.25. By definition 50% of the trials required fewer cycles than the number reported as the median. The Fc values were adjusted in each cycle by using a shell scaling routine whose parameters were optimized for each data set before the series of trials was begun. R factors were calculated by using the best global scale factor. A distance constraint of 1.1 Å was imposed during map fitting, and all atoms were assigned a global isotropic temperature factor (B), which can be determined from a Wilson plot. The cycle times shown are for the current version of the FCP program run on a DEC Alpha AXP 3000 workstation, which is rated at 100 million instructions per sec (100 MIPS).
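The seed switching rule noted under Table 1, (4Fo − 3Fc) seeds until the R factor drops below 0.35 and (2Fo − Fc) seeds afterward, corresponds to changing μ from 3 to 1. A small sketch follows; treating the switch as permanent once triggered is an assumption, since the behavior after a later rise in R is not stated, and the function name is hypothetical.

```python
def seed_schedule(r_history, mu_start=3.0, mu_late=1.0, r_switch=0.35):
    """Yield the mu to use after each cycle, given that cycle's R factor.

    The switch to mu_late is kept once R has dropped below r_switch,
    which is an assumption of this sketch.
    """
    switched = False
    for r in r_history:
        if r < r_switch:
            switched = True
        yield mu_late if switched else mu_start

# R factors (as fractions) taken from the five cycles labeled in Fig. 4.
fig4_r_values = [0.572, 0.541, 0.507, 0.376, 0.127]
mu_values = list(seed_schedule(fig4_r_values))
```

For these values, μ stays at 3 through the cycle with R = 0.376 and changes to 1 only once R has fallen to 0.127.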

Convergence Behavior. The convergence to correct structures in these trials displayed an interesting and characteristic pattern of behavior (Fig. 4). All of the trials began with R factors of ≈0.57, which is the theoretically predicted value for a random distribution of atoms in noncentrosymmetric unit cells. After dropping fairly quickly in the early cycles, the R factors usually reached a plateau at around 0.50-0.52, where they fluctuated mildly for most of the remaining cycles in each trial. The time spent in the plateau region varied considerably from one trial to another, but at some point the R factors invariably made a sudden and precipitous drop, reaching very low values in just a few cycles. The reason for this behavior is not yet clear, and trials with other seed coefficients exhibited a somewhat different pattern. When proportional seeds or (3Fo − 2Fc) difference seeds were used, R factors fell rapidly into the 0.40-0.42 range and then fluctuated for various lengths of time before dropping sharply.

Comparison with Existing Methods. A recently introduced modification to the classical direct methods, referred to by its authors as the Shake-and-Bake method (3, 4), also incorporates an iterative real space filter. However, Shake-and-Bake takes a fundamentally different approach to the phase problem, relying heavily on traditional structure invariant phase relationships instead of using seed coefficients. Although sufficient data are not yet available to establish the ultimate potential of FCP, the current results do show that at least for smaller structures, FCP can provide a simple, fast, and reliable complement to existing phasing methods. Eventually, it may be possible to combine the power of FCP with the Shake-and-Bake method in a hybrid approach.

Future Potential. The test results presented here indicate that relatively small structures of 35 atoms or so are well within the effective range of FCP. We have also conducted successful trials (not reported here) on artificial data sets generated from model compounds containing >100 atoms. While our results clearly show that FCP is useful in small molecule phase determination, more significant is its unexplored potential in ab initio phase determination for macromolecule structures. In a recent study, Mukherjee and Woolfson (7) have concluded that conventional direct methods "have only a very limited contribution to make to protein crystallography" and that "new ideas, perhaps coupled to the use of real-space methods" will be needed for future progress. Because FCP is a promising real-space method, therefore, it is important to investigate its potential in macromolecule phasing. However, to accomplish this it will first be necessary to extend the applicability of our method to lower-resolution (2-3 Å) data, which are more commonly available for proteins and other macromolecules. This will require a more sophisticated routine for fitting low-resolution electron density maps, which will also address the concern that atomicity by itself is probably not a sufficient constraint for phase determination using low-resolution data (ref. 8, and W.B.D., R.D.D., & S.J., unpublished results).

[Figure 4: five panels showing the atomic arrangement at successive cycles. Panel labels (cycle no.; R value, %; no. of bonds of 1.0-1.65 Å): cycle 1, 57.2, 50; cycle 10, 54.1, 60; cycle 20, 50.7, 60; cycle 25, 37.6, 89; cycle 30, 12.7, 110.]

FIG. 4. Illustration of FCP forcing a random arrangement of atoms to coalesce into the correct molecular structure. The results of an FCP trial for compound 3 are shown. The number of bonds found in the range of 1.0 to 1.65 Å is a measure of the degree of order in the atomic arrangement. Bonds to atoms in adjacent unit cells are not shown. In the first cycle, the atoms are arranged according to the peaks in a randomly phased Fo map. Note that in the early cycles, the bonding pattern and atomic positions are incorrect. Even after 20 cycles, the R factor remains >50% and the structure still appears essentially random. Correct bonding and atomic positions do not become apparent until cycle 25 at an R factor of 37.6%. Only after cycle 20 does the R factor begin to drop sharply, reaching 12.7% by cycle 30. At this point the structural arrangement of the atoms is correct throughout the unit cell.


We thank Dr. Douglas Powell (Department of Chemistry, University of Wisconsin-Madison) for providing the data sets used in testing the FCP method and the Elsa U. Pardee Foundation for partial financial support.

1. Wang, B. C. (1985) Methods Enzymol. 115, 90-112.
2. Zhang, K. Y. J. & Main, P. (1990) Acta Crystallogr. A 46, 41-46.
3. Weeks, C. M., DeTitta, G. T., Miller, R. & Hauptman, H. A. (1993) Acta Crystallogr. D 49, 179-181.
4. Miller, R., DeTitta, G. T., Jones, R., Langs, D. A., Weeks, C. M. & Hauptman, H. A. (1993) Science 259, 1430-1433.
5. Wilson, C. & Agard, D. A. (1993) Acta Crystallogr. A 49, 97-104.
6. Ramachandran, G. N. (1964) Advanced Methods of Crystallography (Academic, New York).
7. Mukherjee, M. & Woolfson, M. M. (1993) Acta Crystallogr. D 49, 9-12.
8. Baker, D., Krukowski, A. E. & Agard, D. A. (1993) Acta Crystallogr. D 49, 186-192.
