A robust approach to protein foldability measures based on spin-glass models

Citation: Tapon Roy, Journal of Mathematical Physics 42 (9), 4283 (2001); doi: 10.1063/1.1379746
View online: http://dx.doi.org/10.1063/1.1379746
Published by AIP Publishing


Tapon Roy

Boehringer Ingelheim Pharmaceuticals, 900 Ridgebury Rd., Ridgefield, Connecticut 06877

(Received 30 October 2000; accepted for publication 14 March 2001)

Spin-glass models and related methods have been applied to protein folding problems, often by assuming an underlying Gaussian distribution for the energy level distribution. In this paper, we derive robust foldability measures that relax the Gaussian distribution assumption implicit in current foldability and energy gap measures. © 2001 American Institute of Physics. [DOI: 10.1063/1.1379746]

I. INTRODUCTION

Spin-glass models and their analogs have been applied to problems involving protein structures$^{1}$ and protein folding.$^{2-4}$ In particular, spin-glass models have been used to determine whether proteins can be characterized as to their foldability potential.$^{2}$ Spin glasses are magnetic systems whose periodicity (translational invariance) is broken by "frozen randomness," characterized by metastable states with essentially indeterminate relaxation times to a stable phase. Spin glasses can be alloys of a nonmagnetic atom of a noble metal and a magnetic atom of a transition metal (like manganese or iron). Nonstoichiometric ternary alloys exhibiting a periodic crystalline structure, but with the magnetic atoms randomly scattered through the lattice sites, are other types of spin glasses, as are noncrystalline alloys of aluminum and gadolinium, where the atoms are in random spatial positions. Spin glasses all display similar thermodynamic behavior, notably, singularities at a critical temperature.$^{5}$

II. SPIN-GLASS MODELS AND THE RANDOM ENERGY MODEL

The Sherrington–Kirkpatrick (SK) spin-glass model$^{6}$ is an Ising model in which spins are coupled by infinite-ranged random interactions. Their probability density is taken to be Gaussian in the conventional formulation. Following Petritis,$^{5}$ the SK model is of the mean-field type (the strength of the interactions between interacting magnetic atoms retains the same magnitude throughout the material) that is defined over sites $\Lambda_N = \{1,\ldots,N\}$ with dual lattice $\Lambda_N^{*}$ of $\Lambda_N$ having a complete graph $K_N = \{\{i,j\}: i \in \Lambda_N, j \in \Lambda_N, i \neq j\} = \{\{i,j\}: i \in \Lambda_N, j \in \Lambda_N, i < j\}$ over $N$. Now consider a group of independent, centered, Gaussian random variables with variance 1: $(J_{ij})_{\{i,j\} \in \Lambda_N^{*}}$, indexed by $\Lambda_N^{*}$. The interaction energy (Hamiltonian) of this model is

$$I_N(\sigma) = -\frac{1}{\sqrt{N}} \sum_{\{i,j\} \in \Lambda_N^{*}} J_{ij}\,\sigma_i \sigma_j. \tag{1}$$

The sum extends over $|\Lambda_N^{*}| = N(N-1)/2$ terms, and the normalization is in $\sqrt{N}$. The use of the normal distribution is for computational convenience, and the central limit theorem is far from being applicable in this case. Formulating (1) as a time-dependent stochastic process, we obtain the standard derivation:


$$I_N(t,\sigma) \equiv -\frac{1}{\sqrt{N}} \sum_{\{i,j\} \in \Lambda_N^{*}} B_{ij}(t)\,\sigma_i \sigma_j, \qquad \sigma \in S_N \equiv \{-1,+1\}^{N}, \tag{2}$$

where the $B_{i,j}$ are independent Brownian motions with variance $E B_{i,j}^{2} = t$ if $i < j$ and $E B_{i,j}^{2} = t/2$ if $i = j$. Let $E_\sigma(\cdot) = 2^{-N} \sum_{\sigma \in S_N} (\cdot)$ be the uniform probability on the configuration space $S_N$. The partition function with time dependence is then

$$Z_N(t) = E_\sigma \exp\{I_N(t,\sigma) - Nt/4\}, \tag{3}$$

which, apart from a minor term, defines the same Gibbs measure as (1) that we will denote $\rho_{N,t}$. Since $Z_N(t)$ is a positive martingale (a "martingale" represents a generalization of the concept of a sum of independent random variables) with mean 1, $Z_N$ has stochastic differential $dZ_N(t)$ and log martingale $M_N$ defined by the stochastic integral:

$$M_N(t) = \int_0^{t} Z_N(s)^{-1}\, dZ_N(s). \tag{4}$$

$M_N$ is a centered martingale of the form

$$\langle M_N \rangle(t) = \int_0^{t} E_{\rho_{N,s}}^{\otimes 2} \left[ \frac{1}{2} \left( \frac{\sigma \cdot \sigma'}{\sqrt{N}} \right)^{2} \right] ds, \tag{5}$$

where $E_{\rho_{N,s}}^{\otimes 2}$ is the expectation over two independent replicas $\sigma, \sigma'$ associated with the Gibbs measure $\rho_{N,s}$.

For the simplified (time-independent) Hamiltonian of (1), we can define a partition function as

$$Z_N = \sum_{\sigma \in S_N} \exp\bigl(-\beta I_N(\sigma)\bigr), \tag{6}$$

where $\beta$ is the inverse temperature and $Z_N$ is a random variable (for every fixed $\beta$). The quenched free energy is

$$F_N(\beta) = -\frac{1}{\beta} \log Z_N(\beta), \tag{7}$$

and the quenched specific free energy is defined by

$$f_N(\beta) = \frac{1}{|\Lambda_N|} F_N(\beta). \tag{8}$$
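For very small systems, definitions (1) and (6)-(8) can be evaluated directly by enumerating all $2^{N}$ spin configurations. The following is a minimal sketch under stated assumptions, not part of the original derivation: the function names, the fixed seed, and the choice $N = 8$ are illustrative only, and $|\Lambda_N|$ is taken to be $N$.

```python
import itertools
import math
import random


def sample_couplings(n_sites, rng):
    """Draw independent standard Gaussian couplings J_ij for every pair i < j."""
    return {(i, j): rng.gauss(0.0, 1.0)
            for i in range(n_sites) for j in range(i + 1, n_sites)}


def sk_energy(spins, couplings):
    """Hamiltonian (1): I_N(sigma) = -(1/sqrt(N)) * sum_{i<j} J_ij sigma_i sigma_j."""
    n_sites = len(spins)
    total = sum(j_ij * spins[i] * spins[j] for (i, j), j_ij in couplings.items())
    return -total / math.sqrt(n_sites)


def partition_function(couplings, n_sites, beta):
    """Equation (6): sum of exp(-beta * I_N(sigma)) over all 2^N configurations."""
    return sum(math.exp(-beta * sk_energy(spins, couplings))
               for spins in itertools.product((-1, 1), repeat=n_sites))


def specific_free_energy(couplings, n_sites, beta):
    """Equations (7)-(8): f_N = -(1 / (beta * N)) * log Z_N, taking |Lambda_N| = N."""
    return -math.log(partition_function(couplings, n_sites, beta)) / (beta * n_sites)


rng = random.Random(0)   # fixed seed, only so that the sketch is reproducible
N = 8                    # illustrative size: 2^8 = 256 configurations
J = sample_couplings(N, rng)
print(len(J), specific_free_energy(J, N, beta=1.0))
```

Because the cost grows as $2^{N}$, this brute-force enumeration is only a way to check the definitions on toy sizes; it says nothing about the large-$N$ behavior that the replica method addresses.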

Due to computational difficulties involving the log term in (7), Edwards and Anderson$^{7}$ proposed the famous replica trick: instead of computing $E \log Z_N$, they compute $E Z_N^{R}$, where $R$ is the number of replicas [independent copies of the model (in $\sigma$) all having the same random interactions $J$]. Thus, for positive $R$, one needs to compute the moments of the partition function. Parisi$^{8}$ and Mezard, Parisi, and Virasoro$^{9}$ observed that as a consequence of phase transition, there is a breaking of the symmetry group $S_R$ over replicas, and that introducing infinite replica symmetry breaking, inducing an ultrametric structure to the space of states, provides a heuristic solution for the specific free energy that is in reasonable agreement with rigorous results. [Earlier, Mezard, Parisi, and Virasoro$^{10}$ discussed the fluctuations (of order $1/N$) of the free energy to define the pure state weights of the model. They explain how to obtain the replica symmetry-breaking solution of the SK model without introducing replicas.] Derrida$^{11,12}$ describes two simplified SK-based models, the random energy model (REM), and the generalized random energy model (GREM). In the case of the REM model, the energy levels $E_i$ form a system of independent, identically distributed variables, and the partition function is written as a sum over $2^{N}$ energy levels,

$$Z(\beta) = \sum_{i=1}^{2^{N}} \exp(-\beta E_i), \tag{9}$$

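where $\beta$ is the inverse temperature. For the GREM, correlations between the energy levels are introduced hierarchically. The energy levels are defined as

$$E_{k_1,\ldots,k_n} = \sqrt{N} \sum_{j=1}^{n} \sqrt{a_j}\,\varepsilon^{j}_{k_1,\ldots,k_j}, \tag{10}$$

and the partition function is then

$$Z(\beta) = \sum_{k_1=1}^{\alpha_1^{N}} \cdots \sum_{k_n=1}^{\alpha_n^{N}} \exp\left( \beta \sqrt{N} \sum_{j=1}^{n} \sqrt{a_j}\,\varepsilon^{j}_{k_1,\ldots,k_j} \right). \tag{11}$$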

Spin-glass models have been used to describe protein folding, and to explain certain salient features such as abrupt transitions between folded and unfolded states; multiexponential kinetics; and misfolds, irreversible denaturation, and protein drift.$^{13}$
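The REM partition function (9) is simple enough to sample numerically. The sketch below is illustrative only: it assumes Gaussian energy levels with mean zero and variance $N/2$ (a common REM convention that the text above does not fix), and the function names and seed are hypothetical.

```python
import math
import random


def rem_energies(n_spins, rng):
    """Draw 2^N i.i.d. Gaussian energy levels; variance N/2 is an assumed scaling."""
    sd = math.sqrt(n_spins / 2.0)
    return [rng.gauss(0.0, sd) for _ in range(2 ** n_spins)]


def rem_partition_function(energies, beta):
    """Equation (9): Z(beta) = sum_i exp(-beta * E_i)."""
    return sum(math.exp(-beta * e) for e in energies)


def rem_free_energy_per_spin(energies, n_spins, beta):
    """-(1 / (beta * N)) * log Z(beta), by analogy with (7)-(8)."""
    return -math.log(rem_partition_function(energies, beta)) / (beta * n_spins)


rng = random.Random(0)        # fixed seed for reproducibility of the sketch
n_spins = 12                  # 2^12 = 4096 energy levels
energies = rem_energies(n_spins, rng)
for beta in (0.5, 1.0, 2.0):
    print(beta, rem_free_energy_per_spin(energies, n_spins, beta))
```

Since the levels are independent, no Hamiltonian has to be evaluated; at large $\beta$ the sum is dominated by the few lowest levels, which is the regime where extreme value statistics become relevant.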

III. FOLDABILITY AND ENERGY GAP

Briefly summarizing Buchler and Goldstein, the equilibrium glass transition temperature, $T_g$, or "heteropolymer freezing" temperature, is defined as the temperature where the protein chain entropy drops below zero and the chain attains one of its low-energy, metastable states. Using the REM (which does not incorporate conformation energy correlations), one can demonstrate that $T_g$ demarcates the kinetic behavior into two classes. For $T > T_g$, the escape rate distribution from low-energy metastable states is lognormal, and fast rates are much more likely than slow rates; for $T < T_g$, the kinetic distribution of rates becomes more uniform, so that both slow and fast escape rates become equally likely.$^{2}$ Using the REM (and setting $k_B$ to one), one can determine that the equilibrium glass temperature is

$$T_g = \sqrt{\frac{\sigma^{2}}{2 S_0}}, \tag{12}$$

where $\sigma^{2}$ is the variance or "roughness" (squared) of the REM energy distribution, and $S_0$ is the conformational entropy of the system. One can thus surmise that "rougher" energy landscapes lead to higher glass transition temperatures. The presence of protein drift, misfolds, irreversible denaturation, discrete intermediates, and multiexponential kinetics would indicate proteins with rougher energy landscapes than those that fold consistently and have single-exponential kinetics. Since the folded state must be thermodynamically preferable to other possible structures at equilibrium, it was postulated that optimal folding landscapes would tend to maximize folding temperature $T_f$, a measure of relative stability, and minimize $T_g$, that is, increase the ratio, $T_f/T_g$.$^{14-16}$ Using the REM, one can show that

$$\frac{T_f}{T_g} = \sqrt{\frac{F^{2}}{2 S_0}} + \sqrt{\frac{F^{2}}{2 S_0} - 1},$$

where the foldability, $F$, is

$$F = \frac{\bar{E} - E_{ns}}{\sigma}, \tag{13}$$


where $\bar{E}$ is the average energy of the protein chain in all conformations, $E_{ns}$ is the energy of the native structure, and $\sigma$ is the "roughness" of the REM energy landscape. Monte Carlo kinetic and molecular dynamic simulations have shown that fast folding proteins had higher average foldabilities and larger $T_f/T_g$ ratios.$^{15,16}$ The quantification of folding ability by $F$ has contributed to further development of protein designability research.
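The quantities in (12), (13), and the $T_f/T_g$ relation are straightforward to compute from a list of conformational energies. The following sketch is an illustration under assumptions: the function names are hypothetical, $\sigma$ is estimated as the population standard deviation of the supplied energies, and whether the native state is included in that list is left to the caller.

```python
import math
import statistics


def foldability(energies, native_energy):
    """Equation (13): F = (mean energy - native energy) / sigma."""
    mean_e = statistics.fmean(energies)
    sigma = statistics.pstdev(energies)   # population standard deviation (assumed)
    return (mean_e - native_energy) / sigma


def glass_temperature(sigma, s0):
    """Equation (12) with k_B = 1: T_g = sqrt(sigma^2 / (2 * S0))."""
    return math.sqrt(sigma ** 2 / (2.0 * s0))


def tf_over_tg(f, s0):
    """T_f / T_g = sqrt(F^2 / (2 S0)) + sqrt(F^2 / (2 S0) - 1)."""
    x = f ** 2 / (2.0 * s0)
    if x < 1.0:
        raise ValueError("F^2 / (2 S0) < 1: the ratio is not real-valued")
    return math.sqrt(x) + math.sqrt(x - 1.0)


toy_energies = [-2.0, -1.0, 0.0, 1.0, 2.0]   # made-up values, for illustration only
f_value = foldability(toy_energies, native_energy=-6.0)
print(f_value, tf_over_tg(f_value, s0=4.0), glass_temperature(sigma=2.0, s0=2.0))
```

The guard in `tf_over_tg` reflects the fact that the second square root is real only when $F^{2} \ge 2 S_0$, in which case the ratio is at least one.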

An equivalent measure relating to folding kinetics is the energy gap, $\Delta_g$, defined as

$$\Delta_g = E_g - E_{ns}, \tag{14}$$

where $E_g$ is the glass transition energy and $E_{ns}$ is the native state energy.

For now, assume that the energy states are uncorrelated. Using the REM model, one can show, in the limit of large proteins, that the energy distribution of compact states is of Gaussian form. The density of states of a REM heteropolymer sequence is denoted $\Omega(E) = n \rho_{\mathrm{REM}}(E)$, where $n$ is the number of compact protein structures and $\rho_{\mathrm{REM}}(E)$ is a normalized Gaussian distribution:

$$\rho_{\mathrm{REM}}(E) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(E - \bar{E})^{2}/2\sigma^{2}}, \tag{15}$$

where $\bar{E}$ is the average energy of the compact states and $\sigma$ is the roughness of the energy density landscape. If we consider deriving the foldability of the native state, $F$, as a function of the number of compact protein structures $n$, the condition that the native state energy $E_{ns}$ has the lowest value among all other $n-1$ energies follows from native state uniqueness and thermodynamic considerations. One can thus describe the native state energy distribution in the REM by

$$\rho(E_{ns} \mid n) = \rho_{\mathrm{REM}}\, P(E_{ns}, n-1). \tag{16}$$

Using (15), and assuming independence of energies, we obtain

$$\rho_{\mathrm{REM}}(E_{ns}) = \frac{1}{\sigma \sqrt{2\pi}}\, e^{-(E_{ns} - \bar{E})^{2}/2\sigma^{2}}, \tag{17}$$

and

$$P(E_{ns}, n-1) = \left[ \int_{E_{ns}}^{\infty} \rho_{\mathrm{REM}}(E)\, dE \right]^{(n-1)}, \tag{18}$$

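combining, and normalizing $\left( \int_{-\infty}^{+\infty} \rho(E_{ns} \mid n)\, dE_{ns} = 1 \right)$ gives the density of native state energies as

$$\rho(E_{ns} \mid n) = n\, \rho_{\mathrm{REM}}(E_{ns}) \left[ \int_{E_{ns}}^{\infty} \rho_{\mathrm{REM}}(E)\, dE \right]^{(n-1)}, \tag{19}$$

substituting (17) and evaluating (18), one obtains

$$\rho(E_{ns} \mid n) = \frac{n}{\sigma \sqrt{2\pi}}\, e^{-(E_{ns} - \bar{E})^{2}/2\sigma^{2}} \left[ \frac{1}{2} \left( 1 - \operatorname{Erf}\left( \frac{E_{ns} - \bar{E}}{\sigma \sqrt{2}} \right) \right) \right]^{(n-1)}, \tag{20}$$

which is of the form of an extreme value distribution. To convert to foldability, we use (13) to obtain

$$\rho(F) = \frac{n}{\sqrt{2\pi}}\, e^{-F^{2}/2} \left[ \frac{1}{2} \left( 1 + \operatorname{Erf}\left( \frac{F}{\sqrt{2}} \right) \right) \right]^{(n-1)}. \tag{21}$$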
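The conversion from the native state energy density to the foldability density is a change of variables. As a worked sketch (the intermediate steps are spelled out here for convenience and are not taken from the original text), the definition $F = (\bar{E} - E_{ns})/\sigma$ and the oddness of the error function give

$$
\begin{aligned}
E_{ns} - \bar{E} &= -\sigma F, \qquad \left| \frac{dE_{ns}}{dF} \right| = \sigma, \\
\rho(F) &= \rho(E_{ns} \mid n) \left| \frac{dE_{ns}}{dF} \right|
 = \frac{n}{\sqrt{2\pi}}\, e^{-F^{2}/2} \left[ \frac{1}{2} \left( 1 - \operatorname{Erf}\left( \frac{-F}{\sqrt{2}} \right) \right) \right]^{(n-1)} \\
&= \frac{n}{\sqrt{2\pi}}\, e^{-F^{2}/2} \left[ \frac{1}{2} \left( 1 + \operatorname{Erf}\left( \frac{F}{\sqrt{2}} \right) \right) \right]^{(n-1)},
\end{aligned}
$$

where the last line uses $\operatorname{Erf}(-x) = -\operatorname{Erf}(x)$. The factor $\sigma$ from the Jacobian cancels the $1/\sigma$ in the Gaussian prefactor, which is why the foldability density no longer depends on $\bar{E}$ or $\sigma$.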


Figure 1 displays foldability distribution curves for various values of $n$ (this figure is based on Fig. 1 of Buchler and Goldstein). Having completed the summary of Buchler and Goldstein's derivation, we will now introduce the new methodology.
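The foldability density derived above, $n$ times the standard normal density times the standard normal distribution function raised to the power $n-1$, can be evaluated and checked numerically. The sketch below is illustrative: the function names, the integration range, and the values of $n$ are arbitrary choices and are not the values used for Fig. 1.

```python
import math


def foldability_density(f, n):
    """rho(F) = n * phi(F) * Phi(F)^(n-1), with phi and Phi the standard normal pdf and cdf."""
    phi = math.exp(-0.5 * f * f) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))
    return n * phi * big_phi ** (n - 1)


def trapezoid(func, lo, hi, steps):
    """Plain trapezoidal rule on a uniform grid."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    total += sum(func(lo + k * h) for k in range(1, steps))
    return total * h


for n in (10, 1000, 100000):   # illustrative values of n only
    area = trapezoid(lambda f: foldability_density(f, n), -10.0, 12.0, 20000)
    mean = trapezoid(lambda f: f * foldability_density(f, n), -10.0, 12.0, 20000)
    print(n, area, mean)
```

The area should stay close to one for every $n$, while the mean foldability moves to the right as $n$ grows, which is the qualitative behavior described for the curves in Fig. 1. For very large $n$ the direct power may underflow, and a log-domain evaluation would be a safer choice.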

IV. ROBUST ESTIMATION

Though the assumption of a Gaussian distribution (from which the extreme value distribution shown above is derived) is conventional and definitely plausible, we need to be able to account for irregularities in the distribution, and keep in mind that "large datasets of high quality show significant deviations from normality in cases which should be prime examples for the normal [Gaussian] law of errors...."$^{17}$ To address this concern, we propose using a robust estimator, which is of an approximate parametric class. Robust methods allow us to retain the general parametric form of the model while compensating for deviations from the assumed distribution.$^{18,19}$ In statistical parlance "robust" methods are intermediate between the usual parametric (e.g., Gaussian assumption) techniques and nonparametric or distribution-free methods.

We can now derive a robust version of (21), by introducing the notion of the $\alpha$-trimmed mean ($0 < \alpha < \tfrac{1}{2}$), which is

$$T(F) = \int_{\alpha}^{1-\alpha} F^{-1}(t)\, dt \,/\, (1 - 2\alpha), \tag{22}$$

where $T(F)$ indicates a functional of the distribution $F$ of energies of compact states (here $F$ is a distribution function, not the foldability).$^{17}$ The trimmed mean is a robust estimator, and is intuitively appealing, since one removes the $[\alpha n]$ smallest and the $[\alpha n]$ largest energies (where $[\cdot]$ indicates the integer function), excluding $E_{ns}$ from this operation, and then takes the mean of the remaining values. One can also replace $\sigma$ with the standard deviation of this reduced set to produce a trimmed standard deviation, $\sigma_t$. Setting $E_t = T(F)$, we now can define robust foldability as $F_t = (E_t - E_{ns})/\sigma_t$, yielding

$$\rho(F_t) = \frac{n - [2\alpha n]}{\sqrt{2\pi}}\, e^{-F_t^{2}/2} \left[ \frac{1}{2} \left( 1 + \operatorname{Erf}\left( \frac{F_t}{\sqrt{2}} \right) \right) \right]^{(n - [2\alpha n] - 1)}. \tag{23}$$

FIG. 1. Plot of the foldability distribution $\rho(F)$ for different numbers of compact energy states using the random energy model (based on Fig. 1 of Buchler and Goldstein).


Relation (23) is useful when the underlying energy distribution of compact states has a skewed, non-Gaussian character, or is predominantly Gaussian with outliers, and reduces to (21) when the underlying distribution is Gaussian or close to Gaussian, giving it wide applicability for the actual, nonidealized distributions one can see in practice. Figure 2 shows the trimmed foldability distributions for various levels of trimming when $n = 1081$. Figure 3 plots the energy gap distributions for the same values of $n$ that were used in Fig. 1 (this figure is based on Fig. 2 of Buchler and Goldstein, with the $x$ axis corrected). Note the small percentage of area under the curves in Fig. 3 for which $\Delta_g > 0$, the weakest foldability criterion. Though foldability increases with $n$, only a small fraction of REM heteropolymer sequences tend to be foldable as protein size increases.

FIG. 2. Plot of the robust (trimmed) foldability distribution $\rho(F_t)$ for various degrees of trimming for $n = 1081$ compact energy states.

FIG. 3. Plot of the energy gap distribution $\rho(\Delta_g)$ for different numbers of compact energy states using the random energy model (based on Fig. 2 of Buchler and Goldstein with the $x$ axis corrected).
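The trimming operation behind the robust foldability $F_t$ can be sketched in a few lines. This is a finite-sample illustration under assumptions, not the author's implementation: the function names are hypothetical, the integer function is taken to be the floor, $\alpha = 0$ is allowed so that the untrimmed case can be compared, and the list of compact-state energies is assumed to exclude the native state energy.

```python
import math
import statistics


def trim(values, alpha):
    """Drop the floor(alpha * n) smallest and the floor(alpha * n) largest values."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must lie in [0, 1/2)")
    ordered = sorted(values)
    k = math.floor(alpha * len(ordered))
    return ordered[k:len(ordered) - k]


def robust_foldability(compact_energies, native_energy, alpha):
    """F_t = (E_t - E_ns) / sigma_t, using the trimmed mean and trimmed standard deviation."""
    kept = trim(compact_energies, alpha)
    e_t = statistics.fmean(kept)
    sigma_t = statistics.pstdev(kept)    # population form; an assumed choice
    return (e_t - native_energy) / sigma_t


# Made-up energies with two gross outliers (-7 and 100), for illustration only.
toy = [9.0, -7.0, 1.0, 2.0, 3.0, 4.0, 0.0, 100.0]
print(robust_foldability(toy, native_energy=-5.0, alpha=0.0))
print(robust_foldability(toy, native_energy=-5.0, alpha=0.25))
```

In this toy data the single value 100 inflates both the untrimmed mean and the untrimmed standard deviation, so the two printed foldabilities differ markedly; trimming removes that dependence on a few extreme energies.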


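Using a similar argument as for foldability, we can obtain a robust version of the Buchler and Goldstein equation (13) for the energy gap distribution:

$$\rho(\Delta_{gt}) = \frac{n - [2\alpha n]}{\sqrt{2\pi}}\, e^{-\left( \sqrt{2\ln(n - [2\alpha n])} + \Delta_{gt} \right)^{2}/2} \left[ \frac{1}{2} \left( 1 + \operatorname{Erf}\left( \frac{\sqrt{2\ln(n - [2\alpha n])} + \Delta_{gt}}{\sqrt{2}} \right) \right) \right]^{(n - [2\alpha n] - 1)}. \tag{24}$$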
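The robust energy gap density can be evaluated numerically, together with the fraction of its area lying at positive gap. The sketch below is an illustration under assumptions: the function names are hypothetical, the integer function is taken to be the floor, the gap is measured in units of the (trimmed) roughness so that it can be added to the dimensionless shift, and the values of $n$ and $\alpha$ are arbitrary.

```python
import math


def robust_energy_gap_density(gap, n, alpha):
    """Equation (24): extreme value density in the variable sqrt(2 ln m) + gap."""
    m = n - math.floor(2.0 * alpha * n)      # effective number of states, n - [2 alpha n]
    shifted = math.sqrt(2.0 * math.log(m)) + gap
    phi = math.exp(-0.5 * shifted * shifted) / math.sqrt(2.0 * math.pi)
    big_phi = 0.5 * (1.0 + math.erf(shifted / math.sqrt(2.0)))
    return m * phi * big_phi ** (m - 1)


def fraction_positive_gap(n, alpha, upper=12.0, steps=20000):
    """Trapezoidal estimate of the area of (24) over gap values in (0, upper)."""
    h = upper / steps
    total = 0.5 * (robust_energy_gap_density(0.0, n, alpha)
                   + robust_energy_gap_density(upper, n, alpha))
    total += sum(robust_energy_gap_density(k * h, n, alpha) for k in range(1, steps))
    return total * h


for n in (100, 1000, 10000):   # illustrative values only
    print(n, fraction_positive_gap(n, alpha=0.0), fraction_positive_gap(n, alpha=0.05))
```

Because the density is the derivative of the standard normal distribution function raised to the power $m = n - [2\alpha n]$, the same fraction is also available in closed form as $1 - \Phi(\sqrt{2 \ln m})^{m}$, which gives a convenient check on the numerical integral.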

Note that $\rho(\Delta_{gt})$ is identical to $\rho(F_t)$ except for the shifting factor $\sqrt{2\ln(n - [2\alpha n])}$.

An alternative method for deriving a robust estimator would be to find the trimming level and the value of $m$ in (25) below that cause the curve described by (25) to most closely approach the nonparametric kernel density estimation$^{20}$ fit (see Sec. V) to the energy levels histogram:

$$\rho(F_p) = \frac{m}{\sqrt{2\pi}}\, e^{-(F_p)^{2}/2} \left[ \frac{1}{2} \left( 1 + \operatorname{Erf}\left( \frac{F_p}{\sqrt{2}} \right) \right) \right]^{(m-1)}. \tag{25}$$

Here, $F_p$ is based on the proportion of observations included to match the curves. More formally, consider a sequence of one-dimensional observations $X_1, \ldots, X_n$ that are independent and identically distributed. These observations belong to some sample space $S$, which is a subset of the real line $\mathbb{R}$. As a measure of discrepancy, consider the Prokhorov distance$^{21}$ between two probability distributions $G$ and $H$ in $\mathcal{F}(S)$:

$$\pi(G, H) = \inf\{\varepsilon;\ G(A) \le H(A^{\varepsilon}) + \varepsilon \ \text{for all } A\}, \tag{26}$$

where $A^{\varepsilon}$ is the set of all points with distance from $A$ less than $\varepsilon$. Equation (25) results from matching (21) to the nonparametric kernel density curve with tolerance $\varepsilon$. Note that one could also use the Hellinger or Lévy distances or the bounded Lipschitz metric.$^{22}$
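The matching idea can be illustrated with a much simpler discrepancy than the Prokhorov distance. The sketch below is a stand-in and not the procedure described above: it swaps in the Kolmogorov (sup-norm) distance, compares against the empirical distribution function rather than a kernel density fit, searches only over $m$ (not the trimming level), and all function names and parameter values are hypothetical. It relies on the fact that the density $m\,\phi\,\Phi^{m-1}$ has distribution function $\Phi^{m}$.

```python
import math
import random


def model_cdf(f, m):
    """Distribution function of the density m * phi(f) * Phi(f)^(m-1), namely Phi(f)^m."""
    return (0.5 * (1.0 + math.erf(f / math.sqrt(2.0)))) ** m


def kolmogorov_distance(sample, m):
    """sup_x |ECDF(x) - model_cdf(x, m)|, evaluated at the sorted sample points."""
    ordered = sorted(sample)
    size = len(ordered)
    worst = 0.0
    for rank, x in enumerate(ordered):
        cdf = model_cdf(x, m)
        # The ECDF jumps from rank/size to (rank + 1)/size at x; check both sides.
        worst = max(worst, abs(cdf - rank / size), abs((rank + 1) / size - cdf))
    return worst


def fit_m(sample, candidates):
    """Return the candidate m whose model distribution is closest to the sample."""
    return min(candidates, key=lambda m: kolmogorov_distance(sample, m))


rng = random.Random(1)     # fixed seed for reproducibility of the sketch
true_m = 50
# Each simulated observation is the largest of true_m standard normal draws.
sample = [max(rng.gauss(0.0, 1.0) for _ in range(true_m)) for _ in range(2000)]
best_m = fit_m(sample, range(1, 201))
print(best_m, kolmogorov_distance(sample, best_m))
```

On clean simulated data the fitted $m$ should land near the value used to generate the sample; with contaminated data the fitted $m$ would instead play the role of an effective number of states.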

V. EXAMPLE AND DISCUSSION

Figure 4 displays a foldability distribution that is somewhat irregular and has fairly long tails. This corresponds to an energy distribution of compact states that is moderately noisy with outliers. Superimposed on the histogram are three curves: the nonparametric kernel density fit, which can be made to fit any histogram to any degree of accuracy, and is used here as the reference curve; the best normal distribution fit, which is skewed to the right due to the long tail of the distribution; and the $\rho(F)$ fit to the distribution, which is more centered, but clearly is appreciably affected by the irregularities and long tails of the distribution. In Fig. 5, the trimmed foldability distribution is displayed. Note that the histogram bars appear slightly different from those of Fig. 4 due to recalculation after trimming. The best normal distribution fit is still somewhat skewed to the right, but the trimmed $\rho(F_p)$ fit to the distribution nearly coincides with the reference curve, and is just slightly skewed left, since the left tail of the distribution is still a little long, though of low density. Thus, the robust trimmed estimator distinctly improves the fit to the distribution when noise and outliers are present.

FIG. 4. Nonparametric kernel density (--—--), normal (---), and $\rho(F)$ (—) fits to simulated noisy energy level data with outliers.
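A nonparametric kernel density reference curve of the kind used in this example can be sketched as follows. This is an illustration under assumptions: a Gaussian kernel is used, the default bandwidth is a normal-reference rule of thumb chosen here for convenience (the text does not state which kernel or bandwidth was used), and the simulated data are made up.

```python
import math
import random
import statistics


def gaussian_kde(sample, bandwidth=None):
    """Return a function x -> kernel density estimate built from Gaussian kernels."""
    size = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumed default, not the paper's choice.
        bandwidth = 1.06 * statistics.stdev(sample) * size ** (-0.2)
    norm = 1.0 / (size * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample)

    return density


rng = random.Random(2)     # fixed seed for reproducibility of the sketch
# Mostly Gaussian values plus a few gross outliers, for illustration only.
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [6.0, 7.5, -8.0]
kde = gaussian_kde(data)
print(kde(0.0), kde(6.0))
```

Shrinking the bandwidth makes the curve follow the histogram more closely, and enlarging it smooths the curve, which is the sense in which such a fit can be tuned to any desired degree of accuracy.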

$^{1}$ M. S. Friedrichs and P. G. Wolynes, Science 246, 371 (1989).
$^{2}$ J. D. Bryngelson and P. G. Wolynes, J. Phys. Chem. 93, 6902 (1989).
$^{3}$ E. I. Shakhnovich and A. M. Gutin, J. Phys. A 22, 1647 (1989).
$^{4}$ J. D. Bryngelson and P. G. Wolynes, Biopolymers 30, 171 (1990).
$^{5}$ D. Petritis, Ann. Inst. Henri Poincaré Phys. Theor. 64, 255 (1996).
$^{6}$ D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett. 35, 1792 (1975).
$^{7}$ S. F. Edwards and P. W. Anderson, J. Phys. F: Met. Phys. 5, 965 (1975).
$^{8}$ G. Parisi, Phys. Rev. Lett. 43, 1754 (1979).
$^{9}$ M. Mezard, G. Parisi, and M. Virasoro, Spin Glass Theory and Beyond (World Scientific, Singapore, 1987); F. Comets in Mathematical Aspects of Spin Glasses and Neural Networks, edited by A. Bovier and P. Picco (Birkhauser, Boston, 1998).
$^{10}$ M. Mezard, G. Parisi, and M. Virasoro, J. Phys. (France) Lett. 46, L217 (1985).
$^{11}$ B. Derrida, Phys. Rev. Lett. 45, 79 (1980).
$^{12}$ B. Derrida, J. Phys. (France) Lett. 46, L401 (1985).
$^{13}$ N. E. G. Buchler and R. A. Goldstein, J. Chem. Phys. 111, 6599 (1999).
$^{14}$ J. D. Bryngelson, J. N. Onuchic, N. D. Socci, and P. G. Wolynes, Proteins 21, 167 (1995).
$^{15}$ R. A. Goldstein, Z. A. Luthey-Schulten, and P. G. Wolynes, Proc. Natl. Acad. Sci. U.S.A. 89, 4918 (1992).
$^{16}$ R. A. Goldstein, Z. A. Luthey-Schulten, and P. G. Wolynes, Proc. Natl. Acad. Sci. U.S.A. 89, 9029 (1992).
$^{17}$ F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: the Approach Based on Influence Functions (Wiley, New York, 1986).
$^{18}$ T. Roy, J. Math. Chem. 21, 103 (1997).
$^{19}$ T. Roy, J. Chemom. 11, 501 (1997).
$^{20}$ G. R. Terrell and D. W. Scott, J. Am. Stat. Assoc. 80, 209 (1985).
$^{21}$ Y. V. Prokhorov, Theor. Probab. Appl. 1, 157 (1956).
$^{22}$ P. J. Huber, Robust Statistics (Wiley, New York, 1981).

FIG. 5. Nonparametric kernel density (--—--), normal (---), and robust $\rho(F_p)$ (—) fits to trimmed simulated noisy energy level data with outliers.
