pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of...

7
The role of methyl–induced polarization in ion binding Mariana Rossi * , Alexandre Tkatchenko * Susan B. Rempe and Sameer Varma * Fritz-Haber-Institut der Max-Planck-Gesellschaft, D-14195 Berlin, Germany, Biological and Materials Sciences Center, Sandia National Laboratories, Albuquerque, NM-87185, United States of America, and Department of Cell Biology, Microbiology and Molecular Biology, Department of Physics, University of South Florida, 4202 E. Fowler Ave., Tampa, FL-33620, United States of America Submitted to Proceedings of the National Academy of Sciences of the United States of America The chemical property of methyl groups that renders them indis- pensable to biomolecules is their hydrophobicity. Quantum mechan- ical studies undertaken here to understand the effect of point sub- stitutions on potassium (K–) channels illustrate quantitatively how methyl–induced polarization also contributes to biomolecular func- tion. K–channels regulate transmembrane salt concentration gradi- ents by transporting K + ions selectively. One of the K + binding sites in the channel’s selectivity filter, the S4 site, also binds Ba 2+ ions, which blocks K + transport. This inhibitory property of Ba 2+ ions has been vital in understanding K–channel mechanism. In most K– channels, the S4 site is comprised of four threonine amino acids. The K–channels that carry serine instead of threonine are significantly less susceptible to Ba 2+ block and have reduced stabilities. We find that these differences can be explained by the lower polarizability of serine compared to threonine as serine carries one less branched methyl group than threonine. ATS substitution in the S4 site reduces its polarizability, which, in turn, reduces ion binding by sev- eral kcal/mol. While the loss in binding affinity is high for Ba 2+ , the loss in K + binding affinity is also significant thermodynamically, which reduces channel stability. These results highlight, in general, how biomolecular function can rely on the polarization induced by methyl groups, especially those that are proximal to charged moi- eties, including ions, titratable amino acids, sulphates, phosphates and nucleotides. ion channels | methylation | polarization | electronic structure | selectivity M ethyl groups play a central role in biomolecular func- tion. As constituents of biomolecules, these groups define the biomolecule’s solvated configurations. Also, their post-translational addition to peptides is regulated tightly to enable numerous physiological processes, including gene tran- scription, and signal transduction [1]. Furthermore, methy- lation of nucleotides is a crucial epigenetic modification that regulates many cellular processes such as embryonic develop- ment, transcription, chromatin structure, genomic imprint- ing and chromosome stability [2]. The chemical property of methyl groups that render them indispensable to these processes is their hydrophobicity, that is, their inability to hydrogen-bond with water molecules, or more generally, with polar groups. Methyl groups are, however, also polarizable. For exam- ple, methanol differs from water chemically in that it includes a methyl group, and has an average static dipole polarizabil- ity more than twice that of water (3.3 vs 1.5 ˚ A 3 ) [3]. In ad- dition, the average static dipole polarizabilities of alcohols in- crease with addition of methylene bridges (–CH2–). Methanol, ethanol and propanol have increasingly larger polarizabilities of 3.3, 4.5 and 6.7 ˚ A 3 , respectively [3]. While such trends in- dicate that methyl or methylene polarizability could be an im- portant contributor to the electrostatic and polarization forces that drive biomolecular function, there exist only suggestive and qualitative evidences as to their actual role. Studies ex- amining the effect of methylation on DNA stability [4] pro- posed that the enhanced stability of methylated DNA may be explained by considering that methylation of nucleotides in- creases their polarizability. In a different experimental study concerning the relative stabilities of DNA and RNA helices [5], it was proposed that the largest contribution to stabilization by methyl groups was due to increased base–stacking ability rather than favorable hydrophobic methyl–methyl contacts. We present investigations on K–channels that illustrate quan- titatively how methyl–induced polarization contributes to the functional properties of biomolecules. S1 S2 S3 S4 Thr side-chain Extra-cellular side a) KcsA T T V G Y G Kir 2.1 T T I G Y G Kir 2.1* T S I G Y G Kir 2.4 T S I G Y G Erg T S V G F G SK L S I G Y G Kcv S T V G F G Kcv* S S V G F G b) Threonine Serine c) S4 Fig. 1. (a) Selectivity filter of a representative K–channel, KcsA [7]. The orange spheres denote K + ions bound to two of their four preferred bindings sites, S2 and S4 (PDB ID: 1K4C). Threonine side chains that makes up the S4 site are highlighted in green. (b) Sequence alignment of the selectivity filter region of K–channels that have serine residues in their S4 sites. The K–channels whose names are appended with an asterisk are engineered via site-directed mutagenesis and were functionally characterized recently by Chatelain et al [16]. (c) Representative configurations of threonine and serine amino acids. The primary function of K–channels is to regulate trans- membrane salt concentration gradients [6]. K–channels ac- complish this task by transporting K + ions selectively through their pores in response to specific external stimuli. Fig. 1a is a representative structure of the selectivity filter of their pores [7], which shows four preferred binding sites for K + ions, S1–4. Three of these binding sites, S1–3, can be considered chemically identical as they provide eight backbone carbonyl oxygens for ion coordination. The fourth site, S4, is typically Reserved for Publication Footnotes www.pnas.org/cgi/doi/10.1073/pnas.0709640104 PNAS Issue Date Volume Issue Number 17

Transcript of pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of...

Page 1: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

The role of methyl–induced polarization in ionbindingMariana Rossi ∗, Alexandre Tkatchenko ∗ Susan B. Rempe † and Sameer Varma † ‡

∗Fritz-Haber-Institut der Max-Planck-Gesellschaft, D-14195 Berlin, Germany,†Biological and Materials Sciences Center, Sandia National Laboratories, Albuquerque, NM-87185,

United States of America, and ‡Department of Cell Biology, Microbiology and Molecular Biology, Department of Physics, University of South Florida, 4202 E. Fowler Ave.,

Tampa, FL-33620, United States of America

Submitted to Proceedings of the National Academy of Sciences of the United States of America

The chemical property of methyl groups that renders them indis-pensable to biomolecules is their hydrophobicity. Quantum mechan-ical studies undertaken here to understand the effect of point sub-stitutions on potassium (K–) channels illustrate quantitatively howmethyl–induced polarization also contributes to biomolecular func-tion. K–channels regulate transmembrane salt concentration gradi-ents by transporting K+ ions selectively. One of the K+ binding sitesin the channel’s selectivity filter, the S4 site, also binds Ba2+ ions,which blocks K+ transport. This inhibitory property of Ba2+ ionshas been vital in understanding K–channel mechanism. In most K–channels, the S4 site is comprised of four threonine amino acids. TheK–channels that carry serine instead of threonine are significantlyless susceptible to Ba2+ block and have reduced stabilities. We findthat these differences can be explained by the lower polarizabilityof serine compared to threonine as serine carries one less branchedmethyl group than threonine. A T→S substitution in the S4 sitereduces its polarizability, which, in turn, reduces ion binding by sev-eral kcal/mol. While the loss in binding affinity is high for Ba2+,the loss in K+ binding affinity is also significant thermodynamically,which reduces channel stability. These results highlight, in general,how biomolecular function can rely on the polarization induced bymethyl groups, especially those that are proximal to charged moi-eties, including ions, titratable amino acids, sulphates, phosphatesand nucleotides.

ion channels | methylation | polarization | electronic structure | selectivity

Methyl groups play a central role in biomolecular func-tion. As constituents of biomolecules, these groups

define the biomolecule’s solvated configurations. Also, theirpost-translational addition to peptides is regulated tightly toenable numerous physiological processes, including gene tran-scription, and signal transduction [1]. Furthermore, methy-lation of nucleotides is a crucial epigenetic modification thatregulates many cellular processes such as embryonic develop-ment, transcription, chromatin structure, genomic imprint-ing and chromosome stability [2]. The chemical propertyof methyl groups that render them indispensable to theseprocesses is their hydrophobicity, that is, their inability tohydrogen-bond with water molecules, or more generally, withpolar groups.

Methyl groups are, however, also polarizable. For exam-ple, methanol differs from water chemically in that it includesa methyl group, and has an average static dipole polarizabil-ity more than twice that of water (3.3 vs 1.5 A3) [3]. In ad-dition, the average static dipole polarizabilities of alcohols in-crease with addition of methylene bridges (–CH2–). Methanol,ethanol and propanol have increasingly larger polarizabilitiesof 3.3, 4.5 and 6.7 A3, respectively [3]. While such trends in-dicate that methyl or methylene polarizability could be an im-portant contributor to the electrostatic and polarization forcesthat drive biomolecular function, there exist only suggestiveand qualitative evidences as to their actual role. Studies ex-amining the effect of methylation on DNA stability [4] pro-posed that the enhanced stability of methylated DNA may beexplained by considering that methylation of nucleotides in-creases their polarizability. In a different experimental study

concerning the relative stabilities of DNA and RNA helices [5],it was proposed that the largest contribution to stabilizationby methyl groups was due to increased base–stacking abilityrather than favorable hydrophobic methyl–methyl contacts.We present investigations on K–channels that illustrate quan-titatively how methyl–induced polarization contributes to thefunctional properties of biomolecules.

S1

S2

S3

S4

Thrside-chain

Extra-cellular sidea)KcsA … T T V G Y G …Kir 2.1 … T T I G Y G …Kir 2.1* … T S I G Y G …Kir 2.4 … T S I G Y G …Erg … T S V G F G …SK … L S I G Y G …Kcv … S T V G F G …Kcv* … S S V G F G …

b)

Threonine Serine

c)

S4

Fig. 1. (a) Selectivity filter of a representative K–channel, KcsA [7]. The orange

spheres denote K+ ions bound to two of their four preferred bindings sites, S2 and

S4 (PDB ID: 1K4C). Threonine side chains that makes up the S4 site are highlighted

in green. (b) Sequence alignment of the selectivity filter region of K–channels that

have serine residues in their S4 sites. The K–channels whose names are appended

with an asterisk are engineered via site-directed mutagenesis and were functionally

characterized recently by Chatelain et al [16]. (c) Representative configurations of

threonine and serine amino acids.

The primary function of K–channels is to regulate trans-membrane salt concentration gradients [6]. K–channels ac-complish this task by transporting K+ ions selectively throughtheir pores in response to specific external stimuli. Fig. 1ais a representative structure of the selectivity filter of theirpores [7], which shows four preferred binding sites for K+ ions,S1–4. Three of these binding sites, S1–3, can be consideredchemically identical as they provide eight backbone carbonyloxygens for ion coordination. The fourth site, S4, is typically

Reserved for Publication Footnotes

www.pnas.org/cgi/doi/10.1073/pnas.0709640104 PNAS Issue Date Volume Issue Number 1–7

Page 2: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

composed of four threonine residues that provide four back-bone carbonyl oxygens and four side chain hydroxyl oxygensfor ion coordination. This S4 site is also the preferred bindingsite for Ba2+ ions that block K+ permeation [8, 9, 10, 11].This inhibitory property of Ba2+ has proven vital toward un-derstanding the mechanisms underlying K–channel function[8, 9, 10, 12, 13, 14, 15, 17, 18].

Genetic selection and site-directed mutagenesis experi-ments on a viral K–channel, Kcv, and the inward rectifierK–channel, Kir 2.1, show that a threonine-to-serine (T→S)substitution in the S4 sites reduces channel susceptibility toBa2+ block by over two orders in magnitude [16]. In addition,this substitution also reduces channel stability, including thechannel’s mean open probabilities. Furthermore, the sequencealignment of K–channels [19] shows that some K–channel subfamilies carry a serine instead of threonine residue in theirS4 sites (Fig. 1b). Among these serine-carrying channels,Kir 2.4 has also been found to exhibit similar characteristicdifferences with respect to the typical threonine–containingchannels [20, 21].

The single chemical difference between a serine and a thre-onine side chain is that serine has one less branched methylgroup than threonine (Fig. 1c). Consequently, serine can beexpected to be less polarizable than threonine. However, isthe difference in electronic polarizability sufficient to explainthe T→S induced changes in K-channel properties, especiallygiven that the ion in the S4 site resides at a distance greaterthan 5 A from the branched methyl group [7]? Serine andthreonine also have different hydrophobicities [22], so do T→Ssubstitutions alter the manner in which their side chain hy-droxyl groups align with the permeation pathway and interactwith ions?

The primary challenge associated with investigating theseissues is to model accurately the broad range of molecularforces involved in ion complexation, in particular, polariza-tion and dispersion that contribute non-trivially to ion-ligandand ligand-ligand energetics [23, 24, 25]. A consistent first-principles approach is, therefore, essential. Toward that end,we use density-functional theory with the semilocal PBE [26]and hybrid PBE0 [27, 28] exchange-correlation functionals.Since van der Waals (vdW) dispersion can be expected to be amajor contributor to ion–methyl energetics in the 5 A distancerange [29], we describe it explicitly using the recently devel-oped DFT+vdW method [30]. In this method, the C6[n(r)]coefficients of the interatomic interaction term C6[n(r)]/R6

are obtained from the self–consistent electron density n(r).This method yields an accuracy of about 0.3 kcal/mol in com-parison to “gold standard” quantum chemical calculationsfor a wide range of intermolecular interactions in moleculardimers [31]. We find that for our systems, the C6 coefficientsof the ions depend strongly on their local coordination en-vironments and can vary by an order in magnitude, makingtheir explicit electron–density dependent evaluation criticalto the accuracy of energetic and structural properties. Thesecalculations show that the reduction in methyl–induced po-larization of the S4 site associated with T→S substitutions issufficient to explain the aforementioned experimental obser-vations.

Results and DiscussionTo understand how T→S substitutions in the S4 site affect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate changes in enthalpies (∆H) and free energies(∆G) for the substitution reactions

AXn + nX′ AX′n + nX, [1]

in the gas phase. We consider first the set of reactions in whichA represents one of the ions Na+, K+ and Ba2+ and X repre-sents one of the molecules, water (W), methanol (M), ethanol(E), and propanol (P). While these four small molecules allprovide hydroxyl oxygens for ion coordination, they differfrom each other in the number of methyl groups or methylenebridges they carry. Consequently they have different staticdipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3, respectively[3]. Note, however, that these four small molecules all havecomparable gas phase dipole moments of 1.85, 1.70, 1.69 and1.68 Debye, respectively [32]. The PBE0 functional we em-ploy yields similar values for gas phase dipole moments, thatis, 1.91, 1.68, 1.72 and 1.77 Debye, respectively. The resultsof the substitution reaction energy calculations are plotted inFig. 2. The ∆G obtained for the set of reactions involving wa-ter molecules and methanols, for which experimental data areavailable [33, 34, 35], are in quantitative agreement with ex-perimental values (Table S1 of supporting information). Theexplicit inclusion of dispersion improves the performance ofPBE0 significantly.

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

< ⌧zz(t)⌧zz(0) >eq � < ⌧xx(t)⌧xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

✏|| = 0

d�/d✏ = 0.37 GPa

25

kxi � xjkAWn ! AMn

AMn ! AEn

AEn ! APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,?

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: August 7, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di↵erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di↵erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

dro

xylgro

upsalign

wit

hth

eper

mea

tion

path

way

and

inte

ract

wit

hio

ns?

Inth

isw

ork

,w

ein

ves

tigate

thes

eis

sues

usi

ng

quantu

mch

emic

alca

lcula

tions.

The

pri

mary

challen

ge

ass

oci

ate

dw

ith

studyin

gm

ethyl

gro

ups

usi

ng

quantu

mch

emis

try

isth

at

one

nee

ds

toacc

ount

for

the

contr

ibuti

on

from

dis

per

sion,

and

inth

isca

se,

for

syst

ems

conta

inin

gth

ousa

nds

of

elec

-tr

ons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[2

0]and

PB

E0

[21,22]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

3]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Resu

lts

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r(W

),m

ethanol(M

),et

hanol(E

)and

pro

panol

(P).

While

all

thes

em

ole

cule

spro

vid

ehydro

xyl

ligands

for

ion-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hylgro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

er-

ent

stati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of

1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

of

thes

eca

lcula

tions

are

plo

tted

inFig

.2.

We

find

that,

ingen

eral,

com

ple

xst

ability

changes

non-t

rivia

lly

wit

hth

enum

ber

of

met

hyl

gro

ups

inth

eco

ord

inati

ng

ligand.

the

magnit

ude

of

the

incr

ease

dep

ends

non-t

rivia

lly

charg

e/si

zera

tio

ofio

n,th

enum

ber

ofligandsco

ord

inati

ng

the

ion,asw

ellasth

edis

tance

from

the

ion

wher

eth

em

ethylgro

up

isadded

.W

efind

that

as

the

num

ber

of

met

hyl

gro

ups

inth

elig-

ands

incr

ease

,co

mple

xst

ability

enhance

s.W

hile

the

magni-

tude

of

this

e↵ec

tex

pec

tedly

dec

rease

sw

hen

met

hyl

gro

ups

are

intr

oduce

dfu

rther

away

from

the

ion,

itre

main

sin

the

physi

olo

gic

ally

signifi

cant

range

even

when

met

hylgro

ups

are

intr

oduce

dat

dis

tance

sgre

ate

rth

an

6A

(AE

n!

AP

n).

Inaddit

ion,

the

change

inco

mple

xst

ability

als

odep

ends

non-

linea

rly

on

ion-c

oord

inati

on

num

ber

.Such

am

ult

i-body

e↵ec

tem

erges

as

aco

nse

quen

ceofre

puls

ion

bet

wee

nth

eco

ord

inat-

ing

ligands,

and

iscr

itic

alto

the

esti

mati

on

ofre

lati

ve

stabil-

itie

sbet

wee

ntw

oio

ns

[16,17].

Inth

ecr

yst

alst

ruct

ure

of

the

Kcs

A[8

],th

edis

tance

be-

twee

nth

eK

+io

nand

the

met

hylgro

up

ofth

eth

reonin

esi

de

chain

inth

eS4

site

is⇠

5A

.T

his

isw

ithin

the

ion-m

ethyl

dis

tance

sex

am

ined

inFig

.2,w

hic

hsu

gges

tsth

at

the

subst

i-tu

tion

ofth

ism

ethylgro

up

wit

ha

hydro

gen

ato

min

aT!

Sm

uta

tion

can

reduce

the

bin

din

ga�

nit

ies

of

all

thre

eio

ns,

Na+,

K+

and

Ba2+

signifi

cantl

y.H

owev

er,

inth

eS4

site

,th

ese

ions

coord

inate

wit

hei

ght

ligands.

Conse

quen

tly,

the

repuls

ion

bet

wee

nth

eligands

will

be

larg

er,

and

the

over

all

reduct

ion

inbin

din

ga�

nity

could

be

smaller

.To

evalu

ate

Table

2.

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxovir

us

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cter

izati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cteri

zation

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cter

izati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tative

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cteri

zation

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cter

izati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cteri

zation

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

H)an

dfree

ener

gie

s(�

G)due

toin

crea

sein

the

num

ber

ofm

ethyl

gro

ups

inth

eio

n-c

oor

din

atin

glig

ands.�

Han

d

�G

are

estim

ated

forth

esu

bst

itution

reac

tionsA

Xn

+nX

0⌦

AX

0 n+

AX

n,

whic

har

eden

ote

das

AX

n!

AX

0 n.

Inth

ese

reac

tions,

Are

pres

ents

one

of

the

ions,

Na+

,K+

orB

a2+

,an

dX

repr

esen

tsone

ofth

elig

and

mole

cule

s,wat

er(W

),

met

han

ol(M

),et

han

ol(E

)or

propan

ol(P

).A

llen

ergie

sar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tofT!

Ssu

bst

itution

on

elec

tron

den

sities

ofan

isola

ted

S4

site

.T

he

di↵

eren

ces

bet

wee

nth

eel

ectr

onic

den

sities

,⇢

Ba

T4�

4⇢

CH

3�

⇢B

aS4�

4⇢

H,

wer

ees

tim

ated

using

the

PB

E0+

vdW

funct

ional

for

the

optim

um

geo

met

ryof

the

BaT4

com

ple

x.T

he

geo

met

ryof

the

BaS

4co

mple

xwas

const

ruct

edfrom

the

BaT4

optim

ized

geo

met

ryby

repla

cing

the

CH

3gro

up

of

each

thre

onin

ere

sidue

with

anH

atom

,an

dth

enre

laxi

ng

the

coor

din

ates

ofH

atom

s.

Mate

rials

and

Meth

ods

All

geo

met

ric

rela

xations

wer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[24,25],

wher

ewe

em-

plo

yed

the

def

ault

“tight”

stan

dar

ds

regar

din

gbas

isse

tsan

din

tegra

tion

grids

for

all

elem

ents

.For

DFT

(LD

A/G

GA

)ca

lcula

tions

thes

ese

ttin

gs

produce

esse

ntial

lyco

n-

2w

ww

.pnas

.org

/cg

i/doi/

10.1

073/pnas

.0709640104

Footlin

eA

uth

or

dro

xylgro

upsalign

wit

hth

eper

mea

tion

path

way

and

inte

ract

wit

hio

ns?

Inth

isw

ork

,w

ein

ves

tigate

thes

eis

sues

usi

ng

quantu

mch

emic

alca

lcula

tions.

The

pri

mary

challen

ge

ass

oci

ate

dw

ith

studyin

gm

ethyl

gro

ups

usi

ng

quantu

mch

emis

try

isth

at

one

nee

ds

toacc

ount

for

the

contr

ibuti

on

from

dis

per

sion,

and

inth

isca

se,

for

syst

ems

conta

inin

gth

ousa

nds

of

elec

-tr

ons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[2

0]and

PB

E0

[21,22]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

3]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Resu

lts

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r(W

),m

ethanol(M

),et

hanol(E

)and

pro

panol

(P).

While

all

thes

em

ole

cule

spro

vid

ehydro

xyl

ligands

for

ion-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hylgro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

er-

ent

stati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of

1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

of

thes

eca

lcula

tions

are

plo

tted

inFig

.2.

We

find

that,

ingen

eral,

com

ple

xst

ability

changes

non-t

rivia

lly

wit

hth

enum

ber

of

met

hyl

gro

ups

inth

eco

ord

inati

ng

ligand.

the

magnit

ude

of

the

incr

ease

dep

ends

non-t

rivia

lly

charg

e/si

zera

tio

ofio

n,th

enum

ber

ofligandsco

ord

inati

ng

the

ion,asw

ellasth

edis

tance

from

the

ion

wher

eth

em

ethylgro

up

isadded

.W

efind

that

as

the

num

ber

of

met

hyl

gro

ups

inth

elig-

ands

incr

ease

,co

mple

xst

ability

enhance

s.W

hile

the

magni-

tude

of

this

e↵ec

tex

pec

tedly

dec

rease

sw

hen

met

hyl

gro

ups

are

intr

oduce

dfu

rther

away

from

the

ion,

itre

main

sin

the

physi

olo

gic

ally

signifi

cant

range

even

when

met

hylgro

ups

are

intr

oduce

dat

dis

tance

sgre

ate

rth

an

6A

(AE

n!

AP

n).

Inaddit

ion,

the

change

inco

mple

xst

ability

als

odep

ends

non-

linea

rly

on

ion-c

oord

inati

on

num

ber

.Such

am

ult

i-body

e↵ec

tem

erges

as

aco

nse

quen

ceofre

puls

ion

bet

wee

nth

eco

ord

inat-

ing

ligands,

and

iscr

itic

alto

the

esti

mati

on

ofre

lati

ve

stabil-

itie

sbet

wee

ntw

oio

ns

[16,17].

Inth

ecr

yst

alst

ruct

ure

of

the

Kcs

A[8

],th

edis

tance

be-

twee

nth

eK

+io

nand

the

met

hylgro

up

ofth

eth

reonin

esi

de

chain

inth

eS4

site

is⇠

5A

.T

his

isw

ithin

the

ion-m

ethyl

dis

tance

sex

am

ined

inFig

.2,w

hic

hsu

gges

tsth

at

the

subst

i-tu

tion

ofth

ism

ethylgro

up

wit

ha

hydro

gen

ato

min

aT!

Sm

uta

tion

can

reduce

the

bin

din

ga�

nit

ies

of

all

thre

eio

ns,

Na+,

K+

and

Ba2+

signifi

cantl

y.H

owev

er,

inth

eS4

site

,th

ese

ions

coord

inate

wit

hei

ght

ligands.

Conse

quen

tly,

the

repuls

ion

bet

wee

nth

eligands

will

be

larg

er,

and

the

over

all

reduct

ion

inbin

din

ga�

nity

could

be

smaller

.To

evalu

ate

Table

2.

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxovir

us

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

H)an

dfree

ener

gie

s(�

G)due

toin

crea

sein

the

num

ber

ofm

ethyl

gro

ups

inth

eio

n-c

oor

din

atin

glig

ands.�

Han

d

�G

are

estim

ated

forth

esu

bst

itution

reac

tionsA

Xn

+nX

0⌦

AX

0 n+

AX

n,

whic

har

eden

ote

das

AX

n!

AX

0 n.

Inth

ese

reac

tions,

Are

pres

ents

one

of

the

ions,

Na+

,K+

orB

a2+

,an

dX

repr

esen

tsone

ofth

elig

and

mole

cule

s,wat

er(W

),

met

han

ol(M

),et

han

ol(E

)or

propan

ol(P

).A

llen

ergie

sar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tofT!

Ssu

bst

itution

on

elec

tron

den

sities

ofan

isola

ted

S4

site

.T

he

di↵

eren

ces

bet

wee

nth

eel

ectr

onic

den

sities

,⇢

Ba

T4�

4⇢

CH

3�

⇢B

aS4�

4⇢

H,

wer

ees

tim

ated

using

the

PB

E0+

vdW

funct

ional

for

the

optim

um

geo

met

ryof

the

BaT4

com

ple

x.T

he

geo

met

ryof

the

BaS

4co

mple

xwas

const

ruct

edfrom

the

BaT4

optim

ized

geo

met

ryby

repla

cing

the

CH

3gro

up

of

each

thre

onin

ere

sidue

with

anH

atom

,an

dth

enre

laxi

ng

the

coor

din

ates

ofH

atom

s.

Mate

rials

and

Meth

ods

All

geo

met

ric

rela

xations

wer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[24,25],

wher

ewe

em-

plo

yed

the

def

ault

“tight”

stan

dar

ds

regar

din

gbas

isse

tsan

din

tegra

tion

grids

for

all

elem

ents

.For

DFT

(LD

A/G

GA

)ca

lcula

tions

thes

ese

ttin

gs

produce

esse

ntial

lyco

n-

2w

ww

.pnas

.org

/cg

i/doi/

10.1

073/pnas

.0709640104

Footlin

eA

uth

or

< ⌧zz(t)⌧zz(0) >eq � < ⌧xx(t)⌧xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

✏|| = 0

d�/d✏ = 0.37 GPa

25

kxi � xjkAWn ! AMn

AMn ! AEn

AEn ! APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,?

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: August 7, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di↵erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di↵erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< ⌧zz(t)⌧zz(0) >eq � < ⌧xx(t)⌧xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

✏|| = 0

d�/d✏ = 0.37 GPa

25

kxi � xjkAWn ! AMn

AMn ! AEn

AEn ! APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,?

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: August 7, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di↵erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di↵erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (∆H) and free energies (∆G)

due to increase in the number of methyl groups or methylene bridges in the ion-

coordinating ligands. ∆H and ∆G are estimated at 298 K for the substitution

reactions given by Equ. 1, which are denoted as AXn → AX′n. All energies are in

units of kcal/mol.

We find that the stability of an ion complex, measuredin terms of both enthalpy and free energy, increases withthe numbers of methyl groups and methylene bridges in thecoordinating ligand. This enhanced stability cannot be ex-plained by considering the subtle differences in the gas phasedipole moments of the coordinating ligands [23], demonstrat-ing that the enhanced stability is due to the larger ligandpolarizabilities arising from the higher numbers of methylgroups or methylene bridges. The magnitude of the increasein stability, however, depends intricately on the interplay be-tween several factors. While the complex is stabilized lesswith each incremental change in hydrocarbon chain length,the increase in stability is still, for the molecules examined,large enough to be relevant physiologically. The ∆H and∆G associated with the substitution reactions also depend

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

Page 3: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

non–linearly on ion coordination number. This non-linearityemerges as a consequence of repulsion between the coordinat-ing ligands [23, 36, 37]. Finally, ∆H and ∆G also depend onthe charge/size ratio of the ion, and once again, we find a non-linear relationship between the charge/size ratio of the ion andthe magnitude of the thermodynamic change. Doubling thecharge/size ratio of the ion does not necessarily imply that thestability of the complex also increases by a factor of two, asmay be the case when induced effects are small. For example,while the ∆G associated with a K+W→ K+M substitution is-1.9 kcal/mol, the ∆G associated with the Ba2+W → Ba2+Msubstitution is -7.9 kcal/mol, which is a 4-fold change insteadof a 2-fold change. This additional non-linear effect also un-derscores the importance of explicit polarization effects in ioncomplexation [23]. In a separate set of calculations, we findthat while polarizable force fields can describe such trends,there can be notable quantitative differences with respect toPBE0+vdW (Figure S1 of supporting information).

In the calculations above, the small molecules have similargas phase dipole moments, but markedly different polarizabil-ities. A ligand with a higher polarizability produces, in gen-eral, a more stable ion complex. Can ligands with similar gasphase dipole moments as well as static dipole polarizabilitiesproduce complexes with markedly different stabilities? Bu-tanol has three different isomers, n-butanol (B), isobutanol(iB) and 2-butanol (2B). While these isomers have compara-ble gas phase dipole moments (∼1.7 Debye) and polarizabili-ties (∼8.9 A3)[3], the differences in their chemical structureswill result in different distributions of methyl groups and/ormethylene bridges around the ion. Consequently, despite theirsimilar polarizabilities, they will respond to the spatially vary-ing field differently. It may also be expected that ion com-plexes comprised of isobutanol or 2-butanol, each of whichcarry one additional methyl group compared to n-butanol intheir branched hydrocarbon chains, will have a denser packingof methyl groups near the central ion and, therefore, higherstabilities compared to n-butanol.

To quantify this, we carry out substitution reactions(Equ. 1) involving these three butanol isomers. The resultsare plotted in Fig. 3. Indeed, we find that the isobutanoland 2-butanol complexes are, in general, more stable thanthe corresponding n-butanol complexes. Similar to the studyabove, ∆H depends non–linearly on ion coordination num-ber; a non–linearity that emerges from ligand-ligand repulsion[23, 36, 37], which increases with ligand number. Therefore,packing increasingly higher numbers of methyl groups aroundthe ion does not necessarily imply that the stability of theion complex will increase. For example, while ∆H = −3.0kcal/mol for the substitution reaction KB2 → K(2B)2, ∆H ∼0 for the substitution reaction KB4 → K(2B)4. In addition,despite the fact that one of the two terminal methyl groupsof 2-butanol is closer to the ion than that of isobutanol, com-plexes comprised of 2-butanols are not necessarily more sta-ble than the corresponding complexes of isobutanols. Thus,in addition to the static dipole polarizabilities of the methylgroups, their distribution around the ion also matter.

We note that the distances of the ions from the closestmethyl groups of 2-butanol (∼5 A) are comparable to thedistances of ions from the branched methyl group of threo-nine in the S4 site of KcsA [7, 11]. In addition, the packingof the branched methyl group of threonine in the S4 site ofKcsA is intermediate between the packing of branched methylgroups in 4-fold complexes comprised of 2-butanols and isobu-tanols (Figure S2 of supporting information). This suggeststhat the reduction in methyl–induced polarization from T→Ssubstitutions can reduce ion stabilities in the S4 sites of K–

channels. However, in the S4 sites of K–channels the ionscoordinate with eight ligands, and not just four hydroxyl lig-ands as studied above. Consequently, the repulsion betweenthe coordinating ligands can be larger in the S4 site, whichcould reduce or wash out the contribution of the threoninemethyl groups to the binding of ions at the S4 site. Further-more, a T→S substitution may also alter the manner in whichthe coordinating groups of the S4 site interact with ions.

ion-coordination number. This non-linearity emerges as aconsequence of repulsion between the coordinating ligands[23, 36, 37]. Finally, �H and �G, also depend on thecharge/size ratio of the ion, and once again, we find a non-linear relationship between the charge/size ratio of the ion andthe magnitude of the thermodynamic change. Doubling thecharge/size ratio of the ion does not necessarily imply that thestability of the complex also increases by a factor of two, asmay be the case when induced e↵ects are small. For example,while the �G associated with a K+W ! K+M substitution is-1.9 kcal/mol, the �G associated with the Ba2+W ! Ba2+Msubstitution is -7.9 kcal/mol, which is a 4-fold change insteadof a 2-fold change. This additional non-linear e↵ect also un-derscores the importance of explicit polarization e↵ects in ioncomplexation [23].

In the calculations above, the small molecules have similargas phase dipole moments, but markedly di↵erent polarizabil-ities, and a ligand with a higher polarizability produces, ingeneral, a more stable ion complex. Can ligands with simi-lar gas phase dipole moments as well as static dipole polariz-abilties produce complexes with markedly di↵erent stabilities?Butanol has three di↵erent isomers, n-butanol (B), isobutanol(iB) and 2-butanol (2B). While these isomers have compara-ble gas phase dipole moments (⇠1.7 Debye) and polarizabili-ties (⇠8.9 A3)[3], the di↵erences in their chemical structureswill result in di↵erent distributions of methyl groups and/ormethylene bridges around the ion. Consequently, despite hav-ing similar polarizabilities, they will respond to the distance–dependent field of the ion di↵erently, and the closer proxim-ity of methyl groups in isobutanol and 2-butanol to the ion,compared to n-butanol, can be expected to increase the sta-bility of the complex. To quantify this e↵ect we carry outsubstitution reactions (Equ. 1) involving these three butanolisomers. The results are plotted in Fig. 3. Indeed, we findthat the isobutanol and 2-butanol complexes are more stablethan the corresponding n-butanol complexes. Furthermore,complexes comprising of 2-butanols are more stable than thecorresponding complexes comprising of isobutanols, as one ofthe two terminal methyl groups of 2-butanol is closer to theion than that of isobutanol. Thus, in addition to the staticdipole polarizabilities of the methyl groups, the distributionof methyl groups around the ion also plays a role in ion com-plex stability. We also note that the distances of the ions fromthe closest methyl groups of 2-butanol (⇠5 A) are compara-ble to the distances of ions from the branched methyl groupof threonine in the S4 site of KcsA [7, 11]. This suggests fur-ther that the reduction in methyl–induced polarization due toT!S substitutions in the S4 sites of K–channels can reducethe binding a�nities of all three ions significantly.

ABn ! A(iB)n

Fig. 3. E↵ect of butanol isoform on ion complexation enthalpies (�H). The

enthalpies are estimated at 298 K for the substitution reactions given by Equ. 1,

which are denoted as AXn ! AX0n. All energies are in units of kcal/mol.

In the S4 sites of K–channels the ions coordinate witheight ligands, and not just four hydroxyl ligands as studiedabove. Consequently, the repulsion between the coordinatingligands will be larger in the S4 site, which could reduce orwash out the contribution of the threonine methyl groups tothe binding of ions at the S4 site. Furthermore, a T!S sub-stitution may also alter the manner in which the coordinatinggroups of the S4 site interact with ions.

To resolve these issues and understand how the threoninemethyl groups contribute to K–channel function, we consideran isolated S4 site composed of threonine residues. In thisrepresentative model, we substitute, in unit increments, thethreonines with serines, that is,

AT4 + nS ⌦ AT4�nSn + nT, [2]

and estimate the changes in free energy in the gas phase (Ta-ble 1). We find that the repulsion between the eight coordi-nating groups is not large enough to counteract the stabiliza-tion provided by the presence of methyl groups. The a�nityof the S4 site for all three ions, Na+, K+ and Ba2+, dimin-ishes, although non-linearly, with each threonine substitution.Furthermore, these substitutions result in only minor config-urational changes that do not alter the overall topology ofthe S4 site (Table S2 of supporting information), which im-plies that the e↵ect of T!S substitutions on configurationalentropy will be minimal, and may be neglected. While wefind that the configurational change is generally higher in thecase of Ba2+ complexes as compared to Na+ and K+ com-plexes, we also note that optimizations of Ba2+ complexesfollowing T!S substitutions, but under heavy atom positionrestraints, result in a higher drop in S4 site a�nity. For ex-ample, a BaT4 ! BaS4 substitution followed by a constrainedoptimization yields a �E = XX kcal/mol, as opposed to a �E= XX kcal/mol obtained after an unconstrained optimization.This indicates that if the K–channel matrix were rigid to dis-allow a structural change, a T!S substitution will result ina higher loss in S4 site a�nity for Ba2+, as compared to theresults in Table 1. Together, this suggests that the loss ina�nity of the S4 site noted in Table 1 is not a result of alteredion–binding topologies, but is primarily due to a reduction inthe methyl–induced polarization of the S4 site.

When all four threonines are replaced by serines, the es-timated drop in the a�nity of the S4 site is the highest forBa2+ (Table 1). The free energy di↵erence of 6.9 kcal/mol isequivalent to a 106-fold drop in binding a�nity, where morethan half of the a�nity drop is due to a loss in dispersionenergy (Table S3 of supporting information). The net drop inBa2+ binding a�nity accounts for the change in half maximalinhibitory concentrations (IC50) of Ba2+ seen in T!S substi-tution experiments [16]. In addition, it also accounts for thedi↵erence between the Kir 2.4 and Kir 2.1 experimental IC50

values of Ba2+ [20, 21], as Kir 2.4 carries a serine and Kir 2.1a threonine in the S4 site.

The computed drop in stability is, however, greater thanthat inferred from experiments (by over 3 kcal/mol). This dis-crepancy does not result from the pairwise approximation [31]employed in the estimation of the dispersion energy. Whilemany-body dispersion terms can contribute significantly toligand binding [38], we find their contribution to be only0.3 kcal/mol to the BaT4 ! BaS4 substitution reaction (Ta-ble S4 of supporting information). We also do not expectthis discrepancy to be due to the missing structural restraintsfrom the protein matrix on the isolated S4 site, as the T!Ssubstitutions are accompanied by only minor configurationalchanges (Table S2 of supporting information). The overesti-mated drop in Ba2+ binding a�nity is most likely because ourcalculations lack a polarization coupling between the S4 siteand its external environment. Fig. 4 shows that the polar-ization in the electron density of the threonine methyl groupsis along the electric field of the Ba2+ ion, which maximizesthe contribution of methyl polarization to Ba2+ binding. Thepresence of other polar chemical moieties proximal to the S4site, including water molecules, can reduce the contributionof polarization and thereby the overall e↵ect of T!S substi-tutions on Ba2+ binding. Thermodynamically, this reduction

Footline Author PNAS Issue Date Volume Issue Number 3

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

dro

xylgro

upsalign

wit

hth

eper

mea

tion

path

way

and

inte

ract

wit

hio

ns?

Inth

isw

ork

,w

ein

ves

tigate

thes

eis

sues

usi

ng

quantu

mch

emic

alca

lcula

tions.

The

pri

mary

challen

ge

ass

oci

ate

dw

ith

studyin

gm

ethyl

gro

ups

usi

ng

quantu

mch

emis

try

isth

at

one

nee

ds

toacc

ount

for

the

contr

ibuti

on

from

dis

per

sion,

and

inth

isca

se,

for

syst

ems

conta

inin

gth

ousa

nds

of

elec

-tr

ons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[2

0]and

PB

E0

[21,22]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

3]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Resu

lts

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r(W

),m

ethanol(M

),et

hanol(E

)and

pro

panol

(P).

While

all

thes

em

ole

cule

spro

vid

ehydro

xyl

ligands

for

ion-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hylgro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

er-

ent

stati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of

1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

of

thes

eca

lcula

tions

are

plo

tted

inFig

.2.

We

find

that,

ingen

eral,

com

ple

xst

ability

changes

non-t

rivia

lly

wit

hth

enum

ber

of

met

hyl

gro

ups

inth

eco

ord

inati

ng

ligand.

the

magnit

ude

of

the

incr

ease

dep

ends

non-t

rivia

lly

charg

e/si

zera

tio

ofio

n,th

enum

ber

ofligandsco

ord

inati

ng

the

ion,asw

ellasth

edis

tance

from

the

ion

wher

eth

em

ethylgro

up

isadded

.W

efind

that

as

the

num

ber

of

met

hyl

gro

ups

inth

elig-

ands

incr

ease

,co

mple

xst

ability

enhance

s.W

hile

the

magni-

tude

of

this

e↵ec

tex

pec

tedly

dec

rease

sw

hen

met

hyl

gro

ups

are

intr

oduce

dfu

rther

away

from

the

ion,

itre

main

sin

the

physi

olo

gic

ally

signifi

cant

range

even

when

met

hylgro

ups

are

intr

oduce

dat

dis

tance

sgre

ate

rth

an

6A

(AE

n!

AP

n).

Inaddit

ion,

the

change

inco

mple

xst

ability

als

odep

ends

non-

linea

rly

on

ion-c

oord

inati

on

num

ber

.Such

am

ult

i-body

e↵ec

tem

erges

as

aco

nse

quen

ceofre

puls

ion

bet

wee

nth

eco

ord

inat-

ing

ligands,

and

iscr

itic

alto

the

esti

mati

on

ofre

lati

ve

stabil-

itie

sbet

wee

ntw

oio

ns

[16,17].

Inth

ecr

yst

alst

ruct

ure

of

the

Kcs

A[8

],th

edis

tance

be-

twee

nth

eK

+io

nand

the

met

hylgro

up

ofth

eth

reonin

esi

de

chain

inth

eS4

site

is⇠

5A

.T

his

isw

ithin

the

ion-m

ethyl

dis

tance

sex

am

ined

inFig

.2,w

hic

hsu

gges

tsth

at

the

subst

i-tu

tion

ofth

ism

ethylgro

up

wit

ha

hydro

gen

ato

min

aT!

Sm

uta

tion

can

reduce

the

bin

din

ga�

nit

ies

of

all

thre

eio

ns,

Na+,

K+

and

Ba2+

signifi

cantl

y.H

owev

er,

inth

eS4

site

,th

ese

ions

coord

inate

wit

hei

ght

ligands.

Conse

quen

tly,

the

repuls

ion

bet

wee

nth

eligands

will

be

larg

er,

and

the

over

all

reduct

ion

inbin

din

ga�

nity

could

be

smaller

.To

evalu

ate

Table

2.

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tative

Chara

cte

rizati

on

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxovir

us

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

H)an

dfree

ener

gie

s(�

G)due

toin

crea

sein

the

num

ber

ofm

ethyl

gro

ups

inth

eio

n-c

oor

din

atin

glig

ands.�

Han

d

�G

are

estim

ated

forth

esu

bst

itution

reac

tionsA

Xn

+nX

0⌦

AX

0 n+

AX

n,

whic

har

eden

ote

das

AX

n!

AX

0 n.

Inth

ese

reac

tions,

Are

pres

ents

one

of

the

ions,

Na+

,K+

orB

a2+

,an

dX

repr

esen

tsone

ofth

elig

and

mole

cule

s,wat

er(W

),

met

han

ol(M

),et

han

ol(E

)or

propan

ol(P

).A

llen

ergie

sar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tofT!

Ssu

bst

itution

on

elec

tron

den

sities

ofan

isola

ted

S4

site

.T

he

di↵

eren

ces

bet

wee

nth

eel

ectr

onic

den

sities

,⇢

Ba

T4�

4⇢

CH

3�

⇢B

aS4�

4⇢

H,

wer

ees

tim

ated

using

the

PB

E0+

vdW

funct

ional

for

the

optim

um

geo

met

ryof

the

BaT4

com

ple

x.T

he

geo

met

ryof

the

BaS

4co

mple

xwas

const

ruct

edfrom

the

BaT4

optim

ized

geo

met

ryby

repla

cing

the

CH

3gro

up

of

each

thre

onin

ere

sidue

with

anH

atom

,an

dth

enre

laxi

ng

the

coor

din

ates

ofH

atom

s.

Mate

rials

and

Meth

ods

All

geo

met

ric

rela

xations

wer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[24,25],

wher

ewe

em-

plo

yed

the

def

ault

“tight”

stan

dar

ds

regar

din

gbas

isse

tsan

din

tegra

tion

grids

for

all

elem

ents

.For

DFT

(LD

A/G

GA

)ca

lcula

tions

thes

ese

ttin

gs

produce

esse

ntial

lyco

n-

2w

ww

.pnas

.org

/cg

i/doi/

10.1

073/pnas

.0709640104

Footlin

eA

uth

or

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

ion-coordination number. This non-linearity emerges as aconsequence of repulsion between the coordinating ligands[23, 36, 37]. Finally, �H and �G, also depend on thecharge/size ratio of the ion, and once again, we find a non-linear relationship between the charge/size ratio of the ion andthe magnitude of the thermodynamic change. Doubling thecharge/size ratio of the ion does not necessarily imply that thestability of the complex also increases by a factor of two, asmay be the case when induced e↵ects are small. For example,while the �G associated with a K+W ! K+M substitution is-1.9 kcal/mol, the �G associated with the Ba2+W ! Ba2+Msubstitution is -7.9 kcal/mol, which is a 4-fold change insteadof a 2-fold change. This additional non-linear e↵ect also un-derscores the importance of explicit polarization e↵ects in ioncomplexation [23].

In the calculations above, the small molecules have similargas phase dipole moments, but markedly di↵erent polarizabil-ities, and a ligand with a higher polarizability produces, ingeneral, a more stable ion complex. Can ligands with simi-lar gas phase dipole moments as well as static dipole polariz-abilties produce complexes with markedly di↵erent stabilities?Butanol has three di↵erent isomers, n-butanol (B), isobutanol(iB) and 2-butanol (2B). While these isomers have compara-ble gas phase dipole moments (⇠1.7 Debye) and polarizabili-ties (⇠8.9 A3)[3], the di↵erences in their chemical structureswill result in di↵erent distributions of methyl groups and/ormethylene bridges around the ion. Consequently, despite hav-ing similar polarizabilities, they will respond to the distance–dependent field of the ion di↵erently, and the closer proxim-ity of methyl groups in isobutanol and 2-butanol to the ion,compared to n-butanol, can be expected to increase the sta-bility of the complex. To quantify this e↵ect we carry outsubstitution reactions (Equ. 1) involving these three butanolisomers. The results are plotted in Fig. 3. Indeed, we findthat the isobutanol and 2-butanol complexes are more stablethan the corresponding n-butanol complexes. Furthermore,complexes comprising of 2-butanols are more stable than thecorresponding complexes comprising of isobutanols, as one ofthe two terminal methyl groups of 2-butanol is closer to theion than that of isobutanol. Thus, in addition to the staticdipole polarizabilities of the methyl groups, the distributionof methyl groups around the ion also plays a role in ion com-plex stability. We also note that the distances of the ions fromthe closest methyl groups of 2-butanol (⇠5 A) are compara-ble to the distances of ions from the branched methyl groupof threonine in the S4 site of KcsA [7, 11]. This suggests fur-ther that the reduction in methyl–induced polarization due toT!S substitutions in the S4 sites of K–channels can reducethe binding a�nities of all three ions significantly.

ABn ! A(2B)n

Fig. 3. E↵ect of butanol isoform on ion complexation enthalpies (�H). The

enthalpies are estimated at 298 K for the substitution reactions given by Equ. 1,

which are denoted as AXn ! AX0n. All energies are in units of kcal/mol.

In the S4 sites of K–channels the ions coordinate witheight ligands, and not just four hydroxyl ligands as studiedabove. Consequently, the repulsion between the coordinatingligands will be larger in the S4 site, which could reduce orwash out the contribution of the threonine methyl groups tothe binding of ions at the S4 site. Furthermore, a T!S sub-stitution may also alter the manner in which the coordinatinggroups of the S4 site interact with ions.

To resolve these issues and understand how the threoninemethyl groups contribute to K–channel function, we consideran isolated S4 site composed of threonine residues. In thisrepresentative model, we substitute, in unit increments, thethreonines with serines, that is,

AT4 + nS ⌦ AT4�nSn + nT, [2]

and estimate the changes in free energy in the gas phase (Ta-ble 1). We find that the repulsion between the eight coordi-nating groups is not large enough to counteract the stabiliza-tion provided by the presence of methyl groups. The a�nityof the S4 site for all three ions, Na+, K+ and Ba2+, dimin-ishes, although non-linearly, with each threonine substitution.Furthermore, these substitutions result in only minor config-urational changes that do not alter the overall topology ofthe S4 site (Table S2 of supporting information), which im-plies that the e↵ect of T!S substitutions on configurationalentropy will be minimal, and may be neglected. While wefind that the configurational change is generally higher in thecase of Ba2+ complexes as compared to Na+ and K+ com-plexes, we also note that optimizations of Ba2+ complexesfollowing T!S substitutions, but under heavy atom positionrestraints, result in a higher drop in S4 site a�nity. For ex-ample, a BaT4 ! BaS4 substitution followed by a constrainedoptimization yields a �E = XX kcal/mol, as opposed to a �E= XX kcal/mol obtained after an unconstrained optimization.This indicates that if the K–channel matrix were rigid to dis-allow a structural change, a T!S substitution will result ina higher loss in S4 site a�nity for Ba2+, as compared to theresults in Table 1. Together, this suggests that the loss ina�nity of the S4 site noted in Table 1 is not a result of alteredion–binding topologies, but is primarily due to a reduction inthe methyl–induced polarization of the S4 site.

When all four threonines are replaced by serines, the es-timated drop in the a�nity of the S4 site is the highest forBa2+ (Table 1). The free energy di↵erence of 6.9 kcal/mol isequivalent to a 106-fold drop in binding a�nity, where morethan half of the a�nity drop is due to a loss in dispersionenergy (Table S3 of supporting information). The net drop inBa2+ binding a�nity accounts for the change in half maximalinhibitory concentrations (IC50) of Ba2+ seen in T!S substi-tution experiments [16]. In addition, it also accounts for thedi↵erence between the Kir 2.4 and Kir 2.1 experimental IC50

values of Ba2+ [20, 21], as Kir 2.4 carries a serine and Kir 2.1a threonine in the S4 site.

The computed drop in stability is, however, greater thanthat inferred from experiments (by over 3 kcal/mol). This dis-crepancy does not result from the pairwise approximation [31]employed in the estimation of the dispersion energy. Whilemany-body dispersion terms can contribute significantly toligand binding [38], we find their contribution to be only0.3 kcal/mol to the BaT4 ! BaS4 substitution reaction (Ta-ble S4 of supporting information). We also do not expectthis discrepancy to be due to the missing structural restraintsfrom the protein matrix on the isolated S4 site, as the T!Ssubstitutions are accompanied by only minor configurationalchanges (Table S2 of supporting information). The overesti-mated drop in Ba2+ binding a�nity is most likely because ourcalculations lack a polarization coupling between the S4 siteand its external environment. Fig. 4 shows that the polar-ization in the electron density of the threonine methyl groupsis along the electric field of the Ba2+ ion, which maximizesthe contribution of methyl polarization to Ba2+ binding. Thepresence of other polar chemical moieties proximal to the S4site, including water molecules, can reduce the contributionof polarization and thereby the overall e↵ect of T!S substi-tutions on Ba2+ binding. Thermodynamically, this reduction

Footline Author PNAS Issue Date Volume Issue Number 3

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

non–linearly on ion coordination number. This non-linearityemerges as a consequence of repulsion between the coordinat-ing ligands [23, 36, 37]. Finally, �H and �G also depend onthe charge/size ratio of the ion, and once again, we find a non-linear relationship between the charge/size ratio of the ion andthe magnitude of the thermodynamic change. Doubling thecharge/size ratio of the ion does not necessarily imply that thestability of the complex also increases by a factor of two, asmay be the case when induced e↵ects are small. For example,while the �G associated with a K+W ! K+M substitution is-1.9 kcal/mol, the �G associated with the Ba2+W ! Ba2+Msubstitution is -7.9 kcal/mol, which is a 4-fold change insteadof a 2-fold change. This additional non-linear e↵ect also un-derscores the importance of explicit polarization e↵ects in ioncomplexation [23].

In the calculations above, the small molecules have similargas phase dipole moments, but markedly di↵erent polarizabil-ities. A ligand with a higher polarizability produces, in gen-eral, a more stable ion complex. Can ligands with similar gasphase dipole moments as well as static dipole polarizabiltiesproduce complexes with markedly di↵erent stabilities? Bu-tanol has three di↵erent isomers, n-butanol (B), isobutanol(iB) and 2-butanol (2B). While these isomers have compara-ble gas phase dipole moments (⇠1.7 Debye) and polarizabili-ties (⇠8.9 A3)[3], the di↵erences in their chemical structureswill result in di↵erent distributions of methyl groups and/ormethylene bridges around the ion. Consequently, despite theirsimilar polarizabilities, they will respond to the spatially vary-ing field di↵erently. It may also be expected that the closerproximity of the ion to the methyl groups in isobutanol and 2-butanol, compared to n-butanol, will result in higher complexstabilities.

To quantify this, we carry out substitution reactions(Equ. 1) involving these three butanol isomers. The resultsare plotted in Fig. 3. Indeed, we find that the isobutanoland 2-butanol complexes are, in general, more stable thanthe corresponding n-butanol complexes. Similar to the studyabove, �H depends non–linearly on ion coordination num-ber; a non–linearity that emerges from ligand-ligand repulsion[23, 36, 37], which increases with ligand number. Therefore,packing increasingly higher numbers of methyl groups aroundthe ion does not necessarily imply that the stability of the ioncomplex will increase. For example, while �H = �3 kcal/molfor the substitution reaction K(B)2 ! K(2B)2, �H ⇠ 0 forthe substitution reaction K(B)4 ! K(2B)4. In addition, de-spite the fact that one of the two terminal methyl groups of2-butanol is closer to the ion than that of isobutanol, com-plexes comprised of 2-butanols are not necessarily more sta-ble than the corresponding complexes of isobutanols. Thus,in addition to the static dipole polarizabilities of the methylgroups, their distribution around the ion also matter.

We note that while the distances of the ions from the clos-est methyl groups of 2-butanol (⇠5 A) are comparable to thedistances of ions from the branched methyl group of threoninein the S4 site of KcsA [7, 11], the packing of methyl groupsin the 4-fold 2-butanol complexes (tetrahedral geometry) isdenser than the that in the S4 site of KcsA (Fig. 1). This sug-gests that the reduction in methyl–induced polarization fromT!S substations will reduce ion stabilities in the S4 sites ofK–channels. However, in the S4 sites of K–channels the ionscoordinate with eight ligands, and not just four hydroxyl lig-ands as studied above. Consequently, the repulsion betweenthe coordinating ligands can be larger in the S4 site, whichcould reduce or wash out the contribution of the threoninemethyl groups to the binding of ions at the S4 site. Further-

more, a T!S substitution may also alter the manner in whichthe coordinating groups of the S4 site interact with ions.

-12

-9

-6

-3

0

3

1 2 3 4 1 2 3 4

dro

xylgro

upsalign

wit

hth

eper

mea

tion

path

way

and

inte

ract

wit

hio

ns?

Inth

isw

ork

,w

ein

ves

tigate

thes

eis

sues

usi

ng

quantu

mch

emic

alca

lcula

tions.

The

pri

mary

challen

ge

ass

oci

ate

dw

ith

studyin

gm

ethyl

gro

ups

usi

ng

quantu

mch

emis

try

isth

at

one

nee

ds

toacc

ount

for

the

contr

ibuti

on

from

dis

per

sion,

and

inth

isca

se,

for

syst

ems

conta

inin

gth

ousa

nds

of

elec

-tr

ons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[2

0]and

PB

E0

[21,22]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

3]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

of

met

hylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r(W

),m

ethanol(M

),et

hanol(E

)and

pro

panol

(P).

While

all

thes

em

ole

cule

spro

vid

ehydro

xyl

ligands

for

ion-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hylgro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

er-

ent

stati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of

1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

of

thes

eca

lcula

tions

are

plo

tted

inFig

.2.

We

find

that,

ingen

eral,

com

ple

xst

ability

changes

non-t

rivia

lly

wit

hth

enum

ber

of

met

hyl

gro

ups

inth

eco

ord

inati

ng

ligand.

the

magnit

ude

of

the

incr

ease

dep

ends

non-t

rivia

lly

charg

e/si

zera

tio

ofio

n,th

enum

ber

ofligandsco

ord

inati

ng

the

ion,asw

ellasth

edis

tance

from

the

ion

wher

eth

em

ethylgro

up

isadded

.W

efind

that

as

the

num

ber

of

met

hyl

gro

ups

inth

elig-

ands

incr

ease

,co

mple

xst

ability

enhance

s.W

hile

the

magni-

tude

of

this

e↵ec

tex

pec

tedly

dec

rease

sw

hen

met

hyl

gro

ups

are

intr

oduce

dfu

rther

away

from

the

ion,

itre

main

sin

the

physi

olo

gic

ally

signifi

cant

range

even

when

met

hylgro

ups

are

intr

oduce

dat

dis

tance

sgre

ate

rth

an

6A

(AE

n!

AP

n).

Inaddit

ion,

the

change

inco

mple

xst

ability

als

odep

ends

non-

linea

rly

on

ion-c

oord

inati

on

num

ber

.Such

am

ult

i-body

e↵ec

tem

erges

as

aco

nse

quen

ceofre

puls

ion

bet

wee

nth

eco

ord

inat-

ing

ligands,

and

iscr

itic

alto

the

esti

mati

on

ofre

lati

ve

stabil-

itie

sbet

wee

ntw

oio

ns

[16,17].

Inth

ecr

yst

alst

ruct

ure

of

the

Kcs

A[8

],th

edis

tance

be-

twee

nth

eK

+io

nand

the

met

hylgro

up

ofth

eth

reonin

esi

de

chain

inth

eS4

site

is⇠

5A

.T

his

isw

ithin

the

ion-m

ethyl

dis

tance

sex

am

ined

inFig

.2,w

hic

hsu

gges

tsth

at

the

subst

i-tu

tion

ofth

ism

ethylgro

up

wit

ha

hydro

gen

ato

min

aT!

Sm

uta

tion

can

reduce

the

bin

din

ga�

nit

ies

of

all

thre

eio

ns,

Na+,

K+

and

Ba2+

signifi

cantl

y.H

owev

er,

inth

eS4

site

,th

ese

ions

coord

inate

wit

hei

ght

ligands.

Conse

quen

tly,

the

repuls

ion

bet

wee

nth

eligands

will

be

larg

er,

and

the

over

all

reduct

ion

inbin

din

ga�

nity

could

be

smaller

.To

evalu

ate

Table

2.

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quanti

tati

ve

Chara

cteri

zati

on

ofIn

trin

sic

Mole

cula

rM

oti

on

Usi

ng

Support

Vect

or

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxovir

us

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cteri

zation

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vect

or

Mach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

trons-ALEX/MARIANA-PLEASEELABORATE...Weem-ploytherecentlydeveloped...Weusedensity-functionaltheorywithinthePBE[18]andPBE0[19,20]approximationsfortheexchange-correlationpotential.Inaddition,weemployare-centlydevelopedvanderWaals(vdW)correctionscheme[21]basedonasummationofpairwiseC6[n]/R

6terms,wherethe

C6coe�cientsarebasedontheself-consistentelectronicden-sity.Thisapproachallowsustotreataccuratelytheenergeticsande↵ectsofpolarizationinoursystems.

WefindthataT!Ssubstitutionisaccompaniedwithonlyminorstructuralchanges.Inaddition,wefindthatthereduc-tioninpolarizabilityofS4associatedwithaT!Ssubstitu-tionissu�cienttoexplainexperimentalobservations.Thesestudiesprovide,ingeneral,avividexampleofhowpeptidefunctionisinfluencedbythepolarizabilityofmethylgroups,especiallythosethatareproximaltochargedmoieties,includ-ingions,titratableside-acidsandphosphates.

ResultsandDiscussionTounderstandhowT!SsubstitutionsintheS4sitea↵ectK–channelfunction,weexaminefirstthegeneralconsequencesofmethylpolarizabilityonthethermodynamicsofionbind-ing.WeestimatethegasphaseenthalpiesandfreeenergiesofNa

+,K

+andBa

2+ionbindingtofourdi↵erentmolecules,

namelywater,methanol,ethanolandpropanol.Whileallthesemoleculesprovidehydroxylligandsforion-coordination,theydi↵erfromeachotherinthenumberofmethylgroupstheycarry.Consequentlytheyhavedi↵erentstaticaverageground-statedipolepolarizabilitiesof1.5,3.3,5.4and6.7A

3,

respectively[15].TheresultsofthesecalculationsareplottedinFig.2.Wefindthatwhiletheadditionofmethylgroupsenhancethestabilityofthecomplex,thee↵ectdecreaseswitheachincrementaladdition.Inaddition,multibodye↵ectsarethee↵ectofcoordinationnumberisnon-trivial.

Table2.distancebetweenmethylandKioninthes4siteisabout5A

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

-18-15-12-9-6-303

123412341234

-20

-16

-12

-8

-4

0

123412341234

K+

Ba2+

Na+ ΔHΔG

nnn

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

<�zz(t)�zz(0)>eq�<�xx(t)�xx(0)>eq

{a,b,c}k(xi,xj)=xi·xj

2nd

�||=0

d�/d�=0.37GPa

25

�xi�xj�AWn�AMn

AMn�AEn

AEn�APn

QuantitativeCharacterizationofIntrinsicMolecularMotionUsingSupportVectorMachines

RalphE.Leighty1

andSameerVarma1,2,3,�

1DepartmentofCellBiology,MicrobiologyandMolecularBiology,

2DepartmentofPhysics,UniversityofSouthFlorida,Tampa,FL33620,USAand

3InstituteofPureandAppliedMathematics,UniversityofCaliforniaLosAngeles,LosAngeles,CA90095,USA

(Dated:July30,2012DRAFT)

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,canbealteredbynumerousenvironmentalfactors,suchastemperature,andalsobythebindingofothermolecules.Anunderstandingofensemble-leveldi↵erencesfromageometricperspectiveisimportantbecauseitprovidesabasisforrelatingthermodynamicchangestochangesinmolecularmotion.Thetaskofdi↵erentiatingtwoconfigurationalensemblesis,however,challengingbecauseitrequirescomparingtwohigh-dimensionaldatasets.Traditionally,whenanalyzingmolecularsimulations,thisproblemiscircumventedbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingsummarystatisticsfromthetwoensemblesagainsteachother.However,sincedimensionalityreductioniscarriedoutpriortocomparisonofensembles,suchstrategiesaresusceptibletoartifactualbiasesfrominformationloss.Hereweintroduceamethodbasedonsupportvectormachines(SVM)thatcompares3-densemblesdirectlyagainstoneanotherintheHilbertspacewithoutaprerequisitereductioninphasespace.Whilethismethodcanbeappliedtoanymolecularsystem,hereweexploreitssensitivitytowardmodelsystemscomprisedofaminoacids.WealsoapplythetechniquetoidentifythespecificregionsofaparamyxovirusGproteinthatarea↵ectedbythebindingofitspreferredhumanreceptor,EphrinB2.Wefindthatthespecificregionsidentifiedbythismethodincludethesetofaminoacidsthatareknownfromexperimentalstudiestoplayavitalroleinviralfusion.Theconfigurationalensemblesoftheviralproteininbothitsboundandunboundstatesweregeneratedusingall-atommoleculardynamicssimulations.

INTRODUCTION

Theensembleof3-dconfigurationsexhibitedbyamolecule,thatis,itsintrinsicmotion,iscorrelatedwiththepropertiesofitsenvironment[1–12].Changesinintensivevariables,suchastemperatureandpressure,modifytheintrinsicmotionofamolecule.Additionally,molecularmotionalsochangesasaresultofbindingwithothermolecules,suchasinligand-substratecomplexesormolecularassemblies.Moreover,theextentofthechangeinmolecularmotionisdependentuponmultiplefactors,includingpropertiesofthemoleculeandthenatureoftheperturbationorexternalpotential.Aquantitativecharacterizationofsuchchangesinmolecularmotionisimportantfrombothscientificaswellasengineeringper-spectivesbecauseitprovidesabasistoassociatechanges

inthermodynamicpropertiesdirectlywithcorrespondingchangesinmolecularmotion.

Di�erentiatingbetweentwoensemblesofmolecularconfigurationsis,however,challenging.Thechallengeliesincomparingtwosetsofhigh-dimensiondata.Forthemotionofan-particlemoleculerepresentedbym-configurations,{an(x)}

m1,thetaskofdi�erentiatingit

fromthemolecule’sreferencestate,{an(x0)}m1,involves

comparingtwo3n-dimensionalvectorspaces.Tradition-ally,whenanalyzingmolecularsimulations,thisprob-lemisdealtwithbyfirstreducingthedimensionsofthetwoensemblesseparately,andthencomparingtheresult-ingsummarystatisticsfromthetwoensemblesagainsteachother[13].Dimensionalityreductioniscarriedout,forexample,byaveragingovern-spacethatinvolvesparticle-clusteringoraveragingoverm-spacethatyields

Fig.2.Changesinioncomplexationenthalpies(��H)andfreeenergies

(��G)duetomethylgroups.�Hand�Gwereestimatedseparatelyforre-

actionsA+nX⌦AXninwhichAisanion,Na+,K+orBa2+,andXis

aligandmolecule,water(W),methanol(M),ethanol(E)orpropanol(P).The

reactionswerethencombinedtoobtain��Hand��G.Allvaluesarereported

inkcal/mol.

Fig.3.E↵ectofT!Ssubstitutiononelectrondensities.Di↵erencesbetweenthe

electronicdensitiesof(BaT++4�4CH3)�(BaS

++4�4H),calculatedwith

PBE0+vdW,areshown.Thegeometryusedwasthatofthefullyrelaxed(PBE+vdW)

BaT++4complex.TobuildBaS

++4theCH3groupsofTweresubstitutedby

Hatomsandonlythesewererelaxed.Thedi↵erenceinelectrondensityshownthus

correspondstothee↵ectofpolarizationontheelectronicdensitiescausedbythe

additionofmethylgroups.

MaterialsandMethods

Allgeometricrelaxationswereperformedwiththeall-electron,localizedbasis

(numericatom-centeredorbitals)programpackageFHI-aims[22,23],whereweem-

2www.pnas.org/cgi/doi/10.1073/pnas.0709640104FootlineAuthor

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mat

eria

lsan

dM

ethods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mat

eria

lsan

dM

ethods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons-A

LE

X/M

AR

IAN

A-P

LE

ASE

ELA

BO

RA

TE

...W

eem

-plo

yth

ere

centl

ydev

eloped

...W

euse

den

sity

-funct

ionalth

eory

wit

hin

the

PB

E[1

8]and

PB

E0

[19,20]appro

xim

ati

onsfo

rth

eex

change-

corr

elati

on

pote

nti

al.

Inaddit

ion,

we

emplo

ya

re-

centl

ydev

eloped

van

der

Waals

(vdW

)co

rrec

tion

schem

e[2

1]

base

don

asu

mm

ati

on

ofpair

wis

eC

6[n

]/R

6te

rms,

wher

eth

eC

6co

e�ci

ents

are

base

don

the

self-c

onsi

sten

tel

ectr

onic

den

-si

ty.

This

appro

ach

allow

susto

trea

tacc

ura

tely

the

ener

get

ics

and

e↵ec

tsofpola

riza

tion

inour

syst

ems.

We

find

thata

T!

Ssu

bst

ituti

on

isacc

om

panie

dw

ith

only

min

or

stru

ctura

lch

anges

.In

addit

ion,w

efind

that

the

reduc-

tion

inpola

riza

bility

of

S4

ass

oci

ate

dw

ith

aT!

Ssu

bst

itu-

tion

issu�

cien

tto

expla

inex

per

imen

tal

obse

rvati

ons.

Thes

est

udie

spro

vid

e,in

gen

eral,

aviv

idex

am

ple

of

how

pep

tide

funct

ion

isin

fluen

ced

by

the

pola

riza

bility

ofm

ethylgro

ups,

espec

ially

those

thatare

pro

xim

alto

charg

edm

oie

ties

,in

clud-

ing

ions,

titr

ata

ble

side-

aci

ds

and

phosp

hate

s.

Res

ults

and

Discu

ssio

nTo

under

stand

how

T!

Ssu

bst

ituti

onsin

the

S4

site

a↵ec

tK

–ch

annel

funct

ion,

we

exam

ine

firs

tth

egen

eral

conse

quen

ces

of

met

hylpola

riza

bility

on

the

ther

modynam

ics

of

ion

bin

d-

ing.

We

esti

mate

the

gas

phase

enth

alp

ies

and

free

ener

gie

sofN

a+,K

+and

Ba2+

ion

bin

din

gto

four

di↵

eren

tm

ole

cule

s,nam

ely

wate

r,m

ethanol,

ethanol

and

pro

panol.

While

all

thes

em

ole

cule

spro

vid

ehydro

xylligandsfo

rio

n-c

oord

inati

on,

they

di↵

erfr

om

each

oth

erin

the

num

ber

of

met

hyl

gro

ups

they

carr

y.C

onse

quen

tly

they

hav

edi↵

eren

tst

ati

cav

erage

gro

und-s

tate

dip

ole

pola

riza

bilit

ies

of1.5

,3.3

,5.4

and

6.7

A3,

resp

ecti

vel

y[1

5].

The

resu

lts

ofth

ese

calc

ula

tions

are

plo

tted

inFig

.2.

We

find

that

while

the

addit

ion

of

met

hylgro

ups

enhance

the

stability

ofth

eco

mple

x,th

ee↵

ect

dec

rease

sw

ith

each

incr

emen

taladdit

ion.

Inaddit

ion,m

ult

ibody

e↵ec

tsare

the

e↵ec

tofco

ord

inati

on

num

ber

isnon-t

rivia

l.

Table

2.

dis

tance

bet

wee

nm

ethyland

Kio

nin

the

s4si

teis

about

5A

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

-18

-15

-12-9-6-303

12

34

12

34

12

34

-20

-16

-12-8-40

12

34

12

34

12

34

K+

Ba2+

Na+

ΔH ΔG

nn

n

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2n

d

� ||=

0

d�/d�

=0.3

7G

Pa

25 �x

i�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cte

rization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vecto

rM

ach

ines

Ralp

hE

.Lei

ghty

1and

Sam

eer

Varm

a1,2

,3,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biolo

gyand

Molecu

lar

Bio

logy

,2D

epart

men

tof

Physi

cs,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Date

d:

July

30,2012

DR

AFT

)

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,

its

intr

insi

cm

oti

on,

can

be

alt

ered

by

num

erous

envir

onm

enta

lfa

ctors

,su

chas

tem

per

atu

re,and

als

oby

the

bin

din

gofoth

erm

ole

cule

s.A

nunder

standin

gofen

sem

ble

-lev

eldi↵

eren

cesfr

om

ageo

met

ric

per

spec

tive

isim

port

ant

bec

ause

itpro

vid

esa

basi

sfo

rre

lati

ng

ther

modynam

icch

anges

toch

anges

inm

ole

cula

rm

oti

on.

The

task

ofdi↵

eren

tiati

ng

two

configura

tionalen

sem

ble

sis

,how

ever

,ch

allen

gin

gbec

ause

itre

quir

esco

mpari

ng

two

hig

h-d

imen

sional

data

sets

.Tra

dit

ionally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

ble

mis

circ

um

ven

ted

by

firs

tre

duci

ng

the

dim

ensi

ons

of

the

two

ense

mble

sse

para

tely

,and

then

com

pari

ng

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er.

How

ever

,si

nce

dim

ensi

onality

reduct

ion

isca

rrie

dout

pri

or

toco

mpari

son

of

ense

mble

s,su

chst

rate

gie

sare

susc

epti

ble

toart

ifact

ualbia

sesfr

om

info

rmati

on

loss

.H

ere

we

intr

oduce

am

ethod

base

don

support

vec

tor

mach

ines

(SV

M)

that

com

pare

s3-d

ense

mble

sdir

ectl

yagain

stone

anoth

erin

the

Hilber

tsp

ace

wit

hout

apre

requis

ite

reduct

ion

inphase

space

.W

hile

this

met

hod

can

be

applied

toany

mole

cula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

of

am

ino

aci

ds.

We

als

oapply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gio

ns

ofa

para

myxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gofit

spre

ferr

edhum

an

rece

pto

r,E

phri

nB

2.

We

find

thatth

esp

ecifi

cre

gio

ns

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

aci

ds

that

are

know

nfr

om

exper

imen

talst

udie

sto

pla

ya

vit

al

role

invir

al

fusi

on.

The

configura

tional

ense

mble

sof

the

vir

al

pro

tein

inboth

its

bound

and

unbound

state

sw

ere

gen

erate

dusi

ng

all-a

tom

mole

cula

rdynam

ics

sim

ula

tions.

INT

RO

DU

CT

ION

The

ense

mble

of

3-d

configura

tions

exhib

ited

by

am

ole

cule

,th

at

is,it

sin

trin

sic

moti

on,is

corr

elate

dw

ith

the

pro

per

ties

of

its

envir

onm

ent

[1–12].

Changes

inin

tensi

ve

vari

able

s,su

chas

tem

per

atu

reand

pre

ssure

,m

odify

the

intr

insi

cm

oti

on

ofa

mole

cule

.A

ddit

ionally,

mole

cula

rm

oti

on

als

och

anges

asa

resu

ltofbin

din

gw

ith

oth

erm

ole

cule

s,su

chasin

ligand-s

ubst

rate

com

ple

xes

or

mole

cula

rass

emblies

.M

ore

over

,th

eex

tentofth

ech

ange

inm

ole

cula

rm

oti

on

isdep

enden

tupon

mult

iple

fact

ors

,in

cludin

gpro

per

ties

of

the

mole

cule

and

the

natu

reof

the

per

turb

ati

on

or

exte

rnal

pote

nti

al.

Aquanti

tati

ve

chara

cter

izati

on

of

such

changes

inm

ole

cula

rm

oti

on

isim

port

antfr

om

both

scie

nti

fic

as

wel

las

engin

eeri

ng

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

basi

sto

ass

oci

ate

changes

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espondin

gch

anges

inm

ole

cula

rm

oti

on.

Di�

eren

tiati

ng

bet

wee

ntw

oen

sem

ble

sof

mole

cula

rco

nfigura

tions

is,

how

ever

,ch

allen

gin

g.

The

challen

ge

lies

inco

mpari

ng

two

sets

of

hig

h-d

imen

sion

data

.For

the

moti

on

of

an-p

art

icle

mole

cule

repre

sente

dby

m-

configura

tions,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiati

ng

itfr

om

the

mole

cule

’sre

fere

nce

state

,{a

n(x

0)}

m 1,in

volv

esco

mpari

ng

two

3n-d

imen

sionalvec

tor

space

s.Tra

dit

ion-

ally,

when

analy

zing

mole

cula

rsi

mula

tions,

this

pro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

para

tely

,and

then

com

pari

ng

the

resu

lt-

ing

sum

mary

stati

stic

sfr

om

the

two

ense

mble

sagain

stea

choth

er[1

3].

Dim

ensi

onality

reduct

ion

isca

rrie

dout,

for

exam

ple

,by

aver

agin

gov

ern-s

pace

that

involv

espart

icle

-clu

ster

ing

or

aver

agin

gov

erm

-space

that

yie

lds

<� z

z(t

)�zz(0

)>

eq�

<� x

x(t

)�xx(0

)>

eq

{a,b

,c}

k(x

i,x

j)

=x

i·x

j

2nd

� ||=

0

d�/d

�=

0.37

GPa

25 �xi�

xj�

AW

n�

AM

n

AM

n�

AE

n

AE

n�

AP

n

Quantita

tive

Chara

cter

ization

ofIn

trin

sic

Mole

cula

rM

otion

Usi

ng

Support

Vec

tor

Mach

ines

Ral

ph

E.

Lei

ghty

1an

dSam

eer

Var

ma1

,2,3

,�

1D

epart

men

tof

Cel

lBio

logy

,M

icro

biology

and

Molecu

lar

Bio

logy

,2D

epart

men

tof

Phys

ics,

Univ

ersi

tyof

South

Flo

rida,

Tam

pa,

FL

33620,

USA

and

3In

stitute

ofPure

and

Applied

Math

ematics

,U

niv

ersi

tyofC

alifo

rnia

Los

Ange

les,

Los

Ange

les,

CA

90095,U

SA

(Dat

ed:

July

30,20

12D

RA

FT

)

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,

can

be

alte

red

by

num

erou

sen

vir

onm

enta

lfa

ctor

s,su

chas

tem

per

ature

,an

dal

soby

the

bin

din

gof

other

mol

ecule

s.A

nunder

stan

din

gof

ense

mble

-lev

eldi↵

eren

cesfr

oma

geom

etri

cper

spec

tive

isim

por

tant

bec

ause

itpro

vid

esa

bas

isfo

rre

lati

ng

ther

modynam

icch

ange

sto

chan

ges

inm

olec

ula

rm

otio

n.

The

task

ofdi↵

eren

tiat

ing

two

configu

rati

onal

ense

mble

sis

,how

ever

,ch

alle

ngi

ng

bec

ause

itre

quir

esco

mpar

ing

two

hig

h-d

imen

sion

aldat

aset

s.Tra

dit

ional

ly,

when

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

ble

mis

circ

um

vente

dby

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

.H

owev

er,

since

dim

ensi

onal

ity

reduct

ion

isca

rrie

dou

tpri

orto

com

par

ison

ofen

sem

ble

s,su

chst

rate

gies

are

susc

epti

ble

toar

tifa

ctual

bia

sesfr

omin

form

atio

nlo

ss.

Her

ew

ein

troduce

am

ethod

bas

edon

suppor

tve

ctor

mac

hin

es(S

VM

)th

atco

mpar

es3-

den

sem

ble

sdir

ectl

yag

ainst

one

anot

her

inth

eH

ilber

tsp

ace

wit

hou

ta

pre

requis

ite

reduct

ion

inphas

esp

ace.

While

this

met

hod

can

be

applied

toan

ym

olec

ula

rsy

stem

,her

ew

eex

plo

reit

sse

nsi

tivity

tow

ard

model

syst

ems

com

pri

sed

ofam

ino

acid

s.W

eal

soap

ply

the

tech

niq

ue

toid

enti

fyth

esp

ecifi

cre

gion

sof

apar

amyxov

irus

Gpro

tein

that

are

a↵ec

ted

by

the

bin

din

gof

its

pre

ferr

edhum

anre

cepto

r,E

phri

nB

2.W

efind

that

the

spec

ific

regi

ons

iden

tified

by

this

met

hod

incl

ude

the

set

ofam

ino

acid

sth

atar

eknow

nfr

omex

per

imen

talst

udie

sto

pla

ya

vit

alro

lein

vir

alfu

sion

.T

he

configu

rati

onal

ense

mble

sof

the

vir

alpro

tein

inbot

hit

sbou

nd

and

unbou

nd

stat

esw

ere

gener

ated

usi

ng

all-at

omm

olec

ula

rdynam

ics

sim

ula

tion

s.

INT

RO

DU

CT

ION

The

ense

mble

of3-

dco

nfigu

rati

ons

exhib

ited

by

am

olec

ule

,th

atis

,it

sin

trin

sic

mot

ion,is

corr

elat

edw

ith

the

pro

per

ties

ofits

envir

onm

ent

[1–1

2].

Chan

ges

inin

tensi

veva

riab

les,

such

aste

mper

ature

and

pre

ssure

,m

odify

the

intr

insi

cm

otio

nof

am

olec

ule

.A

ddit

ional

ly,

mol

ecula

rm

otio

nal

soch

ange

sas

are

sult

ofbin

din

gw

ith

other

mol

ecule

s,su

chas

inliga

nd-s

ubst

rate

com

ple

xes

orm

olec

ula

ras

sem

blies

.M

oreo

ver,

the

exte

ntof

the

chan

gein

mol

ecula

rm

otio

nis

dep

enden

tupon

mult

iple

fact

ors,

incl

udin

gpro

per

ties

ofth

em

olec

ule

and

the

nat

ure

ofth

eper

turb

atio

nor

exte

rnal

pot

enti

al.

Aquan

tita

tive

char

acte

riza

tion

ofsu

chch

ange

sin

mol

ecula

rm

otio

nis

impor

tantfr

ombot

hsc

ienti

fic

asw

ellas

engi

nee

ring

per

-sp

ecti

ves

bec

ause

itpro

vid

esa

bas

isto

asso

ciat

ech

ange

s

inth

erm

odynam

icpro

per

ties

dir

ectl

yw

ith

corr

espon

din

gch

ange

sin

mol

ecula

rm

otio

n.

Di�

eren

tiat

ing

bet

wee

ntw

oen

sem

ble

sof

mol

ecula

rco

nfigu

rati

ons

is,

how

ever

,ch

alle

ngi

ng.

The

chal

lenge

lies

inco

mpar

ing

two

sets

ofhig

h-d

imen

sion

dat

a.For

the

mot

ion

ofa

n-p

arti

cle

mol

ecule

repre

sente

dby

m-

configu

rati

ons,

{an(x

)}m 1

,th

eta

skof

di�

eren

tiat

ing

itfr

omth

em

olec

ule

’sre

fere

nce

stat

e,{a

n(x

0)}

m 1,in

volv

esco

mpar

ing

two

3n-d

imen

sion

alve

ctor

spac

es.

Tra

dit

ion-

ally

,w

hen

anal

yzi

ng

mol

ecula

rsi

mula

tion

s,th

ispro

b-

lem

isdea

ltw

ith

by

firs

tre

duci

ng

the

dim

ensi

ons

ofth

etw

oen

sem

ble

sse

par

atel

y,an

dth

enco

mpar

ing

the

resu

lt-

ing

sum

mar

yst

atis

tics

from

the

two

ense

mble

sag

ainst

each

other

[13]

.D

imen

sion

ality

reduct

ion

isca

rrie

dou

t,fo

rex

ample

,by

aver

agin

gov

ern-s

pac

eth

atin

volv

espar

ticl

e-cl

ust

erin

gor

aver

agin

gov

erm

-spac

eth

atyie

lds

Fig

.2.

Chan

ges

inio

nco

mple

xation

enth

alpie

s(�

�H

)an

dfree

ener

gies

(��

G)

due

tom

ethyl

grou

ps.

�H

and�

Gwer

ees

tim

ated

separ

atel

yfo

rre

-

action

sA

+nX

⌦A

Xn

inw

hic

hA

isan

ion,

Na+

,K+

orB

a2+

,an

dX

is

alig

and

mol

ecule

,wat

er(W

),m

ethan

ol(M

),et

han

ol(E

)or

prop

anol

(P).

The

reac

tion

swer

eth

enco

mbin

edto

obta

in��

Han

d��

G.

All

valu

esar

ere

por

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

.D

i↵er

ence

sbet

wee

nth

e

elec

tron

icden

sities

of(B

aT

++

4�

4C

H3)�

(BaS

++

4�

4H

),ca

lcula

ted

with

PB

E0+

vdW

,ar

esh

own.

The

geom

etry

use

dwas

that

ofth

efu

llyre

laxe

d(P

BE+

vdW

)

BaT

++

4co

mple

x.To

build

BaS

++

4th

eC

H3

grou

ps

ofT

wer

esu

bst

itute

dby

Hat

oms

and

only

thes

ewer

ere

laxe

d.

The

di↵

eren

cein

elec

tron

den

sity

show

nth

us

corr

espon

ds

toth

ee↵

ect

ofpol

ariz

atio

non

the

elec

tron

icden

sities

cause

dby

the

additio

nof

met

hyl

grou

ps.

Mat

eria

lsan

dM

ethods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[22,

23],

wher

ewe

em-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

Fig

.2.

Chan

gesin

ion

com

ple

xation

enth

alpie

s(�

H)an

dfree

ener

gies

(�G

)due

toin

crea

sein

the

num

ber

ofm

ethyl

grou

ps

inth

eio

n-c

oor

din

atin

glig

ands.�

Han

d

�G

are

estim

ated

forth

esu

bst

itution

reac

tion

sA

Xn

+nX

0⌦

AX

0 n+

AX

n,

whic

har

eden

oted

asA

Xn!

AX

0 n.

Inth

ese

reac

tion

s,A

repr

esen

tson

eof

the

ions,

Na+

,K+

orB

a2+

,an

dX

repr

esen

tson

eof

the

ligan

dm

olec

ule

s,wat

er(W

),

met

han

ol(M

),et

han

ol(E

)or

prop

anol

(P).

All

ener

gies

are

repor

ted

inkc

al/m

ol.

Fig

.3.

E↵ec

tof

T!

Ssu

bst

itution

onel

ectr

onden

sities

ofan

isol

ated

S4

site

.T

he

di↵

eren

ces

bet

wee

nth

eel

ectr

onic

den

sities

,⇢

Ba

T4�

4⇢

CH

3�

⇢B

aS4�

4⇢

H,

wer

ees

tim

ated

using

the

PB

E0+

vdW

funct

ional

for

the

optim

um

geom

etry

ofth

e

BaT4

com

ple

x.T

he

geom

etry

ofth

eB

aS

4co

mple

xwas

const

ruct

edfrom

the

BaT4

optim

ized

geom

etry

byre

pla

cing

the

CH

3gr

oup

ofea

chth

reon

ine

residue

with

anH

atom

,an

dth

enre

laxi

ng

the

coor

din

ates

ofH

atom

s.

Mate

rials

and

Met

hods

All

geom

etric

rela

xation

swer

eper

form

edw

ith

the

all-el

ectr

on,

loca

lized

bas

is

(num

eric

atom

-cen

tere

dor

bital

s)pr

ogra

mpac

kage

FH

I-ai

ms

[24,

25],

wher

ewe

em-

plo

yed

the

def

ault

“tig

ht”

stan

dar

ds

rega

rdin

gbas

isse

tsan

din

tegr

atio

ngr

ids

for

all

elem

ents

.For

DFT

(LD

A/G

GA

)ca

lcula

tion

sth

ese

sett

ings

produce

esse

ntial

lyco

n-

2w

ww

.pnas

.org

/cgi

/doi

/10.

1073

/pnas

.070

9640

104

Foot

line

Auth

or

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

trons - ALEX/MARIANA - PLEASE ELABORATE...We em-ploy the recently developed...We use density-functional theorywithin the PBE [18] and PBE0 [19, 20] approximations for theexchange-correlation potential. In addition, we employ a re-cently developed van der Waals (vdW) correction scheme [21]based on a summation of pairwise C6[n]/R6 terms, where theC6 coe�cients are based on the self-consistent electronic den-sity. This approach allows us to treat accurately the energeticsand e↵ects of polarization in our systems.

We find that a T!S substitution is accompanied with onlyminor structural changes. In addition, we find that the reduc-tion in polarizability of S4 associated with a T!S substitu-tion is su�cient to explain experimental observations.Thesestudies provide, in general, a vivid example of how peptidefunction is influenced by the polarizability of methyl groups,especially those that are proximal to charged moieties, includ-ing ions, titratable side-acids and phosphates.

Results and DiscussionTo understand how T!S substitutions in the S4 site a↵ect K–channel function, we examine first the general consequencesof methyl polarizability on the thermodynamics of ion bind-ing. We estimate the gas phase enthalpies and free energiesof Na+, K+ and Ba2+ ion binding to four di↵erent molecules,namely water, methanol, ethanol and propanol. While allthese molecules provide hydroxyl ligands for ion-coordination,they di↵er from each other in the number of methyl groupsthey carry. Consequently they have di↵erent static averageground-state dipole polarizabilities of 1.5, 3.3, 5.4 and 6.7 A3,respectively [15]. The results of these calculations are plottedin Fig. 2. We find that while the addition of methyl groupsenhance the stability of the complex, the e↵ect decreases witheach incremental addition. In addition, multibody e↵ects arethe e↵ect of coordination number is non-trivial.

Table 2. distance between methyl and K ion in the s4 siteis about 5 A

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

-18-15-12-9-6-303

1 2 3 4 1 2 3 4 1 2 3 4

-20

-16

-12

-8

-4

0

1 2 3 4 1 2 3 4 1 2 3 4

K+

Ba2+

Na+

ΔHΔG

n n n

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

< �zz(t)�zz(0) >eq � < �xx(t)�xx(0) >eq

{a,b, c}k(xi,xj) = xi · xj

2nd

�|| = 0

d�/d� = 0.37 GPa

25

�xi � xj�AWn � AMn

AMn � AEn

AEn � APn

Quantitative Characterization of Intrinsic Molecular MotionUsing Support Vector Machines

Ralph E. Leighty1 and Sameer Varma1,2,3,�

1Department of Cell Biology, Microbiology and Molecular Biology,2Department of Physics, University of South Florida, Tampa, FL 33620, USA and

3Institute of Pure and Applied Mathematics, University of California Los Angeles, Los Angeles, CA 90095, USA

(Dated: July 30, 2012 DRAFT)

The ensemble of 3-d configurations exhibited by a molecule, that is, its intrinsic motion, can bealtered by numerous environmental factors, such as temperature, and also by the binding of othermolecules. An understanding of ensemble-level di↵erences from a geometric perspective is importantbecause it provides a basis for relating thermodynamic changes to changes in molecular motion.The task of di↵erentiating two configurational ensembles is, however, challenging because it requirescomparing two high-dimensional datasets. Traditionally, when analyzing molecular simulations,this problem is circumvented by first reducing the dimensions of the two ensembles separately,and then comparing summary statistics from the two ensembles against each other. However,since dimensionality reduction is carried out prior to comparison of ensembles, such strategies aresusceptible to artifactual biases from information loss. Here we introduce a method based on supportvector machines (SVM) that compares 3-d ensembles directly against one another in the Hilbertspace without a prerequisite reduction in phase space. While this method can be applied to anymolecular system, here we explore its sensitivity toward model systems comprised of amino acids.We also apply the technique to identify the specific regions of a paramyxovirus G protein that area↵ected by the binding of its preferred human receptor, Ephrin B2. We find that the specific regionsidentified by this method include the set of amino acids that are known from experimental studiesto play a vital role in viral fusion. The configurational ensembles of the viral protein in both itsbound and unbound states were generated using all-atom molecular dynamics simulations.

INTRODUCTION

The ensemble of 3-d configurations exhibited by amolecule, that is, its intrinsic motion, is correlated withthe properties of its environment [1–12]. Changes inintensive variables, such as temperature and pressure,modify the intrinsic motion of a molecule. Additionally,molecular motion also changes as a result of binding withother molecules, such as in ligand-substrate complexes ormolecular assemblies. Moreover, the extent of the changein molecular motion is dependent upon multiple factors,including properties of the molecule and the nature ofthe perturbation or external potential. A quantitativecharacterization of such changes in molecular motion isimportant from both scientific as well as engineering per-spectives because it provides a basis to associate changes

in thermodynamic properties directly with correspondingchanges in molecular motion.

Di�erentiating between two ensembles of molecularconfigurations is, however, challenging. The challengelies in comparing two sets of high-dimension data. Forthe motion of a n-particle molecule represented by m-configurations, {an(x)}m

1 , the task of di�erentiating itfrom the molecule’s reference state, {an(x0)}m

1 , involvescomparing two 3n-dimensional vector spaces. Tradition-ally, when analyzing molecular simulations, this prob-lem is dealt with by first reducing the dimensions of thetwo ensembles separately, and then comparing the result-ing summary statistics from the two ensembles againsteach other [13]. Dimensionality reduction is carried out,for example, by averaging over n-space that involvesparticle-clustering or averaging over m-space that yields

Fig. 2. Changes in ion complexation enthalpies (��H) and free energies

(��G) due to methyl groups. �H and �G were estimated separately for re-

actions A + nX ⌦ AXn in which A is an ion, Na+, K+ or Ba2+, and X is

a ligand molecule, water (W ), methanol (M ), ethanol (E ) or propanol (P). The

reactions were then combined to obtain ��H and ��G. All values are reported

in kcal/mol.

Fig. 3. E↵ect of T!S substitution on electron densities. Di↵erences between the

electronic densities of (BaT++4 � 4CH3)� (BaS++

4 � 4H), calculated with

PBE0+vdW, are shown. The geometry used was that of the fully relaxed (PBE+vdW)

BaT++4 complex. To build BaS++

4 the CH3 groups of T were substituted by

H atoms and only these were relaxed. The di↵erence in electron density shown thus

corresponds to the e↵ect of polarization on the electronic densities caused by the

addition of methyl groups.

Materials and Methods

All geometric relaxations were performed with the all-electron, localized basis

(numeric atom-centered orbitals) program package FHI-aims [22, 23], where we em-

2 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

ion-coordination number. This non-linearity emerges as aconsequence of repulsion between the coordinating ligands[23, 36, 37]. Finally, �H and �G, also depend on thecharge/size ratio of the ion, and once again, we find a non-linear relationship between the charge/size ratio of the ion andthe magnitude of the thermodynamic change. Doubling thecharge/size ratio of the ion does not necessarily imply that thestability of the complex also increases by a factor of two, asmay be the case when induced e↵ects are small. For example,while the �G associated with a K+W ! K+M substitution is-1.9 kcal/mol, the �G associated with the Ba2+W ! Ba2+Msubstitution is -7.9 kcal/mol, which is a 4-fold change insteadof a 2-fold change. This additional non-linear e↵ect also un-derscores the importance of explicit polarization e↵ects in ioncomplexation [23].

In the calculations above, the small molecules have similargas phase dipole moments, but markedly di↵erent polarizabil-ities, and a ligand with a higher polarizability produces, ingeneral, a more stable ion complex. Can ligands with simi-lar gas phase dipole moments as well as static dipole polariz-abilties produce complexes with markedly di↵erent stabilities?Butanol has three di↵erent isomers, n-butanol (B), isobutanol(iB) and 2-butanol (2B). While these isomers have compara-ble gas phase dipole moments (⇠1.7 Debye) and polarizabili-ties (⇠8.9 A3)[3], the di↵erences in their chemical structureswill result in di↵erent distributions of methyl groups and/ormethylene bridges around the ion. Consequently, despite hav-ing similar polarizabilities, they will respond to the distance–dependent field of the ion di↵erently, and the closer proxim-ity of methyl groups in isobutanol and 2-butanol to the ion,compared to n-butanol, can be expected to increase the sta-bility of the complex. To quantify this e↵ect we carry outsubstitution reactions (Equ. 1) involving these three butanolisomers. The results are plotted in Fig. 3. Indeed, we findthat the isobutanol and 2-butanol complexes are more stablethan the corresponding n-butanol complexes. Furthermore,complexes comprising of 2-butanols are more stable than thecorresponding complexes comprising of isobutanols, as one ofthe two terminal methyl groups of 2-butanol is closer to theion than that of isobutanol. Thus, in addition to the staticdipole polarizabilities of the methyl groups, the distributionof methyl groups around the ion also plays a role in ion com-plex stability. We also note that the distances of the ions fromthe closest methyl groups of 2-butanol (⇠5 A) are compara-ble to the distances of ions from the branched methyl groupof threonine in the S4 site of KcsA [7, 11]. This suggests fur-ther that the reduction in methyl–induced polarization due toT!S substitutions in the S4 sites of K–channels can reducethe binding a�nities of all three ions significantly.

ABn ! A(iB)n

Fig. 3. E↵ect of butanol isoform on ion complexation enthalpies (�H). The

enthalpies are estimated at 298 K for the substitution reactions given by Equ. 1,

which are denoted as AXn ! AX0n. All energies are in units of kcal/mol.

In the S4 sites of K–channels the ions coordinate witheight ligands, and not just four hydroxyl ligands as studiedabove. Consequently, the repulsion between the coordinatingligands will be larger in the S4 site, which could reduce orwash out the contribution of the threonine methyl groups tothe binding of ions at the S4 site. Furthermore, a T!S sub-stitution may also alter the manner in which the coordinatinggroups of the S4 site interact with ions.

To resolve these issues and understand how the threoninemethyl groups contribute to K–channel function, we consideran isolated S4 site composed of threonine residues. In thisrepresentative model, we substitute, in unit increments, thethreonines with serines, that is,

AT4 + nS ⌦ AT4�nSn + nT, [2]

and estimate the changes in free energy in the gas phase (Ta-ble 1). We find that the repulsion between the eight coordi-nating groups is not large enough to counteract the stabiliza-tion provided by the presence of methyl groups. The a�nityof the S4 site for all three ions, Na+, K+ and Ba2+, dimin-ishes, although non-linearly, with each threonine substitution.Furthermore, these substitutions result in only minor config-urational changes that do not alter the overall topology ofthe S4 site (Table S2 of supporting information), which im-plies that the e↵ect of T!S substitutions on configurationalentropy will be minimal, and may be neglected. While wefind that the configurational change is generally higher in thecase of Ba2+ complexes as compared to Na+ and K+ com-plexes, we also note that optimizations of Ba2+ complexesfollowing T!S substitutions, but under heavy atom positionrestraints, result in a higher drop in S4 site a�nity. For ex-ample, a BaT4 ! BaS4 substitution followed by a constrainedoptimization yields a �E = XX kcal/mol, as opposed to a �E= XX kcal/mol obtained after an unconstrained optimization.This indicates that if the K–channel matrix were rigid to dis-allow a structural change, a T!S substitution will result ina higher loss in S4 site a�nity for Ba2+, as compared to theresults in Table 1. Together, this suggests that the loss ina�nity of the S4 site noted in Table 1 is not a result of alteredion–binding topologies, but is primarily due to a reduction inthe methyl–induced polarization of the S4 site.

When all four threonines are replaced by serines, the es-timated drop in the a�nity of the S4 site is the highest forBa2+ (Table 1). The free energy di↵erence of 6.9 kcal/mol isequivalent to a 106-fold drop in binding a�nity, where morethan half of the a�nity drop is due to a loss in dispersionenergy (Table S3 of supporting information). The net drop inBa2+ binding a�nity accounts for the change in half maximalinhibitory concentrations (IC50) of Ba2+ seen in T!S substi-tution experiments [16]. In addition, it also accounts for thedi↵erence between the Kir 2.4 and Kir 2.1 experimental IC50

values of Ba2+ [20, 21], as Kir 2.4 carries a serine and Kir 2.1a threonine in the S4 site.

The computed drop in stability is, however, greater thanthat inferred from experiments (by over 3 kcal/mol). This dis-crepancy does not result from the pairwise approximation [31]employed in the estimation of the dispersion energy. Whilemany-body dispersion terms can contribute significantly toligand binding [38], we find their contribution to be only0.3 kcal/mol to the BaT4 ! BaS4 substitution reaction (Ta-ble S4 of supporting information). We also do not expectthis discrepancy to be due to the missing structural restraintsfrom the protein matrix on the isolated S4 site, as the T!Ssubstitutions are accompanied by only minor configurationalchanges (Table S2 of supporting information). The overesti-mated drop in Ba2+ binding a�nity is most likely because ourcalculations lack a polarization coupling between the S4 siteand its external environment. Fig. 4 shows that the polar-ization in the electron density of the threonine methyl groupsis along the electric field of the Ba2+ ion, which maximizesthe contribution of methyl polarization to Ba2+ binding. Thepresence of other polar chemical moieties proximal to the S4site, including water molecules, can reduce the contributionof polarization and thereby the overall e↵ect of T!S substi-tutions on Ba2+ binding. Thermodynamically, this reduction

Footline Author PNAS Issue Date Volume Issue Number 3

ion-coordination number. This non-linearity emerges as aconsequence of repulsion between the coordinating ligands[23, 36, 37]. Finally, �H and �G, also depend on thecharge/size ratio of the ion, and once again, we find a non-linear relationship between the charge/size ratio of the ion andthe magnitude of the thermodynamic change. Doubling thecharge/size ratio of the ion does not necessarily imply that thestability of the complex also increases by a factor of two, asmay be the case when induced e↵ects are small. For example,while the �G associated with a K+W ! K+M substitution is-1.9 kcal/mol, the �G associated with the Ba2+W ! Ba2+Msubstitution is -7.9 kcal/mol, which is a 4-fold change insteadof a 2-fold change. This additional non-linear e↵ect also un-derscores the importance of explicit polarization e↵ects in ioncomplexation [23].

In the calculations above, the small molecules have similargas phase dipole moments, but markedly di↵erent polarizabil-ities, and a ligand with a higher polarizability produces, ingeneral, a more stable ion complex. Can ligands with simi-lar gas phase dipole moments as well as static dipole polariz-abilties produce complexes with markedly di↵erent stabilities?Butanol has three di↵erent isomers, n-butanol (B), isobutanol(iB) and 2-butanol (2B). While these isomers have compara-ble gas phase dipole moments (⇠1.7 Debye) and polarizabili-ties (⇠8.9 A3)[3], the di↵erences in their chemical structureswill result in di↵erent distributions of methyl groups and/ormethylene bridges around the ion. Consequently, despite hav-ing similar polarizabilities, they will respond to the distance–dependent field of the ion di↵erently, and the closer proxim-ity of methyl groups in isobutanol and 2-butanol to the ion,compared to n-butanol, can be expected to increase the sta-bility of the complex. To quantify this e↵ect we carry outsubstitution reactions (Equ. 1) involving these three butanolisomers. The results are plotted in Fig. 3. Indeed, we findthat the isobutanol and 2-butanol complexes are more stablethan the corresponding n-butanol complexes. Furthermore,complexes comprising of 2-butanols are more stable than thecorresponding complexes comprising of isobutanols, as one ofthe two terminal methyl groups of 2-butanol is closer to theion than that of isobutanol. Thus, in addition to the staticdipole polarizabilities of the methyl groups, the distributionof methyl groups around the ion also plays a role in ion com-plex stability. We also note that the distances of the ions fromthe closest methyl groups of 2-butanol (⇠5 A) are compara-ble to the distances of ions from the branched methyl groupof threonine in the S4 site of KcsA [7, 11]. This suggests fur-ther that the reduction in methyl–induced polarization due toT!S substitutions in the S4 sites of K–channels can reducethe binding a�nities of all three ions significantly.

ABn ! A(2B)n

Fig. 3. E↵ect of butanol isoform on ion complexation enthalpies (�H). The

enthalpies are estimated at 298 K for the substitution reactions given by Equ. 1,

which are denoted as AXn ! AX0n. All energies are in units of kcal/mol.

In the S4 sites of K–channels the ions coordinate witheight ligands, and not just four hydroxyl ligands as studiedabove. Consequently, the repulsion between the coordinatingligands will be larger in the S4 site, which could reduce orwash out the contribution of the threonine methyl groups tothe binding of ions at the S4 site. Furthermore, a T!S sub-stitution may also alter the manner in which the coordinatinggroups of the S4 site interact with ions.

To resolve these issues and understand how the threoninemethyl groups contribute to K–channel function, we consideran isolated S4 site composed of threonine residues. In thisrepresentative model, we substitute, in unit increments, thethreonines with serines, that is,

AT4 + nS ⌦ AT4�nSn + nT, [2]

and estimate the changes in free energy in the gas phase (Ta-ble 1). We find that the repulsion between the eight coordi-nating groups is not large enough to counteract the stabiliza-tion provided by the presence of methyl groups. The a�nityof the S4 site for all three ions, Na+, K+ and Ba2+, dimin-ishes, although non-linearly, with each threonine substitution.Furthermore, these substitutions result in only minor config-urational changes that do not alter the overall topology ofthe S4 site (Table S2 of supporting information), which im-plies that the e↵ect of T!S substitutions on configurationalentropy will be minimal, and may be neglected. While wefind that the configurational change is generally higher in thecase of Ba2+ complexes as compared to Na+ and K+ com-plexes, we also note that optimizations of Ba2+ complexesfollowing T!S substitutions, but under heavy atom positionrestraints, result in a higher drop in S4 site a�nity. For ex-ample, a BaT4 ! BaS4 substitution followed by a constrainedoptimization yields a �E = XX kcal/mol, as opposed to a �E= XX kcal/mol obtained after an unconstrained optimization.This indicates that if the K–channel matrix were rigid to dis-allow a structural change, a T!S substitution will result ina higher loss in S4 site a�nity for Ba2+, as compared to theresults in Table 1. Together, this suggests that the loss ina�nity of the S4 site noted in Table 1 is not a result of alteredion–binding topologies, but is primarily due to a reduction inthe methyl–induced polarization of the S4 site.

When all four threonines are replaced by serines, the es-timated drop in the a�nity of the S4 site is the highest forBa2+ (Table 1). The free energy di↵erence of 6.9 kcal/mol isequivalent to a 106-fold drop in binding a�nity, where morethan half of the a�nity drop is due to a loss in dispersionenergy (Table S3 of supporting information). The net drop inBa2+ binding a�nity accounts for the change in half maximalinhibitory concentrations (IC50) of Ba2+ seen in T!S substi-tution experiments [16]. In addition, it also accounts for thedi↵erence between the Kir 2.4 and Kir 2.1 experimental IC50

values of Ba2+ [20, 21], as Kir 2.4 carries a serine and Kir 2.1a threonine in the S4 site.

The computed drop in stability is, however, greater thanthat inferred from experiments (by over 3 kcal/mol). This dis-crepancy does not result from the pairwise approximation [31]employed in the estimation of the dispersion energy. Whilemany-body dispersion terms can contribute significantly toligand binding [38], we find their contribution to be only0.3 kcal/mol to the BaT4 ! BaS4 substitution reaction (Ta-ble S4 of supporting information). We also do not expectthis discrepancy to be due to the missing structural restraintsfrom the protein matrix on the isolated S4 site, as the T!Ssubstitutions are accompanied by only minor configurationalchanges (Table S2 of supporting information). The overesti-mated drop in Ba2+ binding a�nity is most likely because ourcalculations lack a polarization coupling between the S4 siteand its external environment. Fig. 4 shows that the polar-ization in the electron density of the threonine methyl groupsis along the electric field of the Ba2+ ion, which maximizesthe contribution of methyl polarization to Ba2+ binding. Thepresence of other polar chemical moieties proximal to the S4site, including water molecules, can reduce the contributionof polarization and thereby the overall e↵ect of T!S substi-tutions on Ba2+ binding. Thermodynamically, this reduction

Footline Author PNAS Issue Date Volume Issue Number 3

K+Ba2+

Na+

Fig. 3. E↵ect of butanol isoform on ion complexation enthalpies (�H). The

enthalpies are estimated at 298 K for the substitution reactions given by Equ. 1,

which are denoted as AXn ! AX0n. All energies are in units of kcal/mol.

APn ! ABn

To resolve these issues and understand how the threoninemethyl groups contribute to K–channel function, we consideran isolated S4 site composed of threonine residues. In thisrepresentative model, we substitute, in unit increments, thethreonines with serines, that is,

AT4 + nS ⌦ AT4�nSn + nT, [2]

and estimate the changes in free energy in the gas phase (Ta-ble 1). We find that the repulsion between the eight coor-dinating groups is not large enough to counteract the sta-bilization provided by the presence of methyl groups. Thea�nity of the S4 site for all three ions (Na+, K+, Ba2+) di-minishes, although non-linearly, with each threonine substi-tution. Furthermore, these substitutions result in only minorconfigurational changes that do not alter the overall topol-ogy (geometry) of the S4 site (Table S2 of supporting infor-mation), which implies that the e↵ect of T!S substitutionson configurational entropy will be minimal [38], and may beneglected. While we find that the configurational change isgenerally higher in the case of Ba2+ complexes as comparedto Na+ and K+ complexes, we also note that optimizationsof Ba2+ complexes following T!S substitutions, but underheavy atom position restraints, result in a higher drop in S4site a�nity. For example, a BaT4 ! BaS4 substitution fol-lowed by a constrained optimization yields a single point en-ergy di↵erence �E = XX kcal/mol, as opposed to a �E =XX kcal/mol obtained from an unconstrained optimization.This indicates that if the structure of the S4 site remainedunchanged from a T!S mutation, then the a�nity of the S4site for Ba2+ would drop even further than that reported inTable 1. Together, this suggests that the loss in a�nity ofthe S4 site noted in Table 1 is not a result of altered ion–binding topologies, but is primarily due to a reduction in themethyl–induced polarization of the S4 site.

When all four threonines are replaced by serines, the es-timated drop in the a�nity of the S4 site is the highest forBa2+ (Table 1). The free energy di↵erence of 6.9 kcal/mol isequivalent to a 106-fold drop in binding a�nity, where morethan half of the a�nity drop is due to a loss in dispersionenergy (Table S3 of supporting information). The net drop inBa2+ binding a�nity accounts for the change in half maximalinhibitory concentrations (IC50) of Ba2+ seen in T!S substi-tution experiments [16]. In addition, it also accounts for thedi↵erence between the Kir 2.4 and Kir 2.1 experimental IC50

Footline Author PNAS Issue Date Volume Issue Number 3

K+Ba2+

Na+

1 2 3 4 1 2 3 4-12

-9

-6

-3

0

3

1 2 3 4

Fig. 3. Effect of butanol isomer on ion complexation enthalpies (∆H). The en-

thalpies are estimated at 298 K for the substitution reactions given by Equ. 1, which

are denoted as AXn → AX′n. All energies are in units of kcal/mol.

To resolve these issues and understand how the threoninemethyl groups contribute to K–channel function, we consideran isolated S4 site composed of threonine residues. In thisrepresentative model, we substitute, in unit increments, thethreonines with serines, that is,

AT4 + nS AT4−nSn + nT, [2]

and estimate the changes in free energy in the gas phase (Ta-ble 1). We find that the repulsion between the eight coor-dinating groups is not large enough to counteract the sta-bilization provided by the presence of methyl groups. Theaffinity of the S4 site for all three ions (Na+, K+, Ba2+) di-minishes, although non-linearly, with each threonine substi-tution. Furthermore, these substitutions result in only minorconfigurational changes that do not alter the overall topology(geometry) of the S4 site (Table S2 and Figure S3 of sup-porting information), which implies that the effect of T→Ssubstitutions on configurational entropy will be minimal [38],and may be neglected. While we find that the configurationalchange is generally higher in the case of Ba2+ complexes ascompared to Na+ and K+ complexes, we also note that op-timizations of Ba2+ complexes following T→S substitutions,but under heavy atom position restraints, result in a higherdrop in S4 site affinity. For example, a BaT4 → BaS4 substi-tution followed by a constrained optimization yields a singlepoint energy difference ∆E = 8.8 kcal/mol, as opposed toa ∆E = 5.2 kcal/mol obtained from an unconstrained opti-mization. This indicates that if the structure of the S4 siteremained unchanged after a T→S mutation, then the affinityof the S4 site for Ba2+ would drop even further than thatreported in Table 1. Together, this suggests that the loss inaffinity of the S4 site noted in Table 1 is not a result of alteredion–binding topologies, but is primarily due to a reduction inthe methyl–induced polarization of the S4 site.

When all four threonines are replaced by serines, the es-timated drop in the affinity of the S4 site is the highest forBa2+ (Table 1). The free energy difference of 6.9 kcal/mol isequivalent to a 106-fold drop in binding affinity, where morethan half of the affinity drop is due to a loss in dispersionenergy (Table S3 of supporting information). The net drop in

Footline Author PNAS Issue Date Volume Issue Number 3

Page 4: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

Ba2+ binding affinity accounts for the change in half maximalinhibitory concentrations (IC50) of Ba2+ seen in T→S substi-tution experiments [16]. In addition, it also accounts for thedifference between the Kir 2.4 and Kir 2.1 experimental IC50

values of Ba2+ [20, 21], as Kir 2.4 carries a serine and Kir 2.1a threonine in the S4 site.

The computed drop in stability is, however, greater thanthat inferred from experiments (by over 3 kcal/mol). This dis-crepancy does not result from the pairwise approximation [31]employed in the estimation of the dispersion energy. Whilemany-body dispersion terms can contribute significantly toligand binding [39], we find their contribution to be only0.3 kcal/mol to the BaT4 → BaS4 substitution reaction (Ta-ble S4 of supporting information). We also do not expectthis discrepancy to be due to the missing structural restraintsfrom the protein matrix on the isolated S4 site, as the T→Ssubstitutions are accompanied by only minor configurationalchanges (Table S2 of supporting information).

The overestimated drop in Ba2+ binding affinity is mostlikely because our calculations lack a polarization couplingbetween the S4 site and its external environment. Fig. 4shows that the polarization in the electron density of the thre-onine methyl groups is along the electric field of the Ba2+ ion,which maximizes the contribution of methyl polarization toBa2+ binding. The presence of other polar chemical moietiesproximal to the S4 site, including water molecules, can reducethe contribution of polarization and thereby the overall effectof T→S substitutions on Ba2+ binding. Thermodynamically,this reduction will appear as an increase in the free energypenalty associated with threonine extraction from its localenvironment, which has been shown to influence ion bind-ing [40, 41]. Another possible explanation could be that thecomputed values may not be comparable directly with exper-imental estimates. It is plausible that a T→S substitutionindeed causes the Ba2+ binding affinity of the S4 site to dropclose to the computed value, and in such an event Ba2+ bindsto an alternative site in the filter that is more favorable thanthe serine–containing S4 site. The experimental estimates ofBa2+ IC50 values in the presence of threonines and serinescould, therefore, correspond to two different binding sites ofBa2+ in the filter, instead of two different chemistries of an S4binding site. Irregardless, we find that the presence of methylgroups contributes significantly to the binding of Ba2+ ionsto K–channels.

Fig. 4. Effect of T→S substitution on the electron density ρ of an isolated S4

site bound to Ba2+. This effect was estimated as ∆ρ = (ρBaT4− 4ρCH3

−ρBaS4

+ 4ρH) using the PBE0+vdW functional. In the expression above, ρX are

the electron densities of the isolated groups, X. The BaS4 complex was constructed

from the optimized geometry of the BaT4 complex by first replacing its side chain

CH3 groups with H atoms, and then relaxing the H atom coordinates.

From Table 1, we also see that when all four threoninesare replaced by serines, the stability of the S4 site complexedwith K+ ions drops by 3.4 kcal/mol. It is known that K+

binding to the selectivity filter contributes to the organiza-tion of the filter’s conductive state, as well as to the channel’sability to assemble into tetramers [7, 50, 51, 52]. A dropin K+ binding affinity at the S4 site could, therefore, desta-bilize the filter’s conductive state, and/or impede the chan-nel’s ability to form tetramers. Indeed, electrophoresis stud-ies show that T→S substitutions in Kcv reduce the channel’sability to assemble into tetramers [16]. In addition, electro-physiological studies show that T→S substitutions in Kir 2.1alter the channel’s mean open probabilities [16]. The bind-ing of K+ ions to the S4 site also plays a vital role in theconcerted movement of ions along the permeation pathway[10, 42, 43, 44, 45, 46, 47, 48, 49]. A T→S substitution could,therefore, potentially also alter the kinetics of K+ transportthrough the channel, although no such effects were observed inexperiments on Kcv and Kir 2.1 channels [16]. Nevertheless,it is clear from Table 1 and Figures 2 and 3 that the polar-ization induced by methyl groups can contribute significantlyto how K+ ions interact with the channel, and therefore, theion’s transport properties.

The data in Table 1 also indicates that T→S substitutionswill lower the affinity of the S4 site for Na+ ions. This reduc-tion is, in general, greater than that estimated in the case ofK+ ions, which suggests that T→S substitutions will also in-crease the K/Na selectivity of the S4 site. This, however, doesnot imply that the overall selectivity of the channel will alsoincrease. Thermodynamic calculations on KcsA indicate thatthe contribution of the S4 site to the channel’s K/Na selectiv-ity is small [53]. Furthermore, the channel’s overall selectivityalso depends on the number of 8-fold ion binding sites in theselectivity filter [54], and how these sites interact with theirproximal environment, including with each other [40, 55, 56].While it is not straightforward to ascertain how these effectswill interplay to alter the selective properties of T→S mutantchannels, electrophysiological studies show that T→S substi-tutions in Kcv channels do not alter K/Na selectivity [16].

Together, we find that the polarization induced by methylgroups stabilizes ion complexation significantly, and can,therefore, also be expected to influence the interactions ofother biological charged species. The magnitude of the ef-fect depends non–linearly on ion charge, the proximity of the

4 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

Page 5: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

methyl group from the ion, as well as on the number of methylgroups present in the ion complex. T→S substitutions in theS4 sites of K–channels result in removal of methyl groups froma distance greater than 5 A from the ion. While the architec-ture of the binding site is affected minimally, the stability ofthe S4 site for Ba2+ drops by several kcal/mol with the T→Ssubstitution. This destabilization accounts for the changes inIC50 values of Ba2+ seen in experiments. Additionally, theaffinity of the S4 site also drops for K+ ions, which correlateswith changes in gating properties as well as a reduction inthe overall stability of the channel. These results show howpolarization induced by methyl groups is important to bothK–channel function and stability.

SummaryNaturally occurring or engineered T→S substitutions at theS4 ion-binding sites of K–channels reduce the number ofmethyl groups near the ion. Despite this seemingly benignchange in the number of hydrophobic groups, electrophysio-logical experiments show that T→S substitutions modify K–channel properties. Results from our first principles quantummechanical calculations account for these modifications. Wefind that the methyl–induced polarization is large enough thatits reduction in S4 sites due to T→S substitutions reduces ionaffinities dramatically, with minimal change in topology. Thisloss in ion affinity is consistent with experimental observa-tions, such as changes in open channel probabilities, and IC50

values of Ba2+. These results illustrate quantitatively howprotein function can also rely on the polarization induced bymethyl groups. Given the abundance of charged moieties inbiological systems, such as ions, titratable amino acids, sul-phates, phosphates and nucleotides, it appears unlikely thatwe have uncovered an unwonted instance of where the po-larizability of methyl groups matter. The recent advances in

computational methods now permit a proper quantitative as-sessment of the role of methyl groups in biomolecular function.

Materials and MethodsFree energy changes ∆G associated with the substitution reactions (Equations 1 and

2) are obtained after relaxing separately each molecular complex using the DFT+vdW

method [30] implemented in the all-electron, localized basis FHI-aims package [57, 58].

The vdW corrected PBE [26] functional (PBE+vdW) is used for relaxations. Within

the DFT+vdW method, PBE performs well against the S22 data set [59], exhibiting

a mean absolute error of only 0.3 kcal/mol [31]. Relativistic corrections are used only

for complexes containing barium ions, and using the atomic ZORA method [60]. We

employ the “tight” settings for integration grids and basis sets, as described in Ref.

[57], which yield converged energy differences and negligible basis set superposition

error. While the starting geometries of the ion-water complexes are taken from Ref.

[41], the starting geometries of the ion-alcohol complexes are constructed using the

ion-water complexes as templates. The coordinates for the S4 site of the K–channel

selectivity filter are taken from the X-ray structure of KcsA [7].

Following relaxation, the complex geometries are subjected to a single point

calculation using the hybrid PBE0[27, 28]+vdW functional. Due to the partial in-

clusion of exact exchange in PBE0, this functional improves the description of elec-

tron delocalization [61]. This is important to the estimation of vdW contributions,

C6[n(r)]/R6, as they depend explicitly on the self-consistent electron densities

n(r). The translational, rotational and vibrational contributions are estimated sep-

arately for each molecular complex, with the PBE+vdW functional, at a temperature

of 298 K and a pressure of 1 atmosphere. These contributions are then added to the

PBE0+vdW single point energies to obtain the free energies G of individual com-

plexes. The free energy changes ∆G associated with the substitution reactions are

then obtained by subtracting the free energies of reactant complexes from the product

complexes; that is, ∆G =∑

npGp −∑

nrGr , where np and nr are the

stoichiometries of the products and reactants.

ACKNOWLEDGMENTS. We thank Daniel L. Minor, Jr. and Carsten Baldauf forstimulating discussions. MR and AT acknowledge computational time from the Ar-gonne Leadership Computing Facility at ANL, which is supported by the Office ofScience of the US DoE under contract DE-AC02-06CH11357. SBR acknowledgesfunding from Sandia’s LDRD program.SNL is managed and operated by Sandia Cor-poration, a wholly owned subsidiary of LMC, for the US DoEs NNSA under contractDE AC04-94AL85000. AT and SV also acknowledge support from the Spring 2011long program of the Institute of Pure and Applied Mathematics at UCLA.

1. WK Paik, DC Paik, and S Kim. Historical review: the field of protein methylation.

TRENDS in Biochemical Sciences, 32(3):146-152, 2007.

2. KD Roberston. DNA methylation and human disease. Nature Rev. Genetics, 6:597-

610, 2005.

3. DR Lide. CRC Handbook of Chemistry and Physics. Taylor and Francis, Boca Raton,

FL, 2008.

4. LC Sowers, BR Shaw, and WD Sedwick. Base stacking and molecular polarizability:

Effect of a methyl group in the 5-position of pyrimidines. Biochem. Biophys. Res.

Comm., 148(2):790-794, 1987.

5. S Wang, and ET Kool. Origins of the Large Differences in Stability of DNA and RNA

Helices: C-5 Methyl and 2-Hydroxyl Effects Biochemistry, 34:4125-4132, 1995.

6. B Hille. Ionic Channels of Excitable Membranes. Sinauer Associates, Sunderland, MA,

2001.

7. Y Zhou, JH Morais-Cabral, A Kaufman, and R MacKinnon. Chemistry of ion coor-

dination and hydration revealed by a K+ channel-Fab complex at 2.0 A resolution.

Nature, 414:43-48, 2001.

8. CM Armstrong, and SR Taylor. Interaction of barium ions with potassium channels in

squid giant axons. Biophys. J, 30:473-488, 1980.

9. DC Eaton, and MS Brodwick. Effect of barium on the potassium conductance of squid

axons. J Gen. Physiol., 75:727-750, 1980.

10. J Neyton, and C Miller. Discrete Ba2+ block as a probe of ion occupancy and

pore structure in the high-conductance Ca2+-activated K+ channel. J Gen. Physiol.,

92:569-586, 1988.

11. Y Jiang, and R MacKinnon. The Barium Site in a Potassium Channel by X-Ray

Crystallography. J Gen. Physiol., 15(3):269-272, 2000.

12. C Vergara, and R Latorre. Kinetics of Ca2+-activated K+ channels from rabbit mus-

cle incorporated into planar bilayers. Evidence for a Ca2+ and Ba2+ blockade. J Gen.

Physiol., 75:727-750, 1980.

13. N Alagem, M Dvir, and E Reuveny. Mechanism of Ba(2+) block of a mouse in-

wardly rectifying K+ channel: differential contribution by two discrete residues. J

Gen. Physiol., 534:381-39, 2001.

14. P Proks, JF Antcliff, and FM Ashcroft. The ligand-sensitive gate of a potassium

channel lies close to the selectivity filter. EMBO Journal, 4:70-75, 2003.

15. FC Chatelain, N Alagem, Q Xu, R Pancaroglu, E Reuveny, and DL Minor Jr. The

Pore Helix Dipole Has a Minor Role in Inward Rectifier Channel Function Neuron, 47,

833-843, 2005.

16. FC Chatelain, S Gazzarrini., Y Fujiwara, C Arrigoni, C Domigan, G Ferrara, C Pantoja,

G Thiel, A Moroni, and DL Minor Jr. Selection of Inhibitor-Resistant Viral Potassium

Channels Identifies a Selectivity Filter Site that Affects Barium and Amantadine Block.

PLoS ONE, 4(10): e7496, 2009.

17. S Varma, DM Rogers, LR Pratt, and SB Rempe. Perspectives on Ion selectivity: De-

sign principles for K+ selectivity in membrane transport. J Gen. Physiol., 137:479-488,

2011.

18. I Kim and TW Allen. On the selective ion binding hypothesis for potassium channels.

Proc. Natl. Acad. Sci. USA., 108:17963-17968, 2011.

19. RT Shealy, AD Murphy, R Ramarathnam, E Jakobsson, and S Subramaniam.

Sequence-function analysis of the K+-selective family of ion channels using a compre-

hensive alignment and the KcsA channel structure. Biophys. J, 84:2929-2942, 2003.

20. C Topert, F Doring, E Wischmeyer, C Karschin, J Brockhaus, K Ballanyi, C Derst,

and A Karschin Kir2.4: A Novel K+ Inward Rectifier Channel Associated with Mo-

toneurons of Cranial Nerve Nuclei. J Neurosci., 18(11):4096-4105, 1998.

21. H Hibino, A Inanobe, K Furutani, S Murakami, I Findlay, and Y Kurachi Inwardly

Rectifying Potassium Channels: Their Structure, Function, and Physiological Roles.

Physiol. Rev., 90: 291-366, 2010.

22. WC Wimley, SH White. Experimentally determined hydrophobicity scale for proteins

at membrane interfaces. Nature Struct. Biol., 3:842-848, 1996.

23. S Varma, and SB Rempe. Multibody effects in ion binding and selectivity. Biophys.

J., 99:3394-3401, 2010.

24. DL Bostick, and CL Brooks III. Selective complexation of K+ and Na+ in simple

polarizable ion-ligating systems. J Amer. Chem. Soc., 132:13185-13187, 2010.

25. A Tkatchenko M Rossi, V Blum, J Ireta, and M Scheffler. Unraveling the stability

of polypeptide helices: Critical role of van der Waals interactions. Phys. Rev. Lett.,

106:118102-118106, 2011.

26. JP Perdew, K Burke, and M Ernzerhof. Generalized Gradient Approximation Made

Simple. Phys. Rev. Lett., 77(18):3865–3868, 1996.

27. M. Ernzerhof and G. E. Scuseria. Assessment of the Perdew-Burke-Ernzerhof

exchange-correlation functional. J Chem. Phys., 110:5029, 1999.

Footline Author PNAS Issue Date Volume Issue Number 5

Page 6: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

28. C. Adamo and V. Barone. Toward reliable density functional methods without ad-

justable parameters: The PBE0 model. J Chem. Phys., 110(13):6158–6170, 1999.

29. N Marom, A Tkatchenko, M Scheffler and L Kronik Describing Both Dispersion In-

teractions and Electronic Structure Using Density Functional Theory: The Case of

Metal-Phthalocyanine Dimers. J Chem. Theory Comput., 6:81-90, 2010.

30. A. Tkatchenko and M. Scheffler. Accurate molecular van der Waals interactions

from ground-state electron density and free-atom reference data. Phys. Rev. Lett.,

102:073005, 2009.

31. N Marom, A Tkatchenko, M Rossi, V Gobre, O Hod, M Scheffler, and L Kronik.

Dispersion interactions within density functional theory: Benchmarking semi-empirical

and pair-wise corrected density functionals. J. Chem. Theory Comput., 7:3944–3951,

2011.

32. RD Nelson, DR Lide, and AA Maryott. Selected Values of Electric Dipole Moments

for Molecules in the Gas Phase. Natl. Stand. Ref. Data Ser. - Nat. Bur. Stnds., 10,

1967.

33. MD Tissandier et al. The proton’s absolute aqueous enthalpy and Gibbs free energy

of solvation from cluster–ion solvation data. J. Phys. Chem. A, 102:7787–7794, 1998.

34. M Peschke, AT Blades, and P Kebarle. Hydration energies and entropies for Mg2+,

Ca2+, Sr2+ and Ba2+ from gas-phase ion-water molecule equilibria determinations.

J. Phys. Chem. A, 102:9978–9985, 1998.

35. SB Nielsen, M Masella, and P Kebarle. Competitive gas-phase solvation of alkali

metal ions by water and methanol. J. Phys. Chem. A, 103:9891–9898, 1999.

36. EE Dahlke, and DJ Truhlar. Assessment of the pairwise additive approximation and

evaluation of many-body terms for water clusters. J. Phys. Chem. B Lett., 110:10595-

10601, 2006.

37. C Krekeler, and L Delle Site. Solvation of positive ions in water: the dominant role

of water-water interaction. J. Phys. Condens. Matter. , 19:192101-192107, 2007.

38. L Pauling. The Structure and Entropy of Ice and of Other Crystals with Some Ran-

domness of Atomic Arrangement. J. Am. Chem. Soc., 57:2680-2684, 1935.

39. RA DiStasio, OA von Lilienfeld, and A Tkatchenko. Collective many-body van der

Waals interactions in molecular systems. Proc. Natl. Acad. Sci. USA, 109: 14791-

14795, 2012.

40. S Varma, and SB Rempe. Tuning ion coordination architectures to enable selective

partitioning. Biophys. J., 93:1093-1099, 2007.

41. S Varma, and SB Rempe. Structural transitions in ion coordination driven by changes

in competition for ligand binding. J. Amer. Chem. Soc., 130:15405-15419, 2008.

42. AL Hodgkin, and RD Keynes. The potassium permeability of a giant nerve fibre. J

Physiol. (London), 128:61-88, 1955.

43. IH Shrivastava and MS Sansom. Simulations of ion permeation through a potas-

sium channel: molecular dynamics of KcsA in a phospholipid bilayer. Biophys J., 78:

557-570, 2000.

44. J Aqvist, and V Luzhkov. Ion permeation mechanism of the potassium channel. Na-

ture, 404:881-884., 2000.

45. S Berneche, and B Roux. Energetics of ion conduction through the K+ channel.

Nature, 414:73-77, 2001.

46. RJ Mashl, Y Tang, J Schnitzer, and E Jakobsson. Hierarchical approach to predicting

permeation in ion channels. Biophys J., 81: 2473-2483, 2001.

47. S Furini, and C Domene. Atypical mechanism of conduction in potassium channels.

Proc. Natl. Acad. Sci. USA, 106:16074-16077, 2009.

48. MO Jensen, DW Borhani, K Lindorff-Larsen, P Maragakis, V Jogini, MP Eastwood,

RO Dror, and DE Shaw. Principles of conduction and hydrophobic gating in K+

channels. Proc. Natl. Acad. Sci. USA, 107:5833-5838, 2010.

49. M Iwamoto and S Oiki Counting Ion and Water Molecules in a Streaming File through

the Open-Filter Structure of the K Channel. J. Neurosci., 31(34):12180-12188, 2011.

50. MN Krishnan, JP Bingham, SH Lee, P Trombley, and E Moczydlowski Functional role

and affinity of inorganic cations in stabilizing the tetrameric structure of the KcsA K+

channel. J Gen. Physiol., 126: 271-283, 2005.

51. MN Krishnan, P Trombley, and E Moczydlowski Thermal stability of the K+ channel

tetramer: cation interactions and the conserved threonine residue at the innermost

site (S4) of the KcsA selectivity filter. Biochemistry, 47: 5354-5367, 2008.

52. S Wang, Y Alimi, A Tong, CG Nichols, and D Enkvetchakul Differential roles of

blocking ions in KirBac1.1 tetramer stability. J Biol. Chem., 284: 2854-2860, 2009.

53. S Noskov, B Roux. Control of ion selectivity in potassium channels by electrostatic

and dynamic properties of carbonyl ligands. Nature, 431:830-834, 2004.

54. MG Derebea, DB Sauera, W Zeng, A Alam, N Shi, and Y Jiang. Tuning the ion

selectivity of tetrameric cation channels by changing the number of ion binding sites.

Proc. Natl. Acad. Sci. USA, 108(2):598-602, 2011.

55. DL Bostick, and CL Brooks. Selectivity in K+ channels is due to topological control

of the permeant ions coordinated state. Proc. Natl. Acad. Sci. USA., 104: 9260-9265,

2007.

56. M Thomas, D Jayatilaka, and B Corry. The predominant role of coordination number

in potassium channel selectivity. Biophys. J., 93:2635-2643, 2007.

57. V Blum, R Gehrke, F Hanke, P Havu, X Ren, K Reuter, and M Scheffler. Ab ini-

tio molecular simulations with numeric atom-centered orbitals. Comp. Phys. Comm.,

180:2175–2196, 2009.

58. V Havu, V Blum, P Havu, and M Scheffler. Efficient o(n) integration for all-

electron electronic structure calculation using numeric basis functions. J Comput.

Phys, 228(22):8367–8379, 2009.

59. P Jurecka, J Sponer, J Cerny, and P Hobza. Benchmark database of accurate (MP2

and CCSD(T) complete basis set limit) interaction energies of small model complexes,

DNA base pairs, and amino acid pairs. Phys. Chem. Chem. Phys., 8:1985, Jan 2006.

60. E. van Lenthe, E. J. Baerends, and J. G. Snijders. Relativistic regular two-component

Hamiltonians. J Chem. Phys., 99(6):4597–4610, 1993.

61. A Cohen, P Mori-Sanchez, and W. Yang. Insights into Current Limitations of Density

Functional Theory Science 321:792–794, 2008.

6 www.pnas.org/cgi/doi/10.1073/pnas.0709640104 Footline Author

Page 7: pubman.mpdl.mpg.depubman.mpdl.mpg.de/pubman/item/escidoc:1804481:8/... · The role of methyl{induced polarization in ion binding Mariana Rossi , Alexandre Tkatchenko Susan B. Rempe

Table 1. Free energy change ∆G at298 K associated with the substitu-tion reaction given by Equ. 2.

n Na+ K+ Ba2+

1 1.5 1.4 2.3

2 2.2 (2.5)a 2.4 (1.7)a 3.4 (3.8)a

3 3.3 2.5 5.0

4 4.6 3.4 6.9a Two simultaneous T → S substitutions can be introduced in two different ways. In one case, the substitutions can be made on adjacent

threonine residues, and in the other case, the substitutions are made on non-adjacent threonine residues. The numbers in brackets

correspond to T → S substitutions made on adjacent threonine residues.

Footline Author PNAS Issue Date Volume Issue Number 7