
Efficient nonparametric n-body force fields from machine learning

Aldo Glielmo,1,* Claudio Zeni,1,† and Alessandro De Vita1,2

1Department of Physics, King’s College London, Strand, London WC2R 2LS, United Kingdom

2Dipartimento di Ingegneria e Architettura, Università di Trieste, via A. Valerio 2, I-34127 Trieste, Italy

We provide a definition and explicit expressions for n-body Gaussian Process (GP) kernels, which can learn any interatomic interaction occurring in a physical system, up to n-body contributions, for any value of n. The series is complete, as it can be shown that the “universal approximator” squared exponential kernel can be written as a sum of n-body kernels. These recipes enable the choice of optimally efficient force models for each target system, as confirmed by extensive testing on various materials. We furthermore describe how the n-body kernels can be “mapped” on equivalent representations that provide database-size-independent predictions and are thus crucially more efficient. We explicitly carry out this mapping procedure for the first non-trivial (3-body) kernel of the series, and we show that this reproduces the GP-predicted forces with meV/Å accuracy while being orders of magnitude faster. These results pave the way to using novel force models (here named “M-FFs”) that are computationally as fast as their corresponding standard parametrised n-body force fields, while retaining the nonparametric character, the ease of training and validation, and the accuracy of the best recently proposed machine learning potentials.

I. INTRODUCTION

Since their conception, first-principles molecular dynamics (MD) simulations [1] based on density functional theory (DFT) [2, 3] have proven extremely useful to investigate complex physical processes that require quantum accuracy. These simulations are computationally expensive, and thus still typically limited to hundreds of atoms and the picosecond timescale. For larger systems that are non-uniform and thus intractable using periodic boundary conditions, multiscale embedding (“QM/MM”) approaches can sometimes be used successfully. This is possible if full quantum accuracy is only needed in a limited (QM) zone of the system, while a simpler molecular mechanics (MM) description suffices everywhere else. Very often, however, target problems require minimal model system sizes and simulation times so large that the calculations must be exclusively based on classical force fields, i.e., force models that use the positions of atoms as the only explicitly represented degrees of freedom.

In the remainder of this introductory section we briefly review the relative strengths and weaknesses of standard parametrized (P-FFs) and machine learning force fields (ML-FFs). We then consider how accurate P-FFs are hard to develop but eventually fully exploit useful knowledge on the systems, while GP-based ML-FFs offer a general mathematical framework for handling training and validation, but are significantly slower (Section I A). These shortcomings motivate an analysis of how prior knowledge such as symmetry has so far been incorporated in GP kernels (Section I B) and point to features still missing in ML kernels, which are commonplace in the more standard, highly efficient P-FFs based on truncated n-body expansions (Section I C). This suggests the possibility of defining a series of n-body GP kernels (Section II B), providing a scheme to construct them (Sections II C and D) and, after the best value of n for the given target system has been identified with appropriate testing (Section II E), exploiting their dimensionally-reduced feature spaces to massively boost the execution speed of force prediction (Section III).

* [email protected]
[email protected]

A. Parametrized and machine learning force fields

Producing accurate and fully transferable force fields is a remarkably difficult task. The traditional way to do this involves adjusting the parameters of carefully chosen analytic functions in the hope of matching extended reference data sets obtained from experiments or quantum calculations [4, 5]. The descriptive restrictiveness of the parametric functions used is both a drawback and a strength of this methodology. The main difficulty is that developing a good parametric function requires a great deal of chemical intuition and patient effort, guided by trial and error steps with no guarantee of success [6]. However, for systems and processes in which the approach is fruitful, the development effort is amply rewarded by the opportunity to provide extremely fast and accurate force models [7–10]. The identified functional forms will in these cases contain valuable knowledge on the target system, encoded in a compact formulation that still accurately captures the relevant physics. Such knowledge is furthermore often transferable to novel (while similar) systems as a “prior” piece of information, i.e., it constitutes a good working hypothesis on how these systems will behave. When QM data on the novel system become available, they can simply be used to fine-tune the parameters of the functional form to a new set of best-fit values that maximise prediction accuracy.

Following a different approach, “nonparametric” ML force fields can be constructed, whose dependence on the atomic positions is not constrained to a particular analytic form. An implementation and tests exploring the feasibility of ML to describe atomic interactions can be found, e.g., in the pioneering work by Skinner and Broughton [11], which proposed using ML models to reproduce first-principles potential energy surfaces. More recent works implementing this general idea have been based on Neural Networks [12], Gaussian Process (GP) regression [13] or linear regression on properly defined bases [14]. Current work aims at making these learning algorithms both faster and more accurate [15–20].

As processing power and data communication bandwidth increase, and the cost of data storage decreases, modeling based on ML and direct inference promises to become an increasingly attractive option compared with more traditional classical force field approaches. However, although ML schemes are general and have been shown to be remarkably accurate interpolators in specific systems, so far they have not become as widespread as might have been expected. This is mainly because “standard” classical potentials are still orders of magnitude faster than their ML counterparts [21]. Moreover, ML-FFs also involve a more complex mathematical and algorithmic machinery than the traditional compact functional forms of P-FFs, whose arguments are physically descriptive features that remain easier to visualize and interpret.

B. Prior knowledge and GP kernels

These shortcomings provide motivation for the present work. The high computational cost of many ML models is a direct consequence of the general inverse relation between the sought flexibility and the measured speed of any algorithm capable of learning. Highly flexible ML algorithms by definition assume very little or no prior knowledge of the target systems. In a Bayesian context, this involves using a general prior kernel, typically aspiring to preserve the full universal approximator properties of, e.g., the squared exponential kernel [22, 23]. The price of such a kernel choice is that the ML algorithm will require large training databases [24], slowing down computations as the prediction time grows linearly with the database size.

Large database sizes are not, however, unavoidable, and any data-intensive, fully flexible scheme for potential energy fitting is suboptimal by definition, as it exploits no prior knowledge of the system. This completely “agnostic” approach is at odds with the general lesson from classical potential development, indicating that it is essential for efficiency to incorporate in the force prediction model as much prior knowledge of the target system as can be made available. In this respect, GP kernels can be tailored to bring some form of prior knowledge to the algorithm.

For example, it is possible to include any symmetry information of the system. This can be done by using descriptors that are independent of rotations, translations and permutations [15, 25–27]. Alternatively, one can construct scalar-valued GP kernels that are made invariant under rotation (see e.g., [16, 28]) or matrix-valued GP kernels made covariant under rotation ([16], an idea that can be extended to higher-order tensors [29, 30]). Invariance or covariance are in these cases obtained starting from a non-invariant representation by appropriate integration over the SO(3) rotation group [16, 28].

Symmetry aside, progress can be made by attempting to use kernels based on simple, descriptive features corresponding to low-dimensional feature spaces. Taking inspiration from parametrized force fields, these descriptors could, e.g., be chosen to be interatomic distances taken singularly or in triplets, yielding kernels based on 2- or 3-body interactions [16, 31, 32]. Since low-dimensional feature spaces allow efficient learning (convergence is reached using small databases), to the extent that simple descriptors capture the correct physics, the GP will be a relatively fast, while still very accurate, interpolator.

C. Scope of the present work

There are, however, two important aspects that have not as yet been fully explored while trying to develop efficient kernels based on dimensionally reduced feature spaces. Both aspects will be addressed in the present work.

First, a systematic classification of rotationally invariant (or covariant, if matrix-valued) kernels representative of the feature spaces corresponding to n-body interactions is to date still missing. Namely, no definition or general recipe has been proposed for constructing n-body kernels, or for identifying the actual value (or effective interval of values) of n associated with already available kernels. This would clearly be useful, however, as the discussion above strongly suggests that the kernel corresponding to the lowest value of n compatible with the physics of the target system will be the most informationally efficient one for carrying out predictions, striking the right balance between speed and accuracy.

Second, for any ML approach based on a GP kernel and a fixed database, the GP predictions for any target configuration are also fixed once and for all. For an n-body kernel, these predictions do not, however, need to be explicitly carried out as sums over the training dataset, as they could be approximated with arbitrary precision by “mapping” the GP prediction on a new representation based on the underlying n-body feature space. We note that this approximation step would make the final prediction algorithm independent of the database size, and thus in principle as fast as any classical n-body potential based on functional forms, while still parameter free. The remainder of this work explores these two issues, and it is structured as follows.

In the next Section II, after introducing the terminology and the notation (II A), we provide a definition of an n-body kernel (II B) and we propose a systematic way of constructing n-body kernels of any order n, showing how previously proposed approaches can be reinterpreted within this scheme (II C and D). We furthermore show, by extensive testing on a range of realistic materials, how the optimal interaction order can be chosen as the lowest n compatible with the required accuracy and the available computational power (II E). In the following Section III we describe how the predictions of n-body GP kernels can be recast (mapped) with arbitrary accuracy into very fast nonparametric force fields based on machine learning (M-FFs) which fully retain the n-body character of the GP process from which they were derived. The procedure is carried out explicitly for a 3-body kernel, and we find that evaluating atomic forces is orders of magnitude faster than the corresponding GP calculation.

II. n-BODY EXPANSIONS WITH n-BODY KERNELS

A. Notation and terminology

GP-based potentials are usually constructed by assigning an energy ε to a given atomic configuration ρ, typically including a central atom and all its neighbors up to a suitable cutoff radius. The existence of a corresponding local energy function ε(ρ) is generally assumed, in order to provide a total energy expression and guarantee a linear scaling of the predictions with the total number of atoms in the system. Within GP regression this function is calculated from a database D = {ρ_d, ε_d, f_d}_{d=1}^N of reference data, typically obtained by quantum mechanical simulations, and usually consisting of a set of N atomic configurations {ρ_d} together with their energies {ε_d} and/or forces {f_d}.

It is worth noting here that although there is no well-defined local atomic energy in a reference quantum simulation, one can always use gradient information (atomic forces, which are well-defined local physical quantities) to machine-learn a potential energy function. This can be done straightforwardly using derivative kernels (cf., e.g., Ref. [22] or Ref. [33]) to learn and predict forces. Alternatively, one can learn forces directly without an intermediate energy expression, as done in Refs. [15, 34] or more recently in Ref. [16]. A necessary condition for any of these approaches to produce energy-conserving force fields (i.e., fields that make zero work on any closed trajectory loop) is that the database is constructed once and for all, and never successively updated. After training on the given fixed database, the GP prediction on a target configuration ρ consists of a linear combination of the kernel function values measuring the similarity of the target configuration with each database entry:

\varepsilon(\rho) = \sum_{d=1}^{N} k(\rho, \rho_d)\, \alpha_d,    (1)

where the coefficients α_d are obtained by means of inversion of the covariance matrix [22] and can be shown to minimise the regularised quadratic error between GP predictions and reference calculations.
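The structure of Eq. (1) can be sketched in a few lines of generic GP regression code. The following minimal illustration (not the authors' implementation; the squared exponential kernel and the descriptor vectors are placeholders) obtains the coefficients α_d by solving the regularised linear system, then evaluates the prediction as a weighted sum of kernel values:

```python
import numpy as np

def se_kernel(x, y, theta=1.0):
    """Illustrative squared exponential kernel between two descriptor vectors."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * theta ** 2))

def gp_fit(X_train, y_train, kernel, noise=1e-8):
    """Obtain the coefficients alpha_d by inverting the regularised covariance matrix."""
    K = np.array([[kernel(a, b) for b in X_train] for a in X_train])
    return np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)

def gp_predict(x, X_train, alpha, kernel):
    """Eq. (1): a linear combination of kernel values against each database entry."""
    return sum(kernel(x, x_d) * a_d for x_d, a_d in zip(X_train, alpha))

# Toy usage: learn a smooth 1D function from 30 samples.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(30, 1))
y = np.sin(X[:, 0])
alpha = gp_fit(X, y, se_kernel)
pred = gp_predict(np.array([0.5]), X, alpha, se_kernel)
```

The `noise` term plays the role of the regularisation mentioned in the text: it keeps the covariance matrix well conditioned while leaving the interpolation essentially exact.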

B. Definition of an n-body kernel

Classical interatomic potentials are often characterized by the number of atoms (“bodies”) they let interact simultaneously (cf. e.g., Refs. [9, 10]). To translate this concept into the realm of GP regression, we assume that the target configuration ρ({r_i}) represents the local atomic environment of an atom fixed at the origin of a Cartesian reference frame, expressed in terms of the relative positions r_i of the surrounding atoms. We define the order of a kernel k_n(ρ, ρ′) as the smallest integer n for which the following property holds true:

\frac{\partial^{n} k_n(\rho, \rho')}{\partial \mathbf{r}_{i_1} \cdots \partial \mathbf{r}_{i_n}} = 0 \qquad \forall\, \mathbf{r}_{i_1} \neq \mathbf{r}_{i_2} \neq \cdots \neq \mathbf{r}_{i_n},    (2)

where r_{i_1}, …, r_{i_n} are the positions of any choice of a set of n different surrounding atoms. By virtue of linearity, the local energy in Eq. (1) will also satisfy the same property if k_n does. Thus, Eq. (2) implies that the central atom in a local configuration interacts with up to n − 1 other atoms simultaneously, making the interaction energy term n-body. For instance, using a 2-body kernel, the force on the central atom due to atom r_j will not depend on the position of any other atom r_{l≠j} belonging to the target configuration ρ({r_i}). Eq. (2) can be used directly to check, through either numeric or symbolic differentiation, whether a given kernel is of order n, a fact that might be far from obvious from its analytic form, depending on how the kernel is built.
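The numeric check suggested by Eq. (2) is easy to carry out. The sketch below is our own construction: it uses the Gaussian-overlap base kernel introduced later in Eq. (4), together with its square as a 3-body kernel per Eq. (6), and estimates mixed derivatives by central finite differences. The mixed derivative with respect to two distinct atoms vanishes for the 2-body kernel but not for the 3-body one, whose mixed derivative with respect to three distinct atoms vanishes in turn.

```python
import numpy as np

SIGMA = 0.5

def k2(rho, rho_p):
    """2-body base kernel of Eq. (4): Gaussian overlap of two configurations
    given as (M, 3) arrays of positions relative to the central atom."""
    diff = rho[:, None, :] - rho_p[None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (4 * SIGMA ** 2)).sum()

def k3(rho, rho_p):
    """3-body base kernel, Eq. (6) with n = 3: the square of k2."""
    return k2(rho, rho_p) ** 2

def mixed_derivative(kernel, rho, rho_p, atoms, h=1e-3):
    """Central finite-difference mixed derivative of kernel(rho, rho_p)
    with respect to the x-coordinates of the listed (distinct) atoms of rho."""
    total = 0.0
    for steps in np.ndindex(*(2,) * len(atoms)):
        shifted = rho.copy()
        parity = 1.0
        for atom, s in zip(atoms, steps):
            sign = 1.0 - 2.0 * s          # step 0 -> +h, step 1 -> -h
            shifted[atom, 0] += sign * h
            parity *= sign
        total += parity * kernel(shifted, rho_p)
    return total / (2 * h) ** len(atoms)

rho = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rho_p = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]])

d2_k2 = mixed_derivative(k2, rho, rho_p, atoms=[0, 1])             # ~0: k2 is 2-body
d2_k3 = mixed_derivative(k3, rho, rho_p, atoms=[0, 1])             # nonzero
d3_k3 = mixed_derivative(k3, rho, rho_p, atoms=[0, 1, 2], h=1e-2)  # ~0: k3 is 3-body
```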

C. Building n-body kernels I: SO(3) integration

Following a standard route [16, 28, 35], we begin by representing each local atomic configuration as a sum of Gaussian functions N of a given variance σ², centered on the M atoms of the configuration:

\rho(\mathbf{r}, \{\mathbf{r}_i\}) = \sum_{i=1}^{M} \mathcal{N}(\mathbf{r} \,|\, \mathbf{r}_i, \sigma^2),    (3)

where r and {r_i}_{i=1}^M are position vectors relative to the central atom of the configuration. This representation guarantees by construction invariance with respect to translations and permutations of atoms (here assumed to be of a single chemical species). As described in [16], a covariant 2-body force kernel can be constructed from the non-invariant scalar (“base”) kernel obtained as a dot product overlap integral of the two configurations:

k_2(\rho, \rho') = \int d\mathbf{r}\; \rho(\mathbf{r})\, \rho'(\mathbf{r}) = L \sum_{i \in \rho,\, j \in \rho'} e^{-(\mathbf{r}_i - \mathbf{r}'_j)^2 / 4\sigma^2},    (4)

where L is an unessential constant factor, omitted for convenience from now on. That (4) is a 2-body kernel consistent with the definition of Eq. (2) can be checked straightforwardly by explicit differentiation (see Appendix A). Its 2-body structure is also readable from the fact that k_2 is a sum of contributions comparing pairs of atoms in the two configurations, the first pair located at the two ends of vector r_i in the target configuration ρ, and consisting of the central atom and atom i, and the second pair similarly represented by the vector r′_j in the database configuration ρ′. A rotation-covariant matrix-valued force kernel can at this point be constructed by Haar integration [36, 37] as an integral over the SO(3) manifold [16]:

\mathbf{K}_2^{s}(\rho, \rho') = \int_{SO(3)} d\mathcal{R}\; \mathcal{R}\, k_2(\rho, \mathcal{R}\rho').    (5)

This kernel can be used to infer forces on atoms using a GP regression vector formula analogous to Eq. (1) (see Ref. [16]). These forces belong to a 2-body force field purely as a consequence of the base kernel property in Eq. (2). It is interesting to notice that there is no use or need for an intermediate energy expression to construct this 2-body force field, which is automatically energy-conserving.
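The construction in Eq. (5) can be illustrated numerically. The sketch below is our own Monte Carlo approximation, not the analytic route of Ref. [16]: it averages R·k₂(ρ, Rρ′) over Haar-random rotations (sampled via normalized Gaussian quaternions) and checks the defining covariance property K₂ˢ(ρ, Qρ′) = K₂ˢ(ρ, ρ′)Qᵀ within Monte Carlo error.

```python
import numpy as np

SIGMA = 0.5

def k2(rho, rho_p):
    """2-body base kernel of Eq. (4); rho is (M, 3), rho_p may be batched (..., M', 3)."""
    diff = rho[:, None, :] - rho_p[..., None, :, :]
    return np.exp(-np.sum(diff ** 2, axis=-1) / (4 * SIGMA ** 2)).sum(axis=(-2, -1))

def random_rotations(n, rng):
    """Haar-uniform rotation matrices from normalized Gaussian quaternions."""
    q = rng.normal(size=(n, 4))
    q /= np.linalg.norm(q, axis=1, keepdims=True)
    w, x, y, z = q.T
    return np.stack([
        np.stack([1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)], axis=-1),
        np.stack([2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)], axis=-1),
        np.stack([2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)], axis=-1),
    ], axis=-2)

def covariant_kernel_mc(rho, rho_p, rotations):
    """Monte Carlo estimate of Eq. (5): Haar average of R * k2(rho, R rho_p)."""
    rotated = np.einsum('nab,mb->nma', rotations, rho_p)
    vals = k2(rho, rotated)
    return np.einsum('n,nab->ab', vals, rotations) / len(rotations)

rng = np.random.default_rng(0)
rho = np.array([[0.7, 0.2, 0.0], [0.0, 0.8, 0.3]])
rho_p = np.array([[0.6, 0.0, 0.4], [0.1, 0.7, 0.1]])
Q = random_rotations(1, rng)[0]

# Covariance check: rotating the second configuration rotates the kernel matrix.
lhs = covariant_kernel_mc(rho, rho_p @ Q.T, random_rotations(200_000, rng))
rhs = covariant_kernel_mc(rho, rho_p, random_rotations(200_000, rng)) @ Q.T
```

With independent rotation samples on the two sides, the agreement is limited only by Monte Carlo noise, which shrinks as the inverse square root of the number of sampled rotations.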

Higher order n-body base kernels can be constructed as finite powers of the 2-body base kernel (4):

k_n(\rho, \rho') = k_2(\rho, \rho')^{\,n-1},    (6)

where the n-body property (Eq. (2)) can once more be checked by explicit differentiation (see Appendix A). Furthermore, taking the exponential of the kernel in Eq. (4) gives rise to a fully many-body base kernel, as all powers of k_2 are contained in the exponential formal series expansion:

k_{MB}(\rho, \rho') = e^{k_2(\rho, \rho')/\theta^2} = 1 + \frac{1}{\theta^2}\, k_2 + \frac{1}{2!\,\theta^4}\, k_3 + \dots .    (7)

One can furthermore check that the simple exponential many-body kernel k_MB defined above is, up to normalisation, equivalent to the squared exponential kernel [22] on the natural distance induced by the dot product kernel k_2(ρ, ρ′):

e^{-d^2(\rho, \rho')/2\theta^2} = N(\rho)\, N(\rho')\, k_{MB}(\rho, \rho'),    (8)

d^2(\rho, \rho') = k_2(\rho, \rho) + k_2(\rho', \rho') - 2\, k_2(\rho, \rho').    (9)

To check on these ideas, we next test the accuracy of these kernels in learning the interactions occurring in a simple 1D model consisting of n′ particles interacting via an ad-hoc n′-body potential (see Appendix B). We first let the particles interact to generate a configuration database, and then attempt to machine-learn these interactions using the kernels just described. Figure 1 illustrates the average prediction errors on the local energies of this system incurred by the GP regression based on four different kernels as a function of the interaction order n′. It is clear from the graph that a force field that lets the n′ particles interact simultaneously can only be learned accurately with an (n ≥ n′)-body kernel (6), or with the many-body exponential kernel (7), which contains all interaction orders.

Figure 1. GP relative error on energy as a function of the interaction order (2- to 5-body), using n-body kernels with increasing n. Learning energies within baseline precision (black dashed line) requires an n-body kernel with n at least as high as the particles’ interaction order.

To construct n-body kernels useful for applications to real 3D systems we need to include rotational symmetry by averaging over the rotation group. For our present scopes, it is sufficient to discuss the case of rotation-invariant n-body scalar energy kernels, for which the integral (formally a transformation integration [38]) is readily obtained from Eq. (5) by simply dropping the R matrix in the integrand:

k_n^{s}(\rho, \rho') = \int_{SO(3)} d\mathcal{R}\; k_n(\rho, \mathcal{R}\rho').    (10)

The use of this integral in the context of potential energy learning was originally proposed in [28], where it was carried out using appropriate functional expansions. Alternatively, one can exploit the Gaussian nature of the configuration expansion (3) to obtain an analytically exact formula, as done further below. The resulting symmetrized n-body kernel k_n^s will learn faster than its non-symmetrized counterpart k_n, as the rotational degrees of freedom have been integrated out. This is because a non-symmetrized n-body kernel (k_n) must learn functions of 3n − 3 variables (translations are taken into account by the local representation based on relative positions in Eq. (3)). After integration, the new kernel k_n^s defines a smaller and more physically-based space of functions of 3n − 6 variables, which is the rotation-invariant functional domain of n interacting particles.

The symmetrization integral in Eq. (10) can be written down for the many-body base kernel k_MB (Eq. (7)), to define a new many-body kernel k_MB^s invariant under all physical symmetries:

k_{MB}^{s}(\rho, \rho') = \int_{SO(3)} d\mathcal{R}\; k_{MB}(\rho, \mathcal{R}\rho').    (11)

By virtue of the universal approximation theorem [22, 39] this kernel would be able to learn arbitrary physical interactions with arbitrary accuracy, if provided with sufficient data.

Unfortunately, the exponential kernel (7) has to date resisted all attempts to carry out the analytic integration over rotations (11), leaving as the only open options numerical integration, or discrete summation over a relevant point group of the system [16]. On the other hand, the analytic integration of 2- or 3-body kernels to give symmetric n-body kernels can be carried out in different ways.

For example, one could use an intermediate step that was introduced during the construction of the widely used SOAP kernel [28, 31, 40–42]. This kernel has a full many-body character [28], ensured by the prescribed normalisation step (defined by Eqs. (31–36) of Ref. [28]), which made it possible to use it, e.g., to augment to full many-body the descriptive power of a 2- and 3-body explicit kernel expansion [26]. However, the Haar integral over rotations introduced in [28] as an intermediate kernel construction step could also be seen, if taken on its own, as a transformation integration procedure [38] yielding a symmetrized n-body kernel as defined in Eq. (10) above, which would in turn become a higher finite-order kernel if raised to integer powers ζ ≥ 2 (see next subsection).

Carrying out Haar integrals is not, in general, an easy task. In the example above, computing a general rotation-invariant n-body kernel via the exact, suitably truncated spherical harmonics expansion procedure of Ref. [28] becomes challenging for n > 3. Significant difficulties likewise arise if attempting a “covariant” integration over rotations, for which we found an exact analytic expression only for 2- and 3-body matrix-valued kernels [16], with a technique that becomes unviable for n > 3. Fortunately, the Haar integration can be avoided altogether, following the simple route of constructing symmetric n-body kernels directly using symmetry-invariant descriptors, as we will see in the next section. The problem of obtaining an analytic Haar integral expression for the general n case remains, however, an interesting one, which we tackle in the remainder of this section following a novel analytic route.

Figure 2. Scatter plots showing the values of the integral in (13) (on a random selection of configurations) computed either by numerical integration or via the analytic expression (Eqs. (14, 15, 17)). Interaction orders from n = 2 to n = 5 are considered.

We first write the n-body base kernel of Eq. (6) as an explicit product of (n − 1) 2-body kernels. The Haar integral (10) can then be written as

k_n^{s}(\rho, \rho') = \sum_{\substack{\mathbf{i} = (i_1, \dots, i_{n-1}) \in \rho \\ \mathbf{j} = (j_1, \dots, j_{n-1}) \in \rho'}} \tilde{k}_{\mathbf{i},\mathbf{j}},    (12)

\tilde{k}_{\mathbf{i},\mathbf{j}} = \int d\mathcal{R}\; e^{-\|\mathbf{r}_{i_1} - \mathcal{R}\mathbf{r}'_{j_1}\|^2 / 4\sigma^2} \cdots\, e^{-\|\mathbf{r}_{i_{n-1}} - \mathcal{R}\mathbf{r}'_{j_{n-1}}\|^2 / 4\sigma^2},    (13)

where now, for each of the two configurations ρ, ρ′, the sum runs over all n-plets of atoms that include the central atom (whose indices i_0 and j_0 are thus omitted). Expanding the exponents as (\mathbf{r}_i - \mathcal{R}\mathbf{r}'_j)^2 = r_i^2 + r_j'^2 - 2\,\mathrm{Tr}(\mathcal{R}\, \mathbf{r}'_j \mathbf{r}_i^{T}) allows us to extract from the integral (13) a rotation-independent constant C_{i,j}, and to express the rotation-dependent scalar products sum as a trace of a matrix product:

\tilde{k}_{\mathbf{i},\mathbf{j}} = C_{\mathbf{i},\mathbf{j}}\, I_{\mathbf{i},\mathbf{j}},    (14)

C_{\mathbf{i},\mathbf{j}} = e^{-(r_{i_1}^2 + r_{j_1}'^2 + \dots + r_{i_{n-1}}^2 + r_{j_{n-1}}'^2)/4\sigma^2},    (15)

I_{\mathbf{i},\mathbf{j}} = \int d\mathcal{R}\; e^{\mathrm{Tr}(\mathcal{R} M_{\mathbf{i},\mathbf{j}})},    (16)

where the matrix M_{i,j} is the sum of the outer products of the ordered vector couples in the two configurations: M_{\mathbf{i},\mathbf{j}} = (\mathbf{r}'_{j_1}\mathbf{r}_{i_1}^{T} + \dots + \mathbf{r}'_{j_{n-1}}\mathbf{r}_{i_{n-1}}^{T})/2\sigma^2. The integral (16) occurs in the context of multivariate statistics as the generating function of the non-central Wishart distribution [43]. As shown in [44], it can be expressed as a power series in the symmetric polynomials (\alpha_1 = \sum_{i}^{3}\mu_i, \alpha_2 = \sum_{i<j}^{3}\mu_i\mu_j, \alpha_3 = \mu_1\mu_2\mu_3) of the eigenvalues \{\mu_i\}_{i=1}^{3} of the symmetric matrix M_{\mathbf{i},\mathbf{j}}^{T} M_{\mathbf{i},\mathbf{j}}:

I_{\mathbf{i},\mathbf{j}} = \sum_{p_1, p_2, p_3} A_{p_1 p_2 p_3}\, \alpha_1^{p_1} \alpha_2^{p_2} \alpha_3^{p_3},    (17)

A_{p_1 p_2 p_3} = \frac{\pi\, 2^{-(1+2p_1+4p_2+6p_3)}\, (p_1 + 2p_2 + 4p_3)!}{p_1!\, p_2!\, p_3!\; \Gamma(\tfrac{3}{2} + p_1 + 2p_2 + 3p_3)\, \Gamma(1 + p_2 + 2p_3)} \times \frac{1}{\Gamma(\tfrac{1}{2} + p_3)\, (p_1 + 2p_2 + 3p_3)!}.    (18)

Remarkably, in this result (whose exactness is checked numerically in Figure 2) the integral over rotations does not depend on the order n of the base kernel, once the matrix M_{i,j} is computed. This is not the case for previous approaches to integrating over rotations [16, 28], which need to be reformulated with increasing and eventually prohibitive difficulty each time the order n needs to be increased.

However, the final expression given by Eqs. (14-18) is still a relatively complex and computationally heavy function of the atomic positions. To alleviate its evaluation cost, it would be interesting to see whether it is possible to recast it as an explicit scalar product in a given feature space. This would allow, e.g., transferring most of the computational burden to the pre-computation of the corresponding basis functions. Fortunately, such complexity can be largely avoided altogether if equally accurate kernels can be built by physical intuition, at least for the most relevant lowest n orders, as discussed in the next section.

D. Building n-body kernels II: n-body feature spaces and uniqueness issues

The practical effect of the Haar integration (10) is the elimination of the three spurious rotational degrees of freedom. The same result can always be achieved by selecting a group of symmetry-invariant degrees of freedom for the system, typically including the distances and/or bond angles found in local atomic environments, or simple functions of these. Appropriate symmetrized kernels can then simply be obtained by defining a similarity measure directly on these invariant quantities [15, 26, 27, 45]. To construct symmetry-invariant n-body kernels with n = 2 and n = 3 we can choose these degrees of freedom to be just interparticle distances:

k_2^s(\rho, \rho') = \sum_{i\in\rho,\, j\in\rho'} \tilde{k}_2(r_i, r_j)    (19)

k_3^s(\rho, \rho') = \sum_{\substack{i_1,i_2\in\rho \\ j_1,j_2\in\rho'}} \tilde{k}_3\big((r_{i_1}, r_{i_2}, r_{i_1 i_2}),\, (r'_{j_1}, r'_{j_2}, r'_{j_1 j_2})\big)    (20)

where the \tilde{k} are kernel functions that directly specify the correlation of distances, or triplets of distances, found

Figure 3. Learning curves (MAE of force [eV/Å] vs. number of training points) for 2- and 3-body kernels obtained either via Haar integration (Eqs. (12-18)) or by directly specifying a similarity kernel function of the effective degrees of freedom (Eqs. (19), (20)).

within the two configurations. Since these kernels learn functions of low-dimensional spaces, their exact analytic form is not essential for performance, as any fully nonlinear function \tilde{k} will give equivalent converged results in the rapidly reached large-database limit. This equivalence can be neatly observed in Figure 3, which reports the performance of 2- and 3-body kernels built either directly over the set of distances (Eqs. (19) and (20)) or via the exact Haar integral (Eqs. (12-18)). As the test system is crystalline silicon, 3-body kernels perform better. However, since convergence of the 2- and 3-body feature space is quickly achieved (at about N = 50 and N = 100, respectively), there is no significant performance difference between SO(3)-integrated n-body kernels and physically motivated ones. Consequently, for low interaction orders, simple and computationally fast kernels like the ones in Eqs. (19), (20) are always preferable to more complex (and heavier) alternatives obtained via integration over rotations (e.g., the one defined by Eqs. (12-18) or those found in Refs. [16, 28]).
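The distance-based kernels of Eqs. (19) and (20) require only a few lines of code. The sketch below is a minimal Python illustration (not the reference implementation of this work): it assumes a squared-exponential base function \tilde{k} with an arbitrary length scale, and represents a configuration \rho simply as an array of neighbour positions relative to the central atom.

```python
import numpy as np
from itertools import product

def k2_tilde(r, rp, theta=0.5):
    # base kernel on single distances (squared exponential: an arbitrary choice)
    return np.exp(-(r - rp) ** 2 / (2 * theta ** 2))

def k2s(rho, rho_p):
    """2-body kernel, Eq. (19): sum of k2_tilde over all neighbour pairs."""
    r = np.linalg.norm(rho, axis=1)
    rp = np.linalg.norm(rho_p, axis=1)
    return sum(k2_tilde(a, b) for a, b in product(r, rp))

def k3s(rho, rho_p, theta=0.5):
    """3-body kernel, Eq. (20): base kernel on triplets of distances."""
    total = 0.0
    for i1, i2 in product(range(len(rho)), repeat=2):
        q = np.array([np.linalg.norm(rho[i1]), np.linalg.norm(rho[i2]),
                      np.linalg.norm(rho[i1] - rho[i2])])
        for j1, j2 in product(range(len(rho_p)), repeat=2):
            qp = np.array([np.linalg.norm(rho_p[j1]), np.linalg.norm(rho_p[j2]),
                           np.linalg.norm(rho_p[j1] - rho_p[j2])])
            total += np.exp(-np.sum((q - qp) ** 2) / (2 * theta ** 2))
    return total
```

Since both kernels depend on the configurations only through interparticle distances, they are exactly invariant under rigid rotations of either configuration — no Haar integration required.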

We note at this point that Eq. (19) can be generalized to construct a symmetric n-body kernel

k_n^s(\rho, \rho') = \sum_{\substack{i_1,\dots,i_{n-1}\in\rho \\ j_1,\dots,j_{n-1}\in\rho'}} \tilde{k}_n(\mathbf{q}_{i_1,\dots,i_{n-1}},\, \mathbf{q}'_{j_1,\dots,j_{n-1}}),    (21)

where the components of the feature vectors \mathbf{q} are the chosen symmetry-invariant degrees of freedom describing the n-plet of atoms.

The \mathbf{q} feature vectors are required to be (3n - 6)-dimensional for all n, except for n = 2, where they become scalars. In practice, for n > 3 selecting a suitable set of invariant degrees of freedom is not trivial. For instance, for n = 4 the set of six unordered distances between four particles does not specify their relative positions unambiguously, while for n > 4 the number of distances associated with n atoms exceeds the target feature space


Figure 4. Unique interaction (left panel) associated with the 3-body kernel k_3^s (20), compared with the non-unique 3-body interaction (right panel) associated with the kernel k_3^{\neg u} = (k_2^s)^2 (22), which is a function of two distances only (see text).

dimension 3n - 6. Meanwhile, the computational cost of evaluating the full sum in Eq. (21) very quickly becomes prohibitively large, as the number of elements in the sum grows exponentially with n.
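The counting behind this statement is worth making explicit. An n-atom fragment has 3n - 6 internal (rotation- and translation-invariant) degrees of freedom, while it possesses n(n - 1)/2 unordered interatomic distances; the two counts coincide up to n = 4 (where, as noted above, ambiguity nevertheless arises) and diverge from n = 5 onward. A trivial Python aside:

```python
# Invariant degrees of freedom of an n-atom fragment vs. the number of
# unordered interatomic distances among its n atoms.
for n in range(3, 8):
    dof = 3 * n - 6            # rotation/translation-invariant coordinates
    dists = n * (n - 1) // 2   # unordered pair distances
    print(n, dof, dists)

assert 3 * 4 - 6 == 4 * 3 // 2 == 6   # n = 4: counts match, yet non-unique
assert 5 * 4 // 2 > 3 * 5 - 6         # n = 5: distances outnumber dimensions
```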

The order of an already symmetric n-body kernel can however be augmented with no computational overhead by generating a derived kernel through simple exponentiation to an integer power, at the cost of losing the uniqueness [28, 46, 47] of the representation. This can be easily understood by means of an example (graphically illustrated in Figure 4). Let us consider the 2-body symmetric kernel k_2^s (Eq. (19)), which learns a function of just a single distance, and therefore treats the r_i distances between the central atom and its neighbors independently. Its square is the kernel

k_3^{\neg u}(\rho, \rho') = \sum_{\substack{i_1,i_2\in\rho \\ j_1,j_2\in\rho'}} \tilde{k}_2(r_{i_1}, r'_{j_1})\, \tilde{k}_2(r_{i_2}, r'_{j_2})    (22)

which will be able to learn functions of two distances r_{i_1}, r_{i_2} from the central atom of the target configuration \rho (see Figure 4), and will thus be a 3-body kernel in the sense of Eq. (2). However, this kernel cannot resolve angular information, as rotating the atoms in \rho around the origin by independent, arbitrary angles will yield identical predictions.

Extending this line of reasoning, it is easy to show that squaring a symmetric 3-body kernel yields a kernel that can capture interactions up to 5-body, although again non-uniquely. This has often been done in practice by squaring the SOAP integral [26, 42]. In general, raising a 3-body "input" kernel to an arbitrary integer power \zeta \geq 2 yields an n-body output kernel of order 2\zeta + 1, k^{\neg u}_{n=2\zeta+1} = k_3^s(\rho, \rho')^{\zeta}. This kernel is also non-unique, as it will learn a function of only 3\zeta variables, while the total number of relevant n-body degrees of freedom (3n - 6 = 6\zeta - 3) is always larger than this. Substituting 3 with any order n' of the symmetrized input kernel will similarly generate a k^{\neg u}_n = k^s_{n'}(\rho, \rho')^{\zeta} kernel of order n = (n' - 1)\zeta + 1. A

kernel          order   symm.   unique   name
k_2^s           2       ✓       ✓        2-body
k_3^{\neg u}    3       ✓       ✗        3-body, non-unique
k_3^s           3       ✓       ✓        3-body
k_5^{\neg u}    5       ✓       ✗        5-body, non-unique
k_{MB}^{ds}     ∞       ~       ✓        many-body, approx. symmetric

Table I. Some of the kernels presented and their properties.

simple calculation reveals that, also in the general case, the number of variables on which k^{\neg u}_n is implicitly built is (3n' - 6)\zeta, always smaller than the full dimension of the n-body feature space, (3n' - 3)\zeta - 3 (as expected, the two become equal only for the trivial exponent \zeta = 1).

None of the kernels obtained as finite powers of symmetric lower-order kernels is a many-body one (they will all satisfy Eq. (2) for some finite n). However, an attractive immediate generalization consists of substituting any squaring or cubing with full exponentiation. For instance, exponentiating a symmetrized 3-body kernel we obtain the many-body kernel k_{MB} = \exp[k_3^s(\rho, \rho')]. It is clear from the infinite expansion in Eq. (7) that this kernel is a many-body one in the sense of Eq. (2), and is also fully symmetric¹. As is also the case for all finite-power kernels, the computational cost of this many-body kernel will depend on the order n' of the input kernel (3 in the present example), as the sum in Eq. (21) only runs over the atomic n'-plets (here, triplets) in \rho and \rho'. This new kernel is not a priori known to neglect any order of interaction that might occur in a physical system and thus be encoded in a reference QM training database.

To summarise, we provided a definition for an n-body kernel, and proposed a general formalism for building n-body kernels by exact Haar integration over rotations. We then defined a class of simpler kernels based on rotation-invariant features that are also n-body according to the previous definition. As both approaches become computationally expensive for high values of n, we pointed out that n-body kernels can be built as powers of lower-order input n'-body kernels, with no additional computational overhead. While such a procedure in our case comes at the cost of sacrificing the uniqueness property of the descriptor, it also suggests how to build, by full exponentiation, a many-body symmetric kernel. For many applications, however, using a finite-order kernel will provide the best option.

¹ One could also inexpensively obtain a many-body kernel by normalisation of an explicit finite-order one, for instance as k_{MB} = k_3^s(\rho, \rho') / \sqrt{k_3^s(\rho, \rho)\, k_3^s(\rho', \rho')}. The denominator makes this many-body in the sense of Eq. (2) (as is also the case for the SOAP kernel, see discussion in Section II C), while no Haar integration is needed here.


Figure 5. Learning curves reporting the mean generalization error (measured as the modulus of the difference between target and predicted force vectors) as a function of the training set size, for different materials and kernels of increasing order. The insets in (a) and (d) report the converged error achieved by a given kernel as a function of the kernel's order. The systems considered are: (a) crystalline nickel, 500 K (compared to a nickel nanocluster in the inset); (b) iron with a vacancy, 500 K; (c) diamond and graphite, mixed temperatures and pressures; and (d) amorphous silicon, 650 K (compared to crystalline silicon in the inset). For extra details on the datasets and kernels used, and on the experimental methodology, see Appendixes C and D.

E. Optimal n-kernel choice

In general, choosing a higher-order n-body kernel will improve accuracy at the expense of speed. The optimal kernel choice for a given application will correspond to the best tradeoff between computational cost and representation power, which will depend on the physical system investigated. The properties of some of the kernels discussed above are summarized in Table I, while their performance is tested on a range of materials in Figure 5.

The figure reveals some general trends. 2-body kernels can be trained very quickly, as good convergence can be attained already with ~100 training configurations. The 2-body representation is a very good descriptor for a few materials under specific conditions, while its overall accuracy is ultimately limited. This yields, e.g., excellent force accuracy for a close-packed bulk system like crystalline nickel (inset (a)), and reasonable accuracy for a defected α-Fe system, whose bcc structure is however metastable if just pair potentials are used (inset (b)). Accuracy improves dramatically once angular information is acquired by training 3-body kernels. These can accurately describe forces acting on iron atoms in the bulk α-Fe system containing a vacancy (inset (b)) and those acting on carbon atoms in both diamond and graphite (inset (c)). However, 3-body GPs need larger training databases. Also, atoms participate in many more triplets than simple bonds in their standard environments contained in the database, which makes 3-body kernels slower than 2-body ones for making predictions by GP regression. Both problems would extend, getting worse, to higher values of n, as summing over all database configurations and all feature n-plets in each database configuration makes GP predictions progressively slower.


However, complex materials where high-order interactions presumably play a significant role should be expected to be well described by ML-FFs based on a many-body kernel. This is verified here in the case of amorphous silicon (inset (d)).

Identifying the n value best suited for the description of a given material system can also be done in practice by monitoring how the converged error varies as a function of the kernel order. Plots illustrating this behaviour are provided in insets (a) and (d) for the nickel and silicon systems, respectively. In each plot the more complex system (a Ni cluster and an amorphous Si system, respectively) displays a high accuracy gain (larger negative slope) when the kernel order is increased, while the relatively simpler crystalline Ni and Si systems show a practically constant trend on the same scale.

Figure 5(b) also shows the performance of some non-unique kernels. As discussed above, these are options to increase the order of an input kernel while avoiding the need to sum over the correspondingly higher-order n-plets. Our tests indicate that the ML-FFs generated by non-unique kernels sometimes improve appreciably on the input kernels' performance: e.g., the error incurred by the 2-body kernel of Eq. (19) in the Fe-vacancy system is higher than that associated with its square, the non-unique 3-body kernel of Eq. (22). Unfortunately, but not surprisingly, the improvement can in other cases be modest or nearly absent, as exemplified by comparing the errors associated with the 3-body kernel and its square (the non-unique 5-body kernel) in the same system.

Overall, the analysis of Figure 5 suggests that an optimal kernel can be chosen by comparing the learning curves of the various n-body kernels and the many-body kernel over the available QM database: the comparison will reveal the simplest (most informative, lowest n) description that is still compatible with the error level deemed acceptable in the simulation.

Trading transferability for accuracy by training the kernels on a QM database appropriately tailored for the target system (e.g., restricted to just bulk or simply-defected system configurations sampled at the relevant temperatures, as done for the Ni and Fe systems of Figure 5) will enable surprisingly good accuracy even for low n values. This should be expected to systematically improve on the accuracy of classical potentials involving non-linear parameter fitting, as exemplified by comparing the errors associated with n-body kernel models and the average errors of state-of-the-art embedded atom model (EAM) P-FFs [7, 48] (insets (a) and (b)). The next section further explores the performance of GP-based force prediction, to address the final issue of what execution speed can be expected for ML-based force fields, once the optimally accurate choice of kernel has been made.

Figure 6. Distributions of the error on force components [meV/Å] for increasingly denser grids. Inset: standard deviation of the distributions as a function of the number of interpolation grid points, on a log-log scale (each distribution in the main panel corresponds to a dot of the same color in the inset).

III. MAPPED FORCE FIELDS (M-FFs)

Once a GP kernel is recognized as being n-body, it automatically defines a corresponding n-body force field, for any given choice of training set. This will be an n-body function of atomic positions satisfying Eq. (2), whose values can be computed by GP regression sums over the training set, as done by standard ML-FF implementations, but do not have to be computed this way. In particular, the execution speed of a machine-learning-derived n-body force field might be expected to depend on its order n (e.g., it will involve sums over all atomic triplets, like any 3-body P-FF, if n = 3), but should otherwise be independent of the training set size. It should therefore be possible to construct a mapping procedure yielding a machine-learning-derived, nonparametric force field (an efficient "M-FF") that allows a very significant speed-up over calculating forces by direct GP regression. We note that non-unique kernels obtained as powers of n'-body input kernels exceed their reference n'-body feature space and thus could not be similarly sped up by mapping their predictions onto an M-FF of equal order n', while mapping onto an M-FF of the higher output order n would still be feasible.

For convenience, we will analyze a 3-body kernel case, show that a 3-body GP exactly corresponds to a classical 3-body M-FF, and show how the mapping yielding the M-FF can be carried out in this case, using a 3D-spline approximator. The generalization to any order n is straightforward, provided that a good approximator can be identified and implemented. We begin by inserting the general form of a 3-body kernel (Eq. (21)) into the GP prediction expression (Eq. (1)), to obtain


Figure 7. Computational cost of evaluating the 3-body energy (Eq. (24)) as a function of the database size N and the number of atoms M located within the cutoff radius. Left: time taken (s) for a single energy prediction using the GP (blue dots and solid line) and the mapped potential (orange dots and solid line), as a function of N, for M = 24. The speed-up ratio is also provided as a dotted line. Right: scaling of the same quantities as a function of M for a training set of N = 500 configurations.

\varepsilon(\rho) = \sum_{d=1}^{N} \sum_{\substack{i_1,i_2\in\rho \\ j_1,j_2\in\rho_d}} \tilde{k}_3(\mathbf{q}_{i_1,i_2},\, \mathbf{q}^d_{j_1,j_2})\, \alpha_d.    (23)

Inverting the order of the sums over the database and over the atoms in the target configuration yields a general expression for the 3-body potential:

\varepsilon(\rho) = \sum_{i_1,i_2\in\rho} \tilde{\varepsilon}(\mathbf{q}_{i_1,i_2})    (24)

\tilde{\varepsilon}(\mathbf{q}_{i_1,i_2}) = \sum_{d=1}^{N} \sum_{j_1,j_2\in\rho_d} \tilde{k}_3(\mathbf{q}_{i_1,i_2},\, \mathbf{q}^d_{j_1,j_2})\, \alpha_d.    (25)

Eq. (24) reveals that the GP implicitly defines the local energy of a configuration as a sum over all triplets containing the central atom, where the function \tilde{\varepsilon} represents the energy associated with each triplet in the physical system. The triplet energy is calculated by three nested sums, one over the N database entries and two running over the M atoms of each database configuration (M may vary slightly over configurations, but can be assumed constant for the present purpose). The computational cost of a single evaluation of the triplet energy (25) consequently scales as O(NM^2). Clearly, improving the GP prediction accuracy by increasing N and M will make the prediction slower. However, such a computational burden can be avoided, bringing the complexity of the triplet energy calculation (25) down to O(1).

Since the triplet energy \tilde{\varepsilon} is a function of just three variables (the effective symmetry-invariant degrees of freedom associated with three particles in three dimensions), we can calculate and store its values on an appropriately

Figure 8. Energy profiles of the 3-body M-FF trained for the a-Si system at 650 K. Upper panel: 2-body interaction term (energy [eV] vs. distance [Å]). Lower panel: 3-body interaction energy for an atomic triplet, angular dependence when the two distances from the central atom are both equal to 2.4 Å.

distributed grid of points within its domain. This procedure effectively maps the GP predictions onto the relevant 3-body feature space: once completed, the value of the triplet energy at any new target point can be calculated via a local interpolation, using just a subset of the nearest tabulated grid points. If the number of grid points N_g is made sufficiently high, the mapped function will be essentially identical to the original one but, by virtue of the locality of the interpolation, the cost of evaluating it will not depend on N_g.

The 3-body configuration energy of Eq. (24) also includes 2-body contributions, coming from the terms in the sum for which the indices i_1 and i_2 are equal. When i_1 = i_2 = i, the term \tilde{\varepsilon}(\mathbf{q}_{i,i}) can be interpreted as the pairwise energy associated with the central atom and atom i. The term can consequently be mapped onto a 1D 2-body feature space whose coordinate is the single independent component of the \mathbf{q}_{i,i} feature vector, typically the distance between atom i and the central atom. In the same way, an n-body kernel naturally defines a set of n-body energy terms of order comprised between 2 and n, depending on the number of repeated indices.

Figure 6 shows the convergence of the mapped forces derived from the 3-body kernel in Eq. (20) for a database of DFTB atomic forces for the a-Si system. The interpolation is carried out using a 3D cubic spline for different 3D mesh sizes. Comparison with the reference forces produced by the GP allows us to generate, for each mesh size, the distribution of the absolute deviation of the force components from their GP-predicted values. The standard deviation of the interpolation error distribution is shown in the inset on a log-log scale, as a function of N_g. Depending on the specific reference implementation, the speed-up in calculating the local energy (Eq. (24)) provided by the mapping procedure can vary widely; however, it will always grow linearly with N and quadratically with M (see Figure 7), and it will always be substantial: in typical testing scenarios we found it to be of the order of 10^3-10^4.


An M-FF example, obtained for a-Si with n = 3, is shown in Figure 8. As the energy profile is not prescribed by any particular functional form, it is free to optimally adapt to the information contained in the QM training set, to best reproduce the quantum interactions that produced it. Figure 8 contains some expected features, e.g., a radial minimum at about r ≃ 2.4 Å in the 2-body section (upper panel), the corresponding angular minimum at θ_0 ≃ 110° (lower panel), which is approximately equal to the sp³ hybridization angle of 109.47°, and rapid growth for small radii (upper panel) and angles (lower panel). Less intuitive features are also visible, which however contribute to the best representation of the bulk system's interactions that a 3-body expansion can achieve for the given database. An example is the shallow maximum in the 2-body section at r ≃ 3.1 Å, which would of course disappear if we fitted our model on QM forces calculated for a Si dimer, which do not contain a hump. The resulting Si force field, appropriate for a Si dimer, would however inevitably reproduce the QM bulk interactions less accurately. More generally, training on the aggregate dataset could be a sensible compromise, producing a more transferable, but locally less accurate, force field.

IV. CONCLUDING REMARKS

The results presented in this work exemplify how physical priors built into the GP kernels restrict their descriptiveness, while improving their convergence speed as a function of training dataset size. This provides a framework to optimise efficiency. Comparing the performance of n-body kernels allows us to identify the lowest order n that is compatible with the required accuracy, as a consequence of the physical properties of the target system. As a result, accuracy can in each case be achieved using the most efficient kernel, e.g., a 2-body kernel for bulk Ni, or a 3-body kernel for carbon in diamond and graphite, for a ~0.1 eV/Å target force accuracy (see Figure 5). As should be reasonably expected, relying on low-dimensional feature spaces will limit the maximum achievable accuracy if higher-order interactions, missing in the representation, occur in the target system.

On the other hand, we find that once the optimal order n has been identified and the n-kernel has been trained (whatever its form, e.g., whether defined as a function of invariant descriptors as in (21), or constructed as a power of such a function, or derived as an analytic integral over rotations using Eqs. (12-18)), it becomes possible to map its prediction onto the appropriate n-dimensional domain, and thus to generate an n-body machine-learned force field (M-FF) that predicts atomic forces at essentially the same computational cost as a classical parametric n-body force field (P-FF). The GP predictions allow for a natural intrinsic measure of uncertainty (the GP predicted variance), and the same mapping procedure used for the former can also be applied to the latter. Thus, like their ML-FF counterparts, and unlike P-FFs, M-FFs offer a tool which could be used to monitor whether any extrapolation is taking place that might involve large prediction errors.

In general, our results suggest a possible three-step procedure to build fast nonparametric M-FFs whenever a series of kernels k_n can be defined with progressively higher complexity/descriptive power and well-defined feature spaces of n-dependent dimensionality. However the series is constructed (and whether or not it converges to a universal descriptor), this will involve (i) GP training of n-kernels for different values of n, in each case using as many database entries relevant to the target system as needed to achieve convergence of the n-dependent prediction error; (ii) identification of the optimal order n, yielding the simplest viable description of the system's interactions (this could be, e.g., the minimal value of n compatible with the target accuracy or, within a GP statistical learning framework, the result of a Bayesian model selection procedure [22]); and (iii) mapping of the resulting (GP-predicted) ML-FF onto an efficient M-FF, using a suitably fast approximator function defined over the relevant feature space.

A major limitation of the M-FFs obtained this way is that, similarly to P-FFs, each of them can be used only in "interpolation mode", that is, when the target configurations are all well represented in the fixed database used. This is not the case in molecular dynamics simulations potentially revealing new chemical reaction paths, or whenever online learning or the use of dynamically adjusted database subsets is necessary to avoid unvalidated extrapolations and maximise efficiency. In such cases, "learn on the fly" (LOTF) algorithms can be deployed, which have the ability to incorporate novel QM data into the database used for force prediction. In such schemes, the new data are either generated at runtime by new QM calculations, or adaptively retrieved as the most relevant subset of a much larger available QM database [15]. The availability of an array of n-body kernels is very useful for this class of algorithms, which provides further motivation for their development. In particular, distributing the use of n-body kernels non-uniformly in both space and time along the system's trajectory has the potential to provide an optimally efficient approach to accurate MD simulations using the LOTF scheme. Finally, while the complication of continuously re-mapping the GP predictions to reflect a dynamically updated training database makes on-the-fly M-FF generation a less attractive option, a strategy to produce the MD trajectory with classical force field efficiency might involve using (concurrently across the system, and at any given time) a locally optimal choice built from a comprehensive set of pre-computed low-order M-FFs.

ACKNOWLEDGEMENTS

The authors acknowledge funding by the Engineering and Physical Sciences Research Council (EPSRC) through the Centre for Doctoral Training "Cross Disciplinary Approaches to Non-Equilibrium Systems" (CANES, Grant No. EP/L015854/1) and by the Office of Naval Research Global (ONRG Award No. N62909-15-1-N079). ADV acknowledges further support by the EPSRC HEmS Grant No. EP/L014742/1 and by the European Union's Horizon 2020 research and innovation program (Grant No. 676580, The NOMAD Laboratory, a European Centre of Excellence). We are grateful to the UK Materials and Molecular Modelling Hub for computational resources, which is partially funded by EPSRC (EP/P020194/1). We furthermore thank Gábor Csányi, from the Engineering Department, University of Cambridge, for the carbon database, and Samuel Huberman, from the MIT Nano-Engineering Group, for the initial geometry used in the a-Si simulations. Finally, we want to thank Ádám Fekete for stimulating discussions and precious technical help.

[1] R. Car and M. Parrinello, Physical Review Letters 55,2471 (1985).

[2] P. Hohenberg and W. Kohn, Physical review 136 (1964).[3] W. Kohn and L. J. Sham, Physical review 140, A1133

(1965).[4] F. H. Stillinger and T. A. Weber, Physical review B31,

5262 (1985).[5] J. Tersoff, Physical Review B 37, 6991 (1988).[6] D. W. Brenner, physica status solidi(b) 217, 23 (2000).[7] Y. Mishin, Acta Materialia 52, 1451 (2004).[8] A. C. T. van Duin, S. Dasgupta, F. Lorant, and W. A.

Goddard, The Journal of Physical Chemistry A 105,9396 (2001).

[9] G. A. Cisneros, K. T. Wikfeldt, L. Ojamäe, J. Lu, Y. Xu, H. Torabifard, A. P. Bartók, G. Csányi, V. Molinero, and F. Paesani, Chemical Reviews 116, 7501 (2016).

[10] S. K. Reddy, S. C. Straight, P. Bajaj, C. Huy Pham, M. Riera, D. R. Moberg, M. A. Morales, C. Knight, A. W. Götz, and F. Paesani, The Journal of Chemical Physics 145, 194504 (2016).

[11] A. J. Skinner and J. Q. Broughton, Modelling and Simulation in Materials Science and Engineering 3, 317 (1995).

[12] J. Behler and M. Parrinello, Physical Review Letters 98, 146401 (2007).

[13] A. P. Bartók, M. C. Payne, R. Kondor, and G. Csányi, Physical Review Letters 104, 136403 (2010).

[14] A. V. Shapeev, Multiscale Modeling & Simulation 14, 1153 (2016).

[15] Z. Li, J. R. Kermode, and A. De Vita, Physical Review Letters 114, 096405 (2015).

[16] A. Glielmo, P. Sollich, and A. De Vita, Physical Review B 95, 214302 (2017).

[17] V. Botu and R. Ramprasad, Physical Review B 92, 094306 (2015).

[18] G. Ferré, T. Haut, and K. Barros, arXiv.org (2016), 1612.00193v1.

[19] E. V. Podryabinkin and A. V. Shapeev, Computational Materials Science 140, 171 (2017).

[20] A. Takahashi, A. Seko, and I. Tanaka, arXiv.org (2017), 1710.05677.

[21] J. R. Boes, M. C. Groenenboom, J. A. Keith, and J. R. Kitchin, International Journal of Quantum Chemistry 116, 979 (2016).

[22] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning (the MIT Press, 2006).

[23] C. M. Bishop, Pattern Recognition and Machine Learning, Information Science and Statistics (Springer, New York, NY, 2006).

[24] M. J. Kearns and U. V. Vazirani, An Introduction to Computational Learning Theory (MIT Press, 1994).

[25] M. Rupp, R. Ramakrishnan, and O. A. von Lilienfeld, The Journal of Physical Chemistry Letters 6, 3309 (2015).

[26] V. L. Deringer and G. Csányi, Physical Review B 95, 094203 (2017).

[27] F. A. Faber, A. S. Christensen, B. Huang, and O. A. von Lilienfeld, The Journal of Chemical Physics 148, 241717 (2018).

[28] A. P. Bartók, R. Kondor, and G. Csányi, Physical Review B 87, 184115 (2013).

[29] T. Bereau, R. A. DiStasio Jr, A. Tkatchenko, and O. A. von Lilienfeld, arXiv.org (2017), 1710.05871v1.

[30] A. Grisafi, D. M. Wilkins, G. Csányi, and M. Ceriotti, arXiv.org (2017), 1709.06757v1.

[31] W. J. Szlachta, A. P. Bartók, and G. Csányi, Physical Review B 90, 104108 (2014).

[32] H. Huo and M. Rupp, arXiv.org (2017), 1704.06439v1.

[33] I. Macêdo and R. Castro, Learning divergence-free and curl-free vector fields with matrix-valued kernels (Instituto Nacional de Matematica Pura e Aplicada, 2008).

[34] V. Botu and R. Ramprasad, International Journal of Quantum Chemistry 115, 1075 (2015).

[35] G. Ferré, J. B. Maillet, and G. Stoltz, The Journal of Chemical Physics 143, 104114 (2015).

[36] L. Nachbin, The Haar Integral (Van Nostrand, Princeton, 1965).

[37] H. Schulz-Mirbach, in Proceedings of the 12th IAPR International Conference on Pattern Recognition, Vol. 3 - Conference C: Signal Processing (Cat. No. 94CH3440-5) (IEEE, 1994) pp. 387–390.

[38] B. Haasdonk and H. Burkhardt, Machine Learning 68, 35 (2007).

[39] K. Hornik, Neural Networks 6, 1069 (1993).

[40] A. P. Thompson, L. P. Swiler, C. R. Trott, S. M. Foiles, and G. J. Tucker, Journal of Computational Physics 285, 316 (2015).

[41] S. De, A. P. Bartók, G. Csányi, and M. Ceriotti, Physical Chemistry Chemical Physics 18, 13754 (2016).

[42] P. Rowe, G. Csányi, D. Alfè, and A. Michaelides, arXiv.org (2017), 1710.04187v2.

[43] T. W. Anderson, The Annals of Mathematical Statistics 17, 409 (1946).

[44] A. T. James, Proc. R. Soc. Lond. A 229, 367 (1955).

[45] M. Rupp, A. Tkatchenko, K. R. Müller, and O. A. von Lilienfeld, Physical Review Letters 108, 058301 (2012).

[46] B. Huang and O. A. von Lilienfeld, The Journal of Chemical Physics 145, 161102 (2016).

[47] O. A. von Lilienfeld, R. Ramakrishnan, M. Rupp, and A. Knoll, International Journal of Quantum Chemistry 115, 1084 (2015).


[48] M. I. Mendelev, S. Han, D. J. Srolovitz, G. J. Ackland, D. Y. Sun, and M. Asta, Philosophical Magazine 83, 3977 (2003).

[49] J. P. Perdew, K. Burke, and M. Ernzerhof, Physical Review Letters 77, 3865 (1996).

[50] C. Zeni, K. Rossi, A. Glielmo, N. Gaston, F. Baletto, and A. De Vita, arXiv.org (2018), 1802.01417v1.

APPENDIX

A. Kernel order by explicit differentiation

We first prove that the kernel given in Eq. (4) is 2-body in the sense of Eq. (2). For this it is sufficient to show that its second derivative with respect to the relative positions of two different atoms of the target configuration ρ always vanishes. The first derivative is

$$
\frac{\partial k_2(\rho, \rho')}{\partial \mathbf{r}_{i_1}}
= \sum_{ij} \frac{\partial}{\partial \mathbf{r}_{i_1}}
  e^{-\|\mathbf{r}_i - \mathbf{r}'_j\|^2 / 4\sigma^2}
= -\sum_{ij} e^{-\|\mathbf{r}_i - \mathbf{r}'_j\|^2 / 4\sigma^2}\,
  \frac{\mathbf{r}_i - \mathbf{r}'_j}{2\sigma^2}\,\delta_{i i_1}
= -\sum_{j} e^{-\|\mathbf{r}_{i_1} - \mathbf{r}'_j\|^2 / 4\sigma^2}\,
  \frac{\mathbf{r}_{i_1} - \mathbf{r}'_j}{2\sigma^2}.
$$

This depends only on the atom located at r_{i1} in the configuration ρ. Thus, differentiating with respect to the relative position r_{i2} of any other atom of the configuration gives the relation in Eq. (2) for 2-body kernels:

$$
\frac{\partial^2 k_2(\rho, \rho')}{\partial \mathbf{r}_{i_1} \partial \mathbf{r}_{i_2}} = 0.
$$

We next show that the kernel defined in Eq. (6) is n-body in the sense of Eq. (2). This follows naturally from the result above, given that k_n is defined as k_n = k_2^{n-1}. We can thus write down its first derivative as

$$
\frac{\partial k_n}{\partial \mathbf{r}_{i_1}}
= (n-1)\, k_2^{\,n-2}\, \frac{\partial k_2}{\partial \mathbf{r}_{i_1}}.
$$

Since the second derivative of k_2 is null, the second derivative of k_n is simply

$$
\frac{\partial^2 k_n}{\partial \mathbf{r}_{i_1} \partial \mathbf{r}_{i_2}}
= (n-1)(n-2)\, k_2^{\,n-3}\,
  \frac{\partial k_2}{\partial \mathbf{r}_{i_1}}
  \frac{\partial k_2}{\partial \mathbf{r}_{i_2}},
$$

and after n−1 differentiations we similarly obtain

$$
\frac{\partial^{\,n-1} k_2^{\,n-1}}{\partial \mathbf{r}_{i_1} \cdots \partial \mathbf{r}_{i_{n-1}}}
= (n-1)!\; k_2^{\,0}\,
  \frac{\partial k_2}{\partial \mathbf{r}_{i_1}} \cdots
  \frac{\partial k_2}{\partial \mathbf{r}_{i_{n-1}}}.
$$

Since k_2^0 = 1, the final derivative with respect to the nth particle position r_{i_n} is zero, as required by Eq. (2).
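The vanishing mixed derivative can also be verified numerically. Below is a minimal illustrative sketch (ours, not part of the paper) that implements the Gaussian pair kernel of Eq. (4) and estimates the mixed second derivative over two different atoms by central finite differences: for k_2 it vanishes to numerical precision, while for the 3-body kernel k_3 = k_2^2 it does not.

```python
import numpy as np

# Toy local configurations: rows are relative positions of the atoms.
rho = np.array([[0.3, 0.1, 0.0], [-0.2, 0.4, 0.1]])
rho_p = np.array([[0.1, -0.3, 0.2], [0.4, 0.0, -0.1]])

def k2(r, rp, sigma=0.5):
    """2-body kernel of Eq. (4): sum of Gaussians over all atom pairs (i, j)."""
    d2 = ((r[:, None, :] - rp[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (4 * sigma**2)).sum()

def k3(r, rp):
    """3-body kernel of Eq. (6): k3 = k2^2."""
    return k2(r, rp) ** 2

def mixed_second(f, r, rp, i1, i2, a=0, b=0, h=1e-4):
    """Central finite-difference estimate of d^2 f / (d r[i1, a] d r[i2, b])."""
    def ev(s1, s2):
        rr = r.copy()
        rr[i1, a] += s1
        rr[i2, b] += s2
        return f(rr, rp)
    return (ev(h, h) - ev(h, -h) - ev(-h, h) + ev(-h, -h)) / (4 * h * h)

# Mixed derivative over two *different* atoms: zero for k2, nonzero for k3.
m2 = mixed_second(k2, rho, rho_p, i1=0, i2=1)
m3 = mixed_second(k3, rho, rho_p, i1=0, i2=1)
```

The configurations and the hyperparameter σ = 0.5 here are arbitrary choices for the check; any generic positions give the same qualitative result.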

B. 1D n0-body model

To test the ideas behind the n-body kernels, we used a 1D n₀-particle model reference system in which a ("central") particle is kept fixed at the coordinate axis origin (consistent with the local configuration convention of Eq. (3)). The energy of the central particle is defined as

$$
\varepsilon = \sum_{i_1 \ldots\, i_{n_0 - 1}} J\, x_{i_1} \cdots x_{i_{n_0 - 1}},
$$

where {x_{i_p}} (p = 1, …, n₀−1) are the relative positions of the n₀−1 particles and J is an interaction constant.

To generate Figure 1, a large set of configurations was generated by uniformly and independently sampling each relative position x_{i_p} within the range (−0.5, 0.5). The energy of the central particle of each configuration was then given by the above equation, with the interaction constant J set to 0.5. In order to analyse the converged properties of the n-body kernels presented, large training sets (N = 1000) were used.
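A dataset of this kind is straightforward to regenerate. The sketch below reads the sum literally, with each index running independently over the n₀−1 particles (our assumption about the index convention; under this reading the sum factorises as J (Σᵢ xᵢ)^{n₀−1}), and samples configurations as described in the text.

```python
import itertools

import numpy as np

def energy_literal(x, J=0.5):
    """Literal evaluation of eps = sum over all index tuples of J x_{i1}...x_{i_{n0-1}},
    with each of the n0-1 indices running independently over the n0-1 particles."""
    n = len(x)
    return J * sum(np.prod([x[i] for i in idx])
                   for idx in itertools.product(range(n), repeat=n))

def energy_factorised(x, J=0.5):
    """Equivalent closed form under the same reading: J * (sum_i x_i)**(n0-1)."""
    return J * np.sum(x) ** len(x)

# Sample 1000 configurations for n0 = 3 (two relative positions each),
# uniformly in (-0.5, 0.5), and label them with the model energy.
rng = np.random.default_rng(0)
configs = rng.uniform(-0.5, 0.5, size=(1000, 2))
energies = np.array([energy_factorised(x) for x in configs])
```

If the intended sum instead runs over distinct index tuples only, the closed form changes, but the sampling procedure stays the same.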

C. Databases details

The bulk Ni and Fe databases were obtained from simulations using a 4×4×4 periodically repeated unit cell, modelling the electronic exchange and correlation interactions via the PBE/GGA approximation [49], and controlling the temperature (set at 500 K) by means of a weakly-coupled Langevin thermostat (the DFT trajectories are available from the KCL research data management system at the link http://doi.org/10.18742/RDM01-92). The C database comprises bulk diamond and AB- and ABC-stacked graphene layer structures. These structures were obtained from DFT simulations at varying temperatures and pressures, using a fixed 3×3×2 periodic cell geometry for graphite and simulation cells ranging from 1×1×1 to 2×2×2 unit cells for diamond; the corresponding DFT trajectories can be found in the "libAtoms" data repository at http://www.libatoms.org/Home/DataRepository. The crystalline and amorphous Si database was obtained from a microcanonical DFTB 64-atom simulation carried out with periodic boundaries, with an average kinetic energy corresponding to a temperature of T = 650 K. The results presented for the Ni cluster are reported from Ref. [50] and correspond to constant-temperature DFT MD runs (T = 300 K) of a particular Ni19 geometry (named "4HCP" in that article). The radial cutoffs used to create the local environments for the four elements considered are: 4.0 Å (Ni), 4.45 Å (Fe), 3.7 Å (C) and 4.5 Å (Si).
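As an illustration of the last step, the minimal sketch below extracts a local environment with the quoted radial cutoffs. The function name is ours, and periodic images are ignored for simplicity; this is not the paper's actual pipeline.

```python
import numpy as np

# Radial cutoffs (in Angstrom) quoted in the text.
CUTOFFS = {"Ni": 4.0, "Fe": 4.45, "C": 3.7, "Si": 4.5}

def local_environment(positions, center, cutoff):
    """Relative positions of all neighbours within `cutoff` of atom `center`,
    excluding the central atom itself (the local configuration rho of Eq. (3)).
    Simplification: periodic images are not considered."""
    rel = positions - positions[center]
    dist = np.linalg.norm(rel, axis=1)
    mask = (dist > 1e-12) & (dist < cutoff)
    return rel[mask]
```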

Page 14: Efficient nonparametric n-body force fields from machine learning · Since their conception, first-principles molecular dy-namics (MD) simulations [1] based on density func-tional

14

D. Details on the kernels used and on the experimental methodology

All energy kernels presented in this work can be used to learn/predict forces after generating a standard derivative kernel (Ref. [22], Section 9.4; cf. also Section II A of the main text). In particular, for each scalar energy kernel k, a matrix-valued force kernel K can be readily obtained by double differentiation with respect to the positions r₀ and r′₀ of the central atoms in the target and database configurations ρ and ρ′:

$$
\mathbf{K}(\rho, \rho') = \frac{\partial^2 k(\rho, \rho')}{\partial \mathbf{r}_0\, \partial \mathbf{r}_0'^{\mathsf T}}.
$$
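This double differentiation can be mimicked numerically. The sketch below (illustrative, not the paper's implementation) builds the 3×3 force kernel from a scalar energy kernel by central finite differences, using the fact that displacing a central atom by δ shifts every relative position in its local configuration by −δ.

```python
import numpy as np

def k2(rho, rho_p, sigma=0.5):
    """Scalar 2-body energy kernel (Eq. (4)) on local configurations."""
    d2 = ((rho[:, None, :] - rho_p[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (4 * sigma**2)).sum()

def force_kernel(k, rho, rho_p, h=1e-4):
    """3x3 matrix K_ab = d^2 k / (d r0_a d r0'_b), by central finite differences.
    A central-atom displacement +delta is realised by shifting all relative
    positions of that configuration by -delta."""
    K = np.zeros((3, 3))
    for a in range(3):
        for b in range(3):
            da = np.zeros(3); da[a] = h
            db = np.zeros(3); db[b] = h
            K[a, b] = (k(rho - da, rho_p - db) - k(rho - da, rho_p + db)
                       - k(rho + da, rho_p - db) + k(rho + da, rho_p + db)) / (4 * h * h)
    return K
```

Since the energy kernel is symmetric in its arguments, the resulting force kernel satisfies K(ρ, ρ′) = K(ρ′, ρ)ᵀ, a useful sanity check.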

The kernels k̃₂ and k̃₃ (Eqs. (19), (20)) were chosen as simple squared exponentials in the tests shown. Denoting by q (or q) the vector (or scalar) containing the effective degrees of freedom of the atomic n-plet considered (see Eq. (21)), the two kernels read

$$
\tilde{k}_2(q_i, q'_j) = e^{-(q_i - q'_j)^2 / 2\sigma^2},
$$

$$
\tilde{k}_3(\mathbf{q}_{i_1, i_2}, \mathbf{q}'_{j_1, j_2})
= \sum_{P \in \mathcal{P}_c} e^{-\|\mathbf{q}_{i_1, i_2} - P \mathbf{q}'_{j_1, j_2}\|^2 / 2\sigma^2},
$$

where 𝒫c (|𝒫c| = 3) is the set of cyclic permutations of three elements. Summing over the permutation group is needed to guarantee the permutation symmetry of the energy. As discussed in the main text, the exact form of these low-order kernels is not essential, as the large-database limit is quickly reached.
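For concreteness, here is a minimal sketch of k̃₃ with the cyclic-permutation sum. It assumes (our convention for this example) that the triplet descriptor q is ordered so that cyclically relabelling the triplet's atoms cyclically permutes its components, as is the case for the three interparticle distances.

```python
import numpy as np

def k3_tilde(q, q_p, sigma=0.5):
    """Squared-exponential 3-body kernel summed over the three cyclic
    permutations of the triplet descriptor (|P_c| = 3). np.roll realises
    the cyclic permutations of the 3-component descriptor vector."""
    return sum(np.exp(-np.sum((q - np.roll(q_p, s)) ** 2) / (2 * sigma**2))
               for s in range(3))
```

By construction the result is unchanged when either descriptor is cyclically permuted, which is exactly the permutation symmetry the sum is meant to guarantee.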

The many-body force kernel referred to in Fig. 5 was built as a covariant discrete summation of the many-body energy base kernel of Eq. (7) over the O48 crystallographic point group, using the procedure of Ref. [16]. This procedure yields an approximation to the full covariant integral of the many-body kernel of Eq. (7), given in Eq. (5).

In order to obtain Figure 5, repeated (randomised) realisations of the same learning curves were performed. The points (and error bars) plotted are the means (and standard deviations) of the generated data. The kernel hyperparameters were independently optimised by cross-validation for each dataset.