Classical Simulations Theory

Accurate simulation of atomic and molecular systems generally requires quantum mechanical theory. However, quantum mechanical techniques are computationally expensive and are usually applied only to small molecules or small systems containing between 10 and 100 atoms. It is not practical to model large systems, such as a condensed polymer containing many thousands of monomers, in this way. Even if such a simulation were possible, much of the information generated would in many cases be discarded. This is because, in simulating large systems, the goal is often to extract bulk (statistical) properties, such as diffusion coefficients or Young's moduli, which depend on the location of the atomic nuclei or, more often, on an average over a set of atomic nuclei configurations. Under these circumstances the details of electronic motion are lost in the averaging process. Bulk properties can therefore be extracted if a good approximation of the potential in which the atomic nuclei move is available, and if there are methods that can generate a set of system configurations which, while they may not follow the exact dynamics of the nuclei, are statistically consistent with a full quantum mechanical description.

There are a number of potentials (or forcefields) and distribution-generating techniques available; collectively they are referred to as classical simulation methods. The term classical is used because some of the earliest simulations generated configurations by integrating the Newtonian (classical) equations of motion, an approach that is still widely used. The material in this section gives a general overview of the principal elements of classical simulation; the detailed implementations of these techniques in Discover and Forcite are documented separately.

Further information

- Forcefields
- Structure preparation
- Geometry optimization
- Dynamics

Forcefields

A crucial part of any simulation is the choice of forcefield. The forcefield approximately describes the potential energy hypersurface on which the atomic nuclei move. Forcefields are usually tuned for particular groups of systems, so the choice of forcefield depends on the type of structure being investigated.

The following topics describe forcefields in more detail:

- The potential energy surface
- The forcefield
- The energy expression
- Modeling periodic systems
- Supported forcefields

Further information

- Preparing the structure
- Geometry optimization
- Dynamics


The potential energy surface

The complete mathematical description of a molecule, including both quantum mechanical and relativistic effects, is a formidable challenge, due to the small scales and large velocities involved. Therefore in the following discussion, these intricacies are ignored. Instead, the focus is on general concepts. This is possible because molecular mechanics and dynamics are based on empirical data that implicitly incorporate all the relativistic and quantum effects.

Since no complete relativistic quantum mechanical theory is suitable for the description of molecules, this discussion starts with the nonrelativistic, time-independent form of the Schrödinger description:

Eq. 1 The Schrödinger equation

$$H\,\Psi(R, r) = E\,\Psi(R, r)$$

where H is the Hamiltonian for the system, Ψ is the wavefunction, and E is the energy. In general, Ψ is a function of the coordinates of the nuclei (R) and of the electrons (r).

The Born-Oppenheimer approximation

Although the Schrödinger equation is quite general, it is too complex to be of any practical use, so approximations are made. Noting that the electrons are several thousand times lighter than the nuclei and therefore move much faster, Born and Oppenheimer (1927) proposed the following approximation: the motion of the electrons can be decoupled from that of the nuclei, giving two separate equations. The first of these equations describes the electronic motion:

Eq. 2 Equation for electronic motion, or the potential energy surface

$$H\,\psi(r; R) = E\,\psi(r; R)$$

and depends only parametrically on the positions of the nuclei.

Note. This equation defines an energy E(R), which is a function of the coordinates of the nuclei only. This energy is usually called the potential energy surface.

The second equation describes the motion of the nuclei on this potential energy surface E(R):

Eq. 3 Equation for nuclear motion on the potential energy surface

$$H\,\Phi(R) = E\,\Phi(R)$$

The direct solution of Eq. 2 is the province of ab initio quantum chemical codes such as Gaussian, CADPAC, Hondo, GAMESS, DMol, and Turbomole. Semiempirical codes such as ZINDO, MNDO, MINDO, MOPAC, and AMPAC also solve Eq. 2, but they approximate many of the integrals required with empirically fitted functions. The common feature of these programs is that they solve for the electronic wavefunction and energy as a function of nuclear coordinates. In contrast, simulation engines provide an empirical fit to the potential energy surface.



Empirical fit to the potential energy surface

Solving Eq. 3 is important if you are interested in the structure or time evolution of a model. As written, Eq. 3 is the Schrödinger equation for the motion of the nuclei on the potential energy surface. In principle, Eq. 2 could be solved for the potential energy E, and then Eq. 3 could be solved. However, the effort required to solve Eq. 2 is extremely large, so usually an empirical fit to the potential energy surface, commonly called a forcefield (V), is used. Since the nuclei are relatively heavy objects, quantum mechanical effects are often insignificant, in which case Eq. 3 can be replaced by Newton's equation of motion:

Eq. 4 Newton's equation of motion

$$-\frac{dV}{dR} = m\,\frac{d^{2}R}{dt^{2}}$$

The solution of Eq. 4 using an empirical fit to the potential energy surface E(R) is called molecular dynamics. Molecular mechanics ignores the time evolution of the system and instead focuses on finding particular geometries and their associated energies or other static properties. This includes finding equilibrium structures, transition states, relative energies, and harmonic vibrational frequencies.
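As an illustration of what integrating Newton's equation of motion on an empirical potential involves, the following minimal sketch propagates a single particle in one dimension. The harmonic potential stands in for a real forcefield, and the velocity Verlet scheme is simply one common integrator chosen for the example; neither is claimed to be what any particular simulation engine uses. All names, units, and parameter values are hypothetical.

```python
import math

def harmonic_energy(x, k=1.0, x0=0.0):
    """Stand-in potential V(x) = 0.5 * k * (x - x0)**2 (hypothetical forcefield)."""
    return 0.5 * k * (x - x0) ** 2

def harmonic_force(x, k=1.0, x0=0.0):
    """Force F = -dV/dx for the stand-in potential."""
    return -k * (x - x0)

def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate m * d2x/dt2 = F(x); return the trajectory as a list of (x, v)."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = force(x)                          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # complete the velocity update
        trajectory.append((x, v))
    return trajectory

def total_energy(x, v, mass):
    """Kinetic plus potential energy for the stand-in potential."""
    return 0.5 * mass * v * v + harmonic_energy(x)

# Start at rest, displaced from the minimum, and integrate to t = 10 (reduced units).
traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0], mass=1.0)
e_end = total_energy(*traj[-1], mass=1.0)
print(f"energy drift: {e_end - e_start:.2e}")
```

For this test case the exact solution is x(t) = cos(t), so both the final position and the near-constant total energy give a quick check on the time step.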

Further information

- The forcefield
- The energy expression
- Modeling periodic systems
- Forcefields

The forcefield

The purpose of a forcefield is to describe the potential energy surface of entire classes of molecules with reasonable accuracy. In a sense, the forcefield extrapolates from the empirical data of the small set of models used to parameterize it to a larger set of related models. Some forcefields aim for high accuracy for a limited set of elements, thus enabling good predictions of many molecular properties. Others aim for the broadest possible coverage of the periodic table, with necessarily lower accuracy.

Components of a forcefield

The forcefield contains all the necessary elements for calculations of energy and force:

- A list of forcefield types
- A list of partial charges
- Forcefield-typing rules
- Functional forms for the components of the energy expression
- Parameters for the function terms
- For some forcefields, rules for generating parameters that have not been explicitly defined
- For some forcefields, a way of assigning functional forms and parameters

This package for the empirical fit to the potential energy surface is the forcefield.



Coordinates, terms, functional forms

The forcefields commonly used for describing molecules employ a combination of internal coordinates and terms (bond distances, bond angles, torsions, etc.) to describe that part of the potential energy surface due to interactions between bonded atoms, and non-bond terms to describe the van der Waals, electrostatic, and other interactions between atoms. The functional forms range from simple quadratic forms to Morse functions, Fourier expansions, Lennard-Jones potentials, etc.

Physical significance

The physical significance of most of the types of interactions in a forcefield is easily understood, since describing a model's internal degrees of freedom in terms of bonds, angles, and torsions seems natural. The analogy of vibrating balls connected by springs used to describe molecular motion is equally familiar. However, it must be remembered that such models have limitations. Consider, for example, the difference between such a mechanical structure and a quantum mechanical bond.

Quantum and mechanical descriptions of bonds

Covalent bonds can, to a first approximation, be described by a harmonic oscillator, both in quantum and classical mechanical theory. Consider the classical oscillator in Figure 1. A ball poised at the intersection of the pale horizontal line with the parabolic energy surface (thick line) would begin to roll down, converting its potential energy to kinetic energy and achieving a maximum velocity as it passes the minimum. Its velocity (kinetic energy) is then converted back into potential energy until, at exactly the same height as it started, it would pause momentarily before rolling back. The interchange of kinetic and potential energy in such a mechanical system is familiar and intuitive.

The probability of finding the ball at any point along its trajectory is inversely proportional to its velocity at that point (which is opposite to the probability for a real atom). This probability is plotted above the parabolic curve (thin line, Figure 1). The probability is greatest near the high-energy limits of its trajectory (where it is moving slowly) and lowest at the energy minimum (where it is moving quickly). Because the total energy cannot exceed the initial potential energy defined by the starting point, the probability drops to zero outside the limit defined by the intersection of the total energy (pale horizontal line) with the parabola.

Figure 1. Energy and probability of a mechanical and quantum particle in a harmonic energy well. The energy is indicated by the heavy lines and probability by the thin lines. The total energy of the system is indicated by the pale horizontal line. The classical (mechanical) probability is highest where the particle reaches its maximum potential energy (zero velocity), is lowest between these points, and drops to zero outside them. The quantum mechanical probability is highest where the potential energy is lowest, and there is a finite probability that the particle can be found outside the classical limits (pale vertical lines).



Describing a quantum mechanical 'trajectory' is impossible, because the uncertainty principle prevents an exact, simultaneous specification of both position and momentum. However, the probability that the quantum mechanical ball will be at a given point on the parabola can be quantified. The quantum mechanical probability function plotted in the right panel of Figure 1 is very different from that of the mechanical system. First, the highest probability is at the energy minimum, which is the opposite of the mechanical case. Second, the quantum mechanical ball can actually be found beyond the classical limits imposed by the total energy of the system (tunneling). Both these properties can be attributed to the uncertainty principle.
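The contrast between the two probability functions can be made concrete with a small numerical sketch. It assumes reduced units (m = ω = ħ = 1) and compares a classical oscillator with the quantum ground state at the same total energy; the choice of the ground state and all function names are assumptions made for illustration.

```python
import math

def classical_density(x, amplitude):
    """Time-averaged position density of a classical harmonic oscillator.

    The density is proportional to 1/|v(x)|, which gives
    1 / (pi * sqrt(A**2 - x**2)) inside the turning points and zero outside.
    """
    if abs(x) >= amplitude:
        return 0.0
    return 1.0 / (math.pi * math.sqrt(amplitude**2 - x**2))

def quantum_ground_state_density(x):
    """|psi_0(x)|**2 of the harmonic oscillator for m = omega = hbar = 1."""
    return math.exp(-x * x) / math.sqrt(math.pi)

energy = 0.5                        # ground-state energy, hbar * omega / 2
amplitude = math.sqrt(2 * energy)   # classical turning point for k = 1

# Probability of finding the quantum particle beyond the classical limits.
p_outside = math.erfc(amplitude)

for x in (0.0, 0.5, 0.9):
    print(x, classical_density(x, amplitude), quantum_ground_state_density(x))
print(f"quantum probability outside classical limits: {p_outside:.4f}")
```

Under these assumptions the classical density rises toward the turning points while the quantum density peaks at the minimum, and roughly 16% of the quantum probability lies outside the classical limits.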

Utility of the forcefield approach

Given that there is such a difference between the qualitative pictures of these two fundamental physical principles, is it reasonable to use a mechanical approach for obviously quantum mechanical entities like bonds? In practice, many experimental properties such as vibrational frequencies, sublimation energies, and crystal structures can be reproduced with a forcefield, not because the systems behave mechanically, but because the forcefield is fit to reproduce relevant observables and therefore includes most of the quantum effects empirically. Nevertheless, it is important to appreciate the fundamental limitations of a mechanical approach.

Applications beyond the capability of most forcefield methods include:

- Electronic transitions (photon absorption)
- Electron transport phenomena
- Proton transfer (acid/base reactions)

The true power of the atomistic description of a structure embodied in the energy expression lies in three major areas:

1. The first is that forcefield-based simulations can handle large systems, since these simulations are several orders of magnitude faster (and cheaper) than quantum-based calculations. Forcefield-based simulations can be used for studying condensed-phase molecules, macromolecules, crystal morphology, inorganic and organic interphases, etc., where the properties of interest are not sensitive to quantum effects (e.g., phase behavior, equations of state, bond energies).

2. The second is the analysis of the energy contributions at the level of individual interactions or classes of interactions. For instance, you can decompose the energy into bond energies, angle energies, non-bond energies, etc., or even down to the level of a specific hydrogen bond or van der Waals contact, in order to understand a physical observable or to make a prediction.

3. The third area, which is described under Applying constraints, lies in the modification of the energy expression to bias the calculation. You can impose constraints (absolute conditions), such as fixing an atom in space and not allowing it to move. (You can also apply constraints for quantum-based energy calculations.)

Further information

- The potential energy surface
- The energy expression
- Modeling periodic systems
- Forcefields

The energy expression

The coordinates of a structure combined with a forcefield create an energy expression (or target function). This energy expression is the equation that describes the potential energy surface of a particular structure as a function of its atomic coordinates.

The potential energy of a system can be expressed as a sum of valence (or bond), crossterm, and non-bond interactions:

Eq. 5

$$E_{\text{total}} = E_{\text{valence}} + E_{\text{crossterm}} + E_{\text{non-bond}}$$

Valence interactions

The energy of valence interactions is generally accounted for by diagonal terms:

- bond stretching (bond)
- valence angle bending (angle)
- dihedral angle torsion (torsion)
- inversion, also called out-of-plane interactions (oop) terms, which are part of nearly all forcefields for covalent systems
- a Urey-Bradley (UB) term, which may be used to account for interactions between atom pairs involved in 1-3 configurations (i.e., atoms bound to a common atom)

Eq. 6

$$E_{\text{valence}} = E_{\text{bond}} + E_{\text{angle}} + E_{\text{torsion}} + E_{\text{oop}} + E_{\text{UB}}$$

Valence cross terms



Modern (second-generation) forcefields generally achieve higher accuracy by including cross terms to account for such factors as bond or angle distortions caused by nearby atoms. These terms are required to accurately reproduce experimental vibrational frequencies and, therefore, the dynamic properties of molecules. In some cases, research has also shown them to be important in accounting for structural deformations. Cross terms can include the following: stretch-stretch, stretch-bend-stretch, bend-bend, torsion-stretch, torsion-bend-bend, bend-torsion-bend, stretch-torsion-stretch.

Note. Cross terms can become unstable when the structure is significantly distorted. This can lead to an optimization returning unrealistic geometries because the optimizer has become stuck in a local minimum that is an artifact of the cross terms. In these cases it is best to pre-optimize the structure with a simpler forcefield that does not have cross terms and then refine the structure with a more modern forcefield.

Non-bond interactions

The energy of interactions between non-bonded atoms is accounted for by:

- van der Waals (vdW)
- electrostatic (Coulomb)
- hydrogen bond (hbond) terms in some older forcefields

Eq. 7

$$E_{\text{non-bond}} = E_{\text{vdW}} + E_{\text{Coulomb}} + E_{\text{hbond}}$$

Constraints

Constraints that can be added to an energy expression include distance, angle, torsion, and inversion constraints. Constraints are useful if, for example, you are interested in only part of a structure. For information on constraints, their implementation and use, see the documentation for the particular simulation engine.

As a simple example of a complete energy expression, consider the following equation, which might be used to describe the potential energy surface of a water structure:

Eq. 8 Example energy expression for water

$$V(R) = K_{OH}\,(b - b_{0})^{2} + K_{OH}\,(b' - b_{0})^{2} + K_{HOH}\,(\theta - \theta_{0})^{2}$$

In this example, the forcefield defines:

- the bond lengths (b) and angles (θ)
- the functional form (a simple quadratic in both types of coordinates)
- the force constants (K)
- the reference O-H bond length (b0) and H-O-H angle (θ0), which are the values for an ideal O-H bond and H-O-H angle at zero energy and are not necessarily the same as their equilibrium values in a real water molecule
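A minimal sketch of how such an energy expression for water might be evaluated from Cartesian coordinates is given below. It assumes a simple quadratic in the two O-H bond lengths and the H-O-H angle, without a factor of 1/2 (a convention that differs between forcefields). The parameter values are invented for illustration and do not come from any real forcefield.

```python
import math

# Hypothetical parameters, for illustration only.
K_OH = 500.0                       # bond force constant, kcal/mol/A^2
B0_OH = 0.96                       # reference O-H bond length, A
K_HOH = 50.0                       # angle force constant, kcal/mol/rad^2
THETA0_HOH = math.radians(104.5)   # reference H-O-H angle, rad

def bond_length(p, q):
    """Distance between two Cartesian points."""
    return math.dist(p, q)

def bond_angle(p, center, q):
    """Angle p-center-q in radians."""
    v1 = [a - c for a, c in zip(p, center)]
    v2 = [a - c for a, c in zip(q, center)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_t = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off

def water_energy(o, h1, h2):
    """Quadratic energy expression in two bond lengths and one angle."""
    b1 = bond_length(o, h1)
    b2 = bond_length(o, h2)
    theta = bond_angle(h1, o, h2)
    return (K_OH * (b1 - B0_OH) ** 2
            + K_OH * (b2 - B0_OH) ** 2
            + K_HOH * (theta - THETA0_HOH) ** 2)

# Reference geometry (zero energy) and a geometry with one bond stretched by 0.1 A.
o = (0.0, 0.0, 0.0)
h1 = (B0_OH, 0.0, 0.0)
h2 = (B0_OH * math.cos(THETA0_HOH), B0_OH * math.sin(THETA0_HOH), 0.0)
h1_stretched = (B0_OH + 0.1, 0.0, 0.0)
print(water_energy(o, h1, h2), water_energy(o, h1_stretched, h2))
```

Note that the internal coordinates are computed from the Cartesian positions inside the energy function, which is the sense in which the forcefield also defines the internal coordinates as functions of the atomic coordinates.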

Eq. 8 is an example of an energy expression as set up for a simple molecule. Eq. 9 is an example of the corresponding general, summed forcefield function:



Eq. 9 Example forcefield function

The first four terms in this equation are sums that reflect the energy needed to:

- stretch bonds (b)
- bend angles (θ) away from their reference values
- rotate torsion angles (φ) by twisting atoms about the bond axis that determines the torsion angle
- distort planar atoms out of the plane formed by the atoms they are bonded to (χ)

The next five terms are cross terms that account for interactions between the four types of internal coordinates.

The final term represents the non-bond interactions as a sum of repulsive and attractive Lennard-Jones terms as well as Coulombic terms, all of which are a function of the distance (rij) between atom pairs.

The forcefield defines the functional form of each term in this equation as well as the parameters such as Db, a, and b0. It also defines internal coordinates as a function of the Cartesian atomic coordinates, although this is not explicit in Eq. 9.

Note. The energy expression in Eq. 9 is cast in a general form. The true energy expression for a specific structure includes information about the coordinates that are included in each sum. For example, it is common to exclude interactions between bonded and 1-3 atoms in the summation representing the non-bond interactions. Thus, a true energy expression might actually use a list of allowed interactions rather than the full summation implied in Eq. 9.
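The idea of a list of allowed interactions can be sketched as follows: starting from the bond list, pairs that are directly bonded (1-2) or bound to a common atom (1-3) are excluded, and only the remaining pairs enter the non-bond sum. This is an illustrative sketch; the function name is hypothetical, and real forcefields may also scale or exclude 1-4 pairs.

```python
from itertools import combinations

def nonbond_pairs(n_atoms, bonds):
    """Return atom pairs (i < j) that are neither 1-2 nor 1-3 neighbors."""
    neighbors = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    excluded = set()
    # 1-2 exclusions: directly bonded atoms.
    for i, j in bonds:
        excluded.add((min(i, j), max(i, j)))
    # 1-3 exclusions: two atoms bound to a common central atom.
    for bonded in neighbors.values():
        for i, j in combinations(sorted(bonded), 2):
            excluded.add((i, j))

    return [pair for pair in combinations(range(n_atoms), 2)
            if pair not in excluded]

# A four-atom chain 0-1-2-3: only the 1-4 pair (0, 3) remains.
print(nonbond_pairs(4, [(0, 1), (1, 2), (2, 3)]))
```

For a three-atom molecule such as water, every pair is either 1-2 or 1-3, so the intramolecular non-bond list is empty.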

Further information

- The potential energy surface
- The forcefield
- Modeling periodic systems
- Forcefields

Modeling periodic systems



The term Periodic boundary conditions refers to the simulation of structures consisting of a periodic lattice of identical subunits. By applying periodic boundaries to simulations, the influence, for example, of bulk solvent or crystalline environments can be included, thereby improving the rigor and realism of a structure.

MS Modeling modules detect whether a system is periodic and display the appropriate controls or parameters in the interface.

Models are specified in Cartesian space

Some simulation engines accept only Cartesian coordinates, not crystal coordinates (others are able to convert between the two systems). This is important when using asymmetric space groups, since the symmetry operators assume that the input coordinates correspond to the standard asymmetric unit as defined in the International Tables for Crystallography (Reidl, 1983).

In general, it is assumed that the x Cartesian axis corresponds to the a crystal axis and that the b axis lies in the x,y plane (see Figure 2).

Figure 2. Relationship between Cartesian coordinate system (xyz) and periodic system (abc)
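Under the axis convention just described (a along x, b in the x,y plane), crystal (fractional) coordinates can be converted to Cartesian coordinates with the standard construction sketched below. The function names are hypothetical, and this is not claimed to be the conversion routine of any particular simulation engine.

```python
import math

def cell_vectors(a, b, c, alpha, beta, gamma):
    """Cartesian lattice vectors with a along x and b in the x,y plane.

    Lengths are in any consistent unit; angles are in degrees
    (alpha between b and c, beta between a and c, gamma between a and b).
    """
    cos_a, cos_b, cos_g = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sin_g = math.sin(math.radians(gamma))
    cy = (cos_a - cos_b * cos_g) / sin_g
    cz = math.sqrt(1.0 - cos_b**2 - cy**2)
    return ((a, 0.0, 0.0),
            (b * cos_g, b * sin_g, 0.0),
            (c * cos_b, c * cy, c * cz))

def fractional_to_cartesian(frac, vectors):
    """Cartesian position = sum over k of frac[k] * lattice vector k."""
    return tuple(sum(frac[k] * vectors[k][i] for k in range(3)) for i in range(3))

# Center of a 10 x 10 x 10 cubic cell, and the b vector of a hexagonal cell.
cubic = cell_vectors(10.0, 10.0, 10.0, 90.0, 90.0, 90.0)
hexagonal = cell_vectors(3.0, 3.0, 5.0, 90.0, 90.0, 120.0)
print(fractional_to_cartesian((0.5, 0.5, 0.5), cubic))
print(hexagonal[1])
```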

Minimum-image structure

Note. For periodic systems in which non-bond interactions dominate, the Ewald sum method is preferred over the minimum-image convention.

The left-hand side of Figure 3 shows a solute molecule surrounded by enough solvent to occupy the volume (and shape) of a cube. A simulation carried out on this isolated cubic system is a poor approximation of what would happen in a true bulk solvent environment. For example, the solute can diffuse toward a surface or solvent molecules can evaporate. To remedy this, on the right of Figure 3 the cube is replicated in three dimensions to form a 3 x 3 x 3 lattice of identical cubes. This is a much better representation of bulk solvent for the interior cube, because molecules near the surfaces now interact with solvent in adjacent cubes. The imaged atoms are used to calculate energies and forces on the real atoms in the interior cube. The energies and forces on the imaged atoms themselves are not calculated, because their motions are computed as symmetry operations on the real atoms, for example, by translations along the cubic axes.



Figure 3. Solute surrounded by solvent. A solute surrounded by an isolated cube of solvent is replicated periodically in three dimensions in order to better represent a bulk or crystalline environment.

Consider the implications of this structure for a specific case. In Figure 4, molecule A1 is located near an edge of the square. (For simplicity, this discussion focuses on a two-dimensional lattice.) In addition, eight images of A1 (A2-A9) are present in the adjacent symmetrically related squares. Consider the interactions of molecules A with molecules B. The closest image of B to A1 is actually not B1, but rather B5. If each molecule in the interior cell is allowed to interact only with the closest copy (the real molecule or one of its images) of every other molecule, the result is called a minimum-image structure. Each molecule interacts only with those molecules and images within a distance of half the cell size. The advantage of this approach is its simplicity: it is straightforward to compute the energy between a given pair of molecules without explicitly keeping track of the images in neighboring cells. All periodic boundary algorithms imply a cutoff criterion, but the minimum-image convention implies a maximum distance for this cutoff of no more than half the cell dimensions.

Figure 4. Minimum-image structure, showing that each real molecule interacts with at most one image of each real molecule.

For a more detailed description of the minimum-image convention, see also Allen and Tildesley, 1987.
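For an orthorhombic cell, the minimum-image convention reduces to a one-line correction per Cartesian component, as in the sketch below. It assumes a rectangular box aligned with the Cartesian axes and treats atoms rather than whole molecules; the function names are hypothetical.

```python
import math

def minimum_image_vector(r_i, r_j, box):
    """Shortest vector from r_i to any periodic image of r_j.

    box holds the cell edge lengths (Lx, Ly, Lz) of an orthorhombic cell.
    """
    delta = []
    for xi, xj, length in zip(r_i, r_j, box):
        d = xj - xi
        d -= length * round(d / length)   # shift by the nearest whole cell
        delta.append(d)
    return delta

def minimum_image_distance(r_i, r_j, box):
    """Distance between r_i and the closest image of r_j."""
    return math.hypot(*minimum_image_vector(r_i, r_j, box))

box = (10.0, 10.0, 10.0)
# Atoms near opposite faces are 8 apart directly but only 2 apart through the boundary.
print(minimum_image_distance((1.0, 1.0, 1.0), (9.0, 1.0, 1.0), box))
```

Each component of the returned vector has magnitude at most half the corresponding cell edge, which is why a cutoff used with this convention cannot exceed half the cell dimensions.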

Explicit-image structure

Simulation engines can also use the more general approach of generating explicit images of the interior molecules, so-called ghost molecules, which interact with the interior molecules. These ghost molecules are repeated to whatever distance is necessary (but no farther than necessary) to satisfy the desired potential energy cutoff criteria.

The left-hand side of Figure 5 shows molecule A1 interacting with several images of B (B1, B2, B3, B5) within the specified cutoff radius (shown as a shaded circle centered on A1). A1 interacts with several of its own images as well (A3, A5, A6, A8).

Figure 5. Explicit-image structure, showing how a cutoff distance defines which molecules in adjacent unit cells are selected as ghost images. (Different cutoff distances are used in the left and right figures.) Left: explicit-image model, in which a larger cutoff including interactions with more images is possible than with the minimum-image convention. Right: the shaded region identifies which molecules are selected as ghost images within the cutoff distance of any molecules in the unit cell.

The right-hand side of Figure 5 shows which molecules in the adjacent unit cells become explicit ghost molecules for a given cutoff distance. Not every molecule in an adjacent cell becomes a ghost. However, if a cutoff distance that is longer than the cell length is used, ghosts from unit cells beyond the nearest neighbor cells may be included. As molecules move in and out of the boundaries, those that are ghosts can change. Therefore, the ghost list is regenerated periodically.

Non-bond interactions do not have to be calculated between ghost atoms. This helps to reduce computation time.
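One way to picture the construction of a ghost list is sketched below: every periodic image of every real atom is tested, and an image becomes a ghost only if it lies within the cutoff of at least one real atom. The sketch assumes an orthorhombic cell and an atom-based cutoff, uses a brute-force search, and is not claimed to match the algorithm of any particular simulation engine.

```python
import math
from itertools import product

def ghost_images(real_atoms, box, cutoff):
    """Return (atom index, cell shift, position) for each ghost image.

    The number of neighboring cells searched in each direction grows with
    the cutoff, so a cutoff longer than the cell reaches beyond the nearest
    neighbor cells.
    """
    reach = [math.ceil(cutoff / length) for length in box]
    ghosts = []
    for shift in product(*(range(-n, n + 1) for n in reach)):
        if shift == (0, 0, 0):
            continue  # the unshifted copy is the real atom itself
        for index, atom in enumerate(real_atoms):
            image = tuple(x + s * length for x, s, length in zip(atom, shift, box))
            if any(math.dist(image, real) <= cutoff for real in real_atoms):
                ghosts.append((index, shift, image))
    return ghosts

# Two atoms near opposite faces of a 10 x 10 x 10 cell, with a cutoff of 3.
atoms = [(1.0, 5.0, 5.0), (9.0, 5.0, 5.0)]
for ghost in ghost_images(atoms, (10.0, 10.0, 10.0), 3.0):
    print(ghost)
```

In this example only two of the 52 candidate images become ghosts, which illustrates that not every atom in an adjacent cell needs to be included.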

When group-based cutoffs are used, the non-bond potential is cut off on the basis of charge groups (i.e., only if two groups are within the cutoff is the interaction calculated), and only those groups in molecular ghosts that are within the cutoff distance of a real group are included in the ghost atom list.

Ghost molecules follow their symmetrically related counterparts. However, when it comes time to move the molecules (in a dynamics step or minimization iteration), only the real molecules (A1 and B1) are actually moved according to the accumulated forces each molecule has felt. The ghost molecule positions are simply regenerated by applying the defined symmetry relations to the new positions.

Perfect symmetry is maintained between the primary structure and all its image objects. For many applications, this condition is satisfactory. However, it is not possible to study, for example, cooperative changes between image objects.



To maintain all molecules in the central cell, image centering is used. Molecules that happen to migrate to an edge of the primary structure and would appear in one of its image objects instead reappear in the primary structure from the opposite direction. Thus a constant number of atoms is maintained and no molecules are lost, no matter how far they may diffuse during the calculation.
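Image centering can be sketched as a translation by a whole number of cell lengths. The version below assumes an orthorhombic cell with its origin at a corner and moves each molecule as a unit according to its centroid, so that bonded atoms are not separated; the function names and the centroid criterion are assumptions made for illustration.

```python
import math

def wrap_into_cell(position, box):
    """Map a single point into the primary cell [0, L) along each axis."""
    return tuple(x % length for x, length in zip(position, box))

def center_molecule(atoms, box):
    """Translate a whole molecule so that its centroid lies in the primary cell."""
    n = len(atoms)
    centroid = [sum(atom[k] for atom in atoms) / n for k in range(3)]
    shift = [-length * math.floor(c / length) for c, length in zip(centroid, box)]
    return [tuple(x + s for x, s in zip(atom, shift)) for atom in atoms]

box = (10.0, 10.0, 10.0)
# A diatomic molecule that has drifted out through the +x face reappears at low x.
molecule = [(10.2, 1.0, 1.0), (11.0, 1.0, 1.0)]
print(center_molecule(molecule, box))
print(wrap_into_cell((-0.5, 10.5, 3.0), box))
```

The number of atoms is unchanged by either operation, consistent with the constant atom count described above.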

Bonds across boundaries

Allowing bonds (with additional energy terms including angles, dihedrals, and improper dihedrals) between the primary atoms and image atoms enables you to study polymers such as DNA.

Further information

- The potential energy surface
- The forcefield
- The energy expression
- Forcefields

Supported forcefields

The following forcefields are available in the classical simulations modules in MS Modeling:

- Consistent forcefields - PCFF and COMPASS: The forcefields of the consistent family (CFF91, PCFF, CFF and COMPASS) are closely related second-generation forcefields. They were parameterized against a wide range of experimental observables for organic compounds containing H, C, N, O, S, P, halogen atoms and ions, alkali metal cations, and several biochemically important divalent metal cations. PCFF is based on CFF91, extended so as to have a broad coverage of organic polymers, (inorganic) metals, and zeolites. COMPASS is a new version of PCFF.

- CVFF and CVFF_aug: CVFF, the consistent valence forcefield, is a classic forcefield with some anharmonic and cross term enhancements. As the traditional default forcefield in the Discover program, it has been used extensively and can be considered well tested and characterized. CVFF was parameterized to reproduce peptide and protein properties. CVFF_aug is an augmented version of CVFF that includes non-bond parameters (Born model) for additional forcefield types that are useful for simulations of silicates, aluminosilicates, clays and aluminophosphates.

- Dreiding: A good, robust, all-purpose forcefield. While specialized forcefields are more accurate for predicting a limited number of structures, the Dreiding forcefield allows reasonable predictions for a much larger number of structures, including those with novel combinations of elements and those for which there is little or no experimental data. It can be used for structure prediction and dynamics calculations on organic, biological and main-group inorganic molecules.

- Universal: Excellent general-purpose forcefield. The parameters are generated from a set of rules based on element, hybridization and connectivity. The Universal forcefield was parameterized for the full periodic table and has been carefully validated for main-group compounds, organic molecules and metal complexes.

Forcefield Applications

Classical simulations of models using these forcefields are possible with the Discover, Forcite or Polymorph modules. The forcefields these modules can load and use are as follows:

- Discover can be used with the COMPASS, PCFF, CVFF and CVFF_aug forcefields.
- Forcite can be used with the COMPASS, PCFF, CVFF, Dreiding and Universal forcefields.
- Polymorph can be used with the same forcefields as Forcite.

Further information

- Forcefields
- The potential energy surface
- The forcefield
- The energy expression
- Modeling periodic systems
- Assigning forcefield types

Consistent forcefields - PCFF and COMPASS

All the consistent forcefields (CFF91, CFF, PCFF, COMPASS) have the same functional form, differing mainly in the range of functional groups for which they were parameterized (and therefore having slightly different parameter values). These differences can be examined in the forcefield files. Atom equivalences for assignment of parameters to forcefield types may also differ, as may some combination rules for non-bond terms.

The analytic expressions used to represent the energy surface are shown below:



Both anharmonic diagonal terms and many crossterms are necessary for a good fit to a variety of structures and relative energies, as well as to vibrational frequencies.

The CFF forcefields employ quartic polynomials for bond stretching (Term 1) and angle bending (Term 2) and a three-term Fourier expansion for torsions (Term 3). The out-of-plane (also called inversion) coordinate (Term 4) is defined according to Wilson et al. (1980). All the crossterms up through third order that have been found to be important (Terms 5-11) are also included. This gives a forcefield equivalent to the best used in a formate anion test case (Maple et al., 1990). Term 12 is the Coulombic interaction between the atomic charges, and Term 13 represents the van der Waals interactions, using an inverse 9th-power term for the repulsive part rather than the more customary 12th-power term.
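The functional forms named above can be sketched as small functions. The forms below are the commonly quoted ones (a quartic polynomial in the bond displacement, a three-term Fourier series with zero phase offsets, and a 9-6 van der Waals term written in terms of the well depth and minimum-energy distance); the exact conventions used in the forcefield files may differ, and all parameter values in the example are invented.

```python
import math

def quartic_bond(b, b0, k2, k3, k4):
    """Quartic polynomial in the displacement from the reference length b0."""
    d = b - b0
    return k2 * d**2 + k3 * d**3 + k4 * d**4

def fourier_torsion(phi, v1, v2, v3):
    """Three-term Fourier expansion; phi in radians, zero phase offsets assumed."""
    return sum(v * (1.0 - math.cos(n * phi))
               for n, v in enumerate((v1, v2, v3), start=1))

def lj_9_6(r, r0, eps):
    """9-6 van der Waals term with minimum -eps at r = r0."""
    x = r0 / r
    return eps * (2.0 * x**9 - 3.0 * x**6)

def lj_12_6(r, r0, eps):
    """Customary 12-6 Lennard-Jones term with the same minimum, for comparison."""
    x = r0 / r
    return eps * (x**12 - 2.0 * x**6)

# Hypothetical parameters: r0 = 4.0 A, eps = 0.1 kcal/mol.
for r in (3.2, 4.0, 5.0):
    print(r, lj_9_6(r, 4.0, 0.1), lj_12_6(r, 4.0, 0.1))
```

With the same well depth and minimum position, the 9-6 form rises less steeply at short range than the 12-6 form, that is, it gives a softer repulsive wall.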

No explicit special forcefield types are used for carbons in strained three- and four-membered rings. The quartic angle potential, combined with crossterms, enables accurate description of normal alkanes, cyclobutane, and cyclopropane with one set of parameters.

PCFF forcefield

Applicability

PCFF was developed based on CFF91 and is intended for application to polymers and organic materials. It is useful for polycarbonates, melamine resins, polysaccharides, other polymers, organic and inorganic materials, and about 20 inorganic metals, as well as for carbohydrates, lipids, and nucleic acids. It can also be used to calculate cohesive energies, mechanical properties, compressibilities, heat capacities, and elastic constants. It handles electron delocalization in aromatic rings by means of a charge library rather than bond increments.

Validation

Parameterization, testing, and validation of PCFF included the compounds listed for CFF91 and these functional groups: carbonates, carbamates, phosphazene, urethanes, siloxanes, silanes, ureas (Sun et al., 1994; Sun, 1994 & 1995), and zeolites (Hill and Sauer, 1994). Metal parameters were derived by fitting to crystal structures and elastic constants.

Forcefield types

PCFF has parameters for functional groups that consist of those for CFF91 and also He, Ne, Kr, Xe. In addition, it includes Lennard-Jones parameters for the metals Li, K, Cr, Mo, W, Fe, Ni, Pd, Pt, Cu, Ag, Au, Al, Sn, Pb. Forcefield type coverage in PCFF includes those for CFF91 and the atoms listed here.

COMPASS forcefield

A high quality general forcefield

COMPASS (Condensed-phase Optimized Molecular Potentials for Atomistic Simulation Studies) represents a technological breakthrough in forcefield methods. It is the first ab initio forcefield that enables accurate and simultaneous prediction of gas-phase properties (structural, conformational, vibrational, etc.) and condensed-phase properties (equation of state, cohesive energies, etc.) for a broad range of molecules and polymers. It is also the first high quality forcefield to consolidate parameters of organic and inorganic materials.

Parameterization

COMPASS is an ab initio forcefield: most parameters were derived from ab initio data. Generally speaking, the parameterization procedure can be divided into two phases: ab initio parameterization and empirical optimization. In the first phase, partial charges and valence parameters were derived by fitting to ab initio potential energy surfaces. At this point, the van der Waals parameters were fixed to a set of initial approximated parameters. In the second phase, the emphasis was on optimizing the forcefield to yield good agreement with experimental data. A few critical valence parameters were adjusted based on gas-phase experimental data. More importantly, the van der Waals parameters were optimized to fit the condensed-phase properties. For covalent molecular systems, this refinement was based on molecular dynamics simulations of liquids; for inorganic systems, it was based on energy minimization of crystals.

Validation

The parameters for covalent molecules have been thoroughly validated using various calculation methods, including extensive MD simulations of liquids, crystals, and polymers (Sun, 1998; Sun et al., 1998; Rigby et al., 1998). For the inorganic materials, COMPASS was validated using energy minimization.

Applicability

The COMPASS forcefield has broad coverage in covalent molecules including most common organics, small inorganic molecules, and polymers. For these molecular systems, the COMPASS forcefield has been parameterized to predict various properties for molecules in isolation and in condensed phases. The properties include molecular structures, vibrational frequencies, conformation energies, dipole moments, liquid structures, crystal structures, equations of state, and cohesive energy densities. The latest development in COMPASS extended the coverage to include inorganic materials: metals, metal oxides, and metal halides using various non-covalent models. Currently, some of these materials have been parameterized. COMPASS is able to predict various solid-state properties: unit cell structures, lattice energies, elastic constants and vibrational frequencies. The combination of parameters for organics and for inorganics opens up the possibility of future study of interfacial and mixed systems.

Further information

Supported forcefields CVFF forcefield Dreiding forcefield UFF forcefield Assigning forcefield types

CVFF forcefield The consistent-valence forcefield (CVFF), the original forcefield provided with the Discover program, is a generalized valence forcefield (Dauber-Osguthorpe et al., 1988). Parameters are provided for amino acids, water and a variety of other functional groups.

The augmented CVFF was developed for materials science applications and is provided with Discover in Materials Studio. It includes additional atom types for aluminosilicates and aluminophosphates.

CVFF also has the ability to use automatic parameters (Automatic assignment of values for missing parameters) when no explicit parameters are present. These are noted in the output file from the calculation.

Applicability



CVFF was fit to small organic (amides, carboxylic acids, etc.) crystals and gas phase structures. It handles peptides, proteins, and a wide range of organic systems. As the default forcefield in Discover, it has been used extensively for many years. It is primarily intended for studies of structures and binding energies, although it predicts vibrational frequencies and conformational energies reasonably well.

Functional form The analytic form of the energy expression used in CVFF is shown below:

Types of terms and computational costs

Terms 1-4 are commonly referred to as the diagonal terms of the valence forcefield and represent the energy of deformation of bond lengths, bond angles, torsion angles, and out-of-plane interactions, respectively.

A Morse potential (Term 1) is used for the bond-stretching term. The Discover program also supports a simple harmonic potential for this term. The Morse form is computationally more expensive than the harmonic form. Since the number of bond interactions is usually negligible relative to the number of nonbond interactions, the additional cost of using the more accurate Morse potential is insignificant, so this is the default option.

When not to use the Morse term

When the model being simulated is high in energy (caused, for example, by overlapping atoms or a high target temperature), a Morse-style function might allow bonded atoms to drift unrealistically far apart (see Figure 1). This would not be desirable unless you were intending to study bond breakage.

Figure 1. Morse vs. harmonic potentials A: Morse potential for a C-H bond; B: harmonic potential for a C-H bond. The Morse potential allows a bond to stretch to an unrealistic length.
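
The difference between the two bond-stretching forms can be sketched numerically. The following is a minimal illustration, not the Discover implementation: the well depth, width parameter, and reference length are hypothetical values chosen only to resemble a C-H bond, and the harmonic force constant is assumed to equal the curvature of the Morse well at its minimum.

```python
import math

# Hypothetical, illustrative parameters (not taken from any forcefield file)
D = 110.0     # well depth, kcal/mol
ALPHA = 1.8   # Morse width parameter, 1/angstrom
R0 = 1.09     # reference bond length, angstrom


def morse(r, d=D, alpha=ALPHA, r0=R0):
    """Morse bond energy: D * (1 - exp(-alpha * (r - r0)))**2."""
    return d * (1.0 - math.exp(-alpha * (r - r0))) ** 2


def harmonic(r, d=D, alpha=ALPHA, r0=R0):
    """Harmonic bond energy with the same curvature at r0 (k = 2 * D * alpha**2)."""
    k = 2.0 * d * alpha ** 2
    return 0.5 * k * (r - r0) ** 2


for r in (1.09, 1.30, 2.00, 4.00):
    print(f"r = {r:4.2f}  Morse = {morse(r):9.2f}  harmonic = {harmonic(r):9.2f}")
```

In this sketch the Morse energy levels off at the well depth for large separations, so the restoring force vanishes and a strained bond can keep stretching, whereas the harmonic energy keeps rising and pulls the atoms back.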



Use of crossterms

Terms 5-9 are off-diagonal (or cross) terms and represent couplings between deformations of internal coordinates. For example, Term 5 describes the coupling between stretching of adjacent bonds.

These terms are required to accurately reproduce experimental vibrational frequencies and, therefore, the dynamic properties of molecules. In some cases, research has also shown them to be important in accounting for structural deformations. However, crossterms can become unstable when the structure is far from a minimum.

Non-bond interactions

Terms 10-11 describe the non-bond interactions. Term 10 represents the van der Waals interactions with a Lennard-Jones function. Term 11 is the Coulombic representation of electrostatic interactions. The dielectric constant, ε, can be made distance dependent (i.e., a function of r_ij).

In the CVFF forcefield, hydrogen bonds are a natural consequence of the standard van der Waals and electrostatic parameters, and special hydrogen bond functions do not improve the fit of CVFF to experimental data (Hagler et al., 1979a & 1979b).

CVFF forcefield types The CVFF forcefield supplied by Accelrys defines atom types for the 20 commonly occurring amino acids, most hydrocarbons, and many other organic models.

It automatically supplies generic parameters when specific parameters are not found.

Augmented CVFF

The augmented version of CVFF includes non-bond parameters (Born model) for additional forcefield types that are useful for simulations of silicates, aluminosilicates, clays and aluminophosphates. These added parameters were derived using Ewald summation for nonbond interactions between the additional atom types.

Partial charges

The bond increment section of the .frc file for CVFF has been expanded so that partial charges can be determined whenever forcefield types can be assigned automatically.

Further information



Supported forcefields Consistent forcefields - PCFF and COMPASS Dreiding forcefield UFF forcefield Assigning forcefield types

Dreiding forcefield General force constants and geometry parameters for the Dreiding forcefield are based on simple hybridization rules rather than on specific combinations of atoms. The Dreiding forcefield does not generate parameters automatically in the way that UFF does. Instead, explicit parameters were derived by a rule-based approach.

Functional form The Dreiding forcefield is a purely diagonal forcefield with harmonic valence terms and a cosine-Fourier expansion torsion term. The umbrella functional form is used for inversions, which are defined according to the Wilson out-of-plane definition. The van der Waals interactions are described by the Lennard-Jones potential. Electrostatic interactions are described by atomic monopoles and a screened (distance-dependent) Coulombic term. Hydrogen bonding is described by an explicit Lennard-Jones 12-10 potential (Mayo et al., 1990).

Coverage of the periodic table The Dreiding forcefield has good coverage for organic, biological and main-group inorganic molecules. It is only moderately accurate for geometries, conformational energies, intermolecular binding energies and crystal packing.

Forcefield types Atom typing in the Dreiding forcefield is straightforward. A forcefield type is denoted by a name of up to five characters:

• The first two characters are the elemental symbol (for example, C_ for carbon, Sn for tin).

• The third character (if present) represents the hybridization state (for example, 1 = linear, sp1; 2 = trigonal, sp2; 3 = tetrahedral, sp3; and R = an sp2 atom involved in resonance).

• The fourth character (if present) indicates the number of implicit hydrogen atoms (for example, C_R2 is a resonant carbon with two implicit hydrogens).

• The fifth character (if present) is reserved to indicate other special characteristics (for example, H___A denotes a hydrogen atom that is capable of forming a hydrogen bond).
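
The positional naming scheme described above can be decoded mechanically. The function below is a small sketch under the assumption that unused positions are filled with underscores, as in the examples given; the function name and the returned dictionary keys are hypothetical and are not part of any Materials Studio interface.

```python
def parse_dreiding_type(name):
    """Split a Dreiding-style forcefield type name into its positional fields."""
    padded = name.ljust(5, "_")  # pad so that all five positions exist

    def field(ch):
        # An underscore marks an unused position
        return None if ch == "_" else ch

    return {
        "element": padded[:2].rstrip("_"),
        "hybridization": field(padded[2]),
        "implicit_hydrogens": int(padded[3]) if padded[3].isdigit() else None,
        "special": field(padded[4]),
    }


for type_name in ("C_R2", "H___A", "Sn"):
    print(type_name, parse_dreiding_type(type_name))
```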

Further information

Supported forcefields Consistent forcefields - PCFF and COMPASS CVFF forcefield UFF forcefield Assigning forcefield types

Universal forcefield



MS Modeling contains a full implementation of the Universal forcefield, including bond order assignment. The MS Modeling implementation has been rigorously tested and results are in agreement with published work on this forcefield (Rappé et al., 1992; Casewit et al., 1992a & 1992b; Rappé et al., 1993).

Parameter generation is based on physically realistic rules.

Functional form Universal is a purely diagonal, harmonic forcefield. Bond stretching is described by a harmonic term, angle bending by a three-term Fourier cosine expansion, and torsions and inversions by cosine-Fourier expansion terms. The van der Waals interactions are described by the Lennard-Jones potential. Electrostatic interactions are described by atomic monopoles and a screened (distance-dependent) Coulombic term.

Forcefield types Universal forcefield types are denoted by an element name of one or two characters followed by up to three other characters:

• The first two characters are the element symbol (for example, N_ for nitrogen or Ti for titanium).

• The third character (if present) represents the hybridization state or geometry (for example, 1 = linear, 2 = trigonal, R = an atom involved in resonance, 3 = tetrahedral, 4 = square planar, 5 = trigonal bipyramidal, 6 = octahedral).

• The fourth and fifth characters (if present) indicate characteristics such as the oxidation state (for example, Rh6+3 represents octahedral rhodium in the +3 formal oxidation state; H___b indicates a diborane bridging hydrogen type; and O_3_z is a framework oxygen type suitable for zeolites).

Coverage of the periodic table Universal has full coverage of the periodic table. Universal is moderately accurate for predicting geometries and conformational energy differences of organic molecules, main-group inorganics, and metal complexes. It is recommended for organometallic systems and other systems for which other forcefields do not have parameters.

Parameterization The Universal forcefield includes a parameter generator that calculates forcefield parameters by combining atomic parameters. Thus, forcefield parameters for any combination of forcefield types can be generated as required.

The atomic parameters are combined using a prescribed set of equations (rules) that generate forcefield parameters for bond, angle, torsion, inversion (i.e., out-of-plane), and van der Waals and Coulombic energy terms. For further details, including the generator equations, see Rappé et al., 1992.

Dummy atoms are used in complexation and are associated with explicit parameters.

Note. To obtain correct results when using Universal, calculate fractional bond orders after atom typing the structure and before setting up the energy expression.



Charges in the Universal forcefield The Universal forcefield was developed in conjunction with the QEq charge equilibration method (Rappé and Goddard, 1991). Therefore this method of electrostatic charge calculation is highly recommended for use with the Universal forcefield. For more on the QEq charge equilibration algorithm, see the Charging algorithms topic.

Further information

Supported forcefields Consistent forcefields - PCFF and COMPASS CVFF forcefield Dreiding forcefield Assigning forcefield types

Preparing the structure Prior to running a simulation, an energy expression must be created for the structure under investigation. The nature of this energy expression depends on the forcefield selected and on the molecular topology of the structure. Before the energy expression is constructed, each atom in the structure is examined and (based on element type and bonding environment) a forcefield type is assigned. Typically the atomic charges are also assigned when the atoms are typed.

The following topics describe these preliminary steps in more detail:

• Assigning forcefield types

• Assigning charges

• Applying constraints

• Non-bond interactions

Further information

Forcefields Geometry optimization Dynamics

Assigning forcefield types The simulation engine needs the forcefield type of each atom in the structure in order to determine which forcefield parameters to use. Forcefield parameters apply to particular combinations of forcefield types as specified by the forcefield.

Relationship between forcefield types and atoms The forcefield types are related to the microchemical environment of the atoms in a way defined by the particular forcefield. For example, a methane structure has only two forcefield types, one for the carbon and one for the hydrogens, even though each of the atoms may have a distinct atom name for labeling purposes. The hydrogen atoms are equivalent by symmetry; therefore, they would all have the same forcefield type in any forcefield.



A more complicated example is propane, which has four distinct types of atoms: methyl carbon atoms, methyl hydrogen atoms, a methylene carbon atom, and the methylene hydrogens. In principle, a forcefield could consider these to be four distinct forcefield types, but in practice, the chemical difference between the carbon atoms or between the hydrogen atoms is very small, so in most forcefields the carbon atoms are all assigned one forcefield type, and all the hydrogens are assigned another forcefield type.

Assigning forcefield types to a structure Forcefield types need to be assigned to all atoms in a structure before any energy related calculation can take place. Forcefield types are automatically assigned by using a set of rules that link the type to an atom via its element type and its chemical microenvironment (for example, the number and nature of connected atoms).

Forcefield types may be assigned manually, but in general this approach is not recommended. The normal reason to do so would be that automatic typing has failed, which typically indicates that the forcefield has not been parameterized for the structure. If you do assign forcefield types manually, remember to turn off the automatic assignment.

The forcefield type information can also be supplied by a molecular data file such as a .msi file or a .mdf file. These structure files are usually created by modeling programs such as Cerius2, Insight, or QUANTA. Forcefield types are also stored in the MS Modeling structure file (.xsd). Structures that have been typed and then saved in one of these formats do not need to be retyped, unless they have been modified. Charge information is also saved in the structure file.

To make sure that forcefield types are assigned, use the 3D Viewer to display atom labels according to forcefield type.

To ensure that you use the most appropriate forcefield types, you should always check the assigned forcefield types against those listed for the selected forcefield. By default, forcefield types are assigned automatically prior to launching a job. However, for these assignments to work the structure must be built correctly. One of the most critical pieces of information is the bond order, which should be set before the forcefield is assigned.

Note. A newly assigned atom type (including associated parameters such as charge) replaces any previously assigned or calculated value.

Further information

Assigning charges Applying constraints Non-bond interactions Preparing the structure

Assigning charges A forcefield charge is simply a value that the forcefield suggests should be assigned to be the partial charge of an atom. Not all forcefields have charges associated with them and the overall neutrality of a structure is not necessarily achieved. When a forcefield supports charges, they will be assigned automatically, by default, at the same time as forcefield types are assigned. You may prefer to assign charges manually.



It is important to assign the correct charges because electrostatic interactions play a critical role in determining the structures of inorganic systems and the packing of organic molecules.

The forcefields supplied with MS Modeling have been parameterized with nonzero forcefield charges and employ the bond increments approach. Therefore you usually just assign charges automatically when you do forcefield typing, instead of having to assign specific charges.
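
As a rough illustration of the bond increments idea, each atom's partial charge can be accumulated from contributions of its bonds, with the increment on one end of a bond equal and opposite to that on the other end, so that the increments alone sum to zero over a structure. The type names, increment values, and function names below are hypothetical and are not taken from any .frc file.

```python
# Hypothetical bond increments keyed by (type_i, type_j): charge added to atom i.
# The value for the reversed pair (j, i) is the negative of the value for (i, j).
BOND_INCREMENTS = {("c", "h"): -0.1, ("c", "o"): 0.3}


def increment(type_i, type_j):
    """Charge contribution to an atom of type_i from a bond to an atom of type_j."""
    if (type_i, type_j) in BOND_INCREMENTS:
        return BOND_INCREMENTS[(type_i, type_j)]
    if (type_j, type_i) in BOND_INCREMENTS:
        return -BOND_INCREMENTS[(type_j, type_i)]
    return 0.0  # no increment defined for this pair of types


def assign_charges(types, bonds):
    """Sum the bond increments over every bond of every atom."""
    charges = [0.0] * len(types)
    for i, j in bonds:
        charges[i] += increment(types[i], types[j])
        charges[j] += increment(types[j], types[i])
    return charges


# Methane-like example: one carbon bonded to four hydrogens
types = ["c", "h", "h", "h", "h"]
bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
charges = assign_charges(types, bonds)
print(charges)
```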

Forcefield charges can also be supplied by a molecular data file such as a .msi file or a .mdf file. These structure files are usually created by modeling programs such as Cerius2, Insight, or QUANTA. Charges are also stored in the MS Modeling structure file (.xsd). Structures that have been assigned charges and then saved in one of these formats do not need to be recharged, unless they have been modified.

Note. If you wish to assign charges different from those specified in the forcefield, you must ensure that automatic charge assignment is turned off.

To make sure that forcefield charges are assigned, use the 3D Viewer to display atom labels according to charge.

Further information

Assigning forcefield types Applying constraints Non-bond interactions Preparing the structure

Applying constraints Constraints allow you to restrict a calculation to the region or conformation of interest in a molecule. They also enable you to set up computational experiments. Such experiments are one of the primary uses of molecular modeling, allowing you control over a structure at the atomic level.

Note. The difference between a constraint and a restraint is that a constraint is an absolute restriction imposed on the calculation, while a restraint is an energetic bias that tends to force the calculation toward a certain restriction (even though many people use these terms as if they were interchangeable).

Constraints are often used to control and direct a minimization. For example, you can fix some atoms in space, not allowing them to move.

Fixed atom constraints Fixed atoms are constrained to a given location in space, so they cannot move at all. Fixed atoms reduce the expense of a calculation in two ways:

• Terms in the energy expression involving only fixed atoms can be eliminated, because they merely add a constant to the total energy. Since the positions of fixed atoms cannot change, neither can the contribution of the terms that depend only on these positions. (Interactions between moving and fixed atoms are calculated.)

• Fixing atoms reduces the number of degrees of freedom in the system, so minimizers converge in fewer steps and dynamics requires fewer steps to sweep the available conformational space.



Note. The energy calculated by simulation engines is correct only to an arbitrary constant, depending on the structure as well as the fixed atoms. Thus, only differences in energy between conformations of the same structure having the same fixed atoms are meaningful.

Use atom constraints when you want to apply minimization or dynamics to part of a structure, while keeping the remainder of the structure fixed and rigid. For example, use atom constraints to quickly minimize a sorbate in a zeolite by fixing the atom positions of the zeolite frame and allowing only the sorbate atoms to move. Or fix all residues in a protein except for those in the active site.
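
The effect of fixed atoms on a minimization can be sketched with a toy model. The example below is an illustration only, not how any simulation engine implements constraints: it assumes a one-dimensional chain of atoms joined by harmonic springs, and the function names, step size, and spring parameters are hypothetical.

```python
def chain_energy_and_gradient(x, k=1.0, rest=1.0):
    """Energy and gradient of a 1D chain of atoms joined by harmonic springs."""
    energy = 0.0
    grad = [0.0] * len(x)
    for i in range(len(x) - 1):
        stretch = (x[i + 1] - x[i]) - rest
        energy += 0.5 * k * stretch ** 2
        grad[i] -= k * stretch
        grad[i + 1] += k * stretch
    return energy, grad


def minimize(x, fixed, step=0.1, n_steps=500):
    """Steepest descent in which atoms flagged in `fixed` never move."""
    x = list(x)
    for _ in range(n_steps):
        _, grad = chain_energy_and_gradient(x)
        for i, g in enumerate(grad):
            if not fixed[i]:  # fixed atoms are simply skipped
                x[i] -= step * g
    return x


start = [0.0, 0.5, 1.2, 3.0]
fixed = [True, False, False, True]  # hold both end atoms in place
final = minimize(start, fixed)
print(final)
```

Only the two interior atoms are optimized here, so the search runs over two degrees of freedom instead of four, while the springs that couple moving atoms to fixed atoms still contribute to the gradient.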

Further information

Assigning forcefield types Assigning charges Non-bond interactions Preparing the structure

Non-bond interactions Electrostatic (Coulombic) and van der Waals interactions are referred to collectively as non-bond interactions.

Calculating non-bond terms can be computationally expensive. To avoid significant increases in calculation times, approximation schemes are often employed. Choosing the best method for your particular structure can save computational time without sacrificing accuracy.

In addition, you have some direct control over the functional terms for non-bond interactions:

• You might be able to improve your simulation by changing the default combination rules for van der Waals interactions between non-identical atom types.

• You can change the dielectric constant to account for nonaqueous solvents and/or solvent screening or make the dielectric "constant" a function of distance.

You may use different methods for van der Waals and electrostatic interactions.

Typically, both van der Waals and Coulombic interactions are calculated by the same method and (if by the non-bond cutoff method) with the same non-bond list. However, different methods and parameters may be used for van der Waals and Coulombic terms in MS Modeling. This allows you to, for instance, use a large cutoff for electrostatic interactions and a smaller cutoff for van der Waals interactions.

The van der Waals interaction potential is relatively short range: it decreases at a rate of 1/r^6. So, at distances of 8-10 Å, the energy and forces are quite small. Therefore, using cutoffs to bring the van der Waals potential to zero at around 10 Å is a reasonable approximation. Coulombic interactions, on the other hand, decrease at a much slower rate with distance (1/r), so even at considerable distances the energy of interaction is not negligible. However, the precise nature of the variation in electrostatic interaction energy with distance depends on the structure. This is because, apart from a few formally charged groups, most molecules are composed of neutral fragments with dipoles and quadrupoles. As a result, in most models, the major component of the electrostatic interaction between molecules or parts of molecules is a dipole-dipole interaction, which decreases at a rate of 1/r^3.

Automatic exclusions



Van der Waals and Coulombic interactions are ordinarily calculated between all atom pairs that are not specifically excluded. Most forcefields exclude non-bond terms for atoms connected by bonds (1-2 interactions) and valence angles (1-3). Some forcefields also exclude non-bond terms between end atoms in torsion (1-4) interactions. These interactions are illustrated in Figure 6.

Figure 6. Types of interactions usually excluded from non-bond calculations
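
The excluded pairs can be derived from the bond list alone. The following sketch assumes that bonds are given as pairs of atom indices and walks the bond graph outward from each atom; the function name and the `exclude_14` flag are hypothetical.

```python
from collections import defaultdict


def excluded_pairs(bonds, exclude_14=False):
    """Return atom pairs separated by one or two bonds (and optionally three)."""
    neighbors = defaultdict(set)
    for i, j in bonds:
        neighbors[i].add(j)
        neighbors[j].add(i)

    max_depth = 3 if exclude_14 else 2
    excluded = set()
    for start in list(neighbors):
        frontier, seen = {start}, {start}
        for _ in range(max_depth):
            # Atoms one bond further out that have not been reached before
            frontier = {n for a in frontier for n in neighbors[a]} - seen
            seen |= frontier
            excluded |= {tuple(sorted((start, n))) for n in frontier}
    return excluded


# Carbon skeleton of n-butane: 0-1-2-3
bonds = [(0, 1), (1, 2), (2, 3)]
print(sorted(excluded_pairs(bonds)))
print(sorted(excluded_pairs(bonds, exclude_14=True)))
```

For this four-atom chain the only pair that changes status is (0, 3), the 1-4 pair across the central torsion.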

Combination rules for van der Waals terms Any van der Waals interaction parameters that are actually defined for heterogeneous atom pairs are called off-diagonal parameters. Off-diagonal parameters that are not available for such atom pairs are calculated by averaging those for each of the two atom types, using a geometric, arithmetic, or 6th-power combination rule:

Eq. 10

Eq. 11

Eq. 12

In MS Modeling a choice of combination rules is available and is specified in the forcefield file.



The arithmetic mean gives marginally better equilibrium distances for van der Waals interactions than the geometric combination rule (Halgren, 1992). The 6th-power rule (not available with all forcefields) yields even better results (Waldman and Hagler, 1993).

With the Ewald method, the geometric mean leads to faster convergence than the arithmetic mean.
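
The three combination rules can be compared with a short calculation. The forms below are the commonly quoted ones, written in terms of an equilibrium distance r and a well depth eps for each atom type; this is an assumption, since the exact parameterization used in a given forcefield file (for example, coefficients of the repulsive and attractive terms instead of r and eps) may differ, and the numerical values are hypothetical.

```python
import math


def geometric(r_i, eps_i, r_j, eps_j):
    """Geometric mean of both the distance and the well depth."""
    return math.sqrt(r_i * r_j), math.sqrt(eps_i * eps_j)


def arithmetic(r_i, eps_i, r_j, eps_j):
    """Arithmetic mean of the distances, geometric mean of the well depths."""
    return 0.5 * (r_i + r_j), math.sqrt(eps_i * eps_j)


def sixth_power(r_i, eps_i, r_j, eps_j):
    """6th-power mean of the distances, with a matching well-depth rule."""
    r6_sum = r_i ** 6 + r_j ** 6
    r_ij = (0.5 * r6_sum) ** (1.0 / 6.0)
    eps_ij = 2.0 * math.sqrt(eps_i * eps_j) * (r_i ** 3 * r_j ** 3) / r6_sum
    return r_ij, eps_ij


# Hypothetical parameters for two unlike atom types (angstrom, kcal/mol)
pair = (3.0, 0.05, 4.0, 0.20)
for rule in (geometric, arithmetic, sixth_power):
    r_ij, eps_ij = rule(*pair)
    print(f"{rule.__name__:12s} r_ij = {r_ij:.4f}  eps_ij = {eps_ij:.4f}")
```

All three rules return the original parameters when the two atom types are identical; they differ only for unlike pairs, where the combined distance increases from the geometric to the arithmetic to the 6th-power rule.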

The dielectric constant and the Coulombic term The electrostatic potential is computed from the partial atomic charges associated with the structure. Approximate solvent-screening effects can be included by specifying a nondefault value for the dielectric constant if it is explicitly included in the forcefield. (The "dielectric constant" used in modeling is not the dielectric constant that most experimental chemists would think of; instead, it is an empirical, dimensionless scaling factor.)

The dielectric constant reflects the polarizability of the solvent molecules. A polarizable solvent such as water has a greater dielectric constant than less polar liquids. Electrostatic interactions in polarizable solvents with high dielectric constants are greatly attenuated. In closely packed molecules, however, there are fewer solvent molecules to screen the charge interactions.

A relatively large dielectric constant can be used for simulating the aqueous environment of small systems. However, many calculations on models use a smaller dielectric constant. For example, dielectric constants of between 2.0 and 10.0 have been used for simulations in the interior of proteins, with a typical value of around 4. (For comparison, the experimental dielectric constant of bulk water at room temperature is about 80.)

For a helpful review, see Harvey (1989).

A distance-dependent dielectric "constant" The dielectric constant can be kept constant, or the Coulombic term can be made a shielded function, where the dielectric "constant" is a function of distance (r). This is useful for electrostatic interactions in closely packed molecules, where the number of solvent molecules between two interacting charges is usually fewer than in a bulk solvent. A distance-dependent dielectric constant is also useful for models in which explicit solvent molecules are not included.

A shielded Coulombic term is faster to calculate than a non-shielded term because no square root has to be evaluated.
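
The reason no square root is needed can be seen in a short sketch. It assumes the simple linear form eps(r) = eps0 * r for the distance-dependent dielectric, which is one common choice and not necessarily the one used by a particular simulation engine; the function names are hypothetical. The conversion factor gives energies in kcal/mol for charges in electron units and distances in angstroms.

```python
import math

COULOMB_CONSTANT = 332.0637  # kcal * angstrom / (mol * e**2)


def coulomb_constant_dielectric(qi, qj, dx, dy, dz, eps=1.0):
    """Standard Coulomb energy: needs r, hence a square root."""
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    return COULOMB_CONSTANT * qi * qj / (eps * r)


def coulomb_distance_dependent(qi, qj, dx, dy, dz, eps0=1.0):
    """Shielded Coulomb energy with eps(r) = eps0 * r (assumed form).

    The energy becomes proportional to 1/r**2, so the squared distance
    computed from the coordinate differences can be used directly.
    """
    r2 = dx * dx + dy * dy + dz * dz
    return COULOMB_CONSTANT * qi * qj / (eps0 * r2)


print(coulomb_constant_dielectric(0.4, -0.4, 3.0, 0.0, 0.0))
print(coulomb_distance_dependent(0.4, -0.4, 3.0, 0.0, 0.0))
```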

Note. A distance-dependent dielectric constant cannot be used on a periodic structure with the Ewald sum method.

Further information

Non-bond cutoffs Atom-based cutoffs Charge groups and group-based cutoffs Cell multipole methods Ewald sums for periodic systems

Non-bond cutoffs An energy expression is computationally tractable only for systems with relatively small numbers of atoms. The number of internal coordinates grows linearly with the size of a structure, so the computational work involved in the first nine terms in Eq. 9 also grows linearly.

However, inspection of the final summation, which represents the non-bond interactions, reveals a quadratic dependence on the number of atoms in the system. Thus, if the system of interest has 1000 atoms, the non-bond summation has about 500,000 terms. If it has 10,000 atoms, the summation has 50,000,000.
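
The quadratic growth follows from counting unique atom pairs, N(N - 1)/2, before any exclusions or cutoffs are applied. A minimal check (the function name is hypothetical):

```python
def nonbond_pair_count(n_atoms):
    """Number of unique atom pairs, before exclusions or cutoffs."""
    return n_atoms * (n_atoms - 1) // 2


for n in (1000, 10000):
    print(f"{n:6d} atoms -> {nonbond_pair_count(n):,} pairs")
```

This gives 499,500 pairs for 1000 atoms and 49,995,000 pairs for 10,000 atoms, which are the rounded figures quoted above.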

Therefore, it is common to neglect or approximate non-bond interactions between widely separated pairs of atoms.

Choosing how to treat long-range non-bond interactions is an important factor in determining the accuracy and the time taken to perform an energy calculation.

Several cutoff methods are discussed in the Atom based cutoffs and Group based cutoffs topics. In addition, a good review was published by Brooks et al., 1985b. More recently, two other methods, cell multipoles and Ewald sums, have also become available. You should read all of these topics to decide which method is best for your structure and computational problem.

Note. Generally, the same non-bond methods and specifications should be used for all energy calculations within a given project. However, the method and/or specifications used for van der Waals interactions may differ from those used for Coulombic interactions.

Effect of non-bond cutoff distance To appreciate the impact of cutoffs on computational efficiency, consider a receptor-ligand-solvent system with a total of 5000 atoms. An example would be a small protein (100-150 residues) surrounded by 1-2 layers of water.

Figure 7 shows how the number of non-bond interactions increases with the cutoff distance. This calculation would run at least 10 times faster with an 8.0 Å cutoff than with no cutoff (assuming that the non-bond term is rate limiting, which it usually is). The trade-off is that interactions beyond the cutoff distance are not accounted for.

Figure 7. Number of non-bond interactions as a function of cutoff distance The number of non-bond pairwise interactions (in millions) expected for a 5000-atom system as a function of cutoff distance. The time required to evaluate the total energy of this system is approximately proportional to the number of non-bond interactions.



The significance of non-bond interactions beyond the cutoff distance depends on the system being simulated. When modeling an isolated molecule or cluster, the use of cutoffs for van der Waals interactions is quite reasonable, since the potential is relatively short range, decreasing at a rate of 1/r^6. At distances of 8-10 Å, the energy and forces are therefore quite small.

The situation is slightly different when modeling disordered or ordered crystalline systems. For a typical disordered system, which might consist of a cube of organic material with ~25 Å edges, the contribution of van der Waals interactions at distances greater than 8-10 Å to the energy and pressure can amount to ~50-200 kcal mol^-1 and ~500-1000 bar (0.05-0.10 GPa), respectively, although the contributions of electrostatic interactions are much smaller. Contributions from remote non-bond interactions of all types to the resultant force on the atoms are also small. In such systems, it is possible to apply tail corrections, which permit the use of 8-10 Å cutoffs while simultaneously yielding accurate values of energy and pressure.

Finally, in periodic crystalline systems, both van der Waals interactions and electrostatic interactions can be significant up to 15 Å or more. For example, in a calculation of the energy as a function of cutoff distance in the hexapeptide crystal, [Ala-Pro-D-Phe]2, Kitson and Hagler showed that the non-bond energy rose from 63% to 97% of its asymptotic value as the cutoff distance was increased from 8 Å to 15 Å (Kitson and Hagler, 1988).

Figure 8. Van der Waals energy as a function of cutoff distance The van der Waals energy for the hexapeptide crystal, [Ala-Pro-D-Phe]2, as a function of cutoff distance. The van der Waals energy does not converge until approximately 20 Å.



Neighbor lists and buffer widths To maximize the efficiency of non-bond calculations, MS Modeling creates a neighbor list that contains all atom pair interactions to be considered during the calculation. Atom pairs are not included in the list if they are too far apart or if they are excluded.

Neighbor list generation was chosen over other approaches for computational efficiency:

• A pairwise search through all atoms at every step of a calculation is computationally expensive.

• During minimization or dynamics, the distances between atoms do not change radically between one step and the next.

Although a neighbor list requires time to set up, overall time is saved for models containing more than about 50 atoms, because the list is not recalculated each time the energy expression is evaluated. Since the list is not updated at every step, it includes atoms in a buffer region that might move close enough together to contribute to the energy calculation before the next update of the neighbor list.

To ensure that no atoms outside the buffer region can move close enough to interact during an energy minimization or molecular dynamics simulation, the non-bond list is automatically updated whenever any atom moves more than half the buffer width. Thus, the width of the buffer region, coupled with the velocity with which atoms move, determines the maximum amount of time before the neighbor list is updated.
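
The buffer logic can be sketched as follows. This is an illustrative brute-force version for a non-periodic structure, not the MS Modeling implementation; the function names are hypothetical and distances are assumed to be in angstroms.

```python
import math


def build_neighbor_list(coords, cutoff, buffer_width):
    """List every pair closer than cutoff + buffer_width (simple O(N^2) search)."""
    limit2 = (cutoff + buffer_width) ** 2
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            dist2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            if dist2 < limit2:
                pairs.append((i, j))
    return pairs


def needs_update(coords, reference_coords, buffer_width):
    """True once any atom has moved more than half the buffer width
    from its position when the list was last built."""
    limit = 0.5 * buffer_width
    return any(math.dist(now, ref) > limit
               for now, ref in zip(coords, reference_coords))


reference = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
pairs = build_neighbor_list(reference, cutoff=8.0, buffer_width=2.0)
moved = [(0.9, 0.0, 0.0), (9.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
print(pairs, needs_update(moved, reference, buffer_width=2.0))
```

The factor of one half covers the worst case in which two atoms approach each other head-on: if neither has moved more than half the buffer width, their separation cannot have shrunk by more than the full buffer width, so no pair outside the list can have come within the cutoff.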

Long-range corrections For disordered periodic systems, contributions to the potential energy and pressure from van der Waals interactions outside the cutoff can be written as:

Eq. 13



Eq. 14

where N_i and ρ_i denote the number and number density of atoms of type i, U_ij(r) denotes the van der Waals non-bond potential describing interactions between atoms of types i and j, and g_ij(r) denotes the pair correlation function describing the probability of finding i and j at separation r relative to the probability of finding the pair at an infinite distance (McQuarrie, 1976).

In most cases, the function g_ij(r) is short range, reaching its limiting value of unity at distances of ~10 Å. Moreover, g_ij(r) - 1.0 is small even at shorter distances. As a result, accurate estimates of the tail corrections for all normal non-bond cutoff values may be safely made by setting all g_ij(r) = 1.0 in Eqs. 13 and 14.
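
With the pair correlation function set to 1.0, the tail integrals can be evaluated in closed form for simple potentials. The sketch below assumes a single atom type interacting through a Lennard-Jones 12-6 potential, U(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6], and uses the standard textbook expressions for the energy and pressure corrections; the notation and function names are assumptions and may differ from Eqs. 13 and 14. Any consistent set of units may be used.

```python
import math


def lj_energy_tail(n_atoms, density, eps, sigma, r_cut):
    """Energy tail correction, (N * rho / 2) * integral of 4*pi*r^2*U(r) from r_cut."""
    s3 = (sigma / r_cut) ** 3
    s9 = s3 ** 3
    return (8.0 / 3.0) * math.pi * n_atoms * density * eps * sigma ** 3 * (s9 / 3.0 - s3)


def lj_pressure_tail(density, eps, sigma, r_cut):
    """Pressure tail correction, -(rho^2 / 6) * integral of 4*pi*r^3*dU/dr from r_cut."""
    s3 = (sigma / r_cut) ** 3
    s9 = s3 ** 3
    return (16.0 / 3.0) * math.pi * density ** 2 * eps * sigma ** 3 * (2.0 * s9 / 3.0 - s3)


# Reduced units (eps = sigma = 1), one atom, unit density, cutoff of 2.5 sigma
print(lj_energy_tail(1, 1.0, 1.0, 1.0, 2.5))
print(lj_pressure_tail(1.0, 1.0, 1.0, 2.5))
```

Both corrections depend on the configuration only through the number of atoms and the volume, which is why they add so little to the cost of a simulation.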

Note. Applying Eqs. 13 and 14 at each step in a simulation contributes negligibly to the overall simulation cost, since for constant-volume simulations the full correction may be precomputed, and in simulations where the volume fluctuates it is only necessary to recompute the volume at each step.

Cell-based cutoffs MS Modeling also supports cell-based cutoffs for periodic systems. This is another image-based method, in which the neighbor list is based on a specified number of cell layers surrounding the central cell.

Further information

Atom-based cutoffs Charge groups and group-based cutoffs Cell multipole methods Ewald sums for periodic systems

Atom-based cutoffs A simple approach to the calculation of long-range non-bond interactions is the direct method, where non-bond interactions are simply calculated to a cutoff distance and interactions beyond this distance are ignored. However, the direct method can lead to discontinuities in the energy and its derivatives. As an atom pair distance moves in and out of the cutoff range between calculation steps, the energy jumps, since the non-bond energy for that atom pair is included in one step and excluded from the next. (For small models you may, of course, calculate all non-bond interactions by setting a large enough cutoff distance and using the direct method.)

Minimizing discontinuities in the potential energy surface To avoid the discontinuities caused by direct cutoffs, most simulations use some kind of switching function to smoothly turn off non-bond interactions over a range of distances. S(r) in Figure 9 shows the features that a switching function must have:

• it must be unity for small non-bond distances, where the greatest changes in the potential occur
• at intermediate non-bond distances it must tend smoothly to zero
• it must be zero for large distances.

An effective potential is created by multiplying the actual potential by the switching function. The choice of function in the intermediate range is crucial: it should be continuously differentiable in this region so that forces can be calculated. One possible choice is a spline. The range over which S(r) tends to zero is also important. As indicated in Figure 9, the upper limit of this range (i.e., the large non-bond distance) is the cutoff distance. The location of the lower limit is variable and often requires some investigation. If it is too large (i.e., the spline width is small), unrealistic forces may be calculated; if it is too small, important features of the equilibrium region of the potential might be lost.
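A switching function with the three required properties can be sketched as follows. The quintic polynomial used here is only one possible choice and is not necessarily the spline used by any particular simulation engine; the names r_on and r_off (lower and upper limits of the switching range) are hypothetical.

```python
def switch(r, r_on, r_off):
    """Quintic switching function: 1 below r_on, 0 above r_off, with
    continuous first and second derivatives at both ends of the range."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    t = (r - r_on) / (r_off - r_on)      # 0 at r_on, 1 at r_off
    return 1.0 - t ** 3 * (10.0 - 15.0 * t + 6.0 * t * t)

def switched_energy(r, potential, r_on, r_off):
    """Effective potential: the actual potential multiplied by S(r)."""
    return potential(r) * switch(r, r_on, r_off)

# Example: switch an inverse-sixth-power attraction off between 8.5 and 9.5.
dispersion = lambda r: -1.0 / r ** 6
samples = [switched_energy(r, dispersion, 8.5, 9.5) for r in (8.0, 9.0, 9.5, 10.0)]
```

Narrowing the interval between r_on and r_off makes S(r) steeper, and since the force contains the derivative of S(r), this is the origin of the unrealistic forces mentioned above for a small spline width.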

Figure 9. Application of a switching function Energy = E(r) S(r). Variable names in Accelrys simulation engines that relate to cutoffs are also illustrated. Thick dark curve = the unmodified van der Waals potential; dashed curve = the switching function S(r); gray curve = the resulting, switched potential.



Further information

Charge groups and group-based cutoffs Cell multipole methods Ewald sums for periodic systems Non-bond cutoffs

Charge groups and group-based cutoffs Most forcefields include terms for the Coulombic interaction between the partial charges that are located on each atom. Partial charges occur even for charge-neutral species, such as a water molecule, and reflect the difference in electronegativity between the atoms from which the molecule is composed. In simulations, these charges are often estimated using methods such as charge equilibration or assigned by the forcefield being used, e.g., COMPASS. When evaluating these Coulomb terms, care must be taken when using cutoff techniques to ensure that erroneous monopole terms are not introduced. To understand why, consider the following: the interaction energy between two monopoles, each with 1 e.u. of charge, is about 33 kcal mol⁻¹ at 10 Å, while that for two dipoles at the same distance, formed from unit monopoles 1 Å apart, is no more than about 0.3 kcal mol⁻¹.

Clearly, ignoring monopole-monopole interactions would give grossly misleading results, whereas ignoring dipole-dipole interactions would be only a modest approximation. However, problems can also occur if monopole interactions are treated unevenly. This may happen if non-bond cutoffs are applied to a structure on an atom-by-atom basis, generating spurious monopoles by artificially splitting dipoles (when one of the dipole atoms is inside the cutoff and one is outside). Instead of ignoring a relatively small dipole-dipole interaction, this would artificially introduce a large monopole-monopole interaction. To avoid these artifacts, MS Modeling can apply cutoffs over charge groups.
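The magnitudes quoted above, and the artifact caused by splitting dipoles, can be checked with a short calculation. This is a sketch under stated assumptions: the conversion constant of about 332 kcal Å mol⁻¹ e⁻² is the usual Coulomb prefactor in these units, the two dipoles are placed side by side and parallel (other orientations give dipole-dipole energies of the same order), and the 10.02 Å cutoff is chosen purely to force the splitting.

```python
import math

COULOMB = 332.06  # kcal * angstrom / (mol * e^2), approximate

def coulomb_energy(group_a, group_b, cutoff=None):
    """Sum q_a q_b / r over atom pairs; with a cutoff, pairs are
    accepted or rejected atom by atom (the atom-based scheme)."""
    total = 0.0
    for qa, pa in group_a:
        for qb, pb in group_b:
            r = math.dist(pa, pb)
            if cutoff is None or r <= cutoff:
                total += COULOMB * qa * qb / r
    return total

# Two unit monopoles 10 angstroms apart.
monopole_a = [(1.0, (0.0, 0.0, 0.0))]
monopole_b = [(1.0, (10.0, 0.0, 0.0))]

# Two parallel dipoles 10 angstroms apart, each built from
# unit charges separated by 1 angstrom.
dipole_a = [(+1.0, (0.0, 0.0, 0.5)), (-1.0, (0.0, 0.0, -0.5))]
dipole_b = [(+1.0, (10.0, 0.0, 0.5)), (-1.0, (10.0, 0.0, -0.5))]

e_monopoles = coulomb_energy(monopole_a, monopole_b)    # about 33 kcal/mol
e_dipoles = coulomb_energy(dipole_a, dipole_b)          # about 0.3 kcal/mol

# An atom-based cutoff of 10.02 keeps the two like-charge pairs (10.00 apart)
# but drops the two unlike-charge pairs (about 10.05 apart).
e_split = coulomb_energy(dipole_a, dipole_b, cutoff=10.02)
```

With the dipoles split in this way, the small dipole-dipole energy is replaced by two uncompensated monopole-monopole terms, so the computed interaction is larger by roughly two orders of magnitude.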

A 'charge group' is a small group of atoms close to one another that has a net charge of zero or almost zero. In practical applications, charge groups are identical to common chemical functional groups, e.g., a carbonyl group, methyl group, or carboxylate group would all be approximately neutral charge groups. The potential exerted by such a group of atoms at a distance, R, from the center of the group can be obtained by making an expansion in terms of the inverse powers of R:

Eq. 15

where Q is the total charge of the group, μ is the dipole moment of the group and Θ is the quadrupole moment. (The terms in θ describe the orientation of the charge group.) Thus, if the group is charge neutral, the leading term will be proportional to μ/R². Similarly, if the center of another charge group is placed at a distance, R, from the first, a similar expansion in terms of R can be written and, if the two groups are both charge neutral, it can be shown that the leading interaction will be proportional to μμ'/R³ (see Maitland et al., 1987), where μ' is the dipole moment of the second charge group. This observation is the basis of the charge group method. The R⁻³ dependence of charge group interactions suggests that they decay more rapidly than charge-charge interactions and the above example is a demonstration of this. For a given cutoff, charge groups can be expected to give a better estimate of the Coulomb interaction than atom-based cutoff methods.



In practice, expansions such as that above are not used explicitly because as R becomes small, the number of terms needed for the expansion to converge increases rapidly. Instead, all the pairwise interactions between the atoms of the interacting groups are evaluated explicitly. At first glance, this may appear to offer little or no advantage over atom-based cutoff methods, but the grouping of atoms in this way and the evaluation of all pair interactions mean that no dipoles are split, so avoiding the problems outlined above.

Another frequent benefit is that group-based methods tend to be faster than similar atom-based methods using the same cutoff. The interaction of two groups is determined by the distance between their centers. (There is some flexibility in assigning the center of a charge group. An obvious candidate is the center of charge, but for a neutral group this can be difficult to define. The center of geometry is another possibility. Typically, though, the center of a charge group is taken as the atom nearest the center of geometry, and this atom is termed the switching atom.) If the distance between centers falls within the specified cutoff distance, all atomic pair interactions are included, and if not, all atomic pair interactions are excluded. As there is only one switching atom per charge group, the number of tests that need to be done to determine which groups interact is far smaller than for atom-based methods. Furthermore, the number of pair interactions that need to be evaluated can be smaller.
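The all-or-nothing test on switching atoms can be sketched as below. The function names and the water-like coordinates are hypothetical, and the switching atom is chosen as the atom nearest the center of geometry, as described above; an actual engine may assign it differently.

```python
import math

def switching_atom(positions):
    """Index of the atom nearest the center of geometry of a group."""
    n = len(positions)
    center = tuple(sum(p[k] for p in positions) / n for k in range(3))
    return min(range(n), key=lambda i: math.dist(positions[i], center))

def interacting_atom_pairs(group_a, group_b, cutoff):
    """One distance test per group pair: return every atom pair if the
    switching atoms are within the cutoff, otherwise return none."""
    sa = group_a[switching_atom(group_a)]
    sb = group_b[switching_atom(group_b)]
    if math.dist(sa, sb) > cutoff:
        return []
    return [(pa, pb) for pa in group_a for pb in group_b]

# Two hypothetical three-atom groups, the second displaced 9.0 along x.
group_a = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
group_b = [(x + 9.0, y, z) for (x, y, z) in group_a]

pairs_inside = interacting_atom_pairs(group_a, group_b, cutoff=9.5)
pairs_outside = interacting_atom_pairs(group_a, group_b, cutoff=8.5)
```

With the 9.5 cutoff all nine atom pairs are kept, including one pair more than 10 apart; with the 8.5 cutoff all nine are dropped, including one pair less than 8 apart. In neither case is a group split, and only one distance test is needed instead of nine.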

Further information

Non-bond cutoffs Atom-based cutoffs Cell multipole methods Ewald sums for periodic systems Charge group assignment

Cell multipole method The cell multipole method is a way of handling non-bond interactions in both nonperiodic and periodic systems that is more rigorous and efficient than cutoffs. This method (Greengard and Rokhlin, 1987; Schmidt and Lee, 1991; Ding et al., 1992) is a hierarchical approach that allows the accuracy of the non-bond calculation to be controlled. Short range interactions are treated in the usual way, but long range group-group interactions are treated in terms of multipoles. Computational time scales linearly with N (the number of atoms).

The cell multipole method applies to the general energy expression of the following form:

Eq. 16

where φi is the potential at atom i, Rij is the distance between atom i and atom j, p is a number (p = 1 for Coulombic and 6 for London dispersion interactions, for example), and the λ's are general charges. For Coulombic interactions, the λ's are real charges.

The general potential φi may be divided into a near-field potential due to the surrounding atoms (those within a few angstroms) and a far-field potential due to the rest of the atoms that interact with the ith atom.

The number of interactions in the near field is limited, so it is relatively easy to calculate the near-field potential exactly. The number of interactions in the far field is of order N², making an exact calculation of this potential intractable for large models. The cell multipole method calculates the far-field potential accurately and efficiently in the following manner.

Derivation of cell multipole method We begin by placing an arbitrarily shaped molecule in a rectangular box. The box is then divided into a number of basic cells of length 4-6 Å, containing 2-4 atoms on average. The basic cell level is denoted level A in Figure 10. Starting from a corner of the box, every eight basic cells may be considered to constitute a larger, parent cell, termed level B. Every eight parent cells may constitute a grandparent cell, termed level C. This procedure is repeated until only a few large cells fill the box. For example, considering any atom in cell A0 of the three-level cell system, the other atoms in A0 and all atoms in An contribute to the near-field potential, and the atoms in Af, B, and C contribute to the far-field potential.

Figure 10. Three-level hierarchical cell system Definition of hierarchical cells and division of near field and far field for a basic cell A0. Larger cells are formed as cells are farther from cell A0 (this constitutes the hierarchy). The near field is one layer thick.

Key steps used in cell multipole method The cell multipole method involves the following key steps:

1. Multipole expansion and calculation of general multipole moments.



The potential associated with each basic cell can be represented as a general potential originating at the center of the cell. This potential may be expanded into an infinite series of multipole moments. For example, the potential associated with cell Af in Figure 10, centered at rAf, is expressed as:

Eq. 17

where R = rAf - r; r is any point outside cell Af; α, β = x, y, z; and Z, D, and Q are monopoles, dipoles, and quadrupoles, respectively.

The potentials associated with the higher-level cells can be expanded in an analogous manner, with moments derived from lower-level cell moments.

2. Generation of Taylor coefficients.

Using this expansion to represent the potentials associated with Af-, B-, and C-level cells, the far-field potential of cell A0 may be obtained by summing all the far-cell contributions. The resulting potential may now be expanded as a Taylor series about the center of cell A0:

Eq. 18

where rA0 is the position vector of the center of cell A0 and the expansion is in the displacement r - rA0 from that center. The Taylor coefficients in Eq. 18 are due to all the far-cell contributions.

A key point of the cell multipole method is that, once the set of Taylor coefficients is calculated at rA0, the far-field potential of any atom in cell A0 is obtained easily through Eq. 18.

Since the Taylor coefficients must be generated for every basic cell, another key point of the cell multipole method is efficient generation of these coefficients. A hierarchical procedure is used, in which coefficients determined for higher-level cells are propagated to the coefficients for lower-level cells. Thus, coefficients for a child B cell are obtained by adding contributions directly translated from the C-level coefficients at the center of the parent C cell to the coefficients at the center of B, generated by considering only the B-cell contributions.
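Step 1 can be illustrated for the Coulombic case (p = 1) with a small numerical check. This is a sketch only: it uses the standard Cartesian expansion truncated after the quadrupole term with a traceless quadrupole tensor, it takes R as the vector from the cell center to the field point (the opposite sign to the definition of R given for Eq. 17), and the charges and positions are invented for the example. It does not attempt the hierarchical propagation of Taylor coefficients described in step 2.

```python
import math

def moments(charges, center):
    """Monopole Z, dipole D and traceless quadrupole Q of a cell,
    taken about the cell center."""
    Z = 0.0
    D = [0.0, 0.0, 0.0]
    Q = [[0.0] * 3 for _ in range(3)]
    for q, pos in charges:
        d = [pos[k] - center[k] for k in range(3)]
        d2 = sum(x * x for x in d)
        Z += q
        for a in range(3):
            D[a] += q * d[a]
            for b in range(3):
                Q[a][b] += q * (3.0 * d[a] * d[b] - (d2 if a == b else 0.0))
    return Z, D, Q

def expanded_potential(Z, D, Q, center, point):
    """Potential at a far point from the first three multipole terms."""
    R = [point[k] - center[k] for k in range(3)]
    Rn = math.sqrt(sum(x * x for x in R))
    mono = Z / Rn
    dip = sum(D[a] * R[a] for a in range(3)) / Rn ** 3
    quad = 0.5 * sum(Q[a][b] * R[a] * R[b]
                     for a in range(3) for b in range(3)) / Rn ** 5
    return mono + dip + quad

def exact_potential(charges, point):
    """Direct sum of q / r over every charge in the cell."""
    return sum(q / math.dist(pos, point) for q, pos in charges)

# Hypothetical cell contents and a field point 15 length units away.
center = (0.0, 0.0, 0.0)
cell = [(0.4, (1.0, 0.5, 0.0)), (-0.8, (0.0, 0.0, 0.0)),
        (0.4, (-1.0, 0.5, 0.0)), (0.5, (0.3, -0.9, 1.2))]
point = (12.0, 9.0, 0.0)

Z, D, Q = moments(cell, center)
phi_exact = exact_potential(cell, point)
phi_multipole = expanded_potential(Z, D, Q, center, point)
error_full = abs(phi_exact - phi_multipole)
error_monopole_only = abs(phi_exact - Z / 15.0)
```

The moments are computed once per cell and can then be reused for every distant field point, which is the source of the method's efficiency; the truncation error falls off with each added multipole order as the ratio of cell size to distance decreases.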

Improved computational performance and accuracy

The cell multipole method is an order-N method. The time savings with respect to an exact N² algorithm, as well as the improved accuracy relative to using cutoffs, can be dramatic.



Non-bond interaction energies Due to the nature of the cell multipole method, specific non-bond interaction energies cannot be calculated. The per-atom energy is calculated by using the cell multipole method and the non-bond interaction energy is calculated using the group-based method. You can specify cutoffs for the group-based method of non-bond analysis. A large cutoff in the group-based method may give reasonably accurate energies compared with the cell multipole method.

Further information

Atom-based cutoffs Charge groups and group-based cutoffs Cell multipole methods Non-bond cutoffs

Ewald sums for periodic systems The Ewald technique (Tosi, 1964; Ewald, 1921) is a method for the computation of non-bond energies in periodic systems. Crystalline solids are the most appropriate candidates for Ewald summation, partly because the error associated with using cutoff methods is much greater in an infinite lattice. However, the technique can also be applied to amorphous solids and solutions.

Figure 11 shows the electrostatic energy for quartz as computed by various techniques. You might expect that all the techniques should converge to the same value at high cutoff distances. However, the direct atom-based cutoff approach yields results that fluctuate dramatically as the cutoff increases, even at large cutoff values. The reason for this is that the sum is only conditionally convergent. As the cutoff increases, charges of opposite sign are taken into account and the partial sum is modified significantly. Worse, reordering the terms of a conditionally convergent series can yield arbitrary results. The problem then, is to find physically and chemically meaningful orderings of the series.
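The effect of reordering a conditionally convergent sum can be demonstrated with the alternating harmonic series, which is, up to a factor of -2, the lattice sum for the energy of one ion in an infinite one-dimensional chain of alternating unit charges. This is a toy illustration of the mathematical point only and is not the quartz calculation of Figure 11; the function names are hypothetical.

```python
import math

def natural_order(n_terms):
    """1 - 1/2 + 1/3 - 1/4 + ... summed in order of increasing distance."""
    return sum((-1.0) ** (k + 1) / k for k in range(1, n_terms + 1))

def two_positive_one_negative(n_blocks):
    """The same terms reordered: (1 + 1/3 - 1/2) + (1/5 + 1/7 - 1/4) + ..."""
    total = 0.0
    for b in range(n_blocks):
        total += 1.0 / (4 * b + 1) + 1.0 / (4 * b + 3) - 1.0 / (2 * b + 2)
    return total

s_natural = natural_order(200000)                 # approaches ln 2
s_reordered = two_positive_one_negative(100000)   # approaches (3/2) ln 2
```

Every term of the series appears exactly once in both orderings, yet the two limits differ by 50%. A cutoff scheme is, in effect, a choice of ordering, which is why the convention used to truncate the electrostatic sum matters so much.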

Figure 11. Electrostatic energy vs. cutoff distance for quartz The electrostatic energy of quartz was calculated with Discover using several methods. Medium line with points: using atom-based cutoffs; thin dark line: using cell-based cutoffs; thick line: using group-based cutoffs; same thick line: by the Ewald method with dipole correction; and medium dashed line: by the Ewald method without dipole correction.



The cell-based and group-based cutoff techniques are natural candidates. However, they yield somewhat different values, due to the different cutoff conventions employed. The group-based technique computes the result for a sphere, but the cell-based technique computes the result for a parallelepiped that preserves the shape of the unit cell.

A standard Ewald calculation that does not take the dipole moment of the unit cell into account yields yet another value. An Ewald calculation that includes the effect of the dipole moment agrees with the group-based calculation (Figure 11).

For van der Waals energy, the energy sum is absolutely convergent, and no chaotic behavior arises from the direct approach. Even so, as Figure 12 indicates, the convergence of the dispersive energy is slower than might be expected. Even with a cutoff distance of 30 Å, the error is a significant fraction of 0.1 kcal mol⁻¹. (The Ewald calculation is less costly for comparable accuracy.) The repulsive energy, on the other hand, converges at a cutoff distance of only 15 Å and needs no special treatment. (However, atom-based calculations for much larger systems show that sometimes even the repulsive energy can exhibit a surprisingly high error at a cutoff of 12 Å.)

Figure 12. van der Waals energy vs. cutoff distance for NaCl The graph shows the (solid lines) dispersive and (dashed line) repulsive portions of the van der Waals energy as a function of the cutoff distance, as calculated by the (thin lines) atom-based and (thick line) Ewald methods. The Ewald calculation was performed with Discover to an accuracy of 1e-6, which requires a cutoff distance of 9.5 Å.



Theory of Ewald technique For full details on the Ewald summation method and parameter optimization procedure used in MS Modeling, refer to Karasawa and Goddard (1989).

The Ewald method for improving convergence is to multiply a general lattice sum:

Eq. 19

by a convergence function φ(r), which decreases rapidly with r. Of course, to preserve equality, one must then add a second term in which the lattice sum is multiplied by 1 - φ(r):

Eq. 20

Here, the first term converges quickly, because φ(r) decreases rapidly. Ewald's insight was that the second term can be Fourier transformed to provide a rapidly converging sum over the reciprocal lattice. The sum over L in Eq. 19 runs over all lattice vectors, but the i=j terms must be omitted when L=0.

The convergence functions



The convergence function for the electrostatic energy is:

Eq. 21

and for the dispersive energy:

Eq. 22

Optimizing computational effort The electrostatic convergence function φ1 was also used by Catlow and Norgett (1976) and Karasawa and Goddard (1989). The dispersive convergence function φ6 was recommended and used by Karasawa and Goddard. The convergence parameter plays a similar role in both cases. As the convergence parameter increases, the real-space sum converges more rapidly and the reciprocal-space sum converges more slowly. (That is, a large value implies a heavy computational load for reciprocal space, and a small value implies a heavy computational load for real space.) Cutoffs must be adjusted accordingly, and processing time is affected by the cutoffs. A value of the convergence parameter that balances processing in the real and reciprocal spaces proves to be optimal. The same value can be used for both the dispersive and electrostatic energy, and thus they can be combined for greater efficiency.

MS Modeling automatically chooses the convergence parameter so as to balance the computational loads for real and reciprocal space.
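The split into real-space and reciprocal-space sums can be illustrated with a compact Ewald calculation for a rock-salt cell. This is a sketch under several assumptions and is not the Karasawa and Goddard formulation or the MS Modeling implementation: it uses the common textbook form with convergence function erfc(alpha r), a cubic cell, fixed truncation limits n_real and n_recip, no dipole correction (the chosen cell has zero dipole moment), and units in which the factor 1/(4πε0) is dropped. The names alpha, n_real and n_recip are hypothetical.

```python
import math
from itertools import product

def ewald_energy(charges, positions, box, alpha, n_real=2, n_recip=5):
    """Electrostatic energy per cubic cell of edge `box` (charge-neutral)."""
    n = len(charges)
    volume = box ** 3

    # Real-space sum over the central cell and nearby lattice images.
    e_real = 0.0
    for shift in product(range(-n_real, n_real + 1), repeat=3):
        for i in range(n):
            for j in range(n):
                if i == j and shift == (0, 0, 0):
                    continue  # omit the i = j term when L = 0
                d = [positions[i][k] - positions[j][k] - shift[k] * box
                     for k in range(3)]
                r = math.sqrt(sum(x * x for x in d))
                e_real += 0.5 * charges[i] * charges[j] * math.erfc(alpha * r) / r

    # Reciprocal-space sum over non-zero reciprocal lattice vectors.
    e_recip = 0.0
    for m in product(range(-n_recip, n_recip + 1), repeat=3):
        if m == (0, 0, 0):
            continue
        k = [2.0 * math.pi * mk / box for mk in m]
        k2 = sum(x * x for x in k)
        phases = [sum(k[c] * p[c] for c in range(3)) for p in positions]
        s_re = sum(q * math.cos(ph) for q, ph in zip(charges, phases))
        s_im = sum(q * math.sin(ph) for q, ph in zip(charges, phases))
        e_recip += ((2.0 * math.pi / volume) * math.exp(-k2 / (4.0 * alpha ** 2))
                    / k2 * (s_re ** 2 + s_im ** 2))

    # Self-energy correction for the screening charge on each ion.
    e_self = -alpha / math.sqrt(math.pi) * sum(q * q for q in charges)
    return e_real + e_recip + e_self

# Rock-salt cell with nearest-neighbor distance 1 (cell edge 2), unit charges.
box = 2.0
positions = [(0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1),   # cations
             (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]   # anions
charges = [1.0] * 4 + [-1.0] * 4

energy = ewald_energy(charges, positions, box, alpha=1.0)
madelung = -energy / 4.0          # per ion pair; four ion pairs per cell
energy_other_alpha = ewald_energy(charges, positions, box, alpha=1.5)
```

The total is independent of alpha once both sums are converged, so the parameter only redistributes work between the two sums: in this sketch a larger alpha would permit a smaller n_real but require a larger n_recip, which is the trade-off described above.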

Electrostatic energy The Ewald expression for the electrostatic energy is (dropping a factor of 1/(4πε0)):

Eq. 23

where a= |ri - rj - RL|; ri =
