Automatic Construction of Ab Initio Potential Energy Surfaces
Interpolative Moving Least Squares (IMLS) Fitting of Ab Initio Data for Constructing Global Potential Energy
Surfaces for Spectroscopy and Dynamics
Donald L. Thompson
University of Missouri – Columbia
Richard Dawes, Al Wagner, & Michael Minkoff
Fourth International Meeting: "Mathematical Methods for Ab Initio Quantum Chemistry"
13-14 November 2008
Laboratoire J.A. Dieudonné
CNRS et Université de Nice - Sophia-Antipolis
Potential Energy Surfaces
• Basis for quantum and classical dynamics and for spectroscopy
• Electronic structure calculations can provide accurate energies (even gradients and Hessians), but at a high cost: a highly accurate energy calculation for a single geometry can take hours or days
We want to:
• Generate accurate global PESs fit to a minimum number (100s to 1000s) of ab initio points
• Make ab initio dynamics feasible for the highest levels of quantum chemistry methods (for which gradients may not be directly available)
• Be as "blackbox" as possible
Requirements:
• Minimize the number of ab initio points
• Minimal human effort and cost of fitting
• Low-cost, accurate evaluations
Our approach: Interpolating Moving Least Squares (IMLS)
• Much cheaper than high-level quantum chemistry
• Doesn't need gradients, but can use gradients and Hessians
• Can use high-degree polynomials
How to make it efficient and practical:
• Optimally place a minimum number of points
• Weight functions
• Reuse fitting coefficients (store local expansions)
• Use a zeroth-order PES and fit the difference
• Other techniques
Least-Squares Fitting
Usual applications are for data with statistical errors, but trends that follow known functional forms.
[Figure: "1-D Morse function: five data points". Axes: energy vs. bond distance. Legend: fitting data, exact function.]
Fitting ab initio energies
• Ab initio energies do not have random errors
• A PES does not have a precisely known functional form; the energy points lie on a surface of unknown shape
• Thus, fit with a general basis set (e.g., polynomials)
• Basis functions that resemble the "true" function provide a more compact representation
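Weighted least squares equations

$$V_{\mathrm{fitted}}(z) = \sum_{j=1}^{m} a_j(z)\, b_j(z)$$

$$D[V_{\mathrm{fitted}}(z)] = \sum_{i=1}^{N} w_i(z)\,\bigl[V(z^{(i)}) - V_{\mathrm{fitted}}(z^{(i)})\bigr]^2$$

$$B^{T} W(z)\, B\, a(z) = B^{T} W(z)\, V$$

$$B = \begin{pmatrix}
b_1(z^{(1)}) & b_2(z^{(1)}) & \cdots & b_m(z^{(1)}) \\
b_1(z^{(2)}) & b_2(z^{(2)}) & \cdots & b_m(z^{(2)}) \\
\vdots & \vdots & & \vdots \\
b_1(z^{(N)}) & b_2(z^{(N)}) & \cdots & b_m(z^{(N)})
\end{pmatrix}$$

• W = 1 gives standard least squares
• We use standard routines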
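The equations above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the Morse parameters, the monomial basis, and the inverse-distance weight 1/(|z - z_i|^p + eps) are hypothetical choices made for the example, not the weight function or basis used in the talk.

```python
import numpy as np

def morse(r, de=100.0, a=1.0, re=1.0):
    """Toy 1-D Morse curve standing in for ab initio energies (assumed parameters)."""
    return de * (1.0 - np.exp(-a * (r - re))) ** 2

def basis(z, degree):
    """Rows of B: monomials 1, z, z**2, ... up to the given degree."""
    return np.vander(np.atleast_1d(z), degree + 1, increasing=True)

def weights(z, z_data, p=4, eps=1e-10):
    """Hypothetical inverse-distance weight; very large near each data point."""
    return 1.0 / (np.abs(z - z_data) ** p + eps)

def wls_eval(z, z_data, v_data, degree, w=None):
    """Solve B^T W B a = B^T W V and return sum_j a_j b_j(z)."""
    B = basis(z_data, degree)
    if w is None:
        w = np.ones_like(z_data)  # W = 1: standard least squares
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) is equivalent to the weighted normal equations.
    a, *_ = np.linalg.lstsq(B * sw[:, None], v_data * sw, rcond=None)
    return float(basis(z, degree)[0] @ a)

def imls_eval(z, z_data, v_data, degree):
    """IMLS: the weights, and hence the coefficients a(z), depend on z."""
    return wls_eval(z, z_data, v_data, degree, weights(z, z_data))

z_data = np.linspace(0.5, 4.5, 5)   # five data points
v_data = morse(z_data)
print(wls_eval(z_data[0], z_data, v_data, 1) - v_data[0])   # standard fit misses the point
print(imls_eval(z_data[0], z_data, v_data, 1) - v_data[0])  # IMLS is close to zero
```

Because the weights are recomputed for every evaluation point, each IMLS evaluation requires its own least-squares solve.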
Weighted vs. standard least squares
[Figure, two panels: "1st-degree std least squares vs IMLS" and "2nd-degree std least squares vs IMLS". Axes: energy vs. bond distance. Legend: fitting data, exact function, std least squares, IMLS interpolation.]
• First degree: standard first-degree fit to the 5 points vs. first-degree IMLS
• Second degree: standard second-degree fit vs. second-degree IMLS
• IMLS fits perfectly at each point
Optimum Point Placement
• We want to do the smallest possible number of ab initio calculations
• A non-uniform distribution of points is best
• We can use the fact that IMLS fits perfectly at each point to determine where to place points for the most accurate fit using the fewest possible points
• Use IMLS fits of different degree
[Figure: "1-D Morse function: five data points". Axes: energy vs. bond distance. Legend: fitting data, exact function.]
Illustrate for a 1-D Morse potential with 5 "seed" points
Automatic Point Placement: 1-D Illustration
[Figure: "Squared difference surface indicates point where data is required". Axes: energy vs. bond distance. Legend: fitting data, exact function, 2nd-degree IMLS, 3rd-degree IMLS, squared difference.]
• Start with 5 uniformly placed points
• Fit with 2nd- & 3rd-degree IMLS
• Add a new point where they differ the most
Squared difference indicates where new points are needed
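The three steps above can be sketched as a short loop. This is a minimal 1-D illustration, assuming a toy Morse curve in place of ab initio data, a hypothetical inverse-distance weight, and a simple grid search for the maximum (adequate in 1-D only).

```python
import numpy as np

def morse(r, de=100.0, a=1.0, re=1.0):
    """Toy stand-in for an ab initio energy (assumed parameters)."""
    return de * (1.0 - np.exp(-a * (r - re))) ** 2

def imls(z, z_data, v_data, degree, p=4, eps=1e-10):
    """Weighted polynomial fit re-solved at the evaluation point z."""
    sw = np.sqrt(1.0 / (np.abs(z - z_data) ** p + eps))  # hypothetical weight
    B = np.vander(z_data - z, degree + 1, increasing=True)
    a, *_ = np.linalg.lstsq(B * sw[:, None], v_data * sw, rcond=None)
    return a[0]  # polynomial is centred on z, so the constant term is the value

def next_point(z_data, v_data, grid):
    """Grid location where the 2nd- and 3rd-degree IMLS fits disagree most."""
    diff2 = [(imls(z, z_data, v_data, 2) - imls(z, z_data, v_data, 3)) ** 2
             for z in grid]
    k = int(np.argmax(diff2))
    return grid[k], diff2[k]

z_data = np.linspace(0.5, 4.5, 5)      # 5 uniformly placed seed points
v_data = morse(z_data)
grid = np.linspace(0.5, 4.5, 401)
history = []
for _ in range(3):                      # add three new points, one at a time
    z_new, d2 = next_point(z_data, v_data, grid)
    history.append((z_new, d2))
    z_data = np.append(z_data, z_new)
    v_data = np.append(v_data, morse(z_new))
print(history)
```

Both fits interpolate the data, so the squared difference vanishes at existing points and the loop never proposes a point it already has.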
Automatic Point Placement
[Figure, four panels: 5 initial points; 1 new point added; 2 new points added; 3 new points added. Each panel is titled "Squared difference surface indicates point where data is required". Axes: energy vs. bond distance. Legend: fitting data, exact function, 2nd-degree IMLS, 3rd-degree IMLS, squared difference.]
Density adaptive weight function

$$w_i(z) = \frac{\exp\left[-\left(\dfrac{z - z_i}{d(i)}\right)^{2}\right]}{\left(\dfrac{z - z_i}{d(i)}\right)^{p}}$$

[Figure: plot of the weight function over the range -10 to 10; vertical axis 0 to 1×10^4.]
• Automatic point placement will generate a nonuniform density of points.
• Thus, we use a flexible, density-dependent weight function
High Dimensional Model Representation (HDMR) basis set

$$V(Q_1, Q_2, \ldots, Q_N) = \sum_{i} V_i^{(1)}(Q_i) + \sum_{i,j} V_{i,j}^{(2)}(Q_i, Q_j) + \ldots$$

• Can represent a high-dimensional function through an expansion of lower-order terms
• Can also use a full-dimensional expansion but restrict the order of terms differently
• Evaluation scales as NM². HDMR greatly reduces M.
• This also reduces the number of points required.
Accurate PESs from Low-Density Data
Initial testing for 3-D: HCN-HNC
We used the global PES fit to ab initio points by van Mourik et al.* as a source for (cheap) points.
• Saves time obtaining points
• Allows extensive error analyses
We fit using a (12,9,7) HDMR basis:
• 1-coordinate term truncated at 12th degree
• 2-coordinate term truncated at 9th degree
• 3-coordinate term truncated at 7th degree
• 180 basis functions
* T. van Mourik, G. J. Harris, O. L. Polyansky, J. Tennyson, A. G. Császár, and P. J. Knowles, J. Chem. Phys. 115, 3706 (2001).
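The 180 quoted for the (12,9,7) basis can be reproduced by a simple count. The counting convention below is an assumption, not something stated on the slide: one constant, plus, for each k-coordinate term, every monomial in which all k chosen coordinates appear with exponent at least 1 and the total degree does not exceed the truncation. The function names are hypothetical.

```python
from math import comb

def hdmr_basis_size(n_coords, max_degrees):
    """Number of basis functions for an HDMR truncation such as (12, 9, 7).

    max_degrees[k-1] is the maximum total degree of the k-coordinate terms.
    """
    total = 1  # constant term (assumed)
    for k, degree in enumerate(max_degrees, start=1):
        # comb(n_coords, k): ways to pick the k coordinates.
        # comb(degree, k): exponent sets with every exponent >= 1 and sum <= degree.
        total += comb(n_coords, k) * comb(degree, k)
    return total

def full_basis_size(n_coords, degree):
    """Number of monomials of total degree <= degree in n_coords variables."""
    return comb(n_coords + degree, n_coords)

print(hdmr_basis_size(3, (12, 9, 7)))      # 3-D HCN basis
print(full_basis_size(3, 8))               # full 8th-degree basis in 3-D
print(hdmr_basis_size(6, (10, 7, 5, 4)))   # a 6-D example
```

Under this convention the same count also gives 591 for a 6-D (10,7,5,4) basis and 651 for a 6-D (10,7,5,5) basis, which match the numbers quoted later for HOOH and HONO.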
Error as function of automatically selected data points
3-D HCN:HNC
[Figure: error (kcal/mol, 0 to 0.45) vs. number of points with Hessians (120 to 320). Curves: RMS succ. orders IMLS, RMS error IMLS, mean succ. orders IMLS, mean error IMLS. Successive-order differences: solid symbols. True errors: open symbols.]
• Automatic surface generation using (12,9,7) & (11,8,6) bases
• Data points: van Mourik et al. PES
• Seed points: start with 4, 6, & 8 for r, R, & cos θ
• Energy cutoff: 100 kcal/mol
The difference between successive orders closely follows the true error. Thus, adding points based on the difference criterion results in a converged true error.
Convergence rate dependence on basis set: HCN
[Figure: "Power-law convergence of 3-D PES (HCN)". Log-log plot of RMS error (kcal/mol, 0.00001 to 1) vs. number of points (100 to 10000). Curves: 4th-, 5th-, 6th-, 7th-, and 8th-degree bases, and HDMR (12,9,7).]
• Obeys a power law over 3 orders of magnitude
• Accuracy follows Farwig's* formula for power-law convergence
• Linear on a log-log plot with slope ~(n+1)/D, where n = degree of basis
• 8th degree & HDMR (12,9,7) both have ~180 functions, but HDMR converges faster
* R. Farwig, J. Comput. Appl. Math. 16, 79 (1986); Math. Comput. 46, 577 (1986).
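To give a feel for what a slope of ~(n+1)/D means in practice, the short sketch below tabulates it for the 3-D case. It assumes the error falls as (number of points)^(-(n+1)/D), with D taken to be the number of dimensions; both the sign convention and the reading of D are assumptions made for this illustration.

```python
def expected_slope(degree, dim):
    """Magnitude of the log-log slope, ~(n+1)/D."""
    return (degree + 1) / dim

def error_ratio(degree, dim, point_factor):
    """Predicted factor by which the error drops when the number of
    points grows by point_factor, assuming error ~ N**(-(n+1)/D)."""
    return point_factor ** expected_slope(degree, dim)

for n in (4, 5, 6, 7, 8):
    # degree, slope magnitude, error reduction for 10x more points (3-D)
    print(n, round(expected_slope(n, 3), 2), round(error_ratio(n, 3, 10.0), 1))
```

Under this assumption a 5th-degree basis in 3-D gains two orders of magnitude in accuracy per tenfold increase in points, and an 8th-degree basis gains three.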
Cutting cost: Local IMLS
• Cost of evaluation scales as NM² for standard IMLS (N = number of ab initio points, M = number of basis functions)
• High-degree standard IMLS is too costly to use directly, thus we use local IMLS (L-IMLS): local approximants (polynomials) of the potential near data points are calculated using IMLS (expensive) & the interpolated value is taken to be a weighted sum of them
• In standard IMLS they are recomputed at each evaluation point (very accurate, but too costly)
• The coefficients are generally slowly varying
• In the L-IMLS approach coefficients are computed & stored at a relatively small number of points
• Evaluations are low-cost weighted interpolations between stored points
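A minimal 1-D sketch of the store-then-interpolate idea follows. It assumes a toy Morse curve, a hypothetical inverse-distance weight, and one stored expansion per data point; the actual weight function and storage locations used in L-IMLS are not specified here.

```python
import numpy as np

def morse(r, de=100.0, a=1.0, re=1.0):
    """Toy stand-in for ab initio energies (assumed parameters)."""
    return de * (1.0 - np.exp(-a * (r - re))) ** 2

def local_coeffs(center, z_data, v_data, degree, p=4, eps=1e-10):
    """Expensive step: weighted least-squares polynomial centred on `center`."""
    sw = np.sqrt(1.0 / (np.abs(center - z_data) ** p + eps))
    B = np.vander(z_data - center, degree + 1, increasing=True)
    a, *_ = np.linalg.lstsq(B * sw[:, None], v_data * sw, rcond=None)
    return a

def build_local_imls(z_data, v_data, degree):
    """Compute and store one local expansion per data point (done once)."""
    return [(zc, local_coeffs(zc, z_data, v_data, degree)) for zc in z_data]

def eval_local_imls(z, stored, p=4, eps=1e-10):
    """Cheap step: normalized weighted sum of the stored local polynomials."""
    num = den = 0.0
    for zc, a in stored:
        w = 1.0 / (abs(z - zc) ** p + eps)
        num += w * np.polyval(a[::-1], z - zc)  # polyval wants highest power first
        den += w
    return num / den

z_data = np.linspace(0.5, 4.5, 9)
v_data = morse(z_data)
stored = build_local_imls(z_data, v_data, degree=3)
print(eval_local_imls(2.2, stored), morse(2.2))
```

No least-squares solve happens at evaluation time; the cost per evaluation is a handful of polynomial evaluations and a weighted average.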
Overcoming scaling problem for automatic point selection
• We get high accuracy & low cost with high-degree L-IMLS, but must find the optimum place to add each ab initio point
• Trivial in 1-D, as shown earlier
• Grid or random search scales very poorly with dimension
• With L-IMLS the functions whose maxima we seek are continuous and globally defined, as are their gradients:
  • Difference between successive orders of IMLS
  • Can also use the variance of the weighted contributions to the interpolated value with local IMLS
• So, define the negative of the squared-difference surface
• We can use efficient minimization schemes such as conjugate gradient to find local minima
[Figure: "Squared difference surface indicates point where data is required". Axes: energy vs. bond distance. Legend: fitting data, exact function, 2nd-degree IMLS, 3rd-degree IMLS, squared difference.]
Method schematic
Read Input → Seed Grid → Compute Ab Initio Data → Compute L-IMLS → Test Fitting Error Statistics
• If the accuracy target is met: Write PES Data, Terminate
• Otherwise: Automatic Data Point Location → Add New Ab Initio Data → Compute L-IMLS (repeat)
Automated PES fitting in 3-D: HCN-HNC
[Figure: "Automatic surface generation: HCN". Log-log plot of RMS error (kcal/mol, 0.001 to 1000) vs. number of points (10 to 10000). Curves: Val only, Val+grad, Val+grad+Hess. Annotations: HDMR (12,9,7); "Basis set not well supported"; point counts 828, 318, 223 marked for 0.1 kcal/mol; ~1 cm-1 level marked.]
• The PES is fit up to 100 kcal/mol
• Used 30 random starting points for minimizations
• Spectroscopic accuracy: to less than 1 cm-1 within 792 pts with Hessians or 1000 pts with gradients
• But we can do even better (discussed below)
Dynamic Basis Procedure
Avoids including points in the seed data that are not optimally located
Start with a very small initial grid of points & use automatic surface generation with a small basis, successively increasing the basis as points are added
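One way to picture the procedure is as a ladder of bases that is climbed as data accumulate. The sketch below is hypothetical: the switching rule (use the largest basis with no more functions than data points) is an assumption made for illustration, as is the basis-counting convention; the ladder itself is taken from the basis labels in the 6-D HOOH example that follows.

```python
from math import comb

def hdmr_basis_size(n_coords, max_degrees):
    """Basis-function count under an assumed convention: one constant plus
    comb(n_coords, k) * comb(degree, k) functions for each k-coordinate term."""
    return 1 + sum(comb(n_coords, k) * comb(d, k)
                   for k, d in enumerate(max_degrees, start=1))

def pick_basis(n_points, n_coords, ladder):
    """Largest basis on the ladder with no more functions than data points
    (hypothetical rule; falls back to the smallest basis)."""
    chosen = ladder[0]
    for b in ladder:  # ladder is ordered from smallest to largest basis
        if hdmr_basis_size(n_coords, b) <= n_points:
            chosen = b
    return chosen

LADDER_6D = [(6, 3), (7, 4), (8, 5, 3), (9, 6, 4), (10, 7, 5, 4)]

for n_points in (108, 150, 250, 400, 600):
    b = pick_basis(n_points, 6, LADDER_6D)
    print(n_points, b, hdmr_basis_size(6, b))
```

When gradients or Hessians are fitted as well, each point supplies more than one data value, so a real switching criterion would presumably count data values rather than points.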
Automated Dynamic Basis: 6-D (HOOH)
[Figure: "Automatic surface generation: HOOH". Log-log plot of RMS error (kcal/mol, 0.1 to 1000) vs. number of points (10 to 10000). Curves: Val only, Val+grad, Val+grad+Hess with the fixed (10,7,5,4) basis, and the dynamic basis stepping through (6,3), (7,4), (8,5,3), (9,6,4). Annotated point counts: 114, 164, 754.]
• Fit up to 100 kcal/mol
• Fit to analytic H2O2 PES*
• RMS error based on randomly selected test points
• A minimum of 591 pts. would be needed if we started with the (10,7,5,4) basis. We started with 108. Convergence is also much faster.
* B. Kuhn et al. J. Chem. Phys. 111, 2565 (1999).
Spectroscopic Accuracy: 9-D (CH4)
Test case: Schwenke & Partridge PES, a least-squares fit to ~8000 CCSD(T)/cc-pVTZ ab initio data over the range 0-26,000 cm-1
We fit the range 0-20,000 cm-1 (57.2 kcal/mol).
• Energies & gradients only (Hessian data not cost effective, as shown earlier)
• Bond distances
• Exploited permutation symmetry
• Dynamic basis procedure
D. W. Schwenke & H. Partridge, Spectrochim Acta Part A 57, 887 (2001)
Automated PES fitting in 9-D (CH4)
[Figure: "Automatic surface generation: CH4", 9D dynamic basis. Log-log plot of mean error (kcal/mol, 0.1 to 10) vs. number of points (100 to 10000). Curves: Val only, Val + grad. Basis labels along the curves: (6,3), (7,4), (8,5,3), (9,6,4), (9,6,4,4).]
• With 1552 pts. the energy-only RMS error is 0.41 kcal/mol & including gradients brings it down to 0.32 kcal/mol.
• The RMS error for the Schwenke-Partridge PES (based on 8000 pts) is ~0.35 kcal/mol
• The IMLS fitting is essentially automatic, requiring little human effort and no prior knowledge of the topology
A General 3-Atom IMLS-QC Code
• Input file
  • Accuracy target
  • Energy range
  • Basis set
  • Number of seed points and coordinate ranges
  • Type of coordinates: Jacobi, valence, bond distances
• Generates input files for Gaussian, MolPro, and Aces II
• Energies only or energies & gradients
A New PES for the Methylene Radical
We have generated a spectroscopically accurate PES for CH2 for energies up to 20,000 cm-1 (216 vibrational states).
CASSCF calculations in valence coordinates.
Vibrational levels were computed using a discrete variable representation (DVR) method.
DVR typically requires tens of thousands of ab initio points. For a benchmark we performed a DVR calculation using ab initio calculations at all 22,400 DVR points.
Singlet Methylene: fit to energies and gradients
[Figure: log-log plot of mean and RMS errors (cm-1, 0.1 to 10000) vs. number of ab initio points (100 to 1000; 259, 291, 355, and 435 marked). Curves: mean est. error, RMS est. error (black); true mean error, true RMS error (red).]
• CASSCF calculation in valence coordinates
• Energy range of 20,000 cm-1
• Estimated error vs. true error (sets of 500 random ab initio calculations)
• True errors (RMS and mean) are sub-wavenumber using 355 points
• True and estimated errors are in near-perfect agreement
Singlet Methylene Vibrational Levels: Discrete Variable Representation (DVR) Calculation
[Figure: absolute error (cm-1, 0 to 7) vs. vibrational level (cm-1, 3250 to 20000).]
Absolute errors for 216 vibrational levels (below 20,000 cm-1). Variational vibrational calculations were performed using DVR and a PES fitted with a mean estimated error of 2.0 cm-1.
Exact levels were benchmarked by a DVR calculation using ab initio calculations at all 22,400 DVR points.

Singlet Methylene Vibrational Levels: Discrete Variable Representation (DVR) Calculation
[Figure: absolute error (cm-1, 0 to 1) vs. vibrational level (cm-1, 3250 to 20000).]
Absolute errors for 216 vibrational levels (below 20,000 cm-1). Variational vibrational calculations were performed using DVR and a PES fitted with a mean estimated error of 0.5 cm-1.
Exact levels were benchmarked by a DVR calculation using ab initio calculations at all 22,400 DVR points.
Singlet Methylene Vibrational Levels: Comparisons
[Figure, two panels: absolute error (cm-1) vs. vibrational level (cm-1, 3250 to 20000). Panel with 2.0 cm-1 mean estimated error: vertical axis 0 to 7. Panel with 0.5 cm-1 mean estimated error: vertical axis 0 to 1.]
Singlet Methylene Vibrational Levels: Discrete Variable Representation (DVR) Calculation
[Figure: absolute error (cm-1, 0 to 1) vs. vibrational level (cm-1, 3250 to 20000).]
Absolute errors for 216 vibrational levels (below 20,000 cm-1). Variational vibrational calculations were performed using DVR and a PES fitted with a mean estimated error of 0.33 cm-1.
Exact levels were benchmarked by a DVR calculation using ab initio calculations at all 22,400 DVR points. Mean and maximum errors for levels computed with this PES are 0.10 and 0.41 cm-1.
Singlet Methylene Vibrational Levels: Comparisons
[Figure, two panels: absolute error (cm-1) vs. vibrational level (cm-1, 3250 to 20000). Panel with 2.0 cm-1 mean estimated error: vertical axis 0 to 7. Panel with 0.33 cm-1 mean estimated error: vertical axis 0 to 1.]
IMLS & Classical Trajectories: Preliminary Efforts
Two different approaches (both under development):
• IMLS-accelerated direct dynamics
• Dynamics-driven fitting
In both cases IMLS "intercepts" ab initio PES calls & the electronic structure code is called only if necessary (based on an error estimate)
Accelerated Direct Dynamics
Test case: HONO cis-trans isomerization
• Trajectories were initiated with 8 quanta in the HON bend to cause rapid IVR & then isomerization (we want rapid exploration of configuration space)
• Integration stepsize: 0.05 fs
• Trajectories were stopped once they spent 3 times the period of the torsion mode in the range of the trans torsion angle or violated the energy conservation criterion
• Used HF/cc-pVDZ: we want a fast ab initio calculation to test the method
• IMLS "intercepts" direct dynamics ab initio PES calls. The electronic structure code is called only if necessary (based on an error estimate)
• Data collection trajectories are moved back in time if the rare event of adding new ab initio data occurs
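The "intercept" idea can be sketched as a wrapper around the expensive energy call. Everything in this sketch is a stand-in: the class name is hypothetical, a 1-D Morse function plays the role of the electronic structure code, and low-order polynomial interpolation through the nearest stored points replaces IMLS. Only the control flow (estimate the error from two fits of different order, call the expensive code when the estimate exceeds a tolerance, then store the new point) is meant to mirror the description above.

```python
import math

def morse(r, de=100.0, a=1.0, re=1.0):
    """Stand-in for an expensive ab initio energy call (assumed parameters)."""
    return de * (1.0 - math.exp(-a * (r - re))) ** 2

class InterceptedPES:
    """Calls `expensive` only when the surrogate's error estimate exceeds `tol`."""

    def __init__(self, expensive, tol, seeds):
        self.expensive = expensive
        self.tol = tol
        self.data = [(x, expensive(x)) for x in seeds]  # needs >= 3 distinct seeds
        self.n_expensive = len(self.data)
        self.n_evals = 0

    def _fits(self, x):
        """Linear and quadratic fits through the nearest stored points."""
        (x0, v0), (x1, v1), (x2, v2) = sorted(
            self.data, key=lambda d: abs(d[0] - x))[:3]
        d01 = (v1 - v0) / (x1 - x0)
        d12 = (v2 - v1) / (x2 - x1)
        lin = v0 + d01 * (x - x0)
        quad = lin + (d12 - d01) / (x2 - x0) * (x - x0) * (x - x1)
        return lin, quad

    def energy(self, x):
        self.n_evals += 1
        lin, quad = self._fits(x)
        if abs(quad - lin) > self.tol:      # estimate too large: call the real code
            v = self.expensive(x)
            self.data.append((x, v))
            self.n_expensive += 1
            return v
        return quad                          # estimate acceptable: use the surrogate

pes = InterceptedPES(morse, tol=0.1, seeds=[0.5, 1.5, 2.5])
# A toy "trajectory" that oscillates between r = 0.5 and r = 2.5.
energies = [pes.energy(1.5 + math.sin(0.01 * k)) for k in range(2000)]
print(pes.n_evals, pes.n_expensive, pes.n_evals / pes.n_expensive)
```

The ratio printed at the end is the analogue of the evaluations-per-ab-initio-call speedup. Rewinding the trajectory after new data is added is not included in this sketch.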
Accelerated direct dynamics with IMLS: HONO
[Figure, panel 1: "Speedup". PES evaluations (0 to 250000) vs. ab initio calculations (0 to 3000). Curves labeled speedup 25.2, 18.1, 7.4, and 76.3; error tolerances 10^-2 and 10^-5 marked.]
[Figure, panel 2: max energy drift (kcal/mol-ps, 0 to 12) vs. speedup (0 to 100).]
• (10,7,5,5) basis of 651 functions
• Values and gradients used
• The fit began after 25 ab initio "seed" points were generated
• Speedup depends on error tolerance
  • 7.6 evaluations per ab initio call for 10^-5 error tolerance
  • 76.3 evaluations per ab initio call for 10^-2 error tolerance
• Factor of ~20 speedup with 0.06 drift in total energy
Dynamics Driven Fitting: HONO cis-trans isomerization rate
[Figure: PES evaluations (0 to 8000000) vs. ab initio calculations (0 to 2000). Curves: 0.1, 0.3, 0.5, 1.0, and 3.0 kcal/mol max error.]
A series of sets of trajectories, with various energy conservation limits, is used to explore configuration space.
Accelerated direct dynamics: HONO cis-trans isomerization rate
Rate calculation
[Figure: -ln(Nt/N0) (0 to 1.8) vs. t (ps, 0 to 1.4). Linear fit: y = 1.2491x, R² = 0.9911.]
Results for PESs fit with 8 different maximum error tolerances:

IMLS tolerance (kcal/mol)   k (ps-1)   Ab initio points   PES calls
0.1                         0.97       1867               ~10^7
0.3                         0.99       1020               ~10^7
0.5                         1.02       807                ~10^7
1.0                         1.11       428                ~10^7
2.0                         1.02       362                ~10^7
3.0                         1.02       261                ~10^7
4.0                         1.25       229                ~10^7
5.0                         1.28       221                ~10^7
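A rate constant of this kind is the slope of -ln(Nt/N0) against t. The sketch below fits that slope with a line forced through the origin, which is consistent with the intercept-free fit y = 1.2491x shown in the plot. The data are synthetic (an exact exponential decay with k = 1.25 ps-1) and the function name is hypothetical; no trajectory data from the talk are used.

```python
import math

def rate_through_origin(times, surviving_fraction):
    """Least-squares slope k of -ln(N_t/N_0) = k * t, forced through the origin."""
    y = [-math.log(f) for f in surviving_fraction]
    return sum(t * yi for t, yi in zip(times, y)) / sum(t * t for t in times)

# Synthetic first-order decay, used only to exercise the function.
times = [0.2 * i for i in range(1, 7)]              # ps
fractions = [math.exp(-1.25 * t) for t in times]    # N_t / N_0
k = rate_through_origin(times, fractions)
print(round(k, 4))  # ps^-1
```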
Concluding Comments
• IMLS allows automated generation of PESs for various applications
  • Spectroscopy
  • Dynamics
• Flexible fits to energies, energies and gradients, or higher derivatives
• Interfaced to a general classical trajectory code: GenDyn
• Interfaced to electronic structure codes
  • Gaussian, Molpro, Aces II
• Robust, efficient, practical methods that assure fidelity to the ab initio data