
Fast orbital-dependent exchange

Simen Reine

Centre for Theoretical and Computational Chemistry (CTCC), Department of Chemistry, University of Oslo, Norway

CTCC Seminar, Tromsø

February 13, 2015

Simen Reine (CTCC, University of Oslo) February 13, 2015 1 / 29

Outline

Kohn-Sham DFT

Why exchange?

RI approximation

ADMM exchange

Results

[Figure: DFT - MP2 electrostatic potential, panels B3LYP-MP2 and cam-B3LYP-MP2. Kristensen et al., Phys. Chem. Chem. Phys. (2012)]

Kohn-Sham DFT

Density-functional theory P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964)

wave function $\Psi(x_1, x_2, \ldots, x_N)$ replaced by electron density $\rho(\mathbf{r})$

$$H\Psi = E\Psi \;\rightarrow\; E[v] = \inf_{\rho}\left( F[\rho] + \int v(\mathbf{r})\rho(\mathbf{r})\,d\mathbf{r} \right)$$

the universal density functional $F[\rho]$ is unknown

Kohn-Sham DFT W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965)

density represented by a single Slater determinant $\Psi_{KS} = |\phi_1, \ldots, \phi_N|$

$$F[\rho] = T_S[\rho] + J[\rho] + X[\rho] + C[\rho] + \Delta T[\rho] \approx T_S[\phi] + J[\rho] - \mu K[\phi] + XC[\rho]$$

reintroduce orbitals to get a good estimate of the kinetic energy

for pure DFT there is no orbital-dependent exchange ($\mu = 0$)

$XC[\rho]$ local in $\rho$ and $\nabla\rho$

for range-separated functionals, $K$ and $X$ are separated into short- and long-range contributions by the operator splitting

$$\frac{1}{r_{12}} = \frac{\operatorname{erf}(\omega r_{12})}{r_{12}} + \frac{\operatorname{erfc}(\omega r_{12})}{r_{12}}$$

orbital-dependent exchange especially important for the long range/large systems
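The operator splitting used for range-separated functionals can be checked numerically. The sketch below uses only the Python standard library; the function name `split_coulomb` and the value of `omega` are illustrative assumptions, not taken from the slides. The erf term is the smooth long-range part and the erfc term is the short-range part.

```python
import math

def split_coulomb(r12, omega):
    """Return (long_range, short_range) parts of 1/r12 for attenuation parameter omega."""
    long_range = math.erf(omega * r12) / r12    # finite as r12 -> 0, behaves like 1/r12 far away
    short_range = math.erfc(omega * r12) / r12  # carries the singularity, dies off quickly
    return long_range, short_range

omega = 0.33  # illustrative value only (assumption)
for r12 in (0.1, 1.0, 5.0, 20.0):
    lr, sr = split_coulomb(r12, omega)
    # (lr + sr) * r12 should be 1 up to rounding for every distance
    print(f"r12={r12:5.1f}  long={lr:.6f}  short={sr:.6f}  sum*r12={(lr + sr) * r12:.12f}")
```

At large separation essentially all of 1/r12 sits in the erf term, which is why the orbital-dependent exchange evaluated with that term matters most for extended systems.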

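Kohn-Sham DFT

LCAO, AO density matrix

$$\phi_i = \sum_{a} C_{ai}\, a(\mathbf{r}), \qquad D_{ab} = \sum_{i}^{\mathrm{occ}} C_{ai} C_{bi}$$

Iterative SCF optimization procedure

$$F^{(k)}(D^{(k)}) \rightarrow D^{(k+1)}, \qquad F_{ab}(D) = \frac{dE}{dD_{ab}}$$

Kohn-Sham matrix

$$F_{ab}(D) = h_{ab} + J_{ab}(\rho) - \mu K_{ab}(D) + XC_{ab}(D)$$

Four bottlenecks

Coulomb $J(D)$

exchange $K(D)$

exchange-correlation $XC(D)$

wave-function optimization

$$J_{ab} = \sum_{cd} (ab|cd) D_{cd}$$

$$K_{ab} = \sum_{cd} (ac|bd) D_{cd}$$

$$XC_{ab} = \sum_{g} w_g\, v_{XC}(\mathbf{r}_g)\, a(\mathbf{r}_g)\, b(\mathbf{r}_g)$$

$$(ab|cd) = \sum_{t} E^{ab}_{t} \sum_{u} E^{cd}_{u}\, (t|u)$$

$$J_{ab} = \sum_{t} E^{ab}_{t} \sum_{u} F_{u}\, (t|u), \quad \text{J-engine}$$

$$F_{u} = \sum_{cd} E^{cd}_{u} D_{cd}$$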

L. E. McMurchie and E. R. Davidson, J. Comp. Phys. 26, 218 (1978)

S. Reine et al., Phys. Chem. Chem. Phys. 9, 4771 (2007)

S. Reine, T. Helgaker, R. Lindh, WIREs Comput Mol Sci 2, 290 (2012)

G. R. Ahmadi and J. Almlof, Chem. Phys. Lett. 246, 364 (1995)

C. A. White and M. Head-Gordon, J. Chem. Phys. 104, 2620 (1996)
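The Coulomb and exchange contractions above differ only in which indices of the four-index integral are paired with the density matrix. A minimal NumPy sketch follows; the integrals are random numbers built to have the right permutational symmetry (an assumption for illustration, not real two-electron integrals), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ao, n_occ, n_aux = 4, 2, 6

# Toy integrals: (ab|cd) = sum_P B[P,a,b] B[P,c,d] with B symmetric in (a,b).
# This reproduces the index symmetry of real-orbital integrals; the values are arbitrary.
B = rng.standard_normal((n_aux, n_ao, n_ao))
B = 0.5 * (B + B.transpose(0, 2, 1))
eri = np.einsum("Pab,Pcd->abcd", B, B)

# Density matrix D_ab = sum_i^occ C_ai C_bi from arbitrary occupied coefficients.
C = rng.standard_normal((n_ao, n_occ))
D = C @ C.T

J = np.einsum("abcd,cd->ab", eri, D)   # J_ab = sum_cd (ab|cd) D_cd
K = np.einsum("acbd,cd->ab", eri, D)   # K_ab = sum_cd (ac|bd) D_cd

# The same exchange matrix written over occupied orbitals: K_ab = sum_i (ai|bi)
K_occ = np.einsum("acbd,ci,di->ab", eri, C, C)
```

In J the density is contracted with one side of the integral, which is what lets the J-engine pre-contract $F_u$; in K the density couples the two sides, so no such factorization is available.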


Linear scaling

Cauchy-Schwarz screening M. Häser and R. Ahlrichs, J. Comp. Chem. 10, 104 (1989)

$$|(ab|cd)| \le \sqrt{(ab|ab)(cd|cd)}$$

reduces scaling from $O(N^4)$ to $O(N^2)$

LinK/ONX for exchange C. Ochsenfeld, C. A. White and M. Head-Gordon, J. Chem. Phys. 109, 1663 (1998), E. Schwegler, M. Challacombe and M. Head-Gordon, J. Chem. Phys. 106, 9708 (1997)

$$K_{ab} = \sum_{cd} (ac|bd) D_{cd}$$

FMM for Coulomb C. A. White, B. G. Johnson, P. M. W. Gill and M. Head-Gordon, Chem. Phys. Lett. 230, 8 (1994)

$$(ab|cd) = \sum_{lm,l'm'} q^{ab}_{lm}(P)\, T_{lm,l'm'}(P,Q)\, q^{cd}_{l'm'}(Q)$$

XC by atomic grids - linear scaling in nature A. D. Becke, J. Chem. Phys. 88, 2547 (1988), J. M. Pérez-Jordá and W. Yang, Chem. Phys. Lett. 241, 469 (1995), O. Treutler and R. Ahlrichs, J. Chem. Phys. 102, 346 (1995)

Wave-function optimization - linear scaling for sparse matrices P. Sałek et al., J. Chem. Phys. 126, 114110 (2007)
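The Cauchy-Schwarz bound can be illustrated with a toy model in which each orbital pair is a vector and the integral is a dot product, so the integral matrix is positive semidefinite like the true one. Everything in this sketch (the model, the sizes, the threshold `tau`) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, dim = 12, 8

# Toy model: row ab of V stands for the pair |ab), and (ab|cd) is a dot product.
# Rows are scaled over several orders of magnitude to mimic small overlap distributions.
scales = 10.0 ** -np.linspace(0, 6, n_pairs)
V = rng.standard_normal((n_pairs, dim)) * scales[:, None]
G = V @ V.T                       # G[ab, cd] = (ab|cd)

q = np.sqrt(np.diag(G))           # q[ab] = sqrt((ab|ab))
bound = np.outer(q, q)            # Cauchy-Schwarz upper bound on |(ab|cd)|

tau = 1e-8                        # hypothetical screening threshold
keep = bound >= tau               # only these integrals would be computed
print(f"kept {keep.sum()} of {keep.size} integrals")
```

Only the diagonal elements are needed to form the bound, so the decision to skip an integral is made without computing it, and every skipped integral is guaranteed to be smaller than the threshold.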

Why exchange?

DFT - MP2 difference in electrostatic potential

[Figure: two panels. B3LYP - MP2, 20% long-range exchange (no long-range correction), and CAM-B3LYP - MP2, 65% long-range exchange (includes long-range correction). Blue/red regions correspond to increased/decreased electrostatic potential for DFT compared to MP2.]

See Frank Jensen, J. Chem. Theory Comput. 6, 2726 (2010) for related discussion on electron affinity

Jakobsen S, Kristensen K and Jensen F, JCTC 9, 3978 (2013)

Why exchange?

Alanine residue peptides, 6-31G: Hessian eigenvalues and HOMO-LUMO gap

• Diagonalization can be avoided by solving Newton equations

• However, SCF convergence is typically more difficult in larger systems

– small (or negative) HOMO-LUMO gaps and small Hessian eigenvalues in DFT

[Figure: lowest Hessian eigenvalue and HOMO-LUMO gap in alanine residue peptides (6-31G); curves: HF HOMO-LUMO gap, B3LYP HOMO-LUMO gap, lowest HF Hessian eigenvalue, B3LYP eigenvalue; horizontal axis 100 to 350, vertical axis 0 to 0.4]

• We have modified the standard SCF scheme to make it more robust

(long-range) exchange becomes essential

Why exchange?

Alanine residue peptides, timings HF/6-31G

• Features of the code

– diagonalization-free trust-region Roothaan–Hall (TRRH) energy minimization

– trust-region density-subspace minimization (TRDSM) for density averaging

– boxed density-fitting with FMM for Coulomb evaluation (Simen Reine)

– LinK for exact exchange, linear-scaling exchange-correlation evaluation

– compressed sparse-row (CSR) representation of few-atom blocks

• alanine residue peptides

– CPU time against atoms

– HF/6-31G

– 5th SCF iteration

– dominated by exchange

– RH step least expensive

– full lines: sparse algebra

– dashed lines: dense algebra

[Figure: CPU time against atoms; curves: exchange, Coulomb, DSM, RH; horizontal axis 100 to 600, vertical axis 2500 to 15000]

exchange is the bottleneck

even more prominent with increasing basis set size

RI approximation

"Standard" resolution-of-the-identity (RI) approximation

$$(ab|cd) \approx \sum_{\alpha,\beta \in M} (ab|\alpha)(\alpha|\beta)^{-1}(\beta|cd)$$

Coulomb J. L. Whitten, J. Chem. Phys. 58, 4496 (1973), E. J. Baerends, D. E. Ellis and P. Ros, Chem. Phys. 2, 41 (1973), B. I. Dunlap, J. W. D. Connolly and J. R. Sabin, J. Chem. Phys. 71, 4993 (1979)

$$J_{ab} = (ab|\rho) \approx (ab|\tilde{\rho}) = \sum_{\alpha} (ab|\alpha)\, c_{\alpha}, \qquad c_{\alpha} = \sum_{\beta} (\alpha|\beta)^{-1} (\beta|\rho)$$

exchange F. Weigend, Phys. Chem. Chem. Phys. 4, 4285 (2002), R. Polly, H.-J. Werner, F. R. Manby and P. J. Knowles, Mol. Phys. 102, 2311 (2004)

$$K_{ab} = \sum_{i}^{\mathrm{occ}} (ai|bi) \approx \sum_{i}^{\mathrm{occ}} (ai|\widetilde{bi}) = \sum_{i}^{\mathrm{occ}} \sum_{\alpha} (ai|\alpha)\, c^{bi}_{\alpha}, \qquad c^{bi}_{\alpha} = \sum_{\beta} (\alpha|\beta)^{-1} (\beta|bi)$$

scaling wall at about 1000 basis functions

Pair-atomic RI (PARI) Merlot et al., JCC 34, 1486 (2013), D. S. Hollman, H. F. Schaefer and E. F. Valeev, J. Chem. Phys. 140, 064109 (2014), S. F. Manzer, E. Epifanovsky and M. Head-Gordon, JCTC (2014)

$$(ab|cd) \approx \sum_{\alpha \in A \cup B} c^{ab}_{\alpha} (\alpha|cd) + \sum_{\beta \in C \cup D} (ab|\beta)\, c^{cd}_{\beta} - \sum_{\alpha \in A \cup B} \sum_{\beta \in C \cup D} c^{ab}_{\alpha} (\alpha|\beta)\, c^{cd}_{\beta}$$
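The standard RI formula is a projection onto the auxiliary space in the Coulomb metric. The sketch below shows this in a toy linear-algebra model, assuming functions are vectors and $(f|g) = f^T W g$ with a positive definite matrix `W` standing in for the Coulomb operator; the model, the helper name `ri_integrals` and all sizes are assumptions, not real integrals.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n_pairs, n_aux = 10, 6, 4

# Toy metric: symmetric positive definite W plays the role of the Coulomb operator.
A = rng.standard_normal((dim, dim))
W = A @ A.T + dim * np.eye(dim)

pairs = rng.standard_normal((dim, n_pairs))   # columns: orbital products |ab)
aux = rng.standard_normal((dim, n_aux))       # columns: auxiliary functions |alpha)

def ri_integrals(pairs, aux, W):
    """(ab|cd) ~ sum_{alpha,beta} (ab|alpha) (alpha|beta)^-1 (beta|cd)."""
    three_index = pairs.T @ W @ aux                   # (ab|alpha)
    metric = aux.T @ W @ aux                          # (alpha|beta)
    coeff = np.linalg.solve(metric, three_index.T)    # fitting coefficients c^{cd}_alpha
    return three_index @ coeff

exact = pairs.T @ W @ pairs
approx = ri_integrals(pairs, aux, W)
diag_error = np.diag(exact - approx)   # (ab|ab) - (ab|ab)_RI, non-negative for a projection

# With a complete auxiliary set the fit reproduces the exact integrals.
complete = ri_integrals(pairs, np.eye(dim), W)
```

Solving the linear system instead of forming the inverse metric explicitly is the usual numerical choice; the formula itself is unchanged.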

Performance of RI (and J-engine)

B3LYP, naphthalene

[Figure, built up over six slides: timings (s), axis from 1 to 10000, against cardinal number X (cc-pVXZ), X = 2 to 5; curves added in the order Coulomb, J-engine, RI-J, XC, LinK, PARI-K]

RI error

B3LYP, naphthalene, basis-set limit −385.822

[Figure, built up over three slides: error (microhartree), axis from 100 to 100000, against cardinal number X (cc-pVXZ), X = 2 to 4; curves added in the order RI-J, PARI-K, basis-set error]

RI summary

RI error three orders of magnitude smaller than regular basis-set error

RI-J, speed-up factor 19–174

J-engine, speed-up factor 1.3–4

Combined speed-up for Coulomb factor 25–800

PARI-K, speed-up factor 1.4–9

Coulomb is an order of magnitude (or more) faster than exchange

Greater speed-ups for larger systems


ADMM approximation

The expression for the auxiliary density matrix method (ADMM) Guidon et al., JCTC 6, 2348 (2010) is based on the following trivial rearrangement of the exchange energy

$$K(D) = k(d) + K(D) - k(d)$$

with capital letters representing the regular basis and small letters a smaller auxiliary basis

In the ADMM approximation the two last terms are replaced by a GGA-type exchange

$$K(D) = k(d) + X(D) - x(d)$$

in ADMM2 the auxiliary density is obtained by least-squares fitting of the projected occupied orbitals, which gives

$$d_2 = T D T^{T}, \qquad T \equiv s^{-1} Q$$

with $s$ the overlap matrix in the small basis and $Q$ the mixed overlap between the small and the regular basis. This gives the ADMM2 exchange matrix

$$K_2 = X(D) + T^{T}\left(k(d_2) - x(d_2)\right) T$$

in ADMM1 the projection is subject to the constraint that the projected MOs are orthonormal, giving a density $d_1$ that cannot be expressed directly in terms of the regular AO density matrix $D$
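The ADMM2 projection is a few matrix products. The sketch below builds it in a toy model where basis functions are vectors and overlaps are dot products; the random bases, the sizes and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n_reg, n_small, n_occ = 12, 8, 4, 2

# Toy model: basis functions are columns of vectors in R^dim.
reg = rng.standard_normal((dim, n_reg))       # regular basis
small = rng.standard_normal((dim, n_small))   # smaller auxiliary basis

S = reg.T @ reg        # overlap in the regular basis
s = small.T @ small    # overlap in the small basis
Q = small.T @ reg      # mixed overlap (small x regular)

# Occupied coefficients with C^T S C = 1, obtained by orthonormalising reg @ C0.
C0 = rng.standard_normal((n_reg, n_occ))
_, R = np.linalg.qr(reg @ C0)
C = C0 @ np.linalg.inv(R)
D = C @ C.T

T = np.linalg.solve(s, Q)     # T = s^-1 Q
d2 = T @ D @ T.T              # d2 = T D T^T

N = np.trace(D @ S)           # occupied-orbital count carried by D
N2 = np.trace(d2 @ s)         # what is left after projecting onto the small basis
print(f"N = {N:.6f}, N2 = {N2:.6f}")
```

Because the fit is an orthogonal projection of each occupied orbital, the projected density can only lose norm, so `N2` never exceeds `N`.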

Charge-constrained ADMM

We have tested the ADMM approximation for Merlot et al., JCP 141, 094104 (2014)

all-electron calculations

various GGA corrections

four different basis-set combinations

with three new ADMM variants, ADMMQ, ADMMS and ADMMP

In the ADMMQ approximation the projection is made subject to the charge constraint

$$\int \tilde{\rho}(\mathbf{r})\,d\mathbf{r} = \int \rho(\mathbf{r})\,d\mathbf{r} \;\rightarrow\; d_Q = \xi d_2, \qquad \xi = \frac{N}{N_2}$$

which for the energy gives

$$K_Q(D) = k(d_Q) + X(D) - x(d_Q) + 2\Lambda\left[\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)\right], \qquad \Lambda = \frac{2}{N}\,\mathrm{Tr}\left((k(d_Q) - x(d_Q))\, d_Q\right)$$

works well in many cases, but in some cases $\xi$ is artificially increased through the SCF, in turn increasing the difference $k(d_Q) - x(d_Q)$

explained for LDA by the $\xi^{2}$ dependence for $k(d_Q)$ versus the $\xi^{4/3}$ dependence for $x(d_Q)$

In ADMMP and ADMMS we include the missing $\xi^{2/3}$ dependence directly in the energy expression to avoid the artificial increase in $\xi$

$$K_S(D) = k(d_Q) + X(D) - \xi^{2/3} x(d_Q) + 2\Lambda\left[\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)\right]$$

$$K_P(D) = \xi^{2} k(d_2) + X(D) - \xi^{2} x(d_2) + 2\Lambda\left[\mathrm{Tr}(DS) - \mathrm{Tr}(d_Q s)\right]$$
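The scaling argument can be written out in a few lines. This is a sketch under two assumptions that go slightly beyond the slide: $k$ is the exact-exchange energy, which is quadratic in the density matrix, and $x$ is taken to be LDA exchange, proportional to $\int \rho^{4/3}$.

```latex
% Assumptions: k quadratic in the density matrix; x is LDA exchange,
% x[\rho] \propto \int \rho^{4/3}(\mathbf{r})\,d\mathbf{r}; d_Q = \xi d_2.
\begin{align*}
k(d_Q) &= k(\xi d_2) = \xi^{2}\, k(d_2), \\
x(d_Q) &= x(\xi d_2) = \xi^{4/3}\, x(d_2), \\
k(d_Q) - x(d_Q) &= \xi^{2} k(d_2) - \xi^{4/3} x(d_2), \\
k(d_Q) - \xi^{2/3} x(d_Q) &= \xi^{2} \bigl( k(d_2) - x(d_2) \bigr).
\end{align*}
```

The third line shows the mismatch in powers of $\xi$ in ADMMQ. The fourth line shows that, under the LDA assumption, the extra factor $\xi^{2/3}$ in ADMMS makes both terms scale as $\xi^{2}$, which is the form written explicitly in ADMMP.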

M19 ADMM benchmark, 6-31G**/3-21G

[Figure: four panels, TZVPP/SVP, cc-pVTZ/cc-pVDZ, cc-pVTZ/3-21G and 6-31G**/3-21G; horizontal axis Error (mEh) from −20 to 20, vertical axis 0 to 0.4; curves in each panel: ADMM2/PBEX, ADMM2/KT3X, ADMM2/OPTX, ADMMS/PBEX, ADMMS/KT3X, ADMMS/OPTX]

M19 ADMM benchmark, cc-pVTZ/3-21G

[Figure: four panels, TZVPP/SVP, cc-pVTZ/cc-pVDZ, cc-pVTZ/3-21G and 6-31G**/3-21G; horizontal axis Error (mEh) from −20 to 20, vertical axis 0 to 0.4; curves in each panel: ADMM2/PBEX, ADMM2/KT3X, ADMM2/OPTX, ADMMS/PBEX, ADMMS/KT3X, ADMMS/OPTX]

Performance of ADMM

B3LYP, naphthalene

[Figure, built up over two slides: timings (s), axis from 1 to 10000, against cardinal number X (cc-pVXZ), X = 2 to 5; curves added in the order PARI-K, ADMM]

ADMM error

B3LYP, naphthalene, basis-set limit −385.822

[Figure, built up over two slides: error (microhartree), axis from 100 to 100000, against cardinal number X (cc-pVXZ), X = 2 to 4; curves added in the order basis-set error, ADMM]

Example calculation - Titin

Model I27SS, 392 atoms, 8 MPI nodes, 16 cores/node

B3LYP/cc-pVTZ(df-def2/3-21G)

8700 regular, 18761 RI and 2196 ADMM basis functions

Intel Xeon Processor E5-2670, 2.60 GHz

KS matrix 86
ADMM-K 32
XC 24
RI-J 30
RH diag 76

LinK 2499
J-engine 2121

RI-J 4 mH
ADMMS/KT3 −34 mH
ADMM2/PBE −114 mH

#SCF 15
RI+ADMM total 3256
RI+LinK total 40436
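The speed-ups implied by these numbers can be read off with a few divisions. The sketch below copies the values from the slide and assumes they all share one time unit, which the slide does not state, so only ratios are formed; the dictionary names are hypothetical.

```python
# Numbers copied from the slide; the unit is not stated, so only ratios are computed.
timings = {
    "KS matrix": 86, "ADMM-K": 32, "XC": 24, "RI-J": 30, "RH diag": 76,
    "LinK": 2499, "J-engine": 2121,
}
totals = {"RI+ADMM": 3256, "RI+LinK": 40436}

# The KS-matrix entry is consistent with the sum of its three listed components.
components = timings["ADMM-K"] + timings["XC"] + timings["RI-J"]

exchange_ratio = timings["LinK"] / timings["ADMM-K"]      # LinK versus ADMM exchange
coulomb_ratio = timings["J-engine"] / timings["RI-J"]     # J-engine versus RI-J Coulomb
total_ratio = totals["RI+LinK"] / totals["RI+ADMM"]       # whole calculation

print(f"exchange: {exchange_ratio:.1f}x, Coulomb: {coulomb_ratio:.1f}x, total: {total_ratio:.1f}x")
```

This prints `exchange: 78.1x, Coulomb: 70.7x, total: 12.4x`. The total gain is smaller than the per-step gains, as expected when steps that are not accelerated take up a larger share of the run.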

Acknowledgements

Trygve Helgaker

Patrick Merlot

Robert Izsak

Thomas Kjærgaard

Thomas Bondo Pedersen

Alex Borgoo

NOTUR

CTCC

NFR

And you for your attention - thanks!
