From Fundamental First-Principle Calculations to NanoEngineering Applications: Review of the NESSIE project

James Kestyn, Eric Polizzi

Department of Electrical and Computer Engineering, University of Massachusetts, Amherst, MA, 01003 USA e-mail: [email protected]

Abstract: This paper outlines how modern first-principle calculations can adequately address the needs for ever higher levels of numerical accuracy and high performance in large-scale electronic structure simulations, and pioneer the fundamental study of quantum many-body effects in a large number of emerging nanomaterials.

Index Terms: DFT, TDDFT, electronic structure, first-principle, real-space mesh, real-time propagation, excited-state, plasmonic, FEAST, NESSIE

I. INTRODUCTION

The technology for electronic devices has been on a rapidly rising trajectory since the 1960s. The main factor in this development has been the ability to fabricate ever smaller silicon CMOS devices ('Moore's Law'), with today's device sizes in the nanometer range. The ability to control electronic materials and understand their properties has been a driving force for technological breakthroughs. The emergence of new nanoscale materials and devices, whose operating principles rely entirely on quantum effects, necessitates a fundamental and comprehensive understanding of the nanoscale physics of systems. First-principle calculations offer a unique approach to studying materials: they start directly from the mathematical equations describing the physical laws and do not require any empirical parameters aside from fundamental constants. They are known as electronic structure calculations when applied to the configuration of electrons in a molecule or solid, which determines most of the physical properties of matter through chemical bonding. The fundamental laws governing the physics have been known since the beginning of the 20th century, with the development of quantum mechanics. The difficulty, then, does not lie in formulating the problem, but in actually solving it. Atom-by-atom large-scale first-principle calculations have become critical for supplementing experimental investigations and obtaining detailed electronic structure properties and reliable characterization of emerging nanomaterials. These simulations are essential to assist the everyday work of numerous engineers and scientists and can impact a wide range of disciplines (engineering, physics, chemistry, and biology) that span the technological fields of computing, sensing and energy.

In spite of the enormous progress that has been made in the last few decades, the room for improvement in first-principle calculations is still significant. Traditional numerical and modeling techniques are indeed largely inadequate to cope with the new generation of challenges encountered in large-scale nanoengineering applications, including systems with many thousands of atoms. Atomistic simulations must adapt to meet the current needs in scalability by capitalizing on the massively parallel capabilities of modern high-performance computing (HPC) platforms. Additionally, well-established public or commercial software packages were originally intended to investigate the basic electronic structure properties of materials using ground-state calculations. They possess only limited capabilities for performing excited-state calculations that can efficiently model and predict quantum many-body effects in emerging nanomaterials. The ability to capture these fundamental nanophysics effects is increasingly important for exploring and prototyping new revolutionary functional materials in nanotechnology. There is an urgent need in nanoengineering for new quantum-based transformative solutions that will play a key role in future electronics including plasmonics, phononics and excitonics. Future breakthroughs could enable disruptive technologies to compete directly with CMOS or be integrated into existing systems to increase throughput and decrease power dissipation. To this end, new one- and two-dimensional nanostructures (e.g. graphene, carbon nanotubes, MoS$_2$, layered transition metal dichalcogenides) have been the center of large research efforts, with first-principle atomistic simulations playing a significant role. Plasmonic devices, which rely on collective many-body effects, have also shown promise as high-frequency analog sensors to be used in biomedical applications and telecommunications.

This paper presents the entire computational process needed to bring fundamental first-principle calculations up to the level where they can significantly impact innovations in nanoengineering. As depicted in Figure 1, modern first-principle calculations must be capable of: (i) achieving both accuracy and scalability; (ii) allowing for the study of emerging many-body functionality; and (iii) fully taking advantage of recent advances in algorithm development for high-performance computing (HPC). The basics of first-principle modeling are first summarized in Section II. From ground-state to excited-state calculations, the NESSIE modeling framework is then introduced in Section III. NESSIE's accuracy and scalability are described in detail using various simulation results. The paper ends by discussing plasmonics in Section IV as a nanoengineering application of large-scale first-principle calculations.

arXiv:2002.06732v1 [cond-mat.mtrl-sci] 17 Feb 2020


Fig. 1. Requirements in modern first-principle calculations at a glance.

II. FIRST-PRINCIPLE MODELING: FROM PHYSICS TO ALGORITHMS

The field of first-principle modeling can be broadly separated into three categories: (i) physical models that reduce the complexity of the full many-body solution while keeping much of the important physics (the choice of a physical model is often motivated by the objectives of the simulations); (ii) discretization and mathematical models that transform physical equations into the language of linear algebra; and (iii) computing and numerical algorithms to solve the resulting problems. In order to improve on current software implementations by fully capitalizing on modern HPC platforms, it is essential to revisit all the stages of the electronic structure modeling process, which are briefly summarized in the following.

A. Physical Modeling

A direct numerical treatment of the full many-body Schrödinger equation leads to a deceptively simple linear eigenvalue problem, which is well known to be intractable because its dimension grows exponentially with the number of particles. This limitation has historically motivated the need for lower levels of sophistication in the description of the electronic structure using a single-electron picture approximation, where the size of the Hamiltonian operator ends up scaling linearly with the number of electrons. First-principle electronic structure calculations are usually performed within the single-electron picture [1], [2] using either quantum chemistry (i.e. post Hartree-Fock) methods or, as an alternative to wave function based methods, Density Functional Theory (DFT) associated with the Kohn-Sham equations [3], [4]. Although DFT does not allow for systematically improvable accuracy as traditional quantum chemistry techniques do, it is the method of choice when dealing with moderate sized systems containing more than a handful of atoms. DFT has been widely used in computational material science for decades, since it provides (in principle) an exact method for calculating the ground-state density and energy of a system of interacting electrons using a non-linear single-electron Schrödinger-like equation associated with exchange-correlation (XC) functionals. In practice, the reliability of DFT depends on the numerical approximations used for the XC terms, which range from the simplest local density approximation (LDA) or the generalized gradient approximation (GGA) to more advanced (hybrid) schemes that are still the subject of active research efforts [5]–[8]. Solutions of the DFT/Kohn-Sham problem are routinely used in the calculations of many ground-state properties including: total energy and ionization potential, crystal-atomic structure, ionic forces, vibrational frequencies, and phonon bandstructure via perturbation theory.

Although DFT cannot fundamentally provide information on excited states and many-body properties, the Kohn-Sham eigenvectors are often needed by more advanced techniques: either Green's function-based [9] (e.g. GW, Bethe-Salpeter) or time-dependent density-based (i.e. TDDFT [10], [11]) approaches. The pros and cons of these approaches are discussed in Ref. [12]. TDDFT, proposed by Runge and Gross [13], continues to gain popularity as one of the most numerically affordable many-body techniques capable of providing fairly accurate results. TDDFT has been successfully applied to calculate many physical observables of the time-dependent Hamiltonian, such as excitation energies and complex permittivities, as well as non-linear phenomena. It is often used to obtain the absorption spectra of complex molecular systems. While the design of advanced time-dependent XC functionals is still a challenging task [14], ALDA (Adiabatic LDA) for TDDFT has been found to perform extremely well on a wide variety of systems by capturing many nanoscopic effects (such as plasmonic effects) which, in turn, can be quantitatively compared with experimental data.

TDDFT calculations can be performed in the frequency or real-time domain. The real-time TDDFT technique is a relatively recent approach introduced by Yabana and Bertsch in [15], [16], and it has become an important focus of TDDFT research activities. It has notably been integrated into the software packages Octopus [17], [18], NWChem [19], and GPAW [20] for the study of molecular systems. In essence, spectroscopic information can be obtained using the standard formalism of dipole time-response from weak short-polarized impulses in any given direction of the system, which requires all the occupied single-electron wave functions to be propagated (non-linearly) in time. The imaginary part of the dipole's Fourier transform provides the dipole strength function. The absorption spectrum is then obtained along with the expected "true many-body" excited energy levels. In contrast to the numerical models derived from the TDDFT linear response theory in the frequency domain [10], [21]–[23], the real-time TDDFT approach is better suited for achieving linear parallel scalability and it can also address any form of non-linear response, including ion dynamics [24].
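The post-processing step described above (dipole response in time, then the imaginary part of its Fourier transform) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: the function name `dipole_strength`, the exponential damping window, the prefactor convention S(ω) = (2ω/π) Im α(ω), and above all the synthetic two-mode dipole signal, which stands in for the output of an actual time propagation.

```python
import numpy as np

def dipole_strength(times, dipole, omegas, kick=1.0, damping=0.05):
    """Hypothetical helper: S(w) = (2 w / pi) * Im[alpha(w)], where alpha(w)
    is the damped Fourier transform of the induced dipole divided by the
    strength of the initial impulse (kick)."""
    dt = times[1] - times[0]
    window = np.exp(-damping * times)               # artificial line broadening
    phases = np.exp(1j * np.outer(omegas, times))   # e^{i w t} for every (w, t)
    alpha = phases @ (dipole * window) * dt / kick  # rectangle-rule integral
    return 2.0 * omegas / np.pi * alpha.imag

# Synthetic stand-in for a propagated dipole moment: two oscillating modes.
times = np.arange(0.0, 200.0, 0.05)
dipole = 0.8 * np.sin(1.0 * times) + 0.3 * np.sin(2.0 * times)
omegas = np.linspace(0.5, 2.5, 201)
spectrum = dipole_strength(times, dipole, omegas)

# Locate the two absorption peaks of the synthetic signal.
strongest = omegas[np.argmax(spectrum)]
upper = omegas > 1.5
second = omegas[upper][np.argmax(spectrum[upper])]
print(f"peaks near {strongest:.2f} and {second:.2f}")
```

In this toy setting the peaks of the strength function sit at the oscillation frequencies of the dipole signal, which is how excitation energies are read off a real-time spectrum; the damping parameter only sets the width of the peaks.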

B. Mathematical Modeling and Discretization

Although first-principle calculations have provided a practical (i.e. numerically tractable) path for solving the electronic structure problem, they have also introduced new numerical challenges. Within the single-electron picture, the resulting eigenvalue problem becomes fully non-linear since the Hamiltonian operator depends on all the occupied eigenfunctions (i.e. H(ψ)ψ = Eψ). In practice, this non-linear eigenvector problem is commonly addressed using direct minimization schemes or self-consistent field (SCF) methods, wherein a series of linear eigenvalue problems (i.e. Hψ = Eψ) needs to be solved iteratively until convergence. Computing the electron density at a given iteration step becomes one of the most time-consuming and challenging parts of DFT electronic structure calculations. Successfully reaching convergence by performing SCF iterations is of paramount importance to first-principle electronic structure calculation software. Real-time TDDFT also comes with its own set of mathematical and numerical challenges for performing the time propagation; those will be discussed further in Section III.

To perform the numerical calculations, the mathematical models first need to be discretized by expanding the wave functions over a set of basis functions. One can identify three main discretization techniques that have been widely used over the past four decades by both the quantum chemistry and the solid-state physics communities [1]: (i) the linear combination of atomic orbitals (LCAO) (along with the dominant use of Gaussian local basis sets), (ii) the plane wave expansion scheme, and (iii) the real-space mesh techniques [25]–[37] (also loosely called numerical grids) based on the finite difference method (FDM), the finite element method (FEM), spectral element or wavelet methods. Each of these approaches has pros and cons.

• Plane waves have traditionally been used within the solid-state physics community because their natural periodicity can be easily applied to crystal structures. However, this can be cumbersome when dealing with finite systems, where the computational domain must be made much larger than the molecular size to ensure that interactions due to periodicity are negligible. Additionally, they often make use of pseudopotentials to mimic the effects of core electrons, which do not directly participate in chemical bonding and would otherwise necessitate a very large number of plane waves due to their high-frequency variations.

• LCAO benefits from a large collection of local basis sets that have been improved and refined throughout the years by the quantum chemistry community to obtain a high level of accuracy in simulations. However, LCAO bases may suffer from numerical truncation errors of finite expansions, and the solutions cannot be universally and systematically improved towards convergence.

• Real-space mesh techniques provide a natural way of quantifying atomic information by employing universal local mathematical approximations. They can easily handle the treatment of various boundary conditions, such as Dirichlet (for the confined directions), periodic, or absorbing (for transport simulations). Similarly to plane wave schemes, however, the high level of refinement needed to capture the core electrons may be problematic.

In all cases, the level of approximation in the discretization stage is bounded by the capabilities of the numerical algorithms for solving the resulting system matrices. In modern nanoelectronic applications, one aims at fully utilizing the power of modern HPC architectures to tackle large-scale finite systems by exploiting parallelism at multiple levels. In this context, real-space mesh techniques offer the most significant advantages. They produce very sparse matrices that can take advantage of recent advances made in O(N) linear scaling methods and domain decomposition techniques.
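To make the sparsity argument concrete, here is a minimal sketch of a real-space discretization. It assumes a 1D harmonic oscillator in atomic units, a uniform grid, second-order finite differences and Dirichlet boundaries; this is a deliberately simple stand-in, not the FEM discretization used in NESSIE, and the function name `fd_hamiltonian` is hypothetical.

```python
import numpy as np

def fd_hamiltonian(n_points=400, half_width=8.0):
    """Finite-difference H = -1/2 d^2/dx^2 + 1/2 x^2 on a uniform grid."""
    x = np.linspace(-half_width, half_width, n_points)
    h = x[1] - x[0]
    diagonal = 1.0 / h**2 + 0.5 * x**2               # kinetic + potential
    off_diagonal = -0.5 / h**2 * np.ones(n_points - 1)
    # Dense storage is used only to keep the sketch short; a production
    # code would keep the three diagonals in a sparse format.
    H = np.diag(diagonal) + np.diag(off_diagonal, 1) + np.diag(off_diagonal, -1)
    return x, H

x, H = fd_hamiltonian()
energies = np.linalg.eigvalsh(H)[:3]        # lowest three eigenvalues
fill = np.count_nonzero(H) / H.size         # fraction of non-zero entries
print(energies, fill)
```

The lowest eigenvalues approach the analytical values 0.5, 1.5 and 2.5 as the grid is refined, while fewer than 1% of the matrix entries are non-zero on this grid. Local basis functions couple only neighboring mesh points, which is the property that linear-scaling and domain-decomposition methods exploit.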

C. Computing

Much of the progress in this field is directly tied to advancements in algorithmic research allowing larger and more complex systems to be simulated. Within the SCF-DFT procedure, computing the electron density by solving the linear and symmetric eigenvalue problem at each iteration becomes the major computational challenge. The characterization of complex systems and nanostructures of current technological interest requires the repeated computation of many tens of thousands of eigenvectors, for eigenvalue systems that can have sizes in the tens of millions. It is important to mention that a Green's function-based formalism can alternatively be used for computing the electron density directly (using efficient evaluations of the diagonal elements of the Green's function along a complex contour, e.g. [38]–[40]). However, this method gives rise to difficulties in algorithmic complexity (i.e. $O(N^2)$ for 3D systems), parallel scalability and accuracy. In that regard, it is difficult to bypass the wave function formalism, and progress in large-scale electronic structure calculations can then be tied to advances in numerical algorithms for addressing the eigenvalue problem in particular.

Traditional methods for solving the eigenvalue problem (including Arnoldi, Lanczos methods, or other Davidson-Jacobi techniques [41], [42]) and related packages [43] are largely unable to cope with these challenges. In particular, they suffer from the orthogonalization of a very large basis when many eigenpairs are computed. In this case, a divide-and-conquer approach that can compute wanted eigenpairs by parts becomes mandatory, since 'windows' or 'slices' of the spectrum can be computed independently of one another and orthogonalization between eigenvectors in different slices is no longer necessary. These issues have motivated the development of a new family of eigensolvers based on contour integration techniques [44]–[47], such as the FEAST eigensolver [48], [49]. FEAST is an optimal accelerated subspace iterative technique for computing interior eigenpairs that makes use of a rational filter to approximate the spectral projector [50]. FEAST can be applied for solving both standard and generalized forms of Hermitian or non-Hermitian problems [71]. Once a given search interval is selected, FEAST's main computational task consists of solving a set of independent linear systems along a complex contour. Not only does the FEAST algorithm feature some remarkable and robust convergence properties [50], [51], it can also exploit natural parallelism at three different levels (L1, L2 or L3): (L1) search intervals can be treated separately (no overlap), (L2) linear systems can be solved independently across the quadrature nodes of the complex contour, and (L3) each complex linear system with multiple right-hand sides can be solved in parallel. Parallel resources can be placed at all three levels simultaneously in order to achieve scalability and optimal use of the computing platform.
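The structure of a contour-integration eigensolver can be sketched in a few lines of dense NumPy. This is an illustrative simplification and not the FEAST library: it assumes a real symmetric standard eigenvalue problem, a circular contour with Gauss-Legendre quadrature on its upper half, a fixed number of refinement loops, and a QR orthonormalization before the Rayleigh-Ritz step. The name `feast_sketch` and all of its parameters are hypothetical.

```python
import numpy as np

def feast_sketch(A, emin, emax, m0, n_nodes=8, n_iter=4, seed=0):
    """Eigenpairs of real symmetric A with eigenvalues in [emin, emax]."""
    n = A.shape[0]
    center, radius = 0.5 * (emin + emax), 0.5 * (emax - emin)
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    thetas = 0.5 * np.pi * (1.0 - nodes)      # angles on the upper half-circle
    Y = np.random.default_rng(seed).standard_normal((n, m0))
    identity = np.eye(n)
    for _ in range(n_iter):
        Q = np.zeros((n, m0))
        # One shifted linear system per quadrature node; the solves are
        # independent of each other (parallelism level L2 in the text).
        for theta, weight in zip(thetas, weights):
            z = center + radius * np.exp(1j * theta)
            Qz = np.linalg.solve(z * identity - A, Y)
            Q += 0.5 * weight * np.real(radius * np.exp(1j * theta) * Qz)
        # Rayleigh-Ritz projection onto the filtered subspace.
        Q, _ = np.linalg.qr(Q)
        ritz_values, W = np.linalg.eigh(Q.T @ A @ Q)
        Y = Q @ W                              # Ritz vectors seed the next loop
    inside = (ritz_values >= emin) & (ritz_values <= emax)
    return ritz_values[inside], Y[:, inside]

# Test problem with a known spectrum: the 1D discrete Laplacian.
n = 100
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
values, vectors = feast_sketch(A, 0.0, 0.105, m0=16)
exact = 2.0 - 2.0 * np.cos(np.arange(1, 11) * np.pi / (n + 1))
residual = np.linalg.norm(A @ vectors - vectors * values)
print(len(values), residual)
```

The quadrature sum approximates the spectral projector onto the eigenvectors whose eigenvalues lie inside the contour, so the subspace size `m0` only needs to exceed the number of eigenvalues in the search interval (ten here). Running the same routine on several non-overlapping intervals corresponds to level L1.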

III. FIRST-PRINCIPLE CALCULATIONS USING NESSIE

A first-principle simulation software must be capable of addressing all the modern challenges summarized in Figure 1. One of the major goals in modern first-principle calculations is to develop numerical algorithms and simulation software for electronic structure that can scale the system size to thousands of atoms, without resorting to additional approximations beyond the DFT and TDDFT physical models. The target computing architecture usually comprises thousands of processor cores and contains multiple hierarchical levels of parallelism.

The NESSIE project [52] is an electronic structure code that uses a real-space finite element (FEM) discretization and domain decomposition to perform all-electron ground-state DFT and real-time excited-state TDDFT calculations. The code is written to take advantage of multi-level parallelism to target systems containing many distributed-memory compute nodes. Custom numerical algorithms have been developed for the eigenvalue problems and linear systems representing the major linear algebra operations within the software. NESSIE's capabilities can be separated into three main categories: (i) accurate large-scale full core potential DFT calculations using real-space FEM discretization and domain decomposition; (ii) TDDFT real-time propagation for efficient spectroscopic calculations allowing the study of many-body effects; and (iii) massively parallel implementation on modern high-end computing platforms using state-of-the-art parallel algorithms/solvers.

The next sections present a step-by-step description of NESSIE's modeling framework applied to the benzene molecule as an example.

A. An All-Electron HPC Framework

In NESSIE, the equations are discretized using FEM with quadratic (P2) or cubic (P3) order, along with a muffin-tin domain-decomposition (DD) technique. The latter was proposed as early as the 1930s [53] to specifically address a multi-center atomic system. The whole simulation domain is separated into multiple atom-centered regions (i.e. muffins) and one large interstitial region. Without any loss of generality, Figure 2 illustrates the essence of the muffin-tin domain decomposition using FEM, applied to the benzene molecule. The 3D finite-element muffin-tin mesh can be built in two steps: (i) a 3D atom-centered mesh which is highly refined around the nucleus to capture the core states, and (ii) a much coarser 3D interstitial mesh that connects all the muffins (generated in NESSIE using the Tetgen software [54], [55]). For the atom-centered mesh, which is common to all atoms of the same atomic number, it is convenient to use successive layers of polyhedra similar to the ones proposed in [37]. This discretization provides both tetrahedra of good quality and an arbitrary level of refinement, i.e. the distance between layers can be arbitrarily refined while approaching the nucleus (this is known as an h-refinement for FEM).

The muffin-tin decomposition can bring flexibility in the discretization step (different basis sets can also be used independently to describe the different regions), reduce the main computational effort to the interstitial region alone, and should also guarantee maximum linear parallel scalability performance. It is important to note that, independently of the type of atoms, the outer layer of the muffin consistently provides the same (relatively small) number of connectivity nodes $n_j$ with the interstitial mesh at the muffin edges (i.e. $n_j = 98$ or $218$ nodes using quadratic P2 or cubic P3 FEM, respectively). Consequently, the size of the system matrix in the interstitial region stays independent of the size of the atom-centered regions, and the approach can then ideally deal with full potential (all-electron).

Fig. 2. Using a muffin-tin domain-decomposition method, the whole simulation domain is separated into multiple atom-centered regions (i.e. muffins) and one large interstitial region. The top figure represents a 2D section of the local FEM discretization using a coarse interstitial mesh (represented only partially here) connecting all of the atoms of a benzene molecule. The bottom figures represent a finer mesh for the atomic (muffin) regions suitable for capturing the highly localized core states around the nuclei.

Once the "Schrödinger" eigenvalue problem (i.e. Hψ = Eψ) is reformulated using domain decomposition strategies, the resulting (and still exact) problem takes a non-linear form in the interstitial region (i.e. $H_I(E)\psi_I = E\psi_I$, since the boundary conditions at the interfaces with the muffins are energy dependent). As originally pointed out by Slater in 1937 while introducing the muffin-tin augmented plane wave (APW) method, this non-linear eigenvalue problem gives rise to an energy-dependent secular equation which cannot be handled by traditional eigenvalue algorithms. Although solving such a non-linear problem explicitly is not impossible [56], [57], it remains practically challenging and it is still the subject of active research efforts [58], [59]. Therefore, the mainstream approaches to all-electron (i.e. full-potential) electronic structure calculations in the solid-state physics community have mostly relied on approximations, such as direct linearization techniques, which have been improved throughout the years (e.g. LAPW, LMTO, LAPW+lo, etc.) [60]–[64]. Alternatively, linear eigenvalue problems can be obtained directly from pseudopotential approximation techniques [65]–[68] that eliminate the core states by introducing smooth but non-local potentials in muffin-like atom-centered regions.
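The energy dependence of the interstitial problem can be made explicit with a short block elimination. The sketch below assumes an orthonormal basis (identity overlap matrix) and uses hypothetical block labels: M for all muffin unknowns and I for the interstitial unknowns. With a FEM basis, E would multiply a mass matrix instead, and the notation used in Refs. [69], [70] may differ.

```latex
\begin{align}
\begin{pmatrix} H_{MM} & H_{MI} \\ H_{IM} & H_{II} \end{pmatrix}
\begin{pmatrix} \psi_M \\ \psi_I \end{pmatrix}
&= E \begin{pmatrix} \psi_M \\ \psi_I \end{pmatrix}, \\
\psi_M &= (E - H_{MM})^{-1} H_{MI}\,\psi_I, \\
\underbrace{\left[ H_{II} + H_{IM}\,(E - H_{MM})^{-1} H_{MI} \right]}_{H_I(E)}\,\psi_I
&= E\,\psi_I .
\end{align}
```

The second line follows from the first block row and holds whenever E is not an eigenvalue of the muffin block alone. Substituting it into the second block row gives the third line, in which the muffins enter only through an energy-dependent boundary term; this is why the reduced problem is non-linear in E even though the original one is linear.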

In NESSIE, an exact strategy has been introduced for performing all-electron electronic structure calculations within a parallel computing environment [69], [70]. The approach relies on the shift-and-invert capability of eigenvalue algorithms such as FEAST, which leads to formulating well-defined linear systems. Domain decomposition methods have been well studied and are a natural framework for addressing large sparse linear systems generated from real-space meshes. They are often associated with the use of distributed-memory numerical algorithms to address the data distribution using the Message Passing Interface (MPI) paradigm. Consequently, the solution of FEAST's linear systems can be fully parallelized using MPI, since the muffin-tin decomposition naturally allows each muffin to be factorized and solved independently. When the muffin-tin decomposition is applied to a given linear system, the resulting system in the interstitial domain (a.k.a. the Schur complement) remains linear. Figure 3 illustrates how the muffin-DD can be used to optimally solve FEAST's linear systems. The details of the muffin-DD strategy implementation have been provided in Ref. [69]. In comparison with the linearization techniques discussed above, the set of 'pivot energies' used to evaluate the interstitial Hamiltonian system is now explicitly provided by the FEAST algorithm (i.e. they correspond to quadrature nodes in the complex plane), and they guarantee global convergence toward the correct solutions (i.e. no approximation is needed). Since the complexity of the interstitial system scales linearly with the number of atoms while including non-locality only at the interfaces with the muffins, one can also demonstrate that this all-electron framework is (paradoxically) capable of better scalability performance than pseudopotential approaches on parallel architectures.

Fig. 3. In NESSIE, the muffin-DD solver operates in three stages: (i) the atoms are distributed across the MPI processes and factorized at a given energy pivot (i.e. the shift value provided by FEAST), and the boundary conditions (i.e. self-energy) are then derived at the interfaces of the muffins; (ii) the resulting interstitial problem (Schur complement) is solved in parallel; and (iii) knowing the exact solution at the muffin interfaces, the solution within each atom is retrieved in parallel.
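The three stages can be mimicked on a small dense system. This is a serial sketch under stated assumptions, not NESSIE's implementation: the blocks are random symmetric matrices rather than FEM matrices, the muffins do not couple to each other directly, the coupling blocks satisfy M_Ik = M_kI^T, and the shift `z` is a hypothetical stand-in for a FEAST quadrature node. The function name `muffin_dd_solve` is illustrative.

```python
import numpy as np

def muffin_dd_solve(muffins, couplings, interstitial, rhs_muffins, rhs_interstitial):
    """Solve a block-arrow system M x = b by eliminating the muffin blocks.

    muffins[k]   : M_kk, shifted block of atom k
    couplings[k] : M_kI, coupling of atom k to the interstitial unknowns
    """
    schur = interstitial.astype(complex)
    reduced_rhs = rhs_interstitial.astype(complex)
    eliminated = []
    # Stage (i): each atom is treated independently of the others.
    for M_kk, M_kI, b_k in zip(muffins, couplings, rhs_muffins):
        X_k = np.linalg.solve(M_kk, M_kI)      # M_kk^{-1} M_kI
        y_k = np.linalg.solve(M_kk, b_k)       # M_kk^{-1} b_k
        schur -= M_kI.T @ X_k                  # boundary ("self-energy") term
        reduced_rhs -= M_kI.T @ y_k
        eliminated.append((X_k, y_k))
    # Stage (ii): interstitial problem (Schur complement).
    x_I = np.linalg.solve(schur, reduced_rhs)
    # Stage (iii): recover the solution inside each atom.
    x_muffins = [y_k - X_k @ x_I for X_k, y_k in eliminated]
    return x_muffins, x_I

rng = np.random.default_rng(0)
sizes, n_I = [4, 3], 5
z = 0.3 + 0.5j                                 # hypothetical complex shift

def random_symmetric(n):
    a = rng.standard_normal((n, n))
    return 0.5 * (a + a.T)

muffins = [z * np.eye(n) - random_symmetric(n) for n in sizes]
couplings = [rng.standard_normal((n, n_I)) for n in sizes]
interstitial = z * np.eye(n_I) - random_symmetric(n_I)
b_muffins = [rng.standard_normal(n) for n in sizes]
b_I = rng.standard_normal(n_I)

x_muffins, x_I = muffin_dd_solve(muffins, couplings, interstitial, b_muffins, b_I)
x_dd = np.concatenate(x_muffins + [x_I])

# Reference: assemble the full matrix and solve it in one shot.
n_tot = sum(sizes) + n_I
M = np.zeros((n_tot, n_tot), dtype=complex)
offset = 0
for M_kk, M_kI in zip(muffins, couplings):
    n_k = M_kk.shape[0]
    M[offset:offset + n_k, offset:offset + n_k] = M_kk
    M[offset:offset + n_k, n_tot - n_I:] = M_kI
    M[n_tot - n_I:, offset:offset + n_k] = M_kI.T
    offset += n_k
M[n_tot - n_I:, n_tot - n_I:] = interstitial
x_ref = np.linalg.solve(M, np.concatenate(b_muffins + [b_I]))
print(np.max(np.abs(x_dd - x_ref)))
```

Because the shift is fixed before the elimination, the reduced interstitial system is an ordinary linear system, which matches the remark above that the Schur complement remains linear. In a distributed setting, the loop of stage (i) and the list comprehension of stage (iii) are the parts that would run concurrently, one atom per process.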

The FEAST solver has recently undergone a significant upgrade to support the MPI-MPI-MPI distributed parallel programming model in v4.0 [85] (where the last 'MPI' refers to the linear system solves at level L3). Furthermore, NESSIE can take advantage of the first two MPI levels of parallelism offered by FEAST, assuming that the eigenvalue spectrum is distributed among the compute nodes. At the second level L2, FEAST can naturally distribute all the linear systems associated with a given search interval (typically fewer than 10). At the first level L1, FEAST enables 'spectrum slicing', where all the intervals of interest are solved in parallel. The use of spectrum slicing is essential to address a major bottleneck in large-scale DFT calculations concerning the computation and storage of the DFT wave functions. The storage requirement, in particular, keeps increasing linearly with the number of electrons in the nanostructures. Even with a simplified physical model such as DFT, it becomes particularly difficult to scale the electronic structure problem for systems containing more than a few hundred electrons without the ability to perform spectrum slicing. This technique is illustrated for benzene in Figure 4, using three elliptical contours for FEAST: one that computes the core states and two others that each compute half of the valence states. In larger systems, each contour often contains hundreds of eigenvalues.

[Figure 4 plot: two panels, "Core States" (real energy from −268 to −260 eV) and "Valence States" (real energy from −24 to 0 eV). Horizontal axis: Real Energy (eV). Vertical axis: Imaginary Energy (eV), from −3 to 3. Legend: Contour, Full Contour, Eigenvalue.]

Fig. 4. Splitting the full search interval into three separate contours for benzene. The lowest energy contour captures the core states while the other two target valence electrons. Each half-contour has eight quadrature nodes shown as circles. Symmetry allows the FEAST algorithm to perform computations only for the upper half of the contour (solving eight independent linear systems per interval).
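The idea behind the contours of Figure 4 can be illustrated with the contour-integral spectral projector, used here only to count the states enclosed by each slice. This is a simplified sketch and not the FEAST library: it assumes circular rather than elliptical contours, a midpoint (trapezoidal) quadrature rule, a small dense model matrix with made-up energy levels, and a hypothetical helper named `count_states`. As in the caption, only the eight nodes of the upper half-contour are evaluated and the lower half is obtained by symmetry.

```python
import numpy as np

def count_states(H, center, radius, n_half=8):
    """Approximate the number of eigenvalues of the real symmetric matrix H
    inside a circle, as the trace of a quadrature approximation of the
    spectral projector (1 / 2*pi*i) * contour integral of (z*I - H)^{-1} dz."""
    n = H.shape[0]
    identity = np.eye(n)
    projector = np.zeros((n, n))
    for j in range(n_half):
        theta = np.pi * (j + 0.5) / n_half       # node on the upper half-circle
        z = center + radius * np.exp(1j * theta)
        resolvent = np.linalg.solve(z * identity - H, identity)
        # The conjugate node of the lower half contributes the complex
        # conjugate, so only the real part survives in the sum.
        projector += (radius * np.exp(1j * theta) * resolvent).real / n_half
    return np.trace(projector)

# Model spectrum (assumed values): two "core" levels, four "valence" levels
# and two unoccupied levels, hidden behind a random orthogonal transformation.
rng = np.random.default_rng(1)
levels = np.array([-10.0, -9.9, -2.0, -1.5, -1.0, -0.5, 1.0, 2.0])
Q, _ = np.linalg.qr(rng.standard_normal((levels.size, levels.size)))
H = Q @ np.diag(levels) @ Q.T

# Three independent slices, each given as (center, radius).
slices = {"core": (-9.95, 0.5),
          "valence (lower half)": (-1.75, 0.5),
          "valence (upper half)": (-0.75, 0.5)}
counts = {name: count_states(H, c, r) for name, (c, r) in slices.items()}
```

Each entry of `counts` is close to 2 for this model spectrum. The slices do not exchange any data, so each one could be assigned to a separate group of processes, and within a slice the linear solves at the individual nodes are also independent.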

B. DFT Ground-State Calculations and Scalability

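The DFT/Kohn-Sham problem can be expressed as:

$$\left[-\frac{\hbar^2}{2m}\nabla^2 + v_{KS}[n](\mathbf{r})\right]\psi_i = E_i\psi_i, \quad \text{with } n(\mathbf{r}) = 2\sum_{i=1}^{N_e}|\psi_i(\mathbf{r})|^2, \tag{1}$$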

where the Kohn-Sham potential, $v_{KS}[n] = v_H[n] + v_{XC}[n] + v_{ext}$, is composed of the Hartree potential $v_H$ (solution of the Poisson equation), the XC potential $v_{XC}$, and the other external potential $v_{ext}$, which includes the ionic core potential. The $N_e$ lowest occupied electronic states $\psi_i$ ($i = 1, \ldots, N_e$) are needed to compute the electron density (the factor 2 stands for the electron spin). Formally, the system (1) forms a non-linear eigenvector problem, which is commonly addressed using a self-consistent field (SCF) method wherein a series of linear eigenvalue problems needs to be solved iteratively until convergence. The naive approach, which consists of updating the input electron density at each SCF iteration directly from the output electron density, results in large oscillations between SCF iterations. This approach is very unlikely to converge, as the initial guess for the density is usually far from the ground-state solution. Instead, traditional SCF methods employ successive approximation iterates of a fixed-point mapping to generate the new input electron density (a.k.a. ’mixing’ techniques).

1) Discussions on Convergence: Using a mixing technique, two different electron densities are considered to construct the input density at the $(k+1)$th SCF iteration: the input density $\rho^k_{in}$ used to construct the Kohn-Sham Hamiltonian and the output density $\rho^k_{out}$ computed from the wave functions. With simple mixing, the input electron density for the next iteration can be computed as

$$\rho^{k+1}_{in} = (1-\beta)\rho^k_{in} + \beta\rho^k_{out}, \tag{2}$$

where the parameter β is usually chosen to be less than 1/2. This scheme, however, converges very slowly. In order to increase the convergence rate, more sophisticated methods have been developed for solving this fixed-point problem. Newton methods cannot be used in electronic structure calculations since it is impractical to construct the Jacobian matrix. Quasi-Newton methods, which do not require the Jacobian or Hessian, were developed in the 1960s, notably by Anderson [72] and Broyden [73]. These techniques were later refined in the 1980s in the context of SCF iterations by Pulay [74], [75] and have since been expanded upon [76]–[79]. They are also related to Krylov methods and GMRES [80]–[82]. For electronic structure calculations, these iterative techniques are usually referred to as Direct Inversion of the Iterative Subspace (DIIS) methods. The general idea is to build the input electron density as a linear combination of past densities. One can then construct the input mixing density,

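$$\tilde{\rho}^{k+1}_{in} = c_0\rho^0_{in} + c_1\rho^1_{in} + \cdots + c_k\rho^k_{in}, \tag{3}$$

and the output mixing density,

$$\tilde{\rho}^{k+1}_{out} = c_0\rho^0_{out} + c_1\rho^1_{out} + \cdots + c_k\rho^k_{out}, \tag{4}$$

from the previous input and output densities (the tilde distinguishes the two mixing densities from the actual input density of the next iteration). The input for the next iteration is chosen as a linear combination of the mixing densities:

$$\rho^{k+1}_{in} = (1-\beta)\tilde{\rho}^{k+1}_{in} + \beta\tilde{\rho}^{k+1}_{out}, \tag{5}$$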

where β is again referred to as the mixing parameter. This approach can be truncated in order to keep the size of the density subspace small. Including more densities in the mixing subspace can result in better convergence, but with diminishing returns. Keeping a history of ten to twenty input and output densities seems to be more than sufficient. Better performance can be obtained by choosing a larger value of β, but this may also result in instability. However, convergence can be improved by progressively increasing the β parameter along the SCF iterations.

The coefficients $c_0, \ldots, c_k$ of (3) and (4) are the same for both the input and output mixing subspaces. They are computed by solving a $k \times k$ linear system with one right-hand side,

$$M_{k\times k}\, c_{k\times 1} = r_{k\times 1}, \tag{6}$$

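where the $i$th element of $r$ depends on the difference between the output and input densities of the current iteration $k$ and of a previous iteration $i$,

$$r_i = \int_\Omega \rho^k_{out,in} \times \left[\rho^k_{out,in} - \rho^i_{out,in}\right] d\Omega, \tag{7}$$

with $\rho^p_{out,in} = (\rho^p_{out} - \rho^p_{in})$, and each element $M_{ij}$ of the matrix $M$ takes into account the densities at iterations $i$ and $j$:

$$M_{ij} = \int_\Omega \left[\rho^k_{out,in} - \rho^i_{out,in}\right] \times \left[\rho^k_{out,in} - \rho^j_{out,in}\right] d\Omega. \tag{8}$$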
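The mixing procedure of (2)-(8) can be sketched on a toy fixed-point problem. The code below is a minimal illustration and not the NESSIE implementation. It makes several assumptions: a linear map stands in for the Kohn-Sham density map, plain dot products stand in for the integrals over Ω, the coefficient of the current iterate is fixed by normalization (so that only the $k$ coefficients attached to past iterates are unknown), and a least-squares solve is used for (6) because the small matrix $M$ can become singular. The name `scf_mixing` is hypothetical. Setting `history=0` reduces the scheme to the simple mixing of (2).

```python
import numpy as np

def scf_mixing(fixed_point_map, rho0, beta=0.3, history=10,
               tol=1e-10, max_iter=5000):
    """Iterate rho_in -> rho_out with Anderson-type mixing (hypothetical helper).
    Returns the converged density and the number of iterations used."""
    rho_in = np.array(rho0, dtype=float)
    past_in, past_res = [], []
    for iteration in range(1, max_iter + 1):
        rho_out = fixed_point_map(rho_in)
        res = rho_out - rho_in                   # rho^k_{out,in}
        if np.linalg.norm(res) < tol:
            return rho_in, iteration
        mix_in, mix_res = rho_in, res
        if past_res:
            d_res = np.array([res - r for r in past_res])
            d_in = np.array([rho_in - p for p in past_in])
            M = d_res @ d_res.T                  # discrete analogue of (8)
            rhs = d_res @ res                    # discrete analogue of (7)
            c = np.linalg.lstsq(M, rhs, rcond=None)[0]   # system (6)
            mix_in = rho_in - c @ d_in           # input mixing density, (3)
            mix_res = res - c @ d_res            # output minus input mixing density
        if history > 0:
            past_in = (past_in + [rho_in])[-history:]
            past_res = (past_res + [res])[-history:]
        rho_in = mix_in + beta * mix_res         # equation (5)
    return rho_in, max_iter

# Toy contraction rho -> A rho + b with a slowly converging mode (assumed data).
rng = np.random.default_rng(0)
n = 8
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(np.linspace(-0.5, 0.9, n)) @ Q.T
b = rng.standard_normal(n)
toy_map = lambda rho: A @ rho + b
exact = np.linalg.solve(np.eye(n) - A, b)

rho_simple, it_simple = scf_mixing(toy_map, np.zeros(n), beta=0.3, history=0)
rho_anderson, it_anderson = scf_mixing(toy_map, np.zeros(n), beta=0.3, history=10)
```

On this toy problem the run with a mixing history needs far fewer iterations than simple mixing for the same tolerance, which is the qualitative behavior discussed in the text. The numbers obtained here say nothing about actual SCF iteration counts.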


The effect of the mixing ratio β and of the number of density mixing subspaces kept in memory can be seen in Figure 5, which analyzes the SCF convergence for benzene.

[Figure 5 plots: three panels showing the number of SCF iterations: "Simple Mixing" versus Beta (0.1 to 0.6), "Anderson Mixing" versus Beta (0 to 1), and "Anderson Mixing with Beta=0.1" versus the number of subspaces (0 to 32).]

Fig. 5. The effect of SCF parameters on the convergence of benzene. The convergence criterion is satisfied when the relative error on the total energy is below $10^{-10}$. For values of β = 0.1 and β > 0.5, simple mixing (left) does not converge. With Anderson mixing (middle and right), the total energy consistently converges much faster, and only a few mixing subspaces need to be kept in memory.

In NESSIE, the mixing scheme can take advantage of one important feature of the FEAST eigensolver. As the density begins to converge, the previous eigenvector subspace solution can be used as a very good initial guess for solving the current eigenvalue problem. Since the eigenvalue convergence criterion is set to only slightly exceed the current SCF convergence, FEAST needs to perform only a single subspace iteration on average and, with parallelism, solve a single linear system per diagonalization.

The other major numerical operation in ground-state DFT is the computation of the Hartree potential (the potential corresponding to a classical charge distribution) through the solution of the Poisson equation. Once the Dirichlet boundary conditions have been determined at the edge of the simulation domain, this amounts to solving a real symmetric positive-definite linear system with one right-hand side. The Poisson equation thus gives rise to a much less expensive linear system than the ones obtained with the eigenvalue computation. The computation of the boundary conditions for Poisson uses the integral form of the Poisson equation and scales as $O(N_s^2/p)$, where $N_s$ is the number of surface nodes (which stays relatively small using a coarse FEM mesh far from the atomistic region) and $p$ is the total number of MPI processes.

Selected DFT/LDA ground-state simulation results for benzene, obtained using NESSIE and other all-electron first-principle software, are reported in Table I. NESSIE results are in excellent agreement with other approaches. In addition, a real-space mesh discretization such as FEM can easily be refined either by adding more local mesh nodes or by increasing the accuracy of the basis functions. Consequently, the numerical solutions can systematically converge toward the exact solutions at the level of the physical model (LDA in the example).

| Method / Energy (eV) | $E_1$ | $E_{HOMO}$ | $E_{tot}$ |
|---|---|---|---|
| NWChem 6-311g* [19] | −266.35 | −6.40 | −6262.27 |
| NWChem cc-pvqz [19] | −266.41 | −6.52 | −6263.65 |
| P2-FEM [37] | −264.66 | −6.54 | −6226.57 |
| P3-FEM [37] | −266.38 | −6.53 | −6262.57 |
| P4-FEM [37] | −266.44 | −6.53 | −6263.78 |
| FHI-AIMS [37], [83] | −266.44 | −6.53 | −6263.83 |
| NESSIE P2-FEM | −266.39 | −7.04 | −6244.45 |
| NESSIE P3-FEM | −266.49 | −6.55 | −6263.41 |

TABLE I. DFT ENERGY RESULTS FOR BENZENE (INCLUDING THE FIRST CORE EIGENVALUE, THE HOMO LEVEL AND TOTAL ENERGY) OBTAINED USING DIFFERENT ALL-ELECTRON FIRST-PRINCIPLE SOFTWARE AND BASIS FUNCTIONS, BUT THE SAME LDA APPROACH [84] TO MODEL THE XC TERM.

2) Discussions on Scalability: In Table I, the NESSIE FEM discretization leads to system matrices of size 2,454 for the atom-centered mesh (i.e. a single muffin) using P2, or 8,155 using P3. The size of the interstitial matrix varies from 20,653 for P2 to 69,305 for P3. These system matrices can effectively be handled in parallel using the muffin-tin decomposition and spectrum slicing approach presented in Figures 3 and 4. For example, using only two search intervals (one for the core and one for the valence states, i.e. L1=2 in FEAST) and eight contour points per interval (so eight linear systems per interval, i.e. L2=8 in FEAST), a molecule like benzene with twelve atoms (i.e. L3=12 in FEAST) can effectively scale up to 2 × 8 × 12 = 192 MPI processes on HPC platforms.
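The process count quoted for benzene is simply the product of the three levels of parallelism. The short sketch below reproduces that arithmetic and shows one possible way to map a flat process index onto the three levels. Both function names and the index layout are hypothetical and are not taken from FEAST or NESSIE.

```python
def max_mpi_processes(n_intervals, n_contour_points, n_atoms):
    """Upper bound on useful MPI processes: L1 x L2 x L3."""
    return n_intervals * n_contour_points * n_atoms

def rank_to_levels(rank, n_intervals, n_contour_points, n_atoms):
    """Map a flat rank to (interval, contour point, atom) indices.
    The ordering (atoms vary fastest) is an arbitrary illustrative choice."""
    rest, atom = divmod(rank, n_atoms)
    interval, node = divmod(rest, n_contour_points)
    return interval, node, atom

# Benzene example from the text: L1 = 2, L2 = 8, L3 = 12.
benzene = dict(n_intervals=2, n_contour_points=8, n_atoms=12)
total = max_mpi_processes(**benzene)
layout = [rank_to_levels(r, **benzene) for r in range(total)]
```

Every process receives a distinct (interval, contour point, atom) triple, so no two processes duplicate the same piece of work.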

In general, the three levels of parallelism of FEAST can work together to minimize the time spent in all stages of the algorithm. The third level L3 can be used to reduce the memory per node and to decrease the solution time of both the linear system factorization and solve. The second level L2 has close to ideal scaling and, if fully utilized, can reduce the algorithmic complexity to solving a single complex linear system per FEAST iteration. The first level L1, in turn, allows for the computation of a very large number of eigenvalues by subdividing the full search interval. The simulation results obtained in Ref. [85] have demonstrated that NESSIE’s muffin-tin approach, associated with the FEAST eigensolver, is ideally suited for achieving both strong and weak scalability on high-end HPC platforms. Some results on weak scalability (i.e. the number of MPI processes increases proportionally with the number of atoms) are reported in Figure 6. These results outline, in particular, the efficiency of the muffin-tin DD solver presented in Figure 3 in comparison with other ‘black-box’ sparse parallel direct solvers.

C. Real-time TDDFT Excited States Calculations

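In TDDFT theory, all the $N_e$ occupied ground-state wave functions $\Psi = \{\psi_1, \psi_2, \ldots, \psi_{N_e}\}$, solutions of the Kohn-Sham system for the ground-state problem (1), are used as initial conditions for solving a time-dependent Schrödinger-type equation ($\forall i = 1, \ldots, N_e$):

$$i\hbar\frac{\partial}{\partial t}\psi_i(\mathbf{r},t) = \left[-\frac{\hbar^2}{2m}\nabla^2 + v_{KS}[n](\mathbf{r},t)\right]\psi_i(\mathbf{r},t), \quad \text{with } n(\mathbf{r},t) = 2\sum_{i=1}^{N_e}|\psi_i(\mathbf{r},t)|^2, \tag{9}$$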

where the electron density of the interacting system can then be obtained at any given time from the time-dependent Kohn-Sham wave functions. In principle, TDDFT can be used to calculate any time-dependent observable as a functional of the electron density. In (9), the Kohn-Sham potential becomes a functional of the time-dependent density:

$$v_{KS}(n(\mathbf{r},t)) = v_{ext}(\mathbf{r},t) + v_H(n(\mathbf{r},t)) + v_{XC}(n(\mathbf{r},t)),$$

where it is common practice to consider a local dependency on time for the XC potential term $v_{XC}$ (a.k.a. the adiabatic approximation).

| #MPI | 9 | 17 | 25 | 33 | 41 |
|---|---|---|---|---|---|
| #Atoms | 54 | 102 | 150 | 198 | 246 |
| System size | 211K | 392K | 575K | 758K | 942K |

Fig. 6. L3 weak scaling of the factorization and solve stages for a single FEAST iteration using 16 contour points (L2=1 here) and 600 right-hand sides (i.e. a search interval with up to 600 states). The matrix size is increased proportionally with the number of MPI processes (using 12 cores per MPI process on Haswell E5-2680v3, so 492 cores in total for the 246-atom system). The results show that the muffin-DD solver in Figure 3 outperforms both the MKL-Cluster-Pardiso [86] and MUMPS [87] solvers. Overall timings can easily be reduced by a factor of 16 using L2=16 MPI parallelism (using 7872 cores in total). Finally, timings can further improve by increasing the number of MPI processes for a fixed system size (strong scalability).

1) Real-Time Propagation: Assuming a constant time step $\Delta t$, the integral form of (9) introduces the time-ordered evolution operator $U(t+\Delta t, t)$, such that:

$$\Psi(t+\Delta t) = U(t+\Delta t, t)\Psi(t), \quad \text{with } U(t+\Delta t, t) = \mathcal{T}\exp\left\{-\frac{i}{\hbar}\int_t^{t+\Delta t} d\tau\, H(\tau)\right\}. \tag{10}$$

There exists a large number of efficient numerical methods for solving this real-time propagation problem [88], which can broadly be classified into two categories:

(i) PDE-based, such as the Crank-Nicolson (CN) scheme [89], where

$$U(t+\Delta t, t) = \left[1 + \frac{i}{2}\Delta t\, H(t+\Delta t/2)\right]^{-1} \times \left[1 - \frac{i}{2}\Delta t\, H(t+\Delta t/2)\right]. \tag{11}$$

This is an implicit scheme that requires solving a linear system at each time step.

(ii) Integral-based, acting directly on the evolution operator, such as the mid-point exponential rule, i.e.

$$U(t+\Delta t, t) = \exp\left\{-i\Delta t\, H(t+\Delta t/2)\right\}. \tag{12}$$

This represents the starting point for splitting methods [90], Magnus expansion [91] or other spectral decompositions [92], [93].

If $\Delta t$ is small enough, one can usually assume that $H(t+\Delta t/2) \simeq H(t)$. Alternatively, predictor-corrector schemes can be used to evaluate the Hamiltonian at $(t+\Delta t/2)$. The non-linear nature of the time propagation arises from the Kohn-Sham potential $v_{KS}$, which needs to be reevaluated at each time step to form the new Hamiltonian. The Hartree potential $v_H$ and the exchange and correlation potential $v_{XC}$ are both functionals of the electron density. The time-dependent Hartree potential is the solution of the Poisson equation, while the XC term must be computed with the instantaneous electron density (using the adiabatic approximation) and the same approximation used in ground-state calculations (such as LDA, GGA, etc.).

The CN scheme for TDDFT is particularly effective within the NESSIE framework, since the linear system solves along the time steps in (11) can take advantage of the highly scalable muffin-tin real-space domain decomposition solver shown in Figure 3. As a result, the approach can be parallelized at two different MPI levels: (i) by propagating independent chunks of the occupied wave functions along $\Delta t$; and (ii) by solving the linear system in parallel.
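A Crank-Nicolson step as in (11) can be sketched as follows. This is a toy illustration under stated assumptions, not NESSIE code: a one-dimensional finite-difference harmonic oscillator in atomic units replaces the FEM Kohn-Sham Hamiltonian, the Hamiltonian is kept fixed in time (so no density update is performed), and a dense NumPy solve replaces the muffin-tin solver. The function name `crank_nicolson_step` is hypothetical.

```python
import numpy as np

def crank_nicolson_step(H_mid, psi, dt):
    """One step of (11): solve [1 + i dt/2 H] psi_new = [1 - i dt/2 H] psi.
    The columns of psi are the propagated states (multiple right-hand sides)."""
    identity = np.eye(H_mid.shape[0])
    rhs = (identity - 0.5j * dt * H_mid) @ psi
    return np.linalg.solve(identity + 0.5j * dt * H_mid, rhs)

# Toy Hamiltonian: finite-difference kinetic term plus harmonic potential.
n_grid = 201
x = np.linspace(-8.0, 8.0, n_grid)
h = x[1] - x[0]
laplacian = (np.eye(n_grid, k=1) - 2.0 * np.eye(n_grid)
             + np.eye(n_grid, k=-1)) / h**2
H = -0.5 * laplacian + np.diag(0.5 * x**2)

# Start from the three lowest eigenstates and propagate them together.
energies, orbitals = np.linalg.eigh(H)
n_states, dt, n_steps = 3, 0.01, 200
psi = orbitals[:, :n_states].astype(complex)
for _ in range(n_steps):
    psi = crank_nicolson_step(H, psi, dt)

# For a time-independent H, each eigenstate only acquires a phase factor.
norms = np.linalg.norm(psi, axis=0)
expected = orbitals[:, :n_states] * np.exp(-1j * energies[:n_states] * dt * n_steps)
```

The norms stay equal to one because the CN propagator is unitary for a Hermitian Hamiltonian, and the propagated states follow the expected phase evolution up to the second-order time-stepping error. The columns of `psi` are independent of each other, which corresponds to the first level of parallelism mentioned above.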

Spectral-based schemes for the integral approach (12) are known to be robust and accurate, permitting larger time steps than other usual integration schemes. They are, however, rarely used in practice for large-scale simulations since a direct diagonalization of the evolution operator (12) would require solving hundreds to thousands of eigenvalue problems along the time domain (i.e. one large-scale eigenvalue problem per time step). In addition to CN, NESSIE includes an efficient spectral-based approach which relies on the efficiency of the FEAST eigensolver [92]. First, good approximations of the exponential in (12) can be obtained with a partial spectral decomposition using an eigenvector subspace four to five times the number of propagated states $\psi_j$. Since the latter are low-energy states, this truncated spectral basis is typically sufficient to accurately expand the solutions. Second, FEAST can reuse the eigenvector subspace computed in the current time step as a very good initial guess for the next one. As a result, only one or two subspace iterations are usually sufficient to obtain convergence. While the linear systems arising in CN need to be solved one after another along small time intervals, a parallel FEAST implementation permits the solution of a single linear system per larger time interval. Although the FEAST linear system is notably more computationally demanding (i.e. it includes many extended states), linear parallel scalability can still be naturally achieved using multiple search intervals and more parallel computing power. It is worth mentioning that the same strategy would not be possible with the techniques used in TDDFT linear response theory in the frequency domain (e.g. using the Casida equation [10]), where the demand in extended states is even higher and represents the bottleneck of their cubic arithmetic complexity. Consequently, if one can keep up with the demand in parallel computing power, direct diagonalizations for the real-time TDDFT formalism using FEAST become a viable high-performance alternative to other schemes, potentially capable of both higher scalability and better accuracy for obtaining linear and non-linear responses.
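The truncated spectral form of (12) can be sketched in a few lines. The example below is an assumption-laden toy, not the NESSIE/FEAST implementation: `numpy.linalg.eigh` stands in for FEAST (it computes the full spectrum, from which only the lowest states are kept), the Hamiltonian is a one-dimensional finite-difference harmonic oscillator with a weak static field in atomic units, and the function name `spectral_step` is hypothetical.

```python
import numpy as np

def spectral_step(H_mid, psi, dt, n_basis):
    """Apply exp(-i dt H) to the columns of psi using only the n_basis
    lowest eigenpairs of H (partial spectral decomposition)."""
    energies, vectors = np.linalg.eigh(H_mid)
    energies, vectors = energies[:n_basis], vectors[:, :n_basis]
    coefficients = vectors.conj().T @ psi        # expand in the truncated basis
    return vectors @ (np.exp(-1j * dt * energies)[:, None] * coefficients)

# Toy model: harmonic oscillator, then a weak step field switched on at t = 0.
n_grid = 201
x = np.linspace(-8.0, 8.0, n_grid)
h = x[1] - x[0]
laplacian = (np.eye(n_grid, k=1) - 2.0 * np.eye(n_grid)
             + np.eye(n_grid, k=-1)) / h**2
H0 = -0.5 * laplacian + np.diag(0.5 * x**2)
H = H0 + np.diag(0.05 * x)

n_states = 2                                     # number of propagated states
_, ground_orbitals = np.linalg.eigh(H0)
psi0 = ground_orbitals[:, :n_states].astype(complex)

dt = 0.5                                         # comparatively large time step
psi_truncated = spectral_step(H, psi0, dt, n_basis=5 * n_states)
psi_full = spectral_step(H, psi0, dt, n_basis=n_grid)
```

For this weakly perturbed toy system, a basis five times larger than the number of propagated states reproduces the full exponential to high accuracy, in line with the rule of thumb quoted in the text. How well this carries over to a given physical system depends on how strongly the excitation populates high-energy states.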

2) Dipole-time Response: In principle, all time-dependent observables are functionals of the density; in practice, however, the functional form is rarely known. A very useful application of TDDFT is related to spectroscopy, where the absorption and emission spectra corresponding to electronic excitations can be derived directly from the induced dipole moment $d(t)$. The latter is related to the response of the system to an applied electric field $E$:

$$d(t) = \int_{-\infty}^{t} \alpha(t-t')E(t')\,dt', \tag{13}$$

where α is defined as the dynamic polarizability. In general, $E$ and $d$ are vector quantities with x, y and z components, and α is a tensor. In computational spectroscopy, one often considers the response of the system in a given direction µ (e.g. x, y or z) associated with an excitation polarized in the same direction. The induced dipole moment in (13) can be calculated as a measure of how far the electron density $n(\mathbf{r},t)$ has moved away from its ground-state value $n_0$ along a given direction µ [17], [94], [95]:

$$d_\mu(t) = -q\int_\Omega (\mu - r_0)\left(n(\mathbf{r},t) - n_0(\mathbf{r})\right) d\mathbf{r}, \tag{14}$$

where $r_0$ stands for the molecular center of mass. As a result, for an isotropic material, it is possible to compute the dynamic polarizability by inverting (13). In the frequency domain (after Fourier transform), the expression becomes:

$$\alpha(\omega) = \frac{d(\omega)}{E(\omega)}. \tag{15}$$

In practice, it is necessary to introduce an artificial damping signal into the computed dipole moment before taking its Fourier transform:

$$d(\omega) = \int_0^T \left(d(t) \times e^{-\gamma t}\right) e^{i\omega t}\,dt, \tag{16}$$

where γ is a damping coefficient. This damping term is used to mimic the system relaxation effect, since the TDDFT simulations presented here do not explicitly account for energy dissipation (i.e. a system would physically emit energy as photons and relax back to the ground state after being excited).

In time-dependent simulations, any external electric field $E(t)$ may be considered. It is common practice, however, to use either a step potential $u(t)$ or an impulse excitation $\delta(t)$, both along a given direction [16]. Table II presents the resulting expressions for the dynamic polarizability $\alpha(\omega)$ after Fourier transforms of these particular electric fields.

| $E(t)$ | $\alpha(\omega)$ |
|---|---|
| $E_0 \times \delta(t)$ | $-d(\omega)/E_0$ |
| $E_0 \times u(t)$ | $-i\omega \times d(\omega)/E_0$ |

TABLE II. POLARIZABILITY FOR EXCITATIONS IN THE FORM OF AN IMPULSE POTENTIAL AND A STEP POTENTIAL ($E_0$ STANDS FOR THE AMPLITUDE OF THE ELECTRIC FIELD).

Finally, the imaginary part of the dynamic polarizability provides the photo-absorption cross section [96], which is related to the probability that a photon passing through the atomistic system is absorbed. A measure of the strength of this interaction (a.k.a. the oscillator strength) can be computed from the polarizability as follows [97], [98]:

$$\sigma(\omega) = \frac{4\pi\omega}{c}\,\Im\left(\alpha(\omega)\right). \tag{17}$$

Once the oscillator strength σ is plotted as a function of the frequency ω (or equivalently the absorption energy), it provides the absorption spectrum of the system. As an example, Figure 7 shows the variations of the induced dipole moment for benzene obtained after three distinct impulse excitations polarized in the x, y and z directions, as well as the corresponding three absorption spectra.
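The chain from the dipole signal to the absorption spectrum, i.e. (16), the impulse entry of Table II, and (17), can be sketched as follows. Everything numerical here is an assumption made for illustration: the dipole signal is a synthetic sum of two sinusoids rather than TDDFT output, atomic units are used (so the speed of light is about 137.036), a simple rectangle rule replaces the integral, and the function name `damped_fourier` is hypothetical.

```python
import numpy as np

def damped_fourier(signal, t, omega, gamma):
    """Discrete version of (16): integral of signal(t) e^{-gamma t} e^{i omega t} dt."""
    dt = t[1] - t[0]
    kernel = np.exp((1j * omega[:, None] - gamma) * t[None, :])
    return kernel @ signal * dt

C_LIGHT = 137.035999          # speed of light in atomic units
E0 = 0.01                     # amplitude of the impulse field (assumed)
gamma = 0.005                 # damping coefficient (assumed)

# Synthetic induced dipole with two resonances, at 0.25 and 0.45 a.u.
t = np.arange(0.0, 1000.0, 0.2)
d_t = -E0 * (1.0 * np.sin(0.25 * t) + 0.4 * np.sin(0.45 * t))

omega = np.linspace(0.05, 0.6, 276)               # frequency grid, step 0.002
d_w = damped_fourier(d_t, t, omega, gamma)
alpha = -d_w / E0                                 # impulse excitation, Table II
sigma = 4.0 * np.pi * omega / C_LIGHT * alpha.imag   # equation (17)

peak = omega[np.argmax(sigma)]
```

The strongest peak of `sigma` is recovered at the frequency of the dominant oscillation in the synthetic signal, and its width is set by the damping coefficient. A larger γ broadens the peaks, while a longer total simulation time allows a smaller γ and therefore a finer resolution.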

3) Resonances and response density: The peaks in the absorption spectrum correspond to specific quantum many-body excitations (such as plasmon, band-band, etc.). The electron dynamics for a specific peak can be investigated further by computing and then visualizing the response density in 4D. Such simulations aim at providing more details on the electron dynamics of the particular resonances, with relevant information about their nature. The response density $\delta n(\mathbf{r}, \omega)$ is the change in electron density due to an excitation at frequency ω (i.e. charge oscillations at ω). One possible way to visualize $\delta n(\mathbf{r}, \omega)$ is by applying a sinusoidal excitation at a given frequency of interest, waiting for the induced dipole moment to reach a steady state where it oscillates at ω, and plotting the 4D data when the dipole reaches a maximum and a minimum [99]. Another, more efficient approach consists of computing $\delta n(\mathbf{r}, \omega)$ directly, following the same procedure used for deriving the dipole moment in (14) and (16), which leads to:

$$\delta n(\mathbf{r}, \omega) = \int_0^T \left(\left(n(\mathbf{r},t) - n_0(\mathbf{r})\right) e^{-\gamma t}\right) e^{i\omega t}\,dt. \tag{18}$$

In practice, there is no need to store all the $n(\mathbf{r},t)$ functions, which would be prohibitive. Once the peaks/resonances of interest (i.e. $\{\omega_j\}$) have been identified in the absorption spectrum (for a given polarized excitation), one can proceed by running a new real-time TDDFT calculation to compute the response density $\delta n(\mathbf{r}, \omega_j)$ (for all frequencies $\omega_j$ at once) using an on-the-fly Fourier transform of the time-varying electron density. Results from this approach are shown in Figure 8 for a few selected peaks in the absorption spectrum of benzene when the molecule is excited with an impulse electric field polarized in the z direction (as shown in Figure 7).
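The on-the-fly Fourier transform can be sketched as an accumulation performed inside the time loop, so that only one array per frequency of interest is kept in memory. The example is a toy under stated assumptions: the function `density` is a synthetic stand-in for the propagated electron density on a one-dimensional grid, and the frequencies, damping and step sizes are arbitrary illustrative values.

```python
import numpy as np

n_grid, n_steps, dt, gamma = 64, 500, 0.1, 0.01
omegas = np.array([0.3, 0.7])                  # resonances of interest (assumed)
x = np.linspace(-1.0, 1.0, n_grid)
n0 = np.exp(-x**2)                             # synthetic ground-state density

def density(t):
    """Synthetic stand-in for n(r, t) delivered by the time propagation."""
    return n0 + 0.01 * x * np.sin(0.3 * t) + 0.005 * (x**2 - 0.5) * np.sin(0.7 * t)

# On-the-fly accumulation of (18): one complex array per frequency,
# updated at every time step, with no density history stored.
response = np.zeros((omegas.size, n_grid), dtype=complex)
for step in range(n_steps):
    t = step * dt
    weights = np.exp((1j * omegas - gamma) * t) * dt
    response += np.outer(weights, density(t) - n0)

# Reference for comparison only: store every density, then transform.
times = np.arange(n_steps) * dt
stored = np.array([density(t) - n0 for t in times])
kernel = np.exp((1j * omegas[:, None] - gamma) * times[None, :]) * dt
reference = kernel @ stored
```

Both routes give the same result, but the accumulated version needs memory proportional to the number of selected frequencies rather than to the number of time steps.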

4) Discussion on Accuracy and Reliability: The direction-independent absorption spectrum, which is computed as the average of the spectra in the x, y and z directions, can be directly and quantitatively compared with experimental data, if available. Figure 9 compares the experimental absorption spectra of various molecules with NESSIE’s first-principle simulation results (obtained at T=0K). In general, the TDDFT simulation results compare remarkably well with experimental data for a large number of atomistic systems.

While the choice of the XC functional can significantly impact the reliability of the DFT ground-state results, a simple adiabatic LDA (ALDA) approximation for TDDFT appears to be sufficient for a wide variety of systems. Using NESSIE, it is also interesting to note that the choice of P2 vs P3 FEM basis functions has only a minimal impact on the accuracy of the absorption spectrum [70]. This is clearly not the case for DFT ground-state calculations as reported in Table I, where good accuracy would require an appropriate level of refinement for FEM (using at least cubic P3 FEM). As a result, the real-time TDDFT framework appears to be resilient to some approximations (such as the choice of XC term, or basis functions) as long as they are not too far off and stay consistent throughout the time propagation.

Fig. 7. The left plots represent the relative orientation of the benzene molecule if the applied electric field was polarized from left to right. The middle plots show the induced dipole moment for benzene after an impulse excitation polarized in the x (top), y (middle) and z (bottom) direction (time steps of 5 as are used). The corresponding absorption spectra are shown in the right plots.

Fig. 8. 4D isosurface plots of the response electron density for four different peaks chosen from the absorption spectrum of z-polarized excited benzene (electric field going from left to right). From left to right, four frequency ω values are represented: 6.91 eV, 10.04 eV, 13.61 eV, and 16.20 eV.

Besides offering the opportunity to perform X-ray spectroscopy, there are many other advantages to considering a full real-space all-electron treatment in simulations. In contrast to other approaches, a full-core real-space potential offers numerical consistency while performing simulations in the time domain. Transferability issues are indeed likely to arise with the use of pseudopotentials that are generated for time-independent calculations or, in turn, with the use of LCAO bases, which cannot offer the same reliability to capture both confined and extended states (although LCAO bases can be “augmented” in time-dependent simulations). Comparisons between NESSIE and LCAO approaches are reported in Table III. Although NESSIE is using both a low level of real-space approximation (P2-FEM) and a rather simple XC term (LDA), the results compare relatively well with the experimental data. These results are actually much better than the ones obtained with the NWChem software using the ccpvtz basis and a more advanced XC term (B3LYP) (except for the CO molecule).

| | H2 | CH4 | CO | C6H6 |
|---|---|---|---|---|
| NWChem-ccpvtz/LDA | 12.31 | 10.29 | 8.28 | 7.10 |
| NWChem-ccpvtz/B3LYP | 12.90 | 10.75 | 8.55 | 7.18 |
| NESSIE-P2/LDA | 11.19 | 9.40 | 8.40 | 6.90 |
| Experiment | 11.19 | 9.70 | 8.55 | 6.90 |

TABLE III. COMPARISON OF REAL-TIME TDDFT NESSIE AND NWCHEM SIMULATION RESULTS WITH THE EXPERIMENTAL DATA FOR THE LOWEST EXCITATION ENERGIES (IN EV), AS REPORTED IN [114].

5) Discussion on large-scale real-time TDDFT simulations: The computing challenges end up being very similar between ground-state DFT and excited-state real-time TDDFT calculations. There are only two main operations to consider: (i) solving a Hamiltonian linear system with multiple right-hand sides using, in particular, the muffin-tin technique in Figure 3; and (ii) solving a linear system for the Poisson equation (using local XC). For DFT, these two steps have to be repeated self-consistently until convergence, while for TDDFT, they need to be repeated at each time step of the time propagation (using either a Crank-Nicolson or a spectral decomposition scheme). In comparison to other approaches, the muffin-tin solver has demonstrated great efficiency in achieving strong and weak scalability [85] (see Figure 6). The scalability bottleneck of the muffin-tin solver would eventually come from solving the interstitial Hamiltonian system in parallel using MPI (Schur complement in step 2 of Figure 3). In practice, our strong and weak scalability results show that up to one thousand atoms (corresponding to an interstitial matrix of size ∼1M), this system can be efficiently solved using a standard direct ’black-box’ parallel sparse system solver (such as cluster MKL-PARDISO [86]). Reaching the milestone of ten thousand atoms and beyond is still the subject of active research efforts that investigate new directions in numerical linear algebra, such as the use of hybrid parallel solvers with customized low-communication preconditioners.


Fig. 9. Comparison between the computed absorption spectrum for various molecules using NESSIE (with ALDA and P2 FEM) and multiple sets of experimental values for H2 (Exp-A [100], B [101], C [102] and D [103]), CO [104], H2O (Exp-A [105], B [106], C [107] and D [108]), CH4 (Exp-A [109] and B [110]), SiH4 [111], and C6H6 (Exp-A [112] and B [113]).

IV. NANOPLASMONIC APPLICATIONS

Nanoplasmonics is a field that has grown rapidly in the last few years [115], [116], and it already offers numerous applications to electronics and photonics. In the visible and near IR range, nano-antennas [117] and nanoparticles have provided drastically enhanced coupling to electromagnetic waves [118], [119]. A 2008 comment in Nature Nanotechnology [117] stated “molecular components promise to revolutionize the electronics industry, but the vision of devices built from quantum wires and other nanostructures remains beyond present day technology. Making such devices will require an extremely detailed knowledge of the properties of these components such as the dynamics of charge carriers, electron spins and various excitations, on nanometer length scales and subpicosecond timescales at very high (up to THz) frequencies”. Reference [117] points out that single-wall carbon nanotube (SWCNT) resonators would constitute a unique THz ultra-compact circuit element, which might for example be used to control a THz oscillator source. Similar opportunities exist in the IR and visible ranges. One concludes that discovery of new plasmonic materials is mandatory for the future expansion of the field of nanoplasmonics.

In Ref. [99], NESSIE has been used to provide evidence of the plasmon resonances (collective electron excitations) in a number of representative short 1D finite carbon-based nanostructures using real-time TDDFT simulations. The simulated systems ranged from small molecules such as C2H2 to various carbon nanostructures that are equivalent to 1D conductors with finite lengths, including carbon chains, narrow armchair and zigzag graphene nanoribbons (i.e. acenes and PPP), and short carbon nanotubes (CNT). NESSIE’s all-electron TDDFT/ALDA model was able to accurately capture the bright components of the spectra which account for the plasmonic excitation. The chief signature of 1D plasmons is a high-frequency excitation that is inversely proportional to the length of the conductor. In particular, it was shown that metallic 1D CNTs can be well described with the Tomonaga-Luttinger theory [99], [120]–[124]. The plasmon velocity is expected to reach an asymptotic value (up to 3 to 5 times the single particle Fermi velocity) when the simulations are extended to tens of unit cells, such as very long CNTs that become relevant for THz spectroscopy.

Since the reported preliminary work on short CNTs [99] (about 5 unit cells), NESSIE has been upgraded to simulate large-scale atomistic systems by taking advantage of new high-performance computing techniques such as the muffin-DD solver presented and discussed in Figures 2 and 3. This all-electron real-space and real-time TDDFT framework is now capable of simulating very large structures (up to tens of unit cells for CNTs, from a hundred atoms to a few thousand), leading to more relevant predicted data on the plasmonic effects for 1D systems. As an example, Table IV summarizes the main NESSIE parameters and simulation results while considering increasingly longer (3,3)-CNTs. These results show that the position of the plasmon excitation peak (lowest excitation energy/frequency, i.e. $E_p$, $f_p$) keeps shifting with longer CNTs. The corresponding absorption spectra are provided in Figure 10.

| | | 5-CNT | 10-CNT | 20-CNT | 40-CNT |
|---|---|---|---|---|---|
| System | Length (nm) | 1.26 | 2.49 | 4.99 | 9.99 |
| System | #Atoms | 78 | 138 | 258 | 498 |
| System | #Electrons | 408 | 768 | 1488 | 2928 |
| Mesh | size muffin | 2065 | 2065 | 2065 | 2065 |
| Mesh | size inters. | 149499 | 126431 | 233696 | 446559 |
| Mesh | size total | 302925 | 397877 | 741182 | 1426125 |
| DFT | $E_1$ (eV) | −267.73 | −268.05 | −268.07 | −268.28 |
| DFT | $E_{HOMO}$ (eV) | −4.996 | −4.823 | −5.000 | −4.955 |
| DFT | $E_{tot}$ (eV) | −67713 | −128825 | −251351 | −496359 |
| RT-CN | $\Delta t$ (fs) | 0.01 | 0.01 | 0.01 | 0.01 |
| RT-CN | Total T (fs) | 15 | 30 | 45 | 65 |
| RT-CN | #Time-steps | 1500 | 3000 | 4500 | 6500 |
| TDDFT | $E_p$ (eV) | 1.62 | 1.24 | 0.82 | 0.525 |
| TDDFT | $f_p$ (THz) | 394.13 | 299.83 | 198.27 | 126.94 |
| TDDFT | $v_p$ ($10^6$ m/s) | 0.98 | 1.49 | 1.98 | 2.54 |

TABLE IV. NESSIE’S MAIN PARAMETERS AND SIMULATION RESULTS OBTAINED USING THE ALL-ELECTRON/DFT/TDDFT/ALDA FRAMEWORK APPLIED TO INCREASINGLY LONGER (3,3)-CNTS. THE REAL-SPACE MESH USES A P2-FEM DISCRETIZATION WHILE THE REAL-TIME APPROACH USES A CN (RT-CN) PROPAGATION SCHEME. THE SIMULATIONS WERE EXECUTED ON XSEDE-COMET [125].

In Table IV, the plasmon velocity is obtained with thereasonable assumption that the plasmon (collective electroncloud) must travel back and forth the full length L of thenanotube to complete a single oscillation (i.e. vp = 2Lfp).This is further supported by the 4-dimensional isosurface plots


Fig. 10. Computed absorption spectra for three (3,3)-CNTs presented in Table IV (10, 20 and 40 unit cells). The selected energy range outlines the lowest excitation peak, which keeps shifting left (red-shift) with longer nanostructures. The corresponding 4D isosurface plots of the response electron density (on top) provide visual information about the dynamics of these plasmonic excitations.

of the response density δn(ω, r) (18) in Figure 10, which have been calculated for the specific plasmon resonances. The plasmon velocity increases to 2.54 times the Fermi velocity (here at 10^6 m/s) for the 40 unit cells (3,3)-CNT, and it is expected to level off if we keep increasing the length of the CNT.
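The relation vp = 2Lfp used for the last row of Table IV can be evaluated directly. The short sketch below is illustrative only (the function and variable names are assumptions, not NESSIE code); it takes the lengths and plasmon frequencies from Table IV and also reports the ratio to the Fermi velocity of 10^6 m/s quoted in the text.

```python
# Fermi velocity used as the reference in the text (10^6 m/s).
FERMI_VELOCITY_M_S = 1.0e6

# Lengths, plasmon frequencies and velocities copied from Table IV.
length_nm = {"5-CNT": 1.26, "10-CNT": 2.49, "20-CNT": 4.99, "40-CNT": 9.99}
fp_thz = {"5-CNT": 394.13, "10-CNT": 299.83, "20-CNT": 198.27, "40-CNT": 126.94}
vp_table_1e6 = {"5-CNT": 0.98, "10-CNT": 1.49, "20-CNT": 1.98, "40-CNT": 2.54}


def plasmon_velocity(length_nm_value, fp_thz_value):
    """Return vp = 2 L fp in m/s.

    The plasmon is assumed to cross the tube length L twice per period,
    so the distance covered per oscillation is 2 L.
    """
    length_m = length_nm_value * 1e-9
    frequency_hz = fp_thz_value * 1e12
    return 2.0 * length_m * frequency_hz


vp = {name: plasmon_velocity(length_nm[name], fp_thz[name]) for name in length_nm}

for name, velocity in vp.items():
    ratio = velocity / FERMI_VELOCITY_M_S
    print(f"{name}: vp = {velocity / 1e6:.2f} x 10^6 m/s "
          f"({ratio:.2f} x Fermi velocity, table: {vp_table_1e6[name]})")
```

Using the rounded lengths and frequencies of Table IV, the recomputed velocities match the vp row to within about 0.02 x 10^6 m/s.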

V. CONCLUSION

Modern first-principle calculations aim at bringing computational activities up to the level where they can significantly impact innovations in electronic nanomaterials and devices research. Nanostructures with many atoms and electrons can only be treated by addressing the efficiency and scalability of algorithms on modern computing platforms with multiple hierarchical levels of parallelism. These goals can be achieved using an efficient modeling framework that can perform real-space DFT ground-state calculations and real-time TDDFT excited-state calculations. The latter can be used to study various relevant quantum many-body effects (such as plasmonic effects) by performing electronic spectroscopy. Spectroscopic techniques are among the most fundamental probes of matter: incoming radiation perturbs the sample and the response to this perturbation is measured. The system is inherently excited in this process and hence a calculation of ground-state properties is insufficient to interpret the response of the system. TDDFT has had considerable success modeling the interaction of electromagnetic fields with matter, and obtaining spectroscopic information with absorption and emission spectra.

The paper discussed the NESSIE software, which has been fundamentally designed to take advantage of parallel optimization at various levels of the entire modeling process. NESSIE benefits from the linear scaling capabilities of real-space mesh techniques and domain decomposition methods to perform all-electron (full core potential) calculations. The modeling approach is tailored to take full advantage of the state-of-the-art FEAST eigensolver, which can achieve significant parallel scalability on modern HPC architectures. With the success in meeting these challenges, NESSIE is currently able to extend first-principle simulations to very large atomistic structures (i.e. many thousands of electrons at the level of all-electron/DFT/TDDFT/ALDA theory). The modeling framework opens new perspectives for addressing the numerical challenges in TDDFT excited-state calculations, to cover the full range of electronic spectroscopy and to study the nanoscopic many-body effects in arbitrarily complex molecules and finite-size large-scale nanostructures. It is expected that the NESSIE software and associated numerical components can become a valuable new tool for the scientific community to investigate the fundamental electronic properties of numerous nanostructured materials.

ACKNOWLEDGMENT

This work was also supported by the National Science Foundation, under grants #CCF-1510010 and SI2-SSE#1739423. The CNT calculations used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562.

REFERENCES

[1] R. M. Martin, Electronic Structure: Basic Theory and Practical Methods, Cambridge University Press, (2004).

[2] J. Kohanoff, Electronic Structure Calculations for Solids and Molecules: Theory and Computational Methods, Cambridge University Press (2006).

[3] P. Hohenberg and W. Kohn, Inhomogeneous Electron Gas, Phys. Rev. 136, B864 - B871 (1964).

[4] W. Kohn and L. J. Sham, Self-Consistent Equations Including Exchange and Correlation Effects, Phys. Rev. 140, A1133 - A1138 (1965).

[5] K. Burke, Perspective on density functional theory, J. Chem. Phys. 136, 150901, (2012).

[6] L. J. Sham, Theoretical and Computational Development Some efforts beyond the local density approximation, International Journal of Quantum Chemistry Vol. 56, 4, pp 345-350 (2004).

[7] V. I. Anisimov, Strong Coulomb Correlations in Electronic Structure Calculations: Beyond the Local Density Approximation, Gordon and Breach Science Publisher, (2000).

[8] E. M. Stoudenmire, L. O. Wagner, S. R. White, and K. Burke, One-dimensional Continuum Electronic Structure with the Density Matrix Renormalization Group and Its Implications For Density Functional Theory, Phys. Rev. Lett. 109, I5, 056402 (2012).

[9] W. G. Aulbur, L. Jonsson, J. W. Wilkins, Quasiparticle Calculations in Solids. Solid State Physics 54: 1-218, (2000).

[10] C. A. Ullrich, Time-Dependent Density-Functional Theory: Concepts and Applications. Oxford University Press, 2012.

[11] M. A. L. Marques, N. T. Maitra, F. M. S. Nogueira, E. K. U. Gross, A. Rubio, Fundamentals of Time-Dependent Density Functional Theory, Lecture Notes in Physics vol. 837, Springer-Verlag, Berlin Heidelberg (2012).

[12] G. Onida, L. Reining, A. Rubio, Electronic excitations: density-functional versus many-body Green's-function approaches, Rev. Mod. Phys. 74, pp 601-658 (2002).

[13] E. Runge, E. K. U. Gross, Density-Functional Theory for Time-Dependent Systems, Phys. Rev. Lett., 52, pp 997-1000, (1984).

[14] K. Burke, J. Werschnik, E. K. U. Gross, Time-dependent density functional theory: Past, present, and future, J. Chem. Phys. 123, 062206 (2005).

[15] K. Yabana, G. F. Bertsch, Application of the time-dependent local density approximation to optical activity, Phys. Rev. A 60, 1271 (1999).

[16] K. Yabana, T. Nakatsukasa, J.-I. Iwata, and G. F. Bertsch, Real-time, real-space implementation of the linear response time-dependent density-functional theory, Phys. Stat. Sol. (b) 243, No. 5, pp 1121-1138 (2006).


[17] X. Andrade, J. Alberdi-Rodriguez, D. A. Strubbe, M. J. T. Oliveira, F. Nogueira, A. Castro, J. Muguerza, A. Arruabarrena, S. G. Louie, A. Aspuru-Guzik, A. Rubio, and M. A. L. Marques, Time-dependent density-functional theory in massively parallel computer architectures: the octopus project, J. Phys.: Cond. Matt. 24 233202 (2012).

[18] Octopus http://www.tddft.org/programs/octopus/

[19] NWChem http://www.nwchem-sw.org/index.php/Main_Page

[20] GPAW https://wiki.fysik.dtu.dk/gpaw/

[21] M.E. Casida, Time-dependent density-functional response theory for molecules, in Recent Advances in Density Functional Methods, Part I, edited by D.P. Chong (Singapore, World Scientific, 1995), p 155.

[22] E. Lorin de la Grandmaison, S. B. Gowda, Y. Saad, M. L. Tiago, and J. R. Chelikowsky. Efficient computation of the coupling matrix in time-dependent density functional theory. Computer Physics Communications, 167:7–22, (2005).

[23] W. R. Burdick, Y. Saad, L. Kronik, Manish Jain, and James Chelikowsky. Parallel implementations of time-dependent density functional theory. Computer Physics Communications, 156:22–42, (2003).

[24] S. Meng, E. Kaxiras, Real-time, local basis-set implementation of time-dependent density functional theory for excited state dynamics simulations, J. of Chem. Phys. 129, 054110 (2008).

[25] T. Beck, Real-space mesh techniques in density-functional theory, Rev. of Modern Phys., 72, 1041, (2000).

[26] P. F. Batcho, Computational method for general multicenter electronic structure calculations, Phys. Rev. B, 61, 7169, (2000).

[27] S. R. White, J. W. Wilkins, M. P. Teter, Finite Element method for electronic structure, Phys. Rev. B, 39, 5819 (1989).

[28] J. R. Chelikowsky, N. Troullier, Y. Saad, Finite-Difference-Pseudopotential method: electronic structure calculations without a basis, Phys. Rev. Lett. 72, 1240, (1994).

[29] E. Tsuchida, M. Tsukada, Electronic-structure calculations based on finite-element method, Phys. Rev. B, 52, 5573, (1995).

[30] N. A. Modine, G. Zumbach, E. Kaxiras, Adaptive-coordinate real-space electronic structure calculations for atoms, molecules, and solids, Phys. Rev. B, 55, 10289, (1997).

[31] E. L. Briggs, D. J. Sullivan, J. Bernholc, Real-space multigrid-based approach to large-scale electronic structure calculations, Phys. Rev. B, 54, 14362, (1996).

[32] M. Heiskanen, T. Torsti, M. J. Puska, and R. M. Nieminen, Multigrid method for electronic structure calculations, Phys. Rev. B, 63, 245106, (2001).

[33] S. Goedecker, Linear scaling electronic structure methods, Rev. of Modern Phys., 71, 1085, (1999).

[34] J. E. Pask, B. M. Klein, P. A. Sterne, C. Y. Fong, Finite-element methods in electronic structure theory, Comput. Phys. Comm., 135, 1, (2001).

[35] J.R. Chelikowsky, The pseudopotential-Density Functional method applied to nanostructures, J. Phys. D 33, R33 (2000).

[36] T. Torsti, T. Eirola, J. Enkovaara, T. Hakala, P. Havu, V. Havu, T. Höynälänmaa, J. Ignatius, M. Lyly, I. Makkonen, T. Rantala, K. Ruotsalainen, E. Räsänen, H. Saarikoski, M. J. Puska, Three real-space discretization techniques in electronic structure calculations, Physica Status Solidi (b) 243, No. 5, 1016-1053 (2006).

[37] L. Lehtovaara, V. Havu, and M. Puska, All-electron density functional theory and time-dependent density functional theory with high-order finite elements, J. Chem. Phys. 131, p054103 (2009).

[38] S. Baroni and P. Giannozzi, Towards very large scale electronic structure calculations, Europhys. Lett. 17, 547 (1992).

[39] L. Lin, C. Yang, J. Meza, J. Lu, L. Ying, W. E, SelInv - An Algorithm for Selected Inversion of a Sparse Symmetric Matrix, ACM Trans. on Math. Softw. (TOMS), V.37 I4, no40 (2011).

[40] D. Zhang and E. Polizzi, Linear scaling techniques for first-principle calculations of large nanowire devices, 2008 NSTI Nanotechnology Conference and Trade Show. Technical Proceedings, Vol. 1 pp 12-15 (2008).

[41] G. Golub, H. A. van der Vorst. Eigenvalue computation in the 20th century, J. of Comput. and Appl. Math., V. 123, pp. 35-65 (2000).

[42] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe and H. van der Vorst, Templates for the solution of Algebraic Eigenvalue Problems: A Practical Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, (2000).

[43] http://www.netlib.org/utk/people/JackDongarra/la-sw.html

[44] T. Sakurai and H. Sugiura, A projection method for generalized eigenvalue problems using numerical integration, Journal of Computational and Applied Mathematics, 159 (2003), pp. 119–128.

[45] T. Sakurai and H. Tadano, CIRR: a Rayleigh-Ritz type method with contour integral for generalized eigenvalue problems, Hokkaido Mathematical Journal, 36 (2007), pp. 745–757.

[46] A. Imakura, L. Du, and T. Sakurai, A block Arnoldi-type contour integral spectral projection method for solving generalized eigenvalue problems, Applied Mathematics Letters, 32 (2014), pp. 22–27.

[47] A. P. Austin and L. N. Trefethen, Computing eigenvalues of real symmetric matrices with rational filters in real arithmetic, SIAM Journal on Scientific Computing, 37 (2015), pp. A1365–A1387.

[48] E. Polizzi, Density-matrix-based algorithm for solving eigenvalue problems, Phys. Rev. B, 79, p115112, (2009).

[49] The FEAST eigenvalue solver, http://www.feast-solver.org

[50] P. Tang, E. Polizzi, FEAST as a Subspace Iteration EigenSolver Accelerated by Approximate Spectral Projection, SIAM Journal on Matrix Analysis and Applications, (SIMAX), 35, 354 (2014).

[51] S. Guttel, E. Polizzi, P. T. Tang, G. Viaud, Optimized Quadrature Rules and Load Balancing for the FEAST Eigenvalue Solver, SIAM Journal on Scientific Computing (SISC), 37 (4), pp 2100-2122 (2015).

[52] NESSIE Software http://www.nessie-code.org

[53] J. C. Slater, Wave Functions in a Periodic Potential, Phys. Rev., 51, pp 846-851, (1937).

[54] H. Si, TetGen, a Delaunay-Based Quality Tetrahedral Mesh Generator, ACM Trans. Math. Softw. 41, 2, Art. 11 (2015).

[55] TetGen http://wias-berlin.de/software/tetgen

[56] B. N. Harmon and D. D. Koelling, Technique for rapid solution of the APW secular equation, J. Phys. C: Solid-state Phys., 7, p210, (1974).

[57] E. Sjostedt, L. Nordstrom, Efficient solution of the non-linear augmented plane wave secular equation, J. Phys.: Condens. Matter, 14, pp 12485-12494, (2002).

[58] W-J Beyn, An integral method for solving nonlinear eigenvalue problems, Linear Algebra and its Applications, V. 436, I. 10, 3839, (2012).

[59] B. Gavin, A. Miedlar, E. Polizzi, FEAST eigensolver for nonlinear eigenvalue problems, Journal of Computational Science, V. 27, 107, (2018).

[60] O. K. Andersen, Linear methods in band theory, Phys. Rev. B, 12, pp 3060-3083, (1975).

[61] D. J. Singh, Ground-state properties of lanthanum: Treatment of extended-core states, Phys. Rev. B, 43, p6388, (1991).

[62] E. Sjostedt, L. Nordstrom, and D. J. Singh, An alternative way of linearizing the augmented plane-wave method, Solid state communications, 114, pp 15-20, (2000).

[63] G. K. H. Madsen, P. Blaha, K. Schwarz, E. Sjostedt, L. Nordstrom, Efficient linearization of the augmented plane-wave method, Phys. Rev. B, 64, p195134, (2001).

[64] D. J. Singh, L. Nordstrom, Planewaves, Pseudopotentials, and the LAPW Method, Springer, 2nd edition, (2006).

[65] H. Hellmann, A New Approximation Method in the Problem of Many Electrons, J. Chem. Phys., 3, p61, (1935).

[66] J. C. Phillips and L. Kleinman, New Method for Calculating Wave Functions in Crystals and Molecules, Phys. Rev., 116, p287, (1959).

[67] L. Kleinman and D. M. Bylander, Efficacious Form for Model Pseudopotentials, Phys. Rev. Lett., 48, p1425, (1982).

[68] P. E. Blochl, Projector augmented-wave method, Phys. Rev. B, 50, p17953, (1994).

[69] A. Levin, D. Zhang, E. Polizzi, FEAST fundamental framework for electronic structure calculations: Reformulation and solution of the muffin-tin problem, Comput. Phys. Comm. V183 I11, pp 2370–2375, (2012).

[70] J. Kestyn, Parallel Algorithms for Time Dependent Density Functional Theory in Real-space and Real-time, Doctoral Dissertations, UMass Amherst, (2018).

[71] J. Kestyn, E. Polizzi, P.T. Tang, FEAST eigensolver for non-Hermitian problems, SIAM Journal on Scientific Computing, 38, S772–S799, (2016).

[72] D. G. Anderson, Iterative procedures for nonlinear integral equations. Journal of the ACM (JACM) 12, 4, 547-560, (1965).

[73] C. G. Broyden, A class of methods for solving nonlinear simultaneous equations. Mathematics of computation 19, 92 (1965), 577-593.

[74] P. Pulay, Convergence acceleration of iterative sequences. the case of SCF iteration, Chem. Phys. Lett. 73 (2): 393-398 (1980).

[75] P. Pulay, Improved scf convergence acceleration. Journal of Computational Chemistry 3, 4, 556-560 (1982).

[76] K. N. Kudin, G. E. Scuseria, and E. Cances. A black-box self-consistent field convergence algorithm: One step closer. The Journal of chemical physics 116, 19, 8255-8261, (2002).

[77] C. Yang, W. Gao, J. C. Meza, On the convergence of the self-consistent field iteration for a class of non-linear eigenvalue problems, SIAM J. Matrix Anal. Appl., V. 30 (4), pp. 1773-1788 (2009).

[78] H. F. Walker, and P. Ni. Anderson acceleration for fixed-point iterations. SIAM Journal on Numerical Analysis 49, 4, 1715-1735, (2011).


[79] B. Gavin, E. Polizzi, Non-linear eigensolver-based alternative to traditional SCF methods, J. Chem. Phys. 138, 194101 (2013).

[80] V. Eyert, A comparative study on methods for convergence acceleration of iterative vector sequences, Journal of Computational Physics 124, pp 271-285 (1996).

[81] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM Journal on scientific and statistical computing 7, 3, 856-869, (1986).

[82] Y. Saad, J. R. Chelikowsky, and S. M. Shontz. Numerical methods for electronic structure calculations of materials. SIAM review 52, 1, 3-54, (2010).

[83] FHI-aims, https://aimsclub.fhi-berlin.mpg.de/

[84] J. P. Perdew, A. Zunger, Self-interaction correction to density-functional approximations for many-electron systems, Phys. Rev. B 23, 10, 5048, (1981).

[85] J. Kestyn, V. Kalantzis, E. Polizzi, Y. Saad, PFEAST: A High Performance Eigenvalue Solver Using Distributed-Memory Linear Solvers, Proc Int Conf High Perform Comput Networking, Storage Anal. 2016:16, (2016).

[86] Intel Math Kernel Library. http://software.intel.com/en-us/intel-mkl.

[87] P. R. Amestoy, I. S. Duff, J. L’Excellent, and J. Koster, MUMPS: a general purpose distributed memory sparse solver, in Applied Parallel Computing. New Paradigms for HPC in Industry and Academia, Springer, pp. 121–130, (2000).

[88] A. Castro, M. A. L. Marques, A. Rubio, Propagators for the time-dependent Kohn-Sham equations, J. Chem. Phys. 121, 3425 (2004).

[89] J. Crank, P. Nicolson, A practical method for numerical evaluation of solutions of partial differential equations of the heat conduction type, Proc. Camb. Phil. Soc. 43 (1): 50-67 (1947).

[90] T.Y. Mikhailova and V.I. Pupyshev, Symmetric approximations for the evolution operator, Physics Letters A 257, 1-2 p16 (1999).

[91] W. Magnus, On the exponential solutions of differential equations for a linear operator, Commun. Pure Appl. Math. VII, 649-673 (1954).

[92] Z. Chen, E. Polizzi, Spectral-based propagation schemes for time-dependent quantum systems with application to carbon nanotubes, Phys. Rev. B, Vol. 82, 205410 (2010).

[93] A. Russakoff, Y. Li, S. He, and K. Varga, Accuracy and computational efficiency of real-time subspace propagation schemes for the time-dependent density functional theory, The Journal of Chemical Physics 144, 204125 (2016).

[94] X. Andrade, S. Botti, M. A. L. Marques, and A. Rubio, Time-dependent density functional theory scheme for efficient calculations of dynamic (hyper)polarizabilities, J. Chem. Phys. 126, 184106 (2007).

[95] Y. Takimoto, A Real-time Time-dependent Density Functional Theory Method for Calculating Linear and Nonlinear Dynamic Optical Response. PhD thesis, University of Washington, (2008).

[96] V. Astapenko, Interaction of Ultrashort Electromagnetic Pulses with Matter. Springer, (2013).

[97] R. C. Hilborn, Einstein coefficients, cross sections, f values, dipole moments, and all that. American Journal of Physics 50, 11, 982-986, (1982).

[98] Z. Chen, Computational All-Electron Time-Dependent Density Functional Theory in Real-Space and Real-Time: Applications to Molecules and Nanostructures, Doctoral Dissertations, UMass Amherst, (2013).

[99] E. Polizzi, S. Yngvesson, Universal nature of collective plasmonic excitations in finite 1-D carbon-based nanostructures, Nanotechnology, 26, p325201 (2015).

[100] T. E. Sharp, Potential-energy curves for molecular hydrogen and its ions. Atomic Data and Nuclear Data Tables 2, 119-169 (1970).

[101] G. Herzberg. Molecular spectra and molecular structure. vol. 1: Spectra of diatomic molecules. New York: Van Nostrand Reinhold, 1950, 2nd ed. (1950).

[102] O. W. Richardson. Molecular hydrogen and its spectrum, vol. 23. Yale University Press, (1934).

[103] C. Backx, G. R. Wight, and M. J. Van der Wiel, Oscillator strengths (10-70 ev) for absorption, ionization and dissociation in h2, hd and d2, obtained by an electron-ion coincidence method. Journal of Physics B: Atomic and Molecular Physics 9, 2, 315 (1976).

[104] W. F. Chan, G. Cooper, and C. E. Brion, Absolute optical oscillator strengths for discrete and continuum photoabsorption of carbon monoxide (7-200 ev) and transition moments for the x 1σ+ a 1π system. Chemical physics 170, 1, 123-138 (1993).

[105] C-Y. Chung, E. P. Chew, B-M. Cheng, M. Bahou, and Y-P. Lee, Temperature dependence of absorption cross-section of h2o, hod, and d2o in the spectral region 140-193 nm. Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 467, 1572-1576, (2001).

[106] B-M. Cheng, C-Y. Chung, M. Bahou, Y-P. Lee, L.C. Lee, R. van Harrevelt, and M. C. van Hemert, Quantitative spectroscopic and theoretical study of the optical absorption spectra of h2o, hod, and d2o in the 125-145 nm region. The Journal of chemical physics 120, 1, 224-229 (2004).

[107] D. H. Katayama, R. E. Huffman, and C. L. O'Bryan, Absorption and photoionization cross sections for h2o and d2o in the vacuum ultraviolet. The Journal of Chemical Physics 59, 8, 4309-4319, (1973).

[108] W.F. Chan, G. Cooper, and C. E. Brion, The electronic spectrum of water in the discrete and continuum regions. absolute optical oscillator strengths for photoabsorption (6-200 eV). Chemical physics 178, 1-3, 387-400, (1993).

[109] F. Z. Chen, C. Y. R. Wu. Temperature-dependent photoabsorption cross sections in the VUV-UV region. I. Methane and ethane. Journal of Quantitative Spectroscopy and Radiative Transfer 85, 2, 195-209, (2004).

[110] K. Kameta, N. Kouchi, M. Ukai, and Y. Hatano. Photoabsorption, photoionization, and neutral-dissociation cross sections of simple hydrocarbons in the vacuum ultraviolet range. Journal of Electron Spectroscopy and Related Phenomena 123, 2-3, 225-238 (2002).

[111] G. Cooper, G. R. Burton, W. F. Chan, and C. E. Brion. Absolute oscillator strengths for the photoabsorption of silane in the valence and Si 2p and 2s regions (7.5-350 ev). Chemical physics 196, 1-2, 293-306, (1995).

[112] A. Dawes, N. Pascual, S. V. Hoffmann, N. C. Jones, and N. J. Mason. Vacuum ultraviolet photoabsorption spectroscopy of crystalline and amorphous benzene. Physical Chemistry Chemical Physics 19, 40, 27544-27555 (2017).

[113] E. E. Rennie, C. A. F. Johnson, J. E. Parker, D. M. P. Holland, D. A. Shaw, and M. A. Hayes. A photoabsorption, photodissociation and photoelectron spectroscopy study of c6h6 and c6d6, Chemical physics 229, 1, 107-123, (1998).

[114] K. Lopata, and N. Govind, Modeling Fast Electron Dynamics with Real-Time Time-Dependent Density Functional Theory: Application to Small Molecules and Chromophores, J. Chem. Theory Comput., 7 (5), pp 1344-1355 (2011).

[115] M.I. Stockman, Nanoplasmonics: the physics behind the applications, Phys. Today 64, 39-44 (2011).

[116] N.J. Halas, Connecting the dots: reinventing optics for nanoscale dimensions, Proc. Natl. Acad. Sci. 106, 3463 (2009).

[117] L. Novotny, and N. van Hulst, Antennas for light, Nature Photonics 5, 83 (2011).

[118] H. Cang, A. Labno, C. Lu, X. Yin, M. Liu, C. Gladden, Y. Liu and X. Zhang, Probing the electromagnetic field of a 15 nanometre hotspot by single molecule imaging, Nature 469, 385 (2011).

[119] J.B. Khurgin, How to deal with the loss in plasmonics and metamaterials, Nature Nanotechnology 10, 2-6 (2015).

[120] C. Kittel, Introduction to Solid State Physics 8th edn, New York: Wiley; (2005).

[121] S. Tomonaga, Remarks on Bloch's method of sound waves applied to many-fermion problems, Prog. Theor. Phys. 5, 544-69 (1950).

[122] J. Luttinger, An exactly soluble model of a many-fermion system, J. Math. Phys. 4, 1154-62 (1963).

[123] V.K. Deshpande, M. Bockrath, L. Glazman and A. Yacoby, Electron liquids and solids in one dimension, Nature 464, 209 (2010).

[124] C. Kane, L. Balents and P.A. Fisher, Coulomb interactions and mesoscopic effects in carbon nanotubes, Phys. Rev. Lett. 79, 5086 (1997).

[125] J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. Peterson, R. Roskies, J. R. Scott, N. Wilkins-Diehr, XSEDE: Accelerating Scientific Discovery, Computing in Science & Engineering, vol. 16, no. 5, pp. 62-74, Sept.-Oct. (2014).

James Kestyn received his PhD in Electrical Engineering at the University of Massachusetts, Amherst in 2018. His graduate studies focused on electronic structure calculations, where he helped to develop the multi-level parallel approach used in NESSIE, and on numerical linear algebra, developing the non-Hermitian extension of the FEAST eigenvalue solver. He has held internships at Intel and Samsung and has since been working on computational electromagnetics at Stellar Science Ltd Co.


Eric Polizzi is Professor in the Department of Electrical and Computer Engineering and the Department of Mathematics and Statistics at the University of Massachusetts, Amherst. He received an education in theoretical and computational physics, and a PhD in Applied Mathematics (2001) from the University of Toulouse, France. Prior to joining UMass in 2005, he served as a postdoctoral research associate in Electrical Engineering (2002-2003) and as a senior research scientist in Computer Sciences (2003-2005), both at Purdue University. Prof. Polizzi is conducting interdisciplinary research activities at the intersection between advanced mathematical techniques, parallel numerical algorithms, and computational nanosciences. These activities can be broadly divided into three projects: the DFT/TDDFT NESSIE code, the parallel linear system solver SPIKE, and the eigenvalue solver FEAST.