Large-Scale Molecular Dynamics Simulations of Materials on Parallel Computers
Aiichiro Nakano & Priya Vashishta
Concurrent Computing Laboratory for Materials Simulations
Department of Computer Science / Department of Physics & Astronomy
Louisiana State University
Email: [email protected]  URL: www.cclms.lsu.edu
VII International Workshop on Advanced Computing & Analysis Techniques in Physics Research
Organizers: Dr. Pushpalatha Bhat & Dr. Matthias Kasemann
October 19, 2000, Fermilab, IL
Outline
1. Scalable atomistic-simulation algorithms
2. Multidisciplinary hybrid-simulation algorithms
3. Large-scale atomistic simulation of nanosystems
> Nanophase & nanocomposite materials
> Nanoindentation & nano-impact damage
> Epitaxial & colloidal quantum dots
4. Ongoing projects
Concurrent Computing Laboratory for Materials Simulations
Faculty (Physics, Computer Science): Rajiv Kalia, Aiichiro Nakano, Priya Vashishta
Postdocs/research faculty: Martina Bachlechner, Tim Campbell, Hideaki Kikuchi, Sanjay Kodiyalam, Elefterios Lidorikis, Fuyuki Shimojo, Laurent Van Brutzel, Phillip Walsh
Ph.D. Students: Gurcan Aral, Paulo Branicio, Jabari Lee, Xinlian Liu, Brent Neal, Cindy Rountree, Xiaotao Su, Satavani Vemparala, Troy Williams
Visitors: Elisabeth Bouchaud (ONERA), Antonio da Silva (São Paulo), Simon de Leeuw (Delft), Ingvar Ebbsjö (Uppsala), Hiroshi Iyetomi (Niigata), Shuji Ogata (Yamaguchi), Jose Rino (São Carlos)
Education: Dual-Degree Opportunity
• Ph.D. in physics & MS in computer science in 5 years
—Broad career options (APS News, August/September '97)
• Synergism between HPCC (MS) & application (Ph.D.) research
—Best dissertation award (Andrey Omeltchenko, '97)
—MS publications (Parallel Comput., IEEE CS&E, Comput. Phys. Commun., etc.)
• Internship—a deliverable-oriented approach to real-world problems provides excellent job training
—Boeing, NASA Ames, Argonne Nat'l Lab. (Web-based simulation/experimentation, Alok Chatterjee, Enrico Fermi Scholar, '99)
• International collaboration: Niigata, Yamaguchi (NSF/U.S.-Japan), Studsvik (Sweden), Delft (The Netherlands), São Carlos (Brazil)
• NSF Graduate Research Traineeship Program
• New program: Ph.D. in biological sciences & MS in computer science
International Collaborative Course
Web-based course involving LSU, Delft Univ. in the Netherlands, Niigata Univ. in Japan, & Federal Univ. of São Carlos in Brazil
[Diagram: virtual classroom linking Louisiana State Univ. (USA) and Delft Univ. (The Netherlands) through video conferencing, chat, and whiteboard tools; hardware includes SP, Origin, T3E, and Alpha-cluster machines, an ImmersaDesk, and a VR workbench]
DoD Challenge Applications Award: 1.3 million node-hours in 2000/2001
1. Scalable Atomistic-Simulation Algorithms
CCLMSCCLMSCCLMSCCLMS
• Peta (10^15) flop computers → direct atomistic simulations
• Scalable applications → multiresolution algorithms are key
[Plot: CMOS line width (SIA Roadmap) vs. year, 1996-2010. Teraflop era: 0.25 µm line width; petaflop era: 70 nm. Atomistic regime: 10^7-10^9 atoms (molecular dynamics); continuum regime: 10^10-10^12 atoms. Caption: Atomistic Simulation of Real Devices.]
Atomistic Simulation of Nanosystems
Molecular Dynamics Simulation
• Newton's equations of motion
• Many-body interatomic potential
> 2-body: Coulomb; steric repulsion; charge-dipole; dipole-dipole
> 3-body: bond bending & stretching
—SiO2, Si3N4, SiC, GaAs, AlAs, InAs, etc.

m_i \frac{d^2 \mathbf{r}_i}{dt^2} = -\frac{\partial V(\mathbf{r}^N)}{\partial \mathbf{r}_i} \quad (i = 1, \dots, N)

V = \sum_{i<j} u_{ij}(r_{ij}) + \sum_{i,jk} v_{jik}(\mathbf{r}_{ij}, \mathbf{r}_{ik})
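In practice these equations are integrated with an explicit scheme such as velocity Verlet. The Python/NumPy sketch below is only an illustration: it uses a simple Lennard-Jones pair force as a stand-in for the many-body potentials listed above, and all names and parameters are assumptions rather than the production code.

import numpy as np

def pair_forces(r, epsilon=1.0, sigma=1.0):
    # Illustrative 2-body (Lennard-Jones) force only; the production potential
    # also includes Coulomb, steric-repulsion, charge-dipole, dipole-dipole,
    # and 3-body bond-bending/stretching terms.
    n = len(r)
    f = np.zeros_like(r)
    for i in range(n):
        for j in range(i + 1, n):
            d = r[i] - r[j]
            r2 = np.dot(d, d)
            inv6 = (sigma * sigma / r2) ** 3
            f_ij = 24.0 * epsilon * (2.0 * inv6 * inv6 - inv6) / r2 * d
            f[i] += f_ij
            f[j] -= f_ij
    return f

def velocity_verlet_step(r, v, f, m, dt):
    # One MD step: half-kick, drift, recompute forces, half-kick.
    v_half = v + 0.5 * dt * f / m[:, None]
    r_new = r + dt * v_half
    f_new = pair_forces(r_new)
    v_new = v_half + 0.5 * dt * f_new / m[:, None]
    return r_new, v_new, f_new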
Validation of Interatomic Potentials
[Figure panels: phonon dispersion of GaAs, frequency (meV) along K-X-L-X-W-L, theory vs. experiment (Strauch & Dorner, '90); neutron static structure factor S_N(q) vs. q (Å^-1) for amorphous SiO2 and Si3N4, MD vs. expt. (Johnson et al., '83); high-pressure phase transition of SiC, density (g/cm^3) vs. pressure (GPa), MD vs. forward and reverse transitions in expt. (Yoshida et al., '93)]
Space-time Multiresolution Algorithm
Challenge 1: Scalability to billion-atom systems
• Hierarchical fast multipole method (FMM): long-range Coulomb interaction, O(N^2) → O(N)
• Multiple time-scale (MTS) method: slow long-range vs. rapid short-range forces
1.02-billion-atom MD for SiO2: 26.4 sec/step on 1,024 Cray T3E processors at NAVO-MSRC; parallel efficiency = 0.97
[Plot: scaled speedup on Cray T3E]
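The MTS idea can be sketched in a few lines (an r-RESPA-style splitting, not the group's actual code): the rapidly varying short-range force is integrated with small sub-steps, while the slowly varying long-range force, the part the FMM would supply, is applied only at the outer step. Here short_range_force and long_range_force are placeholder callables.

import numpy as np

def mts_step(r, v, m, dt_fast, n_fast, short_range_force, long_range_force):
    # One outer step of a multiple time-scale integrator.
    dt_slow = n_fast * dt_fast

    # Outer half-kick with the slow (long-range, FMM-evaluated) force
    v = v + 0.5 * dt_slow * long_range_force(r) / m[:, None]

    # Inner velocity-Verlet loop with the fast (short-range) force only
    f_fast = short_range_force(r)
    for _ in range(n_fast):
        v = v + 0.5 * dt_fast * f_fast / m[:, None]
        r = r + dt_fast * v
        f_fast = short_range_force(r)
        v = v + 0.5 * dt_fast * f_fast / m[:, None]

    # Outer half-kick with the slow force at the new positions
    v = v + 0.5 * dt_slow * long_range_force(r) / m[:, None]
    return r, v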
Wavelet-based Load Balancing
Challenge 2: Load imbalance on a parallel computer
Irregular data structures and processor speeds are mapped onto the parallel computer by "computational-space decomposition" in curved space:
> Regular mesh topology in computational space, ξ
> Curved partition in physical space, x
> Wavelet representation speeds up optimization of the map ξ(x)
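The wavelet machinery itself does not fit in a short sketch, but the load-balance idea behind computational-space decomposition can be illustrated in one dimension: pick the map so that atoms are spread uniformly in the computational coordinate ξ, then cut ξ into equal slabs per processor. This is a deliberately simplified, assumed illustration, not the curved-space 3D scheme of the talk.

import numpy as np

def balanced_partition_1d(x, n_proc):
    # Rank-based map xi(x): equal intervals in xi contain equal numbers of
    # atoms, so cutting xi into n_proc slabs balances the load even for a
    # highly non-uniform atom density.
    order = np.argsort(x)
    xi = np.empty_like(x)
    xi[order] = (np.arange(len(x)) + 0.5) / len(x)      # xi in (0, 1)
    proc = np.minimum((xi * n_proc).astype(int), n_proc - 1)
    return proc, xi

# Example: clustered plus sparse atoms still give equal per-processor counts
x = np.concatenate([np.random.normal(0.0, 0.1, 900),
                    np.random.uniform(1.0, 2.0, 100)])
proc, _ = balanced_partition_1d(x, 4)
print(np.bincount(proc))    # -> [250 250 250 250]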
Fractal-based Data Compression
Challenge 3: Massive data transfer via OC-3 (155 Mbps): 75 GB/frame of data for a 1.5-billion-atom MD!
Scalable encoding:
• Spacefilling curve—store relative positions
Result:
• I/O size: 50 Bytes/atom → 6 Bytes/atom
[Figure: atoms 1-14 numbered along a spacefilling curve]
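A hedged sketch of the encoding idea: quantize the positions, order atoms along a spacefilling curve (Morton/Z-order is used below purely for brevity; the actual encoder may use a different curve plus an entropy coder), and store only the small differences between consecutive grid coordinates.

import numpy as np

def morton_key(ix, iy, iz, bits=10):
    # Interleave the bits of the integer grid coordinates into one key.
    key = 0
    for b in range(bits):
        key |= ((ix >> b) & 1) << (3 * b)
        key |= ((iy >> b) & 1) << (3 * b + 1)
        key |= ((iz >> b) & 1) << (3 * b + 2)
    return key

def compress_positions(r, box, bits=10):
    # Quantize to a 2^bits grid, sort atoms along the spacefilling curve,
    # and keep only deltas between consecutive atoms.  Neighbours on the
    # curve are close in space, so the deltas need few bits, which is the
    # source of the ~50 -> ~6 Bytes/atom reduction quoted above.
    grid = np.clip(np.floor(r / box * 2**bits), 0, 2**bits - 1).astype(np.int64)
    keys = np.array([morton_key(*g, bits=bits) for g in grid])
    order = np.argsort(keys)
    sorted_grid = grid[order]
    deltas = np.diff(sorted_grid, axis=0, prepend=sorted_grid[:1])
    return order, deltas        # deltas would then be entropy-coded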
Variable-charge MD
Challenge 4: Complex realism—chemical reactions
Electrostatic energy (Streitz & Mintmire, '94):

V_{ES} = \sum_i \left( \chi_i^0 q_i + \frac{1}{2} J_i^0 q_i^2 \right) + \frac{1}{2} \sum_{ij} \int d^3r_1 \int d^3r_2 \, \frac{\rho_i(\mathbf{r}_1; q_i)\, \rho_j(\mathbf{r}_2; q_j)}{r_{12}}

(first sum: intra-atomic; second sum: inter-atomic)

Electronegativity equalization:
• Determine atomic charges at every MD step—O(N^3)!
• i) Fast multipole method; ii) q^{(init)}(t+Δt) = q(t) → O(N)
Multilevel preconditioned conjugate gradient (MPCG):
• Sparse, short-range interaction matrix as a preconditioner
• 20% speed-up
• Enhanced data locality: parallel efficiency 0.93 → 0.96 for 26.5-million-atom Al2O3 on 64 SP2 nodes
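A minimal preconditioned conjugate-gradient sketch of the MPCG idea, under the assumption that charge equilibration is posed as a linear solve: matvec would apply the full (FMM-accelerated) electrostatic matrix and precond would approximately invert only its sparse short-range block. Both are placeholder callables, the charge-neutrality constraint is omitted, and warm-starting with x0 = q(t) corresponds to the q^(init)(t+Δt) = q(t) trick above.

import numpy as np

def preconditioned_cg(matvec, b, precond, x0=None, tol=1e-8, max_iter=200):
    # Standard preconditioned conjugate gradient for A x = b.
    x = np.zeros_like(b, dtype=float) if x0 is None else x0.copy()
    r = b - matvec(x)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x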
Linear-Scaling Quantum-Mechanical Algorithm
Challenge 5: Complexity of ab initio QM calculations
• Density functional theory (DFT) (Kohn, '98 Nobel Chemistry Prize): O(C^N) → O(N^3)
• Pseudopotentials (Troullier & Martins, '91)
• Higher-order finite differences (Chelikowsky, Saad, et al., '94)
• Multigrid acceleration (Bernholc et al., '96)
• Spatial decomposition
O(N) algorithm (Mauri & Galli, '94):
• Unconstrained minimization
• Localized orbitals
• Parallel efficiency ~96% for a 22,528-atom GaAs system on 1,024 Cray T3E processors
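To make the higher-order finite-difference ingredient concrete, the sketch below applies an 8th-order Laplacian stencil to a wavefunction on a uniform real-space grid; the stencil order, the periodic boundary conditions (via np.roll), and everything else are assumptions for illustration, not the production solver.

import numpy as np

# 8th-order central-difference weights for the second derivative:
# C[0] is the on-site weight, C[k] multiplies psi at +/- k grid points.
C = np.array([-205.0/72, 8.0/5, -1.0/5, 8.0/315, -1.0/560])

def laplacian(psi, h):
    # Apply the 3D finite-difference Laplacian to psi on a grid of spacing h,
    # e.g. to evaluate the kinetic-energy term -(1/2) * laplacian(psi, h).
    lap = 3.0 * C[0] * psi
    for axis in range(3):
        for k in range(1, len(C)):
            lap = lap + C[k] * (np.roll(psi, +k, axis=axis) +
                                np.roll(psi, -k, axis=axis))
    return lap / h**2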
Scalable MD/QM Algorithm Suite
On 1,280 IBM SP3 processors:
• 8.1-billion-atom MD of SiO2
• 140,000-atom DFT of GaAs
[Figure: design-space diagram on 1,024 Cray T3E processors]
Immersive & Interactive Visualization
Last Challenge: Sequential bottleneck of graphics pipeline
• Octree data structure for fast visibility culling
• Multiresolution & hybrid (atom, texture) rendering
• Parallel preprocessing/predictive prefetch
• Graph-theoretical data mining of topological defects
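A schematic of the octree traversal behind the visibility culling and multiresolution rendering; frustum_contains, distance_to_eye, and the 'coarse' rendering mode are placeholder hooks a viewer would supply, so this is an assumed sketch of the technique rather than the actual renderer.

class OctreeNode:
    # Octree cell: an axis-aligned cube holding the atoms inside it.
    def __init__(self, center, half_size, atoms, children=None):
        self.center = center            # (x, y, z) of the cell centre
        self.half_size = half_size      # half edge length of the cube
        self.atoms = atoms              # indices of atoms in this cell
        self.children = children or []  # up to 8 sub-cells

def collect_visible(node, frustum_contains, distance_to_eye, coarse_cutoff, out):
    # Cells wholly outside the view frustum are culled; distant cells are
    # drawn at reduced resolution (e.g. a texture splat); only nearby visible
    # cells are expanded down to individual atoms.
    if not frustum_contains(node.center, node.half_size):
        return                                         # visibility culling
    if distance_to_eye(node.center) > coarse_cutoff or not node.children:
        out.append(('coarse' if node.children else 'atoms', node))
        return
    for child in node.children:                        # recurse into sub-cells
        collect_visible(child, frustum_contains, distance_to_eye,
                        coarse_cutoff, out)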
2. Multidisciplinary Hybrid-Simulation Algorithms
Multiscale Simulation
Lifetime prediction of safety-critical micro-electro-mechanical systems (MEMS)
• Engineering mechanics: experimentally validated above 1 µm
• Atomistic simulation: possible below 0.1 µm
[R. Ritchie, Berkeley]
Bridging the length-scale gap by seamlessly coupling:
• Finite-element (FE) calculation based on elasticity;
• Atomistic molecular-dynamics (MD) simulation;
• Ab initio quantum-mechanical (QM) calculation.
Hybrid QM/MD Algorithm
• Additive hybridization: reuse of existing QM & MD codes
• Handshake atoms: seamless coupling of the QM & MD systems
MD simulation embeds a QM cluster described by a real-space multigrid-based density functional theory:

E = E_{MD}^{system} + E_{QM}^{cluster} - E_{MD}^{cluster}

[Schematic: QM cluster inside the MD region, joined by handshake atoms]
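A minimal sketch of the additive-hybridization bookkeeping expressed by the formula above; md_calc and qm_calc are placeholder calculators returning (energy, forces) as NumPy arrays, and the handshake-atom termination of the QM cluster is omitted.

def hybrid_qm_md_energy_forces(positions, cluster_idx, md_calc, qm_calc):
    # E = E_MD(system) + E_QM(cluster) - E_MD(cluster): the whole system is
    # described by MD, the chemically active cluster is recomputed with QM,
    # and the MD description of that same cluster is subtracted so it is not
    # counted twice.  Forces follow by the same additive rule.
    e_md_sys, f_md_sys = md_calc(positions)
    cluster = positions[cluster_idx]
    e_qm_cl, f_qm_cl = qm_calc(cluster)
    e_md_cl, f_md_cl = md_calc(cluster)

    energy = e_md_sys + e_qm_cl - e_md_cl
    forces = f_md_sys.copy()
    forces[cluster_idx] += f_qm_cl - f_md_cl    # correction inside the cluster
    return energy, forces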
FE/MD/Tight-binding QM (Abraham, Broughton, Bernstein, Kaxiras, ‘98)
Hybrid MD/FE Algorithm
• FE nodes & MD atoms coincide in the handshake (HS) region
• Additive hybridization
[Schematic: FE and MD regions joined by the HS region; crystallographic directions of type [0 1 1], [1 1 1], and [2 1 1] indicated]
Oxidation on Si Surface
Dissociation energy of O2 on a Si (111) surface is dissipated seamlessly from the QM cluster through the MD region to the FE region.
[Snapshot: QM O, QM Si, and handshake H atoms of the QM cluster embedded in the MD Si region, coupled to the FE region]
3. Large-Scale Atomistic Simulation of Nanosystems
Fracture Simulation & Experiment
[Images: microcrack coalescence and multiple branching; Si3N4 and Ti3Al alloy (E. Bouchaud); graphite and glass (K. Ravi-Chandar)]

Fracture Energy of GaAs: 100-million-atom MD Simulation
256 Cray T3E processors at DoD's NAVO-MSRC
Good agreement with experiments:
Plane    Gc (MD)      Gc (expt.)
(110)    1.4 ± 0.1    1.72*, 1.52#
*Messmer ('81)   #Michot ('88)
[Snapshot: 1.3 µm system; shear-stress color scale from -0.8 to 0.8 GPa]
Si3N4-SiC Fiber Nanocomposite
Fracture surfaces in ceramic-fiber nanocomposites: toughening mechanisms?
1.5-billion-atom MD on 1,280 IBM SP3 processors at NAVO-MSRC
[Snapshot: 0.3 µm system; color code: Si3N4, SiC, SiO2; pressure color scale in GPa]
Nanoindentation on Silicon Nitride Surface
Use an Atomic Force Microscope (AFM) tip for nanomechanical testing of hardness
Highly compressive/tensile local stresses
10-million-atom MD at ERDC-MSRC
Indentation Fracture & Amorphization
<1210> indentation fracture at indenter diagonals
Amorphous pile-up at indenter edges
Anisotropic fracture toughness
[Crystallographic directions <1010> and <0001> indicated]
Hypervelocity Impact Damage
Design of damage-tolerant spacecraft
Impact graphitization
Diamond impactor
Impact velocity: 8 - 15 km/s
Diamond coating
Meteoroid detector on Mir Orbiter
Reactive bond-order potential (Brenner, ‘90)
Impact-Velocity Sensitivity
Crossover from quasi-elastic response to evaporation at ~10 km/s
[Time sequences of impacts at V = 8, 11, and 15 km/s]
Epitaxially Grown Quantum Dots
A. Madhukar (USC)
Substrate-encoded size-reducing epitaxy
GaAs (001) substrate; <100> square mesas
[Images: GaAs/AlGaAs quantum dots (QD) on an AlGaAs layer; (001) and (101) facets indicated; scale bars 10 nm and 70 nm]
Stress Domains in Si3N4/Si Nanopixels
27-million-atom MD simulation
• Stress domains in Si due to an amorphous Si3N4 film
• Stress well in Si with a crystalline Si3N4 film, due to lattice mismatch
[Snapshots of the Si and Si3N4 regions; stress color scale from -2 GPa to 2 GPa]
Colloidal Semiconductor Quantum Dots
High-pressure structural transformation in a 30 Å GaAs nanocrystal:
• Nucleation at the surface
• Multiple domains
[Snapshots at 17.5 GPa and 22.5 GPa]
Applications
• LED, display
• Pressure synthesis of novel materials
Oxide Growth in an Al Nanoparticle
Oxide thickness saturates at 40 Å after 0.5 ns—Excellent agreement with experiments
Unique metal/ceramic nanocomposite
[Snapshots of the Al core and AlOx shell; scale markers 70 Å and 110 Å]
4. Ongoing Projects
Information Grid
Metacomputing collaboration with DoD MSRCs: 4-billion-atom MD simulation of 0.35 µm fiber composites
http://www.nas.nasa.gov/aboutNAS/Future2.html
Universal access to networked supercomputing
I. Foster & C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure ('99)
MD Moore's Law
The number of atoms in MD simulations has doubled:
• Every 19 months over the past 36 years for classical MD
• Every 13 months over the past 15 years for DFT-MD
A petaflop computer will enable 10^12-atom MD & 10^7-atom QM
[Plot: number of atoms vs. year, from the CDC 3600 era to the 1,280-processor IBM SP3]
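A quick check of the arithmetic behind that projection, using only numbers quoted on these slides:

from math import log2

# Starting from the 8.1-billion-atom MD quoted earlier (year 2000) and
# doubling every 19 months, how long until classical MD reaches the
# 10^12 atoms expected of a petaflop machine?
doublings = log2(1e12 / 8.1e9)
years = doublings * 19 / 12
print(f"{doublings:.1f} doublings, about {years:.0f} years after 2000")
# -> ~6.9 doublings, roughly a decade, consistent with the petaflop
#    timeframe on the SIA-roadmap figure near the start of the talk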
Hybrid Simulation of Functionalized AFM
Nanodevices to design new biomolecules
[Schematic: Si3N4 AFM tip with coupled QM, MD, and FE regions]
Biological Computation & Visualization Center, LSU ($3.9M, 2000- )
Conclusion
Large-scale, multiscale simulations of realistic nanoscale systems will be possible in a metacomputing environment of the Information Grid.
Research supported by NSF, AFOSR, ARO, USC/LSU MURI, DOE, NASA, and the DoD Challenge Applications Award.