Sublattice Parallel Replica Dynamics (SLPRD) -...
Slide 1
Sublattice Parallel Replica Dynamics (SLPRD)
Enrique Martinez, Arthur F. Voter and
Blas P. Uberuaga
Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
Exascale Computing
Slide 2
[Images: from the Mark I (1944) to the Tianhe-2 supercomputer.]
Exascale Computing
Slide 3
"Big" computers have come a long way since the completion of the Mark I (left) at Harvard in 1944. Capable of approximately three calculations per second, the Mark I was "the first operating machine that could execute long computations
automatically," according to IBM. Today, the fastest supercomputer is the Tianhe-2 (right), developed by China’s National University of Defense Technology. The Tianhe-2 has achieved almost 34 petaflops, or 34,000,000,000,000,000 floating-point
operations per second.
Slide 4
§ Adaptation to regional climate changes such as sea level rise, drought and
flooding, and severe weather patterns;
§ Reduction of the carbon footprint of the transportation sector;
§ Efficiency and safety of the nuclear energy sector;
§ Innovative designs for cost-effective renewable energy resources
such as batteries, catalysts, and biofuels;
§ Certification of the U.S. nuclear stockpile, life extension programs, and
directed stockpile work;
§ Design of advanced experimental facilities, such as accelerators, and
magnetic and inertial confinement fusion;
§ First-principles understanding of the properties of fission and fusion
reactions;
§ Reverse engineering of the human brain;
§ Design, control and manufacture of advanced materials.
The mission and science opportunities in going to exascale are compelling.
For the growing number of problems where experiments are impossible, dangerous, or inordinately costly, extreme-scale computing will enable the solution of vastly more accurate predictive models and the analysis of massive quantities of data, producing quantum advances in areas of science and technology that are essential to DOE and Office of Science missions. For example, exascale computing will push the frontiers of the areas listed above.
U.S. economic competitiveness will also be significantly enhanced as companies use modeling and simulation at unprecedented speed and fidelity to accelerate the development of superior new products and to spur creativity. An important ingredient is the boost to continued international leadership of the U.S.-based information technology industry at all levels, from laptop to exaflop.
Slide 5
Making the transition to exascale poses numerous unavoidable scientific and technological challenges.
Every major change in computer architecture has led to changes, usually dramatic and often unanticipated, and the move to exascale will be no exception. At the hardware level, feature size in silicon will almost certainly continue to decrease at Moore’s Law pace at least over the next decade. To remain effective in high-end computing systems and in consumer electronics, computer chips must change in radical ways, such as containing 100 times more processing elements than today. Three challenges to be resolved are:
n Reducing power requirements. Based on current technology, scaling today’s systems to an exaflop level would consume more than a gigawatt of power, roughly the output of Hoover Dam. Reducing the power requirement by a factor of at least 100 is a challenge for future hardware and software technologies.
n Coping with run-time errors. Today’s systems have approximately 500,000 processing elements. By 2020, due to design and power constraints, the clock frequency is unlikely to change, which means that an exascale system will have approximately one billion processing elements. An immediate consequence is that the frequency of errors will increase (possibly by a factor of 1000) while timely identification and correction of errors become much more difficult.
n Exploiting massive parallelism. Mathematical models, numerical methods, and software implementations will all need new conceptual and programming paradigms to make effective use of unprecedented levels of concurrency.
Slide 6
Slide 7
[Diagram: Computer Scientist, Researcher, Compiler.]
Fully atomistic models
n Classical Molecular Dynamics
• Solves Newton's equations of motion
• All the physics is included in the empirical potential
• Extremely parallelizable in space for large systems
• For small systems the parallel efficiency drops
n Parallel Replica Dynamics
• Parallelizes the time evolution
• Assumptions:
— Infrequent events
— Exponential distribution of first-escape times
n Parareal
• Parallelizes the time evolution
Slide 8
$m\ddot{\mathbf{r}} = -\nabla U + \mathbf{f}$
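The notes above only state the equation of motion. As a concrete, purely illustrative sketch of what "classical MD" means here, the following Python snippet integrates $m\ddot{\mathbf{r}} = -\nabla U$ with velocity Verlet for a small Lennard-Jones cluster; the potential, the parameters, and the omission of the extra force term $\mathbf{f}$ (e.g., a thermostat) are assumptions made for illustration, not the empirical potential or code used in this work.

```python
import numpy as np

# Minimal velocity-Verlet MD sketch for m * r'' = -grad U (no thermostat or random force).
# The Lennard-Jones potential and all parameters are illustrative placeholders.

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Return forces and potential energy for a Lennard-Jones cluster (no cutoff, no PBC)."""
    n = len(pos)
    forces = np.zeros_like(pos)
    energy = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            rij = pos[i] - pos[j]
            r2 = np.dot(rij, rij)
            sr6 = (sigma**2 / r2) ** 3
            energy += 4.0 * eps * (sr6**2 - sr6)
            # pairwise force on atom i along rij (Newton's third law for atom j)
            fij = 24.0 * eps * (2.0 * sr6**2 - sr6) / r2 * rij
            forces[i] += fij
            forces[j] -= fij
    return forces, energy

def velocity_verlet(pos, vel, mass, dt, nsteps):
    """Integrate Newton's equations of motion with the velocity Verlet algorithm."""
    forces, _ = lj_forces(pos)
    for _ in range(nsteps):
        vel += 0.5 * dt * forces / mass   # half kick
        pos += dt * vel                   # drift
        forces, _ = lj_forces(pos)
        vel += 0.5 * dt * forces / mass   # half kick
    return pos, vel

# Example: a small 4-atom cluster near the LJ equilibrium spacing
rng = np.random.default_rng(0)
pos = rng.normal(scale=0.1, size=(4, 3)) + np.array(
    [[0, 0, 0], [1.1, 0, 0], [0, 1.1, 0], [0, 0, 1.1]], dtype=float)
vel = np.zeros_like(pos)
pos, vel = velocity_verlet(pos, vel, mass=1.0, dt=1e-3, nsteps=1000)
```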
Traditional Parallel Replica Dynamics
Slide 9
The properties of the exponential distribution of first-escape times allow us to parallelize time by having many processors seek the first escape event (A. F. Voter, Phys. Rev. B 57, 13985 (1998)). The procedure:
• Replicate the system on every processor.
• Dephase each replica (overhead, τcorr).
• Run MD on all replicas in parallel (parallel time); every transition must be detected.
• When the first event occurs on any processor, add all the replica times together (tsum) and advance the state clock by that amount.
• Follow the correlated events on that replica (overhead, τcorr), then replicate again from the new state.
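A minimal sketch of the PRD bookkeeping just described, assuming a toy stand-in (an exponential first-escape sampler) in place of real per-replica MD and transition detection; the function names and parameter values are illustrative only.

```python
import numpy as np

# Toy sketch of the PRD bookkeeping (A. F. Voter, Phys. Rev. B 57, 13985 (1998)).
# The MD engine and transition detector are replaced by a stand-in that samples
# exponential first-escape times; replication and dephasing are not modeled here.
# A real implementation runs actual MD on each replica, typically one per processor.

rng = np.random.default_rng(1)

def run_block(dt_block, k_total):
    """Stand-in for 'run one MD block on one replica and check for a transition'."""
    first_escape = rng.exponential(1.0 / k_total)
    return first_escape <= dt_block, min(first_escape, dt_block)

def prd_cycle(n_replicas, dt_block, k_total):
    """One PRD cycle: run replicas until the first event, return the accumulated time."""
    replica_time = np.zeros(n_replicas)
    while True:
        for r in range(n_replicas):
            transition, dt = run_block(dt_block, k_total)
            replica_time[r] += dt
            if transition:
                # First event seen on replica r: the system clock advances by the
                # sum of the times accumulated on all replicas (t_sum).
                return replica_time.sum()

# Example: 100 replicas, 1 ps blocks, total escape rate 0.1 per ns (times in ns)
t_sum = prd_cycle(n_replicas=100, dt_block=1e-3, k_total=0.1)
print(f"clock advanced by {t_sum:.2f} ns in one PRD cycle")
```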
The Exascale Challenge
Slide 10
[Figure: number of atoms (10^3–10^18) vs. accessible timescale (fs–ks) for MD and PRD, with the petascale and exascale regimes marked; k = 10^5 (s atom)^-1.]
Even with PRD, we have a challenge at the exascale (dephasing overhead dominates for large nproc)
PRD curves assume an average rate of 10^5 events per atom per second.
Petascale = 10^4 processors
Exascale = 10^7 processors
PRD uses one processor per replica.
void growth, dislocation pileups …
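A rough worked example of why the dephasing overhead dominates (assuming, for illustration, a system of 10^6 atoms with the quoted rate k = 10^5 (s atom)^-1 and one processor per replica): the total event rate is
$$N_{\mathrm{at}}\,k = 10^{6} \times 10^{5}\ \mathrm{s^{-1}} = 10^{11}\ \mathrm{s^{-1}},$$
so the mean time between events is about 10 ps. Spread over 10^7 replicas, each replica only needs to advance roughly 10^-18 s per event on average, far less than the dephasing and decorrelation times, so the per-event overhead swamps the useful MD time.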
$$\eta_{\mathrm{ParPRD}} = \frac{N_P \, P_\delta(A_R)\, R_S \cdot x \cdot t_S}{x \cdot t_S \left(1 + t_M/t_B\right) + n_S \cdot (t_D + t_C)}$$
Slide 11
The Exascale Challenge
Speedup:
Spatially Parallelized Efficiency (MD)
Time Parallelized Efficiency (PRD)
Exascale Challenge
As the number of processors increases, the spatially parallelized efficiency decreases.
As the number of particles or the number of replicas goes up, the average number of events increases and the PRD efficiency decreases.
Unreachable part of the exascale triangle
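For illustration, here is a small sketch that evaluates the speedup expression as reconstructed above; the spatial-efficiency model p_delta, the event-count estimate, and every timing value below are placeholder assumptions chosen only to show the qualitative trend, not values from the talk or the cited papers.

```python
# Illustrative evaluation of the reconstructed ParPRD speedup expression:
#   eta = N_P * P_delta(A_R) * R_S * x * t_S
#         / (x * t_S * (1 + t_M/t_B) + n_S * (t_D + t_C))
# Every quantity below is a placeholder assumption (R_S is folded into 1 here).

def p_delta(atoms_per_proc, atoms_half=1000.0):
    """Assumed spatial parallel efficiency: drops as atoms per processor shrink."""
    return atoms_per_proc / (atoms_per_proc + atoms_half)

def eta_parprd(n_proc, n_atoms, k_per_atom, t_state,
               t_monitor=0.05, t_block=1.0, t_dephase=1.0, t_corr=1.0, x=1.0):
    """Speedup over a single processor, using the reconstructed expression."""
    atoms_per_proc = n_atoms / n_proc
    n_events = n_atoms * k_per_atom * t_state  # assumed mean events per state time
    numerator = n_proc * p_delta(atoms_per_proc) * x * t_state
    denominator = (x * t_state * (1.0 + t_monitor / t_block)
                   + n_events * (t_dephase + t_corr))
    return numerator / denominator

# Trend: the speedup saturates as the spatial efficiency collapses at large n_proc
for n_proc in (1e2, 1e4, 1e6, 1e7):
    print(f"{n_proc:.0e} processors -> eta ~ {eta_parprd(n_proc, 1e6, 1e-5, 1.0):.1f}")
```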
Pushing PRD a bit Further
Slide 12
Can we arrange the processors in a different way so as to improve the efficiency?
E. Martinez et al., Journal of Computational Physics 230 (2011) 1359–1369.
Y. Shim, J. G. Amar, Physical Review B 71 (2005) 115436.
Y. Shim et al., Physical Review B 76 (2007) 205439.
Sublattice PRD (SLPRD)
Slide 13
Speedup:
Spatially Parallelized Efficiency (MD)
Time Parallelized Efficiency (PRD)
$$\eta_{\mathrm{SLPRD}} = \frac{N_P \, P_\delta(A_R)\, D_R \cdot x \cdot t_S}{x \cdot t_S \left(1 + t_M/t_B\right) + n_{\max} \cdot (t_D + t_C)}$$
If the number of events is treated as a Poisson-distributed stochastic variable, the average of the maximum number of events across the domains can be calculated analytically:
$$\bar{n} = \sum_n n \, P_n^{\mathrm{exact}}$$
$$P_n^{\mathrm{exact}} = (A + P_n)^D - A^D, \qquad A = P_0 + P_1 + \dots + P_{n-1}$$
$$P_i = \frac{\left(N_{\mathrm{at}}\, k\, t_{\mathrm{adv}}\right)^i}{i!}\, \exp\!\left\{-N_{\mathrm{at}}\, k\, t_{\mathrm{adv}}\right\}$$
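The expression above is the distribution of the maximum of D independent Poisson variables. A short sketch (with purely illustrative parameter values) that evaluates it and cross-checks it against direct sampling:

```python
import numpy as np
from scipy.stats import poisson

# Average maximum number of events over D domains, each with a Poisson-distributed
# event count of mean mu = N_at * k * t_adv, cross-checked against direct sampling.
# The parameter values below are illustrative.

def mean_max_events(mu, n_domains, n_terms=200):
    """<n_max> = sum_n n * [ P(X<=n)^D - P(X<=n-1)^D ]  for X ~ Poisson(mu)."""
    n = np.arange(n_terms)
    cdf = poisson.cdf(n, mu)                      # A + P_n (cumulative up to n)
    cdf_prev = np.concatenate(([0.0], cdf[:-1]))  # A       (cumulative up to n-1)
    p_exact = cdf**n_domains - cdf_prev**n_domains
    return np.sum(n * p_exact)

mu, n_domains = 2.0, 64   # e.g. N_at * k * t_adv = 2 events per domain on average
analytical = mean_max_events(mu, n_domains)
sampled = np.random.default_rng(0).poisson(mu, size=(100000, n_domains)).max(axis=1).mean()
print(f"analytical <n_max> = {analytical:.3f}, sampled <n_max> = {sampled:.3f}")
```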
Validation
Slide 14
Checking η:
o One vacancy in 10^7 Ni atoms at T = 723 K gives an average rate of k = 10^5 (s atom)^-1.
o Run dynamics for 192 ps.
o Good agreement between the theoretical expressions and the simulation results.
o Black points for 96 processors and red points for 512 processors.
[Figure: speedup η (0–350) for MD, PRD, ParPRD and SLPRD, analytical vs. simulated.]

Checking the expression for nmax:
o One vacancy in 10^7 Ni atoms at T = 1022 K gives an average event rate of k = 1.07×10^7 (s atom)^-1.
o Run dynamics for 2.688 ns.
[Figure: average maximum number of events (0–14) vs. number of domains (1–10^4): analytical, sampled and simulated.]
Speedup with the number of processors
Slide 15
For a large number of processors, SLPRD becomes more efficient than the existing methodologies.
[Figure: speedup η (10^0–10^6) vs. number of processors for SLPRD (analytical), SLPRD (simulated), ParPRD, PRD and MD; panel (a): 54784 atoms, k = 10^5 (s atom)^-1, 10^2–10^7 processors; panel (b): 10^6 atoms, k = 10^6 (s atom)^-1, 10^2–10^8 processors.]
Speedup with the number of atoms
Slide 16
For an intermediate number of atoms, SLPRD becomes more efficient than the existing methodologies.
[Figure: speedup η (10^-1–10^7) vs. number of atoms (10^3–10^9) for SLPRD, ParPRD, PRD and MD on 10^7 processors; panel (a): k = 10^5 (s atom)^-1; panel (b): k = 10^6 (s atom)^-1.]
Crossover in efficiency
Slide 17
[Figure: number of processors (10^3–10^8) as a function of the number of atoms (10^4–10^9) and the average rate (10^4–10^7 (s atom)^-1).]
For a given number of atoms and average rate there is a minimum pool of processors needed for the SLPRD algorithm to become the most efficient.
SLPRD optimum parameters
Slide 18
We can predict the simulation conditions that give optimum scalability.
[Figure: speedup η (10^4–10^6) for 10^7 atoms on 10^7 processors; panel (a): η vs. number of domains (10^0–10^4); panel (b): η vs. t_adv (10^-1–10^2 ns); curves for k = 10^4, 10^5, 10^6 and 10^7 (s atom)^-1.]
The Exascale Challenge Revisited
Slide 19
SLPRD fills part of the exascale hole
[Figure: number of atoms (10^3–10^18) vs. accessible timescale (fs–ks) for MD, PRD, ParPRD and SLPRD, with the exascale regime marked; two panels, k = 10^6 and k = 10^5 (s atom)^-1, both with 10^7 processors.]
Conclusions
n An increase in computational power will have to be accompanied by new algorithms.
n A strong interaction between researchers and computational scientists will have to be developed in order to take advantage of the new architectures and programming paradigms.
Slide 20