Outline Introduction Discrete energy lattice model: Ising model Continuous energy off-lattice model: Water model Summary
Two case studies of Monte Carlo simulation on GPU
Junqi Yin
Material Theory Group, Oak Ridge National Lab
Titan Summit, 2011
Junqi Yin Two case studies of Monte Carlo simulation on GPU
Outline
1 Introduction
2 Discrete energy lattice model: Ising model
3 Continuous energy off-lattice model: Water model
4 Summary
Monte Carlo simulation
What is Monte Carlo simulation?
A class of methods that solve problems by repeated random sampling
Random number generator on GPU
Linear congruential RNG
CURAND library
Mersenne Twister with dynamically generated parameters
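As an illustration of the first option, here is a minimal sketch of a per-thread linear congruential generator. The multiplier and increment are the common "Numerical Recipes" constants and the seeding rule is hypothetical; the slides do not say which values were used. Streams that differ only in their seed can be correlated, which is one reason to prefer CURAND or a Mersenne Twister with per-thread parameters in production runs.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One step of a 32-bit linear congruential generator (LCG).
// Constants are an assumption (the "Numerical Recipes" pair), not taken from the slides.
__host__ __device__ inline unsigned int lcg_next(unsigned int x) {
    return 1664525u * x + 1013904223u;  // modulo 2^32 through unsigned wraparound
}

// Map a 32-bit state to a float in [0, 1) using its top 24 bits.
__host__ __device__ inline float lcg_uniform(unsigned int x) {
    return (x >> 8) * (1.0f / 16777216.0f);
}

// Each thread advances its own private state and writes n_draws numbers.
__global__ void fill_uniform(unsigned int* state, float* out, int n_draws) {
    int n_threads = gridDim.x * blockDim.x;
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned int x = state[tid];
    for (int k = 0; k < n_draws; ++k) {
        x = lcg_next(x);
        out[k * n_threads + tid] = lcg_uniform(x);  // neighboring threads write neighboring addresses
    }
    state[tid] = x;  // keep the state for the next launch
}

int main() {
    const int blocks = 4, threads = 128, n = blocks * threads, n_draws = 8;
    unsigned int* state;
    float* out;
    cudaMallocManaged(&state, n * sizeof(unsigned int));
    cudaMallocManaged(&out, n * n_draws * sizeof(float));
    for (int i = 0; i < n; ++i) state[i] = 12345u + 7919u * i;  // hypothetical seeding rule

    fill_uniform<<<blocks, threads>>>(state, out, n_draws);
    cudaDeviceSynchronize();

    bool ok = true;
    for (int i = 0; i < n * n_draws; ++i) ok = ok && out[i] >= 0.0f && out[i] < 1.0f;
    printf("all draws in [0,1): %s\n", ok ? "yes" : "no");
    cudaFree(state);
    cudaFree(out);
    return ok ? 0 : 1;
}
```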
Applications of MC
Physics (statistical sampling)
Mathematics (numerical integration)
Finance (option pricing)
· · ·
Desktop GPUs
                                    GTX 285   Tesla C1060   Tesla C2050
Processor elements                  240       240           448
Peak performance (GFLOPS, single)   1062      933           1030
Peak performance (GFLOPS, double)   88.5      77            515
Memory bandwidth (GB/s)             159       102           144
Memory size (GB)                    1         4             3
Cost                                $330      $1300         $2500
• 2008-2009   • 2010
Discrete energy lattice model
Ising Model
\[ E = J \sum_{\langle i,j \rangle} \sigma_i \sigma_j + H \sum_i \sigma_i, \qquad \sigma_i = \pm 1 \]
fruit fly for statistical physics
describe magnetic phase transition
study critical phenomena
· · ·
Ising square lattice
Parallelism in space
Generate new configuration: Metropolis single spin flip
\[ W(\sigma \leftrightarrow -\sigma) = \min\left[1, \exp\left(-\frac{\delta E}{k_B T}\right)\right] \]
checkerboard update
step 1: update all spins in the black sublattice
step 2: update all spins in the white sublattice
If the next-nearest-neighbor interaction is included, 4 sublattices are needed.
checkerboard decomposition
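The two-step update can be sketched as a kernel launched once per color. This is a simplified, single-temperature illustration with nearest-neighbor coupling only, not the implementation from the talk: it launches one thread per site and lets the inactive color return early, and it draws random numbers on the host to stay short. All names (`delta_E`, `metropolis_color`) and parameter values are hypothetical. The energy change follows the sign convention of the Ising energy given earlier, so J < 0 is the ferromagnetic case.

```cuda
#include <cstdio>
#include <random>
#include <cuda_runtime.h>

// Energy change for flipping spin s whose four neighbors sum to nn,
// with E = J * sum(s_i * s_j) + H * sum(s_i).
__host__ __device__ inline float delta_E(int s, int nn, float J, float H) {
    return -2.0f * s * (J * nn + H);
}

// Update every site of one color; sites of the same color are never neighbors,
// so all of them can be updated concurrently without conflicts.
__global__ void metropolis_color(int* spin, const float* rnd, int L, int color,
                                 float J, float H, float beta) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= L || y >= L || ((x + y) & 1) != color) return;
    int i = y * L + x;
    int nn = spin[y * L + (x + 1) % L] + spin[y * L + (x + L - 1) % L]
           + spin[((y + 1) % L) * L + x] + spin[((y + L - 1) % L) * L + x];
    float dE = delta_E(spin[i], nn, J, H);
    if (dE <= 0.0f || rnd[i] < expf(-beta * dE)) spin[i] = -spin[i];
}

int main() {
    const int L = 32, N = L * L;  // L must be even for a periodic checkerboard
    const float J = -1.0f, H = 0.0f, beta = 0.3f;
    int* spin;
    float* rnd;
    cudaMallocManaged(&spin, N * sizeof(int));
    cudaMallocManaged(&rnd, N * sizeof(float));
    for (int i = 0; i < N; ++i) spin[i] = 1;

    std::mt19937 gen(1234);
    std::uniform_real_distribution<float> uni(0.0f, 1.0f);
    dim3 block(16, 16), grid(L / 16, L / 16);
    for (int sweep = 0; sweep < 100; ++sweep) {
        for (int color = 0; color < 2; ++color) {  // step 1: black, step 2: white
            for (int i = 0; i < N; ++i) rnd[i] = uni(gen);  // host RNG keeps the sketch short
            metropolis_color<<<grid, block>>>(spin, rnd, L, color, J, H, beta);
            cudaDeviceSynchronize();
        }
    }
    long m = 0;
    bool ok = true;
    for (int i = 0; i < N; ++i) { m += spin[i]; ok = ok && (spin[i] == 1 || spin[i] == -1); }
    printf("magnetization per spin: %f\n", double(m) / N);
    cudaFree(spin);
    cudaFree(rnd);
    return ok ? 0 : 1;
}
```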
Parallelism in algorithm
Parallel Tempering (replica exchange)
\[ W(\beta_m \leftrightarrow \beta_n) = \min\left[1, \exp\left(-(\beta_n - \beta_m)(E_m - E_n)\right)\right] \]
Advantage
Overcome energy barriers
Fast approach to equilibrium
More independent samples
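A minimal sketch of the exchange step, assuming swaps are proposed between neighboring temperatures and that alternating launches handle even and odd pairs. Instead of moving spin configurations, the kernel swaps a label array `conf` and the stored energies; this bookkeeping and the names `swap_probability` and `swap_neighbors` are illustrative assumptions, not the talk's code.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// Acceptance probability from the replica-exchange rule above.
__host__ __device__ inline float swap_probability(float beta_m, float beta_n,
                                                  float E_m, float E_n) {
    return fminf(1.0f, expf(-(beta_n - beta_m) * (E_m - E_n)));
}

// Thread k proposes a swap between temperature slots m = 2k + parity and m + 1.
// conf[m] names the configuration currently held at beta[m].
__global__ void swap_neighbors(const float* beta, float* E, int* conf,
                               const float* rnd, int nT, int parity) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    int m = 2 * k + parity, n = m + 1;
    if (n >= nT) return;
    if (rnd[k] < swap_probability(beta[m], beta[n], E[m], E[n])) {
        float e = E[m]; E[m] = E[n]; E[n] = e;
        int c = conf[m]; conf[m] = conf[n]; conf[n] = c;
    }
}

int main() {
    const int nT = 4;
    float *beta, *E, *rnd;
    int* conf;
    cudaMallocManaged(&beta, nT * sizeof(float));
    cudaMallocManaged(&E, nT * sizeof(float));
    cudaMallocManaged(&rnd, (nT / 2) * sizeof(float));
    cudaMallocManaged(&conf, nT * sizeof(int));
    const float b0[nT] = {0.2f, 0.4f, 0.6f, 0.8f};          // example inverse temperatures
    const float e0[nT] = {-30.0f, -10.0f, -40.0f, -20.0f};  // example energies
    for (int m = 0; m < nT; ++m) { beta[m] = b0[m]; E[m] = e0[m]; conf[m] = m; }
    rnd[0] = rnd[1] = 0.5f;  // fixed "random" numbers for a reproducible demo

    swap_neighbors<<<1, nT / 2>>>(beta, E, conf, rnd, nT, 0);  // even pairs (0,1) and (2,3)
    cudaDeviceSynchronize();
    printf("conf: %d %d %d %d\n", conf[0], conf[1], conf[2], conf[3]);
    bool ok = conf[0] == 1 && conf[1] == 0 && conf[2] == 3 && conf[3] == 2;
    cudaFree(beta); cudaFree(E); cudaFree(rnd); cudaFree(conf);
    return ok ? 0 : 1;
}
```

In this example the hotter slot of each pair holds the lower energy, so both proposals are accepted with probability 1 and the labels of each pair are exchanged.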
Overall implementation
Code
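Main function
dim3 grid(nT, L/2);   // one block per (temperature, pair of lattice rows)
dim3 block(L/2);      // one thread per site of the active sublattice in a row
for (int step = 0; step < MC_steps; ++step) {   // Monte Carlo sweeps
    // One launch per sublattice
    Kernel<<<grid, block>>>(d_Spin, d_random, 1);
    Kernel<<<grid, block>>>(d_Spin, d_random, 2);
    Kernel<<<grid, block>>>(d_Spin, d_random, 3);
    Kernel<<<grid, block>>>(d_Spin, d_random, 4);
    // · · ·
}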
Code
Kernel function
// L (lattice size) and nT (number of temperatures) are assumed to be compile-time constants.
__global__ void Kernel(int* Spin, int* State, int sublattice) {
    int tid = threadIdx.x, bidx = blockIdx.x, bidy = blockIdx.y;
    int site;
    __shared__ int r[2*L/2];
    // Load state for random number generator
    r[tid] = State[tid + L/2*bidy + L/2*L/2*bidx];
    r[tid+L/2] = State[tid + L/2*bidy + L/2*L/2*bidx + nT*L/2*L/2];
    switch (sublattice) {
    case 1:   // even row, even column
        site = 2*tid + bidx*L + 2*L*nT*bidy;
        // · · ·
    case 2:   // even row, odd column
        site = 2*tid + bidx*L + 2*L*nT*bidy + 1;
        // · · ·
    case 3:   // odd row, odd column
        site = 2*tid + bidx*L + 2*L*nT*bidy + 1 + L*nT;
        // · · ·
    case 4:   // odd row, even column
        site = 2*tid + bidx*L + 2*L*nT*bidy + L*nT;
        // · · ·
    }
Optimization
Optimize memory transfers
measure E, M, etc on device
swap {σ}T on device
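array({σ}) for coalesced memory access
lookup table in constant memory
use shared memory as a cache
Maximize arithmetic intensity
more computation per memory access (about 28 single-precision floating-point operations per 4-byte load for Fermi, i.e. 1030 GFLOPS against 144 GB/s on the Tesla C2050)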
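The lookup-table item can be illustrated as follows. This is a sketch under stated assumptions: zero field (H = 0), a single temperature, and nearest neighbors on the square lattice, so the neighbor sum takes only the values -4, -2, 0, 2, 4 and ten acceptance probabilities cover every case. The names `c_accept`, `table_index`, `upload_table`, and `probe` are hypothetical.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

// 2 spin values x 5 possible neighbor sums (-4, -2, 0, 2, 4) at H = 0.
__constant__ float c_accept[10];

__host__ __device__ inline int table_index(int s, int nn) {
    return (s + 1) / 2 * 5 + (nn + 4) / 2;
}

// Metropolis acceptance probability with E = J * sum(s_i * s_j).
inline float accept_probability(int s, int nn, float J, float beta) {
    float dE = -2.0f * s * J * nn;
    return fminf(1.0f, expf(-beta * dE));
}

// Fill the table on the host and copy it to constant memory.
void upload_table(float J, float beta) {
    float h[10];
    for (int s = -1; s <= 1; s += 2)
        for (int nn = -4; nn <= 4; nn += 2)
            h[table_index(s, nn)] = accept_probability(s, nn, J, beta);
    cudaMemcpyToSymbol(c_accept, h, sizeof(h));
}

// Device-side acceptance test: one constant-memory read replaces an expf() call.
__device__ inline bool accept_flip(int s, int nn, float u) {
    return u < c_accept[table_index(s, nn)];
}

// Thread t evaluates the acceptance test for the t-th (s, nn) combination.
__global__ void probe(int* accepted, float u) {
    int t = threadIdx.x;
    int s = 2 * (t / 5) - 1, nn = 2 * (t % 5) - 4;
    accepted[t] = accept_flip(s, nn, u) ? 1 : 0;
}

int main() {
    const float J = -1.0f, beta = 0.4f, u = 0.5f;  // example values
    upload_table(J, beta);
    int* accepted;
    cudaMallocManaged(&accepted, 10 * sizeof(int));
    probe<<<1, 10>>>(accepted, u);
    cudaDeviceSynchronize();

    bool ok = true;
    for (int t = 0; t < 10; ++t) {
        int s = 2 * (t / 5) - 1, nn = 2 * (t % 5) - 4;
        int expected = (u < accept_probability(s, nn, J, beta)) ? 1 : 0;
        ok = ok && (accepted[t] == expected);
    }
    printf("table matches host evaluation: %s\n", ok ? "yes" : "no");
    cudaFree(accepted);
    return ok ? 0 : 1;
}
```

With several temperatures, one such table per temperature would be needed; the single table here only keeps the example small.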
Performance
Figure: time (hours) versus lattice size L, comparing MPI with 32 CPUs on an IBM p655 cluster against a GPU (GeForce GTX285); 32 replicas, 10^7 MC steps.
J. Yin and D. P. Landau, Phys. Rev. E 80, 051117 (2009)
Continuous energy off-lattice model
Water Model
\[ \epsilon_{mn} = \sum_{i}^{m} \sum_{j}^{n} \frac{q_i q_j e^2}{r_{ij}} + \frac{A}{r_{OO}^{12}} - \frac{C}{r_{OO}^{6}} \]
r_OH (Å)                     0.9572
∠HOH (deg)                   104.52
r_OM (Å)                     0.15
q_H                          0.52
q_M                          -1.04
A × 10^-3 (kcal Å^12/mol)    600
C (kcal Å^6/mol)             610
Parameters of the TIP4P water model
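The pair energy can be evaluated directly from these parameters. The sketch below is an illustration, not the talk's code: it assumes rigid molecules, charges on the two H sites and the M site only, a Coulomb conversion factor of 332.0637 kcal Å/mol for e^2 (an assumed value; the slides do not state one), and a hypothetical `make_water` helper that places the M site on the HOH bisector.

```cuda
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

struct Vec3 { double x, y, z; };
struct Water { Vec3 O, H1, H2, M; };  // M is the massless negative charge site

__host__ __device__ inline double dist(Vec3 a, Vec3 b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return sqrt(dx * dx + dy * dy + dz * dz);
}

// Rigid molecule in the xy-plane with O at `o` and the HOH bisector along +x.
__host__ __device__ inline Water make_water(Vec3 o) {
    const double r_OH = 0.9572, r_OM = 0.15;
    const double half = 0.5 * 104.52 * 3.14159265358979323846 / 180.0;
    Water w;
    w.O = o;
    w.H1 = Vec3{o.x + r_OH * cos(half), o.y + r_OH * sin(half), o.z};
    w.H2 = Vec3{o.x + r_OH * cos(half), o.y - r_OH * sin(half), o.z};
    w.M = Vec3{o.x + r_OM, o.y, o.z};
    return w;
}

// Coulomb sum over the charged sites plus Lennard-Jones between the oxygens.
__host__ __device__ inline double pair_energy(const Water& a, const Water& b) {
    const double KE = 332.0637;  // e^2 in kcal*Angstrom/mol (assumed conversion factor)
    const double A = 600.0e3, C = 610.0;
    const double q[3] = {0.52, 0.52, -1.04};
    const Vec3 sa[3] = {a.H1, a.H2, a.M};
    const Vec3 sb[3] = {b.H1, b.H2, b.M};
    double e = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            e += KE * q[i] * q[j] / dist(sa[i], sb[j]);
    double r = dist(a.O, b.O);
    double r6 = r * r * r * r * r * r;
    return e + A / (r6 * r6) - C / r6;
}

// Thread j computes the energy between molecule 0 and molecule j.
__global__ void energies_with_first(const Water* w, int n, double* e) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j > 0 && j < n) e[j] = pair_energy(w[0], w[j]);
}

int main() {
    const int n = 3;
    Water* w;
    double* e;
    cudaMallocManaged(&w, n * sizeof(Water));
    cudaMallocManaged(&e, n * sizeof(double));
    w[0] = make_water(Vec3{0.0, 0.0, 0.0});
    w[1] = make_water(Vec3{3.0, 0.0, 0.0});    // example separations in Angstrom
    w[2] = make_water(Vec3{100.0, 0.0, 0.0});
    e[0] = 0.0;

    energies_with_first<<<1, 32>>>(w, n, e);
    cudaDeviceSynchronize();
    bool ok = true;
    for (int j = 1; j < n; ++j) {
        double ref = pair_energy(w[0], w[j]);  // same function evaluated on the host
        ok = ok && fabs(e[j] - ref) <= 1e-9 * (1.0 + fabs(ref));
        printf("pair (0,%d): %g kcal/mol\n", j, e[j]);
    }
    cudaFree(w);
    cudaFree(e);
    return ok ? 0 : 1;
}
```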
General ensemble sampling
Implementation
Figure: walkers 1, 2, 3 (energy versus Monte Carlo steps) updating H(E) and g(E); GPU 1 to GPU 4 hold H1(E), g1(E) through H4(E), g4(E) alongside the combined H(E), g(E).
“gather” instead of “scatter”
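One possible reading of this remark, offered as an assumption rather than as the talk's actual scheme: instead of every walker thread incrementing the bin it sits in (a scatter, which needs atomic operations when walkers share a bin), one thread per energy bin counts the walkers in that bin (a gather). The sketch applies a Wang-Landau style update, ln g(E) += ln f per visit; the kernel name, the bin assignment, and the sizes are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Gather: one thread per energy bin counts the walkers sitting in that bin,
// so no two threads write the same H or ln_g entry and no atomics are needed.
__global__ void gather_update(const int* walker_bin, int n_walkers,
                              unsigned int* H, double* ln_g, double ln_f, int n_bins) {
    int b = blockIdx.x * blockDim.x + threadIdx.x;
    if (b >= n_bins) return;
    unsigned int count = 0;
    for (int w = 0; w < n_walkers; ++w) count += (walker_bin[w] == b) ? 1u : 0u;
    H[b] += count;            // histogram H(E)
    ln_g[b] += count * ln_f;  // density of states, stored as ln g(E)
}

// Serial scatter reference: each walker increments its own bin.
void scatter_reference(const int* walker_bin, int n_walkers,
                       unsigned int* H, double* ln_g, double ln_f) {
    for (int w = 0; w < n_walkers; ++w) {
        H[walker_bin[w]] += 1;
        ln_g[walker_bin[w]] += ln_f;
    }
}

int main() {
    const int n_walkers = 1024, n_bins = 64;
    const double ln_f = 1.0;  // customary initial modification factor, f = e
    int* walker_bin;
    unsigned int* H;
    double* ln_g;
    cudaMallocManaged(&walker_bin, n_walkers * sizeof(int));
    cudaMallocManaged(&H, n_bins * sizeof(unsigned int));
    cudaMallocManaged(&ln_g, n_bins * sizeof(double));
    for (int w = 0; w < n_walkers; ++w) walker_bin[w] = (w * 37) % n_bins;  // made-up bin assignment
    for (int b = 0; b < n_bins; ++b) { H[b] = 0; ln_g[b] = 0.0; }

    gather_update<<<1, n_bins>>>(walker_bin, n_walkers, H, ln_g, ln_f, n_bins);
    cudaDeviceSynchronize();

    unsigned int H_ref[n_bins] = {0};
    double ln_g_ref[n_bins] = {0.0};
    scatter_reference(walker_bin, n_walkers, H_ref, ln_g_ref, ln_f);
    bool ok = true;
    for (int b = 0; b < n_bins; ++b) ok = ok && H[b] == H_ref[b] && ln_g[b] == ln_g_ref[b];
    printf("gather matches scatter: %s\n", ok ? "yes" : "no");
    cudaFree(walker_bin); cudaFree(H); cudaFree(ln_g);
    return ok ? 0 : 1;
}
```

The gather does more reads (every bin scans every walker) in exchange for conflict-free writes, which tends to pay off when the number of bins is modest.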
Results
Figure: c (kJ/K/mol/molecule) versus T (K) for (H2O)50.
For more results on water models, refer to J. Yin and D. P. Landau, J. Chem. Phys. 134, 074501 (2011)
Tuning
Figure: T (ms / MCS) versus execution configuration (grid, block) for (H2O)12 on Tesla C2050; configurations (512,8), (256,16), (128,32), (64,64), (32,128), (16,256), (8,512).
Maximize processor occupancy
Optimize execution configuration (Occupancy calculator)
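Besides the Occupancy calculator spreadsheet, later CUDA toolkits expose an occupancy query in the runtime API. The sketch below uses it to report theoretical occupancy for the seven (grid, block) configurations of 4096 threads shown in the figure. The kernel `walker_step` is a hypothetical placeholder for the real Monte Carlo kernel, and theoretical occupancy is only a guide: the measured time per MCS remains the quantity to optimize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder for the real per-walker Monte Carlo kernel (hypothetical).
__global__ void walker_step(float* energy) {
    int w = blockIdx.x * blockDim.x + threadIdx.x;
    energy[w] += 1.0f;
}

// Fraction of a multiprocessor's thread slots filled by the resident blocks.
inline double occupancy_fraction(int blocks_per_sm, int block_size, int max_threads_per_sm) {
    return double(blocks_per_sm) * block_size / max_threads_per_sm;
}

int main() {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    const int total_threads = 4096;  // grid * block is held fixed, as in the figure
    for (int block = 8; block <= 512; block *= 2) {
        int grid = total_threads / block;
        int blocks_per_sm = 0;
        // Maximum number of resident blocks per multiprocessor for this block size,
        // with no dynamic shared memory.
        cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocks_per_sm, walker_step, block, 0);
        double occ = occupancy_fraction(blocks_per_sm, block, prop.maxThreadsPerMultiProcessor);
        printf("(%3d,%3d): %2d blocks/SM, theoretical occupancy %.2f\n",
               grid, block, blocks_per_sm, occ);
    }
    return 0;
}
```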
Tuning
Figure: Monte Carlo steps (MCS) and wall time (s) versus MCS to synchronize, for (H2O)12 on Tesla C2050.
Balance
Tradeoff between sync overhead and convergence
Performance
Figure (a): T (ms / MCS, log scale) versus N for i7-930, GTX285, Tesla S1070, and Tesla C2050.
Figure (b): MCS/s versus number of GPUs for (H2O)20 on Tesla S1070 (each card with 1024 walkers), with a linear fit.
Summary
Ising models
The implementation can be applied to variants of lattice spin models, such as the Heisenberg model, spin glasses, etc.
A 50× to 100× speedup can be expected.
Water models
Algorithms involving many replicas can be extended straightforwardly to multiple GPUs.
The Wang-Landau method is a good candidate for sampling complex systems on GPU.