Dynamical Anisotropic-Clover Lattice Production for Hadronic Physics
J. Foley, C. Morningstar, CMU
K. Orginos, College of W&M
J. Dudek, R. Edwards, B. Joo, D. Richards, C. Thomas, JLab
S. Wallace, U. of Maryland
H.-W. Lin, U. of Washington
N. Mathur, Tata Institute
M. Peardon, S. Ryan, Trinity College
Anisotropic Lattices for Hadronic Physics
• Hadronic spectroscopy
  – Hadron resonance determinations
  – Exotic meson spectrum and transition form factors
  – HadSpec (Richards)
• Hadronic structure
  – 3-D picture of hadrons from gluon and quark spin + flavor distributions
  – Ground- and excited-state E&M transition form factors
  – E&M polarizabilities of hadrons
  – FormFactor (Orginos), EMC (Walker-Loud), DISCO (Osborn)
• Nuclear interactions
  – Nuclear processes relevant for stellar evolution
  – Hyperon-hyperon scattering
  – 3- and 4-nucleon interaction properties
  – NPLQCD (Savage)
• Physics beyond the Standard Model (BSM)
  – Neutron-decay constraints on BSM from the Ultra Cold Neutron source (LANL)
  – FormFactor (Orginos)
Hadronic Spectroscopy
New generation of hadron spectroscopy experiments:
  – GlueX (JLab/Hall D)
  – PANDA (GSI/FAIR)
  – BES III (Beijing)
• Spectrum (anisotropic-clover fermions):
  – Excited-state baryon resonances (Hall B)
  – Conventional and exotic (hybrid) mesons (Hall D)
  – Ground- and excited-state form factors (Hall B)
• Critical need:
  – Hybrid meson spectrum and photocouplings
  – Baryon spectrum
Nf=2+1 Anisotropic Clover: dynamical generation
Current proposal: 32^3x256 at mπ ~ 230 MeV, a_s = 0.1227 fm
  – Two streams: INCITE @ ORNL & ANL
  – 15 M core-hours / 1000 trajectories
Future INCITE + ESP: 48^3x256 at mπ ~ 140 MeV
[Figure: blue indicates NSF resources]
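A quick way to relate these lattice sizes to physical volumes is the box size L = N_s a_s and the dimensionless product mπL, a common gauge of finite-volume effects. The sketch below assumes both lattices share a_s = 0.1227 fm and uses ħc = 197.327 MeV fm; the function names are hypothetical.

```python
# Physical box size and m_pi * L for the ensembles on this slide.
# Assumption: both lattices share the spatial spacing a_s = 0.1227 fm.
HBARC_MEV_FM = 197.327  # hbar*c in MeV*fm, converts MeV*fm to a pure number
A_S_FM = 0.1227         # spatial lattice spacing quoted for the 32^3 x 256 proposal


def box_size_fm(n_s: int, a_s: float = A_S_FM) -> float:
    """Spatial extent L = N_s * a_s in fm."""
    return n_s * a_s


def m_pi_L(m_pi_mev: float, n_s: int, a_s: float = A_S_FM) -> float:
    """Dimensionless m_pi * L for a pion mass given in MeV."""
    return m_pi_mev * box_size_fm(n_s, a_s) / HBARC_MEV_FM


if __name__ == "__main__":
    for n_s, m_pi in [(32, 230.0), (48, 140.0)]:
        print(f"{n_s}^3: L = {box_size_fm(n_s):.2f} fm, m_pi*L = {m_pi_L(m_pi, n_s):.2f}")
```

Under these assumptions the current ensemble has L ≈ 3.9 fm with mπL ≈ 4.6, while the 48^3 lattice gives L ≈ 5.9 fm (the "6 fm box") with mπL ≈ 4.2 at the physical pion mass.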
HMC Improvements
Recent major improvements/changes:
  – 2-flavor clover mass-preconditioned by TWM
  – MD integration entirely in double precision
  – Switch to “reliable” mixed-precision IBiCG inverter
  – Threading + QMP -> coalesced communications
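Mass preconditioning rests on a simple determinant identity. As a hedged sketch, assuming "TWM" denotes a twisted-mass shift μ and writing M for the two-flavor clover matrix (this notation is illustrative, not taken from the slides):

```latex
\det\left(M^\dagger M\right)
  = \det\left(M^\dagger M + \mu^2\right)\,
    \det\left[\frac{M^\dagger M}{M^\dagger M + \mu^2}\right]
```

Each factor gets its own pseudofermion. The shifted factor is better conditioned and cheaper to invert, and the ratio typically carries a smaller force, so the two terms can be placed on different integrator time scales.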
Current effort:
  – Implementing a force-gradient integrator
  – Tuning via the shadow Hamiltonian
  – ≳ 2X improvement?
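The force-gradient idea can be illustrated on a toy one-dimensional system. This is a minimal sketch, assuming the five-stage fourth-order scheme of Chin and Omelyan et al. (kick h/6, drift h/2, corrected kick 2h/3, drift h/2, kick h/6) applied to a harmonic oscillator H = p²/2 + q²/2; it is not the production HMC code, and all function names are hypothetical.

```python
import math
from typing import Callable, Tuple

Force = Callable[[float], float]


def leapfrog(q: float, p: float, force: Force, h: float, n_steps: int) -> Tuple[float, float]:
    """Second-order kick-drift-kick leapfrog, for comparison."""
    for _ in range(n_steps):
        p += 0.5 * h * force(q)
        q += h * p
        p += 0.5 * h * force(q)
    return q, p


def force_gradient(q: float, p: float, force: Force, grad_f2: Force,
                   h: float, n_steps: int) -> Tuple[float, float]:
    """Fourth-order force-gradient integrator: P(h/6) Q(h/2) P~(2h/3) Q(h/2) P(h/6).

    The middle kick uses the corrected force F + (h^2/48) d|F|^2/dq,
    where grad_f2(q) must return d|F(q)|^2/dq."""
    for _ in range(n_steps):
        p += h / 6 * force(q)
        q += h / 2 * p
        p += 2 * h / 3 * (force(q) + h * h / 48 * grad_f2(q))
        q += h / 2 * p
        p += h / 6 * force(q)
    return q, p


# Toy harmonic oscillator: V(q) = q^2/2, so F = -q and |F|^2 = q^2.
def ho_force(q: float) -> float:
    return -q


def ho_grad_f2(q: float) -> float:
    return 2.0 * q


if __name__ == "__main__":
    h, t_final = 0.1, 10.0
    n = round(t_final / h)
    exact = math.cos(t_final)  # q(t) for q(0) = 1, p(0) = 0
    q_lf, _ = leapfrog(1.0, 0.0, ho_force, h, n)
    q_fg, _ = force_gradient(1.0, 0.0, ho_force, ho_grad_f2, h, n)
    print(f"leapfrog error:       {abs(q_lf - exact):.2e}")
    print(f"force-gradient error: {abs(q_fg - exact):.2e}")
```

Per step the force-gradient scheme needs two force evaluations plus the gradient term, against one force for leapfrog with merged kicks, so any net gain depends on how much larger a step it allows. In lattice QCD the correction term involves the gradient of the gauge and fermion forces and is correspondingly more expensive than in this 1-D toy.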
Performance of Cray vs BG/P
Cray hour ~ BG/P hour
Performance of Cray vs BG/P
Crays: on a loaded system, communication interference degrades performance
[Figure: performance of BG/P, Crays (loaded) and Cray (dedicated)]
Current production strategy
• Two streams of 32^3x256 @ 230 MeV – ORNL+ANL
• The two streams appear sufficiently decorrelated
• Expect a total of 10k trajectories, 5k in each stream
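Assuming the cost quoted for the current proposal (15 M core-hours per 1000 trajectories) holds for every trajectory, the 10k-trajectory target corresponds roughly to:

```latex
10\,000~\text{traj} \times \frac{15\times 10^{6}~\text{core-hours}}{1000~\text{traj}}
  = 1.5\times 10^{8}~\text{core-hours}
  \quad \left(\approx 7.5\times 10^{7}~\text{core-hours per stream}\right)
```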
Priorities
• Current calculations at mπ ~ 230 MeV
• Finite-volume effects:
  – Crucial for resonance/scattering extraction
• Chiral effects (large pion mass) appear large:
  – Excited resonances: chiral extrapolation problematic
• High statistics important (~1000 cfgs)
• Discretization effects (rotational) appear negligible:
  – Evidence from spectra of subduced operators
• Priorities:
  1. Physical limit @ 6 fm box -> 48^3x256
  2. Second lattice spacing @ ~500 MeV pion mass
Longer term
• Unify calculations of spectroscopy & structure
• Idea outlined at the NP Exascale meeting
• Clover @ small lattice spacing (< 0.06 fm)
Hadron Spectrum: 2010-2011
Complete factorization of resource requirements
• Gauge generation:
  – INCITE: Crays/BG/Ps, ~16K–24K cores
  – Double precision with mixed-precision solves
• Valence spectrum: distillation
  – Perambulators (~propagators):
    • 1 GPU (24^3x128) or 8 GPUs + IB (32^3x256)
    • Currently single precision (could be double)
  – Contractions:
    • Clusters: many cores, 1 time-slice per core
    • Double precision
    • InfiniBand only for I/O
Distillation: mesons
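• Smearing in correlator: use a low-rank approximation, □(t) = V(t) V†(t), built from N eigenvectors of the 3-D Laplacian
• Correlator: C(t) = Tr[Φ(t) τ(t,0) Φ(0) τ(0,t)]
• Factorizes into operators (Φ) and perambulators (τ)
• A posteriori evaluation of two-point functions C^(2) once τ is computed (arXiv:0905.2160)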
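The factorization follows from cyclicity of the trace: with □(t) = V(t) V†(t), the smeared correlator Tr[□Γ□ S(t,0) □Γ□ S(0,t)] equals Tr[Φ(t) τ(t,0) Φ(0) τ(0,t)], with Φ(t) = V†(t) Γ V(t) and τ(t,t') = V†(t) M⁻¹(t,t') V(t'). The toy check below assumes random matrices in place of real propagators and Laplacian eigenvectors, drops spin, color, and correlator signs, and uses hypothetical helper names.

```python
import random
from typing import List

Matrix = List[List[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a: Matrix) -> Matrix:
    """Conjugate transpose."""
    return [[a[i][j].conjugate() for i in range(len(a))] for j in range(len(a[0]))]


def trace(a: Matrix) -> complex:
    return sum(a[i][i] for i in range(len(a)))


def chain(*ms: Matrix) -> Matrix:
    """Left-to-right product of several matrices."""
    out = ms[0]
    for m in ms[1:]:
        out = matmul(out, m)
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(cols)]
            for _ in range(rows)]


def orthonormal_columns(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Random orthonormal columns via Gram-Schmidt (stand-in for Laplacian eigenvectors)."""
    vecs: List[List[complex]] = []
    for _ in range(cols):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(rows)]
        for u in vecs:
            overlap = sum(ui.conjugate() * vi for ui, vi in zip(u, v))
            v = [vi - overlap * ui for ui, vi in zip(u, v)]
        norm = sum(abs(vi) ** 2 for vi in v) ** 0.5
        vecs.append([vi / norm for vi in v])
    return [[vecs[j][i] for j in range(cols)] for i in range(rows)]


rng = random.Random(1)
n_sites, n_vec = 12, 3                          # toy sizes; spin and color omitted
v_t = orthonormal_columns(n_sites, n_vec, rng)  # V(t) on the sink time slice
v_0 = orthonormal_columns(n_sites, n_vec, rng)  # V(0) on the source time slice
gamma = random_matrix(n_sites, n_sites, rng)    # toy operator structure
s_t0 = random_matrix(n_sites, n_sites, rng)     # toy propagator M^{-1}(t, 0)
s_0t = random_matrix(n_sites, n_sites, rng)     # toy propagator M^{-1}(0, t)

# Full-volume route: smear with box(t) = V(t) V(t)^dagger around each operator.
box_t, box_0 = matmul(v_t, dagger(v_t)), matmul(v_0, dagger(v_0))
c_full = trace(chain(box_t, gamma, box_t, s_t0, box_0, gamma, box_0, s_0t))

# Distilled route: only small n_vec x n_vec objects enter the contraction.
phi_t, phi_0 = chain(dagger(v_t), gamma, v_t), chain(dagger(v_0), gamma, v_0)
tau_t0 = chain(dagger(v_t), s_t0, v_0)          # perambulator tau(t, 0)
tau_0t = chain(dagger(v_0), s_0t, v_t)          # perambulator tau(0, t)
c_dist = trace(chain(phi_t, tau_t0, phi_0, tau_0t))

print(abs(c_full - c_dist))  # agrees to rounding error
```

The point of the design is that once the perambulators are stored, new operators Φ can be contracted later without further inversions.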
Distillation: annihilation diagrams
• Two-meson creation op
• Correlator
arxiv:0905.2160
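Annihilation diagrams need quark lines that begin and end on the same time slice. As a hedged illustration of the simplest case (a single isoscalar meson operator rather than the two-meson operator above, with signs and flavor factors omitted), the disconnected piece in distillation notation reads:

```latex
C_{\text{disc}}(t) \;\propto\;
  \operatorname{Tr}\!\left[\Phi(t)\,\tau(t,t)\right]\,
  \operatorname{Tr}\!\left[\Phi(0)\,\tau(0,0)\right],
\qquad
\tau(t,t') = V^\dagger(t)\,M^{-1}(t,t')\,V(t')
```

Evaluating it requires the "diagonal" perambulators τ(t,t) on every time slice, which perambulators computed from all time sources provide.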
Perambulators
• Current calculation (24^3x128, mπ ~ 230 MeV)
• Solve M(t', t) s(t') = V(t) for each source vector [all space sites]
• Perambulator: τ(t', t) = V†(t') M⁻¹(t', t) V(t)
• High angular momentum (J = 4): want N = 128 eigenvectors
• All time sources -> 64K inversions per config
• Can construct all possible source/sink multi-particle operators
• Ideally suited for GPUs:
  – 300 configs on 180 GPUs -> 4 months @ 185 GF/GPU
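The quoted numbers can be cross-checked with simple arithmetic. The sketch assumes one solve per eigenvector, per Dirac spin component, per time source, and borrows the ~100 s per solve quoted for the GPU solver; both assumptions, and the 30.44-day month, are labeled in the code.

```python
# Back-of-envelope perambulator cost for the 24^3 x 128 run on this slide.
N_VEC = 128                   # distillation eigenvectors (from the slide)
N_SPIN = 4                    # assumption: one solve per Dirac spin component
N_T = 128                     # time extent = number of time sources
N_CONFIGS = 300
N_GPUS = 180
SECONDS_PER_SOLVE = 100.0     # assumption: "Time/solve ~ 100 secs" on one GPU
SECONDS_PER_MONTH = 30.44 * 24 * 3600  # assumption: average month length

inversions_per_config = N_VEC * N_SPIN * N_T
total_gpu_seconds = N_CONFIGS * inversions_per_config * SECONDS_PER_SOLVE
wall_months = total_gpu_seconds / N_GPUS / SECONDS_PER_MONTH

print(f"inversions per config: {inversions_per_config}")  # 65536, i.e. "64K"
print(f"wall-clock estimate:   {wall_months:.1f} months")
```

Under these assumptions the estimate lands close to the quoted 4 months, consistent with reading 64K as 128 vectors × 4 spins × 128 time slices.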
Care and feeding of GPUs
• Front-end overhead a BIG concern (Amdahl’s law)
• Solver (IBiCGstab): 1 GPU ~ 256 7n-cores
• Front-end memory footprint: 12 GB
• Time/solve ~ 100 secs
• Inner products only over (at most) 8 cores
• With 2 GPUs per box (4 cores): lose ~10%
  – Using threading in QDP/C++ with QMT or OpenMP
• Older cards: no ECC, low double-precision performance
• Mileage will vary for other applications…
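Amdahl's law makes the front-end concern quantitative: if only a fraction f of the run time is accelerated by a factor s, the overall speedup is 1/((1 − f) + f/s). In this sketch the fractions are hypothetical and the factor 256 simply echoes the "1 GPU ~ 256 cores" comparison.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: overall speedup when only `accelerated_fraction` of the
    original run time is sped up by `factor`; the remainder (e.g. front-end
    work left on the host CPU) runs at the old speed."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / factor)


if __name__ == "__main__":
    # Hypothetical solver fractions; factor 256 mirrors "1 GPU ~ 256 cores".
    for frac in (0.90, 0.99, 0.999):
        print(f"{frac:.3f} in solver -> overall speedup {amdahl_speedup(frac, 256):.1f}x")
```

Even with 99% of the time in the solver, the overall gain is only about 72x rather than 256x, which is why host-side overhead matters so much.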
Operators and contractions
• New operator technique: subduction
  – Derivative-based continuum ops -> lattice irreps
  – Operators at rest or in flight, mesons & baryons
• Large basis of operators -> lots of contractions
  – E.g., nucleon Hg: 49 ops with up to 2 derivatives
• Feed all this to variational method
– Diagonalization: handles near degeneracies
PRL 103 (2009)
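The variational method amounts to the generalized eigenvalue problem C(t) v = λ C(t0) v, with λ_n ≈ exp(−E_n (t − t0)). The minimal 2×2 sketch below uses synthetic two-state data; the overlaps Z, the energies, and the function names are hypothetical. It shows that diagonalization recovers two nearly degenerate levels exactly when the basis spans them.

```python
import math
from typing import List, Sequence

Matrix2 = List[List[float]]


def correlator_matrix(z: Sequence[Sequence[float]], energies: Sequence[float],
                      t: float) -> Matrix2:
    """Toy C_ij(t) = sum_n Z_i^n Z_j^n exp(-E_n t) for a 2x2 operator basis."""
    return [[sum(z[i][n] * z[j][n] * math.exp(-energies[n] * t)
                 for n in range(len(energies)))
             for j in range(2)] for i in range(2)]


def gevp_2x2(a: Matrix2, b: Matrix2) -> List[float]:
    """Roots lambda of det(a - lambda * b) = 0, largest first."""
    qa = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    qb = -(a[0][0] * b[1][1] + a[1][1] * b[0][0] - a[0][1] * b[1][0] - a[1][0] * b[0][1])
    qc = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return sorted([(-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)], reverse=True)


def effective_energies(z, energies, t0: float, t: float) -> List[float]:
    """Variational energies E_n = -ln(lambda_n) / (t - t0)."""
    lams = gevp_2x2(correlator_matrix(z, energies, t), correlator_matrix(z, energies, t0))
    return [-math.log(lam) / (t - t0) for lam in lams]


if __name__ == "__main__":
    z = [[1.0, 0.6], [0.4, -0.9]]  # hypothetical overlaps Z_i^n (operator i, state n)
    true_e = [0.50, 0.52]          # hypothetical, nearly degenerate levels
    print(effective_energies(z, true_e, t0=2.0, t=5.0))
```

With real data, excited-state contamination beyond the basis size makes the recovered energies only approximate, which is one motivation for large operator bases.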
Subduced operators: demonstration
GPU results in “friendly-user” time: mπ ~ 383 MeV, 16^3x128, 479 configs
< 2% error bars
Spin identified
Need multi-hadron ops !!
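Spin identification relies on how a continuum spin J splits among the cubic irreps. For example, a J = 2 state should appear with matching energies in both E and T2. This hedged sketch computes the subduction from characters of the octahedral group O. It covers integer J only and ignores parity (the g/u label) and the double-cover irreps needed for baryons such as the nucleon Hg. The function names are hypothetical.

```python
import math
from typing import Dict

# Character table of the octahedral group O (proper cubic rotations).
# Class order: identity, 8 C3, 3 C2 (= C4^2), 6 C4, 6 C2'.
CLASS_SIZES = [1, 8, 3, 6, 6]
CLASS_ANGLES = [0.0, 2 * math.pi / 3, math.pi, math.pi / 2, math.pi]
CHARACTERS = {
    "A1": [1, 1, 1, 1, 1],
    "A2": [1, 1, 1, -1, -1],
    "E": [2, -1, 2, 0, 0],
    "T1": [3, 0, -1, 1, -1],
    "T2": [3, 0, -1, -1, 1],
}


def so3_character(j: int, theta: float) -> float:
    """Character of a spin-j rotation by angle theta: sin((j+1/2)theta)/sin(theta/2)."""
    if theta == 0.0:
        return 2 * j + 1
    return math.sin((j + 0.5) * theta) / math.sin(theta / 2)


def subduce(j: int) -> Dict[str, int]:
    """Multiplicity of each cubic irrep in the subduction of integer spin j."""
    order = sum(CLASS_SIZES)  # 24 rotations in O
    result = {}
    for irrep, chars in CHARACTERS.items():
        total = sum(size * chi * so3_character(j, theta)
                    for size, chi, theta in zip(CLASS_SIZES, chars, CLASS_ANGLES))
        mult = round(total / order)
        if mult:
            result[irrep] = mult
    return result


if __name__ == "__main__":
    for j in range(5):
        print(j, subduce(j))
```

Because J = 4 subduces into A1, E, T1 and T2, spin-4 states overlap with several irreps at once. This is why high-J identification needs operators in every irrep.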
Prospects
• Gauge production:
  – Used by several proposals
  – ORNL: run-time environment affects performance
  – ANL: welcome “big-job” queues
• Distillation + subduction:
  – Looks promising!
  – Framework for multi-particle states
  – Flexible: useful for 2-pt and 3-pt functions
• GPUs:
  – Powerful resource for inversions
  – New ECC + double precision -> can handle contractions