SciDAC Software Infrastructure for Lattice Gauge Theory
Richard C. Brower & Robert Edwards
June 24, 2003
K. Wilson (1989 Capri):
"One lesson is that lattice gauge theory could also require a 10^8 increase in computer power AND spectacular algorithmic advances before useful interactions with experiment ..."
• ab initio Chemistry
  1. 1930 + 50 = 1980
  2. 0.1 flops → 10 Mflops
  3. Gaussian basis functions

vs

• ab initio QCD
  1. 1980 + 50 = 2030?*
  2. 10 Mflops → 1000 Tflops
  3. Clever collective variable?

*Hopefully sooner, but need $1/Mflops → $1/Gflops!
SciDAC: Scientific Discovery through Advanced Computing
http://www.lqcd.org/scidac
QCD Infrastructure Project – Funded 2001-2003 (2005?)
HARDWARE:
– 10+ Tflops each at BNL, FNAL & JLab
– QCDOC @ BNL (2004), Clusters @ FNAL/JLab (2005-2006)
SOFTWARE:
– Enable US lattice physicists to use the QCDOC @ BNL and Clusters @ FNAL & JLab
PHYSICS:
– Provide crucial lattice "data" that now dominate some tests of the Standard Model.
– Deeper understanding of field theory (and even string theory!)
Software Infrastructure Goals: Create a unified software environment that will enable the US lattice community to achieve very high efficiency on diverse multi-terascale hardware.
TASKS and LIBRARIES:
I. QCD Data Parallel API – QDP
II. Optimized Message Passing – QMP
III. Optimized QCD Linear Algebra – QLA
IV. I/O, Data Files and Data Grid – QIO
V. Optimized Physics Codes – CPS/MILC/Chroma/etc.
VI. Execution Environment – unify BNL/FNAL/JLab
Participants in Software Project (partial list):
Arizona: Doug Toussaint, Eric Gregory
BU: Rich Brower*, Hartmut Neff
BNL: Chulwoo Jung, Chris Miller, Konstantin Petrov
Columbia: Bob Mawhinney*
FNAL: Don Holmgren*, Jim Simone, Amitoj Singh
Illinois: Celso Mendes*, Daniel Reed
JLab: Robert Edwards*, Chip Watson*, Jie Chen, Walt Akers
MIT: Andrew Pochinsky
Utah: Carleton DeTar*, James Osborn
* Software Coordinating Committee
Lattice QCD – extremely uniform
Periodic or very simple boundary conditions
SPMD: Identical sublattices per processor
Lattice operator:
$D_\mu = \partial_\mu + i g A_\mu(x)$

Dirac operator (covariant central difference, lattice spacing $a$):
$D_\mu \psi(x) = \frac{1}{2a}\left[\,U_\mu(x)\,\psi(x+\hat\mu) - U_\mu^\dagger(x-\hat\mu)\,\psi(x-\hat\mu)\,\right]$
SciDAC Software Structure

Level 3: Optimised Dirac operators, inverters (optimised for P4 and QCDOC)
Level 2: QDP (QCD Data Parallel) – lattice-wide operations, data shifts; exists in C/C++
Level 1:
– QMP (QCD Message Passing) – exists in C/C++, implemented over MPI, GM, QCDOC, gigE
– QLA (QCD Linear Algebra)
– QIO – XML I/O, DIME
Overlapping communications and computations

C(x) = A(x) * shift(B, +mu):
– Send face forward (non-blocking) to neighboring node.
– Receive face into pre-allocated buffer.
– Meanwhile do A*B on interior sites.
– "Wait" on receive to perform A*B on the face.

Lazy evaluation (C style):
Shift(tmp, B, +mu);
Mult(C, A, tmp);
Data layout over processors
QCDOC 1.5 Tflops (Fall 2003)

Performance of Dirac inverters (% of peak), as determined by ASIC simulator with native SciDAC message passing (QMP):
– Clover Wilson (assembly): 2^4 56%, 4^4 59%
– Naive staggered (MILC): 2^4 14%, 4^4 22% (4^4 assembly: 38%)
– Asqtad force (MILC): 2^4 3%, 4^4 7%
– Asqtad force (first optimization attempt): 4^4 16%
Cluster Performance: 2002
Future Software Goals

Critical needs:
– Ongoing optimization, testing and hardening of the SciDAC software infrastructure
– Leverage the SciDAC QCD infrastructure through collaborative efforts with the ILDG and SciParC projects
– Develop a mechanism to maintain distributed software libraries
– Foster an international (Linux-style?) development of application code
Message Passing: QMP

Philosophy: subset of MPI capability appropriate to QCD
– Broadcasts, global reductions, barrier
– Minimal copying / DMA where possible
– Channel-oriented / asynchronous communication
– Multi-direction sends/receives for QCDOC
– Grid and switch model for node layout
Implemented on GM and MPI; gigE nearly completed
QMP Simple Example

char buf[size];
QMP_msgmem_t mm;
QMP_msghandle_t mh;

mm = QMP_declare_msgmem(buf, size);
mh = QMP_declare_send_relative(mm, +x);
QMP_start(mh);
// Do computations
QMP_wait(mh);

The receiving node coordinates with the same steps, except:
mh = QMP_declare_receive_from(mm, -x);
Multiple calls
Data Parallel QDP/C, C++ API

– Hides architecture and layout
– Operates on lattice fields across sites
– Linear algebra tailored for QCD
– Shifts and permutation maps across sites
– Reductions
– Subsets
– Entry/exit – attach to existing codes
Data-parallel Operations

– Unary and binary: -a; a-b; …
– Unary functions: adj(a), cos(a), sin(a), …
– Random numbers: random(a), gaussian(a) // platform independent
– Comparisons (booleans): a <= b, …
– Broadcasts: a = 0, …
– Reductions: sum(a), …
Fields have various types (indices):
Lattice (site $x$), Color (indices $i,j$), Spin (index $\alpha$) – e.g. gauge links $U^{ij}_\mu(x)$, fermion fields $Q^i_\alpha(x)$
QDP Expressions

Can create expressions such as
$c^i(x) = U^{ij}(x)\, b^j(x+\hat\mu) + 2\, d^i(x), \quad x \in \text{Even}$

QDP/C++ code:
multi1d<LatticeColorMatrix> u(Nd);
LatticeDiracFermion b, c, d;
int mu;
c[even] = u[mu] * shift(b, mu) + 2 * d;

PETE: Portable Expression Template Engine – temporaries eliminated, expressions optimised
Linear Algebra Implementation

– Naïve ops involve lattice temporaries – inefficient
– Eliminate lattice temporaries – PETE
– Allows further combining of operations (adj(x)*y)
– Overlap communications/computations

// Lattice operation
A = adj(B) + 2 * C;
// Lattice temporaries
t1 = 2 * C;
t2 = adj(B);
t3 = t2 + t1;
A = t3;
// Merged lattice loop
for (i = ... ; ... ; ...) {
  A[i] = adj(B[i]) + 2 * C[i];
}
Binary File / Interchange Formats

– Metadata – data describing data; e.g., physics params
– Use XML for metadata
File formats:
– Files are mixed mode – XML ASCII + binary
– Using DIME (similar to e-mail MIME) to package
– Use BinX (Edinburgh) to describe binary
– Replica-catalog web-archive repositories
Current Status

– Releases and documentation: http://www.lqcd.org/scidac
– QMP, QDP/C, C++ in first release
– Performance improvements/testing underway
– Porting & development of physics codes over QDP on-going
– QIO/XML support near completion
– Cluster/QCDOC run-time environment in development
SciDAC Prototype Clusters

Myrinet + Pentium 4
– 48 duals 2.0 GHz P4 @ FNAL (Spring 2002)
– 128 single 2.0 GHz P4 @ JLab (Summer 2002)
– 128 dual 2.4 GHz P4 @ FNAL (Fall 2002)
Gigabit Ethernet Mesh + Pentium 4
– 256 (8x8x4) singles 2.8 GHz P4 @ JLab (Summer 2003)
– FPGA NIC for GigE Ethernet @ FNAL (Summer 2003)
– 256 (?) @ FNAL (Fall 2003?)
Cast of Characters
Software Committee*: R. Brower (chair), C. DeTar, R. Edwards, D. Holmgren, R. Mawhinney, C. Mendes, C. Watson

Additional Software: J. Chen, E. Gregory, J. Hetrick, B. Joó, C. Jung, J. Osborn, K. Petrov, A. Pochinsky, J. Simone et al.

(*Minutes and working documents: http://physics.bu.edu/~brower/SciDAC/scc.html)

• Executive Committee: R. Brower, N. Christ, M. Creutz, P. Mackenzie, J. Negele, C. Rebbi, S. Sharpe, R. Sugar (chair), C. Watson