ACTS Tools and Case Studies of their Use

Osni Marques and Tony Drummond
Lawrence Berkeley National Laboratory
[oamarques,ladrummond]@lbl.gov

NERSC User Group Meeting, June 24-25, 2004
Goals

• Extended support for experimental software
• Make ACTS tools available on DOE computers
• Provide technical support ([email protected])
• Maintain the ACTS information center (http://acts.nersc.gov)
• Coordinate efforts with other supercomputing centers
• Enable large scale scientific applications
• Educate and train

Levels of user support:
• Basic: help with installation, basic knowledge of the tools, compilation of user reports
• Intermediate: basic level, plus a higher level of support to users of the tool
• High: intermediate level, plus tool expertise and conducting tutorials

What is the DOE ACTS Collection?

• Advanced CompuTational Software Collection
• Tools for developing parallel applications
• ACTS started as an "umbrella" project
Motivation 1: Challenges in the Development of Scientific Codes
• Productivity
  • Time to the first solution (prototype)
  • Time to solution (production)
  • Other requirements
• Complexity
  • Increasingly sophisticated models
  • Model coupling
  • Interdisciplinarity
• Performance
  • Increasingly complex algorithms
  • Increasingly complex architectures
  • Increasingly demanding applications

• Libraries written in different languages.
• Discussions about standardizing interfaces are often sidetracked into implementation issues.
• Difficulties managing multiple libraries developed by third parties.
• Need to use more than one language in one application.
• The code is long-lived and different pieces evolve at different rates.
• Swapping competing implementations of the same idea and testing without modifying the code.
• Need to compose an application with others that were not originally designed to be combined.
Motivation 2: Addressing the Performance Gap through Software
Peak performance is skyrocketing: in the 1990s, peak performance increased 100x; in the 2000s, it will increase 1000x.

But the efficiency of many science applications has declined from 40-50% on the vector supercomputers of the 1990s to as little as 5-10% on today's parallel supercomputers.

Research is needed on:
• Mathematical methods and algorithms that achieve high performance on a single processor and scale to thousands of processors
• More efficient programming models for massively parallel supercomputers

[Figure: peak vs. real performance, 1996-2004, in Teraflops (log scale), showing a widening performance gap.]
ACTS Tools Functionalities

Numerical:
• Aztec: algorithms for the iterative solution of large sparse linear systems.
• Hypre: algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters.
• PETSc: tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations.
• OPT++: object-oriented nonlinear optimization package.
• SUNDIALS: solvers for systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
• ScaLAPACK: library of high performance dense linear algebra routines for distributed-memory message-passing machines.
• SuperLU: general-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations.
• TAO: large-scale optimization software, including nonlinear least squares, unconstrained minimization, bound constrained optimization, and general nonlinear optimization.

Code Development:
• Global Arrays: library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays.
• Overture: object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries.

Code Execution:
• CUMULVS: framework that enables programmers to incorporate fault-tolerance, interactive visualization, and computational steering into existing parallel programs.
• Globus: services for the creation of computational Grids and tools with which applications can be developed to access the Grid.
• PAWS: framework for coupling parallel applications within a component-like model.
• SILOON: tools and run-time support for building easy-to-use external interfaces to existing numerical codes.
• TAU: set of tools for analyzing the performance of C, C++, Fortran and Java programs.

Library Development:
• ATLAS and PHiPAC: tools for the automatic generation of optimized numerical software for modern computer architectures and compilers.
• PETE: extensible implementation of the expression template technique (a C++ technique for passing expressions as function arguments).
[Slide graphic: representative problem classes, e.g. ODEs, PDEs, linear systems Ax = b, eigenproblems Az = λz.]
Use of ACTS Tools
• 3D overlapping grid for a submarine produced with Overture's module ogen.
• Model of a "hard" sphere included in a "soft" material, 26 million d.o.f.: unstructured meshes in solid mechanics using Prometheus and PETSc (Adams and Demmel).
• Multiphase flow using PETSc: 4 million cell blocks, 32 million DOF, over 10.6 Gflops on an IBM SP (128 nodes); the entire simulation runs in less than 30 minutes (Pope, Gropp, Morgan, Sepehrnoori, Smith and Wheeler).
• 3D incompressible Euler on a tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3d (W. K. Anderson); fully implicit, steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).
• Electronic structure optimization of (UO2)3(CO3)6 performed with TAO (courtesy of de Jong).
• Molecular dynamics and thermal flow simulation using codes based on Global Arrays. GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.
Use of ACTS Tools
• Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning); eigenvalue problems solved with ScaLAPACK.
• OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5; courtesy of Meza, Oliva et al.).
• Omega3P is a parallel distributed-memory code for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed the solution of a problem of order 7.5 million with 304 million nonzeros. Finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.
• Two ScaLAPACK routines, PZGETRF and PZGETRS, are used to solve linear systems in the spectral-algorithm based AORSA code (Batchelor et al.), which is intended for the study of electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.
PETSc
Portable, Extensible Toolkit for Scientific Computation

PETSc layers, from the bottom up:
• Computation and communication kernels: MPI, MPI-IO, BLAS, LAPACK
• Profiling interface
• Object-oriented matrices, vectors, and indices
• Linear solvers (preconditioners + Krylov methods); nonlinear solvers and unconstrained minimization; ODE integrators; grid management; visualization interface
• PETSc PDE application codes

Vectors: fundamental objects for storing field solutions, right-hand sides, etc.
Matrices: fundamental objects for storing linear operators (e.g., Jacobians).
PETSc - Numerical Components
• Vectors
• Index sets: indices, block indices, stride, other
• Matrices: compressed sparse row (AIJ), blocked compressed sparse row (BAIJ), block diagonal (BDIAG), dense, matrix-free, other
• Distributed arrays
• Krylov subspace methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebyshev, other
• Preconditioners: additive Schwarz, block Jacobi, Jacobi, ILU, ICC, LU (sequential only), others
• Nonlinear solvers: Newton-based methods (line search, trust region), other
• Time steppers: Euler, backward Euler, pseudo time stepping, other
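To make these components concrete, the sketch below builds a distributed 1D Laplacian and solves A x = b with a Krylov method. It is not from the original slides: it assumes a recent PETSc with the KSP interface (call names such as MatCreateVecs and the two-argument KSPSetOperators postdate the 2004-era library), and the matrix, size n, and right-hand side are illustrative.

/* Minimal sketch: solve A x = b for a 1D Laplacian with PETSc's KSP. */
#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat A; Vec x, b; KSP ksp;
  PetscInt i, n = 100, Istart, Iend;

  PetscInitialize(&argc, &argv, NULL, NULL);

  /* Distributed tridiagonal matrix (1D Laplacian stencil [-1 2 -1]) */
  MatCreate(PETSC_COMM_WORLD, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  MatGetOwnershipRange(A, &Istart, &Iend);
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES);
    if (i < n - 1) MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES);
    MatSetValue(A, i, i, 2.0, INSERT_VALUES);
  }
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);

  /* Solution vector x and right-hand side b = 1 */
  MatCreateVecs(A, &x, &b);
  VecSet(b, 1.0);

  /* Krylov solver; components are selectable at run time */
  KSPCreate(PETSC_COMM_WORLD, &ksp);
  KSPSetOperators(ksp, A, A);
  KSPSetFromOptions(ksp);
  KSPSolve(ksp, b, x);

  KSPDestroy(&ksp); MatDestroy(&A); VecDestroy(&x); VecDestroy(&b);
  PetscFinalize();
  return 0;
}

Because of KSPSetFromOptions, any of the Krylov methods and preconditioners listed above can be selected at run time, e.g. -ksp_type gmres -pc_type bjacobi, without recompiling.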
TAO
Toolkit for Advanced Optimization
min { f(x) : x_l ≤ x ≤ x_u }
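A hedged illustration of this bound-constrained formulation (not from the original slides; it assumes TAO as shipped with recent PETSc, 3.17 or later, whose call names differ from the 2004-era TAO, and the objective f(x) = Σ (x_i - 1)^2 and bounds are made up for the example):

/* Minimal sketch: min { f(x) : xl <= x <= xu } with TAO's BLMVM. */
#include <petsctao.h>

/* Objective f(x) = sum (x_i - 1)^2 and its gradient */
static PetscErrorCode FormFunctionGradient(Tao tao, Vec x, PetscReal *f,
                                           Vec g, void *ctx)
{
  const PetscScalar *xx;
  PetscScalar       *gg;
  PetscInt           i, n;
  PetscReal          floc = 0.0;

  VecGetLocalSize(x, &n);
  VecGetArrayRead(x, &xx);
  VecGetArray(g, &gg);
  for (i = 0; i < n; i++) {
    floc += PetscRealPart((xx[i] - 1.0) * (xx[i] - 1.0));
    gg[i] = 2.0 * (xx[i] - 1.0);
  }
  VecRestoreArrayRead(x, &xx);
  VecRestoreArray(g, &gg);
  /* The objective must be the global sum over all processes */
  MPI_Allreduce(&floc, f, 1, MPIU_REAL, MPI_SUM, PETSC_COMM_WORLD);
  return 0;
}

int main(int argc, char **argv)
{
  Tao tao; Vec x, xl, xu; PetscInt n = 10;

  PetscInitialize(&argc, &argv, NULL, NULL);
  VecCreateMPI(PETSC_COMM_WORLD, PETSC_DECIDE, n, &x);
  VecSet(x, 0.0);                            /* starting point   */
  VecDuplicate(x, &xl); VecSet(xl, -2.0);    /* lower bounds x_l */
  VecDuplicate(x, &xu); VecSet(xu,  2.0);    /* upper bounds x_u */

  TaoCreate(PETSC_COMM_WORLD, &tao);
  TaoSetType(tao, TAOBLMVM);                 /* bounded limited-memory QN */
  TaoSetSolution(tao, x);
  TaoSetVariableBounds(tao, xl, xu);
  TaoSetObjectiveAndGradient(tao, NULL, FormFunctionGradient, NULL);
  TaoSolve(tao);

  TaoDestroy(&tao); VecDestroy(&x); VecDestroy(&xl); VecDestroy(&xu);
  PetscFinalize();
  return 0;
}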
OPT++
• Assumptions:
  • The objective function is smooth (twice continuously differentiable)
  • Constraints are linearly independent
  • Objective functions are expensive to evaluate
• Four major classes of problems available:
  • NLF0(ndim, fcn, init_fcn, constraint): basic nonlinear function, no derivative information available
  • NLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first derivative information available
  • FDNLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first derivative information approximated
  • NLF2(ndim, fcn, init_fcn, constraint): nonlinear function, first and second derivative information available
Problem formulation:

  min f(x), x in R^n
  s.t. h(x) = 0   (equality constraints)
       g(x) ≥ 0   (inequality constraints)

with Lagrangian L(x, y, w) = f(x) - y^T h(x) - w^T g(x).
Hypre
Linear system interfaces sit on top of interchangeable data layouts and solvers:
• Data layouts: structured, composite, block-structured, unstructured, CSR
• Linear solvers: GMG, ..., FAC, ..., Hybrid, ..., AMGe, ..., ILU, ...
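As an illustration of the CSR-style path through this stack, the sketch below assembles a 1D Laplacian through hypre's IJ (linear-algebraic) interface and solves it with the BoomerAMG algebraic multigrid solver. It is not from the original slides: the call names follow hypre's documented C API, but details (e.g. a required global initialization call in newer releases) vary across versions, and the matrix and tolerance are illustrative.

/* Minimal sketch: 1D Laplacian assembled through hypre's IJ interface
   and solved with BoomerAMG (algebraic multigrid). */
#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"
#include "HYPRE_parcsr_ls.h"

int main(int argc, char *argv[])
{
  int myid, nprocs, n = 100, ilower, iupper, i;
  HYPRE_IJMatrix A;      HYPRE_ParCSRMatrix parcsr_A;
  HYPRE_IJVector b, x;   HYPRE_ParVector par_b, par_x;
  HYPRE_Solver solver;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  /* Contiguous range of rows owned by this process */
  ilower = myid * (n / nprocs);
  iupper = (myid == nprocs - 1) ? n - 1 : ilower + n / nprocs - 1;

  /* Assemble the matrix row by row (CSR-like SetValues calls) */
  HYPRE_IJMatrixCreate(MPI_COMM_WORLD, ilower, iupper, ilower, iupper, &A);
  HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR);
  HYPRE_IJMatrixInitialize(A);
  for (i = ilower; i <= iupper; i++) {
    int cols[3], ncols = 0;
    double vals[3];
    if (i > 0)     { cols[ncols] = i - 1; vals[ncols++] = -1.0; }
    cols[ncols] = i; vals[ncols++] = 2.0;
    if (i < n - 1) { cols[ncols] = i + 1; vals[ncols++] = -1.0; }
    HYPRE_IJMatrixSetValues(A, 1, &ncols, &i, cols, vals);
  }
  HYPRE_IJMatrixAssemble(A);
  HYPRE_IJMatrixGetObject(A, (void **) &parcsr_A);

  /* Right-hand side b = 1 and initial guess x = 0 */
  HYPRE_IJVectorCreate(MPI_COMM_WORLD, ilower, iupper, &b);
  HYPRE_IJVectorSetObjectType(b, HYPRE_PARCSR);
  HYPRE_IJVectorInitialize(b);
  HYPRE_IJVectorCreate(MPI_COMM_WORLD, ilower, iupper, &x);
  HYPRE_IJVectorSetObjectType(x, HYPRE_PARCSR);
  HYPRE_IJVectorInitialize(x);
  for (i = ilower; i <= iupper; i++) {
    double one = 1.0, zero = 0.0;
    HYPRE_IJVectorSetValues(b, 1, &i, &one);
    HYPRE_IJVectorSetValues(x, 1, &i, &zero);
  }
  HYPRE_IJVectorAssemble(b);
  HYPRE_IJVectorGetObject(b, (void **) &par_b);
  HYPRE_IJVectorAssemble(x);
  HYPRE_IJVectorGetObject(x, (void **) &par_x);

  /* BoomerAMG as the solver */
  HYPRE_BoomerAMGCreate(&solver);
  HYPRE_BoomerAMGSetTol(solver, 1e-7);
  HYPRE_BoomerAMGSetup(solver, parcsr_A, par_b, par_x);
  HYPRE_BoomerAMGSolve(solver, parcsr_A, par_b, par_x);
  HYPRE_BoomerAMGDestroy(solver);

  HYPRE_IJMatrixDestroy(A);
  HYPRE_IJVectorDestroy(b);
  HYPRE_IJVectorDestroy(x);
  MPI_Finalize();
  return 0;
}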
SUNDIALS
SUite of Nonlinear and DIfferential/ALgebraic Solvers
Solvers:
• CVODE: ODE integrator, for x' = f(t,x), x(t0) = x0
• IDA: DAE integrator, for F(t,x,x') = 0, x(t0) = x0
• KINSOL: nonlinear solver, for F(x) = 0

Shared modules: vector kernels, dense and band linear solvers, preconditioned GMRES, and general preconditioner modules.
User-supplied: main routine, problem-defining function, and (optionally) a preconditioner function.

Sensitivity analysis (forward and adjoint) is also available.
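For illustration, the sketch below integrates the scalar ODE x' = -x, x(0) = 1, with CVODE's BDF method. It is not from the original slides and assumes the SUNDIALS 4.x/5.x C API (signatures have changed across versions; e.g. CVodeCreate later gained a SUNContext argument), with the problem and tolerances made up for the example.

/* Minimal sketch: integrate x' = -x, x(0) = 1 with CVODE (BDF). */
#include <stdio.h>
#include <cvode/cvode.h>
#include <nvector/nvector_serial.h>
#include <sunmatrix/sunmatrix_dense.h>
#include <sunlinsol/sunlinsol_dense.h>

/* Right-hand side of the ODE: ydot = f(t, y) = -y */
static int f(realtype t, N_Vector y, N_Vector ydot, void *user_data)
{
  NV_Ith_S(ydot, 0) = -NV_Ith_S(y, 0);
  return 0;
}

int main(void)
{
  realtype t = 0.0;
  N_Vector y = N_VNew_Serial(1);
  NV_Ith_S(y, 0) = 1.0;                      /* x(0) = 1 */

  void *cvode_mem = CVodeCreate(CV_BDF);     /* stiff (BDF) integrator */
  CVodeInit(cvode_mem, f, 0.0, y);
  CVodeSStolerances(cvode_mem, 1e-8, 1e-10); /* rel/abs tolerances */

  /* Dense direct linear solver for the Newton iterations */
  SUNMatrix A = SUNDenseMatrix(1, 1);
  SUNLinearSolver LS = SUNLinSol_Dense(y, A);
  CVodeSetLinearSolver(cvode_mem, LS, A);

  CVode(cvode_mem, 1.0, y, &t, CV_NORMAL);   /* advance to t = 1 */
  printf("x(1) = %g (exact: exp(-1) = 0.3678794)\n",
         (double) NV_Ith_S(y, 0));

  N_VDestroy(y); SUNLinSolFree(LS); SUNMatDestroy(A);
  CVodeFree(&cvode_mem);
  return 0;
}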
ScaLAPACK

• Software hierarchy
• Data distribution
• Simple examples
• Application
ScaLAPACK: software hierarchy
Global layer: ScaLAPACK and the PBLAS.
Local layer: LAPACK, the BLAS, the BLACS, and MPI/PVM/... (platform specific).

• ScaLAPACK: clarity, modularity, performance and portability.
• PBLAS: parallel BLAS.
• LAPACK: linear systems, least squares, singular value decomposition, eigenvalues.
• BLAS: basic linear algebra; ATLAS can be used here for automatic tuning.
• BLACS: communication routines targeting linear algebra operations.
• MPI/PVM/...: communication layer (message passing).

http://acts.nersc.gov/scalapack
ScaLAPACK: 2D Block-Cyclic Distribution
A 5x5 matrix partitioned in 2x2 blocks:

a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55

From the point of view of a 2x2 process grid (processes 0 and 1 in the first row, 2 and 3 in the second), blocks are dealt out cyclically in both dimensions, so each process owns an interleaved portion of the matrix:

Process 0: a11 a12 a15 / a21 a22 a25 / a51 a52 a55
Process 1: a13 a14 / a23 a24 / a53 a54
Process 2: a31 a32 a35 / a41 a42 a45
Process 3: a33 a34 / a43 a44

http://acts.nersc.gov/scalapack/hands-on/datadist.html
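The global-to-local mapping can be computed directly. The short sketch below (illustrative, not part of the slides; the helper name is made up, and it mirrors what ScaLAPACK's INDXG2P/INDXG2L utilities do) reproduces the distribution pictured above for one dimension, using zero-based indices:

/* Map a global index to (process, local index) under a 1D block-cyclic
   distribution with block size nb over nprocs processes (source 0). */
#include <stdio.h>

static void global_to_local(int ig, int nb, int nprocs, int *proc, int *il)
{
  int block = ig / nb;                     /* block containing ig         */
  *proc = block % nprocs;                  /* blocks dealt out cyclically */
  *il   = (block / nprocs) * nb + ig % nb; /* offset in the local array   */
}

int main(void)
{
  /* 5 rows, 2x2 blocks, 2 process rows, as in the 5x5 example above */
  for (int ig = 0; ig < 5; ig++) {
    int p, il;
    global_to_local(ig, 2, 2, &p, &il);
    printf("global row %d -> process row %d, local row %d\n", ig, p, il);
  }
  return 0;
}

Applying the same mapping to the columns yields exactly the per-process pieces shown above.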
2D Block-Cyclic Distribution
Consider the 5x5 matrix

 1.1  1.2  1.3  1.4  1.5
-2.1  2.2  2.3  2.4  2.5
-3.1 -3.2  3.3  3.4  3.5
-4.1 -4.2 -4.3  4.4  4.5
-5.1 -5.2 -5.3 -5.4 -5.5

distributed in 2x2 blocks over a 2x2 process grid (processes 0 and 1 in the first row, 2 and 3 in the second). Each process stores its pieces contiguously in a local array A with leading dimension LDA; DESCA is the array descriptor for A (it contains information about the global layout of A):

      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
      IF ( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         A(1) = 1.1; A(2) = -2.1; A(3) = -5.1
         A(1+LDA) = 1.2; A(2+LDA) = 2.2; A(3+LDA) = -5.2
         A(1+2*LDA) = 1.5; A(2+2*LDA) = 2.5; A(3+2*LDA) = -5.5
      ELSE IF ( MYROW.EQ.0 .AND. MYCOL.EQ.1 ) THEN
         A(1) = 1.3; A(2) = 2.3; A(3) = -5.3
         A(1+LDA) = 1.4; A(2+LDA) = 2.4; A(3+LDA) = -5.4
      ELSE IF ( MYROW.EQ.1 .AND. MYCOL.EQ.0 ) THEN
         A(1) = -3.1; A(2) = -4.1
         A(1+LDA) = -3.2; A(2+LDA) = -4.2
         A(1+2*LDA) = 3.5; A(2+2*LDA) = 4.5
      ELSE IF ( MYROW.EQ.1 .AND. MYCOL.EQ.1 ) THEN
         A(1) = 3.3; A(2) = -4.3
         A(1+LDA) = 3.4; A(2+LDA) = 4.4
      END IF
      CALL PDGESVD( JOBU, JOBVT, M, N, A, IA, JA, DESCA, S, U, IU, JU,
     $              DESCU, VT, IVT, JVT, DESCVT, WORK, LWORK, INFO )

On-line tutorial: http://acts.nersc.gov/scalapack/hands-on/main.html
      PROGRAM PSGESVDRIVER
*
*     Example Program solving Ax=b via ScaLAPACK routine PSGESV
*
*     .. Parameters ..
*     ..
*     .. Local Scalars ..
*     ..
*     .. Local Arrays ..
*     ..
*     .. Executable Statements ..
*
*     INITIALIZE THE PROCESS GRID
*
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     If I'm not in the process grid, go to the end of the program
*
      IF( MYROW.EQ.-1 )
     $   GO TO 10
*
*     DISTRIBUTE THE MATRIX ON THE PROCESS GRID
*     Initialize the array descriptors for the matrices A and B
*
      CALL DESCINIT( DESCA, M, N, MB, NB, RSRC, CSRC, ICTXT, MXLLDA,
     $               INFO )
      CALL DESCINIT( DESCB, N, NRHS, NB, NBRHS, RSRC, CSRC, ICTXT,
     $               MXLLDB, INFO )
*
*     Generate matrices A and B and distribute to the process grid
*
      CALL MATINIT( A, DESCA, B, DESCB )
*
*     Make a copy of A and B for checking purposes
*
      CALL PSLACPY( 'All', N, N, A, 1, 1, DESCA, A0, 1, 1, DESCA )
      CALL PSLACPY( 'All', N, NRHS, B, 1, 1, DESCB, B0, 1, 1, DESCB )
*
*     CALL THE SCALAPACK ROUTINE
*     Solve the linear system A * X = B
*
      CALL PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
     $             INFO )
/**********************************************************************/
/* This program illustrates the use of the ScaLAPACK routines PDPTTRF */
/* and PDPTTRS to factor and solve a symmetric positive definite      */
/* tridiagonal system of linear equations, i.e., T*x = b, with        */
/* different data in two distinct contexts.                           */
/**********************************************************************/

/* a bunch of things omitted for the sake of space */

main()
{
   /* Start BLACS */
   Cblacs_pinfo( &mype, &npe );
   Cblacs_get( 0, 0, &context );
   Cblacs_gridinit( &context, "R", 1, npe );
   /* Processes 0 and 2 contain d(1:4) and e(1:4) */
   /* Processes 1 and 3 contain d(5:8) and e(5:8) */
   if ( mype == 0 || mype == 2 ){
      d[0]=1.8180; d[1]=1.6602; d[2]=1.3420; d[3]=1.2897;
      e[0]=0.8385; e[1]=0.5681; e[2]=0.3704; e[3]=0.7027;
   }
   else if ( mype == 1 || mype == 3 ){
      d[0]=1.3412; d[1]=1.5341; d[2]=1.7271; d[3]=1.3093;
      e[0]=0.5466; e[1]=0.4449; e[2]=0.6946; e[3]=0.0000;
   }
   if ( mype == 0 || mype == 1 ) {
      /* New context for processes 0 and 1 */
      map[0]=0; map[1]=1;
      Cblacs_get( context, 10, &context_1 );
      Cblacs_gridmap( &context_1, map, 1, 1, 2 );
      /* Right-hand side is set to b = [ 1 2 3 4 5 6 7 8 ] */
      if ( mype == 0 ) {
         b[0]=1.0; b[1]=2.0; b[2]=3.0; b[3]=4.0;
      }
      else if ( mype == 1 ) {
         b[0]=5.0; b[1]=6.0; b[2]=7.0; b[3]=8.0;
      }
      /* Array descriptor for A (D and E) */
      desca[0]=501; desca[1]=context_1; desca[2]=n; desca[3]=nb;
      desca[4]=0; desca[5]=lda; desca[6]=0;
      /* Array descriptor for B */
      descb[0]=502; descb[1]=context_1; descb[2]=n; descb[3]=nb;
      descb[4]=0; descb[5]=ldb; descb[6]=0;
      /* Factorization */
      pdpttrf( &n, d, e, &ja, desca, af, &laf, work, &lwork, &info );
      /* Solution */
      pdpttrs( &n, &nrhs, d, e, &ja, desca, b, &ib, descb, af, &laf,
               work, &lwork, &info );
      printf( "MYPE=%i: x[:] = %7.4f %7.4f %7.4f %7.4f\n",
              mype, b[0], b[1], b[2], b[3] );
   }
   else {
      /* New context for processes 2 and 3 */
      map[0]=2; map[1]=3;
      Cblacs_get( context, 10, &context_2 );
      Cblacs_gridmap( &context_2, map, 1, 1, 2 );
      /* Right-hand side is set to b = [ 8 7 6 5 4 3 2 1 ] */
      if ( mype == 2 ) {
         b[0]=8.0; b[1]=7.0; b[2]=6.0; b[3]=5.0;
      }
      else if ( mype == 3 ) {
         b[0]=4.0; b[1]=3.0; b[2]=2.0; b[3]=1.0;
      }
      /* Array descriptor for A (D and E) */
      desca[0]=501; desca[1]=context_2; desca[2]=n; desca[3]=nb;
      desca[4]=0; desca[5]=lda; desca[6]=0;
      /* Array descriptor for B */
      descb[0]=502; descb[1]=context_2; descb[2]=n; descb[3]=nb;
      descb[4]=0; descb[5]=ldb; descb[6]=0;
      /* Factorization */
      pdpttrf( &n, d, e, &ja, desca, af, &laf, work, &lwork, &info );
      /* Solution */
      pdpttrs( &n, &nrhs, d, e, &ja, desca, b, &ib, descb, af, &laf,
               work, &lwork, &info );
      printf( "MYPE=%i: x[:] = %7.4f %7.4f %7.4f %7.4f\n",
              mype, b[0], b[1], b[2], b[3] );
   }
   Cblacs_gridexit( context );
   Cblacs_exit( 0 );
}
Using Matlab notation:

  T = diag(D) + diag(E,-1) + diag(E,1)

where

  D = [ 1.8180 1.6602 1.3420 1.2897 1.3412 1.5341 1.7271 1.3093 ]
  E = [ 0.8385 0.5681 0.3704 0.7027 0.5466 0.4449 0.6946 ]

Then, solving T*x = b:

  if b = [ 1 2 3 4 5 6 7 8 ]
     x = [ 0.3002 0.5417 1.4942 1.8546 1.5008 3.0806 1.0197 5.5692 ]
  if b = [ 8 7 6 5 4 3 2 1 ]
     x = [ 3.9036 1.0772 3.4122 2.1837 1.3090 1.2988 0.6563 0.4156 ]
Application: Cosmic Microwave Background (CMB) Analysis
• The statistics of the tiny variations in the CMB (the faint echo of the Big Bang) allow the determination of the fundamental parameters of cosmology to the percent level or better.
• MADCAP (Microwave Anisotropy Dataset Computational Analysis Package) makes maps from observations of the CMB and then calculates their angular power spectra (see http://crd.lbl.gov/~borrill).
• Calculations are dominated by the solution of linear systems of the form M = A^-1 B for dense n x n matrices A and B, scaling as O(n^3) in flops. MADCAP uses ScaLAPACK for those calculations.
• On the NERSC Cray T3E (original code):
  • Cholesky factorization and triangular solve.
  • Typically reached 70-80% of peak performance.
  • Solution of systems with n ~ 10^4 using tens of processors.
  • The results demonstrated that the Universe is spatially flat, comprising 70% dark energy, 25% dark matter, and only 5% ordinary matter.
• On the NERSC IBM SP:
  • Porting was trivial, but tests showed only 20-30% of peak performance.
  • The code was rewritten to use triangular matrix inversion and triangular matrix multiplication (about one day of work); performance increased to 50-60% of peak.
  • Solution of previously intractable systems with n ~ 10^5 using hundreds of processors.

The international BOOMERanG collaboration announced the results of the most detailed measurement of the cosmic microwave background radiation (CMB), which strongly indicated that the universe is flat (Apr. 27, 2000).
SuperLU

• Functionalities
• Simple examples
• Application
SuperLU
• Solves general sparse linear systems A x = b.
• Algorithm: Gaussian elimination (factorization A = LU), followed by lower/upper triangular solutions.
• Efficient and portable implementation for high-performance architectures; flexible interface.

Three variants (a serial sketch follows below; the distributed driver is shown on the next slides):
• SuperLU: serial; written in C; real/complex, single/double precision.
• SuperLU_MT: shared memory (SMP); C and Pthreads (or pragmas); real, double precision.
• SuperLU_DIST: distributed memory; C and MPI; real/complex, double precision.
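A minimal serial sketch (not from the original slides; it follows the style of SuperLU's own examples for the dgssv driver, with a made-up 3x3 matrix, and assumes a SuperLU version whose header is named slu_ddefs.h):

/* Minimal sketch: factor and solve a small sparse system with the
   serial SuperLU driver dgssv. */
#include "slu_ddefs.h"

int main(void)
{
  /* 3x3 matrix in compressed sparse column format:
       [ 2 0 1 ]
       [ 0 3 0 ]
       [ 1 0 2 ]  */
  int m = 3, n = 3, nnz = 5, nrhs = 1, info;
  double a[]   = { 2.0, 1.0, 3.0, 1.0, 2.0 };  /* nonzeros, by column  */
  int asub[]   = { 0, 2, 1, 0, 2 };            /* row indices          */
  int xa[]     = { 0, 2, 3, 5 };               /* column pointers      */
  double rhs[] = { 1.0, 1.0, 1.0 };            /* right-hand side b    */

  SuperMatrix A, L, U, B;
  superlu_options_t options;
  SuperLUStat_t stat;
  int *perm_r = intMalloc(m);   /* row permutation from pivoting      */
  int *perm_c = intMalloc(n);   /* column permutation (fill reducing) */

  dCreate_CompCol_Matrix(&A, m, n, nnz, a, asub, xa, SLU_NC, SLU_D, SLU_GE);
  dCreate_Dense_Matrix(&B, m, nrhs, rhs, m, SLU_DN, SLU_D, SLU_GE);

  set_default_options(&options);
  StatInit(&stat);

  /* Factor A = LU and solve A x = b; the solution overwrites B */
  dgssv(&options, &A, perm_c, perm_r, &L, &U, &B, &stat, &info);

  StatFree(&stat);
  /* (matrix and permutation cleanup omitted for brevity) */
  return info;
}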
SuperLU: Example Pddrive.c (1/2)
#include "superlu_ddefs.h“
main(int argc, char *argv[])
{
    superlu_options_t options;
    SuperLUStat_t stat;
    SuperMatrix A;
    ScalePermstruct_t ScalePermstruct;
    LUstruct_t LUstruct;
    SOLVEstruct_t SOLVEstruct;
    gridinfo_t grid;
    · · · · · ·
    /* Initialize MPI environment */
    MPI_Init( &argc, &argv );
    · · · · · ·
    /* Initialize the SuperLU process grid */
    nprow = npcol = 2;
    superlu_gridinit(MPI_COMM_WORLD, nprow, npcol, &grid);

    /* Read matrix A from file, distribute it, and set up the
       right-hand side */
    dcreate_matrix(&A, nrhs, &b, &ldb, &xtrue, &ldx, fp, &grid);

    /* Set the options for the solver. Defaults are:
       options.Fact = DOFACT; options.Equil = YES;
       options.ColPerm = MMD_AT_PLUS_A; options.RowPerm = LargeDiag;
       options.ReplaceTinyPivot = YES; options.Trans = NOTRANS;
       options.IterRefine = DOUBLE; options.SolveInitialized = NO;
       options.RefineInitialized = NO; options.PrintStat = YES; */
    set_default_options_dist(&options);
SuperLU: Example Pddrive.c (2/2)
    /* Initialize ScalePermstruct and LUstruct. */
    ScalePermstructInit(m, n, &ScalePermstruct);
    LUstructInit(m, n, &LUstruct);

    /* Initialize the statistics variables. */
    PStatInit(&stat);

    /* Call the linear equation solver. */
    pdgssvx(&options, &A, &ScalePermstruct, b, ldb, nrhs, &grid,
            &LUstruct, &SOLVEstruct, berr, &stat, &info);

    /* Print the statistics. */
    PStatPrint(&options, &stat, &grid);

    /* Deallocate storage */
    PStatFree(&stat);
    Destroy_LU(n, &grid, &LUstruct);
    LUstructFree(&LUstruct);

    /* Release the SuperLU process grid */
    superlu_gridexit(&grid);

    /* Terminate the MPI execution environment */
    MPI_Finalize();
}
Application: Accelerator Cavity Design (1/2)
• Calculate cavity mode frequencies and field vectors:
  • Solve the Maxwell equations for the electromagnetic field
  • Omega3P simulation code developed at SLAC
• Finite element methods lead to a large sparse generalized eigensystem K x = λ M x
• Real symmetric for lossless cavities; complex symmetric for lossy cavities
• Seek interior eigenvalues (tightly clustered) that are relatively small in magnitude

Omega3P model of a 47-cell section of the 206-cell Next Linear Collider accelerator structure.
Application: Accelerator Cavity Design (2/2)
[Figure: dds47 performance for 16 eigenvalues; total eigensolver time in seconds versus number of processors (16, 32, 48, 64) for five strategies: Minimum Degree, METIS, Filtering, METIS+Shared, and Minimum Degree Shared.]

• Speed up Lanczos convergence by shift-invert: seek the largest, well separated eigenvalues of the transformed system (K - σM)^-1 M x = μ x, where μ = 1/(λ - σ).
• The Filtering algorithm [Y. Sun]: inexact shift-invert Lanczos + JOCC.
• Exact shift-invert Lanczos (ESIL): PARPACK + SuperLU_DIST for the shifted linear system (no pivoting, no iterative refinement).

dds47 (linear elements), total eigensolver time: N = 1.3x10^6, NNZ = 20x10^6.
TAU

• Functionalities
• Simple examples
• Application
TAU: Functionalities
• Profiling of Java, C++, C, and Fortran codes
• Detailed information (much more than prof/gprof)
• Profiles for each unique template instantiation
• Time spent exclusively and inclusively in each function
• Start/stop timers
• Profiling data maintained for each thread, context, and node
• Parallel I/O statistics; number of calls for each profiled function
• Profiling groups for organizing and controlling instrumentation
• Support for CPU hardware counters (via PAPI)
• Graphical display of parallel profiling data (built-in viewers, interface to Vampir)
TAU: Example 1 (1/4)
[Screenshot: index of http://acts.nersc.gov/tau/programs/psgesv, showing the TAU configuration options currently installed and the option used in the example.]
TAU: Example 1 (2/4)
      PROGRAM PSGESVDRIVER
!
!     Example Program solving Ax=b via ScaLAPACK routine PSGESV
!
!     .. Parameters ..
!**** a bunch of things omitted for the sake of space ****
!     .. Executable Statements ..
!
!     INITIALIZE THE PROCESS GRID
!
      integer profiler(2)
      save profiler
      call TAU_PROFILE_INIT()
      call TAU_PROFILE_TIMER(profiler, 'PSGESVDRIVER')
      call TAU_PROFILE_START(profiler)
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
!**** a bunch of things omitted for the sake of space ****
      CALL PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, &
                   INFO )
!**** a bunch of things omitted for the sake of space ****
      call TAU_PROFILE_STOP(profiler)
      STOP
      END

(psgesvdriver.int.f90)

NB: the ScaLAPACK routines themselves have not been instrumented and therefore are not shown in the charts.
TAU: Example 1 (3/4)
[Screenshots: profile displays for the instrumented run.]

TAU: Example 1 (4/4)

[Screenshots: additional profile displays.]
TAU: Example 2 (1/2)
[Screenshot: index of http://acts.nersc.gov/tau/programs/pdgssvx.]

Makefile: compilation rule

include $(TAUROOTDIR)/rs6000/lib/Makefile.tau-mpi-papi-pdt-profile-trace

OBJS = pddrive.o dcreate_matrix.o pdgssvx.o
TARGET = pddrive.x

COMPRULE = $(PDTCPARSE) $< $(COMPFLAGS) $(CDEFS) $(INCLUDEDIR); \
           $(TAUINSTR) $*.pdb $< -o $*.inst.c -g "COMPUTATION" ; \
           $(CC) $(COMPFLAGS) $(CDEFS) $(BLASDEF) $(INCLUDEDIR) -c $*.inst.c -o $@ ; \
           rm -f $*.pdb ;

$(TARGET): $(OBJS)
	$(LINKER) $(LNKFLAGS) $(OBJS) $(LIBS) $(TAULIBS) -lm -o $@

# Compilation rule
.c.o:
	$(COMPRULE)
TAU: Example 2 (2/2)
PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details, and contact [email protected] for the corresponding TAU events). In this example we are just measuring FLOPS.

[Screenshot: ParaProf display of the measurements.]
Case Study: Electronic Structure Calculations Code
…
Additional Information

http://acts.nersc.gov
• Tool descriptions, installation details, examples, etc.
• Agenda, accomplishments, conferences, releases, etc.
• Goals and other relevant information
• Points of contact
• Search engine
• High performance tools:
  • portable
  • library calls
  • robust algorithms
  • help code optimization
• Scientific computing centers (like PSC and NERSC):
  • Reduced user code development time, which translates into more production runs and faster, more effective scientific research results
  • Overall better system utilization
  • Easier accumulation and distribution of high performance computing expertise
  • Better scientific parameters for procurement and characterization of specific user needs
References (see also http://acts.nersc.gov/documents):
• An Expanded Framework for the Advanced Computational Testing and Simulation Toolkit, http://acts.nersc.gov/documents/Proposal.pdf
• The Advanced Computational Testing and Simulation (ACTS) Toolkit. Technical Report LBNL-50414.
• A First Prototype of PyACTS. Technical Report LBNL-53849.
• ACTS - A Collection of High Performing Tools for Scientific Computing. Technical Report LBNL-53897.
• The ACTS Collection: Robust and High-Performance Tools for Scientific Computing. Guidelines for Tool Inclusion and Retirement. Technical Report LBNL/PUB-3175.
• An Infrastructure for the Creation of High End Scientific and Engineering Software Tools and Applications. Technical Report LBNL/PUB-3176.
• To appear: two journals featuring ACTS tools.

Tutorials and Workshops:
• How Can ACTS Work for You?, http://acts.nersc.gov/events/Workshop2000
• Solving Problems in Science and Engineering, http://acts.nersc.gov/events/Workshop2001
• Robust and High Performance Tools for Scientific Computing, http://acts.nersc.gov/events/Workshop2002
• Robust and High Performance Tools for Scientific Computing, http://acts.nersc.gov/events/Workshop2003
• The ACTS Collection: Robust and High Performance Libraries for Computational Sciences, SIAM PP04, http://www.siam.org/meetings/pp04
• Enabling Technologies for High End Computer Simulations, http://acts.nersc.gov/events/Workshop2004

Progress Reports:
• FY 2003-1, http://acts.nersc.gov/documents/Report2003-1.pdf
• FY 2003-2, http://acts.nersc.gov/documents/Report2003-2.pdf
• FY 2004-1, http://acts.nersc.gov/documents/Report2004-1.pdf
• FY 2004-2, http://acts.nersc.gov/documents/Report2004-2.pdf

[email protected]
http://acts.nersc.gov
Matrix of Applications and Tools

... more applications ...

http://acts.nersc.gov/AppMat

Enabling sciences and discoveries… with high performance and scalability…