Page 1

ACTS Tools and

Case Studies of their Use

Osni Marques and Tony Drummond, Lawrence Berkeley National Laboratory

[oamarques,ladrummond]@lbl.gov

NERSC User Group Meeting June 24-25, 2004

Page 2

Goals
• Extended support for experimental software
• Make ACTS tools available on DOE computers
• Provide technical support
• Maintain ACTS information center
• Coordinate efforts with other supercomputing centers
• Enable large scale scientific applications
• Educate and train

What is the DOE ACTS Collection?
• Advanced CompuTational Software Collection
• Tools for developing parallel applications
• ACTS started as an “umbrella” project
• http://acts.nersc.gov ([email protected])

Levels of user support:
• Basic: help with installation, basic knowledge of the tools, compilation of users’ reports
• Intermediate: basic level, plus a higher level of support to users of the tool
• High: intermediate level, plus tool expertise and tutorials

Page 3

Motivation 1: Challenges in the Development of Scientific Codes

• Productivity: time to the first solution (prototype), time to solution (production), other requirements
• Complexity: increasingly sophisticated models, model coupling, interdisciplinarity
• Performance: increasingly complex algorithms, increasingly complex architectures, increasingly demanding applications


• Libraries written in different languages; discussions about standardizing interfaces are often sidetracked into implementation issues.
• Difficulties managing multiple libraries developed by third parties.
• Need to use more than one language in one application.
• The code is long-lived and different pieces evolve at different rates.
• Swapping competing implementations of the same idea and testing without modifying the code.
• Need to compose an application with others that were not originally designed to be combined.


Page 4

Motivation 2: Addressing the Performance Gap through Software

• Peak performance is skyrocketing: in the 1990s peak performance increased 100x; in the 2000s it will increase 1000x.
• But efficiency for many science applications declined from 40-50% on the vector supercomputers of the 1990s to as little as 5-10% on the parallel supercomputers of today.
• Research is needed on mathematical methods and algorithms that achieve high performance on a single processor and scale to thousands of processors, and on more efficient programming models for massively parallel supercomputers.

[Chart: peak performance vs. real performance in teraflops (log scale, 0.1 to 1,000), 1996-2004, illustrating the widening performance gap.]

Page 5

ACTS Tools Functionalities

Numerical:
• Aztec: algorithms for the iterative solution of large sparse linear systems.
• Hypre: algorithms for the iterative solution of large sparse linear systems, intuitive grid-centric interfaces, and dynamic configuration of parameters.
• PETSc: tools for the solution of PDEs that require solving large-scale, sparse linear and nonlinear systems of equations.
• OPT++: object-oriented nonlinear optimization package.
• SUNDIALS: solvers for systems of ordinary differential equations, nonlinear algebraic equations, and differential-algebraic equations.
• ScaLAPACK: library of high performance dense linear algebra routines for distributed-memory message-passing.
• SuperLU: general-purpose library for the direct solution of large, sparse, nonsymmetric systems of linear equations.
• TAO: large-scale optimization software, including nonlinear least squares, unconstrained minimization, bound constrained optimization, and general nonlinear optimization.

Code Development:
• Global Arrays: library for writing parallel programs that use large arrays distributed across processing nodes and that offers a shared-memory view of distributed arrays.
• Overture: object-oriented tools for solving computational fluid dynamics and combustion problems in complex geometries.

Code Execution:
• CUMULVS: framework that enables programmers to incorporate fault-tolerance, interactive visualization and computational steering into existing parallel programs.
• Globus: services for the creation of computational Grids and tools with which applications can be developed to access the Grid.
• PAWS: framework for coupling parallel applications within a component-like model.
• SILOON: tools and run-time support for building easy-to-use external interfaces to existing numerical codes.
• TAU: set of tools for analyzing the performance of C, C++, Fortran and Java programs.

Library Development:
• ATLAS and PHiPAC: tools for the automatic generation of optimized numerical software for modern computer architectures and compilers.
• PETE: extensible implementation of the expression template technique (a C++ technique for passing expressions as function arguments).

[Slide graphic: problem classes addressed by the tools, e.g. Ax = b, Az = λz, A = UΣV^T, ODEs, PDEs.]

Page 6

Use of ACTS Tools

• 3D overlapping grid for a submarine produced with Overture's module ogen.
• Model of a "hard" sphere included in a "soft" material, 26 million degrees of freedom: unstructured meshes in solid mechanics using Prometheus and PETSc (Adams and Demmel).
• Multiphase flow using PETSc: 4 million cell blocks, 32 million degrees of freedom, over 10.6 Gflops on an IBM SP (128 nodes); the entire simulation runs in less than 30 minutes (Pope, Gropp, Morgan, Sepehrnoori, Smith and Wheeler).
• 3D incompressible Euler on a tetrahedral grid, up to 11 million unknowns, based on a legacy NASA code, FUN3D (W. K. Anderson); fully implicit, steady-state, parallelized with PETSc (courtesy of Kaushik and Keyes).
• Electronic structure optimization of (UO2)3(CO3)6 performed with TAO (courtesy of de Jong).
• Molecular dynamics and thermal flow simulations using codes based on Global Arrays. GA has been employed in large simulation codes such as NWChem, GAMESS-UK, Columbus, Molpro, Molcas, MWPhys/Grid, etc.

Page 7

Use of ACTS Tools

• Induced current (white arrows) and charge density (colored plane and gray surface) in crystallized glycine due to an external field (Louie, Yoon, Pfrommer and Canning); the eigenvalue problems are solved with ScaLAPACK.
• OPT++ is used in protein energy minimization problems (shown here is protein T162 from CASP5; courtesy of Meza, Oliva et al.).
• Omega3P is a parallel distributed-memory code for the modeling and analysis of accelerator cavities, which requires the solution of generalized eigenvalue problems. A parallel exact shift-invert eigensolver based on PARPACK and SuperLU has allowed the solution of a problem of order 7.5 million with 304 million nonzeros; finding 10 eigenvalues requires about 2.5 hours on 24 processors of an IBM SP.
• Two ScaLAPACK routines, PZGETRF and PZGETRS, are used to solve the linear systems in the spectral-algorithm-based AORSA code (Batchelor et al.), which studies electromagnetic wave-plasma interactions. The code reaches 68% of peak performance on 1936 processors of an IBM SP.

Page 8

PETSc
Portable, Extensible Toolkit for Scientific Computation

Layered architecture (from application to kernels):
• PETSc PDE application codes
• ODE integrators | visualization interface
• Nonlinear solvers, unconstrained minimization
• Linear solvers: preconditioners + Krylov methods
• Object-oriented matrices, vectors, indices | grid management
• Profiling interface
• Computation and communication kernels: MPI, MPI-IO, BLAS, LAPACK

• Vectors: fundamental objects for storing field solutions, right-hand sides, etc.
• Matrices: fundamental objects for storing linear operators (e.g., Jacobians)
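To make the layering concrete, below is a minimal sketch of a PETSc linear solve in C. It is not taken from the slides: the 1-D Laplacian test matrix is our own illustrative choice, and the calls follow the present-day PETSc 3.x API (e.g. PetscCall, MatCreateVecs), whose names differ in places from the 2004-era release discussed in this talk.

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat      A;
  Vec      x, b;
  KSP      ksp;
  PetscInt i, n = 100, Istart, Iend;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));

  /* Assemble a 1-D Laplacian; PETSc chooses the parallel row distribution */
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n));
  PetscCall(MatSetFromOptions(A));
  PetscCall(MatSetUp(A));
  PetscCall(MatGetOwnershipRange(A, &Istart, &Iend));
  for (i = Istart; i < Iend; i++) {
    if (i > 0)     PetscCall(MatSetValue(A, i, i - 1, -1.0, INSERT_VALUES));
    if (i < n - 1) PetscCall(MatSetValue(A, i, i + 1, -1.0, INSERT_VALUES));
    PetscCall(MatSetValue(A, i, i, 2.0, INSERT_VALUES));
  }
  PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
  PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));

  /* Vectors: right-hand side b = 1, solution x */
  PetscCall(MatCreateVecs(A, &x, &b));
  PetscCall(VecSet(b, 1.0));

  /* Linear solver: Krylov method + preconditioner, configurable at run time */
  PetscCall(KSPCreate(PETSC_COMM_WORLD, &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPSolve(ksp, b, x));

  PetscCall(KSPDestroy(&ksp));
  PetscCall(VecDestroy(&x));
  PetscCall(VecDestroy(&b));
  PetscCall(MatDestroy(&A));
  PetscCall(PetscFinalize());
  return 0;
}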

Page 9

PETSc - Numerical Components

• Matrices: Compressed Sparse Row (AIJ), Blocked Compressed Sparse Row (BAIJ), Block Diagonal (BDIAG), Dense, Matrix-free, Other
• Index Sets: Indices, Block Indices, Stride, Other
• Vectors and Distributed Arrays
• Nonlinear Solvers: Newton-based methods (line search, trust region), Other
• Preconditioners: Additive Schwarz, Block Jacobi, Jacobi, ILU, ICC, LU (sequential only), Others
• Krylov Subspace Methods: GMRES, CG, CGS, Bi-CG-STAB, TFQMR, Richardson, Chebyshev, Other
• Time Steppers: Euler, Backward Euler, Pseudo Time Stepping, Other
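Most of these components sit behind a uniform interface and can be swapped at run time through PETSc's options database rather than in source code; for example, command-line options such as -ksp_type gmres, -pc_type bjacobi or -ksp_monitor select the Krylov method and preconditioner and turn on convergence monitoring (option names as in the PETSc documentation; this particular combination is only an illustration).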

Page 10

TAO

Toolkit for Advanced Optimization

min { f(x) : x_l ≤ x ≤ x_u }

Page 11

OPT++

• Assumptions:
  • The objective function is smooth (twice continuously differentiable)
  • The constraints are linearly independent
  • Objective functions are expensive to evaluate
• Four major classes of problems are available:
  • NLF0(ndim, fcn, init_fcn, constraint): basic nonlinear function, no derivative information available
  • NLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first derivative information available
  • FDNLF1(ndim, fcn, init_fcn, constraint): nonlinear function, first derivative information approximated
  • NLF2(ndim, fcn, init_fcn, constraint): nonlinear function, first and second derivative information available

Problem formulation: minimize the objective function f(x) over x in R^n,
  min_x f(x)   s.t.   h(x) = 0 (equality constraints),   g(x) ≥ 0 (inequality constraints),
with Lagrangian L(x, y, w) = f(x) - y^T h(x) - w^T g(x).

Page 12

Hypre

• Linear system interfaces
• Data layouts: structured, composite, block-structured, unstructured, CSR
• Linear solvers: GMG, ..., FAC, ..., Hybrid, ..., AMGe, ..., ILU, ...
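As an illustration of how an application reaches these solvers through one of the linear system interfaces, the sketch below assembles a 1-D Laplacian through hypre's IJ (general sparse) interface and solves it with BoomerAMG. It is not from the slides: the test problem, variable names and parameter values are our own, and the calls follow the present-day hypre C API (some recent versions additionally require HYPRE_Init/HYPRE_Finalize), which may differ in detail from the 2004 release.

#include <mpi.h>
#include "HYPRE.h"
#include "HYPRE_IJ_mv.h"
#include "HYPRE_parcsr_ls.h"

int main(int argc, char *argv[])
{
   int                 rank, np, n = 1000, i, ilower, iupper, local;
   HYPRE_IJMatrix      A;
   HYPRE_IJVector      b, x;
   HYPRE_ParCSRMatrix  parA;
   HYPRE_ParVector     parb, parx;
   HYPRE_Solver        amg;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &np);

   /* Each process owns a contiguous block of rows of a 1-D Laplacian */
   local  = n / np;
   ilower = rank * local;
   iupper = (rank == np - 1) ? n - 1 : ilower + local - 1;

   HYPRE_IJMatrixCreate(MPI_COMM_WORLD, ilower, iupper, ilower, iupper, &A);
   HYPRE_IJMatrixSetObjectType(A, HYPRE_PARCSR);
   HYPRE_IJMatrixInitialize(A);
   for (i = ilower; i <= iupper; i++) {
      int    cols[3], nnz = 0;
      double vals[3];
      if (i > 0)     { cols[nnz] = i - 1; vals[nnz++] = -1.0; }
      cols[nnz] = i;   vals[nnz++] =  2.0;
      if (i < n - 1) { cols[nnz] = i + 1; vals[nnz++] = -1.0; }
      HYPRE_IJMatrixSetValues(A, 1, &nnz, &i, cols, vals);
   }
   HYPRE_IJMatrixAssemble(A);
   HYPRE_IJMatrixGetObject(A, (void **) &parA);

   HYPRE_IJVectorCreate(MPI_COMM_WORLD, ilower, iupper, &b);
   HYPRE_IJVectorSetObjectType(b, HYPRE_PARCSR);
   HYPRE_IJVectorInitialize(b);
   HYPRE_IJVectorCreate(MPI_COMM_WORLD, ilower, iupper, &x);
   HYPRE_IJVectorSetObjectType(x, HYPRE_PARCSR);
   HYPRE_IJVectorInitialize(x);
   for (i = ilower; i <= iupper; i++) {
      double one = 1.0, zero = 0.0;
      HYPRE_IJVectorSetValues(b, 1, &i, &one);   /* right-hand side b = 1 */
      HYPRE_IJVectorSetValues(x, 1, &i, &zero);  /* initial guess x = 0   */
   }
   HYPRE_IJVectorAssemble(b);  HYPRE_IJVectorGetObject(b, (void **) &parb);
   HYPRE_IJVectorAssemble(x);  HYPRE_IJVectorGetObject(x, (void **) &parx);

   /* Algebraic multigrid (BoomerAMG) used directly as the solver */
   HYPRE_BoomerAMGCreate(&amg);
   HYPRE_BoomerAMGSetTol(amg, 1.0e-8);
   HYPRE_BoomerAMGSetMaxIter(amg, 50);
   HYPRE_BoomerAMGSetup(amg, parA, parb, parx);
   HYPRE_BoomerAMGSolve(amg, parA, parb, parx);

   HYPRE_BoomerAMGDestroy(amg);
   HYPRE_IJMatrixDestroy(A);
   HYPRE_IJVectorDestroy(b);
   HYPRE_IJVectorDestroy(x);
   MPI_Finalize();
   return 0;
}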

Page 13

SUNDIALS

SUite of Nonlinear and DIfferential/ALgebraic Solvers

Solvers:
• CVODE, ODE integrator: x' = f(t,x), x(t0) = x0
• IDA, DAE integrator: F(t,x,x') = 0, x(t0) = x0
• KINSOL, nonlinear solver: F(x) = 0

Shared modules: vector kernels, dense linear solver, band linear solver, preconditioned GMRES, general preconditioner.
User-supplied code: main routine, problem-defining function, preconditioner function.
Sensitivity analysis (forward and adjoint) is also available.

Page 14

ScaLAPACK
• Software hierarchy
• Data distribution
• Simple examples
• Application

Page 15

ScaLAPACK: software hierarchy

• ScaLAPACK (global): linear systems, least squares, singular value decomposition, eigenvalues.
• PBLAS (global): parallel BLAS.
• LAPACK and BLAS (local): clarity, modularity, performance and portability; ATLAS can be used here for automatic tuning.
• BLACS (local): communication routines targeting linear algebra operations.
• MPI/PVM/... (platform specific): communication layer (message passing).

http://acts.nersc.gov/scalapack

Page 16

ScaLAPACK: 2D Block-Cyclic Distribution

5x5 matrix partitioned in 2x2 blocks, mapped onto a 2x2 process grid (processes 0 1 / 2 3).

Global matrix:

a11 a12 a13 a14 a15
a21 a22 a23 a24 a25
a31 a32 a33 a34 a35
a41 a42 a43 a44 a45
a51 a52 a53 a54 a55

From the process grid point of view (each process stores its local blocks contiguously):

process 0: a11 a12 a15   process 1: a13 a14
           a21 a22 a25              a23 a24
           a51 a52 a55              a53 a54

process 2: a31 a32 a35   process 3: a33 a34
           a41 a42 a45              a43 a44

http://acts.nersc.gov/scalapack/hands-on/datadist.html
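As a small self-contained illustration of this mapping (our own helper routine, not part of ScaLAPACK; indices are 0-based, block sizes are mb x nb, the process grid is nprow x npcol, and the distribution starts at process (0,0)):

#include <stdio.h>

/* Process-grid coordinates that own global entry (i, j) under a
   2D block-cyclic distribution. */
static void block_cyclic_owner(int i, int j, int mb, int nb,
                               int nprow, int npcol,
                               int *prow, int *pcol)
{
    *prow = (i / mb) % nprow;
    *pcol = (j / nb) % npcol;
}

int main(void)
{
    int i, j, prow, pcol;

    /* 5x5 matrix, 2x2 blocks, 2x2 process grid, as in the slide above */
    for (i = 0; i < 5; i++)
        for (j = 0; j < 5; j++) {
            block_cyclic_owner(i, j, 2, 2, 2, 2, &prow, &pcol);
            printf("a%d%d -> process (%d,%d)\n", i + 1, j + 1, prow, pcol);
        }
    return 0;
}

For example, a55 (i = j = 4) falls in the third block row and column and therefore wraps back to process (0,0), which is why the fifth row and column appear in the top-left local block above.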

Page 17

2D Block-Cyclic Distribution

Global 5x5 matrix (as set in the code below):

  1.1  1.2  1.3  1.4  1.5
 -2.1  2.2  2.3  2.4  2.5
 -3.1 -3.2  3.3  3.4  3.5
 -4.1 -4.2 -4.3  4.4  4.5
 -5.1 -5.2 -5.3 -5.4 -5.5

CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
IF ( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
   A(1) = 1.1; A(2) = -2.1; A(3) = -5.1
   A(1+LDA) = 1.2; A(2+LDA) = 2.2; A(3+LDA) = -5.2
   A(1+2*LDA) = 1.5; A(2+2*LDA) = 2.5; A(3+2*LDA) = -5.5
ELSE IF ( MYROW.EQ.0 .AND. MYCOL.EQ.1 ) THEN
   A(1) = 1.3; A(2) = 2.3; A(3) = -5.3
   A(1+LDA) = 1.4; A(2+LDA) = 2.4; A(3+LDA) = -5.4
ELSE IF ( MYROW.EQ.1 .AND. MYCOL.EQ.0 ) THEN
   A(1) = -3.1; A(2) = -4.1
   A(1+LDA) = -3.2; A(2+LDA) = -4.2
   A(1+2*LDA) = 3.5; A(2+2*LDA) = 4.5
ELSE IF ( MYROW.EQ.1 .AND. MYCOL.EQ.1 ) THEN
   A(1) = 3.3; A(2) = -4.3
   A(1+LDA) = 3.4; A(2+LDA) = 4.4
END IF
CALL PDGESVD( JOBU, JOBVT, M, N, A, IA, JA, DESCA, S, U, IU, JU, DESCU, VT, IVT, JVT, DESCVT, WORK, LWORK, INFO )

Local blocks from the 2x2 process grid point of view (processes 0 1 / 2 3):

process 0: a11 a12 a15   process 1: a13 a14
           a21 a22 a25              a23 a24
           a51 a52 a55              a53 a54

process 2: a31 a32 a35   process 3: a33 a34
           a41 a42 a45              a43 a44

LDA is the leading dimension of the local array; DESCA is the array descriptor for A (it contains information about A).
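For reference, the descriptor for a dense ScaLAPACK matrix is an integer array with nine entries; the sketch below shows the conventional layout that DESCINIT fills in (the variable names ictxt, m, n, mb, nb and lld are placeholders). Note that the narrow-band/tridiagonal descriptors of type 501/502, used in the PDPTTRF/PDPTTRS example a few slides ahead, have a shorter, different layout.

/* Dense-matrix array descriptor (descriptor type 1):
   desc[0] = 1       descriptor type (dense)
   desc[1] = ictxt   BLACS context
   desc[2] = m       number of global rows
   desc[3] = n       number of global columns
   desc[4] = mb      row block size
   desc[5] = nb      column block size
   desc[6] = 0       process row owning the first row (RSRC)
   desc[7] = 0       process column owning the first column (CSRC)
   desc[8] = lld     leading dimension of the local array            */
int desca[9] = { 1, ictxt, m, n, mb, nb, 0, 0, lld };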

Page 18

On-line tutorial: http://acts.nersc.gov/scalapack/hands-on/main.html

Page 19

      PROGRAM PSGESVDRIVER
*
*     Example Program solving Ax=b via ScaLAPACK routine PSGESV
*
*     .. Parameters ..
*     ..
*     .. Local Scalars ..
*     ..
*     .. Local Arrays ..
*     ..
*     .. Executable Statements ..
*
*     INITIALIZE THE PROCESS GRID
*
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
*
*     If I'm not in the process grid, go to the end of the program
*
      IF( MYROW.EQ.-1 )
     $   GO TO 10
*
*     DISTRIBUTE THE MATRIX ON THE PROCESS GRID
*     Initialize the array descriptors for the matrices A and B
*
      CALL DESCINIT( DESCA, M, N, MB, NB, RSRC, CSRC, ICTXT, MXLLDA,
     $               INFO )
      CALL DESCINIT( DESCB, N, NRHS, NB, NBRHS, RSRC, CSRC, ICTXT,
     $               MXLLDB, INFO )
*
*     Generate matrices A and B and distribute to the process grid
*
      CALL MATINIT( A, DESCA, B, DESCB )
*
*     Make a copy of A and B for checking purposes
*
      CALL PSLACPY( 'All', N, N, A, 1, 1, DESCA, A0, 1, 1, DESCA )
      CALL PSLACPY( 'All', N, NRHS, B, 1, 1, DESCB, B0, 1, 1, DESCB )
*
*     CALL THE SCALAPACK ROUTINE
*     Solve the linear system A * X = B
*
      CALL PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB,
     $             INFO )

Page 20

/*********************************************************************/
/* This program illustrates the use of the ScaLAPACK routines        */
/* PDPTTRF and PDPTTRS to factor and solve a symmetric positive      */
/* definite tridiagonal system of linear equations, i.e., T*x = b,   */
/* with different data in two distinct contexts.                     */
/*********************************************************************/

/* a bunch of things omitted for the sake of space */

main()
{
   /* Start BLACS */
   Cblacs_pinfo( &mype, &npe );
   Cblacs_get( 0, 0, &context );
   Cblacs_gridinit( &context, "R", 1, npe );
   /* Processes 0 and 2 contain d(1:4) and e(1:4) */
   /* Processes 1 and 3 contain d(5:8) and e(5:8) */
   if ( mype == 0 || mype == 2 ){
      d[0]=1.8180; d[1]=1.6602; d[2]=1.3420; d[3]=1.2897;
      e[0]=0.8385; e[1]=0.5681; e[2]=0.3704; e[3]=0.7027;
   }
   else if ( mype == 1 || mype == 3 ){
      d[0]=1.3412; d[1]=1.5341; d[2]=1.7271; d[3]=1.3093;
      e[0]=0.5466; e[1]=0.4449; e[2]=0.6946; e[3]=0.0000;
   }
   if ( mype == 0 || mype == 1 ) {
      /* New context for processes 0 and 1 */
      map[0]=0; map[1]=1;
      Cblacs_get( context, 10, &context_1 );
      Cblacs_gridmap( &context_1, map, 1, 1, 2 );
      /* Right-hand side is set to b = [ 1 2 3 4 5 6 7 8 ] */
      if ( mype == 0 ) {
         b[0]=1.0; b[1]=2.0; b[2]=3.0; b[3]=4.0;
      } else if ( mype == 1 ) {
         b[0]=5.0; b[1]=6.0; b[2]=7.0; b[3]=8.0;
      }
      /* Array descriptor for A (D and E) */
      desca[0]=501; desca[1]=context_1; desca[2]=n; desca[3]=nb;
      desca[4]=0; desca[5]=lda; desca[6]=0;
      /* Array descriptor for B */
      descb[0]=502; descb[1]=context_1; descb[2]=n; descb[3]=nb;
      descb[4]=0; descb[5]=ldb; descb[6]=0;
      /* Factorization */
      pdpttrf( &n, d, e, &ja, desca, af, &laf, work, &lwork, &info );
      /* Solution */
      pdpttrs( &n, &nrhs, d, e, &ja, desca, b, &ib, descb, af, &laf,
               work, &lwork, &info );
      printf( "MYPE=%i: x[:] = %7.4f %7.4f %7.4f %7.4f\n",
              mype, b[0], b[1], b[2], b[3] );
   }
   else {
      /* New context for processes 2 and 3 */
      map[0]=2; map[1]=3;
      Cblacs_get( context, 10, &context_2 );
      Cblacs_gridmap( &context_2, map, 1, 1, 2 );
      /* Right-hand side is set to b = [ 8 7 6 5 4 3 2 1 ] */
      if ( mype == 2 ) {
         b[0]=8.0; b[1]=7.0; b[2]=6.0; b[3]=5.0;
      } else if ( mype == 3 ) {
         b[0]=4.0; b[1]=3.0; b[2]=2.0; b[3]=1.0;
      }
      /* Array descriptor for A (D and E) */
      desca[0]=501; desca[1]=context_2; desca[2]=n; desca[3]=nb;
      desca[4]=0; desca[5]=lda; desca[6]=0;
      /* Array descriptor for B */
      descb[0]=502; descb[1]=context_2; descb[2]=n; descb[3]=nb;
      descb[4]=0; descb[5]=ldb; descb[6]=0;
      /* Factorization */
      pdpttrf( &n, d, e, &ja, desca, af, &laf, work, &lwork, &info );
      /* Solution */
      pdpttrs( &n, &nrhs, d, e, &ja, desca, b, &ib, descb, af, &laf,
               work, &lwork, &info );
      printf( "MYPE=%i: x[:] = %7.4f %7.4f %7.4f %7.4f\n",
              mype, b[0], b[1], b[2], b[3] );
   }
   Cblacs_gridexit( context );
   Cblacs_exit( 0 );
}

Using Matlab notation:

T = diag(D) + diag(E,-1) + diag(E,1)

where

D = [ 1.8180 1.6602 1.3420 1.2897 1.3412 1.5341 1.7271 1.3093 ]
E = [ 0.8385 0.5681 0.3704 0.7027 0.5466 0.4449 0.6946 ]

Then, solving T*x = b:

if b = [ 1 2 3 4 5 6 7 8 ]
   x = [ 0.3002 0.5417 1.4942 1.8546 1.5008 3.0806 1.0197 5.5692 ]
if b = [ 8 7 6 5 4 3 2 1 ]
   x = [ 3.9036 1.0772 3.4122 2.1837 1.3090 1.2988 0.6563 0.4156 ]
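As a quick arithmetic check of the first case (ours, not on the slide): the first row of T*x is d(1)*x(1) + e(1)*x(2) = 1.8180*0.3002 + 0.8385*0.5417 ≈ 0.546 + 0.454 = 1.000, which reproduces b(1) = 1.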

Page 21

Application: Cosmic Microwave Background (CMB) Analysis

• The statistics of the tiny variations in the CMB (the faint echo of the Big Bang) allow the determination of the fundamental parameters of cosmology to the percent level or better.

• MADCAP (Microwave Anisotropy Dataset Computational Analysis Package)

• Makes maps from observations of the CMB and then calculates their angular power spectra. (See http://crd.lbl.gov/~borrill).

• Calculations are dominated by the solution of linear systems of the form M = A^-1 B for dense n x n matrices A and B, scaling as O(n^3) in flops. MADCAP uses ScaLAPACK for those calculations.

• On the NERSC Cray T3E (original code):
  • Cholesky factorization and triangular solve.
  • Typically reached 70-80% of peak performance.
  • Solution of systems with n ~ 10^4 using tens of processors.
  • The results demonstrated that the Universe is spatially flat, comprising 70% dark energy, 25% dark matter, and only 5% ordinary matter.

• On the NERSC IBM SP:
  • Porting was trivial, but tests showed only 20-30% of peak performance.
  • The code was rewritten to use triangular matrix inversion and triangular matrix multiplication (about one day of work).
  • Performance increased to 50-60% of peak.
  • Solution of previously intractable systems with n ~ 10^5 using hundreds of processors.

The international BOOMERanG collaboration announced results of the most detailed measurement of the cosmic microwave background radiation (CMB), which strongly indicated that the universe is flat (Apr. 27, 2000).
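For a sense of scale (a rough estimate, not from the slides): a dense Cholesky factorization costs about n^3/3 floating-point operations, so a single factorization at n ~ 10^5 is roughly 3x10^14 flops, which is why such systems only became tractable with hundreds of processors and the more efficient triangular kernels described above.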

Page 22

SuperLU
• Functionalities
• Simple examples
• Application

Page 23

SuperLU

• Solve general sparse linear system A x = b.

• Algorithm: Gaussian elimination (factorization A = LU), followed by lower/upper triangular solutions (sketched in equation form below).

• Efficient and portable implementation for high-performance architectures, flexible interface.
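In equation form, this is the generic sparse direct-solve sequence (a sketch; SuperLU additionally applies row/column permutations and scalings chosen during factorization):

   A = L U,   then solve   L y = b   (forward substitution)   and   U x = y   (back substitution).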

              SuperLU            SuperLU_MT                    SuperLU_DIST
Platform      Serial             SMP                           Distributed
Language      C                  C and Pthreads (or pragmas)   C and MPI
Data type     Real/complex,      Real, double                  Real/complex,
              single/double                                    double

Page 24

SuperLU: Example Pddrive.c (1/2)

#include "superlu_ddefs.h"

main(int argc, char *argv[])
{
    superlu_options_t options;
    SuperLUStat_t stat;
    SuperMatrix A;
    ScalePermstruct_t ScalePermstruct;
    LUstruct_t LUstruct;
    SOLVEstruct_t SOLVEstruct;
    gridinfo_t grid;
    · · · · · ·
    /* Initialize MPI environment */
    MPI_Init( &argc, &argv );
    · · · · · ·
    /* Initialize the SuperLU process grid */
    nprow = npcol = 2;
    superlu_gridinit(MPI_COMM_WORLD, nprow, npcol, &grid);

    /* Read matrix A from file, distribute it, and set up the right-hand side */
    dcreate_matrix(&A, nrhs, &b, &ldb, &xtrue, &ldx, fp, &grid);

    /* Set the options for the solver. Defaults are:
         options.Fact              = DOFACT;
         options.Equil             = YES;
         options.ColPerm           = MMD_AT_PLUS_A;
         options.RowPerm           = LargeDiag;
         options.ReplaceTinyPivot  = YES;
         options.Trans             = NOTRANS;
         options.IterRefine        = DOUBLE;
         options.SolveInitialized  = NO;
         options.RefineInitialized = NO;
         options.PrintStat         = YES;           */
    set_default_options_dist(&options);

Page 25

SuperLU: Example Pddrive.c (2/2)

    /* Initialize ScalePermstruct and LUstruct. */
    ScalePermstructInit(m, n, &ScalePermstruct);
    LUstructInit(m, n, &LUstruct);

    /* Initialize the statistics variables. */
    PStatInit(&stat);

    /* Call the linear equation solver. */
    pdgssvx(&options, &A, &ScalePermstruct, b, ldb, nrhs, &grid,
            &LUstruct, &SOLVEstruct, berr, &stat, &info);

/* Print the statistics. */

PStatPrint(&options, &stat, &grid);

/* Deallocate storage */

PStatFree(&stat);

Destroy_LU(n, &grid, &LUstruct); LUstructFree(&LUstruct);

/* Release the SuperLU process grid */

superlu_gridexit(&grid);

/* Terminate the MPI execution environment */

MPI_Finalize();

}

Page 26

Application: Accelerator Cavity Design (1/2)

• Calculate cavity mode frequencies and field vectors
• Solve the Maxwell equations for the electromagnetic field
• Omega3P simulation code developed at SLAC
• Finite element methods lead to a large sparse generalized eigensystem K x = λ M x
• Real symmetric for lossless cavities; complex symmetric when the cavities are lossy
• Seek interior eigenvalues (tightly clustered) that are relatively small in magnitude

Omega3P model of a 47-cell section of the 206-cell Next Linear Collider accelerator structure.

Page 27

Application: Accelerator Cavity Design (2/2)

[Chart: dds47 performance (16 eigenvalues): total eigensolver time in seconds vs. number of processors (16, 32, 48, 64) for the Minimum Degree, METIS, Filtering, METIS+Shared and Minimum Degree Shared variants.]

• Speed up Lanczos convergence by shift-invert: seek the largest, well separated eigenvalues of the transformed system M (K - σM)^-1 x = μ M x, with μ = 1 / (λ - σ).
• The Filtering algorithm [Y. Sun]: inexact shift-invert Lanczos + JOCC.
• Exact shift-invert Lanczos (ESIL): PARPACK + SuperLU_DIST for the shifted linear system (no pivoting, no iterative refinement).

dds47 (linear elements), total eigensolver time: N = 1.3x10^6, NNZ = 20x10^6.

Page 28

TAU
• Functionalities
• Simple examples
• Application

Page 29

TAU: Functionalities

• Profiling of Java, C++, C, and Fortran codes

• Detailed information (much more than prof/gprof)

• Profiles for each unique template instantiation

• Time spent exclusively and inclusively in each function

• Start/Stop timers

• Profiling data maintained for each thread, context, and node

• Parallel IO Statistics for the number of calls for each profiled function

• Profiling groups for organizing and controlling instrumentation

• Support for using CPU hardware counters (PAPI)

• Graphic display for parallel profiling data

• Graphical display of profiling results (built-in viewers, interface to Vampir)

Page 30

TAU: Example 1 (1/4)

[Screenshot: index of http://acts.nersc.gov/tau/programs/psgesv and the TAU options currently installed, with the option used in the example highlighted.]

Page 31

TAU: Example 1 (2/4)

PROGRAM PSGESVDRIVER
!
! Example Program solving Ax=b via ScaLAPACK routine PSGESV
!
! .. Parameters ..
!**** a bunch of things omitted for the sake of space ****
! .. Executable Statements ..
!
! INITIALIZE THE PROCESS GRID
!
      integer profiler(2)
      save profiler
      call TAU_PROFILE_INIT()
      call TAU_PROFILE_TIMER(profiler, 'PSGESVDRIVER')
      call TAU_PROFILE_START(profiler)
      CALL SL_INIT( ICTXT, NPROW, NPCOL )
      CALL BLACS_GRIDINFO( ICTXT, NPROW, NPCOL, MYROW, MYCOL )
!**** a bunch of things omitted for the sake of space ****
      CALL PSGESV( N, NRHS, A, IA, JA, DESCA, IPIV, B, IB, JB, DESCB, &
                   INFO )
!**** a bunch of things omitted for the sake of space ****
      call TAU_PROFILE_STOP(profiler)
      STOP
      END

psgesvdriver.int.f90

NB: the ScaLAPACK routines have not been instrumented and therefore are not shown in the charts.

Page 32

TAU: Example 1 (3/4)

Page 33

TAU: Example 1 (4/4)

Page 34

TAU: Example 2 (1/2)

Index of http://acts.nersc.gov/tau/programs/pdgssvx

Makefile: compilation rule

include $(TAUROOTDIR)/rs6000/lib/Makefile.tau-mpi-papi-pdt-profile-trace

OBJS     = pddrive.o dcreate_matrix.o pdgssvx.o
TARGET   = pddrive.x
COMPRULE = $(PDTCPARSE) $< $(COMPFLAGS) $(CDEFS) $(INCLUDEDIR); \
           $(TAUINSTR) $*.pdb $< -o $*.inst.c -g "COMPUTATION" ; \
           $(CC) $(COMPFLAGS) $(CDEFS) $(BLASDEF) $(INCLUDEDIR) -c $*.inst.c -o $@ ; \
           rm -f $*.pdb ;

$(TARGET): $(OBJS)
	$(LINKER) $(LNKFLAGS) $(OBJS) $(LIBS) $(TAULIBS) -lm -o $@

# Compilation rule
.c.o:
	$(COMPRULE)

Page 35

TAU: Example 2 (2/2)

PAPI provides access to hardware performance counters (see http://icl.cs.utk.edu/papi for details and contact [email protected] for the corresponding TAU events). In this example we are just measuring FLOPS.

PARAPROF

Page 36

Case Study: Electronic Structure Calculations Code

Page 37

Additional Information

Page 38

http://acts.nersc.gov

• Tool descriptions, installation details, examples, etc.
• Agenda, accomplishments, conferences, releases, etc.
• Goals and other relevant information
• Points of contact
• Search engine

• High performance tools:
  • portable
  • library calls
  • robust algorithms
  • help code optimization

• For scientific computing centers (like PSC, NERSC):
  • Reduce users' code development time, which translates into more production runs and faster, more effective scientific research results
  • Overall better system utilization
  • Facilitate the accumulation and distribution of high performance computing expertise
  • Provide better scientific parameters for procurement and characterization of specific user needs

See also: http://acts.nersc.gov/documents

Page 39

References
• An expanded Framework for the Advanced Computational Testing and Simulation Toolkit, http://acts.nersc.gov/documents/Proposal.pdf
• The Advanced Computational Testing and Simulation (ACTS) Toolkit. Technical Report LBNL-50414.
• A First Prototype of PyACTS. Technical Report LBNL-53849.
• ACTS - A Collection of High Performing Tools for Scientific Computing. Technical Report LBNL-53897.
• The ACTS Collection: Robust and high-performance tools for scientific computing. Guidelines for tool inclusion and retirement. Technical Report LBNL/PUB-3175.
• An Infrastructure for the creation of High End Scientific and Engineering Software Tools and Applications. Technical Report LBNL/PUB-3176.
• To appear: two journals featuring ACTS tools.

Tutorials and Workshops
• How Can ACTS Work for You?, http://acts.nersc.gov/events/Workshop2000
• Solving Problems in Science and Engineering, http://acts.nersc.gov/events/Workshop2001
• Robust and High Performance Tools for Scientific Computing, http://acts.nersc.gov/events/Workshop2002
• Robust and High Performance Tools for Scientific Computing, http://acts.nersc.gov/events/Workshop2003
• The ACTS Collection: Robust and High Performance Libraries for Computational Sciences, SIAM PP04, http://www.siam.org/meetings/pp04
• Enabling Technologies for High End Computer Simulations, http://acts.nersc.gov/events/Workshop2004

Progress Reports
• FY 2003-1, http://acts.nersc.gov/documents/Report2003-1.pdf
• FY 2003-2, http://acts.nersc.gov/documents/Report2003-2.pdf
• FY 2004-1, http://acts.nersc.gov/documents/Report2004-1.pdf
• FY 2004-2, http://acts.nersc.gov/documents/Report2004-2.pdf

[email protected]
http://acts.nersc.gov

Page 40

Matrix of Applications and Tools

... More Applications …

http://acts.nersc.gov/AppMat

Enabling sciences and discoveries... with high performance and scalability...