Introduction to scientific computing using PETSc and Trilinos

Václav Hapla, David Horák, Michal Merta

PRACE Spring School, Cracow 2012

Should we reinvent the wheel?

many complex but well-known and often-used algorithms (LU, CG, matrix-vector multiply, …) have already been implemented and tested, and are ready to use!

a software framework is software providing generic functionality that can be selectively changed by user code, thus producing application-specific software (wikipedia.org)

motivation: programmers should consider focusing on new, original algorithms that add real value

Frameworks for scientific computing – why?

are parallelized on the data level (vectors & matrices) using MPI

use BLAS and LAPACK – de facto standard for dense LA

have their own implementation of sparse BLAS

include robust preconditioners, linear solvers (direct and iterative) and nonlinear solvers

can cooperate with many other external solvers and libraries (e.g. MATLAB, MUMPS, UMFPACK, …)

already support CUDA and hybrid parallelization

are licensed as open-source

Both PETSc and Trilinos…

PETSc

"essential object orientation"

for programmers used to procedural programming but seeking modular code

recommended for C and Fortran users

Trilinos

"pure object orientation"

for programmers who are not afraid of OOP, appreciate good SW design and have some experience with C++

extensibility and reusability

Potential users

PETSc Library Václav Hapla

David Horák

PETSc project

PETSc programming primitives

Objects in PETSc

Vectors, index sets and matrices in PETSc

Linear solvers

Debugging & profiling

Outline of PETSc tutorial

PETSc project Václav Hapla

PETSc = Portable, Extensible Toolkit for Scientific computation

developed by Argonne National Laboratory since 1991

data structures and routines for the scalable parallel solution of scientific applications modeled by PDEs

coded primarily in C, but with good Fortran support; can also be called from C++ and Python code

homepage: www.mcs.anl.gov/petsc

current stable version is 3.2

PETSc project (1)

petsc-dev (development branch) is evolving intensively

code and mailing lists open to anybody

portable to any parallel system supporting MPI

tightly coupled systems (Cray XT5, BG/P, Earth Simulator, Sun Blade, SGI Altix)

loosely coupled systems, such as networks of workstations (Linux, Windows, IBM, Mac, Sun)

iPhone support

PETSc project (2)

Developing parallel, nontrivial PDE solvers that deliver high performance is still difficult and requires months (or even years) of concentrated effort. PETSc is a toolkit that can ease these difficulties and reduce the development time, but it is not a black-box PDE solver, nor a silver bullet.

Barry Smith

(PETSc Team)

Role of PETSc

"We will continually add new features and enhanced functionality in upcoming releases; small changes in usage and calling sequences of PETSc routines will continue to occur. Although keeping one's code accordingly up-to-date can be annoying, all PETSc users will be rewarded in the long run with a cleaner, better designed, and easier-to-use interface."

Changes

Documentation

all documentation is available at http://www.mcs.anl.gov/petsc/documentation/index.html

PETSc users manual – PDF (fully searchable, hypertext)

help topics – general topics such as "error handling", "multigrid", "shared memory"

manual pages – individual routines, split into 4 categories: Beginner - basic usage

Intermediate - setting options for algorithms and data structures

Advanced - setting more advanced options and customization

Developer - interfaces intended primarily for library developers

PETSc is layered on top of MPI

MPI provides low-level tools to exchange data primitives between processes

PETSc provides medium-level tools such as:

inserting a matrix element at an arbitrary location

parallel matrix-vector product

you do not need to know much about MPI

but you can call arbitrary MPI routine directly if needed

same code for sequential and parallel runs

Parallelism in PETSc

PETSc cooperates with... (1)

Python: petsc4py

Documentation utilities: Sowing, lgrind, c2html

MPI: MPICH, MPE, Open MPI

Dense LA: BLAS, LAPACK, BLACS, ScaLAPACK, PLAPACK

Graphs & load balancing: ParMetis, Chaco, Jostle, Party, Scotch, Zoltan

Direct linear solvers: MUMPS, Spooles, SuperLU, SuperLU_Dist, UMFPack

PETSc cooperates with... (2)

Iterative linear solvers: PaStiX, HYPRE

Multigrid: Trilinos ML

Eigenvalue solvers: BLOPEX

FFT: FFTW

Time-stepping: Sundials

Meshing: Triangle, TetGen, FIAT, FFC, Generator

Data exchange: HDF5

Boost

TAO - Toolkit for Advanced Optimization

SLEPc - Scalable Library for Eigenvalue Problems

fluidity - a finite element/volume fluids code

Prometheus - scalable unstructured finite element solver

freeCFD - general purpose CFD solver

OpenFVM - finite volume based CFD solver

OOFEM - object oriented finite element library

libMesh - adaptive finite element library

Packages that use/extend PETSc (1)

MOOSE - Multiphysics Object-Oriented Simulation Environment developed at INL, built on top of libMesh, which is in turn built on top of PETSc

DEAL.II - sophisticated C++ based finite element simulation package

PHAML - The Parallel Hierarchical Adaptive MultiLevel Project

Chaste - Cancer, Heart and Soft Tissue Environment

Packages that use/extend PETSc (2)

PETSc has been used for modeling in all of these areas: Acoustics, Aerodynamics, Air Pollution, Arterial Flow, Bone Fractures, Brain Surgery, Cancer Surgery, Cancer Treatment, Carbon Sequestration, Cardiology, Cells, CFD, Combustion, Concrete, Corrosion, Data Mining, Dentistry, Earthquakes...

Applications (1)

Applications (2)

Fracture mechanics

Mechanics – elasticity

Real-time surgery

Magma dynamics

PETSc installation in a nutshell

Václav Hapla

stable releases of PETSc can be downloaded via HTTP as a tarball

petsc-3.2-p7.tar.gz - full distribution (including all current patches) with documentation

petsc-lite-3.2-p7.tar.gz - smaller version with no documentation (all documentation may be accessed online)

Download - tarball

stable releases as well as current development release can be downloaded using Mercurial versioning system

caution – build system has its own separate repository!

stable:
hg clone http://petsc.cs.iit.edu/petsc/releases/petsc-3.2
hg clone http://petsc.cs.iit.edu/petsc/releases/BuildSystem-3.2 petsc-3.2/config/BuildSystem

dev:
hg clone http://petsc.cs.iit.edu/petsc/petsc-dev
hg clone http://petsc.cs.iit.edu/petsc/BuildSystem petsc-dev/config/BuildSystem

Download - Mercurial

./configure script written in Python

realizes PETSc auto-tuning capabilities

sets many internal variables and macros depending on the machine

generates makefile

--help – prints all options

see www.mcs.anl.gov/petsc/documentation/installation.html

Configuration

PETSC_DIR and PETSC_ARCH are variables that control the configuration and build process of PETSc.

These variables can be set as environment variables or specified on the command line.

PETSC_DIR points to the location of the PETSc installation that is used.

Multiple PETSc versions can coexist on the same file-system. By changing PETSC_DIR value, one can switch between these installed versions of PETSc.

PETSC_DIR

PETSC_ARCH variable gives a name to a configuration and build.

configure uses this value to store the generated makefiles in ${PETSC_DIR}/${PETSC_ARCH}/conf.

make uses this value to determine the location of the build

program libraries (.a or .so) of PETSc and downloaded external packages are stored in ${PETSC_DIR}/${PETSC_ARCH}/lib

Thus one can install multiple variants of the PETSc libraries by providing a different PETSC_ARCH value for each configure/build.

One can then switch between these variants by changing the PETSC_ARCH value used.

PETSC_ARCH

PETSc supports tens of external packages

[pkg] = mumps, superlu, parmetis, sprng, netcdf, ...

download and compile automatically:

--download-[pkg] - downloads and installs the package for you into ${PETSC_DIR}/${PETSC_ARCH}

use existing installation

--with-[pkg]=<bool> - test for [pkg]

--with-[pkg]-dir=<dir> the root directory of the [pkg] installation

--with-[pkg]-include=<dirs>

--with-[pkg]-lib=<libraries: e.g.[/Users/..../libboost.a,...]>

External packages

./configure --with-batch

for machines with a batch system

configure generates a special executable binary called conftest

run conftest on one compute node (e.g. submit it via the batch script)

it will generate a new ./reconfigure-$PETSC_ARCH script with machine specific variables set (cache size etc.)

run the generated ./reconfigure-$PETSC_ARCH script to complete the configuration stage

Batch mode

after the configuration stage completes successfully, you get a message like this:

Configure stage complete. Now build PETSc libraries with (cmake build):
make PETSC_DIR=/home/vhapla/devel/petsc-dev PETSC_ARCH=debug-so-mpich2-gnu all

you can copy and paste the make command

it will compile the source files and build the program library

it can make use of CMake if installed

significant speedup of compilation

shows progress percentage

Compilation

PETSc programming primitives Václav Hapla

#include "petsc.h"

#undef __FUNCT__

#define __FUNCT__ "main"

int main(int argc,char **argv)

Declare the name of each routine by redefining the __FUNCT__ macro to get more useful tracebacks on error

Program header in C

program init

implicit none

#include "finclude/petsc.h"

Fortran has more limited error handling; one cannot use the __FUNCT__ macro

If you are familiar with C, please use C.

We will focus on PETSc C interface.

Program header in F

You can include all PETSc headers at once by #include "petsc.h" //includes all PETSc headers

Or you can include specific headers #include "petscsys.h" //framework routines

#include "petscvec.h" //vectors

#include "petscmat.h" //matrices

Higher level headers include all lower level headers needed

#include "petscksp.h" //includes vec,mat,dm,pc

What headers to include?

Initialize & Finalize (1)

static char help[] = "Empty program.\n\n";

#include <petscsys.h>

int main(int argc,char **argv)

{

PetscErrorCode ierr;

ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr);

ierr = PetscFinalize();CHKERRQ(ierr);

return 0;

}

Every PETSc program begins with the call to PetscInitialize()

ends with the call to PetscFinalize()

they call MPI_Init(), MPI_Finalize()

Initialize & Finalize (2)

static char help[] = "Empty program.\n\n";

#include <petscsys.h>

int main(int argc,char **argv)

{

PetscErrorCode ierr;

ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr);

ierr = PetscFinalize();CHKERRQ(ierr);

return 0;

}

argc,argv - propagate command line arguments to PETSc and MPI

help - additional help messages to print when the executable is invoked with the cmd-line-arg -help (will be discussed later)

PETSc is written in C

C has no support for C++ exceptions

instead of throwing an exception, every routine returns an integer error code (PetscErrorCode type)

the error code is "caught" by the CHKERRQ(ierr) macro

PetscErrorCode ierr;

ierr = SomePetscRoutine();CHKERRQ(ierr);

Error handling (1)

#include <petscsys.h>

int main(int argc,char **argv)

{

PetscErrorCode ierr;

ierr = PetscFinalize(); CHKERRQ(ierr);

return 0;

}

This code produces the following error: PetscInitialize() must be called before PetscFinalize()

(+ stacktrace)

Error handling (2)

Communicators

communicator = an opaque object of MPI_Comm type that defines process group and synchronization channel

PETSc built-in communicators: PETSC_COMM_SELF – just this process – for serial objects

PETSC_COMM_WORLD – all processes – for parallel objects

MPI can split communicators, spawn processes on new communicators – PETSc does not deal with it

Function Collectiveness

1. Not Collective – no communication nor synchronization VecGetLocalSize(), MatSetValues()

2. Logically Collective – checked when running in debug mode KSPSetType(), PCMGSetCycleType()

3. Neighbor-wise Collective – point-to-point communication between two processes VecScatterBegin(), MatMult()

4. Collective – global communication, synchronous VecNorm(), MatAssemblyBegin(), KSPCreate()

PETSc provides many useful utilities

prefixed by Petsc

parallel flow control: Barrier, SequentialPhaseBegin/End

memory management and checking: Malloc,Free,MallocValidate,MallocDump

Utility routines (1)

logging: PetscLogEventRegister/Begin/End

string handling: Strcat/cmp/cpy/len/tolower/replace/ToArray

MATLAB engine interface: MatlabEngineCreate/Destroy/Evaluate

and many more

Utility routines (2)

PetscInt n = 20;

PetscScalar v = -3.5, w = 3.1e9;

PetscReal x = 2.55, y = 1e-9;

PETSc has its own typedefs for numeric data types

It is better to use them instead of built-in C types

Better portability and easier switching between

real and complex numbers

32-bit and 64-bit numbers

Primitive datatypes

PETSc provides routines for managing the options database

in your program, you can call routines PetscOptionsGetInt, PetscOptionsGetString, PetscOptionsGetReal, etc. to obtain the values

Options (1)

Example:

on the command line: ./yourapp -myint 10 -myreal 1e3

in the program yourapp:

PetscReal myreal; PetscInt myint;
PetscOptionsGetInt(PETSC_NULL,"-myint",&myint,PETSC_NULL);
PetscOptionsGetReal(PETSC_NULL,"-myreal",&myreal,PETSC_NULL);

Options (2)

-help command-line argument prints essential info about the PETSc-based program:

program description (the last argument of PetscInitialize())

options specific for the program

general built-in options

built-in options relevant for the program

PETSc version

command-line help

trainee@pss2012vm:~/petsc-tutorial$ ./ex2 -help

Solves a linear system in parallel with KSP.

Input parameters include:

-random_exact_sol : use a random exact solution vector

-view_exact_sol : write exact solution vector to stdout

-m <mesh_x> : number of mesh points in x-direction

-n <mesh_n> : number of mesh points in y-direction

-----------------------------------------------------------

Petsc Release Version 3.2.0, Patch 7, Thu Mar 15 09:30:51 CDT 2012

...

-----------------------------------------------------------

Options for all PETSc programs:

-help: prints help method for each option

-on_error_abort: cause an abort when an error is detected. Useful

only when run in the debugger

...

command-line help - example

command line

filename in the third argument of PetscInitialize()

~/.petscrc

$PWD/.petscrc

$PWD/petscrc

PetscOptionsInsertFile()

PetscOptionsInsertString()

PETSC_OPTIONS environment variable

command line option -options_file [file]

Ways to set options

C: PetscErrorCode PetscPrintf(MPI_Comm, const char format[], ...)

prints to standard output

only from the first processor in the communicator comm

F: PetscPrintf(MPI_Comm, character(*), PetscErrorCode)

limited support in FORTRAN

only single character string can be passed

Print to standard output

static char help[] = "Hello world program.\n\n";

#include <petscsys.h>

int main(int argc,char **argv)

{

PetscErrorCode ierr;

PetscMPIInt rank;

PetscInitialize(&argc,&argv,(char *)0,help);

MPI_Comm_rank(PETSC_COMM_WORLD,&rank);

PetscPrintf(PETSC_COMM_SELF,"Hello World from %d\n",rank);

PetscFinalize();

return 0;

}

PETSc Hello world in C

program main

integer ierr, rank

#include "include/finclude/petsc.h"

call PetscInitialize(PETSC_NULL_CHARACTER, ierr)

call MPI_Comm_rank(PETSC_COMM_WORLD, rank, ierr)

if (rank .eq. 0) then

print *, 'Hello World from ', rank

endif

call PetscFinalize(ierr)

end

PETSc Hello world in F

static char help[] = "Hello world program.\n\n";

#include <petscsys.h>

int main(int argc,char **argv)

{

PetscErrorCode ierr;

PetscMPIInt rank;

ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr);

ierr = MPI_Comm_rank(PETSC_COMM_WORLD,&rank);CHKERRQ(ierr);

ierr = PetscPrintf(PETSC_COMM_SELF,"Hello World from %d\n",rank);CHKERRQ(ierr);

ierr = PetscFinalize();

return 0;

}

PETSc Hello world in C - with error checking

To obtain output of the first processor followed by that of the second, etc., one can call:

PetscSynchronizedPrintf(PETSC_COMM_WORLD,"Hello World from %d\n",rank);

PetscSynchronizedFlush(PETSC_COMM_WORLD);

Output: Hello World from 0

Hello World from 1

Hello World from 2

Synchronized print

Objects in PETSc Václav Hapla

Hierarchy of components (by level of abstraction):

user Application

Nonlinear solvers (SNES), Time Steppers (TS)

Linear solvers (KSP), Preconditioners (PC)

Matrices (Mat), Vectors (Vec), Index Sets (IS)

MPI, BLAS, LAPACK

(the layers between the user application and MPI/BLAS/LAPACK constitute PETSc)

every object in PETSc belongs to some communicator

MPI_Comm is the first argument of every object's constructor

two objects can only interact if they belong to the same communicator

Objects and communicators
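For example, the communicator passed to the constructor determines whether an object is parallel or serial. A minimal sketch (not from the original slides):

static char help[] = "Communicator demo.\n";
#include <petscvec.h>

int main(int argc,char **argv)
{
  Vec            vpar, vseq;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc,&argv,(char *)0,help);CHKERRQ(ierr);
  ierr = VecCreate(PETSC_COMM_WORLD,&vpar);CHKERRQ(ierr);  /* parallel: shared by all processes */
  ierr = VecCreate(PETSC_COMM_SELF,&vseq);CHKERRQ(ierr);   /* serial: private to each process   */
  /* vpar and vseq live on different communicators, so they must not be mixed,
     e.g. VecAXPY(vpar,1.0,vseq) would be an error */
  ierr = VecDestroy(&vpar);CHKERRQ(ierr);
  ierr = VecDestroy(&vseq);CHKERRQ(ierr);
  ierr = PetscFinalize();CHKERRQ(ierr);
  return 0;
}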

PETSc uses specific and limited inheritance

every object in PETSc is an instance of a class: Vec, Mat, KSP, SNES, …

functions called on objects (= methods in C++) are prefixed by a class name: MatMult(Mat,…)

class is specified when the object is created using proper Create function (= constructor in C++): Mat A;

MatCreate(comm, &A);

PETSc object oriented design: classes

PETSc object oriented design: types

classes are further subdivided into types: seqaij,mpidense,composite,…

(i.e. sequential sparse, parallel dense, and implicit matrix addition/multiplication, respectively)

the type of an object is specified during the object's lifetime:

Mat A;

MatCreate(comm, &A);

MatSetType(A, MATSEQAIJ);

Mat A,B; Vec x; KSP solver; – these are opaque objects

you don't access their inner fields directly

in include/petscmat.h you can find typedef struct _p_Mat* Mat;

so B = A only copies pointer, not data

prevents unwanted data copying

makes pointer handling easier

allows hiding implementation from public interface → polymorphism

PETSc object oriented design: opaque objects

Polymorphism

MatMult(Mat A,Vec x,Vec y); //y = A*x

public interface

uniform for all types of matrices: sequential, parallel, dense, sparse, …

documented

calls private implementation based on type: MatMult_SeqDense(Mat A,Vec x,Vec y)

hidden, specific for each matrix type

PetscObject (1)

Every PETSc object can be cast to PetscObject: Mat A;

PetscObject obj;

obj = (PetscObject) A;

PetscObject provides general methods such as:

Get/SetName() – name the object (used for printing, MATLAB interface, etc.)

GetType() – the type of the object

GetComm() – the communicator the object belongs to

PetscObject (2)

Mat A;

char *type;

MPI_Comm comm;

PetscObjectGetComm((PetscObject)A,&comm);

PetscObjectGetType((PetscObject)A,&type);

//is the same as

MatGetType(A,&type);

PETSc inheritance – diagram: classes (Mat, Vec, KSP, ...) and their types

once again: method names must be prefixed by the class name: Vec,Mat,KSP, etc.

all PETSc built-in classes support the following methods

Create() - create the object

Get/SetType() - set the implementation type

Common methods (1)

SetFromOptions() - set all options of the object from the options database

Get/SetOptionsPrefix() - set a specific option prefix for the given object

SetUp() - prepare the object inner state for computation

View() - print object info to specified output

Destroy() - deallocate the memory used by the object

Common methods (2)
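A minimal sketch (not from the original slides) of this generic lifecycle applied to a Vec; the global size 100 is arbitrary:

Vec x;
VecCreate(PETSC_COMM_WORLD,&x);              /* Create()                              */
VecSetSizes(x,PETSC_DECIDE,100);
VecSetType(x,VECMPI);                        /* SetType()                             */
VecSetFromOptions(x);                        /* SetFromOptions(): e.g. -vec_type seq  */
VecSet(x,1.0);
VecView(x,PETSC_VIEWER_STDOUT_WORLD);        /* View()                                */
VecDestroy(&x);                              /* Destroy()                             */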

Destroy method uses simple reference counting.

If counter > 0, then only nullify the pointer and decrement the counter.

If reference count equals 0

call type-specific private destroy routine

deallocate the whole object

So PETSc uses a "destroy always" paradigm

unlike smart pointers in the new C++ standard, Boost or the Trilinos RCP, which use a "destroy never" paradigm

Destroy
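A small sketch of the reference counting in action; the extra reference is taken here with PetscObjectReference() (an assumption for illustration):

Vec x, y;
VecCreate(PETSC_COMM_WORLD,&x);
y = x;                                   /* y points to the same object                   */
PetscObjectReference((PetscObject)y);    /* reference count is now 2                      */
VecDestroy(&y);                          /* count 2 -> 1: only the pointer y is nullified */
VecDestroy(&x);                          /* count 1 -> 0: the object is deallocated       */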

PETSc contains a special PetscViewer class for printing to stdout, files (several text and binary formats), strings or even a socket connection

basic usage: PetscViewer viewer;

PetscViewerCreate(comm, &viewer);

PetscViewerSetType(viewer, PETSCVIEWERASCII);

PetscViewerDestroy(&viewer);

prints only from the first processor of comm

Viewers (1)

predefined viewers: PETSC_VIEWER_STDOUT_WORLD, PETSC_VIEWER_BINARY_SELF, ...

every PETSc object can be viewed by the viewer:

PetscViewer v; Mat A; Vec x;

...

MatView(A,v);

VecView(x,v);

Viewers (2)

#include <petscviewer.h>

int main(int argc,char **args)

{

PetscViewer viewer;

PetscInt i;

PetscInitialize(&argc,&args,(char *)0,(char *)0);

PetscViewerCreate(PETSC_COMM_WORLD, &viewer);

PetscViewerSetType(viewer, PETSCVIEWERASCII);

PetscViewerFileSetMode(viewer, FILE_MODE_APPEND);

PetscViewerFileSetName(viewer, "test.txt");

for(i = 0; i <= 5; i++) {

PetscViewerASCIIPrintf(viewer, "test line %d\n", i);

}

PetscViewerDestroy(&viewer);

PetscFinalize();

return 0;

}

PetscViewer Example (1)

This program will append the following text to the file test.txt:

test line 0

test line 1

test line 2

test line 3

test line 4

test line 5

PetscViewer Example (2)

Vectors, index sets and matrices in PETSc

David Horák

Vec v;

VecCreate(MPI_Comm comm,&v);

VecDestroy(&v);

a vector is an array of PetscScalars

the vector object is not completely created in one call; you must at least set its sizes: VecSetSizes(Vec v, PetscInt m, PetscInt M);

Create another vector with the same type and layout: VecDuplicate(Vec v,Vec *w);

Vec: Vectors
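A minimal sketch of the usual creation sequence (the global size 100 is arbitrary):

Vec x, y;
VecCreate(PETSC_COMM_WORLD,&x);
VecSetSizes(x,PETSC_DECIDE,100);   /* local sizes decided by PETSc, global size 100 */
VecSetFromOptions(x);              /* type can be chosen at run time with -vec_type */
VecDuplicate(x,&y);                /* same type and layout as x                     */
VecSet(x,1.0);
VecDestroy(&x);
VecDestroy(&y);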

Create a vector from an existing array

Create vector from user provided array:

VecCreateSeqWithArray(MPI_Comm comm, PetscInt n, const PetscScalar array[], Vec *v)

VecCreateMPIWithArray(MPI_Comm comm, PetscInt n, PetscInt N, const PetscScalar array[], Vec *vv)

Global size can be specified as PETSC_DECIDE.

Local size can be specified as PETSC_DECIDE.

Vector parallel layout

Query vector layout:

VecGetOwnershipRange(Vec x, PetscInt *low, PetscInt *high)

Create general layout:

PetscSplitOwnership(MPI_Comm comm, PetscInt *n, PetscInt *N)

Ownership Range

Vec x;

Set all entries of vector to constant value: VecSet(Vec,PetscScalar)

VecSet(x,1.0);

Set individual elements (global indexing!):

VecSetValues(Vec,PetscInt,PetscInt*,PetscScalar*,InsertMode);

i = 1; v = 3.14;

VecSetValues(x,1,&i,&v,INSERT_VALUES);

//eq.

VecSetValue(x,i,v,INSERT_VALUES);

Setting vector values (1)

Setting vector values (2)

Set more entries at once: ii[0]=1; ii[1]=2; vv[0]=2.7; vv[1]=3.1;

VecSetValues(x,2,ii,vv,INSERT_VALUES);

The last argument can be INSERT_VALUES - replace original value

ADD_VALUES - add to original value

VecSetValues is not collective, values are cached

after setting all values, you must call assembly routine to exchange values between processors: VecAssemblyBegin(Vec x);

VecAssemblyEnd(Vec x);
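A minimal sketch putting this together; here each process happens to insert only the entries it owns (x is assumed to be created and sized already), although any process may set any entry:

PetscInt    i, istart, iend;
PetscScalar v;

VecGetOwnershipRange(x,&istart,&iend);
for (i=istart; i<iend; i++) {
  v = (PetscScalar)i;
  VecSetValues(x,1,&i,&v,INSERT_VALUES);  /* global indices; values are only cached here */
}
VecAssemblyBegin(x);                      /* exchange cached values between processes    */
VecAssemblyEnd(x);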

get a copy of entries of x with indices ix to an array y:

VecGetValues(Vec x, PetscInt ni, const PetscInt ix[], PetscScalar y[])

user must provide an allocated array y

get the pointer to the internal array:

Vec x; PetscScalar *a;

VecGetArray(Vec x,PetscScalar *a[]);

/* do something with the array */

VecRestoreArray(Vec x,PetscScalar *a[]);

local only; see VecScatter for general

Getting values

int localsize,first,i;

PetscScalar *a;

VecGetLocalSize(x,&localsize);

VecGetOwnershipRange(x,&first,PETSC_NULL);

VecGetArray(x,&a);

for (i=0; i<localsize; i++)

printf("Vector element %d : %e\n",

first+i,a[i]);

VecRestoreArray(x,&a);

Getting values example

VecAXPY(Vec y,PetscScalar a,Vec x); /* y = y + a*x */

VecAYPX(Vec y,PetscScalar a,Vec x); /* y = a*y + x */

VecScale(Vec x, PetscScalar a);

VecDot(Vec x, Vec y, PetscScalar *r); /* several variants */

VecMDot(Vec x,int n,Vec y[],PetscScalar *r);

VecNorm(Vec x,NormType type, double *r);

VecSum(Vec x, PetscScalar *r);

VecCopy(Vec x, Vec y);

VecSwap(Vec x, Vec y);

Basic operations (1)

VecPointwiseMult(Vec w,Vec x,Vec y);

VecPointwiseDivide(Vec w,Vec x,Vec y);

VecMAXPY(Vec y,int n, PetscScalar *a, Vec x[]);

VecMax(Vec x, int *idx, double *r);

VecMin(Vec x, int *idx, double *r);

VecAbs(Vec x);

VecReciprocal(Vec x);

VecShift(Vec x,PetscScalar s);

Basic operations (2)
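A short usage sketch of a few of these operations (x and y are assumed to be assembled vectors with the same layout):

PetscScalar a = 2.0, dot;
PetscReal   nrm;

VecAXPY(y,a,x);                 /* y = y + 2*x   */
VecDot(x,y,&dot);               /* dot = x^T y   */
VecNorm(y,NORM_2,&nrm);         /* nrm = ||y||_2 */
PetscPrintf(PETSC_COMM_WORLD,"dot = %g  norm = %g\n",(double)PetscRealPart(dot),(double)nrm);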

Index Set is a set of indices

generalization of an integer array

can be distributed (if comm has more than one process)

general IS: IS is; PetscInt indices[]={1,3,7}; PetscInt n=3;

ISCreateGeneral(comm,n,indices,PETSC_COPY_VALUES,&is);

/* indices can now be freed */

ISCreateGeneral(comm,n,indices,PETSC_OWN_POINTER,&is);

/* indices are stored inside is and freed when

ISDestroy(&is) is called */

IS: Index Sets (1)

IS: Index Sets (2)

stride IS

in MATLAB: is = 0:2:n-1

in PETSc:

ISCreateStride (comm,n,0,2,&is);

ISDestroy(&is);

Various manipulations: ISSum, ISDifference, ISInvertPermutation, ...

To get the values given by isx from x and put them at positions determined by isy into y:

VecScatterCreate(Vec x,IS isx,Vec y,IS isy,VecScatter*)

VecScatterBegin(VecScatter,Vec x,Vec y,InsertMode,ScatterMode)

VecScatterEnd(VecScatter,Vec x,Vec y,InsertMode,ScatterMode)

VecScatterDestroy(VecScatter*)

IS & VecScatters

Creating a vector and a scatter context that copies all values of an MPI vector vin into a sequential vector vout on each processor:

VecScatterCreateToAll(Vec vin,VecScatter *ctx,Vec *vout)

Creating an output vector and a scatter context used to copy all values of an MPI vector vin into the sequential vector vout on the zeroth process:

VecScatterCreateToZero(Vec vin,VecScatter *ctx,Vec *vout)

The standard sequence follows: VecScatterBegin(), VecScatterEnd(), VecScatterDestroy()

Other VecScatters

The usual create/destroy calls:

MatCreate(MPI_Comm comm,Mat *A);

MatDestroy(Mat *A);

Several more aspects to creation:

MatSetType(A,MATSEQAIJ); /*or MATMPIAIJ,MATAIJ */

MatSetSizes(Mat A, PetscInt m, PetscInt n, PetscInt M, PetscInt N);

MatSeqAIJSetPreallocation(Mat B, PetscInt nz, const PetscInt nnz[]);

Local or global size can be PETSC_DECIDE.

Mat: Matrices

MatCreateSeqAIJ(MPI_Comm comm, PetscInt m, PetscInt n, PetscInt nz, const PetscInt nnz[], Mat *A);

nz - expected number of nonzeros per row (or slight overestimate)

nnz - array of expected row lengths (or slight overestimates)

considerable savings over dynamic allocation!

Matrix creation all in one

MatCreateMPIAIJ(MPI_Comm comm, PetscInt m, PetscInt n, PetscInt M, PetscInt N,
                PetscInt d_nz, const PetscInt d_nnz[], PetscInt o_nz, const PetscInt o_nnz[], Mat *A);

d_nz - # of nonzeros per row in diagonal part

o_nz - # of nonzeros per row in off-diagonal part

d_nnz - array of # of nonzeros per row in diagonal part

o_nnz - array of # of nonzeros per row in off-diagonal part

Matrix creation all in one

Basic matrix types

MATAIJ, MATSEQAIJ, MATMPIAIJ

basic sparse format, known as compressed row format, CRS, Yale

MATAIJ is identical to MATSEQAIJ when constructed with a single process communicator, and MATMPIAIJ otherwise.

MATBAIJ, MATSEQBAIJ, MATMPIBAIJ

extensions of the AIJ formats described above

store matrix elements by fixed-sized dense blocks

intended especially for use with multicomponent PDEs

multiple DOFs per mesh node

MATDENSE, MATSEQDENSE, MATMPIDENSE

dense matrices

MatGetSize(Mat mat, PetscInt *M, PetscInt* N);

MatGetLocalSize(Mat mat, PetscInt *m, PetscInt* n);

MatGetOwnershipRange(Mat A, PetscInt *first_row, PetscInt *last_row);

Querying parallel structure

MatGetVecs(Mat mat,Vec *right,Vec *left)

right - vector that the matrix can be multiplied against

left - vector that the matrix vector product can be stored in

both can be PETSC_IGNORE

Compatible vectors

PETSc matrix creation is very flexible

No sparsity pattern

any processor can set any element => potential for lots of malloc calls

malloc is very expensive

tell PETSc the matrix's sparsity structure (do the construction loop twice: once counting, once inserting)

MatSeqAIJSetPreallocation(Mat B, PetscInt nz, const PetscInt nnz[]);

Matrix Preallocation

Set one value:

MatSetValue(Mat v, PetscInt i, PetscInt j, PetscScalar va, InsertMode mode);

where insert mode is INSERT_VALUES, ADD_VALUES

Set logically 2-D array of values:

MatSetValues(Mat A, PetscInt m, const PetscInt idxm[], PetscInt n, const PetscInt idxn[], const PetscScalar values[], InsertMode mode);

Setting values

MatSetValues is not collective, values are cached

MatAssemblyBegin(Mat A,MAT_FINAL_ASSEMBLY);

MatAssemblyEnd(Mat A,MAT_FINAL_ASSEMBLY);

cannot mix inserting/adding values

need to do assembly in between

Assembling the matrix
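A sketch that puts creation, preallocation, insertion and assembly together for a 1D Laplacian (tridiagonal) matrix; the size n = 10 is arbitrary:

Mat         A;
PetscInt    i, n = 10, istart, iend, cols[3];
PetscScalar vals[3] = {-1.0, 2.0, -1.0};

MatCreate(PETSC_COMM_WORLD,&A);
MatSetSizes(A,PETSC_DECIDE,PETSC_DECIDE,n,n);
MatSetType(A,MATAIJ);
MatSeqAIJSetPreallocation(A,3,PETSC_NULL);               /* used in the sequential case */
MatMPIAIJSetPreallocation(A,3,PETSC_NULL,1,PETSC_NULL);  /* used in the parallel case   */
MatGetOwnershipRange(A,&istart,&iend);
for (i=istart; i<iend; i++) {
  if (i == 0) {                                          /* first row:   2 -1           */
    cols[0] = 0; cols[1] = 1;
    MatSetValues(A,1,&i,2,cols,&vals[1],INSERT_VALUES);
  } else if (i == n-1) {                                 /* last row:   -1  2           */
    cols[0] = n-2; cols[1] = n-1;
    MatSetValues(A,1,&i,2,cols,vals,INSERT_VALUES);
  } else {                                               /* interior:   -1  2 -1        */
    cols[0] = i-1; cols[1] = i; cols[2] = i+1;
    MatSetValues(A,1,&i,3,cols,vals,INSERT_VALUES);
  }
}
MatAssemblyBegin(A,MAT_FINAL_ASSEMBLY);
MatAssemblyEnd(A,MAT_FINAL_ASSEMBLY);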

MatGetValues(Mat mat, PetscInt m, const PetscInt idxm[], PetscInt n, const PetscInt idxn[], PetscScalar v[])

Gets a block of values given by idxm and idxn from a matrix, only returns a local block

mat - the matrix

v - a logically two-dimensional array for storing the values

m, idxm - the number of rows and their global indices

n, idxn - the number of columns and their global indices

The user must allocate space (m*n PetscScalars) for the values v which are then returned in a row-oriented format, analogous to that used by default in MatSetValues()

Getting Values

Values are often not needed: many matrix operations supported

Matrix elements can only be obtained locally

PetscErrorCode MatGetRow(Mat mat, PetscInt row, PetscInt *ncols, const PetscInt *cols[], const PetscScalar *vals[]);

PetscErrorCode MatRestoreRow(/* same parameters */);

Getting values in array
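A minimal sketch that prints all locally owned nonzeros of an assembled matrix A using MatGetRow/MatRestoreRow:

PetscInt          i, j, istart, iend, ncols;
const PetscInt    *cols;
const PetscScalar *vals;

MatGetOwnershipRange(A,&istart,&iend);
for (i=istart; i<iend; i++) {
  MatGetRow(A,i,&ncols,&cols,&vals);
  for (j=0; j<ncols; j++)
    PetscSynchronizedPrintf(PETSC_COMM_WORLD,"A(%d,%d) = %g\n",
                            (int)i,(int)cols[j],(double)PetscRealPart(vals[j]));
  MatRestoreRow(A,i,&ncols,&cols,&vals);
}
PetscSynchronizedFlush(PETSC_COMM_WORLD);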

Extract one parallel submatrix:

MatGetSubMatrix(Mat mat, IS isrow, IS iscol, MatReuse cll, Mat *newmat)

Extract multiple single-processor matrices:

MatGetSubMatrices(Mat mat, PetscInt n, const IS irow[], const IS icol[], MatReuse scall, Mat *submat[])

Collective call, but different index sets per processor

Submatrices

MatTranspose(Mat A, MatReuse reuse, Mat *B)

computes an out-of-place transpose B of a matrix A if reuse=MAT_INITIAL_MATRIX or

an in-place transpose of a matrix A if reuse=MAT_REUSE_MATRIX and B=A

MatMultTranspose()

MatMultTransposeAdd()

MatIsTranspose()

Matrix Transpose

matrix-vector

MatMult(Mat A,Vec in,Vec out);

MatMultAdd

MatMultTranspose

MatMultTransposeAdd

simple operations on matrices

MatNorm

MatScale

MatDiagonalScale

Matrix operations

Implicit matrices

some of the matrix types in PETSc are not stored element by element, but they behave like normal matrices for some operations

nomenclature: matrix-free, implicit, not assembled, not formed, not stored ...

the most important operation is a matrix-vector product (MatMult) which can be considered an application of a linear operator

when using an iterative solver, this operation suffices to solve a linear system

matrix type MATTRANSPOSE

implicit transpose of a matrix

maintains pointer to the original matrix

its MatMult just calls MatMultTranspose of an underlying matrix and vice versa

MatTranspose (1)

Mat A, Ati, Ate;

Vec x, yi, ye;

// assemble matrix A somehow; x is created below by MatGetVecs()

MatCreateTranspose(A, &Ati);

MatTranspose(A, MAT_INITIAL_MATRIX,&Ate);

MatGetVecs(Ati,&x,&yi);

VecDuplicate(yi, &ye);

MatMult(Ati,x,yi);

MatMult(Ate,x,ye);

//norm(yi-ye) is close to 0

MatTranspose (2)

MatComposite

Mat F,G;

Mat arr[3] = {C, B, A}; // reverse order!

// F = A*B*C (implicitly)

MatCreateComposite(comm, 3, arr, &F);

MatCompositeSetType(F, MAT_COMPOSITE_MULTIPLICATIVE);

// G = A+B+C (implicitly)

MatCreateComposite(comm, 3, arr, &G);

MatCompositeSetType(G, MAT_COMPOSITE_ADDITIVE);

matrix type MATCOMPOSITE

implicit matrix sum or product

matrix type MATSHELL

no predefined operation

arbitrary size

any operations can be defined by the user (C function pointers) using MatShellSetOperation function

can have a context with additional data

MatShellSetContext(Mat mat,void *ctx);

MatShellGetContext(Mat mat,void **ctx);

Shell matrices

#undef __FUNCT__

#define __FUNCT__ "mymatmult"

/* user-defined matrix-vector multiply */

PetscErrorCode mymatmult(Mat mat,Vec in,Vec out) {

MyType *matData;

PetscFunctionBegin;

MatShellGetContext(mat,(void**)&matData);

/* compute out from in, using matData */

PetscFunctionReturn(0);

}

Shell matrix example (1)

Shell matrix example (2)

Mat A;

PetscInt m,n,M,N;

MyType Adata;

...

MatCreate(comm,&A);

MatSetSizes(A,m,n,M,N);

MatSetType(A,MATSHELL);

MatShellSetOperation(A,MATOP_MULT,(void(*)(void))mymatmult);

MatShellSetContext(A,(void*)&Adata);

...

Linear solvers David Horák

Solving a linear system Ax = b with Gaussian elimination can take a lot of time and memory.

alternative: iterative solvers use successive approximations of the solution:

convergence not always guaranteed

possibly much faster / less memory

basic operation: y = Ax executed once per iteration

convergence can be accelerated by a preconditioner B ≈ A^-1

KSP & PC: Iterative solvers

All KSP solvers in PETSc are iterative

direct solvers - one iteration with perfect preconditioning (LU, Cholesky)

Object oriented: solvers only need matrix action, so can handle shell matrices

Preconditioners

Far-reaching control through command-line options

Tolerances

Convergence and divergence reason

Custom monitors and convergence tests

Basic concepts

KSPCreate(comm,&solver);

// general:

KSPSetOperators(solver,A,B,DIFFERENT_NONZERO_PATTERN);

// common:

KSPSetOperators(solver,A,A,DIFFERENT_NONZERO_PATTERN);

// also SAME_NONZERO_PATTERN and SAME_PRECONDITIONER

KSPSolve(solver,rhs,sol);

/* optional */ KSPSetUp(solver);

KSPDestroy(&solver);

Iterative solver basics

KSPSetType(solver,KSPGMRES);

KSP can be controlled from the commandline:

KSPSetFromOptions(solver);

/* right before KSPSolve or KSPSetUp */

then the options -ksp_... are parsed:

-ksp_type gmres

-ksp_gmres_restart 20

-ksp_view

Solver type

Iterative solvers can fail

solve call itself gives no feedback

solution may be completely wrong

KSPGetConvergedReason(solver,&reason)

positive for convergence, negative for divergence

KSPGetIterationNumber(solver,&nits) after how many iterations did the method stop?

Convergence
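A small sketch of checking the outcome after KSPSolve() (solver, rhs and sol are those from the earlier slides):

KSPConvergedReason reason;
PetscInt           its;

KSPSolve(solver,rhs,sol);
KSPGetConvergedReason(solver,&reason);
if (reason < 0) {
  PetscPrintf(PETSC_COMM_WORLD,"KSP diverged, reason %d\n",(int)reason);
} else {
  KSPGetIterationNumber(solver,&its);
  PetscPrintf(PETSC_COMM_WORLD,"KSP converged in %d iterations\n",(int)its);
}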

KSPSetTolerances(solver,rtol,atol,dtol,maxit);

Monitors can also be set in code, but easier:

-ksp_monitor

-ksp_monitor_true_residual

Monitors and convergence tests

Many options for the (mathematically) sophisticated user, some specific to one method

KSPSetInitialGuessNonzero

KSPGMRESSetRestart

KSPSetPreconditionerSide

KSPSetNormType

Advanced options

MatNullSpace sp;

/* constant vector */
MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL,&sp);

/* general vectors */
MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_FALSE,5,vecs,&sp);

KSPSetNullSpace(ksp,sp);

The solver will now properly remove the null space at each iteration.

Null spaces

PC usually created as part of KSP: separate create and destroy calls exist, but are (almost) never needed

KSP solver; PC precon;

KSPCreate(comm,&solver);

KSPGetPC(solver,&precon);

PCSetType(precon,PCJACOBI);

PCILU, PCJACOBI, PCASM, PCBJACOBI, PCMG, etc.

Controllable through commandline options:

-pc_type ilu -pc_factor_levels 3

PC basics

Iterative method with direct solver as preconditioner would converge in one step

Direct methods in PETSc are implemented as a special iterative method: KSPPREONLY – only apply the preconditioner, skipping stopping criteria etc.

All direct methods are preconditioner type PCLU:

myprog -pc_type lu -ksp_type preonly \

-pc_factor_mat_solver_package mumps

KSP direct methods
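The same thing can be set up in code rather than on the command line; a sketch, assuming A, b, x are assembled and PETSc was configured with MUMPS (e.g. --download-mumps):

KSP solver;
PC  precon;

KSPCreate(PETSC_COMM_WORLD,&solver);
KSPSetOperators(solver,A,A,SAME_NONZERO_PATTERN);
KSPSetType(solver,KSPPREONLY);                       /* a single preconditioner application  */
KSPGetPC(solver,&precon);
PCSetType(precon,PCLU);                              /* LU factorization as "preconditioner" */
PCFactorSetMatSolverPackage(precon,MATSOLVERMUMPS);  /* hand the factorization to MUMPS      */
KSPSolve(solver,b,x);
KSPDestroy(&solver);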

IS isr, isc; MatFactorInfo info;

MatGetOrdering(A,MATORDERING_NATURAL,&isr,&isc);

MatLUFactor(A,isr,isc,&info);

// MatLUFactorSymbolic(), MatLUFactorNumeric()

// MatCholeskyFactor(A, isr, &info);

MatSolve(A,b,x);

MatSolves(Mat A,Vecs bs,Vecs xs)

// Solves A x = b, given a factored matrix, for a collection of vectors

MatMatSolve(Mat A,Mat B,Mat X)

//Solves A X = B, given a factored matrix

Low-level direct methods

Krylov Subspace Methods

Using PETSc linear algebra, just add:

KSPSetOperators(KSP ksp, Mat A, Mat M, MatStructure flag);

KSPSolve(KSP ksp, Vec b, Vec x);

Can access subobjects

KSPGetPC(KSP ksp, PC *pc)

Preconditioners must obey PETSc interface

Basically just the KSP interface

Can change solver dynamically from the command line

-ksp_type bicgstab

Linear solvers - summary
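A compact sketch of the whole sequence (A is the assembled system matrix, b the right-hand side, x the solution vector):

KSP            ksp;
PetscErrorCode ierr;

ierr = KSPCreate(PETSC_COMM_WORLD,&ksp);CHKERRQ(ierr);
ierr = KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN);CHKERRQ(ierr);
ierr = KSPSetTolerances(ksp,1e-8,PETSC_DEFAULT,PETSC_DEFAULT,PETSC_DEFAULT);CHKERRQ(ierr);
ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);   /* honour -ksp_type, -pc_type, -ksp_monitor, ... */
ierr = KSPSolve(ksp,b,x);CHKERRQ(ierr);
ierr = KSPDestroy(&ksp);CHKERRQ(ierr);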

Newton and Picard Methods

Using PETSc linear algebra, just add:

SNESSetFunction(SNES snes, Vec r, residualFunc, void *ctx);

SNESSetJacobian(SNES snes, Mat A, Mat M, jacFunc, void *ctx);

SNESSolve(SNES snes, Vec b, Vec x);

Can access subobjects

SNESGetKSP(SNES snes, KSP *ksp)

Can customize subobjects from the cmd line

Set the subdomain preconditioner to ILU with -sub_pc_type ilu

Nonlinear solvers - summary
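A minimal sketch of a residual function and its registration; the tiny one-unknown problem x^2 = 2 and the names FormFunction and r are chosen here only for illustration:

/* evaluates f = F(x); ctx can carry application data */
PetscErrorCode FormFunction(SNES snes,Vec x,Vec f,void *ctx)
{
  PetscScalar    *xx, *ff;
  PetscErrorCode ierr;

  PetscFunctionBegin;
  ierr = VecGetArray(x,&xx);CHKERRQ(ierr);
  ierr = VecGetArray(f,&ff);CHKERRQ(ierr);
  ff[0] = xx[0]*xx[0] - 2.0;
  ierr = VecRestoreArray(x,&xx);CHKERRQ(ierr);
  ierr = VecRestoreArray(f,&ff);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}

/* in main(), after the vectors x and r (same layout as x) have been created: */
SNES snes;
SNESCreate(PETSC_COMM_WORLD,&snes);
SNESSetFunction(snes,r,FormFunction,PETSC_NULL);
SNESSetFromOptions(snes);    /* e.g. -snes_monitor, -snes_mf for a matrix-free Jacobian */
SNESSolve(snes,PETSC_NULL,x);
SNESDestroy(&snes);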

1 Sequential LU

ILUDT (SPARSKIT2, Yousef Saad, U of MN)

EUCLID & PILUT (Hypre, David Hysom, LLNL)

ESSL (IBM)

SuperLU (Jim Demmel and Sherry Li, LBNL)

Matlab

UMFPACK (Tim Davis, U. of Florida)

LUSOL (MINOS, Michael Saunders, Stanford)

2 Parallel LU

MUMPS (Patrick Amestoy, IRIT)

SPOOLES (Cleve Ashcraft, Boeing)

SuperLU_Dist (Jim Demmel and Sherry Li, LBNL)

3 Parallel Cholesky

DSCPACK (Padma Raghavan, Penn. State)

MUMPS (Patrick Amestoy, Toulouse)

CHOLMOD (Tim Davis, Florida)

3rd party direct solvers in PETSc

1 Parallel ICC

BlockSolve95 (Mark Jones and Paul Plassman, ANL)

2 Parallel ILU

PaStiX (Faverge Mathieu, INRIA)

3 Parallel Sparse Approximate Inverse

Parasails (Hypre, Edmund Chow, LLNL)

SPAI 3.0 (Marcus Grote and Barnard, NYU)

4 Sequential Algebraic Multigrid

RAMG (John Ruge and Klaus Steuben, GMD)

SAMG (Klaus Steuben, GMD)

5 Parallel Algebraic Multigrid

Prometheus (Mark Adams, PPPL)

BoomerAMG (Hypre, LLNL)

ML (Trilinos, Ray Tuminaro and Jonathan Hu, SNL)

3rd party preconditioners in PETSc

DM: Data management and grid manipulation

SNES: Nonlinear solvers

TS: Time stepping

PETSc components we have not covered

Debugging & profiling

Launch the debugger

-start_in_debugger [gdb,dbx,noxterm]

-on_error_attach_debugger [gdb,dbx,noxterm]

Attach debugger only to some parallel processes: -debugger_nodes 0,1

Put a breakpoint in PetscError() to catch errors as they occur

Debugging - stepping

PETSc tracks memory overwrites at both ends of arrays

the CHKMEMQ macro causes a check of all allocated memory

track memory overwrites by bracketing them with CHKMEMQ

PETSc checks for leaked memory

use PetscMalloc() and PetscFree() for all allocation

print unfreed memory on PetscFinalize() with -malloc_dump

Simply the best tool today is valgrind (http://www.valgrind.org)

it checks memory access, cache performance, memory usage...

needs -trace-children=yes when running under MPI

Debugging - memory checking

PETSc has integrated profiling (timing, flops, memory usage, MPI messages)

Option -log_summary prints a report on PetscFinalize()

PETSc allows user-defined events

PetscLogEventRegister(), PetscLogEventBegin/End()

to create and to manage events reporting time, calls, flops, communication, etc.

Memory usage is tracked by object

Events may also be nested and will aggregate in a nested fashion

Profiling is separated into stages

PetscLogStageRegister(), PetscLogStagePush/Pop()

to create and manage stages identified by an integer handle

Stages may be nested, but will not aggregate in a nested fashion

Profiling
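A small sketch of a user-defined event; the event name "MyAssembly" and the variable USER_EVENT are chosen here only for illustration:

PetscLogEvent  USER_EVENT;
PetscErrorCode ierr;

ierr = PetscLogEventRegister("MyAssembly",0,&USER_EVENT);CHKERRQ(ierr);
ierr = PetscLogEventBegin(USER_EVENT,0,0,0,0);CHKERRQ(ierr);
/* ... the code being timed, e.g. matrix assembly ... */
ierr = PetscLogEventEnd(USER_EVENT,0,0,0,0);CHKERRQ(ierr);
/* the event then shows up as an extra line in the -log_summary report */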

output of -log_summary:

Example profiling

References

Introduction to PETSc, TACC, Jan 17, 2012 (Victor Eijkhout). Slides

Short Course at the Graduate University, Chinese Academy of Sciences, Beijing, China, July 2010 (Matthew Knepley). Slides

Tutorial at ICES, UT Austin, TX September 2011 (Matthew Knepley). Slides

PETSc homepage, http://www.mcs.anl.gov/petsc/

PETSc Users Manual, http://www.mcs.anl.gov/petsc/petsc-current/docs/manual.pdf

PETSc Developer Guide, http://www.mcs.anl.gov/petsc/developers/developers.pdf

Thank you for your attention!