Modern tools to manage multi-developer codes
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Outline
Code development
Git versioning system
CMake
FTMake
HDF5
FFTW
P3DFFT
10-14 Oct 2016, F.Bonaccorso, HPC-LEAP
Code development
From source to program
Compilation and compiler
Linking
Make and makefile
Source code with many developers
Code repository
Versioning
Git versioning system
clone
add
commit
push
pull
Branching
checkout
branch
Merge
merge
Conflicts
Finding changes
diff
difftool
CMake
Purpose of CMake
Multi-platform
Generators
CMakeLists.txt
Simple program
Adding source files
Adding library deps
Library
Adding source files
Target dependency
Advanced usage
FTMake: Advanced example based on
CMake
Algorithm selection
CMakeLists.mine
parameter.xml
Portability layer
Machine and user profile
Versioning support
Auto update
Syncing with master branch
Prevents user branch divergence
Tag local variations
Unique id for current build: algo and code
HDF5
Description
Data types
Hierarchy
Group
Dataset
Attributes
Traversing
Files/Group/Datasets
Reading/Writing
Selection
Parallel with MPI
FFTW
Introduction
Coefficients: Plan
Data layout
Real data
Complex data
2D / 3D
P3DFFT
Introduction
Memory configuration
API
Setup
Transform
Code development
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
From source to program
Programming languages are a high-level way of speaking to a computer
Low-level
Little to no abstraction over the machine ISA
Assembly, Forth, ...
High level
Nearer to the human way of thinking
Fortran, C, C++, Java, Python, ...
HPC Applications to Turbulence and Complex Flows, F.Bonaccorso, HPC-LEAP
Compiled vs interpreted
Using a high-level language requires a translation phase
Once and for all: compilation into machine code
Long operation
Optimizations and better running time
Not portable result
At every execution: interpreted
Fast edit/run cycle
Not the best performance for final product
Fortran, C and C++: compiled
On every source file, the compiler is invoked to
produce an object file containing machine code
in this step, many optimizations and source
transformations can be done
All the object files are put together by the linker to
get the executable
Needed libraries are collected here
Make
Make is a common tool to automate the chain:
editing source file -> compile into obj -> link
Its input is a makefile which describes the rules for
descending the chain
Name of the target product and its needed input files
Compiler and flags which are used
Linker and libraries
Make compares file modification times: when you invoke make, it runs only the needed actions
Example Makefile
flux.o: flux.c
	cc -O3 flux.c -c -I/mypath/for/includes
main.o: main.c
	cc -O3 main.c -c -I/myotherpath/for/includes
flux: flux.o main.o
	cc flux.o main.o -o flux -lm -L/mypath/for/fftw -lfftw
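The timestamp-driven rebuild logic can be watched in a tiny self-contained sketch (using `cp` in place of a real compiler, so no `cc` is needed; file names are invented):

```shell
# Scratch directory so nothing touches real files
dir=$(mktemp -d); cd "$dir"
echo 'hello' > flux.c

# One rule: flux.o depends on flux.c (recipe lines must start with a TAB)
printf 'flux.o: flux.c\n\tcp flux.c flux.o\n' > Makefile

make            # flux.o missing: the rule fires
make            # nothing changed: make reports flux.o is up to date
touch flux.c    # simulate an edit (newer modification time)
make            # the rule fires again
```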
Distributed version control
Version control systems are a category of software
tools that help a software team manage changes to
source code over time
Version control software keeps track of every
modification to the code in a special kind of database
(the code repository)
If a mistake is made, developers can go back to a
previous version and compare the code to help fix the
mistake
Concurrent development
Software developers working in teams are
continually writing new source code and changing
existing source code.
The code for a project is typically organized in a
folder structure or "file tree".
One developer on the team may be working on a new feature while another fixes an unrelated bug; each developer may make changes in several parts of the file tree
Fixing conflicts
Version control tracks every individual change by each contributor and helps prevent concurrent work from conflicting
Changes made in one part of the software can be incompatible with those made by another developer working at the same time. This problem should be discovered and solved in an orderly manner without blocking the work of the rest of the team
Further, in all software development, any change can introduce new bugs on its own
Testing and development proceed together until a new version is ready
Git versioning system
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Git commands
clone
add
commit
push
pull
Branching
checkout
branch
Merge
fast-forward
3 way
Conflicts
Examining the repo
status
diff
difftool
Discarding changes
Git: fast versioning system
Git was created to overcome difficulties with other
versioning tools while coding the Linux kernel
Worldwide spread software team
Big number of concurrent feature sub-teams
If it works for Linux, it can scale up
Git imposes no more overhead than zipping all the files and numbering the zips
It's way more than this
• It can be used even in small projects
Cloning the repository
The repository is the database which will contain all
the versions of the code
It can be created locally
git init
Normally you clone a central repository
git clone user@host:path/to/repo.git
Now you have an equivalent copy, locally
local repository
Git config
Git tracks every commit, securing it with a SHA-1
hash as an integrity check
Every commit is by one author, but who am I?
git config --global user.name fabio
git config --global user.email fabio@example.com
There is also the choice to have per-repo options
git config [--local]
Normal workflow
You edit a file
You tell git you want a new version for this file
Repeat for all related files
main.c, sub1.c, sub2.c
Commit the changes explaining them in a comment
git add
Git tracks changes of every file (under its control)
In your working directory there can be files outside git
control: obj, exe, libs, ...
You notify git of a change in FileName with
git add FileName
This command does a backup (stage) of FileName
for future use (in a commit)
Think of your zip file
git commit
When you need to finalize a version, or a little new
step, you do a commit
git commit [-m "Comment for this version"]
At this point all notified changes are grouped and
named with a comment and a SHA-1 number
Now your zip has an official number
The SHA-1 number is the way to refer to the commit
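The whole add/commit cycle above can be run end to end in a throwaway repository (directory, file name and comment are invented for this sketch):

```shell
# Work in a scratch directory so nothing touches a real repo
dir=$(mktemp -d); cd "$dir"

git init -q demo && cd demo
git config user.name  fabio                 # identity, as in the git config slide
git config user.email fabio@example.com

echo 'int main(void){return 0;}' > main.c
git add main.c                              # stage the change
git commit -q -m "Add skeleton of main.c"   # finalize: comment + SHA-1

git log --oneline                           # one line per commit: SHA-1 prefix + comment
```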
git add and commit
The git add and git commit commands compose the fundamental Git workflow
Developing a project is all about the basic edit/stage/commit pattern
Very important is the comment for the commit
Since it will be used some day in the future to look back and recover from errors/bugs, it's crucial to explain the intent of the commit
Viewing the history of your project
Every commit adds up in the project, each one in its
branch
To see the entire history up to current version:
git log
This command will output the list of commits
For an ASCII-art graph:
git log --oneline --all --graph
GUI front-end to git
Command line git is tough
GUI programs to interact with a git repo
Linux: qgit, gitk, git gui, ...
Windows: Git
MacOS: SourceTree, ...
Way better looking history graph
Simple presentation of differences in working dir
Log with GUI tools
Native git look
Same repository using
QGit on Linux
git push
After a commit is in your local repo, you can forward it to the central repository
git push
It will make your progress available to the other
developers
This is where problems can arise
Concurrent changes in the same files
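Lacking a real server, a push can be sketched against a local bare repository standing in for the central one (all names here are invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q --bare central.git           # stands in for the central server

git clone -q central.git work && cd work
git config user.name  fabio
git config user.email fabio@example.com

echo 'v1' > flux.c
git add flux.c
git commit -q -m "First version of flux.c"

git push -q origin HEAD                  # publish the local commits
git --git-dir=../central.git log --oneline   # the commit is now on the "server"
```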
Centralized workflow
This kind of workflow uses a central repository to serve as the single point of entry for all changes to the project
The default development branch is called master and all changes are committed into this branch. This workflow doesn't require any other branches besides master
Developers work in their own local repo; they edit files and commit changes
To publish changes to the official project, developers "push" their local master branch to the central repository
Centralized workflow 2
Git preserves the sequence of commits
git pull
To synchronize your local repo with the remote:
git pull (--rebase)
--rebase will avoid a merge [More later]
This command fetches from the remote the commits made by others
This gives you the chance for a push!
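A pull picking up a colleague's commit can be sketched with two clones of the same bare "central" repository (the names alice, bob and the files are invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q --bare central.git

# "alice" publishes a first commit
git clone -q central.git alice
( cd alice
  git config user.name alice; git config user.email alice@example.com
  echo 'step 1' > log.txt; git add log.txt; git commit -q -m "Step 1"
  git push -q origin HEAD )

# "bob" clones, then alice pushes more work
git clone -q central.git bob
( cd alice
  echo 'step 2' >> log.txt; git commit -q -am "Step 2"
  git push -q origin HEAD )

# bob synchronizes his local repo with the remote
( cd bob
  git config user.name bob; git config user.email bob@example.com
  git pull -q --rebase        # --rebase avoids a merge commit
  git log --oneline )
```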
Centralized workflow 2
Now it is ok!
Branch
A branch represents an independent line of
development
You can think of them as a way to request a brand
new working directory, staging area, and project
history
New commits are recorded in the history for the
current branch, which results in a fork in the history
of the project.
Branch
To create a new branch from current commit, the
command is:
git branch BranchName
It's important to understand that branches are just
pointers to commits
When you create a branch, all Git needs to do is
create a new pointer
it doesn’t change the repository in any other way
Branches: example
checkout
To start using a branch, you check it out:
git checkout BranchName
The creation of a branch followed by a checkout can be done in one command:
git checkout -b BranchName
All the following commits will belong to the branch, with no interference with the other branches
This can be a way to test ideas
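A minimal sketch of branching (branch and file names invented); the commit on the branch leaves the original branch untouched:

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
echo base > file.txt; git add file.txt; git commit -q -m "Base version"

git checkout -q -b big_Feature        # create the branch and switch to it
echo idea >> file.txt
git commit -q -am "Try an idea on a branch"

git checkout -q -                     # back to the previous branch
cat file.txt                          # -> base (the experiment stayed on big_Feature)
```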
Merge
What if a feature you developed in a branch is worth keeping?
Porting back commits of one branch to another is called merging:
git merge BranchToBeMergedHere
Merging is Git's way of putting a forked history back together again
You take the independent lines of development created by git branch and integrate them into a single branch
Merge: fast-forward
When there is a linear path to your current branch
and the commits from the branch to merge, git
simply “moves forward” the history line
After the merge, it’s like everything happened in a
single branch
Merge: fast-forward
Merge: three-way
When fast-forward cannot be applied, git uses a 3-way merge, doing some magic surgery on the history line
Git recovers the last commit of the current branch, the last commit of the branch to be merged and their common ancestor
Merging from these 3 commits gives the name: 3-way merge
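A 3-way merge can be provoked deliberately (all names invented): since the two branches touch different files, git creates a merge commit without conflicts:

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
base=$(git symbolic-ref --short HEAD)      # default branch name varies with git version
echo common > notes.txt; git add notes.txt; git commit -q -m "Common ancestor"

git checkout -q -b big_Feature
echo feature > feature.txt; git add feature.txt; git commit -q -m "Work on the feature"

git checkout -q "$base"                    # meanwhile the base branch moves on too
echo more > other.txt; git add other.txt; git commit -q -m "Unrelated work"

git merge -q --no-edit big_Feature         # no linear path -> 3-way merge commit
git log --oneline --graph                  # history shows the fork and the join
```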
Branch big_Feature merged into master
Rebase vs Merge
Merge keeps both commit histories; rebase rewrites commits
DON'T rebase public commits!
Merge: conflict
If 3-way merge also fails -> Conflict
Git throws its hands up, and it's up to you
Same cycle: edit/add/commit to solve the problem
You must choose what you want to be the resulting
version
It’s safer to have TESTS to check everything is ok
after a merge
Examining the local repo
To examine the status of your local branch:
git status
It will show whether there are differences in your files under version control, and also list everything that is not versioned
If you want to avoid extraneous files:
git status -uno
Beware: this also skips source files not yet added!
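The difference between plain git status and git status -uno can be sketched (file names invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
echo src > main.c; git add main.c; git commit -q -m "Tracked file"

echo tmp > main.o   # build product, never added to git
echo fix > main.c   # a real edit to a tracked file

git status          # lists the modified main.c AND the untracked main.o
git status -uno     # -uno: only tracked files; main.o is skipped
```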
Differences in files
To show the difference between your current implementation and the previous version:
git diff FileName
Textual tool, based on the UNIX command diff
For a graphical visual comparison:
git difftool FileName
Several tools can be configured to be used with git
tkdiff, vimdiff, ...
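A minimal diff sketch (file name and contents invented); git shows removed lines with "-" and added lines with "+":

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
printf 'a\nb\n' > data.txt; git add data.txt; git commit -q -m "Two lines"

printf 'a\nB\n' > data.txt      # change the second line
git diff data.txt               # unified diff: "-b" / "+B"
```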
Differences with cmdline
Differences with GUI
Merge: solving a conflict
You inspect your current situation with
git status
You see the conflicting changes with
git diff (or better with git difftool)
You edit the file deciding the final version
You add the result file
Repeat for every file with conflict
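The resolution steps above, as a sketch (file name, branch name and values invented); both branches change the same line, so the merge stops with a conflict:

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
base=$(git symbolic-ref --short HEAD)
echo 'dt = 0.1' > param.txt; git add param.txt; git commit -q -m "Initial parameters"

git checkout -q -b tuning
echo 'dt = 0.01' > param.txt; git commit -q -am "Smaller time step"

git checkout -q "$base"
echo 'dt = 0.5'  > param.txt; git commit -q -am "Larger time step"

git merge tuning || true        # same line changed on both sides -> conflict
git status --short              # "UU param.txt" marks the unmerged file

echo 'dt = 0.05' > param.txt    # edit: decide the final version by hand
git add param.txt               # add the resolved file
git commit -q -m "Merge tuning, settling on dt = 0.05"
```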
Discarding current changes
To recover the previous version of a file:
git checkout -- FileName
This command destroys every change you did to this
file
You can recover a file at a given version, using the SHA-1 of the commit:
git checkout sha1 FileName
This also overwrites your copy of FileName
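A sketch of discarding a local change (names invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
echo good > flux.c; git add flux.c; git commit -q -m "Known-good flux.c"

echo broken > flux.c            # an experiment gone wrong
git checkout -- flux.c          # throw the local change away
cat flux.c                      # -> good
```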
Back to a previous commit
To recover a previous version of the project:
git checkout sha1
This command recovers the state of a particular version (identified by its sha1)
You can compile, run tests, ... to clarify how things were at that point
Checking out a commit makes the entire working directory match that commit
You can go back safely to your state
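Going back to a previous commit and safely returning, as a sketch (names invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
base=$(git symbolic-ref --short HEAD)
echo v1 > model.c; git add model.c; git commit -q -m "Version 1"
old=$(git rev-parse HEAD)      # remember its SHA-1

echo v2 > model.c; git commit -q -am "Version 2"

git checkout -q "$old"         # whole working dir back to Version 1 (detached HEAD)
cat model.c                    # -> v1
git checkout -q "$base"        # safely back to the tip
cat model.c                    # -> v2
```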
Checkout commit vs checkout files
Checking out a commit is a read-only operation
Checking out a file lets you use an old version of that
particular file, leaving the rest of your working
directory untouched
It will show in git as "Changes to be committed"
CMake
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Building system
Many projects use Makefiles to tackle the task of recompiling the executable when sources change
This is a solution from the UNIX world
Pro:
Minimal rebuilds, textual file, fully customizable
But...
Hard to generalize
Not cross-platform
Toward a general Makefile
Unix / Linux world
Autotools (includes autoconf)
qmake (from the Qt library)
Portable
CMake
CMake
CMake is a cross-platform generator
Can produce Makefiles, Visual Studio solutions, Eclipse
projects, ...
It has a simple syntax and needs 1 file
CMakeLists.txt
With two extra features:
CTest: automates testing
CPack: creates installation packages
Basic usage of CMake
In a basic use, we need this CMakeLists.txt:
project(HelloWorld)
add_executable(hello hello.c)
Then we can run cmake:
Project with two programming languages
We have C and C++ source files, so:
project(HelloWorld)
enable_language(CXX)
add_executable(hello hello.c main.cpp)
Other generators: Eclipse
To list available generators:
cmake -G
The output is platform dependent; on Linux we have:
Unix Makefiles
Ninja
Watcom WMake
CodeBlocks - Ninja
CodeBlocks - Unix Makefiles
CodeLite - Ninja
CodeLite - Unix Makefiles
Eclipse CDT4 - Ninja
Eclipse CDT4 - Unix Makefiles
KDevelop3
KDevelop3 - Unix Makefiles
Kate - Ninja
Kate - Unix Makefiles
Sublime Text 2 - Ninja
Sublime Text 2 - Unix Makefiles
Multi/language example in Eclipse CDT
Multi/language example in Eclipse CDT
String variables
To set a variable to a string value:
set(varName Value)
To print a message to the console:
message("String")
Variables are referenced with ${varName}, so in a message:
message("The variable is ${varName}")
Compiler and linker options
In CMake there are special named variables for controlling some behaviour:
CMAKE_Fortran_FLAGS, CMAKE_C_FLAGS, CMAKE_CXX_FLAGS for the compiler flags
For each one there is a debug/release version:
CMAKE_C_FLAGS_DEBUG, CMAKE_C_FLAGS_RELEASE
For example, we can set:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
set(CMAKE_C_FLAGS_DEBUG "-O0 -g")
set(CMAKE_C_FLAGS_RELEASE "-O3")
Default configuration
It’s useful to define a default build type:
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release")
endif()
We can choose the DEBUG config at command line:
cmake -DCMAKE_BUILD_TYPE=Debug ..
Libraries
We can structure the project to use a library:
project( mylibrary )
set( mylib_SRCS library.c )
add_library( my [SHARED] ${mylib_SRCS} )
The keyword SHARED switches from a static to a dynamic (shared) library
Simple library
Building the library:
Project with one library
To build a library used by the main program, we
should combine the add_executable and
add_library in CMakeLists.txt:
project( myproj )
set( mylib_SRCS library.c )
add_library( my ${mylib_SRCS} )
add_executable( hello hello.c )
target_link_libraries( hello my )
Project with one library
List Variables and loop
To set a variable to a list, simply list the elements with a space separator, e.g.:
set(FRUITS Apple Banana Orange Kiwi Mango)
To loop over the elements of list variable:
foreach(varName ${varListName}) / endforeach()
A simple example with message:
foreach(fruit ${FRUITS})
message("${fruit} is a tasty fruit")
endforeach()
If & then
The language of CMake is powerful enough to include the usual if/then/else construct, leading to complex CMakeLists.txt
Example:
if(ENABLE_MPI)
message("MPI is enabled")
else()
message("MPI is disabled")
endif()
Using external libraries
CMake has the built-in ability to use libraries in the
default installation path (platform dependant)
For the non-standard case there are special commands:
Headers: INCLUDE_DIRECTORIES
Libraries: FIND_LIBRARY, then link with its result
Going further
Built on top of INCLUDE_DIRECTORIES and FIND_LIBRARY, CMake has ways to search for widespread software libraries:
FIND_PACKAGE( LibName [REQUIRED] )
The keyword REQUIRED tells CMake to abort if it fails to find the library
Example:
FIND_PACKAGE( MPI REQUIRED )
FTMake: advanced build system
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Introduction
Based on CMake, FTMake is a portable build system
used in our research group
Main features
Portability layer
Algorithm selection
Policy for the versioning system
Testing framework
UUID identification of builds (in progress)
Portability layer
FTMake is made to run codes on a wide range of
platforms
PC (Win/MacOS/Linux)
Little clusters of Linux servers
Computer centers
The required libraries are handled via standard CMake-based techniques...
They work correctly on PCs
Computer centers mostly provide alternative versions of libraries
For different compilers, legacy versions, ...
Portability layer
In FTMake, there are two ways to customize the
path for some binaries / libraries
Host name based
User name based
The central list of customizations is the file
FTMAKE_DIR/cmake/hooks.cmake
They are added under the folder
FTMAKE_DIR/cmake/HOST-xxxx.cmake
FTMAKE_DIR/cmake/USER-xxxx
Host name based configuration
In hooks.cmake, the variable
HOOKS_KNOWN_HOST is set to the list of extra
configuration host-based files:
set(HOOKS_KNOWN_HOST droemu.roma2.infn.it fen01 fen02
…)
For each entry there is a corresponding file:
cmake/HOST-droemu.roma2.infn.it.cmake
cmake/HOST-fen01.cmake
cmake/HOST-fen02.cmake
Example HOST-xxxx.cmake
message(STATUS "You are on fen01/02/07/08 aka fermi front-end")
set(TIMING "YES")
set(WANT_CURL "NO")
set(PROF_TIMING "NO")
set(MPI_C_FOUND TRUE)
set(MPI_C_LIBRARIES "m")
set(MPI_C_INCLUDE_PATH " ")
set(MPI_Fortran_FOUND TRUE)
set(MPI_Fortran_LIBRARIES "m")
set(MPI_Fortran_INCLUDE_PATH " ")
Algorithm selection
FTMake implements the selection of the flags needed by C codes
In parameter.xml (in the source dir) there is a hierarchical listing of all the flags of the code
For each flag, there is the name and optionally:
whether it needs a variable (type, optional or not)
whether the program needs one more source file
Policy for the versioning system
With git you can work in a feature branch, without
disturbing the master branch
At some point the feature will be merged back
As time passes, the merge becomes more and more likely to conflict
FTMake implements a policy to merge the master
branch into the feature branch
Testing framework and continuous integration
FTMake also enforces the use of a testing framework based on CTest
At every commit, a central gitlab server is notified to
schedule a test suite
UUID identification of builds
Every (successful) compilation is labeled with a
unique string (UUID)
This UUID identifies the particular combination of
flags, repository branch and software used to build
the code
A central database is notified…
The simulation experiments can be related to the
UUID
HDF5 library
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
HDF5: Introduction
Hierarchical Data Format 5 (HDF5) is an open source technology suite for managing data collections of all sizes and complexity
HDF5 was specifically designed for:
high volume and/or complex data (but can be used for low volume and simple data)
every size and type of system (portable)
flexible, efficient storage and I/O
fast on parallel machines
HDF5 is similar to XML...
files are self-describing and allow users to specify complex data relationships and dependencies
...But HDF5 files can contain binary data and allow direct access to parts of the file without first parsing the entire contents
It is cross-platform, with serial and MPI APIs and Fortran, C bindings
Hierarchy of HDF5 files
Modeled after the directory/file structure, an HDF5 file consists of
groups (analogous to dirs, can be nested)
datasets (contain data)
attributes (carry info about data, meta-data)
The name of a dataset contains its groups:
/Euler/velocity
/Particle/positions
Tools: h5ls, h5dump
To list the content of an HDF5 file:
h5ls [-r] FileName
Example output:
$ h5ls -r test.h5
/ Group
/IntArray Dataset {8, 5}
Tools: h5ls, h5dump
To see the content of an HDF5 file:
h5dump [options] FileName
Example:
$ h5dump -H test.h5
HDF5 "test.h5" {
GROUP "/" {
DATASET "IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 8, 5 ) / (
8, 5 ) }
}
}
}
$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
DATASET "IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
DATA {
(0,0): 10, 10, 10, 10, 10,
(1,0): 10, 10, 10, 10, 10,
(2,0): 11, 11, 11, 11, 11,
(3,0): 11, 11, 11, 11, 11,
(4,0): 12, 12, 12, 12, 12,
(5,0): 12, 12, 12, 12, 12,
(6,0): 13, 13, 13, 13, 13,
(7,0): 13, 13, 13, 13, 13
}
}
}
}
File operations
In HDF5 you need to use the concept of file
Same as the Standard C Library
Instead of fopen/fclose:
hid_t H5Fopen(const char *name, unsigned flags, hid_t fapl_id);
hid_t H5Fcreate(const char *name, unsigned flags, hid_t fcpl_id, hid_t fapl_id);
herr_t H5Fclose(hid_t file_id);
The returned hid_t is the IDentifier, analogous to the FILE* of the C standard library
Dataset
Data (vector, matrix) must be written in a dataset
A dataset has:
rank (1,2,3,... for 1D,2D,3D,... data)
type (H5T_NATIVE_[INT|FLOAT|DOUBLE])
size (10 elems, 3x2 elems, 4x4x4 elems)
HDF5 Lite: make_dataset
Using the Lite API, this is all we need to write some data:
hid_t file_id;
hsize_t dims[2]={2,3};
int data[6]={1,2,3,4,5,6};
/* create a HDF5 file */
file_id = H5Fcreate ("ex_lite1.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
/* create and write an integer type dataset named "dset" */
H5LTmake_dataset(file_id,"/dset", 2, dims, H5T_NATIVE_INT,data);
/* close file */
H5Fclose (file_id);
HDF5 Lite: read_dataset
To read back:
int data[6];
hsize_t dims[2];
/* open file from ex_lite1.c */
hid_t file_id = H5Fopen ("ex_lite1.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
/* read dataset */
H5LTread_dataset(file_id,"/dset", H5T_NATIVE_INT , data); /*data[]={1,2,3,4,5,6}*/
/* get the dimensions of the dataset */
H5LTget_dataset_info(file_id,"/dset",dims,NULL,NULL); /* now dims[2]={2,3} */
/* close file */
H5Fclose (file_id);
The low level API
When you need finer control, you explicitly take care
of creating/opening + closing a:
group H5Gcreate/H5Gopen + H5Gclose
dataset H5Dcreate/H5Dopen + H5Dclose
Each create or open function exists with 2 names
create1/open1 For old v1.6 HDF5
create2/open2 From HDF5 v1.8
The create2 and open2 forms take more parameters
(property list)
H5P_DEFAULT can always be used
Dataspace
A dataspace describes the dimensionality of the data array
A dataspace that is a regular N-dimensional array of data points is called a simple dataspace
a more general collection of data points exists, called a complex dataspace
The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extensible
A dataspace can also describe a portion of a dataset, making it possible to do partial I/O operations on selections
Creating dataspaces
To create a dataspace:
hid_t H5Screate_simple(int rank, const hsize_t *dims,
const hsize_t * maximum_dims);
The parameter maximum_dims can be NULL
Fixed size equal to dims
Example, 4x6 elems:
hsize_t dims[2] = {4,6};
hid_t dataspace_id = H5Screate_simple(2, dims, NULL);
Creation of a dataset
Define the dataset characteristics:
Define a datatype or specify a pre-defined datatype
A user-defined datatype is needed when dealing with structures
Define a dataspace
Specify the property list(s) or use the default
For the storage: contiguous, chunked, compressed
Fixed or extensible
H5Dcreate
To create a dataset:
hid_t H5Dcreate2( hid_t loc_id, const char *name, hid_t dtype_id, hid_t space_id, hid_t lcpl_id, hid_t dcpl_id, hid_t dapl_id );
loc_id can be a file_id or a group_id
lcpl_id, dcpl_id and dapl_id are the link creation, dataset creation and dataset access property lists for non-default features
Example:
hsize_t dims[] = { 4,6 };
hid_t dataspace_id = H5Screate_simple(2, dims, NULL);
/* Create the dataset. */
hid_t dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE,
dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Dopen
To use a pre-existing dataset, we open it:
hid_t H5Dopen2 (hid_t loc_id, const char *name, hid_t
dapl_id );
loc_id can be a file_id or a group_id
name is relative to "/" or to the group
dapl_id is the access dataset property list for non-
default features
H5Dwrite
To write actual data to a dataset:
herr_t H5Dwrite(hid_t dataset_id, hid_t mem_type_id,
hid_t mem_space_id, hid_t file_space_id,
hid_t xfer_plist_id, const void * buf);
dataset_id is the dataset
mem_type_id is the type of the data in memory
mem_space_id and file_space_id are the memory and file dataspace selections, or H5S_ALL
xfer_plist_id is the plist of the transfer or
H5P_DEFAULT
H5Dwrite: example
/* Open an existing file. */
hid_t file_id = H5Fopen(FILE, H5F_ACC_RDWR,H5P_DEFAULT);
/* Open an existing dataset. */
hid_t dataset_id = H5Dopen2(file_id, "/dset",H5P_DEFAULT);
/* Write the dataset. */
int dset_data[6] = { 1,2,3,4,5,6 };
status = H5Dwrite(dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
H5Dread
To read from a dataset:
herr_t H5Dread(hid_t dataset_id, hid_t mem_type_id,
hid_t mem_space_id, hid_t file_space_id,
hid_t xfer_plist_id, void * buf);
Example:
status = H5Dread(dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
HDF5 library usage
During reads and writes the HDF5 library performs type conversion
This is what makes it cross-platform
This conversion can be slow
Files, groups, datasets, ... MUST be closed when not needed
Memory leaks and strange behaviour otherwise
Selection: partial write
Imagine a 2d dataset:
You want a partial write:
we select and write to a 3 x 4 subset of the dataset with an offset of 1 x 2
Partial write: example
/* Create memory space with size of subset. Get file dataspace
and select subset from file dataspace. */
hsize_t dimsm[2] = { 3,4 };
hid_t memspace_id = H5Screate_simple(2, dimsm, NULL);
hid_t dataspace_id = H5Dget_space(dataset_id);
hsize_t offset[2] = { 1,2 };
hsize_t count[2] = { 3,4 };
herr_t status = H5Sselect_hyperslab(dataspace_id, H5S_SELECT_SET,
offset, NULL, count, NULL);
/* Write a subset of data to the dataset */
status = H5Dwrite(dataset_id, H5T_NATIVE_INT, memspace_id,
dataspace_id, H5P_DEFAULT, sdata);
Hyperslab selection
To select part of a dataspace:
herr_t H5Sselect_hyperslab(hid_t space_id, H5S_seloper_t op,
const hsize_t *start, const hsize_t *stride,
const hsize_t *count, const hsize_t *block);
space_id identifies the dataspace
op is H5S_SELECT_SET or an additive operation
start is the offset of the 1st element in space_id
count is the number of blocks to select
stride is the distance in space_id between elems
NULL means 1
block is the size of the block
NULL means 1(x1x1x...)
Hyperslab examples
This is OK
This is outside the dataspace
Hyperslab examples
Parallel interface (MPI)
The parallel interface requires MPI and MPI-IO
First operation is to open a parallel file with an MPI communicator
It returns a file handle to be used for future access to the file
All processes are required to participate in the collective Parallel HDF5 API
Different files can be opened using different communicators
Examples of what you can do with the Parallel HDF5 collective API:
File operations: create, open and close a file
Object creation: create, open, and close a dataset
Object structure: extend a dataset (increase dimension sizes)
Dataset operations: write to or read from a dataset
Once a file is opened by the processes of a communicator:
All parts of the file are accessible by all processes
All objects in the file are accessible by all processes
Multiple processes can write to the same dataset
Each process can write to an individual dataset
Parallel file operations 1
We must use the property list feature to set up a
parallel access list for the file, then we bind the
access list to the MPI communicator:
herr_t H5Pset_fapl_mpio( hid_t fapl_id, MPI_Comm
comm, MPI_Info info );
Example:
MPI_Comm comm = MPI_COMM_WORLD;
MPI_Info info = MPI_INFO_NULL;
MPI_Init(&argc, &argv);
hid_t plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
Parallel file operations 2
For parallel write/read of a dataset, PHDF5 uses the
hyperslab concept
Now each process defines its own contribution to the
dataset
There are two strategies in PHDF5 for each transfer:
Individual: every MPI process transfers its data independently
Collective: a buffering algorithm in PHDF5 aggregates the transfers
Parallel file operations 3
One strategy is to write by contiguous slabs
Contiguous hyperslabs

hsize_t dimsf[2] = { 8, 5 };
count[0] = dimsf[0] / mpi_size;
count[1] = dimsf[1];
offset[0] = mpi_rank * count[0];
offset[1] = 0;
filespace = H5Dget_space(dset_id);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
hid_t plist_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
hid_t memspace = H5Screate_simple(2, count, NULL);
status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, data);
Regularly spaced
Using two processes that write to the same dataset, each writing to every other column in the dataset.
For each process the hyperslab in the file is set up as follows:

count[0] = 1;
count[1] = dims_mem[1];
offset[0] = 0;
offset[1] = mpi_rank;
stride[0] = 1;
stride[1] = 2;
block[0] = dims_file[0];
block[1] = 1;

The stride is 2 for dimension 1 to indicate that every other position along this dimension will be written to
A stride of 1 indicates that every position along a dimension will be written to
For two processes, mpi_rank will be either 0 or 1. Therefore:
Process 0 writes to even columns (0, 2, 4, ...)
Process 1 writes to odd columns (1, 3, 5, ...)
By pattern
Using 4 processes to write the pattern shown below:
Each process defines a hyperslab by:
Specifying a stride of 2 for each dimension, which indicates that you wish to write to every other position along a dimension
Specifying a different offset for each process:
Process 0: offset[0] = 0, offset[1] = 0
Process 1: offset[0] = 1, offset[1] = 0
Process 2: offset[0] = 0, offset[1] = 1
Process 3: offset[0] = 1, offset[1] = 1
By chunk
Using 4 processes to write the pattern shown below:
Use the block parameter to specify a chunk of size 4 x 2
Use a different offset (start) for each process, based on
the chunk size:
Process 0: offset[0] = 0, offset[1] = 0
Process 1: offset[0] = 0, offset[1] = 2
Process 2: offset[0] = 4, offset[1] = 0
Process 3: offset[0] = 4, offset[1] = 2
FFT: FFTW and P3DFFT
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
FFTW
FFTW, the Fastest Fourier Transform in the West, is a collection of fast C routines for computing the discrete Fourier transform (DFT)
FFTW computes the DFT of complex data, real data, even- or odd-symmetric real data (these symmetric transforms are usually known as the discrete cosine or sine transform, respectively), and the discrete Hartley transform (DHT) of real data
The input data can have arbitrary length. FFTW employs O(n log n) algorithms for all lengths, including prime numbers
FFTW supports arbitrary multi-dimensional data
FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX, and NEON vector instruction sets
FFTW includes parallel (multi-threaded) transforms for shared-memory systems
Starting with version 3.3, FFTW includes distributed-memory parallel transforms using MPI
FFTW: Basic definitions
FFTW uses real and complex numbers:
fftw_complex for complex data; real data are plain double
Allocating an array of N complex elements:
fftw_complex *a = fftw_malloc(sizeof(fftw_complex) * N);
Before computing the DFT, some prefactors need to
be calculated. FFTW uses an object, called a plan,
to store these numbers
Many planning strategies are available: planning time vs transform speed
FFTW: 1D complex plan
To create a plan for 1D:
fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in,
fftw_complex *out, int sign, unsigned flags);
n is the dimension of the input (and output) array
sign is FFTW_FORWARD (-1) or FFTW_BACKWARD (+1):
the sign of the exponent
flags can be one of:
FFTW_MEASURE runs some tests to time the best algorithm
FFTW_ESTIMATE uses heuristics for a suboptimal plan
FFTW_PATIENT does an exhaustive search…
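For reference, the transform FFTW computes is the unnormalized DFT, with the chosen sign in the exponent:

```latex
Y_k = \sum_{j=0}^{n-1} X_j \, e^{\mathrm{sign}\cdot 2\pi i\, jk/n},
\qquad k = 0, \dots, n-1
```

FFTW_FORWARD uses the minus sign. Because the transforms are unnormalized, a forward transform followed by a backward one multiplies the data by n.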
1D complex DFT
Once the plan has been created, you can use it as
many times as you like for transforms on the
specified in/out arrays, computing the actual
transforms via fftw_execute(plan):
void fftw_execute(const fftw_plan plan);
The DFT results are stored in-order in the array out,
with the zero-frequency (DC) component in out[0]
If in != out, the transform is out-of-place and the input
array in is not modified
Otherwise, the input array is overwritten with the
transform
1D real valued DFT
For the 1D DFT of real valued data, create the plan with
fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags);
n is the number of elements in the real input array in
real-to-complex DFTs are always FFTW_FORWARD
For going back from complex space to real space:
fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags);
n is the number of elements in the real output array out
complex-to-real DFTs are always FFTW_BACKWARD
The array in is a vector of (n/2 + 1) fftw_complex values
It has space for the Nyquist frequency, with no packing
FFTW and memory layout
FFTW operates on arrays in row-major ("C" ordering)
memory layout
The same as double[L][M][N];
For dynamic arrays, use linear memory and manual indexing
For example, for a 5×12×27 array:

fftw_complex *array = fftw_malloc(5*12*27 * sizeof(fftw_complex));

Reference the (i,j,k)-th element with the expression
array[k + 27 * (j + 12 * i)]
2D and 3D
Multi-dimensional DFTs of real data use the following
planner routines:
fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
double *in, fftw_complex *out, unsigned flags);
fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
double *in, fftw_complex *out, unsigned flags);
Complex-to-real planners have the in argument swapped with out
The dimensions always refer to the real-space array
2D memory layout for real and complex
An array of 2D real data has dimensions n0 × n1 (in row-major order)
After an r2c transform, the output is an n0 × (n1/2 + 1) array of fftw_complex values in row-major order
For out-of-place transforms, this is the end of the story
For in-place transforms, however, extra padding of the real-data array is necessary: the complex array is larger than the real array, and the two arrays share the same memory locations
Thus, for in-place transforms, the final dimension of the real-data array must be padded with extra values to accommodate the size of the complex data: two values if the last dimension is even and one if it is odd
The last dimension of the real data must physically contain 2 * (n1/2 + 1) double values
2D memory layout for real
An array of 2D real data has dimensions n0 × n1 (in row-major order)
The r2c output is an n0 × (n1/2 + 1) array of fftw_complex values, in row-major order
For in-place transforms, the real data are stored in an n0 × 2(n1/2 + 1) array of doubles, in row-major order
Parallel FFTW with MPI
FFTW uses a 1d block distribution of the data, distributed along the first dimension
For example, if you want to perform a 100 × 200 complex DFT distributed over 4 processes, each process will get a 25 × 200 slice of the data
It is critical that you allocate the storage size that is returned by fftw_mpi_local_size, which is not necessarily the size of the local slice of the array
Intermediate steps of FFTW's algorithms involve transposing the array and redistributing the data, so at these intermediate steps FFTW may require more local storage space
MPI 3d real to complex
Obtain the local dimensions of the input data for an L×M×N global 3d FFT, large enough to store the complex result of size L×M×(N/2+1):

ptrdiff_t alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD, &local_n0, &local_0_start);

Allocate your local memory, real and complex:

double *rin = fftw_alloc_real(2 * alloc_local);
fftw_complex *cout = fftw_alloc_complex(alloc_local);

Create a plan for an out-of-place r2c DFT:

fftw_plan plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD, FFTW_MEASURE);

Refer to your local memory using the values returned by fftw_mpi_local_size_3d:

for (i = 0; i < local_n0; ++i)
  for (j = 0; j < M; ++j)
    for (k = 0; k < N; ++k)
      rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start + i, j, k);

Execute the FFT:

fftw_execute(plan);
P3DFFT
Parallel Three-Dimensional Fast Fourier Transforms, dubbed P3DFFT, is a library for large-scale computer simulations on parallel platforms
This project was initiated at the San Diego Supercomputer Center (SDSC) at UC San Diego by its main author Dmitry Pekurovsky, Ph.D.
P3DFFT uses 2D decomposition. This overcomes an important limitation to scalability inherent in FFT libraries implementing 1D decomposition: the number of processors/tasks used to run the problem in parallel can be as large as N², where N is the linear problem size. This approach has shown good scalability up to ½ million cores.
P3DFFT is written in Fortran90 and is optimized for parallel performance. It uses MPI for interprocessor communication, and starting from v2.7.5 there is a multithreading option for a hybrid MPI/OpenMP implementation. A C interface is available
This package depends on a serial FFT library such as FFTW or IBM's ESSL
In the forward transform, given an input array of 3D real values, an output 3D complex array of Fourier coefficients is returned. Current features include:
real-to-complex/complex-to-real FFT in 3D
real-to-complex FFT in 2D followed by a sine/cosine/Chebyshev/empty transform, and the reverse for the backward transform
pruned transforms (less than full input or output)
in-place or out-of-place transforms
multi-variable transforms
multithreaded version (MPI/OpenMP)
P3DFFT memory layout in real space
P3DFFT employs a 2D block decomposition whereby processors are arranged into a 2D grid P1 × P2, based on their MPI rank
The Y and Z dimensions of the 3D grid are block-distributed across the processor grid
The X dimension of the grid remains undivided, contained entirely within local memory
Memory uses "Fortran" ordering; the C equivalent is double[NZ][NY][NX];
P3DFFT memory layout in Fourier space
The output array for the forward transform (and the
input array of the backward transform) contains
(Nx/2 + 1) × Ny × Nz complex numbers
X and Y dimensions of the 3D grid are block-
distributed across the processor grid
Z dimension of the grid remains undivided, contained
entirely within local memory
Memory configurations in Fourier space
The memory layout of the complex array depends on how the P3DFFT library was built
By default, it preserves the ordering of the real array, i.e. (X,Y,Z)
It is possible to have the Z dimension contiguous, i.e. a memory layout (Z,Y,X): this often results in better performance of the P3DFFT transforms themselves. The (Z,Y,X) layout can be triggered by building the library with -DSTRIDE1
On a processor mesh of M1 × M2 processes:

                      Physical space        Fourier space
  STRIDE1 defined     Nx, Ny/M1, Nz/M2      Nz, Ny/M2, (Nx+2)/(2*M1)
  STRIDE1 undefined   Nx, Ny/M1, Nz/M2      (Nx+2)/(2*M1), Ny/M2, Nz
Initialization
Before using the library it is necessary to call an
initialization routine 'p3dfft_setup':
p3dfft_setup(int *dims, int *nx, int *ny, int *nz,
             int *comm, int *nxc, int *nyc, int *nzc, int *ow, int *memsize);

dims[2] contains the P1 × P2 processor mesh size
*nx, *ny, *nz are the global 3D sizes
*comm is the MPI communicator to use
*ow is 1 when doing an in-place transform
memsize[3] declares how many components to retain
(different only in pruned transforms)
P3DFFT local space
After the initialization phase, the size of the local
portion of the global 3D space should be obtained by
P3DFFT:
p3dfft_get_dims(int start[3], int end[3], int size[3], int ip);
The output arrays start[3], end[3], size[3] will contain
the bounds and size of the local array in the z, y, x directions
ip selects the direction:
ip = 1 for dimensions of the real array
ip = 2 for dimensions of the complex array
ip = 3 for dimensions large enough for an in-place FFT
Real to Fourier
Forward transform is done by:
p3dfft_ftran_r2c(double *in, double *out, char op[3]);
in and out can be the same memory
for in-place FFT, memory should be big enough
op is a 3-letter string selecting the operation in x, y, z:
op[0] = op[1] = 'f' for FFT
op[2] can be:
'f' FFT also in z
's' sine transform in z
'c' cosine transform in z
'0' no operation in z
Fourier to real
Backward transform is done by:
p3dfft_btran_c2r(double *in, double *out, char op[3]);
in and out can be the same memory
for an in-place FFT, memory should be big enough
op is a 3-letter string selecting the operation in x, y, z:
op[0] = op[1] = 'f' for FFT
op[2] can be:
'f' FFT also in z
's' sine transform in z
'c' cosine transform in z
'0' no operation in z