Modern tools to manage multi-developer codes
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Outline
Code development
Git versioning system
CMake
FTMake
HDF5
FFTW
P3DFFT
10-14 Oct 2016, F.Bonaccorso, HPC-LEAP
Code development
From source to program
Compilation and compiler
Linking
Make and makefile
Source code with many developers
Code repository
Versioning
Git versioning system
clone
add
commit
push
pull
Branching
checkout
branch
Merge
merge
Conflicts
Finding changes
diff
difftool
CMake
Purpose of CMake
Multi-platform
Generators
CMakeLists.txt
Simple program
Adding source files
Adding library deps
Library
Adding source files
Target dependency
Advanced usage
FTMake: Advanced example based on
CMake
Algorithm selection
CMakeLists.mine
parameter.xml
Portability layer
Machine and user profile
Versioning support
Auto update
Syncing with master branch
Prevents user branch divergence
Tag local variations
Unique id for current build: algo and code
HDF5
Description
Data types
Hierarchy
Group
Dataset
Attributes
Traversing
Files/Group/Datasets
Reading/Writing
Selection
Parallel with MPI
FFTW
Introduction
Coefficients: Plan
Data layout
Real data
Complex data
2D / 3D
P3DFFT
Introduction
Memory configuration
API
Setup
Transform
Code development
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
From source to program
Programming languages are a high-level way of speaking to a computer
Low-level
Little to no abstraction over the machine ISA
Assembly, Forth, ...
High level
Nearer to the human way of thinking
Fortran, C, C++, Java, Python, ...
HPC Applications to Turbulence and Complex Flows, F.Bonaccorso, HPC-LEAP
Compiled vs interpreted
Using a high-level language requires a translation phase
Once and for all: compilation into machine code
Long operation
Optimizations and better running time
Not portable result
At every execution: interpreted
Fast edit/run cycle
Not the best performance for final product
Fortran, C and C++: compiled
On every source file, the compiler is invoked to
produce an object file containing machine code
in this step, many optimizations and source
transformations can be done
All the object files are put together by the linker to
get the executable
Needed libraries are collected here
Make
Make is a common tool to automate the chain:
editing source file -> compile into obj -> link
Its input is a makefile which describes the rules for
descending the chain
Name of the target product and its needed input files
Compiler and flags which are used
Linker and libraries
Make compares file modification times: when you invoke make, it runs only the needed actions
Example Makefile
flux.o: flux.c
	cc -O3 flux.c -c -I/mypath/for/includes
main.o: main.c
	cc -O3 main.c -c -I/myotherpath/for/includes
flux: flux.o main.o
	cc flux.o main.o -o flux -lm -L/mypath/for/fftw -lfftw
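The timestamp-driven rebuild logic can be watched in a tiny self-contained sketch (using `cp` in place of a real compiler, so no `cc` is needed; file names are invented):

```shell
# Scratch directory so nothing touches real files
dir=$(mktemp -d); cd "$dir"
echo 'hello' > flux.c

# One rule: flux.o depends on flux.c (recipe lines must start with a TAB)
printf 'flux.o: flux.c\n\tcp flux.c flux.o\n' > Makefile

make            # flux.o missing: the rule fires
make            # nothing changed: make reports flux.o is up to date
touch flux.c    # simulate an edit (newer modification time)
make            # the rule fires again
```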
Distributed version control
Version control systems are a category of software
tools that help a software team manage changes to
source code over time
Version control software keeps track of every
modification to the code in a special kind of database
(the code repository)
If a mistake is made, developers can go back to a
previous version and compare the code to help fix the
mistake
Concurrent development
Software developers working in teams are
continually writing new source code and changing
existing source code.
The code for a project is typically organized in a
folder structure or "file tree".
One developer on the team may be working on a new feature while another fixes an unrelated bug; each developer may make changes in several parts of the file tree
Fixing conflicts
Version control tracks every individual change by each contributor and helps prevent concurrent work from conflicting
Changes made in one part of the software can be incompatible with those made by another developer working at the same time. This problem should be discovered and solved in an orderly manner without blocking the work of the rest of the team
Further, in all software development, any change can introduce new bugs on its own
Testing and development proceed together until a new version is ready
Git versioning system
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Git commands
clone
add
commit
push
pull
Branching
checkout
branch
Merge
fast-forward
3 way
Conflicts
Examining the repo
status
diff
difftool
Discarding changes
Git: fast versioning system
Git was created to overcome difficulties with other
versioning tools while coding the Linux kernel
Worldwide spread software team
Big number of concurrent feature sub-teams
If it works for Linux, it can scale up
Git imposes no more overhead than zipping all the files and numbering the zips
It's way more than this
• It can be used even in small projects
Cloning the repository
The repository is the database which will contain all
the versions of the code
It can be created locally
git init
Normally you clone a central repository
git clone user@host:path/to/repo.git
Now you have an equivalent copy, locally
local repository
Git config
Git tracks every commit, securing it with a SHA-1
hash as an integrity check
Every commit is by one author, but who am I?
git config --global user.name fabio
git config --global user.email fabio@example.com
There is also the choice to have per-repo options
git config [--local]
Normal workflow
You edit a file
You tell git you want a new version for this file
Repeat for all related files
main.c, sub1.c, sub2.c
Commit the changes explaining them in a comment
git add
Git tracks changes of every file (under its control)
In your working directory there can be files outside git
control: obj, exe, libs, ...
You notify git of a change in FileName with
git add FileName
This command does a backup (stage) of FileName
for future use (in a commit)
Think of your zip file
git commit
When you need to finalize a version, or a little new
step, you do a commit
git commit [-m "Comment for this version"]
At this point all notified changes are grouped and
named with a comment and a SHA-1 number
Now your zip has an official number
The SHA-1 number is the way to refer to the commit
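The whole add/commit cycle above can be run end to end in a throwaway repository (directory, file name and comment are invented for this sketch):

```shell
# Work in a scratch directory so nothing touches a real repo
dir=$(mktemp -d); cd "$dir"

git init -q demo && cd demo
git config user.name  fabio                 # identity, as in the git config slide
git config user.email fabio@example.com

echo 'int main(void){return 0;}' > main.c
git add main.c                              # stage the change
git commit -q -m "Add skeleton of main.c"   # finalize: comment + SHA-1

git log --oneline                           # one line per commit: SHA-1 prefix + comment
```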
git add and commit
The git add and git commit commands compose the fundamental Git workflow
Developing a project is all about the basic edit/stage/commit pattern
Very important is the comment for the commit
Since it will be used some day in the future to look back and recover from errors/bugs, it's crucial to explain the intent of the commit
Viewing the history of your project
Every commit adds up in the project, each one in its
branch
To see the entire history up to current version:
git log
This command will output the list of commits
For an ASCII-art graph:
git log --oneline --all --graph
GUI front-end to git
Command line git is tough
GUI programs to interact with a git repo
Linux: qgit, gitk, git gui, ...
Windows: Git
MacOS: SourceTree, ...
Way better looking history graph
Simple presentation of differences in working dir
Log with GUI tools
Native git look
Same repository using
QGit on Linux
git push
After a commit is in your local repo, you can forward it to the central repository
git push
It will make your progress available to the other
developers
This is where problems can arise
Concurrent changes in the same files
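Lacking a real server, a push can be sketched against a local bare repository standing in for the central one (all names here are invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q --bare central.git           # stands in for the central server

git clone -q central.git work && cd work
git config user.name  fabio
git config user.email fabio@example.com

echo 'v1' > flux.c
git add flux.c
git commit -q -m "First version of flux.c"

git push -q origin HEAD                  # publish the local commits
git --git-dir=../central.git log --oneline   # the commit is now on the "server"
```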
Centralized workflow
This kind of workflow uses a central repository to serve as the single point of entry for all changes to the project
The default development branch is called master and all changes are committed into this branch. This workflow doesn't require any other branches besides master
Developers work in their own local repo; they edit files and commit changes
To publish changes to the official project, developers "push" their local master branch to the central repository
Centralized workflow 2
Git preserves the sequence of commits
git pull
To synchronize your local repo with the remote:
git pull (--rebase)
--rebase will avoid a merge [More later]
This command fetches from the remote the commits made by others
This gives you the chance for a push!
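A pull picking up a colleague's commit can be sketched with two clones of the same bare "central" repository (the names alice, bob and the files are invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q --bare central.git

# "alice" publishes a first commit
git clone -q central.git alice
( cd alice
  git config user.name alice; git config user.email alice@example.com
  echo 'step 1' > log.txt; git add log.txt; git commit -q -m "Step 1"
  git push -q origin HEAD )

# "bob" clones, then alice pushes more work
git clone -q central.git bob
( cd alice
  echo 'step 2' >> log.txt; git commit -q -am "Step 2"
  git push -q origin HEAD )

# bob synchronizes his local repo with the remote
( cd bob
  git config user.name bob; git config user.email bob@example.com
  git pull -q --rebase        # --rebase avoids a merge commit
  git log --oneline )
```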
Centralized workflow 2
Now it is ok!
Branch
A branch represents an independent line of
development
You can think of them as a way to request a brand
new working directory, staging area, and project
history
New commits are recorded in the history for the
current branch, which results in a fork in the history
of the project.
Branch
To create a new branch from current commit, the
command is:
git branch BranchName
It's important to understand that branches are just
pointers to commits
When you create a branch, all Git needs to do is
create a new pointer
it doesn’t change the repository in any other way
Branches: example
checkout
To start using a branch, you check it out:
git checkout BranchName
The creation of a branch followed by a checkout can be done in one command:
git checkout -b BranchName
All the following commits will belong to the branch, with no interference with the other branches
This can be a way to test ideas
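A minimal sketch of branching (branch and file names invented); the commit on the branch leaves the original branch untouched:

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
echo base > file.txt; git add file.txt; git commit -q -m "Base version"

git checkout -q -b big_Feature        # create the branch and switch to it
echo idea >> file.txt
git commit -q -am "Try an idea on a branch"

git checkout -q -                     # back to the previous branch
cat file.txt                          # -> base (the experiment stayed on big_Feature)
```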
Merge
What if a feature you developed in a branch is worth keeping?
Porting back commits of one branch to another is called merging:
git merge BranchToBeMergedHere
Merging is Git's way of putting a forked history back together again
You take the independent lines of development created by git branch and integrate them into a single branch
Merge: fast-forward
When there is a linear path to your current branch
and the commits from the branch to merge, git
simply “moves forward” the history line
After the merge, it’s like everything happened in a
single branch
Merge: fast-forward
Merge: three-way
When fast-forward cannot be applied, git uses a 3-way merge, doing some magic surgery on the history line
Git recovers the last commit of the current branch, the last commit of the branch to be merged and their common ancestor
Merging from these 3 commits gives the name: 3-way merge
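A 3-way merge can be provoked deliberately (all names invented): since the two branches touch different files, git creates a merge commit without conflicts:

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
base=$(git symbolic-ref --short HEAD)      # default branch name varies with git version
echo common > notes.txt; git add notes.txt; git commit -q -m "Common ancestor"

git checkout -q -b big_Feature
echo feature > feature.txt; git add feature.txt; git commit -q -m "Work on the feature"

git checkout -q "$base"                    # meanwhile the base branch moves on too
echo more > other.txt; git add other.txt; git commit -q -m "Unrelated work"

git merge -q --no-edit big_Feature         # no linear path -> 3-way merge commit
git log --oneline --graph                  # history shows the fork and the join
```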
Branch big_Feature merged into master
Rebase vs Merge
Merge keeps both commit histories; rebase rewrites commits
DON'T rebase public commits!
Merge: conflict
If 3-way merge also fails -> Conflict
Git throws its hands up, and it's up to you
Same cycle: edit/add/commit to solve the problem
You must choose what you want to be the resulting
version
It’s safer to have TESTS to check everything is ok
after a merge
Examining the local repo
To examine the status of your local branch:
git status
It will show whether there are differences in your files under version control, and also list everything that is not versioned
If you want to avoid extraneous files:
git status -uno
Beware: this also skips source files not yet added!
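The difference between plain git status and git status -uno can be sketched (file names invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
echo src > main.c; git add main.c; git commit -q -m "Tracked file"

echo tmp > main.o   # build product, never added to git
echo fix > main.c   # a real edit to a tracked file

git status          # lists the modified main.c AND the untracked main.o
git status -uno     # -uno: only tracked files; main.o is skipped
```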
Differences in files
To show the difference between your current implementation and the previous version:
git diff FileName
Textual tool, based on the UNIX command diff
For a graphical visual comparison:
git difftool FileName
Several tools can be configured to be used with git
tkdiff, vimdiff, ...
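A minimal diff sketch (file name and contents invented); git shows removed lines with "-" and added lines with "+":

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
printf 'a\nb\n' > data.txt; git add data.txt; git commit -q -m "Two lines"

printf 'a\nB\n' > data.txt      # change the second line
git diff data.txt               # unified diff: "-b" / "+B"
```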
Differences with cmdline
Differences with GUI
Merge: solving a conflict
You inspect your current situation with
git status
You see the conflicting changes with
git diff (or better with git difftool)
You edit the file deciding the final version
You add the result file
Repeat for every file with conflict
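The resolution steps above, as a sketch (file name, branch name and values invented); both branches change the same line, so the merge stops with a conflict:

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
base=$(git symbolic-ref --short HEAD)
echo 'dt = 0.1' > param.txt; git add param.txt; git commit -q -m "Initial parameters"

git checkout -q -b tuning
echo 'dt = 0.01' > param.txt; git commit -q -am "Smaller time step"

git checkout -q "$base"
echo 'dt = 0.5'  > param.txt; git commit -q -am "Larger time step"

git merge tuning || true        # same line changed on both sides -> conflict
git status --short              # "UU param.txt" marks the unmerged file

echo 'dt = 0.05' > param.txt    # edit: decide the final version by hand
git add param.txt               # add the resolved file
git commit -q -m "Merge tuning, settling on dt = 0.05"
```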
Discarding current changes
To recover the previous version of a file:
git checkout -- FileName
This command destroys every change you did to this
file
You can recover a file at a given version, using the SHA-1 of the commit:
git checkout sha1 FileName
This also overwrites your copy of FileName
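A sketch of discarding a local change (names invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
echo good > flux.c; git add flux.c; git commit -q -m "Known-good flux.c"

echo broken > flux.c            # an experiment gone wrong
git checkout -- flux.c          # throw the local change away
cat flux.c                      # -> good
```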
Back to a previous commit
To recover a previous version of the project:
git checkout sha1
This command recovers the state of a particular version (identified by its sha1)
You can compile, run tests, ... to clarify how things were at that point
Checking out a commit makes the entire working directory match that commit
You can go back safely to your state
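Going back to a previous commit and safely returning, as a sketch (names invented):

```shell
dir=$(mktemp -d); cd "$dir"
git init -q demo && cd demo
git config user.name fabio; git config user.email fabio@example.com
base=$(git symbolic-ref --short HEAD)
echo v1 > model.c; git add model.c; git commit -q -m "Version 1"
old=$(git rev-parse HEAD)      # remember its SHA-1

echo v2 > model.c; git commit -q -am "Version 2"

git checkout -q "$old"         # whole working dir back to Version 1 (detached HEAD)
cat model.c                    # -> v1
git checkout -q "$base"        # safely back to the tip
cat model.c                    # -> v2
```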
Checkout commit vs checkout files
Checking out a commit is a read-only operation
Checking out a file lets you use an old version of that
particular file, leaving the rest of your working
directory untouched
It will show in git as "Changes to be committed"
CMake
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Building system
Many projects use Makefiles to tackle the task of recompiling the executable when sources change
This is a solution from the UNIX world
Pro:
Minimal rebuilds, textual file, fully customizable
But...
Hard to generalize
Not cross-platform
Toward a general Makefile
Unix / Linux world
Autotools (includes autoconf)
qmake (from the Qt library)
Portable
CMake
CMake
CMake is a cross-platform generator
Can produce Makefiles, Visual Studio solutions, Eclipse
projects, ...
It has a simple syntax and needs 1 file
CMakeLists.txt
With two extra features:
CTest: automates testing
CPack: creates installation packages
Basic usage of CMake
In a basic use, we need this CMakeLists.txt:
project(HelloWorld)
add_executable(hello hello.c)
Then we can run cmake:
Project with two programming languages
We have C and C++ source files, so:
project(HelloWorld)
enable_language(CXX)
add_executable(hello hello.c main.cpp)
Other generators: Eclipse
To list available generators:
cmake -G
The output is platform dependent; on Linux we have:
Unix Makefiles
Ninja
Watcom WMake
CodeBlocks - Ninja
CodeBlocks - Unix Makefiles
CodeLite - Ninja
CodeLite - Unix Makefiles
Eclipse CDT4 - Ninja
Eclipse CDT4 - Unix Makefiles
KDevelop3
KDevelop3 - Unix Makefiles
Kate - Ninja
Kate - Unix Makefiles
Sublime Text 2 - Ninja
Sublime Text 2 - Unix Makefiles
Multi/language example in Eclipse CDT
Multi/language example in Eclipse CDT
String variables
To set a variable to a string value:
set(varName Value)
To print a message to the console:
message("String")
Variables are referenced with ${varName}, so in a message:
message("The variable is ${varName}")
Compiler and linker options
In CMake there are special named variables for controlling some behaviour:
CMAKE_Fortran_FLAGS, CMAKE_C_FLAGS, CMAKE_CXX_FLAGS for the compiler flags
For each one there is a debug/release version:
CMAKE_C_FLAGS_DEBUG, CMAKE_C_FLAGS_RELEASE
For example, we can set:
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wall")
set(CMAKE_C_FLAGS_DEBUG "-O0 -g")
set(CMAKE_C_FLAGS_RELEASE "-O3")
Default configuration
It’s useful to define a default build type:
if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release")
endif()
We can choose the DEBUG config at command line:
cmake -DCMAKE_BUILD_TYPE=Debug ..
Libraries
We can structure the project to use a library:
project( mylibrary )
set( mylib_SRCS library.c )
add_library( my [SHARED] ${mylib_SRCS} )
The keyword SHARED switches from a static to a dynamic (shared) library
Simple library
Building the library:
Project with one library
To build a library used by the main program, we
should combine the add_executable and
add_library in CMakeLists.txt:
project( myproj )
set( mylib_SRCS library.c )
add_library( my ${mylib_SRCS} )
add_executable( hello hello.c )
target_link_libraries( hello my )
Project with one library
List Variables and loop
To set a variable to a list, simply list the elements with a space separator, e.g.:
set(FRUITS Apple Banana Orange Kiwi Mango)
To loop over the elements of list variable:
foreach(varName ${varListName}) / endforeach()
A simple example with message:
foreach(fruit ${FRUITS})
message("${fruit} is a tasty fruit")
endforeach()
If & then
The language of CMake is powerful enough to include the usual if/then/else construct, leading to complex CMakeLists.txt
Example:
if(ENABLE_MPI)
message("MPI is enabled")
else()
message("MPI is disabled")
endif()
Using external libraries
CMake has the built-in ability to use libraries in the
default installation path (platform dependant)
For the non-standard case there are special commands:
Headers: INCLUDE_DIRECTORIES
Libraries: FIND_LIBRARY, then link with its result
Going further
Built on top of INCLUDE_DIRECTORIES and FIND_LIBRARY, CMake has ways to search for widespread software libraries:
FIND_PACKAGE( LibName [REQUIRED] )
The keyword REQUIRED tells CMake to abort if it fails to find the library
Example:
FIND_PACKAGE( MPI REQUIRED )
FTMake: advanced build system
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
Introduction
Based on CMake, FTMake is a portable build system
used in our research group
Main features
Portability layer
Algorithm selection
Policy for the versioning system
Testing framework
UUID identification of builds (in progress)
Portability layer
FTMake is made to run codes on a wide range of
platforms
PC (Win/MacOS/Linux)
Little clusters of Linux servers
Computer centers
The required libraries are handled via standard CMake-based techniques...
They work correctly on PCs
Computer centers mostly provide alternative versions of libraries
For different compilers, legacy versions, ...
Portability layer
In FTMake, there are two ways to customize the
path for some binaries / libraries
Host name based
User name based
The central list of customizations is the file
FTMAKE_DIR/cmake/hooks.cmake
They are added under the folder
FTMAKE_DIR/cmake/HOST-xxxx.cmake
FTMAKE_DIR/cmake/USER-xxxx
Host name based configuration
In hooks.cmake, the variable
HOOKS_KNOWN_HOST is set to the list of extra
configuration host-based files:
set(HOOKS_KNOWN_HOST droemu.roma2.infn.it fen01 fen02
…)
For each entry there is a corresponding file:
cmake/HOST-droemu.roma2.infn.it.cmake
cmake/HOST-fen01.cmake
cmake/HOST-fen02.cmake
Example HOST-xxxx.cmake
message(STATUS "You are on fen01/02/07/08 aka fermi front-end")
set(TIMING "YES")
set(WANT_CURL "NO")
set(PROF_TIMING "NO")
set(MPI_C_FOUND TRUE)
set(MPI_C_LIBRARIES "m")
set(MPI_C_INCLUDE_PATH " ")
set(MPI_Fortran_FOUND TRUE)
set(MPI_Fortran_LIBRARIES "m")
set(MPI_Fortran_INCLUDE_PATH " ")
Algorithm selection
FTMake implements the selection of the flags needed by C codes
In parameter.xml (in the source dir) there is a hierarchical listing of all the flags of the code
For each flag, there is the name and optionally:
whether it needs a variable (type, optional or not)
whether the program needs one more source file
Policy for the versioning system
With git you can work in a feature branch, without
disturbing the master branch
At some point the feature will be merged back
As time passes, the merge becomes more and more likely to conflict
FTMake implements a policy to merge the master
branch into the feature branch
Testing framework and continuous integration
FTMake also enforces the use of a testing framework based on CTest
At every commit, a central gitlab server is notified to
schedule a test suite
UUID identification of builds
Every (successful) compilation is labeled with a
unique string (UUID)
This UUID identifies the particular combination of
flags, repository branch and software used to build
the code
A central database is notified…
The simulation experiments can be related to the
UUID
HDF5 library
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
HDF5: Introduction
Hierarchical Data Format 5 (HDF5) is an open source technology suite for managing data collections of all sizes and complexity
HDF5 was specifically designed for:
high volume and/or complex data (but can be used for low volume and simple data)
every size and type of system (portable)
flexible, efficient storage and I/O
fast on parallel machines
HDF5 is similar to XML...
files are self-describing and allow users to specify complex data relationships and dependencies
...But HDF5 files can contain binary data and allow direct access to parts of the file without first parsing the entire contents
It is cross-platform, with serial and MPI APIs and Fortran, C bindings
Hierarchy of HDF5 files
Modeled after the directory/file structure, an HDF5 file consists of
groups (analogous to dirs, can be nested)
datasets (contain data)
attributes (carry info about data, meta-data)
The name of a dataset contains its groups:
/Euler/velocity
/Particle/positions
Tools: h5ls, h5dump
To list the content of an HDF5 file:
h5ls [-r] FileName
Example output:
$ h5ls -r test.h5
/ Group
/IntArray Dataset {8, 5}
Tools: h5ls, h5dump
To see the content of an HDF5 file:
h5dump [options] FileName
Example:
$ h5dump -H test.h5
HDF5 "test.h5" {
GROUP "/" {
DATASET "IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 8, 5 ) / (
8, 5 ) }
}
}
}
$ h5dump test.h5
HDF5 "test.h5" {
GROUP "/" {
DATASET "IntArray" {
DATATYPE H5T_STD_I32LE
DATASPACE SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
DATA {
(0,0): 10, 10, 10, 10, 10,
(1,0): 10, 10, 10, 10, 10,
(2,0): 11, 11, 11, 11, 11,
(3,0): 11, 11, 11, 11, 11,
(4,0): 12, 12, 12, 12, 12,
(5,0): 12, 12, 12, 12, 12,
(6,0): 13, 13, 13, 13, 13,
(7,0): 13, 13, 13, 13, 13
}
}
}
}
File operations
In HDF5 you need to use the concept of file
Same as the Standard C Library
Instead of fopen/fclose:
hid_t H5Fopen(const char *name, unsigned flags, hid_t fapl_id);
hid_t H5Fcreate(const char *name, unsigned flags, hid_t fcpl_id, hid_t fapl_id);
herr_t H5Fclose(hid_t file_id);
The returned hid_t is the IDentifier, analogous to the FILE* of the C standard library
Dataset
Data (vector, matrix) must be written in a dataset
A dataset has:
rank (1,2,3,... for 1D,2D,3D,... data)
type (H5T_NATIVE_[INT|FLOAT|DOUBLE])
size (10 elems, 3x2 elems, 4x4x4 elems)
HDF5 Lite: make_dataset
Using the Lite API, this is all we need to write some data:
hid_t file_id;
hsize_t dims[2]={2,3};
int data[6]={1,2,3,4,5,6};
/* create a HDF5 file */
file_id = H5Fcreate ("ex_lite1.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
/* create and write an integer type dataset named "dset" */
H5LTmake_dataset(file_id,"/dset", 2, dims, H5T_NATIVE_INT,data);
/* close file */
H5Fclose (file_id);
HDF5 Lite: read_dataset
To read back:
int data[6];
hsize_t dims[2];
/* open file from ex_lite1.c */
hid_t file_id = H5Fopen ("ex_lite1.h5", H5F_ACC_RDONLY, H5P_DEFAULT);
/* read dataset */
H5LTread_dataset(file_id,"/dset", H5T_NATIVE_INT , data); /*data[]={1,2,3,4,5,6}*/
/* get the dimensions of the dataset */
H5LTget_dataset_info(file_id,"/dset",dims,NULL,NULL); /* now dims[2]={2,3} */
/* close file */
H5Fclose (file_id);
The low level API
When you need finer control, you explicitly take care
of creating/opening + closing a:
group H5Gcreate/H5Gopen + H5Gclose
dataset H5Dcreate/H5Dopen + H5Dclose
Each create or open function exists with 2 names
create1/open1 For old v1.6 HDF5
create2/open2 From HDF5 v1.8
The create2 and open2 forms take more parameters
(property list)
H5P_DEFAULT can always be used
Dataspace
A dataspace describes the dimensionality of the data array
A dataspace that is a regular N-dimensional array of data points is called a simple dataspace
a more general collection of data points exists, called a complex dataspace
The dimensions of a dataset can be fixed (unchanging), or they may be unlimited, which means that they are extensible
A dataspace can also describe a portion of a dataset, making it possible to do partial I/O operations on selections
Creating dataspaces
To create a dataspace:
hid_t H5Screate_simple(int rank, const hsize_t *dims,
const hsize_t * maximum_dims);
The parameter maximum_dims can be NULL
Fixed size equal to dims
Example, 4x6 elems:
hsize_t dims[2] = {4,6};
hid_t dataspace_id = H5Screate_simple(2, dims, NULL);
Creation of a dataset
Define the dataset characteristics:
Define a datatype or specify a pre-defined datatype
A user-defined datatype is needed when dealing with structures
Define a dataspace
Specify the property list(s) or use the default
For the storage: contiguous, chunked, compressed
Fixed or extensible
H5Dcreate
To create a dataset:
hid_t H5Dcreate2( hid_t loc_id, const char *name, hid_t dtype_id, hid_t space_id, hid_t lcpl_id, hid_t dcpl_id, hid_t dapl_id );
loc_id can be a file_id or a group_id
lcpl_id, dcpl_id and dapl_id are the link creation, dataset creation and dataset access property lists for non-default features
Example:
hsize_t dims[] = { 4,6 };
hid_t dataspace_id = H5Screate_simple(2, dims, NULL);
/* Create the dataset. */
hid_t dataset_id = H5Dcreate2(file_id, "/dset", H5T_STD_I32BE,
dataspace_id, H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
H5Dopen
To use a pre-existing dataset, we open it:
hid_t H5Dopen2 (hid_t loc_id, const char *name, hid_t
dapl_id );
loc_id can be a file_id or a group_id
name is relative to "/" or to the group
dapl_id is the access dataset property list for non-
default features
H5Dwrite
To write actual data to a dataset:
herr_t H5Dwrite(hid_t dataset_id, hid_t mem_type_id,
hid_t mem_space_id, hid_t file_space_id,
hid_t xfer_plist_id, const void * buf);
dataset_id is the dataset
mem_type_id is the type of the data in memory
mem_space_id and file_space_id are the memory and file dataspace selections, or H5S_ALL
xfer_plist_id is the plist of the transfer or
H5P_DEFAULT
H5Dwrite: example
/* Open an existing file. */
hid_t file_id = H5Fopen(FILE, H5F_ACC_RDWR,H5P_DEFAULT);
/* Open an existing dataset. */
hid_t dataset_id = H5Dopen2(file_id, "/dset",H5P_DEFAULT);
/* Write the dataset. */
int dset_data[6] = { 1,2,3,4,5,6 };
status = H5Dwrite(dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
H5Dread
To read from a dataset:
herr_t H5Dread(hid_t dataset_id, hid_t mem_type_id,
hid_t mem_space_id, hid_t file_space_id,
hid_t xfer_plist_id, void * buf);
Example:
status = H5Dread(dataset_id, H5T_NATIVE_INT,
H5S_ALL, H5S_ALL, H5P_DEFAULT, dset_data);
HDF5 library usage
During reads and writes the HDF5 library performs type conversion
This is what makes it cross-platform
This conversion can be slow
Files, groups, datasets, ... MUST be closed when not needed
Memory leaks and strange behaviour otherwise
Selection: partial write
Imagine a 2d dataset:
You want a partial write:
we select and write to a 3 x 4 subset of the dataset with an offset of 1 x 2
Partial write: example
/* Create memory space with size of subset. Get file dataspace
and select subset from file dataspace. */
hsize_t dimsm[2] = { 3,4 };
hid_t memspace_id = H5Screate_simple(2, dimsm, NULL);
hid_t dataspace_id = H5Dget_space(dataset_id);
hsize_t offset[2] = { 1,2 };
hsize_t count[2] = { 3,4 };
herr_t status = H5Sselect_hyperslab(dataspace_id, H5S_SELECT_SET,
offset, NULL, count, NULL);
/* Write a subset of data to the dataset */
status = H5Dwrite(dataset_id, H5T_NATIVE_INT, memspace_id,
dataspace_id, H5P_DEFAULT, sdata);
Hyperslab selection
To select part of a dataspace:
herr_t H5Sselect_hyperslab(hid_t space_id, H5S_seloper_t op,
const hsize_t *start, const hsize_t *stride,
const hsize_t *count, const hsize_t *block);
space_id identifies the dataspace
op is H5S_SELECT_SET or an additive operation
start is the offset of the 1st element in space_id
count is the number of blocks to select
stride is the distance in space_id between elems
NULL means 1
block is the size of the block
NULL means 1(x1x1x...)
Hyperslab examples
This is OK
This is outside the dataspace
Hyperslab examples
Parallel interface (MPI)
The parallel interface requires MPI and MPI-IO
First operation is to open a parallel file with an MPI communicator
It returns a file handle to be used for future access to the file
All processes are required to participate in the collective Parallel HDF5 API
Different files can be opened using different communicators
Examples of what you can do with the Parallel HDF5 collective API:
File operations: create, open and close a file
Object creation: create, open, and close a dataset
Object structure: extend a dataset (increase dimension sizes)
Dataset operations: write to or read from a dataset
Once a file is opened by the processes of a communicator:
All parts of the file are accessible by all processes
All objects in the file are accessible by all processes
Multiple processes can write to the same dataset
Each process can write to an individual dataset
Parallel file operations 1
We must use the property list feature to set up a
parallel access list for the file, then we bind the
access list to the MPI communicator:
herr_t H5Pset_fapl_mpio( hid_t fapl_id, MPI_Comm
comm, MPI_Info info );
Example:
MPI_Comm comm = MPI_COMM_WORLD;
MPI_Info info = MPI_INFO_NULL;
MPI_Init(&argc, &argv);
hid_t plist_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_fapl_mpio(plist_id, comm, info);
Parallel file operations 2
For parallel write/read of a dataset, PHDF5 uses the
hyperslab concept
Now each process defines its own contribution to the
dataset
There are two strategies in PHDF5 for each transfer:
Individual: every MPI process transfers its data independently
Collective: a buffering algorithm in PHDF5 aggregates the transfers
Parallel file operations 3
One strategy is to write by contiguous slabs
Contiguous hyperslabs

hsize_t dimsf[2] = { 8, 5 };
count[0] = dimsf[0] / mpi_size;
count[1] = dimsf[1];
offset[0] = mpi_rank * count[0];
offset[1] = 0;
filespace = H5Dget_space(dset_id);
H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
hid_t plist_id = H5Pcreate(H5P_DATASET_XFER);
H5Pset_dxpl_mpio(plist_id, H5FD_MPIO_COLLECTIVE);
hid_t memspace = H5Screate_simple(2, count, NULL);
status = H5Dwrite(dset_id, H5T_NATIVE_INT, memspace, filespace, plist_id, data);
Regularly spaced
Using two processes that write to the same dataset, each writing to every other column in the dataset.
For each process the hyperslab in the file is set up as follows:

count[0] = 1;
count[1] = dims_mem[1];
offset[0] = 0;
offset[1] = mpi_rank;
stride[0] = 1;
stride[1] = 2;
block[0] = dims_file[0];
block[1] = 1;

The stride is 2 for dimension 1 to indicate that every other position along this dimension will be written to
A stride of 1 indicates that every position along a dimension will be written to
For two processes, mpi_rank will be either 0 or 1. Therefore:
Process 0 writes to even columns (0, 2, 4, ...)
Process 1 writes to odd columns (1, 3, 5, ...)
By pattern
Using 4 processes to write the pattern shown below:
Each process defines a hyperslab by:
Specifying a stride of 2 for each dimension, which indicates that you wish to write to every other position along a dimension
Specifying a different offset for each process:
Process 0: offset[0] = 0, offset[1] = 0
Process 1: offset[0] = 1, offset[1] = 0
Process 2: offset[0] = 0, offset[1] = 1
Process 3: offset[0] = 1, offset[1] = 1
By chunk
Using 4 processes to write the pattern shown below:
Use the block parameter to specify a chunk of size 4 x 2
Use a different offset (start) for each process, based on
the chunk size:
Process 0: offset[0] = 0, offset[1] = 0
Process 1: offset[0] = 0, offset[1] = 2
Process 2: offset[0] = 4, offset[1] = 0
Process 3: offset[0] = 4, offset[1] = 2
FFT: FFTW and P3DFFT
HPC Applications to Turbulence and Complex Flows
Rome, 10-14 October 2016
FFTW
FFTW, the Fastest Fourier Transform in the West, is a collection of fast C routines for computing the discrete Fourier transform (DFT)
FFTW computes the DFT of complex data, real data, even- or odd-symmetric real data (these symmetric transforms are usually known as the discrete cosine or sine transform, respectively), and the discrete Hartley transform (DHT) of real data
The input data can have arbitrary length. FFTW employs O(n log n) algorithms for all lengths, including prime numbers
FFTW supports arbitrary multi-dimensional data
FFTW supports the SSE, SSE2, AVX, AVX2, AVX512, KCVI, Altivec, VSX, and NEON vector instruction sets
FFTW includes parallel (multi-threaded) transforms for shared-memory systems
Starting with version 3.3, FFTW includes distributed-memory parallel transforms using MPI
FFTW: Basic definitions
FFTW uses real and complex numbers:
fftw_complex for complex data; real data are plain double
Allocating an array of N complex elements:
fftw_complex *a = fftw_malloc(sizeof(fftw_complex) * N);
Before computing the DFT, some prefactors need to
be calculated. FFTW uses an object, called a plan,
to store these numbers
Many planning strategies are available: planning time vs transform speed
FFTW: 1D complex plan
To create a plan for 1D:
fftw_plan fftw_plan_dft_1d(int n, fftw_complex *in,
fftw_complex *out, int sign, unsigned flags);
n is the dimension of the input (and output) array
sign is FFTW_FORWARD (-1) or FFTW_BACKWARD (+1):
the sign of the exponent
flags can be one of:
FFTW_MEASURE runs some tests to time the best algorithm
FFTW_ESTIMATE uses heuristics for a suboptimal plan
FFTW_PATIENT does an exhaustive search…
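For reference, the transform FFTW computes is the unnormalized DFT, with the chosen sign in the exponent:

```latex
Y_k = \sum_{j=0}^{n-1} X_j \, e^{\mathrm{sign}\cdot 2\pi i\, jk/n},
\qquad k = 0, \dots, n-1
```

FFTW_FORWARD uses the minus sign. Because the transforms are unnormalized, a forward transform followed by a backward one multiplies the data by n.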
1D complex DFT
Once the plan has been created, you can use it as
many times as you like for transforms on the
specified in/out arrays, computing the actual
transforms via fftw_execute(plan):
void fftw_execute(const fftw_plan plan);
The DFT results are stored in-order in the array out,
with the zero-frequency (DC) component in out[0]
If in != out, the transform is out-of-place and the input
array in is not modified
Otherwise, the input array is overwritten with the
transform
1D real valued DFT
For the 1D DFT of real valued data, create the plan with
fftw_plan fftw_plan_dft_r2c_1d(int n, double *in, fftw_complex *out, unsigned flags);
n is the number of elements in the real input array in
real-to-complex DFTs are always FFTW_FORWARD
For going back from complex space to real space:
fftw_plan fftw_plan_dft_c2r_1d(int n, fftw_complex *in, double *out, unsigned flags);
n is the number of elements in the real output array out
complex-to-real DFTs are always FFTW_BACKWARD
The array in is a vector of (n/2 + 1) fftw_complex values
It has space for the Nyquist frequency, with no packing
FFTW and memory layout
FFTW operates on arrays in row-major ("C" ordering)
memory layout
The same as double[L][M][N];
For dynamic arrays, use linear memory and manual indexing
For example, for a 5×12×27 array:

fftw_complex *array = fftw_malloc(5*12*27 * sizeof(fftw_complex));

Reference the (i,j,k)-th element with the expression
array[k + 27 * (j + 12 * i)]
2D and 3D
Multi-dimensional DFTs of real data use the following
planner routines:
fftw_plan fftw_plan_dft_r2c_2d(int n0, int n1,
double *in, fftw_complex *out, unsigned flags);
fftw_plan fftw_plan_dft_r2c_3d(int n0, int n1, int n2,
double *in, fftw_complex *out, unsigned flags);
Complex-to-real planners have the in argument swapped with out
The dimensions always refer to the real-space array
2D memory layout for real and complex
An array of 2D real data has dimensions n0 × n1 (in row-major order)
After an r2c transform, the output is an n0 × (n1/2 + 1) array of fftw_complex values in row-major order
For out-of-place transforms, this is the end of the story
For in-place transforms, however, extra padding of the real-data array is necessary: the complex array is larger than the real array, and the two arrays share the same memory locations
Thus, for in-place transforms, the final dimension of the real-data array must be padded with extra values to accommodate the size of the complex data: two values if the last dimension is even and one if it is odd
The last dimension of the real data must physically contain 2 * (n1/2 + 1) double values
2D memory layout for real
An array of 2D real data has dimensions n0 × n1 (in row-major order)
The r2c output is an n0 × (n1/2 + 1) array of fftw_complex values, in row-major order
For in-place transforms, the real data are stored in an n0 × 2(n1/2 + 1) array of doubles, in row-major order
Parallel FFTW with MPI
FFTW uses a 1d block distribution of the data, distributed along the first dimension
For example, if you want to perform a 100 × 200 complex DFT distributed over 4 processes, each process will get a 25 × 200 slice of the data
It is critical that you allocate the storage size that is returned by fftw_mpi_local_size, which is not necessarily the size of the local slice of the array
Intermediate steps of FFTW's algorithms involve transposing the array and redistributing the data, so at these intermediate steps FFTW may require more local storage space
MPI 3d real to complex
Obtain the local dimensions of the input data for an L×M×N global 3d FFT, large enough to store the complex result of size L×M×(N/2+1):

ptrdiff_t alloc_local = fftw_mpi_local_size_3d(L, M, N/2+1, MPI_COMM_WORLD, &local_n0, &local_0_start);

Allocate your local memory, real and complex:

double *rin = fftw_alloc_real(2 * alloc_local);
fftw_complex *cout = fftw_alloc_complex(alloc_local);

Create a plan for an out-of-place r2c DFT:

fftw_plan plan = fftw_mpi_plan_dft_r2c_3d(L, M, N, rin, cout, MPI_COMM_WORLD, FFTW_MEASURE);

Refer to your local memory using the values returned by fftw_mpi_local_size_3d:

for (i = 0; i < local_n0; ++i)
  for (j = 0; j < M; ++j)
    for (k = 0; k < N; ++k)
      rin[(i*M + j) * (2*(N/2+1)) + k] = my_func(local_0_start + i, j, k);

Execute the FFT:

fftw_execute(plan);
P3DFFT
Parallel Three-Dimensional Fast Fourier Transforms, dubbed P3DFFT, is a library for large-scale computer simulations on parallel platforms
This project was initiated at the San Diego Supercomputer Center (SDSC) at UC San Diego by its main author Dmitry Pekurovsky, Ph.D.
P3DFFT uses 2D decomposition. This overcomes an important limitation to scalability inherent in FFT libraries implementing 1D decomposition: the number of processors/tasks used to run the problem in parallel can be as large as N², where N is the linear problem size. This approach has shown good scalability up to ½ million cores.
P3DFFT is written in Fortran90 and is optimized for parallel performance. It uses MPI for interprocessor communication, and starting from v2.7.5 there is a multithreading option for a hybrid MPI/OpenMP implementation. A C interface is available
This package depends on a serial FFT library such as FFTW or IBM's ESSL
In the forward transform, given an input array of 3D real values, an output 3D complex array of Fourier coefficients is returned. Current features include:
real-to-complex/complex-to-real FFT in 3D
real-to-complex FFT in 2D followed by a sine/cosine/Chebyshev/empty transform, and the reverse for the backward transform
pruned transforms (less than full input or output)
in-place or out-of-place transforms
multi-variable transforms
multithreaded version (MPI/OpenMP)
P3DFFT memory layout in real space
P3DFFT employs a 2D block decomposition whereby processors are arranged into a 2D grid P1 × P2, based on their MPI rank
The Y and Z dimensions of the 3D grid are block-distributed across the processor grid
The X dimension of the grid remains undivided, contained entirely within local memory
Memory uses "Fortran" ordering; the C equivalent is double[NZ][NY][NX];
P3DFFT memory layout in Fourier space
The output array for the forward transform (and the
input array of the backward transform) contains
(Nx/2 + 1) × Ny × Nz complex numbers
X and Y dimensions of the 3D grid are block-
distributed across the processor grid
Z dimension of the grid remains undivided, contained
entirely within local memory
Memory configurations in Fourier space
The memory layout of the complex array depends on how the P3DFFT library was built
By default, it preserves the ordering of the real array, i.e. (X,Y,Z)
It is possible to have the Z dimension contiguous, i.e. a memory layout (Z,Y,X): this often results in better performance of the P3DFFT transforms themselves. The (Z,Y,X) layout can be triggered by building the library with -DSTRIDE1
On a processor mesh of M1 × M2 processes:

                      Physical space        Fourier space
  STRIDE1 defined     Nx, Ny/M1, Nz/M2      Nz, Ny/M2, (Nx+2)/(2*M1)
  STRIDE1 undefined   Nx, Ny/M1, Nz/M2      (Nx+2)/(2*M1), Ny/M2, Nz
Initialization
Before using the library it is necessary to call an
initialization routine 'p3dfft_setup':
p3dfft_setup(int *dims, int *nx, int *ny, int *nz,
             int *comm, int *nxc, int *nyc, int *nzc, int *ow, int *memsize);

dims[2] contains the P1 × P2 processor mesh size
*nx, *ny, *nz are the global 3D sizes
*comm is the MPI communicator to use
*ow is 1 when doing an in-place transform
memsize[3] declares how many components to retain
(different only in pruned transforms)
P3DFFT local space
After the initialization phase, the size of the local
portion of the global 3D space should be obtained by
P3DFFT:
p3dfft_get_dims(int start[3], int end[3], int size[3], int ip);
The output arrays start[3], end[3], size[3] will contain
the bounds and size of the local array in the z, y, x directions
ip selects the direction:
ip = 1 for dimensions of the real array
ip = 2 for dimensions of the complex array
ip = 3 for dimensions large enough for an in-place FFT
Real to Fourier
Forward transform is done by:
p3dfft_ftran_r2c(double *in, double *out, char op[3]);
in and out can be the same memory
for in-place FFT, memory should be big enough
op is a 3-letter string selecting the operation in x, y, z:
op[0] = op[1] = 'f' for FFT
op[2] can be:
'f' FFT also in z
's' sine transform in z
'c' cosine transform in z
'0' no operation in z
Fourier to real
Backward transform is done by:
p3dfft_btran_c2r(double *in, double *out, char op[3]);
in and out can be the same memory
for an in-place FFT, memory should be big enough
op is a 3-letter string selecting the operation in x, y, z:
op[0] = op[1] = 'f' for FFT
op[2] can be:
'f' FFT also in z
's' sine transform in z
'c' cosine transform in z
'0' no operation in z