Introduction Generative Programming NT2 Quaff Conclusion
Generative and Meta-Programming
Modern C++ Design for Parallel Computing
Joel Falcou
05/02/2011
LRI, University Paris Sud XI
LSU Seminar
Context
In Scientific Computing ...
there is Scientific
Applications are domain driven
Users ≠ Developers
Users are reluctant to change
there is Computing
Computing requires performance ...
... which implies architecture-specific tuning
... which requires expertise
... which may or may not be available
The Problem
People using computers to do science want to do science first.
The Problem – and how we want to solve it
The Facts
The "Library to bind them all" doesn't exist (or we would have it already)
All those users want to take advantage of new architectures
Few of them want to actually handle all the dirty work
The Ends
Provide a "familiar" interface that lets users benefit from parallelism
Help compilers generate better parallel code
Increase sustainability by decreasing the amount of code to write
The Means
Generative Programming
Template Meta-Programming
Embedded Domain Specific Languages
Talk Layout
1 Introduction
2 Generative Programming
3 NT2
4 Quaff
5 Conclusion
Generative Programming
[Figure: Domain Specific Application Description → Generative Component (Translator, Parametric Sub-components) → Concrete Application]
Generative Programming as a Tool
Available techniques
Dedicated compilers
External pre-processing tools
Languages supporting meta-programming
Definition of Meta-programming
Meta-programming is the writing of computer programs that analyse, transform and generate other programs (or themselves) as their data.
From Generative to Meta-programming
Meta-programmable languages
Template Haskell
MetaOCaml
C++
C++ meta-programming
Relies on the C++ template sub-language
Handles types and integral constants at compile-time
Proved to be Turing-complete
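To make "handles types and integral constants at compile-time" concrete, here is a minimal sketch. The names factorial and largest are illustrative and do not come from the talk; the sketch only assumes a C++11 compiler.

```cpp
#include <type_traits>

// Compile-time computation on an integral constant: the recursion is
// carried out by template instantiation, so nothing runs at execution time.
template<unsigned N>
struct factorial
{
  static const unsigned long value = N * factorial<N - 1>::value;
};

// Full specialization that ends the recursion.
template<>
struct factorial<0>
{
  static const unsigned long value = 1;
};

// Compile-time computation on types: select the larger of two types.
// (Hypothetical helper, named here for illustration only.)
template<typename A, typename B>
struct largest
{
  typedef typename std::conditional<(sizeof(A) >= sizeof(B)), A, B>::type type;
};
```

Both results are plain constants and types by the time the program is compiled, for example factorial<5>::value can be used as an array size and largest<char, double>::type names double.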
Embedded Domain Specific Languages
What's an EDSL?
DSL = Domain Specific Language
Declarative language, easy-to-use, fitting the domain
EDSL = DSL within a general purpose language
EDSL in C++
Relies on operator overload abuse (Expression Templates)
Carries semantic information around code fragments
Generic implementations become self-aware of optimizations
Exploiting static AST
At the expression level: code generation
At the function level: inter-procedural optimization
Expression Templates
matrix x(h,w),a(h,w),b(h,w);
x = cos(a) + (b*a);

expr<assign, expr<matrix&>, expr<plus, expr<cos, expr<matrix&> >, expr<multiplies, expr<matrix&>, expr<matrix&> > > >(x,a,b);

[Figure: AST of the expression. The root = has children x and +; + has children cos(a) and *; * has children b and a.]

#pragma omp parallel for
for(int j=0;j<h;++j)
{
  for(int i=0;i<w;++i)
  {
    x(j,i) = cos(a(j,i)) + ( b(j,i) * a(j,i) );
  }
}
Arbitrary transforms applied on the meta-AST
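The mechanism behind this slide can be sketched in a few lines. This is not NT2's implementation: vec3, binary, plus_ and times_ are hypothetical names, the container is a fixed 3-element vector instead of a matrix, and cos is left out to keep the sketch short. It only shows how overloaded operators build a tree type that is evaluated in a single loop at assignment.

```cpp
#include <cstddef>

// CRTP base: tags a type as a node of the expression tree.
template<typename E>
struct expr
{
  constexpr E const& self() const { return static_cast<E const&>(*this); }
};

// Operation tags carried in the type of each inner node.
struct plus_  { static constexpr double apply(double l, double r) { return l + r; } };
struct times_ { static constexpr double apply(double l, double r) { return l * r; } };

// Inner node: stores references to its children, computes nothing
// until an element is requested.
template<typename Op, typename L, typename R>
struct binary : expr< binary<Op, L, R> >
{
  L const& lhs;
  R const& rhs;
  constexpr binary(L const& l, R const& r) : lhs(l), rhs(r) {}
  constexpr double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
};

// Terminal node: a tiny fixed-size vector standing in for a matrix.
struct vec3 : expr<vec3>
{
  double data[3];
  constexpr vec3(double a, double b, double c) : data{a, b, c} {}
  constexpr double operator[](std::size_t i) const { return data[i]; }

  // Assignment walks the tree once per element: one loop, no temporary vectors.
  template<typename E>
  vec3& operator=(expr<E> const& e)
  {
    for(std::size_t i = 0; i < 3; ++i) data[i] = e.self()[i];
    return *this;
  }
};

// The operators only build the tree type; they do not compute anything.
template<typename L, typename R>
constexpr binary<plus_, L, R> operator+(expr<L> const& l, expr<R> const& r)
{
  return binary<plus_, L, R>(l.self(), r.self());
}

template<typename L, typename R>
constexpr binary<times_, L, R> operator*(expr<L> const& l, expr<R> const& r)
{
  return binary<times_, L, R>(l.self(), r.self());
}
```

With vec3 x(0,0,0), a(1,2,3), b(4,5,6), the statement x = a + b * a; has a right-hand side of type binary<plus_, vec3, binary<times_, vec3, vec3> >, which is the kind of static AST a library can inspect and transform before emitting the loop.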
What is NT2?
A Scientific Computing Library
Provide a simple, MATLAB-like interface for users
Provide high-performance computing entities and primitives
Easily extensible
Principles
Take a .m file, copy to a .cpp file
Add #include <nt2/nt2.hpp> and make cosmetic changes
Compile the file and link with libnt2.a
MATLAB, you said?
R = I(:,:,1);
G = I(:,:,2);
B = I(:,:,3);

Y = min(abs(0.299.*R+0.587.*G+0.114.*B),235);
U = min(abs(-0.169.*R-0.331.*G+0.5.*B),240);
V = min(abs(0.5.*R-0.419.*G-0.081.*B),240);
Now with NT2
table<double> R = I(_,_,1);
table<double> G = I(_,_,2);
table<double> B = I(_,_,3);
table<double> Y, U, V;

Y = min(abs(0.299*R+0.587*G+0.114*B),235);
U = min(abs(-0.169*R-0.331*G+0.5*B),240);
V = min(abs(0.5*R-0.419*G-0.081*B),240);
Now with NT2
table<float,settings(shallow,of_size_<640,480>)> R = I(_,_,1);
table<float,settings(shallow,of_size_<640,480>)> G = I(_,_,2);
table<float,settings(shallow,of_size_<640,480>)> B = I(_,_,3);
table<float,settings(of_size_<640,480>)> Y, U, V;

Y = min(abs(0.299*R+0.587*G+0.114*B),235),
U = min(abs(-0.169*R-0.331*G+0.5*B),240),
V = min(abs(0.5*R-0.419*G-0.081*B),240);
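For comparison, the same conversion written by hand as a scalar loop might look like the sketch below. This is an assumption about what a plain single-core baseline looks like, not the benchmark code used in the talk; yuv, rgb2yuv_pixel and rgb2yuv are illustrative names, and only the coefficients and clamping values are taken from the slide.

```cpp
#include <cstddef>

// Small helpers kept constexpr so the per-pixel formula can be checked
// at compile time.
constexpr float abs_(float v) { return v < 0.f ? -v : v; }
constexpr float min_(float a, float b) { return a < b ? a : b; }

struct yuv { float y, u, v; };

// Per-pixel formula, same coefficients and clamps as the slide.
constexpr yuv rgb2yuv_pixel(float r, float g, float b)
{
  return yuv{ min_(abs_( 0.299f * r + 0.587f * g + 0.114f * b), 235.f),
              min_(abs_(-0.169f * r - 0.331f * g + 0.5f   * b), 240.f),
              min_(abs_( 0.5f   * r - 0.419f * g - 0.081f * b), 240.f) };
}

// Hand-written loop over n pixels stored as separate planes.
inline void rgb2yuv(float const* r, float const* g, float const* b,
                    float* y, float* u, float* v, std::size_t n)
{
  for(std::size_t i = 0; i < n; ++i)
  {
    yuv const p = rgb2yuv_pixel(r[i], g[i], b[i]);
    y[i] = p.y;
    u[i] = p.u;
    v[i] = p.v;
  }
}
```

The NT2 version states the three formulas once on whole tables and leaves the loop, and any SIMD or OpenMP variant of it, to the library.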
Some Performance Results
RGB2YUV timing (in cycles/pixel)
Size                     128x128  256x256  512x512  1024x1024
MATLAB 2010a (2 cores)   85       89       97       102
C (1 core)               23.6     23.8     23.9     24.0
NT2 (1 core)             22.3     22.4     22.2     24.1
NT2 OpenMP (2 cores)     11.4     11.4     11.5     12.2
NT2 SIMD                 5.5      5.6      5.6      5.9
NT2 SIMD+OpenMP          2.9      2.9      2.9      3.0
NT2 SIMD speed-up        3.9      3.9      3.9      4.0
NT2 OpenMP speed-up      1.92     1.96     1.98     1.95
NT2 vs MATLAB speed-up   29.3     30.7     33.5     34
Some real-life applications
Parallel Skeletons in a nutshell
Basic Principles [COLE 89]
There are patterns in parallel applications
Those patterns can be generalized as Skeletons
Applications are assembled as combinations of such patterns
Functional point of view
Skeletons are Higher-Order Functions
Skeletons support compositional semantics
Applications become compositions of stateless functions
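The functional view can be sketched with ordinary C++ function objects. The code below gives only the sequential meaning of two skeletons, with no parallelism, and is not Quaff's interface: pipeline_t, farm_t, run, add_one and twice are hypothetical names, and run assumes a stage maps a value to a value of the same type.

```cpp
#include <vector>

// Pipeline skeleton: takes two stages and returns a new stage that
// computes g(f(x)).
template<typename F, typename G>
struct pipeline_t
{
  F f;
  G g;
  template<typename T>
  constexpr auto operator()(T const& x) const -> decltype(g(f(x))) { return g(f(x)); }
};

template<typename F, typename G>
constexpr pipeline_t<F, G> pipeline(F f, G g) { return pipeline_t<F, G>{f, g}; }

// Farm skeleton: because workers are stateless, its meaning on one stream
// item is simply f(x); the worker count only matters for the parallel mapping.
template<int Workers, typename F>
struct farm_t
{
  F f;
  template<typename T>
  constexpr auto operator()(T const& x) const -> decltype(f(x)) { return f(x); }
};

template<int Workers, typename F>
constexpr farm_t<Workers, F> farm(F f) { return farm_t<Workers, F>{f}; }

// Applies a composed stage to every item of an input stream.
template<typename Stage, typename T>
std::vector<T> run(Stage s, std::vector<T> const& stream)
{
  std::vector<T> out;
  out.reserve(stream.size());
  for(T const& x : stream) out.push_back(s(x));
  return out;
}

// Two stateless user functions used as stages.
struct add_one { constexpr int operator()(int x) const { return x + 1; } };
struct twice   { constexpr int operator()(int x) const { return 2 * x; } };
```

Because stages are stateless, pipeline(add_one{}, farm<3>(twice{})) has the same result on each item whether the farm runs on one worker or three, which is what lets a library choose the parallel mapping freely.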
Spotting skeletons when you see one
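[Figure: a process network. Process 1 runs F1; Process 2 is a distributor feeding Processes 3, 4 and 5, which each run F2; Process 6 collects their results; Process 7 runs F3. The replicated F2 stage is labelled Farm and the overall chain is labelled Pipeline.]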
Objectives
Proposing a C++ skeleton library
Limit runtime overhead
Use the formal skeleton model as a guideline
What we want to write
run( pipeline( seq(F1), seq(F2), seq(F3) ) )
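What we want to run
// Stage 1: produce values with F1 and send them to rank 1
if( rank == 0 )
  do {
    out = F1();
    MPI_Send(&out,1,MPI_INT,1,0,MPI_COMM_WORLD);
  } while( isValid(out) );

// Stage 2: receive from rank 0, apply F2, forward the result to rank 2
if( rank == 1 )
  do {
    MPI_Recv(&in,1,MPI_INT,0,0,MPI_COMM_WORLD,&s);
    out = F2(in);
    MPI_Send(&out,1,MPI_INT,2,0,MPI_COMM_WORLD);
  } while( isValid(out) );

// Stage 3: receive from rank 1 and consume the value with F3
if( rank == 2 )
  do {
    MPI_Recv(&in,1,MPI_INT,1,0,MPI_COMM_WORLD,&s);
    F3(in);
  } while( isValid(in) );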
Design Rationale
Code generation process
Generating the skeleton tree at compile-time
Turning this tree into a process network using Boost.MPL
Producing the C+MPI target code using Boost.Fusion
[Figure: code generation chain. C++ source → AST generation → skeleton tree (PIPE of φ1, FARM3 of φ2, φ3) → production of the RPSC (the process network) → C+MPI code generation → C + MPI code]
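A hedged sketch of the first two steps: the skeleton tree can be represented purely as a type, and a template metaprogram can walk it to derive properties of the process network, here the number of processes. This is not Quaff's code, which relies on Boost.MPL and Boost.Fusion; seq_, farm_, pipe_ and process_count are hypothetical names, variadic templates are used for brevity, and the rule that a farm adds one distributor and one collector is an assumption taken from the farm figure earlier in the talk.

```cpp
// Skeleton tree nodes: pure types, no runtime state.
template<typename F>            struct seq_  {};
template<int N, typename S>     struct farm_ {};
template<typename... Stages>    struct pipe_ {};

// Metafunction: number of processes needed to run a skeleton tree.
template<typename Skeleton> struct process_count;

// A sequential function runs in one process.
template<typename F>
struct process_count< seq_<F> >
{
  static const int value = 1;
};

// Assumption: a farm replicates its worker N times and adds
// one distributor and one collector process.
template<int N, typename S>
struct process_count< farm_<N, S> >
{
  static const int value = N * process_count<S>::value + 2;
};

// An empty pipeline needs no process.
template<>
struct process_count< pipe_<> >
{
  static const int value = 0;
};

// A pipeline needs the sum of what its stages need.
template<typename Head, typename... Tail>
struct process_count< pipe_<Head, Tail...> >
{
  static const int value = process_count<Head>::value
                         + process_count< pipe_<Tail...> >::value;
};

// Placeholder user functions (illustrative only).
struct F1 {};
struct F2 {};
struct F3 {};
```

For pipe_< seq_<F1>, farm_<3, seq_<F2> >, seq_<F3> > the count is 1 + (3 + 2) + 1 = 7, which matches the seven processes of the farm-in-pipeline figure; everything is resolved before the program runs, so the generated code carries no tree traversal.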
Quaff over the CELL
The OCELLE project
IEF/CEA/TRT
Tools for heterog. proc.
MPI and skeletons for the Cell
Developing for the CELL
How to find a good mapping for an application?
Harris Corner Detector
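[Figure: Harris corner detector dataflow. The input image I goes through Grad X and Grad Y to give Ix and Iy; three Mul stages compute Ixx=Ix*Ix, Ixy=Ix*Iy and Iyy=Iy*Iy; three Gauss stages smooth them into Sxx, Sxy and Syy; the coarsity stage computes Sxx*Syy-Sxy² to produce K.]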
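The two purely arithmetic stages of this dataflow can be written per pixel as below. This is a minimal sketch following the formulas on the figure, not the code benchmarked in the talk; products, mul and coarsity are illustrative names, and the gradient and Gaussian stages are omitted because the slide does not give their kernels.

```cpp
// Output of the Mul stage for one pixel.
struct products { float ixx, ixy, iyy; };

// Mul stage: products of the two gradient components.
constexpr products mul(float ix, float iy)
{
  return products{ ix * ix, ix * iy, iy * iy };
}

// Coarsity stage: Sxx*Syy - Sxy^2 on the smoothed products.
constexpr float coarsity(float sxx, float sxy, float syy)
{
  return sxx * syy - sxy * sxy;
}
```

Each stage is a stateless function of its inputs, which is what allows the skeleton versions that follow to chain or pipeline them in different ways without changing the result.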
Skeleton Code and Benchmarks
void full_chain_harris(tile const&,tile&)
{
  run( pardo<8>( (seq(grad),seq(mul),seq(gauss),seq(coarsity)) ) );
}
void half_chain_harris(tile const&,tile&)
{
  run( pardo<4>( (seq(grad),seq(mul)) | (seq(gauss),seq(coarsity)) ) );
}
void no_chain_harris(tile const&,tile&)
{
  run( pardo<2>( seq(grad) | seq(mul) | seq(gauss) | seq(coarsity) ) );
}
Results (PACT 09)
Version    Full-chain  Half-chain  No-chain
Manual     11.26       8.36        9.97
Skell      11.86       8.64        10.43
Overhead   5.33%       3.35%       4.61%
Let’s round this up!
Parallel Computing for Scientists
Parallel programming tools are required BUT:
... They need to cater to users' needs!
... Users should feel «at home»
Our claims
Software libraries built as Generic and Generative components can solve a large chunk of parallelism-related problems while being easy to use.
EDSLs are an efficient way to design such libraries
C++ provides a wide selection of idioms to achieve this goal.
Prospects
What we're cooking at the moment
Get a public release of NT2 and Quaff
NT2: Automatic MATLAB conversion with a source-to-source compiler
Quaff: Simplify the design of new skeletons by providing a Semantic Rules EDSL
What we'll be cooking in the future
Multi-stage programming for skeletons over the Grid/Cloud
NT2 support for embedded architectures
Thanks for your attention