Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Generative and Meta-Programming: Modern C++ Design for Parallel Computing. Joel Falcou, LRI, University Paris Sud XI. 05/02/2011, LSU Seminar.

description

These slides are from my 2011 Texas A&M IAMCS seminar. Generative and generic programming techniques for HPC are presented through the NT2 and Quaff projects.

Transcript of Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Page 1: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Introduction Generative Programming NT2 Quaff Conclusion

Generative and Meta-Programming: Modern C++ Design for Parallel Computing

Joel Falcou

05/02/2011

LRI, University Paris Sud XI

1 / 23 LSU Seminar

Page 2: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Context

In Scientific Computing ...

... there is Scientific

Applications are domain driven

Users ≠ Developers

Users are reluctant to change

... there is Computing

Computing requires performance ...

... which implies architecture-specific tuning

... which requires expertise

... which may or may not be available

The Problem

People using computers to do science want to do science first.

Page 7: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

The Problem – and how we want to solve it

The Facts

The “Library to bind them all” doesn’t exist (or we would have it already)

All those users want to take advantage of new architectures

Few of them want to actually handle all the dirty work

The Ends

Provide a “familiar” interface that lets users benefit from parallelism

Help compilers generate better parallel code

Increase sustainability by decreasing the amount of code to write

The Means

Generative Programming

Template Meta-Programming

Embedded Domain Specific Languages

Page 8: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Talk Layout

1 Introduction

2 Generative Programming

3 NT2

4 Quaff

5 Conclusion

Page 9: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Generative Programming

[Figure: a Domain Specific Application Description is fed to a Generative Component, whose Translator assembles Parametric Sub-components into a Concrete Application.]

Page 10: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Generative Programming as a Tool

Available techniques

Dedicated compilers

External pre-processing tools

Languages supporting meta-programming

Definition of Meta-programming

Meta-programming is the writing of computer programs that analyse, transform and generate other programs (or themselves) as their data.

Page 13: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

From Generative to Meta-programming

Meta-programmable languages

Template Haskell

MetaOCaml

C++

C++ meta-programming

Relies on the C++ template sub-language

Handles types and integral constants at compile-time

Proved to be Turing-complete
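
As a minimal illustration of handling types and integral constants at compile time, the sketch below computes a factorial and selects a type while the program is being compiled. The names `factorial` and `choose` are made up for this example and do not come from the talk; C++11 is assumed.

```cpp
#include <cstddef>
#include <type_traits>

// Recursive case: the value is computed while the template is instantiated.
template <std::size_t N>
struct factorial {
    static constexpr std::size_t value = N * factorial<N - 1>::value;
};

// Base case: a full specialization stops the recursion.
template <>
struct factorial<0> {
    static constexpr std::size_t value = 1;
};

// Types are compile-time data too: pick one of two types from a constant.
template <bool Cond, typename Then, typename Else>
struct choose { using type = Then; };

template <typename Then, typename Else>
struct choose<false, Then, Else> { using type = Else; };

// Both checks are evaluated by the compiler; nothing runs at execution time.
static_assert(factorial<5>::value == 120, "5! is computed at compile time");
static_assert(std::is_same<choose<true, int, double>::type, int>::value,
              "the true branch selects the first type");
```

Recursion through instantiation and branching through specialization are the two building blocks that make the template sub-language expressive enough for the techniques shown in the rest of the talk.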

Page 16: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Embedded Domain Specific Languages

What’s an EDSL?

DSL = Domain Specific Language

Declarative language, easy-to-use, fitting the domain

EDSL = DSL within a general purpose language

EDSL in C++

Relies on operator overload abuse (Expression Templates)

Carries semantic information around code fragments

Generic implementations become self-aware of optimizations

Exploiting static AST

At the expression level: code generation

At the function level: inter-procedural optimization

Page 17: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Expression Templates

matrix x(h,w),a(h,w),b(h,w);

x = cos(a) + (b*a);

expr< assign
    , expr<matrix&>
    , expr< plus
          , expr<cos, expr<matrix&> >
          , expr<multiplies, expr<matrix&>, expr<matrix&> >
          >
    >(x,a,b);

[Figure: AST of the expression. The root is =, with x on one side and + on the other; + has two children, cos applied to a, and * applied to b and a.]

#pragma omp parallel for
for(int j=0;j<h;++j)
{
  for(int i=0;i<w;++i)
  {
    x(j,i) = cos(a(j,i)) + ( b(j,i) * a(j,i) );
  }
}

Arbitrary Transforms applied on the meta-AST
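
The mechanism behind this slide can be sketched in a few lines. The code below is a deliberately small, hypothetical expression-template engine over 1-D vectors, written for illustration only: the names `et::expr`, `et::binary`, `et::cos_node` and `et::vec` are assumptions, not NT2's types, and the single loop in `operator=` stands in for the code generation step (no OpenMP or SIMD).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

namespace et {

// CRTP base: every node of the expression tree derives from expr<Node>.
template <typename E>
struct expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct plus_  { static double apply(double a, double b) { return a + b; } };
struct times_ { static double apply(double a, double b) { return a * b; } };

// Binary node: keeps references to its operands, computes nothing yet.
template <typename Op, typename L, typename R>
struct binary : expr<binary<Op, L, R>> {
    const L& lhs;
    const R& rhs;
    binary(const L& l, const R& r) : lhs(l), rhs(r) {}
    double operator[](std::size_t i) const { return Op::apply(lhs[i], rhs[i]); }
};

// Unary node for cos.
template <typename A>
struct cos_node : expr<cos_node<A>> {
    const A& arg;
    explicit cos_node(const A& a) : arg(a) {}
    double operator[](std::size_t i) const { return std::cos(arg[i]); }
};

// Terminal: owns the data. Assignment walks the whole tree in one loop.
struct vec : expr<vec> {
    std::vector<double> data;
    explicit vec(std::size_t n, double v = 0.0) : data(n, v) {}
    std::size_t size() const { return data.size(); }
    double operator[](std::size_t i) const { assert(i < data.size()); return data[i]; }
    double& operator[](std::size_t i) { assert(i < data.size()); return data[i]; }

    template <typename E>
    vec& operator=(const expr<E>& e) {
        const E& tree = e.self();
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = tree[i];
        return *this;
    }
};

// Overloaded operators only build nodes; the type of the result is the AST.
template <typename L, typename R>
binary<plus_, L, R> operator+(const expr<L>& l, const expr<R>& r) {
    return binary<plus_, L, R>(l.self(), r.self());
}

template <typename L, typename R>
binary<times_, L, R> operator*(const expr<L>& l, const expr<R>& r) {
    return binary<times_, L, R>(l.self(), r.self());
}

template <typename A>
cos_node<A> cos(const expr<A>& a) { return cos_node<A>(a.self()); }

}  // namespace et
```

With these definitions, `x = et::cos(a) + (b * a);` creates no temporary vector: the right-hand side has type `binary<plus_, cos_node<vec>, binary<times_, vec, vec>>`, and the assignment evaluates it element by element. The nodes hold references, so the tree must be consumed within the same full expression that builds it.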

Page 18: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

What is NT2?

A Scientific Computing Library

Provide a simple, MATLAB-like interface for users

Provide high-performance computing entities and primitives

Easily extendable

Principles

Take a .m file, copy to a .cpp file

Add #include <nt2/nt2.hpp> and do cosmetic changes

Compile the file and link with libnt2.a

Page 22: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

MATLAB, you said?

R = I(:,:,1);
G = I(:,:,2);
B = I(:,:,3);

Y = min(abs(0.299.*R+0.587.*G+0.114.*B),235);
U = min(abs(-0.169.*R-0.331.*G+0.5.*B),240);
V = min(abs(0.5.*R-0.419.*G-0.081.*B),240);

Page 23: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Now with NT2

table<double> R = I(_,_,1);
table<double> G = I(_,_,2);
table<double> B = I(_,_,3);
table<double> Y, U, V;

Y = min(abs(0.299*R+0.587*G+0.114*B),235);
U = min(abs(-0.169*R-0.331*G+0.5*B),240);
V = min(abs(0.5*R-0.419*G-0.081*B),240);
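
For reference, the per-pixel work hidden behind these three array statements can be written as a plain scalar loop. This is only a sketch in standard C++: `yuv_planes` and `rgb2yuv` are hypothetical names, the planes are flattened into `std::vector<double>`, and nothing here uses the NT2 API.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Three output planes, one value per pixel.
struct yuv_planes { std::vector<double> y, u, v; };

// Scalar equivalent of the three array statements, one pixel at a time.
inline yuv_planes rgb2yuv(const std::vector<double>& r,
                          const std::vector<double>& g,
                          const std::vector<double>& b) {
    assert(r.size() == g.size() && g.size() == b.size());
    const std::size_t n = r.size();
    yuv_planes out{std::vector<double>(n), std::vector<double>(n),
                   std::vector<double>(n)};
    for (std::size_t i = 0; i < n; ++i) {
        out.y[i] = std::min(std::abs( 0.299 * r[i] + 0.587 * g[i] + 0.114 * b[i]), 235.0);
        out.u[i] = std::min(std::abs(-0.169 * r[i] - 0.331 * g[i] + 0.5   * b[i]), 240.0);
        out.v[i] = std::min(std::abs( 0.5   * r[i] - 0.419 * g[i] - 0.081 * b[i]), 240.0);
    }
    return out;
}
```

Every pixel is independent of the others, which is why the same expression lends itself to both thread-level and SIMD parallelisation.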

Page 24: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Now with NT2

table<float,settings(shallow,of_size_<640,480>)> R = I(_,_,1);
table<float,settings(shallow,of_size_<640,480>)> G = I(_,_,2);
table<float,settings(shallow,of_size_<640,480>)> B = I(_,_,3);
table<float,settings(of_size_<640,480>)> Y, U, V;

Y = min(abs(0.299*R+0.587*G+0.114*B),235),
U = min(abs(-0.169*R-0.331*G+0.5*B),240),
V = min(abs(0.5*R-0.419*G-0.081*B),240);

Page 25: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Some Performance Results

RGB2YUV timing (in cycles/pixel)

Size                     128x128  256x256  512x512  1024x1024
MATLAB 2010a (2 cores)   85       89       97       102
C (1 core)               23.6     23.8     23.9     24.0
NT2 (1 core)             22.3     22.4     22.2     24.1
NT2 OpenMP (2 cores)     11.4     11.4     11.5     12.2
NT2 SIMD                 5.5      5.6      5.6      5.9
NT2 SIMD+OpenMP          2.9      2.9      2.9      3.0
NT2 SIMD speed-up        3.9      3.9      3.9      4.0
NT2 OpenMP speed-up      1.92     1.96     1.98     1.95
NT2 vs MATLAB speed-up   29.3     30.7     33.5     34

Page 26: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Some real life applications

Page 27: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Parallel Skeletons in a nutshell

Basic Principles [COLE 89]

There are patterns in parallel applications

Those patterns can be generalized as Skeletons

Applications are assembled as combinations of such patterns

Functional point of view

Skeletons are Higher-Order Functions

Skeletons support a compositional semantics

Applications become compositions of stateless functions
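
The functional view can be made concrete with ordinary C++ lambdas. The sketch below gives a purely sequential meaning to two skeletons; `hof::pipeline` and `hof::farm` are hypothetical names chosen for this example (they are not the Quaff API), streams are reduced to `std::vector<int>`, and C++14 is assumed.

```cpp
#include <cassert>
#include <vector>

namespace hof {

// pipeline: a higher-order function that composes two stages into one.
template <typename F, typename G>
auto pipeline(F f, G g) {
    return [f, g](auto x) { return g(f(x)); };
}

// farm: applies one stateless worker to every item of a stream.
// Items are independent, so a parallel runtime is free to spread them
// over several workers; this sketch simply loops over them.
template <typename F>
auto farm(F worker) {
    return [worker](const std::vector<int>& items) {
        std::vector<int> out;
        out.reserve(items.size());
        for (int item : items) out.push_back(worker(item));
        return out;
    };
}

}  // namespace hof

// Skeletons compose like functions: a farm whose worker is a pipeline.
inline void hof_usage() {
    auto inc   = [](int x) { return x + 1; };
    auto twice = [](int x) { return 2 * x; };
    auto app   = hof::farm(hof::pipeline(inc, twice));
    assert((app({1, 2, 3}) == std::vector<int>{4, 6, 8}));
}
```

Because the stages carry no state, the result does not depend on how the work is scheduled, which is what leaves an implementation free to choose a parallel mapping.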

Page 29: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Spotting skeletons when you see one

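[Figure: a process network. Process 1 runs F1, process 2 is a distributor (Distrib.), processes 3, 4 and 5 each run F2, process 6 is a collector (Collect.), and process 7 runs F3. Two skeletons are labelled on the network: Farm and Pipeline.]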

Page 30: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Objectives

Proposing a C++ skeleton library

Limit runtime overhead

Use the formal skeleton model as a guideline

Page 31: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Objectives

What we want to write

run( pipeline( seq(F1), seq(F2), seq(F3) ) )
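
One way such an expression can carry a static structure is sketched below: `seq` and `pipeline` only record their arguments in a type, and `run` walks that type. All names are hypothetical stand-ins written for this example, not Quaff's implementation. Two simplifications are assumed: `run` takes an explicit input value, and the stages execute sequentially in a single process. C++17 is required for `if constexpr`.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>

// seq(f) wraps a plain function object as a leaf of the skeleton tree.
template <typename F>
struct seq_t { F f; };

template <typename F>
seq_t<F> seq(F f) { return seq_t<F>{f}; }

// pipeline(...) stores its stages; their number and types are static.
template <typename... Stages>
struct pipeline_t {
    std::tuple<Stages...> stages;
    static constexpr std::size_t size = sizeof...(Stages);
};

template <typename... Stages>
pipeline_t<Stages...> pipeline(Stages... s) {
    return pipeline_t<Stages...>{std::tuple<Stages...>(s...)};
}

// Sequential interpretation: feed x through stage I, then the next ones.
template <std::size_t I, typename T, typename... Stages>
auto run_from(const pipeline_t<Stages...>& p, T x) {
    if constexpr (I == sizeof...(Stages)) {
        return x;  // no stage left
    } else {
        return run_from<I + 1>(p, std::get<I>(p.stages).f(x));
    }
}

template <typename T, typename... Stages>
auto run(const pipeline_t<Stages...>& p, T x) {
    return run_from<0>(p, x);
}

inline void pipeline_usage() {
    auto F1 = [](int x) { return x + 1; };
    auto F2 = [](int x) { return x * 2; };
    auto F3 = [](int x) { return x - 3; };
    auto app = pipeline(seq(F1), seq(F2), seq(F3));
    static_assert(decltype(app)::size == 3, "the stage count is a constant");
    assert(run(app, 4) == 7);  // ((4 + 1) * 2) - 3
}
```

Since the whole structure lives in the type of `app`, a different interpretation of the same tree could emit one communication loop per stage instead of calling the stages in sequence.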

Page 32: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Objectives

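What we want to run

// Stage 1: produce values and send them to rank 1
if( rank == 0 )
  do {
    out = F1();
    MPI_Send(&out,1,MPI_INT,1,0,MPI_COMM_WORLD);
  } while( isValid(out) );

// Stage 2: receive from rank 0, transform, forward the result to rank 2
if( rank == 1 )
  do {
    MPI_Recv(&in,1,MPI_INT,0,0,MPI_COMM_WORLD,&s);
    out = F2(in);
    MPI_Send(&out,1,MPI_INT,2,0,MPI_COMM_WORLD);
  } while( isValid(out) );

// Stage 3: receive from rank 1 and consume the value
if( rank == 2 )
  do {
    MPI_Recv(&in,1,MPI_INT,1,0,MPI_COMM_WORLD,&s);
    F3(in);
  } while( isValid(in) );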

Page 33: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Design Rationale

Code generation process

Generating the skeleton tree at compile-time

Turning this tree into a process network using MPL

Producing the C+MPI target code using Fusion
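
To illustrate the second step, a compile-time traversal of a skeleton tree can be sketched with plain templates. The tags `seq_`, `pipe_`, `farm_` and the metafunction `process_count` are hypothetical, Boost.MPL is not used here, and the farm layout (one distributor, one collector, N workers) is an assumption taken from the process network drawn earlier in the talk.

```cpp
#include <cstddef>

// Hypothetical skeleton tags standing in for the compile-time tree.
template <typename F> struct seq_ {};
template <typename... Stages> struct pipe_ {};
template <std::size_t N, typename Worker> struct farm_ {};

// Metafunction: number of processes a skeleton tree maps onto.
template <typename Skeleton> struct process_count;

// A sequential function occupies one process.
template <typename F>
struct process_count<seq_<F>> {
    static constexpr std::size_t value = 1;
};

// An empty pipeline needs none ...
template <>
struct process_count<pipe_<>> {
    static constexpr std::size_t value = 0;
};

// ... and a non-empty one needs the sum over its stages.
template <typename Head, typename... Tail>
struct process_count<pipe_<Head, Tail...>> {
    static constexpr std::size_t value =
        process_count<Head>::value + process_count<pipe_<Tail...>>::value;
};

// Assumption: a farm is a distributor, a collector and N copies of the worker.
template <std::size_t N, typename Worker>
struct process_count<farm_<N, Worker>> {
    static constexpr std::size_t value = 2 + N * process_count<Worker>::value;
};

// Placeholder function types for the example.
struct F1 {}; struct F2 {}; struct F3 {};

// pipeline(F1, farm of 3 x F2, F3) maps onto 1 + (2 + 3) + 1 = 7 processes.
static_assert(
    process_count<pipe_<seq_<F1>, farm_<3, seq_<F2>>, seq_<F3>>>::value == 7,
    "process count is known before the program runs");
```

Knowing the process count and the role of each rank at compile time is what allows the generated code to contain only the communication calls each rank needs.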

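[Figure: code generation flow. C++ source, then AST generation, then RPSC (process network) production, then C+MPI code generation, yielding C and MPI code. The example shown is a skeleton tree with PIPE and FARM3 nodes over the functions φ1, φ2 and φ3.]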

Page 34: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Quaff over the CELL

The OCELLE project

IEF/CEA/TRT

Tools for heterog. proc.

MPI and skeleton for the Cell

Developing for the CELL

How to find a good mapping for an application?

Page 35: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Harris Corner Detector

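[Figure: Harris corner detector dataflow. The input image I goes through Grad X and Grad Y to give Ix and Iy. Three Mul nodes compute Ixx=Ix*Ix, Ixy=Ix*Iy and Iyy=Iy*Iy. Three Gauss nodes smooth them into Sxx, Sxy and Syy. The coarsity node computes Sxx*Syy-Sxy² and outputs K.]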
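
The two purely point-wise stages of this dataflow are easy to write down per pixel. The sketch below uses hypothetical names (`products`, `mul`, `coarsity`) and scalar `float` values; the gradient and Gaussian stages need a pixel neighbourhood and are left out. The coarsity formula is the one labelled on the slide. The classical Harris response also subtracts a trace term, which the slide does not show and which is therefore not reproduced here.

```cpp
// Second-order products of the gradients (the three Mul nodes).
struct products { float ixx, ixy, iyy; };

constexpr products mul(float ix, float iy) {
    return products{ix * ix, ix * iy, iy * iy};
}

// Coarsity as labelled on the slide: Sxx*Syy - Sxy^2.
constexpr float coarsity(float sxx, float sxy, float syy) {
    return sxx * syy - sxy * sxy;
}

// Example: Sxx = 2, Sxy = 1, Syy = 3 gives 2*3 - 1*1 = 5.
static_assert(coarsity(2.0f, 1.0f, 3.0f) == 5.0f, "Sxx*Syy - Sxy^2");
```

Without smoothing, the products of a single gradient pair always yield a coarsity of zero (Ixx*Iyy equals Ixy²), which is why the Gauss stage sits between the two.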

Page 36: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Harris Corner Detector

Skeleton Code and Benchmarks

void full_chain_harris(tile const&,tile&)
{
  run( pardo<8>( (seq(grad),seq(mul),seq(gauss),seq(coarsity)) ) );
}

Page 37: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Harris Corner Detector

Skeleton Code and Benchmarks

void half_chain_harris(tile const&,tile&)
{
  run( pardo<4>( (seq(grad),seq(mul)) | (seq(gauss),seq(coarsity)) ) );
}

Page 38: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Harris Corner Detector

Skeleton Code and Benchmarks

void no_chain_harris(tile const&,tile&)
{
  run( pardo<2>( seq(grad) | seq(mul) | seq(gauss) | seq(coarsity) ) );
}

Page 39: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Harris Corner Detector

Results (PACT 09)

Version    Full-chain  Half-chain  No-chain
Manual     11.26       8.36        9.97
Skell      11.86       8.64        10.43
Overhead   5.33%       3.35%       4.61%
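
The overhead row is consistent with the relative difference between the skeleton-based and manual timings, (Skell - Manual) / Manual. The helper below, whose name `overhead_percent` is chosen for this example, reproduces the three figures from the table to within rounding.

```cpp
// Relative overhead, in percent, of the skeleton version over the manual one.
constexpr double overhead_percent(double manual, double skeleton) {
    return (skeleton - manual) / manual * 100.0;
}

// Full-chain column: (11.86 - 11.26) / 11.26 is about 5.33 %.
static_assert(overhead_percent(11.26, 11.86) > 5.32 &&
              overhead_percent(11.26, 11.86) < 5.34,
              "matches the Full-chain overhead");
```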

Page 40: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Let’s round this up!

Parallel Computing for Scientists

Parallel programming tools are required BUT:

... They need to cater to users’ needs!

... Users should feel «at home»

Our claims

Software libraries built as Generic and Generative components can solve a large chunk of parallelism-related problems while being easy to use.

EDSLs are an efficient way to design such libraries

C++ provides a wide selection of idioms to achieve this goal.

Page 42: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Perspectives

What we’re cooking at the moment

Get a public release of NT2 and Quaff

NT2: automatic MATLAB conversion with a source-to-source compiler

Quaff: simplify the design of new skeletons by providing a Semantic Rules EDSL

What we’ll be cooking in the future

Multi-stage programming for skeletons over the Grid/Cloud

NT2 support for embedded architectures

Page 43: Generative and Meta-Programming - Modern C++ Design for Parallel Computing

Thanks for your attention