Abstractions: Programming and deploying apps. on Grids

Abstractions: Programming and deploying apps. on Grids Franck Cappello INRIA* (*this is my own opinion!) CCGRID’08 - Panel


Transcript of Abstractions: Programming and deploying apps. on Grids

Page 1: Abstractions:  Programming and deploying apps. on Grids

Abstractions: Programming and deploying apps. on Grids

Franck Cappello

INRIA* (*this is my own opinion!)

CCGRID’08 - Panel

Page 2: Abstractions:  Programming and deploying apps. on Grids

Grid’5000*

[Figure: software stack of a Grid’5000 experiment, with the layers and side tools listed below]

Application

Programming Environments

Application Runtime

Grid or P2P Middleware

Operating System

Networking

Experimental conditions injector

Measurement tools

A fully reconfigurable and controllable environment (resource “dedication”)

• >400 experiments in total
• >100 experiments on apps.

Page 3: Abstractions:  Programming and deploying apps. on Grids

What are the main distributed apps. in your project?

Application domains:
• Life science (mammogram comparison, protein sequencing, gene prediction, virtual screening, conformation sampling, etc.)
• Physics (seismic imaging, parallel solvers, hydrogeology, self-propelled solids, seismic tomography, etc.)
• Applied mathematics (sparse matrix computation, combinatorial optimization, parallel model checkers, PDE problem solvers, etc.)
• Chemistry (molecular simulation, estimation of thickness on thin films)
• Industrial processes
• Financial computing

Main usage of Grid’5000 for these applications:
• Evaluate the performance of applications ported to the Grid
• Test alternatives
• Design new algorithms and new methods

Page 4: Abstractions:  Programming and deploying apps. on Grids

What programming difficulties and abstraction opportunities?

• Organizing the computation
• Tolerating performance variations and hardware & software failures
• Scheduling computation & communications
• Implementing computing codes
• Synchronizing task executions
• Implementing global operations
• Selecting the communication protocols
• Dealing with resources (data, computers, etc.)
• Dealing with administration domains

Page 5: Abstractions:  Programming and deploying apps. on Grids

Current infrastructures: how they mask complexity

• Solution 1) ask the “user” to conform to a certain abstraction of the execution platform --> developing applications following standard interfaces (HPC centers, most deployed Grids)

• Solution 2) ask the execution platform to conform to “users” abstractions --> users keep their apps. and environment unchanged and need a reconfiguration of the platform (Grid’5000, Amazon Elastic Compute Cloud)

• Solution 3) ask the user to choose from a variety of predefined execution environments

Page 6: Abstractions:  Programming and deploying apps. on Grids

What are the common patterns – programming?

Programming models tested on Grid’5000:
• Rule reduction (e.g. Chemical Computing) --> soon
• Graph of components (Data & Workflow) --> OpenWP
• Specific control graph controlled by data (e.g. Divide & Conquer, B&B) --> ProActive, PARADISEO
• Components (code coupling) --> Grid CORBA component model
• Components with control graphs (Workflow) --> DagMan & Condor
• Global operations (MAP-Reduce) --> not aware of any
• SPMD (MPI for Grids) --> QcGOpenMPI, MPICH-G2, etc.
• Client-server (Grid-RPC) --> DIET, XtremWeb, etc.
• Assembly languages (set of scripts) …

Page 7: Abstractions:  Programming and deploying apps. on Grids

Example 1: Combinatorial Optimization Problems

• Flow-shop (one of the hardest challenge problems in combinatorial optimization):
– Schedule a set of jobs on a set of machines, minimizing the makespan.
– Exhaustive enumeration of all combinations would take several years.
– The challenge is thus to reduce the number of explored solutions.

New Grid exact method based on Branch-and-Bound, combining new approaches in combinatorial algorithms, grid computing, load balancing and fault tolerance.

Problem: 50 jobs on 20 machines, optimally solved for the first time, with 1245 CPUs (peak).

Involved Grid’5000 sites (6): Bordeaux, Lille, Orsay, Rennes, Sophia-Antipolis and Toulouse. The optimal solution required a wall-clock time of 25 days.
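To make the idea of "reducing the number of explored solutions" concrete, here is a minimal sequential sketch of Branch-and-Bound on a tiny permutation flow-shop instance. It is not the Grid method from the slide: the lower bound (machine load plus shortest remaining tail) and the names `advance`, `lower_bound`, `branch_and_bound` and `brute_force` are illustrative assumptions, and load balancing and fault tolerance are left out entirely.

```python
from itertools import permutations
from typing import FrozenSet, List, Sequence, Tuple

Times = Sequence[Sequence[int]]  # times[j][m]: processing time of job j on machine m


def advance(finish: Sequence[int], job: Sequence[int]) -> List[int]:
    """Return per-machine completion times after appending one job."""
    nxt = list(finish)
    nxt[0] += job[0]
    for m in range(1, len(nxt)):
        # A job starts on machine m once the machine is free and the job
        # has left machine m - 1.
        nxt[m] = max(nxt[m], nxt[m - 1]) + job[m]
    return nxt


def makespan(order: Sequence[int], times: Times) -> int:
    """Completion time of the last job on the last machine."""
    finish = [0] * len(times[0])
    for j in order:
        finish = advance(finish, times[j])
    return finish[-1]


def lower_bound(finish: Sequence[int], remaining: FrozenSet[int], times: Times) -> int:
    """Simple (assumed) bound: remaining load on each machine plus the
    shortest tail any remaining job still needs on later machines."""
    bound = 0
    for m in range(len(finish)):
        work = sum(times[j][m] for j in remaining)
        tail = min(sum(times[j][m + 1:]) for j in remaining)
        bound = max(bound, finish[m] + work + tail)
    return bound


def branch_and_bound(times: Times) -> Tuple[int, List[int], int]:
    """Return (optimal makespan, one optimal order, number of explored nodes)."""
    n_jobs, n_machines = len(times), len(times[0])
    best_order = list(range(n_jobs))
    best = makespan(best_order, times)  # initial upper bound
    explored = 0

    def visit(prefix: List[int], finish: List[int], remaining: FrozenSet[int]) -> None:
        nonlocal best, best_order, explored
        explored += 1
        if not remaining:
            if finish[-1] < best:
                best, best_order = finish[-1], prefix
            return
        if lower_bound(finish, remaining, times) >= best:
            return  # prune: this subtree cannot improve on the best known order
        for j in sorted(remaining):
            visit(prefix + [j], advance(finish, times[j]), remaining - {j})

    visit([], [0] * n_machines, frozenset(range(n_jobs)))
    return best, best_order, explored


def brute_force(times: Times) -> int:
    """Exhaustive enumeration, usable only for very small instances."""
    return min(makespan(order, times) for order in permutations(range(len(times))))
```

Calling `branch_and_bound(times)` on a small matrix and comparing the result with `brute_force(times)` shows the same optimum while the `explored` counter stays at or below the size of the full search tree. A tighter bound prunes more, which is where the effort goes for instances of the size quoted above.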

Many success stories in combinatorial optimization:

One of the most promising ones, in 2008: Grid’5000 was used to design and improve the algorithm (MoGo) used in the first computer victory against a professional Go player (5 Dan) on a 9x9 board in the last Paris tournament! (it’s close to the Dan!)

Page 8: Abstractions:  Programming and deploying apps. on Grids

Example 2: OpenWP

OpenWP:
• A directive-based language and runtime for coarse-grain distributed executions
• Expresses dependencies between computing blocks and work distribution
• For existing codes
• Uses a virtual shared memory model
• Runs over existing workflow engines
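The slide does not show OpenWP's directive syntax, so the following is not OpenWP code. It is a small Python sketch of the underlying idea only: coarse-grain blocks declare which other blocks they depend on, and a runtime distributes the ready blocks to workers. The names `run_blocks` and `example_blocks` are hypothetical, and a local thread pool stands in for a workflow engine.

```python
from concurrent.futures import FIRST_COMPLETED, Future, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

# A block is a function plus the names of the blocks whose results it consumes.
Block = Tuple[Callable[..., object], List[str]]


def run_blocks(blocks: Dict[str, Block], workers: int = 4) -> Dict[str, object]:
    """Run each block as soon as all of its dependencies have produced a result."""
    results: Dict[str, object] = {}
    pending = dict(blocks)
    running: Dict[Future, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending or running:
            ready = [name for name, (_, deps) in pending.items()
                     if all(d in results for d in deps)]
            if not ready and not running:
                raise ValueError("cyclic or missing dependency")
            for name in ready:
                func, deps = pending.pop(name)
                # Results of the dependencies are passed as positional arguments.
                running[pool.submit(func, *(results[d] for d in deps))] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for fut in done:
                results[running.pop(fut)] = fut.result()
    return results


def example_blocks() -> Dict[str, Block]:
    """Four blocks: 'left' and 'right' are independent and may run concurrently."""
    return {
        "load": (lambda: list(range(8)), []),
        "left": (lambda xs: sum(xs[:4]), ["load"]),
        "right": (lambda xs: sum(xs[4:]), ["load"]),
        "merge": (lambda a, b: a + b, ["left", "right"]),
    }
```

With `run_blocks(example_blocks())`, the `left` and `right` blocks become ready together once `load` finishes, and `merge` runs last. Only the dependency declarations encode the parallelism; the block bodies stay ordinary sequential code, which matches the "for existing codes" point above.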

[Figure labels: Linear speedup; Non-parallel region; Workflow engine overhead; Negligible cost; Effect of optimizations]

•AMIBES (EADS): Mesher Module of the jCAE (CAD environment in Java)

Page 9: Abstractions:  Programming and deploying apps. on Grids

What are the common patterns – Deployment?

Applications “deployment” on Grid’5000:
• Site level:
– Node selection --> OAR
– Node reservation (ISOLATION) --> OAR (batch scheduler)
– Reconfiguration --> Kadeploy
• Grid level --> GRUDU (Grid Reservation Utility)
• Application configuration and launch --> Adage

Page 10: Abstractions:  Programming and deploying apps. on Grids

Deployment: Grudu (G5K Reservation Utility)

Main goals:

– Displaying the status of the platform

– Resource allocation through OAR

– Resource monitoring through Ganglia

– Deployment management with a GUI for KaDeploy

– A terminal emulator and a file transfer manager

All-in-one GUI client-side tool for the monitoring of the Grid'5000 platform.

Page 11: Abstractions:  Programming and deploying apps. on Grids

ADAGE: Automatic deployment of large-scale applications that need one or multiple middleware systems: MPI, CCM, JXTA, Jobs, GFarm, P2P overlays, DIET

MPI Application

CCM Application

Resource Description

Generic Application Description

Control Parameters

Deployment Planning

Deployment

Execution

Application Configuration

LEGO Application

Application deployment

Rendez-vous peers

JXTA edge peers

[Figure: “rendez-vous” peers known by one of the “rendez-vous” peers. X axis: time; Y axis: “rendez-vous” peer ID]

JXTA scalability test:
– Evaluation of the peerview and discovery protocols
– Deployment of 1000s of JXTA peers
– Run the scalability test

Page 12: Abstractions:  Programming and deploying apps. on Grids

Resource Dedication: G5K vs. EGEE

[Figure: two plots of execution time (seconds, axis ticks from 1800 to 14400) versus number of images. Curves: Naive execution, Data Parallelism, Data Parallelism + Pipelining]

Bronze Standard method, addressing the issue of medical image algorithm evaluation.
• Applied to the estimation of the spatial rigid transformation between two images (useful for aligning two different images of the same patient acquired separately).

Complex workflow of computations on a large number of data sets.
• Typically requires 10s to 100s of 3D image pairs, at 15 minutes per image pair.
• The method is executed with MOTEUR (workflow engine).
• Several degrees of parallelism are tested:
– Naive execution: only the workflow intrinsic parallelism
– Data Parallelism: data sets are processed concurrently
– Data Parallelism + Pipelining: data sets are processed concurrently and services in sequential branches are pipelined
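The difference between the three degrees of parallelism can be sketched with an idealized cost model. The assumptions are strong and are mine, not the slide's: the workflow is a single linear chain of services (so it has no intrinsic parallelism), resources are unlimited, and middleware overhead and data transfers are ignored. `times[i][j]` is a hypothetical matrix giving the time service `i` spends on data set `j`.

```python
from typing import Sequence

Times = Sequence[Sequence[float]]  # times[i][j]: service i applied to data set j


def naive_time(times: Times) -> float:
    """No data parallelism, no pipelining: every (service, data set) pair runs in turn."""
    return sum(sum(row) for row in times)


def data_parallel_time(times: Times) -> float:
    """All data sets go through a service concurrently, but the next service
    starts only when the slowest data set has finished the current one."""
    return sum(max(row) for row in times)


def data_parallel_pipelined_time(times: Times) -> float:
    """Each data set flows through the chain independently, so the total time
    is the length of the slowest end-to-end path."""
    n_data = len(times[0])
    return max(sum(row[j] for row in times) for j in range(n_data))
```

In this model pipelining only adds a gain on top of data parallelism when execution times vary between data sets: with a matrix such as `[[15, 5], [5, 15]]` the slow items of each stage no longer add up, whereas with constant times the last two functions return the same value.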

Page 13: Abstractions:  Programming and deploying apps. on Grids

Gap Analysis

Are the patterns (applications) well supported?
--> Thanks to the node reconfiguration model, many patterns are well supported

What further abstractions should be considered?
--> Node configuration and deployment are still difficult and require too much effort from the users
--> The network resources should be reserved and isolated

What abstractions have worked for you?
--> Reservation, Isolation, Reconfiguration and Deployment

What abstractions do you feel you need?
--> Reservation, Configuration and Deployment issues

How well will abstractions work with the next generation of infrastructure that your project will use?
--> Reservation, Isolation, Reconfiguration and Deployment will be required for “transparent” Cloud Computing

Page 14: Abstractions:  Programming and deploying apps. on Grids

The notion of energy “conservation”

Programming interface

Compile-time operations & optimizations

Runtime operations & optimizations

Grid Infrastructure

Programming interface (less abstraction but more optimization opportunities)

Compile-time operations & optimizations

Runtime operations & optimizations

Grid Infrastructure

Page 15: Abstractions:  Programming and deploying apps. on Grids

“programming” models & Abstractions

[Table: programming models (rows) versus programming difficulties (columns)]

Programming models:
• Chemical Computing
• Data & Workflow
• Divide & Conquer
• Workflow
• MAP-Reduce
• MPI for Grids
• Grid-RPC
• Set of scripts

Difficulties:
• Organizing the computation
• Tolerating variations
• Scheduling computation & communications
• Implementing computing codes
• Synchronizing task executions
• Implementing global operations
• Selecting the communication protocols
• Dealing with resources (data, computers, etc.)
• Dealing with administration domains

Page 16: Abstractions:  Programming and deploying apps. on Grids

I didn’t know that Grids had to be programmed (??)

• Is there anything so different on Grids that it justifies programming them in a specific way?
• What was the promise? An infrastructure providing resources (data, storage, computing) as the power Grid provides electricity --> Transparently
• So, why should we care about “programming Grids”?
• Because the “abstracting job” is not finished:
– Moving data and programs rapidly (protocols)
– Dealing with several (many) administration domains (VO)
– Dealing with several (many) batch schedulers (interfaces)
– Moving data and jobs in a smart way (control)
– Tolerating the performance variations & failures of resources
– Providing QoS
– Etc.
• Even the “good” software layer(s) in which to implement the abstraction is (are) not stabilized (Middleware, OS, Network?)
• So YES, I still have to program Grids

Page 17: Abstractions:  Programming and deploying apps. on Grids

That’s not a problem

• Why should I care about Grid at all?
• There is a new very promising solution…
• It is cleaner (environment friendly, more abstract, etc.)
• It does not compare with Electricity distribution (the power Grid)
• BUT with Water distribution…

• It’s