Workshop on Grid Applications Programming, July 2004

GRID superscalar: a programming paradigm for GRID applications

CEPBA-IBM Research Institute

Raül Sirvent, Josep M. Pérez, Rosa M. Badia, Jesús Labarta

Outline

• Objective
• The essence
• User’s interface
• Automatic code generation
• Run-time features
• Programming experiences
• Ongoing work
• Conclusions

Objective

• Ease the programming of GRID applications

• Basic idea:

[Figure: processor floorplan (L3 Directory/Control, L2, LSU, IFU, BXU, IDU, FPU, FXU, ISU) shown alongside the Grid; time-scale labels: ns, seconds/minutes/hours]

The essence

• Assembly language for the GRID
  – Simple sequential programming, well-defined operations and operands
  – C/C++, Perl, …
• Automatic run-time “parallelization”
  – Use architectural concepts from microprocessor design
    • Instruction window (DAG), dependence analysis, scheduling, locality, renaming, forwarding, prediction, speculation, …

The essence

for (int i = 0; i < MAXITER; i++) {
    newBWd = GenerateRandom();
    subst (referenceCFG, newBWd, newCFG);
    dimemas (newCFG, traceFile, DimemasOUT);
    post (newBWd, DimemasOUT, FinalOUT);
    if (i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n");
present (fd);
GS_Close(fd);

The essence

[Figure: task graph of the loop above, with repeated Subst, DIMEMAS and Post nodes, Display nodes and a final GS_open, shown with the CIRI Grid]

User’s interface

• Three components:
  – Main program
  – Subroutines/functions
  – Interface Definition Language (IDL) file
• Programming languages: C/C++, Perl

User’s interface

• A typical sequential program
  – Main program:

for (int i = 0; i < MAXITER; i++) {
    newBWd = GenerateRandom();
    subst (referenceCFG, newBWd, newCFG);
    dimemas (newCFG, traceFile, DimemasOUT);
    post (newBWd, DimemasOUT, FinalOUT);
    if (i % 3 == 0) Display(FinalOUT);
}
fd = GS_Open(FinalOUT, R);
printf("Results file:\n");
present (fd);
GS_Close(fd);

User’s interface

• A typical sequential program
  – Subroutines/functions

void dimemas(in File newCFG, in File traceFile, out File DimemasOUT)
{
    char command[200];

    putenv("DIMEMAS_HOME=/usr/local/cepba-tools");
    sprintf(command, "/usr/local/cepba-tools/bin/Dimemas -o %s %s", DimemasOUT, newCFG);
    GS_System(command);
}

void display(in File toplot)
{
    char command[500];

    sprintf(command, "./display.sh %s", toplot);
    GS_System(command);
}

User’s interface

• GRID superscalar programming requirements
  – Main program: open/close files with
    • GS_FOpen, GS_Open, GS_FClose, GS_Close
  – Subroutines/functions
    • Temporary files in a local directory, or ensure uniqueness of name per subroutine invocation
    • GS_System instead of system
    • All input/output files required must be passed as arguments
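
One way to satisfy the requirement that temporary file names be unique per subroutine invocation is sketched below. This is only an illustration: `unique_temp_name` is a hypothetical helper, not part of GRID superscalar, and it assumes a POSIX system for `getpid`. On a working directory shared between hosts, the host name would probably have to be added to the name as well.

```cpp
#include <atomic>
#include <string>
#include <unistd.h>   // getpid (POSIX)

// Hypothetical helper, not part of the GRID superscalar API.
// Builds a file name that differs on every call within a process and
// between processes on the same host, so two invocations of the same
// subroutine never share a temporary file.
std::string unique_temp_name(const std::string& prefix) {
    static std::atomic<unsigned long> counter{0};
    const unsigned long n = counter.fetch_add(1);
    return prefix + "." + std::to_string(static_cast<long>(getpid())) +
           "." + std::to_string(n) + ".tmp";
}
```

A subroutine such as `subst` could call `unique_temp_name("subst")` once per invocation and use the result for its scratch file.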

User’s interface

• Gridifying the sequential program
  – CORBA-IDL-like interface:
    • In/Out/InOut files
    • Scalar values (in or out)
  – The subroutines/functions listed in this file will be executed on a remote server in the Grid.

interface MC {
    void subst(in File referenceCFG, in double newBW, out File newCFG);
    void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
    void post(in File newCFG, in File DimemasOUT, inout File FinalOUT);
    void display(in File toplot);
};
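
The slides do not show what a client-side stub for an IDL entry looks like. As a rough sketch of the idea only, a stub could record the call and the direction of each parameter instead of running the operation locally. Every name below (`Direction`, `Param`, `Task`, `pending_tasks`) is hypothetical; the real stubs are generated by gsstubgen and may differ substantially.

```cpp
#include <string>
#include <vector>

// All names below are hypothetical; the real stubs come from gsstubgen.
enum class Direction { In, Out, InOut };

struct Param {
    Direction dir;
    std::string value;   // file name, or a scalar rendered as text
};

struct Task {
    std::string operation;
    std::vector<Param> params;
};

// Stand-in for the run-time's queue of tasks awaiting scheduling.
std::vector<Task>& pending_tasks() {
    static std::vector<Task> queue;
    return queue;
}

// Stub with the same shape as the IDL entry
//   void dimemas(in File newCFG, in File traceFile, out File DimemasOUT);
// It does not run Dimemas. It only records the call and its parameter
// directions so that a run-time could later place it on a remote server.
void dimemas(const std::string& newCFG, const std::string& traceFile,
             const std::string& DimemasOUT) {
    pending_tasks().push_back(Task{
        "dimemas",
        {{Direction::In, newCFG},
         {Direction::In, traceFile},
         {Direction::Out, DimemasOUT}}});
}
```

Because the stub keeps the signature of the original function, the main program can stay an ordinary sequential program.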

Automatic code generation: C

[Figure: gsstubgen processes app.idl; the files shown are app.h, app.c and app-stubs.c on the client side, and app-worker.c and app-functions.c on the server side]

Run-time features

• Data dependence analysis
  – Detects RaW, WaR, WaW dependencies based on file parameters
  – Tasks’ Directed Acyclic Graph is built based on these dependencies
• File renaming
  – WaW and WaR dependencies are avoidable with renaming
• Shared disks management
  – Supports shared working directories: NFS
  – Allows shared input directories: mirrors of large DBs
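
The dependence analysis and renaming described above can be illustrated with a small sketch. It is not the GRID superscalar implementation: the `Task`, `Edge` and `Dep` types and both functions are hypothetical, the detection compares all task pairs conservatively, and renaming is modelled simply by appending a version number to the file name.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical task description: the files a task reads and writes.
struct Task {
    std::string name;
    std::set<std::string> reads;
    std::set<std::string> writes;
};

enum class Dep { RaW, WaR, WaW };

struct Edge {
    std::size_t from;   // index of the earlier task
    std::size_t to;     // index of the later task
    Dep kind;
    std::string file;
};

// Compare every pair of tasks in program order and emit one edge per
// file-based dependence. Edges always point from an earlier task to a
// later one, so the resulting graph is acyclic.
std::vector<Edge> find_dependences(const std::vector<Task>& tasks) {
    std::vector<Edge> edges;
    for (std::size_t later = 0; later < tasks.size(); ++later) {
        for (std::size_t earlier = 0; earlier < later; ++earlier) {
            const Task& a = tasks[earlier];
            const Task& b = tasks[later];
            for (const std::string& f : a.writes) {
                if (b.reads.count(f))  edges.push_back({earlier, later, Dep::RaW, f});
                if (b.writes.count(f)) edges.push_back({earlier, later, Dep::WaW, f});
            }
            for (const std::string& f : a.reads) {
                if (b.writes.count(f)) edges.push_back({earlier, later, Dep::WaR, f});
            }
        }
    }
    return edges;
}

// Renaming: every write produces a fresh version of the file and later
// reads refer to the newest version, so only RaW dependences remain.
std::vector<Task> rename_files(std::vector<Task> tasks) {
    std::map<std::string, int> version;   // number of writes seen per file
    auto current = [&](const std::string& f) {
        auto it = version.find(f);
        return it == version.end() ? f : f + "." + std::to_string(it->second);
    };
    for (Task& t : tasks) {
        std::set<std::string> reads, writes;
        for (const std::string& f : t.reads) reads.insert(current(f));
        for (const std::string& f : t.writes) {
            ++version[f];
            writes.insert(current(f));
        }
        t.reads = reads;
        t.writes = writes;
    }
    return tasks;
}
```

For two iterations of the example loop, the second `subst` writes `newCFG` again, which creates WaW and WaR edges; after `rename_files` it writes a new version instead and can run independently of the first iteration.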

Run-time features

• Resource brokering and task scheduling
  – Scheduling policy exploits file locality
  – File transfer time vs. execution time tradeoff considered
  – Tasks are submitted for execution as soon as their data dependencies are resolved, if resources are available
  – End of tasks is detected by means of asynchronous callbacks
  – Calls to Globus:
    • globus_gram_client_job_request
    • globus_gram_client_job_status
    • globus_gram_client_job_cancel
    • globus_gram_client_callback_allow
    • globus_poll_blocking
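
The tradeoff between file transfer time and execution time can be pictured with a simple cost model. The sketch below is an assumption made for illustration, not the scheduling policy of the actual run-time: `Resource`, `InputFile`, the bandwidth and execution-time estimates, and both functions are hypothetical, and no Globus calls are involved.

```cpp
#include <cstddef>
#include <limits>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a Grid resource; not taken from GRID superscalar.
struct Resource {
    std::string host;
    std::set<std::string> local_files;  // files already present on this host
    double bandwidth_mb_per_s;          // assumed transfer rate towards this host
    double exec_seconds;                // assumed execution-time estimate for the task
    bool busy;
};

struct InputFile {
    std::string name;
    double size_mb;
};

// Estimated cost = time to bring in the missing inputs + time to execute.
double estimated_seconds(const Resource& r, const std::vector<InputFile>& inputs) {
    double transfer = 0.0;
    for (const InputFile& f : inputs) {
        if (r.local_files.count(f.name) == 0) {
            transfer += f.size_mb / r.bandwidth_mb_per_s;
        }
    }
    return transfer + r.exec_seconds;
}

// Returns the index of the cheapest idle resource, or -1 if all are busy.
int pick_resource(const std::vector<Resource>& resources,
                  const std::vector<InputFile>& inputs) {
    int best = -1;
    double best_cost = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < resources.size(); ++i) {
        if (resources[i].busy) continue;
        const double cost = estimated_seconds(resources[i], inputs);
        if (cost < best_cost) {
            best_cost = cost;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With this model a slower host that already holds a large input file can win over a faster host that would first have to fetch it, which is the locality effect the slide refers to.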

Run-time features

• Communication between workers and master
  – Socket and file mechanisms provided
• Checkpointing at task level
  – Inter-task checkpointing
  – Transparent to application developer
• All based on Globus Toolkit C APIs (version 2.x)
  – Provides authentication and authorization
  – File transfers through gsiftp service
  – Task handling with GRAM service
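
Inter-task checkpointing can be understood as remembering which tasks have already finished, so that a restarted run skips them. The sketch below shows that idea under stated assumptions: the `TaskCheckpoint` class, its log-file format (one task id per line) and the task ids are all hypothetical, and the slides do not describe how the real run-time stores its checkpoints.

```cpp
#include <fstream>
#include <functional>
#include <set>
#include <string>

// Hypothetical inter-task checkpoint: a text file holding one finished
// task id per line. Not the format used by GRID superscalar.
class TaskCheckpoint {
public:
    explicit TaskCheckpoint(const std::string& path) : path_(path) {
        std::ifstream in(path_);
        std::string id;
        while (std::getline(in, id)) done_.insert(id);
    }

    // Runs `work` unless the task already finished in an earlier run.
    // Returns true if the task was executed now.
    bool run_once(const std::string& task_id, const std::function<void()>& work) {
        if (done_.count(task_id)) return false;
        work();
        // Record completion only after the task body has returned.
        std::ofstream out(path_, std::ios::app);
        out << task_id << '\n';
        done_.insert(task_id);
        return true;
    }

private:
    std::string path_;
    std::set<std::string> done_;
};
```

Since the bookkeeping sits outside the task bodies, the application code itself does not change, which matches the "transparent to application developer" point.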

Programming experiences

• Parameter studies (Dimemas, Paramedir)
  – Algorithm flexibility
• NAS Grid Benchmarks
  – Improved flexibility of the component programs
  – Fewer Grid-level source code lines
• Bioinformatics application (production)
  – Improved portability (Globus vs. just LoadLeveler)
  – Fewer Grid-level source code lines
• Pblade solution for bioinformatics

Ongoing work

• Automatic deployment

Ongoing work

• fastDNAml
  – Computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species (Indiana University code)
  – Sequential and MPI (grid-enabled) versions available
  – Porting to GRID superscalar
    • Lower pressure on communications than MPI
    • Simpler code than MPI

Ongoing work

• Run-time: exception handling

try {
    for (int n = 0; n <= 10; n++) {
        if (n > 9) throw "Out of range";
        myarray[n] = 'z';
    }
}
catch (const char* str) {   // a string literal is thrown as const char*
    cout << "Exception: " << str << endl;
}

• Interesting case: throw in workers, catch in main program
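
Throwing in a worker and catching in the main program means the exception has to cross a process or machine boundary, so it cannot propagate as a normal C++ exception. One possible approach, sketched here purely as an assumption about how this ongoing work might look, is to turn the exception into data on the worker and rethrow it on the master. `TaskResult`, `run_on_worker` and `rethrow_in_master` are hypothetical names.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical result message sent from a worker back to the master.
struct TaskResult {
    bool ok;
    std::string error;   // empty when ok is true
};

// Worker side: run the task body and turn any exception into data,
// because an exception cannot cross a process or machine boundary.
TaskResult run_on_worker(const std::function<void()>& body) {
    try {
        body();
        return {true, ""};
    } catch (const std::exception& e) {
        return {false, e.what()};
    } catch (...) {
        return {false, "unknown exception"};
    }
}

// Master side: rethrow the worker's failure inside the main program,
// where an ordinary catch block can handle it.
void rethrow_in_master(const TaskResult& r) {
    if (!r.ok) throw std::runtime_error(r.error);
}
```

In this sketch the original exception type is lost and only the message survives; preserving the type would need an agreed encoding between worker and master.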

Ongoing work

• OGSA-oriented resource broker, based on Globus Toolkit 3.x
• And more future work:
  – Bindings to other basic middleware
    • GAT, Ninf-G2
  – New language bindings (shell script)
  – Run-time performance enhancements guided by performance analysis

Conclusions

• Presentation of the ideas of GRID superscalar
• There is a viable way to ease the programming of Grid applications
• GRID superscalar run-time enables
  – Use of the resources in the Grid
  – Exploiting the existing parallelism

How GAT can help us

• Middleware at a higher level (skip Globus details)
• Avoid changes when Globus changes
• Abstraction for using other Grid middleware
• Resource Broker
• Intra-task checkpointing mechanism
• Interesting GATObjects:
  – GATFile (GATFile_Copy, GATFile_Delete)
  – GATResourceDescription, GATResourceBroker, GATJob

More information

• GRID superscalar home page:

http://people.ac.upc.es/rosab/index_gs.htm

• Rosa M. Badia, Jesús Labarta, Raül Sirvent, Josep M. Pérez, José M. Cela, Rogeli Grima, “Programming Grid Applications with GRID Superscalar”, Journal of Grid Computing, Volume 1 (Number 2): 151-170 (2003).