Charm++ Data-driven Objects
L. V. Kale
Parallel Programming
• Decomposition
– What to do in parallel
• Mapping
– Which processor executes each task
• Scheduling (sequencing)
– The order in which tasks run on each processor
• Machine-dependent expression
– Express the above decisions for the particular parallel machine
The parallel objects model of Charm++ automates mapping, scheduling, and machine-dependent expression.
Shared objects model
• Basic philosophy:
– Let the programmer decide what to do in parallel
– Let the system handle the rest:
• Which processor executes what, and when
• With some override control for the programmer, when needed
• Basic model:
– The program is a set of communicating objects
– Objects only know about other objects (not processors)
– The system maps objects to processors
• And may remap the objects dynamically, e.g. for load balancing
• Shared objects, not shared memory
– In between the “shared nothing” of message passing and the “shared everything” of SAS
– Additional information sharing mechanisms
– “Disciplined” sharing
Charm++
• Charm++ programs specify parallel computations consisting of a number of “objects”
– How do they communicate?
• By invoking methods on each other, typically asynchronously
• Also by sharing data using “specifically shared variables”
– What kinds of objects?
• Chares: singleton objects
• Chare arrays: generalized collections of objects
• Advanced: chare groups (used by library writers and the system)
Data Driven Execution in Charm++
[Figure: two processors, each with its own scheduler, message queue, and objects]
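The per-processor scheduler and message queue can be sketched in plain C++. This is only an illustration under stated assumptions: `Scheduler`, `Counter`, and `demoScheduler` are hypothetical names, a single queue stands in for one processor, and nothing here is the Charm++ runtime API.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical names throughout; this is not the Charm++ runtime API.
struct Counter {              // a toy "object" that reacts to messages
    int total = 0;
    void add(int value) { total += value; }
};

// One scheduler per (simulated) processor: a FIFO queue of pending
// method invocations, each stored as a closure.
class Scheduler {
public:
    // Enqueue an invocation; the caller does not wait for it to run.
    void enqueue(std::function<void()> invocation) {
        assert(invocation != nullptr);   // reject empty closures
        queue_.push(std::move(invocation));
    }

    // Deliver messages one at a time until the queue is empty.
    // Returns the number of messages delivered.
    int run() {
        int delivered = 0;
        while (!queue_.empty()) {
            std::function<void()> next = std::move(queue_.front());
            queue_.pop();
            next();                      // the method runs to completion
            ++delivered;
        }
        return delivered;
    }

private:
    std::queue<std::function<void()>> queue_;
};

// Demo: two messages for one object; the first handler sends a third.
inline int demoScheduler() {
    Scheduler sched;
    Counter counter;
    sched.enqueue([&] {
        counter.add(5);
        sched.enqueue([&] { counter.add(100); });  // sent from inside a handler
    });
    sched.enqueue([&] { counter.add(7); });
    sched.run();
    return counter.total;
}
```

Calling `demoScheduler()` runs the three handlers in arrival order. The point of the design is that objects never block waiting for a message; the scheduler picks whichever invocation is available next.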
Need for Proxies
• Consider:
– Object x of class A wants to invoke method f of object y of class B
– x and y are on different processors
– What should the syntax be?
• y->f(…)? Doesn’t work, because y is not a local pointer
• Needed:
– Instead of “y”, we must use an ID that is valid across processors
– Method invocation should use this ID
– Some part of the system must pack the parameters and send them
– Some part of the system on the remote processor must invoke the right method on the right object with the parameters supplied
Charm++ solution: proxy classes
• Classes with remotely invocable methods
– Inherit from the “chare” class (system defined)
– Entry methods can have only one parameter: a subclass of message
• For each chare class D that has methods we want to invoke remotely
– The system automatically generates a proxy class CProxy_D
– Proxy objects know where the real object is
– Methods invoked on the proxy simply put the data in an “envelope” and send it to the destination
• Each chare object has a global ID
– CkChareID thishandle; // thishandle is inherited from “chare”
– You can also get the ID of a chare when you create it:
• CProxy_D *p = new CProxy_D(msgPtr);
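The proxy idea can be sketched in plain C++ as follows. Everything here is an assumption made for illustration: `GlobalId`, `Envelope`, `Machine`, and `ProxyD` are hypothetical names, the "processors" are simulated inside one process, and the parameters are packed as raw bytes rather than as a Charm++ message.

```cpp
#include <cassert>
#include <cstring>
#include <queue>
#include <vector>

// Hypothetical names; a sketch of the proxy idea, not Charm++ generated code.
struct GlobalId { int pe; int objectIndex; };   // valid on every "processor"

struct Envelope {                // what travels between processors
    GlobalId dest;
    int methodId;
    std::vector<char> payload;   // packed parameters
};

class D {                        // the real object, lives on one processor
public:
    int last = 0;
    void f(int a, int b) { last = a + b; }
};

struct Processor {               // simulated processor: objects plus inbox
    std::vector<D> objects;
    std::queue<Envelope> inbox;
};

struct Machine {
    std::vector<Processor> pes;
    explicit Machine(int n) : pes(n) {}

    // Remote side: unpack each envelope and call the right method
    // on the right object.
    void deliverAll() {
        for (Processor &pe : pes) {
            while (!pe.inbox.empty()) {
                Envelope env = pe.inbox.front();
                pe.inbox.pop();
                assert(env.methodId == 0);   // only f() exists in this sketch
                assert(env.dest.objectIndex < static_cast<int>(pe.objects.size()));
                int args[2];
                assert(env.payload.size() == sizeof(args));
                std::memcpy(args, env.payload.data(), sizeof(args));
                pe.objects[env.dest.objectIndex].f(args[0], args[1]);
            }
        }
    }
};

class ProxyD {                   // stand-in for a generated proxy class
public:
    ProxyD(Machine &m, GlobalId id) : machine_(m), id_(id) {}

    // Same name as D::f, but it only packs the parameters and sends them.
    void f(int a, int b) {
        int args[2] = {a, b};
        Envelope env{id_, 0, std::vector<char>(sizeof(args))};
        std::memcpy(env.payload.data(), args, sizeof(args));
        machine_.pes[id_.pe].inbox.push(env);
    }

private:
    Machine &machine_;
    GlobalId id_;
};

inline int demoProxy() {
    Machine machine(2);
    machine.pes[1].objects.emplace_back();   // object y lives on PE 1
    ProxyD y(machine, GlobalId{1, 0});
    y.f(5, 7);                               // returns before D::f has run
    machine.deliverAll();
    return machine.pes[1].objects[0].last;
}
```

The caller only ever holds the proxy and the global ID; the real `D` is touched solely by the delivery loop on its own processor.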
Chare creation and method invocation
Msg *m = new Msg();
m->arg = 25;
CProxy_D *x = new CProxy_D(m);   // creates a chare of class D
Msg2 *m2 = new Msg2();
m2->a = 5;
m2->b = 7;
x->f(m2);                        // asynchronous invocation of D::f
Alternatively:
x->f(new Msg2(5,7));
Sequential equivalent:
y = new D(25);
y->f(5,7);
Chares (Data-driven Objects)
• Regular C++ classes,
– with some methods designated as remotely invocable (called entry methods)
– Entry methods have only one parameter:
• of type message
• Creation of an instance of chare class C:
– CProxy_C *p = new CProxy_C(msg);
– To create an instance of C on a specified processor “pe”:
• new CProxy_C(msg, pe);
– CProxy_C: a proxy class generated by Charm++ for the chare class C declared by the user
Messages
• A user-defined C++ class
– Inherits from a system-defined class
– Has regular data fields
– Messages can be communicated to other objects as parameters
• Declaration: normal C++
– Inherit from a system-defined class
• Creation: (just usual C++)
– MsgType *m = new MsgType;
Remote method invocation
• Proxy classes:
– For each chare class C, the system generates a proxy class
• (C : CProxy_C)
• Each chare has a global ID (ChareID)
– Global: in the sense of being valid on all processors
– thishandle (analogous to this) gets you the ChareID
– You can send thishandle in messages
– Given a handle h, you can create a proxy:
• CProxy_C p(h); // or q = new CProxy_C(h)
• p.method(msg); // or q->method(msg);
CkChareID mainhandle;

// Execution begins here; the CkArgMsg carries argc/argv
main::main(CkArgMsg * m)
{
  int i = 0;
  for (i=0; i<100; i++)
    new CProxy_piPart();
  responders = 100;
  count = 0;
  mainhandle = thishandle; // readonly initialization
}

void main::results(DataMsg *msg)
{
  count += msg->count;
  delete msg; // the receiving entry method owns the message
  if (0 == --responders) {
    CkPrintf("pi = %f\n", 4.0*count/100000);
    CkExit(); // exit scheduler after method returns
  }
}

piPart::piPart()
{
  // declarations..
  CProxy_main mainproxy(mainhandle);
  srand48((long) this);
  mySamples = 100000/100;
  for (i= 0; i< mySamples; i++) { // exactly mySamples iterations
    x = drand48();
    y = drand48();
    if ((x*x + y*y) <= 1.0) localCount++;
  }
  DataMsg *result = new DataMsg;
  result->count = localCount;
  mainproxy.results(result);
  // or, in one step: mainproxy.results(new DataMsg(localCount));
  delete this;
}
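For readers without a Charm++ installation, the same computation can be sketched sequentially in standard C++. This is an assumption-laden stand-in, not the slide's program: `countHits` plays the role of one piPart, `estimatePi` plays the role of main collecting results, `std::mt19937` replaces drand48, and the per-part seeds are arbitrary.

```cpp
#include <cassert>
#include <random>

// Count random points of the unit square that fall inside the quarter
// circle. Plays the role of one piPart; `seed` is a hypothetical
// per-part seed (the slide seeds from the object address instead).
inline long countHits(long samples, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    long hits = 0;
    for (long i = 0; i < samples; ++i) {
        double x = unit(gen);
        double y = unit(gen);
        if (x * x + y * y <= 1.0) ++hits;
    }
    return hits;
}

// Plays the role of main: split the work into parts and sum the replies.
// In Charm++ the parts would run in parallel and reply asynchronously.
inline double estimatePi(int parts, long totalSamples) {
    assert(parts > 0 && totalSamples >= parts);
    long perPart = totalSamples / parts;
    long total = 0;
    for (int p = 0; p < parts; ++p)
        total += countHits(perPart, 1000u + static_cast<unsigned>(p));
    return 4.0 * static_cast<double>(total) / static_cast<double>(perPart * parts);
}
```

Calling `estimatePi(100, 100000)` mirrors the decomposition on the slide: 100 parts of 1000 samples each, with the ratio of hits scaled by 4.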
Generation of proxy classes
• How does Charm++ generate the proxy classes?
– It needs help from the programmer
– Name the classes and methods that can be remotely invoked
– Declare them in a special “charm interface” file (pgm.ci)
– Include the generated code in your program
pgm.ci
mainmodule PiMod {
  message DataMsg;
  mainchare main {
    entry main();
    entry results(DataMsg *);
  };
  chare piPart {
    entry piPart(void);
  };
};
Generates
PiMod.decl.h
PiMod.def.h
pgm.h
#include "PiMod.decl.h"
..
pgm.C
…
#include "PiMod.def.h"
Charm++
• Data-driven objects
• Message classes
• Asynchronous method invocation
• Prioritized scheduling
• Object arrays
• Object groups:
– A global object with a “representative” on each PE
• Information sharing abstractions
– Readonly data
– Accumulators
– Distributed tables
Object Arrays
• A collection of chares,
– with a single global name for the collection, and
– each member addressed by an index
– Mapping of element objects to processors is handled by the system
[Figure: user’s view shows A[0] A[1] A[2] A[3] A[..] as one array; system view shows elements such as A[0] and A[3] placed on individual processors]
Introduction
• Elements are parallel objects like chares
• Elements are indexed by a user-defined data type: [sparse] 1D, 2D, 3D, tree, ...
• Send messages to an index, receive messages at the element
• Reductions and broadcasts across the array
• Dynamic insertion, deletion, migration, and everything still has to work!
• Interfaces with the automatic load balancer
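One way to picture "send to an index, let the system find the element" is a location table that maps each array index to its current processor. The sketch below is hypothetical: `LocationTable` and its modulo placement rule are assumptions for illustration, and a real runtime would distribute this information rather than keep it in one map.

```cpp
#include <cassert>
#include <map>

// Hypothetical location table for a sparse 1D object array.
class LocationTable {
public:
    explicit LocationTable(int numPes) : numPes_(numPes) { assert(numPes > 0); }

    // Insert a (possibly sparse) index; the initial placement here is
    // simply index modulo the processor count (an assumed policy).
    void insert(int index) {
        home_[index] = ((index % numPes_) + numPes_) % numPes_;
    }

    // Migration changes only the table entry; senders keep using the index.
    void migrate(int index, int destPe) {
        assert(home_.count(index) == 1);
        assert(destPe >= 0 && destPe < numPes_);
        home_[index] = destPe;
    }

    void destroy(int index) { home_.erase(index); }

    // Resolve an index to a processor; returns -1 if no such element.
    int lookup(int index) const {
        std::map<int, int>::const_iterator it = home_.find(index);
        return it == home_.end() ? -1 : it->second;
    }

private:
    int numPes_;
    std::map<int, int> home_;
};
```

Because callers address elements by index only, insertion, deletion, and migration can happen at run time without any sender needing to know which processor currently holds an element.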
1D Declare & Use
In the interface (.ci) file
module m {
  message HiMsg;
  array [1D] Hello {
    entry Hello(void);
    entry void SayHi(HiMsg *);
  };
};
In the .C file
CProxy_Hello p = CProxy_Hello::ckNew();
for (int i=12;i<73;i+=7)
  p[i].insert();
p.doneInserting();
p[12].SayHi(new HiMsg(...));
1D Definition
class Hello:public ArrayElement1D{
public:
  Hello(void) {
    // thisArrayID and thisIndex are inherited from ArrayElement1D
    ... thisArrayID ...
    ... thisIndex ...
  }
  void SayHi(HiMsg *m) { ... }
  Hello(CkMigrateMessage *m) {}
};
3D Declare & Use
module m {
  message HiMsg;
  array [3D] Hello {
    entry Hello(void);
    entry void SayHi(HiMsg *);
  };
};
CProxy_Hello p = CProxy_Hello::ckNew();
for (int i=0;i<800000;i++)
  p(x(i),y(i),z(i)).insert();
p.doneInserting();
p(12,23,7).SayHi(new HiMsg(...));
3D Definition
class Hello:public ArrayElement3D{
public:
  Hello(void) {
    ... thisArrayID ...
    ... thisIndex.x, thisIndex.y, thisIndex.z ...
  }
  void SayHi(HiMsg *m) { ... }
  Hello(CkMigrateMessage *m) {}
};
3D Definition (with pup)
class Hello:public ArrayElement3D{
public:
  Hello(void) {
    ... thisArrayID ...
    ... thisIndex.x, .y, .z ...
  }
  void SayHi(HiMsg *m) { ... }
  Hello(CkMigrateMessage *m) {}
  void pup(PUP::er &p) {
    ArrayElement3D::pup(p);
    p(myVar1); p(myVar2);
    ...
  }
};
Generalized “arrays”: Declare & Use
module m{
  message HiMsg;
  array [Foo] Hello {
    entry Hello(void);
    entry void SayHi(HiMsg *);
  };
};
CProxy_Hello p = CProxy_Hello::ckNew();
for (...)
  p[CkArrayIndexFoo(..)].insert();
p.doneInserting();
p[CkArrayIndexFoo(..)].SayHi(..);
General Definition
class Hello:public ArrayElementT<CkArrayIndexFoo>{
public:
  Hello(void) { ... thisIndex ...
class CkArrayIndexFoo: public CkArrayIndex{
  Bar b; // char b[8]; float b[2]; ..
public:
  CkArrayIndexFoo(...) { ... nInts=sizeof(b)/sizeof(int); }
};
Collective ops
Broadcast message SayHi:
p.SayHi(new HiMsg(...));
Reduce x across all elements:
contribute(sizeof(x),&x,CkReduction::sum_int);
Where do reduction results go? To a reduction “client” function, registered by the caller (typically as soon as the array is created):
CProxy_A a = CProxy_A::ckNew();
a.setReductionClient(clientFunction, (void *) refData);
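The contribute-then-client pattern can be sketched in plain C++. `SumReduction` and `demoReduction` are hypothetical names, the reduction is a simple integer sum over a known number of contributors, and the client is an ordinary callback rather than a Charm++ reduction client.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical sum reduction over a fixed number of contributors.
class SumReduction {
public:
    SumReduction(int expected, std::function<void(int)> client)
        : remaining_(expected), client_(std::move(client)) {
        assert(expected > 0);
        assert(client_ != nullptr);
    }

    // Each element calls this once; the client runs exactly once,
    // after the last contribution has arrived.
    void contribute(int x) {
        assert(remaining_ > 0);
        sum_ += x;
        if (--remaining_ == 0) client_(sum_);
    }

private:
    int remaining_;
    int sum_ = 0;
    std::function<void(int)> client_;
};

inline int demoReduction() {
    int result = -1;                       // stays -1 until the client fires
    SumReduction red(4, [&](int total) { result = total; });
    for (int i = 1; i <= 4; ++i) red.contribute(i * 10);
    return result;
}
```

Registering the client up front means no element needs to know who consumes the result; each one just contributes its local value.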
Migration support
Delete element i:
p[i].destroy();
Migrate to processor destPe:
migrateMe(destPe);
Enable the load balancer: by creating a load balancing object
Provide pack/unpack functions:
Each object that needs this provides a “pup” method. (pup is a single abstraction that allows data traversal for determining size, packing, and unpacking.)
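The "single abstraction" behind pup can be sketched as one traversal method driven by an object that is in sizing, packing, or unpacking mode. `Pupper`, `Element`, and `migrateCopy` are hypothetical names; this sketch handles only trivially copyable fields and is not the PUP::er class from Charm++.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a PUP::er-like traversal object.
class Pupper {
public:
    enum Mode { Sizing, Packing, Unpacking };

    explicit Pupper(Mode mode, std::vector<char> *buffer = nullptr)
        : mode_(mode), buffer_(buffer) {
        assert(mode == Sizing || buffer != nullptr);
    }

    // One call per field; what happens depends on the current mode.
    template <typename T>
    void operator()(T &value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "sketch supports only trivially copyable fields");
        if (mode_ == Packing) {
            const char *bytes = reinterpret_cast<const char *>(&value);
            buffer_->insert(buffer_->end(), bytes, bytes + sizeof(T));
        } else if (mode_ == Unpacking) {
            assert(offset_ + sizeof(T) <= buffer_->size());
            std::memcpy(&value, buffer_->data() + offset_, sizeof(T));
        }
        offset_ += sizeof(T);   // in Sizing mode this is the only effect
    }

    std::size_t size() const { return offset_; }

private:
    Mode mode_;
    std::vector<char> *buffer_;
    std::size_t offset_ = 0;
};

struct Element {
    int myVar1 = 0;
    double myVar2 = 0.0;
    // The same method serves sizing, packing, and unpacking.
    void pup(Pupper &p) { p(myVar1); p(myVar2); }
};

// Simulated migration: size, pack on the source, unpack at the destination.
inline Element migrateCopy(Element &source) {
    Pupper sizer(Pupper::Sizing);
    source.pup(sizer);
    std::vector<char> wire;
    wire.reserve(sizer.size());
    Pupper packer(Pupper::Packing, &wire);
    source.pup(packer);
    Element arrived;
    Pupper unpacker(Pupper::Unpacking, &wire);
    arrived.pup(unpacker);
    return arrived;
}
```

Writing the field list once keeps the three operations consistent: a field added to `pup` is automatically sized, packed, and unpacked in the same order.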
Object Groups
• A group of objects (chares)
– with exactly one representative on each processor
– A single ID for the group as a whole
– Invoke methods in a branch (asynchronously), all branches (broadcast), or in the local branch
– Creation:
• groupId = new CProxy_C(msg)
– Remote invocation:
• CProxy_C p(groupId);
• p.methodName(msg); // p.methodName(msg, peNum);
• p.LocalBranch->f(….);
Information sharing abstractions
• Observation:
– Information is shared in several specific modes in parallel programs
• Other models support only a limited set of modes:
– Shared memory: everything is shared (a sledgehammer approach)
– Message passing: messages are the only method
• Charm++ identifies and supports several modes:
– Readonly / writeonce
– Tables (hash tables)
– Accumulators
– Monotonic variables
Compiling Charm++ programs
• Need to define an interface specification file
– mod.ci for each module mod
– Contains declarations that the system uses to produce proxy classes
– These produced classes must be included in your mod.C file
– See examples provided on the class web site
• More information:
– Manuals, example programs, papers
• http://charm.cs.uiuc.edu
• These slides are currently at:
– http://charm.cs.uiuc.edu/kale/cse320
Fortran 90 version
• Quick implementation on top of Charm++
• How to use:
– Follow the example program, with the same basic concepts
– Only use object arrays, for now
• Most useful construct
• Object groups can be implemented in C++, if needed