DIANE Overview

Germán Carrera, Alfredo Solano (CNB/CSIC)

EMBRACE COURSE

Monday 19th of February to Friday 23rd. CNB-CSIC Madrid


• DIANE is a lightweight distributed framework for parallel scientific applications in the master-worker model. It assumes that a job may be split into a number of independent tasks, which is typical of many scientific applications.

• Unlike standard message passing libraries such as MPI, the DIANE framework takes care of all synchronization, communication and workflow management details on behalf of the application. The execution of a job is fully controlled by the framework, which decides when and where the tasks are executed.

• DIANE is a thin software layer which easily works on top of more fundamental middleware such as LSF, PBS or the Grid Resource Brokers. It may also work in a standalone mode and does not require any complex underlying software.


• Main Features and Design Principles
– The big picture


• Master-Worker Workflow Model
– DIANE is based on a pull model: workers ask the master for tasks. The master decides how to assign tasks to workers, and the user may optimize this process for a particular application.
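The pull model can be illustrated with a small sketch. This is not DIANE code: it uses only the Python standard library, and the names `run_pull_model` and `worker` are hypothetical. A shared queue plays the role of the master, and each worker thread asks it for the next task whenever it is free.

```python
import queue
import threading


def run_pull_model(tasks, n_workers=2):
    """Toy pull-model run: each worker asks the shared queue for its next task."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = {}
    lock = threading.Lock()

    def worker(wid):
        while True:
            try:
                # The worker pulls a task; nothing is pushed to it.
                task = pending.get_nowait()
            except queue.Empty:
                return  # no tasks left, the worker retires
            value = task * task  # stand-in for real work
            with lock:
                results[task] = (wid, value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_pull_model(range(10), n_workers=3)
print(sorted((task, value) for task, (wid, value) in results.items()))
```

Which worker handles which task depends on timing, so faster workers naturally take more tasks. In this sketch the assignment policy is simply the queue order; swapping in a `queue.PriorityQueue` would be one way to model an application-specific ordering.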


• Active feedback versus batch operation mode
– Typically the End User interacts with the GRID using some sort of User Interface. The User Interface may be as simple as a set of command-line tools or a more complicated GUI-based application which contains modules to prepare and monitor jobs (Application and Job Handler respectively).


• In the active feedback operation mode each Worker pulls new subjobs as soon as it becomes available. Fast feedback to the Job Master allows interactive work for the end user.
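As a rough illustration of active feedback (again not DIANE code), the sketch below uses the standard `concurrent.futures` module to integrate each subjob result as soon as it arrives, so a running total is available while the job is still in progress. The names `subjob` and `run_with_feedback` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def subjob(n):
    return n * n  # stand-in for a real subjob


def run_with_feedback(inputs, n_workers=2):
    """Integrate results one by one, in completion order rather than submission order."""
    partial_sums = []  # what an interactive client could display while the job runs
    total = 0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(subjob, n) for n in inputs]
        for done in as_completed(futures):
            total += done.result()
            partial_sums.append(total)
    return total, partial_sums


total, partial_sums = run_with_feedback(range(5))
print(total)
```

In a batch mode, by contrast, the user would only see the result after every subjob had finished.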


• Core framework
– The DIANE core framework does not depend on any concrete application (in particular any data analysis software) and is explicitly designed so that application-specific parts are implemented as a separate component. The core framework is implemented in Python, with CORBA running in the backend in a way that is completely transparent to applications.

• Supported languages for applications
– C++ and Python application components are supported directly and may be configured at runtime according to different usage scenarios (as threads or separate processes). Applications written in any other language (FORTRAN, Java) may also be used in the form of an executable file.

• Error Recovery
– Users may specify customized error recovery policies if needed. A set of default policies is provided and may be used immediately. Users may easily write and add special recovery policies by implementing simple Python functions.
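The slides do not show what a recovery policy function looks like in DIANE, so the following is only a sketch of the general idea under assumed names. The `policy(task_id, attempt, error)` signature, the `"retry"`/`"skip"` return values and the `run_task` helper are all hypothetical and are not DIANE's API.

```python
def retry_up_to(max_attempts):
    """Build a policy: retry a failed task until max_attempts is reached, then skip it."""
    def policy(task_id, attempt, error):
        # Hypothetical signature; DIANE's real policy interface is not shown in the slides.
        return "retry" if attempt < max_attempts else "skip"
    return policy


def run_task(task_id, action, policy):
    """Run one task, consulting the policy each time the action raises."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return ("ok", action())
        except Exception as error:
            decision = policy(task_id, attempt, error)
            if decision == "retry":
                continue
            return (decision, None)


calls = {"n": 0}


def flaky():
    """Fail on the first two calls, then succeed."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("simulated failure")
    return 42


outcome = run_task(1, flaky, retry_up_to(5))
print(outcome)
```

Keeping the policy as a plain function separate from the code that runs the task is what makes it easy to swap one policy for another.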


• Job Monitoring and Outbound Connectivity from Worker Nodes
– Remote client (user) gets full information about the state of a job and may connect and disconnect at any time. Administrator may set up any number of proxies between Client and Master so outbound connectivity from worker nodes is not required. In this way DIANE may be very easily adapted to local policies of computing centers.

– Example: each of the commands below is executed on a different machine.

Connecting remote client directly to job master:

% diane.startmaster --job=test # cluster
% diane.startclient --job=test # end user

Connecting remote client through a proxy:

% diane.startmaster --job=test # cluster
% diane.startclient --proxy # proxy on a gateway machine of the cluster
% diane.startclient --job=test # end user


• Simple single-user job execution on a local cluster
– A user of LSF at CERN may start a new job, running the master on a local desktop machine while submitting the workers as individual jobs to LSF:

% diane.startjob --job=test --workers=30 --broker=LSF --broker-options=-q8nm

• Software building blocks
– Master/Worker components may be arranged in a variety of ways to build more sophisticated systems or to integrate into existing frameworks.


• How DIANE fits into the GRID picture
– DIANE runs on top of low-level GRID services.


• Quick start
– JobInitData contains application-specific parameters for the 'crashTest' application, which is used to simulate application failures in different time patterns. Here we will use it to make sure everything was installed correctly.

diane.startjob -j $DIANE_TOP/dev/workspace/testOK.job -w2@localhost --wms=xterm

DIANE: 22:18:39: Initializing: appname = crashTest
DIANE: 22:18:39: starting new job: id = 2
DIANE: 22:18:39: number of registered workers = 0
DIANE: 22:18:39: client running...
[<function app_ok at 0x8294904>, 5.8641818780727872]
[<function app_ok at 0x8294904>, 10.35566135792468]
[<function app_ok at 0x8294904>, 11.051037211240827]
[<function app_ok at 0x8294904>, 10.967285308043389]
[<function app_ok at 0x8294904>, 9.5686214756534991]
[<function app_ok at 0x8294904>, 4.4414560806191457]
[<function app_ok at 0x8294904>, 11.219275775397689]
[<function app_ok at 0x8294904>, 8.9302782987551801]
[<function app_ok at 0x8294904>, 11.908280602567558]
[<function app_ok at 0x8294904>, 9.5101635521295869]
DIANE: 22:18:39: job plan: #10 tasks
<thread:JobControl>: 22:18:39: current job processing time: 0 s


• At the same time two xterm windows should pop up automatically: you will see each worker immediately put to work by the master, and tasks successively dispatched, executed and integrated.

switching to current user job workspace: /home/moscicki/diane.workspace/jobs/2
DIANE: 22:19:41: reading master address from the default file: MasterOID
DIANE: 22:19:42: registering new worker with wid = 1
worker: 22:19:42: initializing job 2, worker id 1
worker: 22:19:42: job initialization finished with the status: ok
<thread:JobControl>: 22:19:43: dispatching taskid=1 to worker wid=1
worker: 22:19:43: starting task #1
doing action: <function app_ok at 0x82d8294> sleeping: 5.8641818780727872
worker: 22:19:48: task 1 finished with the status: ok
DIANE: 22:19:48: recieved result, taskid =1 status: ok from worker: 1
integrating result... waiting...5.8641818780727872
DIANE: 22:19:54: Integrated result successfully...
<thread:JobControl>: 22:19:54: dispatching taskid=2 to worker wid=1
worker: 22:19:54: starting task #2
doing action: <function app_ok at 0x82d8294> sleeping: 10.35566135792468
worker: 22:20:05: task 2 finished with the status: ok


• At the end of job execution you should see output like this:

<thread:JobControl>: 22:22:55: job completed ok, quitting control loop

DIANE: 22:22:55: notifying workers about finished job

DIANE: 22:22:55: deactivating master

worker: 22:22:55: notification from master: job 2 finished

worker: 22:22:55: worker cleanup status: ok

DIANE: 22:22:55: Trying to terminate server...

DIANE: 22:22:55: notifying client

Job terminated, id= 2

Summary =

DIANE: 22:22:55: Trying to terminate server...

DIANE: 22:22:55: job output in: /home/moscicki/diane.workspace/jobs/2


• You can construct applications by creating Planner, Integrator and Worker objects in Python.

• You decide what data structures are exchanged between these objects.

• More examples may be found in $DIANE_TOP/dev/applications.


import random

class Planner:
    def env_createPlan(self, jobData, chunkNum):
        init_list = []

        random.seed(jobData[1])

        prob = jobData[2]
        avg_wait = jobData[3]
        std_dev = jobData[4]
        # ... (elided on the slide; `failures`, used below, is defined in code not shown)

        # One [action, wait time] pair per task; jobData[0] is the number of tasks.
        for i in range(jobData[0]):
            action = random.choice(list(failures.values()))
            init_list.append([action, random.gauss(avg_wait, std_dev)])

        return (None, init_list)


import time

class Integrator:

    def env_init(self, job_data):
        pass

    def env_addPartialOutput(self, wait):
        # Simulate integration work by sleeping for the reported time.
        print("integrating result... waiting..." + repr(wait))
        if wait > 0:
            time.sleep(wait)
        return 1

    def env_getResult(self):
        return None


import time

class Worker:
    def env_init(self, init_data):
        return 1

    def env_performWork(self, what):
        # `what` is one [action, wait] pair produced by the Planner.
        action = what[0]
        wait = what[1]
        print("doing action: " + str(action) + " sleeping: " + repr(wait))
        if wait > 0:
            time.sleep(wait)
        return action(wait)

    def env_done(self):
        return 1
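To see how the three roles might fit together, here is a minimal single-process sketch. It reuses the method names from the slides (`env_createPlan`, `env_init`, `env_performWork`, `env_addPartialOutput`, `env_getResult`, `env_done`), but the `run_locally` driver and the toy class bodies are hypothetical. It is also an assumption that the first element of the tuple returned by `env_createPlan` is passed to the worker's `env_init`, since the slides do not say. The real framework distributes these calls across machines.

```python
class Planner:
    def env_createPlan(self, jobData, chunkNum):
        # One task per integer; the tuple shape mirrors the slides' (None, init_list).
        return (None, list(range(jobData)))


class Worker:
    def env_init(self, init_data):
        return 1

    def env_performWork(self, what):
        return what * what  # stand-in for real work

    def env_done(self):
        return 1


class Integrator:
    def env_init(self, job_data):
        self.total = 0

    def env_addPartialOutput(self, output):
        self.total += output
        return 1

    def env_getResult(self):
        return self.total


def run_locally(job_data):
    """Hypothetical local driver: plan, perform each task, integrate each output."""
    planner, worker, integrator = Planner(), Worker(), Integrator()
    init_data, tasks = planner.env_createPlan(job_data, chunkNum=1)
    integrator.env_init(job_data)
    worker.env_init(init_data)  # assumption: init_data is what the worker receives
    for task in tasks:
        integrator.env_addPartialOutput(worker.env_performWork(task))
    worker.env_done()
    return integrator.env_getResult()


result = run_locally(4)
print(result)
```

Here the task and its output are plain integers, which reflects the point that the application decides what data structures are exchanged between the objects.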


• DIANE is free software under the GPL license.

• You can download it at: http://ganga.web.cern.ch/