Jakub.Moscicki@cern.ch, DIANE Project, CHEP 03

DIANE

Distributed Analysis Environment for semi-interactive simulation and analysis in Physics

Jakub T. Moscicki, CERN/IT

The need for distribution

run the analysis/simulation job as parallel tasks

to speed up the work

by using powerful, worldwide distributed computational resources,

accessing data in mass storage systems that would otherwise be too big to fit on your laptop.

Practical Example

example: simulation with analysis

each task produces a file with histograms

job result = sum of histograms produced by tasks

master-worker model

client starts a job

workers perform tasks and produce histograms

master integrates the results
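The master-worker model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DIANE's API: a histogram is modelled as a plain list of bin counts with a fixed, hypothetical binning (`N_BINS`), each task fills it with random numbers standing in for simulated events, and the master adds the task histograms bin by bin. Workers are threads here; the same structure applies when they are processes or remote machines.

```python
"""Minimal master-worker sketch: tasks fill histograms, the master sums them.

Hypothetical illustration only; histograms are plain lists of bin counts.
"""
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List

N_BINS = 10  # assumed fixed binning shared by all tasks


def run_task(seed: int, n_events: int = 1000) -> List[int]:
    """Worker side: 'simulate' n_events and fill a histogram."""
    rng = random.Random(seed)  # per-task seed keeps tasks independent and reproducible
    histogram = [0] * N_BINS
    for _ in range(n_events):
        value = rng.random()  # stand-in for a simulated quantity in [0, 1)
        histogram[int(value * N_BINS)] += 1
    return histogram


def add_histograms(histograms: List[List[int]]) -> List[int]:
    """Master side: job result = bin-by-bin sum of the task histograms."""
    total = [0] * N_BINS
    for h in histograms:
        for i, count in enumerate(h):
            total[i] += count
    return total


def run_job(n_tasks: int, n_workers: int) -> List[int]:
    """Client starts a job, workers perform tasks, master integrates results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_task, range(n_tasks)))
    return add_histograms(results)


if __name__ == "__main__":
    result = run_job(n_tasks=8, n_workers=4)
    print(sum(result))  # prints 8000: 8 tasks x 1000 entries each
```

Replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` maps the workers to local processes instead of threads without changing the rest of the code.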

Tools at hand: local batch queue

clusters/farms of PCs running batch queues

use LSF or PBS to submit parallel analysis tasks producing histograms

collect and post-process results by hand

add all the resulting histogram files

> foreach i (1 2 3 4 5 6 7 8 9 10)
> bsub -q 8nh run-worker
> end

Job <250973> is submitted to queue <8nh>.
Job <250974> is submitted to queue <8nh>.
...

> ls
LSFJOB_250973  LSFJOB_250974  LSFJOB_250975
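The manual post-processing step (adding all the resulting histogram files) amounts to something like the following sketch. The file layout is an assumption made for illustration: each `LSFJOB_<id>` directory is taken to contain a hypothetical `histo.txt` holding whitespace-separated integer bin counts, with the same binning for every task. Real HBOOK files would need an HBOOK-aware tool instead of plain text parsing.

```python
"""Sketch: add up per-task histogram files collected from LSF job directories.

Assumed (hypothetical) layout: LSFJOB_<id>/histo.txt with whitespace-separated
integer bin counts, identical binning in every file.
"""
from pathlib import Path
from typing import List, Tuple


def read_histogram(path: Path) -> List[int]:
    """Read whitespace-separated integer bin counts from a text file."""
    return [int(token) for token in path.read_text().split()]


def merge_job_outputs(base_dir: Path, filename: str = "histo.txt") -> Tuple[List[int], List[str]]:
    """Sum all task histograms; also report job directories with no output."""
    total: List[int] = []
    missing: List[str] = []
    for job_dir in sorted(base_dir.glob("LSFJOB_*")):
        hist_file = job_dir / filename
        if not hist_file.exists():
            # e.g. a failed task: remember it so it can be resubmitted by hand
            missing.append(job_dir.name)
            continue
        counts = read_histogram(hist_file)
        if not total:
            total = [0] * len(counts)  # binning taken from the first file found
        if len(counts) != len(total):
            raise ValueError(f"binning mismatch in {hist_file}")
        total = [a + b for a, b in zip(total, counts)]
    return total, missing
```

Even this small script has to decide what to do with jobs that left no output, which is exactly the bookkeeping the next slides complain about.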

Tools at hand: global batch queue

federation of clusters, also known as a GRID

use the EDG Resource Broker to submit tasks

> dg-job-submit worker.jdl

Connecting to host grid014.ct.infn.it, port 7771
Logging to host grid014.ct.infn.it, port 15830

******************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Resource Broker.
Use dg-job-status command to check job current status. Your job identifier (dg_jobId) is:

- https://grid014.ct.infn.it:7846/137.138.181.249/195456283026315?grid014.ct.infn.it:7771

******************************************************************************************

Comments

using middleware directly requires a lot of manual work:

integration of task results

keeping track of failed tasks and resubmitting workers

not easy to monitor the job progress and cancel jobs

only one task per worker: very inefficient if worker initialization time is long
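The cost of running only one task per worker can be estimated with simple arithmetic. The model below is an illustrative assumption: every task takes `t_task` seconds, every worker start-up takes `t_init` seconds, `n_workers` run in parallel, and there are no other overheads. A fresh batch job per task pays `t_init` for every task, while a persistent worker pays it once and then processes several tasks.

```python
"""Back-of-the-envelope wall-time comparison of two worker models.

Illustrative assumptions: uniform task time, uniform start-up time,
n_workers running in parallel, no scheduling or transfer overheads.
"""
import math


def one_task_per_worker(n_tasks: int, n_workers: int, t_init: float, t_task: float) -> float:
    # every task is a separate batch job, so each one pays the start-up cost
    rounds = math.ceil(n_tasks / n_workers)
    return rounds * (t_init + t_task)


def persistent_workers(n_tasks: int, n_workers: int, t_init: float, t_task: float) -> float:
    # each worker initializes once, then processes its share of the tasks
    rounds = math.ceil(n_tasks / n_workers)
    return t_init + rounds * t_task
```

With the illustrative values of 100 tasks, 10 workers, `t_init` = 60 s and `t_task` = 30 s, the first model gives 10 × 90 = 900 s and the second 60 + 10 × 30 = 360 s.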

User Wishlist

automatic integration of task results

monitoring of job progress and individual tasks

automatic error-recovery policies

task granularity may change independently of the number of workers: natural load balancing and performance optimization

performance fine-tuning: workers may be mapped to threads, processes or machines depending on the context

a uniform, transparent and easy-to-use user interface and API hiding the complexity of the underlying middleware mechanisms

the same API and UI are used when running local jobs and GRID jobs

batch, interactive and semi-interactive operation modes
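Several wishlist items (task granularity independent of the number of workers, natural load balancing, automatic error recovery and progress monitoring) fit a pull-based master. The sketch below is a hypothetical illustration, not DIANE's implementation: workers are threads that pull tasks from a shared queue until it is empty, so faster workers simply take more tasks; a task that raises an exception is put back on the queue up to `max_retries` times; and an optional `on_progress` callback reports how many tasks have finished.

```python
"""Sketch of a pull-based master with retries and a progress hook.

Hypothetical names; workers are threads pulling from a shared queue.
"""
import queue
import threading
from typing import Any, Callable, Dict, List, Optional, Tuple


def run_master(tasks: List[Any], worker_fn: Callable[[Any], Any], n_workers: int = 4,
               max_retries: int = 2,
               on_progress: Optional[Callable[[int, int], None]] = None
               ) -> Tuple[Dict[int, Any], Dict[int, str]]:
    pending: "queue.Queue[Tuple[int, Any, int]]" = queue.Queue()
    for task_id, payload in enumerate(tasks):
        pending.put((task_id, payload, 0))  # (id, payload, retries so far)

    results: Dict[int, Any] = {}
    failed: Dict[int, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                task_id, payload, attempts = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker stops
            try:
                value = worker_fn(payload)
            except Exception as exc:
                if attempts < max_retries:
                    # error recovery: resubmit; this worker keeps looping,
                    # so the retried task is never left unprocessed
                    pending.put((task_id, payload, attempts + 1))
                    continue
                with lock:
                    failed[task_id] = repr(exc)
                    done = len(results) + len(failed)
            else:
                with lock:
                    results[task_id] = value
                    done = len(results) + len(failed)
            if on_progress is not None:
                on_progress(done, len(tasks))  # monitoring hook

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, failed
```

Because workers pull work rather than being assigned a fixed share, choosing many small tasks gives finer load balancing at the cost of more master-worker communication.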

Wishlist (cont'd)

a lightweight “add-on” framework which drives the execution of parallel jobs in the master-worker model on top of any specific middleware implementation:

application-oriented: targets common HEP use cases

independent of any particular analysis tool

with a layered, modular architecture that is easy to adapt to new environments: important for middleware transitions

integrated in a modern scripting environment, e.g. Python

using standards, e.g. exploiting AIDA for analysis, making it easy to plug in your favourite analysis tool

To address these issues, the DIANE Project was set up at CERN/IT.
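A layered, middleware-independent framework can be approximated by keeping the job logic behind an abstract backend interface, so that the same user-facing call runs local jobs and grid jobs. All names below are hypothetical; `GridBackend` is only a stub marking where an adapter for real middleware (for instance LSF or EDG submission tools) would go.

```python
"""Sketch of a layered design: job logic talks only to an abstract backend.

All class and function names are hypothetical illustrations.
"""
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List


class Backend(ABC):
    """Middleware layer: the only part that changes when the middleware changes."""

    @abstractmethod
    def run_tasks(self, fn: Callable[[Any], Any], payloads: Iterable[Any]) -> List[Any]:
        """Execute fn on every payload and return the results in order."""


class LocalBackend(Backend):
    """Runs workers as local threads."""

    def __init__(self, n_workers: int = 4) -> None:
        self.n_workers = n_workers

    def run_tasks(self, fn: Callable[[Any], Any], payloads: Iterable[Any]) -> List[Any]:
        with ThreadPoolExecutor(max_workers=self.n_workers) as pool:
            return list(pool.map(fn, payloads))


class GridBackend(Backend):
    """Placeholder: a real adapter would submit workers via the middleware."""

    def run_tasks(self, fn: Callable[[Any], Any], payloads: Iterable[Any]) -> List[Any]:
        raise NotImplementedError("grid submission is not part of this sketch")


def run_job(backend: Backend, fn: Callable[[Any], Any], payloads: Iterable[Any],
            combine: Callable[[List[Any]], Any]) -> Any:
    """User-facing API: identical whatever backend is plugged in."""
    return combine(backend.run_tasks(fn, payloads))
```

For example, `run_job(LocalBackend(2), lambda x: x * 2, range(5), sum)` combines the task results with `sum`; in a histogram job, `combine` would be the bin-by-bin addition of task histograms.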

DIANE Overview

DIANE R&D Project

started in 2001 at CERN/IT with very limited resources (~1 FTE)

collaboration with Geant4 groups at CERN, INFN and ESA

successful prototypes running on LSF and EDG

Applications of DIANE

Examples of interdisciplinary applications

Geant4 simulation and analysis

speed-up factor of ~30

cern.ch/diane

LHC: ntuple analysis and simulation

radiotherapy: brachytherapy, IMRT

space missions: ESA Bepi Colombo, LISA

DIANE for HEP workgroup clusters

features:

many users, many jobs

diverse applications: ntuple analysis, simulation, ...

interactive ... semi-interactive ... batch

~ 100s of machines

dynamic environment: users may submit their analysis code

some applications may be preconfigured

mixed CPU and I/O intensive

general analysis, e.g. ntuple projections, or experiment-specific apps

load balancing important

DIANE for Simulation in Medical Apps

example: brachytherapy, optimization of the treatment planning by MC simulation

features:

CPU intensive

few users, few jobs

one preconfigured application

interactive: seconds .. minutes

~ 10s of machines

ongoing joint collaboration with G4 and hospital units in Torino, Italy

DIANE for Simulation in Space Science

LISA: MC simulation for a gravitational-wave experiment

Bepi Colombo mission: HERMES experiment

features:

CPU intensive

big jobs (10 processor-years)

preconfigured applications

batch: days

1000+ machines

requirements:

error recovery important

monitoring and diagnostics

DIANE Prototype and Testing

scalability tests

70 worker nodes

140 million Geant4 events

DIANE Screenshot

Sun Mar 16 14:58:31 2003 : DIANE.JobMaster.workerReady : worker 5 now ready
Sun Mar 16 14:58:42 2003 : DIANE.JobMaster<ControlThread>.run : number of tasks to finish: 1
len(self.master.job_progress) : 5
len(self.master.ready_workers) : 9
len(self.master.busy_workers) : 1
len(self.master.registered_workers):10

Sun Mar 16 14:58:45 2003 : DIANE.JobMaster.receiveTaskResult : recieved result, taskid =3 status: ok

Processing file task-output2.hbk
Adding histogram 10
Adding histogram 20
Scanned all IDs from 0 to 100, other HBOOK ids (if any) were ignored
Sun Mar 16 14:58:45 2003 : DIANE.JobMaster<ControlThread>.run : job completed ok, quitting control loop
DIANE.JobMaster<ControlThread>.notifyJobFinished : starting notification
DIANE.JobMaster<ControlThread>.notifyJobFinished : deactivating master
DIANE.JobMaster.workerReady : master not activated
DIANE.JobMaster<ControlThread>.sendResultToClient : terminated...
terminating JobMaster server process
312.520u 77.250s 15:09.53 42.8% 0+0k 0+0io 5835pf+0w

[1] Done start_master
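Monitoring tools can build on the master's log output. Below is a sketch of a parser for the timestamped lines in this screenshot, assuming the layout `<date> : <component> : <message>` inferred from these few lines only; lines without that layout (such as the histogram-merging messages) are skipped by returning `None`. The names `LogEntry` and `parse_log_line` are hypothetical.

```python
"""Sketch: split a timestamped master log line into its three fields.

Layout assumption inferred from the screenshot: "<date> : <component> : <message>".
"""
from datetime import datetime
from typing import NamedTuple, Optional


class LogEntry(NamedTuple):
    time: datetime
    component: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None for lines that are not timestamped entries."""
    parts = line.split(" : ", 2)  # at most 3 fields; the message may contain ':'
    if len(parts) != 3:
        return None
    try:
        # e.g. "Sun Mar 16 14:58:45 2003"
        time = datetime.strptime(parts[0].strip(), "%a %b %d %H:%M:%S %Y")
    except ValueError:
        return None  # first field is not a date
    return LogEntry(time, parts[1].strip(), parts[2].strip())
```

Filtering the parsed entries by component (for instance `DIANE.JobMaster.receiveTaskResult`) gives a simple per-task progress view.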

DIANE Web Interface

References

more information:

cern.ch/diane

www.ge.infn.it/geant4/techtransf

aida.freehep.org

The end