Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

26
VIP Virtual Imaging Platform VIP ANR-09-COSI-013-02 1 www.creatis.insa-lyon.fr/vip Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform Rafael FERREIRA DA SILVA 1 , Sorina CAMARASU-POP 1 , Baptiste GRENIER 3 Vanessa HAMAR 2 , David MANSET 3 , Johan MONTAGNAT 4 , Jérôme REVILLARD 3 Javier ROJAS BALDERRAMA 4 , Andrei TSAREGORODTSEV 2 , Tristan GLATARD 1 1 Université de Lyon, CNRS, INSERM, CREATIS 2 Centre de Physique des Particules de Marseille 3 maatG France 4 CNRS/UNS, I3S lab, MODALIS team Bristol, HealthGrid 2011 1 www.creatis.insa-lyon.fr/vip

description

Presentation held at Healthgrid Conference 2011 - Bristol, England Abstract. This paper presents the architecture of the Virtual Imaging Platform supporting the execution of medical image simulation workflows on multiple computing infrastructures. The system relies on the MOTEUR engine for workflow execution and on the DIRAC pilot-job system for workload management. The jGASW code wrapper is extended to describe applications running on multiple infrastructures and a DIRAC cluster agent that can securely involve personal cluster resources with no administrator intervention is proposed. Grid data management is complemented with local storage used as a failover in case of file transfer errors. Between November 2010 and April 2011 the platform was used by 10 users to run 484 workflow instances representing 10.8 CPU years. Tests show that a small personal cluster can significantly contribute to a simulation running on EGI and that the improved data manager can decrease the job failure rate from 7.7% to 1.5%. More information: www.rafaelsilva.com

Transcript of Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

Page 1: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 1 www.creatis.insa-lyon.fr/vip

Multi-infrastructure workflow execution for medical simulation in the

Virtual Imaging Platform

Rafael FERREIRA DA SILVA1, Sorina CAMARASU-POP1, Baptiste GRENIER3 Vanessa HAMAR2, David MANSET3, Johan MONTAGNAT4, Jérôme REVILLARD3 Javier ROJAS BALDERRAMA4, Andrei TSAREGORODTSEV2, Tristan GLATARD1

1 Université de Lyon, CNRS, INSERM, CREATIS 2 Centre de Physique des Particules de Marseille

3 maatG France 4 CNRS/UNS, I3S lab, MODALIS team

Bristol, HealthGrid 2011

1 www.creatis.insa-lyon.fr/vip

Page 2: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 2 www.creatis.insa-lyon.fr/vip

  Simulation

Introduction

Example: Prostate traitement in protontherapy Computation: 2 months [L. Grevillot, D. Sarrut]

Example: 2D+t ultrasound simulation [O. Bernard]

Computation: 16h

8.5 CPU days

US PET MRI CT

Page 3: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 3 www.creatis.insa-lyon.fr/vip

  European Grid Infrastructure (EGI)   High throughput for computation and data transfers   Challenges

  High latencies   Low reliability

Introduction

Page 4: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 4 www.creatis.insa-lyon.fr/vip

  Multi-infrastructure workflow execution   Grid resources   Personal clusters (non-grid resources)

  Improve data transfer reliability   Local reliable storage

Our Goal

Page 5: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 5 www.creatis.insa-lyon.fr/vip

VIP Architecture

Page 6: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 6 www.creatis.insa-lyon.fr/vip

  Virtual Imaging Platform (VIP)   Workflow

  Core Simulation Workflows   MRI, US, CT and PET

Workflows

Activities

Data sinks

Data sources

Simulated US image of the heart

Page 7: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 7 www.creatis.insa-lyon.fr/vip

Workflow Wrapping

  Simulations are described as workflows

  Workflows are interpreted and executed by MOTEUR   Core engine workflow interpreter   Data flow evaluation   Production of computational tasks

  Workflow activities are described using jGASW   Code wrapper for distributed platforms and local executions   Grid jobs (bash scripts) are generated from processed data

  Each jGASW descriptor bundles a unique executable

http://modalis.i3s.unice.fr/softwares/moteur/start http://modalis.i3s.unice.fr/jgasw

Page 8: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 8 www.creatis.insa-lyon.fr/vip

Pilot Jobs

User Jobs

Pilot-jobs System

gLite WMS

Worker Nodes

  Pull mode   Execution environment verification   Worker node reservation

X DIRAC Architecture

Moteur + jGASW

Page 9: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 9 www.creatis.insa-lyon.fr/vip

Multi-infrastructure Execution

  Design constraints   No intervention from the cluster administrator   No sharing or generic user accounts   Minimal technical assumptions on the cluster architecture

  Setup   The agent is launched by the user on his cluster(s) account(s)   Download of a tar.gz bundle and run setup script   Support to PBS, BQS and SLURM

  Execution   Queries WMS for user’s waiting tasks   Submits pilots to the local cluster queue   Embedded data transfer client (VLET)

http://vip.creatis.insa-lyon.fr:9002/projects/dirac-cluster-agent/wiki/Wiki!

Personal cluster

VIP cluster bundle

Page 10: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 10 www.creatis.insa-lyon.fr/vip

Multi-infrastructure Execution

  Conditions   EGI   134-node cluster limited to 67 pilots running concurrently   3 executions of a PET workflow

  Results

  Conclusion   Involving a small personal cluster in a simulation has a significant

impact on the execution

Page 11: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 11 www.creatis.insa-lyon.fr/vip

Reliable Data Management

  EGI three-tier data management   Challenge: data availability between 80-95%   Storage Elements (SE) – DPM, dCache, STORM, Castor   Logical File Catalog (LFC) – single index space

  Current Data Management in VIP   Critical input files are replicated   Files are cached by pilot jobs   Output files are stored on site SE   Jobs error rate: 5-10%

  Local Data Manager   Failover storage   Available for users and grid jobs   Overlay of DPM SE

Page 12: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 12 www.creatis.insa-lyon.fr/vip

Reliable Data Management   Data Management use case

Input download

Output upload

Page 13: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 13 www.creatis.insa-lyon.fr/vip

Impact of the Data Manager on Job Reliability

  Conditions   EGI, biomed VO (production infrastructure)   Ultrasonic Simulation comprised of 128 jobs   Each job has 5 input files + 1 output file   Failure rate: 1%

  Results

  Conclusion   Job data transfers failure rate can be significantly decreased in the

presence of a failover storage

Page 14: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 14 www.creatis.insa-lyon.fr/vip

Web Portal

  Workflow submission and management

  Workflow execution and monitoring

Page 15: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 15 www.creatis.insa-lyon.fr/vip

Web Portal

  Detailed performance information

Page 16: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 16 www.creatis.insa-lyon.fr/vip

Web Portal

  Authentication based on X.509 certificates

Page 17: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 17 www.creatis.insa-lyon.fr/vip

Web Portal

  File transfer operations

http://vip.creatis.insa-lyon.fr

Page 18: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 18 www.creatis.insa-lyon.fr/vip

Platform Usage Statistics

  484 workflows executions (Nov 2010 – Apr 2011)   10 (real) users   5 application classes

Page 19: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 19 www.creatis.insa-lyon.fr/vip

Conclusions

  VIP can execute workflows on multi-infrastructures   Extension of jGASW application description   Extension of DIRAC to support personal clusters with no administration

intervention and respecting common security rules   Results show that a small personal cluster can significantly contribute to

a simulation running on EGI

  VIP Data Manager   A first test shows that job failure rate was decreased from 7.7% to 1.5%

  Try it!   Requires a certificate registered in the Biomed VO   http://vip.creatis.insa-lyon.fr

Page 20: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 20 www.creatis.insa-lyon.fr/vip

Future Work

  Workflow Execution

  Scalability and Reliability   Memory shortage   Workflow checkpointing

  Provenance of simulated data

Page 21: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 21 www.creatis.insa-lyon.fr/vip

Future Work

  Simulators and Models

  Biological object model sharing for image simulation   See poster FORESTIER et al. at CBMS 2011   Semantic catalog of biological models   3D visualization

Page 22: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 22 www.creatis.insa-lyon.fr/vip

Future Work

  Simulators and Models

  Multi-Modality Simulation Workflow   See poster MARION et al. at CBMS 2011

Sindbad (CT) SIMRI (MRI)

Page 23: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 23 www.creatis.insa-lyon.fr/vip

Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

Questions?

23 www.creatis.insa-lyon.fr/vip

http://www.creatis.insa-lyon.fr/vip!

Page 24: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 24 www.creatis.insa-lyon.fr/vip

  Fabrizio Gagliardi, Bob Jones, Fran ̧ois Grey, Marc-Elian Bégin, and Matti Heikkurinen. Building an infrastructure for scientific grid computing: status and goals of the egee project. Phil. Trans. R. Soc. A 15, 363(1833):1729–1742, aug 2005.

  T. Li, S. Camarasu-Pop, T. Glatard, T. Grenier, and H. Benoit-Cattin. Optimization of mean-shift scale parameters on the egee grid. In Studies in health technology and informatics, Proceedings of Healthgrid 2010, volume 159, pages 203–214, 2010.

  A. Marion, G. Forestier, H. Benoit-Cattin, S. Camarasu-Pop, P. Clarysse, R. Ferreira da Silva, B. Gibaud, T. Glatard, P. Hugonnard, C. Lartizien, H. Liebgott, J. Tabary, S. Valette, and D. Friboulet. Multi- modality medical image simulation of biological models with the Virtual Imaging Platform (VIP). In IEEE CBMS 2011, Bristol, UK, 2011. submitted.

  Tristan Glatard, Johan Montagnat, Diane Lingrand, and Xavier Pennec. Flexible and efficient workflow deployement of data-intensive applications on grids with MOTEUR. International Journal of High Performance Computing Applications (IJHPCA), 22(3):347–360, August 2008.

  Javier Rojas Balderrama, Johan Montagnat, and Diane Lingrand. jGASW: A Service-Oriented Frame- work Supporting High Throughput Computing and Non-functional Concerns. In IEEE International Conference on Web Services, ICWS 2010, Miami (FL), USA, July 2010. IEEE Computer Society.

  A Tsaregorodtsev, N Brook, A Casajus Ramo, Ph Charpentier, J Closier, G Cowan, R Graciani Diaz, E Lanciotti, Z Mathe, R Nandakumar, S Paterson, V Romanovsky, R Santinelli, M Sapunov, A C Smith, M Seco Miguelez, and A Zhelezov. DIRAC3 . The New Generation of the LHCb Grid Software. Journal of Physics: Conference Series, 219(6):062029, 2009.

  L. Grevillot, T. Frisson, D Maneval, N. Zahra, J.N. Badel, and D. Sarrut. Simulation of a 6 mv elekta precise linac photon beam using gate / geant4. Phys Med Biol, 56(4), 2011.

References

Page 25: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 25 www.creatis.insa-lyon.fr/vip

  Silvia Olabarriaga, T. Glatard, and P.T. de Boer. A virtual laboratory for medical image analysis. IEEE T Inf Technol B, 14(4):979–985, 2010.

  Vladimir V. Korkhov, Jakub T. Moscicki, and Valeria V. Krzhizhanovskaya. Dynamic workload bal- ancing of parallel applications with user-level scheduling on the grid. Future Generation Computer Systems, 25(1):28 – 34, 2009.

  Ivo D. Dinov, John D. Van Horn, Kamen M. Lozev, Rico Magsipoc, Petros Petrosyan, Zhizhong Liu, Allan MacKenzie-Graham, Paul Eggert, Parker Douglas S., and Arthur W. Toga. Efficient, Distributed and Interactive Neuroimaging Data Analysis Using the LONI Pipeline. Frontiers in Neuroinformatics, 3(22):1–10, 2009.

  Thierry Delaitre, Tamas Kiss, Ariel Goyeneche, Gabor Terstyanszky, Stephen Winter, and Péter Kacsuk. GEMLCA: Running Legacy Code Applications as Grid services. Journal of Grid Computing, 3(1):75– 90, 2005.

  Kyle Chard, Wei Tan, Joshua Boverhof, Ravi Madduri, and Ian Foster. Wrap Scientific Applications as WSRF Grid Services Using gRAVI. In International Conference on Web Services, ICWS’09, Los Angeles (CA), USA, July 2009.

  Martin Senger, Peter Rice, Alan Bleasby, Tom Oinn, and Mahmut Uludag. Soaplab2: More Reliable Sesame Door to Bioinformatics Programs. In Bioinformatics Open Source Conference, BOSC’08, Toronto, ON Canada, July 2008.

  Douglas Thain, Todd Tannenbaum, and Miron Livny. Condor and the Grid, pages 299–335. John Wiley & Sons, Ltd, 2003.

  Vinod Kasam, Jean Salzemann, Marli Botha, Ana Dacosta, Gianluca Degliesposti, Raul Isea, Do- man Kim, Astrid Maass, Colin Kenyon, Giulio Rastelli, Martin Hofmann-Apitius, and Vincent Breton. Wisdom-ii: Screening against multiple targets implicated in malaria using computational grid infrastruc- tures. Malaria Journal, 8(1):88, 2009.

References

Page 26: Multi-infrastructure workflow execution for medical simulation in the Virtual Imaging Platform

VIP Virtual Imaging Platform

VIP ANR-09-COSI-013-02 26 www.creatis.insa-lyon.fr/vip

Workflow Wrapping

  Multi-infrastructure code descriptor (jGASW extension)