Lightgrid—an agile distributed computing architecture for Geant4



ARTICLE IN PRESS

Nuclear Instruments and Methods in Physics Research A 614 (2010) 154–158

journal homepage: www.elsevier.com/locate/nima

doi:10.1016/j.nima.2009.12.019

Lightgrid—an agile distributed computing architecture for Geant4

Jason Young a, John O. Perry a, Tatjana Jevremovic b,*

a School of Nuclear Engineering, Purdue University, West Lafayette, IN, USA

b Utah Nuclear Engineering Program, The University of Utah, Salt Lake City, UT, USA

* Corresponding author. Tel.: +1 765 409 4536; fax: +1 765 494 9570.

Article info

Article history:

Received 12 September 2009

Received in revised form 13 November 2009

Accepted 7 December 2009

Available online 24 December 2009

Keywords:

Geant4

Monte Carlo

Grid computing

MySQL

PHP


Abstract

A lightweight, grid-based computing architecture has been developed to accelerate Geant4 computations on a variety of network architectures. This new software is called LightGrid. LightGrid has a variety of features designed to overcome the limitations of other grid-based computing platforms, specifically on smaller network architectures. By focusing on smaller, local grids, LightGrid is able to simplify the grid computing process with minimal changes to existing Geant4 code. LightGrid allows for integration between Geant4 and MySQL, which both increases flexibility in the grid and provides a faster, more reliable, and more portable method for accessing results than traditional data storage systems. This method of data acquisition allows for more fault-tolerant runs as well as instant results from simulations as they occur. The performance gains from LightGrid allow simulation times to decrease approximately linearly with the number of processors. LightGrid also allows for pseudo-parallelization with minimal Geant4 code changes.

© 2009 Elsevier B.V. All rights reserved.

1. Introduction

Grid computing is a concept that was developed over a decade ago, but the full extent of its potential is only now being realized. Massive physics projects, such as the Large Hadron Collider (LHC) [1], use grid computers coupled with Monte Carlo simulations. The European Organization for Nuclear Research (CERN) is one of the foremost leaders in the development and maintenance of these codes. CERN has created or contributed to the development of such reputable grid computing solutions as the "Enabling Grids for E-sciencE" (EGEE) [2] and the LHC Computing Grid (LCG) [3]. These computing solutions are massive implementations that are designed to scale over thousands of computers across the world. While the developments that have come out of these grid computing initiatives are invaluable, they are designed mainly for large deployments. Several limitations of the EGEE and LCG prevent a streamlined implementation on a smaller scale. Both grid implementations require internet access and require users to request approval from EGEE to use grid resources. While this is not a problem for most research groups, there is a subset of users who wish to use their own available resources for grid computing. Simulations that require independent, on-site grid computing are unsuited to EGEE resources. In addition, industry applications that require constant Monte Carlo simulations would not be an ideal use of EGEE computing power. Therefore the authors have developed a new approach to grid computing. The software, called LightGrid, addresses these issues by having a small disk footprint, readily available dependencies, and the ability to run on independent networks of machines without oversight by EGEE.

LightGrid is a small, easy-to-manage grid computing solution built specifically for Geant4 [4] simulations. This software was designed for grids with fewer than one hundred nodes that have a broadband connection to the master server, as well as grids where the main computational power is spent not in the analysis but in the simulation. This is an ideal situation for smaller groups, such as those in a university setting, who wish to set up their own grid computing solution. Previous grid and parallelization technologies for Geant4 rely upon the programmers writing the simulation code to modify their code in multiple ways to allow it to work with the specific grid software. Current usage of Geant4 for parallelization and grid computing requires the use of other computing packages [5] such as DIANE [6]. Implementation of DIANE involves setting up DIANE and modifying extra DIANE scripts to allow for grid and parallel computation, in addition to slight changes to the Geant4 application to allow setting the seed. This creates a steep learning curve that is detrimental to the usability of these very powerful grid computing codes. LightGrid is designed so that only minimal changes to the existing code are needed. Scripts are included to handle the MySQL integration and the web-based graphical user interface. The initial setup includes installing the LightGrid PHP files and commonly used programs onto the grid computers. Setup can be completed on the main server in an hour, with client node installation complete in approximately 10 min. This makes grid computing resources considerably easier to use. Job management is handled on the main grid server and requires no interaction past the initial run assignment. Worker nodes that go offline for any reason are detected by LightGrid and their jobs are assigned elsewhere. LightGrid is CPU-core aware and sets only one simulation to run at any time on any CPU core. This allows for maximum performance with today's high-performance, multi-core processors.

Fig. 1. LightGrid's run setup webpage.

An ideal LightGrid grid comprises more than two machines running any current Linux distribution. The required software is all popular, commonly used open source software that can easily be installed by package management systems on most Linux distributions. As long as all of the clients are running the same software architecture, the grid software will work correctly. This can be easily accomplished by selecting a standard Linux distribution and version, which will install the same set of packages on each worker. A broadband connection is required for the grid to run to its full potential. Jobs are pushed out to nodes from the master server via a set of PHP queries and page requests. The files involved in the simulation are hosted with a Network File System (NFS) on the master server, from which the clients pull both the Geant4 jobs and the Geant4 code. Geant4 simulations are compiled on the master server before they are sent to their clients.

Fig. 2. LightGrid’s client control page.

2. LightGrid architecture

The grid software that has been previously developed by CERN is designed for very large networks. The EGEE, as employed by CERN, has approximately 114,000 CPUs available for the grid, with approximately 20 petabytes of data storage capacity [7]. It is able to handle thousands of nodes going offline, large latency times, and being assigned thousands of jobs every day. To be able to handle this sort of workload, the grid computing software has been developed to easily handle multiple problems that may occur in this system. While this is excellent for CERN and other large computing organizations, it vastly exceeds the needs of smaller grid operators. LightGrid addresses this mismatch by creating a smaller grid setup that is designed to work mainly on smaller, local area networks.

The initial development grid setup included two identical Pentium 4 computers running Ubuntu 8.04, and a Core 2 Quad computer running Ubuntu 8.04. This provided enough computers to accurately assess the performance of LightGrid. While LightGrid can easily handle more worker nodes than this, it can scale down to as few as one worker node that has multiple cores and still provide a performance boost over a single-core run. LightGrid can even be applied as an easy way to pseudo-parallelize Geant4 simulations by setting up the main server as a worker node. While this is not true parallelization, it still achieves a speedup over the default Geant4 installation.

2.1. PHP interface

To aid the usability of LightGrid, a PHP-based graphical user interface (GUI) was developed. This GUI is straightforward and is accessed with a web browser. After users upload their Geant4 code to a specified directory using FTP, they visit the main LightGrid page. There are currently four options on this page, as shown in Fig. 1.

LightGrid automatically detects Geant4 simulations in its work directory and scans the folder for a C++ source code file. LightGrid requires, at minimum, four inputs: the simulation name, the number of requested runs, the macro file name (as required by the Geant4 application), and the MySQL data table chosen by the user for data collection. The user then submits the form, and LightGrid processes the simulations provided that the four input criteria are met.
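The four required inputs can be pictured as a small record with a completeness check. The sketch below is only an illustration in C++: LightGrid's own form handling is written in PHP, and the names `RunRequest` and `isComplete` are hypothetical, not taken from the LightGrid source.

```cpp
#include <string>

// Hypothetical container for the four inputs the run setup form asks for.
// Field names are illustrative and are not taken from LightGrid's source.
struct RunRequest {
    std::string simulationName;  // name of the detected Geant4 simulation
    int requestedRuns;           // how many times the simulation should run
    std::string macroFile;       // macro file passed to the Geant4 application
    std::string dataTable;       // MySQL table chosen for data collection
};

// Returns true only when all four required inputs are present and sensible.
bool isComplete(const RunRequest& r) {
    return !r.simulationName.empty() && r.requestedRuns > 0 &&
           !r.macroFile.empty() && !r.dataTable.empty();
}
```

A request with an empty macro file name or a run count of zero would be rejected before any job is created.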

LightGrid provides other simple interfaces that allow users to interact with the system without touching the MySQL database. These scripts are available for other parts of the grid system, including server management, run history, and optional data analysis.

The server addition screen, shown in Fig. 2, allows users to see all servers that have been entered into LightGrid. Essential information, such as nickname, server IP, CPU statistics, RAM, and server status, is displayed from the MySQL database for the user. Servers can be toggled between available and unavailable for LightGrid with a simple checkbox.

2.2. MySQL backend

MySQL is an integral part of LightGrid. As fast, free and open-source software, MySQL allows for easy data storage, retrieval, and backup that can be accessed from any internet connection. The MySQL server stores the data that is used for server information, job status information and Geant4 data collection. LightGrid uses MySQL as its means of storing all of the dynamic data that it needs. By relying on MySQL, LightGrid has selected a system that allows multiple data collection locations for the Geant4 simulation. To adapt Geant4 to the LightGrid package, a few simple changes must be made. First, the seed in the main Geant4 simulation source file must be randomized. To accomplish this, the setTheSeed() command for CLHEP is passed a parameter that sufficiently randomizes it. The user can choose what to use to set the seed; random number generators work fine for this purpose. The next step is to change Geant4 to allow PHP and MySQL to handle the data. Advanced users can compile MySQL support into Geant4, but for simplicity PHP is used in the default LightGrid installation. In place of the normal data collection routine used by Geant4, such as writing to a text file, the user must instead call the system() command from C++ to execute a provided PHP script. The data to be collected, such as particle energy, can be passed to the PHP script via command line arguments. The final step is to modify the PHP script to pass the data to the MySQL server. For this, users simply need to fill in the details of their MySQL database and the modifications are complete. This PHP script has been adapted to insert data directly into the database as the simulation is executed. MySQL is also easily accessible from all mainstream programming languages, allowing data analysis packages to draw their data from MySQL databases instead of large text files. Text parsing can slow down analysis, as time is needed to copy the entire file, load data from the file on each computer that wishes to perform analysis, and then start the analysis. MySQL makes results much easier to transfer between researchers, as researchers can simply build MySQL support into their analysis packages using a number of readily available libraries.
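The two Geant4-side changes described above, randomizing the seed and handing each result to a PHP script through system(), might look roughly like the following. This is a minimal sketch under stated assumptions: the helper names `makeSeed`, `shellQuote`, and `buildRecordCommand`, the script path, and the argument order are all hypothetical, and the CLHEP call appears only in a comment so that the block compiles without Geant4.

```cpp
#include <chrono>
#include <cstdint>
#include <random>
#include <sstream>
#include <string>

// Derive a per-run seed by mixing OS entropy with the current time.
// In a Geant4 main() the result would be handed to CLHEP, e.g.
//   CLHEP::HepRandom::setTheSeed(makeSeed());
long makeSeed() {
    std::random_device rd;
    const auto now = std::chrono::high_resolution_clock::now()
                         .time_since_epoch().count();
    const std::uint64_t mixed =
        (static_cast<std::uint64_t>(rd()) << 16) ^
        static_cast<std::uint64_t>(now);
    // Keep the value non-negative and within 31 bits.
    return static_cast<long>(mixed & 0x7fffffffULL);
}

// Wrap a value in single quotes so the shell passes it through unchanged.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";  // close quote, escaped quote, reopen
        else out += c;
    }
    out += "'";
    return out;
}

// Build the command line that hands one result to the PHP script.
// The script path and the argument order are assumptions for illustration.
std::string buildRecordCommand(const std::string& scriptPath,
                               const std::string& table,
                               double energyMeV) {
    std::ostringstream cmd;
    cmd << "php " << shellQuote(scriptPath) << ' ' << shellQuote(table)
        << ' ' << energyMeV;
    return cmd.str();
}
```

Inside the data collection code, the command would then be executed with something like `std::system(buildRecordCommand("record.php", "brems_results", energy).c_str());`. Note that `std::system()` waits for the command to return unless the shell is asked to run it in the background.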

2.3. Job assignment method

LightGrid is designed to take advantage of the multiple CPU cores that have recently become more popular in computing. By default, LightGrid assigns only one Geant4 simulation to each processing core. This allows each core to be dedicated to one Geant4 simulation without having to divide its cycles among multiple simulations, maximizing the efficiency of each CPU while still decreasing overall runtime. Job assignments are handled by various PHP scripts contained within LightGrid. When a new job is submitted to LightGrid, it contains a Geant4 simulation as well as details on how many times the programmer needs the simulation run. LightGrid then calls a PHP script that queries the MySQL database for all active servers on the grid. The MySQL database holds information on the servers including their IP address, number of cores, and CPU speed. Using this information, LightGrid sends a query to a PHP script on each worker node to see how many Geant4 simulations are currently being run on that worker. As each worker node responds with its number of active simulations, LightGrid starts to assign jobs to workers based upon the difference between the number of cores and the number of active simulations on that node. This load balancing ensures that all machines are utilized efficiently and not overloaded. While a job is active in the LightGrid system, LightGrid continuously queries all workers to ensure that they are always utilized to their full capacity. If a machine crashes or disappears from the grid, LightGrid reassigns that job to another node. LightGrid continues to monitor every active node for activity until all jobs have been completed.
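The assignment rule described above (free slots equal cores minus active simulations) can be sketched as follows. LightGrid implements this logic in PHP against its MySQL tables; the C++ below is only an illustration of the rule, and the `Worker` fields and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Snapshot of one worker node; field names are illustrative.
struct Worker {
    int cores;   // CPU cores recorded for the node
    int active;  // Geant4 simulations the node reports as running
};

// Free slots on a node: cores minus active simulations, never negative.
int freeSlots(const Worker& w) {
    return w.cores > w.active ? w.cores - w.active : 0;
}

// Hand out up to pendingJobs jobs, at most one per free core.
// Returns the number of new jobs assigned to each worker, in order.
std::vector<int> assignJobs(const std::vector<Worker>& workers,
                            int pendingJobs) {
    std::vector<int> assigned(workers.size(), 0);
    for (std::size_t i = 0; i < workers.size() && pendingJobs > 0; ++i) {
        const int slots = freeSlots(workers[i]);
        const int n = slots < pendingJobs ? slots : pendingJobs;
        assigned[i] = n;
        pendingJobs -= n;
    }
    return assigned;
}
```

Jobs that do not fit in the current round stay pending. Repeating the call on each polling cycle, with fresh `active` counts and with crashed nodes left out of the list, would reproduce the continuous rebalancing and reassignment behaviour described in the text.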

Fig. 3. Bremsstrahlung from Geant4.

3. MySQL for data analysis

Geant4, by default, has two main data outputs that are not connected to analysis packages such as ROOT [8]. The first is a simple output to the shell that calls Geant4. This output is ideal for small simulations where the goal is to determine whether the simulation is working as expected and what the general results are. The second main output uses the C++ libraries to write to files on a hard disk. This output is much more suitable for larger experiments that produce large amounts of data. However, such large text files are unwieldy and provide no easy way to transport them without hosting on a public web server. LightGrid utilizes the MySQL database software to get around this limitation. MySQL is a flexible solution to this problem because it allows data to be collected in an uncomplicated fashion while sacrificing very little of the speed of normal output.

In the interest of simplicity, LightGrid does not use C++ wrappers or connectors to connect Geant4 to MySQL. Including a MySQL client implementation in C++ would require the use of libraries that have various dependencies and cannot be easily shared between the master server and the worker nodes. Future work on LightGrid will address these issues in a manner that will be easy for users to adopt. There are also conflicts between various versions of dependencies that could cause the MySQL connection to fail. Different versions of the MySQL client libraries, along with older C++ connectors, might not support newer MySQL servers. LightGrid is designed to work on a heterogeneous grid with only a few requirements of each node. LightGrid uses the C++ function system(), as discussed earlier, to execute PHP scripts packaged along with LightGrid. While PHP also comes in multiple versions, it is one of the most popular open source packages and has easily installable MySQL support. Differing versions execute the packaged MySQL code with no problem, allowing multiple versions of PHP to be deployed across the grid. The PHP functions for working with MySQL, such as mysql_connect() and mysql_query(), are easy to modify for programmers who have not been formally introduced to PHP. When, during the course of a simulation, Geant4 has a result, it makes a system call to the PHP/MySQL script. This script accepts results as command line arguments and sends those results to the MySQL server. As Geant4 makes this system call, it continues simulating so as to minimize overhead. LightGrid comes with a default MySQL database that users can modify to their own needs. LightGrid can be programmed to send multiple parameters to the MySQL server, such as physical location, energy, and any other characteristic of a particle in Geant4.


The MySQL integration with Geant4 also has a low overhead relative to the Geant4 computational process. For benchmarking purposes, a simulation of a bremsstrahlung spectrum with a 5.3 MV incident electron beam was used [9]. This simulation in Geant4 contained a world volume, a 2 mm thick block of tungsten, a detector region, and the electron beam. The Geant4 bremsstrahlung is shown in Fig. 3. According to this simulation, approximately 10% of the original electrons create bremsstrahlung radiation that travels through the detector region. This bremsstrahlung simulation was set to simulate five million electrons. This translates to approximately 500,000 particles that travel through the detector region and then have to be output into their respective data bank. Benchmarks were performed on a 2.8 GHz Pentium 4 machine running Ubuntu 8.04. The execution time, as reported by the Geant4 run manager, was used. A 5 million particle run with MySQL took 21573.32 s to complete, while the same simulation with a text file output took 21306.89 s. This means that the MySQL integration via PHP produces approximately 1.25% overhead on the initial data storage. MySQL integration, however, includes transporting the data to the main server from each client node in addition to providing real-time results to scientists. Therefore, the overhead associated with MySQL integration is negligible.
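The 1.25% figure follows directly from the two reported runtimes. Written out, with the symbols t_MySQL and t_text introduced here only for illustration:

```latex
\[
\text{overhead}
  = \frac{t_{\text{MySQL}} - t_{\text{text}}}{t_{\text{text}}}
  = \frac{21573.32\,\text{s} - 21306.89\,\text{s}}{21306.89\,\text{s}}
  = \frac{266.43}{21306.89}
  \approx 0.0125 = 1.25\%
\]
```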

4. LightGrid benchmarks

Due to its focus on usability, LightGrid produces an approximately linear trend of runtime reduction. For benchmarking, a total of one million particles had to be run. For each case, the number of particles run by each processor was calculated by splitting one million particles evenly between the processors. The data shown in Table 1 were calculated on a 2.40 GHz Core 2 Quad CPU.
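The even split of particles between processors is simple integer arithmetic. A minimal sketch is shown below; the function name `splitEvents` and the handling of a remainder are assumptions made for illustration, since the benchmark totals divide exactly.

```cpp
#include <vector>

// Split totalEvents across `parts` jobs as evenly as possible.
// Any remainder is spread one event at a time over the first jobs.
std::vector<long> splitEvents(long totalEvents, int parts) {
    std::vector<long> shares;
    if (parts <= 0) return shares;  // nothing to split across
    const long base = totalEvents / parts;
    const long extra = totalEvents % parts;
    for (int i = 0; i < parts; ++i) {
        shares.push_back(base + (i < extra ? 1 : 0));
    }
    return shares;
}
```

For example, `splitEvents(1000000, 4)` yields four shares of 250,000 particles, and `splitEvents(1000000, 8)` yields eight shares of 125,000.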

To test the overall scale of the grid computing system, the one million particle bremsstrahlung simulation was broken into eight parts of 125,000 particle simulations. Each of these simulations was run on a dedicated core of a CPU. This simulation was conducted using two 2.80 GHz Pentium 4 computers, a 2.40 GHz Core 2 Quad computer, and a 2.00 GHz Core 2 Duo computer. Therefore, half of the particles were simulated on the Core 2 Quad CPU, with the others being distributed between the remaining three computers. The Core 2 Quad was able to simulate particles the fastest, followed by the Core 2 Duo and finally the two Pentium 4 computers.

Fig. 4. Bremsstrahlung benchmark on a four computer LightGrid.

Table 1. Parallelization speedup times.

CPUs    Runtime (s)    Speedup
1       2319.8         -
2       1130.97        2.05
4       582.67         3.98
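The speedup column of Table 1 is the single-core runtime divided by the parallel runtime. The short sketch below reproduces it and also computes a parallel efficiency, which is a derived quantity not reported in the paper; both function names are illustrative.

```cpp
// Speedup of a parallel run relative to the single-core runtime.
double speedup(double singleCoreSeconds, double parallelSeconds) {
    return singleCoreSeconds / parallelSeconds;
}

// Parallel efficiency: speedup divided by the number of cores used.
double efficiency(double singleCoreSeconds, double parallelSeconds,
                  int cores) {
    return speedup(singleCoreSeconds, parallelSeconds) / cores;
}
```

With the Table 1 runtimes, `speedup(2319.8, 1130.97)` and `speedup(2319.8, 582.67)` round to the tabulated 2.05 and 3.98, which is consistent with the approximately linear trend described in the text.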

The simulation results are shown in Fig. 4, where particles are only counted as simulated if they were detected by the detector region in the Geant4 application. Particles are initially detected on a linear trend at a rate of 27.16 particles per second. At approximately 350 s into the benchmarked simulation, the Core 2 Quad computer finishes its part of the simulation. This results in an average detection rate of 10.81 particles per second from 350 to 380 s. At this point, the Core 2 Duo computer finishes its simulation. This leaves only the Pentium 4 computers, the slowest in the benchmark, to complete their simulations at an approximate rate of 5.1 particles detected per second. The simulation is completed 577 s after it begins. Due to the slow nature of some of the grid computers and the relatively quick initial simulation, the speedup may not be immediately apparent. With longer simulations designed to run hundreds or thousands of times, even with smaller particle sets, the faster computers would be assigned more jobs and, therefore, would be kept busier by LightGrid.

5. Conclusion

Grid computing is a powerful resource that allows for speedup of computationally intensive tasks. The new LightGrid architecture that the authors have developed for Geant4 brings grid computing to smaller teams who want faster, grid-based simulations without massive code rewrites. The integration between Geant4 and MySQL allows for real-time simulation results with negligible slowdown. The overall speedup using LightGrid follows an approximately linear trend, which allows for more rapid simulations. The small footprint of LightGrid, combined with the readable PHP code and readily available backend software, allows it to be easily customized to fit the needs of scientists with minimal development work.

As development of LightGrid continues, there will be three main focuses: graphical user interface expansion, more robust client handling, and more automated PHP/MySQL integration with Geant4 and C++. These changes will allow LightGrid to remain an excellent solution for small, localized grid computing.

Acknowledgements

This research is supported by the National Science Foundation and the Department of Homeland Security. Partial research done by Jason Young is supported through the Purdue SURF program, summer of 2009.

References

[1] L. Evans, et al., J. Instrum. 3 (2008) S08001.

[2] F. Gagliardi, et al., Philos. Trans.: Math. Phys. Eng. Sci. 363 (1833) (2005) 1729.

[3] M. Lamanna, Nucl. Instr. and Meth. A 534 (2004) 1.

[4] S. Agostinelli, et al., Nucl. Instr. and Meth. A 506 (2003) 250.

[5] P.M. Lorenzo, CERN Comput. Newsl. 42 (2007) 7.

[6] J.T. Moscicki, et al., Distributed Geant4 Simulation in Medical and Space Science Applications using DIANE framework and the GRID, Seminar on Innovative Detectors, Siena, Italy, 24 October 2002.

[7] EGEE, 2009, EGEE Project: EGEE in numbers. <http://project.eu-egee.org/index.php?id=417>.

[8] CERN, 2008, ROOT. <http://root.cern.ch>.

[9] J.O. Perry, S. Xiao, T. Jevremovic, iMASS: Evolved NRF simulations for more accurate detection of nuclear threats, ICONE17, 2009.