Mathematical Modeling and Computational Physics 2013
Transcript of Mathematical Modeling and Computational Physics 2013
Development of the distributed computing system for the MPD at the NICA collider, analytical estimations
Gertsenberger K. V.
Joint Institute for Nuclear Research, Dubna
NICA scheme
Gertsenberger K.V. MMCP’2013
Multipurpose Detector (MPD)
The MPDRoot software is being developed for event simulation, reconstruction, and physical analysis of the heavy-ion collisions registered by the MPD at the NICA collider.
Prerequisites of the NICA cluster
- high interaction rate (up to 6 kHz)
- high particle multiplicity: about 1000 charged particles for a central collision at NICA energy
- one event reconstruction currently takes tens of seconds in MPDRoot; 1M events take months
- large data stream from the MPD: 100k events ~ 5 TB, 100 000k events ~ 5 PB/year
- need for a unified interface for parallel processing and storing of the event data
Development of the NICA cluster
Two main lines of development:
data storage development for the experiment
organization of parallel processing of the MPD events
development and expansion of the distributed cluster for the MPD experiment based on the LHEP farm
Current NICA cluster in LHEP for MPD
Distributed file system GlusterFS
- aggregates the existing file systems into a common distributed file system
- automatic replication works as a background process
- a background self-checking service restores corrupted files in case of hardware or software failure
- implemented at the application layer and works in user space
Data storage on the NICA cluster
Development of the distributed computing system
PROOF server: parallel data processing in a ROOT macro on parallel architectures
NICA cluster: concurrent data processing on cluster nodes
MPD-scheduler: scheduling system for task distribution to parallelize data processing on cluster nodes
Parallel data processing with PROOF
PROOF (Parallel ROOT Facility) is part of the ROOT software; no additional installation is required.
PROOF uses data-independent parallelism based on the lack of correlation between MPD events, which gives good scalability.
Parallelization for three parallel architectures:
1. PROOF-Lite parallelizes the data processing on one multiprocessor/multicore machine
2. PROOF parallelizes processing on a heterogeneous computing cluster
3. Parallel data processing in GRID
Transparency: the same program code can execute both sequentially and concurrently
Using PROOF in MPDRoot
The last parameter of the reconstruction macro is run_type (default: "local").
Speedup on a user multicore machine:
$ root 'reco.C("evetest.root", "mpddst.root", 0, 1000, "proof")'
parallel processing of 1000 events with the thread count equal to the logical processor count
$ root 'reco.C("evetest.root", "mpddst.root", 0, 500, "proof:workers=3")'
parallel processing of 500 events with 3 concurrent threads
Speedup on the NICA cluster:
$ root 'reco.C("evetest.root", "mpddst.root", 0, 1000, "proof:[email protected]:21001")'
parallel processing of 1000 events on all cluster nodes of the PoD farm
$ root 'reco.C("eve", "mpddst", 0, 500, "proof:[email protected]:21001:workers=10")'
parallel processing of 500 events on the PoD cluster with 10 workers
Speedup of the reconstruction on a 4-core machine
PROOF on the NICA cluster
[Diagram: Proof On Demand cluster. A PROOF master server distributes work to PROOF slave nodes with 8, 8, 16, 16, 24, 24, and 32 logical processors; the nodes read *.root files from GlusterFS. Example:
$ root 'reco.C("evetest.root", "mpddst.root", 0, 3, "proof:[email protected]:21001")'
The event count argument selects events 0, 1, and 2 of evetest.root, which are processed in parallel and merged into mpddst.root.]
Speedup of the reconstruction on the NICA cluster
MPD-scheduler
Developed in C++ with ROOT class support.
Uses the Sun Grid Engine scheduling system (qsub command) for execution in cluster mode.
SGE combines the cluster machines of the LHEP farm into a pool of worker nodes with 78 logical processors.
The job for distributed execution on the NICA cluster is described and passed to MPD-scheduler as an XML file:
$ mpd-scheduler my_job.xml
Job description
<job>
<macro name="$VMCWORKDIR/macro/mpd/reco.C" start_event="0" count_event="1000" add_args="local"/>
<file input="$VMCWORKDIR/macro/mpd/evetest1.root" output="$VMCWORKDIR/macro/mpd/mpddst1.root"/>
<file input="$VMCWORKDIR/macro/mpd/evetest2.root" output="$VMCWORKDIR/macro/mpd/mpddst2.root"/>
<file db_input="mpd.jinr.ru*,energy=3,gen=urqmd" output="~/mpdroot/macro/mpd/evetest_${counter}.root"/>
<run mode="local" count="5" config="~/build/config.sh" logs="processing.log"/>
</job>
The description starts and ends with the tag <job>.
The tag <macro> sets information about the macro to be executed by MPDRoot.
The tag <file> defines the files to be processed by the macro above.
The tag <run> describes run parameters and allocated resources.
* mpd.jinr.ru – server name with production database
<job>
  <macro name="~/mpdroot/macro/mpd/reco.C"/>
  <file input="$VMCWORKDIR/evetest1.root" output="$VMCWORKDIR/mpddst1.root"/>
  <file input="$VMCWORKDIR/evetest2.root" output="$VMCWORKDIR/mpddst2.root"/>
  <file input="$VMCWORKDIR/evetest3.root" output="$VMCWORKDIR/mpddst3.root"/>
  <run mode="global" count="3" config="~/mpdroot/build/config.sh"/>
</job>
Job execution on the NICA cluster
[Diagram: SGE batch system. MPD-scheduler passes jobs via qsub to the Sun Grid Engine server, which distributes them among worker nodes with 8, 8, 16, 16, 24, 24, and 32 logical processors (free or busy); the inputs evetest1.root, evetest2.root, and evetest3.root are read from GlusterFS, and the outputs mpddst1.root, mpddst2.root, and mpddst3.root are written back. Two job descriptions are shown: job_reco.xml (the reconstruction job above) and job_command.xml:]
<job>
  <command line="get_mpd_production energy=5-9"/>
  <run mode="global" config="~/mpdroot/build/config.sh"/>
</job>
Speedup of one reconstruction on the NICA cluster
NICA cluster section on mpd.jinr.ru
Conclusions
- The distributed NICA cluster (128 cores) was deployed on the basis of the LHEP farm for the NICA/MPD experiment (FairSoft, ROOT/PROOF, MPDRoot, GlusterFS, Torque, Maui).
- The data storage (10 TB) was organized with the distributed file system GlusterFS: /nica/mpd[1-8].
- A PROOF On Demand cluster was implemented to parallelize event data processing for the MPD experiment; PROOF support was added to the reconstruction macro.
- The MPD-scheduler system for distributed job execution was developed to run MPDRoot macros concurrently on the cluster.
- The web site mpd.jinr.ru (section Computing, NICA cluster) presents the manuals for the systems described above.
Analytical model for parallel processing on cluster
Speedup for a point (data-independent) algorithm of image processing:

Sp(n) = B_D * P_node * (2*n/B_D + T1) / (n*(P_node + 1) + B_D*T1)

P_node – count of logical processors, n – amount of data to process (MB), B_D – speed of the data access (MB/s), T1 – "pure" time of the sequential processing (s)
Prediction of the NICA computing power
How many logical processors are required to process N_TASK physical analysis tasks and one reconstruction in parallel within T_day days?

P_node = (n + B_D*T1) / (B_D*T_par - n)

With n = n1*(N_TASK + 1)*N_EVENT, T1 = (T_PA*N_TASK + T_REC)*N_EVENT, and T_par = T_day*24*3600:

P_node(N_TASK) = (n1*(N_TASK + 1)*N_EVENT + B_D*(T_PA*N_TASK + T_REC)*N_EVENT) / (B_D*(T_day*24*3600) - n1*(N_TASK + 1)*N_EVENT)

If n1 = 2 MB, N_EVENT = 10 000 000 events, T_PA = 5 s/event, T_REC = 10 s/event, B_D = 100 MB/s, T_day = 30 days.