Post on 08-Jul-2015
Ganga: An interface to the LHC computing grid
Matt Williams, University of Birmingham
CERN and the LHC
● Largest particle physics experiment in the world
● 27 km in circumference
● Over 100 m underground
● Thousands of physicists
● 100s of petabytes of data
The Grid
GANGA
● ~2001: LHCb started GANGA as an in-house tool
– Specific to our needs
● By 2010, when the LHC turned on, many other experiments were using it
– ATLAS, NA62, T2K and many smaller experiments
● Python had always been the obvious choice
– Used everywhere in particle physics (along with C++)
– Easy to create new plugins for experiments
● Can be scripted or used through an IPython-based interactive console
● Open source, released under the GPL (like most CERN software)
How is it used?
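# Inside a Ganga session, Job, Executable, File, LocalFile and Local are predefined
j = Job(name='Example job')
j.application = Executable()            # run an arbitrary executable or script
j.application.exe = File('test.sh')     # the script to run
j.outputfiles = [LocalFile('out.txt')]  # bring this file back when the job finishes
j.backend = Local()                     # run on the local machine
j.submit()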
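To make the workflow above more concrete, here is a rough, standalone sketch of what a local backend conceptually does with such a job: run the executable in a fresh working directory, capture its stdout and stderr to files, and check which of the declared output files were actually produced. This is not Ganga's implementation; `run_locally` is a hypothetical name, and a Python one-liner stands in for test.sh so the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_locally(command, outputfiles, workdir):
    """Hypothetical sketch of a local backend (not Ganga's code): run `command`
    in `workdir`, capture stdout/stderr to files there, and report which of
    the declared output files exist afterwards."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    with open(workdir / "stdout", "w") as out, open(workdir / "stderr", "w") as err:
        result = subprocess.run(command, cwd=workdir, stdout=out, stderr=err)
    # Only files that exist after the run count as collected outputs
    collected = [name for name in outputfiles if (workdir / name).exists()]
    status = "completed" if result.returncode == 0 else "failed"
    return status, collected


# Usage: a Python one-liner stands in for test.sh and writes out.txt
with tempfile.TemporaryDirectory() as tmp:
    cmd = [sys.executable, "-c", "from pathlib import Path; Path('out.txt').write_text('hello')"]
    status, collected = run_locally(cmd, ["out.txt", "missing.txt"], tmp)
```

The declared but never-written `missing.txt` is simply not collected, while the job itself still counts as completed because the command exited successfully.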
Retrieving results
In [1]: j.peek()
total 200
-rw-r--r-- 1 phrfbi lhcb 0 Jun 22 2013 __syslog__
-rw-r--r-- 1 phrfbi lhcb 141999 Jun 22 2013 stdout
-rw-r--r-- 1 phrfbi lhcb 53671 Jun 22 2013 stderr
-rw-r--r-- 1 phrfbi lhcb 2463 Jun 22 2013 out.txt
-rw-r--r-- 1 phrfbi lhcb 135 Jun 22 2013 __jobstatus__
In [2]: j.peek('out.txt')
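The listing above is the job's output directory. As an illustration only, a `peek`-like helper over a plain directory could look like the sketch below; the function name, the directory argument and the return values (a name-to-size dict, or the file's text) are assumptions, whereas the real `j.peek()` displays a listing as shown.

```python
import tempfile
from pathlib import Path


def peek(outputdir, filename=None):
    """Hypothetical stand-in for peek(): with no filename, map each file in the
    output directory to its size in bytes; with a filename, return its text."""
    outputdir = Path(outputdir)
    if filename is not None:
        return (outputdir / filename).read_text()
    return {p.name: p.stat().st_size for p in sorted(outputdir.iterdir()) if p.is_file()}


# Usage on a throwaway directory mimicking a job's output area
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "stdout").write_text("")
    Path(tmp, "out.txt").write_text("42 events")
    listing = peek(tmp)
    contents = peek(tmp, "out.txt")
```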
Using the Grid
Just change the backend from Local() to LCG(), i.e. j.backend = LCG()
Other backends include Interactive, PBS, LSF, SGE, Panda, Jedi, Dirac, Condor, ARC, CREAM...
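Switching backends is a one-line change because the job hands submission to whatever backend object it holds. The toy classes below (`ToyJob`, `ToyLocal`, `ToyGrid`, all hypothetical) sketch that design under this assumption; they do not reproduce Ganga's real backend interface.

```python
from abc import ABC, abstractmethod


class ToyBackend(ABC):
    """Hypothetical base class: every backend knows how to submit a job."""

    @abstractmethod
    def submit(self, job):
        ...


class ToyLocal(ToyBackend):
    def submit(self, job):
        return f"running {job.exe} on this machine"


class ToyGrid(ToyBackend):
    def submit(self, job):
        return f"sending {job.exe} to a grid site"


class ToyJob:
    def __init__(self, exe, backend):
        self.exe = exe
        self.backend = backend

    def submit(self):
        # The job delegates submission; nothing else about it changes
        return self.backend.submit(self)


job = ToyJob("test.sh", ToyLocal())
first = job.submit()
job.backend = ToyGrid()  # the only line that changes
second = job.submit()
```

Because the application, input and output settings live on the job rather than the backend, the same description can be sent anywhere a backend exists.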
Input data and splitting
j = Job(name='Input splitter', backend=LCG())
j.application = Executable()
j.application.exe = File('analyse_data')
# One input file path per line; skip blank lines and close the file afterwards
with open('inputs.txt') as f:
    j.inputfiles = [LocalFile(line.strip()) for line in f if line.strip()]
j.splitter = SplitByFiles(filesPerJob=10)  # up to 10 input files per subjob
j.outputfiles = [LocalFile('histogram.root')]
j.submit()
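Conceptually, a splitter turns one job with many input files into many subjobs with a few files each. A minimal sketch of chunking by file count, assuming a hypothetical `split_by_files` function and made-up file names; Ganga's `SplitByFiles` does the real work and may handle details differently.

```python
def split_by_files(files, files_per_job):
    """Hypothetical sketch: chunk a list of input files into per-subjob lists
    of at most `files_per_job` entries each."""
    if files_per_job < 1:
        raise ValueError("files_per_job must be at least 1")
    return [files[i:i + files_per_job] for i in range(0, len(files), files_per_job)]


# 25 made-up input files at 10 per subjob give subjobs of 10, 10 and 5 files
inputs = [f"data_{n:03d}.root" for n in range(25)]
subjobs = split_by_files(inputs, 10)
```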
Mergers
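j = Job(name='Merger', backend=LCG())
j.application = Executable()
j.application.exe = File('analyse_data')
# One input file path per line; skip blank lines and close the file afterwards
with open('inputs.txt') as f:
    j.inputfiles = [LocalFile(line.strip()) for line in f if line.strip()]
j.splitter = SplitByFiles(filesPerJob=10)  # up to 10 input files per subjob
j.outputfiles = [LocalFile('histogram.root')]
j.merger = RootMerger(files=['histogram.root'])  # combine every subjob's histogram.root
j.submit()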
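The merger step collects the same output file from every subjob and combines it into one. `RootMerger` works on real ROOT files; the sketch below only illustrates the idea of bin-by-bin addition, using plain dictionaries as stand-in histograms and a hypothetical `merge_histograms` function.

```python
from collections import Counter


def merge_histograms(histograms):
    """Hypothetical sketch: add per-subjob histograms (bin label -> count)."""
    total = Counter()
    for hist in histograms:
        total.update(hist)  # Counter.update adds counts rather than replacing them
    return dict(total)


subjob_outputs = [
    {"0-10 GeV": 4, "10-20 GeV": 1},
    {"0-10 GeV": 2, "20-30 GeV": 3},
]
merged = merge_histograms(subjob_outputs)
```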
Job catalogue
In [1]: jobs
Out [1]:
fqid | status | name | subjobs | application | backend
----------------------------------------------------------------------
0 | completed | Example job | | Executable | Local
1 | running | Input splitter | 324 | Executable | LCG
2 | running | Merger | 324 | Executable | LCG
Full API access
In [2]: jobs(2).status
Out [2]: running
In [3]: len([j for j in jobs(2).subjobs if j.status == 'completed'])
Out [3]: 24
In [4]: for subjob in jobs(2).subjobs:
   ...:     if subjob.status == 'failed':
   ...:         subjob.resubmit()
You can define custom functions in ~/.ganga.py; they are then available in every Ganga session
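As an example of such custom functions, the two helpers below wrap the subjob loops shown above. They rely only on the attributes used on this slide (`subjobs`, `status`, `resubmit()`); the function names are hypothetical, and the `Fake*` classes are stand-ins so the code can be tried outside Ganga. In ~/.ganga.py you would keep just the two functions and call, for example, `resubmit_failed(jobs(2))`.

```python
from collections import Counter


def status_summary(job):
    """Hypothetical helper: count a job's subjobs by status."""
    return dict(Counter(sj.status for sj in job.subjobs))


def resubmit_failed(job):
    """Hypothetical helper: resubmit every failed subjob, return how many."""
    failed = [sj for sj in job.subjobs if sj.status == 'failed']
    for sj in failed:
        sj.resubmit()
    return len(failed)


# Stand-in objects so the helpers can be tried outside Ganga
class FakeSubjob:
    def __init__(self, status):
        self.status = status

    def resubmit(self):
        self.status = 'submitted'


class FakeJob:
    def __init__(self, statuses):
        self.subjobs = [FakeSubjob(s) for s in statuses]


demo = FakeJob(['completed', 'failed', 'running', 'failed'])
before = status_summary(demo)
n_resubmitted = resubmit_failed(demo)
after = status_summary(demo)
```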
Dealing with large files
j = Job(name='Large output', backend=Dirac())
j.application = Executable()
j.application.exe = File('analyse_data')
j.inputfiles = [DiracFile('input.root')]       # input read from grid storage
j.outputfiles = [DiracFile('histogram.root')]  # output stored on the grid, not copied back
j.submit()
Find more at cern.ch/ganga
Download code from cern.ch/ganga/download/
Thank you