Distributed Analysis using Ganga
I. Ideas behind Ganga
II. Getting started
III. Running ATLAS applications
Distributed Analysis Tutorial
ATLAS Computing & Software Workshop, München, 26-30 March 2007
http://cern.ch/ganga
Karl Harrison / University of Cambridge
29 March 2007 2/31
Ganga basics
• Depending on context, Ganga can be any of:
(A) a Hindu goddess
(B) an hallucinogenic drug
(C) a job-management framework (Gaudi/Athena and Grid Alliance), implemented in Python, that simplifies running jobs on the Grid
• Anyone expecting a presentation on (A) or (B) is going to be disappointed
• Some have suggested: A + B = C
Sculpture of Ganga in cave temple, Elephanta Island, Mumbai harbour
Ganga, or ganja, is prepared from the plant Cannabis sativa
Ganga as a job-management framework (1)
• Ganga is developed as an ATLAS-LHCb common project
• Ganga 4.2.12 (current release) has built-in support for applications based on the Athena framework, for JobTransforms and for the DQ2 data-management system
- Ganga 4.3 (to be released in early April) will additionally be interfaced with AMI and TNT
• The component model allows customisation for other types of application, e.g. ROOT
• Ganga provides a uniform interface for accessing different types of processing system
- This allows trivial switching between testing on a local batch system and running full-scale analysis on the Grid
[Figure: illustrations of job definition and job submission]
Ganga as a job-management framework (2)
• Whenever started, Ganga runs a monitoring loop in the background to:
- Track progress of submitted jobs
- Retrieve outputs of completed jobs
- Check validity of user credentials: Grid proxy and/or AFS token
• Ganga stores job information locally or (from Ganga 4.3) on a remote server with certificate-based authentication
• Job inputs and outputs are kept in the Ganga workspace until moved or deleted by the user
• The user can modify code without affecting a submitted job
[Figure: monitoring and archival]
Ganga job abstraction
• A job in Ganga is constructed from a set of building blocks, not all required for every job
Job building blocks:
- Application: what to run
- Backend: where to run
- Input Dataset: data read by application
- Output Dataset: data written by application
- Splitter: rule for dividing into subjobs
- Merger: rule for combining outputs
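The building-block idea can be mimicked in plain Python 3 to show that a job is a composition of optional parts. This is a minimal sketch, not Ganga's own Job class: the name `MockJob`, the string-valued components and the choice of application and backend as the only required fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MockJob:
    """Hypothetical stand-in for a job assembled from optional building blocks."""
    application: str                  # what to run
    backend: str                      # where to run
    inputdata: Optional[str] = None   # data read by application
    outputdata: Optional[str] = None  # data written by application
    splitter: Optional[str] = None    # rule for dividing into subjobs
    merger: Optional[str] = None      # rule for combining outputs

    def components(self):
        """Return the names of the building blocks that have been set, in field order."""
        return [name for name, value in vars(self).items() if value is not None]


simple = MockJob(application="Executable", backend="Local")
full = MockJob("Athena", "LCG", inputdata="DQ2Dataset",
               outputdata="DQ2OutputDataset",
               splitter="AthenaSplitterJob", merger="AthenaOutputMerger")
```

Here `simple.components()` gives only the application and backend, while `full` uses all six blocks, mirroring the statement that not every block is needed for every job.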
Plugin classes
[Figure: plugin classes and schemas. All plugins derive from GangaObject through the interfaces IApplication, IBackend, IDataset, ISplitter and IMerger. Example plugin schemas, with attributes set by the user or by the system: LCG (CE, requirements, jobtype, middleware, id, status, reason, actualCE, exitcode); Athena (atlas_release, max_events, options, option_file, user_setupfile, user_area)]
• Ganga handles many types of Application, Backend, Dataset, Splitter and Merger, implemented as plugin classes
• Each plugin class has its own schema
• New plugin classes can readily be added: the system is extensible
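The combination of per-class schemas and easy extensibility can be sketched with a registry and a schema-checked base class. This is a toy model in plain Python 3, not Ganga's plugin machinery: `PLUGINS`, `register`, `list_plugins`, `SchemaObject`, `ToyAthena` and `ToyLCG` are hypothetical names, and the attribute defaults are placeholders (only the attribute names come from the schemas shown above).

```python
PLUGINS = {}  # hypothetical registry: category -> {class name: class}


def register(category):
    """Class decorator that records a plugin class under a category."""
    def wrap(cls):
        PLUGINS.setdefault(category, {})[cls.__name__] = cls
        return cls
    return wrap


def list_plugins(category):
    """Return the sorted names of plugins registered under a category."""
    return sorted(PLUGINS.get(category, {}))


class SchemaObject:
    """Base class whose instances accept only attributes declared in _schema."""
    _schema = {}

    def __init__(self, **kwargs):
        for name, default in self._schema.items():
            setattr(self, name, default)   # start from schema defaults
        for name, value in kwargs.items():
            setattr(self, name, value)     # then apply user-supplied values

    def __setattr__(self, name, value):
        if name not in self._schema:
            raise AttributeError(f"{type(self).__name__} has no attribute {name!r}")
        super().__setattr__(name, value)


@register("applications")
class ToyAthena(SchemaObject):
    # Attribute names from the Athena schema; defaults are placeholders.
    _schema = {"atlas_release": "", "max_events": -1,
               "option_file": "", "user_area": ""}


@register("backends")
class ToyLCG(SchemaObject):
    # A subset of the LCG schema; defaults are placeholders.
    _schema = {"CE": "", "requirements": None, "middleware": ""}
```

Adding a new plugin in this model means writing one decorated class with its own schema; nothing else has to change, which is the sense in which such a system is extensible.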
Applications and Backends
• Running a particular Application on a given Backend is enabled by implementing an appropriate adapter component, or Runtime Handler
– The same Runtime Handler can often be used for more than one Backend: less coding
[Figure: Application/Backend combinations. Backends: Local, LSF, PBS, OSG, NorduGrid, PANDA, US-ATLAS WMS, LHCb WMS. Applications: Executable (experiment neutral); Athena (Simulation/Digitisation/Reconstruction/Analysis) and AthenaMC (Production) for ATLAS; Gauss/Boole/Brunel/DaVinci (Simulation/Digitisation/Reconstruction/Analysis) for LHCb. The legend distinguishes combinations available in Ganga 4.2, new in Ganga 4.3, and work in progress]
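The adapter idea can be sketched as a lookup from an (application, backend) pair to a handler class, where one handler class serves several backends. This is a hypothetical plain-Python 3 model, not Ganga's Runtime Handler interface: `LocalBatchHandler`, `GridHandler`, `HANDLERS` and `prepare_job` are invented names, and the returned strings only stand in for whatever a real handler would generate.

```python
class LocalBatchHandler:
    """Toy handler that turns an application into a batch-script description."""
    def prepare(self, app_name, backend_name):
        return f"{app_name}: shell wrapper for {backend_name}"


class GridHandler:
    """Toy handler that turns an application into a Grid job description."""
    def prepare(self, app_name, backend_name):
        return f"{app_name}: JDL for {backend_name}"


# (application, backend) -> handler class; one class can serve several backends.
HANDLERS = {
    ("Athena", "Local"): LocalBatchHandler,
    ("Athena", "LSF"): LocalBatchHandler,
    ("Athena", "PBS"): LocalBatchHandler,
    ("Athena", "LCG"): GridHandler,
}


def prepare_job(app_name, backend_name):
    """Look up the handler for this combination and let it prepare the job."""
    handler_cls = HANDLERS.get((app_name, backend_name))
    if handler_cls is None:
        raise LookupError(f"no runtime handler for {app_name} on {backend_name}")
    return handler_cls().prepare(app_name, backend_name)
```

Three local batch systems share a single handler class here, which is the "less coding" benefit; an unsupported combination fails with a clear error instead of running the wrong adapter.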
Help with Ganga
• Ganga documentation can be found in the User Guides section of the Ganga web site: http://cern.ch/ganga/
– Most relevant items are:
• Installation
• Working with Ganga - general introduction to functionality
• GUI manual - introduction to graphical interface
• Link to ATLAS Wiki page for distributed analysis using Ganga
– https://twiki.cern.ch/twiki/bin/view/Atlas/GangaTutorial427
– Tomorrow's hands-on sessions will use this
• For problems or feature requests, do any of the following:
– Use the hypernews forum for Ganga users and developers:
https://hypernews.cern.ch/HyperNews/Atlas/get/GANGAUserDeveloper.html
– Send e-mail to [email protected]
– Submit a report via Ganga's bug-submission page in Savannah:
https://savannah.cern.ch/bugs/?func=additem&group=ganga
• You should either log in to Savannah first, or give your e-mail address
Installation for distributed analysis with Ganga
• Software for distributed analysis with Ganga is already installed at CERN and a number of other sites
• If needed, you can perform your own installation
– Install the ATLAS software
• See: https://twiki.cern.ch/twiki/bin/view/Atlas/InstallingAtlasSoftware
– To be able to access LCG resources, install the LCG user interface
• See: https://twiki.cern.ch/twiki/bin/view/LCG/TarUIInstall
– Install the DQ2 client
• See: https://twiki.cern.ch/twiki/bin/view/Atlas/DDMClientDQ2
– Install Ganga
• Download the installation script: http://cern.ch/ganga/download/ganga-install
• Perform installation of the latest release using:
python ganga-install --extern=GangaAtlas,GangaGUI,GangaPlotter last
• With Ganga 4.3, it will be possible to add GangaNorduGrid to the package list
– This automatically installs the NorduGrid client software
Setting up for distributed analysis with Ganga
• Setup sequence is as follows:
– Ensure that you have a Grid certificate installed, and that you are registered with the ATLAS Virtual Organisation
– Set up the environment for Athena, then check out and build the UserAnalysis package (or equivalent)
– Set up the environment for using LCG client tools
– Set up the environment for using DQ2
– Set up the environment for using Ganga
• On an lxplus account at CERN, Ganga setup is performed using:
source /afs/cern.ch/sw/ganga/install/etc/setup-atlas.sh
• Ganga setup at other sites should ensure the following:
– The directory containing the ganga executable is added to PATH
– GANGA_CONFIG_PATH is set to GangaAtlas/Atlas.ini (optional, but sometimes useful)
• Detailed setup instructions are given as part of the hands-on exercises
Using Ganga
• Ganga allows users to work in a variety of ways
• Command Line Interface in Python (CLIP) provides interactive job definition and submission from an enhanced Python shell (IPython)
– Especially good for trying things out, and seeing how the system works
• Scripts, which may contain any Python/IPython or CLIP commands, allow automation of repetitive tasks
• Scripts included in the distribution enable the kind of approach traditionally used when submitting jobs to a local batch system
• Graphical User Interface (GUI) allows job management based on mouse selections and field completion
– Lots of configuration possibilities
Ganga startup and configuration files
• Ganga can be invoked in any of the following ways:
ganga              # start CLIP session
ganga --help       # print Ganga help information
ganga --gui &      # start GUI session
ganga <script>     # run specified script in Ganga environment
– If the user doesn't have a valid proxy, his/her Grid passphrase is requested
• When Ganga is first run, a configuration file .gangarc is created in the user's home directory
– The file includes comments on the configuration possibilities
– The latest default configuration file can always be obtained with:
ganga -g
• Before processing .gangarc, Ganga processes, in the order they are specified, any configuration files pointed to by the environment variable GANGA_CONFIG_PATH
– This makes possible the use of group configuration files, while still allowing settings to be overridden on a user-by-user basis
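The override order (group files from GANGA_CONFIG_PATH first, then .gangarc) can be modelled with Python 3's standard `configparser`, where a later file overrides earlier values for the same option. This is a sketch under stated assumptions, not Ganga's configuration loader: it assumes GANGA_CONFIG_PATH is a colon-separated list like PATH, and the section and option names (`[LCG]`, `VirtualOrganisation`, `DefaultCE`) are made up for the example.

```python
import configparser
import os
import tempfile


def load_config(config_path, user_rc):
    """Read group files in list order, then the user's file; later values win."""
    parser = configparser.ConfigParser()
    # Assumption: entries are separated like PATH (os.pathsep, ':' on Linux).
    files = [p for p in config_path.split(os.pathsep) if p] + [user_rc]
    parser.read(files)  # files are read in order; missing files are skipped
    return parser


with tempfile.TemporaryDirectory() as tmp:
    group = os.path.join(tmp, "Atlas.ini")   # stands in for a group file
    rc = os.path.join(tmp, ".gangarc")       # stands in for the user's file
    with open(group, "w") as f:
        f.write("[LCG]\nVirtualOrganisation = atlas\nDefaultCE = group-ce\n")
    with open(rc, "w") as f:
        f.write("[LCG]\nDefaultCE = my-ce\n")
    cfg = load_config(group, rc)
    vo = cfg.get("LCG", "VirtualOrganisation")  # inherited from the group file
    ce = cfg.get("LCG", "DefaultCE")            # overridden by the user's file
```

Settings not mentioned in the user's file keep the group value, which is what lets a group share one configuration while individuals change only what they need.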
Ganga workspace
• Ganga creates a directory gangadir in your home directory and uses this for storing job-related files and information
– You can't move this directory but, before running Ganga, you can create ~/gangadir as a link to another location
– You should delete jobs when they are no longer needed, so that Ganga input/output files don't exhaust your disk quota
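[Figure: directory tree of gangadir, with entries repository, workspace, gui, Local, Remote, <username>, jobs and templates, and per-job directories (e.g. 66, 67) containing input and output]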
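Creating ~/gangadir as a link before the first Ganga session can be sketched with Python 3's `os.symlink`. The example runs in a temporary directory with a fake home, so the paths and the helper name `link_gangadir` are hypothetical; on a real account you would pass your own home directory and a location with more disk space.

```python
import os
import tempfile


def link_gangadir(home, target):
    """Create home/gangadir as a symbolic link to target (do this before running Ganga)."""
    link = os.path.join(home, "gangadir")
    if os.path.lexists(link):
        raise FileExistsError(f"{link} already exists; move or remove it first")
    os.makedirs(target, exist_ok=True)  # make sure the real directory exists
    os.symlink(target, link)
    return link


with tempfile.TemporaryDirectory() as tmp:
    fake_home = os.path.join(tmp, "home")
    big_disk = os.path.join(tmp, "scratch", "gangadir")
    os.makedirs(fake_home)
    link = link_gangadir(fake_home, big_disk)
    is_link = os.path.islink(link)
    resolves = os.path.realpath(link) == os.path.realpath(big_disk)
```

The helper refuses to overwrite an existing gangadir, since the slide notes that the directory itself cannot be moved once Ganga is using it.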
Python commands
• Ganga is developed in Python, making use of IPython extensions
• All Python/IPython commands can be used at the prompt in a Ganga CLIP session, and the syntax for CLIP and Python commands is the same
• Information about Python can be found at: http://www.python.org/
– If you're new to Python, the on-line tutorial is very helpful
• The following are often useful
# A hash (#) marks the start of a comment
# A backslash (\) at the end of a line indicates that
# the following line is a continuation
dir()          # List currently available objects
help()         # Give help
help( item )   # Give help on specified item
x = 5          # Assign value to variable
print x        # Print value of variable
ctrl-D         # Exit from session
IPython commands
• Information about IPython extensions can be found at: http://ipython.scipy.org/
• One useful extension is the possibility to use shell commands from Python, together with both shell variables and Python variables
# Use ! before shell commands
# Use $ before Python variables
# Use $$ before shell variables
here = 'where the heart is'
!echo $$HOME is $here
!ls $$HOME/mySubdir
!emacs    # Start emacs session
!zsh      # Give shell prompt
Exit      # Exit from session
Ganga CLIP commands (1)
• Ganga commands are explained in the guide Working with Ganga: http://cern.ch/ganga/user/html/GangaIntroduction
• From a CLIP session, available classes, objects and functions may be listed, and help can be requested for each
• Useful commands include the following:
plugins( 'type' )             # List plugins of specified type:
                              # 'applications', 'backends', etc
j1 = Job( backend = LSF() )   # Create a new job for LSF
a1 = Executable()             # Create Executable application
j1.application = a1           # Set value for job's application
j1.backend = LCG()            # Change job's backend to LCG
export( j1, 'myJob.py' )      # Write job to specified file
load( 'myJob.py' )            # Load job(s) from specified file
j2 = j1.copy()                # Create j2 as a copy of job j1
jobs                          # List jobs
jobs[ i ].subjobs             # List subjobs for split job i
Ganga CLIP commands (2)
• When a job j has been defined, the following methods can be used:
j.submit()   # Submit the job
j.kill()     # Kill the job (if running)
j.remove()   # Kill the job and delete associated files
j.peek()     # List files in job's output directory
• Once a job has been submitted, it can no longer be modified, and it cannot be resubmitted, but the job can be copied and the copy can be modified/submitted
• Ganga supports use of templates, which can be used as the basis of a job definition
t = JobTemplate()            # Create template
templates                    # List templates
j3 = Job( templates[ i ] )   # Create job from template i
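The rule that a submitted job is frozen, while a copy of it starts out editable, can be modelled in a few lines of plain Python 3. `ToyJob` and its `set` method are hypothetical and only imitate the behaviour described above; in a CLIP session you would use the real `j.copy()` and assign attributes directly.

```python
from copy import deepcopy


class ToyJob:
    """Toy job whose settings are locked once it has been submitted."""

    def __init__(self, **settings):
        self.settings = dict(settings)
        self.status = "new"

    def set(self, key, value):
        if self.status != "new":
            raise RuntimeError("job already submitted: copy it and modify the copy")
        self.settings[key] = value

    def submit(self):
        if self.status != "new":
            raise RuntimeError("a submitted job cannot be resubmitted")
        self.status = "submitted"

    def copy(self):
        clone = deepcopy(self)
        clone.status = "new"  # the copy is a fresh, editable job
        return clone


j1 = ToyJob(backend="LSF")
j1.submit()
j2 = j1.copy()            # same settings as j1, but not yet submitted
j2.set("backend", "LCG")  # allowed on the copy
j2.submit()
```

Keeping the original untouched means the record of what was actually submitted always matches what ran, while the copy carries the changes.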
CLIP: “Hello World” example
• From a Ganga CLIP session, a job that writes "Hello World" can be created, and then submitted to LCG, as follows:
app = Executable()
app.exe = '/bin/echo'
app.env = {}
app.args = [ 'Hello World' ]
# Property values set above are in fact the defaults
# for Executable application
j = Job( application = app, backend = LCG() )
j.submit()
# Check on job progress
jobs
# When job has completed, check the output
j.peek( 'stdout' )
Using Ganga commands from a Linux shell
• Ganga includes scripts that can be used from a Linux shell (i.e. outside of CLIP)
# Create a job for submitting Executable to LCG
ganga make_job Executable LCG test.py
[ Edit test.py to set Executable and/or LCG properties ]
# Submit job
ganga submit test.py
# Query status, triggering output retrieval if job is completed,
# but not recommended because of risk of time-outs for status queries
ganga query
• Given the job name or id as returned by query, there are also possibilities such as:
# Kill job
ganga kill id
# Remove job from Ganga repository and workspace
ganga remove id
• The same syntax can be used from inside CLIP, with no startup overhead
Ganga plugins for ATLAS jobs
[Figure: ATLAS plugins in the Ganga class hierarchy (GangaObject; interfaces IApplication, IBackend, IDataset, ISplitter, IMerger)]
• Applications: Athena (analysis); AthenaMC and AthenaMCpyJY (production)
• Backends: LCG, LSF, other
• Input data:
- DQ2Dataset: dataset in DQ2/DDM
- ATLASLocalDataset: files on local storage
- ATLASDataset: old mc10 data in old LFC
- ATLASCastorDataset: older data on CASTOR at CERN
• Output data:
- DQ2OutputDataset: dataset in DQ2/DDM
- ATLASOutputDataset: files on local storage
• Splitters: AthenaSplitterJob, AthenaMCSplitterJob, AthenaMCpyJTSplitterJob
• Merger: AthenaOutputMerger
Starting point for using Ganga to run ATLAS applications
• Need the usual setup for running Athena
• For analysis:
– Need a steering package that defines the physics analysis
• This is any package where cmt/requirements defines all dependencies
• In the hands-on exercises, and for anyone who's followed the analysis examples in the ATLAS Workbook, the steering package is UserAnalysis
– Work from the /run subdirectory of the steering package
• For user-level production:
– Should download the JobTransform archive to the directory where Ganga is run
– Archive used in hands-on exercises is: http://cern.ch/atlas-computing/links/kitsDirectory/Production/kits/AtlasProduction_12_0_4_1_noarch.tar.gz
Using Ganga's athena script to submit analysis job to LCG
• From the Linux shell, a job can be submitted to LCG using the syntax:
ganga athena \
  --inDS misalg_csc11.005300.PythiaH130zz4l.recon.AOD.v12003104 \
  --outputdata AnalysisSkeleton.aan.root \
  --split 3 \
  --maxevt 100 \
  --lcg \
  --ce ce102.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas \
  AnalysisSkeleton_topOptions.py
• Meaning of the elements:
– ganga athena: use Ganga's athena script
– --inDS: input dataset
– --outputdata: output data
– --split 3: split job into 3 subjobs
– --maxevt 100: limit analysis to 100 events per subjob
– --lcg: submit to LCG
– --ce: force use of a particular compute element
– AnalysisSkeleton_topOptions.py: job options
• Replace --lcg with --lsf, and omit --ce, to submit to LSF
– Trivial switching between running locally and running on the Grid
• Help is available on options accepted by Ganga's athena script:
ganga athena --help
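How --split and --maxevt combine can be sketched in plain Python 3. Ganga's own AthenaSplitterJob algorithm is not described here; as an assumption, the sketch distributes input files into contiguous, near-equal blocks, one per subjob, and each subjob then processes at most the --maxevt number of events. The function name `split_files` and the file names are hypothetical.

```python
def split_files(files, numsubjobs):
    """Divide input files into at most numsubjobs contiguous, near-equal groups."""
    if numsubjobs < 1:
        raise ValueError("numsubjobs must be at least 1")
    if not files:
        return []
    n = min(numsubjobs, len(files))      # never create empty subjobs
    base, extra = divmod(len(files), n)  # the first `extra` groups get one more file
    groups, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)
        groups.append(files[start:start + size])
        start += size
    return groups


# Hypothetical file names standing in for the files of an AOD dataset.
files = [f"AOD.{i:04d}.pool.root" for i in range(1, 8)]
subjobs = split_files(files, 3)             # --split 3
max_events_per_subjob = 100                 # --maxevt 100
upper_bound_events = max_events_per_subjob * len(subjobs)
```

With seven files and three subjobs the groups hold 3, 2 and 2 files, and the whole job analyses at most 300 events.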
Monitoring job progress and retrieving output
• To monitor job progress, you should start a Ganga CLIP or GUI session
• In CLIP, changes in the status of jobs/subjobs are buffered, and are printed when you hit return
• At any time, you can also explicitly request status information:
# Print status information for all jobs
jobs
# Print status information for particular subjob
print jobs[5].subjobs[27].status
• When a job completes, the Ganga monitoring loop takes care of storing the output, and registers it with DQ2 with a dataset name of the form user.username.ganga.jobid
• Output can be listed and retrieved using DQ2 client tools:
dq2_ls -f user.username.ganga.jobid
dq2_get -r user.username.ganga.jobid
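The dataset-naming convention and the two DQ2 commands can be captured in small plain-Python 3 helpers, for example to build the commands in a script. The helper names are hypothetical; the lists are only argument vectors that could be passed to `subprocess.run` on a machine with the DQ2 client installed, and nothing is executed here.

```python
def ganga_output_dataset(username, jobid):
    """Build a dataset name of the form user.username.ganga.jobid."""
    if not username:
        raise ValueError("username must be non-empty")
    return f"user.{username}.ganga.{int(jobid)}"


def dq2_commands(dataset):
    """Argument lists for listing and retrieving a dataset with the DQ2 client tools."""
    return [["dq2_ls", "-f", dataset], ["dq2_get", "-r", dataset]]


name = ganga_output_dataset("jdoe", 42)  # "jdoe" is a made-up username
listing, retrieval = dq2_commands(name)
```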
Running an analysis job from CLIP (1)
• Create application object, set job options and prepare tar file of user area
– Other properties are filled automatically, based on user setup
app = Athena()
app.option_file = 'myOpts.py'
app.prepare( athena_compile = False )
• Define the input dataset
inData = DQ2Dataset()
inData.dataset = 'interestingDataset.AOD.v12003104'
inData.type = 'DQ2_Local'
• Define the output dataset
outData = AthenaOutputDataset()
outData.outputdata = 'myOutput.root'
Running an analysis job from CLIP (2)
• Define splitter, merger and backend
splitter = AthenaSplitterJob( numsubjobs = 2 )
merger = AthenaOutputMerger()
backend = LCG( CE = 'reliableCE' )
• Create job template from defined objects
t = JobTemplate( name = 'TestAnalysis' )
t.application = app
t.backend = backend
t.inputdata = inData
t.outputdata = outData
t.splitter = splitter
t.merger = merger
Running an analysis job from CLIP (3)
• Create job from the template and submit the job
j = Job( t )
j.submit()
• Check job status
jobs
• When job has completed, check standard outputs of subjobs, then retrieve and merge ROOT output files
j.subjobs[0].peek( 'stdout' )
j.subjobs[1].peek( 'stdout' )
j.outputdata.retrieve()
j.merge()
User-level production
• Event production is broken down into three steps:
– evgen: generate particle kinematics
– simul+digit: simulate particles passing through detector - RDO output
– recon: event reconstruction - AOD, ESD, CBNT output
• With Ganga 4.3, submission of production jobs from the Linux shell will be possible using Ganga's athena script
• As a CLIP example, consider generation of 30 events containing a single electron with ET > 40 GeV
– Same example used in hands-on exercises
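The chaining of the three production steps, where each step consumes the output of the one before, can be written down as a small plain-Python 3 plan. This is an illustrative model only: `STEPS` and `plan_chain` are hypothetical names, the output labels are the ones listed above, and AthenaMC's actual handling of steps (its mode setting) is not modelled.

```python
# The three production steps and the outputs listed on the slide.
STEPS = [
    ("evgen", "particle kinematics"),
    ("simul+digit", "RDO"),
    ("recon", "AOD, ESD, CBNT"),
]


def plan_chain(number_events, first_step="evgen"):
    """List the steps from first_step onwards; each step reads the previous output."""
    names = [name for name, _ in STEPS]
    start = names.index(first_step)  # ValueError for an unknown step
    previous = STEPS[start - 1][1] if start > 0 else None
    plan = []
    for name, output in STEPS[start:]:
        plan.append({"step": name, "input": previous,
                     "output": output, "events": number_events})
        previous = output
    return plan


full_chain = plan_chain(30)                         # the 30-event example
recon_only = plan_chain(30, first_step="recon")     # starts from existing RDO
```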
Running user-level production from CLIP (1)
• Create application object, and set properties
app = AthenaMC()
app.atlas_release = '12.0.4'
app.transform_archive = 'AtlasProduction_12_0_4_1_noarch.tar.gz'
app.production_name = 'tutorial'
app.mode = 'evgen'
app.evgen_job_option = 'DC3.007004.singlepart_e_Et40.py'
app.process_name = 'single_e_Et40'
app.run_number = '000001'
app.firstevent = '1'
app.random_seed = '1102362401'
app.number_events_job = '30'
app.se_name = 'NIKHEF'
Running user-level production from CLIP (2)
• Define the output dataset
outData = DQ2OutputDataset()
– The output is stored at the site specified by app.se_name
– Naming convention explained in hands-on exercises
• Define LCG backend, with execution forced at a particular site
backend = LCG()
backend.CE = 'tbn20.nikhef.nl:2119/jobmanager-pbs-atlas'
• Create job template from defined objects
t = JobTemplate( name = 'TestGeneration' )
t.application = app
t.backend = backend
t.outputdata = outData
• Create job from template and submit
Job( t ).submit()
Ganga Graphical User Interface (GUI)
• GUI consists of central monitoring panel and dockable windows
• Essentially everything that can be done in CLIP can be done with the GUI
– More details in presentation tomorrow
[Figure: GUI screenshot with labelled panes: Job details, Logical Folders, Scriptor, Job Monitoring, Log window, Job builder]
Conclusions
• Have given an overview of:
– the ideas behind Ganga
– getting started with Ganga, running a "Hello World" job
– using Ganga to run ATLAS applications
• Have probably made it seem more complicated than it is in practice
• To see that Ganga is quite easy to use, you just have to try it
– Chance for this, and more detailed explanations of the functionality, in the Ganga hands-on sessions tomorrow