ATLAS Data Challenge on NorduGrid CHEP2003 – UCSD Anders Wäänänen...
ATLAS Data Challenge with NorduGrid – Anders Wäänänen
NorduGrid project
Launched in spring of 2001, with the aim of creating a Grid infrastructure in the Nordic countries.
The idea was to have a MONARC architecture with a common Tier 1 center
Partners from Denmark, Norway, Sweden, and Finland
Initially meant to be the Nordic branch of the EU DataGrid (EDG) project
3 full-time researchers, plus a few externally funded
Motivations
NorduGrid was initially meant to be a pure deployment project
One goal was to have the ATLAS data challenge run by May 2002
Should be based on the Globus Toolkit™
Available Grid middleware:
  The Globus Toolkit™
    A toolbox – not a complete solution
  European DataGrid software
    Not mature for production at the beginning of 2002
    Architecture problems
A Job Submission Example
[Figure: job submission example. Components: UI (JDL), Resource Broker, Job Submission Service, Logging & Book-keeping, Information Service, Replica Catalogue, Storage Element, Compute Element. Labelled flows: authorization & authentication, job submit, job query, job status, input "sandbox", output "sandbox", Brokerinfo.]
Architecture requirements
No single point of failure
Should be scalable
Resource owners should have full control over their resources
As few site requirements as possible: Local cluster installation details should not be dictated
Method, OS version, configuration, etc…
Compute nodes should not be required to be on the public network
Clusters need not be dedicated to the Grid
User interface
The NorduGrid user interface provides a set of commands for interacting with the grid
ngsub – to submit jobs
ngstat – to show the states of jobs and clusters
ngcat – to see stdout/stderr of running jobs
ngget – to retrieve the results from finished jobs
ngkill – to kill running jobs
ngclean – to delete finished jobs from the system
ngcopy – to copy files to, from and between file servers and replica catalogs
ngremove – to delete files from file servers and RCs
ATLAS Data Challenges
A series of computing challenges within ATLAS of increasing size and complexity.
Preparing for data-taking and analysis at the LHC.
Thorough validation of the complete ATLAS software suite.
Introduction and use of Grid middleware as quickly and as widely as possible.
Data Challenge 1
Main goals:
  Need to produce data for High Level Trigger (HLT) & Physics groups
    Study performance of Athena framework and algorithms for use in HLT
    High statistics needed
      Few samples of up to 10^7 events in 10-20 days, O(1000) CPUs
    Simulation & pile-up
  Reconstruction & analysis on a large scale
    Learn about data model; I/O performance; identify bottlenecks, etc.
  Data management
    Use/evaluate persistency technology (AthenaRoot I/O)
    Learn about distributed analysis
  Involvement of sites outside CERN
    Use of Grid as and when possible and appropriate
DC1, phase 1: Task Flow
Example: one sample of di-jet events
  PYTHIA event generation: 1.5 × 10^7 events split into partitions (read: ROOT files)
  Detector simulation: 20 jobs per partition, ZEBRA output
[Figure: task flow. Event generation: Pythia6 di-jet, writing HepMC via Athena-Root I/O (10^5 events). Detector simulation: several parallel Atlsim/Geant3 + Filter jobs, each reading HepMC and writing Hits/Digits and MCTruth in ZEBRA format. Labels: (5000 evts), (~450 evts).]
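The partitioning in this example can be checked with a little arithmetic. The sketch below is in Python and rests on a reading of the diagram labels that the slide does not spell out: 10^5 generated events per partition, 5000 input events per simulation job, and roughly 450 events surviving the filter per job. All variable names are illustrative.

```python
# Back-of-the-envelope check of the DC1 phase 1 di-jet task flow.
# Assumption: the diagram labels mean 10^5 events per partition,
# 5000 input events per simulation job, ~450 events kept by the filter.
GENERATED_EVENTS = 15_000_000      # 1.5 x 10^7 PYTHIA events
EVENTS_PER_PARTITION = 100_000     # 10^5 events per ROOT file
JOBS_PER_PARTITION = 20            # stated on the slide
EVENTS_KEPT_PER_JOB = 450          # approximate, after the filter

partitions = GENERATED_EVENTS // EVENTS_PER_PARTITION
events_per_job = EVENTS_PER_PARTITION // JOBS_PER_PARTITION
simulation_jobs = partitions * JOBS_PER_PARTITION
filter_fraction = EVENTS_KEPT_PER_JOB / events_per_job

print(f"partitions:       {partitions}")
print(f"events per job:   {events_per_job}")
print(f"simulation jobs:  {simulation_jobs}")
print(f"filter fraction:  {filter_fraction:.1%}")
```

Under these assumptions, 20 jobs of 5000 events each cover one partition of 10^5 events exactly, which is what makes this reading of the labels plausible.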
DC1, phase 1: Summary
July-August 2002
39 institutes in 18 countries
3200 CPUs, approx. 110 kSI95 – 71000 CPU-days
5 × 10^7 events generated
1 × 10^7 events simulated
30 Tbytes produced
35 000 files of output
DC1, phase 1 for NorduGrid
Simulation
Datasets 2000 & 2003 (different event generation) assigned to NorduGrid
Total number of fully simulated events: 287296 (1.15 × 10^7 input events)
Total output size: 762 GB.
All files uploaded to a Storage Element (University of Oslo) and registered in the Replica Catalog.
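The figures above imply a filter acceptance and an average event size that are easy to derive. The following Python sketch only rearranges the numbers quoted on this slide. The conversion 1 GB = 1000 MB is an assumption, since the slide does not say which convention it uses.

```python
# Figures quoted on the slide for NorduGrid's share of DC1 phase 1.
INPUT_EVENTS = 11_500_000        # 1.15 x 10^7 generated events read as input
SIMULATED_EVENTS = 287_296       # events fully simulated
OUTPUT_GB = 762                  # total output size

# Assumption: 1 GB = 1000 MB (the slide does not state the convention).
accepted_fraction = SIMULATED_EVENTS / INPUT_EVENTS
mb_per_event = OUTPUT_GB * 1000 / SIMULATED_EVENTS

print(f"fraction of input events simulated: {accepted_fraction:.2%}")
print(f"average output per simulated event: {mb_per_event:.2f} MB")
```

Roughly one input event in forty was fully simulated, and each simulated event accounts for a few megabytes of output.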
Job xRSL script
&
(executable="ds2000.sh")
(arguments="1244")
(stdout="dc1.002000.simul.01244.hlt.pythia_jet_17.log")
(join="yes")
(inputfiles=("ds2000.sh" "http://www.nordugrid.org/applications/dc1/2000/dc1.002000.simul.NG.sh"))
(outputfiles=
  ("atlas.01244.zebra" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.zebra")
  ("atlas.01244.his" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.his")
  ("dc1.002000.simul.01244.hlt.pythia_jet_17.log" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.log")
  ("dc1.002000.simul.01244.hlt.pythia_jet_17.AMI" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.AMI")
  ("dc1.002000.simul.01244.hlt.pythia_jet_17.MAG" "rc://dc1.uio.no/2000/log/dc1.002000.simul.01244.hlt.pythia_jet_17.MAG"))
(jobname="dc1.002000.simul.01244.hlt.pythia_jet_17")
(runtimeEnvironment="DC1-ATLAS")
(replicacollection="ldap://grid.uio.no:389/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org")
(maxCPUTime=2000)(maxDisk=1200)
(notify="e [email protected]")
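Since the script differs between jobs only in the partition number, descriptions like this lend themselves to generation from a template. The Python sketch below shows one way that might be done. `make_xrsl` is a hypothetical helper, not part of the NorduGrid toolkit. The attribute names, file names, and URLs are copied from the script above. The notify attribute is left out because its address is obscured in the source.

```python
def make_xrsl(partition: int) -> str:
    """Build an xRSL job description shaped like the one on the slide.

    make_xrsl is a hypothetical helper, not part of the NorduGrid toolkit.
    """
    num = f"{partition:05d}"                     # 1244 -> "01244"
    base = f"dc1.002000.simul.{num}.hlt.pythia_jet_17"
    rc = "rc://dc1.uio.no/2000/log"              # destination copied from the slide
    # Pairs of (local file name, extension of the registered name).
    outputs = [
        (f"atlas.{num}.zebra", "zebra"),
        (f"atlas.{num}.his", "his"),
        (f"{base}.log", "log"),
        (f"{base}.AMI", "AMI"),
        (f"{base}.MAG", "MAG"),
    ]
    output_lines = "\n".join(
        f'  ("{local}" "{rc}/{base}.{ext}")' for local, ext in outputs
    )
    return "\n".join([
        "&",
        '(executable="ds2000.sh")',
        f'(arguments="{partition}")',
        f'(stdout="{base}.log")',
        '(join="yes")',
        '(inputfiles=("ds2000.sh" '
        '"http://www.nordugrid.org/applications/dc1/2000/dc1.002000.simul.NG.sh"))',
        "(outputfiles=",
        output_lines + ")",
        f'(jobname="{base}")',
        '(runtimeEnvironment="DC1-ATLAS")',
        '(replicacollection="ldap://grid.uio.no:389/'
        'lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org")',
        "(maxCPUTime=2000)(maxDisk=1200)",
    ])


print(make_xrsl(1244))
```

Calling `make_xrsl` in a loop over partition numbers would yield one description per simulation job, each of which could then be handed to ngsub.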
NorduGrid job submission
The user submits an xRSL file specifying the job options.
The xRSL file is processed by the User Interface.
The User Interface queries the NorduGrid Information System for resources and the NorduGrid Replica Catalog for the location of input files, then submits the job to the selected resource.
There, the job is processed by the Grid Manager, which downloads or links the input files to the local session directory.
The Grid Manager submits the job to the local resource management system.
After the simulation finishes, the Grid Manager moves the requested output to Storage Elements and registers it in the NorduGrid Replica Catalog.
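The resource-selection step in the sequence above can be pictured with a toy model. The Python sketch below is not the NorduGrid brokering algorithm, which the slides do not describe. It assumes a simple rule: keep clusters that advertise the requested runtime environment and have a free CPU, then prefer the one already holding the most input files. `Cluster`, `select_cluster`, and the cluster names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical snapshot of what an information system might report."""
    name: str
    free_cpus: int
    runtime_environments: set = field(default_factory=set)
    local_files: set = field(default_factory=set)


def select_cluster(clusters, runtime_env, input_files):
    """Pick a cluster for one job (toy brokering rule, an assumption).

    Keep clusters that advertise the runtime environment and have a free
    CPU, then prefer the one already holding the most input files,
    breaking ties by the number of free CPUs.
    """
    candidates = [
        c for c in clusters
        if runtime_env in c.runtime_environments and c.free_cpus > 0
    ]
    if not candidates:
        return None
    wanted = set(input_files)
    return max(
        candidates,
        key=lambda c: (len(c.local_files & wanted), c.free_cpus),
    )


clusters = [
    Cluster("cluster-a", 10, {"DC1-ATLAS"}),
    Cluster("cluster-b", 4, {"DC1-ATLAS"}, {"ds2000.sh"}),
    Cluster("cluster-c", 50, set()),   # no ATLAS software installed
]
chosen = select_cluster(clusters, "DC1-ATLAS", ["ds2000.sh"])
print(chosen.name)
```

In this toy setup, cluster-c is excluded despite having the most free CPUs, because it does not offer the runtime environment. This mirrors the remark elsewhere in the talk that not every cluster allows installation of the ATLAS software.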
NorduGrid job submission
[Figure: NorduGrid job submission. Components: RC, MDS, Gatekeeper, GridFTP, Grid Manager; the RSL job description is passed between them.]
NorduGrid Production sites
NorduGrid Pileup
DC1, pile-up: Low luminosity pile-up for the phase 1 events
Number of jobs: 1300
  dataset 2000: 300
  dataset 2003: 1000
Total output size: 1083 GB
  dataset 2000: 463 GB
  dataset 2003: 620 GB
Pileup procedure
Each job downloaded one ZEBRA file from dc1.uio.no, of approximately:
  900 MB for dataset 2000
  400 MB for dataset 2003
Locally present minimum-bias ZEBRA files were used to "pile up" events on top of the original simulated ones in the downloaded file.
The output of each job was about 50 % bigger than the downloaded file, i.e.:
  1.5 GB for dataset 2000
  600 MB for dataset 2003
Output files were uploaded to the dc1.uio.no and dc2.uio.no SEs and registered in the RC.
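The per-job output sizes can be cross-checked against the job counts and total output sizes given on the NorduGrid Pileup slide. A short Python sketch of that check follows. It uses only numbers from the slides, and the dictionary layout is illustrative.

```python
# Pile-up totals quoted on the NorduGrid Pileup slide:
# dataset -> (number of jobs, total output in GB).
PILEUP = {
    "2000": (300, 463),
    "2003": (1000, 620),
}

total_jobs = sum(jobs for jobs, _ in PILEUP.values())
total_gb = sum(gb for _, gb in PILEUP.values())
avg_gb_per_job = {ds: gb / jobs for ds, (jobs, gb) in PILEUP.items()}

print(f"total jobs: {total_jobs}, total output: {total_gb} GB")
for ds, avg in avg_gb_per_job.items():
    print(f"dataset {ds}: {avg:.2f} GB of output per job on average")
```

The averages come out near 1.5 GB per job for dataset 2000 and near 0.6 GB per job for dataset 2003, and the per-dataset figures add up to the quoted totals of 1300 jobs and 1083 GB.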
Other details
At peak production, up to 200 jobs were managed by NorduGrid at the same time.
NorduGrid includes most of the Scandinavian production clusters (2 of them are in the Top 500)
  However, not all of them allow installation of the ATLAS software
The ATLAS job manager, Atlas Commander, supports the NorduGrid toolkit
Issues
  Replica Catalog scalability problems
  MDS / OpenLDAP hangs – solved
  Software threading problems – partly solved
    Problems partly in Globus libraries
NorduGrid DC1 timeline
April 5th 2002: First ATLAS job submitted (Athena Hello World)
May 10th 2002: First pre-DC1 validation job submitted (ATLSIM test using ATLAS release 3.0.1)
End of May 2002: It was now clear that NorduGrid was mature enough to handle real production
Spring 2003 (now): Keep running Data Challenges and improve the toolkit
Quick client installation/job run
As a normal user (no system privileges required):
Retrieve nordugrid-standalone-0.3.17.rh72.i386.tgz
tar xfz nordugrid-standalone-0.3.17.rh72.i386.tgz
cd nordugrid-standalone-0.3.17
source ./setup.sh
Get a personal certificate:
grid-cert-request
Install certificate per instructions
Get authorized on a cluster
Run a job
grid-proxy-init
ngsub '&(executable=/bin/echo)(arguments="Hello World")'
Resources
Documentation and source code are available for download
Main Web site: http://www.nordugrid.org/
ATLAS DC1 with NorduGrid: http://www.nordugrid.org/applications/dc1/
Software repository: ftp://ftp.nordugrid.org/pub/nordugrid/
The NorduGrid core group
Александр Константинов
Balázs Kónya
Mattias Ellert
Оксана Смирнова
Jakob Langgaard Nielsen
Trond Myklebust
Anders Wäänänen