ATLAS Data Challenges on the Grid
Oxana Smirnova, Lund University
October 31, 2003, Košice

Page 1:

ATLAS Data Challenges on the Grid

Oxana Smirnova, Lund University
October 31, 2003, Košice

Page 2:

Large Hadron Collider: the world's biggest accelerator at CERN

Page 3:

Collisions at LHC

Page 4:

Physics

- Higgs: clarify the origin of the spontaneous symmetry-breaking mechanism in the electroweak sector of the Standard Model; the origin of mass
- New forces (symmetries)
- New particles
- Supersymmetry
- Substructure
- Etc.

Page 5:

ATLAS: one of the 4 detectors at the LHC

Page 6:

ATLAS: preparing for data taking

Page 7:

Just finished: Data Challenge 1 (DC1)
- Event generation completed during DC0
- Main goals of DC1:
  - Produce simulated data for the High Level Trigger and physics groups
  - Reconstruction and analysis on a large scale: learn about the data model and I/O performance, identify bottlenecks, etc.
  - Data management:
    - Use/evaluate persistency technology
    - Learn about distributed analysis
  - Involvement of sites outside CERN
  - Use of the Grid as and when possible and appropriate

Page 8:

DC1, Phase 1: Task Flow
Example: one sample of di-jet events

- PYTHIA event generation: 1.5 × 10^7 events, split into partitions (read: ROOT files)
- Detector simulation: 20 jobs per partition, ZEBRA output

[Task-flow diagram: Pythia6 event generation of di-jet events (Athena-Root I/O, HepMC output; 5000 events, ~450 events after the filter, 10^5 events) feeding parallel Atlsim/Geant3 + filter detector-simulation jobs, each producing Hits/Digits and MCTruth in ZEBRA format.]

Page 9:

Piling up events

Page 10:

DC1 in facts
Integrated DC1 numbers:
- 56 institutes in 21 countries
- 120,000 "normalized" CPU-days
- 5 × 10^7 events generated
- ca. 1 × 10^7 events simulated and reconstructed
- 70 TB of data produced

More precise quantification is very difficult, because complexity differs by orders of magnitude between different physics channels and processing steps.

1. CPU time consumption: largely unpredictable, very irregular
2. OS: GNU/Linux, 32-bit architecture
3. Inter-processor communication: has never been a concern so far (no MPI needed)
4. Memory consumption: depends on the processing step/data set; 512 MB is enough for simulation, while reconstruction needs 1 GB as a minimum
5. Data volumes: vary from kB to GB per job
6. Data access pattern: mostly unpredictable, irregular
7. Databases: each worker node is expected to be able to access a remote database
8. Software: under constant development, will certainly exceed 1 GB, and includes multiple dependencies on HEP-specific software, sometimes licensed

Page 11:

Grids for ATLAS
- US-ATLAS
- EDG Applications Testbed
- NorduGrid

Page 12:

US-ATLAS Grid: Virtual Data Toolkit
- Perhaps the largest facility in a single country, CPU-wise
- Very heterogeneous, but still within one country: few administrative and security-related problems
- VDT: a set of low-level Grid services
  - Globus is a major component: security, servers
  - Condor is used in a "reduced" manner: no parallelism
  - Chimera: a tool to interpret job descriptions and perform job submission
  - Globus RLS is used for data management, BUT there is no workload management and no information system
- Active participation in the pile-up and reconstruction stages, estimated at 3% of the total ATLAS DC effort in these stages

Page 13:

ATLAS-EDG Task Force
- By the time EDG set up an Applications Testbed, ATLAS DC was already running on the Grid in production mode (NorduGrid, VDT)
- The ATLAS-EDG Task Force was put together in August 2002 with the aims:
  - To assess the usability of the EDG testbed for the immediate production tasks
  - To introduce Grid awareness to the ATLAS collaboration
- The Task Force had representatives from both ATLAS and EDG: 40+ members (!) on the mailing list, ca. 10 of them working nearly full-time
- The initial goal: reproduce 2% of the Dataset 2000 simulation on the EDG testbed; if this works, continue with other tasks
- The Task Force was the first of its kind, both for EDG and for the LHC experiments

Page 14:

EDG: Execution of jobs
- It was expected that we could make full use of the Resource Broker functionality:
  - Data-driven job steering
  - Best available resources otherwise
- Input files are pre-staged once (copied from CASTOR and replicated elsewhere)
- A job consists of the standard DC1 shell script, very much the way it is done on a conventional cluster
- A Job Definition Language (JDL) file is used to wrap up the job, specifying (see the sketch below):
  - The executable file (script)
  - Input data
  - Files to be retrieved manually by the user
  - Optionally, other attributes (MaxCPU, Rank, etc.)
- Storage and registration of output files is part of the job script, i.e. the application manages output data the way it needs
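For concreteness, a minimal sketch of such a JDL wrapper is shown below. The script name, dataset and partition numbers, logical file names, and the exact requirement and rank attributes are hypothetical examples of the EDG JDL style of that period, not the actual DC1 production values.

```
// Illustrative JDL sketch (hypothetical names and values, not the real DC1 files)
Executable    = "dc1.simul.sh";
Arguments     = "002000 1234";
StdOutput     = "dc1.simul.1234.log";
StdError      = "dc1.simul.1234.err";
// Shipped together with the job:
InputSandbox  = {"dc1.simul.sh"};
// Retrieved manually by the user after the job finishes:
OutputSandbox = {"dc1.simul.1234.log", "dc1.simul.1234.err"};
// Logical input file, resolved via the Replica Catalog (data-driven steering):
InputData     = {"LF:dc1.002000.evgen.1234.root"};
// Optional attributes, e.g. a CPU-time requirement and a ranking expression:
Requirements  = other.MaxCPUTime > 1440;
Rank          = other.FreeCPUs;
```

The InputData attribute is what lets the Resource Broker do the data-driven steering mentioned above; everything the job produces is stored and registered from inside the script itself.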

Page 15:

Simulation: many hurdles were hit
- EDG could not replicate files directly from CASTOR and could not register them in the Replica Catalog
  - Replication was done via the CERN SE; the CASTOR team wrote a GridFTP interface, which helps a bit
- Big file transfers were interrupted after 21 minutes
  - A known Globus GridFTP server problem, temporarily worked around by using multi-threaded GridFTP instead of the EDG tools (later fixed by Globus)
- Jobs were "lost" by the system after 20 minutes of execution
  - A known problem of the Globus software (GASS cache mechanism), temporarily worked around at the expense of frequent job submission (later fixed by Globus)
- Static information system: if a site goes down, it has to be removed manually from the index
  - Due to Globus and OpenLDAP bugs; these were never fixed, as EDG changed to R-GMA
- Plenty of other small problems; in general, every service was very unstable
- EDG developers kept working hard to fix them

Page 16:

Reconstruction: improved service
- A similar task computing-wise, but a smaller load:
  - 250 reconstruction jobs, ~500K events (~70 CPU-days)
- Input originally resides at CASTOR, copied onto EDG sites mostly via FTP
- A total of ~70 CPUs was available

Dataset  Physics description                # of files  Location
2040     QCD di-jets, Et < 70 GeV, lumi02   50          Lyon
2043     QCD di-jets, Et < 140 GeV, lumi02  50          25 CNAF, 25 Milan
2044     QCD di-jets, Et < 280 GeV, lumi02  50          Cambridge
2045     QCD di-jets, Et < 560 GeV, lumi02  100         20 Rome, 30 CNAF, 50 Lyon

Page 17:

Much preparatory work needed

- Downgrade the PBS version w.r.t. the EDG standard
- At the time, ATLAS needed RedHat 7.3 and EDG RedHat 6.2; some merging was necessary
- Force-install some components, ignoring some dependencies
- Publish the ATLAS RunTimeEnvironment in the EDG Information System
- Installation procedures for the ATLAS software differed from site to site:
  - Milan and CNAF: LCFG
  - Rome and Cambridge: by hand
  - Lyon: through AFS

Page 18:

EDG tests: final reconstruction results
- Success rate:
  - Submitted jobs: 250 (by 4 different people)
  - Successful jobs: 248, out of which ~35 were resubmitted:
    - 14 for a corrupted input file
    - 15 remained in "Status = Ready"
    - 5 for HPSS problems (HPSS timed out)
  - Failed jobs: 2, NOT due to Grid failures:
    - one probably related to input file corruption
    - one created an abnormally small output
- Overall performance:
  - The RB needed maintenance only twice
  - Lyon was out for 2 days due to certificate problems
  - Otherwise, stability is encouraging
  - Still, strong involvement of each site manager was necessary

Page 19:

NorduGrid: how does it work
- All the services are either taken from Globus or written using the Globus libraries and API
- The information system knows everything:
  - Substantially re-worked and patched Globus MDS
  - Distributed and multi-rooted
  - Allows for a mesh topology
  - No need for a centralized broker
- The server ("Grid Manager") on each gatekeeper does most of the job:
  - Pre- and post-stages files
  - Interacts with PBS
  - Keeps track of job status
  - Cleans up the mess
  - Sends mails to users
- The client ("User Interface") does the Grid job submission, monitoring, termination, retrieval, cleaning, etc. (see the job-description sketch below):
  - Interprets the user's job task
  - Gets the testbed status from the information system
  - Forwards the task to the best Grid Manager
  - Does some file uploading, if requested
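As an illustration of the job task the User Interface interprets, the sketch below shows roughly what a NorduGrid job description (xRSL) of that period might look like. The script name, file names, URLs, and runtime-environment tag are hypothetical examples, not the actual DC1 production files.

```
(* Illustrative xRSL sketch (hypothetical names, URLs and values, not the real DC1 files) *)
&(executable="dc1.simul.sh")
 (arguments="002000" "1234")
 (stdout="dc1.002000.simul.1234.log")
 (join="yes")
 (* fetched by the Grid Manager before the job starts: *)
 (inputFiles=("dc1.simul.sh" "http://www.example.org/dc1/dc1.simul.sh"))
 (* uploaded and registered by the Grid Manager when the job ends: *)
 (outputFiles=("dc1.002000.simul.1234.zebra"
               "rc://rc.example.org/dc1/zebra/dc1.002000.simul.1234.zebra"))
 (* the ATLAS software is expected to be pre-installed at the chosen site: *)
 (runTimeEnvironment="DC1-ATLAS")
 (cpuTime="1500")
 (jobName="dc1.002000.simul.1234")
```

The inputFiles/outputFiles pairs are what allows the Grid Manager to do the pre- and post-staging described above, so the job itself never has to move data over the network.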

Page 20:

The resources

Cluster      Processor type                CPUs  Memory (MB)
Monolith     Intel Xeon 2200 MHz           392   2048
Seth         AMD Athlon 1667 MHz           234   850
Parallab     Pentium III 1001 MHz          62    2048
Lund Farm    Pentium III 450 MHz           48    256
Lund Farm    Pentium III 800 MHz           32    512
LSCF         Dual AMD Athlon MP 1800+      32    512
Ingvar       AMD Athlon 908 MHz            31    512
SCFAB        AMD Athlon 950 MHz            22    512
Oslo         Pentium III 1001 MHz          19    256
Grendel      AMD Athlon 1007 MHz           13    256
Lund         Pentium III 1001 MHz          7     512
NBI          AMD Athlon 1200 MHz           6     512
Uppsala      Pentium III 870 MHz           4     512
Bergen       Pentium III 1001 MHz          4     256

Very heterogeneous: no identical installations!

Page 21:

Running ATLAS DC1
- April 14, 2002: first live public demonstration of a PYTHIA job
- May 23, 2002: first simulation tests
  - Information system in place
  - Grid Manager has basic capabilities
  - User Interface can do simple matchmaking
  - RPMs of the ATLAS software are introduced; they have to be installed by each site manager
- July-August 2002: real work
  - Simulation: 300 partitions of the dataset 002000 (July 25 to August 8)
  - Total absence of a conventional "fall-back" solution
  - All the input files are replicated to all the sites
  - First real test; surprisingly, it did not fall apart
    - Intermittent problems with the Replica Catalog: unstable
    - Some glitches in MDS
    - And of course our own bugs
    - But it was possible to upgrade "on the fly"
    - And we were not the last!

Page 22:

Running ATLAS DC1 (contd.)
- Fall 2002: NorduGrid is no longer considered a "test", but rather a facility
  - Non-ATLAS users at times are taking over
  - Simulation of the full dataset 2003 (1000 output partitions, August 31 to September 10)
  - Introducing new features in order to meet the pile-up demands
- Winter 2002-2003: running min. bias pile-up
  - Dataset 002000 (300 partitions, December 9 to 14)
  - Dataset 002003, full (February 11 to March 5, with an upgrade in the middle)
  - Some sites cannot accommodate all the needed min. bias files, hence jobs are no longer really data-driven
- Spring-summer 2003: reconstruction (2 stages)
  - The NorduGrid facilities and middleware are very reliable (people at times forget it is actually a Grid setup)
  - Apart from our "own" data, 750 other partitions are taken over from the US and Russia
    - Logistical problems of having certificates mutually installed!
    - No data-driven jobs
  - The biggest challenge: to "generalize" the ATLAS software to suit everybody and to persuade big sites to install it
  - No conventional resources available as an alternative! If NorduGrid fails, ATLAS DC1 is in trouble.

Page 23:

Accommodating the data

- Every mass production period increases the amount of data to be stored
- At the moment, ca. 4 TB is occupied
- Existing storage:
  - Oslo: 4 × 1 TB
  - Umeå: 600 GB
  - Linköping: 1 TB
  - Lund: 1.2 TB
- Within a couple of years, and with other applications coming, we would need an order of magnitude (or two?) more
- A serious Data Management System is badly needed

Stage                  Input data  Output data
Simulation 002000      20 GB       320 GB
Simulation 002003      160 GB      440 GB
Pile-up 002000         320 GB      450 GB
Pile-up 002003         440 GB      900 GB
Extra                  600 GB
Reconstruction 002000  1.1 TB      9 GB
Extra                  750 GB
Reconstruction 002001  750 GB      10 GB

Page 24:

NorduGrid DC1: Summary

- All tasks are performed in the Grid environment:
  - Input/output data transfer and management
  - Job execution
  - Monitoring
- Total CPU time: 6500 NCPU-days
- Approximately 15% contribution to ATLAS DC1 in the latest phase
  - Phase 1: input 176 GB → output 760 GB
  - Phase 2: input 1080 GB → output 1440 GB
  - Phase 3: input 5 TB → output 40 GB
- Very fast development from a testbed to a real production-level Grid
  - NorduGrid functionality is driven by the needs of DC1
- Stability and reliability are very impressive: no extra effort from sysadmins is required
- Hardware failures are inevitable, but clever job-recovery mechanisms are still missing

Page 25:

Summary

- ATLAS DC has run on the Grid since summer 2002 (NorduGrid, US Grid)
- Future DCs will be to a large extent (if not entirely) gridified
- Grids that we tried:
  - NorduGrid: a Globus-based solution developed in the Nordic countries; provides a stable and reliable facility and executes all of the Nordic share of the DCs
  - US Grid (iVDGL/VDT): basically Globus tools, hence missing high-level services, but still serves ATLAS well, executing ca. 10% of the US DC share
  - EU DataGrid (EDG): a far more complex solution (but Globus-based, too), still in development and not yet suitable for production, but it can perform simple tasks; it did not contribute significantly to the DCs, but extensive tests were conducted
- Grids that are coming:
  - LCG: will initially be strongly based on EDG, hence may not be reliable before 2004
  - EGEE: another continuation of EDG, still in the preparation stage
  - Globus moves towards Open Grid Services: may imply major changes both in existing solutions and in planning