Atlas Computing
Alessandro De Salvo <[email protected]>
Terzo workshop sul calcolo dell'INFN, 5-2004
Outline
• Computing model
• Activities in 2004
• Conclusions
Atlas Data Rates per year

Data type                           Rate (Hz)  sec/year  Events/year  Size (MB)  Total (TB)
Raw Data                            200        1.0E+07   2.0E+09      1.6        3200
ESD (Event Summary Data)            200        1.0E+07   2.0E+09      0.5        1000
General ESD                         180        1.0E+07   1.8E+09      0.5        900
General AOD (Analysis Object Data)  180        1.0E+07   1.8E+09      0.1        180
General TAG                         180        1.0E+07   1.8E+09      0.001      2
Calibration                         -          -         -            -          40
MC Raw                              -          -          1.0E+08     2          200
ESD Sim                             -          -          1.0E+08     0.5        50
AOD Sim                             -          -          1.0E+08     0.1        10
TAG Sim                             -          -          1.0E+08     0.001      0
Tuple                               -          -          -           0.01       -

Nominal year: 10^7 s. Accelerator efficiency: 50%.
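The totals are simple arithmetic on rate, live time and event size; a minimal sketch reproducing the real-data rows (the 50% accelerator efficiency is already folded into the 10^7 s nominal year):

```python
# Reproduce the real-data rows of the table above.
# events/year = rate * live seconds; total TB = events * size (MB) / 1e6.
SECONDS_PER_YEAR = 1.0e7  # nominal year, accelerator efficiency included

rows = {
    # name: (rate in Hz, event size in MB)
    "Raw Data":    (200, 1.6),
    "ESD":         (200, 0.5),
    "General ESD": (180, 0.5),
    "General AOD": (180, 0.1),
    "General TAG": (180, 0.001),
}

for name, (rate_hz, size_mb) in rows.items():
    events = rate_hz * SECONDS_PER_YEAR
    total_tb = events * size_mb / 1e6
    print(f"{name:12s} {events:.1e} events/year, {total_tb:7.1f} TB/year")
# Raw Data: 2.0e+09 events/year, 3200.0 TB/year -- matching the table.
```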
Processing times

Reconstruction
• Time/event for reconstruction now: 60 kSI2k sec
• We could recover a factor 4:
  • factor 2 from running only one default algorithm
  • factor 2 from optimization
• Foreseen reference: 15 kSI2k sec/event

Simulation
• Time/event for simulation now: 400 kSI2k sec
• We could recover a factor 4:
  • factor 2 from optimization (work already in progress)
  • factor 2 on average from the mixture of different physics processes (and rapidity ranges)
• Foreseen reference: 100 kSI2k sec/event

Number of simulated events needed: 10^8 events/year
• Generate samples about 3-6 times the size of their streamed AOD samples
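These per-event times translate into sustained CPU power once combined with the yearly event counts from the data-rates slide; a rough worked example using the foreseen reference times (installed capacity would be higher once the scheduling efficiencies quoted later are applied):

```python
# Sustained CPU = events/year * kSI2k-seconds/event / seconds/year.
SECONDS_PER_YEAR = 1.0e7

# Foreseen reference times from this slide (kSI2k * s per event)
T_RECO = 15.0   # reconstruction, after the factor-4 recovery
T_SIM  = 100.0  # simulation, after the factor-4 recovery

raw_events = 2.0e9  # real events/year (200 Hz * 1e7 s)
sim_events = 1.0e8  # simulated events/year

reco_power = raw_events * T_RECO / SECONDS_PER_YEAR  # 3000 kSI2k
sim_power  = sim_events * T_SIM  / SECONDS_PER_YEAR  # 1000 kSI2k
print(f"reconstruction: {reco_power/1000:.0f} MSI2k sustained")
print(f"simulation:     {sim_power/1000:.0f} MSI2k sustained")
```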
Production/analysis model

Central analysis
• Central production of tuples and TAG collections from the ESD
• Estimated data reduction to 10% of the full AOD
• About 720 GB/group/year; at 0.5 kSI2k per event (estimate), in quasi real time: 9 MSI2k

User analysis
• Tuples/streams analysis; new selections
• Each user will perform 1/N of the non-central MC simulation load
  • analysis of WG samples and AOD
  • private simulations
• Total requirement: 4.7 kSI2k and 1.5/1.5 TB disk/tape
• Assume this is all done at Tier-2s

DC2 will provide very useful information in this domain
Computing centers in Atlas

Tiers defined by capacity and level of service

Tier-0 (CERN)
• Hold a copy of all raw data on tape
• Copy in real time all raw data to the Tier-1s (the second copy is also useful for later reprocessing)
• Keep calibration data on disk
• Run first-pass calibration/alignment and reconstruction
• Distribute ESDs to the external Tier-1s (1/3 to each one of 6 Tier-1s)

Tier-1s (at least 6)
• Regional centers
• Keep on disk 1/3 of the ESDs and full copies of the AODs and TAGs
• Keep on tape 1/6 of the raw data
• Keep on disk 1/3 of the currently simulated ESDs and on tape 1/6 of previous versions
• Provide facilities for physics-group-controlled ESD analysis
• Calibration and/or reprocessing of real data (one per year)

Tier-2s (about 4 per Tier-1)
• Keep on disk a full copy of the TAGs and roughly one full AOD copy per four Tier-2s
• Keep on disk a small selected sample of ESDs
• Provide facilities (CPU and disk space) for user analysis and user simulation (~25 users/Tier-2)
• Run central simulation
Tier-1 Requirements

External T1: storage requirement

Data type              Disk (TB)  Tape (TB)  Fraction
General ESD (curr.)    429        150        1/3
General ESD (prev.)    214        150        1/6
AOD                    257        180        1/1
TAG                    3          2          1/1
Raw data (sample)      6          533        1/6
Raw sim                0.0        33.3       1/6
ESD Sim (curr.)        23.8       8.3        1/3
ESD Sim (prev.)        11.9       8.3        1/6
AOD Sim                14         10         1/1
TAG Sim                0          0          1/1
User data (20 groups)  171        120        1/3
Total                  1130       1195
(Numbers from R. Jones, Atlas Software Workshop, May 2004)
CPU: processing for physics groups 1760 kSI2k; reconstruction 588 kSI2k
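Most entries in this table can be reproduced from the yearly data volumes, the fractions in the last column, and a disk-usage efficiency; a sketch, assuming the 70% disk efficiency quoted on the "Tier 0/1/2 sizes" slide below (tape taken as 100% efficient):

```python
# Reproduce a few Tier-1 disk/tape numbers from total yearly volumes
# (data-rates slide) and the fractions in the table above.
# Assumption: disk figures include the 70% disk-usage efficiency quoted
# later in the talk; tape is assumed 100% efficient.
DISK_EFF = 0.70

def disk_tb(total_tb, fraction):
    return total_tb * fraction / DISK_EFF

def tape_tb(total_tb, fraction):
    return total_tb * fraction

print(disk_tb(900, 1 / 3))   # General ESD (curr.): ~429 TB disk
print(tape_tb(900, 1 / 6))   # General ESD: 150 TB tape
print(disk_tb(180, 1.0))     # AOD: ~257 TB disk
print(tape_tb(3200, 1 / 6))  # Raw data sample: ~533 TB tape
```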
Tier-2 Requirements

(Numbers from R. Jones, Atlas Software Workshop, May 2004)
External T2: storage requirement

Data type                       Disk (TB)  Tape (TB)  Fraction
General ESD (curr.)             26         0          1/50
General ESD (prev.)             0          18         1/50
AOD                             64         0          1/4
TAG                             3          0          1/1
ESD Sim (curr.)                 1.4        0          1/50
ESD Sim (prev.)                 0          1          1/50
AOD Sim                         14         10         1/1
User data (600/6/4 = 25 users)  37         26         -
Total                           146        57
CPU: simulation 21 kSI2k; reconstruction 2 kSI2k; users 176 kSI2k. Total: 199 kSI2k
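The same assumed convention reproduces the Tier-2 figures; a quick check:

```python
# Tier-2 disk figures under the same assumed 70% disk efficiency.
DISK_EFF = 0.70

print(180 * (1 / 4) / DISK_EFF)   # AOD, 1/4 copy: ~64 TB
print(900 * (1 / 50) / DISK_EFF)  # General ESD sample, 1/50: ~26 TB
print(2 / DISK_EFF)               # TAG, full copy: ~3 TB

# The CPU budget adds up directly: simulation + reconstruction + users.
print(21 + 2 + 176)               # 199 kSI2k total
```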
Tier 0/1/2 sizes

                 CERN T0+T1/2  All T1 (6)  All T2 (24)  Total
Auto tape (PB)   4.4           7.2         1.4          12.9
Shelf tape (PB)  3.2           0.0         0.0          3.2
Disk (PB)        1.9           6.8         3.5          12.2
CPU (MSI2k)      4.8           14.2        4.8          23.8
Efficiencies (LCG numbers, Atlas s/w workshop May 2004, R. Jones):
• Scheduled CPU activity: 85% efficient
• Chaotic CPU activity: 60% efficient
• Disk usage: 70% efficient
• Tape assumed 100% efficient
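These factors convert a required capacity into the installed capacity a site must provide; a minimal illustration (the 600 kSI2k input is a hypothetical number, not from the slides):

```python
# Installed capacity = required capacity / efficiency (LCG convention above).
EFF = {"scheduled_cpu": 0.85, "chaotic_cpu": 0.60, "disk": 0.70, "tape": 1.00}

def installed(required, kind):
    return required / EFF[kind]

# A hypothetical 600 kSI2k of chaotic analysis needs 1000 kSI2k installed,
# and 180 TB of AOD needs ~257 TB of installed disk.
print(installed(600, "chaotic_cpu"))  # 1000.0
print(installed(180, "disk"))         # ~257
```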
Atlas Computing System

[Diagram (from R. Jones, Atlas Software Workshop): data flow through the Atlas computing system.]
• Detector → Event Builder at ~PB/sec; Event Builder → Event Filter (~159 kSI2k) at 10 GB/sec; Event Filter → Tier-0 at 450 Mb/sec
• Tier-0 (~5 MSI2k) → Tier-1 regional centres (RAL for the UK, plus US, French and Italian centres) at ~300 MB/s per Tier-1 per experiment
• Tier-1: ~7.7 MSI2k/T1, ~2 PB/year/T1, ~9 PB/year/T1; no simulation at Tier-1
• Tier-1 → Tier-2 centres (~200 kSI2k, ~200 TB/year each) over 622 Mb/s links; Italian Tier-2s: RM1, MI, NA, LNF
• Tier-2s do the bulk of simulation; each Tier-2 has ~25 physicists working on one or more channels and should hold the full AOD, TAG and relevant physics-group summary data
• Physics data cache serving workstations/desktops at 100-1000 MB/s
• Some data for calibration and monitoring go to the institutes; calibrations flow back
• Scale: one PC (2004) = ~1 kSpecInt2k
Atlas computing in 2004

"Collaboration" activities
• Data Challenge 2
  • May-August 2004
  • Real test of the computing model for the Computing TDR (end 2004)
  • Simulation, reconstruction, analysis & calibration
• Combined test-beam activities
  • Combined test-beam operation concurrent with DC2 and using the same tools

"Local" activities
• Single muon simulation (Rome1, Naples)
• Tau studies (Milan)
• Higgs production (LNF)
• Other ad-hoc productions
Goals in 2004

DC2/test-beam
• Computing model studies
• Pile-up digitization in Athena
• Deployment of the complete Event Data Model and the Detector Description
• Simulation of the full Atlas detector and of the 2004 Combined Test Beam
• Test of the calibration and alignment procedures
• Full use of Geant4, POOL and other LCG applications
• Wide use of the GRID middleware and tools
• Large-scale physics analysis
• Run as much of the production as possible on the GRID
  • Test the integration of multiple GRIDs

"Local" activities
• Run local, ad-hoc productions using the LCG tools
DC2 timescale

• September 03: Release 7
  • Put in place, understand & validate: Geant4, POOL and the LCG applications; the Event Data Model; digitization, pile-up and byte-stream; conversion of DC1 data to POOL; large-scale persistency tests and reconstruction
• Mid-November 03: pre-production release
  • Testing and validation; run test-production
• March 17th 04: Release 8 (production)
  • Continuous testing of s/w components; improvements to the Distribution/Validation Kit
  • Start final validation; intensive test of the "Production System"
• May 17th 04: event generation ready
• June 23rd 04: simulation ready (data preparation and data transfer follow)
• July 15th 04: reconstruction ready
• August 1st 04: Tier-0 exercise, then physics and computing model studies, analysis (distributed), reprocessing, alignment & calibration
(Slide from Gilbert Poulard)
DC2 resources

Process                       Events  Months  CPU (kSI2k)  Data (TB)  At CERN (TB)  Off-site (TB)
Phase I (May-June-July)
  Simulation                  10^7    2       1000         20         4             16
  RDO                         10^7    2       100          20         4             16
  Pile-up / digitization      10^7    2       100          30         30            24
  Event mixing & byte-stream  10^7    2       (small)      20         20            0
  Total Phase I               10^7    2       1200         90         58            56
Phase II (> July)
  Reconstruction Tier-0       10^7    0.5     600          5          5             10
  Reconstruction Tier-1       10^7    2       600          5          0             5
Total                         10^7    -       -            100        63            71
Tiers in DC2

Country         "Tier-1"  Sites  Grid       kSI2k (ATLAS DC)
Australia       -         -      NG         12
Austria         -         -      LCG        7
Canada          TRIUMF    7      LCG        331
CERN            CERN      1      LCG        700
China           -         -      LCG        30
Czech Republic  -         -      LCG        25
France          CCIN2P3   1      LCG        ~140
Germany         GridKa    3      LCG        90
Greece          -         -      LCG        10
Israel          -         -      LCG        23
Italy           CNAF      5      LCG        200
Japan           Tokyo     1      LCG        127
Netherlands     NIKHEF    1      LCG        75
NorduGrid       NG        30     NG         380
Poland          -         -      LCG        80
Russia          -         -      LCG        ~70
Slovakia        -         -      LCG        -
Slovenia        -         -      NG         -
Spain           PIC       4      LCG        50
Switzerland     -         -      LCG        18
Taiwan          ASTW      1      LCG        78
UK              RAL       8      LCG        ~1000
US              BNL       28     Grid3/LCG  ~1000
More than 23 countries involved
DC2 tools

Installation tools
• Atlas software distribution kit
• Validation suite

Production system
• Atlas production system interfaced to LCG, US Grid, NorduGrid and legacy systems (batch systems)
• Tools:
  • Production management
  • Data management
  • Cataloguing
  • Bookkeeping
  • Job submission

GRID distributed analysis
• ARDA domain: test services and implementations
Software installation

Software installation and configuration via PACMAN
• Full use of the Atlas Code Management Tool (CMT)
• Relocatable, multi-release distribution
• No root privileges needed to install
• GRID-enabled installation: grid installation via submission of a job to the destination sites
• Software validation tools, integrated with the GRID installation procedure: a site is marked as validated once the installed software passes the validation tools

Distribution format
• Pacman packages (tarballs)

Kit creation
• Building scripts (Deployment package)
• Built in about 3 hours, once the release is built

Kit requirements
• RedHat 7.3
• >= 512 MB of RAM
• Approx. 4 GB of disk space, plus 2 GB during the installation phase, for a full installation of a single release

Kit installation
  pacman -get http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/pacman/cache:7.5.0/AtlasRelease

Documentation (building, installing and using)
  http://atlas.web.cern.ch/Atlas/GROUPS/SOFTWARE/OO/sit/Distribution
Atlas Production System components

Production database
• Oracle based
• Holds the definitions of the job transformations
• Holds data on the jobs' life cycle

Supervisor (Windmill)
• Consumes jobs from the production database
• Dispatches the work to the executors
• Collects info on the job life cycle
• Interacts with the DMS for data registration and movement among the systems

Executor
• One for each grid flavour and legacy system:
  • LCG (Lexor)
  • NorduGrid (Dulcinea)
  • US Grid (Capone)
  • LSF
• Communicates with the supervisor
• Executes the jobs on the specific subsystems:
  • Flavour-neutral job definitions are specialized for the specific needs
  • Submits to the GRID/legacy system
  • Provides access to GRID-flavour-specific tools

Data Management System (Don Quijote)
• Global cataloguing system
• Allows global data management
• Common interface on top of the system-specific facilities
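To make the division of labour concrete, here is a minimal Python sketch of the supervisor/executor pattern described above; all class and method names are invented for illustration, and the real system exchanges XML messages over Jabber rather than calling objects directly:

```python
# Illustrative sketch of the supervisor/executor pattern described above.
# Names are hypothetical; the real system exchanges XML messages via Jabber.

class Executor:
    """Base class: one subclass per grid flavour or legacy system."""
    def specialize(self, job):
        # Turn a flavour-neutral job definition into a system-specific one.
        raise NotImplementedError
    def submit(self, job):
        raise NotImplementedError

class LCGExecutor(Executor):          # "Lexor" plays this role for LCG
    def specialize(self, job):
        return {**job, "backend": "LCG", "jdl": f"Executable = {job['transformation']};"}
    def submit(self, job):
        print(f"submitting job {job['id']} to the LCG resource broker")

class LSFExecutor(Executor):          # legacy batch system
    def specialize(self, job):
        return {**job, "backend": "LSF", "queue": "atlas"}
    def submit(self, job):
        print(f"bsub job {job['id']} on queue {job['queue']}")

class Supervisor:
    """Consumes jobs from the production DB and dispatches them."""
    def __init__(self, production_db, executors):
        self.db, self.executors = production_db, executors
    def run_once(self):
        for job in self.db:                      # flavour-neutral definitions
            executor = self.executors[job["backend_hint"]]
            executor.submit(executor.specialize(job))

db = [{"id": 1, "transformation": "dc2.simul", "backend_hint": "LCG"},
      {"id": 2, "transformation": "dc2.recon", "backend_hint": "LSF"}]
Supervisor(db, {"LCG": LCGExecutor(), "LSF": LSFExecutor()}).run_once()
```

The point of the pattern is that job definitions stay flavour-neutral in the production database; only the executor knows how to specialize and submit them to its own system.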
Atlas Production System architecture

[Diagram: the production database holds task and partition transformation definitions (task = [job]*, dataset = [partition]*), plus transformation info such as release version and physics signature. Supervisors (1-4) read flavour-neutral job descriptions, with location hints at task and job level, and talk to their executors via Jabber. Each executor (US Grid, LCG, NG, LSF) submits through its own infrastructure (resource brokers for LCG, Chimera on the US Grid side), returning job run info. The Data Management System moves and registers data; human intervention is possible at the supervisor level.]
DC2 status

DC2 first phase started May 3rd
• Test of the production system
• Start of the event generation/simulation tests

Full production should start next week
• Full use of the 3 GRIDs and the legacy systems

DC2 jobs will be monitored via GridICE and an ad-hoc monitoring system, interfaced to the production DB and the production systems
Atlas Computing & INFN (1)

Coordinators & managers
• D. Barberis (Genoa): initially a member of the Computing Steering Group as Inner Detector software coordinator, now ATLAS Computing Coordinator
• G. Cataldi (Lecce): new coordinator of Moore, the OO muon reconstruction program
• S. Falciano (Roma1): responsible for TDAQ/LVL2
• A. Farilla (Roma3): initially responsible for Moore and scientific secretary of SCASI, now Muon Reconstruction Coordinator and software coordinator for the Combined Test Beam
• L. Luminari (Roma1): INFN representative in the ICB and contact person for computing-model activities in Italy
• A. Nisati (Roma1): representing the LVL1 simulation, and Chair of the TDAQ Institute Board
• L. Perini (Milan): chair, ATLAS Grid Co-convener, ATLAS representative in various LCG and EGEE bodies
• G. Polesello (Pavia): ATLAS Physics Coordinator
• A. Rimoldi (Pavia): ATLAS Simulation Coordinator and member of the Software Project Management Board
• V. Vercesi (Pavia): PESA Coordinator and member of the Computing Management Board
Atlas Computing & INFN (2)

Atlas INFN sites LCG-compliant for DC2
• Tier-1
  • CNAF (G. Negri)
• Tier-2
  • Frascati (M. Ferrer)
  • Milan (L. Perini, D. Rebatto, S. Resconi, L. Vaccarossa)
  • Naples (G. Carlino, A. Doria, L. Merola)
  • Rome1 (A. De Salvo, A. Di Mattia, L. Luminari)

Activities
• Development of the LCG interface to the Atlas Production Tool (F. Conventi, A. De Salvo, A. Doria, D. Rebatto, G. Negri, L. Vaccarossa)
• Participation in DC2 using the GRID middleware (May-July 2004)
• Local productions with GRID tools
• Atlas VO management (A. De Salvo)
• Atlas code distribution (A. De Salvo)
  • Atlas code distribution model (PACMAN based) fully deployed
  • The current installation system/procedure easily allows the Atlas software to coexist with other experiments' environments
• Atlas distribution kit validation (A. De Salvo)
• Transformations for DC2 (A. De Salvo)
Conclusions

The first real test of the Atlas computing model is starting
• DC2 tests started at the beginning of May
• "Real" production starts in June
• It will give important information for the Computing TDR

Very intensive use of the GRIDs
• Atlas Production System interfaced to LCG, NG and US Grid (Grid3)
• Global data management system

Getting closer to the real experiment computing model