Grid Computing at LHC and ATLAS Data Challenges
IMFP-2006, El Escorial, Madrid, Spain.
April 4, 2006
Gilbert Poulard (CERN PH-ATC)
Overview
- Introduction
- LHC experiments: computing challenges
- WLCG: Worldwide LHC Computing Grid
- ATLAS experiment
  o Building the Computing System
- Conclusions
Introduction: LHC/CERN

[Aerial photo: the LHC ring at CERN near Geneva, with Mont Blanc (4810 m) in the background]
LHC Computing Challenges
- Large distributed community
- Large data volume … and access to it for everyone
- Large CPU capacity
Challenge 1: Large, distributed community
- The LHC experiments (ATLAS, CMS, LHCb, ALICE) involve ~5000 physicists around the world, around the clock
- “Offline” software effort: 1000 person-years per experiment
- Software life span: 20 years
Large data volume

             Rate [Hz]   RAW [MB]   ESD/rDST/RECO [MB]   AOD [kB]   Monte Carlo [MB/evt]   Monte Carlo [% of real]
ALICE HI        100        12.5            2.5              250              300                    100
ALICE pp        100         1              0.04               4                0.4                  100
ATLAS           200         1.6            0.5              100                2                     20
CMS             150         1.5            0.25              50                2                    100
LHCb           2000         0.025          0.025              0.5              -                     20

50 days of running in 2007; 10^7 seconds/year of pp running from 2008 on, i.e. ~2 × 10^9 events per experiment; 10^6 seconds/year of heavy-ion running.
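The table translates directly into yearly volumes. As a quick cross-check, a minimal sketch (Python; illustrative only, assuming the nominal 10^7 live seconds of pp running) reproduces the often-quoted ~3.2 PB/year of ATLAS RAW data:

```python
# Yearly RAW data volume per experiment from the table above:
# rate [Hz] x RAW event size [MB] x 10^7 live seconds of pp running.

LIVE_SECONDS_PP = 1e7

experiments = {          # name: (rate [Hz], RAW event size [MB])
    "ATLAS": (200, 1.6),
    "CMS":   (150, 1.5),
    "LHCb":  (2000, 0.025),
}

for name, (rate_hz, raw_mb) in experiments.items():
    events = rate_hz * LIVE_SECONDS_PP
    volume_pb = events * raw_mb / 1e9    # MB -> PB
    print(f"{name}: {events:.1e} events/year, ~{volume_pb:.2f} PB RAW/year")

# ATLAS: 2.0e+09 events/year, ~3.20 PB RAW/year
```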
Large CPU capacity

                          CPU (MSI2k)   Disk (PB)   Tape (PB)
Tier-0                         4.1         0.4         5.7
CERN Analysis Facility         2.7         1.9         0.5
Sum of Tier-1s                24.0        14.4         9.0
Sum of Tier-2s                19.9         8.7         0.0
Total                         50.7        25.4        15.2

~50,000 of today's CPUs.

ATLAS resources in 2008:
o Assume 2 × 10^9 events per year (1.6 MB per event)
o First-pass reconstruction will run at the CERN Tier-0
o Re-processing will be done at the Tier-1s (Regional Computing Centres, ~10)
o Monte Carlo simulation will be done at the Tier-2s (e.g. physics institutes, ~30): full simulation of ~20% of the data rate
o Analysis will be done at Analysis Facilities, Tier-2s, Tier-3s, …
CPU Requirements

[Chart: CPU requirements by year (2007-2010), broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by site category (CERN, Tier-1, Tier-2); 58% of the requirement is pledged]
Disk Requirements

[Chart: disk requirements in PB by year (2007-2010), broken down by experiment (ALICE, ATLAS, CMS, LHCb) and by site category (CERN, Tier-1, Tier-2); 54% of the requirement is pledged]
Tape Requirements

[Chart: tape requirements in PB by year (2007-2010), broken down by experiment (ALICE, ATLAS, CMS, LHCb) at CERN and the Tier-1s; 75% of the requirement is pledged]
LHC Computing Challenges
- Large distributed community
- Large data volume … and access to it for everyone
- Large CPU capacity

How to face the problems?
- CERN Computing Review (2000-2001): the “Grid” is the chosen solution
- “Build” the LCG (LHC Computing Grid) project
- Roadmap for the LCG project, and for the experiments
- In 2005 LCG became WLCG
What is the Grid?
- The World Wide Web provides seamless access to information that is stored in many millions of different geographical locations.
- The Grid is an emerging infrastructure that provides seamless access to computing power and data storage capacity distributed over the globe:
o Global resource sharing
o Secure access
o Resource use optimization
o The “death of distance”: networking
o Open standards
The Worldwide LHC Computing Grid Project - WLCG
- Collaboration
o LHC experiments
o Grid projects: Europe, US
o Regional & national centres
- Choices
o Adopt Grid technology
o Go for a “Tier” hierarchy
- Goal
o Prepare and deploy the computing environment to help the experiments analyse the data from the LHC detectors
[Diagram: the Tier hierarchy - CERN Tier-0; Tier-1 centres in Germany, USA, UK, France, Italy, Taipei, SARA (NL) and Spain; Tier-2 centres; grids for regional groups (labs and universities); Tier-3 physics department resources; desktops; grids for physics study groups]
The Worldwide LCG Collaboration
- Members
o The experiments
o The computing centres: Tier-0, Tier-1, Tier-2
- Memorandum of Understanding
o Resources, services and service levels defined
o Resource commitments pledged for the next year, with a 5-year forward look
WLCG services: built on two major science grid infrastructures
- EGEE: Enabling Grids for E-SciencE
- OSG: US Open Science Grid
Enabling Grids for E-SciencE
- EU-supported project
- Develop and operate a multi-science grid
- Assist scientific communities to embrace grid technology
- First phase concentrated on operations and technology
- Second phase (2006-08) puts the emphasis on extending the scientific, geographical and industrial scope: a world-wide Grid infrastructure; the international collaboration in phase 2 will have > 90 partners in 32 countries
Open Science Grid
- Multi-disciplinary consortium
o Running physics experiments: CDF, D0, LIGO, SDSS, STAR
o US LHC collaborations
o Biology, computational chemistry
o Computer science research
o Condor and Globus
o DOE laboratory computing divisions
o University IT facilities
- OSG today
o 50 Compute Elements
o 6 Storage Elements
o VDT 1.3.9
o 23 VOs
Architecture - Grid services
- Storage Element
o Mass Storage System (MSS): CASTOR, Enstore, HPSS, dCache, etc.
o Storage Resource Manager (SRM) provides a common way to access the MSS, independent of the implementation
o File Transfer Services (FTS), provided e.g. by GridFTP or srmCopy
- Computing Element
o Interface to the local batch system, e.g. the Globus gatekeeper
o Accounting, status query, job monitoring
- Virtual Organization Management
o Virtual Organization Management Services (VOMS)
o Authentication and authorization based on the VOMS model
- Grid Catalogue Services
o Mapping of Globally Unique Identifiers (GUIDs) to local file names
o Hierarchical namespace, access control
- Interoperability
o EGEE and OSG both use the Virtual Data Toolkit (VDT)
o Different implementations are hidden by common interfaces
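The last point (different implementations hidden behind common interfaces) is the heart of the architecture. A minimal sketch of the idea in Python; class and method names are hypothetical, not any real middleware API. A grid catalogue maps GUIDs to replicas, and every storage flavour is driven through the same StorageElement interface:

```python
from abc import ABC, abstractmethod

class StorageElement(ABC):
    """Common interface; CASTOR, dCache, etc. differ only in the subclass."""
    @abstractmethod
    def copy_in(self, local_path: str, storage_path: str) -> None: ...
    @abstractmethod
    def copy_out(self, storage_path: str, local_path: str) -> None: ...

class CastorSE(StorageElement):
    def copy_in(self, local_path, storage_path):
        print(f"castor: staging {local_path} -> {storage_path}")
    def copy_out(self, storage_path, local_path):
        print(f"castor: recalling {storage_path} -> {local_path}")

class DCacheSE(StorageElement):
    def copy_in(self, local_path, storage_path):
        print(f"dcache: writing {local_path} -> {storage_path}")
    def copy_out(self, storage_path, local_path):
        print(f"dcache: reading {storage_path} -> {local_path}")

# Grid catalogue: GUID -> replicas (site, storage_path). Clients never
# hard-code a site or an MSS implementation.
catalogue: dict[str, list[tuple[str, str]]] = {
    "guid-1234": [("CERN", "/castor/raw/run1.dat"),
                  ("FZK",  "/pnfs/raw/run1.dat")],
}
sites: dict[str, StorageElement] = {"CERN": CastorSE(), "FZK": DCacheSE()}

def fetch(guid: str, local_path: str) -> None:
    site, path = catalogue[guid][0]      # pick the first replica
    sites[site].copy_out(path, local_path)

fetch("guid-1234", "/tmp/run1.dat")
```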
Technology - Middleware
- Currently, the LCG-2 middleware is deployed in more than 100 sites.
- It originated from Condor, EDG, Globus, VDT and other projects.
- It will now evolve to include functionality of the gLite middleware provided by the EGEE project, which has just been made available.
- Site services include security, the Computing Element (CE), the Storage Element (SE), and monitoring and accounting services, currently available both from LCG-2 and gLite.
- VO services such as the Workload Management System (WMS), file catalogues, information services and file transfer services exist in both flavours (LCG-2 and gLite), maintaining close relations with VDT, Condor and Globus.
Technology - Fabric
- Moore's law still holds for processors and disk storage
o For CPUs and disks we count a lot on the evolution of the consumer market
o For processors we expect an increasing importance of 64-bit architectures and multicore chips
- Mass storage (tapes and robots) is still a computer-centre item with computer-centre pricing
o It is too early to conclude on new tape drives and robots
- Networking has seen a rapid evolution recently
o Ten-gigabit Ethernet is now in the production environment
o Wide-area networking can already count on 10 Gb/s connections between the Tier-0 and the Tier-1s; this will move gradually to the Tier-1 to Tier-2 connections
Common Physics Applications
- Core software libraries
o SEAL-ROOT merger
o Scripting: CINT, Python
o Mathematical libraries
o Fitting, MINUIT (in C++)
- Data management
o POOL: ROOT I/O for bulk data, RDBMS for metadata
o Conditions database: COOL
- Event simulation
o Event generators: generator library (GENSER)
o Detector simulation: GEANT4 (ATLAS, CMS, LHCb)
o Physics validation: compare GEANT4, FLUKA, test beam
- Software development infrastructure
o External libraries
o Software development and documentation tools
o Quality assurance and testing
o Project portal: Savannah

[Diagram: the layered applications-software stack - experiment frameworks (simulation, reconstruction and analysis programs; event, detector and calibration algorithms) sitting on common components: core (plugin manager, dictionary, interpreter, math libraries, I/O, GUI, 2D/3D graphics, geometry, histograms, fitters, NTuple, foundation utilities), simulation (engines, generators), data management (persistency, file catalog, database, collections, conditions) and distributed analysis (batch, interactive), with OS bindings below]
The Hierarchical Model
- Tier-0 at CERN
o Record RAW data (1.25 GB/s ALICE; 320 MB/s ATLAS)
o Distribute a second copy to the Tier-1s
o Calibrate and do first-pass reconstruction
- Tier-1 centres (11 defined)
o Manage permanent storage: RAW, simulated, processed
o Capacity for reprocessing and bulk analysis
- Tier-2 centres (>~100 identified)
o Monte Carlo event simulation
o End-user analysis
- Tier-3
o Facilities at universities and laboratories
o Access to data and processing in Tier-2s and Tier-1s
o Outside the scope of the project
Tier-1s

Tier-1 Centre                       Experiments served with priority
                                    ALICE   ATLAS   CMS   LHCb
TRIUMF, Canada                                X
GridKA, Germany                       X       X      X      X
CC-IN2P3, France                      X       X      X      X
CNAF, Italy                           X       X      X      X
SARA/NIKHEF, NL                       X       X             X
Nordic Data Grid Facility (NDGF)      X       X      X
ASCC, Taipei                                  X      X
RAL, UK                               X       X      X      X
BNL, US                                       X
FNAL, US                                             X
PIC, Spain                                    X      X      X
Tier-2s
~100 identified; the number is still growing
Tier-0 / Tier-1 / Tier-2 Connectivity
- Tier-2s and Tier-1s are inter-connected by the general-purpose research networks
- Any Tier-2 may access data at any Tier-1

[Diagram: the Tier-1s (IN2P3, TRIUMF, ASCC, FNAL, BNL, Nordic, CNAF, SARA, PIC, RAL, GridKa) in a ring with Tier-2s attached. National Research and Education Networks (NRENs) at the Tier-1s: ASnet, LHCnet/ESnet, GARR, RENATER, DFN, SURFnet6, NORDUnet, RedIRIS, UKERNA, CANARIE]
Prototypes
- It is important that the hardware and software systems developed in the framework of LCG be exercised in more and more demanding challenges.
- Data Challenges were recommended by the 'Hoffmann Review' of 2001. Though the main goal was to validate the distributed computing model and to gradually build the computing systems, the results have also been used for physics performance studies and for detector, trigger and DAQ design. Limitations of the Grids have been identified and are being addressed.
o A series of Data Challenges has been run by the 4 experiments.
- Presently, a series of Service Challenges aims at realistic end-to-end testing of experiment use-cases over an extended period, leading to stable production services.
- The project 'A Realisation of Distributed Analysis for LHC' (ARDA) is developing end-to-end prototypes of distributed analysis systems using the EGEE middleware gLite for each of the LHC experiments.
Service Challenges
- Purpose
o Understand what it takes to operate a real grid service, run for days/weeks at a time (not just limited to experiment Data Challenges)
o Trigger and verify Tier-1 & large Tier-2 planning and deployment, tested with realistic usage patterns
o Get the essential grid services ramped up to target levels of reliability, availability, scalability and end-to-end performance
- Four progressive steps from October 2004 through September 2006
o End 2004: SC1 - data transfer to a subset of Tier-1s
o Spring 2005: SC2 - include mass storage, all Tier-1s, some Tier-2s
o 2nd half 2005: SC3 - Tier-1s, >20 Tier-2s; first set of baseline services
o Jun-Sep 2006: SC4 - pilot service
Key dates for Service Preparation
- Sep 2005: SC3 service phase. Reliable base service; most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 1 GB/s, including 500 MB/s to mass storage (150 MB/s & 60 MB/s at Tier-1s)
- Jun 2006: SC4 service phase. All Tier-1s, major Tier-2s; capable of supporting the full experiment software chain, including analysis; sustain nominal final grid data throughput (~1.5 GB/s to mass storage)
- Sep 2006: initial LHC service in stable operation; ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput
- Apr 2007: LHC service commissioned
- Timeline context: cosmics, then first beams and first physics in 2007, full physics run in 2008
ARDA: A Realisation of Distributed Analysis for LHC
- Distributed analysis on the Grid is the most difficult and least defined topic.
- ARDA sets out to develop end-to-end analysis prototypes using the LCG-supported middleware.
- ALICE uses the AliROOT framework based on PROOF.
- ATLAS has used DIAL services with the gLite prototype as backend; this is rapidly evolving.
- CMS has prototyped the 'ARDA Support for CMS Analysis Processing' (ASAP), which is used by several CMS physicists for daily analysis work.
- LHCb has based its prototype on GANGA, a common project between ATLAS and LHCb.
Production Grids: what has been achieved
- Basic middleware
- A set of baseline services agreed and initial versions in production
- All major LCG sites active
- 1 GB/s distribution data rate mass storage to mass storage, > 50% of the nominal LHC data rate
- Grid job failure rate 5-10% for most experiments, down from ~30% in 2004
- Sustained 10K jobs per day; > 10K simultaneous jobs during prolonged periods

[Chart: average number of jobs/day on the EGEE Grid, January-November 2005]
Summary on WLCG
- Two grid infrastructures are now in operation, on which we are able to complete the computing services for LHC.
- Reliability and performance have improved significantly over the past year.
- The focus of Service Challenge 4 is to demonstrate a basic but reliable service that can be scaled up by April 2007 to the capacity and performance needed for the first beams.
- Development of new functionality and services must continue, but we must be careful that this does not interfere with the main priority for this year: reliable operation of the baseline services.

From Les Robertson (CHEP'06)
ATLAS - A Toroidal LHC ApparatuS
- Detector for the study of high-energy proton-proton collisions.
- The offline computing will have to deal with an output event rate of 200 Hz, i.e. 2 × 10^9 events per year with an average event size of 1.6 MB.
- Researchers are spread all over the world.
- ATLAS: ~2000 collaborators, ~150 institutes, 34 countries.
- Diameter 25 m; barrel toroid length 26 m; end-cap end-wall chamber span 46 m; overall weight 7000 tons.
The Computing Model

[Diagram: ATLAS computing model - detector (~PB/s) → Event Builder → Event Filter (~159 kSI2k, 10 GB/s) → Tier-0 (~5 MSI2k), writing RAW at 320 MB/s; ~300 MB/s per Tier-1 per experiment to the regional centres (e.g. RAL (UK), PIC (Spain), US and Italian centres; ~7.7 MSI2k and ~2 PB/year each; no simulation); Tier-2 centres (~200 kSI2k, ~200 TB/year each, 622 Mb/s links; e.g. a Northern Tier, or Sheffield/Manchester/Liverpool/Lancaster at ~0.25 TIPS); some data for calibration and monitoring go to the institutes and calibrations flow back; workstations and physics data caches at the end. PC (2004) = ~1 kSpecInt2k]

- Each Tier-2 has ~25 physicists working on one or more channels
- Each Tier-2 should have the full AOD, TAG & relevant physics group summary data
- Tier-2s do the bulk of the simulation
ATLAS Data Challenges (1)
LHC Computing Review (2001):
“Experiments should carry out Data Challenges of increasing size and complexity to validate their Computing Model, their complete software suite and their Data Model, and to ensure the correctness of the technical choices to be made.”
ATLAS Data Challenges (2): DC1 (2002-2003)
- First ATLAS exercise on a world-wide scale: O(1000) CPUs at peak
- Put in place the full software chain: simulation of the data; digitization; pile-up; reconstruction
- Production system: tools for bookkeeping of data and jobs (~AMI), monitoring, code distribution
- “Preliminary” Grid usage
o NorduGrid: all production performed on the Grid
o US: Grid used at the end of the exercise
o LCG-EDG: some testing during the Data Challenge, but no “real” production
- At least one person per contributing site; many people involved
- Lessons learned
o Management of failures is a key concern
o Automate to cope with the large number of jobs
- “Built” the ATLAS DC community
- Physics: Monte Carlo data needed for the ATLAS High Level Trigger Technical Design Report
ATLAS Data Challenges (3): DC2 (2004)
- Similar exercise to DC1 (scale; physics processes), BUT
- Introduced the new ATLAS Production System (ProdSys)
o Unsupervised production across many sites spread over three different Grids (US Grid3; ARC/NorduGrid; LCG-2)
o Based on DC1 experience with AtCom and GRAT: a core engine with plug-ins
o 4 major components: production supervisor; executor; common data management system; common production database
o Use middleware components as much as possible: avoid inventing ATLAS's own version of the Grid; use the middleware broker, catalogs, information system, …
- Immediately followed by the “Rome” production (2005): production of simulated data for an ATLAS physics workshop in Rome in June 2005, using the DC2 infrastructure
ATLAS Production System
- ATLAS uses 3 Grids
o LCG (= EGEE)
o ARC/NorduGrid (evolved from EDG)
o OSG/Grid3 (US)
- Plus the possibility of local batch submission (4 interfaces)
- Input and output must be accessible from all Grids
- The system makes use of the native Grid middleware as much as possible (e.g. Grid catalogs); it does not “re-invent” its own solution.
ATLAS Production System
In order to handle the task of the ATLAS Data Challenges, an automated production system was developed. It consists of 4 components (interaction sketched below):
- The production database, which contains abstract job definitions
- A supervisor (Windmill; Eowyn) that reads the production database for job definitions and presents them to the different Grid executors in an easy-to-parse XML format
- The executors, one for each Grid flavor, that receive the job definitions in XML format and convert them to the job description language of that particular Grid
- DonQuijote (DQ), the ATLAS Data Management System, which moves files from their temporary output locations to their final destination on some Storage Element and registers the files in the Replica Location Service of that Grid
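As a sketch of how these pieces interact (hypothetical names and formats; the real Windmill/executor protocol and schemas are richer), the supervisor serializes an abstract job definition to XML and each Grid-specific executor translates the same job into its own job description language:

```python
import xml.etree.ElementTree as ET

# Abstract job definition, as the supervisor might serialize it from
# the production database (illustrative format, not the real schema).
job_xml = """
<job id="42">
  <transformation>AtlasG4_trf</transformation>
  <input>evgen.000042.pool.root</input>
  <output>simul.000042.pool.root</output>
</job>
"""

def lcg_executor(job: ET.Element) -> str:
    """Translate the abstract job into LCG/EDG-style JDL."""
    return (f'Executable = "{job.find("transformation").text}";\n'
            f'InputData  = "{job.find("input").text}";\n'
            f'OutputData = "{job.find("output").text}";')

def nordugrid_executor(job: ET.Element) -> str:
    """Translate the same job into ARC-style xRSL."""
    return (f'&(executable="{job.find("transformation").text}")'
            f'(inputfiles=("{job.find("input").text}"))'
            f'(outputfiles=("{job.find("output").text}"))')

job = ET.fromstring(job_xml)
for executor in (lcg_executor, nordugrid_executor):
    print(executor(job), "\n")
```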
The 3 Grid flavors: LCG-2
- The number of sites and the resources are evolving quickly

[Map: LCG-2 sites, ATLAS DC2, Autumn 2004]
The 3 Grid flavors: Grid3
- The deployed infrastructure has been in operation since November 2003
- At this moment it is running 3 HEP and 2 biological applications
- Over 100 users are authorized to run in Grid3
- Sep 04: 30 sites, multi-VO, shared resources, ~3000 CPUs (shared)

[Map: Grid3 sites, ATLAS DC2, Autumn 2004]
The 3 Grid flavors: NorduGrid
- NorduGrid is a research collaboration established mainly across the Nordic countries, but it includes sites from other countries.
- It contributed a significant part of DC1 (using the Grid in 2002).
- It supports production on several operating systems.
- > 10 countries, 40+ sites, ~4000 CPUs, ~30 TB storage

[Map: NorduGrid sites, ATLAS DC2, Autumn 2004]
Production phases

[Diagram: the production chain - event generation (Pythia) produces HepMC events; Geant4 detector simulation produces hits + MC truth; digitization (with pile-up from minimum-bias events) produces digits (RDO); event mixing produces bytestream raw data; reconstruction produces ESD and AOD. Persistency: Athena-POOL]

Volume of data for 10^7 events: ~5 TB, 20 TB, 30 TB, 20 TB and 5 TB across the successive phases.
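A toy sketch of the chain in the diagram (Python; the function names are illustrative, not the real Athena transforms), useful to see why each phase's output is the next phase's input:

```python
# Toy model of the production chain above; each phase consumes
# the previous phase's output. Names are illustrative only.

def generate(n_events):            # Pythia -> HepMC events
    return [{"id": i, "hepmc": f"evt{i}"} for i in range(n_events)]

def simulate(events):              # Geant4 -> hits + MC truth
    return [{**e, "hits": f"hits{e['id']}"} for e in events]

def digitize(simulated, pileup=0): # -> digits (RDO), optionally with pile-up
    return [{**e, "rdo": f"rdo{e['id']}", "pileup": pileup} for e in simulated]

def mix_to_bytestream(digits):     # event mixing -> bytestream raw data
    return [{**e, "raw": f"bs{e['id']}"} for e in digits]

def reconstruct(raw):              # -> ESD and AOD
    return [{**e, "esd": f"esd{e['id']}", "aod": f"aod{e['id']}"} for e in raw]

aods = reconstruct(mix_to_bytestream(digitize(simulate(generate(3)))))
print(aods[0])
```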
ATLAS productions
- DC2
o Few datasets
o Different types of jobs:
• Physics event generation: very short
• Geant simulation (Geant3 in DC1; Geant4 in DC2 & “Rome”): long, more than 10 hours
• Digitization: medium, ~5 hours
• Reconstruction: short
o All types of jobs run sequentially, each phase one after the other
- “Rome”
o Many different (>170) datasets: different physics channels
o Same types of jobs (event generation, simulation, etc.)
o All types of jobs run in parallel
- Now “continuous” production
o Goal is to reach 2M events per week
- The different type of running has a large impact on the production rate
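Taken together with the job lengths above, the 2M events/week target gives a rough feel for the scale. A back-of-the-envelope sketch (Python; the 50 events per simulation job is a hypothetical figure chosen for illustration, not a number from the talk):

```python
# Rough capacity estimate for "continuous" production.
# ASSUMPTION: ~50 events per simulation job (hypothetical);
# the >10 h simulation job length is from the slide above.

events_per_week = 2_000_000
events_per_job = 50            # hypothetical
sim_hours_per_job = 10         # "long: more than 10 hours"

jobs_per_week = events_per_week / events_per_job
cpu_hours = jobs_per_week * sim_hours_per_job
cpus_needed = cpu_hours / (7 * 24)   # fully-loaded CPUs, simulation only

print(f"{jobs_per_week:.0f} jobs/week -> ~{cpus_needed:.0f} CPUs busy on simulation")
# 40000 jobs/week -> ~2381 CPUs busy on simulation
```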
ATLAS productions: countries (sites)

Countries, with (DC2 sites) (“Rome” sites): Australia (1) (0); Austria (1); Canada (4) (3); CERN (1); Czech Republic (2); Denmark (4) (3); France (1) (4); Germany (1+2); Greece (0) (1); Hungary (0) (1); Italy (7) (17); Japan (1) (0); Netherlands (1) (2); Norway (3) (2); Poland (1); Portugal (0) (1); Russia (0) (2); Slovakia (0) (1); Slovenia (1); Spain (3); Sweden (7) (5); Switzerland (1) (1+1); Taiwan (1); UK (7) (8); USA (19)

- DC2: 20 countries, 69 sites; “Rome”: 22 countries, 84 sites (all Grids)
- DC2: 13 countries, 31 sites; “Rome”: 17 countries, 51 sites
- DC2: 7 countries, 19 sites; “Rome”: 7 countries, 14 sites
- Spring 2006: 30 countries, 126 sites (LCG: 104; OSG/Grid3: 8; NDGF: 14)
ATLAS DC2: Jobs, Total

[Pie chart: share of DC2 jobs per site, across LCG (e.g. ca.triumf, ch.cern, de.fzk, fr.in2p3, it.cnaf, nl.nikhef, uk.rl, …), NorduGrid (e.g. no.uib, se.pdc, dk.nbi, si.ijs, …) and Grid3 (e.g. BNL_ATLAS, UC_ATLAS_Tier2, UTA_dpcc, …); most sites contributed a few percent each]

- 20 countries, 69 sites
- ~260,000 jobs
- ~2 MSi2k·months
- As of 30 November 2004
Rome production: number of jobs

[Pie chart: share of “Rome” production jobs per site across LCG, NorduGrid and Grid3 sites; the largest sites contributed ~4-6% each]

- 573,315 jobs
- 22 countries, 84 sites
- As of 17 June 2005
Rome production statistics
- 173 datasets
- 6.1 M events simulated and reconstructed (without pile-up)
- Total simulated data: 8.5 M events
- Pile-up done for 1.3 M events
o 50 K reconstructed

Number of jobs by Grid: LCG 34%; LCG-CG 31%; Grid3 24%; NorduGrid 11%
ATLAS Production (2006)

ATLAS production (January 1st - March 15th), number of jobs:

[Pie chart: job shares per country and Tier-1 - Austria, TRIUMF (T1), Canada, CERN, Switzerland, Czech Republic, FZK-GridKA (T1), Germany, PIC (T1), Spain, CC-IN2P3 (T1), France, Greece, CNAF (T1), Italy, NDGF (T1), SARA (T1), Netherlands, Poland, Russia, Slovenia, ASGC (T1), RAL (T1), UK, BNL (T1), US, others]

- Total number of jobs: 260,727
- Percentage of jobs at Tier-1s: 41%
- Number of sites: 126 (LCG: 104; OSG: 8; NDGF: 14)
- Jobs by Grid: LCG 62%; OSG 28%; NDGF 10%
ATLAS Production(July 2004 - May 2005)
ATLAS & Service Challenge 3
- Tier-0 scaling tests
o Test of the operations at the CERN Tier-0
o Original goal: 10% exercise
o Preparation phase: July-October 2005
o Tests: October 2005 - January 2006
ATLAS & Service Challenge 3
The Tier-0 facility at CERN is responsible for the following operations:
o Calibration and alignment
o First-pass ESD production
o First-pass AOD production
o TAG production
o Archiving of primary RAW and first-pass ESD, AOD and TAG data
o Distribution of primary RAW and first-pass ESD, AOD and TAG data
ATLAS SC3/Tier-0 (1): components of Tier-0
o Castor mass storage system and local replica catalogue
o CPU farm
o Conditions DB
o TAG DB
o Tier-0 production database
o Data management system, Don Quijote 2 (DQ2)
o All orchestrated by the Tier-0 Management System, TOM, based on the ATLAS Production System (ProdSys)
ATLAS SC3/Tier-0 (2)
Deploy and test:
- LCG/gLite components (main focus on the T0 exercise)
o FTS server at T0 and T1s
o LFC catalog at T0, T1s and T2s
o VOBOX at T0, T1s and T2s
o SRM Storage Element at T0, T1s and T2s
- ATLAS DQ2-specific components
o Central DQ2 dataset catalogs
o DQ2 site services (sitting in VOBOXes)
o DQ2 client for TOM
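A distinguishing idea of DQ2 is that bookkeeping is organized around datasets (collections of files) and site subscriptions rather than single files. A minimal sketch of that idea (Python; hypothetical structures and dataset names, not the real DQ2 API):

```python
# Minimal sketch of a dataset catalog in the DQ2 spirit:
# datasets group files (GUIDs), and sites subscribe to whole datasets.

dataset_catalog: dict[str, list[str]] = {
    "csc11.007400.simul": ["guid-a1", "guid-a2", "guid-a3"],  # hypothetical
}
subscriptions: dict[str, set[str]] = {   # dataset -> subscribed sites
    "csc11.007400.simul": {"CERN", "BNL"},
}

def files_to_transfer(dataset: str, site: str,
                      present_at_site: set[str]) -> list[str]:
    """Files of a subscribed dataset still missing at a site."""
    if site not in subscriptions.get(dataset, set()):
        return []
    return [g for g in dataset_catalog[dataset] if g not in present_at_site]

print(files_to_transfer("csc11.007400.simul", "BNL", {"guid-a1"}))
# ['guid-a2', 'guid-a3']
```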
ATLAS Tier-0 data flow

Per stream (EF = Event Filter output):

Stream   File size      Rate      Files/day   Bandwidth   Volume/day
RAW      1.6 GB/file    0.2 Hz    17K         320 MB/s    27 TB/day
ESD      0.5 GB/file    0.2 Hz    17K         100 MB/s    8 TB/day
AOD      10 MB/file     2 Hz      170K        20 MB/s     1.6 TB/day
AODm     500 MB/file    0.04 Hz   3.4K        20 MB/s     1.6 TB/day

[Diagram: EF → Castor disk/tape → CPU farm → Tier-1s; the Tier-1s receive RAW, ESD (×2) and AODm (×10). Aggregate flows shown: 0.44 Hz, 37K files/day, 440 MB/s; 1 Hz, 85K files/day, 720 MB/s; 0.4 Hz, 190K files/day, 340 MB/s; 2.24 Hz, 170K files/day (temporary), 20K files/day (permanent), 140 MB/s]
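The per-stream numbers are internally consistent and easy to re-derive; a quick cross-check in Python using the file sizes and rates from the table above:

```python
# Re-derive the per-stream bandwidth and file counts of the Tier-0 table.
streams = {           # name: (file size [MB], file rate [Hz])
    "RAW":  (1600, 0.2),
    "ESD":  (500,  0.2),
    "AOD":  (10,   2.0),
    "AODm": (500,  0.04),
}
for name, (size_mb, rate_hz) in streams.items():
    mb_per_s = size_mb * rate_hz
    files_per_day = rate_hz * 86400
    tb_per_day = mb_per_s * 86400 / 1e6
    print(f"{name}: {mb_per_s:.0f} MB/s, {files_per_day/1000:.1f}K files/day, "
          f"{tb_per_day:.1f} TB/day")
# RAW: 320 MB/s, 17.3K files/day, 27.6 TB/day
```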
Scope of the Tier-0 Scaling Test
- It was only possible to test:
o EF writing into Castor
o ESD/AOD production on the reco farm
o Archiving to tape
o Export to Tier-1s of RAW/ESD/AOD
- The goal was to test as much as possible, as realistically as possible; mainly a data-flow/infrastructure test (no physics value)
- Calibration & alignment processing not yet included, nor the CondDB and TagDB streams
Oct-Dec 2005 Test: Some Results

[Chart: Castor writing rates (Dec 19-20) - EF farm → Castor (write.raw); reco farm → Castor; reco jobs (write.esd + write.aodtmp); AOD-merging jobs (write.aod)]
Tier-0 Internal Test, Jan 28-29, 2006

[Chart: throughput vs. nominal rates -
READING (nominal 780 MB/s): disk → worker nodes, disk → tape;
WRITING (nominal 460 MB/s): SFO → disk, WN → disk;
WRITING (nominal 440 MB/s): disk → tape;
levels of 780, 460 and 440 MB/s marked]
ATLAS SC4 Tests (June to December 2006)
- Complete Tier-0 test
o Internal data transfer from the “Event Filter” farm to the Castor disk pool, Castor tape and the CPU farm
o Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
o Transfer of RAW, ESD, AOD and TAG data to Tier-1s
o Transfer of AOD and TAG data to Tier-2s
o Data and dataset registration in the DB
- Distributed production
o Full simulation chain run at Tier-2s (and Tier-1s), with data distribution to Tier-1s, other Tier-2s and the CAF
o Reprocessing of raw data at Tier-1s, with data distribution to other Tier-1s, Tier-2s and the CAF
- Distributed analysis
o “Random” job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
o Tests of performance of job submission, distribution and output retrieval
- Need to define and test the Tiers infrastructure and the Tier-1 ↔ Tier-1 and Tier-1 ↔ Tier-2 associations
ATLAS Tier-1s: “2008” resources

                                 CPU (MSI2k)   %      Disk (PB)   %      Tape (PB)   %
Canada        TRIUMF               1.06        4.4     0.62       4.3     0.40       4.4
France        CC-IN2P3             3.02       12.6     1.76      12.2     1.15      12.8
Germany       FZK                  2.40       10.0     1.44      10.0     0.90      10.0
Italy         CNAF                 1.76        7.3     0.80       5.5     0.67       7.5
Nordic Data Grid Facility          1.46        6.1     0.62       4.3     0.62       6.9
Netherlands   SARA                 3.05       12.7     1.78      12.3     1.16      12.9
Spain         PIC                  1.20        5.0     0.72       5.0     0.45       5.0
Taiwan        ASGC                 1.87        7.8     0.83       5.8     0.71       7.9
UK            RAL                  1.57        6.5     0.89       6.2     1.03      11.5
USA           BNL                  5.30       22.1     3.09      21.4     2.02      22.5
Total 2008 pledged                22.69       94.5    12.55      87.0     9.11     101.4
2008 needed                       23.97      100.0    14.43     100.0     8.99     100.0
2008 missing                       1.28        5.5     1.88      13.0    -0.12      -1.4
ATLAS Tiers Association (SC4 draft)

Tier-1 (share) - associated Tier-1 - Tier-2s or planned Tier-2s:
- Canada TRIUMF (5.3%) - SARA - East T2 Fed., West T2 Fed.
- France CC-IN2P3 (13.5%) - BNL - CC-IN2P3 AF, GRIF, LPC, HEP-Beijing, Romanian T2
- Germany FZK-GridKa (10.5%) - BNL - DESY, Munich Fed., Freiburg Uni., Wuppertal Uni., FZU AS (CZ), Polish T2 Fed.
- Italy CNAF (7.5%) - RAL - INFN T2 Fed.
- Netherlands SARA (13.0%) - TRIUMF, ASGC
- Nordic Data Grid Facility (5.5%) - PIC
- Spain PIC (5.5%) - NDGF - ATLAS T2 Fed.
- Taiwan ASGC (7.7%) - SARA - Taiwan AF Fed.
- UK RAL (7.5%) - CNAF - Grid London, NorthGrid, ScotGrid, SouthGrid
- USA BNL (24%) - CC-IN2P3, FZK-GridKa - BU/HU T2, Midwest T2, Southwest T2
- No association (yet): Melbourne Uni., ICEPP Tokyo, LIP T2, HEP-IL Fed., Russian Fed., CSCS (CH), UIBK, Brazilian T2 Fed.
Computing System Commissioning
- We have defined the high-level goals of the Computing System Commissioning operation during 2006
o More a running-in of continuous operation than a stand-alone challenge
- The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
o Calibration and alignment procedures and conditions DB
o Full trigger chain
o Event reconstruction and data distribution
o Distributed access to the data for analysis
- At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates
Conclusions (ATLAS)
- Data Challenges (1, 2) and productions (“Rome”; current “continuous”)
o Have proven that the 3 Grids (LCG-EGEE, OSG/Grid3 and ARC/NorduGrid) can be used in a coherent way for real large-scale productions: possible, but not easy
- In SC3
o We succeeded in reaching the nominal data transfer rate at the Tier-0 (internally) and reasonable transfers to the Tier-1s
- SC4
o Should allow us to test the full chain using the new WLCG middleware and infrastructure and the new ATLAS production and data management systems
o This will include a more complete Tier-0 test, distributed productions and distributed analysis tests
- Computing System Commissioning
o Will have as its main goal a fully working and operational system
o Leading to a physics readiness report
Thank you