Alexandre Vaniachine (ANL)
ATLAS Collaboration
Invited talk at ACAT’2002, Moscow, Russia, June 25, 2002
Alexandre Vaniachine (ANL), [email protected]
Data Challenges in ATLAS Computing
Outline & Acknowledgements
• World Wide computing model
• Data persistency
• Application framework
• Data Challenges: Physics + Grid
• Grid integration in Data Challenges
• Data QA and Grid validation
Thanks to all ATLAS collaborators whose contributions I used in my talk
Core Domains in ATLAS Computing
ATLAS Computing is right in the middle of its first period of Data Challenges
A Data Challenge (DC) for software is analogous to a test beam for the detector: many components have to be brought together to work
[Diagram: the three core domains: Application, Data, Grid]
The separation of the data and the algorithms in the ATLAS software architecture determines our core domains:
• Persistency solutions for event data storage
• Software framework for data processing algorithms
• Grid computing for the data processing flow
World Wide Computing Model
The focus of my presentation is on the integration of these three core software domains in the ATLAS Data Challenges towards a highly functional software suite, plus a World Wide computing model which gives all ATLAS members equal possibility and equal quality of access to ATLAS data
ATLAS Computing Challenge
The emerging World Wide computing model is an answer to the LHC computing challenge:
For ATLAS the raw data alone constitute 1.3 PB/year; adding “reconstructed” events and Monte Carlo data results in ~10 PB/year (~3 PB on disk)
The required CPU estimates, including analysis, are ~1.6M SpecInt95
CERN alone can handle only a fraction of these resources
Computing infrastructure, which was centralized in the past, will now be distributed (in contrast to the reverse trend for experiments that were more distributed in the past)
Validation of the new Grid computing paradigm in the period before the LHC requires Data Challenges of increasing scope and complexity
These Data Challenges will use as much as possible the Grid middleware being developed in Grid projects around the world
Ensuring that the ‘application’ software is independent of the underlying persistency technology is one of the defining characteristics of the ATLAS software architecture (the “transient/persistent” split)
Integrated operation of the framework & database domains demonstrated the capability of
• switching between persistency technologies
• reading the same data from different frameworks
Implementation: the data description (persistent dictionary) is stored together with the data; the application framework uses the transient data dictionary for transient/persistent conversion
The Grid integration problem is very similar to the transient/persistent issue, since all objects become just a byte stream, either on disk or on the net
Technology Independence
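To make the transient/persistent split concrete, here is a minimal, illustrative sketch (plain Python with invented class names, not the ATLAS code): a technology-specific converter writes a self-describing record in which the data description travels with the data, so any framework can rebuild the transient object.

```python
# Illustrative sketch (not ATLAS code) of the transient/persistent split:
# a persistent record carries its own data description ("dictionary"),
# and technology-specific converters translate to/from transient objects.

class TransientTrack:
    """Transient (in-memory) representation used by the algorithms."""
    def __init__(self, pt, eta, phi):
        self.pt, self.eta, self.phi = pt, eta, phi

class Converter:
    """Base class: one converter per (class, storage technology) pair."""
    def create_persistent(self, obj):
        raise NotImplementedError
    def create_transient(self, record):
        raise NotImplementedError

class DictRecordConverter(Converter):
    """Stores the data description together with the data, so the
    persistent form is self-describing and framework-independent."""
    def create_persistent(self, obj):
        return {
            "dictionary": {"class": type(obj).__name__,
                           "attributes": sorted(vars(obj))},
            "data": dict(vars(obj)),
        }
    def create_transient(self, record):
        cls = {"TransientTrack": TransientTrack}[record["dictionary"]["class"]]
        return cls(**record["data"])

# Algorithms only ever see TransientTrack; the storage technology behind
# the converter (Objectivity, ROOT I/O, ...) can change without touching them.
record = DictRecordConverter().create_persistent(TransientTrack(42.0, 1.1, 0.3))
track = DictRecordConverter().create_transient(record)
```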
ATLAS Database Architecture
Independent of underlying persistency technology
Ready for Grid integration
Data description is stored together with the data
For some time ATLAS has had both a ‘baseline’ technology (Objectivity) and a baseline evaluation strategy
• We implemented persistency in Objectivity for DC0
• A ROOT-based conversion service (AthenaROOT) provides the persistency technology for Data Challenge 1
The technology strategy is to adopt the LHC-wide LHC Computing Grid (LCG) common persistence infrastructure (a hybrid relational and ROOT-based streaming layer) as soon as this is feasible
ATLAS is committed to ‘common solutions’ and looks forward to LCG being the vehicle for providing these in an effective way
Changing the persistency mechanism (e.g. Objectivity -> ROOT I/O) requires a change of “converter”, but of nothing else
The ‘ease’ of the baseline change demonstrates the benefits of decoupling transient and persistent representations
In principle, our architecture is capable of providing language independence (in the long term)
Change of Persistency Baseline
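A schematic sketch of what “a change of converter, but of nothing else” means in practice (hypothetical names, not the Athena configuration): the persistency backend is selected in one place, and the algorithms and the transient data model never see the difference.

```python
# Schematic sketch (hypothetical names, not the Athena API): switching the
# persistency baseline means swapping the registered conversion service;
# the algorithms and the transient data model are untouched.

class RootIOConversionSvc:
    def write(self, key, obj):   # would stream via ROOT I/O in reality
        print(f"ROOT I/O: writing {key}")

class ObjectivityConversionSvc:
    def write(self, key, obj):   # legacy baseline used for DC0
        print(f"Objectivity: writing {key}")

CONVERSION_SERVICES = {
    "Objectivity": ObjectivityConversionSvc,
    "AthenaROOT": RootIOConversionSvc,
}

def make_persistency_service(technology="AthenaROOT"):
    """Only this configuration choice changes when the baseline changes."""
    return CONVERSION_SERVICES[technology]()

svc = make_persistency_service("AthenaROOT")   # DC1 baseline
svc.write("McEventCollection", object())
```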
Athena Software Framework
ATLAS Computing is steadily progressing towards a highly functional software suite and implementing the World Wide model
(Note that a legacy software suite was produced and still exists and is used: so it can be done for the ATLAS detector!)
Athena Software Framework is used in Data Challenges for:
• generator events production
• fast simulation
• data conversion
• production QA
• reconstruction (off-line and High Level Trigger)
Work in progress: integrating detector simulations
Future directions: Grid integration
Athena Architecture Features
• Separation of data and algorithms
• Memory management
• Transient/persistent separation
Athena has a common code base with the GAUDI framework (LHCb)
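The sketch below illustrates the Gaudi/Athena-style pattern in miniature (invented Python stand-ins, not the real C++ API): algorithms never own event data; they read from and write to a transient event store, and the framework drives the per-event loop and memory management.

```python
# A minimal sketch of the Gaudi/Athena-style pattern (hypothetical classes):
# algorithms are decoupled from the data they process via a transient store,
# and the framework drives the event loop.

class TransientEventStore(dict):
    """Stand-in for the transient store that decouples data from algorithms."""

class Algorithm:
    def initialize(self): pass
    def execute(self, store): raise NotImplementedError
    def finalize(self): pass

class FastSimAlg(Algorithm):
    def execute(self, store):
        generated = store["McEventCollection"]                 # read input objects
        store["FastSimTracks"] = [e * 0.9 for e in generated]  # record output objects

def event_loop(algorithms, events):
    for alg in algorithms:
        alg.initialize()
    for event in events:
        store = TransientEventStore(McEventCollection=event)   # fresh store per event
        for alg in algorithms:
            alg.execute(store)
    for alg in algorithms:
        alg.finalize()

event_loop([FastSimAlg()], events=[[10.0, 20.0], [5.0]])
```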
ATLAS Detector Simulations
Scale of the problem:
• 25.5 million distinct volume copies
• 23 thousand different volume objects
• 4,673 different volume types
• managing up to a few hundred pile-up events
• one million hits per event on average
Universal Simulation Box
With all interfaces clearly defined, simulations become “Geant-neutral”
You can in principle run G3, G4, Fluka, parameterized simulation with no effect on the end users
G4 robustness test completed in DC0
[Diagram: the detector simulation program takes MC events (HepMC) and the detector description as input and produces MC truth, hits, and digitisation output]
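A toy sketch of the “Geant-neutral” idea (hypothetical interface, not the ATLAS simulation code): once the inputs and outputs of the simulation box are fixed, the engine behind it can be exchanged without affecting the end users.

```python
# Sketch only (hypothetical interface): with the interfaces of the simulation
# "box" fixed -- HepMC events and detector description in, MC truth and hits
# out -- the engine behind it is interchangeable.

class SimulationEngine:
    """Geant-neutral contract: any engine maps generator events to hits."""
    def simulate(self, hepmc_event, detector_description):
        raise NotImplementedError

class Geant4Engine(SimulationEngine):
    def simulate(self, hepmc_event, detector_description):
        # full simulation would go here
        return {"mc_truth": hepmc_event, "hits": ["hit"] * 100}

class ParameterizedEngine(SimulationEngine):
    def simulate(self, hepmc_event, detector_description):
        # fast, approximate detector response
        return {"mc_truth": hepmc_event, "hits": []}

def run_simulation(engine, events, detector_description):
    # End users see the same inputs and outputs whichever engine runs
    # (G3, G4, Fluka, or a parameterization).
    return [engine.simulate(evt, detector_description) for evt in events]

output = run_simulation(Geant4Engine(), events=["evt1", "evt2"],
                        detector_description="geometry.root")
```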
Data Challenges
Data Challenges prompted increasing integration of grid components in ATLAS software
DC0 was used to test software readiness and production pipeline continuity/robustness
The scale was limited to < 1M events
Physics oriented: output for leptonic-channel analyses and legacy Physics TDR data
Despite the centralized production in DC0 we started deployment of our DC infrastructure (organized in 13 work packages), covering in particular Grid-related areas such as:
• production tools
• Grid tools for metadata bookkeeping and replica management
We started distributed production on the Grid in DC1
DC0 Data Flow
• Multiple production pipelines
• Independent data transformation steps
• Quality Assurance procedures
Data Challenge 1
Reconstruction & analysis on a large scale: exercise the data model, study ROOT I/O performance, identify bottlenecks, exercise distributed analysis, …
Produce data for the High Level Trigger (HLT) TDR & Physics groups
• Study performance of Athena and of algorithms for use in the High Level Trigger
• Test of ‘data flow’ through the HLT: byte stream -> HLT algorithms -> recorded data
• High statistics needed (background rejection studies)
• Scale: ~10M simulated events in 10-20 days, O(1000) PCs
Exercising the LHC computing model: involvement of CERN & outside-CERN sites
Deployment of the ATLAS Grid infrastructure: outside sites are essential for this event scale
Phase 1 (started in June)
• ~10M generator-level events (all data produced at CERN)
• ~10M simulated detector-response events (June – July)
• ~10M reconstructed-object events
Phase 2 (September – December)
• Introduction and use of the new Event Data Model and Detector Description
• More countries/sites/processors
• Distributed reconstruction
• Additional samples including pile-up
• Distributed analyses
• Further tests of GEANT4
DC1 Phase 1 Resources
Organization & infrastructure are in place, led by the CERN ATLAS group
• 2000 processors, 1.5×10^11 SI95·sec: adequate for ~4×10^7 simulated events
• 2/3 of the data produced outside of CERN
• production on a global scale: Asia, Australia, Europe and North America
• 17 countries, 26 production sites
Australia: Melbourne
Canada: Alberta, TRIUMF
Czech Republic: Prague
Denmark: Copenhagen
France: CCIN2P3 Lyon
Switzerland: CERN
Taiwan: Academia Sinica
UK: RAL, Lancaster, Liverpool (MAP)
USA: BNL, . . .
Germany: Karlsruhe
Italy (INFN): CNAF, Milan, Roma1, Naples
Japan: Tokyo
Norway: Oslo
Portugal: FCUL Lisboa
Russia (RIVK BAK): JINR Dubna, ITEP Moscow, SINP MSU Moscow, IHEP Protvino
Spain: IFIC Valencia
Sweden: Stockholm
Data Challenge 2
Schedule: Spring – Autumn 2003
Major physics goals:
• Physics samples have ‘hidden’ new physics
• Geant4 will play a major role
• Testing calibration and alignment procedures
Scope increased with respect to what was achieved in DC0 & DC1
• Scale: a sample of 10^8 events
• System complexity at ~50% of the 2006-2007 system
Distributed production, simulation, reconstruction and analysis:
• Use of Grid testbeds which will be built in the context of Phase 1 of the LHC Computing Grid Project
• Automatic ‘splitting’ and ‘gathering’ of long jobs, best available sites for each job
• Monitoring on a ‘gridified’ logging and bookkeeping system, interface to a full ‘replica catalog’ system, transparent access to the data for different MSS systems
• Grid certificates
Grid Integration in Data Challenges
Grid and Data Challenge communities have overlapping objectives:
Grid middleware
– testbed deployment, packaging, basic sequential services, user portals
Data management
– replicas, reliable file transfers, catalogs
Resource management
– job submission, scheduling, fault tolerance
Quality Assurance
– data reproducibility, application and data signatures, Grid QA
Grid Middleware ?
Grid Middleware !
ATLAS Grid Testbeds
US-ATLAS Grid Testbed
EU DataGrid
NorduGrid
For more information see presentations by Roger Jones and Aleksandr Konstantinov
Interfacing Athena to the GRID
Areas of work:
• Data access (persistency)
• Event selection
• GANGA (job configuration & monitoring, resource estimation & booking, job scheduling, etc.)
• Grappa: Grid user interface for Athena
[Diagram: the Athena/GAUDI application connected through the GANGA/Grappa GUI to Grid services, virtual data, algorithms, and histograms/monitoring/results]
Making the Athena framework work in the GRID environment requires an architectural design & components that make use of the Grid services
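As a hedged illustration of that idea (all names below are invented for this sketch; the real components are GANGA, Grappa and the Grid services they wrap), a Grid-aware data-access component could hide replica lookup behind the framework interface:

```python
# Hedged sketch (invented names, not GANGA/Grappa code): Grid services sit
# behind framework components, so a job asks for a logical dataset and a
# data-access service resolves it via a replica catalog.

class ReplicaCatalog:
    """Stand-in for a Grid replica catalog: logical name -> physical replicas."""
    def __init__(self, replicas):
        self._replicas = replicas
    def lookup(self, logical_name):
        return self._replicas.get(logical_name, [])

class GridDataAccessSvc:
    """Framework-side component that hides replica selection from algorithms."""
    def __init__(self, catalog):
        self.catalog = catalog
    def open(self, logical_name):
        replicas = self.catalog.lookup(logical_name)
        if not replicas:
            raise FileNotFoundError(logical_name)
        return replicas[0]        # e.g. pick the first/closest replica

catalog = ReplicaCatalog({"dc1.002000.simul.0001.root":
                          ["gsiftp://site-a.example.org/data/0001.root",
                           "gsiftp://site-b.example.org/data/0001.root"]})
print(GridDataAccessSvc(catalog).open("dc1.002000.simul.0001.root"))
```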
Data Management Architecture
AMI: ATLAS Metadata Interface
MAGDA: MAnager for Grid-based DAta
VDC: Virtual Data Catalog
AMI Architecture
Data warehousing principle (star architecture)
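A minimal sketch of the data-warehousing (star) idea, with a schema invented for illustration rather than the actual AMI tables: a central dataset table references small dimension tables that can be queried and extended independently.

```python
# Illustrative only (schema invented for this sketch, not the real AMI schema):
# a "star" layout keeps one central table of datasets with foreign keys into
# small dimension tables.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE site    (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dataset (id INTEGER PRIMARY KEY, logical_name TEXT,
                      nevents INTEGER,
                      project_id INTEGER REFERENCES project(id),
                      site_id    INTEGER REFERENCES site(id));
""")
db.execute("INSERT INTO project VALUES (1, 'DC1')")
db.execute("INSERT INTO site VALUES (1, 'CERN')")
db.execute("INSERT INTO dataset VALUES (1, 'dc1.002000.simul', 100000, 1, 1)")

# Typical bookkeeping query: all DC1 datasets and where they were produced.
for row in db.execute("""SELECT d.logical_name, d.nevents, s.name
                         FROM dataset d
                         JOIN project p ON d.project_id = p.id
                         JOIN site    s ON d.site_id = s.id
                         WHERE p.name = 'DC1'"""):
    print(row)
```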
MAGDA Architecture
Component-based architecture emphasizing fault-tolerance
VDC Architecture
Two-layer architecture
Introducing Virtual Data
The recipes for producing the data (jobOptions, kumacs) have to be fully tested, and the produced data have to be validated through a QA step
Preparing production recipes takes time and effort and encapsulates considerable knowledge; in DC0, more time was spent assembling the proper recipes than running the production jobs
Once you have the proper recipes, producing the data is straightforward
After the data have been produced, what do we do with the developed recipes? Do we really need to save them?
Data are primary, recipes are secondary
Virtual Data Perspective
GriPhyN project (www.griphyn.org) provides a different perspective:
• recipes are as valuable as the data
• production recipes are the Virtual Data
If you have the recipes you do not need the data (you can reproduce them): recipes are primary, data are secondary
Do not throw away the recipes: save them (in the VDC)
From the OO perspective: methods (recipes) are encapsulated together with the data in Virtual Data Objects
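A toy illustration of the virtual-data perspective (names invented for this sketch): the recipe and its parameters are what get stored, and the data are materialized from them on demand, precisely because they can always be reproduced.

```python
# Toy illustration of the virtual-data idea (invented names): the recipe is
# stored with, or instead of, the data, and the data are (re)materialized
# from the recipe on demand.

class VirtualDataObject:
    def __init__(self, recipe, parameters):
        self.recipe = recipe            # the transformation, e.g. a jobOptions recipe
        self.parameters = parameters    # its parameter collection
        self._data = None               # not yet materialized

    def materialize(self):
        """Run the recipe only if the data do not already exist."""
        if self._data is None:
            self._data = self.recipe(**self.parameters)
        return self._data

def simulate(seed, nevents):            # stands in for a production recipe
    return [seed + i for i in range(nevents)]

vdo = VirtualDataObject(simulate, {"seed": 1234, "nevents": 5})
print(vdo.materialize())                # produced now, reproducible later
```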
VDC-based Production System
High-throughput features:
• scatter-gather data processing architecture
Fault-tolerance features:
• independent agents
• pull model for agent task assignment (vs. push)
• local caching of output and input data (except Objectivity input)
ATLAS DC0 and DC1 parameter settings for simulations are recorded in the Virtual Data Catalog database using normalized components: parameter collections structured “orthogonally” by
• data reproducibility
• application complexity
• Grid location
Automatic “garbage collection” by the job scheduler:
• agents pull the next derivation from the VDC
• after the data have been materialized, agents register “success” in the VDC
• when a previous invocation has not been completed within the specified timeout period, it is invoked again
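The pull model and the timeout-based re-invocation described above can be sketched as follows (a simplification with invented interfaces, not the actual VDC production code):

```python
# Sketch of the pull model (simplified, invented interfaces): independent
# agents ask the catalog for the next derivation, and a derivation whose
# invocation timed out without success becomes available again.

import time

class VirtualDataCatalog:
    def __init__(self, derivations, timeout=3600.0):
        self.started = {d: None for d in derivations}   # derivation -> start time
        self.done = set()
        self.timeout = timeout

    def pull_next(self):
        """Hand out an unassigned derivation, or one that timed out uncompleted."""
        now = time.time()
        for deriv, started in self.started.items():
            if deriv in self.done:
                continue
            if started is None or now - started > self.timeout:
                self.started[deriv] = now
                return deriv
        return None

    def register_success(self, deriv):
        self.done.add(deriv)

vdc = VirtualDataCatalog(["simul.0001", "simul.0002"], timeout=60.0)
while (job := vdc.pull_next()) is not None:
    # ... an agent would materialize the data for `job` locally here ...
    vdc.register_success(job)
```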
[Diagram: tree-like data flow. Athena generators -> HepMC.root; atlsim -> digis.zebra + geometry.zebra; Athena conversion -> digis.root + geometry.root; Athena recon -> recon.root; Athena Atlfast -> Atlfast.root + filtering.ntuple; Atlfast recon -> recon.root; Athena QA -> QA.ntuple at each stage]
Exercising the rich possibilities of data processing composed of multiple independent data transformation steps
Tree-like Data Flow
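One compact way to express such a tree-like flow is as a table of independent transformation steps with declared inputs and outputs; the step and file names below are taken from the diagram, while the wiring shown is our own simplified sketch:

```python
# Simplified sketch of the tree-like flow above: each transformation step is
# independent and declares its inputs and outputs, so steps whose inputs are
# available can be scattered across sites and their outputs gathered later.

PIPELINE = {
    "Athena generators": {"in": [],                "out": ["HepMC.root"]},
    "atlsim":            {"in": ["HepMC.root"],    "out": ["digis.zebra", "geometry.zebra"]},
    "Athena conversion": {"in": ["digis.zebra", "geometry.zebra"],
                          "out": ["digis.root", "geometry.root"]},
    "Athena recon":      {"in": ["digis.root"],    "out": ["recon.root"]},
    "Athena Atlfast":    {"in": ["HepMC.root"],    "out": ["Atlfast.root", "filtering.ntuple"]},
    "Atlfast recon":     {"in": ["Atlfast.root"],  "out": ["recon.root"]},
    "Athena QA":         {"in": ["recon.root"],    "out": ["QA.ntuple"]},
}

def ready_steps(available_files):
    """Steps whose inputs are all available can run independently."""
    return [step for step, io in PIPELINE.items()
            if all(f in available_files for f in io["in"])]

print(ready_steps({"HepMC.root"}))  # ['Athena generators', 'atlsim', 'Athena Atlfast']
```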
Data Reproducibility
The goal is to validate DC sample production by ensuring the reproducibility of simulations run at different sites
We need a tool capable of establishing the similarity or the identity of two samples produced in different conditions, e.g. at different sites
This is a very important (and sometimes overlooked) component of Grid computing deployment
It is complementary to the software and/or data digital-signature approaches, which are still in the R&D phase
Grid Production Validation
Simulations are run in different conditions, for instance with the same generation input but at different production sites
For each sample, reconstruction (i.e. Atrecon) is run to produce standard CBNT ntuples
The validation application launches specialized independent analyses for the ATLAS subsystems
For each sample, standard histograms are produced
Comparison Procedure
[Plots: test sample, reference sample, superimposed samples, contributions to χ²]
The comparison procedure ends with a χ²-bar-chart summary
It gives a pretty nice overview of how the samples compare:
Summary of Comparison
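For illustration, a minimal version of the comparison step (our own sketch, not the actual validation code) superimposes the binned test and reference samples and looks at the per-bin χ² contributions and their sum:

```python
# Minimal sketch of the comparison step: per-bin chi-square contributions
# between a test and a reference histogram of similar size, plus their sum.

def chi2_contributions(test, reference):
    """Per-bin (t - r)^2 / (t + r) for two binned samples with similar statistics."""
    contributions = []
    for t, r in zip(test, reference):
        contributions.append((t - r) ** 2 / (t + r) if (t + r) > 0 else 0.0)
    return contributions

test_hist      = [105, 230, 480, 260, 95]    # bin contents, test sample
reference_hist = [110, 225, 470, 250, 100]   # bin contents, reference sample

per_bin = chi2_contributions(test_hist, reference_hist)
print("chi2 per bin:", [round(c, 3) for c in per_bin])
print("total chi2 / ndf:", round(sum(per_bin) / len(per_bin), 3))
```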
Example of Finding
Comparing the energy in the calorimeters for Z -> 2l samples from DC0 and DC1
It works!
The difference is caused by the cut at generation level
Summary
ATLAS computing is in the middle of its first period of Data Challenges of increasing scope and complexity, and is steadily progressing towards a highly functional software suite plus a World Wide computing model which gives all ATLAS members equal possibility and equal quality of access to ATLAS data
These Data Challenges are executed at the prototype tier centers and use as much as possible the Grid middleware being developed in Grid projects around the world
In close collaboration between the Grid and Data Challenge communities ATLAS is testing large-scale testbed prototypes, deploying prototype components to integrate and test Grid software in a production environment, and running Data Challenge 1 production in 26 prototype tier centers in 17 countries on four continents
Quite promising start for ATLAS Data Challenges!