A Proposal for the MEG Offline Systems
Lecce, 1/20/2004 - Corrado Gatto
Outlook

Choice of the framework: ROOT
– Offline requirements for MEG
– ROOT I/O system: is it suitable?
– LAN/WAN file support
– Tape storage support
– Parallel code execution support

Architecture of the offline systems
– Computing model
– General architecture
– Database model
– Montecarlo
– Offline organization

Already discussed (phone meeting, Jan 9th, 2004)
– Why ROOT?
[Diagram: Framework Services / Experiment Independent / Offline Software]
Coming Soon… Next Meeting (PSI, Feb 9th, 2004)
– Data Model (Object Database vs ROOT+RDBMS)
– More on ROOT I/O
– UI and GUI
– GEANT3 compatibility
Dataflow and Reconstruction Requirements

• 100 Hz L3 trigger; physics event size: 1.2 MB
• Raw data throughput: (10+10) Hz × 1.2 MB/phys evt × 0.1 + 80 Hz × 0.01 MB/bkg evt ≈ 3.5 MB/s
• <evt size> ≈ 35 kB
• Total raw data storage: 3.5 MB/s × 10^7 s = 35 TB/yr
Framework Implementation Constraints

• Geant3 compatible (at least at the beginning)
• Written and maintained by few people
• Low level of concurrent access to reco data
• Scalability (because of the uncertainty in the event rate)
Compare to BABAR

                                      BABAR               MEG
No. of Subsystems                     5                   3
No. of Channels                       ~250,000            ~1000
Event Rate                            10^9 evt/yr         10^9 evt/yr
Raw Event Size                        32 kB               1.2 MB
L3 to Reconstruction                  30 Hz + 70 Hz       20 Hz + 80 Hz
                                      (2.5 MB/s)          (3.5 MB/s)
Reco to HSS                           100 Hz (7.5 MB/s)   100 Hz (3.6 MB/s)
Storage Requirements (including MC)   300 TB/yr           70 TB/yr (reprocessing
                                                          not included)
BaBar Offline Systems
• >400 nodes (+320 in Pd)
• >20 physicists/engineers
Experiments Using ROOT for the Offline

Experiment  Max Evt Size   Evt Rate  DAQ Out   Tape Storage  Subdetectors  Collaborators
STAR        20 MB          1 Hz      20 MB/s   200 TB/yr     3             >400
Phobos      300 kB         100 Hz    30 MB/s   400 TB/yr     3             >100
Phenix      116 kB         200 Hz    17 MB/s   200 TB/yr     12            600
Hades       9 kB (compr.)  33 Hz     300 kB/s  1 TB/yr       5             17
Blast       0.5 kB         500 Hz    250 kB/s                5             55
MEG         1.2 MB         100 Hz    3.5 MB/s  70 TB/yr      3
Incidentally…
• All the above experiments use the ROOT framework and I/O system
• BABAR's former Offline Coordinator, now at STAR (T. Wenaus), moved to ROOT
• BABAR I/O is switching from Objy to ROOT
• Adopted by the ALICE Online+Offline, which has the most demanding requirements for raw data processing/storage:
– 1.25 GB/s
– 2 PB/yr
• The large number of experiments (>30) using ROOT world-wide ensures open-source-style support
Requirements for HEP Software Architecture or Framework

• Easy interface with existing packages:
– Geant3, Geant4, event generators
• Simple structure, usable by non-computing experts
• Portability
• Experiment-wide framework
• Use a world-wide accepted framework, if possible:
– a collaboration-specific framework is less likely to survive in the long term
ROOT I/O Benchmarks

• Phobos: 30 MB/s, 9 TB (2001)
– Event size: 300 kB
– Event rate: 100 Hz
• NA57 MDC1: 14 MB/s, 7 TB (1999)
– Event size: 500 kB
– Event rate: 28 Hz
• Alice MDC2: 100 MB/s, 23 TB (2000)
– Event size: 72 MB
• CDF II: 20 MB/s, 200 TB (2003)
– Event size: 400 kB
– Event rate: 75 Hz
ROOT I/O Performance Check #1: Phobos, 30 MB/s, 9 TB (2001)

• Event size: 300 kB; event rate: 100 Hz
• ROOT I/O used between the event builder (ROOT code) and HPSS
• RAID (2 disks × 2 SCSI ports, only used to balance CPU load)
• 30 MB/s data transfer, not limited by the ROOT streamer (CPU limited)
• With additional disk arrays, estimated throughput > 80 MB/s
• Farm: 10 nodes running Linux
rootd File I/O Benchmark (Real Detector)

• 2300 events read in a loop (file access only, no reco, no selection)
• Results: 17 evt/s; rootd (the ROOT daemon) transfers data to the processing node at 4.5 MB/s
• Inefficient design: an ideal use case for PROOF
Raw Performances (Alice MDC2)
[Chart: global throughput (MB/s, 0-140) vs. number of event builders (1-20); pure Linux setup, 20 data sources, Fast Ethernet local connection]
Experiences with Root I/O

Many experiments are using Root I/O today or planning to use it in the near future:
• RHIC (started last summer)
– STAR 100 TB/year + MySQL
– PHENIX 100 TB/year + Objy
– PHOBOS 50 TB/year + Oracle
– BRAHMS 30 TB/year
• JLAB (starting this year)
– Hall A, B, C, CLASS >100 TB/year
• FNAL (starting this year)
– CDF 200 TB/year + Oracle
– MINOS
Experiences with Root I/O

• DESY
– H1 moving from BOS to Root for DSTs and microDSTs; 30 TB/year DSTs + Oracle
– HERA-B extensive use of Root + RDBMS
– HERMES moving to Root
– TESLA test beam facility has decided for Root; expects many TB/year
• GSI
– HADES Root everywhere + Oracle
• PISA
– VIRGO >100 TB/year in 2002 (under discussion)
Experiences with Root I/O

• SLAC
– BABAR >5 TB microDSTs, upgrades under way + Objy
• CERN
– NA49 >1 TB microDSTs + MySQL
– ALICE MDC1 7 TB, MDC2 23 TB + MySQL
– ALICE MDC3 83 TB in 10 days, 120 MB/s (DAQ -> CASTOR)
– NA6i starting
– AMS Root + Oracle
– ATLAS, CMS test beams
– ATLAS, LHCb, Opera have chosen Root I/O over Objy
• Plus several thousand people using Root like PAW
LAN/WAN Files

• Files and directories:
– a directory holds a list of named objects
– a file may have a hierarchy of directories (a la Unix)
– ROOT files are machine independent
– built-in compression
• Support for local, LAN and WAN files:
– TFile f1("myfile.root")                          // local file
– TFile f2("http://pcbrun.cern.ch/Renefile.root")  // remote file, accessed via a Web server
– TFile f3("root://cdfsga.fnal.gov/bigfile.root")  // remote file, accessed via the ROOT daemon
– TFile f4("rfio://alice/run678.root")             // file on a mass store (HPSS, CASTOR), via RFIO
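A minimal usage sketch along these lines (the URL and the histogram name are hypothetical): the same macro works unchanged for any of the four access methods, because TFile::Open() picks the transport from the URL.

// readfile.C -- minimal sketch; the URL and the name "hEnergy" are
// hypothetical illustrations, not objects defined in this proposal.
void readfile(const char *url = "root://server.example.org/run678.root")
{
   TFile *f = TFile::Open(url);          // local, http, rootd or rfio
   if (!f || f->IsZombie()) return;      // basic error handling
   TH1 *h = (TH1 *) f->Get("hEnergy");   // fetch a named object from the file
   if (h) h->Draw();
}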
Support for HSM Systems

• Two popular HSM systems are supported:
– CASTOR, developed by CERN; file access via the RFIO API and a remote rfiod
– dCache, developed by DESY; file access via the dCache API and a remote dcached

TFile *rf = TFile::Open("rfio://castor.cern.ch/alice/aap.root")
TFile *df = TFile::Open("dcache://main.desy.de/h1/run2001.root")
Parallel ROOT Facility (PROOF)

• Data access strategy: each slave gets assigned, as far as possible, packets representing data in local files; if no (more) local data is available, it gets remote data via rootd and rfio (needs a good LAN, like GB Ethernet)
• The PROOF system allows:
– parallel analysis of trees in a set of files
– parallel analysis of objects in a set of files
– parallel execution of scripts
on clusters of heterogeneous machines
Parallel Script Execution

[Diagram: a local PC ($ root) talks to a remote PROOF cluster via TNetFile/TFile; a proof master server farms the work out to proof slave servers on node1…node4, each reading its local *.root files; the slave list comes from #proof.conf; ana.C is executed on the slaves and stdout/objects are returned to the local session]

$ root
root [0] .x ana.C                  // plain local execution
root [1] gROOT->Proof("remote")    // connect to the remote PROOF cluster
root [2] gProof->Exec(".x ana.C")  // run the same script in parallel on the slaves
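In practice a PROOF analysis such as ana.C is usually packaged as a TSelector subclass, so the same code runs locally or on the slaves. A minimal sketch, with hypothetical class, histogram and branch content:

// MegSelector.h -- skeleton of a PROOF-ready analysis; class and histogram
// names are hypothetical, not part of the proposal.
#include "TSelector.h"
#include "TTree.h"
#include "TH1F.h"

class MegSelector : public TSelector {
public:
   TTree *fChain;   // tree being processed
   TH1F  *fHist;    // example output histogram

   MegSelector() : fChain(0), fHist(0) {}
   void   Init(TTree *tree) { fChain = tree; }
   void   SlaveBegin(TTree *)               // runs once on every slave
   {
      fHist = new TH1F("hE", "Energy", 100, 0., 60.);
      fOutput->Add(fHist);                  // PROOF merges the fOutput lists
   }
   Bool_t Process(Long64_t entry)           // runs once per event
   {
      fChain->GetEntry(entry);
      // ... fill fHist from the event data ...
      return kTRUE;
   }
   void   Terminate()                       // runs once, back on the client
   {
      if (fHist) fHist->Draw();
   }
   ClassDef(MegSelector, 1);
};

The selector is then run with tree->Process("MegSelector.h+") locally, or through the PROOF session for parallel execution.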
A Proposal for the MEG Offline Architecture

• Computing Model
• General Architecture
• Database Model
• Montecarlo
• Offline Organization

PSI, 9/2/2004 - Corrado Gatto
Computing Model: Organization

• Based on a distributed computing scheme with a hierarchical architecture of sites
• Necessary when software resources (like the software groups working on the subdetector code) are deployed over several geographic regions and need to share common data (like the calibration)
• Also important when a large MC production involves several sites
• The hierarchy of sites is established according to the computing resources and services each site provides
Computing Model: MONARC

• A central site, Tier-0
– will be hosted by PSI
• Regional centers, Tier-1
– will serve a large geographic region or a country
– might provide a mass-storage facility, all the GRID services, and an adequate quantity of personnel to exploit the resources and assist users
• Tier-2 centers
– will serve part of a geographic region, i.e., typically about 50 active users
– are the lowest level to be accessible by the whole Collaboration
– will provide important CPU resources but limited personnel
– will be backed by one or several Tier-1 centers for mass storage
– in the case of small collaborations, Tier-1 and Tier-2 centers could be the same
• Tier-3 centers
– correspond to the computing facilities available at the different Institutes
– conceived as relatively small structures connected to a reference Tier-2 center
• Tier-4 centers
– personal desktops
Data Processing Flow: STEP 1 (Tier-0)

– Prompt calibration of raw data (almost real time)
– Event reconstruction of raw data (within hours of prompt calibration)
– Enumeration of the reconstructed objects
– Production of three kinds of objects per event:
   ESD (Event Summary Data)
   AOD (Analysis Object Data)
   Tag objects
– Update of the calibration database (calibration constants, monitoring data and calibration runs for all the MEG sub-detectors)
– Update of the Run Catalogue
– Posting of the data for Tier-1 access
Data Processing Flow: STEP 2 (Tier-1)

– Some reconstruction (probably not needed at MEG)
– Eventual reprocessing
– Local mirroring of the reconstructed objects
– Provision of a complete set of information on the production (run #, tape #, filenames) and on the reconstruction process (calibration constants, version of the reconstruction program, quality assessment, and so on)
– Montecarlo production
– Update of the Run Catalogue
Data Processing Flow: STEP 3 (Tier-2)

– Montecarlo production
– Creation of DPD (Derived Physics Data) objects
– DPD objects will contain information specifically needed for a particular analysis
– DPD objects are stored locally or remotely and might be made available to the collaboration
Data Model

• ESD (Event Summary Data)
– contain the reconstructed tracks (for example, track pt, particle ID, pseudorapidity and phi, and the like), the covariance matrix of the tracks, the list of track segments making a track, etc.
• AOD (Analysis Object Data)
– contain information on the event that will facilitate the analysis (for example, centrality, multiplicity, number of electrons/positrons, number of high-pt particles, and the like)
• Tag objects
– identify the event by its physics signature (for example, a Higgs electromagnetic decay and the like) and are much smaller than the other objects; Tag data would likely be stored in a database and used as the source for the event selection
• DPD (Derived Physics Data)
– are constructed from the physics analysis of AOD and Tag objects
– will be specific to the selected type of physics analysis (e.g., mu -> e gamma, mu -> e e e)
– typically consist of histograms or ntuple-like objects
– will in general be stored locally on the workstation performing the analysis, thus adding no constraint to the overall data-storage resources
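One possible ROOT realisation of these containers, sketched with hypothetical class and member names (an illustration, not the proposal's final design):

// MegEventData.h -- sketch of ESD/AOD-style containers; every class and
// member name here is a hypothetical illustration.
#include "TObject.h"
#include "TClonesArray.h"

class MegTrack : public TObject {          // one reconstructed track in the ESD
public:
   Double_t fPt;                           // transverse momentum
   Double_t fEta, fPhi;                    // direction
   Int_t    fPid;                          // particle ID hypothesis
   Double_t fCov[15];                      // packed covariance matrix
   ClassDef(MegTrack, 1);
};

class MegESD : public TObject {            // per-event summary data
public:
   TClonesArray *fTracks;                  // array of MegTrack
   MegESD() : fTracks(new TClonesArray("MegTrack")) {}
   ~MegESD() { delete fTracks; }
   ClassDef(MegESD, 1);
};

class MegAOD : public TObject {            // compact analysis-level object
public:
   Int_t fMultiplicity;                    // event multiplicity
   Int_t fNPositrons;                      // number of positron candidates
   Int_t fNHighPt;                         // number of high-pt particles
   ClassDef(MegAOD, 1);
};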
Building a Modular System
• Use ROOT's Folders

Folders
• A Folder can contain:
– other Folders
– an Object or multiple Objects
– a collection or multiple collections

Folder Types: Tasks and Data
• Data Folders are filled by Tasks (producers)
• Data Folders are used by Tasks (consumers)

Folders Interoperate
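In ROOT this maps directly onto the TFolder class; a minimal sketch of how the MEG white board could be laid out (all folder names are illustrative):

// folders.C -- sketch of a folder hierarchy under the ROOT root folder;
// the names ("MEG", "Run", "Event", ...) are illustrative only.
void folders()
{
   TFolder *top   = gROOT->GetRootFolder()->AddFolder("MEG", "MEG offline white board");
   TFolder *run   = top->AddFolder("Run", "Run configuration and conditions");
   TFolder *event = top->AddFolder("Event", "Transient event data");
   TFolder *raw   = event->AddFolder("RawData", "Hits per subdetector");
   TFolder *reco  = event->AddFolder("RecoData", "Reconstruction output");

   run->AddFolder("Conditions", "Run conditions");
   raw->AddFolder("DCH", "Drift chamber hits");
   raw->AddFolder("EMC", "Calorimeter hits");
   reco->AddFolder("Tracks", "Reconstructed tracks");

   gROOT->GetRootFolder()->ls();   // print the resulting hierarchy
}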
Folder Types: Tasks
[Diagram: the task hierarchy. A Reconstructioner owns 1…3 per-detector Reconstructioners, each built from a per-detector Digitizer {user code} and Clusterizer {user code}; an Analizer; a DQM task owns 1…3 per-detector Fast Reconstructioners and an Alarmer]
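Such producers map naturally onto ROOT's TTask, where a parent task executes its subtasks in order and the user code lives in Exec(). A minimal sketch with an illustrative class name:

// tasks.C -- sketch of a per-detector task built on TTask; the class name
// and the body of Exec() are illustrative placeholders for user code.
#include <cstdio>
#include "TTask.h"

class MegDigitizer : public TTask {
public:
   MegDigitizer(const char *det)
      : TTask(Form("%sDigitizer", det), "Converts hits into digits") {}
   void Exec(Option_t * = "")
   {
      // {User code}: read the hit folder, produce digits, post them back
      printf("Running %s\n", GetName());
   }
};

void tasks()
{
   // a top-level Reconstructioner executing one digitizer per detector
   TTask *reco = new TTask("Reconstructioner", "Global reconstruction");
   reco->Add(new MegDigitizer("DCH"));
   reco->Add(new MegDigitizer("EMC"));
   reco->Add(new MegDigitizer("TOF"));
   reco->ExecuteTask();   // runs Exec() of every subtask in order
}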
Folder Types: Tasks (continued)
[Diagram: a Calibrator owns 1…3 per-detector Calibrators, each with its per-detector Histogrammer; an Aligner owns 1…3 per-detector Aligners, each with its per-detector Histogrammer]
Folder Types: Tasks (continued)
[Diagram: a Vertexer owns 1…3 per-detector Vertexers, each with its per-detector Histogrammer; a Trigger owns 1…3 per-detector Triggers, each with its per-detector Histogrammer]
Data Folder Structure
[Diagram: the folder tree]
• Event(i), i = 1…n
– Header
– Raw Data: DC Hits, EMC Hits, TOF Hits
– Reco Data: DCDigi, TrackParticles, TOF, EMC
– MC only: Kinematics (Particles), Track Ref.
• Run
– Configur.
– Conditions
– Constants
Files Structure of « raw » Data
[Diagram: 1 common TFile + 1 TFile per detector, with 1 TTree per event]
• main.root: run info (MEG: Run Info) and the header tree
• Kine.root: kinematics trees TreeK0,…,TreeKi,…,TreeKn (one per event: Event #1, Event #2, …), each holding a TClonesArray of TParticles
• DCH.Hits.root: hit trees TreeH0,…,TreeHi,…,TreeHn (one per event), with a TBranch DCH holding a TClonesArray
• EMC.Hits.root: hit trees (one per event), with a TBranch EMC holding a TClonesArray
• DCH.Digits.root: digit trees TreeD0,…,TreeDi,…,TreeDn (one per event)
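A sketch of how one per-detector file could be written in this layout, one TTree per event with a TClonesArray branch (MegDCHHit is a hypothetical hit class, assumed to have a ROOT dictionary):

// writehits.C -- sketch of the "one TFile per detector, one TTree per event"
// layout; MegDCHHit is a hypothetical hit class, not defined here.
void writehits(Int_t nEvents = 3)
{
   TFile f("DCH.Hits.root", "RECREATE");
   TClonesArray *hits = new TClonesArray("MegDCHHit");

   for (Int_t i = 0; i < nEvents; i++) {
      TTree *tree = new TTree(Form("TreeH%d", i), "DCH hits");  // one tree per event
      tree->Branch("DCH", &hits);                               // TClonesArray branch
      hits->Clear();
      // ... fill "hits" with the MegDCHHit objects of event i ...
      tree->Fill();
      tree->Write();
   }
}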
Files Structure of « reco » Data
[Diagram: detector-wise splitting of the reconstruction output (SDigits.root, DCH.Digits.root, EMC.Digits.root, DCH.Reco.root, EMC.Reco.root, Reco.root, …)]
• Each task generates one TFile
• 1 event per TTree (TTree 0,…,i,…,n)
• Task versioning in the TBranch (TBranch v1,…,vn, each holding a TClonesArray)
Run-time Data Exchange

• Post transient data to a white board
• Structure the white board according to the detector sub-structure & task results
• Each detector is responsible for posting its data
• Tasks access data from the white board
• Detectors cooperate through the white board
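A minimal sketch of this post/fetch pattern on the folder white board (folder and array names are illustrative and assume a hierarchy like the one sketched earlier):

// whiteboard.C -- sketch of run-time data exchange through folders; the
// folder path and object names are illustrative.
void whiteboard()
{
   // producer side: the DCH detector posts its hits to the white board
   TFolder *dch = (TFolder *) gROOT->FindObjectAny("DCH");
   TClonesArray *hits = new TClonesArray("TObject");
   hits->SetName("DCHHits");
   if (dch) dch->Add(hits);

   // consumer side: a reconstruction task fetches the same array by name,
   // without knowing which detector produced it
   TClonesArray *in = (TClonesArray *) gROOT->FindObjectAny("DCHHits");
   if (in) printf("Found %s with %d entries\n", in->GetName(), in->GetEntries());
}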
Whiteboard Data Communication
[Diagram: eight classes exchanging data only through the central white board, rather than directly with one another]
Coordinating Tasks & Data

• Detector stand-alone (Detector Objects)
– each detector executes a list of detector actions/tasks
– on-demand actions are possible but not the default
– detector-level trigger, simulation and reconstruction are implemented as clients of the detector classes
• Detectors collaborate (Global Objects)
– one or more Global Objects execute a list of actions involving objects from several detectors
• The Run Manager
– executes the detector objects in the order of the list
– global trigger, simulation and reconstruction are special services controlled by the Run Manager class
• The offline configuration is built at run time by executing a ROOT macro, as sketched below
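Such a configuration macro might look as follows; MegRunManager and the detector classes are hypothetical names echoing the diagrams, not existing code:

// Config.C -- sketch of a run-time offline configuration macro; all class
// names are hypothetical placeholders following the diagrams.
void Config()
{
   // create the run manager that will execute the detector list in order
   MegRunManager *run = new MegRunManager("Run", "MEG offline run");

   // register the detectors; each one carries its own task list
   run->AddDetector(new MegDCH("DCH", "Drift chamber"));
   run->AddDetector(new MegEMC("EMC", "LXe calorimeter"));
   run->AddDetector(new MegTOF("TOF", "Timing counter"));

   // global services controlled by the run manager
   run->SetGlobalReco(kTRUE);
}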
Global Offline Structure
[Diagram: a Run Manager and an MC Run Manager (Run Classes) drive the Detector Classes (DCH, EMC, TOF), each of which executes its own list of detector tasks; one or more Global Objects (e.g., Global Reco) execute task lists involving objects from several detectors; all results flow into a ROOT database as trees and branches. The Run Manager executes the detector objects in the order of the list; on-demand actions are possible but not the default]
Detector Level Structure
[Diagram: the DCH Detector Class owns DetectorTask Classes for DCH Simulation, DCH Digitization, DCH Reconstruction and DCH Trigger, producing Hits, Digits, local tracks and TrigInfo as branches of a ROOT tree; the Run Class keeps the list of detectors and each detector its list of detector tasks (the figure reuses ALICE's AliDetector examples: AliTPC, AliTRD, AliTOF, AliFMD); the DCH Detector Class delegates to a DCHGeometry Module Class whose Geometry Class provides CreateGeometry, BuildGeometry and CreateMaterials]
The Detector Class

• Base class for the MEG subdetector modules
• Both sensitive modules (detectors) and non-sensitive ones are described by this base class
• This class supports the hit and digit trees produced by the simulation and the objects produced by the reconstruction
• This class is also responsible for building the geometry of the detectors
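One possible shape for this base class; apart from CreateGeometry, BuildGeometry and CreateMaterials, which appear in the diagram, every name is hypothetical:

// MegDetector.h -- sketch of the subdetector base class; all names except
// the three geometry methods quoted from the diagram are hypothetical.
#include "TNamed.h"
#include "TTree.h"
#include "TClonesArray.h"

class MegDetector : public TNamed {
public:
   MegDetector(const char *name, const char *title)
      : TNamed(name, title), fHits(0), fDigits(0) {}

   // geometry responsibilities (names from the diagram)
   virtual void CreateGeometry()  = 0;   // define volumes for the transport MC
   virtual void BuildGeometry()   = 0;   // simplified geometry for the display
   virtual void CreateMaterials() = 0;   // media and materials

   // simulation output: hit and digit containers attached to the event trees
   virtual void MakeBranch(TTree *tree)
   {
      if (fHits)   tree->Branch(Form("%sHits",   GetName()), &fHits);
      if (fDigits) tree->Branch(Form("%sDigits", GetName()), &fDigits);
   }

protected:
   TClonesArray *fHits;     // transient hits, posted to the white board
   TClonesArray *fDigits;   // transient digits
   ClassDef(MegDetector, 1);
};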
MEG Montecarlo Organization

• The Virtual Montecarlo
• Geant3/Geant4 Interface
• Generator Interface
The Virtual MC Concept

• Virtual MC provides a virtual interface to the Monte Carlo
• It enables the user to build a virtual Monte Carlo application independent of any actual underlying Monte Carlo implementation
• The concrete Monte Carlo (Geant3, Geant4, Fluka) is selected at run time
• Ideal when switching from a fast to a full simulation: the VMC allows different simulation Monte Carlos to be run from the same user code
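With ROOT's TVirtualMC the engine choice can be a single line in a macro. A sketch, where MegMCApplication (and its InitMC/RunMC methods) is a hypothetical TVirtualMCApplication subclass; only the TGeant3 construction is standard VMC usage:

// runsim.C -- sketch of run-time transport-engine selection with the VMC;
// MegMCApplication is a hypothetical application class.
void runsim(Bool_t useGeant4 = kFALSE)
{
   // user code: geometry, stacking and hit scoring live in the application
   MegMCApplication *app = new MegMCApplication("MEG", "MEG simulation");

   if (useGeant4) {
      gSystem->Load("libgeant4vmc");    // then instantiate the Geant4 VMC here
   } else {
      gSystem->Load("libgeant321");
      new TGeant3("C++ Interface to Geant3");   // sets the global gMC pointer
   }

   // identical user code from here on, whatever the engine
   app->InitMC("geometry.C");   // hypothetical geometry/init macro
   app->RunMC(1000);            // transport 1000 events
}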
The Virtual MC Concept
[Diagram: the user code sees only the VMC and a virtual geometrical modeller; the VMC dispatches transport to G3, G4 or FLUKA; reconstruction, visualisation and the generators all work off the same geometrical modeller]
Running a Virtual Montecarlo
[Diagram: the Virtual MC run control reads the geometry database and a simplified ROOT geometry, feeds particles from the generators through the ROOT particle stack, and writes the hit structures to a ROOT output file; the transport engine (Geant3.21, Geant4, FLUKA or a fast MC) is selected at run time]
Generator Interface

• TGenerator is an abstract base class that defines the interface between ROOT and the various event generators (through inheritance)
• Provides the user with:
– an easy and coherent way to study a variety of physics signals
– testing tools
– background studies
• Possibility to study:
– full events (event by event)
– single processes
– a mixture of both ("cocktail events")
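As an illustration, a sketch using TPythia6, one of the TGenerator implementations shipped with ROOT (the beam settings are placeholders, not MEG physics):

// pygen.C -- sketch of event generation through the TGenerator interface,
// using ROOT's TPythia6 as the concrete generator behind it.
void pygen(Int_t nEvents = 10)
{
   gSystem->Load("libEG");
   gSystem->Load("libPythia6");
   gSystem->Load("libEGPythia6");

   TPythia6 *py = new TPythia6();
   py->Initialize("CMS", "p", "p", 14000.);   // illustrative beam setup
   TGenerator *gen = py;                      // use it through the abstract interface

   TClonesArray *particles = new TClonesArray("TParticle");
   for (Int_t i = 0; i < nEvents; i++) {
      gen->GenerateEvent();                              // one event
      Int_t n = gen->ImportParticles(particles, "All");  // fill the TParticle array
      printf("event %d: %d particles\n", i, n);
   }
}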
Data Access: the ROOT + RDBMS Model
[Diagram: the event store (trees, histograms) lives in ROOT files; calibrations, geometries and the run/file catalog live in an RDBMS (Oracle or MySQL)]
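On the catalogue side, ROOT's own TSQLServer interface would cover the RDBMS access; a sketch with a hypothetical MySQL run catalogue (host, table and credentials are made up):

// catalog.C -- sketch of a run-catalogue query through ROOT's TSQLServer;
// host, database, table and credentials are hypothetical.
void catalog()
{
   TSQLServer *db = TSQLServer::Connect("mysql://dbhost.psi.ch/megcat", "reader", "pw");
   if (!db) return;

   TSQLResult *res = db->Query("SELECT run, filename FROM runs WHERE quality = 'good'");
   while (TSQLRow *row = res->Next()) {        // iterate over the selected runs
      printf("run %s -> %s\n", row->GetField(0), row->GetField(1));
      delete row;
   }
   delete res;
   delete db;
}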
Offline Organization

• There is no chance of having enough people at one site to develop the majority of the MEG code
• Planning is based on maximum decentralisation:
– all detector-specific software developed at outside institutes
– few people directly involved today
• The offline team is responsible for:
– central coordination, software distribution and framework development
– prototyping
Offline Team Tasks

• I/O development:
– interface to DAQ
– distributed computing (PROOF)
– interface to tape
– data challenges
• Offline framework:
– ROOT installation and maintenance
– main program implementation
– container classes
– control room user interface (run control, DQM, etc.)
• Database development (catalogue and conditions):
– installation
– maintenance of constants
– interface
Offline Team Tasks (2)

• Offline coordination:
– DAQ integration
– coordinate the work of the detector software subgroups (reconstruction, calibration, alignment, geometry DB, histogramming)
– Montecarlo integration (data structure, geometry DB)
– supervise the production of collaborating classes
– receive, test and commit the code from the individual subgroups
Offline Team Tasks (3)

• Event display
• Computing coordination:
– hardware installation
– users and queue coordination
– supervise the Tier-1, Tier-2 and Tier-3 computing facilities
• DQM:
– at sub-event level, responsibility lies with the subdetector groups
– the full event is the responsibility of the offline team
• Documentation
Software Project Organisation
[Diagram: the Offline Team interacts with representatives from the Institutions, from the Detector Subgroups (DCH, EMC and TOF: reco, calib & histo, plus Geometry and Database/Production) and from the Montecarlo Team (Digi, GEANT, Generator, Geometry DB)]
Manpower Estimate

Activity                     Profile  2004  2005  2006  2007
Off-line Coordination        PH       0.8   0.8   0.8   0.8
Off-line Framework Develop.  PH/CE    1.2   1.2   1.2   1.2
Global Reco Develop. (?)     PH       1.0   1.0   1.0   1.0
Databases (Design & Maint)   PH       1.0   1.0   1.0   1.0
QA & Documentation           PH/CE    0.5   0.5   0.7   1.0
I/O Development              CE/PH    1.0   1.0   1.0   0.5
MDC                          CE             0.5   0.5   0.5
Event Display & UI           CE             1.0   1.0   0.5
Syst. + web support          CT       0.5   1.0   1.0   2.0
Physics Tools                                     1.0   1.0
Production                   PH/CE                      1.0
Total Needed                          6.5   9.0   9.2   9.0
Proposal

• Migrate immediately to C++
– immediately abandon PAW
– but accept GEANT3.21 (initially)
• Adopt the ROOT framework
– not worried about being dependent on ROOT
– much more worried about being dependent on G4, Objy, …
• Impose a single framework
– provide central support, documentation and distribution
– train users in the framework
• Use ROOT I/O as the basic database, in combination with an RDBMS (MySQL, Oracle?) for the run catalogue