CHEP 2004, Core Software Integration of POOL into three Experiment Software Frameworks Giacomo Govi...


CHEP 2004, Core Software

Integration of POOL into three Experiment Software Frameworks

Giacomo Govi
CERN IT-DB & LCG-POOL

K. Karr, D. Malon, A. Vaniachine (Argonne National Laboratory)
P. Van Gemmeren (BNL)
R. Chytracek, D. Duellmann, M. Frank, M. Girone, G. Govi, V. Innocente, P. Mato Vila, J. Moscicki, I. Papadopoulos, H. Schmuecker (CERN)
R D Schaffer (LAL-IN2P3)
Z. Xie (Princeton University)
T. Barrass (University of Bristol)
C. Cioffi (University of Oxford)
W. Tanenbaum (Fermi National Accelerator Laboratory)


OUTLINE

• POOL Project mandate

• ATLAS, CMS, LHCb integration

• Commonalities and differences

• Learning from integration experience

• Conclusions


POOL project mandate

• Provide a framework for C++ object persistency
- API neutral on storage technology
- ROOT and RDBMS backends
- File access integrated with current GRID technologies

• Follow up experiment-specific requirements
- Extract a 'synthesis' among many (overlapping) use cases
- Resolve conflicting requirements
- Find common solutions

• Encourage concrete experiment participation
- Include experiment members in the POOL core developers team
- Follow up quick integration of POOL releases in the experiment frameworks
- Involve the experiments in the validation phase of the release process

• Re-use experience from previously adopted persistency-related technologies: Objectivity, Gaudi, RD45
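The idea of an API that stays neutral with respect to the storage technology can be illustrated with a small sketch. It is written in Python for brevity and does not use any POOL class: `StorageBackend`, `InMemoryFileBackend`, `SqliteBackend` and `store_and_reload` are hypothetical names, the file-like backend is only an in-memory stand-in, and the relational one uses the standard-library `sqlite3` module.

```python
import json
import sqlite3
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical technology-neutral interface (not a POOL class)."""

    @abstractmethod
    def write(self, container: str, obj: dict) -> int:
        """Store obj in the named container and return its object id."""

    @abstractmethod
    def read(self, container: str, object_id: int) -> dict:
        """Return the object stored under (container, object_id)."""


class InMemoryFileBackend(StorageBackend):
    """Stand-in for a file-based backend; keeps serialized entries in lists."""

    def __init__(self):
        self._containers = {}

    def write(self, container, obj):
        entries = self._containers.setdefault(container, [])
        entries.append(json.dumps(obj))
        return len(entries) - 1

    def read(self, container, object_id):
        return json.loads(self._containers[container][object_id])


class SqliteBackend(StorageBackend):
    """Stand-in for a relational backend, using an in-memory SQLite database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE objects "
            "(id INTEGER PRIMARY KEY, container TEXT, payload TEXT)"
        )

    def write(self, container, obj):
        cursor = self._db.execute(
            "INSERT INTO objects (container, payload) VALUES (?, ?)",
            (container, json.dumps(obj)),
        )
        return cursor.lastrowid

    def read(self, container, object_id):
        row = self._db.execute(
            "SELECT payload FROM objects WHERE container = ? AND id = ?",
            (container, object_id),
        ).fetchone()
        if row is None:
            raise KeyError((container, object_id))
        return json.loads(row[0])


def store_and_reload(backend: StorageBackend, obj: dict) -> dict:
    """Client code depends only on the neutral interface, not on the backend."""
    object_id = backend.write("Events", obj)
    return backend.read("Events", object_id)
```

The client function is unchanged whichever backend is passed in, which is the property the mandate asks for.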


ATLAS: The overall approach of the experiment framework on object persistency

From the Athena/Gaudi framework point of view, POOL is just a new I/O "technology". This implies writing a new conversion service.

Main components:
• AthenaPoolCnvSvc - conversion service
• AthenaPoolConverter - converter base class
• T_AthenaPoolCnv<T> - templated converters
• PoolSvc - Athena/Gaudi service interface to POOL
- Allows jobOptions
• DataHeader - stores the refs of the Event Data Objects
- Ref to DataHeader is inserted in the event collection

ATLAS has simplified the user interface by allowing "generic" converters:
• Use a templated converter and generate the necessary classes to create the converter automatically
• The user just needs to specify a ".h" file for each DataObject (POOL ref'ed object) to be stored
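The "generic converter" idea, one converter implementation that serves every declared data type, can be sketched as follows. This is a Python illustration under stated assumptions, not the ATLAS code: `GenericConverter`, `declare_persistent`, `converter_for` and `TrackCollection` are hypothetical names, and a decorator stands in for the step of listing a header file.

```python
from dataclasses import asdict, dataclass, fields
from typing import Generic, Type, TypeVar

T = TypeVar("T")


class GenericConverter(Generic[T]):
    """Hypothetical converter that works for any dataclass type."""

    def __init__(self, cls: Type[T]):
        self.cls = cls

    def to_persistent(self, obj: T) -> dict:
        # The persistent form is a plain record tagged with the type name.
        return {"type": self.cls.__name__, "data": asdict(obj)}

    def to_transient(self, record: dict) -> T:
        # Rebuild the transient object from the members the class declares.
        names = {f.name for f in fields(self.cls)}
        return self.cls(**{k: v for k, v in record["data"].items() if k in names})


_converters = {}


def declare_persistent(cls):
    """Registers a generic converter; the user writes no converter code."""
    _converters[cls.__name__] = GenericConverter(cls)
    return cls


def converter_for(type_name: str) -> GenericConverter:
    return _converters[type_name]


@declare_persistent
@dataclass
class TrackCollection:
    n_tracks: int
    label: str
```

Declaring a new data type is then a one-line change, while the conversion logic stays in a single place.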


ATLAS: The overall approach of the experiment framework on object persistency (diagram)

[Figure: component diagram. Labels: Algorithm; retrieve(ptr, "key") or record(ptr, "key"); Transient Data Store (ADataObj, IDataObj, SubObj1, SubObj2, SubObj3); Data Service; Conversion Service; Persistency Service; PoolSvc; AthenaPoolCnvSvc; POOL::DataSvc; POOL::FileCatalog; ROOT files. Legend: Athena services; POOL-specific Athena services; POOL services; generic conversion.]


ATLAS: The POOL components used

• POOL DataSvc is for now the entry point for object persistency
- Two caches: input/output; use Ref<T> for object I/O
- Cache functionality is not strictly needed => duplicates the Athena/Gaudi transient store
- Object lifetime managed by the Athena/Gaudi transient store

• For event storage, using RootStorageSvc and implicit collections
• Some conditions storage using RootStorageSvc

• Support both Tree-based and Key-based ROOT storage, selectable in Athena via JobOptions
- Using ROOT Trees as default for now to gain experience
- Key-based approach is similar to what has already been tested in ATLAS via Objectivity
- Expect to also use Object/Relational storage when available

• Using XML catalogs for local data access, EDG RLS and Globus RLS for master file catalogs

• For tag collections: using both ROOT and MySQL collections
• Currently deploying Oracle DB for detector description parameters via the Relational Access Layer
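The role of a reference that performs object I/O through a cache can be pictured with a minimal sketch. It is a Python illustration only and does not reproduce the POOL `Ref<T>` interface: `LazyRef`, `ObjectCache` and the `(file, container, entry)` token are assumptions made for this example.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical addressing scheme: (file name, container name, entry number).
Token = Tuple[str, str, int]


class ObjectCache:
    """Keeps objects that have already been read, keyed by their token."""

    def __init__(self, loader: Callable[[Token], object]):
        self._loader = loader
        self._objects: Dict[Token, object] = {}
        self.loads = 0  # number of reads that actually reached storage

    def get(self, token: Token) -> object:
        if token not in self._objects:
            self._objects[token] = self._loader(token)
            self.loads += 1
        return self._objects[token]


class LazyRef(Generic[T]):
    """Reference that reads its target only when it is first dereferenced."""

    def __init__(self, cache: ObjectCache, token: Token):
        self._cache = cache
        self._token = token

    def get(self) -> T:
        return self._cache.get(self._token)
```

When the framework already owns a transient store, as the slide notes for Athena/Gaudi, a cache of this kind holds a second copy of the same bookkeeping, which is the duplication mentioned above.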


Use of POOL in CMS: current status

• COBRA, OSCAR, ORCA
- Current version being tested using the latest POOL 2.0.0 internal release
- Production still uses POOL 1.7

• Usable for production, deployed to physicists
- Used for SW tutorials each Friday since autumn 2003
- 35 million events produced with OSCAR (G4 simulation)

• Essentially the same functionality as the previous Objectivity-based code

• Limitations
- No concurrent update of databases
  ▫ No direct connection to central database while running
- Remote access limited to RFIO or dCache (soon GFAL?)
- No schema evolution

• Added values
- No need for a common base class (ooObj)
- Native support of STL containers
- Support for transient attributes


Data Access in CMS

[Figure: data access diagram. Labels: Algorithm; Context (Event); pool Refs; Local transient store; Persistent store; POOL DataSvc (Object cache); Reconstruction on demand; Retrieve a chunk.]


What CMS uses from POOL

Transition from Objectivity to POOL inspired by a minimum-impact principle

• All objects (event and metadata) are stored as ROOT keyed objects (no ROOT tree)
• Only object navigation is used, no other access mechanisms
• Ref
- Full interface
• File Catalog
- Full interface
- XML implementation in physics applications
- MySQL & RLS used in production (DC04)
• Session
- Only transaction management
- No explicit Database/Container handling


LHCb Goals for POOL Integration

• Keep the existing framework architecture
– Objects are transient and reside in "Data Stores"
  • source for conversion to persistent or graphical representation
– "Algorithms" access objects by "logical name" from a data store

• Keep the existing event model description
– Code (headers) generated from XML files

• Need to access existing data
– Read data with pre-POOL software (ROOT based)

• Usage of POOL transparent to end-users (physicists)


Data Access in Gaudi Applications

[Figure: data access sequence. Labels: Algorithm; (1) retrieveObject(…), "Try to access an object data"; DataService; (2) Search in Store; Data Store; (3) Request load; PersistencyService (technology dispatcher); ConversionService(s); StorageService; POOL; (5) Register.]
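The numbered steps that can be read off the diagram, (1) retrieveObject, (2) search in store, (3) request load and (5) register, can be sketched as a load-on-demand data service. This is a Python illustration, not Gaudi code: the class and method names below only echo the diagram labels, the technology names `POOL_ROOT` and `LEGACY_ROOT` are invented, and the conversion step between (3) and (5) is an assumption because its label did not survive in the transcript.

```python
class ConversionService:
    """Hypothetical converter for one storage technology, backed by a dict."""

    def __init__(self, technology, persistent_data):
        self.technology = technology
        self._persistent_data = persistent_data
        self.load_requests = []  # paths that were actually read from storage

    def create_object(self, path):
        self.load_requests.append(path)
        return self._persistent_data[path]


class PersistencyService:
    """Technology dispatcher: forwards a load to the matching converter."""

    def __init__(self):
        self._conversion_services = {}
        self._technology_of = {}

    def add_conversion_service(self, service):
        self._conversion_services[service.technology] = service

    def declare(self, path, technology):
        self._technology_of[path] = technology

    def load(self, path):
        service = self._conversion_services[self._technology_of[path]]
        return service.create_object(path)


class DataService:
    """Serves objects by logical path and loads them on first access."""

    def __init__(self, persistency):
        self._store = {}  # the transient data store
        self._persistency = persistency

    def retrieve_object(self, path):          # (1) called by the algorithm
        if path in self._store:               # (2) search in store
            return self._store[path]
        obj = self._persistency.load(path)    # (3) request load
        self._store[path] = obj               # (5) register in the store
        return obj


def build_example():
    pool_like = ConversionService("POOL_ROOT", {"/Event/MC/Particles": ["mu+", "mu-"]})
    legacy = ConversionService("LEGACY_ROOT", {"/Event/Header": {"run": 1}})
    persistency = PersistencyService()
    persistency.add_conversion_service(pool_like)
    persistency.add_conversion_service(legacy)
    persistency.declare("/Event/MC/Particles", "POOL_ROOT")
    persistency.declare("/Event/Header", "LEGACY_ROOT")
    return DataService(persistency), pool_like, legacy
```

Because the algorithm only sees `retrieve_object`, a second technology can sit behind the dispatcher without the caller noticing, which matches the goal of keeping POOL transparent to end users.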


Customization of POOL for Gaudi

• Currently main technology for event storage
– Write event data in ROOT tree-mode
– Detector data etc. not (yet) implemented

• POOL components used
– FileCatalog (XML flavor), PersistencySvc, StorageSvc + ROOT backend implementation
– Collections ??

• Usage of Gaudi object cache
– Efficiently managed by the Gaudi framework
  Event / time interval (detector description), …
– Tree-like structure, like a file system: "/Event/MC/Particles"
– Consequence on reference implementation

• Dictionaries generated from XML event description
– Non trivial: dictionary completeness


Commonalities & differences I

• Common starting point: re-use code from existing frameworks
- Minimize the impact of the integration / migration to POOL supported technologies
- Results in three rather different integration approaches
- POOL API design highly influenced by experiment-specific requirements driven by this principle

• ROOT backend adopted as main storage technology for event data
- With tree-based container (ATLAS, LHCb)
- With key-based container (CMS)
- Common interest in the future development of the RDBMS backend

• Common choice of file bookkeeping through POOL catalogues
- XML catalog adopted in the three production chains
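File bookkeeping through a catalogue amounts to resolving a logical file name to a file identifier and one or more physical file names. The sketch below shows such a lookup with Python's standard `xml.etree.ElementTree` module. The XML layout, the identifiers and the file names are illustrative assumptions modelled on the general idea of an XML file catalogue, and `load_catalog` and `resolve` are hypothetical helpers rather than POOL functions.

```python
import xml.etree.ElementTree as ET

# Illustrative catalogue content; element names, IDs and paths are assumptions.
CATALOG_XML = """
<POOLFILECATALOG>
  <File ID="EXAMPLE-GUID-0001">
    <physical>
      <pfn filetype="ROOT_All" name="/data/run1/events.root"/>
      <pfn filetype="ROOT_All" name="rfio:/castor/example/run1/events.root"/>
    </physical>
    <logical>
      <lfn name="run1_events"/>
    </logical>
  </File>
  <File ID="EXAMPLE-GUID-0002">
    <physical>
      <pfn filetype="ROOT_All" name="/data/run2/events.root"/>
    </physical>
    <logical>
      <lfn name="run2_events"/>
    </logical>
  </File>
</POOLFILECATALOG>
"""


def load_catalog(xml_text):
    """Index the catalogue by logical file name."""
    by_lfn = {}
    for file_elem in ET.fromstring(xml_text).findall("File"):
        pfns = [pfn.get("name") for pfn in file_elem.findall("physical/pfn")]
        for lfn in file_elem.findall("logical/lfn"):
            by_lfn[lfn.get("name")] = {"id": file_elem.get("ID"), "pfns": pfns}
    return by_lfn


def resolve(catalog, logical_name):
    """Return the physical replicas registered for a logical file name."""
    return catalog[logical_name]["pfns"]
```

A job that refers to data only by logical name can then run unchanged against a local XML file or a central catalogue service, provided both expose the same lookup.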


Commonalities & differences II

• Object caching and navigation
- ATLAS: Integrate the POOL Ref API with the Athena object store, using a customized ownership policy
- CMS: Replace Objectivity with the POOL OO-db API (similar rules for navigation)
  Both approaches are object-based and leave the database and container management implicit.
- LHCb: Integrate a lower-level component (PersistencySvc). Object bookkeeping and navigation are left to the Gaudi framework.
  A service-based approach. Database and container management are explicitly controlled.


Learning from Integration Experiences

Differences are large in the object transient store and navigation area.
- The navigation mechanism strongly influences the cache features (object lifetime management, semantics for object association)
- Some experiment frameworks already had specific object bookkeeping services, not easily re-usable for common purposes
- A real decoupling of the Ref implementation from the cache details is difficult to achieve
- A review of the top-level architecture and API (done in 2003 with the experiments) could not find a common solution


Remarks

• ATLAS
– Some complex components of the data model and some StoreGate constructs were initially difficult to persistify in POOL
– Some compromises made; the ATLAS event data model is for the most part persistifiable in POOL today (sufficient for the Data Challenge)

• CMS
– Integration started early with the first POOL releases: important contribution to debugging and consolidation
– ROOT-based streaming in POOL not optimized for UPDATE operations

• LHCb
– High level of customization of the POOL API required to minimize integration impact
– POOL (or ROOT storing complex objects) needs considerably more CPU than the simple Gaudi object serialization (based on BLOB structures)
– But: ROOT provides schema evolution; BLOB serialization did not
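The trade-off in the last two LHCb remarks can be made concrete with a toy comparison. A fixed-layout binary blob is cheap to write but breaks when a data member is added, while a self-describing record costs more per object but can be read by a newer class version. The Python sketch below uses only `struct` and `json`; it is an analogy, not a description of how ROOT or Gaudi implement serialization, and the layouts and function names are hypothetical.

```python
import json
import struct

# Version 1 of a hypothetical class has two float members (x, y).
# Version 2 adds a third member (z).
V1_LAYOUT = "<dd"
V2_LAYOUT = "<ddd"


def write_blob_v1(x, y):
    """Fixed-layout blob: compact, but the layout is known only to the code."""
    return struct.pack(V1_LAYOUT, x, y)


def read_blob_v2(blob):
    """Version 2 reader; raises struct.error on data written by version 1."""
    return struct.unpack(V2_LAYOUT, blob)


def write_described_v1(x, y):
    """Self-describing record: member names travel with the data."""
    return json.dumps({"x": x, "y": y}).encode("utf-8")


def read_described_v2(data):
    """Version 2 reader that fills a default for the member added later."""
    members = json.loads(data)
    return (members["x"], members["y"], members.get("z", 0.0))
```

The self-describing form does more work for every object it reads or writes, which is in line with the extra CPU cost noted above, but old data stays readable after the class changes.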


Conclusions

• POOL has been successfully integrated in three of the LHC experiment software frameworks and used in data challenges
- The common solution provided satisfies requirements for production
- As required by all experiments, the impact of the POOL integration on the existing frameworks has been kept reasonably low

• Integration approaches differ in the object navigation area
- POOL component usage follows the different requirements of the experiment frameworks
- Some areas of duplication are still present between POOL and the experiment frameworks

• The core POOL components are all used by at least one experiment

• Common view looking forward
- Improve performance (where possible) for the ROOT backend
- Provide an RDBMS backend