
Persistency at LHC

Vincenzo Innocente, CERN

History is as old as Persistency

April 18, 2023 Vincenzo Innocente LCB workshop


Sources and Contributions

  • Presentations at last RD45 workshop
  • Presentations at the “Architecture Working Group”
  • Experiments’ Web pages
  • Contributions to this Workshop

  • Focus on LHC experiments’ prototypes
  • New generation experiments (BaBar, STAR, RunII): experience and plans

Persistency in General


Persistency: what for?

[Figure: Process 1, Process 2, Process 3; Volatile Memory; Permanent Storage]

A process saves its state to be later re-used by
  • the same process
  • a different process running the same executable
  • a different process running a different executable

Ideal persistency: Core Dump!


Use Cases

  • Extended (in space and time) virtual memory:
      • proprietary format optimized for computational and storage performance of a single application
  • Import/Export in a heterogeneous environment
      • “standard” application-independent format
      • conversion to/from internal application format
  • Management of different versions (identification, query mechanism) and of concurrency (locking)
      • proprietary internal mechanism
      • rely on the file system
      • DBMS

Conversion is always required. What makes the difference is the level at which it is done:
  • Operating System (or below)
  • Persistency Service Provider
  • Application Framework
  • Application Code

Doing it at a given level does not imply that it has not also been done at a lower level.

Doing it at higher levels introduces flexibility but reduces performance.

Doing it at a lower level improves performance but requires high integration (binds to a given solution).

Caveat: concurrency is not only for banks... “Myprog.cc changed on disk; really edit the buffer?” (emacs, not Oracle)
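As an illustration of conversion done at the application-code level, here is a minimal Python sketch. The two-field record (run number, energy) is a hypothetical example, not taken from the talk. The record is exported to a fixed big-endian layout that plays the role of the “standard” application-independent format, and imported back.

```python
import struct

# Hypothetical record for illustration: (run number, energy).
record = (123, 45.5)

# "Standard" application-independent format: explicit big-endian, no padding.
STANDARD = struct.Struct(">id")
# Machine-dependent format: native byte order, size and alignment.
NATIVE = struct.Struct("@id")


def export_record(rec):
    """Convert from the internal representation to the standard format."""
    return STANDARD.pack(*rec)


def import_record(buf):
    """Convert from the standard format back to the internal representation."""
    return STANDARD.unpack(buf)


exported = export_record(record)
restored = import_record(exported)
# The native size may differ from the standard one because of alignment padding.
print(len(exported), NATIVE.size, restored)
```

The native layout is what a core dump would contain; the standard layout is what another machine or executable can read without knowing the writer's platform.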


Object Persistency

[Figure: Event objects; Volatile Memory; Permanent Storage]

Objects
  • are atomic entities
  • have a state (data members including relationships)
  • provide services (methods)

Persistent objects survive process boundaries: when “retrieved” they
  • have the same state
  • provide the same services
as when they were “stored”


Object Persistency

  • Persistency
      • Objects retain their state between two program contexts
  • Storage entity is a complete object
      • State of all data members
      • Object class
  • OO Language Support
      • Abstraction
      • Inheritance
      • Polymorphism
      • Parameterised Types (Templates)


OO Language Binding

  • User had to deal with copying between program and I/O representations of the same data
  • User had to traverse the in-memory structure
  • User had to write and maintain specialised code for I/O of each new class/structure type

Tight Language Binding
  • ODBMS allow persistent objects to be used directly as variables of the OO language
  • C++, Java and Smalltalk (heterogeneity)
  • I/O on demand: no explicit store & retrieve calls


Problems with Naïve OP

  • Storing services (methods ready to run) is non-trivial
      • persistency services are just object-data store
      • configuration management takes care of code
      • frameworks can use dynamic loading to match data & code
  • Clean and performant object design is difficult:
      • Different (partial) representations of the state of an object may be required to cope with computational, storage and I/O efficiencies (and code development efficiency)
  • Object design and implementation evolve, persistent objects stay the same
      • “Old” persistent objects need to be converted


More Problems with Naïve OP

  • Object granularity does not match raw I/O granularity (which in turn is device dependent)
      • small objects should be physically clustered according to users’ access patterns
  • Object logical relationships do not necessarily reflect access patterns (old rows vs columns dilemma)
  • How objects become persistent
      • At construction time (user can control clustering)
      • By reachability: an object becomes persistent when “attached” to an already persistent object (clustering control difficult)
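Persistence by reachability can be sketched as a graph traversal. The `Node` class and the dict-based `store` below are hypothetical stand-ins, not the API of any ODBMS; the sketch only shows that everything reachable from a persistent root ends up in the store, in traversal order rather than in an order chosen by the user.

```python
class Node:
    """Hypothetical object holding references to other objects."""

    def __init__(self, name):
        self.name = name
        self.refs = []


def persist_by_reachability(root, store):
    """Add root and every object reachable from it to store (a dict keyed by id)."""
    stack = [root]
    while stack:
        obj = stack.pop()
        if id(obj) in store:
            continue  # already persistent; this also guards against cycles
        store[id(obj)] = obj
        stack.extend(obj.refs)
    return store


event = Node("event")
track = Node("track")
hit = Node("hit")
orphan = Node("orphan")      # never attached, so it stays transient
event.refs.append(track)
track.refs.append(hit)
hit.refs.append(event)       # a cycle back to the root

store = persist_by_reachability(event, {})
names = sorted(obj.name for obj in store.values())
print(names)
```

Because placement follows the traversal, the user has little say in which objects end up next to each other, which is the clustering difficulty noted above.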


Physical Model and Logical Model

  • Physical model may be changed to optimise performance
  • Existing applications continue to work


Realistic Object Persistency

[Figure: file, page, object(s). Annotations: compression? conversion from/to computational optimal format? conversion from/to machine dependent format; new shape]


Components of a POM

  • Storage manager
      • manages the physical structure on “disk”
  • Transaction/concurrency manager
      • client transaction, journaling, locking mechanisms (or rely on OS and file system protections)
  • RTTI system
      • identifies the concrete type of object to retrieve/store
  • Converters
      • from storage format to “user” format and vice versa
      • machine dependencies, schema evolution, user hooks
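How the RTTI system and the converters cooperate can be sketched as follows. The `Track` class, the `CONVERTERS` registry and the JSON storage format are all assumptions made for illustration; a real POM would use its own type identifiers and binary layout.

```python
import json

# Registry: class ID -> (converter to storage format, converter to user format).
CONVERTERS = {}


def register(class_id, to_storage, from_storage):
    CONVERTERS[class_id] = (to_storage, from_storage)


class Track:
    """Hypothetical user class."""

    def __init__(self, pt, charge):
        self.pt = pt
        self.charge = charge


register(
    "Track",
    lambda t: {"pt": t.pt, "charge": t.charge},
    lambda d: Track(d["pt"], d["charge"]),
)


def store(obj):
    class_id = type(obj).__name__          # the "RTTI" step: find the concrete type
    to_storage, _ = CONVERTERS[class_id]
    return json.dumps({"class_id": class_id, "state": to_storage(obj)})


def retrieve(blob):
    record = json.loads(blob)
    _, from_storage = CONVERTERS[record["class_id"]]
    return from_storage(record["state"])   # rebuild an object of the right class


blob = store(Track(1.5, -1))
track = retrieve(blob)
print(type(track).__name__, track.pt, track.charge)
```

A user hook or a schema-evolution step would live inside the `from_storage` converter in this picture.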


Components of a POM

  • Application Cache manager
      • dynamic memory management with garbage collection
  • Tools and (G)UI
      • naming, indexing, query mechanisms
      • interactive browsing and query
      • development tools
      • administration tools


Objectivity/DB

  • ODBMS close to ODMG standard (library not framework)
  • Storage Manager based on fixed physical hierarchy
      • slot-page-container-database(file)-federation
  • Lock-server and journals to manage transactions
  • Proprietary parsing of extension of C++ (ooddlx)
  • Objects are converted when “opened”
      • schema-evolution effects: automatic or user defined
  • Basic naming, indexing and query mechanisms
  • Crude browsing and administration tools
      • but Objy is integrated with some third-party frameworks


ROOT

  • Application Framework with embedded I/O
  • Storage Manager based on logical hierarchy
      • Tbasket-branch-tree
      • physical “logical-records” in files
  • No transactions, no concurrency management
  • Parsing of C++ subset via CINT
  • Objects are converted when retrieved (Streamer)
      • Automatically or by user (schema-evolution only by user)
  • Basic naming, indexing or query mechanisms
      • and CINT scripting
  • “Paw”erful interactive environment


(Wrapped O)RDBMS

  • Powerful, reliable and efficient storage managers with full concurrency and transaction management
  • SQL query mechanisms with transparent (hidden) indexing and naming
  • User friendly, fully integrated browsers and tools (for relational tables)
  • Poor object integration (developers should be both OO and ER experts at the same time)

Persistency in HEP


HEP Data

  • Environmental data
      • Detector and Accelerator status
      • Calibrations, Alignments
  • Event-Collection Meta-Data (luminosity, selection criteria, …)
  • …
  • Event Data, User Data

[Figure: Event Collection, Collection Meta-Data, Event, Electrons, Tracks, Tracker Alignment, Ecal calibration, User Tag (N-tuple)]


Environmental Data

[Figure: versions (A, B, C) of Calibration, Alignment, Geometry and Parameters along a time axis. Caption: Snapshot for Environmental data items valid for the currently processed event.]
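The snapshot idea can be sketched as a lookup by validity time. The `VersionedItem` class, the start times and the rule that a version stays valid until the next one begins are all assumptions made for illustration; the talk does not specify how validity intervals are stored.

```python
import bisect


class VersionedItem:
    """Hypothetical environmental data item: each version is valid from a start time."""

    def __init__(self):
        self._starts = []    # sorted validity start times
        self._versions = []  # version valid from the matching start time

    def add(self, start, version):
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._versions.insert(i, version)

    def valid_at(self, t):
        # Last version whose start time is <= t.
        i = bisect.bisect_right(self._starts, t) - 1
        if i < 0:
            raise LookupError("no version valid at time %r" % (t,))
        return self._versions[i]


calibration = VersionedItem()
calibration.add(0, "A")
calibration.add(100, "B")

alignment = VersionedItem()
alignment.add(0, "A")
alignment.add(50, "B")
alignment.add(150, "C")


def snapshot(event_time):
    """Collect the versions valid for the event being processed."""
    return {
        "Calibration": calibration.valid_at(event_time),
        "Alignment": alignment.valid_at(event_time),
    }


print(snapshot(120))
```

The event only needs its time stamp; which calibration or alignment version applies is resolved by the lookup, so environmental data can be updated without touching event data.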


Event Structure & Placement (BaBar)

[Figure: EventHeader; component headers SimHeader, RawHeader, EmcHeader, TrkHeader, PidHeader, BetaHeader; data objects SimData, RawData, EmcData, TrkData, PidData, BetaData; Tag, TagEvs; databases Hdr, Sim, Raw, Rec, Esd, Aod]


BaBar Event Structure

  • Decoupling of placement & navigation
  • Hierarchical Placement Regions
      • Sim (Simulated Data) ~100 kBytes/event
      • Tru (Simulated Truth Data) ~40 kBytes/event
      • Raw (Raw Data) ~30 kBytes/event
      • Rec (Reconstructed Data) ~100 kBytes/event
      • Esd (Event Summary Data) ~20 kBytes/event
      • Aod (Analysis Object Data) ~2 kBytes/event
      • Tag (Event Selection Tag) ~200 Bytes/event
  • Navigation Trees
      • Minimize size of navigation headers
      • Allow for expansion of data without schema evolution
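To see what the placement regions buy, the per-event sizes above can be turned into data volumes. The sample of one million events and the convention 1 kByte = 1000 bytes are assumptions chosen only for this illustration.

```python
# Approximate per-event sizes in bytes, taken from the list above
# (assuming 1 kByte = 1000 bytes).
SIZES = {
    "Sim": 100_000,
    "Tru": 40_000,
    "Raw": 30_000,
    "Rec": 100_000,
    "Esd": 20_000,
    "Aod": 2_000,
    "Tag": 200,
}
N_EVENTS = 1_000_000  # assumed sample size, for illustration only


def volume_gb(regions, n_events=N_EVENTS):
    """Volume in GB that must be read to access the given regions."""
    return sum(SIZES[r] for r in regions) * n_events / 1e9


tag_only = volume_gb(["Tag"])
tag_aod = volume_gb(["Tag", "Aod"])
everything = volume_gb(SIZES)
print(tag_only, tag_aod, everything)
```

Under these assumptions a selection on the Tag region reads roughly a thousandth of what a pass over all regions would read, which is the motivation for placing the regions separately.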


Root Physical Clustering


Dynamic Load Balancing, Hierarchical Secure AMS

[Figure: Dynamic Selection]


ODBMS-MSS Integration

SLAC-Objy Plan
  • Extensible AMS
      • Allows use of any type of filesystem via oofs layer
  • Generic Authentication Protocol
      • Allows proper client identification
  • Opaque Information Protocol
      • Allows passing of hints to improve filesystem performance
  • Defer Request Protocol
      • Accommodates hierarchical filesystems
  • Redirection Protocol
      • Accommodates terabyte+ filesystems
      • Provides for dynamic load balancing


One Technology for All?

  • Event catalogues
      • Update (add and remove) items of a catalogue
      • Searchable: SQL or equivalent
  • Event data
      • Write once-read many (WORM)
      • Often on tertiary (sequential) storage
      • Bulk data used by the entire collaboration (Raw, Rec, …)
      • User extracted data (N-tuples)


One Technology for All?

  • Detector data
      • Updates of data items
      • Versioning of data items
      • Version configuration
  • Statistical data
      • Understandable by interactive tools

A single coherent solution (non-optimal for all purposes), or an ad-hoc optimal product for each given type?


LHCb Event Persistency

[Figure: Algorithms, Transient Event Store, Event DataService, PersistencyService, OutputStream, AppManager; SicbCnvSvc with Converters (Sicb/Zebra) and Sicb data files; RootCnvSvc with Converters (Root I/O) and Root data files]


LHCb Generic Persistent Model

[Figure: 12-byte OID with fields Storage Type, Class ID, Entry ID, Link ID, numbered (1) to (4); labels Technology and Converter; lookup table with columns Link ID and Link Info (DB/Cont. name, ...)]


LHCb Link Tables

  • One Link table per Storage technology per DB
  • Link to Objy object
      • no link table: 8 Bytes are enough to hold ooRef directly
  • Link to ROOT object
      • Link table entry must contain all navigation info
          • File name
          • Tree/Branch name
  • Link to ZEBRA (SICB) object
      • Link Table contains file name + ZEBRA bank name
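The role of the link tables can be sketched as a small lookup step between a compact reference and the navigation information of each technology. The `Ref` tuple, the table contents and the file, branch and bank names are invented for illustration and are not the actual LHCb layout.

```python
from collections import namedtuple

# Hypothetical compact reference: technology, link-table row, entry in the container.
Ref = namedtuple("Ref", "storage_type link_id entry_id")

# One link table per storage technology (per DB); the contents are made up.
LINK_TABLES = {
    "ROOT": {1: {"file": "run123.root", "tree_branch": "Event/Tracks"}},
    "ZEBRA": {1: {"file": "run123.sicb", "bank": "TRAK"}},
}


def resolve(ref):
    """Return the navigation info needed to locate the referenced object."""
    if ref.storage_type == "OBJY":
        # No link table: the reference is assumed to carry the ooRef itself.
        return {"ooref": ref.entry_id}
    info = LINK_TABLES[ref.storage_type][ref.link_id]
    return dict(info, entry=ref.entry_id)


root_loc = resolve(Ref("ROOT", 1, 42))
zebra_loc = resolve(Ref("ZEBRA", 1, 7))
objy_loc = resolve(Ref("OBJY", 0, 0x1234))
print(root_loc, zebra_loc, objy_loc)
```

Keeping file and branch names in the table rather than in every reference is what lets the reference itself stay small and technology independent.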


Hybrid Event Store in STAR

  • Adoption of ROOT I/O for the event store leaves Objectivity with one role left to cover: the true ‘database’ functions of the event store
      • Navigation among event collections, runs/events, event components
      • Data locality (now translates basically to file lookup)
      • Management of dynamic, asynchronous updating of the event store from one end of the processing chain to the other
          • From initiation of an event collection (run) in online through addition of components in reconstruction, analysis and their iterations
  • But with the concerns and weight of Objectivity it is overkill for this role.
  • So we went shopping…
      • looking to leverage the world around us, as always
      • and eyeing particularly the rising wave of Internet-driven tools and open software
  • and came up with MySQL in May.


Data Retrieval and Analysis: MySQL + Files

[Figure: user requests (Run123, Yr1Central); MySQL data catalog; dataset lookup; dataset components (file references with event ranges); Grand Challenge managed retrieval; HPSS; ROOT disk files labelled RAW, DAQ, DST, DST hits, Flow tag, Flow uDST, High Pt tag, High Pt uDST, Analysis. New components and tags are created as new files and added to the catalog (to the original or a new dataset).]

or

File based data store
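The catalog lookup in the figure, from a dataset name to file references with event ranges, can be sketched with a relational table. This sketch uses Python's built-in `sqlite3` in place of MySQL so that it runs stand-alone; the table layout, column names and file names are assumptions, not the STAR schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE component (
           dataset TEXT, kind TEXT, file TEXT,
           first_event INTEGER, last_event INTEGER)"""
)
# Hypothetical dataset components: file references with event ranges.
db.executemany(
    "INSERT INTO component VALUES (?, ?, ?, ?, ?)",
    [
        ("Run123", "DST", "run123_dst_0.root", 1, 500),
        ("Run123", "DST", "run123_dst_1.root", 501, 1000),
        ("Run123", "tag", "run123_tag.root", 1, 1000),
    ],
)


def files_for(dataset, kind, event):
    """Files of the given component kind that contain the given event."""
    rows = db.execute(
        "SELECT file FROM component "
        "WHERE dataset = ? AND kind = ? "
        "AND ? BETWEEN first_event AND last_event ORDER BY file",
        (dataset, kind, event),
    )
    return [row[0] for row in rows]


hits = files_for("Run123", "DST", 750)
print(hits)
```

Adding a new component or tag file is then a single row insert, which matches the figure's note that new files are simply added to the catalog.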

Experiments’ Status and Plans


ATLAS

  • Used Objectivity in several test-bed applications
      • HCAL test-beam
      • ATLFAST++
      • 1TB Milestone (HPSS used as MSS)
  • Plan to use Objectivity in future test-beams and MonteCarlo reconstruction
  • The application framework will provide a “database” independent interface


CMS

  • Uses Objectivity in production
      • Test Beam DAQ
      • Montecarlo (GEANT3) reconstruction
  • Objectivity fully integrated in Application Framework (CARF)
      • CARF manages transactions, physical clustering and the whole persistent object structure and its relations with the transient structure
      • users access persistent objects through C++ pointers
          • CARF takes care of pinning
      • leaf inheritance from ooObj often used


CMS

  • Limited use of Objectivity “extensions”
      • associations, indexes, maps, query predicates, etc.
      • object copy, move, versions
  • Schema evolution routinely used
      • No complex object conversion attempted so far
  • Multi-federation environment to decouple
      • production
      • analysis
      • development


ALICE

  • Simulation and reconstruction framework fully integrated in ROOT
  • Used in MonteCarlo simulation and reconstruction
  • Will be used in TestBeams
  • Mockup Data Challenge done: 7 TB in seven days
  • Use HPSS and/or CASTOR for file management
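The data challenge figure can be converted into a sustained rate with a short calculation. It assumes decimal units (1 TB = 10^12 bytes) and continuous running over the seven days, neither of which is stated in the talk.

```python
TB = 1e12                        # assumed: decimal terabyte
SECONDS_PER_DAY = 24 * 60 * 60

# 7 TB written in seven days, assuming continuous running.
rate_mb_s = 7 * TB / (7 * SECONDS_PER_DAY) / 1e6
print(round(rate_mb_s, 1), "MB/s")
```

Under these assumptions the average is a little above 10 MB/s.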


ALICE DC II

[Figure: NA57 data source (9 PowerPC AIX LDCs, switch, 5 MB/s); ALICE DAQ data source (LDCs on Intel/PC Linux + PowerPC/AIX + Sun, switch, 10 MB/s); Computer Centre switch; Intel/Linux PC cluster, 10/15 nodes; DATE = GDC + LDC; GDC Event Builder, pipe, ROOT Objectifier; HPSS, CASTOR ??; 10 MB/s; GB eth]


LHCb

  • Do not want to limit to one persistency technology
      • Speed, when you need speed
      • Functionality, when you need functionality
      • Ease migration to upcoming (superior) technologies
  • Independence
      • Well defined interface to persistency technologies
      • Interface: abstract technology independent API
      • Example: ODBC for relational DBs
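A technology-independent API can be sketched as an abstract interface with interchangeable back-ends. The class and method names below are hypothetical and are not the GAUDI interfaces; an in-memory store and a JSON file store stand in for real persistency technologies.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class PersistencyService(ABC):
    """Hypothetical technology-independent interface seen by application code."""

    @abstractmethod
    def write(self, key, state):
        """Store the state (a plain dict) under the given key."""

    @abstractmethod
    def read(self, key):
        """Return the state stored under the given key."""


class MemoryStore(PersistencyService):
    def __init__(self):
        self._data = {}

    def write(self, key, state):
        self._data[key] = dict(state)

    def read(self, key):
        return dict(self._data[key])


class JsonFileStore(PersistencyService):
    def __init__(self, directory):
        self._dir = directory

    def _path(self, key):
        return os.path.join(self._dir, key + ".json")

    def write(self, key, state):
        with open(self._path(key), "w") as f:
            json.dump(state, f)

    def read(self, key):
        with open(self._path(key)) as f:
            return json.load(f)


def roundtrip(service):
    """Application code: it never names the technology behind the service."""
    service.write("event1", {"ntracks": 3})
    return service.read("event1")


results = [roundtrip(MemoryStore()), roundtrip(JsonFileStore(tempfile.mkdtemp()))]
print(results)
```

Migrating to another technology then means adding one more subclass, while `roundtrip` and the rest of the application code stay unchanged.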


LHCb

  • LHCb application framework (GAUDI) is independent from persistent technology
  • Manages its own application caches (data services) specialized in
      • event data
      • detector data
      • statistical data
  • Abstract interface for user provided converters


BaBar

  • Taking data since May
  • Uses Objectivity for all kinds of data
      • many home-made tools to manage the database
  • Complete decoupling between transient objects (seen by end user) and their persistent representations
  • No schema evolution (explicit renaming of classes)
  • Starting to use multiple federations to decouple running environments


STAR

  • Moved away from Objectivity mainly because of configuration management issues
  • Hybrid solution:
      • ROOT for event file
      • MySQL for event catalog and environmental data
      • MySQL under test for event tags as well
  • HPSS (through Grand Challenge) for tertiary storage management


Objectivity Burdens in STAR

  • The list of burdens imposed by Objectivity grew as our experience and lessons from BaBar mounted
      • Management, development burden imposed by ensuring consistent schema in a single experiment-wide federation
      • Schema evolution unusable if forward compatibility is desired (ability to run old executables on new data)
      • Do-it-yourself access control, particularly with AMS
      • Risk of major impact from platform lock-in due to porting delays; both Linux and Sun
      • Scalability concerns (fall ‘98) -- lock manager performance issues in parallel usage?
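Forward compatibility, in the sense used above, means that a reader built against an old schema still works on data written with a newer one. The sketch below is a generic illustration with hypothetical dict-based records; it does not model how Objectivity handles schemas.

```python
def read_v1(record):
    """'Old executable': knows only the version-1 fields and ignores the rest."""
    return {"pt": record["pt"], "charge": record["charge"]}


def read_v1_strict(record):
    """Strict reader: rejects any record whose shape it does not know."""
    if set(record) != {"version", "pt", "charge"}:
        raise ValueError("unknown schema")
    return {"pt": record["pt"], "charge": record["charge"]}


# 'New data': version 2 added a field that version-1 code has never seen.
new_record = {"version": 2, "pt": 1.5, "charge": -1, "chi2": 0.9}

tolerant = read_v1(new_record)
try:
    read_v1_strict(new_record)
    strict_ok = True
except ValueError:
    strict_ok = False

print(tolerant, strict_ok)
```

The tolerant reader keeps working because it looks fields up by name and skips what it does not know; a store that ties each executable to one exact schema behaves like the strict reader.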


Requirements: STAR 8/99 View

Requirement               Obj 97    Obj 99    ROOT 97        ROOT 99
C++ API                   OK        OK        OK             OK
Scalability               OK        ?         No file mgmt   MySQL
Aggregate I/O             OK        ?         OK             OK
HPSS                      Planned   OK?       No             OK
Integrity, availability   OK        OK        No file mgmt   MySQL
Recovery from lost data   OK        OK        No file mgmt   OK, MySQL
Versions, schema evolve   OK        Your job  Crude          Almost OK
Long term availability    OK?       ???       OK?            OK
Access control            OS        Your job  OS             OS, MySQL
Admin tools               OK        Basic     No             MySQL
Recovery of subsets       OK        OK        No file mgmt   OK, MySQL
WAN distribution          OK        Hard      No file mgmt   MySQL
Data locality control     OK        OK        OS             OS, MySQL
Linux                     No        OK        OK             OK


Fermi RUNII (CDF & DØ)

  • Sequential access model based on RUNI experience
      • focus on efficient data access from hierarchical storage
      • clustering optimized to largest data volume access pattern
  • Use
      • ROOT (CDF), EVpack (modified DSPACK) (DØ) for event files (MSQL and Oracle8 evaluated by DØ)
          • just I/O back-ends to EDM and DØOM
      • DØ uses SAM for event catalog and file management
          • Oracle8 supporting database


Data Organization (Fermi RunII)

[Figure: Physical Clustering; Metadata; Event Information Tiers; Warm Cache; User and physics group (derived) data. From Oct 1997 Review - Lee Lueking]


Data Access (Fermi RunII)

[Figure: Mass Storage, Pipeline, Consumers. Legend: Disk Storage, Tape Storage, File, Event, Data flow, Group of Users, Single User, Pipeline Name. Lee Lueking - October 1997]


[Figure: data-flow diagram labelled “15.0 Stage IV 2/23/98”. Elements: RAW Data Archive (R1), Analysis Data (R2), Static Data Buffer (R3), Dynamic Data Buffer (R4), User Analysis Disk (R5), Derived Data Cache (R6), PickWarm Cache (R7), PickHot Cache (R8), Reconstruction Farm (P-1), PickEvents (P-2), Analysis Processing (P-3), PickEvents (P-4), Thumbnail, On Demand Files, Freight Train Data, EDU50, EDU250. Caption: Season IV - aggregate bandwidths, summed from spreadsheet]

(Non-technical) Risk Analysis


Toward 2001 Milestone

“If the ODBMS industry flourishes it is very likely that by 2005 CMS will be able to obtain products, embodying thousands of man-years of work, that are well matched to its worldwide data management and access needs. The cost of such products to CMS will be equivalent to at most a few man-years. We believe that the ODBMS industry and the corresponding market are likely to flourish. However, if this is not the case, a decision will have to be made in approximately the year 2000 to devote some tens of man-years of effort to the development of a less satisfactory data management system for the LHC experiments.”

(CMS Computing Technical Proposal, section 3.2, page 22)


Commercial vs Open Source

Commercial
  • Robust, tested, maintained, well documented (is stable)
  • Response to upgrade requests is often slow
      • They cannot jeopardize deployed applications
  • priority given to short-term profit
  • difficult to understand internal details (no source)
      • but in principle documentation should be enough
  • can go out of business

Open Source
  • Good enough for physicists
      • Requires internal certification
  • Response to upgrade requests even too fast
      • old users usually ready to jump on new features
  • priority given to challenging requests...
  • Open source: often you need it…
  • Author could get bored


ODBMS

  • Objectivity seems to satisfy HEP technical requirements
  • Needs upgrade for
      • VLDB support
      • Mass storage interface
      • remote access and data distribution
  • Not really a DBMS. More a DB access layer
      • needs to be integrated with (or interfaced to) application frameworks and administration tools
  • It is the only real ODBMS survivor on the market
      • how long will it last?


ROOT

  • A physics analysis framework with I/O support
      • Classified also as a rapid-prototyping tool (B. Meyer)
  • Not sufficient for the management of large data volumes (LHC major requirement)
      • an external DBMS is required to manage Meta-Data
  • Limited experience so far (as POM in production)
  • Many motivated users actively supported by the authors
  • Requires major architectural changes to make it “modular” for those who do not want to use it as a framework


Yet Another POM

  • Prototype required to
      • uncover requirements
      • understand problems
      • estimate development effort
  • Usable as test-bed before asking a commercial partner for upgrades
  • Usable as “light-pom”?
      • no transaction, no journaling, no schema, just data…


Personal Comments

  • Event Data
      • object modeling and direct navigation OK
      • DBMS tools (query processor, smart-association, index, names, versions) more a burden than a help
  • Event Catalog, Environmental data, Detector description
      • fit better standard (O)DBMS practices and tools
  • Statistical data
      • simple I/O is not enough, need direct relations with event catalog and event data
  • Relational models do not suit HEP applications


Personal Comments

  • “Applications need to be independent of underlying technologies”
  • Migration to a new technology should imply a finite effort:
      • Market survey 0.5 PY
      • Learning 1 year
      • Implementation 1 PY
      • User Migration 0?

(P stands for Person not Peta!)


Personal Comments

My personal LEP experience brought me to the conclusion that a multitude of persistency solutions is difficult to manage and integrate properly.

In particular a file-based event-store (with filenames encoding metadata) does not scale.

My current (limited) experience tends to convince me more and more that a coherent approach to persistency is the only solution for LHC, given the resource constraints we have.


Discussion Items

  • What kind of POM should be used for
      • raw-data
      • reconstructed-data
      • user-data
      • meta-data
      • environmental data
  • Is a coherent solution possible?
  • Hybrid Solution
  • Have we evaluated all technical risks?


Discussion Items

  • Persistency & Framework
      • which integration
      • conversion & transient cache: who should be responsible for them?
  • Hierarchical storage:
      • Does it impose constraints on data model and access model (and eventually on the POM)?
  • What we have learned in using
      • Objectivity
      • ROOT
      • RDBMS


Discussion Items

  • Non-technical Risks:
      • benefit and risks in choosing between a Commercial and an Open-Source solution
  • 2001 Milestone
      • What should really be decided in 2001
      • Do we understand all technical aspects involved in choosing a POM?
      • Do we need any further R&D?