ATLAS Data Challenges


Page 1: ATLAS Data Challenges

ATLAS Data Challenges

ATLAS Software Workshop

CERN September 20th 2001

Gilbert Poulard, CERN EP-ATC

Page 2: ATLAS Data Challenges


From CERN Computing Review

CERN Computing Review (December 1999 - February 2001)

Recommendations: organize the computing for the LHC era

LHC Grid project:
- Phase 1: development & prototyping (2001-2004)
- Phase 2: installation of the first production system (2005-2007)

Software & Computing Committee (SC2)

Proposal being submitted to the CERN Council:
- ask the experiments to validate their computing model by iterating on a set of Data Challenges of increasing complexity

Page 3: ATLAS Data Challenges


LHC Computing GRID project

Phase 1: prototype construction
- develop Grid middleware
- acquire experience with high-speed wide-area networks
- develop a model for distributed analysis
- adapt LHC applications
- deploy a prototype (CERN + Tier1 + Tier2)

Software
- complete the development of the first version of the physics applications and enable them for the distributed Grid model
- develop & support common libraries, tools & frameworks
  • including simulation, analysis, data management, ...
- in parallel, the LHC collaborations must develop and deploy the first version of their core software

Page 4: ATLAS Data Challenges


ATLAS Data challenges

Goal: understand and validate our computing model and our software

How? Iterate on a set of DCs of increasing complexity:
- start with data that looks like real data
- run the filtering and reconstruction chain
- store the output data in our database
- run the analysis
- produce physics results

Study: performance issues, database technologies, analysis scenarios, ...

Identify: weaknesses, bottlenecks, etc.

Page 5: ATLAS Data Challenges


ATLAS Data challenges

But: today we don't have 'real data'

We need to produce 'simulated data' first, so:
- physics event generation
- simulation
- pile-up
- detector response
- plus reconstruction and analysis

will be part of the first Data Challenges.

We also need to "satisfy" the ATLAS communities: HLT, physics groups, ...

Page 6: ATLAS Data Challenges


ATLAS Data challenges

DC0 (November-December 2001)
- 'continuity' test through the software chain
- aims primarily to check the state of readiness for DC1

DC1 (February-July 2002)
- reconstruction & analysis on a large scale
- learn about the data model and I/O performance; identify bottlenecks ...
- data management; should involve CERN & outside-CERN sites
- scale: 10^7 events in 10-20 days, O(1000) PCs
- data needed by HLT (others?)
- simulation & pile-up will play an important role
- checking of Geant4 versus Geant3

DC2 (January-September 2003)
- use the 'prototype' and Grid middleware
- increased complexity

Page 7: ATLAS Data Challenges


DC scenario

Production chain:
- event generation
- simulation
- pile-up
- detector responses
- reconstruction
- analysis

Page 8: ATLAS Data Challenges


Production stream

Step                                    Input        Output             Framework
Event generation (Pythia, others)       none         Ntuple/FZ, OO-db   Slug/Genz, Athena
Simulation (Geant3/Dice)                Ntuple/FZ    FZ                 Atlsim, Slug/Genz
Pile-up & detector responses (Atlsim)   FZ, RZ       FZ                 Atlsim, Slug/Dice
Data conversion                         FZ           OO-db              Athena
Reconstruction                          OO-db        OO-db, "Ntuple"    Athena, "Atrecon?"
Analysis                                "Ntuple"     -                  Paw, Root, Anaphe, Jas
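A minimal illustrative sketch, assuming a hypothetical layout: the chain above written down as plain data, as a bookkeeping or job-submission tool might record it. The stage names, formats and frameworks are copied from the table; everything else is an assumption.

import sys

# Hypothetical description of the DC production chain as plain data.
PRODUCTION_CHAIN = [
    ("Event generation (Pythia, others)", "none",      "Ntuple/FZ, OO-db", "Slug/Genz, Athena"),
    ("Simulation (Geant3/Dice)",          "Ntuple/FZ", "FZ",               "Atlsim, Slug/Genz"),
    ("Pile-up & detector responses",      "FZ, RZ",    "FZ",               "Atlsim, Slug/Dice"),
    ("Data conversion",                   "FZ",        "OO-db",            "Athena"),
    ("Reconstruction",                    "OO-db",     "OO-db, Ntuple",    "Athena (Atrecon?)"),
    ("Analysis",                          "Ntuple",    "-",                "Paw, Root, Anaphe, Jas"),
]

# Print the chain, one stage per line, for a quick overview.
for stage, inp, out, framework in PRODUCTION_CHAIN:
    sys.stdout.write(f"{stage:34s} in: {inp:10s} out: {out:17s} via: {framework}\n")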

Page 9: ATLAS Data Challenges


Event generation

The type of events has to be defined; several event generators will probably be used.

For each of them we have to define the version, in particular for Pythia:
- should it be a special ATLAS one? (size of common block)

We also have to ensure that it runs for large statistics.

Both the event types and the event generators have to be defined by:
- the HLT group (for HLT events)
- the physics community

Depending on the output we can use the following frameworks:
- ATGEN/GENZ for ZEBRA output format
- Athena for output in the OO-db (HepMC)

We could also think of using only one framework and 'converting' the output from one format to the other (OO-db to ZEBRA or ZEBRA to OO-db), depending on the choice. I don't think this is realistic.
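Purely as an illustration of the choice described above, a hypothetical helper (not an existing ATLAS tool) that picks the generation framework from the requested output format:

# Illustration only: choose the generation framework from the output format.
def generation_framework(output_format: str) -> str:
    fmt = output_format.upper()
    if fmt in ("ZEBRA", "FZ"):
        return "ATGEN/GENZ"      # ZEBRA output format
    if fmt in ("OO-DB", "HEPMC"):
        return "Athena"          # HepMC written to the OO database
    raise ValueError(f"no framework defined for output format {output_format!r}")

print(generation_framework("HepMC"))   # -> Athena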

Page 10: ATLAS Data Challenges


Simulation

The goal here is to track the particles produced by the event generator through the detector. We can use either Geant3 or Geant4:
- for HLT & physics studies we still rely on Geant3
- I think that Geant4 should also be used, to get experience with 'large production' as part of its validation

It would be good to use the same geometry:
- 'same geometry' has to be defined; this is a question for the 'simulation' group
- in the early stages we could decide to use only part of the detector

It would also be good to use the same sample of generated events; this also has to be defined by the 'simulation' group.

For Geant3 simulation we will use either the "Slug/Dice" framework or the "Atlsim" framework:
- in both cases the output will be ZEBRA ("hits", and "deposited energy" for the calorimeters)

For Geant4 simulation I think that we will use the FADS/Goofy framework:
- the output will be 'hit collections' in the OO-db
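A minimal sketch, assuming a hypothetical configuration layout, of the two simulation options just described; the values only restate the text above.

# Hypothetical job-configuration sketch for the two simulation options.
GEANT3_JOB = {
    "engine":    "Geant3",
    "framework": "Slug/Dice or Atlsim",
    "geometry":  "to be defined by the simulation group",
    "output":    {"format": "ZEBRA",
                  "content": ["hits", "deposited energy (calorimeters)"]},
}

GEANT4_JOB = {
    "engine":    "Geant4",
    "framework": "FADS/Goofy",
    "geometry":  "same as Geant3 (to be defined)",
    "output":    {"format": "OO-db",
                  "content": ["hit collections"]},
}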

Page 11: ATLAS Data Challenges


Pile-up & digitization

We have a few possible scenarios:

Work in the "Slug/Dice" or "Atlsim" framework:
- input is ZEBRA, output is ZEBRA
- advantage: we have the full machinery in place

Work in the "Athena" framework; 2 possibilities:
  • 1) 'mixed'
    – input is hits from ZEBRA
    – 'digits' and 'digit collections' are produced
    – output is 'digit collections' in the OO-db
  • 2) 'pure' Athena
    – input is 'hit collections' from the OO-db
    – 'digits' and 'digit collections' are produced
    – output is 'digit collections' in the OO-db

We have to evaluate the consequences of the choice.
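Independent of the framework choice above, a minimal sketch of the pile-up step itself (this is not ATLAS code; names and structures are hypothetical): overlay a Poisson-distributed number of minimum-bias events on each signal event before producing the digits.

import math
import random

def poisson(mu):
    # Simple Poisson sampler (Knuth's algorithm), adequate for an illustration.
    limit, k, p = math.exp(-mu), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= limit:
            return k - 1

def add_pileup(signal_events, minbias_pool, mean_interactions):
    # For each signal event, draw the number of overlaid minimum-bias
    # interactions in the same bunch crossing and attach them.
    mixed = []
    for signal in signal_events:
        n_overlay = poisson(mean_interactions)
        overlays = [random.choice(minbias_pool) for _ in range(n_overlay)]
        mixed.append({"signal": signal, "pileup": overlays})
    return mixed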

Page 12: ATLAS Data Challenges


Reconstruction

We want to use the 'new reconstruction' code run in the Athena framework:
- input should be from the OO-db
- output in the OO-db:
  • ESD (event summary data)
  • AOD (analysis object data)
  • TAG (event tag)

Atrecon could be a back-up possibility; to be decided.
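A sketch only, with hypothetical placeholder attributes (not the ATLAS event data model), of the ESD/AOD/TAG split listed above:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ESD:          # event summary data: detailed reconstruction output
    tracks:   List[dict] = field(default_factory=list)
    clusters: List[dict] = field(default_factory=list)

@dataclass
class AOD:          # analysis object data: reduced physics objects
    electrons: List[dict] = field(default_factory=list)
    muons:     List[dict] = field(default_factory=list)
    jets:      List[dict] = field(default_factory=list)

@dataclass
class TAG:          # event tag: small record used to select events quickly
    run:     int = 0
    event:   int = 0
    summary: Dict[str, float] = field(default_factory=dict)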

Page 13: ATLAS Data Challenges


Analysis

We are just starting to work on this, but analysis tools evaluation should be part of the DC:
- it will be a good test of the Event Data Model
- performance issues should be evaluated

The analysis scenario (number of analysis groups, number of physicists per group, number of people who want to access the data at the same time) is of 'first' importance to 'design' the analysis environment:
  • to measure the response time
  • to identify the bottlenecks

For that we need input from you.
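A back-of-the-envelope sketch with made-up numbers: the scenario parameters asked for above translate directly into the concurrent load on the data.

# Illustrative only; the example numbers are assumptions, not ATLAS figures.
def concurrent_readers(n_groups, physicists_per_group, active_fraction):
    return n_groups * physicists_per_group * active_fraction

# e.g. 20 groups of 10 physicists with 25% active at the same time
print(concurrent_readers(20, 10, 0.25))   # -> 50 simultaneous readers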

Page 14: ATLAS Data Challenges


Data management

Several 'pieces' of what I call 'infrastructure' will have to be decided, prepared and put in place: not only the software but also the hardware and the tools to manage the data. Among them:

Everything related to the OO-db (Objectivity and/or ORACLE):
- tools for creation, replication, distribution, ...

What do we do with ROOT I/O?
- which fraction of the events will be done with ROOT I/O?
- we said that the evaluation of more than one technology is part of the DC

A few thousand files will be produced, and we will need a "bookkeeping" to keep track of what happened during the processing of the data and a "catalogue" to be able to locate all pieces of information:
- where is the "HepMC" data?
- where is the corresponding "simulated" or AOD data?
- which selection criteria have been applied, with which selection parameters, etc.?
- correlation between different pieces of information?
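A minimal sketch, assuming a hypothetical schema and naming scheme, of what one bookkeeping/catalogue record per produced file could hold; the fields simply mirror the questions above.

# Illustrative record for one produced file (schema and names are assumptions).
FILE_RECORD = {
    "logical_name": "dc1.002.simul.0001",        # hypothetical dataset/file name
    "site":         "CERN",                      # where the file is stored
    "stage":        "simulation",                # generation, simulation, pile-up, ...
    "format":       "ZEBRA",                     # ZEBRA, OO-db, Ntuple, ...
    "parents":      ["dc1.002.evgen.0001"],      # link back to the HepMC/generator data
    "selection":    {"filter": "HLT pre-filter", "parameters": {}},
    "run_number":   2001,
    "random_seed":  987654321,
}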

Page 15: ATLAS Data Challenges


DC scenario

For DC0 (end of September?) we will have to see what is in place and decide on the strategy to be adopted in terms of:

Software to be used:
- Dice geometry (which version?)
- reconstruction adapted to this geometry
- database

Infrastructure: I hope that we will have 'tools' in place for:
  • automatic job submission
  • catalogue and bookkeeping
  • allocation of "run numbers" and of "random numbers" (bookkeeping); a minimal sketch follows below

We have to check with people involved in 'Grid' projects or other projects (the projects are not in phase).

I believe that the 'validation' of the various components should start now.
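A minimal sketch, an assumption rather than an existing ATLAS tool, of what "allocation of run numbers and random numbers" could mean in practice: give every job a unique run number and a reproducible, non-overlapping seed, and record both for the bookkeeping.

import itertools

_run_numbers = itertools.count(1)
_bookkeeping = []

def allocate_job(base_seed=12345):
    # Hand out a unique run number and a seed derived from it, and log both.
    run = next(_run_numbers)
    seed = base_seed + 1000 * run          # unique and reproducible per run number
    _bookkeeping.append({"run": run, "seed": seed})
    return run, seed

# Each submitted job would call allocate_job() before starting the simulation.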

Page 16: ATLAS Data Challenges


DC scenario

For DC1:
- on the basis of what we learn from DC0 we will have to adapt our strategy
- simulation & pile-up will be of great importance
  • strategy to be defined (I/O rate, number of "event" servers?)
- since we say that we would like to do it 'world-wide', we will have to see what can be used from the GRID developments
- we will have to 'port' our software to the GRID environment (we already have a kit based on the 1.3.0 release)
- don't forget that we have to provide data to our HLT colleagues, and the schedule should take their needs into account

Page 17: ATLAS Data Challenges


DC1-HLT - CPU

                 Number of   Time per event   Total time     Total time
                 events      (SI95 * sec)     (SI95 * sec)   (SI95 * hours)
simulation       10^7        3000             3 x 10^10      ~10^7
reconstruction   10^7        640              6.4 x 10^9     ~2 x 10^6
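A quick cross-check of the table, together with the "10^7 events in 10-20 days on O(1000) PCs" scale quoted for DC1. The SI95 rating of a single PC is not given in the slides; it is a free parameter of this estimate.

N_EVENTS       = 1e7
SIM_PER_EVENT  = 3000.0    # SI95 * sec per event (table above)
RECO_PER_EVENT = 640.0

sim_total  = N_EVENTS * SIM_PER_EVENT     # 3 x 10^10 SI95*sec
reco_total = N_EVENTS * RECO_PER_EVENT    # 6.4 x 10^9 SI95*sec
print(sim_total / 3600, reco_total / 3600)    # ~8.3e6 and ~1.8e6 SI95*hours

def pcs_needed(total_si95_sec, days, si95_per_pc):
    # How many PCs of a given SI95 rating are needed to finish in 'days'.
    return total_si95_sec / (days * 86400 * si95_per_pc)

# e.g. the full simulation in 20 days, assuming ~30 SI95 per box
print(pcs_needed(sim_total, days=20, si95_per_pc=30))   # a few hundred PCs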

Page 18: ATLAS Data Challenges


DC1-HLT - data

                 Number of   Event size   Total size   Total size
                 events      (MB)         (GB)         (TB)
simulation       10^7        2            20000        20
reconstruction   10^7        0.5          5000         5
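A quick check of the table: number of events times event size.

N = 1e7
print(N * 2.0 / 1e6, "TB for simulation")       # 10^7 * 2 MB   = 20 TB
print(N * 0.5 / 1e6, "TB for reconstruction")   # 10^7 * 0.5 MB = 5 TB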

Page 19: ATLAS Data Challenges


DC1-HLT data with pile-up

Luminosity L       Number of    Event size   Total size   Total size
(cm^-2 s^-1)       events       (MB)         (GB)         (TB)
2 x 10^33          1.5 x 10^6   (1) 2.6      (1)  4000    (1)  4
                                (2) 4.7      (2)  7000    (2)  7
10^34              1.5 x 10^6   (1) 6.5      (1) 10000    (1) 10
                                (2) 17.5     (2) 26000    (2) 26

In addition to the 'simulated' data, assuming 'filtering' after simulation (~14% of the events kept).
- (1) keeping only 'digits'
- (2) keeping 'digits' and 'hits'
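A quick cross-check of the pile-up table and of the ~14% filtering note above.

N_SIMULATED = 1e7
N_KEPT      = 1.5e6
print(N_KEPT / N_SIMULATED)   # 0.15, consistent with "~14% of the events kept"

for lumi, size_digits, size_digits_hits in [("2e33", 2.6, 4.7), ("1e34", 6.5, 17.5)]:
    print(lumi,
          round(N_KEPT * size_digits / 1e6),        # TB, (1) digits only
          round(N_KEPT * size_digits_hits / 1e6))   # TB, (2) digits + hits
# -> roughly 4 and 7 TB at 2 x 10^33, 10 and 26 TB at 10^34, as in the table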

Page 20: ATLAS Data Challenges


DC scenario

For DC1, on the hardware side we will have to ensure that we have enough resources in terms of CPU, disk space, tapes, data servers, ...

We have started to evaluate our needs, but this should be checked.

What will we do with the data generated during the DC?
  • keep it on CASTOR (the CERN mass storage system)? tapes?
  • outside institutes will use other systems (HPSS, ...)

How will we exchange the data?
  • do we want to have all the information at CERN? everywhere?

What are the networking requirements?

Page 21: ATLAS Data Challenges


Ramp-up scenario

[Figure: ramp-up of CPU capacity over time (y-axis labelled "CPU", 0-400); the plot itself is not reproducible from this transcript.]

Page 22: ATLAS Data Challenges


What next

Prepare a first list of goals & requirements with:
- HLT, the physics community
- the simulation, reconstruction and database communities
- people working on 'infrastructure' activities
  • bookkeeping, cataloguing, ...

In order to:
- prepare a list of tasks
  • some physics oriented
  • but also things like testing code, running production, ...
- set a list of work packages
- define the priorities

Page 23: ATLAS Data Challenges


What next

In parallel:

Start to build a task force:
- volunteers?
  • should come from the various activities

Start discussions with people involved in GRID projects and those responsible for Tier centres.

Evaluate the necessary resources:
- at CERN (COCOTIME exercise)
- outside CERN

Page 24: ATLAS Data Challenges


Then

Start the validation of the various components in the chain (setting deadlines for readiness):
- software: simulation, pile-up, ...
- infrastructure: database, bookkeeping, ...

Estimate what it will be realistic (!) to do:
- for DC0, DC1
- where (sharing of the work)

Ensure that we have the resources, including manpower.

"And turn the key."

Page 25: ATLAS Data Challenges


Expressions of interest

So far, after the NCB meeting of July 10th: Canada, France, Germany, Italy, Japan, the Netherlands, Nordic Grid, Poland, Russia, UK, US, ...

- propositions to help in DC0
- propositions to participate in DC1

Contact with the HLT community:
- needs input from other (physics) communities

Contact with Grid projects:
- EU-DataGrid
  • kit of ATLAS software
- other projects

Contact with Tier centres:
- the question of the entry level to DC1 has been raised (O(100)?)

Page 26: ATLAS Data Challenges


Work packages

First (non-exhaustive) list of work packages:
- event generation
- simulation
  • Geant3
  • Geant4
- pile-up
- detector responses (digitization)
- "ZEBRA" - "OO-db" conversion
- event filtering
- reconstruction
- analysis
- data management
- tools
  • job submission & monitoring
  • bookkeeping & cataloguing
  • web interface