ATLAS Data Challenge Production and U.S. Participation

ATLAS Data Challenge Production and U.S. Participation. Kaushik De, University of Texas at Arlington. BNL Physics & Computing Meeting, August 29, 2003.


Transcript of ATLAS Data Challenge Production and U.S. Participation

Page 1: ATLAS Data Challenge Production and U.S. Participation

ATLAS Data Challenge Production and U.S. Participation

Kaushik De
University of Texas at Arlington

BNL Physics & Computing Meeting

August 29, 2003

Page 2: ATLAS Data Challenge Production and U.S. Participation


ATLAS Data Challenges

Original Goals (Nov 15, 2001)
- Test the computing model, its software and its data model, and ensure the correctness of the technical choices to be made
- Data Challenges should be executed at the prototype Tier centres
- Data Challenges will be used as input for a Computing Technical Design Report (due by the end of 2003?) and for preparing a MoU

Current Status
- Goals are evolving as we gain experience
- Computing TDR now expected ~end of 2004
- DCs are a ~yearly sequence of increasing scale & complexity
- DC0 and DC1 completed; DC2 (2004), DC3, and DC4 planned
- Grid deployment and testing is a major part of the DCs

Page 3: ATLAS Data Challenge Production and U.S. Participation


ATLAS DC1: July 2002-April 2003

Goals:
- Produce the data needed for the HLT TDR
- Get as many ATLAS institutes involved as possible

Worldwide collaborative activity. Participation: 56 institutes (39 in phase 1)

Australia, Austria, Canada, CERN, China, Czech Republic, Denmark*, France, Germany, Greece, Israel, Italy, Japan, Norway*, Poland, Russia, Spain, Sweden*, Taiwan, UK, USA*

* using Grid (new countries or institutes were highlighted on the original slide)

Page 4: ATLAS Data Challenge Production and U.S. Participation


DC1 Statistics (G. Poulard, July 2003)

| Process                   | No. of events | CPU time (kSI2k.months) | CPU-days (400 SI2k) | Data volume (TB) |
|---------------------------|---------------|-------------------------|---------------------|------------------|
| Simulation (physics evt.) | 10^7          | 415                     | 30000               | 23               |
| Simulation (single part.) | 3x10^7        | 125                     | 9600                | 2                |
| Lumi02 pile-up            | 4x10^6        | 22                      | 1650                | 14               |
| Lumi10 pile-up            | 2.8x10^6      | 78                      | 6000                | 21               |
| Reconstruction            | 4x10^6        | 50                      | 3750                |                  |
| Reconstruction + Lvl1/2   | 2.5x10^6      | (84)                    | (6300)              |                  |
| Total                     |               | 690 (+84)               | 51000 (+6300)       | 60               |
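The slide does not spell out the unit conversion, but the two CPU columns are consistent with each other. A minimal sketch of the arithmetic, assuming 1 month ≈ 30 days and a single 400 SI2k reference processor:

```python
# Rough cross-check of the CPU accounting in the DC1 statistics table above.
# Assumptions (not stated on the slide): 1 month ~ 30 days, and the CPU-days
# column refers to one 400 SI2k reference processor.

def ksi2k_months_to_cpu_days(ksi2k_months, node_si2k=400, days_per_month=30):
    """Convert integrated CPU time in kSI2k.months to days on one node_si2k CPU."""
    si2k_months = ksi2k_months * 1000.0      # kSI2k -> SI2k
    node_months = si2k_months / node_si2k    # months of running on one node
    return node_months * days_per_month

print(round(ksi2k_months_to_cpu_days(415)))  # ~31000, table quotes 30000
print(round(ksi2k_months_to_cpu_days(690)))  # ~51750, table quotes 51000
```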

Page 5: ATLAS Data Challenge Production and U.S. Participation


DC2: Scenario & Time scale (G. Poulard)

End-July 03: Release 7
- Put in place, understand & validate: Geant4; POOL; LCG applications; Event Data Model; digitization; pile-up; byte-stream; conversion of DC1 data to POOL; large-scale persistency tests and reconstruction

Mid-November 03: pre-production release
- Testing and validation

February 1st 04: Release 8 (production)
- Run test-production

April 1st 04:
- Start final validation

June 1st 04: "DC2"
- Start simulation; pile-up & digitization
- Event mixing
- Transfer data to CERN

July 15th:
- Intensive reconstruction on "Tier0"
- Distribution of ESD & AOD
- Calibration; alignment
- Start physics analysis
- Reprocessing

Page 6: ATLAS Data Challenge Production and U.S. Participation


U.S. ATLAS DC1 Data Production

Year long process, Summer 2002-2003Year long process, Summer 2002-2003

Played 2nd largest role in ATLAS DC1Played 2nd largest role in ATLAS DC1

Exercised both farm and grid based productionExercised both farm and grid based production

10 U.S. sites participating10 U.S. sites participating Tier 1: BNL, Tier 2 prototypes: BU, IU/UC, Grid Testbed sites: ANL,

LBNL, UM, OU, SMU, UTA (UNM & UTPA will join for DC2)

Generated Generated ~2 million~2 million fully simulated, piled-up and fully simulated, piled-up and reconstructed eventsreconstructed events

Largest grid-based DC1 data producer in ATLASLargest grid-based DC1 data producer in ATLAS

Data used for HLT TDR, Athens physics workshop, Data used for HLT TDR, Athens physics workshop, reconstruction software tests...reconstruction software tests...

Page 7: ATLAS Data Challenge Production and U.S. Participation


U.S. ATLAS Grid Testbed

BNL - U.S. Tier 1, 2000 nodes, 5% for BNL - U.S. Tier 1, 2000 nodes, 5% for ATLAS, 10 TB, HPSS through MagdaATLAS, 10 TB, HPSS through Magda

LBNL - pdsf cluster, 400 nodes, 5% for LBNL - pdsf cluster, 400 nodes, 5% for ATLAS (more if idle ~10-15% used), 1TBATLAS (more if idle ~10-15% used), 1TB

Boston U. - prototype Tier 2, 64 nodesBoston U. - prototype Tier 2, 64 nodes

Indiana U. - prototype Tier 2, 64 nodesIndiana U. - prototype Tier 2, 64 nodes

UT Arlington - new 200 cpu’s, 50 TBUT Arlington - new 200 cpu’s, 50 TB

Oklahoma U. - OSCER facilityOklahoma U. - OSCER facility

U. Michigan - test nodesU. Michigan - test nodes

ANL - test nodes, JAZZ clusterANL - test nodes, JAZZ cluster

SMU - 6 production nodesSMU - 6 production nodes

UNM - Los Lobos clusterUNM - Los Lobos cluster

U. Chicago - test nodesU. Chicago - test nodes

Page 8: ATLAS Data Challenge Production and U.S. Participation


U.S. Production Summary

| Sample           | Files in Magda | Events | CPU-hours (simulation) | CPU-hours (pile-up) | CPU-hours (reconstruction) |
|------------------|----------------|--------|------------------------|---------------------|----------------------------|
| 25 GeV di-jets   | 41k            | 1M     | ~60k                   | 56k                 | 60k+                       |
| 50 GeV di-jets   | 10k            | 250k   | ~20k                   | 22k                 | 20k+                       |
| Single particles | 24k            | 200k   | 17k                    |                     | 6k                         |
| Higgs sample     | 11k            | 50k    | 8k                     |                     | 2k                         |
| SUSY sample      | 7k             | 50k    | 13k                    |                     | 2k                         |
| minbias sample   | 7k             | ?      | ?                      |                     |                            |

* Total ~30 CPU years delivered to DC1 from the U.S.
* Total produced file size ~20 TB on the HPSS tape system, ~10 TB on disk.
* On the original slide, colour indicated the production mode: black = majority grid produced, blue = majority farm produced.

Exercised both farm and grid based production

Valuable large-scale grid-based production experience
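As a quick sanity check of the "~30 CPU years" note above, a minimal sketch summing the approximate CPU-hour figures from the table (the per-column placement of the smaller samples is my reading of the slide, but the total does not depend on it):

```python
# Back-of-the-envelope check of the "~30 CPU years" quoted in the notes above,
# summing the approximate CPU-hour entries from the U.S. production table.
cpu_hours = {
    "25 GeV di-jets":   60_000 + 56_000 + 60_000,   # sim + pile-up + reco
    "50 GeV di-jets":   20_000 + 22_000 + 20_000,
    "single particles": 17_000 + 6_000,
    "Higgs sample":      8_000 + 2_000,
    "SUSY sample":      13_000 + 2_000,
}
total_hours = sum(cpu_hours.values())
print(total_hours, "CPU-hours =", round(total_hours / (24 * 365), 1), "CPU-years")
# -> roughly 286,000 CPU-hours, i.e. about 33 CPU-years, consistent with "~30"
```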

Page 9: ATLAS Data Challenge Production and U.S. Participation


Grid Production Statistics

Figure: Pie chart of the sites where DC1 single-particle simulation jobs were processed (LBL 47%, UTA 33%, OU 20%). Only three grid testbed sites were used for this production in August 2002.

Figure: Pie chart of the number of pile-up jobs successfully completed at various U.S. grid sites for dataset 2001 (25 GeV di-jets). A total of 6000 partitions were generated.

These are examples of some datasets produced on the Grid. Many other large samples were produced, especially at BNL using batch.

Page 10: ATLAS Data Challenge Production and U.S. Participation


DC1 Production Systems

Local batch systems - bulk of production

GRAT - grid scripts, ~50k files produced in the U.S.

NorduGrid - grid system, ~10k files in Nordic countries

AtCom - GUI, ~10k files at CERN (mostly batch)

GCE - Chimera based, ~1k files produced

GRAPPA - interactive GUI for individual users

EDG - test files only

+ systems I forgot…

More systems coming for DC2: LCG, GANGA, DIAL

Page 11: ATLAS Data Challenge Production and U.S. Participation


Databases used in GRAT

Production databaseProduction database define logical job parameters & filenames track job status, updated periodically by scripts

Data management (Magda)Data management (Magda) file registration/catalogue grid based file transfers

Virtual Data CatalogueVirtual Data Catalogue simulation job definition job parameters, random numbers

Metadata catalogue (AMI)Metadata catalogue (AMI) post-production summary information data provenance
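To make the bookkeeping flow concrete, here is a minimal, purely illustrative sketch of how a GRAT-style production script could tie these pieces together: claim a job from the production database, run it, register the output file in a Magda-style catalogue, and record the final status. The table names, fields and the run_atlsim.sh command are hypothetical stand-ins, not the actual GRAT, Magda or AMI interfaces.

```python
# Hypothetical sketch (not the actual GRAT code) of the production bookkeeping
# flow described above. Schema and commands are illustrative only.
import sqlite3
import subprocess

def run_one_job(conn):
    """Claim one pending job, run it, register the output, record the result."""
    cur = conn.cursor()

    # 1. Pick a logical job (parameters + output filename) still marked 'defined'.
    cur.execute("SELECT job_id, lfn, params FROM jobs WHERE status='defined' LIMIT 1")
    row = cur.fetchone()
    if row is None:
        return False
    job_id, lfn, params = row
    cur.execute("UPDATE jobs SET status='running' WHERE job_id=?", (job_id,))
    conn.commit()

    # 2. Run the simulation/reconstruction step (placeholder command).
    try:
        ok = subprocess.call(["./run_atlsim.sh", lfn, params]) == 0
    except OSError:
        ok = False

    # 3. Register the output file (Magda-style catalogue entry) and final status.
    if ok:
        cur.execute("INSERT INTO file_catalogue (lfn, site) VALUES (?, ?)", (lfn, "UTA"))
    cur.execute("UPDATE jobs SET status=? WHERE job_id=?",
                ("done" if ok else "failed", job_id))
    conn.commit()
    return True
```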

Page 12: ATLAS Data Challenge Production and U.S. Participation


U.S. Middleware Evolution

Figure: middleware used for U.S. DC1 production, with per-system status notes (the system names were shown graphically on the original slide). The surviving notes read: "Used for 95% of DC1 production"; "Used successfully for simulation"; "Tested for simulation, used for all grid-based reconstruction"; "Used successfully for simulation (complex pile-up workflow not yet)".

Page 13: ATLAS Data Challenge Production and U.S. Participation


U.S. Experience with DC1

ATLAS software distribution worked well for DC1 farm ATLAS software distribution worked well for DC1 farm production, but not well suited for grid productionproduction, but not well suited for grid production

No integration of databases - caused many problemsNo integration of databases - caused many problems

Magda & AMI very useful - but we are missing data Magda & AMI very useful - but we are missing data management tool for truly distributed productionmanagement tool for truly distributed production

Required a lot of people to run production in the U.S., Required a lot of people to run production in the U.S., especially with so many sites on both grid and farmespecially with so many sites on both grid and farm

Startup of grid production slow - but learned useful lessonsStartup of grid production slow - but learned useful lessons

Software releases were often late - leading to chaotic last Software releases were often late - leading to chaotic last minute rush to finish productionminute rush to finish production

Page 14: ATLAS Data Challenge Production and U.S. Participation


U.S. Plans for DC2

Computing organization in the U.S. has been restructured Computing organization in the U.S. has been restructured to reflect growing importance of grid in DC2 (we hope to to reflect growing importance of grid in DC2 (we hope to use only grid based production for DC2 in the U.S.)use only grid based production for DC2 in the U.S.)

R. Gardner leading effort to develop grid tools and services, R. Gardner leading effort to develop grid tools and services, K. De & P. Nevski leading productionK. De & P. Nevski leading production

New tools being developed for DC2, based on Chimera New tools being developed for DC2, based on Chimera see: http://www.usatlas.bnl.gov/computing/grid/gcesee: http://www.usatlas.bnl.gov/computing/grid/gce

Joint CMS/ATLAS preDC2 exercise underway - called Joint CMS/ATLAS preDC2 exercise underway - called Grid3, for next 6 monthsGrid3, for next 6 months

Need to develop plans and have software ready and tested Need to develop plans and have software ready and tested before real DC2 production/user analysis startsbefore real DC2 production/user analysis starts

Page 15: ATLAS Data Challenge Production and U.S. Participation


Plans for DC2 Production System

Need unified system for ATLASNeed unified system for ATLAS for efficient usage of facilities, improved scheduling, better QC should support all varieties of grid middleware (& batch?)

First “technical” meeting at CERN August 11-12, 2003First “technical” meeting at CERN August 11-12, 2003 attended by Luc Goosens*, KD, Rich Baker, Rob Gardner,

Alessandro De Salvo, Jiri Chudoba, Oxana Smirnova design document is being prepared planning a Supervisor/Executor model (see fig. next slide) first prototype software should be released ~6 months U.S. well represented in this common ATLAS effort Still unresolved - Data Management System Need strong coordination with database group (Luc & Kaushik

attended Database meeting at Oxford in July)

Page 16: ATLAS Data Challenge Production and U.S. Participation


Schematic of New DC2 System

Main featuresMain features Common production

database for all of ATLAS Common ATLAS supervisor

run by all facilities/managers Common data management

system a la Magda Executors developed by

middleware experts (LCG, NorduGrid, Chimera teams)

Final verification of data done by supervisor

U.S. involved in almost all U.S. involved in almost all aspects - could use more helpaspects - could use more help
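To illustrate the supervisor/executor split sketched above, here is a minimal, hypothetical sketch of the division of labour; all class, method and parameter names are illustrative only, since the real DC2 production system was still being designed at the time of this talk.

```python
# Minimal sketch of the supervisor/executor model described on this slide.
# Only the division of labour (common supervisor + per-middleware executors,
# common production database and data management) reflects the slide.

class Executor:
    """One per grid flavour (LCG, NorduGrid, Chimera, ...), written by the
    middleware experts. It only knows how to run a fully specified job."""
    def submit(self, job):          # translate to a middleware-specific job
        raise NotImplementedError
    def status(self, handle):       # poll the middleware for the job state
        raise NotImplementedError

class Supervisor:
    """Common ATLAS component run by each facility/manager: pulls job
    definitions from the common production database, hands them to an
    executor, and verifies the output data before marking the job done."""
    def __init__(self, prod_db, data_mgmt, executor):
        self.prod_db = prod_db      # common production database
        self.data_mgmt = data_mgmt  # common data management system (a la Magda)
        self.executor = executor    # middleware-specific executor

    def cycle(self):
        for job in self.prod_db.fetch_pending():
            handle = self.executor.submit(job)
            # (A real supervisor would poll repeatedly until the job completes.)
            finished = self.executor.status(handle) == "finished"
            # Final verification of the data is done by the supervisor.
            if finished and self.data_mgmt.verify(job.outputs):
                self.prod_db.mark_done(job)
            else:
                self.prod_db.mark_failed(job)
```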

Page 17: ATLAS Data Challenge Production and U.S. Participation


Conclusion

Data Challenges are important for ATLAS software and computing infrastructure readiness

U.S. playing a major role in DC planning & production

12 U.S. sites ready to participate in DC2, more welcome

Production software development needs help

Physics analysis is a major emphasis of DC2

Involvement by more U.S. physicists is needed in DC2
- to verify quality of data
- to tune physics algorithms
- to test scalability of the physics analysis model