
10/03/2008 A.Minaenko 1

ATLAS computing in Russia

A.Minaenko, Institute for High Energy Physics, Protvino

JWGC meeting 10/03/08


10/03/2008 A.Minaenko 2

ATLAS RuTier-2 tasks

• The Russian Tier-2 (RuTier-2) computing facility is planned to supply computing resources to all four LHC experiments, including ATLAS. It is a distributed computing centre that at the moment includes the computing farms of six institutions: ITEP, KI, SINP (all Moscow), IHEP (Protvino), JINR (Dubna) and PNPI (St. Petersburg)
• The main RuTier-2 task is to provide facilities for physics analysis using AOD, DPD and user-derived data formats such as ROOT trees
• The full current AOD and 30% of the previous AOD version should be available
• Development of reconstruction algorithms, which requires some subsets of ESD and RAW data, should be possible
• All the data used for analysis should be stored on disk servers (SE); unique data (user and group DPD) as well as the previous AOD/DPD version should also be saved on tape
• The second important task is the production and storage of MC simulated data
• The planned RuTier-2 resources should allow these goals to be fulfilled


10/03/2008 A.Minaenko 3

ATLAS RuTier-2 resource evolution

              2007   2008   2009   2010   2011   2012
CPU (kSI2k)    320    780   1500   2800   3800   4800
Disk (TB)      150    280    610   1400   2200   3000
Tape (TB)        –     70    160    370    580    780

• The table above was included in the Russian resource pledge to the LCG and illustrates our current understanding of the resources needed. It can be corrected in the future as our needs become better understood
• Not taken into account: the AOD increase due to inclusive streaming, a change of the MC event rate (30% instead of 20%), a possible increase of the AOD event size (taken as 100 KB) and an increase of the total DPD size (taken as 0.5 of the AOD). A back-of-envelope sizing sketch follows below
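To make the sizing assumptions concrete, here is a minimal back-of-envelope sketch (Python) that turns an event count into a disk volume using the figures quoted above: 100 KB per AOD event, total DPD size equal to 0.5 of the AOD, and 30% of the previous AOD version kept on disk. The event count itself is a hypothetical placeholder, not a number from the pledge.

```python
# Back-of-envelope disk estimate from the assumptions quoted on this slide:
# AOD event size 100 KB, total DPD size = 0.5 x AOD, plus 30% of the previous
# AOD version kept on disk. The event count below is illustrative only.

AOD_EVENT_SIZE_KB = 100        # assumed AOD event size
DPD_FRACTION = 0.5             # total DPD size relative to the AOD
PREVIOUS_AOD_FRACTION = 0.3    # fraction of the previous AOD version kept

def analysis_disk_tb(n_events):
    """Return an estimated analysis disk volume in TB for n_events."""
    aod_tb = n_events * AOD_EVENT_SIZE_KB / 1e9   # 1 TB = 1e9 KB
    dpd_tb = DPD_FRACTION * aod_tb
    prev_aod_tb = PREVIOUS_AOD_FRACTION * aod_tb
    return aod_tb + dpd_tb + prev_aod_tb

# Hypothetical example: 2e9 events -> 200 TB AOD + 100 TB DPD + 60 TB prev. AOD
print(f"Estimated disk volume: {analysis_disk_tb(2.0e9):.0f} TB")
```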


10/03/2008 A.Minaenko 4

Current RuTier-2 resources for all experiments

Site     CPU slots     CPU (kSI2k)    Disk (TB)
IHEP     172           260            45
ITEP     146           250            52
JINR     240 (160)     670 (430)      83
KI       400           1000           30 (250)
PNPI     188           280            52
SINP     176           280            9 (50)
Total    1322 (160)    2740 (430)     271 (300)

• Numbers in parentheses (shown in red on the original slide) will become available in 1-2 months
• ATLAS request for 2008 = 780 kSI2k, 280 TB


10/03/2008 A.Minaenko 5

Normalized CPU time (hour*kSI2k)


10/03/2008 A.Minaenko 6

RuTier-2 for ATLAS in 2007

ATLAS – 21%; ATLAS – 846 kh·kSI2k


10/03/2008 A.Minaenko 7

Site contributions in ATLAS in 2007


10/03/2008 A.Minaenko 8

ATLAS RuTier-2 in the SARA cloud

• The RuTier-2 sites are associated with the ATLAS Tier-1 SARA

• Now 5 sites (IHEP, ITEP, JINR, SINP, PNPI) are included in the TiersOfAtlas list and FTS channels are tuned for these sites

• 4 sites (IHEP, ITEP, JINR, PNPI) successfully participated in the 2007 data transfer functional tests (next slide). This is a coherent Tier-0 → Tier-1 → Tier-2 data transfer test for all clouds, using existing SW to generate and replicate data and to monitor the data flow

• Another 2007 ATLAS activity is the replication of produced MC AOD from the Tier-1s to the Tier-2s according to the ATLAS computing model. It is done using FTS and the subscription mechanism. The RuTier-2 sites (except ITEP) did not participate in this activity because of a severe lack of free disk space

• 4 sites (IHEP (15%), ITEP (20%), JINR (100%), PNPI (20%)) participated in the replication of M4 data; the percentages give the fraction of the data requested for replication that was obtained (see the sketch after this list). Only JINR obtained all the data; the other sites were limited by the amount of free disk space

• During the one-week M4 exercise (Aug-Sep 2007) about two million real muon events were detected, written to disk and tape, and reconstructed at the ATLAS Tier-0. The reconstructed data (ESD) were then exported in quasi-real time to the Tier-1s and their associated Tier-2s. The whole chain worked as it should during real LHC data taking. This was the first successful experience of this kind for ATLAS

• Two slides (10, 11) illustrate the M4 exercise; the second one shows the results for the SARA cloud: practically all subscribed data were successfully transferred
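The per-site percentages quoted above are essentially the fraction of the requested data that actually arrived at each site. A minimal sketch of that bookkeeping, with hypothetical dataset counts (not the real M4 statistics), could look like this:

```python
# Sketch of the replication bookkeeping behind the per-site percentages:
# fraction of subscribed datasets completely replicated at each Tier-2.
# The dataset counts are hypothetical, not the actual M4 statistics.

m4_replication = {
    # site: (complete replicas, datasets subscribed) -- illustrative values
    "IHEP": (15, 100),
    "ITEP": (20, 100),
    "JINR": (100, 100),
    "PNPI": (20, 100),
}

def completion_report(stats):
    """Print the completion percentage per site and flag incomplete sites."""
    for site, (complete, subscribed) in sorted(stats.items()):
        pct = 100.0 * complete / subscribed if subscribed else 0.0
        note = "" if complete == subscribed else "  <- limited by free disk space"
        print(f"{site:5s}: {pct:5.1f}% of subscribed datasets replicated{note}")

completion_report(m4_replication)
```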


10/03/2008 A.Minaenko 9

Activities. Functional Tests

Tier-1    Tier-2s
ASGC      AU-ATLAS, TW-FTT
BNL       AGLT2, BU, MWT2, OU, SLAC, UTA, WISC
CNAF      LNF, MILANO, NAPOLI, ROMA1
FZK       CSCS, CYF, DESY-HH, DESY-ZN, FZU, FREIBURG, LRZ, WUP
LYON      BEIJING, CPPM, LAL, LAPP, LPC, LPNHE, NIPNE, SACLAY, TOKYO
NDGF      –
PIC       IFIC, UAM, IFAE, LIP
RAL       GLASGOW, LANC, MANC, QMUL, DUR, EDINBURGH, OXF, CAM, LIV, BRUN, RHUL
SARA      IHEP, ITEP, JINR, PNPI
TRIUMF    ALBERTA, MONTREAL, SFU, TORONTO, UVIC

[Timeline of functional test rounds: Sep 06, Oct 06, Nov 06, Sep 07, Oct 07; annotated with DQ2 SW releases: new DQ2 release (0.2.12), new DQ2 release 0.3 (Jun 2007), new DQ2 release 0.4 (Oct 2007)]

10 Tier-1s and 46 Tier-2s participated


10/03/2008 A.Minaenko 10

M4 Data Replication Activity Summary for All Sites

[Charts: datasets subscribed, complete replicas and incomplete replicas, summarized for all Tier-1 sites and for all Tier-2 sites. RuTier-2 participants: IHEP, ITEP, JINR, PNPI]


10/03/2008 A.Minaenko 11

M4 Data Replication Activity Summary for SARA Cloud

Transfer status:
• IHEP: 1 problematic file (0.2%)
• JINR: 1 problematic file (0.3%)
• ITEP: no problems
• PNPI: no problems

ESD data only


10/03/2008 A.Minaenko 12

M5 Data Replication Activity Summary

• ITEP, IHEP, JINR and PNPI participated
• Delay in replication < 24 h

[Chart: total subscriptions and completed transfers for IHEP, ITEP, JINR, PNPI]


10/03/2008 A.Minaenko 13

Russian contribution to the central ATLAS sw/computing

• The Russian contribution to the ATLAS M&O Category A budget amounted to 0.5 FTE this year. Two of our colleagues (I.Kachaev, V.Kabachenko) were involved in central ATLAS activities at CERN concerning core SW maintenance. They fulfilled a number of tasks:

• Support of [email protected] list, i.e. managing user quotas, scratch space distribution, user requests/questions concerning AFS space, access rights etc.

• Support of [email protected] list, i.e. managing the central ATLAS CVS

• Official ATLAS SW release builds: releases 13.0.20, 13.0.26, 13.0.28 and 13.0.30 have been built and 13.0.40 is under construction

• Corresponding documentation update: release pages, librarian documentation

• ATLAS AFS management
• A lot of scripts have been written to support release builds, release copy and move, a command-line interface to TagCollector, CVS tag search and comparison in TagCollector, etc.


10/03/2008 A.Minaenko 14

Russian contribution to the central ATLAS sw/computing

• Two of our colleagues (A.Zaytsev, S.Pirogov) visited CERN (4+4 months) to contribute to the activity of the ATLAS Distributed Data Management (DDM) group. Their tasks included the corresponding SW development as well as participation in central ATLAS DDM operations such as support of the data transfer functional tests, the M4 exercises, etc. Special attention was given to the SARA cloud, to which the Russian sites are attached

• During the visit the following main tasks were fulfilled:
• Development of the LFC/LRC Test Suite and its application to measuring the performance of the updated version of the production LFC server and of a new GSI-enabled LRC testbed

• Extending functionality and documenting the DDM Data Transfer Request Web Interface

• Installing and configuring a complete PanDA server and a new implementation of the PanDA Scheduler Server (Autopilot) at CERN, and assisting the LYON Tier-1 site to do the same

• Contributing to the recent DDM/DQ2 Functional Tests (Aug 2007) activity, developing tools for statistical analysis of the results and applying them to the data gathered during the tests

• All the results were reported at ATLAS internal meetings and at the CHEP 2007 computing conference

• Part of this activity (0.3 FTE) was accounted as the Russian contribution to the ATLAS M&O Category A budget (Central Operations part)


10/03/2008 A.Minaenko 15

Challenges in 2008

• FDR-1
  – 10 hrs of data taking @ 200 Hz, a few days in a row
• CCRC-1
  – 4 weeks of operation of the full Computing Model
  – All 4 LHC experiments simultaneously
• Sub-detector runs
• M6
  – First week of March
• FDR-2 Simulation Production (see the capacity sketch after this list)
  – 100M events in 90 days plus merging
  – Using the new release
• CCRC-2
  – Like CCRC-1 but for the whole month of May
• FDR-2
  – Like FDR-1 but at higher luminosity
  – Timing uncertain now
• M7 ?
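The 100M events in 90 days target implies a definite sustained CPU capacity once a cost per simulated event is assumed. The sketch below only shows the arithmetic; the kSI2k-seconds-per-event value is a hypothetical assumption, not a number from this talk.

```python
# Rough capacity arithmetic for the FDR-2 simulation production target:
# 100M events in 90 days. The CPU cost per simulated event is an assumed
# placeholder, not a figure from this talk.

EVENTS = 100e6
DAYS = 90
KSI2K_SEC_PER_EVENT = 100.0   # hypothetical full-simulation cost per event

cpu_seconds = EVENTS * KSI2K_SEC_PER_EVENT     # total kSI2k*s of work
wall_seconds = DAYS * 24 * 3600                # available wall-clock time
required_ksi2k = cpu_seconds / wall_seconds    # sustained capacity needed

print(f"~{required_ksi2k:.0f} kSI2k sustained, "
      f"{EVENTS / DAYS / 1e6:.1f}M events per day")
```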


10/03/2008 A.Minaenko 16

Planned ATLAS activity in 2008



10/03/2008 A.Minaenko 18

ATLAS Production Tiers (Feb 08. Full Dress Rehearsal)

ASGC      AU-ATLAS, TW-FTT
BNL       AGLT2, BU, IU, OU, SMU, SWT2, SLAC, UMICH, WISC, UC
CNAF      LNF, MILANO, NAPOLI, ROMA1
FZK       CSCS, CYF, DESY-HH, DESY-ZN, FREIBURG, FZU, HEPHY-UIBK, LRZ, WUP
LYON      BEIJING, CPPM, LAL, LAPP, LPC, LPNHE, NIPNE_02, NIPNE_07, SACLAY, TOKYO
NDGF      IJST2
PIC       IFAE, IFIC, LIP-COIMBRA, LIP-LISBON, UAM
RAL       BHAM, BRUN, CAM, DUR, EDINBURGH, ECDF, GLASGOW, LANCS, LIV, MANC, OXF, SHEF, QMUL, ICL, RALPP
SARA      IHEP, ITEP, JINR, NIKHEF, PNPI, SINP
TRIUMF    ALBERTA, MCGILL, SFU, TORONTO, UVIC

10 Tier-1s and 56 "Tier-2s"
• Metric for T1 success: 100% of data transferred (from CERN, from Tier-1s and to Tier-2s)
• Metric for T2/T3 success: 95+% of data transferred (transfers within the cloud)
• Metric for cloud success: 75% of the sites participated in the test and 75% passed it (a sketch of these thresholds follows below)

[Status legend for the table: done / partial / failed / no test]
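A small sketch of how those thresholds translate into a pass/fail decision for a cloud; the per-site transfer fractions for the SARA cloud below are placeholders, not the actual FDR results.

```python
# Sketch of the FDR success metrics quoted on this slide:
# a Tier-2 passes if >= 95% of the subscribed data was transferred;
# a cloud passes if >= 75% of its sites took part and >= 75% of the
# participants passed. The per-site fractions below are hypothetical.

T2_PASS_FRACTION = 0.95
CLOUD_PARTICIPATION = 0.75
CLOUD_PASS_FRACTION = 0.75

def cloud_success(transferred, all_sites):
    """transferred: {site: fraction of data received, or None if it did not take part}."""
    participants = {s: f for s, f in transferred.items() if f is not None}
    if len(participants) < CLOUD_PARTICIPATION * len(all_sites):
        return False
    passed = sum(1 for f in participants.values() if f >= T2_PASS_FRACTION)
    return passed >= CLOUD_PASS_FRACTION * len(participants)

# Hypothetical SARA-cloud example (not real FDR results):
sara_sites = ["IHEP", "ITEP", "JINR", "NIKHEF", "PNPI", "SINP"]
fractions = {"IHEP": 0.99, "ITEP": 1.0, "JINR": 1.0, "NIKHEF": 0.97,
             "PNPI": 0.96, "SINP": None}
print("SARA cloud passes:", cloud_success(fractions, sara_sites))
```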



10/03/2008 A.Minaenko 22

Structure of ATLAS data used for physics analysis

• The streaming of ATLAS data is under discussion now and a final decision has not been taken yet

• Streaming is based on the trigger decision, and the assignment of a given event to a stream cannot change over time (it does not depend on offline procedures)

• There will be 4-7 RAW/ESD physics streams

• One or a few AOD streams per ESD stream, with about 10 final AOD streams in total

• There are two possible types of streaming (see the sketch after this list):
  – Inclusive streaming: one and the same event can be assigned to several streams if it has the corresponding trigger signatures
  – Exclusive streaming: a given event can be assigned to only one stream; if its signatures would permit assigning it to more than one stream, it goes to a special overlap stream

• At present inclusive streaming is considered preferable

• A given DPD is intended for a given type (or types) of analysis and can collect events from different streams. A DPD contains only the set of events needed for a given analysis and only the needed part of the event information

• Physics analysis will be carried out using the AOD streams and (mainly) different DPDs, including specific user-created formats (such as ROOT trees)
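The difference between the two streaming types can be illustrated with a small sketch: under inclusive streaming an event is copied to every stream whose trigger fired, while under exclusive streaming a multi-trigger event goes to a single dedicated overlap stream. The trigger and stream names below are purely illustrative, not the actual ATLAS streaming configuration.

```python
# Illustration of inclusive vs exclusive streaming based on trigger decisions.
# Stream names and the trigger-to-stream mapping are purely illustrative.

TRIGGER_TO_STREAM = {
    "EF_e25i": "Egamma",
    "EF_mu20": "Muon",
    "EF_j160": "Jet",
}

def inclusive_streams(fired_triggers):
    """One and the same event may be assigned to several streams."""
    return sorted({TRIGGER_TO_STREAM[t] for t in fired_triggers if t in TRIGGER_TO_STREAM})

def exclusive_stream(fired_triggers):
    """An event goes to exactly one stream; multi-stream events go to 'Overlap'."""
    streams = inclusive_streams(fired_triggers)
    if not streams:
        return None
    return streams[0] if len(streams) == 1 else "Overlap"

event = ["EF_e25i", "EF_mu20"]                   # an event that fired two triggers
print("inclusive:", inclusive_streams(event))    # ['Egamma', 'Muon']
print("exclusive:", exclusive_stream(event))     # 'Overlap'
```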


10/03/2008 A.Minaenko 23

Possible scenarios of data distribution and analysis in RuTier-2

• Scenario A: a given AOD stream (or DPD) is kept entirely at a single Tier-2 site:

  – advantage: easy to realize technically with the present ATLAS DDM and analysis tools

  – disadvantage: very hard to ensure a uniform CPU load. At some sites (those holding "popular" data) the CPUs will be overloaded, while at others there will be idle CPUs

• Scenario B: each AOD stream (large DPD) is split among all the sites:
  – advantage: uniform CPU load

  – disadvantage: i) possible difficulties with subscriptions providing an automated splitting of the data (?); ii) will analysis grid sub-jobs be able to find the sites holding the needed data (?)

• From the functionality point of view Scenario B is preferable, but the question is whether the existing ATLAS tools permit realizing this scenario (the present answer is yes, but it needs to be tested in practice)

• AOD and DPD are to be distributed among the participating sites in proportion to their CPU (kSI2k); a minimal sketch of such a split follows below
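As a minimal sketch of the proportional distribution envisaged here, the snippet below splits a list of dataset files among the RuTier-2 sites in proportion to their CPU capacity, using the kSI2k figures from the resource table on slide 4; the splitting logic itself is illustrative, not an existing DDM feature, and the file names are dummies.

```python
# Sketch of the proportional distribution: split a dataset's files among
# the RuTier-2 sites in proportion to their CPU capacity (kSI2k figures
# from slide 4). The splitting logic is illustrative, not a DDM feature.

SITE_CPU_KSI2K = {"IHEP": 260, "ITEP": 250, "JINR": 670,
                  "KI": 1000, "PNPI": 280, "SINP": 280}

def split_by_cpu(files, site_cpu=SITE_CPU_KSI2K):
    """Split a file list into contiguous chunks proportional to CPU share."""
    total_cpu = sum(site_cpu.values())
    sites = sorted(site_cpu)
    assignment, start = {}, 0
    for i, site in enumerate(sites):
        if i == len(sites) - 1:
            end = len(files)                      # last site takes the rest
        else:
            end = start + round(len(files) * site_cpu[site] / total_cpu)
        assignment[site] = files[start:end]
        start = end
    return assignment

files = [f"AOD.stream._{i:05d}.pool.root" for i in range(1000)]  # dummy names
for site, chunk in sorted(split_by_cpu(files).items()):
    print(f"{site:5s}: {len(chunk):4d} files ({100 * len(chunk) / len(files):.1f}%)")
```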