USATLAS SC4
2
The same host name for a dual-NIC dCache door (subnets 130.199.48.0 and 130.199.185.0) is resolved to different IP addresses depending on which DNS server is queried.
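This split-horizon idea can be illustrated with a toy resolver (purely a sketch; the host name and addresses below are hypothetical placeholders, and a real deployment implements this with two DNS server views, not application code):

```python
# Toy illustration of split-horizon DNS: one name, two answers,
# depending on which "view" (internal or external) serves the query.
# The hostname and addresses are made-up examples.
VIEWS = {
    "internal": {"door1.example.bnl.gov": "130.199.185.10"},
    "external": {"door1.example.bnl.gov": "192.12.15.10"},
}

def resolve(view, hostname):
    """Return the A record the given DNS view would hand back."""
    return VIEWS[view][hostname]

print(resolve("internal", "door1.example.bnl.gov"))  # 130.199.185.10
print(resolve("external", "door1.example.bnl.gov"))  # 192.12.15.10
```

Clients inside the facility see the internal interface; WAN clients see the external one, without the door needing two host names.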
3
4
BNL Tier 1 WAN Storage Interfaces and Logic View
[Diagram: logical connections among the HPSS Mass Storage System, NFS RAID (20 TB), GridFTP (2 nodes / 0.8 TB local) with HRM SRM (1 node), dCache SRM (1 node), GridFTP doors (4 nodes), dCache write pool (10 nodes / 2.1 TB RAID5), and read pool (314 nodes / 145 TB). WAN: 2x10 Gb/s, with an LHC OPN VLAN (2 x 1 Gb/s, 1 Gb/s) and Tier 1 VLANs (5 x 1 Gb/s); internal links at 20 Gb/s, 4 x 1 Gb/s, and N x 1 Gb/s.]
5
SC4 Throughput Phase
6
SC4 Throughput Phase Summary
All data transfers bypassed the BNL firewall for high performance.
BNL achieved or exceeded the BNL USATLAS Tier 2 MoU target for data transfer, making it one of the best WLCG Tier 1 sites.
We gained experience serving USATLAS production and the Service Challenge on the same dCache system simultaneously; BNL is the only Tier 1 site doing this.
Identified several performance bottlenecks in the stack of USATLAS data management and data transfer components (Panda, DQ2, FTS, dCache, network) which can impact both SC4 and USATLAS production.
Fixed the dCache bottleneck by separating core services onto multiple high-performance hosts, creating dedicated resources for multiple ATLAS data transfer activities, and tuning memory, the file system, and the database.
Evaluated the new dCache release.
7
SC4 Service Phase (All ATLAS Tier 1 Sites)
8
SC4 Service Phase (DQ2 monitoring)
9
SC4 Service Phase Summary
DQ2 coordinated the data transfers.
BNL provided tape storage for RAW data export and disk-only storage for ESD and AOD.
dCache was significantly improved compared with the throughput phase (thanks to the lessons learned in April). It can easily handle the data transfer requirements of Panda OSG production and SC4 ATLAS Tier 0 export.
The data transfer channels from the other Tier 1 sites (except CNAF) to BNL were verified by Hiro Ito (BNL DDM operation). (CNAF was upgrading their SRM.)
The ATLAS DDM coordinated data transfers between USATLAS Tier 2 and Tier 1 are well ahead of schedule (thanks to OSG Panda production).
We verified the integrated data flow of ATLAS Tier 0 AOD export from Tier 0 to BNL, then to the US Midwest Tier 2 site.
The data flow between BNL and CERN uses the LHCOPN, which provides a 10 Gbps Layer 2 network connection between NYC and CERN. No STARLight is involved.
10
11
Meeting Notes
Use dual-homed dCache doors.
The external interfaces of the doors are in 192.12.15.0.
The internal interfaces of the doors are in 130.199.185.0.
The data flow (in/out) will always go through the doors.
Use external/internal DNS to resolve the same host name of the doors to the external or internal IP address, determined by which DNS is used.
Bring the routing for 130.199.185.0 and 130.199.48.0/23 back to USATLAS SW7.
Request an ACL for VLAN 315(?) where 192.12.15.0 resides. One end: LHC OPN address blocks or the 3+2 Tier 2s. The other end will be 192.12.15.0.
What about other Tier 3 sites contacting the external interface of the dCache doors? Do they need to go through the firewall or not?
Two types of storage (Durable and Permanent): when we receive ESD2, ESD1 will be discarded. Therefore we do not need to save ESD to HPSS; if we need it, we can get it from other Tier 0 and Tier 1 sites.
RAW, our fraction of ESD, AOD, and Tier 2 simulation results => Permanent storage, which has a tape backend.
Other ESD and AOD will go to durable storage, which is not necessarily backed by the tape system.
12
BNL SC4 Plans
Can VLAN 315 send network traffic?
FTS and LFC will be set up.
LCG 2.7.0:
VObox: we also installed ATLAS DQ2 on top of it (done)
BDII: provides static and dynamic monitoring information (static setup?)
R-GMA: provides traffic monitoring from Tier 1 to Tier 2 (plan to make it available before the SC4 Service Phase)
CE: based on the BNL Condor system (plan to be ready before the SC4 Service Phase in June)
lcg-utils (done)
dCache preparation (Durable, Permanent, information publishing):
Permanent: the system manages the cache, with a tape copy; access is sometimes slow
Durable: the user (VO) manages the cache, without a tape copy; access is fast
13
Publish Information for BNL dCache
The list of transfer protocols per SE is available from the information system; the SRM knows what it supports and can inform the client.
FTS channel information.
LFC information.

dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain, ...
[...]
GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable
[...]
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
[...]
dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain, ...
[...]
GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
[...]
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
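A client choosing between durable and permanent space could read it from these GlueSA attributes; the following is a minimal, simplified sketch that parses a flat `key: value` LDIF fragment like the one above (real BDII output carries more attributes and is fetched over LDAP, not from a string):

```python
# Simplified sketch: map each GlueSAType to its GlueSAPath from an
# LDIF-like fragment with blank-line-separated entries (assumption:
# one "key: value" pair per line, no continuation lines).
LDIF = """\
dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable

dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
"""

def sa_paths(ldif):
    """Return {GlueSAType: GlueSAPath} across all entries."""
    result = {}
    for entry in ldif.strip().split("\n\n"):
        attrs = dict(line.split(": ", 1) for line in entry.splitlines())
        result[attrs["GlueSAType"]] = attrs["GlueSAPath"]
    return result

print(sa_paths(LDIF)["durable"])  # /pnfs/my_domain/durable-path/dteam
```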
14
SC4 Pre-Production System
The pre-production service will be used as soon as it is available, and its usage won't go away when SC4 starts. There may be periods where the pre-production service is not extensively used, but the goal is from now on to always develop against the pre-production service.
Site    | VOBOX                | LFC       | SRM | FTS server | Channels
Tier-0  | Y (shared with PROD) | Y (local) | Y   | Y          | To and from all T1s
Tier-1  | Y (shared with PROD) | Y (local) | Y   | Y          | To and from all the other T1s; to and from all associated T2s
Tier-2  | N                    | N         | Y   | N          | N
15
SC4 April Throughput
Need dCache!!!
April 3rd (Monday) - April 13th (Thursday before Easter): sustain an average daily rate to each Tier 1 at or above the full nominal rate (200 MB/s).
We should continue to run at the same rates unattended over the Easter weekend (April 14 - 16).
Tuesday April 18th - Monday April 24th: perform the tape tests at the rates in the table below (75 MB/s).
From after the con-call on Monday April 24th until the end of the month, experiment-driven transfers can be scheduled. (LFC will be needed by then for DQ2.)
16
SC4 Tier 1 to Tier 1 Data Transfer (May)
Within each VO, the details of the T1<->T1 transfers still need to be finalized. A "dTeam" phase should be foreseen to ensure that the basic infrastructure is set up; similarly for T1->T2. A possible scenario follows:
We have to focus first on our two sister Tier 1 sites: IN2P3 and FZK.
All Tier 1s need to set up an FTS service and configure channels to enable transfers to/from all other Tier 1s.
dTeam transfers at 5 MB/s (10 MB/s?) need to be demonstrated between each T1 and all other T1s.
These tests would take place during May, after the April throughput tests and before the SC4 service begins in June.
17
ATLAS Specific Plan
Plans (ATLAS)
Tier 2 Plans
Tier 2 Workshop
Background Information (Dario's Slides)
18
Summary of requests from ATLAS
March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
April-May (pre-SC4): tests of distributed operations on a "small" testbed (PPS)
Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720 MB/s + full ESD to BNL), and send AODs to (at least) a few Tier-2s
3 weeks in July: distributed processing tests (Part 1)
2 weeks in July-August: distributed analysis tests (Part 1)
3-4 weeks in September-October: Tier-0 test (Phase 2 of Part 1) with data to Tier-2s
3 weeks in October: distributed processing tests (Part 2)
3-4 weeks in November: distributed analysis tests (Part 2)
19
Tier 2 Plans
Details of involving Tier 2 sites are in planning too.
Tier 2 dCache: dCache needs to be stabilized and operational at one or all of the Midwest, Southwest, and Northwest sites (first week of June) for receiving AODs to (at least) a few Tier-2s.
All Tier 2 dCache instances should be up and in production in September; extend data distribution to all (most) Tier-2s; use 3D tools to distribute calibration data.
Baseline client tools should be deployed at Tier 2 centers.
No other services are required for Tier 2 except SRM and DQ2.
20
WLCG Tier 2 Workshop
https://twiki.cern.ch/twiki/bin/view/LCG/WorkshopAndTutorials
http://indico.cern.ch/conferenceDisplay.py?confId=1148&view=egee_meeting&showDate=all&showSession=all&detailLevel=contribution
From Monday 12 June 2006 (11:00) to Wednesday 14 June 2006 (18:00) at CERN (Council Chamber)
Four Experiment Activities Introduction.
MC Simulation User Cases
An Overview of Calibration & Alignment
Analysis Use Cases
Services Required at / for Tier2s (Grid, Application).
Support and Operation Issues.
It takes place in mid-June.
ATLAS plans for 2006:
Computing System Commissioning
and Service Challenge 4
Dario Barberis
CERN & Genoa University
22
Computing System Commissioning Goals
The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
Calibration and alignment procedures and conditions DB
Full trigger chain
Event reconstruction and data distribution
Distributed access to the data for analysis
At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates
23
ATLAS Computing Model
Tier-0:
Copy RAW data to Castor tape for archival
Copy RAW data to Tier-1s for storage and reprocessing
Run first-pass calibration/alignment (within 24 hrs)
Run first-pass reconstruction (within 48 hrs)
Distribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s
Tier-1s:
Store and take care of a fraction of RAW data
Run "slow" calibration/alignment procedures
Rerun reconstruction with better calib/align and/or algorithms
Distribute reconstruction output to Tier-2s
Keep current versions of ESDs and AODs on disk for analysis
Tier-2s:
Run simulation
Keep current versions of AODs on disk for analysis
24
ATLAS Tier-0 Data Flow
[Diagram: data flow from the EF (Event Filter) through the Castor buffer to the CPU farm, tape, and Tier-1s. Per-stream figures:]
RAW:  1.6 GB/file, 0.2 Hz, 17K files/day, 320 MB/s, 27 TB/day
ESD:  0.5 GB/file, 0.2 Hz, 17K files/day, 100 MB/s, 8 TB/day
AOD:  10 MB/file, 2 Hz, 170K files/day, 20 MB/s, 1.6 TB/day
AODm: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day
[Flow labels in the diagram: RAW; AOD; RAW + ESD (2x) + AODm (10x); RAW + ESD + AODm; tape. Aggregate rates: 0.44 Hz, 37K files/day, 440 MB/s; 1 Hz, 85K files/day, 720 MB/s; 0.4 Hz, 190K files/day, 340 MB/s; 2.24 Hz, 170K files/day (temp), 20K files/day (perm), 140 MB/s.]
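Each stream's throughput figures are just file size times file rate; a quick arithmetic check (assuming 1 GB = 1000 MB and an 86,400-second day):

```python
def stream_rates(file_size_mb, rate_hz):
    """Derive MB/s, files/day, and TB/day from file size and file rate."""
    mb_per_s = file_size_mb * rate_hz
    files_per_day = rate_hz * 86400
    tb_per_day = mb_per_s * 86400 / 1e6
    return mb_per_s, files_per_day, tb_per_day

# RAW stream: 1.6 GB/file at 0.2 Hz
mb_s, f_day, tb_day = stream_rates(1600, 0.2)
print(mb_s, round(f_day), round(tb_day, 1))  # 320 MB/s, 17280 files/day, ~27.6 TB/day
```

The slide rounds 17,280 files/day to "17K" and 27.6 TB/day to 27.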
25
Recent Update for Tier 0 to Tier 1 Data Transfer

Location     Fract. (%)  RAW    ESD    AODm1  Total rate
Brookhaven   24          76.8   100    20     196.8
Amsterdam    13          41.6   26     20     87.6
Lyon         13.5        43.2   27     20     90.2
Karlsruhe    10.5        33.6   21     20     74.6
Didcot       7.5         24     15     20     59
Taipei       7.7         24.6   15.4   20     60
Bologna      7.5         24     15     20     59
Distributed  5.5         17.6   11     20     48.6
Barcelona    5.5         17.6   11     20     48.6
Vancouver    5.3         17     10.6   20     47.6
Total        100         320    252    200    772
(All rates in MB/s.)
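Each site's row follows from its fraction of the 320 MB/s RAW stream, its ESD share, and a flat 20 MB/s of AODm1; a sketch of the arithmetic for the Brookhaven row (values taken from the table):

```python
TOTAL_RAW = 320.0  # full Tier-0 RAW export rate, MB/s

def site_rate(fraction_pct, esd_mb_s, aodm1_mb_s=20.0):
    """RAW share and total rate (MB/s) for one Tier-1 site."""
    raw = TOTAL_RAW * fraction_pct / 100
    return raw, raw + esd_mb_s + aodm1_mb_s

# Brookhaven: 24% fraction, full ESD copy (100 MB/s)
raw, total = site_rate(24, 100)
print(raw, total)  # 76.8 196.8
```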
26
BNL Data Flow (2008, Based on a 20% Share)
[Diagram: data flows among Tier-0, other Tier-1s, Tier-2s, the BNL disk buffer, disk storage, BNL tape, and the CPU farm. Per-stream figures as shown:]
RAW:   1.6 GB/file, 0.04 Hz, 3.4K f/day, 64 MB/s, 5.4 TB/day (also written to BNL tape at the same rate)
ESD2:  0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day
AOD2:  10 MB/file, 0.4 Hz, 34K f/day, 4 MB/s, 0.32 TB/day
AODm2: 500 MB/file, 0.004 Hz, 0.34K f/day, 4 MB/s, 0.32 TB/day
RAW + ESD2 + AODm2 combined: 0.088 Hz, 7.48K f/day, 88 MB/s, 7.32 TB/day
[Other streams shown: AODm2 500 MB/file, 0.008 Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day; ESD2 0.5 GB/file, 0.02 Hz, 1.7K f/day, 80 MB/s, 0.8 TB/day; AODm2 500 MB/file, 0.03 Hz, 3.0K f/day, 16 MB/s, 1.44 TB/day; ESD2 0.5 GB/file, 0.02 Hz, 1.7K f/day, 20 MB/s, 1.6 TB/day; AODm2 500 MB/file, 0.036 Hz, 3.1K f/day, 4*9 MB/s, 1.44 TB/day; ESD1 0.5 GB/file, 0.2 Hz, 17K f/day, 100 MB/s, 8 TB/day; AODm1 500 MB/file, 0.04 Hz, 3.4K f/day, 20 MB/s (*3), 1.6 TB/day; AODm2 500 MB/file, 0.008 Hz, 0.70K f/day, 4 MB/s (*3), 0.32 TB/day; 234 MB * n (analysis).]
Plus simulation & analysis data flow. Real data storage, reprocessing, and distribution.
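A sketch of where the headline 88 MB/s Tier-0 to BNL figure comes from: the assumed 20% share applied to the nominal Tier-0 export streams (320/100/20 MB/s for RAW/ESD/AODm, from the Tier-0 data-flow slide):

```python
SHARE = 0.20  # assumed BNL fraction of the full Tier-0 export
# nominal Tier-0 export rates in MB/s
full = {"RAW": 320, "ESD": 100, "AODm": 20}
bnl = {stream: rate * SHARE for stream, rate in full.items()}
print(bnl)                # RAW 64, ESD 20, AODm 4 MB/s
print(sum(bnl.values()))  # 88 MB/s in total, matching the slide
```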
27
BNL to 3+2 Tier 2s (Estimate!)
See https://uimon.cern.ch/twiki/bin/view/Atlas/Tier1DataFlow
Tier 1 to Tier 2 traffic is likely to be very bursty and driven by analysis demands.
Network capacity to the Tier 2s is expected to be a fraction of 10 Gbps (UC: 30% of 10 Gbps is allocated; opportunistic usage may bump it up to 10 Gbps).
Desire to reach 100 MB/s for each of the 3+2 Tier 2 clusters.
300 MB/s ~ 500 MB/s in total to BNL.
Tier 2 to Tier 1 transfers are almost entirely continuous simulation transfers.
The aggregate input rate to the Tier 1 center is comparable to 20%~25% of the rate from Tier 0.
28
BNL Data Flow (2008)
[Diagram: Tier-0 -> Tier-1 flow through the BNL write buffer, read storage, CPU farm, BNL tape, and Tier-2s. Figures shown: ESD1 100 MB/s, AODM1 20 MB/s, RAW 64 MB/s into the write buffer; ESD2 80 MB/s (80% est. from T1s) and AODM2 16 MB/s from other Tier-1s; ESD2 20 MB/s and AODM2 36 MB/s exchanged with Tier-2s; to BNL tape: AODm2 500 MB/file, 0.008 Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day and ESD2 0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day; aggregate labels: 88 MB/s (RAW, ESD, AOD), 350 MB/s (including raw data), 500 MB/s (analysis AOD), 200 MB/s (?), (304 MB/s * 20%) ~ 60 MB/s simulation, 60 MB/s (Tier 2).]
29
ATLAS SC4 Tests
Complete Tier-0 test:
Internal data transfer from the "Event Filter" farm to the Castor disk pool, Castor tape, and CPU farm
Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
Transfer of RAW, ESD, AOD and TAG data to Tier-1s
Transfer of AOD and TAG data to Tier-2s via Tier-1s
Data and dataset registration in DB (add meta-data information to the meta-data DB)
Distributed production:
Full simulation chain run at Tier-2s (and Tier-1s)
Data distribution to Tier-1s, other Tier-2s and CAF
Reprocessing of raw data at Tier-1s
Data distribution to other Tier-1s, Tier-2s and CAF
Distributed analysis:
"Random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
Tests of performance of job submission, distribution and output retrieval
30
ATLAS SC4 Plans (1)
Tier-0 data flow tests:
Phase 0: 3-4 weeks in March-April for internal Tier-0 tests
Phase 1: last 3 weeks of June with data distribution to Tier-1s
Run integrated data flow tests using the SC4 infrastructure for data distribution
Send AODs to (at least) a few Tier-2s
Automatic operation for O(1 week)
First version of the shifter's interface tools
Treatment of error conditions
Phase 2: 3-4 weeks in September-October
Extend data distribution to all (most) Tier-2s
Use 3D tools to distribute calibration data
31
ATLAS SC4 Plans (2)
ATLAS includes continuous distributed simulation productions (Kaushik).
SC4: distributed reprocessing tests:
Test of the computing model using the SC4 data management infrastructure
Needs file transfer capabilities between Tier-1s and back to the CERN CAF
Also distribution of conditions data to Tier-1s (3D)
Storage management is also an issue
Could use 3 weeks in July and 3 weeks in October
SC4: distributed simulation intensive tests:
Once the reprocessing tests are OK, we can use the same infrastructure to implement our computing model for simulation productions, as they would use the same setup both from our ProdSys and from the SC4 side
First separately, then concurrently.
32
Overview of requirements for SC4
SRM ("baseline version") on all storage
VO Box per Tier-1 and at Tier-0
LFC server per Tier-1 and at Tier-0
FTS server per Tier-1 and at Tier-0
Permanent storage and durable storage:
Separate SRM entry points for permanent and durable storage.
Disk space is managed by DQ2.
Counts as online ("disk") data in the ATLAS Computing Model
Ability to install FTS ATLAS VO agents on the Tier-1 and Tier-0 VO Boxes
Ability to deploy DQ2 services on the VO Box as during SC3
No new requirements on the Tier-2s besides an SRM SE
33
Overview of FTS and VO Box
Hence, an ATLAS VO Box will contain:
FTS ATLAS agents
The remaining DQ2 persistent services (less s/w than for SC3, as some functionality merged into FTS in the form of FTS VO agents)
DQ2 site services will have associated SFTs for testing
34
ATLAS SC4 Requirement (PPS)
A small testbed with (part of) CERN, a few Tier-1s and a few Tier-2s to test our distributed systems (ProdSys, DDM, DA) prior to deployment.
It would allow testing new m/w features without disturbing other operations.
We could also properly tune the operations on our side.
The aim is to get to the agreed scheduled time slots with an already tested system and really use the available time for relevant scaling tests.
This setup would not interfere with concurrent large-scale tests or data transfers run by other experiments.