USATLAS SC4
2
The same host name for a dual-NIC dCache door (subnets 130.199.48.0 and 130.199.185.0) is resolved to different IP addresses depending on which DNS server is queried.
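This split-horizon idea can be illustrated with a toy resolver (purely a sketch; the host name and addresses below are hypothetical placeholders, and a real deployment implements this with two DNS server views, not application code):

```python
# Toy illustration of split-horizon DNS: one name, two answers,
# depending on which "view" (internal or external) serves the query.
# The hostname and addresses are made-up examples.
VIEWS = {
    "internal": {"door1.example.bnl.gov": "130.199.185.10"},
    "external": {"door1.example.bnl.gov": "192.12.15.10"},
}

def resolve(view, hostname):
    """Return the A record the given DNS view would hand back."""
    return VIEWS[view][hostname]

print(resolve("internal", "door1.example.bnl.gov"))  # 130.199.185.10
print(resolve("external", "door1.example.bnl.gov"))  # 192.12.15.10
```

Clients inside the facility see the internal interface; WAN clients see the external one, without the door needing two host names.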
3
4
BNL Tier 1 WAN Storage Interfaces and Logic View
[Diagram: logical connections among the HPSS Mass Storage System, NFS RAID (20 TB), GridFTP (2 nodes / 0.8 TB local) with HRM SRM (1 node), dCache SRM (1 node), GridFTP doors (4 nodes), dCache write pool (10 nodes / 2.1 TB RAID5), and read pool (314 nodes / 145 TB). WAN: 2x10 Gb/s, with an LHC OPN VLAN (2 x 1 Gb/s, 1 Gb/s) and Tier 1 VLANs (5 x 1 Gb/s); internal links at 20 Gb/s, 4 x 1 Gb/s, and N x 1 Gb/s.]
5
SC4 Throughput Phase
6
SC4 Throughput Phase Summary
All data transfers bypassed the BNL firewall for high performance.
BNL achieved or exceeded the BNL USATLAS Tier 2 MoU target for data transfer, making it one of the best WLCG Tier 1 sites.
We gained experience serving USATLAS production and the Service Challenge on the same dCache system simultaneously; BNL is the only Tier 1 site doing this.
Identified several performance bottlenecks in the stack of USATLAS data management and data transfer components (Panda, DQ2, FTS, dCache, network) which can impact both SC4 and USATLAS production.
Fixed the dCache bottleneck by separating core services onto multiple high-performance hosts, creating dedicated resources for multiple ATLAS data transfer activities, and tuning memory, the file system, and the database.
Evaluated the new dCache release.
7
SC4 Service Phase (All ATLAS Tier 1 Sites)
8
SC4 Service Phase (DQ2 monitoring)
9
SC4 Service Phase Summary
DQ2 coordinated the data transfers.
BNL provided tape storage for RAW data export and disk-only storage for ESD and AOD.
dCache was significantly improved compared with the throughput phase (thanks to the lessons learned in April). It can easily handle the data transfer requirements of Panda OSG production and SC4 ATLAS Tier 0 export.
The data transfer channels from the other Tier 1 sites (except CNAF) to BNL were verified by Hiro Ito (BNL DDM operation). (CNAF was upgrading their SRM.)
The ATLAS DDM coordinated data transfers between USATLAS Tier 2 and Tier 1 are well ahead of schedule (thanks to OSG Panda production).
We verified the integrated data flow of ATLAS Tier 0 AOD export from Tier 0 to BNL, then to the US Midwest Tier 2 site.
The data flow between BNL and CERN uses the LHCOPN, which provides a 10 Gbps Layer 2 network connection between NYC and CERN. No STARLight is involved.
10
11
Meeting Notes
Use dual-homed dCache doors.
The external interfaces of the doors are in 192.12.15.0.
The internal interfaces of the doors are in 130.199.185.0.
The data flow (in/out) will always go through the doors.
Use external/internal DNS to resolve the same host name of the doors to the external or internal IP address, determined by which DNS is used.
Bring the routing for 130.199.185.0 and 130.199.48.0/23 back to USATLAS SW7.
Request an ACL for VLAN 315(?) where 192.12.15.0 resides. One end: LHC OPN address blocks or the 3+2 Tier 2s. The other end will be 192.12.15.0.
What about other Tier 3 sites contacting the external interface of the dCache doors? Do they need to go through the firewall or not?
Two types of storage (Durable and Permanent): when we receive ESD2, ESD1 will be discarded. Therefore we do not need to save ESD to HPSS; if we need it, we can get it from other Tier 0 and Tier 1 sites.
RAW, our fraction of ESD, AOD, and Tier 2 simulation results => Permanent storage, which has a tape backend.
Other ESD and AOD will go to durable storage, which is not necessarily backed by the tape system.
12
BNL SC4 Plans
Can VLAN 315 send network traffic?
FTS and LFC will be set up.
LCG 2.7.0:
VObox: we also installed ATLAS DQ2 on top of it (done)
BDII: provides static and dynamic monitoring information (static setup?)
R-GMA: provides traffic monitoring from Tier 1 to Tier 2 (plan to make it available before the SC4 Service Phase)
CE: based on the BNL Condor system (plan to be ready before the SC4 Service Phase in June)
lcg-utils (done)
dCache preparation (Durable, Permanent, information publishing):
Permanent: the system manages the cache, with a tape copy; access is sometimes slow
Durable: the user (VO) manages the cache, without a tape copy; access is fast
13
Publish Information for BNL dCache
The list of transfer protocols per SE is available from the information system; the SRM knows what it supports and can inform the client.
FTS channel information.
LFC information.

dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain, ...
[...]
GlueSARoot: dteam:/pnfs/my_domain/durable-path/dteam
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable
[...]
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
[...]
dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain, ...
[...]
GlueSARoot: dteam:/pnfs/my_domain/permanent-path/dteam
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
[...]
GlueChunkKey: GlueSEUniqueID=dcache.my_domain
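A client choosing between durable and permanent space could read it from these GlueSA attributes; the following is a minimal, simplified sketch that parses a flat `key: value` LDIF fragment like the one above (real BDII output carries more attributes and is fetched over LDAP, not from a string):

```python
# Simplified sketch: map each GlueSAType to its GlueSAPath from an
# LDIF-like fragment with blank-line-separated entries (assumption:
# one "key: value" pair per line, no continuation lines).
LDIF = """\
dn: GlueSALocalID=dteam-durable,GlueSEUniqueID=dcache.my_domain
GlueSAPath: /pnfs/my_domain/durable-path/dteam
GlueSAType: durable

dn: GlueSALocalID=dteam-permanent,GlueSEUniqueID=dcache.my_domain
GlueSAPath: /pnfs/my_domain/permanent-path/dteam
GlueSAType: permanent
"""

def sa_paths(ldif):
    """Return {GlueSAType: GlueSAPath} across all entries."""
    result = {}
    for entry in ldif.strip().split("\n\n"):
        attrs = dict(line.split(": ", 1) for line in entry.splitlines())
        result[attrs["GlueSAType"]] = attrs["GlueSAPath"]
    return result

print(sa_paths(LDIF)["durable"])  # /pnfs/my_domain/durable-path/dteam
```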
14
SC4 Pre-Production System
The pre-production service will be used as soon as it is available, and its usage won't go away when SC4 starts. There may be periods where the pre-production service is not extensively used, but the goal is from now on to always develop against the pre-production service.
Site    | VOBOX                | LFC       | SRM | FTS server | Channels
Tier-0  | Y (shared with PROD) | Y (local) | Y   | Y          | To and from all T1s
Tier-1  | Y (shared with PROD) | Y (local) | Y   | Y          | To and from all the other T1s; to and from all associated T2s
Tier-2  | N                    | N         | Y   | N          | N
15
SC4 April Throughput
Need dCache!!!
April 3rd (Monday) - April 13th (Thursday before Easter): sustain an average daily rate to each Tier 1 at or above the full nominal rate (200 MB/s).
We should continue to run at the same rates unattended over the Easter weekend (April 14 - 16).
Tuesday April 18th - Monday April 24th: perform the tape tests at the rates in the table below (75 MB/s).
From after the con-call on Monday April 24th until the end of the month, experiment-driven transfers can be scheduled. (LFC will be needed by then for DQ2.)
16
SC4 Tier 1 to Tier 1 Data Transfer (May)
Within each VO, the details of the T1<->T1 transfers still need to be finalized. A "dTeam" phase should be foreseen to ensure that the basic infrastructure is set up; similarly for T1->T2. A possible scenario follows:
We have to focus first on our two sister Tier 1 sites: IN2P3 and FZK.
All Tier 1s need to set up an FTS service and configure channels to enable transfers to/from all other Tier 1s.
dTeam transfers at 5 MB/s (10 MB/s?) need to be demonstrated between each T1 and all other T1s.
These tests would take place during May, after the April throughput tests and before the SC4 service begins in June.
17
ATLAS Specific Plan
Plans (ATLAS)
Tier 2 Plans
Tier 2 Workshop
Background Information (Dario's Slides)
18
Summary of requests from ATLAS
March-April (pre-SC4): 3-4 weeks for internal Tier-0 tests (Phase 0)
April-May (pre-SC4): tests of distributed operations on a "small" testbed (PPS)
Last 3 weeks of June: Tier-0 test (Phase 1) with data distribution to Tier-1s (720 MB/s + full ESD to BNL), and send AODs to (at least) a few Tier-2s
3 weeks in July: distributed processing tests (Part 1)
2 weeks in July-August: distributed analysis tests (Part 1)
3-4 weeks in September-October: Tier-0 test (Phase 2 of Part 1) with data to Tier-2s
3 weeks in October: distributed processing tests (Part 2)
3-4 weeks in November: distributed analysis tests (Part 2)
19
Tier 2 Plans
Details of involving Tier 2 sites are in planning too.
Tier 2 dCache: dCache needs to be stabilized and operational at one or all of the Midwest, Southwest, and Northwest sites (first week of June) for receiving AODs to (at least) a few Tier-2s.
All Tier 2 dCache instances should be up and in production in September; extend data distribution to all (most) Tier-2s; use 3D tools to distribute calibration data.
Baseline client tools should be deployed at Tier 2 centers.
No other services are required for Tier 2 except SRM and DQ2.
20
WLCG Tier 2 Workshop
https://twiki.cern.ch/twiki/bin/view/LCG/WorkshopAndTutorials
http://indico.cern.ch/conferenceDisplay.py?confId=1148&view=egee_meeting&showDate=all&showSession=all&detailLevel=contribution
From Monday 12 June 2006 (11:00) to Wednesday 14 June 2006 (18:00) at CERN (Council Chamber)
Four Experiment Activities Introduction.
MC Simulation User Cases
An Overview of Calibration & Alignment
Analysis Use Cases
Services Required at / for Tier2s (Grid, Application).
Support and Operation Issues.
It takes place in mid-June.
ATLAS plans for 2006:
Computing System Commissioning
and Service Challenge 4
Dario Barberis
CERN & Genoa University
22
Computing System Commissioning Goals
The main aim of Computing System Commissioning will be to test the software and computing infrastructure that we will need at the beginning of 2007:
Calibration and alignment procedures and conditions DB
Full trigger chain
Event reconstruction and data distribution
Distributed access to the data for analysis
At the end (autumn-winter 2006) we will have a working and operational system, ready to take data with cosmic rays at increasing rates
23
ATLAS Computing Model
Tier-0:
Copy RAW data to Castor tape for archival
Copy RAW data to Tier-1s for storage and reprocessing
Run first-pass calibration/alignment (within 24 hrs)
Run first-pass reconstruction (within 48 hrs)
Distribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s
Tier-1s:
Store and take care of a fraction of RAW data
Run "slow" calibration/alignment procedures
Rerun reconstruction with better calib/align and/or algorithms
Distribute reconstruction output to Tier-2s
Keep current versions of ESDs and AODs on disk for analysis
Tier-2s:
Run simulation
Keep current versions of AODs on disk for analysis
24
ATLAS Tier-0 Data Flow
[Diagram: data flow from the EF (Event Filter) through the Castor buffer to the CPU farm, tape, and Tier-1s. Per-stream figures:]
RAW:  1.6 GB/file, 0.2 Hz, 17K files/day, 320 MB/s, 27 TB/day
ESD:  0.5 GB/file, 0.2 Hz, 17K files/day, 100 MB/s, 8 TB/day
AOD:  10 MB/file, 2 Hz, 170K files/day, 20 MB/s, 1.6 TB/day
AODm: 500 MB/file, 0.04 Hz, 3.4K files/day, 20 MB/s, 1.6 TB/day
[Flow labels in the diagram: RAW; AOD; RAW + ESD (2x) + AODm (10x); RAW + ESD + AODm; tape. Aggregate rates: 0.44 Hz, 37K files/day, 440 MB/s; 1 Hz, 85K files/day, 720 MB/s; 0.4 Hz, 190K files/day, 340 MB/s; 2.24 Hz, 170K files/day (temp), 20K files/day (perm), 140 MB/s.]
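Each stream's throughput figures are just file size times file rate; a quick arithmetic check (assuming 1 GB = 1000 MB and an 86,400-second day):

```python
def stream_rates(file_size_mb, rate_hz):
    """Derive MB/s, files/day, and TB/day from file size and file rate."""
    mb_per_s = file_size_mb * rate_hz
    files_per_day = rate_hz * 86400
    tb_per_day = mb_per_s * 86400 / 1e6
    return mb_per_s, files_per_day, tb_per_day

# RAW stream: 1.6 GB/file at 0.2 Hz
mb_s, f_day, tb_day = stream_rates(1600, 0.2)
print(mb_s, round(f_day), round(tb_day, 1))  # 320 MB/s, 17280 files/day, ~27.6 TB/day
```

The slide rounds 17,280 files/day to "17K" and 27.6 TB/day to 27.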
25
Recent Update for Tier 0 to Tier 1 Data Transfer

Location     Fract. (%)  RAW    ESD    AODm1  Total rate
Brookhaven   24          76.8   100    20     196.8
Amsterdam    13          41.6   26     20     87.6
Lyon         13.5        43.2   27     20     90.2
Karlsruhe    10.5        33.6   21     20     74.6
Didcot       7.5         24     15     20     59
Taipei       7.7         24.6   15.4   20     60
Bologna      7.5         24     15     20     59
Distributed  5.5         17.6   11     20     48.6
Barcelona    5.5         17.6   11     20     48.6
Vancouver    5.3         17     10.6   20     47.6
Total        100         320    252    200    772
(All rates in MB/s.)
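Each site's row follows from its fraction of the 320 MB/s RAW stream, its ESD share, and a flat 20 MB/s of AODm1; a sketch of the arithmetic for the Brookhaven row (values taken from the table):

```python
TOTAL_RAW = 320.0  # full Tier-0 RAW export rate, MB/s

def site_rate(fraction_pct, esd_mb_s, aodm1_mb_s=20.0):
    """RAW share and total rate (MB/s) for one Tier-1 site."""
    raw = TOTAL_RAW * fraction_pct / 100
    return raw, raw + esd_mb_s + aodm1_mb_s

# Brookhaven: 24% fraction, full ESD copy (100 MB/s)
raw, total = site_rate(24, 100)
print(raw, total)  # 76.8 196.8
```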
26
BNL Data Flow (2008, Based on a 20% Share)
[Diagram: data flows among Tier-0, other Tier-1s, Tier-2s, the BNL disk buffer, disk storage, BNL tape, and the CPU farm. Per-stream figures as shown:]
RAW:   1.6 GB/file, 0.04 Hz, 3.4K f/day, 64 MB/s, 5.4 TB/day (also written to BNL tape at the same rate)
ESD2:  0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day
AOD2:  10 MB/file, 0.4 Hz, 34K f/day, 4 MB/s, 0.32 TB/day
AODm2: 500 MB/file, 0.004 Hz, 0.34K f/day, 4 MB/s, 0.32 TB/day
RAW + ESD2 + AODm2 combined: 0.088 Hz, 7.48K f/day, 88 MB/s, 7.32 TB/day
[Other streams shown: AODm2 500 MB/file, 0.008 Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day; ESD2 0.5 GB/file, 0.02 Hz, 1.7K f/day, 80 MB/s, 0.8 TB/day; AODm2 500 MB/file, 0.03 Hz, 3.0K f/day, 16 MB/s, 1.44 TB/day; ESD2 0.5 GB/file, 0.02 Hz, 1.7K f/day, 20 MB/s, 1.6 TB/day; AODm2 500 MB/file, 0.036 Hz, 3.1K f/day, 4*9 MB/s, 1.44 TB/day; ESD1 0.5 GB/file, 0.2 Hz, 17K f/day, 100 MB/s, 8 TB/day; AODm1 500 MB/file, 0.04 Hz, 3.4K f/day, 20 MB/s (*3), 1.6 TB/day; AODm2 500 MB/file, 0.008 Hz, 0.70K f/day, 4 MB/s (*3), 0.32 TB/day; 234 MB * n (analysis).]
Plus simulation & analysis data flow. Real data storage, reprocessing, and distribution.
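A sketch of where the headline 88 MB/s Tier-0 to BNL figure comes from: the assumed 20% share applied to the nominal Tier-0 export streams (320/100/20 MB/s for RAW/ESD/AODm, from the Tier-0 data-flow slide):

```python
SHARE = 0.20  # assumed BNL fraction of the full Tier-0 export
# nominal Tier-0 export rates in MB/s
full = {"RAW": 320, "ESD": 100, "AODm": 20}
bnl = {stream: rate * SHARE for stream, rate in full.items()}
print(bnl)                # RAW 64, ESD 20, AODm 4 MB/s
print(sum(bnl.values()))  # 88 MB/s in total, matching the slide
```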
27
BNL to 3+2 Tier 2s (Estimate!)
See https://uimon.cern.ch/twiki/bin/view/Atlas/Tier1DataFlow
Tier 1 to Tier 2 traffic is likely to be very bursty and driven by analysis demands.
Network capacity to the Tier 2s is expected to be a fraction of 10 Gbps (UC: 30% of 10 Gbps is allocated; opportunistic usage may bump it up to 10 Gbps).
Desire to reach 100 MB/s for each of the 3+2 Tier 2 clusters.
300 MB/s ~ 500 MB/s in total to BNL.
Tier 2 to Tier 1 transfers are almost entirely continuous simulation transfers.
The aggregate input rate to the Tier 1 center is comparable to 20%~25% of the rate from Tier 0.
28
BNL Data Flow (2008)
[Diagram: Tier-0 -> Tier-1 flow through the BNL write buffer, read storage, CPU farm, BNL tape, and Tier-2s. Figures shown: ESD1 100 MB/s, AODM1 20 MB/s, RAW 64 MB/s into the write buffer; ESD2 80 MB/s (80% est. from T1s) and AODM2 16 MB/s from other Tier-1s; ESD2 20 MB/s and AODM2 36 MB/s exchanged with Tier-2s; to BNL tape: AODm2 500 MB/file, 0.008 Hz, 0.68K f/day, 4 MB/s, 0.32 TB/day and ESD2 0.5 GB/file, 0.04 Hz, 3.4K f/day, 20 MB/s, 1.6 TB/day; aggregate labels: 88 MB/s (RAW, ESD, AOD), 350 MB/s (including raw data), 500 MB/s (analysis AOD), 200 MB/s (?), (304 MB/s * 20%) ~ 60 MB/s simulation, 60 MB/s (Tier 2).]
29
ATLAS SC4 Tests
Complete Tier-0 test:
Internal data transfer from the "Event Filter" farm to the Castor disk pool, Castor tape, and CPU farm
Calibration loop and handling of conditions data, including distribution of conditions data to Tier-1s (and Tier-2s)
Transfer of RAW, ESD, AOD and TAG data to Tier-1s
Transfer of AOD and TAG data to Tier-2s via Tier-1s
Data and dataset registration in DB (add meta-data information to the meta-data DB)
Distributed production:
Full simulation chain run at Tier-2s (and Tier-1s)
Data distribution to Tier-1s, other Tier-2s and CAF
Reprocessing of raw data at Tier-1s
Data distribution to other Tier-1s, Tier-2s and CAF
Distributed analysis:
"Random" job submission accessing data at Tier-1s (some) and Tier-2s (mostly)
Tests of performance of job submission, distribution and output retrieval
30
ATLAS SC4 Plans (1)
Tier-0 data flow tests:
Phase 0: 3-4 weeks in March-April for internal Tier-0 tests
Phase 1: last 3 weeks of June with data distribution to Tier-1s
Run integrated data flow tests using the SC4 infrastructure for data distribution
Send AODs to (at least) a few Tier-2s
Automatic operation for O(1 week)
First version of the shifter's interface tools
Treatment of error conditions
Phase 2: 3-4 weeks in September-October
Extend data distribution to all (most) Tier-2s
Use 3D tools to distribute calibration data
31
ATLAS SC4 Plans (2)
ATLAS includes continuous distributed simulation productions (Kaushik).
SC4: distributed reprocessing tests:
Test of the computing model using the SC4 data management infrastructure
Needs file transfer capabilities between Tier-1s and back to the CERN CAF
Also distribution of conditions data to Tier-1s (3D)
Storage management is also an issue
Could use 3 weeks in July and 3 weeks in October
SC4: distributed simulation intensive tests:
Once the reprocessing tests are OK, we can use the same infrastructure to implement our computing model for simulation productions, as they would use the same setup both from our ProdSys and from the SC4 side
First separately, then concurrently.
32
Overview of requirements for SC4
SRM ("baseline version") on all storage
VO Box per Tier-1 and at Tier-0
LFC server per Tier-1 and at Tier-0
FTS server per Tier-1 and at Tier-0
Permanent storage and durable storage:
Separate SRM entry points for permanent and durable storage.
Disk space is managed by DQ2.
Counts as online ("disk") data in the ATLAS Computing Model
Ability to install FTS ATLAS VO agents on the Tier-1 and Tier-0 VO Boxes
Ability to deploy DQ2 services on the VO Box as during SC3
No new requirements on the Tier-2s besides an SRM SE
33
Overview of FTS and VO Box
Hence, an ATLAS VO Box will contain:
FTS ATLAS agents
The remaining DQ2 persistent services (less s/w than for SC3, as some functionality merged into FTS in the form of FTS VO agents)
DQ2 site services will have associated SFTs for testing
34
ATLAS SC4 Requirement (PPS)
A small testbed with (part of) CERN, a few Tier-1s and a few Tier-2s to test our distributed systems (ProdSys, DDM, DA) prior to deployment.
It would allow testing new m/w features without disturbing other operations.
We could also properly tune the operations on our side.
The aim is to get to the agreed scheduled time slots with an already tested system and really use the available time for relevant scaling tests.
This setup would not interfere with concurrent large-scale tests or data transfers run by other experiments.