STEP’09: Last challenge before data taking

STEP’09: LAST CHALLENGE BEFORE DATA TAKING Patricia Méndez Lorenzo (CERN, IT/GS) ALICE Offline Week Grid Status & Experience session CERN, 24/06/09


Page 1: STEP’09: Last challenge before data taking

STEP’09: LAST CHALLENGE BEFORE DATA TAKING
Patricia Méndez Lorenzo (CERN, IT/GS)

ALICE Offline Week
Grid Status & Experience session
CERN, 24/06/09

Page 2: STEP’09: Last challenge before data taking

OUTLOOK

• CCRC’08: Reminder
• What is STEP’09: Origin
• STEP’09 for ATLAS, CMS and LHCb
• STEP’09 for ALICE: Goals and Results
• Summary and Conclusions

Page 3: STEP’09: Last challenge before data taking

WHAT IS STEP’09: CCRC’08 REMINDER

WLCG Common Computing Readiness Challenge 2008 (CCRC’08)

• The first large WLCG Service Challenge to bring the 4 experiments together
• Proposed by CMS and ATLAS during the pre-CHEP WLCG WS in Victoria (2007)
• Goal: measure the readiness of the Grid services and operations before real data taking
• Complementary to the experiments’ Full Dress Rehearsals
• Run in two phases: February and May 2008

Page 4: STEP’09: Last challenge before data taking

ALICE RESULTS DURING THE CCRC’08

Slides taken from the CCRC’08 Post-Mortem WS at CERN (June 2008)

Page 5: STEP’09: Last challenge before data taking

WHAT IS STEP’09: ORIGIN

WLCG Scale Testing for the Experiment Program at the WLCG 2009: STEP’09

• Proposed by CMS during the WLCG pre-CHEP WS in Prague (2009)
• Scheduled for June 2009
• Similar scope to CCRC’08, with special emphasis on data management (data recording, MSS behaviour and transfers)
• STEP’09 post-mortem in July
• All experiments presented their programs during the WLCG GDB in April 2009

Page 6: STEP’09: Last challenge before data taking

STEP’09 FOR CMS

Test: T0 multi-VO tape recording at full rate
Objectives:
• Test writing to tape at T0 at full data taking rate and overlapping with other VOs
• Sustained writing to tape for several days

Test: T1 archiving and processing at requested scale
Objectives:
• Stress tests of the MSS at scale with all concurrent T1 workflows (pre-staging especially relevant)

Test: Transfer tests at requested scale
Objectives:
• Special emphasis on T1-T1 tests
• Transfer tests can easily be run among any Tiers in parallel with other VOs to evaluate overlap (not needed by CMS)

Test: Analysis at commissioned T2 at requested scale
Objectives:
• CMS should have analysis at a scale that uses all pledged resources at T2


In common with ALICE

Page 7: STEP’09: Last challenge before data taking

STEP’09 FOR ATLAS

Test: DDM functional tests
Objectives:
• Tests the full ATLAS data placement model including tape (RAW) writing
• ATLAS ready to create nominal load and file sizes
• T0-T1 average rate: 940 MB/s
• Calibration data distribution also foreseen

Test: Simulation production
Objectives:
• G4 HITS production
• HITS production at T2 and upload to T1
• HITS merging at T1 and archive on tape
• MC reconstruction at T1 only
• Pre-staging of merged HITS from tape
• Output AODs merged to tape and distributed to other clouds

Test: Repeat cosmic ray data re-processing
Objectives:
• RAW pre-staging from tape and data access from the WNs

Test: Run HammerCloud in all clouds
Objectives:
• Loads CPU capacity in T2s
• Tests data access to the WNs


In common with ALICE

Page 8: STEP’09: Last challenge before data taking

STEP’09 FOR LHCB

Participation in STEP’09 as part of their specific Full Experiment Test (FEST’09)

LHCb goals
• Data injection into the HLT farm
  • File size can be tuned
• Distribution to T1 sites
  • Using standard share
• Reconstruction at T1 sites
  • Long enough queues at the sites are needed
• Storage requirements
  • 3.5 TB/day for RAW (T1D0) at Tier0
  • < 1 TB/day for RAW at Tier1s


In common with ALICE
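As a rough cross-check of the storage figures on this slide, a daily volume can be converted into the average rate it implies. The sketch below assumes decimal units (1 TB = 10^6 MB) and a flat rate over 24 hours; the function name is illustrative and does not belong to any LHCb tool.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # assumption: decimal units


def tb_per_day_to_mb_per_s(tb_per_day: float) -> float:
    """Average rate in MB/s implied by a daily volume in TB."""
    return tb_per_day * MB_PER_TB / SECONDS_PER_DAY


if __name__ == "__main__":
    # Figures quoted on the slide: 3.5 TB/day at Tier0, below 1 TB/day at Tier1s.
    for label, volume in (("Tier0 RAW", 3.5), ("Tier1 RAW (upper bound)", 1.0)):
        print(f"{label}: {tb_per_day_to_mb_per_s(volume):.1f} MB/s")
```

Under these assumptions, 3.5 TB/day corresponds to roughly 40 MB/s sustained, and 1 TB/day to under 12 MB/s.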

Page 9: STEP’09: Last challenge before data taking

STEP’09 FOR ALICE

Grid activities
• Replication T0->T1
  • Planned together with Cosmics data taking, or
  • Repeat the exercise of CCRC’08 with the same rates (100 MB/s) and the same destinations (all T1 sites)
• Re-processing with data recalls from tape at T1
  • Highly desirable exercise, data already available at the T1 MSS storage

Non-Grid activities
• Transfer rate tests from DAQ@PIT to CASTOR
  • Validation of the new CASTOR and xrootd for RAW
  • Critically dependent on the availability of CASTOR v2.1.8
• Transfer rate test coupled with the 1st pass reco@T0

Page 10: STEP’09: Last challenge before data taking

ALICE NON-GRID ACTIVITIES

• RAW data transfers from PIT to CASTOR
  • Basically validated
  • The goal was 1.25 GB/s for one week (just finished)
  • DAQ managed to fill the entire alicedisk pool (850 TB)
• Validation and feedback of CASTOR v2.1.8 and xrootd
  • Very positive results: the xrootd copy P2->Disk is basically validated
  • The second part is the disk->tape copy (to a recyclable pool of tapes) at the same speed of 1.25 GB/s (the Pb+Pb full rate)
  • Activity still ongoing
• Pass 1 reconstruction of RAW data at the T0
  • Still pending
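To put the 1.25 GB/s target next to the 850 TB pool size, the following sketch computes the volume written in a week at that rate and the time needed to fill the pool. It assumes decimal units (1 TB = 1000 GB), a perfectly constant rate, an initially empty pool and no deletions; none of these conditions is stated on the slide, and the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400
GB_PER_TB = 1000.0  # assumption: decimal units


def volume_tb(rate_gb_per_s: float, days: float) -> float:
    """Volume in TB written at a constant rate over the given number of days."""
    return rate_gb_per_s * days * SECONDS_PER_DAY / GB_PER_TB


def days_to_fill(pool_tb: float, rate_gb_per_s: float) -> float:
    """Days needed to fill an initially empty pool at a constant rate."""
    return pool_tb * GB_PER_TB / rate_gb_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"One week at 1.25 GB/s: {volume_tb(1.25, 7):.0f} TB")
    print(f"Time to fill 850 TB:   {days_to_fill(850, 1.25):.2f} days")
```

With these assumptions one week at the target rate writes 756 TB, and an empty 850 TB pool fills in just under 8 days, so the two figures on the slide are of the same order.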

Page 11: STEP’09: Last challenge before data taking

ALICE GRID ACTIVITIES: RESULTS

ALICE began the STEP’09 exercise on the 1st of June and finished it on the 18th of June

Production results
• New record of 15000 concurrent jobs on the 1st of June


New MC cycle

Page 12: STEP’09: Last challenge before data taking

PROBLEMS FACED BY ALICE: PRODUCTION

• Instabilities with the CREAM-CE system at CERN
  • The system was unstable for some days
  • Production was fully affected on the 17th of June: both CREAM-CE services were down
  • This morning the system came back into production
• A power cut on the 18th of June
  • voalice03 (CREAM VOBOX) could not be recovered
  • In addition, the VOBOXES will be out of warranty at the end of the year
  • 4 VOBOXES have been requested (2 production, 2 backup)
• New site entered production: CESGA (Santiago de Compostela, Spain)
  • 800 jobs submitted for 29 CPUs
  • Site was reporting 0 jobs running/waiting through VOview
  • ALICE has changed its VOview-based query to the information system
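The CESGA case (800 jobs sent to 29 CPUs while the information system published 0 running/waiting jobs) points at a submitter that trusts published numbers alone. The following is a purely hypothetical sketch of a guard against that situation; it is not the ALICE submission logic, and the function, its parameters and the `queue_factor` policy are all invented for illustration. The idea is to count whichever is larger, the jobs the submitter has itself sent or the jobs the site reports, and to cap the total at a multiple of the CPU count.

```python
def jobs_to_submit(cpu_slots: int,
                   tracked_jobs: int,
                   reported_running: int,
                   reported_waiting: int,
                   queue_factor: float = 1.0) -> int:
    """Number of new jobs to send to a site (hypothetical policy).

    cpu_slots        -- CPUs the site offers
    tracked_jobs     -- jobs the submitter has sent and not yet seen finish
    reported_running -- running jobs published by the information system
    reported_waiting -- waiting jobs published by the information system
    queue_factor     -- allowed queued jobs per CPU on top of the running ones
    """
    reported = reported_running + reported_waiting
    # Do not trust a published "0" if the submitter knows it has jobs there.
    in_system = max(tracked_jobs, reported)
    ceiling = int(cpu_slots * (1 + queue_factor))
    return max(0, ceiling - in_system)


if __name__ == "__main__":
    # Site publishes 0/0 although 800 jobs were already sent to 29 CPUs.
    print(jobs_to_submit(29, 800, 0, 0))  # no further submission
    # Genuinely empty site with 29 CPUs.
    print(jobs_to_submit(29, 0, 0, 0))
```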

Page 13: STEP’09: Last challenge before data taking

ALICE FTS TRANSFERS

General result: very successful exercise during the whole STEP’09 period
• New FTD module in production
• During the whole period the 6 T1 sites were available, with few issues, always solved within the day
• Very good support from the FTS experts during the whole period


ALICE requirement

Page 14: STEP’09: Last challenge before data taking

PROBLEMS FACED BY ALICE: TRANSFERS

Pre-staging of files: MEETING WITH FIO STILL PENDING
• The operation takes forever
• New files have to be created instead of pre-staging those already existing
• Asked CMS and LHCb for their own procedures
  • CMS has implemented a PhEDEx utility at the client level, for CASTOR sites, able to perform the pre-staging
  • Comparisons between methods: SRM APIs, manual pre-staging and the same PhEDEx utility
  • The staging speed in the 3 cases is comparable, and CMS used the STEP’09 exercise to define the best way to do the pre-staging
  • LHCb is using the GFAL libs to make an asynchronous pre-staging of the files

Page 15: STEP’09: Last challenge before data taking

PROBLEMS FACED BY ALICE: TRANSFERS

File overwriting: SOLVED
• This procedure would allow the already transferred file to be removed before it is transferred again
• ALICE implemented the corresponding option correctly, but the transfers were still failing
• FTS experts involved in the discussion: the 'overwrite' flag is properly passed to the FTS agent, however it selects the SRMv1 endpoint instead of SRMv2.2
• While the details are being checked, ALICE should choose the fully qualified SURL to ensure the usage of SRMv2.2
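A short SURL leaves the choice of SRM endpoint to the client, whereas a fully qualified SURL names the endpoint explicitly. The sketch below shows one way to build the qualified form. It assumes the widespread SRMv2.2 convention of port 8443 and the `/srm/managerv2` web-service path; both vary per site and storage system, so they are parameters here, and the host name and file path used in the example are invented.

```python
from urllib.parse import urlsplit


def qualify_surl(surl: str, port: int = 8443,
                 endpoint: str = "/srm/managerv2") -> str:
    """Return a fully qualified SURL: srm://host:port/endpoint?SFN=/path.

    port and endpoint are assumed defaults, not values taken from any site.
    """
    parts = urlsplit(surl)
    if parts.scheme != "srm" or not parts.hostname:
        raise ValueError(f"not an SRM SURL: {surl!r}")
    if parts.query.startswith("SFN="):
        return surl  # already fully qualified, leave untouched
    return (f"srm://{parts.hostname}:{parts.port or port}"
            f"{endpoint}?SFN={parts.path}")


if __name__ == "__main__":
    # Hypothetical storage element and path, for illustration only.
    print(qualify_surl("srm://se.example.org/castor/example.org/alice/file.root"))
```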

Page 16: STEP’09: Last challenge before data taking

PROBLEMS FACED BY ALICE: TRANSFERS

Issues per site
• NDGF: using a wrong SURL while transferring files: SOLVED
• RAL: permission denied to write in the corresponding SE area (twice): SOLVED
• SARA: no space available (twice): SOLVED
• FZK: gridFTP issue. There was a problem of dCache pools being filled up, and also a GPFS problem of not correctly reporting space: SOLVED

This week
• CERN: transfers stuck for more than 60 h. Still under investigation
• It seems some sites do not allow concurrent transfers

Page 17: STEP’09: Last challenge before data taking

SUMMARY AND CONCLUSIONS

• STEP’09 has been the 2nd multi-VO exercise before the real data taking
  • Proposed by CMS during the pre-CHEP WS in Prague
• ALICE emphasizes the testing of the Data Management elements of the computing model
  • Key elements for the 4 LHC experiments
• ALICE results: very good behaviour in terms of production, MSS@T1 and FTS transfers
• The 4 LHC experiments will present their results during the STEP’09 post-mortem WS at CERN (9-10 July)
