Efi.uchicago.edu ci.uchicago.edu FAX status report Ilija Vukotic on behalf of the...

efi.uchicago.edu ci.uchicago.edu FAX status report, Ilija Vukotic on behalf of the atlas-adc-federated-xrootd working group, S&C week, Jun 2, 2014


FAX status report
Ilija Vukotic, on behalf of the atlas-adc-federated-xrootd working group
S&C week, Jun 2, 2014
efi.uchicago.edu, ci.uchicago.edu


Content

• Status
  – Coverage
  – Traffic
  – Failover
  – Overflow
• Changes in localSetupFAX
• Monitoring changes
  – Changes in GLED collector, dashboard
  – Failover & overflow monitoring
  – FaxStatusBoard
• Meetings
  – Tutorial, 23-27 June: dedicated to instructing on xAOD and the new analysis model
  – ROOTIO, 25-27 June


FAX topology

Topology change in North America
• added East and West
• will serve CA cloud
• all hosted at BNL

Will need NL cloud redirector


FAX in Europe

To come:
• Sara
• Nikhef
• IL cloud: IL-TAU, Technion, Weizmann


FAX in North America

To come:
• TRIUMF (June?)
• McGill (end of June)
• SCINET (end of June)
• Victoria (~August)


FAX in Asia

To come:
• Beijing (~two weeks)
• Tokyo
• Australia (few weeks)


Status

• Most sites running stably
• Glitches do happen but are usually fixed within a few hours
• SSB issues solved
• New sites added
  – IFAE
  – PIC
  – IN2P3-LPC
• In need of restart:
  – UNIBE-LHEP


Coverage

• Now auto-updated Twiki page
  – https://twiki.cern.ch/twiki/bin/view/AtlasComputing/FaxCoverage
• Coverage is good (~85%), but we should aim for >95%!
• Info fetched from http://dashb-atlas-job-prototype.cern.ch/dashboard/request.py/dailysummary


Traffic

• Slowly increasing
• Max peak output record broken
• Still small compared to what we expect to come


Failover

• Running stably


Overflow status

• The whole chain is ready
• I have set all the US queues to allow 3 Gbps to be both delivered to and delivered from sites.
• Test tasks were submitted to sites that don’t have the data, so that transfertype=FAX is invoked.
• This does not test the JEDI decision making (the one based on the cost matrix)
• Waiting for actual jobs to check the full chain
  – Users not yet instructed to use the JEDI client
  – Waiting for the JEDI monitor


Overflow tests

• Test is the hardest IO test: 100% of events, all branches read, standard TTC, no AsyncPrefetch.
• Site-specific FDR datasets (10 DSs, 744 files, 2.7 TB)
• All the source/destination combinations of US sites
• All of it submitted in 3 batches, but not all started simultaneously. Affected by priority degradation.
• Three input files per job.
• If the site is copy2scratch, the pilot does an xrdcp to scratch; if not, jobs access the files remotely.
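The last bullet describes two input-access modes. A minimal sketch of that decision, assuming a hypothetical redirector host and file path (neither comes from the slides) and making no claim about how the actual pilot is implemented:

```python
import posixpath


def plan_input_access(copy2scratch, redirector, paths, scratch_dir="scratch"):
    """Return, per input file, what to stage in (if anything) and what to open.

    `redirector`, `paths` and `scratch_dir` are illustrative parameters,
    not names taken from the pilot code.
    """
    plan = []
    for path in paths:
        # An absolute path after the host yields the usual root://host//path form.
        remote_url = f"root://{redirector}/{path}"
        if copy2scratch:
            # copy2scratch site: copy the file with xrdcp, then read the local copy.
            local = posixpath.join(scratch_dir, posixpath.basename(path))
            plan.append({"stage_in": ["xrdcp", remote_url, local], "open": local})
        else:
            # Direct access site: the job opens the remote URL itself.
            plan.append({"stage_in": None, "open": remote_url})
    return plan


# Hypothetical host and file name, for illustration only.
HOST = "redirector.example.org"
FILES = ["/atlas/some/file.root"]
copy_plan = plan_input_access(True, HOST, FILES)
direct_plan = plan_input_access(False, HOST, FILES)
```

The only difference between the two modes in this sketch is where the read happens: from a local copy after an up-front transfer, or over the network for the lifetime of the job.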


Overflow tests

• Error rate
  – Total 9188 jobs
  – Finished 9052
  – Failed 117 (1.3%)
    o 24: OU reading OU (no FAX involved)
    o 66: reading from WT2 (files are corrupted)
    o 27 (0.29%): actual FAX errors where SWT2 did not deliver the files. Will be investigated.
    o The rest are “Payload run out of memory”
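The quoted percentages can be checked directly from the job counts. The sketch below uses only the numbers on this slide. The three listed causes already add up to the 117 failed jobs, so reading "the rest" as the jobs counted neither as finished nor as failed is an assumption, not something the slide states.

```python
# Job counts quoted on the slide.
total_jobs = 9188
finished = 9052
failed = 117

# Failure breakdown quoted on the slide (jobs per cause).
failures = {
    "OU reading OU (no FAX involved)": 24,
    "reading from WT2 (corrupted files)": 66,
    "FAX errors (SWT2 did not deliver the files)": 27,
}


def percent(part, whole):
    """Share of `part` in `whole`, in percent."""
    return 100.0 * part / whole


failure_rate = percent(failed, total_jobs)
fax_error_rate = percent(failures["FAX errors (SWT2 did not deliver the files)"], total_jobs)

# Jobs that are in neither the finished nor the failed count
# (assumed here to be "the rest" mentioned in the last bullet).
unaccounted = total_jobs - finished - failed

print(f"failed: {failure_rate:.1f}%  FAX errors: {fax_error_rate:.2f}%  unaccounted: {unaccounted}")
```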


Overflow tests

• Jobs reading from local scratch - for comparison

Direct access site, reading locally. Per job:
• 7.2 MB/s
• 67% CPU eff
• 71 ev/s

Copy2scratch site. Per job:
• 11.0 MB/s
• 97% CPU eff
• 109 ev/s

Plot labels (both panels): Scout jobs


Overflow tests

• Jobs reading remote sources

Direct access site, reading remotely. Per job:
• 4.2 MB/s
• 43% CPU eff
• 42 ev/s

Direct access site, reading remotely. Per job:
• 3.5 MB/s
• 29% CPU eff
• 34 ev/s

Plot labels: No saturation; Possibly a start of saturation
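The per-job figures for local and remote reading can be put side by side. The sketch below uses only the numbers quoted on the two comparison slides; the labels "remote A" and "remote B" are placeholders for the two remote panels, and 1 MB is taken as 1000 kB. In all four cases the data read per event comes out at roughly 100 kB, which suggests (as an interpretation, not a statement from the slides) that the event rate follows the delivered bandwidth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobStats:
    label: str
    mb_per_s: float      # per-job read rate
    cpu_eff_pct: float   # per-job CPU efficiency
    events_per_s: float  # per-job event rate

    @property
    def kb_per_event(self):
        """Data read per event, assuming 1 MB = 1000 kB."""
        return 1000.0 * self.mb_per_s / self.events_per_s


measurements = [
    JobStats("direct access, local", 7.2, 67, 71),
    JobStats("copy2scratch, local", 11.0, 97, 109),
    JobStats("direct access, remote A", 4.2, 43, 42),
    JobStats("direct access, remote B", 3.5, 29, 34),
]

# Event rate of each case relative to local direct access.
baseline = measurements[0]
relative_rate = {m.label: m.events_per_s / baseline.events_per_s for m in measurements}

for m in measurements:
    print(f"{m.label:26s} {m.kb_per_event:6.1f} kB/event  "
          f"{relative_rate[m.label]:.2f} x local direct-access event rate")
```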


Overflow tests

• MWT2 reading from OU and SWT2 simultaneously
• In aggregate reached 850 MB/s, the limit for MWT2 at that time.


Cost matrix

[Figure: cost matrix, with axes labelled source and destination]

http://1-dot-waniotest.appspot.com/


localSetupFAX

• Added command fax-ls
  – Made by Shuwei YE.
  – Will eventually replace isDSinFAX
  – He will move all the other tools to Rucio
• Change in fax-get-best-redirector
  – Each time does three queries
    o SSB to get endpoints and their status
    o AGIS to get the sites hosting the endpoints
    o AGIS to get site coordinates
  – Each call returns hundreds of kB
  – Can’t scale to a large number of requests
  – Solution:
    o Made GoogleAppEngine servlets that every 30 min take info from SSB and AGIS and deliver it from memory
    o Information slimmed to what is actually needed: ~several kB
    o Now requests served in a few tens of ms.
    o “Infinitely” scalable
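The solution above amounts to caching a slimmed snapshot of slow upstream data in memory and refreshing it every 30 minutes. A minimal sketch of that pattern follows. It is not the servlet code: the class name, the fetch and slim callables, and the record fields are all hypothetical, and this version refreshes lazily on the first request after the interval rather than on a timer.

```python
import time


class PeriodicSnapshot:
    """Serve a slimmed copy of upstream data from memory, refetching when stale."""

    def __init__(self, fetch, slim, max_age_s=30 * 60, clock=time.monotonic):
        self._fetch = fetch          # slow call to the upstream service
        self._slim = slim            # keeps only the fields clients need
        self._max_age_s = max_age_s  # refresh interval (30 min, as on the slide)
        self._clock = clock          # injectable so the behaviour can be tested
        self._snapshot = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._max_age_s
        if stale:
            self._snapshot = self._slim(self._fetch())
            self._fetched_at = now
        return self._snapshot


# Stand-in for a large upstream reply; field names are invented for illustration.
calls = []


def fake_fetch():
    calls.append(1)
    return [{"endpoint": "hypothetical-endpoint", "status": "ok", "extra": "x" * 1000}]


def slim(rows):
    return [{"endpoint": r["endpoint"], "status": r["status"]} for r in rows]


fake_now = [0.0]
cache = PeriodicSnapshot(fake_fetch, slim, clock=lambda: fake_now[0])
cache.get()              # first request: fetches upstream
cache.get()              # second request: served from memory
fake_now[0] = 31 * 60    # 31 minutes later
cache.get()              # snapshot is stale: fetches upstream again
```

Clients only ever see the slimmed snapshot, so the upstream services are queried at most once per interval regardless of how many requests arrive.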


Monitoring – collector, dashboard

• Problem: support of multi-VO sites
• Meeting: Alex, Matevz, me
• Issues:
  – Site name:
    o ATLAS reports it
    o CMS does not, or reports it badly; will fix it
  – Requesting user’s VO
    o ATLAS does it
    o CMS not strict about it. US-CMS uses GUMS. Will fix it.
• Proposal:
  – During the summer Matevz develops XrdMon that can handle multi-VO messages
  – Sends messages from multi-VO sites to a special “mixed” AMQ. Dashboard splits traffic according to user’s VO.

Details: https://docs.google.com/document/d/1Syx3_vkwCfc5lj2lQzbUUrKT0Je238w6lcwVL7IY1GY/edit#
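The proposed split of a "mixed" message stream by the user's VO can be illustrated with a small sketch. The message layout and the field names `user_vo` and `bytes_read` are assumptions made for this example; they are not taken from XrdMon or from the dashboard.

```python
from collections import defaultdict


def split_by_vo(messages, unknown="unknown"):
    """Group monitoring messages by the requesting user's VO.

    Messages without a VO go into a separate `unknown` group, since the
    slide notes that not every site reports the VO strictly.
    """
    per_vo = defaultdict(list)
    for msg in messages:
        vo = (msg.get("user_vo") or unknown).lower()
        per_vo[vo].append(msg)
    return dict(per_vo)


# Invented messages from one multi-VO site, for illustration only.
mixed = [
    {"site": "SiteA", "user_vo": "atlas", "bytes_read": 100},
    {"site": "SiteA", "user_vo": "cms", "bytes_read": 250},
    {"site": "SiteA", "user_vo": None, "bytes_read": 40},
    {"site": "SiteA", "user_vo": "ATLAS", "bytes_read": 60},
]

traffic = {
    vo: sum(m["bytes_read"] for m in msgs)
    for vo, msgs in split_by_vo(mixed).items()
}
```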


Monitoring

• Failover
  – Not flexible enough
• Overflow
  – No monitoring yet
  – Need to compare jobs grouped by transfer type
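Comparing jobs grouped by transfer type could look roughly like the sketch below. No such monitoring existed at the time of the talk, so this is purely illustrative: the record fields, the "local" transfer type and all the values are invented, and only the `transfertype=FAX` label appears earlier in the slides.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_transfer_type(jobs):
    """Per transfer type: job count, failure rate, mean event rate of finished jobs."""
    groups = defaultdict(list)
    for job in jobs:
        groups[job["transfertype"]].append(job)

    summary = {}
    for ttype, members in groups.items():
        ok = [j for j in members if j["status"] == "finished"]
        summary[ttype] = {
            "jobs": len(members),
            "failure_rate": 1 - len(ok) / len(members),
            "mean_events_per_s": mean(j["events_per_s"] for j in ok) if ok else None,
        }
    return summary


# Invented job records, for illustration only.
jobs = [
    {"transfertype": "FAX", "status": "finished", "events_per_s": 40.0},
    {"transfertype": "FAX", "status": "failed", "events_per_s": 0.0},
    {"transfertype": "local", "status": "finished", "events_per_s": 70.0},
    {"transfertype": "local", "status": "finished", "events_per_s": 74.0},
]

summary = summarize_by_transfer_type(jobs)
```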