FAX Performance

FAX PERFORMANCE TIM, Tokyo May 2013

Page 1: FAX Performance

FAX PERFORMANCE

TIM, Tokyo May 2013

Page 2: FAX Performance

ILIJA VUKOTIC [email protected]

PERFORMANCE

Metrics:
Data coverage: better than 97%, more than 2 replicas
Number of users: mostly UofC and Prague users
Percentage of successful jobs: >99% in the latest HC tests
Total amount of data delivered: ~2 PB/week
Bandwidth usage

Sources: Ganglia plots, MonALISA, FAX Dashboard, HC tests, CostMatrix tests, special tests using dedicated resources
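As a rough scale check, the ~2 PB/week delivered can be turned into an average rate. This is a minimal sketch that assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and delivery spread evenly over the week, which real traffic is not; the function name is illustrative only.

```python
# Average sustained rate implied by "~2 PB/week" of delivered data.
# Assumptions: decimal units and perfectly uniform delivery over the week.

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 s


def weekly_volume_to_rates(petabytes_per_week: float) -> tuple[float, float]:
    """Return (MB/s, Gb/s) for a weekly volume given in PB."""
    megabytes = petabytes_per_week * 1e9      # 1 PB = 1e9 MB
    mb_per_s = megabytes / SECONDS_PER_WEEK
    gb_per_s = mb_per_s * 8 / 1000            # MB/s -> Gb/s
    return mb_per_s, gb_per_s


mb_s, gb_s = weekly_volume_to_rates(2.0)
print(f"{mb_s:.0f} MB/s, {gb_s:.1f} Gb/s")
```

Under these assumptions 2 PB/week averages to roughly 3300 MB/s, or about 26 Gb/s.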

Page 3: FAX Performance

COST MATRIX

Rate (MB/s) from each source site (rows) to each destination site (columns); not every source/destination pair was measured.

Destinations: BNL-ATLAS, CERN-PROD, DESY-HH, INFN-ROMA1, LRZ-LMU, MWT2, RAL-LCG2, SWT2_CPB, UKI-LT2-QMUL, UKI-SCOTGRID-GLASGOW

Source
AGLT2 1.10 2.69 0.65 0.55 6.68 0.90 1.24 1.23
BNL-ATLAS 57.50 0.69
CERN-PROD 63.41 25.96 5.74 3.69 6.10 0.94 0.76 10.33 6.98
DESY-HH 3.73 46.52 1.47 3.91 0.51 4.56 5.02
IllinoisHEP 0.75 2.45 0.58 0.66 36.50 1.05 4.73 1.34
INFN-FRASCATI 1.44 0.72 4.00 0.77 0.62 0.95 0.79
INFN-NAPOLI-ATLAS 4.37 9.86 12.71 1.45 2.62 0.40 5.13
INFN-ROMA1 4.08 5.30 9.29 1.58 1.94 0.42 4.66 4.68
LRZ-LMU 3.96 11.95 2.13 39.97 8.31 0.43 0.69 8.80 5.82
MPPMU 4.16 10.44 2.23 39.90 8.30 0.43 0.71 4.90 5.87
MWT2 1.02 2.59 0.71 0.74 13.09 1.30 2.08 1.35
OU_OCHEP_SWT2 0.57 0.57 0.55 0.39 1.87 2.03 0.71 0.91
praguelcg2 2.58 2.96 1.48 2.14 2.49 0.44 0.53 3.95 1.76
RAL-LCG2 3.08 1.56 1.37 2.00 34.14 10.55 4.07
RU-Protvino-IHEP 0.94 1.29 0.85 0.95 1.03 0.51 0.45 1.69 1.54
SWT2_CPB 0.81 3.73 0.67 0.74 27.28 43.29 5.86 1.50
UKI-LT2-QMUL 3.02 2.44 1.26 1.46 1.38 2.19 0.42 1.60 3.46
UKI-SCOTGRID-ECDF 1.94 3.53 1.28 1.12 4.48 0.69 0.39 2.08 4.22
UKI-SCOTGRID-GLASGOW 7.63 4.55 5.12 1.96 2.01 2.86 1.09 0.53 7.77 10.34
UKI-SOUTHGRID-OX-HEP 3.19 3.80 1.52 3.82 4.32 5.46 3.29
WT2 15.68 0.76 3.22 0.61 0.60 10.51 0.82 3.53 1.34

A place to get an idea of the rate a single job can expect to see. Are our pipes really this full?
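One way to use the cost matrix is as a per-job transfer-time estimate. The sketch below is illustrative only: it uses just the UKI-SCOTGRID-GLASGOW row, the one row with all ten values, on the assumption that those values line up with the destination columns in order. The 1000 MB file size and the `transfer_time_s` helper are hypothetical.

```python
# Single-job rates (MB/s) from the UKI-SCOTGRID-GLASGOW row of the cost matrix.
# Assumption: this fully populated row maps onto the ten destination columns in order.
GLASGOW_RATES = {
    "BNL-ATLAS": 7.63,
    "CERN-PROD": 4.55,
    "DESY-HH": 5.12,
    "INFN-ROMA1": 1.96,
    "LRZ-LMU": 2.01,
    "MWT2": 2.86,
    "RAL-LCG2": 1.09,
    "SWT2_CPB": 0.53,
    "UKI-LT2-QMUL": 7.77,
    "UKI-SCOTGRID-GLASGOW": 10.34,
}


def transfer_time_s(size_mb: float, rate_mb_s: float) -> float:
    """Seconds for one job to move size_mb at a constant single-job rate."""
    return size_mb / rate_mb_s


FILE_SIZE_MB = 1000.0  # hypothetical file size, not taken from the slides

# List destinations from fastest to slowest with the implied transfer time.
for dest, rate in sorted(GLASGOW_RATES.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{dest:22s} {rate:6.2f} MB/s {transfer_time_s(FILE_SIZE_MB, rate):8.1f} s")
```

At these single-job rates, the same file takes from about 100 s to nearly 1900 s depending on the destination, which is why the matrix is useful for setting expectations per job.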

Let’s see other sources of information.

Page 4: FAX Performance

COST MATRIX VS. PERFSONAR

Comparison of just one link in one direction:

Source: AGLT2, destination: MWT2

perfSONAR measurements at 4 h intervals.

Could it be that the worker nodes' links are saturating?

Page 5: FAX Performance

(Figure: test sites MWT2, SLAC, AGLT2, BNL and CERN)

CLOGGING THE PIPES

Using HammerCloud (HC) jobs submitted to 4 ANALY queues: AGLT2, BNL, MWT2 and SLAC.

Each site runs 300 jobs of two types, with 50 running in parallel:
- xrdcp of 3 files randomly chosen from the SMWZ datasets prepared for FDR, from the other sites
- reading 10% of the events from 3 files randomly chosen from the FDR SMWZ datasets, from the other sites

Each job uploads its time to finish, events/s, MB/s and PandaID, so that jobs can be investigated.

All jobs were submitted through the FDR web interface: http://ivukotic.web.cern.ch/ivukotic/FDR/index.asp

All tests ran in parallel with other HC stress tests.
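The per-job uploads lend themselves to a simple aggregation into source/destination tables like the ones on the following slides. This is a minimal sketch, not the FDR web interface's actual code: the `JobResult` record and the `mean_rate_matrix` and `failure_rate` helpers are hypothetical, and averaging MB/s per pair is an assumption about how such tables could be built.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class JobResult:
    """Hypothetical per-job record mirroring the values each test job uploads."""
    panda_id: int       # PandaID, kept so individual jobs can be investigated
    source: str         # site the files were read from
    destination: str    # site (ANALY queue) where the job ran
    seconds: float      # time to finish
    events_per_s: float
    mb_per_s: float


def mean_rate_matrix(results):
    """Average MB/s for each (source, destination) pair seen in results."""
    buckets = defaultdict(list)
    for r in results:
        buckets[(r.source, r.destination)].append(r.mb_per_s)
    return {pair: mean(rates) for pair, rates in buckets.items()}


def failure_rate(n_failed: int, n_total: int) -> float:
    """Fraction of jobs that failed, e.g. 0.0017 for a 0.17% failure rate."""
    return n_failed / n_total
```

Keeping the PandaID in each record is what lets an outlier cell in the aggregated table be traced back to the individual jobs behind it.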

Page 6: FAX Performance

TESTS

0.17% failure rate!

Page 7: FAX Performance

COPY

Clearly not limited by worker-node (WN) links. Assuming just 30 simultaneous jobs, worst-case delivery rates are:
BNL to CERN: 75 MB/s
CERN to AGLT2: 170 MB/s
MWT2 to AGLT2: 100 MB/s
AGLT2 to CERN: 90 MB/s
SLAC to BNL: 300 MB/s

Average WAN access: ~300 MB/s

Copy rate (MB/s), source (rows) to destination (columns):

source        BNL-ATLAS  CERN-PROD       MWT2      AGLT2       SLAC
BNL-ATLAS         86.09       2.51      13.41       8.97
CERN-PROD         11.53      76.38      10.56       5.76
MWT2               11.1       1.91      27.08       3.32
AGLT2             22.41        2.9      23.08      65.49
SLAC               9.87       2.06       8.76       6.38      82.71
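The worst-case delivery rates quoted on this slide appear to be the single-job rates from the table multiplied by 30 jobs; for example, 2.51 MB/s × 30 ≈ 75 MB/s for BNL to CERN. The sketch below reproduces that arithmetic under the assumption that the aggregate scales linearly with the number of jobs (that is, the link is not yet saturated). `COPY_RATES` and `aggregate_rate` are illustrative names.

```python
# Single-job xrdcp rates (MB/s) from the COPY table, keyed by (source, destination).
COPY_RATES = {
    ("BNL-ATLAS", "CERN-PROD"): 2.51,
    ("CERN-PROD", "AGLT2"): 5.76,
    ("MWT2", "AGLT2"): 3.32,
    ("AGLT2", "CERN-PROD"): 2.90,
    ("SLAC", "BNL-ATLAS"): 9.87,
}

SIMULTANEOUS_JOBS = 30  # "assuming just 30 simultaneous jobs"


def aggregate_rate(single_job_mb_s: float, n_jobs: int = SIMULTANEOUS_JOBS) -> float:
    """Link-level rate if each of n_jobs gets the measured single-job rate."""
    return single_job_mb_s * n_jobs


for (src, dst), rate in COPY_RATES.items():
    print(f"{src} -> {dst}: {aggregate_rate(rate):.0f} MB/s")
```

The printed values (75, 173, 100, 87 and 296 MB/s) round to the figures listed on the slide.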

Page 8: FAX Performance

READ

Jobs were reading 10% of the events using a 30 MB TTreeCache (TTC).

100% of the data are transferred and decompressed.

ROOT can decompress our D3PDs at ~20 MB/s.

Rates are the same as for xrdcp, except for local access.

Over the WAN, one should expect at least 50% of the CPU efficiency of local access.

Fewer than 100 simultaneous standard analysis jobs will saturate a 10 Gb/s WAN link.

FAX needs to be used judiciously: it can easily overwhelm weaker links.

READ rate (events/s), source (rows) to destination; destination columns in order: BNL-ATLAS, CERN-PROD, MWT2, AGLT2, SLAC (not every pair was measured):

BNL-ATLAS 163.7 19.61 48.24 91.99
CERN-PROD 62.07 224.23 46.63 63.35
MWT2 116.15 11.69 141.07 34.94
AGLT2 75.41 17.42 179.38
SLAC 48.67 8.8 34.92 43.08 92.46
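The statement that fewer than 100 jobs saturate a 10 Gb/s link is consistent with the ~20 MB/s decompression rate. A back-of-the-envelope sketch, assuming decimal units, that every job pulls data at that rate, and that no other traffic shares the link:

```python
import math

# Assumptions: decimal units, each job limited by ROOT decompression at ~20 MB/s,
# and the link carries nothing else.
LINK_GBPS = 10.0
JOB_MB_S = 20.0

link_mb_s = LINK_GBPS * 1000 / 8                  # 10 Gb/s -> 1250 MB/s
jobs_to_saturate = math.ceil(link_mb_s / JOB_MB_S)

print(f"{link_mb_s:.0f} MB/s link, ~{jobs_to_saturate} jobs to saturate")
```

Under these assumptions about 63 jobs already fill the link, well under the 100 quoted.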

Page 9: FAX Performance

MONALISA

Page 10: FAX Performance

WAYS AHEAD

Enlargement: increase coverage, add redundancy, increase total bandwidth

Caching: increases performance, reduces bandwidth needs

Cost matrix: smart FAX

Smart network: bandwidth requests, QoS assurance

Improve adoption rate: presenting, teaching, preaching; new services

Improve satisfaction: FAX tuning, application tuning, new services
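The cost matrix / smart FAX item suggests using measured rates to decide where to read from. This is a minimal sketch of that idea under stated assumptions, not the actual FAX redirection logic: among sites holding a replica, pick the one with the best measured single-job rate to the destination. The rates are the AGLT2 column of the COPY table, and `pick_source` is a hypothetical helper.

```python
from typing import Optional

# Single-job xrdcp rates (MB/s) into AGLT2, taken from the COPY table.
RATES_TO_AGLT2 = {
    "BNL-ATLAS": 8.97,
    "CERN-PROD": 5.76,
    "MWT2": 3.32,
    "SLAC": 6.38,
}


def pick_source(replica_sites, rates) -> Optional[str]:
    """Return the replica site with the highest known rate, or None if none was measured."""
    measured = [site for site in replica_sites if site in rates]
    if not measured:
        return None
    return max(measured, key=lambda site: rates[site])


print(pick_source(["MWT2", "CERN-PROD", "SLAC"], RATES_TO_AGLT2))  # SLAC
```

A real scheme would also need to account for current link load and QoS, since a single-job rate says nothing about how full the pipe already is.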