FAX Performance
FAX PERFORMANCE
TIM, Tokyo May 2013
ILIJA VUKOTIC
PERFORMANCE
Metrics
  Data coverage: better than 97%, more than 2 replicas
  Number of users: mostly UofC, Prague users
  Percentage of successful jobs: latest HC tests >99%
  Total amount of data delivered: ~2 PB/week
  Bandwidth usage
Sources
  Ganglia plots
  MonALISA
  FAX Dashboard
  HC tests
  CostMatrix tests
  Special tests using dedicated resources
COST MATRIX
Rate (MB/s) from source (rows) to destination (columns)
Destinations, in column order: BNL-ATLAS, CERN-PROD, DESY-HH, INFN-ROMA1, LRZ-LMU, MWT2, RAL-LCG2, SWT2_CPB, UKI-LT2-QMUL, UKI-SCOTGRID-GLASGOW
Sources, with rates in column order (destinations with no measurement are omitted):
AGLT2: 1.10 2.69 0.65 0.55 6.68 0.90 1.24 1.23
BNL-ATLAS: 57.50 0.69
CERN-PROD: 63.41 25.96 5.74 3.69 6.10 0.94 0.76 10.33 6.98
DESY-HH: 3.73 46.52 1.47 3.91 0.51 4.56 5.02
IllinoisHEP: 0.75 2.45 0.58 0.66 36.50 1.05 4.73 1.34
INFN-FRASCATI: 1.44 0.72 4.00 0.77 0.62 0.95 0.79
INFN-NAPOLI-ATLAS: 4.37 9.86 12.71 1.45 2.62 0.40 5.13
INFN-ROMA1: 4.08 5.30 9.29 1.58 1.94 0.42 4.66 4.68
LRZ-LMU: 3.96 11.95 2.13 39.97 8.31 0.43 0.69 8.80 5.82
MPPMU: 4.16 10.44 2.23 39.90 8.30 0.43 0.71 4.90 5.87
MWT2: 1.02 2.59 0.71 0.74 13.09 1.30 2.08 1.35
OU_OCHEP_SWT2: 0.57 0.57 0.55 0.39 1.87 2.03 0.71 0.91
praguelcg2: 2.58 2.96 1.48 2.14 2.49 0.44 0.53 3.95 1.76
RAL-LCG2: 3.08 1.56 1.37 2.00 34.14 10.55 4.07
RU-Protvino-IHEP: 0.94 1.29 0.85 0.95 1.03 0.51 0.45 1.69 1.54
SWT2_CPB: 0.81 3.73 0.67 0.74 27.28 43.29 5.86 1.50
UKI-LT2-QMUL: 3.02 2.44 1.26 1.46 1.38 2.19 0.42 1.60 3.46
UKI-SCOTGRID-ECDF: 1.94 3.53 1.28 1.12 4.48 0.69 0.39 2.08 4.22
UKI-SCOTGRID-GLASGOW: 7.63 4.55 5.12 1.96 2.01 2.86 1.09 0.53 7.77 10.34
UKI-SOUTHGRID-OX-HEP: 3.19 3.80 1.52 3.82 4.32 5.46 3.29
WT2: 15.68 0.76 3.22 0.61 0.60 10.51 0.82 3.53 1.34
A place to get an idea of the rate a single job can expect to see.
Are our pipes really this full?
Let’s see other sources of information.
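As one way to read the cost matrix, the sketch below looks up the single-job rate for a source/destination pair and turns it into a rough copy time. It uses only the UKI-SCOTGRID-GLASGOW source row, which is the one row that lists a value for all ten destinations, so its column alignment is unambiguous. The function name and the 1000 MB file size are hypothetical, chosen for illustration only.

```python
# Destination columns of the cost matrix, in the order given on the slide.
DESTINATIONS = [
    "BNL-ATLAS", "CERN-PROD", "DESY-HH", "INFN-ROMA1", "LRZ-LMU",
    "MWT2", "RAL-LCG2", "SWT2_CPB", "UKI-LT2-QMUL", "UKI-SCOTGRID-GLASGOW",
]

# Rates in MB/s for source UKI-SCOTGRID-GLASGOW (the only fully populated row).
GLASGOW_RATES = [7.63, 4.55, 5.12, 1.96, 2.01, 2.86, 1.09, 0.53, 7.77, 10.34]

# Nested mapping: source -> destination -> rate in MB/s.
COST_MATRIX = {
    "UKI-SCOTGRID-GLASGOW": dict(zip(DESTINATIONS, GLASGOW_RATES)),
}


def expected_copy_seconds(source: str, destination: str, file_size_mb: float) -> float:
    """Rough time for one job to copy a file at the measured single-job rate."""
    rate_mb_s = COST_MATRIX[source][destination]
    return file_size_mb / rate_mb_s


if __name__ == "__main__":
    size_mb = 1000.0  # hypothetical file size, not taken from the slides
    for dest in ("CERN-PROD", "MWT2"):
        seconds = expected_copy_seconds("UKI-SCOTGRID-GLASGOW", dest, size_mb)
        print(f"GLASGOW to {dest}: about {seconds:.0f} s for {size_mb:.0f} MB")
```

A single measured rate says nothing about how the link behaves under load, so an estimate like this is only a starting point.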
COST MATRIX VS. PERFSONAR
Comparison of just one link in one direction:
source AGLT2, destination MWT2
perfSONAR info at 4 h intervals.
Could it be that the worker node links are saturating?
[Diagram of the sites: MWT2, SLAC, AGLT2, BNL, CERN]
CLOGGING THE PIPES
Using HC-submitted jobs sent to 4 ANALY queues: AGLT2, BNL, MWT2, SLAC
Each site runs 300 jobs of two types, 50 in parallel:
  Copy: xrdcp of 3 files randomly chosen from the SMWZ datasets prepared for FDR, taken from the other sites
  Read: reads 10% of events from 3 files randomly chosen from the FDR SMWZ datasets, taken from the other sites
Each job uploads its time to finish, events/s, MB/s, and pandaid so that jobs can be investigated
All jobs submitted through the FDR web interface http://ivukotic.web.cern.ch/ivukotic/FDR/index.asp
All in parallel with other HC stress tests
COPY
Clearly not limited by WN links.
Assuming just 30 simultaneous jobs, worst-case delivery rates are:
  BNL to CERN: 75 MB/s
  CERN to AGLT2: 170 MB/s
  MWT2 to AGLT2: 100 MB/s
  AGLT2 to CERN: 90 MB/s
  SLAC to BNL: 300 MB/s
Average WAN access ~300 MB/s
Rate (MB/s) from source (rows) to destination (columns)
source       BNL-ATLAS  CERN-PROD  MWT2   AGLT2  SLAC
BNL-ATLAS    86.09      2.51       13.41  8.97   -
CERN-PROD    11.53      76.38      10.56  5.76   -
MWT2         11.1       1.91       27.08  3.32   -
AGLT2        22.41      2.9        23.08  65.49  -
SLAC         9.87       2.06       8.76   6.38   82.71
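The worst-case delivery rates quoted for COPY appear to be the per-job xrdcp rate multiplied by the assumed 30 simultaneous jobs. The sketch below reproduces that arithmetic. Which table entry belongs to which link is an inference from the slide's numbers, not something the slide states, and the function and variable names are hypothetical.

```python
# Per-job xrdcp rates in MB/s for the five links quoted on the slide.
# Assumption: each value is the table entry for that (source, destination) pair.
PER_JOB_RATE_MB_S = {
    ("BNL", "CERN"): 2.51,
    ("CERN", "AGLT2"): 5.76,
    ("MWT2", "AGLT2"): 3.32,
    ("AGLT2", "CERN"): 2.9,
    ("SLAC", "BNL"): 9.87,
}

SIMULTANEOUS_JOBS = 30  # the slide's "just 30 simultaneous jobs"


def aggregate_rate(per_job_mb_s: float, jobs: int = SIMULTANEOUS_JOBS) -> float:
    """Combined delivery rate if `jobs` copies each run at `per_job_mb_s`."""
    return per_job_mb_s * jobs


if __name__ == "__main__":
    for (src, dst), rate in PER_JOB_RATE_MB_S.items():
        print(f"{src} to {dst}: {aggregate_rate(rate):.1f} MB/s")
```

The products come out at roughly 75, 173, 100, 87 and 296 MB/s, which matches the rounded figures on the slide.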
READ
Jobs were reading 10% of events using a 30 MB TTreeCache (TTC).
100% of the data is transferred and decompressed.
ROOT can decompress our D3PD at ~20 MB/s.
Rates are the same as for xrdcp, except for local access.
Over WAN one should expect at least 50% of the CPU efficiency of local access.
Fewer than 100 simultaneous standard analysis jobs will saturate a 10 Gb/s WAN link.
FAX needs to be used judiciously; it can easily overwhelm weaker links.
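The saturation claim can be checked with simple arithmetic: a 10 Gb/s link carries about 1250 MB/s, so the number of jobs needed to fill it depends only on the per-job rate. The sketch below is a rough estimate that ignores protocol overhead and competing traffic. Using the ~20 MB/s decompression figure as a per-job upper bound is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

LINK_CAPACITY_GBIT_S = 10.0  # the 10 Gb WAN link named on the slide


def link_capacity_mb_s(gbit_s: float) -> float:
    """Convert Gb/s to MB/s (decimal units, 8 bits per byte)."""
    return gbit_s * 1000.0 / 8.0


def jobs_to_saturate(per_job_mb_s: float, gbit_s: float = LINK_CAPACITY_GBIT_S) -> int:
    """Smallest number of jobs whose combined rate reaches the link capacity."""
    if per_job_mb_s <= 0:
        raise ValueError("per-job rate must be positive")
    return math.ceil(link_capacity_mb_s(gbit_s) / per_job_mb_s)


if __name__ == "__main__":
    # Assumed per-job rates in MB/s; 20 is the ROOT decompression figure.
    for rate in (12.5, 20.0):
        print(f"{rate} MB/s per job: {jobs_to_saturate(rate)} jobs fill the link")
```

Under these assumptions, any per-job rate above 12.5 MB/s fills the link with fewer than 100 jobs, and a job running at the full 20 MB/s would need only 63.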
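READ
Rate (events/s) from source (rows) to destination (columns)
Destinations, in column order: BNL-ATLAS, CERN-PROD, MWT2, AGLT2, SLAC
Sources, with rates in column order (destinations with no measurement are omitted):
BNL-ATLAS: 163.7 19.61 48.24 91.99
CERN-PROD: 62.07 224.23 46.63 63.35
MWT2: 116.15 11.69 141.07 34.94
AGLT2: 75.41 17.42 179.38
SLAC: 48.67 8.8 34.92 43.08 92.46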
WAYS AHEAD
Increase coverage, add redundancy, increase total bandwidth
  Enlargement
Increase performance, reduce bandwidth needs
  Caching
  Cost matrix: smart FAX
  Smart network: bandwidth requests, QoS assurance
Improve adoption rate
  Presenting, teaching, preaching
  New services
Improve satisfaction
  FAX tuning
  Application tuning
  New services