Performance Network Monitoring for the LHC Grid
Transcript of Performance Network Monitoring for the LHC Grid
Slide 1
Performance Network Monitoring for the LHC Grid
Les Cottrell, SLAC
International ICFA Workshop on Grid Activities within Large Scale International Collaborations, Sinaia, Romania Oct 13-18, 2006
Partially funded by DOE/MICS for Internet End-to-end Performance Monitoring (IEPM), and by Internet2
Slide 2
Why & Outline

• Data-intensive sciences (e.g. HEP) need to move large volumes of data worldwide
  – Requires understanding and effective use of fast networks
  – Requires continuous monitoring and interpretation
• For HEP, the LHC-OPN focuses on tier 0, tier 1 and a few tier 2 sites, i.e. just a few sites
• Outline of talk:
  – What does monitoring provide?
  – Active E2E measurements today and some challenges
  – Visualization, forecasting, problem ID
  – Passive monitoring
    • Netflow
  – Some conclusions
Slide 3
Uses of Measurements

• Automated problem identification & troubleshooting:
  – Alerts for network administrators, e.g.
    • Baselines, bandwidth changes in time-series, iperf, SNMP
  – Alerts for systems people
    • OS/host metrics
• Forecasts for Grid middleware, e.g. replica manager, data placement
• Engineering, planning, SLAs (set & verify), expectations
• Also (not addressed here):
  – Security: spot anomalies, intrusion detection
  – Accounting
Slide 4
Active E2E Monitoring
Slide 5
E.g. Using Active IEPM-BW Measurements

• Focus on high performance for a few hosts needing to send data to a small number of collaborator sites, e.g. the HEP tiered model
• Makes regular measurements with probe tools:
  – ping (RTT, connectivity), owamp (one-way delay), traceroute (routes)
  – pathchirp, pathload (available bandwidth)
  – iperf (one & multi-stream), thrulay (achievable throughput)
  – supports bbftp, bbcp (file-transfer applications, not just network)
• Looking at GridFTP, but it is complex, requiring renewing certificates
• Choice of probes depends on importance of path, e.g.
  – For major paths (tier 0, 1 & some 2) use the full suite
  – For tier 3 use just ping and traceroute
• Running at major HEP sites: CERN, SLAC, FNAL, BNL, Caltech, Taiwan, SNV, to about 40 remote sites
  – http://www.slac.stanford.edu/comp/net/iepm-bw.slac.stanford.edu/slac_wan_bw_tests.html
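The tier-based probe selection above can be sketched as follows. This is a minimal illustration, not the actual IEPM-BW code; the tool lists come from the bullets above, while the function and tier cutoff are my stand-ins:

```python
# Hypothetical sketch of the probe-selection policy described above:
# the full suite for tier 0/1 (and some tier 2) paths, ping/traceroute
# only for tier 3. Tool names are from the slide; the code is illustrative.

FULL_SUITE = ["ping", "owamp", "traceroute", "pathchirp", "pathload",
              "iperf", "thrulay", "bbftp", "bbcp"]
LIGHT_SUITE = ["ping", "traceroute"]

def probes_for(tier: int) -> list:
    """Probe tools to schedule for a remote site of the given tier."""
    return FULL_SUITE if tier <= 2 else LIGHT_SUITE
```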
Slide 6
IEPM-BW Measurement Topology

• 40 target hosts in 13 countries
• Bottlenecks vary from 0.5 Mbits/s to 1 Gbits/s
• Traverse ~50 ASes, 15 major Internet providers
• 5 targets at PoPs, rest at end sites
• Taiwan TWaren
• Added Sunnyvale for UltraLight
• Adding FZK Karlsruhe
Slide 7
Top page
Slide 8
Probes: Ping/traceroute

• Ping still useful
  – Is the path connected / node reachable?
  – RTT, jitter, loss
  – Great for low-performance links (e.g. Digital Divide), e.g. AMP (NLANR) / PingER (SLAC)
  – Nothing to install, but blocking is an issue
• OWAMP/I2 similar but one-way
  – But needs a server installed at the other end and good timers
  – Now built into IEPM-BW
• Traceroute
  – Needs good visualization (traceanal/SLAC)
  – No use for dedicated λ layer 1 or 2 paths
    • However, still want to know the topology of paths
Slide 9
Probes: Packet Pair Dispersion

• Used by pathload, pathchirp, ABwE for available bandwidth
• Send packets with known separation
• See how the separation changes due to the bottleneck
• Can be low network-intrusive, e.g. ABwE only 20 packets/direction, also fast (< 1 sec)
• From a PAM paper, pathchirp is more accurate than ABwE, but:
  – Ten times as long (10s vs 1s)
  – More network traffic (~factor of 10)
• Pathload a factor of 10 more again
  – http://www.pam2005.org/PDF/34310310.pdf
• IEPM-BW now supports ABwE, Pathchirp, Pathload
[Diagram: packet pairs compressed to minimum spacing at the bottleneck; spacing preserved on higher-speed links downstream]
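The dispersion idea reduces to one line of arithmetic: bottleneck capacity is the packet size divided by the receive-side spacing. A small sketch (my own illustration, not code from any of the tools named above):

```python
def bottleneck_capacity(packet_bytes: float, dispersion_s: float) -> float:
    """Estimate bottleneck capacity (bits/s) from the spacing a
    back-to-back packet pair acquires at the narrow link: C = size / gap."""
    return packet_bytes * 8 / dispersion_s

# 1500-byte packets emerging 12 microseconds apart imply a ~1 Gbit/s bottleneck
rate = bottleneck_capacity(1500, 12e-6)
```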
Slide 10
BUT…

• Packet pair dispersion relies on accurate timing of inter-packet separation
  – At > 1 Gbps this is getting beyond the resolution of Unix clocks
  – AND 10GE NICs are offloading functions
    • Coalescing interrupts, Large Send & Receive Offload, TOE
• Need to work with TOE vendors
  – Turn off offload (Neterion supports multiple channels, can eliminate offload to get more accurate timing in the host)
  – Do timing in NICs
  – No standards for interfaces
• Possibly use packet trains, e.g. pathneck
Slide 11
Achievable Throughput

• Use TCP or UDP to send as much data as possible, memory to memory, from source to destination
• Tools: iperf (bwctl/I2), netperf, thrulay (from Stas Shalunov/I2), udpmon …
• Pseudo file copy: bbcp also has a memory-to-memory mode to avoid disk/file problems
Slide 12
BUT…

• At 10 Gbits/s on a transatlantic path, slow start takes over 6 seconds
  – To get 90% of the measurement in congestion avoidance, need to measure for 1 minute (5.25 GBytes at 7 Gbits/s, today's typical performance)
• Needs scheduling to scale, even then …
• It's not disk-to-disk or application-to-application
  – So use bbcp, bbftp, or GridFTP
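The timing argument works out as follows (my arithmetic, using the figures on the slide):

```python
def required_duration(slow_start_s: float, ca_fraction: float) -> float:
    """Total test length so that ca_fraction of it falls in congestion
    avoidance after slow start: T = slow_start / (1 - ca_fraction)."""
    return slow_start_s / (1.0 - ca_fraction)

T = required_duration(6.0, 0.9)   # 6 s of slow start, 90% in CA -> 60 s
# At 7 Gbit/s, a 6 s window alone corresponds to 7e9 * 6 / 8 = 5.25e9
# bytes -- one reading of the slide's 5.25 GByte figure.
```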
Slide 13
AND …

• For testbeds such as UltraLight, UltraScienceNet etc. one has to reserve the path
  – So the measurement infrastructure needs to add the capability to reserve the path (so need an API to the reservation application)
  – OSCARS from ESnet is developing a web services interface (http://www.es.net/oscars/):
    • For lightweight probes, have a "persistent" reservation capability
    • For more intrusive probes, must reserve just before making the measurement
Slide 14
Visualization & Forecasting in the Real World
Slide 15
Examples of real data

• Seasonal effects: daily & weekly
• Some effects are seasonal; others are not
• Events may affect multiple metrics
• Examples include misconfigured windows, new paths, very noisy series

[Charts: Caltech, thrulay, Nov 05–Mar 06, 0–800 Mbps; UToronto, iperf, Nov 05–Jan 06, 0–250 Mbps; UTDallas, pathchirp/thrulay/iperf, Mar-10-06 to Mar-20-06, 0–120 Mbps]

• Events can be caused by host or site congestion
• Few route changes result in bandwidth changes (~20%)
• Many significant events are not associated with route changes (~50%)
Slide 16
Scatter plots & histograms

[Scatter plots: RTT (ms) vs throughput (Mbits/s) for thrulay; thrulay (Mbps) vs pathchirp & iperf (Mbps). Histograms: pathchirp and thrulay]

• Scatter plots: quickly identify correlations between metrics
• Histograms: quickly identify variability or multimodality
Slide 17
Changes in network topology (BGP) can result in dramatic changes in performance

[Snapshot of traceroute summary table, and samples of traceroute trees generated from the table]

• ABwE measurements, one per minute for 24 hours, Thurs Oct 9 9:00am to Fri Oct 10 9:01am
• Drop in performance (from original path SLAC–CENIC–Caltech to SLAC–ESnet–LosNettos (100 Mbps)–Caltech), then back to the original path
• Changes detected by IEPM-Iperf and ABwE
• ESnet–LosNettos segment in the path (100 Mbits/s)

[Chart: remote host vs hour, showing dynamic BW capacity (DBC), cross-traffic (XT), and available BW = DBC − XT, in Mbits/s]

Notes:
1. Caltech misrouted via the Los-Nettos 100 Mbps commercial net 14:00–17:00
2. ESnet/GEANT working on routes from 2:00 to 14:00
3. A previous occurrence went unnoticed for 2 months
4. Next step is to auto-detect and notify
Slide 18
On the other hand

• Route changes may affect the RTT (in yellow)
• Yet have no noticeable effect on available bandwidth or achievable throughput

[Chart: route changes, available bandwidth, achievable throughput]
Slide 19
However…

• Elegant graphics are great for understanding problems, BUT:
  – There can be thousands of graphs to look at (many site pairs, many devices, many metrics)
  – Need automated problem recognition AND diagnosis
• So developing tools to reliably detect significant, persistent changes in performance
  – Initially using a simple plateau algorithm to detect step changes
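A minimal version of such a step detector might look like this. This is my simplification of the plateau idea, not the IEPM code: flag a change when several consecutive points fall well outside the spread of the preceding baseline.

```python
import statistics

def detect_step(series, history=10, threshold=2.0, trigger=3):
    """Flag the index where `trigger` consecutive points deviate more
    than `threshold` standard deviations from the mean of the `history`
    points preceding the run. Returns None if no step is found."""
    run = 0
    for i in range(history, len(series)):
        # Baseline window excludes the current run of suspect points,
        # so the anomaly does not pollute its own reference.
        base = series[i - history - run:i - run]
        mu = statistics.mean(base)
        sd = statistics.pstdev(base) or 1e-9   # avoid divide-by-zero
        if abs(series[i] - mu) > threshold * sd:
            run += 1
            if run >= trigger:
                return i - trigger + 1         # index where the step began
        else:
            run = 0
    return None

# A series that drops from ~100 to 50 is flagged at the drop (index 10)
step_at = detect_step([100, 101, 99, 100, 101, 99, 100, 101, 99, 100, 50, 50, 50])
```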
Slide 20
Seasonal Effects on events

• Change in bandwidth (drops) between 19:00 & 22:00 Pacific Time (7:00–10:00am PK time)
• Causes more anomalous events around this time
Slide 21
Forecasting

• Over-provisioned paths should have pretty flat time series
  – Short/local-term smoothing
  – Long-term linear trends
  – Seasonal smoothing
• But seasonal trends (diurnal, weekly) need to be accounted for on about 10% of our paths
• Use Holt-Winters triple exponential weighted moving averages
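The Holt-Winters smoother named above can be written compactly; this is a generic textbook additive version (the initialization and parameter values are my choices, not IEPM's):

```python
def holt_winters(series, period, alpha=0.4, beta=0.1, gamma=0.2):
    """Additive Holt-Winters: level + trend + seasonal components,
    each updated by exponential smoothing. Returns the one-step-ahead
    forecast made before seeing each observation."""
    level = sum(series[:period]) / period
    trend = 0.0
    season = [x - level for x in series[:period]]
    forecasts = []
    for i, x in enumerate(series):
        s = season[i % period]
        forecasts.append(level + trend + s)      # forecast, then update
        prev_level = level
        level = alpha * (x - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[i % period] = gamma * (x - level) + (1 - gamma) * s
    return forecasts
```

On a cleanly periodic series the seasonal term absorbs the cycle, so the forecasts track diurnal or weekly patterns instead of lagging behind them.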
Slide 22
Experimental Alerting

• Have false positives down to a reasonable level (a few per week), so sending alerts to developers
• Saved in database
• Links to traceroutes, event analysis, time-series
Slide 23
Passive

• Active monitoring
  – Pro: regularly spaced data on known paths, can measure on demand
  – Con: adds data to the network, can interfere with real data and measurements
• What about passive?
Slide 24
Netflow et al.

• Switch identifies a flow by src/dst ports and protocol
• Cuts a record for each flow:
  – src, dst, ports, protocol, TOS, start & end time
• Collect records and analyze
• Can be a lot of data to collect each day, needs a lot of CPU
  – Hundreds of MBytes to GBytes
• No intrusive traffic; sees real traffic, collaborators, applications
• No accounts/pwds/certs/keys
• No reservations etc.
• Characterize traffic: top talkers, applications, flow lengths etc.
• LHC-OPN requires edge routers to provide Netflow data
• Internet2 backbone
  – http://netflow.internet2.edu/weekly/
• SLAC:
  – www.slac.stanford.edu/comp/net/slac-netflow/html/SLAC-netflow.html
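Flow records like those above lend themselves to simple aggregation; a sketch of a top-talkers report over simplified records (the dict fields are my stand-ins for real Netflow export fields):

```python
from collections import defaultdict

def top_talkers(flows, n=5):
    """Total bytes per (src, dst) pair, largest first.
    `flows` is an iterable of dicts with 'src', 'dst', 'bytes' keys."""
    totals = defaultdict(int)
    for f in flows:
        totals[(f["src"], f["dst"])] += f["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```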
Slide 25
Typical day's flows

• Very much work in progress
• Look at the SLAC border
• Typical day:
  – ~28K flows/day
  – ~75 sites with > 100 KB bulk-data flows
  – A few hundred flows > 1 GByte
• Collect records for several weeks
• Filter 40 major collaborator sites, big (> 100 KBytes) flows, bulk transport apps/ports (bbcp, bbftp, iperf, thrulay, scp, ftp …)
• Divide by remote site, aggregate parallel streams
• Look at the throughput distribution
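Aggregating parallel streams, as in the last bullets, amounts to grouping flows by remote site and application and spreading the total bytes over the union of their time windows. A sketch with invented record fields (not the actual SLAC analysis code):

```python
def aggregate_streams(flows):
    """Collapse parallel streams of one transfer into a single throughput
    figure (bits/s) per (remote site, application). `flows` is an iterable
    of dicts with 'remote', 'app', 'bytes', 'start', 'end' keys."""
    groups = {}
    for f in flows:
        key = (f["remote"], f["app"])
        g = groups.setdefault(key, {"bytes": 0, "start": f["start"], "end": f["end"]})
        g["bytes"] += f["bytes"]                  # sum bytes across streams
        g["start"] = min(g["start"], f["start"])  # union of time windows
        g["end"] = max(g["end"], f["end"])
    return {k: g["bytes"] * 8 / (g["end"] - g["start"]) for k, g in groups.items()}
```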
Slide 26
Netflow et al.

• Peaks at known capacities and RTTs
  – RTTs might suggest windows are not optimized; peaks at the default OS window size (BW = window / RTT)
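The window/RTT relation in the last bullet is easy to check numerically (the example values are mine):

```python
def window_limited_throughput(window_bytes: int, rtt_s: float) -> float:
    """TCP throughput ceiling from the relation BW = window / RTT, in bits/s."""
    return window_bytes * 8 / rtt_s

# A common 64 KB default window over a 100 ms transatlantic RTT caps
# throughput near 5 Mbit/s, far below a Gbit/s-class path.
bw = window_limited_throughput(64 * 1024, 0.1)
```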
Slide 27
How many sites have enough flows?

• In May '05 found 15 sites at the SLAC border with > 1440 flows (one per 30 mins)
  – Maybe enough for time-series forecasting of seasonal effects
• Three sites (Caltech, BNL, CERN) were actively monitored
• The rest were "free"
• Only 10% of sites have big seasonal effects in active measurement
• The remainder need fewer flows
• So promising
Slide 28
Mining data for sites

• Real application use (bbftp) for 4 months
• Gives a rough idea of throughput (and confidence) for 14 sites seen from SLAC
Slide 29
Multi months

• bbcp throughput from SLAC to Padova
• Fairly stable with time, but large variance
• Many non-network-related factors
Slide 30
Netflow limitations

• Use of dynamic ports makes it harder to detect the application
  – GridFTP, bbcp, bbftp can use fixed ports (but may not)
  – P2P often uses dynamic ports
  – Discriminate type of flow based on headers (not relying on ports)
    • Types: bulk data, interactive …
    • Discriminators: inter-arrival time, length of flow, packet length, volume of flow
    • Use machine learning/neural nets to cluster flows
    • E.g. http://www.pam2004.org/papers/166.pdf
• Aggregation of parallel flows (needs care, but not difficult)
• Can use for giving a performance forecast
  – Unclear if it can be used for detecting steps in performance
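The header-based discriminators listed above translate directly into a feature vector for clustering; a sketch over simplified flow records (the field names are my assumptions, not a Netflow schema):

```python
def flow_features(flow):
    """(duration, volume, mean packet length, mean inter-arrival time)
    for one flow -- the discriminators named on the slide.
    `flow` is a dict with 'start', 'end', 'bytes', 'packets' keys."""
    duration = max(flow["end"] - flow["start"], 1e-6)  # guard zero-length flows
    return (
        duration,                                # length of flow (s)
        flow["bytes"],                           # volume (bytes)
        flow["bytes"] / flow["packets"],         # mean packet length
        duration / max(flow["packets"] - 1, 1),  # mean inter-arrival time
    )
```

Vectors like these could then be handed to any off-the-shelf clustering routine (k-means, a neural net) to separate bulk-data from interactive flows.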
Slide 31
Conclusions

• Some tools fail at higher speeds
• Throughputs often depend on non-network factors:
  – Host: interface speeds (DSL, 10 Mbps Enet, wireless), loads, resource congestion
  – Configurations (window sizes, hosts, number of parallel streams)
  – Applications (disk/file vs mem-to-mem)
• Looking at distributions by site, often multi-modal
• Predictions may have large standard deviations
• Need automated assistance to diagnose events
Slide 32
In Progress

• Working on Netflow viz (currently at BNL & SLAC), then work with other LHC sites to deploy
• Add support for pathneck
• Look at other forecasters, e.g. ARMA/ARIMA, maybe Kalman filters, neural nets
• Working on diagnosis of events
  – Multi-metrics, multi-paths
• Signed a collaborative agreement with Internet2 to collaborate on PerfSONAR:
  – Provide web services access to IEPM data
  – Provide analysis, forecasting and event detection to PerfSONAR data
  – Use PerfSONAR (e.g. router) data for diagnosis
  – Provide viz of PerfSONAR route information
  – Apply to LHCnet
  – Look at layer 1 & 2 information
Slide 33
Questions, More information

• Comparisons of active infrastructures:
  – www.slac.stanford.edu/grp/scs/net/proposals/infra-mon.html
• Some active public measurement infrastructures:
  – www-iepm.slac.stanford.edu/
  – www-iepm.slac.stanford.edu/pinger/
  – e2epi.internet2.edu/owamp/
  – amp.nlanr.net/
• Monitoring tools
  – www.slac.stanford.edu/xorg/nmtf/nmtf-tools.html
  – www.caida.org/tools/
  – Google for iperf, thrulay, bwctl, pathload, pathchirp
• Event detection
  – www.slac.stanford.edu/grp/scs/net/papers/noms/noms14224-122705-d.doc
Slide 34
Outline

• Deployment, keeping in sync, management, timeouts, killing hung processes, host OS/env differences
• Implementation:
  – MySQL DBs for data and configuration (host, tools, plotting etc.) info
  – Scheduler prevents backup
  – Log files, analyzed for troubles
  – Local target as a sanity check on the monitor