Visual Flow Analysis: What do real-world problems look like?
Brent Draney
NERSC Center Division, LBNL
2/07/06
2
What is NERSC?
• DOE scientific computer center
• Supports ~2000 scientists around the world (mainly DOE and universities)
• Supports most major disciplines
• Combined ~20 TFLOPS, 8.8 petabytes
• 10 Gigabit LAN backbone and 10 Gigabit ESnet uplink
• O(100) sockets account for ~95% of bytes transferred
• O(5000) IP addresses in a single building but only 100 desktops
3
Network and Security Team (NAST)
• Enablers and inhibitors of the network in one group
  – All responsibility is here
• Networking is responsible for end-to-end performance
  – Wherever the customer is
  – “Not our problem” is not sufficient or acceptable
4
Performance tools
• Optical taps everywhere
• Mobile crashcart with all types of interfaces
• Tcpdump, Tcptrace and Xplot
• A lot of head scratching
Note: Analyzing a multi-Gigabyte flow packet by packet is impossible!
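Since packet-by-packet inspection does not scale, a trace is typically reduced to a time/sequence curve before anyone looks at it. The following is a minimal sketch of that reduction, assuming the capture has already been parsed into (timestamp, sequence number) pairs. The function name `bin_time_sequence` and the input format are hypothetical and are not the output format of Tcpdump, Tcptrace or Xplot.

```python
def bin_time_sequence(samples, bin_seconds=1.0):
    """Reduce (timestamp, seq) samples to one point per time bin.

    Keeps the highest sequence number seen in each bin, which preserves
    the overall slope of the transfer while discarding per-packet detail.
    """
    if bin_seconds <= 0:
        raise ValueError("bin_seconds must be positive")
    bins = {}
    for ts, seq in samples:
        key = int(ts // bin_seconds)  # index of the bin this sample falls in
        if key not in bins or seq > bins[key]:
            bins[key] = seq
    # One (bin start time, highest seq) point per non-empty bin, in time order.
    return [(key * bin_seconds, bins[key]) for key in sorted(bins)]
```

A flat run of points in the result corresponds to a stall, and a change in slope corresponds to a change in throughput; the bin width is a trade-off between detail and size.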
5
Simple Example
Consistent Slope
No anomalies
Protocol limited
6
Simple Example Detail
Packets
ACK’ed data
Sender Advertised Window
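One common form of protocol limit is the window: a sender can have at most one window of data in flight per round trip, which gives a steady slope with no anomalies. The sketch below works through that arithmetic. The function names and the example numbers (64 KiB window, 50 ms round trip) are illustrative assumptions, not measurements from this talk.

```python
def window_limited_throughput(window_bytes, rtt_seconds):
    """Upper bound on throughput in bits per second for a given window.

    At most one window of data can be outstanding per round trip.
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(target_bits_per_second, rtt_seconds):
    """Window (bandwidth-delay product) needed to sustain a target rate."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return target_bits_per_second * rtt_seconds / 8


# Assumed example values: 64 KiB window, 50 ms round-trip time.
limit = window_limited_throughput(64 * 1024, 0.050)
needed = required_window_bytes(10e9, 0.050)
print(f"limit: {limit / 1e6:.1f} Mbit/s, window for 10 Gbit/s: {needed / 1e6:.1f} MB")
```

With these assumed values the window caps the flow at roughly 10.5 Mbit/s, while filling a 10 Gigabit path at the same round-trip time would need a window of about 62.5 MB.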
7
Brick Wall Example
Few anomalies
Transfer Hangs
8
Brick Wall Detail
One Dropped packet
3 Dupe ACK’s
No Retransmit, Ever
9
Brick Wall Example Troubleshooting and Answer
• Troubleshooting
  – Sender verifies that retransmits are sent
  – “Non-tuned” traffic never fails
• Answer
  – A stateful firewall tracking TCP sequence numbers didn’t believe that the retransmits were legitimate
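The pattern in this example (duplicate ACKs with no retransmit ever arriving at the capture point) can also be checked mechanically. The following is a minimal sketch, assuming the trace has been reduced to an ordered list of ("data", sequence number) and ("ack", ACK number) events for one direction of one connection. The event format and the name `find_unanswered_dup_acks` are hypothetical.

```python
def find_unanswered_dup_acks(events, dup_threshold=3):
    """Return ACK numbers that were duplicated but never answered.

    events: iterable of (kind, number) in capture order, where kind is
      "data" (segment starting at sequence `number`) or
      "ack"  (ACK carrying acknowledgment `number`).
    An ACK number is reported if it was repeated at least dup_threshold
    times and no data segment starting at that number was seen afterwards.
    """
    last_ack = None
    dup_count = 0
    pending = set()  # ACK numbers still waiting for a retransmit
    for kind, number in events:
        if kind == "ack":
            if number == last_ack:
                dup_count += 1
                if dup_count >= dup_threshold:
                    pending.add(number)
            else:
                last_ack = number
                dup_count = 0
        elif kind == "data":
            # A segment starting at the missing sequence number is the retransmit.
            pending.discard(number)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return sorted(pending)
```

Running the same check on captures taken on both sides of a middlebox shows where the retransmit disappears: if the sender-side capture reports nothing and the receiver-side capture reports the missing sequence number, the device in between is the suspect.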
10
Perverse Example
Holy Mackerel!
Jumbo Packets
Retransmits
11
Perverse Example
Is PMTU working? Yes
[Scratch Head]
12
Perverse Example Troubleshooting and Answer
• Troubleshooting
  – Review sender configuration
  – PMTU installed in routing table correctly? Yes
  – Tcpdump on host shows 64K packets leaving a 9k interface
  – “Large Send” enabled, offloading packet creation to NIC
• Answer
  – NIC doesn’t have access to routing table
    • Route MTU not honored
  – Retransmits handled by kernel
    • Route MTU honored
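The check that exposed this problem amounts to comparing captured packet lengths against the MTU limits that should apply. The sketch below classifies lengths against an interface MTU and a route MTU. The function name, the labels and the example values (9000-byte interface, 1500-byte route MTU) are assumptions for illustration; the talk only states a 9k interface and 64K packets.

```python
def classify_packet_lengths(lengths, interface_mtu, route_mtu):
    """Label each captured packet length against the two MTU limits.

    Returns a list of (length, label) pairs where label is one of:
      "ok"                     length fits within the route MTU
      "exceeds route MTU"      fits the interface but not the route
      "exceeds interface MTU"  larger than the interface itself allows,
                               e.g. a buffer handed to the NIC for segmentation
    """
    results = []
    for length in lengths:
        if length <= route_mtu:
            label = "ok"
        elif length <= interface_mtu:
            label = "exceeds route MTU"
        else:
            label = "exceeds interface MTU"
        results.append((length, label))
    return results


# Assumed example: 9000-byte interface, 1500-byte route MTU.
for length, label in classify_packet_lengths([1500, 9000, 65536], 9000, 1500):
    print(length, label)
```

Lengths above the interface MTU in a host-side capture suggest that segmentation is happening after the capture point, on the NIC, which is where the route MTU was not being honored in this example.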
13
Conclusions
• Diverse problems have the same general feel of poor performance.
• Flow visualization can isolate problems quickly.
• Very large flows require visualization.
• Protocol limits (host buffers, sftp …) are still a major cause but are becoming less so.
• New and “creative” methods to achieve higher performance can create strangeness and are becoming more of a problem.
• Seeing is believing. Pictures are convincing (to users, system admins and network admins).